Chemical identifiers as Subject terms

Hi everyone,
I’m working with chemical substance identifiers as metadata for data deposits, to align the records more closely with the FAIR principles. I’d like to represent the identifiers using the Subject properties in DataCite (Subject, subjectScheme, schemeURI, valueURI, and classificationCode), since they can carry linked information.

I’ll use the PubChem record for 2,2',5,6'-Tetrachlorobiphenyl (C12H6Cl4) as an example; the identifiers are listed in section 2 of that record.

For identifiers that are computed (section 2.1.x, where rules or software generate the identifier from the chemical structure), I’m uncertain what to include for the schemeURI and valueURI. These are the identifiers I’d prefer to use, since in general they uniquely identify a substance:
IUPAC Name, InChI, InChIKey, and Canonical SMILES

Other identifiers (section 2.3 on the web page) are more straightforward because a valueURI can be included, for example PubChem ID and DSSTox Substance ID.
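
To make the straightforward case concrete, here is a minimal Python sketch of one Subject entry for a PubChem ID. The helper `make_subject` is hypothetical, the CID is a placeholder rather than the real one for the example compound, and the subjectScheme label and schemeURI are my own assumptions, not an agreed convention. The keys follow the property names listed above; the exact key spelling a given DataCite API expects may differ.

```python
def make_subject(value, scheme, scheme_uri=None, value_uri=None,
                 classification_code=None):
    """Build one Subject entry as a dict, dropping properties that are not set."""
    entry = {
        "subject": value,
        "subjectScheme": scheme,
        "schemeURI": scheme_uri,
        "valueURI": value_uri,
        "classificationCode": classification_code,
    }
    return {key: val for key, val in entry.items() if val is not None}


# Placeholder CID for illustration only; substitute the real one.
cid = "12345"

pubchem_subject = make_subject(
    value=cid,
    scheme="PubChem CID",                          # assumed scheme label
    scheme_uri="https://pubchem.ncbi.nlm.nih.gov/",  # assumed scheme URI
    value_uri=f"https://pubchem.ncbi.nlm.nih.gov/compound/{cid}",
)

# A computed identifier has no obvious landing page, so in this sketch
# the URI properties are simply left out (the open question in this post).
inchikey_subject = make_subject(value="<InChIKey goes here>", scheme="InChIKey")

print(pubchem_subject)
print(inchikey_subject)
```

Dropping unset properties keeps the entry valid for both kinds of identifier, so the same helper can be reused once a schemeURI and valueURI convention for the computed ones is settled.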

Any ideas on how to handle the computed identifiers?
Thanks!

I just ran across a suggestion for InChI and InChIKey in a presentation on Connecting Chemistry Through PIDs (see slide 18), so I’ll adopt that approach.

Hi Brian, yes indeed, the approach you identify is now the topic of an IUPAC working party (FAIRSpec). Its focus is in fact NMR spectroscopy, but it will include such identifiers in its recommendations. See e.g. R. M. Hanson, D. Jeannerat, M. Archibald, I. Bruno, S. Chalk, A. N. Davies, R. J. Lancashire, J. Lang and H. S. Rzepa, IUPAC specification for the FAIR management of spectroscopic data in chemistry (IUPAC FAIRSpec) – guiding principles, Pure Appl. Chem., 2022, DOI: https://doi.org/10.1515/pac-2021-2009. We hope these recommendations will emerge within the next 12 months, so hang on in there.

Brian, for some background: I think the first instances of using chemical identifiers as subject terms were described in M. J. Harvey, A. McLean, H. S. Rzepa, A metadata-driven approach to data repository design, J. Cheminform., 2017, DOI: 10.1186/s13321-017-0190-6

The realisation soon dawned, however, that the process needed to be standardized. Efforts therefore started to gather a group of chemists interested in doing this, and an IUPAC working party emerged in 2020 to take it on.

For some examples of the use of subject terms, see this slide https://www.ch.ic.ac.uk/rzepa/talks/watoc22/9.html from the talk just delivered (https://doi.org/hsgk).

These uses of the subject term were introduced in 2016, but they must be considered provisional and will be subsumed by the IUPAC FAIRSpec recommendations. They were proofs of concept of what could be achieved with subject terms.

This slide https://www.ch.ic.ac.uk/rzepa/talks/watoc22/10.html introduces the concept of media types into searches. A media type can be registered using e.g. https://support.datacite.org/reference/list-media-for-doi-name-1 and can then be used to narrow a search down to data in particular “formats” in common use in e.g. chemistry.
