Where to add the repository name and identifier in DataCite metadata?

Dear all,

Within NFDI4Chem and our repositories for chemistry data, we have been discussing the publisher field: where to put the repository name, and why and how to add a repository identifier to the DOI metadata in DataCite’s schema. Let’s go through all of this:

Publisher: Definition from DataCite Schema v4.4:

The name of the entity that holds, archives, publishes prints, distributes, releases, issues, or produces the resource. This property will be used to formulate the citation, so consider the prominence of the role.

On citation, the documentation says:

Creator (PublicationYear): Title. Publisher. (resourceTypeGeneral). Identifier

So what should show up here is the name of the repository, just as journals are named in citations for articles. This is clear and precise. Example:

S. Herres-Pawlis, F. Bach, I. Bruno, S. Chalk, N. Jung, J. Liermann, L. McEwen, S. Neumann, C. Steinbeck, M. Razum, O. Koepler, Angew. Chem. Int. Ed. 2022, 61, e202203038. hxttps://doi.org/10.1002/anie.202203038

…and a hypothetical dataset:

Jon Doe (2023): NMR Spectra of all Structures published in PubChem. nmrXiv. (Dataset)
hxtps://doi.org/10.57992/nmrxiv.p1

The publisher field would provide the repository name.
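To make the citation pattern concrete, here is a minimal sketch in Python that assembles the hypothetical dataset citation from its metadata fields. The function name `format_citation` and its parameter names are only illustrative; they are not part of any DataCite tooling.

```python
def format_citation(creator, year, title, publisher, resource_type, identifier):
    """Assemble a citation following the pattern
    Creator (PublicationYear): Title. Publisher. (resourceTypeGeneral). Identifier
    """
    return f"{creator} ({year}): {title}. {publisher}. ({resource_type}). {identifier}"


# Hypothetical dataset from the example above.
citation = format_citation(
    creator="Jon Doe",
    year=2023,
    title="NMR Spectra of all Structures published in PubChem",
    publisher="nmrXiv",  # the repository name fills the publisher slot
    resource_type="Dataset",
    identifier="https://doi.org/10.57992/nmrxiv.p1",
)
print(citation)
```

The point is only that the publisher slot is rendered as plain text in the citation, which is why a human-readable repository name fits there and a bare identifier does not.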

Repository identifier: Better than a name would be an identifier. Identifiers for repositories are provided by re3data.org; e.g. the identifier for RADAR4Chem is hxtp://doi.org/10.17616/R31NJNAY . Where should this be added to the metadata?

The publisher field is certainly the wrong field (see above, definition and examples). Someone gave me the hint that the repo identifier might be added as a contributor with the contributorType HostingInstitution. However, I would be hesitant to add a DOI there, while all other contributors have names. Alternatively, the repository identifier could be a relatedIdentifier → relatedIdentifierType: DOI → relationType: IsPublishedIn → hxtp://doi.org/10.17616/R31NJNAY
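As a rough illustration of the relatedIdentifier option, the following Python sketch builds the corresponding XML fragment with the standard library. It assumes the usual `relatedIdentifiers` wrapper element and uses the bare DOI of the RADAR4Chem re3data entry as the value; whether IsPublishedIn is the right relation for a repository is exactly the open question here, so please read this as a sketch and not as a recommendation.

```python
import xml.etree.ElementTree as ET

# Wrapper element plus one relatedIdentifier pointing at the re3data DOI
# of the repository (RADAR4Chem in the example above).
related_identifiers = ET.Element("relatedIdentifiers")
related = ET.SubElement(
    related_identifiers,
    "relatedIdentifier",
    relatedIdentifierType="DOI",
    relationType="IsPublishedIn",
)
related.text = "10.17616/R31NJNAY"

fragment = ET.tostring(related_identifiers, encoding="unicode")
print(fragment)
```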

Any opinions on that? Is it planned to add a best-practice example on repository identifiers to the next version of the DataCite schema?

The use case for such a repository identifier would be to search for all datasets published in a repository. This is, unfortunately, not possible based on the DOI prefix, as the prefix is unique to the registrant, not to the repository. Following the example mentioned above, the registrant for DOIs in RADAR4Chem is FIZ Karlsruhe, but they also register DOIs for other repositories, e.g. RADAR4Culture (RADAR4Culture | re3data.org) or RADAR (RADAR | re3data.org).

Consequently, “Find Related Works” on the commons.datacite.org page of RADAR (DataCite Commons) shows datasets in RADAR, RADAR4Chem and RADAR4Culture, as all datasets in these repositories have DOIs with the same prefix.
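The difference between the two lookups can be sketched in a few lines of Python. All records below are made up: the prefix `10.9999`, the DOI suffixes and the two `EXAMPLE-...` repository identifiers are placeholders, and the field name `repository_id` is hypothetical. Only the RADAR4Chem identifier 10.17616/R31NJNAY is taken from above.

```python
# Placeholder records: three repositories sharing one registrant prefix.
records = [
    {"doi": "10.9999/chem-1", "repository_id": "10.17616/R31NJNAY"},  # RADAR4Chem
    {"doi": "10.9999/culture-1", "repository_id": "10.17616/EXAMPLE-CULTURE"},
    {"doi": "10.9999/radar-1", "repository_id": "10.17616/EXAMPLE-RADAR"},
]


def doi_prefix(doi):
    """Return the part of a DOI before the first slash."""
    return doi.split("/", 1)[0]


def find_by_prefix(records, prefix):
    """Select DOIs by registrant prefix (cannot tell repositories apart)."""
    return [r["doi"] for r in records if doi_prefix(r["doi"]) == prefix]


def find_by_repository(records, repository_id):
    """Select DOIs by a repository identifier stored in the metadata."""
    return [r["doi"] for r in records if r["repository_id"] == repository_id]


print(find_by_prefix(records, "10.9999"))
print(find_by_repository(records, "10.17616/R31NJNAY"))
```

The prefix lookup returns the datasets of all three repositories, while the lookup by repository identifier returns only the RADAR4Chem dataset.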

Best,
Tillmann


Thanks for this question @tfischer. The Publisher property can be used for the repository name. In the upcoming version 4.5 of the Metadata Schema, we are adding a publisherIdentifier attribute to Publisher, which could be a ROR ID, re3data identifier, or other appropriate identifier.

You can preview the documentation for v4.5 here: DataCite Metadata Schema 4.5 documentation. We expect to release it in early 2024.
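For illustration only, a Publisher element carrying a re3data identifier might be built like this in Python. Only the attribute name publisherIdentifier comes from the description above; the companion attributes `publisherIdentifierScheme` and `schemeURI` are assumptions modelled on how the schema describes other identifiers, so please check them against the v4.5 documentation before relying on them.

```python
import xml.etree.ElementTree as ET

# Publisher with the repository name as text and the re3data DOI as
# identifier. publisherIdentifierScheme and schemeURI are assumed
# attribute names, not confirmed in this thread.
publisher = ET.Element(
    "publisher",
    publisherIdentifier="https://doi.org/10.17616/R31NJNAY",
    publisherIdentifierScheme="re3data",
    schemeURI="https://re3data.org/",
)
publisher.text = "RADAR4Chem"

print(ET.tostring(publisher, encoding="unicode"))
```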


Dear Kelly,

Thanks for the reply and the link to the preview. Great to hear that this is being added to the schema!

For research data repositories, the re3data identifier may be the identifier of choice, but not the ROR ID: the repositories are hosted by organisations that have a ROR ID, but they aren’t organisations themselves.

Best,
Tillmann

Definitely, re3data is also a good choice here!

Dear Kelly,

An additional option to think about could be to have 1-n = required and repeatable for occurrences rather than 0-1 = optional, but not repeatable. If this is the case, the research data repositories might provide the re3data identifier to clearly identify a repository but could also provide the ROR of the organisation which hosts the repository.

Best,
Tillmann


Dear Kelly,

I need to make a correction to my previous post: An additional option to think about could be to have 0-n = optional and repeatable instead of 0-1 = optional, but not repeatable for the occurrences.

If more than one publisherIdentifier were allowed, repositories could provide a re3data identifier for the repository and the ROR identifier for the hosting institution.
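To show what I mean, here is a small data-model sketch in Python. It is purely hypothetical and does not reflect the schema: the class names `Publisher` and `PublisherIdentifier` are invented for this post, and the ROR value is a placeholder, not a real ROR ID.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PublisherIdentifier:
    value: str
    scheme: str


@dataclass
class Publisher:
    name: str
    # 0-n: optional and repeatable, as proposed above (hypothetical).
    identifiers: List[PublisherIdentifier] = field(default_factory=list)


publisher = Publisher(
    name="RADAR4Chem",
    identifiers=[
        # re3data identifier of the repository itself
        PublisherIdentifier("10.17616/R31NJNAY", "re3data"),
        # ROR ID of the hosting institution (placeholder value)
        PublisherIdentifier("https://ror.org/EXAMPLE", "ROR"),
    ],
)

schemes = [identifier.scheme for identifier in publisher.identifiers]
print(schemes)
```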

Best,
Tillmann
