At DataCite, we often receive questions about the different options for the relationType attribute of the relatedIdentifier property. relationTypes also come up frequently on the PID Forum (search for “relationType” or “relation type”).
To help clarify when to use different relationTypes, schema 5.0 could include revised definitions, more examples, user guides, and/or changes to the list of relationTypes.
So, to get this work started, I have a few questions for DataCite metadata schema users:
What types of relationships are you trying to capture between DOIs and other research outputs?
For example: a paper citing a dataset; an article that is a translation of another article; simulation data that is generated by software…
What relationTypes are you using for those relationships, if you’re including them in DataCite metadata?
Has your institution developed interpretations for any of the DataCite relationTypes?
Great that you are planning to improve this in schema 5.0. For our use case, the description of academic events, we can already use quite a lot of relationTypes. We use the following relationTypes to describe the relations of academic events to other resources:
is part of: for relating to the event series (resourceType: Collection)
is documented by: for relating conference outputs like recordings of talks or sessions, presentation slides or posters
is supplemented by: for relating the proceedings
Reading the available definitions, I feel that we are somewhat stretching the current interpretation of the isDocumentedBy and isSupplementedBy relationTypes.
A use case that we have thought of but not yet experimented with is referencing an event in, e.g., a paper, for example when the paper refers to a discussion at a conference or a topic a conference covered. I think for this case we would want to use IsReferencedBy. However, in this case I wouldn’t consider the conference “as a source of information”, as the current definition says, but rather simply as being referenced.
Besides many other topics, NFDI4Chem also works on publication standards, focusing on academic publishing in chemistry. We look at the author guidelines of chemistry-focused scientific journals through the lens of FAIR research data publishing, but also of other open science practices, and ask how these guidelines might improve the uptake of research data sharing.
One of the first questions put to us was how to link articles in scientific journals to dataset(s) in research data repositories and vice versa. This question also came to us from the chemistry community, as the choice of the correct relationType was not obvious to these researchers.
The answer might actually be given in the documentation of DataCite’s metadata schema AND also in the schema provided by CrossRef, which is of great importance for academic publishing in scientific journals, as CrossRef is the main DOI registration agency for papers published in these journals. However, the choice is not totally clear.
Example:
In the metadata of a scientific publication with a DOI registered with CrossRef and its metadata schema, the link to a dataset in a research data repository via its DOI might be achieved via the related_item with inter_work_relation, identifier-type=“doi” and relationship-type=“references”.
In the metadata of a dataset with a DOI registered with DataCite and its metadata schema, the link to a scientific publication via its DOI might be achieved via the relatedIdentifier with relatedIdentifierType=“DOI” and relationType=“IsReferencedBy”.
The first is relevant for journals and their submission systems.
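To make the two directions of this link concrete, here is a minimal sketch that builds both fragments with Python's standard library. The DOIs are invented placeholders, the function names are hypothetical, and XML namespaces as well as all other required elements of both schemas are left out, so this is an illustration of the pairing rather than a valid deposit.

```python
import xml.etree.ElementTree as ET

# Placeholder DOIs, invented for illustration only.
PAPER_DOI = "10.1234/example-paper"
DATASET_DOI = "10.1234/example-dataset"


def crossref_paper_to_dataset(dataset_doi):
    """Crossref side: the paper 'references' the dataset (namespaces omitted)."""
    item = ET.Element("related_item")
    relation = ET.SubElement(
        item,
        "inter_work_relation",
        {"relationship-type": "references", "identifier-type": "doi"},
    )
    relation.text = dataset_doi
    return item


def datacite_dataset_to_paper(paper_doi):
    """DataCite side: the dataset 'IsReferencedBy' the paper."""
    related = ET.Element(
        "relatedIdentifier",
        {"relatedIdentifierType": "DOI", "relationType": "IsReferencedBy"},
    )
    related.text = paper_doi
    return related


# Print both fragments so the two directions can be compared side by side.
print(ET.tostring(crossref_paper_to_dataset(DATASET_DOI), encoding="unicode"))
print(ET.tostring(datacite_dataset_to_paper(PAPER_DOI), encoding="unicode"))
```

The point of the sketch is only that the same link is stated twice, once from each side, with relation types that are inverses of each other.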
Research data repositories with metadata close to the DataCite schema usually offer the whole controlled list of relation types to users. This list is quite long, and besides the approach mentioned above, researchers might come up with other relation types such as IsCitedBy (no corresponding relation type in the CrossRef schema). Another option would be IsSupplementTo, when looking at a dataset that holds all the information previously provided as pictures in supplementary information PDFs on publishers’ web pages.
The latter two relation types are described in DataCite’s metadata schema documentation as “recommended for discovery”, which is difficult to understand from the perspective of researchers who want to publish the dataset corresponding to a scientific publication and might think of a space shuttle retired in 2011.
I think there are three major levers:
Research data repository operators should not present the full controlled list of relation types as is, but might sort them into “favorites” and “others”, presenting the most relevant ones for the usual use case first.
Metadata schema documentation needs to include more examples and usage notes, also covering the aforementioned classical “pair” of a scientific publication in a journal with its corresponding dataset(s).
Publishers need to agree on a relation type to be used to reference a corresponding dataset. In other words: there needs to be a standard.
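As a rough illustration of the first point, a repository's deposit form could split the controlled list into two groups before showing it. The sketch below assumes a hypothetical choice of favorites and uses only a subset of the DataCite relationTypes; which types count as "most relevant" is a decision for each repository.

```python
# Hypothetical favorites for the "paper plus dataset" use case; an assumption,
# not a DataCite recommendation.
FAVORITES = ["IsSupplementTo", "IsReferencedBy", "IsCitedBy"]

# A subset of the DataCite relationType list, for illustration only.
SOME_RELATION_TYPES = [
    "Cites", "HasPart", "HasVersion", "IsCitedBy", "IsDocumentedBy",
    "IsIdenticalTo", "IsPartOf", "IsReferencedBy", "IsSupplementTo",
    "IsSupplementedBy", "IsVersionOf", "References",
]


def order_for_picker(all_types, favorites=FAVORITES):
    """Split relation types into 'favorites' (in the given order) and 'others'."""
    favs = [t for t in favorites if t in all_types]
    others = sorted(t for t in all_types if t not in favorites)
    return {"favorites": favs, "others": others}


groups = order_for_picker(SOME_RELATION_TYPES)
print("Favorites:", ", ".join(groups["favorites"]))
print("Others:", ", ".join(groups["others"]))
```

Nothing is hidden from the user here; the full list stays available under "others", only the order of presentation changes.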
The types of relationships we are trying to capture between DOIs and other research outputs are:
The relation between a paper and the data underpinning the conclusions presented in the paper. The data is published with a DOI and is cited/referenced in the paper.
Likewise, the relation between the paper and the corresponding data, captured in the metadata of the data item.
For datasets that are published in our institutional research data repository this relation is represented by the relationType “IsSupplementTo”. The relationType “IsIdenticalTo” is used to indicate a version of the same item. Snippet here:
We wish to start capturing the relations between instruments that we register with a DOI and published datasets, indexing the relation in the metadata of the dataset and vice versa. So here are the use cases:
a. Reference the instrument DOI in the data publication.
b. Reference the dataset DOI in the instrument metadata.
c. Reference High Performance Computing resources used for data processing in the metadata of a data item.
We have not developed our own interpretations but the examples provided in your post are very useful and inspirational. Thanks for linking.
Thank you @JulianF, @tfischer, and @Sigga for these thoughtful responses! This is great input.
Since this thread was started a few months ago, I wanted to mention that we will continue to welcome more responses over the coming months! Any revisions to the relationType vocabulary and definitions will be undertaken carefully, and it’s very important that we understand how the relationTypes are being applied.
Here in the UK there’s a growing expectation from funders to assign a unique PID to Authors’ Accepted Manuscripts deposited in an OA repository (i.e. different from the PID for the Version of Record). Whether or not to use DOIs for this is an ongoing debate, but where it is done we are recommending people make liberal use of relatedIdentifier to minimise any possible confusion over the duplication. For example:
IsVersionOf or IsOriginalFormOf to link from the AAM to the VoR
IsIdenticalTo to link between copies of the same AAM held in different institutional repositories
It would be interesting to hear any thoughts folks have on further refining the relationType controlled list to better characterise these relationships, as that would definitely improve the suitability of DOIs for this use case.
Something I would like to add here: within NFDI4Chem we also discussed the relation type “IsSupplementTo” and compared it to “IsReferencedBy”.
Using the first is fairly intuitive, as researchers are used to publishing supplemental PDFs, a practice that might be overcome once the actual data is published with metadata and provenance information. I would like to point out that, from our perspective, each data publication is a great piece of work and represents an independent work in its own right. Hence, we think that we need to change our perspective from “data is something attached to another scientific outcome” to “the data itself is worth publishing”.
I have just been talking to our relatively new Director of Scholarly Communications, who is interested in helping people comply with our Open Access policy by depositing works into our repository.
We are discussing 1) whether DOIs should be assigned to pre-prints or Authors’ Accepted Manuscripts and 2) if so, how to describe the relationship of these items to the version of record with DC metadata. We looked at IsVersionOf and IsOriginalFormOf and were concerned that the descriptions only mentioned software. To what extent can we employ our own interpretations of these terms?
Also of interest is whether there are any groups/projects/harvesters/etc. that use specific DC metadata elements or attributes to identify OA versions of published articles. If so, we may want to comply with those guidelines to make sure our content is getting picked up by them. I’m thinking in particular of Unpaywall.
What have you all decided to use, @jezcope? And does anyone else have thoughts on this? Thanks!
Hi @amyhodge, I can confirm that those relationTypes are not limited to software! Some of the examples are for software but these are not restrictive. I think IsVersionOf/HasVersion would be the pair to use here (IsOriginalFormOf would pair with IsVariantFormOf).
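Since relationTypes come in pairs like these, a repository that records a link on one side can look up the type to use on the other side. A minimal sketch, covering only pairs mentioned in this thread and treating IsIdenticalTo as its own inverse (an assumption made here because it has no separate counterpart in the list); the names `INVERSE_PAIRS` and `inverse` are hypothetical.

```python
# Pairs of DataCite relationTypes mentioned in this thread; not the full list.
INVERSE_PAIRS = [
    ("IsVersionOf", "HasVersion"),
    ("IsOriginalFormOf", "IsVariantFormOf"),
    ("IsSupplementTo", "IsSupplementedBy"),
    ("IsReferencedBy", "References"),
    ("IsDocumentedBy", "Documents"),
    ("IsPartOf", "HasPart"),
    ("IsCitedBy", "Cites"),
]

# Assumption: IsIdenticalTo is used in both directions.
SYMMETRIC = {"IsIdenticalTo"}

# Build a lookup that works in both directions.
_LOOKUP = {}
for left, right in INVERSE_PAIRS:
    _LOOKUP[left] = right
    _LOOKUP[right] = left


def inverse(relation_type):
    """Return the relationType to record on the other side of a link."""
    if relation_type in SYMMETRIC:
        return relation_type
    return _LOOKUP[relation_type]  # raises KeyError for types not listed above


# An AAM that IsVersionOf the version of record: the VoR side uses HasVersion.
print(inverse("IsVersionOf"))
```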
@KellyStathis It looks like with the relatedIdentifier/relationType there is no way to explicitly call out whether an article is a submitted manuscript version or an authors’ accepted manuscript. Is that true? Thanks.
1: When you’re registering a DOI for the preprint you can use resourceTypeGeneral (required) and resourceType (optional) to get more specific.
For example: <resourceType resourceTypeGeneral="Preprint">Submitted manuscript version</resourceType>
or simply <resourceType resourceTypeGeneral="Preprint"/>
(I’d consider “Preprint” and “Submitted manuscript” to be the same thing, but I know there’s some variation with this terminology; the idea with resourceType is that you can specify further if needed!)
2: When you’re linking to another version from this preprint, you would use RelatedIdentifier. RelatedIdentifier has an optional resourceTypeGeneral attribute. I would say “JournalArticle” here is the best fit for the version of record, e.g.:
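A minimal sketch of how the two pieces could sit together in a preprint's record, built with Python's standard library. The DOI is an invented placeholder, the function name is hypothetical, and the namespace and all other required DataCite properties (identifier, creators, titles, and so on) are omitted, so this is a fragment for illustration rather than a complete record.

```python
import xml.etree.ElementTree as ET

# Placeholder DOI for the version of record, invented for illustration only.
VOR_DOI = "10.1234/example-version-of-record"


def preprint_fragment(vor_doi, resource_type_text="Submitted manuscript version"):
    """Build the resourceType and relatedIdentifier parts of a preprint record."""
    resource = ET.Element("resource")

    # 1: describe the preprint itself.
    resource_type = ET.SubElement(
        resource, "resourceType", {"resourceTypeGeneral": "Preprint"}
    )
    resource_type.text = resource_type_text

    # 2: link to the version of record and say what kind of resource it is.
    related = ET.SubElement(resource, "relatedIdentifiers")
    link = ET.SubElement(
        related,
        "relatedIdentifier",
        {
            "relatedIdentifierType": "DOI",
            "relationType": "IsVersionOf",
            "resourceTypeGeneral": "JournalArticle",
        },
    )
    link.text = vor_doi
    return resource


print(ET.tostring(preprint_fragment(VOR_DOI), encoding="unicode"))
```

The resourceTypeGeneral on the resourceType element describes the item being registered, while the one on relatedIdentifier describes the item being linked to.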
We are looking at people most likely depositing one of three versions of a journal article to our repository to meet our OA policy, depending on what the publisher allows:
submitted version/authors original manuscript (AOM)/original manuscript/preprint
published version/final published version/version of record
Your example above for relatedIdentifier could be used exactly as is for the post-print, and with a change of “IsVersionOf” to “IsIdenticalTo” would work for the published version as well. So, that seems to work just fine for all cases.
For resourceTypeGeneral, it’s great to have preprint, but post-print would be super helpful to have. Then we could designate these three versions as:
Without it, it seems like we would have to decide whether to refer to post-prints as “Preprint,” which isn’t technically correct, or as “JournalArticle,” which to me only feels appropriate for the published version (and ideally only for the publisher’s copy, not our OA one, but that’s an even finer point that may not be worth talking about).
Without such a term, I might be inclined to refer to all of these as “JournalArticle” and then designate them as Submitted version/Accepted version/published version (or whichever equivalent terms our metadata folks settled on) for the resourceType in the free text area as one way to be consistent and clearer:
I definitely prefer having the controlled vocabulary terms of resourceTypeGeneral for this, but we’ll have to do what we can with what’s available when we get to this project!
Thanks for outlining this so clearly! I’ve made a note that this is a gap in the current resourceTypeGeneral vocabulary.
For interim solutions, I would probably lean towards using “Preprint” for the preprint still, since we do have that resourceTypeGeneral. I also agree the published version should be “JournalArticle”. That leaves the post-print: you could pick “Preprint” or “JournalArticle”, or another option is to use the more general “Text”, again with a free text resourceType to add specificity. So you’d have:
Preprint for the preprint
Text (with resourceType “Post-print”, “accepted version”, or similar) for the post-print