How are you using DataCite relationTypes?

At DataCite, we often receive questions about the different options for the relationType attribute of the relatedIdentifier property. relationTypes also come up frequently on the PID Forum (search for “relationType” or “relation type”).

To help clarify when to use different relationTypes, schema 5.0 could include revised definitions, more examples, user guides, and/or changes to the list of relationTypes.

So, to get this work started, I have a few questions for DataCite metadata schema users:

  • What types of relationships are you trying to capture between DOIs and other research outputs?

    • For example: a paper citing a dataset; an article that is a translation of another article; simulation data that is generated by software…
  • What relationTypes are you using for those relationships, if you’re including them in DataCite metadata?

  • Has your institution developed interpretations for any of the DataCite relationTypes?

Here’s schema 4.4 for reference: https://schema.datacite.org/meta/kernel-4.4/doc/DataCite-MetadataKernel_v4.4.pdf - the relationTypes and their definitions are on p. 58.

All input is welcome - we’ll be delving into specific use cases in the coming year as we start schema 5.0 work.


Great that you are planning to improve this in schema 5.0. For our use case, describing academic events, we can already use quite a lot of relationTypes. We use the following relationTypes to describe the relations of academic events to other resources:

  • is part of: for relating to the event series (resourceType: Collection)
  • is documented by: for relating conference outputs like recordings of talks or sessions, presentation slides or posters
  • is supplemented by: for relating the proceedings

Reading the available definitions, I feel that we are somewhat stretching the current interpretation of the isDocumentedBy and isSupplementedBy relationTypes.

A use case that we have thought of, but not yet experimented with, is referencing an event in, e.g., a paper: for example, when the paper refers to a discussion at a conference or to a topic the conference covered. I think for this case we would want to use IsReferencedBy. However, I wouldn’t consider the conference “a source of information”, as the current definition puts it, but simply as something being referenced.


Among many other topics, NFDI4Chem also works on publication standards for academic publishing in chemistry. We examine the author guidelines of chemistry-focused scientific journals through the lens of FAIR research data publishing and other open science practices, and ask how these guidelines might improve the uptake of research data sharing.

One of the first questions put to us was how to link articles in scientific journals to dataset(s) in research data repositories and vice versa. This question was also brought to us by the chemistry community, as the correct relationType was not obvious to these researchers.

The answer may actually be found in the documentation of DataCite’s metadata schema AND in the schema provided by CrossRef, which is of central importance for academic publishing, as CrossRef is the main DOI registration agency for papers published in scientific journals. However, the choice is not entirely clear.

Example:

  • In the metadata of a scientific publication with a DOI registered with CrossRef using its metadata schema, the link to a dataset in a research data repository via its DOI might be achieved via the related_item with inter_work_relation, identifier-type=“doi” and relationship-type=“references”.
  • In the metadata of a dataset with a DOI registered with DataCite using its metadata schema, the link to a scientific publication via its DOI might be achieved via the relatedIdentifier with relatedIdentifierType=“DOI” and relationType=“IsReferencedBy”.

The first is relevant for journals and their submission systems.
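As a sketch of what the two sides of that pairing look like in XML, the Python snippet below builds both fragments with the standard library’s xml.etree.ElementTree. The DOIs are hypothetical placeholders, the Crossref fragment assumes the relations namespace http://www.crossref.org/relations.xsd used in Crossref deposits, and the DataCite fragment leaves out the kernel-4 namespace and the rest of the record for brevity.

```python
import xml.etree.ElementTree as ET

# Namespace of Crossref's relations schema (assumed; check your deposit schema version).
REL_NS = "http://www.crossref.org/relations.xsd"
ET.register_namespace("rel", REL_NS)


def crossref_relation(dataset_doi: str, relationship_type: str = "references") -> ET.Element:
    """Fragment for the article's Crossref deposit, pointing at the dataset."""
    program = ET.Element(f"{{{REL_NS}}}program")
    item = ET.SubElement(program, f"{{{REL_NS}}}related_item")
    relation = ET.SubElement(
        item,
        f"{{{REL_NS}}}inter_work_relation",
        {"relationship-type": relationship_type, "identifier-type": "doi"},
    )
    relation.text = dataset_doi
    return program


def datacite_relation(article_doi: str, relation_type: str = "IsReferencedBy") -> ET.Element:
    """relatedIdentifiers fragment for the dataset's DataCite record, pointing at the article."""
    related = ET.Element("relatedIdentifiers")
    relation = ET.SubElement(
        related,
        "relatedIdentifier",
        {"relatedIdentifierType": "DOI", "relationType": relation_type},
    )
    relation.text = article_doi
    return related


# Hypothetical DOIs for illustration only.
ARTICLE_DOI = "10.1234/example-article"
DATASET_DOI = "10.5678/example-dataset"

print(ET.tostring(crossref_relation(DATASET_DOI), encoding="unicode"))
print(ET.tostring(datacite_relation(ARTICLE_DOI), encoding="unicode"))
```

Passing isSupplementedBy to crossref_relation and IsSupplementTo to datacite_relation gives the supplement-based pairing instead. Note that Crossref writes its relation values in lower camel case, while DataCite capitalizes the first letter.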

Research data repositories with metadata close to the DataCite schema usually offer users the whole controlled list of relationTypes. This list is quite long, and besides the approach described above, researchers might come up with other relation types such as IsCitedBy (which has no corresponding relation type in the CrossRef schema). Another option would be IsSupplementTo, for a dataset holding all the information previously provided as pictures in supplementary-information PDFs on publishers’ web pages.

DataCite’s metadata schema documentation describes the latter two relation types as “recommended for discovery”, which is difficult to understand from the perspective of researchers who want to publish the dataset belonging to a scientific publication, and who might think of a space shuttle retired in 2011.

I think there are three major levers:

  • Research data repository operators should not offer the full controlled list of relation types as-is, but might sort them into “favorites” and “others”, presenting the most relevant ones for the usual use case first.
  • Metadata schema documentation needs to include more examples and usage notes, also covering the classic “pair” mentioned above: a scientific publication in a journal and its corresponding dataset(s).
  • Publishers need to agree on a relation type to be used to reference a corresponding dataset. In other words: there needs to be a standard.
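The first point could look something like this minimal Python sketch of a two-section picker. The list of values is our transcription of the relationTypes in schema 4.4 (worth checking against the PDF), and the FAVORITES list and split_for_picker function are hypothetical choices for a repository whose main use case is the article/dataset pair, not a DataCite recommendation.

```python
# relationType values as we read them in DataCite Metadata Schema 4.4 (check against the PDF).
ALL_RELATION_TYPES = [
    "IsCitedBy", "Cites", "IsSupplementTo", "IsSupplementedBy",
    "IsContinuedBy", "Continues", "IsDescribedBy", "Describes",
    "HasMetadata", "IsMetadataFor", "HasVersion", "IsVersionOf",
    "IsNewVersionOf", "IsPreviousVersionOf", "IsPartOf", "HasPart",
    "IsPublishedIn", "IsReferencedBy", "References", "IsDocumentedBy",
    "Documents", "IsCompiledBy", "Compiles", "IsVariantFormOf",
    "IsOriginalFormOf", "IsIdenticalTo", "IsReviewedBy", "Reviews",
    "IsDerivedFrom", "IsSourceOf", "IsRequiredBy", "Requires",
    "IsObsoletedBy", "Obsoletes",
]

# Hypothetical favorites for linking a dataset to its article; not a DataCite recommendation.
FAVORITES = ["IsSupplementTo", "IsReferencedBy", "IsCitedBy"]


def split_for_picker(all_types, favorites):
    """Return (favorites, others) for a two-section drop-down.

    Favorites keep the curator-chosen order; everything else is alphabetical.
    """
    favs = [t for t in favorites if t in all_types]
    others = sorted(t for t in all_types if t not in favs)
    return favs, others


favs, others = split_for_picker(ALL_RELATION_TYPES, FAVORITES)
print("Favorites:", ", ".join(favs))
print("Others:", ", ".join(others))
```

Keeping the full list available under “others” means nothing is hidden from expert users; only the default presentation changes.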

Best


Hi Kelly

Thank you for the invitation to share inputs.

The types of relationships we are trying to capture between DOIs and other research outputs are:

  • The relation between a paper and the data underpinning the conclusions presented in the paper. The data is published with a DOI and is cited/referenced in the paper.
    Likewise, the relation between the paper and the corresponding data, captured in the metadata of the data item.

For datasets that are published in our institutional research data repository this relation is represented by the relationType “IsSupplementTo”. The relationType “IsIdenticalTo” is used to indicate a version of the same item. Snippet here:

{
  "relationType": "IsSupplementTo",
  "relatedIdentifier": "10.1016/j.scitotenv.2022.158936",
  "relatedIdentifierType": "DOI"
},
{
  "relationType": "IsIdenticalTo",
  "relatedIdentifier": "10.11583/dtu.19455554",
  "relatedIdentifierType": "DOI"
}

  • We wish to start capturing relations between instruments that we register with a DOI and published datasets, indexing the relation in the metadata of the published dataset and vice versa. Here are the use cases:

a. Reference the instrument DOI in the data publication.
b. Reference the dataset DOI in the instrument metadata.
c. Reference High Performance Computing resources used for data processing in the metadata of a data item.
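For repositories that maintain these records through the DataCite REST API, the two relatedIdentifiers from the snippet above would sit inside a JSON:API payload roughly as in this Python sketch. The envelope (data → type "dois" → attributes) is our assumption about the REST API’s format, the helper name related_identifier is hypothetical, and nothing is sent here. The instrument use cases (a) to (c) would be further entries once a relationType is agreed for them.

```python
import json


def related_identifier(doi: str, relation_type: str) -> dict:
    """One entry of the relatedIdentifiers list, keyed as in the snippet above (hypothetical helper)."""
    return {
        "relationType": relation_type,
        "relatedIdentifier": doi,
        "relatedIdentifierType": "DOI",
    }


# Assumed JSON:API envelope for the DataCite REST API /dois endpoint; no request is made.
payload = {
    "data": {
        "type": "dois",
        "attributes": {
            "relatedIdentifiers": [
                related_identifier("10.1016/j.scitotenv.2022.158936", "IsSupplementTo"),
                related_identifier("10.11583/dtu.19455554", "IsIdenticalTo"),
            ]
        },
    }
}

print(json.dumps(payload, indent=2))
```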

We have not developed our own interpretations but the examples provided in your post are very useful and inspirational. Thanks for linking.

Best wishes, Signe


Thank you @JulianF, @tfischer, and @Sigga for these thoughtful responses! This is great input.

Since this thread was started a few months ago, I wanted to mention that we will continue to welcome more responses over the coming months! Any revisions to the relationType vocabulary and definitions will be undertaken carefully, and it’s very important that we understand how the relationTypes are being applied.


Here in the UK there’s a growing expectation from funders that Authors’ Accepted Manuscripts (AAMs) deposited in an OA repository be assigned their own unique PID (i.e. one different from the PID for the Version of Record, VoR). Whether or not to use DOIs for this is an ongoing debate, but where it is done we are recommending that people make liberal use of relatedIdentifier to minimise any possible confusion over the duplication. For example:

  • IsVersionOf or IsOriginalFormOf to link from the AAM to the VoR
  • IsIdenticalTo to link between copies of the same AAM held in different institutional repositories

It would be interesting to hear any thoughts folks have on further refining the relationType controlled list to better characterise these relationships, as that would definitely improve DOIs’ suitability for this use case.

One more point I would like to add: within NFDI4Chem we also discussed the relationType “IsSupplementTo” and compared it to “IsReferencedBy”.

Using the former is fairly intuitive, as researchers are used to publishing supplementary PDFs, a practice that might be superseded once the actual data, with metadata and provenance information, is published. I would like to point out that, from our perspective, each data publication is a great piece of work and an independent work in its own right. Hence, we think we need to change our perspective from “data is something attached to another scientific outcome” to “the data itself is worth publishing”.

I have just been talking to our relatively new Director of Scholarly Communications, who is interested in helping people comply with our Open Access policy by depositing works into our repository.

We are discussing 1) whether DOIs should be assigned to pre-prints or Authors’ Accepted Manuscripts and 2) if so, how to describe the relationship of these items to the version of record with DC metadata. We looked at IsVersionOf and IsOriginalFormOf and were concerned that the descriptions only mentioned software. To what extent can we employ our own interpretations of these terms?

Also of interest is whether there are any groups/projects/harvesters/etc. that use specific DC metadata elements or attributes to identify OA versions of published articles. If so, we may want to comply with those guidelines to make sure our content is getting picked up by them. I’m thinking in particular of Unpaywall.

What have you all decided to use, @jezcope ? And does anyone else have thoughts on this? Thanks!

Hi @amyhodge, I can confirm that those relationTypes are not limited to software! Some of the examples are for software but these are not restrictive. I think IsVersionOf/HasVersion would be the pair to use here (IsOriginalFormOf would pair with IsVariantFormOf).
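Because most relationTypes come in inverse pairs, a repository linking two records it controls can derive the back-link automatically. Below is a minimal Python sketch; the pair list is our reading of schema 4.4 (where IsPublishedIn has no inverse) and should be checked against the schema documentation, and the inverse function is a hypothetical helper.

```python
# Inverse pairs among the schema 4.4 relationType values, as we read the documentation.
PAIRS = [
    ("IsCitedBy", "Cites"), ("IsSupplementTo", "IsSupplementedBy"),
    ("IsContinuedBy", "Continues"), ("IsDescribedBy", "Describes"),
    ("HasMetadata", "IsMetadataFor"), ("HasVersion", "IsVersionOf"),
    ("IsNewVersionOf", "IsPreviousVersionOf"), ("IsPartOf", "HasPart"),
    ("IsReferencedBy", "References"), ("IsDocumentedBy", "Documents"),
    ("IsCompiledBy", "Compiles"), ("IsVariantFormOf", "IsOriginalFormOf"),
    ("IsIdenticalTo", "IsIdenticalTo"), ("IsReviewedBy", "Reviews"),
    ("IsDerivedFrom", "IsSourceOf"), ("IsRequiredBy", "Requires"),
    ("IsObsoletedBy", "Obsoletes"),
]
# Values with no inverse counterpart.
UNPAIRED = {"IsPublishedIn"}

# Build a lookup that works in both directions.
INVERSE = {}
for a, b in PAIRS:
    INVERSE[a] = b
    INVERSE[b] = a


def inverse(relation_type):
    """Relation the other record should carry, or None if there is no inverse."""
    if relation_type in UNPAIRED:
        return None
    if relation_type not in INVERSE:
        raise ValueError(f"unknown relationType: {relation_type}")
    return INVERSE[relation_type]


# A post-print points at its version of record with IsVersionOf;
# the version of record would point back with:
print(inverse("IsVersionOf"))  # HasVersion
```

This assumes both records are ones you can update; a version of record registered elsewhere may not carry the back-link at all.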

Thanks, @KellyStathis. Very helpful.

@KellyStathis It looks like with the relatedIdentifier/relationType there is no way to explicitly call out whether an article is a submitted manuscript version or an authors’ accepted manuscript. Is that true? Thanks.

I think there are two pieces to this:

1: When you’re registering a DOI for the preprint you can use resourceTypeGeneral (required) and resourceType (optional) to get more specific.

For example: <resourceType resourceTypeGeneral="Preprint">Submitted manuscript version</resourceType>

or simply <resourceType resourceTypeGeneral="Preprint"/>

(I’d consider “Preprint” and “Submitted manuscript” to be the same thing, but I know there’s some variation with this terminology; the idea with resourceType is that you can specify further if needed!)

2: When you’re linking to another version from this preprint, you would use RelatedIdentifier. RelatedIdentifier has an optional resourceTypeGeneral attribute. I would say “JournalArticle” here is the best fit for the version of record, e.g.:

<relatedIdentifier relatedIdentifierType="DOI" relationType="IsVersionOf" resourceTypeGeneral="JournalArticle">10.21384/bar</relatedIdentifier>

I’m not sure about linking to an AAM, though; we don’t have a resourceTypeGeneral for “Postprint”. Maybe we should consider it?

Thanks, Kelly.

We are looking at people most likely depositing one of three versions of a journal article to our repository to meet our OA policy, depending on what the publisher allows:

  • submitted version/author’s original manuscript (AOM)/original manuscript/preprint
  • accepted version/author’s accepted manuscript (AAM)/author’s accepted version/final author version/post-print
  • published version/final published version/version of record

Your example above for relatedIdentifier could be used exactly as is for the post-print, and with a change of “IsVersionOf” to “IsIdenticalTo” would work for the published version as well. So, that seems to work just fine for all cases.

For resourceTypeGeneral, it’s great to have preprint, but post-print would be super helpful to have. Then we could designate these three versions as:

<resourceType resourceTypeGeneral="Preprint"/>
<resourceType resourceTypeGeneral="Post-print"/>
<resourceType resourceTypeGeneral="JournalArticle"/>

Without it, it seems we would have to decide whether to refer to post-prints as “Preprint”, which isn’t technically correct, or as “JournalArticle”, which to me only feels appropriate for the published version (and ideally only for the publisher’s copy, not our OA one, but that’s an even finer point that may not be worth discussing).

Without such a term, I might be inclined to refer to all of these as “JournalArticle” and then designate them as Submitted version/Accepted version/published version (or whichever equivalent terms our metadata folks settled on) for the resourceType in the free text area as one way to be consistent and clearer:

<resourceType resourceTypeGeneral="JournalArticle">Submitted version</resourceType>
<resourceType resourceTypeGeneral="JournalArticle">Accepted version</resourceType>
<resourceType resourceTypeGeneral="JournalArticle">Published version</resourceType>

I definitely prefer having the controlled vocabulary terms of resourceTypeGeneral for this, but we’ll have to do what we can with what’s available when we get to this project!

Thanks.

Thanks for outlining this so clearly! I’ve made a note that this is a gap in the current resourceTypeGeneral vocabulary.

For interim solutions, I would probably lean towards using “Preprint” for the preprint still, since we do have that resourceTypeGeneral. I also agree the published version should be “JournalArticle”. That leaves the post-print: you could pick “Preprint” or “JournalArticle”, or another option is to use the more general “Text”, again with a free text resourceType to add specificity. So you’d have:

  • Preprint for the preprint
  • Text (with resourceType “Post-print”, “accepted version”, or similar) for the post-print
  • JournalArticle for the published version
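The interim mapping above could be applied consistently with something like this Python sketch. The free-text label “Accepted version” is a placeholder for your own term, the version keys and the resource_type_element function are hypothetical, and only the resourceType element is shown, without the kernel-4 namespace.

```python
import xml.etree.ElementTree as ET

# Interim mapping from this thread: version -> (resourceTypeGeneral, free-text resourceType).
# None means the element carries only the resourceTypeGeneral attribute.
VERSION_TYPES = {
    "submitted": ("Preprint", None),
    "accepted": ("Text", "Accepted version"),
    "published": ("JournalArticle", None),
}


def resource_type_element(version: str) -> ET.Element:
    """Build a <resourceType> element for one deposited version."""
    general, label = VERSION_TYPES[version]
    element = ET.Element("resourceType", {"resourceTypeGeneral": general})
    if label is not None:
        element.text = label
    return element


for version in VERSION_TYPES:
    print(version, ET.tostring(resource_type_element(version), encoding="unicode"))
```

If a post-print resourceTypeGeneral is added in a later schema version, only the “accepted” entry in the mapping would need to change.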

Also a reasonable solution. Thanks!

Dear Kelly,

For reference: CrossRef’s documentation says that the relationship_type IsSupplementedBy should be used for datasets (see the CrossRef “Relationships” documentation, Table 2, bottom row). Conversely, the DOI metadata of the dataset should link to the corresponding article using the relationship type IsSupplementTo.

Although we still believe that datasets aren’t merely attached to another scientific result but are worthy of being independent publications in their own right, we now also support both of these relation types in preference to References and IsReferencedBy.

Best,
Tillmann


Hi, all; thanks for the discussion and examples on this topic! Here at ICPSR, we are looking to standardize how we, and our depositors, characterize relationships between data collections and other resources. Right now, we are focusing on the following terms (and local definitions):

  • “IsSupplementTo”/“IsSupplementedBy”: emphasizes a supplementary or complementary relationship; the supplement is meant to enhance or support the main work.
  • “IsReferencedBy”/“References” and “IsCitedBy”/“Cites”: both involve referencing but with different levels of formality and specificity. “IsCitedBy” is a more formal acknowledgment of the cited work in a structured bibliography or reference section.

We look forward to learning more about, and contributing to, future work on relationTypes in DataCite. Thanks!

Mike Shallcross
Inter-university Consortium for Political and Social Research (ICPSR)