What relationType to use?

I am wondering about a very common use case: A dataset is used for a study that is published in a paper, which also contains the methods describing how the dataset was created / observed / measured. The paper refers to the PID of the dataset and vice versa. All of the relationTypes (DataCite Schema 4.3) below seem to apply. Should I really attach all of them to the dataset?

IsCitedBy
Cites
IsSupplementTo
IsSupplementedBy
IsDescribedBy
IsReferencedBy
References
IsDocumentedBy

Hi @hvw

This is a good question. We recently shared a presentation describing our recommendation. You can find more information here: https://datacite.org/assets/Scholix_OpenHours.pptx

Essentially, the relationType you use will depend on whether you are describing a citation, a reference or a general relation between the two objects. You will also need to consider the direction of the relation. In the case you describe, if you are adding the relationType to the DOI metadata of the dataset (the subject), then IsCitedBy, IsSupplementTo and IsReferencedBy are all appropriate, with the “paper” as the object.
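As a rough illustration of the case described above, a dataset's metadata could carry one relatedIdentifiers entry pointing at the paper. This is only a sketch: the property names follow the DataCite Metadata Schema, but the helper `related_identifier`, the constant `DATASET_TO_PAPER_TYPES` and the DOI `10.1234/example-paper` are hypothetical and chosen for illustration.

```python
# Relation types named in this reply as appropriate when the dataset is the
# subject and the paper is the object.
DATASET_TO_PAPER_TYPES = {"IsCitedBy", "IsSupplementTo", "IsReferencedBy"}


def related_identifier(paper_doi: str, relation_type: str) -> dict:
    """Build one relatedIdentifiers entry for the dataset's DOI metadata.

    Hypothetical helper, not part of any DataCite client library.
    """
    if relation_type not in DATASET_TO_PAPER_TYPES:
        raise ValueError(
            f"{relation_type!r} is not one of the dataset-to-paper types "
            f"discussed here: {sorted(DATASET_TO_PAPER_TYPES)}"
        )
    return {
        "relatedIdentifier": paper_doi,
        "relatedIdentifierType": "DOI",
        "relationType": relation_type,
    }


# Placeholder DOI, used only for illustration.
entry = related_identifier("10.1234/example-paper", "IsSupplementTo")
```

The entry would go into the `relatedIdentifiers` list of the dataset's metadata record; which of the three types to choose is the judgment call discussed in this thread.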

Let us know if you have any questions or feedback on this.

Mary

Hi @Mary_Hirsch

Thanks for the quick answer! I am still not sure what the proper relationships should be though. Let’s pull that apart:

  1. If there are multiple possible relationship-pairs between paper and dataset, is the recommendation to pick one, or to put all of them into the metadata?

  2. If the former, is there a recommendation for precedence between IsCitedBy/Cites and IsSupplementTo/IsSupplementedBy ?

  3. It would be nice to have more info about the semantics of the terms:

    • In a scientific context, if A cites B, A usually also references B, so the two go together, right?

    • The paper usually contains something like “data is available at https://doi.org/xxxx”. Would it be correct to assume that this is a reference but not a citation? What is an example sentence in a paper that would be construed as the paper “citing” the dataset?

    • In any case (that went well) there will be a reference from the paper to the dataset and vice versa. So almost all datasets belonging to a paper should have both, “References” and “IsReferencedBy”, correct?

    • Is “Supplement” always interpreted one-directional (paper -> dataset) thus indicating that the data plays a secondary role? If you want to use the data, you could also regard the Method section of the paper as “Supplement”, or not?

    • What is the difference between “IsDocumentedBy” and “IsDescribedBy” ?

Hi @hvw

This is our recommendation:

As you can see there are 3 relationTypes (in both directions) that will count as citations or references.

  1. If there are multiple possible relationship-pairs between paper and dataset, is the recommendation to pick one, or to put all of them into the metadata?

For the purposes of DataCite citation counts as outlined above, you need to pick one. You should decide whether the relation is a citation, a reference or another relation, determine its direction, and then assign the corresponding relationType. The schema documentation provides a description of all the relationTypes: https://schema.datacite.org/

  2. If the former, is there a recommendation for precedence between IsCitedBy/Cites and IsSupplementTo/IsSupplementedBy ?

Both will be counted as citations. Those assigning the relationType should decide which one is most relevant in each case.

  3. It would be nice to have more info about the semantics of the terms:
  • In a scientific context, if A cites B, A usually also references B, so the two go together, right?

In the metadata, both relationTypes can be used to generate a citation, i.e. A either "references" or "cites" B.

  • The paper usually contains something like “data is available at https://doi.org/xxxx”. Would it be correct to assume that this is a reference but not a citation? What is an example sentence in a paper that would be construed as the paper “citing” the dataset?

The metadata of the article and dataset establish the link/relation in a machine-readable format, regardless of the format of the information in, for example, the text of the publication. However, it is important to cite data correctly, and there are guidelines for DataCite citations outlined here:

  • In any case (that went well) there will be a reference from the paper to the dataset and vice versa. So almost all datasets belonging to a paper should have both, “References” and “IsReferencedBy”, correct?

That is not always the case. E.g. sometimes a dataset will be created after publication of the paper and will reference that paper, but there won't be a reference from the paper to the dataset.

  • Is “Supplement” always interpreted one-directional (paper → dataset) thus indicating that the data plays a secondary role? If you want to use the data, you could also regard the Method section of the paper as “Supplement”, or not?

It is feasible that a paper IsSupplementTo a dataset; there is nothing to prevent this relation.

  • What is the difference between “IsDocumentedBy” and “IsDescribedBy” ?

Although they can be used interchangeably, the main use for IsDescribedBy is for data papers. IsDocumentedBy is more for information about how to use whatever it is that’s been documented, for example software.

Dear @Mary_Hirsch

Thank you very much for the extensive answer. That was quite helpful! Taking that together (mainly “pick one”, “both will be counted as citations”, and “symmetric ‘Supplement’ relationship between paper & data is ok”), the quite comfortable solution to my standard use-case is simply to use both “IsSupplementTo” and “IsSupplementedBy”, as it always applies, no matter whether one actually cites or references the other.

Best,
Harald

Hi @hvw
Good to hear that has clarified things. On that last point, it’s fine to use the “IsSupplement” relationType in both directions, but keep in mind that if a dataset IsSupplementTo a paper, this will count as a citation for the dataset. If the dataset IsSupplementedBy a paper, it will not.
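The direction rule can be sketched in a few lines. This is an illustration only, not DataCite's implementation: the function `citation_credit` and both constants are hypothetical, and the mapping of the inverse types to the object is an assumption inferred from "in both directions" earlier in this thread.

```python
# Types that, when set in the subject's metadata, count as a citation of the
# subject (e.g. the dataset IsSupplementTo a paper), as stated in this thread.
COUNTS_FOR_SUBJECT = {"IsCitedBy", "IsSupplementTo", "IsReferencedBy"}

# Inverse types. ASSUMPTION: these credit the object rather than the subject,
# inferred from the "both directions" remark; not confirmed in this thread.
COUNTS_FOR_OBJECT = {"Cites", "IsSupplementedBy", "References"}


def citation_credit(relation_type: str):
    """Return "subject", "object", or None for a given relationType."""
    if relation_type in COUNTS_FOR_SUBJECT:
        return "subject"
    if relation_type in COUNTS_FOR_OBJECT:
        return "object"
    return None  # e.g. IsDocumentedBy: not counted as a citation here


for relation in ("IsSupplementTo", "IsSupplementedBy", "IsDocumentedBy"):
    print(relation, "->", citation_credit(relation))
```

With the dataset as the subject, only the first of the three relations in the loop would add to the dataset's citation count.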
Best
Mary

Thanks @Mary_Hirsch

Jep, that’s clear. Thanks again!