I am wondering about a very common use case: A dataset is used for a study that is published in a paper, which also contains the methods describing how the dataset was created / observed / measured. The paper refers to the PID of the dataset and vice-versa. All of the relationTypes (DataCite Schema 4.3) below seem to apply. Should I really attach all of them to the dataset?
Essentially, the relationType you use will depend on whether you are describing a citation, a reference or a general relation between the two objects. You will also need to consider the direction of the relation. In the case you describe, if you are adding the relationType to the DOI metadata of the dataset (the subject), then IsCitedBy, IsSupplementTo or IsReferencedBy the "paper" (the object) are all appropriate.
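As a minimal sketch of what this looks like in practice, the snippet below builds one related-identifier entry for the dataset's DOI metadata. The property names follow the DataCite JSON representation as I understand it; the DOI and the helper function `related_identifier` are hypothetical and only serve as illustration.

```python
import json

# Hypothetical DOI of the paper, used only for illustration.
PAPER_DOI = "10.1234/example-paper"


def related_identifier(doi: str, relation_type: str) -> dict:
    """Build one relatedIdentifiers entry that points at a DOI."""
    return {
        "relatedIdentifier": doi,
        "relatedIdentifierType": "DOI",
        "relationType": relation_type,
    }


# Fragment of the dataset's metadata: the dataset is the subject,
# the paper is the object of the relation.
dataset_fragment = {
    "relatedIdentifiers": [related_identifier(PAPER_DOI, "IsSupplementTo")]
}

print(json.dumps(dataset_fragment, indent=2))
```

Swapping "IsSupplementTo" for "IsCitedBy" or "IsReferencedBy" changes only the `relationType` value; the direction stays dataset (subject) to paper (object).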
Let us know if you have any questions or feedback on this.
Thanks for the quick answer! I am still not sure what the proper relationships should be, though. Let's pull that apart:
If there are multiple possible relationship-pairs between paper and dataset, is the recommendation to pick one, or to put all of them into the metadata?
If the former, is there a recommendation for precedence between IsCitedBy/Cites and IsSupplementTo/IsSupplementedBy ?
It would be nice to have more info about the semantics of the terms:
In a scientific context, if A cites B, A usually also references B, so the two go together, right?
The paper usually contains something like "data is available at https://doi.org/xxxx". Would it be correct to assume that this is a reference but not a citation? What is an example sentence in a paper that would be construed as the paper "citing" the dataset?
In any case (that went well) there will be a reference from the paper to the dataset and vice versa. So almost all datasets belonging to a paper should have both "References" and "IsReferencedBy", correct?
Is "Supplement" always interpreted one-directional (paper -> dataset) thus indicating that the data plays a secondary role? If you want to use the data, you could also regard the Method section of the paper as "Supplement", or not?
What is the difference between "IsDocumentedBy" and "IsDescribedBy"?
As you can see there are 3 relationTypes (in both directions) that will count as citations or references.
If there are multiple possible relationship-pairs between paper and dataset, is the recommendation to pick one, or to put all of them into the metadata?
For the purposes of DataCite citation counts as outlined above, you need to pick one. You should decide whether the relation is a citation, a reference or another relation, determine its direction, and then assign the corresponding relationType. The schema documentation provides a description of all the relationTypes: https://schema.datacite.org/
If the former, is there a recommendation for precedence between IsCitedBy/Cites and IsSupplementTo/IsSupplementedBy ?
Both will be counted as citations. Those assigning the relationType should decide which one is most relevant in each case.
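One way to apply the "pick one" advice is to check a dataset record before submitting it. The sketch below is only my reading of the answers so far: it assumes the three dataset-side types named earlier in this thread (IsCitedBy, IsReferencedBy, IsSupplementTo) are the ones to choose between, and the helper `conflicting_relations` and the DOI are hypothetical.

```python
from collections import defaultdict

# Dataset-side relationTypes named in this thread as counting towards
# citations. This set is an assumption drawn from the discussion.
DATASET_SIDE_TYPES = {"IsCitedBy", "IsReferencedBy", "IsSupplementTo"}


def conflicting_relations(related_identifiers):
    """Return {identifier: sorted relationTypes} for every related
    identifier that carries more than one of the types above."""
    seen = defaultdict(set)
    for entry in related_identifiers:
        if entry["relationType"] in DATASET_SIDE_TYPES:
            seen[entry["relatedIdentifier"]].add(entry["relationType"])
    return {pid: sorted(types) for pid, types in seen.items() if len(types) > 1}


# Hypothetical record that uses two of the types for the same paper.
record = [
    {"relatedIdentifier": "10.1234/example-paper", "relationType": "IsCitedBy"},
    {"relatedIdentifier": "10.1234/example-paper", "relationType": "IsSupplementTo"},
]

print(conflicting_relations(record))
```

An empty result means each related paper is linked with at most one of these relationTypes.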
It would be nice to have more info about the semantics of the terms:
In a scientific context, if A cites B, A usually also references B, so the two go together, right?
In the metadata, both relationTypes can be used to generate a citation, i.e. A either "references" or "cites" B.
The paper usually contains something like "data is available at https://doi.org/xxxx". Would it be correct to assume that this is a reference but not a citation? What is an example sentence in a paper that would be construed as the paper "citing" the dataset?
The metadata of the article and dataset establish the link/relation in a machine-readable format, regardless of the format of the information in, for example, the text of the publication. However, it is important to cite data correctly, and there are guidelines for DataCite citations outlined here:
In any case (that went well) there will be a reference from the paper to the dataset and vice versa. So almost all datasets belonging to a paper should have both "References" and "IsReferencedBy", correct?
That is not always the case. E.g. sometimes a dataset will be created after publication of the paper and will reference that paper, but there won't be a reference from the paper to the dataset.
Is "Supplement" always interpreted one-directional (paper → dataset) thus indicating that the data plays a secondary role? If you want to use the data, you could also regard the Method section of the paper as "Supplement", or not?
It is feasible that a paper IsSupplementTo a dataset, there is nothing to prevent this relation.
What is the difference between "IsDocumentedBy" and "IsDescribedBy"?
Although they can be used interchangeably, the main use for IsDescribedBy is for data papers. IsDocumentedBy is more for information about how to use whatever it is that's been documented, like, for example, software.
Thank you very much for the extensive answer. That was quite helpful! Taking that together (mainly "pick one", "both will be counted as citations", and "symmetric 'Supplement' relationship between paper & data is ok"), the quite comfortable solution to my standard use-case is simply to use both "IsSupplementTo" and "IsSupplementedBy", as it always applies, no matter whether one actually cites or references the other.
Hi @hvw
Good to hear that has clarified things. On that last point, it's fine to use the "IsSupplement" relationType in both directions, but keep in mind that if a dataset IsSupplementTo a paper, this will count as a citation for the dataset. If the dataset IsSupplementedBy a paper, it will not.
Best
Mary
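The point about direction in the last reply can be sketched as follows. This is an illustration under assumptions, not DataCite's implementation: the set of counting types is taken from this thread, and the DOI and the helper `citing_relations` are hypothetical.

```python
# relationTypes on the dataset's record that, according to this thread,
# count as a citation for the dataset. Not taken from an official list.
COUNTS_FOR_DATASET = {"IsCitedBy", "IsReferencedBy", "IsSupplementTo"}

PAPER_DOI = "10.1234/example-paper"  # hypothetical DOI

# The dataset's record uses the Supplement relation in both directions.
dataset_relations = [
    {"relatedIdentifier": PAPER_DOI, "relatedIdentifierType": "DOI",
     "relationType": "IsSupplementTo"},
    {"relatedIdentifier": PAPER_DOI, "relatedIdentifierType": "DOI",
     "relationType": "IsSupplementedBy"},
]


def citing_relations(relations):
    """Keep only the entries that would count as a citation for the dataset."""
    return [r for r in relations if r["relationType"] in COUNTS_FOR_DATASET]


counted = citing_relations(dataset_relations)
print([r["relationType"] for r in counted])
```

Only the IsSupplementTo entry passes the filter, which matches the reply above: the IsSupplementedBy entry is allowed but does not add a citation for the dataset.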