relationType for datasets containing parts of other datasets

dnoesgaard · November 25, 2022, 1:15pm

GBIF has ~80k datasets proper registered with DataCite. In addition, we mint DOIs for custom datasets created from aggregated queries (a.k.a. downloads) across the combined records of those 80k datasets.

Downloads are linked to all parent/contributing datasets via relatedIdentifiers, currently using "relationType": "References". An unintended consequence is that downloads are counted as citations in DataCite Commons, e.g., iNaturalist is shown with 82k citations, while the real number is ~3000.

To avoid this, should we pick another relationType for the download-parent dataset relation, and if so, which one? IsDerivedFrom?

Thoughts appreciated!

jezcope · November 25, 2022, 2:31pm

Yes, I’d agree with the use of IsDerivedFrom. I had a look through the schema to remind myself of the options and briefly considered IsPartOf too, but that seems more suited to a discrete element within a container (e.g. an issue in a series) than to one of many (possibly overlapping) subsets. Indeed, “the dataset is derived from a larger dataset” is the example given in the schema for IsDerivedFrom.
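For illustration, here is a rough Python sketch of what the change could look like on the relatedIdentifiers list of a download record. The field names (relatedIdentifier, relatedIdentifierType, relationType) follow the DataCite metadata JSON; the DOIs and the helper name switch_relation_type are made up for the example, and this is not how GBIF actually generates its metadata.

```python
from typing import Dict, List

RelatedIdentifier = Dict[str, str]


def switch_relation_type(
    entries: List[RelatedIdentifier],
    old: str = "References",
    new: str = "IsDerivedFrom",
) -> List[RelatedIdentifier]:
    """Return a new list in which every entry typed `old` is retyped as `new`.

    Entries with any other relationType are copied unchanged, and the
    input list itself is not modified.
    """
    return [
        {**entry, "relationType": new}
        if entry.get("relationType") == old
        else dict(entry)
        for entry in entries
    ]


# Hypothetical relatedIdentifiers of one download DOI (placeholder DOIs).
download_links = [
    {
        "relatedIdentifier": "10.1234/example-parent-a",
        "relatedIdentifierType": "DOI",
        "relationType": "References",
    },
    {
        "relatedIdentifier": "10.1234/example-parent-b",
        "relatedIdentifierType": "DOI",
        "relationType": "References",
    },
    {
        # An unrelated link type, included to show it is left alone.
        "relatedIdentifier": "10.1234/example-paper",
        "relatedIdentifierType": "DOI",
        "relationType": "IsCitedBy",
    },
]

updated = switch_relation_type(download_links)
for entry in updated:
    print(entry["relatedIdentifier"], entry["relationType"])
```

Only the relationType value changes, so the download stays linked to every contributing dataset, just no longer through a relation that is counted as a citation.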

dnoesgaard · November 25, 2022, 2:52pm

Thanks, Jez, I had also considered IsPartOf, so I’m glad we reached the same conclusion.

KellyStathis · November 25, 2022, 5:51pm

IsDerivedFrom seems perfect to me too! Thanks for sharing this use case.