Referring to a data sharing policy in dataset metadata

Another silly question!

What would be the best way to refer to an organisation’s data sharing policy in DataCite metadata? I see two options: 16 Rights and 20 RelatedItem. To me, neither is a “perfect” fit. There is no “Policy” resourceTypeGeneral, and certainly no relationType that fits. Putting it into the Rights field then muddles it up with references to specific licences or valid data reuses using DUO.
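To make the two options concrete, here is a rough sketch of what each might look like as a DataCite XML fragment, built with Python's standard library. The policy URL and title are placeholders, and the choice of relatedItemType "Other" with relationType "References" is only an assumption for illustration, since no value really fits. The fragments are left without a namespace for brevity; a full record would use the DataCite kernel-4 namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical policy URL, used only for illustration.
POLICY_URL = "https://example.org/data-sharing-policy"


def rights_option(policy_url):
    """Option 1: carry the policy as an entry in rightsList (property 16)."""
    rights_list = ET.Element("rightsList")
    rights = ET.SubElement(rights_list, "rights", {"rightsURI": policy_url})
    rights.text = "Data sharing policy"
    return rights_list


def related_item_option(policy_url):
    """Option 2: carry the policy as a relatedItem (property 20)."""
    related_items = ET.Element("relatedItems")
    # "Other" / "References" are stand-ins: there is no "Policy" type
    # and no relationType that clearly fits.
    item = ET.SubElement(
        related_items,
        "relatedItem",
        {"relatedItemType": "Other", "relationType": "References"},
    )
    identifier = ET.SubElement(
        item, "relatedItemIdentifier", {"relatedItemIdentifierType": "URL"}
    )
    identifier.text = policy_url
    titles = ET.SubElement(item, "titles")
    ET.SubElement(titles, "title").text = "Data sharing policy"
    return related_items


rights_xml = ET.tostring(rights_option(POLICY_URL), encoding="unicode")
related_xml = ET.tostring(related_item_option(POLICY_URL), encoding="unicode")
print(rights_xml)
print(related_xml)
```

In the first form the policy sits in the same list as any licence entries, which is the muddling concern; in the second it is kept apart, but only by using generic type and relation values.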

@mpfl Thanks for this question! My first thought is that Rights is a better fit, but I am interested in how others have handled this situation. The Rights field is already used for a variety of purposes including license/copyright and access information.

You could also use 17 Description with descriptionType “Other”.
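As a rough sketch of that Description option (the policy wording and URL below are placeholders, not a recommended phrasing), the fragment could be built like this in Python:

```python
import xml.etree.ElementTree as ET

# Clark-notation name for the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def policy_description(text, lang="en"):
    """Build a descriptions fragment holding one descriptionType="Other" entry."""
    descriptions = ET.Element("descriptions")
    description = ET.SubElement(
        descriptions,
        "description",
        {"descriptionType": "Other", XML_LANG: lang},
    )
    description.text = text
    return descriptions


# Hypothetical wording and URL, for illustration only.
fragment = policy_description(
    "Access is governed by the data sharing policy at "
    "https://example.org/data-sharing-policy"
)
description_xml = ET.tostring(fragment, encoding="unicode")
print(description_xml)
```

The trade-off is that the policy link ends up inside free text, so a portal would have to parse the description to find it.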

Interesting question. Seems to me that the repository data sharing policy is a property of the repository and it covers all datasets in the repository. Is that right?

The licenses reflect the implementation of those policies, i.e. what a user needs to know about what they can do with the data. So I would think that is enough. If a user was interested in the actual policy, they would get to it through the repository home page.

BTW - where does that homepage go? Is that URL an identifier for the organization in the HostingInstitution contributor section? Does that mean that there are two identifiers there: ROR + URL?
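To picture the ROR + URL case, here is a minimal sketch of a HostingInstitution contributor carrying two nameIdentifier elements. The organisation name, the ROR value, and the homepage are placeholders, and using "URL" as a nameIdentifierScheme is only an assumption for illustration, not an established convention.

```python
import xml.etree.ElementTree as ET


def hosting_institution(name, ror_id, homepage_url):
    """Build a contributor fragment with a ROR and a URL identifier."""
    contributor = ET.Element(
        "contributor", {"contributorType": "HostingInstitution"}
    )
    contributor_name = ET.SubElement(
        contributor, "contributorName", {"nameType": "Organizational"}
    )
    contributor_name.text = name

    ror = ET.SubElement(
        contributor,
        "nameIdentifier",
        {"nameIdentifierScheme": "ROR", "schemeURI": "https://ror.org"},
    )
    ror.text = ror_id

    # Assumption: the homepage is carried as a second nameIdentifier.
    homepage = ET.SubElement(
        contributor, "nameIdentifier", {"nameIdentifierScheme": "URL"}
    )
    homepage.text = homepage_url
    return contributor


# Placeholder values, not real identifiers.
contributor = hosting_institution(
    "Example Repository",
    "https://ror.org/EXAMPLE",
    "https://repository.example.org",
)
contributor_xml = ET.tostring(contributor, encoding="unicode")
print(contributor_xml)
```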

Thanks for the response, @KellyStathis and @tedhabermann.

In this particular case, the data itself is sensitive and will be distributed across many, many organisations, each with their own data sharing policies. Even within organisations the data could be in any number of systems. The metadata will be brought together into a single discovery portal. We are trying to provide guidance on how our partners can provide us with relevant metadata, while trying to cleave to the DataCite 4.4 schema as closely as possible.

The aim of DataCite metadata properties is to describe a specific resource. The sharing policy of each resource is stated through the DataCite properties Rights and Licenses. The description of an organization’s policy for its resources which are exposed or available through portals or repositories is out of scope for DataCite.
In your case, maybe the policy could be stated on the future discovery portal, especially if it is listed in re3data, where “repositories” are described by their own metadata schema (https://gfzpublic.gfz-potsdam.de/rest/items/item_5007395_6/component/file_5007461/content), using properties 19 to 25.

Interesting questions, and also very relevant these days, as the license information for the metadata and for the data (resource) may well be different: e.g., the resource being described may be subject to a CC BY license, while its metadata may be subject to a CC0 license. The Rights property is used to explicitly indicate any rights information about the resource. A metadata sharing policy should refer to the repository policy, and the re3data schema (schema.re3data.org/3-1/re3data-template-V3-1.xml) can accommodate it. But complex scenarios may emerge as far as metadata are concerned: e.g., how to indicate that the metadata of a specific collection carry a metadata license/policy which is not the repository’s standard policy?

Thanks for the additional context @mpfl and to everyone from the working group for your insights!

Personally, I feel that including a data sharing policy in Rights is in scope for the DataCite metadata schema. While the aim of DataCite metadata is indeed to describe a specific resource, it is also possible—even encouraged—to include information about other entities that are related to that resource. For example, even though creators can have ORCID iDs, and ORCID profiles include creator names and affiliations, we still include creator names and affiliation in the DataCite metadata.

Similarly, although there is a place for the policy in the re3data schema for research data repositories, the data sharing policy is directly relevant to the specific resource and is useful to have readily available alongside it. I think this is important 1) for the aggregator/discovery portal use case and 2) for resources that do not have a specific license, including sensitive data.