This post resulted from an online discussion among Keith Russell, Kirsten Elger, Natasha Simons, Lesley Wyborn, and Jens Klump.
In an effort to accelerate broad community convergence on FAIR implementation options, the GO FAIR community has launched the development of machine-actionable FAIR Implementation Profiles (FIP) (Schultes et al., 2020). In their questionnaire, the very first question raises a point that needs clarification.
The FIP questionnaire starts with the question “What globally unique, resolvable identifiers do you use for metadata records?”. This question is related to FAIR Principle F1 (Findable) (Wilkinson et al., 2016).
Asking for separate PIDs for the metadata record and for the data object raises the question: what is the use case? Why would I want to identify the metadata record? Is the metadata record a research artefact that I need to identify unambiguously into the future? What is the metadata record identifying? Are we identifying the data at the “expression” or “item” level (Klump et al., 2021)?
This question was added because data set PIDs frequently resolved to landing pages, with no consistent way for a machine to find the link to the actual data set. To ensure that both the landing page and the data set could be found, the questionnaire asks for separate DOIs: one resolving to the landing page and one resolving to the data set.
The question of where a PID should resolve to is as old as the PID community itself, and early on it was agreed that a PID should always resolve to a landing page. DataCite states in its notes for DOI best practice:
DOIs should resolve to a landing page, not directly to the content
It is important that both humans and machines have context for the item that the DOI is resolving to. DOIs should therefore resolve to a landing page containing metadata about the item, rather than to a PDF, for example. The landing page should contain a full bibliographic citation, so that a human can tell they have arrived at the correct item, and so that a machine can retrieve additional information about the item that might not be easily retrievable from the item itself.
A more thorough discussion on how to use PIDs in scholarly data repositories was published by Fenner et al. (2019).
However, it has been recognised that there should also be a machine-actionable pathway from the resolved PID to the content object, so that machines can find their way from the PID to the data set without human intervention.
As an example, the DataCite Metadata Working Group proposes to introduce a new metadata element that would point from the metadata record to the content object. For a more scalable approach, this element could also be embedded in the DOI Handle object itself (Weigel et al., 2019).
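To make the idea of a pointer element concrete, here is a minimal sketch in Python. It assumes a metadata record represented as a plain dictionary and a pointer element called `contentUrl`. Both the element name and the record layout are hypothetical placeholders chosen for illustration, since the name of the proposed element is not given here, and the DOI and URLs are made-up example values.

```python
from typing import Optional

# Hypothetical name for the pointer element; the actual element name
# proposed for the metadata schema is not given in this post.
CONTENT_POINTER = "contentUrl"


def resolve_content(record: dict) -> Optional[str]:
    """Return the content object's location from a metadata record, if present."""
    pointer = record.get(CONTENT_POINTER)
    if isinstance(pointer, str) and pointer.strip():
        return pointer.strip()
    # No machine-actionable pathway: only the landing page is known.
    return None


# Illustrative record with made-up values (example.org is a reserved domain).
record = {
    "doi": "10.1234/example",
    "url": "https://repository.example.org/dataset/42",  # landing page
    CONTENT_POINTER: "https://repository.example.org/dataset/42/data.csv",
}

print(resolve_content(record))
```

With such an element in place, a machine could keep resolving the PID to the landing page, as best practice recommends, and still reach the data set without scraping the page.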
To implement FAIR fully, it would be important for other persistent identifier systems to follow an approach similar to DataCite's. It is also important that this information is presented on the landing page in a consistent, machine-readable fashion, so that machines can easily parse it and find their way to the data set.
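One way a landing page could expose this information is sketched below in Python. The sketch assumes that the landing page embeds schema.org metadata as JSON-LD and lists the data file under `distribution` with a `contentUrl`. That convention is an assumption made for this example, not something prescribed in this post, and the sample page and its URLs are invented.

```python
import json
from html.parser import HTMLParser
from typing import List


class JsonLdExtractor(HTMLParser):
    """Collect the contents of <script type="application/ld+json"> elements."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self._buffer: List[str] = []
        self.documents: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.documents.append("".join(self._buffer))
            self._in_jsonld = False


def find_content_urls(landing_page_html: str) -> List[str]:
    """Return the contentUrl values of schema.org Dataset distributions."""
    parser = JsonLdExtractor()
    parser.feed(landing_page_html)
    parser.close()
    urls: List[str] = []
    for raw in parser.documents:
        try:
            doc = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of failing
        if not isinstance(doc, dict) or doc.get("@type") != "Dataset":
            continue
        distribution = doc.get("distribution", [])
        if isinstance(distribution, dict):  # a single distribution is allowed
            distribution = [distribution]
        for item in distribution:
            if isinstance(item, dict) and "contentUrl" in item:
                urls.append(item["contentUrl"])
    return urls


# Invented landing page for illustration only.
SAMPLE_PAGE = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Dataset",
 "name": "Example data set",
 "distribution": {"@type": "DataDownload",
                  "contentUrl": "https://repository.example.org/dataset/42/data.csv"}}
</script>
</head><body><h1>Example data set</h1></body></html>
"""

print(find_content_urls(SAMPLE_PAGE))
```

The human reader still sees the citation and context on the page, while a machine reads the embedded block and follows the link to the data set. Whatever convention is chosen, the point is that it is applied consistently across repositories.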
Fenner, M., Crosas, M., Grethe, J. S., Kennedy, D., Hermjakob, H., Rocca-Serra, P., et al. (2019). A data citation roadmap for scholarly data repositories. Scientific Data, 6(1), 1–9.
Klump, J., Wyborn, L. A. I., Wu, M., Martin, J., Downs, R. R., & Asmi, A. (2021). Versioning Data Is About More than Revisions: A Conceptual Framework and Proposed Principles. Data Science Journal, 20(1), 12 p. https://doi.org/10.5334/dsj-2021-012
Schultes, E., Magagna, B., Hettne, K. M., Pergl, R., Suchánek, M., & Kuhn, T. (2020). Reusable FAIR Implementation Profiles as Accelerators of FAIR Convergence. In G. Grossmann & S. Ram (Eds.), Advances in Conceptual Modelling (Vol. 12584, pp. 138–147). Cham, Switzerland: Springer International Publishing.
Weigel, T., Schwardmann, U., Klump, J., Bendoukha, S., & Quick, R. (2019). Making data and workflows findable for machines. Data Intelligence, 2(1–2), 40–46.
Wilkinson, M. D., Dumontier, M., Packer, A. L., Gray, A. J. G., Mons, A., Gonzalez-Beltran, A., et al. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3, 160018.