Developing a US National PID Strategy Report

erik-desci · April 1, 2024, 4:55pm

@TAC_NISO Seems like this thread is getting a little heated. I appreciate you bringing this report to the larger group and would like to do two things to try and make your life easier. I’ll…

Summarize the remarks on this thread
Submit a potential section for your review that could be included in the next round of edits

Hopefully this helps push the conversation in a more productive direction that you can bring back to the committee.

Summary

Concerns were raised about the absence of key members of relevant groups from the report’s development process, suggesting a need for either 1) broader, more inclusive consultation or 2) a fundamental restructuring of committee membership. Representation on the committee itself was also questioned because of a number of seeming “big misses”.
It was requested that the report be renamed to indicate its draft stage. One suggestion was “National PID Strategy - Discussion Document”. Additional emphasis was put on the perceived finality of Table 1: beyond just the name, the document reads as a final draft. Softening the language at this early stage may be necessary.
It was mentioned that recommendations from established working groups on the subject of PIDs were not included in this report. Examples with respect to software include the RDA Software Source Code Identification WG and its output, along with the work of the RDA/ReSA FAIR4RS and FORCE11 Software Citation groups. Publishing a literature review as supplementary material prior to the policy draft would be beneficial.
More specifically, participants critiqued the report’s approach to software PIDs, emphasizing the mismatch between the recommendation for DOIs and the complex realities of software identification and preservation.
- Comments from the group included the phrases naive, impractical, concerning, confusing, and “…not credible”. This needs to be addressed by the committee.
A wide array of PIDs were left off the list in favor of a purely DOI-centric approach. The group felt that this was heavily detrimental. Most notably, Software Heritage IDs (SWHIDs), ARKs, and RRIDs were never mentioned in the report despite wide adoption and, in some cases, established track records of 20+ years.
Concerns were raised about the persistence and scalability of DOIs and the Handle System. The group felt that a short section on the persistence problems with DOIs should be included.

(For consideration as an addition) Trends in Emerging Persistent Identification Technologies under Observation

Persistent Identifiers (PIDs) play a crucial role in anchoring the scientific record against the rapid advancements and challenges posed by artificial intelligence technologies. As the digital landscape evolves, foundational changes to PID technologies are increasingly recognized as necessary for safeguarding the integrity and accessibility of scholarly work. A handful of trends recur in the design choices of newer PID systems and point to innovations worth observing going forward.

Trend	Reasoning
Content-based Identification	Using content hashes as a means of identification ensures that the identifier of a document or dataset is directly derived from its content, permanently linking the two and eliminating content drift. The benefit in combating AI lies in its inherent ability to detect changes or manipulations in the data, as any alteration in the content would result in a different identifier. This helps maintain the integrity of scientific records in an era where AI-generated content could potentially flood databases with inauthentic or altered information.
Scalable/Versionable PIDs	In most digital ecosystems, context and provenance are used as a first line of defense against unsolicited machine agents. Versions, comments, likes, stars, and more aid all digitally native platforms (from GitHub to Facebook) in disambiguating humans from machines. Artifact fragmentation in legacy systems presents a barrier to the persistent linkage of information at scale. Many emerging PID systems are repurposing technologies from the software development industry such as Directed Acyclic Graphs (DAGs) to make scalable PIDs which can cryptographically link an effectively infinite range of information.
Verifiable Ownership	Verifiable Ownership through cryptographic methods, particularly Verifiable Credentials, plays a pivotal role in fortifying the scientific record against the encroachment of AI-generated fabrications. These credentials enable the immutable association of digital objects with their creators, facilitating the verification of authenticity and provenance in an environment increasingly populated by sophisticated AI algorithms capable of producing seemingly credible but falsified data. By embedding verifiable digital signatures within PID systems, the scientific community can ensure that each piece of research, dataset, or publication is traceable back to its original source, thus providing a robust defense against the dissemination of misinformation and reinforcing the trustworthiness of scholarly communication.
Distributed Architectures	Distributed architectures for persistent identification systems align naturally with the inherently distributed nature of academia, where journals and repositories often manage their metadata in isolation, leading to inconsistencies and gaps in the scholarly record. By decentralizing the infrastructure for PID minting and management, these systems eliminate the opacity and control issues associated with centralized models, ensuring transparency and wider accessibility in the creation and tracking of scholarly outputs. Moreover, distributed systems offer significant advantages in terms of load balancing and efficient shared storage, facilitating a more resilient and scalable approach to handling the explosive growth of data generated by academic research.
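
To make the first two rows concrete for anyone reading along, here is a minimal sketch in Python of content-based identification and of DAG-style version linking. It is not the scheme used by any particular PID system: the `sha256:` prefix, the function names `content_id` and `version_node_id`, and the JSON node layout are all assumptions made for illustration.

```python
from __future__ import annotations

import hashlib
import json


def content_id(data: bytes) -> str:
    """Derive an identifier purely from the bytes of the content."""
    # The "sha256:" prefix is a hypothetical syntax, not a real PID scheme.
    return "sha256:" + hashlib.sha256(data).hexdigest()


def version_node_id(payload: bytes, parents: list[str]) -> str:
    """Identify a version by its content plus the identifiers of its parents."""
    node = {"content": content_id(payload), "parents": sorted(parents)}
    # Canonical JSON so the same node always serializes to the same bytes.
    encoded = json.dumps(node, sort_keys=True, separators=(",", ":"))
    return content_id(encoded.encode("utf-8"))


# A two-version history: v2 commits to v1 by including v1's identifier.
v1 = version_node_id(b"dataset, first release", parents=[])
v2 = version_node_id(b"dataset, corrected", parents=[v1])

# Altering the first release changes its identifier and, because v2 commits
# to its parent's identifier, the identifier of every descendant as well.
v1_tampered = version_node_id(b"dataset, first release (altered)", parents=[])
v2_after_tamper = version_node_id(b"dataset, corrected", parents=[v1_tampered])

print(v1 != v1_tampered, v2 != v2_after_tamper)
```

The point of the sketch is only that the identifier is recomputable by anyone holding the bytes, so a silent change to any version is detectable from the identifiers alone. Resolution, metadata, and governance, which a real PID system also needs, are out of scope here.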

@all, additions to this post are encouraged. Would you change anything here?

@TAC_NISO I hear what you’re saying about the importance of only referencing established PID systems by name. While I disagree on the assessment of systems like ARKs, SWHIDs, and RRIDs, I can see the wisdom in ISCC and dPIDs. I wrote this section in the hopes that it (or something similar) can be a happy medium, adding to the impact of the document. By referencing concepts only, you’re not recommending any system, rather acknowledging the need for innovation, identifying important technologies in the space, and being observant of upcoming trends.

CShillum · April 2, 2024, 6:24pm

@jak One correction to your earlier post. ORCID is neither DOI-based nor sponsored by Crossref. We are an independent, member-governed non-profit organization, with the majority of membership coming from universities and other research institutions. See https://info.orcid.org/what-is-orcid/