Could you please share your knowledge and best practices for minting and using PIDs for:
- Content-addressed datasets and their elements (for example, collections that can be searched by image or by any other non-verbal type of search key);
- Version-controlled datasets and software.
Most users handle the related data curation tasks with a VCS plus a DOI authority, such as the Git + Zenodo pair, but I would also like to collect information about practices that are less widespread and less well known, especially hash-based PIDs.
In the AV Portal at the TIB we use hash-based DOIs to cite videos at a certain timestamp. Here is an example:
DOI for the video: https://doi.org/10.5446/62930
DOI for a certain timestamp of that video: https://doi.org/10.5446/62930#t=00:56
DOI for a certain section of that video: https://doi.org/10.5446/62930#t=02:21,02:35
The hash is resolved in the AV Portal itself, not at the DOI level.
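To illustrate what resolving the hash on the portal side could involve, here is a minimal sketch that reads the `t=` part of the example DOIs above. It assumes the simple `mm:ss` / `hh:mm:ss` / seconds notation seen in this thread; the function names are hypothetical and this is not the AV Portal's actual code.

```python
from urllib.parse import urlsplit, parse_qs


def npt_to_seconds(value: str) -> float:
    """Convert 'ss', 'mm:ss' or 'hh:mm:ss' into seconds."""
    parts = value.split(":")
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unsupported time value: {value!r}")
    seconds = 0.0
    for part in parts:
        seconds = seconds * 60 + float(part)
    return seconds


def parse_temporal_fragment(url: str):
    """Return (start, end) in seconds, or None if the URL has no t= fragment.

    end is None when only a start time is given.
    """
    fragment = urlsplit(url).fragment
    params = parse_qs(fragment)
    if "t" not in params:
        return None
    value = params["t"][0]  # simplification: only the first t= is used
    if value.startswith("npt:"):
        value = value[len("npt:"):]
    start_text, _, end_text = value.partition(",")
    start = npt_to_seconds(start_text) if start_text else 0.0
    end = npt_to_seconds(end_text) if end_text else None
    return start, end


print(parse_temporal_fragment("https://doi.org/10.5446/62930#t=00:56"))
print(parse_temporal_fragment("https://doi.org/10.5446/62930#t=02:21,02:35"))
```

A player could then seek to the start value and, if an end value is present, stop playback there.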
@Frauke could you please tell us a bit more about the hashing functions you use and the resolver pipeline implemented on your side?
You may find more helpful information on media fragments here:
@Frauke There is nothing at the link you shared right now.
As I understand it, the hash is used only as a URI affix, where the URI is the DOI resolved by the centralized DOI gateway. Could you also explain how the affix is formed and hashed (maybe by sharing the internal spec, if it is open)? I would really appreciate being able to study your use case, also with a view to implementing openly reproducible PIDs and an affix model based on them, like I described here:
A bit late to the discussion, but for version control systems, you could also take a look at how Software Heritage handles this:
@mvermeyen thank you! It is a good concept, especially because it uses a Merkle DAG. I implemented a Git bridge for IPFS, which is also a content-addressed network built on a Merkle DAG. There should be a way to integrate Merkle DAG structures with Git and content-addressed networking directly.
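For readers less familiar with content addressing in Git, here is a small sketch of how a Git blob identifier is derived from file content alone. The `swh:1:cnt:` prefix reflects my understanding that Software Heritage content identifiers reuse the same hash; treat that part, and the function names, as assumptions rather than a reference implementation.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """SHA-1 over the header 'blob <size>\\0' followed by the raw content."""
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def content_swhid(data: bytes) -> str:
    """Assumed form of a Software Heritage content identifier."""
    return "swh:1:cnt:" + git_blob_id(data)


# The same bytes always give the same identifier, on any machine.
print(git_blob_id(b"hello\n"))
print(content_swhid(b"hello\n"))
```

Trees and commits are hashed over the identifiers of their children in the same way, which is what makes the whole history a Merkle DAG.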
I am also working on a prefix-based concept that allows decentralized resolution of DOIs and all other prefix-based identifiers. Do you know of any similar projects that already exist?
https://av.tib.eu/media/ is not really a link, just the start of a URL for explanatory purposes.
The hash suffix in this case is the common standard Media Fragment Identifier (see links in that post). The DOI system ignores everything coming after a hash by default and affixes it to the URL. So for example this
https://dx.doi.org/10.14454/3w3z-sa82#pdf transforms to the following URL:
https://schema.datacite.org/meta/kernel-4.4/#pdf (in this case the hash does not do anything). But it could also be used, for example, to jump to a certain caption on an HTML page.
So to recap, the TIB AV Portal uses the Media Fragment Identifier standard and combines it with the DOI system’s default settings.
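One way to picture the hand-over of the hash is the sketch below. It assumes the usual HTTP redirect behaviour, where the client re-attaches the fragment of the requested URL to the redirect target when that target has no fragment of its own. The function name is hypothetical, and the redirect target is simply the DataCite URL from the example above without its hash.

```python
from urllib.parse import urlsplit, urlunsplit


def carry_fragment(requested_url: str, redirect_target: str) -> str:
    """Re-attach the requested URL's fragment to the redirect target.

    A fragment already present on the target is left untouched.
    """
    requested = urlsplit(requested_url)
    target = urlsplit(redirect_target)
    if target.fragment or not requested.fragment:
        return redirect_target
    return urlunsplit(target._replace(fragment=requested.fragment))


print(carry_fragment(
    "https://dx.doi.org/10.14454/3w3z-sa82#pdf",
    "https://schema.datacite.org/meta/kernel-4.4/",
))
```

The landing page (here, the AV Portal player) is then the only component that has to understand what the fragment means.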
I hope this helps to clear it up a bit.
@Frauke Thank you! This clarifies things.