Dear colleagues,
Could you please share your knowledge and best practices for minting and using PIDs for:
Content-addressed datasets and their elements (for example, collections that can be searched by image or by any other non-verbal search key);
Version-controlled datasets and software.
Most users handle the related data curation tasks with a VCS plus a DOI authority, such as the Git + Zenodo pair, but I would also like to collect information about practices that are not yet widespread or well known, especially hash-based PIDs.
In the AV-Portal, the video player component of our JavaScript application takes over the task of resolving the timestamps in the DOI (which points to the actual portal URL, starting with https://av.tib.eu/media/…). Some special parsing methods make the player compatible with the common timestamp resolving standard (so-called media fragments) that modern HTML5 players should support. The hash can contain the starting playback timestamp (to the second), but you can also use the from-to format (start of the segment → end of the segment). Additionally, you can use the portal's built-in citation function, which lets you cite the currently playing segment of the video in the aforementioned from-to format.
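To illustrate the kind of parsing involved, here is a minimal Python sketch of reading the temporal part of a media fragment (`#t=start` or `#t=start,end`). It is not the AV-Portal player code, which is JavaScript and not shown in this thread; the function names and the example.org URLs are hypothetical, and only plain seconds and colon-separated times are handled.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qs, urlsplit


def _npt_to_seconds(value: str) -> float:
    """Convert "90", "1:30" or "0:01:30.5" to a number of seconds."""
    parts = value.split(":")
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unsupported time value: {value!r}")
    seconds = 0.0
    for part in parts:
        # Each further colon-separated field shifts the total by a factor of 60.
        seconds = seconds * 60 + float(part)
    return seconds


def parse_temporal_fragment(url: str) -> Optional[Tuple[float, Optional[float]]]:
    """Return (start, end) in seconds from a "#t=..." fragment, or None.

    end is None when only a start time is given.
    """
    fragment = urlsplit(url).fragment
    # Fragments use name=value pairs joined by "&", like a query string.
    params = parse_qs(fragment, keep_blank_values=True)
    values = params.get("t")
    if not values:
        return None
    value = values[-1]  # assumption of this sketch: the last "t" wins
    if value.startswith("npt:"):
        value = value[len("npt:"):]
    start_text, _, end_text = value.partition(",")
    start = _npt_to_seconds(start_text) if start_text else 0.0
    end = _npt_to_seconds(end_text) if end_text else None
    if end is not None and end <= start:
        raise ValueError("segment end must be after segment start")
    return start, end


# Hypothetical URLs, used only to exercise the parser.
print(parse_temporal_fragment("https://example.org/media/1#t=10,20"))
print(parse_temporal_fragment("https://example.org/media/1#t=1:30"))
```

A player would then seek to `start` and, if `end` is present, stop playback there.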
You may find more helpful information on media fragments here:
As I understand it, the hash is used only as a URI affix, where the URI is the DOI resolved by the centralized DOI gateway. Could you also explain how the affix is formulated and hashed (perhaps sharing the internal spec, if it is open)? I would really appreciate the chance to study your use case, also with a view to implementing openly reproducible PIDs and an affix model based on them, as I described here:
@mvermeyen thank you! It is a good concept, especially because it uses the Merkle DAG concept. I implemented a Git bridge for IPFS, which is also a content-addressed network based on a Merkle DAG. There should be a way to integrate Merkle DAG structures with Git and content-addressed networking directly.
I am also working on a prefix-based concept that allows decentralized resolution of DOIs and all other prefix-based identifiers. Do you know of any similar projects that already exist?
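For readers less familiar with content addressing, here is a small Python sketch of how Git derives the identifier of a file (a "blob", the leaf of its Merkle DAG) from the content alone. It covers only the classic SHA-1 object format and says nothing about how the bridge mentioned above works; the function name `git_blob_id` is made up for this example.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a blob with this content."""
    # Git hashes a short header ("blob <size>" plus a NUL byte) and then the raw bytes.
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


# The same bytes always give the same ID, so anyone can recompute and verify it.
print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (the empty blob)
print(git_blob_id(b"hello world\n"))
```

Because the identifier is a pure function of the bytes, it can be reproduced without asking any central authority, which is the property hash-based PIDs rely on.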
@twdragon https://av.tib.eu/media/ is not really a link, just the start of a URL for explanatory purposes.
The hash suffix in this case is the common standard Media Fragment Identifier (see links in that post). The DOI system ignores everything coming after a hash by default and affixes it to the URL. For example, https://dx.doi.org/10.14454/3w3z-sa82#pdf is transformed to the following URL: https://schema.datacite.org/meta/kernel-4.4/#pdf; in this case the hash just does not do anything. But it could also be used, for example, to jump to a certain caption on an HTML page.
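The carry-over described above can be sketched in a few lines of Python. This does no network lookup: the resolved target URL is passed in by the caller, and the function name `carry_fragment` is hypothetical. Keeping a fragment that the target URL already has is an assumption of the sketch.

```python
from urllib.parse import urldefrag, urlsplit, urlunsplit


def carry_fragment(doi_url: str, resolved_url: str) -> str:
    """Append the fragment of doi_url to resolved_url, if there is one."""
    _, fragment = urldefrag(doi_url)
    if not fragment:
        return resolved_url
    parts = urlsplit(resolved_url)
    if parts.fragment:
        # Assumption: a fragment already present on the target takes precedence.
        return resolved_url
    return urlunsplit(parts._replace(fragment=fragment))


print(carry_fragment(
    "https://dx.doi.org/10.14454/3w3z-sa82#pdf",
    "https://schema.datacite.org/meta/kernel-4.4/",
))
# https://schema.datacite.org/meta/kernel-4.4/#pdf
```

With a media fragment such as `#t=10,20` in place of `#pdf`, the same step hands the timestamp to the page that finally loads, where the player can interpret it.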
So to recap, the TIB AV Portal uses the Media Fragment Identifier standard and combines it with the DOI system’s default settings.