Assigning PIDs to All The Things


I had the pleasure of meeting some of you folks at the RDA and DataCite events the other week. I just wanted to quickly jot down our use case for PIDs: potentially DOIs, though we’re fairly agnostic at this point.

I work at Instruct, an ERIC in the domain of structural biology, on the ARIA development team. We facilitate access to high-end technologies (scanners, microscopes, Diamond Light Source, etc.) and make them available to EU researchers.

We manage proposals as they come through our system, as well as the resulting research products (images, samples, code, papers, and so on). Pretty much the whole process.

We’d like to tag these in real time as they move through the pipeline, both to make the science easier to reuse and reproduce, and to let us do some fun data processing further down the road.

Some of the things we would like to identify:

  • Users
  • Organisations & Facilities
  • Papers
  • Proposals
  • Visits
  • Calls
  • Machines (scanners, microscopes etc)
  • Machine configurations
  • Research inputs
  • Research outputs
    • Images
    • Code
    • Image collections
    • Machine configurations
    • Chemicals
    • Slides
    • Samples
    • Sample collections

For some of these we’d hold the data ourselves; for others (machine configurations, for example), the data would reside with the machine itself and be retrievable over an API using the PID.
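One way to picture this split is a small registry that records, for each PID, whether we hold the record ourselves or whether it lives behind the owning machine’s API. This is a minimal sketch only: the `PidRegistry` and `ResolutionTarget` names, the PIDs, and the microscope endpoint URL are all hypothetical placeholders, not an existing service.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ResolutionTarget:
    """Where the data behind a PID lives (hypothetical model)."""
    pid: str
    local_record: Optional[dict] = None   # set when we hold the data ourselves
    remote_api_url: Optional[str] = None  # set when the data stays with the machine


class PidRegistry:
    """Toy in-memory registry mapping PIDs to where their data resides."""

    def __init__(self) -> None:
        self._targets: dict[str, ResolutionTarget] = {}

    def register_local(self, pid: str, record: dict) -> None:
        self._targets[pid] = ResolutionTarget(pid, local_record=record)

    def register_remote(self, pid: str, api_url: str) -> None:
        self._targets[pid] = ResolutionTarget(pid, remote_api_url=api_url)

    def resolve(self, pid: str) -> ResolutionTarget:
        # Raises KeyError for unknown PIDs rather than silently returning None.
        return self._targets[pid]


registry = PidRegistry()
# Hypothetical PIDs and endpoint, for illustration only.
registry.register_local("10.99999/proposal-0001", {"type": "Proposal", "title": "Example"})
registry.register_remote(
    "10.99999/mc-0042",
    "https://microscope-01.example.org/api/configurations/mc-0042",
)

target = registry.resolve("10.99999/mc-0042")
print(target.remote_api_url)
```

A real resolver would sit behind HTTP and redirect to the remote URL rather than return it, but the bookkeeping is the same: one lookup per PID, with the data location as an attribute of the record.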

Some of the things we’d be tagging are digital, others are physical, and we’re talking about potentially massive numbers of PIDs: for example, every frame of an electron microscope scan and every cell in a sample tray, multiplied by each machine, each facility, and each day.
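To give a feel for minting at this granularity without a round trip to a central counter per frame, here is a sketch that derives a deterministic, hierarchical DOI-style identifier from the facility, machine, day, scan, and frame. The prefix `10.99999`, the suffix layout, and the `mint_frame_pid` function are hypothetical; a real DOI would still have to be registered with a registration agency such as DataCite, which this code does not do.

```python
import re
from datetime import date

# Hypothetical DOI prefix; a real one is assigned by a registration agency.
PREFIX = "10.99999"

# Components may only contain letters, digits and hyphens, so the "." separators
# in the suffix stay unambiguous.
_COMPONENT = re.compile(r"[A-Za-z0-9-]+")


def mint_frame_pid(facility: str, machine: str, day: date, scan: int, frame: int) -> str:
    """Build a deterministic DOI-style identifier for one frame of one scan.

    The same inputs always yield the same identifier, so frames can be
    labelled as they are produced without coordinating a central counter.
    """
    for part in (facility, machine):
        if not _COMPONENT.fullmatch(part):
            raise ValueError(f"invalid identifier component: {part!r}")
    return f"{PREFIX}/{facility}.{machine}.{day:%Y%m%d}.s{scan:04d}.f{frame:06d}"


print(mint_frame_pid("fac01", "em02", date(2019, 4, 3), 7, 123))
# 10.99999/fac01.em02.20190403.s0007.f000123
```

Embedding meaning in the identifier makes it easy to generate in bulk, but PID guidance often favours opaque suffixes because facility or machine names can change; an opaque alternative would mint random suffixes and keep the hierarchy in the metadata instead.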

We’d like to be able to resolve a PID to a machine-readable page, or at least have a way of getting from the PID to one. Ideally we’d also be able to define our own schema extensions.
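For DOIs specifically, one existing route to a machine-readable representation is HTTP content negotiation on the doi.org resolver: for DataCite DOIs, the `Accept` header selects formats such as DataCite JSON or schema.org JSON-LD. The sketch below builds the request with the standard library and only sends it in `fetch_metadata` (which needs network access); the DOI is a placeholder, and the supported media types depend on the registration agency, so check its content negotiation documentation.

```python
import json
import urllib.request

# Media types offered by DataCite's content negotiation (see their docs for the full list).
DATACITE_JSON = "application/vnd.datacite.datacite+json"
SCHEMA_ORG_JSONLD = "application/vnd.schemaorg.ld+json"


def metadata_request(doi: str, media_type: str = DATACITE_JSON) -> urllib.request.Request:
    """Build a request asking the DOI resolver for machine-readable metadata."""
    return urllib.request.Request(
        f"https://doi.org/{doi}",
        headers={"Accept": media_type},
    )


def fetch_metadata(doi: str, media_type: str = DATACITE_JSON) -> dict:
    """Send the request (network required) and parse the JSON body.

    urllib follows the resolver's redirect and keeps the Accept header.
    """
    with urllib.request.urlopen(metadata_request(doi, media_type), timeout=10) as resp:
        return json.load(resp)


req = metadata_request("10.99999/example", SCHEMA_ORG_JSONLD)  # placeholder DOI
print(req.full_url, req.get_header("Accept"))
```

Custom schema extensions would then be a question of what the registered metadata can carry, rather than of how it is fetched.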

We’d also like to be able to link PIDs together in a graph, using either arbitrary verbs or plain links. (This is probably easy enough for us to build on top ourselves, or with the good folks at FREYA, but I’m mentioning it for a complete picture.)
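The linking itself could start as a list of typed edges between PIDs. This sketch stores (subject, verb, object) triples, falling back to a plain `linksTo` verb when no relation is given; the class, the default verb, and the `pid:…` strings are hypothetical placeholders. Where DataCite DOIs are involved, the metadata schema’s `relatedIdentifiers` with a `relationType` (e.g. `IsDerivedFrom`, `IsPartOf`) already covers a fixed vocabulary of such links.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Edge(NamedTuple):
    subject: str  # PID of the source node
    verb: str     # arbitrary relation name
    obj: str      # PID of the target node


class PidGraph:
    """Minimal directed multigraph of PIDs with free-text verbs."""

    PLAIN_LINK = "linksTo"  # hypothetical default verb for untyped links

    def __init__(self) -> None:
        self._out: dict[str, list[Edge]] = defaultdict(list)

    def link(self, subject: str, obj: str, verb: Optional[str] = None) -> Edge:
        edge = Edge(subject, verb or self.PLAIN_LINK, obj)
        self._out[subject].append(edge)
        return edge

    def neighbours(self, pid: str, verb: Optional[str] = None) -> list[str]:
        """PIDs one hop away, optionally filtered by verb."""
        return [e.obj for e in self._out.get(pid, []) if verb is None or e.verb == verb]

    def reachable(self, start: str) -> set[str]:
        """All PIDs reachable from start, following any verb (iterative DFS)."""
        seen: set[str] = set()
        stack = [start]
        while stack:
            for nxt in self.neighbours(stack.pop()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen


g = PidGraph()
g.link("pid:proposal/1", "pid:visit/7", "resultedIn")
g.link("pid:visit/7", "pid:image/123", "produced")
g.link("pid:image/123", "pid:paper/9")  # plain, untyped link
print(g.neighbours("pid:visit/7"))  # ['pid:image/123']
print(sorted(g.reachable("pid:proposal/1")))
```

Keeping verbs as free text keeps ingestion cheap; mapping them onto a shared vocabulary can happen later, when the graph is exported or published.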

We’re not wedded to a particular PID scheme, although we already use ORCID iDs for users and DOIs for publications.

Anyway, that’s a quick overview of our “user story”. In summary, we want to be able to identify pretty much everything, in near real time, issuing lots of PIDs and linking them together in a graph.