Decentralised PIDs - Discussion and Proposals

Hi all,

For starters, the working group is going strong! This level of enthusiasm is exciting to see! Check out the draft prospectus if you’re curious about the group: [Decentralised PID Working Group - Prospectus - Google Docs]. It includes a link to the common notes.

@APfeil I’m reiterating the answer from the thread on this forum so others who stumble across it can join the conversation. It is also documented in the dPID Working Group Common Notes.

Erik S brought a framework for segmenting persistence to my attention after one of his presentations at the national academies. It has been useful in structuring and clarifying my thoughts on the topic. I can’t find the source, so if anyone knows of it, please let me know. The framework splits persistence into 6 distinct points:

  1. Persistence of the payload as a thing
  2. Persistence of the mechanism to handle the payload’s non-persistence
  3. Persistence of the identifier as a thing
  4. Persistence of the binding between the identifier and the payload
  5. Persistence of the service to resolve from the identifier to the payload
  6. Persistence of the service to allow for updating of the binding between identifier and payload

See below for my take on persistence in the dPID system and the places where social contracts may need to be taken into account by designated resources (preferably federated or decentralized). Comments and thoughts are definitely appreciated.

Persistence 2, 3, and 4 are architected to be fully decentralized. Persistence 1 (payload referring to metadata mainly) could benefit from dedicated centralized or federated resources. Persistence 5 and 6 could benefit from federated or decentralized resources. Federated/decentralized resolver infrastructure and an insurance policy on Manifest/DAG storage are important. This could be built into PID minting stacks relatively easily and hosted locally by institutions. Manifest files are small, so this task should have a relatively light footprint. Given the rise of decentralized compute orchestration layers and decentralized content delivery networks, this may put an interesting spin on the role of a federated PID infrastructure.

1. Persistence of the payload as a thing (as best possible): (Meta)data includes both data and metadata. Ensuring the persistence of all data is an impossible task, but we should work to make metadata permanent. Designated IPFS nodes could be useful for Manifest/DAG storage. DeSci Labs is already committed to this functionality.
2. Persistence of the mechanism to handle the payload’s non-persistence: Content drift is eliminated in dPID, and link rot is mitigated for pretty much all cases except file deletion. From a technical standpoint, this is already handled by content-addressable storage networks. Would appreciate thoughts on this, but I don’t really see how Persistence 2 would benefit from a central authority. Once data is completely off a P2P network, there’s not much a central authority can do.
3. Persistence of the identifier as a thing: The PID is stored on DLT, which is as close as we can come to permanent on the internet. A central authority is likely not helpful here. Helping to secure the chain is always appreciated, but blockchains can’t differentiate or discriminate.
4. Persistence of the binding (link) between the identifier and the payload: The CID is an abstracted hash of the underlying content, so no central authority would be helpful there. IPFS uses a distributed hash table to link PIDs to the CIDs. It’s always helpful to have more resources managing this hash table, but they would effectively be contributors in a decentralized network in this capacity, not centralized authorities. IPFS can’t differentiate or discriminate.
5. Persistence of the service to resolve from the identifier to the payload: An open source, deployable application provides resolution of identifiers to their payloads over HTTP. I could see the utility of dedicated providers; however, a small group of central authorities seems restrictive given that IPFS is an open P2P network. Best to federate/decentralize as much as possible on this one through a dPID stack.
6. Persistence of the service to update/CRUD over the binding between identifier and payload: Handled by the protocol, likely better off on an org-by-org level. As long as the manifest file is stored, this should be fine. Different communities will want to handle updating differently, and that’s ok. Best to federate/decentralise as much as possible on this one through a dPID stack.
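
To make points 2 to 5 a bit more concrete, here is a toy Python sketch of the idea. It is not the dPID implementation: `toy_cid`, `ToyRegistry`, and `fetch_verified` are hypothetical names, a plain SHA-256 hex digest stands in for a real IPFS CID, an in-memory dict stands in for the P2P storage network, and a small class stands in for the ledger that binds a PID to its CIDs. It only illustrates why content addressing rules out content drift while still leaving deletion as the remaining failure case.

```python
import hashlib


def toy_cid(content: bytes) -> str:
    """Toy content identifier: hex SHA-256 of the bytes (not a real IPFS CID)."""
    return hashlib.sha256(content).hexdigest()


class ToyRegistry:
    """Hypothetical append-only PID -> [CID, ...] binding, standing in for the ledger."""

    def __init__(self) -> None:
        self._versions: dict[str, list[str]] = {}

    def publish(self, pid: str, cid: str) -> int:
        """Append a new version for a PID and return its version index."""
        self._versions.setdefault(pid, []).append(cid)
        return len(self._versions[pid]) - 1

    def resolve(self, pid: str, version: int = -1) -> str:
        """Return the CID bound to a PID (latest version by default)."""
        return self._versions[pid][version]


def fetch_verified(store: dict[str, bytes], cid: str) -> bytes:
    """Fetch bytes by CID and check that they still match the identifier."""
    content = store[cid]  # a KeyError here models link rot: nobody holds the bytes
    if toy_cid(content) != cid:
        raise ValueError("content does not match its identifier")
    return content


# Assumed example data: two versions of a small manifest under one PID.
store: dict[str, bytes] = {}
registry = ToyRegistry()

v0 = b'{"title": "Dataset A"}'
v1 = b'{"title": "Dataset A (revised)"}'
for manifest in (v0, v1):
    cid = toy_cid(manifest)
    store[cid] = manifest
    registry.publish("pid-1", cid)

# Older versions stay resolvable; the latest version is the default.
assert fetch_verified(store, registry.resolve("pid-1", 0)) == v0
assert fetch_verified(store, registry.resolve("pid-1")) == v1
```

In this sketch, updating a PID never overwrites anything: it appends a new CID, so earlier versions keep resolving for as long as someone stores the bytes. That last condition is why dedicated nodes for Manifest/DAG storage matter.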

The group was in general agreement with this sentiment. Additional discussion is in the Google Doc if you’re curious. @twdragon should be sharing the recording of that call shortly.


@sharif.islam Those are excellent points and questions.

We also need to engage our user community and funding organisations to better understand the problems we are trying to solve.

Totally agreed. In addition to universities and funding agencies, we should be bringing other prominent FAIR organizations into these discussions (FDO, RDA, GFF, CODATA, etc.). Any thoughts on how we can bring in these three groups?

Here, I would like to summarise a few high-level points that can facilitate productive discussion, rather than framing the discussion as centralised versus decentralised. Most decentralised technologies still rely on centralised governance models.

In essence, we need to think about types of decentralisation and hybrid implementations, taking into account the context of other components in the infrastructure. PID is just one aspect.

I couldn’t agree more. Governance for the PID system will need to be considered in depth, and you’re right about centralized governance. The DeSci Foundation is standing up a governance committee right now. It would be good to have the governance committee tightly coupled with a technical steering committee. I know that is on the horizon for the Foundation. The experience of members on this forum thread would be invaluable there.

I’m sure Philipp would appreciate discussion on this topic, so reach out to him if you’d like to discuss.

  1. Keeping data as close as possible to the compute and analytics platform, while also allowing for local domain ownership. This can change where and how the PID infrastructure interacts with the data and compute part.

I could see a case for repositories to play a more important role in the governance and operations of a PID system that integrates more closely with data through decentralized technologies like Bacalhau.

I’ve broached this topic in the past and have met quite a bit of pushback. Nobody disagrees with the concept, but it seems to be too early for this discussion. Cyber concerns, architecture, privacy, cost, etc. all get brought up regularly (and are valid concerns). We should keep this edge compute functionality in mind, but focus on communicating more basic concepts like persistent folder structures.

  1. Can we think of a hybrid solution that takes advantage of centralised governance and data aggregation and the independence of decentralised model?

This is a very good question. While I don’t have an immediate answer, I would love to explore your thoughts. Do you have any specific answers in mind?

  1. Leveraging the existing trust models of centralised organisations. Trust needs to accommodate complex social and local contexts; cryptographic verification is just one aspect.

One of the main points of discussion thus far in the dPID working group has been how we can build on these existing trust models. One concern that has been expressed is that current trust models tend to conflate the existence of a PID with the ability to trust information (seeing PIDs as a “stamp of science”). While there is definitely a correlation between the two, it is not causation, and assuming otherwise can be dangerous. Do you have any thoughts on the idea of using the scalability of IPFS-based PIDs to build a more explicit verification and social trust layer after minting? We talked about doing that through annotations and attestations in this video.