Please give us your feedback

Over the second half of 2019, a small taskforce has come together to draft a policy on persistent identifiers (PIDs) for the European Open Science Cloud (EOSC). Representatives from the EOSC FAIR and EOSC Architecture Working Groups have joined forces to outline what is meant by ‘persistent identifier’ and which key features PIDs must have in order to support FAIR research across EOSC.

The initial draft has been uploaded to Zenodo (https://doi.org/10.5281/zenodo.3574203).

How can I respond or comment?
Please share your responses with the taskforce and the wider PID community by creating a topic in this category. We strongly encourage wider discussion of the content and implications of this policy in this space. You can also email your responses to inform-fair-wg@eoscsecretariat.eu.

We will be presenting on this draft at PIDapalooza in Lisbon on 29th January 2020. If you are attending, we will be gathering face-to-face feedback there. We will announce further opportunities for face-to-face feedback here and on the EOSC FAIR Working Group Blog.

Where can I find out more?
More details, including next steps, will be given in a blog post to be published here.


Hello, I have read the EOSC PID Policy and it makes sense. It chimes pretty much exactly with what I understand a PID to be. I like the phrase “kernel information”, used to describe the core, machine-readable metadata associated with the PID. In the case of the DataCite DOI, for example, does the kernel information mean the metadata stored in the DataCite registry, using the DataCite metadata schema, or does it refer to a cached subset of that metadata?

Section 5.3 of the policy reads:

“Applications require secure mechanisms built-in PID Infrastructures and some applications require encryption of PIDs to protect activities.”

I don’t know what this means. It seems to be at odds with the openness that should be fundamental to any PID system, particularly ones that support open research. It may well be that you have discussed particular use-cases that require a level of security. However, is the PID layer the best place to implement such measures? Can you clarify this?

Thanks!

David Kane

Many thanks for your positive feedback David!

I believe 5.3 mentions secure mechanisms that are more closely related to 5.4 and the need to ensure that only appropriate parties are able to make changes to PIDs. But I agree it’s not very clear in the text as it stands. I’ll flag this for us to review.

Thanks, Rachael.

It’s great progress to have a PID policy in EOSC. Would it be possible to cooperate with experts from China? We are developing PIDs as part of the infrastructure for research data. Could we contribute to this draft? We think it’s a good opportunity to move forward in a coordinated way.

Hi. First, it’s great to have the policy available in draft for public comment and it’s a terrific document so thank you very much for that. Second, I have a few Qs about the policy:

  1. It says that PID providers need to be certified on a regular basis to ensure they fit with the principles and the kernel requirement - who or what is the certification body? What is the process?
  2. It says EOSC needs to be part of the governance structure for PIDs. This comes across as a broad and non-specific statement: what is the governance structure being referred to? Is there going to be a new governance body on PIDs for EOSC? Or does it mean EOSC being represented on PID service provider governance boards? Or… ?
  3. The PID kernel information and the API to access it: will these be built by EOSC? Or are PID providers expected to build them (as part of certification, perhaps)? Is the PID kernel information drawing on the work being done by the RDA PID Kernel WG?

Thanks,
Natasha
Australian Research Data Commons

Thanks Liujia, I’ll highlight your offer to the working group and get back to you - the Policy is in support of European work, but that doesn’t mean we can’t have a more internationalised version later - or add more of that international context to support the policy.

Hi Natasha, thanks for taking a look!
My initial answer to your questions:

  1. The certification process and body are not quite in scope for the policy, this is to be fleshed out by a subsequent implementation plan. But we could make that clearer. We would like to get further comment on this aspect in general - even if some feedback we get is fed into implementation rather than the policy itself.
  2. I believe we intended this statement to mean the latter - EOSC rep on service provider governance boards. We’d love to know what the community reaction to that is, but you’re right, we could clarify this statement better in the current text.
  3. PID providers would be expected to provide this, and it could be an aspect of certification. We are certainly liaising with the PID Kernel WG at RDA: Tobias is one of our co-authors and has been key in developing policy points around the PID Kernel.

Thanks! Rachael.


Firstly, thank you @RachaelK and the other authors of this first draft for your work - it is already a well-written document.

I’m commenting mostly from the perspective of how the PID policy relates to software objects.

  1. It is good to see the policy explicitly including (in 6.1) software as a research entity which should be identified using a PID.

  2. I think there are still challenges for ensuring that non-data digital research objects can implement FAIR (particularly the I). This document is primarily looking at ensuring the objects can be “reliably found, used by humans and machines, […] and be cited” (2.4). I think this is an appropriate approach for now, but that work more generally needs to be done to interpret the FAIR principles for other types of digital object.

  3. It is important that PIDs for software support granularity (5.5) and versioning (5.6). However, one thing that isn’t clearly stated is that, to support this, it should be possible to link PIDs provided by different PID Authorities so that the linkage can be identified in a PID graph. An example for software might be that you wish to interrelate a DOI used to identify a specific version of a software package with the Software Heritage PID for the package, and also for the individual files, or even functions. So PID linkage for software is required at the conceptual object level as well as to support granularity.

  4. How would you apply this policy to provide a PID for software-as-a-service or a complex digital object? For example, to refer to an instance of an online analysis suite? Or an executable notebook with parts that are hosted separately from the notebook itself? Is this covered by “levels of granularity appropriate to community best practice and use cases should be provided, while allowing for flexibility to respond to how those needs and practices will evolve”?

  5. Should metadata embodied in PIDs be made explicitly public domain / CC0 to cover use in jurisdictions which allow for copyright of unoriginal databases? This would also explicitly signal that the metadata is available for reuse (even if it was already possible to do that). At what point (if ever) might a PID Service Provider be considered to have amassed a collection of data sufficient to be a creative work (cf. genetic databases, ancestry databases)?
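Point 3 asks for links between PIDs from different PID Authorities that can be followed as a PID graph, both at the conceptual level and down to finer granularity. The sketch below illustrates that idea under stated assumptions: the graph is a hypothetical in-memory structure (not any existing PID Graph service), every identifier is a placeholder, and the relation labels are only loosely modeled on DataCite relation types such as `IsIdenticalTo` and `HasPart`.

```python
from collections import defaultdict, deque

# Hypothetical placeholder PIDs (not real registrations).
DOI_V1 = "doi:10.1234/example-tool.v1"
SWH_REL = "swh:1:rel:" + "a" * 40
SWH_DIR = "swh:1:dir:" + "b" * 40
SWH_CNT = "swh:1:cnt:" + "c" * 40


class PIDGraph:
    """Typed, directed links between PIDs issued by different PID Authorities."""

    def __init__(self):
        self.edges = defaultdict(list)  # source PID -> [(relation, target PID)]

    def link(self, source, relation, target):
        self.edges[source].append((relation, target))

    def related(self, start, relations=None):
        """Breadth-first walk over outgoing links, optionally limited to some relation types."""
        seen, queue = {start}, deque([start])
        while queue:
            pid = queue.popleft()
            for relation, target in self.edges.get(pid, []):
                if relations is not None and relation not in relations:
                    continue
                if target not in seen:
                    seen.add(target)
                    queue.append(target)
        return seen - {start}


graph = PIDGraph()
# Conceptual level: versioned DOI linked to the archived release (illustrative relation).
graph.link(DOI_V1, "IsIdenticalTo", SWH_REL)
# Granularity: release -> source tree -> individual file.
graph.link(SWH_REL, "HasPart", SWH_DIR)
graph.link(SWH_DIR, "HasPart", SWH_CNT)

print(sorted(graph.related(DOI_V1)))                      # all three SWHIDs
print(graph.related(DOI_V1, relations={"IsIdenticalTo"}))  # only the release
```

Keeping the relation type on each edge is what lets a client distinguish "same object, different authority" from "part of" when it walks the graph.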

Thanks for this useful document!

In Section 6, which describes the PID types, it is important to add “intrinsic identifiers”, to take into account several categories of identifiers that are broadly used in important communities.
This includes the cryptographic hashes found in version control systems used by tens of millions of software developers worldwide in their daily work, and in particular the SWH-ID identifiers (https://docs.softwareheritage.org/devel/swh-model/persistent-identifiers.html) that are now available for 20 billion software artifacts in the Software Heritage archive (see https://hal.archives-ouvertes.fr/hal-01865790 for a discussion and https://www.softwareheritage.org/save-and-reference-research-software/ for detailed usage guidelines.)

In Section 7, which describes the PID services, a mention could be added concerning the services that are specific to intrinsic identifiers. We propose adding the following paragraph to 7.2:
“Service providers can also provide advanced services for intrinsic identifiers (e.g. PID verification, comparison and reverse lookup).”
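Verification, comparison and reverse lookup come naturally to intrinsic identifiers, because anyone holding the object can recompute its identifier. A minimal Python sketch for file contents, assuming the SWHID scheme linked above, in which a content identifier is `swh:1:cnt:` followed by the Git-compatible SHA-1 blob hash; the helper names and the in-memory `index` used for reverse lookup are hypothetical.

```python
import hashlib


def swhid_for_content(data: bytes) -> str:
    """SWHID of a file's raw bytes: "swh:1:cnt:" + Git blob SHA-1.

    Git hashes a blob as SHA-1 over the header b"blob <size>" plus a NUL byte,
    followed by the file bytes.
    """
    header = b"blob %d\x00" % len(data)
    return "swh:1:cnt:" + hashlib.sha1(header + data).hexdigest()


def verify(swhid: str, data: bytes) -> bool:
    """PID verification: recompute the identifier from the object and compare."""
    return swhid_for_content(data) == swhid


def same_content(a: bytes, b: bytes) -> bool:
    """Comparison: objects are treated as identical when their identifiers match."""
    return swhid_for_content(a) == swhid_for_content(b)


# Reverse lookup (hypothetical index): identifier -> places the object was seen.
index = {}


def record(data: bytes, location: str) -> str:
    swhid = swhid_for_content(data)
    index.setdefault(swhid, []).append(location)
    return swhid


print(swhid_for_content(b""))  # swh:1:cnt:e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Directory, revision and release SWHIDs are built from Git-style tree, commit and tag objects and are not covered by this sketch.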

Greetings. In general I think it’s very useful and tries to strike the balance of being appropriately general & specific; and does a good job of acknowledging the variety of PID types. I do have a couple of comments for clarification:

3.3.1.3 “In some cases absolute fixity of the referent is required”: an example is given, but no overall criteria for when this would apply. That may be fine if the idea is to let various stakeholders use their judgment on this, but if you have particular ideas or criteria, it could be useful to state them explicitly.

4. Roles & Responsibilities
This all makes sense; I also wonder whether there are formal groups within EOSC whose roles should be articulated. Some of these are mentioned in the Rationale, but I wonder if that information is better placed (or even repeated) in section 4. And there may be other EOSC examples as well.

5.5 “Granularity of PIDs is very much dependent on the communities and it will change over time.”
This is true (as is the paragraph in general), but I also think that there would be a role for standards here, in setting rules/conventions for what level of granularity should be used (i.e., would this not be the role of the PID Authority?).

7.7 “certified based on agreed rule sets”: how would those differ from the principles set out in this document? (I see someone else has already raised the question of the certification body; NB, it would be useful to reference documents on the certification examples: DONA, ePIC, DOI.)

8.1 The “EU research community needs to be represented in the governance structure…”: I see Natasha commented on this as well. I assumed you meant the governance structure of the Service Providers, but some questions could be:
– if, as Natasha thought, the aim is to ensure compliance with the policy, simply involving someone from the EU research community (who may or may not have a connection with EOSC or knowledge of this policy) may not achieve that goal
– if this refers to membership on Boards of Directors (which many PID Service Providers do have), there’s ambiguity in how to define “research community”: counting only active researchers would exclude valuable people involved in research infrastructure. This may not need to be defined at this point, but I wonder if it could become a point of debate

And some minor editing comments:

  • 7.1 I believe that would be Infrastructures plural if you’re referring to ERICs as a named concept.
  • Policy is capitalized in some places and not others
  • Fully alphabetising the glossary would be useful. Another concept that might be useful to include in the glossary: technology readiness levels.

See also the comments from Alex Hardisty in the topic “Comments by Alex Hardisty (Cardiff University) on EOSC PID Policy draft”.

Thanks everyone for your replies, keep them coming!

For those of you lucky enough to be in Lisbon for PIDapalooza this week: we have a session on the policy and will be taking further feedback during a few of the breaks.

The main PID policy session will be at 12.00 on Wednesday (29th) in the Sophia de Mello Room.


Very good document.

One remark about “8.2”: I think the word “justifiable” may not be touching the right economic principle here. One problem with it is that it is partially subjective, and this is compounded by the fact that potential users may underestimate the value provided by the service, and possibly underestimate how much effort it takes to provide the necessary functions. The result: although the costs of the service may be justifiable, potential users may not be able to justify it to their project, and choose inferior solutions.

It may be wise to consider building the economics on providing the service at ‘marginal’ costs (possibly with a reasonable markup). This may require that some base cost would be covered by other, fixed, infrastructure funds.

Indeed a very good document, but I miss an explicit mention of using PIDs as linked data URIs by way of content negotiation. Content negotiation is implicitly hinted at in section 3.4, Resolvable. Subsection 3.4.1 states: “There can be two intentions of PID resolution. A PID is resolvable when it allows both human and machine users to access: 3.4.1.1. An object or its representation: This would either allow direct access to its assigned object or representation, or information on how the object can be accessed. 3.4.1.2. Kernel Information: A global resolution system should support access to Kernel Information from its PID.”

A PID resolver should be able to distinguish between a) human and machine users, and b) the requested representations (plural!). This is done by the mechanism of content negotiation, where the resolver decides whether to redirect to a human-readable landing page, a specific digital representation/version of the identified object, or the metadata about the object in various formats (RDF, JSON, etc.). Linked data services are only possible if one or more RDF formats can be requested.

Currently the main PID systems (Handle, DOI, even ARK) do not support real content negotiation. I think the policy should definitely aim to persuade the relevant PID Authorities to put this on their roadmap.
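Content negotiation at the resolver can be sketched in a few lines. The following is a minimal, framework-free Python illustration, assuming a hypothetical registry that maps one PID to several representations (landing page, RDF/Turtle metadata, JSON-LD metadata, data download); it parses the HTTP `Accept` header, honours q-values and wildcards, and returns the URL to redirect to.

```python
# Hypothetical registry: the PID, media types and URLs are placeholders.
REPRESENTATIONS = {
    "pid:example/123": {
        # The first entry doubles as the default (human-readable landing page).
        "text/html": "https://repo.example.org/landing/123",
        "text/turtle": "https://repo.example.org/meta/123.ttl",
        "application/ld+json": "https://repo.example.org/meta/123.jsonld",
        "application/zip": "https://repo.example.org/files/123.zip",
    }
}


def parse_accept(header):
    """Split an HTTP Accept header into (media_range, q) pairs, highest q first."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        if not fields[0]:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges.append((fields[0].lower(), q))
    # sorted() is stable, so client order is kept among equal q-values.
    return sorted(ranges, key=lambda r: r[1], reverse=True)


def matches(media_range, media_type):
    """True if a media range such as */*, text/* or text/turtle covers media_type."""
    if media_range == "*/*":
        return True
    if media_range.endswith("/*"):
        return media_type.split("/")[0] == media_range[:-2]
    return media_range == media_type


def resolve(pid, accept="*/*"):
    """Return the URL to redirect to, or None (an HTTP 406 in a real resolver)."""
    offered = REPRESENTATIONS[pid]
    for media_range, q in parse_accept(accept or "*/*"):
        if q <= 0:
            continue  # q=0 means "not acceptable"
        for media_type, url in offered.items():
            if matches(media_range, media_type):
                return url
    return None


print(resolve("pid:example/123", "text/html,application/xhtml+xml,*/*;q=0.8"))  # landing page
print(resolve("pid:example/123", "application/ld+json;q=0.9, text/turtle"))  # Turtle metadata
```

This simplification ranks media ranges by q-value only; a production resolver should apply the full precedence rules of HTTP content negotiation (e.g. a more specific `text/html;q=0` overriding `*/*`) and send `Vary: Accept` on its redirects.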

Lukas Koster
Digital Infrastructure Coordinator
Library of the University of Amsterdam

Dear colleagues,

Thank you for drafting this policy document and for sharing it with the PID community for feedback.
I would like to emphasize the crucial role that ISSN and the ISSN portal (portal.issn.org) play regarding the identification and preservation of digital journals by disseminating free metadata and providing a reliable and sustainable system based on ISO standard 3297, which is currently under revision. The ISSN Network has been around for more than 40 years and comprises 90 registration agencies around the world. The ISSN International Centre is the Registration Authority appointed by ISO.
The ISSN is a PID that resolves through URNs and EMBL-EBI compact identifiers. Kernel metadata provided by the ISSN portal is available in various linked data formats (RDF/XML, JSON, Turtle). Full metadata, including links between serial publications (print and digital), can be downloaded via a REST API. Since December 2019, the ISSN portal has taken over the Keepers Registry service, which aggregates preservation information submitted by 13 archiving organisations. For more information, please refer to the presentation given at PIDapalooza 2020, posted on the ISSN Slideshare account, or to our websites (portal.issn.org; www.issn.org).

Best regards,

Dr. Gaelle BEQUET
Director
ISSN International Centre


Hi,
First of all, thank you for the opportunity to take part in this process. I think it’s a really good document overall. One thing that was new to me, and which I found very interesting, was the use of technology readiness levels.

I just have one minor comment:
Point 7.7, while valid, is not clear (to me, at least) about which service providers it applies to.