Concerns and Benefits: PIDs for secondary publications in repositories

Frauke · November 14, 2024, 7:52am

Hi all,
we had some discussions with repository managers here in Germany who had concerns about assigning DOIs to secondary publications in their repositories (secondary publications: e.g. articles that were first published in a traditional journal and later published in an open access (institutional) repository). Their primary concern was that citations, and with them the impact factor, could be fragmented if an article could be cited with two separate DOIs. I wrote a blog article about this and am very interested in your views on this issue. Is this a common concern you have heard about? How do others approach the identification of secondary publications?

CShillum · November 16, 2024, 3:24am

Not a response to your question, but a note that “secondary publications” is being used here in a different sense than the traditional one. Usually, “secondary publications” refers to publications that provide commentary on or analysis of primary publications, e.g. review articles (see e.g. Primary, Secondary, and Tertiary Sources | University of Minnesota Crookston).

The terminology I more usually hear for the situation you are describing is “alternative copies” or “alternative versions”.

Alessandra · March 20, 2025, 11:04am

Dear Frauke,

Here is our use case. For a few months now, we have been providing each record on infoscience.epfl.ch with a Handle (previously, we only had an internal unique reference number and the OAI identifier). We also assign a DOI (DataCite) to primary publications that do not already have one from other services (in particular, our doctoral theses).

We share your doubts about whether it is worthwhile to assign a DOI to secondary publications as well.

The conversation is still ongoing on our end. The issue of fragmented citations could be addressed with DataCite’s ‘relatedIdentifier’ field, as you rightly mention in your blog post. The repository version could also offer the open access full text of the AAM, potentially attracting new citations.
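To make the ‘relatedIdentifier’ idea concrete, here is a minimal sketch in Python that builds the linking part of a DataCite REST API payload for a repository copy. The DOIs are hypothetical placeholders, the payload deliberately omits required fields such as creators, titles, publisher and publicationYear, and the choice between `IsIdenticalTo` (unchanged copy) and `IsVersionOf` (e.g. an AAM) is an assumption each repository would have to settle in its own policy.

```python
import json

# Hypothetical placeholder DOIs, not real identifiers.
REPOSITORY_DOI = "10.1234/repo.5678"
PUBLISHER_DOI = "10.5555/journal.2024.001"


def secondary_publication_payload(repo_doi, vor_doi, identical=False):
    """Build a partial DataCite JSON payload linking a repository copy to the VoR.

    relationType values come from the DataCite vocabulary; which one fits
    (IsIdenticalTo vs. IsVersionOf) is a policy decision, passed in as a flag.
    Required fields such as creators, titles, publisher and publicationYear
    are intentionally left out of this sketch.
    """
    relation = "IsIdenticalTo" if identical else "IsVersionOf"
    return {
        "data": {
            "type": "dois",
            "attributes": {
                "doi": repo_doi,
                "types": {"resourceTypeGeneral": "Text"},
                "relatedIdentifiers": [
                    {
                        "relatedIdentifier": vor_doi,
                        "relatedIdentifierType": "DOI",
                        "relationType": relation,
                    }
                ],
            },
        }
    }


print(json.dumps(secondary_publication_payload(REPOSITORY_DOI, PUBLISHER_DOI), indent=2))
```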

On the other hand, there are other considerations to take into account. Assigning a DOI brings many value-added services through the metadata registries. But for publications that already have a “primary” DOI, aren’t these services already provided?

There is also the issue of how researchers perceive the “secondary” DOI. As a librarian, I see that many already struggle to understand the proper use of DOIs on ResearchGate or Zenodo.

Finally, there’s the not insignificant question of cost!

I’m very interested in exchanging views on this topic and am available for an “ask me anything”!

Charlie · March 20, 2025, 5:12pm

Hello,

I believe that there should be only one PID of a specific kind per object to identify (e.g., 1 person = 1 ORCID ID, 1 version of an article = 1 DOI, although there can be a handle too as it’s a different identifier). I’ve been promoting this idea in the workshops on PIDs that I give to researchers at my institution. My goal is to prevent them from turning to other publishing platforms to get multiple DOIs for a single article.

In the case of our institutional repository, we have a strict rule not to assign a new DOI to an article that already has one. In our experience, the versions deposited in the repository are usually the same as the published ones. The repository simply becomes another access point to the article.

I’m unsure whether a copy in which only the cover page and page numbers differ slightly justifies being called a different version. Anyone wanting to look up a citation is more likely to search for it with Ctrl+F than to navigate by page number.

I’m curious to know people’s thoughts on this. Do you think multiple DOIs for minor version differences are justified? Is a slight difference in page number considered a major or minor difference?

jsicot · March 25, 2025, 10:01am

Hi all,

A valid reason to assign a DOI to an alternative version of an article, even if the published version already has a publisher-assigned DOI, is to promote open access and ensure broader dissemination of research. By making freely available versions, such as an Author Accepted Manuscript (AAM) or a preprint, more discoverable, citable, and trackable, this practice aligns with open science principles and facilitates access for researchers who may not have subscriptions to paywalled content.

When metadata properly describe relationships between versions, linking preprints or AAMs to the publisher’s Version of Record (VoR), citation fragmentation should be avoided, ensuring that all references ultimately connect to the final published work.
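Relationship metadata only reduces fragmentation if the tools that count citations actually follow it. The sketch below, assuming each alternative version carries a single link to the VoR it belongs to, rolls per-identifier citation counts up to the VoR. All identifiers and counts are invented for illustration, and the `resolve` and `consolidated_counts` helpers are hypothetical, not part of any citation service.

```python
from collections import defaultdict

# Hypothetical version links: each alternative version points to its VoR.
VERSION_OF = {
    "10.1234/preprint.1": "10.5555/vor.1",
    "10.1234/repo-aam.1": "10.5555/vor.1",
}

# Invented per-identifier citation counts, for illustration only.
CITATIONS = {
    "10.5555/vor.1": 40,
    "10.1234/preprint.1": 12,
    "10.1234/repo-aam.1": 3,
    "10.5555/vor.2": 7,
}


def resolve(pid, links):
    """Follow version links until reaching an identifier with no outgoing link.

    The `seen` set stops the loop if the links accidentally form a cycle.
    """
    seen = set()
    while pid in links and pid not in seen:
        seen.add(pid)
        pid = links[pid]
    return pid


def consolidated_counts(citations, links):
    """Sum the citations of all versions under the identifier they resolve to."""
    totals = defaultdict(int)
    for pid, count in citations.items():
        totals[resolve(pid, links)] += count
    return dict(totals)


print(consolidated_counts(CITATIONS, VERSION_OF))
# {'10.5555/vor.1': 55, '10.5555/vor.2': 7}
```

Keeping the per-identifier counts alongside the rolled-up totals means a citation can still be attributed to the version that was actually cited.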

This approach does not create unnecessary duplication but rather complements publisher-assigned DOIs by integrating open-access versions into the global scholarly communication network.

pedrohenriquenopid · April 26, 2025, 2:00am

This is a very interesting discussion!

My master’s thesis specifically addresses the topic of citation dilution. I want to understand how many citations, on average, a journal might lose when it publishes an article that already existed as a preprint — especially if readers continue to cite the preprint after the final version has been published.
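One simple way to operationalize this question, offered here only as a hypothetical illustration and not as the methodology of the study mentioned in this post, is to count citations that still point to the preprint after the VoR has appeared, then average over articles. The `Article` and `Citation` structures and the sample data are invented for the example.

```python
from dataclasses import dataclass
from datetime import date
from statistics import mean


@dataclass
class Citation:
    cited_pid: str  # identifier the citing work actually referenced
    cited_on: date  # publication date of the citing work


@dataclass
class Article:
    preprint_pid: str
    vor_pid: str
    vor_published: date
    citations: list  # list of Citation objects


def diverted_citations(article):
    """Count citations to the preprint made after the VoR was published."""
    return sum(
        1
        for c in article.citations
        if c.cited_pid == article.preprint_pid and c.cited_on > article.vor_published
    )


def mean_diverted(articles):
    """Average number of diverted citations per article."""
    return mean(diverted_citations(a) for a in articles)


# Invented sample data for illustration only.
articles = [
    Article("pp1", "vor1", date(2023, 1, 1), [
        Citation("pp1", date(2022, 6, 1)),   # before the VoR: not counted
        Citation("pp1", date(2023, 3, 1)),   # after the VoR: counted
        Citation("vor1", date(2023, 4, 1)),  # cites the VoR: not counted
    ]),
    Article("pp2", "vor2", date(2023, 1, 1), [
        Citation("pp2", date(2023, 2, 1)),
        Citation("pp2", date(2024, 1, 1)),
    ]),
]

print(mean_diverted(articles))  # 1.5
```

Whether citations made on the same day as the VoR count as diverted (`>` versus `>=`) is one of several choices such a metric would need to fix explicitly.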

My Study:
Initial title: “Study on Citation Dilution in Articles Derived from Preprints Published in Health Sciences Journals” (preview available on ResearchGate: https://www.researchgate.net/publication/382652349_Study_on_Citation_Dilution_in_Articles_Derived_from_Preprints_Published_in_Health_Sciences_Journals).

Revised methodology: although I’ve changed the methods, the draft still offers valuable insights into the citation dilution phenomenon.

I agree with @CShillum’s suggestion to refer to preprints and final articles as alternative versions or alternative copies.
I also agree with @Charlie : if the content is identical, it should have the same DOI (Digital Object Identifier). The repository would simply be another access point.

It only makes sense to assign a new PID (Persistent Identifier) when the text changes substantially — for instance, in the case of preprints, after peer review that adds or removes information.

Challenges in Citation Counting:
In practice, Crossref does not combine the citations of preprints and final articles, even when they are properly linked (see the archived email thread “[Crossref] Re: Citations” on the Internet Archive). I’m not sure about DataCite.

Tools like Google Scholar attempt to group citations of different versions of the “same work,” but databases like Web of Science and Scopus do not consolidate them (Redirecting).

Personally, I find it problematic to merge citations. If someone cited version X, the citation should count for X — not version Y. In the case of preprints, for example, the cited content may have been changed or removed in the final article. I believe the reference and citation count should belong to the specific version that was actually read.

This is a very compelling conversation — thank you for bringing it up, @Frauke ^-^

Additional readings: