DataCite @ Stanford

amyhodge · June 7, 2019, 10:31pm

Hodge_DataCite2019.pdf (1.8 MB)

This Use Case was shared at the DataCite General Assembly in April. This is the text that accompanied the slides. As a new user I apparently do not have permission to upload files (or so it told me). I’ll see if I can find a workaround for that, though most of what you need is in this text and not in the slides.

Slide 1: I’m Amy Hodge and I work for Stanford University Libraries in the Science and Engineering Resource Group. Part of my job is to help researchers on campus use the Stanford Digital Repository, which is a service built and run by Stanford Libraries. In addition, Hannah Frost, who is part of our digital library group, manages our campus DOI service.

Slide 2: The Stanford Digital Repository, or SDR, is designed for digital content management, publishing, and preservation. It’s supported by a team of about 20 developers, operations managers, project managers, and service managers. The SDR has been in production since 2006 and holds nearly 600TB of total content and close to 2 million objects. Most of these are materials owned by the libraries and are core parts of our library collections, but the SDR does directly serve people across the university as well. This is the home page for our online deposit application, which is what most researchers on campus use to deposit data or other scholarly content into the SDR.

Slide 3: This graph shows that since January of 2013, when our very first data deposit was completed, we have had nearly 20TB of content and 466 items deposited in the SDR. We know this is a mere fraction of what is produced on campus; for the 2018-2019 academic year alone, Stanford has some 6000 sponsored projects with a total budget of $1.63 billion. We would love to capture more of the outputs from this research in our repository.

As this is the 10th anniversary of DataCite, it’s clear that the SDR has been in production longer than DataCite has been around. So, when the SDR was built, the architects included the ability for us to generate unique identifiers for all of our content. These are referred to as Digital Repository Unique Identifiers, or DRUIDs.

Slide 4: A DRUID is an 11-character, randomly generated, alphanumeric string. Nearly all of our repository content is made available on the web at Persistent URLs, or PURLs, that include these DRUIDs. When the DRUID is part of a URL, it looks somewhat like a URL for a DOI; however, it functions slightly differently. The PURL is the actual location of the content, while the DOI link redirects people to the content elsewhere on the web. In addition, the DOI is a global identifier, while the DRUID is strictly local. When we create DOIs for our digital repository content, the idea is that we would use the DRUID as the suffix so that we know immediately which piece of our repository content that DOI is assigned to.
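To make the relationship between DRUID, PURL, and DOI concrete, here is a minimal Python sketch. It is an illustration only and not the SDR's actual code: the PURL host, the DOI prefix `10.12345`, the sample DRUID `bb123cd4567`, and all function names are assumptions made for this example. The DRUID check enforces only what the talk states (11 alphanumeric characters).

```python
import re

# Assumptions for illustration only; none of these values come from the talk.
PURL_BASE = "https://purl.stanford.edu"  # assumed PURL host
DOI_PREFIX = "10.12345"                  # placeholder prefix, not a real registrant prefix
DOI_RESOLVER = "https://doi.org"         # public DOI resolver

# The talk describes a DRUID only as an 11-character alphanumeric string,
# so that is all this pattern checks.
DRUID_PATTERN = re.compile(r"[A-Za-z0-9]{11}")


def is_druid(value: str) -> bool:
    """Return True if value looks like a DRUID as described in the talk."""
    return DRUID_PATTERN.fullmatch(value) is not None


def purl_for(druid: str) -> str:
    """Build the local persistent URL where the content itself lives."""
    if not is_druid(druid):
        raise ValueError(f"not a DRUID: {druid!r}")
    return f"{PURL_BASE}/{druid}"


def doi_for(druid: str, prefix: str = DOI_PREFIX) -> str:
    """Build a DOI that uses the DRUID as its suffix."""
    if not is_druid(druid):
        raise ValueError(f"not a DRUID: {druid!r}")
    return f"{prefix}/{druid}"


def doi_url_for(druid: str, prefix: str = DOI_PREFIX) -> str:
    """Build the global DOI link, which redirects to the registered URL."""
    return f"{DOI_RESOLVER}/{doi_for(druid, prefix)}"


def druid_from_doi(doi: str) -> str:
    """Recover the DRUID from a DOI whose suffix is the DRUID."""
    _prefix, _sep, suffix = doi.partition("/")
    if not is_druid(suffix):
        raise ValueError(f"DOI suffix is not a DRUID: {doi!r}")
    return suffix


if __name__ == "__main__":
    druid = "bb123cd4567"  # hypothetical DRUID
    print(purl_for(druid))
    print(doi_url_for(druid))
    print(druid_from_doi(doi_for(druid)))
```

Because the suffix is the DRUID itself, going from a DOI back to the repository object is a string split rather than a database lookup, which is the point of the convention described above.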

We have been operating our repository since 2006 without using DOIs, so what made us – about a year ago now – become DataCite Members?

Slide 5: There are a number of reasons, one of which is that DOIs have become the gold standard for uniquely identifying research data. And as a top-tier research institution, we want to be using the gold standard.

Another is that over the last few years we’ve gotten more people on campus coming to us saying, “My publisher is requiring me to get a DOI for my data.” Early on, these statements actually weren’t true. Most of the official statements from these publishers required a unique identifier, which we could provide, but not a DOI specifically. However, over time this has been changing.

Springer Nature’s Research Data Policy indicates you can use an institutional repository for your publication-related data “if they are able to mint DataCite DOIs.”
https://www.springernature.com/gp/authors/research-data-policy/other-repositories/12327120

Science Magazine’s editorial policies state that if you can’t find a specialized repository for your data, then “an archived static version should be deposited at a general repository and the DOI included in the Science Journals paper,” implying that a DOI is required.
https://www.sciencemag.org/authors/science-journals-editorial-policies

At issue for us was that Stanford authors publishing in these journals could not use the SDR as the repository for their publication-related data. Once researchers have turned to some other non-Stanford, general-purpose repository, they may continue to use that service and never come to us, because that other service is familiar to them. By offering DOIs, we could remove this impediment to SDR use and users would hopefully “flock” to us.

Also, in those cases where people are not reading the journal requirements carefully and are making an inaccurate assumption about there being a DOI requirement, they will no longer dismiss the SDR out of hand for not being able to provide a DOI, even if in reality they don’t actually need one. In short, we consider the DOI a significant added value to our repository service that we think will lead Stanford researchers with repository needs to our service.

Slide 6: This is our Digital Exhibit that features all the research data deposited into the SDR to date. It is still too early to say whether offering DOIs is causing an increase in adoption of the SDR for publication-related data, or data sharing in general. We admittedly have not promoted the service widely, as at this point we are still generating DOIs one at a time upon request. We hope to have the bandwidth to integrate the API into our repository systems sometime soon.
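For readers wondering what an API integration might involve, here is a rough Python sketch of the registration request body for a single DOI. It is not a description of how the SDR integration works or will work. The payload shape follows my understanding of the DataCite REST API's JSON:API format and should be checked against the current DataCite documentation. The function name, the placeholder prefix `10.12345`, the sample DRUID, and all metadata values are hypothetical. The sketch only builds the JSON and sends nothing.

```python
import json


def build_doi_payload(doi, url, title, creators, publisher, year,
                      resource_type="Dataset"):
    """Build a JSON:API-style request body for registering one DOI.

    Attribute names are assumed from the DataCite REST API; verify them
    against the official documentation before relying on this.
    """
    return {
        "data": {
            "type": "dois",
            "attributes": {
                "event": "publish",  # assumed: asks for a findable DOI
                "doi": doi,
                "url": url,          # landing page the DOI redirects to
                "creators": [{"name": name} for name in creators],
                "titles": [{"title": title}],
                "publisher": publisher,
                "publicationYear": year,
                "types": {"resourceTypeGeneral": resource_type},
            },
        }
    }


if __name__ == "__main__":
    # All values below are made up for illustration.
    payload = build_doi_payload(
        doi="10.12345/bb123cd4567",
        url="https://purl.stanford.edu/bb123cd4567",
        title="Example dataset",
        creators=["Researcher, Example"],
        publisher="Stanford Digital Repository",
        year=2019,
    )
    print(json.dumps(payload, indent=2))
```

In an automated workflow, a body like this would be generated from the deposit metadata at publication time instead of being filled in by hand for each request.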

Maaike · June 11, 2019, 9:41am

Hi Amy, many thanks for this interesting use case. If you’d still like to upload files with it, you should now be able to do so.

Cheers,
Maaike

amyhodge · June 11, 2019, 3:21pm

Great. I have attached a compressed pdf version of the slides.

Helena · June 12, 2019, 5:46am

Thanks for sharing your use case Amy!