User story
As a librarian, I want to find all preprints with a DOI for a given search term, regardless of whether they are registered with Crossref or DataCite.
Query strategy
We use the Preprint content type in the GraphQL API, which includes Crossref DOIs with work type PostedContent and DataCite DOIs with resourceTypeGeneral Text and resourceType Preprint. As the search keyword we use COVID-19, a highly relevant term in April 2020.
Caution: as of April 2020, only a subset of Crossref DOIs (about 8 million) has been included in the common DOI index used by the DataCite GraphQL API, so the results do not include all preprints with DOIs for the keyword COVID-19. The citations, and thus the citation counts, used in the example are likewise not comprehensive, but they can serve as a starting point for building a citation graph.
Why GraphQL
We can query multiple sources (here Crossref and DataCite) in a single query. We can also fetch additional information, in this case how often these preprints have been cited. We also include the date issued, which is more precise than the publication year; in this example the publication year will always be 2020.
Use the following query in the GraphQL client at https://api.datacite.org/graphql:
{
  preprints(query: "COVID-19", hasCitations: 1, first: 100) {
    totalCount
    years {
      title
      count
    }
    registrationAgencies {
      title
      count
    }
    nodes {
      id
      publicationYear
      publisher
      titles {
        title
      }
      dates {
        date
        dateType
      }
      citationCount
    }
  }
}
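The query above can also be sent from a script instead of the interactive client. The following is a minimal sketch, assuming the endpoint accepts the common GraphQL-over-HTTP convention of a POST request with a JSON body of the form {"query": "..."}; the helper names build_request and run_query are made up for this illustration, and the query is shortened to keep the example small.

```python
import json
import urllib.request

DATACITE_GRAPHQL_URL = "https://api.datacite.org/graphql"

# Shortened version of the preprints query shown above.
PREPRINTS_QUERY = """
{
  preprints(query: "COVID-19", hasCitations: 1, first: 100) {
    totalCount
    nodes {
      id
      citationCount
    }
  }
}
"""


def build_request(query, url=DATACITE_GRAPHQL_URL):
    """Wrap a GraphQL query in an HTTP POST request with a JSON body.

    Assumption: the server follows the usual {"query": ...} convention.
    """
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,  # supplying data makes urllib send a POST
        headers={"Content-Type": "application/json"},
    )


def run_query(query, url=DATACITE_GRAPHQL_URL, timeout=30):
    """Send the query and return the "data" part of the JSON response."""
    with urllib.request.urlopen(build_request(query, url), timeout=timeout) as resp:
        payload = json.load(resp)
    # GraphQL servers conventionally report problems in an "errors" list.
    if payload.get("errors"):
        raise RuntimeError(payload["errors"])
    return payload["data"]
```

Calling run_query(PREPRINTS_QUERY) would then return a dictionary whose "preprints" entry mirrors the shape of the query. Building the request separately from sending it keeps the payload easy to inspect without touching the network.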
Users will very likely be interested in all research outputs with DOIs for the keyword COVID-19, not just preprints. Use the following query, which again includes only research outputs for which we found at least one citation:
{
  works(query: "COVID-19", hasCitations: 1) {
    totalCount
    years {
      title
      count
    }
    registrationAgencies {
      title
      count
    }
    nodes {
      id
      publicationYear
      publisher
      titles {
        title
      }
      dates {
        date
        dateType
      }
      citationCount
    }
  }
}
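Once a response to the query above has been parsed into a dictionary, the facets and nodes can be reduced to a short summary. The sketch below is illustrative only: the sample response is made-up placeholder data (not real API output), the helper names are hypothetical, and it assumes that the issued date carries the dateType label "Issued", as in the DataCite metadata schema.

```python
def issued_date(node):
    """Return the date whose dateType is "Issued", or None if there is none."""
    for entry in node.get("dates") or []:
        if entry.get("dateType") == "Issued":
            return entry.get("date")
    return None


def summarise(works):
    """Reduce the `works` part of a response to agency counts and ranked rows."""
    # Map each registration agency facet to its count.
    agencies = {
        facet["title"]: facet["count"]
        for facet in works.get("registrationAgencies") or []
    }
    # Order the nodes by citation count, most cited first.
    ranked = sorted(
        works.get("nodes") or [],
        key=lambda node: node.get("citationCount") or 0,
        reverse=True,
    )
    rows = [
        (node["id"], issued_date(node), node.get("citationCount") or 0)
        for node in ranked
    ]
    return agencies, rows


# Made-up placeholder response in the shape of the query; NOT real API output.
SAMPLE_WORKS = {
    "totalCount": 2,
    "registrationAgencies": [
        {"title": "Crossref", "count": 1},
        {"title": "DataCite", "count": 1},
    ],
    "nodes": [
        {
            "id": "https://doi.org/10.1234/example-a",
            "publicationYear": 2020,
            "dates": [{"date": "2020-03-02", "dateType": "Issued"}],
            "citationCount": 1,
        },
        {
            "id": "https://doi.org/10.1234/example-b",
            "publicationYear": 2020,
            "dates": [
                {"date": "2020", "dateType": "Created"},
                {"date": "2020-04-11", "dateType": "Issued"},
            ],
            "citationCount": 3,
        },
    ],
}

agencies, rows = summarise(SAMPLE_WORKS)
for doi, issued, cites in rows:
    print(doi, issued, cites)
```

Both placeholder records share the publication year 2020, so only the issued date distinguishes when each one appeared.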
The following query can be used to obtain more information about the citations of a particular research output we found. For performance reasons, this is currently not possible in the earlier queries, which return multiple DOIs. You will also notice that only one citation is returned although the citation count is 8; this is a bug we will look into.
{
  work(id: "https://doi.org/10.15585/mmwr.mm6908e1") {
    id
    publisher
    publicationYear
    creators {
      id
      name
      affiliation {
        name
      }
    }
    titles {
      title
    }
    descriptions {
      description
    }
    dates {
      date
      dateType
    }
    citationCount
    citations {
      nodes {
        id
        publisher
        publicationYear
        creators {
          id
          name
          affiliation {
            name
          }
        }
        titles {
          title
        }
        descriptions {
          description
        }
        citationCount
      }
    }
  }
}
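Because the citation count and the list of returned citations can disagree, it may be worth checking the two against each other when processing a response. The sketch below is one possible check, not part of the API: citation_gap is a hypothetical helper, and the sample record only mirrors the numbers mentioned above (a count of 8, one citation returned) with a placeholder DOI for the citing work.

```python
def citation_gap(work):
    """Return citationCount minus the number of citation nodes returned."""
    reported = work.get("citationCount") or 0
    returned = len((work.get("citations") or {}).get("nodes") or [])
    return reported - returned


# Placeholder record shaped like the `work` part of a response.
# The citing DOI is invented for illustration; it is NOT real API output.
SAMPLE_WORK = {
    "id": "https://doi.org/10.15585/mmwr.mm6908e1",
    "citationCount": 8,
    "citations": {
        "nodes": [{"id": "https://doi.org/10.1234/example-citing-work"}],
    },
}

gap = citation_gap(SAMPLE_WORK)
if gap:
    print(f"{SAMPLE_WORK['id']}: {gap} citation(s) counted but not returned")
```

For the placeholder record the gap is 7. A gap of zero means every counted citation was also returned as a node.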