PID Graph GraphQL Example: Preprints

User story

As a librarian, I want to find all preprints with a DOI for a given search term, regardless of whether the DOI is registered with Crossref or DataCite.

Query strategy

We use the Preprint content type in the GraphQL API, which includes Crossref DOIs with work type PostedContent and DataCite DOIs with resourceTypeGeneral Text and resourceType Preprint. As the search keyword we use COVID-19, a highly relevant term in April 2020. Caution: as of April 2020, only a subset of Crossref DOIs (about 8 million) has been included in the common DOI index used by the DataCite GraphQL API, so the results do not include all preprints with DOIs for the keyword COVID-19. The citations, and thus the citation counts, used in this example are also incomplete, but they can serve as a starting point for building a citation graph.
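The Preprint content type combines two different registration-agency conventions into one rule. A minimal sketch of that rule as a Python predicate, assuming a hypothetical flattened record with the keys `agency`, `work_type`, `resource_type_general` and `resource_type`. These key names are illustrative only; they are not fields of the GraphQL API, which applies this filter on the server side.

```python
from typing import Optional, TypedDict


class Record(TypedDict, total=False):
    # Hypothetical flattened metadata record; these keys are illustrative
    # and are NOT field names of the DataCite GraphQL API.
    agency: str                           # "Crossref" or "DataCite"
    work_type: Optional[str]              # Crossref work type, e.g. "PostedContent"
    resource_type_general: Optional[str]  # DataCite resourceTypeGeneral, e.g. "Text"
    resource_type: Optional[str]          # DataCite resourceType, e.g. "Preprint"


def is_preprint(record: Record) -> bool:
    """Return True if the record matches the Preprint content type rule."""
    agency = record.get("agency")
    if agency == "Crossref":
        # Crossref: a DOI with work type PostedContent counts as a preprint.
        return record.get("work_type") == "PostedContent"
    if agency == "DataCite":
        # DataCite: resourceTypeGeneral Text combined with resourceType Preprint.
        return (
            record.get("resource_type_general") == "Text"
            and record.get("resource_type") == "Preprint"
        )
    # DOIs from other registration agencies are not covered by this rule.
    return False
```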

Why GraphQL

With GraphQL we can query multiple sources (here Crossref and DataCite) in a single query. We can also fetch additional information, in this case how often these preprints have been cited. We also include the issued date, which is more precise than the publication year (always 2020 in this example).
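Each node returned by the queries in this example carries a `dates` list whose entries have a `date` and a `dateType`. A minimal sketch of picking out the issued date from a parsed node, assuming the issued date is reported with `dateType` "Issued" (the DataCite metadata term) and falling back to `publicationYear` when no such entry exists. The function name `issued_date` is illustrative.

```python
from typing import Any, Dict, Optional


def issued_date(node: Dict[str, Any]) -> Optional[str]:
    """Return the Issued date of a work node, else its publication year as text."""
    # `dates` may be missing or null in the response, so default to an empty list.
    for entry in node.get("dates") or []:
        if entry.get("dateType") == "Issued" and entry.get("date"):
            return entry["date"]
    # Fall back to the coarser publication year.
    year = node.get("publicationYear")
    return str(year) if year is not None else None
```

Because ISO 8601 date strings of the same format sort lexicographically, the returned values can be sorted directly to order preprints by issue date.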

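Use the following query in the GraphQL client at https://api.datacite.org/graphql. It returns up to 100 preprints (first: 100) that have at least one citation (hasCitations: 1), together with counts by publication year and by registration agency: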

{
  preprints(query: "COVID-19", hasCitations: 1, first: 100) {
    totalCount
    years {
      title
      count
    }
    registrationAgencies {
      title
      count
    }
    nodes {
      id
      publicationYear
      publisher
      titles {
        title
      }
      dates {
        date
        dateType
      }
      citationCount
    }
  }
}
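The query can also be sent from a script instead of the interactive client. A minimal sketch using only the Python standard library, assuming the endpoint follows the common GraphQL-over-HTTP convention: a POST request with a JSON body `{"query": ...}`, answered with JSON that has a top-level `data` key (and an `errors` key on failure). `PREPRINTS_QUERY` is a shortened version of the preprints query above; the function names are illustrative.

```python
import json
import urllib.request
from typing import Any, Dict

ENDPOINT = "https://api.datacite.org/graphql"

# Shortened version of the preprints query above.
PREPRINTS_QUERY = """
{
  preprints(query: "COVID-19", hasCitations: 1, first: 100) {
    totalCount
    nodes {
      id
      titles { title }
      citationCount
    }
  }
}
"""


def build_request(query: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Wrap a GraphQL query in a JSON POST request."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def run_query(query: str, endpoint: str = ENDPOINT, timeout: float = 60.0) -> Dict[str, Any]:
    """Send the query and return the `data` part of the JSON response."""
    with urllib.request.urlopen(build_request(query, endpoint), timeout=timeout) as resp:
        payload = json.load(resp)
    # GraphQL servers usually report query problems in an `errors` list.
    if payload.get("errors"):
        raise RuntimeError(payload["errors"])
    return payload["data"]
```

Calling `run_query(PREPRINTS_QUERY)["preprints"]["nodes"]` then yields the list of result nodes; the same function works unchanged for the other queries in this example.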

Users will likely be interested in all research outputs with DOIs for the keyword COVID-19, not just preprints. Use the following query, which again only includes research outputs with at least one citation:

{
  works(query: "COVID-19", hasCitations: 1) {
    totalCount
    years {
      title
      count
    }
    registrationAgencies {
      title
      count
    }
    nodes {
      id
      publicationYear
      publisher
      titles {
        title
      }
      dates {
        date
        dateType
      }
      citationCount
    }
  }
}
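Both queries return a `citationCount` for each node, so the most-cited outputs can be picked out as candidates for a closer look at their citations. A minimal sketch, assuming `nodes` is the `nodes` list from the parsed JSON response of either query; the function name `top_cited` is illustrative.

```python
from typing import Any, Dict, List, Tuple


def top_cited(nodes: List[Dict[str, Any]], n: int = 10) -> List[Tuple[int, str, str]]:
    """Return (citationCount, DOI URL, first title) for the n most-cited nodes."""
    rows = []
    for node in nodes:
        titles = node.get("titles") or []
        title = titles[0].get("title", "") if titles else ""
        # A missing or null citationCount is treated as zero.
        rows.append((node.get("citationCount") or 0, node["id"], title))
    # Highest citation count first; ties are broken by DOI for a stable order.
    rows.sort(key=lambda row: (-row[0], row[1]))
    return rows[:n]
```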

Use the following query to get more information about the citations of a particular research output found by the queries above. For performance reasons, this is currently not possible in the earlier queries that return multiple DOIs. Note that only one citation is returned even though the citation count is 8; this is a bug we will look into.

{
  work(id: "https://doi.org/10.15585/mmwr.mm6908e1") {
    id
    publisher
    publicationYear
    creators {
      id
      name
      affiliation {
        name
      }
    }
    titles {
      title
    }
    descriptions {
      description
    }
    dates {
      date
      dateType
    }
    citationCount
    citations {
      nodes {
        id
        publisher
        publicationYear
        creators {
          id
          name
          affiliation {
            name
          }
        }
        titles {
          title
        }
        descriptions {
          description
        }
        citationCount
      }
    }
  }
}
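The response to this query can serve as the starting point for the citation graph mentioned in the query strategy. A minimal sketch, assuming `work` is the parsed `data["work"]` object from the response: it turns each entry in `citations.nodes` into a (citing DOI, cited DOI) edge and reports how many counted citations were not returned, which makes the mismatch between `citationCount` and the returned citations visible. The function names are illustrative.

```python
from typing import Any, Dict, List, Tuple

Edge = Tuple[str, str]  # (citing DOI URL, cited DOI URL)


def citation_edges(work: Dict[str, Any]) -> List[Edge]:
    """Build citing -> cited edges for one work from its `citations.nodes`."""
    citations = (work.get("citations") or {}).get("nodes") or []
    # Each node under `citations` is a work that cites `work`.
    return [(citing["id"], work["id"]) for citing in citations]


def missing_citations(work: Dict[str, Any]) -> int:
    """Count citations included in citationCount but absent from the returned list."""
    returned = len(citation_edges(work))
    counted = work.get("citationCount") or 0
    return max(counted - returned, 0)
```

For the response described above (one citation returned for a citation count of 8), `missing_citations` would report 7. Edges collected from several such queries can be merged into a single graph keyed by DOI.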

@bmkramer, this may be of interest to you. Please keep in mind that this is still a pre-release version of the GraphQL API, and that the service currently covers only a subset of Crossref DOIs (about 8 million; the number will be much higher by the end of the year).

@mhfenner This seems very useful to me. I would like to retrieve all preprints from medRxiv along with all citations attributed to these preprints. I would also need the date of each citation. Do you think this is possible?