PID Graph GraphQL Example Dissertations

User story

As a researcher, I want to discover all dissertations potentially relevant for my work, so that I don’t miss out on important science.

Search strategy

We focus our search on dissertations with a DOI. DataCite doesn’t use a controlled vocabulary for text documents, but the query for dissertations retrieves all text documents with resourceType Dissertation or Thesis, which together account for over 300k publications. We also search Crossref for content type Dissertation, but what we have indexed from Crossref so far doesn’t include any DOIs of that content type.

We use the query term “expanding universe” from astrophysics in our example. What stands out in the 14 results is a dissertation published in 1966, 41 years before the next dissertation found. This is the dissertation by theoretical physicist Stephen Hawking, and we can find more of his publications via the ORCID iD provided in the dissertation’s metadata.

Why GraphQL

We can provide users with a simplified search interface for dissertations without needing to update the metadata schema or asking repositories to update their DOI metadata. We can also add information such as the citations, views and downloads of each dissertation.

The GraphQL API provides a backend service for building a “Dissertation/Thesis Search” across institutions and countries with a simple JavaScript single-page application.

Use the following query in the GraphQL client at https://api.datacite.org/graphql:

{
  dissertations(query: "expanding universe") {
    totalCount
    years {
      title
      count
    }
    repositories {
      title
      count
    }
    nodes {
      id
      publicationYear
      publisher
      creators {
        id
        name
      }
      titles {
        title
      }
    }
  }
}
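
For anyone who would rather call the API from a script than from the GraphQL client, here is a minimal Python sketch using only the standard library. It assumes the endpoint accepts a standard GraphQL-over-HTTP POST with a JSON body, and that the `query` argument is string-typed so it can be passed as a variable. The function names and the `$term` variable are made up for illustration and are not part of the DataCite API.

```python
import json
import urllib.request

# Endpoint taken from the post above.
GRAPHQL_URL = "https://api.datacite.org/graphql"

# Trimmed version of the query above; "$term" is an illustrative variable name.
DISSERTATION_QUERY = """
query ($term: String!) {
  dissertations(query: $term) {
    totalCount
    nodes {
      id
      publicationYear
      titles { title }
    }
  }
}
"""


def build_payload(term: str) -> bytes:
    """Encode the query and its variables as a JSON request body."""
    body = {"query": DISSERTATION_QUERY, "variables": {"term": term}}
    return json.dumps(body).encode("utf-8")


def build_request(term: str) -> urllib.request.Request:
    """Create the POST request without sending it."""
    return urllib.request.Request(
        GRAPHQL_URL,
        data=build_payload(term),
        headers={"Content-Type": "application/json"},
    )


def fetch_dissertations(term: str) -> dict:
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(term), timeout=30) as resp:
        return json.load(resp)
```

Calling `fetch_dissertations("expanding universe")` performs the network request. Assuming the response follows the usual GraphQL shape, the result list is under `["data"]["dissertations"]["nodes"]`.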

@maddenfc and @RachaelK potentially of interest.


Thanks for this, Martin. I just tried a variation of this query looking for machine learning dissertations.

{
  dissertations(query: "machine learning", first: 30) {
    totalCount
    years {
      title
      count
    }
    nodes {
      id
      publicationYear
      publisher
      creators {
        id
        name
      }
      titles {
        title
      }
    }
  }
}

It only returned 25 nodes, rather than the 30 I would expect. Is there a limit on how many can be returned at once or is there an error in my query?

Thanks @maddenfc. This is probably a bug; I will look into it. The query returns 1440 results in total.
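
The mismatch described above can also be checked in a script. This is a rough sketch that assumes the response has the usual GraphQL shape, with a top-level `data` key. The helper name `check_page_size` is hypothetical, and the sample response only reuses the counts reported in this thread (25 nodes returned, 30 requested, 1440 in total) with placeholder node contents.

```python
def check_page_size(response: dict, requested: int) -> tuple[int, int]:
    """Return (returned, expected) node counts for a dissertations response."""
    result = response["data"]["dissertations"]
    returned = len(result["nodes"])
    # The server cannot return more nodes than exist in total.
    expected = min(requested, result["totalCount"])
    return returned, expected


# Placeholder response built from the numbers reported in this thread.
sample_response = {
    "data": {
        "dissertations": {
            "totalCount": 1440,
            "nodes": [{"id": f"placeholder-{i}"} for i in range(25)],
        }
    }
}

returned, expected = check_page_size(sample_response, requested=30)
if returned != expected:
    print(f"Got {returned} nodes, expected {expected}")
```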