GraphQL project ideas

Hi all
I have an undergraduate Computer Science student who’ll be working with me over the summer (ten weeks). I’d like to have them try out the GraphQL API. If any of you have thoughts on something you would like to see done in that area but have never had the time to get to, please post them here.

Many thanks
Hugh

Thanks Hugh. There are lots of things you can do with the current DataCite GraphQL API, but there is also functionality still missing. So let us also know if there is something you want to add to the GraphQL API, e.g. the re3data API.

Hi Martin - thanks for this. We’ll have a look at the re3data API.

The re3data API would allow queries by repository characteristics: subject area, keywords, etc. to find specific repositories, but also certification, use of persistent identifiers, etc. This is part of the work DataCite and re3data are doing in the FAIRsFAIR project.

Hi Martin, I’m the undergraduate student who will be undertaking the work, and I just had a few questions regarding your idea around integrating the functionality of the re3data API into the DataCite API.
Are you implying that, for each DOI where some repository information exists, the two would be coupled in the graph? And to do this, would I need copies of all the data from both DataCite and re3data so that I can merge these datasets and extend the current GraphQL schema?
My thoughts are that I would need to write some tool to parse the data and insert it into the extended schema, but do let me know if my thoughts are incorrect.

I do think this is an excellent idea, though, as the flexibility GraphQL brings over traditional REST APIs is great.

** EDIT

Upon further inspection, I’ve started to understand the lower-level aspects of GraphQL. It appears that resolvers let you wrap GraphQL around a REST API. If I am to integrate the re3data API into the current DataCite API, all I would need is access to the backend code so that I can add the resolvers required by the schema changes. Is there some way I could get that access?
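To illustrate the resolver idea, here is a minimal Python sketch of a resolver that wraps one REST call. It follows the `(parent, info, **args)` convention used by graphql-core, but the base URL, the record layout and the `make_repository_resolver` helper are all hypothetical; they are not re3data's actual REST format, and DataCite's backend may be structured quite differently.

```python
from typing import Any, Callable, Dict

# Hypothetical REST base URL and record layout, for illustration only;
# these are NOT re3data's actual endpoints or response format.
REST_BASE = "https://rest.example.org/repositories"


def make_repository_resolver(
    fetch_json: Callable[[str], Dict[str, Any]],
) -> Callable[..., Dict[str, Any]]:
    """Build a resolver for a hypothetical `repository(id: ID!)` field.

    `fetch_json` performs the HTTP GET and JSON decoding. Injecting it keeps
    the resolver testable without network access.
    """

    def resolve_repository(_parent: Any, _info: Any, **args: Any) -> Dict[str, Any]:
        repo_id = args["id"]
        # Fetch the live record from the REST API...
        record = fetch_json(f"{REST_BASE}/{repo_id}")
        # ...and reshape it into the object type the GraphQL schema declares.
        return {
            "id": repo_id,
            "name": record.get("name"),
            "url": record.get("url"),
            "subjects": [{"text": s} for s in record.get("subjects", [])],
        }

    return resolve_repository
```

With graphql-core, a function like this would be passed as the `resolve` argument of the corresponding field; each schema change then needs a matching resolver, which is why backend access is required.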

Adam,

DataCite will do the work of integrating re3data into the GraphQL API, and I hope to do that before the end of the month. You can focus on consuming this information: coming up with interesting queries, combining the results with data from elsewhere, and putting everything together in a Jupyter notebook.

For the re3data GraphQL implementation I would use live data from re3data, which avoids the extra work of copying data and the risk of the data getting out of sync. re3data has only about 2,000 records, so building a GraphQL layer on top of its REST API is straightforward; I already did something similar with the Crossref Funder ID REST API.
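A minimal sketch of why the small record count matters, assuming the live records have already been fetched from re3data and flattened into a hypothetical `Repository` type: with around 2,000 records, repository filters such as subject, openness, PID use and certification can simply be applied in memory on each request, with no local copy to keep in sync. The field names, the default page size and the exact-match handling of subject codes are assumptions, not DataCite's implementation.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List, Optional


@dataclass
class Repository:
    """Hypothetical flattened view of one re3data record."""
    id: str
    name: str
    subject_codes: List[str] = field(default_factory=list)  # DFG codes, e.g. "11"
    open_access: bool = False
    uses_pids: bool = False
    certified: bool = False


def filter_repositories(
    records: Iterable[Repository],
    subject: Optional[str] = None,
    open_access: Optional[bool] = None,
    pid: Optional[bool] = None,
    certified: Optional[bool] = None,
    first: int = 25,
) -> Dict[str, Any]:
    """Filter live records in memory; a None argument means 'no filter'."""
    matches = [
        r for r in records
        # Exact match on a DFG code; sub-disciplines are not expanded here.
        if (subject is None or subject in r.subject_codes)
        and (open_access is None or r.open_access == open_access)
        and (pid is None or r.uses_pids == pid)
        and (certified is None or r.certified == certified)
    ]
    # Connection-style result: count of all matches plus the first page.
    return {"totalCount": len(matches), "nodes": matches[:first]}
```

A linear scan over a few thousand records per request is cheap, which is the trade-off that makes querying live data preferable to maintaining a synchronized copy at this scale.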

Hi Martin
If it’s alright, I would like to clarify something. Are you okay with Adam putting together the code for the re3data GraphQL implementation? Your second paragraph above suggests that, but your first suggests Adam do something more exploratory. Having had a one-to-one with Adam, it’s clear that he would like a project with a clear goal rather than a more exploratory one. If DataCite is doing the above, then no worries; I’ll work with Adam to define a clear project. I just want to make sure that we’re all happy.

Thanks! Hugh

Hi Martin - I had another chat with Adam and, yes, we need to focus on something other than the integration idea.

Hugh

Hugh, for a number of reasons it makes more sense for us to build the GraphQL API for re3data, mainly that we already run the infrastructure and that we committed to building the API in the FREYA and FAIRsFAIR projects.

I have now built and launched a first iteration of the re3data GraphQL API, accessible via https://api.datacite.org/graphql. It provides the query parameters from https://repositoryfinder.datacite.org, in particular those relevant for repository recommendations (filtering by subject area, open data availability, use of PIDs, and certification). An example query:

query {
  repositories(
    subject: 11,
    disciplinary: true
    open: true
    pid: true
    certified: true
    first: 50
  ) {
    totalCount
    nodes {
      id
      name
      description
      url
      subjects {
        scheme
        text
      }
      certificates {
        text
      }
      types {
        text
      }
      pidSystems {
        text
      }
    }
  }
}

This returns the requested fields for up to 50 disciplinary data repositories in the humanities (subject area 11 from the DFG classification used by re3data) that are open, use PIDs and are certified, with totalCount giving the number of all matching repositories. Including the datasets from these repositories (if they use DataCite DOIs) is one of the next steps.

There are a lot of things you can do with this GraphQL API, from exploration using Jupyter notebooks to services that use re3data data. I hope that makes it clearer what I had in mind. Also happy to have a call for a chat, if that makes things easier.
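For notebook exploration, a minimal Python sketch using only the standard library, assuming the endpoint accepts the common GraphQL-over-HTTP convention of a JSON POST body with a `query` key. `build_request`, `run_query` and `REPOSITORY_QUERY` are hypothetical names; the query is a trimmed version of the example above.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

# Endpoint announced in this thread.
DATACITE_GRAPHQL = "https://api.datacite.org/graphql"

# Trimmed version of the example query (fewer fields per node).
REPOSITORY_QUERY = """
query {
  repositories(subject: 11, disciplinary: true, open: true, pid: true, certified: true, first: 50) {
    totalCount
    nodes { id name url }
  }
}
"""


def build_request(query: str, variables: Optional[Dict[str, Any]] = None) -> urllib.request.Request:
    """Wrap a GraphQL query in a JSON POST request."""
    body: Dict[str, Any] = {"query": query}
    if variables:
        body["variables"] = variables
    return urllib.request.Request(
        DATACITE_GRAPHQL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def run_query(query: str) -> Dict[str, Any]:
    """Send the query and return the decoded JSON response (needs network access)."""
    with urllib.request.urlopen(build_request(query), timeout=30) as resp:
        return json.load(resp)
```

In a notebook, `run_query(REPOSITORY_QUERY)["data"]["repositories"]["nodes"]` should then yield a list of dicts that can be loaded into a table and combined with data from elsewhere; following the GraphQL response convention, any problems are reported under an `"errors"` key instead.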

Hi Martin
Thanks for this. Adam and I are clear on this now. We have another idea we’re trying out, but I’ll try to get in touch with you this week.
Hugh

Great! I look forward to hearing back from you.