We assign DOIs to Plant Genetic Resources and would like to identify publications and datasets that reference our DOIs. So far, I have been using the EventData API but PID Graph intrigues me and I would like to start using GraphQL. I have been able to put together a query that almost works:
{
  publications(
    query: "relatedIdentifiers.relatedIdentifier:10.18730/*"
    first: 10
  ) {
    nodes {
      doi
      titles {
        title
      }
      publicationYear
      publisher
      creators {
        familyName
        givenName
      }
      relatedIdentifiers {
        relationType
        relatedIdentifier
        relatedIdentifierType
      }
    }
  }
}
I say “almost” because I would like to receive only the relatedIdentifiers that match the query string, not all of them. The question is: is there a way to filter the relatedIdentifiers in the returned nodes?
Thank you!
@marco.marsella when using a GraphQL API, I suggest following a two-step process:
- define the query that returns exactly what you want. Make use of PIDs where possible
- define the fields you want the API to return to you.
In your particular case I would do a query that basically says:
identify all works (in this case all physical samples) from my repository (DataCite repositoryID FAO.ITPGRFA) that have been cited and return basic information about them, and the citing publications and datasets.
{
  works(repositoryId: "fao.itpgrfa", hasCitations: 1) {
    totalCount
    published {
      id
      count
    }
    nodes {
      id
      titles {
        title
      }
      citations {
        totalCount
        nodes {
          id
          titles {
            title
          }
        }
      }
    }
  }
}
As you can see, the query returns zero results. This says more about the data we have (or rather have not) collected so far, and about the reformatting needed to get the citations from Event Data into the GraphQL API. When you query the DataCite Event Data API directly, you find 19 citations from publication metadata provided by Crossref: https://api.datacite.org/events?prefix=10.18730&source-id=crossref
Getting this information into GraphQL is thus a good starting point for the DataCite team.
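Until those citations show up in GraphQL, the Event Data query above can be scripted. A minimal sketch, assuming the response is a JSON document whose `data` entries carry `subj-id`, `relation-type-id` and `obj-id` attributes (an assumption about the response shape, not something confirmed in this thread). The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

EVENTS_URL = "https://api.datacite.org/events"  # endpoint quoted in the post above


def build_events_url(prefix, source_id):
    """Build the Event Data query URL for one DOI prefix and one source."""
    return EVENTS_URL + "?" + urlencode({"prefix": prefix, "source-id": source_id})


def summarize_events(payload):
    """Return (citing, relation, cited) tuples from a parsed response.

    Assumes each entry in payload["data"] has an "attributes" object with
    "subj-id", "relation-type-id" and "obj-id" keys; missing keys become None.
    """
    rows = []
    for event in payload.get("data") or []:
        attrs = event.get("attributes") or {}
        rows.append(
            (attrs.get("subj-id"), attrs.get("relation-type-id"), attrs.get("obj-id"))
        )
    return rows


def fetch_events(prefix, source_id):
    """Fetch and summarize events (performs a network request)."""
    with urlopen(build_events_url(prefix, source_id)) as response:
        return summarize_events(json.load(response))
```

Calling `fetch_events("10.18730", "crossref")` would issue the same request as the URL above; only the first page of results is read in this sketch.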
The same query returns lots of results for a data repository where each dataset is linked to an article on submission:
{
  works(repositoryId: "dryad.dryad", hasCitations: 1) {
    totalCount
    published {
      id
      count
    }
    nodes {
      id
      titles {
        title
      }
      citations {
        totalCount
        nodes {
          id
          titles {
            title
          }
        }
      }
    }
  }
}
Once you have a working query you can do additional filtering (e.g. only published in 2020) and change what fields are returned.
Best, Martin
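For anyone who wants to run such queries from a script rather than a browser, here is a minimal sketch. It assumes the GraphQL endpoint is https://api.datacite.org/graphql (the endpoint is not named in this thread) and that it accepts the usual JSON POST body with a `query` key. The helper names are hypothetical, and the query is a trimmed copy of the Dryad one above.

```python
import json
from urllib.request import Request, urlopen

GRAPHQL_URL = "https://api.datacite.org/graphql"  # assumed endpoint, not from the thread

CITED_WORKS_QUERY = """
{
  works(repositoryId: "dryad.dryad", hasCitations: 1) {
    totalCount
    nodes {
      id
      citations {
        totalCount
        nodes {
          id
        }
      }
    }
  }
}
"""


def build_request(query, url=GRAPHQL_URL):
    """Wrap a GraphQL query string in a JSON POST request."""
    body = json.dumps({"query": query}).encode("utf-8")
    return Request(url, data=body, headers={"Content-Type": "application/json"})


def citation_pairs(result):
    """Flatten works -> citations nesting into (cited_id, citing_id) pairs."""
    pairs = []
    works = (result.get("data") or {}).get("works") or {}
    for work in works.get("nodes") or []:
        for citing in (work.get("citations") or {}).get("nodes") or []:
            pairs.append((work["id"], citing["id"]))
    return pairs


def run_query(query):
    """Send the query and return the parsed JSON (performs a network request)."""
    with urlopen(build_request(query)) as response:
        return json.load(response)
```

With this in place, `citation_pairs(run_query(CITED_WORKS_QUERY))` would list each cited dataset next to the work citing it.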
Also, you can do the same queries in DataCite Commons if you prefer a web interface:
The last query is actually using FAO the organization (using its ROR ID) instead of a specific repository. In GraphQL it looks like this:
{
  organization(id: "https://ror.org/00pe0tf51") {
    name
    inceptionYear
    works(hasCitations: 1) {
      totalCount
      published {
        id
        count
      }
      nodes {
        id
        titles {
          title
        }
        creators {
          id
          name
          affiliation {
            id
            name
          }
        }
        citations {
          totalCount
          nodes {
            id
            titles {
              title
            }
          }
        }
      }
    }
  }
}
You see that two Dryad datasets with citations and an author affiliated with FAO are found. I think we will be able to do similar queries starting with a repository identifier instead of a ROR identifier next year.
Best,
Martin
@mhfenner thank you for the hints, but my issue is exactly the application of filters. I got the query to return the 2 publications that cited at least one of our DOIs, along with the cited DOIs. What I would like to do now is filter the cited DOIs so that only ours are returned and the others are left out. I could not find a way of doing this. Also, I could not find any way of returning only the publication DOIs updated after a given date, which I need because I do not want to scan the entire list every time.
Marco, filtering results by date range would indeed be a very useful functionality, but is not supported yet. I will add it to our TODO list.
I am not sure about the other filter, only showing your DOIs, and only those that have been cited. Is the following not providing that?
works(repositoryId: "fao.itpgrfa", hasCitations: 1)
where hasCitations: 1 means at least one citation.
Best,
Martin
My query works the other way around. I find the publications that cite at least one of our DOIs and then would like to filter the relatedIdentifiers associated to each node to only return our DOIs. Your query works, but the logic is not what I would like because the cardinality of the GLIS DOI->Publication DOI relationship is quite high (there will be publications with hundreds of GLIS DOIs cited).
My question is in general: is there a way of filtering the relatedIdentifiers?
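This thread does not identify a server-side filter for relatedIdentifiers, so one workaround is to trim them client-side after the query returns. A minimal sketch, assuming the response nodes have the shape requested in the first query of this thread. The function names and the sample DOIs in the usage note are hypothetical.

```python
RESOLVER_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize(identifier):
    """Lowercase a DOI and strip a leading resolver URL or 'doi:' scheme."""
    value = (identifier or "").strip().lower()
    for resolver in RESOLVER_PREFIXES:
        if value.startswith(resolver):
            return value[len(resolver):]
    return value


def keep_own_identifiers(nodes, prefix="10.18730/"):
    """Return copies of the nodes whose relatedIdentifiers keep only our prefix.

    The input list is left unchanged; all other fields are copied as-is.
    """
    wanted = prefix.lower()
    trimmed = []
    for node in nodes:
        related = node.get("relatedIdentifiers") or []
        own = [
            rel for rel in related
            if normalize(rel.get("relatedIdentifier")).startswith(wanted)
        ]
        trimmed.append({**node, "relatedIdentifiers": own})
    return trimmed
```

For example, a node citing `10.18730/ABC` and `10.5061/dryad.1` would come back with only the `10.18730/ABC` entry. This does not reduce what the API sends, so publications citing hundreds of DOIs are still transferred in full.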
Thanks @marco.marsella, happy to rework this into a query that makes sense for you. I would prefer not to use relatedIdentifier because that is a field in the DataCite metadata that includes all kinds of related content. I would prefer to use the concept of citations, which requires specific relation types. This could be done using relatedIdentifier and relationType, but it then becomes a rather complex query.
Is the basic question "Please show me all citations since November 1st to any DOI registered in the fao.itpgrfa repository"? In GraphQL, that could look like:
{
  citations(repositoryId: "fao.itpgrfa", dateRegistered: "[2020-11-01 TO *]") {
    totalCount
    nodes {
      id
      titles {
        title
      }
      references(repositoryId: "fao.itpgrfa") {
        id
      }
    }
  }
}
This is not working yet, but happy to implement if this is what you had in mind, as this is probably a very common use case.
@mhfenner this looks much better! I look forward to testing it!
Thank you!
@marco.marsella, this is a bit of work. I hope to report back here in December. Best, Martin