Employment field always empty when using connection

Hi all,
I’m using the DataCite Commons GraphQL API https://api.datacite.org/graphql to retrieve data. I noticed that when I query an organization and its affiliated people, the employment field for each person is always empty, even though it is filled in on their ORCID profile. I don’t have that problem if I query the person directly: then the employment field is filled exactly as in their ORCID profile.

So to make it more specific, if I use this query:

{
  organization(id: "https://ror.org/04qmmjx98") {
    id
    people {
      edges {
        node {
          id
          employment {
            organizationId
            organizationName
            startDate
            endDate
          }
        }
      }
    }
  }
}

then all the employment fields I get back are empty. But if I take one of the returned person ids and use it in the following query, I do get the employment data:

{
  person(id: "https://orcid.org/0000-0003-4106-461X") {
    employment {
      organizationId
      organizationName
      startDate
      endDate
    }
  }
}

That behavior is strange, since I expected to have at least the organization in the employment in the first query, because I assume the employment field makes the connection between an organization and a person.
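Given the behavior described above, one possible workaround is to do this in two steps: collect the person ids from the organization query, then query each person directly. Here is a minimal Python sketch of that idea. It only parses a response shaped like the first query and builds the request bodies for the second one. The helper names `extract_person_ids` and `build_person_payload` are made up for illustration, the sample response is hand-written rather than real API output, and actually POSTing the payloads to https://api.datacite.org/graphql is left out.

```python
import json


def extract_person_ids(org_response):
    """Collect person ids from a response shaped like the organization query."""
    edges = org_response["data"]["organization"]["people"]["edges"]
    return [edge["node"]["id"] for edge in edges]


def build_person_payload(orcid_url):
    """Build the JSON body for the direct person query (same fields as above)."""
    query = (
        "{ person(id: %s) { employment { "
        "organizationId organizationName startDate endDate } } }"
        % json.dumps(orcid_url)  # json.dumps adds the surrounding double quotes
    )
    return {"query": query}


# Hand-written sample in the shape of the first query's result (an assumption,
# not real output): the person id is present, employment is empty.
sample = {
    "data": {
        "organization": {
            "id": "https://ror.org/04qmmjx98",
            "people": {
                "edges": [
                    {
                        "node": {
                            "id": "https://orcid.org/0000-0003-4106-461X",
                            "employment": [],
                        }
                    }
                ]
            },
        }
    }
}

# One request body per person; each would be sent as a separate GraphQL call.
payloads = [build_person_payload(pid) for pid in extract_person_ids(sample)]
```

Note that this means one extra request per person, so for an organization with several hundred affiliated people it can get slow.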

This is a limitation of the ORCID REST API, which returns much more detailed information for a single record than for a query. We can make this clearer in the GraphQL API, but ideally ORCID would support returning this information also in queries.

If we were interested in supporting this requirement, do you know who we should talk to?

I’m a bit confused about what is happening here.

The first query is looking for a ROR id. We don’t yet return ROR ids in our API responses (but will do by the end of the year), but we do return GRID ids. So I am assuming that Commons is doing something behind the scenes to find ORCID iDs with the corresponding GRID ids.

You can see this in the second query, which shows grid.10854.38, not the ROR id.

@mhfenner You have the correct ORCIDs in the first query response. How do you work them out?

@TomDemeranville In GraphQL we can combine multiple REST calls into one GraphQL call. What is happening here is that in the first part we get the GRID ID for the ROR ID given in the query, and in the second part we query the ORCID API for the affiliation defined by that GRID ID.

For those who don’t know Tom, he is the ORCID Product Director.

Ah, so you use the search API for a specific GRID, right?

Yes. Which is easy at this point, as GRID and ROR map perfectly.

So this might be somewhat helpful, as it at least includes the other affiliation names: https://pub.orcid.org/v3.0/expanded-search/?q=grid-org-id:grid.10854.38 . I can see a case for including their ORG ids as well.

Thanks to both of you for giving a peek behind the curtain :slight_smile:

Now I see: when I start querying at the organization level, I only get the attributes shown in the link Tom Demeranville provided,
that is ORCID, givenName, familyName and a list of institution names. So not only is employment empty, but so are other attributes, for example links or country.
(Btw, is there documentation on which API calls are made under the hood of DataCite Commons?)

But now there is another question:
How come the number of ORCID iDs connected to an organization differs between the ORCID API and DataCite Commons?
When I follow Tom Demeranville’s link I get 151 people connected to the GRID id, but when I use the organization query from my previous post the totalCount is 435.

Sandra, you can always look at the source code if you are comfortable with Ruby, e.g. here: lupo/app/graphql/types at master · datacite/lupo · GitHub. GraphQL in the backend is a bit complex, even more so for these examples where we combine API calls to ROR, ORCID and even a bit of Wikidata.

For the discrepancy in numbers I suggest you compare the results by ORCID iD and other metadata. I have no easy explanation.

And to be clear, DataCite Commons is using the same public ORCID REST API that you can use. There are no “secret” API calls between DataCite and ORCID, and no ORCID member API calls either.

Thank you, @mhfenner, for the link to the code. I looked through it, and from what I gathered the process to link an organization to its affiliated persons goes like this:

  • Query organization data from ROR
  • Use the Wikidata-Id from ROR and query organization data from Wikidata
    → Merge together for complete organization data
  • Extract grid-id and ringgold-id from organization data
  • Query ORCID API with ringgold & grid-id for affiliated people

So, to extend the example, the affiliated people for the UOS would be queried like https://pub.orcid.org/v3.0/expanded-search/?q=ringgold-org-id:9186%20OR%20grid-org-id:grid.10854.38 , and then the numbers from DataCite Commons and the query match :white_check_mark:
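For anyone who wants to reproduce the count for another organization, here is a small Python sketch that builds the same kind of expanded-search URL from a Ringgold id and a GRID id. The function name `affiliation_search_url` is just something I made up, and this only mirrors the URL shown above; it is not how lupo builds the request internally.

```python
from urllib.parse import quote

ORCID_EXPANDED_SEARCH = "https://pub.orcid.org/v3.0/expanded-search/"


def affiliation_search_url(ringgold_id=None, grid_id=None):
    """Build an expanded-search URL that ORs the available organization ids."""
    terms = []
    if ringgold_id:
        terms.append(f"ringgold-org-id:{ringgold_id}")
    if grid_id:
        terms.append(f"grid-org-id:{grid_id}")
    if not terms:
        raise ValueError("need at least one organization identifier")
    # Keep ':' readable and encode the spaces around OR as %20.
    return ORCID_EXPANDED_SEARCH + "?q=" + quote(" OR ".join(terms), safe=":")


# The UOS example from this thread.
url = affiliation_search_url(ringgold_id="9186", grid_id="grid.10854.38")
```

For the UOS values this produces exactly the URL quoted above.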

One thing though is quite unfortunate for our use-case:
The employment data for a person in DataCite Commons consists of organizationId, organizationName, startDate and endDate. As organizationName we get the name that the person typed into ORCID (a free-text field). We get an organizationId ONLY if the disambiguated-organization from ORCID returns a GRID id, but not if the disambiguated-organization is based on the Ringgold id (then the organizationId is empty), see lupo/person.rb at 8164dcccac39db63b9599cf214ad0599afa76963 · datacite/lupo · GitHub

So if we query an organization and its affiliated people there is unfortunately no way to filter for only current employees, since we would need to disambiguate the organization based only on its name (in case the ORCID disambiguation was based on Ringgold and organizationId is null)

For example:

"employment": [
  {
    "organizationId": null,
    "organizationName": "Technische Informationsbibliothek Universitätsbibliothek Hannover",
    "startDate": "2008-03-01T00:00:00Z",
    "endDate": "2012-12-13T00:00:00Z"
  },
  {
    "organizationId": null,
    "organizationName": "Freie Universität Berlin",
    "startDate": "2007-10-01T00:00:00Z",
    "endDate": null
  }
]
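To make the problem concrete, here is a rough Python sketch of the filter we would like to run on such employment entries. It treats an entry without an endDate as current (my assumption), matches on organizationId when it is set, and otherwise can only fall back to comparing the free-text name against a hand-maintained list of known spellings. The function names, the `grid.example` id and the name list are all hypothetical.

```python
def current_employments(employments):
    """Keep entries without an endDate (assumed here to mean 'still employed')."""
    return [e for e in employments if e.get("endDate") is None]


def matches_org(entry, org_id, known_names):
    """Match by organizationId if present, else by a known free-text name."""
    if entry.get("organizationId") is not None:
        return entry["organizationId"] == org_id
    # organizationId is null (e.g. Ringgold-based disambiguation):
    # the only thing left to compare is the name the person typed in.
    name = (entry.get("organizationName") or "").casefold()
    return name in {n.casefold() for n in known_names}


# The two entries from the example above.
employment = [
    {
        "organizationId": None,
        "organizationName": "Technische Informationsbibliothek Universitätsbibliothek Hannover",
        "startDate": "2008-03-01T00:00:00Z",
        "endDate": "2012-12-13T00:00:00Z",
    },
    {
        "organizationId": None,
        "organizationName": "Freie Universität Berlin",
        "startDate": "2007-10-01T00:00:00Z",
        "endDate": None,
    },
]

# "grid.example" is a placeholder id; the name list is maintained by hand.
result = [
    e
    for e in current_employments(employment)
    if matches_org(e, "grid.example", ["Freie Universität Berlin"])
]
```

The name fallback is the fragile part: it only works for spellings we already know about, which is exactly why a unique organizationId would help.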

@mhfenner do you know why the Ringgold-id is not set as organizationId, if it was used for disambiguation by ORCID? Would it be possible to set it as organizationId, so the organization is uniquely identified?

@sandram this is a limitation of the data that ORCID uses. There is no official mapping of Ringgold to ROR/GRID, so using the Ringgold ID would create an “island” of organization data not connected to the other organization data in DataCite Commons. We can fix that if Ringgold and/or ORCID provide an official mapping, as we do for the Crossref Funder ID/ROR and of course GRID/ROR mapping.

Hmm, and what about using the Ringgold <-> GRID mapping from Wikidata, like DataCite Commons is already doing for the organization data in the reverse direction?

So if ORCID returns a Ringgold-id, you could query Wikidata for the Grid-id like:
https://tools.wmflabs.org/wikidata-todo/resolver.php?prop=P3500&value=#{ringgold-id}

for the example (UOS) this would be:
https://tools.wmflabs.org/wikidata-todo/resolver.php?prop=P3500&value=9186

and then instead of Ringgold, set the respective Grid-id?

@sandram I looked into Wikidata as an intermediary for mapping Ringgold and ROR when we built the system last year, but ultimately decided against it, mainly for two reasons: a) the workflow is rather complex, which would significantly slow down the system if not built carefully (remember that for a variety of reasons we do live API calls in many cases), b) ORCID has committed to building ROR into their affiliation information.

Have you looked at going via ISNI? GRID and ROR have ISNI IDs in their public data releases, and I believe there’s an official Ringgold-ISNI mapping somewhere as well?

Arthur, there are of course various ways to map organization identifiers, but for production infrastructure it has to be practical in the legal and technical sense. And again, ORCID is working on adding ROR support, which should address the use case by @sandram.

Thank you for all the information @mhfenner

Do you know whether ORCID will add ROR as an additional organization identifier, or replace the existing ones and use only ROR?
Because in the first case, if they keep Ringgold, the organizationId field would still be mapped to null for those records …

@sandram I would reach out to ORCID directly for more details of their planned implementation. Maybe @TomDemeranville will respond here.