Documentation for the ROR API?

Is there public documentation for the ROR API yet?

I saw the https://api.ror.org/organizations endpoint mentioned in https://www.wikidata.org/wiki/Wikidata:Property_proposal/ROR_ID, but I couldn’t find anything from https://ror.org/. I’m especially interested in documentation regarding pagination, since https://api.ror.org/organizations returns 20 items by default.

Also, is there a way to download (releases/tags of) the whole database as a single file?

Duh, of course there is, and I found it right after posting this… https://github.com/ror-community/ror-api/blob/master/api_documentation.md

My second question still holds though. :slight_smile:

@Luc, right now pagination is hardwired to 20 per page.
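Since the page size can't be changed, a client that wants more than 20 records has to walk the pages one by one. Here is a minimal sketch, assuming the endpoint accepts a `page` query parameter and returns JSON shaped like `{"number_of_results": ..., "items": [...]}`. Both are assumptions, so check the API documentation. The fetch function is injectable so the paging loop can be tried without network access.

```python
import json
import urllib.request
from typing import Callable, Iterator, Optional


def fetch_page(page: int) -> dict:
    """Fetch one page from the ROR API (assumed `page` query parameter)."""
    url = f"https://api.ror.org/organizations?page={page}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def iter_organizations(fetch: Callable[[int], dict] = fetch_page,
                       max_pages: Optional[int] = None) -> Iterator[dict]:
    """Yield organization records page by page, starting at page 1.

    Stops when a page comes back with no items (assumed "items" key),
    or after max_pages pages if a limit is given.
    """
    page = 1
    while max_pages is None or page <= max_pages:
        data = fetch(page)
        items = data.get("items", [])
        if not items:
            break
        yield from items
        page += 1
```

For example, `list(iter_organizations(max_pages=3))` would collect the first 60 records.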

There will be an official data dump soon; until then, you can download the data as a zipped JSON file from https://github.com/ror-community/ror-api/tree/master/rorapi/data (one folder per GRID release).
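Once you have downloaded one of those zip files, you can read it without unpacking it to disk. This is a minimal sketch that assumes each archive contains a single `.json` file and simply parses the first one it finds. The exact top-level structure of the JSON (a list, or an object wrapping a list as GRID does) is not assumed here.

```python
import io
import json
import zipfile


def load_zipped_json(zip_bytes: bytes) -> object:
    """Return the parsed contents of the first .json member in a zip archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        # Assumption: the archive holds one JSON file; take the first match.
        names = [n for n in zf.namelist() if n.endswith(".json")]
        if not names:
            raise ValueError("no .json file found in archive")
        with zf.open(names[0]) as fh:
            return json.load(fh)
```

Usage, with a hypothetical local file name: `with open("downloaded.zip", "rb") as f: data = load_zipped_json(f.read())`.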


Thanks Martin, that’s exactly what I was looking for!

Also, nice to see that the JSON for ROR uses (almost exactly) the same format as the JSON for GRID. :+1:

Edited to add: for those interested, see also Users need to be able to download the entire ROR dataset · Issue #26 · ror-community/ror-api · GitHub.

@luc, thanks for raising the GitHub issues. We will also include additional metadata from GRID in the ROR output, so let us know if you are interested in anything in particular.

As far as Cobaltmetrics is concerned, we’re only interested in PIDs/URIs/URLs and links between these IDs, so the current data dump looks complete.

Thanks for the feedback @Luc.

The paging only works up to a point. Paging from the start goes fine, but at the time I tried, the first page said there were 96793 results, which would be 4840 pages of 20 results. But when I tried to get page 4840, I got the following error:

TransportError at /organizations
TransportError(500, 'search_phase_execution_exception', 'Result window is too large, from + size must be less than or equal to: [10000] but was [96800]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.')

And indeed, I get that error starting from page 501, the first page for which from + size exceeds 10000 (500 × 20 + 20 = 10020).
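The arithmetic behind the error can be checked directly. With 1-based pages of 20, page `p` asks Elasticsearch for `from = (p - 1) * 20` and `size = 20`. The request is rejected once `from + size` exceeds `index.max_result_window`, which is 10000 by default and also in the error above. A small sketch of those numbers:

```python
MAX_RESULT_WINDOW = 10_000  # index.max_result_window, as reported in the error
PAGE_SIZE = 20              # fixed page size of the API at the time


def window_end(page: int, size: int = PAGE_SIZE) -> int:
    """Return from + size for a 1-based page number."""
    return (page - 1) * size + size


def last_reachable_page(size: int = PAGE_SIZE,
                        window: int = MAX_RESULT_WINDOW) -> int:
    """Highest 1-based page whose from + size stays within the window."""
    return window // size


def total_pages(total: int, size: int = PAGE_SIZE) -> int:
    """Number of pages needed for `total` results (ceiling division)."""
    return -(-total // size)
```

With these, `total_pages(96793)` is 4840, `window_end(4840)` is 96800 (the value in the error), `last_reachable_page()` is 500, and `window_end(501)` is 10020.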

Also, the error is accompanied by a complete Django stack trace, which is probably something you want to disable in production, so as not to give people with bad intentions more information than is strictly necessary.
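In Django, the detailed error page with the stack trace is shown when the `DEBUG` setting is `True`. Turning it off in production is the usual fix. Here is a sketch of the relevant lines in a `settings.py`. The environment variable name `DJANGO_DEBUG` and the host list are hypothetical, and the actual ror-api settings may be organized differently.

```python
import os

# Hypothetical env var: debug output stays off unless explicitly enabled.
DEBUG = os.environ.get("DJANGO_DEBUG", "False").lower() == "true"

# With DEBUG off, Django only serves requests whose Host header is listed here.
ALLOWED_HOSTS = ["api.ror.org"]
```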

Thanks @martijn. This is a known limitation of search indexes such as Elasticsearch (the deep paging problem), and there is an issue open in the ror-api GitHub repo. API documentation is here. I think the solution is fourfold:

  • limit pagination to 10,000 results
  • implement cursor-based pagination, which overcomes this limitation
  • provide a data dump of all data for people who basically want all data
  • provide better API documentation
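To illustrate why cursor-based pagination avoids the 10,000-result window: instead of asking for "skip N results", each request says "give me the next results after this key". The server never has to count past earlier pages, so no `from + size` limit applies. In Elasticsearch, the scroll API mentioned in the error and the `search_after` parameter serve this purpose. The sketch below is a generic, in-memory illustration over a sorted list of unique IDs. It does not show how ROR implements or will implement it.

```python
import bisect
from typing import List, Optional, Tuple


def cursor_page(sorted_ids: List[str], size: int,
                after: Optional[str] = None) -> Tuple[List[str], Optional[str]]:
    """Return one page of ids strictly after `after`, plus the next cursor.

    `sorted_ids` stands in for an index sorted by a unique key (hypothetical).
    The cursor is the last id of the page; None means there are no more pages.
    """
    start = 0 if after is None else bisect.bisect_right(sorted_ids, after)
    page = sorted_ids[start:start + size]
    next_cursor = page[-1] if start + size < len(sorted_ids) else None
    return page, next_cursor
```

A client loops by passing each returned cursor back in until it gets `None`. The same loop works at any depth, whereas offset paging stopped at page 500 above.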

We are working on this.