Documentation for the ROR API?

Luc · June 21, 2019, 3:46pm

Is there public documentation for the ROR API yet?

I saw the https://api.ror.org/organizations endpoint mentioned in https://www.wikidata.org/wiki/Wikidata:Property_proposal/ROR_ID, but I couldn’t find anything from https://ror.org/. I’m especially interested in documentation regarding pagination, since https://api.ror.org/organizations returns 20 items by default.

Also, is there a way to download (releases/tags of) the whole database as a single file?

Luc · June 21, 2019, 3:49pm

Duh, of course there is, and I found it right after posting this… https://github.com/ror-community/ror-api/blob/master/api_documentation.md

My second question still holds though.

mhfenner · June 27, 2019, 12:14pm

@Luc, right now pagination is hardwired to 20 per page.

There will be an official data dump soon; until then, you can download the data as a zipped JSON file from https://github.com/ror-community/ror-api/tree/master/rorapi/data (one folder per GRID release).
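
For anyone who wants to read such a zipped JSON file once it has been downloaded, here is a minimal sketch using only the Python standard library. The helper name `load_zipped_json` and the file name `ror.zip` are hypothetical, and the sketch assumes the archive contains at least one `.json` member; the actual layout of the files in the repository may differ.

```python
import json
import zipfile


def load_zipped_json(source):
    """Parse the first .json member found in a zip archive.

    `source` may be a path or a binary file-like object.
    Assumption: the archive holds at least one member ending in ".json".
    """
    with zipfile.ZipFile(source) as archive:
        json_names = [name for name in archive.namelist() if name.endswith(".json")]
        if not json_names:
            raise ValueError("no .json member found in archive")
        # json.load accepts a binary file object and decodes it itself.
        with archive.open(json_names[0]) as member:
            return json.load(member)


# Example call with a hypothetical local file name:
# records = load_zipped_json("ror.zip")
```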

Luc · June 27, 2019, 1:14pm

Thanks Martin, that’s exactly what I was looking for!

Also, nice to see that the JSON for ROR uses (almost exactly) the same format as the JSON for GRID.

Edited to add: for those interested, see also "Users need to be able to download the entire ROR dataset" (Issue #26 in ror-community/ror-api on GitHub).

mhfenner · June 27, 2019, 3:53pm

@luc, thanks for raising the GitHub issues. We will also include additional metadata from GRID in the ROR output; let us know if you are interested in anything in particular.

Luc · June 27, 2019, 4:31pm

As far as Cobaltmetrics is concerned, we’re only interested in PIDs/URIs/URLs and links between these IDs, so the current data dump looks complete.

mhfenner · June 28, 2019, 4:44am

Thanks for the feedback @Luc.

martijn · July 1, 2019, 7:57am

The paging only works up to a point. Paging from the start goes fine, but at the time I tried, the first page said there were 96793 results, which would be 4840 pages of 20 results. But when I tried to get page 4840, I got the following error:

TransportError at /organizations
TransportError(500, 'search_phase_execution_exception', 'Result window is too large, from + size must be less than or equal to: [10000] but was [96800]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.')

And indeed, I get that error starting from page 501, the first page that exceeds 10000.
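
The numbers above can be checked with a small sketch. It assumes 1-based page numbers and an offset of `(page - 1) * page_size`, which matches the `[96800]` reported in the error for page 4840; the function names are made up for illustration.

```python
import math

PAGE_SIZE = 20       # page size mentioned in this thread
MAX_WINDOW = 10_000  # limit quoted in the error message


def total_pages(total_results, page_size=PAGE_SIZE):
    """Number of pages needed to list every result."""
    return math.ceil(total_results / page_size)


def is_reachable(page, page_size=PAGE_SIZE, max_window=MAX_WINDOW):
    """True if from + size stays within the result window for this page."""
    offset = (page - 1) * page_size  # assumed "from" value for a 1-based page
    return offset + page_size <= max_window


def last_reachable_page(page_size=PAGE_SIZE, max_window=MAX_WINDOW):
    """Highest page number that does not exceed the result window."""
    return max_window // page_size


print(total_pages(96793))     # 4840
print(last_reachable_page())  # 500
print(is_reachable(501))      # False
```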

Also, the error is accompanied by a complete Django stack trace, which is probably something you want to disable in production, so as not to give people with bad intentions more information than is strictly necessary.

mhfenner · July 2, 2019, 7:02am

Thanks @martijn. This is a known limitation of search indexes such as Elasticsearch, the deep paging problem, and there is an issue open in the ror-api GitHub repo. API documentation is here. I think the solution is fourfold:

- limit pagination to 10,000 results
- implement cursor-based pagination, which overcomes this limitation
- provide a data dump of all data for people who basically want all data
- provide better API documentation

We are working on this.
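
To illustrate the cursor idea in general terms, here is a minimal in-memory sketch. It is not the ROR API's implementation, and the function names and the `"id"` sort key are assumptions chosen for the example. Each request asks for the records that sort after the last one already seen, so the cost does not grow with an ever larger offset. Elasticsearch offers comparable mechanisms (the scroll API mentioned in the error message, and `search_after`).

```python
def fetch_page(records, after_id=None, size=20):
    """Return (page, next_cursor) for records ordered by their "id" field.

    `after_id` is the cursor: the "id" of the last record already received.
    `next_cursor` is None when no further records remain.
    """
    ordered = sorted(records, key=lambda record: record["id"])
    if after_id is not None:
        ordered = [record for record in ordered if record["id"] > after_id]
    page = ordered[:size]
    next_cursor = page[-1]["id"] if len(ordered) > size else None
    return page, next_cursor


def iterate_all(records, size=20):
    """Walk through every record by following cursors page by page."""
    cursor = None
    while True:
        page, cursor = fetch_page(records, after_id=cursor, size=size)
        yield from page
        if cursor is None:
            break
```

A real service would push the "id greater than cursor" filter down into the index query rather than sorting in memory; the sketch only shows the shape of the protocol.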