GBIF use case: assigning DOIs to downloads

The Global Biodiversity Information Facility (GBIF) is a network and research infrastructure aimed at providing anyone, anywhere, open access to data about all types of life on Earth. GBIF holds ~16,000 datasets made up of 1.1 billion “occurrences” (evidence of an organism occurring at a specific time and place), as well as 25,000 “checklist” datasets (lists of named organisms). GBIF datasets receive around 155K user sessions and 36 billion downloads per month. Some 3,583 peer-reviewed articles cite the use of GBIF-mediated data.

Typically, users request data on specific taxa, periods of time and/or geography. For example, searches might correspond to “Blue Whales” (> 69 datasets), “Birds in Japan” (> 78 datasets) or “Reptiles in Africa between 1950 and 1980” (> 88 datasets). The problem was how to cite this data, since a single query result can span a large number of datasets.
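To make the citation problem concrete, here is a minimal Python sketch of how one filtered query can draw on several datasets at once. The `Occurrence` fields, the sample records and the `contributing_datasets` helper are all hypothetical and only illustrate the idea; they are not GBIF's actual data model or API.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set, Tuple


@dataclass(frozen=True)
class Occurrence:
    """Hypothetical, simplified occurrence record (not GBIF's real schema)."""
    dataset_key: str
    taxon: str
    country: str
    year: int


def contributing_datasets(
    records: Iterable[Occurrence],
    taxon: Optional[str] = None,
    country: Optional[str] = None,
    year_range: Optional[Tuple[int, int]] = None,
) -> Set[str]:
    """Return the keys of all datasets that contribute a matching record."""
    keys = set()
    for r in records:
        if taxon is not None and r.taxon != taxon:
            continue
        if country is not None and r.country != country:
            continue
        if year_range is not None and not (year_range[0] <= r.year <= year_range[1]):
            continue
        keys.add(r.dataset_key)
    return keys


# Made-up sample data: three datasets, four records.
records = [
    Occurrence("dataset-a", "Aves", "JP", 2001),
    Occurrence("dataset-b", "Aves", "JP", 2015),
    Occurrence("dataset-b", "Reptilia", "KE", 1962),
    Occurrence("dataset-c", "Reptilia", "ZA", 1975),
]

birds_in_japan = contributing_datasets(records, taxon="Aves", country="JP")
old_reptiles = contributing_datasets(records, taxon="Reptilia", year_range=(1950, 1980))
```

Even in this toy case, each query pulls from two datasets, so citing the result means crediting every one of them.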

The solution was to assign DOIs to downloads. This way users can cite data spanning numerous datasets with a single identifier. Each download request is stored and assigned a DOI that resolves to a landing page, which includes the date and size of the download, the filters used in the query, a link to re-download and the details of the contributing datasets. The relationships between a download and its contributing datasets are modeled in metadata.
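As a rough illustration of how such a download could be described, the Python sketch below builds a DataCite-style metadata dictionary in which each contributing dataset appears as a related identifier. The `DownloadRecord` class, the `to_metadata` helper, the example DOIs under the made-up prefix `10.1234` and the choice of the `References` relation type are all assumptions for illustration, not a description of GBIF's actual implementation.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass
class DownloadRecord:
    """Hypothetical stored download request."""
    doi: str
    created: date
    size_bytes: int
    filters: Dict[str, str]
    dataset_dois: List[str]


def to_metadata(download: DownloadRecord) -> dict:
    """Build a DataCite-style metadata dict for a download (illustrative only)."""
    filter_text = "; ".join(f"{k}={v}" for k, v in sorted(download.filters.items()))
    return {
        "doi": download.doi,
        "dates": [{"date": download.created.isoformat(), "dateType": "Created"}],
        "sizes": [f"{download.size_bytes} bytes"],
        "descriptions": [
            {"description": f"Query filters: {filter_text}", "descriptionType": "Other"}
        ],
        # One entry per contributing dataset; the relation type is an assumption.
        "relatedIdentifiers": [
            {
                "relatedIdentifier": dataset_doi,
                "relatedIdentifierType": "DOI",
                "relationType": "References",
            }
            for dataset_doi in download.dataset_dois
        ],
    }


# Made-up example values.
example = DownloadRecord(
    doi="10.1234/dl.example",
    created=date(2019, 4, 1),
    size_bytes=52_428_800,
    filters={"taxon": "Reptilia", "continent": "AFRICA", "year": "1950,1980"},
    dataset_dois=["10.1234/dataset.a", "10.1234/dataset.b"],
)
metadata = to_metadata(example)
```

Keeping the dataset links as structured related identifiers, rather than free text, is what would let a citation of the download DOI be traced back to each contributing dataset.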

References to these downloads can then be included in the structured metadata registered with Crossref, and the resulting citations and usage statistics are stored in Event Data. Being able to track citations and usage in this way helps expose the impact of open data sharing.

This use case was presented at the DataCite Member’s meeting on 1 April 2019 (slides).


Thanks @dnoesgaard! For those reading this, there is a related post from Daniel’s colleague @trobertson: Billions of records
