COUNTER Code of Practice for high-volume Research Data?

High-volume data such as Earth System Model data is usually cited at the level of collections consisting of many datasets (e.g. the datasets of a model run), whereas usage statistics are based on download information for individual datasets. The COUNTER Code of Practice for Research Data implicitly assumes that DOIs are registered for datasets. How can providers of high-volume data report data usage (or rather, data viewing and data download) information according to the COUNTER standard?

In Earth System Modeling, many variables are required to analyze the simulated climate for a future projection. Therefore, all variables created in the same model run or across an ensemble of model runs usually receive a single DOI, which is cited in scholarly articles and included in the reference list. Moreover, several models in different configurations contribute their data to an international research project such as CMIP (Coupled Model Intercomparison Project) or CORDEX (Coordinated Regional Climate Downscaling Experiment). A common use case for data usage statistics requested by the project management is therefore: Which climate variables were downloaded the most? That information feeds into the data request for the next phase of the project. Another use case is our annual report on data downloads, where we count dataset downloads and the volume of downloaded data. Thus, download information at the dataset level, i.e. at sub-DOI granularity, is essential.

Our use case in COUNTER terminology:

  • Component = File (in binary community formats)
  • Dataset = Time series of a variable consisting of one or more files. This granularity is suitable for statistics, versioning, and tracing/provenance information.
  • Collection = ~100 to ~10000 datasets belonging to a numerical model run or an ensemble of model runs. The data is created from one source (numerical model) for one research question and is therefore suitable for data citation (DOI).
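To make the three granularities concrete, here is a minimal Python sketch of the hierarchy described above. The class names follow the COUNTER terms used in the list; the field names, the file names, the variable names `tas` and `pr`, and the DOI `10.1234/example-run` are placeholders chosen for illustration, not part of any standard or of our archive.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """A single file in a binary community format."""
    filename: str
    size_bytes: int


@dataclass
class Dataset:
    """Time series of one variable, made up of one or more files."""
    dataset_id: str  # hypothetical internal identifier
    variable: str
    components: list[Component] = field(default_factory=list)

    @property
    def size_bytes(self) -> int:
        # Total volume of the dataset is the sum over its files.
        return sum(c.size_bytes for c in self.components)


@dataclass
class Collection:
    """All datasets of a model run or ensemble; the citable unit (DOI)."""
    doi: str  # placeholder DOI, not a registered one
    datasets: list[Dataset] = field(default_factory=list)


collection = Collection(
    doi="10.1234/example-run",
    datasets=[
        Dataset("run1/tas", "tas", [Component("tas_part1.nc", 500),
                                    Component("tas_part2.nc", 520)]),
        Dataset("run1/pr", "pr", [Component("pr_part1.nc", 900)]),
    ],
)
```

Statistics and provenance attach to `Dataset`, while the citation attaches to `Collection`, which is exactly the mismatch the question is about.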

How to use the COUNTER standard on our data?

  1. Aggregate all information on the DOI granularity:

This option is easy to implement, but all information at the dataset level is lost, even though that information is essential for many statistical applications. The usefulness of download information aggregated in this way is therefore questionable.
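A small Python sketch of what is lost under option 1. The download log, the DOIs, and the variable names are invented for illustration; the point is only that a per-DOI count can no longer answer the "most downloaded variable" question.

```python
from collections import Counter

# Hypothetical download log: (DOI of the collection, variable of the dataset)
downloads = [
    ("10.1234/example-run", "tas"),
    ("10.1234/example-run", "tas"),
    ("10.1234/example-run", "pr"),
    ("10.1234/other-run", "tas"),
]

# Option 1: one count per DOI; the variable is no longer visible.
per_doi = Counter(doi for doi, _ in downloads)

# Dataset-level counts still answer "which variable was downloaded most?"
per_variable = Counter(variable for _, variable in downloads)
most_downloaded = per_variable.most_common(1)[0]
```

With this sample log, `per_doi` only says that one collection was downloaded three times and the other once, while `most_downloaded` identifies the variable.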

  2. Provide the COUNTER information for datasets:

In this case, rich information on data viewing and data download is provided. However, what is the ‘dataset-id’? We can provide URLs leading to information on the specific dataset, but we would also like to keep the reference to the DOI to which the dataset belongs, as a relation of type ‘IsPartOf’.
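As a rough sketch of what we have in mind, a dataset-level record could carry the URL as its identifier and the DOI as a relation. Only ‘dataset-id’ and ‘IsPartOf’ are taken from the discussion above; the `relations` key, the nested layout, the `views` and `downloads` fields, and the example URL and DOI are assumptions for illustration and are not confirmed fields of the COUNTER report schema.

```python
import json


def dataset_usage_record(dataset_url, collection_doi, views, downloads):
    """Build one dataset-level usage record (hypothetical layout)."""
    return {
        # The dataset is identified by a URL instead of a DOI.
        "dataset-id": [{"type": "url", "value": dataset_url}],
        # Assumed extension: link back to the citable collection.
        "relations": [
            {"relation-type": "IsPartOf", "type": "doi", "value": collection_doi}
        ],
        "views": views,
        "downloads": downloads,
    }


record = dataset_usage_record(
    "https://example.org/datasets/run1/tas",  # placeholder URL
    "10.1234/example-run",                    # placeholder DOI
    views=12,
    downloads=5,
)
print(json.dumps(record, indent=2))
```

Whether such a relation can be expressed in a conformant report is exactly the open question.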

  3. Provide COUNTER information on DOI and dataset levels:

If we provide one report per DOI data collection, the DOI could be specified as part of the general information of the report. That information would be an extension of the standard, because mixing report information with reported content should be avoided. The body of the report would be the same as in option 2, but the individual dataset entries would no longer need a reference to the DOI.
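A sketch of this third option in Python, assuming one report per DOI with the DOI stored in the report's general information. The keys `header`, `collection-doi`, and `datasets`, as well as the sample records, are hypothetical and only illustrate the grouping.

```python
from collections import defaultdict


def reports_by_doi(records):
    """Group dataset-level records into one report per collection DOI."""
    grouped = defaultdict(list)
    for rec in records:
        # The dataset entry itself no longer carries the DOI.
        grouped[rec["doi"]].append(
            {"dataset-id": rec["dataset-id"], "downloads": rec["downloads"]}
        )
    return [
        {"header": {"collection-doi": doi}, "datasets": items}
        for doi, items in grouped.items()
    ]


# Hypothetical dataset-level input
records = [
    {"doi": "10.1234/example-run", "dataset-id": "run1/tas", "downloads": 5},
    {"doi": "10.1234/example-run", "dataset-id": "run1/pr", "downloads": 2},
    {"doi": "10.1234/other-run", "dataset-id": "run2/tas", "downloads": 1},
]
reports = reports_by_doi(records)
```

Each report keeps the full dataset-level detail, and a DOI-level total can still be derived by summing over its `datasets` list.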

Is anyone dealing with the same problem or has a view on this?




Martina, the COUNTER Code of Practice for Research Data does not require DOIs as identifiers for datasets. If it makes more sense in your situation to provide handles or internal identifiers, that is also fine. That should allow you to implement option 2.

Thanks Martin,

how do I preserve, in the report, the DOI to which a dataset belongs?

We have several use cases where the connection between dataset download and DOI reference is required. Currently, these two are split between the “publishing world”, such as Scholix, WoS, Google Dataset Search or the like, and the “technical world” of downloads, provenance and other tracing issues.

Is it possible to include information such as the following in the dataset report?

  • ‘dataset x is part of DOI y’ (technical), or
  • ‘dataset x is cited as DOI y’ (functional)

Moreover, for the very large international projects we provide two citation granularities at different levels. How can we describe that in the report?

Thanks again,

Hi Martin,

a late follow-on question for you:

If we publish usage information following the COUNTER Code of Practice at the dataset level (sub-DOI granularity), what are the implications for your services?

It only makes sense for us to provide usage information to DataCite if that information is accessible and visible at your end. For example, even after we have provided our usage information, the DOI record in DataCite Search might still state that no usage information is available.

Thanks again,