I’m curious how to express the relationship between code (software) and a dataset via RelatedIdentifier. Here are two examples I’m currently looking at, though there are probably others. (1) The code is used to analyze the data, and both data and code are cited by the related article. Is there a way to represent the relationship between the data and code directly, or are they separate elements related only via the article (as “IsCitedBy” and “Cites”)? (2) The code is used to transform raw data (a) into data that is then published (b) and used for an article. Here the second dataset clearly depends on both the original data and the code, regardless of any related publication.
Thanks @brian-westra for this question. I would certainly argue for linking the dataset and code together directly where possible, but I agree with you that the choice of relationType is ambiguous in these scenarios.
(1) The code is used to analyze the data, and both data and code are cited by the related article.
You could argue that the related article is the output of analyzing the data with the code, and so having the article cite both is sufficient. But if the code is created specifically to analyze the data (as opposed to being generic/multi-purpose software), you could also say that the data “IsRequiredBy” the code.
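As a rough sketch of that second option, the related identifiers for scenario (1) might look like the following. The DOIs are made-up placeholders, the `related` helper is hypothetical, and the dictionary keys assume the usual JSON rendering of RelatedIdentifier (`relatedIdentifier`, `relatedIdentifierType`, `relationType`); treat it as an illustration rather than a recommended practice.

```python
# Hypothetical placeholder DOIs, for illustration only.
DATASET_DOI = "10.1234/example-dataset"
CODE_DOI = "10.1234/example-code"
ARTICLE_DOI = "10.1234/example-article"


def related(identifier, relation_type, identifier_type="DOI"):
    """Build one RelatedIdentifier entry (hypothetical helper)."""
    return {
        "relatedIdentifier": identifier,
        "relatedIdentifierType": identifier_type,
        "relationType": relation_type,
    }


# Dataset record: cited by the article, and required by the analysis code.
dataset_related = [
    related(ARTICLE_DOI, "IsCitedBy"),
    related(CODE_DOI, "IsRequiredBy"),
]

# Code record: cited by the article, and requires the dataset (the inverse link).
code_related = [
    related(ARTICLE_DOI, "IsCitedBy"),
    related(DATASET_DOI, "Requires"),
]
```

Recording both directions (“IsRequiredBy” on the dataset, “Requires” on the code) keeps the link discoverable from either record, while the article relationship is still expressed separately.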
(2) The code is used to transform raw data (a) into data that is then published (b) and used for an article.
While we have “IsDerivedFrom”/“IsSourceOf” for relationships between raw and processed datasets, there isn’t a clear choice for linking the processed data to the code that performs that processing. I could see either “IsCompiledBy” (the processed data is compiled by the code, sort of) or “Requires” (processing the dataset required the use of the software) being possibilities.
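To make the two candidates for scenario (2) concrete, here is a minimal sketch of the links on the processed dataset (b). The function name, the placeholder DOIs, and the JSON-style keys are assumptions for illustration; the choice between “IsCompiledBy” and “Requires” is left as a parameter precisely because neither is an obvious fit.

```python
# Inverse relationType pairs used in this sketch.
INVERSE = {
    "IsDerivedFrom": "IsSourceOf",
    "IsCompiledBy": "Compiles",
    "Requires": "IsRequiredBy",
}


def processed_dataset_links(raw_doi, code_doi, code_relation="IsCompiledBy"):
    """Return RelatedIdentifier entries for the processed dataset (hypothetical helper).

    code_relation selects one of the two candidates discussed above.
    """
    if code_relation not in ("IsCompiledBy", "Requires"):
        raise ValueError(f"unsupported relation for code: {code_relation}")
    return [
        # Processed data (b) is derived from raw data (a).
        {"relatedIdentifier": raw_doi, "relatedIdentifierType": "DOI",
         "relationType": "IsDerivedFrom"},
        # Processed data (b) depends on the code that produced it.
        {"relatedIdentifier": code_doi, "relatedIdentifierType": "DOI",
         "relationType": code_relation},
    ]


def inverse_relation(relation_type):
    """Relation to record on the other side of the link (raw data or code)."""
    return INVERSE[relation_type]


# Placeholder DOIs, not real records.
links = processed_dataset_links("10.1234/raw-data", "10.1234/transform-code",
                                code_relation="Requires")
```

With `code_relation="Requires"`, the code record would carry the inverse “IsRequiredBy” link back to the processed dataset, and the raw dataset would carry “IsSourceOf”.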
What do you think of these options? We’re also always open to suggestions for improving the DataCite Metadata Schema in cases where there is no appropriate relationType.