Seeking a DOI suffix convention for de-identified clinical trial datasets

Just joined. I'm looking for guidance from a pharma sponsor, or from a government or non-profit organization, working with Phase II-III clinical trials.

My org wants to create DOIs for 200 de-identified cancer clinical trial datasets shared on our platform. I was hoping to tie in the 11-character trial ID string, for example “NCT12345678”. Our accrual is slow; we expect to have no more than 1,000 items by 2030.

Among the current 200 items, some are duplicates, some are subsets of others, and a small number were never assigned an identifier.

Of course, we could simply number them sequentially, but our first thought was that, with the right convention, we could create a “speaking” letter/number combination that shows each item's relationship to other items and links it back to its source trial (via its trial identifier) or to some attribute (such as cancer type).

Any suggestions that would help me create a clear, maintainable convention are welcome. Thanks!

Welcome aboard! 🙂 DataCite generally recommends using DOIs that don't carry any meaning. You can use the relatedIdentifier field to express the kinds of relationships you're describing. For instance, you could use the relatedIdentifierType “URL” to record the clinical trial IDs as URLs (it looks like they take the form …). Using relatedIdentifiers has the added advantage that you can specify a relationType, so you can make clear which datasets are identical and which are subsets of others.
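
To make that suggestion concrete, here is a minimal Python sketch. It assumes several things the thread doesn't specify. Suffixes are random 8-character strings shaped `xxxx-xxxx`, drawn from a Crockford-style alphabet. `10.1234` stands in for your own DataCite prefix. The trial URL template is a hypothetical placeholder, since the registry's actual URL form isn't shown above. `IsDerivedFrom` is only one plausible relationType for linking a dataset to its source trial. The function names are hypothetical, and the payload is trimmed to the fields discussed here. A real registration also needs the other mandatory properties (creators, titles, publisher, publication year, resource type).

```python
import json
import re
import secrets
from typing import Optional, Sequence

# Lowercase Crockford-style base32: digits plus letters minus i, l, o, u,
# so suffixes avoid look-alike characters. An assumption, not a DataCite rule.
ALPHABET = "0123456789abcdefghjkmnpqrstvwxyz"

# "NCT" followed by 8 digits: the 11-character form mentioned in the question.
NCT_PATTERN = re.compile(r"NCT\d{8}")

# Hypothetical placeholders: substitute your own DataCite prefix and the
# registry's real study-page URL form.
PREFIX = "10.1234"
TRIAL_URL_TEMPLATE = "https://registry.example.org/study/{nct_id}"


def opaque_suffix(used: set) -> str:
    """Draw a random 'xxxx-xxxx' suffix not already in `used`, and record it there."""
    while True:
        body = "".join(secrets.choice(ALPHABET) for _ in range(8))
        suffix = f"{body[:4]}-{body[4:]}"
        if suffix not in used:  # retry on the (very unlikely) collision
            used.add(suffix)
            return suffix


def related_identifiers(
    nct_id: Optional[str],
    identical_to: Sequence[str] = (),
    part_of: Sequence[str] = (),
) -> list:
    """Express trial provenance and dataset relationships as relatedIdentifiers."""
    items = []
    if nct_id is not None:  # some datasets were never assigned a trial ID
        if not NCT_PATTERN.fullmatch(nct_id):
            raise ValueError(f"unexpected trial ID format: {nct_id!r}")
        items.append({
            "relatedIdentifier": TRIAL_URL_TEMPLATE.format(nct_id=nct_id),
            "relatedIdentifierType": "URL",
            "relationType": "IsDerivedFrom",  # one plausible choice
        })
    for doi in identical_to:  # duplicates of an already registered dataset
        items.append({"relatedIdentifier": doi,
                      "relatedIdentifierType": "DOI",
                      "relationType": "IsIdenticalTo"})
    for doi in part_of:  # this dataset is a subset of another one
        items.append({"relatedIdentifier": doi,
                      "relatedIdentifierType": "DOI",
                      "relationType": "IsPartOf"})
    return items


def doi_payload(suffix: str, nct_id: Optional[str], **relations) -> dict:
    """Build a trimmed REST-style payload holding only the DOI and its relations."""
    return {"data": {"type": "dois", "attributes": {
        "doi": f"{PREFIX}/{suffix}",
        "relatedIdentifiers": related_identifiers(nct_id, **relations),
    }}}


if __name__ == "__main__":
    used: set = set()
    full = doi_payload(opaque_suffix(used), "NCT12345678")
    full_doi = full["data"]["attributes"]["doi"]
    # A subset dataset from the same trial points back to the full dataset.
    subset = doi_payload(opaque_suffix(used), "NCT12345678", part_of=[full_doi])
    print(json.dumps(subset, indent=2))
```

Because the suffix carries no meaning, the trial ID, duplicates, and subsets all live in metadata. Correcting a mistaken relationship later means editing metadata rather than minting a new DOI. With 32 characters in 8 positions, there are 32^8 (about 1.1 trillion) possible suffixes, so a collision at 1,000 items is vanishingly unlikely, and the loop retries if one occurs anyway.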
