PIDs for participants in a longitudinal health registry?

Here is another unusual question. A group of researchers recently reached out with a question about assigning PIDs in a longitudinal health registry. I explained to them that the PIDs we tend to work with are public identifiers, used to create a persistent web presence for digital assets, whereas they want "smarter" personal identifiers for patients, with data (e.g. dates) encoded in them, rather than plain consecutive numbers.

Does anyone know of best practices for assigning PIDs to patients? I haven’t seen much on this in the literature yet…



My personal view is that any identifiers for people in clinical data – including in a registry – should always be completely opaque and information-free. Firstly, the embedded information may need to be revised or corrected, in which case the identifier immediately becomes misleading. Secondly, these identifiers are unlikely ever to be PIDs – they are just identifiers in the context of a registry. In general, they could not be used elsewhere unless other users can also identify the individuals concerned (at which point the red flags start to wave). Thirdly, and most importantly, any PID is designed to be shared and/or cited, and creating identifiers in this way would therefore involve sharing information about individuals (whether or not people know how the data is embedded or what it represents), for which you will normally need consent, at least in Europe under GDPR. It would be difficult to get meaningful consent unless all uses of the identifiers were foreseen. In addition, even if these identifiers were never made publicly available, consent can be withdrawn – at which point the identifier becomes unusable. So in general I would keep well clear of this approach!
I think if PIDs are to be applied to people, it needs to be a voluntary process, and the individuals should know all about them – for example using something like the ORCID system, as you demonstrate in your post.
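
For what it is worth, here is a rough sketch of what I mean by an opaque, information-free registry identifier. It is only an illustration under my own assumptions: the function name `new_registry_id`, the alphabet and the length of 10 are all hypothetical choices, not an established standard, and a real registry would keep the link between identifier and person in a separate, access-controlled table.

```python
import secrets

# Hypothetical alphabet: digits and capitals with look-alikes (0/O, 1/I/L) left out.
ALPHABET = "23456789ABCDEFGHJKMNPQRSTUVWXYZ"


def new_registry_id(existing, length=10):
    """Return a random identifier that encodes nothing about the participant.

    `existing` is the set of identifiers already issued; the new one is
    added to it so that the same value is never handed out twice.
    """
    while True:
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if candidate not in existing:
            existing.add(candidate)
            return candidate


# Issue a few identifiers; none carries a date, site code or sequence number.
issued = set()
ids = [new_registry_id(issued) for _ in range(5)]
```

Because the value is random, nothing in it ever needs correcting when a date of birth or enrolment date turns out to be wrong, and it reveals nothing about the individual if it leaks.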
