Nine basic steps to pseudonymize your research data

Advice: evaluate your current pseudonymization methodology.

02/17/2020 | 10:12 AM

When doing research involving human subjects, you have to ensure the protection of your data. There are various technical and organizational measures you can take to protect the privacy of your participants, one of them being pseudonymization. Jolien Scholten, one of the specialists working at the VU Research Data Management (RDM) Support Desk, recently participated in a national task group on pseudonymization. One of the results is a flyer which presents nine basic steps for pseudonymization, including how to handle key files generated by the pseudonymization process. It is a useful resource for researchers at the VU working with small-scale quantitative data about human subjects.

What is pseudonymization?
Pseudonymization usually entails separating directly identifying personal data from the rest of the dataset and replacing it with an artificial identifier. In some disciplines this is also known as coding. Pseudonymization makes your dataset less identifiable, because it is harder to identify subjects when information such as name, date of birth and postal code is not included in the pseudonymized dataset with which you perform your research. There are various reasons to pseudonymize data, for example to ensure scientific integrity (e.g. by preventing bias when analyzing the data), but also to protect the privacy of your participants. Note that there are several definitions of pseudonymization. In the flyer, pseudonymization is interpreted in the sense described here: removing directly identifying information. The General Data Protection Regulation (GDPR, AVG in Dutch) uses a stricter definition, under which it should only be possible to re-identify someone via the artificial identifier assigned to that specific person.
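As an illustration of the basic idea described above, here is a minimal sketch in Python. It is not taken from the flyer: the function name `pseudonymize`, the column names and the identifier format are assumptions chosen for the example, and which fields count as directly identifying depends on your own dataset.

```python
import secrets

# Hypothetical column names, used for illustration only.
DIRECT_IDENTIFIERS = ("name", "date_of_birth", "postal_code")


def pseudonymize(records, identifier_fields=DIRECT_IDENTIFIERS):
    """Split records into a pseudonymized dataset and a key file.

    `records` is a list of dicts. Returns (research_rows, key_rows);
    the two lists are linked only through the "pseudonym" value.
    """
    research_rows = []
    key_rows = []
    used = set()
    for record in records:
        # Random identifier that is not derived from the personal data.
        pseudonym = "P" + secrets.token_hex(4)
        while pseudonym in used:
            pseudonym = "P" + secrets.token_hex(4)
        used.add(pseudonym)

        key_row = {"pseudonym": pseudonym}
        research_row = {"pseudonym": pseudonym}
        for field, value in record.items():
            if field in identifier_fields:
                key_row[field] = value       # goes into the key file
            else:
                research_row[field] = value  # stays in the research data
        key_rows.append(key_row)
        research_rows.append(research_row)
    return research_rows, key_rows
```

The identifier is generated randomly rather than computed from the name or date of birth, so it cannot be reversed without the key file. Note that this sketch only removes direct identifiers; the remaining variables may still be indirectly identifying.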

Pseudonymization is not the same as anonymization
It is important to be aware that pseudonymization is not the same as anonymization according to the GDPR; a pseudonymized dataset is still subject to the GDPR. Pseudonymization is nevertheless an important step to take to ensure the protection of your data and the privacy of the research subjects. For information on anonymization, check out this postcard from another national task group, which was led by Jessica Hrudey, Research Data Officer and Privacy Champion at the Faculty of Behavioral and Movement Sciences. See the page about the VU RDM Community for a list of all data stewards at the VU, including their contact information, and VUnet for an overview of Privacy Champions, who can help you with questions relating to the GDPR.

Advice: evaluate your current pseudonymization methodology
Even if you already have an established pseudonymization workflow, we still advise you to have a look at the flyer and evaluate whether your current methodology is in line with the advice presented there.

The flyer mentions both technical and organizational measures that you should take to protect personal data, including how to:
1) separate research data from direct identifiers;
2) store research data and key file separately; and
3) organize access rights to both the research data and key file.
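The second and third measures in the list above can be sketched as follows, again in Python and again as an assumption-laden illustration rather than the flyer's own method. The helper `write_csv` and the folder names in the usage note are hypothetical. Restricting file permissions to the owner is only one simple way to limit access; on shared or institutional storage, access rights are usually managed through that system instead.

```python
import csv
import os
import stat
from pathlib import Path


def write_csv(rows, path, owner_only=False):
    """Write a list of dicts to a CSV file and return its path.

    With owner_only=True, the file permissions are set so that only
    the file owner can read or write it (effective on POSIX systems).
    """
    if not rows:
        raise ValueError("rows must contain at least one record")
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    if owner_only:
        # Owner may read and write; group and others get no access.
        os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
    return path
```

For example, the research data could be written with `write_csv(research_rows, "project/research_data.csv")` and the key file with `write_csv(key_rows, "restricted/key_file.csv", owner_only=True)`, where the two folders stand in for separate storage locations with different access rights.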

Pseudonymization at the VU
By participating in national task groups such as the ones mentioned above, the VU stays informed on the issues of pseudonymization and anonymization. The VU Research Data Support Program is developing plans to improve the infrastructure for information security, including tools that are necessary for pseudonymization. If you would like to know more about these developments, contact the RDM Support Desk.

By Jolien Scholten and Jessica Hrudey.