De-identifying human subject data for sharing


JHU Data Management Services will be giving training sessions periodically, starting this spring, in which we offer tips and techniques for preparing human subject data for public access, for both quantitative and qualitative research. More and more funding agencies, publishers, and research communities are asking researchers to make the results of funded studies publicly available to other researchers. While funders acknowledge IRB and federal requirements for protecting the identities of research subjects, there are benefits to sharing de-identified datasets. These include allowing peer review, supporting reproducibility, building upon prior research, and increasing citation rates, as well as saving on the costs of new data collection and of administering access to restricted datasets.

Removing personal identifiers from data can be a significant effort, but our workshop offers a range of methods for more efficiently integrating identifier protection throughout the research process. These include:

  • Making sure IRB forms and participant consent forms provide for sharing de-identified data after the project is complete.
  • Knowing what types of studies pose greater disclosure risk, such as those with small samples from geographically specific areas, sensitive topics, protected subjects such as children, or multiple demographic variables.
  • Being alert for variables that could be linked to data from public databases or the internet, such as Facebook profiles, and for ‘outlier’ subjects whose unusual combinations of variables could be traced to a particular region or group.
  • Removing identifiers during data analysis rather than waiting until the study is complete, and keeping any documentation of changes and codes in a secure encrypted file.
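
As a rough illustration of the ‘outlier’ check described in the list above, the sketch below counts how many records share each combination of demographic variables and flags the combinations that occur fewer than k times. The function name, the field names, the toy records, and the threshold k are all assumptions made for this example; they are not taken from the workshop materials, and an appropriate threshold depends on the study and on IRB guidance.

```python
from collections import Counter


def flag_rare_combinations(records, quasi_identifiers, k=5):
    """Return the records whose combination of quasi-identifier values
    is shared by fewer than k records in the dataset."""
    def key(record):
        return tuple(record[name] for name in quasi_identifiers)

    # Count how often each combination of values appears.
    counts = Counter(key(r) for r in records)
    return [r for r in records if counts[key(r)] < k]


# Hypothetical toy data; a real study would have many more rows.
records = [
    {"zip": "21218", "age_band": "30-39", "occupation": "teacher"},
    {"zip": "21218", "age_band": "30-39", "occupation": "teacher"},
    {"zip": "21218", "age_band": "30-39", "occupation": "teacher"},
    {"zip": "21210", "age_band": "80-89", "occupation": "pilot"},
]

# k=3 is an illustrative threshold only.
rare = flag_rare_combinations(records, ["zip", "age_band", "occupation"], k=3)
for record in rare:
    print("Potentially identifiable:", record)
```

Records flagged this way are candidates for closer review, for example by coarsening or suppressing one of the variables involved.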

The training will discuss statistical techniques for de-identifying variables in quantitative studies, and also ways to de-identify qualitative data, including audio/video.
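
Two commonly used techniques for quantitative variables are generalization (replacing an exact value with a range) and top-coding (collapsing extreme values into one open-ended category). The sketch below shows both for an age variable, plus a simple truncation of postal codes. It is a minimal example, not the workshop's method: the function names, the band width, the top-code cutoff, and the number of digits kept are all assumptions chosen for illustration.

```python
def age_to_band(age, width=10, top_code=80):
    """Generalize an exact age into a band of the given width.
    Ages at or above top_code are collapsed into one open-ended category."""
    if age >= top_code:
        return f"{top_code}+"
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def truncate_zip(zip_code, keep=3):
    """Keep only the first `keep` characters of a postal code and mask the rest."""
    return zip_code[:keep] + "*" * (len(zip_code) - keep)


# Hypothetical values for illustration.
print(age_to_band(34))        # 30-39
print(age_to_band(91))        # 80+
print(truncate_zip("21218"))  # 212**
```

Coarser bands lower the disclosure risk but also reduce the analytic detail available to later users, so the settings are a trade-off to be weighed for each dataset.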

A goal of disclosure protection is to increase the uncertainty involved in identifying any given subject, without removing so many variables that the data lose their utility for further research. The acceptable level of disclosure risk is, to an extent, at the researcher’s discretion, so always check with your IRB for their assessment of whether a dataset is adequately protected before sharing it with others or posting it online.

Join JHU Data Services for training sessions on De-identifying Human Subject Data for Sharing. The next session is:

April 10th, 12-1:30 PM, Homewood, Brody Learning Commons, 4040

Please RSVP to attend.
