Quantitative data anonymisation: practical guidance for anonymising sensitive social science data

Publication

FORS Guide Nº 23

How to cite

Kleiner, B. & Heers, M. (2024). Quantitative data anonymisation: practical guidance for anonymising sensitive social science data. FORS Guide, 23, Version 1.0, 1-17. https://doi.org/10.24449/FG-2024-00023

Abstract

In the social sciences, requirements from funders and journals to make data available often present difficulties for researchers because of data protection issues. Anonymisation is a good solution for addressing the challenges of personal and sensitive data. This FORS Guide provides some practical guidance on how to select and apply techniques for anonymising quantitative data within a larger strategic framework for sharing.

Recommendations

  • Keep in mind that anonymisation is difficult to achieve with social science data.
  • Plan anonymisation at the beginning of your research project, and not at the end. This will allow you to avoid pitfalls that would slow down or prevent its appropriate application.
  • Always consider anonymisation of research data together with consent agreements and access restrictions, weighing potential risk against data utility. If anonymisation has been promised to respondents, this promise must be kept. We recommend not promising anonymisation, as it is difficult to achieve with social science data. It is better to use a formulation such as “you will not be identifiable in the data”.
  • Regulating/restricting user access may in some cases offer a better solution than full anonymisation, where data utility may be too diminished.
  • In order to avoid unnecessary operations to protect respondents, one good practice is to ask only for what is really needed in one’s data collection (i.e., minimisation). Collecting data that afterwards must be suppressed or manipulated might involve an unnecessary burden for respondents. Therefore, with respect to anonymisation, we encourage you to consider the consequences of your data collection instrument while you design it.
  • Retain as much information in the data as possible.
  • Use syntax files (scripts) in statistical software to apply the anonymisation techniques. This not only saves time but also helps you to document the anonymisation process.
  • During data collection and data processing, follow good practice in data storage and security, ensuring that only eligible people can access the data.
  • If you are doing longitudinal research, be sure to be consistent in how anonymisation is done across waves.
  • Make your data available only in trusted digital data repositories. We recommend SWISSUbase. This applies to anonymised as well as non-anonymised data.
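
The recommendations to script the anonymisation and to treat all waves consistently can be illustrated with a minimal sketch. It is not taken from the guide: the column names, the age band width, the income threshold and the sample records are all hypothetical, and applying such rules does not by itself guarantee that respondents are no longer identifiable.

```python
import csv
import io

# Hypothetical rules, defined once so that every wave is processed identically.
DIRECT_IDENTIFIERS = {"name", "email"}  # assumed column names
AGE_BAND_WIDTH = 10                     # assumed band width in years
INCOME_TOP_CODE = 150_000               # assumed top-coding threshold


def anonymise_row(row):
    """Return a copy of one record with the rules above applied."""
    # Suppression: drop direct identifiers. The study id is kept so that
    # waves can still be linked (assumed not to identify anyone by itself).
    out = {k: v for k, v in row.items() if k not in DIRECT_IDENTIFIERS}
    # Generalisation: replace exact age with a band such as "30-39".
    age = int(row["age"])
    lower = (age // AGE_BAND_WIDTH) * AGE_BAND_WIDTH
    out["age"] = f"{lower}-{lower + AGE_BAND_WIDTH - 1}"
    # Top-coding: cap extreme incomes at the threshold.
    out["income"] = str(min(int(row["income"]), INCOME_TOP_CODE))
    return out


def anonymise_wave(csv_text):
    """Apply the same rules to every record of one wave (CSV text)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [anonymise_row(record) for record in reader]


# Invented sample data for two waves of the same hypothetical study.
wave1 = (
    "id,name,email,age,income\n"
    "1,Anna,a@example.org,34,52000\n"
    "2,Ben,b@example.org,67,410000\n"
)
wave2 = (
    "id,name,email,age,income\n"
    "1,Anna,a@example.org,35,54000\n"
)

for label, wave in (("wave 1", wave1), ("wave 2", wave2)):
    print(label, anonymise_wave(wave))
```

Keeping the rules in one script means the file itself documents what was done, and rerunning it on a later wave applies exactly the same recoding.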

Copyright

© the author 2024. This work is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0)

Publication year

2024