Data Linkage

Publication

FORS Guide Nº 18

How to cite

Vaccaro, G. & Swerts, E. (2022). Data Linkage. FORS Guide No. 18, Version 1.0. Lausanne (FORS).
doi:10.24449/FG-2022-00018

Abstract

This guide provides an overview of data linkage, analyses its main advantages, and summarizes the challenges that quantitative researchers and data practitioners face when working with multiple data sources. It also gives examples of data linkage developments in Switzerland and informs practitioners about the key steps in linking data.

Recommendations

  • Take care of data security. Linking data increases the risk that individuals can be identified. It is therefore important to respect legal frameworks and to take basic precautions such as keeping data in secure locations. Once the project is complete, deleting the data is usually recommended and sometimes required.
  • Check that the variables needed for matching are present. In the absence of a unique identifier, the quality of the data is crucial for matching. Errors in descriptive variables such as names or codes can greatly reduce the accuracy and quality of the matching. It is therefore important to check carefully that the key variables are present in the data before starting the linkage.
  • Choose the linkage method that fits your data. The appropriate method depends on how the data are structured. If the sources share a unique identifier, a deterministic method can be used. Without a shared unique identifier, a probabilistic method is used instead.
  • Check legal formalities and contracts. Depending on how sensitive the data are, a contract may have to be signed with the data owner to define the terms of access and the conditions for using and disseminating the matched data.
  • Facilitate reproducibility and replication. Clearly document the matching procedure, the methods used, and every step of the data linkage so that other researchers can replicate your findings.
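
The difference between the two linkage methods named in the recommendations can be illustrated with a minimal Python sketch. It is not taken from the guide: the toy records, the field names (`id`, `name`, `birth_year`), the helper functions, and the 0.85 threshold are all hypothetical. The second function is a simple similarity-threshold match standing in for a probabilistic method; a full probabilistic approach would typically estimate field-level match weights rather than rely on a single string score.

```python
from difflib import SequenceMatcher

# Hypothetical toy records; field names and values are illustrative only.
survey = [
    {"id": "A1", "name": "Anna Meier", "birth_year": 1980},
    {"id": "B2", "name": "Luca Rossi", "birth_year": 1975},
]
register = [
    {"id": "A1", "name": "Anna Meier", "birth_year": 1980},
    {"id": "C3", "name": "Luka Rossi", "birth_year": 1975},
]


def link_deterministic(left, right, key="id"):
    """Pair records whose shared unique identifier matches exactly."""
    index = {rec[key]: rec for rec in right}
    return [(rec, index[rec[key]]) for rec in left if rec[key] in index]


def similarity(a, b):
    """Crude score in [0, 1]: name similarity, zero if birth years differ."""
    if a["birth_year"] != b["birth_year"]:
        return 0.0
    return SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()


def link_fuzzy(left, right, threshold=0.85):
    """Keep, for each left record, the best right record above the threshold.

    The threshold is an assumption; in practice it would be tuned and the
    resulting pairs reviewed for false matches.
    """
    pairs = []
    for a in left:
        best = max(right, key=lambda b: similarity(a, b))
        score = similarity(a, best)
        if score >= threshold:
            pairs.append((a, best, round(score, 2)))
    return pairs


exact_pairs = link_deterministic(survey, register)
fuzzy_pairs = link_fuzzy(survey, register)
```

With these toy records, the exact join links only the record whose identifier appears in both sources, while the similarity-based match also pairs the two differently spelled names. This is why the quality of names and other descriptive variables matters so much when no shared identifier exists.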

Copyright

Copyright: © the authors 2022. This work is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0)

Publication year

2022