Improving Probabilistic Record Linkage Using Statistical Prediction Models

Publication date

2023-12

Authors

Moretti, Angelo (ISNI 0000000476579573)
Shlomo, Natalie

Document Type

Article
Open Access

License

CC BY-NC

Abstract

Record linkage brings together information from records in two or more data sources that are believed to belong to the same statistical unit based on a common set of matching variables. Matching variables, however, can appear with errors and variations and the challenge is to link statistical units that are subject to error. We provide an overview of record linkage techniques and specifically investigate the classic Fellegi and Sunter probabilistic record linkage framework to assess whether the decision rule for classifying pairs into sets of matches and non-matches can be improved by incorporating a statistical prediction model. We also study whether the enhanced linkage rule can provide better results in terms of preserving associations between variables in the linked data file that are not used in the matching procedure. A simulation study and an application based on real data are used to evaluate the methods.

Keywords

Linkage errors, matching variables, predictions, propensity scores, Statistics and Probability, Statistics, Probability and Uncertainty

Citation

Moretti, A & Shlomo, N 2023, 'Improving Probabilistic Record Linkage Using Statistical Prediction Models', International Statistical Review, vol. 91, no. 3, pp. 368-394. https://doi.org/10.1111/insr.12535