Automatic de-identification of Data Download Packages

Publication date

2021

Authors

Boeschoten, Laura (ISNI 0000000492859815)
Voorvaart, Roos
van den Goorbergh, Ruben
Kaandorp, Casper (ISNI 0000000506769619)
De Vos, Martine G.

Document Type

Contribution to conference
Open Access

License

CC BY

Abstract

The General Data Protection Regulation (GDPR) grants all natural persons the right to access their personal data if it is being processed by data controllers. The data controllers are obliged to share the data in an electronic format and often provide it in a so-called Data Download Package (DDP). These DDPs contain all data collected by public and private entities during the course of a citizen's digital life and form a treasure trove for social scientists. However, the data can be deeply private. To protect the privacy of research participants while using their DDPs for scientific research, we developed a de-identification algorithm that is able to handle typical characteristics of DDPs. These include regularly changing file structures, visual and textual content, differing file formats, differing file structures, and private information like usernames. We in

Keywords

Instagram, de-identification, anonymization, pseudonymization, Data Download Package

Citation

Boeschoten, L, Voorvaart, R, van den Goorbergh, R, Kaandorp, C & De Vos, M G 2021, 'Automatic de-identification of Data Download Packages', pp. 101-120. https://doi.org/10.3233/DS-210035