GenSynthPop: generating a spatially explicit synthetic population of individuals and households from aggregated data

Publication date

2024-10-03

Authors

de Mooij, Jan
Sonnenschein, Tabea S.
Pellegrino, Marco
Dastani, Mehdi
Ettema, Dick
Logan, Brian
Verstegen, Judith A.

Document Type

Article

License

CC BY

Abstract

Synthetic populations are representations of actual individuals living in a specific area. They play an increasingly important role in studying and modeling individuals and are often used to build agent-based social simulations. Traditional approaches for synthesizing populations either use a detailed sample of the population (which may not be available) or combine data into a single joint distribution, and draw individuals or households from these. The latter group of sample-free methods fails to integrate (1) the best available data on spatially granular distributions, (2) multi-variable joint distributions, and (3) household-level distributions. In this paper, we propose a sample-free approach in which synthetic individuals and households directly represent the estimated joint distribution. Attributes are added iteratively, conditioned on previous attributes, such that the relative frequencies within each joint group of attributes are maintained while fitting granular spatial marginal distributions. We present our method and test it for the Zuid-West district of The Hague, the Netherlands, showing that spatial, multi-variable, and household distributions are accurately reflected in the resulting synthetic population.
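The keywords name iterative proportional fitting (IPF), and the abstract describes fitting a joint distribution to granular spatial marginals while maintaining relative frequencies within each joint group. The sketch below illustrates that general idea with textbook two-dimensional IPF. It is not the authors' implementation. The function name `ipf`, the attribute labels, and all numbers are hypothetical and chosen only for illustration.

```python
import numpy as np


def ipf(seed, row_targets, col_targets, max_iter=100, tol=1e-9):
    """Scale a 2-D seed table until its row and column sums match the targets."""
    table = np.asarray(seed, dtype=float).copy()
    row_targets = np.asarray(row_targets, dtype=float)
    col_targets = np.asarray(col_targets, dtype=float)
    for _ in range(max_iter):
        # Rescale every row to its target total; rows that sum to zero stay zero.
        row_sums = table.sum(axis=1)
        row_factors = np.divide(row_targets, row_sums,
                                out=np.zeros_like(row_sums), where=row_sums > 0)
        table *= row_factors[:, None]
        # Rescale every column to its target total in the same way.
        col_sums = table.sum(axis=0)
        col_factors = np.divide(col_targets, col_sums,
                                out=np.zeros_like(col_sums), where=col_sums > 0)
        table *= col_factors[None, :]
        # Columns now match exactly, so convergence is judged on the rows.
        if np.abs(table.sum(axis=1) - row_targets).max() < tol:
            break
    return table


# Hypothetical joint distribution (age group x employment status) for a large
# region, used as the seed. These figures are invented for illustration.
seed = np.array([[0.30, 0.20],
                 [0.10, 0.40]])

# Hypothetical marginal counts for one small neighbourhood (also invented).
age_counts = [60, 40]          # row targets
employment_counts = [50, 50]   # column targets

fitted = ipf(seed, age_counts, employment_counts)
print(fitted.round(2))
```

Because IPF only multiplies whole rows and whole columns by constants, the cross-product ratio of the seed table is carried over to the fitted table. This is one concrete sense in which the association between attributes can be preserved while the neighbourhood totals are matched.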

Keywords

Data disaggregation, Iterative proportional fitting, Sample-free data synthesis, Spatial heterogeneity, Synthetic households, Synthetic population, Synthetic reconstruction, Artificial Intelligence

Citation

de Mooij, J, Sonnenschein, T, Pellegrino, M, Dastani, M, Ettema, D, Logan, B & Verstegen, J A 2024, 'GenSynthPop: generating a spatially explicit synthetic population of individuals and households from aggregated data', Autonomous Agents and Multi-Agent Systems, vol. 38, no. 2, 48, pp. 1-28. https://doi.org/10.1007/s10458-024-09680-7