A global and interoperable dataset of linguistic distributions derived from the Atlas of the World’s Languages

Publication date

2025-08-22

Authors

Ranacher, Peter
Forkel, Robert
Efrat-Kowalsky, Nour
Urban, Matthias
Hehli, Antonia
Franz, Micha
Biland, Gregory
Kreienbühl, Aaron
Hermida Rodríguez, Alba
Azevedo, Matheus

Document Type

Article
Open Access

License

CC BY

Abstract

Asher and Moseley’s Atlas of the World’s Languages illustrates the past and present spatial distribution of human languages across more than 100 maps. While the Atlas is an impressive resource, its data are not readily accessible for research. Language areas are presented as printed maps and referenced by name, rather than as digital spatial objects linked to a standardised language catalogue. To address these limitations, we present a digital dataset derived from the Atlas. We georeferenced the map images, digitised the language polygons in a Geographic Information System (GIS), and linked each polygon to a Glottocode — a unique identifier for languages and language varieties. Following the FAIR principles, we provide the data as a faithful digital replication of the Atlas (comprising 6,992 distinct language areas) and in enriched, aggregated versions for contemporary and traditional languages. The datasets capture the spatial distribution of human languages as depicted in the Atlas, with each polygon linked to an unambiguous identifier, enabling computational analyses of the origins, distribution, and drivers of global linguistic diversity.
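
As an illustration of the kind of computational analysis that Glottocode-linked polygons make possible, the following minimal sketch groups language areas by identifier and sums their areas. It is not the authors' code and does not reflect the actual file format or schema of the published dataset: the record layout (`glottocode`, `ring`), the function names, and the placeholder codes `abcd1234` and `efgh5678` are assumptions made purely for illustration. The area is a planar shoelace computation in the units of the input coordinates, so real longitude/latitude data would first need projecting to an equal-area coordinate system.

```python
from collections import defaultdict


def ring_area(ring):
    """Planar area of a simple polygon ring via the shoelace formula.

    `ring` is a list of (x, y) vertices; the closing edge back to the
    first vertex is added implicitly.
    """
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def area_by_glottocode(features):
    """Sum polygon areas per Glottocode.

    `features` is a hypothetical list of dicts with a 'glottocode' string
    and a 'ring' of (x, y) vertices; one language may own several polygons.
    """
    areas = defaultdict(float)
    for feature in features:
        areas[feature["glottocode"]] += ring_area(feature["ring"])
    return dict(areas)


# Toy data with placeholder identifiers (not real Glottocodes).
features = [
    {"glottocode": "abcd1234", "ring": [(0, 0), (2, 0), (2, 2), (0, 2)]},
    {"glottocode": "abcd1234", "ring": [(5, 5), (6, 5), (6, 6), (5, 6)]},
    {"glottocode": "efgh5678", "ring": [(0, 0), (4, 0), (0, 3)]},
]

areas = area_by_glottocode(features)
print(areas)
```

Because each polygon carries an unambiguous identifier, disjoint areas belonging to the same language are aggregated without any name matching, which is the interoperability benefit the abstract describes.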

Keywords

Statistics and Probability, Information Systems, Education, Computer Science Applications, Statistics, Probability and Uncertainty, Library and Information Sciences

Citation

Ranacher, P, Forkel, R, Efrat-Kowalsky, N, Urban, M, Hehli, A, Franz, M, Biland, G, Kreienbühl, A, Hermida Rodríguez, A, Azevedo, M, Romar, M, Klaussova, A, Takahashi, T, Neureiter, N, van Gijn, R, Roose, M, Vesakoski, O, Weibel, R, Kaiping, G & Norder, S 2025, 'A global and interoperable dataset of linguistic distributions derived from the Atlas of the World’s Languages', Scientific Data, vol. 12, no. 1, 1466. https://doi.org/10.1038/s41597-025-05828-6