Automated Coding of Job Descriptions From a General Population Study: Overview of Existing Tools, Their Application and Comparison

Wan, Wenxin; Ge, Calvin B; Friesen, Melissa C; Locke, Sarah J; Russ, Daniel E; Burstyn, Igor; Baker, Christopher J O; Adisesh, Anil; Lan, Qing; Rothman, Nathaniel; Huss, Anke; van Tongeren, Martie; Vermeulen, Roel; Peters, Susan

doi:https://doi.org/10.1093/annweh/wxad002

Files

wxad002.pdf (282.24 KB)

Publication date

2023-06-01

Authors

Wan, Wenxin

Ge, Calvin B

Friesen, Melissa C

Locke, Sarah J

Russ, Daniel E

Burstyn, Igor

Baker, Christopher J O

Adisesh, Anil

Lan, Qing

Rothman, Nathaniel

Huss, Anke

van Tongeren, Martie

Vermeulen, Roel

Peters, Susan

DOI

https://doi.org/10.1093/annweh/wxad002

Document Type

Article

Collections

Utrecht University Repository

License

CC BY-NC

Abstract

OBJECTIVES: Automatic job coding tools were developed to reduce the laborious task of manually assigning job codes based on free-text job descriptions in census and survey data sources, including large occupational health studies. The objective of this study is to provide a case study comparing the performance of existing coding tools, in terms of agreement in both job coding and JEM (Job-Exposure Matrix)-assigned exposures.

METHODS: We compared three automatic job coding tools [AUTONOC, CASCOT (Computer-Assisted Structured Coding Tool), and LabourR], which were selected based on availability, coding of English free-text into coding systems closely related to the 1988 version of the International Standard Classification of Occupations (ISCO-88), and capability to perform batch coding. To assess their performance, we used manually coded job histories from the AsiaLymph case-control study, which were translated into English prior to auto-coding. We applied two general population JEMs to assess agreement at exposure level. Percent agreement and PABAK (Prevalence-Adjusted Bias-Adjusted Kappa) were used to compare the agreement of results from manual coders and automatic coding tools.

RESULTS: The coding percent agreement among the three tools ranged from 17.7 to 26.0% for exact matches at the most detailed 4-digit ISCO-88 level. The agreement was better at a more general level of job coding (e.g. 43.8-58.1% in 1-digit ISCO-88), and in exposure assignments (median values of PABAK coefficient ranging 0.69-0.78 across 12 JEM-assigned exposures). Based on our testing data, CASCOT was found to outperform others in terms of better agreement in both job coding (26% 4-digit agreement) and exposure assignment (median kappa 0.61).

CONCLUSIONS: In this study, we observed that agreement on job coding was generally low for the three tools but noted a higher degree of agreement in assigned exposures. The results indicate the need for study-specific evaluations before automatic coding tools are used in general population studies, as well as for improvements in the evaluated tools.
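
The two agreement measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the example job codes, and the exposure flags below are hypothetical and chosen only for illustration. The sketch assumes that agreement at a coarser ISCO-88 level is obtained by comparing the leading digits of the 4-digit code, and it uses the common definition of PABAK for k categories, (k * Po - 1) / (k - 1), where Po is the observed proportion of agreement (for a binary exposed/unexposed assignment this reduces to 2 * Po - 1). Whether the study computed the measures in exactly this way is an assumption.

```python
from typing import Sequence


def percent_agreement(manual: Sequence[str], auto: Sequence[str], digits: int = 4) -> float:
    """Percent of jobs whose codes match on the first `digits` digits.

    Codes are kept as strings so that leading zeros are preserved.
    """
    if len(manual) != len(auto) or len(manual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(m[:digits] == a[:digits] for m, a in zip(manual, auto))
    return 100.0 * matches / len(manual)


def pabak(rater_a: Sequence[int], rater_b: Sequence[int], categories: int = 2) -> float:
    """Prevalence-adjusted bias-adjusted kappa: (k * Po - 1) / (k - 1)."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if categories < 2:
        raise ValueError("at least two categories are required")
    observed = sum(x == y for x, y in zip(rater_a, rater_b)) / len(rater_a)
    return (categories * observed - 1) / (categories - 1)


# Hypothetical 4-digit codes for four jobs (illustrative only, not study data).
manual_codes = ["7212", "5122", "2310", "9132"]
auto_codes = ["7212", "5123", "2411", "8322"]

for level in (4, 3, 2, 1):
    print(f"{level}-digit agreement: {percent_agreement(manual_codes, auto_codes, level):.1f}%")

# Hypothetical exposed (1) / unexposed (0) flags derived from the two sets of
# codes via a JEM for a single agent (illustrative only).
exposure_manual = [1, 0, 0, 1, 0]
exposure_auto = [1, 0, 1, 1, 0]
print(f"PABAK: {pabak(exposure_manual, exposure_auto):.2f}")
```

Truncating codes to fewer digits can only keep or raise the number of matches, which is consistent with the higher agreement reported at the 1-digit level than at the 4-digit level.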

Keywords

automatic job coding tool, free-text job description, general population studies, reliability, General Medicine

Citation

Wan, W, Ge, C B, Friesen, M C, Locke, S J, Russ, D E, Burstyn, I, Baker, C J O, Adisesh, A, Lan, Q, Rothman, N, Huss, A, van Tongeren, M, Vermeulen, R & Peters, S 2023, 'Automated Coding of Job Descriptions From a General Population Study: Overview of Existing Tools, Their Application and Comparison', Annals of Work Exposures and Health, vol. 67, no. 5, wxad002, pp. 663-672. https://doi.org/10.1093/annweh/wxad002

URI

https://dspace.library.uu.nl/handle/1874/429394
