Evaluation of classification models for retrieving experimental sections from full-text publications

Publication date

2019

Authors

Lefebvre, A. (ORCID 0000-0002-7428-1728; ISNI 0000000506017484)
Berendsen, Jorrit
Spruit, M.R. (ISNI 0000000077172004)

Document Type

Report

Abstract

In recent years, reporting scientific experiments has become challenging for scientists working in data-intensive research fields. One such challenge is to accurately report experimental work that relies on computational activities. In this report, we conduct an exploratory computational experiment: we evaluate the performance of a set of classification models for extracting experimental paragraphs from full-text scientific publications in an unsupervised fashion. The results show that the best-performing classification model (Multinomial Naive Bayes), trained on 30 publications in the proteomics domain, achieves a recall of 87.12% and an accuracy of 80.63%. Successful unsupervised extraction of experimental paragraphs from reports can considerably reduce the noise present in full-text publications. This approach could help to automatically generate domain-specific vocabularies describing experimental designs and experimental processes. As such, this work contributes to identifying NLP techniques that automate the extraction of domain-specific paragraphs relating to experimental work.
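To make the classification step and the two reported metrics concrete, here is a minimal sketch of a Multinomial Naive Bayes paragraph classifier together with recall and accuracy. It is not the authors' pipeline. The class `MultinomialNB`, the helpers `tokenize` and `recall_and_accuracy`, the labels `"experimental"`/`"other"`, and the toy sentences are all hypothetical. The abstract does not describe the preprocessing or where training labels come from, so the labels here are placeholders. Recall is computed as TP / (TP + FN) for the "experimental" class, and accuracy as correct predictions over all predictions.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical preprocessing (assumption): lowercase + whitespace split only.
    return text.lower().split()


class MultinomialNB:
    """Toy Multinomial Naive Bayes over bag-of-words counts (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive (Laplace) smoothing constant

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        class_sizes = Counter(labels)
        # log P(class), estimated from class frequencies in the training set
        self.log_prior = {c: math.log(class_sizes[c] / len(docs)) for c in self.classes}
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def _log_likelihood(self, token, c):
        # log P(token | class) with additive smoothing over the training vocabulary
        numerator = self.word_counts[c][token] + self.alpha
        denominator = self.total_words[c] + self.alpha * len(self.vocab)
        return math.log(numerator / denominator)

    def predict(self, doc):
        tokens = [t for t in tokenize(doc) if t in self.vocab]  # drop unseen words
        scores = {
            c: self.log_prior[c] + sum(self._log_likelihood(t, c) for t in tokens)
            for c in self.classes
        }
        return max(scores, key=scores.get)


def recall_and_accuracy(y_true, y_pred, positive="experimental"):
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = correct / len(pairs) if pairs else 0.0
    return recall, accuracy


# Toy, made-up paragraphs and labels; not data from the report.
train_docs = [
    "samples were digested with trypsin and analysed by mass spectrometry",
    "peptides were separated by liquid chromatography prior to ms analysis",
    "proteins were extracted and quantified using a bradford assay",
    "proteomics has transformed our understanding of cellular biology",
    "in this review we discuss future challenges for the field",
    "we thank the funding agencies for their support",
]
train_labels = ["experimental"] * 3 + ["other"] * 3

model = MultinomialNB().fit(train_docs, train_labels)

test_docs = ["peptides were digested with trypsin", "we discuss challenges for the field"]
test_labels = ["experimental", "other"]
predictions = [model.predict(d) for d in test_docs]
print(predictions, recall_and_accuracy(test_labels, predictions))
```

Scores are summed in log space so that long paragraphs do not underflow to zero, and the smoothing constant keeps a word that never occurs in one class from ruling that class out entirely. Reporting recall alongside accuracy matters here because missing an experimental paragraph (a false negative) and keeping a noisy one (a false positive) have different costs for downstream vocabulary extraction.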

Citation

Lefebvre, A, Berendsen, J & Spruit, M 2019, Evaluation of classification models for retrieving experimental sections from full-text publications. Technical Report Series, no. UU-CS-2019-002, UU BETA ICS Departement Informatica, Utrecht.