Evaluation of classification models for retrieving experimental sections from full-text publications
Publication date
2019
Document Type
Report
Abstract
In recent years, reporting scientific experiments has become a challenge for scientists working in data-intensive research fields. A particular difficulty is accurately reporting experimental work that relies on computational activities. In this report, we conduct an exploratory computational experiment: we evaluate the performance of a set of classification models for extracting experimental paragraphs from full-text scientific publications in an unsupervised fashion. The results show that the best-performing classification model (Multinomial Naive Bayes), trained on 30 publications in the proteomics domain, achieves a Recall of 87.12% and an Accuracy of 80.63%. Successful unsupervised extraction of experimental paragraphs from reports can considerably reduce the noise present in full-text publications. This approach could help to automatically generate domain-specific vocabulary describing experimental designs and experimental processes. As such, this work contributes to identifying NLP techniques that automate the extraction of domain-specific paragraphs relating to experimental work.
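The abstract does not describe the feature extraction or the training pipeline. The sketch below is therefore built on assumptions and is not the authors' implementation. It shows one way a Multinomial Naive Bayes paragraph classifier and the two reported metrics could be computed. It assumes a plain bag-of-words representation, add-alpha (Laplace) smoothing, and binary labels (1 = experimental paragraph, 0 = other paragraph). Recall is computed as TP / (TP + FN) for the experimental class, and Accuracy as the fraction of correctly labelled paragraphs. The tokenizer, the class and function names, and the toy paragraphs and labels are all hypothetical and are used only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Hypothetical lowercase bag-of-words tokenizer (the report's preprocessing is not specified)."""
    return re.findall(r"[a-z]+", text.lower())


class MultinomialNB:
    """Minimal Multinomial Naive Bayes over word counts with add-alpha smoothing (a sketch)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = set()
        self.word_counts = {c: Counter() for c in self.classes}
        class_docs = Counter(labels)
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        n = len(labels)
        # Log prior: share of training paragraphs in each class.
        self.log_prior = {c: math.log(class_docs[c] / n) for c in self.classes}
        # Total token count per class, used in the smoothed likelihood.
        self.total = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def _log_likelihood(self, word, c):
        v = len(self.vocab)
        return math.log((self.word_counts[c][word] + self.alpha) / (self.total[c] + self.alpha * v))

    def predict_one(self, doc):
        scores = {}
        for c in self.classes:
            score = self.log_prior[c]
            for w in tokenize(doc):
                if w in self.vocab:  # words never seen in training are ignored
                    score += self._log_likelihood(w, c)
            scores[c] = score
        return max(scores, key=scores.get)

    def predict(self, docs):
        return [self.predict_one(d) for d in docs]


def recall(y_true, y_pred, positive=1):
    """TP / (TP + FN) for the positive (experimental) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if tp + fn else 0.0


def accuracy(y_true, y_pred):
    """Fraction of paragraphs whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy paragraphs: 1 = experimental, 0 = other.
train_docs = [
    "Samples were digested with trypsin and analysed by LC-MS/MS.",
    "Peptides were separated on a C18 column using a gradient.",
    "Proteomics has transformed our understanding of cell biology.",
    "Previous studies have reported conflicting results.",
]
train_labels = [1, 1, 0, 0]

model = MultinomialNB().fit(train_docs, train_labels)

test_docs = [
    "Proteins were digested with trypsin.",
    "These results have transformed our understanding.",
]
test_labels = [1, 0]

pred = model.predict(test_docs)
print(pred, recall(test_labels, pred), accuracy(test_labels, pred))
```

Recall is reported for the experimental class because, for noise reduction, missing experimental paragraphs (false negatives) is arguably costlier than keeping a few extra ones. A real evaluation would use held-out publications rather than two toy sentences.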
Citation
Lefebvre, A, Berendsen, J & Spruit, M 2019, Evaluation of classification models for retrieving experimental sections from full-text publications. Technical Report Series, no. UU-CS-2019-002, UU BETA ICS Departement Informatica, Utrecht.