Comprehensive benchmark and architectural analysis of deep learning models for nanopore sequencing basecalling

Pagès-Gallego, Marc; de Ridder, Jeroen

doi:https://doi.org/10.1186/s13059-023-02903-2

Files

s13059-023-02903-2.pdf (1.83 MB)

Publication date

2023-04-11

Authors

Pagès-Gallego, Marc

de Ridder, Jeroen

DOI

https://doi.org/10.1186/s13059-023-02903-2

Document Type

Article

Collections

UMC Repository

License

CC BY

Abstract

Background: Nanopore-based DNA sequencing relies on basecalling the electric current signal. Basecalling requires neural networks to achieve competitive accuracies. To improve sequencing accuracy further, new models are continuously proposed with new architectures. However, benchmarking is currently not standardized, and evaluation metrics and datasets used are defined on a per-publication basis, impeding progress in the field. This makes it impossible to distinguish data-driven from model-driven improvements. Results: To standardize the process of benchmarking, we unified existing benchmarking datasets and defined a rigorous set of evaluation metrics. We benchmarked the latest seven basecaller models by recreating and analyzing their neural network architectures. Our results show that overall Bonito's architecture is the best for basecalling. We find, however, that species bias in training can have a large impact on performance. Our comprehensive evaluation of 90 novel architectures demonstrates that different models excel at reducing different types of errors and that using recurrent neural networks (long short-term memory) and a conditional random field decoder are the main drivers of high-performing models. Conclusions: We believe that our work can facilitate the benchmarking of new basecaller tools and that the community can further expand on this work.

Keywords

Basecalling; Benchmark; Deep learning; Nanopore; Ecology, Evolution, Behavior and Systematics; Genetics; Cell Biology

Citation

Pagès-Gallego, M & de Ridder, J 2023, 'Comprehensive benchmark and architectural analysis of deep learning models for nanopore sequencing basecalling', Genome Biology, vol. 24, no. 1, 71, pp. 1-18. https://doi.org/10.1186/s13059-023-02903-2

URI

https://dspace.library.uu.nl/handle/1874/451977
