OGRE: Overlap Graph-based metagenomic Read clustEring

Publication date

2021-04-01

Authors

Balvert, Marleen (ISNI 000000045265046X)
Luo, Xiao
Hauptfeld, Ernestina (ISNI 0000000492869095)
Schönhuth, Alexander (ISNI 0000000527767348)
Dutilh, Bas E (ISNI 0000000389464735)

Document Type

Article (Open Access)

License

CC BY

Abstract

Motivation: The microbes that live in an environment can be identified from their combined genomic material, also referred to as the metagenome. Sequencing a metagenome can produce large volumes of sequencing reads. A promising approach to reduce the size of metagenomic datasets is to cluster reads into groups based on their overlaps. Read clustering is valuable for facilitating downstream analyses, including computationally intensive strain-aware assembly. Because current read clustering approaches cannot handle the large datasets arising from high-throughput metagenome sequencing, a novel read clustering approach is needed. In this article, we propose OGRE, an Overlap Graph-based Read clustEring procedure for high-throughput sequencing data, with a focus on shotgun metagenomes.

Results: We show that for small datasets OGRE outperforms other read binners in terms of the number of species included in a cluster, also referred to as cluster purity, and in terms of the fraction of all reads that is placed in one of the clusters. Furthermore, OGRE is able to process metagenomic datasets that are too large for other read binners into clusters with high cluster purity.

Conclusion: OGRE is the only method that can successfully cluster reads into species-specific clusters for large metagenomic datasets without running into computation time or memory issues.

Availability and implementation: Code is available on GitHub (https://github.com/Marleen1/OGRE).
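The general idea of overlap graph-based read clustering, and the two evaluation measures named in the abstract, can be illustrated with a minimal sketch. This is not OGRE's actual procedure (see the paper and the GitHub repository for that); it simply treats reads as nodes, pairwise overlaps at least `min_overlap_len` long as edges, and connected components (found with a union-find structure) as clusters. The overlap tuple format, the threshold, the choice to treat singletons as unclustered, and all function names here are hypothetical assumptions made for illustration.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set forest with path compression and union by size."""

    def __init__(self, items):
        self.parent = {x: x for x in items}
        self.size = {x: 1 for x in items}

    def find(self, x):
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path directly at the root.
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def cluster_reads(read_ids, overlaps, min_overlap_len=100):
    """Hypothetical helper: overlaps is an iterable of (read_a, read_b, overlap_len).

    Edges shorter than min_overlap_len are ignored; clusters are the connected
    components of the remaining overlap graph. Singletons count as unclustered.
    """
    uf = UnionFind(read_ids)
    for a, b, length in overlaps:
        if length >= min_overlap_len:
            uf.union(a, b)
    components = defaultdict(list)
    for r in read_ids:
        components[uf.find(r)].append(r)
    return [sorted(c) for c in components.values() if len(c) > 1]


def species_per_cluster(clusters, species_of):
    """Number of distinct species in each cluster (1 means a species-pure cluster)."""
    return [len({species_of[r] for r in c}) for c in clusters]


def fraction_clustered(clusters, n_reads):
    """Fraction of all reads that is placed in one of the clusters."""
    return sum(len(c) for c in clusters) / n_reads


if __name__ == "__main__":
    reads = ["r1", "r2", "r3", "r4", "r5"]
    overlaps = [("r1", "r2", 150), ("r2", "r3", 120), ("r4", "r5", 40)]
    species = {"r1": "A", "r2": "A", "r3": "A", "r4": "B", "r5": "C"}
    clusters = cluster_reads(reads, overlaps, min_overlap_len=100)
    print(clusters)                                   # [['r1', 'r2', 'r3']]
    print(species_per_cluster(clusters, species))     # [1]
    print(fraction_clustered(clusters, len(reads)))   # 0.6
```

In this toy example the short r4–r5 overlap falls below the threshold, so those two reads stay unclustered. This shows the trade-off the two measures capture: a stricter threshold tends to yield purer clusters but places fewer reads in any cluster.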

Keywords

Statistics and Probability, Biochemistry, Molecular Biology, Computer Science Applications, Computational Theory and Mathematics, Computational Mathematics

Citation

Balvert, M, Luo, X, Hauptfeld, E, Schönhuth, A & Dutilh, B E 2021, 'OGRE: Overlap Graph-based metagenomic Read clustEring', Bioinformatics, vol. 37, no. 7, btaa760, pp. 905–912. https://doi.org/10.1093/bioinformatics/btaa760