RFPlasmid: Predicting plasmid sequences from short read assembly data using machine learning

Publication date

2020-09-02

Authors

Graaf-van Bloois, Linda van der (ORCID 0000-0001-8181-3393; ISNI 0000000395094347)
Wagenaar, Jaap A (ISNI 0000000388430808)
Zomer, Aldert L. (ORCID 0000-0002-0758-5190; ISNI 0000000393481634)

Document Type

Preprint

Abstract

Antimicrobial resistance (AMR) genes in bacteria are often carried on plasmids, and these plasmids can transfer AMR genes between bacteria. For molecular epidemiology and risk assessment, it is important to know whether the genes are located on highly transferable plasmids or on the more stable chromosome. However, draft whole genome sequences are fragmented, making it difficult to discriminate between plasmid and chromosomal contigs. Current methods that predict plasmid sequences from draft genome sequences rely on single features, such as k-mer composition, circularity of the DNA molecule, copy number, or sequence identity to plasmid replication genes. Each of these has drawbacks, especially when faced with large single-copy plasmids, which often carry resistance genes. Our newly developed prediction tool, RFPlasmid, combines multiple features, including k-mer composition and databases of plasmid and chromosomal marker proteins, to predict whether the likely source of a contig is plasmid or chromosomal. RFPlasmid supports models for 17 different bacterial species, including Campylobacter, E. coli, and Salmonella, and has a species-agnostic model for metagenomic assemblies or unsupported organisms. RFPlasmid is available both as a standalone tool and via a web interface.

Competing Interest Statement: The authors have declared no competing interest.
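
As an illustration of one feature named in the abstract, k-mer composition, the following is a minimal sketch of how a contig could be turned into a fixed-length vector of relative k-mer frequencies. It is not taken from RFPlasmid: the function name `kmer_profile`, the default `k=4`, and the example contig are assumptions made for illustration only.

```python
from collections import Counter
from itertools import product


def kmer_profile(sequence, k=4):
    """Return relative frequencies of all DNA k-mers, in a fixed lexicographic order.

    Hypothetical helper for illustration; not RFPlasmid's actual implementation.
    """
    sequence = sequence.upper()
    valid = set("ACGT")
    # Count only windows made purely of A, C, G, T (ambiguous bases such as N are skipped).
    counts = Counter(
        sequence[i:i + k]
        for i in range(len(sequence) - k + 1)
        if set(sequence[i:i + k]) <= valid
    )
    total = sum(counts.values())
    # Enumerate all 4**k possible k-mers so every contig yields a vector of the same length.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    return [counts[m] / total if total else 0.0 for m in kmers]


# Made-up contig, for illustration only.
profile = kmer_profile("ATGCGCGATATCGNNATGC", k=2)
```

A vector like this could be placed alongside other per-contig features (for example, hits to marker-protein databases) as input to a classifier; how RFPlasmid actually builds and combines its features is described in the preprint itself.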

Citation

van der Graaf-van Bloois, L, Wagenaar, J A & Zomer, A L 2020 'RFPlasmid: Predicting plasmid sequences from short read assembly data using machine learning' bioRxiv, pp. 1-17. https://doi.org/10.1101/2020.07.31.230631