gplasCC: classification and recovery of plasmids from short-read sequencing data for any bacterial species

Publication date

2026-03-01

Authors

Paganini, Julian A.
Kerkvliet, Jesse
Jordan, Oscar
Teunis, Gijs
Plantinga, Nienke L.
Meneses, Rodrigo
Willems, Rob (ISNI 0000000388459432)
Arredondo-Alonso, Sergio
Schürch, Anita (ORCID 0000-0003-1894-7545; ISNI 0000000139649112)

Document Type

Article

License

CC BY-NC

Abstract

Plasmids play a pivotal role in the spread of antibiotic resistance genes. Accurately reconstructing plasmids often requires long-read sequencing, but bacterial genomic data in publicly accessible repositories have historically been derived from short-read sequencing technology. We recently presented an approach for recovering Escherichia coli antimicrobial resistance plasmids using Illumina short reads. This method consisted of combining a robust binary classification tool named plasmidEC with gplas2, which is a tool that makes use of features of the assembly graph to bin predicted plasmid contigs into individual plasmids. Here, we developed plasmidCC, an upgrade from plasmidEC, capable of classifying plasmid contigs using Centrifuge databases. We have developed seven plasmidCC databases in addition to the database for E. coli: six species-specific models (Acinetobacter baumannii, Enterococcus faecium, Enterococcus faecalis, Klebsiella pneumoniae, Staphylococcus aureus, and Salmonella enterica) and one species-independent model for less frequently studied bacterial species. We combined these models with gplasCC to recover plasmids from >100 bacterial species. This approach allows comprehensive analysis of the wealth of bacterial short-read sequencing data available in public repositories and advances our understanding of microbial plasmids.
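The two-stage idea in the abstract (first classify contigs as plasmid or chromosome, then use the assembly graph to bin the plasmid contigs) can be illustrated with a toy sketch. This is not the gplasCC or plasmidCC implementation: the function name, the probability threshold, and the input format are hypothetical, and plain connected components stand in for whatever graph features the real tool uses.

```python
from collections import defaultdict


def bin_plasmid_contigs(edges, plasmid_prob, threshold=0.5):
    """Group predicted-plasmid contigs into bins (toy illustration).

    edges        : iterable of (contig_a, contig_b) assembly-graph links
    plasmid_prob : dict mapping contig -> classifier plasmid score (assumed)
    threshold    : hypothetical cutoff for calling a contig "plasmid"
    """
    # Stage 1: keep only contigs the classifier labels as plasmid.
    plasmid = {c for c, p in plasmid_prob.items() if p >= threshold}

    # Stage 2: build adjacency restricted to plasmid-predicted contigs.
    adj = defaultdict(set)
    for a, b in edges:
        if a in plasmid and b in plasmid:
            adj[a].add(b)
            adj[b].add(a)

    # Each connected component of the restricted graph becomes one bin.
    bins, seen = [], set()
    for start in sorted(plasmid):
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        bins.append(sorted(component))
    return bins


# Made-up example data: six contigs, c3 is predicted chromosomal.
edges = [("c1", "c2"), ("c2", "c3"), ("c3", "c4"), ("c5", "c6")]
plasmid_prob = {"c1": 0.9, "c2": 0.8, "c3": 0.1, "c4": 0.95, "c5": 0.7, "c6": 0.6}
bins = bin_plasmid_contigs(edges, plasmid_prob)
print(bins)
```

In this made-up example, c4 ends up in its own bin because its only graph neighbour (c3) was classified as chromosomal, which shows why the quality of the classification step matters for the binning step.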

Keywords

Structural Biology, Molecular Biology, Genetics, Computer Science Applications, Applied Mathematics

Citation

Paganini, J A, Kerkvliet, J J, Jordan, O, Teunis, G, Plantinga, N L, Meneses, R, Willems, R J L, Arredondo-Alonso, S & Schürch, A C 2026, 'gplasCC: classification and recovery of plasmids from short-read sequencing data for any bacterial species', NAR Genomics and Bioinformatics, vol. 8, no. 1, lqag028. https://doi.org/10.1093/nargab/lqag028