Echtvar: compressed variant representation for rapid annotation and filtering of SNPs and indels

Publication date

2023-01-11

Authors

Pedersen, Brent S.
de Ridder, Jeroen (ORCID 0000-0002-0828-3477; ISNI 0000000391695751)

Document Type

Article

License

CC BY

Abstract

Germline and somatic variants within an individual or cohort are interpreted with information from large cohorts. Annotation with this information becomes a computational bottleneck as population sets grow to terabytes of data. Here, we introduce echtvar, which efficiently encodes population variants and annotation fields into a compressed archive that can be used for rapid variant annotation and filtering. Most variants, represented by chromosome, position and alleles, are encoded into 32 bits, half the size of previous encoding schemes and at least 4 times smaller than a naive encoding. The annotations, stored separately within the same archive, are also encoded and compressed. We show that echtvar is faster and uses less space than existing tools and that it can effectively reduce the number of candidate variants. We give examples on germline and somatic variants to document how echtvar can facilitate exploratory data analysis on genetic variants. Echtvar is available at https://github.com/brentp/echtvar under an MIT license.
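To make the idea of a 32-bit variant encoding concrete, the following is a minimal sketch in Python. The bit layout is an assumption chosen for illustration and is not taken from the abstract: 20 bits for the position within a chromosome chunk, 2 bits each for the reference and alternate allele lengths, and 8 bits for up to four bases at 2 bits per base. The names `encode_variant` and `decode_variant` are hypothetical. Variants that do not fit the layout return `None`, reflecting the statement that most, not all, variants fit in 32 bits.

```python
from typing import Optional, Tuple

BASES = "ACGT"       # 2 bits per base
POS_BITS = 20        # assumed: position offset within a 2**20-base chunk


def encode_variant(pos: int, ref: str, alt: str) -> Optional[int]:
    """Pack (pos, ref, alt) into one 32-bit integer, or return None if it does not fit.

    Assumed layout, high to low bits:
    [20 bits position][2 bits len(ref)][2 bits len(alt)][8 bits bases]
    """
    seq = ref + alt
    if not ref or not alt or len(seq) > 4:
        return None  # too long for the 8-bit sequence field
    if any(base not in BASES for base in seq):
        return None  # e.g. N or symbolic alleles
    if not 0 <= pos < (1 << POS_BITS):
        return None  # position must be an offset within the chunk

    seq_bits = 0
    for base in seq:
        seq_bits = (seq_bits << 2) | BASES.index(base)
    return (pos << 12) | (len(ref) << 10) | (len(alt) << 8) | seq_bits


def decode_variant(code: int) -> Tuple[int, str, str]:
    """Invert encode_variant for a code that it produced."""
    pos = code >> 12
    ref_len = (code >> 10) & 0b11
    alt_len = (code >> 8) & 0b11
    seq_bits = code & 0xFF
    n = ref_len + alt_len
    # Bases were shifted in left to right, so the first base sits in the highest bits.
    seq = "".join(BASES[(seq_bits >> (2 * (n - 1 - i))) & 0b11] for i in range(n))
    return pos, seq[:ref_len], seq[ref_len:]
```

Under this assumed layout, a SNP or a short indel round-trips through a single integer below 2**32, while a longer allele (for example a 10-base deletion) would have to be stored outside the 32-bit table. The chromosome is not part of the packed value in this sketch; it is assumed to be implied by which chunk the value is stored in.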

Keywords

Genetics

Citation

Pedersen, B S & de Ridder, J 2023, 'Echtvar: compressed variant representation for rapid annotation and filtering of SNPs and indels', Nucleic Acids Research, vol. 51, no. 1, e3, pp. 1-8. https://doi.org/10.1093/nar/gkac931