RTCR: a pipeline for complete and accurate recovery of T cell repertoires from high throughput sequencing data

Publication date

2016

Authors

Gerritsen, Bram (ISNI 0000000506008086)
Pandit, Aridaman (ISNI 0000000507287224)
Andeweg, Arno C
de Boer, Rob J. (ORCID 0000-0002-2130-691X; ISNI 000000039525534X)

Document Type

Article
Open Access

Abstract

MOTIVATION: High Throughput Sequencing (HTS) has enabled researchers to probe the human T cell receptor (TCR) repertoire, which consists of many rare sequences. Distinguishing between true but rare TCR sequences and variants generated by polymerase chain reaction (PCR) and sequencing errors remains a formidable challenge. The conventional approach to handle errors is to remove low quality reads, and/or rare TCR sequences. Such filtering discards a large number of true and often rare TCR sequences. However, accurate identification and quantification of rare TCR sequences is essential for repertoire diversity estimation.

RESULTS: We devised a pipeline, called Recover TCR (RTCR), that accurately recovers TCR sequences, including rare TCR sequences, from HTS data (including barcoded data) even at low coverage. RTCR employs a data-driven statistical model to rectify PCR and sequencing errors in an adaptive manner. Using simulations, we demonstrate that RTCR can easily adapt to the error profiles of different types of sequencers and exhibits consistently high recall and high precision even at low coverages where other pipelines perform poorly. Using published real data, we show that RTCR accurately resolves sequencing errors and outperforms all other pipelines.

AVAILABILITY AND IMPLEMENTATION: The RTCR pipeline is implemented in Python (v2.7) and C and is freely available at http://uubram.github.io/RTCR/ along with documentation and examples of typical usage.

CONTACT: b.gerritsen@uu.nl.

Citation

Gerritsen, B, Pandit, A, Andeweg, A C & de Boer, R J 2016, 'RTCR: a pipeline for complete and accurate recovery of T cell repertoires from high throughput sequencing data', Bioinformatics, vol. 32, no. 20, pp. 3098-3106. https://doi.org/10.1093/bioinformatics/btw339