PLM-eXplain: Divide and Conquer the Protein Embedding Space

Publication date

2026-01

Authors

van Eck, Jan
Gogishvili, Dea (ORCID 0000-0001-8809-0861, ISNI 0000000503797411)
Silva, Wilson (ORCID 0000-0002-4080-9328, ISNI 0000000518163972)
Abeln, Sanne (ORCID 0000-0002-2779-7174, ISNI 0000000133909702)

Document Type

Article

License

CC BY

Abstract

MOTIVATION: Protein language models (PLMs) have revolutionized computational biology through their ability to generate powerful sequence representations for diverse prediction tasks. However, their black-box nature limits biological interpretation and the translation of predictions into actionable insights. Bridging this gap requires approaches that maintain predictive performance while providing interpretable explanations of model behaviour.

RESULTS: We present PLM-eXplain (PLM-X), an explainable adapter layer that factors PLM embeddings into two complementary components: an interpretable subspace based on established biochemical features, and a residual subspace that retains predictive but non-interpretable information. Using embeddings from ESM2 and ProtBert, PLM-X incorporates well-established properties, including secondary structure and hydropathy, while maintaining high predictive performance. We demonstrate the effectiveness of our approach across three biologically relevant classification tasks: extracellular vesicle association, transmembrane helix prediction, and aggregation propensity prediction. PLM-X enables biological interpretation of model decisions without sacrificing accuracy, offering a generalizable solution for enhancing PLM interpretability across various downstream applications.

AVAILABILITY AND IMPLEMENTATION: Source code and models are available at https://github.com/AIT4LIFE-UU/PLM-eXplain/.
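The abstract describes factoring a PLM embedding into an interpretable part and a residual part. The sketch below illustrates that idea with a simple linear stand-in; it is not PLM-X's learned adapter, which the authors' repository implements. It assumes per-protein embeddings `E` (n x d) and a matrix `F` (n x k) of known biochemical feature values (for example, hydropathy), fits a least-squares map from `E` to `F`, and splits each embedding into coordinates along the fitted feature directions plus a residual orthogonal to them. The helper name `factor_embeddings` and the synthetic data are hypothetical.

```python
import numpy as np


def factor_embeddings(E, F):
    """Linear stand-in for splitting embeddings into interpretable and residual parts.

    Hypothetical helper, NOT the PLM-X adapter. E is an (n, d) array of
    embeddings, F an (n, k) array of known feature values per sample.
    Returns (z_interp, E_resid, Q):
      z_interp : (n, k) predicted feature values, one interpretable axis per feature
      E_resid  : (n, d) part of E orthogonal to the fitted feature directions
      Q        : (d, k) orthonormal basis of those feature directions
    """
    # Least-squares map A of shape (d, k) such that E @ A approximates F.
    A, *_ = np.linalg.lstsq(E, F, rcond=None)
    # Interpretable coordinates: each column estimates one named feature.
    z_interp = E @ A
    # Orthonormal basis for the span of the feature directions in embedding space.
    Q, _ = np.linalg.qr(A)
    # Project out that span; the remainder has no component along those directions.
    E_resid = E - (E @ Q) @ Q.T
    return z_interp, E_resid, Q


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, k = 200, 16, 2  # toy sizes; real PLM embeddings are much wider
    E = rng.normal(size=(n, d))
    # Synthetic "features" (standing in for, e.g., hydropathy and helix content)
    # built as noisy linear functions of the embedding, for illustration only.
    F = E @ rng.normal(size=(d, k)) + 0.1 * rng.normal(size=(n, k))
    z_interp, E_resid, Q = factor_embeddings(E, F)
    print(z_interp.shape, E_resid.shape)
```

A downstream classifier could then be trained on the concatenation of `z_interp` and `E_resid`, so that weights on the first k columns read as contributions of named features. Note that this linear sketch only removes linearly aligned feature directions; it does not guarantee the residual is free of nonlinearly encoded feature information.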

Keywords

Statistics and Probability, Biochemistry, Molecular Biology, Computer Science Applications, Computational Theory and Mathematics, Computational Mathematics

Citation

van Eck, J, Gogishvili, D, Silva, W & Abeln, S 2026, 'PLM-eXplain: Divide and Conquer the Protein Embedding Space', Bioinformatics (Oxford, England), vol. 42, no. 1, btaf631. https://doi.org/10.1093/bioinformatics/btaf631