PLM-eXplain: Divide and Conquer the Protein Embedding Space
Publication date
2026-01
Document Type
Article
License
CC BY
Abstract
MOTIVATION: Protein language models (PLMs) have revolutionized computational biology through their ability to generate powerful sequence representations for diverse prediction tasks. However, their black-box nature limits biological interpretation and translation to actionable insights. Bridging this gap requires approaches that maintain predictive performance while providing interpretable explanations of model behaviour. RESULTS: We present PLM-eXplain (PLM-X), an explainable adapter layer that factors PLM embeddings into two complementary components: an interpretable subspace based on established biochemical features, and a residual subspace that retains predictive, non-interpretable information. Using embeddings from ESM2 and ProtBert, PLM-X incorporates well-established properties, including secondary structure and hydropathy, while maintaining high predictive performance. We demonstrate the effectiveness of our approach across three biologically relevant classification tasks: extracellular vesicle association, transmembrane helix prediction, and aggregation propensity prediction. PLM-X enables biological interpretation of model decisions without sacrificing accuracy, offering a generalizable solution for enhancing PLM interpretability across various downstream applications. AVAILABILITY AND IMPLEMENTATION: Source code and models are available at https://github.com/AIT4LIFE-UU/PLM-eXplain/.
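The abstract describes factoring each PLM embedding into an interpretable subspace and a residual subspace. The sketch below illustrates only that general idea, using a fixed linear split under assumptions that are ours and not the paper's. It assumes the interpretable subspace is spanned by hypothetical linear "probe" directions, for example one per biochemical feature such as hydropathy. The residual is then whatever remains after orthogonal projection onto that span. PLM-X itself is a learned adapter layer, so the linked repository remains the reference for the actual method. The names `gram_schmidt`, `factor_embedding` and the toy vectors are hypothetical and used for illustration only.

```python
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def gram_schmidt(directions: List[Vector], eps: float = 1e-12) -> List[Vector]:
    """Orthonormalize hypothetical probe directions (classical Gram-Schmidt).

    Directions that are linearly dependent on earlier ones are dropped.
    """
    basis: List[Vector] = []
    for v in directions:
        w = [float(x) for x in v]
        for b in basis:
            coeff = dot(w, b)
            w = [wi - coeff * bi for wi, bi in zip(w, b)]
        norm = dot(w, w) ** 0.5
        if norm > eps:
            basis.append([wi / norm for wi in w])
    return basis


def factor_embedding(
    embedding: Vector, directions: List[Vector]
) -> Tuple[Vector, Vector, Vector]:
    """Split an embedding into an interpretable part and a residual part.

    Returns (coords, interpretable, residual), where
    embedding == interpretable + residual (up to float error) and the
    residual is orthogonal to every probe direction. Note that coords are
    expressed in the orthonormalized basis, so they equal per-feature
    scores only when the probe directions are already orthonormal.
    """
    basis = gram_schmidt(directions)
    coords = [dot(embedding, b) for b in basis]
    interpretable = [0.0] * len(embedding)
    for c, b in zip(coords, basis):
        interpretable = [ii + c * bi for ii, bi in zip(interpretable, b)]
    residual = [e - i for e, i in zip(embedding, interpretable)]
    return coords, interpretable, residual


if __name__ == "__main__":
    # Toy 4-dimensional "embedding" with two hypothetical feature axes.
    emb = [1.0, 2.0, 3.0, 4.0]
    probes = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
    coords, interp, resid = factor_embedding(emb, probes)
    print(coords)  # [1.0, 2.0]
    print(resid)   # [0.0, 0.0, 3.0, 4.0]
```

In this toy case, the first two coordinates are assigned to the "interpretable" part, and everything the probes cannot explain is left in the residual. An orthogonal split guarantees that no information is lost, because the two parts always sum back to the original embedding. That mirrors the abstract's point that the residual subspace retains predictive information not captured by the interpretable features.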
Keywords
Statistics and Probability, Biochemistry, Molecular Biology, Computer Science Applications, Computational Theory and Mathematics, Computational Mathematics
Citation
van Eck, J, Gogishvili, D, Silva, W & Abeln, S 2026, 'PLM-eXplain: Divide and Conquer the Protein Embedding Space', Bioinformatics (Oxford, England), vol. 42, no. 1, btaf631. https://doi.org/10.1093/bioinformatics/btaf631