Insights into the inner workings of transformer models for protein function prediction

Publication type: Article, Preprint
Date: 19 Jan 2024
Embargo end date: 01 Jan 2023
Country: Germany
Language: English
Publisher: Oxford University Press (OUP)
Journal: Bioinformatics, volume 40 (eISSN: 1367-4811)

Authors: Markus Wenzel; Erik Grüner; Nils Strodthoff

doi: 10.1093/bioinformatics/btae031 , 10.48550/arxiv.2309.03631

pmid: 38244570

pmc: PMC10950482

arXiv: 2309.03631

Abstract

Motivation: We explored how explainable artificial intelligence (XAI) can help to shed light on the inner workings of neural networks for protein function prediction. To this end, we extended the widely used XAI method of integrated gradients so that latent representations inside transformer models, which were finetuned for Gene Ontology term and Enzyme Commission number prediction, can be inspected too.

Results: The approach enabled us to identify amino acids in the sequences that the transformers pay particular attention to, and to show that these relevant sequence parts reflect expectations from biology and chemistry. This holds both in the embedding layer and inside the model, where we identified transformer heads with a statistically significant correspondence of attribution maps with ground truth sequence annotations (e.g. transmembrane regions, active sites) across many proteins.

Availability and Implementation: Source code can be accessed at https://github.com/markuswenzel/xai-proteins.
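For orientation, integrated gradients attributes a prediction to input features by averaging the model's gradient along a straight path from a baseline to the input and scaling the result by the difference between input and baseline. The sketch below is a minimal illustration on a toy sigmoid scorer with a hand-written gradient. The functions `model`, `model_grad`, and `integrated_gradients`, the weights, and the all-zeros baseline are assumptions made for illustration; they are not taken from the paper or its repository, and the sketch does not cover the paper's extension to latent representations inside the transformer.

```python
import numpy as np

def model(x, w):
    """Toy differentiable scorer: sigmoid of a weighted sum (a stand-in, not a transformer)."""
    return 1.0 / (1.0 + np.exp(-np.dot(w, x)))

def model_grad(x, w):
    """Analytic gradient of the toy scorer with respect to its input x."""
    s = model(x, w)
    return s * (1.0 - s) * w

def integrated_gradients(x, baseline, w, steps=200):
    """Approximate integrated gradients with a midpoint Riemann sum
    along the straight path from baseline to x."""
    alphas = (np.arange(steps) + 0.5) / steps            # midpoints in (0, 1)
    path = baseline + alphas[:, None] * (x - baseline)   # shape: (steps, n_features)
    grads = np.array([model_grad(p, w) for p in path])   # gradient at each path point
    return (x - baseline) * grads.mean(axis=0)

# Hypothetical weights and input; the third feature has zero weight.
w = np.array([1.5, -2.0, 0.0, 0.5])
x = np.array([1.0, 0.5, 3.0, 1.0])
baseline = np.zeros_like(x)  # assumed "no signal" reference input

attributions = integrated_gradients(x, baseline, w)

# Completeness check: attributions should sum to f(x) - f(baseline).
completeness_gap = abs(attributions.sum() - (model(x, w) - model(baseline, w)))
```

In this toy setting the feature with zero weight receives zero attribution, and the attributions sum (up to discretization error) to the change in model output between baseline and input, which is the completeness property that makes the method attractive for relevance maps over sequence positions.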

Country

Germany

Related Organizations

Fraunhofer Society (Germany)
Fraunhofer Institute for Telecommunications, Heinrich Hertz Institute (Germany)
State Education and Research Institute for Viticulture and Pomology Weinsberg (Germany)
Fraunhofer Heinrich Hertz Institute (Germany)
Carl von Ossietzky University of Oldenburg (Germany)

Keywords

FOS: Computer and information sciences, Computer Science - Machine Learning, Original Paper, biology, Biomolecules (q-bio.BM), Computer science, Life sciences, Machine Learning (cs.LG), Gene Ontology, Quantitative Biology - Biomolecules, Protein Domains, Artificial Intelligence, FOS: Biological sciences, internet, Neural Networks, Computer, Amino Acids

Impact by BIP!

Selected citations: 12. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
Popularity: Top 10%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
Influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
Impulse: Top 10%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.