
Abstract

Motivation: We explored how explainable artificial intelligence (XAI) can help shed light on the inner workings of neural networks for protein function prediction. To this end, we extended the widely used XAI method of integrated gradients so that latent representations inside transformer models, which were fine-tuned for Gene Ontology term and Enzyme Commission number prediction, can be inspected as well.

Results: The approach enabled us to identify the amino acids in the sequences to which the transformers pay particular attention, and to show that these relevant sequence parts reflect expectations from biology and chemistry. This holds both in the embedding layer and inside the model, where we identified transformer heads whose attribution maps correspond, with statistical significance, to ground-truth sequence annotations (e.g. transmembrane regions, active sites) across many proteins.

Availability and Implementation: Source code can be accessed at https://github.com/markuswenzel/xai-proteins.
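For readers unfamiliar with integrated gradients, the following is a minimal sketch of the standard, input-level form of the method on a toy model. It is not the authors' implementation and does not cover the extension to latent transformer representations described above. The logistic toy model, the all-zeros baseline, the weights, and all function names are assumptions chosen for illustration. Attributions are the input-minus-baseline difference multiplied by the gradient averaged along a straight path from the baseline to the input.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def model(x, w):
    """Toy differentiable model (assumption): logistic score of a linear projection."""
    return sigmoid(x @ w)


def model_grad(x, w):
    """Analytic gradient of model(x, w) with respect to the input x."""
    s = sigmoid(x @ w)
    return s * (1.0 - s) * w


def integrated_gradients(x, baseline, w, steps=200):
    """Approximate integrated gradients with a midpoint Riemann sum."""
    # Interpolation coefficients at the midpoints of `steps` equal intervals on [0, 1].
    alphas = (np.arange(steps) + 0.5) / steps
    # Points on the straight path from the baseline to the input, shape (steps, d).
    path = baseline + alphas[:, None] * (x - baseline)
    # Gradient of the model output at every point on the path.
    grads = np.stack([model_grad(p, w) for p in path])
    # Average gradient along the path, scaled by the input difference.
    return (x - baseline) * grads.mean(axis=0)


# Hypothetical weights and input; the third feature has zero weight.
w = np.array([1.5, -2.0, 0.0, 0.5])
x = np.array([1.0, -0.5, 3.0, 1.0])
baseline = np.zeros_like(x)

attributions = integrated_gradients(x, baseline, w)

# Completeness check: attributions should sum to model(x) - model(baseline).
completeness_gap = abs(attributions.sum() - (model(x, w) - model(baseline, w)))
print(attributions, completeness_gap)
```

In this toy setting, the feature with zero weight receives zero attribution, and the attributions sum (up to numerical integration error) to the difference between the model output at the input and at the baseline. For protein sequences, the per-feature attributions would typically be aggregated per amino acid position, which is an assumption about usage rather than a statement of the paper's procedure.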
FOS: Computer and information sciences, Computer Science - Machine Learning, Original Paper, biology, Biomolecules (q-bio.BM), Computer science, Life sciences, Machine Learning (cs.LG), Gene Ontology, Quantitative Biology - Biomolecules, Protein Domains, Artificial Intelligence, FOS: Biological sciences, internet, Neural Networks, Computer, Amino Acids
| Indicator | Description | Value |
| --- | --- | --- |
| Selected citations | Citations derived from selected sources; an alternative to the "Influence" indicator. | 12 |
| Popularity | The "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Top 10% |
| Influence | The overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Average |
| Impulse | The initial momentum of an article directly after its publication, based on the underlying citation network. | Top 10% |
