
Abstract Motivation Drug–target interaction (DTI) prediction is a foundational task for in-silico drug discovery, which is costly and time-consuming due to the need of experimental search over large drug compound space. Recent years have witnessed promising progress for deep learning in DTI predictions. However, the following challenges are still open: (i) existing molecular representation learning approaches ignore the sub-structural nature of DTI, thus produce results that are less accurate and difficult to explain and (ii) existing methods focus on limited labeled data while ignoring the value of massive unlabeled molecular data. Results We propose a Molecular Interaction Transformer (MolTrans) to address these limitations via: (i) knowledge inspired sub-structural pattern mining algorithm and interaction modeling module for more accurate and interpretable DTI prediction and (ii) an augmented transformer encoder to better extract and capture the semantic relations among sub-structures extracted from massive unlabeled biomedical data. We evaluate MolTrans on real-world data and show it improved DTI prediction performance compared to state-of-the-art baselines. Availability and implementation The model scripts are available at https://github.com/kexinhuang12345/moltrans. Supplementary information Supplementary data are available at Bioinformatics online.
FOS: Computer and information sciences, Computer Science - Machine Learning, Quantitative Biology - Quantitative Methods, Original Papers, Machine Learning (cs.LG), Drug Development, Pharmaceutical Preparations, FOS: Biological sciences, Drug Discovery, Computer Simulation, Algorithms, Quantitative Methods (q-bio.QM)
FOS: Computer and information sciences, Computer Science - Machine Learning, Quantitative Biology - Quantitative Methods, Original Papers, Machine Learning (cs.LG), Drug Development, Pharmaceutical Preparations, FOS: Biological sciences, Drug Discovery, Computer Simulation, Algorithms, Quantitative Methods (q-bio.QM)
| selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 500 | |
| popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Top 0.1% | |
| influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Top 1% | |
| impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Top 0.1% |
