Deep Learning for Protein-Ligand Docking: Are We There Yet?

Publication: Other literature type, Preprint; 05 Jun 2024; English; Publisher: Zenodo; Journal: CoRR, volume abs/2405.14108

Authors: Morehead, Alex; Giri, Nabin; Liu, Jian; Cheng, Jianlin

DOI: 10.5281/zenodo.13858866, 10.5281/zenodo.11477766, 10.5281/zenodo.14629652



Abstract

Included are preprocessed datasets and benchmark method predictions accompanying the benchmarking manuscript "Deep Learning for Protein-Ligand Docking: Are We There Yet?" [1]. In particular, the preprocessed Astex Diverse, PoseBusters Benchmark, and DockGen datasets, as well as the publicly available CASP15 targets referenced in the manuscript, are available for download. Also available are baseline predictions from a variety of deep learning and conventional docking methods (e.g., DiffDock-L, Vina) for each of these benchmark datasets, including pocket-only baseline results for the PoseBusters Benchmark dataset.

Note that the "holo_aligned" AlphaFold 3-predicted protein structures provided for the Astex Diverse, PoseBusters Benchmark, and DockGen datasets have been pre-aligned to the corresponding ground-truth (holo) protein structures. Similarly, the "predicted_structures" AlphaFold 3-predicted protein structures provided for the CASP15 dataset have been pre-aligned to the corresponding ground-truth (holo) protein structures.

Paper Abstract:

The effects of ligand binding on protein structures and their in vivo functions carry numerous implications for modern biomedical research and biotechnology development efforts such as drug discovery. Although several deep learning (DL) methods and benchmarks designed for protein-ligand docking have recently been introduced, to date no prior works have systematically studied the behavior of docking methods within the broadly applicable context of (1) using predicted (apo) protein structures for docking (e.g., for applicability to unknown structures); (2) docking multiple ligands concurrently to a given target protein (e.g., for enzyme design); and (3) having no prior knowledge of binding pockets (e.g., for unknown pocket generalization). To enable a deeper understanding of docking methods' real-world utility, we introduce PoseBench, the first comprehensive benchmark for broadly applicable protein-ligand docking.
PoseBench enables researchers to rigorously and systematically evaluate DL docking methods for apo-to-holo protein-ligand docking and protein-ligand structure generation using both single and multi-ligand benchmark datasets, the latter of which we introduce for the first time to the DL community. Empirically, using PoseBench, we find that (1) DL methods consistently outperform conventional docking algorithms; (2) most recent DL docking methods fail to generalize to multi-ligand protein targets; and (3) training DL methods with physics-informed loss functions on diverse clusters of protein-ligand complexes is a promising direction for future work. Code, data, tutorials, and benchmark results are available at https://github.com/BioinfoMachineLearning/PoseBench.

References:

[1] Morehead A, Giri N, Liu J, Cheng J. Deep Learning for Protein-Ligand Docking: Are We There Yet? arXiv; 2024. Available from: http://arxiv.org/abs/2308.05777
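The record notes that the "holo_aligned" and "predicted_structures" AlphaFold 3-predicted protein structures were pre-aligned to the ground-truth (holo) structures, but it does not say how. One common way to do this, assumed here rather than taken from PoseBench, is a least-squares (Kabsch) superposition on matched C-alpha atoms. The minimal sketch below illustrates why pre-alignment matters when predicted complexes are scored against holo references: once the predicted protein is superposed, the same rigid transform is carried over to the predicted ligand, so any remaining ligand RMSD reflects pose error rather than a difference in coordinate frames. The helper names (`kabsch`, `apply_transform`, `rmsd`) and the synthetic coordinates are hypothetical illustrations, not part of the dataset or the PoseBench code.

```python
import numpy as np


def kabsch(mobile: np.ndarray, target: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Least-squares rigid superposition of `mobile` onto `target` (Kabsch).

    Both inputs are (N, 3) arrays whose rows are already matched one-to-one
    (e.g., C-alpha atoms of the same residues). Returns a proper rotation
    matrix `r` and translation `t` such that `mobile @ r.T + t` ~ `target`.
    """
    mobile_center = mobile.mean(axis=0)
    target_center = target.mean(axis=0)
    # Cross-covariance matrix of the two centered point sets.
    h = (mobile - mobile_center).T @ (target - target_center)
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so that r is a rotation, not a reflection.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    t = target_center - mobile_center @ r.T
    return r, t


def apply_transform(r: np.ndarray, t: np.ndarray, coords: np.ndarray) -> np.ndarray:
    """Apply the rigid transform (r, t) to an (M, 3) coordinate array."""
    return coords @ r.T + t


def rmsd(a: np.ndarray, b: np.ndarray) -> float:
    """Plain (not symmetry-corrected) RMSD between matched (M, 3) arrays."""
    return float(np.sqrt(((a - b) ** 2).sum(axis=1).mean()))


# Synthetic stand-ins for a holo complex (hypothetical data, not from the record).
rng = np.random.default_rng(0)
holo_ca = rng.normal(size=(50, 3)) * 10.0      # protein C-alpha coordinates
holo_ligand = rng.normal(size=(20, 3)) * 2.0   # ligand heavy-atom coordinates

# A "predicted" complex: identical geometry, but expressed in a different frame.
theta = np.deg2rad(30.0)
rot_z = np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta), np.cos(theta), 0.0],
    [0.0, 0.0, 1.0],
])
shift = np.array([5.0, -3.0, 12.0])
pred_ca = holo_ca @ rot_z.T + shift
pred_ligand = holo_ligand @ rot_z.T + shift

# Superpose the predicted protein onto the holo protein, then carry the ligand along.
r, t = kabsch(pred_ca, holo_ca)
aligned_ligand = apply_transform(r, t, pred_ligand)

print(f"ligand RMSD before alignment: {rmsd(pred_ligand, holo_ligand):.2f}")
print(f"ligand RMSD after alignment:  {rmsd(aligned_ligand, holo_ligand):.2e}")
```

In this toy case the predicted complex is an exact rigid copy of the holo one, so the ligand RMSD drops to numerical zero after superposition. With real data, residue correspondence between predicted and holo chains would first have to be established (e.g., by sequence alignment), and ligand RMSD is usually symmetry-corrected; this sketch omits both.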

Related Organizations

University of Missouri
United States

Keywords

Deep Learning, Protein-Ligand Structure Prediction, Protein-Ligand Docking

Impact by BIP!

Selected citations: 1
Popularity: Average
Influence: Average
Impulse: Average




SDGs

3. Good health

Related to Research communities

Knowmad Institut