Automated benchmarking of combined protein structure and ligand conformation prediction

Leemann, Michèle; Sagasta, Ander; Eberhardt, Jerome; Schwede, Torsten; Robin, Xavier; Durairaj, Janani

ZENODO

Dataset . 2023

License: CC BY

Data sources: Datacite

Research data: Dataset. Published: 15 Sep 2023. Publisher: Zenodo. Funded by: EC | LIGATE, EC | ELIXIR-EXCELERATE

DOI: 10.5281/zenodo.8348279, 10.5281/zenodo.8348280

Abstract

The prediction of protein-ligand complexes (PLC), using both experimental and predicted structures, is an active and important area of research, underscored by the inclusion of the Protein-Ligand Interaction category in the latest round of the Critical Assessment of Protein Structure Prediction experiment, CASP15. The prediction task in CASP15 consisted of predicting both the three-dimensional structure of the receptor protein and the position and conformation of the ligand. This paper addresses the challenges of devising automated benchmarking techniques for PLC prediction and proposes solutions. The reliability of experimentally solved PLC as ground truth reference structures is assessed using various validation criteria. The similarity of a PLC to previously released complexes is used to judge PLC diversity and the difficulty of a PLC as a prediction target. We show that the commonly used PDBBind time-split test-set is inappropriate for comprehensive PLC evaluation, with state-of-the-art tools showing conflicting results on a more representative and high-quality dataset constructed for benchmarking purposes. We also show that redocking on crystal structures is a much simpler task than docking into predicted protein models, as demonstrated by the two PLC-prediction-specific scoring metrics created. Finally, we introduce a fully automated pipeline that predicts PLC and evaluates the accuracy of the protein structure, ligand pose, and protein-ligand interactions.

This repository contains:

- all_validation_clustering_data.tsv: X-ray validation data and MMSeqs cluster identifiers at different sequence identities for over a million small molecule and ion-binding pockets in the PDB.
- hqr_dataset.tsv: PDB IDs and ligand information for the high quality representative (HQR) dataset described in the manuscript.
- score_files.tar.gz: Full docking results for all detected pockets for the PDBBind time-split test-set, the HQR dataset, and the subsets of AF models created for both datasets. One file per benchmarked tool, with the following columns: Tool, Complex, Pocket, Rank, lDDT-PLI, lDDT-LP, BiSyRMSD, Reference_Ligand, Tool-generated Score.
- errors_all_sets.csv: Report of failures when running the pipeline, with the following columns: Process, Complex/Ligand/Receptor, Problem.
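The score file columns listed above lend themselves to a simple per-tool summary. The following is a minimal sketch rather than the authors' evaluation code, and it rests on several assumptions not confirmed by this record: that each score file is tab-separated, that its header strings match the column names given here exactly, that Rank starts at 1 for the best pose, and that a BiSyRMSD of 2.0 Å or less counts as a success (a common docking convention, not a threshold taken from this dataset). The function names and the sample rows are hypothetical and invented purely for illustration.

```python
import csv
import io

# Hypothetical rows for illustration only; these are NOT real results
# from the dataset. The tab delimiter and header spelling are assumptions.
SAMPLE = (
    "Tool\tComplex\tPocket\tRank\tlDDT-PLI\tlDDT-LP\tBiSyRMSD\t"
    "Reference_Ligand\tTool-generated Score\n"
    "toolA\t1abc\tp1\t1\t0.80\t0.90\t1.5\tLIG\t-7.1\n"
    "toolA\t1abc\tp1\t2\t0.85\t0.90\t0.8\tLIG\t-6.4\n"
    "toolA\t2xyz\tp1\t1\t0.40\t0.70\t3.2\tLIG\t-5.0\n"
)


def read_scores(handle, delimiter="\t"):
    """Read a score file into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle, delimiter=delimiter))


def top_ranked(rows):
    """Keep the lowest-Rank row for each (Complex, Pocket) pair."""
    best = {}
    for row in rows:
        key = (row["Complex"], row["Pocket"])
        if key not in best or int(row["Rank"]) < int(best[key]["Rank"]):
            best[key] = row
    return list(best.values())


def success_rate(rows, threshold=2.0):
    """Fraction of top-ranked poses with BiSyRMSD <= threshold.

    The 2.0 default is an assumed convention, not a value from the source.
    """
    top = top_ranked(rows)
    if not top:
        return 0.0
    hits = sum(1 for r in top if float(r["BiSyRMSD"]) <= threshold)
    return hits / len(top)


rows = read_scores(io.StringIO(SAMPLE))
rate = success_rate(rows)
print(f"top-1 success rate: {rate:.2f}")
```

To apply this to a real file extracted from score_files.tar.gz, pass an open file handle to `read_scores` instead of the in-memory sample, and adjust `delimiter` if the files turn out to use a different separator.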

Related Organizations

University of Lausanne
Switzerland
SIB Swiss Institute of Bioinformatics
Switzerland
University of Basel
Switzerland
