
Recently, many artificial intelligence (AI)-powered protein–ligand docking and scoring methods have been developed, demonstrating impressive speed and accuracy. However, these methods often neglect the physical plausibility of the docked complexes and their efficacy in virtual screening (VS) projects. Therefore, we conducted a comprehensive benchmark analysis of four AI-powered docking tools, four physics-based docking tools, and two AI-enhanced rescoring methods. We first constructed the TrueDecoy set, a dataset on which the redocking experiments revealed that KarmaDock and CarsiDock surpassed all physics-based tools in docking accuracy, whereas all physics-based tools notably outperformed AI-based methods in structural rationality. The low physical plausibility of docked structures generated by the top AI method, CarsiDock, mainly stems from insufficient intermolecular validity. The VS results on the TrueDecoy set highlight the effectiveness of RTMScore as a rescoring function, and Glide-based methods achieved the highest enrichment factors among all docking tools. Furthermore, we created the RandomDecoy set, a dataset that more closely resembles real-world VS scenarios, on which AI-based tools clearly outperformed Glide. Additionally, we found that the employed ligand-based postprocessing methods had a weak or even negative effect on optimizing the conformations of docked complexes and on enhancing VS performance. Finally, we proposed a hierarchical VS strategy that could efficiently and accurately enrich active molecules in large-scale VS projects.
