
This preprint reviews how modern machine learning is reshaping the drug discovery and clinical development pipeline, focusing on two core bottlenecks in pharma R&D: extreme cost (often cited as several billion USD per approved drug) and high clinical trial failure rates. It surveys practical ML approaches across target identification, hit discovery, lead optimization, ADMET/toxicity prediction, de novo molecular design, and clinical trial risk modeling, and indicates where specific model families fit best, including graph neural networks for molecular property prediction and transformer-based architectures for molecule generation and sequence-driven tasks. The paper presents a comparative evaluation and reports gains that include improved hit identification performance, faster lead optimization cycles, stronger prediction of mid-stage (Phase II) trial failures, robust toxicity prediction (e.g., AUC > 0.85), and generation of novel compounds with high synthetic accessibility. It also discusses real-world limitations (data quality, bias, interpretability, privacy and proprietary constraints, and regulatory acceptance) and outlines near-term and longer-term integration directions such as federated learning, digital twins, automated labs, and quantum ML.
FOS: Computer and information sciences, FOS: Medical and health sciences, Pharmaceutical sciences
