Finding the most potent compounds using active learning on molecular pairs

Type: Article, Other literature type

Date: 27 Aug 2024

Language: English

Publisher: Beilstein Institut

Journal: Beilstein Journal of Organic Chemistry, volume 20, pages 2152-2162 (eissn: 1860-5397)

Funded by: NIH | Designing Personalized Fo...

Authors: Zachary Fralish; Daniel Reker

doi: 10.3762/bjoc.20.185

pmid: 39224230

pmc: PMC11368049


Abstract

Active learning allows algorithms to steer iterative experimentation to accelerate and de-risk molecular optimizations, but actively trained models might still exhibit poor performance during early project stages where the training data is limited and model exploitation might lead to analog identification with limited scaffold diversity. Here, we present ActiveDelta, an adaptive approach that leverages paired molecular representations to predict improvements from the current best training compound to prioritize further data acquisition. We apply the ActiveDelta concept to both graph-based deep (Chemprop) and tree-based (XGBoost) models during exploitative active learning for 99 Ki benchmarking datasets. We show that both ActiveDelta implementations excel at identifying more potent inhibitors compared to the standard exploitative active learning implementations of Chemprop, XGBoost, and Random Forest. The ActiveDelta approach is also able to identify more chemically diverse inhibitors in terms of their Murcko scaffolds. Finally, deep models such as Chemprop trained on data selected through ActiveDelta approaches can more accurately identify inhibitors in test data created through simulated time-splits. Overall, this study highlights the large potential for molecular pairing approaches to further improve popular active learning strategies in low data regimes by enabling faster and more accurate identification of more diverse molecular hits against critical drug targets.
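The selection loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 2-D descriptors, the made-up potency function, the start from two random compounds, and the k-nearest-neighbor regressor (standing in for Chemprop or XGBoost) are all assumptions chosen to keep the example dependency-free. The function names are hypothetical. What it does show is the core idea: train on pairs of training compounds to predict potency differences, then pick the candidate with the largest predicted improvement over the current best training compound.

```python
import random

def pair_features(a, b):
    """Concatenate two descriptor vectors into one paired representation."""
    return a + b

def build_pairs(X, y):
    """Pair every training compound with every other; target is y[j] - y[i]."""
    n = len(X)
    return [(pair_features(X[i], X[j]), y[j] - y[i])
            for i in range(n) for j in range(n)]

def knn_predict(pairs, query, k=3):
    """Stand-in regressor (assumption): mean target of the k nearest pairs."""
    def dist(p):
        return sum((u - v) ** 2 for u, v in zip(p[0], query))
    nearest = sorted(pairs, key=dist)[:k]
    return sum(t for _, t in nearest) / len(nearest)

def active_delta_step(train_X, train_y, pool_X):
    """Index of the pool compound with the largest predicted improvement
    over the most potent training compound."""
    pairs = build_pairs(train_X, train_y)
    best = train_X[max(range(len(train_y)), key=train_y.__getitem__)]
    scores = [knn_predict(pairs, pair_features(best, c)) for c in pool_X]
    return max(range(len(pool_X)), key=scores.__getitem__)

def toy_potency(x):
    """Made-up potency surface with a single optimum (illustration only)."""
    return -(x[0] - 0.5) ** 2 - (x[1] + 0.3) ** 2

def run(n_iter=10, seed=0):
    """Exploitative loop on a toy library of 60 random 2-D 'compounds'."""
    rng = random.Random(seed)
    library = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(60)]
    idx = list(range(len(library)))
    rng.shuffle(idx)
    train_idx, pool_idx = idx[:2], idx[2:]  # assumed start: two random compounds
    for _ in range(n_iter):
        train_X = [library[i] for i in train_idx]
        train_y = [toy_potency(x) for x in train_X]
        pool_X = [library[i] for i in pool_idx]
        pick = active_delta_step(train_X, train_y, pool_X)
        train_idx.append(pool_idx.pop(pick))  # "measure" the pick, add to training
    return library, train_idx, pool_idx

if __name__ == "__main__":
    library, train_idx, pool_idx = run()
    print("best potency found:", max(toy_potency(library[i]) for i in train_idx))
```

Because the model is retrained on all pairs at each iteration, the number of training examples grows quadratically with the number of measured compounds, which is one plausible reason pairing helps when data is scarce. In a real setting, `knn_predict` would be replaced by a learned model and the descriptors by molecular representations.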

Related Organizations

Duke University, United States
Department of Biomedical Engineering, Duke University, United States

Keywords

machine learning, QD241-441, drug design, active learning, potency predictions, Science, Q, molecular optimization, Organic chemistry, Full Research Paper

Related research

ActiveDelta software on GitHub (IsRelatedTo)

Impact by BIP!

selected citations: 3
popularity: Top 10%
influence: Average
impulse: Average