This repository contains data and models used in the following paper.

Swanson, K., Liu, G., Catacutan, D., Zou, J. & Stokes, J. Generative AI for designing and validating easily synthesizable and structurally novel antibiotics. Nature Machine Intelligence, 2024.

The data and models are meant to be used with the SyntheMol code. More details about how to use the data and models with the code are available here.

The Data.zip file has the following structure. Note that the numbers for the Data subdirectories correspond to the supplementary data numbers in the paper (e.g., 1_training_data corresponds to Supplementary Data 1).

Data
  1_training_data: The Acinetobacter baumannii inhibition data used to train antibiotic property prediction models.
  2_chembl: Known antibiotic and antibacterial molecules from ChEMBL, which are used to compute the novelty of generated antibiotic candidates.
  4_real_space: Data files and statistics for the Enamine REAL Space. The molecular building blocks file is version 2021 q3-4 while all other REAL Space details are computed from the full enumerated REAL space version 2022 q1-2 (downloaded on August 30, 2022).
  5_generations_clogp: Compounds generated by SyntheMol using Chemprop models trained to predict cLogP.
  6_generations_chemprop: Compounds generated by SyntheMol using Chemprop models trained to predict A. baumannii inhibition.
  7_generations_chemprop_rdkit: Compounds generated by SyntheMol using Chemprop-RDKit models trained to predict A. baumannii inhibition.
  8_generations_random_forest: Compounds generated by SyntheMol using random forest models trained to predict A. baumannii inhibition.
  9_synthesized: Information on the 58 SyntheMol-generated compounds that were successfully synthesized by Enamine.

The Models.zip file contains one folder for each model used in the paper. Note that each model is technically an ensemble of ten individual models, so each directory contains ten model files.
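After extracting both archives, the layout described above can be sanity-checked with a short script. The following is a minimal sketch, not part of the SyntheMol code. The subdirectory names are taken from the description. The helper names (`missing_data_subdirs`, `model_file_counts`), the extraction paths, and the choice to count files recursively inside each model folder are assumptions made for illustration, since the description does not state the model file names or extensions.

```python
from pathlib import Path

# Subdirectory names as listed in the description of Data.zip
# (the description lists no entry numbered 3).
EXPECTED_DATA_SUBDIRS = [
    "1_training_data",
    "2_chembl",
    "4_real_space",
    "5_generations_clogp",
    "6_generations_chemprop",
    "7_generations_chemprop_rdkit",
    "8_generations_random_forest",
    "9_synthesized",
]

# Each model is described as an ensemble of ten individual models.
EXPECTED_ENSEMBLE_SIZE = 10


def missing_data_subdirs(data_root):
    """Return the expected Data subdirectories that are absent under data_root."""
    root = Path(data_root)
    return [name for name in EXPECTED_DATA_SUBDIRS if not (root / name).is_dir()]


def model_file_counts(models_root):
    """Map each model folder name to the number of files found inside it.

    Assumption: files are counted recursively, because the description does
    not say whether the ten model files sit directly in the folder.
    """
    root = Path(models_root)
    return {
        folder.name: sum(1 for path in folder.rglob("*") if path.is_file())
        for folder in sorted(root.iterdir())
        if folder.is_dir()
    }
```

A possible use, assuming the archives were extracted into folders named `Data` and `Models` in the current directory: call `missing_data_subdirs("Data")` and expect an empty list, then compare each value returned by `model_file_counts("Models")` against `EXPECTED_ENSEMBLE_SIZE`. A count other than ten is not necessarily an error, since a model folder may hold auxiliary files alongside the model files.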
machine learning, generative ai, synthesizability, antibiotics, drug discovery