Powered by OpenAIRE graph
ZENODO
Image · 2025
License: CC BY
Data sources: ZENODO, Datacite
Versions: 2

[Data augmentation in a TTL] - Figure 3 TMAPS // Comparison of USPTO and fictive reactions in terms of chemical space coverage.

Authors: Grandjean, Yves; Kreutter, David Patrick Joseph; Reymond, Jean-Louis;

Abstract

Here are the two interactive TMAPs shown in Figure 3a and 3b of our work. Feel free to explore the different reactions and molecules.

Fig. 3a: DRFP TMAP comparing the fictive dataset (~1M reactions) with USPTO140kt; labels indicate the dataset from which each reaction originates. Each template is represented by 2 randomly picked reactions from each dataset, for a total of 55k reactions.

Fig. 3b: MHFP6 TMAP of starting materials (SM), considering 10,000 SM randomly picked from USPTO14kt and 40,000 SM randomly picked from the 1M fictive reactions.

Title of the manuscript: "Data augmentation in a Triple Transformer Loop retrosynthesis model"

Abstract: Reactions in the US Patent and Trademark Office (USPTO) dataset are biased towards a few over-represented reaction types, which potentially limits its usefulness for computer-assisted synthesis planning (CASP). To obtain an equilibrated dataset, we applied retrosynthesis templates to USPTO molecules as products (P) to generate starting materials (SM). We then used transformer T2 from our recently reported triple transformer loop (TTL) retrosynthesis model to predict reagents (R) for the SM→P reaction. Finally, we validated each reaction by requiring TTL transformer T3 to predict P from SM+R with high confidence (>95%). We generated up to 5,000 reactions per template, resulting in 27.5 million validated fictive reactions covering the chemical space of the original USPTO dataset. To exemplify the use of this dataset, we show that a single-step retrosynthesis transformer model trained on a template-equilibrated subset of 1,097,374 fictive reactions outperforms the corresponding model trained on USPTO reactions only.
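The generate-and-validate procedure described in the abstract can be sketched in plain Python. This is a minimal, hypothetical sketch, not the authors' code: template application, T2 reagent prediction and T3 forward prediction are passed in as stand-in callables, and the names `augment_with_templates`, `equilibrated_subset`, `apply_retro`, `predict_reagents` and `predict_product` are illustrative only. The >95% confidence threshold and the cap of 5,000 reactions per template come from the abstract; requiring the T3 prediction to match P exactly, and drawing the equilibrated subset by random sampling per template, are assumptions.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# A fictive reaction as (template_id, starting_materials, reagents, product), all plain strings.
Reaction = Tuple[str, str, str, str]


def augment_with_templates(
    products: Iterable[str],
    templates: Dict[str, Callable[[str], List[str]]],
    predict_reagents: Callable[[str, str], str],
    predict_product: Callable[[str, str], Tuple[str, float]],
    min_confidence: float = 0.95,
    max_per_template: int = 5000,
) -> List[Reaction]:
    """Apply each retro template to each product, add predicted reagents,
    and keep only reactions whose forward prediction reproduces the product
    with confidence above min_confidence (hypothetical stand-ins for T2/T3)."""
    products = list(products)
    kept: List[Reaction] = []
    for template_id, apply_retro in templates.items():
        count = 0
        for product in products:
            if count >= max_per_template:
                break  # cap reached for this template
            for sm in apply_retro(product):
                reagents = predict_reagents(sm, product)  # stand-in for T2
                predicted, confidence = predict_product(sm, reagents)  # stand-in for T3
                if predicted == product and confidence > min_confidence:
                    kept.append((template_id, sm, reagents, product))
                    count += 1
                    if count >= max_per_template:
                        break
    return kept


def equilibrated_subset(
    reactions: List[Reaction], per_template: int, seed: int = 0
) -> List[Reaction]:
    """Randomly keep at most per_template reactions for every template."""
    rng = random.Random(seed)
    by_template: Dict[str, List[Reaction]] = defaultdict(list)
    for reaction in reactions:
        by_template[reaction[0]].append(reaction)
    subset: List[Reaction] = []
    for template_id in sorted(by_template):
        group = by_template[template_id]
        subset.extend(rng.sample(group, min(per_template, len(group))))
    return subset


if __name__ == "__main__":
    # Toy stand-ins: plain strings instead of SMILES, fixed reagents, fake confidences.
    toy_templates = {
        "t1": lambda p: [p + "_sm"],
        "t2": lambda p: [p + "_a", p + "_b"],
    }

    def toy_forward(sm: str, reagents: str) -> Tuple[str, float]:
        # Pretend every "_b" precursor is predicted with low confidence.
        return sm.rsplit("_", 1)[0], (0.5 if sm.endswith("_b") else 0.99)

    fictive = augment_with_templates(["P1", "P2"], toy_templates, lambda sm, p: "R", toy_forward)
    print(len(fictive))  # 4: the "_b" precursors fail validation
    print(len(equilibrated_subset(fictive, per_template=1)))  # 2: one per template
```

With real models, `predict_product` would wrap a forward-prediction transformer that returns its top prediction and a confidence score; here the round-trip check and the per-template cap are kept separate from any chemistry toolkit so the control flow is easy to follow. The same `equilibrated_subset` helper could, under these assumptions, draw a fixed number of reactions per template, as in the 2 reactions per template per dataset used for Fig. 3a.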

Keywords

retrosynthesis, transformers

  • Impact by BIP!
    Selected citations: 0. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    Popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
    Influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    Impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.