ZENODO
Presentation . 2021
License: CC BY
Data sources: Datacite

Open source application for small molecule pKa predictions

Authors: Baltruschat, Marcel; Bushiri, David; Tapavicza, Enrico; Czodrowski, Paul

Abstract

The acid-base dissociation constant (pKa) of a drug has a far-reaching influence on pharmacokinetics because it alters the solubility, membrane permeability and protein binding affinity of the drug [1,2]. To the best of our knowledge, there is no publicly available, open source and license-free pKa prediction tool that reaches the quality of commercial tools. Our goal is to develop a highly accurate pKa prediction tool that is trained on a mixture of freely accessible and commercial data but remains free for everyone to use.

To do so, we identified multiple freely available experimental pKa datasets, including data from DataWarrior, ChEMBL and various scientific publications. Additionally, we have access to several commercial datasets, for example from Novartis [3] and OpenEye [4].

We started with monoprotic molecules obtained from ChEMBL and DataWarrior to evaluate the dataset quality and our modelling concepts. After preprocessing, 5994 unique structures with a pKa value between 2 and 12 were used for machine learning. We tested seven machine learning configurations, built from four basic regressors, together with six unique descriptor/fingerprint sets, resulting in a total of 42 trained and 5-fold cross-validated models. Additionally, we evaluated the models on two external test datasets. The results were published in March 2020 [5]. Furthermore, we investigated how Graph Convolutional Networks and QM-based approaches can be used to further improve prediction quality.

To be able to predict the pKa values of multiprotic molecules, two major problems had to be solved: the localization of the titratable groups without licensed software, and the one-time assignment of the experimental values to the corresponding groups for all datasets. For the localization, we evaluated the results of Marvin [6] and Dimorphite-DL [7] to compile a list of 24 SMARTS patterns that catch almost 90% of all groups in our combined dataset of over 17000 unique molecules. Finally, the Marvin [6] predictions were used to assign the experimental values to the corresponding groups while removing outliers. The resulting dataset can serve as a starting point for machine learning in a subsequent step.

All data and code can be found at https://github.com/czodrowskilab

[1] Charifson, P. S., & Walters, W. P. (2014). Acidic and Basic Drugs in Medicinal Chemistry: A Perspective. Journal of Medicinal Chemistry
[2] Manallack, D. T. (2007). The pKa Distribution of Drugs: Application to Drug Discovery. Perspectives in Medicinal Chemistry
[3] Richard A. Lewis, Stephane Rodde, Novartis Pharma AG, Basel, Switzerland
[4] pKa COMPLETE_DATABASE v1.13: OpenEye Scientific Software, Santa Fe, NM.
[5] Baltruschat M and Czodrowski P. Machine learning meets pKa [version 2; peer review: 2 approved]. F1000Research 2020, 9(Chem Inf Sci):113
[6] Marvin 20.1.0, 2020, ChemAxon Ltd, http://www.chemaxon.com
[7] Ropp PJ, Kaminsky JC, Yablonski S, Durrant JD (2019) Dimorphite-DL: An open-source program for enumerating the ionization states of drug-like small molecules. Journal of Cheminformatics
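
The monoprotic workflow described in the abstract (restricting the data to pKa values between 2 and 12, crossing seven configurations with six descriptor/fingerprint sets, and running 5-fold cross-validation) can be illustrated with a minimal standard-library sketch. This is not the authors' code: the configuration and descriptor names are hypothetical placeholders because the abstract gives only their counts, the inclusive bounds and the keep-first deduplication rule are assumptions, and the helper functions `filter_by_pka` and `k_fold_indices` are invented for illustration.

```python
from itertools import product
import random

# Hypothetical placeholder names: the abstract states only the counts
# (seven configurations, six descriptor/fingerprint sets).
CONFIGURATIONS = [f"config_{i}" for i in range(1, 8)]
DESCRIPTOR_SETS = [f"descriptor_set_{i}" for i in range(1, 7)]


def filter_by_pka(records, low=2.0, high=12.0):
    """Keep the first record per structure whose pKa lies in [low, high].

    Inclusive bounds and keep-first deduplication are assumptions.
    """
    seen = set()
    kept = []
    for structure, pka in records:
        if low <= pka <= high and structure not in seen:
            seen.add(structure)
            kept.append((structure, pka))
    return kept


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists for shuffled k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding by k gives k disjoint folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Every configuration is paired with every descriptor set.
model_grid = list(product(CONFIGURATIONS, DESCRIPTOR_SETS))
print(len(model_grid))  # 7 * 6 = 42

# Toy records with made-up identifiers and values, for illustration only.
toy_records = [("mol_a", 4.8), ("mol_a", 4.8), ("mol_b", 13.1), ("mol_c", 2.0)]
dataset = filter_by_pka(toy_records)

for config, descriptors in model_grid:
    for train_idx, test_idx in k_fold_indices(len(dataset), k=2):
        # A real pipeline would fit a regressor on train_idx and score it on test_idx.
        pass
```

In practice the placeholders would be replaced by actual regressors and fingerprint generators, and each of the 42 grid entries would report an error metric averaged over its five folds.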

Keywords

machine learning, pKa prediction, computational chemistry
