ZENODO
Dataset . 2024
License: CC 0
Data sources: ZENODO, Datacite

SPICE 2.0.1

Authors: Eastman, Peter; Behara, Pavan Kumar; Dotson, David; Galvelis, Raimondas; Herr, John; Horton, Josh; Mao, Yuezhi; +5 Authors

Abstract

SPICE (Small-Molecule/Protein Interaction Chemical Energies) is a collection of quantum mechanical data for training potential functions. The emphasis is particularly on simulating drug-like small molecules interacting with proteins. It is described in these publications:

  • Peter Eastman, Pavan Kumar Behara, David L. Dotson, Raimondas Galvelis, John E. Herr, Josh T. Horton, Yuezhi Mao, John D. Chodera, Benjamin P. Pritchard, Yuanqing Wang, Gianni De Fabritiis, and Thomas E. Markland. "SPICE, A Dataset of Drug-like Molecules and Peptides for Training Machine Learning Potentials." Scientific Data 10, 11 (2023). https://doi.org/10.1038/s41597-022-01882-6
  • Peter Eastman, Benjamin P. Pritchard, John D. Chodera, Thomas E. Markland. "Nutmeg and SPICE: Models and Data for Biomolecular Machine Learning." J. Chem. Theory Comput. 20, 19, 8583-8593 (2024). https://doi.org/10.1021/acs.jctc.4c00794

Version 2 is a major update that roughly doubles the total amount of data. Additions since version 1 include:

  • Over 13,000 new PubChem molecules (50 conformations each)
  • Over 194,000 conformations for dimers consisting of an amino acid and a ligand
  • 1000 water clusters
  • 1397 PubChem molecules solvated with a shell of water molecules
  • Two new elements (boron and silicon)

2.0.1 is a minor update. It removes a small number of conformations in which bonds broke during conformation generation, leading to molecules that did not match the SMILES strings.

The HDF5 file is structured as follows. There is one top level group for each unique molecule or cluster. The name of each group is usually either a PubChem Substance ID (for PubChem molecules), an amino acid sequence (for dipeptides and solvated amino acids), or a SMILES string (for everything else).

Each group contains the following datasets. N is the number of atoms in the molecule and M is the number of conformations. (Some groups may be missing some of them, for example if MBIS failed to converge.)

  • subset: The name of the data subset the molecule is from.
  • smiles: The canonical SMILES string for the molecule. It includes explicit hydrogens and atom indices.
  • atomic_numbers: Array of length N containing the atomic number of every atom. They are ordered following the indices in the SMILES string.
  • conformations: Array of shape (M, N, 3) containing the atomic coordinates for every conformation.
  • formation_energy: Array of length M containing the total energy of each conformation, minus the reference energies of the individual atoms when infinitely separated. This is the most useful energy for most purposes, since it contains all energy components that vary with atom positions but removes the large constant part corresponding to the internal energies of individual atoms.
  • dft_total_energy: Array of length M containing the energy of each conformation.
  • dft_total_gradient: Array of shape (M, N, 3) containing the gradient of the energy with respect to the atomic coordinates.
  • mbis_charges: Array of shape (M, N, 1) containing the MBIS charge of each atom.
  • mbis_dipoles: Array of shape (M, N, 3) containing the MBIS dipole of each atom.
  • mbis_quadrupoles: Array of shape (M, N, 3, 3) containing the MBIS quadrupole of each atom.
  • mbis_octupoles: Array of shape (M, N, 3, 3, 3) containing the MBIS octupole of each atom.
  • scf_dipoles: Array of shape (M, 3) containing the dipole of each molecule.
  • scf_quadrupole: Array of shape (M, 3, 3) containing the quadrupole of each molecule.
  • mayer_indices: Array of shape (M, N, N) containing the Mayer bond indices.
  • wiberg_lowdin_indices: Array of shape (M, N, N) containing the Wiberg bond indices using orthogonal Löwdin orbitals.

All values are in atomic units. Distances are in bohr and energies in hartree.
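
As an illustration of the layout described above, the following Python sketch walks the top-level groups, reports N and M for each, lists any datasets that are absent, and converts energies and gradients out of atomic units. It is a minimal sketch under stated assumptions, not an official SPICE tool: the helper names (summarize_group, summarize_file, energies_to_kj_per_mol, gradient_to_forces) are hypothetical, the file path is supplied by the caller, h5py is assumed as the HDF5 reader, and the conversion factors are CODATA 2018 values that are not stored in the file.

```python
"""Sketch: inspect SPICE-style groups and convert values from atomic units."""

# CODATA 2018 conversion factors (assumption: not part of the dataset itself).
HARTREE_TO_KJ_PER_MOL = 2625.4996394799
BOHR_TO_ANGSTROM = 0.529177210903

# Dataset names as listed in the description above.
EXPECTED = (
    "subset", "smiles", "atomic_numbers", "conformations",
    "formation_energy", "dft_total_energy", "dft_total_gradient",
    "mbis_charges", "mbis_dipoles", "mbis_quadrupoles", "mbis_octupoles",
    "scf_dipoles", "scf_quadrupole", "mayer_indices", "wiberg_lowdin_indices",
)


def summarize_group(group):
    """Return N, M, and the names of expected datasets that are absent.

    `group` is any mapping from dataset name to an array-like object,
    for example an h5py.Group or a plain dict of nested lists.
    """
    return {
        "n_atoms": len(group["atomic_numbers"]),         # N
        "n_conformations": len(group["conformations"]),  # M
        "missing": [name for name in EXPECTED if name not in group],
    }


def summarize_file(path):
    """Summarize every top-level group of an HDF5 file (requires h5py)."""
    import h5py  # imported lazily so the other helpers need only the stdlib

    with h5py.File(path, "r") as f:
        return {name: summarize_group(group) for name, group in f.items()}


def energies_to_kj_per_mol(energies_hartree):
    """Convert a sequence of energies from hartree to kJ/mol."""
    return [e * HARTREE_TO_KJ_PER_MOL for e in energies_hartree]


def gradient_to_forces(gradient):
    """Convert an (M, N, 3) gradient in hartree/bohr to forces in kJ/mol/angstrom.

    The force is the negative of the energy gradient.
    """
    scale = -HARTREE_TO_KJ_PER_MOL / BOHR_TO_ANGSTROM
    return [[[g * scale for g in atom] for atom in conf] for conf in gradient]
```

Because some groups may lack datasets (for example when MBIS failed to converge), checking the "missing" list before indexing avoids a KeyError. The pure-Python loops keep the sketch dependency-free; for the full file, vectorized NumPy arithmetic on the arrays returned by h5py would be the more practical choice.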