SPICE 2.0.1

SPICE (Small-Molecule/Protein Interaction Chemical Energies) is a collection of quantum mechanical data for training potential functions. The emphasis is particularly on simulating drug-like small molecules interacting with proteins. It is described in these publications:

Peter Eastman, Pavan Kumar Behara, David L. Dotson, Raimondas Galvelis, John E. Herr, Josh T. Horton, Yuezhi Mao, John D. Chodera, Benjamin P. Pritchard, Yuanqing Wang, Gianni De Fabritiis, and Thomas E. Markland. "SPICE, A Dataset of Drug-like Molecules and Peptides for Training Machine Learning Potentials." Scientific Data 10, 11 (2023). https://doi.org/10.1038/s41597-022-01882-6

Peter Eastman, Benjamin P. Pritchard, John D. Chodera, Thomas E. Markland. "Nutmeg and SPICE: Models and Data for Biomolecular Machine Learning." J. Chem. Theory Comput. 20, 19, 8583-8593 (2024). https://doi.org/10.1021/acs.jctc.4c00794

Version 2 is a major update that roughly doubles the total amount of data. Additions since version 1 include:

Over 13,000 new PubChem molecules (50 conformations each)
Over 194,000 conformations for dimers consisting of an amino acid and a ligand
1000 water clusters
1397 PubChem molecules solvated with a shell of water molecules
Two new elements (boron and silicon)

2.0.1 is a minor update. It removes a small number of conformations in which bonds broke during conformation generation, leading to molecules that did not match the SMILES strings.

The HDF5 file is structured as follows. There is one top level group for each unique molecule or cluster. The name of each group is usually either a PubChem Substance ID (for PubChem molecules), an amino acid sequence (for dipeptides and solvated amino acids), or a SMILES string (for everything else).

Each group contains the following datasets. N is the number of atoms in the molecule and M is the number of conformations. (Some groups may be missing some of them, for example if MBIS failed to converge.)
subset: The name of the data subset the molecule is from.
smiles: The canonical SMILES string for the molecule. It includes explicit hydrogens and atom indices.
atomic_numbers: Array of length N containing the atomic number of every atom. They are ordered following the indices in the SMILES string.
conformations: Array of shape (M, N, 3) containing the atomic coordinates for every conformation.
formation_energy: Array of length M containing the total energy of each conformation, minus the reference energies of the individual atoms when infinitely separated. This is the most useful energy for most purposes, since it contains all energy components that vary with atom positions but removes the large constant part corresponding to the internal energies of individual atoms.
dft_total_energy: Array of length M containing the energy of each conformation.
dft_total_gradient: Array of shape (M, N, 3) containing the gradient of the energy with respect to the atomic coordinates.
mbis_charges: Array of shape (M, N, 1) containing the MBIS charge of each atom.
mbis_dipoles: Array of shape (M, N, 3) containing the MBIS dipole of each atom.
mbis_quadrupoles: Array of shape (M, N, 3, 3) containing the MBIS quadrupole of each atom.
mbis_octupoles: Array of shape (M, N, 3, 3, 3) containing the MBIS octupole of each atom.
scf_dipoles: Array of shape (M, 3) containing the dipole of each molecule.
scf_quadrupole: Array of shape (M, 3, 3) containing the quadrupole of each molecule.
mayer_indices: Array of shape (M, N, N) containing the Mayer bond indices.
wiberg_lowdin_indices: Array of shape (M, N, N) containing the Wiberg bond indices using orthogonal Löwdin orbitals.

All values are in atomic units. Distances are in bohr and energies in hartree.
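The layout described above can be read with a few lines of Python. The following is a minimal sketch, not an official loader: the helper names (load_group, to_common_units, iter_file) are hypothetical, the choice of which datasets to collect is arbitrary, and it assumes the h5py and NumPy packages. Missing datasets are skipped rather than treated as errors, since some groups may lack them. The conversion factors are the CODATA values for the hartree and the bohr, and forces are taken as the negative of dft_total_gradient.

```python
import numpy as np

# CODATA conversion factors out of atomic units.
HARTREE_TO_KJ_PER_MOL = 2625.4996394799
BOHR_TO_ANGSTROM = 0.529177210903

# Arbitrary selection of datasets for this sketch; extend as needed.
WANTED = ("atomic_numbers", "conformations", "formation_energy",
          "dft_total_gradient", "mbis_charges")


def load_group(group):
    """Collect the datasets of one molecule group into NumPy arrays.

    `group` may be an h5py.Group or any mapping from dataset name to an
    array-like. Datasets that are absent from the group are skipped.
    """
    return {name: np.asarray(group[name]) for name in WANTED if name in group}


def to_common_units(record):
    """Convert bohr -> angstrom, hartree -> kJ/mol, and gradients -> forces."""
    out = {k: v for k, v in record.items()
           if k in ("atomic_numbers", "mbis_charges")}
    if "conformations" in record:
        out["positions_angstrom"] = record["conformations"] * BOHR_TO_ANGSTROM
    if "formation_energy" in record:
        out["energy_kj_per_mol"] = (record["formation_energy"]
                                    * HARTREE_TO_KJ_PER_MOL)
    if "dft_total_gradient" in record:
        # Force is the negative gradient; hartree/bohr -> kJ/mol/angstrom.
        out["forces_kj_per_mol_angstrom"] = (-record["dft_total_gradient"]
                                             * HARTREE_TO_KJ_PER_MOL
                                             / BOHR_TO_ANGSTROM)
    return out


def iter_file(path):
    """Yield (group_name, converted_record) for every top-level group."""
    import h5py  # imported lazily so the helpers above work without it
    with h5py.File(path, "r") as f:
        for name, group in f.items():
            yield name, to_common_units(load_group(group))


# Synthetic stand-in for one group (M = 2 conformations, N = 3 atoms).
# The numbers are made up and mbis_charges is deliberately left out.
fake_group = {
    "atomic_numbers": np.array([8, 1, 1]),
    "conformations": np.ones((2, 3, 3)),
    "formation_energy": np.array([-0.5, -0.25]),
    "dft_total_gradient": np.full((2, 3, 3), 0.01),
}
converted = to_common_units(load_group(fake_group))
```

With a local copy of the dataset, iter_file(path) would walk the real groups in the same way, where path is wherever the HDF5 file was saved. Collecting each group into plain arrays before converting keeps the unit handling independent of h5py, which makes it easy to check on small synthetic inputs like the one above.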

Related Organizations

Newcastle University (United Kingdom)
University of California, Irvine (United States)
Stanford University (United States)
Virginia Tech (United States)
Memorial Sloan Kettering Cancer Center (United States)
