Powered by OpenAIRE graph
ZENODO
Software
Data sources: ZENODO

ProteinTensor: AI-Native Biomolecular Tensor Storage for Structural Biology ML

Authors: Moore, Clayton W.

Abstract

ProteinTensor is a Python library and file format (.ptt) that eliminates redundant preprocessing in structural biology machine learning pipelines. It converts mmCIF/PDB structures, or raw protein sequences, once into a Zarr-backed, LZ4-compressed, memory-mappable store that provides zero-parse access to atomic coordinates, backbone geometry, covalent bond graphs, MSA tokens, pairwise distance features, and protein language model embeddings. Sequence-only entries serve as direct input to AlphaFold- and Boltz-style predictors. Round-trip conversion is lossless, and structure loading is benchmarked at 2x to 95x faster than mmCIF parsing across proteins ranging from 74 to 3,525 residues.
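
The convert-once, memory-map-later idea described in the abstract can be illustrated with a minimal sketch that uses only the Python standard library. This is not ProteinTensor's API or the .ptt format, and it involves neither Zarr nor LZ4. The function names (`parse_coords`, `convert_once`, `load_atom`), the toy input, and the flat float32 file layout are all assumptions made for illustration.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical toy input: three ATOM records. Real PDB files are fixed-column;
# splitting on whitespace below is a shortcut that only suits this toy data.
PDB_TEXT = """\
ATOM 1 N MET A 1 27.340 24.430 2.614 1.00 9.67 N
ATOM 2 CA MET A 1 26.266 25.413 2.842 1.00 10.38 C
ATOM 3 C MET A 1 26.913 26.639 3.531 1.00 9.62 C
"""


def parse_coords(text):
    """Slow path: parse text records into a flat list of x, y, z floats."""
    coords = []
    for line in text.splitlines():
        if line.startswith("ATOM"):
            fields = line.split()
            coords.extend(float(v) for v in fields[6:9])
    return coords


def convert_once(text, path):
    """Parse once and store coordinates as little-endian float32 values."""
    coords = parse_coords(text)
    with open(path, "wb") as fh:
        fh.write(struct.pack(f"<{len(coords)}f", *coords))
    return len(coords) // 3  # number of atoms written


def load_atom(path, index):
    """Fast path: memory-map the file and decode one atom, with no text parsing."""
    with open(path, "rb") as fh:
        with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Each atom occupies 3 float32 values = 12 bytes.
            return struct.unpack_from("<3f", mm, index * 12)


path = os.path.join(tempfile.mkdtemp(), "toy_coords.bin")
n_atoms = convert_once(PDB_TEXT, path)
second_atom = load_atom(path, 1)
```

After the one-time conversion, every later read goes through `load_atom`, which touches only the 12 bytes it needs. A store such as the one the abstract describes would add chunking, compression, and many more arrays on top of this basic pattern.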
