Powered by OpenAIRE graph
ZENODO
Dataset . 2025
License: CC BY
Data sources: ZENODO
ZENODO
Dataset . 2025
License: CC BY
Data sources: Datacite

ANABAG: ANnotated Antibody AntiGen dataset

Authors: Grandguillaume, Ilyas;

Abstract

ANABAG (ANnotated AntiBody AntiGen) is a curated dataset of antibody–antigen complexes. It includes:
- 3D structural data (in several formats)
- Per-sequence and per-residue features
- Frequent updates (monthly on the GitHub repository)

The analysis and prediction of antibody–antigen (Ab–Ag) interactions often overlook critical structural features such as glycosylation and physicochemical conditions such as pH and salt concentration. The field also lacks standardized criteria for selecting complexes based on structural properties and sequence identity. Common practice in dataset construction is to remove redundancy using sequence identity thresholds, which can inadvertently exclude complexes that share identical sequences but adopt alternative binding modes. To enable more precise Ab–Ag modeling and antibody engineering, it is essential to incorporate richer structural and physical information into both physics-based and machine learning models.

To address these limitations, we present ANABAG, a new curated dataset of Ab–Ag complexes annotated at the residue level with UniProt sequence information and enriched with a wide range of structural and physicochemical features. The dataset allows flexible filtering of complexes using a variety of descriptors available at both the complex and residue levels. Selected features are ready to use in machine learning workflows, while the structural files are compatible with antibody design and docking pipelines such as Rosetta or HADDOCK. The complete dataset is available on Zenodo, and all accompanying scripts and usage documentation can be accessed via GitHub.

Files Included

This dataset is provided in three versions, plus a separate archive of per-residue features, to accommodate different computational requirements:

1. data.tar.gz (Full Dataset, ~30 GB)
The complete ANABAG dataset, containing all biological units (BUs) with comprehensive features and structures:
- Initial chain structures: renumbered, chain-standardized format with antigen (AG) first and antibody (AB) second
- Formatted structures: identical formatting except for the chains, which are standardized with AG as chain A and AB as chain B
- Heteroatom files: identical to the initial chain structures, with all non-protein atoms included (cofactors, glycans, water, etc.)
- Rosetta-processed data: energy-minimized structures (relax) and associated features
Note: Some Rosetta calculations did not complete successfully; these BUs lack Rosetta-specific outputs.

2. light_version.tar.gz (Light Version, ~7 GB)
A streamlined version for users who need core structural data without additional processing:
- Initial chain version of each biological unit
- Associated features and annotations
- Excludes: heteroatom files, Rosetta features, and relaxed structures
- Ideal for initial exploration and machine learning applications that don't require heteroatoms

3. formated_structures_only.tar.gz (Minimal Version, ~4 GB)
The most compact version, containing essential structural information:
- Initial chain version of each biological unit only
- Suitable for quick access and an overview of available complexes
- Recommended for users with limited storage or bandwidth

4. per_residue_files.tar.gz (Per-Residue Features, ~3 GB)
The per-residue features:
- per_residue_information_AG.tsv, containing all features for antigen residues
- per_residue_information_AB.tsv, containing all features for antibody residues

Note: All structures (except heteroatom files) include modeled regions, where gaps of up to 12 residues were modeled using Modeller and DiSGro. Each residue is annotated in the 'Stat_res_pdbm' column as either 'Modelled' or 'Solved', allowing users to filter on experimentally solved vs. modeled content. The 'Distance_interface' column (in Ångströms) enables filtering of modeled residues (or any residue) by their proximity to the binding interface.

Usage and Tools

ANABAG can be used directly or through our companion tools available at: DSIMB/anabag-handler

These scripts enable users to:
- Filter biological units based on specific criteria (pH range, experimental technique, resolution, secondary structures, etc.)
- Extract subsets for specialized analyses
- Convert between different structural formats
- Generate machine learning-ready features

For detailed usage instructions and examples, please refer to the GitHub repository documentation.
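
As an illustration of using the per-residue files directly, the sketch below keeps only experimentally solved residues that lie within a chosen distance of the binding interface. It is not part of the anabag-handler scripts. It assumes that the TSV files have a header row containing the 'Stat_res_pdbm' and 'Distance_interface' columns exactly as named in the description. The function name `filter_residues` and the columns `pdb_id` and `res_num` in the sample data are hypothetical placeholders, not taken from the dataset.

```python
import csv
import io


def filter_residues(lines, max_distance=None, solved_only=True):
    """Yield rows (as dicts) of a per-residue TSV that pass the filters.

    lines        -- any iterable of text lines, e.g. an open file handle
    max_distance -- keep residues with Distance_interface <= this value (in Å);
                    None disables the distance filter
    solved_only  -- if True, drop residues not marked 'Solved'
    """
    reader = csv.DictReader(lines, delimiter="\t")
    for row in reader:
        if solved_only and row["Stat_res_pdbm"] != "Solved":
            continue
        if max_distance is not None:
            try:
                distance = float(row["Distance_interface"])
            except (TypeError, ValueError):
                continue  # skip rows with a missing or non-numeric distance
            if distance > max_distance:
                continue
        yield row


# Tiny in-memory stand-in for per_residue_information_AG.tsv.
# 'Stat_res_pdbm' and 'Distance_interface' are named in the dataset
# description; 'pdb_id' and 'res_num' are hypothetical placeholder columns.
SAMPLE = (
    "pdb_id\tres_num\tStat_res_pdbm\tDistance_interface\n"
    "xxxx\t10\tSolved\t3.2\n"
    "xxxx\t11\tModelled\t2.9\n"
    "xxxx\t12\tSolved\t14.7\n"
)

kept = list(filter_residues(io.StringIO(SAMPLE), max_distance=5.0))
print([row["res_num"] for row in kept])
```

For a real file, the same function can be given an open handle, for example `open("per_residue_information_AG.tsv", newline="")`, once the archive has been unpacked. The location of the file inside the archive is not stated in the description and would need to be checked.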

Keywords

Physical chemistry, Antigen-Antibody Complex, Uniprot
