Powered by OpenAIRE graph
Dataset
Data sources: ZENODO

EVEE: Interpretable variant effect prediction from genomic foundation model embeddings

Authors: Pearce, Michael T.; Dooms, Thomas; Yamamoto, Ryo; Meehl, Joshua; Molnar, Carl; Bissell, Mark; Hazra, Dron; and 15 others

Abstract

This dataset contains the precomputed variant effect predictions and interpretability features that power the Evo Variant Effect Explorer (EVEE) web application, accompanying the preprint "EVEE: Interpretable variant effect prediction from genomic foundation model embeddings" (Pearce et al., 2026, doi:10.64898/2026.04.10.717844). Each row is one ClinVar variant (4,252,870 rows in total) and carries its genomic coordinates, gene and consequence annotations, ClinVar clinical significance, an Evo 2 embedding-based pathogenicity score, and roughly 4,900 additional probe outputs covering protein-level disruption features (InterPro domains, post-translational modifications, secondary structure, active/binding sites, disorder, etc.), regulatory-track predictions (ChromHMM states, ATAC-seq and ChIP-seq peaks across multiple cell types, cCRE annotations), amino-acid and consequence classifiers, and per-variant reference-predictor scores (AlphaMissense, REVEL, CADD, PrimateAI, SpliceAI, and others).

The table is released as five chromosome-balanced Parquet shards (clean_shard_0.parquet through clean_shard_4.parquet, each 6.8–7.3 GB) plus a manifest.json that records which chromosomes are stored in each shard. All shards can be read as a single logical table with polars.scan_parquet("clean_shard_*.parquet") or duckdb.read_parquet("clean_shard_*.parquet"). This is the exact artifact used to build the EVEE variants.duckdb database served at https://evee.goodfire.ai.
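Because manifest.json records which chromosomes each shard holds, a query restricted to one chromosome should only need to open the matching shard rather than all five ~7 GB files. The sketch below is one way to do that, under two loud assumptions: that the manifest is a JSON object mapping each shard filename to a list of chromosome names, and that the chromosome column is named `chrom`. Neither layout is documented in this record, so `load_manifest`, `shards_for_chromosome`, and `scan_chromosome` are hypothetical helpers to adapt to the real schema. The only library call is `polars.scan_parquet`, which accepts a list of paths as well as the glob shown above.

```python
from __future__ import annotations

import json
from pathlib import Path


def load_manifest(path: str | Path) -> dict:
    """Read manifest.json into a dict."""
    with open(path) as fh:
        return json.load(fh)


def shards_for_chromosome(manifest: dict, chrom: str) -> list[str]:
    """Return the shard filenames whose chromosome list contains `chrom`.

    ASSUMPTION (hypothetical layout): manifest maps shard filename -> list of
    chromosome names, e.g. {"clean_shard_0.parquet": ["chr1", "chr22"], ...}.
    Adjust this lookup if the real manifest.json is structured differently.
    """
    return sorted(name for name, chroms in manifest.items() if chrom in chroms)


def scan_chromosome(data_dir: str | Path, chrom: str):
    """Lazily scan only the shard(s) holding `chrom` and filter to that chromosome.

    Returns a polars LazyFrame; no data is read until .collect() is called.
    """
    # Imported here so the manifest helpers above work even without polars installed.
    import polars as pl

    data_dir = Path(data_dir)
    manifest = load_manifest(data_dir / "manifest.json")
    files = [str(data_dir / name) for name in shards_for_chromosome(manifest, chrom)]
    if not files:
        raise KeyError(f"chromosome {chrom!r} is not listed in the manifest")
    # ASSUMPTION: the chromosome column is called "chrom"; check the real schema first.
    return pl.scan_parquet(files).filter(pl.col("chrom") == chrom)
```

With roughly 4,900 columns per row, it is usually worth adding a `.select([...])` of just the columns you need before calling `.collect()` on the returned LazyFrame, so polars can skip reading the rest.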