ZENODO
Software . 2025
Data sources: ZENODO

EvoMotif: Evolution-Driven Framework for Protein Motif Discovery

Authors: Taha Ahmad

Abstract

EvoMotif: Evolutionary Protein Motif Discovery and Statistical Validation

OVERVIEW
EvoMotif discovers evolutionarily conserved protein motifs through multi-species sequence analysis, combining information theory, evolutionary substitution matrices, and rigorous statistical validation.

CORE ALGORITHMS
1. Dual-Metric Conservation Scoring
- Shannon Entropy: H(i) = -Σ_a p_a(i) × log₂ p_a(i), normalized to [0,1]. Detects strict conservation (identical residues, e.g. at catalytic sites).
- BLOSUM62 Score: captures functional constraints from evolutionary substitution data. Detects functional conservation (physicochemically similar substitutions).
- Combined Score: C_final(i) = 0.5 × C_shannon(i) + 0.5 × B_norm(i)
2. Sliding Window Motif Discovery
- Multi-scale scanning: windows of 5, 7, 9, 11, 13, 15, 17, 19, and 21 residues
- Adaptive thresholding (default: conservation ≥ 0.70)
- Overlap resolution: where windows overlap, the highest-scoring window is kept
- Gap filtering: requires ≥70% sequence coverage
3. Statistical Validation
- Permutation Testing: 10,000 permutations per motif to compute empirical p-values
- FDR Correction: Benjamini-Hochberg procedure at α = 0.05
- Effect Size: Cohen's d > 0.5 required for reporting
- Only motifs with FDR-adjusted p < 0.05 and d > 0.5 are reported

VALIDATION RESULTS
Tested against known functional sites in the hemoglobin α-chain, the p53 tumor suppressor, and BRCA1:
- Hemoglobin: 100% detection of heme-binding residues (His59, His88)
- p53: all 5 Zn²⁺-binding cysteines identified; R248 and R273 cancer hotspots detected
- BRCA1: RING domain Cys/His residues and BRCT phosphopeptide-binding sites found
Conclusion: all motifs discovered in these three proteins correspond to experimentally validated functional sites.

PERFORMANCE BENCHMARKS (Intel Core i7-9700K, 16 GB RAM)
- Ubiquitin (50 sequences, 76 residues): 45 s total, 350 MB memory, 4 motifs
- Hemoglobin α (100 sequences, 143 residues): 2.5 min total, 580 MB memory, 9 motifs
- p53 (150 sequences, 393 residues): 8 min total, 1.2 GB memory, 12 motifs
- BRCA1 (200 sequences, 1863 residues): 28 min total, 3.8 GB memory, 38 motifs

USE CASES
1. Mutagenesis Planning: identify critical residues (conservation > 0.85) versus safe targets (< 0.4)
2. Disease Variant Interpretation: assess the pathogenicity of missense mutations
3. Functional Domain Annotation: discover domains in unannotated proteins
4. Protein Engineering: design minimal functional constructs
5. Structural Biology: correlate conservation with AlphaFold confidence scores
6. Comparative Genomics: study evolutionary constraints across protein families

PIPELINE STAGES
Sequence retrieval (NCBI) → Alignment (MAFFT) → Conservation scoring (Shannon + BLOSUM62) → Motif discovery (sliding windows) → Statistical validation (permutation + FDR) → Phylogenetic tree (FastTree) → Structure mapping (PDB)

OUTPUT FILES
- FASTA: sequences and alignments
- JSON: conservation scores; motifs with p-values and effect sizes
- Newick: phylogenetic trees
- PDB: conservation mapped to the B-factor column

INSTALLATION
pip install evomotif
External dependencies: mafft, fasttree (install via apt, brew, or conda)

DOCUMENTATION
GitHub: https://github.com/tahagill/EvoMotif
Complete Guide: https://github.com/tahagill/EvoMotif/blob/main/docs/COMPLETE_GUIDE.md
PyPI: https://pypi.org/project/evomotif/

REQUIREMENTS
Python 3.8-3.11; Linux, macOS, or WSL; 8 GB RAM minimum (16 GB recommended)

LICENSE
MIT License
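The dual-metric column score can be illustrated with a minimal Python sketch. It is not EvoMotif's implementation, and several details are assumptions rather than documented behavior: gaps and non-standard residues are ignored, entropy is mapped to conservation as C_shannon = 1 − H/log₂ 20, and B_norm is taken as the mean over residue pairs of a BLOSUM62 score rescaled so that an identical pair scores 1 and the matrix minimum (−4) scores 0. The names `shannon_conservation`, `blosum_conservation`, and `column_conservation` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

# Standard BLOSUM62 scores for the 20 canonical amino acids, rows/columns in AA order.
AA = "ARNDCQEGHILKMFPSTWYV"
_BLOSUM62_ROWS = """
 4 -1 -2 -2  0 -1 -1  0 -2 -1 -1 -1 -1 -2 -1  1  0 -3 -2  0
-1  5  0 -2 -3  1  0 -2  0 -3 -2  2 -1 -3 -2 -1 -1 -3 -2 -3
-2  0  6  1 -3  0  0  0  1 -3 -3  0 -2 -3 -2  1  0 -4 -2 -3
-2 -2  1  6 -3  0  2 -1 -1 -3 -4 -1 -3 -3 -1  0 -1 -4 -3 -3
 0 -3 -3 -3  9 -3 -4 -3 -3 -1 -1 -3 -1 -2 -3 -1 -1 -2 -2 -1
-1  1  0  0 -3  5  2 -2  0 -3 -2  1  0 -3 -1  0 -1 -2 -1 -2
-1  0  0  2 -4  2  5 -2  0 -3 -3  1 -2 -3 -1  0 -1 -3 -2 -2
 0 -2  0 -1 -3 -2 -2  6 -2 -4 -4 -2 -3 -3 -2  0 -2 -2 -3 -3
-2  0  1 -1 -3  0  0 -2  8 -3 -3 -1 -2 -1 -2 -1 -2 -2  2 -3
-1 -3 -3 -3 -1 -3 -3 -4 -3  4  2 -3  1  0 -3 -2 -1 -3 -1  3
-1 -2 -3 -4 -1 -2 -3 -4 -3  2  4 -2  2  0 -3 -2 -1 -2 -1  1
-1  2  0 -1 -3  1  1 -2 -1 -3 -2  5 -1 -3 -1  0 -1 -3 -2 -2
-1 -1 -2 -3 -1  0 -2 -3 -2  1  2 -1  5  0 -2 -1 -1 -1 -1  1
-2 -3 -3 -3 -2 -3 -3 -3 -1  0  0 -3  0  6 -4 -2 -2  1  3 -1
-1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4  7 -1 -1 -4 -3 -2
 1 -1  1  0 -1  0  0  0 -1 -2 -2  0 -1 -2 -1  4  1 -3 -2 -2
 0 -1  0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1  1  5 -2 -2  0
-3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1  1 -4 -3 -2 11  2 -3
-2 -2 -2 -3 -2 -1 -2 -3  2 -1 -1 -2 -1  3 -3 -2 -2  2  7 -1
 0 -3 -3 -3 -1 -2 -2 -3 -3  3  1 -2  1 -1 -2 -2  0 -3 -1  4
"""
_rows = [line.split() for line in _BLOSUM62_ROWS.strip().splitlines()]
BLOSUM62 = {(a, b): int(v) for a, row in zip(AA, _rows) for b, v in zip(AA, row)}
S_MIN = min(BLOSUM62.values())  # lowest score in the matrix (-4)


def shannon_conservation(residues):
    """1 - normalized Shannon entropy (assumed mapping; 1 = invariant column)."""
    n = len(residues)
    h = -sum((c / n) * math.log2(c / n) for c in Counter(residues).values())
    return 1.0 - h / math.log2(20)


def blosum_conservation(residues):
    """Mean pairwise BLOSUM62 similarity, rescaled per pair to [0, 1] (assumed)."""
    pairs = list(combinations(residues, 2))
    total = 0.0
    for a, b in pairs:
        self_score = (BLOSUM62[(a, a)] + BLOSUM62[(b, b)]) / 2
        # Identical residues give 1.0; the worst substitution (-4) gives 0.0.
        total += (BLOSUM62[(a, b)] - S_MIN) / (self_score - S_MIN)
    return total / len(pairs)


def column_conservation(column, w_shannon=0.5):
    """C_final = 0.5 * C_shannon + 0.5 * B_norm for one alignment column."""
    residues = [r for r in column.upper() if r in AA]  # drop gaps / unknowns
    if len(residues) < 2:
        return 0.0  # too few residues to score (assumed convention)
    return (w_shannon * shannon_conservation(residues)
            + (1 - w_shannon) * blosum_conservation(residues))
```

The columns "IIVV" and "IIDD" have the same entropy, but the BLOSUM term ranks the conservative I/V mix higher, which is the distinction the dual metric is meant to capture. The pairwise loop is O(n²) in the number of sequences; working from residue counts would scale better for large alignments.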

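The discovery and validation stages can be sketched in the same hedged way. The following assumptions are made for illustration and are not taken from EvoMotif: a window's score is the mean of its per-position conservation scores; every column in a window must have coverage ≥ 0.70; overlaps are resolved greedily by score; the permutation null draws random position sets of the same size from the whole protein; Cohen's d compares motif positions against all other positions; and the adaptive part of the thresholding is not reproduced. `Motif`, `find_motifs`, `permutation_pvalue`, `cohens_d`, `benjamini_hochberg`, and `report` are hypothetical names.

```python
import math
import random
import statistics
from typing import List, NamedTuple, Sequence


class Motif(NamedTuple):
    start: int    # 0-based alignment column, inclusive
    end: int      # exclusive
    score: float  # mean conservation over the window


def find_motifs(scores: Sequence[float], coverage: Sequence[float],
                sizes=range(5, 22, 2), threshold=0.70, min_coverage=0.70) -> List[Motif]:
    """Multi-scale sliding-window scan followed by greedy overlap resolution."""
    candidates = []
    for w in sizes:
        for i in range(len(scores) - w + 1):
            # Gap filter: skip windows containing a poorly covered column.
            if min(coverage[i:i + w]) < min_coverage:
                continue
            mean = sum(scores[i:i + w]) / w
            if mean >= threshold:
                candidates.append(Motif(i, i + w, mean))
    # Best score first; ties go to the longer, then the earlier, window.
    candidates.sort(key=lambda m: (-m.score, m.start - m.end, m.start))
    kept: List[Motif] = []
    for m in candidates:
        if all(m.end <= k.start or m.start >= k.end for k in kept):
            kept.append(m)
    return sorted(kept, key=lambda m: m.start)


def permutation_pvalue(scores: Sequence[float], motif: Motif,
                       n_perm: int = 10_000, seed: int = 0) -> float:
    """Empirical p-value: how often random positions score at least as high."""
    rng = random.Random(seed)
    w = motif.end - motif.start
    observed = sum(scores[motif.start:motif.end]) / w
    pool = list(scores)
    hits = sum(1 for _ in range(n_perm) if sum(rng.sample(pool, w)) / w >= observed)
    return (hits + 1) / (n_perm + 1)  # add-one so p is never exactly 0


def cohens_d(group: Sequence[float], background: Sequence[float]) -> float:
    """Standardized mean difference using the pooled standard deviation."""
    n1, n2 = len(group), len(background)
    pooled_var = ((n1 - 1) * statistics.variance(group)
                  + (n2 - 1) * statistics.variance(background)) / (n1 + n2 - 2)
    diff = statistics.fmean(group) - statistics.fmean(background)
    if pooled_var == 0:
        return math.copysign(math.inf, diff) if diff else 0.0
    return diff / math.sqrt(pooled_var)


def benjamini_hochberg(pvals: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Step-up BH procedure: True where the null hypothesis is rejected."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / m:
            k_max = rank  # largest rank k with p_(k) <= k * alpha / m
    rejected = [False] * m
    for i in order[:k_max]:
        rejected[i] = True
    return rejected


def report(scores, coverage, alpha=0.05, min_d=0.5, n_perm=10_000):
    """Keep motifs that pass BH at alpha and have Cohen's d > min_d."""
    motifs = find_motifs(scores, coverage)
    pvals = [permutation_pvalue(scores, m, n_perm) for m in motifs]
    out = []
    for m, p, ok in zip(motifs, pvals, benjamini_hochberg(pvals, alpha)):
        outside = list(scores[:m.start]) + list(scores[m.end:])
        d = cohens_d(scores[m.start:m.end], outside)
        if ok and d > min_d:
            out.append((m, p, d))
    return out
```

With the add-one correction, the smallest attainable p-value is 1/(n_perm + 1), so 10,000 permutations bound p below at roughly 1e-4; fewer permutations run faster but give a coarser floor.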
Keywords

phylogenetics, protein-structure, protein-motifs, evolution, variant-analysis, bioinformatics, conservation-analysis, structural-biology, computational-biology, multiple-sequence-alignment
