ZENODO
Preprint . 2025
License: CC BY NC ND
Data sources: ZENODO, Datacite

A Linear Algebra approach on the Human Proteome: Protein Interaction Prediction

Authors: Andrés Sebastián Pirolo

Abstract

The prediction of Protein-Protein Interactions (PPI) is a central problem in systems biology. Current paradigms are inefficient: biophysical simulations are computationally intractable for interactome-wide screening, while Deep Learning architectures suffer from opacity and reliance on prohibitive GPU infrastructure. In this work, we introduce Project Resonance, an alignment-free framework that redefines bio-interaction as a signal processing problem. We hypothesize that protein compatibility is governed by a "Spectral Grammar": a low-rank thermodynamic structure detectable via classical linear algebra.

Using the Homo sapiens proteome (STRING v12.0) as a model system, we implemented a pipeline combining:

  • Semantic Signal Extraction via TF-IDF on k-mers.
  • Latent Manifold Projection using Truncated Singular Value Decomposition (SVD) to isolate thermodynamic signal from evolutionary noise.
  • Geometric Inference using Gradient Boosting Machines (XGBoost) on interaction tensors.

Triple Validation Results (N=40,000): We conducted a large-scale validation using 20,000 High-Confidence Positives (Score > 900) against 20,000 Real Biological Negatives (Score < 150), avoiding the pitfalls of synthetic data.

  • AUC-ROC (Real Negatives): 0.9907
  • AUC-ROC (Random Baseline): 0.9653
  • Training Time: ~147 seconds (about 2.5 minutes)

The fact that Real Negatives are separated from positives with a higher AUC-ROC than the Random Baseline confirms the "Spectral Dissonance" hypothesis: biological non-interaction is a structured, detectable phenomenon, not merely the absence of signal. This "Green AI" approach democratizes high-throughput proteomics.

Key Highlights:

  • Accuracy: AUC-ROC of 0.991 on Real Biological Data.
  • Robustness: Validated on 40,000 human protein pairs.
  • Speed: Ultra-fast training (<3 min) and inference (<1 ms).
  • Methodology: Pure Linear Algebra (SVD) + Gradient Boosting.
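The first two pipeline stages named in the abstract (TF-IDF on k-mers, then a truncated SVD projection) can be illustrated with a minimal, dependency-free sketch. This is not the author's code. The k-mer length, the smoothed IDF weighting, the power-iteration SVD, the toy sequences, and the symmetric pair encoding in `pair_features` are all assumptions made for illustration, since the abstract does not specify them. In practice one would more likely use library implementations such as scikit-learn's TfidfVectorizer and TruncatedSVD, and pass the resulting pair features to a gradient boosting classifier such as XGBoost.

```python
import math
from collections import Counter


def kmers(seq, k=3):
    """Return the overlapping k-mers of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def tfidf(seqs, k=3):
    """Return (vocab, rows): one L2-normalised TF-IDF row per sequence."""
    counts = [Counter(kmers(s, k)) for s in seqs]
    vocab = sorted(set().union(*counts))
    n = len(seqs)
    df = Counter(t for c in counts for t in c)  # document frequency per k-mer
    # Smoothed IDF (an assumption; the abstract does not give the weighting).
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    rows = []
    for c in counts:
        vec = [c[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        rows.append([v / norm for v in vec])
    return vocab, rows


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def truncated_svd(rows, rank=2, iters=200):
    """Approximate the top right-singular vectors by power iteration."""
    d = len(rows[0])
    comps = []
    for r in range(rank):
        v = [1.0 + ((7 * j + 3 * r) % 5) for j in range(d)]  # deterministic start
        for _ in range(iters):
            # Multiply by A^T A without forming it: w = A^T (A v).
            av = [dot(row, v) for row in rows]
            w = [sum(row[j] * s for row, s in zip(rows, av)) for j in range(d)]
            # Gram-Schmidt: remove the directions already found.
            for c in comps:
                proj = dot(w, c)
                w = [a - proj * b for a, b in zip(w, c)]
            norm = math.sqrt(dot(w, w))
            if norm < 1e-12:
                return comps  # matrix rank is below the requested rank
            v = [a / norm for a in w]
        comps.append(v)
    return comps


def project(rows, comps):
    """Project each TF-IDF row onto the retained components."""
    return [[dot(row, c) for c in comps] for row in rows]


def pair_features(a, b):
    """Hypothetical symmetric pair encoding: products, then absolute differences."""
    return [x * y for x, y in zip(a, b)] + [abs(x - y) for x, y in zip(a, b)]


if __name__ == "__main__":
    toy = ["MKTAYIAKQR", "MKTAYLAKQR", "GGSGGSGGSG", "PLVVAAPLVV"]  # made-up sequences
    vocab, rows = tfidf(toy)
    emb = project(rows, truncated_svd(rows, rank=2))
    print(pair_features(emb[0], emb[1]))
```

The pair encoding is symmetric by construction, so the order of the two proteins does not change the feature vector. A classifier trained on such vectors would stand in for the third stage; its hyperparameters are not given in the abstract and are not guessed here.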
Statement of AI Assistance: This research was conducted with the computational co-piloting of Gemini (Google DeepMind) for code optimization and mathematical formalization.

CHANGELOG

  • 25/12/2025 1.0: Fix corresponding Homo sapiens taxonomy (Correction from initial Rat model).
  • 25/12/2025 1.2: Fix Random Data (Transition to Hard Biological Negatives protocol).
  • 25/12/2025 1.4: New Test 99% (Expanded dataset to 40,000 samples; Title and Description modifications).
  • 25/12/2025 1.6: General Fixes (LaTeX optimization, font scaling, and visual validation).

NOTE TO RESEARCHERS & CITATION POLICY

This work represents an independent breakthrough in computational proteomics, offering a lightweight alternative to GPU-heavy models. We are fully aware of parallel developments and recent literature from major institutions. If this framework, particularly the application of Spectral Thermodynamics/SVD to biological sequences, inspires your own research or validates your findings, please uphold academic integrity by citing this original work.

📧 Feedback & Collaboration: We actively welcome peer review and comparative analysis. Please send your feedback or inquiries to: apirolo@abc.gob.ar

⚠️ CITATION REQUEST: We are aware of the current landscape in PPI prediction (including recent Oxford papers). If our Spectral/SVD approach provides you with insights or inspiration that simpler linear algebra can solve complex biological problems, please cite this preprint. Feedback: apirolo@abc.gob.ar

Keywords

Linear Algebra, Computational Biology/statistics & numerical data, Spectral Thermodynamics, High-Throughput Screening Assays/classification, Drug Discovery, Green AI, Computational Biology, SVD, High-Throughput Screening, Protein-Protein Interaction, High-Throughput Screening Assays, Computational Biology/classification
