
This repository contains the computational framework, simulated datasets, and a standalone Python solver for evaluating and mitigating variance leakage in ANOVA-PCA under unbalanced experimental designs.

Classical ANOVA-PCA estimates factor effects from marginal means, which implicitly assumes that the design is orthogonal. Missing data or unbalanced designs violate this assumption: the effect matrices absorb variance from other factors, which produces large estimation errors (measured in Frobenius norm) and spurious rotations of the principal subspaces in the PCA stage.

To address this, the repository provides:

- Monte Carlo Simulation Engine: A complete Python pipeline that simulates synthetic vibrational spectra from a hierarchical $2^4$ factorial design. It evaluates the impact of data loss under distinct attrition regimes (e.g., Missing Completely at Random (MCAR) versus systematic structural imbalance).
- Theoretical Bound Assessment: Scripts that quantify subspace distortion and interpretability loss using Davis-Kahan perturbation bounds.
- Robust GLM Solver (glm_anova_pca.py): A standalone, object-oriented Python module that implements General Linear Model (GLM) orthogonal projections via the Moore-Penrose pseudoinverse. The solver is designed for direct application to new experimental datasets and preserves the variance partition even under severe imbalance.
- Real-World Validation: Execution logs and scripts that apply the GLM framework to experimental Near-Infrared (NIR) spectroscopy data (bread staling kinetics), showing that it reproduces the results of classical ANOVA-PCA when the design is perfectly balanced.

This open-science package supports full reproducibility of the associated manuscript and gives chemometricians and data scientists a robust tool for routine variance partitioning in multivariate analysis.
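The leakage that marginal means introduce under imbalance, and the way a GLM fit through the pseudoinverse avoids it, can be illustrated with a small sketch. This is not the code in glm_anova_pca.py: the 2x2 design, the cell counts, the effect coding, and every variable name below are assumptions chosen for illustration, and the data are noise-free so that any error comes from the estimator alone.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical unbalanced 2x2 design: (level of A, level of B, replicates).
cells = [(-1, -1, 6), (-1, 1, 2), (1, -1, 2), (1, 1, 6)]
a = np.concatenate([np.full(n, la) for la, _, n in cells]).astype(float)
b = np.concatenate([np.full(n, lb) for _, lb, n in cells]).astype(float)

# Effect-coded (+1/-1) design matrix: intercept, A, B, A:B interaction.
X = np.column_stack([np.ones_like(a), a, b, a * b])

# Noise-free synthetic "spectra": Y = X @ B_true.
n_channels = 50
B_true = rng.normal(size=(4, n_channels))
Y = X @ B_true
effect_A_true = np.outer(X[:, 1], B_true[1])

# GLM estimate: fit all terms jointly via the Moore-Penrose pseudoinverse,
# then rebuild the effect matrix of factor A from its own coefficient row.
B_hat = np.linalg.pinv(X) @ Y
effect_A_glm = np.outer(X[:, 1], B_hat[1])

# Classical estimate: marginal mean of each A level minus the grand mean.
grand_mean = Y.mean(axis=0)
effect_A_marg = np.vstack([Y[a == lvl].mean(axis=0) - grand_mean for lvl in a])

# Frobenius estimation errors of the two effect matrices.
err_glm = np.linalg.norm(effect_A_glm - effect_A_true, "fro")
err_marg = np.linalg.norm(effect_A_marg - effect_A_true, "fro")
print(f"GLM error: {err_glm:.2e}   marginal-means error: {err_marg:.2e}")

# PCA stage of ANOVA-PCA: decompose the effect matrix plus residuals.
residuals = Y - X @ B_hat
_, s, Vt = np.linalg.svd(effect_A_glm + residuals, full_matrices=False)
```

In this design, factor B is unevenly distributed across the levels of A, so the marginal-means estimate of the A effect picks up part of the B effect. The joint GLM fit recovers the A effect up to floating-point error.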

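The subspace distortion that a Davis-Kahan bound controls can be sketched as follows. The example uses the Yu, Wang and Samworth variant of the Davis-Kahan sin-theta theorem, which is one common form and not necessarily the one used in the repository's scripts. The covariance matrix, the perturbation, and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
p, d = 30, 2  # number of variables, dimension of the subspace of interest

# Hypothetical reference covariance with a clear eigengap after the d-th eigenvalue.
Q, _ = np.linalg.qr(rng.normal(size=(p, p)))
eigvals = np.concatenate([[10.0, 8.0], np.linspace(1.0, 0.1, p - d)])
Sigma = (Q * eigvals) @ Q.T

# Symmetric perturbation standing in for the error caused by imbalance.
N = rng.normal(scale=0.05, size=(p, p))
Sigma_hat = Sigma + (N + N.T) / 2


def top_subspace(S, d):
    """Return the top-d eigenvectors and all eigenvalues in descending order."""
    w, V = np.linalg.eigh(S)  # eigh returns eigenvalues in ascending order
    return V[:, ::-1][:, :d], w[::-1]


V, lam = top_subspace(Sigma, d)
V_hat, _ = top_subspace(Sigma_hat, d)

# Cosines of the principal angles are the singular values of V^T V_hat.
cos_theta = np.clip(np.linalg.svd(V.T @ V_hat, compute_uv=False), 0.0, 1.0)
sin_theta_fro = np.sqrt(np.sum(1.0 - cos_theta**2))

# Bound: 2 * min(sqrt(d) * ||Delta||_op, ||Delta||_F) / (lambda_d - lambda_{d+1}).
Delta = Sigma_hat - Sigma
gap = lam[d - 1] - lam[d]
bound = 2 * min(np.sqrt(d) * np.linalg.norm(Delta, 2),
                np.linalg.norm(Delta, "fro")) / gap

print(f"||sin Theta||_F = {sin_theta_fro:.4f}, bound = {bound:.4f}")
```

The bound depends only on the size of the perturbation and on the eigengap of the reference matrix, so a narrow gap between retained and discarded components makes the subspace more sensitive to the same amount of estimation error.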