ZENODO
Software
Data sources: ZENODO

General Linear Model (GLM) framework for robust ANOVA-PCA in unbalanced chemometric designs

Authors: Torres, Éder Rissi; Ferreira, Márcia Miguel Castro

Abstract

This repository contains the computational framework, simulated datasets, and a standalone Python solver for evaluating and mitigating variance leakage in ANOVA-PCA under unbalanced experimental designs. Classical ANOVA-PCA estimates factor effects from marginal means, which implicitly assumes that the design is structurally orthogonal. Missing data or unbalanced designs violate this assumption, leading to severe estimation errors (measured in the Frobenius norm) and spurious rotations of the geometric subspaces during the PCA stage. To address this, the repository provides:

Monte Carlo Simulation Engine: A complete Python pipeline that simulates synthetic vibrational spectra based on a hierarchical $2^4$ factorial design. It allows the impact of data loss to be evaluated under distinct attrition regimes (e.g., Missing Completely at Random (MCAR) vs. systematic structural imbalance).

Theoretical Bound Assessment: Scripts that quantify subspace distortion and interpretability loss using Davis-Kahan perturbation bounds.

Robust GLM Solver (glm_anova_pca.py): A standalone, object-oriented Python module that implements General Linear Model (GLM) orthogonal projections via the Moore-Penrose pseudoinverse. The solver is designed for direct application to new experimental datasets and preserves structural integrity even under severe imbalance.

Real-World Validation: Execution logs and scripts that apply the GLM framework to experimental Near-Infrared (NIR) spectroscopy data (bread staling kinetics), establishing baseline equivalence and backward compatibility with classical approaches in perfectly balanced scenarios.

This open-science package ensures full reproducibility of the associated manuscript and provides chemometricians and data scientists with a robust standard for routine variance partitioning in multivariate analysis.
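The contrast between marginal-mean and GLM effect estimates described above can be illustrated with a minimal NumPy sketch. It is not the repository's glm_anova_pca.py: the function names `marginal_mean_effect` and `glm_effects`, the reduced 2x2 layout (instead of the $2^4$ design), the cell counts, and the noiseless additive spectra are all assumptions chosen for illustration. The sketch compares both estimates of the factor A effect matrix against the known truth and reports the realized angle between leading loading vectors, which is the kind of quantity a Davis-Kahan bound controls, not the bound itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical unbalanced 2x2 layout: replicates per (A, B) cell, coded -1/+1.
cell_counts = {(-1, -1): 4, (-1, 1): 1, (1, -1): 1, (1, 1): 4}
levels = [cell for cell, n in cell_counts.items() for _ in range(n)]
a = np.array([la for la, _ in levels], dtype=float)
b = np.array([lb for _, lb in levels], dtype=float)

# Noiseless additive "spectra": X = mu + a*alpha + b*beta (assumed model).
n_channels = 6
mu = rng.normal(size=n_channels)
alpha = rng.normal(size=n_channels)
beta = rng.normal(size=n_channels)
X = mu + np.outer(a, alpha) + np.outer(b, beta)
true_A = np.outer(a, alpha)


def marginal_mean_effect(Xc, factor):
    """Classical estimate: replace each row by the mean of its factor level."""
    effect = np.zeros_like(Xc)
    for level in np.unique(factor):
        rows = factor == level
        effect[rows] = Xc[rows].mean(axis=0)
    return effect


def glm_effects(X, design_columns):
    """GLM estimate: least-squares coefficients via the Moore-Penrose pseudoinverse."""
    D = np.column_stack([np.ones(len(X))] + list(design_columns.values()))
    coef = np.linalg.pinv(D) @ X  # one coefficient row per design column
    return {
        name: np.outer(col, coef[i + 1])
        for i, (name, col) in enumerate(design_columns.items())
    }


def leading_loading(M):
    """First right singular vector, i.e. the first PCA loading of an effect matrix."""
    return np.linalg.svd(M, full_matrices=False)[2][0]


def angle_between(u, v):
    """Angle in radians between two unit vectors, ignoring sign."""
    return float(np.arccos(np.clip(abs(u @ v), 0.0, 1.0)))


classical_A = marginal_mean_effect(X - X.mean(axis=0), a)
glm_A = glm_effects(X, {"A": a, "B": b})["A"]

# Frobenius-norm estimation error of the factor A effect matrix.
err_classical = np.linalg.norm(classical_A - true_A, "fro")
err_glm = np.linalg.norm(glm_A - true_A, "fro")

# Rotation of the leading loading relative to the true effect direction.
angle_classical = angle_between(leading_loading(true_A), leading_loading(classical_A))
angle_glm = angle_between(leading_loading(true_A), leading_loading(glm_A))

print(f"Frobenius error  classical: {err_classical:.3e}  GLM: {err_glm:.3e}")
print(f"Loading rotation classical: {angle_classical:.3e}  GLM: {angle_glm:.3e}")
```

In this layout the A- rows have a mean B code of -0.6 and the A+ rows a mean of +0.6, so the marginal-mean estimate of the A effect absorbs 0.6 times the B effect, while the pseudoinverse solution separates the two because the design columns remain linearly independent. With noise added, neither estimate would be exact, and the comparison would need to be averaged over repeated draws.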
