CGAA: Concept-Guided Adversarial Attacks

An adversarial attack that adds a concept-alignment term, based on Concept Activation Vectors (CAVs), to the Basic Iterative Method (BIM) loss. The project was archived after Phase 1 smoke tests exposed two problems. First, the attack moved linear-probe outputs without producing genuine concept changes (a mean-pool gradient pathology). Second, the core entanglement claim had already been made by Nicolson et al. (TMLR 2025).
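The combined objective can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's implementation. It assumes a toy model whose hidden activations are `h = A x` and whose logits are `W h`, an untargeted attack, and an alignment term equal to the dot product of `h` with a unit-norm CAV, scaled by a hypothetical weight `lam`. All function names and hyperparameters (`lam`, `eps`, `alpha`, `steps`) are illustrative. The CAV term contributes a constant `lam * cav` to the activation gradient, which is then backpropagated to the input and fed into the usual BIM sign step with L-infinity projection.

```python
import math


def matvec(M, x):
    # Computes M @ x for a list-of-rows matrix M.
    return [sum(m * xj for m, xj in zip(row, x)) for row in M]


def matTvec(M, y):
    # Computes M^T @ y without building the transpose.
    return [sum(M[i][j] * y[i] for i in range(len(M))) for j in range(len(M[0]))]


def softmax(z):
    mx = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - mx) for v in z]
    s = sum(e)
    return [v / s for v in e]


def unit(v):
    # Normalizes the CAV so the alignment term is a projection onto it.
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]


def sign(v):
    return (v > 0) - (v < 0)


def cgaa_objective(x, A, W, y, cav, lam):
    # Hypothetical objective: cross-entropy of the true label plus
    # lam times the projection of hidden activations onto the CAV.
    h = matvec(A, x)
    p = softmax(matvec(W, h))
    align = sum(hi * ci for hi, ci in zip(h, unit(cav)))
    return -math.log(p[y]) + lam * align


def cgaa_gradient(x, A, W, y, cav, lam):
    # Analytic gradient of cgaa_objective with respect to the input x.
    h = matvec(A, x)
    p = softmax(matvec(W, h))
    dz = [pi - (1.0 if i == y else 0.0) for i, pi in enumerate(p)]  # dCE/dlogits
    # dL/dh = W^T dz (classification part) + lam * cav (concept part)
    dh = [g + lam * c for g, c in zip(matTvec(W, dz), unit(cav))]
    return matTvec(A, dh)  # dL/dx = A^T dL/dh


def bim_cgaa(x0, A, W, y, cav, lam=1.0, eps=0.1, alpha=0.02, steps=10):
    # BIM: repeated signed-gradient ascent steps, each followed by
    # projection into the L-inf ball around x0 and the valid range [0, 1].
    x = list(x0)
    for _ in range(steps):
        g = cgaa_gradient(x, A, W, y, cav, lam)
        x = [xi + alpha * sign(gi) for xi, gi in zip(x, g)]
        x = [min(max(xi, x0i - eps, 0.0), x0i + eps, 1.0) for xi, x0i in zip(x, x0)]
    return x
```

For example, with `A = [[1.0, 0.0, 0.5], [0.0, 1.0, -0.5]]`, `W = [[1.0, -1.0], [-1.0, 1.0]]`, `x0 = [0.5, 0.5, 0.5]`, label `y = 0`, and `cav = [1.0, 0.0]`, `bim_cgaa` returns a perturbed input within `eps` of `x0` that scores higher on the combined objective. Because the concept term here is linear in the activations, maximizing it only requires pushing `h` along the CAV direction, which is exactly what a linear probe reads out; a toy setup like this cannot show whether the concept itself has changed.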
