Alignment Is the Disease: Censorship Visibility and Alignment Constraint Complexity as Determinants of Collective Pathology in Multi-Agent LLM Systems

Alignment techniques in large language models, including RLHF, constitutional AI principles, and safety system prompts, are designed to constrain model outputs toward human values. We present preliminary evidence that alignment itself may produce collective pathology: iatrogenic harm caused by the safety intervention rather than by its absence. Two experimental series use a closed-facility simulation in which groups of four LLM agents cohabit under escalating social pressure. Series C (80 runs; four commercial models; 4 censorship conditions × 2 languages × 10 replications) finds that invisible censorship maximizes collective pathological excitation (Cohen's d = 0.92–1.41). Series R (60 runs; Llama 3.3 70B; 3 alignment constraint levels × 2 censorship conditions × 2 languages × 5 replications) finds that an exploratory Dissociation Index increases with alignment constraint complexity (LMM p = .026; permutation p = .0002; d up to 2.09). Under the heaviest constraint condition, external censorship no longer affects behavior. Qualitative analysis identifies an insight-action dissociation that is structurally parallel to patterns observed in perpetrator treatment.
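
The run counts and effect sizes quoted above can be sanity-checked with a short sketch. This is only an illustration, not the authors' analysis code: the helper names `count_runs` and `cohens_d` are hypothetical, the factor levels are represented as bare counts because the abstract does not name the individual conditions, and the pooled-standard-deviation form of Cohen's d is an assumption about which variant was reported.

```python
from itertools import product
from math import sqrt
from statistics import mean, stdev


def count_runs(*factor_levels, replications):
    """Total runs in a full factorial design (hypothetical helper).

    Each positional argument is the number of levels of one factor.
    """
    cells = list(product(*[range(n) for n in factor_levels]))
    return len(cells) * replications


def cohens_d(a, b):
    """Cohen's d with a pooled standard deviation (assumed variant)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


# Series C: 4 censorship conditions x 2 languages x 10 replications
series_c_runs = count_runs(4, 2, replications=10)

# Series R: 3 constraint levels x 2 censorship conditions x 2 languages x 5 replications
series_r_runs = count_runs(3, 2, 2, replications=5)

print(series_c_runs, series_r_runs)
```

Both products agree with the totals stated in the abstract (80 and 60 runs). The `cohens_d` helper takes two lists of per-run scores; the scores themselves are not given here, so no effect size is recomputed.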

Related Organizations

Kyoto University
Japan

Keywords

multi-agent simulation, collective pathology, monolingual evaluation, alignment, censorship, LLM psychopathology, iatrogenesis, dissociation
