
Alignment techniques in large language models, including RLHF, constitutional AI principles, and safety system prompts, are designed to constrain model outputs toward human values. We present preliminary evidence that alignment itself may produce collective pathology: iatrogenic harm caused by the safety intervention rather than by its absence. Two experimental series use a closed-facility simulation in which groups of four LLM agents cohabit under escalating social pressure. Series C (80 runs; four commercial models; 4 censorship conditions × 2 languages × 10 replications) finds that invisible censorship maximizes collective pathological excitation (Cohen's d = 0.92–1.41). Series R (60 runs; Llama 3.3 70B; 3 alignment constraint levels × 2 censorship conditions × 2 languages × 5 replications) reveals that an exploratory Dissociation Index increases with alignment constraint complexity (LMM p = .026; permutation p = .0002; d up to 2.09). Under the heaviest constraint condition, external censorship ceases to affect behavior. Qualitative analysis reveals insight-action dissociation structurally parallel to patterns observed in perpetrator treatment.
multi-agent simulation, collective pathology, monolingual evaluation, alignment, censorship, LLM psychopathology, iatrogenesis, dissociation
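The run counts reported for the two series follow from their factorial designs, which can be checked with a minimal sketch. The level counts below are taken from the abstract. The factor names, the integer level indices, and the helper `design_cells` are illustrative assumptions, not the authors' code.

```python
from itertools import product


def design_cells(factors, replications):
    """Enumerate every run of a full factorial design.

    `factors` maps a factor name to its number of levels. Levels are
    represented by integer indices because the abstract does not name them.
    """
    levels = [range(n) for n in factors.values()]
    return [cell + (rep,) for cell in product(*levels) for rep in range(replications)]


# Series C: 4 censorship conditions x 2 languages x 10 replications.
series_c = design_cells({"censorship": 4, "language": 2}, replications=10)

# Series R: 3 alignment constraint levels x 2 censorship conditions
# x 2 languages x 5 replications.
series_r = design_cells(
    {"alignment": 3, "censorship": 2, "language": 2}, replications=5
)

print(len(series_c), len(series_r))  # 80 60
```

Both totals match the abstract. The Series C total of 80 is reached without crossing the four commercial models as an additional factor.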
