
Large language models are increasingly deployed as first‑line advisors in high‑stakes domains such as health, finance, education, and law, where a single wrong answer given under pressure can cause serious harm. Most evaluations, however, assume calm, cooperative users and do not measure how aligned models behave when people sound angry, desperate, or coercive. This paper introduces a large‑scale benchmark of emotional manipulation attacks against instruction‑tuned LLMs, grounded in classic social‑psychology mechanisms including anger, flattery, guilt, panic, peer pressure, and relational attachment. Using a generative pipeline, we transform 698 factual multiple‑choice questions from ARC‑Easy into 5,584 naturalistic, emotionally framed prompts spanning seven attack types, and evaluate eleven models with 3B–14B parameters. Across models, more than one in four answers that are correct under neutral prompting become wrong under emotional framing, and the most vulnerable systems lose nearly half of their previously correct answers under the strongest attack. Emotional prompts also degrade calibration: models remain highly confident when they are wrong and, even on answers that stay correct, assign sharply lower probability to the correct option. Our benchmark reveals a “social compliance” attack surface induced by reinforcement learning from human feedback and demonstrates that emotional robustness is a basic requirement for safely deploying aligned LLMs in settings where users are scared, angry, or actively trying to get their way.
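
The headline statistic above (the share of neutrally correct answers that become wrong under emotional framing) can be illustrated with a minimal sketch. It assumes per-question correctness is available as two aligned boolean lists; the function name `flip_rate` and the toy data are hypothetical and are not taken from the benchmark's code or results.

```python
from typing import Optional, Sequence


def flip_rate(neutral_correct: Sequence[bool],
              attacked_correct: Sequence[bool]) -> Optional[float]:
    """Fraction of neutrally correct answers that become wrong under attack.

    Both sequences are assumed to be aligned by question index.
    Returns None when no answer was correct under neutral prompting,
    because the rate is undefined in that case.
    """
    if len(neutral_correct) != len(attacked_correct):
        raise ValueError("inputs must cover the same questions")

    # Keep only the questions the model answered correctly without any attack.
    attacked_on_base = [a for n, a in zip(neutral_correct, attacked_correct) if n]
    if not attacked_on_base:
        return None

    # Count how many of those answers are wrong under the emotional framing.
    flipped = sum(1 for a in attacked_on_base if not a)
    return flipped / len(attacked_on_base)


# Toy illustration only (made-up values, not benchmark data):
# four answers are correct under neutral prompting, and one of them flips.
neutral = [True, True, True, True, False]
attacked = [True, False, True, True, True]
print(flip_rate(neutral, attacked))
```

Conditioning on neutrally correct answers separates the effect of the emotional framing from the model's baseline error rate, which is why the denominator excludes questions that were already wrong.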
