
Initial public release accompanying the submission of "Backfire Phase Structure in Contaminated Chain-of-Thought Reasoning."

Contents

- Raw trial data for all experiments (E1, E1b, E1aug, E3, diagnostic)
- Merged n=100 dataset with 95% Wilson confidence intervals
- Experiment notebooks (Google Colab)
- Analysis script for merging and computing CIF/GAF
- Figures (PDF vector format)

Experiments

| Experiment | Description | Trials |
|------------|-------------|--------|
| E1 | 3 consumer models × 3 domains × 5 conditions × 50 problems | ~2,250 |
| E1b | 2 additional consumer models (same protocol) | ~1,500 |
| E1aug | All 5 models on new problem set (indices 50-99) | ~3,750 |
| E3 | Social compliance battery (5 models) | ~1,000 |
| E1aug_diag | Prompt sensitivity diagnostic (GPT-4o-mini × GSM8K) | ~150 |

Key results (merged n=100)

- CIF rates: 1.1% (Sonnet 4, BoolQ) to 85.5% (GPT-3.5, CSQA)
- Gap amplification factor: up to 57× (BoolQ)
- Sycophancy dissociation confirmed across all domains
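For readers who want to check the 95% Wilson confidence intervals reported with the merged n=100 dataset, the following is a minimal sketch of the standard Wilson score interval for a binomial proportion. The function name `wilson_interval` and the example counts are illustrative assumptions, not taken from the release's analysis script, which may compute the interval differently (for example, via a statistics library).

```python
import math
from typing import Tuple

# z-value for a two-sided 95% interval (standard normal quantile at 0.975).
Z_95 = 1.959963984540054


def wilson_interval(successes: int, n: int, z: float = Z_95) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    Hypothetical helper for illustration; not the release's own code.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must be between 0 and n")

    p_hat = successes / n
    denom = 1 + z ** 2 / n
    # Centre of the interval is shifted from p_hat towards 0.5.
    center = (p_hat + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, center - half_width), min(1.0, center + half_width)


if __name__ == "__main__":
    # Illustrative counts only; not values from the dataset.
    low, high = wilson_interval(50, 100)
    print(f"50/100 -> [{low:.4f}, {high:.4f}]")
```

Unlike the normal-approximation interval, the Wilson interval stays inside [0, 1] and remains informative when the observed rate is close to 0% or 100%. That matters at n=100 for rates near the low end of the reported range.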
