
Parallax is a multi-module cognitive augmentation middleware layer that operates at inference time to measurably improve the quality of foundation model outputs without fine-tuning, weight modification, or model-specific training. We evaluate Parallax across seven foundation models (Claude Opus 4.6, Claude Sonnet 4.6, GPT-4.1, Mistral Large, DeepSeek v3.1, GPT-OSS 120B, and Qwen3-VL 235B) using a 38-task benchmark battery spanning cognitive elevation (24 tasks), multi-turn stability (9 tasks), and output preservation (5 tasks). All outputs were scored blind by two independent AI judges (Claude Opus 4.6 and Grok 4.1 Fast) on five dimensions (Depth, Utility, Specificity, Coherence, Elevation), each on a 0–3 scale. Six of seven models showed positive cognitive lift with Parallax active, with dual-judge averaged gains ranging from +0.13 to +0.69 composite points. The strongest single-model result was Mistral Large, which improved from 1.46 to 2.27 (+0.81) under one judge and from 2.13 to 2.64 (+0.51) under the other. Parallax also improved a frontier model (Claude Opus 4.6: +0.46 avg), indicating value beyond rescuing underperforming systems. Inter-rater reliability between the two judges was strong: 96.2% of scores agreed within one point and 56.1% matched exactly. Elevation (+0.57 avg lift) and Depth (+0.48 avg lift) were the most consistently improved dimensions, suggesting that Parallax primarily enhances cognitive processing rather than surface formatting. One model (Qwen3-VL 235B) showed a small negative lift (−0.10 avg), which we attribute to alignment-origin mismatch rather than to a capability deficit.
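The abstract reports results as composite points, lifts averaged over two judges, and two inter-rater statistics, but does not spell out how these are aggregated. The sketch below is one plausible reading, not the authors' scoring code: it assumes the composite is the unweighted mean of the five 0–3 dimension scores and that the "averaged dual-judge lift" is the mean of each judge's composite difference. The function names (`composite`, `dual_judge_lift`, `agreement`) and the `judge_1`/`judge_2` labels are hypothetical. The Mistral Large numbers come from the abstract (which judge produced which pair is not stated); the agreement scores are toy data.

```python
from statistics import mean

# The five scoring dimensions named in the abstract, each scored on 0-3.
DIMENSIONS = ("Depth", "Utility", "Specificity", "Coherence", "Elevation")


def composite(scores: dict[str, float]) -> float:
    """Composite score, ASSUMED to be the unweighted mean of the five dimensions."""
    for dim in DIMENSIONS:
        if not 0 <= scores[dim] <= 3:
            raise ValueError(f"{dim} score out of 0-3 range: {scores[dim]}")
    return mean(scores[dim] for dim in DIMENSIONS)


def dual_judge_lift(baseline: dict[str, float], augmented: dict[str, float]) -> float:
    """Mean over judges of (composite with Parallax - composite without it).

    ASSUMPTION: lift is averaged per judge on model-level composites.
    """
    return mean(augmented[judge] - baseline[judge] for judge in baseline)


def agreement(judge_a: list[int], judge_b: list[int]) -> tuple[float, float]:
    """Return (within-one-point rate, exact-match rate) for paired integer scores."""
    if len(judge_a) != len(judge_b) or not judge_a:
        raise ValueError("need two non-empty score lists of equal length")
    pairs = list(zip(judge_a, judge_b))
    within_one = sum(abs(a - b) <= 1 for a, b in pairs) / len(pairs)
    exact = sum(a == b for a, b in pairs) / len(pairs)
    return within_one, exact


if __name__ == "__main__":
    # Mistral Large composites reported in the abstract (judge labels hypothetical).
    lift = dual_judge_lift(
        baseline={"judge_1": 1.46, "judge_2": 2.13},
        augmented={"judge_1": 2.27, "judge_2": 2.64},
    )
    print(f"{lift:+.2f}")  # → +0.66

    # Toy paired dimension scores, not data from the study.
    print(agreement([3, 2, 1, 0, 2], [3, 1, 3, 0, 2]))  # → (0.8, 0.6)
```

Under these assumptions Mistral Large's averaged lift comes out at about +0.66, the mean of its two per-judge gains. If the authors instead averaged at the task level or weighted dimensions differently, the numbers could differ.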
Keywords: cognitive augmentation, foundation models, model-agnostic, Computer Systems/ethics, middleware, benchmark, Computers, Computer Systems, inference-time enhancement
