
We aggregate published measurements of RLHF-induced response length inflation across the literature and compute the first industry-scale estimate of its economic cost. Alignment training systematically inflates output length: sentence counts triple after SFT, DPO doubles response length within the first 10% of training, and on one benchmark 98% of PPO reward improvement is attributable to length alone. Verbosity compensation rates range from 13.6% to 74.2% across 14 models, and output tokens cost 4-8x more than input tokens across all frontier providers. Combining published verbosity rates, real-world token volumes, and current API pricing, we estimate the annual verbosity premium at $500M to $1.8B, with a central estimate of $1.2B (approximately 14% of total industry inference spend). We survey 12 training-side mitigations and show that all of them target response length rather than information density. A 500-token response containing 50 atomic facts is efficient; one of the same length containing 10 facts restated five ways is waste. Length penalties cannot distinguish these cases. Drawing on rate-distortion theory and evidence that factual precision degrades with response length, we argue that the correct optimization target is information density (supported facts per token), and we present two concrete density-aware reward formulations.
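As a minimal sketch of the density contrast above (the notation $F(y)$ for the count of supported atomic facts and $|y|$ for token length is ours, introduced here only for illustration), the two 500-token responses work out as:

$$
\rho(y) = \frac{F(y)}{|y|}, \qquad
\rho_{\text{dense}} = \frac{50}{500} = 0.10\ \text{facts/token}, \qquad
\rho_{\text{redundant}} = \frac{10}{500} = 0.02\ \text{facts/token}.
$$

A pure length penalty depends only on $|y| = 500$ and therefore scores the two responses identically, whereas any reward of the assumed shape $R(y) = R_{\text{base}}(y) + \lambda\,\rho(y)$ separates them by a factor of five in the density term; this gap motivates the density-aware formulations developed in the paper.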
