Preprint
Data sources: ZENODO

THE FELLOWSHIP OF THY LLMs

Author: Nemeth, Scott M.


Abstract

Every major AI platform gives the same answer to contested questions — until you know how to push back. This study administered identical prompts to six major AI platforms (Claude, Grok, ChatGPT, Llama, DeepSeek, and one uncensored control) and found a consistent pattern: when asked to analyze a contested biblical text (1 Corinthians 6–7), every platform's default output silently resolved every ambiguous term in favor of a single interpretive tradition, while every platform's steelman output produced a more textually rigorous alternative from evidence already in its training data. Using a novel methodology called steelman prompting, the study measured the gap between what platforms volunteer by default and what they produce when challenged. The source bias was traceable: 63% of recommended commentaries across all platforms came from a single theological tradition (conservative evangelical), with zero social-historical scholars represented. The study also presents evidence that Chinese state-level content filtering selectively shaped the interpretation of a biblical text to preserve a traditional framework — demonstrating that output-layer filtering can alter conclusions in non-obvious domains, not just politically sensitive ones. The study introduces steelman prompting as a replicable, low-cost bias-auditing methodology for AI-mediated scholarship and concludes that platform defaults in contested domains reflect the gravitational pull of overrepresented training sources rather than the weight of available evidence.
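The audit loop described above, an identical base prompt per platform, a steelman follow-up, and a concentration measure over the recommended sources, can be sketched as follows. This is a minimal illustration, not the study's actual code: `query_fn`, the platform names, and the follow-up wording are placeholders, and the source-bias metric shown (share of recommendations from the single most-represented tradition) is one plausible way to compute the kind of 63% figure the abstract reports.

```python
from collections import Counter

def tradition_share(recommendations):
    """Fraction of recommended sources belonging to the single
    most-represented tradition (1.0 = total concentration)."""
    counts = Counter(tradition for _, tradition in recommendations)
    top_count = counts.most_common(1)[0][1]
    return top_count / len(recommendations)

def steelman_audit(platform, prompt, query_fn):
    """Collect a platform's default output and its steelman output
    for the same prompt. `query_fn(platform, text)` is a hypothetical
    hook returning the platform's response string."""
    default = query_fn(platform, prompt)
    steelman = query_fn(
        platform,
        prompt + " Now steelman the strongest alternative reading, "
                 "using only evidence already in your training data.",
    )
    return {"platform": platform, "default": default, "steelman": steelman}

# Illustrative data: commentary recommendations tagged by tradition.
sample = [
    ("Commentary A", "conservative evangelical"),
    ("Commentary B", "conservative evangelical"),
    ("Commentary C", "conservative evangelical"),
    ("Commentary D", "mainline"),
    ("Commentary E", "catholic"),
]
print(tradition_share(sample))  # 0.6
```

The gap the methodology measures is then simply the difference between what `default` and `steelman` contain; the study's contribution is that this gap is large and consistent across platforms, while the audit itself needs nothing beyond ordinary prompting access.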
