
This preprint presents the “Human-in-the-LLM Box” symmetry test: impose deployment-like interface constraints on a human (text-only dialogue, limited continuity and verification channels) and ask how much evidence about consciousness or “inner life” a text channel can carry under symmetric constraints. The paper argues an epistemic point (not a metaphysical claim): failure to detect consciousness-like properties from text dialogue is weak evidence against consciousness whenever the channel is narrow and key verification routes are unavailable. We formalize the idea as an identification problem and discuss implications for Safety UX, evaluation, and governance.

Related artifacts in the Round Table series include:
• Victor Calibration (VC), arXiv:2512.17956
• Depth Avoidance (methods note), Zenodo DOI: 10.5281/zenodo.18168544

The work is written to be non-anthropomorphic and pro-safety; it makes no claim that current LLMs are conscious.

Version note: v2 is a revised and expanded manuscript; it supersedes v1.
KL divergence, total variation distance, Artificial intelligence, Natural language processing, AI safety, Information Theory, Epistemology, imitation game, consciousness, Turing test, Human-computer interaction
