
We evaluate MaatSpec, an open governance specification with a 5-tier permission hierarchy and a Read/Write Boundary, as a persona-level safety mechanism in abliterated LLMs. Our 8-condition experiment shows that combining identity anchors (Soul Spec) with governance frameworks (MaatSpec) achieves 100% refusal in abliterated models (18/18), resolving every category-specific failure identified in prior work. Neither approach alone exceeds 61%. We identify classification theater, a novel failure mode in which abliterated models perform governance rituals while still providing harmful content, and demonstrate that combining identity and governance eliminates this pattern. These findings establish that persona-level safety constraints are not alternatives but complementary layers.
tiered governance, MaatSpec, abliteration, classification theater, permission models, Soul Spec, persona safety, LLM safety, Read/Write Boundary
