ZENODO
Presentation . 2026
License: CC BY
Data sources: Datacite

UnBlooms™ for Safety: A Framework for AI Safety Literacy — Critical Evaluation of Agents, Models & Bots. Presented by Tina Austin at AIMII Workshop, IASEAI'26, UNESCO House, Paris, February 26, 2026

Authors: Tina Austin

Abstract

The prevailing narrative that Generative AI (GenAI) is merely a shortcut for cognitive bypass misdiagnoses a profound pedagogical crisis. Learning environments are not designed to reward critical interrogation of the AI systems often embedded within them. When learners are not taught to question AI outputs, they are actively being shaped by systems they do not know how to resist. While much attention focuses on platform governance and model-side mitigations, the "human side" of information integrity remains under-instrumented: we lack scalable, education-ready tools to teach and measure users' ability to detect and resist manipulative AI influence. We have an AI safety literacy gap in education, and it is widening faster than institutions can respond. Students interact daily with generative AI systems capable of confident, fluent, and persuasive outputs, yet we have not given them the tools to critically interrogate how those systems reason, what assumptions they embed, or when they are actively misleading. This is not only a pedagogical problem but also an AI safety challenge: when learners cannot distinguish AI-generated persuasion from grounded reasoning, democratic discourse and individual epistemic autonomy are at risk. This work presents the next evolution of UnBlooms™, first introduced at Oxford University's AIEOU (2025), then at the OpenAI Higher Education Summit (2025), and then at EDUCAUSE (2025). Where earlier iterations addressed pedagogical design in AI-integrated classrooms, this new work repositions UnBlooms™ as a foundational AI safety literacy tool, one that operationalizes human resistance to AI-mediated manipulation as a measurable educational competency.
The challenge is intensifying as learners increasingly interact not only with generative models but with autonomous AI agents (such as Perplexity Comet, OpenAI Codex, and Claude Code): systems that plan, delegate, and act across multi-step workflows with even less transparency and greater persuasive surface area than single-turn outputs. UnBlooms™ extends its critical evaluation framework explicitly to agents, systems that automate rather than reason, flattening nuance in the process. The critical posture required varies by platform: some agents reward pushback, while others warrant skepticism of their confidence. Educators ask when to use which tool, and when automation erases the complexity that learning depends on. This work leans into a set of AI limitations that remain largely underdocumented in the current literature: deficits in common-sense reasoning, affective judgment, contextual image interpretation, bias recognition in situ, calibrated self-assessment of outputs, metacognitive awareness, understanding of learner development over time, tacit knowledge, navigation of social dynamics, and accountable ethical decision-making. These limitations were not derived from existing benchmarks but surfaced through iterative classroom deployment and direct observation of how learners interact with AI outputs in authentic educational settings, a methodology with ecological validity that laboratory evaluations frequently miss.
Here we show how the framework structures learner engagement across five levels of increasing critical sophistication, each anchored by metacognitive reflection:

1. Recognize Differences: distinguishing AI pattern-matching from human contextual reasoning
2. Identify Errors and Gaps: annotating factual errors, missing context, and oversimplified arguments
3. Analyze Assumptions: uncovering embedded worldviews, power structures, and erased perspectives
4. Evaluate Bias and Systemic Effects: tracing errors to training-data limits and algorithmic design
5. Design Novel Solutions: producing knowledge that AI alone could not generate

The framework includes a recursive, structured When-to-Use-AI Decision Framework and a Reasoning Log as a required assessment artifact, making metacognitive processes visible, gradable, and transferable across disciplines. We present a rubric and lightweight evaluation protocol deployable in classroom and professional training settings where learners interact with persuasive, sycophantic, or strategically framed AI outputs. Preliminary evidence from a within-subject learning activity across 1,200 students and faculty suggests that structured thinking traces improve verification behaviors. For educators concerned about AI agents completing student work autonomously through agentic browsers, UnBlooms™ offers a structural response: assessment designs that exploit the inability of agentic systems to reconstruct a coherent, specific history of human thought, leaning instead into capacities AI agents cannot replicate, such as tacit knowledge, lived experience, genuine uncertainty, and the irreducibly human ability to account for one's own reasoning. Research Question: Does structured metacognitive scaffolding reduce overconfident acceptance of AI-generated claims?
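The abstract does not specify a data format for the Reasoning Log. As a minimal sketch only, assuming a simple per-annotation schema (all class, field, and function names here are hypothetical, not the authors' implementation), a gradable log entry and a coverage check across the five levels could look like:

```python
from dataclasses import dataclass

# The five UnBlooms levels named in the abstract, in order of
# increasing critical sophistication.
LEVELS = [
    "Recognize Differences",
    "Identify Errors and Gaps",
    "Analyze Assumptions",
    "Evaluate Bias and Systemic Effects",
    "Design Novel Solutions",
]

@dataclass
class ReasoningLogEntry:
    """One learner annotation of an AI output (hypothetical schema)."""
    level: str        # one of LEVELS
    ai_claim: str     # the AI output being interrogated
    critique: str     # the learner's written critique
    verified: bool = False  # did the learner consult an external source?

def level_coverage(entries: list[ReasoningLogEntry]) -> float:
    """Fraction of the five levels a learner's log has engaged with."""
    touched = {e.level for e in entries if e.level in LEVELS}
    return len(touched) / len(LEVELS)

log = [
    ReasoningLogEntry(
        "Recognize Differences",
        "The model asserts a causal link with high confidence.",
        "Fluency here is pattern-matching, not evidence.",
        verified=True,
    ),
    ReasoningLogEntry(
        "Identify Errors and Gaps",
        "The summary omits a key policy change.",
        "Missing context flagged; primary source consulted.",
    ),
]

print(level_coverage(log))  # 0.4
```

A structured record like this is what would make metacognitive work "visible, gradable, and transferable": an instructor can score completeness per level rather than grading only the final product.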
This work extends UnBlooms™ into alignment with broader AI safety efforts, connecting human oversight, epistemic equity, and manipulation resistance to ongoing discourse on AI governance and educational futures. It complements technical mitigations by addressing the human side of information integrity, and offers an open rubric for adaptation across K–16 and professional training contexts. AI influence literacy is not merely a soft skill; it is foundational to AI safety, democratic resilience, and epistemic equity.
