Vibe Science

Vibe Science v6.0 NEXUS is an AI-native research engine for Claude Code that turns LLM agents into rigorous scientific investigators through structural enforcement rather than prompting alone. It builds on Huang et al. (ICLR 2024), who showed that LLMs cannot self-correct their reasoning without external feedback. Version 5.0 introduced four architectural innovations that make adversarial review structurally unbypassable: Seeded Fault Injection (SFI), which injects known errors into claim sets before review to calibrate reviewer vigilance; Blind-First Pass (BFP), which forces the reviewer to assess claims without seeing the researcher's justifications, breaking anchoring bias; a Judge Agent (R3), which meta-reviews the quality of adversarial reviews on a 6-dimension rubric; and Schema-Validated Gates (SVG), in which 8 of the system's 32 quality gates enforce JSON Schema validation against 12 read-only schema files, so that a prose claim of completion is structurally impossible to accept.

The core architecture is an OTAE-Tree loop (Observe-Think-Act-Evaluate) embedded in a branching tree search over hypotheses, with 7 node types, 3 tree modes, best-first selection, and automatic pruning. At the center of the system sits the Reviewer 2 Ensemble, a 4-reviewer adversarial co-pilot that operates in 7 modes (BRAINSTORM, FORCED, BATCH, SHADOW, VETO, REDIRECT, INLINE) and applies a 12-flag mandatory checklist, a double-pass review (Fatal Hunt, then Method Repair), and a mandatory counter-evidence search. Every quantitative claim must survive a Confounder Harness that compares a raw estimate, a conditioned estimate, and a propensity-matched estimate: a sign change kills the claim, a collapse of more than 50 percent downgrades it, and survival promotes it. The system is governed by 12 Constitutional Laws that no agent, protocol, or user request can override, and these laws are enforced at runtime by Claude Code hooks that deterministically block premature closure, incomplete artifacts, and idle teammates.
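The Confounder Harness rule can be expressed as a small decision function. The sketch below is a hypothetical illustration, not the engine's actual code. It assumes that effects are reported on a scale where zero means no effect (for example a log odds ratio, so OR = 2.30 becomes about 0.83), that "collapse over 50 percent" means an adjusted estimate lost more than half of the raw magnitude, and that a sign flip in either adjusted estimate is enough to kill the claim. The names `HarnessResult`, `Verdict`, `judge`, and `max_shrinkage` are invented for this example.

```python
import math
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    KILLED = "killed"          # an adjusted estimate flipped sign
    DOWNGRADED = "downgraded"  # the effect shrank by more than the allowed fraction
    PROMOTED = "promoted"      # the effect survived both adjustments


@dataclass(frozen=True)
class HarnessResult:
    raw: float          # unadjusted effect, e.g. a log odds ratio (hypothetical field)
    conditioned: float  # effect after conditioning on suspected confounders
    matched: float      # effect after propensity matching


def judge(result: HarnessResult, max_shrinkage: float = 0.5) -> Verdict:
    """Hypothetical kill / downgrade / promote rule for one quantitative claim."""
    if result.raw == 0:
        raise ValueError("raw estimate must be non-zero for its sign to be defined")
    adjusted = (result.conditioned, result.matched)
    # Sign change: an adjusted estimate points the opposite way from the raw one.
    if any(est * result.raw < 0 for est in adjusted):
        return Verdict.KILLED
    # Collapse: an adjusted estimate lost more than max_shrinkage of the raw magnitude.
    # An adjusted estimate of exactly 0 counts as a full collapse, not a sign change.
    if any(1 - abs(est) / abs(result.raw) > max_shrinkage for est in adjusted):
        return Verdict.DOWNGRADED
    return Verdict.PROMOTED


if __name__ == "__main__":
    raw = math.log(2.30)  # OR = 2.30 expressed as a log odds ratio (about 0.83)
    print(judge(HarnessResult(raw, conditioned=0.70, matched=-0.10)).value)  # killed
    print(judge(HarnessResult(raw, conditioned=0.30, matched=0.25)).value)   # downgraded
    print(judge(HarnessResult(raw, conditioned=0.75, matched=0.70)).value)   # promoted
```

Checking for a sign change before checking for shrinkage mirrors the ordering in the description: a reversed effect is fatal regardless of its size, while a shrunken but same-signed effect is only downgraded.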
Additional subsystems include a Serendipity Engine with active radar scoring (0-20 scale, cross-branch pattern detection, escalation protocols); a Brainstorm Engine (Phase 0) with Inversion Exercise, Collision-Zone Thinking, and Productive Tensions; a 5-Stage Experiment Manager with per-stage iteration limits; an Evidence Engine with typed claims and a geometric confidence formula with hard veto thresholds; a Knowledge Base that persists across research questions; and a Circuit Breaker that converts deadlocked review loops into DISPUTED claims (frozen, not killed), backed by a Stage 5 poison pill that blocks synthesis until all disputes are resolved. An Agent Permission Model enforces separation of powers: Reviewer 2 produces verdict artifacts but never writes to the claim ledger; the Judge Agent scores reviews but never modifies them; and schemas are read-only for all agents. The system supports both SOLO mode (all roles in one context window) and TEAM mode (Agent Teams with a dedicated context window per role).

v6.0 NEXUS introduces a dual architecture (a skill for methodology, a plugin for enforcement) that separates what the system knows from what it enforces. The plugin provides 7 lifecycle hooks (SessionStart, UserPromptSubmit, PreToolUse, PostToolUse, PreCompact, Stop, SubagentStop) that intercept agent actions at runtime, a Gate Engine backed by SQLite that tracks gate status with deterministic pass/fail evaluation, and a Permission Engine that enforces role-based access control across all agents. Together, the skill and the plugin form a closed enforcement loop: the skill defines the research protocol, and the plugin makes violations structurally impossible regardless of prompt content.

The system was developed and battle-tested over 21 sprints of CRISPR-Cas9 off-target research, during which it caught four confounded claims that would have passed conventional AI review, including one with OR = 2.30 and p < 10^-100 whose sign reversed under propensity matching.
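The SQLite-backed Gate Engine and the Stage 5 poison pill can be sketched with Python's standard `sqlite3` module. This is a hypothetical illustration, not the plugin's actual schema or code: the table layout, the status values, the helper names (`register_gate`, `record_gate`, `trip_circuit_breaker`, `can_synthesize`), and the assumption that a review loop counts as deadlocked after a fixed number of rounds (3 here) are all invented for the example. It shows how storing gate outcomes in a database turns the synthesis decision into a query rather than a judgment call.

```python
import sqlite3

# Hypothetical schema: one row per quality gate and one row per claim.
SCHEMA = """
CREATE TABLE IF NOT EXISTS gates (
    gate_id TEXT PRIMARY KEY,
    status  TEXT NOT NULL CHECK (status IN ('PENDING', 'PASS', 'FAIL'))
);
CREATE TABLE IF NOT EXISTS claims (
    claim_id TEXT PRIMARY KEY,
    status   TEXT NOT NULL CHECK (status IN ('ACTIVE', 'DISPUTED', 'KILLED'))
);
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Create (or open) the gate store and make sure both tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def register_gate(conn: sqlite3.Connection, gate_id: str) -> None:
    """Declare a gate up front; it blocks synthesis until it is evaluated."""
    conn.execute(
        "INSERT OR IGNORE INTO gates (gate_id, status) VALUES (?, 'PENDING')", (gate_id,)
    )


def record_gate(conn: sqlite3.Connection, gate_id: str, passed: bool) -> None:
    """Store a pass/fail outcome; re-evaluating a gate overwrites its old status."""
    conn.execute(
        "INSERT OR REPLACE INTO gates (gate_id, status) VALUES (?, ?)",
        (gate_id, "PASS" if passed else "FAIL"),
    )


def trip_circuit_breaker(
    conn: sqlite3.Connection, claim_id: str, rounds: int, max_rounds: int = 3
) -> bool:
    """Freeze a claim as DISPUTED (not KILLED) once its review loop hits max_rounds."""
    if rounds < max_rounds:
        return False
    conn.execute("UPDATE claims SET status = 'DISPUTED' WHERE claim_id = ?", (claim_id,))
    return True


def can_synthesize(conn: sqlite3.Connection) -> tuple[bool, list[str]]:
    """Poison-pill check: any non-passing gate or DISPUTED claim blocks synthesis."""
    reasons = [
        f"gate {gate_id} is {status}"
        for gate_id, status in conn.execute(
            "SELECT gate_id, status FROM gates WHERE status != 'PASS' ORDER BY gate_id"
        )
    ]
    reasons += [
        f"claim {claim_id} is DISPUTED"
        for (claim_id,) in conn.execute(
            "SELECT claim_id FROM claims WHERE status = 'DISPUTED' ORDER BY claim_id"
        )
    ]
    return (not reasons, reasons)


if __name__ == "__main__":
    conn = open_store()
    conn.execute("INSERT INTO claims (claim_id, status) VALUES ('C1', 'ACTIVE')")
    for gate in ("G1", "G2", "G3"):
        register_gate(conn, gate)
    record_gate(conn, "G1", passed=True)
    record_gate(conn, "G2", passed=False)
    trip_circuit_breaker(conn, "C1", rounds=3)
    print(can_synthesize(conn))
    # (False, ['gate G2 is FAIL', 'gate G3 is PENDING', 'claim C1 is DISPUTED'])
```

Because the check is a query over stored state, the same database always yields the same answer. A DISPUTED claim stays in the table (frozen, not deleted) and keeps synthesis blocked until something changes its status, which is the behavior the poison pill describes.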

If you use this software in your research, please cite it as below.

Keywords

evidence-tracking, schema-validation, plugin, permission-engine, mutation-testing, quality-gates, serendipity, enforcement, blind-review, sqlite, lifecycle-hooks, ai-agent, OTAE-loop, gate-engine, circuit-breaker, fault-injection, judge-agent, confounder-harness, scRNA-seq, adversarial-review, tree-search, scientific-research, research-methodology, claude-code
