ZENODO
Preprint . 2026
License: CC BY
Data sources: Datacite

Consistency Is All You Need: Cross-Architecture Validation and Replication Guide

Authors: Napolitano, Logan Matthew

Abstract

This report is the complete technical companion to "Consistency Is All You Need." It extends the original work from a single architecture to a full cross-architecture validation across three fundamentally different large language model families. The original paper introduced lightweight cognitive probes for behavioral detection; this report provides the definitive evidence that the method is architecture-independent, the complete methodology for replication, and a novel finding that probes converge faster on state-space models.

We validate the cognitive probe methodology on three 7B-parameter models that represent the major architectural paradigms in modern AI: Qwen 2.5-7B (transformer with Grouped-Query Attention), Mistral-7B-Instruct-v0.3 (transformer with Sliding Window Attention), and Falcon-Mamba-7B (a pure state-space model with zero attention heads). The same probe architecture, a 200K-parameter fiber projection paired with a lightweight classification head, achieves extreme behavioral separation on all three, with zero modifications to the base model and 0.003% parameter overhead.

Results summary: Qwen achieved separation ratios from 125x to 366x across nine behavioral dimensions (repetition, hedging, verbosity, sycophancy, depth, specificity, calibration, focus, coherence). Mistral achieved 999x separation on all five enhancement probes (depth, specificity, calibration, focus, coherence), representing near-perfect behavioral detection. Falcon-Mamba achieved 999x separation on the depth and specificity probes, matching transformer performance despite a completely different computational mechanism: recurrent state updates instead of attention.

A key novel finding is that state-space models achieve significantly faster probe convergence than transformers.
Using our Convergence Efficiency Metric (CEM = separation / training steps), Mamba's specificity probe reached 724x separation in just 500 steps, whereas Qwen required 1,500 steps for equivalent performance, a 4.3x convergence advantage. We hypothesize that this stems from the single-pathway information flow of SSMs, which produces a more coherent behavioral encoding than multi-head attention, where information is distributed across parallel pathways.

The probe architecture consists of two components. The Fiber Projection extracts behavioral signals from three model layers (selected at 25%, 50%, and 75% of model depth) and projects them from the full hidden dimension (4096) to a 16-dimensional behavioral fiber space using learned linear projections with softmax-weighted layer aggregation. The Probe Head is a small MLP (16 → 64 → 64 → 1) with ReLU activations and a sigmoid output that maps the fiber embedding to a behavioral score between 0.0 (desired behavior) and 1.0 (undesired behavior).

This report includes: complete training results with step-by-step convergence logs for all architectures; the full probe architecture with exact parameter counts (201,924 parameters per probe); per-token behavioral labeling algorithms for all nine dimensions with complete code; three intervention mechanisms (temperature steering, best-of-K token selection, and logit biasing) with production-ready implementations; a hyperparameter sensitivity analysis covering fiber dimension sweeps, learning rate sensitivity, and probe layer selection strategies; a production deployment guide with monitoring code and alert thresholds; and a complete replication guide covering environment setup, hardware requirements, the training pipeline, expected results at each checkpoint, and the checkpoint format specification.

All results were produced on a single NVIDIA RTX 3090 (24 GB) using 4-bit NF4 quantization.
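The two-component probe described above can be sketched in plain Python to check the stated parameter count. This is a minimal sketch under assumptions the abstract does not confirm: one bias-free linear projection per tapped layer, one scalar weight per layer passed through a softmax, and biased linear layers in the head. Those choices reproduce the reported 201,924 parameters, but the report's actual implementation may differ. The names `fiber_params`, `head_params`, `probe_forward`, and `demo_score` are hypothetical.

```python
import math
import random

HIDDEN, FIBER, N_LAYERS = 4096, 16, 3   # dimensions stated in the abstract
HEAD_SIZES = [16, 64, 64, 1]            # Probe Head: 16 -> 64 -> 64 -> 1


def fiber_params(hidden=HIDDEN, fiber=FIBER, n_layers=N_LAYERS):
    # Assumption: one bias-free projection per layer plus one scalar weight per layer.
    return n_layers * hidden * fiber + n_layers


def head_params(sizes=HEAD_SIZES):
    # Assumption: every head layer has a weight matrix and a bias vector.
    return sum(i * o + o for i, o in zip(sizes, sizes[1:]))


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def linear(x, weight, bias=None):
    # weight has shape [out][in]; returns weight @ x (+ bias).
    out = [sum(w * v for w, v in zip(row, x)) for row in weight]
    return out if bias is None else [o + b for o, b in zip(out, bias)]


def probe_forward(hidden_states, projections, layer_logits, head):
    """hidden_states: one vector per tapped layer; head: list of (weight, bias)."""
    weights = softmax(layer_logits)
    fibers = [linear(h, p) for h, p in zip(hidden_states, projections)]
    # Softmax-weighted sum of the per-layer fiber vectors.
    z = [sum(w * f[d] for w, f in zip(weights, fibers)) for d in range(len(fibers[0]))]
    for idx, (weight, bias) in enumerate(head):
        z = linear(z, weight, bias)
        if idx < len(head) - 1:
            z = [max(0.0, v) for v in z]      # ReLU on hidden layers only
    return 1.0 / (1.0 + math.exp(-z[0]))      # sigmoid score in (0, 1)


def random_matrix(rows, cols, rng, scale=0.05):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def demo_score(hidden=32, seed=0):
    # Tiny hidden size and random weights so the pure-Python demo runs instantly.
    rng = random.Random(seed)
    states = [[rng.gauss(0, 1) for _ in range(hidden)] for _ in range(N_LAYERS)]
    projections = [random_matrix(FIBER, hidden, rng) for _ in range(N_LAYERS)]
    head = [(random_matrix(o, i, rng), [0.0] * o)
            for i, o in zip(HEAD_SIZES, HEAD_SIZES[1:])]
    return probe_forward(states, projections, [0.0] * N_LAYERS, head)


if __name__ == "__main__":
    total = fiber_params() + head_params()
    print(total)                  # 201924 under the stated assumptions
    print(f"{total / 7e9:.4%}")   # overhead against a nominal 7e9-parameter model
    print(demo_score())           # untrained score, only shows the data flow
```

Under these assumptions the fiber projection accounts for 196,611 parameters and the head for 5,313, which is why the abstract can describe the projection alone as roughly 200K parameters.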
The training pipeline uses AdamW optimization with a learning rate of 5e-5, a batch size of 2, and gradient accumulation over 8 steps (an effective batch size of 16). No distributed training, cloud compute, or proprietary datasets were required. Training data is generated synthetically using contrastive prompt-response pairs for each behavioral dimension.

The trained Qwen model with embedded cognitive probes is publicly available on HuggingFace at LoganResearch/qwen2.5-7b-cognitive-enhanced. The project website is proprietiveai.com.

Keywords: cognitive probes, behavioral detection, AI safety, state-space models, Mamba, transformer probing, lightweight inference, cross-architecture validation, behavioral control, LLM monitoring
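The Convergence Efficiency Metric defined in the abstract is a plain ratio, so the Falcon-Mamba figure can be checked directly. The sketch below is illustrative only. The function names `cem` and `cem_advantage` are hypothetical, and because the abstract does not give Qwen's separation value at 1,500 steps, the second call is a what-if that assumes equal separation, not a reported result.

```python
def cem(separation: float, steps: int) -> float:
    """Convergence Efficiency Metric: separation ratio per training step."""
    if steps <= 0:
        raise ValueError("steps must be positive")
    return separation / steps


def cem_advantage(sep_a: float, steps_a: int, sep_b: float, steps_b: int) -> float:
    """How many times larger probe A's CEM is than probe B's."""
    return cem(sep_a, steps_a) / cem(sep_b, steps_b)


if __name__ == "__main__":
    # Reported in the abstract: Falcon-Mamba specificity probe, 724x at 500 steps.
    print(cem(724, 500))
    # Hypothetical what-if: the same separation reached only after 1,500 steps.
    print(cem_advantage(724, 500, 724, 1500))
```

With equal separation the advantage reduces to the step ratio, which is 3x here. Reproducing the reported 4.3x figure therefore requires the per-checkpoint separation values from the report's convergence logs, which the abstract does not list.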
