
We present MonaVec, a training-free, deterministic embedded vector-search kernel for edge and offline AI, which we position as the SQLite of vector search. It delivers high recall with 4-bit codes (an 8x smaller footprint than float32), returns the same top-K results on any device (byte-identical within a build), and is exposed through a CLI, a REST API, and a web UI. MonaVec combines a data-oblivious quantization pipeline (a Randomized Hadamard Transform followed by Lloyd-Max scalar quantization) with three index backends (BruteForce, IvfFlat, and HNSW), SIMD-accelerated scoring (AVX-512, AVX2, and NEON), and a service layer with hybrid sparse-dense retrieval (BM25 + dense) and pluggable identity-based multi-tenancy. It requires no training data, runs offline, is written in pure Rust with Python bindings, and persists an index as a single .mvec file. On AG News (45K vectors x 1024 dims, BGE-M3 embeddings, cosine similarity), 4-bit BruteForce reaches 0.960 Recall@10 in 27 MB and 4-bit HNSW reaches 0.954, exceeding float32 FAISS-IVF and 8-bit usearch in recall while trading peak throughput for byte-identical determinism. On glove-100 (1.18M vectors x 100 dims), BruteForce (0.865) outperforms every graph index evaluated. On fashion-mnist (60K vectors x 784 dims, L2 distance), global standardization improves BruteForce Recall@10 from 0.41 to 0.62. We also validate portable determinism on aarch64 hardware.
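The data-oblivious pipeline named above (Randomized Hadamard Transform, then Lloyd-Max scalar quantization) can be illustrated with a small, stdlib-only Python sketch. This is not MonaVec's Rust implementation. The function names (`rht`, `lloyd_max_gaussian`, `quantize`, `approx_cosine`), the seeded sign generation, the zero-padding of non-power-of-two dimensions such as 100 or 784, and the model that rotated coordinates of a unit vector are roughly N(0, 1/d) are all assumptions made for illustration. The sketch shows why no training data is needed: the 4-bit codebook is fitted to a theoretical Gaussian rather than to the corpus, and the rotation is fixed by a seed, so the same input always yields the same codes.

```python
# Illustrative sketch only: names, seed handling, and padding are hypothetical,
# not MonaVec's API or implementation.
import bisect
import functools
import math
import random


def _pad_pow2(vec):
    """Zero-pad to the next power of two (assumed handling for dims like 100 or 784)."""
    d = 1
    while d < len(vec):
        d *= 2
    return list(vec) + [0.0] * (d - len(vec))


def _fwht(x):
    """In-place unnormalized fast Walsh-Hadamard transform; len(x) is a power of two."""
    h = 1
    while h < len(x):
        for i in range(0, len(x), 2 * h):
            for j in range(i, i + h):
                x[j], x[j + h] = x[j] + x[j + h], x[j] - x[j + h]
        h *= 2
    return x


def rht(vec, seed):
    """Randomized Hadamard Transform: seeded sign flips, then an orthonormal FWHT."""
    x = _pad_pow2(vec)
    rng = random.Random(seed)  # fixed seed -> identical signs on every run
    signs = [1.0 if rng.getrandbits(1) else -1.0 for _ in x]
    y = _fwht([s * v for s, v in zip(signs, x)])
    scale = 1.0 / math.sqrt(len(y))  # makes the transform norm-preserving
    return [v * scale for v in y]


def _pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def _cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


@functools.lru_cache(maxsize=None)
def lloyd_max_gaussian(bits, iters=200):
    """Lloyd-Max levels and thresholds for a standard normal source (no data needed)."""
    k = 1 << bits
    levels = [-3.0 + 6.0 * (i + 0.5) / k for i in range(k)]  # uniform starting guess
    for _ in range(iters):
        # Nearest-neighbor condition: thresholds sit midway between adjacent levels.
        mids = [(levels[i] + levels[i + 1]) / 2 for i in range(k - 1)]
        edges = [-math.inf] + mids + [math.inf]
        # Centroid condition: each level is the conditional mean of N(0,1) on its cell.
        levels = [(_pdf(a) - _pdf(b)) / (_cdf(b) - _cdf(a))
                  for a, b in zip(edges, edges[1:])]
    bounds = [(levels[i] + levels[i + 1]) / 2 for i in range(k - 1)]
    return tuple(levels), tuple(bounds)


def quantize(vec, seed, bits=4):
    """Unit-normalize, rotate, and map each coordinate to a bits-wide level index."""
    norm = math.sqrt(sum(v * v for v in vec))
    rotated = rht([v / norm for v in vec], seed)
    _, bounds = lloyd_max_gaussian(bits)
    sd = math.sqrt(len(rotated))  # rescale coordinates (assumed ~N(0, 1/d)) to ~N(0, 1)
    return [bisect.bisect_left(bounds, c * sd) for c in rotated]


def dequantize(codes, bits=4):
    """Map codes back to approximate rotated coordinates."""
    levels, _ = lloyd_max_gaussian(bits)
    sd = math.sqrt(len(codes))
    return [levels[c] / sd for c in codes]


def approx_cosine(codes_a, codes_b, bits=4):
    """Cosine estimate computed entirely in the quantized, rotated space."""
    return sum(x * y for x, y in zip(dequantize(codes_a, bits), dequantize(codes_b, bits)))


# Usage: two correlated 1024-dim vectors, quantized with the same seed.
rng = random.Random(7)
a = [rng.gauss(0.0, 1.0) for _ in range(1024)]
b = [x + 0.5 * rng.gauss(0.0, 1.0) for x in a]
ca, cb = quantize(a, seed=42), quantize(b, seed=42)
exact = sum(x * y for x, y in zip(a, b)) / math.sqrt(
    sum(x * x for x in a) * sum(y * y for y in b))
print(round(exact, 3), round(approx_cosine(ca, cb), 3))
```

Fitting the codebook to a distribution instead of to a corpus is what makes this kind of scheme data-oblivious and reproducible; the trade-off is that a fixed codebook cannot adapt when a corpus's rotated coordinates deviate from the assumed Gaussian shape. Both vectors must share the same seed, since scores are only meaningful within one rotation.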
