BLURR: A Boosted Low-Resource Inference for Vision-Language-Action Models

Vision-language-action (VLA) models enable impressive zero-shot manipulation, but their inference stacks are often too heavy for responsive web demos or high-frequency robot control on commodity GPUs. We present BLURR, a lightweight inference wrapper that can be plugged into existing VLA controllers without retraining or changing model checkpoints. Instantiated on the pi-zero VLA controller, BLURR keeps the original observation interfaces and accelerates control by combining an instruction-prefix key-value cache, mixed-precision execution, and a single-step rollout schedule that reduces per st[...]

Research goal: How does the inference throughput (tokens per second) of SMoES-based MoE-VLMs with varying expert counts (e.g., 4, 8, 16) compare to dense VLMs on the MMMU benchmark under fixed FLOPs budgets?

Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 7.5/10.
