
While Multi-Hop Question Answering is a foundational task in Natural Language Processing, the current approaches rely on Large Language Models, which we characterize as computationally prohibitive for local deployment. In this paper, we propose a data-centric approach to Multi-Hop Retrieval, utilizing a 5.38M parameter model with 195.5ms of CPU latency, which occupies only 20.5MB of disk space. On standardized numerical benchmarks, it achieves 0.61 on MRR and 0.56 on Recall@1, outperforming the non-iterative sBERT baseline by factors of 1.72× and 2.75×, respectively. Through the Broken Chain experiment, we demonstrate that such models have the ability to implicitly learn sequential dependencies, previously often observed in larger-scale architectures. We also utilize the Explicit Positional Encoding (EPE) tokens as a way to ground the model’s output in long-sequence environments. Furthermore, we characterize EPE as a structural regularizer, demonstrating its ability to mitigate the Sequence Tax in small recurrent architectures.
