Scaling Down: Multi-Hop Information Retrieval in Resource-Constrained Environments

Staroverov, Nikolay

Preprint

Data sources: ZENODO

Publication: Preprint. Status: Under curation. Language: English. Publisher: Zenodo

doi: 10.5281/zenodo.20526992

Abstract

Multi-Hop Question Answering is a foundational task in Natural Language Processing, but current approaches rely on Large Language Models, which we characterize as computationally prohibitive for local deployment. In this paper, we propose a data-centric approach to Multi-Hop Retrieval built on a 5.38M-parameter model that runs with 195.5 ms of CPU latency and occupies only 20.5 MB of disk space. On standardized numerical benchmarks, it achieves an MRR of 0.61 and a Recall@1 of 0.56, outperforming the non-iterative sBERT baseline by factors of 1.72× and 2.75×, respectively. Through the Broken Chain experiment, we demonstrate that such small models can implicitly learn sequential dependencies, a capability so far observed mostly in larger-scale architectures. We also use Explicit Positional Encoding (EPE) tokens to ground the model’s output in long-sequence environments. Furthermore, we characterize EPE as a structural regularizer and demonstrate its ability to mitigate the Sequence Tax in small recurrent architectures.
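
For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how MRR and Recall@1 are commonly computed. It is not the author's evaluation code: the function names, the toy document identifiers, and the assumption of exactly one relevant document per query are all illustrative, and the abstract does not state the exact protocol used in the paper.

```python
from typing import Hashable, Sequence


def reciprocal_rank(ranked_ids: Sequence[Hashable], relevant_id: Hashable) -> float:
    """Return 1/rank of the relevant item, or 0.0 if it was not retrieved."""
    for position, doc_id in enumerate(ranked_ids, start=1):
        if doc_id == relevant_id:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(
    rankings: Sequence[Sequence[Hashable]], relevant_ids: Sequence[Hashable]
) -> float:
    """Average the reciprocal rank over all queries.

    Assumption (hypothetical setup): one relevant document per query.
    """
    if not rankings or len(rankings) != len(relevant_ids):
        raise ValueError("rankings and relevant_ids must be non-empty and aligned")
    total = sum(reciprocal_rank(r, rel) for r, rel in zip(rankings, relevant_ids))
    return total / len(rankings)


def recall_at_1(
    rankings: Sequence[Sequence[Hashable]], relevant_ids: Sequence[Hashable]
) -> float:
    """Fraction of queries whose top-ranked result is the relevant document."""
    if not rankings or len(rankings) != len(relevant_ids):
        raise ValueError("rankings and relevant_ids must be non-empty and aligned")
    hits = sum(1 for r, rel in zip(rankings, relevant_ids) if r and r[0] == rel)
    return hits / len(rankings)


if __name__ == "__main__":
    # Toy data: two queries, both looking for document "a".
    toy_rankings = [["a", "b"], ["b", "a"]]
    toy_relevant = ["a", "a"]
    print(mean_reciprocal_rank(toy_rankings, toy_relevant))
    print(recall_at_1(toy_rankings, toy_relevant))
```

Under this definition, Recall@1 counts only exact top-rank hits, while MRR also gives partial credit for a relevant document ranked lower, which is why MRR is never smaller than Recall@1 on the same rankings.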
