ZENODO
Preprint . 2026
License: CC BY ND
Data sources: Datacite

Cache-Resident Nibble Lookup Tables for Efficient 1-bit GEMV on Modern Hardware: A Minimal Kernel Approach to Binary Inference Across Cache Hierarchies

Authors: Pirolo, Andrés Sebastián

Abstract

On-device inference of large language models is limited by memory bandwidth and weight storage on every modern processor. We show that three techniques (byte-major weight layout, nibble-split lookup tables sized to fit in L1 cache, and compile-time loop unrolling) combine to yield a substantial speedup over standard binary GEMV, in 47 lines of portable C++ with no external dependencies.

Note on Licensing: The manuscript text is licensed under CC BY-NC-ND 4.0. The associated source code implementations are governed exclusively by the PolyForm Noncommercial License 1.0.0.

ALGORITHM TECHNIQUES and CODE STATEMENT OF PRIOR ART AND LICENSE TERMS (Aligned with PolyForm Noncommercial License 1.0.0)

1. Permitted Noncommercial Use
Use of this work for noncommercial purposes, including academic research, independent study, education, benchmarking, validation, and experimental evaluation, is permitted under the terms of the PolyForm Noncommercial License 1.0.0, provided proper attribution is given.

2. Statement of Prior Art
This document constitutes a formal public disclosure establishing prior art for the Nibble-LUT GEMV algorithm and its associated micro-architectural optimizations, including but not limited to the following algorithmic techniques, whether implemented in whole or in part:
A. Reordering of weight matrices to a byte-major layout to defeat the memory wall and guarantee sequential DRAM prefetching.
B. Decomposition of weight bytes into 4-bit elements for per-nibble lookup table construction, strictly confined within the L1 data cache limits.
C. Elimination of inner-loop runtime branches through aggressive compile-time unrolling.
This disclosure is intended solely to prevent third-party patent claims and does not grant any commercial rights.

3. Scope of the License
All implementations of the methods described herein, whether in software or hardware, are governed exclusively by the PolyForm Noncommercial License 1.0.0. This includes any implementation, in any programming language, and on any programmable or dedicated hardware platform, including CPU, GPU, FPGA, ASIC, or similar architectures.

4. Noncommercial Restriction
Any use of this work that falls outside the definition of noncommercial use, as defined by the PolyForm Noncommercial License 1.0.0, is not permitted under this license.

5. Commercial and Operational Use
Any commercial, operational, institutional, governmental, regulatory, or production deployment of this work constitutes use outside the scope of the PolyForm Noncommercial License 1.0.0 and therefore requires separate authorization from the authors, in accordance with the terms of that license. For the purposes of this document, authors refers to the original author and any officially recognized co-authors or principal contributors, as reflected in the associated public repository and its contribution records.

6. Derivative Works and Functional Equivalence
Reimplementation, translation, refactoring, architectural modification, or claims of functional equivalence do not remove a work from the scope of the PolyForm Noncommercial License 1.0.0. Any such implementation remains subject to the same license terms.

6.a. Code Size Irrelevance (No Minimum Threshold)
For the avoidance of doubt, the applicability of the PolyForm Noncommercial License 1.0.0 is independent of code size, number of lines, percentage of implementation, degree of literal similarity, or partial extraction. There is no minimum threshold of code volume or implementation extent required for a work to fall within the scope of this license. Any use, reproduction, reimplementation, refactoring, selective reuse, architectural translation, or functional incorporation of the disclosed concepts, regardless of size or extent, is subject to the terms of the PolyForm Noncommercial License 1.0.0.

7. Knowledge Contamination and Attribution
Exposure to this work constitutes prior knowledge of the disclosed methods. Subsequent implementations making material use of the disclosed concepts remain subject to attribution and license requirements as defined by the PolyForm Noncommercial License 1.0.0.

8. Intellectual Property Ownership
Nothing in this document shall be construed as transferring ownership of intellectual property. All intellectual property rights remain with the authors, subject only to the permissions explicitly granted under the PolyForm Noncommercial License 1.0.0.

9. No Additional Rights Granted
This document does not grant any rights beyond those expressly provided by the PolyForm Noncommercial License 1.0.0. In the event of any conflict, the terms of the PolyForm Noncommercial License 1.0.0 shall control.

10. Enforcement and Remedies
Any use of this work in violation of the PolyForm Noncommercial License 1.0.0 shall be addressed exclusively through the remedies and enforcement mechanisms provided by that license and by applicable

Keywords

GEMV, binary neural networks, 1-bit quantization
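
Illustrative sketch

The abstract names three techniques (byte-major weight layout, per-nibble lookup tables small enough for L1, and compile-time unrolling) but this record does not include the kernel itself. The following is a minimal sketch of how those ideas could fit together, not the author's 47-line implementation. Everything specific in it is an assumption made for illustration: the function names, the bit convention (set bit = +1, clear bit = -1, bit b of a byte maps to column 8c + b), the float accumulators, the `Unroll` template parameter, and the reading of "byte-major" as storing the bytes of one 8-column group for all rows contiguously.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: 16-entry table for four activations x[0..3].
// Entry p holds the sum over b of (bit b of p set ? +x[b] : -x[b]).
inline std::array<float, 16> build_nibble_lut(const float* x) {
    std::array<float, 16> lut{};
    for (unsigned p = 0; p < 16; ++p) {
        float s = 0.0f;
        for (unsigned b = 0; b < 4; ++b) {
            s += ((p >> b) & 1u) ? x[b] : -x[b];
        }
        lut[p] = s;
    }
    return lut;
}

// Hypothetical helper: pack a row-major matrix of +1/-1 signs into the
// assumed byte-major layout packed[c * rows + r], where bit b of that byte
// is the sign of column 8*c + b in row r. cols must be a multiple of 8.
inline std::vector<std::uint8_t> pack_byte_major(const std::vector<int>& signs,
                                                 std::size_t rows,
                                                 std::size_t cols) {
    std::vector<std::uint8_t> packed((cols / 8) * rows, 0);
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t j = 0; j < cols; ++j) {
            if (signs[r * cols + j] > 0) {
                packed[(j / 8) * rows + r] |=
                    static_cast<std::uint8_t>(1u << (j % 8));
            }
        }
    }
    return packed;
}

// y = W x for W in {-1,+1}^(rows x cols), using the layout assumed above.
template <std::size_t Unroll = 4>
void gemv_1bit_nibble_lut(const std::uint8_t* w, const float* x, float* y,
                          std::size_t rows, std::size_t cols) {
    for (std::size_t r = 0; r < rows; ++r) y[r] = 0.0f;
    for (std::size_t c = 0; c < cols / 8; ++c) {
        // Two 16-float tables (128 bytes total) per group of 8 columns.
        const std::array<float, 16> lo = build_nibble_lut(x + 8 * c);
        const std::array<float, 16> hi = build_nibble_lut(x + 8 * c + 4);
        const std::uint8_t* col = w + c * rows;  // bytes are read sequentially
        std::size_t r = 0;
        for (; r + Unroll <= rows; r += Unroll) {
            // Constant trip count, so the compiler is free to unroll this loop.
            for (std::size_t u = 0; u < Unroll; ++u) {
                const std::uint8_t byte = col[r + u];
                y[r + u] += lo[byte & 0x0F] + hi[byte >> 4];
            }
        }
        for (; r < rows; ++r) {  // remainder rows
            const std::uint8_t byte = col[r];
            y[r] += lo[byte & 0x0F] + hi[byte >> 4];
        }
    }
}
```

A caller would pack the weights once with `pack_byte_major` and then invoke, for example, `gemv_1bit_nibble_lut<8>(w.data(), x.data(), y.data(), rows, cols)` for each input vector. In this sketch each weight byte costs two table lookups and two additions instead of eight multiply-adds, and the weight stream is read front to back; the price is that `y` is revisited once per 8-column group, a trade-off the original kernel may handle differently.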
