Powered by OpenAIRE graph
ZENODO
Preprint, 2026
License: CC BY
Data sources: ZENODO, Datacite
Versions: 2

SSM-Aware Fine-Tuning for Hybrid Mamba-Transformer Models: A Comparative Study on Granite 4.0-H-Micro

Authors: Cody, Ford

Abstract

We present a systematic study of LoRA fine-tuning strategies for IBM Granite 4.0-H-Micro, a 3.2B-parameter hybrid architecture comprising 36 Mamba-2 state space layers and 4 Transformer attention layers. We evaluate four fine-tuning approaches against an unmodified baseline on three domain-specific tasks: document classification (24K examples), schema mapping (15K examples), and structured rule generation (9K examples). Our investigation proceeds in two stages. First, we find that co-training LoRA adapters with unfrozen SSM core parameters (A_log, D, dt_bias), the V3 configuration, yields consistent improvements across all tasks. However, PEFT's adapter-only serialization, combined with a save-ordering issue in our training script, silently discarded the trained SSM values from the saved PEFT artifact. Second, after fixing the persistence pipeline (V4), we estimate the direct contribution of the SSM parameters: an additional 3.6-percentage-point gain on classification (55.8% vs. 52.2%), confirming that the co-training effect and persistent SSM adaptation are complementary mechanisms. Classification benefits most from persistent SSM changes, with a cumulative 37% relative improvement over LoRA-only fine-tuning (V2), whereas the schema mapping and rule generation gains are driven primarily by the co-training effect alone. We are not aware of prior public results that combine LoRA targeting of Mamba projections with training and persisting SSM core parameters on a publicly available hybrid Mamba-Transformer model.
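The co-training setup and the persistence pitfall described in the abstract can be sketched in a few lines. This is a minimal, framework-free illustration under stated assumptions, not the author's training script: plain Python dicts stand in for a PyTorch state dict, the parameter paths (`model.layers.0.mamba.*`) and the `lora_A`/`lora_B` naming are hypothetical, and `adapter_only_checkpoint` only mimics the effect of an adapter-only save rather than calling PEFT. Only the parameter names `A_log`, `D` and `dt_bias` come from the text.

```python
# Minimal sketch (assumptions labeled): plain dicts stand in for a PyTorch
# state dict, and all parameter paths below are hypothetical.
from typing import Dict, List

# Mamba-2 SSM core parameter names listed in the abstract.
SSM_CORE_NAMES = ("A_log", "D", "dt_bias")


def is_ssm_core(name: str) -> bool:
    # Compare the last dotted component exactly, so "D" matches only a
    # parameter literally named D, not any name that merely ends in "D".
    return name.rsplit(".", 1)[-1] in SSM_CORE_NAMES


def is_lora(name: str) -> bool:
    # Hypothetical adapter naming modeled on lora_A / lora_B submodules.
    return ".lora_A." in name or ".lora_B." in name


def trainable_names(param_names: List[str]) -> List[str]:
    """Parameters left trainable: LoRA adapters plus unfrozen SSM core params."""
    return [n for n in param_names if is_lora(n) or is_ssm_core(n)]


def adapter_only_checkpoint(state: Dict[str, float]) -> Dict[str, float]:
    """Mimics an adapter-only save: every non-LoRA tensor is dropped."""
    return {k: v for k, v in state.items() if is_lora(k)}


def adapter_plus_ssm_checkpoint(state: Dict[str, float]) -> Dict[str, float]:
    """Also keeps the SSM core tensors so their trained values persist."""
    return {k: v for k, v in state.items() if is_lora(k) or is_ssm_core(k)}


if __name__ == "__main__":
    # Toy "trained" values for one hypothetical Mamba-2 layer.
    state = {
        "model.layers.0.mamba.A_log": -0.7,
        "model.layers.0.mamba.D": 1.1,
        "model.layers.0.mamba.dt_bias": 0.05,
        "model.layers.0.mamba.in_proj.lora_A.weight": 0.01,
        "model.layers.0.mamba.in_proj.lora_B.weight": 0.02,
        "model.layers.0.mamba.in_proj.weight": 0.3,  # frozen base weight
    }
    print(trainable_names(list(state)))
    # Keys that an adapter-only save silently loses:
    lost = set(adapter_plus_ssm_checkpoint(state)) - set(adapter_only_checkpoint(state))
    print(sorted(lost))
    # -> ['model.layers.0.mamba.A_log', 'model.layers.0.mamba.D', 'model.layers.0.mamba.dt_bias']
```

In an actual PyTorch run, the same name predicate could drive `param.requires_grad_(...)` over `model.named_parameters()`, and the selected SSM tensors would need to be written out explicitly (for example with `torch.save`) once training has finished, since an adapter-only save keeps only the adapter weights.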

Keywords

SSM, Mamba-2, state space model, Artificial Intelligence, parameter-efficient tuning, IBM Granite, Hybrid Architecture, LoRA, fine-tuning
