Orchid 1.0: A Reproducible Recipe for Aligned Ternary-Weight Language Models on Consumer Hardware

Romero Chisco, Michelangelo

Data sources: ZENODO

Publication type: Preprint (under curation). Language: English. Publisher: Zenodo.

doi: 10.5281/zenodo.20452163

Abstract

We present Orchid 1.0, a 2-billion-parameter ternary-weight language model aligned through a three-stage LoRA pipeline (reasoning SFT, identity-and-knowledge SFT, and Odds-Ratio Preference Optimization) on a single RTX 3050 laptop with 4 GB of VRAM. We document each design decision, memory-management technique, and recovery procedure that made training feasible on this hardware. We then describe and resolve the ternary merge problem, the destructive interaction between LoRA deltas and ternary weight quantization. This problem motivated the construction of ternative.cpp, a purpose-built C++ inference engine that loads a base I2_S GGUF and a separate LoRA adapter GGUF and merges them at full precision at load time. The engine supports CPU (AVX2, OpenMP) and GPU (CUDA 12.6) execution and exposes an OpenAI-compatible HTTP server. We evaluate Orchid 1.0 on four standard benchmarks: ARC-Challenge 56.0% (+6.1 pp over the BitNet base), HellaSwag 52.0%, WinoGrande 74.0%, and MMLU 38.6%. All artifacts are openly available:

- Model: https://huggingface.co/MicheRomChis/orchid-1.0
- Inference engine: https://github.com/michelangeloromerochisco/ternative
- Training code: https://github.com/michelangeloromerochisco/orchid-1.0
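The ternary merge problem mentioned in the abstract can be illustrated with a toy example. The sketch below is not taken from ternative.cpp or the Orchid training code. It assumes a simplified quantizer that snaps each weight to the nearest of {-s, 0, +s} for a fixed per-tensor scale s, and it uses the standard LoRA update (alpha / rank) * B @ A. All names (`to_ternary`, `from_ternary`, `lora_delta`) and all sizes are hypothetical. Under these assumptions, a LoRA delta whose entries are smaller than s / 2 vanishes entirely when the merged matrix is re-quantized, whereas a full-precision merge keeps it.

```python
import random

def matmul(a, b):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def to_ternary(w, scale):
    """Snap each weight to the nearest code in {-1, 0, +1} (assumed fixed scale)."""
    return [[max(-1, min(1, round(x / scale))) for x in row] for row in w]

def from_ternary(codes, scale):
    """Dequantize ternary codes back to real-valued weights."""
    return [[scale * c for c in row] for row in codes]

def lora_delta(b, a, alpha, rank):
    """Standard LoRA update: (alpha / rank) * B @ A."""
    s = alpha / rank
    return [[s * x for x in row] for row in matmul(b, a)]

def add(x, y):
    """Element-wise sum of two equally shaped matrices."""
    return [[p + q for p, q in zip(rx, ry)] for rx, ry in zip(x, y)]

rng = random.Random(0)
d_out, d_in, rank, alpha, scale = 6, 8, 2, 2.0, 0.05  # hypothetical toy sizes

# Toy ternary base layer.
base_codes = [[rng.choice((-1, 0, 1)) for _ in range(d_in)] for _ in range(d_out)]
base = from_ternary(base_codes, scale)

# Small adapter: every delta entry is at most 0.02 in magnitude, below scale / 2.
lora_b = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(d_out)]
lora_a = [[rng.uniform(-0.1, 0.1) for _ in range(d_in)] for _ in range(rank)]
delta = lora_delta(lora_b, lora_a, alpha, rank)

# Option 1: merge, then re-quantize to ternary. The delta is rounded away.
merged_fp = add(base, delta)
requant_codes = to_ternary(merged_fp, scale)

# Option 2: keep the merged weights at full precision. The delta survives.
print("re-quantized codes equal base codes:", requant_codes == base_codes)
print("full-precision merge differs from base:", merged_fp != base)
```

In this toy setting both checks print `True`: re-quantization returns exactly the base codes, so the adapter has no effect, while the full-precision merge retains it. Real adapters can have larger deltas, and the I2_S format may use a different scaling scheme, so this shows only the mechanism, not its magnitude in Orchid 1.0.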
