
Title: SF-LM: A Neuro-Symbolic Language Model with Proto-Language Abstractions for Efficient and Faithful Text Generation

Authors: Usai, Luigi

Description:
This work introduces the Semantic-First Language Model (SF-LM), a novel neuro-symbolic architecture designed to address the prohibitive computational costs and lack of interpretability in current monolithic Large Language Models (LLMs). Inspired by cognitive models of language processing and the "telegraphic" stage of child language acquisition, SF-LM decouples semantic understanding from syntactic generation.

The model operates in a two-stage pipeline:
1. A Core Semantic Parser (Msem) first translates input text into a structured, explicit Intermediate Semantic Representation (ISR), or "Proto-Language." This symbolic representation captures the core meaning of a sentence using thematic roles (e.g., agent:cat action:lick patient:ice-cream mod:fluffy).
2. A lightweight Syntactic Realizer (Msyn) then converts this ISR into a grammatically fluent and complete sentence.

We present the formal definition of the ISR using a BNF grammar and provide empirical evidence from summarization and text simplification tasks. Our results demonstrate that SF-LM achieves a superior trade-off between performance, efficiency, and faithfulness compared to a monolithic T5-Base baseline.

Key Findings:
- Efficiency: SF-LM reduces model parameters and inference FLOPs by nearly 45%.
- Faithfulness: The modular design, constrained by the explicit ISR, significantly reduces hallucinations and improves factual consistency, achieving a human-evaluated faithfulness score of 4.6/5.0 compared to the baseline's 3.9/5.0.
- Performance: The model maintains comparable quality on standard NLP metrics such as ROUGE-L and BLEU, with only a marginal drop in performance.

This research demonstrates that a modular, semantic-first approach offers a promising path toward more efficient, controllable, and interpretable language models.
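To make the two-stage pipeline concrete, the sketch below illustrates the ISR format from the abstract with toy rule-based stand-ins. The function names `parse_isr` and `realize` are hypothetical; in SF-LM itself, Msem and Msyn are learned neural modules, not hand-written rules.

```python
# Toy illustration of SF-LM's two-stage pipeline (hypothetical names;
# the actual Msem and Msyn are neural models, not rule-based functions).

def parse_isr(isr: str) -> dict:
    """Parse a flat ISR string of role:value pairs into a dict.
    Example ISR form from the abstract:
    'agent:cat action:lick patient:ice-cream mod:fluffy'."""
    roles = {}
    for token in isr.split():
        role, _, value = token.partition(":")
        roles[role] = value
    return roles

def realize(roles: dict) -> str:
    """Minimal stand-in for the Syntactic Realizer: orders thematic
    roles into a simple English clause."""
    mod = roles.get("mod")
    agent = f"the {mod} {roles['agent']}" if mod else f"the {roles['agent']}"
    verb = roles["action"] + "s"  # naive third-person-singular inflection
    patient = f"the {roles['patient'].replace('-', ' ')}"
    return f"{agent.capitalize()} {verb} {patient}."

isr = "agent:cat action:lick patient:ice-cream mod:fluffy"
print(realize(parse_isr(isr)))  # -> "The fluffy cat licks the ice cream."
```

Because the ISR is an explicit symbolic intermediate, the realizer can only verbalize content present in the parse, which is the mechanism the abstract credits for the reduction in hallucinations.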
This record may contain the research paper, the source code for the SF-LM model, and the WikiProto-1M dataset created for training and evaluation.

Keywords: Natural Language Processing, Language Models, Neuro-Symbolic AI, Semantic Parsing, Computational Efficiency, Interpretability, Text Generation, Faithfulness, Proto-Language, T5, Language Generation
| Indicator | Description | Value |
| selected citations | Citations derived from selected sources; an alternative to the "influence" indicator, which reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 0 |
| popularity | Reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Average |
| influence | Reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Average |
| impulse | Reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Average |
