Deriving Coding-Specific Sub-Models from LLMs using Resource-Efficient Pruning


Puccioni, Laura; Farshin, Alireza; Scazzariello, Mariano; Wang, Changjie; Chiesa, Marco; Kostic, Dejan

arXiv.org e-Print Archive

Preprint . 2025

Data sources: arXiv.org e-Print Archive

https://doi.org/10.1109/llm4co...

Article . 2025 . Peer-reviewed

License: STM Policy #29

Data sources: Crossref

https://dx.doi.org/10.48550/ar...

Article . 2025

License: CC BY

Data sources: Datacite

Publication: Article, Preprint
Published: 03 May 2025
Embargo end date: 01 Jan 2025
Publisher: IEEE
Journal: 2025 IEEE/ACM International Workshop on Large Language Models for Code (LLM4Code)
Funded by: unidentified

Authors: Puccioni, Laura; Farshin, Alireza; Scazzariello, Mariano; Wang, Changjie; Chiesa, Marco; Kostic, Dejan

doi: 10.1109/llm4code66737.2025.00028, 10.48550/arxiv.2501.05248

arXiv: 2501.05248

Abstract

Large Language Models (LLMs) have demonstrated their exceptional performance in various complex code generation tasks. However, their broader adoption is limited by significant computational demands and high resource requirements, particularly memory and processing power. To mitigate such requirements, model pruning techniques are used to create more compact models with significantly fewer parameters. However, current approaches do not focus on the efficient extraction of programming-language-specific sub-models. In this work, we explore the idea of efficiently deriving coding-specific sub-models through unstructured pruning (i.e., Wanda). We investigate the impact of different domain-specific calibration datasets on pruning outcomes across three distinct domains and extend our analysis to extracting four language-specific sub-models: Python, Java, C++, and JavaScript. We are the first to efficiently extract programming-language-specific sub-models using appropriate calibration datasets while maintaining acceptable accuracy w.r.t. full models. We are also the first to provide analytical evidence that domain-specific tasks activate distinct regions within LLMs, supporting the creation of specialized sub-models through unstructured pruning. We believe that this work has significant potential to enhance LLM accessibility for coding by reducing computational requirements to enable local execution on consumer-grade hardware, and supporting faster inference times critical for real-time development feedback.
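As a rough illustration of the calibration-driven unstructured pruning the abstract refers to, the sketch below applies the Wanda criterion as it is commonly described: each weight is scored by its magnitude times the L2 norm of the matching input feature over a calibration set, and the lowest-scoring weights in each output row are zeroed. This is not the authors' code. The function name `wanda_prune`, the toy weight matrix, and the two calibration sets are hypothetical and chosen only to show how different calibration data can keep different weights.

```python
import math


def wanda_prune(weights, calib_inputs, sparsity):
    """Zero the lowest-scoring weights in each output row (illustrative only).

    weights:      list of rows, shape (out_features, in_features)
    calib_inputs: calibration activations, shape (n_samples, in_features)
    sparsity:     fraction of weights removed per row, in [0, 1)
    """
    n_in = len(weights[0])
    # L2 norm of every input feature across the calibration samples.
    feat_norm = [
        math.sqrt(sum(sample[j] ** 2 for sample in calib_inputs))
        for j in range(n_in)
    ]
    n_prune = int(n_in * sparsity)
    pruned = []
    for row in weights:
        # Score = |weight| * norm of the input feature it multiplies.
        scores = [abs(w) * feat_norm[j] for j, w in enumerate(row)]
        # Indices of the n_prune smallest scores within this row.
        drop = set(sorted(range(n_in), key=lambda j: scores[j])[:n_prune])
        pruned.append([0.0 if j in drop else w for j, w in enumerate(row)])
    return pruned


# Hypothetical 2x4 layer and two made-up calibration sets that
# excite different input features.
weights = [[0.9, -0.1, 0.5, 0.2],
           [0.3, 0.8, -0.05, 0.6]]
calib_a = [[1.0, 0.0, 2.0, 0.1], [1.0, 0.0, 2.0, 0.1]]
calib_b = [[0.0, 3.0, 0.1, 1.0], [0.0, 3.0, 0.1, 1.0]]

sub_a = wanda_prune(weights, calib_a, 0.5)
sub_b = wanda_prune(weights, calib_b, 0.5)
print(sub_a)
print(sub_b)
```

With these toy inputs the two calibration sets retain disjoint columns of the same layer, which is the intuition behind using a domain-specific calibration dataset; real models and datasets will behave far less cleanly.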

Related Organizations

Royal Institute of Technology
Sweden
RISE Research Institutes of Sweden
Sweden

Keywords

I.2.2, D.1.2, Software Engineering (cs.SE), FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Software Engineering, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, I.2.6, I.2.2; I.2.6; D.1.2, Machine Learning (cs.LG)

Impact by BIP!

selected citations: 0. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.

Green

Funded by

[no funder available] | unidentified

Related to Research communities

UArctic