
### Research Overview

This repository contains the code and evaluation framework for benchmarking 20 Small Language Models (SLMs) on 5 code generation benchmarks. This work extends the BigCode Evaluation Harness with automated benchmarking capabilities, VRAM monitoring, and performance tracking.

### Key Contributions

- **Automated Benchmarking:** Custom `benchmarking.py` script for systematic evaluation of multiple models
- **Comprehensive Evaluation:** 20 SLMs evaluated across 5 benchmark suites
- **Performance Monitoring:** Real-time VRAM usage tracking and execution time measurement (see the sketch after this list)
- **Reproducible Results:** Complete configuration and results for all experiments
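The sketch below illustrates one way such performance monitoring can be wired around a generation call, assuming a CUDA-capable PyTorch setup. The wrapper name `run_with_monitoring` and its interface are illustrative assumptions, not the repository's actual `benchmarking.py` API.

```python
# Minimal sketch of per-run VRAM and wall-clock tracking (assumed interface,
# not the repository's actual benchmarking.py API). Requires PyTorch with CUDA.
import time
import torch

def run_with_monitoring(generate_fn, *args, **kwargs):
    """Run a generation callable; return its result, peak VRAM (MiB),
    and elapsed wall-clock time (s)."""
    torch.cuda.reset_peak_memory_stats()
    torch.cuda.synchronize()  # ensure prior GPU work doesn't pollute timing
    start = time.perf_counter()

    result = generate_fn(*args, **kwargs)

    torch.cuda.synchronize()  # wait for all queued kernels before stopping the clock
    elapsed_s = time.perf_counter() - start
    peak_vram_mib = torch.cuda.max_memory_allocated() / (1024 ** 2)
    return result, peak_vram_mib, elapsed_s
```

For example, wrapping a model's `generate` call as `run_with_monitoring(model.generate, **inputs)` would yield the completion alongside its peak VRAM footprint and runtime, which can then be logged per model and per benchmark.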
