
Neural Machine Translation (NMT) systems have achieved strong performance for high-resource languages; however, many African languages remain underrepresented due to the scarcity of high-quality parallel data. Kimbundu, a Bantu language spoken in Angola, is one such low-resource language with limited machine translation support. In this work, we introduce a manually curated and human-reviewed Kimbundu–Portuguese parallel corpus and investigate its use for fine-tuning multilingual NMT models. Starting from the NLLB-200 (600M) model, we apply parameter-efficient fine-tuning with QLoRA to adapt it to the Kimbundu→Portuguese direction. Experimental results on a professionally reviewed test set of 1,000 sentence pairs demonstrate substantial improvements over strong multilingual baselines, with gains of +10.1 BLEU and +13.2 chrF. Furthermore, semantic metrics (COMET, AfriCOMET, and BERTScore) show consistent improvements, while qualitative analysis indicates better handling of Kimbundu's complex morphology. These findings suggest that high-quality human-reviewed data, combined with efficient fine-tuning, is a viable path to bridging the digital divide for low-resource African languages.
Fine-tuning, Multilingual language models, Low-resource languages, Machine translation
