To automatically map source code entities to architectural modules with Naive Bayes

Publication: Article, Preprint. 01 Jan 2022. Embargo end date: 01 Jan 2021. Sweden. English.

Publisher: Elsevier BV

Journal: Journal of Systems and Software, volume 183, page 111095 (ISSN: 0164-1212)

Authors: Tobias Olsson; Morgan Ericsson; Anna Wingkvist;

doi: 10.1016/j.jss.2021.111095 , 10.48550/arxiv.2109.09525

arXiv: 2109.09525


Abstract

Background: Mapping a source code entity onto an architectural module is to a large degree a manual task. Automating this process could increase the use of static architecture conformance checking methods, such as reflexion modeling, in industry. Current techniques rely on user parameterization and a highly cohesive design. A machine learning approach would potentially require fewer parameters and make better use of the available information to aid in automatic mapping.

Aim: We investigate how a classifier can be trained to map source code entities to architectural modules automatically. This classifier is trained with semantic and syntactic dependency information extracted from the source code and from architecture descriptions. The classifier is implemented using multinomial naive Bayes and evaluated.

Method: We perform experiments and compare the classifier with three state-of-the-art mapping functions in eight open-source Java systems with known ground-truth mappings.

Results: We find that the classifier outperforms the state-of-the-art in all cases and that it provides a useful baseline for further research in the area of semi-automatic incremental clustering.

Conclusions: We conclude that machine learning is a useful approach that performs better and with less need for parameterization compared to other approaches. Future work includes investigating problematic mappings and a more diverse set of subject systems.

Accepted for Publishing in The Journal of Systems and Software

Country

Sweden

Keywords

FOS: Computer and information sciences, Software Engineering (Programvaruteknik), Software architecture, Incremental clustering, Orphan adoption, Naive Bayes, Machine learning, Software Engineering (cs.SE), Computer Science - Software Engineering, D.2.11; I.5.3, 004

Related research products (3)

- Replication2, software on GitHub (IsRelatedTo)
- s4rdm3x, software on GitHub (IsRelatedTo)
- SAEroConRepo, software on GitHub (IsRelatedTo)

Impact by BIP!

- selected citations: 13. These citations are derived from selected sources. This is an alternative to the "Influence" indicator.
- popularity: Top 10%. Reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- influence: Top 10%. Reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- impulse: Top 10%. Reflects the initial momentum of an article directly after its publication, based on the underlying citation network.