
Enhancing the modular structure of existing systems has attracted substantial research interest, focusing on two main methods: (1) software modularization and (2) identifying design issues (e.g., smells) as refactoring opportunities. However, re-modularization solutions often require extensive modifications to the original modules, and the design issues identified are generally too coarse to guide refactoring strategies. Combining the above two methods, this paper introduces a novel concept, PairSmell, which exploits modularization to pinpoint design issues necessitating refactoring. We concentrate on a granular but fundamental aspect of modularity principles -- modular relation (MR), i.e., whether a pair of entities are separated or collocated. The main assumption is that, if the actual MR of a pair violates its `apt MR', i.e., an MR agreed on by multiple modularization tools (as raters), it can be deemed likely a flawed architectural decision that necessitates further examination. To quantify and evaluate PairSmell, we conduct an empirical study on 20 C/C++ and Java projects, using 4 established modularization tools to identify two forms of PairSmell: inapt separated pairs InSep and inapt collocated pairs InCol. Our study on 260,003 instances reveals that their architectural impacts are substantial: (1) on average, 14.60% and 20.44% of software entities are involved in InSep and InCol MRs respectively; (2) InSep pairs are associated with 190% more co-changes than properly separated pairs, while InCol pairs are associated with 35% fewer co-changes than properly collocated pairs, both indicating a successful identification of modular structures detrimental to software quality; and (3) both forms of PairSmell persist across software evolution.
Accepted by 2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE'25)
Software Engineering (cs.SE), FOS: Computer and information sciences, Computer Science - Software Engineering, D.2
Software Engineering (cs.SE), FOS: Computer and information sciences, Computer Science - Software Engineering, D.2
| selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 0 | |
| popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Average | |
| influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Average | |
| impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Average |
