
Context: The vast majority of software engineering research is independent of the application domain: the use of techniques and tools is reported without any domain context. This has not always been so: early in the computing era, the research focus was frequently specific to an application domain (for example, scientific and data processing).

Objective: We believe determining the research context is often important. Therefore we propose a code-based approach to identify the application domain of a software system, via its lexicon. We compare its precision with the plain textual description attached to the same system.

Method: Using a sample of 50 Java projects, we obtained i) the description of each project (e.g., its ReadMe file), ii) the lexicon extracted from its source code, and iii) a list of its main topics extracted with the Latent Dirichlet Allocation (LDA) information retrieval technique. We assigned a random subset of these data items to different researchers (i.e., 'experts'), and asked them to assign each item to one or more application domains. We then evaluated the precision and accuracy of the three techniques.

Results: Using the agreement levels between experts, we observed that the 'baseline' dataset (i.e., the ReadMe files) obtained the highest average agreement between experts, but we also observed that the three techniques had the same mode and median agreement levels. Additionally, in the cases where no agreement was reached for the baseline dataset, the two other techniques provided sufficient additional support.

Conclusions: We conclude that using the corpora or the topics from source code can be an adequate substitute for the plain description when assigning a software system to an application domain.
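As an illustration of what extracting a lexicon from source code can involve, the following is a minimal sketch and not the authors' actual pipeline, which the abstract does not describe. It assumes that a lexicon is a bag of lowercase terms obtained by splitting Java identifiers on camelCase and underscore boundaries and discarding reserved words. The function names `split_identifier` and `extract_lexicon`, the `min_length` parameter, and the keyword list are all hypothetical choices made for this example.

```python
import re
from collections import Counter

# Java reserved words and literals, used here as a stop list.
# Assumption: the filtering used in the study is not stated in the abstract.
JAVA_KEYWORDS = {
    "abstract", "assert", "boolean", "break", "byte", "case", "catch", "char",
    "class", "const", "continue", "default", "do", "double", "else", "enum",
    "extends", "final", "finally", "float", "for", "goto", "if", "implements",
    "import", "instanceof", "int", "interface", "long", "native", "new",
    "package", "private", "protected", "public", "return", "short", "static",
    "strictfp", "super", "switch", "synchronized", "this", "throw", "throws",
    "transient", "try", "void", "volatile", "while", "true", "false", "null",
}

# Matches candidate identifiers (and keywords) in raw Java text.
IDENTIFIER = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")

# Matches either an acronym run (e.g. "HTTP") or a capitalised/lowercase word.
WORD = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+")


def split_identifier(identifier):
    """Split a camelCase, PascalCase, or snake_case identifier into lowercase terms."""
    terms = []
    for part in identifier.split("_"):
        terms.extend(WORD.findall(part))
    return [term.lower() for term in terms]


def extract_lexicon(source, min_length=2):
    """Count the terms found in the identifiers of a Java source string."""
    lexicon = Counter()
    for identifier in IDENTIFIER.findall(source):
        for term in split_identifier(identifier):
            if len(term) >= min_length and term not in JAVA_KEYWORDS:
                lexicon[term] += 1
    return lexicon


if __name__ == "__main__":
    sample = (
        "public class InvoicePrinter { private int page_count; "
        "void printInvoice(Invoice invoice) { } }"
    )
    print(extract_lexicon(sample).most_common(3))
```

A term-frequency table of this kind could then be shown to a human rater directly, or passed to a topic model such as LDA. This sketch does not strip comments or string literals, so words appearing there are counted as well; whether that is desirable depends on how the lexicon is defined.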
source code, latent Dirichlet allocation, application domains, expert judgement, Java
