Publication: Article, Part of book or chapter of book, Conference object
15 Apr 2020
Publisher: ACM
Journal: Proceedings of the Evaluation and Assessment in Software Engineering

Authors: Capiluppi, A; Ajienka, N; Ali, N; Arzoky, M; Counsell, S; Destefanis, G; Miron, A; +5 Authors

doi: 10.1145/3383219.3383231

Using the Lexicon from Source Code to Determine Application Domain

Abstract

Context: The vast majority of software engineering research is independent of the application domain: the use of techniques and tools is reported without any domain context. This has not always been so: early in the computing era, the research focus was frequently specific to an application domain (for example, scientific and data processing).

Objective: We believe determining the research context is often important. Therefore we propose a code-based approach to identify the application domain of a software system, via its lexicon. We compare its precision with that of the plain textual description attached to the same system.

Method: Using a sample of 50 Java projects, we obtained i) the description of each project (e.g., its ReadMe file), ii) the lexicon extracted from its source code, and iii) a list of its main topics extracted with the Latent Dirichlet Allocation (LDA) information retrieval technique. We assigned a random subset of these data items to different researchers (i.e., ‘experts’), and asked them to assign each item to one or more application domains. We then evaluated the precision and accuracy of the three techniques.

Results: Using the agreement levels between experts, we observed that the ‘baseline’ dataset (i.e., the ReadMe files) obtained the highest average agreement between experts, but also that the three techniques had the same mode and median agreement levels. Additionally, in the cases where no agreement was reached for the baseline dataset, the two other techniques provided sufficient additional support.

Conclusions: We conclude that using the corpora or the topics from source code can be an adequate substitute for the plain description when assigning a software system to an application domain.
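
The abstract does not spell out how the lexicon is extracted from source code. As a rough illustration only, the sketch below shows one common way to build such a lexicon in Python: collect identifiers, split them on camelCase and underscores, and count the resulting terms. The names `extract_lexicon`, `split_identifier` and `JAVA_STOP_WORDS`, the stop list itself, and the minimum term length are assumptions made for this example; they are not taken from the paper.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stop list (Java keywords and very common
# type names). A real study would likely use a fuller list.
JAVA_STOP_WORDS = {
    "public", "private", "protected", "class", "interface", "static", "final",
    "void", "int", "long", "double", "boolean", "string", "return", "new",
    "null", "this", "import", "package", "extends", "implements", "if", "else",
    "for", "while", "try", "catch", "throw", "throws", "true", "false",
}

# Candidate identifiers: letters, digits and underscores, not starting with a digit.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

# Parts of a camelCase / PascalCase name; an acronym such as "HTTP" stays whole.
CAMEL_PART = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+")


def split_identifier(identifier):
    """Split an identifier on underscores and case changes, lower-casing each part."""
    terms = []
    for chunk in identifier.split("_"):
        terms.extend(part.lower() for part in CAMEL_PART.findall(chunk))
    return terms


def extract_lexicon(java_source, min_length=3):
    """Count the terms found in a Java source string.

    Note: this simple scan also picks up words inside comments and string
    literals; whether those belong in the lexicon is a design choice.
    """
    counts = Counter()
    for identifier in IDENTIFIER.findall(java_source):
        for term in split_identifier(identifier):
            if len(term) >= min_length and term not in JAVA_STOP_WORDS:
                counts[term] += 1
    return counts


sample = """
public class InvoiceParser {
    private HTTPClient httpClient;
    public Invoice parseInvoice(String raw_text) { return null; }
}
"""
lexicon = extract_lexicon(sample)
print(lexicon.most_common(3))
```

The resulting term counts could then be handed to an expert directly, or used as the input documents for a topic model such as LDA.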

Related Organizations

Brunel University London, United Kingdom
Nottingham Trent University, United Kingdom

Keywords

source code, latent Dirichlet allocation, application domains, expert judgement, Java

Related research (10 of 50 research products shown)

- Cross-validation based K nearest neighbor imputation for software effort prediction (2008, IsPartOf)
- jeesite software on GitHub (IsRelatedTo)
- graphql-java software on GitHub (IsRelatedTo)
- weixin-java-tools software on GitHub (IsRelatedTo)
- ExpectAnim software on GitHub (IsRelatedTo)
- simplify software on GitHub (IsRelatedTo)
- jna software on GitHub (IsRelatedTo)
- halo software on GitHub (IsRelatedTo)
- swagger-core software on GitHub (IsRelatedTo)
- librec software on GitHub (IsRelatedTo)

Impact by BIP!

- Selected citations: 3. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- Influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.

Green

Fields of Science

engineering and technology

electrical engineering, electronic engineering, information engineering
