Towards a systematic approach to manual annotation of code smells - C# Dataset of Long Method and Large Class code smells

Nikola Luburić; Simona Prokić; Katarina-Glorija Grujić; Jelena Slivka; Aleksandar Kovačević; Goran Sladić; Dragan Vidaković

ZENODO

Dataset . 2022

License: CC BY

Data sources: Datacite

Research data . Dataset . 05 May 2022 . English . Publisher: Zenodo

doi: 10.5281/zenodo.6520055, 10.5281/zenodo.6520056

Abstract

This dataset includes open-source projects written in the C# programming language, annotated for the presence of the Long Method and Large Class (God Class) code smells. Each instance was manually annotated by at least two annotators. We explain our motivation and methodology for creating this dataset in our preprint: Luburić, N., Prokić, S., Grujić, K.G., Slivka, J., Kovačević, A., Sladić, G. and Vidaković, D., 2021. Towards a systematic approach to manual annotation of code smells.

The dataset contains two Excel datasheets:

- DataSet_Large Class.xlsx: C# classes annotated for Large Class code smell severity.
- DataSet_Long Method.xlsx: C# methods annotated for Long Method code smell severity.

The columns in these datasheets are:

- Code Snippet ID: the full name of the code snippet. For classes, this is the package/namespace name followed by the class name. The full name of an inner class also contains the names of any outer classes (e.g., namespace.subnamespace.outerclass.innerclass). For methods, this is the full name of the class followed by the method's signature (e.g., namespace.class.method(param1Type, param2Type)).
- Link: the GitHub link to the code snippet, including the commit and the start and end LOC.
- Code Smell: the code smell for which the code snippet is examined (Large Class or Long Method).
- Project Link: the link to the version of the code repository that was annotated.
- Metrics: a list of metrics for the code snippet, calculated by our platform. Our dataset provides 25 class-level metrics for Large Class detection and 18 method-level metrics for Long Method detection. The list of metrics and their definitions is available here.
- Final annotation: a single severity score calculated by a majority vote.
- Annotators: the severity score assigned by each annotator (1, 2, or 3).

To guide their reasoning about the presence and severity of a code smell, three annotators independently annotated whether each considered heuristic applies to the evaluated code snippet. We provide these results in two separate Excel datasheets:

- LargeClass_Heuristics.xlsx: C# classes annotated for the presence of heuristics relevant to the Large Class code smell.
- LongMethod_Heuristics.xlsx: C# methods annotated for the presence of heuristics relevant to the Long Method code smell.

The columns of these two datasheets are:

- Code Snippet ID: the full name of the code snippet (matching the IDs from DataSet_Large Class.xlsx and DataSet_Long Method.xlsx).
- Annotators: the heuristics labelled by each of the annotators (1, 2, or 3).
- Heuristics: whether the heuristic is applicable to the examined code snippet or not (Section 1.2.4 lists the heuristics relevant for Large Class detection, and Section 1.2.5 lists the heuristics relevant for Long Method detection).
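
The Final annotation and Code Snippet ID columns described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names majority_vote and split_method_id are hypothetical, and returning None when no score has a strict majority is an assumption, since the description does not say how such cases were resolved.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple


def majority_vote(scores: Sequence[Optional[int]]) -> Optional[int]:
    """Return the score chosen by more than half of the annotators.

    Missing annotations (None) are ignored. Returning None when there is
    no strict majority is an assumption of this sketch, not a rule taken
    from the dataset description.
    """
    present = [s for s in scores if s is not None]
    if not present:
        return None
    score, count = Counter(present).most_common(1)[0]
    return score if count * 2 > len(present) else None


def split_method_id(snippet_id: str) -> Tuple[str, str]:
    """Split 'namespace.class.method(param1Type, param2Type)' into
    (full class name, method signature).

    The parameter list is cut off first, so dots inside parameter types
    (e.g. System.String) do not affect the split.
    """
    head, paren, params = snippet_id.partition("(")
    class_name, _, method = head.rpartition(".")
    return class_name, method + paren + params
```

For example, majority_vote([1, 1, 2]) returns 1, while majority_vote([0, 1, 2]) returns None because no score is shared by more than half of the annotators.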

References

- Luburić, N., Prokić, S., Grujić, K.G., Slivka, J., Kovačević, A., Sladić, G. and Vidaković, D., 2021. Towards a systematic approach to manual annotation of code smells.
- Prokić, S., Grujić, K.G., Luburić, N., Slivka, J., Kovačević, A., Vidaković, D. and Sladić, G. Clean Code and Design Educational Tool. In 2021 44th International Convention on Information, Communication and Electronic Technology (MIPRO) (pp. 1601-1606). IEEE.

This research was supported by the Science Fund of the Republic of Serbia, Grant No 6521051, AI-Clean CaDET.

Related Organizations

University of Novi Sad
Serbia

Keywords

clean code, dataset, manual annotation, C#, code smell

Impact by BIP!

- Selected citations: 0. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- Influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.

Usage by UsageCounts

- Views: 28
- Downloads: 14
