This dataset includes open-source projects written in the C# programming language, annotated for the presence of the Long Method and God Class code smells (God Class is referred to as Large Class in the datasheets). Each instance was manually annotated by at least two annotators. We explain our motivation and methodology for creating this dataset in our preprint: Luburić, N., Prokić, S., Grujić, K.G., Slivka, J., Kovačević, A., Sladić, G. and Vidaković, D., 2021. Towards a systematic approach to manual annotation of code smells.

The dataset contains two Excel datasheets:
- DataSet_Large Class.xlsx – C# classes annotated for the Large Class code smell severity.
- DataSet_Long Method.xlsx – C# methods annotated for the Long Method code smell severity.

The columns in these datasheets are:
- Code Snippet ID – the full name of the code snippet. For classes, this is the package/namespace name followed by the class name. The full name of an inner class also contains the names of any outer classes (e.g., namespace.subnamespace.outerclass.innerclass). For methods, this is the full name of the class followed by the method's signature (e.g., namespace.class.method(param1Type, param2Type)).
- Link – the GitHub link to the code snippet, including the commit and the start and end LOC.
- Code Smell – the code smell for which the code snippet is examined (Large Class or Long Method).
- Project Link – the link to the version of the code repository that was annotated.
- Metrics – a list of metrics for the code snippet, calculated by our platform. The dataset provides 25 class-level metrics for Large Class detection and 18 method-level metrics for Long Method detection. The list of metrics and their definitions is available here.
- Final annotation – a single severity score calculated by a majority vote.
- Annotators – the severity score assigned by each annotator (annotator 1, 2, or 3).

To help guide their reasoning when evaluating the presence and severity of a code smell, three annotators independently annotated whether the considered heuristics apply to an evaluated code snippet.
We provide these results in two separate Excel datasheets:
- LargeClass_Heuristics.xlsx – C# classes annotated for the presence of heuristics relevant for the Large Class code smell.
- LongMethod_Heuristics.xlsx – C# methods annotated for the presence of heuristics relevant for the Long Method code smell.

The columns of these two datasheets are:
- Code Snippet ID – the full name of the code snippet (matching the IDs from DataSet_Large Class.xlsx and DataSet_Long Method.xlsx).
- Annotators – heuristics labelled by each of the annotators (1, 2, or 3).
- Heuristics – whether the heuristic is applicable to the examined code snippet or not (Section 1.2.4 lists the heuristics relevant for Large Class detection, and Section 1.2.5 lists the heuristics relevant for Long Method detection).
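As a rough illustration of how the severity sheets and the heuristics sheets relate, the sketch below derives a final score by majority vote and then looks up heuristic labels through the shared Code Snippet ID. It does not read the actual .xlsx files: the in-memory rows, the snippet IDs, the score values, and the heuristic name `heuristic_a` are all hypothetical placeholders. Returning `None` when no score has a strict majority is also an assumption, since the description above does not say how disagreements are resolved.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def majority_vote(scores: List[int]) -> Optional[int]:
    """Return the score chosen by a strict majority of annotators, else None."""
    if not scores:
        return None
    score, count = Counter(scores).most_common(1)[0]
    # Assumption: a "majority" means more than half of the annotators agree.
    return score if count > len(scores) / 2 else None


# Hypothetical rows standing in for DataSet_Long Method.xlsx.
severity_rows = [
    {"id": "Ns.Order.Process(int, string)", "annotator_scores": [2, 2, 1]},
    {"id": "Ns.Order.Validate()", "annotator_scores": [0, 1, 2]},
]

# Hypothetical rows standing in for LongMethod_Heuristics.xlsx:
# (Code Snippet ID, annotator number, heuristic name, heuristic applies?)
heuristic_rows: List[Tuple[str, int, str, bool]] = [
    ("Ns.Order.Process(int, string)", 1, "heuristic_a", True),
    ("Ns.Order.Process(int, string)", 2, "heuristic_a", True),
    ("Ns.Order.Process(int, string)", 3, "heuristic_a", False),
]


def heuristics_by_snippet(
    rows: List[Tuple[str, int, str, bool]]
) -> Dict[str, Dict[str, List[int]]]:
    """Map each snippet ID to the annotators who marked each heuristic as applicable."""
    index: Dict[str, Dict[str, List[int]]] = {}
    for snippet_id, annotator, heuristic, applies in rows:
        if applies:
            index.setdefault(snippet_id, {}).setdefault(heuristic, []).append(annotator)
    return index


index = heuristics_by_snippet(heuristic_rows)

# Join the two sources on the Code Snippet ID.
report = [
    {
        "id": row["id"],
        "final": majority_vote(row["annotator_scores"]),
        "heuristics": index.get(row["id"], {}),
    }
    for row in severity_rows
]
```

Keying the lookup on the Code Snippet ID mirrors the statement that the IDs in the heuristics sheets match those in the severity sheets. A real loader would need to map the actual column layout of the spreadsheets onto these structures.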
References:
Luburić, N., Prokić, S., Grujić, K.G., Slivka, J., Kovačević, A., Sladić, G. and Vidaković, D., 2021. Towards a systematic approach to manual annotation of code smells.
Prokić, S., Grujić, K.G., Luburić, N., Slivka, J., Kovačević, A., Vidaković, D. and Sladić, G., 2021. Clean Code and Design Educational Tool. In 2021 44th International Convention on Information, Communication and Electronic Technology (MIPRO) (pp. 1601-1606). IEEE.
This research was supported by the Science Fund of the Republic of Serbia, Grant No 6521051, AI-Clean CaDET.
clean code, dataset, manual annotation, C#, code smell
| Indicator | Description | Value |
| --- | --- | --- |
| selected citations | These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 0 |
| popularity | This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Average |
| influence | This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Average |
| impulse | This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Average |
| views | | 28 |
| downloads | | 14 |

Views provided by UsageCounts
Downloads provided by UsageCounts