
<script type="text/javascript">
<!--
document.write('<div id="oa_widget"></div>');
document.write('<script type="text/javascript" src="https://www.openaire.eu/index.php?option=com_openaire&view=widget&format=raw&projectId=undefined&type=result"></script>');
-->
</script>
In order to ensure transparency and reproducibility, we have made everything available publicly here, including the Code, Models, Datasets and more. All the files and their functionality used in this paper are explained clearly in the README.md file. Background: Technical Debt (TD) needs to be controlled and tracked during software development. Support to automatically track TD in issue trackers is limited. Aim: We explore the usage of a large dataset of developer-labeled TD issues in combination with cutting-edge Natural Language Processing (NLP) approaches to automatically classify TD in issue trackers. Method: We mine and analyze more than 160GB of textual data from GitHub projects, collecting over 55,600 TD issues and consolidating them into a large dataset (GTD dataset). We use such datasets to train and test Transformer ML models. Then we test the model's generalization ability by testing them on six unseen projects. Finally, we re-train the models including part of the TD issues from the target project to test their adaptability. Results and Conclusion: (i) We create and release the GTD dataset, a comprehensive dataset including TD issues from 6,401 public repositories with various contexts; (ii) By training Transformers using the GTD dataset, we achieve performance metrics that are promising; (iii) Our results are a significant step forward towards supporting the automatic classification of TD in issue trackers, especially when the models are adapted to the context of unseen projects after fine-tuning.
Machine Learning, Issue Trackers, Transformers, Technical Debt, NLP, Dataset
Machine Learning, Issue Trackers, Transformers, Technical Debt, NLP, Dataset
citations This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 0 | |
popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Average | |
influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Average | |
impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Average |
views | 232 | |
downloads | 22 |