Performance evaluation based on data from code reviews
- Publisher: Blekinge Tekniska Högskola, Institutionen för datalogi och datorsystemteknik
software verification | semi-supervised learning | regression analysis | development performance evaluation | Computer Sciences | Datavetenskap (datalogi)
Context. Modern code review tools such as Gerrit have made large amounts of code review data available from open source and commercial projects. Code reviews are used to keep the quality of produced source code under control, but the stored data could also be used to evaluate the software development process.

Objectives. This thesis uses machine learning methods to approximate a review expert’s performance evaluation function. Because the labelled data sample is small, this work uses semi-supervised machine learning methods and measures their influence on performance. We also propose features and analyse their relevance to development performance evaluation.

Methods. This thesis uses Radial Basis Function networks as the regression algorithm for the performance evaluation approximation and Metric Based Regularisation as the semi-supervised learning method. For the analysis of the feature set and goodness of fit, we use statistical tools together with manual analysis.

Results. The semi-supervised learning method achieved accuracy similar to that of the supervised versions of the algorithm. The feature analysis showed a significant negative correlation between the performance evaluation and three of the proposed features. A manual verification of the learned models on unlabelled data achieved 73.68% accuracy.

Conclusions. We did not manage to show that the semi-supervised learning method used performs better than supervised learning methods. The analysis of the feature set suggests that the number of reviewers, the ratio of comments to the change size, and the number of code lines modified in later parts of development are, with high probability, relevant to the performance evaluation task. The achieved model accuracy of close to 75% leads us to believe that, considering the limited size of the labelled data set, our work provides a solid base for further improvements in the performance evaluation approximation.
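To make the regression step named under Methods more concrete, the following is a minimal sketch of a Radial Basis Function network fitted by regularised least squares. It is not the thesis implementation: the function names (`rbf_design_matrix`, `fit_rbf_network`, `predict_rbf_network`), the evenly spaced centres, the fixed width, the small ridge term, and the synthetic sine data are all assumptions made for illustration. The Metric Based Regularisation step and the code review features are not shown.

```python
import numpy as np


def rbf_design_matrix(X, centres, width):
    """Gaussian activation of every centre for every sample (hypothetical helper)."""
    # Squared Euclidean distance between each sample and each centre.
    sq_dist = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * width ** 2))


def fit_rbf_network(X, y, centres, width, ridge=1e-6):
    """Solve for the output weights with ridge-regularised least squares."""
    Phi = rbf_design_matrix(X, centres, width)
    A = Phi.T @ Phi + ridge * np.eye(Phi.shape[1])
    return np.linalg.solve(A, Phi.T @ y)


def predict_rbf_network(X, centres, width, weights):
    """Network output: weighted sum of the Gaussian activations."""
    return rbf_design_matrix(X, centres, width) @ weights


# Synthetic one-dimensional data standing in for (features, expert score) pairs.
rng = np.random.default_rng(0)
X_train = rng.uniform(-3.0, 3.0, size=(40, 1))
y_train = np.sin(X_train[:, 0])

# Assumed design choices: 10 evenly spaced centres and a shared width of 1.0.
centres = np.linspace(-3.0, 3.0, 10).reshape(-1, 1)
width = 1.0

weights = fit_rbf_network(X_train, y_train, centres, width)

X_test = np.linspace(-2.5, 2.5, 20).reshape(-1, 1)
y_pred = predict_rbf_network(X_test, centres, width, weights)
max_error = np.abs(y_pred - np.sin(X_test[:, 0])).max()
print(f"largest absolute error on the test grid: {max_error:.4f}")
```

In a semi-supervised setting, the unlabelled samples would not enter the least-squares fit above directly; they would be used to choose or penalise candidate models, which is the role the abstract assigns to Metric Based Regularisation.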