publication . Preprint . 2020

On the Replicability and Reproducibility of Deep Learning in Software Engineering

Liu, Chao; Gao, Cuiyun; Xia, Xin; Lo, David; Grundy, John; Yang, Xiaohu
Open Access English
  • Published: 25 Jun 2020
Abstract
Deep learning (DL) techniques have gained significant popularity among software engineering (SE) researchers in recent years, because they can often solve SE challenges without enormous manual feature engineering effort or complex domain knowledge. Although many DL studies have reported substantially better effectiveness than other state-of-the-art models, they often ignore two factors: (1) replicability - whether the reported experimental results can be approximately reproduced with high probability using the same DL model and the same data; and (2) reproducibility - whether the reported experimental findings can be reproduced by new experiments ...
Subjects
free text keywords: Computer Science - Software Engineering, Computer Science - Machine Learning