
Software testing and debugging are standard practices of software quality assurance because they enable the identification and correction of failures. In that context, benchmarks have been used as groups of programs that support the comparison of different techniques according to pre-established parameters. However, the reasons that lead researchers to propose novel benchmarks are not fully understood. This article reports the investigation, identification, classification, and externalization of the state of the art on the proposition of benchmarks in the software testing and debugging domains. The study was carried out using systematic mapping procedures, following guidelines widely adopted in the software engineering literature. The search identified 1674 studies, of which 25 were selected for analysis. A list of benchmarks is provided and descriptively mapped according to their characteristics, the motivations for their creation, and their scope of use. The lack of data to support the comparison between available and novel software testing and debugging techniques is the main motivation for the proposition of benchmarks. Advancements in the standardization and prescription of benchmark structure and composition are still required. Establishing such a standard could foster benchmark reuse, thereby saving time and effort in the engineering of benchmarks for software testing and debugging.
QA76.75-76.765, benchmark, Computer software, testing, debugging, software engineering
