
doi: 10.1109/dsn.2015.30
Research and practice show that the effectiveness of vulnerability detection tools depends on the concrete use scenario. Benchmarking can be used for selecting the most appropriate tool, helping assessing and comparing alternative solutions, but its effectiveness largely depends on the adequacy of the metrics. This paper studies the problem of selecting the metrics to be used in a benchmark for software vulnerability detection tools. First, a large set of metrics is gathered and analyzed according to the characteristics of a good metric for the vulnerability detection domain. Afterwards, the metrics are analyzed in the context of specific vulnerability detection scenarios to understand their effectiveness and to select the most adequate one for each scenario. Finally, an MCDA algorithm together with experts' judgment is applied to validate the conclusions. Results show that although some of the metrics traditionally used like precision and recall are adequate in some scenarios, others require alternative metrics that are seldom used in the benchmarking area.
| selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 35 | |
| popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Top 10% | |
| influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Top 10% | |
| impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Top 10% |
