ZENODO
Dataset . 2026
License: CC BY
Data sources: Datacite

Function Name Recovery in Stripped Binaries: An Experience Report on Preprocessing, Evaluation, and Reproducibility

Abstract: Recovering function names from stripped binaries remains a bottleneck in software maintenance, program comprehension, binary debugging, and security analysis. Although recent years have seen a wave of machine-learning-based techniques, the practical state of the art remains difficult to assess. Prior studies are confounded by three recurring problems: a widespread assumption that heavy manual preprocessing is needed to help tokenizers, even though such processing can erase domain-specific semantics or simplify labels in ways that inflate scores; evaluations that are not directly comparable because tools rely on different function-discovery backends or permissive metrics such as token-level top-$k$; and severe reproducibility barriers caused by missing artifacts, undocumented bugs, and extreme computational cost. This experience paper reports our effort to systematize and re-evaluate function-name recovery through a within-pipeline sensitivity analysis. We reproduce four representative state-of-the-art models on a common dataset and controlled pipeline, then retrain them under multiple preprocessing configurations to test whether manual segmentation and normalization are necessary. Across models, we find that these hand-engineered strategies often provide limited benefit over modern tokenizers and can silently discard useful semantic information. We further re-evaluate model outputs under stricter, analyst-facing criteria and show that permissive scoring schemes can substantially overstate practical performance. Finally, we document the scalability and reproducibility challenges encountered during reproduction, including missing artifacts, software bugs, and prohibitive resource demands. Based on these findings, we propose a unified evaluation framework and concrete best practices for more robust, comparable, and reproducible research on function name recovery.
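To illustrate why permissive scoring can overstate practical performance, the following is a minimal sketch contrasting a token-level F1 score with a strict exact-match criterion. The helper names (`split_name`, `token_f1`, `exact_match`) and the tokenization rule are assumptions made for illustration only; they are not the evaluation framework proposed in the paper, and they do not implement the token-level top-$k$ metric mentioned above.

```python
import re
from collections import Counter


def split_name(name: str) -> list[str]:
    """Split a function name into lowercase tokens.

    Assumed rule (hypothetical, for illustration): split on underscores,
    then on camelCase boundaries, keeping acronyms and digit runs together.
    """
    tokens = []
    for part in re.split(r"_+", name):
        tokens.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return [t.lower() for t in tokens]


def token_f1(predicted: str, reference: str) -> float:
    """Permissive score: F1 over the multiset of name tokens."""
    pred = Counter(split_name(predicted))
    ref = Counter(split_name(reference))
    overlap = sum((pred & ref).values())  # tokens shared by both names
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def exact_match(predicted: str, reference: str) -> bool:
    """Strict score: every token must match, in order."""
    return split_name(predicted) == split_name(reference)


if __name__ == "__main__":
    pred, ref = "read_file", "read_config_file"
    print(f"token F1:    {token_f1(pred, ref):.2f}")
    print(f"exact match: {exact_match(pred, ref)}")
```

In this sketch, a prediction of `read_file` for the reference `read_config_file` earns a high token-level score even though it fails the strict criterion, which is the kind of gap between permissive and analyst-facing evaluation that the abstract describes.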
