Graph-Driven Indirect Call Prediction in Binary Code with Cross-Reference Augmented Control Flow Representations

License: CC BY

Data sources: Datacite

Other research product: Model
Published: 22 Jun 2024
Publisher: Zenodo

doi: 10.5281/zenodo.12364897

Abstract

Binary code analysis has extensive downstream applications such as binary rewriting, recompilation, and software security. A significant challenge in maintaining the integrity of static analysis is resolving indirect call targets. This difficulty arises because the operand of a call instruction (e.g., call rax) remains unknown until runtime, leading to an incomplete inter-procedural control flow graph (CFG). Traditional solutions suffer from low accuracy or poor scalability, prompting recent research to explore machine learning (ML) approaches. However, the quality of ground truth data critically affects the accuracy of ML models. In this paper, we present NeuCall, a novel approach for resolving indirect calls using graph neural networks (GNNs).

Repository Structure

We have uploaded both the source code and dataset used in our experiments. The detailed information is listed below.

.
├── batch
├── binary_collection
├── database
├── env
├── graph_generation
├── model
├── model.checkpoint
├── predictor.checkpoint
├── README.md
└── util

Dataset:
- binary: binary.zip
- graph: graph.zip

Detailed Workflow

Step 1: Environment Setup

To replicate our experiments, first set up the environment using the provided YAML files. This ensures that all dependencies and configurations are correctly established.

Step 2: Binary Collection

Using the modified Typro_CFI and GHCC tools, we collect binaries and annotate them with ground truth information. This step involves compiling source code from Arch Linux and GitHub to create a comprehensive dataset.

Step 3: Graph Generation

Convert the collected binary information into heterogeneous graphs. These graphs serve as the input for our GNN model, capturing the necessary control flow and data flow information.

Step 4: Model Training

Train the GNN model using the generated graphs. Experiment with different model settings, such as varying the size, layers, and types of graph elements, to optimize performance.
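As a rough illustration of the graph generation step, the sketch below builds a tiny heterogeneous graph around one indirect call site and lists the (call site, function) pairs a model could then score. It is not the NeuCall code or data format: the node types (block, callsite, function), the relation names (flows_to, calls, contains, xref), the toy binary, and the choice of address-taken functions as candidates are all assumptions made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# All node and edge type names below are hypothetical, chosen for
# illustration; they are not taken from the NeuCall repository.


@dataclass
class HeteroGraph:
    # node type -> list of node names (list position is the node index)
    nodes: dict = field(default_factory=lambda: defaultdict(list))
    # (src_type, relation, dst_type) -> list of (src_index, dst_index)
    edges: dict = field(default_factory=lambda: defaultdict(list))

    def add_node(self, ntype, name):
        """Add a node if it is new and return its index within its type."""
        if name not in self.nodes[ntype]:
            self.nodes[ntype].append(name)
        return self.nodes[ntype].index(name)

    def add_edge(self, src, relation, dst):
        """Add a typed edge; src and dst are (node_type, name) pairs."""
        s = self.add_node(*src)
        d = self.add_node(*dst)
        self.edges[(src[0], relation, dst[0])].append((s, d))


def build_toy_graph():
    """Build a made-up example with one indirect call site."""
    g = HeteroGraph()
    # Control flow between two basic blocks of one function.
    g.add_edge(("block", "main.bb0"), "flows_to", ("block", "main.bb1"))
    # A direct call: the target is known statically.
    g.add_edge(("block", "main.bb0"), "calls", ("function", "init"))
    # An indirect call site (e.g. `call rax`): target unknown statically.
    g.add_edge(("block", "main.bb1"), "contains",
               ("callsite", "main.bb1:call_rax"))
    # Cross-references: places where a function's address is taken.
    g.add_edge(("block", "main.bb0"), "xref", ("function", "handler_a"))
    g.add_edge(("block", "main.bb0"), "xref", ("function", "handler_b"))
    return g


def candidate_pairs(g):
    """Pair every indirect call site with every address-taken function."""
    taken = sorted({dst for (_, dst) in g.edges[("block", "xref", "function")]})
    return [(cs, g.nodes["function"][f])
            for cs in g.nodes["callsite"]
            for f in taken]


if __name__ == "__main__":
    for callsite, target in candidate_pairs(build_toy_graph()):
        print(callsite, "->", target)
```

In this toy setup, init is reached only by a direct call and never has its address taken, so it is not proposed as a target for the indirect call site. A trained model would assign a score to each remaining pair; that scoring step is not shown here.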

Related research (1):
- Graph-Driven Indirect Call Prediction in Binary Code with Cross-Reference Augmented Control Flow Representations (2024, relation: IsVersionOf)
