
Abstract: Binary code analysis has extensive downstream applications such as binary rewriting, recompilation, and software security. A significant challenge in maintaining the integrity of static analysis is resolving indirect call targets. This difficulty arises because the operand of a call instruction (e.g., call rax) remains unknown until runtime, leading to an incomplete inter-procedural control flow graph (CFG). Traditional solutions suffer from low accuracy or poor scalability, prompting recent research to explore machine learning (ML) approaches. However, the quality of ground truth data critically affects the accuracy of ML models. In this paper, we present NeuCall, a novel approach for resolving indirect calls using graph neural networks (GNNs).

Repository Structure

We have uploaded both the source code and dataset used in our experiments. The detailed information is listed below.

.
├── batch
├── binary_collection
├── database
├── env
├── graph_generation
├── model
├── model.checkpoint
├── predictor.checkpoint
├── README.md
└── util

Dataset:
binary: binary.zip
graph: graph.zip

Detailed Workflow

Step 1: Environment Setup
To replicate our experiments, first set up the environment using the provided YAML files. This ensures that all dependencies and configurations are correctly established.

Step 2: Binary Collection
Using the modified Typro_CFI and GHCC tools, we collect binaries and annotate them with ground truth information. This step involves compiling source code from Arch Linux and GitHub to create a comprehensive dataset.

Step 3: Graph Generation
Convert the collected binary information into heterogeneous graphs. These graphs serve as the input for our GNN model, capturing the necessary control flow and data flow information.

Step 4: Model Training
Train the GNN model using the generated graphs. Experiment with different model settings, such as varying the size, layers, and types of graph elements, to optimize performance.
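To make the idea of a heterogeneous graph in Step 3 more concrete, the following is a minimal sketch in plain Python. It is not NeuCall's actual graph schema: the node types ("instruction", "function"), the relation names ("control_flow", "data_flow", "candidate_target"), the toy instructions, and the function names handler_a and handler_b are all assumptions made for illustration. It only shows how typed nodes and typed edges around an indirect call site such as call rax might be stored before being handed to a GNN library.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class HeteroGraph:
    """Toy heterogeneous graph: nodes and edges are grouped by type."""
    # node type -> list of node labels (the list index is the node id)
    nodes: dict = field(default_factory=lambda: defaultdict(list))
    # (source type, relation, destination type) -> list of (src id, dst id)
    edges: dict = field(default_factory=lambda: defaultdict(list))

    def add_node(self, ntype, label):
        """Append a node of the given type and return its id within that type."""
        self.nodes[ntype].append(label)
        return len(self.nodes[ntype]) - 1

    def add_edge(self, src_type, src, relation, dst_type, dst):
        """Record a typed edge between two existing nodes."""
        self.edges[(src_type, relation, dst_type)].append((src, dst))


def build_toy_graph():
    """Build a hypothetical graph for a two-instruction snippet ending in 'call rax'."""
    g = HeteroGraph()
    mov = g.add_node("instruction", "mov rax, [rbx+8]")
    call = g.add_node("instruction", "call rax")
    f_a = g.add_node("function", "handler_a")  # hypothetical candidate callee
    f_b = g.add_node("function", "handler_b")  # hypothetical candidate callee

    # Control flow: the mov falls through to the call.
    g.add_edge("instruction", mov, "control_flow", "instruction", call)
    # Data flow: rax is defined by the mov and used by the call.
    g.add_edge("instruction", mov, "data_flow", "instruction", call)
    # Candidate edges: the call target is unknown statically, so each
    # plausible callee is linked for a model to score.
    g.add_edge("instruction", call, "candidate_target", "function", f_a)
    g.add_edge("instruction", call, "candidate_target", "function", f_b)
    return g


def candidate_targets(g, call_id):
    """Return the labels of all functions linked to a call site as candidates."""
    pairs = g.edges[("instruction", "candidate_target", "function")]
    return [g.nodes["function"][dst] for src, dst in pairs if src == call_id]


if __name__ == "__main__":
    graph = build_toy_graph()
    print(candidate_targets(graph, 1))
```

In this sketch, resolving the indirect call amounts to deciding which candidate_target edges are real, which is one common way to frame the task as edge prediction; the repository's own graph_generation and model directories define the representation actually used.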
