Static Stack-Preserving Intra-Procedural Slicing of WebAssembly Binaries

# About this artifact

This artifact contains the implementation of the static slicer for WebAssembly described in the ICSE 2022 paper titled "Static Stack-Preserving Intra-Procedural Slicing of WebAssembly Binaries", together with the results of its evaluation. The artifact contains a Docker image (`wassail-eval.tar.xz`) with everything necessary to reproduce our evaluation, as well as the actual data resulting from our evaluation:

1. The implementation of our slicer (presented in Section 4.1) is included in the Docker image, and is publicly available at https://github.com/acieroid/wassail/tree/icse2022
2. The test cases used for our evaluation of RQ1 are included in the Docker image and in the `rq1.tar.xz` archive.
3. The dataset used in RQ2, RQ3, and RQ4 is included in the Docker image.
4. The code needed to run our evaluation of RQ2, RQ3, and RQ4 is included in the Docker image.
5. The scripts used to generate the statistics and graphs included in the paper for RQ2, RQ3, and RQ4 are included in the Docker image and as the `*.py` files in this artifact.
6. The data of RQ5 used in our manual investigation is included in the Docker image and in the `rq5.tar.xz` archive, along with `rq5-manual.txt`, which details the findings of our manual analysis.

# How to obtain it

Our artifact is available on Zenodo at the following URL: https://zenodo.org/record/5821007

# Setting up the Docker image

## Downloading The Artifact

The artifact is available at the following URL: https://zenodo.org/record/5821007

## Loading The Docker Image

Once the artifact has been downloaded as the file `icse2022slicing.tar.xz`, it can be imported into Docker as follows (this takes a few minutes):

```
docker import icse2022slicing.tar.xz
```

To simplify further commands, you can tag the image using the sha256 hash printed by `docker import`. For example, if the `docker import` command printed the hash `54aa9416a379a6c71b1c325985add8bf931752d754c8fb17872c05f4e4b52ea2`, you can run:

```
docker tag 54aa9416a379a6c71b1c325985add8bf931752d754c8fb17872c05f4e4b52ea2 wassail-eval
```

Once the Docker image has been loaded, you can run the following commands to obtain a shell in the appropriate environment:

```
docker volume create result
docker run -it -v result:/tmp/out/ wassail-eval bash
# The following command is typed in the shell started inside the container
su - opam
```

# Reproducing results of RQ1

Our manual translations of the "classical" examples are included in the `rq1/` directory (available in the Docker image and in `rq1.tar.xz`). We include the slices computed by our implementation in the `rq1/out/` directory.
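
The nine `wassail slice` invocations for the RQ1 examples and the `diff` against `rq1/out/` described in this section can be combined into a single check. The following is a minimal sketch that is not part of the artifact: the helper name `check_rq1_slices` is hypothetical, it assumes it is run from `rq1/` with `wassail` on the `PATH`, and it reuses exactly the program, function index, criterion, and output file given for each example in this section.

```shell
# Hypothetical helper (not shipped with the artifact): recompute every RQ1
# slice and compare it with the expected solution in out/.
# Assumes it is run from rq1/ and that `wassail` is on the PATH.
# Each input line: <program> <function index> <criterion> <output file>
check_rq1_slices() {
  status=0
  while read -r prog func crit slice; do
    # </dev/null keeps wassail from consuming the remaining input lines
    if ! wassail slice "$prog" "$func" "$crit" "$slice" </dev/null; then
      echo "FAIL $slice (wassail exited with an error)"
      status=1
    elif diff -q "$slice" "out/$slice" >/dev/null 2>&1; then
      echo "OK   $slice"
    else
      echo "DIFF $slice (differs from out/$slice)"
      status=1
    fi
  done <<'EOF'
scam-mug.wat 5 8 scam-mug-slice.wat
montreal-boat.wat 5 19 montreal-boat-slice.wat
word-count.wat 1 41 word-count-slice1.wat
word-count.wat 1 43 word-count-slice2.wat
word-count.wat 1 39 word-count-slice3.wat
word-count.wat 1 45 word-count-slice4.wat
word-count.wat 1 37 word-count-slice5.wat
agrawal-fig-3.wat 3 38 agrawal-fig-3-slice.wat
agrawal-fig-5.wat 3 37 agrawal-fig-5-slice.wat
EOF
  return "$status"
}
```

After `cd rq1/`, sourcing this snippet and running `check_rq1_slices` prints one line per slice and returns a non-zero status if any slice could not be computed or differs from its counterpart in `out/`.
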
A slice can be produced for each example in the Docker image as follows. The first argument is the name of the program being sliced, the second is the index of the function being sliced, the third is the slicing criterion (given as an instruction index, where instructions are numbered starting from 1), and the last argument is the output file for the slice:

```
cd rq1/
wassail slice scam-mug.wat 5 8 scam-mug-slice.wat
wassail slice montreal-boat.wat 5 19 montreal-boat-slice.wat
wassail slice word-count.wat 1 41 word-count-slice1.wat
wassail slice word-count.wat 1 43 word-count-slice2.wat
wassail slice word-count.wat 1 39 word-count-slice3.wat
wassail slice word-count.wat 1 45 word-count-slice4.wat
wassail slice word-count.wat 1 37 word-count-slice5.wat
wassail slice agrawal-fig-3.wat 3 38 agrawal-fig-3-slice.wat
wassail slice agrawal-fig-5.wat 3 37 agrawal-fig-5-slice.wat
```

The resulting slices can then be inspected manually and compared with the original version of the `.wat` program to see which instructions have been removed, or compared with the expected solutions in the `out/` directory, e.g., by running:

```
diff word-count-slice1.wat out/word-count-slice1.wat
```

(No output is expected if the slice is correct.)

# Reproducing results of RQ2, RQ3, and RQ4

For these RQs, we include the data resulting from our evaluation, but reviewers can also rerun the full evaluation if needed. However, such an evaluation requires a powerful machine and takes a long time (4-5 days to run to completion with a 4-hour timeout). In our case, we used a machine with 256 GB of RAM and a 64-core processor with HyperThreading enabled, allowing us to run 128 slicing jobs in parallel.

## Running the Evaluation

We explain below how to run either the full evaluation or only a partial evaluation. Alternatively, one can skip directly to the next section and reuse our raw evaluation results, which are provided alongside this artifact.

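
Both evaluation variants described in this section depend on the per-binary timeout configured in `evaluate.sh` and on the number of parallel jobs passed to GNU `parallel`. Instead of editing the script by hand, the timeout can be changed non-interactively. The sketch below is a convenience that is not part of the artifact: the function name `set_eval_timeout` is hypothetical, and it assumes that `evaluate.sh` assigns the timeout on a line starting with `TIMEOUT=` (as suggested by the `TIMEOUT=20m` setting mentioned for the partial evaluation) and that GNU `sed` is available for in-place editing.

```shell
# Hypothetical helper (not shipped with the artifact): set the per-binary
# timeout used by evaluate.sh without opening an editor.
# Assumes the timeout is assigned on a line of the form TIMEOUT=<value>
# and that GNU sed is available (for in-place editing with -i).
set_eval_timeout() {
  timeout_value=$1                 # e.g. 20m or 4h
  script=${2:-../evaluate.sh}      # default path, relative to filtered/
  if ! grep -q '^TIMEOUT=' "$script"; then
    echo "no TIMEOUT= line found in $script" >&2
    return 1
  fi
  # Replace the whole assignment line with the new value
  sed -i "s/^TIMEOUT=.*/TIMEOUT=$timeout_value/" "$script"
}
```

For instance, from the `filtered` directory, running `set_eval_timeout 20m` and then `shuf ../supported.txt | parallel --bar -j "$(nproc)" sh ../evaluate.sh {}` would start a partial evaluation with a 20-minute timeout and one slicing job per available processor (`nproc` is part of GNU coreutils).
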
### Running the Full Evaluation

To reproduce our evaluation, you can run the following commands in the Docker image. It is recommended to run them in a tmux session if you want to inspect other elements in parallel (tmux is installed in the Docker image). The timeout (set to 4 hours per binary, as in the paper) can be decreased by editing the `evaluate.sh` script (vim is installed in the Docker image). This is expected to take 2-3 days on a machine with 128 cores. To produce only partial results, see the next section.

```
cd filtered
cat ../supported.txt | parallel --bar -j 128 sh ../evaluate.sh {}
```

The results are written to the `/tmp/out/` directory.

### Running a Partial Evaluation

If you have access neither to a high-end machine with 128 cores nor to the time needed for the full evaluation, it is possible to produce partial results by running the following commands. They run the evaluation on the full dataset in a random order, so the run can be stopped early to obtain a partial view of our evaluation on a random subset of the data. To gather more data points, it is also advised to decrease the timeout in the `evaluate.sh` file, for example to 20 minutes by setting `TIMEOUT=20m` with `nano evaluate.sh`. The number of slicing jobs running in parallel can also be decreased to match the number of processors on the machine running the experiments (the `-j 128` argument in the following command runs 128 parallel jobs).

```
sudo chown opam:opam /tmp/out/
cd filtered
shuf ../supported.txt | parallel --bar -j 128 sh ../evaluate.sh {}
```

The evaluation results will be stored in the `/tmp/out/` directory.

### Skipping the Evaluation Run

Instead of rerunning the evaluation, one can rely on our full results, included in the `data.txt.xz` and `error.txt.xz` archives.
These can be downloaded from within the Docker image and extracted in `/tmp/out/`:

```
cd /tmp/out/
wget https://zenodo.org/record/5821007/files/data.txt.xz
wget https://zenodo.org/record/5821007/files/error.txt.xz
unxz data.txt.xz
unxz error.txt.xz
```

## Processing the data

To process this data, we include multiple Python scripts. These require around 100 GB of RAM to load the full dataset into memory, and should be run with Python 3. When running them in the Docker image, first run `cd /tmp/out/ && cp /home/opam/*.py ./`.

- To count the number of functions sliced, run `cut -d, -f 1,2 data.txt | sort -u | wc -l`. This takes around 6 minutes on the full dataset.
- To count the total number of slices encountered, run `wc -l data.txt error.txt` (the total is given on the last line of the output). This takes around 15 seconds.
- To count the number of errors encountered, run `wc -l error.txt`. This takes around 1 second.
- To produce data and graphs regarding slice sizes and timings, run `python3 statistics-and-plots.py`. This outputs the statistics presented in the paper, along with Figure 2 (`rq2-sizes.pdf`) and Figure 3 (`rq2-times.pdf`). This script takes around 35 minutes to run.
- To find the executable slices that are larger than the original programs, run `python3 larger-slices.py > larger.txt`. This script takes around 2h30 to run. It lists each such slice in `larger.txt` using the notation `filename function-sliced slicing-criterion`, from which the slice can be recomputed in the Docker image by running `wassail slice filename function-sliced slicing-criterion output.wat`. It also outputs statistics regarding these slices, which you can inspect by running `tail larger.txt`.
- To investigate slices that could not be computed, run:
  ```
  # Normalize the "annotation," error messages before running errors.py
  sed -i -e 's/annotation,/annotation./' error.txt
  python3 errors.py
  ```
  This takes a few seconds and prints a summary of the errors encountered during the slicing process; some manual sorting is required to map these errors to the categories discussed in the paper.

Here is a summary of the errors encountered and their root causes:

### Root Cause: Unsupported Usage of br_table

- Error: `(Failure"Invalid vstack when popping 2 values")`
- Error: `(Failure"Spec_inference.drop: not enough elements in stack")`
- Error: `(Failure"Spec_inference.take: not enough element in var list")`
- Error: `(Failure"unsupported in spec_inference: incompatible stack lengths (probably due to mismatches in br_table branches)")`

### Root Cause: Unreachable Code

- Error: `(Failure"Unsupported in slicing: cannot find an instruction. It probably is part of unreachable code.")`
- Error: `(Failure"bottom annotation")`
- Error: `(Failure"bottom annotation. this an unreachable instruction")`

# RQ5: Comparison to Slicing C Programs

For this RQ, we include the following data in the `rq5.7z` archive, and in the `rq5/` directory in the Docker image:

- The slicing subjects in their C and textual Wasm form in `rq5/subjects/`
- The CodeSurfer slices in their C and textual Wasm form in `rq5/codesurfer/`
- Our slices in their Wasm form in `rq5/wasm-slices/`

As this RQ requires extensive manual comparison, we do not expect reviewers to reproduce all of our results. We include a summary of our manual investigation in `rq5-manual.txt`. To validate these manual findings, one can, for example, inspect a specific slice.
For example, the following line in `rq5-manual.txt`:

```
adpcm_apl1_565_expr.c.wat INTERPROCEDURAL
```

can be validated as follows:

```
cd ~/
# This generates a trimmed-down version of the CodeSurfer slice, only containing the function of interest
wassail count-in-slice rq5/codesurfer/adpcm_slices/adpcm_apl1_565_expr.c.wat slice.wat
# This compares the CodeSurfer slice with our slice
diff --side-by-side slice.wat rq5/adpcm_apl1_565_expr.c.wat
```

In this case, most of the extraneous instructions in the CodeSurfer slice appear at the end of the function. This indicates that they are present in order to preserve interprocedural behavior, which corresponds to the `INTERPROCEDURAL` tag in `rq5-manual.txt`.
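
Beyond checking individual entries such as the `INTERPROCEDURAL` one above, it can help to see how the tags in `rq5-manual.txt` are distributed before choosing which slices to inspect. The sketch below is hypothetical (the helper name `tally_rq5_tags` is ours, not the artifact's) and assumes that each relevant line of `rq5-manual.txt` has the form `<slice file> <TAG> [<TAG> ...]`, as in the example line shown here; lines with a different layout would be miscounted.

```shell
# Hypothetical helper (not shipped with the artifact): count how often each
# tag occurs in rq5-manual.txt.
# Assumes lines of the form "<slice file> <TAG> [<TAG> ...]"; empty lines
# and lines with a single field are ignored.
tally_rq5_tags() {
  awk 'NF >= 2 { for (i = 2; i <= NF; i++) count[$i]++ }
       END { for (tag in count) printf "%s %d\n", tag, count[tag] }' \
    "${1:-rq5-manual.txt}" |
    sort -k2,2nr -k1,1   # most frequent tag first, ties sorted by name
}
```

For instance, `tally_rq5_tags rq5-manual.txt` prints one `TAG count` pair per line, with the most frequent tag first.
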

Related Organizations

Loyola Marymount University
United States
Vrije Universiteit Brussel
Belgium

Keywords

slicing, webassembly
