DeepDataFlow

This dataset contains 493k LLVM-IRs taken from a wide range of projects and source programming languages, and includes labels for several compiler data analyses. We also include the logs for the machine learning jobs which produced our published experimental results.

The uncompressed dataset uses the following layout:

labels/
    Directory containing machine learning features and labels for programs for compiler data flow analyses.
labels/<analysis>/<source>.<id>.<lang>.ProgramFeaturesList.pb
    A ProgramFeaturesList protocol buffer containing a list of features resulting from running a data flow analysis on a program.
graphs/
    Directory containing ProGraML representations of LLVM IRs.
graphs/<source>.<id>.<lang>.ProgramGraph.pb
    A ProgramGraph protocol buffer of an LLVM IR in the ProGraML representation.
ll/
    Directory containing LLVM-IR files.
ir/<source>.<id>.<lang>.ll
    An LLVM IR in text format, as produced by clang -emit-llvm -S or equivalent.
test/
    A directory containing symlinks to graphs in the graphs/ directory, indicating which graphs should be used as part of the test set.
train/
    A directory containing symlinks to graphs in the graphs/ directory, indicating which graphs should be used as part of the training set.
val/
    A directory containing symlinks to graphs in the graphs/ directory, indicating which graphs should be used as part of the validation set.
vocab/
    Directory containing vocabulary files.
vocab/<type>.csv
    A vocabulary file, which lists unique node texts, their frequency in the dataset, and the cumulative proportion of total unique node texts that is covered.

For further information please see our ProGraML repository.
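The naming scheme in the layout above can be sketched in Python as follows. This is a minimal illustration, not part of the dataset tooling: the helper names (parse_graph_name, list_split, labels_path) are hypothetical, and the code assumes that <id> and <lang> contain no dots and that the symlinks in train/, val/ and test/ carry the same file names as their targets in graphs/. It only handles paths and does not decode the protocol buffers.

```python
from pathlib import Path
from typing import List, NamedTuple, Optional


class GraphName(NamedTuple):
    source: str
    id: str
    lang: str


GRAPH_SUFFIX = ".ProgramGraph.pb"


def parse_graph_name(filename: str) -> Optional[GraphName]:
    """Split '<source>.<id>.<lang>.ProgramGraph.pb' into its three parts."""
    if not filename.endswith(GRAPH_SUFFIX):
        return None
    stem = filename[: -len(GRAPH_SUFFIX)]
    # Split from the right so that any dots inside <source> are preserved.
    # Assumption: <id> and <lang> themselves contain no dots.
    parts = stem.rsplit(".", 2)
    if len(parts) != 3 or not all(parts):
        return None
    return GraphName(*parts)


def list_split(dataset_root: Path, split: str) -> List[GraphName]:
    """Return the parsed names of graphs linked from train/, val/ or test/."""
    names = []
    for entry in sorted((dataset_root / split).iterdir()):
        # Assumption: each symlink is named like its target in graphs/.
        parsed = parse_graph_name(entry.name)
        if parsed is not None:
            names.append(parsed)
    return names


def labels_path(dataset_root: Path, analysis: str, name: GraphName) -> Path:
    """Build labels/<analysis>/<source>.<id>.<lang>.ProgramFeaturesList.pb."""
    filename = f"{name.source}.{name.id}.{name.lang}.ProgramFeaturesList.pb"
    return dataset_root / "labels" / analysis / filename


if __name__ == "__main__":
    # Hypothetical file name, used only to illustrate the naming scheme.
    example = parse_graph_name("some_source.42.c.ProgramGraph.pb")
    print(example)
    print(labels_path(Path("dataset"), "some_analysis", example))
```

Given an extracted copy of the dataset, list_split(root, "train") would then yield the names of the training graphs, and labels_path would give the location of the matching label file for a chosen analysis directory.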

References

Cummins, C., Fisches, Z. V., Ben-Nun, T., Hoefler, T., & Leather, H. (2020). ProGraML: Graph-based Deep Learning for Program Optimization and Analysis. arXiv preprint arXiv:2003.10536.

Related Organizations

University of Edinburgh
United Kingdom

Keywords

Machine Learning, Compilers, LLVM, Programming languages
