NJR is a Normalized Java Resource. The NJR-1 dataset consists of 293 Java programs.

TOOLS THAT RUN ON NJR-1

Each program runs successfully with the following 13 Java static analysis tools:
- SpotBugs (https://spotbugs.github.io)
- Wala (https://wala.github.io)
- Doop (https://bitbucket.org/yanniss/doop)
- Soot (https://github.com/soot-oss/soot)
- Petablox (https://github.com/petablox/petablox)
- Infer (https://fbinfer.com)
- Error-Prone (http://errorprone.info)
- Checker-Framework (https://checkerframework.org)
- Opium (Opal framework) (https://www.opal-project.de)
- Spoon (https://spoon.gforge.inria.fr)
- PMD (https://pmd.github.io)
- CheckStyle (https://checkstyle.org)
- CodeGuru* (https://aws.amazon.com/codeguru)

In addition to these static analysis tools, the NJR dataset has also been tested with 6 other tools that operate on Java bytecode:
- Jacoco (https://www.jacoco.org): dynamic analysis tool
- Wiretap (https://github.com/ucla-pls/wiretap): dynamic analysis tool
- JReduce (https://github.com/ucla-pls/jreduce): bytecode reduction tool
- Procyon (https://github.com/ststeiger/procyon): decompiler
- CFR (https://www.benf.org/other/cfr/): decompiler
- Fernflower (https://github.com/fesh0r/fernflower): decompiler

BENCHMARK PROGRAMS

The NJR programs are repositories picked from a set of Java-8 projects on GitHub that compile and run successfully. Each program comes with a jar file, the compiled bytecode files, compiled library files, and the Java source code. The availability of each program both in jar-file form and in source-code form (with the compiled library classes) is a major reason the dataset works with so many tools without requiring any extra effort.

Internally, each benchmark program has the following structure:
- src: directory with source files.
- classes: directory with class files.
- lib: compiled third-party library classes (source files are not available, since libraries are distributed as class files).
- jarfile: jar file containing the compiled application classes and third-party library classes.
- info: directory with information about the program. It includes the following files:
  - classes: list of application classes (excludes third-party library classes).
  - mainclasses: list of main classes that can be run.
  - sources: list of source file names.
  - declarations: list of method declarations, categorized by source file name.

The benchmarks already come with a compiled jar file, but some tools need to compile and run the benchmarks themselves. The following simple commands can be used for compilation and running (replace <jarfilename> with the file in the jarfile directory, and <mainclassname> with any of the classes listed in info/mainclasses):

javac -d compiled_classes -cp lib @info/sources
java -cp jarfile/<jarfilename> <mainclassname>

FILES AVAILABLE FOR DOWNLOAD

There are 3 files available for download: njr-1_dataset.zip, scripts.zip, and benchmark_stats.csv.
- njr-1_dataset.zip contains the actual dataset programs.
- scripts.zip contains Python 3 scripts for running each tool on the entire dataset. It also contains a Readme detailing the version number, download link, and setup instructions for each tool.
- benchmark_stats.csv lists, for each benchmark, the number of nodes and edges in its dynamic application call graph, as well as the number of edges in its static application call graph (as computed by Wala) when using the main function listed in the info/mainclasses file.

STATISTICS

Here are some statistics about the benchmark programs:
- Mean number of application classes: 97
- Each program executes at least 100 unique application methods at runtime.
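As a sketch of how a driver script might assemble the two invocations above for one benchmark, the following Python helpers build the command lines and parse info/mainclasses. The function names are hypothetical illustrations, not part of scripts.zip:

```python
def compile_command():
    # Mirrors: javac -d compiled_classes -cp lib @info/sources
    return ["javac", "-d", "compiled_classes", "-cp", "lib", "@info/sources"]

def run_command(jar_name, main_class):
    # Mirrors: java -cp jarfile/<jarfilename> <mainclassname>
    return ["java", "-cp", f"jarfile/{jar_name}", main_class]

def main_classes(info_text):
    # info/mainclasses lists one runnable main class per line.
    return [line.strip() for line in info_text.splitlines() if line.strip()]
```

Each command list could then be executed with subprocess.run(cmd, cwd=benchmark_dir), running from inside the benchmark's directory so the relative paths (lib, info/sources, jarfile/) resolve.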
- Mean lines of application source code: 9,911
- Mean number of third-party library classes: 2,608
- Mean (estimated) lines of third-party library source code: 250,000

A summary of the statistics from the benchmark_stats file:

Statistic   Dynamic-Nodes   Dynamic-Edges   Static-Edges
Mean        205             469             1404
St.Dev      199             464             2523
Median      149             327             610

NOTES

Note 1: Zenodo shows 4 changes for this repository. However, all the changes involve updating the scripts folder as more tools get tested on the dataset. The programs in the dataset themselves remain unchanged.

Note 2 (*): CodeGuru Reviewer is a paid, proprietary tool by Amazon. Our experiments show that it runs successfully on all the benchmarks in this dataset. However, we do not include any scripts to replicate this run because of its paid nature.

To cite the dataset, please cite the following paper:

Jens Palsberg and Cristina V. Lopes. NJR: a Normalized Java Resource. In Proceedings of the ACM SIGPLAN International Workshop on State Of the Art in Program Analysis (SOAP), 2018.
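Summary statistics like those above can be recomputed from benchmark_stats.csv with the standard library. The sample rows and column names below are assumptions for illustration; the real file's header may differ:

```python
import csv
import statistics
from io import StringIO

# Hypothetical sample of benchmark_stats.csv (names and headers assumed).
SAMPLE = """benchmark,dynamic_nodes,dynamic_edges,static_edges
benchmark001,149,327,610
benchmark002,205,469,1404
benchmark003,412,980,2200
"""

def summarize(csv_text, column):
    """Return (mean, sample st.dev, median) of one numeric CSV column."""
    values = [int(row[column]) for row in csv.DictReader(StringIO(csv_text))]
    return (statistics.mean(values),
            statistics.stdev(values),
            statistics.median(values))

mean, stdev, median = summarize(SAMPLE, "dynamic_nodes")
```

Replacing SAMPLE with open("benchmark_stats.csv").read() and adjusting the column name to the real header would reproduce the table's per-column figures over all 293 benchmarks.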
Funded by the following NSF grant (https://www.nsf.gov/awardsearch/showAward?AWD_ID=1823360&HistoricalAwards=false)
Static Analysis, Java
