publication . Article . Preprint . 2020

Java decompiler diversity and its application to meta-decompilation

Nicolas Harrand; César Soto-Valero; Martin Monperrus; Benoit Baudry
Open Access
  • Published: 21 May 2020 Journal: Journal of Systems and Software, volume 168, page 110645 (ISSN: 0164-1212)
  • Publisher: Elsevier BV
Abstract
During compilation from Java source code to bytecode, some information is irreversibly lost. In other words, compilation and decompilation of Java code are not symmetric. Consequently, decompilation, which aims at producing source code from bytecode, relies on strategies to reconstruct the information that has been lost. Different Java decompilers use distinct strategies to achieve proper decompilation. In this work, we hypothesize that the diverse ways in which bytecode can be decompiled have a direct impact on the quality of the source code produced by decompilers. In this paper, we assess the strategies of eight Java decompilers with respect to three quality in...
Subjects
free text keywords: Hardware and Architecture, Software, Information Systems, Computer Science - Software Engineering