Publication · Preprint · 2017

Learn&Fuzz: Machine Learning for Input Fuzzing

Godefroid, Patrice; Peleg, Hila; Singh, Rishabh
Open Access · English
Published: 25 Jan 2017
Abstract
Fuzzing consists of repeatedly testing an application with modified, or fuzzed, inputs with the goal of finding security vulnerabilities in input-parsing code. In this paper, we show how to automate the generation of an input grammar suitable for input fuzzing using sample inputs and neural-network-based statistical machine-learning techniques. We present a detailed case study with a complex input format, namely PDF, and a large complex security-critical parser for this format, namely, the PDF parser embedded in Microsoft's new Edge browser. We discuss (and measure) the tension between conflicting learning and fuzzing goals: learning wants to capture the structu...
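The pipeline the abstract describes (learn a generative model of the input format from sample inputs, then generate new, partly fuzzed inputs from it) can be illustrated with a minimal Python sketch. It rests on loudly stated assumptions. A character-level bigram model stands in for the paper's neural network; this is a swapped-in and far simpler technique, not the authors' method. The names `train_bigram`, `generate`, and `fuzz_prob` are hypothetical, and the two PDF-like object strings are toy samples rather than data from the paper.

```python
import random
from collections import Counter, defaultdict

# Sentinel characters marking the start and end of one input.
START, END = "\x02", "\x03"


def train_bigram(samples):
    """Hypothetical helper: count how often each character follows another.

    These bigram counts stand in for the neural model used in the paper.
    """
    counts = defaultdict(Counter)
    for s in samples:
        chars = [START] + list(s) + [END]
        for prev, nxt in zip(chars, chars[1:]):
            counts[prev][nxt] += 1
    return counts


def generate(counts, fuzz_prob=0.0, max_len=200, rng=None):
    """Hypothetical helper: sample one new input from the learned counts.

    With probability fuzz_prob, each sampled character is replaced by a
    character drawn uniformly from the alphabet seen in the samples.
    """
    rng = rng if rng is not None else random.Random()
    alphabet = sorted({c for follow in counts.values() for c in follow if c != END})
    out, prev = [], START
    while len(out) < max_len:
        follow = counts[prev]
        # Draw the next character in proportion to how often it followed prev.
        nxt = rng.choices(list(follow), weights=list(follow.values()))[0]
        if nxt == END:
            break
        if rng.random() < fuzz_prob:
            nxt = rng.choice(alphabet)  # inject an anomaly
        out.append(nxt)
        prev = nxt
    return "".join(out)


if __name__ == "__main__":
    samples = [
        "1 0 obj\n<< /Type /Page >>\nendobj",
        "2 0 obj\n<< /Type /Pages /Count 1 >>\nendobj",
    ]
    model = train_bigram(samples)
    print(repr(generate(model, fuzz_prob=0.0, rng=random.Random(1))))
    print(repr(generate(model, fuzz_prob=0.05, rng=random.Random(1))))
```

In this sketch the single `fuzz_prob` knob trades off how closely generated inputs follow the learned samples against how often they deviate from them. Every character that appears in a sample is followed by at least one recorded successor, so sampling never reaches a character with no learned continuation, even after a fuzzed replacement.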
Subjects
ACM Computing Classification System: Theory of Computation: Logics and Meanings of Programs
free text keywords: Computer Science - Artificial Intelligence, Computer Science - Cryptography and Security, Computer Science - Learning, Computer Science - Programming Languages, Computer Science - Software Engineering
References (28 in total; the first 15 are listed)

1. Adobe Systems Incorporated. PDF Reference, 6th edition, Nov. 2006. Available at http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/pdf_reference_1-7.pdf.

2. Osbert Bastani, Rahul Sharma, Alex Aiken, and Percy Liang. Synthesizing program input grammars. CoRR, abs/1608.01723, 2016.

3. Sahil Bhatia and Rishabh Singh. Automated correction for syntax errors in programming assignments using recurrent neural networks. CoRR, abs/1603.06129, 2016.

4. Rudy R. Bunel, Alban Desmaison, Pawan Kumar Mudigonda, Pushmeet Kohli, and Philip H. S. Torr. Adaptive neural compilation. In NIPS, pages 1444–1452, 2016.

5. Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In EMNLP, pages 1724–1734, 2014.

6. Jan K Chorowski, Dzmitry Bahdanau, Dmitriy Serdyuk, Kyunghyun Cho, and Yoshua Bengio. Attention-based models for speech recognition. In Advances in Neural Information Processing Systems, pages 577–585, 2015.

7. K. Claessen and J. Hughes. QuickCheck: A Lightweight Tool for Random Testing of Haskell Programs. In Proceedings of ICFP'2000, 2000.

8. D. Coppit and J. Lian. yagg: an easy-to-use generator for structured test inputs. In ASE, 2005.

9. Weidong Cui, Marcus Peinado, Karl Chen, Helen J Wang, and Luis Irun-Briz. Tupni: Automatic reverse engineering of input formats. In Proceedings of the 15th ACM conference on Computer and communications security, pages 391–402. ACM, 2008.

10. Brett Daniel, Danny Dig, Kely Garcia, and Darko Marinov. Automated testing of refactoring engines. In FSE, 2007.

11. P. Godefroid, A. Kiezun, and M. Y. Levin. Grammar-based Whitebox Fuzzing. In Proceedings of PLDI'2008 (ACM SIGPLAN 2008 Conference on Programming Language Design and Implementation), pages 206–215, Tucson, June 2008.

12. P. Godefroid, M.Y. Levin, and D. Molnar. Automated Whitebox Fuzz Testing. In Proceedings of NDSS'2008 (Network and Distributed Systems Security), pages 151–166, San Diego, February 2008.

13. Rahul Gupta, Soham Pal, Aditya Kanade, and Shirish Shevade. DeepFix: Fixing common C language errors by deep learning. In AAAI, 2017.

14. K.V. Hanford. Automatic Generation of Test Cases. IBM Systems Journal, 9(4), 1970.

15. Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural computation, 9(8):1735–1780, 1997.
