Feature Engineering-Based Detection of Buffer Overflow Vulnerability in Source Code Using Neural Networks

Article, Preprint, Conference object. 01 Jun 2023. Embargo end date: 01 Jan 2023. Italy.

Publisher: IEEE

Journal: 2023 IEEE 47th Annual Computers, Software, and Applications Conference (COMPSAC)

Authors: Mst. Shapna Akter; Hossain Shahriar; Juan Rodriguez Cardenas; Sheikh Iqbal Ahamed; Alfredo Cuzzocrea

doi: 10.1109/compsac57700.2023.00106, 10.48550/arxiv.2306.07981

arXiv: 2306.07981

handle: 20.500.11770/379017

Abstract

One of the most significant challenges in software code auditing is the presence of vulnerabilities in source code. Every year, an increasing number of software flaws are discovered, either internally in proprietary code or through public disclosure. These flaws are highly likely to be exploited and can lead to system compromise, data leakage, or denial of service. To build a large-scale machine learning system for function-level vulnerability identification, we used a sizable dataset of C and C++ open-source code containing millions of functions with potential buffer overflow exploits. We have developed an efficient and scalable vulnerability detection method based on neural network models that learn features extracted from the source code. The source code is first converted into an intermediate representation to remove unnecessary components and shorten dependencies. We preserve semantic and syntactic information using state-of-the-art word embedding algorithms such as GloVe and fastText. The embedded vectors are then fed into neural networks such as LSTM, BiLSTM, LSTM Autoencoder, word2vec, BERT, and GPT2 to classify possible vulnerabilities. Furthermore, we propose a neural network model designed to overcome issues associated with traditional neural networks. We measure performance using F1 score, precision, recall, accuracy, and total execution time. Finally, we compare results obtained from features containing only a minimal text representation with results obtained from features carrying semantic and syntactic information.
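The abstract does not spell out the intermediate representation or how the metrics are computed, so the following Python sketch illustrates one common way such preprocessing is done for C functions, together with the standard binary-classification metrics named above. Everything here is an assumption for illustration, not the authors' implementation: the function names to_intermediate and classification_metrics, the C_KEYWORDS and KEPT_CALLS sets, and the VAR0/FUN0/STR/CHR placeholder scheme are all hypothetical.

```python
import re

# Hypothetical token sets for illustration; the abstract does not say which
# tokens the authors' intermediate representation keeps.
C_KEYWORDS = {
    "auto", "break", "case", "char", "const", "continue", "default", "do",
    "double", "else", "enum", "extern", "float", "for", "goto", "if", "int",
    "long", "register", "return", "short", "signed", "sizeof", "static",
    "struct", "switch", "typedef", "union", "unsigned", "void", "volatile",
    "while",
}
# Library calls kept verbatim because they are frequent buffer-overflow sinks.
KEPT_CALLS = {"gets", "memcpy", "sprintf", "strcat", "strcpy", "strncpy"}

# Alternatives are tried in order at each position, so a "//" inside a string
# literal is consumed by the string pattern and is not mistaken for a comment.
TOKEN_RE = re.compile(
    r"//[^\n]*|/\*.*?\*/"                       # comments (dropped later)
    r'|"(?:\\.|[^"\\])*"'                       # string literals
    r"|'(?:\\.|[^'\\])*'"                       # character literals
    r"|[A-Za-z_]\w*"                            # identifiers and keywords
    r"|\d[\w.]*"                                # numbers, e.g. 10, 0x1F, 1.5f
    r"|->|\+\+|--|==|!=|<=|>=|&&|\|\||<<|>>"    # multi-character operators
    r"|[^\s\w]",                                # any other punctuation
    re.S,
)


def to_intermediate(code: str) -> list[str]:
    """Drop comments and map user-defined names to generic placeholders."""
    tokens = [m.group(0) for m in TOKEN_RE.finditer(code)]
    tokens = [t for t in tokens if not t.startswith(("//", "/*"))]
    var_ids: dict[str, str] = {}
    fun_ids: dict[str, str] = {}
    out = []
    for i, tok in enumerate(tokens):
        if tok in C_KEYWORDS or tok in KEPT_CALLS:
            out.append(tok)
        elif tok[0].isalpha() or tok[0] == "_":
            # A name directly followed by "(" is treated as a function.
            if i + 1 < len(tokens) and tokens[i + 1] == "(":
                out.append(fun_ids.setdefault(tok, f"FUN{len(fun_ids)}"))
            else:
                out.append(var_ids.setdefault(tok, f"VAR{len(var_ids)}"))
        elif tok[0] == '"':
            out.append("STR")   # string contents are not kept
        elif tok[0] == "'":
            out.append("CHR")   # character literal contents are not kept
        else:
            out.append(tok)     # operators, punctuation, numbers
    return out


def classification_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Precision, recall, F1 and accuracy for binary labels (1 = vulnerable)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}
```

For example, " ".join(to_intermediate('void copy(char *dst, const char *src) { strcpy(dst, src); }')) yields "void FUN0 ( char * VAR0 , const char * VAR1 ) { strcpy ( VAR0 , VAR1 ) ; }". Numbering placeholders per function keeps the token vocabulary small, which is a typical motivation for normalizing code before training embeddings such as GloVe or fastText on it; whether the authors' representation works this way is not stated in the abstract.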

Country

Italy

Related Organizations

Kennesaw State University (United States)
Kennesaw State University Research and Service Foundation (United States)
University of Calabria (Italy)
Tuskegee University (United States)
University of Calabria, Modeling & Simulation Center - Laboratory of Enterprise Solutions (Italy)

Keywords

Software Engineering (cs.SE), FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Software Engineering, Computer Science - Cryptography and Security, Cryptography and Security (cs.CR), Machine Learning (cs.LG)

Impact by BIP!

- Selected citations: 1. Derived from selected sources; an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Popularity: Average. Reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- Influence: Average. Reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Impulse: Average. Reflects the initial momentum of an article directly after its publication, based on the underlying citation network.


Fields of Science

engineering and technology

electrical engineering, electronic engineering, information engineering

Funded by

- NSF | Authentic Learning Modules for DevOps Security Education
- NSF | Collaborative Research: SaTC: EDU: Authentic Learning of Machine Learning in Cybersecurity with Portable Hands-on Labware
- NSF | Authentic Learning Modules for DevOps Security Education
- NSF | Collaborative Research: SaTC: EDU: Authentic Learning of Machine Learning in Cybersecurity with Portable Hands-on Labware