MALGRA: Machine Learning and N-Gram Malware Feature Extraction and Detection System

Publication: Article, Other literature type
Date: 26 Oct 2020
Country: United Kingdom
Language: English
Publisher: MDPI AG
Journal: Electronics, volume 9, page 1777 (eISSN: 2079-9292)
Funded by: EC | CYBER-TRUST

Authors: Muhammad Ali; Stavros Shiaeles; Gueltoum Bendiab; Bogdan Ghita

doi: 10.3390/electronics9111777

Abstract

Detection and mitigation of modern malware are critical for the normal operation of an organisation. Traditional defence mechanisms are becoming increasingly ineffective due to the techniques used by attackers such as code obfuscation, metamorphism, and polymorphism, which strengthen the resilience of malware. In this context, the development of adaptive, more effective malware detection methods has been identified as an urgent requirement for protecting the IT infrastructure against such threats, and for ensuring security. In this paper, we investigate an alternative method for malware detection that is based on N-grams and machine learning. We use a dynamic analysis technique to extract an Indicator of Compromise (IOC) for malicious files, which are represented using N-grams. The paper also proposes TF-IDF as a novel alternative for identifying the most significant N-gram features for training a machine learning algorithm. Finally, the paper evaluates the proposed technique using various supervised machine-learning algorithms. The results show that Logistic Regression, with an accuracy of 98.4%, provides the best classification accuracy when compared to the other classifiers used.
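
The feature-extraction idea in the abstract (N-grams weighted by TF-IDF) can be illustrated with a minimal sketch. This is not the authors' implementation: the API call names and traces are hypothetical, and the TF-IDF variant shown (relative frequency times log(N / df)) is an assumption, since this record does not state the exact formula used in the paper.

```python
import math
from collections import Counter


def ngrams(calls, n):
    """Return the contiguous n-grams (as tuples) of a call sequence."""
    return [tuple(calls[i:i + n]) for i in range(len(calls) - n + 1)]


def tfidf(traces, n=2):
    """Compute TF-IDF weights for the n-grams of each trace.

    TF is the n-gram count divided by the number of n-grams in the trace.
    IDF is log(N / df), where N is the number of traces and df is the
    number of traces that contain the n-gram (an assumed, common variant).
    """
    counts_per_trace = [Counter(ngrams(t, n)) for t in traces]

    # Document frequency: in how many traces does each n-gram appear?
    df = Counter()
    for counts in counts_per_trace:
        df.update(counts.keys())

    total = len(traces)
    weights = []
    for counts in counts_per_trace:
        length = sum(counts.values())
        weights.append({
            gram: (count / length) * math.log(total / df[gram])
            for gram, count in counts.items()
        })
    return weights


# Hypothetical API-call traces, as a sandbox report might list them.
traces = [
    ["CreateFile", "WriteFile", "CloseHandle"],
    ["CreateFile", "WriteFile", "RegSetValue"],
    ["CreateFile", "ReadFile", "CloseHandle"],
]
weights = tfidf(traces, n=2)

# Print the bigrams of the first trace, most significant first.
for gram, weight in sorted(weights[0].items(), key=lambda kv: -kv[1]):
    print(gram, round(weight, 3))
```

Bigrams that occur in fewer traces receive a higher weight, which is the property that makes TF-IDF usable for ranking N-gram features before they are passed to a classifier such as Logistic Regression.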

Country

United Kingdom

Related Organizations

University of Portsmouth
United Kingdom
Plymouth University
United Kingdom
UNIVERSITY OF PORTSMOUTH HIGHER EDUCATION CORPORATION
United Kingdom

Keywords

Random Forests, Sandbox, Computer Networks and Communications, Decision Tree, Malware, Naive Bayes, Machine learning, Dynamic analysis, Logistic Regression, Electrical and Electronic Engineering, API call, SNDBOX, Control and Systems Engineering, Hardware and Architecture, Signal Processing, N-grams

Impact by BIP!

Selected citations: 59. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
Popularity: Top 1%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
Influence: Top 10%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
Impulse: Top 1%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.