Living off the Analyst: Harvesting Features from Yara Rules for Malware Detection

Keywords: FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Cryptography and Security, Cryptography and Security (cs.CR), Machine Learning (cs.LG)

Gupta, Siddhant; Lu, Fred; Barlow, Andrew; Raff, Edward; Ferraro, Francis; Matuszek, Cynthia; Nicholas, Charles; Holt, James

arXiv.org e-Print Archive
Preprint, 2024
Data sources: arXiv.org e-Print Archive

https://doi.org/10.1109/bigdat...
Article, 2024, Peer-reviewed
License: STM Policy #29
Data sources: Crossref

https://dx.doi.org/10.48550/ar...
Article, 2024
License: arXiv Non-Exclusive Distribution
Data sources: Datacite

Publication: Article, Preprint, 15 Dec 2024
Embargo end date: 01 Jan 2024
Publisher: IEEE
Journal: 2024 IEEE International Conference on Big Data (BigData)

doi: 10.1109/bigdata62323.2024.10825735, 10.48550/arxiv.2411.18516

arXiv: 2411.18516

Abstract

A strategy used by malicious actors is to "live off the land," where benign systems and tools already available on a victim's systems are used and repurposed for the malicious actor's intent. In this work, we ask if there is a way for anti-virus developers to similarly repurpose existing work to improve their malware detection capability. We show that this is plausible via YARA rules, which use human-written signatures to detect specific malware families, functionalities, or other markers of interest. By extracting sub-signatures from publicly available YARA rules, we assembled a set of features that can more effectively discriminate malicious samples from benign ones. Our experiments demonstrate that these features add value beyond traditional features on the EMBER 2018 dataset. Manual analysis of the added sub-signatures shows a power-law behavior in a combination of features that are specific and unique, as well as features that occur often. A prior expectation may be that the features would be limited in being overly specific to unique malware families. This behavior is observed, and is apparently useful in practice. We also find sub-signatures that are dual-purpose (e.g., detecting virtual machine environments) or broadly generic (e.g., DLL imports).
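
The idea of harvesting sub-signatures as features can be illustrated with a minimal sketch. This is not the authors' pipeline: the rule text, the regular expressions, and the function names `extract_subsignatures` and `featurize` are all hypothetical, written only for this example. The sketch assumes a sub-signature is a plain text string or a wildcard-free hex string from a rule's strings section, ignores modifiers such as `nocase`, and does not decode escape sequences.

```python
import re

# Hypothetical rule written for this sketch; it is not taken from the paper
# or from any public rule repository.
EXAMPLE_RULE = r'''
rule Example_Family
{
    strings:
        $s1 = "VBoxService.exe"
        $s2 = "kernel32.dll" nocase
        $h1 = { 4D 5A 90 00 }
    condition:
        2 of them
}
'''

# Simplified patterns (an assumption of this sketch, not a full YARA parser):
# quoted text strings, and hex strings made only of hex digits and whitespace.
TEXT_STRING = re.compile(r'\$\w*\s*=\s*"((?:[^"\\]|\\.)*)"')
HEX_STRING = re.compile(r'\$\w*\s*=\s*\{([0-9A-Fa-f\s]+)\}')


def extract_subsignatures(rule_text):
    """Return a sorted list of byte patterns found in the rule's strings."""
    patterns = set()
    for match in TEXT_STRING.finditer(rule_text):
        # Escape sequences are kept verbatim; modifiers are ignored.
        patterns.add(match.group(1).encode("latin-1"))
    for match in HEX_STRING.finditer(rule_text):
        # Hex strings with wildcards or jumps do not match and are skipped.
        hex_digits = "".join(match.group(1).split())
        patterns.add(bytes.fromhex(hex_digits))
    return sorted(patterns)


def featurize(sample, subsignatures):
    """Binary feature vector: 1 if the pattern occurs in the sample bytes."""
    return [1 if sig in sample else 0 for sig in subsignatures]


if __name__ == "__main__":
    sigs = extract_subsignatures(EXAMPLE_RULE)
    sample = b"MZ\x90\x00" + b"\x00" * 8 + b"kernel32.dll"
    print(sigs)
    print(featurize(sample, sigs))
```

Each sub-signature becomes one binary column, so a vector like this could in principle be appended to an existing feature representation before training a classifier. A generic pattern such as a DLL name will fire on many files, while a family-specific string will fire rarely, which is consistent with the mix of frequent and rare features the abstract describes.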

To appear in BigData'24 CyberHunt 2024

Related Organizations

University of Maryland, College Park (United States)
University of Maryland (United States)
Booz Allen Hamilton (United States)

Related research

13 research products, 10 of which are listed here. Each is software on GitHub with the relation IsRelatedTo.

- binaryalert
- YARA-rules
- ConventionEngine
- awesome-YARA
- malware-signatures
- lasagna
- signature-base
- malware-ioc
- protections-artifacts
- Burp-YARA-Rules

Impact by BIP!

selected citations: 0. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
