Publication type: Part of book or chapter of book, Conference object, Article
Published: 01 Jan 2011
Publisher: Springer Berlin Heidelberg

Authors: Ahmed, Irfan; Lhee, Kyung-Suk; Shin, Hyun-Jung; Hong, Man-Pyo

doi: 10.1007/978-3-642-24212-0_5

Fast Content-Based File Type Identification

Abstract

Digital forensic examiners often need to identify the type of a file or file fragment based on the content of the file. Content-based file type identification schemes typically use a byte frequency distribution with statistical machine learning to classify file types. Most algorithms analyze the entire file content to obtain the byte frequency distribution, a technique that is inefficient and time consuming. This paper proposes two techniques for reducing the classification time. The first technique selects a subset of features based on the frequency of occurrence. The second speeds up classification by randomly sampling file blocks. Experimental results demonstrate that up to a fifteen-fold reduction in computational time can be achieved with limited impact on accuracy.
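
The two ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `sampled_byte_frequencies` and `top_k_features`, the block size, the number of sampled blocks, and the rule of keeping the k most frequent byte values are all assumptions chosen for illustration. The sketch builds a byte frequency distribution from a random sample of blocks instead of the whole file, then keeps only a subset of the 256 byte features.

```python
import random
from collections import Counter


def sampled_byte_frequencies(data: bytes, block_size: int = 512,
                             n_blocks: int = 8, seed: int = 0) -> list[float]:
    """Normalized 256-bin byte histogram built from randomly sampled blocks.

    block_size, n_blocks and seed are illustrative defaults, not values
    taken from the paper.
    """
    # Split the content into fixed-size blocks (the last one may be shorter).
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    if not blocks:
        return [0.0] * 256

    # Sample blocks without replacement; use all blocks if the file is small.
    rng = random.Random(seed)
    chosen = rng.sample(blocks, min(n_blocks, len(blocks)))

    # Count byte values only in the sampled blocks.
    counts = Counter()
    for block in chosen:
        counts.update(block)  # iterating over bytes yields ints 0..255

    total = sum(counts.values())
    return [counts[b] / total for b in range(256)]


def top_k_features(freqs: list[float], k: int) -> list[int]:
    """Byte values with the k highest frequencies (ties broken by byte value).

    This is one simple frequency-based selection rule, assumed here for
    illustration only.
    """
    return sorted(range(256), key=lambda b: (-freqs[b], b))[:k]
```

A classifier would then be trained on the reduced feature vector, for example `[freqs[b] for b in top_k_features(freqs, 40)]`, where 40 is an arbitrary choice. Reading fewer blocks and using fewer features both reduce the work per file, which is the kind of trade-off between time and accuracy the abstract describes.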

Related Organizations

Ajou University, Korea (Republic of)
Queensland University of Technology, Australia
Institute of Information Security, Japan

Keywords

[INFO] Computer Science [cs], file content classification, File type identification, byte frequency

Impact by BIP!

- Selected citations: 22. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Popularity: Top 10%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- Influence: Top 10%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Impulse: Top 10%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.