Feature selection based file type identification algorithm

Publication type: Article
Date: 01 Oct 2010
Publisher: IEEE
Journal: 2010 IEEE International Conference on Intelligent Computing and Intelligent Systems

Authors: Ding Cao; Junyong Luo; Meijuan Yin; Huijie Yang

doi: 10.1109/icicisys.2010.5658559

Abstract

Identifying the true type of an arbitrary file is important in information security. Methods based on file extensions or magic numbers are easily spoofed; analyzing the file's binary content is more reliable. We propose an algorithm that builds a model for each file type by applying n-gram analysis to the binary contents of a set of known input files. We design a novel feature selection evaluation function to extract signatures from the models, and then use the signatures to recognize the true type of unknown files. The approach deliberately avoids relying on the structure or keywords of any specific file type, so that it can be applied to file types in general. Experiments show that the proposed approach is promising, especially when the feature selection evaluation function is applied.
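The pipeline described in the abstract (per-type n-gram models, signature extraction, then matching) might be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the paper's feature selection evaluation function, so the `score` used here (frequency in the target type minus the highest frequency in any other type) is a hypothetical stand-in, and the function names, the bigram default, and the signature size `k` are likewise assumptions.

```python
from collections import Counter


def ngram_counts(data: bytes, n: int = 2) -> Counter:
    """Count every overlapping byte n-gram in a file's binary content."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def build_model(samples: list, n: int = 2) -> Counter:
    """Average the relative n-gram frequencies over known files of one type."""
    model = Counter()
    for data in samples:
        counts = ngram_counts(data, n)
        total = sum(counts.values())
        if total == 0:
            continue  # file shorter than n bytes contributes nothing
        for gram, count in counts.items():
            model[gram] += count / total / len(samples)
    return model


def select_signature(models: dict, file_type: str, k: int = 8) -> list:
    """Keep the k n-grams that best separate file_type from the other types.

    ASSUMPTION: this score is a placeholder, not the paper's evaluation function.
    """
    own = models[file_type]
    others = [m for t, m in models.items() if t != file_type]

    def score(gram: bytes) -> float:
        rest = max((m.get(gram, 0.0) for m in others), default=0.0)
        return own[gram] - rest

    return sorted(own, key=score, reverse=True)[:k]


def identify(data: bytes, signatures: dict, n: int = 2) -> str:
    """Return the type whose signature n-grams cover most of the unknown file."""
    counts = ngram_counts(data, n)
    total = sum(counts.values()) or 1

    def match(file_type: str) -> float:
        return sum(counts.get(g, 0) / total for g in signatures[file_type])

    return max(signatures, key=match)


# Toy training data; real use would read known files from disk.
training = {
    "html": [b"<html><body><p>hi</p></body></html>",
             b"<html><head></head><body></body></html>"],
    "csv": [b"1,2,3\n4,5,6\n", b"7,8,9\n10,11,12\n"],
}
models = {t: build_model(files) for t, files in training.items()}
signatures = {t: select_signature(models, t) for t in models}
```

With these toy models, `identify(b"<html><body>test</body></html>", signatures)` selects the HTML signature, because none of the CSV signature bigrams occur in that input. No file extension or magic number is consulted at any point, which is the property the abstract emphasizes.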

Related Organizations

Institute of Scientific and Technical Information, China (People's Republic of)
Chinese Academy of Tropical Agricultural Sciences, China (People's Republic of)

Impact by BIP!

Selected citations: 2
Popularity: Average
Influence: Average
Impulse: Average

Fields of Science

engineering and technology

electrical engineering, electronic engineering, information engineering
