An Adaptive Fusion Algorithm for Spam Detection

Publication: Article, 01 Jul 2014
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Journal: IEEE Intelligent Systems, volume 29, pages 2-8 (ISSN: 1541-1672, eISSN: 1941-1294)

Authors: Congfu Xu; Baojun Su; Yunbiao Cheng; Weike Pan; Li Chen

doi: 10.1109/mis.2013.54

Abstract

Spam detection has become a critical component of online systems such as email services, advertising engines, and social media sites. Using email services as an example, the authors present an adaptive fusion algorithm for spam detection (AFSD), a general, content-based approach that can be applied to nonemail spam detection tasks with little additional effort. The proposed algorithm represents an email with n-grams of nontokenized text strings, introduces a link function that makes the prediction scores of the online learners more comparable, trains the online learners in a mistake-driven manner via thick thresholding to obtain highly competitive online learners, and designs update rules that adaptively integrate the online learners to capture different aspects of spam. The prediction performance of AFSD is studied on five public competition datasets and one industry dataset, where the algorithm achieves significantly better results than several state-of-the-art approaches, including the champion solutions of the corresponding competitions.
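
The four ingredients named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's link function, learners, or update rules, so everything here is an assumption made for illustration: the names `char_ngrams`, `link`, `OnlineLearner`, and `AdaptiveFusion` are hypothetical, the link function is a plain logistic squashing, the learner is a perceptron-style linear model, and the fusion weights follow a simple multiplicative-decay rule that stands in for the authors' update rules.

```python
import math


def char_ngrams(text, n):
    """Character n-grams taken from the raw, nontokenized string."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def link(score):
    """Assumed link function: logistic squashing of a raw score into (0, 1)."""
    score = max(-30.0, min(30.0, score))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-score))


class OnlineLearner:
    """Perceptron-style learner over binary n-gram features (an assumption)."""

    def __init__(self, n, lr=0.5, margin=1.0):
        self.n, self.lr, self.margin = n, lr, margin
        self.w = {}  # n-gram -> weight

    def score(self, text):
        return sum(self.w.get(g, 0.0) for g in char_ngrams(text, self.n))

    def update(self, text, y):
        # Thick thresholding: update on mistakes and also on correct
        # predictions whose score lies within `margin` of the boundary.
        if y * self.score(text) <= self.margin:
            for g in char_ngrams(text, self.n):
                self.w[g] = self.w.get(g, 0.0) + self.lr * y


class AdaptiveFusion:
    """Weighted combination of learners with multiplicative weight decay."""

    def __init__(self, learners, beta=0.8):
        self.learners = learners
        self.alpha = [1.0 / len(learners)] * len(learners)
        self.beta = beta  # decay applied to a learner that errs

    def predict(self, text):
        """Fused spam probability: weighted mean of linked learner scores."""
        return sum(a * link(l.score(text))
                   for a, l in zip(self.alpha, self.learners))

    def learn(self, text, y):
        """One online step; y is +1 for spam and -1 for ham."""
        for k, learner in enumerate(self.learners):
            predicted_spam = link(learner.score(text)) >= 0.5
            if predicted_spam != (y > 0):
                self.alpha[k] *= self.beta  # down-weight the mistaken learner
            learner.update(text, y)
        total = sum(self.alpha)
        self.alpha = [a / total for a in self.alpha]


# Toy usage: two learners that see 3-grams and 4-grams respectively.
model = AdaptiveFusion([OnlineLearner(3), OnlineLearner(4)])
stream = [
    ("WIN free cash now!!!", +1),
    ("free cash prize, click now", +1),
    ("meeting at noon tomorrow", -1),
    ("see you at lunch tomorrow", -1),
]
for _ in range(3):
    for text, label in stream:
        model.learn(text, label)
print(model.predict("free cash now"), model.predict("lunch meeting tomorrow"))
```

Giving each learner a different n-gram order is one simple way to let the learners capture different aspects of a message. The logistic link puts their otherwise unbounded scores on a common (0, 1) scale before they are combined.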

Related Organizations

Shenzhen University
China (People's Republic of)
Zhejiang Ocean University
China (People's Republic of)
Hong Kong Baptist University
China (People's Republic of)

Impact by BIP!

selected citations: 7. These citations are derived from selected sources. This is an alternative to the "influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
influence: Top 10%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
impulse: Top 10%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.