Prophiler: a fast filter for the large-scale detection of malicious web pages

Conference article (English)
Canali, Davide; Cova, Marco; Vigna, Giovanni; Kruegel, Christopher (2011)
  • Publisher: ACM
  • DOI: 10.1145/1963405.1963436
  • Subject: malicious web page analysis | drive-by-download exploits | efficient web page filtering

Malicious web pages that host drive-by-download exploits have become a popular means for compromising hosts on the Internet and, subsequently, for creating large-scale botnets. In a drive-by-download exploit, an attacker embeds a malicious script (typically written in JavaScript) into a web page. When a victim visits this page, the script is executed and attempts to compromise the browser or one of its plugins. To detect drive-by-download exploits, researchers have developed a number of systems that analyze web pages for the presence of malicious code. Most of these systems use dynamic analysis. That is, they run the scripts associated with a web page either directly in a real browser (running in a virtualized environment) or in an emulated browser, and they monitor the scripts' executions for malicious activity. While these tools are quite precise, the analysis process is costly, often requiring on the order of tens of seconds for a single page. Therefore, performing this analysis on a large set of web pages containing hundreds of millions of samples can be prohibitive.
One approach to reducing the resources required for large-scale analysis of malicious web pages is to develop a fast and reliable filter that can quickly discard pages that are benign, forwarding to the costly analysis tools only the pages that are likely to contain malicious code. In this paper, we describe the design and implementation of such a filter. Our filter, called Prophiler, uses static analysis techniques to quickly examine a web page for malicious content. This analysis takes into account features derived from the HTML contents of a page, from the associated JavaScript code, and from the corresponding URL. We derive detection models over these features automatically, using machine-learning techniques applied to labeled datasets.
To demonstrate the effectiveness and efficiency of Prophiler, we crawled and collected millions of pages, which we analyzed for malicious behavior. Our results show that our filter is able to reduce the load on more costly dynamic analysis tools by more than 85%, with a negligible number of missed malicious pages.
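The abstract describes a pipeline of static feature extraction over a page's HTML, JavaScript, and URL, followed by a learned classifier that forwards only likely-malicious pages to dynamic analysis. A minimal sketch of that idea is shown below; the feature names and the hand-set linear score are illustrative assumptions, not the paper's actual feature set or trained models.

```python
import re
from urllib.parse import urlparse

# Hypothetical static features in the spirit of Prophiler's three feature
# classes (HTML, JavaScript, URL); the real system uses a far richer set.
def extract_features(html: str, url: str) -> dict:
    scripts = re.findall(r"<script\b[^>]*>(.*?)</script>", html, re.S | re.I)
    js = "\n".join(scripts)
    host = urlparse(url).hostname or ""
    return {
        "num_scripts": len(scripts),
        "num_iframes": len(re.findall(r"<iframe\b", html, re.I)),
        "js_eval_calls": js.count("eval("),
        "js_unescape_calls": js.count("unescape("),
        "max_js_string_len": max(
            (len(s) for s in re.findall(r'"([^"]*)"', js)), default=0),
        "url_length": len(url),
        "host_is_ip": bool(re.fullmatch(r"[\d.]+", host)),
    }

# Stand-in for the machine-learned models: a hand-set linear score with a
# low threshold, mimicking a filter tuned to miss few malicious pages.
def is_suspicious(f: dict, threshold: float = 1.0) -> bool:
    score = (0.5 * f["js_eval_calls"]
             + 0.5 * f["js_unescape_calls"]
             + 0.3 * f["num_iframes"]
             + (1.0 if f["host_is_ip"] else 0.0)
             + (0.5 if f["max_js_string_len"] > 200 else 0.0))
    return score >= threshold

benign = extract_features("<html><body>hello</body></html>",
                          "http://example.com/")
shady = extract_features(
    '<html><script>eval(unescape("%61%6c"));</script>'
    '<iframe src="http://203.0.113.5/x"></iframe></html>',
    "http://203.0.113.5/landing")
```

In a deployment matching the abstract, only pages for which `is_suspicious` returns `True` would be handed to the slow dynamic-analysis tool; in the paper this decision is made by trained classifiers rather than fixed weights.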