Robust URL Classification With Generative Adversarial Networks

Publication · Article · 25 Jan 2019 · English · Publisher: Association for Computing Machinery (ACM) · Journal: ACM SIGMETRICS Performance Evaluation Review, volume 46, pages 143-146 (ISSN: 0163-5999)

Authors: Martino Trevisan; Idilio Drago;

doi: 10.1145/3308897.3308959

handle: 11368/3025221, 11583/2723875, 2318/1767140

Abstract

Classifying URLs is essential for applications such as parental control, URL filtering, and ad/tracking protection. Such systems have historically identified URLs by means of regular expressions, although machine learning alternatives have been proposed to avoid the time-consuming maintenance of classification rules. Classical machine learning algorithms, however, require large samples of URLs to train the models, covering the diverse classes of URLs (i.e., a ground truth), which to some extent limits the applicability of the approach. Here we take a first step toward using Generative Adversarial Networks (GANs) to classify URLs. GANs are attractive for this problem for two reasons. First, GANs can produce samples of URLs belonging to specific classes even when trained on a limited set, yielding both synthetic traces and a robust discriminator. Second, a GAN can be trained to discriminate a class of URLs without being exposed to all other URL classes; that is, GANs remain robust even when uninteresting URL classes are absent during training. Experiments on real data show that the generated synthetic traces are realistic to some extent, and that URL classification with GANs is accurate.
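The abstract does not describe the network architectures or how URLs are encoded. As a rough illustration of the core idea (train a GAN on a single URL class only, then reuse its discriminator as a classifier), here is a minimal, stdlib-only Python sketch. Everything in it is an assumption rather than the authors' method: the hashed character-bigram encoding (`featurize`), the hypothetical `TinyGAN` with a linear generator and a logistic-regression discriminator in place of neural networks, the hyperparameters, and the toy URLs.

```python
import math
import random

DIM = 16   # hypothetical feature size (hashed character bigrams)
NOISE = 4  # hypothetical generator noise dimension


def featurize(url: str, dim: int = DIM) -> list[float]:
    """Hash character bigrams into an L1-normalized vector (illustrative encoding)."""
    vec = [0.0] * dim
    for a, b in zip(url, url[1:]):
        vec[(ord(a) * 31 + ord(b)) % dim] += 1.0
    total = sum(vec) or 1.0  # avoid division by zero for URLs shorter than 2 chars
    return [v / total for v in vec]


def sigmoid(t: float) -> float:
    t = max(-60.0, min(60.0, t))  # clamp so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(-t))


class TinyGAN:
    """Hypothetical toy GAN: linear generator vs. logistic-regression discriminator,
    trained with hand-derived gradients instead of neural networks."""

    def __init__(self, seed: int = 0) -> None:
        self.rng = random.Random(seed)
        self.A = [[self.rng.gauss(0.0, 0.1) for _ in range(NOISE)] for _ in range(DIM)]
        self.c = [1.0 / DIM] * DIM  # generator bias
        self.w = [0.0] * DIM        # discriminator weights
        self.b = 0.0                # discriminator bias

    def _sample(self) -> tuple[list[float], list[float]]:
        z = [self.rng.gauss(0.0, 1.0) for _ in range(NOISE)]
        g = [sum(a * zj for a, zj in zip(row, z)) + ci for row, ci in zip(self.A, self.c)]
        return g, z

    def generate(self) -> list[float]:
        """Draw one synthetic feature vector from the generator."""
        return self._sample()[0]

    def score(self, x: list[float]) -> float:
        """Discriminator output: estimated probability that x is a real class sample."""
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

    def train_step(self, real: list[float], lr: float = 0.1) -> None:
        # Discriminator: gradient ascent on log D(real) + log(1 - D(fake)).
        fake, _ = self._sample()
        d_real, d_fake = self.score(real), self.score(fake)
        for i in range(DIM):
            self.w[i] += lr * ((1.0 - d_real) * real[i] - d_fake * fake[i])
        self.b += lr * ((1.0 - d_real) - d_fake)
        # Generator: gradient ascent on log D(G(z)) (non-saturating objective).
        fake, z = self._sample()
        coef = 1.0 - self.score(fake)
        for i in range(DIM):
            grad_i = coef * self.w[i]  # d log D(fake) / d fake[i]
            self.c[i] += lr * grad_i
            for j in range(NOISE):
                self.A[i][j] += lr * grad_i * z[j]


def train_one_class(urls: list[str], epochs: int = 100, seed: int = 0) -> TinyGAN:
    """Train on URLs of the class of interest only; no other class is ever seen."""
    gan = TinyGAN(seed)
    data = [featurize(u) for u in urls]
    for _ in range(epochs):
        for x in data:
            gan.train_step(x)
    return gan


if __name__ == "__main__":
    # Hypothetical toy URLs standing in for one class (e.g. tracking URLs).
    trackers = ["ads.example.com/pixel?id=1", "ads.example.com/pixel?id=2",
                "track.example.net/beacon?u=7"]
    gan = train_one_class(trackers)
    for url in ["ads.example.com/pixel?id=9", "docs.python.org/3/library/"]:
        print(url, round(gan.score(featurize(url)), 3))
```

The logistic discriminator keeps the gradients short enough to derive by hand; a practical version would use a deep learning framework with neural networks for both players, and would calibrate a decision threshold on the discriminator score using held-out data. Whether the trained discriminator separates classes well depends on the training dynamics, which this toy does not guarantee.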

Keywords

Generative Adversarial Networks; Machine Learning; Neural Networks; URL generation

Related research products

URL-generator software on GitHub (relation: IsRelatedTo)

Impact by BIP!

- Selected citations: 14. Citations derived from selected sources; an alternative to the "Influence" indicator.
- Popularity: Top 10%. The "current" impact/attention (the "hype") of the article in the research community at large, based on the underlying citation network.
- Influence: Top 10%. The overall/total impact of the article in the research community at large, based on the underlying citation network (diachronically).
- Impulse: Top 10%. The initial momentum of the article directly after its publication, based on the underlying citation network.