
Classification of search queries is a complex and computationally challenging task. Search queries are typically short and reveal very few features per query, which makes them a weak source for traditional machine learning. In this paper, we present a method that combines limited manual labeling, computational linguistics and information retrieval to classify a large collection of Web search queries. A small set of manually chosen terms, known a priori to be of interest to a particular class, is used to cull a small number of actual queries from a commercial search engine log. These queries are then submitted to a commercial search engine, and the returned search results are used to find more class-related terms. We examine the classification proficiency of the proposed method on a large Web search engine query log and show that up to 48% of the unlabeled set could be classified using this method. We discuss the results of this research and its implications for the advancement of short text classification.
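The pipeline outlined in the abstract (seed terms, culled queries, search results, expanded term set) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the paper's implementation: the function names `cull_queries`, `expand_terms` and `classify` are hypothetical, tokenization is plain whitespace splitting, new terms are picked by raw frequency in the returned snippets, and `search` is a stand-in callable for the commercial search engine.

```python
from collections import Counter


def cull_queries(query_log, seed_terms):
    """Keep the queries that contain at least one manually chosen seed term."""
    seeds = {t.lower() for t in seed_terms}
    return [q for q in query_log if seeds & set(q.lower().split())]


def expand_terms(culled, search, seed_terms, top_k=1, stopwords=frozenset()):
    """Submit culled queries to `search` and add the most frequent new terms.

    `search` is a hypothetical callable mapping a query to a list of result
    snippets; frequency-based selection is an assumption of this sketch.
    """
    seeds = {t.lower() for t in seed_terms}
    counts = Counter()
    for query in culled:
        for snippet in search(query):
            counts.update(w for w in snippet.lower().split() if w not in stopwords)
    new_terms = [w for w, _ in counts.most_common() if w not in seeds][:top_k]
    return seeds | set(new_terms)


def classify(query_log, terms):
    """Assign a query to the class if it contains any term of the expanded set."""
    return [q for q in query_log if terms & set(q.lower().split())]


# Toy data standing in for a query log and for search engine results.
query_log = [
    "cheap flights paris",
    "hotel deals rome",
    "python list sort",
    "airfare to tokyo",
    "rome weather",
]
fake_results = {
    "cheap flights paris": ["book airfare and flights online", "airfare deals"],
    "hotel deals rome": ["hotel booking and airfare bundles"],
}


def fake_search(query):
    return fake_results.get(query, [])


seed_terms = ["flights", "hotel"]
culled = cull_queries(query_log, seed_terms)
terms = expand_terms(culled, fake_search, seed_terms, top_k=1, stopwords={"and"})
labeled = classify(query_log, terms)
```

In this toy run the query "airfare to tokyo" contains no seed term and is only picked up after the term set has been expanded from the search results, which is the effect the abstract describes.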
Information Retrieval, Classification Schemes, 006, Computer Networks, Man-Machine Systems, User Interfaces
| selected citations | 4 |
| popularity | Average |
| influence | Average |
| impulse | Average |
