Efficient cascade ranking for information retrieval

Web search services play a central role in modern society, providing access to information and knowledge. Relevance ranking for web search is a highly imbalanced problem, where non-relevant documents far outnumber those that are relevant to a user's particular information need. Advanced ranking methods such as Learning to Rank provide users with accurate and relevant results, albeit with greater demand for computing resources. This increases hardware and electricity costs for search providers and affects quality of service for end users. Furthermore, many Learning to Rank techniques regard the issue of reducing the computational load as a separate problem that is orthogonal to ranking result quality. This thesis investigates new techniques in cascade ranking for finding the right balance between efficiency and effectiveness in large-scale search systems. Cascade ranking employs a sequence of increasingly complex ranking models to progressively prune out less promising documents and refine the relevance ranking of those that remain. Combining cost-sensitive learning with document pruning over multiple ranking stages provides a greater degree of flexibility within the joint trade-off space of efficiency and effectiveness. It allows search providers to ask composite questions of the ranking models themselves, such as: what is the right balance of rank quality and query throughput for a relevance ranking model, given the operational context in which it will be situated? Several contributions toward cost-sensitive cascade ranking are presented. Our approach relies on the fact that earlier stages see the full candidate list, but the majority of those documents will not be relevant and will not even be scored in later stages. Therefore, a trade-off has to be made between extracting cheap (or cost-efficient) features early on and maximizing effectiveness in the final stages of the retrieval process.
More specifically, this thesis (a) collates previous work on algorithms for cost-sensitive Learning to Rank; (b) critically analyzes current methods for efficient cascade ranking; (c) presents a framework for performing cost-sensitive feature selection in the cascade ranking setting; (d) investigates the current limitations of feature extraction tooling and reproducible Learning to Rank dataset construction, and proposes a solution; (e) derives a new method for jointly optimizing a cascade of Learning to Rank models via document instance weighting that maximizes training data use for cascade learning.
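
The cascade idea described above, in which cheap rankers score the full candidate list and costlier rankers see only the survivors, can be illustrated with a minimal sketch. This is not the method developed in the thesis: the `Stage` type, the feature names `bm25` and `expensive`, the per-document costs, and the cutoffs are all hypothetical values chosen only to show how pruning changes the total scoring cost.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Doc = Dict[str, float]  # hypothetical representation: feature name -> value


@dataclass
class Stage:
    score: Callable[[Doc], float]  # ranking model used at this stage
    cost_per_doc: float            # assumed cost to extract features and score one document
    keep: int                      # number of documents passed to the next stage


def run_cascade(docs: List[Doc], stages: List[Stage]) -> Tuple[List[Doc], float]:
    """Rank and prune candidates stage by stage, accumulating the assumed cost."""
    candidates = list(docs)
    total_cost = 0.0
    for stage in stages:
        # Every surviving candidate is scored at this stage, so cost scales with list size.
        total_cost += stage.cost_per_doc * len(candidates)
        candidates.sort(key=stage.score, reverse=True)
        candidates = candidates[: stage.keep]  # prune less promising documents
    return candidates, total_cost


# Toy collection of 100 documents with made-up feature values.
docs = [
    {"id": float(i), "bm25": float(i % 7), "expensive": float((i * 3) % 11)}
    for i in range(100)
]

stages = [
    Stage(score=lambda d: d["bm25"], cost_per_doc=1.0, keep=20),                    # cheap first pass
    Stage(score=lambda d: d["bm25"] + d["expensive"], cost_per_doc=10.0, keep=5),  # costly re-rank
]

top, cost = run_cascade(docs, stages)
single_stage_cost = 10.0 * len(docs)  # cost if the costly model scored every document
print(len(top), cost, single_stage_cost)
```

Under these assumed costs the two-stage cascade spends 100 × 1 + 20 × 10 = 300 units, against 1000 units if the costly model scored every document. The risk is that a relevant document pruned in the first stage can never be recovered later, which is the efficiency and effectiveness trade-off the abstract refers to.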

Keywords

Information retrieval and web search
