Beyond bags of words

Name: Beyond bags of words
Creator: Metzler, Donald A.
Keywords: 0202 electrical engineering, electronic engineering, information engineering, 006, 02 engineering and technology, 16. Peace & justice, Computer science



ACM SIGIR Forum

Article . 2008 . Peer-reviewed

License: https://www.acm.org/publications/policies/copyright_policy#Background

Data sources: Crossref

DBLP

Article

Data sources: DBLP

https://dx.doi.org/10.1145/139...

Article

Data sources: Microsoft Academic Graph

Beyond bags of words: effectively modeling dependence and features in information retrieval

Publication: Article, 01 Jun 2008, United States, English
Publisher: Association for Computing Machinery (ACM)
Journal: ACM SIGIR Forum, volume 42, pages 77-77 (ISSN: 0163-5840)

Authors: Metzler, Donald A.

doi: 10.1145/1394251.1394271


Abstract

Current state-of-the-art information retrieval models treat documents and queries as bags of words. There have been many attempts to go beyond this simple representation. Unfortunately, few have shown consistent improvements in retrieval effectiveness across a wide range of tasks and data sets. Here, we propose a new statistical model for information retrieval based on Markov random fields. The proposed model goes beyond the bag-of-words assumption by allowing dependencies between terms to be incorporated into the model. It also allows a variety of textual and non-textual features to be easily combined under the umbrella of a single model. Within this framework, we explore the theoretical issues involved, parameter estimation, feature selection, and query expansion. We give experimental results from a number of information retrieval tasks, such as ad hoc retrieval and web search.
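
To make the idea of term dependence concrete, the following is a minimal sketch and not the formulation from the work itself. It assumes a log-linear score that sums weighted features over single query terms and over adjacent query-term pairs. The names `log_feature`, `mrf_score`, and `rank`, the add-one smoothing, and the weights `w_term` and `w_pair` are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def log_feature(count, doc_len):
    # Hypothetical feature function: add-one smoothing keeps the log finite
    # when a term or term pair does not occur in the document.
    return math.log((count + 1.0) / (doc_len + 1.0))


def mrf_score(query_terms, doc_terms, w_term=0.8, w_pair=0.2):
    # Log-linear score over two kinds of cliques (an assumption of this sketch):
    #   1. single query terms (the bag-of-words part)
    #   2. adjacent query-term pairs matched in order (the dependence part)
    doc_len = len(doc_terms)
    unigrams = Counter(doc_terms)
    bigrams = Counter(zip(doc_terms, doc_terms[1:]))

    score = 0.0
    for term in query_terms:
        score += w_term * log_feature(unigrams[term], doc_len)
    for pair in zip(query_terms, query_terms[1:]):
        score += w_pair * log_feature(bigrams[pair], doc_len)
    return score


def rank(query_terms, docs):
    # docs maps a document name to its token list; best-scoring name first.
    return sorted(docs, key=lambda name: mrf_score(query_terms, docs[name]),
                  reverse=True)


docs = {
    "a": "markov random field model".split(),
    "b": "random markov model field".split(),
}
query = ["markov", "random"]
print(rank(query, docs))
```

Both toy documents contain the same words, so a pure bag-of-words score (`w_pair=0`) cannot separate them. The pair feature prefers document "a", where the query terms appear adjacent and in order.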

Country

United States

Related Organizations

University of Massachusetts System
United States
University of Massachusetts Amherst
United States


Impact by BIP!

- selected citations: 14. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- influence: Top 10%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- impulse: Top 10%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.