Querying the World Wide Web

Publication type: Article, Conference object

Date: 01 Apr 1997

Language: English

Publisher: Springer Science and Business Media LLC

Journal: International Journal on Digital Libraries, volume 1, pages 54-67 (ISSN: 1432-5012)

Funded by: NSERC | unidentified

Authors: Alberto O. Mendelzon; George A. Mihaila; Tova Milo

doi: 10.1007/s007990050004, 10.1109/pdis.1996.568671

Abstract

The World Wide Web is a large, heterogeneous, distributed collection of documents connected by hypertext links. The most common technology currently used for searching the Web depends on sending information retrieval requests to "index servers". One problem with this is that these queries cannot exploit the structure and topology of the document network. The authors propose a query language, WebSQL, that takes advantage of multiple index servers without requiring users to know about them, and that integrates textual retrieval with structure and topology-based queries. They give a formal semantics for WebSQL using a calculus based on a novel "virtual graph" model of a document network. They propose a new theory of query cost based on the idea of "query locality," that is, how much of the network must be visited to answer a particular query. Finally, they describe a prototype implementation of WebSQL written in Java.
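The idea of "query locality" can be illustrated with a small sketch. The Python below is not WebSQL and does not reproduce the authors' calculus or cost theory. It is a toy model under stated assumptions: the `PAGES` dictionary, the `example` URLs, the `is_local` and `query` helpers, and the use of "number of documents visited" as a stand-in for query cost are all hypothetical names and simplifications introduced here. It combines a textual condition (keyword match) with a topological one (which links may be followed) and reports how much of the graph had to be visited.

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical toy document network: URL -> (text, outgoing links).
PAGES = {
    "http://a.example/index": (
        "home page about databases",
        ["http://a.example/papers", "http://b.example/"],
    ),
    "http://a.example/papers": (
        "papers on query languages",
        ["http://a.example/index"],
    ),
    "http://b.example/": ("remote site about query languages", []),
}


def is_local(src, dst):
    """Assumption: a link is 'local' when both URLs share a host."""
    return urlparse(src).netloc == urlparse(dst).netloc


def query(start, keyword, follow_global=False):
    """Breadth-first search from `start` for documents containing `keyword`.

    Returns (matching URLs, number of documents visited). The visit count
    is a crude stand-in for query cost, not the paper's cost measure.
    """
    visited = set()
    matches = []
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if url in visited or url not in PAGES:
            continue
        visited.add(url)
        text, links = PAGES[url]
        if keyword in text:
            matches.append(url)
        for link in links:
            # Topological restriction: optionally stay on the same host.
            if follow_global or is_local(url, link):
                queue.append(link)
    return matches, len(visited)


local_result = query("http://a.example/index", "query languages")
global_result = query("http://a.example/index", "query languages", follow_global=True)
```

In this toy graph, restricting the traversal to local links visits fewer documents than following every link, which is the intuition behind asking how much of the network a query must touch.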

Related Organizations

Tel Aviv University, Israel
University of Toronto, Canada

Impact by BIP!

Selected citations: 221. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).

Popularity: Top 10%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.

Influence: Top 0.1%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).

Impulse: Top 1%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.