
Indexing and ranking are two key factors for efficient and effective XML information retrieval. Inappropriate indexing may result in false negatives and false positives, and improper ranking may lead to low precisions. In this paper, we propose a configurable XML information retrieval system, in which users can configure appropriate index types for XML tags and text contents. Based on users' index configurations, the system transforms XML structures into a compact tree representation, Ctree, and indexes XML text contents. To support XML ranking, we propose the concepts of "weighted term frequency" and "inverted element frequency," where the weight of a term depends on its frequency and location within an XML element as well as its popularity among similar elements in an XML dataset. We evaluate the effectiveness of our system through extensive experiments on the INEX 03 dataset and 30 content and structure (CAS) topics. The experimental results reveal that our system has significantly high precision at low recall regions and achieves the highest average precision (0.3309) as compared with 38 official INEX 03 submissions using the strict evaluation metric.
| citations This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 30 | |
| popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Average | |
| influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Top 10% | |
| impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Top 10% |
