
Adapting keyword search to XML data has attracted considerable attention recently, under the general name XML Keyword Search (XKS). Its fundamental task is to retrieve meaningful and concise results for a given keyword query; [1] is the latest work, and it returns the fragments rooted at the SLCA (Smallest LCA, where LCA stands for Lowest Common Ancestor) nodes. To guarantee that the fragments contain only meaningful nodes, [1] proposed a contributor-based filtering mechanism in its MaxMatch algorithm. However, this filtering mechanism is not sufficient: it suffers from the false positive problem (discarding interesting nodes) and the redundancy problem (keeping uninteresting nodes). In this paper, we propose a new filtering mechanism to overcome these two problems. Its fundamental concept is the valid contributor. A child v is a valid contributor to its parent u if (1) v's label is unique among all of u's children, or (2) among the siblings with the same label as v, none has content that covers v's content. Our new filtering mechanism requires that every node in each retrieved fragment be a valid contributor to its parent. This not only satisfies the axiomatic properties proposed by [1], but also makes the filtered fragments more meaningful and concise. We implement our proposal in ValidMatch and compare ValidMatch with MaxMatch on real and synthetic XML data. The results verify our claims and show the effectiveness of our valid-contributor-based filtering mechanism.
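The valid-contributor rule stated above can be illustrated with a minimal Python sketch. It is not the ValidMatch implementation. The `Node` class, the representation of a node's content as a set of items, and the reading of "covered" as set containment are all assumptions made for illustration. In particular, the abstract does not say how siblings with identical content are treated; the sketch assumes the first one in document order is kept.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical fragment node; not a type taken from the paper."""
    label: str
    # Assumption: a node's content is modelled as a set of items
    # (for example, the query keywords matched in its subtree).
    content: frozenset = frozenset()
    children: list = field(default_factory=list)


def valid_children(parent):
    """Return the children of `parent` that are valid contributors."""
    kept = []
    for i, v in enumerate(parent.children):
        covered = False
        for j, s in enumerate(parent.children):
            if i == j or s.label != v.label:
                continue
            # Assumption: s covers v if v's content is a strict subset
            # of s's content, or if both are equal and s comes first.
            if v.content < s.content or (v.content == s.content and j < i):
                covered = True
                break
        # Condition (1) needs no special case: a child with a unique
        # label has no same-label sibling, so it is never covered.
        if not covered:
            kept.append(v)
    return kept


def prune(node):
    """Keep only valid contributors, recursively, in a copied fragment."""
    return Node(node.label, node.content,
                [prune(c) for c in valid_children(node)])


# Usage: the second and fourth "player" children are covered by the first.
root = Node("team", frozenset(), [
    Node("player", frozenset({"a", "b"})),
    Node("player", frozenset({"a"})),
    Node("name", frozenset({"c"})),
    Node("player", frozenset({"a", "b"})),
])
pruned = prune(root)
print([c.label for c in pruned.children])
```

Pruning top-down means a discarded child's whole subtree disappears with it, which matches the requirement that every remaining node be a valid contributor to its parent.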
