
doi: 10.13016/j9tp-e9pg
handle: 1903/33067
Approximate nearest neighbor search (ANNS) on high-dimensional vectors is a fundamental primitive, widely used for search over neural embeddings of unstructured data. Prior work on ANNS has produced indices that provide fast and accurate search on datasets of up to billions of points, but these indices are not well suited to queries restricted to a subset of the original dataset. Filtered ANNS is a formulation of the problem in which each point carries metadata that can be used to filter points at query time. This setting requires indexing the dataset in a metadata-aware way to support filtered queries. Filtered ANNS is important for applications such as product and image search, and is necessary to give recently popular "vector databases" functionality similar to that of more traditional tabular databases. This work concerns two versions of the filtered ANNS problem. The most popular formulation in prior work associates points with boolean metadata in the form of labels and filters queries using a boolean predicate on these labels. In this setting, we present a novel index with state-of-the-art performance for queries whose filters require either one label or both of a pair of labels; this index won the track of a large benchmarking competition focused on this problem. We also introduce a novel formulation of filtered ANNS called "window filtered" ANNS, in which each point is associated with a continuous metadata value (in practical use, a timestamp, a measure of popularity, etc.), and queries are filtered to a range of metadata values. In addition to describing the problem, we present a practical and theoretically motivated index that handily outperforms baselines.
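To make the two problem formulations concrete, the following is a minimal brute-force sketch of the query semantics only. It is not the index presented in this work. The function names, the toy data, the Euclidean distance, and the inclusive range bounds are all assumptions made for illustration; an exact linear scan like this is the kind of reference answer an approximate index would be measured against.

```python
import math
from typing import List, Sequence, Set

Vector = Sequence[float]


def l2(a: Vector, b: Vector) -> float:
    """Euclidean distance (an assumed metric; the abstract does not fix one)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def label_filtered_knn(points: List[Vector], labels: List[Set[str]],
                       query: Vector, required: Set[str], k: int) -> List[int]:
    """Exact k-NN over points whose label set contains every required label.

    One required label gives a single-label filter; two give the
    "both of a pair of labels" (AND) filter.
    """
    candidates = [i for i, ls in enumerate(labels) if required <= ls]
    candidates.sort(key=lambda i: l2(points[i], query))
    return candidates[:k]


def window_filtered_knn(points: List[Vector], values: List[float],
                        query: Vector, lo: float, hi: float, k: int) -> List[int]:
    """Exact k-NN over points whose metadata value lies in [lo, hi].

    Inclusive bounds are an assumption of this sketch.
    """
    candidates = [i for i, v in enumerate(values) if lo <= v <= hi]
    candidates.sort(key=lambda i: l2(points[i], query))
    return candidates[:k]


# Hypothetical toy dataset: 2-D vectors with labels and timestamps.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0)]
labels = [{"red"}, {"red", "shoe"}, {"shoe"}, {"red", "shoe"}]
timestamps = [10.0, 20.0, 30.0, 40.0]
query = (0.1, 0.1)

single = label_filtered_knn(points, labels, query, {"red"}, k=2)
pair = label_filtered_knn(points, labels, query, {"red", "shoe"}, k=1)
window = window_filtered_knn(points, timestamps, query, 25.0, 45.0, k=1)
```

A linear scan costs time proportional to the dataset size on every query, which is why a metadata-aware index is needed at the scales the abstract mentions.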
020, filtered anns, anns, vector search, Computer science, ann, filtered vector search, 004, approximate nearest neighbors
