
arXiv: 1801.06232
Probabilistic membership filters are a type of data structure designed to quickly verify whether an element of a large data set belongs to a subset of the data. While false negatives are not possible, false positives are. Therefore, the main goal of any good probabilistic membership filter is to have a small false-positive rate while being memory efficient and fast to query. Although Bloom filters are fast to construct, their memory efficiency is bounded by a strict theoretical upper bound. Weaver et al. introduced random satisfiability-based filters that significantly improved the efficiency of the probabilistic filters, however, at the cost of solving a complex random satisfiability (SAT) formula when constructing the filter. Here we present an improved SAT filter approach with a focus on reducing the filter building times, as well as query times. Our approach is based on using not-all-equal (NAE) SAT formulas to build the filters, solving these via a mapping to random SAT using traditionally-fast random SAT solvers, as well as bit packing and the reduction of the number of hash functions. Paired with fast hardware, NAE-SAT filters could result in enterprise-size applications.
13 pages, 4 figures, 3 pages
FOS: Computer and information sciences, Computer Science - Cryptography and Security, Statistical Mechanics (cond-mat.stat-mech), Computer Science - Data Structures and Algorithms, FOS: Physical sciences, Data Structures and Algorithms (cs.DS), Cryptography and Security (cs.CR), Condensed Matter - Statistical Mechanics
FOS: Computer and information sciences, Computer Science - Cryptography and Security, Statistical Mechanics (cond-mat.stat-mech), Computer Science - Data Structures and Algorithms, FOS: Physical sciences, Data Structures and Algorithms (cs.DS), Cryptography and Security (cs.CR), Condensed Matter - Statistical Mechanics
| selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 0 | |
| popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Average | |
| influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Average | |
| impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Average |
