
Hash Adaptive Bloom Filter

Publication: Article, Preprint
Date: 01 Apr 2021
Embargo end date: 01 Jan 2021
Publisher: IEEE
Journal: 2021 IEEE 37th International Conference on Data Engineering (ICDE)

Authors: Guihai Chen; Rongbiao Xie; Meng Li; Haipeng Dai; Rong Gu; He Huang; Zheyu Miao

doi: 10.1109/icde51399.2021.00061, 10.48550/arxiv.2106.07037

arXiv: http://arxiv.org/abs/2106.07037


Abstract

The Bloom filter is a compact, memory-efficient probabilistic data structure that supports membership testing, i.e., checking whether an element is in a given set. However, because a Bloom filter maps every element with uniformly random hash functions, it offers little flexibility even when information about negative keys (elements not in the set) is available. The problem worsens when misidentifying different negative keys incurs different costs. To address these problems, we propose a new Hash Adaptive Bloom Filter (HABF) that supports customizing hash functions for keys. The key idea of HABF is to customize the hash functions for positive keys (elements in the set) so as to avoid high-cost negative keys, and to pack the customized hash functions into a lightweight data structure named HashExpressor. Then, given an element at query time, HABF follows a two-round pattern to check whether the element is in the set. Further, we theoretically analyze the performance of HABF and bound its expected false positive rate. We conduct extensive experiments on representative datasets, and the results show that HABF outperforms the standard Bloom filter and its cutting-edge variants overall in terms of accuracy, construction time, query time, and memory consumption (source code is available in [1]).
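To make the baseline concrete, the following is a minimal sketch of a standard Bloom filter in Python, together with a helper that sums the cost of the known negative keys it misidentifies. It is not the authors' HABF or HashExpressor, whose details the abstract does not give. The names `BloomFilter`, `weighted_false_positive_cost`, the salted SHA-256 hashing scheme, and the example keys and costs are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Standard Bloom filter: every key uses the same k hash functions."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the sketch readable (not space-optimal).
        self.bits = bytearray(num_bits)

    def _positions(self, key: str):
        # Assumption: derive k hash functions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def __contains__(self, key: str) -> bool:
        # True means "possibly in the set"; False means "definitely not".
        return all(self.bits[pos] for pos in self._positions(key))


def weighted_false_positive_cost(bf: BloomFilter, negative_costs: dict) -> float:
    """Sum the costs of known negative keys that the filter reports as present."""
    return sum(cost for key, cost in negative_costs.items() if key in bf)


# Hypothetical data: 100 positive keys, 1000 negative keys with costs 1..5.
positives = [f"user-{i}" for i in range(100)]
negative_costs = {f"guest-{i}": float(i % 5 + 1) for i in range(1000)}

bf = BloomFilter(num_bits=1024, num_hashes=4)
for key in positives:
    bf.add(key)

# A Bloom filter never produces false negatives.
assert all(key in bf for key in positives)

total_cost = weighted_false_positive_cost(bf, negative_costs)
print(f"weighted false-positive cost: {total_cost}")
```

Because the hash functions here are fixed and shared by every key, the filter cannot steer away from an expensive negative key even when that key and its cost are known in advance. That limitation is what the abstract says HABF addresses by customizing hash functions per positive key.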

11 pages, accepted by ICDE 2021

Related Organizations

- Zhejiang Ocean University, China (People's Republic of)
- Alibaba Group (China), China (People's Republic of)
- Zhejiang University, China (People's Republic of)
- Nanjing University, China (People's Republic of)


Keywords

FOS: Computer and information sciences, Computer Science - Databases, Databases (cs.DB)


Impact by BIP!

- citations: 14. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- popularity: Top 10%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- impulse: Top 10%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.