TY - GEN
T1 - Learned Bloom Filter for Multi-key Membership Testing
AU - Li, Yunchuan
AU - Wang, Ziwei
AU - Yang, Ruixin
AU - Zhao, Yan
AU - Zhou, Rui
AU - Zheng, Kai
PY - 2023
Y1 - 2023
N2 - Multi-key membership testing refers to checking whether a queried element exists in a given set of multi-key elements, which is a fundamental operation for computing systems and networking applications such as web search, mail systems, distributed databases, firewalls, and network routing. Most existing studies for membership testing are built on Bloom filter, a space-efficient and high-security probabilistic data structure. However, traditional Bloom filter always performs poorly in multi-key scenarios. Recently, a new variant of Bloom filter that has combined machine learning methods and Bloom filter, also known as Learned Bloom Filter (LBF), has drawn increasing attention for its significant improvements in reducing space occupation and False Positive Rate (FPR). More importantly, due to the introduction of the learned model, LBF can well address some problems of Bloom filter in multi-key scenarios. Because of this, we propose a Multi-key LBF (MLBF) data structure, which contains a value-interaction-based multi-key classifier and a multi-key Bloom filter. To reduce FPR, we further propose an Interval-based MLBF, which divides keys into specific intervals according to the data distribution. Extensive experiments based on two real datasets confirm the superiority of the proposed data structures in terms of FPR and query efficiency.
AB - Multi-key membership testing refers to checking whether a queried element exists in a given set of multi-key elements, which is a fundamental operation for computing systems and networking applications such as web search, mail systems, distributed databases, firewalls, and network routing. Most existing studies for membership testing are built on Bloom filter, a space-efficient and high-security probabilistic data structure. However, traditional Bloom filter always performs poorly in multi-key scenarios. Recently, a new variant of Bloom filter that has combined machine learning methods and Bloom filter, also known as Learned Bloom Filter (LBF), has drawn increasing attention for its significant improvements in reducing space occupation and False Positive Rate (FPR). More importantly, due to the introduction of the learned model, LBF can well address some problems of Bloom filter in multi-key scenarios. Because of this, we propose a Multi-key LBF (MLBF) data structure, which contains a value-interaction-based multi-key classifier and a multi-key Bloom filter. To reduce FPR, we further propose an Interval-based MLBF, which divides keys into specific intervals according to the data distribution. Extensive experiments based on two real datasets confirm the superiority of the proposed data structures in terms of FPR and query efficiency.
UR - http://www.scopus.com/inward/record.url?scp=85161683133&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-30637-2_5
DO - 10.1007/978-3-031-30637-2_5
M3 - Article in proceeding
SN - 978-3-031-30636-5
T3 - Lecture Notes in Computer Science
SP - 62
EP - 79
BT - Database Systems for Advanced Applications - 28th International Conference, DASFAA 2023, Proceedings
A2 - Wang, Xin
A2 - Sapino, Maria Luisa
A2 - Han, Wook-Shin
A2 - Abbadi, Amr El
A2 - Dobbie, Gill
A2 - Feng, Zhiyong
A2 - Shao, Yingxiao
A2 - Yin, Hongzhi
PB - Springer Nature Switzerland AG
T2 - 28th International Conference, DASFAA 2023
Y2 - 17 April 2023 through 20 April 2023
ER -