PM-LSH: A fast and accurate LSH framework for high-dimensional approximate NN search

Bolong Zheng, Zhao Xi, Lianggui Weng, Nguyen Quoc Viet Hung, Hang Liu, Christian S. Jensen

Research output: Contribution to journalJournal articleResearchpeer-review

5 Citations (Scopus)
51 Downloads (Pure)

Abstract

Nearest neighbor (NN) search in high-dimensional spaces is inherently computationally expensive due to the curse of dimensionality. As a well-known solution to approximate NN search, locality-sensitive hashing (LSH) is able to answer c-approximate NN (c-ANN) queries in sublinear time with constant probability. Existing LSH methods focus mainly on building hash bucket based indexing such that the candidate points can be retrieved quickly. However, existing coarse-grained structures fail to offer accurate distance estimation for candidate points, which translates into additional computational overhead when having to examine unnecessary points. This in turn reduces the performance of query processing. In contrast, we propose a fast and accurate LSH framework, called PM-LSH, that aims to compute the c-ANN query on large- scale, high-dimensional datasets. First, we adopt a simple yet effective PM-tree to index the data points. Second, we develop a tunable confidence interval to achieve accurate distance estimation and guarantee high result quality. Third, we propose an efficient algorithm on top of the PM-tree to improve the performance of computing c-ANN queries. Extensive experiments with real-world data offer evidence that PM-LSH is capable of outperforming existing proposals with respect to both efficiency and accuracy.
Original languageEnglish
JournalProceedings of the VLDB Endowment
Volume13
Issue number5
Pages (from-to)643-655
Number of pages13
ISSN2150-8097
DOIs
Publication statusPublished - 2020

Fingerprint Dive into the research topics of 'PM-LSH: A fast and accurate LSH framework for high-dimensional approximate NN search'. Together they form a unique fingerprint.

Cite this