PM-LSH: A fast and accurate LSH framework for high-dimensional approximate NN search

Bolong Zheng; Zhao Xi; Lianggui Weng; Nguyen Quoc Viet Hung; Hang Liu; Christian S. Jensen

doi:10.14778/3377369.3377374

Publication: Contribution to journal › Journal article › Research › peer-reviewed

50 Citations (Scopus)

590 Downloads (Pure)

Abstract

Nearest neighbor (NN) search in high-dimensional spaces is inherently computationally expensive due to the curse of dimensionality. As a well-known solution to approximate NN search, locality-sensitive hashing (LSH) is able to answer c-approximate NN (c-ANN) queries in sublinear time with constant probability. Existing LSH methods focus mainly on building hash bucket based indexing such that the candidate points can be retrieved quickly. However, existing coarse-grained structures fail to offer accurate distance estimation for candidate points, which translates into additional computational overhead when having to examine unnecessary points. This in turn reduces the performance of query processing. In contrast, we propose a fast and accurate LSH framework, called PM-LSH, that aims to compute the c-ANN query on large-scale, high-dimensional datasets. First, we adopt a simple yet effective PM-tree to index the data points. Second, we develop a tunable confidence interval to achieve accurate distance estimation and guarantee high result quality. Third, we propose an efficient algorithm on top of the PM-tree to improve the performance of computing c-ANN queries. Extensive experiments with real-world data offer evidence that PM-LSH is capable of outperforming existing proposals with respect to both efficiency and accuracy.
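
The general idea behind projection-based LSH candidate selection can be illustrated with a small sketch. It is not the authors' implementation: it assumes Gaussian random projections as the hash functions, replaces the PM-tree with a plain linear scan over the projected points, and uses a fixed candidate budget instead of the paper's tunable confidence interval. All names (`make_projection`, `project`, `approx_nn`, `budget`) and the parameter values are hypothetical and chosen only for illustration.

```python
import math
import random


def make_projection(dim, m, rng):
    """Draw m random directions with i.i.d. standard normal entries."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(m)]


def project(point, directions):
    """Map a dim-dimensional point to its m hash values (dot products)."""
    return [sum(a * x for a, x in zip(direction, point)) for direction in directions]


def euclidean(p, q):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def approx_nn(query, data, projected, directions, budget):
    """Rank points by projected distance, then verify the closest `budget` exactly.

    Assumption: a linear scan stands in for the PM-tree range query.
    """
    q_proj = project(query, directions)
    order = sorted(range(len(data)), key=lambda i: euclidean(q_proj, projected[i]))
    candidates = order[:budget]
    # Only the selected candidates pay for a full-dimensional distance computation.
    return min(candidates, key=lambda i: euclidean(query, data[i]))


rng = random.Random(0)
dim, m, n = 64, 15, 500  # illustrative sizes, not taken from the paper
data = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]
directions = make_projection(dim, m, rng)
projected = [project(p, directions) for p in data]

# Query: a slightly perturbed copy of point 42.
query = [x + 0.01 * rng.gauss(0.0, 1.0) for x in data[42]]
best = approx_nn(query, data, projected, directions, budget=25)
print(best, euclidean(query, data[best]))
```

With standard normal projections, the squared projected distance divided by the squared original distance follows a chi-squared distribution with m degrees of freedom, which is why distances in the low-dimensional space can serve as estimates of the original distances. Setting `budget` to the dataset size reduces the sketch to an exact linear scan.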

Original language	English
Journal	Proceedings of the VLDB Endowment
Volume	13
Issue number	5
Pages (from-to)	643-655
Number of pages	13
ISSN	2150-8097
DOI	https://doi.org/10.14778/3377369.3377374
Status	Published - 2020

Access to document

10.14778/3377369.3377374

p643-zheng (1): Publisher's published version, 3.07 MB

http://www.vldb.org/pvldb/vol13/p643-zheng.pdf

Other files and links

Link to publication in Scopus

Citation formats

@article{806a71ff72874843aab1e6e670929856,

title = "PM-LSH: A fast and accurate LSH framework for high-dimensional approximate NN search",

abstract = "Nearest neighbor (NN) search in high-dimensional spaces is inherently computationally expensive due to the curse of dimensionality. As a well-known solution to approximate NN search, locality-sensitive hashing (LSH) is able to answer c-approximate NN (c-ANN) queries in sublinear time with constant probability. Existing LSH methods focus mainly on building hash bucket based indexing such that the candidate points can be retrieved quickly. However, existing coarse-grained structures fail to offer accurate distance estimation for candidate points, which translates into additional computational overhead when having to examine unnecessary points. This in turn reduces the performance of query processing. In contrast, we propose a fast and accurate LSH framework, called PM-LSH, that aims to compute the c-ANN query on large-scale, high-dimensional datasets. First, we adopt a simple yet effective PM-tree to index the data points. Second, we develop a tunable confidence interval to achieve accurate distance estimation and guarantee high result quality. Third, we propose an efficient algorithm on top of the PM-tree to improve the performance of computing c-ANN queries. Extensive experiments with real-world data offer evidence that PM-LSH is capable of outperforming existing proposals with respect to both efficiency and accuracy.",

author = "Bolong Zheng and Zhao Xi and Lianggui Weng and Hung, {Nguyen Quoc Viet} and Hang Liu and Jensen, {Christian S.}",

year = "2020",

doi = "10.14778/3377369.3377374",

language = "English",

volume = "13",

pages = "643--655",

journal = "Proceedings of the VLDB Endowment",

issn = "2150-8097",

publisher = "VLDB Endowment",

number = "5",

}

TY - JOUR

T1 - PM-LSH

T2 - A fast and accurate LSH framework for high-dimensional approximate NN search

AU - Zheng, Bolong

AU - Xi, Zhao

AU - Weng, Lianggui

AU - Hung, Nguyen Quoc Viet

AU - Liu, Hang

AU - Jensen, Christian S.

PY - 2020

Y1 - 2020

N2 - Nearest neighbor (NN) search in high-dimensional spaces is inherently computationally expensive due to the curse of dimensionality. As a well-known solution to approximate NN search, locality-sensitive hashing (LSH) is able to answer c-approximate NN (c-ANN) queries in sublinear time with constant probability. Existing LSH methods focus mainly on building hash bucket based indexing such that the candidate points can be retrieved quickly. However, existing coarse-grained structures fail to offer accurate distance estimation for candidate points, which translates into additional computational overhead when having to examine unnecessary points. This in turn reduces the performance of query processing. In contrast, we propose a fast and accurate LSH framework, called PM-LSH, that aims to compute the c-ANN query on large-scale, high-dimensional datasets. First, we adopt a simple yet effective PM-tree to index the data points. Second, we develop a tunable confidence interval to achieve accurate distance estimation and guarantee high result quality. Third, we propose an efficient algorithm on top of the PM-tree to improve the performance of computing c-ANN queries. Extensive experiments with real-world data offer evidence that PM-LSH is capable of outperforming existing proposals with respect to both efficiency and accuracy.

AB - Nearest neighbor (NN) search in high-dimensional spaces is inherently computationally expensive due to the curse of dimensionality. As a well-known solution to approximate NN search, locality-sensitive hashing (LSH) is able to answer c-approximate NN (c-ANN) queries in sublinear time with constant probability. Existing LSH methods focus mainly on building hash bucket based indexing such that the candidate points can be retrieved quickly. However, existing coarse-grained structures fail to offer accurate distance estimation for candidate points, which translates into additional computational overhead when having to examine unnecessary points. This in turn reduces the performance of query processing. In contrast, we propose a fast and accurate LSH framework, called PM-LSH, that aims to compute the c-ANN query on large-scale, high-dimensional datasets. First, we adopt a simple yet effective PM-tree to index the data points. Second, we develop a tunable confidence interval to achieve accurate distance estimation and guarantee high result quality. Third, we propose an efficient algorithm on top of the PM-tree to improve the performance of computing c-ANN queries. Extensive experiments with real-world data offer evidence that PM-LSH is capable of outperforming existing proposals with respect to both efficiency and accuracy.

UR - http://www.scopus.com/inward/record.url?scp=85089179174&partnerID=8YFLogxK

U2 - 10.14778/3377369.3377374

DO - 10.14778/3377369.3377374

M3 - Journal article

SN - 2150-8097

VL - 13

SP - 643

EP - 655

JO - Proceedings of the VLDB Endowment

JF - Proceedings of the VLDB Endowment

IS - 5

ER -
