Efficient Metric Indexing for Similarity Search and Similarity Joins

Lu Chen, Yunjun Gao, Xinhan Li, Christian S. Jensen, Gang Chen

Publikation: Bidrag til tidsskriftTidsskriftartikelForskningpeer review

28 Citationer (Scopus)

Abstract

Spatial queries including similarity search and similarity joins are useful in many areas, such as multimedia retrieval, data integration, and so on. However, they are not supported well by commercial DBMSs. This may be due to the complex data types involved and the needs for flexible similarity criteria seen in real applications. In this paper, we propose a versatile and efficient disk-based index for metric data, the S pace-filling curve and Pivot-based B+-tree (SPB-tree). This index leverages the B+-tree, and uses space-filling curve to cluster data into compact regions, thus achieving storage efficiency. It utilizes a small set of so-called pivots to reduce significantly the number of distance computations when using the index. Further, it makes use of a separate random access file to support a broad range of data. By design, it is easy to integrate the SPB-tree into an existing DBMS. We present efficient algorithms for processing similarity search and similarity joins, as well as corresponding cost models based on SPB-trees. Extensive experiments using both real and synthetic data show that, compared with state-of-the-art competitors, the SPB-tree has much lower construction cost, smaller storage size, and supports more efficient similarity search and similarity joins with high accuracy cost models.

OriginalsprogEngelsk
Artikelnummer7349200
TidsskriftIEEE Transactions on Knowledge and Data Engineering
Vol/bind29
Udgave nummer3
Sider (fra-til)556-571
Antal sider16
ISSN1041-4347
DOI
StatusUdgivet - 1 mar. 2017

Fingeraftryk

Dyk ned i forskningsemnerne om 'Efficient Metric Indexing for Similarity Search and Similarity Joins'. Sammen danner de et unikt fingeraftryk.

Citationsformater