Parallel Semantic Trajectory Similarity Join

Lisi Chen, Shuo Shang, Christian S. Jensen, Bin Yao, Panos Kalnis

Research output: Contribution to book/anthology/report/conference proceeding › Article in proceeding › Research › peer-review

Abstract

Matching similar pairs of trajectories, called trajectory similarity join, is a fundamental functionality in spatial data management. We consider the problem of semantic trajectory similarity join (STS-Join). Each semantic trajectory is a sequence of points of interest (POIs) with both location and text information. Given two sets of semantic trajectories and a threshold θ, the STS-Join returns all pairs of semantic trajectories from the two sets with spatio-textual similarity no less than θ. This join targets applications such as term-based trajectory near-duplicate detection, geo-text data cleaning, personalized ridesharing recommendation, keyword-aware route planning, and travel itinerary recommendation. With these applications in mind, we provide a purposeful definition of spatio-textual similarity. To enable efficient STS-Join processing on large sets of semantic trajectories, we develop trajectory pair filtering techniques and consider the parallel processing capabilities of modern processors. Specifically, we present a two-phase parallel search algorithm. We first group semantic trajectories based on their text information. The algorithm's per-group searches are independent of each other and thus can be performed in parallel. For each group, the trajectories are further partitioned based on the spatial domain. We generate spatial and textual summaries for each trajectory batch, based on which we develop batch filtering and trajectory-batch filtering techniques to prune unqualified trajectory pairs in a batch mode. Additionally, we propose an efficient divide-and-conquer algorithm to derive bounds on the spatial similarity and textual similarity between two semantic trajectories, which enable us to prune dissimilar trajectory pairs without computing the exact value of their spatio-textual similarity. An experimental study with large semantic trajectory data confirms that our algorithm for processing semantic trajectory joins is capable of outperforming our well-designed baseline by a factor of 8-12.
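
To make the filter-then-verify idea in the abstract concrete, here is a minimal Python sketch. It is not the authors' algorithm: the abstract does not give the similarity definition, the summaries, or the divide-and-conquer bounds, so everything below is an assumption chosen for illustration. The similarity is taken to be a weighted sum `alpha * spatial + (1 - alpha) * textual`, with Jaccard similarity over keywords and an exponential-decay nearest-point spatial score. The bound used for pruning is a simple bounding-box bound, swapped in for the paper's bounds. All function names (`sts_join`, `mbr_min_dist`, and so on) are hypothetical. Grouping by shared keyword is only safe here when `theta > alpha`, because pairs sharing no keyword then cannot reach the threshold.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product
from math import exp, hypot

# Assumed representation: a trajectory is a non-empty list of POIs,
# each POI being a tuple (x, y, set_of_keywords).

def keywords(traj):
    """Union of all keywords attached to the POIs of a trajectory."""
    return frozenset().union(*(kw for _, _, kw in traj))

def mbr(traj):
    """Minimum bounding rectangle (min_x, min_y, max_x, max_y)."""
    xs = [x for x, _, _ in traj]
    ys = [y for _, y, _ in traj]
    return min(xs), min(ys), max(xs), max(ys)

def mbr_min_dist(a, b):
    """Smallest possible distance between a point in box a and one in box b."""
    dx = max(a[0] - b[2], b[0] - a[2], 0.0)
    dy = max(a[1] - b[3], b[1] - a[3], 0.0)
    return hypot(dx, dy)

def text_sim(t1, t2):
    """Jaccard similarity of the keyword sets (an assumed textual measure)."""
    k1, k2 = keywords(t1), keywords(t2)
    union = k1 | k2
    return len(k1 & k2) / len(union) if union else 0.0

def spatial_sim(t1, t2):
    """Symmetric mean of exp(-nearest-point distance) (an assumed measure)."""
    def one_way(a, b):
        return sum(
            exp(-min(hypot(x1 - x2, y1 - y2) for x2, y2, _ in b))
            for x1, y1, _ in a
        ) / len(a)
    return 0.5 * (one_way(t1, t2) + one_way(t2, t1))

def sts_join(P, Q, theta, alpha=0.5, workers=4):
    """Return sorted index pairs (i, j) with similarity(P[i], Q[j]) >= theta."""
    if theta <= alpha:
        raise ValueError("keyword grouping in this sketch needs theta > alpha")

    def invert(trajs):
        # Phase 1: group trajectories by the keywords they contain.
        inv = {}
        for i, t in enumerate(trajs):
            for kw in keywords(t):
                inv.setdefault(kw, []).append(i)
        return inv

    inv_p, inv_q = invert(P), invert(Q)
    boxes_p = [mbr(t) for t in P]
    boxes_q = [mbr(t) for t in Q]

    def search_group(kw):
        # Phase 2: filter with a cheap upper bound, then verify exactly.
        hits = set()
        for i, j in product(inv_p[kw], inv_q[kw]):
            t_sim = text_sim(P[i], Q[j])
            # Every point-to-point distance is at least the box distance,
            # so exp(-box distance) upper-bounds spatial_sim.
            s_ub = exp(-mbr_min_dist(boxes_p[i], boxes_q[j]))
            if alpha * s_ub + (1 - alpha) * t_sim < theta:
                continue  # pruned without computing the exact similarity
            if alpha * spatial_sim(P[i], Q[j]) + (1 - alpha) * t_sim >= theta:
                hits.add((i, j))
        return hits

    shared = inv_p.keys() & inv_q.keys()
    # Per-group searches are independent, so they can run concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sorted(set().union(*pool.map(search_group, shared)))

P = [[(0, 0, {"cafe"}), (1, 0, {"museum"})], [(50, 50, {"cafe"})]]
Q = [[(0, 0.1, {"cafe"}), (1, 0.1, {"museum"})], [(0, 0, {"park"})]]
print(sts_join(P, Q, theta=0.8))  # prints [(0, 0)]
```

In the usage example, the pair `(P[1], Q[0])` shares the keyword "cafe" but is discarded by the bounding-box bound alone, which is the kind of saving the filtering step is meant to provide. A thread pool is used only to keep the sketch short; CPU-bound Python code would need processes or a native implementation to see real speedups.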

Original language: English
Title of host publication: 2020 IEEE 36th International Conference on Data Engineering
Number of pages: 12
Publisher: IEEE
Publication date: 2020
Pages: 997-1008
Article number: 9101683
ISBN (Print): 978-1-7281-2904-4
ISBN (Electronic): 9781728129037
DOIs
Publication status: Published - 2020
Event: International Conference on Data Engineering - Dallas, United States
Duration: 20 Apr 2020 to 24 Apr 2020
Conference number: 36th

Conference

Conference: International Conference on Data Engineering
Number: 36th
Country: United States
City: Dallas
Period: 20/04/2020 to 24/04/2020
Series: Proceedings of the International Conference on Data Engineering
ISSN: 1063-6382
