Multi-Source Spatial Entity Linkage

Suela Isaj, Esteban Zimányi, Torben Bach Pedersen

Research output: Contribution to book/anthology/report/conference proceeding › Article in proceeding › Research › peer-review


Abstract

Besides traditional cartographic data sources, spatial information can also be derived from location-based sources. Location-based sources offer rich spatial information describing the semantics of locations. However, even though different location-based sources refer to the same physical world, each one covers only part of the spatial entities of interest, describes them with different attributes, and sometimes provides contradictory information. Hence, the problem of finding which pairs of spatial entities belong to the same physical spatial entity demands specific attention. We propose a solution (QuadSky) to the problem of spatial entity linkage across diverse location-based sources. QuadSky starts with a spatial blocking technique (QuadFlex) that inherits the concept and the complexity of the quadtree algorithm but improves the splitting technique so that nearby points are not separated. After the spatial entities within the same block are compared, a novel algorithm, referred to as SkyEx, separates the pairs considered a match (positive class) from the rest (negative class) by using Pareto optimality. SkyEx does not require attribute weights, a scoring function, or a training set. QuadSky achieves 0.85 precision and 0.85 recall on a manually labeled dataset of 1,500 pairs, and 0.87 precision and 0.6 recall on a semi-manually labeled dataset of 777,452 pairs. Moreover, QuadSky provides the best trade-off between precision and recall and, consequently, the best F-measure compared to the existing baselines.
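SkyEx separates matching pairs from the rest using Pareto optimality instead of attribute weights or a training set. The underlying idea, skyline (Pareto-front) layering, can be sketched as below. This is a minimal sketch under stated assumptions, not the authors' implementation: each candidate pair from a block is assumed to be described by a vector of similarity scores where higher means more similar (the concrete attributes, e.g. a name similarity and an address similarity, are hypothetical), and a hypothetical fixed number of layers `k` serves as the cutoff between the positive and negative class, since the abstract does not state how SkyEx chooses that cutoff. The names `dominates`, `skyline_layers` and `label_pairs` are illustrative only.

```python
from typing import List, Sequence, Tuple

# A candidate pair is represented by its per-attribute similarity scores
# (hypothetical attributes; higher means more similar).
Vector = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Pareto dominance: a is at least as similar as b on every attribute
    and strictly more similar on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def skyline_layers(points: List[Vector]) -> List[List[int]]:
    """Repeatedly peel off the skyline (the pairs not dominated by any other
    remaining pair). Returns pair indices per layer, best layer first."""
    remaining = list(range(len(points)))
    layers: List[List[int]] = []
    while remaining:
        # Every non-empty finite set has at least one non-dominated element,
        # so each layer is non-empty and the loop terminates.
        layer = [
            i for i in remaining
            if not any(dominates(points[j], points[i]) for j in remaining if j != i)
        ]
        layers.append(layer)
        in_layer = set(layer)
        remaining = [i for i in remaining if i not in in_layer]
    return layers


def label_pairs(points: List[Vector], k: int) -> List[bool]:
    """Label pairs in the first k skyline layers as matches (positive class)
    and all others as non-matches. k is a hypothetical fixed cutoff."""
    labels = [False] * len(points)
    for layer in skyline_layers(points)[:k]:
        for i in layer:
            labels[i] = True
    return labels
```

For example, with `pairs = [(0.9, 0.8), (0.7, 0.9), (0.6, 0.5), (0.2, 0.1), (0.65, 0.6)]`, `skyline_layers(pairs)` returns `[[0, 1], [4], [2], [3]]`, so `label_pairs(pairs, 1)` marks only the first two pairs as matches. No weights or scoring function are needed because pairs are only ever compared by dominance; the quadratic dominance check keeps the sketch short and would need a proper skyline algorithm for large blocks.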
Original language: English
Title of host publication: International Symposium on Spatial and Temporal Databases: SSTD
Number of pages: 10
Publication date: 19 Aug 2019
Pages: 1-10
ISBN (Print): 978-1-4503-6280-1
ISBN (Electronic): 978-1-4503-6280-1
DOI: https://doi.org/10.1145/3340964.3340979
Publication status: Published - 19 Aug 2019
Event: International Symposium on Spatial and Temporal Databases - Wien, Austria
Duration: 19 Aug 2019 → 21 Aug 2019
Conference number: 16th
http://sstd2019.org/

Cite this

Isaj, S., Zimányi, E., & Pedersen, T. B. (2019). Multi-Source Spatial Entity Linkage. In International Symposium on Spatial and Temporal Databases: SSTD (pp. 1-10) https://doi.org/10.1145/3340964.3340979
Isaj, Suela ; Zimányi, Esteban ; Pedersen, Torben Bach. / Multi-Source Spatial Entity Linkage. International Symposium on Spatial and Temporal Databases: SSTD. 2019. pp. 1-10
@inproceedings{4f774561cdb44e13b72a10278eb23b8e,
title = "Multi-Source Spatial Entity Linkage",
abstract = "Besides the traditional cartographic data sources, spatial information can also be derived from location-based sources. Location-based sources offer rich spatial information describing the semantics of locations. However, even though different location-based sources refer to the same physical world, each one has only partial coverage of the spatial entities of interest, describe them with different attributes and sometimes provide contradicting information. Hence, the problem of finding which pairs of spatial entities belong to the same physical spatial entity demands specific attention. We propose a solution (QuadSky) to the problem of spatial entity linkage across diverse location-based sources. QuadSky starts with a spatial blocking technique (QuadFlex) that inherits the concept and the complexity from the quadtree algorithm but improves the splitting technique not to separate nearby points. After comparing the spatial entities of the same block, we propose a novel algorithm, referred to as SkyEx that separates the pairs considered as a match (positive class) from the rest (negative class) by using Pareto optimality. SkyEx does not require weights on the attributes, scoring function or a training set. QuadSky achieves 0.85 precision and 0.85 recall for a manually labeled dataset of 1,500 pairs and 0.87 precision and 0.6 recall for a semi-manually labeled dataset of 777,452 pairs. Moreover, QuadSky provides the best trade-off between precision and recall and consequently, the best F-measure compared to the existing baselines.",
author = "Suela Isaj and Esteban Zim{\'a}nyi and Pedersen, {Torben Bach}",
year = "2019",
month = "8",
day = "19",
doi = "10.1145/3340964.3340979",
language = "English",
isbn = "978-1-4503-6280-1",
pages = "1--10",
booktitle = "International Symposium on Spatial and Temporal Databases",

}

Isaj, S, Zimányi, E & Pedersen, TB 2019, Multi-Source Spatial Entity Linkage. in International Symposium on Spatial and Temporal Databases: SSTD. pp. 1-10, International Symposium on Spatial and Temporal Databases, Wien, Austria, 19/08/2019. https://doi.org/10.1145/3340964.3340979

TY - GEN

T1 - Multi-Source Spatial Entity Linkage

AU - Isaj, Suela

AU - Zimányi, Esteban

AU - Pedersen, Torben Bach

PY - 2019/8/19

Y1 - 2019/8/19

AB - Besides the traditional cartographic data sources, spatial information can also be derived from location-based sources. Location-based sources offer rich spatial information describing the semantics of locations. However, even though different location-based sources refer to the same physical world, each one has only partial coverage of the spatial entities of interest, describe them with different attributes and sometimes provide contradicting information. Hence, the problem of finding which pairs of spatial entities belong to the same physical spatial entity demands specific attention. We propose a solution (QuadSky) to the problem of spatial entity linkage across diverse location-based sources. QuadSky starts with a spatial blocking technique (QuadFlex) that inherits the concept and the complexity from the quadtree algorithm but improves the splitting technique not to separate nearby points. After comparing the spatial entities of the same block, we propose a novel algorithm, referred to as SkyEx that separates the pairs considered as a match (positive class) from the rest (negative class) by using Pareto optimality. SkyEx does not require weights on the attributes, scoring function or a training set. QuadSky achieves 0.85 precision and 0.85 recall for a manually labeled dataset of 1,500 pairs and 0.87 precision and 0.6 recall for a semi-manually labeled dataset of 777,452 pairs. Moreover, QuadSky provides the best trade-off between precision and recall and consequently, the best F-measure compared to the existing baselines.

U2 - 10.1145/3340964.3340979

DO - 10.1145/3340964.3340979

M3 - Article in proceeding

SN - 978-1-4503-6280-1

SP - 1

EP - 10

BT - International Symposium on Spatial and Temporal Databases

ER -

Isaj S, Zimányi E, Pedersen TB. Multi-Source Spatial Entity Linkage. In International Symposium on Spatial and Temporal Databases: SSTD. 2019. p. 1-10 https://doi.org/10.1145/3340964.3340979