Multi-Source Spatial Entity Linkage

Suela Isaj, Esteban Zimányi, Torben Bach Pedersen

Research output: Contribution to book/anthology/report/conference proceeding › Article in proceeding › Research › Peer-reviewed


Abstract

Besides traditional cartographic data sources, spatial information can also be derived from location-based sources, which offer rich spatial information describing the semantics of locations. However, even though different location-based sources refer to the same physical world, each one covers only part of the spatial entities of interest, describes them with different attributes, and sometimes provides contradictory information. Hence, the problem of finding which pairs of spatial entities refer to the same physical spatial entity demands specific attention. We propose QuadSky, a solution to the problem of spatial entity linkage across diverse location-based sources. QuadSky starts with a spatial blocking technique (QuadFlex) that inherits the concept and the complexity of the quadtree algorithm but improves the splitting technique so that nearby points are not separated. After the spatial entities in the same block are compared, a novel algorithm, SkyEx, separates the pairs considered a match (positive class) from the rest (negative class) using Pareto optimality. SkyEx requires no attribute weights, no scoring function, and no training set. QuadSky achieves 0.85 precision and 0.85 recall on a manually labeled dataset of 1,500 pairs, and 0.87 precision and 0.6 recall on a semi-manually labeled dataset of 777,452 pairs. Moreover, compared to the existing baselines, QuadSky provides the best trade-off between precision and recall and, consequently, the best F-measure.
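The abstract describes QuadFlex only at a high level. To illustrate the general idea of quadtree-style spatial blocking, the following Python sketch recursively splits a bounding box into four quadrants until each holds at most `capacity` points, and then compares only entities that share a leaf block. The `margin` parameter is a hypothetical device and not QuadFlex's actual splitting rule: points within `margin` of a quadrant edge are also kept in that quadrant, so two nearby points on either side of a split line still meet in a common block. All names (`quadtree_blocks`, `candidate_pairs`, `capacity`, `margin`, `max_depth`) are assumptions of this sketch, which also ignores which source each entity comes from.

```python
from itertools import combinations
from typing import List, Set, Tuple

Point = Tuple[str, float, float]  # (entity id, x, y)


def quadtree_blocks(points: List[Point], x0: float, y0: float, x1: float, y1: float,
                    capacity: int = 4, margin: float = 0.0,
                    depth: int = 0, max_depth: int = 16) -> List[List[Point]]:
    """Split the half-open box [x0, x1) x [y0, y1) into four quadrants until each
    holds at most `capacity` points; the non-empty leaves are the blocks.
    `max_depth` bounds the recursion when overlapping quadrants cannot shrink a crowded cell."""
    if len(points) <= capacity or depth >= max_depth:
        return [points] if points else []
    mx, my = (x0 + x1) / 2, (y0 + y1) / 2
    quadrants = [(x0, y0, mx, my), (mx, y0, x1, my), (x0, my, mx, y1), (mx, my, x1, y1)]
    blocks: List[List[Point]] = []
    for qx0, qy0, qx1, qy1 in quadrants:
        # Hypothetical overlap rule: a point within `margin` of a quadrant edge is
        # also kept in that quadrant, so close neighbours across a split share a block.
        inside = [p for p in points
                  if qx0 - margin <= p[1] < qx1 + margin
                  and qy0 - margin <= p[2] < qy1 + margin]
        blocks.extend(quadtree_blocks(inside, qx0, qy0, qx1, qy1,
                                      capacity, margin, depth + 1, max_depth))
    return blocks


def candidate_pairs(blocks: List[List[Point]]) -> Set[Tuple[str, str]]:
    """Only entities sharing a block are compared; the set drops duplicate
    pairs produced by overlapping blocks."""
    pairs: Set[Tuple[str, str]] = set()
    for block in blocks:
        for a, b in combinations(sorted(block), 2):
            pairs.add((a[0], b[0]))
    return pairs


pts = [("a", 10, 10), ("b", 11, 10), ("c", 49.9, 30), ("d", 50.1, 30), ("e", 90, 90)]
print(sorted(candidate_pairs(quadtree_blocks(pts, 0, 0, 100, 100, capacity=2))))
# [('a', 'b')]
print(sorted(candidate_pairs(quadtree_blocks(pts, 0, 0, 100, 100, capacity=2, margin=1.0))))
# [('a', 'b'), ('c', 'd')]
```

With `margin=0.0`, entities `c` and `d`, only 0.2 units apart but on opposite sides of the split line x = 50, land in different blocks and are never compared; with `margin=1.0` the pair is recovered. The cost of overlapping blocks is some duplicated work, which the set in `candidate_pairs` absorbs.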
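SkyEx is described as separating matches from non-matches through Pareto optimality, without attribute weights, a scoring function, or training data. The Python sketch below illustrates only that underlying idea: candidate pairs are ranked by successive Pareto fronts (skylines) over their per-attribute similarity scores. It is not the authors' SkyEx. In particular, the fixed cut-off `k` is a hypothetical stand-in for whatever criterion SkyEx uses to decide where the positive class ends, and the function names and the three similarity dimensions in the example are invented for illustration.

```python
from typing import Dict, List, Set, Tuple

Scores = Tuple[float, ...]  # per-attribute similarities of one candidate pair, higher = more similar


def dominates(p: Scores, q: Scores) -> bool:
    """p Pareto-dominates q: at least as similar on every attribute and strictly more on one."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))


def skyline_levels(pairs: Dict[str, Scores]) -> List[List[str]]:
    """Repeatedly extract the skyline (the non-dominated pairs) and remove it."""
    remaining = dict(pairs)
    levels: List[List[str]] = []
    while remaining:  # every finite set has at least one non-dominated element
        level = [k for k, s in remaining.items()
                 if not any(dominates(t, s) for j, t in remaining.items() if j != k)]
        levels.append(sorted(level))
        for k in level:
            del remaining[k]
    return levels


def label_matches(pairs: Dict[str, Scores], k: int) -> Set[str]:
    """Hypothetical cut-off: pairs in the first k skyline levels form the positive class."""
    return {p for level in skyline_levels(pairs)[:k] for p in level}


# Invented scores: (name similarity, category similarity, spatial proximity)
pairs = {
    "p1": (0.90, 0.80, 0.95),
    "p2": (0.95, 0.70, 0.90),
    "p3": (0.60, 0.50, 0.70),
    "p4": (0.20, 0.10, 0.30),
    "p5": (0.50, 0.60, 0.40),
}
print(skyline_levels(pairs))            # [['p1', 'p2'], ['p3', 'p5'], ['p4']]
print(sorted(label_matches(pairs, 1)))  # ['p1', 'p2']
```

No weights are needed because a pair outranks another only when it is at least as similar on every attribute and strictly more similar on one; `p1` and `p2` share the first level because each wins on a different attribute. The naive level extraction is quadratic in the number of remaining pairs, which is fine for a sketch but not for hundreds of thousands of pairs.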
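The reported precision and recall can be combined into a single F-measure. Assuming the balanced F1 score, i.e. the harmonic mean of precision and recall (the abstract does not state which F variant is used), the two reported results work out as follows; the helper name `f1` is introduced only for this sketch.

```python
def f1(precision: float, recall: float) -> float:
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(round(f1(0.85, 0.85), 2))  # 0.85  (1,500 manually labeled pairs)
print(round(f1(0.87, 0.60), 2))  # 0.71  (777,452 semi-manually labeled pairs)
```

When precision equals recall, F1 equals both; on the larger dataset the lower recall pulls F1 down to about 0.71 despite the higher precision.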
Original language: English
Title of host publication: Proceedings of the 16th International Symposium on Spatial and Temporal Databases, SSTD 2019: SSTD
Number of pages: 10
Publisher: Association for Computing Machinery
Publication date: 19 Aug 2019
Pages: 1-10
ISBN (Print): 978-1-4503-6280-1
ISBN (Electronic): 978-1-4503-6280-1
DOI: 10.1145/3340964.3340979
Status: Published - 19 Aug 2019
Event: International Symposium on Spatial and Temporal Databases - Vienna, Austria
Duration: 19 Aug 2019 to 21 Aug 2019
Conference number: 16th
http://sstd2019.org/

Conference

Conference: International Symposium on Spatial and Temporal Databases
Number: 16th
Country: Austria
City: Vienna
Period: 19/08/2019 to 21/08/2019
Internet address: http://sstd2019.org/


Cite this

Isaj, S., Zimányi, E., & Pedersen, T. B. (2019). Multi-Source Spatial Entity Linkage. In Proceedings of the 16th International Symposium on Spatial and Temporal Databases, SSTD 2019: SSTD (pp. 1-10). Association for Computing Machinery. https://doi.org/10.1145/3340964.3340979
@inproceedings{4f774561cdb44e13b72a10278eb23b8e,
title = "Multi-Source Spatial Entity Linkage",
author = "Suela Isaj and Esteban Zim{\'a}nyi and Pedersen, {Torben Bach}",
year = "2019",
month = "8",
day = "19",
doi = "10.1145/3340964.3340979",
language = "English",
isbn = "978-1-4503-6280-1",
pages = "1--10",
booktitle = "Proceedings of the 16th International Symposium on Spatial and Temporal Databases, SSTD 2019",
publisher = "Association for Computing Machinery",
address = "United States",

}
