Fantastic Tables and Where to Find Them: Table Search in Semantic Data Lakes

Translated title of the contribution: Fantastiske Tabeller og Hvor De Findes: Tabelsøgning i Semantiske Data Søer

Martin Pekár Christensen, Aristotelis Leventidis, Matteo Lissandrini, Laura Di Rocco, Renée J. Miller, Katja Hose

Research output: Contribution to book/anthology/report/conference proceeding › Article in proceeding › Research › peer-review


Abstract

In data lakes, one of the core challenges remains finding relevant tables.
We introduce the notion of semantic data lakes, i.e., repositories where datasets are linked to concepts and entities described in a knowledge graph (KG).
We formalize the problem of semantic table search, i.e., retrieving tables containing information semantically related to a given set of entities, and provide the first formal definition of semantic relatedness of a dataset to tuples of entities.
Our solution offers the first general framework to compute the semantic relevance of the contents of a table w.r.t. entity tuples, as well as efficient algorithms (exploiting semantic signals, such as entity types and embeddings) to scale the semantic search to repositories with hundreds of thousands of distinct tables.
Our extensive experiments on both real-world and synthetic benchmarks show that our approach is able to retrieve more relevant tables (up to 5.4 times higher recall) in comparison to existing methods while ensuring fast response times (up to 17 times faster with LSH).
Original language: English
Title: Proceedings 28th International Conference on Extending Database Technology (EDBT 2025)
Number of pages: 14
Place of publication: OpenProceedings.org
Publisher: OpenProceedings
Publication date: 2025
Edition: 28
Pages: 397-410
ISBN (Print): 978-3-89318-098-1
Status: Published - 2025
Event: 28th International Conference on Extending Database Technology (EDBT) - Barcelona, Spain
Duration: 25 Mar 2025 to 28 Mar 2025
https://edbticdt2025.upc.edu/

Conference

Conference: 28th International Conference on Extending Database Technology (EDBT)
Country/Territory: Spain
City: Barcelona
Period: 25/03/2025 to 28/03/2025
Name: Advances in Database Technology
Volume: 28
ISSN: 2367-2005

Keywords

  • Table search
  • Semantic Web
  • Data Lakes
  • Data discovery
