Fantastic Tables and Where to Find Them: Table Search in Semantic Data Lakes

Martin Pekár Christensen, Aristotelis Leventidis, Matteo Lissandrini, Laura Di Rocco, Renée J. Miller, Katja Hose

Research output: Contribution to book/anthology/report/conference proceeding; Article in proceeding; Research; peer-review


Abstract

In data lakes, one of the core challenges remains finding relevant tables. We introduce the notion of semantic data lakes, i.e., repositories where datasets are linked to concepts and entities described in a knowledge graph (KG). We formalize the problem of semantic table search, i.e., retrieving tables containing information semantically related to a given set of entities, and provide the first formal definition of semantic relatedness of a dataset to tuples of entities. Our solution offers the first general framework to compute the semantic relevance of the contents of a table w.r.t. entity tuples, as well as efficient algorithms (exploiting semantic signals, such as entity types and embeddings) to scale the semantic search to repositories with hundreds of thousands of distinct tables. Our extensive experiments on both real-world and synthetic benchmarks show that our approach is able to retrieve more relevant tables (up to 5.4 times higher recall) in comparison to existing methods while ensuring fast response times (up to 17 times faster with LSH).
Translated title of the contribution: Fantastiske Tabeller og Hvor De Findes: Tabelsøgning i Semantiske Data Søer
Original language: English
Title of host publication: Proceedings 28th International Conference on Extending Database Technology (EDBT 2025)
Number of pages: 14
Place of Publication: OpenProceedings.org
Publisher: OpenProceedings
Publication date: 2025
Edition: 28
Pages: 397-410
ISBN (Print): 978-3-89318-098-1
DOIs
Publication status: Published - 2025
Event: 28th International Conference on Extending Database Technology (EDBT) - Barcelona, Spain
Duration: 25 Mar 2025 to 28 Mar 2025
https://edbticdt2025.upc.edu/

Conference

Conference: 28th International Conference on Extending Database Technology (EDBT)
Country/Territory: Spain
City: Barcelona
Period: 25/03/2025 to 28/03/2025
Internet address
Series: Advances in Database Technology
Volume: 28
ISSN: 2367-2005
