Abstract
In data lakes, one of the core challenges remains finding relevant tables. We introduce the notion of semantic data lakes, i.e.,
repositories where datasets are linked to concepts and entities
described in a knowledge graph (KG). We formalize the problem
of semantic table search, i.e., retrieving tables containing information semantically related to a given set of entities, and provide
the first formal definition of semantic relatedness of a dataset to
tuples of entities. Our solution offers the first general framework
to compute the semantic relevance of the contents of a table w.r.t.
entity tuples, as well as efficient algorithms (exploiting semantic signals, such as entity types and embeddings) to scale the
semantic search to repositories with hundreds of thousands of
distinct tables. Our extensive experiments on both real-world and
synthetic benchmarks show that our approach is able to retrieve
more relevant tables (up to 5.4 times higher recall) in comparison
to existing methods while ensuring fast response times (up to 17
times faster with LSH).
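The abstract describes ranking tables by how semantically related their entity-linked contents are to a query tuple of KG entities, with LSH used to keep the search fast. The following is a minimal sketch of that general idea under our own assumptions. It is not the paper's algorithm. Cosine similarity over KG entity embeddings, best-row aggregation, and random-hyperplane LSH are illustrative choices. All names (`row_score`, `table_score`, `HyperplaneLSH`, `search`) and the toy embeddings are hypothetical. The paper also exploits entity types as a signal, and this sketch omits them.

```python
import math
import random
from typing import Dict, List, Mapping, Sequence, Set, Tuple

Vector = Sequence[float]
Table = List[List[str]]  # hypothetical layout: each row lists KG entity IDs linked to its cells


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def row_score(query: Sequence[str], row: Sequence[str], emb: Mapping[str, Vector]) -> float:
    """Average, over query entities, of the best similarity to any linked entity in the row."""
    linked = [e for e in row if e in emb]  # ignore cells with no embedding
    if not query or not linked:
        return 0.0
    best = [max(cosine(emb[q], emb[e]) for e in linked) for q in query]
    return sum(best) / len(query)


def table_score(query: Sequence[str], table: Table, emb: Mapping[str, Vector]) -> float:
    """Assumed aggregation: a table is as relevant as its best-matching row."""
    return max((row_score(query, row, emb) for row in table), default=0.0)


class HyperplaneLSH:
    """Random-hyperplane LSH: vectors at a small angle tend to share a bucket."""

    def __init__(self, dim: int, num_bits: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]
        self.buckets: Dict[Tuple[int, ...], Set[str]] = {}

    def signature(self, v: Vector) -> Tuple[int, ...]:
        # One bit per hyperplane: which side of the plane the vector falls on.
        return tuple(int(sum(p * x for p, x in zip(plane, v)) >= 0) for plane in self.planes)

    def add(self, table_id: str, vectors: List[Vector]) -> None:
        # Index a table under the bucket of every entity vector it contains.
        for v in vectors:
            self.buckets.setdefault(self.signature(v), set()).add(table_id)

    def candidates(self, vectors: List[Vector]) -> Set[str]:
        # Union of tables sharing a bucket with any query entity vector.
        found: Set[str] = set()
        for v in vectors:
            found |= self.buckets.get(self.signature(v), set())
        return found


def search(query: Sequence[str], tables: Mapping[str, Table], emb: Mapping[str, Vector],
           lsh: HyperplaneLSH, k: int) -> List[Tuple[float, str]]:
    """Prefilter tables with LSH, then rank only the candidates by their exact score."""
    cands = lsh.candidates([emb[q] for q in query])
    ranked = sorted(((table_score(query, tables[t], emb), t) for t in cands),
                    key=lambda st: (-st[0], st[1]))
    return ranked[:k]


# Toy usage with made-up 2-d embeddings (hypothetical data).
emb = {
    "Copenhagen": [1.0, 0.1], "Aarhus": [0.9, 0.2], "Denmark": [0.8, 0.3],
    "Barcelona": [0.1, 1.0], "Spain": [0.2, 0.9],
}
tables = {
    "t_dk": [["Copenhagen", "Denmark"], ["Aarhus", "Denmark"]],
    "t_es": [["Barcelona", "Spain"]],
}
lsh = HyperplaneLSH(dim=2, num_bits=4, seed=42)
for tid, rows in tables.items():
    lsh.add(tid, [emb[e] for row in rows for e in row if e in emb])
top = search(["Aarhus", "Denmark"], tables, emb, lsh, k=1)
```

Prefiltering trades recall for speed. With more hash bits, buckets get smaller and fewer tables need exact scoring, but related entities are also more likely to land in different buckets. Practical LSH setups typically use several independent hash tables to compensate.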
Translated title of the contribution | Fantastiske Tabeller og Hvor De Findes: Tabelsøgning i Semantiske Data Søer |
---|---|
Original language | English |
Title of host publication | Proceedings 28th International Conference on Extending Database Technology (EDBT 2025) |
Number of pages | 14 |
Place of Publication | OpenProceedings.org |
Publisher | OpenProceedings |
Publication date | 2025 |
Edition | 28 |
Pages | 397-410 |
ISBN (Print) | 978-3-89318-098-1 |
DOIs | |
Publication status | Published - 2025 |
Event | 28th International Conference on Extending Database Technology (EDBT) - Barcelona, Spain. Duration: 25 Mar 2025 → 28 Mar 2025. https://edbticdt2025.upc.edu/ |
Conference
Conference | 28th International Conference on Extending Database Technology (EDBT) |
---|---|
Country/Territory | Spain |
City | Barcelona |
Period | 25/03/2025 → 28/03/2025 |
Internet address | https://edbticdt2025.upc.edu/ |
Series | Advances in Database Technology |
---|---|
Volume | 28 |
ISSN | 2367-2005 |
Fingerprint
Dive into the research topics of 'Fantastic Tables and Where to Find Them: Table Search in Semantic Data Lakes'. Together they form a unique fingerprint.
Poul Due Jensen Professorate in Big Data and Artificial Intelligence
Hose, K. (PI), Jendal, T. E. (Project Participant) & Hansen, E. R. (Project Participant)
01/11/2019 → 31/12/2025
Project: Research