Abstract
In data lakes, one of the core challenges remains finding relevant tables.
We introduce the notion of semantic data lakes, i.e., repositories where datasets are linked to concepts and entities described in a knowledge graph (KG).
We formalize the problem of semantic table search, i.e., retrieving tables containing information semantically related to a given set of entities, and provide the first formal definition of semantic relatedness of a dataset to tuples of entities.
Our solution offers the first general framework to compute the semantic relevance of the contents of a table w.r.t. entity tuples, as well as efficient algorithms (exploiting semantic signals, such as entity types and embeddings) to scale the semantic search to repositories with hundreds of thousands of distinct tables.
Our extensive experiments on both real-world and synthetic benchmarks show that our approach is able to retrieve more relevant tables (up to 5.4 times higher recall) in comparison to existing methods while ensuring fast response times (up to 17 times faster with LSH).
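The abstract does not spell out the relevance function or the index. The following is therefore only a minimal sketch of the general idea under explicit assumptions; it is not the paper's method. The assumptions are:

- Each table row is already linked to KG entities.
- A hypothetical toy KG, `KG_TYPES`, maps each entity to its types.
- Entity similarity is the Jaccard overlap of types.
- A row scores the average best match for each query entity, and a table scores as high as its best row.
- MinHash LSH over type sets prunes candidate tables before exact scoring.

All names (`KG_TYPES`, `entity_sim`, `table_score`, `build_index`, `search`) and parameters (`NUM_PERM`, `BANDS`) are illustrative.

```python
import hashlib
from typing import Dict, Iterable, List, Sequence, Set, Tuple

# Hypothetical toy KG (assumption): each linked entity maps to its set of KG types.
KG_TYPES: Dict[str, Set[str]] = {
    "Barcelona": {"City", "Place"},
    "Aalborg": {"City", "Place"},
    "Spain": {"Country", "Place"},
    "Denmark": {"Country", "Place"},
    "Messi": {"Person", "Athlete"},
}

Table = List[List[str]]  # each row lists the KG entities its cells link to


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def entity_sim(e1: str, e2: str) -> float:
    """1.0 for the same entity, otherwise Jaccard overlap of the entities' types."""
    if e1 == e2:
        return 1.0
    return jaccard(KG_TYPES.get(e1, set()), KG_TYPES.get(e2, set()))


def row_score(query: Sequence[str], row: Sequence[str]) -> float:
    """Average over query entities of the best-matching entity in the row."""
    if not query:
        return 0.0
    return sum(max((entity_sim(q, e) for e in row), default=0.0) for q in query) / len(query)


def table_score(query: Sequence[str], table: Table) -> float:
    """A table scores as high as its best-matching row."""
    return max((row_score(query, row) for row in table), default=0.0)


# MinHash LSH over type sets, used only to prune candidate tables before scoring.
NUM_PERM, BANDS = 8, 4
ROWS_PER_BAND = NUM_PERM // BANDS


def types_of(entities: Iterable[str]) -> Set[str]:
    return set().union(*(KG_TYPES.get(e, set()) for e in entities))


def minhash(tokens: Set[str]) -> Tuple[int, ...]:
    # One seeded hash per "permutation"; keep the minimum hash value per seed.
    return tuple(
        min(int(hashlib.md5(f"{i}:{t}".encode()).hexdigest(), 16) for t in tokens)
        for i in range(NUM_PERM)
    )


def band_keys(sig: Tuple[int, ...]) -> List[Tuple[int, Tuple[int, ...]]]:
    return [(b, sig[b * ROWS_PER_BAND:(b + 1) * ROWS_PER_BAND]) for b in range(BANDS)]


def build_index(lake: Dict[str, Table]) -> Dict[Tuple[int, Tuple[int, ...]], Set[str]]:
    index: Dict[Tuple[int, Tuple[int, ...]], Set[str]] = {}
    for name, table in lake.items():
        types = types_of(e for row in table for e in row)
        if types:  # tables without typed entities cannot be hashed
            for key in band_keys(minhash(types)):
                index.setdefault(key, set()).add(name)
    return index


def search(query: Sequence[str], lake: Dict[str, Table], index, k: int = 10) -> List[str]:
    qtypes = types_of(query)
    if not qtypes:
        return []
    # Candidates: tables sharing at least one LSH band with the query's type set.
    candidates = set().union(*(index.get(key, set()) for key in band_keys(minhash(qtypes))))
    # Exact scoring only on candidates; ties broken by table name.
    return sorted(candidates, key=lambda n: (-table_score(query, lake[n]), n))[:k]


lake: Dict[str, Table] = {
    "cities": [["Barcelona", "Spain"], ["Aalborg", "Denmark"]],
    "players": [["Messi"]],
}
index = build_index(lake)
print(search(["Barcelona", "Spain"], lake, index))
```

Running the sketch prints `['cities']`. The `players` table never reaches the exact scoring step because its type set shares no LSH band with the query's. The design trades some recall for speed: only LSH candidates are scored exactly. The paper reports its speedups with LSH, but its actual signals (entity types and embeddings) and index structure may differ from this toy version.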
Translated title of the contribution | Fantastic Tables and Where to Find Them: Table Search in Semantic Data Lakes |
---|---|
Original language | English |
Title of host publication | Proceedings 28th International Conference on Extending Database Technology (EDBT 2025) |
Number of pages | 14 |
Place of publication | OpenProceedings.org |
Publisher | OpenProceedings |
Publication date | 2025 |
Edition | 28 |
Pages | 397-410 |
ISBN (Print) | 978-3-89318-098-1 |
DOI | |
Status | Published - 2025 |
Event | 28th International Conference on Extending Database Technology (EDBT) - Barcelona, Spain; duration: 25 Mar 2025 → 28 Mar 2025; https://edbticdt2025.upc.edu/ |
Conference
Conference | 28th International Conference on Extending Database Technology (EDBT) |
---|---|
Country/Territory | Spain |
City | Barcelona |
Period | 25/03/2025 → 28/03/2025 |
Internet address | https://edbticdt2025.upc.edu/ |
Name | Advances in Database Technology |
---|---|
Volume | 28 |
ISSN | 2367-2005 |
Keywords
- Table search
- Semantic Web
- Data Lakes
- Data discovery
Poul Due Jensen Professorate in Big Data and Artificial Intelligence
Hose, K. (principal investigator), Jendal, T. E. (project participant) & Hansen, E. R. (project participant)
01/11/2019 → 31/12/2025
Projects: Project › Research

RelWeb: A Reliable Web of Data
Hose, K. (principal investigator)
01/09/2019 → 31/08/2024
Projects: Project › Research