Abstract
The Yelp Open Dataset (YOD) is a widely used dataset for Recommender Systems (RS).
Multiple knowledge graphs (KGs) have been built for YOD, but they have various issues:
the conversion processes usually do not follow state-of-the-art methodologies, fail to properly link to other KGs, do not link to existing vocabularies, and ignore important data, and the resulting KGs are generally small.
Instead, we present the Yelp Collaborative Knowledge Graph (YCKG), which correctly integrates taxonomies, product categories, business locations, and the Yelp social network through common practices within the semantic web community, overcoming all these issues.
The YCKG includes 150k businesses and 16.9M reviews from 1.9M distinct real users, resulting in over 244 million triples and 144 distinct predicates for about 72 million resources, with an average in-degree and out-degree of 3.3 and 12.2, respectively.
Further, we release both the data and the code used to generate the KG for inspection and further extensions.
This dataset can be used to develop and test both recommendation and data-mining algorithms able to exploit rich and semantically meaningful knowledge.
The code for the CKG construction is publicly available at: https://github.com/MadsCorfixen/The-Yelp-Collaborative-Knowledge-Graph.
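The abstract reports size statistics (triples, distinct predicates, resources, average in-degree and out-degree) without stating how they were computed. The following is a minimal sketch of one plausible way to derive such figures from a list of triples. The toy triples, the `ex:` identifiers, and the `graph_stats` helper are all hypothetical, and averaging each degree only over nodes that have at least one edge in that direction is an assumption, not the authors' documented method.

```python
from collections import Counter

# Toy (subject, predicate, object) triples; every identifier is hypothetical.
triples = [
    ("ex:user1", "ex:wrote", "ex:review1"),
    ("ex:user1", "ex:friendOf", "ex:user2"),
    ("ex:user2", "ex:friendOf", "ex:user1"),
    ("ex:user2", "ex:wrote", "ex:review2"),
    ("ex:review1", "ex:about", "ex:business1"),
    ("ex:review2", "ex:about", "ex:business1"),
    ("ex:business1", "ex:locatedIn", "ex:city1"),
]


def graph_stats(triples):
    """Compute simple size and degree statistics for a list of triples."""
    predicates = {p for _, p, _ in triples}
    out_degree = Counter(s for s, _, _ in triples)  # edges leaving each subject
    in_degree = Counter(o for _, _, o in triples)   # edges entering each object
    resources = set(out_degree) | set(in_degree)
    return {
        "triples": len(triples),
        "predicates": len(predicates),
        "resources": len(resources),
        # Assumption: average only over nodes with at least one such edge.
        "avg_out_degree": sum(out_degree.values()) / len(out_degree),
        "avg_in_degree": sum(in_degree.values()) / len(in_degree),
    }


stats = graph_stats(triples)
for name, value in stats.items():
    print(name, value)
```

Under this assumption the two averages can differ, because the set of nodes with outgoing edges need not match the set of nodes with incoming edges. On a graph of the YCKG's size, these counts would more realistically be computed with aggregate queries against a triple store than by loading all triples into memory.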
| Original language | English |
|---|---|
| Title of host publication | CIKM '25: Proceedings of the 34th ACM International Conference on Information and Knowledge Management |
| Number of pages | 6 |
| Publisher | Association for Computing Machinery (ACM) |
| Publication status | Accepted/In press - Aug 2025 |
Keywords
- Dataset
- Recommender systems
- Knowledge graph
- Open Data
Projects
- Poul Due Jensen Professorate in Big Data and Artificial Intelligence
  Hose, K. (PI), Jendal, T. E. (Project Participant) & Hansen, E. R. (Project Participant)
  01/11/2019 → 31/12/2025
  Project: Research
- EDAO: Example Driven Analytics for Open Knowledge Graphs
  Lissandrini, M. (PI), Pedersen, T. B. (Supervisor) & Hose, K. (Other)
  15/09/2019 → 14/09/2021
  Project: Research
Datasets
- The Yelp Collaborative Knowledge Graph
  Corfixen, M. (Creator), Olesen, M. (Creator), Heede, T. (Creator), Nielsen, C. F. P. (Creator), Lissandrini, M. (Contributor), Dell'Aglio, D. (Contributor) & Jendal, T. E. (Contributor), Zenodo, 7 May 2023
  DOI: 10.5281/zenodo.7878446, https://zenodo.org/record/7878446
  Dataset