Abstract
Updates to ontologies affect the operations built on top of them. But not all changes are equal: some updates drastically change the result of operations; others lead to minor variations, if any. Hence, estimating the impact of a change ex ante is highly important, as it can make ontology engineers aware of the consequences of their actions during editing. However, in order to estimate the impact of changes, we need to understand how to measure them.
To address this gap for embeddings, we propose a new measure called the Embedding Resemblance Indicator (ERI), which takes into account both the stochasticity of learning embeddings and the shortcomings of established comparison methods. We base ERI on (i) a similarity score, (ii) a robustness factor $\hat{\mu}$ (based on the embedding method, similarity measure, and dataset), and (iii) the number of entities added to or deleted from the embedding, computed with the Jaccard index.
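The abstract does not give the exact formula that combines these three ingredients. As a purely illustrative Python sketch, the snippet below assumes embeddings are entity-to-vector dictionaries, uses the mean cosine similarity over shared entities as the similarity score, and combines it multiplicatively with $\hat{\mu}$ and the Jaccard index; the function names and the combination rule are assumptions, not the paper's definition.

```python
import numpy as np

def jaccard_index(entities_a, entities_b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two entity sets."""
    a, b = set(entities_a), set(entities_b)
    return len(a & b) / len(a | b)

def eri_sketch(emb_a, emb_b, mu_hat):
    """Hypothetical ERI-style score for two embeddings given as
    {entity: vector} dicts; mu_hat is the robustness factor."""
    shared = sorted(set(emb_a) & set(emb_b))
    A = np.stack([emb_a[e] for e in shared])
    B = np.stack([emb_b[e] for e in shared])
    # Similarity score: mean cosine similarity over shared entities.
    cos = np.sum(A * B, axis=1) / (
        np.linalg.norm(A, axis=1) * np.linalg.norm(B, axis=1))
    # Assumed combination: normalise by mu_hat and discount by the
    # Jaccard overlap of the entity sets (added/deleted entities).
    return (float(np.mean(cos)) / mu_hat) * jaccard_index(emb_a, emb_b)
```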
To evaluate ERI, we investigate its usage in the context of two biomedical ontologies and three embedding methods (GraRep, LINE, and DeepWalk) as well as two standard benchmark datasets (FB15k-237 and WN18RR) with TransE and RESCAL embeddings. To study different aspects of ERI, we introduce synthetic changes into the knowledge graphs, generating two test cases with five versions each, and compare their impact with the expected behaviour. Our studies suggest that ERI behaves as expected and captures the similarity of embeddings according to the severity of the changes. ERI is crucial for enabling further studies into the impact of changes on embeddings.
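The abstract does not detail how the synthetic changes are generated. A minimal sketch of one plausible generator, assuming deletion-only changes of increasing severity (the function name, severity levels, and toy graph are hypothetical, not the paper's procedure):

```python
import random

def perturb(triples, fraction, seed=42):
    """Return a copy of the knowledge graph with roughly `fraction`
    of its (head, relation, tail) triples deleted, simulating a
    synthetic change of the given severity."""
    rng = random.Random(seed)
    n_remove = int(len(triples) * fraction)
    removed = set(rng.sample(range(len(triples)), n_remove))
    return [t for i, t in enumerate(triples) if i not in removed]

# Example: five versions of a toy graph with 0%-20% of triples deleted.
kg = [("aspirin", "treats", "pain"), ("pain", "symptomOf", "flu"),
      ("aspirin", "isA", "drug"), ("flu", "isA", "disease")]
versions = [perturb(kg, f) for f in (0.0, 0.05, 0.10, 0.15, 0.20)]
```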
| Original language | English |
|---|---|
| Title of host publication | K-CAP '21: Proceedings of the 11th International Conference on Knowledge Capture (K-CAP 2021) |
| Number of pages | 8 |
| Publisher | Association for Computing Machinery (ACM) |
| Publication date | 2021 |
| Pages | 177-184 |
| ISBN (Electronic) | 978-1-4503-8457-5 |
| DOIs | |
| Publication status | Published - 2021 |
| Event | K-CAP 2021: Knowledge Capture Conference, Virtual Event, 2 Dec 2021 → 3 Dec 2021 (Conference number: 11th) |
Conference
| Conference | K-CAP 2021: Knowledge Capture Conference |
|---|---|
| Number | 11th |
| Location | Virtual Event |
| Period | 02/12/2021 → 03/12/2021 |
Keywords
- Ontology evolution
- Machine learning
- Embedding similarity
- Knowledge graph embeddings