Counts-of-counts similarity for prediction and search in relational data

Manfred Jaeger; Marco Lippi; Giovanni  Pellegrini; Andrea Passerini

doi:10.1007/s10618-019-00621-7

Counts-of-counts similarity for prediction and search in relational data

Manfred Jaeger, Marco Lippi, Giovanni Pellegrini, Andrea Passerini

Research output: Contribution to journal › Journal article › Research › peer-review

90 Downloads (Pure)

Abstract

Defining appropriate distance functions is a crucial aspect of effective and efficient similarity-based prediction and retrieval. Relational data are especially challenging in this regard. By viewing relational data as multi-relational graphs, one can easily see that a distance between a pair of nodes can be defined in terms of a virtually unlimited class of features, including node attributes, attributes of node neighbors, structural aspects of the node neighborhood and arbitrary combinations of these properties. In this paper we propose a rich and flexible class of metrics on graph entities based on earth mover’s distance applied to a hierarchy of complex counts-of-counts statistics. We further propose an approximate version of the distance using sums of marginal earth mover’s distances. We show that the approximation is correct for many cases of practical interest and allows efficient nearest-neighbor retrieval when combined with a simple metric tree data structure. An experimental evaluation on two real-world scenarios highlights the flexibility of our framework for designing metrics representing different notions of similarity. Substantial improvements in similarity-based prediction are reported when compared to solutions based on state-of-the-art graph kernels.

Original language	English
Journal	Data Mining and Knowledge Discovery
Volume	33
Issue number	5
Pages (from-to)	1254–1297
Number of pages	44
ISSN	1384-5810
DOIs	https://doi.org/10.1007/s10618-019-00621-7
Publication status	Published - Sept 2019

Keywords

Earth-mover’s distance
Graph mininga
Relational data mining
Similarity search
Statistical relational learning

Access to Document

10.1007/s10618-019-00621-7

Accepted author manuscriptAccepted author manuscript, 1.36 MB

https://link.springer.com/article/10.1007/s10618-019-00621-7

AUB Link

Search for the material in Aalborg University Library's search engine

Cite this

@article{753a64432c784576aab210b44926c72d,

title = "Counts-of-counts similarity for prediction and search in relational data",

abstract = "Defining appropriate distance functions is a crucial aspect of effective and efficient similarity-based prediction and retrieval. Relational data are especially challenging in this regard. By viewing relational data as multi-relational graphs, one can easily see that a distance between a pair of nodes can be defined in terms of a virtually unlimited class of features, including node attributes, attributes of node neighbors, structural aspects of the node neighborhood and arbitrary combinations of these properties. In this paper we propose a rich and flexible class of metrics on graph entities based on earth mover{\textquoteright}s distance applied to a hierarchy of complex counts-of-counts statistics. We further propose an approximate version of the distance using sums of marginal earth mover{\textquoteright}s distances. We show that the approximation is correct for many cases of practical interest and allows efficient nearest-neighbor retrieval when combined with a simple metric tree data structure. An experimental evaluation on two real-world scenarios highlights the flexibility of our framework for designing metrics representing different notions of similarity. Substantial improvements in similarity-based prediction are reported when compared to solutions based on state-of-the-art graph kernels.",

keywords = "Earth-mover{\textquoteright}s distance, Graph mininga, Relational data mining, Similarity search, Statistical relational learning",

author = "Manfred Jaeger and Marco Lippi and Giovanni Pellegrini and Andrea Passerini",

year = "2019",

month = sep,

doi = "10.1007/s10618-019-00621-7",

language = "English",

volume = "33",

pages = "1254–1297",

journal = "Data Mining and Knowledge Discovery",

issn = "1384-5810",

publisher = "Springer",

number = "5",

}

TY - JOUR

T1 - Counts-of-counts similarity for prediction and search in relational data

AU - Jaeger, Manfred

AU - Lippi, Marco

AU - Pellegrini, Giovanni

AU - Passerini, Andrea

PY - 2019/9

Y1 - 2019/9

N2 - Defining appropriate distance functions is a crucial aspect of effective and efficient similarity-based prediction and retrieval. Relational data are especially challenging in this regard. By viewing relational data as multi-relational graphs, one can easily see that a distance between a pair of nodes can be defined in terms of a virtually unlimited class of features, including node attributes, attributes of node neighbors, structural aspects of the node neighborhood and arbitrary combinations of these properties. In this paper we propose a rich and flexible class of metrics on graph entities based on earth mover’s distance applied to a hierarchy of complex counts-of-counts statistics. We further propose an approximate version of the distance using sums of marginal earth mover’s distances. We show that the approximation is correct for many cases of practical interest and allows efficient nearest-neighbor retrieval when combined with a simple metric tree data structure. An experimental evaluation on two real-world scenarios highlights the flexibility of our framework for designing metrics representing different notions of similarity. Substantial improvements in similarity-based prediction are reported when compared to solutions based on state-of-the-art graph kernels.

AB - Defining appropriate distance functions is a crucial aspect of effective and efficient similarity-based prediction and retrieval. Relational data are especially challenging in this regard. By viewing relational data as multi-relational graphs, one can easily see that a distance between a pair of nodes can be defined in terms of a virtually unlimited class of features, including node attributes, attributes of node neighbors, structural aspects of the node neighborhood and arbitrary combinations of these properties. In this paper we propose a rich and flexible class of metrics on graph entities based on earth mover’s distance applied to a hierarchy of complex counts-of-counts statistics. We further propose an approximate version of the distance using sums of marginal earth mover’s distances. We show that the approximation is correct for many cases of practical interest and allows efficient nearest-neighbor retrieval when combined with a simple metric tree data structure. An experimental evaluation on two real-world scenarios highlights the flexibility of our framework for designing metrics representing different notions of similarity. Substantial improvements in similarity-based prediction are reported when compared to solutions based on state-of-the-art graph kernels.

KW - Earth-mover’s distance

KW - Graph mininga

KW - Relational data mining

KW - Similarity search

KW - Statistical relational learning

U2 - 10.1007/s10618-019-00621-7

DO - 10.1007/s10618-019-00621-7

M3 - Journal article

SN - 1384-5810

VL - 33

SP - 1254

EP - 1297

JO - Data Mining and Knowledge Discovery

JF - Data Mining and Knowledge Discovery

IS - 5

ER -

Counts-of-counts similarity for prediction and search in relational data

Abstract

Keywords

Access to Document

AUB Link

Fingerprint

Cite this