TY - GEN
T1 - Beyond frequencies
T2 - 21st International Conference on Extending Database Technology, EDBT 2018
AU - Preti, Giulia
AU - Mottin, Davide
AU - Lissandrini, Matteo
AU - Velegrakis, Yannis
PY - 2018/1/1
Y1 - 2018/1/1
N2 - Graph pattern mining aims at identifying structures that appear frequently in large graphs, under the assumption that frequency signies importance. Several measures of frequency have been proposed that respect the apriori property, essential for an e-cient search of the patterns. This property states that the number of appearances of a pattern in a graph cannot be larger than the frequency of any of its sub-patterns. In real life, there are many graphs with weights on nodes and/or edges. For these graphs, it is fair that the importance (score) of a pattern is determined not only by the number of its appearances, but also by the weights on the nodes/edges of those appearances. Scoring functions based on the weights do not generally satisfy the apriori property, thus forcing many approaches to employ other, less ecient, pruning strategies to speed up the computation. The problem becomes even more challenging in the case of multiple weighting functions that assign dierent weights to the same nodes/edges. In this work, we provide ecient and eective techniques for mining patterns in multi-weight graphs. We devise both an exact and an approximate solution. The rst is characterized by intelligent storage and computation of the pattern scores, while the second is based on the aggregation of similar weighting functions to allow scalability and avoid redundant computations. Both methods adopt a scoring function that respects the apriori property, and thus they can rely on eective pruning strategies. Extensive experiments under dierent parameter settings prove that the presence of edge weights and the choice of scoring function aect the patterns mined, and hence the quality of the results returned to the user. Finally, experiments on datasets of dierent sizes and increasing numbers of weighting functions show that, even when the performance of the exact algorithm degrades, the approximate algorithm performs well and with quite good quality.
AB - Graph pattern mining aims at identifying structures that appear frequently in large graphs, under the assumption that frequency signies importance. Several measures of frequency have been proposed that respect the apriori property, essential for an e-cient search of the patterns. This property states that the number of appearances of a pattern in a graph cannot be larger than the frequency of any of its sub-patterns. In real life, there are many graphs with weights on nodes and/or edges. For these graphs, it is fair that the importance (score) of a pattern is determined not only by the number of its appearances, but also by the weights on the nodes/edges of those appearances. Scoring functions based on the weights do not generally satisfy the apriori property, thus forcing many approaches to employ other, less ecient, pruning strategies to speed up the computation. The problem becomes even more challenging in the case of multiple weighting functions that assign dierent weights to the same nodes/edges. In this work, we provide ecient and eective techniques for mining patterns in multi-weight graphs. We devise both an exact and an approximate solution. The rst is characterized by intelligent storage and computation of the pattern scores, while the second is based on the aggregation of similar weighting functions to allow scalability and avoid redundant computations. Both methods adopt a scoring function that respects the apriori property, and thus they can rely on eective pruning strategies. Extensive experiments under dierent parameter settings prove that the presence of edge weights and the choice of scoring function aect the patterns mined, and hence the quality of the results returned to the user. Finally, experiments on datasets of dierent sizes and increasing numbers of weighting functions show that, even when the performance of the exact algorithm degrades, the approximate algorithm performs well and with quite good quality.
UR - http://www.scopus.com/inward/record.url?scp=85072013048&partnerID=8YFLogxK
U2 - 10.5441/002/edbt.2018.16
DO - 10.5441/002/edbt.2018.16
M3 - Article in proceeding
AN - SCOPUS:85072013048
T3 - Advances in Database Technology - EDBT
SP - 169
EP - 180
BT - Advances in Database Technology - EDBT 2018
A2 - Bohlen, Michael
A2 - Pichler, Reinhard
A2 - May, Norman
A2 - Rahm, Erhard
A2 - Wu, Shan-Hung
A2 - Hose, Katja
PB - OpenProceedings.org
Y2 - 26 March 2018 through 29 March 2018
ER -