TY - JOUR
T1 - Local fast failover routing with low stretch
AU - Foerster, Klaus Tycho
AU - Pignolet, Yvonne Anne
AU - Schmid, Stefan
AU - Tredan, Gilles
PY - 2018/1/1
Y1 - 2018/1/1
N2 - Network failures are frequent and disruptive, and can significantly reduce the throughput even in highly connected and regular networks such as datacenters. While many modern networks support some kind of local fast failover to quickly reroute flows encountering link failures to new paths, employing such mechanisms is known to be non-trivial, as conditional failover rules can only depend on local failure information. While over the last years, important insights have been gained on how to design failover schemes providing high resiliency, existing approaches have the shortcoming that the resulting failover routes may be unnecessarily long, i.e., they have a large stretch compared to the original route length. This is a serious drawback, as long routes entail higher latencies and introduce loads, which may cause the rerouted flows to interfere with existing flows and harm throughput. This paper presents the first deterministic local fast failover algorithms providing provable resiliency and failover route lengths, even in the presence of many concurrent failures. We present stretch-optimal failover algorithms for dierent network topologies, including multi-dimensional grids, hypercubes and Clos networks, as they are frequently deployed in the context of HPC clusters and datacenters. We show that the computed failover routes are optimal in the sense that no failover algorithm can provide shorter paths for a given number of link failures.
AB - Network failures are frequent and disruptive, and can significantly reduce the throughput even in highly connected and regular networks such as datacenters. While many modern networks support some kind of local fast failover to quickly reroute flows encountering link failures to new paths, employing such mechanisms is known to be non-trivial, as conditional failover rules can only depend on local failure information. While over the last years, important insights have been gained on how to design failover schemes providing high resiliency, existing approaches have the shortcoming that the resulting failover routes may be unnecessarily long, i.e., they have a large stretch compared to the original route length. This is a serious drawback, as long routes entail higher latencies and introduce loads, which may cause the rerouted flows to interfere with existing flows and harm throughput. This paper presents the first deterministic local fast failover algorithms providing provable resiliency and failover route lengths, even in the presence of many concurrent failures. We present stretch-optimal failover algorithms for dierent network topologies, including multi-dimensional grids, hypercubes and Clos networks, as they are frequently deployed in the context of HPC clusters and datacenters. We show that the computed failover routes are optimal in the sense that no failover algorithm can provide shorter paths for a given number of link failures.
KW - Fast Reroute
KW - Network Algorithms
KW - Static Resiliency
UR - http://www.scopus.com/inward/record.url?scp=85047296815&partnerID=8YFLogxK
U2 - 10.1145/3211852.3211858
DO - 10.1145/3211852.3211858
M3 - Journal article
AN - SCOPUS:85047296815
SN - 0146-4833
VL - 48
SP - 35
EP - 41
JO - Computer Communication Review
JF - Computer Communication Review
IS - 1
ER -