Fault Tolerant Horizontal Computation Offloading

Alexander Droob*, Daniel Morratz, Frederik Langkilde Jakobsen, Jacob Carstensen, Magnus Mathiesen, Rune Bohnstedt, Michele Albano, Sergio Moreschini, Davide Taibi

*Corresponding author for this work

Research output: Contribution to book/anthology/report/conference proceeding › Article in proceeding › Research › peer-review

3 Citations (Scopus)

Abstract

The broad development and usage of edge devices has highlighted the importance of creating resilient and computationally advanced edge-to-cloud continuum environments. When working with edge devices, these desiderata are usually achieved through replication and offloading. This paper reports on the design and implementation of a fault-tolerant service that enables the offloading of jobs from devices with limited computational power. We propose a solution that allows users to upload jobs through a web service; the jobs are then executed on edge nodes within the system. The solution is designed to be fault tolerant and scalable, with no single point of failure and with the ability to accommodate growth if the service is expanded. The use of Docker checkpointing on the worker machines ensures that jobs can be resumed in the event of a fault. We provide a mathematical approach to optimize the number of checkpoints that are created along a computation, given that we can forecast the time needed to execute a job. We present experiments that indicate in which scenarios checkpointing benefits job execution. Our experiments show the benefits of using checkpoint and restore when the job completion time rises compared with the forecast fault rate.
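The abstract does not state the formula the authors use to choose the number of checkpoints. As a rough illustration of the kind of trade-off involved, the sketch below uses Young's classical first-order approximation for the checkpoint interval, which is a swapped-in textbook technique and not necessarily the paper's method. The function names and the parameters `job_time_s`, `checkpoint_cost_s`, and `mtbf_s` (mean time between faults) are assumptions made for this example.

```python
import math


def young_interval(checkpoint_cost_s: float, mtbf_s: float) -> float:
    """Young's approximation: interval = sqrt(2 * checkpoint cost * MTBF).

    Assumption: faults arrive at a constant forecast rate of 1 / mtbf_s.
    """
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)


def checkpoint_count(job_time_s: float, checkpoint_cost_s: float, mtbf_s: float) -> int:
    """Number of checkpoints for a job whose run time has been forecast.

    The job is split into roughly equal segments of the approximated
    interval; no checkpoint is taken after the final segment.
    """
    interval = young_interval(checkpoint_cost_s, mtbf_s)
    segments = max(1, round(job_time_s / interval))
    return segments - 1


if __name__ == "__main__":
    # Hypothetical numbers: 2 s per checkpoint, one fault every 15 minutes.
    print(checkpoint_count(job_time_s=3600, checkpoint_cost_s=2, mtbf_s=900))
    print(checkpoint_count(job_time_s=30, checkpoint_cost_s=2, mtbf_s=900))
```

With these hypothetical numbers the approximated interval is 60 s, so the one-hour job is assigned 59 checkpoints while the 30-second job is assigned none. This is consistent with the qualitative claim in the abstract that checkpointing pays off as job completion time grows relative to the forecast fault rate.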

Original language: English
Title of host publication: Proceedings - 2023 IEEE International Conference on Edge Computing and Communications, EDGE 2023
Editors: Claudio Ardagna, Feras Awaysheh, Hongyi Bian, Carl K. Chang, Rong N. Chang, Flavia Delicato, Nirmit Desai, Jing Fan, Geoffrey C. Fox, Andrzej Goscinski, Zhi Jin, Anna Kobusinska, Omer Rana
Number of pages: 6
Publisher: IEEE
Publication date: 2023
Pages: 177-182
ISBN (Print): 979-8-3503-0484-8
ISBN (Electronic): 979-8-3503-0483-1
Publication status: Published - 2023
Event: 7th IEEE International Conference on Edge Computing and Communications, EDGE 2023 - Hybrid, Chicago, United States
Duration: 2 Jul 2023 → 8 Jul 2023

Conference

Conference: 7th IEEE International Conference on Edge Computing and Communications, EDGE 2023
Country/Territory: United States
City: Hybrid, Chicago
Period: 02/07/2023 → 08/07/2023
Series: Proceedings - IEEE International Conference on Edge Computing
Volume: 2023-July
ISSN: 2767-9918

Bibliographical note

Publisher Copyright:
© 2023 IEEE.

Keywords

  • checkpointing
  • edge nodes
  • orchestration
  • replication
  • totally ordered multicast
  • workers
