Shield Synthesis for Reinforcement Learning

Bettina Könighofer; Florian Lorber; Nils Jansen; Roderick Bloem

doi:10.1007/978-3-030-61362-4_16

Corresponding author: Bettina Könighofer

Research output: Contribution to book/anthology/report/conference proceeding › Article in proceeding › Research › peer-review

22 Citations (Scopus)

Abstract

Reinforcement learning algorithms discover policies that maximize reward. However, these policies generally do not adhere to safety, leaving safety in reinforcement learning (and in artificial intelligence in general) an open research problem. Shield synthesis is a formal approach to synthesize a correct-by-construction reactive system called a shield that enforces safety properties of a running system while interfering with its operation as little as possible. A shield attached to a learning agent guarantees safety during learning and execution phases. In this paper we summarize three types of shields that are synthesized from different specification languages, and discuss their applicability to reinforcement learning. First, we discuss deterministic shields that enforce specifications expressed as linear temporal logic specifications. Second, we discuss the synthesis of probabilistic shields from specifications in probabilistic temporal logic. Third, we discuss how to synthesize timed shields from timed automata specifications. This paper summarizes the application areas, advantages, disadvantages and synthesis approaches for the three types of shields and gives an overview of experimental results.

Original language	English
Title of host publication	Leveraging Applications of Formal Methods, Verification and Validation: Verification Principles - 9th International Symposium on Leveraging Applications of Formal Methods, ISoLA 2020, Proceedings
Editors	Tiziana Margaria, Bernhard Steffen
Number of pages	17
Publisher	Springer Science+Business Media
Publication date	2020
Pages	290-306
ISBN (Print)	978-3-030-61361-7
ISBN (Electronic)	978-3-030-61362-4
DOIs	https://doi.org/10.1007/978-3-030-61362-4_16
Publication status	Published - 2020
Event	9th International Symposium on Leveraging Applications of Formal Methods, Verification and Validation, ISoLA 2020 - Rhodes, Greece. Duration: 20 Oct 2020 → 30 Oct 2020

Conference

Conference	9th International Symposium on Leveraging Applications of Formal Methods, Verification and Validation, ISoLA 2020
Country/Territory	Greece
City	Rhodes
Period	20/10/2020 → 30/10/2020

Series	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	12476 LNCS
ISSN	0302-9743

Bibliographical note

Publisher Copyright:
© 2020, Springer Nature Switzerland AG.

Access to Document

10.1007/978-3-030-61362-4_16

Cite this

Könighofer, B., Lorber, F., Jansen, N., & Bloem, R. (2020). Shield Synthesis for Reinforcement Learning. In T. Margaria, & B. Steffen (Eds.), Leveraging Applications of Formal Methods, Verification and Validation: Verification Principles - 9th International Symposium on Leveraging Applications of Formal Methods, ISoLA 2020, Proceedings (pp. 290-306). Springer Science+Business Media. https://doi.org/10.1007/978-3-030-61362-4_16

Könighofer, Bettina ; Lorber, Florian ; Jansen, Nils et al. / Shield Synthesis for Reinforcement Learning. Leveraging Applications of Formal Methods, Verification and Validation: Verification Principles - 9th International Symposium on Leveraging Applications of Formal Methods, ISoLA 2020, Proceedings. editor / Tiziana Margaria ; Bernhard Steffen. Springer Science+Business Media, 2020. pp. 290-306 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 12476 LNCS).

@inproceedings{cf2e0c2e7ef84297968d92e26d7782ab,

title = "Shield Synthesis for Reinforcement Learning",

abstract = "Reinforcement learning algorithms discover policies that maximize reward. However, these policies generally do not adhere to safety, leaving safety in reinforcement learning (and in artificial intelligence in general) an open research problem. Shield synthesis is a formal approach to synthesize a correct-by-construction reactive system called a shield that enforces safety properties of a running system while interfering with its operation as little as possible. A shield attached to a learning agent guarantees safety during learning and execution phases. In this paper we summarize three types of shields that are synthesized from different specification languages, and discuss their applicability to reinforcement learning. First, we discuss deterministic shields that enforce specifications expressed as linear temporal logic specifications. Second, we discuss the synthesis of probabilistic shields from specifications in probabilistic temporal logic. Third, we discuss how to synthesize timed shields from timed automata specifications. This paper summarizes the application areas, advantages, disadvantages and synthesis approaches for the three types of shields and gives an overview of experimental results.",

author = "Bettina K{\"o}nighofer and Florian Lorber and Nils Jansen and Roderick Bloem",

note = "Publisher Copyright: {\textcopyright} 2020, Springer Nature Switzerland AG.; 9th International Symposium on Leveraging Applications of Formal Methods, Verification and Validation, ISoLA 2020 ; Conference date: 20-10-2020 Through 30-10-2020",

year = "2020",

doi = "10.1007/978-3-030-61362-4_16",

language = "English",

isbn = "978-3-030-61361-7",

series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

volume = "12476",

publisher = "Springer Science+Business Media",

pages = "290--306",

editor = "Tiziana Margaria and Bernhard Steffen",

booktitle = "Leveraging Applications of Formal Methods, Verification and Validation",

address = "United States",

}

Könighofer, B, Lorber, F, Jansen, N & Bloem, R 2020, Shield Synthesis for Reinforcement Learning. in T Margaria & B Steffen (eds), Leveraging Applications of Formal Methods, Verification and Validation: Verification Principles - 9th International Symposium on Leveraging Applications of Formal Methods, ISoLA 2020, Proceedings. Springer Science+Business Media, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 12476 LNCS, pp. 290-306, 9th International Symposium on Leveraging Applications of Formal Methods, Verification and Validation, ISoLA 2020, Rhodes, Greece, 20/10/2020. https://doi.org/10.1007/978-3-030-61362-4_16

Shield Synthesis for Reinforcement Learning. / Könighofer, Bettina; Lorber, Florian; Jansen, Nils et al.
Leveraging Applications of Formal Methods, Verification and Validation: Verification Principles - 9th International Symposium on Leveraging Applications of Formal Methods, ISoLA 2020, Proceedings. ed. / Tiziana Margaria; Bernhard Steffen. Springer Science+Business Media, 2020. p. 290-306 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 12476 LNCS).

TY - GEN

T1 - Shield Synthesis for Reinforcement Learning

AU - Könighofer, Bettina

AU - Lorber, Florian

AU - Jansen, Nils

AU - Bloem, Roderick

PY - 2020

Y1 - 2020

N2 - Reinforcement learning algorithms discover policies that maximize reward. However, these policies generally do not adhere to safety, leaving safety in reinforcement learning (and in artificial intelligence in general) an open research problem. Shield synthesis is a formal approach to synthesize a correct-by-construction reactive system called a shield that enforces safety properties of a running system while interfering with its operation as little as possible. A shield attached to a learning agent guarantees safety during learning and execution phases. In this paper we summarize three types of shields that are synthesized from different specification languages, and discuss their applicability to reinforcement learning. First, we discuss deterministic shields that enforce specifications expressed as linear temporal logic specifications. Second, we discuss the synthesis of probabilistic shields from specifications in probabilistic temporal logic. Third, we discuss how to synthesize timed shields from timed automata specifications. This paper summarizes the application areas, advantages, disadvantages and synthesis approaches for the three types of shields and gives an overview of experimental results.

AB - Reinforcement learning algorithms discover policies that maximize reward. However, these policies generally do not adhere to safety, leaving safety in reinforcement learning (and in artificial intelligence in general) an open research problem. Shield synthesis is a formal approach to synthesize a correct-by-construction reactive system called a shield that enforces safety properties of a running system while interfering with its operation as little as possible. A shield attached to a learning agent guarantees safety during learning and execution phases. In this paper we summarize three types of shields that are synthesized from different specification languages, and discuss their applicability to reinforcement learning. First, we discuss deterministic shields that enforce specifications expressed as linear temporal logic specifications. Second, we discuss the synthesis of probabilistic shields from specifications in probabilistic temporal logic. Third, we discuss how to synthesize timed shields from timed automata specifications. This paper summarizes the application areas, advantages, disadvantages and synthesis approaches for the three types of shields and gives an overview of experimental results.

UR - http://www.scopus.com/inward/record.url?scp=85097420844&partnerID=8YFLogxK

U2 - 10.1007/978-3-030-61362-4_16

DO - 10.1007/978-3-030-61362-4_16

M3 - Article in proceeding

AN - SCOPUS:85097420844

SN - 978-3-030-61361-7

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 290

EP - 306

BT - Leveraging Applications of Formal Methods, Verification and Validation

A2 - Margaria, Tiziana

A2 - Steffen, Bernhard

PB - Springer Science+Business Media

T2 - 9th International Symposium on Leveraging Applications of Formal Methods, Verification and Validation, ISoLA 2020

Y2 - 20 October 2020 through 30 October 2020

ER -

Könighofer B, Lorber F, Jansen N, Bloem R. Shield Synthesis for Reinforcement Learning. In Margaria T, Steffen B, editors, Leveraging Applications of Formal Methods, Verification and Validation: Verification Principles - 9th International Symposium on Leveraging Applications of Formal Methods, ISoLA 2020, Proceedings. Springer Science+Business Media. 2020. p. 290-306. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 12476 LNCS). doi: 10.1007/978-3-030-61362-4_16
