Shielded Reinforcement Learning for Safe and Optimal Cyber Physical Systems

Publication: Contribution to journal / Conference abstract in journal / Research / Peer-reviewed

Abstract

I will present recent advances and applications of the tool UPPAAL Stratego (www.uppaal.org), which supports automatic synthesis of guaranteed safe and near-optimal control strategies for cyber physical systems. UPPAAL Stratego supports reinforcement learning methods to construct near-optimal controllers. However, the behavior of such learned controllers is not guaranteed to be safe, even when safety is encouraged by reward engineering. One way of imposing safety on a learned controller is to use a safety shield, synthesized using symbolic methods from model checking and hence correct by design. To make synthesis of shields for hybrid environments tractable, UPPAAL Stratego uses various abstraction techniques for hybrid systems.

We study the impact of the synthesized shield when applied as either a pre-shield (applied before learning a controller) or a post-shield (applied only after learning a controller). In addition, trade-offs between efficiency of strategy representation and degree of optimality subject to safety constraints will be discussed, as well as successful ongoing applications (water management, heating systems, and traffic control).
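
The distinction between pre-shielding and post-shielding can be illustrated with a small toy sketch. Everything in it is an assumption made for illustration: the integer "level" state, the three actions, the safe range, and the names `safe_actions`, `pre_shielded_choice`, and `post_shield` are hypothetical and are not part of UPPAAL Stratego. In particular, the shield here is a one-step lookahead that happens to suffice for this toy system, whereas the abstract describes shields synthesized symbolically.

```python
from typing import Callable, Dict, List, Tuple

State = int    # hypothetical discrete state, e.g. a water or temperature level
Action = int   # hypothetical action: lower (-1), hold (0), or raise (+1) the level

ACTIONS: List[Action] = [-1, 0, 1]
SAFE_MIN, SAFE_MAX = 0, 10  # assumed safe range for the level


def step(state: State, action: Action) -> State:
    """Toy deterministic dynamics: the action shifts the level by one."""
    return state + action


def safe_actions(state: State) -> List[Action]:
    """Toy shield: actions whose successor stays inside the safe range.

    One-step lookahead is enough here because 'hold' is always safe
    from any state inside the range.
    """
    return [a for a in ACTIONS if SAFE_MIN <= step(state, a) <= SAFE_MAX]


def pre_shielded_choice(state: State,
                        q_values: Dict[Tuple[State, Action], float]) -> Action:
    """Pre-shield: the learner only ever chooses among shield-approved actions."""
    allowed = safe_actions(state)
    return max(allowed, key=lambda a: q_values.get((state, a), 0.0))


def post_shield(policy: Callable[[State], Action]) -> Callable[[State], Action]:
    """Post-shield: wrap an already-learned policy and override unsafe choices."""
    def shielded(state: State) -> Action:
        allowed = safe_actions(state)
        proposed = policy(state)
        return proposed if proposed in allowed else allowed[0]
    return shielded


# An unshielded policy that always raises the level would leave the safe range.
always_up = lambda s: 1
controller = post_shield(always_up)

state = 8
for _ in range(5):
    state = step(state, controller(state))
```

In this sketch the post-shielded controller follows the learned policy wherever that is safe and deviates only at the boundary, which may cost optimality because the policy was learned without knowledge of the shield. The pre-shielded learner never evaluates unsafe actions in the first place.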
Original language: English
Journal: Electronic Proceedings in Theoretical Computer Science, EPTCS
Volume: 409
Pages (from-to): 2
Number of pages: 1
ISSN: 2075-2180
DOI
Status: Published - 30 Oct 2024
Event: 15th International Symposium on Games, Automata, Logics, and Formal Verification, G and ALF 2024 - Reykjavik, Iceland
Duration: 19 Jun 2024 to 21 Jun 2024

Conference

Conference: 15th International Symposium on Games, Automata, Logics, and Formal Verification, G and ALF 2024
Country/Territory: Iceland
City: Reykjavik
Period: 19/06/2024 to 21/06/2024
