Online Learning of Safety function for Markov Decision Processes

Abhijit Mazumdar*, Rafal Wisniewski, Manuela L Bujorianu

*Corresponding author for this work

Research output: Contribution to book/anthology/report/conference proceeding › Article in proceeding › Research › peer-review

1 Citation (Scopus)

Abstract

In this paper, we study safety specifications for a Markov decision process with a stochastic stopping time in an almost model-free setting. Our approach characterizes a proxy set of states that are close, in a probabilistic sense, to the set of unsafe states (the forbidden set). We also provide results that relate the safety function to reinforcement learning. Building on these results, we develop an online algorithm based on the temporal difference method to compute the safety function. Finally, we provide simulation results that demonstrate the approach on a simple example.
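As a rough illustration of the temporal difference idea mentioned in the abstract, the sketch below estimates a safety-style quantity on a toy random walk. It is not the paper's algorithm: the five-state chain, the function name `td_safety_estimate`, the step-size schedule, and the reading of the safety function as the probability of reaching the forbidden state before the process stops are all assumptions made for this example.

```python
import random


def td_safety_estimate(n_states=5, episodes=20000, seed=0):
    """TD(0) estimate of the probability of hitting the forbidden state
    before stopping, on a hypothetical symmetric random walk.

    Assumptions (not from the paper): state 0 is where the process stops,
    the last state is the forbidden state, and interior states move one
    step left or right with equal probability.
    """
    rng = random.Random(seed)
    stop = 0
    forbidden = n_states - 1

    # Boundary values: 1 on the forbidden state, 0 where the process stops.
    values = [0.0] * n_states
    values[forbidden] = 1.0
    visits = [0] * n_states

    for _ in range(episodes):
        # Start each episode from a random interior state.
        state = rng.randrange(1, forbidden)
        while state not in (stop, forbidden):
            next_state = state + rng.choice((-1, 1))
            visits[state] += 1
            # Decaying step size 1 / n^0.6 for the visited state.
            step = 1.0 / visits[state] ** 0.6
            # TD(0) update toward the value of the sampled successor.
            values[state] += step * (values[next_state] - values[state])
            state = next_state
    return values


if __name__ == "__main__":
    for s, v in enumerate(td_safety_estimate()):
        print(f"state {s}: estimated hitting probability {v:.3f}")
```

For this toy chain the exact hitting probability from state i is i/4, so the estimates can be checked against a known answer. The update uses only sampled transitions, which is the sense in which such a scheme needs no explicit transition model.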
Original language: English
Title of host publication: 2023 European Control Conference (ECC)
Number of pages: 6
Publisher: IEEE
Publication date: 13 Jun 2023
Pages: 1-6
ISBN (Print): 978-1-6654-6531-1, 978-3-907144-09-1
ISBN (Electronic): 978-3-907144-08-4
Publication status: Published - 13 Jun 2023
Event: 2023 European Control Conference, ECC 2023 - Bucharest, Romania
Duration: 13 Jun 2023 to 16 Jun 2023

Conference

Conference: 2023 European Control Conference, ECC 2023
Country/Territory: Romania
City: Bucharest
Period: 13/06/2023 to 16/06/2023
