Online Learning of Safety function for Markov Decision Processes

Abhijit Mazumdar*, Rafal Wisniewski, Manuela L Bujorianu

*Corresponding author

Publication: Contribution to book/anthology/report/conference proceeding > Conference article in proceeding > Research > peer-reviewed

1 Citation (Scopus)

Abstract

In this paper, we study safety specifications for a Markov decision process with a stochastic stopping time in an almost model-free setting. Our approach characterizes a proxy set of states that are near, in a probabilistic sense, to the set of unsafe states (the forbidden set). We also provide results that relate the safety function to reinforcement learning. Building on these results, we develop an online algorithm based on the temporal difference method to compute the safety function. Finally, we provide simulation results that demonstrate the approach on a simple example.
Original language: English
Title of host publication: 2023 European Control Conference (ECC)
Number of pages: 6
Publisher: IEEE (Institute of Electrical and Electronics Engineers)
Publication date: 13 Jun 2023
Pages: 1-6
ISBN (Print): 978-1-6654-6531-1, 978-3-907144-09-1
ISBN (Electronic): 978-3-907144-08-4
DOI
Status: Published - 13 Jun 2023
Event: 2023 European Control Conference, ECC 2023 - Bucharest, Romania
Duration: 13 Jun 2023 to 16 Jun 2023

Conference

Conference: 2023 European Control Conference, ECC 2023
Country/Territory: Romania
City: Bucharest
Period: 13/06/2023 to 16/06/2023
