Safe reinforcement learning for constrained Markov decision processes with stochastic stopping time

Abhijit Mazumdar*, Rafal Wisniewski, Manuela L Bujorianu

*Corresponding author

Publication: Contribution to book/anthology/report/conference proceeding · Conference article in proceedings · Research · Peer-reviewed


Abstract

In this paper, we present an online reinforcement learning algorithm for constrained Markov decision processes with a safety constraint. Although the problem has received considerable attention from the scientific community, learning an optimal policy without violating the safety constraint during the learning phase has yet to be addressed in the setting of a stochastic stopping time. To this end, we propose an algorithm based on linear programming that does not require a process model. We show that the learned policy is safe with high confidence. We also propose a method to compute a safe baseline policy, which is central to developing algorithms that do not violate the safety constraints. Finally, we provide simulation results demonstrating the efficacy of the proposed algorithm, and we show that efficient exploration can be achieved by defining a subset of the state space called the proxy set.
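The abstract refers to a linear-programming-based approach. As background, the sketch below shows the classical occupancy-measure LP for a discounted constrained MDP, solved with scipy.optimize.linprog on a small randomly generated instance. It is not the paper's algorithm: the sketch assumes a known transition kernel P and a discounted objective, whereas the proposed method is model-free and concerns a stochastic stopping time; the state/action sizes, discount factor, and safety budget below are all hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

# Small synthetic CMDP instance (all numbers are hypothetical, for illustration).
nS, nA = 3, 2                 # number of states and actions
gamma = 0.9                   # discount factor
rng = np.random.default_rng(0)

P = rng.dirichlet(np.ones(nS), size=(nS, nA))   # P[s, a, s']: transition kernel
r = rng.uniform(0.0, 1.0, size=(nS, nA))        # reward r(s, a)
c = rng.uniform(0.0, 1.0, size=(nS, nA))        # safety cost c(s, a)
mu0 = np.full(nS, 1.0 / nS)                     # initial state distribution

# Decision variable: discounted occupancy measure rho(s, a), flattened to nS * nA.
# Bellman-flow constraints:
#   sum_a rho(s, a) - gamma * sum_{s', a'} P[s', a', s] * rho(s', a') = mu0(s).
A_eq = np.zeros((nS, nS * nA))
for s in range(nS):
    for a in range(nA):
        A_eq[s, s * nA + a] += 1.0
    for s_prev in range(nS):
        for a in range(nA):
            A_eq[s, s_prev * nA + a] -= gamma * P[s_prev, a, s]
b_eq = mu0

# First solve an auxiliary LP minimizing the expected cost, so that the safety
# budget d chosen below is guaranteed to be feasible for this random instance.
aux = linprog(c.flatten(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
d = 1.1 * aux.fun             # safety budget: 10% above the minimum achievable cost

# Constrained LP: maximize expected reward (minimize its negative) subject to the
# flow constraints and the expected-cost (safety) constraint sum rho * c <= d.
res = linprog(-r.flatten(),
              A_ub=c.flatten()[None, :], b_ub=np.array([d]),
              A_eq=A_eq, b_eq=b_eq,
              bounds=(0, None), method="highs")

# Recover a stationary randomized policy from the occupancy measure.
rho = res.x.reshape(nS, nA)
policy = rho / rho.sum(axis=1, keepdims=True)
print("Constrained-optimal policy (rows: states, columns: action probabilities):")
print(policy)
```

The final step recovers a randomized stationary policy by normalizing the occupancy measure over actions at each state, which is the standard way a policy is read off from the solution of such an LP.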
Original language: English
Title: 2024 IEEE 63rd Conference on Decision and Control (CDC)
Publisher: IEEE (Institute of Electrical and Electronics Engineers)
Publication date: 2024
Article number: 10886382
ISBN (Print): 979-8-3503-1632-2, 979-8-3503-1634-6
ISBN (Electronic): 979-8-3503-1633-9
DOI
Status: Published - 2024
Event: 2024 IEEE 63rd Conference on Decision and Control (CDC) - Milan, Italy
Duration: 16 Dec 2024 - 19 Dec 2024

Conference

Conference: 2024 IEEE 63rd Conference on Decision and Control (CDC)
Country/Territory: Italy
City: Milan
Period: 16/12/2024 - 19/12/2024
Name: IEEE Conference on Decision and Control. Proceedings
ISSN: 0743-1546
