Abstract
We study optimality for the safety-constrained Markov decision process, the underlying framework for safe reinforcement learning. Specifically, we consider an undiscounted safety-constrained Markov decision process subject to random stopping times. The decision maker's goal is to reach a goal state while avoiding unsafe states with prescribed probabilistic guarantees. The underlying Markov chain is therefore multichain (non-ergodic) for any control policy, since by definition there exist a goal set and an unsafe set. As a counterexample highlights, Bellman's principle of optimality does not hold for such a safety-constrained Markov decision process in the multichain setting. We resolve this counterexample by considering a zero-sum game between the policy and the Lagrange multiplier vector. Under suitable assumptions on the existence of an admissible policy, we propose an off-policy RL algorithm for learning an optimal policy that satisfies the probabilistic safety guarantees. We then present a finite-time error bound for the proposed RL algorithm. Lastly, we present simulation results of the RL algorithm on a robot in a grid-world setting.
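The abstract frames safe RL as a zero-sum game between the policy and a Lagrange multiplier vector. As a rough, hypothetical illustration of that primal-dual idea — not the paper's algorithm — the following sketches tabular Lagrangian Q-learning on a toy chain MDP with a goal state and an unsafe state. All states, transition probabilities, the cost budget, and the learning rates here are invented for illustration.

```python
import numpy as np

# Toy chain MDP (illustrative, not from the paper): states 0,1 are
# transient, 2 is the goal (reward 1), 3 is the unsafe state (cost 1).
rng = np.random.default_rng(0)
n_states, n_actions = 4, 2
goal, unsafe = 2, 3

def step(s, a):
    # Action 0 moves two states toward the goal but is riskier;
    # action 1 moves one state and is safer. Probabilities are assumed.
    if s in (goal, unsafe):
        return s, 0.0, 0.0, True
    p_unsafe = 0.2 if a == 0 else 0.05
    if rng.random() < p_unsafe:
        return unsafe, 0.0, 1.0, True          # cost 1 on entering the unsafe set
    s2 = min(s + (2 if a == 0 else 1), goal)
    return s2, (1.0 if s2 == goal else 0.0), 0.0, s2 == goal

Q = np.zeros((n_states, n_actions))
lam, budget = 0.0, 0.1                          # multiplier and allowed unsafe probability
alpha, gamma, eps = 0.1, 0.95, 0.1
for episode in range(5000):
    s, done, ep_cost = 0, False, 0.0
    while not done:
        a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
        s2, r, c, done = step(s, a)
        # Primal step: Q-learning on the Lagrangian reward r - lam * c.
        target = (r - lam * c) + (0.0 if done else gamma * Q[s2].max())
        Q[s, a] += alpha * (target - Q[s, a])
        ep_cost += c
        s = s2
    # Dual step: ascent on the safety constraint (episode cost vs. budget).
    lam = max(0.0, lam + 0.01 * (ep_cost - budget))
```

The multiplier rises while the realized unsafe probability exceeds the budget, penalizing risky actions in the primal Q-update; this is the standard Lagrangian relaxation view of a constrained MDP, shown here only to make the game-theoretic framing concrete.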
Original language | English |
---|---|
Book series | IFAC-PapersOnLine |
Volume | 58 |
Issue number | 17 |
Pages (from-to) | 338-343 |
Number of pages | 6 |
ISSN | 2405-8971 |
DOI | |
Status | Published - 1 Aug. 2024 |
Event | 26th International Symposium on Mathematical Theory of Networks and Systems, MTNS 2024 - Cambridge, United Kingdom. Duration: 19 Aug. 2024 → 23 Aug. 2024 |
Conference
Conference | 26th International Symposium on Mathematical Theory of Networks and Systems, MTNS 2024 |
---|---|
Country/Territory | United Kingdom |
City | Cambridge |
Period | 19/08/2024 → 23/08/2024 |
Sponsor | International Federation of Automatic Control (IFAC) |
Bibliographical note
Publisher Copyright: © 2024 The Authors.
Fingerprint
Dive into the research topics of 'On principle of optimality for safety-constrained Markov Decision Process and p-Safe Reinforcement Learning'. Together they form a unique fingerprint.
Projects
- 1 Finished
- SWIFT
Wisniewski, R. (PI (principal investigator)), Misra, R. (Project participant), Rathore, S. S. (Project participant), Kuskonmaz, B. (Project participant), Jessen, J. F. (Project participant), Andersen, A. O. (CoPI) & Gundersen, J. S. (Project participant)
01/10/2019 → 30/09/2024
Projects: Project › Research
Equipment
- Smart Water Infrastructures Laboratory (SWIL)
Ledesma, J. V. (Operator), Wisniewski, R. (Manager), Kallesøe, C. (Operator), Rathore, S. S. (Manager), Misra, R. (Manager), Sawant, V. S. (Manager) & Mazumdar, A. (Manager)
Department of Electronic Systems. Facility: Laboratory
Publication
- 1 PhD thesis
- Decentralized Control of Complex Systems: Managing Uncertainties and Multi-objective Optimization with Multiple Controllers
Misra, R., 2024, Aalborg University Open Publishing. 223 p. Publication: PhD thesis
Open access | 51 downloads (Pure)