Abstract
Modern model checking tools like UPPAAL Stratego provide a rich framework for modeling cyber-physical systems involving non-determinism as well as time, stochastic and continuous state descriptors. A key objective is to design controllers for such systems that optimize a given objective, e.g., minimizing energy consumption. At an abstract level, the controller design problem can be cast as optimizing a strategy in a continuous (Euclidean) Markov decision process. Partitioning the continuous state space is a simple yet effective way to solve this optimization problem in a flexible, non-parametric manner. In previous work we introduced a reinforcement learning strategy under an undiscounted cost objective on dynamically refined partitions, and we analyzed, at the semantic level, approximations of Euclidean MDPs by Imprecise MDPs. In this paper we extend the approximation analysis to discounted and average cost objectives, and we move to close the gap between the theoretical analysis and the practical reinforcement learning approach. We introduce several alternative simulation strategies that, on the one hand, maintain approximation guarantees as the granularity of the partitioning increases and, on the other hand, turn our learning scenario into a standard Q-learning procedure.
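To make the idea of learning on a partitioned continuous state space concrete, the following is a minimal sketch of tabular Q-learning with a discounted cost objective over a fixed uniform grid. It is not the method of the paper: the one-dimensional dynamics in `step`, the cost function, the fixed (non-refined) partition, and all parameter values are hypothetical choices made only for illustration.

```python
import random


def cell_of(x, n_cells):
    """Map a state x in [0, 1] to the index of its partition cell."""
    return min(int(x * n_cells), n_cells - 1)


def step(x, action, rng):
    """Hypothetical dynamics: move left (0) or right (1) with small noise."""
    move = (-0.1, 0.1)[action]
    nxt = min(1.0, max(0.0, x + move + rng.uniform(-0.02, 0.02)))
    cost = abs(nxt - 0.5)  # assumed cost: distance from the target 0.5
    return nxt, cost


def q_learning(n_cells=10, episodes=2000, horizon=30,
               gamma=0.9, alpha=0.1, eps=0.2, seed=0):
    """Standard Q-learning (cost minimization) on the partition cells."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_cells)]
    for _episode in range(episodes):
        x = rng.random()  # start each episode in a random state
        for _t in range(horizon):
            s = cell_of(x, n_cells)
            if rng.random() < eps:
                a = rng.randrange(2)  # explore
            else:
                a = 0 if q[s][0] <= q[s][1] else 1  # cheapest known action
            x, cost = step(x, a, rng)
            s_next = cell_of(x, n_cells)
            # Discounted-cost Bellman target: immediate cost plus the
            # discounted minimal estimated cost of the successor cell.
            target = cost + gamma * min(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
    return q


def greedy_policy(q):
    """Pick, for every cell, the action with the lowest estimated cost."""
    return [0 if row[0] <= row[1] else 1 for row in q]
```

Calling `greedy_policy(q_learning())` yields one action per partition cell. Raising `n_cells` refines the partition, which loosely corresponds to the granularity parameter discussed in the abstract, although the approximation guarantees studied in the paper are not reproduced by this sketch.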
Original language | English |
---|---|
Title | Bridging the Gap Between AI and Reality : Second International Conference, AISoLA 2024 |
Editors | Bernhard Steffen |
Publisher | Springer |
Publication date | 30 Dec 2024 |
Pages | 312-335 |
ISBN (Print) | 978-3-031-75433-3 |
ISBN (Electronic) | 978-3-031-75434-0 |
DOI | |
Status | Published - 30 Dec 2024 |
Event | AISoLA 2024 - Crete, Greece Duration: 30 Oct 2024 → 3 Nov 2024 |
Conference
Conference | AISoLA 2024 |
---|---|
Country/Territory | Greece |
City | Crete |
Period | 30/10/2024 → 03/11/2024 |
Name | Lecture Notes in Computer Science (LNCS) |
---|---|
Volume | LNCS 15217 |
ISSN | 0302-9743 |