Teaching Stratego to Play Ball: Optimal Synthesis for Continuous Space MDPs

Manfred Jaeger, Peter Gjøl Jensen*, Kim Guldstrand Larsen, Axel Bernard E Legay, Sean Sedwards, Jakob Haahr Taankvist

*Corresponding author for this work

Research output: Contribution to book/anthology/report/conference proceeding; Article in proceeding; Research; peer-review

18 Citations (Scopus)
353 Downloads (Pure)

Abstract

Formal models of cyber-physical systems, such as priced timed Markov decision processes, require a state space with continuous and discrete components. The problem of controller synthesis for such systems can then be cast as finding optimal strategies for Markov decision processes over a Euclidean state space. We develop two different reinforcement learning strategies that tackle the problem of continuous state spaces via online partition refinement techniques. We provide theoretical insights into the convergence of partition refinement schemes. Our techniques are implemented in Uppaal Stratego. Experimental results show the advantages of our new techniques over previous optimization algorithms of Uppaal Stratego.
Original language: English
Title of host publication: Automated Technology for Verification and Analysis - 17th International Symposium, ATVA 2019, Proceedings: ATVA 2019: Automated Technology for Verification and Analysis
Editors: Yu-Fang Chen, Chih-Hong Cheng, Javier Esparza
Number of pages: 17
Publisher: Springer
Publication date: 28 Oct 2019
Pages: 81-97
ISBN (Print): 978-3-030-31783-6
ISBN (Electronic): 978-3-030-31784-3
DOIs
Publication status: Published - 28 Oct 2019
Event: International Symposium on Automated Technology for Verification and Analysis - Taipei, Taiwan, Province of China
Duration: 28 Oct 2019 to 31 Oct 2019

Conference

Conference: International Symposium on Automated Technology for Verification and Analysis
Country/Territory: Taiwan, Province of China
City: Taipei
Period: 28/10/2019 to 31/10/2019
Series: Lecture Notes in Computer Science
Volume: 11781
ISSN: 0302-9743
