L*-Based Learning of Markov Decision Processes

Martin Tappler*, Bernhard K. Aichernig, Giovanni Bacci, Maria Eichlseder, Kim Guldstrand Larsen


Publikation: Bidrag til bog/antologi/rapport/konference proceedingKonferenceartikel i proceedingForskningpeer review

5 Citationer (Scopus)
11 Downloads (Pure)


Automata learning techniques automatically generate system models from test observations. These techniques usually fall into two categories: passive and active. Passive learning uses a predetermined data set, e.g., system logs. In contrast, active learning actively queries the system under learning, which is considered more efficient.
An influential active learning technique is Angluin’s L* algorithm for regular languages which inspired several generalisations from DFAs to other automata-based modelling formalisms. In this work, we study L*-based learning of deterministic Markov decision processes, first assuming an ideal setting with perfect information. Then, we relax this assumption and present a novel learning algorithm that collects information by sampling system traces via testing. Experiments with the implementation of our sampling-based algorithm suggest that it achieves better accuracy than state-of-the-art passive learning techniques with the same amount of test data. Unlike existing learning algorithms with predefined states, our algorithm learns the complete model structure including the states.
TitelFormal Methods – The Next 30 Years - 3rd World Congress, FM 2019, Proceedings : FM 2019: Formal Methods – The Next 30 Years
RedaktørerMaurice H. ter Beek, Annabelle McIver, José N. Oliveira
Antal sider19
Publikationsdato23 sep. 2019
ISBN (Trykt)978-3-030-30941-1
ISBN (Elektronisk)978-3-030-30942-8
StatusUdgivet - 23 sep. 2019
BegivenhedInternational Symposium on Formal Methods: Formal Methods – The Next 30 Years - Porto, Portugal
Varighed: 7 okt. 201911 okt. 2019


KonferenceInternational Symposium on Formal Methods
NavnLecture Notes in Computer Science

Fingeraftryk Dyk ned i forskningsemnerne om 'L*-Based Learning of Markov Decision Processes'. Sammen danner de et unikt fingeraftryk.