Debugging a Policy: Automatic Action-Policy Testing in AI Planning

Marcel Steinmetz, Daniel Fišer, Hasan Ferit Eniser, Patrick Ferber, Timo P. Gros, Philippe Heim, Daniel Höller, Xandra Schuler, Valentin Wüstholz, Maria Christakis, Jörg Hoffmann

Research output: Contribution to book/anthology/report/conference proceeding; Article in proceeding; Research; peer-review

6 Citations (Scopus)

Abstract

Testing is a promising way to gain trust in neural action policies π. Previous work on policy testing in sequential decision making targeted environment behavior leading to failure conditions. But if the failure is unavoidable given that behavior, then π is not actually to blame. For a situation to qualify as a "bug" in π, there must be an alternative policy π' that does better. We introduce a generic policy testing framework based on that intuition. This raises the bug confirmation problem, deciding whether or not a state is a bug. We analyze the use of optimistic and pessimistic bounds for the design of test oracles approximating that problem. We contribute an implementation of our framework in classical planning, experimenting with several test oracles and with random-walk methods generating test states biased to poor policy performance and/or state novelty. We evaluate these techniques on policies π learned with ASNets. We find that they are able to effectively identify bugs in these π, and that our random-walk biases improve over uninformed baselines.
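The role of optimistic and pessimistic bounds in a test oracle can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a cost-minimizing setting in which policy failure is modeled as infinite cost, and the names `Verdict` and `bound_oracle` are hypothetical. A state is reported as a bug only when some known alternative is strictly cheaper than what π achieves, and is cleared only when no alternative can possibly be cheaper.

```python
import math
from enum import Enum


class Verdict(Enum):
    BUG = "bug"
    NOT_BUG = "not a bug"
    UNKNOWN = "unknown"


def bound_oracle(policy_cost, lower_bound, upper_bound):
    """Classify a test state from bounds on the best achievable cost.

    Assumptions (illustrative, not taken from the paper):
      policy_cost: cost the policy incurs from the state,
                   math.inf if the policy fails to reach the goal.
      lower_bound: optimistic bound, lower_bound <= best achievable cost.
      upper_bound: pessimistic bound, best achievable cost <= upper_bound,
                   math.inf if no alternative solution is known.
    """
    # A known alternative is strictly cheaper than the policy: confirmed bug.
    if upper_bound < policy_cost:
        return Verdict.BUG
    # No alternative can beat the policy: the policy is not to blame.
    if lower_bound >= policy_cost:
        return Verdict.NOT_BUG
    # The bounds are too loose to decide either way.
    return Verdict.UNKNOWN
```

For example, `bound_oracle(math.inf, 0, 7)` reports a bug, because the policy fails from a state for which a solution of cost 7 is known, while `bound_oracle(math.inf, math.inf, math.inf)` does not, because the failure is unavoidable from that state.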
Original language: English
Title of host publication: Proceedings of the Thirty-Second International Conference on Automated Planning and Scheduling, ICAPS-22
Number of pages: 9
Publication date: 13 Jun 2022
Pages: 353-361
Publication status: Published - 13 Jun 2022
Externally published: Yes
Event: The 32nd International Conference on Automated Planning and Scheduling - Virtual, Singapore, Singapore
Duration: 13 Jun 2022 to 24 Jun 2022

Conference

Conference: The 32nd International Conference on Automated Planning and Scheduling
Location: Virtual
Country/Territory: Singapore
City: Singapore
Period: 13/06/2022 to 24/06/2022
