We consider Markov decision processes (MDPs) with an unknown probabilistic transition function and an unknown reward function. In this setting, we formalize the problem of maximizing the mean-payoff value with high probability while satisfying a parity objective. This problem can be viewed as strategy synthesis for parametric MDPs whose parameters are fixed but unknown. Assuming that the support of the unknown transition function and a lower bound on the minimal transition probability are known in advance, we construct probably approximately correct (PAC) strategies with respect to the mean-payoff objective that guarantee sure or almost-sure satisfaction of the parity condition, depending on the memory allowed. This is joint work with Guillermo Perez and Jean-Francois Raskin, published at CONCUR’18, together with ongoing work.
Sat 6 Apr