Learning-Based Mean-Payoff Optimization in an unknown MDP under Omega-Regular Constraints (SynCoP 2019 - 6th Workshop on Synthesis of Complex Parameters)

Sat 6 - Thu 11 April 2019 Prague, Czech Republic

Track

SynCoP 2019

When

Sat 6 Apr 2019 16:45 - 17:30 at S10 - Markov decision processes

Abstract

We consider Markov decision processes (MDPs) with an unknown probabilistic transition function and an unknown reward function. We formalize the problem of maximizing the mean-payoff value with high probability while satisfying a parity objective in this setting. This problem can be viewed as strategy synthesis for parametric MDPs where the parameters are fixed but unknown. Assuming that the support of the unknown transition function and a lower bound on the minimal transition probability are known in advance, we construct probably approximately correct (PAC) strategies with respect to the mean-payoff objective that guarantee sure or almost-sure satisfaction of a parity condition, depending on the memory allowed. This is joint work with Guillermo Perez and Jean-Francois Raskin, published at CONCUR'18, together with ongoing work.
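To make the PAC ingredient concrete, the following is a minimal sketch of how transition probabilities with a known support could be estimated from samples to a given accuracy and confidence. It is not the construction from the talk or the paper: it uses the standard two-sided Hoeffding bound with a union bound over successors, and the names `hoeffding_samples`, `estimate_transitions`, and `sample_successor` are hypothetical, introduced only for illustration.

```python
import math
import random


def hoeffding_samples(epsilon, delta):
    """Number of samples after which an empirical frequency is within
    epsilon of the true probability with probability at least 1 - delta
    (two-sided Hoeffding bound: n >= ln(2/delta) / (2 * epsilon^2))."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def estimate_transitions(sample_successor, support, epsilon, delta):
    """Estimate the successor distribution of one state-action pair.

    sample_successor: hypothetical callable returning one sampled successor.
    support: the known set of possible successors (assumed given in advance).
    The confidence budget delta is split evenly over the successors
    (union bound), so all estimates are epsilon-accurate simultaneously
    with probability at least 1 - delta.
    """
    n = hoeffding_samples(epsilon, delta / len(support))
    counts = {s: 0 for s in support}
    for _ in range(n):
        counts[sample_successor()] += 1
    return {s: c / n for s, c in counts.items()}


if __name__ == "__main__":
    # Toy example: an unknown distribution over two successors.
    rng = random.Random(0)
    sampler = lambda: "s1" if rng.random() < 0.3 else "s2"
    print(estimate_transitions(sampler, ["s1", "s2"], epsilon=0.05, delta=0.01))
```

In a sketch like this, a known lower bound on the minimal transition probability suggests a natural choice of accuracy: picking `epsilon` below that bound keeps every estimate for a successor in the support strictly positive whenever the guarantee holds.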

Session Program

Sat 6 Apr
Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna

16:00 - 18:00	Markov decision processes (SynCoP, at S10)

16:00 45m Talk		Convex Optimization meets Parameter Synthesis for MDPs (SynCoP), Nils Jansen
16:45 45m Talk		Learning-Based Mean-Payoff Optimization in an unknown MDP under Omega-Regular Constraints (SynCoP), Jan Kretinsky, Technical University of Munich

Jan Kretinsky

Technical University of Munich

ETAPS 2019
