Gradient-Based Reinforcement Learning for Dynamic Quantile

Author: Lukáš Janásek
Published in: IES Working Papers 12/2025
Keywords:

Dynamic programming, Quantile preferences, Reinforcement learning

JEL codes:

C61, C63

Citation:

Janásek L. (2025): "Gradient-Based Reinforcement Learning for Dynamic Quantile." IES Working Papers 12/2025, IES FSV, Charles University.

Abstract:

This paper develops a novel gradient-based reinforcement learning algorithm for solving dynamic quantile models under uncertainty. Unlike traditional approaches that rely on expected utility maximization, we focus on agents who evaluate outcomes based on specific quantiles of the utility distribution, capturing intratemporal risk attitudes via a quantile level τ ∈ (0, 1). We formulate a recursive quantile value function associated with time-consistent dynamic quantile preferences in a Markov decision process. In each period, the agent maximizes the τ-quantile, conditional on the current state, of the distribution formed by instantaneous utility plus the discounted future value. We then adapt the Actor-Critic framework to learn the τ-quantile of this distribution and the policy that maximizes it. We demonstrate the accuracy and robustness of the proposed algorithm on a quantile intertemporal consumption model with known analytical solutions. The results confirm that the algorithm captures optimal quantile-based behavior and remains numerically stable.
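In symbols, the recursive formulation described in the abstract can be sketched as follows, with Q_τ[· | s] denoting the conditional τ-quantile, u the instantaneous utility, β the discount factor, and s′ the next state (this notation is illustrative, not taken from the paper):

    V_\tau(s) = \max_a \; \mathcal{Q}_\tau\!\left[\, u(s, a) + \beta \, V_\tau(s') \,\middle|\, s \,\right]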
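A minimal sketch of how the adapted Actor-Critic update might look, assuming the standard quantile-regression (pinball) loss for the critic and a toy log-utility consumption-savings environment; the network sizes, shock process, and hyperparameters are illustrative assumptions, not the paper's implementation:

    import torch
    import torch.nn as nn

    tau, beta, R = 0.5, 0.95, 1.03   # quantile level, discount factor, gross return (assumed values)

    value = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))   # critic: V_tau(wealth)
    policy = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1),
                           nn.Sigmoid())                                   # actor: consumption share of wealth
    v_opt = torch.optim.Adam(value.parameters(), lr=1e-3)
    p_opt = torch.optim.Adam(policy.parameters(), lr=1e-4)

    def pinball(target, pred):
        # Quantile-regression ("check") loss: minimized when pred equals
        # the tau-quantile of the target distribution.
        d = target - pred
        return torch.mean(torch.maximum(tau * d, (tau - 1.0) * d))

    for step in range(2000):
        w = 0.1 + 10.0 * torch.rand(128, 1)              # batch of wealth states
        shocks = torch.exp(0.1 * torch.randn(128, 16))   # 16 return shocks per state (assumed process)
        c = policy(w) * w                                # consumption in (0, w)
        w_next = R * (w - c) * shocks                    # next-period wealth, shape (128, 16)
        # Distribution of instantaneous utility plus discounted continuation value:
        target = torch.log(c) + beta * value(w_next.reshape(-1, 1)).reshape(128, 16)

        # Critic step: pinball regression drives value(w) to the tau-quantile of the target.
        v_loss = pinball(target.detach(), value(w))
        v_opt.zero_grad(); v_loss.backward(); v_opt.step()

        # Actor step: ascend the empirical tau-quantile of the same distribution.
        p_loss = -torch.quantile(target, tau, dim=1).mean()
        p_opt.zero_grad(); p_loss.backward(); p_opt.step()

The pinball loss is minimized when the critic's output equals the conditional τ-quantile of its target, and the actor gradient flows through both log(c) and β·V_τ(w′), so the policy ascends the quantile objective rather than its expectation.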

Download: wp_2025_12_janasek