Volume 1, Issue 1
Perturbational Complexity by Distribution Mismatch: A Systematic Analysis of Reinforcement Learning in Reproducing Kernel Hilbert Space

Jihao Long & Jiequn Han

J. Mach. Learn., 1 (2022), pp. 1-37.

Published online: 2022-03

Category: Theory

[An open-access article; the PDF is free to any online user.]

  • Abstract

Most existing theoretical analyses of reinforcement learning (RL) are limited to the tabular setting or linear models due to the difficulty of dealing with function approximation in high-dimensional spaces with an uncertain environment. This work offers a fresh perspective on this challenge by analyzing RL in a general reproducing kernel Hilbert space (RKHS). We consider a family of Markov decision processes $\mathcal{M}$ whose reward functions lie in the unit ball of an RKHS and whose transition probabilities lie in a given arbitrary set. We define a quantity called the perturbational complexity by distribution mismatch $\Delta_{\mathcal{M}}(\epsilon)$ to characterize the complexity of the admissible state-action distribution space in response to a perturbation in the RKHS with scale $\epsilon$. We show that $\Delta_{\mathcal{M}}(\epsilon)$ gives both a lower bound on the error of all possible algorithms and an upper bound on the error of two specific algorithms (fitted reward and fitted $Q$-iteration) for the RL problem. Hence, the decay of $\Delta_{\mathcal{M}}(\epsilon)$ with respect to $\epsilon$ measures the difficulty of the RL problem on $\mathcal{M}$. We further provide some concrete examples and discuss whether $\Delta_{\mathcal{M}}(\epsilon)$ decays fast or not in these examples. As a byproduct, we show that when the reward functions lie in a high-dimensional RKHS, even if the transition probability is known and the action space is finite, it is still possible for RL problems to suffer from the curse of dimensionality.
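
To make the role of $\Delta_{\mathcal{M}}(\epsilon)$ more concrete, the display below gives a schematic paraphrase of the quantity, not the paper's exact definition; the set $\mathcal{D}_{\mathcal{M}}$ of admissible state-action distributions, the RKHS $\mathcal{H}$, and the state-action space $\mathcal{S}\times\mathcal{A}$ are notation assumed here only to convey the idea.

% Schematic only: how strongly an epsilon-small RKHS perturbation g of the reward
% can separate the admissible state-action distributions in D_M.
$$
\Delta_{\mathcal{M}}(\epsilon) \;\sim\; \sup_{\substack{g \in \mathcal{H} \\ \|g\|_{\mathcal{H}} \le \epsilon}} \Big[\, \sup_{\nu \in \mathcal{D}_{\mathcal{M}}} \int_{\mathcal{S}\times\mathcal{A}} g \,\mathrm{d}\nu \;-\; \inf_{\nu \in \mathcal{D}_{\mathcal{M}}} \int_{\mathcal{S}\times\mathcal{A}} g \,\mathrm{d}\nu \,\Big].
$$

Read this way, fast decay of $\Delta_{\mathcal{M}}(\epsilon)$ as $\epsilon \to 0$ means that small reward perturbations cannot be amplified by any admissible distribution, which is what makes the RL problem on $\mathcal{M}$ tractable.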

  • General Summary

Reinforcement learning is a field of machine learning concerned with how agents interact with an unknown environment sequentially to maximize the expected cumulative reward. In practical problems with enormous state spaces, function approximation is necessary to represent value or policy functions. However, when estimating the value or policy functions, one cannot access the probability distribution under which these functions should be approximated, namely the state distribution induced by the estimated value or policy functions, which is unknown a priori. This phenomenon is termed distribution mismatch in this paper: a mismatch between the distribution used for the estimation and the distribution induced by the current approximation of the value or policy functions. This phenomenon is ubiquitous in reinforcement learning and poses significant challenges for the analysis of its algorithms.
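
As a hedged illustration of this mismatch (the notation $\mu$, $\rho^{\hat\pi}$, $\hat Q$, $Q^{\ast}$ below is introduced here for exposition and is not taken from the paper): supervised fitting controls the error under the sampling distribution, while the performance of the resulting policy depends on the error under the distribution that the policy itself induces,

% Illustration only: mu is the data/sampling distribution, rho^{pi-hat} is the
% state-action distribution induced by the learned policy pi-hat.
$$
\mathbb{E}_{(s,a)\sim\mu}\big[(\hat Q(s,a)-Q^{\ast}(s,a))^{2}\big] \quad\text{vs.}\quad \mathbb{E}_{(s,a)\sim\rho^{\hat\pi}}\big[(\hat Q(s,a)-Q^{\ast}(s,a))^{2}\big],
$$

and the two error measures can differ substantially because $\rho^{\hat\pi}$ is not available at fitting time.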

This paper provides a new perspective on dealing with this difficulty. An obvious sufficient condition for controlling the resulting error is that the approximation errors be uniformly small under the probability distributions induced by all possible policies. This work shows that this condition is also necessary when the problem is defined in a reproducing kernel Hilbert space. It also introduces a key new concept, the perturbational complexity by distribution mismatch, which characterizes the complexity of the corresponding reinforcement learning problem.
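
One of the two algorithms whose error the paper bounds from above by this quantity is fitted $Q$-iteration. The snippet below is a minimal, generic sketch of fitted $Q$-iteration with kernel ridge regression, assuming a finite action set and a batch of transitions; it is not the authors' exact construction, and the function names, the RBF kernel, and the scikit-learn API choices are illustrative assumptions.

# A minimal, generic sketch of fitted Q-iteration with kernel ridge regression.
# NOT the paper's exact algorithm: names, the RBF kernel, and the data format
# below are assumptions made purely for illustration.
import numpy as np
from sklearn.kernel_ridge import KernelRidge

def fitted_q_iteration(transitions, actions, gamma=0.99, n_iters=50, alpha=1e-3):
    """transitions: list of (state, action, reward, next_state); states are 1-D arrays,
    actions are scalars drawn from the finite set `actions`."""
    X = np.array([np.append(s, a) for s, a, _, _ in transitions])  # (state, action) features
    R = np.array([r for _, _, r, _ in transitions])
    next_states = [s2 for _, _, _, s2 in transitions]

    q_model = None
    for _ in range(n_iters):
        if q_model is None:
            targets = R  # first iterate regresses the reward alone ("fitted reward")
        else:
            # Bellman backup: r + gamma * max_{a'} Q(s', a') over the finite action set
            next_q = np.array([
                max(q_model.predict(np.append(s2, a2).reshape(1, -1))[0] for a2 in actions)
                for s2 in next_states
            ])
            targets = R + gamma * next_q
        # Regress the backup targets on (state, action) pairs in an RKHS
        q_model = KernelRidge(kernel="rbf", alpha=alpha).fit(X, targets)
    return q_model

Acting greedily with respect to the returned q_model yields the learned policy; the paper's analysis bounds the suboptimality of kernel-based procedures of this general flavor in terms of $\Delta_{\mathcal{M}}(\epsilon)$.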

  • AMS Subject Headings

  • Copyright

COPYRIGHT: © Global Science Press

  • Keywords

Reinforcement learning, Reproducing kernel Hilbert space, Perturbational complexity by distribution mismatch, High-dimensionality analysis.

  • Citation

Long, Jihao and Han, Jiequn. (2022). Perturbational Complexity by Distribution Mismatch: A Systematic Analysis of Reinforcement Learning in Reproducing Kernel Hilbert Space. Journal of Machine Learning. 1 (1). 1-37. doi:10.4208/jml.220114