Optimal-PhiBE: A Framework for Reinforcement Learning in the Physical World
Yuhua Zhu
Assistant Professor, Department of Statistics and Data Science, UCLA

Abstract: This talk addresses reinforcement learning (RL) in the physical world, where dynamics exhibit smooth transitions. The central challenge is that we want to control the system continuously in time, yet only discrete-time data are available. We first show that classical RL algorithms suffer substantial discretization error because they ignore the intrinsic smoothness of the dynamics. We then introduce Optimal-PhiBE, a structure-aware framework that integrates discrete-time information into a continuous-time partial differential equation (PDE). By exploiting the smooth structure of the system dynamics, Optimal-PhiBE yields a provably more stable approximation to the optimal policy. In linear-quadratic control, Optimal-PhiBE can even recover an accurate optimal policy from discrete-time information alone.
We further develop a model-free algorithm to compute Optimal-PhiBE and establish its convergence under model misspecification, together with finite-sample guarantees. Remarkably, a minimal modification, changing a single line of code in standard RL algorithms, enables them to adapt to physical-world RL problems and significantly reduces both discretization and sample errors. Finally, we characterize a fundamental trade-off between discretization error and sample error that is inherent to RL in the physical world.
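To give a flavor of what a "single line of code" modification might look like, the sketch below is a hypothetical illustration, not the speaker's actual algorithm. It assumes a first-order PhiBE-style equation of the form beta*V(s) = r(s) + (E[V(s')] - V(s))/dt, which rearranges the tabular TD(0) target so that the effective discount factor exp(-beta*dt) becomes 1/(1 + beta*dt); every name and constant here is an assumption made for illustration.

```python
import numpy as np

def td_update(V, s, r, s_next, dt, beta, alpha, phibe=False):
    """One tabular TD(0) step; `phibe` toggles the single changed line.

    Hypothetical sketch: the PhiBE-style target below is one plausible
    first-order reading of the framework, not the talk's actual method.
    """
    if phibe:
        # PhiBE-style target: solve beta*V(s) = r(s) + (V(s') - V(s))/dt
        target = (r * dt + V[s_next]) / (1.0 + beta * dt)  # the changed line
    else:
        # Classical discounted target with discount factor exp(-beta*dt)
        target = r * dt + np.exp(-beta * dt) * V[s_next]
    V[s] += alpha * (target - V[s])
    return V

# Toy demo: two states that deterministically swap (purely illustrative).
rng = np.random.default_rng(0)
dt, beta, alpha = 0.1, 1.0, 0.5
rewards = [1.0, 0.0]
V_std, V_phibe = np.zeros(2), np.zeros(2)
for _ in range(2000):
    s = int(rng.integers(2))
    s_next = 1 - s
    td_update(V_std, s, rewards[s], s_next, dt, beta, alpha)
    td_update(V_phibe, s, rewards[s], s_next, dt, beta, alpha, phibe=True)
```

The two update rules differ only in the effective discount factor, yet for small time steps 1/(1 + beta*dt) and exp(-beta*dt) agree to first order, which is consistent with the abstract's point that a tiny code change can encode the continuous-time structure.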