Deep Reinforcement Learning (DRL) faces challenges in bridging the sim-to-real gap to enable real-world applications. In contrast to the simulated environments used in conventional DRL training, real-world systems are non-linear and evolve asynchronously; sensors and actuators have limited precision; communication channels are noisy; and many components introduce variable delays. While these issues are known to many researchers, published methods for systematically tackling DRL training under these conditions without using simulation are sparse. To this end, this paper proposes a non-blocking, asynchronous DRL training architecture for non-linear, real-time dynamical systems, tailored to handling variable delays. Compared to conventional DRL training, we: (i) decouple the RL loop into separate processes that run independently at their own frequencies, (ii) further decouple the collection of transition tuples (s_t, a_t, s_{t+1}) via asynchronous and independent streaming of both actions and observations, and (iii) mitigate the effects of delays and increase sample efficiency by providing delay-length measurements to the training loop and regularly retraining the DRL network. This allows the action step time to be tuned to find an optimal control frequency for a given system, and accommodates streamed observations that arrive with random delays and independently of action timing. We demonstrate the efficacy of this architecture with physical implementations of a commodity-grade swing-up pendulum and a quadrupedal robot. Our architecture achieves the best results, balancing the pendulum for almost the entire length of the episode, whereas conventional blocking approaches fail to learn effective policies. Our results show that these techniques scale to more complex tasks such as quadrupedal locomotion.
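
As a rough illustration of point (ii), the following Python sketch pairs independently timestamped action and observation streams into transition tuples and records how stale each observation was when the action was issued. It is not the speaker's implementation: the names `Observation`, `Action`, `Transition` and `build_transitions`, the pairing rule (use the most recent observation that had arrived by the time each action was sent), and the definition of delay are all assumptions made for this example.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Observation:  # hypothetical record for one streamed sensor sample
    sensed_at: float    # time the sensor sampled the state
    received_at: float  # time the sample reached the learner
    state: Tuple[float, ...]


@dataclass(frozen=True)
class Action:  # hypothetical record for one streamed action
    sent_at: float
    value: float


@dataclass(frozen=True)
class Transition:
    state: Tuple[float, ...]
    action: float
    next_state: Tuple[float, ...]
    delay: float  # age of `state` when the action was sent (assumed definition)


def latest_received(obs: List[Observation], times: List[float],
                    t: float) -> Optional[Observation]:
    """Most recent observation that had arrived by time t, or None."""
    i = bisect_right(times, t)
    return obs[i - 1] if i > 0 else None


def build_transitions(observations: List[Observation],
                      actions: List[Action]) -> List[Transition]:
    """Pair each action with the observations available at its send time
    and at the next action's send time (an assumed pairing rule)."""
    obs = sorted(observations, key=lambda o: o.received_at)
    times = [o.received_at for o in obs]
    acts = sorted(actions, key=lambda a: a.sent_at)
    transitions = []
    for current, following in zip(acts, acts[1:]):
        s = latest_received(obs, times, current.sent_at)
        s_next = latest_received(obs, times, following.sent_at)
        if s is None or s_next is None:
            continue  # no observation had arrived yet; skip this action
        transitions.append(Transition(
            s.state, current.value, s_next.state,
            delay=current.sent_at - s.sensed_at))
    return transitions


# Observations and actions arrive on their own clocks.
example_obs = [
    Observation(0.00, 0.02, (0.0,)),
    Observation(0.05, 0.09, (0.1,)),
    Observation(0.10, 0.12, (0.2,)),
]
example_actions = [Action(0.03, 1.0), Action(0.10, -1.0), Action(0.13, 0.5)]
example_transitions = build_transitions(example_obs, example_actions)
```

In this sketch the delay is stored alongside each tuple so that a training loop could condition on it, which is one plausible reading of point (iii); the architecture presented in the talk may measure and use delays differently.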

Speaker Bio:

Peter Bohm received his Master's degree in Management of Information Systems from Comenius University in Bratislava, Slovakia. He is currently a PhD student under the supervision of Dr. Archie Chapman and Dr. Pauline Pounds. His research interests include deep reinforcement learning (DRL) and the implementation of DRL in robotics and UAVs.

About Data Science Seminar

This seminar series will be run as weekly sessions and is hosted by ITEE Data Science.

78-421 and via Zoom