Reward Shaping for Reinforcement Learning in the Aviation Industry

Published in Hamburg University of Technology (TUHH), 2022

The incorporation of neural network models into reinforcement learning facilitates the application of machine learning to high-dimensional and complex problems. Some of the most demanding challenges arise in the field of robotics, which is widely investigated to increase the utilization of process automation techniques. In this work, the advantages of automation are further explored by examining a robotic gripper that opens the door of a printer enclosure based on the control theory of deep reinforcement learning. For the training process, two of the most widely used reinforcement learning algorithms, Soft Actor-Critic (SAC) and Proximal Policy Optimization (PPO), are implemented. The core idea in reinforcement learning is to explore an unknown environment by trial and error, where the feedback of a predefined reward function guides the agent towards the best-performing strategy. In order to study the convergence properties and accelerate the training, different reward functions are established and compared against each other. Furthermore, suitable evaluation metrics are deployed to illustrate the success rate of task completion. To validate the reward functions, different simulation environments are built that imitate the dynamics of the real-world door-opening problem. In this regard, interesting and surprising learning behaviours can be observed, in which the reinforcement learning agent tries to benefit from every inaccuracy in the task specification. More importantly, it can be concluded that PPO is faster and more reliable than SAC with respect to the door-opening task, and that carefully designing the reward functions can improve the learning behaviour of the training.
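As a concrete illustration of how a shaped reward can guide an agent towards a goal (a generic sketch, not the reward functions used in the thesis itself), the widely used potential-based shaping scheme adds `gamma * phi(s') - phi(s)` to the base reward. The snippet below assumes a hypothetical one-dimensional state (the door-hinge angle) and a made-up target angle purely for illustration:

```python
def make_shaped_reward(base_reward, potential, gamma=0.99):
    """Wrap a base reward with potential-based shaping:
    r'(s, a, s') = r(s, a, s') + gamma * phi(s') - phi(s)."""
    def shaped(state, action, next_state):
        return (base_reward(state, action, next_state)
                + gamma * potential(next_state) - potential(state))
    return shaped

# Hypothetical toy setup: the state is the door-hinge angle in radians,
# and 1.2 rad is assumed to be the fully open position.
TARGET_ANGLE = 1.2

def sparse_reward(state, action, next_state):
    # +1 only when the door reaches the open position, else 0.
    return 1.0 if next_state >= TARGET_ANGLE else 0.0

def potential(state):
    # Negative distance to the target angle: higher when closer.
    return -abs(TARGET_ANGLE - state)

reward = make_shaped_reward(sparse_reward, potential)
```

With this shaping, a transition that moves the hinge closer to the target angle yields a positive reward even before the sparse +1 fires, which is the usual motivation for shaping sparse robotic tasks.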

Recommended citation: Wei, J. "Reward Shaping for Reinforcement Learning in the Aviation Industry", Master's Thesis, Hamburg University of Technology, October 2022.