Rewards in Action for Deep Reinforcement Learning of Industrial Robotics
Updated: Feb 17
"Virtue is its own reward." (Socrates)
In ACROBA, a deep reinforcement learning agent acts, so to speak, as the digital twin of a manufacturing robot. The agent virtually represents and embodies the robot's physical and functional capabilities (skills), as well as the limitations imposed by the work-cell and its tasks. During a simulation of a specific manufacturing scenario, the agent tries to optimize the robot's skills in carrying out tasks such as cutting holes in a plastic container and removing flash from it. It does so over many simulation runs by searching for the sequences of actions that maximize the overall score of the perception-guided control of the robot's skills in accomplishing such a task.

In this context, MrNeC develops, on the one hand, agent action functions for perception-guided control that are not only feasible but also help speed up deep reinforcement learning. It does so by incorporating static or dynamic constraints directly into the agent's perception-guided control architecture. These constraints correspond, for example, to the robot's joint limits, obstacles, and human operators present or active in the work-cell. The agent action functions ensure that the robot's perception-guided control space is searched as fully and efficiently as possible.

On the other hand, MrNeC develops agent reward functions that evaluate how the simulated or real-world increase in the robot's skill, which is a direct consequence of the agent's recommended perception-guided control actions, affects task performance (e.g. manufacturing speed and human safety). The output of these reward functions, in combination with the observation and action spaces, yields the necessary basis for computing the overall score in optimizing the robot's skill. Rewarding in action by deep reinforcement learning agents using different policies is demonstrated for special autonomous lights-out and human-robot collaborative pilot cases.
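To make the two ingredients described above more concrete, here is a minimal Python sketch of a constrained action function and a reward function. It is an illustration only and not the MrNeC or ACROBA implementation: the names `StepOutcome`, `clip_to_joint_limits` and `reward`, the weights, and the 0.5 m safety distance are all assumptions chosen for the example. Clipping is also just one simple way to build a constraint into the action path.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class StepOutcome:
    """Hypothetical summary of one simulated control step (not an ACROBA type)."""
    progress: float              # fraction of the task completed in this step
    duration_s: float            # simulated time the step took, in seconds
    min_human_distance_m: float  # closest robot-to-operator distance, in metres


def clip_to_joint_limits(
    action: Sequence[float],
    lower: Sequence[float],
    upper: Sequence[float],
) -> List[float]:
    """Force a proposed joint command into [lower, upper] per joint.

    Assumed stand-in for a static constraint built into the action function,
    so the agent never spends simulation runs on infeasible commands.
    """
    if not (len(action) == len(lower) == len(upper)):
        raise ValueError("action and limits must have the same length")
    return [min(max(a, lo), hi) for a, lo, hi in zip(action, lower, upper)]


def reward(
    outcome: StepOutcome,
    speed_weight: float = 1.0,     # assumed weight, for illustration only
    safety_weight: float = 10.0,   # assumed weight, for illustration only
    safe_distance_m: float = 0.5,  # assumed threshold, for illustration only
) -> float:
    """Score one step: reward task progress per second, penalize proximity.

    The penalty is zero while the operator stays beyond safe_distance_m and
    grows linearly with how far the robot intrudes into that distance.
    """
    speed_term = outcome.progress / max(outcome.duration_s, 1e-6)
    violation = max(0.0, safe_distance_m - outcome.min_human_distance_m)
    return speed_weight * speed_term - safety_weight * violation


if __name__ == "__main__":
    command = clip_to_joint_limits([2.0, -3.0, 0.1], [-1.0] * 3, [1.0] * 3)
    print("clipped command:", command)
    print("reward, operator far:", reward(StepOutcome(0.2, 0.5, 1.0)))
    print("reward, operator near:", reward(StepOutcome(0.2, 0.5, 0.3)))
```

In this sketch the safety term deliberately outweighs the speed term, so a step that is fast but brings the robot too close to an operator scores worse than a slower, safer one. How such terms are actually weighted in the pilot cases is not stated here.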
Results will be reported, extended and released over the next year and a half.