
Rewards in Action for Deep Reinforcement Learning of Industrial Robotics

Updated: Aug 9, 2023

"Virtue is its own reward." (Socrates)

Potential flow away from obstacle surfaces and ending on the target object: escaping both white and black holes!

In ACROBA, a deep reinforcement learning (DRL) agent forms, so to speak, the digital twin of a manufacturing robot. The agent virtually represents and embodies the robot's physical and functional capabilities (skills), as well as the limitations set by the work-cell and its tasks. During a simulation of a specific manufacturing scenario, the agent tries to optimize the robot skills needed to carry out tasks such as cutting holes in, and removing flash from, a plastic container. It does so over many simulation runs by searching for the sequences of actions that maximize the overall score of the perception-guided control of the robot skills in accomplishing such a task. In this context, MrNeC develops, on the one hand, agent action functions for perception-guided control that are not only feasible but also help speed up deep reinforcement learning. It does so by incorporating static or dynamic constraints, e.g. the robot's joint limits, obstacles, and human operators present or active in the work-cell, directly into the agent's perception-guided control architecture. These action functions ensure that the robot's perception-guided control space is searched as thoroughly and efficiently as possible. On the other hand, MrNeC develops agent reward functions that evaluate how the simulated or real-world increase in the robot's skill, a direct consequence of the perception-guided control actions recommended by the agent, affects task performance (e.g. manufacturing speed and human safety). The output of these reward functions, combined with the observation and action spaces, provides the basis for computing the overall score used to optimize the robot's skill. Rewards in action are demonstrated for DRL agents using different policies in special autonomous lights-out and human-robot collaborative pilot cases.
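To illustrate what building constraints directly into an action function can look like, here is a minimal sketch. It assumes, hypothetically, that the policy outputs one unbounded value per joint, interpreted as a joint increment, and that a scalar `speed_scale` in [0, 1] is lowered when a human operator is nearby. The function name, its parameters and the joint-increment interpretation are illustrative assumptions, not MrNeC's actual design.

```python
import numpy as np


def constrained_action(raw_action, q, q_min, q_max, max_step, speed_scale=1.0):
    """Map an unbounded policy output to a feasible next joint configuration.

    Hypothetical interface: raw_action holds one unbounded value per joint,
    q the current joint positions, q_min/q_max the joint limits, max_step the
    largest allowed joint change per control step, and speed_scale in [0, 1]
    an assumed slow-down factor (e.g. lowered when a human operator is near).
    """
    raw_action = np.asarray(raw_action, dtype=float)
    q = np.asarray(q, dtype=float)
    # tanh squashes every output into [-1, 1], so each joint step is bounded
    # by max_step, further reduced by the (clipped) speed_scale factor.
    scale = float(np.clip(speed_scale, 0.0, 1.0))
    delta = np.tanh(raw_action) * max_step * scale
    # Clipping keeps the commanded configuration inside the joint limits.
    return np.clip(q + delta, q_min, q_max)
```

For example, `constrained_action([100.0, -100.0], [0.0, 0.0], [-1.0, -1.0], [1.0, 1.0], 0.5)` yields joint positions of about `[0.5, -0.5]`: however extreme the raw output, the step never exceeds `max_step`. Squashing with `tanh` rather than hard-clipping the raw output keeps the mapping smooth for gradient-based policy optimizers, while the final clip acts as a safety net at the joint limits, so every action the agent explores is feasible.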

Results are reported, extended and released in the report below:



ACROBA uses reward functions to evaluate and create neural networks for optimized perception-guided robot control in a wide range of industrial applications. Here we design and showcase reward functions with which a DRL agent acquires an optimized policy for generating a Tool Centre Point (TCP) pose and tool-width trajectory for a robot arm with a gripper. During training, the agent tries to align the TCP pose as accurately as possible with the perceived grasp pose for picking a part from a bin, while at the same time minimising the distance traversed by the TCP.
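The exact reward terms are given in the report rather than on this page. The following is therefore only a minimal, hypothetical per-step reward that captures the trade-off described above. It assumes the TCP pose is a 3D position plus an orientation quaternion and the tool width is a scalar. The function names and the weights `w_pos`, `w_rot`, `w_width` and `w_path` are illustrative assumptions.

```python
import numpy as np


def quat_angle(q1, q2):
    """Smallest rotation angle (radians) between two quaternions."""
    q1 = np.asarray(q1, dtype=float)
    q2 = np.asarray(q2, dtype=float)
    q1 = q1 / np.linalg.norm(q1)
    q2 = q2 / np.linalg.norm(q2)
    # abs() because q and -q encode the same orientation; min() guards
    # against rounding pushing the dot product slightly above 1.
    d = min(1.0, abs(float(np.dot(q1, q2))))
    return 2.0 * np.arccos(d)


def step_reward(tcp_pos, tcp_quat, width, prev_tcp_pos,
                grasp_pos, grasp_quat, grasp_width,
                w_pos=1.0, w_rot=0.5, w_width=0.5, w_path=0.1):
    """Hypothetical dense per-step reward: penalise misalignment with the
    perceived grasp pose and tool width, plus the distance the TCP moved
    since the previous step. Weights are illustrative assumptions."""
    pos_err = np.linalg.norm(np.subtract(tcp_pos, grasp_pos))
    rot_err = quat_angle(tcp_quat, grasp_quat)
    width_err = abs(width - grasp_width)
    path_len = np.linalg.norm(np.subtract(tcp_pos, prev_tcp_pos))
    return -float(w_pos * pos_err + w_rot * rot_err
                  + w_width * width_err + w_path * path_len)
```

A TCP that sits exactly on the grasp pose with the right width and has not moved receives a reward of 0, the maximum. Because the path term penalises each step's displacement, summing the reward over an episode penalises the total distance traversed. The weights then decide how much alignment accuracy the agent is willing to trade for shorter TCP motions.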

Rewards in Action for Deep Reinforcement Learning of Industrial Robotics



