Abstract
There is no doubt that we are living in the age of data. In the last two decades, the scientific community has been able to produce systems with superhuman capabilities by combining modern hardware advancements, novel learning algorithms and architectures, and advances in software frameworks. Such progress has revolutionised domains like computer vision and language processing, achieving performance previously out of reach. One may think that these results could transfer straightforwardly to other fields like robotics, until realising that domain-specific characteristics and limitations hinder the potential of these learning methods. Generating enough data from real-world robots is often too expensive, or not even possible at the desired scale. Data sampled from robots has a sequential nature, and not all families of learning algorithms are effective in this context. Furthermore, most algorithms that excel in this sequential setting, such as those belonging to the Reinforcement Learning (RL) family, learn by trial and error, a process that could produce trajectories damaging either the robots or their surroundings.
In this thesis, we attempt to answer the question, "How can modern technology help us generate synthetic data for humanoid robot planning and control?".
Motivated by the advancements in hardware accelerators that are revolutionising scientific computing, we limit our analysis to the simulation realm. In this context, we first introduce a software architecture for structuring robot learning environments that can be adopted to train and run RL policies regardless of the simulated or real-world setting. Building on its underlying simulation technology and a reward-shaping scheme, we validate the architecture by training with RL a push-recovery controller capable of synthesising whole-body references for the humanoid robot iCub.

Then, to overcome the bottlenecks caused by the poor sampling performance of traditional rigid-body simulators, we present a new physics engine in reduced coordinates that can simulate robots interacting with a ground surface on hardware accelerators like GPUs and TPUs. To this end, we present a contact-aware continuous state-space representation describing the dynamical evolution of floating-base robots that can be numerically integrated for simulation purposes. We adopt the new general-purpose Gazebo Sim simulator as our first solution to sample synthetic data, and exploit JAX and its hardware support to scale the sampling performance for highly parallel problems. Furthermore, we implement and benchmark common Rigid Body Dynamics Algorithms that are part of the proposed physics engine on hardware accelerators, and assess their scalability on different GPUs. These pieces of technology help lower the computational barriers that remain among the main bottlenecks for obtaining intelligent agents, democratising the applicability of this family of learning-based methods.
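As a concrete illustration of the sampling-scaling idea mentioned above, the sketch below shows how JAX's `jit` and `vmap` transformations can advance many simulated states in parallel on a hardware accelerator. It is a minimal, hypothetical example built around a toy point mass with penalty-based ground contact, not the physics engine developed in the thesis; the names `step` and `batch_step` are placeholders.

```python
# Minimal sketch (not the thesis' actual engine): scaling trajectory
# sampling with JAX by vectorising a simulation step over a batch of
# states and compiling it for GPU/TPU execution.

import jax
import jax.numpy as jnp


def step(state: jnp.ndarray, dt: float = 1e-3) -> jnp.ndarray:
    """Semi-implicit Euler step of a falling point mass with ground contact."""
    pos, vel = state[0], state[1]
    acc = -9.81
    # Toy penalty-based contact: a spring-damper force acts only while
    # the mass penetrates the ground (pos < 0).
    acc = acc + jnp.where(pos < 0.0, -1e4 * pos - 1e2 * vel, 0.0)
    vel = vel + dt * acc
    pos = pos + dt * vel
    return jnp.stack([pos, vel])


# Vectorise the step over a batch of states and JIT-compile the result.
batch_step = jax.jit(jax.vmap(step))

# Simulate 10,000 parallel toy environments for 1,000 steps.
key = jax.random.PRNGKey(0)
states = jax.random.uniform(key, shape=(10_000, 2))

for _ in range(1000):
    states = batch_step(states)

print(states.shape)  # (10000, 2)
```

On an accelerator, the compiled `batch_step` advances all parallel states in a single dispatch; this is the property that makes such simulators attractive for generating RL training data at scale.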
```bibtex
@phdthesis{ferigo_phd_thesis_2022,
  title = {Simulation Architectures for Reinforcement Learning applied to Robotics},
  author = {Ferigo, Diego},
  school = {University of Manchester},
  type = {PhD Thesis},
  month = {July},
  year = {2022},
  url = {https://github.com/diegoferigo/phd-thesis/releases/latest/download/thesis.pdf},
}
```
For any doubt or to report an error, please open an issue.
If you want to fix the document yourself, please open a PR against the `main` branch (see branching details below).
The Continuous Integration pipeline implemented in this repository will compile the LaTeX sources with your contribution and upload the resulting PDF document as an artifact of the workflow for inspection.
This repository has two branches:

- `overleaf` is the branch connected to my personal Overleaf project.
- `main` is the branch associated with external contributions and releases.

The Overleaf Git system does not currently support branching.
For this reason, I cannot select `main` as the default branch of the repository, even though it effectively is.
If you want to contribute with a new PR, please target the `main` branch.