This code is structured into three main parts: high-level policy training, mid-level policy training, and the low-level controller.
Because the proposed method is a hierarchical generative model, the lower levels must be in place before the higher levels can be trained. The low-level controller is therefore tuned first, then the mid-level policy is trained, and finally the high-level policy, which uses both the mid-level and low-level policies, is trained.
Once all three policies have been obtained, the task as outlined in the paper can be reproduced by running qlearning/test_val.py.
The high-level policy is trained with double Q-learning, implemented in qlearning/learn_val.py, and tested by running qlearning/test_val.py.
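For orientation, the sketch below shows a double Q-learning target in its common form; the function and variable names are illustrative and do not mirror the code in qlearning/learn_val.py.

```python
import numpy as np

# Hedged sketch of a double Q-learning target (illustrative names, not the repo's code).
def double_q_target(reward, next_state, done, q_online, q_target, gamma=0.99):
    # The online network selects the greedy next action...
    next_action = np.argmax(q_online(next_state))
    # ...and the target network evaluates it, which reduces overestimation bias.
    return reward + gamma * (1.0 - float(done)) * q_target(next_state)[next_action]
```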
The mid-level locomotion policy is trained with Soft Actor-Critic (SAC), implemented in the mbpo module, which in turn relies on functionality from the softlearning module. The mid-level manipulation policy (minimum-jerk control) is implemented directly in env/valkyrie/valkyrie_gym_env/envs/valkyrie.py.
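As a rough illustration of the manipulation scheme, a minimum-jerk trajectory interpolates between a start and a goal position with the classic quintic profile; the actual parameterization in valkyrie.py may differ.

```python
import numpy as np

# Minimal minimum-jerk interpolation sketch; start/goal/duration names are illustrative.
def min_jerk(x0, xf, duration, t):
    tau = np.clip(t / duration, 0.0, 1.0)          # normalized time in [0, 1]
    s = 10 * tau**3 - 15 * tau**4 + 6 * tau**5     # minimum-jerk blending profile
    return x0 + (xf - x0) * s
```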
After installing mbpo (see mbpo_readme.md), the locomotion policy is trained by running
mbpo run_local examples.development --config=examples.config.custom.valkyrie_walk
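For context, SAC trains its critics toward a soft Bellman target; the sketch below states that target under common conventions (clipped double Q, fixed temperature alpha) and is not taken from the mbpo/softlearning code.

```python
import torch

# Hedged sketch of the soft Bellman target used by SAC (illustrative tensor names).
def sac_critic_target(reward, next_q1, next_q2, next_log_prob, done,
                      gamma=0.99, alpha=0.2):
    # Clipped double-Q estimate minus the entropy term gives the soft state value.
    soft_value = torch.min(next_q1, next_q2) - alpha * next_log_prob
    return reward + gamma * (1.0 - done) * soft_value
```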
The low-level functionality is implemented in the Valkyrie class in env/valkyrie/valkyrie_gym_env/envs/valkyrie.py; this class also serves as the simulation environment.
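Assuming the Valkyrie class follows the standard Gym interface and can be constructed without arguments (both assumptions), interacting with it directly might look like this:

```python
# Hypothetical usage sketch; import path and constructor arguments are assumptions.
from valkyrie_gym_env.envs.valkyrie import Valkyrie

env = Valkyrie()
obs = env.reset()
for _ in range(100):
    action = env.action_space.sample()          # random action via the standard Gym API
    obs, reward, done, info = env.step(action)
    if done:
        obs = env.reset()
```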
This repository builds on code from the mbpo repository by Michael Janner (for SAC) and the deep-reinforcement-learning-gym repository by Lilian Weng (for double Q-learning).