This time I want to explore how deep reinforcement learning can be used for tasks such as making a humanoid model walk. This kind of task is a continuous control task. Solving it requires a different approach from the one you might know from playing Atari games like Pong with, for example, a Deep Q-Network (DQN). I'll first talk about what characterizes continuous control environments. Then I'll introduce the actor-critic architecture and present one state-of-the-art actor-critic method, Soft Actor-Critic (SAC). Finally, we will dive into the code.