A dilemma

RL vs. Supervised

We have made an important choice: we will focus on using reinforcement learning to bootstrap the learned optimizer instead of traditional supervised, gradient-based methods. We believe that once a first strong learned optimizer is trained, one intelligent enough to effectively optimize itself or other optimizers, it will be irrelevant how we got there.

Challenges

  • Higher computational costs
  • Less stable training
  • Requires careful reward engineering
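To make the reward-engineering point concrete, here is a deliberately tiny sketch (not our actual system, and every name in it is illustrative): the choice of an inner-loop step size is framed as a multi-armed bandit, where the "reward" handed to the RL policy is the drop in inner-task loss over a short rollout. Even in this toy, a poorly guarded reward (a diverging rollout) produces enormous negative values, which is exactly the kind of issue careful reward engineering has to handle.

```python
import random

random.seed(0)

# Illustrative sketch: an epsilon-greedy bandit picks among a few
# candidate learning rates; reward = decrease in inner-task loss.
CANDIDATE_LRS = [0.001, 0.01, 0.1, 0.5, 2.0]

def rollout(lr, steps=10):
    """Run a short inner optimization of f(w) = (w - 3)^2."""
    w = 0.0
    start = (w - 3.0) ** 2
    for _ in range(steps):
        w -= lr * 2.0 * (w - 3.0)        # gradient step on f
        w = max(-1e6, min(1e6, w))       # guard: clamp diverging rollouts
    return start - (w - 3.0) ** 2        # reward: loss decrease

values = [0.0] * len(CANDIDATE_LRS)      # running value estimate per arm
counts = [0] * len(CANDIDATE_LRS)
for t in range(500):
    if random.random() < 0.1:            # explore
        arm = random.randrange(len(CANDIDATE_LRS))
    else:                                # exploit current best estimate
        arm = max(range(len(CANDIDATE_LRS)), key=lambda i: values[i])
    r = rollout(CANDIDATE_LRS[arm])
    counts[arm] += 1
    values[arm] += (r - values[arm]) / counts[arm]  # incremental mean

best = CANDIDATE_LRS[max(range(len(CANDIDATE_LRS)), key=lambda i: values[i])]
print("best lr:", best)
```

Note the clamp inside `rollout`: without it, the 2.0 arm diverges and its reward overflows toward negative infinity, destabilizing the value estimates. That one line stands in for the much harder reward-shaping decisions a real RL-trained optimizer requires.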

Our Current Thinking

On supervised tasks, the potential upside of gradient-based methods is smaller, and it is not clear whether further improvements in model capabilities would be obtained by simply fitting a dataset faster and tighter. In reinforcement learning tasks, by contrast, there is no hard ceiling on the potential to improve capabilities across a wide range of domains: language modeling, algorithmic tasks and games, robotics, and more.