A dilemma

RL vs. Supervised

We have made an important choice: we will focus on using reinforcement learning to bootstrap the learned optimizer instead of traditional supervised, gradient-based methods. We believe that once a first strong learned optimizer is trained, one intelligent enough to effectively optimize itself or other optimizers, it will be irrelevant how we got there.

Challenges

  • Higher computational costs
  • Less stable training
  • Requires careful reward engineering
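To make the reward-engineering point concrete, here is a deliberately tiny sketch (not our actual system, and every name in it is illustrative): the choice of an inner-loop step size is framed as a multi-armed bandit, where the "reward" handed to the RL policy is the drop in inner-task loss over a short rollout. Even in this toy, a poorly guarded reward (a diverging rollout) produces enormous negative values, which is exactly the kind of issue careful reward engineering has to handle.

```python
import random

random.seed(0)

# Illustrative sketch: an epsilon-greedy bandit picks among a few
# candidate learning rates; reward = decrease in inner-task loss.
CANDIDATE_LRS = [0.001, 0.01, 0.1, 0.5, 2.0]

def rollout(lr, steps=10):
    """Run a short inner optimization of f(w) = (w - 3)^2."""
    w = 0.0
    start = (w - 3.0) ** 2
    for _ in range(steps):
        w -= lr * 2.0 * (w - 3.0)        # gradient step on f
        w = max(-1e6, min(1e6, w))       # guard: clamp diverging rollouts
    return start - (w - 3.0) ** 2        # reward: loss decrease

values = [0.0] * len(CANDIDATE_LRS)      # running value estimate per arm
counts = [0] * len(CANDIDATE_LRS)
for t in range(500):
    if random.random() < 0.1:            # explore
        arm = random.randrange(len(CANDIDATE_LRS))
    else:                                # exploit current best estimate
        arm = max(range(len(CANDIDATE_LRS)), key=lambda i: values[i])
    r = rollout(CANDIDATE_LRS[arm])
    counts[arm] += 1
    values[arm] += (r - values[arm]) / counts[arm]  # incremental mean

best = CANDIDATE_LRS[max(range(len(CANDIDATE_LRS)), key=lambda i: values[i])]
print("best lr:", best)
```

Note the clamp inside `rollout`: without it, the 2.0 arm diverges and its reward overflows toward negative infinity, destabilizing the value estimates. That one line stands in for the much harder reward-shaping decisions a real RL-trained optimizer requires.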

Our Current Thinking

On supervised tasks, the potential upside of gradient-based methods is smaller, and it is not clear whether further improvements in model capabilities would be obtained by simply fitting a dataset faster and tighter. In reinforcement learning tasks, by contrast, there is no hard ceiling on the potential to improve capabilities across a wide range of domains: language modeling, algorithmic tasks and games, robotics, and more.