A dilemma
RL vs. Supervised
We have made an important choice: we will focus on using reinforcement learning, rather than traditional supervised gradient-based methods, to bootstrap the learned optimizer. We believe that once we have trained a first strong learned optimizer, one intelligent enough to effectively optimize itself or other optimizers, it will be irrelevant how we got there.
Challenges
- Higher computational cost than supervised training
- Less stable training dynamics
- Requires careful reward engineering
Our Current Thinking
On supervised tasks, the potential upside of gradient-based methods is smaller: it is not clear that further improvements in model capabilities would come from simply fitting a dataset faster and more tightly. In reinforcement learning tasks, by contrast, there is no hard ceiling on the potential to improve capabilities across a wide range of domains, including language modeling, algorithmic tasks, games, robotics, and more.