When training becomes inference
Key Insights
Our recent work with learned optimizers has revealed several interesting patterns:
- In the domain of learned optimizers, the traditional boundary between training and inference blurs.
- We expect that, ultimately, all training will be done by inference with learned optimizers.
- Most breakthroughs in machine learning have come from decades of human ingenuity and pure chance.
- As compute has grown more powerful, model capabilities have improved enormously, but fundamental breakthroughs have remained rare.
- We envision a future with no more human-discovered breakthroughs, where all progress comes from models learning to learn.
- For models to learn to learn, models must be used to train models - and running a model is inference.
- Inference can imitate any training process - often significantly more cheaply and quickly (see the sketch after this list).
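To make this concrete, here is a minimal sketch of training carried out by inference: a small network (the learned optimizer) maps per-parameter gradients and momentum to updates, so each "training" step is just a forward pass. The names `LearnedOptimizer` and `update_step` are illustrative assumptions, not anything from the work described above.

```python
import torch
import torch.nn as nn

class LearnedOptimizer(nn.Module):
    """Maps per-parameter features (gradient, momentum) to an update."""
    def __init__(self, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, grad: torch.Tensor, momentum: torch.Tensor) -> torch.Tensor:
        # Flatten so the same rule applies to parameters of any shape.
        feats = torch.stack([grad.flatten(), momentum.flatten()], dim=-1)  # (N, 2)
        return self.net(feats).view_as(grad)  # update, same shape as grad

@torch.no_grad()
def update_step(params, grads, momenta, opt: LearnedOptimizer, beta: float = 0.9):
    """One 'training' step carried out purely by inference through opt."""
    for p, g, m in zip(params, grads, momenta):
        m.mul_(beta).add_(g, alpha=1 - beta)  # running momentum estimate
        p.add_(opt(g, m))                     # the update is a forward pass

# Illustrative usage on a single parameter tensor.
opt = LearnedOptimizer()
p, g, m = torch.randn(3, 4), torch.randn(3, 4), torch.zeros(3, 4)
update_step([p], [g], [m], opt)
```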
Practical Implications
Architecture Design
Significant work has been done on neural architecture search, with few successes. In practice, modifying the architecture alone matters far less than the learning process itself. We expect architectures to become increasingly unimportant: a sufficiently intelligent learning algorithm should be able to train arbitrary architectures to similar performance.
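One way to see why a learning rule can be architecture-agnostic: if the rule touches only parameter and gradient tensors, the identical update code drives any model. The sketch below uses a hand-written stand-in rule (fixed-step sign descent) purely for illustration; the names `learned_update` and `step` are assumptions, and a learned rule would occupy the same slot.

```python
import torch
import torch.nn as nn

def learned_update(grad: torch.Tensor) -> torch.Tensor:
    # Stand-in for a learned per-parameter rule (here, fixed-step sign
    # descent); note it never inspects the architecture, only tensors.
    return -1e-3 * torch.sign(grad)

def step(model: nn.Module, loss: torch.Tensor) -> None:
    """Apply the same per-parameter rule to any architecture."""
    grads = torch.autograd.grad(loss, list(model.parameters()))
    with torch.no_grad():
        for p, g in zip(model.parameters(), grads):
            p.add_(learned_update(g))

# The identical step() drives an MLP and a CNN without modification.
mlp = nn.Sequential(nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 10))
cnn = nn.Sequential(nn.Conv2d(1, 4, 3), nn.ReLU(), nn.Flatten(),
                    nn.Linear(4 * 26 * 26, 10))
step(mlp, mlp(torch.randn(8, 784)).square().mean())
step(cnn, cnn(torch.randn(8, 1, 28, 28)).square().mean())
```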
Computational Efficiency
Approaches once believed too narrow and inference-specific will find far more general application. Numerous new inference paradigms already circumvent the bottlenecks of traditional matrix-multiplication-bound regimes.
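One example of such a paradigm, offered as an illustration rather than a technique named above, is weight-only int8 quantization: storing weights in 8 bits cuts memory traffic roughly 4x, easing the bandwidth bottleneck that dominates matrix-multiplication-bound inference. The helper names below are hypothetical.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor int8 quantization (one illustrative scheme)."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def matmul_q8(x: np.ndarray, q: np.ndarray, scale: float) -> np.ndarray:
    # Weights live in int8 (4x smaller than float32); they are widened
    # and rescaled only at use time, so far less data crosses memory.
    return (x @ q.astype(np.float32)) * scale

w = np.random.randn(512, 512).astype(np.float32)
x = np.random.randn(4, 512).astype(np.float32)
q, scale = quantize_int8(w)
err = np.abs(x @ w - matmul_q8(x, q, scale)).max()  # small quantization error
```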
Outlook
Efficient inference will become the largest bottleneck in training better models, and may ultimately matter even more than the training methods themselves.