It’s worth noting that the progress so far, though it follows a task doubling time of 7 months (per METR), is not built on simple scaling laws. It is therefore perhaps not reasonable to expect continual progress just because research time passes. With pre-training scaling laws exhausted, reasoning tokens, agentic workflows, and scaling RL have provided the additional boosts, and most evidence points to these additional developments plateauing.
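To make the extrapolation concrete, here is a minimal sketch of what a constant 7-month doubling time would imply if projected forward naively. The starting task length and the `extrapolate_task_length` helper are illustrative assumptions for this sketch, not METR's methodology.

```python
# Minimal sketch (illustrative assumptions, not METR's methodology):
# project "length of task an agent can complete" forward under a
# constant 7-month doubling time.

def extrapolate_task_length(start_hours: float, months: float,
                            doubling_time_months: float = 7.0) -> float:
    """Task length after `months`, assuming a fixed doubling time."""
    return start_hours * 2 ** (months / doubling_time_months)

# Hypothetical starting point: 1-hour tasks today.
for horizon in (7, 14, 28):
    print(f"after {horizon} months: "
          f"{extrapolate_task_length(1.0, horizon):.1f}-hour tasks")
```

The point of the sketch is that the curve only continues if whatever drove past doublings keeps delivering, which is exactly what the comment questions.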
I am skeptical of applying priors from a time series without considering the variables that drove that performance increase in the first place.
I would expect the current rate of progress to plateau by the end of 2025, unless another architectural breakthrough reaches scale.
LAMs, or large agentic models, are the next scaler - I thought this was known? First you optimise the best combination of ideas, then you RL their cooperation so they work in increasingly organised ways, something like a team in a workplace.