Discussion about this post

Rainbow Roxy:

Hey, great read as always. It just shows how things rarely scale uniformly, reminding me of slowly perfecting a new Pilates move.

Egg Syntax:

This post seems valuable, and I agree it's a metric worth tracking, both as information about how close we are to research automation and as a way to gauge the degree to which Anthropic (or at least Dario Amodei) is being hype-y.

I'd like to zero in on one claim you make that surprised me:

'[Even if 90% of the code is written by AI] this probably results in a productivity boost for engineers that is much less than 10x and I’d guess less than 2x...The productivity boost at a given fraction of code generated isn’t that high because AI allows people to cheaply generate lots of very low value code.'

I'd love to understand better why you say this. If we assume (as you do in the adjudication) that this is 'measured by what gets committed to relevant repositories', then for this to be true, the quality of committed code at Anthropic would have to have gone way down. Do you expect this to be the case? My sense has been that at good orgs, developers may use LLMs to generate code, but they're responsible for ensuring that what makes it into pull requests is high quality. Maybe you're skeptical of that being the case?

The METR study on open-source developers (https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/) would justify the claim if its results hold, but there's been a fair amount of skepticism about those results applying in real-world situations. To what extent is your claim based on that study?

To be clear, I'd certainly believe that it would be less than 10x, but would be pretty surprised if it was less than, say, 4x (conditional on 90% of the committed code being AI-written).
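For concreteness, here's the toy model I have in mind (my own assumptions, not anything from your post): suppose an engineer still writes the same number of human lines per unit time, AI adds enough lines that a fraction f of committed lines is AI-written, and each AI line carries value v relative to a human-written line. Then the value-weighted productivity multiplier is 1 + v·f/(1−f):

```python
# Hypothetical back-of-envelope model (my assumptions, not from the post):
# an engineer writes human lines at the same rate as before; AI adds enough
# lines that a fraction f of committed lines is AI-written, and each AI line
# is worth v relative to a human-written line (0 < v <= 1).

def productivity_multiplier(f: float, v: float) -> float:
    """Value produced per unit of engineer time, relative to the no-AI baseline."""
    ai_lines_per_human_line = f / (1 - f)  # AI lines needed so the AI share is f
    return 1 + v * ai_lines_per_human_line

for v in (1.0, 1 / 3, 1 / 9):
    print(f"f=0.9, v={v:.2f} -> {productivity_multiplier(0.9, v):.1f}x")
# f=0.9, v=1.00 -> 10.0x  (AI lines as valuable as human lines)
# f=0.9, v=0.33 -> 4.0x   (my ~4x figure needs v around 1/3)
# f=0.9, v=0.11 -> 2.0x   (your <2x needs AI lines worth about 1/9 of a human line)
```

So on this model, your <2x claim at 90% AI-written code amounts to saying the marginal committed AI line is worth roughly a ninth of a human-written one, which is the part I'd want to see defended.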

Thanks!

