4 Comments
Atticus Wang:

> Interestingly, reasoning doesn't seem to help Anthropic models on agentic software engineering tasks, but does help OpenAI models.

Source?

asher:

Note also that Sama has said:

"we achieved gold medal level performance on the 2025 IMO competition with a general-purpose reasoning system! to emphasize, this is an LLM doing math and not a specific formal math system; it is part of our main push towards general intelligence...

we are releasing GPT-5 soon but want to set accurate expectations: this is an experimental model that incorporates new research techniques we will use in future models. we think you will love GPT-5, but we don't plan to release a model with IMO gold level of capability for many months."

which unfortunately implies that the GPT-5 release may be significantly behind what they have internally (though it's still an update if GPT-5 itself, the first model incorporating the new research techniques, is unexpectedly good or bad).

asher:

I'm curious whether you have takes on how much to update on the LLM IMO results.

Noah Birnbaum:

Good points!

This is somewhat related to a short piece I posted today about how to update on timelines if pre-training is dead: https://www.lesswrong.com/posts/En2ksovwtaAKZAa7K/how-to-update-if-pre-training-is-dead. Curious to hear your thoughts!
