4 Comments
Atticus Wang:

> Interestingly, reasoning doesn't seem to help Anthropic models on agentic software engineering tasks, but does help OpenAI models.

Source?

asher:

Note also that Sama has said:

"we achieved gold medal level performance on the 2025 IMO competition with a general-purpose reasoning system! to emphasize, this is an LLM doing math and not a specific formal math system; it is part of our main push towards general intelligence...

we are releasing GPT-5 soon but want to set accurate expectations: this is an experimental model that incorporates new research techniques we will use in future models. we think you will love GPT-5, but we don't plan to release a model with IMO gold level of capability for many months."

which unfortunately implies that the GPT-5 release may be significantly behind what they have internally (though it's still an update if GPT-5 itself, the first model incorporating the new research techniques, is unexpectedly good or bad).

asher:

I'm curious whether you have takes on how much to update on the LLM IMO results.

Noah Birnbaum:

Good points!

This is somewhat related to a short piece I posted today about how to update on timelines if pre-training is dead: https://www.lesswrong.com/posts/En2ksovwtaAKZAa7K/how-to-update-if-pre-training-is-dead. Curious to hear your thoughts!
