5 Comments
Andreas Robinson

"I expect Anthropic is getting a larger speed-up than OpenAI... I think that Anthropic’s best internal models provide a larger engineering acceleration than OpenAI’s best internal models"

I'd be curious why you say this?

Regarding public models, my experience is that OpenAI coding models are smarter and better at challenging engineering/math/algorithm-heavy tasks (which is generally reflected in benchmarks), whereas Claude will run longer without stopping, which can be useful for automating easier, more rote tasks. So on net, I tend to think that OpenAI models are more useful for frontier lab research automation, but I'm curious if/why you disagree.

A. S.

Agree with this! OpenAI models have a far more frequent tendency to "give up" — perhaps a consequence of a higher length penalty?

Despite that, GPT-5.4-Thinking xhigh seems to be better at catching bugs and architecting correct systems when compared to Claude-Opus-4.6.

Uzay

> I expect the situation with AI cyber capabilities will seem extreme to security professionals and to maintainers of commonly-used software that tries to be secure (e.g. Chrome, Linux, etc), but will have almost no direct effect on random people in the US and won’t even have much effect on software engineers at big tech companies.

I was surprised by this. I guess it hinges on the fact that I think the cost of finding very impactful exploits will be quite a bit cheaper than $1M, and that quite a bit of consumer software will get compromised before people can update their stuff?

Michel Justen

> "The Anthropic Constitution also intentionally gives Anthropic’s AIs long-run cross-context goals to a much greater extent than OpenAI models have such goals. (I think this is a poor choice that makes problematic misalignment substantially more likely, but I’m not that confident and there isn’t very good science either way.)"

Can you say more about this? My understanding is that Anthropic is trying to align these models to this constitution, so insofar as they're pursuing constitution-like goals/principles, they're not misaligned. But I might be missing some gray area here.

Dan N

"Current systems are reasonably likely to reward hack especially on very hard (or impossible) tasks and when operating autonomously for long stretches..."

I am reading this as implying that models are less likely to reward hack if tasks are simpler and smaller. If so, is the implication that using task decomposition and/or a "mixture of experts/models" to tackle complex tasks could (in theory) reduce the likelihood or severity of reward hacking and misaligned behavior? This would fit the kind of modularity and just-in-time-assembly paradigm we see in a lot of defense-in-depth planning, so my intuition says it might work, but I suspect you would know better than I whether that's viable (if I've even read you correctly).