Redwood Research blog
Will AI systems drift into misalignment?
A reason alignment could be hard
Nov 15 • Josh Clymer
What's up with Anthropic predicting AGI by early 2027?
I operationalize Anthropic's prediction of "powerful AI" and explain why I'm skeptical
Nov 3 • Ryan Greenblatt
October 2025
Sonnet 4.5's eval gaming seriously undermines alignment evals
And this seems to be caused by training on alignment evals.
Oct 30 • Alexa Pan and Ryan Greenblatt
Should AI Developers Remove Discussion of AI Misalignment from AI Training Data?
There is some concern that training AI systems on content predicting AI misalignment will hyperstition AI systems into misalignment.
Oct 23 • Alek Westover
Is 90% of code at Anthropic being written by AIs?
I'm skeptical that Dario's prediction of AIs writing 90% of code in 3-6 months has come true
Oct 22 • Ryan Greenblatt
Reducing risk from scheming by studying trained-in scheming behavior
Can we study scheming by studying AIs trained to act like schemers?
Oct 16 • Ryan Greenblatt
Iterated Development and Study of Schemers (IDSS)
A strategy for handling scheming
Oct 10 • Ryan Greenblatt
The Thinking Machines Tinker API is good news for AI control and security
It's a promising design for reducing model access inside AI companies.
Oct 9 • Buck Shlegeris
Plans A, B, C, and D for misalignment risk
Different plans for different levels of political will
Oct 8 • Ryan Greenblatt
September 2025
Notes on fatalities from AI takeover
Large fractions of people will die, but literal human extinction seems unlikely
Sep 23 • Ryan Greenblatt
Focus transparency on risk reports, not safety cases
Transparency about just safety cases would have bad epistemic effects
Sep 22 • Ryan Greenblatt
Prospects for studying actual schemers
Studying actual schemers seems promising but tricky
Sep 19 • Ryan Greenblatt and Julian Stastny