Redwood Research blog
Notes on fatalities from AI takeover
Large fractions of people will die, but literal human extinction seems unlikely
Sep 23 • Ryan Greenblatt
Focus transparency on risk reports, not safety cases
Transparency about just safety cases would have bad epistemic effects
Sep 22 • Ryan Greenblatt
Prospects for studying actual schemers
Studying actual schemers seems promising but tricky
Sep 19 • Ryan Greenblatt and Julian Stastny
What training data should developers filter to reduce risk from misaligned AI?
An initial narrow proposal
Sep 17 • Alek Westover
AIs will greatly change engineering in AI companies well before AGI
AIs that speed up engineering by 2x wouldn't accelerate AI progress that much
Sep 9 • Ryan Greenblatt
Trust me bro, just one more RL scale up, this one will be the real scale up with the good environments, the actually legit one, trust me bro
Above trend progress due to a rapid increase in RL env quality is unlikely
Sep 3 • Ryan Greenblatt
August 2025
Attaching requirements to model releases has serious downsides (relative to a different deadline for these requirements)
System cards are established but other approaches seem importantly better
Aug 27 • Ryan Greenblatt
Notes on cooperating with unaligned AIs
More thoughts on making deals with schemers
Aug 24 • Lukas Finnveden
Being honest with AIs
When and why we should refrain from lying
Aug 21 • Lukas Finnveden
My AGI timeline updates from GPT-5 (and 2025 so far)
AGI before 2029 now seems substantially less likely
Aug 20 • Ryan Greenblatt
Four places where you can put LLM monitoring
To wit: LLM APIs, agent scaffolds, code review, and detection-and-response systems
Aug 9 • Fabien Roger and Buck Shlegeris
July 2025
Should we update against seeing relatively fast AI progress in 2025 and 2026?
Maybe we should (re)assess the case for relatively fast progress after the GPT-5 release.
Jul 28 • Ryan Greenblatt