Redwood Research blog
AI Futurism Reading List
We recently ran a strategy fellowship through Astra. As part of this, we ran a reading group for our fellows on some of the topics that we think are…
Jul 2 • Alexa Pan
June 2026
The distillation double bind: Distilling misaligned models either transfers misalignment or it doesn't
If it transfers misalignment, we might get a misaligned model that’s easier to incriminate. If it doesn’t, we might get a capable benign replacement…
Jun 18 • Alek Westover, Alexa Pan, Sebastian Prasanna, and Arun Jose
Estimating No-CoT Task-Completion Time Horizons of Frontier AI Models
Models' no-CoT time horizon has doubled roughly every year.
Jun 10 • Anders Cairns Woodruff
Efficient tradeoffs and the safety-usefulness tradeoff model
When is "increasing safety budget" a useful concept?
Jun 8 • Buck Shlegeris
May 2026
Retrying vs Resampling in AI Control
We’ve just released a new paper: Retrying vs Resampling in AI Control. We revisit the resampling protocols introduced in Ctrl-Z with an up-to-date…
May 29 • James Lucassen and Adam Kaufman
Advice for making robust-to-training model organisms
We’d like to develop training techniques that work when applied to future misaligned AI systems. One strategy for studying proposed techniques is to…
May 28 • Alek Westover, Sebastian Prasanna, Vivek Hebbar, Julian Stastny, and Dylan Xu
Full automation of AI R&D probably yields a large speed up even without a software-only singularity
Full automation likely yields a one-time speed-up and higher returns from compute
May 27 • Ryan Greenblatt
Incriminating misaligned AI models via distillation
Suppose we have a dangerous misaligned AI that can fool alignment audits, and distill it into a student model.
May 18 • Alek Westover, Sebastian Prasanna, Alex Mallen, Alexa Pan, Julian Stastny, and Arun Jose
Risk reports need to address deployment-time spread of misalignment
Deployment-time spread is the most plausible near-term route to consistent adversarial misalignment
May 15 • Alex Mallen
How useful is the information you get from working inside an AI company?
My median guess: it's as good as a crystal ball that sees 2.5 months into the future.
May 11 • Buck Shlegeris and Anders Cairns Woodruff
A review of “Investigating the consequences of accidentally grading CoT during RL”
Last week, OpenAI staff shared an early draft of Investigating the consequences of accidentally grading CoT during RL with Redwood Research staff.
May 7 • Buck Shlegeris
Risk from fitness-seeking AIs: mechanisms and mitigations
Fitness-seeking is increasingly what misalignment looks like in practice—how should we respond?
May 1 • Alex Mallen