AI Futurism Reading List
We recently ran a strategy fellowship through Astra. As part of this, we ran a reading group for our fellows on some of the topics that we think are important for thinking about AI futurism (key dynamics in AI development, existential risk from AI, and approaches to mitigating risk). This post contains the reading list we used.
The selection reflects my opinionated views of the field, focuses particularly on topics we happen to focus on at Redwood, and doesn’t aim to be comprehensive.
I selected readings that I thought described conceptual frames and hypotheses in AI futurism that are regularly used by me and my coworkers. I think it is a good exercise to consider whether you agree with their theses and how well their predictions have fared in light of recent evidence.
If you have suggestions for this reading list, please let me know.
How to use this reading list
This reading list has a core section and an extended section.
Core readings are organized into 4 weeks. Each week covers <8 hours of foundational context on a topic.
Topics are chosen for (1) general importance for AI risk threat modeling and/or (2) relevance to work at Redwood Research.
We recommend that you prioritize “recommended” readings before “optional”.
We recommend that you prioritize starred readings if you only have ~1 hour.
Extended readings are for optional reference.
“Key questions” and “exercises” are recommended for discussion groups.
Core readings
Week 1: Timelines / takeoff modeling
Key questions:
What are key milestones to track in AI development?
When will powerful AIs[1] arrive? (Timelines)
How quickly will powerful AIs arrive? (Takeoff speeds)
What do existing models say about the above questions? What are key assumptions/parameters in these models?
How powerful are AIs currently?
What implications do timelines/takeoff modeling have on AI strategy?
Recommended
*Three Types of Intelligence Explosion (~30 min, audio option)
A breakdown of AI capability levels focused on AI R&D labor acceleration
Similar: Six milestones for AI automation
Will AI R&D Automation Cause a Software Intelligence Explosion? (~1.5 hr, audio option)
Full automation of AI R&D probably yields a large speedup even without a software-only singularity
*AI Futures Model: Dec 2025 update
Exercise: Think about how this model compares to other takeoff models.
Exercise: Play with the web app. Share the median takeoff forecast according to your parameters and any thoughts on this model.
Clarifying limitations of time horizon - METR
Exercise: What seem to be the main methodological differences between Epoch ECI and METR time horizon? Which one do you prefer to extrapolate to understand capabilities progress and why?
Exercise: Are there better measures of capabilities progress / AI R&D progress you prefer?
Optional
AI’s capability improvements haven’t come from it getting less affordable
Broad Timelines — LessWrong (+ top Ryan comment)
Do the returns to software R&D point towards a singularity? | Epoch AI
How quick and big would a software intelligence explosion be?
Read the summary, then play with the web app. Share the median takeoff forecast according to your parameters and any thoughts on this model.
How does this model differ from AI Futures Project’s?
What a Compute-Centric Framework Says About Takeoff Speeds (“Long Summary”, ~1hr, somewhat obsoleted by the above)
Week 2: Misaligned AI takeover threat modeling
Key questions:
What motivations will drive the behavior of powerful AIs?
What exactly do we mean by scheming AIs?
How likely is scheming (compared to other misaligned motivations)?
How dangerous is scheming (compared to other misaligned motivations)?
How might scheming AIs take over?
What motivations do current models seem to have? How does this update us about the motivations of future models, if at all?
Recommended
*The behavioral selection model for predicting AI motivations — LessWrong (25 min)
Risk from fitness-seeking AIs: mechanisms and mitigations (40 min)
*Will AIs fake alignment during training in order to get power (Summary 40min, audio option)
Exercise: estimate P(scheming) based on the arguments in this report and according to your all-things-considered view
AI 2027 (3hr, recommend skimming); AI Goals Forecast
Ryan on the 80,000 Hours podcast (Takeover threat modeling discussion 00:17-00:34)
What failure looks like (15min)
Optional
Papers (5-30min each)
Emergent misalignment paper
Weird generalization paper
OAI scheming paper (+ eval awareness post)
Deeper dive into scheming report
Will AIs fake alignment during training in order to get power (Section 2.1 (requirements for scheming), 2.3 (the goal guarding story), 4.2 (counting argument), 4.3 (simplicity argument), audio option)
Basic frames for thinking about AIs
P(misalignment)
P(scheming)
Many arguments for AI x-risk are wrong — LessWrong (section summarizing the rebuttal against the counting argument)
P(AI takeover | misalignment)
Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover (1hr)
Risk reports need to address deployment-time spread of misalignment (30 min)
Behavioral red-teaming is unlikely to produce clear, strong evidence that models aren’t scheming
Sleeper agent paper, password locked model paper
Alignment faking paper (interview)
P(AI takeover)
Shallow review of technical AI safety, 2025 (1hr, skim)
Week 3: Control
Key questions:
What is AI control? Why (not) research AI control?
What are key threat models, mitigations, and areas of work in control?
e.g. What do we mean by “concentrated vs. diffuse failures” and “high-stakes / diffuse control”?
What is the current state of control research? How might this change with more powerful AIs?
What happens after we do control?
Recommended (Most are under 30 min. Can treat 4-5 as optional because they’re more in the technical weeds.)
Foundations
Threat modeling and game dynamics
Plans after control / Macrostrategy
Control measures
Examples of empirical research in (high-stakes) control:
Ideas for related research directions
Incriminating misaligned AI models via distillation
Week 4: Governance / strategy
Key questions:
How should/will AI companies make decisions about AI development/deployment?
What will the decision-making dynamics inside them be like?
How will they be affected by the outside world (perhaps especially by relevant governments)?
How should/will powerful states interact with the development of powerful AI?
What will the decision-making dynamics inside/between them be like?
Recommended (These recommendations are significantly less confident.)
The Playbook
Lab dynamics
AI 2027 (race/slowdown endings after branching point)
AIFP CEO takeover scenario (20 min)
Ten people on the inside (5 min)
State dynamics
Situational awareness (essays III-V, 3 hr)
Current US admin on AI and national security, state legislation
China admin on AI
How China Views AI Risks and What to do About Them | Carnegie Endowment for International Peace
Why China isn’t about to leap ahead of the West on compute | Epoch AI
Chinese AI models have lagged the US frontier by 7 months on average since 2023 | Epoch AI
No, the 2017 New Generation AI Development Plan did not include a goal of building AGI
Optional
AI Deterrence Is Our Best Option | AI Frontiers (MAIM’s response to critics)
Evaluating the Risks of Preventive Attack in the Race for Advanced AI | RAND
Frontier lab safety policies/commitments
Lab governance scrutiny
Anthropic is Quietly Backpedalling on its Safety Commitments — LessWrong
Holden Karnofsky on dozens of amazing opportunities to make AI safer — and all his AGI takes | 80,000 Hours (Can we trust Anthropic, or any AI company? 00:43)
US AI policy/regulation
China AI policy/regulation
Extended readings
Concrete projects to prepare for superintelligence
Trading with AIs
Schelling’s “Arms and Influence” (Chapter 1)
Schelling’s “Strategy of Conflict” (Chapters 1-3)
Power concentration/coup prevention
AI-enabled coups: how a small group could use AI to seize power
How Can We Prevent AI-Enabled Coups? - Podcast by Forethought
How much should we worry about secretly loyal AIs? — LessWrong
Checks, Balances, and Power Concentration - Podcast by Forethought
Acausal stuff
[Note: the below are largely superseded by the new acausal reading list. Probably reach out to Chi if you want to be up to date.]
Cooperating with aliens and AGIs: An ECL explainer — LessWrong
Evidential Cooperation in Large Worlds: Potential Objections & FAQ — LessWrong
TBD: some decision theory basics
Moral patienthood
Foundations (mostly Eleos AI research outputs)
Empirical work on introspection and model self-reports
Model welfare sections of various Anthropic system cards
AI biorisk / other AI x-risk
AI-biorisk
Other AI x-risk
Model spec
Lab model specs
Claude’s Constitution \ Anthropic (Jan 2026)
Model Spec Midtraining: Improving How Alignment Training Generalizes (May 2026)
OpenAI Model Spec (Dec 2025)
Sharing the latest Model Spec | OpenAI (Feb 2025)
Stress-testing model specs reveals character differences among language models (Oct 2025)
Claude 4.5 Opus’ Soul Document — LessWrong (Nov 2025)
Claude’s Character \ Anthropic (June 2024)
Constitutional AI: Harmlessness from AI Feedback \ Anthropic (Dec 2022)
Better futures / Post AGI governance
Forethought’s Better Futures series (essays 1-5)
Moral public goods are a big deal for whether we get a good future
Space governance
Thanks to Alex Mallen, Buck Shlegeris, Jackson Sipple, and Aniket Chakravorty for helpful input.
[1] “Powerful AI” is left intentionally vague here. In practice, it can refer to any relevant milestone we’re interested in forecasting, e.g. the AIs which provide 3x AI R&D labor acceleration, AIs which fully automate AI research, AIs which dominate human experts in all cognitive tasks, etc.


I'd like to do a Zoom study group or partnered havruta on this. Message me if you'd be interested.