5 Comments
Pepamo

Fire title

Super Statistician

Seems like your overall thesis is somewhat contra the idea that we could see "substantial gains from lots of AI automation of RL environment construction," so I'm curious why you don't have that as a bigger part of your estimate. This strikes me as a plausible channel for recursive self-improvement/bootstrapping.

How confident are you that we're many years away from models autonomously building challenging agentic RL environments (e.g. writing most of the infrastructure code for GPT Plays Pokemon)?

Eric-Navigator

Nice article! Nice to meet you!

I have a few questions on this. They may be naive, but perhaps they are also interesting.

First, are you thinking about disembodied or embodied AI? What does an RL environment include? Is it text-only, or does it also include images, audio, and video? Does it include a simulated physical environment, like Isaac Gym? And if we are talking about replacing 50% of real-world work with a time span of a month, does that mean the RL training environment has to run for a month? That sounds astronomically expensive.

And when the AI agent has to interact with complex real environments or real humans over a long horizon and learn in real time, as most actual jobs require, how can we make sure the RL environment captures all that complexity? Can AI simulate everything to this degree?

Emil Ryd

"I think the compute cost for doing RL on a hard agentic software engineering task might be around $10 to $1000 ($0.1 to $1 for each long rollout and you might do 100 to 1k rollouts?)"

This seems too low by an order of magnitude or two. I would guess labs want to do on the order of 100 steps of RL, with each step being a batch of around 100 rollouts (particularly if you do GRPO, where the micro train batch is ~4 or ~8, so the number of rollouts is 4 x batch size), so more like 10k rollouts in total. Or maybe batch sizes are generally quite small, like 4-8? 10 steps would seem surprisingly few to me.
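
To make the arithmetic concrete, here is a minimal Python sketch comparing the two estimates. The per-rollout cost range ($0.1 to $1) and the 100 to 1k rollout counts come from the quoted passage; the 100 steps and 100 rollouts per step are the rough guesses in this comment, not measured figures. The function name `rl_cost_range` and the variable names are made up for illustration.

```python
def rl_cost_range(rollouts, cost_per_rollout=(0.1, 1.0)):
    """Return (low, high) compute cost in dollars for a number of rollouts.

    cost_per_rollout is the assumed $0.1 to $1 range per long rollout
    taken from the quoted passage.
    """
    low, high = cost_per_rollout
    return rollouts * low, rollouts * high


# Quoted estimate: 100 to 1k rollouts per task.
quoted_min = rl_cost_range(100)[0]    # cheapest case: 100 rollouts at $0.1
quoted_max = rl_cost_range(1000)[1]   # priciest case: 1k rollouts at $1

# This comment's guess: ~100 RL steps, each a batch of ~100 rollouts.
steps = 100
rollouts_per_step = 100
total_rollouts = steps * rollouts_per_step
revised_min, revised_max = rl_cost_range(total_rollouts)

print(f"quoted:  ${quoted_min:,.0f} to ${quoted_max:,.0f}")
print(f"revised: ${revised_min:,.0f} to ${revised_max:,.0f} "
      f"({total_rollouts:,} rollouts)")
```

Under these assumptions the rollout count is 10x to 100x the quoted 100 to 1k, which is where the "order of magnitude or two" comes from.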

Eric-Navigator

I am working on a wild concept for this, and I am not ashamed of sounding sci-fi-like!

The ultimate version of the RL environment concept would be like a school for AI agents. And as AI agents grow to take on real-world problems by themselves, they must also take on responsibilities. This is where my “Academy for Synthetic Citizens” concept comes from.

Check my profile for more information!
