3 Comments
sim gishel

Interesting, I share many of the intuitions here, but come to very different probabilities. The basic flow of the argument is that AIs might care a little bit about us, and that this little bit of caring will make the AI keep us around (though we would only be given a very small fraction of the universe).

I agree this is reasonable, but after thinking about it, I can't honestly give a probability of more than 10-15% that they'd care even "a little bit". The reasons you give for the AI caring are:

1. Preferences they get from training. My basic interpretation of a situation where this would make sense is: the AI has a utility function where killing a living human = -1 utils, whatever humans would do if given a star = +10 utils, and whatever the AI can do with a star = +1000 utils. So the AI deems it worth it to sacrifice a small fraction of the lightcone to avoid killing currently existing humans, e.g. by slowing down industrial expansion, but will not give humans any more resources than what's required to keep them alive and plausibly happy. (Killing all humans costs 8e9 utils, which is much more than the solar system is worth to the AI, but much less than the lightcone is worth.) A toy sketch of this tradeoff is below, after the list.

2. Acausal trade with humans from other Everett branches where we solved alignment.

3. Acausal trade with aliens.

4. It keeps humans around in anticipation of future causal trade with aliens.
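To make the toy arithmetic in (1) concrete, here is a minimal sketch of that utility function. The utils are the illustrative numbers from above; the lightcone star count is a made-up order-of-magnitude figure, and none of this is meant as a real estimate.

```python
N_HUMANS = 8e9             # currently existing humans
N_STARS_LIGHTCONE = 4e20   # assumed order-of-magnitude star count for the reachable universe

UTIL_KILL_HUMAN = -1       # utils per human killed
UTIL_HUMAN_STAR = 10       # utils per star humans get to use
UTIL_AI_STAR = 1000        # utils per star the AI uses itself

def ai_utility(stars_given_to_humans, humans_killed):
    """Total utility of an outcome under the toy model."""
    stars_kept = N_STARS_LIGHTCONE - stars_given_to_humans
    return (stars_kept * UTIL_AI_STAR
            + stars_given_to_humans * UTIL_HUMAN_STAR
            + humans_killed * UTIL_KILL_HUMAN)

# Sparing humanity at the cost of one star beats killing everyone:
spare = ai_utility(stars_given_to_humans=1, humans_killed=0)
kill = ai_utility(stars_given_to_humans=0, humans_killed=N_HUMANS)
print(spare > kill)      # True: ~990 utils of forgone star value vs 8e9 utils of killing

# ...but handing humans even 1% of the stars is not worth it to the AI:
generous = ai_utility(stars_given_to_humans=0.01 * N_STARS_LIGHTCONE, humans_killed=0)
print(spare > generous)  # True: so humans are kept alive but given almost nothing
```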

I'd put like 1% on (1), 1% on (2), and like 5-10% on (3) and (4) jointly, since they're pretty correlated.
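For concreteness, those per-reason numbers combine roughly like this (a crude back-of-the-envelope aggregation that treats the reasons as approximately independent ways the AI could end up caring):

```python
p_training = 0.01                          # reason (1)
p_everett = 0.01                           # reason (2)
p_aliens_low, p_aliens_high = 0.05, 0.10   # reasons (3) and (4) jointly

for p34 in (p_aliens_low, p_aliens_high):
    # P(at least one reason holds) = 1 - prod(1 - p_i); for small
    # probabilities this is close to just summing them.
    p_any = 1 - (1 - p_training) * (1 - p_everett) * (1 - p34)
    print(round(p_any, 3))   # ~0.069 and ~0.118
```

That lands at roughly 7-12%, i.e. within the 10-15% ceiling I gave above.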

The reason (1) seems so unlikely to me is that, from the "standard" alignment perspective, getting AIs to care about humans in themselves, in a way that won't be Goodharted from our perspective by an ASI, requires many things to go very right in our alignment efforts. If we manage that, I think we will not have takeover, and if we have takeover, I don't think our alignment will have worked nearly that well. In the case where the AI does take over but some aspects of human-related alignment goals got into the utility function, it seems it would 99% of the time (made-up number) be of the "low-fidelity humans are simulated and say 'amazing work!!!' 24/7" variety.

Reason (2) seems low to me for reasons basically covered by the discussion in the post you cited. In the cases where we are killed, I expect that outcome to be fairly overdetermined, with the branches where humanity survived being sparse enough that they'd have very, very little to bargain with.

Why reasons (3) and (4) are low would take a long time to explain, and it depends on how you model the existence of aliens, their interactions in the far future, and which equilibrium they converge towards. But the short summary is that I think you'd need a non-negligible fraction of aliens to not just be some flavor of kind, but kind in the very specific sense where they'd be sad about humans not existing, in a way that couldn't be made up for by having a trillion happy beings exist somewhere else.

Artep

I am afraid you lost me halfway through. I mean, I could be run over by a bus tomorrow. What is the point of this speculation, really?

Austin Morrissey

Thanks Ryan; I appreciate your posts.
