ASI probably won't kill literally everyone, but not necessarily for happy reasons
It's easier to destroy civilization than to commit omnicide
Epistemic status: Inherently uncertain, like everything about speculative AI
The standard argument about AI risk goes like this: If you imagine an arbitrarily powerful superintelligence - or at least one that’s as powerful as AI-powered recursive improvement can make it in a short time - then, assuming it’s misaligned, it quickly kills us all to use our atoms for its own purposes. And since we don’t seem to be making enough progress on alignment, that’s the inevitable endpoint we’re headed to.
As an existence proof to overcome doubt that AI can be risky at all, this is a strong argument. If you assume AI just keeps improving along current trends, there's probably some point at which it has both the capabilities and the misalignment to do all that.
But that's not the first point at which AI becomes dangerous. While AI does occasionally seem to jump in capabilities (AlphaGo went straight from low-level pro play to world champion), it seems to move more slowly on real-world tasks (look at how gradual the improvement in self-driving has been). And even AlphaGo wasn't that big a jump: from a human perspective, there isn't actually that much of a gap between a mid-level pro player and a world champion. The Wait But Why model of a sudden jump in AI abilities just doesn't match how we've seen AI develop.

Still, AI capabilities do seem to be gradually improving, plausibly in a bad direction. So what does disaster look like?
If we stick to the risk that the AI itself does something bad, the first really bad thing an AI does probably isn't world-destroying. It's probably something like a poorly-planned cyberattack on global infrastructure, with some frighteningly well-executed components, that overlooks real-world considerations, fails, and alerts people to the risk (this seems like what a GPT-6-level attack would look like). Maybe it's a somewhat better-executed attack that brings down several continents' power grids and causes mass starvation or civilizational collapse. Both of those are just so much easier than literally killing everyone at once. Even if GPT-8 could instantly kill everyone with diamondoid bacteria, we'll probably become smart enough never to build it if GPT-6 keeps making semi-competent, poorly-executed plots to kill everyone. And we'll definitely never make it if GPT-7 destroys the power grid and knocks us all back to the stone age.
But I don't actually think we get there, because while humanity seems incautious (to put it mildly) about giving AI agency, agency is still an extra complication that makes it harder to do damage. What I actually think happens is that someone makes a non-agentic AI that's smart enough to design a supervirus, and five minutes later a terrorist group somewhere in the world tells it to do exactly that, and human civilization doesn't survive the result.
But either way, the good news is we probably don’t actually all get paperclipped.
A few more notes:
This model still implies that AI risk awareness raising (and AI safety research) is important. If there's a decent chance the first big disaster doesn't immediately wipe out humanity, then the threshold for a disaster big enough to make us take AI safety seriously (or stop development) is lower if people were already expecting AI to be risky. And even if all AI safety research ever does is come up with convincing-sounding explanations for why some approaches won't work, it might help us avoid going "oh well, we just need to add another layer of RLHF to make it totally safe" after AI blows up Australia.
This is a tangent, but I think the AI risk community failed to update sufficiently on how well RLHF worked. There are a lot of very good arguments for why it's not reliable as a safety mechanism in itself (and plenty of demonstrations of how easily it can be cracked), but it genuinely gave AI a rough, intuitive sense of which no-nos are to be avoided, and mostly made that stick, in a way a lot of AI risk people used to say would be difficult or impossible. That doesn't mean it solved AI risk, but when something we thought would be impossible turns out to be doable, we should at least update somewhat towards safety being more achievable than we thought.
Since I hate it when these vague warning articles don't come with predictions of risk probabilities, here is my (very rough, not super thought out) outcome probability distribution:
(note that some of these scenarios are overlapping, so this doesn't add up to 100%)
Improvements in AI just hit a cliff short of anything that could be genuinely threatening or revolutionary: 25%
AI safety turns out to be naturally solvable as part of making useful, directable AI, and we end up not needing any extra effort for it: 15%
Making AI safe is doable, but only with lots of hard work and smart research on the problem: 20%
Agentic AI collapses civilization: 20%
Some terrorist group uses AI to carry out a mass terror attack that does unprecedented damage (e.g. brings down a national power grid or kills over a million people): 40%
A massive AI attack that falls short of bringing down civilization is nonetheless big enough to make us take AI risk seriously enough to prevent it doing anything worse: 35%
A massive AI attack destroys modern civilization but doesn’t cause human extinction: 15%
AI literally kills every human: 10%