Not Built to Think About AI

In the Terminator movies, the Skynet AI becomes self-aware, kills billions of people, and dispatches killer robots to wipe out the remaining bands of human resistance fighters. That sounds pretty bad, but in NPR’s piece on the intelligence explosion, eBay programmer Keefe Roedersheimer explained that the creation of an actual machine superintelligence would be much worse than that.

Martin Kaste (NPR): Much worse than Terminator?

Keefe Roedersheimer: Much, much worse.

Kaste: . . . That’s a moonscape with people hiding under burnt out buildings being shot by lasers. I mean, what could be worse than that?

Roedersheimer: All the people are dead.¹

Why did he say that? For most goals an AI could have—whether it be proving the Riemann hypothesis or maximizing oil production—the simple reason is that “the AI does not love you, nor does it hate you, but you are made of atoms which it can use for something else.”² And when a superhuman AI notices that we humans are likely to resist having our atoms used for “something else,” and therefore pose a threat to the AI and its goals, it will be motivated to wipe us out as quickly as possible—not in a way that exposes its critical weakness to an intrepid team of heroes who can save the world if only they can set aside their differences . . . No. For most goals an AI could have, wiping out the human threat to its goals, as efficiently as possible, will maximize the AI’s expected utility.

And let’s face it: there are easier ways to kill us humans than to send out human-shaped robots that walk on the ground and seem to enjoy tossing humans into walls instead of just snapping their necks. Much better if the attack on humans is sudden, has simultaneous effects all over the world, is rapidly lethal, and defies available countermeasures. For example: Why not just use all that superintelligence to engineer an airborne, highly contagious and fatal supervirus? A 1918 mutation of the flu killed 3% of Earth’s population, and that was before air travel (which spreads diseases across the globe) was common, and without an ounce of intelligence going into the design of the virus. Apply a bit of intelligence, and you get a team from the Netherlands creating a variant of bird flu that “could kill half of humanity.”³ A superintelligence could create something far worse. Or the AI could hide underground or blast itself into space and kill us all with existing technology: a few thousand nuclear weapons.

The point is not that either of these particular scenarios is likely. The point is that reality can be quite different from what makes for good storytelling. As Oxford philosopher Nick Bostrom put it:

When was the last time you saw a movie about humankind suddenly going extinct (without warning and without being replaced by some other civilization)?⁴

It makes a better story if the fight is plausibly winnable by either side. The Lord of the Rings wouldn’t have sold as many copies if Frodo had done the sensible thing and dropped the ring into the volcano from the back of a giant eagle. And it’s not an interesting story if humans suddenly lose, full stop.

When thinking about AI, we must not generalize from fictional evidence. Unfortunately, our brains do that automatically.

A famous 1978 study asked subjects to judge which of two dangers occurred more often. Subjects thought accidents caused about as many deaths as disease, and that homicide was more frequent than suicide.⁵ Actually, disease causes sixteen times as many deaths as accidents, and suicide is twice as common as homicide. What happened?

Dozens of studies on the availability heuristic suggest that we judge the frequency or probability of events by how easily they come to mind. That’s not too bad a heuristic to have evolved in our ancestral environment when we couldn’t check actual frequencies on Wikipedia or determine actual probabilities with Bayes’ Theorem. The brain’s heuristic is quick, cheap, and often right.

But like so many of our evolved cognitive heuristics, the availability heuristic often gets the wrong results. Accidents are more vivid than diseases, and thus come to mind more easily, causing us to overestimate their frequency relative to diseases. The same goes for homicide and suicide.

The availability heuristic also explains why people think flying is more dangerous than driving when the opposite is true: a plane crash is more vivid and is reported widely when it happens, so it’s more available to one’s memory, and the brain tricks itself into thinking the event’s availability indicates its probability.

What does your brain do when it thinks about superhuman AI? Most likely, it checks all the instances of superhuman AI you’ve encountered—which, to make things worse, are all fictional—and judges the probability of certain scenarios by how well they match with the scenarios that come to mind most easily (due to your mind encountering them in fiction). In other words, “Remembered fictions rush in and do your thinking for you.”

So if you’ve got intuitions about what superhuman AIs will be like, they’re probably based on fiction, without you even realizing it.

This is why I began Facing the Intelligence Explosion by talking about rationality. The moment we start thinking about AI is the moment we run straight into a thicket of common human failure modes. Generalizing from fictional evidence is one of them. Here are a few others:

Due to the availability heuristic, your brain will tell you that an AI wiping out mankind is incredibly unlikely because you’ve never encountered this before. Moreover, even when things go as badly as they do in, say, alien invasion movies, the plucky human heroes always end up finding a way to win at the last minute.
Because we overestimate the probability of conjunctive events but underestimate the probability of disjunctive events,⁶ we’re likely to overestimate the probability that superhuman AI will turn out okay because X, Y, and Z will all happen, and we’re likely to underestimate the probability that superhuman AI will turn out badly because there are many ways superhuman AI can turn out badly that don’t depend on many other events occurring.
Due to your brain’s anchoring and adjustment heuristic, your judgment of a situation will anchor on obviously irrelevant information. For example, the number that comes up on a spin of a wheel of fortune will affect your guess at how many countries there are in Africa. Even though I’ve just talked about how The Terminator is irrelevant fictional evidence about how superhuman AIs will turn out, your brain will be tempted to anchor on The Terminator and then adjust away from it, but not far enough away.
Due to the affect heuristic, we judge things based on how we feel about them. We feel good about beautiful people, so we assume beautiful people are also smart and hard-working. We feel good about intelligence, so we might expect an intelligent machine to be benevolent. (But no.)
Due to scope insensitivity, we do not feel any more motivated to prevent one hundred thousand deaths than we do to prevent one hundred deaths.
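
The point about conjunctive and disjunctive events in the list above can be made concrete with a little arithmetic. The sketch below is only an illustration: the numbers (seven events, 90% and 10% probabilities) and the assumption that the events are independent are invented for the example, not taken from any study or forecast. It shows that a plan needing many likely things to all go right can be less probable than not, while a failure that needs only one of many unlikely things to go wrong can be more probable than not.

```python
from math import prod


def all_happen(probs):
    """Probability that every event occurs (conjunction), assuming independence."""
    return prod(probs)


def any_happens(probs):
    """Probability that at least one event occurs (disjunction), assuming independence."""
    return 1 - prod(1 - p for p in probs)


# Hypothetical numbers, chosen only for illustration.
success_steps = [0.9] * 7   # seven things that must ALL go right, each 90% likely
failure_modes = [0.1] * 7   # seven ways things could go wrong, each 10% likely

print(f"All seven steps succeed:       {all_happen(success_steps):.3f}")   # 0.478
print(f"At least one failure happens:  {any_happens(failure_modes):.3f}")  # 0.522
```

Each individual step looks safe and each individual failure looks remote, which is why intuition tends to rate the conjunction too high and the disjunction too low.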

Clearly, we were not built to think about AI.

To think wisely about AI, we’ll have to keep watch for—and actively resist—many kinds of common thinking errors. We’ll have to use the laws of thought rather than normal human insanity to think about AI.

Or, to put it another way, superhuman AI will have a huge impact on our world, so we want very badly to “win” instead of “lose” with superhuman AI. Technical rationality done right is a system for optimal winning—indeed, it’s the system a flawless AI would use to win as much as possible.⁷ So if we want to win with superhuman AI, we should use rationality to do so.

And that’s just what we’ll begin to do in the next chapter.

* * *

¹Martin Kaste, “The Singularity: Humanity’s Last Invention?,” NPR, All Things Considered (January 11, 2011), accessed November 4, 2012, http://www.npr.org/2011/01/11/132840775/The-Singularity-Humanitys-Last-Invention.

²Eliezer Yudkowsky, “Artificial Intelligence as a Positive and Negative Factor in Global Risk,” in Global Catastrophic Risks, ed. Nick Bostrom and Milan M. Ćirković (New York: Oxford University Press, 2008), 308–345.

³RT, “Man-Made Super-Flu Could Kill Half Humanity,” RT, November 24, 2011, http://www.rt.com/news/bird-flu-killer-strain-119/.

⁴Nick Bostrom, “Existential Risks: Analyzing Human Extinction Scenarios and Related Hazards,” Journal of Evolution and Technology 9 (2002), http://www.jetpress.org/volume9/risks.html.

⁵Sarah Lichtenstein et al., “Judged Frequency of Lethal Events,” Journal of Experimental Psychology: Human Learning and Memory 4, no. 6 (1978): 551–578, doi:10.1037/0278-7393.4.6.551.

⁶Tversky and Kahneman, “Judgment Under Uncertainty.”

⁷Stephen M. Omohundro, “Rational Artificial Intelligence for the Greater Good,” in The Singularity Hypothesis: A Scientific and Philosophical Assessment, ed. Amnon Eden et al. (Berlin: Springer, 2012), preprint at http://selfawaresystems.files.wordpress.com/2012/03/rational_ai_greater_good.pdf.
