The AI Problem, with Solutions

We find ourselves at a crucial moment in Earth’s history. Like a boulder perched upon a mountain’s peak, we stand at an unstable point. We cannot stay where we are: AI is coming provided that scientific progress continues. Soon we will tumble down one side of the mountain or another to a stable resting place.

One way lies human extinction. (“Go extinct? Stay on that square.”) Another resting place may be a stable global totalitarianism that halts scientific progress, although that seems unlikely.[1]

What about artificial intelligence? AI leads to an intelligence explosion, and, because we don’t know how to give an AI benevolent goals, by default an intelligence explosion will optimize the world for accidentally disastrous ends. A controlled intelligence explosion, on the other hand, could optimize the world for good. (More on this option in the next chapter.)

I, for one, am leaning all my weight in the direction of this last valley: a controlled intelligence explosion.

For a fleeting moment in history we are able to comprehend (however dimly) our current situation and influence which side of the mountain we are likely to land on. What, then, shall we do?

Differential Intellectual Progress

What we need is differential intellectual progress:

Differential intellectual progress consists in prioritizing risk-reducing intellectual progress over risk-increasing intellectual progress. As applied to AI risks in particular, a plan of differential intellectual progress would recommend that our progress on the scientific, philosophical, and technological problems of AI safety outpace our progress on the problems of AI capability such that we develop safe superhuman AIs before we develop (arbitrary) superhuman AIs. Our first superhuman AI must be a safe superhuman AI, for we may not get a second chance.

To oversimplify, AI safety research is in a race against AI capabilities research. Right now, AI capabilities research is winning, and in fact is pulling ahead. Humanity is pushing harder on AI capabilities research than on AI safety research.

If AI capabilities research wins the race, humanity loses. If AI safety research wins the race, humanity wins.

Many people know what it looks like to push on AI capabilities research. That’s most of the work you read about in AI. But what does it look like to push on AI safety research?

A separate article contains a long list of problem categories in AI safety research; for now, let me give just a few examples. (Skip this list if you want to avoid scary technical jargon.)

  • When an agent considers radical modification of its own decision mechanism, how can it ensure that doing so will increase its expected utility? Current decision theories stumble over Löb’s Theorem at this point, so a new “reflectively consistent” decision theory is needed.
  • An agent’s utility function may refer to states of, or entities within, its ontology. But as Peter de Blanc notes, “If the agent may upgrade or replace its ontology, it faces a crisis: the agent’s original [utility function] may not be well-defined with respect to its new ontology.”[2] We need to figure out how to make sure that after we give an AI good goals, those goals won’t be “corrupted” when the AI updates its ontology.
  • How can we construe a desirable utility function from what humans “want”? Current preference acquisition methods in AI are inadequate: we need newer, more powerful and universal algorithms for preference acquisition. Or perhaps we must allow actual humans to reason about their own preferences for a very long time until they reach a kind of “reflective equilibrium” in their preferences. This latter path may involve whole brain emulation—but how can we build that without first enabling the creation of dangerous brain-inspired self-improving AI?
  • We may not solve the problems of value theory before AI is created. Perhaps instead we need a theory of how to handle this normative uncertainty, for example something like Bostrom’s proposed Parliamentary Model.

Besides these technical research problems, differential intellectual progress also recommends progress on a variety of strategic research problems. Which technologies should humanity move funding toward or away from? What can we do to reduce the risk of an AI arms race? Would encouraging widespread rationality training or benevolence training reduce AI risk? Which interventions should we prioritize?

Action, Today

So one part of the solution to the problem of AI risk is differential intellectual progress. Another part of the solution is to act on the recommendations of the best strategy research we can do. For example, the following actions probably reduce AI risk:

  • Donate to organizations doing the kinds of technical and strategic research in AI safety we discussed above—organizations like the Machine Intelligence Research Institute and the Future of Humanity Institute.
  • Persuade people to take AI safety seriously, especially AI researchers, philanthropists, smart young people, and people in positions of influence.
Thus far I’ve been talking about AI risk, but it’s important not to lose sight of the opportunity of AI either:

We don’t usually associate cancer cures or economic stability with artificial intelligence, but curing cancer is ultimately a problem of being smart enough to figure out how to cure it, and achieving economic stability is ultimately a problem of being smart enough to figure out how to achieve it. To whatever extent we have goals, we have goals that can be accomplished to greater degrees using sufficiently advanced intelligence.

In my final chapter, I will try to explain just how good things can be if we decide to take action and do AI right.

Yes, we must be sober about the fact that nothing in physics prohibits very bad outcomes. But we must also be sober about the fact that nothing in physics prohibits outcomes of greater joy and harmony than our primitive monkey brains can imagine.

* * *

[1] Bryan Caplan, “The Totalitarian Threat,” in Bostrom and Ćirković, Global Catastrophic Risks, 504–519.

[2] Peter de Blanc, Ontological Crises in Artificial Agents’ Value Systems (The Singularity Institute, San Francisco, CA, May 19, 2011), http://arxiv.org/abs/1105.3821.