How to Design AI Right According to AAAS Fellow Stuart Russell

There was once a mythical king named Midas who wished that everything he touched would turn to gold. The gods obliged, granting his wish. Through touch, Midas indiscriminately transformed any object, from pebbles, trees, and his worldly possessions, into solid gold. To his horror, so too did he turn his food, drink, and even beloved daughter to gold, all because he had failed to specify his wish correctly.

[Photo: AAAS Fellow Stuart Russell, Ph.D.]

Such is the core problem with how we design artificial intelligence (AI) systems today, says AAAS Fellow Stuart Russell, Ph.D., professor of computer science at the University of California, Berkeley. For forty years, the author of the AI field’s most widely used textbook, Artificial Intelligence: A Modern Approach, has explored the theoretical foundations of AI as well as its real-world applications—such as the new monitoring system for the Comprehensive Nuclear-Test-Ban Treaty. Over the last decade, his research has focused on anticipating the success of AI. According to Russell, success in the field would mean the creation of general-purpose AI systems that make better real-world decisions than human beings in every area in which the human intellect is relevant.

Russell’s fascination with future worlds and powerful technologies began in childhood with science fiction books. Growing up in England, he devoured tales of intelligent robots penned by authors such as Robert Heinlein, Isaac Asimov, and Arthur C. Clarke. While completing his A-levels in physics and math, Russell enrolled in a program that a nearby college was offering for the first time—computing science. At the time, most people regarded AI as an idea from science fiction, not a technology with real-world uses.

“It never occurred to me at that time that artificial intelligence could be, or was, an academic discipline that I could pursue,” says Russell, who graduated with first-class honors in physics from Oxford University in 1982. But by 1981, after a stint working at the IBM Scientific Center in Los Angeles, he had made up his mind to switch into this emerging field.

By the time Russell completed his doctorate in computer science at Stanford University in 1986, there was a “huge explosion” of interest in AI, with corporations and start-ups investing in the technology. Over the course of his career, Russell has seen waves of interest in AI crest and subside. The latest wave, which began about a decade ago, has focused on deep learning, an approach based on training very large circuits to fit large collections of empirical data. Asked to describe the leaps in understanding of AI that the scientific community has made during that decade, he gave a surprising answer: “Leaps in understanding doesn’t really describe it. We have no understanding right now! Sometimes systems work, and other times they fail completely, and we have no idea why.”

Russell believes that other technical approaches, particularly those based on a technology he and his former students helped invent called probabilistic programming, may have more potential in the long run.

But for Russell, the specific technology is less important than the overarching methodological framework of AI, and he believes that framework is seriously flawed. The “standard model,” as he calls it, involves defining a fixed objective and then plugging it into an objective-achieving machine. It sounds like a reasonable approach, and it works wonderfully in the lab on games and puzzles where the objective is built into the task definition. But when an AI system’s objective is specified incorrectly and fails to account for harmful externalities, the result, much like King Midas’ wish, can be a real problem.

Take social media algorithms, for example. These systems have a pre-specified objective to maximize engagement—that is, to get people to use, and stay on, their platforms. According to the field’s current definition of success, these social media algorithms are meeting their objectives. But using social media for long and consistent periods of time is not always in the user’s best interest—doing so can lead to anxiety, depression, and low self-esteem. Nor is it in society’s best interest, as the algorithms have learned to amplify misinformation and extreme content and to manipulate users’ preferences, leading to polarization.
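The failure mode Russell describes can be sketched in a few lines of code. This toy example is not any real recommender system—the actions, scores, and “harm” values are invented for illustration—but it shows how a standard-model optimizer, handed only an engagement objective, ignores a cost that was never written into that objective.

```python
# Toy illustration of the "standard model": a fixed objective handed to
# an objective-maximizing machine. All names and numbers are hypothetical.

# Each candidate action has an engagement score (what the objective
# measures) and a societal harm score (which the objective omits).
actions = {
    "show_neutral_news":    {"engagement": 3, "harm": 0},
    "show_extreme_content": {"engagement": 9, "harm": 7},
}

def standard_model_choice(actions):
    # The machine optimizes only the pre-specified objective: engagement.
    # Harm never enters the calculation, so it cannot influence the choice.
    return max(actions, key=lambda name: actions[name]["engagement"])

print(standard_model_choice(actions))  # → show_extreme_content
```

The point of the sketch is that the optimizer is working exactly as designed; the flaw lies in the objective it was given, not in the optimization.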

In the real world, it is impossible for humans to specify the objectives of an AI system completely and correctly, says Russell, because in the long run, AI systems can affect anything that humans care about. In his 2021 Reith Lectures, delivered on the BBC, Russell gave an example of what could go wrong.

“Suppose, for example, that COP36 asks for help in deacidifying the oceans; they know the pitfalls of specifying objectives incorrectly, so they insist that all the by-products must be non-toxic, and no fish can be harmed. The AI system comes up with a new self-multiplying catalyst that will do the trick with a very rapid chemical reaction. Great! But the reaction uses up a quarter of all the oxygen in the atmosphere and we all die slowly and painfully. From the AI system’s point of view, eliminating humans is a feature, not a bug, because it ensures that the oceans stay in their now-pristine state.”

To avoid this scenario, Russell argues for two basic principles: first, that the AI system’s only objective is to further the best interests of humans, and second, that the AI system should be explicitly uncertain about what those interests are. It can learn more about human interests from observation of human behavior, including verbal behaviors such as asking for things and saying “Stop!” This approach leads automatically to the machine acting carefully, asking permission before doing something that might affect unknown preferences, and allowing itself to be switched off if humans feel it necessary.

In his Reith Lectures, Russell also gives an example of what it is like to be a machine in this position: “You’re the robot; your partner is the human; and you have to buy your partner the perfect birthday present using money from the joint account. You’re not sure what to get, and in past years you’ve usually got it wrong, but your payoff is precisely your partner’s happiness with the present.” In this example, rather than requiring the human partner to define the objective—that is, to specify in advance a ranking over all possible presents—the burden is shifted to the machine, which can ask questions, observe the human to pick up clues, learn from past failures, and so on.
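The birthday-present scenario can be sketched as a simple Bayesian toy model. This is only an illustration of the general idea—the presents, prior, “clue” likelihoods, and confidence threshold are all invented, not Russell’s formal model—but it shows how a machine that is explicitly uncertain about human preferences can update its belief from observed behavior, and defer to the human when it remains unsure.

```python
# Toy sketch: a robot uncertain which present its partner prefers.
# All hypotheses, probabilities, and thresholds are hypothetical.

# Prior belief over which present the partner truly prefers.
belief = {"book": 1/3, "flowers": 1/3, "gadget": 1/3}

# Likelihood of an observed clue (say, the partner lingering at a
# bookshop window) under each preference hypothesis.
clue_likelihood = {"book": 0.8, "flowers": 0.1, "gadget": 0.1}

def bayes_update(belief, likelihood):
    # Standard Bayesian update: posterior is proportional to
    # prior times likelihood, then normalized to sum to 1.
    posterior = {h: belief[h] * likelihood[h] for h in belief}
    total = sum(posterior.values())
    return {h: p / total for h, p in posterior.items()}

def act_or_ask(belief, threshold=0.7):
    # Act only when confident enough; otherwise ask the human.
    # This deference is the point: uncertainty makes the machine careful.
    best = max(belief, key=belief.get)
    return ("buy", best) if belief[best] >= threshold else ("ask", best)

belief = bayes_update(belief, clue_likelihood)
print(act_or_ask(belief))  # → ('buy', 'book')
```

Before the clue, the robot’s top hypothesis sits at one third, well under the threshold, so it would ask rather than buy; after the update its confidence in “book” reaches 0.8 and it acts. Crucially, the machine’s payoff is the human’s actual preference, not its own guess, so gathering more evidence is always in its interest.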

“If we pursue this approach and make it feasible on a practical scale, then I think we can safely adopt AI in all kinds of application areas,” he says. “Whereas if we stick with the old methods, I think we'll just see increasingly dangerous, systemic failures.”