Stuart Russell wrote a new book about managing the coming artificial intelligence revolution called Human Compatible. He believes that the main danger of AI is not that it will become conscious and decide to maximize its own well-being at the expense of humans, but that AI will slavishly try to solve the problems that humans direct it to solve and cause disastrous unintended consequences. For example:
the problem comes from increasing capabilities, coupled with our inability to specify objectives completely and correctly. Can we restore our carbon dioxide to historical levels so that we get the climate back in balance? Sounds like a great objective. Well, the easiest way to do that is to get rid of all those things that are producing carbon dioxide, which happen to be humans. You want to cure cancer as quickly as possible. Sounds great, right? But the quickest way to do it is to run medical trials in parallel with millions of human subjects or billions of human subjects. So you give everyone cancer and then you see what treatments work.
He says we can’t write down rules for how to do this safely because laws don’t work well enough. You can declare, “Thou shalt not kill,” but the AI might misinterpret this (as humans often do) or find a loophole by which it merely causes someone else to kill humans. Russell gives the analogy of how impossible it is to write a tax law without people finding loopholes and causing unintended consequences:
So, we’ve been trying to write tax law for 6,000 years. And yet, humans come up with loopholes and ways around the tax laws so that, for example, our multinational corporations are paying very little tax to most of the countries that they operate in. They find loopholes. And this is what, in the book, I call the loophole principle. It doesn’t matter how hard you try to put fences and rules around the behavior of the system. If it’s more intelligent than you are, it finds a way to do what it wants.
Russell says that we have to stop using deontology (a rule-based ethical system) and move to a consequentialist ethical system for managing AI.
Instead, the AI system has a constitutional requirement that it be of benefit to human beings. But it knows that it doesn’t know what that means. It doesn’t know our preferences. And it knows that it doesn’t know our preferences about how the future should unfold. So you get totally different behavior. Basically, the machines defer to humans. They ask permission before doing anything that messes with part of the world. …[An AI that is trying to learn what humans want] has an incentive to be honest about its plans because it wants to get feedback and so on.
This does seem to be an improvement, but I’m still worried about AI developing its own priorities. And even if AIs stay obedient to humans, I’m worried about the character of the individuals the machines will obey: the people who own various AIs could have disastrous priorities.
Russell draws a parallel with nuclear physics to argue that we need to think about how to manage the dangers before they become a problem, but I think that is actually a scarier analogy than Russell intended.
I think it’s useful to look back at the history of nuclear energy and nuclear physics, because it has many parallels… when Leo Szilard invented the nuclear chain reaction, he didn’t know which atoms could be induced to go through a fission reaction and produce neutrons that would then produce more fission reactions…. The only way you get nuclear safety is by worrying about the ways [reactors] can blow up and preventing them from blowing up.
Unfortunately for this analogy, the first application of controlled nuclear reactions was the nuclear bomb, and the physicists of the time risked the survival of the entire planet when they tested it. They suspected that the explosion might set off a chain reaction that would ignite the atmosphere of the entire world in a global nuclear explosion, but they tested it anyhow.
Similarly, AI will eventually become dangerously powerful. Who is to say that the first AI won’t be deployed as a weapon between nations, or as a tool of wealth accumulation by a private party whose lust for power outweighs any concern for the wellbeing of humanity? With anything as powerful as nuclear reactions or AI, we will need to worry about the purposes of any humans who have control over it. With AI, there is the additional worry that it will escape the control of those humans altogether and develop independent motivations.