There’s something nice about all this media attention being given to the AI these days. It’s nice for one’s work to be seen as important or world-transforming. It’s also nice to see people speaking up to say that it might be risky or dangerous. The dialog about the potential benefits and pitfalls of AI seems to be honest and genuine.
Unfortunately, “honest” and “genuine” doesn’t remotely equate to “nuanced” or “deep”. This is especially true around the so-called “AI apocalypse”, the idea that a general, super-human computer intelligence is inevitable and what that means for the human race. There are at least three opinions on the matter:
- Super-human AI is coming, and it’s going to destroy us all
- Super-human AI is coming, and it’s going to be awesome
- Super-human AI isn’t coming, and you’re naive for thinking it will
The first of these gets the most press, (because who doesn’t love a good apocalypse?) and is espoused by guys like Stephen Hawking and Elon Musk. The second is the singularity crowd, led by Ray Kurzweil and others. The third is probably where most researchers and practitioners sit, including roboticist Rodney Brooks, who did a decent interview about the subject.
The thing that makes the discussion in the popular media depressingly shallow is that rarely is robust reasoning given for why someone thinks their idea is correct. With little in the way of a real argument presented for or against any theory, most readers will doubtlessly rely on their own intuitions, preferences, and fears to guide them. This does not an informed populace make.
Luckily, Dr. Stuart Russell is here to save us from our own ignorance. At a conference I attended recently, he gave a great talk entitled “Provably Beneficial AI” which gave concrete technical arguments that speak to each of the three views above: Why general AI might (or might not) be on its way, and why AI might (or might not) be a significant risk to the human race. While his talk didn’t give too much in the way of answers, it was clear that at least that he’s done a lot of thinking about the questions.
Here’s my summary of that talk. Let’s think along with him, shall we?
The Specter of Artificial Intelligence
The argument from the “Strong AI is inevitable” crowd goes something like this: Look at how much progress AI has made in the last 30 (or 20 or 10 or 5) years! I bet in the next n years it will be MOAR BETTER.
This argument is unconvincing to many in the field, and for good reasons. One is that the things we’ve gotten good at in the last few decades are really just more impressive versions of things we could already do. We’ve been doing limited domain speech recognition fairly well for at least 20 years. Now we’ve got more compute power, more data, models that are a bit more clever, et voila, our domain has expanded to the whole language. Similarly, a pro-level backgammon player was developed 25 years ago; some more compute resources, a bit more cleverness, and we can do the same with chess, and even more recently with Go.
Outside observers of this situation tend to overgeneralize. The idea seems that we have technology to solve backgammon and we “fancied it up” and were able to do well at Go, which is harder. Well, if we “fancy it up even more” we should be able to solve anything, right?
No. There are fundamental differences between speech recognition and game playing, and, well, life in general. Here are a few things that computers can’t yet do that you absolutely need for general intelligence:
Long-term, multi-level planning. Even simple goals we have as people require multiple plans on multiple levels all being attended to simultaneously. A good example is my recent travel from Chicago to Melbourne. This involved two flights, two car rides, two train rides, and loads of walking. Only a few of those individual steps were actually in the direction of my destination because there are many intermediate goals to be satisfied at every step (leave the house, get into the car, leave the driveway, get to the freeway, and so on). Humans are surprisingly good in formulating these ad-hoc, interacting hierarchies of goals in a way that computers are very much unable to do, as of yet.
Generalization of knowledge. Autonomous driving is basically a solved problem, but if you put a system trained for one vehicle into another, the likelihood of failure is pretty high even if the systems have only trivial differences. There aren’t yet any compelling general algorithms that allow computers to automatically reuse parts of already-learned tasks to solve a novel problem. There are parts of Go and Chess that are highly related (the ideas of attacking and defending, for example), but it’s hard to imagine how a chess-trained computer could use that fact to learn how to play Go more quickly. This ability to “analogize” between domains is one of the central pieces of human intelligence.
Real natural language understanding. Right now, you can ask Siri “What’s the weather going to be like today?” But suppose you asked, “Can I go golfing at 5pm today?” There’s a lot more to unpack in even that simple statement. Is the course open? Is there a public course nearby? Am I a member at a nearby private course? What will the weather be like? What is my schedule like? Do I own clubs? Current systems can only answer questions when the answers are fairly straightforward; doing anything more will require a significant leap in complexity over current language models.
Importantly, none of these things will be solved by more computers or bigger computers or quantum computers. These are things that need to be solved on a conceptual level; more computing power will just get us the same wrong answers we have now, faster. There is nothing substantial right now in the research to indicate that these will be solved in a general way in the near future.
Moreover, for some of these problems, there’s no obvious need for a computational solution. After all, how deep of an understanding of natural language does a computer need to have to do everything we want it to do? To have a computer read and really understand a novel, for example, is probably not terribly useful. There are lots of technologies out there that are more than possible from an engineering standpoint (fuel cell vehicles, travel to the moon, mile-high buildings), that never become commonplace only because of economics and their level of necessity.
So anyone who says strong AI is “inevitable” runs up against obvious counter-arguments. We can’t do it now, and there’s no reason now to believe we ever will . . . well, maybe not no reason. Russell shares the cautionary tale of Ernst Rutherford, who in 1933 was quoted in a newspaper article about the impossibility of extracting energy from an atom. Leo Szlisard read the article and, within a day, had conceived of a neutron-initiated nuclear chain reaction. The problem had gone from “impossible” to “essentially solved” in less than 24 hours. Is there any guarantee that strong AI will ever happen? Absolutely not. But it’s a fool’s bet to be on the wrong side of human ingenuity.
Will It Be Awesome or the End of Humanity?
So let’s suppose, for arguments sake, that we do invent a strong form of AI. Why should we believe the alarmists that it will be “evil” and destroy us all?
Certainly, there’s the possibility that an evil person gets a hold of such an AI and uses it against his or her enemies. But that threat isn’t a problem particular to AI; even if AI is an existential threat in such a situation, it’s difficult to make the case that it’s more dangerous than nuclear weapons, which basically have the human race at a hair’s breadth from annihilation all the time. There will always be dangerous things and evil people in the world, with or without AI, and that reality will always have to be dealt with.
But there are problems that are unique to AI, and many of them relate to the fact that a superhuman intelligence would be very good at doing what it was told.
Consider: You tell your super-intelligent robot to go and get you some coffee. What if the only coffee within 10 miles costs $50? You didn’t say to the robot, “Don’t pay too much”, so of course it happily hands over the $50 because that’s what you wanted. What if there’s no coffee shop within 100 miles? Coffee road trip!
Probably you can see how this quickly gets sinister. What if the coffee shop is closed? The robot is happy to break in and make the coffee itself. You wanted coffee, didn’t you? What if a police officer sees that the shop was broken into and tries to pull the robot out of the shop. Well, you said you wanted coffee. Maybe most importantly, what if anyone tries to power down or otherwise destroy the robot? Oh no, you don’t! I can’t get the coffee if I’m dead!
We don’t need to program our robot specifically to engage in violent behavior or over-consumption of resources. If a robot has a goal, and such behavior furthers that goal, the robot might engage in such behavior because it has no reason to avoid it.
The problem is one that Russell calls “value misalignment”. By telling the robot to get the coffee, we’ve implicitly given it a system of values: “Getting the coffee” has some positive value in the robot’s mind. Everything else – time, money, human life, and anything else we ourselves might value – is tied at zero.
Furthermore, the astute reader will see that the problem only gets worse as the robot gets smarter. Some of the most important attributes of human intelligence are adaptation, improvisation, and creativity. We want a robot to be able to figure out how to accomplish it’s goals even if there are obstacles in the way. As it gets more clever, though, its space of possible actions becomes larger and larger, and the more likely it is that its actions will, just by chance, run counter to a human value that it doesn’t know about.
To a degree, we see this in Machine Learning applications already: We’ve become so good at solving optimization problems that if there is a way to “cheat” by being myopic about the problem the optimizer will find it. A good example is the recent debacle where a Google image recognizer labeled some people as gorillas. To the system, the difference between a human face and a gorilla face is fairly subtle, and as long as it gets that right most of the time it thinks it’s doing a good job. After all, if I get all of the gorilla faces right and pull in just a few humans, that’s still a pretty good result, right? But of course, that’s not right at all.
If we told the system that it was important not to make this mistake, it could avoid making it. But if we haven’t told it so, why on earth would it try to avoid this any more than, say, mixing up a microwave and a toaster oven? This can all be summarized succinctly by one of my early CS professors:
“Don’t you hate it when when the program does what you told it to do instead of what you want it to do?”
Clarity of intention is of the utmost importance in computer programming, and shame on us when it is found lacking.
So, then, what does “telling the robot to avoid this sort of mistake” look like for general behavior in the world? Well, for starters, it means simply encoding the knowledge that human life, compliance with the law, and finite resources like money and time all have a positive value. This by itself goes a long way towards giving the robot a “more human” way of reasoning about its methods.
However, anyone who’s read Asimov knows the danger of relying on rules to govern robotic behavior. Again, computers are great optimizers; if its goals are furthered by finding loopholes in those rules, the robot will find them. It’s the aggressiveness of the optimization itself that needs to somehow be mitigated.
One way to do this, as Russell proposes, is to add uncertainty to the robot’s notion of value. More specifically, the robot needs to maintain an uncertainty about the value of changing the world when it doesn’t have sufficient information about that value.
For example, if the robot makes coffee for its owner using that owner’s coffee pot, the world stays more or less the same, apart from the owner having her coffee. A bit of water and electricity have been used. A few dishes are dirty and there’s, of course, less coffee, but by and large, the world hasn’t changed very much. Typically, this is a good outcome: The biggest change in the world is the one you wanted, and the others are negligible. The robot couldn’t have acted in a way that result in negative consequences according to human values simply because the robot’s actions didn’t have that many consequences at all.
Contrast this with breaking into the coffee shop and having the police called. This represents a pretty significant change to the state the of the world, and this change was effected by the robot’s behavior. If the robot can see this coming (and if it’s smart enough, it can), then this should at least give it a reason to be suspicious that this might be a bad idea. If the robot doesn’t know if this is bad (relative to not having coffee), it might decide that the risk of it being bad is enough not to try it. Or, alternatively, the robot might ask someone for more information about the choice it’s thinking about making.
Said one way, the robot should strive not change things unless it understands the impact of that change. Said another way, we’d like the robot to have a sense of humility about its own ideas of what is best for the world at large. Sounds like a pretty good idea (and maybe not just for robots).
Provably Beneficial AI
These arguments all seem kind of squishy and non-technical, but the nice thing about them is that they can be formalized in actual concrete mathematical terms. Even better, once formalized, we can use this mathematics to prove that a certain behavior will be beneficial to its owner. That is, given some amount of information about human preferences, our robot can have strong guarantees that its behavior does not run contrary to the owner’s values.
Does this guarantee that all robots will always behave in a way that benefits all people? No. If a nefarious human gives a robot the idea that, say, compliance with the law has negative value, the robot will still act accordingly. Even if a well-meaning human correctly programs their values, there’s no guarantee the robot won’t behave in a way that runs contrary to the values of some other person. The guarantees here are that the robot will not violate preferences that the owner has that the robot has only limited information about, which is a much more difficult, common, and dangerous problem.
Since Russell started promoting this line of thought, a number of critics have pointed out that Russell is a person from a certain country, of a certain race and gender, with a certain religious inclination, and so on. Why should we trust him (or anyone) to put a system of values in place for our robot? It’s a sound argument, but it’s occasionally taken too far when these critics say that because no one can really be trusted, we should avoid Russell’s ideas, presumably to maintain some level of “neutrality” or that “it will do more harm than good to give a robot values”.
It’s important to realize that any software with a goal (which is really to say, all software) is neutral in absolutely no meaningful way. Anything that takes action in the world has an implicit set of values that are absurdly far from anything approaching what we would recognize as “good”. The coffee fetching robot already has values that, as we’ve seen, are completely broken in a very well-defined sense.
The question isn’t whether we should all buy into the value system of the programmer as being the right ones, the question is whether the programmer’s values are better than the ones you’d get otherwise. Most people would probably agree that any human could do better than “by all means, use lethal force to acquire coffee.” The former isn’t perfect, but if you had to choose, you’d probably take it over the latter.
Reasons for Optimism
I was on a panel recently with Jorie Koster-Hale discussing the changes to society that pervasive AI will bring. We were talking about employment, but she had a hopeful perspective that I think applies here as well. She pointed out that there aren’t many strong reasons to believe this change will be much worse than past societal changes, and there are reasons to believe that it will be better.
While powerful AI does have the potential to be a destructive force, people like Dr. Russell are out ahead of it. Neither panic, nor denial, nor sanguine acceptance will produce anything useful. But by being precise about the problems that AI might have and what their solutions might look like, we have a real shot at creating something that will transform our lives for the better.