In my previous two posts in this series, I’ve essentially argued both sides of the same issue. In the first, I explained why deep learning is not a panacea, when machine learning systems (now and likely always) will fail, and why deep learning in its current state is not immune to these failures.
In the second post, I explained why deep learning, from the perspective of machine learning scientists and engineers, is an important advance: Rather than a single learning algorithm, deep learning gives us a flexible, extensible framework for specifying machine learning algorithms. Many of the algorithms so far expressed in that framework deliver orders-of-magnitude improvements over previous solutions. In addition, it’s a tool that allows us to tackle some problems heretofore unsolvable directly by machine learning methods.
For those of you wanting a clean sound bite about deep learning, I’m afraid you won’t get it from me. The reason I’ve written so much here is that I think the nature of the advance that deep learning has brought to machine learning is complex and defies broad judgments, especially at this fairly early stage in its development. But I think it is worth taking a step back to try to understand which judgments are important and how to make them properly.
Flaky Machines or Lazy People?
This series of posts was motivated in part by my encounters with Gary Marcus’ perspectives on deep learning. At the root of his positions is the notion that deep learning (and here he means “statistical machine learning”) is, in various ways, “not enough”. In his Medium post, it’s “not enough” for general intelligence, and in the Synced interview it’s “not enough” to be “reliable”.
This notion of whether current machine learning systems are “good enough” gets to the heart of the back and forth on deep learning. Marcus cites driverless cars as an example of how AI isn’t yet mature enough to be relied on 100%, and argues that AI needs a “foundational change” to ensure a safe level of reliability. There’s a bit of ambiguity in the interview about what he means by AI, but my own impression is that this is less a critique of machine learning and more a critique of the software around it.
For example, we have vision systems able to track and identify pedestrians on the road. These systems, as Marcus says, are mostly reliable but certainly make occasional mistakes. The job of academic and corporate researchers is to create these systems and make them as error-free as possible, but in the long run, they will always have some degree of unreliability.
Something consumes the predictions of these vision systems and acts accordingly; it is and always will be the job of that thing to avoid treating these predictions as the unvarnished truth. If the predictions were guaranteed to be correct, the consumer’s job would be much easier. As it is, consuming the predictions of a vision system requires some level of cleverness and skepticism. Maybe that cleverness involves awareness of separate sensor systems or other information streams like location and time of day. It might require symbolic approaches of the type Marcus favors. It might require more and very different deep learning, as Yann LeCun suggests. It might require something that’s entirely new.
Designing software that works properly with machine-learned models is hard. You have to do the difficult work of characterizing the model’s weaknesses and engineering around them. But critical readers should reject the notion that machine learning needs to provide extreme reliability on its own in order to be useful in mission-critical situations. If a vision system can accurately find and track 95% of pedestrians, and other sensors and logic pick up the remaining 5%, you’ve arrived at “enough” without having a perfect model.
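To make that concrete, here’s a minimal sketch of what “consuming predictions with skepticism” might look like. It’s purely illustrative: the names (Detection, lidar_hits_near, is_low_visibility), thresholds, and rules are all hypothetical, not anyone’s real driving stack.

```python
# A toy sketch of a consumer that treats a pedestrian detector's output as
# evidence rather than ground truth. All names and thresholds are hypothetical.
from dataclasses import dataclass

@dataclass
class Detection:
    confidence: float  # the model's confidence that a pedestrian is present
    bearing: float     # direction of the detection, in radians

def should_brake(detections, lidar_hits_near, is_low_visibility):
    """Decide whether to brake by combining the vision model with other signals."""
    # Trust the model less in conditions known to degrade it (night, rain, glare).
    threshold = 0.5 if is_low_visibility else 0.8

    if any(det.confidence >= threshold for det in detections):
        return True  # the model alone is convincing enough

    # The model saw nothing convincing, but an independent sensor did:
    # err on the side of caution rather than treating the model as truth.
    if lidar_hits_near and is_low_visibility:
        return True

    return False

# A weak detection plus an independent lidar return in low visibility
# still triggers braking, even though the model alone wasn't sure.
print(should_brake([Detection(confidence=0.4, bearing=0.1)],
                   lidar_hits_near=True, is_low_visibility=True))
```

The point isn’t the particular thresholds or sensors; it’s that the model’s output is one input among several, weighed by logic that knows the model can be wrong.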
When is “Enough” Enough?
So then the question becomes, “are we there yet?” with current ML systems. That depends, of course, on how good we think we need them to be for the engineers and domain experts to pull their outputs across the finish line. There are a lot of areas in which deep learning puts us within shouting distance, but in general, whether or not we’re there yet depends in turn on what you want the system to do and the quality of your engineers. When thinking about that question, though, it’s important to consider that the finish line might not be exactly where you think it is.
Consider the problem of machine translation. Douglas Hofstadter wrote a great article where he systematically lays bare the flaws in state-of-the-art machine translation systems. He’s right: For linguistic ideas with even a little complexity, they’re not great and are at times totally unusable. But the whole article reminded me of a blog post Hal Daumé III wrote more than 10 years ago, when he and I were both recent Ph.D.s. In it, he wonders how much of human translation is better than computer translation when you really consider everything (street signs, menus, simple interpersonal interactions, and so on). Again, he asked this more than ten years ago.
The point here is that if machine translation for these things is noticeably better than the second-rate human translations we apply in practice (or was, ten years ago), then there’s already a sense in which the models we have are very much good enough. How machine translation handles more complex phrases and ideas is an interesting question, and might yield new research directions, but it’s all academic as far as applicability is concerned. The existing technology, imperfect as it is, has a use and a place in society.
Even less relevant is how “deep” the model’s knowledge is, or how “stupid” it is, or whether the algorithm is “actually learning” (whatever that means). These are all flavors of the “computers don’t really understand what they’re doing” argument that traces its way through Hofstadter, John Searle, Alan Turing and dozens of other philosophers all the way back to Ada Lovelace. There are loads of counter-arguments (I have even spun out a few of my own versions), but maybe the most compelling reason to ignore these questions is that the answers are often less interesting than the answer to the question, “Can we use it?”
A number of years ago, my wife and I hosted two members of a Belgian boys choir that was on tour. Neither she nor I spoke any French, so we relied on Google Translate to communicate with them. To this day, I remember typing “We made a pie. Would you like some?” into my phone and watching their faces light up as the translation appeared. Did the computer understand anything about pie, or generosity, or the happiness of children, or how its own flawed translations could help create indelible memories? Probably not. But we did!
The Final Exam
The criticism that machine learning is not enough on its own to produce systems that exhibit reliably intelligent behavior is a broken criticism. Deep learning gets us part of the way towards such systems, perhaps quite a lot of the way, but does anyone think it’s necessary or even advisable to cede the entire behavior of, say, a car to a machine-learned model? Saying no doesn’t mean backing away from a fully-autonomous car; as Marcus himself points out, there are other techniques in AI and software at large that are better suited to certain aspects of these problems. There can be many layers of human-comprehensible logic sitting between deep learning and the gas pedal, and it’s likely the totality of the system, rather than the learned component alone, that will display behavior that we might recognize as intelligent.
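As a sketch of what one of those “layers of human-comprehensible logic” might look like (again, entirely made-up names and rules, not a real control system), consider a guard layer where the learned model proposes and simple rules dispose:

```python
# A toy guard layer: the learned model suggests a throttle value, and plain,
# auditable rules decide what actually happens. Names and rules are hypothetical.

def clamp(value, low, high):
    return max(low, min(high, value))

def choose_throttle(model_throttle, speed_mph, speed_limit_mph, obstacle_ahead):
    """Apply human-readable constraints to a learned controller's suggestion."""
    # Rule 1: never accelerate toward a detected obstacle.
    if obstacle_ahead:
        return 0.0

    # Rule 2: don't let the model push past the posted limit.
    if speed_mph >= speed_limit_mph:
        return 0.0

    # Rule 3: keep the command within physically sensible bounds.
    return clamp(model_throttle, 0.0, 1.0)

# The model suggests heavy throttle, but the car is already over the limit,
# so the guard layer zeroes it out.
print(choose_throttle(model_throttle=0.9, speed_mph=66,
                      speed_limit_mph=65, obstacle_ahead=False))
```

Each rule is something a human can read, test, and defend; the learned component supplies a suggestion, not the final word.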
Is it a flaw or a problem with deep learning when it can’t solve the aspects of these problems that no one really wants or needs solved? I don’t think so. Again, paraphrasing Marcus (and myself), machine learning is a tool. If you buy a nail gun and it jams, then yeah, that’s a problem with the nail gun, but if you try to use a nail gun to cut a piece of wood in half, that’s more of a problem with you. Deep learning is a very important step forward in the evolution of the tool (and a large one compared to other recent steps), but that step doesn’t change its fundamental nature. No matter what improvements you make, a nail gun is never going to become a table saw. Certainly, it’s unethical and bad business for tool manufacturers to make inflated claims about their tools’ usefulness, but it’s ultimately the job of the operator to determine which tool to use and how to use it.
Pundits can argue all day long about how impactful deep learning is and how smart machine learning can possibly be, but none of those arguments will matter in the long run. As I’ve said before, the only real test of the usefulness of machine learning is if domain experts and data engineers can leverage it to create software that has value for other human beings. Therein lies the power, the only real power, of new technology and the only goal that counts.