I have been fortunate enough to have a number of conversations with Dr. Kiri Wagstaff of NASA’s JPL (you might as well get the jokes about “not having to be a rocket scientist to understand machine learning” out of the way right now).
Wagstaff is a brilliant scientist. On top of that, and fortunately for all of us, she works very “close to the data”, using her machine learning expertise to solve important problems that directly impact people other than machine learning researchers. This closeness to the data is somewhat rare among machine learning experts but is becoming more and more common. Our own chief scientist Tom Dietterich is a pioneer in computational sustainability and ecosystem informatics. Another acquaintance of mine, Rayid Ghani, left a lucrative position at Accenture Research to head up President Obama’s vaunted data analytics team.
At the 2012 ICML conference, Wagstaff presented an excellent little position paper calling for “Machine Learning that Matters”. In it, she points out (with little argument from the gallery) that machine learning papers typically proceed as follows: find a dataset for a problem that is only partially solved, invent or improve an algorithm so that it beats the previous result on some standard metric for that dataset, write it up, and publish. Repeat until tenured.
Wagstaff notes in her paper that, while this approach produces a substantial number of computer science professors, it isn’t nearly as successful at getting machine learning algorithms in a position to make a measurable difference in society at large. At fault, she says, are both the data we use to test algorithms and the metrics we use to evaluate our tests. Many of the datasets we use in the scientific literature in machine learning have been around for years or even decades. Producing a small performance improvement on the vast majority of these datasets will have little effect outside of machine learning conferences. And what do we even mean when we say “small performance improvement”? If the AUC of my classifier on some dataset is 0.85 and that of yours is 0.81, what does that say about the consequences of using my classifier versus yours in the context of that data? The answer, of course, is totally dependent on the data. And if those consequences are meaningless or unnoticeable, then we’re well on our way to producing technology useful to absolutely no one.
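To make the metric question above concrete, here is a minimal sketch of how a standard score and a real-world consequence can disagree. It is not taken from Wagstaff’s paper. The confusion-matrix counts and the per-error costs are hypothetical numbers invented for illustration, and plain accuracy stands in for AUC because it can be computed from counts alone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcomes:
    """Confusion-matrix counts for one classifier at one decision threshold."""
    true_pos: int
    false_pos: int
    true_neg: int
    false_neg: int


def accuracy(o: Outcomes) -> float:
    """Fraction of all cases classified correctly (the 'standard metric')."""
    total = o.true_pos + o.false_pos + o.true_neg + o.false_neg
    return (o.true_pos + o.true_neg) / total


def total_cost(o: Outcomes, cost_false_pos: float, cost_false_neg: float) -> float:
    """Domain cost of the errors, in whatever unit the application cares about."""
    return o.false_pos * cost_false_pos + o.false_neg * cost_false_neg


# Hypothetical results for two classifiers on the same 1000 cases.
mine = Outcomes(true_pos=80, false_pos=50, true_neg=850, false_neg=20)
yours = Outcomes(true_pos=95, false_pos=120, true_neg=780, false_neg=5)

# Hypothetical cost settings: (cost of a false alarm, cost of a miss).
for fp_cost, fn_cost in [(1, 1), (1, 10)]:
    print(
        f"fp={fp_cost} fn={fn_cost}: "
        f"mine costs {total_cost(mine, fp_cost, fn_cost)}, "
        f"yours costs {total_cost(yours, fp_cost, fn_cost)}"
    )
```

With these made-up numbers, my classifier has the higher accuracy, and it is also cheaper when both kinds of error cost the same. Once a miss is assumed to cost ten times as much as a false alarm, yours becomes the cheaper one. Which classifier is “better” depends on costs that only someone close to the data can supply.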
She then goes on to list some of the challenges to socially relevant machine learning, and calls for a renewed focus on real-world data in the machine learning literature. She also calls for metrics that reflect the real-world impact of the technology (think, instead of “F-Measure”, of something like “gallons of water sanitized” or “doses of vaccine distributed”). Finally, she provides a very nice list of challenge problems for machine learning researchers, which we’ll get to later. Definitely read the whole paper, which is very accessible to non-experts, to get the full-fat version of the argument. For now, though, I’m going to play devil’s advocate a bit with Wagstaff’s excellent work. I’m not doing this to be argumentative, as I agree with almost everything she is saying. Rather, I’m going to use it as a jumping-off point to maybe inspire some of the work she’s hoping will happen.
Sympathy for the Devil
The conundrum that Wagstaff points out is possibly as old as science itself. Dijkstra advises scientists to strive in their work for both “scientific soundness” and “social relevance”. Science, especially computer science, occupies a shadowy space between mathematics and engineering. Like mathematicians, we would like our work to be abstract and general, to say something fundamental about optimization or, even better, about intelligence. Like engineers, however, we would also like to solve a specific problem in the world. To be able to point to a cured patient, or an organized news feed, or a translated document and say “my algorithm did that”.
And therein lies the problem: The more abstract and general a result is, the less it says about any one specific problem. Machine learning scientists, both writers and reviewers, faced with the choice, tend towards the abstract. Part of the reason, I suspect, is that it’s easier to retain some degree of scientific objectivity when dealing with the abstract. Suppose a paper proposes a classifier system that lowers the total cost of cancer screenings by half and only raises the rate of undetected cancers by 1%. One suspects that reviewers who have lost loved ones to cancer may feel differently about this than ones who have not. Another part of the reason is a healthy suspicion of solutions that don’t generalize past a single case, as such niche solutions are an important step on the road to charlatanry.
Nonetheless, science can and does impact the real world in very specific ways. The frustrating part about machine learning in particular is that we are so tantalizingly close to having our cake and eating it, too. A century ago, Einstein labored at relativity with (probably) no thought to GPS satellites, so he could not have appreciated the full effect his work would have on society. Scientists are accustomed to the idea that our work might not be useful to society for years after it occurs, and so there’s no stigma attached to producing work that is scientifically sound but not (yet) socially relevant. But machine learning is so directly applicable to so many problems that at times it seems downright easy to find a landing zone for these algorithms. Yet, the traditional scientific mindset pulls us back to the abstract. Even though the gap between theory and practice in machine learning seems nearly bridged, it looks like we still need a little help.
Fixing a Hole
In Wagstaff’s paper, she lays out a program of challenges for machine learning researchers interested in making a difference in the real world:
- A law passed or legal decision made that relies on the result of an ML analysis.
- $100M saved through improved decision making provided by an ML system.
- A conflict between nations averted through high quality translation provided by an ML system.
- A 50% reduction in cybersecurity break-ins through ML defenses.
- A human life saved through a diagnosis or intervention recommended by an ML system.
- Improvement of 10% in one country’s Human Development Index (HDI) attributable to an ML system.
This is an excellent set of challenges, and I would guess that few or none will be solved by traditional machine learning scientists (with the exception of the fifth, which I think may have already been met depending on how one defines “saved”). This isn’t because the problems themselves are terribly hard from a machine learning perspective (though they might be), but simply because we don’t have many machine learning researchers at high levels of government, or writing laws, or responsible for network security. Machine learning researchers have spent their careers studying machine learning, in all of its arcane mathematical glory. To think they might also know enough about law to apply it in that context is asking a lot. Who will come to their aid?
At BigML, we’ve committed ourselves to putting machine learning technology into the hands of people who can meet the challenges that Wagstaff has put forth. You have the data and the expertise to know which problems in your field of interest can be solved by machine learning; we have the machine learning expertise to provide you with that solution. I don’t think we need machine learning gurus to tackle Wagstaff’s problems; I think we need gurus from other fields who know enough about machine learning to use it. Wagstaff alludes to this in her paper. If we can get usable, flexible, dependable machine learning software into the hands of domain experts, benefits to society are bound to follow.
If I Had a Hammer
I often tell people that machine learning is a hammer. Machine learning scientists are busy every year, every day, improving that hammer, making it more durable, harder, improving the shape, and on and on. But as Wagstaff points out, that hammer is no good to anyone until someone starts hitting nails with it. BigML is giving the world a chance to swing that hammer, whether you’re an expert in machine learning or anything else.
This is an excellent, well written post. I completely agree about the tension between solutions that are generalizable-yet-abstract and specific-yet-limited-in-scope.
I also particularly like this point:
“This isn’t because the problems themselves are terribly hard from a machine learning perspective (though they might be), but simply because we don’t have many machine learning researchers at high levels of government, or writing laws, or responsible for network security.”
And I applaud you for working to give people ML tools so that problems outside of the benchmark world can be tackled. I also see a lot of power in collaborations between ML experts and people in other domains, for problems where the nuances of data set representation, algorithm choice, robustness to noise, and principled evaluation methodology call for more ML expertise (areas we might refer to as the “art” of ML since at this point we don’t have general, abstract solutions for some of these design choices).