Tonight I am excited to participate in the SVForum panel on Machine Learning as a Service, my first public event since joining BigML two weeks ago. In addition to touting BigML’s amazing SunBurst visualization for decision trees, I also hope to discuss broader questions about machine learning as a service. To start with, why build such a service?
For anyone who has wrestled with today’s tools, the answer is simple: machine learning is way too hard. You spend most of your time fretting over the fiddly details of your workflow rather than finding insight in your data, and you don’t get the privilege of experiencing such pain without first mastering an array of specialized technical skills. Oh, and nothing’s integrated. Need to validate your model? Cobble that together yourself. Want better visualization? Maybe there’s a package somewhere that does that. Having a bunch of letters after your name doesn’t help, either: maybe you have a better appreciation for how needless all this ML pain is, but you’re still in the same boat as everyone else.
This is why BigML has lovingly crafted the easiest imaginable workflow for model training, evaluation and visualization. You upload your training set, select the target variable, press a button, and get a beautiful interactive visualization of a decision tree. Playing with the tree gives you new ideas about which features are useful, and BigML lets you quickly iterate on new training sets to test those ideas. Once you're happy with your inputs, we make it easy to validate your model by splitting your data into training and test sets. And if you need predictive power for production use, you can train a random forest or bagged ensemble with a single click, then download that multi-tree model as code, again with just one click.
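For readers curious about what a "bagged ensemble" actually does under that single click, here is a minimal sketch. It is not BigML's implementation. It is a hypothetical illustration that uses one-feature threshold "stumps" as a stand-in for full decision trees. Each model is trained on a bootstrap sample (rows drawn with replacement), and predictions are made by majority vote. The names `train_stump` and `bagged_ensemble` are invented for this example.

```python
import random
from collections import Counter


def train_stump(rows):
    """Fit a one-feature threshold stump (a stand-in for a full decision tree).

    rows: list of (x, label) pairs with numeric x.
    Returns a function x -> predicted label.
    """
    best = None
    for t in sorted({x for x, _ in rows}):
        left = [y for x, y in rows if x <= t]
        right = [y for x, y in rows if x > t]
        left_label = Counter(left).most_common(1)[0][0]
        # If nothing falls to the right, reuse the left label.
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, t, left_label, right_label)
    _, t, l, r = best
    return lambda x: l if x <= t else r


def bagged_ensemble(rows, n_models=25, seed=0):
    """Train n_models stumps on bootstrap samples and combine them by majority vote."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Bootstrap sample: same size as the data, drawn with replacement.
        sample = [rng.choice(rows) for _ in rows]
        models.append(train_stump(sample))

    def predict(x):
        votes = Counter(m(x) for m in models)
        return votes.most_common(1)[0][0]

    return predict
```

Usage: `predict = bagged_ensemble([(x, "low" if x < 5 else "high") for x in range(10)])` returns a function, and `predict(0)` yields `"low"`. Averaging many models trained on resampled data is what makes ensembles more robust than a single tree, at the cost of being harder to visualize.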
Any ML task that can be automated is almost certainly boring, so whatever can be automated should be. Once the boring part is automated, what remains is the fun part: understanding your data, testing your ideas, and figuring out how to implement your findings.
All this automation raises another question: is it possible to make machine learning too easy? What happens when anyone can train a model? The answer, I think, is to give people the tools they need to validate their models and understand the limitations of their data. BigML already makes it easy to split data into training and test sets, and provides evaluation reports that score the model on the held-out test set. The next step is to bring cross-validation to the web interface (currently it is only available through the BigML API) and to polish our evaluation reports so they are simple to understand.
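To make the difference between a single train/test split and cross-validation concrete, here is a minimal plain-Python sketch. It does not use the BigML API, and the function names (`train_test_split`, `k_fold_splits`, `cross_validate`) are hypothetical. A holdout split scores the model on one slice of the data. k-fold cross-validation instead rotates which slice is held out, so every row is used for testing exactly once and the averaged score depends less on one lucky or unlucky split. The `fit` argument is assumed to be any function that takes training rows and labels and returns a predictor.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and hold out a fraction of them for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def k_fold_splits(n_rows, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs; each index is tested exactly once."""
    if not 2 <= k <= n_rows:
        raise ValueError("k must be between 2 and the number of rows")
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal, disjoint folds
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def cross_validate(rows, labels, fit, k=5, seed=0):
    """Return mean accuracy over k folds.

    fit: hypothetical training function (rows, labels) -> predict(row).
    """
    scores = []
    for train_idx, test_idx in k_fold_splits(len(rows), k, seed):
        predict = fit([rows[i] for i in train_idx], [labels[i] for i in train_idx])
        correct = sum(predict(rows[i]) == labels[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)
```

With balanced labels, a "model" that always predicts the same class scores about 0.5, which is the kind of baseline an evaluation report should make easy to spot.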
Another way to address this concern: just go back to using old tools for machine learning. There’s nothing easy about that!