As you, our faithful readers, know, we compared some machine learning services several months ago in our machine learning throwdown. In another recent blog post, we talked about the power of ensembles, and how your BigML models can be combined into an even more powerful classifier when many of them are learned over samples of the data. With this in mind, we decided to re-run the performance tests from the fourth throwdown post using BigML ensembles as well as single BigML models.
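In case you missed that post, the trick behind these ensembles is bagging: train many trees, each on a random sample of the training data, and let them vote on predictions. Here is a minimal, illustrative sketch of the idea using scikit-learn; the dataset, number of models, and sample sizes are arbitrary choices for the example, and this is of course not BigML's own implementation:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Toy bagging sketch: train several trees, each on a bootstrap sample
# of the training data, then combine their predictions by majority vote.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rng = np.random.RandomState(0)
trees = []
for _ in range(10):
    # Draw a sample of the training set with replacement.
    idx = rng.randint(0, len(X_train), len(X_train))
    trees.append(DecisionTreeClassifier().fit(X_train[idx], y_train[idx]))

# Each tree votes; the ensemble predicts the most common class.
votes = np.array([tree.predict(X_test) for tree in trees])
majority = np.array([np.bincount(votes[:, i]).argmax() for i in range(votes.shape[1])])
print("ensemble accuracy:", (majority == y_test).mean())
```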
You can see the results in an updated version of the throwdown details file. As you’ll see, the ensembles of classifiers (Bagged BigML Classification/Regression Trees) almost always outperform their solo counterparts. In addition, if we update our “medal count” table tracking the competition among the three machine learning services, we see that the BigML ensembles now lead in the number of “wins” across all datasets:
| Contender | Gold | Silver | Bronze | Total |
|---|---|---|---|---|
| BigML (with Bagging) | 12 | 11 | 5 | 28 |
| Google Prediction API | 10 | 13 | 5 | 28 |
| | 6 | 4 | 11 | 21 |
Are we saying this just to raise our self-esteem by bringing down others? Yes, absolutely. In your face, Google Prediction API! On an admittedly limited sample of datasets from a wide variety of domains, ensembles of trees tend to outperform all other off-the-shelf classifiers. Oh, you don’t believe us? Well then, why don’t you go and ask SCIENCE?
As we’ve said before, the differences between classifiers are fairly small, and performance alone probably shouldn’t drive your decision to use one service or another. Perhaps more important than raw performance is the fact that BigML gives you the best of both worlds: a fully white-box, downloadable model that is easy to interpret (and even beautiful), with the power to kick things into high gear when you need maximum performance.
Try creating some ensembles on your data with the BigMLer command line interface. The power of ensembles is a single command away!
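For instance, assuming a local CSV of training data, an invocation along these lines should build a bagged ensemble and score a test file (the option names below come from the BigMLer documentation, but check `bigmler --help` in case they have changed in your installed version):

```bash
# Train 10 models, each on an 80% sample drawn with replacement,
# and use their combined votes to predict the test file.
bigmler --train data/iris.csv --test data/test.csv \
        --number-of-models 10 --sample-rate 0.8 --replacement
```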