Hi, I’m Nick the intern. The fine folks at BigML brought me on board for the summer to drink their coffee, eat their snacks, and compare their service to similar offerings from other companies. I have a fair amount of software engineering experience but limited machine learning skills beyond some introductory classes. Prior to beginning this internship, I had no experience with the services I am going to talk about. Since BigML aims to make machine learning easy for non-experts like myself, I believe I am in a great position to provide feedback on these types of services. But please, take what I say with a grain of salt. I’ll try to stay impartial, but it’s not easy when BigML keeps dumping piles of money and BigML credits on my doorstep to ensure a favorable outcome.
From my time at BigML, it has become clear that everyone here is a big believer in the power of machine learning to extract value from data and build intelligent systems. Unfortunately, machine learning has traditionally had a high barrier to entry. The BigML team is working hard to change this; they want anyone to be able to gain valuable insights and predictive power from their data.
It turns out BigML is not the only player in this game. How does it stack up against the competition? This is the first in a series of blog posts where I compare BigML to a few other services offering machine learning capabilities. These services vary in several ways, including the level of expertise required, the types of models that can be created, and the ease with which they can be integrated into your business.
- BigML is a startup that aims to make machine learning available to anyone, whether or not they know how to program. Their predictive models can be created and explored with the web interface or API. You can request an invitation to the beta on their website.
- Google Prediction API provides machine learning tools on top of Google’s cloud infrastructure. Among the cloud-based services, this one has been around the longest; it was first released in May 2010.
- Prior Knowledge helps you make sense of your data and build intelligent applications. Their Veritable API provides access to a predictive database and is currently in public beta.
- Weka is a powerful application and suite of algorithms for experienced machine learning practitioners. It is a standalone open source (GPL) Java application that you install and run on your own computer.
The first three are all cloud-based services that were chosen because they have an API and don’t require software to be installed. Other services that didn’t make the cut include Precog (machine learning is “coming soon”) and Skytree Server (requires running your own servers). Weka, the traditional desktop application, is included in the comparison to establish a baseline for performance. Other similar applications not included in this comparison include Apache Mahout, RapidMiner, and Orange, as well as various machine learning libraries for Matlab, R, Java, Python, and other programming languages.
In upcoming blog posts, I will compare the chosen services using the following criteria:
- Data Preparation – Formatting your data and loading it into the service
- Model Creation and Usage – Ease of creation/use, visualization, robustness to data problems, and integration
- Prediction Creation – Ease of creation, accuracy, and integration
- Miscellaneous – Cost, support, documentation, etc.
Stay tuned to see how BigML is doing compared to its competition.
(Note: As of Dec 5, 2012, Prior Knowledge no longer supports its public API.)
I look forward to your observations, but I would also like to see you include RapidMiner, if you can find the time.
This will probably be a good series to follow. What machine learning options are available?
Nick, that's a great blog for starters.
Looking forward to you sharing a post on the programming languages used in BigML's core infrastructure development (I am not talking about language bindings), and on your experience with libraries, scalability, and performance.