At BigML we believe that over the next few years automated, data-driven decisions and data-driven applications are going to change the world. In fact, we think it will be the biggest shift in business efficiency since the dawn of the office calculator, when individuals had “Computer” listed as the title on their business card. We want to help people rapidly and easily create predictive models using their datasets, no matter what size they are. Our easy-to-use, public API is a great step in that direction, but a few bindings for popular languages are obviously a big bonus.
Thus, we are very happy to announce an open source Python binding to BigML.io, the BigML REST API. You can find it and fork it on GitHub.
The BigML Python module makes it extremely easy to programmatically manage BigML sources, datasets, models and predictions. The snippet below sketches how you can create a source, dataset, model and then a prediction for a new object.
    from bigml.api import BigML
    api = BigML()
    source = api.create_source('yourdata.csv')
    dataset = api.create_dataset(source)
    model = api.create_model(dataset)
    prediction = api.create_prediction(model, new_object)
Just like magic!
We have tried to build a very simple binding that wraps all the HTTP requests and responses to BigML.io within one class. Over the next few weeks we’ll see how to add more layers of abstraction so that you can have different ways to exploit all the information provided by datasets, models and predictions. You can see a few more examples on the GitHub page.
Getting back to the example above, imagine all the steps you would need in order to create a predictive model using another ML or statistical package. There are several dedicated machine learning libraries for Python, such as PyML, PyBrain, and Orange, to name a few. Of course there are also the fabulous SciPy and NumPy libraries. They are great tools that can be the perfect complement or supplement to your BigML application. But using them is still non-trivial, and one needs to pay attention to lots of nitty-gritty details to model any realistic problem.
On the other hand, there are a few advantages to a cloud-based machine learning service like BigML that you need to bear in mind:
- You do not need to spend resources and time engineering complex machine learning or distributed algorithms on your own.
- All the computation is performed remotely and asynchronously by BigML so you have access to unlimited computing resources without the need to purchase or maintain machines or software packages.
- You can minimize distraction and focus on understanding and evaluating the insights locked in your data.
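To make the "remotely and asynchronously" point concrete, here is a minimal sketch of what client code for an asynchronous service typically does: it asks for the status of a job every so often until the job is done. The function name `wait_until_done`, the `fetch_status` callable and the status strings are all hypothetical placeholders for illustration, not part of the BigML binding or of BigML.io.

```python
import time


def wait_until_done(fetch_status, interval=2.0, max_attempts=30, sleep=time.sleep):
    """Poll fetch_status() until it returns a terminal status.

    Returns the number of polls it took. The status strings are
    placeholders for illustration, not BigML's actual status codes.
    """
    for attempt in range(1, max_attempts + 1):
        status = fetch_status()
        if status == "finished":
            return attempt
        if status == "failed":
            raise RuntimeError("remote job failed")
        sleep(interval)  # the server is still working; wait before asking again
    raise TimeoutError("gave up after %d polls" % max_attempts)


# Simulated remote job: reports progress twice, then finishes.
_fake_statuses = iter(["queued", "in-progress", "finished"])
polls = wait_until_done(lambda: next(_fake_statuses), sleep=lambda seconds: None)
print(polls)
```

In a real application `fetch_status` would be a small function that asks the service for the current state of a resource. The heavy lifting happens on the remote side, and your code only waits.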
There are also two key specific advantages to building BigML predictive models:
- Beautiful Models: See our previous post about it. A few days ago Ajay Ohri wrote about BigML:
The website is very intuitively designed – You can create a dataset from an uploaded file in one click and you can create a Decision Tree model in one click as well. I wish other cloud computing websites like Google Prediction API make design so intuitive and easy to understand.
So when you use the BigML API or the Python binding to create new models programmatically, you can later access them through the BigML interface, where you can visualize and explore them in your dashboard.
- White-Boxed Models: BigML predictive models are fully white-boxed. As Ajay Ohri mentioned:
Also unlike Google Prediction API, the models are not black box models, but have a description which can be understood
In other words, your model is always an API call away, allowing you to download it locally and deploy it however you choose.
When we first started BigML we spent some time making our models exportable to PMML. However, we soon saw that creating a light-weight version in JSON was the way to go. Not only are the models smaller, simpler and easier to read, but they are also directly translatable to PMML.
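To give a feel for why a JSON tree is easy to deploy locally, here is a small sketch that walks a decision tree stored as JSON and returns a prediction. The tree layout (`predicate`, `children`, `output`), the field names and the thresholds are invented for this illustration and should not be read as the actual BigML model schema.

```python
import json
import operator

# Comparison operators a split may use in this toy format.
OPERATORS = {
    "<=": operator.le,
    ">": operator.gt,
    "=": operator.eq,
}

# A tiny hand-written tree in a hypothetical JSON layout.
MODEL_JSON = """
{
  "output": "setosa",
  "children": [
    {"predicate": {"field": "petal length", "operator": "<=", "value": 2.45},
     "output": "setosa"},
    {"predicate": {"field": "petal length", "operator": ">", "value": 2.45},
     "output": "versicolor",
     "children": [
       {"predicate": {"field": "petal width", "operator": "<=", "value": 1.75},
        "output": "versicolor"},
       {"predicate": {"field": "petal width", "operator": ">", "value": 1.75},
        "output": "virginica"}
     ]}
  ]
}
"""


def predict(node, row):
    """Follow the first child whose predicate matches the row.

    When no child matches (a leaf, or a missing field), the output
    stored at the current node is returned.
    """
    for child in node.get("children", []):
        predicate = child["predicate"]
        field = predicate["field"]
        test = OPERATORS[predicate["operator"]]
        if field in row and test(row[field], predicate["value"]):
            return predict(child, row)
    return node["output"]


tree = json.loads(MODEL_JSON)
print(predict(tree, {"petal length": 1.4}))
print(predict(tree, {"petal length": 5.0, "petal width": 2.0}))
```

Because the whole model is plain data, a predictor like this needs nothing beyond the standard library, which is the practical meaning of a white-box model.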
Finally, we didn’t want to finish this post without giving you a sneak peek of things that are coming soon:
- Remote URLs and S3 buckets as sources for your models.
- Increasing the size limit for individual sources to 64GB (and counting!).
- Accepting new formats like Microsoft Excel and Apple Numbers files.
- Open source bindings for R and Clojure. We are also excited to know that some folks in the community are already working on bindings for Ruby, .NET and PHP. If you’re working on a binding for your favorite language we’d love to hear about it!
We are doing our best to make “machine learning for everyone” a reality. Now it’s time to unleash your inner hacker and show the world what machine learning can do in your application. If you don’t have a BigML account yet or need more credits, just send an email to firstname.lastname@example.org and we’ll be happy to move you to the top of our invite list.
Cool software. But is the actual code for BigML open source? I would like to run this on my own server.
What we have open sourced are the Python bindings that connect to our public API. You can use them to easily connect to our servers. Even if your data is relatively large, you do not need to worry about installing software, configuring machines, etc.; you can just focus on analyzing and understanding your predictive models. We are working on extending both the ways you can share data with our servers and the number of bindings and tools that you can use to programmatically interact with BigML.
Cheers, The BigML Team
Well, isn’t that a clever way of getting numerous datasets for free.