Machine Learning for the Masses
Machine learning is a tool, and like any useful tool, it has the potential to make the world better for experts and non-experts alike. This is especially true for businesses of all stripes. Whether you work with wonderful wines, worrisome weather, or worn out Winnebagos, BigML can help you understand and predict the important aspects of your business.
So how can machine learning work for you? Maybe Wendy can help you find out.
R you ready for Big Machine Learning?
Recently, we released Python bindings for our API. We received fantastic feedback on the related blog post from Hacker News and Twitter, so we started thinking about other languages that could benefit from a tighter integration with the BigML API.
Today BigML releases the bigml package for R. R is already well known for its capabilities in statistics and data analysis, and we use it internally for a number of different day-to-day tasks. The bigml package enables the R community to easily take advantage of our highly scalable cloud-based machine learning infrastructure, while using familiar R data structures and workflows.
The package is available through CRAN here.
To use it, simply install it from CRAN as usual, and load it via the library command. You will also need to set your credentials, either by running the setCredentials command or by setting BIGML_USERNAME and BIGML_API_KEY in your .Renviron file.
install.packages("bigml")
library(bigml)
setCredentials("username", "api_key")
If this is the first time you’ve worked with cURL, it may be necessary to update your SSL certificates. You can do that from within R by:
library(RCurl)
download.file(url="http://curl.haxx.se/ca/cacert.pem", destfile="cacert.pem");
curl <- getCurlHandle();
options(RCurlOptions = list(capath = system.file("CurlSSL", "cacert.pem",package = "RCurl"), ssl.verifypeer = FALSE));
curlSetOpt(.opts = list(proxy = 'proxyserver:port'), curl = curl);
R’s dataframes are very similar to BigML datasets. The bigml package lets you directly specify a model from an existing R dataframe.
So, let’s create a model from the iris data set that comes with R, and make a prediction with it.
iris.model = quickModel(iris, objective_field = 'Species')
quickPrediction(iris.model, c(Sepal.Length=2, Petal.Length=3, Petal.Width=2))
# [1] "virginica"
That’s it!
The bigml package will manage the creation of appropriate sources, datasets, and models behind the scenes. It will also let you use the existing names in the original dataframe to specify input or objective fields.
The “quick” methods are geared towards working with native R resources, and do a large number of things automatically. If you prefer more manual control of the operations, basic methods are also provided:
write.csv(iris, "~/iris.csv", row.names=FALSE)
iris.source = createSource("~/iris.csv", 'iris')
iris.dataset = createDataset(iris.source$resource)
iris.model = createModel(iris.dataset$resource)
After you create models, datasets, and other resources, you can easily retrieve your list of resources with the package as well. The bigml package reconfigures the API results into simple R dataframes in order to present well-structured records of every resource you have created at BigML:
listSources()
# e.g.
#   name    updated                     number_of_datasets  etc...
#1  iris    2012-05-08T20:58:01.916000  1                   ...
#2  test    2012-05-08T20:40:28.846000  1                   ...
#3  census  2012-05-08T20:34:53.398000  1                   ...
listDatasets()
listModels()
There are many more options available in the package, so be sure to check out the documentation for the individual functions, or the package itself. It’s available here and within the package itself:
?bigml
?quickModel
?listModels
We’ve also made the package source code available at the bigml-r repo on GitHub, in case you’re interested in following the progress, or making suggestions.
At BigML we believe that over the next few years automated, data-driven decisions and data-driven applications are going to change the world. In fact, we think it will be the biggest shift in business efficiency since the dawn of the office calculator, when individuals had “Computer” listed as the title on their business card. We want to help people rapidly and easily create predictive models using their datasets, no matter what size they are. Our easy-to-use, public API is a great step in that direction, but a few bindings for popular languages are obviously a big bonus.
Thus, we are very happy to announce an open source Python binding to BigML.io, the BigML REST API. You can find it and fork it at GitHub.
The BigML Python module makes it extremely easy to programmatically manage BigML sources, datasets, models and predictions. The snippet below sketches how you can create a source, dataset, model and then a prediction for a new object.
from bigml.api import BigML
api = BigML()
source = api.create_source('yourdata.csv')
dataset = api.create_dataset(source)
model = api.create_model(dataset)
prediction = api.create_prediction(model, new_object)
Just like magic!
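The new_object in the snippet above is left undefined. A minimal sketch of how such an input might be assembled, assuming the binding accepts a plain dict keyed by field name. The helper build_input and the field names below are hypothetical and used only for illustration:

```python
def build_input(**fields):
    """Return a dict of the non-missing field values for one prediction.

    Hypothetical helper: fields whose value is None are dropped so that
    only known measurements are passed along.
    """
    return {name: value for name, value in fields.items() if value is not None}


# Illustrative field names, not taken from any real dataset schema.
new_object = build_input(sepal_length=5.0, sepal_width=None, petal_length=3.0)
print(new_object)

# With credentials configured, the dict would be handed to the call shown above:
# prediction = api.create_prediction(model, new_object)
```

Keeping the input as an ordinary dict means it can be built from a form, a CSV row, or a database record without any BigML-specific types.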
We have tried to build a very simple binding just wrapping all the HTTP requests and responses to BigML.io within one class. Over the next few weeks we’ll see how to add more layers of abstraction so that you can have different ways to exploit all the information provided by datasets, models and predictions. You can see a few more examples on the GitHub page.
Getting back to the example above, imagine all the steps you would need in order to create a predictive model using another ML or statistical package. There are several specific Machine Learning libraries for Python. For example, PyML, PyBrain, or Orange to name a few. Of course there are also the fabulous SciPy and NumPy libraries. They are great tools that can be the perfect complement or supplement to your BigML application. But using them is still non-trivial, and one needs to pay attention to lots of nitty-gritty details to model any realistic problem.
On the other hand, there are a few advantages to a cloud-based machine learning service like BigML that you need to bear in mind:
- You do not need to spend resources and time engineering complex machine learning or distributed algorithms on your own.
- All the computation is performed remotely and asynchronously by BigML so you have access to unlimited computing resources without the need to purchase or maintain machines or software packages.
- You can minimize distraction and focus on understanding and evaluating the insights locked in your data.
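Because the computation runs remotely and asynchronously, client code usually has to wait for a resource to become ready. One common way to do that is polling, sketched below. The status strings "finished" and "faulty" and the fetch_status callable are assumptions made for this example, not BigML's actual status codes or API:

```python
import time


def wait_until_finished(fetch_status, interval=1.0, max_attempts=30, sleep=time.sleep):
    """Poll fetch_status() until it reports a terminal state.

    fetch_status is any zero-argument callable returning a status string.
    "finished" and "faulty" are assumed names for the terminal states.
    """
    for _ in range(max_attempts):
        status = fetch_status()
        if status in ("finished", "faulty"):
            return status
        sleep(interval)
    raise TimeoutError("resource not ready after %d attempts" % max_attempts)


# Stand-in for a remote call: reports "finished" on the third check.
_responses = iter(["queued", "in-progress", "finished"])
result = wait_until_finished(lambda: next(_responses), sleep=lambda seconds: None)
print(result)
```

Passing the sleep function as a parameter keeps the loop easy to test without real delays.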
There are also two key specific advantages to building BigML predictive models:
- Beautiful Models: See our former post about it. A few days ago Ajay Ohri wrote about BigML:
The website is very intuitively designed – You can create a dataset from an uploaded file in one click and you can create a Decision Tree model in one click as well. I wish other cloud computing websites like Google Prediction API make design so intuitive and easy to understand.
So when you use the BigML API or the Python binding to create new models programmatically, you can later access them through the BigML interface, where you can visualize and explore the models in your dashboard.
- White-Boxed Models: BigML predictive models are fully white-boxed. As Ajay Ohri mentioned:
Also unlike Google Prediction API, the models are not black box models, but have a description which can be understood
In other words, your model is always an API call away, allowing you to download it locally and deploy it however you choose.
When we first started BigML we spent some time making our models exportable to PMML. However, we soon saw that creating a light-weight version in JSON was the way to go. Not only are the models smaller, simpler and easier to read, but they are also directly translatable to PMML.
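To illustrate what deploying a downloaded white-box model locally could look like, here is a minimal sketch. The JSON layout, field names, and thresholds are invented for illustration and are not BigML's actual model format:

```python
import json

# Hypothetical JSON layout for a small decision tree; BigML's real model
# format is not reproduced here, and the thresholds are made up.
MODEL_JSON = """
{
  "field": "petal_length", "threshold": 2.5,
  "left": {"output": "setosa"},
  "right": {
    "field": "petal_width", "threshold": 1.7,
    "left": {"output": "versicolor"},
    "right": {"output": "virginica"}
  }
}
"""


def predict(node, row):
    """Walk the tree: go left when the field value is <= the threshold."""
    while "output" not in node:
        branch = "left" if row[node["field"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["output"]


tree = json.loads(MODEL_JSON)
print(predict(tree, {"petal_length": 3.0, "petal_width": 2.0}))
```

Since the model is plain data, the prediction logic is a few lines in any language and needs no network access once the model has been fetched.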
Finally, we didn’t want to finish this post without giving you a sneak peek of things that are coming soon:
- Remote URLs and S3 buckets to create your models.
- Increasing the size for individual sources to 64GB (and counting!).
- Accepting new formats like Microsoft Excel and Apple Numbers files.
- Open source bindings for R and Clojure. We are also excited to know that some folks in the community are already working on bindings for Ruby, .NET and PHP. If you’re working on a binding for your favorite language we’d love to hear about it!
We are doing our best to make “machine learning for everyone” a reality. Now it’s time to unleash your inner hacker and show the world what machine learning can do in your application. If you don’t have a BigML account yet or need more credits, just send an email to support@bigml.com and we’ll be happy to move you to the top of our invite list.
Machine Learning in Action: Interactive Model Gallery
In this blog we have been showing you pictures of our decision tree models. Looking at a picture of colored balls that are connected somehow doesn’t really do it. But when the model is actually clickable and interactive, it starts to make a lot more sense! The problem was that you had to be a registered user to get access to this great feature. Until now.
We have introduced a model gallery that is publicly accessible. It is populated with (sample) models that are just as clickable and interactive as if you were logged in! Now you can fully explore the great features of our decision trees. We have included some samples in a range of categories. For instance, this Solar Energy Model predicts the output of a solar power system installed in Berkeley, CA (USA). (The data was compiled by Ph.D. candidate Alexandra Constantin and is available here.)
We’re working hard to make it possible for all our users to share their models, if they want to. This way you can incorporate them in a blog, link to them from a web site, or share them in your favorite social media. After all, it’s a waste if the rest of the world can’t enjoy the great stuff that you have created!
For now, if you think our Model Gallery is missing some great models and you have the data, talk to us at feedback@bigml.com!
Predicting Survival on the Titanic
April 15, 2012 marks exactly 100 years since the Titanic sank after hitting an iceberg. 1,496 people lost their lives. Only 712 survived. The Titanic became iconic: built to be fast and unsinkable, she sank on her maiden voyage.
There is an enormous amount of data available on the Titanic, her journey, her passengers and crew. We have taken some of that data and run it through our algorithms to see what we could learn from it. Here’s the resulting predictive model, based on “Age”, “Gender”, “Fare Price” (in modern Pounds), “Class/Department”, “Group”, “City of Embarkment” and (as objective field) “Survived”. Here’s the interactive model you can explore: Surviving the Titanic.
As it turns out, the single most important thing you could have done to survive was . . . be female. The phrase “women and children first” was taken seriously on the Titanic, and women had a much better average survival rate than men. “Class/Department” was also very important for your chances of survival. Working in food service or in the “Engine Room” greatly reduced your odds. Working on deck, however, gave you a head start to the lifeboats. For passengers, the first class passengers had far better luck than those down in third class.
Behind the nodes, branches and colors of the decision tree, there are stories that the model doesn’t tell. If you go to the Titanic museum in Atlanta, you can get a reproduction of a Titanic boarding pass with information about one of her passengers. Like this one, with information about Mr. Austin Partner, a stockbroker from Surrey who had made the voyage to Canada 17 times, and had just started a new job.
Unfortunately, Mr. Partner did not make it to Canada. Although a majority of the first class passengers survived, the model tells us that Mr. Partner had other factors working against him. He was a man, middle-aged, and had a ticket that was cheap among the first class tickets.
Mr. Partner left behind a wife and a pair of young sons in Surrey.
Another story, with a happier ending, is that of surviving crew member Mr. Frederick Fleet (age 24), who boarded the Titanic as Lookout. He reported the iceberg while on duty late on the night of April 14, 1912.
“On April 14, 1912, along with Mr Reginald Lee, Fleet took watch at 10pm, relieving Mr George Symons and Mr Archie Jewel from the previous watch. Just after seven bells, Fleet saw a black mass ahead, immediately struck three bells and telephoned the bridge. He reported “Iceberg right ahead,” receiving the reply “Thank you.” While still on the telephone, the ship started swinging to port. The lookouts saw the starboard side of the ship scrape alongside the iceberg, and saw ice falling on the decks. They had thought that it had been either a close shave or a near miss. The lookouts remained in the crows nest until relieved about 20 minutes later.” (From www.encyclopedia-titanica.org)
Frederick Fleet made it to lifeboat 6, the first boat to be launched from the port side. He returned to sea and continued sailing with various companies for another 24 years.
There is always more to analysis than just the mere data and models. Tell the story and let the model come alive.
Machine Learning at Your Fingertips!
BigML is now fully iPad enabled! You can access all of your datasets and models, and make predictions from anywhere you can bring your iPad. I’m sure we all remember a recent past in which analyzing gigabytes of data or using predictive models meant you needed a very powerful machine or even a cluster of machines with loads of complex software. With our iPad interface, you have the power of that cluster while sitting on your sofa, or waiting in line at your preferred coffee shop, with no extra software required.
One of the things that has made iPad support a lot easier is a decision we made early on to be “forward-focused” rather than “backward-focused”. What does this mean? Glad you asked!
Building something as useful as possible with limited resources means a lot of hard choices, and some of the hardest choices in the web development world involve browser support. Many of the best ideas in web development in the last few years are not completely compatible with every single browser. Technologies like HTML5 and SVG let you create beautiful in-browser graphics that were previously either impossible or required plugins.
So we had a choice:
- Design something that we loved using great new technologies compatible with most modern browsers
- Design something substandard with old technology for older browsers
We decided that we’re building a tool for the future, and so that’s where our focus was going to be. While we’re always trying to be compatible with more browsers, our main development efforts usually concentrate on the latest versions of Firefox, Chrome, and Safari. So get your hands on a copy of Firefox (it’s free!) or an iPad. BigML is waiting for you!
From Zero to Machine Learning in Less than Seven Minutes
Here at BigML, we do a lot of work trying to make machine learning accessible. This involves a lot of thought about everything from classification algorithms, to data visualization, to infrastructure, databases, particle physics, and security.
Okay, not particle physics. But definitely all of that other stuff.
After all that thinking, our hope is that we’ve built something that non-experts can use to build data-driven decisions into their applications and business logic. To get you started, we’ve made a series of short videos showing the key features of the site. Watch and learn. Machine learning is only seven minutes away.
Data Source Creation
Dataset Creation
Model Creation
Model Interaction
Prediction
Introducing the Tea Leaf Predictor
One of our goals at BigML is to provide you with a predictive model that can help you run your business or build a better software application. While we’ve been trying to build a service that does this through the magic of data analytics, we realized today that we’ve been missing the big picture. Analyzing big data is useful, but what about tapping into the all-knowing energies of the universe, made manifest in inscrutable cosmic signs, and destined to remain a mystery to all but the barest handful of enlightened truth seekers? That’s a no-brainer, right?
So we gave our mystic friend “Xamxor the Omnipotent” a call. To our benefit and yours, he has agreed to be the engine behind BigML’s latest offering, the Tea Leaf Predictor!
To get started using Xamxor for tea-driven decisions, first click on the steaming cup of tea in the predictions panel.
Much like BigML’s data source/dataset/model workflow, the Tea Leaf Predictor has a three step process.
- Your data is sent to Xamxor via an HTTP POST request. Because nothing in this life-age of the universe is unknown to Xamxor, your username and api key are unnecessary.
- Xamxor brews up a fresh pot of tea (or as he laughingly calls it, “Predictin’ Juice”), drinks it, and stares at the leaves, channeling the wisdom of a thousand generations. The answer comes to him first as a whisper and then as a raging torrent, threatening to destroy him.
- Xamxor then contacts your mind directly and imparts his knowledge. Note that you may hear the wisdom of Xamxor with significant and grandiose echo. This is normal.
Below is a simplified flowchart of the Tea Leaf workflow.
We hope that you will find Xamxor’s wisdom as useful as we do.
BigML’s API documentation is here!
We are excited to publish the documentation for BigML.io, BigML’s public API! BigML.io is a REST-style API that’ll let you create and manage BigML resources programmatically. Using BigML.io you are able to create, retrieve, update and delete Sources, Datasets, Models and Predictions via standard HTTP methods. We serve BigML.io over HTTPS to ensure data privacy. Unencrypted HTTP is not supported.
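As a rough illustration of how the four operations line up with HTTP methods, the sketch below builds (but never sends) HTTPS requests. The base URL, the query-string authentication, the resource paths, and the use of PUT for updates are all assumptions made for this example; the API documentation is the authority on the real layout:

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Assumed base URL and query-string authentication (illustrative only).
BASE_URL = "https://bigml.io"

# Conventional REST mapping from operation to HTTP method.
METHODS = {"create": "POST", "retrieve": "GET", "update": "PUT", "delete": "DELETE"}


def build_request(operation, path, username, api_key, body=None):
    """Build (but do not send) an HTTPS request for one CRUD operation."""
    query = urlencode({"username": username, "api_key": api_key})
    url = "%s/%s?%s" % (BASE_URL, path, query)
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return Request(url, data=data, headers=headers, method=METHODS[operation])


# Hypothetical resource identifiers, for illustration only.
req = build_request("create", "dataset", "alice", "secret", {"source": "source/123"})
print(req.get_method(), req.full_url)
```

Building the request separately from sending it makes it easy to inspect the method and URL before anything goes over the wire.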
Using the API opens up a world of opportunities. You can now add the power of modeling and predictions to your own applications. Just imagine all the automated decisions and predictions that could make your apps smarter!
BigML gives you an API with:
- Secure programmatic access to all your BigML resources;
- Fully white-box access to your datasets and models;
- Asynchronous creation of datasets and models;
- Near real-time predictions.
We’ve made an effort to make the documentation simple and easy to use. The examples in the documentation even have your own id and API key incorporated (do keep those secret!) so you can make it as simple as copy and paste to get started. If you have any suggestions, questions or requests for improving the documentation, we’d love to hear from you. Drop us a note at feedback@bigml.com.
This documentation explains the format for raw HTTP calls to BigML.io. We are developing binding libraries in a number of programming languages, so stay tuned!
PS: To make use of all this, you need an invitation code, so you can register, create your user id and receive an API key. If you’re not on the list for receiving an invitation code yet, this is the place to go.
If you have a great idea for using our API, drop us a note at feedback@bigml.com and we’ll gladly move you to the top of the list!
282 days of hacking in 3 minutes
Over a year ago we thought it was as good a time as ever to start a new software company. We began with an abundance of ideas, enthusiasm, bits of code for prototyping and the notion that 24 hours in a day is never going to be sufficient to do everything that needs to be done. Architecture and infrastructure came together to support the ever growing bits of code. A Git repo was created on June 13, 2011. Initial commits were made. BigML was born.
This video visualizes 9 months of commits that resulted in the first (Alpha) version of our product, up until March 21, 2012. So here are 282 days of hacking compressed into 3 minutes!
This video was created using Gource, based on the commits from our Git project. Music by JewelBeat: ‘Last Push’.
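The 282-day figure follows directly from the two dates mentioned above. A quick check, with variable names chosen here for illustration:

```python
from datetime import date

first_commit = date(2011, 6, 13)  # Git repo creation date from the text
alpha_cutoff = date(2012, 3, 21)  # last day covered by the video

elapsed_days = (alpha_cutoff - first_commit).days
print(elapsed_days)
```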
















