Predictive Models: Build once, Run Anywhere


We have released a new version of our open source Python bindings. This new version shows how the BigML API can be used to build predictive models that generate predictions either locally or remotely. You can get the full code on GitHub and read the complete documentation at Read the Docs.

The complete list of updates includes (drum roll, please):

Development Mode

We recently introduced a free sandbox to help developers play with BigML on smaller datasets without worrying about credits. In the new Python bindings you can use BigML in development mode, where all datasets and models smaller than 1MB can be created for free:

from bigml.api import BigML

api = BigML(dev_mode=True)

Remote Sources

You can now create new “remote” sources using URLs. For example, the following snippet would allow you to create a new source using loan funding data from Lending Club.


source = api.create_source("https://www.lendingclub.com/fileDownload.action?file=InFundingStats.csv&type=gen")

The URLs used to create remote sources can be http or https, optionally with HTTP basic authentication (e.g., https://test:test@static.bigml.com/csv/iris.csv), Amazon S3 buckets (e.g., s3://bigml-public/csv/iris.csv), or even Microsoft Azure blobs (e.g., azure://csv/iris.csv?AccountName=bigmlpublic).
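As a small illustration of the accepted URL forms, the helper below tells them apart using only the standard library's `urllib.parse`. `describe_remote_source` is hypothetical and not part of the bindings, and reading the host part of an Azure URL as the container name is an assumption based on the example URL above.

```python
from urllib.parse import parse_qs, urlparse


def describe_remote_source(url):
    """Classify a remote-source URL (hypothetical helper, not a BigML API)."""
    parsed = urlparse(url)
    if parsed.scheme in ("http", "https"):
        # Basic-auth credentials, if any, are embedded as user:password@host.
        return {"kind": parsed.scheme,
                "basic_auth": parsed.username is not None,
                "host": parsed.hostname}
    if parsed.scheme == "s3":
        # s3://bucket/key: the bucket is the network location.
        return {"kind": "s3", "bucket": parsed.netloc,
                "key": parsed.path.lstrip("/")}
    if parsed.scheme == "azure":
        # Assumption: azure://container/blob?AccountName=account
        account = parse_qs(parsed.query).get("AccountName", [None])[0]
        return {"kind": "azure", "container": parsed.netloc,
                "blob": parsed.path.lstrip("/"), "account": account}
    raise ValueError(f"unsupported scheme: {parsed.scheme!r}")


print(describe_remote_source("s3://bigml-public/csv/iris.csv"))
# {'kind': 's3', 'bucket': 'bigml-public', 'key': 'csv/iris.csv'}
```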

Bigger files streamed with Poster

In the first version we used the fabulous Python requests HTTP library to handle all HTTPS communication with BigML. However, we realized that Poster is better suited to streaming large files, mainly because it does not need to load the entire file into memory before sending it. Because transfers can take a long time, we’ve also added a text progress bar that shows how many bytes have been uploaded so far.

Progress bar for new sources
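The memory benefit of streaming comes from never holding more than one chunk of the file at a time. The sketch below shows that idea together with a simple text progress bar. It does not use Poster’s API; `iter_chunks` and `progress_bar` are hypothetical names, and the 64 KB chunk size is an arbitrary choice.

```python
import os
import sys


def progress_bar(done, total, width=30):
    """Render a text bar such as '[#####-----] 5/10 bytes'."""
    fraction = done / total if total else 1.0
    filled = int(width * fraction)
    return "[" + "#" * filled + "-" * (width - filled) + f"] {done}/{total} bytes"


def iter_chunks(path, chunk_size=64 * 1024, out=sys.stdout):
    """Yield the file one chunk at a time, redrawing the progress bar."""
    total = os.path.getsize(path)
    sent = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)  # only this chunk is in memory
            if not chunk:
                break
            sent += len(chunk)
            out.write("\r" + progress_bar(sent, total))  # overwrite same line
            out.flush()
            yield chunk
    out.write("\n")
```

A generator like this could be handed to any HTTP client that accepts an iterable request body, so the upload never needs the whole file in memory.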

Asynchronous Uploading

Uploading a big file might take a while depending on your available bandwidth. We’ve added an async flag to enable asynchronous creation of new sources:


source = api.create_source("sales2012.csv", async=True)

You can then monitor the source to check its progress. Once the upload has finished, the variable will contain the BigML resource.

>>> source
{'code': 202, 'resource': None, 'location': None, 'object': {'status': {'progress': 0.48, 'message': 'The upload is in progress', 'code': 6}}, 'error': None}
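If you prefer polling to watching the variable, a loop like the one below would wait for the upload to complete. It is a sketch under stated assumptions: `fetch` stands for any hypothetical zero-argument callable returning a dict shaped like the one printed above, and treating status code 5 as "finished" (with negative codes signalling errors) is assumed to follow the BigML API’s status conventions. The `sleep` parameter is injectable so the loop can be tested without real delays.

```python
import time

# Assumption: BigML status code 5 means the resource is finished;
# negative codes indicate a faulty resource.
FINISHED = 5


def wait_until_finished(fetch, interval=1.0, timeout=60.0, sleep=time.sleep):
    """Poll the hypothetical `fetch()` until the resource reports FINISHED."""
    waited = 0.0
    while True:
        resource = fetch()
        status = resource["object"]["status"]
        if status["code"] == FINISHED:
            return resource
        if status["code"] < 0:
            raise RuntimeError(status.get("message", "resource failed"))
        if waited >= timeout:
            raise TimeoutError(f"still in progress after {waited} seconds")
        sleep(interval)
        waited += interval
```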

Local Models

If you import the new Model class, you can create a local version of a BigML model and use it to make predictions locally, generate an IF-THEN rule version of the model, or even generate a Python function that implements it. This is as easy as:


from bigml.api import BigML
from bigml.model import Model

api = BigML(dev_mode=True)

source = api.create_source('https://static.bigml.com/csv/iris.csv')
dataset = api.create_dataset(source)
model = api.create_model(dataset)
model = api.get_model(model)

local_model = Model(model)

This code generates not only a model like the one depicted below in your dashboard at BigML.com, but also a local model that you can use to make local predictions or translate into a set of IF-THEN rules or even into Python code.

Local Predictions

Using local models for predictions has a big advantage: they won’t cost you anything. They are also a good choice when extremely low latency is required or no network connection is available. The trade-off is that BigML will no longer be able to track or improve the performance of your models.

>>> local_model.predict({"petal length": 3, "petal width": 1})
petal length > 2.45 AND petal width <= 1.65 AND petal length <= 4.95 => Iris-versicolor
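To make the idea of a local prediction concrete, here is a minimal sketch in plain Python, independent of the bindings. It assumes a hypothetical nested-dict tree format (not the JSON that BigML actually returns) that keeps only the top splits of the iris model, so deeper branches are collapsed into single leaves. Walking from the root to a leaf yields both the prediction and the path of conditions, printed in the same format as the output above. Missing input fields are not handled; this sketch simply raises `KeyError`.

```python
# Hypothetical tree format: inner nodes split on `field <= threshold`,
# leaves hold the predicted class under "output". Deep branches are pruned.
IRIS_TREE = {
    "field": "petal length", "threshold": 2.45,
    "le": {"output": "Iris-setosa"},
    "gt": {
        "field": "petal width", "threshold": 1.65,
        "le": {
            "field": "petal length", "threshold": 4.95,
            "le": {"output": "Iris-versicolor"},
            "gt": {"output": "Iris-virginica"},
        },
        "gt": {"output": "Iris-virginica"},
    },
}


def predict(tree, inputs):
    """Walk from the root to a leaf, returning (prediction, path of conditions)."""
    node, path = tree, []
    while "output" not in node:
        field, threshold = node["field"], node["threshold"]
        if inputs[field] <= threshold:
            path.append(f"{field} <= {threshold}")
            node = node["le"]
        else:
            path.append(f"{field} > {threshold}")
            node = node["gt"]
    return node["output"], path


label, path = predict(IRIS_TREE, {"petal length": 3, "petal width": 1})
print(" AND ".join(path), "=>", label)
# petal length > 2.45 AND petal width <= 1.65 AND petal length <= 4.95 => Iris-versicolor
```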

Rule Generation

Many organizations use rule engines to implement certain business logic. IF-THEN rules are a great way of capturing a white-box predictive model, as anyone in the organization can read them. That’s a big advantage when you need to share a model with, let’s say, the marketing department so they can adapt their strategies, or with the CEO for approval. You can easily get the rules that implement a BigML predictive model as follows:

>>> local_model.rules()
IF petal length > 2.45 AND
    IF petal width > 1.65 AND
        IF petal length > 5.05 THEN
            species = Iris-virginica
        IF petal length <= 5.05 AND
            IF sepal width > 2.9 AND
                IF sepal length > 5.95 AND
                    IF petal length > 4.95 THEN
                        species = Iris-versicolor
                    IF petal length <= 4.95 THEN
                        species = Iris-virginica
                IF sepal length <= 5.95 THEN
                    species = Iris-versicolor
            IF sepal width <= 2.9 THEN
                species = Iris-virginica
    IF petal width <= 1.65 AND
        IF petal length > 4.95 AND
            IF sepal length > 6.05 THEN
                species = Iris-virginica
            IF sepal length <= 6.05 AND
                IF sepal width > 2.45 THEN
                    species = Iris-versicolor
                IF sepal width <= 2.45 THEN
                    species = Iris-virginica
        IF petal length <= 4.95 THEN
            species = Iris-versicolor
IF petal length <= 2.45 THEN
    species = Iris-setosa
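The rules above amount to a depth-first walk over the tree, emitting one condition per branch. Here is a minimal sketch of that walk, assuming a hypothetical nested-dict tree where inner nodes split on `field <= threshold` and leaves hold the class. The tree is trimmed to two splits, so it is not the full iris model, and `rules` is an illustrative name, not the bindings’ implementation.

```python
# Hypothetical tree format, trimmed to two splits of the iris model.
TREE = {
    "field": "petal length", "threshold": 2.45,
    "le": {"output": "Iris-setosa"},
    "gt": {"field": "petal width", "threshold": 1.65,
           "le": {"output": "Iris-versicolor"},
           "gt": {"output": "Iris-virginica"}},
}


def rules(node, target="species", depth=0):
    """Return the IF-THEN lines for `node`, indented four spaces per level."""
    pad = "    " * depth
    if "output" in node:
        return [f"{pad}{target} = {node['output']}"]
    lines = []
    # Visit the ">" branch first, matching the order used in the output above.
    for op, key in ((">", "gt"), ("<=", "le")):
        child = node[key]
        # A condition ends with THEN before a leaf, AND before further tests.
        joiner = "THEN" if "output" in child else "AND"
        lines.append(f"{pad}IF {node['field']} {op} {node['threshold']} {joiner}")
        lines.extend(rules(child, target, depth + 1))
    return lines


print("\n".join(rules(TREE)))
```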

Python Generation

Local models can also be used to generate a Python function that implements the predictive model as a sequence of IF statements. For example:


>>> local_model.python()
def predict_species(sepal_length=5.77889, sepal_width=3.02044, petal_length=4.34142, petal_width=1.32848):
    if (petal_length > 2.45):
        if (petal_width > 1.65):
            if (petal_length > 5.05):
                return 'Iris-virginica'
            if (petal_length <= 5.05):
                if (sepal_width > 2.9):
                    if (sepal_length > 5.95):
                        if (petal_length > 4.95):
                            return 'Iris-versicolor'
                        if (petal_length <= 4.95):
                            return 'Iris-virginica'
                    if (sepal_length <= 5.95):
                        return 'Iris-versicolor'
                if (sepal_width <= 2.9):
                    return 'Iris-virginica'
        if (petal_width <= 1.65):
            if (petal_length > 4.95):
                if (sepal_length > 6.05):
                    return 'Iris-virginica'
                if (sepal_length <= 6.05):
                    if (sepal_width > 2.45):
                        return 'Iris-versicolor'
                    if (sepal_width <= 2.45):
                        return 'Iris-virginica'
            if (petal_length <= 4.95):
                return 'Iris-versicolor'
    if (petal_length <= 2.45):
        return 'Iris-setosa'
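Generating code follows the same recursive pattern as generating rules, emitting Python statements instead of text. The sketch below assumes a hypothetical nested-dict tree and illustrative helpers (`to_python`, `body`, `collect_fields`, `identifier`), none of which come from the bindings. It builds the function source as a string and loads it with `exec`. Unlike the generated function above, its parameters have no default values.

```python
# Hypothetical tree format: inner nodes split on `field <= threshold`,
# leaves hold the predicted class under "output".
TREE = {
    "field": "petal length", "threshold": 2.45,
    "le": {"output": "Iris-setosa"},
    "gt": {"field": "petal width", "threshold": 1.65,
           "le": {"output": "Iris-versicolor"},
           "gt": {"output": "Iris-virginica"}},
}


def identifier(field):
    """Turn a field name such as 'petal length' into 'petal_length'."""
    return field.replace(" ", "_")


def collect_fields(node, found=None):
    """List field names in the order they first appear (depth-first)."""
    found = [] if found is None else found
    if "output" not in node:
        if node["field"] not in found:
            found.append(node["field"])
        collect_fields(node["gt"], found)
        collect_fields(node["le"], found)
    return found


def body(node, depth):
    """Emit the if/else statements for `node` at the given indentation."""
    pad = "    " * depth
    if "output" in node:
        return [f"{pad}return {node['output']!r}"]
    var, threshold = identifier(node["field"]), node["threshold"]
    return ([f"{pad}if {var} > {threshold}:"] + body(node["gt"], depth + 1)
            + [f"{pad}else:"] + body(node["le"], depth + 1))


def to_python(tree, name="predict_species"):
    """Return the source code of a function implementing `tree`."""
    params = ", ".join(identifier(f) for f in collect_fields(tree))
    return "\n".join([f"def {name}({params}):"] + body(tree, 1))


namespace = {}
exec(to_python(TREE), namespace)
print(namespace["predict_species"](petal_length=3, petal_width=1))
# Iris-versicolor
```

Using `if`/`else` rather than a pair of complementary `if` statements guarantees that every path through the generated function returns a value.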

So with this new version of the BigML Python bindings and only a few lines of Python, you can upload a big dataset, create a predictive model, and then auto-generate the Python code that implements the predictive model in your own application. How cool is that? Remember, you can clone the bindings from GitHub and read the full documentation at Read the Docs.
