We have released a new version of our open-source Python bindings. This new version shows how the BigML API can be used to build predictive models that generate predictions locally or remotely. You can get full access to the code on GitHub and read the full documentation at Read the Docs.
The complete list of updates includes (drum roll, please):
Development Mode
We recently introduced a free sandbox to help developers play with BigML on smaller datasets without worrying about credits. With the new Python bindings you can use BigML in development mode, where all datasets and models smaller than 1MB can be created for free:
from bigml.api import BigML

api = BigML(dev_mode=True)
Remote Sources
You can now create new “remote” sources using URLs. For example, the following snippet would allow you to create a new source using loan funding data from Lending Club.
source = api.create_source("https://www.lendingclub.com/fileDownload.action?file=InFundingStats.csv&type=gen")
The URLs used to create remote sources can be http or https with basic realm authentication (e.g., https://test:test@static.bigml.com/csv/iris.csv), Amazon S3 buckets (e.g., s3://bigml-public/csv/iris.csv), or even Microsoft Azure blobs (e.g., azure://csv/iris.csv?AccountName=bigmlpublic).
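For instance, a quick sketch that creates one source per scheme, reusing the example URLs above:

from bigml.api import BigML

api = BigML()

# Remote source over HTTPS with basic realm authentication
source = api.create_source("https://test:test@static.bigml.com/csv/iris.csv")

# Remote source from an Amazon S3 bucket
source = api.create_source("s3://bigml-public/csv/iris.csv")

# Remote source from a Microsoft Azure blob
source = api.create_source("azure://csv/iris.csv?AccountName=bigmlpublic")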
Bigger Files Streamed with Poster
In the first version we used the fabulous Python requests HTTP library to handle all HTTPS communication with BigML. However, we realized that Poster is better suited for streaming large files, mainly because it does not need to load the entire file into memory before sending it. Since large uploads can take a while, we’ve also added a text progress bar that lets you see how many bytes have been uploaded so far.
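Nothing changes in your code to take advantage of this; a minimal sketch (assuming a local sales2012.csv, the same file used in the next section):

from bigml.api import BigML

api = BigML()

# The file is streamed chunk by chunk instead of being loaded into
# memory, and a text progress bar tracks the bytes uploaded so far.
source = api.create_source("sales2012.csv")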

Asynchronous Uploading
Uploading a big file might take a while, depending on your available bandwidth. We’ve added an async flag to enable asynchronous creation of new sources.
source = api.create_source("sales2012.csv", async=True)
You can then monitor that source to check the progress. Once the upload is finished, the variable will contain the BigML resource.
>>> source
{'code': 202,
 'resource': None,
 'location': None,
 'object': {'status': {'progress': 0.48,
                       'message': 'The upload is in progress',
                       'code': 6}},
 'error': None}
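Since the bindings fill in that variable once the upload completes, a minimal polling sketch is enough to wait for it (resource stays None until the upload finishes):

import time

# Poll until the background upload fills in the resource id.
while source['resource'] is None:
    time.sleep(2)
print(source['resource'])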
Local Models
If you import the new Model class, you will be able to create a local version of a BigML model that you can use to make predictions locally, generate an IF-THEN rule version of your model, or even produce a Python function that implements the model. This is as easy as:
from bigml.api import BigML
from bigml.model import Model

api = BigML(dev_mode=True)

source = api.create_source('https://static.bigml.com/csv/iris.csv')
dataset = api.create_dataset(source)
model = api.create_model(dataset)
model = api.get_model(model)
local_model = Model(model)
This code will not only generate a model in your dashboard at BigML.com, but also a local model that you can use to generate local predictions, translate into a set of IF-THEN rules, or even turn into Python code.
Local Predictions
Using local models for predictions has a big advantage: they won’t cost you anything. They are also a good idea in situations where extremely low latency is required or a network connection is not available. The trade-off is that you lose the ability for BigML to track or improve the performance of your models.
>>> local_model.predict({"petal length": 3, "petal width": 1})
petal length > 2.45 AND petal width <= 1.65 AND petal length <= 4.95 => Iris-versicolor
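For example, a minimal sketch that scores a few new flowers in a loop, entirely offline (the measurements are made up for illustration):

# Each call runs against the local tree; no API requests are made.
new_flowers = [
    {"petal length": 1.4, "petal width": 0.2},
    {"petal length": 4.5, "petal width": 1.5},
    {"petal length": 5.8, "petal width": 2.2},
]
for flower in new_flowers:
    print(local_model.predict(flower))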
Rule Generation
Many organizations use rule engines to implement certain business logic. IF-THEN rules are a great way of capturing a white-box predictive model, as they are a human-readable language common to everyone in the organization. That’s a big advantage when you need to share the model with, say, the marketing department so it can adapt its strategies, or with the CEO for approval. You can easily get the rules that implement a BigML predictive model as follows:
>>> local_model.rules()
IF petal length > 2.45 AND
    IF petal width > 1.65 AND
        IF petal length > 5.05 THEN
            species = Iris-virginica
        IF petal length <= 5.05 AND
            IF sepal width > 2.9 AND
                IF sepal length > 5.95 AND
                    IF petal length > 4.95 THEN
                        species = Iris-versicolor
                    IF petal length <= 4.95 THEN
                        species = Iris-virginica
                IF sepal length <= 5.95 THEN
                    species = Iris-versicolor
            IF sepal width <= 2.9 THEN
                species = Iris-virginica
    IF petal width <= 1.65 AND
        IF petal length > 4.95 AND
            IF sepal length > 6.05 THEN
                species = Iris-virginica
            IF sepal length <= 6.05 AND
                IF sepal width > 2.45 THEN
                    species = Iris-versicolor
                IF sepal width <= 2.45 THEN
                    species = Iris-virginica
        IF petal length <= 4.95 THEN
            species = Iris-versicolor
IF petal length <= 2.45 THEN
    species = Iris-setosa
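To hand those rules to, say, that marketing department, you can write them to a plain-text file (a sketch, assuming rules() also accepts an out file handle; if your version only prints to the console, redirect sys.stdout instead):

# Assumption: rules() takes an optional out file handle.
with open("iris_rules.txt", "w") as rules_file:
    local_model.rules(out=rules_file)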
Python Generation
Local models can also be used to generate a Python function that implements the predictive model as a sequence of IF statements. For example:
>>> local_model.python()
def predict_species(sepal_length=5.77889,
                    sepal_width=3.02044,
                    petal_length=4.34142,
                    petal_width=1.32848):
    if (petal_length > 2.45):
        if (petal_width > 1.65):
            if (petal_length > 5.05):
                return 'Iris-virginica'
            if (petal_length <= 5.05):
                if (sepal_width > 2.9):
                    if (sepal_length > 5.95):
                        if (petal_length > 4.95):
                            return 'Iris-versicolor'
                        if (petal_length <= 4.95):
                            return 'Iris-virginica'
                    if (sepal_length <= 5.95):
                        return 'Iris-versicolor'
                if (sepal_width <= 2.9):
                    return 'Iris-virginica'
        if (petal_width <= 1.65):
            if (petal_length > 4.95):
                if (sepal_length > 6.05):
                    return 'Iris-virginica'
                if (sepal_length <= 6.05):
                    if (sepal_width > 2.45):
                        return 'Iris-versicolor'
                    if (sepal_width <= 2.45):
                        return 'Iris-virginica'
            if (petal_length <= 4.95):
                return 'Iris-versicolor'
    if (petal_length <= 2.45):
        return 'Iris-setosa'
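From there, embedding the model in your own application is one step away; a sketch, again assuming python() accepts an out file handle (iris_classifier.py is a hypothetical file name):

# Assumption: python() mirrors rules() and takes an out file handle.
with open("iris_classifier.py", "w") as code_file:
    local_model.python(out=code_file)

# The generated module defines predict_species(), which your own
# application can import and call without the bindings installed.
from iris_classifier import predict_species
print(predict_species(petal_length=2.0))  # 'Iris-setosa'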
So with this new binding for the BigML API and only a few lines of Python, you can upload a big dataset, create a predictive model, and then auto-generate the Python code that implements that model in your own application. How cool is that? Remember, you can clone the bindings from GitHub and read the full documentation at Read the Docs.