Automating Fusions with WhizzML and the Python Bindings

This blog post is the fifth in our series of posts about Fusions and focuses on how to automate processes that include them using WhizzML, BigML’s Domain Specific Language for Machine Learning workflow automation. To recap, a Fusion is a group of resources that predict together in order to compensate for each resource’s individual weaknesses.

In this post, we describe how to programmatically automate a process that creates a good predictor by employing Fusions. As we have noted in other posts related to WhizzML, it lets you execute complex tasks that run entirely on the server side, with parallelization. This eliminates connection issues and takes care of your account limits regarding the maximum number of resources you can create at the same time. We also describe the same operations with our Python bindings as another option, for client-side control.

As we have mentioned, this release of Fusions puts the focus on the power of many models working together. Starting from the beginning, suppose we have a group of trained models (trees, ensembles of trees, logistic regressions, deepnets, or even another Fusion), and we want to use all of them to create new predictions. Combining multiple models can compensate for the weaknesses of any single model. The first step is to create a Fusion resource, passing the models as a parameter. Below is the code for creating a Fusion in the simplest way: passing a list of models in the format ["<resource_type>/<resource_id>", "<resource_type>/<resource_id>", …] and no other parameter, that is, accepting the defaults.

;; WhizzML - create a fusion
(define my-fusion (create-fusion {"models" my-best-models}))

If you choose to use Python to code your workflows and run the process locally, instead of running it entirely on the server, the equivalent code is below. The list of models is again the only parameter.

# Python - create a fusion
fusion = api.create_fusion(["model/5af06df94e17277501000010",
                            "deepnet/5af06df84e17277502000016",
                            "ensemble/5af06df74e1727750100000d"])

Just like all BigML resources, Fusions accept optional parameters that you can add to the creation request to improve the final result. For instance, suppose that we want to assign a different weight to each of the models that compose the Fusion because we know that one of them is more accurate than the others. Another point to highlight is that resources in BigML are created asynchronously, which means that most of the time the creation request returns before the resource is finished. To obtain the finished resource, you have two main options: retrieve it repeatedly until it is finished, or use the functions provided for that purpose. In WhizzML, the function create-and-wait-fusion pauses the workflow execution until the Fusion is finished.
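The first option, retrieving the resource repeatedly until it is finished, can be sketched in plain Python as follows. This is only an illustration of the polling idea, not how the bindings implement it: wait_until_finished and fake_status are hypothetical names, the status callable stands in for a real request to the platform, and the status codes 5 (finished) and -1 (failed) are assumptions made for this sketch.

```python
import time

FINISHED = 5   # status code assumed here to mean "finished"
FAULTY = -1    # status code assumed here to mean "failed"


def wait_until_finished(fetch_status, resource_id, delay=1.0, max_tries=60):
    """Poll fetch_status(resource_id) until the resource finishes or fails.

    Returns True when finished, False when failed, and raises
    TimeoutError if neither happens within max_tries checks.
    """
    for _ in range(max_tries):
        code = fetch_status(resource_id)
        if code == FINISHED:
            return True
        if code == FAULTY:
            return False
        time.sleep(delay)
    raise TimeoutError(f"{resource_id} not finished after {max_tries} checks")


# Simulated server: reports three in-progress codes, then "finished".
_codes = iter([1, 2, 3, FINISHED])


def fake_status(resource_id):
    """Hypothetical stand-in for a request that returns the status code."""
    return next(_codes)


done = wait_until_finished(fake_status, "fusion/<resource_id>", delay=0)
```

In practice you would rarely write this loop yourself, since both WhizzML and the bindings already offer a function that waits for you.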

Let’s see how to specify weights for the models and assign the variable only once the resource is finished. In the code below, the list of models is still there, but each element is now a map holding the model ID and its weight:

;; WhizzML - create a fusion with weights and wait for the finish
(define my-fusion
  (create-and-wait-fusion {"models"
                           [{"id" "model/5af06df94e17277501000010"
                             "weight" 1}
                            {"id" "deepnet/5af06df84e17277502000016"
                             "weight" 4}
                            {"id" "ensemble/5af06df74e1727750100000d"
                             "weight" 3}]}))

In the Python bindings, the asynchronous creation is handled by the ok function, which waits until the resource is finished, and the weights are added to each model’s object in the Fusion. Here is the Python code that is equivalent to the WhizzML code above.

# Python - create a fusion with weights and wait for the finish
fusion = api.create_fusion([
    {"id": "model/5af06df94e17277501000010", "weight": 1},
    {"id": "deepnet/5af06df84e17277502000016", "weight": 4},
    {"id": "ensemble/5af06df74e1727750100000d", "weight": 3}])
# wait until the fusion is finished
api.ok(fusion)
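If your weights live in a mapping from resource ID to weight, a small helper can produce the list of dictionaries used above. This is a minimal sketch: weighted_models and ALLOWED_TYPES are hypothetical names, the allowed type prefixes are an assumption based on the model types listed earlier in this post, and rejecting non-positive weights is this sketch's own choice rather than a documented rule.

```python
# Type prefixes assumed from the kinds of models this post lists
# as valid Fusion components (an assumption, not an official list).
ALLOWED_TYPES = {"model", "ensemble", "logisticregression", "deepnet", "fusion"}


def weighted_models(weights):
    """Turn {resource_id: weight} into a list of {"id", "weight"} dicts."""
    models = []
    for resource_id, weight in weights.items():
        resource_type, _, short_id = resource_id.partition("/")
        if resource_type not in ALLOWED_TYPES or not short_id:
            raise ValueError(f"unexpected resource ID: {resource_id!r}")
        if weight <= 0:
            # This sketch's own rule: every model must contribute something.
            raise ValueError(f"weight must be positive: {resource_id!r}")
        models.append({"id": resource_id, "weight": weight})
    return models


models = weighted_models({
    "model/5af06df94e17277501000010": 1,
    "deepnet/5af06df84e17277502000016": 4,
    "ensemble/5af06df74e1727750100000d": 3,
})
```

The resulting models list has the same shape as the argument passed to create_fusion in the example above.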

To see the complete list of arguments for Fusion creation, visit the corresponding section in the API documentation.

Once the Fusion has been created, the best way to measure its performance, as with every type of supervised model, is to evaluate it. To do so, you need data that is different from the data used to create the Fusion; otherwise overfitting would make the results look better than they really are. This data is often referred to as a “test dataset”. Let’s see first how to evaluate a Fusion by employing WhizzML:

;; WhizzML - Evaluate a fusion
(define my-evaluation
    (create-evaluation {"fusion" my-fusion "dataset" my-test-dataset}))

and now how it should be done with the Python bindings:

# Python - Evaluate a fusion
evaluation = api.create_evaluation(my_fusion, my_test_dataset)

In both cases, the code is extraordinarily simple. With the evaluation, you can determine whether the performance is acceptable for your use case, or whether you need to keep improving your Fusion by adding models or training new ones.
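The "is it good enough?" decision can also be automated. The sketch below assumes that a finished evaluation for a classification problem is available as a dictionary with its measures under object → result → model. That layout, the meets_threshold helper, and the sample values are assumptions for illustration only, so check the structure of your own evaluation before relying on it.

```python
def meets_threshold(evaluation, metric="accuracy", threshold=0.9):
    """Return True if the chosen evaluation measure reaches the threshold.

    Assumes the measures live under evaluation["object"]["result"]["model"].
    """
    measures = evaluation["object"]["result"]["model"]
    return measures[metric] >= threshold


# Made-up placeholder mimicking the assumed structure, not a real result.
sample_evaluation = {"object": {"result": {"model": {"accuracy": 0.93}}}}

good_enough = meets_threshold(sample_evaluation, threshold=0.9)
```

A check like this makes it possible to stop a workflow, or trigger another round of model training, when the Fusion falls short of the bar you set.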

As with any supervised resource, once the model performs well enough, you can start using it to make predictions, which is the whole point of building the Fusion. Continuing in the same vein, let’s write the WhizzML code to make a single prediction, that is, to predict the result for just one “row” of new data.

;; WhizzML - Predict using a fusion
(define my-prediction
    (create-prediction {"fusion" my-fusion
                        "input_data" {"state" "kansas" "bmi" 32.5}}))

To do exactly the same with the Python bindings, your code should look like the following. The first parameter is the Fusion resource ID and the second one is the new data to predict with.

# Python - Predict using a fusion
prediction = api.create_prediction(my_fusion, {
    "state": "kansas",
    "bmi": 32.5
})

Here we are showing the simplest way to make a prediction, but prediction creation accepts a long list of parameters that let you tailor the result to your needs.

When your goal is to predict not just a single row but a whole group of rows, represented as a new dataset that you previously uploaded to the BigML platform, you should create a batchprediction resource. It requires only two parameters: the Fusion ID and the ID of the new dataset.

;; WhizzML - Make a batch of predictions using a fusion
(define my-batchprediction
    (create-batchprediction {"fusion" my-fusion "dataset" my-dataset}))

It couldn’t get any easier. The equivalent code in Python is almost the same and very simple too. Here it is:

# Python - Make a batch of predictions using a fusion
batch_prediction = api.create_batch_prediction(my_fusion, my_dataset)

Want to know more about Fusions?

Stay tuned for the next blog post to learn how Fusions work behind the scenes. If you have any questions or you would like to learn more about how Fusions work, please visit the release page. It includes a series of blog posts, the BigML Dashboard and API documentation, the webinar slideshow as well as the full webinar recording.
