Finding your Optimal Models Automatically with WhizzML and OptiML

This blog post, the fifth of our series of posts about OptiML, focuses on how to programmatically use this resource with WhizzML, BigML’s Domain Specific Language for Machine Learning workflow automation. To refresh your memory, WhizzML allows you to execute complex tasks that are computed completely on the server side with built-in parallelization.

BigML Resource Family Grows

Our posts so far have focused exclusively on OptiML, since it’s the newest resource we are presenting on the BigML Dashboard. The theme E Pluribus Unum (“out of many, one”) best explains the rationale behind these new resources, which make use of multiple algorithms to converge on a best-fitting model or ensemble for better results. All three resources will be accessible programmatically come May 16.

OptiML identifies the best-performing model for each classification or regression algorithm. Now that we’ve covered the brief introductions, let’s see how to work with these new resources using WhizzML.

Creating an OptiML

To start creating an OptiML via WhizzML, we begin with an existing dataset that will be split to train and evaluate more than a hundred models, including decision trees (aka models), ensembles, logistic regressions and deepnets. So our WhizzML code will need to include the dataset ID we want to use for training as shown below:

;; creates an OptiML with default settings
(define my-optiml 
  (create-optiml {"dataset" my-dataset}))

As we noted in the previous post, the BigML API is mostly asynchronous, meaning the call will return the OptiML ID before its creation is completed. This implies that the model exploration process will continue after the code snippet is executed. You can use “create-and-wait-optiml” instead to be sure that the exploration process has finished:

;; creates an OptiML with default parameters. Once it's
;; completed the ID is stored in my-optiml variable
(define my-optiml 
  (create-and-wait-optiml {
    "dataset" my-dataset
    }))
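
The same behavior can also be sketched in two explicit steps, which makes the asynchronous nature of the call easier to see. This is a minimal sketch, not the canonical approach: it assumes the WhizzML standard library procedure `wait` returns the resource ID once the resource is finished, and the name `my-optiml-id` is introduced here only for illustration.

```
;; Sketch: request the OptiML, then block until it is finished.
;; Assumes `wait` from the standard library and a previously
;; defined my-dataset; my-optiml-id is an illustrative name.
(define my-optiml-id
  (create-optiml {"dataset" my-dataset}))

;; The creation call above returns immediately; `wait` is assumed
;; to return the same ID once the exploration has completed.
(define my-optiml (wait my-optiml-id))
```

Splitting the steps like this can be useful when other work should be started between the creation request and the point where the finished OptiML is actually needed.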

Given that different use cases call for different settings, BigML provides several parameters to fine-tune the model exploration process (more on this will be in the OptiML documentation to be released on May 16). Here, we will configure an OptiML via WhizzML by passing <property_name> and <property_value> pairs that set the metric and the class to optimize. Let’s see how to create an OptiML that optimizes the classifier search according to the area under the ROC curve (AUC), with “Mexico” as the class of interest. Here is the code for that example case:

;; creates an OptiML setting parameters. Once it's
;; completed the ID is stored in my-optiml variable
(define my-optiml 
  (create-and-wait-optiml {
    "dataset" my-dataset
    "metric" "area_under_roc_curve"
    "objective_field" "00000d"
    "metric_class" "Mexico"
    }))
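
If the same configuration is needed for several datasets, the property map can be wrapped in a small procedure. The following is only a sketch: the helper name `optiml-for-class` and its argument names are hypothetical, and it simply reuses the four properties shown above.

```
;; Hypothetical helper: creates an OptiML for the given dataset,
;; optimizing `metric` for the class `class-name` of the
;; objective field `objective-id`.
(define (optiml-for-class dataset-id objective-id metric class-name)
  (create-and-wait-optiml {"dataset" dataset-id
                           "objective_field" objective-id
                           "metric" metric
                           "metric_class" class-name}))

;; Equivalent to the example above
(define my-optiml
  (optiml-for-class my-dataset
                    "00000d"
                    "area_under_roc_curve"
                    "Mexico"))
```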

Once the model exploration process is complete and we have created an OptiML, we can easily retrieve the first model (the top-performing one) to get predictions from it as follows:

;; retrieves the first model from an OptiML
;; (assumes the OptiML resource lists its model IDs under "models")
(define best-model (head (get (fetch my-optiml) "models")))
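
With the ID of the top-performing model in hand, a prediction could be requested along the following lines. Treat this as a sketch under several assumptions: the input field ID "000001" and its value are placeholders, the standard library procedure `resource-type` is assumed to return the type of the winning resource as a string (for example "model" or "ensemble"), and the prediction argument is assumed to be named after that type.

```
;; Sketch only: "000001" and 3.5 are placeholder input data.
;; We assume `resource-type` yields the type name of best-model
;; and that the prediction request expects the model ID under a
;; key with that same name.
(define my-prediction
  (create-prediction
    (assoc {"input_data" {"000001" 3.5}}
           (resource-type best-model)
           best-model)))
```

Building the request map with `assoc` avoids hard-coding the model type, since the best model found by an OptiML may be a decision tree, an ensemble, a logistic regression or a deepnet.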

Want to know more about OptiML?

If you have any questions or you would like to learn more about how OptiML works, please visit the release page. It includes a series of blog posts, the BigML Dashboard and API documentation, the webinar slideshow as well as the full webinar recording.