Finding your Optimal Model Automatically with the BigML API and OptiML

In this post, the fourth of our 6 blog posts focused on optimizing Machine Learning automatically, we will explore how to use OptiML with the BigML API. So far we have covered an introduction to OptiML, seen a use case, and walked through the BigML Dashboard step-by-step. Here we will shift the focus to our REST API.

[Figure: OptiML workflow]

OptiML can be applied to both classification and regression problems. Because it’s an entirely automated process, it requires very few parameters overall. Programmatic control is mostly limited to bounding the search: its total time, the total number of algorithms, the types of algorithms considered, and the performance metric used. Longtime readers will notice that this post is similar in structure to other release tutorials, because resource creation and execution are standardized across the BigML API.

Authentication

Before using the API, you must set up your environment variables. You should set BIGML_USERNAME, BIGML_API_KEY, and BIGML_AUTH in your .bash_profile. BIGML_USERNAME is just your username. Your BIGML_API_KEY can be found on the Dashboard by clicking on your username to pull up the account page, and then clicking on ‘API Key’. Finally, BIGML_AUTH is simply the combination of these elements.

export BIGML_USERNAME=my_name
export BIGML_API_KEY=123456789
export BIGML_AUTH="username=$BIGML_USERNAME;api_key=$BIGML_API_KEY;"
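If you would rather not assemble the string by hand, a small helper can do it and complain when a piece is missing. This is only a sketch: the function name bigml_auth is our own invention, not part of any BigML tooling, and it simply reproduces the username/api_key format shown above.

```shell
# Hypothetical helper (not part of BigML): build the auth query string
# in the same "username=...;api_key=...;" format used in this post.
bigml_auth() {
  # $1 = username, $2 = API key
  if [ -z "$1" ] || [ -z "$2" ]; then
    echo "bigml_auth: username and API key are both required" >&2
    return 1
  fi
  printf 'username=%s;api_key=%s;' "$1" "$2"
}

# Example use in .bash_profile (left commented so nothing runs on load):
# export BIGML_AUTH="$(bigml_auth "$BIGML_USERNAME" "$BIGML_API_KEY")"
```

Failing loudly on an empty value makes a missing variable easier to spot than an authentication error later on.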

As in the Dashboard, the first step is uploading a data source to be processed. You can point to remote sources or upload local files, in a range of different file formats. From the terminal, the curl command below uploads the file “loans.csv”, which was used in our previous OptiML blog post.

curl "https://bigml.io/source?$BIGML_AUTH" -F file=@loans.csv
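For the remote case mentioned above, a request along the following lines should work. Treat it as a sketch under assumptions: we assume the source endpoint accepts a JSON body with a "remote" key holding the file URL, the helper names are hypothetical, and the example.com address is a placeholder rather than a real copy of the data.

```shell
# Hypothetical helper: JSON body asking BigML to fetch a source from a URL.
# Assumes the URL contains no double quotes or backslashes (no escaping is done).
remote_source_payload() {
  # $1 = URL of the remote file
  printf '{"remote": "%s"}' "$1"
}

# Defined but not called here: sending the request needs BIGML_AUTH and
# network access.
create_remote_source() {
  curl "https://bigml.io/source?$BIGML_AUTH" \
    -X POST \
    -H 'content-type: application/json' \
    -d "$(remote_source_payload "$1")"
}

# Example (placeholder URL):
# create_remote_source "https://example.com/loans.csv"
```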

Creating a Dataset

A BigML dataset is a separate resource and is a serialized form of your data. In the Dashboard, it is displayed with some simple summary statistics and is the resource consumed by Machine Learning algorithms. To create a dataset from your uploaded data via the API, you can use the following command, which specifies the source used to generate the dataset.

curl "https://bigml.io/dataset?$BIGML_AUTH" \
  -X POST \
  -H 'content-type: application/json' \
  -d '{"source": "source/5af59c8692527328b40007ed"}'
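Resources are built asynchronously, so a script usually has to wait for the dataset before using it. The sketch below polls the resource until it is done. It rests on a few assumptions worth checking against the API documentation: that the returned JSON carries a status.code field, that 5 means finished, and that negative codes mean failure. It also assumes jq is installed, and both function names are our own.

```shell
# Classify a BigML status code (assumed: 5 = finished, negative = failure).
resource_state() {
  case "$1" in
    5)  echo finished ;;
    -*) echo failed ;;
    *)  echo pending ;;
  esac
}

# Defined but not called here: polls a resource until it finishes, fails,
# or the attempts run out. Needs BIGML_AUTH, network access, and jq.
wait_for_resource() {
  # $1 = resource id, e.g. dataset/5af59f9cc7736e6b33005697
  # $2 = maximum number of polls (defaults to 60)
  attempts=${2:-60}
  while [ "$attempts" -gt 0 ]; do
    code=$(curl -s "https://bigml.io/$1?$BIGML_AUTH" | jq -r '.status.code')
    case "$(resource_state "$code")" in
      finished) return 0 ;;
      failed)   echo "$1 failed with status $code" >&2; return 1 ;;
    esac
    attempts=$((attempts - 1))
    sleep 5
  done
  echo "gave up waiting for $1" >&2
  return 1
}
```

The same loop applies to sources, OptiMLs, and evaluations, since they are all addressed by a resource ID of the same shape.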

Creating an OptiML

OptiML automates the entire process of model selection and parameterization end-to-end for classification and regression problems. This automation speeds up the process of improving model performance and makes sophisticated workflows accessible to non-experts. To create an OptiML, all you need is the dataset ID.

curl "https://bigml.io/optiml?$BIGML_AUTH" \
  -X POST \
  -H 'content-type: application/json' \
  -d '{"dataset": "dataset/5af59f9cc7736e6b33005697"}'
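As noted earlier, the few options available mostly bound the search. Here is a sketch of a request body that caps the training time and the number of candidate models. The argument names max_training_time and number_of_model_candidates are our best recollection of the API and should be confirmed in the OptiML API documentation; the helper names and the example values are purely illustrative.

```shell
# Hypothetical helper: OptiML request body with two search limits.
# Argument names are assumptions to verify against the API docs.
optiml_payload() {
  # $1 = dataset id
  # $2 = maximum training time, in seconds
  # $3 = maximum number of candidate models to try
  printf '{"dataset": "%s", "max_training_time": %d, "number_of_model_candidates": %d}' \
    "$1" "$2" "$3"
}

# Defined but not called here: needs BIGML_AUTH and network access.
create_limited_optiml() {
  curl "https://bigml.io/optiml?$BIGML_AUTH" \
    -X POST \
    -H 'content-type: application/json' \
    -d "$(optiml_payload "$1" "$2" "$3")"
}

# Example: one hour at most, 50 candidates at most.
# create_limited_optiml dataset/5af59f9cc7736e6b33005697 3600 50
```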

Once the OptiML has finished, you can retrieve it to get all of the models it created along with their corresponding performance metrics, whether they are logistic regressions, models (decision trees), deepnets, or ensembles. Because an OptiML might be composed of hundreds or even thousands of fields, it is possible to specify that only a subset of fields needs to be retrieved.

curl "https://bigml.io/optiml/5af5a712b95b397877000372?$BIGML_AUTH"

From the list of optimal models returned by OptiML, you can continue your Machine Learning workflow in whichever direction is most applicable. In the example below, we select the best-performing model overall (in this case, a logistic regression) and evaluate it against our original dataset. Just as easily, we could run batch predictions on new data, or build more complicated workflows with the optimized models.

curl "https://bigml.io/evaluation?$BIGML_AUTH" \
  -X POST \
  -H 'content-type: application/json' \
  -d '{"dataset": "dataset/5af5a69cb95b39787700036f",
       "logisticregression": "logisticregression/5af5af5db95b3978820001e0"}'
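Since the best model may just as well be a decision tree, a deepnet, or an ensemble, the evaluation body can be derived from the model ID itself. The sketch below assumes that the request key always matches the prefix of the resource ID (as it does for the logistic regression above); the helper name is hypothetical.

```shell
# Hypothetical helper: build an evaluation request body for any model type.
# Assumes the JSON key equals the resource id prefix, e.g.
# "logisticregression" for logisticregression/5af5af5d...
evaluation_payload() {
  # $1 = dataset id, $2 = model id
  model_type=${2%%/*}   # everything before the first slash
  printf '{"dataset": "%s", "%s": "%s"}' "$1" "$model_type" "$2"
}

# Example (not executed here):
# curl "https://bigml.io/evaluation?$BIGML_AUTH" \
#   -X POST \
#   -H 'content-type: application/json' \
#   -d "$(evaluation_payload dataset/5af5a69cb95b39787700036f \
#         logisticregression/5af5af5db95b3978820001e0)"
```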

Want to know more about OptiML?

If you have any questions or you would like to learn more about how OptiML works, please visit the release page. It includes a series of blog posts, the BigML Dashboard and API documentation, the webinar slideshow as well as the full webinar recording.