Last year, BigML launched the OptiML resource for Automatic Model Optimization. Without a doubt, it marked a milestone for our Machine Learning platform. Since then, many users have added OptiML to their Machine Learning toolboxes. However, some users have asked us to go beyond model selection, so today we’re presenting BigML’s AutoML, an Automated Machine Learning tool for the BigML platform.
This first version of AutoML helps automate the complete Machine Learning pipeline, not only model selection. To boot, it’s easy to run: give it a training dataset and a validation dataset, and it returns a Fusion of the best models it found, built with as few features as possible and ready to predict!
The returned model will be the result of three AutoML stages: Feature Generation, Feature Selection and Model Selection. AutoML will also return the evaluation of the model, in order to show the user its performance.
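To make the data flow concrete, here is a minimal Python sketch of the three stages chained together, from the two input datasets to the returned Fusion and its evaluation. It is not the WhizzML implementation: `AutoMLSteps`, `run_automl` and the stage callables are hypothetical names, and each stage is left abstract so it can be backed by whatever actually trains the models.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass
class AutoMLSteps:
    """Hypothetical container for the three AutoML stages plus the final evaluation."""
    # (train, validation) -> (augmented train, augmented validation)
    generate_features: Callable[[Any, Any], Tuple[Any, Any]]
    # (augmented train, augmented validation) -> list of selected feature names
    select_features: Callable[[Any, Any], List[str]]
    # (augmented train, augmented validation, features) -> fused model
    select_models: Callable[[Any, Any, List[str]], Any]
    # (fused model, augmented validation) -> performance metric
    evaluate: Callable[[Any, Any], float]


def run_automl(train: Any, validation: Any, steps: AutoMLSteps) -> Tuple[Any, float]:
    """Run Feature Generation, Feature Selection and Model Selection in order."""
    train_aug, valid_aug = steps.generate_features(train, validation)
    features = steps.select_features(train_aug, valid_aug)
    fusion = steps.select_models(train_aug, valid_aug, features)
    # The final model is evaluated on the (augmented) validation data.
    evaluation = steps.evaluate(fusion, valid_aug)
    return fusion, evaluation
```

Keeping each stage as a separate callable mirrors the staged design described here: any one stage can be swapped or extended without touching the others.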
AutoML is provided as a WhizzML script and library. You can find it in the public WhizzML repository on GitHub.
As mentioned, behind the scenes, BigML’s AutoML is performing three main operations: Feature Generation, Feature Selection, and Model Selection.
The first stage is Feature Generation. During this stage, AutoML adds new features to the original datasets by applying Unsupervised Learning models to them. The new synthetic features come from:
- Cluster Batch Centroids (Clustering)
- Anomaly Scores (Anomaly Detection)
- Batch Association Sets (Association Discovery): Using the objective field from your dataset as the consequent, and leverage and lift as the search_strategy
- PCA Batch Projections (Principal Component Analysis)
- Batch Topic Distributions (Topic Model): Created only when the dataset contains text fields.
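The idea behind Feature Generation can be sketched in plain Python, assuming the unsupervised models have already been fitted on the training data and are simply applied row by row. Each row keeps its original fields and gains a cluster assignment, a centroid distance, an anomaly score and a projection on one principal component. The anomaly score here is a simple normalized distance from the mean, standing in for the score BigML's Anomaly Detector would produce, and every function and field name (`generate_features`, `anomaly_score`, `PC1`, and so on) is hypothetical. Association sets and topic distributions are left out for brevity.

```python
import math
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, float]


def _euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_centroid(vec: Sequence[float], centroids: Sequence[Sequence[float]]) -> Tuple[int, float]:
    """Return (index of the closest centroid, distance to it)."""
    dists = [_euclidean(vec, c) for c in centroids]
    idx = min(range(len(dists)), key=dists.__getitem__)
    return idx, dists[idx]


def generate_features(rows: List[Row], fields: List[str],
                      centroids: Sequence[Sequence[float]],
                      mean: Sequence[float], scale: Sequence[float],
                      component: Sequence[float]) -> List[dict]:
    """Append synthetic features computed from pre-fitted (toy) unsupervised models."""
    out = []
    for row in rows:
        vec = [row[f] for f in fields]
        new = dict(row)  # keep the original fields untouched
        # Clustering: which centroid the row belongs to, and how far it is.
        new["cluster"], new["centroid_distance"] = nearest_centroid(vec, centroids)
        # Toy anomaly score: scaled distance from the mean, squashed into [0, 1).
        z = math.sqrt(sum(((v - m) / s) ** 2 for v, m, s in zip(vec, mean, scale)))
        new["anomaly_score"] = z / (1.0 + z)
        # PCA: projection of the centered row onto one given principal component.
        new["PC1"] = sum((v - m) * w for v, m, w in zip(vec, mean, component))
        out.append(new)
    return out
```

Because the same fitted models are applied to every row, the training and validation datasets end up with the same set of new columns.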
The next stage is Feature Selection. Feature Generation usually leaves us with very wide datasets, so AutoML applies Recursive Feature Elimination to remove unimportant or redundant features. If you need a refresher, you can revisit the post we shared a few months ago on this topic: AutoML follows exactly the same steps described there.
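A generic Recursive Feature Elimination loop can be sketched as follows. The hypothetical `train_and_score` callback stands in for training a model on the current features (for example, an ensemble) and evaluating it on the validation data; the loop drops the least important feature at each step and keeps the subset with the best score. This illustrates the technique only and is not the WhizzML script AutoML uses.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical callback: train on the given features and return
# (feature importances, validation score where higher is better).
TrainFn = Callable[[List[str]], Tuple[Dict[str, float], float]]


def recursive_feature_elimination(features: List[str], train_and_score: TrainFn,
                                  min_features: int = 1):
    """Remove the least important feature one at a time, tracking the best subset."""
    current = list(features)
    best_subset, best_score = list(current), float("-inf")
    history = []  # (number of features, score) for each iteration
    while True:
        importances, score = train_and_score(current)
        history.append((len(current), score))
        if score > best_score:
            best_subset, best_score = list(current), score
        if len(current) <= min_features:
            break
        # Retraining happens on the next iteration with one feature fewer.
        weakest = min(current, key=lambda f: importances[f])
        current.remove(weakest)
    return best_subset, best_score, history
```

Removing one feature per iteration is the simplest variant; removing several at a time trades some precision for fewer training runs on very wide datasets.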
The final stage is Model Selection. Guess what? OptiML helps us with this task, leveraging the full power of Bayesian Optimization to find the best possible models. At the end of the process, AutoML takes only the best models evaluated by OptiML and combines them into a Fusion (e.g., the top-performing Deepnet plus the top tree ensemble). Finally, the Fusion is evaluated against the validation dataset to record and display its performance for the end user.
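Selecting the top models and combining them can be sketched like this, assuming each model outputs class probabilities and that the Fusion averages them (optionally with weights). The helpers `top_models`, `fuse`, `fused_label` and `accuracy` are hypothetical, not BigML API calls, and accuracy is just one possible metric for the final evaluation on the validation dataset.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Probs = Dict[str, float]


def top_models(scored: Sequence[Tuple[str, float]], k: int) -> List[Tuple[str, float]]:
    """Return the k (model name, validation score) pairs with the highest score."""
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


def fuse(predictions: Sequence[Probs], weights: Optional[Sequence[float]] = None) -> Probs:
    """Weighted average of per-class probabilities from several models."""
    if weights is None:
        weights = [1.0] * len(predictions)
    total = sum(weights)
    classes = set().union(*predictions)  # every class seen by any model
    return {c: sum(w * p.get(c, 0.0) for w, p in zip(weights, predictions)) / total
            for c in classes}


def fused_label(predictions: Sequence[Probs], weights: Optional[Sequence[float]] = None) -> str:
    """Predicted class: the one with the highest fused probability."""
    probs = fuse(predictions, weights)
    return max(probs, key=probs.get)


def accuracy(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Fraction of validation rows where the fused prediction matches the label."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)
```

Averaging lets models of different types (say, a Deepnet and a tree ensemble) compensate for each other's errors, which is why a Fusion of the top few can outperform any single one of them.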
It’s your turn now…
This is only the opening salvo of Automated Machine Learning within BigML. If you want to start using AutoML, check the public WhizzML repository on GitHub, where you will find all the information needed to install and run it. AutoML is just one example of how WhizzML lets us extend BigML’s capabilities, and anybody can do the same: the entire AutoML code is public, so you can modify or extend it to fit your specific needs. You asked us to go one step further, so we did, and now we’d love to hear your feedback on how to improve it further. In upcoming blog posts, we will also showcase AutoML in action with some example use cases. Until then, stay tuned!