Programming Fusions with the BigML API

As part of our Fusions release, we have already demonstrated a use case and walked through an example using the BigML Dashboard. This fourth of six blog posts on Fusions demonstrates how to use Fusions by calling the BigML REST API directly. As a reminder, Fusions can be used for both classification and regression problems in supervised Machine Learning. They work by aggregating the results of multiple models (decision trees, ensembles, logistic regressions, and/or deepnets), often achieving better performance as a result.

Authentication

Using the BigML API requires that you first set up the correct environment variables. In your .bash_profile, you must set BIGML_USERNAME, BIGML_API_KEY, and BIGML_AUTH to the correct values. BIGML_USERNAME is simply your BigML username. The BIGML_API_KEY can be found on the Dashboard by clicking on your username to pull up the account page, and then clicking on ‘API Key’. BIGML_AUTH combines the two into the authentication string that is appended to every API request.
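
As a rough sketch, the three variables might be set as follows. The username and key shown here are placeholders, and the `username=...;api_key=...` form of BIGML_AUTH follows the convention described in the BigML API documentation, so check it against your own account page before relying on it.

```shell
# Placeholder credentials: substitute your own username and API key.
export BIGML_USERNAME="my_username"
export BIGML_API_KEY="0123456789abcdef0123456789abcdef01234567"

# BIGML_AUTH joins both values into the query string used in every request URL.
export BIGML_AUTH="username=$BIGML_USERNAME;api_key=$BIGML_API_KEY"

echo "$BIGML_AUTH"
```

Because each request URL in this post is wrapped in double quotes, the semicolon inside BIGML_AUTH is passed through to curl rather than being interpreted by the shell.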

Upload your Data and Create a Dataset

For this tutorial we are using the same dataset of home sales from the Redfin search engine used in our previous BigML Dashboard tutorial, which is also available in the BigML Gallery. Preparing our data for Machine Learning requires two major steps: creating a source, and then creating a dataset from that source. It is important to make sure that the objective field of the dataset is “LAST SALE PRICE” before creating any predictive models.

curl "https://bigml.io/source?$BIGML_AUTH" -F file=@Redfin.csv

curl "https://bigml.io/dataset?$BIGML_AUTH" \
  -X POST \
  -H 'content-type: application/json' \
  -d '{"source": "source/5b3fa219983efc5ae5000055"}'
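
The source ID in the second request comes from the JSON returned by the first one, which carries it under a `resource` key. The sketch below shows one way to chain the two steps without copying the ID by hand. The helper name `extract_resource_id` and the sample response are illustrative assumptions, and the sed pattern is a simplification that expects the key to appear once per line.

```shell
# Pull the value of the "resource" key out of a BigML JSON response.
# A sed-based sketch; a JSON-aware tool is more robust for real use.
extract_resource_id() {
  printf '%s' "$1" | sed -n 's/.*"resource"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Sample response fragment (illustrative, not real API output).
response='{"code": 201, "resource": "source/5b3fa219983efc5ae5000055", "name": "Redfin.csv"}'

# Build the body for the dataset request from the extracted source ID.
source_id=$(extract_resource_id "$response")
payload=$(printf '{"source": "%s"}' "$source_id")
echo "$payload"
```

In a real run, `response` would hold the output of the source-creation curl call (for example, captured with `$(curl -s ...)`), and `$payload` would be passed to the dataset request with `-d "$payload"`.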

Create Two Simple Models

Because Fusions are aggregates of component models, we first need to create these models. In this case, we will create both an ensemble model and a deepnet model using the default parameters for each. Fusions typically work best when the component models are both high-performing and diverse.

curl "https://bigml.io/ensemble?$BIGML_AUTH" \
  -X POST \
  -H 'content-type: application/json' \
  -d '{"dataset": "dataset/5b3fa2ec983efc5bde000037"}'

curl "https://bigml.io/deepnet?$BIGML_AUTH" \
  -X POST \
  -H 'content-type: application/json' \
  -d '{"dataset": "dataset/5b3fa2ec983efc5bde000037"}'

Create your Fusion from Existing Models

Creating the Fusion is as straightforward as providing the resource IDs of the models that you would like to include. Here we select both the ensemble and the deepnet created in the previous step and weight them equally. However, it is possible to include any number of models in a Fusion, including other Fusions, and to adjust the weight of each model in the final result.

curl "https://bigml.io/fusion?$BIGML_AUTH" \
  -X POST \
  -H 'content-type: application/json' \
  -d '{"models": [
  "ensemble/5b4003a1983efc5a8e00002b",
  "deepnet/5b4003b3983efc5bde00005f"]}'
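
As a sketch of unequal weighting: the BigML API documentation describes passing each model as a map with an `id` and a `weight` instead of a bare ID. The weights below (2 and 1) are arbitrary illustrations, and the exact argument shape should be verified against the Fusion API reference before use.

```shell
# Build a Fusion payload in which the ensemble counts twice as much as the deepnet.
# The weights are arbitrary; the {"id", "weight"} shape is assumed from the API docs.
ensemble_id="ensemble/5b4003a1983efc5a8e00002b"
deepnet_id="deepnet/5b4003b3983efc5bde00005f"

fusion_payload=$(printf '{"models": [{"id": "%s", "weight": 2}, {"id": "%s", "weight": 1}]}' \
  "$ensemble_id" "$deepnet_id")
echo "$fusion_payload"

# To submit it (requires BIGML_AUTH and network access):
# curl "https://bigml.io/fusion?$BIGML_AUTH" \
#   -X POST \
#   -H 'content-type: application/json' \
#   -d "$fusion_payload"
```

Building the payload in a variable first makes it easy to inspect the JSON before sending it.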

Evaluate your Fusion

Fusions are designed for both classification and regression problems, and can be evaluated to obtain performance metrics, as well as to investigate aspects of the model such as field importance. A Fusion evaluation requires specifying both the trained Fusion and the dataset to evaluate it against.

curl "https://bigml.io/evaluation?$BIGML_AUTH" \
  -X POST \
  -H 'content-type: application/json' \
  -d '{"fusion": "fusion/5b40d4a4983efc5c13000003",
  "dataset": "dataset/5b3fa2ec983efc5bde000037"}'

Create and Retrieve a Fusion Prediction

Once created, Fusions function like any other class of predictive model in BigML with regard to predictions. In the example below, we provide values for SQFT, BEDS, and BATHS as input data and then retrieve the result, which should yield $395,470 in this case. Of course, Fusions also support evaluations and batch predictions, and further examples can be found in the BigML API Documentation.

curl "https://bigml.io/prediction?$BIGML_AUTH" \
  -X POST \
  -H 'content-type: application/json' \
  -d '{"fusion": "fusion/5b40d4a4983efc5c13000003", 
  "input_data": {"SQFT": 3000, "BEDS": 3, "BATHS": 2}}'

curl "https://bigml.io/prediction/5b40f017983efc5ae50000c2?$BIGML_AUTH"

Want to know more about Fusions?

Stay tuned for the next blog post to learn how to automate Fusions with WhizzML and the BigML Python Bindings. If you have any questions or you would like to learn more about how Fusions work, please visit the release page. It includes a series of blog posts, the BigML Dashboard and API documentation, the webinar slideshow as well as the full webinar recording.

 
