It is no surprise by now that we are having to deal with lots of data in many different formats in our everyday life. From Database Managers taming growing quantities of data in large companies to the handy spreadsheet that can work just fine for small tasks or personal use, BigML has been on a mission to bring Machine Learning predictions to every dataset. In that spirit, we continue the Google integration theme with news on our Google Sheets add-on.
Google Sheets is a truly wonderful tool to store your datasets. It is fully functional as a spreadsheet, but it turns out that you can still improve its utility by taking advantage of add-ons. The add-ons are macro-like Google Apps Scripts that can interact with the contents of your Google Sheet (or Docs or Forms) and automate repetitive tasks, or connect to other services. At BigML, we’ve built our own add-on that will let you use your models or clusters to add predictions to your data in Google Sheets.
BigML users already know how easy it is to start making predictions in BigML. Basically, you register, upload your data to BigML (it can be in CSV local or remote files, Excel spreadsheets, inline data etc.) and in one click build a dataset, where all the statistical information is summarized. With a second click, you can build a model, where the hidden patterns in your data are unveiled. Those rules can later be used to predict the contents of empty fields in new data instances. With our add-on, it’s now possible to perform those predictions directly in your Google Sheet.
The wine shop use case
The first time you login to BigML, you land in a development area with a bunch of sample data sources available for you to play with at no cost. Let’s use one of these to build an example: the fictional wine sales dataset. It contains historical wine sales figures and the related features for each wine such as the country, type of grape, rating, origin, and price. Imagine you want to carry new wines in your store. It would be great to have an estimate of the total sales you can expect from each new wine, so that you can choose the ones that will sell better, right?
Using the above dataset, you can easily create a BigML decision tree model that can predict the total sales for a wine given its features. Thus, for every new wine, you can use the model to compute the expected total sales and choose the new wines most likely to maximize your revenue. But what if your list of new wines is in a Google Sheet? Good news! You can also use your BigML model from within your Google sheet to quickly compute the predicted sales values for the new wines.
Using BigML models from Google Sheets
To use this functionality, you’ll need to first install the BigML add-on (available under the add-ons menu in Google Sheets). Once installed, it will appear under the add-ons menu as seen below. You can now choose the ‘Predict’ submenu item, which will display the form needed to access all your models and clusters in BigML (provided that you’ve authenticated with your BigML credentials). In this case, you’ll sort through your list of models and select the one that was built on your historical wine sales data. Finally, you’ll be ready to add predictions to the empty cells in your Google Sheet.
To do this, select the range of cells that contain the information available for your new wines list. Pressing the ‘Predict’ button on the right-hand side panel, the prediction for each row will be placed in the next empty cell on the right, and the associated error will also be appended in a second column. In this example, the prediction has been a number, but you can add predictions for categorical fields just as easily:
So how does the BigML add-on work behind the scenes? The add-on code is executed in Google Apps Script servers. Google Apps Script code can connect to BigML and download your models to those servers (after validating your credentials in BigML). It also can interact with your Google Sheet. The BigML model you choose is downloaded to the Google Apps Script server environment, where the script runs each row in your selected range through the model and updates the cells in your sheet with the computed predictions. Thus, no data in your sheet has to reach BigML to obtain predictions. It stays in Google servers the whole time. This video shows the basic steps for this and other examples dealing with categorical models or clusters. Give it a try!