More Machine Learning in your Google Sheets

It’s been a while since we released the first version of BigML’s add-on for Google Sheets. The post announcing it described how to add predictions to Google Sheets cells using BigML’s Decision Trees. It was also possible to segment the rows of a spreadsheet using Clustering Models previously created in BigML.

Over the years, BigML has added new supervised and unsupervised models to its portfolio of native resources, and the add-on has been steadily updated to include most of them, such as Logistic and Linear Regressions. Until now, however, it was not possible to upload the information in a spreadsheet to BigML or to download data from BigML into it. Instead, the models in BigML were downloaded to Google’s machines, where predictions were computed. That imposed some limitations, because Google limits the size of the objects that can be downloaded. As a result, heavier models like Deepnets or Anomaly Detectors could not be added to the add-on’s model list.

We’re happy to share that the latest version of BigML’s add-on overcomes these limits, giving users more flexibility and options. This video gives a quick taste of the add-on functions explained in this post.

Uploading the information from Google Sheets to BigML produces a Source resource, which contains the data dictionary describing how the data is parsed, i.e., the number of fields, their names, and their types. From the Source, you can build a Dataset containing the values of the fields. That opens up plenty of possibilities to extract insights from your data, because Datasets are the starting point for all Machine Learning procedures, like modeling, scoring, or evaluating.
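To make the idea of a data dictionary more concrete, here is a minimal Python sketch that guesses the name and a coarse type for each column of a small CSV export. It is only an illustration: the function names, the sample rows, and the heuristic for telling categorical from text fields are assumptions made for this example, not how BigML actually parses a Source.

```python
import csv
import io


def infer_type(values):
    """Guess a coarse field type from a column's string values (toy heuristic)."""
    values = [v.strip() for v in values if v.strip()]
    if not values:
        return "unknown"
    try:
        for v in values:
            float(v)
        return "numeric"
    except ValueError:
        pass
    # Assumption: a few short, repeated labels suggest a categorical field;
    # anything else is treated as free text.
    distinct = len(set(values))
    short = all(len(v.split()) <= 2 for v in values)
    if short and distinct <= max(2, len(values) // 2):
        return "categorical"
    return "text"


def describe_fields(csv_text):
    """Return one entry per column: its position, name, and guessed type."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    return [
        {
            "column": i,
            "name": name,
            "type": infer_type([r[i] for r in body if i < len(r)]),
        }
        for i, name in enumerate(header)
    ]


# Hypothetical sample resembling a sheet of labeled opinions.
sample = (
    "text,sentiment,likes\n"
    "Loved every minute of it,positive,12\n"
    "Too long and confusing,negative,3\n"
    "A stunning piece of cinema,positive,40\n"
    "Not my kind of film,negative,0\n"
)

for field in describe_fields(sample):
    print(field)
```

A real Source also records details such as locale and missing-value tokens, which this sketch leaves out.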

Let’s learn about the new capabilities in the add-on by example. I was curious about Tenet, the latest film by Christopher Nolan, so I searched for tweets talking about Nolan’s films and created a small sample in a Google Sheet.

My goal is to predict the sentiment associated with each sentence, so that I don’t need to read the opinions to know whether the movie is worth seeing. To do that, we need a large enough Dataset that contains sentences and the sentiment label (positive or negative) associated with each one. In BigML’s datasets gallery, we can easily find a Review Text Sentiment dataset that fortunately seems to fit that description:

We can clone the dataset to our account by clicking on the FREE label that you see in the top right corner. Once the dataset is cloned, we can inspect the kind of information that it contains.

There are two fields: sentiment, a categorical field that contains only two labels (positive or negative), and text, a text field that contains the sentences that have been previously labeled and will be used as training data. We can also see the kind of topics discussed by looking at the text field tag cloud.

We observe that the dataset contains opinions about movies and they are already classified as positive or negative. That’s exactly what the algorithm needs to build a model to predict the sentiment associated with a particular sentence. Therefore, we can create a Deepnet in 1-click:

The next step is to use what the Deepnet has learned from the Review Text Sentiment dataset to assign a label to the opinions in our sheet. BigML’s add-on lets us locate the Deepnet we just created. Simply select the Start action in the add-on menu and search for Deepnets in the dropdown.

The list of your Deepnets will appear. Click on the link of the Review Text Sentiment Deepnet to open the predict view. When you press the predict button, the add-on sends every sentence to BigML, runs it through the model, and brings back the corresponding sentiment label along with the confidence associated with each prediction.

Of course, this one-by-one process can be slow if you need to classify a lot of rows. In this case, a different approach is recommended. Open the add-on menu and use the Upload to BigML action to upload the contents of the active Sheet to BigML, where a Source will be created.

The Source’s view menu allows creating a Dataset in 1-click, summarizing all the contents of your sheet.

At this point, you’re ready to go back to the Deepnet view, where the actions menu offers a Batch Prediction action.

It applies the selected model to each row of your dataset and adds a new column with the prediction results. In the combo box on the right, select the Dataset that was created after uploading your active sheet to BigML; the list of datasets appears as you type the first characters of the sheet’s name. Then press the Predict button once it becomes active.

There you are! A new dataset with a sentiment column appended is ready for you in BigML. You just need to download it to Google Sheets. To do this, open the add-on menu and select the Download from BigML action.

The newly created dataset should appear first in the list. Click the link to download the information.

A new Sheet will appear in your file with both the original sentences and the sentiment label associated with them.
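For readers who prefer scripting, the same upload, dataset, batch prediction, and download sequence can be sketched in Python. The function below expects an object that behaves like the client in BigML’s Python bindings (`bigml.api.BigML`); the method names follow those bindings as I understand them, while the file names, the placeholder Deepnet ID, and the batch prediction arguments are assumptions for illustration. A small recording stand-in is included so the sketch runs without credentials.

```python
def batch_predict_sheet(api, csv_path, deepnet_id, output_path):
    """Upload a CSV export of the sheet, score every row, download the result.

    `api` is assumed to expose the same methods as bigml.api.BigML.
    """
    source = api.create_source(csv_path)      # uploaded file becomes a Source
    api.ok(source)                            # wait until the Source is ready
    dataset = api.create_dataset(source)      # Source becomes a Dataset
    api.ok(dataset)
    batch = api.create_batch_prediction(
        deepnet_id,
        dataset,
        # Assumed arguments: keep the original columns and a header row.
        {"all_fields": True, "header": True},
    )
    api.ok(batch)
    api.download_batch_prediction(batch, filename=output_path)
    return batch


class RecordingApi:
    """Stand-in client that only records method names (for a dry run)."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        def method(*args, **kwargs):
            self.calls.append(name)
            return {"resource": f"{name}/dry-run"}
        return method


# Dry run with hypothetical file names and a placeholder Deepnet ID.
dry_run = RecordingApi()
batch_predict_sheet(
    dry_run, "nolan_tweets.csv", "deepnet/<your-deepnet-id>", "scored.csv"
)
print(dry_run.calls)
```

With the bindings installed and credentials configured, passing a real `BigML()` client instead of `RecordingApi()` should perform the actual calls; check the bindings’ documentation for the exact arguments before relying on it.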

Of course, the amount of data that can be uploaded or downloaded using the add-on is limited. Google sets different limits depending on the type of Google account you have. Still, you can always upload any amount of data by creating a CSV and dragging and dropping it into BigML. Similarly, any Batch Prediction can be downloaded from BigML directly as a CSV.
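If you take the CSV route, the main thing to get right is quoting, since opinions often contain commas and quotation marks. As a minimal sketch using only Python’s standard library (the helper name and sample rows are made up for this example):

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize rows (lists of cell values) to CSV text with safe quoting."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical rows: a header plus two opinions with awkward characters.
rows = [
    ["text"],
    ["Loved it, twice"],
    ['He said "wow" and left'],
]
print(rows_to_csv(rows))
```

The `csv` module quotes any cell containing a comma or a quotation mark, so each opinion stays in a single field when the file is parsed again.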

As you can see, the new options in BigML’s add-on for Google Sheets are easy to use. They also let you enrich your data with the insights that can be drawn from the entire set of models and workflows available in BigML. What are you waiting for? Give it a try and let us know how you like it!