Enhancing your Web Browsing Experience with Machine Learned Models (part II)
In the first part of this series, we saw how to insert machine-learned model predictions into a webpage on the fly using a content script browser extension that grabs the relevant pieces of the webpage and feeds them into a BigML actionable model. At the conclusion of that post, I noted that actionable models were very straightforward to use, but that their static nature would make maintaining the browser extension laborious if the underlying model were constantly updated with new data. In this post, we’ll see how to use BigML’s powerful API both to keep your models up to date and to write a browser extension that stays current with your latest models.
Staying up to date
In keeping with our previous post, we will be working on a model to predict the repayment status of loans on the microfinance site Kiva. In the last post, we used a model that was trained nearly one year ago on a snapshot of the Kiva database. Hundreds of new loans become available every day, which means this model is starting to look a bit long in the tooth. We’ll start off by learning a fresh model from the latest Kiva database snapshot, and in doing so, we’ll get to use some of the new features we’ve added to BigML over the last year, such as objective weighting and text analysis. However, the Kiva database snapshot is several gigabytes of JSON-encoded data, much of which won’t be relevant to our model, so we really don’t want to relearn from a new snapshot several months down the line when our hot new model becomes stale.

Fortunately, we can avoid this situation using multi-datasets. Periodically, say once a month, we’ll grab the newest loan data from Kiva and build a BigML dataset. With multi-datasets, we can take our monthly dataset, our original snapshot dataset, and all the previous monthly datasets, concatenate them together, and learn a model from the whole thing. With the BigML API, this all happens behind the scenes; all we need to do is supply a list of the individual datasets. We can do all of this with a little Python script. Here is the main loop:
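A sketch of what that main loop can look like, using only the standard library against the BigML REST API. The query arguments follow BigML’s listing conventions, and the two `create_*_dataset` helpers are hypothetical stand-ins for the snapshot and incremental-update steps:

```python
import json
import os
from urllib.request import Request, urlopen

BIGML_URL = "https://bigml.io"
# Credentials come from the environment; these variable names are a
# convention of ours, not something the BigML API mandates.
AUTH = "username=%s;api_key=%s" % (os.environ.get("BIGML_USERNAME", ""),
                                   os.environ.get("BIGML_API_KEY", ""))

def list_kiva_datasets():
    """Return the resource IDs of every dataset named 'kiva-data'."""
    with urlopen("%s/dataset?%s;name=kiva-data" % (BIGML_URL, AUTH)) as resp:
        listing = json.load(resp)
    return [obj["resource"] for obj in listing.get("objects", [])]

def needs_base_dataset(dataset_ids):
    """Only build the full snapshot dataset when no 'kiva-data' datasets exist yet."""
    return len(dataset_ids) == 0

def create_model(dataset_ids):
    """Create a multi-dataset model by passing the whole list under 'datasets'."""
    body = json.dumps({"datasets": dataset_ids, "name": "kiva-model"}).encode()
    req = Request("%s/model?%s" % (BIGML_URL, AUTH), data=body,
                  headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        return json.load(resp)

def main():
    datasets = list_kiva_datasets()
    if needs_base_dataset(datasets):
        # Hypothetical helper: build a dataset from the full Kiva snapshot.
        datasets = [create_snapshot_dataset()]
    else:
        # Hypothetical helper: build an incremental dataset from the Kiva API.
        datasets.append(create_update_dataset())
    create_model(datasets)
```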
Every dataset created by this script will have “kiva-data” as its name, so we can use the BigML API to list the available datasets and filter by that name. If none show up, then we know we need to create a base dataset from a Kiva snapshot; otherwise, we’ll use the Kiva API to create an incremental update dataset. In either case, we then proceed to create a new model using multi-datasets. All we need to do is pass a list of our desired dataset resource IDs. We assign the name “kiva-model” to the model so that it can be easily found by our browser extension. We employ a few other minor tricks in our script, such as avoiding throttling in the Kiva API. You can check out the whole thing at this handy git repo.
API-powered browser extension
Our first browser extension was a content script that fired whenever we navigated to a Kiva loan page. It would grab the loan information, feed it to the model, and then use jQuery to insert a status indicator into the webpage’s DOM tree. Our new extension won’t be very different with regards to loan data scraping and DOM manipulation; the main difference will be how the model predictions are generated. Whereas our first extension featured an actionable model, i.e. a series of nested if-else statements, our new extension will perform predictions with a live BigML model, using the RESTful API. To interface with the model, our extension now needs to know about our BigML credentials. To store them, we create a simple configuration page for the extension, consisting of two input boxes and a Save button.
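The logic behind that page can be sketched as follows. The element ids are assumptions about the configuration page’s markup, and the `chrome.*` calls are guarded so the helper can also be exercised outside an extension context:

```javascript
// Build the auth query string BigML expects: username and API key,
// joined with a semicolon. Kept pure so it is easy to reuse.
function authQuery(username, apiKey) {
  return "username=" + encodeURIComponent(username) +
         ";api_key=" + encodeURIComponent(apiKey);
}

// Wire up the Save button on the configuration page (element ids are
// assumptions). Guarded so this file also loads outside an extension.
if (typeof document !== "undefined" && document.getElementById("save")) {
  document.getElementById("save").addEventListener("click", function () {
    var creds = {
      username: document.getElementById("username").value,
      apiKey: document.getElementById("api-key").value
    };
    chrome.storage.sync.set(creds, function () {
      // Ask the event page to fetch the newest model for these credentials.
      chrome.runtime.sendMessage({greeting: "fetchmodel"});
    });
  });
}
```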
Whenever we save a new set of credentials, we want to look up the most recent Kiva loans model in that BigML account. However, this isn’t the only context in which we want to fetch models. For instance, it would be a good idea to get the latest model every time the web browser is started up. We’ll implement our model fetching procedure inside an event page, so it can be accessed from any part of the extension.
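A condensed sketch of such an event page is below. The `name`, `order_by`, and `limit` query arguments follow BigML’s listing API, and caching the model id in `chrome.storage.local` is our own design choice, not something the API requires:

```javascript
var BIGML_URL = "https://bigml.io";

// URL listing models named "kiva-model", newest first, one result.
function latestModelUrl(auth) {
  return BIGML_URL + "/model?" + auth +
         ";name=kiva-model;order_by=-created;limit=1";
}

// Fetch the newest model id and cache it for the content script.
function fetchModel() {
  chrome.storage.sync.get(["username", "apiKey"], function (creds) {
    var auth = "username=" + creds.username + ";api_key=" + creds.apiKey;
    fetch(latestModelUrl(auth))
      .then(function (resp) { return resp.json(); })
      .then(function (listing) {
        if (listing.objects && listing.objects.length > 0) {
          chrome.storage.local.set({model: listing.objects[0].resource});
        }
      });
  });
}

// Guarded so the pure helper above can be loaded outside an extension.
if (typeof chrome !== "undefined" && chrome.runtime) {
  chrome.runtime.onStartup.addListener(fetchModel);
  chrome.runtime.onMessage.addListener(function (msg) {
    if (msg.greeting === "fetchmodel") { fetchModel(); }
  });
  chrome.runtime.onInstalled.addListener(function () {
    chrome.runtime.openOptionsPage();  // first run: ask for credentials
  });
}
```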
At the bottom of the event page, we see that it listens for the browser startup event, and for any message with the greeting “fetchmodel”, to fire the model fetching procedure. We also see that it listens for the onInstalled event to open up the configuration page when the extension is first installed. With this extra infrastructure in place, we are ready to make our modifications to the content script. Here is the script in its entirety:
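In place of the original listing, here is a condensed sketch of the content script. The CSS selectors, the field names in the scraped record, and the `insertStatusIndicator` helper are all assumptions about the page and our own code:

```javascript
// Scrape the loan fields from the Kiva loan page (selectors and field
// names are assumptions; check them against your model's fields).
function scrapeLoanData() {
  return {
    "Loan Amount": parseFloat($(".loan-amount").text().replace(/[$,]/g, "")),
    "Sector": $(".loan-sector").text().trim(),
    "Country": $(".loan-country").text().trim()
  };
}

// The prediction request names the model and supplies the scraped inputs.
function buildPredictionPayload(modelId, inputData) {
  return {model: modelId, input_data: inputData};
}

// POST the loan data to BigML's prediction endpoint with AJAX.
function predictStatus(auth, modelId, loanData) {
  $.ajax({
    url: "https://bigml.io/prediction?" + auth,
    type: "POST",
    contentType: "application/json",
    data: JSON.stringify(buildPredictionPayload(modelId, loanData)),
    success: function (prediction) {
      // Hypothetical helper doing the jQuery DOM insertion described above.
      insertStatusIndicator(prediction);
    }
  });
}

// Kick things off with the credentials and model id cached by the event page.
if (typeof chrome !== "undefined" && chrome.storage) {
  chrome.storage.sync.get(["username", "apiKey"], function (creds) {
    chrome.storage.local.get(["model"], function (cached) {
      var auth = "username=" + creds.username + ";api_key=" + creds.apiKey;
      predictStatus(auth, cached.model, scrapeLoanData());
    });
  });
}
```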
Compared to the previous version of the extension, the bulk of the differences lie in the predictStatus function. Rather than evaluating a hard-coded nested if-then-else structure, the loan data is POSTed to the BigML prediction URL with AJAX, and most of the DOM manipulation has been refactored into the AJAX callback function. One perk of using a live model is that we get a confidence value along with our prediction of the loan status. We’ve used that to add a nice little meter alongside our status indicator icon.
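Turning the response into that meter can be as simple as the sketch below. The `output` and `confidence` field names follow the BigML prediction resource as we understand it, and are worth double-checking against the API documentation:

```javascript
// Pull the predicted class and its confidence out of the prediction
// resource returned by BigML (field names worth verifying).
function extractPrediction(resp) {
  return {status: resp.output, confidence: resp.confidence};
}

// Width, as a CSS percentage, for the confidence meter next to the
// status icon; clamps the confidence into [0, 1] before scaling.
function meterWidth(confidence) {
  return Math.round(Math.max(0, Math.min(1, confidence)) * 100) + "%";
}
```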
Better browsing through big data
You can grab the source code for this Chrome browser extension here, where it is also available as a Greasemonkey user script for Firefox. We hope that this post, and its prequel, have shown the relative ease with which BigML models can be incorporated into content scripts and browser extensions. Beyond this kiva.org example, there are myriad potential applications on the internet where machine-learned models can provide a richer and more informed web browsing experience. It’s now up to you to use what you’ve learned here to go out and realize those applications!