
R you ready for Big Machine Learning?

by on May 10, 2012

Recently, we released Python bindings for our API. We received fantastic feedback on the related blog post from Hacker News and Twitter, so we started thinking about other languages that could benefit from tighter integration with the BigML API.

Today BigML releases the bigml package for R. R is already well known for its capabilities in statistics and data analysis, and we use it internally for a variety of day-to-day tasks. The bigml package lets the R community take advantage of our highly scalable, cloud-based machine learning infrastructure while using familiar R data structures and workflows.

The package is available on CRAN.

To use it, install it from CRAN as usual and load it with library(). You will also need to set your credentials, either by calling setCredentials() or by setting BIGML_USERNAME and BIGML_API_KEY in your .Renviron file.

install.packages("bigml")
library(bigml)
setCredentials("username", "api_key")
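If you prefer the .Renviron route, that file holds one NAME=value pair per line, and R reads ~/.Renviron when a session starts. The sketch below uses only base R: it writes the two variables to a temporary file (a stand-in for your real ~/.Renviron) and loads it with readRenviron(), so you can check the format without restarting R. The values are the same placeholders used above.

```r
# Stand-in for ~/.Renviron: one NAME=value pair per line
renviron <- tempfile(fileext = ".Renviron")
writeLines(c("BIGML_USERNAME=username",
             "BIGML_API_KEY=api_key"), renviron)

# Load the variables into the current session (R does this itself at startup)
readRenviron(renviron)

# Confirm both credentials are visible to the session
creds <- Sys.getenv(c("BIGML_USERNAME", "BIGML_API_KEY"))
print(creds)
```

In practice, put the two lines in ~/.Renviron with your real username and API key, then restart R.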

If this is the first time you’ve used cURL (through the RCurl package) from R, you may need to update your SSL certificates. You can do that from within R:

library(RCurl)
# Download a current CA certificate bundle and tell RCurl to verify against it
download.file(url = "http://curl.haxx.se/ca/cacert.pem", destfile = "cacert.pem")
options(RCurlOptions = list(cainfo = normalizePath("cacert.pem")))
# Only needed behind a proxy: replace proxyserver:port with your own proxy
curl <- getCurlHandle()
curlSetOpt(.opts = list(proxy = "proxyserver:port"), curl = curl)
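A failed or intercepted download (for example, an HTML error page returned by a proxy) leaves you with a cacert.pem that contains no certificates at all. The small helper below is our own hypothetical addition, not part of bigml or RCurl. It counts the PEM certificate headers in a file using base R, so a result of zero suggests the bundle should be downloaded again.

```r
# Hypothetical helper (not part of bigml or RCurl): count the certificates
# in a PEM bundle by counting its BEGIN CERTIFICATE markers
countCertificates <- function(path) {
  if (!file.exists(path)) return(0L)
  lines <- readLines(path, warn = FALSE)
  sum(grepl("^-----BEGIN CERTIFICATE-----", lines))
}

# Example with a tiny two-entry file standing in for cacert.pem
pem <- tempfile(fileext = ".pem")
writeLines(c("-----BEGIN CERTIFICATE-----", "MIIB...", "-----END CERTIFICATE-----",
             "-----BEGIN CERTIFICATE-----", "MIIC...", "-----END CERTIFICATE-----"),
           pem)
countCertificates(pem)
```

After the real download, countCertificates("cacert.pem") should return a large positive number.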

R’s dataframes are very similar to BigML datasets. The bigml package lets you build a model directly from an existing R dataframe.

So, let’s create a model from the iris data set that comes with R, and make a prediction with it.

iris.model = quickModel(iris, objective_field = "Species")
quickPrediction(iris.model, c(Sepal.Length = 2, Petal.Length = 3, Petal.Width = 2))
# [1] "virginica"

That’s it!

The bigml package manages the creation of the appropriate sources, datasets, and models behind the scenes. It also lets you refer to input and objective fields by their column names in the original dataframe.
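Because field names come straight from the dataframe's column names, you can also build prediction inputs from existing rows instead of typing the vector by hand. This base-R sketch takes every column except the objective, Species, from the first row of iris and produces a named numeric vector of the same shape as the one passed to quickPrediction above. We assume, rather than guarantee, that quickPrediction accepts any such named vector.

```r
# Column names of the dataframe double as BigML field names
objective <- "Species"
inputs <- setdiff(names(iris), objective)
print(inputs)

# One row as a named numeric vector: the names are the input fields
# (Sepal.Length 5.1, Sepal.Width 3.5, Petal.Length 1.4, Petal.Width 0.2)
input <- unlist(iris[1, inputs])
print(input)
```

With a model in hand, quickPrediction(iris.model, input) should then request a prediction for that flower.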

The “quick” methods are geared towards working with native R resources and handle most steps automatically. If you prefer more manual control over each step, the basic methods are also available:

write.csv(iris, "~/iris.csv", row.names = FALSE)
iris.source = createSource("~/iris.csv", 'iris')
iris.dataset = createDataset(iris.source$resource)
iris.model = createModel(iris.dataset$resource)
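The row.names = FALSE argument matters here: by default write.csv() adds an unnamed first column holding the row names, which would become an extra, meaningless field in the BigML source. This base-R check writes to temporary files rather than ~/iris.csv and shows the difference.

```r
# Without row names: the CSV has exactly the dataframe's five columns
clean <- tempfile(fileext = ".csv")
write.csv(iris, clean, row.names = FALSE)

# With the default row names: an extra unnamed first column appears
padded <- tempfile(fileext = ".csv")
write.csv(iris, padded)

print(readLines(clean, n = 1))
print(readLines(padded, n = 1))
ncol(read.csv(clean))   # 5
ncol(read.csv(padded))  # 6
```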

After you create models, datasets, and other resources, you can easily retrieve a list of them with the package as well. The bigml package converts the API results into plain R dataframes, giving you a well-structured record of every resource you have created at BigML:

listSources()
# e.g.
# name updated number_of_datasets etc...
#1 iris 2012-05-08T20:58:01.916000 1 ...
#2 test 2012-05-08T20:40:28.846000 1 ...
#3 census 2012-05-08T20:34:53.398000 1 ...
listDatasets()
listModels()
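Since the listings are ordinary dataframes, the usual R tools apply. The sketch below rebuilds the three example rows shown above by hand (real results will have more columns), parses the updated timestamps, sorts the sources oldest first, and picks one out by name. It assumes the updated column comes back as a character string in that ISO 8601 format; we parse it as UTC only because the example values carry no time zone.

```r
# Hand-built stand-in for the example listSources() output above
sources <- data.frame(
  name = c("iris", "test", "census"),
  updated = c("2012-05-08T20:58:01.916000",
              "2012-05-08T20:40:28.846000",
              "2012-05-08T20:34:53.398000"),
  number_of_datasets = c(1, 1, 1),
  stringsAsFactors = FALSE
)

# Parse the timestamps (%OS keeps the fractional seconds); UTC is an assumption
sources$updated <- as.POSIXct(sources$updated,
                              format = "%Y-%m-%dT%H:%M:%OS", tz = "UTC")

# Oldest source first
oldest <- sources[order(sources$updated), ]
print(oldest$name)

# Pick out a single source by name
subset(sources, name == "census")
```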

There are many more options available in the package, so be sure to check out the documentation for the individual functions and for the package as a whole. It is available from within R:

?bigml
?quickModel
?listModels

We’ve also made the package source code available in the bigml-r repo on GitHub, in case you’re interested in following its progress or making suggestions.
