R you ready for Big Machine Learning?


Recently, we released Python bindings for our API.  We received fantastic feedback on the related blog post from Hacker News and Twitter, so we started thinking about other languages that could benefit from a tighter integration with the BigML API.

Today BigML releases the bigml package for R.  R is already well known for its capabilities in statistics and data analysis, and we use it internally for a number of different day-to-day tasks.  The bigml package enables the R community to easily take advantage of our highly scalable cloud-based machine learning infrastructure while using familiar R data structures and workflows.

The package is available through CRAN.

To use it, simply install it from CRAN as usual and load it via the library command.  You will also need to set your credentials, either by running the setCredentials command or by setting BIGML_USERNAME and BIGML_API_KEY in your .Renviron file.

install.packages("bigml")
library(bigml)
setCredentials("username", "api_key")
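
If you go the .Renviron route, a quick check along these lines can confirm that R actually sees both variables after a restart. This is only a sketch: the values shown in the comments are placeholders, and the check relies on base R's Sys.getenv rather than anything from the bigml package.

```r
# Lines that would go in ~/.Renviron (placeholder values):
#   BIGML_USERNAME=your_username
#   BIGML_API_KEY=your_api_key

# After restarting R, look up both variables in the current session.
# Sys.getenv returns "" for any variable that is not set.
creds <- Sys.getenv(c("BIGML_USERNAME", "BIGML_API_KEY"))
unset <- names(creds)[creds == ""]

if (length(unset) > 0) {
  message("Not set: ", paste(unset, collapse = ", "))
} else {
  message("Both BigML credential variables are set.")
}
```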

If this is the first time you’ve worked with cURL, it may be necessary to update your SSL certificates. You can do that from within R as follows:

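library(RCurl)
# Download a current CA certificate bundle into the working directory
download.file(url = "http://curl.haxx.se/ca/cacert.pem", destfile = "cacert.pem")
curl <- getCurlHandle()
# Set default RCurl options (note: ssl.verifypeer = FALSE turns off peer verification)
options(RCurlOptions = list(capath = system.file("CurlSSL", "cacert.pem", package = "RCurl"), ssl.verifypeer = FALSE))
# Only needed behind a proxy; replace 'proxyserver:port' with your own proxy
curlSetOpt(.opts = list(proxy = 'proxyserver:port'), curl = curl)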

R’s dataframes are very similar to BigML datasets.  The bigml package lets you directly specify a model from an existing R dataframe.

So, let’s create a model from the iris data set that comes with R, and make a prediction with it.

iris.model = quickModel(iris, objective_field = 'Species')
quickPrediction(iris.model, c(Sepal.Length = 2, Petal.Length = 3, Petal.Width = 2))
# [1] "virginica"

That’s it!

The bigml package will manage the creation of appropriate sources, datasets, and models behind the scenes.  It will also let you use the existing names in the original dataframe to specify input or objective fields.
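
Because field names carry over from the dataframe, one convenient way to build prediction inputs is to pull them from a row of the data itself. The snippet below is a sketch using only base R; the choice of row is arbitrary, and the quickPrediction call is left commented out because it needs credentials and a model created as above.

```r
# Fields used as prediction inputs, named exactly as in the iris dataframe.
input_fields <- c("Sepal.Length", "Petal.Length", "Petal.Width")

# Take one row (here the last one, an arbitrary choice) and flatten it
# into a named numeric vector, the same shape as a hand-written c(...) input.
new_flower <- unlist(iris[nrow(iris), input_fields])
print(new_flower)

# With credentials set and iris.model created as above:
# quickPrediction(iris.model, new_flower)
```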

The “quick” methods are geared towards working with native R resources, and do a large number of things automatically.  If you prefer more manual control of the operations, basic methods are also provided:

write.csv(iris, "~/iris.csv", row.names = FALSE)
iris.source = createSource("~/iris.csv", 'iris')
iris.dataset = createDataset(iris.source$resource)
iris.model = createModel(iris.dataset$resource)
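
One detail worth noting in the manual route is the row.names argument of write.csv, which is switched off in the call above. A small base R sketch, writing to a temporary directory rather than the home directory, shows what that argument changes. How BigML would treat the extra column is not covered here.

```r
# Write iris twice to temporary files: once with and once without row names.
with_rows    <- file.path(tempdir(), "iris_with_rows.csv")
without_rows <- file.path(tempdir(), "iris_without_rows.csv")
write.csv(iris, with_rows)                        # default: row.names = TRUE
write.csv(iris, without_rows, row.names = FALSE)

# Reading the files back shows the extra, unnamed index column that the
# default adds (read.csv labels it "X").
cols_with    <- names(read.csv(with_rows))
cols_without <- names(read.csv(without_rows))
print(cols_with)
print(cols_without)
```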

After you create models, datasets, and other resources, you can easily retrieve your list of resources with the package as well. The bigml package reconfigures the API results into simple R dataframes in order to present well-structured records of every resource you have created at BigML:

listSources()
# e.g.
#     name                    updated number_of_datasets etc...
# 1   iris 2012-05-08T20:58:01.916000                  1 ...
# 2   test 2012-05-08T20:40:28.846000                  1 ...
# 3 census 2012-05-08T20:34:53.398000                  1 ...
listDatasets()
listModels()
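
Since the listing functions return ordinary dataframes, the usual base R tools apply to them. The sketch below does not call the API; it rebuilds the three sample rows from the listSources() output above by hand (the column types and the UTC time zone are assumptions) and sorts them by update time.

```r
# Hand-built stand-in for a listSources() result, using the sample rows above.
sources <- data.frame(
  name = c("iris", "test", "census"),
  updated = c("2012-05-08T20:58:01.916000",
              "2012-05-08T20:40:28.846000",
              "2012-05-08T20:34:53.398000"),
  number_of_datasets = c(1, 1, 1),
  stringsAsFactors = FALSE
)

# Parse the timestamps (time zone assumed to be UTC) so they sort as times.
sources$updated <- as.POSIXct(sources$updated,
                              format = "%Y-%m-%dT%H:%M:%OS", tz = "UTC")

# Order the resources from least to most recently updated.
oldest_first <- sources[order(sources$updated), ]
print(oldest_first$name)
```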

There are many more options available in the package, so be sure to check out the documentation for the individual functions and for the package as a whole.  It’s available online and from within R:

?bigml
?quickModel
?listModels

We’ve also made the package source code available at the bigml-r repo on GitHub, in case you’re interested in following the progress or making suggestions.

4 comments

  1. I get the following error using R 3.* both in Linux and Windows; what should I do to make the API work? Many thanks!

    Source creation in progress…
    Error in .check_for_code(result) :
    Error: BigML returned code HTTP_BAD_REQUEST {
      "code": 400,
      "status": {
        "code": -1204,
        "extra": {
          "dataset": {
            "fields": "The key of the objects in this field must be any of ['description', 'term_analysis', 'name', 'preferred', 'label']"
          }
        },
        "message": "Bad request"
      }
    }
