R you ready for Big Machine Learning?
Recently, we released Python bindings for our API. We received fantastic feedback on the related blog post from Hacker News and Twitter, so we started thinking about other languages that could benefit from a tighter integration with the BigML API.
Today BigML releases the bigml package for R. R is already well known for its capabilities in statistics and data analysis, and we use it internally for a number of day-to-day tasks. The bigml package lets the R community take advantage of our highly scalable, cloud-based machine learning infrastructure while using familiar R data structures and workflows.
The package is available through CRAN.
To use it, simply install it from CRAN as usual, and load it via the library command. You will also need to set your credentials, either by running the setCredentials command or by setting BIGML_USERNAME and BIGML_API_KEY in your .Renviron file.
    install.packages("bigml")
    library(bigml)
    setCredentials("username", "api_key")
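If you go the .Renviron route instead of calling setCredentials, a quick check like the one below can confirm that R actually picked the variables up. This is only a sketch: the two variable names come from the text above, the values shown in the comments are placeholders, and the check itself uses base R rather than anything from the bigml package.

```r
# ~/.Renviron would contain two lines like these (placeholder values):
#   BIGML_USERNAME=your_username
#   BIGML_API_KEY=your_api_key

# After restarting R, check that both variables are visible to the session.
required <- c("BIGML_USERNAME", "BIGML_API_KEY")
unset <- required[!nzchar(Sys.getenv(required))]

if (length(unset) > 0) {
  message("Not set: ", paste(unset, collapse = ", "))
} else {
  message("BigML credentials found in the environment.")
}
```

R reads .Renviron at startup, so changes to the file only take effect in a new session.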
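If this is the first time you’ve worked with cURL, it may be necessary to update your SSL certificates. You can do that from within R:
    library(RCurl)

    # Fetch a current CA certificate bundle into the working directory.
    download.file(url = "http://curl.haxx.se/ca/cacert.pem", destfile = "cacert.pem")

    # Point RCurl at its bundled certificate file (cainfo takes a file;
    # capath expects a directory). Note that ssl.verifypeer = FALSE turns
    # certificate verification off.
    options(RCurlOptions = list(
      cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl"),
      ssl.verifypeer = FALSE))

    # Only needed behind a proxy; replace the placeholder with your own.
    curl <- getCurlHandle()
    curlSetOpt(.opts = list(proxy = 'proxyserver:port'), curl = curl)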
So, let’s create a model from the iris data set that comes with R, and make a prediction with it.
    iris.model = quickModel(iris, objective_field = 'Species')
    quickPrediction(iris.model, c(Sepal.Length=2, Petal.Length=3, Petal.Width=2))
    #  "virginica"
The bigml package manages the creation of the appropriate sources, datasets, and models behind the scenes. It also lets you use the existing column names from the original data frame to specify input or objective fields.
The “quick” methods are geared towards working with native R resources, and do a large number of things automatically. If you prefer more manual control of the operations, basic methods are also provided:
    write.csv(iris, "~/iris.csv", row.names=FALSE)
    iris.source = createSource("~/iris.csv", 'iris')
    iris.dataset = createDataset(iris.source$resource)
    iris.model = createModel(iris.dataset$resource)
After you create models, datasets, and other resources, you can retrieve lists of them with the package as well. The bigml package reshapes the API results into simple R data frames, giving you a well-structured record of every resource you have created at BigML:
    listSources()
    # e.g.
    #   name   updated                    number_of_datasets etc...
    # 1 iris   2012-05-08T20:58:01.916000 1                  ...
    # 2 test   2012-05-08T20:40:28.846000 1                  ...
    # 3 census 2012-05-08T20:34:53.398000 1                  ...
    listDatasets()
    listModels()
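Because the listings are ordinary data frames, the usual base R tools apply to them. The sketch below uses a stand-in data frame rather than a live call, so it runs without credentials. Its columns and rows simply mirror the sample output above, and both the exact shape of the real result and the UTC time zone are assumptions.

```r
# Stand-in for the data frame returned by listSources(); the columns and
# rows mirror the sample output above and are assumptions about the real
# result, not a guaranteed schema.
sources <- data.frame(
  name = c("iris", "test", "census"),
  updated = c("2012-05-08T20:58:01.916000",
              "2012-05-08T20:40:28.846000",
              "2012-05-08T20:34:53.398000"),
  number_of_datasets = c(1, 1, 1),
  stringsAsFactors = FALSE
)

# Parse the timestamps so rows can be ordered by time (time zone assumed).
sources$updated_at <- as.POSIXct(sources$updated,
                                 format = "%Y-%m-%dT%H:%M:%OS", tz = "UTC")

# Most recently updated sources first.
recent_first <- sources[order(sources$updated_at, decreasing = TRUE), ]

# Look up a single source by name.
iris_row <- sources[sources$name == "iris", ]
```

With a real account, replacing the stand-in with `sources <- listSources()` should allow the same subsetting and ordering, provided the columns match.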
There are many more options available in the package, so be sure to check out the documentation for the package and for its individual functions. It’s available online and from within R:
    ?bigml
    ?quickModel
    ?listModels
We’ve also made the package source code available at the bigml-r repo on GitHub, in case you’re interested in following our progress or making suggestions.