This is a guest post by Erik Meijer (@headinthebox). He is an accomplished programming-language designer who runs the Cloud Programmability Team at Microsoft and is a professor of Cloud Programming at TU Delft.
There is a lot of hype and mystique around Machine Learning these days. The combination of the words “machine” and “learning” induces hallucinations of intelligent machines that magically learn by soaking up Big Data and then both solve world hunger and make us rich while we lie on the beach sipping a cold one.
Worse yet, the esoteric and mathematical terminology of many Machine Learning textbooks and research papers fuels the mystique, resulting in the persona of the Data Scientist as a 21st-century druid who mystically distills insight and knowledge from raw data.
However, just as normal programmers can write code without needing to understand Universal Turing Machines, power domains, or predicate transformers, we believe that normal programmers can use Machine Learning without needing to understand vectors, features, probability densities, Jacobians, etc. In fact, the very essence of Machine Learning is creating code from a finite set of sample input/output pairs. This is something that programmers are already deeply familiar with; in other words, Machine Learning is Test-Driven Development (TDD) performed by code.
For example, given the well-known “Iris flower dataset” as our test case:
Sepal length | Sepal width | Petal length | Petal width | Iris
---|---|---|---|---
5.1 | 3.5 | 1.4 | 0.2 | Setosa
7.0 | 3.2 | 4.7 | 1.4 | Versicolor
6.1 | 2.6 | 5.6 | 1.4 | Virginica
we can apply Machine Learning to generate executable code that “predicts” the class of Iris given the four parameters:
```csharp
enum Iris { Setosa, Versicolor, Virginica }

Iris Predict(double sepalLength, double sepalWidth, double petalLength, double petalWidth)
{
    …
}
```
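Seen through the TDD lens, the sample rows above are simply the test cases that the generated `Predict` must satisfy. A minimal sketch of that idea (the `Debug.Assert` calls are illustrative and assume the `Predict` function above has been generated):

```csharp
using System.Diagnostics;

// Each training row doubles as a unit test for the learned Predict function.
Debug.Assert(Predict(5.1, 3.5, 1.4, 0.2) == Iris.Setosa);
Debug.Assert(Predict(7.0, 3.2, 4.7, 1.4) == Iris.Versicolor);
Debug.Assert(Predict(6.1, 2.6, 5.6, 1.4) == Iris.Virginica);
```

The difference from ordinary TDD is that here the machine, not the programmer, writes the implementation that makes the tests pass.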
The nice folks at BigML have created a service that allows developers like you and me to use Machine Learning as TDD either programmatically, via a simple REST API, or manually, via an elegant website. The process involves uploading a data source (typically a CSV file), converting it into a typed dataset, and finally creating a model using the decision-tree generator algorithm.
This model can then be rendered as code in several programming languages, such as C#, Java, Objective-C, and Python, or exposed as an interactive web page where users can navigate the model by answering questions.
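To give a feel for what such rendered code looks like, here is an illustrative sketch of a decision tree for the Iris data. It is hand-written for this post, not produced by BigML, and the split thresholds are assumptions (they happen to be the classic petal-based splits for this dataset):

```csharp
// Illustrative sketch only: the kind of nested-conditional code a
// decision-tree model renders to. The threshold values are assumed,
// not taken from an actual BigML model.
static string Predict(double sepalLength, double sepalWidth,
                      double petalLength, double petalWidth)
{
    if (petalLength <= 2.45) return "setosa";
    if (petalWidth <= 1.75) return "versicolor";
    return "virginica";
}
```

The generated code is just an ordinary function, so it can be shipped inside any application with no runtime dependency on the service that learned it.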
When confronted with a REST API, the first thing every sane developer does is implement a high-level abstraction that hides all the low-level details of making HTTP requests, creating query strings, and munching JSON. Developers using BigML’s REST API are no exception to this rule, and have created bindings for Clojure, iOS, Java, Python, R, Ruby, and now also .NET.
The .NET bindings for BigML, available on GitHub, expose a full LINQ provider, a strongly typed projection of all the JSON objects exposed by the REST API, and the ability to compile models to .NET assemblies.
To access BigML using the .NET bindings, you first create a new client object by passing your user name and API key. The client object provides (strongly typed) methods for all the operations provided by the BigML API as documented here; for example, listing, filtering, and sorting your BigML sources using LINQ queries. Of course, the binding may not reflect all of the latest features — for example, we do not implement Evaluations yet — but that is why we provide the source on GitHub. The implementation of the LINQ provider may be an interesting topic of study in itself, and follows the pattern outlined in the CACM paper “The World According to LINQ”.
```csharp
// New BigML client using username and API key.
Console.Write("user: ");
var User = Console.ReadLine();
Console.Write("key: ");
var ApiKey = Console.ReadLine();
var client = new Client(User, ApiKey);

// List all sources, newest first, using a LINQ query.
Ordered<Source.Filterable, Source.Orderable, Source> result =
    (from s in client.ListSources()
     orderby s.Created descending
     select s);
var sources = await result;
foreach (var src in sources) Console.WriteLine(src.ToString());
```
Below is an example of how to create a new source from an in-memory collection, then a dataset, and finally a model. Since BigML resource creation is asynchronous, we need to poll until we get the status code “finished” back from the service. Note that BigML (and the .NET bindings) also supports creating sources from local files, Amazon S3, or Azure Blob storage.
```csharp
// New source from in-memory stream, with separate header.
var source = await client.Create(iris, "Iris.csv",
    "sepal length, sepal width, petal length, petal width, species");

// No push, so we need to busy wait for the source to be processed.
while ((source = await client.Get(source)).StatusMessage.StatusCode != Code.Finished)
    await Task.Delay(10);
Console.WriteLine(source.StatusMessage.ToString());

// Default dataset from source.
var dataset = await client.Create(source);

// Busy wait for the dataset to be processed.
while ((dataset = await client.Get(dataset)).StatusMessage.StatusCode != Code.Finished)
    await Task.Delay(10);
Console.WriteLine(dataset.StatusMessage.ToString());

// Default model from dataset.
var model = await client.Create(dataset);

// Busy wait for the model to be processed.
while ((model = await client.Get(model)).StatusMessage.StatusCode != Code.Finished)
    await Task.Delay(10);
Console.WriteLine(model.StatusMessage.ToString());
```
Of course what we are really after, since we want to show that Machine Learning is automated TDD, is the generated model for our source. The model description is a giant JSON object that represents the decision tree that BigML has “learned” from the data we fed it. In the example below, we translate the model into a .NET expression tree, compile the expression tree into a .NET delegate, and then call it on one of the test inputs to see if it predicts the same kind of iris:
```csharp
var description = model.ModelDescription;
Console.WriteLine(description.ToString());

// First convert it to a .NET expression tree.
var expression = description.Expression();
Console.WriteLine(expression.ToString());

// Then compile the expression tree into MSIL.
var predict = expression.Compile() as Func<double, double, double, double, string>;

// And try the first flower of the example set.
var result2 = predict(5.1, 3.5, 1.4, 0.2);
Console.WriteLine("result = {0}, expected = {1}", result2, "setosa");
```
We hope that this library makes BigML easily accessible to .NET programmers who want to incorporate Machine Learning into their applications. And we hope to see it flourish like all the other bindings for BigML that are available at https://github.com/bigmlcom/io.
Awesome! I have to find time to test that!
Money quote: “we believe that normal programmers can use Machine Learning without needing to understand vectors, features, probability density” – yes, those “features” are scary and complicated!
It would be ideal to join forces.
https://blog.bigml.com/2013/03/06/democratizing-machine-learning-with-c/
http://numl.net/
https://rodrigopb.wordpress.com/2015/03/12/pasos-para-realizar-un-experimento-de-aprendizaje-automatico-y-azure-machine-learning/
http://blog.koalite.com/2015/03/machine-learning-con-princesas-disney/
Regards.
How does BigML compare to Azure ML in ease of use and flexibility?
We would say BigML is much easier to use and much more flexible. You don’t need to run everything in Azure; you can export your models and use them anywhere. But why don’t you try both services and compare for yourself? It’s a great exercise.