
BigML Summer 2017 Release Webinar Video is Here: Deepnets!

We are happy to share that Deepnets are fully implemented on our platform and available from the BigML Dashboard, API, as well as from WhizzML for its automation.

BigML Deepnets are the world’s first deep learning algorithm capable of Automatic Network Detection and Structure Search, which automatically optimize model performance and minimize the need for expensive specialists to drive business value. Following our mission to make Machine Learning beautifully simple for everyone, BigML now offers the very first service that enables non-experts to use deep learning with results matching those of top-level data scientists. BigML’s extensive benchmark conducted on 50+ datasets has shown Deepnets, an optimized version of Deep Neural Networks brought to the BigML platform, to outperform other algorithms offered by popular Machine Learning libraries. With nothing to install, nothing to configure, and no need to specify a neural network structure, anyone can use BigML’s Deepnets to transform raw data into valuable business insights.

Special thanks to all webinar attendees who joined the BigML Team yesterday during the official launch. For those who missed the live webinar, you can watch the full recording on the BigML YouTube channel.

As explained in the video, one of the main complexities of deep learning is that a Machine Learning expert is required to find the best network structure for each problem. This can often be a tedious trial-and-error process that can take from days to weeks. To combat these challenges and make deep learning accessible for everyone, BigML now enables practitioners to find the best network for their data without having to write custom code or hand-tune parameters. We make this possible with two unique parameter optimization options: Automatic Network Search and Structure Suggestion.

BigML’s Automatic Network Search conducts an intelligent guided search over the space of possible networks to find suitable configurations for your dataset. The final Deepnet will use the top networks found in this search to make predictions. This capability yields a better model; however, it takes longer, since the algorithm conducts an extensive search for the best solution. It’s ideal for use cases that justify the incremental wait for optimal Deepnet performance. On the other hand, BigML’s Structure Suggestion only takes nominally longer than training a single network. This option is capable of swiftly recommending a neural network structure that is optimized to work well with your particular dataset.

For further learning on Deepnets, please visit our dedicated summer 2017 release page, where you will find:

  • The slides used during the webinar.
  • The detailed documentation to learn how to create and evaluate your Deepnets, and interpret the results before making predictions from the BigML Dashboard and the BigML API.
  • The series of six blog posts that gradually explain Deepnets.

Thanks for your positive comments after the webinar. And remember that you can always reach out to us at support@bigml.com for any suggestions or questions.

Deepnets: Behind The Scenes

Over our last few blog posts, we’ve gone through the various ways you can use BigML’s new Deepnet resource, via the Dashboard, programmatically, and via download on your local machine. But what’s going on behind the curtain? Is there a little wizard pulling an elaborate console with cartoonish-looking levers and dials?

Well, as we’ll see, Deepnets certainly do have a lot of levers and dials. So many, in fact, that using them can be pretty intimidating. Thankfully, BigML is here to be your wizard* so you aren’t the one looking shamefacedly at Dorothy when she realizes you’re not as all-powerful as she thought.

BigML Deep Learning

Deepnets:  Why Now?

First, let’s address an important question: why are deep neural networks suddenly all the rage? After all, the Machine Learning techniques underpinning deep learning have been around for quite some time. The reason boils down more to innovations in the technology supporting the learning algorithms than to advances in the learning algorithms themselves. It’s worth quoting Stuart Russell at length here:

. . .deep learning has existed in the neural network community for over 20 years. Recent advances are driven by some relatively minor improvements in algorithms and models and by the availability of large data sets and much more powerful collections of computers.

He gets at most of the reasons in this short paragraph.  Certainly, the field has been helped along by the availability of huge datasets like the ones generated by Google and Facebook, as well as some academic advances in algorithms and models.

But the things I will focus on in this post are in the family of “much more powerful collections of computers”.  In the context of Machine Learning, I think this means two things:

  • Highly parallel, memory-rich computers provisionable on-demand. Few people can justify building a massive GPU-based server to train a deep neural network on huge data if they’re only going to use it every now and then. But most people can afford to rent the same hardware for a few days at a time. Making such power available in this way makes deep learning cost-effective for far more people than it used to be.
  • Software frameworks that have automatic differentiation as first-class functionality. Modern computational frameworks (like TensorFlow, for example) allow programmers to instantiate a network structure programmatically and then just say “now do gradient descent!” without ever having to do any calculus or worry about third-party optimization libraries. Because differentiation of the objective with respect to the network parameters is done automatically, it becomes much easier to try a wide variety of structures on a given dataset (see the sketch below).
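
To make that concrete, here is a minimal sketch (not BigML code) of what “now do gradient descent” looks like in such a framework. It assumes TensorFlow 2, which postdates this post, and uses a toy logistic-regression-sized network with made-up data:

import tensorflow as tf

# A toy one-layer "network": a weight vector and a bias.
w = tf.Variable(tf.zeros([2, 1]))
b = tf.Variable(tf.zeros([1]))

x = tf.constant([[0.5, 1.2], [1.0, -0.7]])  # two training rows
y = tf.constant([[1.0], [0.0]])             # their labels

optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)

for _ in range(100):
    with tf.GradientTape() as tape:
        pred = tf.sigmoid(tf.matmul(x, w) + b)
        loss = tf.reduce_mean(tf.keras.losses.binary_crossentropy(y, pred))
    # The framework differentiates the loss with respect to the parameters;
    # no hand-written calculus is required.
    grads = tape.gradient(loss, [w, b])
    optimizer.apply_gradients(zip(grads, [w, b]))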

The problem here then becomes one of expertise: people need powerful computers to do Machine Learning, but few people know how to provision and deploy machines on, say, Amazon AWS to do this. Similarly, computational frameworks to specify and train almost any deep network exist and are very nice, but exactly how to use those frameworks and exactly what sort of network to create are problems that require knowledge in a variety of different domains.

Of course, this is where we come in. BigML has positioned itself at the intersection of these two innovations, utilizing a framework for network construction that allows us to train a wide variety of network structures for a given dataset, using the vast amount of available compute power to do it quickly and at scale, as we do with everything else. We add to these innovations a few of our own, in an attempt to make Deepnet learning as “hands off” and as “high quality” as we can manage.

What BigML Deepnets are (currently) Not:

Before we get into exactly what BigML’s Deepnets are, let’s talk a little bit about what they aren’t. Many people who are technically minded will immediately bring to mind the convolutional networks that do so well on vision problems, or the recurrent networks (such as LSTM networks) that have great performance on speech recognition and NLP problems.

We don’t yet support these network types.  The main reason for this is that these networks are designed to solve supervised learning problems that have a particular structure, not the general supervised learning problem.  It’s very possible we’ll support particular problems of those types in the future (as we do with, say, time series analysis and topic modeling), and we’d introduce those extensions to our deep learning capabilities at that time.

Meanwhile, we’d like to bring the power of deep learning so obvious in those applications to our users in general.  Sure, deep learning is great for vision problems, but what can it do on my data?  Hopefully, this post will prove to you that it can do quite a lot.

Down to Brass Tacks

The central idea behind neural networks is fairly simple, and not so very different from the idea behind logistic regression: you have a number of input features and a number of possible outputs (either one probability for each class in classification problems or a single value for regression problems). We posit that the outputs are a function of the dot product of the input features with some learned weights, one per feature. In logistic regression, for example, we imagine that the probability of a given class can be expressed as the logistic function applied to this dot product (plus some bias term).

Deep networks extend this idea in several ways. First, we introduce the idea of “layers”, where the inputs are fed into a layer of output nodes, the outputs of which are in turn fed into another layer of nodes, and so on until we get to a final “readout” layer with the number of output nodes equal to the number of classes.

deepnet.png

This gives rise to an infinity of possible network topologies: How many intermediate (hidden) layers do you want? How many nodes in each one? There’s no need to apply the logistic function at each node; in principle, we can apply anything that our network framework can differentiate. So we could have a particular “activation function” for each layer or even for each node! Could we apply multiple activation functions? Do we learn a weight from every single node in one layer to every single node in another, or do we skip some?
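
To ground the terminology, here is a minimal NumPy sketch (illustrative only, not BigML’s implementation) of the forward pass just described: each layer is a weight matrix, a bias, and an activation function, and the final readout layer produces one probability per class. All sizes and values are made up.

import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def forward(x, layers):
    # Feed the inputs through each (weights, bias, activation) layer in turn.
    out = x
    for weights, bias, activation in layers:
        out = activation(out @ weights + bias)
    return out

rng = np.random.default_rng(0)
# Two hidden layers of 8 and 4 nodes, then a 3-class softmax readout.
layers = [
    (rng.normal(size=(5, 8)), np.zeros(8), relu),
    (rng.normal(size=(8, 4)), np.zeros(4), relu),
    (rng.normal(size=(4, 3)), np.zeros(3), softmax),
]
x = rng.normal(size=(2, 5))   # two instances with five input features
print(forward(x, layers))     # one probability per class, per instance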

Add to this the usual parameters for gradient descent: Which algorithm to use? How about the learning rate? Are we going to use dropout during training to avoid overfitting? And let’s not even get into the usual parameters common to all ML algorithms, like how to handle missing data or objective weights. Whew!

Extra Credit

What we’ve described above is the vanilla feed-forward neural network that the literature has known for quite a while, and we can see that they’re pretty parameter-heavy. To add a bit more to the pile, we’ve added support for a couple of fairly recent advances in the deep learning world (some of the “minor advances” mentioned by Russell) that I’d like to mention briefly.

Batch Normalization

During the training of deep neural networks, the activations of internal layers of the network can change wildly throughout the course of training. Often, this means that the gradients computed for training can behave poorly, so one must be very careful to select a sufficiently low learning rate to mitigate this behavior.

Batch normalization fixes this by normalizing the inputs to each layer for each training batch of instances, assuring that the inputs are always mean-centered and unit-variance, which implies well-behaved gradients when training. The downside is that you must now know both mean and variance for each layer at prediction time so you can standardize the inputs as you did during training. This extra bookkeeping tends to slow down the descent algorithm’s per-iteration speed a bit, though it sometimes leads to faster and more robust convergence, so the trade-off can be worthwhile for certain network structures.
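
A minimal NumPy sketch of that normalization step follows. The learned scale and shift parameters (gamma and beta) are the usual extras in batch normalization and are assumed here rather than taken from this post.

import numpy as np

def batch_normalize(batch, gamma=1.0, beta=0.0, eps=1e-5):
    # Normalize a batch of layer inputs to zero mean and unit variance,
    # then apply the learned scale (gamma) and shift (beta).
    mean = batch.mean(axis=0)
    var = batch.var(axis=0)
    normalized = (batch - mean) / np.sqrt(var + eps)
    # mean and var must be kept (e.g., as running averages) so the same
    # standardization can be applied at prediction time.
    return gamma * normalized + beta, mean, var

batch = np.random.default_rng(1).normal(loc=5.0, scale=3.0, size=(32, 4))
normalized, mean, var = batch_normalize(batch)
print(normalized.mean(axis=0).round(6), normalized.std(axis=0).round(3))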

Learn Residuals

Residual networks are networks with “skip” connections built in. That is, every second or third layer, the input to each node is the usual output from the previous layer, plus the raw, unweighted output from two layers back.

The theory behind this idea is that this allows information present in the early layers to bubble up through the later layers of the network without being subjected to a possible loss of that information via reweighting on subsequent layers. Thus, the later layers are encouraged to learn a function representing the residual values that will allow for good prediction when used on top of the values from the earlier layers. Because the early layers contain significant information, the weights for these residual layers can often be driven down to zero, which is typically “easier” to do in a gradient descent context than to drive them towards some particular non-zero optimum.
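
As a rough sketch (again NumPy, not BigML internals), a residual block of two dense layers looks something like the following; it assumes layers of matching width so the raw output from two layers back can simply be added in.

import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(h, w1, b1, w2, b2):
    out = relu(h @ w1 + b1)    # first layer
    out = out @ w2 + b2        # second layer, before its activation
    return relu(out + h)       # add the raw input from two layers back

rng = np.random.default_rng(2)
d = 4
h = rng.normal(size=(3, d))                      # activations entering the block
w1 = rng.normal(size=(d, d)) * 0.1
w2 = rng.normal(size=(d, d)) * 0.1
b1, b2 = np.zeros(d), np.zeros(d)
print(residual_block(h, w1, b1, w2, b2).shape)   # same shape as the input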

Tree Embedding

When this option is selected, we learn a series of decision trees, random forest-style, against the objective before learning the network (with appropriate use of holdout sets, of course). We then use the predictions of the trees as generated “input features” for the network. Because these features tend to have a monotonic relationship with the class probabilities, this can make gradient descent reach a good solution more quickly, especially in domains where there are many non-linear relationships between inputs and outputs.

If you like, you can think of this as a rudimentary form of stacked generalization embedded in the network’s learning algorithm.
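
To illustrate the idea (using scikit-learn rather than BigML’s own trees, and with a hypothetical half-and-half split standing in for proper holdout handling), you can hold out part of the data, train a small forest on it, and append the forest’s class probabilities to the raw features fed to the network:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
# Hold out half of the rows so the tree-generated features are not
# predictions on the very rows the forest was trained on.
X_trees, X_net, y_trees, y_net = train_test_split(
    X, y, test_size=0.5, random_state=0)

forest = RandomForestClassifier(n_estimators=20, random_state=0)
forest.fit(X_trees, y_trees)

# Append the forest's class probabilities as extra input features; these
# columns tend to relate monotonically to the class probabilities.
tree_features = forest.predict_proba(X_net)
X_augmented = np.hstack([X_net, tree_features])
print(X_augmented.shape)   # original 4 features + one column per class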

Pay No Attention To The Man Behind The Curtain

So it seems we’ve set ourselves an impossible task here. We have all of these parameters. How on earth are we going to find a good network in the middle of this haystack?

Here’s where things get BigML easy:  the answer is that we’re going to do our best to find ones that work well on the dataset we’re given. We’ll do this in two ways:  via metalearning and via hyper-parameter search.

Metalearning

Metalearning is another idea that is nearly as old as Machine Learning itself.  In its most basic form, the idea is that we learn a bunch of classifiers on a bunch of different datasets and measure their performance. Then, we apply Machine Learning to get a model that predicts the best classifier for a given dataset. Simple, right?
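
A toy sketch of that recipe (entirely made-up numbers and illustrative feature names, using scikit-learn): collect rows of dataset characteristics plus classifier parameters labelled with the measured score, fit a regression model to them, and use it to rank candidate parameterizations for a new dataset.

from sklearn.ensemble import RandomForestRegressor

# Each row: simple dataset characteristics and network parameters,
# labelled with the relative performance measured by cross-validation.
meta_rows = [
    # n_rows, n_fields, n_hidden_layers, learning_rate, relative_score
    [10000, 20, 1, 0.10, 0.82],
    [10000, 20, 3, 0.01, 0.91],
    [500,   80, 5, 0.10, 0.55],
    [500,   80, 2, 0.05, 0.74],
]
X_meta = [row[:-1] for row in meta_rows]
y_meta = [row[-1] for row in meta_rows]

meta_model = RandomForestRegressor(n_estimators=50, random_state=0)
meta_model.fit(X_meta, y_meta)

# For a new dataset, score candidate parameterizations and suggest the best.
candidates = [[2000, 30, layers, lr]
              for layers in (1, 2, 4) for lr in (0.01, 0.1)]
scores = meta_model.predict(candidates)
print(candidates[scores.argmax()])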

In our application, this means we’re going to learn networks of every sort of topology parameterized in every sort of way. What do I mean by “every sort”?  Well, we’ve got over 50 datasets so we did five replications of 10-fold cross-validation on each one. For each fold, we learned 128 random networks, then we measured the relative performance of each network on each fold.  How many networks is that?  Here, allow me to do the maths:

50 * 5 * 10 * 128 = 320,000.

“Are you saying you trained 320,000 neural networks?” No, no, no, of course not! Some of the datasets were prohibitively large, so we only learned a paltry total of 296,748 networks. This is what we do for fun here, people!

When we model the relative quality of the networks given their parameters (which, of course, we do using BigML), we learn a lot of interesting little bits about how the parameters of neural networks relate to one another and to the data on which they’re being trained.

You’ll get better results using the “adadelta” descent algorithm, for example, with high learning rates, as you can see from the green areas of the figure below, which indicate parts of the parameter space that specify networks that perform better on a relative basis.

adagrad.png

But if you’re using “rms_prop”, you’ll want to use learning rates that are several orders of magnitude lower.

rms_prop.png

Thankfully, you don’t have to remember all of this.  The wisdom is in the model, and we can use this model to make intelligent suggestions about which network topology and parameters you should use, which is exactly what we do with the “Automatic Structure Suggestion” button.

Structuresuggestion.png

But, of course, the suggested structure and parameters represent only a guess at the optimal choices, albeit an educated one. What if we’re wrong? What if there’s another network structure out there that performs well on your data, if you could only find it? Well, we’ll have to go looking for it, won’t we?

Network Search

Our strategy here again comes in many pieces.  The first is Bayesian hyperparameter optimization.  I won’t go into it too much here because this part of the architecture isn’t too much more than a server-side implementation of SMACdown, which I’ve described previously. This technique essentially allows a clever search through the space of possible models by using the results of previously learned networks to guide the search. The cleverness here lies in using the results of the previously trained networks to select the best next one to evaluate. Beyond SMAC, we’ve done some experiments with regret-based acquisition functions, but the flavor of the algorithm is the same.

We’ve also more heavily parallelized the algorithm, so the backend actually keeps a queue of networks to try, reordering that queue periodically as its model of network performance is refined.

The final innovation here is using bits and pieces of the hyperband algorithm.  The insight of the algorithm is that a lot of savings in computation time can be gained by simply stopping training on networks that, even if their fit is still improving, have little hope of reaching near-optimal performance in reasonable time. Our implementation differs significantly in the details (in our interface, for example, you provide us with a budget of search time that we always respect), but does stop training early for many underperforming networks, especially in the later stages of the search.
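
Putting those two pieces together, a heavily simplified search loop might look like the sketch below: pick the next candidate using what has been learned so far, and stop training candidates that fall hopelessly behind the best score seen. This is a flavour-only toy, not BigML’s backend; the train_step stand-in just returns a placeholder score where a real implementation would partially fit a network.

import random

def propose_candidate(history):
    # Mostly perturb the best configuration seen so far, occasionally
    # explore at random (a crude stand-in for a SMAC-style model).
    if not history or random.random() < 0.3:
        return {"layers": random.choice([1, 2, 4, 8]),
                "learning_rate": 10 ** random.uniform(-4, -1)}
    best = max(history, key=lambda h: h["score"])["params"]
    return {"layers": max(1, best["layers"] + random.choice([-1, 0, 1])),
            "learning_rate": best["learning_rate"] * random.choice([0.5, 1.0, 2.0])}

def train_step(params, steps):
    # Placeholder: partially train a network for `steps` iterations and
    # return a validation score.
    return random.random() * min(1.0, steps / 100)

def search(budget=20, max_steps=100):
    history, best_score = [], 0.0
    for _ in range(budget):
        params, steps, score = propose_candidate(history), 10, 0.0
        while steps <= max_steps:
            score = train_step(params, steps)
            if score < 0.5 * best_score:   # hopelessly behind: stop early
                break
            steps *= 2
        best_score = max(best_score, score)
        history.append({"params": params, "score": score})
    return max(history, key=lambda h: h["score"])

print(search())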

This is all available right next door to the structure suggestion button in the interface.

Automatic Network Search

Between metalearning and network search, we can make a good guess as to some reasonable network parameters for your data, and we can do a clever search for better ones if you’re willing to give us a bit more time. It sounds good on paper, so let’s take it for a spin.

Benchmarking

So how did we do? Remember those 50+ datasets we used for metalearning?  We can use the same datasets to benchmark the performance of our network search against other algorithms (before you ask, no, we didn’t apply our metalearning model during this search as that would clearly be cheating). We again do 5 replications of 10-fold cross validation for each dataset and measure performance over 10 different metrics. As this was a hobby project I started before I came to BigML, you can see the result here.

Deepnets Benchmark

You can see on the front page a bit of information about how the benchmark was conducted (scroll below the chart), and a comparison of 30+ different algorithms from various software packages. The thing that’s being measured here is “How close is the performance of this algorithm to the best algorithm for a given dataset and metric?” You can quickly see that BigML Deepnets are the best, or quite close to it, more often than any other off-the-shelf algorithm we’ve tested.

The astute reader will certainly reply, “Well, yes, but you’ve done all manner of clever optimizations on top of the usual deep learning; you could apply such cleverness to any algorithm in the list (or a combination!) and maybe get better performance still!”

This is absolutely true. I’m certainly not saying that BigML deep learning is the best Machine Learning algorithm there is; I don’t even know how you would prove something like that. But what these results do show is that if you’re going to just pull something off of the shelf and use it, with no parameter tuning and little or no coding, you could do a lot worse than to pull BigML off the shelf. Moreover, the results show that deep learning (BigML or otherwise; notice that multilayer perceptrons in scikit-learn are just a few clicks down the list) isn’t just for vision and NLP problems, and it might be the right thing for your data too.

Another lesson here, editorially, is that benchmarking is a subtle thing, easily done wrong. If you go to the “generate abstract” page, you can auto-generate a true statement (based on these benchmarks) that “proves” that any algorithm in this list is “state-of-the-art”. Never trust a benchmark on a single dataset or a single metric! While any benchmark of ML algorithms is bound to be inadequate, we hope that this benchmark is sufficiently general to be useful.

Cowardly Lion Deepnets

Hopefully, all of this has convinced you to go off to see the Wizard and give BigML Deepnets a try. Don’t be the Cowardly Lion! You have nothing to lose except underperforming models…

Want to know more about Deepnets?

Please visit the dedicated release page for further learning. It includes a series of six blog posts about Deepnets, the BigML Dashboard and API documentation, the webinar slideshow as well as the full webinar recording.

 

Automating Deepnets with WhizzML and The BigML Python Bindings


This blog post, the fifth of our series of six posts about Deepnets, focuses on those users that want to automate their Machine Learning workflows using programming languages. If you follow the BigML blog, you may know WhizzML, BigML’s domain-specific language for automating Machine Learning workflows, implementing high-level Machine Learning algorithms, and easily sharing them with others. WhizzML helps developers to create Machine Learning workflows and execute them entirely in the cloud. This avoids network problems, memory issues and lack of computing capacity while taking full advantage of WhizzML’s built-in parallelization. If you are not familiar with WhizzML yet, we recommend that you read the series of posts we published this summer about how to create WhizzML scripts: Part 1, Part 2 and Part 3 to quickly discover their benefits.

In addition, in order to easily automate the use of BigML’s Machine Learning resources, we maintain a set of bindings, which allow users to work in their favorite language (Java, C#, PHP, Swift, and others) with the BigML platform.


Let’s see how to use Deepnets through both the popular BigML Python Bindings and WhizzML, but note that the operations described in this post are also available in this list of bindings.

We start creating Deepnets with the default settings. For that, we need to start from an existing Dataset to train the network in BigML, so our call to the API will need to include the Dataset ID we want to use for training as shown below:

;; Creates a deepnet with default parameters
(define my_deepnet (create-deepnet {"dataset" training_dataset}))

The BigML API is mostly asynchronous, that is, the above creation function will return a response before the Deepnet creation is completed. This implies that the Deepnet information is not ready to make predictions right after the code snippet is executed, so you must wait for its completion before you can predict with it. You can use the directive “create-and-wait-deepnet” for that:

;; Creates a deepnet with default settings. Once it's
;; completed the ID is stored in my_deepnet variable
(define my_deepnet
  (create-and-wait-deepnet {"dataset" training_dataset}))

If you prefer the BigML Python Bindings, the equivalent code is:

from bigml.api import BigML
api = BigML()
my_deepnet = api.create_deepnet("dataset/59b0f8c7b95b392f12000000")
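
The Python call above is also asynchronous. If you want the same wait-for-completion behavior as the WhizzML snippet, the Python bindings provide an api.ok helper that polls until the resource reaches a finished state. A short sketch:

from bigml.api import BigML
api = BigML()
my_deepnet = api.create_deepnet("dataset/59b0f8c7b95b392f12000000")
# Block until the Deepnet is finished before predicting with it.
api.ok(my_deepnet)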

Next up, we will configure a Deepnet with WhizzML. The configuration properties can easily be added in the mapping by using property pairs such as <property_name> and <property_value>. For instance, when you create a Deepnet from a dataset, BigML automatically fixes the number of iterations used to optimize the network to 20,000, but if you prefer a different maximum number of gradient steps to take during the optimization process, you should add the property “max_iterations” and set it to, say, 100,000. Additionally, you might want to set the value used by the Deepnet when numeric fields are missing; in that case, you need to set the “default_numeric_value” property to the right value. In the example shown below, missing numeric values are replaced by the field mean. Property names always need to be between quotes and values should be expressed in the appropriate type. The code for our example can be seen below:

;; Creates a deepnet with some settings. Once it's
;; completed the ID is stored in my_deepnet variable
(define my_deepnet
  (create-and-wait-deepnet {"dataset" training_dataset
                            "max_iterations" 100000
                            "default_numeric_value" "mean"}))

The equivalent code for the BigML Python Bindings is:

from bigml.api import BigML
api = BigML()
args = {"max_iterations": 100000, "default_numeric_value": "mean"}
training_dataset ="dataset/59b0f8c7b95b392f12000000"
my_deepnet = api.create_deepnet(training_dataset, args)

For more details about these and other properties, please check the dedicated API documentation (available on October 5).

Once the Deepnet has been created, we can evaluate how good its performance is. Now, we will use a different dataset with non-overlapping data to check the Deepnet performance. The “test_dataset” parameter in the code shown below represents the second dataset. Following WhizzML’s philosophy of “less is more”, the snippet that creates an evaluation has only two mandatory parameters: a Deepnet to be evaluated and a Dataset to use as test data.

;; Creates an evaluation of a deepnet
(define my_deepnet_ev
 (create-evaluation {"deepnet" my_deepnet "dataset" test_dataset}))

Similarly, in Python the evaluation is done as follows:

from bigml.api import BigML
api = BigML()
my_deepnet = "deepnet/59b0f8c7b95b392f12000000"
test_dataset = "dataset/59b0f8c7b95b392f12000002"
evaluation = api.create_evaluation(my_deepnet, test_dataset)

After evaluating your Deepnet, you can predict the results of the network for new values of one or many features in your data domain. In the code below, we demonstrate the simplest case, where input values are provided for only some of the fields in your dataset.

;; Creates a prediction using a deepnet with specific input data
(define my_prediction
 (create-prediction {"deepnet" my_deepnet
                     "input_data" {"sepal length" 2 "sepal width" 3}}))

And the equivalent code for the BigML Python bindings is:

from bigml.api import BigML
api = BigML()
input_data = {"sepal length": 2, "sepal width": 3}
my_deepnet = "deepnet/59b0f8c7b95b392f12000000"
prediction = api.create_prediction(my_deepnet, input_data)

In addition to this prediction, calculated and stored in BigML servers, the Python Bindings allow you to instantly create single local predictions on your computer. The Deepnet information is downloaded to your computer the first time you use it, and the predictions are computed locally on your machine, without any costs or latency:

from bigml.deepnet import Deepnet
local_deepnet = Deepnet("deepnet/59b0f8c7b95b392f12000000")
input_data = {"sepal length": 2, "sepal width": 3}
local_deepnet.predict(input_data)

It is pretty straightforward to create a Batch Prediction from an existing Deepnet, where the dataset named “my_dataset” represents a set of rows with the data to predict by the network:

;; Creates a batch prediction using a deepnet 'my_deepnet'
;; and the dataset 'my_dataset' as data to predict for
(define my_batchprediction
 (create-batchprediction {"deepnet" my_deepnet
                          "dataset" my_dataset}))

The code in Python Bindings to perform the same task is:

from bigml.api import BigML
api = BigML()
my_deepnet = "deepnet/59d1f57ab95b39750c000000"
my_dataset = "dataset/59b0f8c7b95b392f12000000"
my_batchprediction = api.create_batch_prediction(my_deepnet, my_dataset)

Want to know more about Deepnets?

Our next blog post, the last one of this series, will cover how Deepnets work behind the scenes, diving into the most technical aspects of BigML’s latest resource. If you have any questions or you’d like to learn more about how Deepnets work, please visit the dedicated release page. It includes a series of six blog posts about Deepnets, the BigML Dashboard and API documentation, the webinar slideshow as well as the full webinar recording.

Programming Deepnets with the BigML API

So far, we have introduced BigML’s Deepnets, how they are used, and how to create one in the BigML Dashboard. In this post, the fourth in our series of blog posts about Deepnets, we will see how to use them programmatically using the API. So let’s start!

The API workflow to create Deepnets includes five main steps: first, upload your data to BigML, then create a dataset, create a Deepnet, evaluate it and finally make predictions. Note that any resource created with the API will automatically be created in your Dashboard too so you can take advantage of BigML’s intuitive visualizations at any time.

Authentication

The first step in using the API is setting up your authentication. This is done by setting the BIGML_USERNAME and BIGML_API_KEY environment variables. Your username is the same as the one you use to log into the BigML website. To find your API key, on the website, navigate to your user account page and then click on ‘API Key’ on the left. To set your environment variables, you can add lines like the following to your .bash_profile file.

export BIGML_USERNAME=my_name
export BIGML_API_KEY=13245 
export BIGML_AUTH="username=$BIGML_USERNAME;api_key=$BIGML_API_KEY;"

Once your authentication is set up, you can begin your workflow.

1. Upload your Data

Data can be uploaded to BigML in many different ways. You can use a local file, a remote URL, or inline data. To create a source from a remote URL, use the curl command:

 curl "https://bigml.io/source?$BIGML_AUTH" \
 -X POST \
 -H 'content-type: application/json' \
 -d '{"remote": "https://static.bigml.com/csv/iris.csv"}'

2. Create a Dataset

Now that you have a source, you will need to process it into a dataset. Use the curl command:

curl "https://bigml.io/dataset?$BIGML_AUTH" \
-X POST \
-H 'content-type: application/json' \
-d '{"source": "source/59c14dce2ba7150b1500fdb5"}'

If you plan on running an evaluation later (and you will want to evaluate your results!), you will want to split this dataset into a testing and a training dataset. You will create your Deepnet using the training dataset (commonly 80% of the original dataset) and then evaluate it against the testing dataset (the remaining 20% of the original dataset). To make this split using the API, you will first run the command:

curl "https://bigml.io/dataset?$BIGML_AUTH" \
-X POST \
-H 'content-type: application/json' \
-d '{"origin_dataset": "dataset/59c153eab95b3905a3000054", 
     "sample_rate": 0.8, 
     "seed": "myseed"}'

using the sample rate and seed of your choice. This creates the training dataset.

Now, to make the testing dataset, run:

curl "https://bigml.io/dataset?$BIGML_AUTH" \
-X POST \
-H 'content-type: application/json' \
-d '{"origin_dataset": "dataset/59c153eab95b3905a3000054", 
     "sample_rate": 0.8,
     "out_of_bag": true,
     "seed": "myseed"}'

By setting “out_of_bag” to true, you are choosing all the rows you did not choose while creating the training set. This will be your testing dataset.

3. Create a Deepnet

Now that you have the datasets you need, you can create your Deepnet. To do this, use the command:

curl "https://bigml.io/deepnet?$BIGML_AUTH" \
-X POST \
-H 'content-type: application/json' \
-d '{"dataset": "dataset/59c15634b95b3905a1000032"}' 

being sure to use the dataset ID of your training dataset. This will create a Deepnet by using the default settings.

You can also modify the settings of your Deepnet in various ways. For example, if you want to change the maximum number of gradient steps to be ten and you want to name your deepnet “my deepnet” you could run:

curl "https://bigml.io/deepnet?$BIGML_AUTH" \
-X POST \
-H 'content-type: application/json' \
-d '{"dataset": "dataset/59c15634b95b3905a1000032",
     "max_iterations": 10,
     "name": "my deepnet"}' 

The full list of Deepnet arguments can be found in our API documentation, which will be fully available on October 5.

4. Evaluate your Deepnet

Once you have created a Deepnet, you may want to evaluate it to see how well it is performing. To do this, create an Evaluation using the resource ID of your Deepnet and the dataset ID of the testing dataset you created earlier.

curl "https://bigml.io/evaluation?$BIGML_AUTH" \
-X POST \
-H 'content-type: application/json' \
-d '{"deepnet": "deepnet/59c157cfb95b390597000085"'
     "dataset": "dataset/59c1568ab95b3905a0000040"}'

Once you have your Evaluation, you may decide that you want to change some of your Deepnet parameters to improve its performance. If so, just repeat step three with different parameters.

5. Make Predictions

When you are satisfied with your Deepnet, you can begin to use it to make predictions. For example, suppose you wanted to predict if the value of field “000001” was 3. To do this, use the command:

curl "https://bigml.io/prediction?$BIGML_AUTH" \
-X POST \
-H 'content-type: application/json' \
-d '{"deepnet": "deepnet/59c157cfb95b390597000085", 
     "input_data" : {"000001": 3}}'

Want to know more about Deepnets?

Stay tuned for more blog posts! In the next post, we will explain how to automate Deepnets with WhizzML and the BigML Python Bindings. Additionally, if you have any questions or you’d like to learn more about how Deepnets work, please visit the dedicated release page. It includes a series of six blog posts about Deepnets, the BigML Dashboard and API documentation, the webinar slideshow as well as the full webinar recording.

 

Creating your Deepnets with the BigML Dashboard

The BigML Team has been working hard this summer to bring Deepnets to the platform, which will be available on October 5, 2017. As explained in our introductory post, Deepnets are an optimized implementation of the popular Deep Neural Networks, a supervised learning technique that can be used to solve classification and regression problems. Neural Networks became popular because they were able to address problems that are complex for a machine to solve, like identifying objects in images.

The previous blog post presented a case study that showed how Deepnets can help you solve your real-world problems. In this post, we will take you through the five necessary steps to train a Deepnet that correctly identifies handwritten digits using the BigML Dashboard. We will use a partial version of the well-known MNIST dataset (provided by Kaggle) for image recognition that contains 42,000 handwritten images of numbers from 0 to 9.

1. Upload your Data

As usual, you need to start by uploading your data to your BigML account. BigML offers several ways to do so: you can drag and drop a local file, connect BigML to your cloud repository (e.g., S3 buckets), or copy and paste a URL. This will create a source in BigML.

BigML automatically identifies the field types. In this case, our objective field (“label”) has been identified as numeric because it contains the digit values from 0 to 9. Since this is a classification problem (we want our Deepnet to predict the exact digit label for each image instead of a continuous numeric value), we need to configure the field type and select “Categorical” for our objective field.

source-config.png

2. Create a Dataset

From your source view, use the 1-click dataset menu option to create a dataset, a structured version of your data ready to be used by a Machine Learning algorithm.

1-click-dataset.png

In the dataset view, you will be able to see a summary of your field values, some basic statistics, and the field histograms to analyze your data distributions. You can see that our dataset has a total of 42,000 instances and approximately 4,500 instances for each digit class in the objective field.

field-distrib.png

The dataset also includes a total of 784 fields containing the pixel information for each image. You can see that many fields are automatically marked as non-preferred in this dataset. This is because they contain the same value for all images so they are not good predictors for the model.

non-preferred fields.png

Since Deepnets are a supervised learning method, it is key to train and evaluate your model with different data to ensure it generalizes well against unseen data. You can easily split your dataset using the BigML 1-click menu option, which randomly sets aside 80% of the instances for training and 20% for testing.

split-dataset.png

3. Create a Deepnet

In BigML you can use the 1-click Deepnet menu option, which will create the model using the default parameter values, or you can tune the parameters using the Configure Deepnet option.

create-deepnet.png

BigML provides the following parameters to manually configure your Deepnet:

  • Maximum training time and maximum iterations: you can set an upper bound to the Deepnet runtime by setting a maximum of time or a maximum number of iterations.
  • Missing numerics and default numeric value: if your dataset contains missing values you can include them as valid values or replace them with the field mean, minimum, maximum or zero.
  • Network architecture: you can define the number of hidden layers in your network, the activation function, and the number of nodes for each of them. BigML also provides three other options related to how the layer connections are arranged, such as residual learning, batch normalization, and tree embedding (a tree-based representation of the data to input along with the raw features).
  • Algorithm: you can select the gradient descent optimizer, including Adam, Adagrad, Momentum, RMSProp, and FTRL. For each of these algorithms, you can tune the learning rate, the dropout rate, and a set of specific parameters whose explanation goes beyond the scope of this post. To learn more about the differences between these algorithms, you can read this article.
  • Weights: if your dataset contains imbalanced classes for the objective field, you can automatically balance them with this option that uses the oversampling strategy for the minority class.

Neural Networks are known for being notoriously sensitive to the chosen architecture and the algorithm used to optimize the parameters thereof. Due to this sensitivity and the large number of different parameters, hand-tuning Deepnets can be difficult and time-consuming as the number of choices that lead to poor networks typically vastly outnumber the choices that lead to good results.

To combat this problem, BigML offers first-class support for automatic parameter optimization. In this case, we will use the automatic network search option. This option searches over many possible network configurations, training and evaluating candidates and returning the best networks found for your problem. The final Deepnet will use the top networks found in this search to make predictions.

config-deepnet.png

When the network search optimizer is enabled, the Deepnet may take some time to be created (so be patient!). By default, the maximum training time is set to 30 minutes, but you can configure it.

When your Deepnet is created, you will be able to visualize the results in the Partial Dependence Plot. This unique view allows you to inspect the input fields’ impact on predictions. You can select two different fields for the axes and set the values for the rest of the input fields to the right. By hovering over the chart area, you can see the predictions for each of the classes, shown in different colors in the legend to the right. Each class color is shaded according to the class probability.

PDP-viz.png

4. Evaluate your Deepnet

The Deepnet looks good, but we can’t know anything about the performance until we evaluate it. From your Deepnet view, click on the evaluate option in the 1-click menu and BigML will automatically select the remaining 20% of the dataset that you set aside for testing.

evaluate.png

You can see in the image below that this model has an overall accuracy of 96.1%; however, a high accuracy may be hiding a bad performance for some of the classes.

evaluation.png

To look at the correct decisions as well as the mistakes made by the model per class, we need to look at the confusion matrix. However, we have too many different categories in the objective field and BigML cannot plot all of them in the Dashboard, so we need to download the confusion matrix in Excel format.

confusion-matrix.png

Ok, it seems we get very good results! On the diagonal of the table you can find the right decisions made by the model. Almost all categories have a precision and recall over 95%. Despite these great results, can they get better using other algorithms? We trained a Random Decision Forest (RDF) of 20 trees and a Logistic Regression to compare their performances with Deepnets. BigML offers a comparison tool to easily compare the results of different classifiers. We use the well-known ROC curve and the AUC as the comparison measure. We need to select a positive class to make the comparison (in this case we selected the digit 9), and the ROC curves for each model are plotted in the chart. You can see in the image below how our Deepnet outperforms the other two models since its ROC AUC is higher.

comparison-tool.png

Although RDF also provides pretty good results, the ROC AUC for Deepnets is consistently better for all the digits from 0 to 9 as you can see in the first column in the table below.

table-comparison.png

5. Make Predictions using your Deepnet

Predictions work for Deepnets exactly the same as for any other supervised method in BigML. You can make predictions for a new single instance or multiple instances in batch.

Single predictions

Click on the Predict option from your Deepnet view. A form containing all your input fields will be displayed and you will be able to set the values for a new instance. At the top of the view, you will get all the objective class probabilities for each prediction.

predict.png

Since this dataset contains more than 100 fields, we cannot perform single predictions from the BigML Dashboard, but we can use the BigML API for that.

Batch predictions

If you want to make predictions for multiple instances at the same time, click on the Batch Prediction option and select the dataset containing the instances for which you want to know the objective field value.

batch-pred2.png

You can configure several parameters of your batch prediction like the possibility to include all class probabilities in the output dataset and file. When your batch prediction finishes, you will be able to download the CSV file and see the output dataset.

batch-pred.png

Want to know more about Deepnets?

Stay tuned for the next blog post, where you will learn how to program Deepnets via the BigML API. If you have any questions or you’d like to learn more about how Deepnets work, please visit the dedicated release page. It includes a series of six blog posts about Deepnets, the BigML Dashboard and API documentation, the webinar slideshow as well as the full webinar recording.

Case Study: Finding Higgs Bosons with Deepnets

In our previous blog post, we’ve introduced Deep Neural Networks. In today’s blog post, the second of our series of six, we’ll use BigML’s deep networks, called Deepnets, to walk through a supervised learning problem.

The Data

The dataset we are investigating is the Higgs dataset found at the UCI Machine Learning Repository. The problem, as explained in the original Nature Communications paper, is that particle accelerators create a huge amount of data. The Large Hadron Collider, for example, can produce 100 billion collisions in an hour and only a tiny fraction may produce detectable exotic particles like the Higgs boson. So a well-trained model is immensely useful for finding the needle in the haystack.

By Lucas Taylor / CERN-EX-9710002-1 (http://cdsweb.cern.ch/record/628469)

This dataset contains 28 different numeric fields and 11 million rows generated through simulation, but imitating actual collisions. As explained in the paper, these 28 fields come in two kinds. The first 21 of the fields capture low-level kinematic properties measured directly by the detectors in the accelerator. The last seven fields are combinations of the kinematic fields hand-designed by physicists to help differentiate between collisions that result in a Higgs boson and those that do not. So some of these fields are “real” data as measured, and some are constructed from domain-specific knowledge.

This kind of feature engineering is very common when solving Machine Learning problems, and can greatly increase the accuracy of your predictions. But what if we don’t have a physicist around to assist in this sort of feature engineering? Compared to other Machine Learning techniques (such as Boosted Trees), Deepnets can perform quite well with just the low-level fields by learning their own higher-level combinations (especially if the low-level fields are numeric and continuous).

To try this out, we created a dataset of just the first 1.5 million data points (for speed), and removed the last seven high-level features. We then split the data into an 80% training dataset and a 20% testing dataset. Next up, we create a Deepnet using BigML’s automatic structure search.


Deep neural networks can be built in a multitude of ways, but we’ve simplified this by intelligently searching through many possible network structures (number of layers, activation functions, nodes, etc.) and algorithm parameters (the gradient descent optimizer, the learning rate, etc.) and then making an ensemble classifier from the best individual networks. You can manually choose a network structure, tweak the search parameters, or simply leave all those details to BigML (as we are doing here).

The Deepnet


Once the Deepnet is created, it would be nice to know how well it is performing. To find out, we create an Evaluation using this Deepnet and the 20% testing dataset we split off earlier.


We get an accuracy of 65.9% and a ROC AUC of 0.7232. Perhaps we could even improve these numbers by lengthening the maximum training time. This is significantly better than our results using a Boosted Tree run with default settings. Here they are compared:


The Boosted Tree has an accuracy of 59.4% and a ROC AUC of 0.6394. The Boosted Tree is just not able to pull as much information out of these low-level features as the Deepnet. This is a clear example of when you should choose Deepnets over other supervised learning techniques to solve your classification or regression problems.

Want to know more about Deepnets?

Stay tuned for more posts! In the next post we will explain how to use Deepnets through the BigML Dashboard. Additionally, if you have any questions or you’d like to learn more about how Deepnets work, please visit the dedicated release page. It includes a series of six blog posts about Deepnets, the BigML Dashboard and API documentation, the webinar slideshow as well as the full webinar recording.

The Importance of Protecting Your Inventions On Time

BigML, in collaboration with Telefónica, the law firms of Schwabe Williamson & Wyatt and FisherBroyles, ValenciaLAB, and the Valencian Municipality (VIT Emprende, València Activa, and Ajuntament de València) are bringing to Spain a special event about the importance of patents in protecting your inventions expeditiously.

This event will be held in Madrid, Barcelona, and Valencia, on October 25, 26 and 27 respectively. These three workshops will cover the importance of being IP-aware, as protecting the Intellectual Property (IP) of your business is key for its commercial success, whether your product is on the market or still under development. By patenting your ideas, brand, and offerings, your company becomes more competitive in the marketplace and more attractive for investors willing to support your company.

Why attend?

The event will clearly define what a patent is, while addressing topics such as the importance of patenting new ideas in your business, the risks you bear if your IP is not properly patented, the impact of patents on continuous innovation, the process to protect your inventions, and the grants or awards that you can receive when patenting your ideas. All these questions will be discussed from three different perspectives: beginning from a startup’s point of view, presented by Francisco J. Martín, BigML’s CEO; following up with a perspective of big corporations, explained by Luis Ignacio Vicente del Olmo, Head of Telefonica Patent Office at Telefonica; and concluding with the legal viewpoint of the importance of IP by Micah D. Stolowitz, IP Strategist at FisherBroyles, and Graciela Gómez Cowger, IP Strategist and CEO at Schwabe, Williamson & Wyatt. All the sessions will delve into the current IP climate for both European and US companies.

Who should attend?

Intellectual Property and Patent Attorneys, Intellectual Property Patent Engineers, C-level Executives, Corporate General Counsel, Corporate Attorneys, Financing Executives, Venture Firm Capitalists, and any other professional interested in this topic.

If you have your own business or you plan to have one in the near future, you should not miss this chance to learn from two of the most experienced lawyers in the Intellectual Property field. Moreover, hearing the insider stories from two completely different organizations will help you crystallize the impact that patents can have in your business as you set off on your own patent journey.

When and Where?

These workshops will be held in three cities in Spain during October 2017:

On October 25 the event will take place in Madrid at Telefónica Building:

  • Room: Agora in Wayra Madrid (8th Floor)
  • Address: Gran Via, 28, 28013, Madrid (Access from Valverde Street, 2)

On October 26 the workshop will be held in Barcelona at Diagonal 00 Telefónica Tower:

  • Room: Auditorium (2nd Floor)
  • Address: Ernest Lluch and Martin Square, 5, 08019, Barcelona

On October 27, the last workshop will be held in Valencia at ValenciaLAB.

  • Room: Conference Room (Ground Floor)
  • Address: Music Peydro Street, 36, 46001, Valencia

Speakers

Dr. Francisco J. Martín, the Co-Founder and CEO of BigML, is an innovative leader experienced at inventing, designing, building, and operating Intelligent Systems, from concept development to market validation and business growth.

Before BigML, from 2004 to 2010, Francisco J. Martín was CEO and founder of Strands, a leading company that develops recommendation and personalization technologies. Strands pioneered research and applications of social recommendation technologies in several domains (music, videos, personal finance, sports, etc). Prior to that, from 1999 to 2003, Dr. Martin was founder and CEO of iSOCO, a company that specializes in business solutions built on top of Artificial Intelligence. iSOCO was the first spin-off of the IIIA Research Centre of Artificial Intelligence, belonging to the Spanish Council for Scientific Research; and pioneered research and applications of semantic web technologies.

Regarding his education, Dr. Martín conducted fellowship research at the same center from January 1996 to 1999; received his Computer Science degree in 1996 from the Polytechnic University of Valencia; holds a PhD in Artificial Intelligence from the Polytechnic University of Catalunya; and completed a postdoc in Machine Learning at Oregon State University.

 

Dr. Luis Ignacio Vicente del Olmo is currently managing the Return on Innovation Area of Telefonica SA, the telco operator based in Spain with operations in Europe and South America, with a focus on new areas such as 5G, Machine Learning, and Industry 4.0. This role includes the leadership of Telefonica Patent Office, the leading Intellectual Property Unit in Spain. Dr. del Olmo is a member of preeminent Spanish, European & International Boards related to R&D Management, a Professor at the Master of Digital Science promoted by the European Institute of Technology, and a member of the Board of Telefonica I+D Chile, the main R&D Center of Telefonica in South America.

Dr. del Olmo has over 25 years of working experience in R&D and innovation management, mainly at Telefonica. He is a senior expert in R&D and innovation management with international experience, including 20 years of experience working with the European Commission, MIT, OECD, BEI, and BID, among others.

Dr. del Olmo is an engineer and has a PhD in Physics (with a specialization in Electronics). He completed a Master in Analysis and Management of Science & Technology, graduating in Economy of Telecommunications, and has a degree in Industrial Engineering and Innovation Management. He is also a specialist in Innovation Economy, and a graduate of European Communities by the Diplomatic School of the Spanish Ministry of Foreign Affairs.

 

Mr. Micah Stolowitz, IP Strategist at FisherBroyles, has practiced law for 36 years, focused on intellectual property protection, enforcement and transactions. He represents a wide range of companies from Fortune 100 to new startups and many in between, and has been recognized by his peers as a Superlawyer®  in IP every year for over 10 years. He has testified as an expert witness in patent litigation, and also served as a Special Master, appointed to assist the district court in Lizardtech, Inc. v. Earth Resource Mapping, Inc. (CAFC 2005). Micah Stolowitz is a fellow of the Academy of Court Appointed Masters.

Mr. Stolowitz works with clients on IP strategy, patent and trademark prosecution, licensing and monetization. His work includes infringement and validity studies and opinions, and design around advice. He has negotiated patent sales valued in millions of dollars. A sampling of his extensive experience would include drafting and prosecuting patents directed to digital, analog and mixed signal circuits, software of all kinds, cryptographic and other systems for security and authentication, physical object “fingerprinting,” identification and authentication, internet high bandwidth and availability, wireless telecommunication, 3GPP standards, database systems, SaaS “cloud computing,” prediction systems, Machine Learning, character recognition, solid state and memory and disk drives, printing technologies, and medical devices.

Micah Stolowitz serves as adjunct professor of Patent Law at Lewis and Clark College, where he also completed his Juris Doctor degree. Mr. Stolowitz has worked as an electrical engineer in Silicon Valley, and holds a B.S. degree in Electrical Engineering and Computer Science, from the University of California, Berkeley. His industry experience helps him see IP challenges from a client’s perspective and align his legal services with client goals. He is committed to staying on the cutting edge of technological advances, and counts himself fortunate to work with many great teachers (the inventors in the firm he is serving).  

 

Ms. Graciela Gómez Cowger, CEO-Select and IP Strategist at Schwabe

Ms. Graciela Gómez Cowger, IP Strategist and CEO at Schwabe, Williamson & Wyatt, helps individuals and companies protect innovations in the technology and health industries. She prepares, files, and prosecutes patents in the electronics, software and communications arts and also drafts patent infringement analysis and opinion letters. Working closely with inventors and companies, Ms. Gómez Cowger helps assess the value of patent portfolios, and crafts licensing and other strategies to maximize intellectual property investments.

Ms. Gómez Cowger has extensive experience helping individuals and companies develop branding strategies that protect market presence, including preparing, filing and prosecuting trademark applications in the United States and abroad. Before becoming a lawyer, she worked as a research and design engineer at HP Inc, a large electronics company developing cutting-edge printing systems. This experience has allowed her to work collaboratively with inventors to quickly identify the distinctions that make innovations patentable.

Apply to join the event:

IMPORTANT: Attendance is free but by invitation only. Space is limited, so please fill in this form to register for the event and make sure you select the preferred location, either Madrid, Barcelona, or Valencia. Shortly after your registration, the BigML Team will send you your invitation.

In case you have questions, please check the dedicated event page for more details.

Introduction to Deepnets

We are proud to present Deepnets as the new resource brought to the BigML platform. On October 5, 2017, it will be available via the BigML Dashboard, API and WhizzML. Deepnets (an optimized version of Deep Neural Networks) are part of a broader family of classification and regression methods based on learning data representations from a wide variety of data types (e.g., numeric, categorical, text, image). Deepnets have been successfully used to solve many types of classification and regression problems in addition to social network filtering, machine translation, bioinformatics and similar problems in data-rich domains.

Intro to Deepnets

In the spirit of making Machine Learning easy for everyone, we will provide new learning material for you to start with Deepnets from scratch and progressively become a power user. We start by publishing a series of six blog posts that will gradually dive deeper into the technical and practical aspects of Deepnets. Today’s post sets off by explaining the basic Deepnet concepts. We will follow with an example use case. Then, there will be several posts focused on how to use and interpret Deepnets through the BigML Dashboard, API, and WhizzML and Python Bindings. Finally, we will complete this series with a technical view of how Deepnet models work behind the scenes.

Let’s dive right in!

Why Bring Deepnets to BigML?

Unfortunately, there’s a fair amount of confusion about Deepnets in the popular media as part of the ongoing “AI misinformation epidemic”. This has caused the uninitiated to regard Deepnets as some sort of robot messiah destined to either save or destroy our planet. Contrary to the recent “immaculate conception”-like narrative fueled by Deepnets’ achievements in the computer vision domain after 2012, the theoretical background of Deepnets dates back 25+ years.

So what explains Deepnets’ newfound popularity?
  • The first reason has to do with sheer scale. There are problems that involve massive parameter spaces that can be sufficiently represented only by massive data. In those instances, Deepnets can come to the rescue, thanks to the abundance of modern computational power. Speech recognition is a good example of such a challenging use case, where the difference between 97% accuracy and 98.5% accuracy can mean the difference between a consumer product that is very frustrating to interact with and one that is capable of near-human performance.
  • In addition, the availability of a number of open source frameworks for computational graph composition has helped popularize the technique among more scientists. Such frameworks “compile” the required Deepnet architectures into a highly optimized set of commands that run quickly and with maximum parallelism. They essentially work by symbolically differentiating the objective for gradient descent, thus freeing practitioners from having to work out the underlying math themselves (the short sketch below illustrates the kind of derivative such frameworks compute for you).
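To make the point about differentiation concrete, here is a minimal, hypothetical numpy sketch (not BigML code, and not a real graph framework): the gradient of a logistic objective is worked out by hand, which is exactly the algebra that computational-graph frameworks derive for you automatically.

```python
import numpy as np

# Toy gradient descent on a logistic objective. The gradient below is
# hand-derived; graph frameworks produce this derivative automatically
# via symbolic/automatic differentiation.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                 # 200 instances, 3 features
true_w = np.array([1.5, -2.0, 0.5])           # synthetic "true" weights
y = (X @ true_w > 0).astype(float)            # synthetic labels

w = np.zeros(3)
learning_rate = 0.1
for _ in range(500):
    p = 1 / (1 + np.exp(-(X @ w)))            # forward pass: logistic function
    gradient = X.T @ (p - y) / len(y)         # hand-derived gradient of log-loss
    w -= learning_rate * gradient             # one gradient descent step

print(np.round(w, 2))                         # should point roughly along true_w
```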

Comparing Deepnets to Other Classifiers

All good, but if we look beyond the hype, what are the major advantages and disadvantages of Deepnets that deserve weighing before we give them serious consideration as part of our Machine Learning toolbox? For this, it's best to contrast them with the pros and cons of alternative classifiers.

  • As you’ll remember, decision trees have the advantage of massive representational power that expands as your data gets larger due to efficient and fast search capabilities. On the negative side, decision trees struggle with the representation of smooth functions as their axis-parallel thresholds require many variables to be able to account for them. Tree ensembles, however, mitigate some of these difficulties by learning a bunch of trees from sub-samples to counter the smoothness issue.
  • As far as Logistic Regression is concerned, it can handle some smooth, multivariate functions as long as they lie in its hypothesis space. Furthermore, it trains very fast. However, because it is a parametric method, it tends to fall short in use cases where the decision boundary is nonlinear.

So, the question is: can Deepnets mitigate the shortcomings of trees or Logistic Regression? Just like trees (or tree ensembles), Deepnets give us arbitrary representational power by modifying their underlying structure. On the other hand, just like Logistic Regression, Deepnets handle smooth, multivariate objectives without trouble, provided that we have the right network architecture underneath.

You may already be suspecting what the catch is, since there is no free lunch in Machine Learning. The first tradeoff comes in the form of ‘Efficiency’, because there is no guarantee that the right neural network structure for a given dataset will be easily found. As a matter of fact, most structures are not useful, so you're really left with no choice but to try different structures by trial and error. As a result, nailing the Deepnet structure remains a time-consuming task compared with the decision trees' greedy optimization routine.

‘Interpretability’ is also negatively impacted, as the Deepnet practitioner ends up quite far away from the intuitive interpretability of tree-based algorithms. One possible solution is to use sampling and tree induction to create decision tree-like explanations for your Deepnet predictions (more on this later).
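To picture the sampling-and-tree-induction idea, here is a rough sketch using scikit-learn as a stand-in black box (this illustrates the general surrogate-tree technique, not BigML's actual explanation mechanism): fit a shallow decision tree to a trained network's own predictions and read off its rules.

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

# Train a small neural network as the "black box" to be explained.
X, y = make_classification(n_samples=2000, n_features=6, random_state=42)
net = MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=500,
                    random_state=42).fit(X, y)

# Induce a shallow surrogate tree on the network's predictions: the tree's
# rules become a human-readable approximation of the network's behavior.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=42)
surrogate.fit(X, net.predict(X))

print(export_text(surrogate))  # IF/THEN rules approximating the network
```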

Deepnets vs. LR vs. Trees
Where does this leave us as to when to use Deepnets? First off, let’s outline the factors that make Deepnets less useful:
  • If you have smaller datasets (e.g., thousands instead of millions of instances), you may be better off looking into other techniques like Logistic Regression or tree ensembles.
  • Since better features almost always beat better models, problems that benefit from quick iterations may best be handled by other approaches. That is, if you need to iterate quickly and there are many creative features you can generate from your dataset to do so, it's usually best to skip Deepnets and their trial-and-error iterations in favor of algorithms that fit faster, e.g., tree ensembles.
  • If your problem's cost-benefit equation doesn't require every ounce of accuracy you can grab, you may also be better off with other, more efficient algorithms that consume fewer resources.

Finally, remember that Deepnets are just another sort of classifier. As Stuart J. Russell, Professor of Computer Science at the University of California, Berkeley, has put it:

  • “…deep learning has existed in the neural network community for over 20 years. Recent advances are driven by some relatively minor improvements in algorithms and models and by the availability of large data sets and much more powerful collections of computers.”

From Zero to Deepnet Predictions

In BigML, a regular Deepnet workflow is composed of training a model on your data, evaluating it, and using it to predict what will happen in the future. In that way, it is very much like the other supervised modeling methods available in BigML. But what makes Deepnets different, if at all?

1. The training data structure: The instances in the training data can be arranged in any manner. Furthermore, all types of fields (numeric, categorical, text, and items) and missing values are supported, in the same vein as other classification and regression models and ensembles.

2. Deepnet models: A Deepnet model can accept either a numeric field (for regression) or a categorical field (for classification) in the input dataset as the objective field. In a nutshell, BigML's Deepnets implementation is differentiated from similar algorithms by its automatic optimization option, which helps you discover the best Deepnet parametrization during your network search. We will get into the details of our approach in a future post that will focus on what goes on under the hood.

3. Evaluation: Evaluations for Deepnet models are similar to other supervised learning evaluations, where random training-test splits are usually preferred. The same classification and regression performance metrics are applicable to Deepnets (e.g., AUC for classification or R-squared for regression). These metrics are fully covered in the Dashboard documentation.

4. Prediction: As with other classification and regression resources, BigML Deepnets can be used to make single or batch predictions depending on your need. A minimal end-to-end sketch of this workflow follows below.
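The following is a minimal sketch of the train/evaluate/predict loop using the BigML Python bindings, assuming the bindings are installed and BIGML_USERNAME and BIGML_API_KEY are set in your environment; the CSV file name and the input field names are hypothetical placeholders, not part of any real dataset.

```python
from bigml.api import BigML

api = BigML()  # reads BIGML_USERNAME and BIGML_API_KEY from the environment

# 1. Training data: upload a CSV (hypothetical file) and build a dataset.
source = api.create_source("churn.csv")
api.ok(source)
dataset = api.create_dataset(source)
api.ok(dataset)

# Split into 80% training / 20% test using the same deterministic seed.
train = api.create_dataset(dataset, {"sample_rate": 0.8, "seed": "deepnet-demo"})
test = api.create_dataset(dataset, {"sample_rate": 0.8, "seed": "deepnet-demo",
                                    "out_of_bag": True})
api.ok(train)
api.ok(test)

# 2. Deepnet model: train on the training split.
deepnet = api.create_deepnet(train)
api.ok(deepnet)

# 3. Evaluation: score the model against the held-out split.
evaluation = api.create_evaluation(deepnet, test)
api.ok(evaluation)

# 4. Prediction: single prediction for a new instance (placeholder fields).
prediction = api.create_prediction(deepnet, {"plan": "basic", "usage": 123})
api.ok(prediction)
```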

Understanding Deepnet Models

Not a lot of people talk about Deepnets as a generalization of Logistic Regression, but we can very well think about them that way. (If you need a refresher on Logistic Regression, we recommend a quick read of our related post.) Take the following representation of Logistic Regression with nodes and arrows. The circles at the bottom are the input features and the ones on top are the corresponding output probabilities. Each arrow represents one of the coefficients (betas) to learn, which means the output becomes a function of those betas you're trying to learn.

Logistic Regression

What if we add a bunch more circles as a hidden layer in the middle, as seen below? In this case, the intermediate nodes would be computed the same way as before. Following the same approach, we can add as many nodes and layers to our structure as we choose. Not only that, we can also change the function that computes each node's output. In the case of Logistic Regression, it is the well-known logistic function, which is easily differentiable and optimized. By the same token, any other easily differentiable function can replace the logistic function when it comes to Deepnet structures. Given all these structural choices, you can intuit how representationally powerful Deepnets can be, yet how difficult searching for the optimal structure can get in such a vast hypothesis space. (A tiny numeric sketch follows the figure below.)

Deep Neural Network
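To make the "Logistic Regression plus extra layers" intuition concrete, here is a tiny, hypothetical numpy forward pass with made-up weights (illustrative only, not how BigML trains Deepnets): the zero-hidden-layer case is exactly Logistic Regression, and adding a layer just repeats the same computation with a swappable activation function.

```python
import numpy as np

def logistic(z):
    return 1 / (1 + np.exp(-z))

x = np.array([0.2, -1.3, 0.7])                 # one instance, 3 input features

# Logistic Regression: inputs connect directly to the output node.
w = np.array([0.5, -0.4, 1.1]); b = 0.1
p_logreg = logistic(x @ w + b)

# Add one hidden layer of 4 nodes: each hidden node is computed the same way,
# and the output becomes a function of the hidden activations.
W1 = np.full((3, 4), 0.3); b1 = np.zeros(4)
W2 = np.full(4, 0.2);      b2 = 0.0
hidden = logistic(x @ W1 + b1)                 # same logistic units, one layer up
p_deepnet = logistic(hidden @ W2 + b2)

# Swapping the activation is just swapping the function, e.g. tanh:
hidden_tanh = np.tanh(x @ W1 + b1)

print(round(float(p_logreg), 3), round(float(p_deepnet), 3))
```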

In Summary

To wrap up, BigML’s Deepnet models:
  • Help solve use cases such as computer vision, speech recognition, natural language processing, audio recognition, social network filtering, machine translation, and bioinformatics, among others.
  • Can be used for classification and regression problems, alongside the other algorithms BigML provides.
  • Implement a generalized form of Deep Neural Networks, instead of a specialized architecture optimized for a specific use case, e.g., RNN, CNN, or LSTM.
  • Come with an automatic optimization option that lets you discover the best Deepnet parametrization during your network search.

You can create Deepnet models, interpret and evaluate them, as well as make predictions with them via the BigML Dashboard, our API and Bindings, plus WhizzML (to automate your Deepnet workflows). All of these options will be showcased in future blog posts.

Want to know more about Deepnets?

At this point, you may be wondering how to apply Deepnets to solve real-world problems. For starters, in our next post we will walk through a use case, examining an example dataset to see if we can make accurate predictions based on our Deepnet model. Stay tuned!

If you have any questions or you’d like to learn more about how Deepnets work, please visit the dedicated release page. It includes a series of six blog posts about Deepnets, the BigML Dashboard and API documentation, the webinar slideshow as well as the full webinar recording.

BigML Summer 2017 Release and Webinar: Deepnets!

BigML’s Summer 2017 Release is here! Join us on Thursday October 5, 2017, at 10:00 AM PDT (Portland, Oregon. GMT -07:00) / 07:00 PM CEST (Valencia, Spain. GMT +02:00) for a FREE live webinar to discover the latest update of the BigML platform. We will be presenting Deepnets, a highly effective supervised learning method that solves classification and regression problems in a way that can match or exceed human performance, especially in domains where effective feature engineering is difficult.

Deepnets, the new resource that we bring to the BigML Dashboard, API and WhizzML, are an optimized version of Deep Neural Networks, the machine-learned models loosely inspired by the neural circuitry of the human brain. Deepnets are state-of-the-art in many important supervised learning applications. To avoid the difficult and time-consuming work of hand-tuning the algorithm, BigML's unique implementation of Deep Neural Networks offers first-class support for automatic network topology search and parameter optimization. BigML makes it easier for you by searching over the space of possible networks for your dataset and returning the best network found to solve your problem.

As with any other supervised learning model, you need to evaluate the performance of your Deepnets to get an estimate of how good your model will be at making predictions for new data. To do this, prior to training your model, you will need to split your dataset into two different subsets (one for training and the other for testing). When your Deepnet model is trained, you can use your pre-built test dataset to evaluate its performance and easily interpret the results with BigML's evaluation comparison tool.

One of the main goals of any BigML resource is making predictions, and Deepnets are no exception. A Deepnet has one or more layers of nodes between the input and output layers, and the final layer's output is the network's prediction: an array of per-class probabilities for classification problems, or a single real value for regression problems. Moreover, BigML provides a prediction explanation whereby you can request a list of human-readable rules that explain why the network predicted a particular class or value.

Want to know more about Deepnets?

If you have any questions or you’d like to learn more about how Deepnets work, please visit the dedicated release page. It includes a series of six blog posts about Deepnets, the BigML Dashboard and API documentation, the webinar slideshow as well as the full webinar recording.

 

Sending off the BigML #Summer2017 Interns

During the summer of 2017, a group of interns joined the BigML Team. They came from different backgrounds and different countries, and we wanted to let them briefly share their experiences interning at BigML.

Barbara Martin, Summer 2017 Intern

My name is Barbara Martin, and I had the opportunity to complete a 5-month internship at BigML as part of my 2nd-year Engineering School program at ISIMA in France. I spent the first four months of my internship in Valencia, Spain, working on several Machine Learning projects. For the last month of my internship, I went to Corvallis, USA, to do a thrilling project with other interns. This topic suits my major in Computer Science and also brought me to the interesting area of Machine Learning. During my internship, I gained a lot of knowledge and had a great chance to sharpen my skills in a professional working environment. I learned new technologies and had opportunities to practice my communication skills by giving presentations and having discussions with my supervisors, experts in the field, and the other staff within and outside BigML.

Jeremiah Lin, Summer 2017 Intern

My name is Jeremiah Lin. I had the opportunity to work for BigML this summer as a marketing intern. I am going into my third year as an undergraduate at the University of Oregon with a major in Business Administration and a minor in Product Design. During my internship, I worked on translating and reviewing BigML resources that will be released in other languages. I also worked on a research project about how Machine Learning can be implemented to improve the efficiency of personalized marketing. I had little knowledge about Machine Learning going into my internship, but I have learned a lot about it throughout my time at BigML, and it was eye-opening to see its power to revolutionize the marketing world and many other industries. I loved working with the interns and other team members, and I am grateful for working in such a supportive environment as the one we had at the office. This internship has been a great opportunity to learn about Machine Learning, but more importantly, to learn from the hard-working, team-oriented culture that is embedded in this company.
María Peña, Summer 2017 Intern

My name is María Peña and I am a final-year student of Industrial Design at the UPV. During my internship at BigML, I have discovered and learned how Machine Learning can change our lives and help people develop their companies. I have also had the opportunity to meet the BigML Team, which includes people from all over the world working together with the same dream. I do everything related to design, and I usually spend my time on web design and graphic design, so thanks to BigML I have improved my skills in HTML and CSS. Working at BigML is the perfect experience to be part of the change and start your career in an amazing way.

Mohan Kumar Janapareddi, Summer 2017 Intern

My name is Mohan Kumar Janapareddi, and I graduated from Ferris State University in the Information Security and Intelligence program. During my internship at BigML, I completed the BigML Certification Program, which helped me learn all the major concepts of Machine Learning in a short period of time. I also got to be a part of the interns' group project. We worked on building a dynamic website where users can easily make predictions for selected features by simply uploading their data to the website. My portion of the project involved learning how to work with the Django web framework to build a website from scratch. Overall, working at BigML has been a great opportunity to strengthen my technical knowledge of Machine Learning and programming.

Ryan Alder, Summer 2017 Intern

My name is Ryan Alder, and I worked at BigML as an intern for just over a month this summer. I am a second-year student at Oregon State University studying Computer Science. I joined BigML because I plan to write my thesis for the OSU Honors College on Machine Learning. I did not know much about Machine Learning when I first joined, and this internship has taught me a great deal about what actually happens behind the scenes and how companies such as BigML are able to take a dataset and accurately convert it into predictions. During my internship, I became a Certified Engineer through the BigML Certification Program, and I worked on an application with other interns. My responsibility for this project was working on the front end, mainly the website. I incorporated HTML, CSS, and JavaScript to make a working prototype of the website. I had a great deal of fun working with the people here at BigML, and they taught me a significant amount as well.


Interested in becoming a BigML intern next summer? Let us know at openings@bigml.com and we’d love to hear if you have project ideas. We are always looking for energetic, team-oriented, self-starters to join our team and help bring Machine Learning to everyone!
