
Predicting TED Talks Popularity

Everyone knows TED talks. TED started in 1984 as a conference series on technology, education, and design. In essence, TED talks aim to democratize knowledge. Nowadays it produces more than 200 talks per year addressing dozens of different topics. Despite the critics who claim that TED talks reduce complex ideas to 20-minute autobiographical stories of inspiration, the great influence they have on the diffusion of knowledge in our society is undeniable.

Predicting Ted Talks Views with Machine Learning

When I came across the TED dataset on Kaggle, the first thing that caught my attention was the great dispersion in the number of views: from 50K to over 47M (with a median of ~1M views). One can't help but wonder: what makes some talks 20 times more popular than others? Can the TED organizers and speakers do something in advance to maximize the views? In this blog post, we'll try to predict the popularity of TED talks and analyze the most influential factors.

The Data

The original file provides information for 2,550 talks over one decade: from 2006 (only 50 talks were published) until 2017 (over 200 talks were published). When you create a dataset in BigML, you can see all the features with their types and their distributions. You can see the distribution of talks over the years in the last field in the dataset seen below.


For text fields, we can inspect the word frequency in a tag cloud. For example, the most used words in the titles are “world”, “life” and “future”.


When we take a look at the features, we see two main sets: one that informs us about the talk's impact (comments, languages, and views, our objective field), and another that describes the talk's characteristics (the title, description, transcript, speakers, duration, etc.). Apart from the original features, we came up with two additional ones: the days between video creation and publishing, and the days between publishing and dataset collection on Sept 21, 2017.

The fields related to talk impact may be a future consequence of our objective field: the views. A talk with more views is more likely to have a higher number of comments and to be translated into more languages. Therefore, it's best if we exclude those from our analysis; otherwise, we will create a "leakage", i.e., leak information from the future into the past, thus obtaining unrealistically good predictions.
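The exclusion itself is simple; a minimal sketch of the idea, with illustrative field names following the post's description:

```python
# Hypothetical field names for illustration, not the dataset's exact schema
fields = ["title", "description", "transcript", "tags", "speaker",
          "duration", "comments", "languages", "views"]

objective = "views"
leaky = {"comments", "languages"}  # future consequences of the objective

# keep only the talk-characteristic fields as predictors
predictors = [f for f in fields if f != objective and f not in leaky]
```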

On the other hand, all the fields related to the talk characteristics can be used as predictors. Most of them are text fields such as the title, the description, or the transcript. BigML supports text fields for all supervised models, so we can just feed the algorithms with them. However, we suspect that not all the thematically related talks necessarily use the same words. Thus, using the raw text fields may not be the best way to find patterns in the training data that can be generalized to other TED talks. What if instead of using the exact raw words in the talks, we could extract their main topics and use them as predictors? That is exactly what BigML topic models allow us to do!

Extracting the Topics of TED talks

We want to know the main topics of our TED talks and use them as predictors. To do this, we build a topic model in BigML using the title, the description, the transcript, and the tags as input fields.

The coolest thing about BigML topic models is that you don't need to worry about text pre-processing. It's very handy that BigML automatically cleans the punctuation marks, homogenizes the cases, excludes stopwords, and applies stemming during the topic model creation. You can also fine-tune those settings and include bigrams as you wish by configuring your topic model in advance.


When our topic model is created, we can see that BigML has found 40 different topics in our TED talks including technology, education, business, religion, politics, and family, among others.  You can see all of them in a circle map visualization in which each circle represents a topic. The size of the circle represents the importance of that topic in the dataset and related topics are located closer in the graph. Each topic is a distribution over term probabilities. If you mouse over each topic you can see the top 20 terms and their probabilities within that topic. BigML also provides another visualization in which you can see all the top terms per topic displayed in horizontal bars. You can observe both views below or better yet inspect the model by yourself here!

BigML topic models are an optimized implementation of Latent Dirichlet Allocation (LDA), one of the most popular probabilistic methods for topic modeling. If you want to learn more about topic models, please read the documentation.

All the topics found seem coherent with the main TED talk themes. Now we can use this model to calculate the topic probabilities for any new text. Just click on the option Topic Distribution in the 1-click action menu, and you will be redirected to the prediction view where you can set new values for your input fields. See below how the distribution over topics changes when you change the text in the input fields.

Now we want to do the same for our TED talks. To calculate the topic probabilities for each TED talk we use the option Batch Topic Distribution in the 1-click action menu. Then we select our TED talks dataset. We also need to make sure that the option to create a new dataset out of the topic distribution is enabled!


When the batch topic distribution is created, we can find the new dataset with additional numeric fields, containing the probabilities of each topic per TED talk. These are the fields that we will use as inputs to predict the views replacing the transcript, title, description, and tags.


Predicting the TED Talks Views

Now we are ready to predict the talk views. As we mentioned at the beginning of this post, the number of views is a widely dispersed and highly skewed field, so predicting the exact number of views can be difficult. To find more general patterns of how topics influence talk popularity, we are going to discretize the views field and make it categorical. This is very easy in BigML: you just need to select the option to Add fields to the dataset in the configure option menu. We are going to discretize the field by percentiles, using the median.


Then we click the button to create a dataset. The dataset contains a new field with two classes: the first class containing the talks below the median number of views (less than 1M views), and a second class containing the talks over the median number of views (more than 1M views).
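In code, the discretization amounts to a simple threshold at the median; a toy sketch with made-up view counts:

```python
import statistics

# toy view counts standing in for the real field (values are illustrative)
views = [50_000, 300_000, 950_000, 1_200_000, 4_000_000, 47_000_000]
median = statistics.median(views)

# two classes: below vs. over the median number of views
labels = ["below median" if v < median else "over median" for v in views]
```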


Before creating our classification model, we need to split our dataset into two subsets: 80% for training and the remaining 20% for testing, to ensure that our model generalizes well against data it has not seen before. We can easily do this in BigML by using the corresponding option in the 1-click action menu, as shown below.
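Under the hood, such a split is just a seeded shuffle followed by a cut; a minimal sketch (BigML does this for you, this is only to illustrate the idea):

```python
import random

rows = list(range(100))      # stand-ins for the dataset's rows
rng = random.Random(42)      # a fixed seed makes the split reproducible
rng.shuffle(rows)

cut = int(len(rows) * 0.8)
train, test = rows[:cut], rows[cut:]
```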


We proceed with the 80% of our dataset to create our predictive model. To compare how different algorithms perform, we create a single tree, an ensemble (Random Decision Forest), a logistic regression and the latest addition to BigML, deepnets (an optimized implementation of the popular deep neural networks). You can easily create those models from the dataset menus. BigML automatically selects the last field in the dataset as the objective field “views (discretized)” so we don’t need to configure it differently. Then we use the 1-click action menu to easily create our models.


Apart from the 1-click deepnet, which uses an automatic parameter optimization option called Structure Suggestion, we also create another deepnet by configuring an alternative automatic option called Network Search. BigML offers this unique capability for automatic parameter optimization to eliminate the difficult and time-consuming work of hand-tuning your deepnets.


After some iterations, we realize that the features related to the speaker have no effect on the number of views, so we eliminate those along with the field "event", which seems to be causing overfitting. In the end, we use as input fields all the topics, the year of the published date, the duration of the talk, plus our calculated field that measures the number of days since the published date (until the 21st of Sept 2017).

After creating all the models with these selected features, we need to evaluate them by using the remaining 20% of the dataset that we set aside earlier. We can easily compare the performance of our models with the BigML evaluation comparison tool, where the ROC curves can be analyzed altogether. As seen below, the winner with the highest AUC (0.776) is the deepnet that uses the automatic parametrization option "Network Search". The second best performing model is again a deepnet, the one using the automatic option "Structure Suggestion", with an AUC of 0.7557. In third place, we see the ensemble (AUC of 0.7469), followed by the logistic regression (AUC of 0.7097) and finally the single tree (AUC of 0.6781).


When we take a look at the confusion matrix of our top performing deepnet, we can see that we are achieving over 70% recall with a 70% precision for both classes of the objective field.


Inspecting the Deepnet

Usually, deep neural network predictions are hard to analyze, which is why BigML provides ways to make it easy for you to understand why your model is making particular decisions.

First, BigML can tell us which fields are most important in predicting the number of views. By clicking on the “Summary Report” option, we get a histogram with a percentage per field that displays field importances. We can see that (not surprisingly) the “days_since_published” is the most important field (19.33%), followed by the topic “Entertainment” (15.90%), and the “published_date.year” (13.62%). Amongst the top 10 most important fields we can also find the topics “Relationships”, “Global issues”, “Research”, “Oceans”, “Data”, “Psychology” and “Science”.


Great! We can state that our deepnet found the topics as relevant predictors in deciding the number of views. But how exactly do these topics impact predictions? Will talks about psychology have more or less views than talks about science? To answer this question, BigML provides a Partial Dependence Plot view, where we can analyze the marginal impact of the input fields on the objective field. Let’s see some examples (or play with the visualization here at your leisure).

For example, see in the image below how the combination of the topics "Entertainment" and "Psychology" has a positive impact on the number of views. Higher probabilities of those topics result in the prediction of our 2nd class (the blue one), the class with over 1M views.


On the contrary, if we select “Health” instead, we can see how higher probabilities of this topic result in a higher probability of predicting the 1st class (the class below 1M number of views).


We can also see a change over time in the interest in some topics. As you can see below, the probability that a talk with a high psychology topic probability reaches more than 1M views has increased in recent years, over the 2012 to 2017 period.



In summary, we have seen that topics do have a significant influence on the number of views. After analyzing the impact of each topic in predictions, we observe that “positive” topics such as entertainment or motivation are more likely to have a greater number of views while talks with higher percentages of the “negative” topics like diseases, global issues or war are more likely to have fewer views. In addition, it seems that the interest in individual human-centered topics such as psychology or relationships has increased over the years to the detriment of broader social issues like health or development.

We hope you enjoyed this post! If you have any questions don’t hesitate to contact us at

How to Create a WhizzML Script – Part 4

As the final installment of our WhizzML blog series (Part 1, Part 2, Part 3), it’s time to learn the BigML Python bindings way to create WhizzML scripts. With this last tool at your disposal, you’ll officially be a WhizzML-script-creator genius!

About BigML Python bindings

BigML Python bindings allow you to interact with the BigML API. You can use them to easily create, retrieve, list, update, and delete BigML resources (i.e., sources, datasets, models and predictions). This tool becomes even more powerful when it is used with WhizzML. Your WhizzML code can be stored and executed in BigML by using three kinds of resources: Scripts, Libraries, and Executions (see the first post: How to create WhizzML Script. Part 1).

WhizzML Scripts can be executed in BigML’s servers in a controlled, fully-scalable environment that automatically takes care of their parallelization and fail-safe operation. Each execution uses an Execution resource to store the arguments and the results of the process. WhizzML Libraries store generic code to be shared or reused in other WhizzML Scripts. But this is not really new, so let’s get into some new tricks instead.


You can create and execute your WhizzML scripts via the bindings to create powerful workflows. BigML bindings are available for multiple languages, e.g., Python, C#, Java, or PHP. In this post we'll use the Python bindings, but you can find the others on GitHub. If you need a BigML Python bindings refresher, please refer to the dedicated documentation.


In BigML a script resource stores WhizzML source code, and the results of its compilation. Once a WhizzML script is created, it’s automatically compiled. If compilation is successful, the script can be run, that is, used as the input for a WhizzML execution resource. Suppose we want a simple script that creates a model from a dataset ID. The code in WhizzML to do that is:

# define the code that will be compiled in the script
script_code = "(create-model {\"dataset\" dataset-id})"

However, you need to keep in mind that every script can have inputs and outputs. In this case, we’ll want the dataset-id variable to contain our input, the dataset ID. In the code below, we see how to describe this input, its type and even associate a specific ID that will be used by default when no input is provided.

from bigml.api import BigML
api = BigML()

# define the code included in the script
script_code = "(create-model {\"dataset\" dataset-id})"

# define the script inputs
args = {"inputs": [{
    "name": "dataset-id",
    "type": "dataset-id",
    "default": "dataset/598ceafe4006830450000046",
    "description": "Dataset to be modeled"
}]}

# create the script
script = api.create_script(script_code, args)

# wait for the script compilation to finish
api.ok(script)


To execute a compiled WhizzML script in BigML, you need to create an execution resource. Each execution is run under its associated user credentials and its particular environment constraints. Furthermore, a script can be shared, and you can execute the same script several times under different usernames by creating a different execution. For this, you need to first identify the script you want to execute and set your inputs as an array of arrays (one per variable-value pair). In this execution, we are setting the value of variable x in the script to 2. The parameters for the execution are the script inputs. It’s a piece of cake!

from bigml.api import BigML
api = BigML() 

# choose workflow
script = 'script/591ee3d00144043f5c003061' 

# define parameters
args = {"inputs": [['x', 2]]}

# execute and wait for the execution to finish
execution = api.create_execution(script, args)
api.ok(execution)


We haven’t created libraries in the previous posts, but there’s always a first time. A library is a shared collection of WhizzML functions and definitions usable by any script that imports them. It’s a special kind of compiled WhizzML source code that only defines functions and constants. Libraries can be imported in scripts. The imports attribute of a script can contain a list of library IDs whose defined functions and constants will be usable in the script. It’s very easy to create WhizzML libraries by using BigML Python bindings.

Let's go through a simple example. We'll define a single function, get-max, which gets the maximum value in a list. In the WhizzML language this can be expressed as below.

from bigml.api import BigML
api = BigML()

# define the library code
lib_code = \
    "(define (get-max xs)" \
    "  (reduce (lambda (x y) (if (> x y) x y))" \
    "    (head xs) xs))"

# create a library
library = api.create_library(lib_code)

# wait for the library compilation to finish
api.ok(library)

For more examples, don’t hesitate to take a look at the WhizzML resources section in the BigML Python Bindings documentation.

So that's it! You now know all the techniques needed to create and execute WhizzML scripts. We've covered the basic concepts of WhizzML, how to clone existing scripts or import them from GitHub (Part 1), how to create new scripts by using Scriptify and the editor (Part 2), how to use the BigMLer command line (Part 3) and finally the BigML Python bindings in this last post.


To go deeper into the world of Machine Learning automation via WhizzML, please check out our WhizzML tutorials such as the Automated Dataset Transformation tutorial. If you need a WhizzML refresher, you can always visit the WhizzML documentation. Start with the “Hello World” section, in the Primer Guide, for a gentle introduction. And yes, please be on the lookout for more WhizzML related posts on our blog.

BigML Summer 2017 Release Webinar Video is Here: Deepnets!

We are happy to share that Deepnets are fully implemented on our platform and available from the BigML Dashboard, API, as well as from WhizzML for its automation.

BigML Deepnets are the world’s first deep learning algorithm capable of Automatic Network Detection and Structure Search, which automatically optimize model performance and minimize the need for expensive specialists to drive business value. Following our mission to make Machine Learning beautifully simple for everyone, BigML now offers the very first service that enables non-experts to use deep learning with results matching that of top-level data scientists. BigML’s extensive benchmark conducted on 50+ datasets has shown Deepnets, an optimized version of Deep Neural Networks brought to the BigML platform, to outperform other algorithms offered by popular Machine Learning libraries. With nothing to install, nothing to configure, and no need to specify a neural network structure, anyone can use BigML’s Deepnets to transform raw data into valuable business insights.

Special thanks to all webinar attendees who joined the BigML Team yesterday during the official launch. For those who missed the live webinar, you can watch the full recording on the BigML YouTube channel.

As explained in the video, one of the main complexities of deep learning is that a Machine Learning expert is required to find the best network structure for each problem. This can often be a tedious trial-and-error process that can take from days to weeks. To combat these challenges and make deep learning accessible for everyone, BigML now enables practitioners to find the best network for their data without having to write custom code or hand-tune parameters. We make this possible with two unique parameter optimization options: Automatic Network Search and Structure Suggestion.

BigML’s Automatic Network Search conducts an intelligent guided search over the space of possible networks to find suitable configurations for your dataset. The final Deepnet will use the top networks found in this search to make predictions. This capability yields a better model, however, it takes longer since the algorithm conducts an extensive search for the best solution. It’s ideal for use cases that justify the incremental wait for optimal Deepnet performance. On the other hand, BigML’s Structure Suggestion only takes nominally longer than training a single network. This option is capable of swiftly recommending a neural network structure that is optimized to work well with your particular dataset.

For further learning on Deepnets, please visit our dedicated summer 2017 release page, where you will find:

  • The slides used during the webinar.
  • The detailed documentation to learn how to create and evaluate your Deepnets, and interpret the results before making predictions from the BigML Dashboard and the BigML API.
  • The series of six blog posts that gradually explain Deepnets.

Thanks for your positive comments after the webinar. And remember that you can always reach out to us at for any suggestions or questions.

Deepnets: Behind The Scenes

Over our last few blog posts, we’ve gone through the various ways you can use BigML’s new Deepnet resource, via the Dashboard, programmatically, and via download on your local machine. But what’s going on behind the curtain? Is there a little wizard pulling an elaborate console with cartoonish-looking levers and dials?

Well, as we'll see, Deepnets certainly do have a lot of levers and dials. So many, in fact, that using them can be pretty intimidating. Thankfully, BigML is here to be your wizard, so you aren't the one looking shamefacedly at Dorothy when she realizes you're not as all-powerful as she thought.

BigML Deep Learning

Deepnets:  Why Now?

First, let’s address an important question, why are deep neural networks suddenly all the rage?  After all, the Machine Learning techniques underpinning deep learning have been around for quite some time. The reason boils down to a combination of innovations in the technology supporting the learning algorithms more than advances in learning algorithms themselves. It’s worth quoting Stuart Russell at length here:

. . .deep learning has existed in the neural network community for over 20 years. Recent advances are driven by some relatively minor improvements in algorithms and models and by the availability of large data sets and much more powerful collections of computers.

He gets at most of the reasons in this short paragraph.  Certainly, the field has been helped along by the availability of huge datasets like the ones generated by Google and Facebook, as well as some academic advances in algorithms and models.

But the things I will focus on in this post are in the family of “much more powerful collections of computers”.  In the context of Machine Learning, I think this means two things:

  • Highly parallel, memory-rich computers provisionable on-demand. Few people can justify building a massive GPU-based server to train a deep neural network on huge data if they're only going to use it every now and then. But most people can afford to rent the same for a few days at a time. Making such power available in this way makes deep learning cost-effective for far more people than it used to be.
  • Software frameworks that have automatic differentiation as first-class functionality. Modern computational frameworks (like TensorFlow, for example) allow programmers to instantiate a network structure programmatically and then just say “now do gradient descent!” without ever having to do any calculus or worry about third-party optimization libraries.  Because differentiation of the parameters with respect to the input data is done automatically, it becomes much easier to try a wide variety of structures on a given dataset.

The problem here then becomes one of expertise:  people need powerful computers to do Machine Learning, but few people know how to provision and deploy machines on, say Amazon AWS to do this. Similarly, computational frameworks to specify and train almost any deep network exist and are very nice, but exactly how to use those frameworks and exactly what sort of network to create are problems that require knowledge in a variety of different domains.

Of course, this is where we come in. BigML has positioned itself at the intersection of these two innovations, utilizing a framework for network construction that allows us to train a wide variety of network structures for a given dataset, using the vast amount of available compute power to do it quickly and at scale, as we do with everything else. We add to these innovations a few of our own, in an attempt to make Deepnet learning as “hands off” and as “high quality” as we can manage.

What BigML Deepnets are (currently) Not:

Before we get into exactly what BigML's Deepnets are, let's talk a little bit about what they aren't. Many people who are technically minded will immediately bring to mind the convolutional networks that do so well on vision problems, or the recurrent networks (such as LSTM networks) that have great performance on speech recognition and NLP problems.

We don’t yet support these network types.  The main reason for this is that these networks are designed to solve supervised learning problems that have a particular structure, not the general supervised learning problem.  It’s very possible we’ll support particular problems of those types in the future (as we do with, say, time series analysis and topic modeling), and we’d introduce those extensions to our deep learning capabilities at that time.

Meanwhile, we’d like to bring the power of deep learning so obvious in those applications to our users in general.  Sure, deep learning is great for vision problems, but what can it do on my data?  Hopefully, this post will prove to you that it can do quite a lot.

Down to Brass Tacks

The central idea behind neural networks is fairly simple, and not so very different from the idea behind logistic regression: You have a number of input features and a number of possible outputs (either one probability for each class in classification problems or a single value for regression problems). We posit that the outputs are a function of the dot product of the input features with some learned weights, one per feature. In logistic regression, for example, we imagine that the probability of a given class can be expressed as the logistic function applied to this dot product (plus some bias term).
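That logistic regression baseline fits in a few lines; a sketch with made-up weights and features, just to make the dot-product-plus-bias recipe concrete:

```python
import math

def logistic(z):
    # squashes any real number into the (0, 1) probability range
    return 1.0 / (1.0 + math.exp(-z))

def class_probability(weights, features, bias):
    # dot product of inputs with learned weights, plus a bias term
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return logistic(z)

# illustrative values, not learned from any real dataset
p = class_probability([0.8, -0.5], [1.0, 2.0], 0.1)
```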

Deep networks extend this idea in several ways. First, we introduce the idea of “layers”, where the inputs are fed into a layer of output nodes, the outputs of which are in turn fed into another layer of nodes, and so on until we get to a final “readout” layer with the number of output nodes equal to the number of classes.
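A toy sketch of that layering, chaining the same dense-layer recipe twice (weights are invented for illustration, and real networks need not use the logistic activation everywhere):

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense_layer(inputs, weights, biases):
    # each output node applies an activation to its own weighted sum of the inputs
    return [logistic(sum(w * x for w, x in zip(node_w, inputs)) + b)
            for node_w, b in zip(weights, biases)]

x = [1.0, 0.5]
hidden = dense_layer(x, [[0.4, -0.2], [0.3, 0.8]], [0.0, -0.1])  # hidden layer
readout = dense_layer(hidden, [[1.0, -1.0]], [0.2])              # readout layer
```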


This gives rise to an infinity of possible network topologies:  How many intermediate (hidden) layers do you want?  How many nodes in each one?  There’s no need to apply the logistic function at each node; in principle, we can apply anything that our network framework can differentiate. So we could have a particular “activation function” for each layer or even for each node! Could we apply multiple activation functions?  Do we learn a weight for every single node in one layer to every single node in another, or do we skip some?

Add to this the usual parameters for gradient descent: Which algorithm to use? How about the learning rate? Are we going to use dropout during training to avoid overfitting? And let's not even get into the usual parameters common to all ML algorithms, like how to handle missing data or objective weights. Whew!

Extra Credit

What we've described above is the vanilla feed-forward neural network that the literature has known for quite a while, and we can see that they're pretty parameter heavy. To add a bit more to the pile, we've added support for a couple of fairly recent advances in the deep learning world (some of the "minor improvements" mentioned by Russell) that I'd like to mention briefly.

Batch Normalization

During the training of deep neural networks, the activations of internal layers of the network can change wildly throughout the course of training. Often, this means that the gradients computed for training can behave poorly, so one must be very careful to select a sufficiently low learning rate to mitigate this behavior.

Batch normalization fixes this by normalizing the inputs to each layer for each training batch of instances, assuring that the inputs are always mean-centered and unit-variance, which implies well-behaved gradients when training. The downside is that you must now know both mean and variance for each layer at prediction time so you can standardize the inputs as you did during training. This extra bookkeeping tends to slow down the descent algorithm’s per-iteration speed a bit, though it sometimes leads to faster and more robust convergence, so the trade-off can be worthwhile for certain network structures.
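The normalization step itself is just centering and scaling per batch; a minimal sketch of that transformation (real batch normalization also learns a scale and shift per node, omitted here):

```python
import statistics

def batch_normalize(activations):
    # center to zero mean and scale to unit variance across the batch
    mean = statistics.fmean(activations)
    std = statistics.pstdev(activations)
    return [(a - mean) / std for a in activations]

normed = batch_normalize([2.0, 4.0, 6.0, 8.0])
```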

Learn Residuals

Residual networks are networks with “skip” connections built in. That is, every second or third layer, the input to each node is the usual output from the previous layer, plus the raw, unweighted output from two layers back.

The theory behind this idea is that this allows information present in the early layers to bubble up through the later layers of the network without being subjected to a possible loss of that information via reweighting on subsequent layers. Thus, the later layers are encouraged to learn a function representing the residual values that will allow for good prediction when used on top of the values from the earlier layers. Because the early layers contain significant information, the weights for these residual layers can often be driven down to zero, which is typically “easier” to do in a gradient descent context than to drive them towards some particular non-zero optimum.
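A residual block is then just "learned transform plus raw input"; a toy sketch showing why zeroed-out weights reduce the block to a pass-through:

```python
def residual_block(x, transform):
    # the block's output is the learned transform plus the raw input ("skip" path)
    return [t + r for t, r in zip(transform(x), x)]

# if gradient descent drives the learned transform to zero, the block
# simply passes the earlier layer's information through unchanged
passthrough = residual_block([1.0, 2.0], lambda v: [0.0 for _ in v])
shifted = residual_block([1.0, 2.0], lambda v: [0.5 for _ in v])
```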

Tree Embedding

When this option is selected, we learn a series of decision trees, random forest-style, against the objective before learning (with appropriate use of holdout sets, of course). We then use the predictions of the trees as generated “input features” for the network. Because these features tend to have a monotonic relationship with the class probabilities, this can make gradient descent reach a good solution more quickly, especially in domains where there are many non-linear relationships between inputs and outputs.
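The shape of the trick is easy to sketch: append the trees' predictions to the raw feature vector before the network sees it. The "trees" below are hand-written stand-ins; in the real case they are learned, random forest-style:

```python
# toy stand-ins for a small forest: each "tree" maps the raw features
# to a class-probability estimate (learned from data in the real case)
trees = [
    lambda x: 0.9 if x[0] > 0.5 else 0.2,
    lambda x: 0.8 if x[1] > 1.0 else 0.1,
]

def embed(features):
    # append the trees' predictions as extra, generated input features
    return features + [tree(features) for tree in trees]

augmented = embed([0.7, 0.3])
```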

If you like, this is a rudimentary form of stacked generalization, with the decision trees acting as the first-level learners.

Pay No Attention To The Man Behind The Curtain

So it seems we’ve set ourselves an impossible task here. We have all of these parameters. How on earth are we going to find a good network in the middle of this haystack?

Here’s where things get BigML easy:  the answer is that we’re going to do our best to find ones that work well on the dataset we’re given. We’ll do this in two ways:  via metalearning and via hyper-parameter search.


Metalearning is another idea that is nearly as old as Machine Learning itself.  In its most basic form, the idea is that we learn a bunch of classifiers on a bunch of different datasets and measure their performance. Then, we apply Machine Learning to get a model that predicts the best classifier for a given dataset. Simple, right?

In our application, this means we’re going to learn networks of every sort of topology parameterized in every sort of way. What do I mean by “every sort”?  Well, we’ve got over 50 datasets so we did five replications of 10-fold cross-validation on each one. For each fold, we learned 128 random networks, then we measured the relative performance of each network on each fold.  How many networks is that?  Here, allow me to do the maths:

50 * 5 * 10 * 128 = 320,000.

"Are you saying you trained 320,000 neural networks?" No, no, no, of course not! Some of the datasets were prohibitively large, so we only learned a paltry total of 296,748 networks. This is what we do for fun here, people!

When we model the relative quality of a network given its parameters (which, of course, we do using BigML), we learn a lot of interesting little bits about how the parameters of neural networks relate to one another and to the data on which they're being trained.

You’ll get better results using the “adadelta” descent algorithm, for example, with high learning rates, as you can see from the green areas of the figure below, which indicate parts of the parameter space that specify networks with better relative performance.


But if you’re using “rms_prop”, you’ll want to use learning rates that are several orders of magnitude lower.


Thankfully, you don’t have to remember all of this.  The wisdom is in the model, and we can use this model to make intelligent suggestions about which network topology and parameters you should use, which is exactly what we do with the “Automatic Structure Suggestion” button.


But, of course, the suggested structure and parameters represent only a guess at the optimal choices, albeit an educated one.  What if we’re wrong?  What if there’s another network structure out there that performs well on your data, if only you could find it?  Well, we’ll have to go looking for it, won’t we?

Network Search

Our strategy here again comes in many pieces.  The first is Bayesian hyperparameter optimization.  I won’t go into it too much here because this part of the architecture isn’t too much more than a server-side implementation of SMACdown, which I’ve described previously. This technique essentially allows a clever search through the space of possible models by using the results of previously learned networks to guide the search. The cleverness here lies in using the results of the previously trained networks to select the best next one to evaluate. Beyond SMAC, we’ve done some experiments with regret-based acquisition functions, but the flavor of the algorithm is the same.
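The flavor of such a search can be sketched in a few lines of Python. Everything here is a toy stand-in (a one-parameter "network", a nearest-neighbor surrogate instead of SMAC's regression trees), not BigML's actual implementation, but the loop structure is the same: evaluate, model the results, propose, repeat.

```python
import random

random.seed(0)  # deterministic for illustration

def objective(lr):
    # Stand-in for "train a network with this learning rate, measure error".
    # The search only ever sees its outputs, never this formula.
    return (lr - 0.3) ** 2

def surrogate(lr, history):
    # Toy surrogate model: predict the error of the nearest evaluated point.
    nearest = min(history, key=lambda h: abs(h[0] - lr))
    return nearest[1]

# Two initial (parameter, error) evaluations seed the model of the space.
history = [(lr, objective(lr)) for lr in (0.01, 0.9)]

for _ in range(30):
    # Propose many candidates, rank them with the cheap surrogate,
    # then spend the one expensive evaluation on the most promising.
    candidates = [random.uniform(0, 1) for _ in range(20)]
    promising = min(candidates, key=lambda lr: surrogate(lr, history))
    history.append((promising, objective(promising)))

best_lr, best_err = min(history, key=lambda h: h[1])
```

Each round, the surrogate steers the single expensive evaluation toward regions that have already scored well, so evaluations tend to concentrate near the optimum instead of being spent uniformly at random.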

We’ve also more heavily parallelized the algorithm, so the backend actually keeps a queue of networks to try, reordering that queue periodically as its model of network performance is refined.

The final innovation here is using bits and pieces of the hyperband algorithm.  The insight of the algorithm is that a lot of savings in computation time can be gained by simply stopping training on networks that, even if their fit is still improving, have little hope of reaching near-optimal performance in reasonable time. Our implementation differs significantly in the details (in our interface, for example, you provide us with a budget of search time that we always respect), but does stop training early for many underperforming networks, especially in the later stages of the search.
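A stripped-down version of the early-stopping idea is successive halving: train everything a little, keep only the currently promising half, and give the survivors more training budget. The learning "curves" below are simulated functions standing in for real Deepnet training.

```python
# Toy successive halving: keep training the most promising configurations,
# stop the rest early. Learning curves are simulated, not real training.

def loss(config, steps):
    # Simulated validation loss: each config approaches its own floor.
    floor, speed = config
    return floor + 1.0 / (1 + speed * steps)

# (asymptotic-loss, learning-speed) pairs; a lower floor is better long-term.
configs = [(0.30, 0.5), (0.10, 0.8), (0.20, 1.0), (0.05, 0.9)]

pool, steps = list(configs), 1
while len(pool) > 1:
    # Rank the pool by current loss, drop the worse half,
    # and train the survivors four times longer before the next cut.
    pool.sort(key=lambda c: loss(c, steps))
    pool = pool[: len(pool) // 2]
    steps *= 4

best = pool[0]  # the config that survived every rung
```

The savings come from the dropped configurations: they only ever consume the small early budget, while full training cost is reserved for the handful that keep looking competitive.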

This is all available right next door to the structure suggestion button in the interface, as the “Automatic Network Search” option. Between metalearning and network search, we can make a good guess as to some reasonable network parameters for your data, and we can do a clever search for better ones if you’re willing to give us a bit more time.  It sounds good on paper, so let’s take it for a spin.


So how did we do? Remember those 50+ datasets we used for metalearning?  We can use the same datasets to benchmark the performance of our network search against other algorithms (before you ask, no, we didn’t apply our metalearning model during this search as that would clearly be cheating). We again do 5 replications of 10-fold cross validation for each dataset and measure performance over 10 different metrics. As this was a hobby project I started before I came to BigML, you can see the result here.

Deepnets Benchmark

You can see on the front page a bit of information about how the benchmark was conducted (scroll below the chart), and a comparison of 30+ different algorithms from various software packages. The thing being measured here is “How close is the performance of this algorithm to the best algorithm for a given dataset and metric?”  You can quickly see that BigML Deepnets are the best, or quite close to it, more often than any other off-the-shelf algorithm we’ve tested.
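That measurement can be sketched as follows; the algorithm names, dataset names, and scores below are all made up for illustration, but the "average deficit from the per-dataset winner" calculation is the idea.

```python
# Sketch of the benchmark's measure: for each dataset, how close is each
# algorithm's score to the best score any algorithm achieved there?
# All names and numbers here are invented for illustration.

scores = {                     # algorithm -> {dataset: score, higher = better}
    "deepnet":  {"iris": 0.97, "higgs": 0.72, "churn": 0.88},
    "boosted":  {"iris": 0.95, "higgs": 0.64, "churn": 0.90},
    "logistic": {"iris": 0.96, "higgs": 0.60, "churn": 0.81},
}

datasets = ["iris", "higgs", "churn"]
best = {d: max(s[d] for s in scores.values()) for d in datasets}

# Average deficit from the per-dataset winner; 0.0 would mean
# "best on every dataset". Note no algorithm needs to win everywhere
# to rank first overall.
deficit = {
    algo: sum(best[d] - s[d] for d in datasets) / len(datasets)
    for algo, s in scores.items()
}
ranking = sorted(deficit, key=deficit.get)
```

Ranking by deficit rather than by raw wins rewards an algorithm that is consistently near the top everywhere, which is exactly the "pull it off the shelf" property the benchmark is after.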

The astute reader will certainly reply, “Well, yes, but you’ve done all manner of clever optimizations on top of the usual deep learning; you could apply such cleverness to any algorithm in the list (or a combination!) and maybe get better performance still!”

This is absolutely true.  I’m certainly not saying that BigML deep learning is the best Machine Learning algorithm there is; I don’t even know how you would prove something like that. But what these results do show is that if you’re going to just pull something off the shelf and use it, with no parameter tuning and little or no coding, you could do a lot worse than to pull BigML off the shelf. Moreover, the results show that deep learning (BigML or otherwise; notice that multilayer perceptrons in scikit-learn are just a few clicks down the list) isn’t just for vision and NLP problems; it might be the right thing for your data too.

Another lesson here, editorially, is that benchmarking is a subtle thing, easily done wrong. If you go to the “generate abstract” page, you can auto-generate a true statement (based on these benchmarks) that “proves” that any algorithm in this list is “state-of-the-art”. Never trust a benchmark on a single dataset or a single metric! While any benchmark of ML algorithms is bound to be inadequate, we hope that this benchmark is sufficiently general to be useful.


Hopefully, all of this has convinced you to go off to see the Wizard and give BigML Deepnets a try. Don’t be the Cowardly Lion! You have nothing to lose except underperforming models…

Want to know more about Deepnets?

Please visit the dedicated release page for further learning. It includes a series of six blog posts about Deepnets, the BigML Dashboard and API documentation, the webinar slideshow as well as the full webinar recording.


Automating Deepnets with WhizzML and The BigML Python Bindings


This blog post, the fifth of our series of six posts about Deepnets, focuses on those users that want to automate their Machine Learning workflows using programming languages. If you follow the BigML blog, you may know WhizzML, BigML’s domain-specific language for automating Machine Learning workflows, implementing high-level Machine Learning algorithms, and easily sharing them with others. WhizzML helps developers to create Machine Learning workflows and execute them entirely in the cloud. This avoids network problems, memory issues and lack of computing capacity while taking full advantage of WhizzML’s built-in parallelization. If you are not familiar with WhizzML yet, we recommend that you read the series of posts we published this summer about how to create WhizzML scripts: Part 1, Part 2 and Part 3 to quickly discover their benefits.

In addition, in order to easily automate the use of BigML’s Machine Learning resources, we maintain a set of bindings, which allow users to work with the BigML platform in their favorite language (Java, C#, PHP, Swift, and others).


Let’s see how to use Deepnets through both the popular BigML Python Bindings and WhizzML, but note that the operations described in this post are also available in this list of bindings.

We start by creating Deepnets with the default settings. For that, we need an existing Dataset in BigML to train the network, so our call to the API will need to include the ID of the Dataset we want to use for training, as shown below:

;; Creates a deepnet with default parameters
(define my_deepnet (create-deepnet {"dataset" training_dataset}))

The BigML API is mostly asynchronous; that is, the above creation function will return a response before the Deepnet creation is completed. This implies that the Deepnet information is not ready to make predictions right after the code snippet is executed, so you must wait for its completion before you can predict with it. You can use the directive “create-and-wait-deepnet” for that:

;; Creates a deepnet with default settings. Once it's
;; completed the ID is stored in my_deepnet variable
(define my_deepnet
  (create-and-wait-deepnet {"dataset" training_dataset}))

If you prefer the BigML Python Bindings, the equivalent code is:

from bigml.api import BigML
api = BigML()
my_deepnet = api.create_deepnet("dataset/59b0f8c7b95b392f12000000")

Next up, we will configure a Deepnet with WhizzML. Configuration properties can easily be added to the mapping as pairs of <property_name> and <property_value>. For instance, when you create a Deepnet from a dataset, BigML automatically sets the number of iterations used to optimize the network to 20,000, but if you prefer a different cap on the number of gradient steps taken during optimization, you can add the “max_iterations” property and set it to, say, 100,000. Additionally, you might want to set the value used by the Deepnet when numeric fields are missing; for that, set the “default_numeric_value” property accordingly. In the example shown below, missing values are replaced by the field’s mean. Property names always need to be between quotes and the value should be expressed in the appropriate type. The code for our example can be seen below:

;; Creates a deepnet with some settings. Once it's
;; completed the ID is stored in my_deepnet variable
(define my_deepnet
  (create-and-wait-deepnet {"dataset" training_dataset
                            "max_iterations" 100000
                            "default_numeric_value" "mean"}))

The equivalent code for the BigML Python Bindings is:

from bigml.api import BigML
api = BigML()
args = {"max_iterations": 100000, "default_numeric_value": "mean"}
training_dataset ="dataset/59b0f8c7b95b392f12000000"
my_deepnet = api.create_deepnet(training_dataset, args)

For more details about these and other properties, please check the dedicated API documentation (available on October 5).

Once the Deepnet has been created, we can evaluate how well it performs. For that, we will use a different dataset with non-overlapping data. The “test_dataset” parameter in the code shown below represents this second dataset. Following WhizzML’s philosophy of “less is more”, the snippet that creates an evaluation has only two mandatory parameters: a Deepnet to be evaluated and a Dataset to use as test data.

;; Creates an evaluation of a deepnet
(define my_deepnet_ev
 (create-evaluation {"deepnet" my_deepnet "dataset" test_dataset}))

Similarly, in Python the evaluation is done as follows:

from bigml.api import BigML
api = BigML()
my_deepnet = "deepnet/59b0f8c7b95b392f12000000"
test_dataset = "dataset/59b0f8c7b95b392f12000002"
evaluation = api.create_evaluation(my_deepnet, test_dataset)

After evaluating your Deepnet, you can use it to predict results for new values of one or many features in your data domain. In the code below, we demonstrate the simplest case, where input values are supplied for only some of the fields in your dataset.

;; Creates a prediction using a deepnet with specific input data
(define my_prediction
 (create-prediction {"deepnet" my_deepnet
                     "input_data" {"sepal length" 2 "sepal width" 3}}))

And the equivalent code for the BigML Python bindings is:

from bigml.api import BigML
api = BigML()
input_data = {"sepal length": 2, "sepal width": 3}
my_deepnet = "deepnet/59b0f8c7b95b392f12000000"
prediction = api.create_prediction(my_deepnet, input_data)

In addition to this prediction, calculated and stored in BigML servers, the Python Bindings allow you to instantly create single local predictions on your computer. The Deepnet information is downloaded to your computer the first time you use it, and the predictions are computed locally on your machine, without any costs or latency:

from bigml.deepnet import Deepnet
local_deepnet = Deepnet("deepnet/59b0f8c7b95b392f12000000")
input_data = {"sepal length": 2, "sepal width": 3}
prediction = local_deepnet.predict(input_data)

It is pretty straightforward to create a Batch Prediction from an existing Deepnet, where the dataset named “my_dataset” represents a set of rows with the data to predict by the network:

;; Creates a batch prediction using a deepnet 'my_deepnet'
;; and the dataset 'my_dataset' as data to predict for
(define my_batchprediction
 (create-batchprediction {"deepnet" my_deepnet
                          "dataset" my_dataset}))

The code in Python Bindings to perform the same task is:

from bigml.api import BigML
api = BigML()
my_deepnet = "deepnet/59d1f57ab95b39750c000000"
my_dataset = "dataset/59b0f8c7b95b392f12000000"
my_batchprediction = api.create_batch_prediction(my_deepnet, my_dataset)

Want to know more about Deepnets?

Our next blog post, the last one of this series, will cover how Deepnets work behind the scenes, diving into the most technical aspects of BigML’s latest resource. If you have any questions or you’d like to learn more about how Deepnets work, please visit the dedicated release page. It includes a series of six blog posts about Deepnets, the BigML Dashboard and API documentation, the webinar slideshow as well as the full webinar recording.

Programming Deepnets with the BigML API

So far, we have introduced BigML’s Deepnets, how they are used, and how to create one in the BigML Dashboard. In this post, the fourth in our series of blog posts about Deepnets, we will see how to use them programmatically using the API. So let’s start!

The API workflow to create Deepnets includes five main steps: first, upload your data to BigML, then create a dataset, create a Deepnet, evaluate it and finally make predictions. Note that any resource created with the API will automatically be created in your Dashboard too so you can take advantage of BigML’s intuitive visualizations at any time.


The first step in using the API is setting up your authentication. This is done by setting the BIGML_USERNAME and BIGML_API_KEY environment variables. Your username is the same as the one you use to log into the BigML website. To find your API key, on the website, navigate to your user account page and then click on ‘API Key’ on the left. To set your environment variables, you can add lines like the following to your .bash_profile file.

export BIGML_USERNAME=my_name
export BIGML_API_KEY=13245 
export BIGML_AUTH="username=$BIGML_USERNAME;api_key=$BIGML_API_KEY;"

Once your authentication is set up, you can begin your workflow.

1. Upload your Data

Data can be uploaded to BigML in many different ways. You can use a local file, a remote URL, or inline data. To create a source from a remote URL, use the curl command:

curl "https://bigml.io/source?$BIGML_AUTH" \
-X POST \
-H 'content-type: application/json' \
-d '{"remote": ""}'

2. Create a Dataset

Now that you have a source, you will need to process it into a dataset. Use the curl command:

curl "https://bigml.io/dataset?$BIGML_AUTH" \
-H 'content-type: application/json' \
-d '{"source": "source/59c14dce2ba7150b1500fdb5"}'

If you plan on running an evaluation later (and you will want to evaluate your results!), you will want to split this dataset into a testing and a training dataset. You will create your Deepnet using the training dataset (commonly 80% of the original dataset) and then evaluate it against the testing dataset (the remaining 20% of the original dataset). To make this split using the API, you will first run the command:

curl "https://bigml.io/dataset?$BIGML_AUTH" \
-H 'content-type: application/json' \
-d '{"origin_dataset": "dataset/59c153eab95b3905a3000054", 
     "sample_rate": 0.8, 
     "seed": "myseed"}'

using the sample rate and seed of your choice. This creates the training dataset.

Now, to make the testing dataset, run:

curl "https://bigml.io/dataset?$BIGML_AUTH" \
-H 'content-type: application/json' \
-d '{"origin_dataset": "dataset/59c153eab95b3905a3000054", 
     "sample_rate": 0.8,
     "out_of_bag": true,
     "seed": "myseed"}'

By setting “out_of_bag” to true, you are choosing all the rows you did not choose while creating the training set. This will be your testing dataset.
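The reason the same seed with “out_of_bag” flipped yields complementary datasets is that each row's membership is a deterministic function of the seed. A rough sketch of the mechanism in Python (BigML's actual row hashing will differ; the principle is the same):

```python
# Sketch of a seeded, repeatable 80/20 split with an "out of bag"
# complement. Row membership is a pure function of (seed, row), so
# two separate requests with the same seed carve out complementary sets.
import hashlib

def in_sample(row_id, seed, rate=0.8):
    # Hash seed + row id to a stable pseudo-random bucket in [0, 1000).
    digest = hashlib.md5(f"{seed}:{row_id}".encode()).hexdigest()
    return int(digest, 16) % 1000 < rate * 1000

rows = list(range(1000))
training = [r for r in rows if in_sample(r, "myseed")]        # sample
testing = [r for r in rows if not in_sample(r, "myseed")]     # out of bag
```

Because nothing is stored between the two requests, the split is reproducible: rerunning with `"myseed"` always yields the same partition, while a different seed yields a different (but equally complementary) one.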

3. Create a Deepnet

Now that you have the datasets you need, you can create your Deepnet. To do this, use the command:

curl "https://bigml.io/deepnet?$BIGML_AUTH" \
-H 'content-type: application/json' \
-d '{"dataset": "dataset/59c15634b95b3905a1000032"}' 

being sure to use the dataset ID of your training dataset. This will create a Deepnet by using the default settings.

You can also modify the settings of your Deepnet in various ways. For example, if you want to change the maximum number of gradient steps to be ten and you want to name your deepnet “my deepnet” you could run:

curl "https://bigml.io/deepnet?$BIGML_AUTH" \
-H 'content-type: application/json' \
-d '{"dataset": "dataset/59c15634b95b3905a1000032",
     "max_iterations": 10,
     "name": "my deepnet"}' 

The full list of Deepnet arguments can be found in our API documentation, which will be fully available on October 5.

4. Evaluate your Deepnet

Once you have created a Deepnet, you may want to evaluate it to see how well it is performing. To do this, create an Evaluation using the resource ID of your Deepnet and the dataset ID of the testing dataset you created earlier.

curl "https://bigml.io/evaluation?$BIGML_AUTH" \
-H 'content-type: application/json' \
-d '{"deepnet": "deepnet/59c157cfb95b390597000085",
     "dataset": "dataset/59c1568ab95b3905a0000040"}'

Once you have your Evaluation, you may decide that you want to change some of your Deepnet parameters to improve its performance. If so, just repeat step three with different parameters.

5. Make Predictions

When you are satisfied with your Deepnet, you can begin using it to make predictions. For example, suppose you want a prediction for a new instance where the field “000001” has the value 3. To do this, use the command:

curl "https://bigml.io/prediction?$BIGML_AUTH" \
-H 'content-type: application/json' \
-d '{"deepnet": "deepnet/59c157cfb95b390597000085", 
     "input_data" : {"000001": 3}}'

Want to know more about Deepnets?

Stay tuned for more blog posts! In the next post, we will explain how to automate Deepnets with WhizzML and the BigML Python Bindings. Additionally, if you have any questions or you’d like to learn more about how Deepnets work, please visit the dedicated release page. It includes a series of six blog posts about Deepnets, the BigML Dashboard and API documentation, the webinar slideshow as well as the full webinar recording.


Creating your Deepnets with the BigML Dashboard

The BigML Team has been working hard this summer to bring Deepnets to the platform, which will be available on October 5, 2017. As explained in our introductory post, Deepnets are an optimized implementation of the popular Deep Neural Networks, a supervised learning technique that can be used to solve classification and regression problems. Neural Networks became popular because they were able to address complex problems for a machine to solve like identifying objects in images.

The previous blog post presented a case study that showed how Deepnets can help you solve real-world problems. In this post, we will take you through the five necessary steps to train a Deepnet that correctly identifies handwritten digits using the BigML Dashboard. We will use a partial version of the well-known MNIST dataset (provided by Kaggle) for image recognition, which contains 42,000 handwritten images of numbers from 0 to 9.

1. Upload your Data

As usual, you need to start by uploading your data to your BigML account. BigML offers several ways to do it: you can drag and drop a local file, connect BigML to your cloud repository (e.g., S3 buckets), or copy and paste a URL. This will create a source in BigML.

BigML automatically identifies the field types. In this case, our objective field (“label”) has been identified as numeric because it contains the digit values from 0 to 9. Since this is a classification problem (we want our Deepnet to predict the exact digit label for each image instead of a continuous numeric value), we need to configure the field type and select “Categorical” for our objective field.


2. Create a Dataset

From your source view, use the 1-click dataset menu option to create a dataset, a structured version of your data ready to be used by a Machine Learning algorithm.


In the dataset view, you will be able to see a summary of your field values, some basic statistics, and the field histograms to analyze your data distributions. You can see that our dataset has a total of 42,000 instances and approximately 4,500 instances for each digit class in the objective field.


The dataset also includes a total of 784 fields containing the pixel information for each image. You can see that many fields are automatically marked as non-preferred in this dataset. This is because they contain the same value for all images so they are not good predictors for the model.


Since Deepnets are a supervised learning method, it is key to train and evaluate your model with different data to ensure it generalizes well against unseen data. You can easily split your dataset using the BigML 1-click action menu, which randomly sets aside 80% of the instances for training and 20% for testing.


3. Create a Deepnet

In BigML you can use the 1-click Deepnet menu option, which will create the model using the default parameter values, or you can tune the parameters using the Configure Deepnet option.


BigML provides the following parameters to manually configure your Deepnet:

  • Maximum training time and maximum iterations: you can set an upper bound to the Deepnet runtime by setting a maximum of time or a maximum number of iterations.
  • Missing numerics and default numeric value: if your dataset contains missing values you can include them as valid values or replace them with the field mean, minimum, maximum or zero.
  • Network architecture: you can define the number of hidden layers in your network, and the activation function and number of nodes for each of them. BigML also provides three other options related to how the layer connections are arranged, such as residual learning, batch normalization, and tree embedding (a tree-based representation of the data to input along with the raw features).
  • Algorithm: you can select the gradient descent optimizer, including Adam, Adagrad, Momentum, RMSProp, and FTRL. For each of these algorithms, you can tune the learning rate, the dropout rate, and a set of algorithm-specific parameters whose explanation goes beyond the scope of this post. To learn more about the differences between these algorithms, you can read this article.
  • Weights: if your dataset contains imbalanced classes for the objective field, you can automatically balance them with this option that uses the oversampling strategy for the minority class.
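The oversampling strategy mentioned in the last bullet can be sketched in a few lines (toy rows, not BigML's internal implementation): minority-class rows are replicated until every class is as frequent as the largest one.

```python
# Sketch of balancing by oversampling: replicate minority-class rows
# until every class matches the majority-class count. Toy data.
from collections import Counter
import itertools

rows = [("a", 1), ("b", 1), ("c", 1), ("d", 1), ("e", 0), ("f", 0)]

counts = Counter(label for _, label in rows)
target = max(counts.values())

balanced = list(rows)
for label, n in counts.items():
    if n < target:
        minority = [r for r in rows if r[1] == label]
        # Cycle through the minority rows until the class reaches target size.
        extra = itertools.islice(itertools.cycle(minority), target - n)
        balanced.extend(extra)
```

After balancing, both classes contribute equally to the loss during training, so the optimizer is not tempted to simply predict the majority class everywhere.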

Neural Networks are known for being notoriously sensitive to the chosen architecture and the algorithm used to optimize the parameters thereof. Due to this sensitivity and the large number of different parameters, hand-tuning Deepnets can be difficult and time-consuming as the number of choices that lead to poor networks typically vastly outnumber the choices that lead to good results.

To combat this problem, BigML offers first-class support for automatic parameter optimization. In this case, we will use the automatic network search option. This option searches over possible network configurations, training and evaluating candidates, and returns the best networks found for your problem. The final Deepnet will use the top networks found in this search to make predictions.


When the network search optimizer is enabled, the Deepnet may take some time to be created (so be patient!). By default, the maximum training time is set to 30 minutes, but you can configure it.

When your Deepnet is created, you will be able to visualize the results in the Partial Dependence Plot. This unique view allows you to inspect the input fields’ impact on predictions. You can select two different fields for the axes and set the values for the rest of the input fields using the form to the right. By hovering over the chart area, you can see the predictions for each of the classes, in the colors shown in the legend to the right. Each class color is shaded according to the class probability.
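Conceptually, the chart is built by sweeping the two chosen inputs over a grid while holding the remaining inputs fixed, and querying the model at each grid point. A toy sketch of that sweep (with a stand-in `model` function rather than an actual Deepnet):

```python
# Sketch of a partial dependence grid: vary two inputs, hold the rest
# fixed, record the model's prediction at each grid point.
# `model` is a hand-made stand-in, not a real Deepnet.

def model(x1, x2, x3):
    # Toy classifier over three numeric inputs.
    return 1.0 if x1 + 2 * x2 - x3 > 0 else 0.0

fixed_x3 = 0.5                      # the "rest of the input fields"
axis = [i / 4 for i in range(5)]    # 0.0 .. 1.0 on both chart axes

grid = {(a, b): model(a, b, fixed_x3) for a in axis for b in axis}
# Each cell holds the predicted class for that (x1, x2) pair; shading
# each cell by the prediction reproduces the plot's colored regions.
```

Changing the fixed values in the form on the right simply recomputes this grid with a different `fixed_x3`, which is why the plot updates as you adjust the other inputs.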


4. Evaluate your Deepnet

The Deepnet looks good, but we can’t know anything about the performance until we evaluate it. From your Deepnet view, click on the evaluate option in the 1-click action menu and BigML will automatically select the remaining 20% of the dataset that you set aside for testing.


You can see in the image below that this model has an overall accuracy of 96.1%; however, a high accuracy may be hiding poor performance for some of the classes.


To look at the correct decisions as well as the mistakes made by the model per class, we need to look at the confusion matrix. However, there are too many different categories in the objective field for BigML to plot them all in the Dashboard, so we need to download the confusion matrix in Excel format.


Ok, it seems we get very good results! In the diagonal of the table you can find the correct decisions made by the model. Almost all categories have precision and recall over 95%. Despite these great results, could they get better with other algorithms? We trained a Random Decision Forest (RDF) of 20 trees and a Logistic Regression to compare their performances against Deepnets. BigML offers a comparison tool to easily compare the results of different classifiers; we use the well-known ROC curve and its AUC as the comparison measure. We need to select a positive class to make the comparison (in this case we selected the digit 9), and the ROC curves for each model are plotted in the chart. You can see in the image below how our Deepnet outperforms the other two models, since its ROC AUC is higher.
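The ROC AUC used for this comparison has a simple interpretation: it is the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one (ties count half). A self-contained sketch, with made-up scores:

```python
# ROC AUC computed directly from its probabilistic definition: the
# fraction of positive-negative pairs ranked correctly by the scores.
# The scores and labels below are invented for illustration.

def roc_auc(scores, labels):
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 1, 0, 0, 0]
auc = roc_auc([0.9, 0.8, 0.4, 0.7, 0.3, 0.2], labels)
# One of the nine positive-negative pairs is mis-ranked, so auc == 8/9.
```

A perfect ranker scores 1.0 and a random one about 0.5, which is why a consistently higher AUC across all ten digits is strong evidence that one model ranks instances better than another.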


Although RDF also provides pretty good results, the ROC AUC for Deepnets is consistently better for all the digits from 0 to 9, as you can see in the first column of the table below.


5. Make Predictions using your Deepnet

Predictions work for Deepnets exactly the same as for any other supervised method in BigML. You can make predictions for a new single instance or multiple instances in batch.

Single predictions

Click on the Predict option from your Deepnet view. A form containing all your input fields will be displayed and you will be able to set the values for a new instance. At the top of the view, you will get all the objective class probabilities for each prediction.


Since this dataset contains more than 100 fields, we cannot perform single predictions from the BigML Dashboard, but we can use the BigML API for that.

Batch predictions

If you want to make predictions for multiple instances at the same time, click on the Batch Prediction option and select the dataset containing the instances for which you want to know the objective field value.


You can configure several parameters of your batch prediction like the possibility to include all class probabilities in the output dataset and file. When your batch prediction finishes, you will be able to download the CSV file and see the output dataset.


Want to know more about Deepnets?

Stay tuned for the next blog post, where you will learn how to program Deepnets via the BigML API. If you have any questions or you’d like to learn more about how Deepnets work, please visit the dedicated release page. It includes a series of six blog posts about Deepnets, the BigML Dashboard and API documentation, the webinar slideshow as well as the full webinar recording.

Case Study: Finding Higgs Bosons with Deepnets

In our previous blog post, we introduced Deep Neural Networks. In today’s post, the second in our series of six, we’ll use BigML’s deep networks, called Deepnets, to walk through a supervised learning problem.

The Data

The dataset we are investigating is the Higgs dataset found at the UCI Machine Learning Repository. The problem, as explained in the original Nature Communications paper, is that particle accelerators create a huge amount of data. The Large Hadron Collider, for example, can produce 100 billion collisions in an hour and only a tiny fraction may produce detectable exotic particles like the Higgs boson. So a well-trained model is immensely useful for finding the needle in the haystack.

Image credit: Lucas Taylor / CERN-EX-9710002-1

This dataset contains 28 different numeric fields and 11 million rows generated through simulation, but imitating actual collisions. As explained in the paper, these 28 fields come in two kinds. The first 21 of the fields capture low-level kinematic properties measured directly by the detectors in the accelerator. The last seven fields are combinations of the kinematic fields hand-designed by physicists to help differentiate between collisions that result in a Higgs boson and those that do not. So some of these fields are “real” data as measured, and some are constructed from domain-specific knowledge.

This kind of feature engineering is very common when solving Machine Learning problems, and can greatly increase the accuracy of your predictions. But what if we don’t have a physicist around to assist in this sort of feature engineering? Compared to other Machine Learning techniques (such as Boosted Trees), Deepnets can perform quite well with just the low-level fields by learning their own higher-level combinations (especially if the low-level fields are numeric and continuous).

To try this out, we created a dataset of just the first 1.5 million data points (for speed) and removed the last seven high-level features. We then split the data into an 80% training dataset and a 20% testing dataset. Next up, we create a Deepnet using BigML’s automatic structure search.


Deep neural networks can be built in a multitude of ways, but we’ve simplified this by intelligently searching through many possible network structures (number of layers, activation functions, nodes, etc.) and algorithm parameters (the gradient descent optimizer, the learning rate, etc.) and then making an ensemble classifier from the best individual networks. You can manually choose a network structure, tweak the search parameters, or simply leave all those details to BigML (as we are doing here).

The Deepnet


Once the Deepnet is created, it would be nice to know how well it is performing. To find out, we create an Evaluation using this Deepnet and the 20% testing dataset we split off earlier.


We get an accuracy of 65.9% and a ROC AUC of 0.7232. Perhaps we could even improve these numbers by lengthening the maximum training time. This is significantly better than our results using a Boosted Tree run with default settings. Here they are compared:


The Boosted Tree has an accuracy of 59.4% and a ROC AUC of 0.6394. The Boosted Tree is just not able to pull as much information out of these low-level features as the Deepnet. This is a clear example of a case where Deepnets should be chosen over other supervised learning techniques for classification or regression problems.

Want to know more about Deepnets?

Stay tuned for more publications! In the next post, we will explain how to use Deepnets through the BigML Dashboard. Additionally, if you have any questions or you’d like to learn more about how Deepnets work, please visit the dedicated release page. It includes a series of six blog posts about Deepnets, the BigML Dashboard and API documentation, the webinar slideshow as well as the full webinar recording.

The Importance of Protecting Your Inventions On Time

BigML, in collaboration with Telefónica, the law firms of Schwabe Williamson & Wyatt and FisherBroyles, ValenciaLAB, and the Valencian Municipality (VIT Emprende, València Activa, and Ajuntament de València) are bringing to Spain a special event about the importance of patents in protecting your inventions expeditiously.

This event will be held in Madrid, Barcelona, and Valencia, on October 25, 26 and 27 respectively. These three workshops will cover the importance of being IP-aware, as protecting the Intellectual Property (IP) of your business is key for its commercial success, whether your product is on the market or still under development. By patenting your ideas, brand, and offerings, your company becomes more competitive in the marketplace and more attractive for investors willing to support your company.

Why attend?

The event will clearly define what a patent is, while addressing topics such as the importance of patenting new ideas in your business, the risks you bear if your IP is not properly patented, the impact of patents on continuous innovation, the process to protect your inventions, and the grants or awards that you can receive when patenting your ideas. All these questions will be discussed from three different perspectives: beginning from a startup’s point of view, presented by Francisco J. Martín, BigML’s CEO; following up with a perspective of big corporations, explained by Luis Ignacio Vicente del Olmo, Head of Telefonica Patent Office at Telefonica; and concluding with the legal viewpoint of the importance of IP by Micah D. Stolowitz, IP Strategist at FisherBroyles, and Graciela Gómez Cowger, IP Strategist and CEO at Schwabe, Williamson & Wyatt. All the sessions will delve into the current IP climate for both European and US companies.

Who should attend?

Intellectual Property and Patent Attorneys, Intellectual Property Patent Engineers, C-level Executives, Corporate General Counsel, Corporate Attorneys, Financing Executives, Venture Firm Capitalists, and any other professional interested in this topic.

If you have your own business or plan to have one in the near future, you should not miss this chance to learn from two of the most experienced lawyers in the Intellectual Property field. Moreover, hearing the insider stories from two completely different organizations will help you crystallize the impact that patents can have on your business as you set off on your own patent journey.

When and Where?

These workshops will be held in three cities in Spain during October 2017:

On October 25 the event will take place in Madrid at Telefónica Building:

  • Room: Agora in Wayra Madrid (8th Floor)
  • Address: Gran Via, 28, 28013, Madrid (Access from Valverde Street, 2)

On October 26 the workshop will be held in Barcelona at Diagonal 00 Telefónica Tower:

  • Room: Auditorium (2nd Floor)
  • Address: Ernest Lluch and Martin Square, 5, 08019, Barcelona

On October 27, the last workshop will be held in Valencia at ValenciaLAB.

  • Room: Conference Room (Ground Floor)
  • Address: Music Peydro Street, 36, 46001, Valencia


Dr. Francisco J. Martín, the Co-Founder and CEO of BigML, is an innovative leader experienced at inventing, designing, building, and operating Intelligent Systems, from concept development to market validation and business growth.

Before BigML, from 2004 to 2010, Francisco J. Martín was CEO and founder of Strands, a leading company that develops recommendation and personalization technologies. Strands pioneered research and applications of social recommendation technologies in several domains (music, videos, personal finance, sports, etc). Prior to that, from 1999 to 2003, Dr. Martín was founder and CEO of iSOCO, a company that specializes in business solutions built on top of Artificial Intelligence. iSOCO was the first spin-off of the IIIA Research Centre of Artificial Intelligence, belonging to the Spanish Council for Scientific Research, and pioneered research and applications of semantic web technologies.

Regarding his education, Dr. Martín did fellowship research at the same center from January 1996 to 1999; received his Computer Science degree from the Polytechnic University of Valencia in 1996; holds a PhD in Artificial Intelligence from the Polytechnic University of Catalonia; and completed a postdoc in Machine Learning at Oregon State University.


Dr. Luis Ignacio Vicente del Olmo is currently managing the Return on Innovation Area of Telefonica SA, the telco operator based in Spain with operations in Europe and South America, with a focus on new areas such as 5G, Machine Learning, and Industry 4.0. This role includes the leadership of Telefonica Patent Office, the leading Intellectual Property Unit in Spain. Dr. del Olmo is a member of preeminent Spanish, European & International Boards related to R&D Management, a Professor at the Master of Digital Science promoted by the European Institute of Technology, and a member of the Board of Telefonica I+D Chile, the main R&D Center of Telefonica in South America.

Dr. del Olmo has over 25 years of experience in R&D and innovation management, mainly at Telefonica. He is a senior expert in R&D and innovation management with broad international experience, including 20 years of working with the European Commission, MIT, OECD, BEI, and BID, among others.

Dr. del Olmo is an engineer and has a PhD in Physics (with a specialization in Electronics). He completed a Master in Analysis and Management of Science & Technology, graduating in Economy of Telecommunications, and has a degree in Industrial Engineering and Innovation Management. He is also a specialist in Innovation Economy, and a graduate of European Communities by the Diplomatic School of the Spanish Ministry of Foreign Affairs.


Antonio López-Carrasco Comajuncosas, Senior Technology Expert at Telefónica Patent Office, is a European Patent Attorney and Chartered Lawyer who has worked for Deutsche Telefonwerke (Berlin) and for the European Space Agency (Noordwijk, NL). In addition, he was an Examiner at the European Patent Office for about 12 years and the Head of the patent department at Oficina Ponti (Barcelona). At present, he is a senior expert and in-house counsel at Telefónica.

Over the years, Toni has acquired extensive experience in patents and related technical IP portfolio management as well as in strategic business support, i.e., prosecution, licensing, litigation, tech transfer, and monetization. He teaches innovation protection and technology transfer (IP strategy and management) in post-graduate business programs and lectures frequently to international audiences. Finally, he is the coordinator of the CEIPI course on European Patent Law in Barcelona, and holds an M.Sc. in Physics and an M.Sc. in Space Engineering.


Mr. Micah Stolowitz, IP Strategist at FisherBroyles, has practiced law for 36 years, focused on intellectual property protection, enforcement and transactions. He represents a wide range of companies from Fortune 100 to new startups and many in between, and has been recognized by his peers as a Superlawyer®  in IP every year for over 10 years. He has testified as an expert witness in patent litigation, and also served as a Special Master, appointed to assist the district court in Lizardtech, Inc. v. Earth Resource Mapping, Inc. (CAFC 2005). Micah Stolowitz is a fellow of the Academy of Court Appointed Masters.

Mr. Stolowitz works with clients on IP strategy, patent and trademark prosecution, licensing and monetization. His work includes infringement and validity studies and opinions, and design around advice. He has negotiated patent sales valued in millions of dollars. A sampling of his extensive experience would include drafting and prosecuting patents directed to digital, analog and mixed signal circuits, software of all kinds, cryptographic and other systems for security and authentication, physical object “fingerprinting,” identification and authentication, internet high bandwidth and availability, wireless telecommunication, 3GPP standards, database systems, SaaS “cloud computing,” prediction systems, Machine Learning, character recognition, solid state and memory and disk drives, printing technologies, and medical devices.

Micah Stolowitz serves as adjunct professor of Patent Law at Lewis and Clark College, where he also completed his Juris Doctor degree. Mr. Stolowitz has worked as an electrical engineer in Silicon Valley, and holds a B.S. degree in Electrical Engineering and Computer Science, from the University of California, Berkeley. His industry experience helps him see IP challenges from a client’s perspective and align his legal services with client goals. He is committed to staying on the cutting edge of technological advances, and counts himself fortunate to work with many great teachers (the inventors in the firm he is serving).  


Ms. Graciela Gómez Cowger, IP Strategist and CEO at Schwabe, Williamson & Wyatt, helps individuals and companies protect innovations in the technology and health industries. She prepares, files, and prosecutes patents in the electronics, software and communications arts and also drafts patent infringement analysis and opinion letters. Working closely with inventors and companies, Ms. Gómez Cowger helps assess the value of patent portfolios, and crafts licensing and other strategies to maximize intellectual property investments.

Ms. Gómez Cowger has extensive experience helping individuals and companies develop branding strategies that protect market presence, including preparing, filing and prosecuting trademark applications in the United States and abroad. Before becoming a lawyer, she worked as a research and design engineer at HP Inc, a large electronics company developing cutting-edge printing systems. This experience has allowed her to work collaboratively with inventors to quickly identify the distinctions that make innovations patentable.

Apply to join the event:

IMPORTANT: Attendance is free but by invitation only. Space is limited, so please fill in this form to register for the event and make sure you select the preferred location, either Madrid, Barcelona, or Valencia. Shortly after your registration, the BigML Team will send you your invitation.

In case you have questions, please check the dedicated event page for more details.

Introduction to Deepnets

We are proud to present Deepnets, the newest resource brought to the BigML platform. On October 5, 2017, they will become available via the BigML Dashboard, API, and WhizzML. Deepnets (an optimized version of Deep Neural Networks) are part of a broader family of classification and regression methods based on learning data representations from a wide variety of data types (e.g., numeric, categorical, text, image). Deepnets have been successfully used to solve many types of classification and regression problems, including social network filtering, machine translation, bioinformatics, and similar problems in data-rich domains.

Intro to Deepnets

In the spirit of making Machine Learning easy for everyone, we will provide new learning material for you to start with Deepnets from scratch and progressively become a power user. We start by publishing a series of six blog posts that will gradually dive deeper into the technical and practical aspects of Deepnets. Today’s post sets off by explaining the basic Deepnet concepts. We will follow with an example use case. Then, there will be several posts focused on how to use and interpret Deepnets through the BigML Dashboard, API, and WhizzML and Python Bindings. Finally, we will complete this series with a technical view of how Deepnet models work behind the scenes.

Let’s dive right in!

Why Bring Deepnets to BigML?

Unfortunately, there’s a fair amount of confusion about Deepnets in the popular media as part of the ongoing “AI misinformation epidemic”. This has caused the uninitiated to regard Deepnets as some sort of a robot messiah destined to either save or destroy our planet. Contrary to the recent “immaculate conception” like narrative fueled by Deepnets’ achievements in the computer vision domain after 2012, the theoretical background of Deepnets dates back 25+ years.

So what explains Deepnets’ newfound popularity?
  • The first reason has to do with sheer scale. There are problems that involve massive parameter spaces that can be sufficiently represented only by massive data. In those instances, Deepnets can come to the rescue, thanks to the abundance of modern computational power. Speech recognition is a good example of such a challenging use case, where the difference between 97% accuracy and 98.5% accuracy can mean the difference between a consumer product that is very frustrating to interact with and one that is capable of near-human performance.
  • In addition, the availability of a number of open source frameworks for computational graph composition has helped popularize the technique among more scientists. Such frameworks “compile” the required Deepnet architectures into a highly optimized set of commands that run quickly and with maximum parallelism. They essentially work by symbolically differentiating the objective for gradient descent, thus freeing practitioners from having to work out the underlying math themselves.
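To make the differentiation point concrete, here is a minimal sketch (pure Python, with toy data of our own invention) of gradient descent on a logistic objective, using the hand-derived gradient that such frameworks would produce for us automatically:

```python
import math

def logistic(z):
    """The logistic (sigmoid) function, squashing any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Toy data: one feature, binary label; positives have positive x.
data = [(1.0, 1), (2.0, 1), (-1.0, 0), (-2.0, 0)]

w = 0.0       # single weight to learn
lr = 0.5      # learning rate
for _ in range(100):
    # d(loss)/dw for the cross-entropy loss is (p - y) * x, summed over the data.
    grad = sum((logistic(w * x) - y) * x for x, y in data)
    w -= lr * grad / len(data)

print(round(w, 2))  # the weight grows positive, separating the two classes
```

In a real framework the gradient expression is derived symbolically from the loss, so the practitioner only specifies the objective and the network structure.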

Comparing Deepnets to Other Classifiers

All good, but if we look beyond the hype, what are some of the major advantages and disadvantages of Deepnets that we should weigh before giving them serious consideration as part of our Machine Learning toolbox? For this, it’s best to contrast them with the pros and cons of alternative classifiers.

  • As you’ll remember, decision trees have the advantage of massive representational power that expands as your data gets larger due to efficient and fast search capabilities. On the negative side, decision trees struggle with the representation of smooth functions as their axis-parallel thresholds require many variables to be able to account for them. Tree ensembles, however, mitigate some of these difficulties by learning a bunch of trees from sub-samples to counter the smoothness issue.
  • As far as Logistic Regression is concerned, it can handle some smooth, multivariate functions as long as they lie in its hypothesis space. Furthermore, it trains real fast. However, because it is a parametric method, it tends to fall short for use cases where the decision boundary is nonlinear.
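A tiny illustration of that nonlinear-boundary limitation: the classic XOR pattern cannot be labeled correctly by any single linear rule of the kind Logistic Regression learns. A brute-force check over a grid of candidate boundaries (a toy sketch, not any particular library) finds none that works:

```python
from itertools import product

# XOR: no linear boundary w1*x1 + w2*x2 + b can separate the classes.
points = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def separates(w1, w2, b):
    """True if sign(w1*x1 + w2*x2 + b) labels every XOR point correctly."""
    return all((w1 * x1 + w2 * x2 + b > 0) == (label == 1)
               for (x1, x2), label in points)

# Try a coarse grid of candidate boundaries: none of them works.
grid = [i / 2 for i in range(-10, 11)]
found = any(separates(w1, w2, b) for w1, w2, b in product(grid, repeat=3))
print(found)  # False
```

A Deepnet with even one small hidden layer can represent XOR, which is exactly the representational power the next paragraph is about.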

So, the question is: Can Deepnets mitigate the shortcomings of trees or Logistic Regression? Just like trees (or tree ensembles), with Deepnets, we get arbitrary representational power by modifying their underlying structure.  On the other hand, similar to Logistic Regression, smooth, multivariate objectives don’t present a problem to Deepnets provided that we have the right network architecture underneath.

You may already suspect what the catch is, since there is no free lunch in Machine Learning. The first tradeoff comes in the form of ‘efficiency’, because there is no guarantee that the right neural network structure for a given dataset will be easily found. As a matter of fact, most structures are not useful, so you’re really left with no choice but to try different structures by trial and error. As a result, nailing the Deepnet structure remains a time-consuming task compared with the decision trees’ greedy optimization routine.
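As an illustration of that trial-and-error process (purely a sketch with a fake scoring function, not BigML's actual optimizer), a random search over hidden-layer layouts might look like:

```python
import random

def evaluate(structure):
    """Placeholder: in practice this would train a network with the given
    hidden-layer sizes and return its held-out accuracy."""
    return random.Random(str(structure)).uniform(0.5, 0.9)

def random_structure_search(n_trials=20, seed=0):
    """Try random hidden-layer layouts and keep the best-scoring one."""
    rng = random.Random(seed)
    best_structure, best_score = None, -1.0
    for _ in range(n_trials):
        n_layers = rng.randint(1, 3)
        structure = [rng.choice([8, 16, 32, 64]) for _ in range(n_layers)]
        score = evaluate(structure)
        if score > best_score:
            best_structure, best_score = structure, score
    return best_structure, best_score

structure, score = random_structure_search()
print(structure, round(score, 3))
```

Since every candidate structure requires a full training run, the search cost grows quickly, which is exactly why automating it (as BigML's network search does) is so valuable.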

‘Interpretability’ is also negatively impacted, as the Deepnets practitioner ends up quite far away from the intuitive interpretability of tree-based algorithms. One possible solution is to use sampling and tree induction to create decision tree-like explanations for your Deepnet predictions (more on this later).

Deepnets vs. LR vs. Trees
Where does this leave us as to when to use Deepnets? First off, let’s outline the factors that make Deepnets less useful:
  • If you have smaller datasets (e.g., thousands instead of millions of instances) you may be better off looking into other techniques like Logistic Regression or tree ensembles.
  • Since better features almost always beat better models, problems that benefit from quick iterations may best be handled by other approaches. That is, if you need to iterate quickly and there are many creative features you can generate from your dataset, it’s usually best to skip Deepnets and their trial-and-error iterations in favor of algorithms that fit faster, e.g., tree ensembles.
  • If your problem’s cost-benefit equation doesn’t require every ounce of accuracy you can grab, you may also be better off with other, more efficient algorithms that consume fewer resources.

Finally, remember that Deepnets are just another sort of classifier. As Stuart J. Russell, Professor of Computer Science at the University of California, Berkeley, has put it:

  • “…deep learning has existed in the neural network community for over 20 years. Recent advances are driven by some relatively minor improvements in algorithms and models and by the availability of large data sets and much more powerful collections of computers.”

From Zero to Deepnet Predictions

In BigML, a regular Deepnet workflow consists of training a Deepnet on your data, evaluating it, and using it to predict future instances. In that way, it is very much like the other supervised modeling methods available in BigML. But what makes Deepnets different, if at all?

1. The training data structure: The instances in the training data can be arranged in any manner. Furthermore, all types of fields (numeric, categorical, text, and items) and missing values are supported, in the same vein as other classification and regression models and ensembles.

2. Deepnet models: A Deepnet model can accept either a numeric field (for regression) or a categorical field (for classification) in the input dataset as the objective field. In a nutshell, BigML’s Deepnets implementation is differentiated from similar algorithms by its automatic optimization option, which helps you discover the best Deepnet parametrization during your network search. We will get into the details of our approach in a future post that will focus on what goes on under the hood.

3. Evaluation: Evaluations for Deepnet models are similar to other supervised learning evaluations, where random training-test splits are usually preferred. The same classification and regression performance metrics are applicable to Deepnets (e.g., AUC for classification or R-squared for regression). These metrics are fully covered in the Dashboard documentation.
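As a refresher on one of those metrics, ROC AUC can be computed directly from its rank interpretation: the probability that a randomly chosen positive instance is scored higher than a randomly chosen negative one (a plain-Python sketch, with made-up scores):

```python
def roc_auc(labels, scores):
    """AUC via the rank interpretation: the probability that a random
    positive outscores a random negative (ties count as half a win)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.7, 0.6, 0.4, 0.3, 0.2]
print(roc_auc(labels, scores))  # 7 of 9 positive/negative pairs ranked correctly
```

An AUC of 0.5 means the scores rank positives no better than chance, while 1.0 means every positive outranks every negative, which is why the 0.7232 vs. 0.6394 gap above is meaningful.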

4. Prediction: As in other classification and regression resources, BigML Deepnets can be used to make single or batch predictions depending on your need.

Understanding Deepnet Models

Not a lot of people talk about Deepnets as a generalization of Logistic Regression, but we can very well think of them that way. (If you need a refresher on Logistic Regression, we recommend a quick read of our related post.) Take the following representation of Logistic Regression with nodes and arrows. The circles at the bottom are the input features and the ones on top are the corresponding output probabilities. Each arrow represents one of the coefficients (betas) to be learned, which means the output becomes a function of those betas.

Logistic Regression

What if we add a bunch more circles as a hidden layer in the middle, as seen below? In this case, the intermediate nodes would be computed the same way as before. Following the same approach, we can add as many nodes and layers to our structure as we choose. Not only that, we can also change the function each node applies to its weighted inputs. In the case of Logistic Regression, it is the well-known logistic function, which is easily differentiable and optimized. By the same token, any other easily differentiable function can replace the logistic function when it comes to Deepnet structures. Given all these structural choices, you can intuit how representationally powerful Deepnets can be, yet how difficult searching for the optimal structure can get in such a vast hypothesis space.

Deep Neural Network
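A minimal sketch of this generalization (hypothetical weights, plain Python): first the Logistic Regression forward pass, then the same computation with one hidden layer of logistic units inserted in the middle:

```python
import math

def logistic(z):
    """Logistic function: squashes a weighted sum into a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def logistic_regression(x, betas, bias):
    """Single layer: the output probability is the logistic of a weighted sum."""
    z = sum(b * xi for b, xi in zip(betas, x)) + bias
    return logistic(z)

def one_hidden_layer(x, W1, b1, w2, b2):
    """Same idea with a hidden layer: each hidden node is itself a logistic
    unit, and the output layer combines the hidden activations."""
    hidden = [logistic(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return logistic(sum(w * h for w, h in zip(w2, hidden)) + b2)

x = [0.5, -1.0]  # two input features
p_lr = logistic_regression(x, [1.0, 2.0], 0.1)
p_net = one_hidden_layer(x, [[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0],
                         [1.0, -1.0], 0.0)
print(p_lr, p_net)  # both are probabilities in (0, 1)
```

With zero hidden layers the second function collapses into the first, which is exactly the sense in which a Deepnet generalizes Logistic Regression.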

In Summary

To wrap up, BigML’s Deepnet models:
  • Help solve use cases such as computer vision, speech recognition, natural language processing, audio recognition, social network filtering, machine translation, and bioinformatics, among others.
  • Can be used for classification and regression problems, alongside the other algorithms BigML provides.
  • Implement a generalized form of Deep Neural Networks, instead of a specialized architecture optimized for a specific use case (e.g., RNN, CNN, LSTM).
  • Come with an automatic optimization option that lets you discover the best Deepnet parametrization during your network search.

You can create Deepnet models, interpret and evaluate them, as well as make predictions with them via the BigML Dashboard, our API and bindings, plus WhizzML (to automate your Deepnet workflows). All of these options will be showcased in future blog posts.

Want to know more about Deepnets?

At this point, you may be wondering how to apply Deepnets to solve real-world problems. For starters, in our next post, we will show a use case, where we will be examining an example dataset to see if we can make accurate predictions based on our Deepnet model. Stay tuned!

If you have any questions or you’d like to learn more about how Deepnets work, please visit the dedicated release page. It includes a series of six blog posts about Deepnets, the BigML Dashboard and API documentation, the webinar slideshow as well as the full webinar recording.
