
BSSML17: Machine Learning Summer School in Curitiba

If you follow the BigML blog, you may be familiar with our popular Machine Learning Schools held around the world throughout the year. These crash courses are key to the democratization of Machine Learning, helping produce a much larger pool of ML-literate professionals such as developers, analysts, managers, and subject-matter experts so they can remain competitive in the marketplace.

We are proud to announce the next course, the second edition of our Machine Learning Summer School in Brazil, which will take place on November 29, 2017, in Curitiba, Paraná. BigML's Chief Infrastructure Officer, Poul Petersen, will give a one-day course ideal for industry practitioners, advanced undergraduates, and graduate students seeking a quick, practical, and hands-on introduction to Machine Learning. The Summer School, co-organized by Sebrae, BigML, and Telefónica Open Future_, will also serve as a good introduction to the kind of work that students can expect if they enroll in a Machine Learning master's program.

Where?

Sebrae Building: Caeté Street, 150 – Prado Velho, Curitiba, Paraná, Brazil.

When?

November 29, 2017, from 8:30 AM to 6:30 PM BRST.

Apply now!

Space is limited, so find more details about the program and register here before the event is fully booked.

BigML’s ML Schools Evolution

Since we ran the first edition of our Machine Learning Schools in Valencia, Spain, interest among ML practitioners has grown year after year. The best proof is in the applicant and attendee stats since that first edition.

In September 2015, at the very first Valencian Summer School in Machine Learning, we welcomed 95 attendees from 7 different countries, 92 of them from Europe. 42 of them came from academia, representing 13 universities, and the remaining 53 attendees came from 40 organizations.

One year later, in September 2016, we celebrated the second edition and went from the 95 attendees of the first edition to 142 attendees from 19 different countries. We realized that the world was more eager than ever to learn and apply Machine Learning techniques. Out of the 142 attendees, 125 were from Europe, and of those, 109 from Spain. We also had more business profiles join the event compared to the first edition: 39 attendees represented 21 universities versus the 82 remaining attendees representing 53 organizations.

The positive response from the audience encouraged us to do more; that's when we decided to organize the first Brazilian Summer School in Machine Learning. That event took place in São Paulo in December 2016, where we were joined by 202 Brazilian attendees from 6 different states: Minas Gerais, São Paulo, Rio de Janeiro, Paraná, Santa Catarina, and Rio Grande do Sul. And again, many more attendees, 186, came from private companies versus 16 from academic institutions.

This year, in September 2017, the BigML Team ran the third edition of our Valencian Summer School in Machine Learning, which brought together 204 attendees from 14 countries, 183 of them from Europe and mostly from Spain. Among the crowd, there were 45 attendees from 28 universities and 159 from 92 organizations.

Now, with your help, it’s time to beat the records with the second edition of our Machine Learning School in Brazil, this year in Curitiba!

The Impact of Protecting Inventions on Time for Tech Companies

Protecting Intellectual Property is key for any business, especially innovative tech companies, to become more competitive in the marketplace as well as more attractive to investors. This was the main message delivered by the speakers at the three special events held in Spain on October 25, 26, and 27 in Madrid, Barcelona, and Valencia, respectively.

BigML, together with Telefónica and the two law firms FisherBroyles and Schwabe, Williamson & Wyatt, co-organized these events to discuss the importance of patents for any tech business from three different perspectives: the startup, the big corporation, and the legal view. The talks analyzed the main differences between the patent protection process for US and European companies, where US regulations make the process easier than in European countries.

Starting with the startup perspective, Dr. Francisco J. Martín, Co-Founder and CEO of BigML, gave a talk based on his personal and professional experience of how patents have helped the companies he has created. “The companies that file patents receive more funding and support from investors”, highlighted Dr. Martín, citing a study made with PreSeries that analyzes more than 300,000 tech companies and shows the correlation between companies that file patents during their first years and the amount of funding they receive throughout their lives.

Following up with the perspective of big corporations, Dr. Luis Ignacio Vicente del Olmo, of Telefónica's Return on Innovation Area and Head of the Telefónica Patent Office, explained how to manage Intellectual Property within the innovation process, as well as how to protect software-implemented inventions, since by 2020, 50% of patent applications will be software-implemented. Del Olmo presented the sessions in Madrid and Valencia, whereas the speaker in Barcelona was Antonio López-Carrascoso, Senior Technology Expert at the Telefónica Patent Office.

Finally, concluding with the legal viewpoint, Ms. Graciela Gómez Cowger, IP Strategist and CEO at Schwabe, Williamson & Wyatt, and Mr. Micah Stolowitz, IP Strategist at FisherBroyles, addressed why all tech startups should make Intellectual Property an integral part of their market strategy, especially because “VCs invest in IP; if it's missing, they simply don't invest”, as Ms. Gómez pointed out. Both IP Strategists emphasized the importance of these intangible assets, an idea that ran through all three talks and that the audience explored further with questions during the round table.

If you are hungry for more, please visit our Facebook and Google+ pages to check all the pictures taken at the events in Madrid, Barcelona and Valencia. The BigML Team looks forward to seeing you all in further editions!

Predicting TED Talks Popularity

Everyone knows TED talks. TED started in 1984 as a conference series on technology, education, and design. In essence, TED talks aim to democratize knowledge. Nowadays, TED produces more than 200 talks per year addressing dozens of different topics. Despite critics who claim that TED talks reduce complex ideas to 20-minute autobiographical stories of inspiration, their great influence on the diffusion of knowledge in our society is undeniable.

Predicting Ted Talks Views with Machine Learning

When I came across the TED dataset on Kaggle, the first thing that caught my attention was the great dispersion in the number of views: from 50K to over 47M (with a median of ~1M views). One can't help but wonder: what makes some talks 20 times more popular than others? Can the TED organizers and speakers do something in advance to maximize views? In this blog post, we'll try to predict the popularity of TED talks and analyze the most influential factors.

The Data

The original file provides information for 2,550 talks spanning just over a decade: from 2006 (when only 50 talks were published) to 2017 (with over 200 talks published). When you create a dataset in BigML, you can see all the features with their types and their distributions. You can see the distribution of talks over the years in the last field of the dataset shown below.

dataset.png

For text fields, we can inspect the word frequency in a tag cloud. For example, the most used words in the titles are “world”, “life” and “future”.

tag-cloud.png

When we take a look at the features, we see two main sets: one that informs us about the talk's impact (comments, languages, and views, our objective field), and another that describes the talk's characteristics (the title, description, transcript, speakers, duration, etc.). Apart from the original features, we came up with two additional ones: the number of days between video creation and publishing, and the number of days between publishing and dataset collection (Sept 21, 2017).
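As a rough sketch of how those two date features could be derived before uploading (assuming the Kaggle file's film_date and published_date columns are Unix timestamps, and using a hypothetical local file name), pandas makes the computation straightforward:

import pandas as pd

# load the Kaggle TED talks file (hypothetical local copy)
talks = pd.read_csv("ted_main.csv")

# convert the Unix timestamps to datetimes
created = pd.to_datetime(talks["film_date"], unit="s")
published = pd.to_datetime(talks["published_date"], unit="s")
collected = pd.Timestamp("2017-09-21")  # the dataset collection date

# the two engineered features described above
talks["days_creation_to_publish"] = (published - created).dt.days
talks["days_since_published"] = (collected - published).dt.days

talks.to_csv("ted_main_extended.csv", index=False)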

The fields related to talk impact may be a future consequence of our objective field, the views: a talk with more views is more likely to have a higher number of comments and to be translated into more languages. Therefore, it's best to exclude those fields from our analysis; otherwise we would create “leakage”, i.e., leak information from the future into the past and obtain unrealistically good predictions.

On the other hand, all the fields related to the talk characteristics can be used as predictors. Most of them are text fields such as the title, the description, or the transcript. BigML supports text fields for all supervised models, so we can just feed the algorithms with them. However, we suspect that not all the thematically related talks necessarily use the same words. Thus, using the raw text fields may not be the best way to find patterns in the training data that can be generalized to other TED talks. What if instead of using the exact raw words in the talks, we could extract their main topics and use them as predictors? That is exactly what BigML topic models allow us to do!

Extracting the Topics of TED talks

We want to know the main topics of our TED talks and use them as predictors. To do this, we build a topic model in BigML using the title, the description, the transcript, and the tags as input fields.

The coolest thing about BigML topic models is that you don't need to worry about text pre-processing. It's very handy that BigML automatically removes punctuation marks, homogenizes letter case, excludes stopwords, and applies stemming during topic model creation. You can also fine-tune those settings and include bigrams as you wish by configuring your topic model in advance.

topic-model.png
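If you prefer to script this step, a minimal sketch with the BigML Python bindings could look like the one below. The dataset ID is a placeholder and the field list is given by name for readability; in practice the API may expect field IDs, so double-check the argument details against the bindings documentation.

from bigml.api import BigML
api = BigML()

ted_dataset = "dataset/59c14dce2ba7150b1500fdb5"  # placeholder dataset ID

# topic model over the text fields only, optionally including bigrams
topic_model = api.create_topic_model(ted_dataset, {
    "name": "TED talks topics",
    "input_fields": ["title", "description", "transcript", "tags"],
    "bigrams": True
})

# wait for the topic model to be ready
api.ok(topic_model)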

When our topic model is created, we can see that BigML has found 40 different topics in our TED talks including technology, education, business, religion, politics, and family, among others.  You can see all of them in a circle map visualization in which each circle represents a topic. The size of the circle represents the importance of that topic in the dataset and related topics are located closer in the graph. Each topic is a distribution over term probabilities. If you mouse over each topic you can see the top 20 terms and their probabilities within that topic. BigML also provides another visualization in which you can see all the top terms per topic displayed in horizontal bars. You can observe both views below or better yet inspect the model by yourself here!

BigML topic models are an optimized implementation of Latent Dirichlet Allocation (LDA), one of the most popular probabilistic methods for topic modeling. If you want to learn more about topic models, please read the documentation.

All the topics found seem coherent with the main TED talk themes. Now we can use this model to calculate the topic probabilities for any new text. Just click on the Topic Distribution option in the 1-click action menu, and you will be redirected to the prediction view, where you can set new values for your input fields. See below how the distribution over topics changes when you change the text in the input fields.

Now we want to do the same for our TED talks. To calculate the topic probabilities for each TED talk we use the option Batch Topic Distribution in the 1-click action menu. Then we select our TED talks dataset. We also need to make sure that the option to create a new dataset out of the topic distribution is enabled!

topic-distribution.png
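The same step can be scripted; a hedged sketch with the Python bindings follows (placeholder IDs; output_dataset is the option that materializes the scores as a new dataset):

from bigml.api import BigML
api = BigML()

topic_model = "topicmodel/59c14dce2ba7150b1500fdb7"  # placeholder ID
ted_dataset = "dataset/59c14dce2ba7150b1500fdb5"     # placeholder ID

# score every talk and keep the per-topic probabilities in a new dataset
batch = api.create_batch_topic_distribution(topic_model, ted_dataset, {
    "output_dataset": True,
    "all_fields": True  # keep the original fields next to the topic scores
})
api.ok(batch)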

Once the batch topic distribution is created, we can find the new dataset with additional numeric fields containing the probability of each topic per TED talk. These are the fields that we will use as inputs to predict the views, replacing the transcript, title, description, and tags.

ted-dataset-topics.png

Predicting the TED Talks Views

Now we are ready to predict the talk views. As we mentioned at the beginning of this post, the number of views is a widely dispersed and highly skewed field, so predicting the exact number of views can be difficult. To find more general patterns in how topics influence talk popularity, we are going to discretize the views field and make it categorical. This is very easy in BigML: just select the option to Add fields to the dataset in the configuration menu. We will discretize the field by percentiles, using the median.

discretization.png

Then we click the button to create a dataset. The dataset contains a new field with two classes: the first class contains the talks below the median number of views (less than 1M views), and the second class contains the talks above the median (more than 1M views).

views-discretize.png
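For reference, the same two-class field can also be added programmatically by extending the dataset with a Flatline expression; here is a rough sketch, under the assumption that the objective field is literally named "views" and that 1M approximates the median:

from bigml.api import BigML
api = BigML()

ted_topics_dataset = "dataset/59c14dce2ba7150b1500fdb9"  # placeholder ID

# add a categorical field splitting the talks at ~1M views
extended = api.create_dataset(ted_topics_dataset, {
    "new_fields": [{
        "name": "views (discretized)",
        "field": '(if (< (f "views") 1000000) "below 1M" "over 1M")'
    }]
})
api.ok(extended)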

Before creating our classification model, we need to split our dataset into two subsets: 80% for training and the remaining 20% for testing, to ensure that our model generalizes well to data it has not seen before. We can easily do this in BigML by using the corresponding option in the 1-click action menu, as shown below.

split-dataset.png
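The same 80/20 split can be reproduced with the Python bindings by sampling with a fixed seed and taking the out-of-bag rows as the test set (the dataset ID is a placeholder):

from bigml.api import BigML
api = BigML()

dataset = "dataset/59c153eab95b3905a3000054"  # placeholder ID

# same seed for both calls so the two subsets are complementary
train = api.create_dataset(dataset, {"sample_rate": 0.8, "seed": "ted"})
test = api.create_dataset(dataset, {"sample_rate": 0.8, "seed": "ted",
                                    "out_of_bag": True})
api.ok(train)
api.ok(test)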

We proceed with the 80% subset of our dataset to create our predictive models. To compare how different algorithms perform, we create a single tree, an ensemble (a Random Decision Forest), a logistic regression, and the latest addition to BigML, a deepnet (an optimized implementation of the popular deep neural networks). You can easily create these models from the dataset menus. BigML automatically selects the last field in the dataset, “views (discretized)”, as the objective field, so we don't need to configure it differently. Then we use the 1-click action menu to easily create our models.

select-model.png
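As a rough bindings equivalent of those 1-click actions (the training dataset ID is a placeholder, and the randomize flag is one way to request a Random Decision Forest; verify the exact argument names in the bindings documentation):

from bigml.api import BigML
api = BigML()

train = "dataset/59c15634b95b3905a1000032"  # placeholder training dataset ID

tree = api.create_model(train)
forest = api.create_ensemble(train, {"randomize": True})
logistic = api.create_logistic_regression(train)
deepnet = api.create_deepnet(train)  # default settings, like the 1-click option

# wait for every model to finish training
for resource in [tree, forest, logistic, deepnet]:
    api.ok(resource)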

Apart from the 1-click deepnet, which uses an automatic parameter optimization option called Structure Suggestion, we also create another deepnet by configuring an alternative automatic option called Network Search. BigML offers this unique capability for automatic parameter optimization to eliminate the difficult and time-consuming work of hand-tuning your deepnets.

deepnets-config.png

After some iterations, we realize that the features related to the speaker have no effect on the number of views, so we eliminate them, along with the field “event”, which seems to cause overfitting. In the end, we use as input fields all the topics, the year of the published date, the duration of the talk, and our calculated field measuring the number of days since the published date (until Sept 21, 2017).

After creating all the models with these selected features, we evaluate them using the remaining 20% of the dataset that we set aside earlier. We can easily compare the performance of our models with the BigML evaluation comparison tool, where the ROC curves can be analyzed together. As seen below, the winner, with the highest AUC (0.776), is the deepnet that uses the automatic parametrization option “Network Search”. The second best performing model is again a deepnet, the one using the automatic option “Structure Suggestion”, with an AUC of 0.7557. In third place we see the ensemble (AUC of 0.7469), followed by the logistic regression (AUC of 0.7097) and finally the single tree (AUC of 0.6781).

compare-evals.png

When we take a look at the confusion matrix of our top performing deepnet, we can see that we are achieving over 70% recall with a 70% precision for both classes of the objective field.

confusion-matrix.png

Inspecting the Deepnet

Deep neural network predictions are usually hard to analyze, which is why BigML provides ways to make it easy for you to understand why your model is making particular decisions.

First, BigML can tell us which fields are most important in predicting the number of views. By clicking on the “Summary Report” option, we get a histogram with a percentage per field that displays field importances. We can see that (not surprisingly) the “days_since_published” is the most important field (19.33%), followed by the topic “Entertainment” (15.90%), and the “published_date.year” (13.62%). Amongst the top 10 most important fields we can also find the topics “Relationships”, “Global issues”, “Research”, “Oceans”, “Data”, “Psychology” and “Science”.

importances.png

Great! We can state that our deepnet found the topics to be relevant predictors of the number of views. But how exactly do these topics impact predictions? Will talks about psychology get more or fewer views than talks about science? To answer this question, BigML provides a Partial Dependence Plot view, where we can analyze the marginal impact of the input fields on the objective field. Let's see some examples (or play with the visualization here at your leisure).

For example, see in the image below how the combination of the topics “Entertainment” and “Psychology” has a positive impact on the number of views. Higher probabilities of those topics result in the prediction of our 2nd class (the blue one), the class with over 1M views.

pdp1.png

On the contrary, if we select “Health” instead, we can see how higher probabilities of this topic result in a higher probability of predicting the 1st class (the class below 1M views).

pdp2.png

We can also see a change over time in the interest in some topics. As you can see below, the probability that a talk with a high psychology topic score gets more than 1M views has increased in recent years, over the period from 2012 to 2017.

pdp3.png

Conclusions

In summary, we have seen that topics do have a significant influence on the number of views. After analyzing the impact of each topic in predictions, we observe that “positive” topics such as entertainment or motivation are more likely to have a greater number of views while talks with higher percentages of the “negative” topics like diseases, global issues or war are more likely to have fewer views. In addition, it seems that the interest in individual human-centered topics such as psychology or relationships has increased over the years to the detriment of broader social issues like health or development.

We hope you enjoyed this post! If you have any questions don’t hesitate to contact us at support@bigml.com.

How to Create a WhizzML Script – Part 4

As the final installment of our WhizzML blog series (Part 1, Part 2, Part 3), it’s time to learn the BigML Python bindings way to create WhizzML scripts. With this last tool at your disposal, you’ll officially be a WhizzML-script-creator genius!

About BigML Python bindings

BigML Python bindings allow you to interact with BigML.io, the API for BigML. You can use them to easily create, retrieve, list, update, and delete BigML resources (i.e., sources, datasets, models, and predictions). This tool becomes even more powerful when it is used with WhizzML. Your WhizzML code can be stored and executed in BigML by using three kinds of resources: Scripts, Libraries, and Executions (see the first post: How to Create a WhizzML Script, Part 1).

WhizzML Scripts can be executed in BigML’s servers in a controlled, fully-scalable environment that automatically takes care of their parallelization and fail-safe operation. Each execution uses an Execution resource to store the arguments and the results of the process. WhizzML Libraries store generic code to be shared or reused in other WhizzML Scripts. But this is not really new, so let’s get into some new tricks instead.

Bindings

You can create and execute your WhizzML scripts via the bindings to create powerful workflows. BigML bindings support multiple languages, e.g., Python, C#, Java, or PHP. In this post we'll use the Python bindings, but you can find the others on GitHub. If you need a BigML Python bindings refresher, please refer to the dedicated documentation.

Scripts

In BigML a script resource stores WhizzML source code, and the results of its compilation. Once a WhizzML script is created, it’s automatically compiled. If compilation is successful, the script can be run, that is, used as the input for a WhizzML execution resource. Suppose we want a simple script that creates a model from a dataset ID. The code in WhizzML to do that is:

# define the code that will be compiled in the script
script_code = "(create-model {\"dataset\" dataset-id})"

However, you need to keep in mind that every script can have inputs and outputs. In this case, we’ll want the dataset-id variable to contain our input, the dataset ID. In the code below, we see how to describe this input, its type and even associate a specific ID that will be used by default when no input is provided.

from bigml.api import BigML
api = BigML()

# define the code included in the script
script_code = "(create-model {\"dataset\" dataset-id})" 

# define parameters
args = {"inputs": [{
    "name": "dataset-id",
    "type": "dataset-id",
    "default": "dataset/598ceafe4006830450000046",
    "description": "Dataset to be modelized"
}]}

# create the script
script = api.create_script(script_code, args)

# waiting for the script compilation to finish
api.ok(script)

Executions

To execute a compiled WhizzML script in BigML, you need to create an execution resource. Each execution is run under its associated user credentials and its particular environment constraints. Furthermore, a script can be shared, and you can execute the same script several times under different usernames by creating a different execution. For this, you need to first identify the script you want to execute and set your inputs as an array of arrays (one per variable-value pair). In this execution, we are setting the value of variable x in the script to 2. The parameters for the execution are the script inputs. It’s a piece of cake!

from bigml.api import BigML
api = BigML() 

# choose workflow
script = 'script/591ee3d00144043f5c003061' 

# define parameters
args = {"inputs": [['x', 2]]}

# execute and wait for the execution to finish
execution = api.create_execution(script, args)
api.ok(execution)

Libraries

We haven’t created libraries in the previous posts, but there’s always a first time. A library is a shared collection of WhizzML functions and definitions usable by any script that imports them. It’s a special kind of compiled WhizzML source code that only defines functions and constants. Libraries can be imported in scripts. The imports attribute of a script can contain a list of library IDs whose defined functions and constants will be usable in the script. It’s very easy to create WhizzML libraries by using BigML Python bindings.

Let's go through a simple example. We'll define a single function, get-max, which returns the maximum value in a list. In WhizzML, this can be expressed as shown below.

from bigml.api import BigML
api = BigML()

# define the library code
lib_code = \
    "(define (get-max xs)" \
    "  (reduce (lambda (x y) (if (> x y) x y))" \
    "    (head xs) xs))"

# create a library
library = api.create_library(lib_code)

# waiting for the library compilation to finish
api.ok(library)

For more examples, don’t hesitate to take a look at the WhizzML resources section in the BigML Python Bindings documentation.

So that’s it! You now know all the techniques needed to create and execute WhizzML scripts. We’ve covered the basic concepts of WhizzML, how to clone existing scripts or import them from Github (Part 1), how to create new scripts by using Scriptify and the editor (Part 2), how to use the BigMLer command line (Part 3) and finally the BigML Python bindings in this last post.


To go deeper into the world of Machine Learning automation via WhizzML, please check out our WhizzML tutorials such as the Automated Dataset Transformation tutorial. If you need a WhizzML refresher, you can always visit the WhizzML documentation. Start with the “Hello World” section, in the Primer Guide, for a gentle introduction. And yes, please be on the lookout for more WhizzML related posts on our blog.

BigML Summer 2017 Release Webinar Video is Here: Deepnets!

We are happy to share that Deepnets are now fully implemented on our platform and available from the BigML Dashboard and API, as well as from WhizzML for automation.

BigML Deepnets are the world's first deep learning algorithm capable of Automatic Network Search and Structure Suggestion, which automatically optimize model performance and minimize the need for expensive specialists to drive business value. Following our mission to make Machine Learning beautifully simple for everyone, BigML now offers the very first service that enables non-experts to use deep learning with results matching those of top-level data scientists. BigML's extensive benchmark conducted on 50+ datasets has shown Deepnets, an optimized version of Deep Neural Networks brought to the BigML platform, to outperform other algorithms offered by popular Machine Learning libraries. With nothing to install, nothing to configure, and no need to specify a neural network structure, anyone can use BigML's Deepnets to transform raw data into valuable business insights.

Special thanks to all webinar attendees who joined the BigML Team yesterday during the official launch. For those who missed the live webinar, you can watch the full recording on the BigML YouTube channel.

As explained in the video, one of the main complexities of deep learning is that a Machine Learning expert is required to find the best network structure for each problem. This can often be a tedious trial-and-error process that can take from days to weeks. To combat these challenges and make deep learning accessible for everyone, BigML now enables practitioners to find the best network for their data without having to write custom code or hand-tune parameters. We make this possible with two unique parameter optimization options: Automatic Network Search and Structure Suggestion.

BigML's Automatic Network Search conducts an intelligent, guided search over the space of possible networks to find suitable configurations for your dataset. The final Deepnet will use the top networks found in this search to make predictions. This capability typically yields a better model; however, it takes longer, since the algorithm conducts an extensive search for the best solution. It's ideal for use cases that justify the incremental wait for optimal Deepnet performance. On the other hand, BigML's Structure Suggestion only takes nominally longer than training a single network. This option swiftly recommends a neural network structure that is optimized to work well with your particular dataset.

For further learning on Deepnets, please visit our dedicated summer 2017 release page, where you will find:

  • The slides used during the webinar.
  • The detailed documentation to learn how to create and evaluate your Deepnets, and interpret the results before making predictions from the BigML Dashboard and the BigML API.
  • The series of six blog posts that gradually explain Deepnets.

Thanks for your positive comments after the webinar. And remember that you can always reach out to us at support@bigml.com for any suggestions or questions.

Deepnets: Behind The Scenes

Over our last few blog posts, we've gone through the various ways you can use BigML's new Deepnet resource: via the Dashboard, programmatically, and via download to your local machine. But what's going on behind the curtain? Is there a little wizard working an elaborate console of cartoonish-looking levers and dials?

Well, as we'll see, Deepnets certainly do have a lot of levers and dials. So many, in fact, that using them can be pretty intimidating. Thankfully, BigML is here to be your wizard, so you aren't the one looking shamefacedly at Dorothy when she realizes you're not as all-powerful as she thought.

BigML Deep Learning

Deepnets:  Why Now?

First, let's address an important question: why are deep neural networks suddenly all the rage? After all, the Machine Learning techniques underpinning deep learning have been around for quite some time. The reason boils down to innovations in the technology supporting the learning algorithms more than to advances in the learning algorithms themselves. It's worth quoting Stuart Russell at length here:

. . .deep learning has existed in the neural network community for over 20 years. Recent advances are driven by some relatively minor improvements in algorithms and models and by the availability of large data sets and much more powerful collections of computers.

He gets at most of the reasons in this short paragraph.  Certainly, the field has been helped along by the availability of huge datasets like the ones generated by Google and Facebook, as well as some academic advances in algorithms and models.

But the things I will focus on in this post are in the family of “much more powerful collections of computers”.  In the context of Machine Learning, I think this means two things:

  • Highly parallel, memory-rich computers provisionable on-demand. Few people can justify building a massive GPU-based server to train a deep neural network on huge data if they're only going to use it every now and then. But most people can afford to rent that sort of capacity for a few days at a time. Making such power available in this way makes deep learning cost-effective for far more people than it used to be.
  • Software frameworks that have automatic differentiation as first-class functionality. Modern computational frameworks (like TensorFlow, for example) allow programmers to instantiate a network structure programmatically and then just say “now do gradient descent!” without ever having to do any calculus or worry about third-party optimization libraries. Because differentiation of the objective with respect to the parameters is done automatically, it becomes much easier to try a wide variety of structures on a given dataset.

The problem here then becomes one of expertise: people need powerful computers to do Machine Learning, but few people know how to provision and deploy machines on, say, Amazon AWS to do this. Similarly, computational frameworks to specify and train almost any deep network exist and are very nice, but exactly how to use those frameworks and exactly what sort of network to create are problems that require knowledge in a variety of different domains.

Of course, this is where we come in. BigML has positioned itself at the intersection of these two innovations, utilizing a framework for network construction that allows us to train a wide variety of network structures for a given dataset, using the vast amount of available compute power to do it quickly and at scale, as we do with everything else. We add to these innovations a few of our own, in an attempt to make Deepnet learning as “hands off” and as “high quality” as we can manage.

What BigML Deepnets are (currently) Not:

Before we get into exactly what BigML's Deepnets are, let's talk a little bit about what they aren't. Many people who are technically minded will immediately bring to mind the convolutional networks that do so well on vision problems, or the recurrent networks (such as LSTM networks) that have great performance on speech recognition and NLP problems.

We don’t yet support these network types.  The main reason for this is that these networks are designed to solve supervised learning problems that have a particular structure, not the general supervised learning problem.  It’s very possible we’ll support particular problems of those types in the future (as we do with, say, time series analysis and topic modeling), and we’d introduce those extensions to our deep learning capabilities at that time.

Meanwhile, we'd like to bring the power of deep learning, so obvious in those applications, to our users in general. Sure, deep learning is great for vision problems, but what can it do on my data? Hopefully, this post will prove to you that it can do quite a lot.

Down to Brass Tacks

The central idea behind neural networks is fairly simple, and not so very different from the idea behind logistic regression: you have a number of input features and a number of possible outputs (either one probability for each class in classification problems or a single value for regression problems). We posit that the outputs are a function of the dot product of the input features with some learned weights, one weight per feature. In logistic regression, for example, we imagine that the probability of a given class can be expressed as the logistic function applied to this dot product (plus some bias term).
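As a tiny numeric illustration of that idea (made-up weights and inputs, nothing BigML-specific):

import math

def logistic(z):
    # squash any real number into a probability between 0 and 1
    return 1.0 / (1.0 + math.exp(-z))

x = [0.5, 1.2, -0.3]   # input features
w = [2.0, -1.0, 0.7]   # one learned weight per feature
bias = 0.1

dot_product = sum(wi * xi for wi, xi in zip(w, x))
print(logistic(dot_product + bias))  # predicted probability of the positive class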

Deep networks extend this idea in several ways. First, we introduce the idea of “layers”, where the inputs are fed into a layer of output nodes, the outputs of which are in turn fed into another layer of nodes, and so on until we get to a final “readout” layer with the number of output nodes equal to the number of classes.

deepnet.png

This gives rise to an infinity of possible network topologies: How many intermediate (hidden) layers do you want? How many nodes in each one? There's no need to apply the logistic function at each node; in principle, we can apply anything that our network framework can differentiate. So we could have a particular “activation function” for each layer or even for each node! Could we apply multiple activation functions? Do we learn a weight from every single node in one layer to every single node in another, or do we skip some?

Add to this the usual parameters for gradient descent: Which algorithm to use? How about the learning rate? Are we going to use dropout during training to avoid overfitting? And let's not even get into the usual parameters common to all ML algorithms, like how to handle missing data or objective weights. Whew!

Extra Credit

What we've described above is the vanilla feed-forward neural network that the literature has known for quite a while, and we can see that it is pretty parameter-heavy. To add a bit more to the pile, we've added support for a couple of fairly recent advances in the deep learning world (some of the “minor improvements” mentioned by Russell) that I'd like to mention briefly.

Batch Normalization

During the training of deep neural networks, the activations of internal layers of the network can change wildly throughout the course of training. Often, this means that the gradients computed for training behave poorly, so one must be very careful to select a sufficiently low learning rate to mitigate this behavior.

Batch normalization fixes this by normalizing the inputs to each layer for each training batch of instances, ensuring that the inputs are always mean-centered and unit-variance, which implies well-behaved gradients during training. The downside is that you must now know both the mean and the variance for each layer at prediction time so you can standardize the inputs as you did during training. This extra bookkeeping tends to slow down the descent algorithm's per-iteration speed a bit, though it sometimes leads to faster and more robust convergence, so the trade-off can be worthwhile for certain network structures.
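A bare-bones sketch of the normalization step itself, with the usual learnable scale and shift omitted for brevity (illustrative NumPy, not BigML's internal implementation):

import numpy as np

def batch_normalize(batch, epsilon=1e-5):
    # normalize each column (layer input) to zero mean and unit variance
    mean = batch.mean(axis=0)
    var = batch.var(axis=0)
    return (batch - mean) / np.sqrt(var + epsilon), mean, var

batch = np.array([[1.0, 200.0], [2.0, 180.0], [3.0, 220.0]])
normalized, saved_mean, saved_var = batch_normalize(batch)

# at prediction time, the stored mean/variance are reused to standardize inputs
print(normalized)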

Learn Residuals

Residual networks are networks with “skip” connections built in. That is, every second or third layer, the input to each node is the usual output from the previous layer, plus the raw, unweighted output from two layers back.

The theory behind this idea is that this allows information present in the early layers to bubble up through the later layers of the network without being subjected to a possible loss of that information via reweighting on subsequent layers. Thus, the later layers are encouraged to learn a function representing the residual values that will allow for good prediction when used on top of the values from the earlier layers. Because the early layers contain significant information, the weights for these residual layers can often be driven down to zero, which is typically “easier” to do in a gradient descent context than to drive them towards some particular non-zero optimum.
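In code, the skip connection amounts to adding the unweighted activations from an earlier layer to a later layer's output; here is a toy NumPy sketch (random weights and ReLU activations, purely illustrative):

import numpy as np

rng = np.random.default_rng(0)

def dense(x, w):
    # one fully connected layer with a ReLU activation
    return np.maximum(0.0, x @ w)

x = rng.normal(size=(4, 8))    # a small batch of inputs
w1 = rng.normal(size=(8, 8))
w2 = rng.normal(size=(8, 8))

h1 = dense(x, w1)
h2 = dense(h1, w2) + x         # residual: add the raw output from two layers back
print(h2.shape)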

Tree Embedding

When this option is selected, we learn a series of decision trees, random forest-style, against the objective before the network training starts (with appropriate use of holdout sets, of course). We then use the predictions of the trees as generated “input features” for the network. Because these features tend to have a monotonic relationship with the class probabilities, this can make gradient descent reach a good solution more quickly, especially in domains where there are many non-linear relationships between inputs and outputs.

If you want, this is a rudimentary form of stacked generalization embedded in the tree learning algorithm.

Pay No Attention To The Man Behind The Curtain

So it seems we’ve set ourselves an impossible task here. We have all of these parameters. How on earth are we going to find a good network in the middle of this haystack?

Here's where things get BigML easy: the answer is that we're going to do our best to find settings that work well on the dataset we're given. We'll do this in two ways: via metalearning and via hyper-parameter search.

Metalearning

Metalearning is another idea that is nearly as old as Machine Learning itself.  In its most basic form, the idea is that we learn a bunch of classifiers on a bunch of different datasets and measure their performance. Then, we apply Machine Learning to get a model that predicts the best classifier for a given dataset. Simple, right?

In our application, this means we’re going to learn networks of every sort of topology parameterized in every sort of way. What do I mean by “every sort”?  Well, we’ve got over 50 datasets so we did five replications of 10-fold cross-validation on each one. For each fold, we learned 128 random networks, then we measured the relative performance of each network on each fold.  How many networks is that?  Here, allow me to do the maths:

50 * 5 * 10 * 128 = 320,000.

“Are you saying you trained 320,000 neural networks?” No, no, no, of course not! Some of the datasets were prohibitively large, so we only learned a paltry total of 296,748 networks. This is what we do for fun here, people!

When we model the relative quality of the networks given their parameters (for which, of course, we use BigML), we learn a lot of interesting little bits about how the parameters of neural networks relate to one another and to the data on which they're being trained.

You'll get better results using the “adadelta” descent algorithm, for example, with high learning rates, as you can see from the green areas of the figure below, which indicate parts of the parameter space that specify networks that perform better on a relative basis.

adagrad.png

But if you’re using “rms_prop”, you’ll want to use learning rates that are several orders of magnitude lower.

rms_prop.png

Thankfully, you don’t have to remember all of this.  The wisdom is in the model, and we can use this model to make intelligent suggestions about which network topology and parameters you should use, which is exactly what we do with the “Automatic Structure Suggestion” button.

Structuresuggestion.png

But, of course, the suggested structure and parameters represent only a guess at the optimal choices, albeit an educated one. What if we're wrong? What if there's another network structure out there that performs well on your data, if you could only find it? Well, we'll have to go looking for it, won't we?

Network Search

Our strategy here again comes in many pieces.  The first is Bayesian hyperparameter optimization.  I won’t go into it too much here because this part of the architecture isn’t too much more than a server-side implementation of SMACdown, which I’ve described previously. This technique essentially allows a clever search through the space of possible models by using the results of previously learned networks to guide the search. The cleverness here lies in using the results of the previously trained networks to select the best next one to evaluate. Beyond SMAC, we’ve done some experiments with regret-based acquisition functions, but the flavor of the algorithm is the same.

We’ve also more heavily parallelized the algorithm, so the backend actually keeps a queue of networks to try, reordering that queue periodically as its model of network performance is refined.

The final innovation here is using bits and pieces of the hyperband algorithm.  The insight of the algorithm is that a lot of savings in computation time can be gained by simply stopping training on networks that, even if their fit is still improving, have little hope of reaching near-optimal performance in reasonable time. Our implementation differs significantly in the details (in our interface, for example, you provide us with a budget of search time that we always respect), but does stop training early for many underperforming networks, especially in the later stages of the search.

This is all available right next door to the structure suggestion button in the interface.

Automatic Network Search

Between metalearning and network search, we can make a good guess as to some reasonable network parameters for your data, and we can do a clever search for better ones if you're willing to give us a bit more time. It sounds good on paper, so let's take it for a spin.

Benchmarking

So how did we do? Remember those 50+ datasets we used for metalearning?  We can use the same datasets to benchmark the performance of our network search against other algorithms (before you ask, no, we didn’t apply our metalearning model during this search as that would clearly be cheating). We again do 5 replications of 10-fold cross validation for each dataset and measure performance over 10 different metrics. As this was a hobby project I started before I came to BigML, you can see the result here.

Deepnets Benchmark

You can see on the front page a bit of information about how the benchmark was conducted (scroll below the chart), along with a comparison of 30+ different algorithms from various software packages. The thing that's being measured here is “How close is the performance of this algorithm to the best algorithm for a given dataset and metric?” You can quickly see that BigML Deepnets are the best, or quite close to it, more often than any other off-the-shelf algorithm we've tested.

The astute reader will certainly reply, “Well, yes, but you’ve done all manner of clever optimizations on top of the usual deep learning; you could apply such cleverness to any algorithm in the list (or a combination!) and maybe get better performance still!”

This is absolutely true. I'm certainly not saying that BigML deep learning is the best Machine Learning algorithm there is; I don't even know how you would prove something like that. But what these results do show is that if you're going to just pull something off the shelf and use it, with no parameter tuning and little or no coding, you could do a lot worse than to pull BigML off that shelf. Moreover, the results show that deep learning (BigML or otherwise; notice that multilayer perceptrons in scikit-learn are just a few clicks down the list) isn't just for vision and NLP problems, and it might be the right thing for your data too.

Another lesson here, editorially, is that benchmarking is a subtle thing, easily done wrong. If you go to the “generate abstract” page, you can auto-generate a true statement (based on these benchmarks) that “proves” that any algorithm in this list is “state-of-the-art”. Never trust a benchmark on a single dataset or a single metric! While any benchmark of ML algorithms is bound to be inadequate, we hope that this benchmark is sufficiently general to be useful.

Cowardly Lion Deepnets

Hopefully, all of this has convinced you to go off to see the Wizard and give BigML Deepnets a try. Don’t be the Cowardly Lion! You have nothing to lose except underperforming models…

Want to know more about Deepnets?

Please visit the dedicated release page for further learning. It includes a series of six blog posts about Deepnets, the BigML Dashboard and API documentation, the webinar slideshow as well as the full webinar recording.

 

Automating Deepnets with WhizzML and The BigML Python Bindings


This blog post, the fifth of our series of six posts about Deepnets, focuses on those users that want to automate their Machine Learning workflows using programming languages. If you follow the BigML blog, you may know WhizzML, BigML’s domain-specific language for automating Machine Learning workflows, implementing high-level Machine Learning algorithms, and easily sharing them with others. WhizzML helps developers to create Machine Learning workflows and execute them entirely in the cloud. This avoids network problems, memory issues and lack of computing capacity while taking full advantage of WhizzML’s built-in parallelization. If you are not familiar with WhizzML yet, we recommend that you read the series of posts we published this summer about how to create WhizzML scripts: Part 1, Part 2 and Part 3 to quickly discover their benefits.

In addition, in order to easily automate the use of BigML's Machine Learning resources, we maintain a set of bindings, which allow users to work with the BigML platform in their favorite language (Java, C#, PHP, Swift, and others).

Screen Shot 2017-03-15 at 01.51.13

Let’s see how to use Deepnets through both the popular BigML Python Bindings and WhizzML, but note that the operations described in this post are also available in this list of bindings.

We start by creating a Deepnet with the default settings. For that, we need an existing Dataset to train the network in BigML, so our call to the API needs to include the ID of the Dataset we want to use for training, as shown below:

;; Creates a deepnet with default parameters
(define my_deepnet (create-deepnet {"dataset" training_dataset}))

The BigML API is mostly asynchronous, that is, the above creation function will return a response before the Deepnet creation is completed. This implies that the Deepnet information is not ready to make predictions right after the code snippet is executed, so you must wait for its completion before you can predict with it. You can use the directive “create-and-wait-deepnet” for that:

;; Creates a deepnet with default settings. Once it's
;; completed the ID is stored in my_deepnet variable
(define my_deepnet
  (create-and-wait-deepnet {"dataset" training_dataset}))

If you prefer the BigML Python Bindings, the equivalent code is:

from bigml.api import BigML
api = BigML()
my_deepnet = api.create_deepnet("dataset/59b0f8c7b95b392f12000000")

Next up, we will configure a Deepnet with WhizzML. The configuration properties can easily be added to the creation map as property pairs, <property_name> and <property_value>. For instance, when creating a Deepnet from a dataset, BigML automatically sets the maximum number of iterations used to optimize the network to 20,000, but if you prefer a different maximum number of gradient steps to take during the optimization process, you can add the property "max_iterations" and set it to, say, 100,000. Additionally, you might want to set the value used by the Deepnet when numeric fields are missing. For that, set the "default_numeric_value" property to the right value; in the example shown below, missing numeric values are replaced by the field mean. Property names always need to be between quotes and values should be expressed in the appropriate type. The code for our example can be seen below:

;; Creates a deepnet with some settings. Once it's
;; completed the ID is stored in my_deepnet variable
(define my_deepnet
  (create-and-wait-deepnet {"dataset" training_dataset
                            "max_iterations" 100000
                            "default_numeric_value" "mean"}))

The equivalent code for the BigML Python Bindings is:

from bigml.api import BigML
api = BigML()
args = {"max_iterations": 100000, "default_numeric_value": "mean"}
training_dataset ="dataset/59b0f8c7b95b392f12000000"
my_deepnet = api.create_deepnet(training_dataset, args)

For more details about these and other properties, please check the dedicated API documentation (available on October 5).

Once the Deepnet has been created, we can evaluate how good its performance is. For that, we use a different dataset with non-overlapping data to check the Deepnet's performance. The "test_dataset" parameter in the code shown below represents this second dataset. Following WhizzML's philosophy of “less is more”, the snippet that creates an evaluation has only two mandatory parameters: a Deepnet to be evaluated and a Dataset to use as test data.

;; Creates an evaluation of a deepnet
(define my_deepnet_ev
 (create-evaluation {"deepnet" my_deepnet "dataset" test_dataset}))

Similarly, in Python the evaluation is done as follows:

from bigml.api import BigML
api = BigML()
my_deepnet = "deepnet/59b0f8c7b95b392f12000000"
test_dataset = "dataset/59b0f8c7b95b392f12000002"
evaluation = api.create_evaluation(my_deepnet, test_dataset)

After evaluating your Deepnet, you can predict the results of the network for new values of one or many features in your data domain. In the code below, we demonstrate the simplest case, where the prediction is made by providing values for only some of the fields in your dataset.

;; Creates a prediction using a deepnet with specific input data
(define my_prediction
 (create-prediction {"deepnet" my_deepnet
                     "input_data" {"sepal length" 2 "sepal width" 3}}))

And the equivalent code for the BigML Python bindings is:

from bigml.api import BigML
api = BigML()
input_data = {"sepal length": 2, "sepal width": 3}
my_deepnet = "deepnet/59b0f8c7b95b392f12000000"
prediction = api.create_prediction(my_deepnet, input_data)

In addition to this prediction, calculated and stored in BigML servers, the Python Bindings allow you to instantly create single local predictions on your computer. The Deepnet information is downloaded to your computer the first time you use it, and the predictions are computed locally on your machine, without any costs or latency:

from bigml.deepnet import Deepnet
local_deepnet = Deepnet("deepnet/59b0f8c7b95b392f12000000")
input_data = {"sepal length": 2, "sepal width": 3}
local_deepnet.predict(input_data)

It is pretty straightforward to create a Batch Prediction from an existing Deepnet, where the dataset named “my_dataset” contains the rows the network will make predictions for:

;; Creates a batch prediction using a deepnet 'my_deepnet'
;; and the dataset 'my_dataset' as data to predict for
(define my_batchprediction
 (create-batchprediction {"deepnet" my_deepnet
                          "dataset" my_dataset}))

The code in Python Bindings to perform the same task is:

from bigml.api import BigML
api = BigML()
my_deepnet = "deepnet/59d1f57ab95b39750c000000"
my_dataset = "dataset/59b0f8c7b95b392f12000000"
my_batchprediction = api.create_batch_prediction(my_deepnet, my_dataset)

Want to know more about Deepnets?

Our next blog post, the last one of this series, will cover how Deepnets work behind the scenes, diving into the most technical aspects of BigML’s latest resource. If you have any questions or you’d like to learn more about how Deepnets work, please visit the dedicated release page. It includes a series of six blog posts about Deepnets, the BigML Dashboard and API documentation, the webinar slideshow as well as the full webinar recording.

Programming Deepnets with the BigML API

So far, we have introduced BigML’s Deepnets, how they are used, and how to create one in the BigML Dashboard. In this post, the fourth in our series of blog posts about Deepnets, we will see how to use them programmatically using the API. So let’s start!

The API workflow to create Deepnets includes five main steps: first, upload your data to BigML, then create a dataset, create a Deepnet, evaluate it and finally make predictions. Note that any resource created with the API will automatically be created in your Dashboard too so you can take advantage of BigML’s intuitive visualizations at any time.

Authentication

The first step in using the API is setting up your authentication. This is done by setting the BIGML_USERNAME and BIGML_API_KEY environment variables. Your username is the same as the one you use to log into the BigML website. To find your API key, on the website, navigate to your user account page and then click on ‘API Key’ on the left. To set your environment variables, you can add lines like the following to your .bash_profile file.

export BIGML_USERNAME=my_name
export BIGML_API_KEY=13245 
export BIGML_AUTH="username=$BIGML_USERNAME;api_key=$BIGML_API_KEY;"

Once your authentication is set up, you can begin your workflow.

1. Upload your Data

Data can be uploaded to BigML in many different ways. You can use a local file, a remote URL, or inline data. To create a source from a remote URL, use the curl command:

 curl "https://bigml.io/source?$BIGML_AUTH" \
 -X POST \
 -H 'content-type: application/json' \
 -d '{"remote": "https://static.bigml.com/csv/iris.csv"}'

2. Create a Dataset

Now that you have a source, you will need to process it into a dataset. Use the curl command:

curl "https://bigml.io/dataset?$BIGML_AUTH" \
-X POST \
-H 'content-type: application/json' \
-d '{"source": "source/59c14dce2ba7150b1500fdb5"}'

If you plan on running an evaluation later (and you will want to evaluate your results!), you will want to split this dataset into a testing and a training dataset. You will create your Deepnet using the training dataset (commonly 80% of the original dataset) and then evaluate it against the testing dataset (the remaining 20% of the original dataset). To make this split using the API, you will first run the command:

curl "https://bigml.io/dataset?$BIGML_AUTH" \
-X POST \
-H 'content-type: application/json' \
-d '{"origin_dataset": "dataset/59c153eab95b3905a3000054", 
     "sample_rate": 0.8, 
     "seed": "myseed"}'

using the sample rate and seed of your choice. This creates the training dataset.

Now, to make the testing dataset, run:

curl "https://bigml.io/dataset?$BIGML_AUTH" \
-X POST \
-H 'content-type: application/json' \
-d '{"origin_dataset": "dataset/59c153eab95b3905a3000054", 
     "sample_rate": 0.8,
     "out_of_bag": true,
     "seed": "myseed"}'

By setting “out_of_bag” to true, you are choosing all the rows you did not choose while creating the training set. This will be your testing dataset.

3. Create a Deepnet

Now that you have the datasets you need, you can create your Deepnet. To do this, use the command:

curl "https://bigml.io/deepnet?$BIGML_AUTH" \
-X POST \
-H 'content-type: application/json' \
-d '{"dataset": "dataset/59c15634b95b3905a1000032"}' 

being sure to use the dataset ID of your training dataset. This will create a Deepnet by using the default settings.

You can also modify the settings of your Deepnet in various ways. For example, if you want to change the maximum number of gradient steps to ten and name your Deepnet “my deepnet”, you could run:

curl "https://bigml.io/deepnet?$BIGML_AUTH" \
-X POST \
-H 'content-type: application/json' \
-d '{"dataset": "dataset/59c15634b95b3905a1000032",
     "max_iterations": 10,
     "name": "my deepnet"}' 

The full list of Deepnet arguments can be found in our API documentation, which will be fully available on October 5.

4. Evaluate your Deepnet

Once you have created a Deepnet, you may want to evaluate it to see how well it is performing. To do this, create an Evaluation using the resource ID of your Deepnet and the dataset ID of the testing dataset you created earlier.

curl "https://bigml.io/evaluation?$BIGML_AUTH" \
-X POST \
-H 'content-type: application/json' \
-d '{"deepnet": "deepnet/59c157cfb95b390597000085"'
     "dataset": "dataset/59c1568ab95b3905a0000040"}'

Once you have your Evaluation, you may decide that you want to change some of your Deepnet parameters to improve its performance. If so, just repeat step three with different parameters.

5. Make Predictions

When you are satisfied with your Deepnet, you can begin using it to make predictions. For example, suppose you want a prediction for an instance where the value of field "000001" is 3. To do this, use the command:

curl "https://bigml.io/prediction?$BIGML_AUTH" \
-X POST \
-H 'content-type: application/json' \
-d '{"deepnet": "deepnet/59c157cfb95b390597000085", 
     "input_data" : {"000001": 3}}'

Want to know more about Deepnets?

Stay tuned for more blog posts! In the next post, we will explain how to automate Deepnets with WhizzML and the BigML Python Bindings. Additionally, if you have any questions or you'd like to learn more about how Deepnets work, please visit the dedicated release page. It includes a series of six blog posts about Deepnets, the BigML Dashboard and API documentation, the webinar slideshow as well as the full webinar recording.

 

Creating your Deepnets with the BigML Dashboard

The BigML Team has been working hard this summer to bring Deepnets to the platform, which will be available on October 5, 2017. As explained in our introductory post, Deepnets are an optimized implementation of the popular Deep Neural Networks, a supervised learning technique that can be used to solve classification and regression problems. Neural Networks became popular because they are able to tackle problems that are very complex for machines to solve, such as identifying objects in images.

The previous blog post presented a case study that showed how Deepnets can help you solve real-world problems. In this post, we will take you through the five steps needed to train a Deepnet that correctly identifies handwritten digits using the BigML Dashboard. We will use a partial version of the well-known MNIST image recognition dataset (provided by Kaggle), which contains 42,000 handwritten images of the digits 0 to 9.

1. Upload your Data

As usual, you need to start by uploading your data to your BigML account. BigML offers several ways to do it: you can drag and drop a local file, connect BigML to your cloud repository (e.g., S3 buckets), or copy and paste a URL. This will create a source in BigML.
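If you prefer working from the command line, the equivalent API call creates a source from a remote URL. This is only a sketch: the URL below is a placeholder for wherever your copy of the Kaggle MNIST CSV is hosted.

# Create a source from a remote CSV file (placeholder URL)
curl "https://bigml.io/source?$BIGML_AUTH" \
-X POST \
-H 'content-type: application/json' \
-d '{"remote": "https://example.com/mnist_train.csv",
     "name": "MNIST digits"}'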

BigML automatically identifies the field types. In this case, our objective field (“label”) has been identified as numeric because it contains the digit values from 0 to 9. Since this is a classification problem (we want our Deepnet to predict the exact digit label for each image instead of a continuous numeric value), we need to configure the field type and select “Categorical” for our objective field.

source-config.png
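The same field type change can be made programmatically by updating the source. The sketch below assumes the “label” column was assigned the field ID "000000" and that the source ID is the one returned when you uploaded the data; both are placeholders here.

# Change the objective field's type to categorical (placeholder source and field IDs)
curl "https://bigml.io/source/59c160a1b95b3905a3000101?$BIGML_AUTH" \
-X PUT \
-H 'content-type: application/json' \
-d '{"fields": {"000000": {"optype": "categorical"}}}'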

2. Create a Dataset

From your source view, use the 1-click dataset menu option to create a dataset, a structured version of your data ready to be used by a Machine Learning algorithm.

1-click-dataset.png
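For reference, the API equivalent of the 1-click dataset is a single POST that points at your source (the source ID below is again a placeholder):

# Create a dataset from an existing source
curl "https://bigml.io/dataset?$BIGML_AUTH" \
-X POST \
-H 'content-type: application/json' \
-d '{"source": "source/59c160a1b95b3905a3000101"}'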

In the dataset view, you will be able to see a summary of your field values, some basic statistics, and the field histograms to analyze your data distributions. You can see that our dataset has a total of 42,000 instances and approximately 4,500 instances for each digit class in the objective field.

field-distrib.png

The dataset also includes a total of 784 fields containing the pixel information for each image. You can see that many fields are automatically marked as non-preferred in this dataset. This is because they contain the same value for all the images, so they are not good predictors for the model.

non-preferred fields.png

Since Deepnets are a supervised learning method, it is key to train and evaluate your model with different data to ensure it generalizes well against unseen data. You can easily split your dataset using the BigML 1-click action menu, which randomly sets aside 80% of the instances for training and 20% for testing.

split-dataset.png

3. Create a Deepnet

In BigML you can use the 1-click Deepnet menu option, which will create the model using the default parameter values, or you can tune the parameters using the Configure Deepnet option.

create-deepnet.png

BigML provides the following parameters to manually configure your Deepnet:

  • Maximum training time and maximum iterations: you can set an upper bound on the Deepnet runtime by specifying a maximum amount of training time or a maximum number of iterations.
  • Missing numerics and default numeric value: if your dataset contains missing values you can include them as valid values or replace them with the field mean, minimum, maximum or zero.
  • Network architecture: you can define the number of hidden layers in your network, as well as the activation function and the number of nodes for each of them. BigML also provides three other options related to how the layer connections are arranged: residual learning, batch normalization, and tree embedding (a tree-based representation of the data to input along with the raw features). See the configuration sketch right after this list.
  • Algorithm: you can select the gradient descent optimizer, including Adam, Adagrad, Momentum, RMSProp, and FTRL. For each of these algorithms, you can tune the learning rate, the dropout rate, and a set of algorithm-specific parameters whose explanation goes beyond the scope of this post. To learn more about the differences between these algorithms, you can read this article.
  • Weights: if your dataset contains imbalanced classes for the objective field, you can automatically balance them with this option that uses the oversampling strategy for the minority class.
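To make the architecture options above a bit more concrete, here is a rough sketch of what a manually configured Deepnet request could look like through the API. The argument names (hidden_layers, activation_function, number_of_nodes, batch_normalization) are assumptions drawn from the Dashboard labels, and the dataset ID is a placeholder for your own training dataset; verify both against the API documentation before relying on them.

# Manually configured architecture: two hidden layers plus batch normalization (argument names assumed)
curl "https://bigml.io/deepnet?$BIGML_AUTH" \
-X POST \
-H 'content-type: application/json' \
-d '{"dataset": "dataset/59c15634b95b3905a1000032",
     "hidden_layers": [
        {"activation_function": "relu", "number_of_nodes": 128},
        {"activation_function": "relu", "number_of_nodes": 64}],
     "batch_normalization": true,
     "name": "MNIST deepnet (manual architecture)"}'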

Neural Networks are known for being notoriously sensitive to the chosen architecture and the algorithm used to optimize the parameters thereof. Due to this sensitivity and the large number of different parameters, hand-tuning Deepnets can be difficult and time-consuming as the number of choices that lead to poor networks typically vastly outnumber the choices that lead to good results.

To combat this problem, BigML offers first-class support for automatic parameter optimization. In this case, we will use the automatic network search option. This option intelligently explores the space of possible network configurations, training and evaluating candidate networks and returning the best ones found for your problem. The final Deepnet will use the top networks found in this search to make predictions.

config-deepnet.png

When the network search optimization is enabled, the Deepnet may take some time to be created (so be patient!). By default, the maximum training time is set to 30 minutes, but you can configure it.
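If you drive this step from the API instead of the Dashboard, the automatic search can presumably be enabled with a flag on the Deepnet creation call. A minimal sketch, assuming the search argument and max_training_time in seconds (with a placeholder dataset ID), would be:

# Enable automatic network search with a 30-minute training budget (argument names assumed)
curl "https://bigml.io/deepnet?$BIGML_AUTH" \
-X POST \
-H 'content-type: application/json' \
-d '{"dataset": "dataset/59c15634b95b3905a1000032",
     "search": true,
     "max_training_time": 1800,
     "name": "MNIST deepnet (network search)"}'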

When your Deepnet is created, you will be able to visualize the results in the Partial Dependence Plot. This unique view allows you to inspect the impact of the input fields on predictions. You can select two different fields for the axes and set the values for the rest of the input fields to the right. By hovering over the chart area, you can see the predictions for each of the classes in the different colors shown in the legend to the right. Each class color is shaded according to the class probability.

PDP-viz.png

4. Evaluate your Deepnet

The Deepnet looks good, but we can’t know anything about the performance until we evaluate it. From your Deepnet view, click on the evaluate option in the 1-click action menu and BigML will automatically select the remaining 20% of the dataset that you set aside for testing.

evaluate.png

You can see in the image below that this model has an overall accuracy of 96.1%. However, a high accuracy may be hiding a bad performance for some of the classes.

evaluation.png

To look at the correct decisions as well as the mistakes made by the model per class, we need to look at the confusion matrix. However, the objective field has too many categories for BigML to plot all of them in the Dashboard, so we need to download the confusion matrix in Excel format.

confusion-matrix.png

Ok, it seems we get very good results! The diagonal of the table contains the correct decisions made by the model, and almost all categories have a precision and recall over 95%. Despite these great results, could other algorithms do better? To find out, we trained a Random Decision Forest (RDF) of 20 trees and a Logistic Regression to compare their performances with our Deepnet. BigML offers a comparison tool to easily compare the results of different classifiers; we use the well-known ROC curve and its AUC as the comparison measure. After selecting a positive class for the comparison (in this case the digit 9), the ROC curves for each model are plotted in the chart. You can see in the image below how our Deepnet outperforms the other two models, since its ROC AUC is higher.

comparison-tool.png

Although the RDF also provides pretty good results, the ROC AUC for the Deepnet is consistently better across all the digits from 0 to 9, as you can see in the first column of the table below.

table-comparison.png

5. Make Predictions using your Deepnet

Predictions work for Deepnets exactly the same as for any other supervised method in BigML. You can make predictions for a new single instance or multiple instances in batch.

Single predictions

Click on the Predict option from your Deepnet view. A form containing all your input fields will be displayed and you will be able to set the values for a new instance. At the top of the view, you will get all the objective class probabilities for each prediction.

predict.png

Since this dataset contains more than 100 fields, we cannot perform single predictions from the BigML Dashboard, but we can use the BigML API for that.
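A sketch of such an API call follows. The pixel field names mirror the Kaggle CSV headers ("pixel0" through "pixel783"), the Deepnet ID is a placeholder, and a real request would supply values for all 784 pixels rather than the handful shown here; fields you omit should simply be treated as missing values.

# Single prediction through the API (placeholder Deepnet ID and a few illustrative pixel values)
curl "https://bigml.io/prediction?$BIGML_AUTH" \
-X POST \
-H 'content-type: application/json' \
-d '{"deepnet": "deepnet/59c2b0f4b95b390597000200",
     "input_data": {"pixel350": 187, "pixel351": 254, "pixel378": 201}}'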

Batch predictions

If you want to make predictions for multiple instances at the same time, click on the Batch Prediction option and select the dataset containing the instances for which you want to know the objective field value.

batch-pred2.png

You can configure several parameters of your batch prediction like the possibility to include all class probabilities in the output dataset and file. When your batch prediction finishes, you will be able to download the CSV file and see the output dataset.

batch-pred.png
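The API version of a batch prediction is a single POST to the batchprediction resource. The sketch below uses placeholder IDs, and the all_fields flag (which should copy the input fields into the output) is an assumption worth confirming in the API documentation:

# Batch prediction over a whole dataset (placeholder IDs)
curl "https://bigml.io/batchprediction?$BIGML_AUTH" \
-X POST \
-H 'content-type: application/json' \
-d '{"deepnet": "deepnet/59c2b0f4b95b390597000200",
     "dataset": "dataset/59c2b147b95b3905a0000300",
     "all_fields": true}'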

Want to know more about Deepnets?

Stay tuned for the next blog post, where you will learn how to program Deepnets via the BigML API. If you have any questions or you’d like to learn more about how Deepnets work, please visit the dedicated release page. It includes a series of six blog posts about Deepnets, the BigML Dashboard and API documentation, the webinar slideshow as well as the full webinar recording.

Case Study: Finding Higgs Bosons with Deepnets

In our previous blog post, we introduced Deep Neural Networks. In today’s blog post, the second of our series of six, we’ll use BigML’s deep networks, called Deepnets, to walk through a supervised learning problem.

The Data

The dataset we are investigating is the Higgs dataset found at the UCI Machine Learning Repository. The problem, as explained in the original Nature Communications paper, is that particle accelerators create a huge amount of data. The Large Hadron Collider, for example, can produce 100 billion collisions in an hour and only a tiny fraction may produce detectable exotic particles like the Higgs boson. So a well-trained model is immensely useful for finding the needle in the haystack.

By Lucas Taylor / CERN-EX-9710002-1 (http://cdsweb.cern.ch/record/628469)

This dataset contains 28 different numeric fields and 11 million rows generated through simulation, but imitating actual collisions. As explained in the paper, these 28 fields come in two kinds. The first 21 of the fields capture low-level kinematic properties measured directly by the detectors in the accelerator. The last seven fields are combinations of the kinematic fields hand-designed by physicists to help differentiate between collisions that result in a Higgs boson and those that do not. So some of these fields are “real” data as measured, and some are constructed from domain-specific knowledge.

This kind of feature engineering is very common when solving Machine Learning problems, and can greatly increase the accuracy of your predictions. But what if we don’t have a physicist around to assist in this sort of feature engineering? Compared to other Machine Learning techniques (such as Boosted Trees), Deepnets can perform quite well with just the low-level fields by learning their own higher-level combinations (especially if the low-level fields are numeric and continuous).

To try this out, we created a dataset of just the first 1.5 million data points (for speed) and removed the last seven high-level features. We then split the data into an 80% training dataset and a 20% testing dataset. Next up, we create a Deepnet by using BigML’s automatic structure search.

Screen Shot 2017-09-27 at 9.28.36 AM
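For readers who want to reproduce this setup via the API, here is a rough sketch of the dataset preparation step. The range and excluded_fields arguments (used to keep only the first 1.5 million rows and drop the seven hand-crafted fields) are assumptions to verify in the API documentation, and both the origin dataset ID and the excluded field IDs are placeholders for the ones in your own copy of the Higgs dataset.

# Keep only the first 1.5M rows and drop the seven high-level fields (placeholder IDs, argument names assumed)
curl "https://bigml.io/dataset?$BIGML_AUTH" \
-X POST \
-H 'content-type: application/json' \
-d '{"origin_dataset": "dataset/59c9a001b95b390597000aaa",
     "range": [1, 1500000],
     "excluded_fields": ["000016", "000017", "000018", "000019",
                         "00001a", "00001b", "00001c"]}'

From there, the 80/20 split and the Deepnet creation follow the same calls shown in the API walkthrough earlier in this series.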

Deep neural networks can be built in a multitude of ways, but we’ve simplified this by intelligently searching through many possible network structures (number of layers, activation functions, nodes, etc.) and algorithm parameters (the gradient descent optimizer, the learning rate, etc.) and then making an ensemble classifier from the best individual networks. You can manually choose a network structure, tweak the search parameters, or simply leave all those details to BigML (as we are doing here).

The Deepnet

img 1

Once the Deepnet is created, it would be nice to know how well it is performing. To find out, we create an Evaluation using this Deepnet and the 20% testing dataset we split off earlier.

img 2

We get an accuracy of 65.9% and a ROC AUC of 0.7232. Perhaps we could even improve these numbers by lengthening the maximum training time. This is significantly better than our results using a Boosted Tree run with default settings. Here they are compared:

img 3

The Boosted Tree has an accuracy of 59.4% and a ROC AUC of 0.6394. The Boosted Tree is just not able to pull as much information out of these low-level features as the Deepnet. This is a clear example of when you should choose Deepnets over other supervised learning techniques to solve your classification or regression problems.
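If you want to reproduce the Boosted Tree baseline, boosted trees are created in BigML as an ensemble with a boosting argument. Here is a minimal sketch with a placeholder dataset ID; we cap boosting at 10 iterations purely for illustration, and the API documentation lists the other boosting options:

# Boosted trees baseline: an ensemble with the boosting argument (placeholder dataset ID)
curl "https://bigml.io/ensemble?$BIGML_AUTH" \
-X POST \
-H 'content-type: application/json' \
-d '{"dataset": "dataset/59c9a0f2b95b390597000bbb",
     "boosting": {"iterations": 10},
     "name": "Higgs boosted trees"}'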

Want to know more about Deepnets?

Stay tuned for more publications! In the next post, we will explain how to use Deepnets through the BigML Dashboard. Additionally, if you have any questions or you’d like to learn more about how Deepnets work, please visit the dedicated release page. It includes a series of six blog posts about Deepnets, the BigML Dashboard and API documentation, the webinar slideshow as well as the full webinar recording.
