
Come One, Come All! BigML Customers Share Testimonials With Our ML Community

As the adoption of Machine Learning as a Service continues to pick up, we want to take a moment to thank all our customers who have joined our mission to bring Machine Learning to everyone. We are excited to announce a BigML Customers page that shares testimonials from various users who have chosen BigML as their preferred tool for Machine Learning. From CEOs and Chief Scientists to analysts, software developers, and students, BigML enables anyone to become a master of their data, regardless of whether they have prior experience in Machine Learning.

Today, BigML proudly serves over 57,000 customers from 161 countries around the world. Whether you have been with us for years, or you are new to BigML, we are glad to have you as a part of our global community of practitioners. 

BigML Customers

BigML not only helps large companies and organizations of all kinds, but we also actively support the teaching of Machine Learning with our Education Program. Over 600 universities worldwide use BigML and nearly 200 BigML Ambassadors actively promote the BigML platform on their campuses. The program continues to grow and help more students and educators every day. BigML’s Machine Learning Schools are another key component of our effort to bring Machine Learning to everyone. In addition to the events we host, BigML regularly participates in industry events worldwide, such as the upcoming Machine Learning Prague 2018 conference. See our dedicated events page for the full list.

Education Program and Events

With all that said, BigML’s growth wouldn’t be possible without YOU, so thanks for helping us democratize Machine Learning one step at a time! If you are interested in being featured on our Customers page and providing a testimonial, please contact us. If you are starting to learn Machine Learning with BigML or are working on a project, let us know how we can help. We always appreciate your feedback and like to hear how you are using the BigML platform.


Brazilian Entrepreneurs Meet at the BSSML17 and Bet on ML to Increase Business Competitiveness

The BigML Team continues traveling around the world to help democratize Machine Learning across geographies, industries, and organizations of all kinds. BigML, together with Sebrae and Telefónica Open Future_, brought to Curitiba, Paraná, the fifth edition of our series of international Machine Learning Summer Schools, the second in Brazil, to prepare companies specialized in Information Technology to see business opportunities.

The one-day crash course took place this week, on November 29, and gave 45 entrepreneurs, programmers and students, from Paraná and other states in Brazil, a quick and practical introduction to Machine Learning. Subjects included supervised and unsupervised learning techniques, data transformations and feature engineering, as well as more advanced topics to learn how to automate Machine Learning workflows. The workshop provided practical knowledge of fast and useful techniques for using Machine Learning to improve performance and increase the competitiveness of companies. You can check the packed agenda on BigML’s SlideShare account and visit the BSSML17 photo albums on Facebook and Google+ to see more pictures from the event.

The Chief Infrastructure Officer of BigML, Poul Petersen, conducted all sessions and said that despite the technical theory, which involves mathematical calculations and statistical knowledge, the course balanced the content with educational and practical aspects that the tool entails. “We wanted to give a sense of how things work, in a simple way. Another relevant aspect is that the focus is not on the immediate result, but the long term … to prepare companies and professionals to succeed in the next five to 10 years,” he said.

A common and relevant question among participants and organizers was, “Are Machine Learning concepts applicable to small businesses?” Petersen’s response was emphatic: “Yes, they certainly are. Many companies that participated in the course were startups that often compete in the market with much larger and well-structured companies. And there they were gaining the knowledge to innovate and increase their competitiveness in the disputed Brazilian market. These companies, if they continue investing in knowledge, will certainly bother, in a good sense, the big players in the market,” he added.

“The concepts of Machine Learning are not restricted to technology companies. They will be incorporated into products and services in all branches of business. They make data correlations to project, with greater precision, what will happen in the future. That is why they are strategic and very important tools to generate competitiveness, boost growth and improve results,” summarized Julio Agostini, Sebrae/PR Operations Director. Regarding the Machine Learning Summer School in Curitiba, the BSSML17, Julio Agostini commented that “the difference of this course is that the attendees participate in a workshop, where they had practical experiences of how to apply the concepts to be able to evaluate the use in their businesses and to see opportunities for innovation, modernization of processes and development of new products and services to expand their activities.”

Pedro Riviere, head of strategic partnerships of Wayra at Telefónica Open Future_, said he was happy to contribute to the viability of the event. “Thanks to this global Telefónica project, Open Future_, which circulates in many countries and has already invested in projects of more than 700 startups in the world, including BigML, we managed to include Brazil in the script. It was a great learning opportunity for the Paraná entrepreneurs to have access to the content of BigML, the company that pioneered the creation of Machine Learning as a Service (MLAAS).”

Among the attendees were relevant companies from Brazil such as Cinq, DP6, and Escotta Consulting, that wanted to delve deeper into learning leading Machine Learning tools like BigML to really get the insights of their data with ease in order to keep their businesses competitive in the marketplace. This great spirit only served to further validate BigML’s enthusiasm to continue organizing more Machine Learning Schools, so a big thank you to everyone for coming and following our educational activities! For more information on future Machine Learning Summer Schools please visit the dedicated page and stay tuned for future announcements!

BigML for Education: Getting to Know the BigML Ambassadors

BigML is actively being used in many educational institutions across the globe thanks to our Education Program. We would like to present several personal stories of our ambassadors on how they inspire their students or classmates who are looking to become more data-driven with a solid understanding of Machine Learning with BigML. Today we start with Iván Robles, a telecommunications engineer with a Bachelor’s degree in Mathematics who teaches Machine Learning at the EAE Business School and at ICEMD Business School in Madrid, Spain, while working for the telecom company Orange. Let’s get to know Iván a bit more!

BigML: How did you get into working with data and Machine Learning in particular?

Iván Robles: I love Mathematics and just after I finished my degree, back in 2006, a friend of mine told me about a company that used math to solve real-world problems. The idea of using math and statistics in my daily life was pretty exciting to me, so I applied for an open position and started working with them. Before 2006 I did not know what Machine Learning was, but since that moment I haven’t stopped working and learning in this field. Machine Learning has the perfect mix between math and programming, which really got my attention, especially when I could see that I was helping solve real-world problems in several areas such as marketing, networks, finance, among others. Naturally, this is what I currently teach to my students.

BigML: How do you see that Machine Learning is transforming the world?

Iván Robles: My students come from different companies, some of them are working for big corporations and others are entrepreneurs that are building their own company, and in both scenarios, they do invest resources in applying Machine Learning techniques to learn how to make decisions based on data instead of human intuitions alone. This, as well as all the news related to governments from many countries investing in Machine Learning, tells me we are on the right path to transforming all types of organizations into data-driven companies. This was non-existent a few years ago, which tells us that Machine Learning is already transforming many industries. Some good examples are self-driving cars, chatbots that answer questions automatically, and human-robot interactions, but in my opinion, this is just the beginning.

BigML: How do you currently apply Machine Learning?

Iván Robles: I teach Machine Learning in two business schools located in Madrid, the EAE Business School, and ICEMD. My students have different profiles, so my goal is to showcase different domains where they can apply Machine Learning. For instance, in my classes I use BigML’s time series models to find out the number of calls that a given call center will receive per day, to prepare budgets, and to forecast sales of a given product, among other examples. With classification models like BigML’s decision trees and ensembles, we analyze and predict churn, as well as what clients will buy a certain product based on their characteristics, and other similar use cases.

BigML: What’s the biggest advantage of applying Machine Learning in your field?

Iván Robles: As a professor who teaches Machine Learning, the main advantage that I see is that many non-experts can actually enjoy the benefits of Machine Learning without having to figure out exactly how the math behind the algorithms works. For example, Machine Learning allows you to analyze in minutes a multitude of data and relationships among them that without it would take years. This inspires my students and it certainly has a very positive impact on their business life.

BigML: What is your goal using BigML?

Iván Robles: I use BigML in my classes because unlike other Machine Learning platforms, BigML is very intuitive and accessible. It is built to not only make data scientists more productive, but to enable anyone to harness the potential of Machine Learning. I always like to showcase real use cases to my students that can be solved by applying Machine Learning techniques, therefore, at EAE Business School and ICEMD we take advantage of the BigML Education Program to accomplish that goal.

BigML: How do you find BigML different from other Machine Learning platforms?

Iván Robles: BigML is very easy to use, understand, and interpret, thanks to your powerful visualizations. This is obviously very much linked to your mission of democratizing Machine Learning. I can tell this clearly works in my classes because some of my students don’t have any technical background, yet they do understand BigML and they see how they can use the results obtained with the Machine Learning models they create. They really like the fact that they don’t need to compute or calculate anything, as BigML does it for them. They only need to drag and drop their data and in as little as a few clicks get the results they are looking for to continue building their projects, which is pretty awesome.

For that matter, I do see the tremendous added value of the BigML platform and totally identify it with your vision of democratizing Machine Learning. The ease of use allows everyone to be able to use Machine Learning in their projects. And the powerful visualizations generate a “wow” effect. They are very interactive, and really help us see how the problems are solved by the rules suggested by the Machine Learning algorithms. Indeed, it’s very impressive!

BigML: What is the reaction of your students when they use BigML?

Iván Robles: My classmates love BigML! They also have other subjects like programming languages for instance, but especially the business profiles find these subjects more difficult. However, with BigML it is different because the non-technical students often mention that BigML is their saviour, as they get the expected results very easily without being an expert in Machine Learning programming. On the other hand, the programmers also enjoy using BigML because they find that with your platform they don’t need to invest as much time and effort as they do with other tools. In fact, for both types of students, when they need to work on their final projects, they prefer easier and capable tools like BigML.

BigML: Any advice you would give to our readers to get started with Machine Learning?

Iván Robles: Start with BigML, and maybe you don’t need anything else! I mean it, BigML covers a large variety of use cases that can easily solve real-world problems with very good results applicable to plenty of organizations.

We hope Iván’s story inspired you to become a BigML Ambassador. Stay tuned for future blog posts to get to know more BigML community members and their personal stories. If you are not part of the BigML community yet you can change that right now, simply register here for free! Also, for those working in academia or still studying, feel free to join our Education Program and apply here to become a BigML Ambassador which also empowers you to promote our platform on your campus. Thanks for helping us in delivering #MachineLearning made beautifully simple for everyone!

BSSML17: Machine Learning Summer School in Curitiba

If you follow the BigML blog, you may be familiar with our popular Machine Learning Schools held globally throughout the year. These crash courses are key to contributing to the democratization of Machine Learning and to producing a much larger group of ML-literate professionals such as developers, analysts, managers, and subject-matter experts, to help them remain competitive in the marketplace.

We are proud to announce the next course, the second edition of our Machine Learning Summer School in Brazil that will take place on November 29, 2017, in Curitiba, Paraná. The BigML Chief Infrastructure Officer, Poul Petersen, will be giving a one-day course ideal for industry practitioners, advanced undergraduates, as well as graduate students seeking a quick, practical, and hands-on introduction to Machine Learning. The Summer School, co-organized by Sebrae, BigML, and Telefónica Open Future_, will serve as a good introduction to the kind of work that students can expect if they enroll in a Machine Learning master’s program.


Sebrae Building: Caeté Street, 150 – Prado Velho, Curitiba, Paraná, Brazil.


November 29, 2017, from 8:30 AM to 6:30 PM BRST.

Apply now!

Find more details about the program and register here to attend the event before it’s fully booked, as space is limited.

BigML’s ML Schools Evolution

Since we ran the first edition of our Machine Learning Schools in Valencia, Spain, the interest of many ML practitioners has increased over the years. The best proof of that is in the applicant and attendee stats since the first edition.

In September 2015, at the very first Valencian Summer School in Machine Learning, we welcomed 95 attendees coming from 7 different countries, and out of those, 92 were from Europe. 42 of them came from academia, representing 13 universities, and the remaining 53 attendees came from 40 organizations.

One year later, in September 2016, we celebrated the second edition and went from 95 attendees we had in the first edition to 142 attendees from 19 different countries. We realized that the world was ready to learn and apply Machine Learning techniques more than ever. Out of the 142 attendees, 125 of them were from Europe, and out of those, 109 from Spain. We also had more business profiles join the event compared to the first edition. 39 attendees represented 21 universities versus the 82 remaining representing 53 organizations.

The positive response from the audience encouraged us to do more; that’s when we decided to organize the first Brazilian Summer School in Machine Learning. That event took place in São Paulo, in December 2016. We were surrounded by 202 Brazilian attendees coming from 6 different states such as Minas Gerais, São Paulo, Rio de Janeiro, Paraná, Santa Catarina, and Rio Grande do Sul. And again, many more attendees, 186, coming from private companies versus the 16 attendees from academic institutions.

This year, in September 2017, the BigML Team ran the third edition of our Valencian Summer School in Machine Learning, that brought together 204 attendees from 14 countries and 183 of them from Europe, mostly from Spain. Among the crowd, there were 45 attendees from 28 universities, and 159 from 92 organizations.

Now, with your help, it’s time to beat the records with the second edition of our Machine Learning School in Brazil, this year in Curitiba!

The Impact of Protecting Inventions on Time for Tech Companies

Protecting Intellectual Property is key for any business, and especially for innovative tech companies, to becoming more competitive in the marketplace as well as more attractive to investors. This was the main message shared by all speakers at the three special events held in Spain on October 25, 26 and 27 in Madrid, Barcelona, and Valencia, respectively.

BigML, together with Telefónica and the two law firms FisherBroyles and Schwabe, Williamson & Wyatt, co-organized this event to discuss the importance that patents have in any tech business, explained from three different perspectives: startup, big corporation, and the legal view. The talks analyzed the main differences between the patent protection process for US and European companies, noting that regulations in the US make the process easier than in European countries.

Starting with the startup perspective, Dr. Francisco J. Martín, the Co-Founder and CEO of BigML, provided a talk based on his personal and professional experience on how patents have helped the companies he created. “The companies that file patents receive more funding and support from investors”, highlighted Dr. Martín, based on the study he presented made with PreSeries, which analyzes more than 300,000 tech companies and shows the correlation between companies that file patents during their first years and the amount of funding they receive throughout their lives.

Following up with the perspective of big corporations, Dr. Luis Ignacio Vicente del Olmo, the Return on Innovation Area of Telefónica and Head of Telefónica Patent Office, explained how to manage Intellectual Property within the innovation process, as well as how to protect software-implemented inventions, since by 2020, 50% of the patent applications will be software-implemented. Del Olmo presented both sessions in Madrid and Valencia, whereas the speaker in Barcelona was Antonio López-Carrascoso, Senior Technology Expert at Telefónica Patent Office.

Finally, concluding with the legal viewpoint, Ms. Graciela Gómez Cowger, IP Strategist and CEO at Schwabe, Williamson & Wyatt, and Mr. Micah Stolowitz, IP Strategist at FisherBroyles, addressed why all tech startups should make Intellectual Property an integral part of their market strategy, especially because “VC’s invest in IP, if it’s missing, they simply don’t invest”, pointed out Ms. Gómez. Both IP Strategists emphasized the importance of these intangible assets, an idea that was present in all three talks and that the audience returned to with related questions during the round table.

If you are hungry for more, please visit our Facebook and Google+ pages to check all the pictures taken at the events in Madrid, Barcelona and Valencia. The BigML Team looks forward to seeing you all in further editions!

Predicting TED Talks Popularity

Everyone knows TED talks. TED started in 1984 as a conference series on technology, education, and design. In essence, TED talks aim to democratize knowledge. Nowadays it produces more than 200 talks per year addressing dozens of different topics. Despite the critics who claim that TED talks reduce complex ideas to 20-minute autobiographical stories of inspiration, the great influence they have on the knowledge diffusion in our society is undeniable.

Predicting Ted Talks Views with Machine Learning

When I came across the TED dataset in Kaggle, the first thing that caught my attention was the great dispersion in the number of views: from 50K to over 47M (with a median of ~1M views). One can’t help but wonder what makes some talks 20 times more popular than others. Can the TED organizers and speakers do something to maximize the views in advance? In this blog post, we’ll try to predict the popularity of TED talks and analyze the most influential factors.

The Data

The original file provides information for 2,550 talks over one decade: from 2006 (only 50 talks were published) until 2017 (over 200 talks were published). When you create a dataset in BigML, you can see all the features with their types and their distributions. You can see the distribution of talks over the years in the last field in the dataset seen below.


For text fields, we can inspect the word frequency in a tag cloud. For example, the most used words in the titles are “world”, “life” and “future”.


When we take a look at the features, we see two main sets: one that informs us about the talk impact (comments, languages, and views -our objective field-), and another that describes the talk characteristics (the title, description, transcript, speakers, duration, etc). Apart from the original features, we came up with two additional ones: the days between video creation and publishing and the days between publishing and dataset collection on Sept 21, 2017.
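To make the two derived fields concrete, here is a minimal Python sketch of how they could be computed outside BigML. It assumes that the film and publish dates are stored as Unix timestamps in seconds under the names film_date and published_date, and the name days_film_to_publish is made up for illustration; only days_since_published matches a field name used later in this post.

```python
from datetime import datetime, timezone

# Assumption: dates are Unix timestamps in seconds (UTC).
# The dataset was collected on Sept 21, 2017.
COLLECTION_DATE = datetime(2017, 9, 21, tzinfo=timezone.utc)


def days_between(start_ts, end_ts):
    """Whole days elapsed between two Unix timestamps."""
    start = datetime.fromtimestamp(start_ts, tz=timezone.utc)
    end = datetime.fromtimestamp(end_ts, tz=timezone.utc)
    return (end - start).days


def add_date_features(talk):
    """Return a copy of the talk with the two derived fields added."""
    enriched = dict(talk)
    # Hypothetical field name for the creation-to-publishing gap.
    enriched["days_film_to_publish"] = days_between(
        talk["film_date"], talk["published_date"]
    )
    enriched["days_since_published"] = days_between(
        talk["published_date"], COLLECTION_DATE.timestamp()
    )
    return enriched


# Hypothetical talk: filmed 2017-01-01, published 2017-01-31.
example = add_date_features({
    "film_date": datetime(2017, 1, 1, tzinfo=timezone.utc).timestamp(),
    "published_date": datetime(2017, 1, 31, tzinfo=timezone.utc).timestamp(),
})
print(example["days_film_to_publish"], example["days_since_published"])
```

Working in UTC throughout avoids off-by-one-day surprises when the script runs in a different time zone.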

The fields related to talk impact may be a future consequence of our objective field: the views. A talk with more views is more likely to have a higher number of comments and to be translated into more languages. Therefore, it’s best to exclude those from our analysis; otherwise, we would create “leakage”, i.e., leaking information from the future into the past, and obtain unrealistically good predictions.
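If you prepare the data in code rather than in the BigML Dashboard, the same precaution amounts to dropping the impact fields before training. The sketch below is only an illustration: the row is made up, and the field names are assumed to match the ones mentioned above.

```python
# Fields that are downstream consequences of the objective field ("views").
LEAKY_FIELDS = {"comments", "languages"}


def drop_leaky_fields(talk):
    """Keep the objective field and everything known before views accumulate."""
    return {name: value for name, value in talk.items()
            if name not in LEAKY_FIELDS}


# Hypothetical row, for illustration only.
row = {"title": "x", "duration": 900, "comments": 120,
       "languages": 30, "views": 1_200_000}
clean = drop_leaky_fields(row)
print(sorted(clean))
```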

On the other hand, all the fields related to the talk characteristics can be used as predictors. Most of them are text fields such as the title, the description, or the transcript. BigML supports text fields for all supervised models, so we can just feed the algorithms with them. However, we suspect that not all the thematically related talks necessarily use the same words. Thus, using the raw text fields may not be the best way to find patterns in the training data that can be generalized to other TED talks. What if instead of using the exact raw words in the talks, we could extract their main topics and use them as predictors? That is exactly what BigML topic models allow us to do!

Extracting the Topics of TED talks

We want to know the main topics of our TED talks and use them as predictors. To do this, we build a topic model in BigML using the title, the description, the transcript, and the tags as input fields.

The coolest thing about BigML topic models is that you don’t need to worry about text pre-processing. It’s very handy that BigML automatically cleans the punctuation marks, homogenizes the cases, excludes stopwords, and applies stemming during the topic model creation. You can also fine-tune those settings and include bigrams as you wish by configuring your topic model in advance.
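To get a feel for what those pre-processing steps do to a piece of text, here is a toy Python sketch. It is not BigML’s implementation: the stopword list is deliberately tiny and the suffix stripper is a crude stand-in for a real stemmer, both assumptions made just for this example.

```python
import re

# Tiny illustrative stopword list; real lists are much longer.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "that"}


def naive_stem(token):
    """Very crude suffix stripping, standing in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Lowercase, drop punctuation, remove stopwords, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(token) for token in tokens if token not in STOPWORDS]


tokens = preprocess("The Future of Learning: changing schools!")
print(tokens)
```

After this kind of normalization, “Learning” and “learn” count as the same term, which is what lets a topic model group talks that use different surface forms of the same word.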


When our topic model is created, we can see that BigML has found 40 different topics in our TED talks including technology, education, business, religion, politics, and family, among others.  You can see all of them in a circle map visualization in which each circle represents a topic. The size of the circle represents the importance of that topic in the dataset and related topics are located closer in the graph. Each topic is a distribution over term probabilities. If you mouse over each topic you can see the top 20 terms and their probabilities within that topic. BigML also provides another visualization in which you can see all the top terms per topic displayed in horizontal bars. You can observe both views below or better yet inspect the model by yourself here!

BigML topic models are an optimized implementation of Latent Dirichlet Allocation (LDA), one of the most popular probabilistic methods for topic modeling. If you want to learn more about topic models, please read the documentation.

All the topics found seem to be coherent with main TED talks themes. Now we can use this model to calculate the topic probabilities for any new text. Just click on the option Topic Distribution in the 1-click action menu, and you will be redirected to the prediction view where you can set new values to your input fields. See below how the distribution over topics changes when you change the text in the input fields.

Now we want to do the same for our TED talks. To calculate the topic probabilities for each TED talk we use the option Batch Topic Distribution in the 1-click action menu. Then we select our TED talks dataset. We also need to make sure that the option to create a new dataset out of the topic distribution is enabled!


When the batch topic distribution is created, we can find the new dataset with additional numeric fields, containing the probabilities of each topic per TED talk. These are the fields that we will use as inputs to predict the views replacing the transcript, title, description, and tags.


Predicting the TED Talks Views

Now we are ready to predict the talk views. As we mentioned at the beginning of this post, the number of views is a widely dispersed and highly skewed field, so predicting the exact number of views can be difficult. In order to find more general patterns in how topics influence a talk’s popularity, we are going to discretize the views and make the field categorical. This is very easy in BigML: you just need to select the option to Add fields to the dataset in the configure option menu. We are going to discretize the field by percentiles, using the median.


Then we click the button to create a dataset. The dataset contains a new field with two classes, the first class containing the talks below the median number of views (less than 1M views), and a second class containing the talks over the median number of views, (more than 1M views).
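For readers who want to see the idea behind the median discretization in code, here is a minimal Python sketch. The view counts and the class labels are made up for illustration, and sending ties at the median to the lower class is our own assumption rather than a statement about how BigML handles them.

```python
from statistics import median


def discretize_by_median(values):
    """Label each value as below or above the median of the whole list."""
    cutoff = median(values)
    # Assumption: values exactly equal to the median go to the lower class.
    labels = ["above_median" if value > cutoff else "below_median"
              for value in values]
    return cutoff, labels


# Hypothetical view counts, for illustration only.
views = [50_000, 400_000, 900_000, 1_100_000, 3_000_000, 47_000_000]
cutoff, labels = discretize_by_median(views)
print(cutoff, labels)
```

Splitting at the median gives two classes of roughly equal size, so a classifier cannot look good simply by always predicting the majority class.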


Before creating our classification model, we need to split our dataset into two subsets: the 80% for training and the remaining 20% for testing to ensure that our model generalizes well against data that the model has not seen before. We can easily do this in BigML by using the corresponding option in the 1-click action menu, as shown below.
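The same 80/20 split can be sketched in a few lines of plain Python. This is only an illustration of the idea, not what the 1-click option does internally; the fixed seed is an assumption we add so that the split is reproducible.

```python
import random


def train_test_split(rows, train_fraction=0.8, seed=42):
    """Shuffle the rows with a fixed seed and cut them into two subsets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Row indices standing in for the 2,550 talks in the dataset.
train, test = train_test_split(range(2550))
print(len(train), len(test))
```

Shuffling before cutting matters here because the talks are ordered in time, and we do not want the test set to contain only the most recent ones.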


We proceed with the 80% of our dataset to create our predictive model. To compare how different algorithms perform, we create a single tree, an ensemble (Random Decision Forest), a logistic regression and the latest addition to BigML, deepnets (an optimized implementation of the popular deep neural networks). You can easily create those models from the dataset menus. BigML automatically selects the last field in the dataset as the objective field “views (discretized)” so we don’t need to configure it differently. Then we use the 1-click action menu to easily create our models.


Apart from the 1-click deepnet, which uses an automatic parameter optimization option called Structure Suggestion, we also create another deepnet by configuring an alternative automatic option called Network Search. BigML offers this unique capability for automatic parameter optimization to eliminate the difficult and time-consuming work of hand-tuning your deepnets.


After some iterations, we realize that the features related to the speaker have no effect on the number of views, therefore we eliminate those along with the field “event” that seems to be causing overfitting. At the end, we use as input fields all the topics, the year of the published date, the duration of the talk, plus our calculated field that measures the number of days since the published date (until the 21st of Sept 2017).

After creating all the models with these selected features, we need to evaluate them by using the remaining 20% of the dataset that we set aside earlier. We can easily compare the performance of our models with the BigML evaluation comparison tool, where the ROC curves can be analyzed together. As seen below, the winner with the highest AUC (0.776) is a deepnet that uses the automatic parametrization option “Network Search”. The second best performing model is again a deepnet, but the one using the automatic option “Structure Suggestion”. This one has an AUC value of 0.7557. In third place, we see the ensemble (AUC of 0.7469), followed by logistic regression (AUC of 0.7097) and finally the single tree (AUC of 0.6781).
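As a reminder of what the AUC measures, it can be read as the probability that a randomly chosen talk from the positive class receives a higher score than a randomly chosen talk from the negative class. The following Python sketch computes it that way on made-up scores; it is a plain pairwise count for illustration, not the routine BigML uses.

```python
def roc_auc(labels, scores):
    """AUC as the share of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = above the median) and predicted probabilities.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 4))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect one, which puts the values between 0.68 and 0.78 reported above in context.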


When we take a look at the confusion matrix of our top performing deepnet, we can see that we are achieving over 70% recall with a 70% precision for both classes of the objective field.
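Precision and recall follow directly from the four cells of the confusion matrix. The sketch below uses made-up counts for a 510-talk test set (they are not the numbers from our evaluation) to show how both measures are derived for each class.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall for one class from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall


# Hypothetical counts, taking "above the median" as the positive class.
tp, fp, fn, tn = 185, 75, 70, 180

above = precision_recall(tp, fp, fn)
# For the other class the roles swap: its true positives are the true
# negatives above, and false positives and false negatives trade places.
below = precision_recall(tn, fn, fp)
print(above, below)
```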


Inspecting the Deepnet

Usually, deep neural network predictions are hard to analyze. That’s why BigML provides ways to make it easy for you to understand why your model is making particular decisions.

First, BigML can tell us which fields are most important in predicting the number of views. By clicking on the “Summary Report” option, we get a histogram with a percentage per field that displays field importances. We can see that (not surprisingly) the “days_since_published” is the most important field (19.33%), followed by the topic “Entertainment” (15.90%), and the “published_date.year” (13.62%). Amongst the top 10 most important fields we can also find the topics “Relationships”, “Global issues”, “Research”, “Oceans”, “Data”, “Psychology” and “Science”.


Great! We can state that our deepnet found the topics to be relevant predictors of the number of views. But how exactly do these topics impact predictions? Will talks about psychology have more or fewer views than talks about science? To answer this question, BigML provides a Partial Dependence Plot view, where we can analyze the marginal impact of the input fields on the objective field. Let’s see some examples (or play with the visualization here at your leisure).

For example, see in the image below how the combination of the topics “Entertainment” and “Psychology” has a positive impact on the number of views. Higher probabilities of those topics result in the prediction of our 2nd class (the blue one), which is the class with over 1M views.


On the contrary, if we select “Health” instead, we can see how higher probabilities of this topic result in a higher probability of predicting the 1st class (the class below 1M number of views).


We can also see a change over time in the interest in some topics. As you can see below, the probability that a talk on the psychology topic has more than 1M views increased over the period from 2012 to 2017.



In summary, we have seen that topics do have a significant influence on the number of views. After analyzing the impact of each topic in predictions, we observe that “positive” topics such as entertainment or motivation are more likely to have a greater number of views while talks with higher percentages of the “negative” topics like diseases, global issues or war are more likely to have fewer views. In addition, it seems that the interest in individual human-centered topics such as psychology or relationships has increased over the years to the detriment of broader social issues like health or development.

We hope you enjoyed this post! If you have any questions, don’t hesitate to contact us.

How to Create a WhizzML Script – Part 4

As the final installment of our WhizzML blog series (Part 1, Part 2, Part 3), it’s time to learn the BigML Python bindings way to create WhizzML scripts. With this last tool at your disposal, you’ll officially be a WhizzML-script-creator genius!

About BigML Python bindings

BigML Python bindings allow you to interact with the API for BigML. You can use them to easily create, retrieve, list, update, and delete BigML resources (e.g., sources, datasets, models and predictions). This tool becomes even more powerful when it is used with WhizzML. Your WhizzML code can be stored and executed in BigML by using three kinds of resources: Scripts, Libraries, and Executions (see the first post: How to create WhizzML Script. Part 1).

WhizzML Scripts can be executed in BigML’s servers in a controlled, fully-scalable environment that automatically takes care of their parallelization and fail-safe operation. Each execution uses an Execution resource to store the arguments and the results of the process. WhizzML Libraries store generic code to be shared or reused in other WhizzML Scripts. But this is not really new, so let’s get into some new tricks instead.


You can create and execute your WhizzML script via the bindings to create powerful workflows. There are multiple languages supported by BigML bindings e.g., Python, C#, Java or PHP. In this post we’ll use the Python bindings but you can find the others on GitHub. If you need a BigML Python bindings refresher, please refer to the dedicated Documentation.


In BigML a script resource stores WhizzML source code, and the results of its compilation. Once a WhizzML script is created, it’s automatically compiled. If compilation is successful, the script can be run, that is, used as the input for a WhizzML execution resource. Suppose we want a simple script that creates a model from a dataset ID. The code in WhizzML to do that is:

# define the code that will be compiled in the script
script_code = "(create-model {\"dataset\" dataset-id})"

However, you need to keep in mind that every script can have inputs and outputs. In this case, we’ll want the dataset-id variable to contain our input, the dataset ID. In the code below, we see how to describe this input, its type and even associate a specific ID that will be used by default when no input is provided.

from bigml.api import BigML
api = BigML()

# define the code included in the script
script_code = "(create-model {\"dataset\" dataset-id})"

# define parameters
args = {"inputs": [{
    "name": "dataset-id",
    "type": "dataset-id",
    "default": "dataset/598ceafe4006830450000046",
    "description": "Dataset to be modeled"
}]}

# create the script
script = api.create_script(script_code, args)

# wait for the script compilation to finish
api.ok(script)


To execute a compiled WhizzML script in BigML, you need to create an execution resource. Each execution is run under its associated user credentials and its particular environment constraints. Furthermore, a script can be shared, and you can execute the same script several times under different usernames by creating a different execution. For this, you need to first identify the script you want to execute and set your inputs as an array of arrays (one per variable-value pair). In this execution, we are setting the value of variable x in the script to 2. The parameters for the execution are the script inputs. It’s a piece of cake!

from bigml.api import BigML
api = BigML() 

# choose workflow
script = 'script/591ee3d00144043f5c003061' 

# define parameters
args = {"inputs": [['x', 2]]}

# execute and wait for the execution to finish
execution = api.create_execution(script, args)
api.ok(execution)
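
The "array of arrays" layout is easy to get wrong when the values start life in a dictionary. Here is a minimal sketch of a conversion helper; the function name to_execution_inputs is our own invention for illustration and is not part of the bindings.

```python
def to_execution_inputs(values):
    """Turn {"x": 2} into [["x", 2]], one [name, value] pair per variable."""
    return [[name, value] for name, value in values.items()]


# "x" mirrors the variable used in the execution example above.
args = {"inputs": to_execution_inputs({"x": 2})}
print(args)
```

The resulting args dictionary has the same shape as the one passed to the execution call in the snippet above.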


We haven’t created libraries in the previous posts, but there’s always a first time. A library is a shared collection of WhizzML functions and definitions usable by any script that imports them. It’s a special kind of compiled WhizzML source code that only defines functions and constants. Libraries can be imported in scripts. The imports attribute of a script can contain a list of library IDs whose defined functions and constants will be usable in the script. It’s very easy to create WhizzML libraries by using BigML Python bindings.

Let’s go through a simple example. We’ll define one function, get-max, which returns the maximum value in a list. In WhizzML this can be expressed as below.

from bigml.api import BigML
api = BigML()

# define the library code
lib_code = \
    "(define (get-max xs)" \
    "  (reduce (lambda (x y) (if (> x y) x y))" \
    "    (head xs) xs))"

# create a library
library = api.create_library(lib_code)

# wait for the library compilation to finish
api.ok(library)

For more examples, don’t hesitate to take a look at the WhizzML resources section in the BigML Python Bindings documentation.

So that’s it! You now know all the techniques needed to create and execute WhizzML scripts. We’ve covered the basic concepts of WhizzML, how to clone existing scripts or import them from GitHub (Part 1), how to create new scripts by using Scriptify and the editor (Part 2), how to use the BigMLer command line (Part 3) and finally the BigML Python bindings in this last post.


To go deeper into the world of Machine Learning automation via WhizzML, please check out our WhizzML tutorials such as the Automated Dataset Transformation tutorial. If you need a WhizzML refresher, you can always visit the WhizzML documentation. Start with the “Hello World” section, in the Primer Guide, for a gentle introduction. And yes, please be on the lookout for more WhizzML related posts on our blog.

BigML Summer 2017 Release Webinar Video is Here: Deepnets!

We are happy to share that Deepnets are fully implemented on our platform and available from the BigML Dashboard, API, as well as from WhizzML for its automation.

BigML Deepnets are the world’s first deep learning algorithm capable of Automatic Network Detection and Structure Search, which automatically optimize model performance and minimize the need for expensive specialists to drive business value. Following our mission to make Machine Learning beautifully simple for everyone, BigML now offers the very first service that enables non-experts to use deep learning with results matching that of top-level data scientists. BigML’s extensive benchmark conducted on 50+ datasets has shown Deepnets, an optimized version of Deep Neural Networks brought to the BigML platform, to outperform other algorithms offered by popular Machine Learning libraries. With nothing to install, nothing to configure, and no need to specify a neural network structure, anyone can use BigML’s Deepnets to transform raw data into valuable business insights.

Special thanks to all webinar attendees who joined the BigML Team yesterday during the official launch. For those who missed the live webinar, you can watch the full recording on the BigML YouTube channel.

As explained in the video, one of the main complexities of deep learning is that a Machine Learning expert is required to find the best network structure for each problem. This can often be a tedious trial-and-error process that can take from days to weeks. To combat these challenges and make deep learning accessible for everyone, BigML now enables practitioners to find the best network for their data without having to write custom code or hand-tune parameters. We make this possible with two unique parameter optimization options: Automatic Network Search and Structure Suggestion.

BigML’s Automatic Network Search conducts an intelligent guided search over the space of possible networks to find suitable configurations for your dataset. The final Deepnet will use the top networks found in this search to make predictions. This capability yields a better model, however, it takes longer since the algorithm conducts an extensive search for the best solution. It’s ideal for use cases that justify the incremental wait for optimal Deepnet performance. On the other hand, BigML’s Structure Suggestion only takes nominally longer than training a single network. This option is capable of swiftly recommending a neural network structure that is optimized to work well with your particular dataset.

For further learning on Deepnets, please visit our dedicated summer 2017 release page, where you will find:

  • The slides used during the webinar.
  • The detailed documentation to learn how to create and evaluate your Deepnets, and interpret the results before making predictions from the BigML Dashboard and the BigML API.
  • The series of six blog posts that gradually explain Deepnets.

Thanks for your positive comments after the webinar. And remember that you can always reach out to us at for any suggestions or questions.

Deepnets: Behind The Scenes

Over our last few blog posts, we’ve gone through the various ways you can use BigML’s new Deepnet resource, via the Dashboard, programmatically, and via download on your local machine. But what’s going on behind the curtain? Is there a little wizard pulling an elaborate console with cartoonish-looking levers and dials?

Well, as we’ll see, Deepnets certainly do have a lot of levers and dials.  So many, in fact, that using them can be pretty intimidating. Thankfully, BigML is here to be your wizard* so you aren’t the one looking shamefacedly at Dorothy when she realizes you’re not as all-powerful as you might have thought.

BigML Deep Learning

Deepnets:  Why Now?

First, let’s address an important question: why are deep neural networks suddenly all the rage?  After all, the Machine Learning techniques underpinning deep learning have been around for quite some time. The reason boils down to a combination of innovations in the technology supporting the learning algorithms more than advances in learning algorithms themselves. It’s worth quoting Stuart Russell at length here:

. . .deep learning has existed in the neural network community for over 20 years. Recent advances are driven by some relatively minor improvements in algorithms and models and by the availability of large data sets and much more powerful collections of computers.

He gets at most of the reasons in this short paragraph.  Certainly, the field has been helped along by the availability of huge datasets like the ones generated by Google and Facebook, as well as some academic advances in algorithms and models.

But the things I will focus on in this post are in the family of “much more powerful collections of computers”.  In the context of Machine Learning, I think this means two things:

  • Highly parallel, memory-rich computers provisionable on-demand.  Few people can justify building a massive GPU-based server to train a deep neural network on huge data if they’re only going to use it every now and then. But most people can afford the same for a few days at a time. Making such power available in this way makes deep learning cost-effective for far more people than it used to be.
  • Software frameworks that have automatic differentiation as first-class functionality. Modern computational frameworks (like TensorFlow, for example) allow programmers to instantiate a network structure programmatically and then just say “now do gradient descent!” without ever having to do any calculus or worry about third-party optimization libraries.  Because differentiation of the training loss with respect to the network’s parameters is done automatically, it becomes much easier to try a wide variety of structures on a given dataset.
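
To make the second point a little more concrete, here is a toy sketch of automatic differentiation using dual numbers in plain Python. It is only an illustration: it uses forward-mode differentiation, whereas frameworks like TensorFlow typically rely on reverse-mode, and every name in it (Dual, sigmoid, squared_error) is made up for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class Dual:
    """A value paired with its derivative with respect to one chosen variable."""
    value: float
    deriv: float = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def sigmoid(z):
    """Logistic function on a Dual, applying the chain rule."""
    s = 1.0 / (1.0 + math.exp(-z.value))
    return Dual(s, s * (1.0 - s) * z.deriv)


def squared_error(w, x, target):
    """Loss of a one-weight 'network': (sigmoid(w * x) - target) squared."""
    diff = sigmoid(w * x) + (-target)
    return diff * diff


# Seeding deriv=1.0 asks for the derivative with respect to the weight w.
w = Dual(0.5, 1.0)
loss = squared_error(w, x=2.0, target=1.0)
print(loss.value, loss.deriv)
```

The loss function is written once as ordinary arithmetic, and the derivative needed for a gradient step falls out of the same computation with no calculus done by hand.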

The problem here then becomes one of expertise:  people need powerful computers to do Machine Learning, but few people know how to provision and deploy machines on, say, Amazon AWS to do this. Similarly, computational frameworks to specify and train almost any deep network exist and are very nice, but exactly how to use those frameworks and exactly what sort of network to create are problems that require knowledge in a variety of different domains.

Of course, this is where we come in. BigML has positioned itself at the intersection of these two innovations, utilizing a framework for network construction that allows us to train a wide variety of network structures for a given dataset, using the vast amount of available compute power to do it quickly and at scale, as we do with everything else. We add to these innovations a few of our own, in an attempt to make Deepnet learning as “hands off” and as “high quality” as we can manage.

What BigML Deepnets are (currently) Not:

Before we get into exactly what BigML’s Deepnets are, let’s talk a little bit about what they aren’t. Many people who are technically minded will immediately bring to mind the convolutional networks that do so well on vision problems, or the recurrent networks (such as LSTM networks) that have great performance on speech recognition and NLP problems.

We don’t yet support these network types.  The main reason for this is that these networks are designed to solve supervised learning problems that have a particular structure, not the general supervised learning problem.  It’s very possible we’ll support particular problems of those types in the future (as we do with, say, time series analysis and topic modeling), and we’d introduce those extensions to our deep learning capabilities at that time.

Meanwhile, we’d like to bring the power of deep learning so obvious in those applications to our users in general.  Sure, deep learning is great for vision problems, but what can it do on my data?  Hopefully, this post will prove to you that it can do quite a lot.

Down to Brass Tacks

The central idea behind neural networks is fairly simple, and not so very different from the idea behind logistic regression:  You have a number of input features and a number of possible outputs (either one probability for each class in classification problems or a single value for regression problems). We posit that the outputs are a function of the dot product of the input features with some learned weights, one for each feature.  In logistic regression, for example, we imagine that the probability of a given class can be expressed as the logistic function applied to this dot product (plus some bias term).
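
As a minimal sketch of that idea, here is the logistic regression prediction written out in plain Python. The feature values, weights and bias below are made-up numbers chosen for illustration, not anything learned from real data.

```python
import math


def logistic(z):
    """The logistic function, squashing any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(features, weights, bias):
    """Class probability as the logistic function of (features . weights) + bias."""
    dot = sum(x * w for x, w in zip(features, weights))
    return logistic(dot + bias)


# Hypothetical inputs and learned parameters.
features = [2.0, 3.0]
weights = [0.4, -0.1]
bias = -0.5
probability = predict_probability(features, weights, bias)
print(probability)
```

With these numbers the dot product and the bias cancel out, so the predicted probability sits essentially at one half.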

Deep networks extend this idea in several ways. First, we introduce the idea of “layers”, where the inputs are fed into a layer of output nodes, the outputs of which are in turn fed into another layer of nodes, and so on until we get to a final “readout” layer with the number of output nodes equal to the number of classes.


This gives rise to an infinity of possible network topologies:  How many intermediate (hidden) layers do you want?  How many nodes in each one?  There’s no need to apply the logistic function at each node; in principle, we can apply anything that our network framework can differentiate. So we could have a particular “activation function” for each layer or even for each node! Could we apply multiple activation functions?  Do we learn a weight for every single node in one layer to every single node in another, or do we skip some?
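
Here is a small sketch of what one such topology looks like when written out by hand: two inputs, one hidden layer of three nodes with its own activation function, and a two-class readout layer. The weights are arbitrary illustrative numbers, and the softmax on the readout is our assumption for turning the final scores into class probabilities.

```python
import math


def relu(z):
    return max(0.0, z)


def identity(z):
    return z


def dense(inputs, weights, biases, activation):
    """One fully connected layer: every node sees every input."""
    return [activation(sum(x * w for x, w in zip(inputs, row)) + b)
            for row, b in zip(weights, biases)]


def softmax(scores):
    """Turn readout scores into probabilities that sum to one."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def forward(inputs, layers):
    """Feed the inputs through each (weights, biases, activation) layer in turn."""
    activations = inputs
    for weights, biases, activation in layers:
        activations = dense(activations, weights, biases, activation)
    return activations


# Arbitrary weights: a 2 -> 3 hidden layer followed by a 3 -> 2 readout layer.
layers = [
    ([[0.5, -0.2], [0.1, 0.3], [-0.4, 0.8]], [0.0, 0.1, -0.1], relu),
    ([[0.2, -0.5, 0.3], [0.7, 0.1, -0.6]], [0.0, 0.0], identity),
]
probabilities = softmax(forward([1.0, 2.0], layers))
print(probabilities)
```

Changing the topology is just a matter of editing the layers list, which is exactly why the space of possible networks grows so quickly.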

Add to this the usual parameters for gradient descent: Which algorithm to use?  How about the learning rate? Are we going to use dropout during training to avoid overfitting?  And let’s not even get into the usual parameters common to all ML algorithms, like how to handle missing data or objective weights. Whew!

Extra Credit

What we’ve described above is the vanilla feed-forward neural network that the literature has known for quite a while, and we can see that it’s pretty parameter heavy. To add a bit more to the pile, we’ve added support for a couple of fairly recent advances in the deep learning world (some of the “minor advances” mentioned by Russell) that I’d like to mention briefly.

Batch Normalization

During the training of deep neural networks, the activations of internal layers of the network can change wildly throughout the course of training. Often, this means that the gradients computed for training can behave poorly so one must be very careful to select a sufficiently low learning rate to mitigate this behavior.

Batch normalization fixes this by normalizing the inputs to each layer for each training batch of instances, assuring that the inputs are always mean-centered and unit-variance, which implies well-behaved gradients when training. The downside is that you must now know both mean and variance for each layer at prediction time so you can standardize the inputs as you did during training. This extra bookkeeping tends to slow down the descent algorithm’s per-iteration speed a bit, though it sometimes leads to faster and more robust convergence, so the trade-off can be worthwhile for certain network structures.
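
The normalization step and the extra bookkeeping can be sketched in a few lines of plain Python. This is a simplification under stated assumptions: it leaves out the learned scale and shift parameters and the running averages that full batch normalization implementations usually keep, and the function names are our own.

```python
import math


def batch_normalize(batch, epsilon=1e-5):
    """Normalize each feature (column) of a batch to zero mean and unit variance."""
    n = len(batch)
    columns = range(len(batch[0]))
    means = [sum(row[j] for row in batch) / n for j in columns]
    variances = [sum((row[j] - means[j]) ** 2 for row in batch) / n
                 for j in columns]
    normalized = [[(row[j] - means[j]) / math.sqrt(variances[j] + epsilon)
                   for j in columns] for row in batch]
    # The statistics are returned because they are needed again at prediction time.
    return normalized, means, variances


def normalize_with_stats(row, means, variances, epsilon=1e-5):
    """Standardize one new instance using statistics remembered from training."""
    return [(x - m) / math.sqrt(v + epsilon)
            for x, m, v in zip(row, means, variances)]


# A made-up batch of three instances with two very differently scaled features.
batch = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
normalized, means, variances = batch_normalize(batch)
single = normalize_with_stats([2.0, 20.0], means, variances)
print(normalized, single)
```

After normalization both columns live on the same scale, even though the raw features differ by a factor of ten.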

Learn Residuals

Residual networks are networks with “skip” connections built in. That is, every second or third layer, the input to each node is the usual output from the previous layer, plus the raw, unweighted output from two layers back.

The theory behind this idea is that this allows information present in the early layers to bubble up through the later layers of the network without being subjected to a possible loss of that information via reweighting on subsequent layers. Thus, the later layers are encouraged to learn a function representing the residual values that will allow for good prediction when used on top of the values from the earlier layers. Because the early layers contain significant information, the weights for these residual layers can often be driven down to zero, which is typically “easier” to do in a gradient descent context than to drive them towards some particular non-zero optimum.
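
A tiny sketch of a skip connection shows why zero weights are a comfortable place for these layers to sit. The layer shapes and the ReLU activation are assumptions made for this example only.

```python
def relu(z):
    return max(0.0, z)


def dense(inputs, weights, biases):
    """One fully connected layer with a ReLU activation."""
    return [relu(sum(x * w for x, w in zip(inputs, row)) + b)
            for row, b in zip(weights, biases)]


def residual_block(inputs, first_layer, second_layer):
    """Two layers whose output is added to the raw input from two layers back."""
    hidden = dense(inputs, *first_layer)
    transformed = dense(hidden, *second_layer)
    # Skip connection: the unweighted input bypasses both layers.
    return [t + x for t, x in zip(transformed, inputs)]


# With all weights and biases at zero, the block learns "no correction at all".
zero_layer = ([[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0])
inputs = [0.3, -1.2]
output = residual_block(inputs, zero_layer, zero_layer)
print(output)
```

With the weights at zero the block simply passes its input through, so the information from the earlier layers survives untouched and the layers only need to learn a correction on top of it.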

Tree Embedding

When this option is selected, we learn a series of decision trees, random forest-style, against the objective before learning (with appropriate use of holdout sets, of course). We then use the predictions of the trees as generated “input features” for the network. Because these features tend to have a monotonic relationship with the class probabilities, this can make gradient descent reach a good solution more quickly, especially in domains where there are many non-linear relationships between inputs and outputs.

If you want, this is a rudimentary form of stacked generalization embedded in the tree learning algorithm.

Pay No Attention To The Man Behind The Curtain

So it seems we’ve set ourselves an impossible task here. We have all of these parameters. How on earth are we going to find a good network in the middle of this haystack?

Here’s where things get BigML easy:  the answer is that we’re going to do our best to find ones that work well on the dataset we’re given. We’ll do this in two ways:  via metalearning and via hyper-parameter search.


Metalearning is another idea that is nearly as old as Machine Learning itself.  In its most basic form, the idea is that we learn a bunch of classifiers on a bunch of different datasets and measure their performance. Then, we apply Machine Learning to get a model that predicts the best classifier for a given dataset. Simple, right?

In our application, this means we’re going to learn networks of every sort of topology parameterized in every sort of way. What do I mean by “every sort”?  Well, we’ve got over 50 datasets so we did five replications of 10-fold cross-validation on each one. For each fold, we learned 128 random networks, then we measured the relative performance of each network on each fold.  How many networks is that?  Here, allow me to do the maths:

50 * 5 * 10 * 128 = 320,000.

“Are you saying you trained 320,000 neural networks?”  No, no, no, of course not!  Some of the datasets were prohibitively large so we only learned a paltry total of 296,748 networks.  This is what we do for fun here, people!

When we model the relative quality of the network given the parameters (for which, of course, we use BigML), we learn a lot of interesting little bits about how the parameters of neural networks relate to one another and the data on which they’re being trained.

You’ll get better results using the “adadelta” descent algorithm, for example, with high learning rates, as you can see by the green areas of the figure below, which indicate parts of the parameter space that specify networks that perform better on a relative basis.


But if you’re using “rms_prop”, you’ll want to use learning rates that are several orders of magnitude lower.


Thankfully, you don’t have to remember all of this.  The wisdom is in the model, and we can use this model to make intelligent suggestions about which network topology and parameters you should use, which is exactly what we do with the “Automatic Structure Suggestion” button.


But, of course, the suggested structure and parameters represent only a guess at the optimal choices, albeit an educated one.  What if we’re wrong?  What if there’s another network structure out there that performs well on your data, if you could only find it?  Well, we’ll have to go looking for it, won’t we?

Network Search

Our strategy here again comes in many pieces.  The first is Bayesian hyperparameter optimization.  I won’t go into it too much here because this part of the architecture isn’t too much more than a server-side implementation of SMACdown, which I’ve described previously. This technique essentially allows a clever search through the space of possible models by using the results of previously learned networks to guide the search. The cleverness here lies in using the results of the previously trained networks to select the best next one to evaluate. Beyond SMAC, we’ve done some experiments with regret-based acquisition functions, but the flavor of the algorithm is the same.

We’ve also more heavily parallelized the algorithm, so the backend actually keeps a queue of networks to try, reordering that queue periodically as its model of network performance is refined.

The final innovation here is using bits and pieces of the hyperband algorithm.  The insight of the algorithm is that a lot of savings in computation time can be gained by simply stopping training on networks that, even if their fit is still improving, have little hope of reaching near-optimal performance in reasonable time. Our implementation differs significantly in the details (in our interface, for example, you provide us with a budget of search time that we always respect), but does stop training early for many underperforming networks, especially in the later stages of the search.
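
To give a flavor of the early-stopping idea, here is a sketch of plain successive halving, the subroutine at the heart of hyperband. It is not our implementation, which as noted differs significantly in the details; the "networks" are toy dictionaries whose score creeps toward a hidden ceiling, and all names and numbers are made up.

```python
def successive_halving(candidates, train, evaluate, rounds, keep_fraction=0.5):
    """Train all survivors a little, then drop the worst performers each round."""
    survivors = list(candidates)
    for _ in range(rounds):
        for candidate in survivors:
            train(candidate)
        survivors.sort(key=evaluate, reverse=True)
        keep = max(1, int(len(survivors) * keep_fraction))
        survivors = survivors[:keep]
    return survivors


def make_candidate(name, ceiling, speed):
    """A toy stand-in for a network: it learns toward 'ceiling' at rate 'speed'."""
    return {"name": name, "ceiling": ceiling, "speed": speed,
            "score": 0.0, "steps": 0}


def train(candidate):
    """One unit of training budget moves the score part of the way to the ceiling."""
    candidate["score"] += candidate["speed"] * (candidate["ceiling"] - candidate["score"])
    candidate["steps"] += 1


def evaluate(candidate):
    return candidate["score"]


candidates = [
    make_candidate("a", 0.70, 0.5),
    make_candidate("b", 0.90, 0.5),
    make_candidate("c", 0.60, 0.9),
    make_candidate("d", 0.80, 0.5),
]
winners = successive_halving(candidates, train, evaluate, rounds=2)
print([w["name"] for w in winners], [c["steps"] for c in candidates])
```

Half of the candidates stop consuming budget after the first round. The toy data also shows the risk: a fast starter with a low ceiling can outlast a slower candidate that would have ended up better, which is one reason the details of when to stop matter.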

This is all available right next door to the structure suggestion button in the interface.

Between metalearning and network search, we can make a good guess as to some reasonable network parameters for your data, and we can do a clever search for better ones if you’re willing to give us a bit more time.  It sounds good on paper, so let’s take it for a spin.


So how did we do? Remember those 50+ datasets we used for metalearning?  We can use the same datasets to benchmark the performance of our network search against other algorithms (before you ask, no, we didn’t apply our metalearning model during this search as that would clearly be cheating). We again do 5 replications of 10-fold cross validation for each dataset and measure performance over 10 different metrics. As this was a hobby project I started before I came to BigML, you can see the result here.

Deepnets Benchmark

You can see on the front page a bit of information about how the benchmark was conducted (scroll below the chart), and a comparison of 30+ different algorithms from various software packages. The thing that’s being measured here is “How close is the performance of this algorithm to the best algorithm for a given dataset and metric?”  You can quickly see that BigML Deepnets are the best thing or quite close more often than any other off-the-shelf algorithm we’ve tested.

The astute reader will certainly reply, “Well, yes, but you’ve done all manner of clever optimizations on top of the usual deep learning; you could apply such cleverness to any algorithm in the list (or a combination!) and maybe get better performance still!”

This is absolutely true.  I’m certainly not saying that BigML deep learning is the best Machine Learning algorithm there is; I don’t even know how you would prove something like that. But what these results do show is that if you’re going to just pull something off of the shelf and use it, with no parameter tuning and little or no coding, you could do a lot worse than to pull off BigML. Moreover, the results show that deep learning (BigML or otherwise; notice that multilayer perceptrons in scikit are just a few clicks down the list) isn’t just for vision and NLP problems, and it might be the right thing for your data too.

Another lesson here, editorially, is that benchmarking is a subtle thing, easily done wrong. If you go to the “generate abstract” page, you can auto-generate a true statement (based on these benchmarks) that “proves” that any algorithm in this list is “state-of-the-art”. Never trust a benchmark on a single dataset or a single metric! While any benchmark of ML algorithms is bound to be inadequate, we hope that this benchmark is sufficiently general to be useful.

Cowardly Lion Deepnets

Hopefully, all of this has convinced you to go off to see the Wizard and give BigML Deepnets a try. Don’t be the Cowardly Lion! You have nothing to lose except underperforming models…

Want to know more about Deepnets?

Please visit the dedicated release page for further learning. It includes a series of six blog posts about Deepnets, the BigML Dashboard and API documentation, the webinar slideshow as well as the full webinar recording.


Automating Deepnets with WhizzML and The BigML Python Bindings


This blog post, the fifth of our series of six posts about Deepnets, focuses on those users that want to automate their Machine Learning workflows using programming languages. If you follow the BigML blog, you may know WhizzML, BigML’s domain-specific language for automating Machine Learning workflows, implementing high-level Machine Learning algorithms, and easily sharing them with others. WhizzML helps developers to create Machine Learning workflows and execute them entirely in the cloud. This avoids network problems, memory issues and lack of computing capacity while taking full advantage of WhizzML’s built-in parallelization. If you are not familiar with WhizzML yet, we recommend that you read the series of posts we published this summer about how to create WhizzML scripts: Part 1, Part 2 and Part 3 to quickly discover their benefits.

In addition, in order to easily automate the use of BigML’s Machine Learning resources, we maintain a set of bindings which allow users to work in their favorite language (Java, C#, PHP, Swift, and others) with the BigML platform.


Let’s see how to use Deepnets through both the popular BigML Python Bindings and WhizzML, but note that the operations described in this post are also available in this list of bindings.

We start by creating a Deepnet with the default settings. For that, we need an existing Dataset to train the network in BigML, so our call to the API will need to include the ID of the Dataset we want to use for training, as shown below:

;; Creates a deepnet with default parameters
(define my_deepnet (create-deepnet {"dataset" training_dataset}))

The BigML API is mostly asynchronous, that is, the above creation function will return a response before the Deepnet creation is completed. This implies that the Deepnet information is not ready to make predictions right after the code snippet is executed, so you must wait for its completion before you can predict with it. You can use the directive “create-and-wait-deepnet” for that:

;; Creates a deepnet with default settings. Once it's
;; completed the ID is stored in my_deepnet variable
(define my_deepnet
  (create-and-wait-deepnet {"dataset" training_dataset}))

If you prefer the BigML Python Bindings, the equivalent code is:

from bigml.api import BigML
api = BigML()
my_deepnet = api.create_deepnet("dataset/59b0f8c7b95b392f12000000")
# wait for the deepnet to be finished before using it
api.ok(my_deepnet)

Next up, we will configure a Deepnet with WhizzML. The configuration properties can be easily added in the mapping by using property pairs such as <property_name> and <property_value>. For instance, when you create a Deepnet from a dataset, BigML automatically sets the number of iterations used to optimize the network to 20,000, but if you prefer a different maximum number of gradient steps to take during the optimization process, you should add the property “max_iterations” and set it to, say, 100,000. Additionally, you might want to set the value used by the Deepnet when numeric fields are missing. Then, you need to set the “default_numeric_value” property to the right value. In the example shown below, missing values are replaced by the mean value. Property names always need to be between quotes and the value should be expressed in the appropriate type. The code for our example can be seen below:

;; Creates a deepnet with some settings. Once it's
;; completed the ID is stored in my_deepnet variable
(define my_deepnet
  (create-and-wait-deepnet {"dataset" training_dataset
                            "max_iterations" 100000
                            "default_numeric_value" "mean"}))

The equivalent code for the BigML Python Bindings is:

from bigml.api import BigML
api = BigML()
args = {"max_iterations": 100000, "default_numeric_value": "mean"}
training_dataset = "dataset/59b0f8c7b95b392f12000000"
my_deepnet = api.create_deepnet(training_dataset, args)

For more details about these and other properties, please check the dedicated API documentation (available on October 5.)

Once the Deepnet has been created, we can evaluate how good its performance is. To do so, we will use a different dataset with non-overlapping data.  The “test_dataset” parameter in the code shown below represents this second dataset. Following WhizzML’s philosophy of “less is more”, the snippet that creates an evaluation has only two mandatory parameters: a Deepnet to be evaluated and a Dataset to use as test data.

;; Creates an evaluation of a deepnet
(define my_deepnet_ev
 (create-evaluation {"deepnet" my_deepnet "dataset" test_dataset}))

Similarly, in Python the evaluation is done as follows:

from bigml.api import BigML
api = BigML()
my_deepnet = "deepnet/59b0f8c7b95b392f12000000"
test_dataset = "dataset/59b0f8c7b95b392f12000002"
evaluation = api.create_evaluation(my_deepnet, test_dataset)

After evaluating your Deepnet, you can predict the results of the network for new values of one or many features in your data domain. In this code, we demonstrate the simplest case, where the prediction is made only for some fields in your dataset.

;; Creates a prediction using a deepnet with specific input data
(define my_prediction
 (create-prediction {"deepnet" my_deepnet
                     "input_data" {"sepal length" 2 "sepal width" 3}}))

And the equivalent code for the BigML Python bindings is:

from bigml.api import BigML
api = BigML()
input_data = {"sepal length": 2, "sepal width": 3}
my_deepnet = "deepnet/59b0f8c7b95b392f12000000"
prediction = api.create_prediction(my_deepnet, input_data)

In addition to this prediction, calculated and stored in BigML servers, the Python Bindings allow you to instantly create single local predictions on your computer. The Deepnet information is downloaded to your computer the first time you use it, and the predictions are computed locally on your machine, without any costs or latency:

from bigml.deepnet import Deepnet
local_deepnet = Deepnet("deepnet/59b0f8c7b95b392f12000000")
input_data = {"sepal length": 2, "sepal width": 3}
# compute the prediction locally
local_deepnet.predict(input_data)

It is pretty straightforward to create a Batch Prediction from an existing Deepnet, where the dataset named “my_dataset” contains the rows of data for which the network will make predictions:

;; Creates a batch prediction using a deepnet 'my_deepnet'
;; and the dataset 'my_dataset' as data to predict for
(define my_batchprediction
 (create-batchprediction {"deepnet" my_deepnet
                          "dataset" my_dataset}))

The code in Python Bindings to perform the same task is:

from bigml.api import BigML
api = BigML()
my_deepnet = "deepnet/59d1f57ab95b39750c000000"
my_dataset = "dataset/59b0f8c7b95b392f12000000"
my_batchprediction = api.create_batch_prediction(my_deepnet, my_dataset)

Want to know more about Deepnets?

Our next blog post, the last one of this series, will cover how Deepnets work behind the scenes, diving into the most technical aspects of BigML’s latest resource. If you have any questions or you’d like to learn more about how Deepnets work, please visit the dedicated release page. It includes a series of six blog posts about Deepnets, the BigML Dashboard and API documentation, the webinar slideshow as well as the full webinar recording.
