
Bring Machine Learning to PostgreSQL with BigML

As of late, we’ve been using PostgreSQL quite a lot at BigML, and so have some of our customers. We love the features coming in the next release (which is in beta as I write this), particularly the one that allows creating what they call generated columns. In this post, I’ll explain what these generated columns are and how they can use the Machine Learning models in BigML to fill in any numeric or categorical field in your table.

What is a generated column?

A generated column is a special column that is defined as the result of a computation that involves any of the values of the regular columns in the row. Let’s see an example.

Say you have a table of contacts in your database where you fill in their first and last name plus their email. You might also want to keep the full name for output purposes, but of course, you don’t want that to be a column filled in independently. Here’s where a generated column comes in handy:

CREATE OR REPLACE FUNCTION my_concat(text, VARIADIC text[])
RETURNS TEXT AS 'text_concat_ws' LANGUAGE internal immutable;

CREATE TABLE contacts (
    first_name TEXT,
    last_name TEXT,
    full_name TEXT GENERATED ALWAYS AS (
        my_concat(' ', first_name, last_name)) STORED,
    email TEXT);

The full_name column is defined as GENERATED ALWAYS, so you will not be able to insert values into that column. Instead, the column is automatically filled with the concatenation of first_name and last_name, separated by a blank.

testdb=# INSERT INTO contacts (first_name, last_name, email)
    VALUES ('John', 'Doe', '');
testdb=# SELECT * FROM contacts;
 first_name | last_name | full_name |     email      
------------+-----------+-----------+----------------
 John       | Doe       | John Doe  | 
(1 row)

OK, that’s not bad at all: it both ensures consistency and eases maintenance. However, the information in the table has not increased. The generated column is not telling us anything that we did not know in advance. What if we could use Machine Learning to add more information to our table?

Machine Learning insights

For those of you who are not familiar with Machine Learning, it’s a branch of Artificial Intelligence that has proven very useful to the enterprise. The basic idea behind Machine Learning is having computers label things for us, given only a collection of previously labeled examples. The computer uses algorithms to learn from these examples and can then predict the label for new incoming cases.
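The learn-from-examples idea can be sketched with a toy classifier. This is purely illustrative (a 1-nearest-neighbour rule, not BigML’s algorithms): it stores labeled examples and predicts the label of the most similar one.

```python
# Toy illustration of "learning from labeled examples": a 1-nearest-
# neighbour classifier that predicts the label of a new case by finding
# the most similar previously labeled example. Purely illustrative;
# BigML's models (decision trees, ensembles, etc.) are far more capable.

def predict(examples, new_case):
    """Return the label of the labeled example closest to new_case."""
    def distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, label = min(examples, key=lambda ex: distance(ex[0], new_case))
    return label

# Labeled examples: ((total day minutes, total intl calls), label)
labeled = [
    ((300.0, 2), "churn"),
    ((120.0, 10), "no churn"),
    ((90.0, 8), "no churn"),
]

print(predict(labeled, (280.0, 3)))  # closest to the "churn" example
```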

For instance, imagine that you run a telecom. Some of your customers will churn, but which ones? Wouldn’t it be nice to be told who is likely to churn? Maybe you could offer a discount or other offer to convince them to stay.

That’s one of many things that Machine Learning can do for you. Based on the examples of customers that churned, the computer can learn which patterns lead to churn and predict whether a user calling your customer service line matches any of those patterns and is therefore at risk of churning (for more details, check this post by Chris Mohritz).

Back to our example, how could we add that label to our call center table?

Powering tables with AI

In order to predict how likely a customer is to churn, Machine Learning algorithms build models. You can learn more about the different types of models and their uses in our videos. As this post is not focused on how to build a model, let’s use an existing model for the telecom churn problem from our model gallery. You can easily clone it into your BigML account for free.

Churn telecom model

Feeding what we know about the user to the model (the total day minutes, voice mail plan, total day charge, total intl minutes, total intl calls, etc.) the model will tell us what we don’t know: whether the user is likely to churn. Could we add that information as one more column in our table?
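To see what the model actually does with those inputs, note that a decision-tree prediction is just a cascade of threshold tests over the input fields ending in a label. Here is a hypothetical miniature: the thresholds and branches below are invented for illustration, while the real gallery model learned its own splits from the telecom data.

```python
# Hypothetical miniature of what a churn decision tree computes: a
# cascade of threshold tests over the input fields that ends in a label.
# These thresholds are invented for illustration; the real gallery model
# learned its own splits from the telecom dataset.

def predict_churn(total_day_minutes, voice_mail_plan, total_day_charge,
                  total_intl_minutes, total_intl_calls):
    if total_day_minutes > 260:
        # heavy daytime users without a voice mail plan tend to churn
        return "True" if voice_mail_plan == "no" else "False"
    if total_intl_calls < 3 and total_intl_minutes > 13:
        # high international minutes over very few calls is a bad sign
        return "True"
    return "False"

print(predict_churn(280, "no", 47.6, 10.1, 4))  # -> True
```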

The good news is that PostgreSQL offers extensions that allow you to define functions in several general-purpose languages. One of them is plpythonu, which lets you embed Python code in PostgreSQL functions. Also, BigML offers bindings for several languages (Python included) that know how to use the Machine Learning models to create predictions for your input data. Let’s put all of that together in five steps:

  1. Register or log in to BigML so you can use its models.
  2. Clone the model available in BigML’s gallery to your account.
  3. Install the Python bindings.
  4. Create a function to generate the prediction.
  5. Create a table to store your input data and the generated column.

Step 1 is easily done using the signup form, which will ask for your email and some basic information. Then you can follow the link to the model and click the buy link to copy it. From that moment on, you’ll be able to use the model to make predictions. At this point, your model is stored in your private environment on BigML’s servers.

The next step is installing the bindings, whose classes can download that model to your local computer and use the information therein to predict the churn output for each set of inputs. Details on how to install them can be found in the bindings documentation, but basically, it means using pip:

pip install bigml
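The bindings download the model once and then reuse a local copy for all subsequent predictions. That pattern can be sketched with a hypothetical, stdlib-only helper (the real caching lives inside the bigml bindings’ Model class, so this is only an approximation of the idea):

```python
# Sketch of the "download once, then reuse locally" pattern that the
# bindings implement. This helper is hypothetical (stdlib only); in real
# use, bigml.model.Model handles all of this internally.
import json
import os

def load_model(model_id, storage="./storage", fetch=None):
    """Return the model's JSON, calling fetch (the API) only once."""
    path = os.path.join(storage, model_id.replace("/", "_") + ".json")
    if not os.path.exists(path):
        os.makedirs(storage, exist_ok=True)
        with open(path, "w") as f:
            json.dump(fetch(model_id), f)  # first call hits the API
    with open(path) as f:                  # later calls read the local copy
        return json.load(f)
```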

Now comes the time for defining the function that will predict whether the customer is going to churn in PostgreSQL:

CREATE OR REPLACE FUNCTION predict_churn(total_day_minutes REAL,
                                         voice_mail_plan TEXT,
                                         total_day_charge REAL,
                                         total_intl_minutes REAL,
                                         total_intl_calls REAL)
          RETURNS TEXT
          AS $$
            from bigml.model import Model
            from bigml.api import BigML

            # ------ user's data -------------- #
            model_id = "model/52bc7fd03c1920e4a3000016" # model ID
            username = "my_username" # replace with your username
            api_key = "*****" # replace with your API key
            # ---------------------------------- #
            local_model = Model(model_id,
                                api=BigML(username, api_key,
                                          storage='./storage'))
            return local_model.predict( \
                {"total day minutes": total_day_minutes,
                 "voice mail plan": voice_mail_plan,
                 "total day charge": total_day_charge,
                 "total intl minutes": total_intl_minutes,
                 "total intl calls": total_intl_calls})
          $$ LANGUAGE plpythonu immutable;

The Python code uses the ID of the model, which can be retrieved from BigML’s dashboard, and your credentials (username and API key). Thanks to that, the Model class will download the model information to your computer the first time that function is called. The model will be stored in a ./storage folder, and from then on this local copy will be used to make the predictions. In order to use the function, we just need to create the table with a generated column as before:

CREATE TABLE churn (
    total_day_minutes REAL,
    voice_mail_plan TEXT,
    total_day_charge REAL,
    total_intl_minutes REAL,
    total_intl_calls REAL,
    churn_prediction TEXT GENERATED ALWAYS AS
        (predict_churn(total_day_minutes,
                       voice_mail_plan,
                       total_day_charge,
                       total_intl_minutes,
                       total_intl_calls)) STORED);

And voilà! Next time a customer calls your call center, insert their information into the table

INSERT INTO churn (total_day_minutes,
                   voice_mail_plan,
                   total_day_charge,
                   total_intl_minutes,
                   total_intl_calls)
VALUES (45, 'yes', 55, 120, 3);

INSERT INTO churn (total_day_minutes,
                   voice_mail_plan,
                   total_day_charge,
                   total_intl_minutes,
                   total_intl_calls)
VALUES (55, 'no', 50, 100, 12);

The model will immediately add the churn prediction

SELECT churn_prediction FROM churn;
(2 rows)

Can’t wait to use it!

For those of you who want to try this right away, there’s an alternative to generated columns: using triggers. A trigger is a function that is called when a row is inserted, updated, or deleted. Triggers can be attached to tables so that tasks are performed before or after one of these designated events takes place.

To mimic our example, we could create a regular table with plain columns

CREATE TABLE plain_churn (
    total_day_minutes REAL,
    voice_mail_plan TEXT,
    total_day_charge REAL,
    total_intl_minutes REAL,
    total_intl_calls REAL,
    churn_prediction TEXT);

but add a trigger on insert or update, so that the content of churn_prediction is computed as the prediction based on the rest of the columns

CREATE OR REPLACE FUNCTION predict_churn_trg()
          RETURNS TRIGGER
          AS $$
            from bigml.model import Model
            from bigml.api import BigML
            # ------ user's data -------------- #
            model_id = "model/52bc7fd03c1920e4a3000016" # model ID
            username = "my_username" # replace with your username
            api_key = "*****" # replace with your API key
            # ---------------------------------- #
            local_model = Model(model_id,
                                api=BigML(username, api_key,
                                          storage='./storage'))
            new_values = TD["new"] # values to be stored
            new_values["churn_prediction"] = local_model.predict( \
                {"total day minutes": new_values["total_day_minutes"],
                 "voice mail plan": new_values["voice_mail_plan"],
                 "total day charge": new_values["total_day_charge"],
                 "total intl minutes": new_values["total_intl_minutes"],
                 "total intl calls": new_values["total_intl_calls"]})
            return "MODIFY"
          $$ LANGUAGE plpythonu;

By attaching the trigger to the previous table, the churn_prediction column will also be generated automatically whenever the rest of the values change.

CREATE TRIGGER churn_prediction
    BEFORE INSERT OR UPDATE ON plain_churn
    FOR EACH ROW
    EXECUTE PROCEDURE predict_churn_trg();

So we are ready to go!

INSERT INTO plain_churn (total_day_minutes,
                         voice_mail_plan,
                         total_day_charge,
                         total_intl_minutes,
                         total_intl_calls)
VALUES (55, 'no', 50, 100, 12);

SELECT churn_prediction FROM plain_churn;
(1 row)

Cool, right? Let us know how your experience goes, and meanwhile, happy predicting!

Celebrating 100,000 Registered Customers!

It’s not every day that one comes across a commercial software platform hitting the 100,000 registrations mark in the Machine Learning world. After all, Machine Learning is only now shedding its reputation as a mostly academic endeavor and becoming a business imperative for large, mid-sized, and even small businesses that represent many industries and are looking to implement a wide variety of use cases.

BigML Use Cases

In the case of BigML, it took about 6 years to get to our first 50,000 registrations starting from our inception in 2011. However, it has taken less than 2 years to add the next 50,000, which is a testament to BigML’s staying power despite the existence of a dizzying array of Machine Learning tools, including highly specialized open source tools and libraries.

Naturally, one wonders what forces are driving the accelerating adoption of BigML given our recent experience. As you’d expect, while some reasons are exogenous, others are endogenous to BigML’s product design and go-to-market choices. With that stated, the following waves of change come to the fore:

  • Without a doubt, the interest in Machine Learning has seen an exponential increase in the business world. The routine mentions of “Machine Learning” and/or “AI” in public company earnings calls by many executives demonstrate how related initiatives are perceived to have strategic implications for many industries.


  • The BigML platform has continually evolved and improved over the course of the last two years making it more comprehensive and able to handle many diverse use cases once out of its reach. It’s somewhat nostalgic to remember that the first version of BigML only featured decision trees as part of a very simple workflow that supported flat file imports and the ability to make form-based single predictions. Over time, BigML has evolved to not only support more algorithms but also multiple options for automation of workflows all the while abstracting infrastructure layer concerns from the analytical end-user in a scalable manner.


  • We must give a special attribution to our auto-ML capability OptiML, which has leveled the playing field for even the novices not familiar with the intricacies of hyperparameter tuning by automating the chore of picking the best set of parameters for any classification or regression technique available on the platform. More competent models, in turn, mean higher potential business impact and even more interest in iterating with better features, more data, etc. Before you know it, it becomes a positive feedback loop!



  • Our insistence on skipping the online forms and sales calls that usually stand between an interested party and their first experience of BigML has also been paying off handsomely. We like to refer to this as Free and Immediate Access, as there are no large downloads, painful setup or installation routines, or, worse yet, credit card verifications needed to actually tackle your first predictive use case. Just enter your email and kickstart your personal Machine Learning journey.


  • The next factor is what we can sum up as the human touch aspect and it includes a mix of affordable summer schools, certifications, timely customer support provided to both paying and non-paying users as well as customized assistance that is tailored to desired predictive use cases and ML techniques.


The BigML Team

Late to the party? Get started today…

So regardless of your level of understanding of Machine Learning or the sophistication at your workplace about the matter, you have a spectrum of options to engage with the BigML platform to get real value in the shortest amount of time. We suggest you try any or all of the following routes as your first step and don’t hesitate to reach out to us anytime.

  • FREE Forever Subscription: If you haven’t done so there’s never a better time than now to sign up for the FREE version of BigML.  It only takes an email.
  • FREE Education videos: Unlike the typical advice on how to become a Data Scientist (HINT: take many online courses, read many books on statistics, etc.) you can find a comprehensive set of education videos on each BigML resource that assumes no prior Machine Learning background.
  • BigML Lite for Small Business or Pilot Projects: Larger businesses usually require their own dedicated instance of BigML due to internal rules or preferences but for SMBs or a single business unit of a large organization, it makes more sense to deploy BigML Lite for cost and speed to market reasons.

As the BigML Team, we’re proud to serve our community of early adopters and wish to add another 100,000 users in the next year!

Wrapping up the first Machine Learning Summer School in The Netherlands

85 decision makers, analysts, domain experts, entrepreneurs, and academics from 13 countries came together at the Nyenrode Business University in Breukelen, Netherlands this week to attend the 1st edition of BigML’s Machine Learning Summer School in The Netherlands. These attendees represented 38 distinct organizations and 5 universities ensuring a diverse mix of expertise and experience with data-driven decision making.


The event was organized into three parts. The program for Day 1, Machine Learning for Executives, targeted business leaders as it concentrated on the economics of Machine Learning in a business setting and establishing the right strategy when adopting it. Days 2 and 3, Introduction to Machine Learning, made up the main track during which attendees got a more detailed technical look under the hood of supervised and unsupervised learning techniques supported by the BigML platform. Finally, despite the fact that most of the attendees had little or no background on Machine Learning prior to the summer school, that didn’t stop them from successfully completing a credit default risk analysis use case example as part of the workshop on Day 4, Working with the Masters.

The chairman of the event, Jan W. Veldsink of Rabobank, stressed the importance of keeping Machine Learning initiatives within business departments rather than treating them solely as IT initiatives, which makes it harder to unearth valuable, data-driven inferences. If business unit representatives aren’t included until the later stages, establishing trust becomes very difficult, which causes problems when operationalizing any new insights. Ideally, each project team should contain a data domain expert, a business owner, and a Machine Learning expert.

Associate Professor Jeroen van der Velden of Nyenrode warned the audience that AI solutions also generate new types of risks (e.g., biases in training data) that must be given special attention before any adverse effects are observed in production. This is an area that doesn’t yet receive the attention it deserves from industry practitioners but as the regulatory landscape catches on it will no longer be optional.

Wibout De Klijne of Rabobank touched on the idea of combining expert opinion with Machine Learning models to arrive at the best possible outcomes within the context of fraud detection systems. Using more interpretable modeling techniques lessens any reservations or objections against black box models “taking over”.

Enrique Dans of IE University provided a deep dive into the business opportunities made possible by Machine Learning, such as personalized healthcare, but also noted that it will likely be a bumpy ride as experts (e.g., doctors) incorporate models into their everyday workflows.

BigML’s Chief Scientist Professor Tom Dietterich gave the audience a quick tour of the latest and greatest from the world of Machine Learning research getting them to go a level deeper on topics like contextual bandits, deep learning interpretability and continuous learning agents.


BigML client Juriblox, as well as partners T2Client and A1 Digital, presented reference Machine Learning use cases such as predicting logistics expedition outcomes, energy trading, and others in the automotive and legal verticals.

Nyenrode Business University

Good news for those who could not make it to this edition: the presentations from the summer school can be found here. We’re looking forward to hosting you at the next Machine Learning summer school or training event!

Meet the Lecturers of the first Machine Learning Summer School in The Netherlands!

The first edition of our Machine Learning Summer School in The Netherlands is here! On July 8-11, at the Nyenrode Business Universiteit in Breukelen, executives, managers, decision makers, and technology and business professionals will get a good overview of how Machine Learning has evolved and where the industry is headed, both from a technical and a business perspective. We will learn must-know core Machine Learning concepts and techniques, and study innovative real-world use cases to understand how companies are already applying them. Moreover, we will cover the benefits and ways of adopting Machine Learning in any size organization. Finally, the attendees will have the chance to immediately apply the new skills learned as part of a practical workshop.

The Distinguished Lecturers

Machine Learning: Why Now? 

by Atakan Cetinsoy, VP of Predictive Applications at BigML.

Machine Learning: A Business Perspective

Digital Strategy and AI

Machine Learning for Fraud Detection

Machine Learning for Managers

by Jan W. Veldsink, of Nyenrode, Rabobank, and Grio. Check out his interview about the event here!

Machine Learning for Law


Machine Learning: Technical Perspective

Anatomy of an Application: Machine Learning End-to-End

Machine Learning Techniques

Machine Learning for Logistics: Predicting Expedition Outcome

Machine Learning for Energy Trading and the Automotive Sector

Data Preparation for Machine Learning

by Paul Roberts of Trifacta, which is sponsoring the DutchMLSchool.


Machine Learning Put to Use

by Mercè Martín Prats, VP of Insights and Applications at BigML.

Automating your own Machine Learning Projects

by jao, Co-Founder and Chief Technology Officer at BigML.


Check the full agenda for more details on each talk and get your ticket today if you don’t have it yet!

Thirdware and BigML Announce Major Partnership to Accelerate Enterprise Adoption of Machine Learning

Thirdware Inc is a leader in enterprise applications that has supported a range of Fortune 500 organizations as an implementation partner for infrastructure technologies such as enterprise resource planning, enterprise performance management, cloud services, and robotic process automation. Today, we are happy to announce our partnership with Thirdware Inc to accelerate enterprise adoption of Machine Learning.

The partnership is a continuation of Thirdware’s long-term growth strategy, which has been a key focus for Thirdware CEO, Bhavesh Shah. “Our mission at Thirdware has always been to build the most comprehensive portfolio of technology solutions to support the rapidly evolving ecosystem of emerging technologies relevant for the automotive industry. Given BigML’s pioneering work as a Machine Learning platform, we see the addition as a natural fit to bring substantial value to the automotive industry,” said Shah.

A New Group Within Thirdware

In tandem with this partnership, Thirdware has formalized a new group within the company called Thirdware Labs and has brought on former Ernst & Young, Fiat Chrysler Automotive, and Ford Motor Company Executive, Kristin Slanina, as its first Chief Transformation Officer.

“I’ve spent my career in the heart of the evolving automotive industry and have seen the challenges of adopting emerging technology become a huge barrier in the boardroom, which continues to be a key topic in the C-suite today. We believe Machine Learning can catapult the mobility ecosystem, as well as other industry verticals, into a new universe of monetization. The BigML partnership represents a major step towards achieving that vision,” said Kristin Slanina, Thirdware’s Chief Transformation Officer.

How BigML & Thirdware Will Add Value Together

For BigML, this partnership serves as an equally important milestone. The company will soon announce its 100,000 registered user milestone on its software-as-a-service Machine Learning platform. “Since the inception of BigML, the team and I have always focused on building the most complete, methodologically robust, and easy-to-use Machine Learning platform in the marketplace. Now, with Thirdware’s leadership and expertise in the enterprise, we are confident that we can further reduce the barriers that most teams within large companies have when it comes to solving problems using Machine Learning,” said Dr. Francisco Martin, Co-founder & Chief Executive Officer of BigML.

As it continues its momentum in 2019, there are two other areas in which Thirdware will continue to build capabilities in unison with its new Machine Learning offering. This includes blockchain and connectivity. By the end of 2019, Thirdware Labs will extend its offerings in these emerging technologies to other industry verticals such as healthcare, finance, and energy. In many ways, the application of these emerging technologies across the industry verticals undergoing the most disruptive change represents the long-term vision of Thirdware.

Earlier this year, Thirdware tapped Mohammad Hamid, former Chief Executive Officer of Unison, to join its team as a principal within Thirdware Labs. “While there has been a proliferation of advanced technologies over the past 5-6 years, many large enterprises still struggle to evaluate, implement, and scale these technologies across multiple regions, hundreds of thousands of employees, and multiple business units. With the pedigree of Thirdware over the past few decades and partnerships with cutting-edge technology companies like BigML, we believe that Thirdware can bridge this gap,” said Hamid.

Commenting on the new partnership, Dr. José Antonio Ortega, BigML’s Co-founder and Chief Technology Officer, said: “Machine Learning is reaching a level of maturity that makes it feasible for any enterprise to adopt and automate sophisticated tasks formerly managed manually by trusted human experts. With BigML, this wave of innovation can be further standardized and streamlined, resulting in many robust and easily reproducible custom workflows, each augmenting a specific decision-making process. The combination of Thirdware’s pedigree in Enterprise IT services and BigML’s Machine Learning expertise will help countless enterprises make the transition to The Fourth Industrial Revolution by optimizing their businesses in a cost-effective manner.”

Next Steps & Contact Information

The path ahead for Thirdware customers now includes the option to utilize the BigML platform. Furthermore, Thirdware will provide professional services related to BigML including: data preparation, advanced feature engineering, advanced modeling and prediction strategy, model operationalization, and measuring the business impact of automated process by machine-learned models. For more information on Thirdware’s Machine Learning capabilities or the broader portfolio of capabilities in emerging technologies, please contact or

Adding Two Marquee Lecturers to our #DutchMLSchool Lineup

Our very first Machine Learning Summer School in The Netherlands (#DutchMLSchool) is taking place on July 8-11, 2019 in Breukelen, The Netherlands, in collaboration with Nyenrode Business University.

Today, we have two important announcements regarding the program, which is shaping up to be our strongest one yet. Firstly, we are very happy to share that BigML’s Chief Scientist, one of the founders of the field of Machine Learning, Professor Tom Dietterich, will be presenting. Professor Dietterich’s historical perspective on the evolution and the current state of Machine Learning is unmatched, so the audience will be treated to quite a journey through the most salient topics in both applied research and advances in various industry verticals.

Our second addition as a lecturer is none other than Enrique Dans, Professor of Information Systems at IE Business School, who has previously participated in our Valencian Summer School in Machine Learning 2017 to great fanfare. Dr. Dans will share his expertise on the overall business impact and the future strategic planning implications of innovative technologies such as Machine Learning across our global digitized economy with relevant use case examples.

Machine Learning Put to Use

One such great real-world example is from Rabobank‘s recent experience in deploying a custom Machine Learning solution for fraud detection built on top of the BigML platform. Below is a video about the implementation approach and early results, as well as how Rabobank adopted it across the organization, to whet your appetite for what’s to come your way should you attend the #DutchMLSchool.

As always, the program will blend a healthy mix of the most versatile Machine Learning techniques as well as real-world predictive use cases, so the participants will have a very grounded view of the possibilities that Machine Learning can unlock in their business contexts. Spread over the course of four days, attendees of different profiles will be able to find impactful content that best fits their roles:

  • Machine Learning for Executives – July 8 (day 1): A C-level course on Machine Learning, ideal for business leaders and senior executives in all industries. Attendees will be able to understand how Machine Learning can be adopted in any organization, focusing on the strategy to follow as well as the key points that managers should know when making decisions.
  • MAIN CONFERENCE: Introduction to Machine Learning – July 9 and 10 (days 2-3): A two-day crash course designed for business innovators, industry practitioners, as well as students, seeking a quick, practical, and hands-on introduction to Machine Learning to solve real-world problems.
  • Working with the Masters – July 11 (day 4): A full day of learning with the Machine Learning masters that helps put theoretical concepts into practice in a hands-on manner. This course is tailored for experienced business analysts, data scientists, and Machine Learning practitioners that wish to work on real-world data. Attendees will be able to bring their own data.

If you don’t want to miss out on this great opportunity to add to your hands-on analytical skills and be more knowledgeable about this foundational technology, be sure to register and take advantage of our early bird rates today!

Machine Learning for MBAs? Yes, they can!

Two weeks ago, I had the chance to conduct a workshop at the University of California, Berkeley’s Haas School of Business as part of Professor Gregory La Blanc’s Data Science and Strategy class for MBAs and business leaders. This meant showcasing a subset of the comprehensive Machine Learning capabilities of the BigML platform such as Models (Decision Trees), Logistic Regressions, and Ensembles while solving some example use cases centered around disease diagnostics and credit risk analysis. The best part was that those in the classroom got to replicate those use cases in their own BigML accounts instead of passively observing.

Haas School of Business


According to the syllabus, the objective of the Data Strategy course is to provide an understanding of the role of data and statistical analysis in managerial decision-making with a specific focus on the role of managers as both consumers and producers of information, illustrating how finding and/or developing the right data and applying appropriate statistical methods can help solve problems in business. As such, the main focus areas are developing literacy within the potentially intimidating field of quantitative analytics and the ability to assess existing business models from that analytical prism.

As an MBA who has followed a career trajectory spanning highly data-driven roles such as marketing analytics, software product management, and business intelligence, I have consistently been the beneficiary of an empirical approach informed by insights based on business data harvested from various systems of record.

Haas School of Business MBAs

After the workshop, I’m very encouraged to have seen the conviction and the resolve from tomorrow’s MBA candidates to own up to the “In God we trust; all others must bring data” mentality. In addition to that broader impression, I’d like to share some findings from an informal survey shared with the attendees.

  • The class had a good mix of those with technical degrees (engineering, math, etc.) and non-technical degrees.
  • Based on survey feedback, more than two-thirds of the class did not have any prior experience with Machine Learning whatsoever. The remaining ones had some limited exposure in the form of self-learning or a related class they took as part of their former technical education. With that said, none had practiced Machine Learning in their prior careers. All in all, they were newbies to Machine Learning.
  • On a very positive note, after the workshop, most respondents thought Machine Learning can be described as a more advanced form of analytics while some opined that it’s also increasingly a must-learn skill set for any white-collar professional. Interestingly, no attendees mentioned that Machine Learning is too complex and confusing or “overhyped” even though those were also offered as attitudinal choices. We’ve been observing this new behavior for multiple years now. Some refer to it as the Citizen Data Scientist movement even though I don’t much fancy that phrase but am fully in support of the core concept it represents.
  • Perhaps the most interesting feedback was related to the main motives in learning Machine Learning. Almost all respondents agreed that they would like to be able to better communicate with Machine Learning specialists or Data Engineers in their future jobs by having a good grasp of the core concepts of Machine Learning (e.g., cut through ‘hype’ or jargon) as well as being self-sufficient when it comes to discovering insights in business data they have direct access to. Following those top two reasons was the perception that Machine Learning has become a skill highly desired by employers, potentially giving them an edge when re-entering the job market. Close behind that third motivation was the fact that some find Machine Learning intellectually stimulating regardless of its implications for their future career. I suspect those were skewed to the left-brained ones with technical degrees.
  • Last but not least, almost everyone in the classroom thought they were likely to use BigML, especially when considering a new predictive use case where they have access to relevant business data.

I predict that future business leaders will follow in the footsteps of examples like NDA Lynn and won’t be afraid to autonomously initiate and execute their search for new business insights, with or without help from scientists and researchers in their organizations. We’ll keep tirelessly promoting the promise and potential of Machine Learning and see how far we can take this prediction.

Machine Learning Summer School in The Netherlands: First Edition!

BigML and Nyenrode Business Universiteit are thrilled to announce the first edition of our Machine Learning Summer School in The Netherlands! The four-day event will take place at Nyenrode Business Universiteit, in Breukelen, and the program is designed to cater to different professional profiles and their needs:

  • Machine Learning for Executives – July 8 (day 1): A C-level course on Machine Learning, ideal for business leaders and senior executives in all industries. Attendees will be able to understand how Machine Learning can be adopted in any organization, focusing on the strategy to follow as well as the key points that managers should know when making decisions. Additionally, we will see several real-world success stories presented by companies that are currently applying Machine Learning techniques.
  • MAIN CONFERENCE: Introduction to Machine Learning – July 9 and 10 (days 2-3): A two-day crash course designed for business innovators, industry practitioners, and students seeking a quick, practical, hands-on introduction to Machine Learning to solve real-world problems. The content presented during these two days will serve as a good introduction to the kind of work students can expect if they enroll in an advanced Machine Learning or AI Master’s program.
  • Working with the Masters – July 11 (day 4): A full day of learning with the Machine Learning masters that helps put theoretical concepts into practice in a hands-on manner. This course is tailored for experienced business analysts, data scientists, and Machine Learning practitioners who wish to work on real-world data and real use cases; a unique opportunity to work with leading Machine Learning experts. Attendees will be able to bring their own data.


Nyenrode Business Universiteit, Straatweg 25, 3621 BG Breukelen, The Netherlands. See map here.


4-day event: July 8-11, 2019, from 8:30 AM to 5:00 PM CEST.


Please purchase your ticket(s) here. We recommend that you register soon as space is limited. You can join the complete four-day event for a full experience or just the courses you find most interesting!


You can check out the full agenda and other details of the event here.


Get to know the lecturers and speakers and other attendees during the networking breaks and dinners we offer after the sessions. We expect hundreds of locals as well as Machine Learning practitioners and experts attending from all around the world!

Do not hesitate to contact us if you would like to co-organize a Machine Learning School in your city, as we look forward to growing the Machine Learning Schools series!

Machine Learning Internship: Standing on the shoulders of giants

I took this photo at the Valencian Summer School in Machine Learning 2018. That was my second Summer School, but my first one as a BigML intern; my internship had started just a few days earlier. Since I published this tweet last September, things have changed a lot, but let me provide some context for it.

Internship tweet

What happened between the two Summer Schools? I realized that almost all my viewpoints about Machine Learning were wrong.

I belong to the most adaptive and agile generation ever. People call us Millennials. We were born during the dot-com bubble, and we lived through the dot-com crash. We saw the first iPhone keynote and the transformation from taxis to Ubers and from hotels to Airbnbs. We know hype well, and we’re starting to learn how to separate hype from real value. It was at my first Valencian Summer School, more specifically during Enrique Dans’ talk, that I decided to unlearn everything I had previously been told about Machine Learning.

I forgot about killer robots, machines replacing doctors, or trying to build KITT. Instead, I started to think about finding patterns in data that can help doctors make decisions, reduce energy waste, or save lives by preventing disasters.

In the same way, I forgot about unaffordable GPUs, countless hours spent programming every single line of every single ML algorithm, and the frustration of not being able to find the best hyperparameters for my model. Instead, I started to focus on the problem, not the tool, and let BigML do the rest for me. After all, why shy away from standing on the shoulders of giants?

And that was my philosophy during this internship. I got certified as a BigML Engineer, worked on multiple real-world use cases, and created workflows with WhizzML to perform feature selection. And then I met one of those giants to stand on: Jao, BigML’s CTO. With him, I started working on BigML’s backend, called wintermute.

I discovered the benefits of functional programming with Jao, and he even introduced me to the Emacs religion! The experience I gained with WhizzML helped me move forward and abandon the Algol family of languages. ClojureDocs was my homepage during those days, and it still is.

There is an interesting internal project I’ve been involved in that I would also like to mention: Neuromancer. With Neuromancer, we can see how well our resources scale, beyond Big-O notation. It lets us test possible optimizations for all of BigML’s models.
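Neuromancer itself is internal, but the general idea of measuring empirical scaling can be sketched independently: time a task at growing input sizes and fit the exponent in log-log space. Everything below (the sample task, the sizes, the function names) is a purely illustrative stand-in, not the actual tooling:

```python
import math
import random
import time

def fit_scaling_exponent(task, sizes):
    """Time task(n) at each input size and estimate the exponent b
    in t ~ a * n**b via a least-squares fit in log-log space."""
    points = []
    for n in sizes:
        start = time.perf_counter()
        task(n)
        points.append((math.log(n), math.log(time.perf_counter() - start)))
    # Ordinary least squares on (log n, log t): the slope is the exponent b.
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    num = sum((x - mean_x) * (y - mean_y) for x, y in points)
    den = sum((x - mean_x) ** 2 for x, _ in points)
    return num / den

def sort_task(n):
    # A near-linear (n log n) workload to profile.
    data = [random.random() for _ in range(n)]
    data.sort()

exponent = fit_scaling_exponent(sort_task, [20_000, 80_000, 320_000, 1_280_000])
print(f"estimated scaling exponent: {exponent:.2f}")
```

For a sorting workload, the fitted exponent lands near 1, which is the kind of beyond-Big-O, measured-in-practice answer such a tool gives you.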

Pablo González

Looking back, the journey has been long, but this is only the beginning. Now, as a full-time employee of BigML, I will keep contributing to our mission of democratizing Machine Learning as it penetrates all corners of the globe. Like a bamboo plant set on stable ground a while back, it already shows a few new shoots growing each and every day. But soon enough, when the roots are fully established underground, it will grow like crazy, positively impacting Millennial careers for decades to come.

Deep Learning, Part 3: Too Deep or Not Too Deep? That is the Question.

In my previous two posts in this series, I’ve essentially argued both sides of the same issue. In the first, I explained why deep learning is not a panacea, why machine learning systems will (now and likely always) fail, and why deep learning in its current state is not immune to these failures.

Deep Learning

In the second post, I explained why deep learning, from the perspective of machine learning scientists and engineers, is an important advance: rather than a learning algorithm, deep learning gives us a flexible, extensible framework for specifying machine learning algorithms. Many of the algorithms so far expressed in that framework give order-of-magnitude improvements over the performance of previous solutions. In addition, it’s a tool that allows us to tackle some problems heretofore unsolvable directly by machine learning methods.

For those of you wanting a clean sound bite about deep learning, I’m afraid you won’t get it from me. The reason I’ve written so much here is that I think the nature of the advance deep learning has brought to machine learning is complex and defies broad judgments, especially at this fairly early stage in its development. But I think it is worth taking a step back and trying to understand which judgments are important and how to make them properly.

Flaky Machines or Lazy People?

This series of posts was motivated in part by my encounters with Gary Marcus’ perspectives on deep learning. At the root of his positions is the notion that deep learning (and here he means “statistical machine learning”) is, in various ways, “not enough”. In his Medium post, it’s “not enough” for general intelligence, and in the Synced interview it’s “not enough” to be “reliable”.

This notion of whether current machine learning systems are “good enough” gets to the heart of the back and forth on deep learning. Marcus cites driverless cars as an example of how AI isn’t yet mature enough to be relied on 100%, and argues that AI needs a “foundational change” to ensure a safe level of reliability. There’s a bit of ambiguity in the interview about what he means by AI, but my own impression is that this is less a critique of machine learning and more a critique of the software around it.

For example, we have vision systems able to track and identify pedestrians on the road. These systems, as Marcus says, are mostly reliable but certainly make occasional mistakes. The job of academic and corporate researchers is to create these systems and make them as error-free as possible, but in the long run, they will always have some degree of unreliability.

Something consumes the predictions of these vision systems and acts accordingly; it is and always will be the job of that thing to avoid treating these predictions as the unvarnished truth. If the predictions were guaranteed to be correct, the consumer’s job would be much easier. As it is, consuming the predictions of a vision system requires some level of cleverness and skepticism. Maybe that cleverness involves awareness of separate sensor systems or other information streams like location and time of day. It might require symbolic approaches of the type Marcus favors. It might require more and very different deep learning, as Yann LeCun suggests. It might require something that’s entirely new.

Designing software that works properly with machine-learned models is hard. You have to do the difficult work of characterizing the model’s weaknesses and engineering around them. But critical readers should reject the notion that machine learning needs to provide extreme reliability on its own in order to be useful in mission critical situations. If a vision system can accurately find and track 95% of pedestrians, and other sensors and logic pick up the remaining 5%, you’ve arrived at “enough” without having a perfect model.
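To make that division of labor concrete, here is a toy sketch of a consumer that treats a vision model’s predictions with the skepticism described above. The detector output, the radar check, and the confidence thresholds are all hypothetical illustrations, not any real autonomous system’s logic:

```python
from dataclasses import dataclass

@dataclass
class Detection:
    label: str
    confidence: float  # the model's own score in [0, 1]

def plan_action(detections, radar_sees_obstacle):
    """Consume model output skeptically: act on confident detections,
    but fall back to an independent sensor when the model is unsure."""
    confident = [d for d in detections
                 if d.label == "pedestrian" and d.confidence >= 0.8]
    if confident:
        return "brake"       # trust the vision system here
    if radar_sees_obstacle:
        return "brake"       # vision missed it; other sensors pick up the slack
    if any(d.label == "pedestrian" for d in detections):
        return "slow_down"   # low-confidence detection: cautious, not blind
    return "proceed"

print(plan_action([Detection("pedestrian", 0.55)], radar_sees_obstacle=False))  # slow_down
print(plan_action([], radar_sees_obstacle=True))                                # brake
```

The point of the sketch is that the “remaining 5%” is handled by plain, human-comprehensible logic wrapped around the learned component, not by demanding a perfect model.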

When is “Enough” Enough?

So then the question becomes, “are we there yet?” with current ML systems. That depends, of course, on how good we think we need them to be for the engineers and domain experts to pull their outputs across the finish line. There are a lot of areas in which deep learning puts us within shouting distance, but in general, whether or not we’re there yet depends in turn on what you want the system to do and the quality of your engineers. When thinking about that question, though, it’s important to consider that the finish line might not be exactly where you think it is.

Consider the problem of machine translation. Douglas Hofstadter wrote a great article where he systematically lays bare the flaws in state-of-the-art machine translation systems. He’s right: for linguistic ideas with even a little complexity, they’re not great and are at times totally unusable. But the whole article reminded me of a blog post Hal Daumé III wrote more than 10 years ago, when he and I were both recent Ph.D.s. In it, he wonders how much of human translation is really better than computer translation when you consider everything (street signs, menus, simple interpersonal interactions, and so on). Again, he asked this more than ten years ago.

The point here is that if machine translation for these things is already noticeably better than the second-rate human translations we apply in practice (or was ten years ago), there’s already a sense in which the models we have are very much good enough. How it deals with more complex phrases and ideas is an interesting question, and might yield new research directions, but this is all academic as far as its applicability is concerned. The existing technology, imperfect as it is, has a use and a place in society.

Even less relevant is how “deep” the model’s knowledge is, or how “stupid” it is, or whether the algorithm is “actually learning” (whatever that means). These are all flavors of the “computers don’t really understand what they’re doing” argument that traces its way through Hofstadter, John Searle, Alan Turing, and dozens of other philosophers all the way back to Ada Lovelace. There are loads of counter-arguments (I have even spun out a few of my own versions), but maybe the most compelling reason to ignore these questions is that the answers are often less interesting than the answer to the question, “Can we use it?”

A number of years ago, my wife and I hosted two members of a Belgian boys choir that was on tour. Neither she nor I spoke any French, so we relied on Google Translate to communicate with them. To this day, I remember typing “We made a pie. Would you like some?” into my phone and watching their faces light up as the translation appeared. Did the computer understand anything about pie, or generosity, or the happiness of children, or how its own flawed translations could help create indelible memories? Probably not. But we did!

The Final Exam

Artificial Intelligence

The criticism that machine learning is not enough on its own to produce systems that exhibit reliably intelligent behavior is a broken criticism. Deep learning gets us part of the way towards such systems, perhaps quite a lot of the way, but does anyone think it’s necessary or even advisable to cede the entire behavior of, say, a car to a machine-learned model? Saying no doesn’t mean backing away from a fully-autonomous car; as Marcus himself points out, there are other techniques in AI and software at large that are better suited to certain aspects of these problems. There can be many layers of human-comprehensible logic sitting between deep learning and the gas pedal, and it’s likely the totality of the system, rather than the learned component alone, that will display behavior that we might recognize as intelligent.

Is it a flaw or a problem with deep learning when it can’t solve the aspects of these problems that no one really wants or needs solved? I don’t think so. Again, paraphrasing Marcus (and myself), machine learning is a tool. If you buy a nail gun and it jams, then yeah, that’s a problem with the nail gun, but if you try to use a nail gun to cut a piece of wood in half, that’s more of a problem with you. Deep learning is a very important step forward in the evolution of the tool (and a large one compared to other recent steps), but that step doesn’t change its fundamental nature. No matter what improvements you make, a nail gun is never going to become a table saw. Certainly, it’s unethical and bad business for tool manufacturers to make inflated claims about their tool’s usefulness, but it’s ultimately the job of the operator to determine which tool to use and how to use it.

Pundits can argue all day long about how impactful deep learning is and how smart machine learning can possibly be, but none of those arguments will matter in the long run. As I’ve said before, the only real test of the usefulness of machine learning is if domain experts and data engineers can leverage it to create software that has value for other human beings. Therein lies the power, the only real power, of new technology and the only goal that counts.
