
Machine Learning in Construction: Predicting Oil Temperature Anomalies in a Tunnel Boring Machine

Today, we continue our series of blog posts highlighting presentations from the 2nd Edition of Seville Machine Learning School (MLSEV). You may read the first post about the ‘6 Challenges of Machine Learning’ here.

One of the very interesting real-world use case presentations during the event was that of Guillem Ràfales from SENER. Founded in 1956 in Spain, SENER is a multi-national private engineering and technology group active in a set of diverse industrial activities such as construction, energy, environment, aerospace, infrastructure and transport, renewables, power, oil & gas, and marine.

SENER Projects

SENER’s Tunnel Construction Projects

Under its construction activities, SENER has successfully completed 19 large scale tunnel boring projects amounting to 80 kilometers of urban tunnels and a total of 224 kilometers of tunnels in the last 20 years. A great example is the high-speed railway service project in Barcelona. SENER delivered the 5.25 km segment near Gaudi’s architectural masterpiece Basílica de la Sagrada Família, a UNESCO World Heritage site.
Tunnel Boring BCN

Select technical specs of SENER’s project in Barcelona, Spain

Tunnel Boring Machines (TBMs) are used to perform rock-tunneling excavation by mechanical means. The main bearing of a TBM is the mechanical core of the colossal machine. It enables the turning cutter head and transmits the machine’s torque to the terrain. It is critical to keep the bearing properly lubricated at all times, often to the tune of 5,000 liters of oil. One of the ways to monitor TBM performance is to analyze the physical and chemical properties of the lubricant oil at regular intervals.


The operational benefits of applying Machine Learning and advanced analytics in the context of TBMs can be summed up as avoiding unnecessary wear, costly equipment breakdowns, and overall suboptimal performance that may result in cost and project delivery overruns.


With this consideration in mind, BigML has worked closely with SENER engineering teams to build models that predict gear oil temperature variations for their TBMs. The two main objectives of the project were to:
  • understand how various internal TBM parameters are related to temperature changes
  • try and predict such temperature changes to avoid machinery wear or failure

The team worked on a large dataset from a past SENER project that contained hundreds of measurements internal to TBM operations, sampled every 10 seconds. Some of the key measurements included torque, speed, pressure, and chamber material attributes. The fact that notable oil temperature variations tend to take place gradually and infrequently added to the overall challenge in the form of a highly unbalanced dataset. Despite these hurdles, BigML’s feature engineering and algorithmic learning resources were put to great use. The team was able to uncover key insights with the help of Association Discovery during the data exploration phase, followed by Anomaly Detection and Classification modeling that ultimately helped SENER technicians isolate an important subset of instances where the oil temperature increases could be anticipated in advance. The entire custom workflow was captured in the BigML platform for traceability, easy re-training, and automation purposes, as seen in the plot below.
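The post doesn't detail the exact models, but the core anomaly-detection idea can be sketched in miniature: flag oil-temperature readings that jump well outside their recent rolling window. This is purely an illustrative stand-in (the function name, window size, and z-score rule below are our own assumptions), not SENER's actual BigML workflow:

```python
from statistics import mean, stdev

def flag_temperature_anomalies(readings, window=30, z_threshold=3.0):
    """Flag readings that deviate sharply from the recent rolling window.

    With 10-second sampling, window=30 covers the last five minutes.
    Returns the indices of the flagged readings."""
    flagged = []
    for i in range(window, len(readings)):
        recent = readings[i - window:i]
        mu, sigma = mean(recent), stdev(recent)
        # Flag only when the reading sits far outside the recent spread.
        if sigma > 0 and abs(readings[i] - mu) / sigma > z_threshold:
            flagged.append(i)
    return flagged
```

A detector like this only surfaces candidate instances; the classification step described above would still be needed to decide which of them actually precede a genuine temperature increase.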

BigML for Construction

Custom BigML workflow for the SENER project.

If you’ve hung around thus far, it’s time to take a more in-depth look into this exciting project pushing the limits of Machine Learning-powered smart applications in the field of Construction Engineering. The end-to-end Machine Learning process underlying this endeavor was managed and presented by BigML’s own Guillem Vidal. Now, please click on the YouTube video below and/or access the slides on our SlideShare channel:

Do you have a Construction Engineering challenge?

Depending on your specific needs, BigML provides expert help and consultation services in a wide range of formats, from customized assistance to turn-key smart application delivery, all built on BigML’s market-leading Machine Learning platform. Do not hesitate to reach out to us anytime to discuss your specific use case at


MLSEV Conference Videos: ‘Six Challenges of Machine Learning’

We really enjoyed virtually hosting thousands of business professionals, developers, analysts, academics, and students during the two jam-packed days of training last week as part of the Seville Machine Learning School. To us, it was one more piece of evidence that Machine Learning is a global phenomenon that will keep positively impacting all kinds of industries as the world economy recovers from the effects of Novel Coronavirus.

As promised during the event, below are useful links to the material covered during MLSEV for your review and self-study as well as related pointers for follow up actions you can take.



One of the cornerstones of MLSEV was BigML Chief Scientist Professor Tom Dietterich‘s presentation on the State of the Art in Machine Learning. Professor Dietterich specifically talked about the Six Challenges in Machine Learning, providing the historical perspective for each point as well as the present-day state of affairs as it applies to advances in research. These six challenges are:

  1. Generalization
  2. Feature Engineering
  3. Explanation and Uncertainty
  4. Uncertainty Quantification
  5. Run-time Monitoring
  6. Application-Specific Metrics


The video above is a must-watch to find out more on each topic and get caught up with some of the best new ideas the Machine Learning research community has been able to offer in recent years. For brevity, we’d like to open up a special parenthesis for Explanation and Uncertainty, which is near and dear to our hearts at BigML.

ML Explanation and Uncertainty

In order for interpretability to make a difference, we need to refine the context in which explanations are needed. In many cases, that means understanding the end-user persona that will consume said explanations of the model and its predictions. Sometimes the persona is a Machine Learning engineer concerned with higher-order concepts like model performance metrics and overfitting. Other times, the end-user can be a frontline worker or end-consumer who will be looking for a simple cue (e.g., a recommendation for a similar product that’s in stock). A world-class smart application should be able to discern between the two scenarios while satisfying the needs of both sets of users.

We urge you to watch the rest of the video and find out more about these key topics. One more thing: in the coming weeks, we will be covering other focal topics and themes from MLSEV as part of this blog series, so stay tuned!

The 2nd Edition of Seville Machine Learning School gathers thousands virtually

Due to the COVID-19 (Novel Coronavirus) pandemic, we’re living through unprecedented times that have put life on hold practically everywhere on earth. This very much applies to business gatherings such as conventions, symposiums, training programs, and conferences. The 2nd Edition of our Machine Learning School in Seville was one of many such events that risked being canceled altogether. As the BigML team, we had to think on our feet and react to the turn of events. We quickly decided to deliver the content virtually and for FREE instead of telling the existing registrants, “Sorry folks, we’ll see you next time.” To be honest, we weren’t sure how the virtual event would be received given that people had a whole new set of priorities.

To our delight, thousands more than originally planned responded positively to the changes, as if to say, “We won’t let this public health crisis keep us from our longer-term business and career goals!” As the event drew near, we ended up with nearly 2,500 registrations from 89 different countries, representing all continents but Antarctica! Over 900 businesses from a diverse set of industries and close to 500 educational institutions made up our body of registrants. Overall, we observed a healthy 60%/40% split between business and academia, respectively.

MLSEV Registrations

When the event kicked off on March 26 at 9 AM Central European Time, we were pleased to host nearly 1,600 attendees live. They were happy to give us a shout out on Twitter too, seemingly enjoying the fact that they were making the best of their situation. Considering that CET is not the most convenient time zone for many other parts of the world, like North America, this was quite amazing!

MLSEV Tweets

MLSEV attendees sharing their experiences on Twitter

The high and participatory level of attendance from Day 1 fortunately carried over to Day 2, thanks to our distinguished mix of speakers, ranging from BigML’s Chief Scientist and one of the fathers of Machine Learning, Professor Tom Dietterich, to BigML customers (and partners) presenting real-life use cases, as well as our experienced instructors delving into state-of-the-art techniques on the BigML platform.

MLSEV Speakers

MLSEV Speakers connecting with the audience

Virtual Conferences are Here to Stay

Given this fresh experience in putting together our first virtual conference, we have a feeling this may be the wave of the future in a post-COVID-19 world that may drastically alter business travel habits and further limit opportunities to make contact at shared physical spaces. While one can easily make a case that human interactions online are not the same as in real life, we must also recognize that there are a different set of advantages in being virtual. Virtual events are perhaps best described not as perfect substitutes in that sense, but rather, as adjacent branches of the same tree. For those who are aspiring to organize virtual events in the future, here are a few pointers to take into account:

  • Coronavirus or not, life goes on. It pays off to have a parallel virtual event delivery plan even if your event is sticking with the good old co-location format.
  • Time zone differences are absolutely key for virtual conferences given spread-out speakers and attendees. Try and find the best balance based on the expected geographical center of gravity of your ideal audience.
  • Practice makes perfect. It’s best to schedule multiple dry-runs with each speaker prior to the event.
  • Have a ‘Plan B’ in case connection issues surface, and be ready to shuffle content around to avoid attendees twiddling their thumbs waiting for something to happen. Experienced moderators are key to carrying out those transitions smoothly.
  • Make it (nearly) FREE! Unless your business model revolves around selling event tickets, it’s better to convey your message to a larger audience that has taken the time to register (for FREE) for your event. Attention is the most valuable currency in a world where content is constantly doubling.
  • Count on word of mouth to spread the message more so than traditional marketing channels. If your value proposition is strong enough people will show up.
  • Hands-on experiences beat dry, theoretical presentations online, as most people can follow the steps involved in a virtual demo session (e.g., a Machine Learning industry use case) free of distractions in their home office or other personal space, provided that you give them access to the necessary tools. We made it a point to mention that attendees can take advantage of the FREE-tier BigML subscription at the beginning of the event.
  • Overcommunicate. This applies equally before, during and after the event. Tools like Slack, Mailchimp, your blog and social channels help make up for the lack of physical contact.
  • Simulate the real world as appropriate. We put together four parallel “Meet the Speaker” Google Meet sessions in between regularly scheduled presentation/demo sessions to simulate the coffee breaks in physical spaces and they turned out pretty popular.
  • The all too familiar linear narrative is broken in the online world so it’s best to embrace non-linearity by breaking your video and/or slide contents into digestible pieces and sharing them online shortly after the event.

Are you planning a Machine Learning themed event in 2020?

Let us know of your idea at and we’d be happy to collaborate on it to better serve your audience.

Meanwhile, stay safe and carry on!

Machine Learning Benchmarking: You’re Doing It Wrong

I’m not going to bury the lede: Most machine learning benchmarks are bad.  And not just kinda-sorta nit-picky bad, but catastrophically and fundamentally flawed. 

ML Benchmarking

TL;DR: Please, for the love of statistics, do not trust any machine learning benchmark that:

  1. Does not split training and test data
  2. Does not use the identical train/test split(s) across each algorithm being tested
  3. Does not do multiple train/test splits
  4. Uses fewer than five different datasets (or appropriately qualifies its conclusions)
  5. Uses fewer than three well-established metrics to evaluate the results (or appropriately qualifies its conclusions)
  6. Relies on one of the services / software packages being tested to compute quality metrics
  7. Does not use the same software to compute all quality metrics for all algorithms being tested

Feel free to hurl accusations of bias my way. After all, I work at a company that’s probably been benchmarked a time or two. But these are rules I learned the hard way before I started working at BigML, and you should most certainly adopt them and avoid my mistakes.

Now let’s get to the long version.

Habits of Mind

The term “data scientist” has had its fair share of ups and downs over the past few years. It can be at once a description of a person whose technical skills are in high demand and a code word for an expensive charlatan. Just the same, I find it useful, not so much as a description of a skill set, but as a reminder of one quality you must have in order to be successful when trying to extract value from data: you must have the habits of mind of a scientist.

What do I mean by this? Primarily, I mean the intellectual humility necessary to be one’s own harshest critic. To treat any potential success or conclusion as spurious and do everything possible to explain it away as such. Why? Because often that humility is the only thing between junk science and a bad business decision. If you don’t expose the weaknesses of your process, putting it into production surely will.

Few places make this more obvious than the benchmarking of machine learning software, algorithms, and services, where weak processes seem to be the rule rather than the exception. Let’s start with a benchmarking fable.

A Tale Of Two Coders

Let’s say you are the CEO of a software company composed of you and two developers. You just got funding to grow to 15. Being the intrepid data scientist that you are, you gather some data on your two employees.

First, you ask each of them a question:  “How many lines of code did you write today?”.

“About 200.” says one.

“About 300.” says the other.

You lace your fingers and sit back in your chair with a knowing smile, confident you have divined which is the more productive of the two employees. To uncover the driving force behind this discrepancy, you examine the resumes of the two employees. “Aha!” you say to yourself, the thrill of discovery coursing through your veins. “The superior employee is from New Jersey and the other is from Rhode Island!” You promptly go out and hire 12 people from New Jersey, congratulating yourself the entire time on your principled, data-driven hiring strategy.

Of course, this is completely crazy. I hope that no one in their right mind would actually do this. Anyone who witnessed or read about such a course of action would understand how weak the drawn conclusions are.

And yet I’ve seen a dozen benchmarks of machine learning software that make at least one of the same mistakes.  These mistakes generally fall into one of three categories that I like to think of as the three-legged stool for good benchmarking: 

  • Replications: the number of times each test is replicated to account for random chance;
  • Datasets: the number and nature of the datasets you use for testing;
  • Metrics: the way you measure the result of the test.

Let’s visit these in reverse order with our fable in mind.

3 legged stool

#3: Metrics

Probably the biggest problem most developers would have with the above story is the use of “lines of code generated” as a metric to determine developer quality. These people aren’t wrong: Basically everyone concludes that it is a terrible metric.

I wish that people doing ML benchmarks would muster this level of care for their metric choices. For instance, how many of the people who regularly report results in terms of area under an ROC curve (AUC) are aware that there is research showing that the metric is mathematically incoherent? Or that when you compare models using the AUC, you’ll often get results that are opposite those given by other, equally established metrics? There isn’t a broad mathematical consensus on the validity of the AUC in general, but the arguments against it are sound, and so if you’re making decisions based on AUC, you should at least be aware of some of the counter-arguments and see if they make sense to you.

And the decision to use or not use an individual metric isn’t without serious repercussions. In my own academic work prior to joining BigML, I found that, in a somewhat broad test of datasets and classifiers, I could choose a metric that would make a given classifier in a paired comparison seem better than the other in more than 40% of possible comparisons (out of tens of thousands)!  The case where all metrics agree with one another is rarer than you might think, and when they don’t agree the result of your comparison hinges completely on your choice of metric.

The main way out of this is to be either more specific or less specific about your choice of metric. You might make the former choice in cases where you have a very good idea of what your actual, real-world loss or cost function is. You might, for example, know the exact values of a cost matrix for your predictions. In this case, you can just use that as your metric and it doesn’t matter if this metric is good or bad in general; it’s by definition perfect for this problem.

If you don’t know the particulars of your loss function in advance, another manner of dealing with this problem is to test multiple metrics. Use three or four or five different common metrics and make sure they agree at least on which algorithm is better. If they don’t, you might be in a case where it’s too close to call unless you’re more specific about what you want (that is, which metric is most appropriate for your application).
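As a sketch of that advice, the check below computes several common classification metrics with one shared implementation and declares a winner only when they all agree. The helper names are our own, and the metric set is just one reasonable choice:

```python
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def precision(y_true, y_pred):
    tp = sum(t == p == 1 for t, p in zip(y_true, y_pred))
    predicted_pos = sum(y_pred)
    return tp / predicted_pos if predicted_pos else 0.0

def recall(y_true, y_pred):
    tp = sum(t == p == 1 for t, p in zip(y_true, y_pred))
    actual_pos = sum(y_true)
    return tp / actual_pos if actual_pos else 0.0

def f1(y_true, y_pred):
    p, r = precision(y_true, y_pred), recall(y_true, y_pred)
    return 2 * p * r / (p + r) if p + r else 0.0

def agreed_winner(y_true, preds_a, preds_b,
                  metrics=(accuracy, precision, recall, f1)):
    """Return 'a' or 'b' only if every metric picks the same winner;
    otherwise return None, meaning 'too close to call unless you are
    more specific about which metric matters'."""
    verdicts = set()
    for m in metrics:
        sa, sb = m(y_true, preds_a), m(y_true, preds_b)
        verdicts.add('a' if sa > sb else 'b' if sb > sa else 'tie')
    winners = verdicts - {'tie'}
    return winners.pop() if len(winners) == 1 else None
```

A `None` result is informative in its own right: it tells you the comparison hinges on the choice of metric rather than on the algorithms themselves.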

But there’s an even worse and more subtle problem with the scenario above. Notice that the CEO doesn’t independently measure the lines of code that each developer is producing. Instead, he simply asks them to report it. Again, an awful idea.  How do you know they’re counting in just the same way? How do you know they worked on things that were similarly difficult? How do you know neither of them is lying outright?

Metrics we use to evaluate machine learning models are comparatively well defined, but there are still corner cases all over the place. To take a simple example, when you compute the accuracy of a model, you usually do so with respect to some threshold on the model’s probability prediction. If the threshold is 0.5, then the logic is something like “If the predicted probability is greater than 0.5, predict true, if not predict false”. But depending on the software, you might get “greater than or equal to” instead. If you’re relying on different sources to report metrics, you might hit these differences, and they might well matter.
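A toy illustration of that corner case (the function name is ours): the same probability vector yields different labels depending on whether the software thresholds with "greater than" or "greater than or equal to".

```python
def predict_labels(probs, threshold=0.5, inclusive=False):
    """Turn probability predictions into boolean labels.

    Some packages use strict 'greater than', others 'greater than or
    equal to'; for scores landing exactly on the threshold, the two
    conventions disagree."""
    if inclusive:
        return [p >= threshold for p in probs]
    return [p > threshold for p in probs]

strict = predict_labels([0.3, 0.5, 0.7])                  # 0.5 -> False
inclusive = predict_labels([0.3, 0.5, 0.7], inclusive=True)  # 0.5 -> True
```

Models that emit many tied scores, such as small decision trees, can land a surprising fraction of a test set exactly on the threshold, so this difference is not always negligible.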

It almost goes without saying, but the fix here is just consistency, and ideally objectivity. When you compare models from two different sources, make sure the tools you use for evaluation are the same, and ideally not ones provided by either of the sources being tested. It’s a pain, yes, but if you’re comparing weights there’s just no way around buying your own scale. There are plenty of open-source reference implementations of almost any metric you can think of. Use one.

#2: Datasets

For the sake of argument, though, let’s assume that you have a good metric for measuring developer productivity. You’re still only measuring performance on the one thing each of your developers did yesterday! What if they’re writing python, and you’re hiring a javascript developer? What if you’re hiring a UI designer? What if you’re hiring a sales rep? Do you really think that the rules for finding a successful python developer will generalize so far?

Generalization can be a dangerous business. Those who have practiced machine learning for long enough know this from hard experience. Which is why it’s infuriating to see someone test a handful of algorithms on one or two or three datasets and then make a statement like, “As you can see from these results, algorithm X is to be preferred for classification problems.”

No, that’s not what I can see at all. What I see is that (assuming you’ve done absolutely everything else correctly), algorithm X performed better than the tested alternatives on one or two or three other problems. You might be tempted to think this is better than nothing, but depending on what algorithm you fancy you can *almost always* find a handful of datasets that show “definitively” that your chosen algorithm is the state of the art. In fact, go ahead and click on “generate abstract” on my ML benchmarking page to do exactly this!

This might seem unbelievable, but the reality is that supervised classification problems, though they seem similar enough on the surface, can be quite diverse mathematically. Dimensionality, decision boundary complexity, data types, label noise, class imbalance, and many other things make classification an incredibly rich problem space. Algorithms that succeed spectacularly with a dozen features fail just as spectacularly with a hundred. There’s a reason people still turn to logistic regression in spite of the superior performance of random forests and/or deep learning in the majority of cases: It’s because there are still a whole lot of datasets where logistic regression is just as good and tons faster. The “best thing” simply always has and always will depend on the dataset to which the thing is applied.

The solution here, as with metrics, is to be either more specific or less specific. If you basically know the shape and characteristics of every machine learning problem you’ll face in your job, and you have a reasonably large collection of datasets lying around that is nicely representative of your problem space, then yes, you can use these to conduct a benchmark that will tell you what the best sort of algorithm is for this subset of problems.

If you want to know the best thing generally, you’ll have to do quite a bit more work. My benchmark uses over fifty datasets and I’m still not comfortable enough with its breadth to say that I’ve really uncovered anything that could be said about machine learning problems as a whole (besides that it’s breathtakingly easy to find exceptions to any proposed rule). And even if rules could be found, for how long would they hold? The list of machine learning use cases and their relative importance grows and changes every day. The truth about machine learning today isn’t likely to be the truth tomorrow.

#1: Replications

Finally, and maybe most obviously: The entire deductive process in the fable above is based on only a single day of data from two employees. Even the most basic mathematical due diligence would tell you that you can’t learn anything from so few examples.

Yet there are benchmarks out there that try to draw conclusions from a single training/test split on a single dataset. Making decisions based on a point estimate of performance derived from a not-that-big test set is problematic for statistical reasons that aren’t even all that deep. It’s a shame, then, that single-holdout competitions like the sort that happen on Kaggle are implicitly training novice practitioners to do exactly this.

How do you remedy this?  The blog post above suggests some simple statistical tests you can do based on the number of examples in the test set, which is fine and good and way, way better than nothing.  When you’re evaluating algorithms or frameworks or collections of parameter settings rather than the individual models they produce, however, there are more sources of randomness than just the data itself.  There are, for example, things like random seeds, initializations, and the order in which the data is presented to the algorithm.  Tests based on the dataset don’t account for “luck” with those model-based aspects of training.

There aren’t any perfect ways around this, but you can get a good part of the way there by doing a lot of train/test splits (several runs of cross-validation, for example), and varying the randomized parts of training (seed, data ordering, etc.) with each split.  After you’ve accumulated the results, you might be tempted to average together these results and then choose the algorithm with the higher average, but this obscures the main utility of doing multiple estimates, which is that you get to know something about the distribution of all of those estimates.
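A rough sketch of that procedure in plain Python (illustrative names, ours): re-randomize both the split and the training seed on every run, and return the full score distribution rather than collapsing it to an average.

```python
import random
from statistics import mean, stdev

def repeated_evaluation(fit_predict_seeded, X, y,
                        n_runs=10, test_frac=0.2, base_seed=0):
    """Run many train/test splits, re-randomizing both the split AND
    the training seed each run, and keep the whole score distribution.

    'fit_predict_seeded' takes (X_train, y_train, X_test, seed) so the
    model-side randomness (initialization, data ordering) varies too."""
    scores = []
    for run in range(n_runs):
        rng = random.Random(base_seed + run)
        idx = list(range(len(X)))
        rng.shuffle(idx)                       # a fresh split each run
        cut = int(len(idx) * (1 - test_frac))
        tr, te = idx[:cut], idx[cut:]
        preds = fit_predict_seeded([X[i] for i in tr], [y[i] for i in tr],
                                   [X[i] for i in te], seed=base_seed + run)
        scores.append(sum(p == y[i] for p, i in zip(preds, te)) / len(te))
    return {"scores": scores, "mean": mean(scores), "stdev": stdev(scores)}
```

The returned `scores` list is the valuable part; the summary statistics are conveniences, not substitutes for looking at the spread.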

Suppose, for example, you have a dataset of 500 points. You do five 80%/20% training/test splits of the data, and measure the performance on each split with two different algorithms (of course, you’re using the exact same five splits for each algorithm, right?):

Algorithm 1: [0.75, 0.9, 0.7, 0.85, 0.9].  Average = 0.820

Algorithm 2: [0.73, 0.84, 0.91, 0.74, 0.89].  Average = 0.821

Sure, the second algorithm has better average performance, but given the swings from split to split, this performance difference is probably just an artifact of the overall variance in the data. Stated another way, it’s really unlikely that two algorithms are going to perform identically on every split, so one or the other of them will almost certainly end up being “the winner”. But is it just a random coin flip to decide who wins? If the split-to-split variance is high relative to the difference in performance, it gives us a clue that it might be.

Unfortunately, even if a statistical test shows that the groups of results are significantly different, this is still not enough by itself to declare that one algorithm is better than another (this would be abuse of statistical tests for several reasons).  However, the converse should be true: If one algorithm is truly better than another in any reasonable sense, it should certainly survive this test.
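Applying a paired t-test to the five splits above shows just how weak the evidence is. This is an illustrative calculation (helper name ours), and, as noted, passing such a test is necessary rather than sufficient:

```python
from math import sqrt
from statistics import mean, stdev

def paired_t_statistic(scores_a, scores_b):
    """t statistic over the paired per-split score differences.

    With five splits (four degrees of freedom), |t| must exceed
    roughly 2.78 to reach significance at the usual 5% level."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    sd = stdev(diffs)
    if sd == 0:  # identical difference on every split
        m = mean(diffs)
        return float('inf') if m > 0 else float('-inf') if m < 0 else 0.0
    return mean(diffs) / (sd / sqrt(len(diffs)))

alg1 = [0.75, 0.90, 0.70, 0.85, 0.90]  # the splits from the text
alg2 = [0.73, 0.84, 0.91, 0.74, 0.89]
t = paired_t_statistic(alg1, alg2)  # |t| is about 0.04, nowhere near 2.78
```

The split-to-split swings (as large as 0.21) dwarf the 0.001 difference in means, which is exactly the "random coin flip" situation described above.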

What, then, if this test fails?  What can we say about the performance of the two models?  This is where we have to be very careful. It’s tempting to dismiss the results by saying, “Neither, they’re about the same”, but the more precise answer is that our test didn’t give evidence that the performance of either one was better than the other.  It might very well be the case that one is better than the other and we just don’t have the means (the amount of data, the resources, the time, etc.) to do a test that shows it properly. Or perhaps you do have those means and you should avail yourself of them.  Beware the trap, however, of endlessly fiddling with modeling parameters on the same data.  For lots of datasets, real performance differences between algorithms are both difficult to detect and often too small to be important.

For me, though, the more interesting bit of this analysis is again the variance of the results. Above we have a mean performance of 0.82, with a range of 0.7 to 0.9.  That result is quite different from a mean performance of 0.82 with a range of 0.815 to 0.823. In the former case, you’d go to production with a good bit of uncertainty around the actual expected performance. In the latter, you’d expect the performance to be much more stable.  I’d say it’s a fairly important distinction, and one you can’t possibly see with a single split.
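That distinction is cheap to surface: report the spread alongside the mean. A minimal helper (our own naming) applied to the first set of splits from the example above:

```python
from statistics import mean, stdev

def summarize_scores(scores):
    """Report the distribution of scores, not just the point estimate."""
    return {"mean": round(mean(scores), 3),
            "min": min(scores),
            "max": max(scores),
            "stdev": round(stdev(scores), 3)}

summary = summarize_scores([0.75, 0.90, 0.70, 0.85, 0.90])
# The mean is 0.82, but the 0.70-0.90 range warns that production
# performance could land well away from that point estimate.
```

Two benchmarks with identical means but very different `stdev` values call for different amounts of caution before deployment.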

There are many cases in which you can’t know with any reasonable level of certainty if one algorithm is better than another with a single train/test split. Unless you have some idea of the variance that comes with a different test set, there’s no way to say for sure if the difference you see (and you will very likely see a difference!) might be a product of random chance.

Emerge with The Good Stuff

I get it. I’m right there with you. When I’m running tests, I want so badly to get to “the truth”. I want the results of my tests to mean something, and it can be so, so tempting to quit early. To run one or two splits on a dataset and say, “Yeah, I get the idea.” To finish as quickly as possible with testing so you can move on to the very satisfying phase of knowing something that others don’t.

But as with almost any endeavor in data mining, the landscape of benchmarking is littered with fool’s gold. There are so very many tests one can do that are meaningless, where the results are quite literally worth less than doing no test at all. Only if one is careful about the procedure, skeptical of results, and circumscribed in one’s conclusions is it possible to sort out the rare truth from the myriad of ill-supported fictions.

Changes to the 2020 Machine Learning School in Seville (VIRTUAL CONFERENCE)

As most event organizers, we have been monitoring the information about the progress of COVID-19 in Spain and around the world on a daily basis. Given the proximity of our Machine Learning School and the questions that we have been receiving, we’d like to share with you important changes to the event.

We have been trying to collect data about how the virus is evolving, but because the number of tests is not published it’s difficult to build a reliable model that goes beyond a time series. In any case, even if the current impact in Seville is low as temperatures are already high, the number of infected people in Spain is likely to increase significantly until the measures recently put in place take full effect. Your well-being is our utmost concern. Therefore, beyond the evolution of this disease in our geographical area, we must also consider all the potential regulations or individual company guidelines that can be put in place right before, during or following our event. Our interpretation is that the most common recommendation is to generally avoid traveling.

Virtual Conference

As such, we have decided that ML School Seville 2020 will take place according to the planned schedule, but only virtually. The lectures will be delivered via live webinars from different parts of the world to our registrants in many different locales. The conference will include a number of expert moderators who will compile questions during the sessions, and you will also have the opportunity to talk with our presenters LIVE. In addition, during coffee breaks, you can attend smaller sessions (maximum 15 people) with the speakers. As usual, MLSchool session materials will be made available on the SlideShare platform.

Moreover, we will be issuing refunds shortly to all our attendees so their participation will effectively be free of charge.

A special thank you to our speakers and those who have been steadfastly assisting our organization. We are doing our best to make this virtual conference experience a satisfying one given the circumstances. As they say, when life gives you lemons, make virtual lemonade! With that spirit in mind, we ask you to give us a hand in passing on the news and inviting friends or colleagues who may be interested in getting better at Machine Learning from the comfort of their home or office.

You will receive a message shortly with the relevant instructions on how to connect to the event on the mornings of March 26 & 27.


~The BigML Team

Registration Open for 2nd Edition of Machine Learning School in The Netherlands: July 7-10, 2020

We’re happy to share that we are launching the Second Edition of our Machine Learning School in The Netherlands in collaboration with Nyenrode Business University and BigML Partner, INFORM. This edition follows in the footsteps of the well-attended First Edition and will take place on July 7-10, 2020 in Breukelen near Amsterdam. It’s the ideal learning setting for professionals who wish to solve real-world problems by applying Machine Learning in a hands-on manner. This includes analysts, business leaders, industry practitioners, and anyone looking to boost their team’s productivity by leveraging the power of automated data-driven decision making.

Dutch ML School 2020

The four-day #DutchMLSchool kicks off with its Machine Learning for Executives program on Day 1 and continues with a jam-packed introductory two-day course optimized for learning the basic Machine Learning concepts and techniques that are applicable in a multitude of industries. On Day 4, the curriculum concludes with a hands-on Working with the Masters track that allows attendees to put into practice the techniques taught in the days prior by implementing an end-to-end use case starting with raw data.


Nyenrode Business University, Straatweg 25, 3621 BG Breukelen. See the map here.


4-day event: July 7-10, 2020 from 8:30 AM to 5:30 PM CEST.


Please follow this Eventbrite link to order your tickets. Act today to take advantage of our early bird rates and save. We recommend that you register soon since space is limited and the event may sell out quickly.


Lecturer details can be accessed here and the full agenda will be published as the event nears.

Beyond Machine Learning

In addition to the core sessions of the course, we wish to get to know all attendees better. As such, we’re organizing the following activities for you and will be sharing more details shortly:

  • Genius Bar. A one-on-one appointment to help you with your questions regarding your business, use cases, or any ideas related to Machine Learning. If you’re coming to the Machine Learning School and would like to share your thoughts with the BigML Team, be sure to book your 30-minute slot by contacting us at
  • International Networking. Meet the lecturers and attendees during the breaks. We expect local business leaders and other experts coming from European locales as well as from other countries around the globe.

We look forward to your participation!


Grading our 2020 Oscars Machine Learning Predictions

The 2020 Oscars were presented on Sunday evening in Hollywood, California in what turned out to be a night of some historic firsts. The second consecutive hostless edition of the award ceremony was reportedly watched live by an audience of 23.6 million, down from last year’s total of 29.6 million viewers. While all award show broadcasts these days are reporting declines in viewership, the Academy Awards still rule the roost by a healthy margin, with continued reverberations in social media for days after the event.

2019 Oscars

So how did we do with our predictions this year?

The short answer is we got 5 out of 8 predictions right. That’s not so shabby considering that if we hired chimpanzees to randomly throw darts at boards bearing the nominees’ names, it would take our furry friends 3,125 tries on average to correctly guess the five awards we got right, since each of those categories had five nominees (5^5 = 3,125). (NOTE: For the ubergeek in you, a perfect 8 out of 8 would take the chimp squad 703,125 tries on average: Best Picture had nine nominees and the other seven categories five each, i.e. 9 × 5^7.) As for human experts, we’re not aware of any high-profile movie critics who got every award right either.
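The dart-board arithmetic above can be checked in a few lines. The sketch below assumes five nominees per category, with nine for Best Picture as in 2020; the expected number of uniform random tries is simply the product of the nominee counts:

```python
# Expected attempts for a uniform random guesser (our chimp squad) to get
# every category right at once: the product of the nominee counts.

def expected_tries(nominee_counts):
    """Expected tries to guess all categories correctly by uniform chance."""
    result = 1
    for n in nominee_counts:
        result *= n
    return result

# The five categories we predicted correctly had five nominees each.
five_correct = [5, 5, 5, 5, 5]

# All eight categories: Best Picture had nine nominees in 2020, the rest five.
all_eight = [9] + [5] * 7

print(expected_tries(five_correct))  # 3125, i.e. 5**5
print(expected_tries(all_eight))     # 703125, i.e. 9 * 5**7
```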

So it’s safe to say our models were definitely picking up some legitimate signals in what is essentially a fairly small sample (1288 records) of past award data. To be more specific, the number of positive examples (there are just 20 award winners in a given category in the last 20 years) in each held out test set is pretty tiny. This opens the door for potential overfitting. Contrast that against millions of data points collected from sensors on a piece of machinery that can be modeled to make much more robust predictions.

As it turned out, BigML was not alone in missing the mark on some of the high-profile award predictions this year. Some were counting on 1917, fresh off its Golden Globes win, to take home the big one while others applied the “wisdom of the crowd” approach to arrive at the same conclusion. It seems those approaches fell short not only in terms of prediction accuracy (which can happen with a smaller dataset) but also because they neither described a repeatable, end-to-end process nor shared a public dataset for interested parties to utilize — more work needed.

2020 Oscars Results

As we analyze our misses in the table above: we swung at but missed the Best Picture, Best Director, and Best Adapted Screenplay awards, but did very well with Best Original Screenplay as well as all four of the Actor and Actress categories. Without a doubt, South Korea’s Parasite bucking the trend and becoming the first foreign-language film to win the big awards, Best Picture and Best Director (Bong Joon Ho), had a lot to do with our respectable but less than perfect results.

In general, Machine Learning models such as the classification models we built for this project rely on the assumption that newly presented data will not drastically deviate from the historical datasets they were trained on. While this helps produce robust results that are statistically significant most of the time, it can also miss important deviations from the norm like what we witnessed on Sunday.

To be fair, our model had Parasite sporting the second-best score of 82/100 for the Best Picture award, behind only Once Upon a Time…in Hollywood, and we did mention in our post that a Parasite win could not be ruled out. Similarly, Bong Joon Ho was given a score of 55/100 for the Best Director award, a significant showing in its own right.

If we put Parasite under the “microscope” (sorry, couldn’t help it :), we see that the producers and the director of the original crime drama tirelessly campaigned to generate grassroots interest in many International film festivals, which helped carry their momentum into the box office to the tune of $35M+ in the U.S. and $165M worldwide. That’s pretty impressive for an Asian production with a modest budget of $11M.

In past decades, movie studios like U.S.-based Lionsgate perfected the game of getting smaller-budget flicks to punch above their weight, as was the case for the 2006 surprise winner, Crash. Last year, on the other hand, Alfonso Cuarón’s Mexican production Roma (another foreign-language release with subtitles) was counted among the favorites, yet grossed barely above $1M despite great reviews from art-house critics. So, in the 2020 edition of the Oscars, a perfect storm may have been brewing in front of our eyes: a foreign-language movie liked not only by the critics but also by U.S. and worldwide audiences.

It’s perhaps very fitting that Parasite‘s big wins coincided with the renaming of the “Best Foreign Language Film” award to “Best International Feature Film.” This welcome change shows the Academy adapting to a more inclusive point of view that treats world cinema not as “foreign” or “other,” but as part of the broader movie landscape. It also gives us the motivation to expand our dataset for next year to cover more international festivals such as Cannes and the Toronto International Film Festival (TIFF), which may help capture a more complete sentiment of worldwide movie fans.

The last award we predicted incorrectly was Best Adapted Screenplay. The winner, Jojo Rabbit, had a very low score of 1 out of 100 yet managed to beat The Irishman, which had 67. We’ll chalk that one up to the “big miss” bucket and admittedly have some deeper digging to do to see whether there was an angle we overlooked or even an underlying data issue.

With this wrap, as the pioneers of ML-as-a-Service here at BigML, we welcome you to build your own models that hopefully will beat what we’ve shared here so we can learn a few tricks from you too. The movies 2000-2019 dataset is public and awaiting your time and creativity. In this exercise, you will have the benefit of knowing the winners in advance, but it can still make great practice for the 2021 Oscars. Let us know how your results come out on Twitter @bigmlcom or shoot us a note anytime!

Predicting the 2020 Oscars Winners with Machine Learning

This year, we continue the tradition of predicting the winners of the most anticipated movie awards of the year, The 92nd Academy Awards. 2020 nominees present us with an interesting mix of “sure bet” categories alongside some more contentious ones that can be ripe for at least some mild surprises. Nevertheless, we’re excited to share our predictions and see how the Academy Awards pan out this Sunday!

Oscars 2020

The Data

If you’re up for some DIY Machine Learning fun, you can find our Movies dataset on the BigML Gallery and build your own models to actively join in the fun after you create a free account.

Machine Learning models typically improve with more data instances so we have kept all the previous data and features we considered for previous years’ predictions, and we added data from 2019, all of which amount to a total of 1,288 movies nominated for various awards from 2000 to 2019. In the resulting dataset, each film has 100+ features including:

  • Film characteristics such as synopsis (new), duration, budget, and genre.
  • Film evaluation measures in IMDB like viewer votes, ratings, and Metascore.
  • This year’s nominations and winners for 20 key industry awards including Golden Globes, BAFTA, Screen Actors Guild, and Critics Choice.
2020 Oscars Dataset

2000-2019 Movies Dataset

The Model(s)

We tried multiple modeling approaches: OptiML, BigML’s optimization process that automatically finds the best-performing supervised model; individual Deepnets; and some Fusions, which combine multiple supervised models for potentially more robust performance. For each of the eight award categories, we trained separate models to see how the predictions would compare and which method would give the best results.

Once our candidate models were created, we made Batch Predictions against the movies produced in 2019, which we had set aside in a separate dataset. We quickly found that the individual Deepnet models configured with the Automatic Network Search option gave clearer-cut answers for winners, so we decided to favor those to avoid a muddier picture.

Deepnets PDP Oscars 2020

Deepnets Visualization for Best Picture of 2020

For example, the field importance report below is that of the Best Picture Oscar. It shows that fields like the synopsis, nominations in the BAFTA, Critics’ Choice, Online Film TV Association, Golden Globes, and LA Film Critics Association awards, and wins in the Online Film Critics Society and People’s Choice awards all factored strongly into the final scores for each Best Picture nominee.

Deepnet Field Importance Oscars 2020

Best Picture Deepnet Summary Report

The Predictions

Without further ado, let’s predict the 2020 winners! For each category, we predict the most likely winner along with other nominees sorted by decreasing scores. Keep in mind that these scores aren’t supposed to add up to 100. Rather, they are “points” given to the nominee by the underlying Deepnet model on a scale of 0 to 100. Another way to look at this is that the model is telling us how a movie/artist with a given set of characteristics will do in a given award based on 19 years of historical data on that award AND independent of the other nominees for the same award this year.
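To make the score semantics concrete, here is a small sketch (the nominee names and numbers are made up, and the helper function is ours, not part of BigML): each nominee carries an independent 0-100 score, the predicted winner is simply the top scorer, and the scores need not sum to 100:

```python
# Hypothetical per-nominee scores: independent 0-100 "points" from the model.
# They are not probabilities and are not normalized across nominees.
scores = {
    "Nominee A": 82,
    "Nominee B": 74,
    "Nominee C": 55,
    "Nominee D": 12,
}

def predicted_winner(scores):
    """Pick the nominee with the highest independent score."""
    return max(scores, key=scores.get)

print(predicted_winner(scores))  # Nominee A
print(sum(scores.values()))      # 223 -- the scores don't add up to 100
```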

The Best Picture award is customarily presented last in the real ceremony, but we’ll turn that order upside down here so we get to start with a B-A-N-G! Our models (Deepnets, OptiML) gave a strong nod to Once Upon a Time…in Hollywood in what can be portrayed as a rather controversial choice. The projected winner gets a very high score, closely followed by the South Korean production Parasite and Martin Scorsese‘s epic, The Irishman, both of which have respectable scores of their own. On the other hand, the award season favorite 1917, which comes fresh from its “Best Motion Picture – Drama” victory in the Golden Globes, didn’t fare too well according to our final model. If we are right, this won’t be the first time the Golden Globes and the Academy Awards go in separate directions.

Best Picture Oscar 2020

In the Best Director category, our model finds a considerably larger margin in scores and predicts that Sam Mendes will be the most likely recipient of the gold statuette. If that is correct, 1917 fans may find solace in this scenario despite falling short in the Best Picture category. With that said, Todd Phillips of Joker, Quentin Tarantino of Once Upon a Time…in Hollywood and even Bong Joon Ho of Parasite received scores that are hard to sneeze at, so a surprise win from any of them should not be seen as a complete shocker.

Best Director Oscar 2020

For Best Actress, Renée Zellweger in Judy has swept many awards this season and is as good a shoo-in as one can hope for, with a score of 90 and a clear separation from the second favorite, Scarlett Johansson, who held her ground with a score of 45 against the remaining nominees.

Best Actress Oscar 2020

The award with the largest score margin between the predicted winner and the next best nominee this year turned out to be Best Actor. It will indeed be considered a big upset if Joaquin Phoenix fails to walk away with the Oscar given how low the scores for his competition came out.

Best Actor Oscar 2020

The Best Supporting Actress category seems to be Laura Dern‘s to lose despite a less-than-stellar score of 55, historically speaking. Florence Pugh looks like the clear second choice, though her score is not close.

Best Supporting Actress Oscar 2020

The Best Supporting Actor award presents an interesting challenge as both Joe Pesci and Al Pacino are in The Irishman thus arguably hurting each other’s chances despite decent scores. This means our model likes none other than Mr. Brad Pitt to pick up the award at the expense of those two veteran performers.

Best Supporting Actor Oscar 2020

Parasite seems to be the best bet for Best Original Screenplay with a score of 58 and a healthy margin against other nominees that our model predicts would make historically weak selections for this award.

Best Original Screen Play Oscar 2020

And last but not least, our model picks The Irishman as the winner of Best Adapted Screenplay. The margin between The Irishman and the second-placed Little Women tells us it will likely take more than luck of the draw for the other nominees to unseat the favorite.

Best Adapted Screenplay Oscar 2020

This concludes our 2020 Oscars predictions. As you get ready for the ceremony on Sunday, February 9th, you can now choose to make Machine Learning part of your night. Best of luck to all nominees and a big thanks to the Academy of Motion Picture Arts and Sciences (AMPAS) for carrying the tradition since 1929! 

BigML Customer Rabobank featured on ‘AI in Banking’ Podcast

There are many uses of Machine Learning in banking, as the industry is among the leaders in devoting resources to its proprietary datasets in pursuit of better customer insights.

Long-time BigML customer Rabobank is no exception. This week, Jan Veldsink of Rabobank was featured on the Emerj ‘AI in Banking’ podcast, talking about how his group is utilizing NLP and unsupervised learning techniques such as Topic Modeling as it strives to make sense of large collections of documents and their underlying textual content.

Emerj Rabobank Podcast

The Emerj podcast series is aimed at financial professionals looking to stay ahead of the curve as technologies like Machine Learning disrupt the sector. With an emphasis on real lessons learned, Emerj Founder Daniel Faggella interviews innovators at global organizations to bring their insights to his audience.

Check it out and see if you can spot a few new ideas you can readily explore on BigML!

Presenting the ‘Best of Both Worlds’ Program for the Machine Learning School in Seville

The primary goal of the second edition of our Machine Learning School in Seville (#MLSEV) is to introduce basic as well as more advanced Machine Learning concepts and techniques that will help you boost your business productivity significantly. Our extensive experience in welcoming business and technical professionals of different backgrounds into the world of Machine Learning has taught us that one size does not fit all. Gradually, the topics covered and the structure within which those “nuggets” are exposed to the audience have taken shape to become the 2-day event format we offer today. This format excels in that it delivers a “Best of Both Worlds” viewpoint by mixing hands-on technical sessions with sessions about the challenges faced and lessons learned when implementing real-life Machine Learning systems.

MLSEV 2020

Let’s take a quick look at the highlights of what #MLSEV attendees can expect on March 26 and 27, 2020 at EOI Andalucía.


After opening remarks, Day 1 begins with Ed Fernández (Arowana) followed by Professor Enrique Dans (IE University). Jointly, they will give the audience a good understanding of the unfolding business impact of Machine Learning as enabled by modern software platforms like BigML. Next up, BigML’s Chief Scientist and one of the fathers of the discipline of Machine Learning, Professor Tom Dietterich, takes the stage to talk about state-of-the-art Machine Learning techniques as well as where we’re likely headed in the coming years.

MLSEV Speakers

After some introductory hands-on technical sessions delivered by experienced BigML Machine Learning experts, and a lunch break, we will delve into insightful afternoon presentations given by Michael Skiba (aka Dr. Fraud), Jan W Veldsink (Rabobank), and Roy Prayikulam / Kevin Nagel (INFORM). These sessions cover real-life Machine Learning implementations in areas such as financial fraud detection and anti-money laundering (AML). The first day ends with a mini “Get your hands dirty” Machine Learning exercise using the BigML Dashboard, giving our attendees a chance to interact with some of the concepts covered during the first day.


We will kick off Day 2 with more technical sessions about some of the most versatile supervised, unsupervised and AutoML learning techniques on the BigML platform. The lecturers will not only convey the high-level concepts behind those approaches but also how they work in practice.

To complete the program, BigML partners represented by José Cárdenas (Indorama), Christina Rodríguez & Delio Tolivia (Talento Transformación Digital) and Andrés González take the baton as they explain how they helped implement different operational use cases delivering tangible benefits such as quality optimization and wait time minimization.

We will wrap up Day 2 with the demonstration of how to push to production your Machine Learning models that you will have built during the practice session at the end of Day 1. This will complete the picture by giving our attendees an in-depth understanding of the end-to-end Machine Learning process they can follow as they take their new knowledge and predictive use case ideas back to their workplaces.

To boot, all this content comes as a very affordable (sub-100 Euro/USD) package!

We strongly recommend that you buy your ticket today to not miss this opportunity to get up to speed with what Machine Learning can concretely deliver for your business. You will leave with a good understanding of the possibilities and a tangible list of ideas you can implement in short order, while meeting like-minded executives and professionals in a historic setting. Here’s to seeing you in Seville…
