
Grading our 2020 Oscars Machine Learning Predictions

The 2020 Oscars were presented on Sunday evening in Hollywood, California, in what turned out to be a night of some historic firsts. The second consecutive hostless edition of the award ceremony was reportedly watched live by an audience of 23.6 million, down from last year’s total of 29.6 million viewers. While all award show broadcasts these days are reporting declines in viewership, the Academy Awards still rule the roost by a healthy margin, with reverberations on social media continuing days after the event.

2019 Oscars

So how did we do with our predictions this year?

The short answer is we got 5 out of 8 predictions right. That’s not too shabby when you consider that if we hired chimpanzees to randomly throw darts at boards with the names of the nominees on them, it would take our furry friends 3,125 tries on average to correctly guess the five awards we got right. (NOTE: For the ubergeek in you, a perfect score of 8 out of 8 would take the chimp squad 703,125 tries on average.) As for human experts, we’re not aware of any high-profile movie critics who got every award right either.
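The chimp figures follow from simple counting. Here is a minimal sketch, assuming each dart lands on a nominee uniformly at random, attempts are independent, and the 2020 nominee counts were 9 for Best Picture and 5 for every other category we predicted. One attempt succeeds with probability 1 divided by the product of the nominee counts, so the expected number of attempts (the mean of a geometric distribution) is that product. The function and constant names are our own, for illustration.

```python
from math import prod

# Nominee counts per category at the 2020 Oscars (assumption stated above):
# Best Picture had 9 nominees, every other predicted category had 5.
CORRECT_CATEGORIES = [5, 5, 5, 5, 5]        # the five awards we called correctly
ALL_CATEGORIES = [9, 5, 5, 5, 5, 5, 5, 5]   # all eight predicted awards


def expected_tries(nominee_counts):
    """Expected number of random attempts until one attempt gets every pick right.

    Each attempt picks uniformly in every category, so it succeeds with
    probability p = 1 / (n1 * n2 * ...). Attempts are independent, so the
    number needed follows a geometric distribution whose mean is 1 / p.
    """
    return prod(nominee_counts)


print(expected_tries(CORRECT_CATEGORIES))  # 3125
print(expected_tries(ALL_CATEGORIES))      # 703125
```

Both results match the figures quoted in the post, which suggests they were computed the same way.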

So it’s safe to say our models were picking up some legitimate signals in what is essentially a fairly small sample (1,288 records) of past award data. More specifically, the number of positive examples in each held-out test set is tiny: there are just 20 winners in a given category over the last 20 years. This opens the door to potential overfitting. Contrast that with the millions of data points collected from sensors on a piece of machinery, which can be modeled to make much more robust predictions.

As it turned out, BigML was not alone in missing the mark on some of the high-profile award predictions this year. Some were counting on 1917, fresh off its Golden Globes win, to take home the big one, while others applied the “wisdom of the crowd” approach to arrive at the same conclusion. Those approaches fell short not only in prediction accuracy (which can happen with a smaller dataset) but also in that they neither described a repeatable, end-to-end process nor shared a public dataset for interested parties to use. More work is needed on that front.

2020 Oscars Results

As the table above shows, we swung and missed on the Best Picture, Best Director and Best Adapted Screenplay awards, but did very well with Best Original Screenplay as well as all four Actor and Actress categories. Without a doubt, South Korea’s Parasite bucking the trend to become the first foreign-language film to win the biggest awards, Best Picture and Best Director (Bong Joon Ho), had a lot to do with our respectable but less than perfect results.

In general, Machine Learning models, such as the classification models we built for this project, rely on the assumption that new data will not deviate drastically from the historical data they were trained on. While this helps produce robust, statistically sound results most of the time, it may also miss important departures from the norm like the one we witnessed on Sunday.

To be fair, our model gave Parasite the second-best score for the Best Picture award, 82/100, behind only Once Upon a Time…in Hollywood. And we did mention in our post that a Parasite win could not be ruled out. Similarly, Bong Joon Ho received a score of 55/100 for the Best Director award, which can be loosely interpreted as a 55% win probability, significant enough on its own.

If we put Parasite under the “microscope” (sorry, couldn’t help it :), we see that the producers and the director of the original crime drama tirelessly campaigned to generate grassroots interest at many international film festivals, which helped carry their momentum into the box office to the tune of $35M+ in the U.S. and $165M worldwide. That’s pretty impressive for an Asian production with a modest budget of $11M.

In past decades, movie studios like U.S.-based Lionsgate perfected the game of getting smaller-budget flicks to punch above their weight, as was the case with the 2006 surprise winner, Crash. Last year, on the other hand, we saw Alfonso Cuarón’s Mexican production Roma (another foreign-language release with subtitles) counted among the favorites. However, Roma grossed barely above $1M despite great reviews from art-house critics. So, in the 2020 edition of the Oscars, a perfect storm may have been brewing in front of our eyes: a foreign-language movie well-liked not only by critics but also by audiences in the U.S. and worldwide.

It’s perhaps very fitting that Parasite’s big wins coincided with the renaming of the “Best Foreign Language Film” award to “Best International Feature Film.” The welcome change shows the Academy adapting to a more inclusive point of view, one that considers world cinema not “foreign” or “other,” but part of the broader movie landscape. It also motivates us to expand our dataset next year to cover more international festivals such as Cannes and the Toronto International Film Festival (TIFF), which may help capture a more complete picture of worldwide movie fans’ sentiment.

The last award we predicted incorrectly was Best Adapted Screenplay. The winner, Jojo Rabbit, had a very low score of 1 out of 100, yet managed to beat The Irishman, which had 67. We’ll chalk that up to the “big miss” bucket and admittedly have some deeper digging to do to see whether there was an angle we overlooked or even an underlying data issue.

With this wrap-up, we at BigML, the pioneers of ML-as-a-Service, invite you to build your own models that will hopefully beat the ones we’ve shared here, so we can learn a few tricks from you too. The 2000-2019 movies dataset is public and awaits your time and creativity. In this exercise you will have the benefit of knowing the winners in advance, but it may still be great practice for the 2021 Oscars. Let us know how your results come out on Twitter @bigmlcom or shoot us a note any time!

Predicting the 2020 Oscars Winners with Machine Learning

This year, we continue the tradition of predicting the winners of the most anticipated movie awards of the year, the 92nd Academy Awards. This year’s nominees present an interesting mix of “sure bet” categories alongside some more contentious ones that may be ripe for at least some mild surprises. Either way, we’re excited to share our predictions and see how the Academy Awards pan out this Sunday!

Oscars 2020

The Data

If you’re up for some DIY Machine Learning fun, you can find our Movies dataset on the BigML Gallery and build your own models to actively join in the fun after you create a free account.

Machine Learning models typically improve with more data instances, so we kept all the data and features we used for previous years’ predictions and added data from 2019, for a total of 1,288 movies nominated for various awards from 2000 to 2019. In the resulting dataset, each film has 100+ features, including:

  • Film characteristics such as synopsis (new), duration, budget, and genre.
  • Film evaluation measures in IMDB like viewer votes, ratings, and Metascore.
  • This year’s nominations and winners for 20 key industry awards including Golden Globes, BAFTA, Screen Actors Guild, and Critics Choice.
2020 Oscars Dataset

2000-2019 Movies Dataset

The Model(s)

We tried multiple modeling approaches: OptiML, the optimization process on BigML that automatically finds the best-performing supervised model; individual Deepnets; and some Fusions, which combine multiple supervised models for potentially more robust performance. For each of the eight award categories, we trained separate models to see how the predictions would compare and which method would give the best results.
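To illustrate what “combining” supervised models can mean, here is a minimal sketch that averages per-class probabilities across models, with optional weights. This is an assumption about the general idea behind a Fusion, not BigML’s implementation; the function name `fuse_probabilities` and the example probabilities are hypothetical.

```python
from typing import Dict, List, Optional


def fuse_probabilities(
    model_outputs: List[Dict[str, float]],
    weights: Optional[List[float]] = None,
) -> Dict[str, float]:
    """Weighted average of per-class probabilities from several models."""
    if not model_outputs:
        raise ValueError("need at least one model output")
    if weights is None:
        weights = [1.0] * len(model_outputs)
    if len(weights) != len(model_outputs):
        raise ValueError("need exactly one weight per model")
    total_weight = sum(weights)
    # Every class seen by any model; a model that omits a class contributes 0.
    classes = set().union(*model_outputs)
    return {
        label: sum(w * out.get(label, 0.0) for w, out in zip(weights, model_outputs))
        / total_weight
        for label in classes
    }


# Hypothetical outputs of three models for one nominee.
deepnet = {"win": 0.8, "lose": 0.2}
ensemble = {"win": 0.6, "lose": 0.4}
logistic = {"win": 0.4, "lose": 0.6}

fused = fuse_probabilities([deepnet, ensemble, logistic])
print({label: round(p, 3) for label, p in sorted(fused.items())})  # {'lose': 0.4, 'win': 0.6}
```

Averaging tends to smooth out the idiosyncrasies of any single model, which is the “potentially more robust” part; the trade-off is that a confident, correct model can be diluted by weaker ones unless weights reflect their quality.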

Once our candidate models were created, we made Batch Predictions against the movies produced in 2019, which we had set aside in a separate dataset. We quickly found that the individual Deepnet models configured with the Automatic Network Search option gave more clear-cut answers for winners, so we decided to favor those to avoid a muddier picture.
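Setting aside a whole award year, rather than a random sample of rows, keeps nominees from the same ceremony out of training. Here is a minimal sketch, assuming a hypothetical and greatly simplified `Nomination` record (the real dataset has 100+ fields). It splits rows by year and counts the positive examples (winners) per category.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Nomination:
    """Hypothetical, heavily simplified row; the real dataset has 100+ fields."""
    title: str
    year: int
    category: str
    won: bool


def split_by_year(
    rows: List[Nomination], holdout_year: int
) -> Tuple[List[Nomination], List[Nomination]]:
    """Train on every earlier year; hold out one award year for batch prediction."""
    train = [r for r in rows if r.year < holdout_year]
    holdout = [r for r in rows if r.year == holdout_year]
    return train, holdout


def count_winners(rows: List[Nomination], category: str) -> int:
    """Number of positive examples (winners) for one category."""
    return sum(1 for r in rows if r.category == category and r.won)


rows = [
    Nomination("Film A", 2017, "best_picture", True),
    Nomination("Film B", 2017, "best_picture", False),
    Nomination("Film C", 2018, "best_picture", True),
    Nomination("Film D", 2018, "best_picture", False),
    Nomination("Film E", 2019, "best_picture", False),
]
train, holdout = split_by_year(rows, holdout_year=2019)
print(len(train), len(holdout), count_winners(train, "best_picture"))  # 4 1 2
```

With roughly one winner per category per year, training on 2000 through 2018 leaves at most 19 positive examples per category. That is the small-sample constraint these models work under.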

Deepnets PDP Oscars 2020

Deepnets Visualization for Best Picture of 2020

For example, the field importance report below belongs to the Best Picture model. It shows that fields like the synopsis, nominations in the BAFTA, Critics Choice, Online Film TV Association, Golden Globes and LA Film Critics Association awards, and wins in the Online Film Critics Society and People’s Choice awards all factored strongly into the final scores for each Best Picture nominee.

Deepnet Field Importance Oscars 2020

Best Picture Deepnet Summary Report

The Predictions

Without further ado, let’s predict the 2020 winners! For each category, we list the most likely winner along with the other nominees, sorted by decreasing score. Keep in mind that these scores aren’t supposed to add up to 100. Rather, they are “points” given to each nominee by the underlying Deepnet model on a scale of 0 to 100. Another way to look at it: the model tells us how a movie or artist with a given set of characteristics would do in a given award based on 19 years of historical data on that award, AND independent of the other nominees for the same award this year.
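Since each nominee is scored independently, picking a winner amounts to sorting by score. The sketch below does that and also rescales the scores into relative shares within a category. The rescaling is our own illustrative addition, not something the Deepnet does, and the nominee names and scores are hypothetical.

```python
from typing import Dict, List, Tuple


def rank_nominees(scores: Dict[str, float]) -> List[Tuple[str, float]]:
    """Sort nominees by their independent 0-100 scores, highest first."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def relative_shares(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores so they sum to 1 within a category (illustrative only)."""
    total = sum(scores.values())
    if total == 0:
        raise ValueError("all scores are zero")
    return {name: score / total for name, score in scores.items()}


# Hypothetical scores for one category; they need not sum to 100.
scores = {"Nominee A": 90, "Nominee B": 45, "Nominee C": 30, "Nominee D": 20, "Nominee E": 15}

ranking = rank_nominees(scores)
print(ranking[0])                            # ('Nominee A', 90)
print(relative_shares(scores)["Nominee A"])  # 0.45
```

The gap between the top two raw scores, not the absolute value of the winner’s score, is what signals a clear favorite.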

The Best Picture award is customarily presented last in the real ceremony, but we’ll turn that order upside down here so we get to start with a B-A-N-G! Our models (Deepnets, OptiML) gave a strong nod to Once Upon a Time…in Hollywood in what may be seen as a rather controversial choice. The projected winner gets a very high score, closely followed by the South Korean production Parasite and Martin Scorsese’s epic The Irishman, both of which have respectable scores of their own. On the other hand, the award season favorite 1917, fresh from its “Best Motion Picture – Drama” victory at the Golden Globes, didn’t fare too well according to our final model. If we are right, this won’t be the first time the Golden Globes and the Academy Awards go in separate directions.

Best Picture Oscar 2020

In the Best Director category, our model finds a considerably larger margin in scores and predicts that Sam Mendes will be the most likely recipient of the gold statuette. If that is correct, 1917 fans may find solace in this scenario despite falling short in the Best Picture category. That said, Todd Phillips of Joker, Quentin Tarantino of Once Upon a Time…in Hollywood and even Bong Joon Ho of Parasite received scores that are nothing to sneeze at, so a surprise win by any of them should not be seen as a complete shocker.

Best Director Oscar 2020

For Best Actress, Renée Zellweger in Judy has swept many awards this season and is as close to a shoo-in as one can hope for, with a score of 90 and a clear separation from the second favorite, Scarlett Johansson, who held her ground against the remaining nominees with a score of 45.

Best Actress Oscar 2020

The award with the largest score margin between the predicted winner and the next best nominee this year turned out to be Best Actor. It will indeed be considered a big upset if Joaquin Phoenix fails to walk away with the Oscar given how low the scores for his competition came out.

Best Actor Oscar 2020

The Best Supporting Actress category seems to be Laura Dern’s to lose, despite a score of 55 that is less than stellar by historical standards. Florence Pugh looks like the clear second choice, but she is not close in terms of score.

Best Supporting Actress Oscar 2020

The Best Supporting Actor award presents an interesting challenge, as both Joe Pesci and Al Pacino appear in The Irishman, arguably hurting each other’s chances despite decent scores. As a result, our model favors none other than Mr. Brad Pitt to pick up the award at the expense of those two veteran performers.

Best Supporting Actor Oscar 2020

Parasite seems to be the best bet for Best Original Screenplay with a score of 58 and a healthy margin against other nominees that our model predicts would make historically weak selections for this award.

Best Original Screenplay Oscar 2020

And last but not least, our model picks The Irishman as the winner of Best Adapted Screenplay. The margin between The Irishman and the second-placed Little Women tells us it will likely take more than luck of the draw for the other nominees to unseat the favorite.

Best Adapted Screenplay Oscar 2020

This concludes our 2020 Oscars predictions. As you get ready for the ceremony on Sunday, February 9th, you can now choose to make Machine Learning part of your night. Best of luck to all nominees and a big thanks to the Academy of Motion Picture Arts and Sciences (AMPAS) for carrying on the tradition since 1929!

BigML Customer Rabobank featured on ‘AI in Banking’ Podcast

There are many uses of Machine Learning in banking, an industry that is among the leaders in devoting resources to taking better advantage of proprietary datasets in order to gain customer insights.

Long-time BigML customer Rabobank is no exception. This week, Jan Veldsink of Rabobank was featured on the Emerj ‘AI in Banking’ podcast, talking about how his group is utilizing NLP and unsupervised learning techniques such as Topic Modeling as it strives to make sense of large collections of documents and their underlying textual content.

Emerj Rabobank Podcast

The Emerj podcast series is aimed at financial professionals looking to stay ahead of the curve as technologies like Machine Learning disrupt the sector. With an emphasis on real lessons learned, Emerj Founder Daniel Faggella interviews innovators at global organizations around the world to bring their insights to his audience.

Check it out and see if you can spot a few new ideas you can readily explore on BigML!

Presenting the ‘Best of Both Worlds’ Program for the Machine Learning School in Seville

The primary goal of the second edition of our Machine Learning School in Seville (#MLSEV) is to introduce basic as well as more advanced Machine Learning concepts and techniques that will help you boost your business productivity significantly. Our extensive experience in welcoming business and technical professionals of different backgrounds into the world of Machine Learning has taught us that one size does not fit all. Gradually, the topics covered and the structure within which those “nuggets” are presented to the audience have taken shape to become the 2-day event format we offer today. This format excels in that it delivers a “Best of Both Worlds” viewpoint by mixing hands-on technical sessions with sessions about challenges faced and lessons learned when implementing real-life Machine Learning systems.

MLSEV 2020

Let’s take a quick look at the highlights of what #MLSEV attendees can expect on March 26 and 27, 2020 at EOI Andalucía.


After opening remarks, Day 1 begins with Ed Fernández (Arowana) followed by Professor Enrique Dans (IE University). Jointly, they will give the audience a good understanding of the unfolding business impact of Machine Learning being enabled with modern software platforms like BigML. Next up, BigML’s Chief Scientist and one of the fathers of the discipline of Machine Learning, Professor Tom Dietterich takes the stage to talk about state-of-the-art Machine Learning techniques as well as where we’re likely headed in the coming years.

MLSEV Speakers

After some introductory hands-on technical sessions delivered by experienced BigML Machine Learning experts and a lunch break, we will delve into insightful afternoon presentations given by Michael Skiba (aka Dr. Fraud), Jan W Veldsink (Rabobank), and Roy Prayikulam / Kevin Nagel (INFORM). These sessions cover real-life Machine Learning implementations in areas such as financial fraud detection and anti-money laundering (AML). The first day ends with a mini “Get your hands dirty” Machine Learning exercise using the BigML Dashboard, giving our attendees a chance to interact with some of the concepts covered during the first day.


We will kick off Day 2 with more technical sessions about some of the most versatile supervised, unsupervised and AutoML learning techniques on the BigML platform. The lecturers will not only convey the high-level concepts behind those approaches but also how they work in practice.

To complete the program, BigML partners represented by José Cárdenas (Indorama), Christina Rodríguez & Delio Tolivia (Talento Transformación Digital) and Andrés González ( take the baton as they explain how they helped implement different operational use cases delivering tangible benefits such as quality optimization and wait time minimization.

We will wrap up Day 2 by demonstrating how to push the Machine Learning models you built during the practice session at the end of Day 1 into production. This completes the picture, giving our attendees an in-depth understanding of the end-to-end Machine Learning process they can follow as they take their new knowledge and predictive use case ideas back to their workplaces.

To boot, all this content comes as a very affordable (sub-100 Euro/USD) package!

We strongly recommend that you buy your ticket today to not miss this opportunity to get up to speed with what Machine Learning can concretely deliver for your business. You will leave with a good understanding of the possibilities and a tangible list of ideas you can implement in short order, while meeting like-minded executives and professionals in a historic setting. Here’s to seeing you in Seville…

Looking back at BigML’s 2019 in Numbers

2019 has come to an end, and it seems to have gone by at warp speed, with so much happening in the fast-paced world of Machine Learning. Like the years just before it, 2019 witnessed a continued rise in interest in Machine Learning from all industries. Furthermore, many more businesses have moved beyond talking about the technology or running mini-experiments with it and are rolling out smart applications with embedded predictive capabilities in production. The focus is now turning to a different set of concerns, such as operationalization, automation, robustness, eliminating bias, and, naturally, better assessing Return on Investment. When things happen so fast, it can be a challenge to stop and reflect on milestones and achievements, so below are the highlights of what made 2019 another special year for BigML.

BigML 2019 Summary Stats

Perhaps our most special milestone of 2019 was our platform crossing the 100,000-registration mark worldwide (as of this writing, past 110,000 registered users), adding to the milestones we have reached since our inception in 2011.

Our users keep making a difference in their workplaces, government agencies, as well as educational institutions all over the map putting the BigML platform to use in the most creative ways. If you’d like to dig deeper into some of those use cases, we’ve recently blogged about customer success stories on the BigML platform both from startups as well as large enterprises.

Continuous Product Updates via 166 Deploys

In 2019, we kept bringing new features and enhancements to our already comprehensive platform, which evolves based on feedback from customers solving a variety of real-life use cases. These include the addition of the Linear Regressions resource. Linear Regression, the granddaddy of statistical techniques, remains popular among Machine Learning practitioners as a baseline model since it trains fast and is easy to interpret.
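To make the “easy to interpret” point concrete, here is a minimal one-feature ordinary least squares fit. It is a sketch of the textbook technique, not BigML’s Linear Regression resource (which handles many input fields), and the function name is our own.

```python
from typing import Sequence, Tuple


def fit_simple_linear_regression(
    xs: Sequence[float], ys: Sequence[float]
) -> Tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x (one input field)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel out.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("x has no variance; slope is undefined")
    slope = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_simple_linear_regression([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)  # 2.0 1.0
```

The slope reads directly as “the target changes by this much per unit change in the input,” and the fit takes a single pass over the data, which is why the technique makes such a convenient baseline.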

Linear Regression

Another addition came in the form of AutoML. This first version of AutoML helps automate the complete Machine Learning pipeline, not only model selection. Give it training and validation datasets and it will give you a Fusion with the best possible models using the fewest possible features: ready to predict! The returned model is the result of three AutoML stages: Feature Generation, Feature Selection and Model Selection. AutoML also returns an evaluation of the model to show the user its performance.

Aside from major releases, we’ve also made many more smaller but noteworthy improvements to the BigML platform that can be tracked on our What’s New page in case you’d like to try out the ones you may have missed. All this activity was made possible by 166 production deployments throughout the entire year as directed by our product team. Expect more release announcements from us in early 2020. HINT: Some goodies that will bring new data types to the platform!

BigML Team

We’ve also kept BigML Tools updated to make sure insights from BigML resources find their way to complementary platforms. A new addition to that list in 2019 came in the form of the BigML Node-RED bindings, which make it easier to create and deploy ML-powered IoT devices using Node-RED, one of the most popular development environments for IoT.

Of course, anytime a new major release or enhancements are available those find their way to our multi-tenant environments hosted on (or immediately. After a brief monitoring period, the same capabilities are seamlessly added to the instances of our private deployment customers FREE of charge so that they stay current with the latest and greatest BigML has to offer.

38 Events and Machine Learning Schools

In 2019, we continued the tradition of organizing and delivering Machine Learning schools with the first Machine Learning School in Seville, Spain, followed by the first Machine Learning School in The Netherlands, both of which gathered great audiences mixing business professionals and academics. In 2020, the second editions of both events will pick up where we left off, in Seville, Spain, and in Breukelen near Amsterdam. You can already buy a ticket by following those links and secure your spot in advance!

In addition, BigML employees and advisors attended 36 business and academic conferences and events to inform the business community about their experiences in implementing real-life predictive applications or to share the latest product capabilities with fellow Machine Learning experts and enthusiasts.

234 new Ambassadors, 9 Certification Batches

Our Education Program saw continued growth in 2019, with the addition of 234 new applicants to help promote Machine Learning on their campuses. BigML Ambassadors span the globe and include students as well as educators. We intend to continue reaching out to universities and other educational institutions throughout 2020.

In addition, as part of BigML’s Internship Program, our interns made valuable contributions while gaining crucial work experience. We are thrilled to do our part in contributing to the careers of young guns in the Machine Learning world.

On the other hand, our team of expert Machine Learning instructors completed 9 new rounds of the BigML Certification Programs passing on their deep expertise to newly minted BigML Engineers. Furthermore, this year saw the inaugural BigML Architect Certification wave take flight. If you’re serious about delivering (to your clients) any sophisticated fully-automated smart applications that thrive on enterprise-grade data pipelines, certifications will help get you there.

More BigML Partners, More Verticals

In 2019, a number of new partners joined our BigML Preferred Partner Program. Most recently, in September 2019, INFORM GmbH, based in Aachen, Germany, joined BigML’s partner program to further ingrain best-practice machine-learned algorithms into the daily fight against financial crime with RiskShield. Under this agreement, INFORM will develop and offer access to RiskShield ML, powered by BigML, which combines INFORM’s rule-based and fuzzy logic approach with BigML’s top-notch Machine Learning resources to deliver a “Hybrid AI solution” to its customers.

Earlier in the year, in May 2019, Thirdware became a BigML partner. Thirdware is a leader in enterprise applications that has supported a range of Fortune 500 organizations as an implementation partner for infrastructure technologies such as enterprise resource planning, enterprise performance management, cloud services, and robotic process automation. In tandem with this partnership, Thirdware has formalized a new group within the company called Thirdware Labs and has brought on former Ernst & Young, Fiat Chrysler Automobiles, and Ford Motor Company executive Kristin Slanina as its first Chief Transformation Officer. You can follow Kristin’s thoughts on the opportunities Machine Learning represents in the automotive industry in this guest post. Stay tuned for more partnership announcements in different industry verticals in 2020!

44 Blog Posts

As usual, we’ve kept our blog running on all cylinders throughout the year. 44 new posts were added to our blog, which has long been recognized as one of the Top 20 Machine Learning blogs. Below is a selection of posts that were popular on our social media channels in case you’re interested in catching up with some Machine Learning reading in the coming days.

Looking forward to the rest of 2020…

This concludes our brief tour of what’s been happening around our neck of the woods in the last year. In the new year, we will proudly continue to help our customers bring Machine Learning to everyone in a simple yet beautiful form. Stay tuned as we along with our network of partners share more insights, customer success stories, new capabilities, and business milestones. Thanks for being part of BigML’s journey!

Meet the Distinguished Speaker Lineup for the Machine Learning School in Seville (#MLSEV)

Our second Machine Learning School in Seville, Spain (#MLSEV) is taking place on March 26-27, 2020 in collaboration with EOI Business School. Under the guidance of Program Chairman Juan Ignacio de Arcos of BigML, the two-day event is shaping up nicely, with a healthy balance of technical instructors and business leaders among the distinguished speakers. As usual, BigML’s seasoned team of instructors will leave no proverbial stone unturned in covering each technique our comprehensive platform offers, while industry expert speakers will detail what works (and what doesn’t) in deploying real-world Machine Learning systems.

MLSEV Speakers


We are also happy to share that BigML’s Chief Scientist, one of the founding fathers of the field of Machine Learning, Professor Tom Dietterich will be presenting as part of #MLSEV. Professor Dietterich’s historical perspective on the evolution and the current state of Machine Learning is unmatched so the audience will be treated to quite a journey regarding the most salient topics in both applied research and advances in various industry verticals.


In addition, Enrique Dans, Professor of Information Systems at IE Business School, who has been very positively received at previous Machine Learning schools, will share his expertise on the overall business impact and the future strategic planning implications of advanced analytics and Machine Learning across our globally digitized economy, along with relevant use case examples.


Machine Learning Put to Use

At #MLSEV, we will take a closer look at crime-fighting use cases, with special emphasis on financial fraud detection. One such real-world example is Rabobank’s recent experience in deploying a hybrid fraud detection solution that includes BigML models.

Moreover, this year’s speaker lineup includes representatives from INFORM GmbH, Talento Transformación Digital, and One thing is for sure, over the course of two days, attendees with different profiles will be able to find appealing content relevant to their roles.
Keep in mind that no prior experience in Machine Learning is required. Tickets are now available for a very affordable €70+IVA. Space is limited, so be sure to reserve your spot before it’s too late and take part in this unique event held in a host city known for its beauty, charming people, and immense cultural heritage!

BigML Customer Success Highlights – Part 2

In this post, we continue revealing BigML customer success stories that we kicked off with our last post detailing how a number of startups are basing their smart applications and services on the BigML platform. Those companies have profited from adopting BigML rather than taking the costly and risky approach of trying to build their own Machine Learning infrastructure that could divert their attention away from their core predictive use cases.

Today, we get into a potpourri of business problems tackled with the help of the BigML platform by large multi-national businesses. We see multiple scenarios play out as businesses with global footprints go about consuming Machine Learning. This also holds true for the sample of predictive use cases outlined in this post as we give you a glimpse of the motivation behind solving each reference application.

BigML Customers

Industry-specific use cases

Every industry contains a portfolio of data-rich workflows as part of the associated core operations and standard practices. Hard-coded business rules or knowledge-based approaches tend to govern many of those processes leaving room for further improvement with the introduction of Machine Learning approaches that frequently yield dramatic increases in productivity.

  • Rabobank, one of the largest banks in The Netherlands, is a great example of such a use case. Rabobank was faced with the challenge of having to manually analyze a very large volume of payment transactions to guard against potential financial transaction fraud. A set of heuristics and business rules existed but was difficult and time-consuming to manage. The team tasked with the monitoring was overwhelmed by the number of payments flagged by existing systems. There had to be a smarter way to deal with this situation without losing the gains made so far or continually adding headcount. As a result, they chose to focus efforts on a new Machine Learning-driven approach, letting the algorithms do the hard work of sifting through hundreds of thousands of transactions to reveal the highly anomalous ones. The resulting models pinpointed problematic transactions with high accuracy, which is why they were eventually embedded in Rabobank’s commercial fraud detection point solution. Fraud detection is not a “one and done” problem, so the models are continually monitored for covariate shift and automatically refreshed as new data arrives in order to stay ahead of the fraudsters.
  • In a somewhat similar vein, Seagate, the world-renowned computer hardware manufacturer headquartered in Silicon Valley, routinely manufactures and services millions of parts such as hard drives. These are covered under the company’s product warranty programs, which fraudsters are always looking to game with schemes like returning counterfeit parts in the hope of receiving the genuine article in return. BigML-based fraud detection models have successfully identified suspicious return patterns, helping Seagate’s customer service and security teams focus their limited attention on truly anomalous instances while minimizing false positives that could hurt customer satisfaction metrics.
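The post doesn’t say how the models are monitored for covariate shift. One common, simple statistic is the Population Stability Index (PSI), which compares how a feature’s values are distributed across bins in a reference window versus a recent window. Here is a minimal sketch of that swapped-in technique, assuming fixed bin edges chosen from the reference data; the function names and example values are hypothetical.

```python
import math
from bisect import bisect_right
from typing import List, Sequence


def bin_proportions(
    values: Sequence[float], edges: Sequence[float], eps: float = 1e-6
) -> List[float]:
    """Share of values in each bin defined by sorted edges (len(edges) + 1 bins).

    Empty bins get a small floor `eps` so the log term in the PSI stays finite.
    """
    if not values:
        raise ValueError("need at least one value")
    counts = [0] * (len(edges) + 1)
    for v in values:
        # bisect_right returns how many edges are <= v, i.e. the bin index.
        counts[bisect_right(edges, v)] += 1
    return [max(c / len(values), eps) for c in counts]


def population_stability_index(
    reference: Sequence[float], recent: Sequence[float], edges: Sequence[float]
) -> float:
    """PSI = sum over bins of (recent - reference) * ln(recent / reference)."""
    ref = bin_proportions(reference, edges)
    cur = bin_proportions(recent, edges)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


# Hypothetical transaction amounts (in thousands) from two time windows.
reference_window = [1, 2, 3, 4, 5, 6, 7, 8]
recent_window = [6, 7, 8, 9, 10, 11, 12, 13]
edges = [3, 6]

print(population_stability_index(reference_window, reference_window, edges))  # 0.0
print(population_stability_index(reference_window, recent_window, edges) > 0.25)  # True
```

A frequently quoted rule of thumb treats a PSI below about 0.1 as stable and above about 0.25 as a significant shift that may warrant retraining, but thresholds are best tuned per feature and per business tolerance.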

Enterprise support functions

Modern enterprises have complex ways to organize themselves into a multitude of functions, e.g., finance, marketing, sales, operations, legal, HR and more. Some of these functions are considered to be ‘core’ such as operations while others can be portrayed as ‘support’ functions. Because most companies that begin investing in Machine Learning have done so by creating central teams with advanced technical degrees, they tend to concentrate on a few use cases revolving around the core activities. This results in an imbalanced picture that starves ‘non-core’ functions of any Machine Learning capabilities save for basic ones baked into standard third-party SaaS tools.

  • Facing this very challenge in its Human Resources function, ABN AMRO chose to get on board with BigML to predict key employee metrics, e.g., the likelihood of employees leaving their positions in upcoming periods. With positive results in supporting ongoing retention efforts, this use case shows that with a Machine Learning platform like BigML (and some training), any enterprise function or department can reskill its employees and adopt a self-serve analytical approach, creating custom workflows and optionally integrating the resulting predictions into relevant IT systems to better adapt to the challenges it faces.

B2B platform use cases

  • In addition to the above, some situations call for embedding predictive capabilities in platforms in which a platform offering B2B services is the primary beneficiary. In such instances, automation is paramount, as is the ability to let analytical end-users at client businesses visually interpret the custom models they build on the B2B platform in a self-serve manner. Dun & Bradstreet represents such a scenario: they chose to integrate BigML’s resources into their Analytics-as-a-Service B2B platform, gaining time-to-market and scale while controlling costs by fully automating workflows on behalf of their clients.

The use cases explored on the BigML platform, whether by our Private Deployment customers or by the more than 100,000 registered users of our multi-tenant cloud platform (which offers a wide spectrum of subscription choices), are too many to list here.

The main lesson learned here is that the Machine Learning consumption behavior of large organizations cannot be pigeonholed into a few perfunctory scenarios, e.g., build vs. buy. The shades of grey do matter. However, we can make some broad recommendations. Presented with such a foundational piece of technology, one with the potential to eventually touch every operational process, businesses can benefit from a longer-term strategic approach to ML adoption rather than a purely use case-specific outlook that saves the day with incremental improvements.

The latter approach can at times be implemented satisfactorily through third-party point solutions with built-in predictive capabilities, generally based on the standard data models those tools contain, e.g., predictive features baked into a CRM tool. Nonetheless, this piecemeal approach may fall short when further customization is needed to better leverage custom data sources, and it may in fact result in unwanted system integration costs, leaving host businesses with siloed bespoke systems.

If you have a business problem similar to those above or have an idea for a new and potentially game-changing analytical use case in your industry, be sure to get in touch with us at We can swiftly match you with a BigML expert who can help you better formulate your approach by advising you on your data strategy, modeling (and evaluation) strategy, as well as your run-time prediction and deployment strategies.

In short, the BigML team is ready to help you have a merry Machine Learning-filled new year in 2020!

Registration Open for 2nd Edition of Machine Learning School in Seville: March 26-27, 2020

Following the successful reception of our First Edition, BigML, in collaboration with the EOI Business School, is launching the Second Edition of our Machine Learning School in Seville, which will take place on March 26 and 27, 2020. #MLSEV will be an introductory two-day course designed to teach the basic Machine Learning concepts and techniques that are impacting all economic sectors. This training event is ideal for professionals who wish to solve real-world problems by applying Machine Learning in a hands-on manner, e.g., analysts, business leaders, industry practitioners, and anyone looking to do more with fewer resources by leveraging the power of automated data-driven decision making.

Machine Learning School in Seville, 2nd Edition

Besides the basic concepts, the course will cover a selection of state-of-the-art techniques with relevant business-oriented examples such as smart applications, real-world use cases in multiple industries, practical workshops, and much more.


EOI Andalucía, Leonardo da Vinci Street, 12. 41092. Cartuja Island, Seville, Spain.


2-day event: March 26-27, 2020, from 8:30 AM to 6:30 PM CET.


Please complete this form to apply. After your application is processed, you will receive an invitation to purchase your ticket. We recommend registering soon: space is limited and, as with previous editions, the event may sell out quickly.


Lecturer details can be accessed here and the full agenda will be published as the event nears.

Beyond Machine Learning

In addition to the core sessions of the course, we wish to get to know all attendees better. As such, we’re organizing the following activities for you and will be sharing more details shortly:

  • Genius Bar. A one-on-one appointment to help you with your questions regarding your business, use cases, or any ideas related to Machine Learning. If you’re coming to the Machine Learning School in Seville and would like to share your thoughts with the BigML Team, be sure to book your 30-minute slot by contacting us at  
  • Fun runs. We will also go for a healthy and fun 30-minute run after the sessions. Details on the meeting point and time will follow. Stay tuned!
  • International networking. Meet the lecturers and attendees during the breaks. We expect hundreds of local business leaders and other experts coming from several regions of Spain as well as from other countries around the globe.

We look forward to your participation in our first Machine Learning school of the next decade!


BigML Customer Success Highlights – Part 1

Our post this summer on the 100,000 registered users milestone included an infographic of sample use cases being explored by BigML users, which naturally span many different sectors and industries. Today, we’d like to start a series of posts that highlights a subset of those business problems, giving readers who are considering new Machine Learning solutions some clues on how a comprehensive platform such as ours can be utilized in different business contexts.

BigML Use Cases

There are many ways to organize use cases, e.g., by industry, function, or geography. In this post, we will focus on startups and SMBs, giving you a glimpse of the motivation behind solving each reference use case. In a later post, we’ll concentrate on large multinational companies that are also finding success with the BigML platform.

Startups and SMBs have good reasons to prefer the BigML platform because it lets them affordably step into Machine Learning with ample room to scale their efforts as data volumes and the number of implemented use cases grow over time. Some startups have products and services that cannot even be launched without Machine Learning at their core (e.g., sensor-based medical diagnosis), whereas others grow into Machine Learning as they realize they are sitting on a hard-to-replicate or completely unique dataset that can fuel high-value predictive use cases and help differentiate their existing products.

Once useful models are in place, there are multiple systems integration and deployment choices. On the lighter side, predictions can be served in real time through the BigML REST API and included in a customer-facing user interface, for instance in a product or next-best-action recommendations module. On the other hand, if end-users are expected to interact with and interpret the models first-hand (rather than just consuming their predictions), the visualizations of BigML models can be made available in the host application in a white-label manner.

Predictive use case examples at ML-driven startups

Startup ML Use Cases

  • Juriblox B.V. is a European startup active in the legal services space. Their SaaS solution takes care of an oft-overlooked aspect of legal contract review and management: non-disclosure agreements (aka NDAs). The Juriblox service, named NDALynn, can quickly grade any NDA uploaded by its subscribers, letting them know not only the overall aggressiveness of the NDA but also which specific clauses are likely to cause problems down the road. All of these predictive capabilities baked into their web user interface are made possible by a number of BigML models accessed via the BigML API. Juriblox’s achievement is especially remarkable given that they didn’t have a data scientist or other highly paid dedicated analytical expert on staff. This example shows that a group of subject-matter experts with access to relevant data, a good understanding of their customers’ context, and a developer team can deploy sophisticated Machine Learning systems that are core to their offering.
  • Another BigML customer, Faraday, helps B2C businesses optimize demand generation by combining their customers’ CRM data with its national database containing key traits on over 125 million U.S. households. Faraday customers have been able to attribute as much as 1/3 of their sales to ML-driven cross-channel campaigns addressing all stages of the B2C revenue lifecycle, from customer acquisition to upsell and retention, e.g., social media advertising performance comparable to the best targeting that Facebook ML models can support.
  • Meanwhile, Frogtek helps Mexican micro-retailers better control and grow their businesses, as the company’s point-of-sale (POS) systems register every transaction. This data is a boon for Consumer Packaged Goods companies, which are starved for visibility into consumer behavior and preferences and can use it with Machine Learning to optimize operational efficiencies such as inventory management.
  • The potential applications of ML to automate accounting are many. For example, Anfix, a Spanish startup, helps clients predict the correct expense account that a given invoice belongs to. Before Machine Learning, this process could only be performed by an accountant with an in-depth understanding of the company’s operations. Automating such bookkeeping tasks allows financial professionals to focus their time on other activities that either deliver more value to their customers or help them find new ones. Additional predictive efforts include determining in advance whether a given company will run out of money at some point, enabling scenario planning based on short-, mid-, or long-term funding options. With this information, the company can begin negotiating with a bank early to obtain a loan on more advantageous terms.


We hope these use cases give you some ideas about the wide range of Machine Learning opportunities in your own setting. Please stay tuned, as our next post will follow up with use case examples from large multinational companies.

Are you a manager or professional at a startup (or SMB) evaluating your options to better take advantage of your proprietary data sources by implementing Machine Learning systems that integrate predictions into your value proposition? Be sure to get in touch with us at We can swiftly match you with a BigML expert who can help you better formulate your approach by advising you on your data strategy, modeling (and evaluation) approaches, as well as your run-time prediction and deployment options.

Accelerating Machine Learning Adoption in the Automotive Industry

A few weeks ago, I had the chance to participate in the Ford Innovation Day organized by BigML partner Thirdware. The two-day event included innovation projects ranging from conversational agents to predictive maintenance systems leveraging Machine Learning.


My presentation shared the title of this blog post and thus concentrated mainly on the prospects of Machine Learning for automotive companies. In some ways, the automotive industry is not that different from many other industries in the global economy, as it has struggled to find its footing when it comes to putting Machine Learning front and center at scale. That’s not necessarily due just to a lack of meaningful investment, either.

Let’s first take a step back and look at the broader trends. In its forward-looking Vision 2030 report on the automotive industry, McKinsey Global Institute finds that the next decade will bring slow (2%) growth in traditional vehicle sales and related aftermarket services. To boot, most of this growth in the traditional business segments will occur in emerging markets, driven by demographic and macroeconomic factors. Yet global automotive industry revenue is expected to increase by $1.5T (+30%) by 2030, thanks to new business models such as shared mobility and connectivity services. As a side note, shared mobility examples include car-sharing and e-hailing, while data connectivity services include specialized applications such as entertainment, remote services, and subscription-based software upgrades. In fact, 10% of cars sold in 2030 are expected to be shared vehicles, adding to special-purpose fleets and the mobility-as-a-service solutions popular in dense urban areas. Various flavors of Electric Vehicles (e.g., hybrid, plug-in, battery-electric, fuel cell) will make up as much as 50% of vehicles sold by that time!

To make this new landscape possible, new competing ecosystems with more diverse players will need to emerge to deliver a much more integrated customer experience. However, the common denominator of this future vision seems to be highly integrated intelligent software applications, with data-driven insights acting as the connective tissue in between. Permissioned data becomes the new currency of collaboration, software becomes much more central to everything, and it doesn’t take a genius to figure out that Machine Learning has a big role to play in this scenario.

Sounds great, right? Not so fast!

Back to today’s reality: another 2018 report, this time from Capgemini, found only modest gains (an increase from 7% to 10% of respondents year over year) in ML systems deployed at scale among automotive OEMs, suppliers, and dealers. Yet 80% of respondents mentioned Machine Learning as a strategic initiative. The tremendous gap between 10% and 80% is worth re-emphasizing.

ML is Hard

Potential use cases for automotive companies are many, touching both core operations (e.g., Predictive Maintenance, Supplier Risk Management) and support functions like Sales, Marketing, Finance, or HR. So what explains the slow uptake? Part of the answer lies in these companies’ previous expensive and failed attempts to deal with the inherent complexity of Machine Learning. In response, most industry players seem to have changed tack and applied a more measured approach to selecting use cases and projects. The surveyed companies that deployed at least three use cases at scale across their entire operations were dubbed “Scale Champions” in the report, which frankly is a pretty low bar considering the true upside. What’s more, the “champs” did a markedly better job of reskilling their workforces and putting a Machine Learning governance process in place.

Tale of Two Innovations

There’s something to be said for this top-down approach, often defined by executive mandates and buttressed by committees that define and prioritize the use cases of interest and identify risks and the rules of the road. Things then get handed down to IT teams and Data Scientists, who collaborate to implement them and roll them out to production. It’s certainly possible to make headway through this waterfall-like modus operandi, albeit at a greater cost and a slower pace.

However, there’s also a newly emerging, synergistic bottom-up approach. Thanks to a new set of easy-to-use MLaaS tools with low-code visual interfaces and built-in AutoML capabilities, subject-matter experts, analysts, and even business folks can be upskilled faster to autonomously explore their own predictive ideas, which would otherwise go unexplored altogether. In this decentralized model of embedding ML in many more business processes, standardized workflows and RESTful APIs play a critical role in deploying worthy predictive models with high signal-to-noise ratios to production systems, eliminating the need to rewrite them from scratch with heavy IT involvement. As a bonus, having a working ML governance framework already in place from the previous waterfall projects makes this agile approach even more effective at managing related organizational risks.

This new way of thinking seems to be gathering steam, with more industry thought leaders already singing the praises of an elevated level of accessibility. Take for instance Andrew Moore, who proclaimed:

“After years of hype around mysterious neural networks and the Ph.D. researchers who design them, we’re entering an age in which just about anyone can leverage the power of intelligent algorithms to solve the problems that matter to them. Ironically, although breakthroughs get the headlines, it’s accessibility that really changes the world. That’s why, after such an eventful decade, a lack of hype around machine learning may be the most exciting development yet.”

We wholeheartedly agree!



