
Finding Sense in March Madness with Machine Learning

If you are one of the approximately 54 million Americans who filled out a bracket to predict the NCAA Men’s Basketball tournament this year, odds are that your bracket was no longer perfect within the first 24 hours of the tournament, and substantially off track by the end of the opening weekend. Correctly predicting all 63 games (ignoring the 4 play-in games) is infamously difficult, with a probability that ranges from 1 in 9.2 quintillion to a much more manageable 1 in 128 billion, depending on who is counting. With such low odds, it is no surprise that Warren Buffett famously offered a $1 billion prize to anyone who correctly picks every winner.

Not familiar with how the NCAA basketball tournament works? Now is a good time to pause and check out this guide.


This year, the tournament once again lived up to its “March Madness” moniker, with a number of heavy favorites losing in the first two rounds. The chaos was headlined by an unprecedented first-round upset of #1 overall seed Virginia by long shot UMBC, and also claimed #1 Xavier, #2 North Carolina, and #2 Cincinnati as victims. With so many brackets officially busted, we decided to investigate how a Machine Learning approach would perform in predicting the remainder of the tournament.

Data Collection and Feature Engineering

In nearly all data analysis projects, data acquisition and wrangling constitute the greatest challenge and time demand. Fortunately, a well-structured data set of NCAA basketball games, extending back to the 1985 season, has been compiled by Kaggle and Kenneth Massey. While this data was not in a format that could be considered “Machine Learning-ready,” it did provide substantial raw material for engineering features. Our approach was to represent each team as a list of engineered features, such that each past basketball game could be represented by the combination of two lists (one for each team) plus an objective field containing the result of the game from the perspective of the first team listed. Because many of our features relied on data only collected from the 2003 season onward, we limited our final data set to 118 total features from the most recent 15 seasons.
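To make that representation concrete, here is a minimal sketch of the row layout; the feature names and values below are hypothetical stand-ins for the 118 engineered fields:

```python
# A minimal sketch of the row layout described above. Feature names and
# values are hypothetical stand-ins for the 118 engineered fields.

def make_training_row(team_a, team_b, team_a_won):
    """Flatten two per-team feature lists into one row, with an
    objective field from the perspective of the first team."""
    row = {}
    for name, value in team_a.items():
        row["a_" + name] = value
    for name, value in team_b.items():
        row["b_" + name] = value
    row["a_won"] = team_a_won  # objective field
    return row

# One historical game: the seeds are real, the other values invented
virginia = {"seed": 1, "win_streak": 9, "scoring_diff": 14.1}
umbc = {"seed": 16, "win_streak": 4, "scoring_diff": 5.3}
print(make_training_row(virginia, umbc, team_a_won=False))
```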

March Madness features table

The features used in this investigation belonged to several different categories:

  • Team performance: e.g., home/away wins, wins in close games, longest win streak, scoring differential, etc.
  • In-game statistics: e.g., free-throw percentage, three-point field goal percentage, average three-point field goal attempts, average rebounding difference, etc.
  • Ranking information: e.g., RPI, tournament seeding, Whitlock, Pomeroy, Sagarin, etc.

Model Training and Evaluation

Operating under the assumption that each season can be considered independently, we trained and compared four distinct supervised Machine Learning algorithms offered by BigML, using historical NCAA tournament and regular season data going back to 2003. Using the Python bindings, we implemented a cross-validation approach in which the results for each tournament were predicted using games from all other tournaments as training data. Given that 15 seasons of training data were available, the resulting evaluation is analogous to a 15-fold cross-validation, visualized in the boxplot below. Default parameters were used for each of the four algorithms investigated: random decision forests, boosted trees, logistic regression, and deepnets.
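In code, the season-by-season loop looks roughly like the sketch below, assuming the standard bigml Python bindings, one pre-built BigML dataset per season, and the hypothetical a_won objective field from the earlier sketch (resource IDs are placeholders):

```python
# Sketch of the leave-one-season-out loop, assuming the standard bigml
# Python bindings and one BigML dataset already built per season.
from bigml.api import BigML

api = BigML()  # credentials from BIGML_USERNAME / BIGML_API_KEY

season_datasets = {2003: "dataset/...", 2004: "dataset/..."}  # ... to 2017

scores = {}
for held_out, test_set in season_datasets.items():
    others = [ds for year, ds in season_datasets.items() if year != held_out]
    train_set = api.create_dataset(others)  # merge the other 14 seasons
    api.ok(train_set)
    # The same loop can be repeated with create_ensemble,
    # create_logistic_regression, etc. to compare the four algorithms
    model = api.create_deepnet(train_set, {"objective_field": "a_won"})
    api.ok(model)
    evaluation = api.create_evaluation(model, test_set)
    api.ok(evaluation)
    scores[held_out] = evaluation["object"]["result"]["model"]["average_phi"]
print(scores)
```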

March Madness model comparison

While these algorithms performed similarly to one another, we ultimately decided to apply our deepnets implementation since it had the smallest variance season-to-season and the greatest minimum performance, that is, it rarely performed poorly relative to the other methods.

March Madness deepnet model feature importance

Top 20 features used in the final deepnet model. Chart was created by downloading the CSV of field importances from BigML.

When investigating the field importance of the model, we found, interestingly, that neither team’s seed was the most important feature, although both ranked in the top 20. This indicates that our model returns predictions that consider more than simply tournament seeding. The top four features were quite consistent: the average scoring differential and margin of victory for the respective teams. This result suggests that how many points you score relative to the competition is perhaps a greater indicator of future wins than simply whether or not you have won in the past. Accordingly, teams with many blowout wins likely perform better than teams with equivalent records that have won by narrower margins. The absence of “close wins/losses” among the top features indicates that close games may be decided more often by chance than by determination.

Finally, a number of different ranking systems, including RPI, WLK (Whitlock), WOL (Wolfe), and POM (Pomeroy), were found among the top features. It should be noted that while each of these systems uses a different methodology to rank teams, they are very highly correlated with one another as well as with the tournament seeding. If interested, you can check out an exhaustive list of these different ranking systems and how they compare.

Deepnet Bracket Prediction

Filling out an NCAA tournament bracket certainly does not require an algorithm, and several popular heuristics exist that enjoy varying degrees of success. While some combination of gut instinct, alma mater loyalty, and even mascot preference informs most bracket decisions, the default method of selection is simply choosing the lower seed. While this pattern largely held in our machine learning-generated predictions, the efficacy of this method breaks down after the initial rounds, when the remaining teams become more evenly matched. In a typical bracket contest, these later rounds are also worth the most points.

In addition to picking the winner of each match-up according to our model, we have also color-coded the probability of each team winning. The intensity of color represents the probability of the result, with upsets being colored in red and pale colors indicating low confidence in an outcome.

BigML's bracket for March Madness

While conservative overall in its predictions, our model does not always choose the lower seed. In the East and Midwest, Villanova and Duke are expected to advance to the Final Four, although both Elite 8 match-ups are predicted with confidence not much higher than a coin flip. In the South, our model prefers #9 Kansas State over lower-ranked Kentucky, although Nevada is picked to advance to the semi-finals. Finally, in the West Region, our deepnet model has considerable confidence in Gonzaga and Michigan advancing, and prefers Michigan overall. Our projected championship game is a match-up between Big Ten tournament champion Michigan and perennial contender Duke, with the Blue Devils emerging victorious on April 2 for their 6th national title.

Tournament Simulations

While predicting the discrete outcome of a match-up is a compelling exercise, the frequency of upsets in the NCAA tournament reminds us that even very rare events inevitably occur. The next step was to explore the probabilities returned by our model in greater detail.

Rather than simply assuming the higher-probability result always occurs, we can instead simulate each game as an “unfair” coin flip according to the match-up probabilities returned by our model. That is, if Villanova is predicted to defeat West Virginia with 73% probability, there is still a considerable chance (27%) that West Virginia advances. By simulating tournament games in this manner, we introduce upsets into our predictions according to how likely they are to occur. In the end, we simulated the remaining games of the tournament 10,000 times. The results are summarized in the table below.
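The simulation itself is straightforward. Below is a minimal sketch for a four-team field using the East regional’s Sweet 16 teams; the 73% figure comes from the paragraph above, while every other probability is invented for illustration:

```python
import random

# Sketch of the "unfair coin-flip" simulation for the East regional's
# Sweet 16 field. The 0.73 comes from the text; all other probabilities
# are invented for illustration. win_prob[(a, b)] = P(a beats b).
win_prob = {
    ("Villanova", "West Virginia"): 0.73,
    ("Texas Tech", "Purdue"): 0.45,
    ("Villanova", "Texas Tech"): 0.60,
    ("Villanova", "Purdue"): 0.55,
    ("West Virginia", "Texas Tech"): 0.50,
    ("West Virginia", "Purdue"): 0.40,
}

def play(a, b):
    """Simulate one game as a biased coin flip."""
    p = win_prob[(a, b)] if (a, b) in win_prob else 1 - win_prob[(b, a)]
    return a if random.random() < p else b

def run_bracket(teams):
    """Play out single-elimination rounds until one team remains."""
    while len(teams) > 1:
        teams = [play(teams[i], teams[i + 1]) for i in range(0, len(teams), 2)]
    return teams[0]

counts = {}
for _ in range(10000):  # 10,000 simulations, as in the post
    champ = run_bracket(["Villanova", "West Virginia", "Texas Tech", "Purdue"])
    counts[champ] = counts.get(champ, 0) + 1
print({team: n / 10000 for team, n in counts.items()})
```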

The probability of each team advancing to the remaining rounds of the NCAA tournament. These values were calculated according to 10,000 simulations of the tournament.

Because the probabilities of events in the later rounds of the tournament are compound probabilities, we see results that at first glance may not seem consistent with the bracket produced above. For instance, although Villanova is favored over Purdue in a head-to-head match-up, Purdue* still has the highest probability of winning the entire tournament. This is a reflection of two factors:

  1. Purdue being more likely to advance out of the Sweet 16 round than Villanova.
  2. Differences in the likely outcomes against the other teams that could be faced in the Final Four rounds.

*Unfortunately, our model does not have the sophistication to factor in injuries to key players at this point, nor was it updated with data from the first two rounds of the tournament. Many experts agree that Purdue’s chances have taken a significant hit following the injury to 7-foot-2 center Isaac Haas, unless Purdue engineers can save the day.

Final Thoughts

While relying on a Machine Learning model to predict a bracket is far from a sure strategy, it can provide an alternative method to make interesting and compelling picks. By emphasizing the probability of the events, rather than the discrete outcomes, we can get a better sense of the frequency of upsets. Ultimately, the only way to know who will win the tournament is to play the games.


Rethinking the Legal Profession in the Age of ML

By now, Machine Learning is firmly in the public consciousness, as its wide impact is being felt across many industries around the world going through digital transformations. Although the spearheading ML applications have come from the usual suspects such as Internet companies and software firms, the waves of automation and data-driven decision making have recently been crashing on the shores of the Legal Services industry (article in Spanish).

A typical law firm in the Western world employs tens or even hundreds of attorneys specializing in different practice areas, e.g., intellectual property, corporate, civil, criminal, or constitutional law. The business of legal services remains perhaps the very definition of a human-driven industry, relying essentially on increasing headcount to scale to higher revenues. Such growth may well present some efficiencies, but there is no evidence of strong network effects letting a few players dominate the market. It is therefore even more important to make the best use of expensive human resources to succeed in this highly fragmented industry full of niche players.

What’s more, the legal profession is historically known as quite conservative in its business practices, since it is educated on precedent and is less forgiving of experimentation and failure. However, a combination of factors sweeping the industry is pushing more firms to reconsider this stance. For starters, clients are demanding faster, more intuitive, and more accessible legal advice delivered over multiple channels and geographies. In addition, billable hours for less sophisticated commodity work such as research or project management are being scrutinized more closely than those for reasoning and judgment.

In their 2018 predictions, the Legal Institute For Forward Thinking outlines that AI will be a ticket for admission as a driver of consistent, high-quality client experience. This suggests leading law firms will have to be run more like other companies, with an emphasis on operational efficiency. Those left behind will have to make do with less profitable clients and a shrinking client base.

How can ML make a difference?

Digitalization is the norm in today’s business environment, which means detailed data on legal evidence, contracts, legislation, and jurisprudence is all available in easily accessible digital formats. However, the bigger challenge remains making sense of this data deluge, which is where most law firms have been struggling to keep up. Unsurprisingly, many of them are turning to technology to deal with it without having to multiply the human experts on their payroll.

Typical legal practice tasks involve reviewing and generating documents, discovering useful associations, and understanding the motivation and behavior of the parties involved in a legal dispute. State-of-the-art Machine Learning techniques that work with unstructured data have a high degree of applicability to these tasks, in turn reducing the burden of excessive paperwork. For example, contract specifics like the parties involved, payment terms, or start and end dates can be automatically extracted and mapped for faster due diligence or anomaly detection.
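As a toy illustration of that kind of extraction (a real contract-analysis system would rely on trained NLP models rather than hand-written patterns), dates and payment terms can be pulled from contract text along these lines:

```python
import re

# Toy illustration only: production contract-analysis systems rely on
# trained NLP models, not hand-written patterns like these.
contract = """This Agreement is entered into by Acme Corp and Bolt LLC.
The term commences on 2018-04-01 and terminates on 2020-03-31.
Payment terms: net 30 days."""

dates = re.findall(r"\d{4}-\d{2}-\d{2}", contract)       # start/end dates
payment = re.search(r"Payment terms:\s*(.+)", contract)  # payment clause

print("Term dates:", dates)                # ['2018-04-01', '2020-03-31']
print("Payment terms:", payment.group(1))  # 'net 30 days'
```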

On the other hand, legal firms share administrative challenges with many other businesses, such as human resources management, pricing, forecasting, and customer relationship management. By some accounts, over 50% of partner and associate time is spent on such administrative tasks. The more efficiently these peripheral activities and their underlying processes run, the more profitable the firm becomes, as it leaves more resources to be creatively deployed toward new specializations and differentiated service offerings. As you may have come to suspect, with a little human expert help, Machine Learning can connect many of these dots better than humans alone can.

These opportunities are not merely forward-looking statements and wishful thinking either. As the old adage goes: the future is already here, it’s just unevenly distributed. In fact, we’re already witnessing ML being successfully introduced into more sub-domains of law, with use cases ranging from automated jurisprudence aids and predicting judicial decisions to predicting the success of claims. In all three examples, AI systems performed as well as, if not better than, collections of human experts. Are these the Google DeepMind moments of the legal industry? Time will tell.

Predictive Apps on BigML

As for BigML, thanks to our engagement with a leading North American law firm, we have implemented a solution that helps predict (in detail) future legal services demand, associated resource requirements, and optimal pricing for new matters by analyzing more than a decade’s worth of invoices and other expense reports. The resulting system gave partners and administrators unprecedented insight into cost drivers by matter type, jurisdiction, litigation team structure, and other case-specific factors. None of this could have been replicated even by the most experienced members of the firm.

Other BigML customers in the legal space keep adding to the creative ways ML innovations are deployed in the industry. For instance, NDA Lynn recently launched its automated NDA checker service, training its models first on hundreds and later thousands of variations of Non-disclosure Agreements. This collection of data produced interesting patterns that can serve as early warning signs for NDA Lynn customers looking to address any undue risks before agreeing to the terms stated in their NDA.

NDA Lynn

This simple, narrow-AI example will likely find its way to many other types of contracts over time, as digital data samples grow and the need to manage risk in a quantifiable way mounts in today’s ultra-competitive legal marketplace. Leading-edge law firms accordingly see the need to add many more ML-powered micro-services to their next-generation IT platforms, making lawyering more efficient, more accurate, and less labor-intensive. If this trend holds, CTO and CDO jobs in law firms may become a hotter commodity for top-notch technical professionals than they have been perceived so far, further attracting the best and brightest young lawyers who feel right at home working with ML-driven systems.

Should be a fun ride to see how it all unfolds and whether one of the oldest industries can pass its test against technology!

Predicting Air Pollution in Madrid

Air pollution is a tremendous problem in big cities, where health issues and traffic restrictions are continuously increasing. The concentration of Nitrogen Dioxide (NO2) is commonly used to determine the level of pollution. In Madrid, Spain, there are several stations in different parts of the city that are constantly collecting the NO2 levels. My colleague, Jaime Boscá, and I applied BigML to see if we could accurately predict air pollution in Madrid.

Air Pollution Map Spain

NO2 view from European satellite Sentinel-5P (Photo: ESA)

A set of alerts based on the NO2 levels (shown in the table below) have been defined to monitor and avoid high pollution levels.


Madrid government air pollution alert states

These alerts trigger measures that enforce traffic restrictions for Madrid citizens. The main problem is that these NO2 levels are usually reached at the end of the day, while the traffic restrictions take effect the next day. The affected population therefore has only a few hours to rearrange their means of transport for the following day, and the measures have drawn much criticism of the local government. Predicting such alerts would help warn the population in advance, giving them more time to reschedule their transportation plans.


Traffic restrictions due to air pollution are common in European big cities like Madrid or Paris (Photo: AFP)

Is it possible to predict which days will have pollution alerts?

Our goal is to predict a pollution alert (YES/NO) in advance by 1, 4, and 7 days. A pollution alert means that one of the previous alert levels has been reached.

Data collection

To address Madrid’s air pollution problem, we used three main data sources about the city:

  • Air quality data: hourly NO2 levels, gathered for years by multiple air measuring stations across the city.
  • Weather data: information available daily about temperature, rain, and wind.
  • Historical traffic data: detailed traffic load information available online for main streets and highways around Madrid.

The data used was collected from 2013 to 2017. To simplify the problem, we limited the analysis to zone 1 (shown below) as it includes most of the Madrid city area, which has the greatest number of air stations.


Map of air and weather stations in Madrid zone 1

Data transformations

Both the weather information and the pollution alert statuses are available daily, so the data has been represented with daily granularity: each sample (or instance) provides information for a given day. Aggregated weather and air measurements are therefore included as additional features per day.
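For instance, the hourly NO2 readings can be rolled up into daily features along the lines of the pandas sketch below; the file and column names are hypothetical:

```python
import pandas as pd

# Sketch of the hourly-to-daily roll-up; file and column names are
# hypothetical ("timestamp", "station", "no2").
hourly = pd.read_csv("no2_hourly.csv", parse_dates=["timestamp"])
day = hourly["timestamp"].dt.date

# Daily average and maximum NO2 per station
daily = (hourly.groupby([day, "station"])["no2"]
         .agg(no2_avg="mean", no2_max="max")
         .reset_index())

# Number of stations whose reading exceeded 150 µg/m3 each day
station_max = hourly.groupby([day, "station"])["no2"].max()
stations_over_150 = (station_max > 150).groupby(level=0).sum()
```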

We also considered traffic conditions and traffic predictions in our model. To include traffic predictions, we used another model to predict Madrid traffic, implemented in BigML using features such as weekdays and holidays. The evaluation results were promising, allowing us to use BigML traffic batch prediction results as features in our air pollution model. In the same way, temperature predictions were also modeled and used as features.
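Generating those prediction-based features might look like the following sketch, assuming the standard bigml Python bindings and pre-built resources (the IDs are placeholders):

```python
# Sketch of producing traffic predictions to use as features, assuming
# the standard bigml Python bindings; resource IDs are placeholders.
from bigml.api import BigML

api = BigML()
traffic_model = "ensemble/..."  # the separate Madrid traffic model
future_days = "dataset/..."     # rows of weekday/holiday features

batch = api.create_batch_prediction(traffic_model, future_days,
                                    {"output_dataset": True,
                                     "all_fields": True})
api.ok(batch)
# The output dataset, with a predicted-traffic column appended, can then
# be joined into the pollution training data as an extra feature.
```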

Predicting air pollution is a challenge. How many days in advance can we still obtain an acceptable prediction? We tried three horizons: 1, 4, and 7 days in advance. Each prediction uses a different time window for the same features.

Feature engineering

Most datasets can be enriched with extra features derived from existing data. In our case, we can use time-based information such as feature values for a previous date or the number of days since an event happened. We used the following features (a pandas sketch of these transformations follows the list):

  • NO2 averages and maximum values.
  • Maximum, minimum and average temperatures.
  • Rain, wind and traffic information.
  • Traffic predictions.
  • Number of days since the last alert.
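Here is the promised sketch of these time-based transformations, on a hypothetical daily dataframe with "date", "no2_avg", and a boolean "alert" column:

```python
import pandas as pd

# Sketch of the time-based feature engineering on a daily dataframe
# with hypothetical columns "date", "no2_avg", and a boolean "alert".
df = pd.read_csv("daily.csv", parse_dates=["date"]).sort_values("date")

# Lagged values: yesterday's average, and the maximum over the 5 prior days
df["no2_avg_1d"] = df["no2_avg"].shift(1)
df["no2_max_5d"] = df["no2_avg"].shift(1).rolling(5).max()

# Number of days since the last alert
last_alert_date = df["date"].where(df["alert"]).ffill()
df["days_since_alert"] = (df["date"] - last_alert_date).dt.days
```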


The datasets used, including all features, are available in the BigML Gallery.

Data exploration

The colored table previously mentioned shows the 3 air pollution alert levels defined in Madrid: “prior notice”, “notice”, and “alert”. Within the five years of available data, only “prior notice” and “notice” alerts occurred; a red “alert” never happened. The distribution of pollution alerts is also quite unbalanced, though luckily so: fewer than 100 “notice” and “prior notice” states were observed in total.

That’s why we decided to group alerts and create a boolean objective field to predict whether or not a pollution alert will be raised.

From our analysis, we can see that traffic load is directly related to NO2 levels (shown in the visualization below). We can also see that significant rain and wind have an impact on NO2 levels.


NO2, traffic load, wind and rainfall visualization

In the graph above, the daily maximum total precipitation is represented in blue and the maximum wind gust speed in orange. Traffic load is represented in green, while the average NO2 level is in grey. In general, high wind speeds and abundant precipitation seem to correlate with lower NO2 levels, as do low traffic loads.

The BigML scatter plots below support this correlation. The first graph displays the relation between a boolean indicating rain over 15 mm during the last 3 days and the average NO2 level. We can observe that all cases with rain over 15 mm correspond to NO2 levels under 55 µg/m³.


3 days rainfall over 15mm correlation to average NO2 level

The next graph displays the relation between the average daily maximum wind speed over the past 3 days and the NO2 level. When this wind speed is over 20 km/h, NO2 stays under 50 µg/m³.


3 days wind maximum speed correlation to average NO2 level


Predictive modeling involves evaluating models and comparing results to select the appropriate algorithms and their specific parameters. Initially, we tried the different algorithms available in BigML suitable for classification (models, logistic regression, ensembles, and deepnets). Ensembles gave the best results (see the model comparison in the evaluation section below). Using the WhizzML script SMACdown, we could automatically search the space of parameter settings for ensembles.


Modeling strategy


Initially, the dataset is split chronologically: data from 2013 to 2016 is used for training, and 2017 data is used as the test set for evaluations. The evaluation criterion is the Area Under the Curve (AUC) of the ROC curve, which graphically represents the trade-off between recall and specificity for classification problems. Since we have a very imbalanced dataset (the days with alerts are very few compared to the days without), we need to balance the model by applying a probability threshold. The optimal threshold was set to minimize the False Negatives (days predicted as alert-free that actually have an alert) without incurring too many False Positives (days predicted as having alerts that don’t). We compared all the available models using the BigML comparison tool to ensure we selected the best-performing one.
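The threshold search itself can be expressed as a simple sweep over candidate thresholds. Below is a minimal numpy sketch, where y_true and y_prob stand for the held-out labels and the model’s alert probabilities:

```python
import numpy as np

# Minimal sketch of the threshold search: y_true holds the actual alert
# labels (booleans) for the test year, y_prob the model's probabilities.
def pick_threshold(y_true, y_prob, min_recall=0.7):
    """Among thresholds reaching the target recall, keep the one with
    the best precision (i.e., the fewest false positives)."""
    best = None
    for t in np.arange(0.05, 0.95, 0.01):
        pred = y_prob >= t
        tp = np.sum(pred & y_true)
        recall = tp / y_true.sum()
        precision = tp / pred.sum() if pred.sum() else 0.0
        if recall >= min_recall and (best is None or precision > best[1]):
            best = (t, precision, recall)
    return best  # (threshold, precision, recall)
```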

Below we can find the field importance graphic for the ensemble used in the evaluation. The most important field is the number of stations having an NO2 measure over 150 µg/m³ the day before, followed by the NO2 average range the day before and the NO2 maximum range over the 5 previous days. Traffic prediction, rainfall, and wind fields also appear in the top 10.


BigML field importance graph: 1 day prediction ensemble

We can see in the figure below the different evaluations for predictions 1 day in advance. The boosted ensemble of 300 iterations (represented in orange below) gave the highest ROC AUC (0.8781).


BigML evaluations comparison tool: 1 day prediction

Once we have selected the best model by looking at the AUC metric, we need to look at its recall and precision to select the optimal threshold for making predictions. Recall is the number of true positives over the number of positive instances, while precision is the number of true positives over the number of positive predictions. The image below displays a BigML evaluation for the 1-day prediction with the probability threshold set to 27%. The model predicted 14 of the 19 actual alerts, a recall of 73.68%. It also incorrectly flagged 16 days that had no alert, which means a precision of 46.67% (14 of the 30 predicted alerts were real).


BigML ensemble evaluation: 1 day prediction

The chart below shows the recall and precision for the three predictions performed: 1, 4, and 7 days in advance.


Precision and recall results

As expected, the more days in advance we try to predict, the lower the performance. Nevertheless, making pollution alert predictions even one day in advance would already benefit citizens in their daily lives, as they are currently warned only a few hours ahead.

Taking this use case a step further, predicting pollution levels accurately and sufficiently in advance could even enable us to reduce high pollution levels, one city after another. Insights from Machine Learning aren’t meant to simply remain as additional information about our world – they are meant to be put to good use and improve people’s lives, in our businesses, societies, and beyond.

2ML Madrid Machine Learning: Keep your Business Ahead of the Competition

This week we saw the power of Machine Learning in action when BigML deepnets accurately predicted the winners of the major award categories of the Oscars 2018. The movie industry is simply one visible example where Machine Learning can be applied, but there are many more business-oriented real-world use cases that will be shown at the second edition of 2ML Madrid Machine Learning, to be held in Madrid, Spain, on May 8-9.

Barrabés and BigML are bringing to Madrid the second edition of the annual series of 2ML events where hundreds of decision makers, technology professionals, and other industry practitioners will gather to learn and discuss the latest developments in the fast-changing world of Machine Learning. 2ML is a game-changer event that helps you keep your business competitive, raises awareness about the best ways to integrate Machine Learning capabilities into your business processes, analyzes the current analytics landscape in leading industries, and showcases the impact that Machine Learning is already having in finance, the legal sector, marketing, sales, human resources, sports, social enterprises, and more.

Encouraged by the success of 2ML17, we are ready to continue raising awareness of the key role that Machine Learning plays in the transformation of sectors representing a wide swath of global economic activity.


Want to know more about #2ML18? 

Discover the impact that Machine Learning is going to have on your business while receiving valuable insights from innovators and early adopters that can help keep your company ahead of the competition. Don’t miss 2ML 2018’s jam-packed agenda presenting some of the brightest minds in the Machine Learning field today. Join us at #2ML18 on May 8-9 in Madrid, Spain, and be sure to purchase your ticket before March 28 to get a 30% discount!

2018 Oscars Predictions Proved Right: 6 out of 6!

Last night, Hollywood stars looked stunning on the red carpet at the 2018 Oscars ceremony. BigML’s Machine Learning algorithm was also on point: our predictions for the 2018 Oscar winners were correct, 6 out of 6!

BigML's 2018 Oscars Predictions

The BigML deepnets model accurately predicted the winners of the major award categories: best picture, best director, best actress, best actor, best supporting actress, and best supporting actor. The notable improvement over our 2017 Oscar Predictions is thanks to the powerful capabilities of our deepnets model, one of the top performing algorithms across different platforms.

2018 Oscars Predictions Results

Movies bring people together, from families getting cozy on their couches to individuals sharing their stories with complete strangers across the globe. For this reason, the entertainment industry is an exciting area in which to apply Machine Learning, as seen in the outpouring of reactions to BigML’s 2018 Oscars Predictions.

Thanks to everyone who has commented and joined the conversation! To mention a few, check out Enrique Dans’s blog recap, KDnuggets tweets, and the article by El País Retina (in Spanish). Head to BigML’s Twitter to see many more.

Can’t wait for next year’s Oscars! In the meantime, we look forward to many other cool applications of Machine Learning. Have ideas to share with BigML? We’d love to hear them at

Predicting the 2018 Oscar Winners

After the success of last year’s Oscar winner predictions, we are excited to announce this year’s predictions. Furthermore, this year we count on our powerful BigML deepnets and their automatic optimization option, which makes them one of the best performing algorithms across platforms.

This year there is a clear favorite, The Shape of Water with 13 nominations, but that doesn’t mean we aren’t witnessing fierce competition among a wide set of high-quality independent films with stunning performances. Models, however, don’t care much about this, as they don’t merely follow critics’ opinions. Instead, they search for patterns in the films that won in the past and make predictions for this year’s nominees. Ok… what data exactly?

The Data

Theoretically, models get better with more observations. Therefore, this year we are keeping all the previous data and features we had brought together for last year’s predictions. This amounts to a total of 1,183 movies from 2000 to 2017, where each film has 100+ features including:


The only major change in the data this year was the removal of the full user reviews from IMDB, since they didn’t prove important last year and the effort to obtain them is relatively high.

The Models

As before, we train a separate model per award category. For a change, this year we’ll use deepnets, BigML’s deep neural networks, instead of the ensembles we used last year. Using BigML deepnets with their unique first-class automatic optimization option (“Automatic Network Search”) is the simplest and safest way to ensure that we are building a top-performing classifier. Each model takes around 30 minutes to train, since it trains dozens of different networks in the background, but it is time well spent: the resulting model is very likely to beat anything you’d configure via trial and error.
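For reference, training one award-category deepnet with the automatic optimization enabled might look like the sketch below, assuming the standard bigml Python bindings and the API’s search flag (the dataset ID and objective field name are placeholders):

```python
# Sketch of training one award-category deepnet with automatic
# optimization, assuming the standard bigml Python bindings; the
# dataset ID and objective field name are placeholders.
from bigml.api import BigML

api = BigML()
deepnet = api.create_deepnet("dataset/...",  # movies from 2000-2012
                             {"objective_field": "winner",
                              "search": True})  # Automatic Network Search
api.ok(deepnet)  # blocks until the ~30-minute search completes
```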


Once the deepnet is created, we can easily inspect the most important features of the model and the impact of each on predictions. For example, in the case of Best Picture, we find several awards like the Critics’ Choice Awards, the Online Film and Television Awards, the Hollywood Film Awards, and the BAFTA Awards among the top predictors. To alleviate the fact that deep neural networks tend to be hard to interpret, BigML offers a unique deepnet visualization, the Partial Dependence Plot, to analyze the marginal impact of various fields on your predictions.


To ensure our model is a good classifier, we trained it on movies from 2000 through 2012 and then evaluated it on movie data from 2013 through 2016. For all award categories, we obtained a ROC AUC over 0.98, which means the models were able to predict the winners for four consecutive years (2013 through 2016) with few mistakes. For example, see below the confusion matrix for the Best Actress model, which correctly predicts the winner in 3 of the 4 test years.


The Predictions

Without further ado, let’s predict the 2018 winners! Drum roll, please…

For each category, you can find the winner as well as the scores predicted by the model for the rest of nominees.

The Shape of Water is the heavy favorite for Best Picture with a 91 score. However, the model also gives a respectable chance of awarding the prized statue to Three Billboards Outside Ebbing, Missouri with a 68 score.


For the Best Director category, the model has no doubts. Guillermo del Toro is the likely winner with a score of 75, and no other nominee comes close.


Similarly, for Best Actress there seems to be little competition. Frances McDormand is undoubtedly the favorite with a 99 score. Far behind, we find Margot Robbie with a score of 5.


Gary Oldman is predicted by the model as the winner for Best Actor, with a score of 88, for his amazing transformation as Winston Churchill in Darkest Hour. However, he will need to subdue Timothée Chalamet, the up-and-comer from Call Me By Your Name, who shows a score of 72 according to the model. Another strong rival is the consummate professional Daniel Day-Lewis, with a score of 51, for his role in Paul Thomas Anderson’s film Phantom Thread.


Among the five nominees for the Best Supporting Actress, the model favors Allison Janney for her role in I, Tonya with a 64 score.


The Best Supporting Actor category has more competition; however, Sam Rockwell seems the clear favorite for his role in Three Billboards Outside Ebbing, Missouri with a 95 score. That said, Willem Dafoe also has a decent chance for his performance in The Florida Project, with a score of 61.


This wraps up our 2018 Oscar predictions. So grab your popcorn and your favorite drink, and see who the real winners are this Sunday, March 4th. That is, unless Jimmy Kimmel and company come up with another jaw-dropping snafu to mess up our models, even if for a Hollywood minute!

The BigML Dashboard Released in Chinese

新年快乐!Happy Chinese New Year!

It’s only fitting that we release the BigML Dashboard in Chinese at this time of the year.

Since its very beginning, BigML has strived to make Machine Learning Beautifully Simple for Everyone (机器学习美观简单人人用). Today our journey reached another milestone by allowing over 1 billion people to use the BigML platform in their native language.

Top 20 Languages

You can change the website language on BigML by using the selector highlighted in the image below. While the web interface will appear different, all the BigML functionalities remain the same.

When you sign up, your BigML username will still have to be alphanumeric, but you can use Chinese in your “Full Name”. After logging on, all Dashboard features are identical to the English version. You can create and manage all resources and workflows the same way.

Dataset View

You can watch this video to check out the BigML Dashboard in Chinese:

Over time, we will make improvements such as providing more documentation and tutorials in Chinese, and integrating the BigML Blog and Events pages. Furthermore, our internationalization will continue with support for more languages. Let us know at what languages you would like to see on BigML next.

读万卷书,行万里路。For hundreds of years, this phrase has been the motto of Chinese intellectuals. It literally means reading tens of thousands of books, and walking tens of thousands of miles. We think it now applies to BigML: Learn millions of volumes of data and travel millions of miles, to reach every corner of the world.

Kicking off the BigML Internship Program

There has never been a better time to become proficient in Machine Learning, a field that is rapidly transforming the world as we speak. While there are hundreds of tutorials and courses available online, nothing compares to learning directly from the team that pioneered Machine Learning as a Service (MLaaS) in 2011 and continues to implement real-world Machine Learning solutions every day.

BigML Internship Program

How can you get this hands-on experience, you may ask? One of the best ways is becoming a BigML Intern to get a head start! We are excited to launch our Internship Program, which welcomes advanced students and young professionals to join the BigML Team. BigML is seeking interns for the summer of 2018 and throughout the year for the following roles:

  • Product Development
  • Software Engineering
  • IT
  • Marketing
  • Business Development

To give a taste of BigML Internship projects, here are two examples from past interns:

Every company, big and small, has data and the need to analyze it to transform it into actionable insights. For this reason, Machine Learning is becoming a common and expected tool in business settings. Regardless of your academic background and career aspirations, having a grasp of applied Machine Learning will serve you well in the future. Machine Learning skills are replacing the “proficient in Excel” requirement on resumes, and this progression will only continue in 2018. In fact, we are already seeing job applications that list BigML as a required skill (see minute 1:45).

BigML Interns 2018

The BigML platform continues to expand and improve thanks to our top-notch team, which has been international since day one. Our summer 2017 interns were no exception, representing a variety of backgrounds and coming from 4 different countries (China, India, Spain, and the USA). The BigML Team today consists of 35 members from 6 countries, representing 14 different cities. During 2018, BigML looks forward to bringing on board several interns who are eager to learn and put Machine Learning to good use.

If you are nodding your head at this point and thinking, “sign me up!”, please see the dedicated Internships page for more details about the program and how to apply. Feel free to contact us with any questions. Don’t hesitate, start your application today!

Really Automating Machine Learning at ML Prague 2018!

On March 23-25, 2018, more than 1,000 Machine Learning practitioners will gather at Machine Learning Prague 2018 to hear from 45 speakers currently working in the field. The event will take place over 3 days of fun and interesting talks, including “Really Automating Machine Learning” by Dr. Charles Parker, BigML’s VP of Machine Learning Algorithms, who will show what’s behind the scenes of Machine Learning automation, focusing on how to know when your automation is succeeding and what to do to make it easier. Interested in attending Dr. Parker’s talk or other sessions at Machine Learning Prague 2018? Purchase your ticket today using the code below to get a 10% discount!

We interviewed Dr. Charles Parker to share a glimpse of what’s to come in his talk at ML Prague 2018. Find the excerpts below.

In recent years, your career has focused on ML automation. What’s the motivation behind this topic?

The main reason is necessity: the amount of training data available to people is increasing. As it does, the model complexity we can consider grows more and more, which means we need new ways of addressing parameterization of all of these very complex models. Nobody, experts included, has the time to figure out, for example, the effects of all of the various parameters of all of the various ways of doing gradient descent. So these automation methods are sort of bound to become de rigueur in the field.

On a personal level, there’s something compelling to me about working on a layer atop all of these parameters. The problem you’re trying to solve when you’re parameterizing an ML algorithm is “what are the best parameters for this algorithm on this data?” At the automation level, your task is “what is the way to find the best parameters for any algorithm on any data, given finite compute power and time?” To me, there are a whole bunch of interesting questions in there about the distribution of datasets in the real world as well as the distribution of loss functions. What kinds of data do “most people” have, and how do “most people” know when they’ve got a good model? Knowing those priors would make automation a lot easier. However, not only do we not know what they are, I don’t think we even have a very good idea of how to figure out what they might be. That’s an exciting place to work.

What would you tell those who refuse to join the Machine Learning revolution?

I’m going to surprise everyone and say those people are absolutely right to be suspicious! Machine Learning is harder than you think. Using ML algorithms is fundamentally a different way of interacting with computers than most people are used to, and it takes some time and talent to develop a facility for that interaction. When we train ML algorithms, we’re really programming with data. As such, even though you don’t need to write code, you need to develop a lot of the same habits of mind that a programmer has: Good programmers learn early on to see their code as the machine sees it; good ML practitioners see their data as the machine sees it.

In addition to all of that, it’s also harder to measure the performance of a Machine Learning algorithm than you think, so even when people have done everything right with their data, they sometimes still don’t get what they think they will.

This isn’t an excuse to let it pass you by, though. Forty years ago, computers themselves were difficult to use and strictly the domain of geek enthusiasts. Those who developed the ability to interact with them early on put themselves in a position to succeed in a whole bunch of different careers. I think it’s very possible that Machine Learning will have the same sort of transformative power for those who take the time to learn how to do it right.

You have been developing Machine Learning applications for quite some time. Based on your experience, what would you recommend to new ML practitioners who are starting now?

The best way to learn is by practicing. I recommend getting some data and starting to play around: trying to predict things, introspecting models. If that’s too vague, check out the BigML education videos and try to find data from other sources where you can do the same things we do in the videos. This way, you get a sense of how Machine Learning data is “supposed to look,” which is maybe the first and most important skill when you start with Machine Learning.

There’s this misperception that you have to know how to write code to get started and that just isn’t true anymore. BigML is a fantastic playground for messing with these ideas because you don’t have to write any code or install any software. On top of that, we’ve got an interface that lets you interact with the models you train, which is hard to come by elsewhere. Sure, it sounds like a pitch for BigML, but if I didn’t think it was the best way for non-experts to do Machine Learning, I probably wouldn’t work here!

About the Lecturer

Dr. Charles Parker is the Vice President of Machine Learning Algorithms at BigML. Dr. Parker holds a Ph.D. in computer science from Oregon State University, and was previously a research associate at the Eastman Kodak Company where he applied Machine Learning to image, audio, video, and document analysis. He also worked as a research analyst for Allston Holdings, a proprietary stock trading company, developing statistically-based trading strategies for U.S. and European futures markets.

His current work at BigML is in the areas of Bayesian parameter optimization and deep learning. One of his recent projects was bringing Deepnets to the BigML platform: the world’s first deep learning service capable of Automatic Network Detection and Structure Search, which automatically optimizes model performance and minimizes the need for expensive specialists to drive business value. Dr. Parker ran a benchmark test with over 50 datasets representing a variety of use cases and industries. Based on extensive evaluations considering 10 different performance metrics, each based on 10-fold cross-validation, BigML Deepnets came out on top of 30 competing algorithms.

Additionally, Dr. Charles Parker regularly contributes to the BigML blog. Here are a few highlighted for your leisure:


BigML Connects with Corvallis Community at WiN PubTalk

On Tuesday night, the BigML Team presented at the February Willamette Innovators Network (WiN) PubTalk. This event was hosted by the WiN and the Corvallis Benton County Economic Development Office and held at Eat and Drink 101, a cozy restaurant in downtown Corvallis, Oregon.

BigML at the WiN PubTalk

The night kicked off with networking time, followed by a round of introductions in which anyone in the audience could give a 30-second pitch for their company. Next came the main presentation about BigML, given by Adam Ashenfelter, BigML’s Chief Data Engineer and Co-founder. Since much of the audience was new to ML, Adam first explained what Machine Learning is in a nutshell and how it is used in many industries for tasks like fraud detection, market segmentation, recommendation systems, and more. He touched on “the Good and the Bad of Machine Learning” and how BigML is working to solve these challenges:

  • The GOOD: thanks to cheap computation, data is abundant for all sorts of industries.
  • The BAD: many Machine Learning tools are difficult to use and require deep expertise. To boot, there are not enough ML experts to go around.
  • BigML’s SOLUTION: make ML tools 10x easier so everybody can get more from their data and make better decisions.

Adam also shared BigML’s background and our “locally grown Machine Learning.” As mentioned in our previous post about why BigML is in Corvallis, the BigML Team has strong roots in the town, with seven team members currently living there (three of whom are long-time locals). Oregon State University, with its leading programs in Artificial Intelligence and Machine Learning, played a big role in the founding of BigML.

BigML at the WiN PubTalk

Attendees represented a variety of industries and profiles within the local community including engineers from software startups, professors and students from Oregon State University, members from non-profit organizations, entrepreneurs, and other business people from Corvallis and other cities in Oregon such as Portland and Bend. Twitter was the largest company represented in the audience with a remote worker from Portland.

The crowd was engaged, asking questions about how Machine Learning could be applied to their current businesses. Tom Nelson, Economic Development Manager at the Corvallis Benton County Economic Development Office, commented:

“WiN had an impressive turnout at the PubTalk. We had standing-room only with over 60 in attendance, and we had a VERY cool company, BigML, to tell us about their company and Machine Learning. The topic was obviously a draw!”

Our team in Corvallis extends a big thank you to the Willamette Innovators Network for organizing this event, and we look forward to building more connections with the local community. If you are in the Corvallis area and would like to learn more about BigML, please reach out to us. Check out our photo album on Facebook and on Google+ to see more pictures from the event.
