Grading our 2021 Oscars Machine Learning Predictions

After a difficult year in moviemaking and show business in general, the rescheduled 2021 Oscars were presented on Sunday evening, marking the end of this year’s award season. Ultimately, history will be a better judge of the winning performances and productions, as well as of a ceremony that was hostless and low-key by the Academy’s standards. However, a few things come to mind as far as highlights are concerned:

a very different format devoid of fans, one that is likely to be abandoned once the pandemic is fully behind us
a diverse set of winners headlined by Chloé Zhao (Nomadland), who became the first woman of color to win the Best Director award
Frances McDormand (Nomadland) racking up her third Best Actress statue (nice going!)
Yuh-Jung Youn (Minari) continuing the South Korean streak that started with the astonishing success of last year’s Parasite
a somewhat surprising Best Actor winner in Anthony Hopkins (The Father), followed by a hasty ending because he was not in attendance

It is amazing that, at 83, Hopkins is now the oldest person ever to win an acting Oscar, passing Christopher Plummer and giving us mortals some hope that it may never be too late for a “second act!” Given the few-in-a-generation talent that has graced our screens over the years, his lauded performance in The Father should, in hindsight, have been taken more seriously despite the late Chadwick Boseman’s seemingly rock-solid frontrunner status.

Drum roll, please…

So how did we do with our predictions this time? The quick answer is that we got four out of eight predictions right. At first glance, a 50% hit rate doesn’t impress much. It almost sounds like you could flip a coin and get the same result. But with five nominees in each of the seven other Oscars that we made predictions for, plus the eight movies nominated for Best Picture, we were really dealing with 600,000+ possible combinations. So getting even the four we did get right is equivalent to roughly a 1 in 1,000 probability if it were done out of pure luck.
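For readers who want to check the arithmetic, here is a minimal Python sketch. It assumes every nominee in a category is equally likely under pure guessing. The function names are ours, made up for illustration, and the assumption that the four hits are Best Picture plus three five-nominee categories is simply the split that reproduces the 1 in 1,000 figure.

```python
from fractions import Fraction

# Category sizes as described in the text above.
FIVE_NOMINEE_CATEGORIES = 7   # categories with five nominees each
NOMINEES_PER_CATEGORY = 5
BEST_PICTURE_NOMINEES = 8


def total_combinations():
    """Number of distinct ways to pick one nominee in all eight categories."""
    return NOMINEES_PER_CATEGORY ** FIVE_NOMINEE_CATEGORIES * BEST_PICTURE_NOMINEES


def chance_of_specific_hits(five_nominee_hits, best_picture_hit):
    """Probability that uniform random guessing gets a given set of categories right.

    five_nominee_hits: how many of the five-nominee categories are in the set.
    best_picture_hit: whether Best Picture is in the set (an assumption here).
    """
    probability = Fraction(1, NOMINEES_PER_CATEGORY) ** five_nominee_hits
    if best_picture_hit:
        probability *= Fraction(1, BEST_PICTURE_NOMINEES)
    return probability


if __name__ == "__main__":
    print(total_combinations())              # 5**7 * 8
    print(chance_of_specific_hits(3, True))  # (1/5)**3 * (1/8)
```

Note that this is the chance of guessing those particular four categories correctly, which is not the same as the chance of getting any four out of eight right.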

Of course, we need to restate that all our models are based on a fairly small sample (1,345 movies) of past award data. To be more specific, the number of positive examples in each held-out test set is tiny (there are just 21 award winners in a given category in the last 21 years), which may result in overfitting. Contrast that with the millions of data points that most industrial-scale Machine Learning applications routinely ingest to turn out predictions.

Prediction accuracy is only half the game. What makes the BigML approach in this fun annual diversion unique is that it presents a repeatable, end-to-end process built on top of a publicly shared dataset. In general, Machine Learning models such as the classification models we built for this project rely on the assumption that newly presented data will not drastically deviate from the historical datasets they were trained on. While this helps produce robust results that are statistically significant most of the time, it may also miss important departures from the norm as voting patterns or the panel of voters evolve.

With that said, we are always looking to learn ways to improve. Some explanations for our misses can be found online, with some pundits claiming that The Father was building momentum among Academy voters thanks to release timing purposefully planned by Sony Pictures Classics. Currently, we don’t have variables that directly take into account such recency effects or social media buzz (e.g., movie critic sentiments from select Twitter accounts). These may make nice future additions, however.

One more thing…

The table below compiles all our past predictions from the 2018 to the 2021 Oscars and the corresponding hit rates. In addition to the Top Picks that we shared annually in our past blog posts, this table lists how the accuracy metric improves if we also count a prediction as a hit when the winner is among the movies that received the two highest (Top 2) or three highest (Top 3) scores. The Top Picks alone had an average 70% hit rate, whereas the coverage reached 90% with Top 3.

2018/21 Oscars Predictions — NOTE: Original and Adapted Screenplay predictions apply only to 2019-21.
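For anyone who wants to reproduce this kind of Top 1 / Top 2 / Top 3 tally on their own model outputs, the following is a minimal Python sketch. It assumes you have one score per nominee per category and know the actual winner. The function name, the category and nominee labels, and the numbers in the toy data are all made up for illustration and are not our actual scores.

```python
def top_k_hit_rate(scores_by_category, winners, k):
    """Fraction of categories whose actual winner is among the k highest-scored nominees.

    scores_by_category: dict mapping category -> {nominee: model score}
    winners: dict mapping category -> actual winner
    k: how many top-scored nominees count as a hit (1 = Top Pick only)
    """
    hits = 0
    for category, scores in scores_by_category.items():
        # Rank nominees from highest to lowest score.
        ranked = sorted(scores, key=scores.get, reverse=True)
        if winners[category] in ranked[:k]:
            hits += 1
    return hits / len(scores_by_category)


# Hypothetical toy data: placeholder categories, nominees, and scores.
TOY_SCORES = {
    "category_1": {"nominee_a": 0.60, "nominee_b": 0.30, "nominee_c": 0.10},
    "category_2": {"nominee_a": 0.50, "nominee_b": 0.40, "nominee_c": 0.10},
    "category_3": {"nominee_a": 0.20, "nominee_b": 0.30, "nominee_c": 0.50},
    "category_4": {"nominee_a": 0.45, "nominee_b": 0.35, "nominee_c": 0.20},
}
TOY_WINNERS = {
    "category_1": "nominee_a",
    "category_2": "nominee_b",
    "category_3": "nominee_a",
    "category_4": "nominee_a",
}

if __name__ == "__main__":
    for k in (1, 2, 3):
        print(f"Top {k} hit rate: {top_k_hit_rate(TOY_SCORES, TOY_WINNERS, k):.0%}")
```

By construction the hit rate can only stay flat or rise as k grows, which is why Top 3 coverage is always at least as high as the Top Pick hit rate.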

On a related note, the Screenplay categories we added from 2019 onwards seem to have proven more challenging, possibly because they have their own quirks. For instance, a different set of awards than the film industry-heavy ones we take into account may be more indicative for those crossbreed categories with literary tints. If we take them out, there’s a single miss from Top 3 in the last four years: 2020’s perfect storm of Bong Joon Ho (Parasite) snatching the Best Director Oscar despite being the fourth pick of our model.

As the pioneers of ML-as-a-Service here at BigML, we welcome you to build your own models with the public movies 2000-2020 dataset. It’s a great way to put your Machine Learning skills to the test quickly, without the overhead of downloading and installing many open source packages and worrying about compatibility issues or hard-to-decipher error messages. It takes 1 minute to create a FREE account and about as long to clone the movies dataset to your account. Just like that, you’re in business running ML tasks. As always, let us know how your results turn out on Twitter @bigmlcom, or send us a note anytime at feedback@bigml.com!
