
Applying Data Science to Oscar Nominations

The Oscars ceremony last weekend was a blast. My friends had their theories as to which movie would win the Best Picture Award; I secretly prayed for "The Imitation Game" to make it. Alas, I was wrong, and I figured that if I want to make better guesses in the future, I should seriously learn a little more about the movie industry. Especially given how much I like movies (and who doesn't?).

So how do I learn more about the movie industry and figure out what it takes to get nominated for an Oscar?


I have to admit that I know nothing about the movie industry, so after a few hours of Google searching I came across a pretty cool movie website that publishes basic information about movies made in the last 20 years. I managed to obtain information on 11,330 movies produced between 1995 and 2015.

This dataset (let's call it Movie Dossier) consisted of the following fields:

  • Movie Name
  • Release Date
  • Distributor (Film Studio Name)
  • Genre
  • MPAA Rating (Movie rating system that suggests what type of audience is eligible for watching the movie)
  • Total Revenue ($)
  • Total # Tickets Sold (Units)
  • Year (When the movie was in production)

Along with the Movie Dossier, I also found a separate database on Box Office Mojo listing movies nominated for the Oscar Best Picture Award. I joined the Movie Dossier and Mojo Oscar tables (on Movie Name) and voila, I knew whether a movie was nominated or not.
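The join described above can be sketched in plain Python. This is only an illustration, not the tooling used for the post: the function names, the `name` field and the three sample rows are assumptions, and the title normalization is one simple guess at how to keep trivial spelling differences from breaking the match.

```python
def normalize_title(title):
    # Lowercase and collapse runs of whitespace so trivial spelling
    # differences do not break the match.
    return " ".join(title.lower().split())


def flag_nominees(movies, nominee_titles):
    # Left join: every movie is kept and gains a boolean "nominated" field.
    nominated = {normalize_title(t) for t in nominee_titles}
    return [
        {**movie, "nominated": normalize_title(movie["name"]) in nominated}
        for movie in movies
    ]


# Illustrative rows only; the real Movie Dossier holds 11,330 movies.
movie_dossier = [
    {"name": "The Imitation Game"},
    {"name": "birdman "},
    {"name": "Interstellar"},
]
mojo_oscar = ["The Imitation Game", "Birdman"]

flagged = flag_nominees(movie_dossier, mojo_oscar)
for movie in flagged:
    print(movie["name"].strip(), movie["nominated"])
```

Joining on the name alone can mislabel movies that share a title (remakes, for example), so adding the release year to the key is worth considering when both tables carry it.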

Exploratory Analysis

When I looked at the Movie Dossier dataset, I didn't know where to start. Does movie production change over time? If so, how? Does movie genre have any significance? How come some movies are so much more popular than others? And above all, how can this data help me understand what helped 8 movies beat 660 other competitors and get nominated for the Best Picture Oscar in 2015?
So here goes...

My Lesson #1

When you enter uncharted territory and lack domain knowledge in the subject you are about to analyze, take a pause... find data you think is relevant and play with it, much as you would play with a new gadget when you are too lazy to read the manual. This will help you learn more about the subject and generate the hypotheses you are looking for.

I took my own lesson and started looking at basic metrics like:

  • Counts and frequency tables on categorical fields (Movie Name, Distributor, MPAA Rating and Genre)
  • Total, average and standard deviation of numerical fields (Total Revenue and Total # Tickets Sold)
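A minimal sketch of these basic metrics using only the Python standard library. The field names and the three sample rows are made up for illustration; they are not values from the Movie Dossier.

```python
from collections import Counter
from statistics import mean, stdev


def frequency_table(rows, field):
    # Count how often each value of a categorical field occurs.
    return Counter(row[field] for row in rows)


def summarize(rows, field):
    # Total, average and sample standard deviation of a numeric field.
    # stdev() needs at least two values.
    values = [row[field] for row in rows]
    return {"total": sum(values), "mean": mean(values), "stdev": stdev(values)}


# Hypothetical rows; revenue and tickets are placeholder numbers.
sample_rows = [
    {"genre": "Drama", "mpaa_rating": "R", "revenue": 10.0, "tickets": 1.5},
    {"genre": "Comedy", "mpaa_rating": "PG-13", "revenue": 30.0, "tickets": 4.0},
    {"genre": "Drama", "mpaa_rating": "PG-13", "revenue": 20.0, "tickets": 3.0},
]

print(frequency_table(sample_rows, "genre").most_common())
print(summarize(sample_rows, "revenue"))
```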

While doing this simple analysis, I noticed that movie production has followed overall US economic trends with a slight lag. Here is a visualization I put together.

Preliminary Findings

First off, why do Drama movies bring in less money than Comedy films?! I guess people prefer laughing to being serious... But film producers don't seem to agree, since they keep making more dramas than comedies: 1,960 comedies vs. 3,541 dramas have been produced since 1995, yet comedies earned 31% more money than dramas. Adventure movies turned out to be the most efficient ones. Can you believe that 619 movies made almost $39B (i.e. $61M/movie)? Well, I guess they are the most expensive ones too.

Speaking of production budgets, Avatar proved to be the revenue champion in the Action genre with $760M in gross earnings and $425M spent on production (who said 79% is a bad ROI?).
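For reference, the 79% figure follows from the usual definition of ROI as profit divided by cost. The short sketch below reproduces the arithmetic with the two numbers quoted above; the function name is just an illustrative choice.

```python
def roi(gross, budget):
    # Return on investment: profit as a fraction of the amount spent.
    return (gross - budget) / budget


# Figures as quoted in the text: $760M gross, $425M production budget.
avatar_roi = roi(gross=760e6, budget=425e6)
print(f"{avatar_roi:.0%}")  # prints 79%
```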

Curiously, the top Action, Thriller/Suspense and Adventure movies earned twice as much money as the Drama, Comedy/Romantic Comedy and Horror favorites. And to my biggest disappointment, Justin Bieber's concert show ranked #1 in the Concert/Performance genre. But let's keep moving...

The bottom chart displays the change in movie production volume since 1995. The bars represent the number of movies released in each year, and the trend line shows average revenue per ticket sold. As an add-on, if you hover over any bar you will see:

  • Average revenue per movie in that year
  • Average # tickets sold that year

It was a big revelation for me that, although movie production consistently followed the US economic trend with a little lag (US market activity dropped in 2008-2009, whereas the movie industry declined in 2010), film companies on average made a lot of money per film in 2010, the year movie production went down by 40%. Average revenue per movie was $25M, the highest seen in 20 years. But when I looked at how much people paid per ticket that year, the picture cleared up a little. It turns out people were paying $8.30 per ticket, compared to a historical average of $6.
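The yearly figures behind a chart like this (movies released, average revenue per movie, average tickets per movie, average revenue per ticket) can be derived with a simple group-by. The sketch below assumes each row carries `year`, `revenue` and `tickets` fields; those names and the sample numbers are hypothetical, chosen only so that the per-ticket averages come out at $6 and $8.30.

```python
from collections import defaultdict


def yearly_metrics(rows):
    # Accumulate movie count, revenue and tickets per year.
    by_year = defaultdict(lambda: {"movies": 0, "revenue": 0.0, "tickets": 0})
    for row in rows:
        year = by_year[row["year"]]
        year["movies"] += 1
        year["revenue"] += row["revenue"]
        year["tickets"] += row["tickets"]

    # Derive the averages; revenue per ticket is None when no tickets sold.
    return {
        key: {
            "movies": y["movies"],
            "revenue_per_movie": y["revenue"] / y["movies"],
            "tickets_per_movie": y["tickets"] / y["movies"],
            "revenue_per_ticket": (
                y["revenue"] / y["tickets"] if y["tickets"] else None
            ),
        }
        for key, y in sorted(by_year.items())
    }


# Hypothetical rows, not values from the Movie Dossier.
yearly_rows = [
    {"year": 2009, "revenue": 12_000_000, "tickets": 2_000_000},
    {"year": 2009, "revenue": 6_000_000, "tickets": 1_000_000},
    {"year": 2010, "revenue": 24_900_000, "tickets": 3_000_000},
]

metrics = yearly_metrics(yearly_rows)
for year, m in metrics.items():
    print(year, m["movies"], round(m["revenue_per_ticket"], 2))
```

Dividing total revenue by total tickets gives a ticket-weighted average price for the year, which is not the same as averaging each movie's own price; which of the two the original chart used is not stated.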

So in a matter of a few hours I saw that:

  • People pay more to laugh than to cry, although film producers for some reason make more dramas than comedies.
  • A top movie seems to be better off in the action, thriller or adventure genres than in drama or comedy.
  • People will pay more for fewer movies rather than less for more movies: in 2010, fewer movies came out, but ticket prices and average revenue per movie were higher.

Next Steps

I think I kicked my data around enough to generate initial hypotheses to answer my main question.

  • Genre is a promising factor that could help me understand whether a movie can get nominated for an Oscar.
  • How much money a movie makes could be a determining factor in Oscar nominations.
  • The number of tickets sold reflects a movie's popularity, so we could also use this factor to answer our question.

In my next post I will conduct a confirmatory analysis where I will test how well each factor predicts the likelihood of a movie being nominated for the Best Picture Award.

Your Turn

What do you think about the data? Could I use other sources to dive deeper into existing datasets?

Did the exploratory analysis make sense to you? How else could I have explored the data to link it better to the Oscar nominations topic?

Have you conducted a similar exploratory analysis before? How did you approach the problem?

Originally posted here.

Comment by Tatiana Sorokina on April 5, 2015 at 8:25am

Thank you for your thoughts, Chris. It is great to hear that we find common analytical problems in such different domains. You are right about the US economy trend. It looks like we can expect continuous growth in the next few years (well, at least in the movie industry).

Comment by Chris Lira on March 24, 2015 at 3:47pm

Hi Tatiana,

Enjoyable reading especially in the thought process. In answering your questions,

* "What do you think about the data?" I really like the bases which have good history. Only wish there was a way to add up all of the earnings per movie across all years

* "Did exploratory analysis make sense to you?" Yes

* "Have you conducted a similar exploratory analysis before?" Currently doing Australia's population and looking to match the health of its residents and health industry stats across 10 year intervals.

* "How did you approach the problem?" Currently a work in process with original information in raw form coming from the Australian Bureau of Statistics.....and various pivot excel table scenarios.

I have one theory for you based on the lower part of the chart. With all of the financial predictions around the economy (stretch/growth/downturn views), the chart tells me that the US economy is on the up for the next five to seven years when you look at the 2000 and 2012 points :). Great article.

Chris Lira
