A Data Science Central Community
The Oscars ceremony last weekend was a blast. My friends had their theories as to which movie would win the Best Picture Award; I secretly prayed for "The Imitation Game" to make it. Alas, I was wrong, and I figured that if I want to make better guesses in the future, I should seriously learn a little more about the movie industry. Especially given how much I like movies (and who doesn't?).
So how do I learn more about the movie industry and figure out what it takes to get nominated for an Oscar?
Dataset
I have to admit that I know nothing about the movie industry, so after spending a few hours searching on Google I came across a pretty cool movie website (http://www.the-numbers.com/market/) that publishes basic information about movies made in the last 20 years. I managed to obtain information on 11,330 movies produced between 1995 and 2015.
This dataset (let's call it Movie Dossier) consisted of the following fields:
Along with the Movie Dossier, I also found a separate database on Box Office Mojo (http://www.boxofficemojo.com/oscar/) listing movies nominated for the Oscar Best Picture Award. I joined Movie Dossier and Mojo Oscar tables (on Movie Name) and voila - I knew if a movie was nominated or not.
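The join described above could be sketched roughly like this in pandas. This is only an illustration under assumptions: the column names (movie_name, genre), the placeholder rows, and the title normalization step are all made up for the example, and this is not necessarily how the original tables were joined.

```python
import pandas as pd

# Toy stand-ins for the two tables. Column names and rows are
# hypothetical; only the idea of joining on movie name comes from the post.
dossier = pd.DataFrame({
    "movie_name": ["The Imitation Game ", "Placeholder Film A"],
    "genre": ["Drama", "Comedy"],
})
nominees = pd.DataFrame({
    "movie_name": ["the imitation game"],
})


def normalize(titles: pd.Series) -> pd.Series:
    """Trim whitespace and ignore case so near-identical titles match."""
    return titles.str.strip().str.casefold()


dossier["key"] = normalize(dossier["movie_name"])
nominees["key"] = normalize(nominees["movie_name"])

# A left join keeps every movie from the dossier; the indicator column
# records whether a matching row was found in the nominee table.
merged = dossier.merge(
    nominees[["key"]].drop_duplicates(),
    on="key",
    how="left",
    indicator=True,
)
merged["nominated"] = merged["_merge"] == "both"
merged = merged.drop(columns=["key", "_merge"])

print(merged)
```

Joining on a title alone is fragile (remakes, alternate spellings), so in practice it may be worth adding the release year to the key, if both sources provide it.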
Exploratory Analysis
When I looked at the Movie Dossier dataset, I didn't know where to start. Does movie production change over time? If so, how? Does movie genre have any significance? How come some movies are so much more popular than others? And above all, how can this data help me understand what helped 8 movies beat 660 other competitors and get nominated for the Oscars' Best Picture Award in 2015?
So here goes...
My Lesson #1
When you enter uncharted territory and lack domain knowledge in the subject you are about to analyze, take a pause... find data you think is relevant and play with it, much as you would play with a new gadget when you are too lazy to read the manual. This will help you learn more about the subject and generate the hypotheses you are looking for.
I took my own lesson and started looking at basic metrics like:
While doing this simple analysis, I noticed that movie production has been following overall US economic trends with a little lag. Here is a visualization I put together.
Preliminary Findings
First off, why do Drama movies bring in less money than Comedy films?! I guess people prefer funny to serious... But film producers don't seem to agree, since they keep making more dramas than comedies (1,960 comedy films vs. 3,541 drama movies have been produced since 1995, yet comedies earned 31% more money than dramas). Adventure movies turned out to be the most efficient ones: can you believe that 619 movies made almost $39B (i.e. $61M/movie)? Well, I guess they are the most expensive ones too.
Speaking of production budgets, Avatar proved to be the revenue champion in the action genre, with $760M in gross earnings and $425M spent on production (who said 79% is a bad ROI?).
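For anyone who wants to check the arithmetic, here is a tiny sketch. It assumes ROI is defined as (gross - budget) / budget, which is the definition that reproduces the 79% figure; the function name is just for illustration, and the calculation ignores marketing and distribution costs.

```python
def roi(gross: float, budget: float) -> float:
    """Return on investment as a fraction of the production budget."""
    return (gross - budget) / budget


# Figures quoted in the post: $760M gross, $425M production budget.
avatar_roi = roi(gross=760e6, budget=425e6)

print(f"Avatar ROI: {avatar_roi:.0%}")  # prints "Avatar ROI: 79%"
```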
Curiously, the top Action, Thriller/Suspense and Adventure movies earned twice as much money as the Drama, Comedy/Romantic Comedy and Horror favorites. And to my great disappointment, Justin Bieber's concert show ranked #1 in the Concert/Performance genre. But let's keep moving...
The bottom chart displays the change in movie production volume since 1995. The bars represent the number of movies released in each year, and the trend line shows average revenue per ticket sold. As an add-on, if you hover over any bar you will see:
It was a big revelation for me that movie production consistently followed the US economic trend with a little lag: US market activity dropped in 2008-2009, whereas the movie industry showed a decline in 2010. Yet in 2010, when movie production went down by 40%, film companies on average made a lot of money per film. Average revenue per movie was $25M, the highest seen in 20 years. But when I looked at how much money each US citizen spent on movies that year, the picture cleared up a little. It turns out people were paying $8.30 per ticket, compared to the $6/ticket historical average.
So in a matter of a few hours I saw that
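The yearly metrics discussed here (number of movies, average revenue per movie, average ticket price) could be computed along these lines. This is a sketch only: the column names year, gross and tickets are assumptions, and the numbers are made-up toy values chosen to mimic the $6 vs. $8.30 ticket prices, not the real dataset.

```python
import pandas as pd

# Made-up toy data, one row per movie. Column names are hypothetical.
movies = pd.DataFrame({
    "year":    [2009, 2009, 2009, 2010, 2010],
    "gross":   [30e6, 12e6, 6e6, 41.5e6, 8.3e6],  # box office revenue in $
    "tickets": [5e6, 2e6, 1e6, 5e6, 1e6],         # tickets sold
})

# One row per year: movie count plus revenue and ticket totals.
by_year = movies.groupby("year").agg(
    n_movies=("gross", "size"),
    total_gross=("gross", "sum"),
    total_tickets=("tickets", "sum"),
)

# Derived metrics mentioned in the post.
by_year["avg_revenue_per_movie"] = by_year["total_gross"] / by_year["n_movies"]
by_year["avg_ticket_price"] = by_year["total_gross"] / by_year["total_tickets"]
by_year["pct_change_in_movies"] = by_year["n_movies"].pct_change() * 100

print(by_year.round(2))
```

In this toy example fewer movies are released in the second year, yet revenue per movie goes up because the average ticket price is higher, which is the same pattern described above.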
Next Steps
I think I kicked my data around enough to generate initial hypotheses to answer my main question.
In my next post I will conduct a confirmatory analysis, testing how well each factor predicts the likelihood of a movie being nominated for the Best Picture Award.
Your Turn
What do you think about the data? Could I use other sources to dive deeper into the existing datasets?
Did the exploratory analysis make sense to you? How else could I have explored the data to link it better to the Oscar nominations topic?
Have you conducted a similar exploratory analysis before? How did you approach the problem?
Comment
Thank you for your thoughts, Chris. It is great to hear that we find common analytical problems in such different domains. You are right about the US economic trend. It looks like we can expect continuous growth in the next few years (well, at least in the movie industry).
Hi Tatiana,
Enjoyable reading especially in the thought process. In answering your questions,
* "What do you think about the data?" I really like the bases which have good history. Only wish there was a way to add up all of the earnings per movie across all years
* "Did exploratory analysis make sense to you?" Yes
* "Have you conducted a similar exploratory analysis before?" Currently doing Australia's population and looking to match the health of its residents and health industry stats across 10 year intervals.
* "How did you approach the problem?" Currently a work in process with original information in raw form coming from the Australian Bureau of Statistics.....and various pivot excel table scenarios.
I have one theory for you based on the lower part of the chart. With all of the financial predictions around the economy (stretch/growth/downturn views), the chart tells me that the US economy is on the up for the next five to seven years when you look at the 2000 and 2012 points on :). Great article.
Chris Lira