EDA
This project studied variables such as audience size and total sales. Our goal was to identify which variables affect the success of a movie.
Prerequisites:
- Jupyter Notebook
- Python3
- Anaconda
- matplotlib.pyplot
- seaborn
- warnings
- font_manager (matplotlib) - for Korean font
- pandas
- Dataset (536 rows x 11 columns)
- genre
- release date (month, year, season)
- total screen number
- total audience
- total sales
- point
- rate
- actors
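The release-date fields listed above (month, year, season) can be derived from a single date column. The following is a minimal sketch, not the project's actual code: the sample dates, the column names, and the month-to-season mapping (March to May as spring, and so on) are assumptions made for illustration.

```python
import pandas as pd


def month_to_season(month: int) -> str:
    """Map a month number to a season (assumed meteorological seasons)."""
    if month in (3, 4, 5):
        return "spring"
    if month in (6, 7, 8):
        return "summer"
    if month in (9, 10, 11):
        return "autumn"
    return "winter"


# Hypothetical release dates, used only to demonstrate the derivation.
dates = pd.to_datetime(pd.Series(["2013-01-15", "2016-07-20", "2019-10-03"]))

features = pd.DataFrame({"year": dates.dt.year, "month": dates.dt.month})
features["season"] = features["month"].map(month_to_season)
print(features)
```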
I. Data Cleansing
- Korean movies released from 2013 to 2020 were used.
- Movies rated 'Adult' were excluded.
- 'Total audience' was converted to thousands.
- Actors with the same name were removed from the list.
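The first three cleansing steps above could be sketched in pandas as follows. This is an illustration under stated assumptions: the sample rows, the column names (`release_date`, `rate`, `total_audience`), and the helper `clean_movies` are hypothetical and not taken from the project's notebook. The step that handles actors sharing a name is omitted because the README does not say how they were identified.

```python
import pandas as pd

# Hypothetical sample rows; column names and values are illustrative only.
raw = pd.DataFrame(
    {
        "title": ["A", "B", "C", "D"],
        "release_date": ["2012-12-20", "2013-05-01", "2019-07-31", "2020-02-12"],
        "rate": ["12", "Adult", "15", "All"],
        "total_audience": [1_200_000, 350_000, 9_400_000, 80_000],
    }
)


def clean_movies(df: pd.DataFrame) -> pd.DataFrame:
    """Filter to 2013-2020, drop 'Adult' titles, rescale audience to thousands."""
    out = df.copy()
    out["release_date"] = pd.to_datetime(out["release_date"])
    out["year"] = out["release_date"].dt.year
    # Keep releases from 2013 through 2020 (inclusive).
    out = out[out["year"].between(2013, 2020)]
    # Exclude movies rated 'Adult'.
    out = out[out["rate"] != "Adult"]
    # Express total audience in thousands.
    out = out.assign(total_audience=out["total_audience"] / 1_000)
    return out.reset_index(drop=True)


cleaned = clean_movies(raw)
print(cleaned[["title", "year", "rate", "total_audience"]])
```

In this toy data, movie A falls outside the period and movie B is rated 'Adult', so only C and D remain.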
II. Data Visualization
- We expected the number of movies released to drop in 2020 because of the pandemic.
- Although 2020 had not yet ended and more movies could still be released, the count was far lower than in the previous year.
- We expected that more movies released in a year would mean more screens and a larger audience.
- Instead, we found that sales and audience rise and fall together, while the number of screens and the number of movies released do not follow the same pattern.
- Ranked the top 10 actors by the number of movies they starred in during the period mentioned above. The top 10 actors have 183 credits in total.
- The top 10 actors appeared in 34.14% of all movies during the period.
- The top 10 actors appeared most often in the Drama and Crime genres, with more than 30 credits in each.
- Movies featuring a top 10 actor performed better than other movies in audience size and number of screens.
- However, this did not mean they earned better points than other movies.
- According to KOBIS, a movie with an audience above 7,000K is considered a box-office hit, which is rare for most actors. All 10 actors reached that level more than once during 2013-2020.
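The actor ranking and the 7,000K threshold described above can be illustrated with a small standard-library sketch. The records, actor names, field names (`audience_k`, `actors`), and helper functions are hypothetical. The last lines only redo the arithmetic behind the reported 34.14%, which appears to be 183 credits divided by 536 movies; that reading is an inference from the numbers in this README, not something the source states.

```python
from collections import Counter

HIT_THRESHOLD_K = 7_000  # audience in thousands, matching the 7,000K figure

# Hypothetical records; names and numbers are made up for illustration.
movies = [
    {"title": "A", "audience_k": 8_200, "actors": ["Kim", "Lee"]},
    {"title": "B", "audience_k": 1_500, "actors": ["Kim", "Park"]},
    {"title": "C", "audience_k": 7_400, "actors": ["Lee", "Kim"]},
    {"title": "D", "audience_k": 300, "actors": ["Choi"]},
]


def top_actors(records, n):
    """Return the n most-credited actors as (name, credit_count) pairs."""
    counts = Counter(actor for m in records for actor in m["actors"])
    return counts.most_common(n)


def hits_per_actor(records, names, threshold_k=HIT_THRESHOLD_K):
    """Count movies above the audience threshold for each named actor."""
    return {
        name: sum(
            1
            for m in records
            if name in m["actors"] and m["audience_k"] > threshold_k
        )
        for name in names
    }


top = top_actors(movies, 2)
top_names = [name for name, _ in top]
total_credits = sum(count for _, count in top)
hits = hits_per_actor(movies, top_names)
print(top, total_credits, hits)

# Arithmetic behind the reported share: 183 credits over 536 movies.
reported_share = 183 / 536 * 100
print(f"{reported_share:.2f}%")
```

Dividing credits by the number of movies equals the share of movies only when no movie features two top 10 actors. If that can happen, counting distinct movies instead would give a lower figure.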
- 김예지
- Data gathering through the API. Data cleansing. Visualization of the overall movie data (steps 1, 2, and 3 of Data Visualization).
- GitHub: https://github.com/yeji0701
- 방희란
- Data cleansing. Visualization of the overall movie data (steps 4, 5, and 6 of Data Visualization).
- GitHub: https://github.com/Heeran-cloud