
Heeran-cloud/LinearRegression_ML

 
 


Regression_Project

Predicting a Movie's Revenue and Audience with a Linear Regression Model.

Given only data that is available before a film's release, how accurately can its revenue and audience be predicted? We set out to find the linear regression model that best predicts both.

Prerequisites:

  • Jupyter Notebook
  • Python3
  • Anaconda

Getting Started

Packages used
  • matplotlib (matplotlib.pyplot)
  • seaborn
  • warnings (Python standard library, no install needed)
  • pandas
  • numpy
  • scikit-learn (sklearn.preprocessing, sklearn.model_selection, sklearn.linear_model, sklearn.metrics)
Dataset
  • 811 rows × 10 columns
  • Features
    • Number of Screens
    • Genre
    • Distributor
    • AgeRate
    • Release Date (month, year, season)
    • Actor
  • Labels
    • Sales (revenue)
    • Audience
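The feature/label split above can be sketched with pandas and scikit-learn. This is a minimal, hypothetical example rather than the project's notebook: the column names (`Screen`, `month`, `year`, `season`) and the toy rows are invented, categorical features are one-hot encoded with `pd.get_dummies` (the project lists `sklearn.preprocessing`, so its actual encoding may differ), and `Actor` is left out because its encoding is not described.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Invented toy rows; column names are hypothetical stand-ins for the Dataset list.
df = pd.DataFrame({
    "Screen": [1200, 300, 850, 60, 1500, 400],
    "Genre": ["Drama", "Action", "Drama", "Horror", "Action", "Comedy"],
    "Distributor": ["A", "B", "A", "C", "B", "A"],
    "AgeRate": ["12", "15", "All", "15", "12", "All"],
    "month": [1, 7, 12, 8, 5, 2],
    "year": [2015, 2018, 2019, 2012, 2020, 2009],
    "season": ["winter", "summer", "winter", "summer", "spring", "winter"],
    "Sales": [5.1, 0.9, 3.2, 0.1, 7.4, 1.0],          # millions (made up)
    "Audience": [0.62, 0.11, 0.40, 0.01, 0.95, 0.13],  # millions (made up)
})

labels = ["Sales", "Audience"]
categorical = ["Genre", "Distributor", "AgeRate", "season"]

# One-hot encode the categorical features; numeric columns pass through unchanged.
X = pd.get_dummies(df.drop(columns=labels), columns=categorical)
y = df["Audience"]  # use df["Sales"] instead to model revenue

# Hold out 20% of the rows as test data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```

Each categorical value becomes its own 0/1 column (for example `Genre_Drama`), which lets a linear model assign a separate coefficient to every genre, distributor, age rating, and season.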

Procedure:

I. Data Cleansing

  • Korean movies released from 2008 to 2020 were used.
  • Movies rated 'Adult' were removed.
  • 'Audience' and 'Sales' were converted to units of one million.
  • Actors who share their name with another actor, or whose name has only one syllable, were removed from the Actor list.
  • One missing Distributor value was filled in with the mode (most frequent value) of Distributor.
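The cleansing steps above can be expressed in pandas roughly as follows. This sketch uses invented rows and assumes raw Sales in won and raw Audience as head counts; the 2008 to 2020 year filter and the actor-name rules are omitted because the exact columns and rules are not given.

```python
import pandas as pd

# Invented raw rows; assumes Sales in won and Audience as head counts.
raw = pd.DataFrame({
    "AgeRate": ["12", "Adult", "15", "All"],
    "Distributor": ["CJ", "Lotte", None, "CJ"],
    "Sales": [5_100_000_000, 200_000_000, 900_000_000, 3_000_000_000],
    "Audience": [620_000, 30_000, 110_000, 380_000],
})

# 1. Drop movies rated 'Adult'.
clean = raw[raw["AgeRate"] != "Adult"].copy()

# 2. Express Sales and Audience in millions.
for col in ("Sales", "Audience"):
    clean[col] = clean[col] / 1_000_000

# 3. Fill the missing Distributor with the column's mode (most frequent value).
most_common = clean["Distributor"].mode()[0]
clean["Distributor"] = clean["Distributor"].fillna(most_common)
```

Here the mode is computed after the 'Adult' rows are dropped, matching the order of the list above.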

II. Data Visualization

  • The histograms of Sales and Audience are both right-skewed: most movies attract modest revenue and audiences, and only a few become hits. This imbalance made both labels hard to predict.
  • The histogram of Number of Screens is right-skewed as well, which is consistent with its close correlation with Sales and Audience.
  • The top 5 distributors dominate the industry.
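The skew and concentration described above can also be checked numerically. A minimal sketch on made-up values: pandas' `Series.skew()` returns the sample skewness (positive for a long right tail), and `value_counts(normalize=True)` gives each distributor's share of movies.

```python
import pandas as pd

# Made-up audience figures (millions): many small releases, a couple of hits.
audience = pd.Series([0.01, 0.02, 0.02, 0.03, 0.05, 0.08, 0.1, 0.2, 1.5, 9.0])

# Sample skewness; a positive value indicates a long right tail.
skew = audience.skew()

# Made-up distributor labels, one per movie.
distributors = pd.Series(["A"] * 4 + ["B"] * 3 + ["C"] * 2 + ["D", "E", "F", "G"])

# Fraction of all movies released by the five most frequent distributors.
top5_share = distributors.value_counts(normalize=True).head(5).sum()
print(f"skewness={skew:.2f}, top-5 share={top5_share:.0%}")
```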

III. Testing the Model

  1. Label: Audience
  • The Audience data contains outliers above the upper fence, so, where necessary, we removed them one round at a time.
  • We chose RMSE as the metric for measuring how much the model improved over one trained without this data cleansing.
  • The RMSE on the test data decreased from 1.73 to 0.75 after repeated rounds of data cleansing.
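One way to read the outlier procedure is: compute Tukey's upper fence (Q3 + 1.5 × IQR), drop labels above it, recompute the fence on what remains, and repeat, comparing test RMSE before and after. The sketch below follows that reading on synthetic data; the helper names, the `rounds` parameter, and the single `screens` feature are illustrative assumptions, and its numbers are unrelated to the 1.73 and 0.75 reported above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


def upper_fence(values: np.ndarray) -> float:
    """Tukey's upper fence: Q3 + 1.5 * IQR."""
    q1, q3 = np.percentile(values, [25, 75])
    return q3 + 1.5 * (q3 - q1)


def drop_upper_outliers(X, y, rounds):
    """Drop rows whose label lies above the upper fence, recomputing the fence each round."""
    for _ in range(rounds):
        keep = y <= upper_fence(y)
        if keep.all():
            break  # nothing left above the fence
        X, y = X[keep], y[keep]
    return X, y


def holdout_rmse(X, y, seed=0):
    """Fit a linear regression on 80% of the rows and return RMSE on the other 20%."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=seed)
    model = LinearRegression().fit(X_tr, y_tr)
    return np.sqrt(mean_squared_error(y_te, model.predict(X_te)))


# Synthetic data: audience (millions) roughly linear in screens, plus a few blockbusters.
rng = np.random.default_rng(0)
screens = rng.uniform(50, 1500, size=300)
audience = 0.001 * screens + rng.normal(0, 0.2, size=300)
audience[:15] += rng.uniform(5, 15, size=15)  # injected outliers
X = screens.reshape(-1, 1)

before = holdout_rmse(X, audience)
X_clean, y_clean = drop_upper_outliers(X, audience, rounds=3)
after = holdout_rmse(X_clean, y_clean)
print(f"test RMSE before: {before:.2f}, after: {after:.2f}")
```

Recomputing the fence each round is what makes the removal iterative: once the largest values are gone, the quartiles shift, and points that were previously inside the fence can fall above the new one.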

Built with

Acknowledgements

About

Fast Campus 3rd project.


Languages

  • Jupyter Notebook 100.0%