Linear Regression Analysis - Metis Project 2: Predicting US Total Domestic Gross Revenue
Description of project goals
The project required web scraping and an analysis of the resulting data using regression. The form of regression was chosen based on the data collected and the feature engineering needed to prepare that data for a regression model. This project used a straightforward multiple linear regression model, validated with k-fold cross-validation, to predict the target from the features listed below.
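As a minimal sketch of the modeling step, the snippet below fits scikit-learn's `LinearRegression` and scores it with 5-fold cross-validation using `KFold` and `cross_val_score`. The synthetic data, the choice of 5 folds, and R² as the metric are assumptions for illustration, not the project's actual settings; in practice `X` would hold the encoded features and `y` the total domestic gross.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Hypothetical stand-in data: 200 "films" with 3 numeric features.
# In the real project these would come from the scraped, cleaned tables.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_coef = np.array([2.0, 5.0, -1.0])
y = X @ true_coef + rng.normal(scale=0.5, size=200)

# 5-fold CV: each fold is held out once while the model trains on the other 4.
kfold = KFold(n_splits=5, shuffle=True, random_state=42)
model = LinearRegression()
scores = cross_val_score(model, X, y, cv=kfold, scoring="r2")

print("R^2 per fold:", np.round(scores, 3))
print(f"Mean R^2: {scores.mean():.3f}")
```

Shuffling before splitting matters if the scraped rows are ordered (for example by year or by gross), since unshuffled folds would then test on a systematically different slice of films.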
Features and Target Variables
Target:
- Total Domestic Gross
Features:
- Months
- Years
- Distributor
- MPAA Rating
- Runtime
- Budget
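Distributor and MPAA Rating are categorical, so they must be encoded before a linear model can use them. One common approach, sketched here with pandas `get_dummies`, is one-hot encoding. The DataFrame, its column names, and the choice to treat month as categorical (to capture release seasonality) are hypothetical, not taken from the project.

```python
import pandas as pd

# Hypothetical rows; column names and values are illustrative only.
movies = pd.DataFrame({
    "month": [6, 12, 7],
    "year": [2018, 2019, 2018],
    "distributor": ["Distributor A", "Distributor B", "Distributor C"],
    "mpaa_rating": ["PG-13", "PG", "PG-13"],
    "runtime": [120, 95, 105],
    "budget": [150_000_000, 40_000_000, 75_000_000],
})

# One-hot encode the categorical columns (month treated as categorical by assumption).
# drop_first=True drops one level per column to avoid perfect collinearity
# between the dummy columns and the regression intercept.
X = pd.get_dummies(
    movies,
    columns=["month", "distributor", "mpaa_rating"],
    drop_first=True,
    dtype=int,
)
print(X.columns.tolist())
```

Numeric columns such as year, runtime, and budget pass through unchanged; only the listed categorical columns are expanded.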
Data Used
- Box Office Mojo (by IMDb)
- The Numbers
Tools Used
- NumPy
- pandas
- pickle
- Matplotlib
- Seaborn
- Beautiful Soup
- scikit-learn
- Requests
Intended impact within the project scope:
- To create a prediction model for total domestic gross revenue in the United States
- To determine the salient features for predicting domestic box office revenue in the United States
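One hedged way to approach the second goal is to standardize the features and compare the magnitudes of the fitted coefficients, so that a feature measured in dollars (budget) is not compared directly with one measured in minutes (runtime). The sketch below assumes three numeric features and synthetic data; it illustrates the heuristic and is not necessarily how the project ranked features. The ranking becomes unreliable when features are strongly correlated.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical numeric features on very different scales (minutes, dollars, years).
feature_names = ["runtime", "budget", "year"]
rng = np.random.default_rng(1)
X = rng.normal(loc=[110, 5e7, 2015], scale=[20, 3e7, 3], size=(300, 3))

# Synthetic target built so budget matters most, then year, then runtime.
z = (X - X.mean(axis=0)) / X.std(axis=0)
y = z @ np.array([0.2e6, 3.0e6, 0.5e6]) + rng.normal(scale=1e5, size=300)

# Scaling inside the pipeline expresses each coefficient in
# "target units per standard deviation of the feature".
pipe = make_pipeline(StandardScaler(), LinearRegression())
pipe.fit(X, y)
coefs = pipe.named_steps["linearregression"].coef_

# Rank features by absolute standardized coefficient.
ranking = sorted(zip(feature_names, coefs), key=lambda t: abs(t[1]), reverse=True)
for name, coef in ranking:
    print(f"{name:>8}: {coef:,.0f}")
```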
Workflow
Created datasets from The Numbers and Box Office Mojo webpages, each documented in its respective Jupyter notebook. Then joined the datasets and conducted cleaning, preprocessing, EDA, and linear regression within the Regression notebook.
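Joining the two scraped sources requires a shared key. A minimal sketch with pandas `merge` is shown below, assuming both tables carry a title and a release year; the DataFrames, their column names, and the normalized `title_key` column are hypothetical.

```python
import pandas as pd

# Hypothetical extracts from each source; real column names will differ.
mojo = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C"],
    "year": [2018, 2018, 2019],
    "domestic_gross": [120_000_000, 45_000_000, 80_000_000],
    "distributor": ["Distributor A", "Distributor B", "Distributor A"],
})
numbers = pd.DataFrame({
    "title": ["movie a ", "Movie C", "Movie D"],
    "year": [2018, 2019, 2019],
    "budget": [60_000_000, 30_000_000, 10_000_000],
})

# Normalize the join key so minor formatting differences between sites still match.
for df in (mojo, numbers):
    df["title_key"] = df["title"].str.strip().str.lower()

# Inner join on normalized title plus year; validate guards against duplicate keys.
merged = mojo.merge(
    numbers.drop(columns="title"),
    on=["title_key", "year"],
    how="inner",
    validate="one_to_one",
)
print(merged[["title", "year", "domestic_gross", "budget"]])
```

An inner join keeps only films present in both sources, so a film with no budget listed on The Numbers drops out of the modeling set. `validate="one_to_one"` raises an error if either source lists the same title and year twice.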