New movies are released every year, some of them become very successful with very high income, while others can not have the same success. In this project, I will try to estimate the price of these movies based on different features such as actors, production, year of reals ..etc.
If a new film released, how much it will make?
Data has been collected from different websites using web scraping. These websites are,
- "https://www.rottentomatoes.com" data extracted are rating, type, language, Director, Producer, Writer, Release Date, Runtime and Distributor.
- "https://www.wikipedia.org", get last five year films' names.
- "https://www.the-numbers.com" get films' ranks
- SQL
- Tableau
- Numpy
- Pandas
- BeautifulSoup
- Selenium
- Requests
I started by extracting all films from wekibidea using this code and then print all names. The second step is taking these names and saving them as a list, after that, the for loop is used to get the film name for filem_list and use the selenium library to extract more film details from rottentomatoes.com - code. All data are saved on a CSV file for future use. in addition, adding more details about film stars is important so, numbers.com websites are used to extract more details such as star's details code , directors , prediction companies. After that, all data gather together to make one CSV using SQL. To analyze the data frame, I started clean data, after that made five new columns "dummy" to represent the most popular film's type. After that, a first leaner regression model was used to predict films income - code- however polynomial regression gives the better result - code