
Scraping reviews for movies and series (along with general info about shows) from a popular online database of information related to movies and television series.


Extremesarova/shows_parsing


Shows Parsing

I decided to parse web pages with reviews of movies and series (along with general information about the shows) from a popular online database of movie and television information, and to use this data to practice different data science techniques and improve my skills.

General Information

I parsed information about the top 1,000 movies and the top 1,000 series in Russia and split the data into four datasets:

| Dataset Name | Dimensions (rows, columns) | Main columns |
| --- | --- | --- |
| Movie Info | (984, 43) | Show ID, Russian Title, Original Title, Actors, Show Info, Ratings, Synopsis, Critics' Scores |
| Movie Reviews | (171094, 8) | Show ID, Date and Time, Sentiment, Review Subtitle, Review, Usefulness of Review |
| Series Info | (978, 40) | Show ID, Russian Title, Original Title, Actors, Show Info, Ratings, Synopsis, Critics' Scores |
| Series Reviews | (35643, 8) | Show ID, Date and Time, Sentiment, Review Subtitle, Review, Usefulness of Review |

In total, the datasets contain 206,737 reviews and 1,962 shows.
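
As a quick sanity check, these totals can be reproduced from the dataset dimensions listed above. This is only a minimal sketch: the `DATASET_SHAPES` dictionary and the `total_rows` helper are illustrative names, not part of the repository, and the numbers are copied by hand from the table rather than read from the data files.

```python
# Row and column counts copied by hand from the dataset table in this README.
# The dictionary and helper below are illustrative, not part of the repository.
DATASET_SHAPES = {
    "Movie Info": (984, 43),
    "Movie Reviews": (171094, 8),
    "Series Info": (978, 40),
    "Series Reviews": (35643, 8),
}


def total_rows(shapes, suffix):
    """Sum the row counts of every dataset whose name ends with `suffix`."""
    return sum(
        rows for name, (rows, _cols) in shapes.items() if name.endswith(suffix)
    )


total_shows = total_rows(DATASET_SHAPES, "Info")
total_reviews = total_rows(DATASET_SHAPES, "Reviews")
print(f"{total_reviews} reviews, {total_shows} shows")
```

With the real data loaded, the same check could be done by comparing each table's shape against the values in the dictionary.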

Code

Set-up

pip install -e .
python src/parsing_pages/parsing.py data movies 12
python src/parsing_pages/parsing.py data series 12
