Shows Parsing

I decided to parse web pages with reviews of movies and series (along with general information about each show) from a popular online database of movies and television series, and to use this data to practice different data science techniques and improve my skills.

General Information

I parsed information about the top 1000 movies and the top 1000 series in Russia and split the data into 4 datasets:

Dataset Name	Dimensions (rows, columns)	Main Columns
Movie Info	(984, 43)	Show ID, Russian Title, Original Title, Actors, Show Info, Ratings, Synopsis, Critics' Scores
Movie Reviews	(171094, 8)	Show ID, Date and Time, Sentiment, Review Subtitle, Review, Usefulness of Review
Series Info	(978, 40)	Show ID, Russian Title, Original Title, Actors, Show Info, Ratings, Synopsis, Critics' Scores
Series Reviews	(35643, 8)	Show ID, Date and Time, Sentiment, Review Subtitle, Review, Usefulness of Review

Overall, the datasets contain 206,737 reviews and 1,962 shows.
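The totals follow directly from the row counts in the table. As a quick sanity check, here is a minimal sketch that recomputes them; the `shapes` dictionary is only an illustration built from the table, not something loaded from the actual files.

```python
# (rows, columns) for each dataset, copied from the table above
shapes = {
    "Movie Info": (984, 43),
    "Movie Reviews": (171094, 8),
    "Series Info": (978, 40),
    "Series Reviews": (35643, 8),
}

# Shows come from the two "Info" datasets, reviews from the two "Reviews" datasets
total_shows = shapes["Movie Info"][0] + shapes["Series Info"][0]
total_reviews = shapes["Movie Reviews"][0] + shapes["Series Reviews"][0]

print(f"{total_reviews} reviews, {total_shows} shows")
```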

Code

Parsing: starts the parsing process (with multiprocessing)
Dataobjects: represent the review and show info abstractions
Parsers: parse the web pages
HTML Reader: reads a page
Parsing Utils
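
To give a feel for how these pieces might fit together, here is a minimal sketch, not the repository's actual code. The `Review` dataclass, the `ReviewParser` class, the `parse_page` and `parse_all` functions, and the page markup (`<div class="review" data-sentiment="...">`) are all hypothetical and chosen only for illustration. It uses the standard library only and parses HTML strings that are already in memory, so the page-reading step is left out.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from multiprocessing import Pool


@dataclass
class Review:
    """Hypothetical stand-in for a review data object."""
    show_id: int
    sentiment: str
    text: str


class ReviewParser(HTMLParser):
    """Collects reviews from assumed markup:
    <div class="review" data-sentiment="...">text</div> (no nested divs)."""

    def __init__(self, show_id):
        super().__init__()
        self.show_id = show_id
        self.reviews = []
        self._sentiment = None  # not None only while inside a review <div>
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div" and attrs.get("class") == "review":
            self._sentiment = attrs.get("data-sentiment", "neutral")
            self._chunks = []

    def handle_data(self, data):
        if self._sentiment is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "div" and self._sentiment is not None:
            text = "".join(self._chunks).strip()
            self.reviews.append(Review(self.show_id, self._sentiment, text))
            self._sentiment = None


def parse_page(task):
    """Parse one (show_id, html) pair. Defined at module level so that
    worker processes can pickle it."""
    show_id, html = task
    parser = ReviewParser(show_id)
    parser.feed(html)
    parser.close()
    return parser.reviews


def parse_all(tasks, processes=1):
    """Parse many pages, fanning out to worker processes when processes > 1."""
    if processes <= 1:
        per_page = [parse_page(task) for task in tasks]
    else:
        with Pool(processes) as pool:
            per_page = pool.map(parse_page, tasks)
    # Flatten the per-page lists into one list of reviews
    return [review for reviews in per_page for review in reviews]
```

Calling `parse_all(tasks, processes=12)` from under an `if __name__ == "__main__":` guard would spread the pages over 12 worker processes. With `processes=1` the same code runs serially, which is handy for debugging.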

Set-up

pip install -e .
python src/parsing_pages/parsing.py data movies 12
python src/parsing_pages/parsing.py data series 12
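
The README does not spell out what the three positional arguments mean. The sketch below shows one plausible reading, purely as an assumption: an output directory, the kind of show to parse, and the number of worker processes. The names `output_dir`, `show_type`, `processes`, and `build_arg_parser` are hypothetical and may not match the real script.

```python
import argparse


def build_arg_parser():
    """Argument parser for an assumed CLI shape: <output_dir> <show_type> <processes>."""
    parser = argparse.ArgumentParser(description="Parse show pages (illustrative sketch).")
    parser.add_argument("output_dir", help="assumed: directory where parsed data is written")
    parser.add_argument("show_type", choices=["movies", "series"],
                        help="assumed: which kind of show to parse")
    parser.add_argument("processes", type=int,
                        help="assumed: number of worker processes")
    return parser


# Same arguments as in the first command above, passed explicitly for illustration
args = build_arg_parser().parse_args(["data", "movies", "12"])
print(args.output_dir, args.show_type, args.processes)
```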

