Anime fans want to keep up with their favorite shows. Let AnimeToday help.
AnimeToday is an anime tracking application. Instead of checking the website several times a day, you are notified as soon as a new episode of an anime you follow airs. Each notification also includes the URL of the episode, so you can start watching with one click!
Currently it only supports crunchyroll.com. It will expand to more websites in the future.
Screenshots of the emails and text messages sent by the application
- Pulling data: Crawl Crunchyroll and save the HTML of every required page to an AWS S3 bucket as a text file.
- Scraping data: Use the Python library BeautifulSoup to extract the anime name, its episodes, the URL of the cover picture, etc., and store them in PostgreSQL.
- Querying and notifying users: Select the episodes released today and send notifications by email and SMS to the users who follow those anime.
- Automation: Add Apache Airflow on top of the pipeline to automate the workflow, so that it can easily run daily or at any other frequency.
- Scaling up: Build a distributed architecture that lets the pipeline handle higher volume and be fault-tolerant.
Check the Wiki of this Repo to deploy a distributed Airflow architecture!
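
The querying and notifying step can be sketched roughly as follows. This is only an illustration, not the code in this repo: the in-memory lists stand in for the PostgreSQL tables, and the field names (`anime`, `episode`, `url`, `release_date`, `user_email`) and the function `build_notifications` are hypothetical. Actually sending the messages through AWS SES and SNS is left out.

```python
from datetime import date

# Hypothetical rows standing in for the PostgreSQL tables (names are assumptions).
episodes = [
    {"anime": "Example Anime A", "episode": 12,
     "url": "https://example.com/a/12", "release_date": date(2019, 1, 5)},
    {"anime": "Example Anime B", "episode": 3,
     "url": "https://example.com/b/3", "release_date": date(2019, 1, 4)},
]
follows = [
    {"user_email": "fan1@example.com", "anime": "Example Anime A"},
    {"user_email": "fan2@example.com", "anime": "Example Anime A"},
    {"user_email": "fan2@example.com", "anime": "Example Anime B"},
]


def build_notifications(episodes, follows, today):
    """Return one message per follower for every episode released on `today`."""
    released_today = [ep for ep in episodes if ep["release_date"] == today]
    notifications = []
    for ep in released_today:
        for follow in follows:
            if follow["anime"] == ep["anime"]:
                notifications.append({
                    "to": follow["user_email"],
                    "subject": f"{ep['anime']} episode {ep['episode']} is out",
                    "body": f"Watch it here: {ep['url']}",
                })
    return notifications


if __name__ == "__main__":
    # In the real pipeline, `today` would be date.today().
    for message in build_notifications(episodes, follows, date(2019, 1, 5)):
        print(message["to"], "|", message["subject"], "|", message["body"])
```

In the actual pipeline the same selection would presumably be a SQL join between the episode and follower tables, with each resulting message handed to the email and SMS services.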
- Installations
- PostgreSQL 10
- python 3.6
- boto3
- psycopg2
- bs4
- Services
- AWS EC2
- AWS S3
- AWS SES
- AWS SNS
- Installations
- Airflow 1.10.1
- Installations
- Airflow 1.10.1
- Celery 4.1.1
- RabbitMQ 3.7.0
- Use Spark to do batch processing
- Build a front end for users to register and choose the anime they want to follow
- Set AWS EC2 Auto Scaling Groups Policy to automatically launch more workers as needed