
sainas/AnimeToday


AnimeToday

Gotta Watch 'Em All!

Introduction

Anime fans always want to keep up with their favorite anime. Let AnimeToday help you do it.

AnimeToday is an anime tracking application. Instead of checking the website several times a day, you will be notified as soon as a new episode of an anime you follow airs. It also gives you the URL of the episode, so you can start watching with a single click!

Currently it only supports crunchyroll.com. Support for more websites is planned.

Demo


Screenshots of the emails and text messages sent by the application

Approaches

[Figure: pipeline]

  1. Pulling data: Crawl Crunchyroll and save the HTML of every page needed as a text file in an AWS S3 bucket.
  2. Scraping data: Use the Python library BeautifulSoup to extract information such as the anime name, its episodes, and the URL of the cover picture, then store it in a PostgreSQL schema.
  3. Querying and notifying users: Select the episodes released today and notify the users following each anime through email and SMS.
  4. Automation: Add Apache Airflow on top of the pipeline to automate the workflow, so it can easily run daily or at any other frequency.
  5. Scaling up: Build a distributed architecture that lets the pipeline handle a higher volume and be fault-tolerant.
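Steps 1 and 2 could be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the project's code: the S3 key layout `raw/<date>/<slug>.html` and the `<a class="episode">` markup are hypothetical (Crunchyroll's real markup is not described here), and the S3 client is passed in so either `boto3.client("s3")` or a test double can be used. The project itself uses BeautifulSoup for extraction; to stay dependency-free this sketch swaps in the standard library's `html.parser`. With BeautifulSoup the same lookup would be something like `soup.select("a.episode")`.

```python
"""Sketch of pipeline steps 1-2 (not the project's actual code).

Hypothetical assumptions:
- S3 keys follow "raw/<YYYY-MM-DD>/<slug>.html"
- episode links look like <a class="episode" href="...">Title</a>
"""
import datetime
import urllib.request
from html.parser import HTMLParser


def fetch_html(url):
    """Download one page and return its HTML as text."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def archive_html(s3_client, bucket, slug, html, day=None):
    """Step 1: save the raw HTML to S3 as a text file and return its key."""
    day = day or datetime.date.today()
    key = "raw/{}/{}.html".format(day.isoformat(), slug)
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=html.encode("utf-8"),
        ContentType="text/html",
    )
    return key


class EpisodeLinkParser(HTMLParser):
    """Step 2: collect (title, href) for each <a class="episode"> link."""

    def __init__(self):
        super().__init__()
        self.episodes = []
        self._href = None  # href of the episode link we are inside, if any
        self._text = []    # text fragments seen inside that link

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and "episode" in classes:
            self._href = attrs.get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.episodes.append(("".join(self._text).strip(), self._href))
            self._href = None


def extract_episodes(html):
    """Return a list of (episode title, episode URL) pairs found in `html`."""
    parser = EpisodeLinkParser()
    parser.feed(html)
    parser.close()
    return parser.episodes
```

Usage would be along the lines of `archive_html(boto3.client("s3"), "my-bucket", "some-show", fetch_html(url))`, where the bucket name and slug are placeholders. Keeping the raw HTML in S3 means step 2 can be rerun without crawling the site again.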

Distributed Airflow Architecture

Deployment

Check this repo's Wiki for instructions on deploying a distributed Airflow architecture!

Prerequisites

Basic Pipeline


  • Installations
    • PostgreSQL 10
    • Python 3.6
    • boto3
    • psycopg2
    • bs4
  • Services
    • AWS EC2
    • AWS S3
    • AWS SES
    • AWS SNS
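With these libraries and services, step 3 (querying today's episodes and notifying followers) might look like the sketch below. Assumptions: the `episodes` and `follows` tables and their columns are hypothetical, and the standard library's `sqlite3` stands in for PostgreSQL so the sketch runs anywhere; with psycopg2 you would run the same query through a cursor and use `%s` instead of `?` as the placeholder. The SES and SNS clients are passed in (for example `boto3.client("ses")` and `boto3.client("sns")`); `send_email` and `publish` are the boto3 calls for those services.

```python
import datetime
import sqlite3

# Hypothetical schema; the real AnimeToday tables are not documented here.
SCHEMA = """
CREATE TABLE episodes (anime TEXT, title TEXT, url TEXT, air_date TEXT);
CREATE TABLE follows (anime TEXT, email TEXT, phone TEXT);
"""

# Join each episode airing on a given day with the users following that anime.
TODAY_QUERY = """
SELECT f.email, f.phone, e.anime, e.title, e.url
FROM episodes e JOIN follows f ON f.anime = e.anime
WHERE e.air_date = ?
ORDER BY f.email, e.title
"""


def todays_notifications(conn, day):
    """Return (email, phone, message) for every follower of an episode airing on `day`."""
    rows = conn.execute(TODAY_QUERY, (day.isoformat(),)).fetchall()
    notifications = []
    for email, phone, anime, title, url in rows:
        message = "New episode of {}: {} {}".format(anime, title, url)
        notifications.append((email, phone, message))
    return notifications


def send_all(notifications, ses_client, sns_client, sender):
    """Send each message by email (SES) and, if a phone number exists, by SMS (SNS)."""
    for email, phone, message in notifications:
        if email:
            ses_client.send_email(
                Source=sender,
                Destination={"ToAddresses": [email]},
                Message={
                    "Subject": {"Data": "AnimeToday: new episode"},
                    "Body": {"Text": {"Data": message}},
                },
            )
        if phone:
            sns_client.publish(PhoneNumber=phone, Message=message)
```

Separating the query from the sending keeps the selection logic testable without AWS. Note that SES only sends from verified identities, so `sender` must be an address or domain verified in SES.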

Automated with Airflow

  • Installations
    • Airflow 1.10.1
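A minimal DAG file for Airflow 1.10 that runs the three stages once a day might look like this. It is a sketch, not the repo's DAG: the task callables are hypothetical placeholders for the crawling, scraping and notification code, and the DAG id, owner and start date are arbitrary. Note the 1.10-era import path `airflow.operators.python_operator`.

```python
# dags/anime_today.py: sketch of a daily AnimeToday DAG for Airflow 1.10.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python_operator import PythonOperator  # 1.10 import path


def pull_pages():
    """Hypothetical placeholder: crawl Crunchyroll and archive HTML in S3 (step 1)."""


def scrape_pages():
    """Hypothetical placeholder: parse the archived HTML into PostgreSQL (step 2)."""


def notify_users():
    """Hypothetical placeholder: query today's episodes and send email/SMS (step 3)."""


default_args = {
    "owner": "animetoday",
    "start_date": datetime(2019, 1, 1),
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

dag = DAG(
    "anime_today",
    default_args=default_args,
    schedule_interval="@daily",
    catchup=False,  # do not backfill runs between start_date and now
)

pull = PythonOperator(task_id="pull_pages", python_callable=pull_pages, dag=dag)
scrape = PythonOperator(task_id="scrape_pages", python_callable=scrape_pages, dag=dag)
notify = PythonOperator(task_id="notify_users", python_callable=notify_users, dag=dag)

# With the default trigger rule, each stage starts only after the previous one succeeded.
pull >> scrape >> notify
```

Placed in the Airflow `dags/` folder, the file is picked up by the scheduler; changing `schedule_interval` is all it takes to run the pipeline at another frequency.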

Scaling up the number of workers

  • Installations
    • Airflow 1.10.1
    • Celery 4.1.1
    • RabbitMQ 3.7.0
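To spread tasks across several machines, Airflow 1.10 can be switched to the `CeleryExecutor` with RabbitMQ as the message broker in `airflow.cfg`. A sketch of the relevant settings follows; hostnames, credentials and the concurrency value are placeholders, not the project's actual configuration. The metadata database must be one every node can reach (such as a PostgreSQL instance), since SQLite cannot be used with the `CeleryExecutor`.

```ini
[core]
executor = CeleryExecutor
# Shared metadata database reachable from the scheduler and every worker
sql_alchemy_conn = postgresql+psycopg2://AIRFLOW_USER:PASSWORD@DB_HOST:5432/airflow

[celery]
# RabbitMQ broker; the trailing "//" selects the default "/" vhost
broker_url = amqp://RABBIT_USER:PASSWORD@RABBITMQ_HOST:5672//
# Where Celery stores task results
result_backend = db+postgresql://AIRFLOW_USER:PASSWORD@DB_HOST:5432/airflow
# Number of task slots per worker machine
worker_concurrency = 16
```

Each worker machine then runs `airflow worker` with the same configuration, while the scheduler and webserver run on the master node; adding capacity means starting another worker.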

Future Work

  • Use Spark for batch processing
  • Build a front end where users can register and choose which anime to follow
  • Set an AWS EC2 Auto Scaling group policy to automatically launch more workers as needed
