
# Web Scraper Project

This project scrapes data from two different websites and saves the results as CSV files.

## Project Structure

- `.github/workflows/`: Contains GitHub Actions workflows.
- `Data/`: Contains data files.
- `Datasets/`: Contains scraped datasets.
- `logs/`: Contains log files.
- `Report/`: Contains report files.
- `Scripts/`: Contains Python scripts.
  - `additional.py`: Additional functions and logging configuration.
  - `dataset_download.py`: Downloads datasets from Kaggle.
  - `dataset_upload.py`: Uploads datasets to Kaggle.
  - `Selenium.py`: Browser automation with Selenium.
- `requirements.txt`: Lists the required Python packages.
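`additional.py` is described above as holding shared functions and the logging configuration. As a rough illustration of what such a setup might look like (the function name and format string are hypothetical, not taken from the repository), a logger writing into the `logs/` directory could be configured like this:

```python
import logging
import os

def setup_logger(name="scraper", log_dir="logs"):
    """Configure and return a logger that writes to <log_dir>/<name>.log.

    Sketch only: the actual configuration in additional.py may differ.
    """
    os.makedirs(log_dir, exist_ok=True)
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid adding duplicate handlers on repeated calls
        handler = logging.FileHandler(
            os.path.join(log_dir, f"{name}.log"), encoding="utf-8"
        )
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger
```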

## Setup

1. Install the required packages:

   ```bash
   pip install -r requirements.txt
   ```

## Usage

### Scraping and Uploading BKM Data to Kaggle

1. Scrape BKM categories:

   ```bash
   python Scripts/bkm_scrape_categories.py
   ```

2. Download the Kaggle dataset:

   ```bash
   python Scripts/dataset_download.py
   ```

3. Scrape BKM data:

   ```bash
   python Scripts/bkm_scrape.py
   ```

4. Combine the scraped data:

   ```bash
   python Scripts/bkm_combine.py
   ```

5. Upload the dataset to Kaggle:

   ```bash
   python Scripts/dataset_upload.py
   ```
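The five steps above can also be chained from a single Python driver that stops at the first failure. This is a sketch, not a script that ships with the project; the script paths are the ones listed in the steps, run with the current interpreter:

```python
import subprocess
import sys

# Ordered BKM pipeline, as listed in the README steps above.
BKM_STEPS = [
    "Scripts/bkm_scrape_categories.py",
    "Scripts/dataset_download.py",
    "Scripts/bkm_scrape.py",
    "Scripts/bkm_combine.py",
    "Scripts/dataset_upload.py",
]

def run_steps(scripts):
    """Run each script with the current Python interpreter, aborting on failure."""
    for script in scripts:
        print(f"Running {script} ...")
        result = subprocess.run([sys.executable, script])
        if result.returncode != 0:
            raise SystemExit(f"{script} failed with exit code {result.returncode}")

# Uncomment to run the full BKM pipeline from the repository root:
# run_steps(BKM_STEPS)
```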

### Scraping and Uploading KY Data to Kaggle

1. Scrape KY categories:

   ```bash
   python Scripts/ky_scrape_categories.py
   ```

2. Download the Kaggle dataset:

   ```bash
   python Scripts/dataset_download.py
   ```

3. Scrape KY data:

   ```bash
   python Scripts/ky_scrape.py
   ```

4. Combine the scraped data:

   ```bash
   python Scripts/ky_combine.py
   ```

5. Upload the dataset to Kaggle:

   ```bash
   python Scripts/dataset_upload.py
   ```
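Both workflows include a "combine the scraped data" step. Assuming the combine scripts concatenate per-category CSV files into one dataset (an assumption; the actual logic lives in `bkm_combine.py` / `ky_combine.py`), a minimal stdlib version of that operation looks like this, with hypothetical file paths:

```python
import csv
import glob

def combine_csvs(pattern, out_path):
    """Concatenate all CSV files matching `pattern` into one file,
    writing the header row only once. Sketch; paths are hypothetical."""
    writer = None
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        for path in sorted(glob.glob(pattern)):
            with open(path, newline="", encoding="utf-8") as src:
                reader = csv.reader(src)
                header = next(reader, None)
                if header is None:
                    continue  # skip empty files
                if writer is None:
                    writer = csv.writer(out)
                    writer.writerow(header)
                writer.writerows(reader)

# Example (hypothetical file layout):
# combine_csvs("Datasets/ky_*.csv", "Datasets/ky_combined.csv")
```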