
A simple web scraper to extract Product Data and Prices from Amazon


Sam-Mear/amazon-scraper

 
 


Amazon Scraper using Selectorlib

A simple Amazon scraper that extracts product details and prices from Amazon.com using Python Requests and Selectorlib.

There are two simple scrapers in this project.

  1. Amazon Product Page Scraper: amazon.py
  2. Amazon Search Results Page Scraper: searchresults.py

Usage

From a terminal

  1. Clone this project: git clone https://github.com/scrapehero-code/amazon-scraper.git, then change into it: cd amazon-scraper
  2. (Optional) Create a virtual environment: python3 -m venv .venv
  3. (Optional) Activate the virtual environment: source .venv/bin/activate
  4. Install the requirements: pip3 install -r requirements.txt

Scrape Product Details from Product Page

  1. Add Amazon product URLs to urls.txt
  2. Run: python3 amazon.py
  3. Get data from output/product.jsonl
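
Once the run finishes, the output file can be read back one record at a time. The following is a minimal sketch, assuming the file is standard JSON Lines (one JSON object per line); the helper name read_jsonl is hypothetical and not part of this project, and the field names in each record depend on the scraper's selectors, so the sketch prints whole records rather than guessing at keys.

```python
import json
from pathlib import Path
from typing import Iterator


def read_jsonl(path: str) -> Iterator[dict]:
    """Yield one parsed record per non-empty line of a JSON Lines file."""
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines between records
                yield json.loads(line)


if __name__ == "__main__":
    output_file = Path("output/product.jsonl")
    if output_file.exists():
        for record in read_jsonl(str(output_file)):
            # Field names are not assumed here; print the whole record.
            print(record)
```

The same helper should work for output/search_results_output.jsonl, since both outputs use the .jsonl extension.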

Scrape Products from Search Results

This scraper only scrapes products from the first page of search results.

  1. Add Amazon search results URLs to search_results_urls.txt
  2. Run: python3 searchresults.py, or python3 searchresults.py -removeAds to exclude ads from the output
  3. Get data from output/search_results_output.jsonl
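
If you have a list of search terms rather than ready-made URLs, the input file can be generated. This is only a sketch under stated assumptions: that Amazon search pages follow the /s?k=<term> pattern, and that search_results_urls.txt holds one URL per line. Neither is confirmed by this project, and the function names build_search_urls and write_url_file are hypothetical.

```python
from urllib.parse import urlencode


def build_search_urls(terms, base="https://www.amazon.com"):
    """Return one search results URL per term (assumed /s?k= pattern)."""
    urls = []
    for term in terms:
        query = urlencode({"k": term})  # escapes spaces and special characters
        urls.append(f"{base}/s?{query}")
    return urls


def write_url_file(urls, path="search_results_urls.txt"):
    """Write the URLs to a text file, one per line (assumed input format)."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(urls) + "\n")
```

For example, write_url_file(build_search_urls(["usb c cable", "desk lamp"])) would create the file in the current directory, ready for searchresults.py.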

Check the output readme

FAQ

- I am seeing \u* before my outputs (for example, in the price). This is an escaped Unicode character. For example, \u00a3 is the UK pound sign, so \u00a3250.00 reads as £250.00 once the escape is decoded.

- The URL output from searchresults.py is not a full URL. Add https://www.amazon.co.uk in front of it (or the domain of whichever Amazon region you are scraping; this example uses .co.uk).
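
Both points above can be handled when post-processing the output. The sketch below assumes each output line is a JSON object and uses hypothetical field names, price and url, which may differ from the real output; clean_record is likewise an illustrative helper and not part of this project. Parsing a line with json.loads decodes the \u escapes, and urljoin from the standard library prefixes the region's domain.

```python
import json
from urllib.parse import urljoin

BASE_URL = "https://www.amazon.co.uk"  # assumption: change to your region


def clean_record(raw_line, base=BASE_URL):
    """Parse one output line and turn a relative URL into a full one."""
    record = json.loads(raw_line)  # decodes escapes such as \u00a3
    if record.get("url"):  # "url" is a hypothetical field name
        # urljoin leaves already-absolute URLs unchanged
        record["url"] = urljoin(base, record["url"])
    return record


# Raw string so the escape stays exactly as it appears in the file.
sample = r'{"price": "\u00a3250.00", "url": "/dp/B000000000"}'
cleaned = clean_record(sample)
print(cleaned["price"])  # £250.00
print(cleaned["url"])    # https://www.amazon.co.uk/dp/B000000000

# To write readable characters back out, disable ASCII escaping.
print(json.dumps(cleaned, ensure_ascii=False))
```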
