A simple Amazon scraper that extracts product details and prices from Amazon.com using Python Requests and Selectorlib.
There are two simple scrapers in this project.
- Amazon Product Page Scraper: amazon.py
- Amazon Search Results Page Scraper: searchresults.py
From a terminal
- Clone this project
git clone https://github.com/scrapehero-code/amazon-scraper.git
and cd into it
cd amazon-scraper
- (Optional) Add a Virtual Environment
python3 -m venv .venv
- (Optional) Activate the Virtual Environment
source .venv/bin/activate
- Install Requirements
pip3 install -r requirements.txt
- Add Amazon Product URLs to urls.txt
- Run
python3 amazon.py
- Get data from output/product.jsonl
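The .jsonl extension suggests the output is in JSON Lines format, so it can be read back one record per line. The following is a minimal sketch under that assumption; the helper name load_jsonl is hypothetical and not part of this project, and the fields in each record depend on the selectors the scraper uses.

```python
import json
from pathlib import Path


def load_jsonl(path):
    """Return a list of records from a JSON Lines file (one JSON value per line)."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records


# Path taken from the step above; it only exists after amazon.py has run.
product_file = Path("output/product.jsonl")
if product_file.exists():
    for record in load_jsonl(product_file):
        print(record)
```

The same helper should also work for output/search_results_output.jsonl if that file follows the same format.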
The search results scraper only scrapes products from the first page of search results.
- Add Amazon search results URLs to search_results_urls.txt
- Run
python3 searchresults.py
or
python3 searchresults.py -removeAds
to run and not include the ads
- Get data from output/search_results_output.jsonl
Check the output readme
- I am seeing \u*
before my outputs (for example, in the price)
This is a Unicode escape sequence. For example, \u00a3
is the UK pound sign, so \u00a3250.00
is £250.00
once the escape sequence is decoded.
- The URL output from searchresults.py
is not a full URL
Add https://www.amazon.co.uk
in front of it, or the domain of whichever Amazon region you are scraping (this example uses .co.uk).
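Both answers above can be applied while post-processing the output. The following is a minimal sketch, assuming each output line is a JSON object. The sample line, its field names (price, url), and the helper clean_record are made up for illustration and may not match the real output.

```python
import json
from urllib.parse import urljoin

# Change this to the Amazon region you scraped.
BASE_URL = "https://www.amazon.co.uk"

# Hypothetical output line; the raw string keeps \u00a3 as a literal escape,
# the way it appears in the output file.
sample_line = r'{"price": "\u00a3250.00", "url": "/dp/EXAMPLE"}'


def clean_record(line, base_url=BASE_URL):
    """Decode one JSON line and turn a relative url field into a full URL."""
    record = json.loads(line)  # json.loads decodes \u00a3 into the pound sign
    if record.get("url"):
        record["url"] = urljoin(base_url, record["url"])
    return record


cleaned = clean_record(sample_line)
# cleaned["price"] is "£250.00"
# cleaned["url"] is "https://www.amazon.co.uk/dp/EXAMPLE"
```

When writing records back to a file, json.dumps(record, ensure_ascii=False) keeps characters such as £ as-is instead of escaping them again.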