Scrapy spiders for news websites
- Install dependencies (pip install -r requirements.txt)
- Modify Scrapy settings if needed
- Run a spider:
scrapy runspider [SPIDER PATH] -a start_id=1000 -a end_id=1500 -o [OUTPUT_FILE]
scrapy runspider ./news/spiders/prachatai.py -a start_id=1000 -a end_id=1500 -o prachatai.jl
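To run several ID ranges from a Python script instead of typing the command by hand, the same invocation can be built as an argument list. This is a minimal sketch: runspider_argv is a hypothetical helper, not part of this repository, and DOWNLOAD_DELAY is only an illustrative setting overridden through Scrapy's standard -s NAME=VALUE option.

```python
import shlex


def runspider_argv(spider_path, start_id, end_id, output_file, settings=None):
    """Build the argument list for `scrapy runspider` (hypothetical helper)."""
    argv = [
        "scrapy", "runspider", spider_path,
        "-a", f"start_id={start_id}",   # spider argument
        "-a", f"end_id={end_id}",       # spider argument
        "-o", output_file,              # feed export target
    ]
    # Each Scrapy setting override becomes a "-s NAME=VALUE" pair.
    for name, value in (settings or {}).items():
        argv += ["-s", f"{name}={value}"]
    return argv


if __name__ == "__main__":
    argv = runspider_argv("./news/spiders/prachatai.py", 1000, 1500,
                          "prachatai.jl", settings={"DOWNLOAD_DELAY": 1})
    # Print the equivalent shell command; pass argv to
    # subprocess.run(argv, check=True) to actually start the crawl.
    print(shlex.join(argv))
```

Building a list rather than a single string avoids shell quoting problems when paths or setting values contain spaces.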
Prachatai URL: https://prachatai.com/print/[ARTICLE_ID]
Arguments:
- start_id - first article ID in the range
- end_id - last article ID in the range
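Scrapy passes each -a name=value option to the spider as a string, so start_id and end_id have to be converted to integers before article URLs can be built. A minimal sketch of that step, assuming the spider visits every ID in the range and that end_id is inclusive (neither is confirmed here); article_urls is a hypothetical helper, not code from this repository:

```python
from typing import Iterator

# URL pattern from this README, with the ID slot as a format field.
PRACHATAI_PRINT_URL = "https://prachatai.com/print/{article_id}"


def article_urls(template: str, start_id: str, end_id: str) -> Iterator[str]:
    """Yield one URL per article ID from start_id to end_id (inclusive, assumed)."""
    # -a spider arguments arrive as strings, so convert them first.
    first, last = int(start_id), int(end_id)
    for article_id in range(first, last + 1):
        yield template.format(article_id=article_id)


if __name__ == "__main__":
    for url in article_urls(PRACHATAI_PRINT_URL, "1000", "1002"):
        print(url)
```

Inside a spider, start_requests would typically yield scrapy.Request(url) for each of these URLs. The same helper works for Thai PBS with the template "http://news.thaipbs.or.th/content/{article_id}".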
scrapy runspider ./news/spiders/prachatai.py -a start_id=1000 -a end_id=1500 -o prachatai.jl
Thai PBS URL: http://news.thaipbs.or.th/content/[ARTICLE_ID]
Arguments:
- start_id - first article ID in the range
- end_id - last article ID in the range
scrapy runspider ./news/spiders/thaipbs.py -a start_id=1000 -a end_id=1500 -o thaipbs.jl
Supported output formats (Scrapy feed exports, chosen by the file extension given to -o):
- .csv
- .jl (JSON Lines)
- .json
- .xml
scrapy runspider ./news/spiders/thaipbs.py -a start_id=1000 -a end_id=1500 -o thaipbs.csv
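Once a crawl finishes, the exported file can be loaded back for analysis. A minimal sketch, assuming flat items and UTF-8 files; load_items is a hypothetical helper, the field names depend on what each spider yields, and .xml is deliberately not handled here:

```python
import csv
import json
from pathlib import Path


def load_items(path):
    """Load scraped items from a Scrapy feed export, chosen by file extension."""
    path = Path(path)
    suffix = path.suffix.lower()
    with path.open(encoding="utf-8", newline="") as f:
        if suffix == ".jl":
            # JSON Lines: one JSON object per non-empty line.
            return [json.loads(line) for line in f if line.strip()]
        if suffix == ".json":
            # JSON: a single array containing every item.
            return json.load(f)
        if suffix == ".csv":
            # CSV: a header row of field names, then one row per item.
            # Note that every value comes back as a string.
            return list(csv.DictReader(f))
    raise ValueError(f"unsupported export format: {suffix}")
```

For example, load_items("thaipbs.csv") returns a list of dicts, one per scraped article. JSON Lines is usually the easiest format for large crawls, because each line can be parsed independently.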