lukkiddd/scrapy-news

Scrapy-news

Scrapy spiders for news websites

1. How to use

  1. Install the dependencies (pip install -r requirements.txt)
  2. Run a spider
  3. Modify the Scrapy settings if needed
scrapy runspider [SPIDER_PATH] -a start_id=1000 -a end_id=1500 -o [OUTPUT_FILE]
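
For step 3, the excerpt below sketches the kind of settings that are commonly adjusted when crawling a range of article IDs. It assumes the project has a standard Scrapy settings.py; the values are illustrative and are not taken from this repository.

```python
# settings.py (excerpt). Illustrative values, not the repository's defaults.

# Respect each site's robots.txt rules.
ROBOTSTXT_OBEY = True

# Wait about one second between requests to the same site.
DOWNLOAD_DELAY = 1.0

# Limit parallel requests per domain to keep the load on the site low.
CONCURRENT_REQUESTS_PER_DOMAIN = 2

# Write feed exports as UTF-8 so Thai text is not escaped in JSON output.
FEED_EXPORT_ENCODING = "utf-8"
```

A single setting can also be overridden for one run on the command line with -s, for example -s DOWNLOAD_DELAY=1.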

1.1 Example

scrapy runspider ./news/spiders/prachatai.py -a start_id=1000 -a end_id=1500 -o prachatai.jl

2. Spiders

2.1 Prachatai

URL: https://prachatai.com/print/[ARTICLE_ID]

Arguments:

  • start_id - article ID at which the crawl starts
  • end_id - article ID at which the crawl ends
scrapy runspider ./news/spiders/prachatai.py -a start_id=1000 -a end_id=1500 -o prachatai.jl
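
As a rough illustration of how start_id and end_id could map to request URLs, the sketch below expands the URL pattern above over a range of IDs. The helper name build_article_urls and the inclusive treatment of end_id are assumptions, not taken from the spider's source. Scrapy passes -a values to the spider as strings, so they are converted to integers first.

```python
from typing import List

# URL pattern from this section, with [ARTICLE_ID] written as a format field.
PRACHATAI_URL = "https://prachatai.com/print/{article_id}"


def build_article_urls(template: str, start_id: str, end_id: str) -> List[str]:
    """Expand a URL template over a range of article IDs.

    Hypothetical helper: the real spider may build its requests differently.
    """
    # Spider arguments given with -a arrive as strings.
    first, last = int(start_id), int(end_id)
    if first > last:
        raise ValueError("start_id must not be greater than end_id")
    # Assumption: end_id itself is included in the crawl.
    return [template.format(article_id=i) for i in range(first, last + 1)]


urls = build_article_urls(PRACHATAI_URL, "1000", "1500")
print(len(urls))  # 501
print(urls[0])    # https://prachatai.com/print/1000
```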

2.2 Thaipbs

URL: http://news.thaipbs.or.th/content/[ARTICLE_ID]

Arguments:

  • start_id - article ID at which the crawl starts
  • end_id - article ID at which the crawl ends
scrapy runspider ./news/spiders/thaipbs.py -a start_id=1000 -a end_id=1500 -o thaipbs.jl

3. Output format

Any format supported by Scrapy feed exports can be used; the format is chosen by the extension of the output file:

  • .csv
  • .jl (JSON Lines)
  • .json
  • .xml
scrapy runspider ./news/spiders/thaipbs.py -a start_id=1000 -a end_id=1500 -o thaipbs.csv
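
A .jl export holds one JSON object per line, so it can be read back one item at a time. The sketch below uses only the Python standard library. The function name read_json_lines and the "title" field in the sample are hypothetical, because the fields the spiders emit are not listed here.

```python
import io
import json
from typing import Any, Dict, Iterator, TextIO


def read_json_lines(stream: TextIO) -> Iterator[Dict[str, Any]]:
    """Yield one dict per non-empty line of a JSON Lines stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# In-memory stand-in for an exported file; the "title" field is hypothetical.
sample = io.StringIO('{"title": "first"}\n{"title": "second"}\n')
items = list(read_json_lines(sample))
print(len(items))  # 2
```

For a real export, pass an open file instead of the sample, for example open("prachatai.jl", encoding="utf-8").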
