Scrapy-news

Scrapy spiders for news websites.

1. How to use

Install the dependencies (pip install -r requirements.txt)
Modify the Scrapy settings if needed
Run a spider, passing the range of article IDs to crawl:

scrapy runspider [SPIDER_PATH] -a start_id=1000 -a end_id=1500 -o [OUTPUT_FILE]
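
For the "modify Scrapy settings" step, the fragment below is a minimal sketch of overrides that are often useful when crawling a long range of article IDs. The setting names are standard Scrapy settings. The values, and the assumption that the project keeps them in news/settings.py, are illustrative and not taken from this repository.

```python
# Illustrative values only; tune them for the site being crawled.
# Assumed location: news/settings.py (not confirmed by this README).

# Wait this many seconds between requests to the same site.
DOWNLOAD_DELAY = 1.0

# Limit parallel requests per domain to keep the load on the site low.
CONCURRENT_REQUESTS_PER_DOMAIN = 2

# Respect the site's robots.txt rules.
ROBOTSTXT_OBEY = True

# Let Scrapy adapt the delay to the server's response times.
AUTOTHROTTLE_ENABLED = True
```

A single setting can also be overridden for one run with Scrapy's -s option, for example by adding -s DOWNLOAD_DELAY=1 to the runspider command.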

1.1 Example

scrapy runspider ./news/spiders/prachatai.py -a start_id=1000 -a end_id=1500 -o prachatai.jl

2. Spiders

2.1 Prachatai

URL: https://prachatai.com/print/[ARTICLE_ID]

**Arguments**:

start_id - ID of the first article in the range to crawl
end_id - ID of the last article in the range to crawl

scrapy runspider ./news/spiders/prachatai.py -a start_id=1000 -a end_id=1500 -o prachatai.jl
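
The relationship between the two arguments and the URL pattern above can be sketched in plain Python. This is not the spider's actual code: article_urls and PRACHATAI_URL are hypothetical names, and treating end_id as inclusive is an assumption. What is known from Scrapy itself is that values passed with -a reach the spider as strings, so they need converting before they can be used as a numeric range.

```python
from typing import Iterator

# URL pattern copied from this section; "{article_id}" stands in for [ARTICLE_ID].
PRACHATAI_URL = "https://prachatai.com/print/{article_id}"


def article_urls(start_id: str, end_id: str, template: str = PRACHATAI_URL) -> Iterator[str]:
    """Yield one URL per article ID from start_id to end_id.

    Values passed with -a arrive as strings, so they are converted to int here.
    Treating end_id as inclusive is an assumption of this sketch.
    """
    first, last = int(start_id), int(end_id)
    if first > last:
        raise ValueError("start_id must not be greater than end_id")
    for article_id in range(first, last + 1):
        yield template.format(article_id=article_id)


# Small range for illustration.
urls = list(article_urls("1000", "1002"))
print(urls)
```

The same helper would apply to the Thaipbs spider by passing its URL pattern as the template argument.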

2.2 Thaipbs

URL: http://news.thaipbs.or.th/content/[ARTICLE_ID]

**Arguments**:

start_id - ID of the first article in the range to crawl
end_id - ID of the last article in the range to crawl

scrapy runspider ./news/spiders/thaipbs.py -a start_id=1000 -a end_id=1500 -o thaipbs.jl

3. Output format

Output is written through Scrapy feed exports. The format is selected by the extension of the output file:

.csv
.jl (JSON Lines)
.json
.xml

scrapy runspider ./news/spiders/thaipbs.py -a start_id=1000 -a end_id=1500 -o thaipbs.csv
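
A .jl file holds one JSON object per line, so it can be read back without loading the whole file into memory. The sketch below uses only the standard library. read_json_lines is a hypothetical helper name, and no assumption is made about which fields the spiders emit.

```python
import json
from pathlib import Path
from typing import Iterator


def read_json_lines(path: str) -> Iterator[dict]:
    """Yield one dict per non-empty line of a JSON Lines file."""
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines instead of failing on them
                yield json.loads(line)
```

For example, after the Prachatai run shown earlier, list(read_json_lines("prachatai.jl")) would return the scraped items as a list of dicts.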
