
bkamapantula/thehindu-scraper


thehindu-scraper

Scrapes archived web links from the thehindu.com website.

Output structure

year-mm-dd/          # directory
  - links.csv        # file: link, title
  - links-out.csv    # file: link, title, content
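
The layout above can be addressed from code roughly as follows. This is a minimal sketch, not part of the scraper itself: the helper name `output_paths`, the `base_dir` parameter, and the example date are hypothetical, and it assumes `year-mm-dd` means an ISO-style date such as `2018-02-09`.

```python
from datetime import date
from pathlib import Path
from typing import Tuple


def output_paths(day: date, base_dir: str = ".") -> Tuple[Path, Path]:
    """Return the expected paths of links.csv and links-out.csv for one day.

    Assumption: the per-day directory is named with an ISO date (YYYY-MM-DD).
    """
    day_dir = Path(base_dir) / day.isoformat()
    return day_dir / "links.csv", day_dir / "links-out.csv"


# Arbitrary example date and base directory, chosen only for illustration.
links, links_out = output_paths(date(2018, 2, 9), "data")
print(links)
print(links_out)
```

Building the paths with `pathlib` keeps the directory naming in one place, so a different date format would only need a change to the `day_dir` line.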

Sample links.csv content

link,title
https://www.thehindu.com/todays-paper/sc-to-handle-ayodhya-title-dispute-only-as-a-land-issue/article22697171.ece,SC to handle Ayodhya title dispute only as a ‘land issue’
https://www.thehindu.com/todays-paper/two-ksrtc-drivers-killed-as-buses-collide-head-on/article22698589.ece,Two KSRTC drivers killed as buses collide head-on
...

Sample links-out.csv content

link,title,content
https://www.thehindu.com/todays-paper/sc-to-handle-ayodhya-title-dispute-only-as-a-land-issue/article22697171.ece,SC to handle Ayodhya title dispute only as a ‘land issue’,[HUGE CONTENT]
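
A file with the `link,title,content` header can be read with Python's standard `csv` module. The sketch below assumes the file is ordinary comma-separated text in which fields containing commas are quoted. The `read_rows` helper and the in-memory sample row (including its `example.com` URL) are hypothetical and used only for illustration.

```python
import csv
import io

# Article bodies can be long, so raise the csv module's per-field size limit.
# The value is an arbitrary generous bound, not something the scraper defines.
csv.field_size_limit(10_000_000)


def read_rows(handle):
    """Yield one dict per article, keyed by the header: link, title, content."""
    for row in csv.DictReader(handle):
        yield row


# In-memory stand-in for links-out.csv; the row is made up for illustration.
sample = io.StringIO(
    "link,title,content\n"
    'https://example.com/article.ece,Example title,"Body text, with a comma"\n'
)
rows = list(read_rows(sample))
print(rows[0]["title"])
```

For a real file, `open(path, newline="", encoding="utf-8")` can be passed in place of the `StringIO` object, assuming the output is UTF-8 encoded.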
