wikipedia-web-crawler

This code crawls Wikipedia articles one by one, following the first link on each page, until it reaches the Language article. I had heard that if you keep clicking the first link in every Wikipedia article you end up on the Philosophy page, but when I ran my program and tallied the counts, the chains ended at the Language article instead.

version 1.0

Wherever you start, there is a high probability that you will reach the Language article while crawling. Does that mean this article has the most incoming links, or is the cause an article that comes before it in the chain?

We'll try to find out in future increments.

Features :

Uses a real web browser for automation (Apple Safari)
Shows the pages traversed in the terminal and saves them temporarily (until the program terminates)
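
The two features above could be sketched roughly as follows. This is not the code in code_1.py; it is a minimal illustration that assumes Selenium is the automation library, that the first link can be found with the CSS selector shown, and that the stop condition is the page title "Language - Wikipedia". The names crawl, selenium_first_href, and main are hypothetical.

```python
def crawl(driver, start_url, find_first_href,
          target_title="Language - Wikipedia", max_pages=50):
    """Visit pages by following the first link until the target title appears.

    `driver` only needs a get(url) method and a `title` attribute, which a
    Selenium WebDriver provides. `find_first_href` is a callable that returns
    the next URL for the current page, or None when no link is found.
    """
    visited = []  # kept in memory only, so it is lost when the program exits
    url = start_url
    while url is not None and len(visited) < max_pages:
        driver.get(url)
        title = driver.title
        print(title)  # show each traversed page in the terminal
        visited.append(title)
        if title == target_title:
            break
        url = find_first_href(driver)
    return visited


def selenium_first_href(driver):
    """Assumed selector: first article link inside a body paragraph."""
    from selenium.webdriver.common.by import By  # imported lazily
    links = driver.find_elements(
        By.CSS_SELECTOR, "#mw-content-text p > a[href^='/wiki/']")
    return links[0].get_attribute("href") if links else None


def main():
    """Run the crawl in Safari from a random article (assumed start URL)."""
    from selenium import webdriver
    driver = webdriver.Safari()
    try:
        crawl(driver, "https://en.wikipedia.org/wiki/Special:Random",
              selenium_first_href)
    finally:
        driver.quit()
```

Calling main() would open Safari and start crawling; Safari typically needs "Allow Remote Automation" enabled in its Develop menu before Selenium can drive it. Passing the link finder in as an argument keeps the loop independent of the browser, which may also help with the planned support for other browsers.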

Future ideas :

Count each page's appearances
Crawl a thousand times, find possible interconnections, and rank them
Find the most interconnected article
Support for other browsers

Currently only Safari is supported.
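
The counting and ranking ideas above might look something like the sketch below. It runs on a made-up table of first links (TOY_LINKS, with placeholder page names) rather than real Wikipedia data, and the helper names chain and rank_pages are hypothetical.

```python
from collections import Counter

# Made-up first-link table: each page maps to the first link it contains.
# "D" and "E" point at each other, so every chain ends in that loop.
TOY_LINKS = {"A": "C", "B": "C", "C": "D", "D": "E", "E": "D"}


def chain(start, first_link, max_steps=100):
    """Follow first links from `start` until a page repeats or has no link."""
    seen = []
    page = start
    while page is not None and page not in seen and len(seen) < max_steps:
        seen.append(page)
        page = first_link.get(page)
    return seen


def rank_pages(starts, first_link):
    """Count how many crawls pass through each page, most frequent first."""
    counts = Counter()
    for start in starts:
        counts.update(chain(start, first_link))
    return counts.most_common()


if __name__ == "__main__":
    for page, hits in rank_pages(["A", "B", "C"], TOY_LINKS):
        print(page, hits)
```

In this toy table "E" has only one incoming link, yet it appears in every crawl because it sits on the loop that all chains fall into. A page can therefore show up often without having the most incoming links, which is one way to approach the question raised earlier about the Language article.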

Copyright :

Husain AKbar Shaikh
