Web crawler to count in URLs how many times a word appears on the page.

lucaslatorre/WordCrawlerAPI

WordCrawlerAPI

This API takes one or more URLs and a word, and counts how many times the word appears on each page. Matching is case-sensitive by default.

How it works

The crawler requests each URL it is given and returns a JSON response with the number of occurrences of the given word, per site.

Example:

http://127.0.0.1:5000/?url=www.python.org&url=flask.pocoo.org&url=www.djangoproject.com&word=docs&ignorecase=true

?url=www.python.org, &url=flask.pocoo.org and &url=www.djangoproject.com are the URLs to be requested

&word=docs is the word to be searched and counted

&ignorecase=true is optional; pass it to match ignoring case

Return JSON:

[
    {
        "http://www.python.org": {
            "docs": 13
        }
    },
    {
        "http://flask.pocoo.org": {
            "docs": 10
        }
    },
    {
        "http://www.djangoproject.com": {
            "docs": 16
        }
    }
]
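
To make the response shape concrete, here is a minimal sketch of how the per-site counts could be computed once the page bodies have been downloaded. The helpers `count_word` and `build_response` are hypothetical names used only for illustration and are not taken from crawler.py. The sketch also assumes that a "match" is a plain substring occurrence, which may differ from how the actual crawler counts.

```python
def count_word(text, word, ignore_case=False):
    """Count non-overlapping substring occurrences of `word` in `text`.

    Hypothetical helper: assumes a match is a plain substring hit.
    """
    if not word:
        return 0  # avoid str.count("") returning len(text) + 1
    if ignore_case:
        # Lower-case both sides so the comparison ignores case.
        text, word = text.lower(), word.lower()
    return text.count(word)


def build_response(pages, word, ignore_case=False):
    """Build a list shaped like the JSON above: one {url: {word: count}} per site.

    `pages` maps each URL to the page text that was already fetched.
    """
    return [
        {url: {word: count_word(body, word, ignore_case)}}
        for url, body in pages.items()
    ]


if __name__ == "__main__":
    sample = {"http://example.invalid": "Docs and docs and DOCS"}
    print(build_response(sample, "docs"))
    print(build_response(sample, "docs", ignore_case=True))
```

With the sample text above, the case-sensitive call finds one match and the case-insensitive call finds three, which mirrors the effect of passing ignorecase=true.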

Setup

To run this script, you'll need Python 3.x and pip installed.

You also need the packages Flask-RESTful, requests and Flask-Caching.

To install them, open a terminal and run pip install flask-restful, then pip install requests and finally pip install Flask-Caching.

Running

To run the program, clone the repository or download crawler.py

Open a terminal and go to the folder that contains crawler.py

Type python crawler.py and the service will start at http://127.0.0.1:5000/

Then all you need to do is call the service with the arguments:

  • via Browser : type this in the address bar: http://127.0.0.1:5000/?url=globo.com&url=terra.com.br&word=google

  • via Postman : See the documentation
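
The same request can also be assembled from a script. The following is a minimal sketch using only the standard library; `build_query_url` is a hypothetical helper name, and the base address assumes the service is running locally on the default port shown above.

```python
from urllib.parse import urlencode


def build_query_url(base, urls, word, ignore_case=False):
    """Build the request URL with one `url` parameter per site.

    Hypothetical helper for illustration; the parameter names (url, word,
    ignorecase) are the ones documented in this README.
    """
    # A list of pairs lets the `url` key repeat, once per site.
    params = [("url", u) for u in urls] + [("word", word)]
    if ignore_case:
        params.append(("ignorecase", "true"))
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_query_url(
        "http://127.0.0.1:5000/",
        ["globo.com", "terra.com.br"],
        "google",
    ))
```

The resulting string can then be passed to any HTTP client, for example requests.get, while the service is running.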

Testing

The file crawler_test.py is a test suite written for the pytest library.

Install pytest by running pip install pytest in your terminal.

Then, in the root folder of this project, run the command pytest. It will automatically find and run the test files that follow its naming convention (test_*.py or *_test.py).
