Hacker News Web Scraper

A command-line tool that scrapes the Hacker News website and returns information about the top posts.

How it works

This tool scrapes Hacker News (https://news.ycombinator.com/news) and parses information about the top posts into the following object structure:

{ 'author': 'a-name', 'comments': 0, 'points': 0, 'rank': 0, 'title': 'a-title', 'uri': 'a-url' }
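As a rough illustration of that structure, the sketch below models one post as a Python dataclass and serialises a list of posts to JSON. The names Post and to_json are hypothetical and are not taken from this project, and JSON output is an assumption; only the field names and value types follow the structure shown above.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Post:
    """One Hacker News post (hypothetical model of the structure above)."""
    author: str
    comments: int
    points: int
    rank: int
    title: str
    uri: str


def to_json(posts):
    """Serialise a list of Post objects to a JSON string (assumed output format)."""
    return json.dumps([asdict(post) for post in posts], indent=2)
```

Calling to_json([Post("a-name", 0, 0, 0, "a-title", "a-url")]) would produce a JSON array containing one object with the six fields listed above.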

It was built and tested with Python 3.7. To run it locally, ensure you have Python 3.x and pip installed on your machine. To install the dependencies, run pip install -r requirements.txt in the project root.

It uses the requests library to fetch the HTML and the Beautiful Soup library to parse it. Both are well-known Python libraries for web scraping, and Python is a common choice of language for this type of task.
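The parsing step might look roughly like the sketch below. It is not this project's actual code: the function names parse_posts and first_int are hypothetical, and the CSS selectors (tr.athing, span.rank, span.titleline, td.subtext, span.score, a.hnuser) are assumptions about the Hacker News markup, which can change at any time. The function takes the HTML as a string, so it can be tried without network access.

```python
import re

from bs4 import BeautifulSoup


def first_int(text):
    """Return the first integer found in text, or 0 if there is none."""
    match = re.search(r"\d+", text or "")
    return int(match.group()) if match else 0


def parse_posts(html, limit=10):
    """Parse up to `limit` posts from HTML (selectors are assumptions)."""
    soup = BeautifulSoup(html, "html.parser")
    posts = []
    for row in soup.select("tr.athing")[:limit]:
        title_link = row.select_one("span.titleline a")
        # Assumed layout: points, author and comments sit in the next table row.
        detail = row.find_next_sibling("tr")
        subtext = detail.select_one("td.subtext") if detail else None
        if title_link is None or subtext is None:
            continue
        rank = row.select_one("span.rank")
        score = subtext.select_one("span.score")
        user = subtext.select_one("a.hnuser")
        links = subtext.find_all("a")
        # Assumed: the last link reads "N comments" once a post has comments.
        last_text = links[-1].get_text() if links else ""
        posts.append({
            "author": user.get_text(strip=True) if user else "",
            "comments": first_int(last_text) if "comment" in last_text else 0,
            "points": first_int(score.get_text()) if score else 0,
            "rank": first_int(rank.get_text()) if rank else 0,
            "title": title_link.get_text(strip=True),
            "uri": title_link.get("href", ""),
        })
    return posts
```

With requests installed, the page could be fetched and passed in with parse_posts(requests.get("https://news.ycombinator.com/news", timeout=10).text).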

Hacker News also has its own API (https://github.com/HackerNews/API). This project uses web scraping instead, because it is quicker than making multiple calls to the API and because certain fields that are available in the web view (such as the number of comments) are not exposed by the API.

To Build and Run

If you have Python 3 installed on your machine and have installed the dependencies, you can run this tool from the project root with python cli.py --posts 20. If you do not specify the number of posts to return, the tool defaults to 10.
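The --posts option and its default of 10 could be handled with argparse along the lines of the sketch below. This is only an illustration of the behaviour described above, not the contents of cli.py; the function names build_parser and main are hypothetical.

```python
import argparse


def build_parser():
    """Build a parser with a --posts option that defaults to 10."""
    parser = argparse.ArgumentParser(
        description="Return information about the top Hacker News posts."
    )
    parser.add_argument(
        "--posts",
        type=int,
        default=10,
        help="number of posts to return (default: 10)",
    )
    return parser


def main(argv=None):
    """Parse the arguments and return the requested number of posts."""
    args = build_parser().parse_args(argv)
    return args.posts
```

Here main(["--posts", "20"]) returns 20, and main([]) falls back to the default of 10.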

You can also build and run this in Docker:

Install Docker - https://docs.docker.com/v17.09/engine/installation/
In the project root, run docker build -t hacker-news .
Run docker run hacker-news --posts 3

To Run Tests

In the project root, run ./run-tests.sh

rebecca-thompson/hacker-news-web-scraper
