AndreaSolinas/sitemapScraping


Scraper for publication control and SEO elements

The scraper reads a site's sitemap and extracts the published articles, their publication times, and the domain.
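As a rough illustration of this step, here is a minimal sketch using only the Python standard library. It is not the project's actual code: the function name `parse_sitemap`, the sample XML, and the returned field names are assumptions made for the example. It also assumes a standard sitemaps.org `urlset` and uses `<lastmod>` as a stand-in for the publication time, which may differ from what the project reads.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sample data, so the sketch runs without network access.
SAMPLE_SITEMAP = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/news/first-article</loc>
    <lastmod>2024-01-15T08:30:00+00:00</lastmod>
  </url>
  <url>
    <loc>https://example.com/news/second-article</loc>
  </url>
</urlset>
"""


def parse_sitemap(xml_text):
    """Return one dict per <url> entry: article URL, time, and domain."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    entries = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        # <lastmod> is optional in the protocol, so it may be None.
        lastmod = url.findtext("sm:lastmod", default=None, namespaces=SITEMAP_NS)
        entries.append(
            {
                "url": loc,
                "published": lastmod,
                "domain": urlparse(loc).netloc,
            }
        )
    return entries


if __name__ == "__main__":
    for entry in parse_sitemap(SAMPLE_SITEMAP):
        print(entry)
```

In a real run the XML would be fetched from the site's sitemap URL rather than taken from a constant, and entries without a timestamp would need their own handling.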

This version aims for clean code with a clear separation between layers. It also parameterizes its configuration, which is why environment variables (.env) and .yaml configuration files have been introduced.
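One possible way to combine the two configuration sources is sketched below, assuming environment variables take precedence over file values. Everything named here is hypothetical: the `ScraperConfig` fields, the `SCRAPER_*` variable names, and the defaults are not taken from the project. The `file_values` dictionary stands in for the contents of a .yaml file, which could be loaded with a YAML parser such as PyYAML's `yaml.safe_load`.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ScraperConfig:
    """Hypothetical settings object; field names are illustrative only."""

    sitemap_url: str
    timeout_seconds: float
    user_agent: str


def load_config(file_values, env=None):
    """Merge file-based values with environment overrides.

    file_values: dict standing in for the parsed .yaml configuration.
    env: mapping of environment variables (defaults to os.environ).
    """
    env = os.environ if env is None else env
    return ScraperConfig(
        # Environment variables win over values from the file.
        sitemap_url=env.get("SCRAPER_SITEMAP_URL", file_values["sitemap_url"]),
        timeout_seconds=float(
            env.get("SCRAPER_TIMEOUT", file_values.get("timeout_seconds", 10))
        ),
        user_agent=env.get(
            "SCRAPER_USER_AGENT", file_values.get("user_agent", "sitemap-scraper")
        ),
    )


if __name__ == "__main__":
    config = load_config(
        {"sitemap_url": "https://example.com/sitemap.xml", "timeout_seconds": 5},
        env={"SCRAPER_TIMEOUT": "2.5"},
    )
    print(config)
```

Passing `env` explicitly keeps the loader easy to test, and keeping all lookups in one function means the rest of the code depends only on the typed settings object, which fits the separation between layers described above.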

Warning

This is a rework of version 1.0.0. This release is not yet in production and is still at an early stage of development.

Caution

Web scraping is not always legal. Even when the information is public and accessible to everyone, this project is intended only for personal use or for use with the approval of the site being scraped.

License

This project is proprietary and is not licensed for public distribution or modification. All rights reserved by the author.