# Web Scraping via Proxy with Chrome Headers
## Overview

This project collects information from websites through a proxy server while dynamically modifying request headers to mimic different Chrome versions. Each product is uniquely identified by a hash of its URL.
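A minimal sketch of the keying idea, assuming the project's `mmh3` dependency is used for hashing (the product URL below is a hypothetical example):

```python
import mmh3

# Hypothetical product URL; any stable URL string works as input
url = "https://example.com/product/123"

# mmh3.hash returns a deterministic 32-bit signed integer,
# so the same URL always maps to the same product key
product_key = mmh3.hash(url)
print(product_key)
```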
## Features
- Uses proxy servers to bypass restrictions.
- Modifies request headers to simulate different Chrome browser versions (see the sketch after this list).
- Extracts and processes web data using `requests` and `lxml`.
- Implements MurmurHash to generate unique keys for each product.
- Stores and manages data using SQLite and Pandas.
- Supports asynchronous operations with `asyncio` and `asyncpg`.
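A minimal sketch of the header rotation mentioned above, assuming a small hand-picked list of Chrome versions (the versions and extra headers are illustrative, not the project's exact values):

```python
import random
import requests

# A few Chrome major versions to rotate through (illustrative list)
CHROME_VERSIONS = ["112.0.0.0", "113.0.0.0", "114.0.0.0"]

def chrome_headers():
    # Pick a version at random so consecutive requests present different browsers
    version = random.choice(CHROME_VERSIONS)
    return {
        "User-Agent": (
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
            f"AppleWebKit/537.36 (KHTML, like Gecko) Chrome/{version} Safari/537.36"
        ),
        "Accept-Language": "en-US,en;q=0.9",
    }

response = requests.get("https://example.com", headers=chrome_headers())
```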
## Dependencies

Ensure you have the following third-party Python libraries installed (`configparser`, `sqlite3`, `asyncio`, `struct`, and `json` ship with Python and need no installation):

```bash
pip install requests free-proxy lxml pandas asyncpg mmh3
```
## Installation

Clone this repository and navigate to the project directory:

```bash
git clone https://github.com/Evgen-dev1989/m-tak.git
cd m-tak
```
## Usage
- Retrieve a free proxy using `fp.fp.FreeProxy`.
- Modify headers dynamically to include different Chrome versions.
- Parse HTML responses using `lxml`.
- Store data efficiently in a database (see the storage sketch after this list).
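For the storage step, a minimal sketch using Pandas with SQLite (the table name and columns are assumptions for illustration):

```python
import sqlite3
import pandas as pd

# Example rows as produced by the scraping step (values are illustrative)
rows = [{"key": 123456, "url": "https://example.com/product/123", "title": "Example product"}]
df = pd.DataFrame(rows)

# Append into a local SQLite file; to_sql creates the table if it does not exist
conn = sqlite3.connect("products.db")
df.to_sql("products", conn, if_exists="append", index=False)
conn.close()
```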
## Example

```python
from fp.fp import FreeProxy
import requests
from lxml import html

# Fetch a free proxy and route both HTTP and HTTPS traffic through it
proxy = FreeProxy().get()
session = requests.Session()
session.proxies = {"http": proxy, "https": proxy}

# Chrome-style User-Agent so the request looks like a regular browser
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/114.0.0.0 Safari/537.36"
}

response = session.get("https://example.com", headers=headers)
parsed_html = html.fromstring(response.content)
```
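For the asynchronous path listed under Features, a minimal sketch with `asyncio` and `asyncpg`, assuming a local PostgreSQL database and a `products(key, url, title)` table with a unique `key` column (connection settings are placeholders):

```python
import asyncio
import asyncpg

async def save_products(rows):
    # Connection parameters are placeholders; adjust to your environment
    conn = await asyncpg.connect(
        host="127.0.0.1", user="postgres", password="postgres", database="scraper"
    )
    try:
        # Skip rows whose MurmurHash key is already stored
        await conn.executemany(
            "INSERT INTO products (key, url, title) VALUES ($1, $2, $3) "
            "ON CONFLICT (key) DO NOTHING",
            rows,
        )
    finally:
        await conn.close()

asyncio.run(save_products([(123456, "https://example.com/product/123", "Example product")]))
```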
## License

This project is licensed under the MIT License.
## Contact

For any inquiries, please contact [camkaenota@gmail.com].