Porn Site Scraper

A web-based scraper and UI monitoring tool for downloading videos and full photo galleries from various porn sites. It launches a web driver (Chrome, via Laravel Dusk) for scraping and can be configured to handle authentication for media hidden behind logins.

Currently supported sites

Videos

PornHub (pornhub.com)
Porntrex (porntrex.com)
FullPorner (fullporner.com)
PornKTube (pornktube.tv)
HQPorner (hqporner.com)
WhoresHub (whoreshub.com)
YouJizz (youjizz.com)
Goodporn (goodporn.to)

Photo Galleries

PornPics (pornpics.com)
PicHunter (pichunter.com)

Usage

Visit one of the supported sites and navigate to the desired video or gallery.
Copy the URL from the site, paste it into the input in the web UI, and enter a name to use for the saved file.
Click the scrape button to launch the job, then monitor its progress in the UI.

Installation

Pre-reqs

PHP >= 7.2
Composer and npm (used by the installation steps)
Chrome browser installed on the host system
FFmpeg installed on the host system
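Before installing, the prerequisites can be checked with a short Bash sketch. It is not part of the project: it checks PHP (>= 7.2), Chrome, and FFmpeg, plus Composer and npm, which the installation steps use. It assumes GNU sort (for version ordering), and the Chrome binary names it tries (google-chrome, google-chrome-stable, chromium, chromium-browser) are guesses that may not match your distribution.

```shell
#!/usr/bin/env bash
# Pre-requisite check sketch (not part of site_scraper).
# Assumes GNU sort (-V) and common Chrome binary names; adjust as needed.

# Succeeds when version $1 >= version $2 (e.g. version_ge 7.4.3 7.2).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Prints the first of the given command names found on PATH; fails if none are.
first_available() {
  local cmd
  for cmd in "$@"; do
    if command -v "$cmd" >/dev/null 2>&1; then
      printf '%s\n' "$cmd"
      return 0
    fi
  done
  return 1
}

# Reports each prerequisite and returns non-zero if any is missing.
preflight() {
  local status=0 php_version chrome tool

  if command -v php >/dev/null 2>&1; then
    php_version="$(php -r 'echo PHP_VERSION;')"
    if version_ge "$php_version" 7.2; then
      echo "ok: PHP $php_version"
    else
      echo "missing: PHP >= 7.2 (found $php_version)"
      status=1
    fi
  else
    echo "missing: php"
    status=1
  fi

  # Assumed binary names; your system may use a different one.
  if chrome="$(first_available google-chrome google-chrome-stable chromium chromium-browser)"; then
    echo "ok: Chrome ($chrome)"
  else
    echo "missing: Chrome"
    status=1
  fi

  # FFmpeg is a listed prerequisite; composer and npm are used during installation.
  for tool in ffmpeg composer npm; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "ok: $tool"
    else
      echo "missing: $tool"
      status=1
    fi
  done

  return "$status"
}
```

Source the file and call preflight; it prints one line per prerequisite and returns non-zero if anything is missing.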

Installation steps

Clone the repository
git clone https://github.com/ed36080666/site_scraper.git
Install PHP dependencies
composer install
Install the Laravel Dusk ChromeDriver
php artisan dusk:chrome-driver
Install frontend dependencies
npm install
Copy and configure .env
cp .env.example .env
1. Set FFMPEG_OUTPUT_PATH in .env to a full system path. This determines where saved videos are stored.
2. Set FFMPEG_LOG_PATH in .env to a full system path. This determines where FFmpeg stores its log files.
Generate Laravel application key
php artisan key:generate
Create the base SQLite database
touch ./database/database.sqlite
Run database migrations
php artisan migrate
Build frontend assets
npm run dev
Start a queue worker (handles scraping jobs in the background)
php artisan queue:work
Start the application
php artisan serve
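The two .env path settings from the installation steps (FFMPEG_OUTPUT_PATH and FFMPEG_LOG_PATH) can also be set from the shell. This is a minimal Bash sketch, not part of the project: the helper names set_env_var and set_path_var are hypothetical, it assumes GNU sed for in-place editing, and it assumes the values contain no |, & or \ characters. set_path_var rejects relative paths, since these variables expect full system paths, and creates the directory if it does not exist.

```shell
#!/usr/bin/env bash
# Sketch: write full-path settings into a Laravel .env file (not part of site_scraper).
# Assumes GNU sed for in-place editing and values without '|', '&' or '\' characters.

# Sets KEY=value in the given file, replacing an existing KEY= line or appending one.
set_env_var() {
  local file="$1" key="$2" value="$3"
  if grep -q "^${key}=" "$file" 2>/dev/null; then
    sed -i "s|^${key}=.*|${key}=${value}|" "$file"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}

# Like set_env_var, but requires an absolute path and creates the directory first.
set_path_var() {
  local file="$1" key="$2" dir="$3"
  case "$dir" in
    /*) ;;
    *)
      echo "$key must be a full system path, got: $dir" >&2
      return 1
      ;;
  esac
  mkdir -p "$dir" && set_env_var "$file" "$key" "$dir"
}

# Usage from the project root (example directories are placeholders):
#   set_path_var .env FFMPEG_OUTPUT_PATH /srv/site_scraper/videos
#   set_path_var .env FFMPEG_LOG_PATH /srv/site_scraper/logs
```

Replacing the existing line instead of always appending keeps a single definition of each key in .env.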

Running queue workers

To get the most out of this application, run Laravel queue workers. The best way to do this is to run them in the background under Supervisor, which launches a given number of worker processes and keeps them running.

Install Supervisor
sudo apt update && sudo apt install supervisor
Create a new config file for our workers:
sudo vim /etc/supervisor/conf.d/site_scraper_worker.conf

[program:site_scraper_worker]
process_name=%(program_name)s_%(process_num)02d
# Customize this path to the root of the site_scraper directory.
command=php /var/www/vhosts/site_scraper/artisan queue:work --tries=1 --timeout=7000
autostart=true
autorestart=true
stopasgroup=true
killasgroup=true
redirect_stderr=true
# Keep this greater than the longest-running job (--timeout above) so jobs can finish on shutdown.
stopwaitsecs=7201
# Set an appropriate system user.
user=
# Add more or fewer workers based on your hardware, network, etc.
numprocs=8
# Customize this to wherever you want to place your queue worker logs.
stdout_logfile=

Reread the config files and update Supervisor
sudo supervisorctl reread
sudo supervisorctl update
Check that the workers are running
sudo supervisorctl status

You should see something like the following:

site_scraper_worker:site_scraper_worker_00   RUNNING   pid 20567, uptime 0:02:55
... (one entry for each worker)
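To check the count rather than reading it by eye, the status output can be piped through a small filter. This is a sketch, not part of the project: count_running_workers is a hypothetical helper that counts lines whose first field starts with site_scraper_worker: and whose state is RUNNING. The result should equal the numprocs value in the worker config.

```shell
#!/usr/bin/env bash
# Sketch: count site_scraper workers that Supervisor reports as RUNNING.
# Reads `supervisorctl status` output on stdin, so it can be tried on saved output.

# $1: Supervisor program name (defaults to the one used in the worker config).
count_running_workers() {
  local program="${1:-site_scraper_worker}"
  awk -v prog="$program" '
    # Field 1 is "group:process", field 2 is the state (RUNNING, FATAL, ...).
    index($1, prog ":") == 1 && $2 == "RUNNING" { n++ }
    END { print n + 0 }
  '
}

# Usage: the printed number should equal numprocs in site_scraper_worker.conf.
#   sudo supervisorctl status | count_running_workers
```

Using index() for the prefix match avoids treating the program name as a regular expression.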

Troubleshooting

Chrome WebDriver exceptions
1. Ensure Chrome is installed on the host system.
2. Ensure the Laravel Dusk ChromeDriver binary is installed.
  1. See the Laravel Dusk docs for more info.
Out-of-date errors. Laravel Dusk sometimes installs a ChromeDriver version that requires a newer Chrome than the one installed on the system. If you see errors about unsupported versions during scraping, try updating Chrome to a newer version (i.e., reinstall or update the Chrome browser).
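One way to spot this mismatch is to compare the major versions reported by Chrome and by ChromeDriver, which are generally expected to match. The sketch below is not part of the project; the Chrome binary name (google-chrome) and the Dusk driver path (vendor/laravel/dusk/bin/chromedriver-linux) in the usage comment are assumptions about a typical Linux setup.

```shell
#!/usr/bin/env bash
# Sketch: compare Chrome and ChromeDriver major versions (not part of site_scraper).

# Prints the major component of the first dotted version number in a
# --version string, e.g. "Google Chrome 114.0.5735.198" -> 114.
major_version() {
  printf '%s\n' "$1" | grep -oE '[0-9]+(\.[0-9]+)+' | head -n1 | cut -d. -f1
}

# Succeeds when both --version strings report the same major version.
versions_match() {
  local chrome_major driver_major
  chrome_major="$(major_version "$1")"
  driver_major="$(major_version "$2")"
  echo "Chrome: ${chrome_major:-unknown}, ChromeDriver: ${driver_major:-unknown}"
  [ -n "$chrome_major" ] && [ "$chrome_major" = "$driver_major" ]
}

# Usage (binary name and driver path are assumptions for a Linux host):
#   versions_match "$(google-chrome --version)" \
#     "$(vendor/laravel/dusk/bin/chromedriver-linux --version)"
```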
Permission errors
1. Ensure the ffmpeg binary has execute permissions so the server can launch FFmpeg processes.
2. Ensure the server has write permission to the video output directory.
3. Ensure the server has write permission to all log directories.
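These permission checks can be scripted as well. This sketch is not part of the project and tests only the current user's access, so run it as the user your server and queue workers run as. check_permissions, the www-data user, and the paths in the usage comment are illustrative placeholders; storage/logs is Laravel's default log directory.

```shell
#!/usr/bin/env bash
# Sketch: check the permission items above for the *current* user.
# Run it as the user your server and queue workers run as.

# $1: path to the ffmpeg binary; remaining args: directories that must be writable.
check_permissions() {
  local ffmpeg_bin="$1" status=0 dir
  shift
  if [ -x "$ffmpeg_bin" ]; then
    echo "ok: $ffmpeg_bin is executable"
  else
    echo "problem: $ffmpeg_bin is missing or not executable"
    status=1
  fi
  for dir in "$@"; do
    if [ -d "$dir" ] && [ -w "$dir" ]; then
      echo "ok: $dir is writable"
    else
      echo "problem: $dir is missing or not writable"
      status=1
    fi
  done
  return "$status"
}

# Usage (www-data and the paths are placeholders; use your .env values):
#   sudo -u www-data bash -c 'source ./check_permissions.sh &&
#     check_permissions "$(command -v ffmpeg)" /path/to/videos /path/to/logs storage/logs'
```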
