A web-based scraper and monitoring UI for downloading videos and full photo galleries from various porn sites. It launches a web driver for scraping and can be configured to handle authentication for media hidden behind logins.
Videos
- PornHub (pornhub.com)
- Porntrex (porntrex.com)
- FullPorner (fullporner.com)
- PornKTube (pornktube.tv)
- HQPorner (hqporner.com)
- WhoresHub (whoreshub.com)
- YouJizz (youjizz.com)
- Goodporn (goodporn.to)
Photo Galleries
- PornPics (pornpics.com)
- PicHunter (pichunter.com)
- Visit one of the supported sites and navigate to a desired video.
- Copy the URL from the site, paste it into the web UI input, and add a name to be used for the saved file.
- Click the scrape button to launch the job and monitor progress in the UI.
- Clone the repository
git clone https://github.com/ed36080666/site_scraper.git
- Install PHP dependencies
composer install
- Install the Laravel Dusk Chrome driver
php artisan dusk:chrome-driver
- Install frontend dependencies
npm install
- Copy and configure .env
cp .env.example .env
- Set the full system path for the FFMPEG_OUTPUT_PATH variable in .env. This determines where saved videos are stored.
- Set the full system path for the FFMPEG_LOG_PATH variable in .env. This determines where FFmpeg will store log files.
- Generate Laravel application key
php artisan key:generate
- Create the base SQLite database
touch ./database/database.sqlite
- Run database migrations
php artisan migrate
- Build frontend assets
npm run dev
- Start a queue worker (handles scraping jobs in background)
php artisan queue:work
- Start the application
php artisan serve
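Before starting the application, it can help to confirm that the two FFmpeg variables in .env really hold full system paths. The following is a minimal sketch, not part of the project: the function name `check_env_paths` is hypothetical, and only the variable names FFMPEG_OUTPUT_PATH and FFMPEG_LOG_PATH come from the steps above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the project): report whether the
# FFmpeg path variables in a .env file are set to absolute paths.
check_env_paths() {
  local env_file="$1" status=0 name value
  for name in FFMPEG_OUTPUT_PATH FFMPEG_LOG_PATH; do
    # Take the last assignment of the variable and strip optional quotes.
    value="$(grep -E "^${name}=" "$env_file" | tail -n 1 | cut -d= -f2- | tr -d "\"'")"
    case "$value" in
      /*) echo "ok: ${name}=${value}" ;;
      *)  echo "fix: ${name} must be a full system path (got '${value}')"; status=1 ;;
    esac
  done
  return "$status"
}
```

After sourcing the function, running `check_env_paths .env` from the project root prints one line per variable and returns a non-zero status if either value is missing or relative.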
To get the most out of this application, you should leverage the Laravel worker queue. The best way to do this is to run queue workers in the background using Supervisor, which launches a given number of worker processes and keeps them running.
- Install supervisor
sudo apt update && sudo apt install supervisor
- Create a new config file for our workers:
sudo vim /etc/supervisor/conf.d/site_scraper_worker.conf
[program:site_scraper_worker]
process_name=%(program_name)s_%(process_num)02d
; customize the system path to the root of the site_scraper directory
command=php /var/www/vhosts/site_scraper/artisan queue:work --tries=1 --timeout=7000
autostart=true
autorestart=true
stopasgroup=true
killasgroup=true
redirect_stderr=true
stopwaitsecs=7201
; set the appropriate system user
user=
; add more or fewer workers based on your hardware, network, etc.
numprocs=8
; customize this to wherever you want to place your queue worker logs
stdout_logfile=
- Reread the config files and update supervisor
sudo supervisorctl reread
sudo supervisorctl update
- Check that the workers are running
sudo supervisorctl status
You should see something like the following:
site_scraper_worker:site_scraper_worker_00 RUNNING pid 20567, uptime 0:02:55
... 1 entry for each worker
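To avoid counting entries by eye, the status output can be piped through a small filter. This is only a sketch: `count_running_workers` is a hypothetical name, and it assumes the state (RUNNING, FATAL, and so on) is the second whitespace-separated column, as in the sample output above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count RUNNING entries in `supervisorctl status` output
# read from stdin and compare the count with the expected number of workers.
count_running_workers() {
  local expected="$1" running
  # The second whitespace-separated column of each status line is the state.
  running="$(awk '$2 == "RUNNING" { n++ } END { print n + 0 }')"
  echo "${running}/${expected} workers running"
  [ "$running" -eq "$expected" ]
}
```

With the configuration above, a call such as `sudo supervisorctl status | count_running_workers 8` would print the count and return a non-zero status if fewer (or more) than eight entries are RUNNING. Adjust the number to match your numprocs value.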
- Chrome Web Driver exceptions
  - Ensure Chrome is installed on the host system
  - Ensure the Laravel Dusk Chrome driver binary is installed
  - Visit the Laravel Dusk docs for more info
  - Out-of-date errors. Sometimes Laravel Dusk installs a version of the Chrome driver that requires a newer Chrome binary than the one installed on the system. If you see errors about unsupported versions during scraping, try updating the Chrome binary (i.e., reinstall or update the Chrome browser).
- Permission errors
  - Ensure the ffmpeg binary has execute permissions so the server can launch processes
  - Ensure the server has write permissions to the video output directory
  - Ensure the server has write permissions to all the log directories
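The checks above can be scripted. The sketch below is illustrative only and makes a few assumptions: the function names are hypothetical, the version helper expects strings in the usual `--version` shape (for example "Google Chrome 114.0.5735.90"), and the permission check is only meaningful when run as the same system user that runs the server and queue workers.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the troubleshooting checks above.

# Print the major version found in a string such as "Google Chrome 114.0.5735.90".
major_version() {
  printf '%s\n' "$1" | grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1 | cut -d. -f1
}

# Succeed only if both version strings share the same major version.
same_major_version() {
  local a b
  a="$(major_version "$1")"
  b="$(major_version "$2")"
  [ -n "$a" ] && [ "$a" = "$b" ]
}

# Arguments: path to the ffmpeg binary, then any directories that must be writable.
check_scraper_permissions() {
  local ffmpeg_bin="$1" status=0 dir
  shift
  if [ -x "$ffmpeg_bin" ]; then
    echo "ok: $ffmpeg_bin is executable"
  else
    echo "fix: $ffmpeg_bin is not executable"
    status=1
  fi
  for dir in "$@"; do
    if [ -d "$dir" ] && [ -w "$dir" ]; then
      echo "ok: $dir is writable"
    else
      echo "fix: $dir is missing or not writable"
      status=1
    fi
  done
  return "$status"
}
```

For example, pass the output of your Chrome binary's `--version` and your Chrome driver binary's `--version` to `same_major_version`, and pass the ffmpeg path plus the directories from FFMPEG_OUTPUT_PATH and FFMPEG_LOG_PATH to `check_scraper_permissions`. The exact binary names and locations depend on your system.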