Install cronjobs
The scripts are a set of helper tools for administering a to.science installation.
Most of them were written for use in cron jobs, such as
– backups of database, of the search index
– re-indexing the search index
– nightly web crawling
– update OAI sets
– export webserver logs to a statistics tool
– and more
They can also be run on demand from the Linux console.
Most scripts are written in Bash (Bourne Again Shell).
Install the scripts:
cd /opt/toscience/
git clone "https://github.com/hbz/to.science.scripts.git" bin
cd bin
ln -s ../git/to.science.install/variables.conf
The last command creates a symbolic link to the global configuration file of to.science, which almost all scripts need in order to run.
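As a rough illustration of how a script might pick up this configuration, the sketch below sources variables.conf from a given directory. The helper name load_config is hypothetical and is not taken from to.science.scripts; ELASTICSEARCH is the only variable name that this page confirms.

```shell
#!/usr/bin/env bash
# Hypothetical helper: load variables.conf from the given directory.
# Prints a message to stderr and returns non-zero if the file is missing.
load_config() {
    local dir="$1"
    local conf="$dir/variables.conf"
    if [ ! -r "$conf" ]; then
        echo "variables.conf not found in $dir" >&2
        return 1
    fi
    # Variables assigned in the file become available to the caller.
    . "$conf"
}

# Possible use inside a script: resolve the script's own directory first.
# script_dir="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
# load_config "$script_dir" || exit 1
```

Because the symbolic link lives next to the scripts, resolving the script directory and sourcing from there would work no matter which directory a cron job starts in.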
The scripts need additional software components to run.
GNU parallel is needed to run several update scripts, including indexAll.sh, lobidifyAll.sh and updateAll.sh.
Create a new software repository in YaST2 with the URL https://download.opensuse.org/repositories/home:/tange/SLE_15_SP1_Backports as “tange’s Home Project”. Install “parallel – a command line tool to execute jobs in parallel” with YaST2.
Install the WWW::Curl module. This is needed to run get_pids.pl as part of updateAll.sh.
Install libcurl-devel with YaST2. Then run:
sudo su
cpan WWW::Curl
zypper in jq
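After these installation steps, a quick check can confirm that the components are actually available. The following is a minimal sketch; the helper names missing_tools and has_perl_module are hypothetical, and it assumes that WWW::Curl::Easy, the module shipped with the WWW::Curl distribution, is the one to test for.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every command from the argument list
# that cannot be found on PATH, one name per line.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Hypothetical helper: succeed if the given Perl module can be loaded.
has_perl_module() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

# Check the components named in this section.
missing="$(missing_tools parallel jq curl perl)"
if [ -n "$missing" ]; then
    printf 'Missing tool: %s\n' $missing
fi
has_perl_module WWW::Curl::Easy || echo "Perl module WWW::Curl could not be loaded"
```

If the script prints nothing, all of the listed tools were found and the Perl module loaded.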
mkdir ~/backup
mkdir ~/backup/mysql
mkdir ~/backup/elasticsearch
sudo su
chown -R elasticsearch /opt/toscience/backup/elasticsearch
exit # or press Ctrl+D to become toscience again
cd ~/bin
backup-db.sh --init
source variables.conf
curl -XPUT "$ELASTICSEARCH/_snapshot/my_backup" -d'{"type":"fs","settings":{"compress":true,"location":"/opt/toscience/backup/elasticsearch"}}'
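The JSON body of the snapshot repository request is easy to mistype by hand. As a sketch, it can be generated by a small function; snapshot_repo_body is a hypothetical name, and the function assumes the location path contains no quotes or backslashes, since it does no JSON escaping.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the JSON body for a shared file system
# snapshot repository stored at the given location.
# Assumption: the path needs no JSON escaping.
snapshot_repo_body() {
    local location="$1"
    printf '{"type":"fs","settings":{"compress":true,"location":"%s"}}\n' "$location"
}

# Print the body for the backup directory used on this page.
snapshot_repo_body /opt/toscience/backup/elasticsearch
```

The output could be piped through jq to confirm that it is well-formed JSON before it is passed to curl with -d. Afterwards, a GET request to $ELASTICSEARCH/_snapshot/my_backup should return the registered repository.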
Create a directory for Apache log backups (used for depersonalization):
mkdir ~/apachelog.bck
Add user toscience and user root to group adm.
Make /var/log/apache2 writeable for group adm:
chown root:adm /var/log/apache2
chmod 770 /var/log/apache2
ls -ld /var/log/apache2
drwxrwx--- 2 root adm 53248 Sep 17 16:21 /var/log/apache2
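The same check can be scripted, for instance before the depersonalization job runs. This is a minimal sketch with hypothetical helper names; it relies on GNU stat from coreutils, which is an assumption about the target system.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the octal permission bits and the owning
# group of a path, for example "770 adm". Requires GNU stat.
mode_and_group() {
    stat -c '%a %G' "$1"
}

# Hypothetical helper: succeed if the named user is a member of the group.
user_in_group() {
    id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qxF -- "$2"
}

# Possible use:
# [ "$(mode_and_group /var/log/apache2)" = "770 adm" ] || echo "unexpected permissions"
# user_in_group toscience adm || echo "toscience is not in group adm"
```

Note that a newly added group membership only takes effect for sessions started after the change.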
Install basic cronjobs for a toscience instance that has webcrawling facilities:
Edit the crontab with crontab -e:
# m h dom mon dow command
# Update Metadata (lobidify & enrich)
0 5 * * 6 /opt/toscience/bin/updateAll.sh > /dev/null
30 22 * * * /opt/toscience/bin/turnOnOaiPmhPolling.sh
45 5 * * * /opt/toscience/bin/turnOffOaiPmhPolling.sh
# Anonymize local apache logs
0 1 * * * /opt/toscience/bin/depersonalize-apache-logs.sh > /dev/null
# Load apache logs of the previous day into the local piwik instance
0 2 * * * /opt/toscience/bin/import-logfiles.sh > /dev/null
# Run Edoweb Webgathering
0 20 * * * /opt/toscience/bin/runGatherer.sh >> /opt/toscience/logs/runGatherer.log
# Evaluate the latest Webgatherer run
0 21 * * * /opt/toscience/bin/evalWebgatherer.sh >> /opt/toscience/logs/runGatherer.log
# Move files out of the working directory of wpull into the output directory of wpull
0 22 * * * /opt/toscience/bin/move_files_from_crawldir.sh >> /opt/toscience/logs/ks.move_files_from_crawldir.log
# Generate Crawl Reports
0 22 * * 6 /opt/toscience/bin/crawlReport.sh >> /opt/toscience/logs/crawlReport.log
# Create Elasticsearch and MySQL backups in /opt/toscience/backup
0 2 * * * /opt/toscience/bin/backup-es.sh -c >> /opt/toscience/logs/backup-es.log 2>&1
30 2 * * * /opt/toscience/bin/backup-es.sh -b >> /opt/toscience/logs/backup-es.log 2>&1
0 2 * * * /opt/toscience/bin/backup-db.sh -c >> /opt/toscience/logs/backup-db.log 2>&1
30 2 * * * /opt/toscience/bin/backup-db.sh -b >> /opt/toscience/logs/backup-db.log 2>&1
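Entries like the ones above can also be added without an interactive editor. The sketch below uses a hypothetical helper, add_cron_entry, which is not part of to.science.scripts. It reads an existing crontab on standard input and appends a line only if that exact line is not already present, so running it repeatedly does not create duplicates.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a crontab on stdin, write it to stdout,
# and append ENTRY unless an identical line already exists.
add_cron_entry() {
    local entry="$1" current
    current="$(cat)"
    if [ -n "$current" ]; then
        printf '%s\n' "$current"
    fi
    # -x matches whole lines, -F treats the entry as a fixed string,
    # so the asterisks in the schedule are not interpreted as patterns.
    if ! printf '%s\n' "$current" | grep -qxF -- "$entry"; then
        printf '%s\n' "$entry"
    fi
}

# Possible use (this replaces the current user's crontab):
# crontab -l 2>/dev/null \
#   | add_cron_entry '0 5 * * 6 /opt/toscience/bin/updateAll.sh > /dev/null' \
#   | crontab -
```

The helper only filters text, so its output can be inspected before it is piped into crontab.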