Install cronjobs


Ingolf Kuss edited this page Sep 17, 2024 · 12 revisions

Install cronjobs and scripts

Install scripts for to.science

The scripts are a set of tools for administering a to.science installation.
Most of them were written for use in cronjobs, such as
– backups of the database and of the search index
– re-indexing the search index
– nightly web crawling
– updating OAI sets
– exporting webserver logs to a statistics tool
– and more
They can also be run on demand from the Linux console.
Most scripts are written in Bash (Bourne Again Shell).

Install the scripts:

cd /opt/toscience/
git clone "https://github.com/hbz/to.science.scripts.git" bin
cd bin
ln -s ../git/to.science.install/variables.conf

The last command creates a symbolic link to the global configuration file of to.science. Almost all scripts need this configuration file to run.
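
Because almost every script reads variables.conf, it can be worth confirming that the link resolves before any cronjob is installed. The following is a minimal sketch, not part of to.science.scripts; the function name check_config_link is hypothetical.

```shell
#!/bin/bash
# Report whether variables.conf in the given directory is a symbolic link
# that resolves to a readable file. Returns 0 on success, 1 otherwise.
# (Hypothetical helper, not shipped with to.science.scripts.)
check_config_link() {
    local dir="$1"
    local link="$dir/variables.conf"
    if [ ! -L "$link" ]; then
        echo "not a symbolic link: $link" >&2
        return 1
    fi
    # -r follows the link, so a dangling link fails here
    if [ ! -r "$link" ]; then
        echo "link target missing or unreadable: $link" >&2
        return 1
    fi
    echo "ok: $link -> $(readlink "$link")"
}

# Example call (path taken from the install steps):
# check_config_link /opt/toscience/bin
```

A dangling link usually means that to.science.install has not been cloned to /opt/toscience/git yet.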

Install additional Software

The scripts need additional software components to run.

Install parallel

parallel is needed to run several update scripts, including indexAll.sh, lobidifyAll.sh and updateAll.sh.
In YaST2, create a new software repository with the URL https://download.opensuse.org/repositories/home:/tange/SLE_15_SP1_Backports and the name “tange’s Home Project”. Then install “parallel – a command line tool to execute jobs in parallel” with YaST2.

Install additional Perl modules

Install the WWW::Curl module. It is needed to run get_pids.pl as part of updateAll.sh.

Install libcurl-devel with YaST2. Then run:

sudo su
cpan WWW::Curl

Install jq

zypper in jq
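
After installing the additional software, a quick check can show whether anything is still missing. This is a sketch under the assumption that the tools are expected in the PATH of the user that runs the cronjobs; both function names are hypothetical.

```shell
#!/bin/bash
# Print the name of every given command that is not found in PATH.
# (Hypothetical helper, not shipped with to.science.scripts.)
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}

# Return 0 if the given Perl module can be loaded, non-zero otherwise.
perl_module_available() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

# Example calls:
# missing_commands parallel jq perl curl
# perl_module_available WWW::Curl || echo "WWW::Curl is not installed"
```

Empty output from missing_commands means that all listed commands were found.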

Install cronjobs for to.science

Initialize backup jobs

mkdir ~/backup
mkdir ~/backup/mysql
mkdir ~/backup/elasticsearch
sudo su
chown -R elasticsearch /opt/toscience/backup/elasticsearch
exit  # or press Ctrl-D to become toscience again
cd ~/bin
backup-db.sh --init
source variables.conf
curl -XPUT "$ELASTICSEARCH/_snapshot/my_backup" -d'{"type":"fs","settings":{"compress":true,"location":"/opt/toscience/backup/elasticsearch"}}'
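
The repository registration can be wrapped so that the JSON body is built in one place and the result is read back afterwards. This is a sketch only: the function names are hypothetical, and the Content-Type header is an assumption that recent Elasticsearch versions expect it, while older ones should accept it.

```shell
#!/bin/bash
# Build the JSON body that registers a shared-filesystem snapshot repository.
# The location must not contain double quotes or backslashes.
# (Hypothetical helper, not shipped with to.science.scripts.)
snapshot_repo_body() {
    printf '{"type":"fs","settings":{"compress":true,"location":"%s"}}' "$1"
}

# Register the repository, then read its definition back.
# $1 = Elasticsearch base URL, $2 = repository name, $3 = location
register_snapshot_repo() {
    curl -sS -XPUT "$1/_snapshot/$2" \
         -H 'Content-Type: application/json' \
         -d "$(snapshot_repo_body "$3")" || return 1
    echo
    curl -sS -XGET "$1/_snapshot/$2"
}

# Example call, after "source variables.conf":
# register_snapshot_repo "$ELASTICSEARCH" my_backup /opt/toscience/backup/elasticsearch
```

If the GET request returns the repository settings, the nightly backup-es.sh jobs have a target to write to.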

Initialize Apache log depersonalization

Create a directory for the Apache log backups (used for depersonalization):

mkdir ~/apachelog.bck

Add user toscience and user root to group adm.
Make /var/log/apache2 writeable for group adm:

chown root:adm /var/log/apache2
chmod 770 /var/log/apache2
ls -ld /var/log/apache2
drwxrwx--- 2 root adm 53248 Sep 17 16:21 /var/log/apache2
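
The group and permission setup can also be checked from a script. The sketch below assumes GNU stat (as found on Linux) and uses hypothetical function names; the usermod call in the comment is one common way to add a user to a group and is not taken from the original instructions.

```shell
#!/bin/bash
# Return 0 if the directory has the expected octal mode (GNU stat syntax).
# (Hypothetical helper, not shipped with to.science.scripts.)
dir_has_mode() {
    [ "$(stat -c '%a' "$1")" = "$2" ]
}

# Return 0 if the user is a member of the group.
user_in_group() {
    id -nG "$1" | tr ' ' '\n' | grep -qxF -- "$2"
}

# Example calls:
# usermod -a -G adm toscience   # as root; assumption, one way to add the user
# dir_has_mode /var/log/apache2 770 || echo "unexpected mode"
# user_in_group toscience adm || echo "toscience is not in group adm"
```

New group memberships normally take effect only after the user logs in again.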

Install basic cronjobs

Install the basic cronjobs for a to.science instance that has web crawling facilities:

Edit the crontab with crontab -e:

# m h  dom mon dow   command
# Update Metadata (lobidify & enrich)
0 5 * * 6 /opt/toscience/bin/updateAll.sh > /dev/null
30 22 * * * /opt/toscience/bin/turnOnOaiPmhPolling.sh
45 5 * * * /opt/toscience/bin/turnOffOaiPmhPolling.sh
# Anonymize local apache logs
0 1 * * * /opt/toscience/bin/depersonalize-apache-logs.sh > /dev/null
# Load apache logs of the previous day into the local piwik instance
0 2 * * * /opt/toscience/bin/import-logfiles.sh > /dev/null
# Run Edoweb Webgathering
0 20 * * * /opt/toscience/bin/runGatherer.sh >> /opt/toscience/logs/runGatherer.log
# Evaluate the latest Webgatherer run
0 21 * * * /opt/toscience/bin/evalWebgatherer.sh >> /opt/toscience/logs/runGatherer.log
# Move files out of the working directory of wpull into the output directory of wpull
0 22 * * * /opt/toscience/bin/move_files_from_crawldir.sh >> /opt/toscience/logs/ks.move_files_from_crawldir.log
# Generate Crawl Reports
0 22 * * 6 /opt/toscience/bin/crawlReport.sh >> /opt/toscience/logs/crawlReport.log
# Create Elasticsearch and MySQL backups in /opt/toscience/backup
0 2 * * * /opt/toscience/bin/backup-es.sh -c >> /opt/toscience/logs/backup-es.log 2>&1
30 2 * * * /opt/toscience/bin/backup-es.sh -b >> /opt/toscience/logs/backup-es.log 2>&1
0 2 * * * /opt/toscience/bin/backup-db.sh -c >> /opt/toscience/logs/backup-db.log 2>&1
30 2 * * * /opt/toscience/bin/backup-db.sh -b >> /opt/toscience/logs/backup-db.log 2>&1
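
Each crontab entry follows the field order given in the header comment: minute, hour, day of month, month, day of week, command. To see at a glance which jobs start at the same time (several entries above are scheduled for 02:00), the installed crontab can be condensed into a short overview. This is a sketch with a hypothetical function name; it assumes plain numeric minute and hour fields, as used in the entries above.

```shell
#!/bin/bash
# Read crontab text on stdin and print "HH:MM dow=<day of week> <command>"
# for every job, skipping comment lines and lines with fewer than six fields.
# Only the first word of the command is kept, so redirections are dropped.
# (Hypothetical helper, not shipped with to.science.scripts.)
list_cron_jobs() {
    awk '$1 !~ /^#/ && NF >= 6 { printf "%02d:%02d dow=%s %s\n", $2, $1, $5, $6 }'
}

# Example call, sorted by start time:
# crontab -l | list_cron_jobs | sort
```

Script arguments such as -c and -b are not shown in the overview, so backup-es.sh and backup-db.sh each appear twice with different start times.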