For PBS systems (and, in the forthcoming archer2 changes, for slurm systems too), the monc/misc folder contains a continuation script for running continuation jobs. It uses dependency chains so that each job starts only after the previous job has completed, restarting from the previous job's checkpoint.
A similar script can be written for the arc4 sge system using the -hold_jid flag (a single dash in SGE's qsub), which does the same job as the slurm --dependency flag.
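As a rough sketch of what such a chain could look like on SGE: the functions below only print the `qsub` commands rather than submitting anything, so they can be checked by eye first. The job names (`monc_1`, `monc_2`, ...) and the script name `monc.sh` are hypothetical, not taken from the repo. SGE's `qsub` spells the option `-hold_jid` and accepts job names as well as job IDs, which avoids having to parse the ID out of the `qsub` output.

```shell
#!/bin/bash
# Sketch only: prints the qsub commands for an SGE continuation chain.
# Job names and the script name are illustrative assumptions.

# Print the qsub command for job $1, held on job $2 (if non-empty),
# running submission script $3.
build_qsub_cmd() {
  local name="$1" hold="$2" script="$3"
  if [ -n "${hold}" ] ; then
    echo "qsub -N ${name} -hold_jid ${hold} ${script}"
  else
    echo "qsub -N ${name} ${script}"
  fi
}

# Print the commands for a chain of $1 jobs running script $2.
# Each job after the first is held until the previous one has finished.
build_chain() {
  local n="$1" script="$2" prev="" i name
  for i in $(seq 1 "${n}") ; do
    name="monc_${i}"
    build_qsub_cmd "${name}" "${prev}" "${script}"
    prev="${name}"
  done
}
```

Calling `build_chain 3 monc.sh` prints three commands; piping them through `bash` on a login node would submit them. Each job in the chain would then rely on the checkpoint logic in the submission script to pick up where the previous one stopped.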
The job submission script used by @craigpoku had a function in it which checks for completed jobs as a way of providing this functionality. That and the monc/misc/continuation.sh script would be a good place to start. The relevant code is below:
--- Checks:
    # Check for run completion message in monc output file:
    function check_complete() {
      if [ -r "${MONC_OUT}" ] ; then
        grep -q 'Model run complete due to model time' ${MONC_OUT} >& /dev/null
        if [ "${?}" = "0" ] ; then
          echo 'MONC run appears to have completed (exceeded termination time)'
          # Display end time:
          echo "END TIME: $(date)"
          exit 0
        fi
      fi
    }
    check_complete

    # Check for previous checkpoint file:
    if [ -r "${MONC_OUT}" ] ; then
      PREV_CKPT_FILE=$(basename $(grep \
        'Restarted configuration from checkpoint file' \
        ${MONC_OUT} | egrep -o '[0-9a-zA-Z_/-]+\.nc'))
    fi

    # Check for most recent existing checkpoint file:
    CKPT_FILE=$(basename $(\ls -1v ${CKPT_DIR} | tail -n 1) 2> /dev/null)

    # If current checkpoint file is same as previous, give up:
    if [ ! -z "${PREV_CKPT_FILE}" ] && [ ! -z "${CKPT_FILE}" ] ; then
      if [ "${PREV_CKPT_FILE}" = "${CKPT_FILE}" ] ; then
        echo "Previous checkpoint file is same as current (${CKPT_FILE})"
        # Display end time:
        echo "END TIME: $(date)"
        exit 1
      fi
    fi

    # If we have a checkpoint file, restart MONC, else, start from config:
    if [ ! -z "${CKPT_FILE}" ] ; then
      MONC_ARGS="--checkpoint=${CKPT_DIR}/${CKPT_FILE}"
    else
      MONC_ARGS="--config=${MONC_CONFIG}"
    fi
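The checkpoint selection above relies on `ls -1v` (GNU version sort), so that a checkpoint numbered 10 sorts after one numbered 9 rather than before 2. A minimal sketch of the same idea wrapped in functions is below. The function names and the checkpoint file names in the usage note are hypothetical; only the `--checkpoint` and `--config` arguments come from the script above.

```shell
#!/bin/bash
# Sketch only: same selection logic as the script above, as functions.
# Assumes GNU ls (for -v) and that the directory holds only checkpoints.

# Print the name of the most recent checkpoint in directory $1,
# or nothing if the directory is empty or unreadable.
latest_checkpoint() {
  local dir="$1"
  \ls -1v "${dir}" 2> /dev/null | tail -n 1
}

# Print the MONC arguments: restart from the latest checkpoint in $1
# if there is one, otherwise start from the config file $2.
monc_args() {
  local ckpt_dir="$1" config="$2" ckpt
  ckpt="$(latest_checkpoint "${ckpt_dir}")"
  if [ -n "${ckpt}" ] ; then
    echo "--checkpoint=${ckpt_dir}/${ckpt}"
  else
    echo "--config=${config}"
  fi
}
```

With files such as `dump_2.nc`, `dump_9.nc` and `dump_10.nc` in the directory, `latest_checkpoint` would pick `dump_10.nc`; a plain lexical sort would pick `dump_9.nc` instead.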
@gyoung410 has modified the script in her fork of this repo. The qsub command in the continuation script is unchanged and still uses PBS syntax, so it does not fully work, but it shows some of the changes necessary.
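For reference, the part of the `qsub` line that would need to change is the dependency option. The helper below is a sketch of how the three schedulers spell it. It is not taken from the fork or from monc/misc/continuation.sh, and the function name is made up for illustration. One difference to be aware of: as far as I know, SGE's `-hold_jid` only waits for the earlier job to finish and has no direct `afterok` equivalent, which is one reason the completion and checkpoint checks in the job script are still worth keeping.

```shell
#!/bin/bash
# Sketch only: print the "run after job $2" option for scheduler $1.
# The function name is hypothetical; the flags are the standard
# PBS, Slurm and SGE dependency options.
dependency_flag() {
  local scheduler="$1" job_id="$2"
  case "${scheduler}" in
    pbs)   echo "-W depend=afterok:${job_id}" ;;
    slurm) echo "--dependency=afterok:${job_id}" ;;
    sge)   echo "-hold_jid ${job_id}" ;;
    *)     echo "unknown scheduler: ${scheduler}" >&2 ; return 1 ;;
  esac
}
```

For example, `qsub $(dependency_flag sge 12345) monc.sh` would expand to `qsub -hold_jid 12345 monc.sh` (job ID and script name made up here).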