EOS spooling and traffic light #58

Merged
merged 4 commits into from
Sep 13, 2019

@sixtadm sixtadm commented Sep 13, 2019

This PR implements the traffic light for BOINC submission of SixTrack tasks. In addition, since spooling on AFS alone can become a problem because of its limited storage space, spooling to EOS has been implemented as well.

The logic is:

  • users can submit tasks whenever they want; these are regularly uploaded to the boinc.work volume on AFS;
  • an acrontab job on lxplus takes care of tarring new tasks (input files) into zipped archives and uploading the .tar.gz files to EOS. Tasks are tarred by order of arrival of the study and, within a study, by order of arrival of the input files. By default, every tar contains inputs for at most 10k jobs, and at most 1k jobs per study. The acrontab job is run every 10 min;
  • an acrontab job on the boinc server takes care of checking the status of the queue and, if there is room, it starts untarring the archives and submitting work to BOINC. The acrontab job does not query the DB to get the present status of the queue; it takes the number from the server status_page instead. Hence, this acrontab job is run every hour. For every tar file, the job checks the current status of the queue, so that any change in status is taken into account immediately. In addition, if there is room for new tasks, the acrontab job checks whether the number of tasks in the current tar file fits into the available room; if not, the following tar is checked.
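The packing rule of the lxplus job (studies in order of arrival, at most 10k jobs per tar, at most 1k jobs per study in each tar) can be sketched as below. This is only an illustration under stated assumptions, not the code in this PR: the function `plan_tars`, its parameters, and the in-memory representation of studies are hypothetical, and the actual scripts operate on files on AFS and EOS.

```python
def plan_tars(studies, max_per_tar=10_000, max_per_study=1_000):
    """Group pending tasks into tar-sized batches.

    `studies` is a list of (study_name, tasks) pairs, already sorted by
    order of arrival of the study; each `tasks` list is sorted by order
    of arrival of the input files. Both limits are assumed to be positive.
    Returns a list of batches, each a list of (study_name, task) pairs.
    (Hypothetical helper, not part of the PR.)
    """
    # Work on copies so the caller's lists are left untouched.
    pending = [(name, list(tasks)) for name, tasks in studies]
    tars = []
    while any(tasks for _, tasks in pending):
        batch = []
        for name, tasks in pending:
            room = max_per_tar - len(batch)
            if room <= 0:
                break  # this tar is full; the rest waits for the next one
            # Cap the contribution of each study to mix studies in a tar.
            take = min(max_per_study, room, len(tasks))
            batch.extend((name, task) for task in tasks[:take])
            del tasks[:take]
        tars.append(batch)
    return tars
```

Capping each study's share of a tar is what yields the mixing of studies: a large study cannot fill a whole archive while smaller studies are waiting.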

The above approach has the following advantages:

  • no change in the users' daily experience (no new scripts, errors, etc.);
  • all EOS-related operations are done with native commands, e.g. xrdcp, eos find, etc., such that failures experienced earlier due to the fuse-mounted EOS will not affect the behaviour of the scripts;
  • enlarged spooling space: the submissions from last night (from 06:00 PM) to now alone would have increased the occupancy of the work.boinc volume by 20 GB;
  • mixing of studies, such that there are no long periods where a single user/study monopolises the resources;
  • both acrontab jobs have locking and logging mechanisms, error handling (as much as I could without losing too much time on that), and recovery from interrupted runs;
  • no additional load on the BOINC DB, as the queue status is taken from the status_page;
  • looping over tars when actually submitting to BOINC avoids blocking the processing because of big tar files.
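The loop over tars on the boinc server can be sketched as follows. It is a minimal illustration, not the script in this PR: `submit_fitting_tars`, `read_queue_size`, `queue_limit` and `submit` are hypothetical names, and stopping as soon as the queue is full is an assumption. In the real job the queue size comes from the server status_page and submission means untarring the archive and creating BOINC work.

```python
def submit_fitting_tars(tars, read_queue_size, queue_limit, submit):
    """Submit every tar that fits into the room left in the queue.

    `tars` is a list of (tar_name, n_tasks) pairs, oldest first.
    `read_queue_size` is a callable returning the current queue size;
    it is called once per tar so that changes are seen immediately.
    `submit` is a callable taking (tar_name, n_tasks).
    Returns the names of the tars that were submitted.
    (All names here are hypothetical.)
    """
    submitted = []
    for name, n_tasks in tars:
        room = queue_limit - read_queue_size()
        if room <= 0:
            break  # red light: the queue is full (assumed behaviour)
        if n_tasks > room:
            continue  # too big for the available room; check the next tar
        submit(name, n_tasks)
        submitted.append(name)
    return submitted
```

Skipping a tar that does not fit, rather than waiting for it, is what prevents one big archive from blocking smaller ones queued behind it.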

Sixtrack runtime added 4 commits September 10, 2019 17:15
…OS spooling

The traffic light has also been implemented in the regular task submission processes, even though they have been disabled.
Moreover, find commands on tasks and .desc files order results by time.
@amereghe amereghe merged commit 922314b into SixTrack:master Sep 13, 2019