Auto-reboot if kernel upgraded immediately after idr-00-preinstall.yml #108

Merged
merged 1 commit into from
Apr 19, 2018


manics
Contributor

@manics manics commented Apr 16, 2018

This reduces but does not eliminate some long-standing issues with race conditions during reboots. The original aim of this repository was to support a full deployment run-through from scratch, with reboots postponed to the very end. In practice this causes problems when installed services are unnecessarily interrupted, or when a service is installed against an old running kernel which changes after the reboot. Instead, this PR executes the reboot at the end of the pre-install stage, after system packages have been upgraded but before any applications are installed.

This will cause the first full run-through to fail after the pre-install stage: the combination of multiple VM reboots via a proxy, plus the reboot of the proxy itself, makes it too complicated to auto-detect when all VMs have resumed. However, this is preferable to the other problems caused later.
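
The reboot decision at the end of the pre-install stage could be sketched in shell roughly as follows. This is only an illustration, assuming an RPM-based host; the helper names `kernel_version_from_package` and `reboot_needed` are hypothetical and are not taken from this PR.

```shell
# Hypothetical helpers for illustration only, not part of this PR.

# Strip the "kernel-" prefix from an RPM package name,
# e.g. kernel-3.10.0-862.el7.x86_64 -> 3.10.0-862.el7.x86_64
kernel_version_from_package() {
    printf '%s\n' "${1#kernel-}"
}

# Print "yes" when the running kernel ($1) differs from the
# newest installed kernel ($2), otherwise print "no".
reboot_needed() {
    if [ "$1" != "$2" ]; then
        echo yes
    else
        echo no
    fi
}
```

On a host, the two arguments might come from `uname -r` and from the first package listed by `rpm -q --last kernel`, with a reboot triggered only when the answer is `yes`.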

@joshmoore
Member

The description here, with its planned failure, seems at best like a workaround. Is deploy-idr.sh trying to do too much? Is there a two-stage way of calling it that would not fail?

@manics
Contributor Author

manics commented Apr 17, 2018

deploy-idr.sh currently has 7 steps:

- 'galaxy': Install galaxy roles
- 'provision': Provision OpenStack instances, storage and networks
- 'network': Miscellaneous network configuration inside instances
- 'deploy-pre': Initialise instances, including updating packages
- 'deploy': Deploy the IDR
- 'deploy-apps': Deploy public IDR apps
- 'deploy-post': Additional setup including OMERO accounts and monitoring

We could remove 'all' and recommend running the stages individually, or add additional groupings?
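
One way to picture "additional groupings" is a small lookup from a group name to the steps listed above. This is a sketch under assumptions: the function `stages_for_group` and the group names `bootstrap` and `deploy` are hypothetical, and the split point (after 'deploy-pre') is only one possible choice.

```shell
# Hypothetical grouping of the seven deploy-idr.sh steps.
# The group names and the split after deploy-pre are assumptions.
stages_for_group() {
    case "$1" in
        bootstrap) echo "galaxy provision network deploy-pre" ;;
        deploy)    echo "deploy deploy-apps deploy-post" ;;
        all)       echo "galaxy provision network deploy-pre deploy deploy-apps deploy-post" ;;
        *)         echo "unknown group: $1" >&2; return 1 ;;
    esac
}
```

With a lookup like this, each step stays individually runnable while a group name gives a shorter entry point for the common cases.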

@joshmoore
Member

I tested this locally by using:

$ deploy-idr.sh prod50 all  # failed for some unrelated issue
$ deploy-idr.sh prod50 expert deployment/ansible/idr-00-preinstall.yml

For me, the above, which amounts to galaxy + provision + network + pre-install + the necessary reboot, is a pretty good function for this script (though I'd still say it needs to be in this repo). This amounts to "get me started" or bootstrap. The rest is pretty straightforward on a playbook-by-playbook basis, e.g. I'm currently testing with this full.yml:


# Common usage:
# ansible-playbook playbooks/full.yml -u centos -e idr_environment=testXX

- hosts: localhost
  connection: local
  gather_facts: False
  pre_tasks:
   - fail: msg="Variable 'idr_environment' is not defined"
     when: idr_environment is not defined

- hosts: localhost
  connection: local
  gather_facts: False
  tasks:
   - include_vars:
       file: vars/os-idr-create-{{ idr_environment }}.yml

- import_playbook: ../deployment/ansible/idr-01-install-idr.yml
- import_playbook: ../deployment/ansible/idr-02-services.yml
- import_playbook: ../deployment/ansible/idr-03-postinstall.yml
- import_playbook: idr-links.yml
- import_playbook: idr-oneoff-steps.yml
- import_playbook: ../deployment/ansible/idr-09-monitoring.yml
- import_playbook: notify-slack.yml
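
The two-call bootstrap tested above could be wrapped in a small function, sketched here under assumptions: `bootstrap_env` and the `DEPLOY_CMD` override are hypothetical names, and the first call is assumed to stop once the pre-install reboot happens, so its failure is tolerated rather than treated as fatal.

```shell
# Hypothetical wrapper around the two calls shown earlier in this comment.
# DEPLOY_CMD is an assumed override, useful for dry runs (e.g. DEPLOY_CMD=echo).
bootstrap_env() {
    env_name="$1"
    cmd="${DEPLOY_CMD:-deploy-idr.sh}"

    # First pass: expected to stop around the pre-install reboot,
    # so report the failure and carry on instead of aborting.
    "$cmd" "$env_name" all ||
        echo "first pass stopped; re-running pre-install" >&2

    # Second pass: re-run only the pre-install playbook in expert mode.
    "$cmd" "$env_name" expert deployment/ansible/idr-00-preinstall.yml
}
```

Calling `bootstrap_env prod50` would then stand in for the two commands above, and the remaining playbooks can be run one at a time afterwards.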

@joshmoore
Member

Relaunched travis with gh-109

@joshmoore
Member

Now green with travis in addition to being run against prod50 as outlined above.

Member

@sbesson sbesson left a comment


Given all the ongoing work and the existing issues, I would propose to merge with the caveats discussed above. My understanding is that there is no way to safely get a single script deploying everything in one run with the guarantee that the system will be usable, and that we have to evolve towards a multi-stage deployment:

bootstrap (galaxy + provision + network + upgrade + reboot)
deploy (core + vae + ftp depending on the flags)

More discussion will be necessary about the future of the deploy-idr.sh script discussed above. If we are splitting the single script into self-contained groups of phases, an additional thought is whether we should handle the decoupled components via separate standalone deployment phases (instead of using flags), i.e.:

bootstrap (galaxy + provision + network + upgrade + reboot)
deploy_core
deploy_ftp
deploy_vae
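
To make the standalone-phase idea concrete, here is a minimal sketch of running named phases in order and stopping at the first failure. Everything in it is an assumption for illustration: `run_phases` and `PHASE_RUNNER` are hypothetical, and the runner defaults to `echo` so the sketch acts as a dry run.

```shell
# Hypothetical runner for the standalone phases listed above.
# PHASE_RUNNER is an assumed command that executes one named phase;
# it defaults to echo so that this sketch only prints the plan.
run_phases() {
    runner="${PHASE_RUNNER:-echo}"
    for phase in "$@"; do
        # Reject anything that is not one of the four proposed phases.
        case "$phase" in
            bootstrap|deploy_core|deploy_ftp|deploy_vae) ;;
            *) echo "unknown phase: $phase" >&2; return 1 ;;
        esac
        # Stop at the first failing phase so later ones are not attempted.
        "$runner" "$phase" || return 1
    done
}
```

Compared with flags on a single deploy phase, each component here is selected simply by naming it, e.g. `run_phases bootstrap deploy_core`.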

@sbesson sbesson merged commit 7ce8da3 into IDR:master Apr 19, 2018
@manics manics deleted the reboot-in-preinstall branch April 19, 2018 16:19