Useful tips and troubleshooting

### Other useful tips

#### Setting a spot price

Amazon spot pricing allows you to specify a maximum price for your EC2 instances, and this can let you save money on running your instances. You can read more about spot pricing here.

To set a spot instance price, run the pancancer sysconfig --force command. You will be prompted to answer a number of questions, the answers to which are probably already correct. When you get to the question that asks "What spot price would you like to set?", type your new spot instance price and hit Enter.

$ pancancer sysconfig --force
Setting up pancancer config files.
What is your AWS Key [Press "ENTER" to use the previous value: <PREVIOUS VALUE>, or type a new value if you want]? 
What is your AWS Secret Key [Press "ENTER" to use the previous value: <PREVIOUS VALUE>, or type a new value if you want]? 
How many VMs do you want in your fleet [Press "ENTER" to use the previous value: 1, or type a new value if you want]? 
What AWS Security Group should the VMs belong to [Press "ENTER" to use the previous value: test_security_group, or type a new value if you want]? 
What spot price would you like to set [Press "ENTER" to use the previous value: 0.001, or type a new value if you want]? 0.15
Your new pancancer system config file has been written to /home/ubuntu/.pancancer/simple_pancancer_config.json
The next time you want to run "pancancer sysconfig", you can use this file like this: 
pancancer sysconfig --config /home/ubuntu/.pancancer/simple_pancancer_config.json

If you wish to return to on-demand pricing, re-run the above command but set the spot price to 0.001.

#### Configuration

Configuration should already be complete once you have entered the Pancancer Launcher, but if you need to change or adjust some configuration options (such as fleet size), you can use this command:

$ pancancer sysconfig --force

This command will ask the same series of questions you have already answered; press Enter to keep an existing value, or type a new answer to change it.

#### Detaching and reattaching with Docker

If you need to do some work on the host machine, it is better to detach from the pancancer_launcher container than to exit. If you exit, the container should restart automatically, but any processes that are running (such as the provisioning process) may be terminated and that could affect any VMs that are in mid-provision.

To detach safely, press Ctrl-P Ctrl-Q

To re-attach to your pancancer_launcher container, issue this command on your host machine:

$ sudo docker attach pancancer_launcher

Press Enter if the prompt does not appear right away.
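
If you exit by accident instead of detaching, you can check whether the container came back up before re-attaching. A minimal check using standard Docker commands (the container name matches the one used above):

$ sudo docker ps --filter name=pancancer_launcher
$ sudo docker start pancancer_launcher

The first command should list the container with an "Up ..." status; the second is only needed if it is shown as exited rather than running.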

### Troubleshooting

The provisioner keeps getting SSH errors!

There are a few things that could cause this to happen, the most common being:

  • An invalid PEM key file
  • Security group configuration issues.

Ensure that your PEM key file is valid and that the path is configured correctly. You can check the path that the system is using in ~/.pancancer/simple_pancancer_config.json.
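
A quick manual check can also help: confirm that the key file has restrictive permissions and that you can open an SSH connection with it yourself. The key path, user name, and worker IP address below are placeholders, and the ubuntu user assumes an Ubuntu-based image:

$ chmod 600 /path/to/your-key.pem
$ ssh -i /path/to/your-key.pem ubuntu@<worker-ip> 'echo connection OK'

If ssh itself cannot connect with that key, the provisioner will not be able to either.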

Ensure that the security group allows inbound connections on all TCP ports from the public IP address of the machine which is acting as the launcher host.
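
If you manage security groups with the AWS CLI, a rule like the following opens all TCP ports to the launcher host. The group name is taken from the example configuration above, <launcher-public-ip> is a placeholder for your host's public address (curl https://checkip.amazonaws.com is one way to find it), and groups in a non-default VPC need --group-id instead of --group-name:

$ aws ec2 authorize-security-group-ingress --group-name test_security_group --protocol tcp --port 0-65535 --cidr <launcher-public-ip>/32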

I changed my configuration but it doesn't seem to be having any effect.

If the configuration is changed while the Provisioner and Coordinator are running, they may need to be restarted to use the new configuration. It is best to avoid doing this while a VM is being provisioned. Try stopping these services, updating your configuration, and then starting them again:

$ pancancer coordinator stop
$ pancancer provisioner stop
$ pancancer sysconfig --force
$ pancancer coordinator start
$ pancancer provisioner start

My Worker VMs fail and get shut down, but I want them to stay up and running so I can debug problems.

Normally, failed workers are cleaned up automatically. It is sometimes useful to leave failed workers up and running if you are interested in debugging a problematic workflow.

Keeping failed workers must be configured when generating job requests:

$ pancancer run-workers --keep_failed

The result of this is that if a Worker VM completes its work successfully, it will be automatically removed from the fleet, but if a worker fails, it will be left alone, and you will be able to log in to it and debug whatever caused it to fail.

There is a worker that is stuck in a bad state and I need to get rid of it.

Normally, Worker VMs are removed from the fleet when they have finished working. Failed workers are usually cleaned up automatically as well. If a worker gets stuck in a bad state and cannot be removed automatically, you may need to remove it from the fleet manually.

To remove a Worker VM from the fleet, create a file named kill-list.json. This file will contain a list of the IP addresses of any Worker VMs that need to be removed:

[ "0.0.0.1","0.0.0.2"]

Then run the Reaper command, passing it the file name containing the list:

$ Reaper --kill-list kill-list.json 
[2015/10/13 18:06:59] | Killing {i-346db1a6=i-346db1a6},
[2015/10/13 18:06:59] | Marking instances for death i-346db1a6