Decouple individual pod termination frequency from cluster size #20

Open
linki opened this issue Feb 23, 2017 · 3 comments · May be fixed by #353

Comments

@linki
Owner

linki commented Feb 23, 2017

Currently, the probability of a pod being killed depends on the number of pods in the target group. This is a problem if you want to run chaoskube as a cluster addon and let pods opt in to being killed via annotations, because you cannot estimate how often a given pod would be killed.
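To make the coupling concrete, here is a minimal sketch. It assumes chaoskube kills exactly one pod per run, picked uniformly at random from the target group, and uses a 10-minute interval purely as an illustration; the function name is hypothetical and not part of chaoskube.

```python
def expected_kills_per_pod_per_day(num_pods: int, interval_minutes: float) -> float:
    """Expected daily terminations of one specific pod.

    Assumption: each run kills a single pod chosen uniformly at random
    from a target group of `num_pods` pods.
    """
    runs_per_day = 24 * 60 / interval_minutes
    # Each run hits this particular pod with probability 1 / num_pods.
    return runs_per_day / num_pods


# The same pod is killed far more often in a small target group.
for n in (5, 50, 500):
    print(f"{n} pods: {expected_kills_per_pod_per_day(n, interval_minutes=10):.3f} kills/pod/day")
```

The per-pod rate shrinks as the target group grows, which is why it is hard to predict for a pod that merely opts in.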

Proposal

Allow specifying how often pods get terminated, or at least somehow keep track of what's going on, so that pod terminations happen in a somewhat predictable way. For example, instead of terminating a single pod every 10 minutes, each pod could have a probability of X% of being killed per hour. This would hopefully make the termination frequency of an individual pod independent of the total number of pods running.

@klautcomputing
Contributor

Would you like this to be a pod-specific or a cluster-wide probability?

@linki
Owner Author

linki commented Jul 6, 2017

I was thinking about making it pod-specific, but I also see value in a global version of it like you propose in #34.

For the pod specific version I thought one would annotate a PodSpec with something like:

  • chaos.alpha.kubernetes.io/frequency=2/day for "kill this twice per day"
  • chaos.alpha.kubernetes.io/frequency=10/hour for "kill this ten times per hour"
  • chaos.alpha.kubernetes.io/frequency=1/week for "kill this once a week"
  • etc.
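A minimal sketch of how such a value could be parsed. The `count/period` shape and the `hour`, `day` and `week` units come from the examples above; the function name, the returned unit (kills per minute) and the error handling are assumptions for illustration.

```python
# Length of each supported period in minutes (units taken from the examples above).
PERIOD_MINUTES = {"hour": 60, "day": 24 * 60, "week": 7 * 24 * 60}


def parse_frequency(value: str) -> float:
    """Turn a value such as '2/day' into a desired kill rate per minute."""
    count_text, separator, period = value.strip().partition("/")
    if not separator or period not in PERIOD_MINUTES:
        raise ValueError(f"unsupported frequency: {value!r}")
    count = int(count_text)  # raises ValueError for non-numeric counts
    if count < 0:
        raise ValueError(f"frequency must not be negative: {value!r}")
    return count / PERIOD_MINUTES[period]


for example in ("2/day", "10/hour", "1/week"):
    print(example, "->", parse_frequency(example), "kills per minute")
```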

To implement this, one would invoke chaoskube at a certain interval like before and then calculate a kill probability per pod based on the desired frequency and on how often chaoskube is invoked.

For instance, let's assume chaoskube runs at an interval of 1 minute and a pod has the annotation set to twice a day (2/day). Then on each iteration chaoskube would calculate a probability that this pod should be killed like this:

  • twice a day => 24*60/2 => every 720 minutes
  • since chaoskube runs every minute => 1/720 => 0.14% chance to kill this pod in each iteration.

Or for ten times an hour (10/hour):

  • ten times an hour => 60/10 => every 6 minutes
  • since chaoskube runs every minute => 1/6 => 16.7% chance to kill this pod in each iteration.
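The arithmetic above can be sketched as follows. The function names and parameters are hypothetical; the only inputs are the desired number of kills, the length of the period in minutes, and the chaoskube interval in minutes.

```python
import random


def kill_probability(kills: int, period_minutes: float, interval_minutes: float = 1.0) -> float:
    """Per-iteration kill probability for a pod that wants `kills` terminations
    per `period_minutes`, when chaoskube runs every `interval_minutes`."""
    return kills * interval_minutes / period_minutes


def should_kill(probability: float, rng: random.Random) -> bool:
    """Stateless decision for one pod in one iteration."""
    return rng.random() < probability


print(f"2/day, 1 min interval:   {kill_probability(2, 24 * 60):.4%}")
print(f"10/hour, 1 min interval: {kill_probability(10, 60):.4%}")
# A longer interval simply scales the probability up.
print(f"2/day, 10 min interval:  {kill_probability(2, 24 * 60, interval_minutes=10):.4%}")
```

The result is only meaningful as a probability while it stays at or below 1, that is, while the requested frequency is no higher than one kill per iteration.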

This would also work with different intervals I think.

I'm not sure if this is correct, but if it is, it would allow chaoskube to remain stateless, and pods would be killed at roughly the same pace over time regardless of cluster size.
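One rough way to check the claim is a small simulation. This is only a sketch under the assumptions of the example above (1-minute interval, 2/day, every pod decided independently in every iteration, no state kept between iterations); the function name and the fixed seed are illustrative.

```python
import random


def simulate_kills_per_pod_per_day(num_pods: int, probability: float, days: int,
                                   runs_per_day: int = 24 * 60, seed: int = 0) -> float:
    """Average number of kills per pod per day over a simulated period."""
    rng = random.Random(seed)
    kills = 0
    # Every iteration makes an independent decision for every pod.
    for _ in range(days * runs_per_day * num_pods):
        if rng.random() < probability:
            kills += 1
    return kills / (num_pods * days)


probability = 1 / 720  # 2/day at a 1-minute interval
for num_pods in (5, 50):
    rate = simulate_kills_per_pod_per_day(num_pods, probability, days=20)
    print(f"{num_pods} pods: {rate:.2f} kills/pod/day")
```

Under these assumptions the average per-pod rate should hover around the requested two kills per day for both group sizes, with more noise for the smaller group.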

@klautcomputing
Contributor

Sounds good.

Just a couple of thoughts/questions I had when I read this:

  • What do you plan to do with non-annotated pods?
  • What happens to pods that want a higher kill rate than 60/hour?
  • This is more complicated, but might be the better approach because every pod can opt in and decide its own kill rate.
  • Do we want that to be pod-specific or namespace-specific?
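On the 60/hour question: with a 1-minute interval, any higher frequency yields a "probability" above 1. One possible way to handle it, purely as an assumption and not something decided in this thread, is to cap the value at 1 and report that the requested rate cannot be reached at the current interval. The function name is hypothetical.

```python
from typing import Tuple


def capped_kill_probability(kills: int, period_minutes: float,
                            interval_minutes: float = 1.0) -> Tuple[float, bool]:
    """Return (probability, was_capped) for one iteration.

    A capped pod is killed on every iteration, which is still less often
    than it asked for.
    """
    raw = kills * interval_minutes / period_minutes
    return min(raw, 1.0), raw > 1.0


print(capped_kill_probability(10, 60))   # within range, not capped
print(capped_kill_probability(120, 60))  # 120/hour at a 1-minute interval is capped
```

Running chaoskube at a shorter interval would be the other obvious option, since it lowers the per-iteration probability for every pod.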
