
chaoskube always kills the same pod #197

Closed
HaveFun83 opened this issue Apr 15, 2020 · 5 comments


Hi

I'm currently upgrading from v0.15.1 to v0.19.0. Pods in the demo namespace:

NAME                              READY   STATUS    RESTARTS   AGE
chaoskube-demo-7f5ffd44db-djjcc   1/1     Running   0          10m
redis-demo-master-0               1/1     Running   0          8m29s
redis-demo-slave-0                1/1     Running   0          26s
redis-demo-slave-1                1/1     Running   0          7m39s
redis-demo-slave-2                1/1     Running   0          7m14s

Now redis-demo-slave-0 is always killed instead of a random pod:

time="2020-04-15T15:42:03Z" level=info msg="starting up" dryRun=false interval=2m0s version=v0.19.0
W0415 15:42:03.745799       6 client_config.go:543] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
time="2020-04-15T15:42:03Z" level=info msg="connected to cluster" master="https://10.96.0.1:443" serverVersion=v1.17.2
time="2020-04-15T15:42:03Z" level=info msg="setting pod filter" annotations= excludedPodNames="chaoskube-demo|redis-demo-master" includedPodNames="<nil>" labels= maxKill=1 minimumAge=0s namespaceLabels= namespaces=demo
time="2020-04-15T15:42:03Z" level=info msg="setting quiet times" daysOfYear="[]" timesOfDay="[16:00-07:00]" weekdays="[Friday Saturday Sunday]"
time="2020-04-15T15:42:03Z" level=info msg="setting timezone" location=UTC name=UTC offset=0
time="2020-04-15T15:46:04Z" level=info msg="terminating pod" name=redis-demo-slave-0 namespace=demo
time="2020-04-15T15:48:04Z" level=info msg="terminating pod" name=redis-demo-slave-0 namespace=demo
time="2020-04-15T15:50:04Z" level=info msg="terminating pod" name=redis-demo-slave-0 namespace=demo
time="2020-04-15T15:52:04Z" level=info msg="terminating pod" name=redis-demo-slave-0 namespace=demo

Before the upgrade, chaoskube killed one of the redis-demo-slave pods at random.

Any ideas?

Thanks in advance.

linki (Owner) commented Apr 17, 2020

@HaveFun83 Interesting 🤔

Please run it with --debug, which will tell us whether it even has more than one candidate to pick from. It'll also show us the values of all flags.
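For reference, one way to do that without touching the in-cluster Deployment is to run chaoskube locally against the current kubeconfig. This is a sketch; the flags mirror the "reading config" values from the logs, and --dry-run is added so no pod is actually deleted while investigating:

```shell
# Sketch: run chaoskube locally with debug logging enabled.
# Flag values are taken from the logs in this issue; --dry-run prevents
# any real pod deletion during the investigation.
chaoskube \
  --namespaces=demo \
  --excluded-pod-names='chaoskube-demo|redis-demo-master' \
  --interval=2m \
  --dry-run \
  --debug
```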

HaveFun83 (Author) commented Apr 17, 2020

Log from the working v0.15.1:

time="2020-04-17T10:54:07Z" level=debug msg="reading config" annotations= debug=true dryRun=false excludedDaysOfYear= excludedPodNames="chaoskube-demo|redis-demo-master" excludedTimesOfDay="16:00-07:00" excludedWeekdays="Sat,Sun" gracePeriod=-1s includedPodNames="<nil>" interval=2m0s kubeconfig= labels= logFormat=text master= metricsAddress=":8080" minimumAge=0s namespaceLabels= namespaces=demo timezone=UTC
time="2020-04-17T10:54:07Z" level=info msg="starting up" dryRun=false interval=2m0s version=v0.15.1
time="2020-04-17T10:54:07Z" level=debug msg="using cluster config" kubeconfig= master=
time="2020-04-17T10:54:07Z" level=info msg="connected to cluster" master="https://10.96.0.1:443" serverVersion=v1.17.2
time="2020-04-17T10:54:07Z" level=info msg="setting pod filter" annotations= excludedPodNames="chaoskube-demo|redis-demo-master" includedPodNames="<nil>" labels= minimumAge=0s namespaceLabels= namespaces=demo
time="2020-04-17T10:54:07Z" level=info msg="setting quiet times" daysOfYear="[]" timesOfDay="[16:00-07:00]" weekdays="[Saturday Sunday]"
time="2020-04-17T10:54:07Z" level=info msg="setting timezone" location=UTC name=UTC offset=0
time="2020-04-17T10:54:08Z" level=debug msg="found candidates" count=3
time="2020-04-17T10:54:08Z" level=info msg="terminating pod" name=redis-demo-slave-1 namespace=demo
time="2020-04-17T10:54:08Z" level=debug msg="calling deletePod endpoint" name=redis-demo-slave-1 namespace=demo terminator=DeletePod
time="2020-04-17T10:54:08Z" level=debug msg=sleeping...
time="2020-04-17T10:56:08Z" level=debug msg="found candidates" count=3
time="2020-04-17T10:56:08Z" level=info msg="terminating pod" name=redis-demo-slave-0 namespace=demo
time="2020-04-17T10:56:08Z" level=debug msg="calling deletePod endpoint" name=redis-demo-slave-0 namespace=demo terminator=DeletePod
time="2020-04-17T10:56:08Z" level=debug msg=sleeping...
time="2020-04-17T10:58:08Z" level=debug msg="found candidates" count=3
time="2020-04-17T10:58:08Z" level=info msg="terminating pod" name=redis-demo-slave-0 namespace=demo
time="2020-04-17T10:58:08Z" level=debug msg="calling deletePod endpoint" name=redis-demo-slave-0 namespace=demo terminator=DeletePod
time="2020-04-17T10:58:08Z" level=debug msg=sleeping...
time="2020-04-17T11:00:08Z" level=debug msg="found candidates" count=3
time="2020-04-17T11:00:08Z" level=info msg="terminating pod" name=redis-demo-slave-2 namespace=demo
time="2020-04-17T11:00:08Z" level=debug msg="calling deletePod endpoint" name=redis-demo-slave-2 namespace=demo terminator=DeletePod
time="2020-04-17T11:00:08Z" level=debug msg=sleeping...

Log from the broken v0.19.0:

time="2020-04-17T11:01:08Z" level=debug msg="reading config" annotations= debug=true dryRun=false excludedDaysOfYear= excludedPodNames="chaoskube-demo|redis-demo-master" excludedTimesOfDay="16:00-07:00" excludedWeekdays="Sat,Sun" gracePeriod=-1s includedPodNames="<nil>" interval=2m0s kubeconfig= labels= logFormat=text master= maxKill=1 metricsAddress=":8080" minimumAge=0s namespaceLabels= namespaces=demo slackWebhook= timezone=UTC
time="2020-04-17T11:01:08Z" level=info msg="starting up" dryRun=false interval=2m0s version=v0.19.0
time="2020-04-17T11:01:08Z" level=debug msg="using cluster config" kubeconfig= master=
W0417 11:01:08.166400       7 client_config.go:543] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
time="2020-04-17T11:01:08Z" level=info msg="connected to cluster" master="https://10.96.0.1:443" serverVersion=v1.17.2
time="2020-04-17T11:01:08Z" level=info msg="setting pod filter" annotations= excludedPodNames="chaoskube-demo|redis-demo-master" includedPodNames="<nil>" labels= maxKill=1 minimumAge=0s namespaceLabels= namespaces=demo
time="2020-04-17T11:01:08Z" level=info msg="setting quiet times" daysOfYear="[]" timesOfDay="[16:00-07:00]" weekdays="[Saturday Sunday]"
time="2020-04-17T11:01:08Z" level=info msg="setting timezone" location=UTC name=UTC offset=0
time="2020-04-17T11:01:09Z" level=debug msg="found victims" count=1
time="2020-04-17T11:01:09Z" level=info msg="terminating pod" name=redis-demo-slave-0 namespace=demo
time="2020-04-17T11:01:09Z" level=debug msg="calling deletePod endpoint" name=redis-demo-slave-0 namespace=demo terminator=DeletePod
time="2020-04-17T11:01:09Z" level=debug msg=sleeping...
time="2020-04-17T11:03:08Z" level=debug msg="found victims" count=1
time="2020-04-17T11:03:08Z" level=info msg="terminating pod" name=redis-demo-slave-0 namespace=demo
time="2020-04-17T11:03:08Z" level=debug msg="calling deletePod endpoint" name=redis-demo-slave-0 namespace=demo terminator=DeletePod
time="2020-04-17T11:03:08Z" level=debug msg=sleeping...

linki (Owner) commented Apr 17, 2020

I think the reason is this.

The grouping was done to prevent killing multiple pods from the same replication group, such as a Deployment or StatefulSet. It's only really needed when --max-kill is greater than 1.

However, looking at the implementation, it appears to always pick the first pod from each group as the target.

@linki linki added the bug label Apr 23, 2020
linki (Owner) commented May 2, 2020

Fixed in #203.

Thanks @HaveFun83 for reporting this.

linki (Owner) commented Jul 3, 2020

This is fixed in version v0.20.0.

@linki linki closed this as completed Jul 3, 2020