Add drain-delay flag to wait before eviction #13
Conversation
This flag allows specifying a period of time for which the drainer should wait after cordoning a node. This ensures that, for example after an instance refresh, pods don't become 'Pending' for some period of time while waiting for the new node to come up.
@pawelprazak could you take a look and let me know what you think?
Sounds useful, and backwards compatible. Thank you for your contribution.
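For context: assuming the flag is exposed as --drain-delay and accepts Go duration syntax (e.g. ./kubedrainer serve --drain-delay 90s), the sketch below shows where such a delay would sit in a drain flow. cordonNode and evictPods are hypothetical stand-ins, not the actual kubedrainer code.

package main

import (
	"log"
	"time"
)

// cordonNode and evictPods are hypothetical placeholders for the drainer's
// real cordon/evict calls; only the placement of the delay matters here.
func cordonNode(node string) error {
	log.Printf("cordoning %s", node)
	return nil
}

func evictPods(node string) error {
	log.Printf("evicting pods on %s", node)
	return nil
}

// drainWithDelay cordons the node first, waits for the configured delay,
// and only then starts evicting pods, giving replacement capacity time to come up.
func drainWithDelay(node string, drainDelay time.Duration) error {
	if err := cordonNode(node); err != nil {
		return err
	}
	if drainDelay > 0 {
		log.Printf("waiting %s before eviction", drainDelay)
		time.Sleep(drainDelay)
	}
	return evictPods(node)
}

func main() {
	_ = drainWithDelay("ip-10-0-0-1.eu-west-1.compute.internal", 90*time.Second)
}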
@@ -45,6 +45,7 @@ func serveCmd() *cobra.Command {
 	Timeout: 60 * time.Second,
 	DeleteLocalData: true,
 	IgnoreAllDaemonSets: true,
+	DrainDelay: 0 * time.Second,
I like the fact it's zero for backward compatibility
@@ -39,6 +39,7 @@ type Options struct {
 	DeleteLocalData bool
 	Selector string
 	PodSelector string
+	DrainDelay time.Duration `mapstructure:"drain-delay"`
Sorry, I don't understand the significance of the mapstructure:"drain-delay" part. The code is not fresh in my mind like it used to be, but AFAIR there is no serialisation here; happy to learn more.
I hope I got it right.
In cmd/kubedrainer/serve.go#181 you call settings.Parse(o.Drainer), which unmarshals the flags using mapstructure:
err := viper.Unmarshal(target, func(config *mapstructure.DecoderConfig) {
	config.ErrorUnused = false
	config.ZeroFields = false
})
I tested it without a mapstructure tag and the flag couldn't be parsed properly: the value of DrainDelay stayed at its default, 0s.
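To illustrate why the tag matters, here is a standalone sketch (not the kubedrainer code): viper's default mapstructure decoding matches struct fields by name, case-insensitively, so a hyphenated key like drain-delay never matches the field DrainDelay unless the tag spells it out.

package main

import (
	"fmt"
	"time"

	"github.com/spf13/viper"
)

// Options mimics the shape of the drainer options for illustration only.
// Without the mapstructure tag, the key "drain-delay" does not match the
// field name DrainDelay and the value is silently dropped during Unmarshal.
type Options struct {
	DrainDelay time.Duration `mapstructure:"drain-delay"`
}

func main() {
	// Stands in for a bound --drain-delay flag value.
	viper.Set("drain-delay", "90s")

	var o Options
	if err := viper.Unmarshal(&o); err != nil {
		panic(err)
	}
	fmt.Println(o.DrainDelay) // prints 1m30s; would stay 0s without the tag
}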
In fact, I believe some other flags, for example pod-selector, cannot be parsed properly either.
I added a simple debug output at the end of the serve Parse() function to check it:
// Parse parses all flags and settings to options
func (o *ServeOptions) Parse(cmd *cobra.Command) error {
	settings.Bind(cmd.Flags()) // needs to be run inside the command and before any viper usage for flags to be visible
	if debug {
		log.Debug().Msgf("All keys: %+v", viper.AllKeys())
		log.Debug().Msgf("All settings: %+v", viper.AllSettings())
		cmd.Flags().VisitAll(func(flag *pflag.Flag) {
			log.Debug().Msgf("'%s' -> flag: '%+v' | setting: '%+v'", flag.Name, flag.Value, viper.Get(flag.Name))
		})
		log.Debug().Msgf("Settings: %+v", *o)
	}
	if err := settings.Parse(o.Kubernetes); err != nil {
		return err
	}
	if err := settings.Parse(o.Drainer); err != nil {
		return err
	}
	if err := settings.Parse(o.AWS); err != nil {
		return err
	}
	log.Debug().Msgf("Parsed settings: %+v", *o) // <--- this one
	return nil
}
and I see the following behaviour when starting kubedrainer with ./kubedrainer -d serve --pod-selector checking:
- without a mapstructure tag for PodSelector:
{"level":"debug","time":1613463410,"message":"Settings: {Kubernetes:{ConfigFlags:{CacheDir:/Users/rtim/.kube/http-cache KubeConfig: ClusterName: AuthInfoName: Context: Namespace:<nil> APIServer: Insecure:false CertFile: KeyFile: CAFile: BearerToken: Impersonate:<nil> ImpersonateGroup:<nil> Username:<nil> Password:<nil> Timeout:0 clientConfig:<nil> lock:{state:0 sema:0} usePersistentConfig:true}} Drainer:{Node: Force:false GracePeriodSeconds:-1 IgnoreAllDaemonSets:true Timeout:1m0s DeleteLocalData:true Selector: PodSelector: DrainDelay:0s} AWS:0xc00003a6e0}"}
{"level":"debug","time":1613463410,"message":"Parsed settings: {Kubernetes:{ConfigFlags:{CacheDir:/Users/rtim/.kube/http-cache KubeConfig: ClusterName: AuthInfoName: Context: Namespace:<nil> APIServer: Insecure:false CertFile: KeyFile: CAFile: BearerToken: Impersonate:<nil> ImpersonateGroup:<nil> Username:<nil> Password:<nil> Timeout:0 clientConfig:<nil> lock:{state:0 sema:0} usePersistentConfig:true}} Drainer:{Node: Force:false GracePeriodSeconds:-1 IgnoreAllDaemonSets:true Timeout:1m0s DeleteLocalData:true Selector: PodSelector: DrainDelay:0s} AWS:0xc00003a6e0}"}
you can see that PodSelector (at the end of the JSON output) doesn't change
- with the mapstructure tag mapstructure:"pod-selector":
{"level":"debug","time":1613463080,"message":"Settings: {Kubernetes:{ConfigFlags:{CacheDir:/Users/rtim/.kube/http-cache KubeConfig: ClusterName: AuthInfoName: Context: Namespace:<nil> APIServer: Insecure:false CertFile: KeyFile: CAFile: BearerToken: Impersonate:<nil> ImpersonateGroup:<nil> Username:<nil> Password:<nil> Timeout:0 clientConfig:<nil> lock:{state:0 sema:0} usePersistentConfig:true}} Drainer:{Node: Force:false GracePeriodSeconds:-1 IgnoreAllDaemonSets:true Timeout:1m0s DeleteLocalData:true Selector: PodSelector: DrainDelay:0s} AWS:0xc00029e5f0}"}
{"level":"debug","time":1613463080,"message":"Parsed settings: {Kubernetes:{ConfigFlags:{CacheDir:/Users/rtim/.kube/http-cache KubeConfig: ClusterName: AuthInfoName: Context: Namespace:<nil> APIServer: Insecure:false CertFile: KeyFile: CAFile: BearerToken: Impersonate:<nil> ImpersonateGroup:<nil> Username:<nil> Password:<nil> Timeout:0 clientConfig:<nil> lock:{state:0 sema:0} usePersistentConfig:true}} Drainer:{Node: Force:false GracePeriodSeconds:-1 IgnoreAllDaemonSets:true Timeout:1m0s DeleteLocalData:true Selector: PodSelector:checking DrainDelay:0s} AWS:0xc00029e5f0}"}
You can verify it yourself and if you want I can work on this issue as well.
you're absolutely right, thank you for looking at other fields as well
Do you want me to continue in this PR then, or should we merge this one and open a new PR to fix the other fields?
I'll merge this one, and if you would be so kind, we'll gladly accept the fix for the other fields (I was about to start working on it) :)
sure, I can fix other fields
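For reference, the follow-up fix would presumably just add matching tags to the remaining hyphenated flags. A rough sketch, where only pod-selector and drain-delay are flag names confirmed in this thread and the other field types are inferred from the debug output above:

package drainer

import "time"

// Options sketches the drainer options with mapstructure tags added for the
// hyphenated flag names known from this thread. Other fields whose flag names
// differ from their lowercased field name would need matching tags as well,
// but those flag names aren't shown here, so they are left untouched.
type Options struct {
	Node                string
	Force               bool
	GracePeriodSeconds  int
	IgnoreAllDaemonSets bool
	Timeout             time.Duration
	DeleteLocalData     bool
	Selector            string
	PodSelector         string        `mapstructure:"pod-selector"`
	DrainDelay          time.Duration `mapstructure:"drain-delay"`
}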
FYI: https://github.com/VirtusLab/kubedrainer/releases/tag/v0.0.10 was released
This flag allows specifying a period of time for which the drainer should
wait after cordoning a node. This ensures that, for example after an instance
refresh in EKS, pods don't become 'Pending' for some period of time while
waiting for the new node to come up.