Add drain-delay flag to wait before eviction #13
Conversation
This flag allows specifying a period of time for which the drainer should wait after cordoning a node. This ensures that, for example after an instance refresh, pods don't become 'Pending' for some period of time while waiting for the new node to come up.
@pawelprazak could you take a look and let me know what you think?
Sounds useful, and backwards compatible. Thank you for your contribution.
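For context: assuming the flag is exposed as --drain-delay and accepts Go duration syntax (e.g. ./kubedrainer serve --drain-delay 90s), the sketch below shows where such a delay would sit in a drain flow. cordonNode and evictPods are hypothetical stand-ins, not the actual kubedrainer code.

package main

import (
	"log"
	"time"
)

// cordonNode and evictPods are hypothetical placeholders for the drainer's
// real cordon/evict calls; only the placement of the delay matters here.
func cordonNode(node string) error {
	log.Printf("cordoning %s", node)
	return nil
}

func evictPods(node string) error {
	log.Printf("evicting pods on %s", node)
	return nil
}

// drainWithDelay cordons the node first, waits for the configured delay,
// and only then starts evicting pods, giving replacement capacity time to come up.
func drainWithDelay(node string, drainDelay time.Duration) error {
	if err := cordonNode(node); err != nil {
		return err
	}
	if drainDelay > 0 {
		log.Printf("waiting %s before eviction", drainDelay)
		time.Sleep(drainDelay)
	}
	return evictPods(node)
}

func main() {
	_ = drainWithDelay("ip-10-0-0-1.eu-west-1.compute.internal", 90*time.Second)
}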
@@ -45,6 +45,7 @@ func serveCmd() *cobra.Command {
 	Timeout: 60 * time.Second,
 	DeleteLocalData: true,
 	IgnoreAllDaemonSets: true,
+	DrainDelay: 0 * time.Second,
I like the fact it's zero for backward compatibility
@@ -39,6 +39,7 @@ type Options struct {
 	DeleteLocalData bool
 	Selector string
 	PodSelector string
+	DrainDelay time.Duration `mapstructure:"drain-delay"`
Sorry, I don't understand the significance of the mapstructure:"drain-delay" part. The code is not fresh in my mind like it used to be, but AFAIR there is no serialisation here; happy to learn more.
I hope I got it right.
In cmd/kubedrainer/serve.go#181 you call settings.Parse(o.Drainer), which unmarshals the flags using mapstructure:
err := viper.Unmarshal(target, func(config *mapstructure.DecoderConfig) {
	config.ErrorUnused = false
	config.ZeroFields = false
})
I tested it without a mapstructure tag and the flag couldn't be parsed properly: the value of DrainDelay stayed at its default, 0s.
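To illustrate why the tag matters, here is a standalone sketch (not the kubedrainer code): viper's default mapstructure decoding matches struct fields by name, case-insensitively, so a hyphenated key like drain-delay never matches the field DrainDelay unless the tag spells it out.

package main

import (
	"fmt"
	"time"

	"github.com/spf13/viper"
)

// Options mimics the shape of the drainer options for illustration only.
// Without the mapstructure tag, the key "drain-delay" does not match the
// field name DrainDelay and the value is silently dropped during Unmarshal.
type Options struct {
	DrainDelay time.Duration `mapstructure:"drain-delay"`
}

func main() {
	// Stands in for a bound --drain-delay flag value.
	viper.Set("drain-delay", "90s")

	var o Options
	if err := viper.Unmarshal(&o); err != nil {
		panic(err)
	}
	fmt.Println(o.DrainDelay) // prints 1m30s; would stay 0s without the tag
}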
In fact, I believe some other flags, for example pod-selector, cannot be parsed properly either.
I added a simple debug output at the end of the serve Parse() function to check it:
// Parse parses all flags and settings to options
func (o *ServeOptions) Parse(cmd *cobra.Command) error {
	settings.Bind(cmd.Flags()) // needs to be run inside the command and before any viper usage for flags to be visible
	if debug {
		log.Debug().Msgf("All keys: %+v", viper.AllKeys())
		log.Debug().Msgf("All settings: %+v", viper.AllSettings())
		cmd.Flags().VisitAll(func(flag *pflag.Flag) {
			log.Debug().Msgf("'%s' -> flag: '%+v' | setting: '%+v'", flag.Name, flag.Value, viper.Get(flag.Name))
		})
		log.Debug().Msgf("Settings: %+v", *o)
	}
	if err := settings.Parse(o.Kubernetes); err != nil {
		return err
	}
	if err := settings.Parse(o.Drainer); err != nil {
		return err
	}
	if err := settings.Parse(o.AWS); err != nil {
		return err
	}
	log.Debug().Msgf("Parsed settings: %+v", *o) // <--- this one
	return nil
}
and I see the following behaviour when starting kubedrainer with ./kubedrainer -d serve --pod-selector checking:
- without a mapstructure tag for PodSelector:
{"level":"debug","time":1613463410,"message":"Settings: {Kubernetes:{ConfigFlags:{CacheDir:/Users/rtim/.kube/http-cache KubeConfig: ClusterName: AuthInfoName: Context: Namespace:<nil> APIServer: Insecure:false CertFile: KeyFile: CAFile: BearerToken: Impersonate:<nil> ImpersonateGroup:<nil> Username:<nil> Password:<nil> Timeout:0 clientConfig:<nil> lock:{state:0 sema:0} usePersistentConfig:true}} Drainer:{Node: Force:false GracePeriodSeconds:-1 IgnoreAllDaemonSets:true Timeout:1m0s DeleteLocalData:true Selector: PodSelector: DrainDelay:0s} AWS:0xc00003a6e0}"}
{"level":"debug","time":1613463410,"message":"Parsed settings: {Kubernetes:{ConfigFlags:{CacheDir:/Users/rtim/.kube/http-cache KubeConfig: ClusterName: AuthInfoName: Context: Namespace:<nil> APIServer: Insecure:false CertFile: KeyFile: CAFile: BearerToken: Impersonate:<nil> ImpersonateGroup:<nil> Username:<nil> Password:<nil> Timeout:0 clientConfig:<nil> lock:{state:0 sema:0} usePersistentConfig:true}} Drainer:{Node: Force:false GracePeriodSeconds:-1 IgnoreAllDaemonSets:true Timeout:1m0s DeleteLocalData:true Selector: PodSelector: DrainDelay:0s} AWS:0xc00003a6e0}"}
you can see that PodSelector (at the end of the JSON output) doesn't change
- with the mapstructure tag mapstructure:"pod-selector":
{"level":"debug","time":1613463080,"message":"Settings: {Kubernetes:{ConfigFlags:{CacheDir:/Users/rtim/.kube/http-cache KubeConfig: ClusterName: AuthInfoName: Context: Namespace:<nil> APIServer: Insecure:false CertFile: KeyFile: CAFile: BearerToken: Impersonate:<nil> ImpersonateGroup:<nil> Username:<nil> Password:<nil> Timeout:0 clientConfig:<nil> lock:{state:0 sema:0} usePersistentConfig:true}} Drainer:{Node: Force:false GracePeriodSeconds:-1 IgnoreAllDaemonSets:true Timeout:1m0s DeleteLocalData:true Selector: PodSelector: DrainDelay:0s} AWS:0xc00029e5f0}"}
{"level":"debug","time":1613463080,"message":"Parsed settings: {Kubernetes:{ConfigFlags:{CacheDir:/Users/rtim/.kube/http-cache KubeConfig: ClusterName: AuthInfoName: Context: Namespace:<nil> APIServer: Insecure:false CertFile: KeyFile: CAFile: BearerToken: Impersonate:<nil> ImpersonateGroup:<nil> Username:<nil> Password:<nil> Timeout:0 clientConfig:<nil> lock:{state:0 sema:0} usePersistentConfig:true}} Drainer:{Node: Force:false GracePeriodSeconds:-1 IgnoreAllDaemonSets:true Timeout:1m0s DeleteLocalData:true Selector: PodSelector:checking DrainDelay:0s} AWS:0xc00029e5f0}"}
You can verify it yourself and if you want I can work on this issue as well.
you're absolutely right, thank you for looking at other fields as well
Do you want me to continue in this PR then, or should we merge this one and open a new PR to fix the other fields?
I'll merge this one, and if you would be so kind, we'll gladly accept the fix for the other fields (I was about to start working on it) :)
sure, I can fix other fields
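For reference, the follow-up fix would presumably just add matching tags to the remaining hyphenated flags. A rough sketch, where only pod-selector and drain-delay are flag names confirmed in this thread and the other field types are inferred from the debug output above:

package drainer

import "time"

// Options sketches the drainer options with mapstructure tags added for the
// hyphenated flag names known from this thread. Other fields whose flag names
// differ from their lowercased field name would need matching tags as well,
// but those flag names aren't shown here, so they are left untouched.
type Options struct {
	Node                string
	Force               bool
	GracePeriodSeconds  int
	IgnoreAllDaemonSets bool
	Timeout             time.Duration
	DeleteLocalData     bool
	Selector            string
	PodSelector         string        `mapstructure:"pod-selector"`
	DrainDelay          time.Duration `mapstructure:"drain-delay"`
}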
FYI: https://github.com/VirtusLab/kubedrainer/releases/tag/v0.0.10 was released
This flag allows specifying a period of time for which the drainer should
wait after cordoning a node. This ensures that, for example after an instance
refresh in EKS, pods don't become 'Pending' for some period of time while
waiting for the new node to come up.