[env] Do not share config between tests #396
Conversation
This commit fixes a bug where the config object was shared between tests. When running tests in parallel, it was impossible to rely on fields like the namespace because they could have been overwritten by another test. It also caused tests run with `-race` to fail, because the shared config's klient object was updated concurrently when initializing the client. This commit creates a new `deepCopyConfig` method that makes a deep copy of the config object, so each test gets its own copy without impacting the others. The config now has the following lifecycle:
- it is created when a testEnv is created
- each test uses a child testEnv that inherits the main testEnv's config
- each feature uses a child testEnv that inherits the test's testEnv's config

This way, a feature inherits all the changes made in BeforeEach functions without impacting the other features.
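For illustration only, a minimal sketch of the field-by-field copy strategy described above. The `config` type, its fields, and `deepCopy` are hypothetical stand-ins, not the actual `envconf.Config` or the code in this PR:

```go
package envsketch

// config is a hypothetical stand-in used only to illustrate the copy
// strategy; the real envconf.Config has more (and different) fields.
type config struct {
	namespace  string
	kubeconfig string
	labels     map[string]string
}

// deepCopy returns a copy that shares no mutable state with the original,
// so parallel tests can mutate their own namespace/labels safely.
func (c *config) deepCopy() *config {
	out := &config{
		namespace:  c.namespace,
		kubeconfig: c.kubeconfig,
	}
	if c.labels != nil {
		out.labels = make(map[string]string, len(c.labels))
		for k, v := range c.labels {
			out.labels[k] = v
		}
	}
	return out
}
```

The point is simply that each copy owns its own maps and slices, so a mutation in one test cannot leak into another.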
Hi @Fricounet. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/assign @vladimirvivien
Thanks @Fricounet! @maruina pinged me to have a look at this. I do agree a shared-nothing approach here makes more sense than having a lock on the whole config and remembering to handle it properly. Left a few minor comments 🙏
pkg/env/env.go
Outdated
return &testEnv{
	ctx:     e.ctx,
	cfg:     e.deepCopyConfig(),
	actions: e.actions,
Should we create a new slice here, as we do in WithContext, to avoid possible race conditions on that too?
Sure, it's not a perfect copy though: since the underlying `funcs`, `featureFuncs` and `testFuncs` fields of each action are also slices, they are still shared. But I don't think there is any way in the current code to alter those slices after creation, so it should not cause any issue even if we just copy the `[]action` slice.
Would you prefer a deep copy of the actions as well, to be safe?
I've copied the `actions` like it is done in `WithContext` for now, and I've also created a child context so that none of the fields of the testEnv are shared.
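As a rough sketch of what such a fork can look like, building on the hypothetical `config` type sketched earlier; `action`, `testEnv`, and `fork` are illustrative names, not the framework's actual types:

```go
package envsketch

import "context"

type action struct {
	funcs []func(context.Context) (context.Context, error)
}

type testEnv struct {
	ctx     context.Context
	cfg     *config
	actions []action
}

// fork returns a child environment: the actions slice header is copied so
// appends on the child never reach the parent, and the config is deep-copied.
// The per-action funcs slices are still shared, which is fine as long as they
// are never mutated after creation.
func (e *testEnv) fork(ctx context.Context) *testEnv {
	actions := make([]action, len(e.actions))
	copy(actions, e.actions)
	return &testEnv{
		ctx:     ctx,
		cfg:     e.cfg.deepCopy(),
		actions: actions,
	}
}
```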
// deepCopyFeature just copies the values from the Feature but creates a deep
// deepCopyConfig just copies the values from the Config to create a deep
// copy to avoid mutation when we just want an informational copy.
func (e *testEnv) deepCopyConfig() *envconf.Config {
I know we do something similar for `deepCopyFeature`, but in that case it's simpler, so the risk of forgetting to update it for new fields is lower; here I feel it would be higher. So would it make sense to either autogenerate it or rely on `reflect.DeepCopy` directly? The performance hit would probably not be so relevant here.
I tried using `xreflect.DeepCopy`, however it does not quite work:
- it panics when one of the struct fields of `Config` is `nil`, with `panic: reflect: call of reflect.Value.Set on zero Value`
- it panics when trying to copy the regex values, with `panic: reflect.Set: value of type *syntax.Prog is not assignable to type *regexp.onePassProg`

I didn't try to go deeper, but I think a bunch of fixes would be needed in xreflect to be able to use it 😞
Concerning autogeneration, I haven't considered it, mainly because I have no idea how to do it 😅 and I don't know if it would be able to do the deep copy on complex struct fields like the regex ones. Still, if you think it would work, I can try. In that case, could you link some doc I can use to get started?
What do you think?
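For what it's worth, one way a hand-written copy can deal with the regex fields that trip up reflection is to rebuild them from their source pattern. This is just a sketch of that idea, not necessarily what this PR does:

```go
package envsketch

import "regexp"

// copyRegexp returns an independent *regexp.Regexp with the same pattern,
// sidestepping the unexported internals that generic reflection-based deep
// copies choke on. MustCompile is safe here because the source pattern has
// already compiled successfully once.
func copyRegexp(re *regexp.Regexp) *regexp.Regexp {
	if re == nil {
		return nil
	}
	return regexp.MustCompile(re.String())
}
```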
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: Fricounet. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
I've tested this with a simple test (from #352) by checking it out locally and adding a
(I needed to add additional
and I still get a data race:
@pmalek indeed 🤔
Each has its own quirks:
For now, I went with 1) because it is the easiest to do, but I'd rather do 2) if we say it is fine to break the API.
I think client caching also has another issue, so while at it, it might be worth addressing that too: if one gets the client, modifies the kubeconfig or any other client-related parameter in the config, and then gets the client again, they would get the original client and not a new one with the new kubeconfig. We could probably remove caching altogether and solve both issues.
If it's possible to go down this path, I'd vote for that option. 👍
@phisco @pmalek I got rid of the caching by removing it altogether. That seems like an ok solution if we are fine with the performance cost of having to recreate the client every time.
I am personally not afraid of the performance costs of creating the client each time (it should be negligible, right?). Haven't measured this though.
+1, since this is a testing framework I feel performance shouldn't be the focus, but that's a call for the maintainers to make, for sure.
I personally don't think the performance overhead will be that significant if we create a client each time. But we should probably keep in mind the resource usage in terms of memory, to account for cases where one runs these tests in a sandboxed container environment. We don't want them getting OOM'ed suddenly out of nowhere, right?
@harshanarayana this PR does indeed create some memory overhead. However, I did some profiling and observed the following:
Considering this, I think the overhead is not a big issue because it takes heavy parallelization to show a noticeable impact. I personally don't see a situation where a system able to run 100 parallel tests would be constrained by 70MB of RAM. And considering that, without the changes brought by this PR, running parallel tests with the current framework is simply not practical because of all the race conditions due to the shared config, I don't think we are losing anything here:
What do you think? If you want to reproduce the profiling I did, I used the following code:
then
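The original snippet isn't preserved in this thread. Purely as a generic illustration of how one might eyeball the heap cost of building one client per parallel test (the `newClient` callback is a hypothetical stand-in, not the framework's API), a rough measurement could look like this:

```go
package envsketch

import (
	"fmt"
	"runtime"
	"sync"
)

// measureHeapGrowth reports roughly how much the heap grows when n goroutines
// each build and keep their own client-like object, mimicking the
// one-client-per-parallel-test behavior introduced by removing the cache.
func measureHeapGrowth(n int, newClient func() interface{}) {
	var before, after runtime.MemStats
	runtime.GC()
	runtime.ReadMemStats(&before)

	clients := make([]interface{}, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			clients[i] = newClient() // each goroutine writes its own index, no race
		}(i)
	}
	wg.Wait()

	runtime.GC()
	runtime.ReadMemStats(&after)
	growth := int64(after.HeapAlloc) - int64(before.HeapAlloc)
	fmt.Printf("heap grew by roughly %d KiB for %d clients\n", growth/1024, n)
	runtime.KeepAlive(clients) // keep the clients live until after the measurement
}
```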
💯 This can always be left as an exercise for those who would like to optimize this later on. This PR will give users the possibility to run tests in parallel without data races; let's focus on that. If there's ever a need (I doubt it) to make this more efficient, we can work on it then. Thanks for all the tests that you've done @Fricounet 👍
I appreciate the thoroughness of the analysis in this PR. Thank you all.
/retest
@Fricounet: The following test failed, say `/retest` to rerun all failed tests or `/retest-required` to rerun all mandatory failed tests:
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
 func (c *Config) Client() klient.Client {
-	if c.client != nil {
-		return c.client
-	}
-
-	client, err := klient.NewWithKubeConfigFile(c.kubeconfig)
+	client, err := c.NewClient()
 	if err != nil {
-		panic(fmt.Errorf("envconfig: client failed: %w", err).Error())
+		panic(err)
 	}
-	c.client = client
-	return c.client
+	return client
 }
This will now always create a new client with `klient.NewWithKubeConfigFile(c.kubeconfig)` instead of relying on the injected one, which afaict is a supported usecase at the moment!
See #413 for an attempt at solving this
Do you have a real example of this being an issue? Because currently, I can't figure out a situation where this happens. And you can still do `GetClient` to retrieve the client, right?
It's just that this would be a behavior change: currently, someone creating a config with an injected client would always get that same client back when calling `Client()`. With this change, one would have to call `GetClient()` explicitly to get it back, while calling `Client()` would result in an error if, for example, there was no kubeconfig configured, which is a totally acceptable setup if you just injected your own client.
Ok, it makes sense then. If you can find a way to support this without overcomplicating things, I'm all in.
Otherwise, maybe we could consider doing a breaking change? Since we are running at version v0.x.y, I feel like we can afford to do breaking changes if it means we have more stable foundations to run the e2e tests at scale.
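To make the trade-off concrete, here is a sketch of a `Client()` that still honors an injected client while never caching one it builds itself. The `clientConfig` type and its fields are assumptions for illustration, not the actual envconf API or what the linked draft PR implements:

```go
package envsketch

// clientConfig is a hypothetical stand-in for the pieces of the config that
// matter here: an optionally injected client plus a kubeconfig path and a
// constructor for building a fresh client from it.
type clientConfig struct {
	injected   interface{} // client provided up front by the user, if any
	kubeconfig string
	newClient  func(kubeconfig string) (interface{}, error)
}

// Client returns the injected client if one was provided; otherwise it builds
// a fresh client on every call, so parallel tests never share mutable state.
func (c *clientConfig) Client() interface{} {
	if c.injected != nil {
		return c.injected
	}
	client, err := c.newClient(c.kubeconfig)
	if err != nil {
		panic(err)
	}
	return client
}
```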
See my draft PR; unfortunately, I couldn't get the tests to run yet.
What type of PR is this?
/kind feature
What this PR does / why we need it:
Following discussions in #367
This PR fixes a bug where the config object was shared between tests. When running tests in parallel, it was impossible to rely on fields like the namespace because they could have been overwritten by another test. It also caused tests run with `-race` to fail, because the shared config's klient object was updated concurrently when initializing the client.

This PR creates a new `deepCopyConfig` method that makes a deep copy of the config object, so each test gets its own copy without impacting the others. The config now has the following lifecycle:
- it is created when a testEnv is created
- each test uses a child testEnv that inherits the main testEnv's config
- each feature uses a child testEnv that inherits the test's testEnv's config

This way, a feature inherits all the changes made in BeforeEach functions without impacting the other features.
Which issue(s) this PR fixes:
Fixes #352
Fixes #384
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., Usage docs, etc.: