-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Refactor mapping node traversal and optimize RNode.GetAnnotations and RNode.GetLabels #4944
Refactor mapping node traversal and optimize RNode.GetAnnotations and RNode.GetLabels #4944
Conversation
Refactor mapping node content traversal so that all code paths execute through the same root function.
This commit optimizes in three ways: 1. For heavily used functions, allocate memory to avoid overhead associated with map and array re-sizing. 2. Where appropriate, limit annotation and label retrievals to only the desired keys. 3. Adjust annotation and label retrieval to avoid unnecessary temporary object creation.
@ephesused: This PR has multiple commits, and the default merge method is: merge. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Hi @ephesused. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/ok-to-test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I quite like the approach in this PR. I have a couple of comments as a first pass but I think this is a good direction. In particular, having node traversal go through the same code path is a good change on its own and the performance boost is a huge bonus. I think we should try to get this PR into the next release if we can.
cc @KnVerey
// restricted to only those entries with keys that match | ||
// one of the specific annotations. If no annotations are | ||
// provided, then the map will contain all entries. | ||
func (rn *RNode) GetAnnotations(annotations ...string) map[string]string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to this change, I like the option of providing specific annotations/labels to get.
kyaml/yaml/fns.go
Outdated
return rn, nil | ||
|
||
content := rn.Content() | ||
stillMissing := true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I suggest changing the name of this bool to fieldStillNotFound
kyaml/yaml/rnode.go
Outdated
return true | ||
}) | ||
default: // visit specified fields | ||
// assumption: fields in content have unique names |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to add test coverage for this assumption before we make it. The kyaml Filters are part of the public kyaml api, so we do need to be a little bit careful.
Could you pick one of the Filters that uses this visitMappingNodeFields
, and write a test in fns_test.go
that contains duplicate keys? If it throws some sort of error saying you can't have duplicate keys, then we are fine with this assumption. But if it does not throw any error and instead behaves in some unexpected or obviously incorrect way, then this assumption is risky. One option for handling duplicate names is to simply store a set of visited field names (in go, I usually implement this as a map[string]bool
), so that we can check it right before we increment found
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback and discussion.
Considering the current behavior, I don't think we should introduce an error condition here. Instead, I think the best approach is to create a local "to be found" map[string]bool of fieldNames and use that to determine when a given field should be added to the result map.
I don't believe there's clear guidance since the current behavior does not support restricting the visited mapping node fields. However, I think we can use the current behavior for guidance. With a quick look around, I don't see any methods or functions that ensure mapping node keys are unique. Given that, I don't think it's appropriate to introduce that here. Instead, I think the uniqueness should be handled on the passed fieldNames.
In the case of matching keys, getMapFromMeta's current behavior stores the value of the first key: The VisitFields loop iterates over the full list of keys (in order), using Field for processing. Field returns the first match. So if there are duplicates, then the key is processed multiple times - but each time Field will find the first match and its value will be what's placed into the getMapFromMeta map. I'm going to follow that pattern for how visitMappingNodeFields will execute.
I'll have another commit coming to show what I'm thinking.
Thanks for the feedback - I hope to have updates (and lint fixes) in the next day or two. I haven't figured out how to get kustomize's linting working properly in my dev environment, so I'm working a bit in the dark on that front. |
This commit adjusts visitMappingNodeFields so that it no longer assumes the mapping node has unique keys.
Here are the CPU and memory amounts for a0e94c1, the latest commit in the PR. The key point is that, compared to 20b0d3c (the prior refactor commit), the changes in a0e94c1 didn't have a negative effect on either run time or memory consumption. (This output also highlights that there's some non-trivial variance between runs on this machine - the master runs previously were 13.41s, 207.21s, and 49.45s.)
The benchmark run results for GetAnnotations are pretty much the same as they were with the earlier refactor commit:
|
/approve I'll defer the LGTM to @KnVerey, so that she has a chance to take a look as well. |
@KnVerey, just checking in here. It'd be nice to get this in to the next release. Thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
Really nice work, thank you!
@@ -460,11 +457,13 @@ func (rn *RNode) getMetaData() *RNode { | |||
} else { | |||
n = rn |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I think we can avoid creating a new RNode above as well
content := rn.Content()
if rn.YNode().Kind == DocumentNode {
// get the content if this is the document node
content = content[0].Content
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, good catch. I'll plan to provide a small PR to tidy that up. Thanks!
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: ephesused, KnVerey, natasha41575 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
This PR has two purposes.
First, it refactors mapping node traversal so that all code paths execute through the same function.
Second, it optimizes RNode's GetAnnotations and GetLabels in three ways:
The execution time and memory use improvements over master can be significant. The results shown here are for current master and for the refactored and optimized code. One test case (ds) is a moderately sized and has a somewhat simple set of kustomizations. One (na) is very large and has a complex set of kustomizations. One (ro) is a large set of resources, with no kustomizations - it's just a resource load. kustomize builds using master match exactly with the builds using the PR.
Using go's profiling tools, here's how things look with the current master (a1bfab3) and with this PR (20b0d3c):
This PR also includes a benchmark test for GetAnnotations. Here's the run against master, followed by the run against the PR:
Closes #4940