-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Researching performance degradation (and improvement) over a number of releases #5422
Comments
This issue is currently awaiting triage. SIG CLI takes a lead on issue triage for this repo, but any Kubernetes member can accept issues by applying the The Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
So there were two releases that clearly made PSM performance worse: 1) 3.8.0; 2) 00f0fd7. #2987, which was specifically referring to PSM (created for 00f0fd7) is closed with #4568, which first appeared in the v4.5.5 release, unless I'm reading the git log wrong. I couldn't find any issues or PRs concerning the earlier 5x degradation that the benchmark shows, which was introduced between 3.7.0 and 3.8.0. It's quite likely that that code still exists. I am now going to add v4.5.5 to the list of versions to run the benchmark for and re-run it. I expect that, if #4568 did the job, PSM performance will become 5x faster than the release before it. Will post the updated results and zip archive of the benchmark suite in further comments when they're ready. |
That's precisely what happened! Here are the test results with the added 4.5.4 and 4.5.5 that demonstrate the effect of #4568.
So, finally, it seems that we should be looking for something between 3.7.0 and 3.8.0 that introduced the first PSM performance degradation. At this point, someone familiar with the code base (@ephesused? :)) should be more efficient in searching for the culprit than myself. Updated benchmark suite: kustomize-versions-benchmark-moreversions.zip |
Nevermind, there's just a few commits between 3.7.0 and 3.8.0. I tested all of them:
So we see that 5a02286 is the one that introduced the first patchesStrategicMerge performance degradation, for which I could find no issue or PR with a fix. It was simply an update of kustomize/api from v0.4.2 to v0.5.0, so the actual code that causes this degradation is to be found there (unless it was fixed later). |
As an aside, yes, there were significant performance improvements with v5.0.0 (#4791, #4944, #4809). Back to the issue at hand... Looking around a little bit, I suspect the performance issue comes from something like a one-time initialization. I say that since I see similar run time performance when I invoke the executable, but when I craft a test that loops 200 builds within the same execution, the run time for test 3 is on par with test 2. The effect of the one-time extra cost would be most significant to users that have lots of kustomize invocations against a large number of small builds. If I remember correctly, that's exactly your situation.
Admittedly, I haven't been in this code for a little while, so I hope this testing pattern still is valid. kustomize/5422_test.go package main
import (
"fmt"
"os"
"path/filepath"
"testing"
"sigs.k8s.io/kustomize/kustomize/v5/commands"
)
func Test_5422_2(t *testing.T) {
run5422(t, "2_patches-json6902")
}
func Test_5422_3(t *testing.T) {
run5422(t, "3_patches-strategic-merge")
}
func run5422(t *testing.T, subdir string) {
rootdir := "/src/dist/issue5422/kustomize-versions-benchmark/tests/"
outputPath := filepath.Join(rootdir, fmt.Sprintf("/results-%s.yaml", subdir))
inputPath := filepath.Join(rootdir, subdir)
os.Args = []string{"kustomize", "build", "-o", outputPath, inputPath}
for i := 0; i < 200; i++ {
err := commands.NewDefaultCommand().Execute()
if err != nil {
t.Fatalf("Failure: '%s'", err.Error())
}
}
} |
It is. Mine is more or less okay now, given all the improvements in the recent releases (although still slower overally than 3.5.4 -- we use a lot of PSMs), but for some people this can be really critical, e.g. #5084. Your tests are interesting. Yes, they do suggest that it's something like one-time initialization, but they also show that it happens only for test 3 (PSM). In test 2, durations for 200 * 1 and 1 * 200 are pretty close, considering all the overhead of spawning the processes etc., but in test 3 there is a big difference.
|
I did some profiling, for both the pprof-compatible files are attached: kustomize-5422-profiles.zip Standalone profiles were made for the following test cases: 1) PSM; 2) JSON6902 Go test profiles were made for PSM and JSON6902, for both single run and 200 iterations. One difference that clearly stands out, when the standalone profiles are viewed as graphs (will attach screenshots below), is that in the case of PSM kustomize spends a lot of time (cumulative -- child calls included) in (another interesting hint is that the PSM's profile file is 2.5 times bigger than the json6902s, which suggests that they have quite different execution paths.) Both of these calls are a part of the following code: kustomize/kyaml/openapi/openapi.go Lines 722 to 733 in e002b49
Does this ring a bell for you @ephesused? I'm afraid that at this point I finally lack the knowledge required to try and refactor this code, or comment out certain parts to see if something improves performance. p.s. nevermind the long execution times in the profiles: I intentionally ran kustomize under qemu (arm64 guest, amd64 host) to make it run as slowly as possible to get meaningful timings, because my host machine is too fast to profile a single invocation properly. PSM:JSON6902: |
...not quite. So more experiments :) I narrowed it down to the following, or I think I did: kustomize/kyaml/yaml/walk/walk.go Line 57 in e002b49
If we do, as an experiment, I wonder if This situation somewhat reminds me of the work done in #4569 to skip making unnecessary calls to process unknown resource kinds. Maybe something similar can be done for PSMs too. |
I added a benchmark in #5425 in order to improve performance and detect regressions in the future. I will make safer versions of my enhancements as new PRs for review soon. |
@shapirus or @ephesused if one of you is available to review #5425 before we assign it to an approver, that would help us out a ton. |
...still no ideas on the PatchesStrategicMerge performance issue? It's the only thing that is still slow. |
I haven't had time to investigate anything recently, sorry. If you'd like, I can rebase my work from #5084 (comment) and put it up as a PR if you want to see whether that provides any help. Given your analysis here, though, I think this issue is unrelated to what I am pursuing over in #5084. While I'd like to dig into this problem, right now I expect the earliest I might have to do so would be the new year. |
That's fine, no worries. It was more like a keep-alive packet to prevent the overly intelligent bots from auto-closing the issue :).
Yes, please do, if you have time. My benchmark suite accepts PR numbers, in addition to revision hashes and release versions, so if the changes are in a PR, then it'll be very easy to benchmark it against a selected set of versions to see if it makes any difference. |
It's now #5481. |
I took some time to dig around, and there are many places in the strategic merge patch flow where the schema is involved. Adjusting that code for this performance benefit would be a large effort, likely with substantial regression concerns. I adjusted my runs to reset the schema before each execution, which enabled profiling to line up better with your use case: os.Args = []string{"kustomize", "build", "-o", outputPath, inputPath}
for i := 0; i < b.N; i++ {
openapi.ResetOpenAPI()
err := commands.NewDefaultCommand().Execute()
if err != nil {
b.Fatalf("Failure: '%s'", err.Error())
}
} That led to a realization that might work if your use cases are well-defined. I don't know how wise this would be - I'm just thinking through possibilities. This approach assumes that the default schema doesn't have any effect on your kustomization output. That's true for the simplified test case. Since the default schema has no effect, adjust kind: Kustomization
apiVersion: kustomize.config.k8s.io/v1beta1
openapi:
path: "empty.json"
{} The empty schema still gets parsed, but even with the cost of the file load, it can speed things up a good bit. Here's what I get with the default behavior:
Here's what I get with the empty schema approach:
|
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
/remove-lifecycle stale time to rerun the performance tests, it's a good reminder. |
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten |
/remove-lifecycle rotten not ready yet |
/lifecycle frozen |
All right, here are my fresh benchmark results. To recap, these tests resemble a CD scenario where a tool such as ArgoCD has to run kustomize potentially many hundreds of times to build manifests for multiple applications. Using kustomize v3.5.4 as a reference point, we can see that there were a number of versions with terribly degraded performance in some or even all test cases, but starting with v5.2.1 almost all of that have been fixed, except the tests which involve strategic merge patches: these are still about 2.5 times slower than in v3.5.4. There is also a notable improvement in all cases but PSM in the latest versions compared to v3.5.4, which is excellent. ...if only we could fix the strategic merge patches performance hit! That's the only thing left that is still much worse than it was before the performance regressions were first introduced. (the contents of the test manifests can be found here: https://github.com/shapirus/kustomize-benchmark-suite/tree/master/tests)
Another run: more iterations per test, less versions to test.
|
Added the fresh 5.4.3 release to the tests, no changes compared to 5.4.2:
|
Summary
This is an attempt to track kustomize's performance degradation (and improvement) for several test cases, each working with a minimal set of manifests aiming to utilize a specific set of kustomize's subroutines.
TLDR
patchesStrategicMerge is the only thing that is still slow.
Benchmark description
The tests are as follows.
1. A basic Deployment:
+
2. Same Deployment plus a JSON6902 patch:
3. Same Deployment plus a strategic merge patch:
4. A resource of an "unknown" kind
(can be e.g. SealedSecret in a real application):
It is worth mentioning that saving the patches in separate files made no difference compared to having them inline in kustomization.yaml. There is no difference between the resources having and not having the
namespace
attribute, either.Running the benchmark
I have prepared a set of shell scripts designed to be run in a Docker container that execute a user-configurable number of iterations (default: 200, can be set via
ITERATIONS
env variable) for each of the above cases for several versions (and git revisions) of kustomize starting with v3.5.4 and ending with v5.2.1.The mentioned benchmark supports building for
linux/amd64
andlinux/arm64
(meaning it should also be possible to build and run it on an ARM Mac host, but I have no hardware to verify this), and runtime has been tested on both platforms.A zip file is attached to this ticket which contains manifests, helper scripts (beware: not necessarily safe to run outside a docker container), and Dockerfile. The versions/revisions to download or build are specified in Dockerfile, see comments inside for more information. Also see the end of this comment for an updated version.
Build and run in one command:
(but it's better to run a simple
docker build .
first to make sure it builds properly -- unlike the one above, this will produce output.)Test results
(also see updated results with more versions in the comments below.)
ARM64 results for the sake of completeness
As we can see, there was a slight (but reproducibly visible) degradation between 3.5.4 and 3.8.0 in tests 1, 2, 4, and a significant 5x degradation of test 3 (PSM) starting with 3.8.0.
In addition to the continuing gradual degradation from version to version, revision 00f0fd7 introduced further 5x degradation in test 3 (PSM) compared to 3.8.0 (revision 1c6481d is shown here as it comes right before 00f0fd7), see #2987 for this specific one.
Version 3.8.3 is yet slower than 00f0fd7.
Version 3.8.6 was as slow as 3.8.3 in test 3 (PSM), but introduced a 20x degradation in the rest of the tests, making them as slow as PSM (#4100).
Then there was further 20% degradation in all tests by version 4.3.0.
Improvements started to come with 4.4.0, but tests 3 and 4 stayed unfixed.
Version 5.0.0 seems to have added a serious improvement in something that affected all test cases, but tests 3 (PSM) and 4 (unknown kind), even though becoming much faster than in previous releases, remained much slower than tests 1 and 2.
Version 5.2.1 fixed test 4 (unknown kind), it came with merging PR #5076, discussed in #4569.
Conslusion
The slow patchesStrategicMerge, even though it is much faster than in older releases, still remains, and it is probably the only significant performance issue that still requires close attention.
There is, apparently, some unoptimal code specific to PSM. As we can see, there were improvements that made all modes more effective, PSM included, but relatively to the others it is still very slow.
Related issues
Here are some of the existing issues that gave hints as for what specific versions/revisions were to be tested:
...and the following might give yet another overall improvement?
kustomize build
#5084Follow-up
I have added more tests to cover the Components functionality: 5) no base + plain Deployment as a Component; 6) plain Deployment in base + a JSON6902 patch as a Component; 7) same as 2, but with PSM.
Results:
As we can see here, in 3.7.0, when Components were introduced, the basic case was only 1.25 times slower than the no-component plain-deployment case and the json6902 component test was 1.6 times slower than the respective no-component test.
In v5.2.1, however, this difference increased to 1.8x and 2.76x, respectively. This may indicate either that there are still possibilities for improvement in the Components code, or that the common overhead was reduced so much that what's left are the Components' genuine computational requirements which became more visible after shaving off the time spent on the common overhead.
It is also clear that both the component and no-component patchesStragegicMerge cases suffer from the same performance hit and should equally benefit from a single fix.
Not attaching my tests as a zip file any longer, since they are now available on github: https://github.com/shapirus/kustomize-benchmark-suite. There's no readme, but if somebody is interested, there is a brief instruction in Dockerfile.
The text was updated successfully, but these errors were encountered: