all: kubernetes slower execution since Go 1.4.3 #14396
Thanks for filing this and again many apologies that we failed to do so! |
@rsc - sorry for the late response. Yes - the reason for reverting Go 1.4.3 was exactly the same (performance was significantly worse than with Go 1.4.2). And I guess it should be much easier to pin down the reason for that between those two versions than between Go 1.4.2 and Go 1.5.3. Regarding running kubemark - yes, it's definitely possible for you to run it. Please let me know if you need any help with that (feel free to also email me directly at wojtekt@google.com) |
@wojtek-t, I'm really surprised at any regression from 1.4.2 to 1.4.3. Those two releases are basically identical. There are only four changes. See the commits on Sep 21, 2015: https://github.com/golang/go/commits/release-branch.go1.4 cb65428 seems unlikely; it only affects error paths. So, I suspect either your Kubernetes benchmarks are bogus (or at best noisy and statistically useless), or something else changed in between benchmarking Go 1.4.2 and Go 1.4.3: machine types? GCE zone? The Kubernetes rev itself? Were you ever able to reliably reproduce the purported Go 1.4.2 to Go 1.4.3 performance regression with all else being equal, or was this the result of just two data points? And if the latter, how far apart in time were those two data points? |
@bradfitz - yes - we are very, very reliably able to reproduce it - it wasn't caused by machine types, zone, or anything else - we were using exactly the same code and environment and the only difference was the Go version. Regarding our benchmarks - those are not really benchmarks, they are load tests on a real cluster etc. And I wouldn't call them bogus or statistically useless - we found a number of regressions in our code and fixed a huge number of problems thanks to them (real problems). So they really work. It's definitely something around the Go version (maybe we are using Go not as expected in some places? - but it has to do with the Go version). |
@gmarek - FYI |
@wojtek-t, thanks. This is the first we'd heard about this, so pardon my skepticism. I had to ask the usual questions. I look forward to seeing how this mystery is resolved. |
@bradfitz no worries - we were also initially skeptical about this - but reverting back to Go 1.4.2 solved the problem... |
@wojtek-t did you run a git bisect by any chance? |
@OneOfOne, the difference was the golang:1.4.2 to golang:1.4.3 docker image. There is no opportunity for bisect. Everyone, please stop suggesting things for the Kubernetes team to try. This one is on us right now, and I'd rather they spend their time answering questions about how to reproduce the problem than answer questions about random debugging they did or did not do. Thanks. |
@wojtek-t, thanks for the details. I got stuck, question below. I am trying to reproduce the problem using the kubemark-guide doc you linked to earlier. I tried kubernetes v.1.8.0 since that appeared to be the latest non-alpha, non-beta release, but that git tag seems to have no test/kubemark. So now I am using dff7490, which is the revision that showed the kubemark failure going from 1.4.2 to 1.5.3. I'm stuck right after 'make quick-release'. From the guide, the next step appears to be
but I get:
Where does 'make quick-release' write its output, and how do I unpack that output to install kubectl? Or am I supposed to be using some system version of kubernetes? Thanks. |
@rsc, what is the docker image you refer to? The official Docker golang one, I assume? One thing I note looking at https://imagelayers.io/?images=golang:1.4.2,golang:1.4.3 is that 1.4.3 added the "procps" package to the image (in the 4th layer) ... but I don't see that reflected in the commit which updates from Go 1.4.2 to Go 1.4.3: docker-library/golang@a4f3927 So if procps was silently added, what else changed in the base image? |
@bradfitz, exactly. I'd like to first be able to reproduce the kubemark delta and then the fact that we're talking about the Docker golang image 1.4.2 -> 1.4.3 and not just the Go distribution 1.4.2 -> 1.4.3 seems like a good thing to check. But so far I can't run kubemark at all. |
@rsc Most of the Kubernetes test scripts automatically try to find kubectl for you. You should also be able to install an "official" version of kubectl separately. |
It doesn't. You also probably should reduce the number of Nodes to 100 and the Master size to 4 cores - a 1000-Node cluster needs ~100 cores (~70 for the Nodes and 32 for the Master). |
Note that you also need to start an 'external' cluster for Kubemark. For a 100-Node one you'd need ~12 cores in Nodes and a 2-core master, so edit cluster/gce/config-test.sh accordingly. |
procps was added in docker-library/buildpack-deps@1845b3f That Dockerfile is based on the "curl" micro layer, which has no changes, but the curl layer is based on "debian:jessie", whose history I'm finding it hard to track. How do I find the history of a Docker image's tag? There's this tag-history page: https://github.com/docker-library/docs/blob/master/debian/tag-details.md#debianjessie which has history: https://github.com/docker-library/docs/commits/master/debian/tag-details.md The commits updating debian:jessie can be found by cloning https://github.com/docker-library/docs.git and looking at e.g. the history of debian/tag-details.md. It appears there were updates on Mar 27 2015, among other dates. Unfortunately, it looks like the bot updating the tag history didn't begin life until well after golang:1.4.2 was built. So "golang:1.4.3" (updated on Sep 23, 2015) was built with a known "debian:jessie" hash, but I can't figure out what "golang:1.4.2" was built with. Maybe @jfrazelle can help me do archaeology. |
OK, still following the doc posted earlier. I can run start-kubemark.sh, but the output looks maybe not quite right:
Is that successful output? Looks like maybe not? Since I don't know whether that command succeeded, I tried the next one. It definitely says it failed, but I can't quite infer whether it's because the previous command failed or something else. Full output is below, but the main complaint seems to be:
I do agree that 0 is in fact zero-valued, but I am not sure what the point is. Full output:
I will try to do the same thing over again and see if I get a different result. |
Doing the same thing over again at the same git revision produced the same result. Trying kubernetes 492c5d6 instead. Roughly same. The curl chatter is gone but some machine still can't connect to localhost:8080.
and then run-e2e-tests.sh gives a more understandable message:
Help? |
@rsc - sorry it's late evening in Poland already - let me answer all questions above tomorrow morning Poland time |
@rsc it's only mentioned in passing in the kubemark doc, but did you start a "regular" Kubernetes cluster first as noted in #14396 (comment)? |
You shouldn't need to edit cluster/gce/config-test.sh though; setting the relevant environment variables should be enough. |
@ixdy, thanks, I did not understand that comment the first time. Creating that other cluster does seem to have gotten the e2e tests going. |
OK, both at master and at dff7490, I now get further, but I don't think the test is working. I've gotten the command sequence down to:
and then that last command prints:
where each stanza sits for a little while and then says "Unable to retrieve kubelet pods for node XXX". I am guessing this is not good. I let it run for quite a while once and then interrupted it. I can let it run longer next time if this is actually OK. When I interrupt it I get:
Am I making progress? |
Yes - you're really close. If you're passing any flags (e.g. --delete-namespace=false), try running without them. |
OK, without the --delete-namespace=false, I get some latency numbers before it goes into the apparently infinite "Unable to retrieve kubelet pods ..." loop. It looks like this is the one that was reported as high latency in the second comment on kubernetes/kubernetes#20400:
and then after more latencies I get the usual painful pod loop, taking 30 seconds per node:
Is that part important? Can I just ^C it? Is there something useful it's doing? Is there a flag to turn it off? |
@rsc - thanks a lot for looking into it! Regarding what to look at - if you are running the Density test (and apparently you are, judging from the logs above), there are lines like these in the test:
[You can grep the logs for "Top latency metric".] And one line like this:
Those are basically the lines that we mostly focus on. Regarding your second question - this painful loop should appear only if the test actually fails (the metrics I mentioned above exceeded some threshold that we set up). In that case, we are trying to gather more logs for future debugging. From your perspective you can ^C them, but if you do that you need to clean up the cluster afterwards so that you don't have some rubbish during the next run. BTW, looking into your logs I see:
This is really strange - because we don't start any redis containers in these tests. Did you start them manually? |
@wojtek-t I've never started a redis before. No idea where those came from. |
@rsc that's strange. Can you please do: and check if that is repeatable? |
Thanks, yes, I found that and have been modifying it to flip between 1.4.2, 1.4.3, and 1.5.3. So far I seem to be able to reproduce the minor differences in 100-node tests and I found the kube-apiserver.log that has the raw data in it. This is all good. I am waiting on a quota bump so I can run 1000-node tests. |
@rsc - I've just sent you an invite to my project where I have enough quota to run 1000-node kubemarks. Feel free to use it. |
@rsc - were you able to reproduce the issue? Anything I can help with? |
Hi @wojtek-t. Yes, I've been able to reproduce the observed kubemark latency increases on 1000-node systems. I'm still using your GCE project. Still gathering data. Tentatively, it looks like the difference in average latency from Go 1.4.2 to Go 1.5.3 was mainly due to Go 1.5.3 doing a better job of keeping heap size at 2 x live data size, but that meant more frequent garbage collection and therefore increased overall garbage collection overhead. Running Go 1.5.3 with GOGC=300 (target 4x instead of 2x) brings things back to roughly Go 1.4 levels. On Kubemark that makes plenty of sense, because the benchmark is using about 2 GB of memory on an otherwise mostly idle 120 GB (n1-standard-32) machine; letting it use 4 GB instead is clearly reasonable for improved latency. It's not clear to me whether kube-apiserver should do that in general, since I don't know whether it usually runs on such idle machines. I also tried with Go 1.6.0, and that looks like it is doing a better job at latency even without any GOGC tuning, so my suggestion would be to jump from Go 1.4.2 to Go 1.6.0 and not bother trying to game the GOGC setting. This is what I've seen so far:
So Go 1.6.0 is by default about as good as Go 1.4.2 for the 'LIST nodes' metric. Unfortunately, the test using Go 1.6.0 is now failing, complaining about pod startup latency. Normally you see measured latencies of 1-2 seconds, and the limit in the test is 5 seconds, but the claim in the failure is that some pod startup latencies are measured in minutes. That would obviously be bad, but I can't tell whether it's true. For the 1000-node test starting 30000 pods, both the passing and the failing tests take about 35 minutes to start all the pods. That total time doesn't change even though the reported startup latency changes dramatically. I haven't worked out how the time is measured or what might influence it. If you have any ideas about that, I'd love to hear them. Right now I think that if we can figure out the pod startup latency issue then the API call latency problems are basically fixed by Go 1.6.

Separately, from looking at CPU profiles to try to understand the Go 1.5.3 delta I've identified a number of simple optimizations in the kubernetes code that would likely improve API call latency across the board. I will measure and send those once all the Go-specific stuff is done. |
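For reference, GOGC controls how far the heap may grow past the live data before the next collection: the GC triggers at roughly live_heap × (1 + GOGC/100), so the default GOGC=100 targets about 2× the live size and GOGC=300 about 4×, which matches the numbers discussed above. A minimal sketch of the two ways to set it (an illustrative standalone program, not Kubernetes code):

```go
package main

import (
	"fmt"
	"os"
	"runtime/debug"
)

func main() {
	// GOGC can be set in the environment (e.g. GOGC=300 ./kube-apiserver)
	// or programmatically via runtime/debug. A value of 300 lets the heap
	// grow to roughly 4x the live data before the next GC, trading memory
	// for less frequent collections.
	if os.Getenv("GOGC") == "" {
		old := debug.SetGCPercent(300)
		fmt.Printf("GC target changed from %d%% to 300%%\n", old)
	}
	// ... run the server ...
}
```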
@rsc - awesome debugging, thanks! Regarding Go 1.6.0 - are those failures repeatable all the time? Because we recently (~2-3 weeks ago) fixed a bug that caused the pod startup test to fail with 1-2 minute latencies from time to time. IIUC, you were using some old branch, so that fix may not be there.
Great - looking forward to them and happy to review them. I have one more question. From what you wrote above, it seems you don't observe any difference when changing to Go 1.4.3, right? From what I know, we were observing some differences. But I was on paternity leave for the whole of January, when this was observed, so this is only what I've heard. |
@wojtek-t, I have two 1.4.2 runs and one 1.4.3 run in the numbers above. There does appear to be some difference in that one run at the 50th and 90th percentile, though not 99th. I don't have enough data to know if that's a real measurement or just noise. |
We'll try to recreate the problem on Monday. |
I repeated the kubemark test with Go 1.6.0 on top of current master and it passes. Both pod latency and API call latency are considered OK by the test. The API latency is higher than what I was seeing before, but the test also does a different set of operations than it used to, which could easily affect the overall distribution in either direction. I am running 1.6.0 and 1.4.2 a bunch of times to gather actual statistics. But at least if kubemark is the bar then it looks like Go 1.6.0 is acceptable. I'll have more about 1.4.2 vs 1.6.0 on the current kubemark master after the weekend. Still using your GCE project. FWIW, I noticed that recent changes to the build process mean that 'make quick-release' assumes it has write access to gcr.io/google_containers. I patched around it (rsc/kubernetes@f6dc8f6) but it seems like a potential mistake waiting to happen, especially since there's nothing forcing build/build-image/cross/Dockerfile and build/build-image/cross/VERSION to stay in sync. It's easy to imagine someone updating Dockerfile and forgetting to update VERSION and thereby accidentally replacing, say, kube-cross:v1.4.2-1 with an image using Go 1.6. Or maybe I misunderstand and they are immutable? Regardless, it also makes it harder for non-k8s engineers like me to run tests. |
I agree that's not good (even though we have to sign in to a separate account to push). Filed kubernetes/kubernetes#22868, thanks. (Note that some containers are merely tagged as being gcr.io/google_containers and then shipped off to the nodes manually (scp) -- they should not be actually pushed to our registry.) |
/cc @luxas Cross-ref to the |
@rsc - it seems you're still using my cluster; do you still need it? I would also like to use it. |
Hi @wojtek-t, I do not still need it. Feel free to take down the jobs I have left (should be just a stop-kubemark.sh away from being clean, and I will not start anything new). My quota came through a few days ago. I have a lot of new data from the runs in your cluster that I haven't had time to go through properly. My son got some nasty virus over the weekend and I lost a lot of this week to staying home to take care of him. But I haven't forgotten! |
I sent pull request kubernetes/kubernetes#23210 to clean up one use of time.After that occurs on a particularly highly executed path during the e2e test. See the commit description for details. Times for the kubemark e2e test using Go 1.4.2 and Go 1.6.0, both before and after the timer fix:
I believe that with the timer fix applied, Go 1.6.0 is safe for Kubernetes to merge. As mentioned in the PR commit message, I also intend to make Go 1.7 not require that kind of fix, so that all the other uses of time.After will become asymptotically more efficient automatically. This is #8898. Once #8898 is submitted, I am OK with considering the timer side of this done.

@dvyukov has also been investigating some other possible causes of high 99th percentile latency as part of investigating #14790. He has a number of CLs pending as I write this, and they're linked from that issue.

One final note, for @wojtek-t and @gmarek, is that I think Kubernetes may be significantly mismeasuring its server latency in this benchmark, at least with Go 1.4. A 'LIST nodes' RPC comes in about every 5 seconds, and a GC in Go 1.4 takes about 100ms. That means there's about a 1 in 50 chance of the RPC arriving during the GC. If that happens, the RPC sits in a kernel receive buffer while the GC finishes; only once the GC is done does the apiserver read the request and start its timer. The reported latency therefore ignores the time spent waiting for the GC that was going on. Still using very rough numbers, this will affect about 2% of requests, and it will add half a GC pause to each on average. Without knowing the precise shape of the underlying distribution it's hard to estimate the effect on 99th percentile latency, but certainly there is an effect. The reported 99th percentile latency will be lower, perhaps significantly lower, than the actual 99th percentile latency.

Go 1.5 introduced a concurrent garbage collector, so that the actual stop-the-world pauses, during which the apiserver ignores incoming network traffic, are much smaller. The apiserver should be able to read the request and start the timer earlier in the cases just described, although since a GC is still in progress taking up some of the CPU, the overall time spent on the request will be longer. So part of the reported latency increase from Go 1.4 to Go 1.5 may well be real problems introduced by Go 1.5, but part of the reported latency increase is from reporting something closer to the actual latency.

I think it's encouraging that, after the timer fix, moving from Go 1.4 to Go 1.6 reports basically no latency increase. Since you'd expect the reported latencies to have gone up due to the more accurate reporting but the reported latencies are level, that suggests that actual latencies went down from Go 1.4 to Go 1.6, a win for users if not for the SLO/SLA measurement.

I'm hopeful that the combination of Dmitry's scheduler fixes, the timer channel fixes, and @RLH and @aclements's overall GC throughput fixes will make the kubemark latency even lower in Go 1.7. Now that we can run the benchmark, I intend to make sure we test that theory. Once these fixes are all in and I've tested with current Go 1.7 master (on my own cluster, not wojtek's), I will close this issue.

P.S. The ± above are standard deviation, and all the times are 'LIST nodes' apiserver-reported latency in ms. |
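For readers following along, the general shape of the time.After issue on a hot path, and the kind of rewrite the PR applies, is roughly the following. This is an illustrative sketch with made-up names (pollEvents, events, handle), not the actual code touched by kubernetes/kubernetes#23210:

```go
package main

import "time"

// pollEvents shows the costly pattern: time.After in a loop allocates a new
// runtime timer on every iteration, and that timer is not released until it
// fires, which adds up on a highly executed path.
func pollEvents(events <-chan int, handle func(int), timeout time.Duration) {
	for {
		select {
		case e := <-events:
			handle(e)
		case <-time.After(timeout): // new timer allocated each iteration
			return
		}
	}
}

// pollEventsReusedTimer reuses a single timer, resetting it after each event;
// this is the kind of rewrite the PR applies.
func pollEventsReusedTimer(events <-chan int, handle func(int), timeout time.Duration) {
	t := time.NewTimer(timeout)
	defer t.Stop()
	for {
		select {
		case e := <-events:
			handle(e)
			if !t.Stop() {
				<-t.C // drain the channel if the timer already fired
			}
			t.Reset(timeout)
		case <-t.C:
			return
		}
	}
}

func main() {
	events := make(chan int)
	go func() {
		events <- 1
		events <- 2
		// stop sending; the poller returns after the timeout elapses
	}()
	pollEventsReusedTimer(events, func(int) {}, 50*time.Millisecond)
}
```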
If we are talking about hundreds of milliseconds, I don't think my pending changes will provide significant improvement. But who knows.
Finally! |
@rsc - thanks for the awesome work! If you have any suggestions on how to make time measurement better, we're happy to hear them. IIUC this problem impacts pretty much every pattern measuring wall time along the lines of the sketch below. It is a common pattern in Kubernetes (and I guess in quite a few other projects) |
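The measurement pattern under discussion is presumably along these lines; this is an illustrative sketch with a hypothetical handler, not code quoted from Kubernetes:

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"time"
)

func handleList(w http.ResponseWriter, r *http.Request) {
	// The clock starts only after the server has already read the request
	// off the network, so any time the request spent queued in kernel
	// buffers (for example while a GC pause blocked the process from
	// reading) is not included in the reported latency.
	start := time.Now()
	// ... do the actual work of the request ...
	fmt.Fprintln(w, "ok")
	latency := time.Since(start)
	_ = latency // would normally be recorded in a latency metric
}

func main() {
	http.HandleFunc("/nodes", handleList)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```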
On the topic of accurately timing things, there is also the fact that time.Now reads the wall clock rather than a monotonic clock. |
The problem is not the use of time.Now and time.Since, nor the particular kind of clock. (Btw @gmarek, time.Since(start) reads better than time.Now().Sub(start).) The problem is that the timer is only started once the server has read the request off the network. Any delay before the request is read (for example, if the request arrives during a GC pause) is therefore not accounted for. The GC pauses that prevent the program from reading from the network, and are therefore unaccounted for, are larger in Go 1.4 than in Go 1.5. This means that Go 1.5+ is probably giving a more accurate accounting of RPC latency than Go 1.4 did. At least part of the observed latency increase is due to this better accounting. |
I fully understand what you're saying @rsc, just pointing out another issue to keep in mind when one wants to accurately measure things. For measuring the amount of time spent in kernel buffers, this was actually an issue in Google's internal RPC stack that was fixed back around 2009 or so; I forget exactly how. |
CL https://golang.org/cl/21503 mentions this issue. |
Two GC-related functions, scang and casgstatus, wait in an active spin loop. Active spinning is never a good idea in user-space. Once we wait several times more than the expected wait time, something unexpected is happenning (e.g. the thread we are waiting for is descheduled or handling a page fault) and we need to yield to OS scheduler. Moreover, the expected wait time is very high for these functions: scang wait time can be tens of milliseconds, casgstatus can be hundreds of microseconds. It does not make sense to spin even for that time. go install -a std profile on a 4-core machine shows that 11% of time is spent in the active spin in scang: 6.12% compile compile [.] runtime.scang 3.27% compile compile [.] runtime.readgstatus 1.72% compile compile [.] runtime/internal/atomic.Load The active spin also increases tail latency in the case of the slightest oversubscription: GC goroutines spend whole quantum in the loop instead of executing user code. Here is scang wait time histogram during go install -a std: 13707.0000 - 1815442.7667 [ 118]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎... 1815442.7667 - 3617178.5333 [ 9]: ∎∎∎∎∎∎∎∎∎ 3617178.5333 - 5418914.3000 [ 11]: ∎∎∎∎∎∎∎∎∎∎∎ 5418914.3000 - 7220650.0667 [ 5]: ∎∎∎∎∎ 7220650.0667 - 9022385.8333 [ 12]: ∎∎∎∎∎∎∎∎∎∎∎∎ 9022385.8333 - 10824121.6000 [ 13]: ∎∎∎∎∎∎∎∎∎∎∎∎∎ 10824121.6000 - 12625857.3667 [ 15]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎ 12625857.3667 - 14427593.1333 [ 18]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎ 14427593.1333 - 16229328.9000 [ 18]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎ 16229328.9000 - 18031064.6667 [ 32]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎ 18031064.6667 - 19832800.4333 [ 28]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎ 19832800.4333 - 21634536.2000 [ 6]: ∎∎∎∎∎∎ 21634536.2000 - 23436271.9667 [ 15]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎ 23436271.9667 - 25238007.7333 [ 11]: ∎∎∎∎∎∎∎∎∎∎∎ 25238007.7333 - 27039743.5000 [ 27]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎ 27039743.5000 - 28841479.2667 [ 20]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎ 28841479.2667 - 30643215.0333 [ 10]: ∎∎∎∎∎∎∎∎∎∎ 30643215.0333 - 32444950.8000 [ 7]: ∎∎∎∎∎∎∎ 32444950.8000 - 34246686.5667 [ 4]: ∎∎∎∎ 34246686.5667 - 36048422.3333 [ 4]: ∎∎∎∎ 36048422.3333 - 37850158.1000 [ 1]: ∎ 37850158.1000 - 39651893.8667 [ 5]: ∎∎∎∎∎ 39651893.8667 - 41453629.6333 [ 2]: ∎∎ 41453629.6333 - 43255365.4000 [ 2]: ∎∎ 43255365.4000 - 45057101.1667 [ 2]: ∎∎ 45057101.1667 - 46858836.9333 [ 1]: ∎ 46858836.9333 - 48660572.7000 [ 2]: ∎∎ 48660572.7000 - 50462308.4667 [ 3]: ∎∎∎ 50462308.4667 - 52264044.2333 [ 2]: ∎∎ 52264044.2333 - 54065780.0000 [ 2]: ∎∎ and the zoomed-in first part: 13707.0000 - 19916.7667 [ 2]: ∎∎ 19916.7667 - 26126.5333 [ 2]: ∎∎ 26126.5333 - 32336.3000 [ 9]: ∎∎∎∎∎∎∎∎∎ 32336.3000 - 38546.0667 [ 8]: ∎∎∎∎∎∎∎∎ 38546.0667 - 44755.8333 [ 12]: ∎∎∎∎∎∎∎∎∎∎∎∎ 44755.8333 - 50965.6000 [ 10]: ∎∎∎∎∎∎∎∎∎∎ 50965.6000 - 57175.3667 [ 5]: ∎∎∎∎∎ 57175.3667 - 63385.1333 [ 6]: ∎∎∎∎∎∎ 63385.1333 - 69594.9000 [ 5]: ∎∎∎∎∎ 69594.9000 - 75804.6667 [ 6]: ∎∎∎∎∎∎ 75804.6667 - 82014.4333 [ 6]: ∎∎∎∎∎∎ 82014.4333 - 88224.2000 [ 4]: ∎∎∎∎ 88224.2000 - 94433.9667 [ 1]: ∎ 94433.9667 - 100643.7333 [ 1]: ∎ 100643.7333 - 106853.5000 [ 2]: ∎∎ 106853.5000 - 113063.2667 [ 0]: 113063.2667 - 119273.0333 [ 2]: ∎∎ 119273.0333 - 125482.8000 [ 2]: ∎∎ 125482.8000 - 131692.5667 [ 1]: ∎ 131692.5667 - 137902.3333 [ 1]: ∎ 137902.3333 - 144112.1000 [ 0]: 144112.1000 - 150321.8667 [ 2]: ∎∎ 150321.8667 - 156531.6333 [ 1]: ∎ 156531.6333 - 162741.4000 [ 1]: ∎ 162741.4000 - 168951.1667 [ 0]: 168951.1667 - 175160.9333 [ 0]: 175160.9333 - 181370.7000 [ 1]: ∎ 181370.7000 - 187580.4667 [ 1]: ∎ 187580.4667 - 193790.2333 [ 2]: ∎∎ 193790.2333 - 200000.0000 [ 0]: 
Here is casgstatus wait time histogram: 631.0000 - 5276.6333 [ 3]: ∎∎∎ 5276.6333 - 9922.2667 [ 5]: ∎∎∎∎∎ 9922.2667 - 14567.9000 [ 2]: ∎∎ 14567.9000 - 19213.5333 [ 6]: ∎∎∎∎∎∎ 19213.5333 - 23859.1667 [ 5]: ∎∎∎∎∎ 23859.1667 - 28504.8000 [ 6]: ∎∎∎∎∎∎ 28504.8000 - 33150.4333 [ 6]: ∎∎∎∎∎∎ 33150.4333 - 37796.0667 [ 2]: ∎∎ 37796.0667 - 42441.7000 [ 1]: ∎ 42441.7000 - 47087.3333 [ 3]: ∎∎∎ 47087.3333 - 51732.9667 [ 0]: 51732.9667 - 56378.6000 [ 1]: ∎ 56378.6000 - 61024.2333 [ 0]: 61024.2333 - 65669.8667 [ 0]: 65669.8667 - 70315.5000 [ 0]: 70315.5000 - 74961.1333 [ 1]: ∎ 74961.1333 - 79606.7667 [ 0]: 79606.7667 - 84252.4000 [ 0]: 84252.4000 - 88898.0333 [ 0]: 88898.0333 - 93543.6667 [ 0]: 93543.6667 - 98189.3000 [ 0]: 98189.3000 - 102834.9333 [ 0]: 102834.9333 - 107480.5667 [ 1]: ∎ 107480.5667 - 112126.2000 [ 0]: 112126.2000 - 116771.8333 [ 0]: 116771.8333 - 121417.4667 [ 0]: 121417.4667 - 126063.1000 [ 0]: 126063.1000 - 130708.7333 [ 0]: 130708.7333 - 135354.3667 [ 0]: 135354.3667 - 140000.0000 [ 1]: ∎ Ideally we eliminate the waiting by switching to async state machine for GC, but for now just yield to OS scheduler after a reasonable wait time. To choose yielding parameters I've measured golang.org/x/benchmarks/http tail latencies with different yield delays and oversubscription levels. With no oversubscription (to the degree possible): scang yield delay = 1, casgstatus yield delay = 1 Latency-50 1.41ms ±15% 1.41ms ± 5% ~ (p=0.611 n=13+12) Latency-95 5.21ms ± 2% 5.15ms ± 2% -1.15% (p=0.012 n=13+13) Latency-99 7.16ms ± 2% 7.05ms ± 2% -1.54% (p=0.002 n=13+13) Latency-999 10.7ms ± 9% 10.2ms ±10% -5.46% (p=0.004 n=12+13) scang yield delay = 5000, casgstatus yield delay = 3000 Latency-50 1.41ms ±15% 1.41ms ± 8% ~ (p=0.511 n=13+13) Latency-95 5.21ms ± 2% 5.14ms ± 2% -1.23% (p=0.006 n=13+13) Latency-99 7.16ms ± 2% 7.02ms ± 2% -1.94% (p=0.000 n=13+13) Latency-999 10.7ms ± 9% 10.1ms ± 8% -6.14% (p=0.000 n=12+13) scang yield delay = 10000, casgstatus yield delay = 5000 Latency-50 1.41ms ±15% 1.45ms ± 6% ~ (p=0.724 n=13+13) Latency-95 5.21ms ± 2% 5.18ms ± 1% ~ (p=0.287 n=13+13) Latency-99 7.16ms ± 2% 7.05ms ± 2% -1.64% (p=0.002 n=13+13) Latency-999 10.7ms ± 9% 10.0ms ± 5% -6.72% (p=0.000 n=12+13) scang yield delay = 30000, casgstatus yield delay = 10000 Latency-50 1.41ms ±15% 1.51ms ± 7% +6.57% (p=0.002 n=13+13) Latency-95 5.21ms ± 2% 5.21ms ± 2% ~ (p=0.960 n=13+13) Latency-99 7.16ms ± 2% 7.06ms ± 2% -1.50% (p=0.012 n=13+13) Latency-999 10.7ms ± 9% 10.0ms ± 6% -6.49% (p=0.000 n=12+13) scang yield delay = 100000, casgstatus yield delay = 50000 Latency-50 1.41ms ±15% 1.53ms ± 6% +8.48% (p=0.000 n=13+12) Latency-95 5.21ms ± 2% 5.23ms ± 2% ~ (p=0.287 n=13+13) Latency-99 7.16ms ± 2% 7.08ms ± 2% -1.21% (p=0.004 n=13+13) Latency-999 10.7ms ± 9% 9.9ms ± 3% -7.99% (p=0.000 n=12+12) scang yield delay = 200000, casgstatus yield delay = 100000 Latency-50 1.41ms ±15% 1.47ms ± 5% ~ (p=0.072 n=13+13) Latency-95 5.21ms ± 2% 5.17ms ± 2% ~ (p=0.091 n=13+13) Latency-99 7.16ms ± 2% 7.02ms ± 2% -1.99% (p=0.000 n=13+13) Latency-999 10.7ms ± 9% 9.9ms ± 5% -7.86% (p=0.000 n=12+13) With slight oversubscription (another instance of http benchmark was running in background with reduced GOMAXPROCS): scang yield delay = 1, casgstatus yield delay = 1 Latency-50 840µs ± 3% 804µs ± 3% -4.37% (p=0.000 n=15+18) Latency-95 6.52ms ± 4% 6.03ms ± 4% -7.51% (p=0.000 n=18+18) Latency-99 10.8ms ± 7% 10.0ms ± 4% -7.33% (p=0.000 n=18+14) Latency-999 18.0ms ± 9% 16.8ms ± 7% -6.84% (p=0.000 n=18+18) scang yield delay = 5000, casgstatus yield delay = 
3000 Latency-50 840µs ± 3% 809µs ± 3% -3.71% (p=0.000 n=15+17) Latency-95 6.52ms ± 4% 6.11ms ± 4% -6.29% (p=0.000 n=18+18) Latency-99 10.8ms ± 7% 9.9ms ± 6% -7.55% (p=0.000 n=18+18) Latency-999 18.0ms ± 9% 16.5ms ±11% -8.49% (p=0.000 n=18+18) scang yield delay = 10000, casgstatus yield delay = 5000 Latency-50 840µs ± 3% 823µs ± 5% -2.06% (p=0.002 n=15+18) Latency-95 6.52ms ± 4% 6.32ms ± 3% -3.05% (p=0.000 n=18+18) Latency-99 10.8ms ± 7% 10.2ms ± 4% -5.22% (p=0.000 n=18+18) Latency-999 18.0ms ± 9% 16.7ms ±10% -7.09% (p=0.000 n=18+18) scang yield delay = 30000, casgstatus yield delay = 10000 Latency-50 840µs ± 3% 836µs ± 5% ~ (p=0.442 n=15+18) Latency-95 6.52ms ± 4% 6.39ms ± 3% -2.00% (p=0.000 n=18+18) Latency-99 10.8ms ± 7% 10.2ms ± 6% -5.15% (p=0.000 n=18+17) Latency-999 18.0ms ± 9% 16.6ms ± 8% -7.48% (p=0.000 n=18+18) scang yield delay = 100000, casgstatus yield delay = 50000 Latency-50 840µs ± 3% 836µs ± 6% ~ (p=0.401 n=15+18) Latency-95 6.52ms ± 4% 6.40ms ± 4% -1.79% (p=0.010 n=18+18) Latency-99 10.8ms ± 7% 10.2ms ± 5% -4.95% (p=0.000 n=18+18) Latency-999 18.0ms ± 9% 16.5ms ±14% -8.17% (p=0.000 n=18+18) scang yield delay = 200000, casgstatus yield delay = 100000 Latency-50 840µs ± 3% 828µs ± 2% -1.49% (p=0.001 n=15+17) Latency-95 6.52ms ± 4% 6.38ms ± 4% -2.04% (p=0.001 n=18+18) Latency-99 10.8ms ± 7% 10.2ms ± 4% -4.77% (p=0.000 n=18+18) Latency-999 18.0ms ± 9% 16.9ms ± 9% -6.23% (p=0.000 n=18+18) With significant oversubscription (background http benchmark was running with full GOMAXPROCS): scang yield delay = 1, casgstatus yield delay = 1 Latency-50 1.32ms ±12% 1.30ms ±13% ~ (p=0.454 n=14+14) Latency-95 16.3ms ±10% 15.3ms ± 7% -6.29% (p=0.001 n=14+14) Latency-99 29.4ms ±10% 27.9ms ± 5% -5.04% (p=0.001 n=14+12) Latency-999 49.9ms ±19% 45.9ms ± 5% -8.00% (p=0.008 n=14+13) scang yield delay = 5000, casgstatus yield delay = 3000 Latency-50 1.32ms ±12% 1.29ms ± 9% ~ (p=0.227 n=14+14) Latency-95 16.3ms ±10% 15.4ms ± 5% -5.27% (p=0.002 n=14+14) Latency-99 29.4ms ±10% 27.9ms ± 6% -5.16% (p=0.001 n=14+14) Latency-999 49.9ms ±19% 46.8ms ± 8% -6.21% (p=0.050 n=14+14) scang yield delay = 10000, casgstatus yield delay = 5000 Latency-50 1.32ms ±12% 1.35ms ± 9% ~ (p=0.401 n=14+14) Latency-95 16.3ms ±10% 15.0ms ± 4% -7.67% (p=0.000 n=14+14) Latency-99 29.4ms ±10% 27.4ms ± 5% -6.98% (p=0.000 n=14+14) Latency-999 49.9ms ±19% 44.7ms ± 5% -10.56% (p=0.000 n=14+11) scang yield delay = 30000, casgstatus yield delay = 10000 Latency-50 1.32ms ±12% 1.36ms ±10% ~ (p=0.246 n=14+14) Latency-95 16.3ms ±10% 14.9ms ± 5% -8.31% (p=0.000 n=14+14) Latency-99 29.4ms ±10% 27.4ms ± 7% -6.70% (p=0.000 n=14+14) Latency-999 49.9ms ±19% 44.9ms ±15% -10.13% (p=0.003 n=14+14) scang yield delay = 100000, casgstatus yield delay = 50000 Latency-50 1.32ms ±12% 1.41ms ± 9% +6.37% (p=0.008 n=14+13) Latency-95 16.3ms ±10% 15.1ms ± 8% -7.45% (p=0.000 n=14+14) Latency-99 29.4ms ±10% 27.5ms ±12% -6.67% (p=0.002 n=14+14) Latency-999 49.9ms ±19% 45.9ms ±16% -8.06% (p=0.019 n=14+14) scang yield delay = 200000, casgstatus yield delay = 100000 Latency-50 1.32ms ±12% 1.42ms ±10% +7.21% (p=0.003 n=14+14) Latency-95 16.3ms ±10% 15.0ms ± 7% -7.59% (p=0.000 n=14+14) Latency-99 29.4ms ±10% 27.3ms ± 8% -7.20% (p=0.000 n=14+14) Latency-999 49.9ms ±19% 44.8ms ± 8% -10.21% (p=0.001 n=14+13) All numbers are on 8 cores and with GOGC=10 (http benchmark has tiny heap, few goroutines and low allocation rate, so by default GC barely affects tail latency). 10us/5us yield delays seem to provide a reasonable compromise and give 5-10% tail latency reduction. 
That's what used in this change.

go install -a std results on 4 core machine:

name      old time/op  new time/op  delta
Time      8.39s ± 2%   7.94s ± 2%   -5.34%  (p=0.000 n=47+49)
UserTime  24.6s ± 2%   22.9s ± 2%   -6.76%  (p=0.000 n=49+49)
SysTime   1.77s ± 9%   1.89s ±11%   +7.00%  (p=0.000 n=49+49)
CpuLoad   315ns ± 2%   313ns ± 1%   -0.59%  (p=0.000 n=49+48)  # %CPU
MaxRSS    97.1ms ± 4%  97.5ms ± 9%  ~       (p=0.838 n=46+49)  # bytes

Update #14396
Update #14189

Change-Id: I3f4109bf8f7fd79b39c466576690a778232055a2
Reviewed-on: https://go-review.googlesource.com/21503
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Kubernetes is on Go 1.6 as of April 5 (see kubernetes/kubernetes#20656). |
Yeah - with 1.6 we didn't see any issues. Thanks a lot for help! |
The Kubernetes team found that updating from Go 1.4.2 to Go 1.4.3 caused a performance regression for them, and also that Go 1.5.3 was bad as well.
The 1.4.3 bug is kubernetes/kubernetes#17524, "fixed" by reverting to Go 1.4.2. The error report at the top of the bug is difficult for me to interpret: to me it says mainly that 1 != 0 and 3 != 0. A comment on the next bug says “... but we can't do what InfluxDB did, because we have some pretty massive issues on 1.4.3. 1.4.2 was the last even vaguely happy point.” Right now, I don't have any way to quantify the 1.4.2 to 1.4.3 problem other than that remark. (The InfluxDB note is referring to #14189, but we don't know that they are necessarily related.)
The 1.5.3 bug is kubernetes/kubernetes#20400, "fixed" by reverting to Go 1.4.2 again. That bug has clearer data: on a particular load test, tail latencies are significantly higher in 1.5.3.
The Kubernetes team created kubernetes/kubernetes#20656 to track further investigation of these regressions. This is the corresponding issue on the Go side.