-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
rowexec: upgrade hyperloglog for table stats processors #138561
Conversation
fc51387
to
8dce713
Compare
8dce713
to
8592fdc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there an easy way to add this regression test? #137386 (comment)
Reviewed 2 of 2 files at r1, 18 of 18 files at r2, 10 of 10 files at r3, 8 of 8 files at r4, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @michae2 and @yuzefovich)
-- commits
line 4 at r1:
nit: currenntly => currently
pkg/sql/execversion/version.go
line 18 at r3 (raw file):
// execution logic to be used for a particular DistSQL flow and is picked by the // gateway node after consulting the cluster version. type DistSQLVersion uint32
🚲 shed: should this type be called simply V
?
8592fdc
to
924739f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I spent some time trying to figure out how to add a regression test for this. I could think of two options:
- a full-blown roachtest, which seems like an overkill, and we do get that coverage implicitly via other mixed-version roachtests
- using
cockroach-go-testserver-configs
logic test config which in theory is exactly what we want. However, I was unable to get the test to fail with the revert of stats: revert upgrade of hyperloglog library #138358 and not sure why. I also often run into errors onEXPERIMENTAL_RELOCATE
commands after having upgraded a node in the cluster, even withretry
directive, so it seems like the test could be flaky. Perhaps we need to harden this config a bit. I'll reach out to Foundations to ask if they have any advice.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner and @michae2)
pkg/sql/execversion/version.go
line 18 at r3 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
🚲 shed: should this type be called simply
V
?
Nice, I like it, done.
924739f
to
0223a82
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a regression test via the logic test framework. There were a couple of difficulties with the framework itself (see here) that I stumbled upon and will improve in a separate patch. Let's see whether CI accepts this test (it works locally and properly failed when reverting #138358).
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner and @michae2)
We currently store the statement in the context as a value in the hopes of enriching the sentry reports. Anecdotally, most reports don't get the stmt information (because the current mechanism is missing some code paths), and even when it's present in most cases the report is still unactionable. A more lengthy discussion can be found in this slack [thread](https://cockroachlabs.slack.com/archives/C01RX2G8LT1/p1733954812725919). This commit removes this altogether in order to free up a spot in the "fast values" context storage that will be utilized by the following commit. We also try to include the plan gist into sentry reports, and I find that a bit more helpful, so I chose to keep plan gists over the stmts. Release note: None
The following commits will introduce dependency on this version into other low-dependency packages, so this extraction is needed to avoid import cycles. The only change is renaming `DistSQLVersion` type to `V`, to avoid stuttering with the new package name. Release note: None
This commit refactors the DistSQL version mechanism to better serve our needs. When it was first introduced, it was too coarse since a change in the logic in one processor would make _all_ physical plans not be distributable in the mixed-version state, so we have avoided performing the DistSQL version bumps since 23.1 cockroach version. However, we still need a mechanism to perform execution logic "upgrades". The main idea is that we will need to keep the code that can execute every version in `[MinAccepted, Latest]` range, and we’d have execution logic conditional on the version that is put into `SetupFlowRequest`. The gateway for the query will be consulting cluster version to decide which DistSQL exec version is safe to use. `MinAccepted` can only be bumped (and the corresponding execution code retired / conditionals removed) once compatibility with the cluster version that uses it is no longer needed. For example, right now DistSQL version is 71, and it’s what all 24.3 and prior binaries use; say in 25.1 we introduce a change in the execution engine, then we bump the version to 72 but keep MinAccepted as 71. In mixed-version 24.3-25.1 cluster we’ll be using version 71 so that all nodes can participate in the distributed plan. Once compatibility with 24.3 is no longer needed, we can bump MinAccepted to 72 and remove the old execution code. In order to reduce the amount of plumbing necessary for these migrations, we'll store the DistSQL exec version in the context. This commit already bumps the 25.1 exec version to 72 (in anticipation of its usage in the following commit). Release note: None
This commit utilizes the updated DistSQL version mechanism to upgrade the hyperloglog library used for the distinct count estimation in the sampler and sample aggregator. In `sketchInfo` struct we now have fields corresponding to the old and the new versions and we populate only one, based on the DistSQL version provided on the processor creation via the context. Only one of the two fields is updated throughout the execution, so there is no confusion when unmarshalling the sketch datum. It turns out that we already had a sketch version baked into the sketch proto, so we could have used that to perform this upgrade. However, we should rely on the universal DistSQL version mechanism instead, so this commit also begins the process of removing the sketch version from the proto by deleting the version checks. Release note: None
0223a82
to
62765d6
Compare
Looks like CI is green (modulo unrelated flake), so let's keep the test. I'm a bit concerned that it'll be flaky, and if so, we can remove it later. TFTR! bors r+ |
Nice job figuring out the test. 🤞 it's not flaky. |
137916: roachprod: reduce start-up lag r=tbg a=tbg On my machine (which is in Europe), this brings `time roachprod --help` from `1.56s` down to to `0.06s` under the following env vars: ``` ROACHPROD_DISABLE_UPDATE_CHECK=true ROACHPROD_DISABLED_PROVIDERS=azure ROACHPROD_SKIP_AWSCLI_CHECK=true ``` Under these env vars, my roachprod - no longer invokes `aws --version` on each start (python, ~400ms) - no longer inits azure, which is >1s for me - doesn't list the gs bucket to check for a newer roachprod binary (~800ms; doesn't exist for OSX anyway). A better way (but one outside of my purview) for most of these would be to add caching for each of these and so to avoid the cost in the common case. Azure is an exception, as the (wall-clock) profile below shows we're spending most of our time waiting for `GetTokenFromCLIWithParams` to return. It's not clear how to optimize this. (The AWS portion of the flamegraph is `aws --version`). ![image](https://github.com/user-attachments/assets/b4677da6-c5a5-4552-b0d5-462932f1062e) Epic: none 138283: DEPS: upgrade grpc to v1.57.2 r=tbg a=tbg See #136278 (comment). `grpc` has gotten a little worse at allocations, but it's overall similarly fast, perhaps even a little faster in the smaller RPCs we care most about. <details><summary>Benchmark results</summary> <p> ``` $ benchdiff --old lastmerge ./pkg/rpc -b -r 'BenchmarkGRPCPing' -d 1s -c 10 old: 3ce8f44 Merge #138561 #138779 #138793 new: 3708ee5 DEPS: add resolve hints and update packages name old time/op new time/op delta GRPCPing/bytes=____256/rpc=UnaryUnary-24 126µs ± 3% 124µs ± 2% -1.59% (p=0.035 n=9+10) GRPCPing/bytes=___8192/rpc=StreamStream-24 126µs ± 3% 124µs ± 1% -1.32% (p=0.011 n=10+10) GRPCPing/bytes=______1/rpc=UnaryUnary-24 124µs ± 4% 123µs ± 3% ~ (p=0.315 n=10+10) GRPCPing/bytes=______1/rpc=StreamStream-24 70.3µs ± 3% 70.8µs ± 2% ~ (p=0.393 n=10+10) GRPCPing/bytes=____256/rpc=StreamStream-24 74.5µs ± 3% 75.1µs ± 2% ~ (p=0.105 n=10+10) GRPCPing/bytes=___1024/rpc=UnaryUnary-24 123µs ± 6% 120µs ± 4% ~ (p=0.661 n=10+9) GRPCPing/bytes=___1024/rpc=StreamStream-24 67.4µs ± 8% 67.4µs ± 6% ~ (p=0.720 n=10+9) GRPCPing/bytes=___2048/rpc=UnaryUnary-24 133µs ± 5% 133µs ± 4% ~ (p=0.986 n=10+10) GRPCPing/bytes=___2048/rpc=StreamStream-24 73.9µs ± 1% 74.6µs ± 2% ~ (p=0.234 n=8+8) GRPCPing/bytes=___4096/rpc=UnaryUnary-24 150µs ± 2% 151µs ± 3% ~ (p=0.182 n=9+10) GRPCPing/bytes=___4096/rpc=StreamStream-24 97.4µs ±10% 95.3µs ±10% ~ (p=0.393 n=10+10) GRPCPing/bytes=___8192/rpc=UnaryUnary-24 175µs ± 1% 176µs ± 2% ~ (p=0.720 n=9+10) GRPCPing/bytes=__16384/rpc=UnaryUnary-24 252µs ± 1% 253µs ± 1% ~ (p=0.315 n=9+10) GRPCPing/bytes=__16384/rpc=StreamStream-24 190µs ± 1% 189µs ± 2% ~ (p=0.497 n=9+10) GRPCPing/bytes=__32768/rpc=UnaryUnary-24 363µs ± 1% 366µs ± 1% ~ (p=0.079 n=10+9) GRPCPing/bytes=__32768/rpc=StreamStream-24 305µs ± 3% 305µs ± 1% ~ (p=0.579 n=10+10) GRPCPing/bytes=__65536/rpc=UnaryUnary-24 512µs ± 2% 515µs ± 1% ~ (p=0.095 n=9+10) GRPCPing/bytes=__65536/rpc=StreamStream-24 449µs ± 1% 452µs ± 1% ~ (p=0.059 n=9+8) GRPCPing/bytes=_262144/rpc=UnaryUnary-24 1.48ms ± 3% 1.48ms ± 2% ~ (p=0.739 n=10+10) GRPCPing/bytes=_262144/rpc=StreamStream-24 1.42ms ± 1% 1.41ms ± 2% ~ (p=0.182 n=9+10) GRPCPing/bytes=1048576/rpc=UnaryUnary-24 5.90ms ± 2% 5.86ms ± 1% ~ (p=0.278 n=10+9) GRPCPing/bytes=1048576/rpc=StreamStream-24 5.81ms ± 2% 5.84ms ± 3% ~ (p=0.631 n=10+10) name old speed new speed delta GRPCPing/bytes=____256/rpc=UnaryUnary-24 4.44MB/s ± 3% 4.51MB/s ± 2% +1.58% (p=0.033 n=9+10) GRPCPing/bytes=___8192/rpc=StreamStream-24 130MB/s ± 3% 132MB/s ± 1% +1.32% (p=0.010 n=10+10) GRPCPing/bytes=______1/rpc=UnaryUnary-24 386kB/s ± 4% 391kB/s ± 3% ~ (p=0.378 n=10+10) GRPCPing/bytes=______1/rpc=StreamStream-24 682kB/s ± 3% 676kB/s ± 2% ~ (p=0.189 n=10+9) GRPCPing/bytes=____256/rpc=StreamStream-24 7.52MB/s ± 3% 7.46MB/s ± 2% ~ (p=0.100 n=10+10) GRPCPing/bytes=___1024/rpc=UnaryUnary-24 17.1MB/s ± 6% 17.4MB/s ± 4% ~ (p=0.645 n=10+9) GRPCPing/bytes=___1024/rpc=StreamStream-24 31.1MB/s ± 8% 31.1MB/s ± 6% ~ (p=0.720 n=10+9) GRPCPing/bytes=___2048/rpc=UnaryUnary-24 31.1MB/s ± 5% 31.2MB/s ± 4% ~ (p=0.986 n=10+10) GRPCPing/bytes=___2048/rpc=StreamStream-24 56.1MB/s ± 1% 55.6MB/s ± 2% ~ (p=0.224 n=8+8) GRPCPing/bytes=___4096/rpc=UnaryUnary-24 55.1MB/s ± 2% 54.6MB/s ± 3% ~ (p=0.189 n=9+10) GRPCPing/bytes=___4096/rpc=StreamStream-24 85.1MB/s ±11% 87.0MB/s ±11% ~ (p=0.393 n=10+10) GRPCPing/bytes=___8192/rpc=UnaryUnary-24 93.7MB/s ± 1% 93.5MB/s ± 2% ~ (p=0.720 n=9+10) GRPCPing/bytes=__16384/rpc=UnaryUnary-24 130MB/s ± 1% 130MB/s ± 1% ~ (p=0.305 n=9+10) GRPCPing/bytes=__16384/rpc=StreamStream-24 173MB/s ± 1% 173MB/s ± 2% ~ (p=0.497 n=9+10) GRPCPing/bytes=__32768/rpc=UnaryUnary-24 180MB/s ± 1% 179MB/s ± 1% ~ (p=0.079 n=10+9) GRPCPing/bytes=__32768/rpc=StreamStream-24 215MB/s ± 2% 215MB/s ± 1% ~ (p=0.579 n=10+10) GRPCPing/bytes=__65536/rpc=UnaryUnary-24 256MB/s ± 2% 255MB/s ± 1% ~ (p=0.095 n=9+10) GRPCPing/bytes=__65536/rpc=StreamStream-24 292MB/s ± 1% 290MB/s ± 1% ~ (p=0.059 n=9+8) GRPCPing/bytes=_262144/rpc=UnaryUnary-24 353MB/s ± 3% 353MB/s ± 2% ~ (p=0.447 n=10+9) GRPCPing/bytes=_262144/rpc=StreamStream-24 369MB/s ± 1% 371MB/s ± 2% ~ (p=0.182 n=9+10) GRPCPing/bytes=1048576/rpc=UnaryUnary-24 355MB/s ± 2% 358MB/s ± 1% ~ (p=0.278 n=10+9) GRPCPing/bytes=1048576/rpc=StreamStream-24 361MB/s ± 2% 359MB/s ± 3% ~ (p=0.631 n=10+10) name old alloc/op new alloc/op delta GRPCPing/bytes=______1/rpc=UnaryUnary-24 16.9kB ± 1% 16.9kB ± 3% ~ (p=0.579 n=10+10) GRPCPing/bytes=____256/rpc=UnaryUnary-24 19.8kB ± 2% 19.9kB ± 2% ~ (p=0.755 n=10+10) GRPCPing/bytes=____256/rpc=StreamStream-24 7.35kB ± 2% 7.43kB ± 2% ~ (p=0.052 n=10+10) GRPCPing/bytes=___1024/rpc=UnaryUnary-24 29.8kB ± 2% 29.8kB ± 1% ~ (p=0.853 n=10+10) GRPCPing/bytes=___1024/rpc=StreamStream-24 17.7kB ± 1% 17.7kB ± 1% ~ (p=0.796 n=10+10) GRPCPing/bytes=___2048/rpc=UnaryUnary-24 43.2kB ± 1% 43.0kB ± 1% ~ (p=0.218 n=10+10) GRPCPing/bytes=___2048/rpc=StreamStream-24 31.0kB ± 0% 31.1kB ± 1% ~ (p=0.278 n=9+10) GRPCPing/bytes=___4096/rpc=UnaryUnary-24 73.0kB ± 1% 73.2kB ± 1% ~ (p=0.393 n=10+10) GRPCPing/bytes=___4096/rpc=StreamStream-24 61.6kB ± 1% 61.7kB ± 0% ~ (p=0.573 n=10+8) GRPCPing/bytes=___8192/rpc=UnaryUnary-24 127kB ± 0% 127kB ± 1% ~ (p=0.393 n=10+10) GRPCPing/bytes=___8192/rpc=StreamStream-24 118kB ± 1% 118kB ± 0% ~ (p=0.796 n=10+10) GRPCPing/bytes=__16384/rpc=UnaryUnary-24 237kB ± 1% 237kB ± 1% ~ (p=0.579 n=10+10) GRPCPing/bytes=__16384/rpc=StreamStream-24 227kB ± 1% 227kB ± 1% ~ (p=0.481 n=10+10) GRPCPing/bytes=__32768/rpc=UnaryUnary-24 500kB ± 1% 500kB ± 1% ~ (p=0.912 n=10+10) GRPCPing/bytes=__32768/rpc=StreamStream-24 492kB ± 0% 492kB ± 0% ~ (p=0.968 n=9+10) GRPCPing/bytes=__65536/rpc=UnaryUnary-24 873kB ± 0% 872kB ± 0% ~ (p=0.780 n=9+10) GRPCPing/bytes=__65536/rpc=StreamStream-24 868kB ± 0% 868kB ± 0% ~ (p=1.000 n=9+9) GRPCPing/bytes=_262144/rpc=UnaryUnary-24 3.50MB ± 0% 3.51MB ± 0% ~ (p=0.436 n=10+10) GRPCPing/bytes=_262144/rpc=StreamStream-24 3.49MB ± 0% 3.50MB ± 0% ~ (p=0.436 n=10+10) GRPCPing/bytes=1048576/rpc=UnaryUnary-24 13.5MB ± 0% 13.5MB ± 0% ~ (p=0.515 n=8+10) GRPCPing/bytes=1048576/rpc=StreamStream-24 13.5MB ± 0% 13.5MB ± 0% ~ (p=0.549 n=10+9) GRPCPing/bytes=______1/rpc=StreamStream-24 4.08kB ± 3% 4.18kB ± 3% +2.28% (p=0.008 n=9+10) name old allocs/op new allocs/op delta GRPCPing/bytes=_262144/rpc=UnaryUnary-24 282 ± 4% 286 ± 4% ~ (p=0.223 n=10+10) GRPCPing/bytes=_262144/rpc=StreamStream-24 147 ± 3% 149 ± 3% ~ (p=0.053 n=9+8) GRPCPing/bytes=1048576/rpc=UnaryUnary-24 510 ± 2% 513 ± 3% ~ (p=0.656 n=8+9) GRPCPing/bytes=1048576/rpc=StreamStream-24 370 ± 6% 377 ± 3% ~ (p=0.168 n=9+9) GRPCPing/bytes=____256/rpc=UnaryUnary-24 183 ± 0% 184 ± 0% +0.71% (p=0.000 n=8+10) GRPCPing/bytes=______1/rpc=UnaryUnary-24 183 ± 0% 184 ± 0% +0.77% (p=0.000 n=10+8) GRPCPing/bytes=__32768/rpc=UnaryUnary-24 211 ± 0% 213 ± 0% +0.95% (p=0.000 n=10+10) GRPCPing/bytes=__16384/rpc=UnaryUnary-24 195 ± 0% 197 ± 0% +1.03% (p=0.000 n=10+10) GRPCPing/bytes=___8192/rpc=UnaryUnary-24 184 ± 0% 186 ± 0% +1.09% (p=0.000 n=10+10) GRPCPing/bytes=___2048/rpc=UnaryUnary-24 183 ± 0% 185 ± 0% +1.09% (p=0.000 n=10+10) GRPCPing/bytes=___4096/rpc=UnaryUnary-24 183 ± 0% 185 ± 0% +1.09% (p=0.000 n=10+10) GRPCPing/bytes=___1024/rpc=UnaryUnary-24 182 ± 0% 184 ± 0% +1.10% (p=0.000 n=10+10) GRPCPing/bytes=__65536/rpc=UnaryUnary-24 219 ± 0% 221 ± 0% +1.10% (p=0.000 n=10+8) GRPCPing/bytes=__32768/rpc=StreamStream-24 75.0 ± 0% 77.0 ± 0% +2.67% (p=0.000 n=10+10) GRPCPing/bytes=__65536/rpc=StreamStream-24 83.0 ± 0% 85.3 ± 1% +2.77% (p=0.000 n=9+10) GRPCPing/bytes=__16384/rpc=StreamStream-24 57.0 ± 0% 59.0 ± 0% +3.51% (p=0.000 n=10+10) GRPCPing/bytes=___8192/rpc=StreamStream-24 51.0 ± 0% 53.0 ± 0% +3.92% (p=0.000 n=10+10) GRPCPing/bytes=___4096/rpc=StreamStream-24 49.0 ± 0% 51.0 ± 0% +4.08% (p=0.000 n=10+10) GRPCPing/bytes=___2048/rpc=StreamStream-24 48.0 ± 0% 50.0 ± 0% +4.17% (p=0.000 n=10+10) GRPCPing/bytes=______1/rpc=StreamStream-24 47.0 ± 0% 49.0 ± 0% +4.26% (p=0.000 n=10+10) GRPCPing/bytes=____256/rpc=StreamStream-24 47.0 ± 0% 49.0 ± 0% +4.26% (p=0.000 n=10+10) GRPCPing/bytes=___1024/rpc=StreamStream-24 47.0 ± 0% 49.0 ± 0% +4.26% (p=0.000 n=10+10) ``` </p> </details> Epic: None Release note: None 138939: changefeedccl/kvfeed: pass consumer id correctly r=andyyang890,stevendanna a=wenyihu6 Previously, we introduced the concept of a consumer ID to prevent a single changefeed job from over-consuming the catch-up scan quota and blocking other consumers from making progress on the server side. However, the changefeed client-side code requires the consumer ID to be passed again in the rangefeed options during rangefeedFactory.Run. This was missing in the previous PR, causing the changefeed job ID to default to zero. This patch fixes the issue by ensuring the consumer ID is correctly passed in the rangefeed options. Related: #133789 Release note: none Epic: none Co-authored-by: Tobias Grieger <tobias.b.grieger@gmail.com> Co-authored-by: Wenyi Hu <wenyi@cockroachlabs.com>
sql: stop storing statement in the context
We currently store the statement in the context as a value in the hopes of enriching the sentry reports. Anecdotally, most reports don't get the stmt information (because the current mechanism is missing some code paths), and even when it's present in most cases the report is still unactionable. A more lengthy discussion can be found in this slack thread.
This commit removes this altogether in order to free up a spot in the "fast values" context storage that will be utilized by the following commit. We also try to include the plan gist into sentry reports, and I find that a bit more helpful, so I chose to keep plan gists over the stmts.
Fixes: #38516.
execversion: extract DistSQL version into a new package
The following commits will introduce dependency on this version into other low-dependency packages, so this extraction is needed to avoid import cycles.
execversion: refactor DistSQL version
This commit refactors the DistSQL version mechanism to better serve our needs. When it was first introduced, it was too coarse since a change in the logic in one processor would make all physical plans not be distributable in the mixed-version state, so we have avoided performing the DistSQL version bumps since 23.1 cockroach version. However, we still need a mechanism to perform execution logic "upgrades".
The main idea is that we will need to keep the code that can execute every version in
[MinAccepted, Latest]
range, and we’d have execution logic conditional on the version that is put intoSetupFlowRequest
. The gateway for the query will be consulting cluster version to decide which DistSQL exec version is safe to use.MinAccepted
can only be bumped (and the corresponding execution code retired / conditionals removed) once compatibility with the cluster version that uses it is no longer needed.For example, right now DistSQL version is 71, and it’s what all 24.3 and prior binaries use; say in 25.1 we introduce a change in the execution engine, then we bump the version to 72 but keep MinAccepted as 71. In mixed-version 24.3-25.1 cluster we’ll be using version 71 so that all nodes can participate in the distributed plan. Once compatibility with 24.3 is no longer needed, we can bump MinAccepted to 72 and remove the old execution code.
In order to reduce the amount of plumbing necessary for these migrations, we'll store the DistSQL exec version in the context. This commit already bumps the 25.1 exec version to 72 (in anticipation of its usage in the following commit).
Fixes: #98550.
rowexec: upgrade hyperloglog for table stats processors
This commit utilizes the updated DistSQL version mechanism to upgrade the hyperloglog library used for the distinct count estimation in the sampler and sample aggregator. In
sketchInfo
struct we now have fields corresponding to the old and the new versions and we populate only one, based on the DistSQL version provided on the processor creation via the context. Only one of the two fields is updated throughout the execution, so there is no confusion when unmarshalling the sketch datum.It turns out that we already had a sketch version baked into the sketch proto, so we could have used that to perform this upgrade. However, we should rely on the universal DistSQL version mechanism instead, so this commit also begins the process of removing the sketch version from the proto by deleting the version checks.
Fixes: #138546.
Release note: None