
storage: adopt new raft MaxCommittedSizePerReady config parameter #32387

Merged
merged 1 commit into cockroachdb:master from ajwerner/raft-setting-change on Nov 15, 2018

Conversation

ajwerner
Contributor

Before this change, the total size of committed log entries which a replica could
apply at a time was bounded by the same configuration that limits the total size of
log entries sent in a single message (MaxSizePerMsg), which is generally
kilobytes. This limit hurt the throughput of writes to a replica,
particularly when writing large amounts of data. A new raft configuration option,
MaxCommittedSizePerReady, was added to etcd/raft in etcd-io/etcd#10258,
which allows these two size parameters to be decoupled. This change adopts the
new option and sets it to a default of 64MB.
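
As a rough sketch of what adopting the knob looks like (illustrative values only; the import path depends on how etcd/raft is vendored, and the real wiring lives in pkg/storage and pkg/base):

```
package main

import "go.etcd.io/etcd/raft"

// exampleConfig shows the two now-independent size knobs side by side.
func exampleConfig() *raft.Config {
	return &raft.Config{
		ID:            1,
		ElectionTick:  10,
		HeartbeatTick: 1,
		Storage:       raft.NewMemoryStorage(),
		// Outgoing message payloads stay small (kilobytes) to bound the
		// size of an individual MsgApp.
		MaxSizePerMsg:   16 << 10,
		MaxInflightMsgs: 128,
		// Added in etcd-io/etcd#10258: the committed-entry byte budget
		// per Ready, previously tied to MaxSizePerMsg, is now separate.
		MaxCommittedSizePerReady: 64 << 20, // the 64MB default this change picks
	}
}
```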

On the workload below, which is set up so that under the old configuration every
Ready returns exactly one committed entry (the block sizes sit just above 8 KiB and
16 KiB, presumably so that even two minimum-size entries overflow a kilobyte-scale
per-Ready budget), we see a massive win in both throughput and latency.
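
For orientation, a sketch of a generic etcd/raft consumer loop (assuming the go.etcd.io/etcd/raft packages; this is not CockroachDB's actual Raft processing code). The cap in question applies to rd.CommittedEntries, so a kilobyte-scale budget hands back roughly one of these large entries per iteration and throttles the apply loop:

```
import (
	"go.etcd.io/etcd/raft"
	"go.etcd.io/etcd/raft/raftpb"
)

// runReadyLoop drains Readys from a raft.Node and applies committed
// entries. MaxCommittedSizePerReady caps the total byte size of
// rd.CommittedEntries delivered per iteration.
func runReadyLoop(node raft.Node, apply func(raftpb.Entry)) {
	for {
		rd := <-node.Ready()
		// ... persist rd.HardState and rd.Entries, send rd.Messages ...
		for _, ent := range rd.CommittedEntries {
			apply(ent)
		}
		node.Advance()
	}
}
```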

```
./workload run kv {pgurl:1-3} \
    --init --splits=10 \
    --duration 60s \
    --read-percent=${READ_PERCENT} \
    --min-block-bytes=8193 --max-block-bytes=16385 \
    --concurrency=1024
```

```
name       old ops/s  new ops/s  delta
KV0         483 ± 3%  2025 ± 3%  +319.32%  (p=0.002 n=6+6)
```
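
For reference, the delta is just the ratio of the mean throughputs: (2025 − 483) / 483 ≈ +319%, matching the reported +319.32% once computed from the unrounded means.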

Before:
```
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
   60.0s        0          29570          492.8   1981.2   2281.7   5100.3   5637.1   6442.5  write
   60.0s        0          28405          473.4   2074.8   2281.7   5637.1   6710.9   7516.2  write
   60.0s        0          28615          476.9   2074.3   2550.1   5905.6   6442.5   8321.5  write
   60.0s        0          28718          478.6   2055.4   2550.1   5100.3   6442.5   7516.2  write
   60.0s        0          28567          476.1   2079.8   2684.4   4831.8   5368.7   6442.5  write
   60.0s        0          29981          499.7   1975.7   1811.9   5368.7   6174.0   6979.3  write
```

After:
```
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
   60.0s        0         119652         1994.0    510.9    486.5   1006.6   1409.3   4295.0  write
   60.0s        0         125321         2088.4    488.5    469.8    906.0   1275.1   4563.4  write
   60.0s        0         119644         1993.9    505.2    469.8   1006.6   1610.6   5637.1  write
   60.0s        0         119027         1983.6    511.4    469.8   1073.7   1946.2   4295.0  write
   60.0s        0         121723         2028.5    500.6    469.8   1040.2   1677.7   4160.7  write
   60.0s        0         123697         2061.4    494.1    469.8   1006.6   1610.6   4295.0  write
```

Fixes #31511

Release note: None

@ajwerner ajwerner requested review from a team November 15, 2018 16:15
@cockroach-teamcity
Member

This change is Reviewable

Member

@nvanbenschoten nvanbenschoten left a comment


:lgtm_strong: @bdarnell should sign off on the default value here, then we should be good.

s/Ready wiht the/Ready with the/ in the commit message

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained


pkg/storage/store.go, line 143 at r1 (raw file):

```
		Settings:                    st,
		AmbientCtx:                  log.AmbientContext{Tracer: st.Tracer},
		Clock:                       clock,
```

nit: are you running go1.11's gofmt? We're still using go1.10's at the moment.
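
For anyone hitting the same nit: go1.10's and go1.11's gofmt use different heuristics for when adjacent fields share one alignment column, so identical source can legitimately format either way. A purely hypothetical sketch of the two layouts (StoreConfig, st, and clock stand in for the real identifiers; which toolchain produces which depends on the surrounding fields):

```
// One gofmt version keeps every field in a single alignment group,
// padding all values to the longest key in the struct literal:
cfg := StoreConfig{
	Settings:                    st,
	AmbientCtx:                  log.AmbientContext{Tracer: st.Tracer},
	Clock:                       clock,
}

// The other may split the group around lines of very different length,
// producing narrower padding:
cfg = StoreConfig{
	Settings:   st,
	AmbientCtx: log.AmbientContext{Tracer: st.Tracer},
	Clock:      clock,
}
```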

@ajwerner ajwerner force-pushed the ajwerner/raft-setting-change branch from ec36024 to 4787e3c Compare November 15, 2018 16:34
Contributor Author

@ajwerner ajwerner left a comment


Fixed the commit message typo.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained


pkg/storage/store.go, line 143 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: are you running go1.11's gofmt? We're still using go1.10's at the moment.

Reformatted with gofmt from 1.10

@ajwerner ajwerner requested a review from bdarnell November 15, 2018 16:37
@nvanbenschoten
Member

should sign off on the default value here

Specifically, 64MB might be a bit high, but I'll defer to Ben's judgment.

Contributor

@bdarnell bdarnell left a comment


:lgtm:

Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained

@ajwerner
Contributor Author

bors r+

craig bot pushed a commit that referenced this pull request Nov 15, 2018
32387: storage: adopt new raft MaxCommittedSizePerReady config parameter r=ajwerner a=ajwerner

Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com>
@craig
Contributor

craig bot commented Nov 15, 2018

Build succeeded

@craig craig bot merged commit 4787e3c into cockroachdb:master Nov 15, 2018
@nvanbenschoten
Copy link
Member

Once #32600 is merged, I think we should backport this to 2.1 as well.
