[LibOS] Use static buffer for first thread executing sendfile()
#1615
Conversation
Reviewable status: 0 of 1 files reviewed, 1 unresolved discussion, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel)
libos/src/sys/libos_file.c
line 23 at r1 (raw file):
 * Read/write in 64KB chunks in the sendfile() syscall. This syscall also has an optimization of
 * using a statically allocated buffer instead of allocating on the heap (as our internal malloc()
 * has a heavy mutex that guards all memory-allocation operations). To prevent data races of
Actually, maybe I should drop the "has a heavy mutex ..." explanation because I'm not sure it's correct in our case -- after all, Nginx is single-threaded, so there shouldn't really be any mutex contention on malloc/free. Probably the overhead comes just from our internal allocator, which is not super-optimized...
b9ea93d
to
8b46db0
Compare
Reviewed 1 of 1 files at r1, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @dimakuv)
libos/src/sys/libos_file.c
line 23 at r1 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
Actually, maybe I should drop the "has a heavy mutex ..." explanation because I'm not sure it's correct in our case -- after all, Nginx is single-threaded, so there shouldn't really be any mutex contention on malloc/free. Probably the overhead comes just from our internal allocator, which is not super-optimized...
+1
libos/src/sys/libos_file.c
line 458 at r2 (raw file):
bool buf_in_use = false;
if (__atomic_compare_exchange_n(&g_sendfile_buf_in_use, &buf_in_use, /*desired=*/true,
                                /*weak=*/true, __ATOMIC_RELEASE, __ATOMIC_ACQUIRE)) {
Actually, you can simplify (+ probably slightly speed up) this by using `__atomic_exchange_n()` instead.
Reviewable status: 0 of 1 files reviewed, 2 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @mkow)
libos/src/sys/libos_file.c
line 23 at r1 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
+1
Done.
libos/src/sys/libos_file.c
line 458 at r2 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
Actually, you can simplify (+ probably slightly speed up) this by using `__atomic_exchange_n()` instead.
Done.
According to my quick-and-dirty measurements, there is no change in performance (in comparison to the previous approach with `__atomic_compare_exchange_n`).
Reviewed 1 of 1 files at r3, all commit messages.
Reviewable status: all files reviewed, all discussions resolved, not enough approvals from maintainers (1 more required), "fixup! " found in commit messages' one-liners
Reviewable status: all files reviewed, 1 unresolved discussion, not enough approvals from maintainers (1 more required), "fixup! " found in commit messages' one-liners
a discussion (no related file):
@aneessahib and @TejaswineeL: could you verify that this PR improves performance on your Nginx workload? The effect should be the same as in the original PR #1600. I will block until you ack/report the result.
Reviewed 1 of 1 files at r3, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion, "fixup! " found in commit messages' one-liners
Reviewable status: all files reviewed, 1 unresolved discussion, "fixup! " found in commit messages' one-liners (waiting on @aneessahib)
a discussion (no related file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
@aneessahib and @TejaswineeL: could you verify that this PR improves performance on your Nginx workload? The effect should be the same as in the original PR #1600. I will block until you ack/report the result.
@dimakuv
Yes, the changes give the expected performance results for Nginx (the same as with the thread-unsafe static-buffer approach). We are good to go with this change for Nginx.
Thanks
Reviewable status: all files reviewed, all discussions resolved, "fixup! " found in commit messages' one-liners
a discussion (no related file):
Previously, TejaswineeL (Tejaswinee) wrote…
@dimakuv
Yes, the changes give the expected performance results for Nginx (the same as with the thread-unsafe static-buffer approach). We are good to go with this change for Nginx. Thanks
@TejaswineeL Awesome, thanks for running the experiment!
This performance optimization uses a statically allocated buffer instead of allocating on the heap in the `sendfile()` syscall. To prevent data races on a single static buffer, this optimization is applied only to the first thread. This optimization is useful for sendfile-heavy workloads, e.g. Nginx in HTTP mode (and with config `sendfile=on`).
Co-authored-by: TejaswineeL <tejaswinee.ramdas.langhe@intel.com>
Signed-off-by: TejaswineeL <tejaswinee.ramdas.langhe@intel.com>
Signed-off-by: Dmitrii Kuvaiskii <dmitrii.kuvaiskii@intel.com>
7a8c86f
to
53a0178
Compare
Reviewed all commit messages.
Reviewable status: complete! all files reviewed, all discussions resolved
Description of the changes
This performance optimization uses a statically allocated buffer instead of allocating on the heap in the `sendfile()` syscall. To prevent data races on a single static buffer, this optimization is applied only to the first thread. This optimization is useful for sendfile-heavy workloads, e.g. Nginx in HTTP mode (and with config `sendfile=on`). The approach was suggested by @mkow privately.
Closes #1600. See there for preliminary discussions and approaches that don't work well.
Thanks to @aneessahib and @TejaswineeL for doing the initial exploration.
How to test this PR?
Use Nginx in HTTP mode. The `sendfile=on` option is already set in our CI-Examples config file.
Manual results below. Take them with a grain of salt, as I tested on my single developer machine without setting up a proper experiment environment. But they give an idea of how the optimization helps:
HOW I RAN
The above `benchmark-http.sh` script basically runs this:
wrk -c 300 -d 30 -t 256 -R 10000 http://127.0.0.1:8002/random/10K.1.html
WITHOUT PR
[wrk output not preserved in this copy]
WITH PR
[wrk output not preserved in this copy]
(Runs were done both without SGX (`make start-gramine-server`) and with SGX (`make start-gramine-server SGX=1`).)
CONCLUSIONS
We have roughly the same throughput, but the latency is improved by 15-25%. Not bad.