Skip to content

Fix nagle, other minor optimizations #8222

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

rustyrussell
Copy link
Contributor

I noticed we were not going as fast as we should, in a 100-payments test. Turns out our Nagle-suppressing code was wrong!

Then I continued with minor optimization until our profiling basically only contained sqlite operations.

@rustyrussell rustyrussell added this to the v25.05 milestone Apr 9, 2025
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Our CORK logic was wrong, and it's better to use Nagle anyway:
we disable Nagle just before sending timing-critical messages.

Time for 100 (failed) payments:

Before:
	148.8573575

After:
	10.7356977

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Changelog-Fixed: Protocol: Removed 200ms latency from sending commit/revoke messages.
To run, use:
	VALGRIND=0 TEST_BENCH=1 pytest tests/test_connection.py::test_bench

Example of running on my laptop (without --enable-debugbuild):

	FAILED tests/test_connection.py::test_bench - assert 53.11918330192566 == 0

You can also run perf on l1 once it's running:

	perf record --call-graph dwarf -q -p $(cat /tmp/ltests-*/test_bench_1/lightning-1/lightningd-regtest.pid)

Then ^C after 10 seconds and run "perf report".

Things which stand out:

1. Tracing in db_exec_prepared_v2:

   31.12%     0.04%  lightningd  lightningd            [.] db_exec_prepared_v2
   - 31.08% db_exec_prepared_v2
      + 22.96% db_sqlite3_exec
      + 4.46% trace_span_end
      + 1.77% trace_span_start
      + 1.11% trace_span_tag
      + 0.72% tal_free

2. Logging:

   - 16.03% logv
      - 8.15% maybe_print
         - log_to_files
            + 4.51% __GI__IO_fflush (inlined)
            + 1.97% tal_fmt_
            + 0.51% __GI___strftime_l (inlined)

3. Notification (when nothing is listening) in notify_log:

      - 6.84% maybe_notify_log
         - notify_log
            + 3.37% notify_send
            + 1.75% notify_start
            + 1.71% log_notification_serialize
        0.56% new_log_entry

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
…hecking.

Before:
92.26076173782349
91.9576666355133
90.92732524871826

After:
90.5989830493927
88.10309219360352
90.07689118385315
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If nobody is subscribed, have notify_start return NULL and the caller
can skip serialization.  This is particularly useful for the "log"
notification which can get called a lot.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This means we don't have to iterate through all plugins, making
our "do we even have to construct this notification" optimization
much more efficient.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This speeds logging slightly, but most of the time is actually
spent in fflush() (which we need).

Result:
	FAILED tests/test_connection.py::test_bench - assert 51.54966354370117 == 0

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant