Rework the pipeline: batch allocation + deterministic backoff #1769

Draft
wants to merge 18 commits into main

Conversation

Contributor

@wyfo wyfo commented Feb 12, 2025

See more details in #1769 (comment)

@wyfo wyfo added the internal Changes not included in the changelog label Feb 12, 2025
Contributor Author

wyfo commented Feb 25, 2025

The performance regression on Linux has been fixed.
The random backoff has been replaced by a deterministic batching mode: when more than one batch is in use at the same time (for a given priority), it means the tx task is not delivering messages fast enough, so the pipeline enters batching mode, where it does not notify the tx task when messages are written to the current batch.
On the tx side, when batching mode is detected, it backs off until a deadline counted from the latest successful pull (or until a batch is pushed, of course); if the deadline is reached, batching mode is stopped manually (the next notification will be triggered on the pipeline side). If there is no backoff (or the deadline was reached), the current buffer is retrieved, or a small backoff is applied if it is not available; in that case a notification should arrive soon anyway. A minimal sketch of this tx-side logic is given below.
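To make the scheme concrete, here is a minimal Rust sketch of the tx-side pull described above. All names (`Pipeline`, `try_pull_pushed`, `try_take_current`, `BACKOFF_DEADLINE`, `SMALL_BACKOFF`) and the stubbed bodies are illustrative assumptions, not the actual zenoh code:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::time::{Duration, Instant};

/// Hypothetical view of the pipeline state shared with the tx task.
struct Pipeline {
    /// Set by the write side when more than one batch is in flight.
    batching: AtomicBool,
    // ... queues, notifier, etc.
}

const BACKOFF_DEADLINE: Duration = Duration::from_micros(100); // arbitrary for the sketch
const SMALL_BACKOFF: Duration = Duration::from_micros(1);

impl Pipeline {
    /// Pop a fully pushed batch, if any (stubbed out here).
    fn try_pull_pushed(&self) -> Option<Vec<u8>> { None }
    /// Take the batch currently being written, unless the write side holds it (stubbed out here).
    fn try_take_current(&self) -> Option<Vec<u8>> { None }

    /// tx-side pull following the backoff scheme described in the comment.
    fn pull(&self, last_pull: &mut Instant) -> Option<Vec<u8>> {
        // A pushed batch always wins and resets the deadline.
        if let Some(batch) = self.try_pull_pushed() {
            *last_pull = Instant::now();
            return Some(batch);
        }
        if self.batching.load(Ordering::Acquire) {
            // Batching mode: back off until the deadline, counted from the
            // latest successful pull, unless a batch gets pushed meanwhile.
            while last_pull.elapsed() < BACKOFF_DEADLINE {
                std::thread::sleep(SMALL_BACKOFF);
                if let Some(batch) = self.try_pull_pushed() {
                    *last_pull = Instant::now();
                    return Some(batch);
                }
            }
            // Deadline reached: stop batching mode so the write side notifies
            // again on the next message.
            self.batching.store(false, Ordering::Release);
        }
        // No backoff (or deadline reached): grab the current buffer, or apply a
        // small backoff if the write side is holding it; a notification should
        // arrive soon in that case.
        self.try_take_current().or_else(|| {
            std::thread::sleep(SMALL_BACKOFF);
            self.try_take_current()
        })
    }
}
```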

This PR gives the same throughput as the main branch on Linux, around 2.7 Mmsg/s on my Linux setup. On my Mac, both branches are around 6 Mmsg/s. Latency at low throughput is untouched. The CPU consumed by the tx task seems to have improved, which was expected because batching is done more deterministically and systematically.
Regarding memory consumption, batches are allocated with a size that depends on the current workload: if the last pushed batch was small enough, the small size (an arbitrary constant of 2 KiB for now) is used for the next one; otherwise, the max size is used.
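As an illustration, the size heuristic boils down to something like the sketch below; `SMALL_BATCH_SIZE` matches the constant shown in the diff further down, while the `MAX_BATCH_SIZE` value and the function name are assumptions:

```rust
type BatchSize = u16;

const SMALL_BATCH_SIZE: BatchSize = 1 << 11;    // 2 KiB, arbitrary for now
const MAX_BATCH_SIZE: BatchSize = BatchSize::MAX; // assumed upper bound of a batch

/// Pick the allocation size of the next batch from the size actually used by
/// the last pushed one.
fn next_batch_size(last_pushed_len: BatchSize) -> BatchSize {
    if last_pushed_len <= SMALL_BATCH_SIZE {
        SMALL_BATCH_SIZE
    } else {
        MAX_BATCH_SIZE
    }
}
```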

The current version of the PR embeds a batch-reuse mechanism: a single batch is kept in memory, and if it is not available (meaning it is currently owned by the tx task), a new one is allocated on demand, respecting the limit fixed by the configuration. The batch is refilled by the tx task after use, and the next one is dropped if the slot has already been refilled; see the sketch below.
However, the impact of the allocations on performance is quite negligible, as reported in the flamegraphs below: when refilling is commented out, malloc in get_batch accounts for only 0.01% of the CPU, and free in refill for 0.1%. As expected from these numbers, not using refilling has no impact on performance.
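For reference, here is a minimal sketch of that reuse mechanism; only the `get_batch`/`refill` names come from the PR description, while `BatchPool`, the `Vec<u8>` stand-in for a batch, and the omission of the configured allocation limit are illustrative simplifications:

```rust
use std::sync::Mutex;

struct BatchPool {
    /// The single cached batch; in the real pipeline an allocation counter
    /// would also enforce the limit fixed by the configuration.
    slot: Mutex<Option<Vec<u8>>>,
}

impl BatchPool {
    /// Take the cached batch, or allocate a new one on demand if the tx task
    /// currently owns it.
    fn get_batch(&self, size: usize) -> Vec<u8> {
        match self.slot.lock().unwrap().take() {
            Some(batch) => batch,
            None => Vec::with_capacity(size),
        }
    }

    /// Called by the tx task after use: put the batch back, or drop it if the
    /// slot has already been refilled.
    fn refill(&self, mut batch: Vec<u8>) {
        batch.clear();
        let mut slot = self.slot.lock().unwrap();
        if slot.is_none() {
            *slot = Some(batch);
        }
    }
}
```

A single slot keeps the common steady-state case allocation-free while bounding the amount of memory retained when the workload drops.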

Here are the flamegraphs of z_pub_thr:

  • this PR: batch_alloc
  • this PR (without batch refill, so only allocations): batch_alloc-no-refill
  • main: main

@wyfo wyfo changed the title feat: allocate tx batch on demand Rework the pipeline: batch allocation + deterministic backoff Feb 26, 2025
@wyfo wyfo requested a review from yellowhatter February 26, 2025 12:04
deadline: None,
wait_time,
impl StageIn {
const SMALL_BATCH_SIZE: BatchSize = 1 << 11;
Contributor Author

This constant is of course arbitrary. It can be exposed in configuration if needed.

@wyfo wyfo force-pushed the feat/batch_allocation branch from fd29946 to 524cf7b on February 26, 2025 18:49
It breaks the unixpipe test.