Rework the pipeline: batch allocation + deterministic backoff #1769

Draft
wants to merge 18 commits into main

Conversation

Contributor

@wyfo wyfo commented Feb 12, 2025

See more details in #1769 (comment)

@wyfo wyfo added the internal Changes not included in the changelog label Feb 12, 2025
Contributor Author

wyfo commented Feb 25, 2025

The performance regression on Linux has been fixed.
The random backoff has been replaced by a deterministic batching mode: when more than one batch is in use at the same time (for a given priority), it means the tx task is not delivering messages fast enough, so the pipeline enters batching mode, where it does not notify the tx task when messages are written to the current batch.
On the tx side, when batching mode is detected, it backs off until a deadline counted from the latest successful pull (or until a batch is pushed, of course); if the deadline is reached, batching mode is stopped manually (the next notification will be triggered on the pipeline side). If there is no backoff (or the deadline was reached), the current buffer is retrieved, or a small backoff is applied if it is not available; in that case a notification should arrive soon anyway. A minimal sketch of this tx-side logic is given below.
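To make the scheme concrete, here is a minimal Rust sketch of the tx-side pull described above. All names (`Pipeline`, `try_pull_pushed`, `try_take_current`, `BACKOFF_DEADLINE`, `SMALL_BACKOFF`) and the stubbed bodies are illustrative assumptions, not the actual zenoh code:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::time::{Duration, Instant};

/// Hypothetical view of the pipeline state shared with the tx task.
struct Pipeline {
    /// Set by the write side when more than one batch is in flight.
    batching: AtomicBool,
    // ... queues, notifier, etc.
}

const BACKOFF_DEADLINE: Duration = Duration::from_micros(100); // arbitrary for the sketch
const SMALL_BACKOFF: Duration = Duration::from_micros(1);

impl Pipeline {
    /// Pop a fully pushed batch, if any (stubbed out here).
    fn try_pull_pushed(&self) -> Option<Vec<u8>> { None }
    /// Take the batch currently being written, unless the write side holds it (stubbed out here).
    fn try_take_current(&self) -> Option<Vec<u8>> { None }

    /// tx-side pull following the backoff scheme described in the comment.
    fn pull(&self, last_pull: &mut Instant) -> Option<Vec<u8>> {
        // A pushed batch always wins and resets the deadline.
        if let Some(batch) = self.try_pull_pushed() {
            *last_pull = Instant::now();
            return Some(batch);
        }
        if self.batching.load(Ordering::Acquire) {
            // Batching mode: back off until the deadline, counted from the
            // latest successful pull, unless a batch gets pushed meanwhile.
            while last_pull.elapsed() < BACKOFF_DEADLINE {
                std::thread::sleep(SMALL_BACKOFF);
                if let Some(batch) = self.try_pull_pushed() {
                    *last_pull = Instant::now();
                    return Some(batch);
                }
            }
            // Deadline reached: stop batching mode so the write side notifies
            // again on the next message.
            self.batching.store(false, Ordering::Release);
        }
        // No backoff (or deadline reached): grab the current buffer, or apply a
        // small backoff if the write side is holding it; a notification should
        // arrive soon in that case.
        self.try_take_current().or_else(|| {
            std::thread::sleep(SMALL_BACKOFF);
            self.try_take_current()
        })
    }
}
```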

This PR gives the same throughput as the main branch on Linux, around 2.7 Mmsg/s on my Linux setup. On my Mac, both branches are around 6 Mmsg/s. Latency at low throughput is untouched. The CPU consumed by the tx task seems to have improved, which was expected because batching is done more deterministically and systematically.
Regarding memory consumption, batches are allocated with a size that depends on the current workload: if the last pushed batch was small enough, the small size (an arbitrary constant of 2 KiB for now) is used for the next one; otherwise, the max size is used.
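As an illustration, the size heuristic boils down to something like the sketch below; `SMALL_BATCH_SIZE` matches the constant shown in the diff further down, while the `MAX_BATCH_SIZE` value and the function name are assumptions:

```rust
type BatchSize = u16;

const SMALL_BATCH_SIZE: BatchSize = 1 << 11;    // 2 KiB, arbitrary for now
const MAX_BATCH_SIZE: BatchSize = BatchSize::MAX; // assumed upper bound of a batch

/// Pick the allocation size of the next batch from the size actually used by
/// the last pushed one.
fn next_batch_size(last_pushed_len: BatchSize) -> BatchSize {
    if last_pushed_len <= SMALL_BATCH_SIZE {
        SMALL_BATCH_SIZE
    } else {
        MAX_BATCH_SIZE
    }
}
```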

The current version of the PR embeds a batch-reuse mechanism: a single batch is kept in memory, and if it is not available (meaning it is currently owned by the tx task), a new one is allocated on demand, respecting the limit fixed by the configuration. The batch is refilled by the tx task after use, and the next one is dropped if the slot has already been refilled; see the sketch below.
However, the impact of the allocations on performance is quite negligible, as reported in the flamegraphs below: when refilling is commented out, malloc in get_batch accounts for only 0.01% of the CPU, and free in refill for 0.1%. As expected from these numbers, not using refilling has no impact on performance.
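For reference, here is a minimal sketch of that reuse mechanism; only the `get_batch`/`refill` names come from the PR description, while `BatchPool`, the `Vec<u8>` stand-in for a batch, and the omission of the configured allocation limit are illustrative simplifications:

```rust
use std::sync::Mutex;

struct BatchPool {
    /// The single cached batch; in the real pipeline an allocation counter
    /// would also enforce the limit fixed by the configuration.
    slot: Mutex<Option<Vec<u8>>>,
}

impl BatchPool {
    /// Take the cached batch, or allocate a new one on demand if the tx task
    /// currently owns it.
    fn get_batch(&self, size: usize) -> Vec<u8> {
        match self.slot.lock().unwrap().take() {
            Some(batch) => batch,
            None => Vec::with_capacity(size),
        }
    }

    /// Called by the tx task after use: put the batch back, or drop it if the
    /// slot has already been refilled.
    fn refill(&self, mut batch: Vec<u8>) {
        batch.clear();
        let mut slot = self.slot.lock().unwrap();
        if slot.is_none() {
            *slot = Some(batch);
        }
    }
}
```

A single slot keeps the common steady-state case allocation-free while bounding the amount of memory retained when the workload drops.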

Here are the flamegraphs of z_pub_thr:

  • this PR: batch_alloc
  • this PR (without batch refill, so only allocations): batch_alloc-no-refill
  • main: main

@wyfo wyfo changed the title feat: allocate tx batch on demand Rework the pipeline: batch allocation + deterministic backoff Feb 26, 2025
@wyfo wyfo requested a review from yellowhatter February 26, 2025 12:04
deadline: None,
wait_time,
impl StageIn {
const SMALL_BATCH_SIZE: BatchSize = 1 << 11;
Contributor Author

This constant is of course arbitrary. It can be exposed in configuration if needed.

@wyfo wyfo force-pushed the feat/batch_allocation branch from fd29946 to 524cf7b on February 26, 2025 18:49
It breaks the unixpipe test.