mlx5: Introduce data direct placement (DDP) over the DV API #1494

Merged
merged 4 commits into from
Nov 10, 2024


@yishaih commented Sep 3, 2024

This series introduces the data direct placement (DDP) functionality for the mlx5 provider over the DV API.

This feature allows WRs on the receiver side of the QP to be consumed out of order, permitting the sender side to transmit messages without guaranteeing arrival order on the receiver side.

When enabled, WR completions are still reported in order, regardless of the order in which Receive WRs are consumed.

An application that understands the new rules for processing CQEs opts in by setting a new DV flag at QP creation.

The relevant man pages were extended to describe the expected usage and the semantics for DDP.

Further details exist as part of the commit messages.

The matching kernel series has already been sent to rdma-next.

Yishai Hadas and others added 4 commits November 10, 2024 14:18
To commit: 8b36f7c3c661 ("RDMA/mlx5: Support OOO RX WQE consumption").

Signed-off-by: Yishai Hadas <yishaih@nvidia.com>

Based on the IB specification, the current code assumes that WQE
buffers are consumed and CQEs are generated in order.
In-order WQE consumption is not guaranteed, however, when the HW has
to handle out-of-order (OOO) packets: the HW may consume buffers OOO
while still generating CQEs in order.

When scatter2cqe is enabled, the data must be scattered to the correct
WQE buffer. The same applies to WR IDs: assuming incremental WQE
indexes leads to incorrect WR IDs being returned to users.

Therefore, use the WQE index field from the CQE to access the WQE,
instead of assuming that the WQE order matches the CQE order and
accessing the next WQE in the WQ incrementally.
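
A minimal, self-contained model of this change, for illustration only. All names (`struct rq_model`, `struct cqe_model`, `wrid_incremental()`, `wrid_from_cqe()`) are hypothetical and this is not the mlx5 provider code. It assumes, as the commit message describes, that each CQE reports the index of the WQE the HW actually consumed, and contrasts that lookup with the old incremental assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model, not the mlx5 provider code: a receive queue that
 * remembers the wr_id posted into each WQE slot. */
#define RQ_WQE_CNT 8u /* ring size */
static_assert((RQ_WQE_CNT & (RQ_WQE_CNT - 1)) == 0,
	      "ring size must be a power of two for index masking");

struct rq_model {
	uint64_t wrid[RQ_WQE_CNT]; /* wr_id of the WR posted at each index */
	uint32_t tail;             /* next slot under the in-order assumption */
};

/* Hypothetical CQE: reports the index of the WQE the HW actually used. */
struct cqe_model {
	uint16_t wqe_counter;
};

/* Old assumption: the n-th CQE belongs to the n-th WQE, so the driver
 * just advances a tail pointer. Wrong once HW consumes WQEs OOO. */
static uint64_t wrid_incremental(struct rq_model *rq)
{
	return rq->wrid[rq->tail++ & (RQ_WQE_CNT - 1)];
}

/* OOO-safe lookup: index the WQ with the WQE index carried by the CQE,
 * masked so that a free-running counter wraps around the ring. */
static uint64_t wrid_from_cqe(const struct rq_model *rq,
			      const struct cqe_model *cqe)
{
	return rq->wrid[cqe->wqe_counter & (RQ_WQE_CNT - 1)];
}
```

With `wrid[i] = 100 + i` and CQEs reporting WQE indexes 2, 0, 1, `wrid_from_cqe()` returns 102, 100, 101, whereas `wrid_incremental()` returns 100, 101, 102, i.e. the wrong WR ID for every one of these CQEs.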

This is a preparation patch for the WQE OOO mode, which is introduced
in the next patches of the series.

Signed-off-by: Edward Srouji <edwards@nvidia.com>
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>

In create_qp(), the RQ max_post value was redundantly overwritten after
already being calculated by mlx5_calc_rq_size().

The assumption that max_post equals wqe_cnt does not hold in all cases,
and it stops holding once the OOO RX support feature is introduced in
upcoming patches.

The line was removed, so the value calculated by mlx5_calc_rq_size()
is no longer overwritten.

Signed-off-by: Edward Srouji <edwards@nvidia.com>
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>

Add a new MLX5DV_QP_CREATE_OOO_DP flag, which allows WRs on the
receiver side of a QP to be consumed out of order (OOO).
It also permits the sender side to transmit messages without
guaranteeing arrival order on the receiver side.

When the flag is enabled, WR completions are still reported in order,
regardless of the order in which Receive WRs are consumed.
RDMA Read and RDMA Atomic operations on the responder side continue to
be executed in order, while the ordering of data placement for RDMA
Write and Send operations is not guaranteed.

The MLX5DV_QP_CREATE_OOO_DP flag must be set on both the sender and
receiver sides of a connection (e.g. on both the DCI and the DCT) for
the sender side to transmit messages without guaranteeing any arrival
ordering on the receiver side.

The feature is optional; the application must query its availability
with mlx5dv_query_device(), using the newly added
MLX5DV_CONTEXT_MASK_OOO_RECV_WRS flag.
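
The opt-in rules above (query the capability first, set the flag on both sides, expect creation to fail when unsupported) can be sketched as a small decision helper. The bit values and `choose_qp_create_flags()` are hypothetical stand-ins for this sketch; a real application would use `MLX5DV_CONTEXT_MASK_OOO_RECV_WRS` and `MLX5DV_QP_CREATE_OOO_DP` from the mlx5dv headers. How the two peers learn about each other's support (out-of-band exchange) is assumed to be handled by the application.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values for this sketch only; the real
 * MLX5DV_CONTEXT_MASK_OOO_RECV_WRS and MLX5DV_QP_CREATE_OOO_DP values
 * come from the mlx5dv headers. */
#define MODEL_CTX_MASK_OOO_RECV_WRS (1ull << 0)
#define MODEL_QP_CREATE_OOO_DP      (1u << 0)

/* Decide the DV create flags for one side of a connection.
 * queried_comp_mask: capability mask returned by the local device query.
 * peer_supports_ooo: learned out of band from the other side.
 * OOO is requested only when both sides support it, so both sides reach
 * the same decision and the flag ends up set on both or on neither. */
static uint32_t choose_qp_create_flags(uint64_t queried_comp_mask,
				       bool peer_supports_ooo)
{
	bool local_supports_ooo =
		queried_comp_mask & MODEL_CTX_MASK_OOO_RECV_WRS;

	if (local_supports_ooo && peer_supports_ooo)
		return MODEL_QP_CREATE_OOO_DP;
	return 0; /* regular in-order receive semantics */
}
```

Falling back to in-order semantics when either side lacks support is a design choice of this sketch; the series itself only states that QP creation fails when the flag is requested on an unsupported kernel or device.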

Although enabling OOO on the QP is relevant at the INIT to RTR
modification stage, the user passes the flag at QP creation time.
This is needed because, internally, when OOO is enabled on a QP whose RQ
is a cyclic buffer (e.g. RC, UC, UD), the RQ buffer is allocated at
double the user-requested size. This prevents WQEs from being
overwritten when they are consumed OOO.
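
A hypothetical sizing model illustrating the doubling described above, and why the earlier commit in this series stopped overwriting max_post with wqe_cnt. `calc_rq_size_model()` is not `mlx5_calc_rq_size()`, and rounding the request up to a power of two is an assumption of this sketch; the point is only that, with OOO enabled on a cyclic RQ, the allocated slot count is twice what the application may post.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical RQ sizing model, not mlx5_calc_rq_size(). */
struct rq_size_model {
	uint32_t wqe_cnt;  /* WQE slots actually allocated */
	uint32_t max_post; /* WRs the application may have outstanding */
};

/* Smallest power of two >= v (v must not exceed 2^31). */
static uint32_t roundup_pow2(uint32_t v)
{
	uint32_t r = 1;

	while (r < v)
		r <<= 1;
	return r;
}

/* For a cyclic RQ with OOO enabled, allocate twice as many slots as the
 * application may post, so OOO consumption cannot overwrite WQEs.
 * max_post stays tied to the request; resetting it to wqe_cnt afterwards
 * would let the application post into the extra headroom. */
static struct rq_size_model calc_rq_size_model(uint32_t requested, bool ooo)
{
	struct rq_size_model s;

	assert(requested > 0 && requested <= (1u << 30));
	s.max_post = roundup_pow2(requested);
	s.wqe_cnt = ooo ? 2 * s.max_post : s.max_post;
	return s;
}
```

For a request of 100 WRs this model gives `max_post == 128` and `wqe_cnt == 256` with OOO enabled, and 128 for both without it.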

If the kernel or the device does not support this feature, creating the
QP with this flag fails.

Signed-off-by: Edward Srouji <edwards@nvidia.com>
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
@yishaih merged commit e3286dd into linux-rdma:master on Nov 10, 2024
14 checks passed