Kernel Mode RDMA Ping Module
Steve Wise - 8/2009
---
Updated 8/2016
---
============
Introduction
============
The krping module is a kernel loadable module that utilizes the Open
Fabrics verbs to implement a client/server ping/pong program. The module
was implemented as a test vehicle for working with the iwarp branch of
the OFA project.
The goals of this program include:
- Simple harness to test kernel-mode verbs: connection setup, send,
recv, rdma read, rdma write, and completion notifications.
- Client/server model.
- IP addressing used to identify remote peer.
- Transport independent, using the RDMA CMA service.
- No user-space application needed.
- Just a test utility...nothing more.
This module allows establishing connections and running ping/pong tests
via a /proc entry called /proc/krping. This simple mechanism allows
starting many kernel threads concurrently and avoids the need for a user
space application.
The krping module is designed to utilize all the major DTO operations:
send, recv, rdma read, and rdma write. Its goal is to test the API, so
it is not necessarily an efficient test. Once the connection
is established, the client and server begin a ping/pong loop:
Client                                  Server
---------------------------------------------------------------------
SEND(ping source buffer rkey/addr/len)

                                        RECV Completion with ping source info
                                        RDMA READ from client source MR
                                        RDMA Read completion
                                        SEND "go ahead" to client

RECV Completion of "go ahead"
SEND(ping sink buffer rkey/addr/len)

                                        RECV Completion with ping sink info
                                        RDMA Write to client sink MR
                                        RDMA Write completion
                                        SEND "go ahead" to client

RECV Completion of "go ahead"
Validate data in source and sink buffers

<repeat the above loop>
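The loop above can be modelled in user space to make the message order
and data flow concrete. The following Python sketch is only an
illustration: it uses plain byte arrays instead of registered memory,
and the names ping_pong_iteration, client_source, client_sink and
server_buf are hypothetical, not taken from the krping source.

```python
def ping_pong_iteration(client_source: bytes):
    """Model one iteration of the loop; return (trace, client_sink)."""
    trace = []
    size = len(client_source)
    client_sink = bytearray(size)   # client buffer the server writes into
    server_buf = bytearray(size)    # server buffer used for read and write

    # Client advertises its source buffer (rkey/addr/len in the module).
    trace.append("client: SEND source buffer info")
    trace.append("server: RECV completion with source info")

    # Server pulls the ping data from the client's source buffer.
    server_buf[:] = client_source
    trace.append("server: RDMA READ from client source")
    trace.append("server: SEND go ahead")
    trace.append("client: RECV completion of go ahead")

    # Client advertises its sink buffer.
    trace.append("client: SEND sink buffer info")
    trace.append("server: RECV completion with sink info")

    # Server pushes the same data back into the client's sink buffer.
    client_sink[:] = server_buf
    trace.append("server: RDMA WRITE to client sink")
    trace.append("server: SEND go ahead")
    trace.append("client: RECV completion of go ahead")

    # Client compares source and sink to detect corruption.
    if bytes(client_sink) != client_source:
        raise ValueError("data mismatch between source and sink buffers")
    trace.append("client: validate source and sink buffers")
    return trace, bytes(client_sink)


if __name__ == "__main__":
    steps, _ = ping_pong_iteration(b"krping test data")
    for step in steps:
        print(step)
```

Each iteration moves the ping data across the wire twice: once by RDMA
READ into the server and once by RDMA WRITE back to the client.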
============
To build/install the krping module
============
# git clone git://git.openfabrics.org/~swise/krping
# cd krping
<edit Makefile and set KSRC accordingly>
# make && make install
# modprobe rdma_krping
============
Using Krping
============
Communication from user space is done via the /proc filesystem.
Krping exports file /proc/krping. Writing commands in ascii format to
/proc/krping will start krping threads in the kernel. The thread issuing
the write to /proc/krping is used to run the krping test, so it will
block until the test completes, or until the user interrupts the write.
Here is a simple example to start a krping test using the rdma_krping
module. The server's address is 192.168.69.127. The client will
connect to this address at port 9999 and issue 100 ping/pong messages.
This example assumes you have two systems connected via IB and the
IPoverIB devices are configured on the 192.168.69/24 subnet accordingly.
Server:
# modprobe rdma_krping
# echo "server,addr=192.168.69.127,port=9999" >/proc/krping
The echo command above will block until the krping test completes,
or the user hits ctrl-c.
On the client:
# modprobe rdma_krping
# echo "client,addr=192.168.69.127,port=9999,count=100" >/proc/krping
Just like on the server, the echo command above will block until the
krping test completes, or the user hits ctrl-c.
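The same commands can be issued from a script instead of echo. The
Python sketch below is an assumption-laden illustration, not part of
krping: build_command and run_krping are hypothetical helper names. It
appends a trailing newline so that the write matches what echo sends.

```python
def build_command(role, addr, port, count=None, flags=()):
    """Build a comma-separated krping command string."""
    if role not in ("client", "server"):
        raise ValueError("role must be 'client' or 'server'")
    parts = [role, f"addr={addr}", f"port={port}"]
    if count is not None:
        parts.append(f"count={count}")
    parts.extend(flags)  # keyword-only options such as "validate"
    return ",".join(parts)


def run_krping(command, path="/proc/krping"):
    """Write the command to the proc file.

    With the module loaded, this write blocks until the test completes
    or is interrupted, just like the echo commands above.
    """
    with open(path, "w") as f:
        f.write(command + "\n")  # same bytes that echo would write


if __name__ == "__main__":
    # Only print the command here; call run_krping() to start a test.
    print(build_command("client", "192.168.69.127", 9999, count=100))
```

Because the writing thread runs the test, a script that wants several
concurrent tests would need to call run_krping from separate threads
or processes.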
The syntax for krping commands is a string of options separated by commas.
Options can be single keywords, or in the form: option=operand.
Operands can be integers or strings.
Note you must specify the _same_ options on both sides. For instance,
if you want to use the server_inv option, then you must specify
it on both the server and client command lines.
Opcode          Operand Type    Description
------------------------------------------------------------------------
client          none            Initiate a client side krping thread.
server          none            Initiate a server side krping thread.
addr            string          The server's IP address in dotted
                                decimal format. Note the server can
                                use 0.0.0.0 to bind to all devices.
port            integer         The server's port number in host byte
                                order.
count           integer         The number of rping iterations to
                                perform before shutting down the test.
                                If unspecified, the count is infinite.
size            integer         The size of the rping data. Default for
                                rping is 65 bytes.
verbose         none            Enables printk()s that dump the rping
                                data. Use with caution!
validate        none            Enables validating the rping data on
                                each iteration to detect data
                                corruption.
mem_mode        string          Determines how memory will be
                                registered. Modes include dma,
                                and reg. Default is dma.
server_inv      none            Valid only in reg mr mode, this
                                option enables invalidating the
                                client's reg mr via
                                SEND_WITH_INVALIDATE messages from
                                the server.
local_dma_lkey  none            Use the local dma lkey for the source
                                of writes and sends, and in recvs.
read_inv        none            Server will use READ_WITH_INV. Only
                                valid in reg mem_mode.
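To make the command syntax concrete, here is a minimal user-space
sketch of a parser for these option strings. It is not the module's
parser: parse_command and OPTION_TYPES are hypothetical names, and the
option list is simply copied from the table above.

```python
# Option names and operand types, as listed in the table above.
# None means the option is a bare keyword with no operand.
OPTION_TYPES = {
    "client": None, "server": None,
    "addr": str, "port": int, "count": int, "size": int,
    "verbose": None, "validate": None,
    "mem_mode": str, "server_inv": None,
    "local_dma_lkey": None, "read_inv": None,
}


def parse_command(command):
    """Split a krping command string into a dict of options."""
    opts = {}
    for token in command.strip().split(","):
        name, sep, operand = token.partition("=")
        if name not in OPTION_TYPES:
            raise ValueError(f"unknown option: {name!r}")
        kind = OPTION_TYPES[name]
        if kind is None:
            if sep:
                raise ValueError(f"option {name!r} takes no operand")
            opts[name] = True
        else:
            if not sep or operand == "":
                raise ValueError(f"option {name!r} needs an operand")
            opts[name] = kind(operand)  # int() rejects non-numeric text
    return opts


if __name__ == "__main__":
    print(parse_command("client,addr=192.168.69.127,port=9999,count=100"))
```

Parsing both command lines this way also gives a simple means to check
that the same options were given on the server and the client.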
============
Memory Usage:
============
The krping client uses 4 memory areas:
start_buf - the source of the ping data. This buffer is advertised to
the server at the start of each iteration, and the server rdma reads
the ping data from this buffer over the wire.
rdma_buf - the sink of the ping data. This buffer is advertised to the
server each iteration, and the server rdma writes the ping data that it
read from the start buffer into this buffer. The start_buf and rdma_buf
contents are then compared if the krping validate option is specified.
recv_buf - used to recv "go ahead" SEND from the server.
send_buf - used to advertise the rdma buffers to the server via SEND
messages.
The krping server uses 3 memory areas:
rdma_buf - used as the sink of the RDMA READ to pull the ping data
from the client, and then used as the source of an RDMA WRITE to
push the ping data back to the client.
recv_buf - used to receive rdma rkey/addr/length advertisements from
the client.
send_buf - used to send "go ahead" SEND messages to the client.
============
Memory Registration Modes:
============
Each of these memory areas is registered with the RDMA device using
whatever memory mode was specified in the command line. The mem_mode
values include: dma, and reg (aka fastreg). The default mode, if not
specified, is dma.
The dma mem_mode uses a single dma_mr for all memory buffers.
The reg mem_mode uses a reg mr on the client side for the
start_buf and rdma_buf buffers. Each time the client advertises
one of these buffers, it invalidates the previous registration and fast
registers the new buffer with a new key. If the server_inv
option is on, then the server will do the invalidation via the "go ahead"
messages using the IB_WR_SEND_WITH_INV opcode. Otherwise the client
invalidates the mr using the IB_WR_LOCAL_INV work request.
On the server side, reg mem_mode causes the server to use the
reg_mr rkey for its rdma_buf buffer IO. Before each rdma read and
rdma write, the server will post an IB_WR_LOCAL_INV + IB_WR_REG_MR
WR chain to register the buffer with a new key. If the krping read_inv
option is set then the server will use IB_WR_RDMA_READ_WITH_INV to do the
rdma read and skip the IB_WR_LOCAL_INV wr before re-registering the
buffer for the subsequent rdma write operation.
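The server-side behaviour in reg mem_mode can be summarised as a small
table of work request chains. The Python sketch below only restates
the paragraph above and may not match the module source in every
detail; server_wr_chain is a hypothetical name, and the opcode strings
follow the kernel's enum ib_wr_opcode spellings.

```python
def server_wr_chain(operation, read_inv=False):
    """Return the WRs the server posts for one rdma operation.

    operation is "read" or "write"; read_inv models the read_inv option.
    """
    if operation == "read":
        # Re-register the buffer, then read. With read_inv the read
        # itself invalidates the registration when it completes.
        read_op = "IB_WR_RDMA_READ_WITH_INV" if read_inv else "IB_WR_RDMA_READ"
        return ["IB_WR_LOCAL_INV", "IB_WR_REG_MR", read_op]
    if operation == "write":
        # After a READ_WITH_INV the MR is already invalid, so the
        # explicit local invalidate is skipped before re-registering.
        chain = [] if read_inv else ["IB_WR_LOCAL_INV"]
        return chain + ["IB_WR_REG_MR", "IB_WR_RDMA_WRITE"]
    raise ValueError("operation must be 'read' or 'write'")


if __name__ == "__main__":
    for inv in (False, True):
        for op in ("read", "write"):
            print(f"read_inv={inv} {op}: {server_wr_chain(op, inv)}")
```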
============
Stats
============
While krping threads are executing, you can obtain statistics on the
thread by reading from the /proc/krping file. If you cat /proc/krping,
you will dump IO statistics for each running krping thread. The format
is one thread per line, and each line contains the following stats
separated by whitespace:
Statistic               Description
---------------------------------------------------------------------
Name                    krping thread number and device being used.
Send Bytes              Number of bytes transferred in SEND WRs.
Send Messages           Number of SEND WRs posted.
Recv Bytes              Number of bytes received via RECV completions.
Recv Messages           Number of RECV WRs completed.
RDMA WRITE Bytes        Number of bytes transferred in RDMA WRITE WRs.
RDMA WRITE Messages     Number of RDMA WRITE WRs posted.
RDMA READ Bytes         Number of bytes transferred via RDMA READ WRs.
RDMA READ Messages      Number of RDMA READ WRs posted.
Here is an example of the server side output for 5 krping threads:
# cat /proc/krping
1-amso0 0 0 16 1 12583960576 192016 0 0
2-mthca0 0 0 16 1 60108570624 917184 0 0
3-mthca0 0 0 16 1 59106131968 901888 0 0
4-mthca1 0 0 16 1 101658394624 1551184 0 0
5-mthca1 0 0 16 1 100201922560 1528960 0 0
#
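Output in this format is easy to post-process. The following Python
sketch, offered only as an illustration, splits each line into the nine
fields listed in the table above. ThreadStats and parse_stats are
hypothetical names, and the sketch assumes the fields appear in the
table's order.

```python
from typing import NamedTuple


class ThreadStats(NamedTuple):
    name: str
    send_bytes: int
    send_msgs: int
    recv_bytes: int
    recv_msgs: int
    write_bytes: int
    write_msgs: int
    read_bytes: int
    read_msgs: int


def parse_stats(text):
    """Parse the contents of /proc/krping into ThreadStats records."""
    stats = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 9:
            continue  # skip blank lines and stray shell prompts
        stats.append(ThreadStats(fields[0], *map(int, fields[1:])))
    return stats


if __name__ == "__main__":
    sample = "1-amso0 0 0 16 1 12583960576 192016 0 0\n"
    for s in parse_stats(sample):
        print(s.name, s.write_bytes, s.write_msgs)
```

On a system with the module loaded, the input would come from reading
/proc/krping, for example parse_stats(open("/proc/krping").read()).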
============
EXPERIMENTAL
============
There are other options that enable micro benchmarks to measure
the kernel rdma performance. These include:
Opcode          Operand Type    Description
------------------------------------------------------------------------
wlat            none            Write latency test.
rlat            none            Read latency test.
poll            none            Enable polling vs blocking for rlat.
bw              none            Write throughput test.
duplex          none            Valid only with bw, this
                                enables bidirectional mode.
tx-depth        integer         Set the sq depth for bw tests.
See the awkit* files to take the data logged in the kernel log
and compute RTT/2 or Gbps results.
Use these at your own risk.
END-OF-FILE