slow sctp performance on arm64 #1815

Open
nshopik opened this issue Dec 21, 2024 · 2 comments

Comments

@nshopik

nshopik commented Dec 21, 2024

Context

  • Version of iperf3: 3.12

  • Hardware: Arm Ampere, 2 cores, 4 GB memory

  • Operating system (and distribution, if any): Linux 6.1.0-27-arm64 #1 SMP Debian 6.1.115-1 (2024-11-01) aarch64

Bug Report

  • Expected Behavior

SCTP throughput roughly similar to what we see on x86.

  • Actual Behavior
    sctp test
iperf 3.12
Linux goro 6.1.0-27-arm64 #1 SMP Debian 6.1.115-1 (2024-11-01) aarch64
Control connection MSS 32768
Time: Sat, 21 Dec 2024 20:44:37 GMT
Connecting to host 127.0.0.1, port 5201
      Cookie: rp56igphb6uclgiy73ijfa2ymnn6a465xotj
[  5] local 127.0.0.1 port 32984 connected to 127.0.0.1 port 5201
Starting Test: protocol: SCTP, 1 streams, 65536 byte blocks, omitting 0 seconds, 10 second test, tos 0
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  6.62 MBytes  55.6 Mbits/sec
[  5]   1.00-2.00   sec   448 KBytes  3.67 Mbits/sec
[  5]   2.00-3.00   sec   640 KBytes  5.24 Mbits/sec
[  5]   3.00-4.00   sec   576 KBytes  4.72 Mbits/sec
[  5]   4.00-5.00   sec   768 KBytes  6.29 Mbits/sec
[  5]   5.00-6.00   sec   384 KBytes  3.15 Mbits/sec
[  5]   6.00-7.00   sec   576 KBytes  4.72 Mbits/sec
[  5]   7.00-8.00   sec   448 KBytes  3.67 Mbits/sec
[  5]   8.00-9.00   sec  2.00 MBytes  16.8 Mbits/sec
[  5]   9.00-10.00  sec  1.56 MBytes  13.1 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
Test Complete. Summary Results:
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.00  sec  13.9 MBytes  11.7 Mbits/sec                  sender
[  5]   0.00-10.00  sec  13.9 MBytes  11.6 Mbits/sec                  receiver
CPU Utilization: local/sender 0.2% (0.0%u/0.2%s), remote/receiver 0.1% (0.1%u/0.1%s)

tcp test

Linux goro 6.1.0-27-arm64 #1 SMP Debian 6.1.115-1 (2024-11-01) aarch64
Control connection MSS 32768
Time: Sat, 21 Dec 2024 20:44:51 GMT
Connecting to host 127.0.0.1, port 5201
      Cookie: ri4uc63tpz74m2zoipzcjv3d323jjh4lcill
      TCP MSS: 32768 (default)
[  5] local 127.0.0.1 port 40698 connected to 127.0.0.1 port 5201
Starting Test: protocol: TCP, 1 streams, 131072 byte blocks, omitting 0 seconds, 10 second test, tos 0
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  4.18 GBytes  35.9 Gbits/sec    0   2.37 MBytes
[  5]   1.00-2.00   sec  4.29 GBytes  36.9 Gbits/sec    0   2.50 MBytes
[  5]   2.00-3.00   sec  4.27 GBytes  36.7 Gbits/sec    0   2.62 MBytes
[  5]   3.00-4.00   sec  4.21 GBytes  36.2 Gbits/sec    0   2.87 MBytes
[  5]   4.00-5.00   sec  4.30 GBytes  37.0 Gbits/sec    0   2.87 MBytes
[  5]   5.00-6.00   sec  4.31 GBytes  37.0 Gbits/sec    0   2.87 MBytes
[  5]   6.00-7.00   sec  4.15 GBytes  35.7 Gbits/sec    0   4.37 MBytes
[  5]   7.00-8.00   sec  4.26 GBytes  36.6 Gbits/sec    0   4.37 MBytes
[  5]   8.00-9.00   sec  4.27 GBytes  36.6 Gbits/sec    0   4.37 MBytes
[  5]   9.00-10.00  sec  4.29 GBytes  36.8 Gbits/sec    0   4.37 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
Test Complete. Summary Results:
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  42.5 GBytes  36.5 Gbits/sec    0             sender
[  5]   0.00-10.00  sec  42.5 GBytes  36.5 Gbits/sec                  receiver
CPU Utilization: local/sender 99.3% (0.8%u/98.5%s), remote/receiver 81.2% (2.2%u/79.0%s)
snd_tcp_congestion cubic
rcv_tcp_congestion cubic
  • Steps to Reproduce

The issue seems to be reproducible only on arm64 builds. I also tried a Raspberry Pi 2 with the most recent build from source.

  • Possible Solution

tcpdump shows a random 200 ms delay between the point where a full block (65536 bytes) has been received and the next SACK. perf top doesn't show any kernel bottleneck.
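
As a rough sanity check on the 200 ms observation, the Python sketch below estimates the throughput ceiling if every 65536-byte block had to wait for one such delay. The function name and the one-stall-per-block model are assumptions made for illustration; nothing here was measured beyond the block size and the delay quoted above.

```python
def stalled_bitrate_mbps(block_bytes: int, stall_seconds: float) -> float:
    """Bitrate in Mbit/s (10^6 bits) if each block costs exactly one stall.

    Assumption: transfer time itself is negligible compared to the stall,
    which is plausible on loopback but is not verified here.
    """
    blocks_per_second = 1.0 / stall_seconds
    return block_bytes * 8 * blocks_per_second / 1e6


# Values taken from the report: 65536-byte blocks, ~200 ms gaps in tcpdump.
estimate = stalled_bitrate_mbps(65536, 0.200)
print(f"{estimate:.2f} Mbit/s")  # prints "2.62 Mbit/s"
```

That ceiling is the same order of magnitude as the 3 to 6 Mbits/sec intervals in the SCTP run, which would fit a stall hitting most, but not all, blocks.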

@davidBar-On
Contributor

I am not familiar with SCTP, but it seems that SCTP includes a "delayed ack" mechanism and that the 200ms delay is per a related system configuration setting. For example, see this and SCTP RFC 4960 section 6.2 (or in the update RFC 9260).
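
One way to check that setting on the affected machine is sketched below. It assumes the Linux `net.sctp.sack_timeout` sysctl (delayed SACK timer, in milliseconds) is the relevant knob and that it is exposed at the path shown. The file is only present once the sctp kernel module is loaded, so the helper returns `None` when it cannot be read. The helper name is hypothetical.

```python
from pathlib import Path
from typing import Optional

# Assumed location of the SCTP delayed-SACK timer on Linux (milliseconds).
SACK_TIMEOUT_PATH = Path("/proc/sys/net/sctp/sack_timeout")


def read_sack_timeout_ms(path: Path = SACK_TIMEOUT_PATH) -> Optional[int]:
    """Return the configured SACK delay in ms, or None if it is unavailable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        # Missing file (module not loaded), no permission, or unexpected content.
        return None


value = read_sack_timeout_ms()
if value is None:
    print("sack_timeout not readable (is the sctp module loaded?)")
else:
    print(f"net.sctp.sack_timeout = {value} ms")
```

Running this on both the arm64 and an x86 host would show whether the two are configured differently. A value of 200 on the arm64 host would match the gaps seen in tcpdump, though it would not by itself explain why x86 behaves differently.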

@nshopik
Author

nshopik commented Jan 9, 2025

Yeah, I need to investigate a bit more. I never looked at an x86 traffic dump to see if there is a difference; it could be related to arch-specific settings.
