-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Socket.Send not thread-safe on Linux #44422
Comments
Tagging subscribers to this area: @dotnet/ncl Issue meta data
|
Repro code? |
And what docs are you referring to @tyb-dev ? I don't see any claims here: https://docs.microsoft.com/en-us/dotnet/api/system.net.sockets.socket.send?view=netcore-3.1#System_Net_Sockets_Socket_Send_System_Byte___System_Int32_System_Int32_System_Net_Sockets_SocketFlags_System_Net_Sockets_SocketError__ |
It's on the class: The claim is definitely true on Windows, and also almost always true on Linux. It is quite challenging to reproduce, we are seeing this rarely in production where wrong bytes are received, except when lock is used. A simple loop with Send, run 10 times concurrently, even with 1000 segments will not typically show this behavior. If I manage to create a repro, I will certainly attach it here. |
Ah, you are talking about the multi-buffer Send overload. Yeah, unfortunately I think you are right here. There is no synchronization here that will guarantee that all the buffer segments from a single Send are sent at the same time. I wish the coordination and queuing logic we have in SocketAsyncContext worked completely differently. We should just have a lock/semaphore that guarantees only one thread is trying to send or receive at a time. That would address this problem. |
If I'm not misreading it, the OP is talking about about the synchronous variant + blocking sockets. @tyb-dev can you confirm? |
We're using this overload: I made an attempt at creating a repro but did not succeed. In the repro I used the loopback adapter and C# for both server and client. Our production workload runs on Docker on Kubernetes, with one Pod communicating with another and the server part is written in native C++, only the sender part is in C#. It's unfortunately not trivially reproducible. I thought perhaps the guarantee is simply not intended by looking at the code and it could simply be documented. We're happy with the workaround. We've gone from around 50 sporadic connection drops during the day due to illegal data received by the server, to 0, after adding locks around Socket.Send. |
Yeah, good point. So strictly speaking this has nothing to do with SocketAsyncContext. That said, I still wish there was just a simple read lock and write lock on the socket that serialized operations. That would both fix this issue and simplify SocketAsyncContext significantly. |
@tyb-dev Can you give me more details about what your app does? Is it TCP or UDP? |
It is TCP: var socket = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
socket.Blocking = true; |
Here's a working repro. The error takes a few moments or so to pop up. It's very sporadic. I can't reproduce it on Windows or wsl. It could be timing related, or the limitation is in the Linux Kernel. My Linux test machine is Ubuntu 20.04 running with 2 processors on VirtualBox (not very fast). using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Net.Sockets;
using System.Threading;
namespace Repro
{
public class SynchronousSocketClient
{
private static object syncObj = new object();
public static void StartClient()
{
try
{
var sender = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp)
{
Blocking = true
};
sender.Connect(new IPEndPoint(IPAddress.Loopback, 12345));
// Example data: 10x counting from 0 to 255
var data = new List<ArraySegment<byte>>();
for (var i = 0; i < 10; i++)
{
data.Add(new ArraySegment<byte>(Enumerable.Range(0, 256).Select(s => (byte)s).ToArray()));
}
// Keep sending it on 10 parallel threads
for(int j = 0; j < 10; j++)
{
ThreadPool.QueueUserWorkItem((x) =>
{
while(true)
{
int bytesSent;
SocketError errorCode;
//lock (syncObj)
{
bytesSent = sender.Send(data, SocketFlags.None, out errorCode);
}
if (bytesSent != data.Sum(s => s.Count))
{
throw new InvalidOperationException(
$"Wrong total: {bytesSent}. Error code {errorCode}");
}
}
}, null);
}
}
catch (Exception e)
{
Console.WriteLine(e.ToString());
}
}
public class SynchronousSocketListener
{
public static void StartListening(ManualResetEventSlim ready)
{
var listener = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp)
{
Blocking = true
};
var bytes = new byte[1024];
try
{
listener.Bind(new IPEndPoint(IPAddress.Loopback, 12345));
listener.Listen(10);
ready.Set();
while (true)
{
var handler = listener.Accept();
Console.WriteLine("Connected");
// Get data and verify content
var counter = 0;
var iteration = 0;
while (true)
{
var bytesRec = handler.Receive(bytes);
for (var i = 0; i < bytesRec; i++)
{
if (bytes[i] != counter)
{
throw new InvalidOperationException("Wrong.");
}
counter++;
counter %= 256;
}
iteration++;
Console.Write($"Iterations: {iteration}\r");
}
}
}
catch (Exception e)
{
Console.WriteLine();
Console.WriteLine(e.ToString());
}
}
public static void Main(string[] args)
{
var ready = new ManualResetEventSlim(false);
ThreadPool.QueueUserWorkItem((x) => { StartListening(ready); });
ready.Wait();
StartClient();
Thread.Sleep(-1);
}
}
}
} Output
|
Triage: This may be limitation of Linux and how we use Sockets there -- we turn the call into loop of It may not be guaranteed even on Windows -- something to look into. We might have to change the docs. |
@tmds this may be of interest to you. |
This is a platform difference. Windows WSASend guarantees the function returns on a blocking socket only when everything is sent. On Linux, send can return before everything gets sent and the caller is responsible for making multiple send calls. We should update the docs. This is specific for Stream Sockets (like TCP). For datagram sockets (like UDP) it is safe to make concurrent calls, also on Linux. |
What would we update the docs to say? "On Linux, Socket is not thread-safe for connection-oriented sockets, e.g. TCP"? That's a pretty significant departure from what we have said in the past here. |
Either update the docs, or make Socket.Send thread-safe. Even if we want to make Socket.Send thread-safe, we may still want to update the docs to describe the current behaviour on non-Windows. |
The perf hit of making it thread-safe seems to be to have a lock that we take around all send operations to ensure they are serialized. Compared to the kernel call, that's probably cheap. Or are you thinking of something else in terms of perf? |
Yes, I'm thinking about the lock, and I also expect it will be cheap compared to the syscall. We should avoid taking the lock for non-stream sockets, like UDP. |
On Linux you can also use I understand that it should either succeed atomically or return |
I don't think it's a Thread-Safe error, just that you use TCP and you're not framing messages correctly. |
What is the guidance here? The docs still claim that socket instances are thread safe, but this issue is still open. |
do we have corresponding issue for doc @antonfirsov? |
From the discussion this seems like a problem we want to fix eventually. However it won't happen fast, and won't apply to previous versions, so it's probably worth to update the docs so they match the current behavior. Thoughts about a future fix @wfurt? |
triage: we should update docs to be correct. We can consider changes independently. |
Description
According to the documentation, instances of Socket.Send are thread-safe. This is reliably the case on Windows. When running the same code on Linux, sporadically, the data that is received by the receiving party seems to be partially scrambled when Socket.Send is called from multiple threads concurrently.
Configuration
.NET Core 3.1.9
Linux Ubuntu 20.04
x64
release mode
Other information
A reliable work-around is to put a lock-statement around each Socket.Send.
Blocking mode.
Socket.Send overload taking IList<ArraySegment>
I only briefly looked into the code (SocketAsyncContext.Unix.cs, method PerformSyncOperation), which seems to be a loop and a queue. I am not sure I see any measures that would ensure buffers that are sent as a list of segments are sent as a single block.
The text was updated successfully, but these errors were encountered: