Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Socket.Send not thread-safe on Linux #44422

Closed
tyb-dev opened this issue Nov 9, 2020 · 24 comments · Fixed by dotnet/dotnet-api-docs#9216
Closed

Socket.Send not thread-safe on Linux #44422

tyb-dev opened this issue Nov 9, 2020 · 24 comments · Fixed by dotnet/dotnet-api-docs#9216
Labels
area-System.Net.Sockets documentation Documentation bug or enhancement, does not impact product or test code
Milestone

Comments

@tyb-dev
Copy link

tyb-dev commented Nov 9, 2020

Description

According to the documentation, instances of Socket.Send are thread-safe. This is reliably the case on Windows. When running the same code on Linux, sporadically, the data that is received by the receiving party seems to be partially scrambled when Socket.Send is called from multiple threads concurrently.

Configuration

.NET Core 3.1.9
Linux Ubuntu 20.04
x64
release mode

Other information

A reliable work-around is to put a lock-statement around each Socket.Send.
Blocking mode.
Socket.Send overload taking IList<ArraySegment>

I only briefly looked into the code (SocketAsyncContext.Unix.cs, method PerformSyncOperation), which seems to be a loop and a queue. I am not sure I see any measures that would ensure buffers that are sent as a list of segments are sent as a single block.

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-System.Net.Sockets untriaged New issue has not been triaged by the area owner labels Nov 9, 2020
@ghost
Copy link

ghost commented Nov 9, 2020

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.


Issue meta data

Issue content: ### Description

According to the documentation, instances of Socket.Send are thread-safe. This is reliably the case on Windows. When running the same code on Linux, sporadically, the data that is received by the receiving party seems to be partially scrambled when Socket.Send is called from multiple threads concurrently.

Configuration

.NET Core 3.1.9
Linux Ubuntu 20.04
x64
release mode

Other information

A reliable work-around is to put a lock-statement around each Socket.Send.
Blocking mode.
Socket.Send overload taking IList<ArraySegment>

I only briefly looked into the code (SocketAsyncContext.Unix.cs, method PerformSyncOperation), which seems to be a loop and a queue. I am not sure I see any measures that would ensure buffers that are sent as a list of segments are sent as a single block.

Issue author: tyb-dev
Assignees: -
Milestone: -

@geoffkizer
Copy link
Contributor

Repro code?

@tyb-dev
Copy link
Author

tyb-dev commented Nov 9, 2020

It's on the class:
https://docs.microsoft.com/en-us/dotnet/api/system.net.sockets.socket?view=netcore-3.1#thread-safety

The claim is definitely true on Windows, and also almost always true on Linux. It is quite challenging to reproduce, we are seeing this rarely in production where wrong bytes are received, except when lock is used.

A simple loop with Send, run 10 times concurrently, even with 1000 segments will not typically show this behavior. If I manage to create a repro, I will certainly attach it here.

@geoffkizer
Copy link
Contributor

Ah, you are talking about the multi-buffer Send overload.

Yeah, unfortunately I think you are right here. There is no synchronization here that will guarantee that all the buffer segments from a single Send are sent at the same time.

I wish the coordination and queuing logic we have in SocketAsyncContext worked completely differently. We should just have a lock/semaphore that guarantees only one thread is trying to send or receive at a time. That would address this problem.

@antonfirsov
Copy link
Member

antonfirsov commented Nov 10, 2020

I wish the coordination and queuing logic we have in SocketAsyncContext worked completely differently.

If I'm not misreading it, the OP is talking about about the synchronous variant + blocking sockets. @tyb-dev can you confirm?

@tyb-dev
Copy link
Author

tyb-dev commented Nov 10, 2020

We're using this overload:
Send(IList<ArraySegment>, SocketFlags, SocketError)

I made an attempt at creating a repro but did not succeed. In the repro I used the loopback adapter and C# for both server and client. Our production workload runs on Docker on Kubernetes, with one Pod communicating with another and the server part is written in native C++, only the sender part is in C#.

It's unfortunately not trivially reproducible. I thought perhaps the guarantee is simply not intended by looking at the code and it could simply be documented.

We're happy with the workaround. We've gone from around 50 sporadic connection drops during the day due to illegal data received by the server, to 0, after adding locks around Socket.Send.

@geoffkizer
Copy link
Contributor

OP is talking about about the synchronous variant + blocking sockets

Yeah, good point. So strictly speaking this has nothing to do with SocketAsyncContext.

That said, I still wish there was just a simple read lock and write lock on the socket that serialized operations. That would both fix this issue and simplify SocketAsyncContext significantly.

@geoffkizer
Copy link
Contributor

@tyb-dev Can you give me more details about what your app does? Is it TCP or UDP?

@tyb-dev
Copy link
Author

tyb-dev commented Nov 10, 2020

It is TCP:

var socket = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
socket.Blocking = true;

@tyb-dev
Copy link
Author

tyb-dev commented Nov 10, 2020

Here's a working repro.

The error takes a few moments or so to pop up. It's very sporadic. I can't reproduce it on Windows or wsl. It could be timing related, or the limitation is in the Linux Kernel. My Linux test machine is Ubuntu 20.04 running with 2 processors on VirtualBox (not very fast).

using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Net.Sockets;
using System.Threading;

namespace Repro
{
    public class SynchronousSocketClient
    {
        private static object syncObj = new object();

        public static void StartClient()
        {
            try
            {
                var sender = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp)
                {
                    Blocking = true
                };

                sender.Connect(new IPEndPoint(IPAddress.Loopback, 12345));

                // Example data: 10x counting from 0 to 255
                var data = new List<ArraySegment<byte>>();
                for (var i = 0; i < 10; i++)
                {
                    data.Add(new ArraySegment<byte>(Enumerable.Range(0, 256).Select(s => (byte)s).ToArray()));
                }

                // Keep sending it on 10 parallel threads
                for(int j = 0; j < 10; j++)
                {
                    ThreadPool.QueueUserWorkItem((x) =>
                    {
                        while(true)
                        {
                            int bytesSent;
                            SocketError errorCode;
                            //lock (syncObj)
                            {
                                bytesSent = sender.Send(data, SocketFlags.None, out errorCode);
                            }

                            if (bytesSent != data.Sum(s => s.Count))
                            {
                                throw new InvalidOperationException(
                                    $"Wrong total: {bytesSent}. Error code {errorCode}");
                            }
                        }
                    }, null);
                }
            }
            catch (Exception e)
            {
                Console.WriteLine(e.ToString());
            }
        }

        public class SynchronousSocketListener
        {
            public static void StartListening(ManualResetEventSlim ready)
            {
                var listener = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp)
                {
                    Blocking = true
                };

                var bytes = new byte[1024];
                try
                {
                    listener.Bind(new IPEndPoint(IPAddress.Loopback, 12345));
                    listener.Listen(10);

                    ready.Set();
                    while (true)
                    {
                        var handler = listener.Accept();

                        Console.WriteLine("Connected");

                        // Get data and verify content
                        var counter = 0;
                        var iteration = 0;
                        while (true)
                        {
                            var bytesRec = handler.Receive(bytes);
                            for (var i = 0; i < bytesRec; i++)
                            {
                                if (bytes[i] != counter)
                                {
                                    throw new InvalidOperationException("Wrong.");
                                }

                                counter++;
                                counter %= 256;
                            }

                            iteration++;
                            Console.Write($"Iterations: {iteration}\r");
                        }
                    }

                }
                catch (Exception e)
                {
                    Console.WriteLine();
                    Console.WriteLine(e.ToString());
                }

            }

            public static void Main(string[] args)
            {
                var ready = new ManualResetEventSlim(false);
                ThreadPool.QueueUserWorkItem((x) => { StartListening(ready); });
                ready.Wait();

                StartClient();
                
                Thread.Sleep(-1);
            }
        }
    }
}

Output

Iterations: 14447
System.InvalidOperationException: Wrong.
   at Repro.SynchronousSocketClient.SynchronousSocketListener.StartListening(ManualResetEventSlim ready) in (path redacted)Program.cs:line 88

@karelz
Copy link
Member

karelz commented Nov 24, 2020

Triage: This may be limitation of Linux and how we use Sockets there -- we turn the call into loop of Send calls, which can be interleaved by other thread sadly.
We expect this is rarely used pattern, so we should be ok-ish.
We need to investigate a bit more.

It may not be guaranteed even on Windows -- something to look into. We might have to change the docs.

@karelz karelz added bug and removed untriaged New issue has not been triaged by the area owner labels Nov 24, 2020
@karelz karelz added this to the Future milestone Nov 24, 2020
@geoffkizer
Copy link
Contributor

@tmds this may be of interest to you.

@tmds
Copy link
Member

tmds commented Nov 26, 2020

This is a platform difference. Windows WSASend guarantees the function returns on a blocking socket only when everything is sent. On Linux, send can return before everything gets sent and the caller is responsible for making multiple send calls.
So even when sending a single buffer, you can get this behavior.

We should update the docs. This is specific for Stream Sockets (like TCP). For datagram sockets (like UDP) it is safe to make concurrent calls, also on Linux.

@geoffkizer
Copy link
Contributor

What would we update the docs to say? "On Linux, Socket is not thread-safe for connection-oriented sockets, e.g. TCP"?

That's a pretty significant departure from what we have said in the past here.

@tmds
Copy link
Member

tmds commented Nov 26, 2020

Either update the docs, or make Socket.Send thread-safe.
It's a trade-off between between performance and backwards-compatibility.

Even if we want to make Socket.Send thread-safe, we may still want to update the docs to describe the current behaviour on non-Windows.

@geoffkizer
Copy link
Contributor

The perf hit of making it thread-safe seems to be to have a lock that we take around all send operations to ensure they are serialized. Compared to the kernel call, that's probably cheap.

Or are you thinking of something else in terms of perf?

@tmds
Copy link
Member

tmds commented Nov 26, 2020

Yes, I'm thinking about the lock, and I also expect it will be cheap compared to the syscall.

We should avoid taking the lock for non-stream sockets, like UDP.

@CyberShadow
Copy link

CyberShadow commented Apr 17, 2021

On Linux you can also use sendmsg with struct iovec *msg_iov being a vector of fragmented data segments.

I understand that it should either succeed atomically or return EMSGSIZE.

@RuanCardoso
Copy link

RuanCardoso commented Jul 31, 2022

Here's a working repro.

The error takes a few moments or so to pop up. It's very sporadic. I can't reproduce it on Windows or wsl. It could be timing related, or the limitation is in the Linux Kernel. My Linux test machine is Ubuntu 20.04 running with 2 processors on VirtualBox (not very fast).

using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Net.Sockets;
using System.Threading;

namespace Repro
{
    public class SynchronousSocketClient
    {
        private static object syncObj = new object();

        public static void StartClient()
        {
            try
            {
                var sender = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp)
                {
                    Blocking = true
                };

                sender.Connect(new IPEndPoint(IPAddress.Loopback, 12345));

                // Example data: 10x counting from 0 to 255
                var data = new List<ArraySegment<byte>>();
                for (var i = 0; i < 10; i++)
                {
                    data.Add(new ArraySegment<byte>(Enumerable.Range(0, 256).Select(s => (byte)s).ToArray()));
                }

                // Keep sending it on 10 parallel threads
                for(int j = 0; j < 10; j++)
                {
                    ThreadPool.QueueUserWorkItem((x) =>
                    {
                        while(true)
                        {
                            int bytesSent;
                            SocketError errorCode;
                            //lock (syncObj)
                            {
                                bytesSent = sender.Send(data, SocketFlags.None, out errorCode);
                            }

                            if (bytesSent != data.Sum(s => s.Count))
                            {
                                throw new InvalidOperationException(
                                    $"Wrong total: {bytesSent}. Error code {errorCode}");
                            }
                        }
                    }, null);
                }
            }
            catch (Exception e)
            {
                Console.WriteLine(e.ToString());
            }
        }

        public class SynchronousSocketListener
        {
            public static void StartListening(ManualResetEventSlim ready)
            {
                var listener = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp)
                {
                    Blocking = true
                };

                var bytes = new byte[1024];
                try
                {
                    listener.Bind(new IPEndPoint(IPAddress.Loopback, 12345));
                    listener.Listen(10);

                    ready.Set();
                    while (true)
                    {
                        var handler = listener.Accept();

                        Console.WriteLine("Connected");

                        // Get data and verify content
                        var counter = 0;
                        var iteration = 0;
                        while (true)
                        {
                            var bytesRec = handler.Receive(bytes);
                            for (var i = 0; i < bytesRec; i++)
                            {
                                if (bytes[i] != counter)
                                {
                                    throw new InvalidOperationException("Wrong.");
                                }

                                counter++;
                                counter %= 256;
                            }

                            iteration++;
                            Console.Write($"Iterations: {iteration}\r");
                        }
                    }

                }
                catch (Exception e)
                {
                    Console.WriteLine();
                    Console.WriteLine(e.ToString());
                }

            }

            public static void Main(string[] args)
            {
                var ready = new ManualResetEventSlim(false);
                ThreadPool.QueueUserWorkItem((x) => { StartListening(ready); });
                ready.Wait();

                StartClient();
                
                Thread.Sleep(-1);
            }
        }
    }
}

Output

Iterations: 14447
System.InvalidOperationException: Wrong.
   at Repro.SynchronousSocketClient.SynchronousSocketListener.StartListening(ManualResetEventSlim ready) in (path redacted)Program.cs:line 88

I don't think it's a Thread-Safe error, just that you use TCP and you're not framing messages correctly.
You have to consider that TCP is a flow, it will deliver the data little by little and you have join the frames!

@PBS590
Copy link

PBS590 commented Oct 18, 2022

What is the guidance here? The docs still claim that socket instances are thread safe, but this issue is still open.

@wfurt
Copy link
Member

wfurt commented Oct 22, 2022

do we have corresponding issue for doc @antonfirsov?

@antonfirsov
Copy link
Member

From the discussion this seems like a problem we want to fix eventually. However it won't happen fast, and won't apply to previous versions, so it's probably worth to update the docs so they match the current behavior.

Thoughts about a future fix @wfurt?

@wfurt wfurt assigned wfurt and liveans and unassigned wfurt Dec 15, 2022
@wfurt
Copy link
Member

wfurt commented Dec 15, 2022

triage: we should update docs to be correct. We can consider changes independently.

@wfurt wfurt modified the milestones: Future, 8.0.0 Dec 15, 2022
@karelz karelz added documentation Documentation bug or enhancement, does not impact product or test code and removed bug labels Jun 9, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Sep 24, 2023
# for free to subscribe to this conversation on GitHub. Already have an account? #.
Labels
area-System.Net.Sockets documentation Documentation bug or enhancement, does not impact product or test code
Projects
None yet
Development

Successfully merging a pull request may close this issue.