[FEATURE] Reduce support issues related to concurrent writes #957
Comments
Thanks @hulkingshtick for this thorough write-up. Based on what you've written and the previous threads on this, I'd say we should probably perform a combo of I'd prioritise any PRs done for this to be released asap.
The more I think about it, the more I believe a panic is better for developers than a returned error. Because many developers ignore errors, an application will fail silently when an error is returned. The panic makes developers aware that there is a programming error in their application. If the panicking goroutine does not recover from the panic, then the application exits with a dump of all goroutine stack traces. It should be easy for developers to find the two concurrent writers from those stack traces. I suspect that the panicking goroutine is often a goroutine started by the net/http server. These goroutines recover from panics and print the stack trace of the panicking goroutine only. It's difficult to find the other concurrent writer with one stack trace. A simple change to the detector can cause the other goroutine to panic as well.
Here's a demonstration of this change: https://go.dev/play/p/ItZzymA0OEf. Notice how the application prints stack traces for example1 and example2. The application prints the stack trace for example2 only when the new Because the concurrency detector is only executed from functions where concurrency is not allowed, there are two possible causes of the panic: the application called the write methods concurrently, or there's a bug in the concurrency detector. If there's a bug in the concurrency detector, then there's a code path where a single-threaded program will panic. I examined all of the connection code and cannot concoct a code path where a single-threaded program will panic. The field
Another challenge with making the connection write methods thread-safe: NextWriter() returns an io.WriteCloser on the message, and Close() on that io.WriteCloser finalizes the message. NextWriter() automatically finalizes a pending message before starting the next message. Applications in the wild may rely on message finalization in NextWriter(). Given that behavior, there's no good way to make the connection write methods thread-safe. If NextWriter() starts by locking a write mutex, applications that rely on message finalization in NextWriter() will block forever. If NextWriter() barges through the lock, the methods are not thread-safe. The connection write methods were designed for single-threaded use, and it is easy to use them that way.
Is there an existing feature request for this?
Is your feature request related to a problem? Please describe.
The documentation says:
To help developers detect concurrency errors in their application, the connection makes a best effort to detect concurrent calls to WriteMessage, WriteJSON and Write on the writer returned from NextWriter. When a concurrent call is detected, the connection panics with the string "concurrent write to websocket connection". This panic is better than the alternatives, such as commingling messages on the underlying network connection or a nil pointer exception.
The "concurrent write to websocket connection" panic is the most common support issue for the package. The issue arises because developers incorrectly assume that concurrency is allowed or wrongly think they covered all the bases with regard to ensuring a single writer.
Describe the solution that you would like.
Here are options for improving the situation:
Improve the panic string
The terse panic string does not sufficiently describe the problem or a fix. Create a markdown file on this repository explaining the concurrency restriction and suggested fixes. Include a link to the markdown file in the panic string.
Improve the detector
The current detector is:
The second concurrent writer panics, but not the first. The stack trace for the first concurrent writer may be the key to finding the bug in the application. Cause the first writer to panic using the following code. In this code, the field c.writing is an int and replaces the field c.isWriting. (This code requires more review and testing.)
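As an illustration of how a counter can make both writers panic, here is a minimal sketch. It is not the code proposed in this issue and not the package's implementation: the conn type, the write helper, and the use of sync/atomic are assumptions made for the example, and the overlap is simulated on one goroutine so the result is deterministic.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

const concurrentWriteMsg = "concurrent write to websocket connection"

// conn is a hypothetical stand-in for the connection type; only the
// detector field is modeled.
type conn struct {
	writing int32 // number of writers currently inside write
}

// write runs do (standing in for the network write) under the detector.
// A writer that finds another writer on entry panics without undoing its
// increment, so the writer that was already inside sees a non-zero
// counter on exit and panics as well.
func (c *conn) write(do func()) {
	if atomic.AddInt32(&c.writing, 1) != 1 {
		panic(concurrentWriteMsg)
	}
	do()
	if atomic.AddInt32(&c.writing, -1) != 0 {
		panic(concurrentWriteMsg)
	}
}

// panics reports whether f panicked.
func panics(f func()) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	f()
	return false
}

// overlap simulates two overlapping writes deterministically by starting
// the second write while the first is still in progress.
func overlap(c *conn) (first, second bool) {
	first = panics(func() {
		c.write(func() {
			second = panics(func() { c.write(func() {}) })
		})
	})
	return first, second
}

func main() {
	first, second := overlap(&conn{})
	fmt.Println("first writer panicked:", first)
	fmt.Println("second writer panicked:", second)
}
```

In this sketch the counter stays non-zero once a concurrent write has been detected, so every later write on the same conn panics too. Whether that is desirable for the real connection type is a design question the sketch does not settle.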
Return errors instead of panicking
Return an error instead of panicking. Include the stack trace of the caller in the error string.
Describe alternatives you have considered.
I considered and rejected making the existing methods thread safe. It's possible to make the individual methods thread safe, but the SetWriteDeadline, EnableWriteCompression and SetCompressionLevel methods require synchronization at the application level to be useful. Because robust applications should call SetWriteDeadline, a robust application will implement synchronization at the application level. It follows that there's no point in making the individual methods thread safe.
I also considered changing the API to accept a WriteOptions struct for NextWriter, WriteMessage and WriteJSON. The options struct replaces the SetWriteDeadline, EnableWriteCompression and SetCompressionLevel methods. I rejected this idea because it's a breaking change to the API.
Anything else?
No response