-
Notifications
You must be signed in to change notification settings - Fork 122
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
fix: add connect timeout for exec websockets to avoid hanging #1247
Conversation
ops/pebble.py
Outdated
@@ -2832,10 +2832,17 @@ def _cancel_stdin(): | |||
|
|||
def _connect_websocket(self, task_id: str, websocket_id: str) -> '_WebSocket': | |||
sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) | |||
# Set socket timeout to a short timeout during connection phase, in | |||
# case Pebble side times out (5s), so this side doesn't hang. See: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default timeout for this class is 5s. If pebble timeout is also 5s, the net result feels race-y...
I wonder how to test this exhaustively.
One option would be to set up a dummy but programmable websocket listener. What do you reckon, is it worth the effort?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not actually racy, as those two timeouts (while they happen to be the same value) are completely independent: the Pebble timeout happens first, then this 5s timeout is added to that. So the total timeout here before the Python side will fail is 10s.
It's easy to test manually with the instructions given in #1246. I wondered whether it was worth an automated test, but decided it was too hard and awkward. We'd need some hook to patch in an arbitrary delay (like the time.sleep(5.1)
in the manual test). I decided that because the fix is simple and clear, and it's easy to repro manually, it's probably not worth the hours we'd spend on an automated test for this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a caveat to this way of injecting the timeout.
Each raw socket operation may take up to 5s, meaning that the total an unsuccessful websocket connection may take would be 5 * (N+1)
where N depends entirely on the websocket-client code and deps.
I figure that this is still way better than not setting a timeout :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
P.S. I have an idea how to set up automated tests for this, do tell if you'd like me to try.
I had done something similar in the past for some other protocol.
On the other hand, we'd be mostly testing our expectations of the websocket-client library, rather than our code.
To try this easily in a charm, you can patch the from ops import pebble
import socket
import websocket
def _connect_websocket(self, task_id, websocket_id):
sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
sock.settimeout(self.timeout)
sock.connect(self.socket_path)
url = self._websocket_url(task_id, websocket_id)
ws = websocket.WebSocket(skip_utf8_validation=True)
ws.connect(url, socket=sock)
sock.settimeout(None)
return ws
pebble.Client._connect_websocket = _connect_websocket |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
I could reproduce reliably with the steps in #1246, and running from your branch indeed nicely errors out instead.
Co-authored-by: Tony Meyer <tony.meyer@gmail.com>
…cal#1247) This is the fix for the issue described at canonical#1246. Essentially, if the Pebble timeout has already elapsed, Pebble will happily wait indefinitely for the connect to go through, and the Python side will hang. Add a timeout during the connect phase to cut this short. Fixes canonical#1246. --------- Co-authored-by: Tony Meyer <tony.meyer@gmail.com>
This is the fix for the issue described at #1246. Essentially, if the Pebble timeout has already elapsed, Pebble will happily wait indefinitely for the connect to go through, and the Python side will hang. Add a timeout during the connect phase to cut this short.
Fixes #1246.