Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Improve http.Client connection handling #1581

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

mzagozen
Copy link
Contributor

@mzagozen mzagozen commented Nov 17, 2023

If the (HTTPS) server closes the socket on us after a response, the stream becomes invalid. Let's communicate that to the user (http.Client) by calling on_error. The user (http.Client) can automatically reconnect and resend the failed requests without any intervention from the actors further up the stack.

The HTTP(S) server closing the socket is perfectly valid behavior for HTTP/1.0. Users of http.Client do not have to do anything, they just get the response as expected.

// closed the socket. In that case just call the on_close callback.
char errmsg[] = "TLS TCP socket already closed?";
log_debug(errmsg);
on_close->$class->__asyn__(on_close, self);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We initially thought about raising an exception here instead, but http.Client calls net.TLSConnection.write() asynchronously. And I guess others would too?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you should blindly accept async write as the correct thing. It's not a complete design as-is...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, for close, I'm not sure we even need to close on_close. Like why do we end up here with a -1 stream?

If you call TLSConnection.close() 10 times in a row, would we expect the on_close() callback to also be called 10 times? I think it makes more sense to treat it as a declarative thing, so if TLSConnection.close() is called on a closed connection, the goal of having the connection closed is already reached. We don't actually transition, i.e. we don't go from established -> closed, and so on_close will not be called!?

The stream must have gone to -1 at some earlier point at which the on_error() callback should be called, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling on_close even though we're already closed (stream == -1) allows for chaining the callbacks in reconnect():

    def reconnect():
        close(_connect)

@mzagozen mzagozen force-pushed the http-client-reconnect branch from 76acc0d to 136a7f7 Compare November 17, 2023 10:16
@@ -460,6 +460,7 @@ actor Client(cap: net.TCPConnectCap, scheme: str, address: str, port: ?int, tls_
var close_connection: bool = True
var tcp_conn: ?net.TCPConnection = None
var tls_conn: ?net.TLSConnection = None
var reconnecting: bool = False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename to connecting and initialize to True since we are in fact not connected when we start up. This way we can buffer outgoing requests already from the start.

Separate to this, we should add arguments for configuring that sort of buffering. I think per default we want to allow for some queueing up of outgoing requests but it should definitely be bounded, perhaps both in number of requests as well as reconnection attempts and in time.

For a more "proper" and robust error handling, I think applications will want to crank down the amount of queued up request.

HTTP 1.0 servers close the connection automatically after sending the
response. We updated the http.Client actor such that the users can use
the high-level API like making several requests in a row without having
to worry about reconnects. http.Client will buffer the requests and
flush the buffer once it (re)connects the socket.

Writing to a closed socket is bad and guaranteed to fail, so the user
(in this case the http.Client actor) should be notified ASAP before
making further writes. Rather than using the on_error callback from the
underlying TCPConnection and TLSConnection actors we raise an exception
from their write() methods. This implies the calls to write() are
synchronous, but since we're not waiting on network I/O it should be
fine?!
@mzagozen mzagozen force-pushed the http-client-reconnect branch from 136a7f7 to 1178aa0 Compare November 17, 2023 23:13
@mzagozen mzagozen changed the title Call on_error if net.TLSConnection actor writes to closed socket Improve http.Client connection handling Nov 17, 2023
@@ -500,10 +511,10 @@ actor Client(cap: net.TCPConnectCap, scheme: str, address: str, port: ?int, tls_
r, buf = parse_response(buf, _log)
if r is not None:
if "connection" in r.headers and r.headers["connection"] == "close":
close_connection = True
_conn_close()
# Is this really the right thing to do here? If the client
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a TODO entry here. We should inspect our queue, i.e. _on_response list to see if there is more stuff.

I'm also thinking like we can maybe have some basic heuristics, like when we create a http.Client the first time we obviously connect directly then we expect to run at least one query. Maybe we should not reconnect after that (in case of HTTP / 1.0 or not having persistent in later HTTP versions) until there is a second query. Now if we see a second query we can assume "there are multiple requests" and thus reconnect directly after the 2nd and subsequent requests. WDYT?

base/src/http.act Outdated Show resolved Hide resolved
base/src/net.ext.c Outdated Show resolved Hide resolved
if (strstr(errmsg, "bad file descriptor")) {
// This can happen if the socket is closed, se we raise an exception
// and let the caller retry
$RAISE(((B_BaseException)B_RuntimeErrorG_new(to$str(errmsg))));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are generally doing Python-like-exception stuff. It's not a specific design, i.e. we can deviate where we think there is good reason but if we have no idea we might as well follow Python. I think Python uses OSError for most of these, not RunTimeError.

base/src/net.ext.c Outdated Show resolved Hide resolved
Add some basic heuristics, like when we create a http.Client the first
time we obviously connect directly then we expect to run at least one
query. Maybe we should not reconnect after that (in case of HTTP / 1.0
or not having persistent in later HTTP versions) until there is a second
query. Now if we see a second query we can assume "there are multiple
requests" and thus reconnect directly after the 2nd and subsequent
requests.
By grouping the socket errors under ConnectionError we can be more
precise in handling the error in consumers of TCP connection actors.
@@ -344,6 +344,9 @@ class MemoryError (Exception):
class OSError (Exception):
pass

class ConnectionError (OSError):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hadn't actually looked much at how Python does things beyond looking at OSError. I see now that it subclasses things quite nicely. Should we not just copy that whole thing verbatim? Like it feels like a better starting point than starting from scratch. In particular, in this case for the error you later raise, I think it's a BrokenPipeError.

We probably want to consider changing the __init__ method of our OSError so we can take errno and stuff like that, again just like in Python. Without default args it'll be a little fiddly, but that's OK I guess.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants