[Bug]: HTTP connection state with podman #7310
I had a look to see if docker offers Keep-Alive headers, but it looks like the docker server just lets connections live for a long time, so the HTTP client's default keepalive ends up being the lower of the two.
With docker I left 20 minutes between two requests in one socat session and the connection stayed alive until I killed it with ctrl-c. If I try the same with podman on Linux, the connection is killed after 10 seconds, which looks like the IdleTimeout of the podman HTTP server with the default 5 second API timeout (times 2).
I noticed the reproducer succeeds when you remove the parallelism (so only one connection is created in the HTTP client pool). I think this is because the HTTP client will retry once by default: if you have a pool of multiple connections and all of them have been closed on the server side, that single retry can fail too when it leases a second closed connection from the pool. It's visible in the earlier attached log podmanTimeout.log, where the retry also fails on a stale connection.
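To make that failure mode concrete, here is a minimal sketch of how the retry count could be raised on an Apache HttpClient 5 client so that there is one retry per pooled connection; the poolSize parameter and the 100 ms retry interval are illustrative choices, not values from the reproducer:

```java
import org.apache.hc.client5.http.impl.DefaultHttpRequestRetryStrategy;
import org.apache.hc.client5.http.impl.classic.CloseableHttpClient;
import org.apache.hc.client5.http.impl.classic.HttpClients;
import org.apache.hc.core5.util.TimeValue;

public class RetryExample {

    /**
     * Builds a client that retries as many times as there are pooled
     * connections, so even if every pooled connection was closed
     * server-side, the last attempt leases a freshly opened one.
     */
    public static CloseableHttpClient clientWithRetries(int poolSize) {
        return HttpClients.custom()
                .setRetryStrategy(new DefaultHttpRequestRetryStrategy(
                        poolSize, TimeValue.ofMilliseconds(100)))
                .build();
    }
}
```

Note that DefaultHttpRequestRetryStrategy only retries requests it considers idempotent, so this alone would not cover POSTs to the API.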
There are some changes to the HTTP client that would help it work with the default podman configuration:
But I'm not sure of the best way to slot that into the code. Are any of them something that would be safe to apply to both docker and podman? I'm wondering if testcontainers could set the connectionKeepAlive to 8 seconds for all Apache HttpClient usages; the cost would be that some docker connections get closed when they could have been re-used.
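As a sketch of what that could look like against Apache HttpClient 5 directly (docker-java's builders would still need to expose an equivalent knob; the 8-second value just mirrors the suggestion above, staying under podman's 10s idle timeout):

```java
import org.apache.hc.client5.http.config.RequestConfig;
import org.apache.hc.client5.http.impl.classic.CloseableHttpClient;
import org.apache.hc.client5.http.impl.classic.HttpClients;
import org.apache.hc.core5.util.TimeValue;

public class KeepAliveExample {

    public static CloseableHttpClient podmanFriendlyClient() {
        RequestConfig requestConfig = RequestConfig.custom()
                // Lower the keep-alive from the 3 minute default so pooled
                // connections are not re-used after podman has closed them.
                .setConnectionKeepAlive(TimeValue.ofSeconds(8))
                .build();
        return HttpClients.custom()
                .setDefaultRequestConfig(requestConfig)
                // Also evict idle pooled connections proactively, so a
                // stale connection is never leased in the first place.
                .evictIdleConnections(TimeValue.ofSeconds(8))
                .build();
    }
}
```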
edit: I'm happy to contribute this to docker-java/testcontainers
Why: We have been experiencing problems with podman and testcontainers using a globally cached ZerodepDockerHttpClient. The problem is, with its default configuration, podman closes idle connections after 10 seconds. The apache HTTP client's default connection keepalive is 3 minutes. So we are seeing exceptions thrown when the client attempts to use these stale connections that it has in the connection pool. Two configurations that could help are reducing the connectionKeepAlive, so we could configure the pool to close connections in line with this podman timeout, or enabling stale connection checking after the connection has been idle in the client pool for some time. testcontainers/testcontainers-java#7310
Why: We have been experiencing problems with podman and testcontainers using a globally cached ZerodepDockerHttpClient. The problem is, with its default configuration, podman closes idle connections after 10 seconds. The apache HTTP client's default connection keepalive is 3 minutes. So we are seeing exceptions thrown when the client attempts to use these stale connections that it has in the connection pool. A configuration that could help is reducing the connectionKeepAlive, so we could configure the pool to close connections in line with this podman timeout. Note: stale connection checking does not work and blocks in isStale, see docker-java#1726 testcontainers/testcontainers-java#7310
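For completeness, a hypothetical sketch of how such an option might look on docker-java's client builder. The connectionKeepAlive method shown in the comment is the option proposed in this thread, not an existing docker-java API, and the socket path is just an example:

```java
import java.net.URI;

import com.github.dockerjava.transport.DockerHttpClient;
import com.github.dockerjava.zerodep.ZerodepDockerHttpClient;

public class PodmanClientExample {

    public static DockerHttpClient build() {
        return new ZerodepDockerHttpClient.Builder()
                // Example rootful podman socket path; adjust for your host.
                .dockerHost(URI.create("unix:///run/podman/podman.sock"))
                // Proposed (not yet existing) option from this thread, which
                // would close pooled connections before podman's 10s timeout:
                // .connectionKeepAlive(Duration.ofSeconds(8))
                .build();
    }
}
```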
Any progress on this? containers/podman#17640 seems to have made the decision that podman is going to stick with the 10s timer communicated to the client via the
My current workaround in tests is to use
Any progress on this? Just adding that we have the same problem on Windows with podman; it throws:
We've been using

echo 'mkdir -p /etc/containers/containers.conf.d && printf "[engine]\nservice_timeout=91\n" > /etc/containers/containers.conf.d/service-timeout.conf && systemctl restart podman.socket' | podman machine ssh --username root --

as a workaround for people running with podman remote (aka MacOS and presumably Windows).
Thank you very much for the hint! With IntelliJ IDEA, I had to put mine at 300 (or even 600), otherwise it would time out before I could finish building my containers. Why 91?
I don't remember the specifics of why we picked it; we were trying to balance the needs of our test suite with how podman is intended to operate. The
Module
Core
Testcontainers version
1.18.3
Using the latest Testcontainers version?
Yes
Host OS
Linux
Host Arch
x86
Docker version
What happened?
We have been chasing issues between testcontainers and podman where tests would fail on MacOS with
or on Linux with
After much head scratching we think we have figured it out: a mismatch between the connection timeout expected by testcontainers-java and the podman API. By default the podman socket is closed after 5s, whereas testcontainers expects to be able to use the connection for up to 3 mins.
I'm attaching a failing test.
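The attached test is the authoritative reproducer; as a rough, self-contained sketch of the failure mode it exercises (the socket path and sleep duration here are illustrative assumptions):

```java
import java.net.URI;

import com.github.dockerjava.transport.DockerHttpClient;
import com.github.dockerjava.transport.DockerHttpClient.Request;
import com.github.dockerjava.transport.DockerHttpClient.Response;
import com.github.dockerjava.zerodep.ZerodepDockerHttpClient;

public class StaleConnectionRepro {

    public static void main(String[] args) throws Exception {
        DockerHttpClient http = new ZerodepDockerHttpClient.Builder()
                // Illustrative rootless podman socket path; adjust per host.
                .dockerHost(URI.create("unix:///run/user/1000/podman/podman.sock"))
                .build();

        Request ping = Request.builder()
                .method(Request.Method.GET)
                .path("/_ping")
                .build();

        try (Response first = http.execute(ping)) {
            System.out.println("first ping: " + first.getStatusCode());
        }

        // Wait past podman's idle timeout so the pooled connection is
        // closed server-side while the client still considers it live.
        Thread.sleep(15_000);

        // With one pooled connection the client's single default retry
        // recovers; with several stale pooled connections it can fail.
        try (Response second = http.execute(ping)) {
            System.out.println("second ping: " + second.getStatusCode());
        }
    }
}
```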
Podman generally expects to be started by systemd in response to data being sent to the socket (via the podman.socket systemd unit), as it has a daemonless execution model. I'm not close enough to the details to understand why the connection fails with a broken pipe instead of re-activating the connection. My suspicion is that we see different behaviour on MacOS because gvproxy sits in between testcontainers and the podman API, and thus when the socket goes away it handles the broken pipe differently (possibly stalling while waiting for data to come back).
Relevant log output
No response
Additional Information
Podman logs
Passes with time=100: doTheBadThing-time-100-17-07-2023_04:39.log
Fails with time=10: doTheBadThing-time-10-17-07-2023_04:44.log
Testcontainers debug log from the failing run
podmanTimeout.log
Testcase
Test case: DoTheBadThingTest.java.txt
Systemd customisations for changing timeouts
Podman systemd configuration override override.conf.txt, which can be applied with:
systemctl edit --user podman.service && systemctl restart --user podman.service
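The attached override.conf.txt is authoritative; judging from the time=100 / time=10 runs above, the override presumably adjusts the --time argument of podman system service, along these lines (the exact ExecStart path is an assumption):

```
[Service]
# Clear the packaged ExecStart, then restart the API service with a longer
# idle timeout in seconds (0 disables the timeout entirely).
ExecStart=
ExecStart=/usr/bin/podman system service --time=100
```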