[Bug]: HTTP connection state with podman #7310

Open
SamBarker opened this issue Jul 17, 2023 · 9 comments

Comments


SamBarker commented Jul 17, 2023

Module

Core

Testcontainers version

1.18.3

Using the latest Testcontainers version?

Yes

Host OS

Linux

Host Arch

x86

Docker version

❯ docker version
Client:       Podman Engine
Version:      4.5.1
API Version:  4.5.1
Go Version:   go1.20.4
Built:        Fri May 26 17:58:48 2023
OS/Arch:      linux/amd64

What happened?

We have been chasing issues between Testcontainers and podman where tests would fail on macOS with

Exception in thread "Thread-0" java.lang.RuntimeException: com.github.dockerjava.zerodep.shaded.org.apache.hc.core5.http.NoHttpResponseException: localhost:2375 failed to respond

or on Linux with

java.lang.RuntimeException: java.io.IOException: Broken pipe

After much head scratching, we think we have figured it out: there is a mismatch between the connection lifetime expected by testcontainers-java and the podman API. By default the podman socket is closed after 5s of idleness, whereas Testcontainers expects to be able to reuse the connection for up to 3 minutes.

I'm attaching a failing test.

Podman generally expects to be started by systemd in response to data being sent to the socket (via the podman.socket systemd unit), as it has a daemonless execution model. I'm not close enough to the details to understand why the connection fails with a broken pipe instead of re-activating the service.

My suspicion is that we see different behaviour on macOS because gvproxy sits between Testcontainers and the podman API, and so when the socket goes away it handles the broken pipe differently (possibly stalling while waiting for data to come back).
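As a toy illustration of the mismatch (a simplified stdlib sketch, not the Testcontainers code; the timescale is shrunk from seconds to milliseconds), a server that hangs up on idle connections leaves a client that reuses the connection reading EOF:

```python
import socket
import threading
import time

def idle_closing_server(server_sock, idle_timeout=0.3):
    """Accept one connection and hang up once it goes idle, like podman."""
    conn, _ = server_sock.accept()
    conn.settimeout(idle_timeout)
    try:
        while True:
            data = conn.recv(1024)
            if not data:
                break
            conn.sendall(b"OK\n")          # respond while the client is active
    except socket.timeout:
        pass                               # idle too long: close the socket
    finally:
        conn.close()

server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)
threading.Thread(target=idle_closing_server, args=(server,), daemon=True).start()

client = socket.socket()
client.connect(("127.0.0.1", server.getsockname()[1]))
client.sendall(b"ping")
first = client.recv(1024)                  # b'OK\n': the connection works
time.sleep(0.6)                            # idle past the server's timeout
second = client.recv(1024)                 # b'': the server already hung up
print(first, second)
```

The client's second read returns empty because the server has already sent FIN; a pooled HTTP connection in the same state produces the broken-pipe / no-response errors above when written to instead.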

Relevant log output

No response

Additional Information

Podman logs

Passes with time=100: doTheBadThing-time-100-17-07-2023_04:39.log

Fails with time=10: doTheBadThing-time-10-17-07-2023_04:44.log

Testcontainers debug log from the failing run

podmanTimeout.log

Testcase

Test case: DoTheBadThingTest.java.txt

Systemd customisations for changing timeouts

Podman systemd configuration override override.conf.txt, which can be applied with systemctl edit --user podman.service && systemctl restart --user podman.service
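The attached override is not reproduced inline; for readers without the attachment, a drop-in of roughly this shape (hypothetical contents, using podman's real `--time` flag; the attached file may differ) changes how long the API service stays up after the last connection:

```ini
# Hypothetical override.conf (not the attachment's actual contents):
# restart the user-level API service with a 100-second idle timeout
# instead of the default 5 seconds.
[Service]
ExecStart=
ExecStart=/usr/bin/podman system service --time=100
```

The empty `ExecStart=` line is the systemd convention for clearing the original command before replacing it.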


robobario commented Jul 17, 2023

I had a look to see if docker offers Keep-Alive headers, but it looks like the docker server simply lets connections live a long time, and the HTTP client's default keep-alive would be lower.

[roby@roby kroxylicious-junit5-extension]$ socat - UNIX-CONNECT:/run/docker.sock
GET /version HTTP/1.1
Host: www.example.com
Connection: keep-alive

HTTP/1.1 200 OK
Api-Version: 1.43
Content-Type: application/json
Docker-Experimental: false
Ostype: linux
Server: Docker/24.0.2 (linux)
Date: Mon, 17 Jul 2023 07:06:16 GMT
Content-Length: 780

{"Platform":{"Name":""},"Components":[{"Name":"Engine","Version":"24.0.2","Details":{"ApiVersion":"1.43","Arch":"amd64","BuildTime":"2023-05-29T15:50:06.000000000+00:00","Experimental":"false","GitCommit":"659604f9ee","GoVersion":"go1.20.4","KernelVersion":"6.4.2-arch1-1","MinAPIVersion":"1.12","Os":"linux"}},{"Name":"containerd","Version":"v1.7.2","Details":{"GitCommit":"0cae528dd6cb557f7201036e9f43420650207b58.m"}},{"Name":"runc","Version":"1.1.7","Details":{"GitCommit":""}},{"Name":"docker-init","Version":"0.19.0","Details":{"GitCommit":"de40ad0"}}],"Version":"24.0.2","ApiVersion":"1.43","MinAPIVersion":"1.12","GitCommit":"659604f9ee","GoVersion":"go1.20.4","Os":"linux","Arch":"amd64","KernelVersion":"6.4.2-arch1-1","BuildTime":"2023-05-29T15:50:06.000000000+00:00"}
GET /version HTTP/1.1
Host: www.example.com
Connection: keep-alive

HTTP/1.1 200 OK
Api-Version: 1.43
Content-Type: application/json
Docker-Experimental: false
Ostype: linux
Server: Docker/24.0.2 (linux)
Date: Mon, 17 Jul 2023 07:28:50 GMT
Content-Length: 780

{"Platform":{"Name":""},"Components":[{"Name":"Engine","Version":"24.0.2","Details":{"ApiVersion":"1.43","Arch":"amd64","BuildTime":"2023-05-29T15:50:06.000000000+00:00","Experimental":"false","GitCommit":"659604f9ee","GoVersion":"go1.20.4","KernelVersion":"6.4.2-arch1-1","MinAPIVersion":"1.12","Os":"linux"}},{"Name":"containerd","Version":"v1.7.2","Details":{"GitCommit":"0cae528dd6cb557f7201036e9f43420650207b58.m"}},{"Name":"runc","Version":"1.1.7","Details":{"GitCommit":""}},{"Name":"docker-init","Version":"0.19.0","Details":{"GitCommit":"de40ad0"}}],"Version":"24.0.2","ApiVersion":"1.43","MinAPIVersion":"1.12","GitCommit":"659604f9ee","GoVersion":"go1.20.4","Os":"linux","Arch":"amd64","KernelVersion":"6.4.2-arch1-1","BuildTime":"2023-05-29T15:50:06.000000000+00:00"}

So that's 20 minutes between 2 requests in one socat session (I killed it with ctrl-c), whereas if I try the same with podman on Linux:

socat - UNIX-CONNECT:/run/user/1000/podman/podman.sock
GET /version HTTP/1.1
Host: www.example.com
Connection: keep-alive

HTTP/1.1 200 OK
Api-Version: 1.41
Content-Type: application/json
Libpod-Api-Version: 4.5.1
Server: Libpod/4.5.1 (linux)
X-Reference-Id: 0xc000507298
Date: Mon, 17 Jul 2023 07:35:55 GMT
Content-Length: 1063

{"Platform":{"Name":"linux/amd64/arch-unknown"},"Components":[{"Name":"Podman Engine","Version":"4.5.1","Details":{"APIVersion":"4.5.1","Arch":"amd64","BuildTime":"2023-05-27T10:19:54+12:00","Experimental":"false","GitCommit":"9eef30051c83f62816a1772a743e5f1271b196d7-dirty","GoVersion":"go1.20.4","KernelVersion":"6.4.2-arch1-1","MinAPIVersion":"4.0.0","Os":"linux"}},{"Name":"Conmon","Version":"conmon version 2.1.7, commit: f633919178f6c8ee4fb41b848a056ec33f8d707d","Details":{"Package":"/usr/bin/conmon is owned by conmon 1:2.1.7-1"}},{"Name":"OCI Runtime (crun)","Version":"crun version 1.8.5\ncommit: b6f80f766c9a89eb7b1440c0a70ab287434b17ed\nrundir: /run/user/1000/crun\nspec: 1.0.0\n+SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL","Details":{"Package":"/usr/bin/crun is owned by crun 1.8.5-1"}}],"Version":"4.5.1","ApiVersion":"1.41","MinAPIVersion":"1.24","GitCommit":"9eef30051c83f62816a1772a743e5f1271b196d7-dirty","GoVersion":"go1.20.4","Os":"linux","Arch":"amd64","KernelVersion":"6.4.2-arch1-1","BuildTime":"2023-05-27T10:19:54+12:00"}

real	0m10.959s
user	0m0.003s
sys	0m0.003s

The connection is killed at 10 seconds, which looks like the IdleTimeout of the podman HTTP server: the default 5-second API timeout, times 2.

@robobario

I noticed the reproducer succeeds when you remove the parallelism (so only one connection is created in the HTTP client pool). I think this is because the HTTP client will retry once by default; if you have a pool of multiple connections and all of them have been closed on the server side, that single retry can fail too when it leases a second closed connection from the pool.
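The arithmetic can be sketched with a toy pool model (a simplification, not HttpClient's actual code): each attempt leases the oldest pooled connection, and a dead one costs the attempt.

```python
def execute(pool, retries=1):
    """Toy request execution: lease one pooled connection per attempt;
    an empty pool means a fresh (working) connection is opened."""
    for _ in range(1 + retries):
        alive = pool.pop(0) if pool else True
        if alive:
            return "200 OK"
    raise IOError("Broken pipe")

# Two server-closed connections in the pool: the initial attempt and the
# single default retry both lease a dead connection, so the request fails.
try:
    execute(pool=[False, False], retries=1)
    outcome = "succeeded"
except IOError:
    outcome = "Broken pipe"
print(outcome)                             # Broken pipe

# With only one pooled connection (no parallelism), the retry falls through
# to a fresh connection and succeeds.
print(execute(pool=[False], retries=1))    # 200 OK
```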

It's visible in the earlier attached log, podmanTimeout.log:

2340-2023-07-17 05:21:42 DEBUG com.github.dockerjava.zerodep.shaded.org.apache.hc.client5.http.wire:106 - http-outgoing-1 >> "[write] I/O error: Broken pipe"
2341-2023-07-17 05:21:42 DEBUG com.github.dockerjava.zerodep.shaded.org.apache.hc.client5.http.impl.io.DefaultManagedHttpClientConnection:125 - http-outgoing-1: Close connection
2342-2023-07-17 05:21:42 DEBUG com.github.dockerjava.zerodep.shaded.org.apache.hc.client5.http.wire:92 - http-outgoing-1 >> "GET /v1.32/version HTTP/1.1[\r][\n]"
2343-2023-07-17 05:21:42 DEBUG com.github.dockerjava.zerodep.shaded.org.apache.hc.client5.http.wire:92 - http-outgoing-1 >> "accept: application/json[\r][\n]"
2344-2023-07-17 05:21:42 DEBUG com.github.dockerjava.zerodep.shaded.org.apache.hc.client5.http.wire:92 - http-outgoing-1 >> "x-tc-sid: 32532a43-a66e-48c2-9466-0135e16c4a27[\r][\n]"
2345-2023-07-17 05:21:42 DEBUG com.github.dockerjava.zerodep.shaded.org.apache.hc.client5.http.wire:92 - http-outgoing-1 >> "Accept-Encoding: gzip, x-gzip, deflate[\r][\n]"
2346-2023-07-17 05:21:42 DEBUG com.github.dockerjava.zerodep.shaded.org.apache.hc.client5.http.wire:92 - http-outgoing-1 >> "Host: localhost:2375[\r][\n]"
2347-2023-07-17 05:21:42 DEBUG com.github.dockerjava.zerodep.shaded.org.apache.hc.client5.http.wire:92 - http-outgoing-1 >> "Connection: keep-alive[\r][\n]"
2348-2023-07-17 05:21:42 DEBUG com.github.dockerjava.zerodep.shaded.org.apache.hc.client5.http.wire:92 - http-outgoing-1 >> "User-Agent: Apache-HttpClient/5.0.3 (Java/20)[\r][\n]"
2349-2023-07-17 05:21:42 DEBUG com.github.dockerjava.zerodep.shaded.org.apache.hc.client5.http.wire:92 - http-outgoing-1 >> "[\r][\n]"
2350-2023-07-17 05:21:42 DEBUG com.github.dockerjava.zerodep.shaded.org.apache.hc.client5.http.wire:106 - http-outgoing-1 >> "[write] I/O error: Broken pipe"
2351-2023-07-17 05:21:42 DEBUG com.github.dockerjava.zerodep.shaded.org.apache.hc.client5.http.impl.classic.InternalHttpClient:239 - ep-0000002B: endpoint closed
2352-2023-07-17 05:21:42 DEBUG com.github.dockerjava.zerodep.shaded.org.apache.hc.client5.http.impl.classic.InternalHttpClient:243 - ep-0000002B: discarding endpoint
2353-2023-07-17 05:21:42 DEBUG com.github.dockerjava.zerodep.shaded.org.apache.hc.client5.http.impl.io.PoolingHttpClientConnectionManager:349 - ep-0000002B: releasing endpoint
2354-2023-07-17 05:21:42 DEBUG com.github.dockerjava.zerodep.shaded.org.apache.hc.client5.http.impl.io.PoolingHttpClientConnectionManager:372 - ep-0000002B: connection is not kept alive
2355-2023-07-17 05:21:42 DEBUG com.github.dockerjava.zerodep.shaded.org.apache.hc.client5.http.impl.io.PoolingHttpClientConnectionManager:381 - ep-0000002B: connection released [route: {}->unix://localhost:2375][total available: 9; route allocated: 9 of 2147483647; total allocated: 9 of 2147483647]
2356-2023-07-17 05:21:42 DEBUG com.github.dockerjava.zerodep.shaded.org.apache.hc.client5.http.impl.classic.HttpRequestRetryExec:107 - ex-0000002C: Broken pipe
... snip giant exception
2023-07-17 05:21:42 INFO  com.github.dockerjava.zerodep.shaded.org.apache.hc.client5.http.impl.classic.HttpRequestRetryExec:110 - Recoverable I/O exception (java.io.IOException) caught when processing request to {}->unix://localhost:2375
2023-07-17 05:21:42 DEBUG com.github.dockerjava.zerodep.shaded.org.apache.hc.client5.http.protocol.RequestAddCookies:114 - Cookie spec selected: strict
2023-07-17 05:21:42 DEBUG com.github.dockerjava.zerodep.shaded.org.apache.hc.client5.http.protocol.RequestAuthCache:76 - Auth cache not set in the context
2023-07-17 05:21:42 DEBUG com.github.dockerjava.zerodep.shaded.org.apache.hc.client5.http.impl.classic.ProtocolExec:161 - ex-0000002C: target auth state: UNCHALLENGED
2023-07-17 05:21:42 DEBUG com.github.dockerjava.zerodep.shaded.org.apache.hc.client5.http.impl.classic.ProtocolExec:167 - ex-0000002C: proxy auth state: UNCHALLENGED
2023-07-17 05:21:42 DEBUG com.github.dockerjava.zerodep.shaded.org.apache.hc.client5.http.impl.classic.ConnectExec:116 - ex-0000002C: acquiring connection with route {}->unix://localhost:2375
2023-07-17 05:21:42 DEBUG com.github.dockerjava.zerodep.shaded.org.apache.hc.client5.http.impl.classic.InternalHttpClient:101 - ex-0000002C: acquiring endpoint (3 MINUTES)
2023-07-17 05:21:42 DEBUG com.github.dockerjava.zerodep.shaded.org.apache.hc.client5.http.impl.io.PoolingHttpClientConnectionManager:258 - ex-0000002C: endpoint lease request (3 MINUTES) [route: {}->unix://localhost:2375][total available: 9; route allocated: 9 of 2147483647; total allocated: 9 of 2147483647]
2023-07-17 05:21:42 DEBUG com.github.dockerjava.zerodep.shaded.org.apache.hc.client5.http.impl.io.PoolingHttpClientConnectionManager:283 - ex-0000002C: endpoint leased [route: {}->unix://localhost:2375][total available: 8; route allocated: 9 of 2147483647; total allocated: 9 of 2147483647]
2023-07-17 05:21:42 DEBUG com.github.dockerjava.zerodep.shaded.org.apache.hc.client5.http.impl.io.PoolingHttpClientConnectionManager:319 - ex-0000002C: acquired ep-0000002C
2023-07-17 05:21:42 DEBUG com.github.dockerjava.zerodep.shaded.org.apache.hc.client5.http.impl.classic.InternalHttpClient:120 - ex-0000002C: acquired endpoint ep-0000002C
2023-07-17 05:21:42 DEBUG com.github.dockerjava.zerodep.shaded.org.apache.hc.client5.http.impl.classic.MainClientExec:102 - ex-0000002C: executing GET /v1.32/version HTTP/1.1
2023-07-17 05:21:42 DEBUG com.github.dockerjava.zerodep.shaded.org.apache.hc.client5.http.impl.classic.InternalHttpClient:213 - ep-0000002C: start execution ex-0000002C
2023-07-17 05:21:42 DEBUG com.github.dockerjava.zerodep.shaded.org.apache.hc.client5.http.impl.io.PoolingHttpClientConnectionManager:594 - ep-0000002C: executing exchange ex-0000002C over http-outgoing-2
2023-07-17 05:21:42 DEBUG com.github.dockerjava.zerodep.shaded.org.apache.hc.client5.http.headers:169 - http-outgoing-2 >> GET /v1.32/version HTTP/1.1
2023-07-17 05:21:42 DEBUG com.github.dockerjava.zerodep.shaded.org.apache.hc.client5.http.headers:172 - http-outgoing-2 >> accept: application/json
2023-07-17 05:21:42 DEBUG com.github.dockerjava.zerodep.shaded.org.apache.hc.client5.http.headers:172 - http-outgoing-2 >> x-tc-sid: 32532a43-a66e-48c2-9466-0135e16c4a27
2023-07-17 05:21:42 DEBUG com.github.dockerjava.zerodep.shaded.org.apache.hc.client5.http.headers:172 - http-outgoing-2 >> Host: localhost:2375
2023-07-17 05:21:42 DEBUG com.github.dockerjava.zerodep.shaded.org.apache.hc.client5.http.headers:172 - http-outgoing-2 >> Connection: keep-alive
2023-07-17 05:21:42 DEBUG com.github.dockerjava.zerodep.shaded.org.apache.hc.client5.http.headers:172 - http-outgoing-2 >> User-Agent: Apache-HttpClient/5.0.3 (Java/20)
2023-07-17 05:21:42 DEBUG com.github.dockerjava.zerodep.shaded.org.apache.hc.client5.http.wire:92 - http-outgoing-2 >> "GET /v1.32/version HTTP/1.1[\r][\n]"
2023-07-17 05:21:42 DEBUG com.github.dockerjava.zerodep.shaded.org.apache.hc.client5.http.wire:92 - http-outgoing-2 >> "accept: application/json[\r][\n]"
2023-07-17 05:21:42 DEBUG com.github.dockerjava.zerodep.shaded.org.apache.hc.client5.http.wire:92 - http-outgoing-2 >> "x-tc-sid: 32532a43-a66e-48c2-9466-0135e16c4a27[\r][\n]"
2023-07-17 05:21:42 DEBUG com.github.dockerjava.zerodep.shaded.org.apache.hc.client5.http.wire:92 - http-outgoing-2 >> "Host: localhost:2375[\r][\n]"
2023-07-17 05:21:42 DEBUG com.github.dockerjava.zerodep.shaded.org.apache.hc.client5.http.wire:92 - http-outgoing-2 >> "Connection: keep-alive[\r][\n]"
2023-07-17 05:21:42 DEBUG com.github.dockerjava.zerodep.shaded.org.apache.hc.client5.http.wire:92 - http-outgoing-2 >> "User-Agent: Apache-HttpClient/5.0.3 (Java/20)[\r][\n]"
2023-07-17 05:21:42 DEBUG com.github.dockerjava.zerodep.shaded.org.apache.hc.client5.http.wire:92 - http-outgoing-2 >> "[\r][\n]"
2023-07-17 05:21:42 DEBUG com.github.dockerjava.zerodep.shaded.org.apache.hc.client5.http.wire:106 - http-outgoing-2 >> "[write] I/O error: Broken pipe"

The request fails on connection http-outgoing-1, then retries on http-outgoing-2, getting a broken pipe on each.


robobario commented Jul 19, 2023

So some changes to the HTTP client that would help it work with the default podman configuration:

  1. set connectionKeepAlive on the RequestConfig to something <= 10 seconds. This means that if a connection has been idle in the client pool for longer than that, it will be closed.
  2. set maxConnsPerRoute to some lower value, say 3, and increase the number of retries to maxConnsPerRoute + 1, so that in the worst case we try every connection in the pool for the route, plus one fresh one. edit: this doesn't work because the client will not retry POST requests by default (it considers them non-idempotent).
  3. set validateOnInactivity to a non-negative value so connections are stale-checked after some idle time, before the pool leases them again. edit: this doesn't work because it blocks in isStale while the connection is active, so it would block forever if used with docker.

But I'm not sure of the best way to slot that into the code. Are any of these something that would be safe to apply to both docker and podman? I'm wondering if testcontainers could set the connectionKeepAlive to 8 seconds for all Apache HttpClient usages. The cost would be some docker connections being closed that could have been re-used.

Or we could enable validateOnInactivity for both docker and podman at 8 seconds, so the client stale-checks every connection that has been idle in the pool for more than 8s.
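Option 1 can be sketched as follows (a toy pool, not the docker-java/HttpClient implementation; the proposed 8-second keep-alive is shrunk for the example): connections returned to the pool carry a timestamp, and the lease path discards anything idle longer than the keep-alive instead of handing out a probably-dead socket.

```python
import time

class Pool:
    """Toy connection pool honouring a keep-alive shorter than the server's
    idle timeout, so stale connections are discarded rather than leased."""

    def __init__(self, keepalive):
        self.keepalive = keepalive
        self.idle = []                    # (connection, time it was returned)

    def release(self, conn):
        self.idle.append((conn, time.monotonic()))

    def lease(self):
        while self.idle:
            conn, returned = self.idle.pop(0)
            if time.monotonic() - returned <= self.keepalive:
                return conn               # still within keep-alive: reuse it
            # expired: discard it rather than hand out a connection the
            # server has probably already closed
        return "fresh-connection"

pool = Pool(keepalive=0.1)                # stands in for the proposed 8 s
pool.release("conn-1")
time.sleep(0.2)                           # idle past the keep-alive
leased = pool.lease()
print(leased)                             # fresh-connection, not stale conn-1
```

The trade-off mentioned above is visible here: "conn-1" might still have been usable against docker, but it is thrown away to stay inside podman's window.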

edit: I'm happy to contribute this to docker-java/testcontainers

robobario added a commit to robobario/docker-java that referenced this issue Jul 20, 2023
Why:
We have been experiencing problems with podman and testcontainers using a
globally cached ZerodepDockerHttpClient. The problem is that, with its default
configuration, podman closes idle connections after 10 seconds, while the
Apache HTTP client's default connection keep-alive is 3 minutes. So we see
exceptions thrown when the client attempts to use these stale connections
from its connection pool.

Two configurations that could help: reducing the connectionKeepAlive, so we
could configure the pool to close connections in line with this podman
timeout; or enabling stale connection checking after a connection has been
idle in the client pool for some time.

testcontainers/testcontainers-java#7310
robobario added a commit to robobario/docker-java that referenced this issue Jul 20, 2023
robobario added a commit to robobario/docker-java that referenced this issue Jul 21, 2023
Why:
We have been experiencing problems with podman and testcontainers using a
globally cached ZerodepDockerHttpClient. The problem is that, with its default
configuration, podman closes idle connections after 10 seconds, while the
Apache HTTP client's default connection keep-alive is 3 minutes. So we see
exceptions thrown when the client attempts to use these stale connections
from its connection pool.

A configuration that could help is reducing the connectionKeepAlive, so we
could configure the pool to close connections in line with this podman timeout.

Note: stale connection checking does not work and blocks in isStale; see docker-java#1726

testcontainers/testcontainers-java#7310
@jcechace

Any progress on this? containers/podman#17640 seems to have settled the decision: podman is going to stick with the 10s timer, communicated to the client via the Keep-Alive header.

@bluejack2000

My current workaround in tests is to use .withStartupAttempts(2) with podman.

robobario added a commit to robobario/docker-java that referenced this issue Jul 15, 2024

mocenas commented Nov 20, 2024

Any progress on this?

Just adding that we have the same problem on Windows with podman; it throws:

...
com.github.dockerjava.zerodep.shaded.org.apache.hc.core5.http.NoHttpResponseException: localhost:2375 failed to respond
...

@SamBarker
Author

We've been using

echo 'mkdir -p /etc/containers/containers.conf.d && printf "[engine]\nservice_timeout=91\n" > /etc/containers/containers.conf.d/service-timeout.conf && systemctl restart podman.socket' |  podman machine ssh --username root --

as a workaround for people running podman remote (i.e. macOS and presumably Windows).


acerspyro commented Feb 16, 2025

We've been using

echo 'mkdir -p /etc/containers/containers.conf.d && printf "[engine]\nservice_timeout=91\n" > /etc/containers/containers.conf.d/service-timeout.conf && systemctl restart podman.socket' | podman machine ssh --username root --

as a workaround for people running podman remote (i.e. macOS and presumably Windows).

Thank you very much for the hint!

With IntelliJ IDEA, I had to set mine to 300 (or even 600); otherwise it would time out before I could finish building my containers.

Why 91 specifically? And why is there a timeout at all?

@SamBarker
Author

Why 91 specifically? And why is there a timeout at all?

I don't remember the specifics of why we picked it; we were trying to balance the needs of our test suite with how podman is intended to operate. The 1 was to avoid a round number, to reduce the chance of colliding with some other timeout (we may have had a call that timed out at 90s, but I don't think so).


7 participants