v2.6.4 php-fpm gets stuck in read state #5420
Comments
Thanks for the report. What is your config? |
Caddyfile:
|
That's odd. Was it broken on v2.6.2 as well? Would you be able to set up a minimally reproducible example? i.e. a simple PHP script that exhibits the issue, plus a Caddyfile that only has the bits that are relevant to the breakage. |
Nope, v2.6.2 ran in production just fine. I'm having trouble reproducing it locally (or on our staging environment). Maybe it has something to do with the AWS Network Load Balancer, or maybe bad bots that send malformed headers (wrong Content-Length, etc.). |
v2.6.4 should be functionally equivalent to v2.6.2 in terms of the proxy buffering functionality. I don't understand what's going on here. |
Can you try enabling access logs and maybe see if we can pin down which requests exactly are hanging? |
We have CloudWatch enabled for our access logs.
Starting from this timestamp (2023-03-05T20:01:15.211+01:00), I went back in the logs to see what was happening. |
Is your health.php making any external calls that might not be returned to PHP before Caddy gives up on the request? I worked on a project where we had reverse proxying to PHP on Docker and couldn't figure out why we would repeatedly get 5xx errors and no running PHP. It turned out that we were making requests with Curl to a flaky service and someone had inadvertently set |
Nope:
<?php
const MAX_LOAD = 25;
$load = sys_getloadavg();
if ($load[0] > MAX_LOAD) {
http_response_code(503);
exit('503');
}
if (!isset($_SERVER['APP_ENV'])) {
$root = __DIR__.'/..';
if (!file_exists($root.'/.env')) {
http_response_code(500);
exit('unknown environment');
}
require_once $root.'/vendor/autoload.php';
Dotenv\Dotenv::createImmutable($root)->load();
}
exit('ok');
And if I understand strace correctly, it's PHP that is stuck trying to read on the incoming connection from Caddy. |
Thanks, the health.php file you provided looks fine. I stand by my theory though that you have something else that's causing PHP to not return quickly enough for your call to /health.php. I did create a reproduction using Docker Compose and put it in this repo: https://github.com/trea/php-fpm-issue If you follow the README (and you may have to run the healthcheck a few times), you end up with requests that take too long. The healthcheck fails and then autoheal restarts it and Caddy can't connect to PHP anymore until it's finished restarting:
Log Output
I would recommend setting up a slowlog for FPM and ensuring that the ptrace capability is available, that way you should be able to get a better idea of what is causing FPM to hang. In this test case the output points to
|
There are indeed some calls that incidentally take too long. Maybe #5431 is related |
Actually, after looking into this more, I think you were right when you suggested #5367 was related, and I do think #5431 is related as well. I think they specifically go back to this change: The FastCGI specification states that Responders read up to the total of CONTENT_LENGTH, but as a result of 5367, that is now empty for requests with I have updated my repo to include both versions of Caddy and two more services in the Docker Compose file I understand the reverting of the buffer settings, but I think Caddy should guarantee that I think changing this to guard against empty strings and send a 0 instead of an empty string for In fact, it's done for the FastCGI protocol elsewhere 🤔 : |
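The guard suggested above (send `0` rather than an empty string) could look roughly like the following. This is a hypothetical sketch, not Caddy's actual code: the function names `fastcgiContentLength` and `envEntry` are made up for illustration. The only assumption taken from Go itself is that `net/http` reports an unknown body length, such as a chunked request, as `-1`.

```go
package main

import (
	"fmt"
	"strconv"
)

// fastcgiContentLength returns the value to send as the FastCGI
// CONTENT_LENGTH parameter. Go's net/http reports an unknown body length
// (for example a chunked request) as -1, so anything that is not a positive
// length is sent as "0" rather than as an empty string.
// Hypothetical helper, not part of Caddy.
func fastcgiContentLength(contentLength int64) string {
	if contentLength > 0 {
		return strconv.FormatInt(contentLength, 10)
	}
	return "0"
}

// envEntry renders the parameter the way it would appear in a FastCGI
// environment dump. Hypothetical helper for the demo below.
func envEntry(contentLength int64) string {
	return fmt.Sprintf("CONTENT_LENGTH=%s", fastcgiContentLength(contentLength))
}

func main() {
	// 9: a normal body, 0: no body, -1: unknown length (chunked).
	for _, n := range []int64{9, 0, -1} {
		fmt.Println(envEntry(n))
	}
}
```

Note that a guard like this would only keep the responder from waiting on a length it was never told. The body of a chunked request would still not reach PHP unless it is buffered first so that its real length is known.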
Ah, interesting. That does make sense @trea. PR is definitely welcome :) |
…PHP-FPM requests from hanging Fixes: caddyserver#5420
This turned out to be more complicated than I expected (weird! 😅 ) Setting the However, I suspect this will mean only preventing those requests from hanging and not doing anything about any underlying issue.
I discovered that FastCGI doesn't allow for transfer encodings, and indeed Caddy's reverse proxying rightly categorizes the Caddy accounts for this in its HTTP Transport for reverse proxying and allows for buffering by default. The issue seems to just be that it doesn't do so for FastCGI.
For a point of comparison, nginx has buffering for fastcgi on by default 4/8k in memory and then to tmpfile up to 1gb both configurable limits.
As a recap: Prior to Caddy 2.6.3: Requests to PHP backends that had |
@trea Excellent, thanks for confirming. We'll make note of the need for FastCGI to buffer. @WeidiDeng is currently our FastCGI expert -- I'm not sure we're ready to implement temp file buffers though, but maybe we are? I dunno. I'm sure we'd also accept a PR for that too if you're interested 😉 Thank you for the carefully-researched contribution and the effort you have made for an elegant fix! |
You're very welcome! Glad I could dig into it! I might have a look at buffering for the transport at some point 😄 I think a few intermediary steps could be taken:
My concern mostly is that yes Caddy will now not OOM, but PHP upstreams will not work correctly for chunked requests and I suspect that'll cause some headaches. 😅 |
Hi @mholt ; You marked this problem as solved in 2.7.6, but we continue to experience the same problem in requests made with --header "Transfer-Encoding: chunked". I encounter the same problem when I try to compile from the master branch.
For our customers who need to use this header, we are forced to revert to version 2.6.3.
I might be missing something. Thanks. |
@ahmetsoguksu Can you post a minimally reproducible test case? For example, a Caddyfile config (ideally one that is self-contained, as setting up backend apps and such is time-consuming), and a |
Hi @mholt, since we are experiencing the problem with php-fpm, I have to include it in the test scenario as well. Let's start with version 2.7.6:
$ caddy version
v2.7.6 h1:w0NymbG2m9PcvKWsrXO6EEkY9Ru4FJK8uQbYcev1p3A=
We can test it with some simple PHP code.
$ cat wwwroot/index.php
<?php
echo "Hello";
I used a container for php-fpm; you can use the same compose file just by changing the path of index.php.
$ cat docker-compose.yml
version: "3.7"
services:
  php-fpm:
    image: php:8.1-fpm
    ports:
      - "9000:9000"
    volumes:
      - /home/ahmetsoguksu/projects/php-fpm/wwwroot:/home/ahmetsoguksu/projects/php-fpm/wwwroot
The Caddyfile is like this:
$ cat /etc/caddy/Caddyfile
:80 {
	root * /home/ahmetsoguksu/projects/php-fpm/wwwroot
	php_fastcgi localhost:9000
	encode gzip zstd
	file_server
}
I see that "Hello" is returned in a normal request:
$ curl 127.0.0.1
Hello
When I send a request with the "Transfer-Encoding:chunked" header, I do not receive a response:
$ curl 127.0.0.1 --header "Pragma: no-cache" -X POST -d file=test --header "Transfer-Encoding:chunked" -vvv
Note: Unnecessary use of -X or --request, POST is already inferred.
* Trying 127.0.0.1:80...
* Connected to 127.0.0.1 (127.0.0.1) port 80 (#0)
> POST / HTTP/1.1
> Host: 127.0.0.1
> User-Agent: curl/7.81.0
> Accept: */*
> Pragma: no-cache
> Transfer-Encoding:chunked
> Content-Type: application/x-www-form-urlencoded
>
I don't experience any problems when I try it with version 2.6.3.
$ wget https://github.com/caddyserver/caddy/releases/download/v2.6.3/caddy_2.6.3_linux_amd64.deb
$ sudo dpkg -i caddy_2.6.3_linux_amd64.deb
$ caddy version
v2.6.3 h1:QRVBNIqfpqZ1eJacY44I6eUC1OcxQ8D04EKImzpj7S8=
$ sudo systemctl restart caddy
$ curl 127.0.0.1 --header "Pragma: no-cache" -X POST -d file=test --header "Transfer-Encoding:chunked" -vvv
Note: Unnecessary use of -X or --request, POST is already inferred.
* Trying 127.0.0.1:80...
* Connected to 127.0.0.1 (127.0.0.1) port 80 (#0)
> POST / HTTP/1.1
> Host: 127.0.0.1
> User-Agent: curl/7.81.0
> Accept: */*
> Pragma: no-cache
> Transfer-Encoding:chunked
> Content-Type: application/x-www-form-urlencoded
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Content-Type: text/html; charset=UTF-8
< Server: Caddy
< X-Powered-By: PHP/8.1.27
< Date: Thu, 07 Mar 2024 13:30:34 GMT
< Content-Length: 5
<
* Connection #0 to host 127.0.0.1 left intact
Hello
Thanks. |
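For anyone who prefers a scripted check over curl, the chunked POST above can be approximated with a small Go client. This is a sketch under assumptions: the target address `http://127.0.0.1/` and the 5-second timeout are arbitrary choices, and `newChunkedPost` and `sendChunked` are illustrative names. It relies on Go's documented behaviour that a request with a body and an unknown `ContentLength` (`-1`) is sent with `Transfer-Encoding: chunked`.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
	"time"
)

// newChunkedPost builds a POST whose body length is declared unknown, which
// makes Go's HTTP/1.1 client send it with Transfer-Encoding: chunked.
func newChunkedPost(url, body string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPost, url, strings.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.ContentLength = -1 // unknown length, so no Content-Length header is sent
	req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
	req.Header.Set("Pragma", "no-cache")
	return req, nil
}

// sendChunked sends the request with a client timeout, so that a hung backend
// shows up as an error instead of blocking forever.
func sendChunked(url, body string, timeout time.Duration) (string, error) {
	req, err := newChunkedPost(url, body)
	if err != nil {
		return "", err
	}
	client := &http.Client{Timeout: timeout}
	resp, err := client.Do(req)
	if err != nil {
		return "", fmt.Errorf("request failed (a timeout here suggests the hang): %w", err)
	}
	defer resp.Body.Close()
	data, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("%s: %s", resp.Status, data), nil
}

func main() {
	// Assumed address of the Caddy site under test; adjust as needed.
	out, err := sendChunked("http://127.0.0.1/", "file=test", 5*time.Second)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(out)
}
```

The timeout is the important design choice here: without it, the client would simply sit on the open connection the same way the curl session above does.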
@mholt do you have any comments about @ahmetsoguksu's latest comment? Thank you. |
What are the contents of the file named One thought I have is that we/Go recently fixed a bug related to concurrently responding to requests while still reading from them (i.e. before the payload is finished). Does |
Based on this (#5420 (comment)), we first upgraded Caddy from 2.6.4 to 2.7.6 and then enabled FastCGI buffering like this:
Now it doesn't hang. |
@mholt Could you help us implement fastcgi buffering like Nginx: "nginx has buffering for fastcgi on by default 4/8k in memory and then to tmpfile up to 1gb both configurable limits." With this setting in Caddyfile ( Thanks. |
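To make the request concrete, here is a minimal sketch of "memory first, then temp file" buffering in Go. It is not how Caddy or nginx implement it: the names `spool`, `spooledBody` and `spoolString`, the temp file pattern, and the 4096-byte threshold in the demo are all assumptions for illustration, and unlike nginx the sketch has no upper limit on the temp file size. The point it shows is that once the whole body has been read, its size is known and could be sent as CONTENT_LENGTH.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"os"
	"strings"
)

// spooledBody is a fully read request body whose size is known up front.
type spooledBody struct {
	io.Reader
	Size    int64
	cleanup func() error // nil when the body is held in memory
}

// Close releases the temp file, if one was used.
func (s *spooledBody) Close() error {
	if s.cleanup == nil {
		return nil
	}
	return s.cleanup()
}

// spool reads src to the end. Bodies of at most memLimit bytes stay in
// memory; larger ones are written to a temp file.
func spool(src io.Reader, memLimit int64) (*spooledBody, error) {
	var mem bytes.Buffer
	// Ask for one byte more than the limit to tell "fits" from "too big".
	n, err := io.CopyN(&mem, src, memLimit+1)
	if err != nil && err != io.EOF {
		return nil, err
	}
	if n <= memLimit {
		return &spooledBody{Reader: &mem, Size: n}, nil
	}

	f, err := os.CreateTemp("", "fcgi-body-*")
	if err != nil {
		return nil, fmt.Errorf("creating temp file: %w", err)
	}
	cleanup := func() error {
		closeErr := f.Close()
		if rmErr := os.Remove(f.Name()); closeErr == nil {
			closeErr = rmErr
		}
		return closeErr
	}
	// Write what was already read, followed by the rest of the source.
	size, err := io.Copy(f, io.MultiReader(&mem, src))
	if err == nil {
		_, err = f.Seek(0, io.SeekStart)
	}
	if err != nil {
		cleanup()
		return nil, fmt.Errorf("spooling body: %w", err)
	}
	return &spooledBody{Reader: f, Size: size, cleanup: cleanup}, nil
}

// spoolString is a small convenience wrapper for the demo below.
func spoolString(s string, memLimit int64) (*spooledBody, error) {
	return spool(strings.NewReader(s), memLimit)
}

func main() {
	for _, body := range []string{"file=test", strings.Repeat("x", 10000)} {
		b, err := spoolString(body, 4096)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Printf("CONTENT_LENGTH=%d onDisk=%v\n", b.Size, b.cleanup != nil)
		b.Close()
	}
}
```

A real implementation would also need a configurable cap on the temp file size and would have to decide what to return to the client when that cap is exceeded.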
@akindemirci That's definitely something I could prioritize with a sufficient sponsorship (send me an email if you have interest in that -- see Caddy website for an email address)! |
Could we get this issue reopened? It still persists in the latest version, and it's pretty evil that you can kill a PHP application behind Caddy with simple curl commands. To be honest, I would also have rated this as a security issue, as an attacker can easily deny access to websites. Here is a simple reproducer:
FROM php:fpm-alpine
COPY --from=caddy /usr/bin/caddy /usr/bin/caddy
COPY <<EOF /etc/caddy/Caddyfile
:8000
root * /var/www/html/public
php_fastcgi localhost:9000 {
request_buffers 4k
response_buffers 4k
}
EOF
COPY <<EOF /var/www/html/public/index.php
<?php
phpinfo();
EOF
COPY <<EOF /startup
#!/usr/bin/env sh
caddy run --config /etc/caddy/Caddyfile &
php-fpm &
wait
EOF
ENTRYPOINT ["/bin/sh", "/startup"]
docker build -t caddy-issue .
docker run --rm -p 8000:8000 caddy-issue
http://localhost:8000 serves a phpinfo page. Run multiple times:
and the page does not respond anymore, as the php-fpm processes are blocked. |
@shyim this issue is for the version v2.6.4 of caddy. But your comment is about the latest version of caddy. Either way, it's tracked in the new issue. |
Since we upgraded from v2.6.3 to v2.6.4, we have been experiencing some problems with our php-fpm backend.
The problem is that the php-fpm process gets stuck in a "read" state (and does not go back to the "accept" state).
Eventually PHP's pm.max_children is reached and the entire server is killed by EC2's health checker.
I've done some debugging with strace, but have not found a pattern, because the servers get lots of traffic and the problem only occurs every once in a while (maybe once in 1,000 requests).
I'm afraid that #5367 might need to be revisited.
I've reverted back to v2.6.3 and the problem is gone.