[http_client] broken connection to firehose.eu-west-1.amazonaws.com:443 #354
This is the current recommendation for the CloudWatch plugin config. Could you try this config? Also, I wonder how your load testing is run. It doesn't make sense to me that these errors show up only at low throughput but not at high throughput.
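For reference, a minimal sketch of the keepalive tuning that this kind of guidance typically points to for the cloudwatch_logs output; the log group name and the numeric values here are illustrative, not taken from this thread:

```
[OUTPUT]
    Name                       cloudwatch_logs
    Match                      *
    region                     eu-west-1
    log_group_name             my-app-logs        # illustrative name
    log_stream_prefix          ecs-
    auto_create_group          true
    # Recycle idle keepalive connections before the AWS endpoint
    # silently drops them; the values below are illustrative.
    net.keepalive              On
    net.keepalive_idle_timeout 4
    net.keepalive_max_recycle  75
```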
The above graphs are from a load test we ran on our application, along with the metrics generated by FireLens during that time. From what I see in the guidance, this config helps for high-throughput cases, which is not the problem here. Should I try it anyway?
Tried with the new config, and I'm still seeing the same errors:
Any news on this, please?
@LucasHantz, may I confirm that the issue only occurs at low throughput? That is, you run Fluent Bit in the same way with the same config, and you only see problems at a lower ingestion rate, right? May I know what the throughput is?
The issue in fact happens at both low and high throughput. The following graph shows the number of records per minute over the last 8 hours. As you can see, twice in that window Fluent Bit stalled and stopped reporting any new logs, until the Fluent Bit container's memory blew up and forced the whole task to shut down.
Any thoughts on this? What more can I provide to help figure out this problem?
Just saw the issue raised in fluent/fluent-bit#5705. I'm getting this error as well in our traces.
@PettitWesley maybe? Is there any way to get this prioritized? It's impacting our production, and I don't see how to revert to a stable setup.
@LucasHantz Unfortunately, right now I don't have any good ideas beyond using the settings here: #340, and checking this: https://github.com/aws/aws-for-fluent-bit/blob/mainline/troubleshooting/debugging.md#network-connection-issues
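The linked debugging guide suggests turning up the log level to see connection setup and teardown in detail; a minimal sketch using the standard Fluent Bit [SERVICE] option:

```
[SERVICE]
    # Verbose output, including connection lifecycle events, to help
    # pinpoint where the broken-connection errors originate.
    Log_Level debug
```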
Describe the question/issue
I'm seeing broken connection errors to Firehose and CloudWatch on containers with low traffic, as they are in a staging environment.
Once they log this error, RAM usage keeps growing until it reaches the maximum threshold and the task is killed.
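Not part of the original report, but one standard way to keep a stalled output from growing RAM without bound is to cap in-memory buffering and spill to disk. A hedged sketch using stock Fluent Bit buffering options (with FireLens the forward input is generated for you, so in practice this would go in a custom config file; the path and limit are illustrative):

```
[SERVICE]
    # Spill buffered chunks to disk instead of holding them all in memory.
    storage.path /var/log/flb-storage/

[INPUT]
    Name          forward        # FireLens injects a forward input like this
    Mem_Buf_Limit 50MB           # cap in-memory buffering; illustrative value
    storage.type  filesystem     # overflow goes to storage.path above
```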
Configuration
Fluent Bit Log Output
Fluent Bit Version Info
I can reproduce the same error with both the stable and latest versions of the image.
Cluster Details
ECS Fargate with awsvpc networking.
The Firehose and CloudWatch VPC endpoints are enabled.
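One network-level knob the aws-for-fluent-bit debugging guide mentions for awsvpc/VPC-endpoint setups is forcing DNS lookups over TCP. Whether it applies here is an assumption; the option itself is a standard Fluent Bit upstream setting and would be added to the existing outputs:

```
[OUTPUT]
    # ... existing Firehose/CloudWatch output settings ...
    # Some awsvpc DNS paths misbehave over UDP; TCP mode can work around it.
    net.dns.mode TCP
```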
Application Details
NTR
Steps to reproduce issue
We have run load tests on the container with the same configuration without noticing this error, so it seems the error happens when throughput is low.
Related Issues
This is the new configuration I've come up with based on the recommendation given here:
#351
Let me know if I did something wrong.
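The configuration itself wasn't captured in this extract. As a hedged sketch, a kinesis_firehose output with the keepalive tuning from the linked recommendation might look like the following (the delivery stream name and values are illustrative):

```
[OUTPUT]
    Name                       kinesis_firehose
    Match                      *
    region                     eu-west-1
    delivery_stream            my-delivery-stream   # illustrative name
    net.keepalive              On
    net.keepalive_idle_timeout 4
    net.keepalive_max_recycle  75
```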