Jaeger-collector stops storing traces #897
Comments
@jkandasa, are we using Elasticsearch for the soak tests? For how long are we running those?
This may add some more insight: #896 (comment)
After some digging, it looks like the culprit is the error reported in #779. However, I think we should disable buffering of messages that were rejected because of validation, so that Jaeger can still process and save subsequent traces.
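To illustrate the idea (this is not Jaeger's actual code, just a minimal sketch assuming the olivere/elastic bulk processor that the ES writer builds on): inspect each bulk response, and treat 4xx item failures (other than 429) as permanent validation rejections to drop rather than re-buffer. The isPermanentRejection helper and worker count are illustrative.

```go
package main

import (
	"context"
	"log"

	elastic "gopkg.in/olivere/elastic.v5"
)

// isPermanentRejection reports whether a bulk item failure is a
// validation-style error (e.g. a mapping conflict) that will never
// succeed on retry. Hypothetical classification; the status codes are
// standard Elasticsearch bulk-item statuses.
func isPermanentRejection(item *elastic.BulkResponseItem) bool {
	// 4xx (except 429 Too Many Requests) means the document itself was
	// rejected; re-buffering it would only block subsequent traces.
	return item.Status >= 400 && item.Status < 500 && item.Status != 429
}

func main() {
	client, err := elastic.NewClient(elastic.SetURL("http://localhost:9200"))
	if err != nil {
		log.Fatal(err)
	}

	after := func(id int64, reqs []elastic.BulkableRequest, resp *elastic.BulkResponse, err error) {
		if resp == nil {
			return // transport-level failure; leave retry handling to the processor
		}
		for _, item := range resp.Failed() {
			if isPermanentRejection(item) {
				// Drop the rejected document instead of keeping it buffered,
				// so the collector keeps processing and saving later traces.
				log.Printf("dropping rejected doc %s: %v", item.Id, item.Error)
			}
		}
	}

	processor, err := client.BulkProcessor().
		Name("jaeger-span-writer").
		Workers(2).
		After(after).
		Do(context.Background())
	if err != nil {
		log.Fatal(err)
	}
	defer processor.Close()
}
```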
@jpkrohling In the past I've run soak tests for up to 12 hours with as many as 5 collectors and have not seen this issue. We have been waiting for a better performance-testing setup before running longer tests. @jkandasa can help more with this in the future.
Yes.
Requirement - what kind of business use case are you trying to solve?
I want to be sure jaeger-collector collects and persists data.
Problem - what in Jaeger blocks you from solving the requirement?
We started using Jaeger in production 5 days ago. Since we began sending traces from 60+ services, it has stopped working twice; by that I mean that traces sent to this particular collector are no longer available in jaeger-query. We have 4 collectors (on prod & dev), all using the same Elasticsearch cluster to store their traces. Still, only the collectors handling production traffic break; dev traces go through. Despite the not-very-readable logs on stderr (#896), I was able to fish out a stack trace which may be helpful. The appearance of logs on stderr correlates with this collector not working; there are no logs on stderr or stdout when it operates normally.
jaeger-collector v1.5.0 deployed on a Kubernetes cluster.
A pod restart resolves the issue for some time.
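Until the root cause is fixed, a small watchdog could catch the stalled state before users notice missing traces. The hedged sketch below counts spans recently written to Elasticsearch via the standard _count API; the ES URL, the jaeger-span-* index pattern, the startTimeMillis field (which matches Jaeger's ES span schema), and the 5-minute window are all assumptions to adjust for your deployment.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"time"
)

// esURL is an assumption; point it at the cluster the collectors write to.
const esURL = "http://localhost:9200"

// recentSpanCount asks Elasticsearch how many spans were stored in the
// last `window`, using a range query on the startTimeMillis field.
func recentSpanCount(window time.Duration) (int64, error) {
	since := time.Now().Add(-window).UnixNano() / int64(time.Millisecond)
	query := fmt.Sprintf(`{"query":{"range":{"startTimeMillis":{"gte":%d}}}}`, since)

	resp, err := http.Post(esURL+"/jaeger-span-*/_count", "application/json",
		bytes.NewBufferString(query))
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()

	var body struct {
		Count int64 `json:"count"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
		return 0, err
	}
	return body.Count, nil
}

func main() {
	n, err := recentSpanCount(5 * time.Minute)
	if err != nil {
		log.Fatal(err)
	}
	if n == 0 {
		log.Print("no spans stored in the last 5 minutes; collector may be stuck")
	} else {
		log.Printf("%d spans stored in the last 5 minutes", n)
	}
}
```

Run periodically (e.g. from a cron job), this would have flagged both production incidents as soon as writes stopped, instead of requiring someone to notice missing traces in jaeger-query.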
Proposal - what do you suggest to solve the problem or improve the existing situation?
N/A
Any open questions to address
N/A