Traces in a rolled-over index are unable to be hotlinked via /trace/{id} #3152

Closed
wiardvanrij opened this issue Jul 20, 2021 · 7 comments · Fixed by #3169
Labels
documentation, needs-info, storage/elasticsearch

Comments

@wiardvanrij

Describe the bug

  • Search for a trace in your 'history' that should live in a rolled-over index.
  • You are able to find this via the search and click on it. This gives a working view of your trace at /trace/{your-id}
  • Open a new tab
  • Insert the link
  • You get a 404 when going directly to the /trace/{your-id} page

When using a trace that is 'fresh' i.e. has not been in a rollover index, the /trace/{your-id} works.

Expected behavior
Directly 'hotlinking' to any /trace/{your-id} should work, regardless of whether the trace lives in a rolled-over index.

Version (please complete the following information):

  • Jaeger version: 1.21

What troubleshooting steps did you try?
So many things, but your own tracing and debug lvls are empty.

Additional context
I'm aware of other issues before, I have searched them. The key difference here is:

  • We can find the traces via search. The problem only occurs with the direct link to /trace/{your-id}

I think, but I have no evidence for this, that the direct page on /trace/{your-id} does not account for the flag es.use-aliases

@pavolloffay
Member

Without knowing your storage configuration it's impossible to understand whether this is a bug or misconfiguration. Could you please share all jaeger flags and rollover script configuration?

@pavolloffay added the needs-info label Jul 20, 2021
@wiardvanrij
Author

We are using Elasticsearch as storage. The es.use-aliases flag is set on all the components. We use jobs for rollovers and clean up. These come from Jaeger itself.
Basically nothing special, rollover on size.

I'm open to share more tomorrow but I'm somewhat confident that this ain't a misconfiguration as everything works fine. Traces can be found/searched for. It's merely not working for specific traces that are "older".

This gives a 404 when using the URL directly, even though these traces can be found in ES. When just searching by date, these traces can be accessed. Again, this behavior is only for "older" traces.

Stats wise, we ingest about 350 billion spans per month. Or for the sane European, that's 350 miljard.

@pavolloffay
Member

The real configuration values would help here, but if you don't want to share that is fine.

Setting --es.max-span-age to match rollover's "lookback" (print('lookback configuration:')) should solve your issue.
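
As an illustration of what "matching" could look like, here is a minimal sketch that turns a lookback window into a Go-style duration string suitable for --es.max-span-age. The helper name and the unit/count parameters are hypothetical and not part of Jaeger; the sketch only relies on Go duration strings having hours as their largest unit.

```python
from datetime import timedelta

# Hypothetical helper, not part of Jaeger or its rollover script.
# Assumption: the lookback window is expressed as a unit plus a count.
_SUPPORTED_UNITS = {"minutes", "hours", "days", "weeks"}


def lookback_to_max_span_age(unit: str, unit_count: int) -> str:
    """Return a Go-style duration string covering the given lookback window."""
    if unit not in _SUPPORTED_UNITS:
        raise ValueError(f"unsupported unit: {unit!r}")
    if unit_count <= 0:
        raise ValueError("unit_count must be positive")
    window = timedelta(**{unit: unit_count})
    total_minutes = int(window.total_seconds() // 60)
    # Go durations have no day unit, so express the window in hours and minutes.
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours}h{minutes}m"


if __name__ == "__main__":
    # A 7-day lookback would correspond to --es.max-span-age=168h0m
    print(lookback_to_max_span_age("days", 7))
```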

@wiardvanrij
Author

Welp, this is awkward. That indeed fixed it. I still don't really understand how one could get a trace from the search UI part, but not from the URL. Anyhow, thank you so much!

@pavolloffay
Member

pavolloffay commented Jul 21, 2021

Let's keep this open and improve the docs so that other people don't run into the same issue.

Where did you learn how to configure rollover? In the initial setup did you configure --es.max-span-age?

We should improve the rollover docs and also document this under --es.max-span-age.

Are you using the operator as well?

@wiardvanrij
Author

I'm using the helm chart, but we've edited it quite a bit, so it is not per se in line with upstream anymore.
So we use that rollover feature there: https://github.com/jaegertracing/helm-charts/blob/main/charts/jaeger/values.yaml#L577

but instead of days, we use

    - name: CONDITIONS
      value: '{"max_size": "75gb"}'

And basically this works like a charm and you can also see your traces via the search (also on rollover'd indexes).

@albertteoh
Contributor

I still don't really understand how one could get a trace from the search UI part, but not from the URL

@wiardvanrij I think this is a good question and I'll attempt (because I didn't understand myself until looking over the code just now) to explain why this behaviour is the way it is. Feel free to correct me if I'm wrong @pavolloffay.

If you're executing the lookback action as per the docs and the max-span-age doesn't match, it may mean that some spans were evicted from the read alias, causing the search by TraceID to fail because it's looking only at the read alias. If the spans were still in the read alias, the mismatch in max-span-age could lead to the query still missing the window of time where the spans existed:

It means that old data will not be available for search. This imitates the behavior of --es.max-span-age flag used in the default index-per-day deployment.
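
The second case above (spans still in the read alias, but outside the time window) can be sketched as a simplified model. This is not Jaeger's actual code: the function names are hypothetical, the 72h value is only illustrative, and the assumption is that a lookup by trace ID only considers spans that started within max-span-age of "now", while a search uses the time range the user picked.

```python
from datetime import datetime, timedelta, timezone


def visible_by_trace_id(span_start: datetime, now: datetime,
                        max_span_age: timedelta) -> bool:
    """Model of a direct /trace/{id} lookup limited to [now - max_span_age, now]."""
    return now - max_span_age <= span_start <= now


def visible_by_search(span_start: datetime, search_from: datetime,
                      search_to: datetime) -> bool:
    """Model of a UI search limited to the user-selected time range."""
    return search_from <= span_start <= search_to


if __name__ == "__main__":
    now = datetime(2021, 7, 21, tzinfo=timezone.utc)
    old_span = now - timedelta(days=10)  # an "older" trace

    # The search covers the last 14 days, so the trace is found.
    print(visible_by_search(old_span, now - timedelta(days=14), now))
    # The direct link uses a 72h window (illustrative), so the trace is missed.
    print(visible_by_trace_id(old_span, now, timedelta(hours=72)))
    # Widening max-span-age to cover the lookback makes the direct link work.
    print(visible_by_trace_id(old_span, now, timedelta(days=14)))
```

In this model the same span is reachable through search yet missing from the direct link until the two windows agree, which is the symptom described in this issue.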
