Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

NameResolutionTelemetry event source provides average dns lookup time, which is not very informative #57553

Closed
HolyPrapor opened this issue Aug 17, 2021 · 6 comments
Milestone

Comments

@HolyPrapor
Copy link

An average time is misleading in a lot of cases because one is unable to see spikes of load.

Unfortunately, .NET NameResolutionTelemetry does not calculate informative percentiles nor it has a way to calculate those because underlying stop/fail events do not have a stopwatch (or at least hostname) to calculate latency.

I understand that underlying counters do not calculate percentiles because it's expensive and calculate average instead. (It's better than nothing)

But as far as I can see, it is very easy to add this info to events so that one could calculate whatever he needs.

Would you consider this proposal?

@dotnet-issue-labeler dotnet-issue-labeler bot added area-System.Net untriaged New issue has not been triaged by the area owner labels Aug 17, 2021
@ghost
Copy link

ghost commented Aug 17, 2021

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

Issue Details

An average time is misleading in a lot of cases because one is unable to see spikes of load.

Unfortunately, .NET NameResolutionTelemetry does not calculate informative percentiles nor it has a way to calculate those because underlying stop/fail events do not have a stopwatch (or at least hostname) to calculate latency.

I understand that underlying counters do not calculate percentiles because it's expensive and calculate average instead. (It's better than nothing)

But as far as I can see, it is very easy to add this info to events so that one could calculate whatever he needs.

Would you consider this proposal?

Author: HolyPrapor
Assignees: -
Labels:

area-System.Net, untriaged

Milestone: -

@MihaZupan
Copy link
Member

MihaZupan commented Aug 17, 2021

The NameResolution telemetry will emit Start/Stop pairs for each resolution.

The events will be fired from within the same async scope and can as such be correlated (an AsyncLocal created during Start will be visible during Stop).
From this, you can get the time taken for a specific operation by calculating the difference between the event timestamps, or calculate any other statistic you are interested in.

For an example, see https://gist.github.com/MihaZupan/ba499563e5d602d990872fd9f5c35229 that is making use of the Yarp.Telemetry.Consumption package (it can be used outside of YARP). The sample does correlation for Kestrel events, but then pattern is the same for name resolution.

@MihaZupan
Copy link
Member

That said, if there are some statistics that would be useful to include in counters for multiple users, we would certainly consider it - one such discussion is going on in #48885

@jkotas
Copy link
Member

jkotas commented Aug 17, 2021

The events will be fired from within the same async scope and can as such be correlated (an AsyncLocal created during Start will be visible during Stop).

AsyncLocals work only if you are consuming the events in-proc.

@MihaZupan Is it possible to correlate the events in out-of-proc listener? Out-of-proc listeners are very common in telemetry solutions.

@MihaZupan
Copy link
Member

Is it possible to correlate the events in out-of-proc listener?

Yes. If you enable the TplEventSource with the TasksFlowActivityIds keyword, other EventSource events will also contain an ActivityID. The activity ID is also tracked as an AsyncLocal by ActivityTracker and it encodes an async scope hiearchy.

An example with PerfView using *System.Net.Sockets,*System.Net.NameResolution,*System.Net.Http,*System.Net.Security,System.Threading.Tasks.TplEventSource:0x80:4:
image

@karelz
Copy link
Member

karelz commented Aug 17, 2021

Triage: Seems to be addressed - solution does exist. Closing.

@karelz karelz closed this as completed Aug 17, 2021
@karelz karelz added this to the 6.0.0 milestone Aug 17, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Sep 16, 2021
@karelz karelz removed the untriaged New issue has not been triaged by the area owner label Oct 20, 2022
# for free to subscribe to this conversation on GitHub. Already have an account? #.
Projects
None yet
Development

No branches or pull requests

4 participants