Call tree summary, e.g. after each request #639
Thanks for the great feature request! I really appreciate the level of thought and detail that you've put into this issue; this is very helpful! :)
I believe @davidbarsky has been experimenting with something similar. I'm not sure how much additional work is needed to get that crate ready for release; it may just be adding documentation and polish.
I think David's crate is currently closer to a tree-structured log than a flamegraph-like profile: it performs no aggregation of multiple instances of the same span, and includes events as children of spans. However, I think we could add a lot of the features you're proposing on top of what he's got, and it could probably expose some configuration options to change the level of aggregation, etc.
Here are some assorted notes on what you've written up:
Note that
Span metadata already has a
A much simpler option: we could just start by tracking any span that is a root (i.e. doesn't have a parent). We could also add the ability to layer in a filter using the existing filter infrastructure, to allow selecting which roots should be tracked by disabling those we don't care about. I think users could easily configure the behavior you're describing, with a special
And some general open questions:
|
Yup. I can release it now as an alpha to crates.io with minimal documentation, but before I feel comfortable saying it's "ready", I'd like to add additional documentation.
I'd gladly accept these PRs. The proposal @kolloch has laid out is a direction I'd like to see |
Thank you for your interest and your thoughtful comments!
@hawkw @davidbarsky I just had a look at the code and I don't see much overlap, unfortunately. No aggregation, the output format is quite different, ... I might be wrong! Maybe when I start implementing I will realize that I need exactly what David wrote already ;)
Yes, I am aware.
Very good suggestion, sounds like a match. Thanks!
That is a good default, true.
Nice, I have to read up on the code to fully grasp what you mean.
I am thinking of true call paths in which path entries correspond to code locations. Probably I could do the matching on callsite. It might be sometimes useful to distinguish spans by further means but I would only implement it if that happens to be common in practice. I'd guess by default not and there might be an option or so to include certain fields in the call path element identity?
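To make the idea of call path element identity concrete, here is a minimal sketch using only the standard library. Everything in it is hypothetical: `PathElement`, its `file`/`line` pair standing in for a callsite, and the optional `fields` list are illustrative names, not types from `tracing`. With an empty `fields` list, two spans from the same code location collapse into one element; adding selected field values splits them.

```rust
use std::collections::HashMap;

/// Hypothetical identity of one call path element. `file` and `line` stand
/// in for a callsite; `fields` holds optional, user-selected field values.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct PathElement {
    pub file: &'static str,
    pub line: u32,
    pub fields: Vec<(&'static str, String)>,
}

/// Counts how often each distinct element occurs. Two spans are merged
/// only if their callsite and their selected fields are all equal.
pub fn count_by_element(spans: &[PathElement]) -> HashMap<PathElement, u64> {
    let mut counts = HashMap::new();
    for span in spans {
        *counts.entry(span.clone()).or_insert(0) += 1;
    }
    counts
}
```

Whether fields should take part in the identity at all is the open question from the comment above; the sketch only shows that making it opt-in is cheap.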
Yeah, true. We could e.g. count all warnings/errors at a call path. The name of events is not very useful without having the source code side-by-side: definition of the name. Maybe we could use a certain special field value or do you have other ideas? In similar cases, I have seen the formatting string used as informal "name" -- with the placeholders still shown as placeholders. E.g. Unfortunately, that is not yet available within the |
Thanks to your encouragement, I finally got around to putting something together: https://github.com/kolloch/reqray I think it should be useful in its current form. One implementation detail that I am uncomfortable with is the use of locking through the span extensions: since this involves locking under the hood and I use extensions not only from the current span but also from the parent and the root spans, I could imagine this resulting in deadlocks if another layer also uses these extensions. Can I assume that if I only use my own crate's types in the extensions map, I am free from such interference from other extensions? Thanks! |
This looks really cool, @kolloch!
Yup. While the types placed within an extension can be shared between layers, I'm not sure how often it happens in practice due to those types being private. |
Thank you! The relevant code:

    fn extensions(&self) -> Extensions<'_> {
        Extensions::new(self.inner.extensions.read().expect("Mutex poisoned"))
    }
    fn extensions_mut(&self) -> ExtensionsMut<'_> {
        ExtensionsMut::new(self.inner.extensions.write().expect("Mutex poisoned"))
    }

The code I wrote originally acquired the extensions of the root/parent spans in addition to the "active" span. If other extensions did that as well and tried to acquire the locks in the opposite order, it would result in deadlocks. I avoid that now by always releasing the lock before acquiring other extensions. I guess that this should probably be documented in the Also, I am quite open to contributing the code of |
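The "release before acquiring the next one" rule described above can be sketched with a plain `std::sync::RwLock`. This is only an illustration under assumptions: `SpanData` and `inherit` are hypothetical stand-ins, not the `tracing-subscriber` registry types, and the map of `u64` values replaces the real type-keyed extensions map.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Hypothetical stand-in for a span's extension storage; not the real
/// `tracing-subscriber` type.
#[derive(Default)]
pub struct SpanData {
    pub extensions: RwLock<HashMap<&'static str, u64>>,
}

/// Copies a value from the parent's extensions into the child's while
/// holding at most one lock at any time.
pub fn inherit(parent: &SpanData, child: &SpanData, key: &'static str) -> Option<u64> {
    // The read guard lives only inside this block, so the parent's lock
    // is released before the child's write lock is requested.
    let value = {
        let guard = parent.extensions.read().expect("lock poisoned");
        let copied = guard.get(key).copied();
        copied
    };
    if let Some(v) = value {
        child
            .extensions
            .write()
            .expect("lock poisoned")
            .insert(key, v);
    }
    value
}
```

Because no two guards are ever held together, the order in which other code takes these locks no longer matters. The function even works when `parent` and `child` are the same span.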
Feature Request
Provide an easy way to print a call tree summary after each processed request or another meaningful unit of work in your application. The call tree should be based on the tracing spans. Events are ignored.
(I am writing this feature proposal after the nice encouragement by @hawkw)
Output
In the past, roughly the information that you would find in a flame graph but in text form has proven very helpful. Here is my first proposal for an output format:
- `# calls`: the total number of calls for this call tree path.
- `wall ms`: the total wall time that a span with this call path was alive in ms (`Subscriber::new_span` until `try_close`). Edit: was `sum ms` before but this is misleading.
- `own ms`: the total wall time during which execution was inside a span with this call path (`Subscriber::enter` until `Subscriber::exit`).
- span name: The name from the `Metadata`. We could also add the callsite for disambiguation but this is probably not necessary. some relatively short identifier for the span -- probably we should allow customizing this so that each user can create a function that creates an appropriate short name for each span.
The order of tree nodes: There should be only one entry for each call path but the spans should be sorted by the first time they were seen. Think of storing the children of each call path in something like a linked_hash_map. That way, the order of the children resembles the order in which they were called. For repeated calls, the order is often still quite readable since it is typical that some subsequence of calls is simply repeated in a loop.
That, in practice, gives you a lovely outline of how your request was processed.
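The aggregation described in this section can be sketched with the standard library alone. Treat it as a rough illustration rather than a proposed implementation: `CallPathNode`, `record`, and `render` are hypothetical names, the durations are passed in by the caller instead of being measured from span callbacks, and a `Vec` plus a `HashMap` index stands in for a linked hash map.

```rust
use std::collections::HashMap;
use std::time::Duration;

/// One node per distinct call path (all names here are illustrative).
#[derive(Default)]
pub struct CallPathNode {
    pub calls: u64,
    pub wall: Duration,
    pub own: Duration,
    /// Children in first-seen order.
    children: Vec<(String, CallPathNode)>,
    /// Span name -> position in `children`, for constant-time lookup.
    index: HashMap<String, usize>,
}

impl CallPathNode {
    /// Returns the child for `name`, appending a new one if it is unseen.
    pub fn child(&mut self, name: &str) -> &mut CallPathNode {
        let pos = match self.index.get(name) {
            Some(&pos) => pos,
            None => {
                self.children.push((name.to_string(), CallPathNode::default()));
                self.index.insert(name.to_string(), self.children.len() - 1);
                self.children.len() - 1
            }
        };
        &mut self.children[pos].1
    }

    /// Adds one closed span to the node at `path`, creating nodes as needed.
    pub fn record(&mut self, path: &[&str], wall: Duration, own: Duration) {
        let mut node = self;
        for name in path {
            node = node.child(name);
        }
        node.calls += 1;
        node.wall += wall;
        node.own += own;
    }

    /// Appends one line per descendant: calls, wall ms, own ms, indented name.
    pub fn render(&self, depth: usize, out: &mut String) {
        for (name, child) in &self.children {
            out.push_str(&format!(
                "{:>7} {:>9.3} {:>9.3} {}{}\n",
                child.calls,
                child.wall.as_secs_f64() * 1000.0,
                child.own.as_secs_f64() * 1000.0,
                "  ".repeat(depth),
                name
            ));
            child.render(depth + 1, out);
        }
    }
}
```

Calling `record` once per closed span and `render(0, &mut out)` on the root when the request finishes yields one line per call path, with repeated calls merged and siblings kept in first-seen order.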
Crates
I think that we should create a new sub crate for this since the functionality is orthogonal to the rest and only users who want this should pay the price. Alternatively, it might be enabled by a feature on the `tracing-subscriber` crate.
The data model built by this subscriber might be useful for other summaries and might be extracted into another crate or a lib in the `tracing-subscriber` crate. I'd start with keeping the code together in one place, though.
Motivation
Let's assume a simple data model for the above example. Every `election` has several `election_options` for voting. Users can comment on every `election_option` with an `election_comment`.
- `election_options.query` calls for only one `elections.query` call. Fortunately, the comments do not seem to be queried individually since their cardinality is the same as the parent.
- `wall ms` is useful: We do see where the majority of the latency comes from.
- `own ms` is useful: We can confirm that our app mostly waits for the database. At least for async this should work well, for sync code we should rely on sub spans.
Proposal
A new configurable subscriber should be created for this. I assume that the `Layer`/`Subscriber` infrastructure in tracing-subscriber is a good match but I haven't looked into the details.
Introspection into spans:
- summary roots: By default, the subscriber should start tracking at all root spans. The subscriber should start tracking call paths for any spans that have a marker field, like e.g. `summary_root`. This should be overridable.
- short names: The subscriber needs to get a short name for each span. This should be user configurable. (not necessary, the metadata for spans already includes a "name" that is suitable for this)
New spans which are not children of summary roots can be completely ignored.
When a summary root is `try_close`d, the summary as defined above should be printed.
Synthetic example
should result in the following tree (times are obviously unrealistic):