Track memory usage for each individual operator #899
Comments
This got me interested, so I started looking into it, and I'm not sure how we aim to tackle it. My 1st idea was to write a Decorator which implements The other approach I found (from the article below) is implementing something similar to Servo's Not sure where to go from here, would love to hear some feedback. Some references: https://rust-analyzer.github.io/blog/2020/12/04/measuring-memory-usage-in-rust.html
I was kind of imagining we would have to do something like manually registering memory allocations. the While it would likely be crazy complicated to do this for all allocations, I think all the built-in DataFusion operators use most of their memory in intermediate RecordBatches and a potential single large structure (e.g. the hash tables in hash_join and hash_aggregate). If we captured these large sources, I think that would get us most of the value.
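For illustration, the manual-registration idea above could look roughly like the sketch below. Everything in it is hypothetical: `OperatorMemory`, `MemoryRegistry`, `grow`, and `shrink` are names made up for this example and are not DataFusion APIs. The assumption is that each operator reports the size of its large structures (buffered RecordBatches, hash tables) by hand, and the registry keeps a current and peak byte count per operator.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};

/// Hypothetical per-operator counter (not a DataFusion type).
#[derive(Debug, Default)]
pub struct OperatorMemory {
    current: AtomicUsize,
    peak: AtomicUsize,
}

impl OperatorMemory {
    /// Record that the operator now holds `bytes` more memory,
    /// e.g. after buffering a batch or growing a hash table.
    pub fn grow(&self, bytes: usize) {
        let now = self.current.fetch_add(bytes, Ordering::Relaxed) + bytes;
        self.peak.fetch_max(now, Ordering::Relaxed);
    }

    /// Record that `bytes` were released; saturates at zero.
    pub fn shrink(&self, bytes: usize) {
        let _ = self
            .current
            .fetch_update(Ordering::Relaxed, Ordering::Relaxed, |cur| {
                Some(cur.saturating_sub(bytes))
            });
    }

    /// Bytes currently registered by this operator.
    pub fn current(&self) -> usize {
        self.current.load(Ordering::Relaxed)
    }

    /// Highest value `current` has reached so far.
    pub fn peak(&self) -> usize {
        self.peak.load(Ordering::Relaxed)
    }
}

/// Hypothetical registry mapping an operator name to its counter.
#[derive(Debug, Default)]
pub struct MemoryRegistry {
    operators: Mutex<HashMap<String, Arc<OperatorMemory>>>,
}

impl MemoryRegistry {
    /// Return the counter for `name`, creating it on first use.
    pub fn register(&self, name: &str) -> Arc<OperatorMemory> {
        let mut ops = self.operators.lock().unwrap();
        ops.entry(name.to_string()).or_default().clone()
    }

    /// Sum of the bytes currently registered across all operators.
    pub fn total_current(&self) -> usize {
        let ops = self.operators.lock().unwrap();
        ops.values().map(|m| m.current()).sum()
    }
}
```

An operator would call `registry.register("hash_join")` once, then `grow` when it buffers a batch or enlarges its hash table and `shrink` when it frees them. Only the sizes an operator chooses to report are counted, which matches the "capture the large sources" trade-off described above.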
Cool, so I dug through the code a bit, and this seems to be a bit out of my league (needs high familiarity with way too many things). Thank you for the response!
This is now handled with the MemoryPool trait: https://docs.rs/datafusion/latest/datafusion/execution/memory_pool/trait.MemoryPool.html
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
When reviewing a plan, it would be nice to know the amount of memory each individual ExecutionPlan allocated during its execution.
Describe the solution you'd like
Add two new metrics to all operators:
"Allocated" should include both memory in created record batches as well as any internal memory (as described in #898 -- hopefully this code would just use the same underlying allocation measurement)
Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.
Additional context
Probably could follow the same model as #866 (baseline metrics for all operators) once that is implemented
#898 is for tracking overall memory allocations across all operators in a plan. This issue is for tracking the allocations for each individual operator.