I would like for join_nodes to support priorities.
Hello, I'm working on a project that uses the TBB flow graph on a cyclic directed graph. The graph contains ~1M functional node types, and most of them use parallel_for algorithms in their bodies. The graph is built dynamically from a user input file, and there are timing constraints on many nodes. Static Timing Analysis propagates constraints and accumulates delays throughout the graph, and a static scheduling strategy then uses the results to assign a priority to every node in a side structure (and directly to the TBB nodes that support it). For performance, it is very important that the project does not form bubbles on the critical path (this is not mission critical, and I realize TBB is not suitable for real-time applications, which this is not).
I want my parallel_for tasks that are nested within the flow graph nodes to be properly prioritized as well (otherwise the purpose of priorities is defeated), so I've made my own parallel_for algorithms that take a priority parameter (same priority as the functional node that calls parallel_for) and push work onto a priority queue. Then the parallel_for algorithm spawns a task that will pop from the priority queue by calling submit directly on the arena with the critical parameter set to false.
If I understand correctly, non-functional nodes are spawned in the task slots, which means all my join nodes (which are fixed at no_priority) are put into the task slots. Profiling traces indicate that the join tasks are getting starved out (on the order of 10ms to 200ms, which is total limbo land) even though their dependencies are satisfied, and this in turn starves their high-priority successors. I think they are starved because my program is so massively parallel that tasks flood the task slots ahead of the join node's forward_task_bypass, and the critical arena queue does not help with the starvation. (Yes, I am using grain sizes and serial cutoffs; however, they've been tuned for isolated benchmarks.)
My theory of what is happening:
- All functional nodes are going into the arena's critical queue.
- All non-functional nodes (my join nodes that precede critical function nodes) are going into the task slots.
- parallel_for tasks flood the task slots, and join nodes get starved out, leading tasks to be mis-prioritized.
I've also read the broadcast_cache code and found that reordering successors has a big impact.
I am eager for feedback on whether my understanding of the starvation issue is correct, and I welcome suggestions to resolve it. I also hope that TBB can support priorities for join_node (and perhaps all other non-functional nodes) in the future if deemed appropriate.
Thank you.
Many thanks for highlighting this and describing your use-case. We will definitely consider extending the priorities API for join_node and other non-functional nodes.
As a temporary workaround, I suggest creating two separate explicit task_arena instances, one with high priority and one with normal priority, then executing the entire flow graph in the high-priority arena and all of the parallel_for calls inside node bodies in the normal-priority arena. My guess is that the arena priority setting will help the Flow Graph service tasks move forward.
I am adding priority support to other nodes on a fork. Not sure if I will push it upstream yet. Perhaps we can work together on getting this merged (testing and design considerations), or the TBB team can take this action and I will pull the updates.
There is one additional feature I am implementing on my fork: a set_priority API to modify a node's priority after construction. This saves me from pausing operation and rebuilding the graph.
This performance investigation is on pause from my end and I haven't yet tried the suggestions.
My next step is to update my org's toolchain to the latest TBB version to pick up the fix for the race condition in the limiter node, which resolves the timing-sensitive hangs I was running into when using lightweight nodes. This should reduce some of the scheduling delays I was seeing by taking advantage of scheduler bypass, at which point I will reevaluate and perhaps try the suggestions by @kboyarinov.