ARROW-15732: [C++] Do not use any CPU threads in execution plan when use_threads is false #15104
LGTM overall. Thanks for cleaning up the tests.
We do have a lot of different modes of execution, though I suppose it's just because of sync/async × threaded/serial.
Thanks for the quick review.
Yep. The async-serial mode is newly introduced here and it would be nice to avoid it. However, I can't easily switch R/python over to sync-serial because I need to wait until we fix ordering (or add ordering to the DeclarationToXyz methods which I don't want to do). Fixing ordering in this PR would make the PR too large so I think I'm stuck with this.
Benchmark runs are scheduled for baseline = 237bc30 and contender = 498b645. 498b645 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
['Python', 'R'] benchmarks have high level of regressions.
…use_threads is false (apache#15104)

This PR gets rid of the "executor==nullptr" style of execution from Acero. Now a CPU executor must always be defined. This also updates the DeclarationToXyz methods to support 5 "standard" modes of operation:

* sync / threaded - All CPU work is done on the CPU thread pool. The calling thread sleeps until the plan is done.
* sync / serial - All CPU work is done on the calling thread. The calling thread becomes a serial executor until plan is done.
* async / threaded - All CPU work is done on the CPU thread pool. The calling thread returns immediately.
* async / serial - All CPU work is done on a plan-specific 1-thread pool. The calling thread returns immediately.
* async / custom exec context - All CPU work is done on the provided executor which cannot be nullptr and the caller is responsible for keeping it alive.

For backwards compatibility the "old style" of serial execution (which is still used by R/python) will still be accepted. Under the hood a 1-thread pool will be created and its lifetime will be bound to the exec plan (this is similar to async / serial when using DeclarationToXyz).

Since DeclarationToXyz doesn't support "ordered sinks" we cannot yet migrate the bindings and so they will continue to rely on the deprecated API.

All unit tests are updated to use the new API. Some unit tests were not actually running in parallel (even though they said they were) and now that they are running in parallel a few changes had to be made to accept / handle non-deterministic result order.

As part of this process some unit tests were migrated away from StartAndCollect in favor of DeclarationToXyz. Hopefully we can eventually migrate almost all of the unit tests to DeclarationToXyz but that is a longer term effort.

Authored-by: Weston Pace <weston.pace@gmail.com>
Signed-off-by: Weston Pace <weston.pace@gmail.com>
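To make the difference between the two serial modes concrete, here is a minimal, self-contained sketch using only the C++ standard library. It is not Acero's implementation and does not use any Arrow API: `ToyExecutor`, `CallingThreadExecutor`, `ToyPlan`, `RunSyncSerial`, and `RunAsyncSerial` are all hypothetical names invented for illustration. As a simplification, the "plan-specific 1-thread pool" is modelled as one dedicated thread started with `std::async`.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <future>
#include <memory>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical executor interface (an illustration, not Arrow's Executor).
class ToyExecutor {
 public:
  virtual ~ToyExecutor() = default;
  virtual void Spawn(std::function<void()> task) = 0;
};

// Queues tasks and runs them on whichever thread calls RunLoop().
class CallingThreadExecutor : public ToyExecutor {
 public:
  void Spawn(std::function<void()> task) override {
    queue_.push_back(std::move(task));
  }
  void RunLoop() {
    while (!queue_.empty()) {
      std::function<void()> task = std::move(queue_.front());
      queue_.pop_front();
      task();  // a task may Spawn follow-up work; the loop picks it up
    }
  }

 private:
  std::deque<std::function<void()>> queue_;
};

// Toy "plan": one task per input value, each adding to a running sum.
struct ToyPlan {
  std::vector<int> input;
  long sum = 0;
  std::vector<std::thread::id> worker_ids;  // records where each task ran

  void Start(ToyExecutor* exec) {
    assert(exec != nullptr);  // an executor must always be provided
    for (int v : input) {
      exec->Spawn([this, v] {
        sum += v;
        worker_ids.push_back(std::this_thread::get_id());
      });
    }
  }
};

// sync / serial: the calling thread acts as the executor until the plan is done.
long RunSyncSerial(ToyPlan& plan) {
  CallingThreadExecutor exec;
  plan.Start(&exec);
  exec.RunLoop();
  return plan.sum;
}

// async / serial: a single dedicated thread does all the work for this plan
// and the caller gets a future back immediately. The shared_ptr keeps the
// plan alive for as long as the worker needs it.
std::future<long> RunAsyncSerial(std::shared_ptr<ToyPlan> plan) {
  return std::async(std::launch::async,
                    [plan] { return RunSyncSerial(*plan); });
}
```

In this sketch both modes run tasks one at a time, so `ToyPlan` needs no locking. The only difference is which thread drains the queue: the caller's own thread in `RunSyncSerial`, or a thread owned by the plan in `RunAsyncSerial`. A caller would read the asynchronous result with `RunAsyncSerial(plan).get()`.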