ARROW-15732: [C++] Do not use any CPU threads in execution plan when use_threads is false #15104

westonpace · 2022-12-28T16:57:59Z

This PR gets rid of the "executor==nullptr" style of execution from Acero. Now a CPU executor must always be defined. This also updates the DeclarationToXyz methods to support 5 "standard" modes of operation:

sync / threaded - All CPU work is done on the CPU thread pool. The calling thread sleeps until the plan is done.
sync / serial - All CPU work is done on the calling thread. The calling thread becomes a serial executor until plan is done.
async / threaded - All CPU work is done on the CPU thread pool. The calling thread returns immediately.
async / serial - All CPU work is done on a plan-specific 1-thread pool. The calling thread returns immediately.
async / custom exec context - All CPU work is done on the provided executor which cannot be nullptr and the caller is responsible for keeping it alive.

For backwards compatibility the "old style" of serial execution (which is still used by R/python) will still be accepted. Under the hood a 1-thread pool will be created and its lifetime will be bound to the exec plan (this is similar to async / serial when using DeclarationToXyz).

Since DeclarationToXyz doesn't support "ordered sinks" we cannot yet migrate the bindings and so they will continue to rely on the deprecated API.

All unit tests are updated to use the new API. Some unit tests were not actually running in parallel (even though they said they were) and now that they are running in parallel a few changes had to be made to accept / handle non-deterministic result order.

As part of this process some unit tests were migrated away from StartAndCollect in favor of DeclarationToXyz. Hopefully we can eventually migrate almost all of the unit tests to DeclarationToXyz but that is a longer term effort.

github-actions · 2022-12-28T16:58:25Z

https://issues.apache.org/jira/browse/ARROW-15732

github-actions · 2022-12-28T16:58:27Z

⚠️ Ticket has not been started in JIRA, please click 'Start Progress'.

… executor instead of a null executor

lidavidm

LGTM overall. Thanks for cleaning up the tests.

We do have a lot of different modes of execution, though I suppose it's just because of sync/async × threaded/serial.

cpp/src/arrow/util/async_util.cc

cpp/src/arrow/compute/exec/exec_plan.h

cpp/src/arrow/compute/exec/exec_plan.cc

cpp/src/arrow/compute/exec/sink_node.cc

westonpace · 2022-12-30T15:15:58Z

Thanks for the quick review.

We do have a lot of different modes of execution, though I suppose it's just because of sync/async × threaded/serial.

Yep. The async-serial mode is newly introduced here and it would be nice to avoid it. However, I can't easily switch R/python over to sync-serial because I need to wait until we fix ordering (or add ordering to the DeclarationToXyz methods which I don't want to do).

Fixing ordering in this PR would make the PR too large so I think I'm stuck with this.

ursabot · 2022-12-31T00:17:09Z

Benchmark runs are scheduled for baseline = 237bc30 and contender = 498b645. 498b645 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed ⬇️0.0% ⬆️0.0%] test-mac-arm
[Finished ⬇️2.3% ⬆️0.0%] ursa-i9-9960x
[Failed ⬇️0.0% ⬆️0.0%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] 498b645e ec2-t3-xlarge-us-east-2
[Failed] 498b645e test-mac-arm
[Finished] 498b645e ursa-i9-9960x
[Failed] 498b645e ursa-thinkcentre-m75q
[Finished] 237bc304 ec2-t3-xlarge-us-east-2
[Finished] 237bc304 test-mac-arm
[Finished] 237bc304 ursa-i9-9960x
[Finished] 237bc304 ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

ursabot · 2022-12-31T00:17:20Z

['Python', 'R'] benchmarks have high level of regressions.
ursa-i9-9960x

…use_threads is false (apache#15104) This PR gets rid of the "executor==nullptr" style of execution from Acero. Now a CPU executor must always be defined. This also updates the DeclarationToXyz methods to support 5 "standard" modes of operation: * sync / threaded - All CPU work is done on the CPU thread pool. The calling thread sleeps until the plan is done. * sync / serial - All CPU work is done on the calling thread. The calling thread becomes a serial executor until plan is done. * async / threaded - All CPU work is done on the CPU thread pool. The calling thread returns immediately. * async / serial - All CPU work is done on a plan-specific 1-thread pool. The calling thread returns immediately. * async / custom exec context - All CPU work is done on the provided executor which cannot be nullptr and the caller is responsible for keeping it alive. For backwards compatibility the "old style" of serial execution (which is still used by R/python) will still be accepted. Under the hood a 1-thread pool will be created and its lifetime will be bound to the exec plan (this is similar to async / serial when using DeclarationToXyz). Since DeclarationToXyz doesn't support "ordered sinks" we cannot yet migrate the bindings and so they will continue to rely on the deprecated API. All unit tests are updated to use the new API. Some unit tests were not actually running in parallel (even though they said they were) and now that they are running in parallel a few changes had to be made to accept / handle non-deterministic result order. As part of this process some unit tests were migrated away from StartAndCollect in favor of DeclarationToXyz. Hopefully we can eventually migrate almost all of the unit tests to DeclarationToXyz but that is a longer term effort. Authored-by: Weston Pace <weston.pace@gmail.com> Signed-off-by: Weston Pace <weston.pace@gmail.com>

github-actions bot added Component: C++ Component: Python Component: R labels Dec 28, 2022

ARROW-15732: change behavior of exec plan serial mode to use a serial…

7fdf8b3

… executor instead of a null executor

westonpace force-pushed the feature/ARROW-15732--proper-exec-plan-serial-mode branch from 9100491 to 7fdf8b3 Compare December 30, 2022 03:46

Minor test fixes

468c6ba

westonpace marked this pull request as ready for review December 30, 2022 04:22

Fixing a few more unit tests

ab2f5d9

westonpace requested a review from lidavidm December 30, 2022 06:23

Minor compiler warning fix

d62ae38

lidavidm reviewed Dec 30, 2022

View reviewed changes

lidavidm approved these changes Dec 30, 2022

View reviewed changes

Address review comments

0af7a69

westonpace merged commit 498b645 into apache:master Dec 30, 2022

westonpace mentioned this pull request Dec 30, 2022

ARROW-17980: [C++] As-of-Join Substrait extension #14485

Merged

westonpace mentioned this pull request Dec 31, 2022

[CI] arrow-compute: ExecPlanExecution.StressSourceOrderBy may failed #15141

Closed

asfimport mentioned this pull request Dec 31, 2022

[C++] Do not use any CPU threads in execution plan when use_threads is false #31181

Closed

This was referenced Mar 11, 2025

[C++][Python] Remove deprecated functions in 20.0 #45680

Open

[C++] Remove deprecated functions in 20.0 (C++ subset) #45746

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ARROW-15732: [C++] Do not use any CPU threads in execution plan when use_threads is false #15104

ARROW-15732: [C++] Do not use any CPU threads in execution plan when use_threads is false #15104

westonpace commented Dec 28, 2022 •

edited

Loading

github-actions bot commented Dec 28, 2022

github-actions bot commented Dec 28, 2022

lidavidm left a comment

westonpace commented Dec 30, 2022

ursabot commented Dec 31, 2022

ursabot commented Dec 31, 2022

ARROW-15732: [C++] Do not use any CPU threads in execution plan when use_threads is false #15104

ARROW-15732: [C++] Do not use any CPU threads in execution plan when use_threads is false #15104

Conversation

westonpace commented Dec 28, 2022 • edited Loading

github-actions bot commented Dec 28, 2022

github-actions bot commented Dec 28, 2022

lidavidm left a comment

Choose a reason for hiding this comment

westonpace commented Dec 30, 2022

ursabot commented Dec 31, 2022

ursabot commented Dec 31, 2022

westonpace commented Dec 28, 2022 •

edited

Loading