-
Notifications
You must be signed in to change notification settings - Fork 136
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
New Approach to investigate Parallel Execution of SELECT Plan in REFRESH Command. #892
Conversation
This commit introduces a new approach to determine if the SELECT part of the REFRESH command is executed in parallel within Cloudberry. Since REFRESH is a utility command, it cannot be analyzed using EXPLAIN. Previously, I developed the `refresh_compare` function in `cbdb_parallel.sql`, which relied on actual execution times, with the expectation that parallel execution would consistently outperform non-parallel execution. However, since our CI transitioned to Apache, resource limitations have led to frequent failures in parallel refresh tests, making it challenging to obtain reliable results. The new method focuses on assessing whether the REFRESH plan is parallel, specifically for AO/AOCS materialized views. I have retained the original test cases to allow other databases built on Cloudberry to evaluate actual execution times, as the new approach only identifies parallel execution without measuring performance. Authored-by: Zhang Mingli avamingli@gmail.com
check_refresh_plan_parallel && | ||
plan_has_parallel(plan)) | ||
{ | ||
ereport(NOTICE, errmsg("Plan of REFRESH is parallel")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we use client_min_messages
to avoid introduce check_refresh_plan_parallel
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's better then, but what should the log level be?
We don’t want to send this to the server logs, and displaying a message like 'Plan of REFRESH is parallel' to users by default is generally unnecessary and could disrupt their experience.
Additionally, it complicates the adjustment of many test cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (client_min_messages == WARNING) then ereport ? then set once for this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
like this?
if(client_min_messages == WARNING)
ereport(NOTICE, errmsg("Plan of REFRESH is parallel"));
very wired, and client_min_messages
should be used only for client side, isn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, I found another way:
set debug_print_plan to on;
enable_parallel=on;
:numSlices 3
:slices[i].sliceIndex 0
:slices[i].parentIndex -1
:slices[i].gangType 4
:slices[i].numsegments 3
:slices[i].parallel_workers 0
:slices[i].segindex 0
:slices[i].directDispatch.isDirectDispatch false
:slices[i].directDispatch.contentIds <>
:slices[i].sliceIndex 1
:slices[i].parentIndex 0
:slices[i].gangType 2
:slices[i].numsegments 1
:slices[i].parallel_workers 0
:slices[i].segindex 2
:slices[i].directDispatch.isDirectDispatch false
:slices[i].directDispatch.contentIds <>
:slices[i].sliceIndex 2
:slices[i].parentIndex 1
:slices[i].gangType 3
:slices[i].numsegments 3
:slices[i].parallel_workers 2
:slices[i].segindex 0
:slices[i].directDispatch.isDirectDispatch false
:slices[i].directDispatch.contentIds <>
:rewindPlanIDs (b)
:slices[i].parallel_workers = 2, that's diff from parallel close case :slices[i].parallel_workers 0
.
use udf parse output , no need to change code, no overhead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls refer to select wait_until_query_output_to_file('/tmp/bfv_cte.out');
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't get it, could you provide codes to introduce your idea?
and:
How to print plan in REFRESH command,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How to print plan in REFRESH command
@yjhjstz Could you show your steps?
Even with your settings like set debug_print_plan to on;, I couldn't see plan tree when exec 'refresh materialized view mv;'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok,
step 1:
postgres=# create table t_p(c1 int, c2 int) with(parallel_workers=8) distributed by(c1);
CREATE TABLE
postgres=# insert into t_p select i, i+1 from generate_series(1, 10000000)i;
INSERT 0 10000000
postgres=# create materialized view matv using ao_row as select sum(a.c2) as c2, avg(b.c1) as c1 from t_p a join t_p b on a.c1 = b.c1 with no data distributed by(c2);
CREATE MATERIALIZED VIEW
step2:
bash -c 'psql -X postgres -c "set optimizer=off; set enable_parallel=on;set client_min_messages to log; set debug_print_plan=on; refresh materialized view matv;" &> /tmp/refresh_plan.out'
step3:
cat /tmp/refresh_plan.out, found :slices[i].parallel_workers 2
;
select wait_until_query_then_parse('/tmp/refresh_plan.out'); // TODO wait_until_query_then_parse
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After discussing with @yjhjstz , he suggested using debug_print_plan = on to view the plan tree for a REFRESH command. No code changes are required for this. However, since the REFRESH command does not support plan printing well (and there is no guarantee that utility commands will print plans), we encountered some buggy output: "could not dump unrecognized node type: 1001." Fixing this issue would divert us from our main objective, and it's uncertain how many such cases exist. Additionally, I believe there is no need to export the output to a file outside the database; a function to filter the output would suffice. This is just a suggestion for anyone interested. The real issue lies in our CI resource limitations, which we hope will be addressed thoroughly. Any workaround at this stage is merely a temporary solution. |
I can't count how many times I've been wrong in this situation in parallel refresh materialized views:
and all we can do is retry.
Previously, I developed the
refresh_compare
function incbdb_parallel.sql
, which relied on actual execution times, with the expectation that parallel execution would consistently outperform non-parallel execution. It worked well for a certain period of time.However, since our CI transitioned to Apache, resource limitations have led to frequent failures in parallel refresh tests, making it challenging to obtain reliable results.
This commit introduces a new approach to determine if the SELECT part of the REFRESH command is executed in parallel within Cloudberry. Since REFRESH is a utility command, it cannot be analyzed using EXPLAIN.
The new method focuses on assessing whether the REFRESH plan is parallel, specifically for AO/AOCS materialized views. I have retained the original test cases to allow other databases built on Cloudberry to evaluate actual execution times, as the new approach only identifies parallel execution without measuring performance.
Authored-by: Zhang Mingli avamingli@gmail.com
Fixes #ISSUE_Number
What does this PR do?
Type of Change
Breaking Changes
Test Plan
make installcheck
make -C src/test installcheck-cbdb-parallel
Impact
Performance:
User-facing changes:
Dependencies:
Checklist
Additional Context
CI Skip Instructions