Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

New Approach to investigate Parallel Execution of SELECT Plan in REFRESH Command. #892

Closed
wants to merge 1 commit into from

Conversation

avamingli
Copy link
Contributor

I can't count how many times I've been wrong in this situation in parallel refresh materialized views:

image
and all we can do is retry.

Previously, I developed the refresh_compare function in cbdb_parallel.sql, which relied on actual execution times, with the expectation that parallel execution would consistently outperform non-parallel execution. It worked well for a certain period of time.
However, since our CI transitioned to Apache, resource limitations have led to frequent failures in parallel refresh tests, making it challenging to obtain reliable results.

This commit introduces a new approach to determine if the SELECT part of the REFRESH command is executed in parallel within Cloudberry. Since REFRESH is a utility command, it cannot be analyzed using EXPLAIN.
The new method focuses on assessing whether the REFRESH plan is parallel, specifically for AO/AOCS materialized views. I have retained the original test cases to allow other databases built on Cloudberry to evaluate actual execution times, as the new approach only identifies parallel execution without measuring performance.

Authored-by: Zhang Mingli avamingli@gmail.com

Fixes #ISSUE_Number

What does this PR do?

Type of Change

  • Bug fix (non-breaking change)
  • New feature (non-breaking change)
  • Breaking change (fix or feature with breaking changes)
  • Documentation update

Breaking Changes

Test Plan

  • Unit tests added/updated
  • Integration tests added/updated
  • Passed make installcheck
  • Passed make -C src/test installcheck-cbdb-parallel

Impact

Performance:

User-facing changes:

Dependencies:

Checklist

Additional Context

CI Skip Instructions


This commit introduces a new approach to determine if the SELECT part of
the REFRESH command is executed in parallel within Cloudberry. Since
REFRESH is a utility command, it cannot be analyzed using EXPLAIN.
Previously, I developed the `refresh_compare` function in
`cbdb_parallel.sql`, which relied on actual execution times, with the
expectation that parallel execution would consistently outperform
non-parallel execution.

However, since our CI transitioned to Apache, resource limitations have
led to frequent failures in parallel refresh tests, making it
challenging to obtain reliable results.

The new method focuses on assessing whether the REFRESH plan is
parallel, specifically for AO/AOCS materialized views. I have retained
the original test cases to allow other databases built on Cloudberry to
evaluate actual execution times, as the new approach only identifies
parallel execution without measuring performance.

Authored-by: Zhang Mingli avamingli@gmail.com
check_refresh_plan_parallel &&
plan_has_parallel(plan))
{
ereport(NOTICE, errmsg("Plan of REFRESH is parallel"));
Copy link
Member

@yjhjstz yjhjstz Jan 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we use client_min_messages to avoid introduce check_refresh_plan_parallel ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better then, but what should the log level be?
We don’t want to send this to the server logs, and displaying a message like 'Plan of REFRESH is parallel' to users by default is generally unnecessary and could disrupt their experience.
Additionally, it complicates the adjustment of many test cases.

Copy link
Member

@yjhjstz yjhjstz Jan 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (client_min_messages == WARNING) then ereport ? then set once for this case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

like this?

		if(client_min_messages == WARNING)
			ereport(NOTICE, errmsg("Plan of REFRESH is parallel"));

very wired, and client_min_messages should be used only for client side, isn't it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I found another way:
set debug_print_plan to on;
enable_parallel=on;

:numSlices 3 
   :slices[i].sliceIndex 0 
   :slices[i].parentIndex -1 
   :slices[i].gangType 4 
   :slices[i].numsegments 3 
   :slices[i].parallel_workers 0 
   :slices[i].segindex 0 
   :slices[i].directDispatch.isDirectDispatch false 
   :slices[i].directDispatch.contentIds <> 
   :slices[i].sliceIndex 1 
   :slices[i].parentIndex 0 
   :slices[i].gangType 2 
   :slices[i].numsegments 1 
   :slices[i].parallel_workers 0 
   :slices[i].segindex 2 
   :slices[i].directDispatch.isDirectDispatch false 
   :slices[i].directDispatch.contentIds <> 
   :slices[i].sliceIndex 2 
   :slices[i].parentIndex 1 
   :slices[i].gangType 3 
   :slices[i].numsegments 3 
   :slices[i].parallel_workers 2 
   :slices[i].segindex 0 
   :slices[i].directDispatch.isDirectDispatch false 
   :slices[i].directDispatch.contentIds <> 
   :rewindPlanIDs (b)

:slices[i].parallel_workers = 2, that's diff from parallel close case :slices[i].parallel_workers 0.

use udf parse output , no need to change code, no overhead.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls refer to select wait_until_query_output_to_file('/tmp/bfv_cte.out');

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't get it, could you provide codes to introduce your idea?
and:

How to print plan in REFRESH command,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How to print plan in REFRESH command

@yjhjstz Could you show your steps?
Even with your settings like set debug_print_plan to on;, I couldn't see plan tree when exec 'refresh materialized view mv;'

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok,
step 1:

postgres=# create table t_p(c1 int, c2 int) with(parallel_workers=8) distributed by(c1);
CREATE TABLE
postgres=#   insert into t_p select i, i+1 from generate_series(1, 10000000)i;
INSERT 0 10000000
postgres=#     create materialized view matv using ao_row as select sum(a.c2) as c2, avg(b.c1) as c1 from t_p a join t_p b on a.c1 = b.c1 with no data distributed by(c2);
CREATE MATERIALIZED VIEW

step2:

bash -c 'psql -X postgres -c "set optimizer=off; set enable_parallel=on;set client_min_messages to log; set debug_print_plan=on; refresh materialized view matv;" &> /tmp/refresh_plan.out'

step3:
cat /tmp/refresh_plan.out, found :slices[i].parallel_workers 2;

select wait_until_query_then_parse('/tmp/refresh_plan.out'); // TODO wait_until_query_then_parse

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

@avamingli
Copy link
Contributor Author

avamingli commented Jan 26, 2025

After discussing with @yjhjstz , he suggested using debug_print_plan = on to view the plan tree for a REFRESH command. No code changes are required for this.

However, since the REFRESH command does not support plan printing well (and there is no guarantee that utility commands will print plans), we encountered some buggy output: "could not dump unrecognized node type: 1001."

Fixing this issue would divert us from our main objective, and it's uncertain how many such cases exist. Additionally, I believe there is no need to export the output to a file outside the database; a function to filter the output would suffice.

This is just a suggestion for anyone interested. The real issue lies in our CI resource limitations, which we hope will be addressed thoroughly. Any workaround at this stage is merely a temporary solution.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants