[CINN][Backend Pass Update No.12] Update transform_gpu_forloop pass #70883
Conversation
std::vector<Expr> iter_values = stmt->iter_values();
for (ir::Expr& iter_value : iter_values) {
  // Rewrite the bound loop var inside each binding expression,
  // e.g. (i_j_fused / 128ll) -> (blockIdx.x / 128ll).
  ir::IRMutator<>::Visit(&iter_value, &iter_value);
}
stmt->set_iter_values(iter_values);
Why does replace_var_with_expr need to replace iter_value?
class ReplaceLoopVarToGpu {
 public:
  void operator()(ir::stmt::BlockRef block) {
    ...
  }

 private:
  void VisitStmt(ir::stmt::For stmt) {
    auto bind_info = stmt->bind_info();
    // Map the bind offset to a GPU axis suffix (x/y/z).
    std::string var_name = "";
    if (bind_info.offset <= 0)
      var_name = "x";
    else if (bind_info.offset == 1)
      var_name = "y";
    else if (bind_info.offset == 2)
      var_name = "z";
    // Replace the loop var with the corresponding builtin GPU index.
    if (stmt->is_gpu_block_binded()) {
      var_name = "blockIdx." + var_name;
      optim::ReplaceVarWithExpr<ir::stmt::StmtRef>(
          stmt, stmt->loop_var(), ir::Expr(ir::Var(var_name)));
    } else if (stmt->is_gpu_thread_binded()) {
      var_name = "threadIdx." + var_name;
      optim::ReplaceVarWithExpr<ir::stmt::StmtRef>(
          stmt, stmt->loop_var(), ir::Expr(ir::Var(var_name)));
    }
    operator()(stmt->body());
  }
};
When optim::ReplaceVarWithExpr<ir::stmt::StmtRef>() is called inside ReplaceLoopVarToGpu, the i_j_fused occurring in the Schedule stmt's iter_value, (i_j_fused / 128ll), should also be replaced with blockIdx.x.
For example:
Before:
{
  Schedule (root_14) {
    attrs(tile_method:TileFirstGeneralTactic)
    thread_bind[blockIdx.x] for (i_j_fused, 0ll, (S0 * 128ll)) {
      Schedule (var_7) {
        i0_14, i1_8 = axis.bind((i_j_fused / 128ll), (i_j_fused % 128ll))
        read_buffers(_var[i0_14(0:S0), i1_8(0:128ll)], _var_1[i0_14(0:S0), i1_8(0:128ll)])
        write_buffers(_var_7[i0_14(0:S0), i1_8(0:128ll)])
        var_7[(i_j_fused / 128ll), (i_j_fused % 128ll)] = (exp(var[(i_j_fused / 128ll), (i_j_fused % 128ll)]) - var_1[(i_j_fused / 128ll), (i_j_fused % 128ll)])
      }
    }
  }
}
After:
{
  Schedule (root_13) {
    attrs(tile_method:TileFirstGeneralTactic)
    thread_bind[blockIdx.x] for (blockIdx.x, 0ll, (S0 * 128ll)) {
      Schedule (var_7) {
        i0_13, i1_7 = axis.bind((blockIdx.x / 128ll), (blockIdx.x % 128ll))
        read_buffers(_var[i0_13(0:S0), i1_7(0:128ll)], _var_1[i0_13(0:S0), i1_7(0:128ll)])
        write_buffers(_var_7[i0_13(0:S0), i1_7(0:128ll)])
        var_7[(blockIdx.x / 128ll), (blockIdx.x % 128ll)] = (exp(var[(blockIdx.x / 128ll), (blockIdx.x % 128ll)]) - var_1[(blockIdx.x / 128ll), (blockIdx.x % 128ll)])
      }
    }
  }
}
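That is exactly what the change quoted at the top of this thread does. A minimal sketch of the handler, assuming it lives in a VisitStmt overload for Schedule stmts (the overload name and placement are an assumption; the body follows the quoted diff):

void VisitStmt(ir::stmt::Schedule stmt) {
  // Binding expressions in iter_values also reference the loop var,
  // so they must be visited too; otherwise (i_j_fused / 128ll) keeps
  // the stale var after the loop var itself becomes blockIdx.x.
  std::vector<Expr> iter_values = stmt->iter_values();
  for (ir::Expr& iter_value : iter_values) {
    ir::IRMutator<>::Visit(&iter_value, &iter_value);
  }
  stmt->set_iter_values(iter_values);
}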
LogicalResult CudaSyncThreadsDropIfThenElsePass::Run(ir::LoweredFunc fn) {
  DropIfThenElseMutator mutator;
  mutator(fn->body_block);
  return LogicalResult::success();
}
This can simply be implemented as a block pass: iterate over the stmts in the block, and whenever a stmt is an IfThenElse that meets the condition (its body is just an Evaluate of a call to syncthreads), replace it directly with its inner stmts; see the sketch below.
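A rough sketch of that shape; the BlockPass interface and the accessor/helper names used here (stmts(), set_stmts(), true_case(), isa<>/as<>, IsSyncThreadsOnly) are assumptions for illustration, not the verified CINN API:

class CudaSyncThreadsDropIfThenElsePass : public BlockPass {
 public:
  LogicalResult Run(ir::stmt::BlockRef block) override {
    std::vector<ir::stmt::StmtRef> new_stmts;
    for (const ir::stmt::StmtRef& stmt : block->stmts()) {
      // An IfThenElse whose body is just Evaluate(call __syncthreads)
      // can be replaced by its inner stmts directly.
      if (stmt.isa<ir::stmt::IfThenElse>()) {
        ir::stmt::IfThenElse if_stmt = stmt.as<ir::stmt::IfThenElse>();
        if (IsSyncThreadsOnly(if_stmt)) {  // hypothetical predicate
          for (const ir::stmt::StmtRef& inner :
               if_stmt->true_case()->stmts()) {
            new_stmts.push_back(inner);
          }
          continue;
        }
      }
      new_stmts.push_back(stmt);
    }
    block->set_stmts(new_stmts);
    return LogicalResult::success();
  }
};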
Sorry to inform you that dffe97a's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.
#endif
  },
  [&](std::variant<common::HygonDCUArchHIP, common::HygonDCUArchSYCL>) {
    // optim::EliminateCommonGlobalMemoryRead(&(func_body));
    optim::OptimizeExprGPU(&(func_body));
    optim::EliminateCommonGlobalMemoryRead(&(func_body));
This pass hurt performance; just keep the call commented out.
@@ -392,13 +393,27 @@ std::vector<ir::LoweredFunc> OpLowererImpl::PostProcess(
                 common::ARMArch>) {},
  [&](common::NVGPUArch) {
#ifdef CINN_WITH_CUDA
    // optim::EliminateCommonGlobalMemoryRead(&(func_body));
    optim::OptimizeExprGPU(&(func_body));
    optim::EliminateCommonGlobalMemoryRead(&(func_body));
Don't enable this optimization for now either; keep it commented out (sketched below).
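For illustration only, the state both review comments ask to restore (a sketch based on the NVGPU hunk above, not a verified patch):

#ifdef CINN_WITH_CUDA
    // Kept disabled per review: enabling this elimination regressed
    // performance, so only the GPU expr optimization runs for now.
    // optim::EliminateCommonGlobalMemoryRead(&(func_body));
    optim::OptimizeExprGPU(&(func_body));
#endif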
OK
PR Category
CINN
PR Types
Improvements
Description
Refactored the transform_gpu_forloop pass.