
New IR Python API Adaptation Upgrade (Phase 3) #62618

Closed

YuanRisheng opened this issue on Mar 11, 2024 · 7 comments
@YuanRisheng (Contributor) commented on Mar 11, 2024

1. Background 📚

For the task background, the changes to make, and a sample submission, refer to the previously published task: #58067

2. Task 📚

| No. | Python API | File | Priority | Unit test coverage | Assignee | PR |
| --- | --- | --- | --- | --- | --- | --- |
| 🚧 1 | wait | python/paddle/distributed/communication/group.py | p1 | | 🚧 @zrr1999 | #62974 |
| 🚧 2 | barrier | python/paddle/distributed/communication/group.py | p1 | | 🚧 @zrr1999 | #62974 |
| 🙋 3 | all_gather | python/paddle/distributed/communication/stream/all_gather.py | p1 | | 🙋 @Eacient | |
| ✅ 4 | all_reduce | python/paddle/distributed/communication/stream/all_reduce.py | p1 | | @SigureMo | #62694 |
| 🙋 5 | alltoall | python/paddle/distributed/communication/stream/all_to_all.py | p1 | | 🙋 @ooooo-create | |
| 🙋 6 | broadcast | python/paddle/distributed/communication/stream/broadcast.py | p1 | | 🙋 @ooooo-create | |
| 🙋 7 | recv | python/paddle/distributed/communication/stream/recv.py | p1 | | 🙋 @ooooo-create | |
| 🙋 8 | reduce_scatter | python/paddle/distributed/communication/stream/reduce_scatter.py | p1 | | 🙋 @ooooo-create | |
| 🔵 9 | reduce | python/paddle/distributed/communication/stream/reduce.py | p1 | | | |
| 🔵 10 | scatter | python/paddle/distributed/communication/stream/scatter.py | p1 | | | |
| 🔵 11 | send | python/paddle/distributed/communication/stream/send.py | p1 | | | |
| 🔵 12 | reshard | python/paddle/distributed/auto_parallel/api.py | p1 | | | |
| 🔵 13 | split | python/paddle/distributed/fleet/layers/mpu/mp_ops.py | p1 | | | |
| 🔵 14 | dropout | python/paddle/distributed/fleet/layers/mpu/random.py | p1 | | | |
| 🙋 15 | RandomHorizontalFlip | python/paddle/vision/transforms/transforms.py | p1 | | 🙋 @jshh0401 | |
| 🔵 16 | RandomVerticalFlip | python/paddle/vision/transforms/transforms.py | p1 | | | |
| 🔵 17 | RandomErasing | python/paddle/vision/transforms/transforms.py | p1 | | | |
| 🔵 18 | frame | python/paddle/signal.py | p1 | | | |
| 🔵 19 | overlap_add | python/paddle/signal.py | p1 | | | |
| 🔵 20 | fused_dot_product_attention | python/paddle/incubate/nn/functional/fused_dot_product_attention.py | p1 | | | |
| 🔵 21 | flash_attn_unpadded | python/paddle/nn/functional/flash_attention.py | p1 | | | |
| 🔵 22 | edit_distance | python/paddle/nn/functional/loss.py | p1 | | | |
| 🔵 23 | l2_norm | python/paddle/nn/utils/weight_norm_hook.py | p1 | | | |
| 🔵 24 | conv1d_transpose | python/paddle/nn/functional/conv.py | p1 | | | |
| 🔵 25 | Adadelta | python/paddle/optimizer/adadelta.py | p1 | | | |
| 🔵 26 | Dirichlet | python/paddle/distribution/dirichlet.py | p1 | | | |
| 🙋 27 | LinearQuanter | python/paddle/nn/quant/format.py | p1 | | 🙋 @zrr1999 | |
| 🙋 28 | LinearDeQuanter | python/paddle/nn/quant/format.py | p1 | | 🙋 @zrr1999 | |
| 🙋 29 | FakeQuantAbsMax | python/paddle/nn/quant/quant_layers.py | p1 | | 🙋 @zrr1999 | |
| 🙋 30 | FakeQuantMovingAverageAbsMax | python/paddle/nn/quant/quant_layers.py | p1 | | 🙋 @zrr1999 | |
| 🙋 31 | FakeQuantChannelWiseAbsMax | python/paddle/nn/quant/quant_layers.py | p1 | | 🙋 @zrr1999 | |
| 🙋 32 | MovingAverageAbsMaxScale | python/paddle/nn/quant/quant_layers.py | p1 | | 🙋 @zrr1999 | |
| ✅ 33 | weight_quantize | python/paddle/nn/quant/quantized_linear.py | p1 | | @zrr1999 | #62988 |
| ✅ 34 | weight_dequantize | python/paddle/nn/quant/quantized_linear.py | p1 | | @zrr1999 | #62988 |
| ✅ 35 | weight_only_linear | python/paddle/nn/quant/quantized_linear.py | p1 | | @zrr1999 | #62988 |
| ✅ 36 | apply_per_channel_scale | python/paddle/nn/quant/quantized_linear.py | p1 | | @zrr1999 | #63472 |
| 🔵 37 | FakeQuanterWithAbsMaxObserver | python/paddle/quantization/quanters/abs_max.py | p1 | | | |

Task statistics

| Total tasks | 🔵 Available | 🙋 Claimed | 🚧 In progress | 🟢 Awaiting merge | ✅ Completed | 🟡 Deferred to next phase | 🏁 Completion rate |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 37 | 18 | 12 | 2 | 0 | 5 | 0 | 13.5% |

Contributors

In no particular order: @SigureMo (1), @zrr1999 (4)

Distributed API Adaptation Guide

Tip

Testing the distributed APIs requires building with -DWITH_DISTRIBUTE=ON and an environment with at least two GPUs (the unit tests need two cards).

Distributed API adaptation can follow PR #62694 as a reference; it consists of two main parts: API adaptation and unit test verification.

API Adaptation

The API adaptation part is the same as in previous phases (e.g. #58067): adapt the static-graph branch of each API so that, in PIR mode, it dispatches to the corresponding PIR _C_ops graph-construction API. See, for example, the changes to python/paddle/distributed/communication/stream/all_reduce.py (adapting all_reduce) in #62694.

Note that if the old-IR static-graph branch performs the operation inplace, the inplace op should be used under PIR. For example, c_allreduce_sum_: under the old IR, c_allreduce_sum is used with the same tensor as both input and output, whereas under PIR the corresponding inplace op c_allreduce_sum_ should be used directly.
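As a rough illustration of this pattern, here is a minimal sketch loosely modeled on the all_reduce change in #62694. It is not the actual code from that PR: the function name, the exact import path of in_pir_mode, and the argument list of c_allreduce_sum_ are assumptions, so copy the real signatures from the surrounding code when adapting an API.

```python
# Sketch only -- not the actual code from #62694. The helper name and the argument
# list of the inplace op are placeholder assumptions; use the real signatures from
# the existing implementation and the op definition.
from paddle import _C_ops
from paddle.framework import in_pir_mode


def _all_reduce_in_static_mode(tensor, group, sync_op):
    if in_pir_mode():
        # PIR branch: dispatch straight to the _C_ops graph-building API.
        # The old-IR branch wrote the result back into `tensor` (input == output),
        # so the corresponding inplace op (trailing underscore) is used here.
        return _C_ops.c_allreduce_sum_(tensor, group.id, sync_op, False)
    # Old-IR branch: the original append_op-based implementation stays unchanged.
    ...
```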

Unit Test Verification

PIR distributed APIs can be verified by adding new cases to test/collective/process_group_nccl_pir.py. Use the corresponding dygraph tests in test/collective/process_group_nccl.py as an overall reference, and keep the order of the new cases consistent with the dygraph ones where possible. If the dygraph tests do not cover a given API, determine its semantics from the documentation and add an appropriate case.
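For orientation only, a new case could be structured roughly as below. This is a hypothetical sketch, not taken from the test file: the helper name, the IrGuard used to switch into PIR, and the executor/fetch plumbing are assumptions, and the process group is assumed to already be initialized by the existing test harness. Mirror the structure of the existing cases in test/collective/process_group_nccl_pir.py.

```python
# Hypothetical sketch of an added PIR check; names and PIR-switching details are
# assumptions. Assumes the NCCL process group was already set up by the test harness.
import numpy as np
import paddle
import paddle.distributed as dist


def check_allreduce_sum_in_pir(rank, world_size=2):
    paddle.enable_static()
    with paddle.pir_utils.IrGuard():  # assumption: switches program building to PIR
        main = paddle.static.Program()
        with paddle.static.program_guard(main):
            x = paddle.static.data("x", [16, 32], "float32")
            dist.all_reduce(x)  # the adapted API under test
        exe = paddle.static.Executor()
        (out,) = exe.run(
            main,
            feed={"x": np.full([16, 32], rank + 1, "float32")},
            fetch_list=[x],
        )
    paddle.disable_static()
    # With two ranks feeding all-(rank+1) tensors, the summed result is all 3s.
    expected = sum(r + 1 for r in range(world_size))
    np.testing.assert_allclose(out, np.full([16, 32], expected, "float32"))
```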

After adding the case, run

ctest -R test_collective_process_group_pir

to verify that the adaptation succeeded.

Debugging Tips

You can redirect each subprocess's stdout and stderr to files to make debugging easier, e.g. by modifying test/legacy_test/test_parallel_dygraph_dataparallel.py as follows:

     procs = []
-    for t in pod.trainers:
+    for i, t in enumerate(pod.trainers):
         ...
-        proc = subprocess.Popen(cmd.split(" "), env=current_env)
+        proc = subprocess.Popen(cmd.split(" "), env=current_env, stdout=open(f"/tmp/out_{i}.log", "wb"), stderr=open(f"/tmp/err_{i}.log", "wb"))

Afterwards you can find each subprocess's detailed output and error messages in /tmp/out_0.log, /tmp/out_1.log, /tmp/err_0.log, and /tmp/err_1.log.

Quantization API Adaptation Guide

TODO

@YuanRisheng added the PFCC (Paddle Framework Contributor Club, https://github.com/PaddlePaddle/community/tree/master/pfcc) label on Mar 11, 2024
@SigureMo assigned zrr1999 and unassigned JZ-LIANG on Mar 12, 2024
@zrr1999 (Member) commented on Mar 12, 2024

[Sign-up]: 1, 33-35

@SigureMo (Member) commented

[Sign-up]: 4

@luotao1 moved this to In Progress in Call for Contributions on Mar 15, 2024
@luotao1 added the HappyOpenSource (happy open source event issues and PRs) label on Mar 15, 2024
@luotao1 self-assigned this on Mar 15, 2024
@Eacient commented on Mar 18, 2024

[Sign-up]: 3

@jshh0401 commented

[Sign-up]: 15

@zrr1999 (Member) commented on Mar 27, 2024

[Sign-up]: 27, 28, 36

@zrr1999 (Member) commented on Mar 31, 2024

[Sign-up]: 29-32

@ooooo-create (Contributor) commented

[Sign-up]: 5-8
