Reward stops changing after training SAC on Paddle Custom NPU for a while #1106

Open
USTCKAY opened this issue Jul 4, 2023 · 0 comments

USTCKAY commented Jul 4, 2023

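Hello, I recently ran into the problem described in the title while running SAC on an NPU. The reward curve is shown below. Could someone from the PARL team suggest what might be causing this?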
[Image: reward curve during NPU training]
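When comparing runs (NPU vs. GPU/CPU, or different fallback settings), it can help to make "the reward stops changing" precise enough to check automatically. The following is a minimal sketch, assuming the evaluation rewards are logged as a plain list of floats; the function name `plateau_start` and the default thresholds are hypothetical and not part of Paddle or PARL.

```python
from typing import Optional, Sequence


def plateau_start(rewards: Sequence[float], tol: float = 1e-6,
                  min_len: int = 50) -> Optional[int]:
    """Return the index where a trailing flat segment begins, or None.

    A trailing segment counts as flat when max - min over it is <= tol
    and it spans at least min_len logged rewards.
    """
    if not rewards:
        return None
    lo = hi = rewards[-1]
    start = len(rewards) - 1
    # Walk backwards from the last reward, growing the trailing segment
    # for as long as its spread stays within tol.
    for i in range(len(rewards) - 2, -1, -1):
        lo = min(lo, rewards[i])
        hi = max(hi, rewards[i])
        if hi - lo > tol:
            break
        start = i
    # Only report a plateau if the flat segment is long enough.
    return start if len(rewards) - start >= min_len else None
```

The returned index tells you roughly when training got stuck, which can then be lined up with training logs. A sensible `tol` depends on the reward scale of the environment.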
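I tried the GPU and CPU builds of Paddle, and the model trains normally on both, which suggests the algorithm itself is fine. I then listed the Paddle operators that SAC uses and found only add, clip, full_, matmul, relu, scale, tanh, and uniform. So I tried falling back each of these operators to the CPU one at a time, but the same problem still occurred in every case except when matmul was fallen back. With matmul fallen back, the following error is raised after training for a while: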
[Image: error raised after training for a while with matmul fallen back to the CPU]
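The one-at-a-time fallback experiment described above can be scripted so that every run is classified the same way (reward stuck, trains normally, or crashes). This is a minimal sketch in plain Python that does not call any Paddle API: it assumes a hypothetical hook `run_with_fallback(op)` that you implement with whatever mechanism your Paddle custom-device build provides for forcing an operator onto the CPU, plus an `is_stuck` predicate of your choice. The operator list is the one reported in this issue.

```python
from typing import Callable, Dict, List, Sequence

# Operators this issue reports SAC uses.
SAC_OPS = ["add", "clip", "full_", "matmul", "relu", "scale", "tanh", "uniform"]


def ablate_fallback(ops: Sequence[str],
                    run_with_fallback: Callable[[str], List[float]],
                    is_stuck: Callable[[List[float]], bool]) -> Dict[str, str]:
    """Fall back one operator at a time and classify each training run.

    run_with_fallback(op) is a hypothetical user-supplied hook that trains
    with `op` forced onto the CPU and returns the reward history.
    Each result is "stuck", "ok", or "error: <message>".
    """
    results: Dict[str, str] = {}
    for op in ops:
        try:
            rewards = run_with_fallback(op)
        except Exception as exc:
            # A crash (like the one seen when matmul is fallen back)
            # is recorded instead of aborting the whole sweep.
            results[op] = f"error: {exc}"
            continue
        results[op] = "stuck" if is_stuck(rewards) else "ok"
    return results
```

Calling `ablate_fallback(SAC_OPS, run_with_fallback, is_stuck)` yields one verdict per operator, which makes it easy to attach a compact table of outcomes to a bug report.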
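I currently have no further ideas for pinpointing the problem. I would really appreciate it if someone from the PARL team could take a look. Thanks a lot!
P.S. Both Paddle and PARL are the latest develop versions.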
