With the same 1n1g (1 node, 1 GPU) machine resources, why does tensor model parallel give lower samples/s even though its batch size (bs) is larger?
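It depends on how much of the total computation the forward pass accounts for. For a typical network the backward pass costs about twice as much as the forward pass, so the forward share is roughly 1/3 = forward / (forward + backward). With ac (activation checkpointing) that share is larger.
The tensor model parallel run uses ac, which is why it can fit a bs as large as 128. The cost is one extra forward pass per step.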
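The arithmetic behind this answer can be sketched as a rough cost model. This is only an illustration: the function names `step_cost` and `relative_throughput` are hypothetical, the backward-to-forward ratio of 2 is the rule of thumb quoted above, and the model ignores communication, memory effects, and any kernel-efficiency gain from the larger bs.

```python
def step_cost(forward: float, backward_ratio: float = 2.0, recompute: bool = False) -> float:
    """Compute cost of one training step, in units of one forward pass.

    backward_ratio: assumed cost of backward relative to forward (rule of thumb: 2).
    recompute: True when activation checkpointing reruns the forward pass once.
    """
    backward = backward_ratio * forward
    extra_forward = forward if recompute else 0.0
    return forward + backward + extra_forward


def relative_throughput(backward_ratio: float = 2.0) -> float:
    """Per-sample throughput with checkpointing, relative to running without it."""
    plain = step_cost(1.0, backward_ratio, recompute=False)
    checkpointed = step_cost(1.0, backward_ratio, recompute=True)
    return plain / checkpointed


# Forward share of a plain step: forward / (forward + backward)
forward_share = 1.0 / step_cost(1.0)

print(f"forward share without ac: {forward_share:.3f}")
print(f"relative throughput with ac: {relative_throughput():.2f}")
```

Under these assumptions a checkpointed step costs 4 forward-equivalents instead of 3, so samples/s would drop to about 3/4 of the non-checkpointed rate even before any parallelism overhead is counted.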
Oh, I see. So it looks like tensor parallel brings no benefit for BERT.