Will DeepSpeed be used to accelerate training and inference? #46
Comments
Are you using an A100 or a T4? An A100 should be quite a bit faster. We will look into model parallelism later on. Thanks for the suggestion!
Actually, the main issue is this: GLM has not been moved into the standard Hugging Face pipeline. Once it is, it should be possible to speed it up directly with accelerate. I'd like to look later at whether other models that are already in the HF pipeline can do this. I think this discussion is quite valuable.
I'm using a V100 32G; after loading, it takes about 14 GB of VRAM. I tried ChatGLM's DeepSpeed support, but the underlying code doesn't seem to support inference on Luotuo yet. After loading onto multiple GPUs it raises an error along the lines of "the same data cannot be loaded on two devices".
May I ask what other acceleration methods there are? For example, can quantizing the model speed up inference?
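On the quantization question: for CPU inference, PyTorch's dynamic quantization is a quick way to see the idea in action (weights stored as int8, activations quantized on the fly). A minimal sketch on a toy model, not the Luotuo setup; int8 inference for LLMs on GPU usually goes through dedicated libraries such as bitsandbytes instead:

```python
import torch

# Hypothetical toy float32 model; in practice these would be the
# language model's Linear layers.
model = torch.nn.Sequential(
    torch.nn.Linear(64, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, 8),
)

# Dynamically quantize all Linear layers: weights become int8,
# activations are quantized at runtime. CPU-only.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 64)
out = quantized(x)
print(out.shape)                    # same interface as the float model
print(quantized[0].weight().dtype)  # torch.qint8
```

Whether this actually speeds up inference depends on the hardware and batch size; it mainly cuts memory and memory bandwidth, which is often the bottleneck for single-stream LLM decoding.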
I saw a striking piece of work today: https://zhuanlan.zhihu.com/p/622754642 (High-throughput Generative Inference of Large Language Models with a Single GPU), but it feels like integrating it would take a fair amount of code.
Right now, when using tuoling for summarization, each dialogue takes 15 s to produce a result. I hope multi-GPU accelerated inference can be released later.