
Will there be DeepSpeed-accelerated training and inference? #46

Open
heyday111 opened this issue Apr 17, 2023 · 5 comments

Comments

@heyday111

Right now, using tuoling for summarization, each dialogue takes about 15 s to produce a result. I hope a multi-GPU-accelerated inference path can be released later.

@LC1332
Owner

LC1332 commented Apr 17, 2023

Are you using an A100 or a T4? An A100 should be noticeably faster. We will look into model parallelism later on. Thanks for the feedback!

@LC1332
Owner

LC1332 commented Apr 17, 2023

The main issue is that GLM has not been moved into the standard Hugging Face pipeline. Once it is, accelerate should be able to speed it up directly. I'd like to check later whether other models that are already in the HF pipeline can do this. I think this discussion is quite valuable.
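For a model that is already on the standard Hugging Face pipeline, the accelerate path mentioned above is mostly a matter of passing `device_map="auto"` to `from_pretrained`. A minimal sketch (the checkpoint name is a placeholder, not the Luotuo/GLM weights, and `shard_plan` is only a toy illustration of how layers get split):

```python
# Sketch: multi-GPU inference for a Hugging Face pipeline model via
# accelerate's automatic layer sharding. Assumes transformers and
# accelerate are installed; the checkpoint name is hypothetical.

def shard_plan(n_layers, n_gpus):
    """Toy illustration of what device_map="auto" roughly does:
    assign consecutive transformer layers to GPUs as evenly as possible."""
    per_gpu = -(-n_layers // n_gpus)  # ceiling division
    return {layer: layer // per_gpu for layer in range(n_layers)}

if __name__ == "__main__":
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = "some/causal-lm-checkpoint"  # placeholder checkpoint
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(
        name,
        torch_dtype=torch.float16,  # halves memory vs fp32
        device_map="auto",          # accelerate shards layers across GPUs
    )
    inputs = tok("Summarize: ...", return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=128)
    print(tok.decode(out[0], skip_special_tokens=True))
```

This is pipeline (layer-wise) placement rather than true tensor parallelism, so it mainly helps when one GPU cannot hold the model; it does not by itself make a single forward pass faster.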

@heyday111
Author

I'm using a V100 32G; the model takes about 14 GB of VRAM once loaded. I tried ChatGLM's DeepSpeed support, but the underlying code doesn't seem to support inference on luotuo yet: after loading onto multiple GPUs it errors with "the same data cannot be loaded on two devices".

@heyday111
Author

May I ask what other acceleration options there are? For example, would quantizing the model speed up inference?

@LC1332
Owner

LC1332 commented Apr 18, 2023

> May I ask what other acceleration options there are? For example, would quantizing the model speed up inference?

I saw a striking piece of work today: https://zhuanlan.zhihu.com/p/622754642 (High-throughput Generative Inference of Large Language Models with a Single GPU), but integrating it feels like a substantial amount of code.
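On the quantization question above: int8 weights cut memory 4x versus fp32, which both shrinks the footprint and enables faster int8 kernels on supporting hardware. A rough, self-contained illustration of the idea (symmetric per-tensor quantization; this is not the actual ChatGLM int8 code path):

```python
# Rough illustration of int8 weight quantization: store weights as
# int8 plus one fp32 scale, reconstructing approximate fp32 on use.
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor quantization to int8."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)
q, scale = quantize_int8(w)

print("fp32 bytes:", w.nbytes)  # 262144
print("int8 bytes:", q.nbytes)  # 65536  (4x smaller)
print("max abs error:", np.abs(dequantize(q, scale) - w).max())
```

The worst-case rounding error is half the scale step, which is why int8 usually preserves quality for weights while more aggressive schemes (4-bit, per-channel scales) need more care.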
