
When chatting with RAG, GPU memory overflows after a few rounds of questions. Does anyone know how to release the conversation history? Thanks #5226

Open
azhe1234 opened this issue Feb 13, 2025 · 5 comments

Comments

@azhe1234

[screenshot]
I deployed the deepseek-r1:32b model. After two or three rounds of conversation, GPU memory keeps growing until it overflows. I release memory after creating the task, but the GPU memory is not freed. Does anyone know how to modify the code? Thanks!

[screenshot]


dosubot bot commented Feb 13, 2025

Hello @azhe1234! I'm Dosu, a bot that can help you fix bugs, answer questions, and become a contributor. While you wait for a human maintainer, I'll do my best to help.

I found a similar closed issue regarding GPU memory overflow when using the Langchain-Chatchat repository. However, no solution was found in the comments of that issue [1].

For managing GPU memory during multi-turn conversations, you might want to consider the following:

  1. PagedAttention and FlashAttention: These techniques help optimize memory usage by reducing memory waste and avoiding large intermediate attention matrices [2].

  2. Quantized KV Cache and Model Parallelism: These methods can help reduce memory usage and accelerate computation [2].

  3. Dynamic Batching and Inflight Batching: These strategies improve GPU utilization by adjusting batch sizes dynamically and scheduling at the step level [2].
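Since the issue title asks how to release the conversation history, the most direct fix is to cap the history passed to the model at the last N turns so the prompt (and the KV cache it produces) stops growing without bound. The sketch below is a minimal, illustrative example; the message format and function name are assumptions, not Langchain-Chatchat's actual API:

```python
def trim_history(history, max_turns=3):
    """Keep only the most recent `max_turns` (user, assistant) message pairs.

    `history` is assumed to be an OpenAI-style list of
    {"role": ..., "content": ...} dicts in chronological order.
    """
    if max_turns <= 0:
        return []
    # Each turn is one user message plus one assistant message.
    return history[-2 * max_turns:]

# Example: a 10-message history trimmed to the last 3 turns (6 messages).
history = [
    {"role": "user" if i % 2 == 0 else "assistant", "content": f"msg {i}"}
    for i in range(10)
]
trimmed = trim_history(history, max_turns=3)
# trimmed keeps only messages 4 through 9.
```

Applying a cap like this before every model call bounds prompt length, which in turn bounds per-request GPU memory growth regardless of how long the session runs.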

Additionally, you can call torch.cuda.empty_cache() and torch.cuda.ipc_collect() in your script to clear cached GPU memory, as shown in your screenshot. Note that empty_cache() only releases memory that is no longer referenced by live tensors, so references to old conversation tensors must be dropped first. If these methods are not effective, you may need to explore more advanced memory management techniques or further optimize the model's memory usage.
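The two calls above can be combined into a small helper. This is a sketch, not a guaranteed fix: empty_cache() returns only memory the allocator has cached for tensors that are no longer referenced, so it must run after old references are dropped. The import guard is just to keep the snippet self-contained on machines without torch:

```python
import gc


def free_gpu_memory():
    """Try to release cached GPU memory after dropping stale references.

    Returns True if CUDA caches were cleared, False if torch is missing
    or no CUDA device is available.
    """
    gc.collect()  # drop unreachable Python objects (and their tensors) first
    try:
        import torch
        if torch.cuda.is_available():
            torch.cuda.empty_cache()  # return cached allocator blocks to the driver
            torch.cuda.ipc_collect()  # reclaim memory held by dead IPC handles
            return True
    except ImportError:
        pass  # torch not installed; nothing GPU-side to clear
    return False
```

Calling this after each completed turn (together with trimming the history) is a reasonable first mitigation before reaching for serving-level techniques like paged KV caches.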

To continue talking to Dosu, mention @dosu.



@wanmuchun0

Does changing the number of history turns require code changes? My history-turns setting has no effect.

@wanmuchun0

> I deployed the deepseek-r1:32b model. After two or three rounds of conversation, GPU memory keeps growing until it overflows...

Hey, how did you get the conversation history working? Mine doesn't work at all.

@azhe1234
Author

> Hey, how did you get the conversation history working? Mine doesn't work at all.

Mine doesn't work either; history is empty. I'm still trying to modify the code.

@wanmuchun0

> Mine doesn't work either; history is empty. I'm still trying to modify the code.

Hey, did you get it fixed?
