Feat (gptq): optimizing CPU to GPU memory transfer #1009

Merged: 2 commits merged into Xilinx:dev on Sep 12, 2024


i-colbert
Collaborator

No description provided.

@i-colbert i-colbert requested a review from Giuseppe5 August 26, 2024 23:55
@i-colbert i-colbert requested review from Giuseppe5 and removed request for Giuseppe5 August 27, 2024 01:12
@nickfraser
Collaborator

I'm slightly worried that this could break our interop with HuggingFace accelerate. Could you test it in a multi-GPU setup with accelerate? The easiest way is to use this example: https://github.com/huggingface/optimum-amd/tree/main/examples/quantization/brevitas

@nickfraser
Collaborator

I ran a small multi-GPU test with accelerate and this seems to work.

@Giuseppe5 Giuseppe5 merged commit 10dcee3 into Xilinx:dev Sep 12, 2024
337 checks passed
@i-colbert i-colbert deleted the feat/gptq branch September 12, 2024 21:45
3 participants