
How to use on GPU? #10

Open

phiweger opened this issue Jul 15, 2023 · 2 comments

Comments

@phiweger

Very interesting library @r2d4 !

I am trying to run the example from the README, but with the model on the GPU (as many of the recent larger LLMs require):

import regex
from transformers import AutoModelForCausalLM, AutoTokenizer

from rellm import complete_re

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

prompt = "ReLLM, the best way to get structured data out of LLMs, is an acronym for "
pattern = regex.compile(r'Re[a-z]+ L[a-z]+ L[a-z]+ M[a-z]+')

# THIS IS WHAT I'D LIKE TO DO
device = "cuda:0"
model.to(device)

output = complete_re(tokenizer=tokenizer, 
                     model=model, 
                     prompt=prompt,
                     pattern=pattern,
                     do_sample=True,
                     max_new_tokens=80)
print(output)

fails with

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA__index_select)

Is it possible to use ReLLM with the model living on the GPU?
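For what it's worth, the traceback points at the usual transformers pitfall: the model's weights are on cuda:0 while the tokenized prompt stays on the CPU. My guess (I haven't checked the rellm source) is that complete_re encodes the prompt internally without moving the resulting tensor to the model's device. In plain transformers the fix is to move the input ids explicitly; here is a minimal sketch of that pattern without rellm, reusing the names from the example above:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Pick the GPU if one is available, otherwise fall back to the CPU.
device = "cuda:0" if torch.cuda.is_available() else "cpu"
model.to(device)

prompt = "ReLLM, the best way to get structured data out of LLMs, is an acronym for "

# Tokenize on the CPU, then move the input ids to the model's device.
# Omitting this .to(device) reproduces the RuntimeError above.
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

output_ids = model.generate(input_ids, do_sample=True, max_new_tokens=80)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))

So presumably ReLLM would need the equivalent .to(model.device) on every tensor it builds internally before GPU inference can work.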

@phiweger
Author

Related to #6, I guess.

@Emekaborisama

I'm having the same issue. Any help, please?
