How slow/fast is this method of calling generate() #11

Open
RevanthRameshkumar opened this issue Jul 23, 2023 · 0 comments
@RevanthRameshkumar

I noticed that one of the core parts of the strategy is to call generate() one token at a time, but I was wondering how slow/fast this is compared to using constrained beam search or something similar from HF.
Also curious what the speedup might be from implementing this in C++ versus via a Python wrapper. ggerganov/llama.cpp#1773

I actually think your approach is better for my use case, because there are many tweaks you can make even on the grammar sampling (as evidenced by the discussion in the above PR), but I am curious what the performance impact is.
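For readers unfamiliar with the technique being discussed: "calling generate one token at a time" means running the model step by step and masking the next-token logits with the grammar before sampling, rather than letting the library decode a whole sequence in one call. The following is a minimal, self-contained sketch of that loop — the toy model, vocabulary, and grammar table below are hypothetical stand-ins for illustration, not this repo's actual code or the HF API:

```python
# Hypothetical grammar: maps the last emitted token to the set of tokens
# the grammar permits next (here, a tiny JSON-like object).
ALLOWED = {
    "<s>": {"{"},
    "{": {'"key"'},
    '"key"': {":"},
    ":": {'"value"'},
    '"value"': {"}"},
}
VOCAB = ["<s>", "{", '"key"', ":", '"value"', "}", "<junk>"]


def toy_logits(prefix):
    # Stand-in for a model forward pass: returns a score per vocab token.
    # It mildly prefers "<junk>", so the grammar mask visibly matters.
    scores = {tok: 0.0 for tok in VOCAB}
    scores["<junk>"] = 1.0
    return scores


def constrained_generate(max_steps=8):
    tokens = ["<s>"]
    for _ in range(max_steps):
        allowed = ALLOWED.get(tokens[-1], set())
        if not allowed:
            break  # grammar accepts the string: nothing more may follow
        logits = toy_logits(tokens)
        # Mask step: consider only grammar-permitted tokens, then pick
        # the highest-scoring one (greedy decoding for simplicity).
        tokens.append(max(allowed, key=lambda t: logits[t]))
    return tokens[1:]


print("".join(constrained_generate()))  # → {"key":"value"}
```

The per-step overhead the question is about comes from this loop living in Python: each iteration pays a full generate()/forward call plus the Python-side masking, instead of one fused decode loop inside the library (or in C++, as in the linked llama.cpp PR).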
