
Fine Tuning #55


Closed
miolini opened this issue Mar 12, 2023 · 3 comments
Labels: model (Model specific)

Comments

miolini commented Mar 12, 2023

Hey!

Thank you for your amazing job!

I'm curious whether it is possible to use RLHF feedback after a response to make small incremental adjustments as part of a tuning process. For example, if the user decides to fine-tune after an incorrect answer, could the model spend 60 seconds in a fine-tuning phase, save a checkpoint to disk, and then move on to the next question?
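The loop being proposed (update under a fixed time budget, checkpoint, continue) can be sketched in plain Python. This is only an illustrative toy, not llama.cpp or RLHF code: the model is a hypothetical one-parameter linear fit, the helper names (`finetune_step`, `budgeted_finetune`) are made up for this sketch, and SGD on squared error stands in for whatever real update rule would be used.

```python
import json
import time

def finetune_step(w, example, lr=0.01):
    # One SGD step on a toy 1-D linear model y = w * x (squared error loss).
    x, y = example
    grad = 2 * (w * x - y) * x
    return w - lr * grad

def budgeted_finetune(w, examples, budget_secs, ckpt_path):
    # Apply incremental updates until the time budget is exhausted,
    # then save a checkpoint to disk and return the updated weight.
    deadline = time.monotonic() + budget_secs
    steps = 0
    while time.monotonic() < deadline:
        w = finetune_step(w, examples[steps % len(examples)])
        steps += 1
    with open(ckpt_path, "w") as f:
        json.dump({"w": w, "steps": steps}, f)
    return w

# Toy feedback data consistent with w == 2; a tiny budget keeps the demo fast.
examples = [(1.0, 2.0), (2.0, 4.0)]
w = budgeted_finetune(0.0, examples, budget_secs=0.1, ckpt_path="ckpt.json")
```

After the budget expires, the checkpoint on disk holds the adjusted weight, so the next question can start from it and the loop can repeat.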

@miolini changed the title from "Inplace Fine Tuning" to "Fine Tuning" Mar 12, 2023
@gjmulder (Collaborator)

I believe llama.cpp is only for inference, not training. Check out chatllama, but you will likely need some high-end GPUs to do RLHF. Alternatively, look at trl (with accelerate) for performing RLHF on models that fit on consumer GPUs.


miolini commented Mar 12, 2023

@gjmulder I would like to explore running such processes on CPU only. Even if it is super slow, I think it's possible to spend some time budget (60 secs) to improve the weights a bit and close the loop of self-improvement, as in Gödel machines.

@gjmulder (Collaborator)

Check out thread #23. That would allow you to have ChatGPT-style conversations with the model, but it is not RLHF.

@gjmulder added the model (Model specific) label Mar 15, 2023
@gjmulder closed this as not planned Mar 15, 2023