
Fine Tuning #55


Closed
miolini opened this issue Mar 12, 2023 · 3 comments
Labels: model (Model specific)

Comments

miolini commented Mar 12, 2023

Hey!

Thank you for your amazing job!

I'm curious whether it is possible to use RLHF feedback after a response to make small incremental adjustments as part of a tuning process. For example, if the user decides to fine-tune after an incorrect answer, could the model spend 60 seconds in a fine-tuning phase, save a checkpoint to disk, and then move on to the next question?
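The loop being proposed (update under a fixed time budget, checkpoint, continue) can be sketched in plain Python. This is only an illustrative toy, not llama.cpp or RLHF code: the model is a hypothetical one-parameter linear fit, the helper names (`finetune_step`, `budgeted_finetune`) are made up for this sketch, and SGD on squared error stands in for whatever real update rule would be used.

```python
import json
import time

def finetune_step(w, example, lr=0.01):
    # One SGD step on a toy 1-D linear model y = w * x (squared error loss).
    x, y = example
    grad = 2 * (w * x - y) * x
    return w - lr * grad

def budgeted_finetune(w, examples, budget_secs, ckpt_path):
    # Apply incremental updates until the time budget is exhausted,
    # then save a checkpoint to disk and return the updated weight.
    deadline = time.monotonic() + budget_secs
    steps = 0
    while time.monotonic() < deadline:
        w = finetune_step(w, examples[steps % len(examples)])
        steps += 1
    with open(ckpt_path, "w") as f:
        json.dump({"w": w, "steps": steps}, f)
    return w

# Toy feedback data consistent with w == 2; a tiny budget keeps the demo fast.
examples = [(1.0, 2.0), (2.0, 4.0)]
w = budgeted_finetune(0.0, examples, budget_secs=0.1, ckpt_path="ckpt.json")
```

After the budget expires, the checkpoint on disk holds the adjusted weight, so the next question can start from it and the loop can repeat.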

@miolini changed the title from "Inplace Fine Tuning" to "Fine Tuning" Mar 12, 2023
@gjmulder (Collaborator)

I believe llama.cpp is only for inference, not training. Check out chatllama, but you will likely need some high-end GPUs to do RLHF. Alternatively, look at trl (with accelerate) for performing RLHF on models that fit on consumer GPUs.


miolini commented Mar 12, 2023

@gjmulder I would like to explore running such processes on CPU only. Even if it is super slow, I think it's possible to spend some time budget (60 secs) to improve the weights a bit and close the loop of self-improvement, as in Gödel machines.

@gjmulder (Collaborator)

Check out thread #23. That would allow you to have ChatGPT-style conversations with the model, but it is not RLHF.

@gjmulder added the model (Model specific) label Mar 15, 2023
@gjmulder closed this as not planned Mar 15, 2023