
LLM Evaluation Tutorials with Evalverse #76

Open

jihoo-kim opened this issue May 27, 2024 · 2 comments

Comments

@jihoo-kim

Suggestion for LLM Evaluation Tutorials with Evalverse

@mlabonne (Owner)

Hey, thanks for the suggestion, this is quite exciting. I've been looking for something like this for a while.

I tried it yesterday and ran into some issues:

  • Results couldn't be written to disk (at least for MT-Bench and EQ-Bench), which means I lost my MT-Bench results
  • I couldn't use EQ-Bench without a default chat template (ChatML seems to be selected by default), which meant I couldn't use Llama 3's chat template

In general, I would really appreciate it if we could have an example with Llama 3.

@jihoo-kim (Author)

Thanks for accepting my suggestion and trying it out, @mlabonne.

Issue 1

Results couldn't be written to disk (at least for MT-Bench and EQ-Bench), which means I lost my MT-Bench results

Could you tell me which script you ran? If you specify the output_path argument, the results will be saved to disk. The default value of output_path is the directory where evalverse is located.

Please try again with your own output_path.

CLI

python3 evaluator.py \
    --ckpt_path {your_model} \
    --mt_bench \
    --num_gpus_total 8 \
    --parallel_api 4 \
    --output_path {your_path}

Library

import evalverse as ev

evaluator = ev.Evaluator()
evaluator.run(
    model={your_model},
    benchmark="mt_bench",
    num_gpus_total=8,
    parallel_api=4,
    output_path={your_path}
)
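
For instance, a filled-in call might look like the sketch below. The model ID and output directory are hypothetical placeholders chosen for illustration (Llama 3, as requested above), not Evalverse defaults:

import evalverse as ev

# Hypothetical values: swap in your own model and output directory.
evaluator = ev.Evaluator()
evaluator.run(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # hypothetical model ID
    benchmark="mt_bench",
    num_gpus_total=8,
    parallel_api=4,
    output_path="./evalverse_results",  # results are written under this path
)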

Issue 2

I couldn't use EQ-Bench without a default chat template (ChatML seems to be selected by default), which meant I couldn't use Llama 3's chat template

I will fix it as soon as possible and let you know. Thank you!
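
In the meantime, a possible workaround (just a sketch, and it assumes the Hugging Face transformers tokenizer API rather than anything built into evalverse) is to render prompts with the model's own chat template before scoring:

from transformers import AutoTokenizer

# Sketch: load Llama 3's tokenizer, which ships with its own chat_template.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

messages = [
    {"role": "user", "content": "Write a haiku about evaluation."},
]

# apply_chat_template renders the conversation with the model's template
# instead of a hard-coded default such as ChatML.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)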
