
Leaderboard README improvements #217


Open
nikita1503 wants to merge 2 commits into main

Conversation

@nikita1503 commented Apr 14, 2024

While trying to run the steps given in the leaderboard README, I found the following improvements:

  • 1 - Setup
    The model variable should be initialised before creating the generations and metrics directories.

  • 2 - Generations
    The save_generations flag is missing from the command for running generations.
    max_length should be 1024 for some tasks, depending on your tokeniser (fix for the max_length_generation parameter, #207).

  • 3 - Evaluations
    The generations file is saved to save_generations_path with a _$task suffix appended, so the evaluation step should load from that path (the _$task suffix is missing from the path in the README):

    save_generations_path = f"{os.path.splitext(args.save_generations_path)[0]}_{task}.json"
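For concreteness, a minimal sketch of that suffix logic (it mirrors the harness line quoted above; the file and task names below are made up for illustration):

import os

# The harness appends _{task} to the base of the save path, just before
# the .json extension, so the file on disk never matches the raw path.
def suffixed_path(save_generations_path: str, task: str) -> str:
    base = os.path.splitext(save_generations_path)[0]
    return f"{base}_{task}.json"

print(suffixed_path("generations_humaneval_mymodel.json", "humaneval"))
# -> generations_humaneval_mymodel_humaneval.json

So the evaluation step has to load the suffixed file, not the path that was originally passed in.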

@loubnabnl (Collaborator) left a comment

Hi, thanks for the fixes!

@@ -111,7 +113,7 @@ for lang in "${langs[@]}"; do
 task=multiple-$lang
 fi
 
-gen_suffix=generations_$task\_$model.json
+gen_suffix=generations_$task\_$model\_$task.json
@loubnabnl (Collaborator):

we don't need to have the same path format as here:

save_generations_path = f"{os.path.splitext(args.save_generations_path)[0]}_{task}.json"

because during evaluation we call --load_generations_path, which can be anything. So let's maybe keep the original path, to not have the task twice?
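To make the path relationship concrete, a small sketch (the task and model values are hypothetical; the suffixing mirrors the harness line above):

import os

task, model = "multiple-py", "starcoder"

# Generation: the harness appends _{task} to whatever base path is given,
# so the README's current save path produces a file with the task twice.
save_generations_path = f"generations_{task}_{model}.json"
saved_file = f"{os.path.splitext(save_generations_path)[0]}_{task}.json"
# saved_file == "generations_multiple-py_starcoder_multiple-py.json"

# Evaluation: --load_generations_path is free-form, so it only needs to
# point at saved_file, whichever naming convention the README settles on.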

@nikita1503 (Author):

> because during evaluation we call --load_generations_path, which can be anything.

Right, however the current README steps for evaluation pass the $gen_suffix variable to the --load_generations_path argument:

gen_suffix=generations_$task\_$model.json
metric_suffix=metrics_$task\_$model.json
echo "Evaluation of $model on $task benchmark, data in $generations_path/$gen_suffix"
sudo docker run -v $(pwd)/$generations_path/$gen_suffix:/app/$gen_suffix:ro -v $(pwd)/$metrics_path:/app/$metrics_path -it evaluation-harness-multiple python3 main.py \
--model $org/$model \
--tasks $task \
--load_generations_path /app/$gen_suffix \

Since $gen_suffix is missing the _$task suffix, running evaluations results in the following error:

[screenshot of the error]

After adding the _$task suffix to $gen_suffix, evaluations run successfully.

I was able to run evaluations for the Artigenz-Coder-DS-6.7B model here after these changes.

@loubnabnl (Collaborator):

It shouldn't throw an error if you used save_generations_path=generations_$task\_$model.json in the generation step.
