Leaderboard README improvements #217
Conversation
Hi, thanks for the fixes!
@@ -111,7 +113,7 @@ for lang in "${langs[@]}"; do
     task=multiple-$lang
   fi

-  gen_suffix=generations_$task\_$model.json
+  gen_suffix=generations_$task\_$model\_$task.json
We don't need to have the same path format as here:

bigcode-evaluation-harness/main.py, line 387 in 094c7cc:
save_generations_path = f"{os.path.splitext(args.save_generations_path)[0]}_{task}.json"

because during evaluation we call --load_generations_path, which can be anything. So let's maybe keep the original path so we don't have $task twice?
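For illustration, a minimal bash restatement of the renaming that the quoted main.py line performs (the model name below is hypothetical):

```bash
#!/bin/bash
# Minimal sketch (assuming the main.py behaviour quoted above): the harness
# strips the .json extension from --save_generations_path and appends _<task>.
task=multiple-py
model=my-model                                             # hypothetical model name
save_generations_path=generations_${task}_${model}.json    # path passed on the CLI
actual_path=${save_generations_path%.json}_${task}.json    # file main.py actually writes
echo "$actual_path"  # generations_multiple-py_my-model_multiple-py.json -> task appears twice
```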
> because during evaluation we call --load_generations_path, which can be anything.

Right, however the current README steps for Evaluation pass the $gen_suffix variable in the --load_generations_path argument:
bigcode-evaluation-harness/leaderboard/README.md, lines 114 to 121 in 642c57f:

gen_suffix=generations_$task\_$model.json
metric_suffix=metrics_$task\_$model.json
echo "Evaluation of $model on $task benchmark, data in $generations_path/$gen_suffix"
sudo docker run -v $(pwd)/$generations_path/$gen_suffix:/app/$gen_suffix:ro -v $(pwd)/$metrics_path:/app/$metrics_path -it evaluation-harness-multiple python3 main.py \
    --model $org/$model \
    --tasks $task \
    --load_generations_path /app/$gen_suffix \
Since $gen_suffix is missing the _$task suffix, running evaluations results in the following error:



After adding the _$task suffix to $gen_suffix, evaluations run successfully. I was able to run evaluations for Artigenz-Coder-DS-6.7B here after these changes.
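A small sanity check along these lines (a sketch reusing the README's $task, $model, and $generations_path variables) can catch the mismatch before the Docker run:

```bash
# Sketch: verify that the file the harness actually wrote (with the extra
# _$task suffix appended by main.py) exists before launching evaluation.
gen_suffix=generations_${task}_${model}_${task}.json
if [ ! -f "$generations_path/$gen_suffix" ]; then
  echo "Missing $generations_path/$gen_suffix; check the --save_generations_path used during generation" >&2
  exit 1
fi
```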
It shouldn't throw an error if you used save_generations_path=generations_$task\_$model.json in the generation step.
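For clarity, a sketch of how the save and load paths relate under the quoted main.py behaviour (an illustration using the README's variable names, not the maintainers' final layout):

```bash
# Path passed to --save_generations_path at generation time:
save_suffix=generations_${task}_${model}.json
# What main.py actually writes (it appends _$task before .json):
on_disk=${save_suffix%.json}_${task}.json
# Evaluation only succeeds if --load_generations_path points at $on_disk,
# whether the doubled suffix is hardcoded (as in this PR) or derived as above.
```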
While trying to run the steps given in the leaderboard README, I found the following improvements (a condensed sketch of the corrected flow follows this list):

1. Setup
- The model variable needs to be initialised before creating the generations and metrics directories.

2. Generations
- The save_generations flag is missing while running generations.
- max_length needs to be 1024 for some tasks, based on your tokeniser (Fix for max_length_generation parameter #207).

3. Evaluations
- The generations file is saved to save_generations_path with _$task appended, so when running evaluations it should be loaded from this path (_$task is missing in the path in the README):

bigcode-evaluation-harness/main.py, line 387 in 094c7cc:
save_generations_path = f"{os.path.splitext(args.save_generations_path)[0]}_{task}.json"
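Putting the three fixes together, a condensed sketch of the corrected flow (a sketch under the assumptions above; the HF org name is assumed and the generation/evaluation commands are abbreviated):

```bash
# 1. Setup: initialise $model (and $org) before creating the output directories.
org=Artigenz                      # assumed HF org for the model tested above
model=Artigenz-Coder-DS-6.7B
generations_path=generations_$model
metrics_path=metrics_$model
mkdir -p $generations_path $metrics_path

# 2. Generations: pass the previously missing --save_generations flag and a
#    task-appropriate --max_length_generation (e.g. 1024; see #207).
#    main.py then writes generations_${task}_${model}_${task}.json.

# 3. Evaluations: load from the file the harness actually wrote.
gen_suffix=generations_${task}_${model}_${task}.json
```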