
Add G-Pass@k Metric #589

Open

jnanliu wants to merge 6 commits into base: main

Conversation

jnanliu
Copy link

@jnanliu jnanliu commented Feb 26, 2025

This PR adds support for the G-Pass@k metric from the paper.

G-Pass@k is a generalized version of Pass@k that measures a model's ability to generate at least m correct solutions in k attempts, where m is controlled by the threshold parameter τ (m = ⌈τ·k⌉). As the threshold approaches 0, G-Pass@k reduces to Pass@k. G-Pass@k can therefore measure both the potential and the stability of a model.

$$ \text{G-Pass@}k_{\tau} = E_{\text{Questions}} \left[ \sum_{j = \lceil \tau \cdot k \rceil}^{c} \frac{\binom{c}{j} \cdot \binom{n - c}{k - j}}{\binom{n}{k}} \right] $$

$$ \text{mG-Pass@}k_{\tau} = 2\int_{0.5}^{1.0} \text{G-Pass@}k_{\tau} d \tau = \frac{2}{k} \sum_{i= \lceil 0.5 \cdot k \rceil + 1}^{k} \text{G-Pass@}k_{\frac{i}{k}} $$
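The two formulas above can be sketched directly in Python: `g_pass_at_k` is the tail of a hypergeometric distribution (probability of drawing at least ⌈τ·k⌉ correct answers when sampling k of n generations, c of which are correct), and `mg_pass_at_k` is its discrete average over thresholds above 0.5. This is a minimal illustration of the math, not the implementation in this PR; function names are my own.

```python
from math import comb, ceil

def g_pass_at_k(n: int, c: int, k: int, tau: float) -> float:
    # Hypergeometric tail: probability that at least ceil(tau * k) of the
    # k samples drawn without replacement from n generations (c correct)
    # are correct.
    m = max(ceil(tau * k), 1)  # tau -> 0 recovers Pass@k (at least one correct)
    return sum(
        comb(c, j) * comb(n - c, k - j) / comb(n, k)
        for j in range(m, min(c, k) + 1)
    )

def mg_pass_at_k(n: int, c: int, k: int) -> float:
    # Discrete form of 2 * integral over tau in [0.5, 1.0] of G-Pass@k_tau.
    start = ceil(0.5 * k) + 1
    return 2.0 / k * sum(g_pass_at_k(n, c, k, i / k) for i in range(start, k + 1))
```

For example, with n=8 generations of which c=4 are correct and k=4 draws, `g_pass_at_k(8, 4, 4, 1.0)` is the probability that all four draws are correct, C(4,4)/C(8,4) = 1/70. Averaging over questions gives the expectation in the formula.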

@NathanHB
Copy link
Member

NathanHB commented Mar 4, 2025

Hey! Thanks for the PR :)
Do you plan to also add the math benchmark that comes with it?

@jnanliu
Copy link
Author

jnanliu commented Mar 5, 2025

Hey, I have added some tasks in tasks/default_tasks.py that support G-Pass@16 evaluation on the AIME24/25 and MATH500 benchmarks; you can check them out :)
