
Add G-Pass@k Metric #589

Open

jnanliu wants to merge 6 commits into base: main

Conversation

jnanliu
Copy link

@jnanliu jnanliu commented Feb 26, 2025

This PR adds support for the G-Pass@k metric from the paper.

G-Pass@k is a generalized version of Pass@k that measures a model's ability to generate at least m correct solutions in k attempts, where m is controlled by the threshold parameter τ (m = ⌈τ·k⌉). As the threshold approaches 0, G-Pass@k reduces to Pass@k. G-Pass@k can therefore measure both the potential and the stability of a model.

$$ \text{G-Pass@}k_{\tau} = E_{\text{Questions}} \left[ \sum_{j = \lceil \tau \cdot k \rceil}^{c} \frac{\binom{c}{j} \cdot \binom{n - c}{k - j}}{\binom{n}{k}} \right] $$

$$ \text{mG-Pass@}k_{\tau} = 2\int_{0.5}^{1.0} \text{G-Pass@}k_{\tau} d \tau = \frac{2}{k} \sum_{i= \lceil 0.5 \cdot k \rceil + 1}^{k} \text{G-Pass@}k_{\frac{i}{k}} $$
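The two formulas above can be sketched directly in Python: `g_pass_at_k` is the tail of a hypergeometric distribution (probability of drawing at least ⌈τ·k⌉ correct answers when sampling k of n generations, c of which are correct), and `mg_pass_at_k` is its discrete average over thresholds above 0.5. This is a minimal illustration of the math, not the implementation in this PR; function names are my own.

```python
from math import comb, ceil

def g_pass_at_k(n: int, c: int, k: int, tau: float) -> float:
    # Hypergeometric tail: probability that at least ceil(tau * k) of the
    # k samples drawn without replacement from n generations (c correct)
    # are correct.
    m = max(ceil(tau * k), 1)  # tau -> 0 recovers Pass@k (at least one correct)
    return sum(
        comb(c, j) * comb(n - c, k - j) / comb(n, k)
        for j in range(m, min(c, k) + 1)
    )

def mg_pass_at_k(n: int, c: int, k: int) -> float:
    # Discrete form of 2 * integral over tau in [0.5, 1.0] of G-Pass@k_tau.
    start = ceil(0.5 * k) + 1
    return 2.0 / k * sum(g_pass_at_k(n, c, k, i / k) for i in range(start, k + 1))
```

For example, with n=8 generations of which c=4 are correct and k=4 draws, `g_pass_at_k(8, 4, 4, 1.0)` is the probability that all four draws are correct, C(4,4)/C(8,4) = 1/70. Averaging over questions gives the expectation in the formula.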

@NathanHB
Copy link
Member

NathanHB commented Mar 4, 2025

Hey! Thanks for the PR :)
Do you plan to also add the math benchmark that comes with it?

@jnanliu
Copy link
Author

jnanliu commented Mar 5, 2025

Hey, I have added some tasks in tasks/default_tasks.py that support G-Pass@16 evaluation on the AIME24/25 and MATH500 benchmarks; you can check them out :)
