Hi,
Thank you for the great work!
I have two questions about how the pass@k metric is computed after filtering on the APPS benchmark.
1. Will the `total` array in the code snippet below contain the number of filtered samples per problem, i.e. the samples that passed the example test cases from the problem statement, so that each entry is <= N_original_samples (= 1000)?
human-eval/human_eval/evaluation.py, line 85 in 312c5e5
2. When the number of filtered samples is less than k (k in [1, 5]), how do you compute pass@k? For example, when N_filtered_samples = 1 and k = 5, can we assume execution results of 4 failures plus 1 pass or failure (depending on the final unit test results of this filtered sample)?
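The padding assumption in the second question can be made concrete with a small sketch. It only illustrates the interpretation being asked about and is not the repository's implementation. `pass_at_k` is the standard unbiased estimator 1 - C(n - c, k) / C(n, k), while `pass_at_k_padded` and its `n_filtered` parameter are hypothetical names introduced here.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate: 1 - C(n - c, k) / C(n, k).

    n: number of samples, c: number of correct samples, k: budget.
    The estimator is only defined for n >= k.
    """
    if n < k:
        raise ValueError("pass@k is undefined when n < k")
    # math.comb returns 0 when k > n - c, which gives an estimate of 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)


def pass_at_k_padded(n_filtered: int, c: int, k: int) -> float:
    """Hypothetical variant: pad the filtered set with failures up to k.

    Assumption (not confirmed by the source): when fewer than k samples
    survive filtering, the missing k - n_filtered slots count as failures.
    """
    n = max(n_filtered, k)
    return pass_at_k(n, c, k)


# One filtered sample, k = 5: the result depends only on that sample.
print(pass_at_k_padded(1, 1, 5))  # the single sample passed
print(pass_at_k_padded(1, 0, 5))  # the single sample failed
```

Under this assumption, a problem with one filtered sample contributes 1.0 to pass@5 if that sample passes the final unit tests and 0.0 otherwise, which matches the "4 failures and 1 pass or failure" reading.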