Llama results in ASQA #25

Guanyu-Lin · 2024-05-13T18:52:10Z

Hi,

After running the code with the Llama 7B config for ASQA (https://github.com/princeton-nlp/ALCE/blob/main/configs/asqa_llama-7b_shot1_ndoc3_gtr_default.yaml), I get the result below.
{
"length": 88.25,
"str_em": 24.326652601969055,
"str_hit": 7.59493670886076,
"QA-EM": 10.161744022503516,
"QA-F1": 14.520007320055194,
"QA-Hit": 1.160337552742616,
"mauve": 59.28902389177575
}
The mauve score here (about 59.3) differs from the 69.8 reported in Table 19 of the paper. Is something wrong?

Also, does Correct (EM Rec.) in the paper correspond to "str_em" here rather than "QA-EM"?

gaotianyu1350 · 2024-05-13T20:35:16Z

Hi,

str_em corresponds to EM Rec. I'm not sure what is going on with Mauve, but it is known to be highly unstable: a slight change in environment can lead to drastically different results. We mostly use Mauve as a sanity check, so I wouldn't worry too much about it.
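For anyone comparing their own run against the paper, here is a minimal sketch of reading the result above and lining it up with the reported number. It only assumes the JSON shape posted in this thread; the names `summarize` and `REPORTED_MAUVE` are hypothetical helpers for illustration and are not part of the ALCE code.

```python
import json

# Metrics copied from the result posted earlier in this thread.
RESULT_JSON = """
{
  "length": 88.25,
  "str_em": 24.326652601969055,
  "str_hit": 7.59493670886076,
  "QA-EM": 10.161744022503516,
  "QA-F1": 14.520007320055194,
  "QA-Hit": 1.160337552742616,
  "mauve": 59.28902389177575
}
"""

# Table 19 value quoted in the question (hypothetical constant name).
REPORTED_MAUVE = 69.8


def summarize(result: dict) -> dict:
    """Pick out the metrics discussed in this thread, rounded to one decimal."""
    return {
        # str_em is the key that corresponds to Correct (EM Rec.) in the paper.
        "em_rec": round(result["str_em"], 1),
        "mauve": round(result["mauve"], 1),
        # Gap to the reported value; Mauve is unstable, so treat this as a
        # sanity check rather than a reproduction failure.
        "mauve_gap": round(REPORTED_MAUVE - result["mauve"], 1),
    }


summary = summarize(json.loads(RESULT_JSON))
print(summary)
```

The gap on Mauve alone is not strong evidence of a problem, given how sensitive the metric is to the environment; str_em is the more useful number to compare.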
