Only has 0.44 accuracy on GSM8K after running the provided codes #13

TrueNobility303 opened this issue Jun 10, 2024 · 10 comments

@TrueNobility303 commented Jun 10, 2024

Dear authors,

I trained the CLLM model on GSM8K with Abel-7B-001 as the teacher model, using the cleaned_gsm8k_jacobi dataset you provided on Hugging Face. I ran train_cllm.sh and set "use_gt_labels" in train_cllm_global.py to False, following the previous issue.

The trained model only reaches an accuracy of 0.44 when evaluated with bash eval/gsm8k/acc.sh, which is much lower than the result of the checkpoint you provided.

Could you tell me what is going wrong? What are the exact hyperparameters needed to reproduce the results?

I would greatly appreciate it if you could help me.

Best regards.

@TrueNobility303 changed the title from "Cannot reproduce the result" to "Only has 0.44 accuracy on GSM8K after running the provided codes" on Jun 10, 2024
@karrykkk (Collaborator)

Hi~ Thanks for your interest in our work! ☺️
Please set use_gt_labels to True so that the stricter ground-truth AR labels are used (teacher_output_ids are closer to the original distribution but may not be accurate), and train for a whole epoch on this dataset (max_new_seq_len=256 in the collected Jacobi trajectories). Model collapse may happen during training, but the AR loss on the ground-truth labels will pull the model back toward the normal distribution with further training.
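To illustrate what the flag is meant to switch between, here is a minimal sketch; the field names labels_ids and teacher_output_ids come from the trajectory dataset, but the exact logic in train_cllm_global.py may differ in detail:

```python
# Hypothetical sketch of the use_gt_labels switch; not copied from the
# actual training script.
def select_ar_labels(sample, use_gt_labels: bool):
    if use_gt_labels:
        # Stricter supervision: the ground-truth (question + answer) token ids.
        return sample["labels_ids"]
    # Otherwise fall back to the teacher's own generations, which are closer
    # to the teacher's output distribution but may contain errors.
    return sample["teacher_output_ids"]
```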
Feel free to reach out if there are still problems.

@TrueNobility303 (Author)

I use exactly the same dataset and train for a whole epoch, but I cannot get the desired result regardless of whether use_gt_labels is set to True or False.

@karrykkk (Collaborator)

Hi~ Sorry for the confusion. This may result from a bug in the training script (i.e., forgetting to add detach() to the loss, which leads to extra backpropagation). We have fixed this in the latest version of the code:

loss = loss_ar.detach() + loss_global.detach()
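For context, .detach() removes a term from the autograd graph, so a detached term is treated as a constant and contributes no gradient. A minimal PyTorch sketch of this effect (not taken from the training script):

```python
import torch

w = torch.tensor(2.0, requires_grad=True)
loss_a = (w - 1.0) ** 2
loss_b = (w - 3.0) ** 2

# Detaching a term removes it from the autograd graph, so only the
# non-detached term contributes to w.grad.
loss = loss_a + loss_b.detach()
loss.backward()
print(w.grad)  # gradient of loss_a only: 2 * (2.0 - 1.0) = 2.0
```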

You can try running the updated code and the results should be normal now. Let me know if you have any other questions!

@TrueNobility303 (Author) commented Jun 18, 2024

But after this modification, the accuracy drops to 0.0. It seems that the modification is not correct.

@karrykkk (Collaborator) commented Jun 20, 2024

My bad😥... While we do use .detach() during CLLM training, we found that the key bug causing this result is that the labels_ids field in the dataset we provided earlier is defective. As shown in the screenshot below, labels_ids only includes the question, and the ground-truth answer is missing.
[screenshot: a dataset sample whose labels_ids contains only the question tokens, with the ground-truth answer missing]
We have launched a fixed generation run and will update the dataset on Hugging Face as soon as it finishes. You can either wait for the generation to complete or generate your own dataset by running the provided script. Sorry again, and thanks for your patience and understanding.
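In the meantime, a quick way to check whether a sample is affected is to decode its labels_ids and look for the answer text. A rough sketch, where the dataset path and model path are placeholders and the field layout may differ from the actual release:

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Placeholder identifiers: point these at the actual Jacobi-trajectory dataset
# and the tokenizer that was used to collect it.
ds = load_dataset("path/to/cleaned_gsm8k_jacobi", split="train")
tok = AutoTokenizer.from_pretrained("path/to/Abel-7B-001")

sample = ds[0]
# Drop any ignore-index entries (e.g. -100) if they are present in the labels.
ids = [t for t in sample["labels_ids"] if t >= 0]
decoded = tok.decode(ids, skip_special_tokens=True)
print(decoded)  # should contain both the question and the ground-truth answer
```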

@TrueNobility303
Copy link
Author

But it is strange that setting use_gt_labels=False still does not solve this problem.

@snyhlxde1 (Collaborator) commented Aug 5, 2024

Hi @TrueNobility303. Thanks for your patience! We have identified the problems in the training script:

  1. Instead of complete_teacher_output_ids, the teacher_output_ids from the trajectory dataset should be used.
  2. During training, the model should be loaded in bf16, consistent with the setting used when generating the trajectories (see the sketch below).

Please pull the code again. After applying these patches, training should give you much better performance, in line with what we reported in the paper.
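For item 2, a minimal sketch of loading the model in bf16 with transformers; the model path is a placeholder, and the actual loading is done inside the training script:

```python
import torch
from transformers import AutoModelForCausalLM

# Load the target model in bfloat16 so the training precision matches the
# precision used when the Jacobi trajectories were generated.
model = AutoModelForCausalLM.from_pretrained(
    "path/to/target_model",  # placeholder path
    torch_dtype=torch.bfloat16,
)
```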

@Songwxuan

Hi~ Thank you for your reply. So after the modification, should I set use_gt_labels to True or not?

@karrykkk (Collaborator) commented Jan 8, 2025

Sorry for the earlier confusion. We have checked that the current version, with use_gt_labels set to False, can reproduce the results reported in the paper.

@Songwxuan

Thanks a lot!
