
Problems When using CACHE #22

Open
Louis-ZhangLe opened this issue Jan 23, 2024 · 9 comments

Comments

@Louis-ZhangLe

Thanks for your great work. When I was reproducing your work using the CACHE, I could not find a matching hash key in the responses file. Looking forward to your reply, thank you.

@huy-ha
Member

huy-ha commented Jan 24, 2024

Hey! I'm able to reproduce your error. When I roll back to the first commit 3d2f43c, I no longer get the same error. My guess is that this bug was introduced in 218a618.

Unfortunately, I won't have time to fix this issue for another week. For now, if you don't need the FR5 robot, could you also use 3d2f43c? Thanks!

@Louis-ZhangLe
Author

Thank you for your answer. I'll give it a try first. Also, the reason I didn't use the OpenAI API is that there were no logprobs in the response. May I ask whether it is possible to remove logprobs from the code? Looking forward to your reply, thank you.

@huy-ha
Member

huy-ha commented Jan 25, 2024

Ah right, the OpenAI API removed logprobs recently. Anyway, you should be able to remove the logprobs from the completion sampling procedure without affecting the results too much!
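
For example, a minimal sketch of what that could look like with the pre-1.0 `openai` package (this is not the repo's actual sampling code; the model name and prompt below are placeholders):

```python
import openai

# Sample several completions and simply skip the logprobs field.
# Model name and prompt are placeholders, not values from the repo.
response = openai.Completion.create(
    model="gpt-3.5-turbo-instruct",
    prompt="Describe the next subtask for the robot.",
    max_tokens=64,
    temperature=0.7,
    n=4,  # rely on multiple samples instead of ranking by logprobs
    # note: the `logprobs` argument is omitted entirely
)
completions = [choice.text for choice in response.choices]
```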

@Louis-ZhangLe
Author

Hello, I have successfully run the version from your first commit. I started by reproducing the transport task, but the training results do not reach the performance reported in the paper, and the gap is quite large. Could you provide more details on model training, such as the training parameters, num_steps_per_update, batch_size, and number of epochs for each domain? Looking forward to your reply, thank you.

@huy-ha
Member

huy-ha commented Jan 31, 2024

Hey! The default training parameters are the ones I used (batch size of 1024, 10 epochs, 1 num steps per update, etc.).
How many datapoints were used for training?
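
For reference, the defaults mentioned above collected in one place (the key names below are paraphrases of the parameters discussed in this thread, not the repo's exact config keys):

```python
# Defaults reported above; names are paraphrases, not the repo's exact config keys.
default_train_params = {
    "batch_size": 1024,
    "epochs": 10,
    "num_steps_per_update": 1,
}
```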

@Louis-ZhangLe
Author

The transport task has 52,133 datapoints. I found that the default value of num_steps_per_update is 10,000, which would mean the model is updated 10,000 times per epoch. Are you sure you set it to 1?
Moreover, is it necessary to test during training, or is validation alone enough, i.e. evaluation.num_episodes=0?
In addition, I found that inference with the diffusion policy is slow and keeps printing warnings such as "WARNING Failed to converge after 299 steps: err_norm=0.104888". Is this normal?
Finally, I would also like to ask about the best checkpoint, "last.ckpt": why can't I load it? It says there is no such file even though the path is correct, and other checkpoints load fine. Looking forward to your reply, thank you.

@Louis-ZhangLe
Author

Louis-ZhangLe commented Jan 31, 2024

Sorry, I was referring to "num_steps_per_update", not "num_timesteps_per_batch".

@OceansDepp

Hi! I just rolled back to the first commit and downloaded the responses files, but I still have this issue: 'openai.error.AuthenticationError: No API key provided. You can set your API key in code using 'openai.api_key = ', or you can set the environment variable OPENAI_API_KEY=). If your API key is stored in a file, you can point the openai module at it with 'openai.api_key_path = '. ....' I have no idea what the reason might be. Looking forward to the reply from both of you, thank you!
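
For reference, the fix the error message itself describes, for the pre-1.0 `openai` package this traceback comes from, is a minimal sketch like the following (the key value is of course your own):

```python
import os
import openai

# Either export the key in your shell before launching, e.g. OPENAI_API_KEY="sk-...",
# or set it in code, matching the error message above:
openai.api_key = os.environ["OPENAI_API_KEY"]
```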

@Root970103

> Hi! I just rolled back to the first commit and downloaded the responses files, but I still have this issue: 'openai.error.AuthenticationError: No API key provided. ...' I have no idea what the reason might be.

Maybe you can check the path where the cache file is saved. Make sure it is scalingup/scalingup/responses.
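
A quick sanity check along those lines (a minimal sketch; the path is the one suggested above, relative to your checkout):

```python
from pathlib import Path

# Verify the suggested cache location exists and peek at its contents.
cache_path = Path("scalingup/scalingup/responses")
print(cache_path.resolve())
print("exists:", cache_path.exists())
if cache_path.is_dir():
    print("entries:", sorted(p.name for p in cache_path.iterdir())[:5])
```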
