
Questions on the Transport task #29

Open
Knight-xiao opened this issue Dec 28, 2024 · 13 comments
Assignees: allenzren
Labels: bug (Something isn't working)

Comments

@Knight-xiao

Thank you very much for your excellent work. I have a few questions I would like to ask.
Firstly, I tried to reproduce the Transport task in robomimic. After fine-tuning, the success rate displayed in the terminal reached over 90%. However, when I ran the eval config, the success rate was far below 90%. I also evaluated the policy obtained from pre-training, and surprisingly, the pre-trained and fine-tuned policies behaved exactly the same. Do you have any suggestions? Note that I used pre_diffusion_mlp for pre-training and ft_ppo_diffusion_mlp for fine-tuning, and in ft_ppo_diffusion_mlp I changed base_policy_path to the checkpoint obtained from pre-training.
Additionally, I successfully recorded the Transport task by following your tutorial, mainly by modifying render_num, env.n_envs, and env.save_video in eval_diffusion_mlp. However, what should I do to display the simulation of the Transport task live instead of recording it?

@allenzren
Member

Hi! Could you share the commit you are running with, and also the exact files/configs? I can try to help if you can send me those.

For the second question, do you mean showing the GUI live instead of recording the video? I've done that before by not using the vectorized environment, but I don't have the script available right now. You can take a look at the robomimic/robosuite docs.
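
For reference, here is a minimal sketch of running a robomimic environment with the on-screen robosuite renderer instead of the vectorized setup, along the lines described above. The env-meta path and the random actions are placeholders (substitute the policy's output), and the robomimic calls should be checked against the docs for your installed version:

import json
import numpy as np
import robomimic.utils.env_utils as EnvUtils

# Load the same env meta JSON that the eval config points to
# (robomimic_env_cfg_path); the transport path below is illustrative.
with open("cfg/robomimic/env_meta/transport.json") as f:
    env_meta = json.load(f)

# render=True requests an on-screen robosuite renderer; no vectorized envs here.
env = EnvUtils.create_env_from_metadata(
    env_meta=env_meta,
    render=True,
    render_offscreen=False,
    use_image_obs=False,
)

obs = env.reset()
for _ in range(400):
    # Placeholder random action; replace with the diffusion policy's output.
    action = np.random.uniform(-1, 1, env.action_dimension)
    obs, reward, done, info = env.step(action)
    env.render(mode="human")
    if done:
        break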

@Knight-xiao
Author

Thank you for your answer! I found that the aforementioned issue is not limited to the Transport task; other tasks also exhibit the same problem. Below are some configurations related to the Square task:
First, I conducted pre-training by executing the command:

python script/run.py --config-name=pre_diffusion_mlp --config-dir=cfg/robomimic/pretrain/square

Here, I did not make any modifications to the pre-train configuration. Below are the results obtained from directly evaluating the pre-trained policy:
[screenshot: evaluation results of the pre-trained policy]

Then, I performed fine-tuning by executing the command:

python script/run.py --config-name=ft_ppo_diffusion_mlp --config-dir=cfg/robomimic/finetune/square

Here, I only modified the base_policy_path in the configuration file as follows:

name: ${env_name}_ft_diffusion_mlp_ta${horizon_steps}_td${denoising_steps}_tdf${ft_denoising_steps}
logdir: ${oc.env:DPPO_LOG_DIR}/robomimic-finetune/${name}/${now:%Y-%m-%d}_${now:%H-%M-%S}_${seed}
# base_policy_path: ${oc.env:DPPO_LOG_DIR}/robomimic-pretrain/square/square_pre_diffusion_mlp_ta4_td20/2024-07-10_01-46-16/checkpoint/state_8000.pt
base_policy_path: ${oc.env:DPPO_LOG_DIR}/robomimic-pretrain/square_pre_diffusion_mlp_ta4_td20/2024-12-28_18-14-32_42/checkpoint/state_3000.pt # This is my pre-train policy
robomimic_env_cfg_path: cfg/robomimic/env_meta/${env_name}.json
normalization_path: ${oc.env:DPPO_DATA_DIR}/robomimic/${env_name}/normalization.npz

Below are the results obtained from the fine-tuned policy:
[screenshot: fine-tuning log showing ~99% success rate]

@Knight-xiao
Author

From the second figure, it appears that the fine-tuned policy has achieved a 99% success rate, significantly higher than the 49% success rate of the pre-trained policy. However, when actually evaluating the fine-tuned policy, its performance is the same as that of the pre-trained policy:
[screenshot: evaluation results of the fine-tuned policy]
It should be noted that the only modification I made in cfg/robomimic/eval/square/eval_diffusion_mlp.yaml was to base_policy_path:

name: ${env_name}_eval_diffusion_mlp_ta${horizon_steps}_td${denoising_steps}
logdir: ${oc.env:DPPO_LOG_DIR}/robomimic-eval/${name}/${now:%Y-%m-%d}_${now:%H-%M-%S}_${seed}
# pre-train policy
# base_policy_path: ${oc.env:DPPO_LOG_DIR}/robomimic-pretrain/square_pre_diffusion_mlp_ta4_td20/2024-12-28_18-14-32_42/checkpoint/state_3000.pt

# fine-tune policy
base_policy_path: ${oc.env:DPPO_LOG_DIR}/robomimic-finetune/square_ft_diffusion_mlp_ta4_td20_tdf10/2024-12-28_21-20-17_42/checkpoint/state_200.pt
robomimic_env_cfg_path: cfg/robomimic/env_meta/${env_name}.json
normalization_path: ${oc.env:DPPO_DATA_DIR}/robomimic/${env_name}/normalization.npz

@Knight-xiao
Author

The above are the modifications I made to the relevant configuration files during the training process. Do you need any other information? I sincerely appreciate your help and also wish you a Happy New Year!

@allenzren
Member

allenzren commented Dec 29, 2024

Hmm, I see your logs but I don't have a clue. Maybe you can look at the weights of the pre-trained and the saved fine-tuned policies and make sure they are not the same?

Since the reward numbers match exactly, I would suspect something is off with saving/loading the policy. Maybe my code has a bug, but I would need to run the training myself if you still can't figure it out.

@Knight-xiao
Author

Thank you very much for your answer, but I'm sorry that I couldn't find the weights of the pre-trained and saved fine-tuned policies in the YAML configuration file.

@allenzren
Member

Oh, I meant loading the weights of the two policies in torch and checking whether they are the same. It is really suspicious that the pre-trained and fine-tuned policies show exactly the same evaluation reward (80.595), so I wonder if the policies are actually identical.
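
A rough sketch of such a check (the checkpoint paths are illustrative, and it assumes the checkpoints store the parameters under a "model" key, as in the loading code discussed later in this thread):

import torch

# Illustrative paths; point these at the pre-trained and fine-tuned checkpoints.
pre = torch.load("state_3000.pt", map_location="cpu", weights_only=True)
ft = torch.load("state_200.pt", map_location="cpu", weights_only=True)
pre_sd, ft_sd = pre["model"], ft["model"]

# Compare every tensor that appears in both checkpoints.
shared = sorted(set(pre_sd) & set(ft_sd))
identical = [k for k in shared if torch.equal(pre_sd[k], ft_sd[k])]
print(f"{len(identical)}/{len(shared)} shared tensors are bit-identical")

# Keys present in only one of the two (e.g. fine-tuned-only parameters).
print("only in fine-tuned checkpoint:", sorted(set(ft_sd) - set(pre_sd))[:10])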

@ltl520

ltl520 commented Jan 2, 2025

Hi! I encountered the same problem when training the one_leg task. I directly used the pre-trained model you released for fine-tuning. When testing the pre-trained checkpoint with the eval_diffusion_mlp code, the success rate was 40%. After fine-tuning for 200 epochs, the success rate reported by the fine-tuning code was over 90%.
[screenshot: fine-tuning log, success rate over 90%]
However, when testing the fine-tuned checkpoint with the eval_diffusion_mlp code, the success rate was 45%.
[screenshot: evaluation of the fine-tuned checkpoint, success rate 45%]
Like @Knight-xiao, I only modified base_policy_path in the eval_diffusion_mlp configuration file.
[screenshot: modified eval_diffusion_mlp config]
Is this situation normal? What could be the cause? Thank you very much.

@allenzren
Member

allenzren commented Jan 2, 2025 via email

@ltl520

ltl520 commented Jan 2, 2025

Thanks for your reply!

@allenzren
Member

allenzren commented Jan 3, 2025

@Knight-xiao @ltl520 Hey guys, really sorry about this, but I think I have a bug in saving and loading the fine-tuned checkpoints for eval. Basically, here

self.load_state_dict(checkpoint["model"], strict=False)
when the policy is loaded for eval, it looks for the "network" parameters, which are not the fine-tuned weights.

I will make sure to fix this tomorrow (including which parameters to save during training); it is really late in my timezone now. Meanwhile you can try it yourself.
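
One quick way to confirm which parameter groups a saved fine-tuned checkpoint actually contains is to count the keys in checkpoint["model"] by their top-level prefix (a sketch; the path is illustrative):

import torch
from collections import Counter

# Illustrative path to a fine-tuned checkpoint.
ckpt = torch.load("state_200.pt", map_location="cpu", weights_only=True)

# Count tensors per top-level module prefix, e.g. "network", "actor", "actor_ft".
print(Counter(k.split(".")[0] for k in ckpt["model"]))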

allenzren added the bug label on Jan 3, 2025
allenzren self-assigned this on Jan 3, 2025
@ltl520

ltl520 commented Jan 3, 2025

Thanks! I fixed it by adding a test_model_load method to agent/eval/eval_agent.py that loads the fine-tuned weights, and it works:

def test_model_load(self):
    """
    Load the fine-tuned (RL-trained) weights from disk for evaluation.
    (torch and logging are already imported in eval_agent.py.)
    """
    loadpath = self.network_path
    checkpoint = torch.load(loadpath, weights_only=True)
    if "ema" not in checkpoint:
        # Merge the fine-tuned actor weights (keys containing "actor_ft.") into
        # the eval model's state dict, then load it non-strictly.
        model_dict = self.model.state_dict()
        state_dict = {k: v for k, v in checkpoint["model"].items() if "actor_ft." in k}
        model_dict.update(state_dict)
        self.model.load_state_dict(model_dict, strict=False)
        logging.info("Loaded RL-trained policy for eval from %s", loadpath)

@allenzren
Member

allenzren commented Jan 3, 2025

@ltl520 Cool! Yeah, this works if all denoising steps are fine-tuned, which is the case with the furniture DDIM setup. For @Knight-xiao, the pre-trained policy (frozen during fine-tuning) also needs to be set up for inference at the early denoising steps.

@Knight-xiao, would you like to try out this branch for me? #31 Thanks very much!
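
To illustrate why eval needs both sets of weights in that case, here is a conceptual sketch (placeholder names and signatures, not the DPPO code): the frozen pre-trained network handles the early, high-noise denoising steps, and the fine-tuned copy handles the last ft_denoising_steps.

import torch

# Conceptual sketch only: a partially fine-tuned diffusion policy uses the
# frozen pre-trained network for early denoising steps and the fine-tuned
# network for the last ft_denoising_steps, so eval must load both.
@torch.no_grad()
def sample(base_net, ft_net, x, cond, denoising_steps, ft_denoising_steps, step_fn):
    for t in reversed(range(denoising_steps)):
        net = ft_net if t < ft_denoising_steps else base_net  # final steps use fine-tuned weights
        eps = net(x, t, cond)   # predicted noise at denoising step t
        x = step_fn(x, eps, t)  # one reverse-diffusion update (placeholder)
    return x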
