
Questions on the Transport task #29

Open
Knight-xiao opened this issue Dec 28, 2024 · 13 comments
Assignees: allenzren
Labels: bug (Something isn't working)

Comments

@Knight-xiao

Thank you very much for your excellent work. I have a few questions I would like to ask.
Firstly, I tried to reproduce the Transport task in robomimic. After fine-tuning, the success rate displayed in the terminal reached over 90%. However, when I ran the eval config, the success rate was far below 90%. I also evaluated the policy obtained from pre-training, and surprisingly, the pre-trained and fine-tuned policies behaved exactly the same. Do you have any suggestions? Note that I used pre_diffusion_mlp for pre-training and ft_ppo_diffusion_mlp for fine-tuning, and in ft_ppo_diffusion_mlp I changed base_policy_path to the checkpoint obtained from pre-training.
Additionally, I successfully recorded the Transport task by following your tutorial, mainly by modifying render_num, env.n_envs, and env.save_video in eval_diffusion_mlp. However, what should I do to display the simulation of the Transport task live instead of recording it?

@allenzren
Member

Hi! Could you share the commit you are running with, and also the exact files/configs? I can try to help if you can send me those.

For the second question, do you mean showing the GUI live instead of recording the video? I've done that before by not using the vectorized environment, but I don't have the script available right now. You can take a look at the robomimic/robosuite docs.
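
For reference, here is a minimal sketch of running a robomimic environment with the on-screen robosuite renderer instead of the vectorized setup, along the lines described above. The env-meta path and the random actions are placeholders (substitute the policy's output), and the robomimic calls should be checked against the docs for your installed version:

import json
import numpy as np
import robomimic.utils.env_utils as EnvUtils

# Load the same env meta JSON that the eval config points to
# (robomimic_env_cfg_path); the transport path below is illustrative.
with open("cfg/robomimic/env_meta/transport.json") as f:
    env_meta = json.load(f)

# render=True requests an on-screen robosuite renderer; no vectorized envs here.
env = EnvUtils.create_env_from_metadata(
    env_meta=env_meta,
    render=True,
    render_offscreen=False,
    use_image_obs=False,
)

obs = env.reset()
for _ in range(400):
    # Placeholder random action; replace with the diffusion policy's output.
    action = np.random.uniform(-1, 1, env.action_dimension)
    obs, reward, done, info = env.step(action)
    env.render(mode="human")
    if done:
        break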

@Knight-xiao
Author

Thank you for your answer! I found that the aforementioned issue is not limited to the Transport task; other tasks also exhibit the same problem. Below are some configurations related to the Square task:
First, I conducted pre-training by executing the command:

python script/run.py --config-name=pre_diffusion_mlp --config-dir=cfg/robomimic/pretrain/square

Here, I did not make any modifications to the pre-train configuration. Below are the results obtained from directly evaluating the pre-trained policy:
[screenshot: evaluation results of the pre-trained policy]

Then, I performed fine-tuning by executing the command:

python script/run.py --config-name=ft_ppo_diffusion_mlp --config-dir=cfg/robomimic/finetune/square

Here, I only modified the base_policy_path in the configuration file as follows:

name: ${env_name}_ft_diffusion_mlp_ta${horizon_steps}_td${denoising_steps}_tdf${ft_denoising_steps}
logdir: ${oc.env:DPPO_LOG_DIR}/robomimic-finetune/${name}/${now:%Y-%m-%d}_${now:%H-%M-%S}_${seed}
# base_policy_path: ${oc.env:DPPO_LOG_DIR}/robomimic-pretrain/square/square_pre_diffusion_mlp_ta4_td20/2024-07-10_01-46-16/checkpoint/state_8000.pt
base_policy_path: ${oc.env:DPPO_LOG_DIR}/robomimic-pretrain/square_pre_diffusion_mlp_ta4_td20/2024-12-28_18-14-32_42/checkpoint/state_3000.pt # This is my pre-train policy
robomimic_env_cfg_path: cfg/robomimic/env_meta/${env_name}.json
normalization_path: ${oc.env:DPPO_DATA_DIR}/robomimic/${env_name}/normalization.npz

Below are the results obtained from the fine-tuned policy:
[screenshot: fine-tuning log showing ~99% success rate]

@Knight-xiao
Author

From the second figure, it appears that the fine-tuned policy has achieved a 99% success rate, significantly higher than the 49% success rate of the pre-trained policy. However, when actually evaluating the fine-tuned policy, its performance is the same as that of the pre-trained policy:
[screenshot: evaluation results of the fine-tuned policy]
It should be noted that the only modification I made in cfg/robomimic/eval/square/eval_diffusion_mlp.yaml was to base_policy_path:

name: ${env_name}_eval_diffusion_mlp_ta${horizon_steps}_td${denoising_steps}
logdir: ${oc.env:DPPO_LOG_DIR}/robomimic-eval/${name}/${now:%Y-%m-%d}_${now:%H-%M-%S}_${seed}
# pre-train policy
# base_policy_path: ${oc.env:DPPO_LOG_DIR}/robomimic-pretrain/square_pre_diffusion_mlp_ta4_td20/2024-12-28_18-14-32_42/checkpoint/state_3000.pt

# fine-tune policy
base_policy_path: ${oc.env:DPPO_LOG_DIR}/robomimic-finetune/square_ft_diffusion_mlp_ta4_td20_tdf10/2024-12-28_21-20-17_42/checkpoint/state_200.pt
robomimic_env_cfg_path: cfg/robomimic/env_meta/${env_name}.json
normalization_path: ${oc.env:DPPO_DATA_DIR}/robomimic/${env_name}/normalization.npz

@Knight-xiao
Author

The above are the modifications I made to the relevant configuration files during the training process. Do you need any other information? I sincerely appreciate your help and also wish you a Happy New Year!

@allenzren
Member

allenzren commented Dec 29, 2024

Hmm, I see your logs but I don't have a clue. Maybe you can look at the weights of the pre-trained and the saved fine-tuned policies and make sure they are not the same?

Since the reward numbers match exactly, I would suspect something is off with saving/loading the policy. Maybe my code has a bug, but I would need to run the training myself if you still can't figure it out.

@Knight-xiao
Author

Thank you very much for your answer, but I'm sorry that I couldn't find the weights of the pre-trained and saved fine-tuned policies in the YAML configuration file.

@allenzren
Member

Oh, I meant loading the weights of the two policies in torch and checking whether they are the same. It is really suspicious that the pre-trained and fine-tuned policies show exactly the same evaluation reward (80.595), so I wonder if the policies are actually identical.
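
A rough sketch of such a check (the checkpoint paths are illustrative, and it assumes the checkpoints store the parameters under a "model" key, as in the loading code discussed later in this thread):

import torch

# Illustrative paths; point these at the pre-trained and fine-tuned checkpoints.
pre = torch.load("state_3000.pt", map_location="cpu", weights_only=True)
ft = torch.load("state_200.pt", map_location="cpu", weights_only=True)
pre_sd, ft_sd = pre["model"], ft["model"]

# Compare every tensor that appears in both checkpoints.
shared = sorted(set(pre_sd) & set(ft_sd))
identical = [k for k in shared if torch.equal(pre_sd[k], ft_sd[k])]
print(f"{len(identical)}/{len(shared)} shared tensors are bit-identical")

# Keys present in only one of the two (e.g. fine-tuned-only parameters).
print("only in fine-tuned checkpoint:", sorted(set(ft_sd) - set(pre_sd))[:10])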

@ltl520

ltl520 commented Jan 2, 2025

Hi! I encountered the same problem when training the one_leg task. I directly used the pre-trained model you released for fine-tuning. When testing the pre-trained checkpoint with the eval_diffusion_mlp code, the success rate was 40%. After fine-tuning for 200 epochs, the success rate reported by the fine-tuning code was over 90%.
[screenshot: fine-tuning log, success rate over 90%]
However, when testing the fine-tuned checkpoint with the eval_diffusion_mlp code, the success rate was 45%.
[screenshot: evaluation of the fine-tuned checkpoint, success rate 45%]
Like @Knight-xiao, I only modified base_policy_path in the eval_diffusion_mlp configuration file.
[screenshot: modified eval_diffusion_mlp config]
Is this situation normal? What could be the cause? Thank you very much.

@allenzren
Member

allenzren commented Jan 2, 2025 via email

@ltl520

ltl520 commented Jan 2, 2025

Thanks for your reply!

@allenzren
Member

allenzren commented Jan 3, 2025

@Knight-xiao @ltl520 Hey guys, really sorry about this, but I think I have a bug in saving and loading the fine-tuned checkpoints for eval. Basically, here

self.load_state_dict(checkpoint["model"], strict=False)
when the policy is loaded for eval, it looks for the "network" parameters, which are not the fine-tuned weights.

I will make sure to fix this tomorrow (including which parameters to save during training); it is really late in my timezone now. Meanwhile you can try it yourself.
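
One quick way to confirm which parameter groups a saved fine-tuned checkpoint actually contains is to count the keys in checkpoint["model"] by their top-level prefix (a sketch; the path is illustrative):

import torch
from collections import Counter

# Illustrative path to a fine-tuned checkpoint.
ckpt = torch.load("state_200.pt", map_location="cpu", weights_only=True)

# Count tensors per top-level module prefix, e.g. "network", "actor", "actor_ft".
print(Counter(k.split(".")[0] for k in ckpt["model"]))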

allenzren added the bug label on Jan 3, 2025
allenzren self-assigned this on Jan 3, 2025
@ltl520

ltl520 commented Jan 3, 2025

Thanks! I fixed it by adding a test_model_load method to agent/eval/eval_agent.py that loads the fine-tuned weights, and it works:

def test_model_load(self):
    """
    Load the fine-tuned (RL-trained) weights from disk for evaluation.
    (torch and logging are already imported in eval_agent.py.)
    """
    loadpath = self.network_path
    checkpoint = torch.load(loadpath, weights_only=True)
    if "ema" not in checkpoint:
        # Merge the fine-tuned actor weights (keys containing "actor_ft.") into
        # the eval model's state dict, then load it non-strictly.
        model_dict = self.model.state_dict()
        state_dict = {k: v for k, v in checkpoint["model"].items() if "actor_ft." in k}
        model_dict.update(state_dict)
        self.model.load_state_dict(model_dict, strict=False)
        logging.info("Loaded RL-trained policy for eval from %s", loadpath)

@allenzren
Member

allenzren commented Jan 3, 2025

@ltl520 Cool! Yeah, this works if all denoising steps are fine-tuned, which is the case with the furniture DDIM setup. For @Knight-xiao, the pre-trained policy (frozen during fine-tuning) also needs to be set up for inference at the early denoising steps.

@Knight-xiao, would you like to try out this branch for me? #31 Thanks very much!
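
To illustrate why eval needs both sets of weights in that case, here is a conceptual sketch (placeholder names and signatures, not the DPPO code): the frozen pre-trained network handles the early, high-noise denoising steps, and the fine-tuned copy handles the last ft_denoising_steps.

import torch

# Conceptual sketch only: a partially fine-tuned diffusion policy uses the
# frozen pre-trained network for early denoising steps and the fine-tuned
# network for the last ft_denoising_steps, so eval must load both.
@torch.no_grad()
def sample(base_net, ft_net, x, cond, denoising_steps, ft_denoising_steps, step_fn):
    for t in reversed(range(denoising_steps)):
        net = ft_net if t < ft_denoising_steps else base_net  # final steps use fine-tuned weights
        eps = net(x, t, cond)   # predicted noise at denoising step t
        x = step_fn(x, eps, t)  # one reverse-diffusion update (placeholder)
    return x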
