About the training curve #6

Open
xu-yang16 opened this issue Apr 30, 2024 · 18 comments

Comments

@xu-yang16

I am using the code to train A1 on an RTX 4090. However, I've noticed a significant decrease in the mean reward around the 2800th iteration. Is this expected? Should I continue training?

screenshot-20240430-133237
screenshot-20240430-133213

@hdShang

hdShang commented Apr 30, 2024

I had the same problem: the rewards fluctuated a lot. The first drop in rewards may be due to the command curriculum update, but I don't yet know the reason for the second drop.
image

@sixFlag

sixFlag commented May 2, 2024

Why was the limit effort in a1.urdf modified?

@sixFlag

sixFlag commented May 3, 2024

Please post the learning rate curve so we can take a look. Judging from the plots, training ran for too many iterations; 1500 iterations is enough.

@Junfeng-Long
Collaborator

Sorry for the late reply. We did not train for such a long time. However, the significant decrease in the mean reward is probably due to the curriculum of commands and terrains. I suggest training for at most 2000 iterations. We also updated the configs for a1 and go1: for small dogs, high speed conflicts somewhat with the ability to cross difficult terrain, so we limited the highest command for small dogs to 2 m/s.
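For readers wondering how a command curriculum can produce a dip like the one in the screenshots, here is a minimal sketch. It assumes a legged_gym-style rule that widens the velocity command limit once tracking is good enough. The function name, `step`, and `threshold` are hypothetical and not taken from this repo; only the 2 m/s cap comes from the comment above.

```python
def update_command_range(current_max, mean_tracking_reward, reward_scale,
                         step=0.5, threshold=0.8, max_command=2.0):
    """Widen the forward-velocity command limit once tracking is good enough.

    step and threshold are illustrative assumptions; max_command=2.0 mirrors
    the 2 m/s cap for small dogs mentioned in the thread.
    """
    if mean_tracking_reward > threshold * reward_scale:
        # Harder commands are sampled from now on, so the mean reward can dip.
        return min(current_max + step, max_command)
    return current_max


limit = 1.0
history = []
for tracking in [0.5, 0.85, 0.9, 0.95, 0.6]:
    limit = update_command_range(limit, tracking, reward_scale=1.0)
    history.append(limit)
print(history)  # [1.0, 1.5, 2.0, 2.0, 2.0]
```

Each time the limit grows, the policy is suddenly evaluated on faster commands it has not mastered yet, which is one plausible reason the mean reward drops before recovering.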

@Junfeng-Long
Collaborator

Why was the limit effort in a1.urdf modified?

We use the a1.urdf from the original legged_gym repo. There is also a file named a1.urdf.origin from unitree_ros, which seems hard to train with. However, the deployment is built on a joint-position control loop, and the Unitree SDK seems to handle this issue well, so don't worry about it. Position limits, velocity limits, and power penalization are enough.
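As a rough illustration of what "limits plus power penalization" can look like as penalty terms, here is a minimal sketch in plain Python. The function names and the exact formulas are assumptions made for illustration; the reward terms actually used in this repo may be defined differently.

```python
def power_penalty(torques, joint_velocities):
    """Sum of |torque * velocity| over joints (mechanical power magnitude)."""
    return sum(abs(t * v) for t, v in zip(torques, joint_velocities))


def limit_violation(values, lower, upper):
    """Total amount by which each value falls outside [lower, upper]."""
    total = 0.0
    for x, lo, hi in zip(values, lower, upper):
        total += max(lo - x, 0.0) + max(x - hi, 0.0)
    return total


# Hypothetical two-joint example.
power = power_penalty([10.0, -5.0], [2.0, 4.0])
violation = limit_violation([0.5, -1.2], [-1.0, -1.0], [1.0, 1.0])
print(power, violation)
```

Penalizing power discourages large torques at high joint speeds, which is one way a policy can stay within actuator capability even when the URDF effort limit itself is not enforced.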

@sixFlag

sixFlag commented May 3, 2024

In other words, the current environment is not suitable for small dogs, and the model was changed so that the environment would not have to be. Am I right?

@UltronAI

@Junfeng-Long
Thank you for your impressive work on this project. I've been trying to replicate the results shown in Figure 5 of the HIMloco paper but have encountered some discrepancies. My training results are notably lower than those reported. I noticed that the curves I obtained (shown below) appear similar to those previously posted by other users.

I ran the python train.py command without making any modifications. Here are the curves I obtained:

image

If I want to compare different methods in simulation, can I directly compare them with these curves?

@hanzhi0410

Hello, I have also encountered the same problem. Do you know where the problem lies? Thank you for your help.

@Junfeng-Long
Collaborator

Sorry for the late reply. There were some bugs and improper configs in the code. They have been fixed; please try the new version.

@hanzhi0410

Thank you for your impressive work on this project. I used it to train a policy and wanted to simulate it in Gazebo. The policy performed well in Isaac, but in Gazebo the robot shook violently and could not stand properly. Have you done similar work before? Can you give me some suggestions? Thank you for your help.

@Junfeng-Long
Collaborator

We have done this test with Aliengo in Gazebo. It works well, but is still worse than in Isaac. I would like to help if you can offer more information, for example a video, the inference output, or the code. You can send them to me directly or post them here.

@hanzhi0410

May I ask what inference tool you are using? Is it libtorch? Thank you for your help.

@Junfeng-Long
Collaborator

We use PyTorch, since CUDA is available on the dog's on-board computer.

@Junfeng-Long
Collaborator

It seems that you accidentally opened an issue under my homepage repo. Sorry for not noticing that. But I'm happy to see that you figured out the problem :)

@hanzhi0410

Hello, thank you for your reply. I'm sorry; as a beginner I'm not familiar with GitHub, and I posted my earlier message in the wrong place. During deployment on the real robot, I noticed that the hind legs sometimes turn inward (a pigeon-toed stance) and the dog's steering is not responsive. Have you ever encountered this situation? I hope you can give me some advice. Thank you for your help.

@hanzhi0410

I also found that the trained policy shows the same pigeon-toed stance:
微信图片_20240518125650

@Junfeng-Long
Collaborator

I think this is due to improper target height configuration. Try a lower target height, for example, 0.25m for a1.
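A minimal sketch of what lowering the target height means for the reward, assuming the config exposes the target as `base_height_target` (as legged_gym-style configs commonly do) and that the penalty is a squared error. The class below is a hypothetical stand-in, not the repo's actual config class.

```python
class A1RewardsCfg:
    # Hypothetical stand-in for the rewards section of a legged_gym-style config.
    base_height_target = 0.25  # metres; the value suggested above for a1


def base_height_penalty(base_height, cfg=A1RewardsCfg):
    """Squared error between the measured and the target base height."""
    return (base_height - cfg.base_height_target) ** 2


print(base_height_penalty(0.25))  # 0.0: standing exactly at the target
print(base_height_penalty(0.30))  # small positive penalty when standing taller
```

If the target is set higher than the robot can comfortably stand, the policy is pushed to over-extend its legs, which is consistent with the odd hind-leg posture described above.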

@hanzhi0410

Thank you for your reply. I noticed that your code sets a leg lifting height, but its reward weight is very low, and the leg lifting height after training is not ideal; it differs considerably from what the video you posted shows. Do you have any other methods to improve this?

6 participants