Got bad performance when reproducing HACO #6

Open
dq0803 opened this issue Jul 26, 2023 · 3 comments

dq0803 commented Jul 26, 2023

Hi, I just attempted to reproduce HACO with the keyboard by running "train_haco_keyboard_easy.py", but encountered unsatisfactory training performance.

At the early stage, I could see the model improving with the help of human interventions. After around 20~40 iterations, the car had learned some driving skills and occasionally managed to reach the destination, albeit with uneven performance. However, after a few more iterations, strange things occurred: the car failed to start normally and would brake suddenly while driving. It seems the model forgot the skills it had previously learned and its performance worsened.

Could you please explain the reasons behind this issue? Is it related to improper timing of human intervention, an excessive focus on exploration, or some other factor?

The screenshot below shows the evaluation results from running "eval_haco.py" with EPISODE_NUM_PER_CKPT = 2.
[Screenshot: eval_res]

@pengzhenghao (Member)

> Hi, I just attempted to reproduce HACO with the keyboard by running "train_haco_keyboard_easy.py", but encountered unsatisfactory training performance.

Thank you for running our code!

> At the early stage, I could see the model improving with the help of human interventions. After around 20~40 iterations, the car had learned some driving skills and occasionally managed to reach the destination, albeit with uneven performance.

This is expected.

> However, after a few more iterations, strange things occurred: the car failed to start normally and would brake suddenly while driving. It seems the model forgot the skills it had previously learned and its performance worsened.

This is also expected and observed during our experiments.

> Could you please explain the reasons behind this issue? Is it related to improper timing of human intervention, an excessive focus on exploration, or some other factor?

Sure! First, I am really happy that you reproduced our experiments (even the strange behavior).

At the beginning of training, we sometimes take full control for 1 or 2 episodes in order to fill the human buffer with more useful data. Then we enter human-AI shared control and intervene when something goes wrong.

In our experiments we are very conservative and keep the speed near 15~20 km/h (so the throttle we apply during interventions is also low). We never brake during interventions, because we found that the policy would otherwise quickly converge to emergency stopping and never move again. And, as you already saw, after training for a long period the policy can still collapse.
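For concreteness, here is a minimal sketch of that conservative intervention style. It is not the authors' implementation: the MetaDrive-style `[steering, throttle_brake]` action layout and the names `conservative_human_action`, `THROTTLE_CAP`, and `takeover` are assumptions for illustration only.

```python
# Hypothetical helper illustrating the "low throttle, never brake"
# intervention style described above; not part of the HACO codebase.

THROTTLE_CAP = 0.3  # assumed low throttle, roughly a 15~20 km/h cruise


def conservative_human_action(human_action, takeover):
    """Clamp the demonstrator's action so it never brakes and only
    applies gentle throttle while a takeover is active."""
    if not takeover:
        return human_action
    steering, throttle_brake = human_action
    throttle_brake = max(throttle_brake, 0.0)           # never brake
    throttle_brake = min(throttle_brake, THROTTLE_CAP)  # keep speed low
    return [steering, throttle_brake]


# Example: a hard keyboard brake (-1.0) becomes a gentle coast (0.0).
print(conservative_human_action([0.1, -1.0], takeover=True))  # [0.1, 0.0]
```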

We have some hypotheses behind this (see the diagnostic sketch after the list):

  1. The Q values might become too large as an outcome of the CQL loss.
  2. The acceleration demonstrations occupy only a very small portion of the human buffer, so those samples are rarely drawn and thus hard to learn from.
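A rough diagnostic sketch for both hypotheses follows. It is not the actual HACO code: the buffer layout, the `[steering, throttle_brake]` action convention, and the function names are assumptions for illustration.

```python
# Hypothetical diagnostics, not part of the HACO codebase.
import numpy as np


def q_value_stats(q_values):
    """Hypothesis 1: track critic Q-value statistics every iteration.
    Steadily growing magnitudes would point at the CQL-style loss
    inflating the Q values."""
    q = np.asarray(q_values, dtype=np.float64)
    return {"q_mean": float(q.mean()), "q_abs_max": float(np.abs(q).max())}


def acceleration_fraction(human_actions, throttle_eps=0.05):
    """Hypothesis 2: fraction of human-buffer samples that actually
    accelerate, assuming actions are [steering, throttle_brake] with a
    positive second component meaning throttle."""
    acts = np.asarray(human_actions, dtype=np.float64)
    return float((acts[:, 1] > throttle_eps).mean())


# Toy example: only 1 of 4 stored demonstrations applies throttle.
toy_actions = [[0.2, 0.0], [0.0, 0.1], [-0.1, 0.0], [0.0, 0.0]]
print(acceleration_fraction(toy_actions))  # -> 0.25
```

If that fraction turns out to be tiny while braking and steering-only samples dominate, it would be consistent with the second hypothesis.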

@xiaozhao12345

Hello, how can the problem of sudden emergency braking after several iterations be solved?

@pengzhenghao (Member)

> Hello, how can the problem of sudden emergency braking after several iterations be solved?

That's a good question! We observed that too!

The answer is:

  1. Do not brake at all as a human demonstrator.
  2. Keep a slow and almost constant speed.
  3. Try our new algorithm, PVP: https://github.com/metadriverse/pvp
