State action pair (
- State: position, orientation, and force at the robot end-effector
- Action: position, orientation, and force setpoints given to the controller (note: force setpoints are always zero and are included only for ease of computation)
The dynamics model is a multi-layer perceptron with two hidden layers of 500 neurons each.
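As a rough illustration of the network shape described above, here is a minimal NumPy sketch of a forward pass. Only the two hidden layers of 500 neurons come from the text. The state and action dimensions, the ReLU activation, the weight initialization, and the choice to output the next state directly are all assumptions made for the example, and the names `init_mlp` and `predict_next_state` are hypothetical.

```python
import numpy as np

# Assumed sizes (not stated in the text): position (3) + orientation (4) + force (3).
STATE_DIM = 10
ACTION_DIM = 10
HIDDEN = 500  # from the text: two hidden layers with 500 neurons each


def init_mlp(rng, in_dim, hidden, out_dim):
    """Create (weight, bias) pairs for an MLP with two hidden layers."""
    sizes = [in_dim, hidden, hidden, out_dim]
    return [
        (rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
        for m, n in zip(sizes[:-1], sizes[1:])
    ]


def predict_next_state(params, state, action):
    """Forward pass mapping a state-action pair to a predicted next state."""
    x = np.concatenate([state, action], axis=-1)
    for w, b in params[:-1]:
        x = np.maximum(x @ w + b, 0.0)  # ReLU activation is an assumption
    w, b = params[-1]
    return x @ w + b  # linear output layer


rng = np.random.default_rng(0)
params = init_mlp(rng, STATE_DIM + ACTION_DIM, HIDDEN, STATE_DIM)
next_state = predict_next_state(params, np.zeros(STATE_DIM), np.zeros(ACTION_DIM))
```

The weights here are untrained. In practice they would be fitted by the supervised training step, for example with a deep learning framework rather than plain NumPy.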
Supervised training minimizes the model's error
where, indices
Using the dynamics model and cost function, the discounted receding horizon cost-to-go
The optimal control sequence at each time step is computed by optimizing the following equation,
The feedback controller is obtained by solving the above optimization problem again at every time step and applying only the first element of the optimal action sequence.
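The receding-horizon loop can be sketched as follows. This is an illustrative toy, not the repository's implementation: the scalar dynamics, the quadratic cost, the discount factor of 0.95, the choice to evaluate the cost on the state reached after each action, and the grid-search planner `grid_plan` are all assumptions standing in for the learned model and the real optimizer.

```python
import numpy as np


def discounted_cost_to_go(dynamics, cost, state, actions, gamma):
    """Roll the model forward over the horizon and sum discounted step costs."""
    total = 0.0
    for t, action in enumerate(actions):
        state = dynamics(state, action)
        total += (gamma ** t) * cost(state, action)
    return total


def receding_horizon_control(dynamics, cost, plan, state, n_steps, gamma=0.95):
    """At each time step, re-plan and apply only the first planned action."""
    trajectory = [state]
    for _ in range(n_steps):
        actions = plan(dynamics, cost, state, gamma)  # full action sequence
        state = dynamics(state, actions[0])           # apply the first element only
        trajectory.append(state)
    return np.array(trajectory)


# Toy stand-ins (assumptions): a 1-D integrator and a cost that drives the state to zero.
def toy_dynamics(state, action):
    return state + action


def toy_cost(state, action):
    return float(state ** 2)


def grid_plan(dynamics, cost, state, gamma, horizon=5):
    """Hypothetical planner: pick the best constant action sequence from a grid."""
    candidates = [np.full(horizon, a) for a in np.linspace(-1.0, 1.0, 21)]
    costs = [discounted_cost_to_go(dynamics, cost, state, c, gamma) for c in candidates]
    return candidates[int(np.argmin(costs))]


trajectory = receding_horizon_control(toy_dynamics, toy_cost, grid_plan, state=2.0, n_steps=10)
```

Because the plan is recomputed from the newly observed state at every step, errors in the model or disturbances are corrected on the next iteration, which is what makes the scheme a feedback controller.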
The first element of the optimal sequence,
Two controllers are available: Random Shooting and Model Predictive Path Integral (MPPI).
Random shooting is a sampling-based optimization method that generates random sequences of
candidate samples for evaluation. First,
MPPI is based on importance sampling with a smoother update rule that aggregates the samples to compute the update. Instead of directly sampling candidate action sequences from a Gaussian, this method uses a filtering technique to compute smooth candidate control sequences.
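To make the difference between the two controllers concrete, here is a minimal sketch of both on a toy scalar problem. It is written under several assumptions that are not in the text: actions are sampled uniformly from [-1, 1] for random shooting, the MPPI filter is a first-order low-pass filter on Gaussian noise with coefficient `beta`, the importance weights are exponentials of the negative cost scaled by `temperature`, and all function names and default values are hypothetical.

```python
import numpy as np


def rollout_costs(dynamics, cost, state, candidates, gamma):
    """Discounted cost of each candidate sequence; candidates has shape (K, H)."""
    states = np.full(candidates.shape[0], state, dtype=float)
    totals = np.zeros(candidates.shape[0])
    for t in range(candidates.shape[1]):
        states = dynamics(states, candidates[:, t])
        totals += (gamma ** t) * cost(states, candidates[:, t])
    return totals


def random_shooting(dynamics, cost, state, rng, horizon=10, n_samples=256, gamma=0.95):
    """Sample action sequences at random and return the cheapest one."""
    candidates = rng.uniform(-1.0, 1.0, size=(n_samples, horizon))  # assumed range
    costs = rollout_costs(dynamics, cost, state, candidates, gamma)
    return candidates[np.argmin(costs)]


def mppi_update(dynamics, cost, state, mean, rng, n_samples=256, gamma=0.95,
                sigma=0.5, beta=0.6, temperature=1.0):
    """One MPPI-style update of the mean action sequence `mean`, shape (H,)."""
    raw = rng.normal(0.0, sigma, size=(n_samples, mean.shape[0]))
    noise = np.zeros_like(raw)
    prev = np.zeros(n_samples)
    for t in range(mean.shape[0]):
        # Low-pass filter the noise over time so candidates vary smoothly.
        prev = beta * raw[:, t] + (1.0 - beta) * prev
        noise[:, t] = prev
    candidates = mean + noise
    costs = rollout_costs(dynamics, cost, state, candidates, gamma)
    # Importance weights: lower cost gives a larger weight (shifted for stability).
    weights = np.exp(-(costs - costs.min()) / temperature)
    weights /= weights.sum()
    return weights @ candidates  # weighted aggregate of all samples


# Toy stand-ins (assumptions): a 1-D integrator with a quadratic state cost.
def toy_dynamics(states, actions):
    return states + actions


def toy_cost(states, actions):
    return states ** 2


rng = np.random.default_rng(0)
best = random_shooting(toy_dynamics, toy_cost, 2.0, rng)
mean = np.zeros(10)
for _ in range(5):
    mean = mppi_update(toy_dynamics, toy_cost, 2.0, mean, rng)
```

Random shooting keeps only the single best sample, whereas the MPPI-style update blends every sample according to its weight, which tends to give smoother action sequences from one time step to the next.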