In this work, DDP (differential dynamic programming) was first used to find a controller that flies the helicopter along a given trajectory. Inverse optimal control was then applied to trajectories demonstrated by that controller: the cost is modeled as a linear combination of features weighted by Q and R, and these matrices are learned by subgradient descent on the loss between the demonstrated and rolled-out trajectories, with each update followed by a projection onto the convex cone of positive semi-definite matrices.
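The projected-subgradient update can be sketched as follows. This is a minimal illustration, not the paper's implementation: the feature-matrix interface (`demo_feats`, `ctrl_feats`, holding sums of outer products along each trajectory) and the fixed step size are assumptions, and in practice the controller features would be recomputed by re-solving for the controller after each update.

```python
import numpy as np

def project_psd(M):
    """Project a matrix onto the cone of positive semi-definite
    matrices: symmetrize, then clamp negative eigenvalues to zero."""
    M = 0.5 * (M + M.T)
    w, V = np.linalg.eigh(M)
    return V @ np.diag(np.clip(w, 0.0, None)) @ V.T

def learn_cost(demo_feats, ctrl_feats, n_x, n_u, steps=100, lr=0.01):
    """Projected subgradient descent on (Q, R) (hypothetical interface).

    demo_feats / ctrl_feats: dicts with accumulated quadratic features,
    e.g. "xx" = sum of x x^T and "uu" = sum of u u^T along the
    demonstrated vs. rolled-out trajectory.  For a cost that is linear
    in Q and R, a subgradient of the feature-matching loss is the
    difference of the corresponding feature matrices.
    """
    Q = np.eye(n_x)
    R = np.eye(n_u)
    for _ in range(steps):
        gQ = ctrl_feats["xx"] - demo_feats["xx"]  # subgradient w.r.t. Q
        gR = ctrl_feats["uu"] - demo_feats["uu"]  # subgradient w.r.t. R
        # gradient step, then projection back onto the PSD cone
        Q = project_psd(Q - lr * gQ)
        R = project_psd(R - lr * gR)
    return Q, R
```

The projection step is what keeps the learned cost a valid quadratic: eigenvalue clamping is the Euclidean (Frobenius-norm) projection onto the PSD cone.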
The learned Q is denser than the given one, and the resulting controller behaves similarly to the given controller:
The cost decreases as follows: