#Any reference to a better approach would
you will be better off using Isaac Gym because this is going to take absolutely forever to train
and use the distance from your destination as your cost (i.e. the reward is -distance)
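a rough sketch of that reward, assuming the agent position and the destination are plain coordinate tuples and that Euclidean distance is the metric you want (`distance_reward`, `position`, and `goal` are made-up names, not part of any simulator API):

```python
import math


def distance_reward(position, goal):
    """Return the negative Euclidean distance to the goal.

    The reward is 0 at the destination and gets more negative the
    further away the agent is, so maximizing it means moving closer.
    """
    return -math.dist(position, goal)


# Example: an agent at the origin with a goal at (3, 4) is 5 units away.
reward = distance_reward((0.0, 0.0), (3.0, 4.0))
```

you would call this once per step with the current position and hand the result back as the step reward.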
you will get better results if you normalize the reward and correctly configure the KL divergence, clipping, etc.
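for the normalization part, one common option is to keep running statistics of the rewards and standardize each one. this is only a sketch under that assumption: `RunningRewardNormalizer` is a hypothetical helper, not something from a specific RL library, and some libraries scale by the spread of discounted returns instead:

```python
import math


class RunningRewardNormalizer:
    """Standardize rewards with a running mean and std (Welford's algorithm)."""

    def __init__(self, epsilon=1e-8):
        self.count = 0        # number of rewards seen so far
        self.mean = 0.0       # running mean
        self.m2 = 0.0         # running sum of squared deviations from the mean
        self.epsilon = epsilon  # avoids division by zero when std is 0

    def update(self, reward):
        """Fold one new reward into the running statistics."""
        self.count += 1
        delta = reward - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (reward - self.mean)

    def normalize(self, reward):
        """Return the standardized reward; pass it through until 2 samples exist."""
        if self.count < 2:
            return reward
        std = math.sqrt(self.m2 / self.count)  # population standard deviation
        return (reward - self.mean) / (std + self.epsilon)


# Example: feed in a few negative-distance rewards, then normalize new ones.
normalizer = RunningRewardNormalizer()
for r in (-4.0, -2.0, 0.0):
    normalizer.update(r)
scaled = normalizer.normalize(-2.0)
```

the KL target and clip range are separate hyperparameters of whatever trainer you use, so they are not shown here.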