#Any reference to a better approach would


twin grail
#

you will be better off using Isaac Gym (NVIDIA's GPU-parallel simulator), because this is going to take absolutely forever to train otherwise

#

and use the distance from your destination as your cost (i.e. reward = -distance, so getting closer to the goal means higher reward)
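
rough sketch of what that reward could look like, assuming the agent position and the goal are plain coordinate vectors and you want Euclidean distance. `distance_reward`, `position` and `goal` are made-up names for illustration, not from any particular library:

```python
import numpy as np

def distance_reward(position, goal):
    """Hypothetical helper: reward is the negative Euclidean distance to the goal."""
    position = np.asarray(position, dtype=np.float64)
    goal = np.asarray(goal, dtype=np.float64)
    # Closer to the goal -> smaller distance -> reward nearer to 0 (its maximum).
    return -float(np.linalg.norm(goal - position))
```

so an agent at (0, 0) with the goal at (3, 4) gets a reward of -5, and the reward hits its maximum of 0 exactly at the goal.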

#

you will get better results if you normalize the reward and correctly configure the KL divergence limit, the clipping range, etc.
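
to make that a bit more concrete, here is a minimal NumPy sketch, assuming you are training with a PPO-style algorithm (the thread doesn't say which one, so that part is a guess). it shows running reward normalization, the clipped surrogate loss, and a simple KL estimate you could use for early stopping. all names (`RunningNorm`, `ppo_clip_loss`, `approx_kl`, `clip_eps`) are hypothetical, and 0.2 is just a commonly used default for the clip range, not a recommendation for your setup:

```python
import numpy as np

class RunningNorm:
    """Hypothetical helper: running mean/variance used to rescale rewards."""

    def __init__(self, eps=1e-8):
        self.mean, self.var, self.count, self.eps = 0.0, 1.0, 0, eps

    def update(self, x):
        # Merge a new batch into the running statistics (parallel-variance formula).
        x = np.asarray(x, dtype=np.float64)
        batch_mean, batch_var, n = x.mean(), x.var(), x.size
        delta = batch_mean - self.mean
        total = self.count + n
        m2 = (self.var * self.count + batch_var * n
              + delta ** 2 * self.count * n / total)
        self.mean = self.mean + delta * n / total
        self.var = m2 / total
        self.count = total

    def normalize(self, x):
        return (np.asarray(x, dtype=np.float64) - self.mean) / np.sqrt(self.var + self.eps)


def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """Clipped surrogate loss (to minimize); clip_eps is an assumed default."""
    ratio = np.exp(new_logp - old_logp)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -float(np.mean(np.minimum(unclipped, clipped)))


def approx_kl(new_logp, old_logp):
    """Sample-based estimate of KL(old || new) from log-probs of the taken actions."""
    return float(np.mean(old_logp - new_logp))
```

a typical way to use this: call `update` on each batch of rewards before `normalize`, and stop the policy epochs for the current batch early once `approx_kl` goes above whatever target you pick.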