#reward shaping


plush maple

Hello guys, I'm training an agent on https://gymnasium.farama.org/environments/classic_control/mountain_car/

For reward shaping I use:

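# s and sp are observations: s[0] is position, s[1] is velocity
@staticmethod
def potential(s):
    # grows with distance from x = -0.5 and with speed
    return 10 * abs(s[0] + 0.5) + 1000 * s[1] ** 2

def shape_reward(self, reward, sp, s):
    # potential-based shaping: r + gamma * potential(sp) - potential(s)
    return reward + self.gamma * self.potential(sp) - self.potential(s)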

The problem is that it doesn't work at all, and using only the env reward makes training impossible.
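
For anyone who wants to try the shaping term on its own, here is a minimal standalone sketch of the two functions above. It is not the full agent: `GAMMA = 0.99` is an assumed value (the post doesn't give gamma), and the `terminated` flag that zeroes the potential of a terminal next state is a common convention added here as an assumption, not something from the original code.

```python
GAMMA = 0.99  # assumed discount factor; the post does not state its value


def potential(s):
    """Potential from the post: s[0] is position, s[1] is velocity."""
    return 10 * abs(s[0] + 0.5) + 1000 * s[1] ** 2


def shape_reward(reward, sp, s, terminated=False, gamma=GAMMA):
    """Return reward + gamma * potential(sp) - potential(s).

    Assumption (not in the post): the potential of a terminal
    next state is treated as 0.
    """
    next_potential = 0.0 if terminated else potential(sp)
    return reward + gamma * next_potential - potential(s)


if __name__ == "__main__":
    s = (-0.5, 0.0)    # at rest at x = -0.5
    sp = (-0.4, 0.01)  # moved right and picked up a little speed
    print("shaped reward:", shape_reward(-1.0, sp, s))
```

Printing the shaped reward for a few hand-picked transitions like this is a quick way to check whether its sign and scale match what you expect, compared to the env's -1 per step.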


neat lichen