For a project for school, I am trying to create Mario Kart-esque AIs that will navigate a track while hitting as few things as possible. In the game, hitting something damages your vehicle, so it should be extremely discouraged. Right now I'm passing in a vector observation from the front of the car in a 70-degree arc so it can see its surroundings, as well as the distance from the car to its next checkpoint. I have tried to use imitation learning, but it was often unsuccessful. Right now, I'm just trying to have it figure out how to stumble into checkpoints. When it hits a checkpoint, it receives a reward of 2. I have tried using an existential punishment of 0.0001, but that didn't change anything besides lowering the mean reward. I'm currently doing 2 million steps, but I've seen people on YouTube do it in a lot less. I'm just wondering: what am I doing wrong? Config and screenshots attached.
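To put the reward numbers from the post side by side, here is a rough arithmetic sketch. It assumes the existential punishment is subtracted once per step and ignores discounting; the function names are made up for illustration and are not part of ML-Agents.

```python
# Rough arithmetic on the reward scheme described in the post.
CHECKPOINT_REWARD = 2.0   # reward per checkpoint, from the post
STEP_PENALTY = 0.0001     # existential punishment, assumed to be applied every step


def episode_return(checkpoints_hit: int, steps: int) -> float:
    """Undiscounted return of one episode under the scheme above."""
    return checkpoints_hit * CHECKPOINT_REWARD - steps * STEP_PENALTY


def steps_to_cancel_one_checkpoint() -> float:
    """How many penalized steps it takes to wipe out a single checkpoint reward."""
    return CHECKPOINT_REWARD / STEP_PENALTY


if __name__ == "__main__":
    print(episode_return(checkpoints_hit=0, steps=1000))
    print(episode_return(checkpoints_hit=1, steps=1000))
    print(steps_to_cancel_one_checkpoint())
```

Under these assumptions the penalty is tiny next to a checkpoint reward, which fits the observation that it mostly just shifted the mean reward down.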
#Racing AI
behaviors:
  racer:
    trainer_type: ppo
    hyperparameters:
      batch_size: 256
      buffer_size: 10240
      learning_rate: 3.0e-4
      beta: 5.0e-4
      epsilon: 0.2
      lambd: 0.99
      num_epoch: 3
      learning_rate_schedule: linear
    network_settings:
      normalize: false
      hidden_units: 128
      num_layers: 2
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
    max_steps: 2000000
    time_horizon: 64
    summary_freq: 10000
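For a feel of what these numbers mean in practice, here is a small back-of-the-envelope sketch. It assumes the usual PPO reading of the settings (the buffer is filled, then split into minibatches and swept num_epoch times) and that max_steps counts collected agent steps; the helper names are hypothetical.

```python
# Back-of-the-envelope numbers derived from the config above.
BATCH_SIZE = 256
BUFFER_SIZE = 10240
NUM_EPOCH = 3
MAX_STEPS = 2_000_000
GAMMA = 0.99


def minibatches_per_epoch() -> int:
    """Minibatches needed to sweep the buffer once."""
    return BUFFER_SIZE // BATCH_SIZE


def gradient_steps_per_policy_update() -> int:
    """Gradient steps taken each time the buffer fills up."""
    return minibatches_per_epoch() * NUM_EPOCH


def policy_updates_in_run() -> int:
    """How many times the buffer fills over the whole run (assumed step counting)."""
    return MAX_STEPS // BUFFER_SIZE


def effective_horizon(gamma: float = GAMMA) -> float:
    """Rough number of steps the agent 'looks ahead', using 1 / (1 - gamma)."""
    return 1.0 / (1.0 - gamma)


if __name__ == "__main__":
    print(minibatches_per_epoch())
    print(gradient_steps_per_policy_update())
    print(policy_updates_in_run())
    print(effective_horizon())
```

One thing this makes visible: with gamma at 0.99, a checkpoint reward more than a few hundred steps away contributes very little to the value estimate, so checkpoint spacing matters.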
It often felt like the AI just... wasn't learning. It would do the same thing over and over.
I'm not sure how often I should be doing episodes? I'm not really sure how they work. Right now, an episode ends if a car hits something or if the car completes a lap (which none have been able to do yet)
here's the vision of the cars
checkpoints, walls, and cars are on their own layer and tag
seeing this on the forums so
maybe that's the next play?
OH 2 million episodes??? i'm nowhere near that
I got it to work. It was just an issue with my setup: the cars couldn't get rewards for crossing checkpoints unless they backed up first, which confused them about which direction they were supposed to go.
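For anyone hitting something similar, here is a minimal sketch of one way to make checkpoint rewards depend on driving direction: only pay out for the next checkpoint in track order. My actual setup isn't shown here, so the CheckpointTracker class and its method names are hypothetical, written in plain Python rather than as Unity code.

```python
class CheckpointTracker:
    """Rewards an agent only for crossing checkpoints in track order."""

    def __init__(self, num_checkpoints: int, reward: float = 2.0):
        if num_checkpoints <= 0:
            raise ValueError("num_checkpoints must be positive")
        self.num_checkpoints = num_checkpoints
        self.reward = reward      # 2.0 matches the checkpoint reward in the post
        self.next_index = 0       # index of the checkpoint expected next
        self.laps = 0

    def on_checkpoint(self, index: int) -> float:
        """Return the reward for touching checkpoint `index`."""
        if index != self.next_index:
            # Wrong or repeated checkpoint: no reward, so driving backwards
            # or circling over the same trigger earns nothing.
            return 0.0
        self.next_index = (self.next_index + 1) % self.num_checkpoints
        if self.next_index == 0:
            self.laps += 1        # wrapped around: one full lap completed
        return self.reward


if __name__ == "__main__":
    tracker = CheckpointTracker(num_checkpoints=3)
    for hit in [0, 0, 2, 1, 2]:
        print(hit, tracker.on_checkpoint(hit))
    print("laps:", tracker.laps)
```

A quick check of the reward path like this (drive forward by hand and log what each checkpoint pays out) would have caught the issue before any training.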