#Racing AI


brazen herald
#

For a school project, I am trying to create Mario Kart-esque AIs that navigate a track while hitting as few things as possible. In the game, hitting something damages your vehicle, so it should be strongly discouraged. Right now I'm passing in a vector observation from the front of the car in a 70-degree arc so it can see its surroundings, as well as the distance from the car to its next checkpoint. I have tried imitation learning, but it was often unsuccessful. Right now, I'm just trying to have it figure out how to stumble into checkpoints. When it hits a checkpoint, it receives a reward of 2. I have tried using an existential punishment of 0.0001, but that didn't change anything besides lowering the mean reward. I'm currently doing 2 million steps, but I've seen people on YouTube do it in a lot less. I'm just wondering: what am I doing wrong? Config and screenshots attached.
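
To put the two reward numbers side by side, here is a minimal Python sketch of the scheme described above. It assumes the existential punishment is applied once per decision step, and the 1000-step episode length is made up purely for illustration; the function names are hypothetical and this is not the actual agent code.

```python
CHECKPOINT_REWARD = 2.0   # reward for hitting a checkpoint (from the post)
STEP_PENALTY = 0.0001     # existential punishment (assumed to apply per step)


def step_reward(hit_checkpoint: bool) -> float:
    """Reward for a single step: a small time penalty, plus the checkpoint bonus."""
    reward = -STEP_PENALTY
    if hit_checkpoint:
        reward += CHECKPOINT_REWARD
    return reward


def episode_return(num_steps: int, checkpoints_hit: int) -> float:
    """Undiscounted return for an episode with the given step and checkpoint counts."""
    return checkpoints_hit * CHECKPOINT_REWARD - num_steps * STEP_PENALTY


# Assumed episode length, for illustration only.
ASSUMED_STEPS = 1000
print("total penalty:", ASSUMED_STEPS * STEP_PENALTY)
print("return with one checkpoint:", episode_return(ASSUMED_STEPS, 1))
```

Under these assumptions the penalty over a whole episode is a small fraction of one checkpoint reward, which would be consistent with it only shifting the mean reward down without changing behavior.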

#
behaviors:
  racer:
    trainer_type: ppo
    hyperparameters:
      batch_size: 256
      buffer_size: 10240
      learning_rate: 3.0e-4
      beta: 5.0e-4
      epsilon: 0.2
      lambd: 0.99
      num_epoch: 3
      learning_rate_schedule: linear
    network_settings:
      normalize: false
      hidden_units: 128
      num_layers: 2
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0

    max_steps: 2000000
    time_horizon: 64
    summary_freq: 10000
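
A rough back-of-the-envelope reading of these numbers, as a Python sketch. It assumes the usual PPO interpretation in ML-Agents (the policy is updated each time `buffer_size` experiences have been collected, the buffer is split into `batch_size` minibatches, and `num_epoch` passes are made over it). Note that `max_steps` counts environment steps, not episodes, and the real update count may differ slightly.

```python
# Values copied from the config above.
batch_size = 256
buffer_size = 10240
num_epoch = 3
max_steps = 2_000_000

# Minibatches in one pass over a full buffer.
minibatches_per_epoch = buffer_size // batch_size

# Gradient steps taken each time the buffer fills.
gradient_steps_per_update = minibatches_per_epoch * num_epoch

# Approximate number of times the buffer fills during the whole run.
policy_updates = max_steps // buffer_size

print(minibatches_per_epoch, gradient_steps_per_update, policy_updates)
```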
#

It often felt like the AI just... wasn't learning. It would do the same thing over and over.

#

I'm not sure how often I should be doing episodes? I'm not really sure how they work. Right now, an episode ends if a car hits something or if the car completes a lap (which none have been able to do yet).
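
For reference, an episode is one attempt from reset until a termination condition fires. The rule described above can be sketched in Python as follows; the event labels (`"collision"`, `"lap_complete"`, and so on) are hypothetical stand-ins for whatever the game actually reports each step.

```python
from typing import Iterable, Tuple


def run_episode(events: Iterable[str]) -> Tuple[int, str]:
    """Consume one event per step until the episode ends.

    Returns (steps taken, reason the episode ended).
    """
    steps = 0
    for event in events:
        steps += 1
        if event == "collision":
            return steps, "collision"        # car hit something
        if event == "lap_complete":
            return steps, "lap_complete"     # car finished a lap
    # The event stream ended without a termination condition.
    return steps, "ran_out_of_events"


# Hypothetical event stream: the episode stops at the first collision.
print(run_episode(["none", "checkpoint", "collision", "none"]))
```

With these rules, any collision resets the car, so an untrained agent will likely see many short episodes before it ever reaches a checkpoint.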

#

here's the vision of the cars

#

checkpoints, walls, and cars are on their own layer and tag

#

seeing this on the forums so

#

maybe that's the next play?

#

OH 2 million episodes??? i'm nowhere near that

brazen herald
#

I got it to work. It was just an issue with my setup: the cars weren't getting rewards for crossing checkpoints unless they backed up first, which confused them about which direction they were supposed to go.
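
The post doesn't say how the setup was changed, so the following is only a sketch of one common way to make checkpoint rewards depend on direction: track which checkpoint is expected next and reward only that one. The class and method names are hypothetical, written in Python rather than as the actual Unity code.

```python
class CheckpointTracker:
    """Rewards checkpoints only when they are crossed in track order."""

    def __init__(self, num_checkpoints: int, reward: float = 2.0):
        self.num_checkpoints = num_checkpoints
        self.reward = reward
        self.next_index = 0  # index of the checkpoint expected next

    def on_checkpoint_entered(self, index: int) -> float:
        """Return the reward for entering checkpoint `index`."""
        if index != self.next_index:
            return 0.0  # wrong or repeated checkpoint: no reward
        # Correct checkpoint: advance, wrapping around at the end of a lap.
        self.next_index = (self.next_index + 1) % self.num_checkpoints
        return self.reward

    def reset(self) -> None:
        """Call at the start of each episode."""
        self.next_index = 0


tracker = CheckpointTracker(num_checkpoints=3)
print(tracker.on_checkpoint_entered(0))  # expected checkpoint
print(tracker.on_checkpoint_entered(0))  # same one again, no reward
```

Because a repeated or out-of-order checkpoint earns nothing, backing up over a checkpoint is never rewarded and forward progress is the only way to collect the reward.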