Currently working on a 2D platformer with around 64 agents training at the same time. I'm running into trouble once the agents start reaching the end: they revert to undesirable behaviours and basically undo all their progress. Just wondering if anything in my OnActionReceived() would affect multi-agent training negatively? I'm mainly looking at the rewards and that kind of thing.
#Issue training multiple agents at the same time
This isn't directly related to your question yet, but I want some clarification on something. Are you doing something special with how you set up your discrete branch sizes? I think discrete actions start at 0 by default, so if you set a branch size to 3, the branch outputs a number from 0 to 2. Are you seeing any agents actually stand still? If not, it might be that they can't, which could be hurting their training.
I might be overthinking it, but you can use Debug.Log(movement) to check what values it's returning.
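To make the off-by-one concrete: ML-Agents discrete actions are zero-indexed, so a branch of size 3 only ever emits 0, 1, or 2. The sketch below is Python rather than the project's C#, and `MOVES`, `action_to_movement`, and `buggy_action_to_movement` are hypothetical names. The thread never shows the actual `OnActionReceived()` code, so the "buggy" mapping only illustrates one way this kind of bug can happen; it is not the poster's code. Counting the movements that come out is the same check the `Debug.Log(movement)` suggestion is after.

```python
import random
from collections import Counter

# Hypothetical mapping for a single discrete branch of size 3.
# ML-Agents discrete actions are zero-indexed, so the branch emits 0, 1 or 2.
MOVES = {0: 0, 1: -1, 2: 1}  # 0 = stand still, 1 = left, 2 = right


def action_to_movement(action: int) -> int:
    """Map a zero-indexed discrete action to a horizontal direction."""
    if action not in MOVES:
        raise ValueError(f"action {action} is outside a branch of size {len(MOVES)}")
    return MOVES[action]


def buggy_action_to_movement(action: int) -> int:
    """Hypothetical one-indexed mapping (1 = left, 2 = right, 3 = stay).

    A size-3 branch never emits 3, and index 0 falls through to the
    default of -1, so an agent using this mapping can never stand still.
    """
    return {1: -1, 2: 1, 3: 0}.get(action, -1)


if __name__ == "__main__":
    rng = random.Random(0)
    # Sample the values a size-3 branch can actually produce.
    actions = [rng.randrange(3) for _ in range(1000)]
    # Roughly what logging the movement value every step would reveal.
    print("fixed:", Counter(action_to_movement(a) for a in actions))
    print("buggy:", Counter(buggy_action_to_movement(a) for a in actions))
```

If the logged values never include the "stand still" movement, the mapping is missing an index.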
I'll check whether that's an issue. I'm sure you're correct.
You were correct: they cannot stand still. Can't believe I hadn't noticed it until now, after so many runs.
I'll let you know how the training goes, though.
The same thing is still happening.
Can you roughly give a summary of the following:
- your goal for the ai
- reward and punishment structures
- episode end conditions (and any rewards/punishments that accompany them)
It's not impossible to make out exactly what's happening from the code, but the missing context makes it a little hard.
Here is the expected behaviour from a run I did almost a year ago.
Rewards: about 20 checkpoints throughout the level, each giving a 0.1 reward as you get closer to the finish; a -0.001 reward for every frame that passes; and a reward of 1 for making it to the finish.
The episode ends when they touch a color they don't match, or after 2000 steps.
A -1 reward when dying.
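It can help to add up what a whole episode is worth under this scheme. The sketch below assumes the -0.001 "per frame" penalty is applied once per agent step, that there are exactly 20 checkpoints, and that hitting the 2000-step cap carries no extra penalty; `episode_return` and its parameters are hypothetical names, not ML-Agents API.

```python
def episode_return(steps: int, checkpoints: int, finished: bool, died: bool,
                   checkpoint_reward: float = 0.1,
                   step_penalty: float = -0.001,
                   finish_reward: float = 1.0,
                   death_reward: float = -1.0,
                   max_steps: int = 2000) -> float:
    """Total undiscounted reward for one episode under the stated scheme.

    Assumes the per-frame penalty is applied once per agent step and that
    running out of steps adds nothing beyond the per-step penalties.
    """
    if not 1 <= steps <= max_steps:
        raise ValueError("steps must be between 1 and max_steps")
    if finished and died:
        raise ValueError("an episode cannot both finish and end in death")
    total = checkpoints * checkpoint_reward + steps * step_penalty
    if finished:
        total += finish_reward
    if died:
        total += death_reward
    return total


if __name__ == "__main__":
    scenarios = {
        "finish in 1000 steps, all 20 checkpoints": (1000, 20, True, False),
        "die on the first step": (1, 0, False, True),
        "wander until the 2000-step cap": (2000, 0, False, False),
        "10 checkpoints, then hit the cap": (2000, 10, False, False),
    }
    for name, args in scenarios.items():
        print(f"{name}: {episode_return(*args):+.3f}")
```

Under these assumptions, dying on the first step (-1.001) scores better than wandering to the 2000-step cap without touching a checkpoint (-2.0), and it roughly ties with collecting 10 checkpoints and then timing out (-1.0). If that matches the real setup, the time penalty may be making an early death look attractive, which is worth ruling out.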
It's difficult for me to say exactly what could cause this.
It's very strange to see the reward go down instead of plateauing; at least, I've rarely seen it before.
I would recommend looking into the following possible solutions:
- Mess around with the reward values and possibly add new rewards. I can't find anything blatantly wrong with what you currently have, but your understanding of the problem and the task is far better than mine, so really try to think through what the AI could be doing and why.
- You could try an incremental reward as the agent gets closer to the next checkpoint, kind of like the opposite of your negative reward for each frame.
- Read through your code carefully and make sure there are no bugs. This is culprit number 1 in my projects when agents start doing weird things.
- Adjust the config file. I would recommend small, isolated changes, with backups of the checkpoints. Tweaking config values has helped me a lot, but while there are some guidelines, there's no exact science to it, so it comes down to a bit of guess-and-check. I'd suggest the Unity ML-Agents documentation page on the training configuration file, specifically looking for settings that address instability.
- Restart training. It's possible that not being able to stand still messed up the model while it was training, and now it has homed in on a behaviour that falls apart near the end. This is really just a shot in the dark if nothing else works.
These are what I would do personally, in no particular order, but some solutions are more drastic than others, so it's probably best to try them from least to most impactful.
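The incremental-reward suggestion could look something like the sketch below. It is Python rather than the project's C#, and `CheckpointShaping`, `scale`, and the ordered `(x, y)` checkpoint list are hypothetical. Each step it pays a small reward proportional to how much closer the agent got to the next checkpoint, so moving back and forth nets out to zero instead of being farmable.

```python
import math


class CheckpointShaping:
    """Hypothetical helper: small reward for closing the distance to the next checkpoint."""

    def __init__(self, checkpoints, scale=0.01):
        self.checkpoints = list(checkpoints)  # ordered (x, y) checkpoint positions
        self.scale = scale                    # keep small relative to the 0.1 checkpoint reward
        self.index = 0                        # which checkpoint is next
        self.prev_dist = None                 # distance to it at the previous step

    def _dist(self, pos):
        cx, cy = self.checkpoints[self.index]
        return math.hypot(cx - pos[0], cy - pos[1])

    def reset(self, start_pos):
        """Call at the start of each episode."""
        self.index = 0
        self.prev_dist = self._dist(start_pos)

    def step(self, pos):
        """Return the shaping reward for this step given the agent's new position."""
        if self.index >= len(self.checkpoints):
            return 0.0  # every checkpoint collected; nothing left to shape toward
        if self.prev_dist is None:
            raise RuntimeError("call reset() before step()")
        dist = self._dist(pos)
        reward = self.scale * (self.prev_dist - dist)  # positive when getting closer
        self.prev_dist = dist
        return reward

    def advance(self, pos):
        """Call when the agent collects the current checkpoint."""
        self.index += 1
        if self.index < len(self.checkpoints):
            self.prev_dist = self._dist(pos)
        else:
            self.prev_dist = None


if __name__ == "__main__":
    shaping = CheckpointShaping([(10.0, 0.0), (20.0, 5.0)], scale=0.01)
    shaping.reset((0.0, 0.0))
    for pos in [(2.0, 0.0), (5.0, 0.0), (3.0, 0.0)]:
        print(pos, round(shaping.step(pos), 4))
```

Because each reward is a difference of distances, the total for one checkpoint leg telescopes to `scale * (starting distance - final distance)`, so a small `scale` keeps it from outweighing the checkpoint rewards. One caveat: straight-line distance can mislead in a platformer where the route bends (for example, moving away from the checkpoint to reach a ledge).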
Also, I assume you would've said so, but did you make some drastic change at around step 1.1M? I've only seen an agent's reward drop when I changed the environment, the reward system, or something in the config file. It seems weird if it was just training like normal and then suddenly dropped like that.
Nope. I don't touch anything while it trains, but I did figure out the issue. I was passing the agent's transform x and y position in as observations; I ended up removing them, and training improved greatly.
There's still a weird dip in performance before 1M steps, and another one around 1.5M as I type this.
^ finished product
wow that's impressive
I've never used Unity ML-Agents; can you show your setup?