#Unity ML Agents Having Constant 0 Cumulative Reward on TensorBoard


deft marsh
#

trying to learn the ML-Agents toolkit by making a flappy bird ai. as you can see from the screenshots, the demo is recording rewards properly and the code is also executing properly (the Debug.Log fires when the bird passes a checkpoint), but when i try to train the brain with the demo i recorded, the reward is a constant 0, as you can see in the TensorBoard screenshot

bold ruin
#

you are resetting your agent when they fail, right?

deft marsh
#

yes

deft marsh
#

this is how i reset it

#

during demo recording it all works fine, but not during actual training

bold ruin
#

if the agent dies, you need to call EndEpisode()

#

that resets the agent itself and prepares it for another segment of training

#

call it whenever it hits a pipe
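
A minimal sketch of the pattern described above, assuming a 2D project where the pipes carry a hypothetical "Pipe" tag and the agent class is called FlappyAgent (neither name comes from this thread). It only illustrates ending the episode on a pipe hit and resetting state in OnEpisodeBegin(); the scripts in the screenshots may be structured differently.

```csharp
using Unity.MLAgents;
using UnityEngine;

// Hypothetical agent class; the class name and the "Pipe" tag are assumptions.
public class FlappyAgent : Agent
{
    [SerializeField] private Rigidbody2D body;   // assigned in the Inspector
    private Vector3 startPosition;

    public override void Initialize()
    {
        // Remember where the bird starts so every episode begins from the same spot.
        startPosition = transform.localPosition;
    }

    public override void OnEpisodeBegin()
    {
        // Called by ML-Agents at the start of each episode, including after EndEpisode().
        transform.localPosition = startPosition;
        // Note: newer Unity versions rename this property to linearVelocity.
        body.velocity = Vector2.zero;
    }

    private void OnCollisionEnter2D(Collision2D collision)
    {
        if (collision.gameObject.CompareTag("Pipe"))
        {
            // Close the episode so the trainer records it and a new one starts.
            EndEpisode();
        }
    }
}
```

Putting the reset logic in OnEpisodeBegin() rather than in a separate game-manager method means the same reset runs during demo recording, heuristic play, and training.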

deft marsh
#

as u can see from the 2nd screenshot, the demo had 3 episodes and was getting all the stuff correctly

bold ruin
#

Well if your agent is improving and getting rewards, something could just be messed up in the results folder

#

Where tensorboard is getting the info from

deft marsh
bold ruin
#

Send the scripts you are using

#

And a picture of all the agent scripts (your agent, behaviour params, decision requester)

bold ruin
#

Any reason the AddReward() is commented out?

deft marsh
#

so originally i was giving rewards not only for passing through the pipe (a point scored) but also for surviving, and i realized the bird was only getting the reward for surviving and not for scoring points

#

so i commented out all the AddReward() calls and just left the one for the point scored

bold ruin
#

should probably use AddReward instead of SetReward, but i don't think that is your problem
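
For reference, a short sketch of the difference, assuming a hypothetical "Checkpoint" trigger collider placed between the pipes (the tag and class name are illustrative, not taken from the thread). In ML-Agents, AddReward() increments the reward for the current step, while SetReward() overwrites whatever has been accumulated for that step.

```csharp
using Unity.MLAgents;
using UnityEngine;

// Hypothetical agent class; only the reward call is the point of this sketch.
public class CheckpointRewardAgent : Agent
{
    private void OnTriggerEnter2D(Collider2D other)
    {
        if (other.CompareTag("Checkpoint"))
        {
            // Adds 1 on top of any reward already given during this step.
            AddReward(1f);

            // By contrast, SetReward(1f) would replace the current step's
            // reward with exactly 1, discarding earlier AddReward() calls
            // made in the same step.
        }
    }
}
```

With a single reward source the two calls behave the same, which is consistent with the remark that this is probably not the cause of the flat 0.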

#

have you tried print statements

#

like, are you 100% sure that the collision is correct

#

and the code is running

deft marsh
#

i have put Debug.Log() right before and after the AddReward()

#

and both registered
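
A log line on either side of the reward call confirms the code ran, but not what the agent has accumulated. A slightly stronger check, sketched here with hypothetical "Checkpoint" and "Pipe" tags and an illustrative class name, is to print Agent.GetCumulativeReward() while training is running.

```csharp
using Unity.MLAgents;
using UnityEngine;

// Hypothetical agent class used only to show where the logging goes.
public class RewardLoggingAgent : Agent
{
    private void OnTriggerEnter2D(Collider2D other)
    {
        if (other.CompareTag("Checkpoint"))
        {
            AddReward(1f);
            Debug.Log($"Checkpoint passed, episode reward so far: {GetCumulativeReward()}");
        }
    }

    private void OnCollisionEnter2D(Collision2D collision)
    {
        if (collision.gameObject.CompareTag("Pipe"))
        {
            // Log before EndEpisode(), which resets the cumulative reward.
            Debug.Log($"Episode over, total reward: {GetCumulativeReward()}, steps: {StepCount}");
            EndEpisode();
        }
    }
}
```

If the total logged at the end of an episode is nonzero during training while TensorBoard still shows 0, the problem is more likely on the results or logging side than in the reward code.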

bold ruin
#

okay, can i see your yaml file?

deft marsh
bold ruin
#

looks like you are using curriculum and/or demonstrations? personally, i don't think you would need to approach this in that kind of way, so it could be something to do with that

#

i don't have any experience with curriculum or demonstrations so i won't be able to help you in that case. your yaml and code both look fine so

deft marsh
#

i'm not using curriculum, it's just the name

#

one of my project members was naming things like that so i didn't bother to change it

#

if u are referring to line 2 that is

#

i am using demo tho

bold ruin
#

yes i was. I am also talking about the demonstration recorder attached to your agent object. i have a feeling that it is messing up the rewards, or you aren't implementing rewards properly. Have you read the documentation or only the tutorial?