# Unity ML-Agents Having Constant 0 Cumulative Reward on TensorBoard
I'm trying to learn the ML-Agents Toolkit by making a Flappy Bird AI. As you can see from the screenshots, the demo records rewards properly and the code executes as expected (a Debug.Log fires whenever the bird passes a checkpoint). But when I train the brain with the demo I recorded, the reward stays at a constant 0, as the TensorBoard screenshot shows.
You are resetting your agent when it fails, right?
yes
This is how I reset it.
During demo recording it all works fine, but not during actual training.
If the agent dies, you need to call EndEpisode().
That resets the agent and prepares it for the next episode of training.
Call it whenever the bird hits a pipe.
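To show why EndEpisode() matters for the TensorBoard graph, here is a minimal Python model. It is not ML-Agents code; the class and method names are hypothetical. It assumes, roughly as the ML-Agents docs describe the Cumulative Reward plot, that the graph averages the reward totals of *finished* episodes, so rewards earned in an episode that never ends are not reported.

```python
from statistics import mean
from typing import List, Optional


class EpisodeTracker:
    """Hypothetical model of per-episode reward bookkeeping (not ML-Agents code)."""

    def __init__(self) -> None:
        self.episode_reward = 0.0          # running total for the current episode
        self.completed: List[float] = []   # totals of episodes that have ended

    def add_reward(self, increment: float) -> None:
        self.episode_reward += increment

    def end_episode(self) -> None:
        # Record the finished episode's total, then start a fresh episode.
        self.completed.append(self.episode_reward)
        self.episode_reward = 0.0

    def mean_cumulative_reward(self) -> Optional[float]:
        # Only finished episodes count; None means nothing to report yet.
        return mean(self.completed) if self.completed else None


if __name__ == "__main__":
    tracker = EpisodeTracker()
    tracker.add_reward(1.0)   # passed pipe 1
    tracker.add_reward(1.0)   # passed pipe 2
    tracker.end_episode()     # hit pipe 3 -> episode total 2.0
    tracker.end_episode()     # hit the first pipe -> episode total 0.0
    tracker.add_reward(1.0)   # still flying: not reported until the episode ends
    print(tracker.mean_cumulative_reward())  # 1.0
```

The in-progress episode's 1.0 is left out of the average until end_episode() is called, which is the behavior the advice above is guarding against.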
Oh yeah, I did that too.
As you can see from the second screenshot, the demo had 3 episodes and recorded everything correctly.
Well, if your agent is improving and getting rewards, something could just be messed up in the results folder,
which is where TensorBoard gets its data from.
It's not learning at all, though. The episode length isn't increasing either.
Send the scripts you are using
And a picture of all the agent components (your Agent script, Behavior Parameters, Decision Requester).
Any reason the AddReward() is commented out?
So originally I was giving rewards not only for passing through a pipe (i.e. scoring a point) but also for surviving, and I realized the bird was only getting the survival reward and never the scoring reward.
So I commented out all the AddReward() calls and left only the one for scoring a point.
You should probably use AddReward() instead of SetReward(), but I don't think that's your problem.
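The difference between the two calls is one plausible explanation for the earlier "only the survival reward shows up" symptom. Per the ML-Agents documentation, AddReward() increments the current step's reward, while SetReward() overwrites it. The Python model below mirrors those documented semantics; it is not ML-Agents code, and the class name and reward values (1.0 for a pipe, 0.25 for surviving) are hypothetical, chosen for illustration.

```python
class StepReward:
    """Hypothetical model of AddReward/SetReward semantics (not ML-Agents code)."""

    def __init__(self) -> None:
        self.step_reward = 0.0     # reward for the current decision step
        self.episode_reward = 0.0  # running total for the whole episode

    def add_reward(self, increment: float) -> None:
        # Adds to whatever reward was already given this step.
        self.step_reward += increment
        self.episode_reward += increment

    def set_reward(self, reward: float) -> None:
        # Replaces this step's reward; the episode total shifts by the difference.
        self.episode_reward += reward - self.step_reward
        self.step_reward = reward


if __name__ == "__main__":
    overwritten = StepReward()
    overwritten.add_reward(1.0)   # bird passes a pipe
    overwritten.set_reward(0.25)  # survival reward in the same step replaces it
    print(overwritten.episode_reward)  # 0.25

    summed = StepReward()
    summed.add_reward(1.0)    # bird passes a pipe
    summed.add_reward(0.25)   # survival reward is added on top
    print(summed.episode_reward)  # 1.25
```

With add_reward() both signals accumulate; mixing in set_reward() silently drops whatever was added earlier in the same step.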
Have you tried print statements?
Like, are you 100% sure the collision is detected correctly
and the code is running?
I tried both, and neither gives any reward.
Yes.
I put a Debug.Log() right before and after the AddReward() call,
and both showed up.
Okay, can I see your YAML file?
Looks like you're using curriculum learning and/or demonstrations? Personally, I don't think you need that approach here, so the problem could be related to it.
I don't have any experience with curriculum learning or demonstrations, so I won't be able to help you there. Your YAML and code both look fine, so
I'm not using curriculum learning; it's just the name.
One of my project members named things like that, so I didn't bother to change it.
That is, if you're referring to line 2.
I am using a demo, though.
Yes, I was. I'm also talking about the Demonstration Recorder attached to your agent object. I have a feeling that it's messing up the rewards, or that you aren't implementing rewards properly. Have you read the documentation, or only the tutorial?