Hi hi, I'm working with Unity ML-Agents and using the Malbers Animal Controller to control a wolf character. I want to train an agent with a discrete action space, but I don't want all actions to always be available at every timestep.
My agent has two main action branches: Speed mode (Walk, Sprint, Attack, Sneak) and Movement (forward/back/left/right).
If stamina is too low, the agent should only be allowed Walk as its speed mode (movement stays available).
If the agent is currently attacking, it should not be able to choose a movement at the same time.
I don't want the agent to learn from invalid actions (for example by penalizing them). I want to prevent the policy from selecting those actions in the first place.
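To make the rules concrete, here is a rough Python sketch of the behaviour I'm after. It is not ML-Agents code, just plain Python showing the masking logic. The names `build_masks` and `masked_sample`, the 0.2 stamina threshold, and the extra "None" movement entry are all made up for illustration (I assume a branch needs at least one selectable action while attacking).

```python
import math
import random

SPEED_MODES = ["Walk", "Sprint", "Attack", "Sneak"]
# "None" is a hypothetical extra entry so the movement branch always has
# at least one selectable action while an attack is in progress.
MOVEMENTS = ["None", "Forward", "Back", "Left", "Right"]


def build_masks(stamina, is_attacking, low_stamina_threshold=0.2):
    """Return (speed_mask, movement_mask); True means the action is allowed.

    The threshold value is an assumption, not something from my project.
    """
    speed_mask = [True] * len(SPEED_MODES)
    movement_mask = [True] * len(MOVEMENTS)

    if stamina < low_stamina_threshold:
        # Low stamina: only Walk stays available in the speed branch.
        speed_mask = [name == "Walk" for name in SPEED_MODES]

    if is_attacking:
        # While attacking: no directional movement, only the "None" entry.
        movement_mask = [name == "None" for name in MOVEMENTS]

    return speed_mask, movement_mask


def masked_sample(logits, mask, rng=random):
    """Sample an index from softmax(logits) restricted to allowed entries."""
    allowed = [i for i, ok in enumerate(mask) if ok]
    if not allowed:
        raise ValueError("mask disables every action in this branch")
    # Subtract the max logit for numerical stability before exponentiating.
    top = max(logits[i] for i in allowed)
    weights = [math.exp(logits[i] - top) for i in allowed]
    return rng.choices(allowed, weights=weights, k=1)[0]


if __name__ == "__main__":
    speed_mask, movement_mask = build_masks(stamina=0.1, is_attacking=False)
    speed_index = masked_sample([0.3, 2.0, 1.5, 0.1], speed_mask)
    print(SPEED_MODES[speed_index])
```

So a masked action gets zero probability and can never be picked, rather than being picked and then punished. That is what I'd like to reproduce on the Unity side.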
Any suggestions or references would be helpful!
Thanks!!!