Hey guys! I'm quite new to AI and training. I've been working on a digital adaptation of a board game and need to train an AI for players to play against. The game has 2 phases: a draft phase and an action phase. The AI can pick only one card, and then perform an action and/or a power. I believe I did almost everything correctly by:
- Making the AI pick a card based on the game state and rewarding it for picking a good card.
- Making the AI select an action and rewarding it when it picks a valid action.
- Letting the AI choose whether to end its turn or change action mid-play, and rewarding or penalizing it based on the remaining possible moves.
- Letting the AI perform an action on the board, rewarding it if it targets a valid tile and penalizing it if it tries an invalid tile.
- Ending episodes after the AI finishes its turn in the action phase, so it can understand that an episode consists of these 2 phases.
With all this in mind, when I put the AI to train it performs quite well and can finish a match in under 2 seconds. The problem is that it doesn't seem to learn from its mistakes on the board, and it doesn't seem to understand my board correctly.
The main board is a 9x11 grid. We'll use that as the default, but the size will vary in later updates, so I need this value to be mostly dynamic. The AI needs to understand the board and the pieces placed on it, but I can't seem to make it understand that. My board class returns a Dictionary<Vector2, Tile> that maps each position to its tile information.
I tried adding each tile on the board as an observation like this:
foreach (var tile in tiles)
{
    sensor.AddObservation(new Vector3(tile.posX, tile.posY, (int) tile.pieceType));
}
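To make the shape of that observation concrete, here is a minimal Unity-free sketch of what the loop above feeds the sensor. PieceType (and its members), Tile, and BoardObservation are hypothetical stand-ins, not my real classes, and it assumes that sensor.AddObservation(Vector3) simply appends three floats per tile.

```csharp
using System.Collections.Generic;

// Hypothetical stand-ins for the real game types (not the actual project code).
public enum PieceType { Empty = 0, Pawn = 1, Tower = 2 }

public sealed class Tile
{
    public int posX;
    public int posY;
    public PieceType pieceType;
}

public static class BoardObservation
{
    // Builds a width x height board of empty tiles, keyed by (x, y).
    public static Dictionary<(int x, int y), Tile> CreateBoard(int width, int height)
    {
        var board = new Dictionary<(int x, int y), Tile>();
        for (int y = 0; y < height; y++)
            for (int x = 0; x < width; x++)
                board[(x, y)] = new Tile { posX = x, posY = y, pieceType = PieceType.Empty };
        return board;
    }

    // Mirrors the foreach loop above: three floats per tile (x, y, piece type).
    public static List<float> Flatten(IEnumerable<Tile> tiles)
    {
        var observation = new List<float>();
        foreach (var tile in tiles)
        {
            observation.Add(tile.posX);
            observation.Add(tile.posY);
            observation.Add((int)tile.pieceType);
        }
        return observation;
    }
}
```

On the default 9x11 board, BoardObservation.Flatten(BoardObservation.CreateBoard(9, 11).Values).Count is 99 tiles x 3 = 297, so the observation length changes whenever the board size changes.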
For decisions, in OnActionReceived() the AI returns 2 discrete actions for the board position:
int boardX = actions.DiscreteActions[2];
int boardY = actions.DiscreteActions[3];
Vector2 boardPos = new Vector2(boardX, boardY);
// Afterwards: let the AI keep trying tiles until it picks a valid one, then reward it.
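The check I run on each attempt looks roughly like the sketch below. It is Unity-free so it can be read on its own: TileChoice, Score, validTiles, and the reward values (+0.1 and -0.05) are hypothetical placeholders, and in the real agent the returned value would be passed to AddReward().

```csharp
using System.Collections.Generic;

public static class TileChoice
{
    // Hypothetical reward values, chosen only for illustration.
    public const float ValidReward = 0.1f;
    public const float InvalidPenalty = -0.05f;

    // Combines the two discrete branches into a board position and scores it.
    // validTiles holds the positions the current action may target.
    public static float Score(int boardX, int boardY, ISet<(int x, int y)> validTiles)
    {
        var boardPos = (boardX, boardY);
        return validTiles.Contains(boardPos) ? ValidReward : InvalidPenalty;
    }
}
```

With this setup the AI only finds out whether a tile was valid after it has already picked it, which is the part I suspect it is not learning from.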
What am I doing wrong, and/or what should I do to make the board understandable to the AI?