I am doing many offline Stockfish evaluations (hundreds of thousands / millions of games).
Each game will have a full stockfish evaluation (a list of objects) saved to a Game table in a SQL database.
Each time I have Stockfish evaluate a position, I want to determine the likelihood that I will need to evaluate that same position again so that I can cache that evaluation.
For instance, many of these games will have the exact same position to evaluate after 2 moves, but it is very unlikely that any game 20 moves in will have a position that is reached in any other game.
A simple approach would be to just cache the first n moves for every game. This is a good partial solution!
Consider the endgame; we will likely see common positions across games when there are only 1-2 pieces left on the board.
The “entropy” of a chess game seems to increase and then decrease as the game goes on.
I’m curious if anyone has spent some time thinking through this question and if they had any interesting findings?
TLDR: How to (roughly) calculate the likelihood that any FEN is repeated across any pool of chess games?