I'm currently building an API that fetches chess games from Chess.com. The goal is to analyze these games using Stockfish in order to identify all mistakes and blunders.
The problem is that I can't reliably detect mistakes, because the evaluations I get back from Stockfish aren't consistent or trustworthy enough.
For each move, I send the resulting position to Stockfish, which returns an evaluation for it. I then calculate the difference between the evaluations of consecutive positions, and if it exceeds a certain threshold, I classify the move as a mistake or a blunder.
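To make the approach concrete, here is a minimal Python sketch of the classification step. It assumes the evaluations have already been collected as centipawns from White's point of view, one per position, starting with the position before the first move. The function name `classify_moves` and the thresholds (100 and 300 centipawns) are placeholders chosen for illustration, not established values.

```python
from typing import List

MISTAKE_CP = 100   # assumed threshold, in centipawns
BLUNDER_CP = 300   # assumed threshold, in centipawns


def classify_moves(evals_cp: List[int]) -> List[str]:
    """Label each move by the evaluation change it caused.

    evals_cp[0] is the evaluation before the first move and evals_cp[i]
    the evaluation after move i, always from White's point of view.
    """
    labels = []
    for i in range(1, len(evals_cp)):
        white_moved = (i % 2 == 1)  # moves 1, 3, 5, ... are White's
        delta = evals_cp[i] - evals_cp[i - 1]
        # Loss for the side that moved: White loses when the evaluation
        # drops, Black loses when it rises.
        loss = -delta if white_moved else delta
        if loss >= BLUNDER_CP:
            labels.append("blunder")
        elif loss >= MISTAKE_CP:
            labels.append("mistake")
        else:
            labels.append("ok")
    return labels
```

For example, `classify_moves([20, 30, 25, -300, -280])` flags only the third move (White's second) as a blunder, because the evaluation swings by 325 centipawns against the side that just moved.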
However, the issue is that even after just two opening moves (e4 e5), Stockfish sometimes returns an evaluation of +1.06, which is clearly unreasonable at such an early stage in a balanced opening.
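Two things I still want to rule out are that the score comes from a very shallow search, or that I'm reading it with the wrong sign. The sketch below parses a UCI `info` line to recover the search depth alongside the score and converts the score to White's point of view. It assumes the engine is driven over the UCI protocol and, as far as I understand that protocol, that the score is reported from the side to move's perspective. The helper names `parse_info` and `to_white_pov` are hypothetical, and I have not confirmed that either issue is the actual cause.

```python
import re
from typing import Optional, Tuple

# Matches the "depth N" field and the first "score cp|mate N" field.
# The \b before "depth" keeps "seldepth" from being matched instead.
_INFO_RE = re.compile(r"\bdepth (\d+)\b.*?\bscore (cp|mate) (-?\d+)")


def parse_info(line: str) -> Optional[Tuple[int, str, int]]:
    """Return (depth, kind, value) from a UCI 'info' line, or None."""
    if not line.startswith("info "):
        return None
    match = _INFO_RE.search(line)
    if match is None:
        return None  # e.g. "info string ..." or a line without a score
    return int(match.group(1)), match.group(2), int(match.group(3))


def to_white_pov(value: int, white_to_move: bool) -> int:
    """Flip the sign when Black is to move (assumed side-to-move scores)."""
    return value if white_to_move else -value
```

With this, I can log the depth next to every evaluation and check whether the suspicious values, such as the +1.06 above, come from shallow searches before comparing consecutive positions.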
So I'm currently unsure how to move forward. I haven’t found any other APIs that can analyze PGN-format games reliably, but I know there are websites that provide similar analysis, so there must be a way to do it.
How should I proceed?