I'm facing a choice.
What I want: to study and analyse the most common (and most critical) opening mistakes made by players of my level (or of any rating range), compute some stats, etc. The sky is the limit once I have what I want: ideally a huge CSV file of the most common lines, their evaluation, and their probability of occurrence.
Two ways:
- Someone has already built something using the database: "Finding the Most Common Chess Blunders": https://www.youtube.com/watch?v=7eevSgJqV7o . It's Python code that goes through all the available games and creates a "blunder dictionary".
Pros: simple, and doesn't use lichess bandwidth.
Cons: It only uses the analysed games, which introduces a huge bias.
The blunder dictionary might get too big if I also want to look at all the common mistakes.
I'm not sure of the format of this dictionary, but I don't think I'd be able to do relevant stats on it, because it is keyed by FEN position and not by move order.
- I use the lichess API to look up the most common opening moves played by people in the rating range I want.
If I want to go up to move 7 and keep only the 3 most common moves each time, that makes 3^14 = 4 782 969 positions. I would need 2 API calls per position, which means 9 565 938 calls. If I put some waiting time in my algorithm, I can limit the number of calls per second to, say, 500, which would mean about 5 hours of intense API usage. Is that too much?
Pros: gives me everything I want, and uses the stats that lichess has already computed, so I don't need to compute them myself.
Cons: heavy usage of the lichess API, and I don't even know if I can create and process a CSV file with 4 million rows.
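To sanity-check the numbers for the second way, here is a minimal Python sketch. It only redoes the arithmetic (14 half-moves, 3 candidate moves each, 2 calls per position, 500 calls per second, all figures taken from my estimate above and not from any documented lichess limit) and streams placeholder rows to a CSV one at a time. The function names `estimate` and `write_lines_csv` and the column names are made up for illustration, and no real API call is made. Note that 3^14 counts only the final positions; the intermediate positions along the way are not included.

```python
import csv
import io
import itertools


def estimate(plies=14, branching=3, calls_per_position=2, calls_per_second=500):
    """Return (leaf_positions, total_calls, hours) for the figures in the post."""
    leaves = branching ** plies              # final positions only, not intermediate ones
    calls = leaves * calls_per_position
    hours = calls / calls_per_second / 3600
    return leaves, calls, hours


def write_lines_csv(out, plies, branching):
    """Stream one row per line to `out`; rows are never all held in memory."""
    writer = csv.writer(out)
    writer.writerow(["line", "evaluation", "probability"])  # hypothetical columns
    rows = 0
    for ranks in itertools.product(range(1, branching + 1), repeat=plies):
        # Placeholder row: each number is the popularity rank of the move chosen
        # at that half-move. Real data would hold moves, eval and frequency.
        writer.writerow(["-".join(map(str, ranks)), "", ""])
        rows += 1
    return rows


leaves, calls, hours = estimate()
print(f"{leaves:,} leaf positions, {calls:,} calls, {hours:.1f} h at 500 calls/s")

# Small demo (4 half-moves) written to an in-memory buffer instead of a file.
buf = io.StringIO()
print(write_lines_csv(buf, plies=4, branching=3), "rows written")
```

Because the rows are written one by one, the size of the file is limited by disk space rather than memory; reading it back row by row with `csv.reader` works the same way.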
What do you guys think?