# Rate limiting: what’s reasonable?

solemn lake

Hi friends,

I’m working on a research project to explore positions that have been reached more than a certain number of times on Lichess (with some filters so that only decent-quality, fairly recent games are included).

Right now, my code descends the game tree recursively: for each position it calculates some information about that position relative to the moves played, then moves on to each move that was played with at least some minimum frequency.
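
The descent described above can be sketched as a depth-first walk that only follows moves played at least `min_games` times. This is a minimal sketch under assumptions: `lookup` is a hypothetical stand-in for an opening-explorer query (one API request per position), and positions are keyed by the moves played from the start rather than by FEN.

```python
from typing import Callable, Dict, List, Tuple

# HYPOTHETICAL: lookup(path) returns {move: times_played} for the position
# reached by playing `path` from the starting position. Against the public
# API this would be one explorer request per call.
Lookup = Callable[[Tuple[str, ...]], Dict[str, int]]

def descend(lookup: Lookup, min_games: int,
            path: Tuple[str, ...] = ()) -> List[Tuple[str, ...]]:
    """Depth-first walk of the game tree that skips moves played fewer than
    `min_games` times. Returns every visited position as a move path."""
    visited = [path]
    for move, count in lookup(path).items():
        if count >= min_games:
            visited.extend(descend(lookup, min_games, path + (move,)))
    return visited
```

Each visited position costs exactly one `lookup`, so the number of requests equals the number of positions that clear the threshold; that count is what decides whether this fits within an API's rate limits.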

I can share more details of my research if anyone is interested, but that’s not really the point of my question.

As you can imagine, “scraping” the Lichess API for all positions that occur above some frequency threshold can get large fast. Obviously, raw read-only access to the database tables would be a simpler solution, but I don’t have that (though I’d be happy if that were an option!).

I see the documentation on rate limiting. Is the implication that I just shouldn’t do this? Or should I restrict myself to, say, one call every 3 seconds?
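
Spacing out requests on the client side is straightforward. A minimal sketch, assuming a fixed minimum interval between sequential, single-threaded calls; the 3 seconds from the question is only an example, not a documented Lichess limit, and `fetch` is a hypothetical placeholder for whatever function performs one request.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

class Throttle:
    """For sequential use: starts at most one call every `min_interval` seconds."""

    def __init__(self, min_interval: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = min_interval
        self.clock = clock  # injectable so the logic can be tested without waiting
        self.sleep = sleep
        self._last_start: Optional[float] = None  # when the previous call started

    def call(self, fetch: Callable[[], T]) -> T:
        """Wait if the previous call started less than `min_interval` ago, then run `fetch`."""
        if self._last_start is not None:
            wait = self.min_interval - (self.clock() - self._last_start)
            if wait > 0:
                self.sleep(wait)
        self._last_start = self.clock()
        return fetch()
```

Usage would look like `throttle = Throttle(3.0)` followed by `throttle.call(lambda: fetch(fen))` for each position. Measuring from the start of the previous call keeps the overall request rate fixed even when individual responses are slow.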

It’s really incredible that a) this data is available via an API and b) it is free. I want to use it, not abuse it!

Thanks!

placid lotus
solemn lake

I could… I didn’t think of that route because I’d need to recreate some of the opening explorer database’s functionality. For example, given a FEN, I want to know which moves were played and how often. That’s not impossible. But say I don’t want any FENs that appear fewer than 10 times: traversing the game tree through the opening explorer lets me cut out a LOT of data processing, compared with “building up” a database of positions from scratch, where some positions get processed millions of times.

My CS/algorithms background isn’t that strong, so maybe there’s an easier way to approach this with downloaded games that I haven’t considered.
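
With downloaded games, one common alternative to repeated traversals is a single counting pass followed by one filtering step. A minimal sketch, assuming each game has already been parsed into a list of move strings; positions are keyed by the move sequence that reached them, whereas a real version would key by a normalized FEN (from a chess library) so that transpositions merge. `max_plies` is a hypothetical depth cap to bound memory.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

# ASSUMPTION: a position is identified by the moves that led to it.
# Keying by normalized FEN instead would merge transpositions.
PositionKey = Tuple[str, ...]

def build_explorer(games: Iterable[List[str]], min_count: int,
                   max_plies: int = 30) -> Dict[PositionKey, Counter]:
    """Count, in one pass over the games, how often each move was played from
    each position; keep positions where at least `min_count` moves were played."""
    moves_from: Dict[PositionKey, Counter] = defaultdict(Counter)
    for moves in games:
        key: PositionKey = ()
        for move in moves[:max_plies]:
            moves_from[key][move] += 1
            key += (move,)
    # Filter once, after counting, instead of on every lookup.
    return {k: c for k, c in moves_from.items() if sum(c.values()) >= min_count}
```

Each game is read once, so the work grows with the total number of moves rather than with how often a position is revisited; memory becomes the main constraint, which is why the sketch caps the depth.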

But the recursion is also computationally expensive with my poorly written code 😅

glacial nest
jade pewter

> I see the documentation around rate limiting. Is the implication that I just shouldn’t do this?

Yes. Basically no API in existence is intended for scraping large amounts of data.

As schlawg said, use the db dumps. That should be way more efficient anyway.

Then, as tors mentioned, running your own opening explorer instance is one option.

solemn lake

Thanks for the ideas! Will explore running my own instance and downloading the games directly.

edgy cedar

I tried to stay within the limits but seem to have been blocked for a while when running something overnight. More documentation on the rate limits, even if it is more conservative than the actual limits, would help avoid situations like this.

Although the data is very easy to download, the games are very slow to index into RocksDB; I talked with @gilded dawn about the speeds I'm seeing, and they are to be expected. That undercuts the "just download the data" argument, unless you stop at a small amount of data, which reduces the statistical significance of whatever you compute from it. I'm working on my own aggregation that cares less about the state of the Lichess social world and more about aggregating position stats based on my own filters, but this isn't an easy solution by any means.

I understand this data is a gift rather than a right, and I am thankful. I just don't understand the lack of transparency around the rate limiting.

edgy cedar

To be fair, I goofed: it's the cloud evals I'm being rate limited on, but the same principle applies. I have no idea how often I'm allowed to query them.

placid lotus

There are no rate limits on a downloaded db. Could you explain how fetching them 1 by 1 via API improves your RocksDB situation?

edgy cedar

You misunderstand. Of course there are no rate limits on a local DB. Extracting PGNs from the .zst files and inserting them into the DB one by one via the provided Rust client takes a very long time for this amount of data.
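
One general way to reduce per-game overhead is to stream the decompressed PGN and hand games to the database in batches rather than one at a time. A minimal sketch, assuming the dump is decompressed with the `zstd` CLI and piped in (for example `zstd -dc games.pgn.zst | python split.py`) and that the target store accepts bulk writes; it is not based on the Rust client mentioned above, and `batch_size` is an arbitrary, hypothetical setting.

```python
from itertools import islice
from typing import Iterable, Iterator, List

def split_games(lines: Iterable[str]) -> Iterator[str]:
    """Yield one PGN game at a time from a stream of lines.
    ASSUMPTION: every game starts with an '[Event ' header line."""
    buf: List[str] = []
    for line in lines:
        if line.startswith("[Event ") and buf:
            game = "".join(buf).strip()
            if game:
                yield game
            buf = []
        buf.append(line)
    game = "".join(buf).strip()
    if game:
        yield game

def batches(items: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Group items into lists of at most `batch_size` without loading them all."""
    it = iter(items)
    while batch := list(islice(it, batch_size)):
        yield batch
```

A driver would loop over `batches(split_games(sys.stdin), 10_000)` and issue one bulk insert per list; only one batch is held in memory at a time.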

I am, however, being rate limited on the public API, which I'm using while I work on a faster aggregation method locally.

And I'm only asking about the transparency around that rate limiting. I would like to know right away if I'm pushing too hard, rather than finding out by being blocked entirely.
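
One concrete signal the API does give is the HTTP 429 status. The Lichess API documentation asks clients to make only one request at a time and, after receiving a 429, to wait a full minute before resuming. A minimal sketch of a helper that follows that rule and logs every 429, so that pushing too hard shows up in your own logs immediately; `send` is a hypothetical callable that performs one request and returns `(status, body)`, and `max_attempts` is an arbitrary choice.

```python
import logging
import time
from typing import Callable, Tuple

log = logging.getLogger("lichess-client")

# HYPOTHETICAL: send() performs one HTTP request and returns (status, body).
Send = Callable[[], Tuple[int, str]]

def request_politely(send: Send, max_attempts: int = 3, pause: float = 60.0,
                     sleep: Callable[[float], None] = time.sleep) -> Tuple[int, str]:
    """Make one request. On HTTP 429, log it and wait `pause` seconds before
    retrying; give up after `max_attempts` rate-limited responses.
    Any other status is returned to the caller unchanged."""
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status != 429:
            return status, body
        log.warning("HTTP 429 (attempt %d/%d); pausing %.0f s",
                    attempt, max_attempts, pause)
        sleep(pause)
    raise RuntimeError("still rate limited; reduce the request rate before continuing")
```

Raising after a few 429s, rather than retrying forever, makes an overnight job stop and report the problem instead of repeatedly hitting the limit.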

Again, I appreciate all you do to support this.