#I'm having trouble with the rate limits


solemn condor
#

As far as I can see, I am respecting the rate limit of 5 rps and am staying well below the token limits. However, when I test it with 100 requests (spread over at least 20 seconds...), some of them still run into the rate limit. I have to lower my own limit all the way down to 2 requests per second to avoid getting the error.

There is always the chance of it being a problem in my code, but when I look at my logs, I can't understand why I'm being rate limited.

signal mantle
solemn condor
#

This line in particular: "ratelimitbysize-reset": "21", is a bit puzzling to me.

Hang on, Discord is being very particular about character limits

mossy pecan
#

how you send it and how the server accepts it can differ

#

just because you send it off doesn't mean the execution starts right away

#

you hit a queue

#

and then that usually gets counted towards the enforcement of throughput

signal mantle
#

still, I'm curious to know how exactly he "spreads" it

mossy pecan
#

so the real key here is have a backoff and retry strategy

#

pretty much spam the server .. hit a limit wait half a sec .. try again
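A minimal sketch of the backoff-and-retry idea described above. `RateLimitError`, `call_with_backoff`, and the delay values are hypothetical names and numbers chosen for illustration; the real exception depends on whichever client or HTTP library is in use.

```python
import asyncio
import random


class RateLimitError(Exception):
    """Hypothetical stand-in for whatever the client raises on HTTP 429."""


async def call_with_backoff(make_request, max_retries=5, base_delay=0.5, max_delay=8.0):
    """Call `make_request()`; on a rate-limit error, wait and try again."""
    for attempt in range(max_retries + 1):
        try:
            return await make_request()
        except RateLimitError:
            if attempt == max_retries:
                raise  # give up after the last allowed retry
            # Exponential backoff, capped, with jitter so retries do not line up
            delay = min(max_delay, base_delay * 2 ** attempt)
            await asyncio.sleep(delay + random.uniform(0, delay / 2))


async def demo():
    calls = 0

    async def flaky_request():
        # Fake request: rate limited twice, then succeeds
        nonlocal calls
        calls += 1
        if calls < 3:
            raise RateLimitError("429")
        return "ok"

    result = await call_with_backoff(flaky_request, base_delay=0.01)
    return result, calls


result, calls = asyncio.run(demo())
print(result, calls)
```

The jitter is a design choice: without it, many tasks that were rejected at the same moment would all retry at the same moment too.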

solemn condor
#

well, here is my llm script, it's part of a module that is used in my FastAPI app.

mossy pecan
#

i would assume that ratelimiting is a working ringbuffer in redis .. like in most systems out there

solemn condor
mossy pecan
#

that's the sane engineering on your end

#

aka the default

#

if you are queued up and your old connections are still to be processed

#

that still piles up

solemn condor
#

also thank you both for replying so quickly

mossy pecan
#

and you have to back off

#

so system eng. wise

#

the best way to work around that

#

is a regular backoff/retry strategy

solemn condor
#

ok, my app is using async programming, so there is a good chance that if I don't limit it on my end, I might just send 10k requests to the Mistral API at once. Would you suggest doing both?

signal mantle
#

hmm, the way you are doing it doesn't take into account the time it takes for each request to complete, nor the delay between one group of 5 requests and the next 5. If that delay is too short, it will likely hit the rate limit?

mossy pecan
#

the tps are acting like concurrency ..

#

so .. here is the rub send 5 in

#

wait till its done

#

and send 5 more

mossy pecan
#

that is easy to async

#

without much fuss
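A rough sketch of the "send 5 in, wait till it's done, send 5 more" approach with asyncio. `run_in_batches` and `fake_request` are hypothetical names; `fake_request` stands in for the real API call.

```python
import asyncio


async def run_in_batches(items, worker, batch_size=5):
    """Run `worker` over `items`, one batch of `batch_size` after another."""
    results = []
    for i in range(0, len(items), batch_size):
        batch = items[i:i + batch_size]
        # gather() waits for the whole batch before the loop moves on
        results.extend(await asyncio.gather(*(worker(item) for item in batch)))
    return results


in_flight = 0
max_in_flight = 0


async def fake_request(n):
    # Stand-in for the real API call; tracks how many calls overlap
    global in_flight, max_in_flight
    in_flight += 1
    max_in_flight = max(max_in_flight, in_flight)
    await asyncio.sleep(0.01)
    in_flight -= 1
    return n * 2


results = asyncio.run(run_in_batches(list(range(12)), fake_request))
print(results, max_in_flight)
```

This caps concurrency at 5, but it does not by itself cap requests per second: if each batch finishes in well under a second, several batches can start within the same second.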

signal mantle
#

it uses the old client tho

#

but you can use it as inspiration

mossy pecan
#

no client needed either way

#

that can be just a regular http post

#

for all it's worth

#

i personally do that with retry policies when i send 10k + requests onto the api

solemn condor
#

@mossy pecan I think your proposed solution would not max out my rate limit, as I would be waiting for the last of my batch of 5 to finish before sending another 5. This would guarantee I max it out for the first second after sending my batch, but not for the second following it, right?

mossy pecan
#

you dont max it out anyway . as you will hit walls

#

as the dequeuing is pretty much waiting till it starts to count towards the limiter

signal mantle
#

but there are so many ways to do this like dragon for example

mossy pecan
signal mantle
#

20 concurrent workers work for that cookbook cause one request takes sometimes 5 seconds to complete

#

so even with 5rps it works
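The arithmetic behind this, as a rough estimate that assumes every worker is always busy and every request takes the same time (`approx_rps` is a hypothetical helper name):

```python
def approx_rps(concurrency, seconds_per_request):
    """Steady-state requests per second when `concurrency` workers stay busy."""
    return concurrency / seconds_per_request


print(approx_rps(20, 5))    # 4.0  -> 20 workers, 5 s per request
print(approx_rps(5, 1))     # 5.0  -> 5 workers, 1 s per request
print(approx_rps(5, 0.5))   # 10.0 -> 5 workers, 0.5 s per request
```

So a concurrency cap only keeps the request rate under 5 rps when requests are slow enough; with fast responses, the same cap can allow a higher rate.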

solemn condor
#

I'm reading through your cookbook @signal mantle, thank you for this! But I notice that it uses a Semaphore to limit it, right?

    async with semaphore:
        gen_dialog = ""
        while not self._validate_generated(gen_dialog):
            if len(dialogs) == 0:
                return []

            dialog = dialogs.pop()
            gen_dialog = await self._async_generate(description, dialog)

        pbar.update(1)
        return gen_dialog
#

where self.cli is the AsyncMistral client (which is now* built into the normal Mistral client)

signal mantle
#

yes

#

it's not perfect but works fairly well

solemn condor
#

well in my code, i too use the Semaphore, but it still runs into the limit

signal mantle
#

but yours uses the semaphore with a different approach, no? and it doesn't take into account how long a request lasts

#

my guess is that your use case is really fast right?

solemn condor
#

yeah, there is a good chance i get a reply within 1 second back, so if i put the semaphore at 5, i might still run into the requests per second limit

#

this is my rate_limit function:

import functools

def rate_limit(rate_limiter, semaphore):
    """Decorator to apply rate limiting using both AsyncLimiter and Semaphore."""
    def decorator(func):
        @functools.wraps(func)  # Preserve the original function metadata
        async def wrapper(*args, **kwargs):
            # Acquire the semaphore to limit concurrent access
            async with semaphore:
                # Use the rate limiter to control the request rate
                async with rate_limiter:
                    return await func(*args, **kwargs)
        return wrapper
    return decorator

signal mantle
#

then maybe the limit of 2 is actually accurate, cause it's a limit of 2 concurrent requests and not a limit of 2 requests per second, no?

solemn condor
#

and my rate_limiter and semaphore:

# Define the Rate Limiter
rate_limiter = AsyncLimiter(max_rate=5, time_period=1)
semaphore = asyncio.Semaphore(5)

they are defined globally, so every request uses the same rate limiter and semaphore

signal mantle
#

technically if the api is very very fast you wouldn't even need async, and you might even have to add a time.sleep

solemn condor
#

I mean, Mistral might've implemented it differently than this, but I assume I wouldn't be the first to figure that out, and I've been searching haha

signal mantle
#

but what happens between the end of one batch and the start of a new batch, doesn't it have a chance of going too fast there?
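One way to avoid bursts at batch boundaries is to space request starts evenly instead of batching. The sketch below is an illustration only: `SpacedLimiter` is a hypothetical helper, not part of aiolimiter or the Mistral client, and the demo uses a short interval so it runs quickly (for 5 rps the interval would be 0.2).

```python
import asyncio


class SpacedLimiter:
    """Hypothetical helper: at most one request start every `interval` seconds."""

    def __init__(self, interval):
        self._interval = interval
        self._lock = asyncio.Lock()
        self._next_start = 0.0

    async def __aenter__(self):
        # The lock serializes waiters so that starts are handed out one by one
        async with self._lock:
            now = asyncio.get_running_loop().time()
            delay = self._next_start - now
            if delay > 0:
                await asyncio.sleep(delay)
            self._next_start = max(now, self._next_start) + self._interval

    async def __aexit__(self, exc_type, exc, tb):
        return False


async def demo():
    limiter = SpacedLimiter(interval=0.1)  # created inside the running loop
    loop = asyncio.get_running_loop()
    starts = []

    async def fake_request():
        async with limiter:
            starts.append(loop.time())  # the real API call would go here

    await asyncio.gather(*(fake_request() for _ in range(5)))
    return starts


starts = asyncio.run(demo())
gaps = [b - a for a, b in zip(starts, starts[1:])]
print([round(g, 2) for g in gaps])
```

Compared with a burst-capable limiter, this trades a little throughput at startup for a smoother request pattern.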

solemn condor
solemn condor
signal mantle
#

if you take a look at the timestamps of your requests, what does it look like? is it sending requests in bursts? does it reach the rate limit in the middle of a batch? end of a batch?

signal mantle
solemn condor
#

my batch is 100 tasks (for testing), and it's kind of random. Sometimes 30 of them complete with no issue, then there are on-and-off failures, and then it succeeds again for many requests in a row.

signal mantle
#

otherwise will reach the limit

solemn condor
signal mantle
#

oh yes my bad!

solemn condor
#

❤️

#

i think that might be it, i was working under the impression that this:

from aiolimiter import AsyncLimiter

should take care of that for me, and it definitely has an impact. If I set it to a very low value, I get a very, very small chance of rate limit errors.

#

from their readthedocs:

API

class aiolimiter.AsyncLimiter(max_rate, time_period=60)

A leaky bucket rate limiter.

This is an asynchronous context manager; when used with async with, entering the context acquires capacity:

limiter = AsyncLimiter(10)
for foo in bar:
    async with limiter:
        # process foo elements at 10 items per minute

Parameters:

        max_rate (float) – Allow up to max_rate / time_period acquisitions before blocking.

        time_period (float) – duration, in seconds, of the time period in which to limit the rate. Note that up to max_rate acquisitions are allowed within this time period in a burst.

async acquire(amount=1)

    Acquire capacity in the limiter.

    If the limit has been reached, blocks until enough capacity has been freed before returning.

    Parameters:

        amount (float) – How much capacity you need to be available.
    Exception:

        Raises ValueError if amount is greater than max_rate.
    Return type:

        None

has_capacity(amount=1)

    Check if there is enough capacity remaining in the limiter

    Parameters:

        amount (float) – How much capacity you need to be available.
    Return type:

        bool

max_rate: float

    The configured max_rate value for this limiter.

time_period: float

    The configured time_period value for this limiter.
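The "burst" note in the docs quoted above may matter here. The following is a rough simulation under two loudly stated assumptions: it uses a simplified leaky bucket model written for this example (not aiolimiter's actual code), and it assumes the server counts requests in a one-second window, which nothing in this thread confirms. `leaky_bucket_starts` and `peak_in_window` are hypothetical names.

```python
from fractions import Fraction


def leaky_bucket_starts(n_requests, max_rate, time_period):
    """Start times (seconds) for requests that are all queued at t=0."""
    rate = Fraction(max_rate, time_period)  # capacity leaked per second
    level = Fraction(0)                     # how full the bucket is
    now = Fraction(0)
    starts = []
    for _ in range(n_requests):
        if level + 1 > max_rate:
            # Wait until enough has leaked out to fit one more request
            wait = (level + 1 - max_rate) / rate
            now += wait
            level -= wait * rate
        level += 1
        starts.append(now)
    return starts


def peak_in_window(starts, window):
    """Largest number of starts inside any half-open window [t, t + window)."""
    return max(sum(1 for s in starts if t <= s < t + window) for t in starts)


starts = leaky_bucket_starts(100, max_rate=5, time_period=1)
peak = peak_in_window(starts, window=1)
print(peak)  # 9
```

Under these assumptions the bucket lets the first 5 requests through at once and then one more every 0.2 s, so the first second contains 9 request starts, which is above a 5 rps limit even though the long-run average is 5 per second.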
#

so out of 100 requests, 10 caused a rate-limiting error

solemn condor
#

alright, I enforced a minimum runtime of 1 second for every single request. But it appears that there are still (albeit limited) occurrences of my requests being rate-limited.

I am now going to repeat this without semaphore and with the aiolimiter to see if it solves the problem.
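One way a minimum request duration could be enforced, as a sketch only; this is not necessarily how it was done in the code discussed here. `with_min_duration` and `fast_request` are hypothetical names, and the demo uses 0.1 s instead of 1 s so it runs quickly.

```python
import asyncio


async def with_min_duration(make_request, min_seconds):
    """Await `make_request()`, then pad so the call takes at least `min_seconds`."""
    loop = asyncio.get_running_loop()
    started = loop.time()
    result = await make_request()
    remaining = min_seconds - (loop.time() - started)
    if remaining > 0:
        await asyncio.sleep(remaining)
    return result


async def demo():
    async def fast_request():
        return "ok"  # stand-in for a quick API reply

    loop = asyncio.get_running_loop()
    t0 = loop.time()
    result = await with_min_duration(fast_request, min_seconds=0.1)
    return result, loop.time() - t0


result, elapsed = asyncio.run(demo())
print(result, round(elapsed, 2))
```

The padding only limits the rate if it happens while the semaphore slot is still held, so that each slot starts at most one request per `min_seconds`.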

#

using aiolimiter and a minimum request duration of 1 second it still yields a few errors in my 300 request attempt.

#

now trying both semaphore and aiolimiter

#

yup, still running into the rate limit occasionally

#

very frustrating problem this

#

if i limit the request to minimum of 2 sec runtime, surely it should now be gone