solemn condor Oct 20, 2024, 3:26 PM

#

As far as I can see, I am respecting the rate limit of 5 rps and am staying well below the token limits. However, when I test it with 100 requests (spread over at least 20 seconds...), some of them still run into the rate limit. I have to lower my own limit all the way down to 2 requests per second to avoid getting the error.

There is always the chance of it being a problem in my code, but when I look at my logs, I can't understand why I'm being rate limited.

signal mantle Oct 20, 2024, 3:28 PM

#

solemn condor As far as I can see, I am respecting the rate limit of 5 rps and am staying well...

how exactly are you spreading the requests?

solemn condor Oct 20, 2024, 3:29 PM

#

This line in particular: "ratelimitbysize-reset": "21", is a bit puzzling to me.

Hang on, Discord is being very particular about character limits

📎 mistral_request.json

mossy pecan Oct 20, 2024, 3:29 PM

#

there can be discrepancies between how you send it and how the server accepts it

#

as just because you send it off doesn't mean the execution starts right away

#

you hit a queue

#

and that then usually gets counted towards the enforcement of throughput

signal mantle Oct 20, 2024, 3:30 PM

#

still I'm curious to know how exactly he "spreads" it

mossy pecan Oct 20, 2024, 3:30 PM

#

so the real key here is have a backoff and retry strategy

#

pretty much spam the server .. hit a limit wait half a sec .. try again
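
A minimal sketch of the backoff-and-retry idea described above, assuming the request is wrapped in a coroutine called `send` and that a rate-limit response surfaces as an exception. `RateLimited`, `call_with_backoff`, and the 0.5 second starting delay are illustrative names and values, not part of any Mistral client. Instead of waiting a fixed half second, the delay doubles on each failed attempt and gets a little random jitter so that parallel tasks do not all retry at the same instant.

```python
import asyncio
import random


class RateLimited(Exception):
    """Hypothetical error standing in for an HTTP 429 response."""


async def call_with_backoff(send, max_attempts=5, base_delay=0.5, sleep=asyncio.sleep):
    """Call `send()` and retry with exponential backoff plus jitter on RateLimited."""
    for attempt in range(max_attempts):
        try:
            return await send()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            # 0.5 s, 1 s, 2 s, ... plus up to 10% random jitter
            delay = base_delay * (2 ** attempt)
            await sleep(delay + random.uniform(0, delay * 0.1))
```

The `sleep` parameter exists only so the waiting can be swapped out in tests; in normal use the default `asyncio.sleep` applies.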

solemn condor Oct 20, 2024, 3:31 PM

#

well here is my llm script, its part of a module that is used in my FastAPI app.

📎 llm.py

mossy pecan Oct 20, 2024, 3:31 PM

#

i would assume that ratelimiting is a working ringbuffer in redis .. like in most systems out there
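
To illustrate the kind of limiter being guessed at here, below is a small in-memory sliding-window sketch. It is an assumption for illustration only: nothing in this thread confirms how Mistral enforces its limits, and `SlidingWindowLimiter` is a made-up name. It shows why two client-side "batches of 5" can still be rejected when they arrive less than a second apart.

```python
from collections import deque


class SlidingWindowLimiter:
    """Allows at most `limit` accepted requests in any `window`-second span."""

    def __init__(self, limit=5, window=1.0):
        self.limit = limit
        self.window = window
        self.accepted = deque()  # timestamps of accepted requests

    def allow(self, now):
        # Drop timestamps that have fallen out of the window
        while self.accepted and now - self.accepted[0] >= self.window:
            self.accepted.popleft()
        if len(self.accepted) < self.limit:
            self.accepted.append(now)
            return True
        return False


limiter = SlidingWindowLimiter(limit=5, window=1.0)
# Two bursts of 5 that arrive only 0.2 s apart (hypothetical arrival times)
arrivals = [0.9] * 5 + [1.1] * 5
results = [limiter.allow(t) for t in arrivals]
rejected = results.count(False)
print(rejected)  # 5: the whole second burst is refused
```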

solemn condor Oct 20, 2024, 3:31 PM

#

mossy pecan pretty much spam the server .. hit a limit wait half a sec .. try again

ahh, that cannot be what Mistral would want me to do right?

mossy pecan Oct 20, 2024, 3:31 PM

#

that's the sane engineering on your end

#

aka the default

#

if you are queued up and your old connections are still to be processed

#

that still piles up

solemn condor Oct 20, 2024, 3:32 PM

#

also thank you both for replying so quickly

mossy pecan Oct 20, 2024, 3:32 PM

#

and you have to back off

#

so system eng. wise

#

the best way to work around that

#

is a regular backoff/retry strategy

solemn condor Oct 20, 2024, 3:33 PM

#

ok, my app is using async programming, so there is a good chance that if i don't limit it on my end, i might just send 10k requests to MistralAPI at once, would you suggest doing both?

signal mantle Oct 20, 2024, 3:34 PM

#

hmm, the way you are doing it doesn't take into account the time it takes for each request to complete, nor the delay between one group of 5 requests and the next 5. If that delay is too short, it will likely hit a rate limit?

#

the two ways I've done async/parallel requests while trying to max out the limit, without the need for retrying and spamming, are the ones I've used here if you are curious: https://github.com/mistralai/cookbook/tree/main/mistral/data_generation

mossy pecan Oct 20, 2024, 3:34 PM

#

the tps are acting like concurrency ..

#

so .. here is the rub send 5 in

#

wait till its done

#

and send 5 more

solemn condor Oct 20, 2024, 3:34 PM

#

signal mantle the two ways ive done async/parallel requests while trying to max out without th...

i will study this

mossy pecan Oct 20, 2024, 3:34 PM

#

that is easy to async

#

without much fuss
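
The "send 5, wait till it's done, send 5 more" pattern can be sketched with `asyncio.gather` as follows. `run_in_batches` and `handler` are hypothetical names. The `min_batch_seconds` padding is an extra assumption on top of what was described: it keeps fast batches from starting less than a second apart.

```python
import asyncio


async def run_in_batches(items, handler, batch_size=5, min_batch_seconds=1.0):
    """Process items in groups of `batch_size`, waiting for each group to finish."""
    loop = asyncio.get_running_loop()
    results = []
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        began = loop.time()
        results.extend(await asyncio.gather(*(handler(item) for item in batch)))
        # If the batch finished quickly, pad it so batches start
        # at least min_batch_seconds apart (no padding after the last one)
        remaining = min_batch_seconds - (loop.time() - began)
        if remaining > 0 and start + batch_size < len(items):
            await asyncio.sleep(remaining)
    return results
```

The trade-off raised later in the thread still applies: every batch waits for its slowest request, so this stays under the limit rather than maxing it out.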

signal mantle Oct 20, 2024, 3:35 PM

#

solemn condor i will study this

https://github.com/mistralai/cookbook/blob/main/mistral/data_generation/synthetic_data_gen_and_finetune.ipynb

this one uses async magic

#

it uses the old client tho

#

but you can use it as inspiration

mossy pecan Oct 20, 2024, 3:35 PM

#

no client needed either way

#

that can be just a regular http post

#

for all its worth

#

i personally do that with retry policies when i send 10k + requests onto the api

solemn condor Oct 20, 2024, 3:36 PM

#

@mossy pecan I think your proposed solution would not max out my rate limit, as I would be waiting for the last of my batch of 5 to finish before sending another 5. This would guarantee I max it out for the first second after sending my batch, but not for the second following it, right?

mossy pecan Oct 20, 2024, 3:37 PM

#

you dont max it out anyway . as you will hit walls

#

as the dequeuing is pretty much waiting till it starts to count towards the limiter

signal mantle Oct 20, 2024, 3:37 PM

#

signal mantle https://github.com/mistralai/cookbook/blob/main/mistral/data_generation/syntheti...

btw this approach basically creates, let's say, 20 workers that each send requests from a batch

#

but there are so many ways to do this like dragon for example

mossy pecan Oct 20, 2024, 3:37 PM

#

signal mantle btw this approach basically makes lets say 20 workers that will send requests ea...

mind you we have different limits

signal mantle Oct 20, 2024, 3:38 PM

#

mossy pecan mind you we have different limits

most of my cookbooks i design them with the basic tier 1

#

20 concurrent workers work for that cookbook cause one request takes sometimes 5 seconds to complete

#

so even with 5rps it works
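
A rough sketch of why this works, assuming each worker sends one request at a time: the steady request rate is about workers divided by seconds per request, so 20 workers at 5 seconds each is roughly 4 rps. The same 20 workers with 1-second replies would be roughly 20 rps. `steady_state_rps` and `run_worker_pool` are illustrative names and not taken from the cookbook.

```python
import asyncio


def steady_state_rps(workers, seconds_per_request):
    """Approximate request rate of a pool where each worker sends one request at a time."""
    return workers / seconds_per_request


async def run_worker_pool(items, handler, workers=20):
    """Drain `items` with a fixed number of concurrent workers."""
    queue = asyncio.Queue()
    for index, item in enumerate(items):
        queue.put_nowait((index, item))
    results = [None] * len(items)

    async def worker():
        while True:
            try:
                index, item = queue.get_nowait()
            except asyncio.QueueEmpty:
                return  # nothing left to do
            results[index] = await handler(item)

    await asyncio.gather(*(worker() for _ in range(workers)))
    return results


print(steady_state_rps(20, 5.0))  # 4.0, under a 5 rps limit
print(steady_state_rps(20, 1.0))  # 20.0, well over it
```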

solemn condor Oct 20, 2024, 3:41 PM

#

im reading through your cookbook @signal mantle , thank you for this! But i notice that it uses Semaphore to limit it right?

async with semaphore:
    gen_dialog = ""
    while not self._validate_generated(gen_dialog):
        if len(dialogs) == 0:
            return []

        dialog = dialogs.pop()
        gen_dialog = await self._async_generate(description, dialog)

    pbar.update(1)
    return gen_dialog

#

where self.cli is the AsyncMistral client (which is now* built into the normal Mistral client)

signal mantle Oct 20, 2024, 3:43 PM

#

yes

#

its not perfect but works fairly well

solemn condor Oct 20, 2024, 3:44 PM

#

well in my code, i too use the Semaphore, but it still runs into the limit

signal mantle Oct 20, 2024, 3:45 PM

#

but yours uses semaphore with a different approach no? and doesnt take into account how long a request lasts

#

my guess is that your use case is really fast right?

solemn condor Oct 20, 2024, 3:45 PM

#

yeah, there is a good chance i get a reply within 1 second back, so if i put the semaphore at 5, i might still run into the requests per second limit

#

this is my rate_limit function:

import functools

def rate_limit(rate_limiter, semaphore):
    """Decorator to apply rate limiting using both AsyncLimiter and Semaphore."""
    def decorator(func):
        @functools.wraps(func)  # Preserve the original function metadata
        async def wrapper(*args, **kwargs):
            # Acquire the semaphore to limit concurrent access
            async with semaphore:
                # Use the rate limiter to control the request rate
                async with rate_limiter:
                    return await func(*args, **kwargs)
        return wrapper
    return decorator

signal mantle Oct 20, 2024, 3:46 PM

#

then maybe the 2 rate limit is actually accurate, cause its a 2 concurrent limit and not a 2 requests per second limit no?

solemn condor Oct 20, 2024, 3:47 PM

#

and my rate_limiter and semaphore:

# Define the Rate Limiter
rate_limiter = AsyncLimiter(max_rate=5, time_period=1)
semaphore = asyncio.Semaphore(5)

they are defined globally, so every request uses the same rate limiter and semaphore

signal mantle Oct 20, 2024, 3:47 PM

#

technically if the api is very very fast you wouldn't even need async, and you might even have to add a time.sleep

solemn condor Oct 20, 2024, 3:47 PM

#

I mean, Mistral might've implemented it differently than this, but I assume I wouldn't be the first to figure that out and I've been searching haha

signal mantle Oct 20, 2024, 3:48 PM

#

but what happens after the end of a batch and the start of a new batch, doesn't it have a chance of going too fast there?

solemn condor Oct 20, 2024, 3:48 PM

#

signal mantle technically if the api is very very fast you wouldnt even need async, and you co...

yeah that might be a solution, but im building a FastAPI app and part of my pipeline should be run in parallel, so async is the logical way to implement this

solemn condor Oct 20, 2024, 3:49 PM

#

signal mantle technically if the api is very very fast you wouldnt even need async, and you co...

Hmm, i could enforce a minimum request time of 1 second to ensure my limiters work out, i would have to compute the entire runtime of the reply_text and reply_model functions in my llm.py file and subtract that from 1 sec and then wait for it.
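
One way to sketch the "minimum request time" idea is a decorator that measures how long the wrapped coroutine took and sleeps for the remainder. `min_duration` is a hypothetical helper, not something from llm.py. It is meant to be applied inside the semaphore, so that each of the 5 slots starts at most one request per second.

```python
import asyncio
import functools


def min_duration(seconds=1.0):
    """Decorator: make every call of the wrapped coroutine take at least `seconds`."""
    def decorator(func):
        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            loop = asyncio.get_running_loop()
            started = loop.time()
            try:
                return await func(*args, **kwargs)
            finally:
                # Sleep off whatever is left of the minimum, even if the call failed
                remaining = seconds - (loop.time() - started)
                if remaining > 0:
                    await asyncio.sleep(remaining)
        return wrapper
    return decorator
```

This only bounds when requests leave the client. If the server counts requests when they are dequeued, as suggested earlier in the thread, some may still land inside the same one-second window.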

signal mantle Oct 20, 2024, 3:50 PM

#

if you take a look at the timestamps of your requests, what does it look like? is it sending requests in bursts? does it reach the rate limit in the middle of a batch? end of a batch?
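
A quick way to answer the burst question from logs is to take the send timestamps (in seconds) and find the busiest one-second span. The helper below is a sketch with a made-up name, and the example timestamps are hypothetical rather than taken from the attached logs.

```python
def max_requests_in_window(timestamps, window=1.0):
    """Return the largest number of timestamps that fall within any span shorter than `window`."""
    times = sorted(timestamps)
    best = 0
    left = 0
    for right, t in enumerate(times):
        # Advance the left edge until it is less than `window` seconds before t
        while t - times[left] >= window:
            left += 1
        best = max(best, right - left + 1)
    return best


# Evenly spaced at 5 rps: never more than 5 within a second
print(max_requests_in_window([0.0, 0.2, 0.4, 0.6, 0.8, 1.0]))  # 5
# Two bursts of 5 that are 0.2 s apart: 10 within a second
print(max_requests_in_window([0.9] * 5 + [1.1] * 5))  # 10
```

Any result above 5 means the client itself sent more than 5 requests inside some one-second span, regardless of the average rate.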

signal mantle Oct 20, 2024, 3:50 PM

#

solemn condor Hmm, i could enforce a minimum request time of 1 second to ensure my limiters wo...

yeah, one batch of 5 shouldnt last more than 1 second for sure

solemn condor Oct 20, 2024, 3:50 PM

#

my batch is 100 tasks (for testing), and it's kind of random. Sometimes 30 of them complete with no issue, then there are on-and-off failures, and then it succeeds again for many requests in a row.

signal mantle Oct 20, 2024, 3:51 PM

#

otherwise will reach the limit

solemn condor Oct 20, 2024, 3:51 PM

#

signal mantle yeah, one batch of 5 shouldnt last more than 1 second for sure

you mean less right?

signal mantle Oct 20, 2024, 3:51 PM

#

oh yes my bad!

solemn condor Oct 20, 2024, 3:51 PM

#

❤️

#

i think that might be it, i was working under the impression that this:

from aiolimiter import AsyncLimiter

should take care of that for me, and it definitely has an impact. If I set it to a very low value, I get a very, very small chance of rate limit errors.

#

from their readthedocs:

API

class aiolimiter.AsyncLimiter(max_rate, time_period=60)

A leaky bucket rate limiter.

This is an asynchronous context manager; when used with async with, entering the context acquires capacity:

limiter = AsyncLimiter(10)
for foo in bar:
    async with limiter:
        # process foo elements at 10 items per minute

Parameters:

        max_rate (float) – Allow up to max_rate / time_period acquisitions before blocking.

        time_period (float) – duration, in seconds, of the time period in which to limit the rate. Note that up to max_rate acquisitions are allowed within this time period in a burst.

async acquire(amount=1)

    Acquire capacity in the limiter.

    If the limit has been reached, blocks until enough capacity has been freed before returning.

    Parameters:

        amount (float) – How much capacity you need to be available.
    Exception:

        Raises ValueError if amount is greater than max_rate.
    Return type:

        None

has_capacity(amount=1)

    Check if there is enough capacity remaining in the limiter

    Parameters:

        amount (float) – How much capacity you need to be available.
    Return type:

        bool

max_rate: float

    The configured max_rate value for this limiter.

time_period: float

    The configured time_period value for this limiter.
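
The note above that up to max_rate acquisitions are allowed in a burst suggests one possible explanation: with max_rate=5 and time_period=1, five requests can leave at once, and a new burst can follow before a full second has passed on the server's side. A stdlib-only sketch of the alternative, releasing one request every 0.2 seconds, is shown below. `EvenPacer` is a made-up class and not part of aiolimiter. Going by the quoted docs, `AsyncLimiter(max_rate=1, time_period=0.2)` should give a similar effect, since the burst size is then 1, but that is an assumption worth verifying.

```python
import asyncio


class EvenPacer:
    """Releases callers at least `interval` seconds apart, so bursts cannot form."""

    def __init__(self, interval=0.2):
        self.interval = interval
        self._next_slot = float("-inf")  # no slot reserved yet

    async def wait(self):
        loop = asyncio.get_running_loop()
        # No await between reading and updating _next_slot,
        # so each reservation is atomic within the event loop
        now = loop.time()
        slot = max(now, self._next_slot)
        self._next_slot = slot + self.interval
        delay = slot - now
        if delay > 0:
            await asyncio.sleep(delay)


async def demo():
    pacer = EvenPacer(interval=0.2)

    async def send(i):
        await pacer.wait()
        return i  # a real request would be sent here

    return await asyncio.gather(*(send(i) for i in range(5)))
```

Calling `asyncio.run(demo())` takes about 0.8 seconds, because the five calls are released 0.2 seconds apart instead of all at once.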

#

i wrote a little demo of my issue 🙂

📎 rate_limit.py

#

so out of 100 requests, 10 caused a rate-limiting error

solemn condor Oct 20, 2024, 4:18 PM

#

alright, I set the minimum runtime of a single request to 1 second. But it appears that there are still (albeit limited) occurrences of my request being rate-limited.

I am now going to repeat this without semaphore and with the aiolimiter to see if it solves the problem.

📎 rate_limit2.py

#

using aiolimiter and a minimum request duration of 1 second it still yields a few errors in my 300 request attempt.

#

now trying both semaphore and aiolimiter

#

yup, still running into the rate limit occasionally

#

very frustrating problem this

#

if i limit the request to minimum of 2 sec runtime, surely it should now be gone

#I'm having trouble with the rate limits
