How to avoid API rate limits when writing LLM workflows | Dagger

warped spear Mar 27, 2025, 6:11 PM

Hi folks, this is my first time using Dagger, so maybe this is an obvious question 🙂 I am trying to build a workflow that involves LLM calls, specifically to Claude 3.5.
I am using the LLM to go over a pretty large codebase, which causes my workflow to hit the API's tokens-per-minute limit.
Is there any way to configure Dagger to slow down the LLM requests so I don't hit those rate limits? Thanks!

manic whale Mar 27, 2025, 6:50 PM

I opened this issue for configurable retries: https://github.com/dagger/dagger/issues/9970
I wonder if that should include backoff logic too

GitHub

✨ LLM: configurable retries · Issue #9970 · dagger/dagger

What are you trying to do? Some LLM providers tend to throw HTTP 429/502/503 when they're overloaded. We should have a configurable retry when one of these statuses is received Why is this impo...
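
For illustration, the retry-with-backoff idea discussed here could look roughly like the sketch below. It is not Dagger's implementation and does not use any Dagger API. The names `RETRYABLE_STATUSES`, `backoff_delay`, and `should_retry`, and all default values, are assumptions made up for this example. Only the status codes 429/502/503 come from the issue text.

```python
import random

# Statuses the issue names as signs of an overloaded provider.
RETRYABLE_STATUSES = {429, 502, 503}


def backoff_delay(attempt, base=1.0, cap=60.0, rng=random.random):
    """Exponential backoff with full jitter.

    Returns a delay in seconds drawn from [0, min(cap, base * 2**attempt)).
    `attempt` is zero-based; `base`, `cap`, and `rng` are illustrative knobs.
    """
    return rng() * min(cap, base * (2 ** attempt))


def should_retry(status, attempt, max_attempts=5):
    """Retry only on retryable statuses, and only while attempts remain."""
    return status in RETRYABLE_STATUSES and attempt + 1 < max_attempts
```

The jitter spreads retries out so that parallel requests hitting the same limit do not all retry at the same moment.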

warped spear Mar 27, 2025, 7:04 PM

Thank you! I upvoted the issue.
Do you think there is a workaround or something I can do until this is implemented?

manic whale Mar 27, 2025, 7:10 PM

Not that I've tried. Maybe a proxy like litellm (https://docs.litellm.ai/docs/) can handle it, but I don't know for sure.

@neat turret did you have some ideas for retry logic already?

neat turret Mar 27, 2025, 7:25 PM

manic whale: @neat turret did you have some ideas for retry logic already?

I had one idea, but it failed and I haven't gotten back to it. I wanted to inject a custom client into the LLM that detected errors and retried, but not all providers support that. Now that I think about it again, we can probably just wrap the entire loop in retry logic, as long as we can detect retryable errors. It might need some kind of collaboration between the generic outer loop and the per-provider code: for example, inside SendQuery, the provider code could annotate the error based on provider-specific error checking.
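
A minimal sketch of that split, written in Python for brevity rather than as Dagger's actual code: per-provider code annotates errors as retryable, and a generic outer loop decides whether to retry. `ProviderError`, `send_query`, and `run_with_retries` are hypothetical names, and the `status_code` attribute on the underlying exception is an assumption about what a provider client might expose.

```python
import time


class ProviderError(Exception):
    """Error annotated by provider-specific code with a retryable flag."""

    def __init__(self, message, retryable=False):
        super().__init__(message)
        self.retryable = retryable


def send_query(provider_call):
    """Hypothetical stand-in for SendQuery: call the provider, annotate failures."""
    try:
        return provider_call()
    except ProviderError:
        raise  # already annotated
    except Exception as exc:
        # Assumption: the provider client attaches an HTTP status to its errors.
        status = getattr(exc, "status_code", None)
        raise ProviderError(str(exc), retryable=status in {429, 502, 503}) from exc


def run_with_retries(step, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Generic outer loop: retry `step` only on errors marked retryable."""
    for attempt in range(max_attempts):
        try:
            return step()
        except ProviderError as err:
            is_last_attempt = attempt == max_attempts - 1
            if not err.retryable or is_last_attempt:
                raise
            # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
            sleep(base_delay * 2 ** attempt)
```

The outer loop knows nothing about individual providers. It only reads the `retryable` flag, so each provider can apply its own rules for what counts as a transient failure.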
