Hi folks, it's my first time using Dagger, so maybe this is an obvious question 🙂 I am trying to build a workflow that involves LLM calls, specifically to Claude 3.5.
I am using the LLM to go over a fairly large codebase, which causes my workflow to hit the API's tokens-per-minute rate limit.
Is there any way that I can configure Dagger to slow down the LLM requests to avoid hitting those rate limits? Thanks!
#How to avoid API rate limits when writing LLM workflows
I opened this issue for configurable retries: https://github.com/dagger/dagger/issues/9970
I wonder if it should include backoff logic too.
Thank you! I upvoted the issue.
Do you think there is a workaround or something I can do until this is implemented?
Not one that I've tried. Maybe a proxy like LiteLLM (https://docs.litellm.ai/docs/) can handle it, but I don't know for sure.
@neat turret did you have some ideas for retry logic already?
I had one idea, but it didn't work out and I haven't gotten back to it. I wanted to inject a custom client into the LLM that detected errors and retried, but not all providers support that. Thinking about it again, we can probably just wrap the entire loop in retry logic, as long as we can detect retryable errors. That might need some collaboration between the generic outer loop and the per-provider code; for example, SendQuery could annotate the error based on provider-specific error checking.