#Workflow retry from step
I understand your concern about retrying workflows efficiently, especially when dealing with expensive operations like LLM calls.
The Workflow component in Convex is designed to handle exactly this scenario. When a workflow fails after reaching the maximum retry attempts, you have a few options:
Resuming from failure point
Currently, there isn't a built-in way to manually resume a workflow from the exact point of failure after it has reached its maximum retry attempts and entered a "failed" state. The Workflow component is designed to either complete successfully or fail after exhausting retries.
Recommended approaches
1. Use step-level retry configuration
You can configure different retry behaviors for different steps in your workflow:
import { WorkflowManager } from "@convex-dev/workflow";
import { v } from "convex/values";
import { components, internal } from "./_generated/api";

const workflow = new WorkflowManager(components.workflow);

export const myWorkflow = workflow.define({
  // userId is passed in as an argument so the steps below can reference it
  args: { prompt: v.string(), userId: v.string() },
  handler: async (step, { prompt, userId }) => {
    // Create thread - minimal retries needed
    const { threadId } = await step.runMutation(internal.example.createThread, {
      userId, title: "Support Request",
    });
    // LLM call - more retries for API stability
    const suggestion = await step.runAction(internal.example.getSupport, {
      threadId, userId, prompt,
    }, {
      retry: { maxAttempts: 5, initialBackoffMs: 1000, base: 2 },
    });
    // Flaky third step - maximum retries
    const polished = await step.runAction(internal.example.flakyAction, {
      suggestion,
    }, {
      retry: { maxAttempts: 10, initialBackoffMs: 250, base: 2 },
    });
    return polished;
  },
});
This approach lets you fine-tune retry behavior based on the nature of each step (source: Workflow Component docs).
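To get a feel for what these `retry` settings mean in wall-clock time, here is a minimal sketch that lists the nominal wait before each retry. It assumes the delay before retry n is `initialBackoffMs * base^(n - 1)` and that `maxAttempts` counts the first try. Jitter is ignored, and `nominalBackoffScheduleMs` is a hypothetical helper, not part of the Workflow API.

```typescript
// Hypothetical helper (not a Convex API): nominal delays between attempts,
// assuming delay before retry n = initialBackoffMs * base^(n - 1), no jitter.
interface RetryBehavior {
  maxAttempts: number;
  initialBackoffMs: number;
  base: number;
}

function nominalBackoffScheduleMs(retry: RetryBehavior): number[] {
  const delays: number[] = [];
  // Assumption: maxAttempts includes the first try, so there are
  // maxAttempts - 1 retries, each preceded by one delay.
  for (let n = 1; n < retry.maxAttempts; n++) {
    delays.push(retry.initialBackoffMs * Math.pow(retry.base, n - 1));
  }
  return delays;
}

// The LLM step from the example above
const llmDelays = nominalBackoffScheduleMs({
  maxAttempts: 5,
  initialBackoffMs: 1000,
  base: 2,
});
const llmTotalMs = llmDelays.reduce((sum, d) => sum + d, 0);
// delays: 1000, 2000, 4000, 8000 ms; total: 15000 ms
console.log(llmDelays, llmTotalMs);
```

Under the same assumptions, the `maxAttempts: 10, initialBackoffMs: 250` setting adds up to roughly 128 seconds of waiting before the step gives up, which is worth keeping in mind when choosing values for user-facing flows.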
2. Store intermediate results in the database
For workflows with expensive operations, storing intermediate results in the database is a good practice:
- After each significant step, store the results in the database
- At the beginning of each step, check if you already have the results
- If a workflow fails completely, you can start a new workflow that checks for and uses these stored results
This approach gives you more control over resuming from specific points.
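The check-then-store pattern above can be sketched as follows. This is an illustration under stated assumptions, not the component's API: the `Map` stands in for a database table, and `runWithCheckpoint`, `expensiveLlmCall`, and `runPipeline` are hypothetical names. In a Convex app the lookup and the write would be a query and a mutation against your own table.

```typescript
// In-memory stand-in for a database table keyed by job and step name.
type CheckpointStore = Map<string, unknown>;

// Returns the stored result for this step if one exists; otherwise runs
// the step, stores its result, and returns it.
async function runWithCheckpoint<T>(
  store: CheckpointStore,
  jobId: string,
  stepName: string,
  compute: () => Promise<T>,
): Promise<T> {
  const key = `${jobId}:${stepName}`;
  if (store.has(key)) {
    return store.get(key) as T;
  }
  const result = await compute();
  store.set(key, result);
  return result;
}

// Hypothetical expensive step; the counter only exists to show reuse.
let llmCalls = 0;
async function expensiveLlmCall(prompt: string): Promise<string> {
  llmCalls++;
  return `suggestion for: ${prompt}`;
}

// Two-step pipeline: if the second step throws, the first step's result
// is already stored, so a fresh run with the same jobId skips the LLM call.
async function runPipeline(
  store: CheckpointStore,
  jobId: string,
  prompt: string,
  failAtPolish: boolean,
): Promise<string> {
  const suggestion = await runWithCheckpoint<string>(store, jobId, "getSupport", () =>
    expensiveLlmCall(prompt),
  );
  return runWithCheckpoint<string>(store, jobId, "polish", async () => {
    if (failAtPolish) throw new Error("flaky step failed");
    return suggestion.toUpperCase();
  });
}
```

Calling `runPipeline` once with `failAtPolish` set to `true` and then again with `false`, using the same store and `jobId`, runs `expensiveLlmCall` only once. The key design choice is a stable job identifier that survives across workflow runs, so the replacement workflow can find the earlier results.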
3. Consider using the Workpool component
For more complex retry scenarios, the Workpool component offers additional control:
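import { Workpool } from "@convex-dev/workpool";
import { components, internal } from "./_generated/api";

const pool = new Workpool(components.emailWorkpool, {
  retryActionsByDefault: true,
  defaultRetryBehavior: { maxAttempts: 3, initialBackoffMs: 1000, base: 2 },
});

// Then in your mutation (args and intermediateResults are placeholders for your own data):
await pool.enqueueAction(ctx, internal.myAction, args, {
  onComplete: internal.handleCompletion,
  // Passed through to the onComplete handler
  context: { stepData: intermediateResults },
  // Overrides defaultRetryBehavior for this call
  retry: { maxAttempts: 10, initialBackoffMs: 250, base: 2 },
});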
The Workpool component provides completion handling that can help you manage the flow between steps (source: Workpool Component docs).
Best practices
- Make your actions idempotent so they can be safely retried
- Use exponential backoff with jitter for retries (built into the components)
- Store intermediate results for expensive operations
- Consider breaking very complex workflows into smaller, more manageable workflows
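As an illustration of the first point, one common way to make a step safe to retry is to key its side effect on an idempotency key. The sketch below is an assumption-laden example rather than anything from the components: `ThreadState` and `appendMessageOnce` are hypothetical, and the `Set` stands in for a database lookup.

```typescript
// Hypothetical thread state; a real app would keep applied keys in the database.
interface ThreadState {
  messages: string[];
  appliedKeys: Set<string>;
}

// Appends a message at most once per idempotency key, so a retried
// attempt with the same key leaves the thread unchanged.
function appendMessageOnce(
  thread: ThreadState,
  idempotencyKey: string,
  text: string,
): boolean {
  if (thread.appliedKeys.has(idempotencyKey)) {
    return false; // an earlier attempt already applied this write
  }
  thread.messages.push(text);
  thread.appliedKeys.add(idempotencyKey);
  return true;
}

const thread: ThreadState = { messages: [], appliedKeys: new Set<string>() };
appendMessageOnce(thread, "job1:suggestion", "Try restarting the app."); // first attempt
appendMessageOnce(thread, "job1:suggestion", "Try restarting the app."); // retry, no-op
console.log(thread.messages.length); // 1
```

The key should identify the logical operation (for example, job plus step name), not the individual attempt, so every retry of the same step maps to the same key.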