#Workflow retry from step

hazy ivy (BOT)

Hi @fleet sierra. I'm kapa.ai, your friendly support bot. I will try to answer your question by searching through the knowledge base. :wave:

I understand your concern about retrying workflows efficiently, especially when dealing with expensive operations like LLM calls.

The Workflow component in Convex is designed to handle exactly this scenario. When a workflow fails after reaching the maximum retry attempts, you have a few options:

Resuming from failure point

Currently, there isn't a built-in way to manually resume a workflow from the exact point of failure after it has reached its maximum retry attempts and entered a "failed" state. The Workflow component is designed to either complete successfully or fail after exhausting retries.

Recommended approaches

1. Use step-level retry configuration

You can configure different retry behaviors for different steps in your workflow:

import { WorkflowManager } from "@convex-dev/workflow";
import { v } from "convex/values";
// Adjust this path to wherever your generated API lives.
import { components, internal } from "./_generated/api";

const workflow = new WorkflowManager(components.workflow);

export const myWorkflow = workflow.define({
  // userId is passed in as an argument so the steps below can reference it.
  args: { prompt: v.string(), userId: v.string() },
  handler: async (step, { prompt, userId }) => {
    // Create thread - minimal retries needed
    const { threadId } = await step.runMutation(internal.example.createThread, {
      userId, title: "Support Request",
    });

    // LLM call - more retries for API stability
    const suggestion = await step.runAction(internal.example.getSupport, {
      threadId, userId, prompt,
    }, {
      retry: { maxAttempts: 5, initialBackoffMs: 1000, base: 2 }
    });

    // Flaky third step - maximum retries
    const polished = await step.runAction(internal.example.flakyAction, {
      suggestion
    }, {
      retry: { maxAttempts: 10, initialBackoffMs: 250, base: 2 }
    });
  },
});

This approach lets you fine-tune retry behavior based on the nature of each step (source: Workflow Component).
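
To make the three retry parameters concrete, here is a minimal sketch of how settings like these could translate into wait times. It assumes the delay before each retry is initialBackoffMs * base^(attempt - 1) and that maxAttempts counts the first try. The names RetryBehavior, nominalBackoffMs, jitteredBackoffMs and backoffSchedule are illustrative and are not part of the Workflow component, whose exact formula and jitter strategy may differ.

```typescript
// Illustrative shape mirroring the retry options used above (assumption).
interface RetryBehavior {
  maxAttempts: number;
  initialBackoffMs: number;
  base: number;
}

// Nominal delay before the retry that follows failed attempt `attempt` (1-based).
function nominalBackoffMs(retry: RetryBehavior, attempt: number): number {
  return retry.initialBackoffMs * Math.pow(retry.base, attempt - 1);
}

// "Full jitter" variant: pick a delay uniformly in [0, nominal delay).
// The random source is injectable so the function can be tested.
function jitteredBackoffMs(
  retry: RetryBehavior,
  attempt: number,
  random: () => number = Math.random,
): number {
  return Math.floor(random() * nominalBackoffMs(retry, attempt));
}

// Nominal waits between attempts: maxAttempts tries means maxAttempts - 1 waits.
function backoffSchedule(retry: RetryBehavior): number[] {
  const delays: number[] = [];
  for (let attempt = 1; attempt < retry.maxAttempts; attempt++) {
    delays.push(nominalBackoffMs(retry, attempt));
  }
  return delays;
}
```

Under these assumptions, the LLM step's settings (maxAttempts: 5, initialBackoffMs: 1000, base: 2) give nominal waits of 1000, 2000, 4000 and 8000 ms, so a step that keeps failing is abandoned after roughly 15 seconds of backoff plus execution time.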

2. Store intermediate results in the database

For workflows with expensive operations, storing intermediate results in the database is a good practice:

  1. After each significant step, store the results in the database
  2. At the beginning of each step, check if you already have the results
  3. If a workflow fails completely, you can start a new workflow that checks for and uses these stored results

This approach gives you more control over resuming from specific points.
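
The three steps above can be sketched as follows. This is a minimal illustration, not Convex code: an in-memory Map stands in for a database table, and runWithCheckpoint, runPipeline, the job ID and the step names are all hypothetical. In a real app the lookup and the write would go through your own queries and mutations.

```typescript
// Hypothetical stand-in for a database table keyed by job ID and step name.
type CheckpointStore = Map<string, unknown>;

// Run `compute` only if no result is stored yet for this job and step.
async function runWithCheckpoint<T>(
  store: CheckpointStore,
  jobId: string,
  stepName: string,
  compute: () => Promise<T>,
): Promise<T> {
  const key = `${jobId}:${stepName}`;
  if (store.has(key)) {
    // A previous run already finished this step: reuse its result.
    return store.get(key) as T;
  }
  const result = await compute();
  // Persist the result before moving on to the next step.
  store.set(key, result);
  return result;
}

// Example pipeline: an expensive first step followed by a flaky second step.
async function runPipeline(
  store: CheckpointStore,
  jobId: string,
  expensiveStep: () => Promise<string>,
  flakyStep: (input: string) => Promise<string>,
): Promise<string> {
  const suggestion = await runWithCheckpoint(store, jobId, "suggestion", expensiveStep);
  return runWithCheckpoint(store, jobId, "polished", () => flakyStep(suggestion));
}
```

If the flaky step throws on the first run, calling runPipeline again with the same store and job ID skips the expensive step and retries only the flaky one, which is the behavior you want when the first step is an LLM call.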

3. Consider using the Workpool component

For more complex retry scenarios, the Workpool component offers additional control:

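import { Workpool } from "@convex-dev/workpool";
// Adjust this path to wherever your generated API lives.
import { components, internal } from "./_generated/api";

const pool = new Workpool(components.emailWorkpool, {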
  retryActionsByDefault: true,
  defaultRetryBehavior: { maxAttempts: 3, initialBackoffMs: 1000, base: 2 },
});

// Then in your mutation:
await pool.enqueueAction(ctx, internal.myAction, args, {
  onComplete: internal.handleCompletion,
  context: { stepData: intermediateResults },
  retry: { maxAttempts: 10, initialBackoffMs: 250, base: 2 },
});

The Workpool component provides completion handling that can help you manage the flow between steps (source: Workpool Component).

Best practices

  1. Make your actions idempotent so they can be safely retried
  2. Use exponential backoff with jitter for retries (built into the components)
  3. Store intermediate results for expensive operations
  4. Consider breaking very complex workflows into smaller, more manageable workflows
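
As an illustration of the first practice, here is a minimal sketch of an idempotent side effect that uses an idempotency key. The IdempotentSender class and its key format are hypothetical, and the in-memory Map stands in for a database table. The idea is that a retried call with the same key returns the original receipt instead of performing the side effect again.

```typescript
// Hypothetical sender that records which idempotency keys it has already handled.
class IdempotentSender {
  private receipts = new Map<string, string>();
  public deliveries = 0;

  // Perform the side effect at most once per key and return its receipt.
  send(key: string, body: string): string {
    const existing = this.receipts.get(key);
    if (existing !== undefined) {
      // This is a retry of work that already succeeded: do nothing new.
      return existing;
    }
    // Stand-in for the real side effect (email, payment, LLM call, ...).
    this.deliveries++;
    const receipt = `receipt-${this.deliveries}:${body.length}`;
    this.receipts.set(key, receipt);
    return receipt;
  }
}
```

A natural key is something stable across retries, such as a workflow ID combined with a step name, so that every retry of the same step maps to the same record.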

The durable workflow pattern in Convex is designed to handle these scenarios, but manual checkpointing is still valuable for very expensive operations (source: Stack article on durable workflows).