#is it ok to do workflow + workpool?
Hi @tranquil dragon. I'm kapa.ai, your friendly support bot. I will try to answer your question by searching through the knowledge base. :wave:
Yes, it is absolutely okay to use Workflow and Workpool together in Convex. In fact, the Workflow component is built on top of the Workpool component to provide durable execution of long-running functions with retries, delays, and parallelism management. This means that when you use the Workflow component, you are already leveraging the capabilities of Workpool under the hood.
You can also configure the parallelism and retry behavior of workflows by passing workpool options to the WorkflowManager. This allows you to fine-tune how many steps run in parallel and how retries are handled, just as you would with Workpool directly:
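import { WorkflowManager } from "@convex-dev/workflow";
import { components } from "./_generated/api";

const workflow = new WorkflowManager(components.workflow, {
  workpoolOptions: {
    // Upper bound on how many steps run at the same time.
    maxParallelism: 10,
    // Retry failed actions unless a step opts out.
    retryActionsByDefault: true,
    defaultRetryBehavior: {
      maxAttempts: 5,
      initialBackoffMs: 1000,
      base: 2,
    },
  },
});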
Sources: Specifying step parallelism, Retrying pesky LLMs
So using Workflow and Workpool together is supported: every workflow already runs its steps through a Workpool, and the workpoolOptions above are how you tune that behavior.
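To make the retry settings above more concrete, here is a minimal sketch of the nominal wait before each retry. It assumes the delay before retry k is initialBackoffMs * base^(k - 1) and ignores any jitter the library may add; `nominalRetryDelays` and `RetryBehavior` are hypothetical names for illustration, not part of any Convex package.

```typescript
// Hypothetical helper for illustration only; not a Convex API.
interface RetryBehavior {
  maxAttempts: number; // total attempts, including the first one
  initialBackoffMs: number; // nominal delay before the first retry
  base: number; // multiplier applied for each further retry
}

// Assumption: delay before retry k is initialBackoffMs * base^(k - 1), with no jitter.
function nominalRetryDelays(b: RetryBehavior): number[] {
  const delays: number[] = [];
  // maxAttempts includes the initial attempt, so there are maxAttempts - 1 retries.
  for (let retry = 1; retry < b.maxAttempts; retry++) {
    delays.push(b.initialBackoffMs * Math.pow(b.base, retry - 1));
  }
  return delays;
}

const delays = nominalRetryDelays({
  maxAttempts: 5,
  initialBackoffMs: 1000,
  base: 2,
});
console.log(delays); // [1000, 2000, 4000, 8000]
```

Under those assumptions, a step that keeps failing would be given up on after roughly 15 seconds of cumulative backoff, so these three values are worth picking with the flakiness of the downstream service in mind.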
@umbral knot
await [WORKPOOL_NAME].[ENQUEUE_METHOD](
  ctx,
  internal.points.[PROCESS_FUNCTION_NAME],
  [WINNING_PLAYERS_ARRAY].map(([PLAYER_ITEM]) => ({
    [ID_FIELD]: [ID_VALUE],
    [PLAYER_ID_FIELD]: [PLAYER_ITEM].[PLAYER_ID_PROP]
  }))
)
is it ok to do .collect() inside points.[PROCESS_FUNCTION_NAME], or do we still need to keep pagination?
Let me check the knowledge base for you again. One moment! :books:
It is generally not recommended to use .collect() inside your points.[PROCESS_FUNCTION_NAME] if there is a chance that the number of documents being queried is large (e.g., 1000+ documents). Using .collect() will load all matching documents into memory, which can lead to performance issues, increased bandwidth usage, and even query failures if the result set is too large. Convex specifically recommends using pagination (e.g., .paginate()) or limiting the number of results (e.g., .take(n)) for large tables or unbounded queries.
From the best practices:
"All results returned from .collect count towards database bandwidth (even ones filtered out by .filter). It also means that if any document in the result changes, the query will re-run or the mutation will hit a conflict.
If there's a chance the number of results is large (say 1000+ documents), you should use an index to filter the results further before calling .collect, or find some other way to avoid loading all the documents such as using pagination, denormalizing data, or changing the product feature."
Source: Only use .collect with a small number of results
For batch processing inside a workpool or workflow, you should process documents in pages using .paginate() and keep track of your progress with cursors. This approach is scalable and avoids the pitfalls of loading too much data at once. See also: Splitting up the work with pagination and limits.
Summary:
- Use .collect() only if you are certain the result set is small.
- For large or unbounded queries, use .paginate() or .take(n) inside your processing function, even when using Workpool.
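The page-by-page approach described above could be sketched roughly as follows. To stay runnable without a Convex deployment, the sketch uses a hypothetical in-memory `paginate` stand-in rather than `ctx.db.query(...).paginate(...)`; only the result shape ({ page, isDone, continueCursor }) and the options shape ({ numItems, cursor }) are meant to mirror Convex's pagination. `processBatch` and the page size of 100 are likewise assumptions for illustration.

```typescript
// Result and options shapes modeled on Convex pagination.
interface PageResult<T> {
  page: T[];
  isDone: boolean;
  continueCursor: string;
}

interface PaginationOpts {
  numItems: number;
  cursor: string | null;
}

// Hypothetical in-memory stand-in for a paginated database query.
// The cursor here is just a stringified offset; real cursors are opaque.
function paginate<T>(rows: T[], opts: PaginationOpts): PageResult<T> {
  const start = opts.cursor === null ? 0 : Number(opts.cursor);
  const end = Math.min(start + opts.numItems, rows.length);
  return {
    page: rows.slice(start, end),
    isDone: end >= rows.length,
    continueCursor: String(end),
  };
}

// Handles one page and returns the cursor for the next batch,
// or null once every row has been processed.
function processBatch(
  rows: number[],
  cursor: string | null,
  seen: number[],
): string | null {
  const { page, isDone, continueCursor } = paginate(rows, {
    numItems: 100,
    cursor,
  });
  for (const row of page) {
    seen.push(row); // placeholder for the real per-document work
  }
  return isDone ? null : continueCursor;
}

// Drive the batches until the cursor runs out.
const rows = Array.from({ length: 250 }, (_, i) => i);
const seen: number[] = [];
let cursor: string | null = null;
let batches = 0;
do {
  cursor = processBatch(rows, cursor, seen);
  batches++;
} while (cursor !== null);
console.log(batches, seen.length); // 3 250
```

In an actual Convex function the loop would likely not live in one call: each batch would be its own mutation or workpool job that passes the returned cursor along to the next one, so no single function has to read the whole table.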