#workflow hitting limit when looping thru paginated queries


zealous trail
#

I'm trying to fetch data from a huge table recursively within a workflow (see the snapshot), and I'm getting this error.

my error:

Workflow journal size limit exceeded (1066648 bytes > 1048576 bytes).
Consider breaking up the workflow into multiple runs, using smaller step arguments or return values, or using fewer steps.

Seems like every step's arguments and return values add up toward one total limit (which I understand is 1 MiB).

limitations from the doc:

Steps can only take in and return a total of 1 MiB of data within a single workflow execution. If you run into journal size limits, you can work around this by storing results within your worker functions and then passing IDs around within the workflow.

I need advice on how to build the workaround for this.
I don't understand this part:
"storing results within your worker functions and then passing IDs around"

please help!

distant thorn
#

You need to create an internal paginated query that calls itself recursively instead of running in a while loop

#

But overall you should describe what you want to achieve. It may be overkill here

zealous trail
distant thorn
#

you need to describe first what you want to achieve with your code

zealous trail
distant thorn
#

If you're using Convex only, you need to run a recursive mutation over a paginated query, like I described before. The recursive mutation starts with the first batch and writes the data into file storage as CSV. Then it initiates the mutation again with the second batch and adds that data to the CSV file from before, and so on until it's finished.
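A minimal sketch of the "one batch per run, then reschedule yourself" idea described above. Everything here is a hypothetical in-memory stand-in (`readBatch`, `exportBatch`, the `scheduled` queue, the `csvLines` array), not Convex API; in a real app the reschedule would go through whatever scheduler you use, and the calls would be async.

```typescript
type Batch = { rows: string[]; isDone: boolean; continueCursor: number };

// Hypothetical stand-ins: a source table, a growing CSV, and a job queue.
const sourceRows: string[] = Array.from({ length: 9 }, (_, i) => `row-${i}`);
const csvLines: string[] = [];
const scheduled: Array<() => void> = [];

// Read one batch starting at `cursor`.
function readBatch(cursor: number, size: number): Batch {
  const end = Math.min(cursor + size, sourceRows.length);
  return {
    rows: sourceRows.slice(cursor, end),
    isDone: end >= sourceRows.length,
    continueCursor: end,
  };
}

// Handle exactly ONE batch, then schedule the next run instead of looping.
function exportBatch(cursor: number, size: number): void {
  const batch = readBatch(cursor, size);
  csvLines.push(...batch.rows);
  if (!batch.isDone) {
    scheduled.push(() => exportBatch(batch.continueCursor, size));
  }
}

// Drive the stand-in scheduler until no jobs remain.
let runs = 0;
scheduled.push(() => exportBatch(0, 4));
while (scheduled.length > 0) {
  const job = scheduled.shift();
  if (job) {
    job();
    runs += 1;
  }
}
console.log(runs, csvLines.length);
```

Each run only ever holds one batch in memory, and the only thing handed to the next run is the cursor.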

#

Or you run the batches with a timeout delay, like I did here

zealous trail
#

`.collect()` won't return all the rows for me

#

The last time I ran a migration, the table was almost half a million rows

#

Paginating the query until the cursor is done seems to be the right way
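Roughly what "paginate until the cursor is done" looks like as a loop. The result shape (`page`, `isDone`, `continueCursor`) follows Convex's pagination result; `fakePaginatedQuery` and `forEachPage` are hypothetical in-memory stand-ins so the sketch runs on its own, and it is kept synchronous for that reason (real calls would be awaited).

```typescript
type PageResult<T> = { page: T[]; isDone: boolean; continueCursor: string };

// Hypothetical stand-in for a paginated query over an in-memory table.
function fakePaginatedQuery(
  table: string[],
  cursor: string | null,
  numItems: number,
): PageResult<string> {
  const start = cursor === null ? 0 : Number(cursor);
  const end = Math.min(start + numItems, table.length);
  return {
    page: table.slice(start, end),
    isDone: end >= table.length,
    continueCursor: String(end),
  };
}

// Walk the cursor until isDone, handing each page to `handle` one at a time.
// Returns how many pages were fetched.
function forEachPage(
  table: string[],
  numItems: number,
  handle: (rows: string[]) => void,
): number {
  let cursor: string | null = null;
  let pages = 0;
  while (true) {
    const result: PageResult<string> = fakePaginatedQuery(table, cursor, numItems);
    handle(result.page);
    pages += 1;
    if (result.isDone) return pages;
    cursor = result.continueCursor;
  }
}

const table: string[] = Array.from({ length: 10 }, (_, i) => `row-${i}`);
let seen = 0;
const pageCount = forEachPage(table, 4, (rows) => {
  seen += rows.length;
});
console.log(pageCount, seen);
```

The loop never needs the full table at once, which is the difference from a single `.collect()`.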

ripe wing
#

hey @zealous trail, sorry this is a bit confusing / not well documented.

my mental model for workflow stuff is that it's more about coordinating different steps than doing the data fetching itself.

so in this case, you want to load a bunch of data, write it out to a CSV, and then pass that CSV to an AI agent. here's how I'd break it up.

  1. paginated query for fetching data from the db (what you have with getCommentsForWeeklyInsights)
  2. an action for running the paginated query, collecting the results, and writing it out to file storage
  3. another action that takes in the storage ID and calls the LLM
  4. a workflow that coordinates calling (2), getting the storage ID as a response, and then passing it to (3). the benefit of the workflow is that it can execute for a very long period of time, retry (2) or (3) if they hit transient errors, etc.
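The four pieces above can be sketched like this. It is a plain TypeScript simulation, not Convex code: `commentsTable` and `fileStorage` are hypothetical in-memory stand-ins for the database and file storage, `exportCommentsToCsv`, `summarizeCsv`, and `weeklyInsightsWorkflow` are made-up names, and the "LLM call" just counts rows. The real versions would be async. The point is that only the storage ID crosses the step boundary.

```typescript
type CommentRow = { id: number; text: string };
type CommentsPage = { page: CommentRow[]; isDone: boolean; continueCursor: string };

// Hypothetical stand-ins for the database table and file storage.
const commentsTable: CommentRow[] = Array.from({ length: 7 }, (_, i) => ({
  id: i,
  text: `comment "${i}"`,
}));
const fileStorage = new Map<string, string>();

// (1) Paginated query stand-in.
function getCommentsForWeeklyInsights(cursor: string | null, numItems: number): CommentsPage {
  const start = cursor === null ? 0 : Number(cursor);
  const end = Math.min(start + numItems, commentsTable.length);
  return {
    page: commentsTable.slice(start, end),
    isDone: end >= commentsTable.length,
    continueCursor: String(end),
  };
}

// Quote a CSV field, doubling any embedded quotes.
function csvField(value: string): string {
  return `"${value.replace(/"/g, '""')}"`;
}

// (2) Action stand-in: drain the query, write one CSV, return ONLY the storage ID.
function exportCommentsToCsv(): string {
  const lines: string[] = ["id,text"];
  let cursor: string | null = null;
  let done = false;
  while (!done) {
    const result: CommentsPage = getCommentsForWeeklyInsights(cursor, 3);
    for (const row of result.page) lines.push(`${row.id},${csvField(row.text)}`);
    cursor = result.continueCursor;
    done = result.isDone;
  }
  const storageId = `file_${fileStorage.size + 1}`;
  fileStorage.set(storageId, lines.join("\n"));
  return storageId;
}

// (3) Action stand-in: load the CSV by ID and "call the LLM" (here: count rows).
function summarizeCsv(storageId: string): string {
  const csv = fileStorage.get(storageId);
  if (csv === undefined) throw new Error(`missing file ${storageId}`);
  const rowCount = csv.split("\n").length - 1; // minus the header line
  return `summary of ${rowCount} comments`;
}

// (4) Workflow stand-in: coordinates (2) and (3); only small values pass through it.
function weeklyInsightsWorkflow(): { storageId: string; summary: string } {
  const storageId = exportCommentsToCsv();
  const summary = summarizeCsv(storageId);
  return { storageId, summary };
}

const outcome = weeklyInsightsWorkflow();
console.log(outcome);
```

This is what "storing results within your worker functions and passing IDs around" amounts to: the big payload lives in storage, and the workflow only ever sees a short string.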

lmk if that makes sense.

zealous trail
gaunt gale
#

yep

#

that's right

#

the workflow steps mostly just record the "status" of the flow as it succeeds at each step

#

but the step doesn't do the work itself
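A rough illustration of why that matters for the 1 MiB limit. This is a toy model, not how the workflow component actually measures its journal: `Journal`, `approxBytes`, and `runTwentySteps` are hypothetical, and size is approximated as the length of the JSON text.

```typescript
const JOURNAL_LIMIT_BYTES = 1024 * 1024; // 1 MiB = 1048576, as in the error message

// Rough size proxy: length of the JSON text (ASCII only here, so 1 char = 1 byte).
function approxBytes(value: unknown): number {
  return JSON.stringify(value).length;
}

// Toy journal: every step's return value is recorded and counted toward one total.
class Journal {
  private bytes = 0;

  record(step: string, returned: unknown): void {
    this.bytes += approxBytes({ step, returned });
    if (this.bytes > JOURNAL_LIMIT_BYTES) {
      throw new Error(
        `journal size limit exceeded (${this.bytes} bytes > ${JOURNAL_LIMIT_BYTES} bytes)`,
      );
    }
  }

  get size(): number {
    return this.bytes;
  }
}

// One page of 500 rows, each with a 100-character text field.
const bigPage = Array.from({ length: 500 }, (_, i) => ({ id: i, text: "x".repeat(100) }));

// Run 20 steps, recording whatever each step returns.
function runTwentySteps(makeReturnValue: (i: number) => unknown): { ok: boolean; size: number } {
  const journal = new Journal();
  try {
    for (let i = 0; i < 20; i++) journal.record(`page-${i}`, makeReturnValue(i));
    return { ok: true, size: journal.size };
  } catch {
    return { ok: false, size: journal.size };
  }
}

const returningRows = runTwentySteps(() => bigPage); // steps return the data itself
const returningIds = runTwentySteps((i) => `file_${i}`); // steps return only an ID
console.log(returningRows.ok, returningIds.ok, returningIds.size);
```

With the rows as return values the total crosses the limit before the 20 steps finish; with IDs the journal stays tiny no matter how much data the steps processed.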