workflow hitting limit when looping thru paginated queries | Convex Community | Page 1

zealous trail Feb 24, 2025, 2:35 PM

I'm trying to fetch data from a huge table recursively within a workflow (see the snapshot), and I'm getting this error.

my error:

Workflow journal size limit exceeded (1066648 bytes > 1048576 bytes).
Consider breaking up the workflow into multiple runs, using smaller step arguments or return values, or using fewer steps.

It seems like the data from each step adds up toward a total usage limit (which I'm aware is 1 MiB).

limitations from the doc:

Steps can only take in and return a total of 1 MiB of data within a single workflow execution. If you run into journal size limits, you can work around this by storing results within your worker functions and then passing IDs around within the workflow.

I need advice on how to build the workaround for this.
I don't understand what this part means:
"storing results within your worker functions and then passing IDs around"

please help!
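
To see why looping over pages inside the workflow hits this limit, here is a rough back-of-the-envelope sketch. The 1 MiB figure comes from the error message above; the per-step sizes (a roughly 100 KiB page of rows, a roughly 64-byte ID) and the helper name `stepsUntilJournalFull` are assumptions for illustration only, and any per-step bookkeeping overhead in the journal is ignored.

```typescript
// Journal limit taken from the error message above: 1048576 bytes (1 MiB).
const JOURNAL_LIMIT_BYTES = 1024 * 1024;

// Hypothetical helper: how many steps fit in the journal if every step
// records roughly `bytesPerStep` of arguments plus return value.
function stepsUntilJournalFull(
  bytesPerStep: number,
  limitBytes: number = JOURNAL_LIMIT_BYTES,
): number {
  if (bytesPerStep <= 0) {
    throw new Error("bytesPerStep must be positive");
  }
  return Math.floor(limitBytes / bytesPerStep);
}

// Assumed sizes, for illustration only.
const PAGE_OF_ROWS_BYTES = 100 * 1024; // a step that returns a ~100 KiB page
const STORAGE_ID_BYTES = 64; // a step that returns only a short ID string

const pageStepsThatFit = stepsUntilJournalFull(PAGE_OF_ROWS_BYTES);
const idStepsThatFit = stepsUntilJournalFull(STORAGE_ID_BYTES);
```

Under these assumed sizes, steps that return whole pages fill the journal after about 10 steps, while steps that return only IDs leave room for thousands. That is the point of "passing IDs around".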

distant thorn Feb 24, 2025, 6:46 PM

You need to create an internal paginated query that calls itself recursively instead of running in a while loop.

But overall you should describe what you want to achieve. It may be overkill here

zealous trail Feb 24, 2025, 9:18 PM

distant thorn You need to create internal paginated query that will call itself recursively an...

Can you tell me more about the internal paginated query?
A Convex query has its own limit too: it can only read/write up to 8 MiB of data. I think if I put that in the workflow, it's definitely going to fail because it exceeds the 1 MiB limit.

distant thorn Feb 25, 2025, 12:31 AM

you need to describe first what you want to achieve with your code

zealous trail Feb 25, 2025, 2:09 AM

distant thorn you need to describe first what you want to achieve with your code

I'm trying to fetch the data for a specified date range (start and end) and compile it into CSV format to feed to an AI, on demand.
On my end it's pretty important to have it in CSV format.

distant thorn Feb 25, 2025, 10:52 AM

zealous trail im trying to fetch the data from specified date (start and end), compile them in...

Better to do it on the frontend side. The frontend should run a script with a paginated query that requests data from Convex in batches.

If it has to be Convex only, you need to run a recursive mutation with a paginated query, like I described before. The recursive mutation starts with the first batch and writes the data into file storage as CSV. Then it initiates the mutation for the second batch and adds that data to the CSV file from before, and so on until it's finished.

or you run batches with timeout delay like i did here
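
A minimal sketch of the one-batch-per-invocation idea described above, where each invocation handles a single page and then schedules the next one with the returned cursor. This is plain TypeScript with in-memory stand-ins, not Convex's API: `fetchPage`, `scheduleNext`, and `processBatch` are hypothetical names, and in a real Convex function the scheduling would likely go through `ctx.scheduler.runAfter` instead of a local queue. The result shape (`page`, `isDone`, `continueCursor`) mirrors what Convex's paginated queries return. What each batch does with its rows is left out; the sketch only records their IDs.

```typescript
type BatchRow = { id: number; text: string };
type BatchPage = { page: BatchRow[]; isDone: boolean; continueCursor: string };

// In-memory stand-in for a large table.
const TABLE_ROWS: BatchRow[] = Array.from({ length: 7 }, (_, i) => ({
  id: i,
  text: `comment ${i}`,
}));

// Stand-in for a paginated query. The cursor is an opaque string
// (here simply a stringified offset).
function fetchPage(cursor: string | null, numItems: number): BatchPage {
  const start = cursor === null ? 0 : Number(cursor);
  const end = Math.min(start + numItems, TABLE_ROWS.length);
  return {
    page: TABLE_ROWS.slice(start, end),
    isDone: end >= TABLE_ROWS.length,
    continueCursor: String(end),
  };
}

// Stand-in scheduler: a queue of jobs to run later.
const pendingJobs: Array<() => void> = [];
function scheduleNext(job: () => void): void {
  pendingJobs.push(job);
}

const processedIds: number[] = [];
let invocations = 0;

// One invocation = one batch. It never loops over the whole table itself;
// it schedules a follow-up invocation while more pages remain.
function processBatch(cursor: string | null): void {
  invocations += 1;
  const { page, isDone, continueCursor } = fetchPage(cursor, 3);
  for (const row of page) {
    processedIds.push(row.id);
  }
  if (!isDone) {
    scheduleNext(() => processBatch(continueCursor));
  }
}

// Kick off the first batch, then drain the queue the way a scheduler would.
scheduleNext(() => processBatch(null));
let nextJob = pendingJobs.shift();
while (nextJob !== undefined) {
  nextJob();
  nextJob = pendingJobs.shift();
}
```

Because each invocation only carries a cursor forward, no single call has to read or return the whole table.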

zealous trail Feb 25, 2025, 11:02 AM

distant thorn or you run batches with timeout delay like i did here

Hey, I don't think this is the right way for my case.

collect() won't return the full set of rows for me.

The last time I ran a migration, my table had almost half a million rows.

Running a paginated query until the cursor is complete seems to be the right way.

ripe wing Feb 25, 2025, 4:57 PM

hey @zealous trail, sorry this is a bit confusing / not well documented.

my mental model for workflow stuff is that it's more about coordinating different steps than doing data fetching itself.

so in this case, you want to load a bunch of data, write it out to a CSV, and then pass that CSV to an AI agent. here's how I'd break it up.

1. paginated query for fetching data from the db (what you have with getCommentsForWeeklyInsights)
2. an action for running the paginated query, collecting the results, and writing it out to file storage
3. another action that takes in the storage ID and calls the LLM
4. a workflow that coordinates calling (2), getting the storage ID as a response, and then passing it to (3). the benefit of the workflow is that it can execute for a very long period of time, retry (2) or (3) if they hit transient errors, etc.

lmk if that makes sense.
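
The four pieces above can be sketched roughly as follows. This is a self-contained, synchronous model with in-memory stand-ins, not Convex code: `getCommentsPage`, `storeFile`, `exportCommentsToCsv`, `callLlmWithCsv`, and `runWorkflow` are hypothetical names, and the sample rows are made up. In an actual Convex action these calls would be async and would likely go through `ctx.runQuery` and `ctx.storage.store`, with the workflow invoking the actions as steps. The point is only that the journal sees a short storage ID instead of the rows.

```typescript
type CommentRow = { author: string; body: string };
type CommentPage = { page: CommentRow[]; isDone: boolean; continueCursor: string };

// Made-up sample data standing in for the real table.
const COMMENT_ROWS: CommentRow[] = [
  { author: "ann", body: "great, thanks" },
  { author: "bob", body: 'said "hi"' },
  { author: "cy", body: "plain" },
];

// (1) Stand-in for the paginated query.
function getCommentsPage(cursor: string | null, numItems: number): CommentPage {
  const start = cursor === null ? 0 : Number(cursor);
  const end = Math.min(start + numItems, COMMENT_ROWS.length);
  return {
    page: COMMENT_ROWS.slice(start, end),
    isDone: end >= COMMENT_ROWS.length,
    continueCursor: String(end),
  };
}

// Stand-in for file storage: stores contents, hands back a short ID.
const fileStorage = new Map<string, string>();
function storeFile(contents: string): string {
  const id = `file_${fileStorage.size + 1}`;
  fileStorage.set(id, contents);
  return id;
}

// Quote a CSV field if it contains a quote, comma, or line break.
function csvField(value: string): string {
  return /[",\r\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

// (2) Stand-in for the action: page until done, write one file, return its ID.
function exportCommentsToCsv(): string {
  const lines: string[] = ["author,body"];
  let cursor: string | null = null;
  let isDone = false;
  while (!isDone) {
    const result = getCommentsPage(cursor, 2);
    for (const row of result.page) {
      lines.push([row.author, row.body].map(csvField).join(","));
    }
    cursor = result.continueCursor;
    isDone = result.isDone;
  }
  return storeFile(lines.join("\n"));
}

// (3) Stand-in for the LLM action: loads the CSV by ID (no real LLM call here).
function callLlmWithCsv(storageId: string): string {
  const csv = fileStorage.get(storageId);
  if (csv === undefined) {
    throw new Error(`no file stored under ${storageId}`);
  }
  return `would send ${csv.length} characters of CSV to the LLM`;
}

// (4) Stand-in for the workflow: it only passes small values between steps.
function runWorkflow(): { journal: string[]; result: string } {
  const journal: string[] = [];
  const storageId = exportCommentsToCsv();
  journal.push(storageId); // the step's return value is just the ID
  const result = callLlmWithCsv(storageId);
  journal.push(result);
  return { journal, result };
}

const workflowRun = runWorkflow();
```

The paging loop lives inside step (2), so however many pages it reads, the workflow records one short string for that step.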

zealous trail Feb 25, 2025, 5:45 PM

ripe wing hey @zealous trail, sorry this is a bit confusing / not well documented....

Ah, so instead of using the workflow to perform the paginated queries, I should use an action to do it? That action can still be called as a workflow step, right?

gaunt gale Feb 25, 2025, 6:18 PM

yep

that's right

the workflow steps mostly just record the "status" of the flow as it succeeds at each step

but the step doesn't do the work itself
