# Issues with limits/timeouts doing cron jobs on large datasets; infrequent aggregates of data


karmic echo
#

Hi, does anyone have a good solution for doing bulk data operations within Convex cron jobs? I keep running into either data query limits or action timeouts. I have a table closing in on 1 million rows on which I want to run weekly-ish updates and store the aggregate results in another table (like 20 rows, that's the easy part). So far I've been able to group by a field in the data, reducing the dataset to maybe 50-100k rows, and run the aggregates on that smaller dataset, but I still keep running into limits or timeouts.

The aggregates can be slow to compute, that's no problem, the cron jobs just need to be stable. Is there maybe a way to offload it somewhere outside of a "regular" action?

narrow sable
karmic echo
#

Aah I didn't know about that, thanks for the suggestion! I think this could help. Do you think that could also work around query timeout issues? Also for queries that stay within the max document limit?

karmic echo
#

I think I finally got somewhere now. The workflow was working, I was just returning the in-memory data too early. I was combining multiple Convex actions together, causing it to hit the 1MB limit. The solution was to just have 1 action that does the heavy lifting, and only return data from said action after processing it. So essentially moving more raw TS logic into the action

errant depot
#

did you figure this out?

#

is there no way to increase this limit?

karmic echo
#

I ended up rewriting everything into workflows and plain javascript functions to pass around the data

#

So the different workflows have a small input object to denote what to process, which then runs the heavy workload with just JS functions, and the "small" aggregate object is then returned to avoid the data limits

#

I'm now processing around 1-2M rows in this workflow

#

Also had to use pagination for queries, otherwise I was running into issues with the document limit
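
A minimal sketch of the pattern described above: walk the table page by page and keep only a small aggregate in memory, so the value that finally gets returned stays tiny. The `Page` shape mirrors the `page` / `isDone` / `continueCursor` fields that Convex's `.paginate()` returns; `fetchPage`, `Row`, and `Totals` are hypothetical names for illustration. In a real action, `fetchPage` would presumably wrap a `ctx.runQuery(...)` call against a paginated query.

```typescript
// Shape of one page, modeled on the result of Convex's `.paginate()`.
type Page<T> = { page: T[]; isDone: boolean; continueCursor: string };

// Hypothetical page fetcher; an assumption, not a Convex API.
type FetchPage<T> = (cursor: string | null) => Promise<Page<T>>;

// Hypothetical row shape and result shape for illustration.
type Row = { group: string; value: number };
type Totals = Record<string, { count: number; sum: number }>;

// Fold one page of rows into the running totals (pure and synchronous).
function foldPage(totals: Totals, rows: Row[]): Totals {
  for (const row of rows) {
    const entry = totals[row.group] ?? { count: 0, sum: 0 };
    entry.count += 1;
    entry.sum += row.value;
    totals[row.group] = entry;
  }
  return totals;
}

// Walk every page; only the small totals object survives between pages.
async function aggregateAllPages(fetchPage: FetchPage<Row>): Promise<Totals> {
  let totals: Totals = {};
  let cursor: string | null = null;
  while (true) {
    const result: Page<Row> = await fetchPage(cursor);
    totals = foldPage(totals, result.page);
    if (result.isDone) return totals;
    cursor = result.continueCursor;
  }
}
```

Because each page is discarded after it is folded in, memory use depends on the page size and the number of groups rather than on the total row count.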

errant depot
#

ahh so there's no direct, instant kind of way

#

I don't have a million rows, more like 10-20k

#

so it hits the 1 MB limit

#

5-6 MB of data is what I have
but I have to split it into batches

#

I thought maybe there's a better way

karmic echo
#

An easy solution would be to make a function that does a paginated query and returns the full data. Regular (plain JS) functions can return large objects

#

Just avoid returning large objects from actions

#

So if you rewrote your code a little bit to first get all data using pagination, you could then do the processing and return the aggregates or whatever you wanna do with the data
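
For a dataset of the size mentioned here (10-20k rows), the suggestion could look roughly like this sketch: a plain helper collects every page into one array, the processing runs on that array, and only the small summary leaves the function. `collectAll`, `summarize`, and `fetchPage` are hypothetical names; the `Page` shape is modeled on the result of Convex's `.paginate()`. The full array still has to fit in the calling function's memory.

```typescript
// Shape of one page, modeled on the result of Convex's `.paginate()`.
type Page<T> = { page: T[]; isDone: boolean; continueCursor: string };

// Plain helper (not a Convex function): the large array it returns stays
// inside the calling function instead of being returned from an action.
async function collectAll<T>(
  fetchPage: (cursor: string | null) => Promise<Page<T>>,
): Promise<T[]> {
  const rows: T[] = [];
  let cursor: string | null = null;
  let done = false;
  while (!done) {
    const result: Page<T> = await fetchPage(cursor);
    rows.push(...result.page);
    cursor = result.continueCursor;
    done = result.isDone;
  }
  return rows;
}

// Hypothetical processing step: reduce all values to a tiny summary object.
function summarize(values: number[]): { count: number; mean: number } {
  const sum = values.reduce((a, b) => a + b, 0);
  return { count: values.length, mean: values.length ? sum / values.length : 0 };
}
```

Inside an action, the idea would be to call `collectAll`, pass the result to `summarize`, and return only the summary.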

#

Maybe a workflow isn't even necessary for you. I was running into timeout issues on top of size limits. Yours might be quick enough