#Uncaught Error: Too many documents read in a single function execution (limit: 32000).

5 messages · Page 1 of 1 (latest)

glad leaf
#

I have a paginated Query, using stream inside this query. In my table are 4600 entries (which are very few). I use convex selfhosted.
The paginated query failes with

Uncaught Error: Too many documents read in a single function execution (limit: 32000).

But how is that ever possible? I give you my setup. I have a table, called "report".

 report: defineTable({
    title: v.optional(v.string()),
    content: v.string(),
    content_summarized: v.string(),
 }).index("title", ["title"])

As you see, I have index. But in my streamed query I can not used it, because I do a custom filter with string compare. And I do a join so that I merge another stream. Function searchReportByFilter is the one which makes troubles.

export const searchReportByFilter = query({
  args: {
    paginationOpts: paginationOptsValidator,
    _term: v.optional(v.string()),
    title: v.optional(v.string()),
  },
  handler: async (ctx, args) => {
    // setup report stream
    const reportsStream = stream(ctx.db, schema)
      .query("report")
      .withIndex("by_creation_time")
      .order(args.sorting === "oldest" ? "asc" : "desc");


    // "join" users 
    const reportsWithUsersStream = reportsStream.map(async report => {
      const user = (await ctx.db.get(report.report_from)) || null;
      return { report, user };  
    });

    // filter by search term called _term
    let reportsFiltered = reportsWithUsersStream.filterWith(async () => true);

    reportsFiltered = reportsFiltered.filterWith(async ({ report }) => {
      if (args._term == null || args._term.trim() === "") {
        return true;
      }
      const searchIn = (
        (report.title ?? "") +
        " " +
        (report.content_summarized ?? "")
      ).toLowerCase();
      return searchIn.includes(args._term.toLowerCase());
    });

    // apply the pagination Options
    return await reportsFiltered.paginate(args.paginationOpts);
  }
});
grizzled onyxBOT
#

Hi @glad leaf. I'm kapa.ai, your friendly support bot. I will try to answer your question by searching through the knowledge base. :wave:

#

The error is coming from Convex’s per-function read limits, not from the number of rows in your table. Every document that is scanned (even if later filtered out) counts toward the 32,000 “documents scanned” limit for a single query/mutation execution. [Limits; Filter perf]

In your case:

  • stream(ctx.db, schema).query("report").withIndex("by_creation_time").order(...) creates a stream over the whole report table, ordered by _creationTime.
  • .filterWith runs after reading documents from that stream. It can keep reading more and more documents to try to fill a page, and all of those reads count toward the 32k limit. [Filtering]
  • You also do a join (ctx.db.get(report.report_from)) per document in the stream, which adds more reads.

Even with only ~4,600 reports, if your filter excludes many of them (e.g. most don’t match _term), the paginator may scan a large fraction of the table to fill each page, and across reactive re-runs or large numItems you can hit the 32k scanned-doc limit. [Filtering; Read/write limits]

From the docs and Discord threads, the recommended patterns are:

  1. Expect full-table scans with custom JS filters
#

Using .filterWith (or JS predicates) over a base query that doesn’t narrow the index range is effectively a table scan. This is explicitly called out as potentially reading “every row of the database”. [Filtering; Filter perf]

  1. Use indexes or search indexes to narrow the range first

    • If possible, denormalize a searchText field (e.g. title + " " + content_summarized) and use a searchIndex with filter fields for other constraints, then paginate directly on that. [Discord search filters]
    • Or add multi-field indexes that match your most common filters so you can use .withIndex(..., q => q.eq(...).eq(...)) and then only do light additional filtering. [Queries that scale]
  2. If you must use .filterWith, cap how much you read

    The streams docs recommend passing maximumRowsRead in paginationOpts to avoid reading too much data when using .filterWith with custom logic. [Filtering]

    That won’t remove the limit, but it lets you fail earlier or return smaller pages instead of blowing past 32k scanned docs.

  3. Be aware that joins inside .map / .filterWith add reads

    Every ctx.db.get(report.report_from) is another document read. With thousands of rows, this can quickly add up toward the 32k limit. [Filtering]

Given only the provided sources, I can’t give a fully worked alternative implementation, but the Convex guidance for your situation is:

#
  • Avoid relying on .filterWith over a broad stream for primary filtering; instead:
    • Use an index or search index to narrow the candidate set as much as possible.
    • Then, if needed, apply .filterWith or JS filtering on that smaller set.
  • If you keep the current pattern, reduce numItems and set maximumRowsRead in paginationOpts to keep each execution under the 32k scanned-doc limit. [Filtering; Transactions]

If you can share how many items per page you request and how selective _term usually is, that would help decide whether you should move to a searchIndex-based approach or just tighten pagination / indexing.

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon: