#Query optimization

18 messages · Page 1 of 1 (latest)

still fossilBOT
#

Hi @shell pond. I'm kapa.ai, your friendly support bot. I will try to answer your question by searching through the knowledge base. :wave:

#

Your new implementation does improve query efficiency in some ways, but there are important tradeoffs regarding database bandwidth and performance.

What’s Good

  • You use an index (withIndex('by_workspace_id', ...)) to efficiently fetch only the relevant workspaceImages for the current workspace. This avoids a full table scan and is a best practice for scalable queries in Convex, especially as your data grows [Queries that scale].
  • You limit the number of images returned by breaking the loop after reaching args.initialNumItems, which helps control the amount of data processed and returned.

Bandwidth Considerations

However, your implementation still performs a separate database read for each image and its sizes:

  • For each workspaceImage, you call ctx.db.get(workspaceImage.imageId) and then a separate query for imageSizes.
  • This results in N+1 queries (N for images, 1 for the initial list), which can increase total database bandwidth and latency, especially if initialNumItems is large.

Bandwidth and Filtering

  • Filtering by alphaOnly and noAlphaOnly is done in JavaScript after fetching each image. This means you may fetch more images than you ultimately return, especially if many images are filtered out. All documents read—even those filtered out—count toward your database bandwidth usage [Discord support thread].
  • Ideally, you would filter as much as possible using indexes or in the initial query, but Convex currently does not support filtering on array fields or arbitrary fields in the index directly. So, this pattern is sometimes necessary, but it’s not optimal for bandwidth.

Comparison to the "Commented One"

#

You didn’t provide the commented-out implementation, so I can’t directly compare. However, if the previous version did a full table scan or fetched all images without using an index, your new version is likely more efficient for large datasets. If the previous version fetched all images and then filtered in memory, the bandwidth usage would be even higher.

Recommendations

  • Your use of indexes is correct and will help with scalability.
  • If you expect a large number of images per workspace, consider paginating the results to further control bandwidth and performance [Best Practices].
  • If possible, try to minimize the number of separate queries per image (for example, by denormalizing some data or batching requests), but this depends on your schema and requirements.

Summary

  • Your new implementation is better than a full table scan, thanks to the use of indexes.
  • However, filtering after fetching and the N+1 query pattern can still lead to higher bandwidth usage, especially if many images are filtered out.
  • All documents read—even those filtered out—count toward bandwidth.

For more on optimizing queries and bandwidth in Convex, see:

If you have the previous implementation, I can provide a more direct comparison.

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:

#

@shell pond Kapa.ai is still learning and improving, please let me know how I did by reacting below.

shell pond
#

@still fossil
// const workspaceImages = await ctx.db.query("workspaceImages")
// .withIndex('by_workspace_id', q => q.eq('workspaceId', workspaceId))
// .order("desc")
// .take(args.initialNumItems)

the commented one looked like this and then It would query each image, and then filter on them, but if none would have alpha for example, it would return no results when asked for said 50

still fossilBOT
#

Let me check the knowledge base for you again. One moment! :books:

#

Thank you for clarifying the previous (commented) implementation:

const workspaceImages = await ctx.db.query("workspaceImages")
  .withIndex('by_workspace_id', q => q.eq('workspaceId', workspaceId))
  .order("desc")
  .take(args.initialNumItems);
// Then, for each workspaceImage, fetch the image and filter by alpha

Comparison of Approaches

Commented Implementation

  • How it works: Fetches a fixed number (initialNumItems) of workspaceImages using an index, then loads each corresponding image and applies the alpha filter in code.
  • Downside: If none of the first N images match the filter (e.g., hasAlpha), you may return fewer than requested (possibly zero), even if more matching images exist later in the table. This is because the filter is applied after pagination, not before.
  • Bandwidth: Only the first N workspaceImages and their related images are loaded, but you may need to make additional requests if you want to guarantee a full page of filtered results.

New Implementation (for-await loop)

#
  • How it works: Iterates through all workspaceImages (using the index), loads each image, applies the filter, and stops once you have enough matching results.
  • Advantage: Guarantees you return up to initialNumItems that match the filter, as you keep walking through the table until you have enough.
  • Bandwidth: May read more documents than the commented version, especially if many images are filtered out, because you may need to scan a large number of workspaceImages and their related images to find enough matches. All documents read—even those filtered out—count toward your database bandwidth usage [Best Practices][Discord support thread].

Tradeoffs

  • New implementation: More reliable for returning a full page of filtered results, but potentially higher bandwidth usage if the filter is selective.
  • Commented implementation: Lower bandwidth if most images match the filter, but may return fewer results than requested if the filter is selective.

Convex Guidance

This pattern—walking through results and filtering until you have enough for pagination—is a common approach in Convex when you can't express the filter in an index. It is explicitly described as a valid solution, but you should be aware of the potential for increased bandwidth if many documents are filtered out [Discord support thread: filter index based on multiple possible values?].

#

"As long as you're aware that the function walks the requests for the user, and only stops when it finds enough that fit the filter criteria. In particular, a single query might read a lot of data if there are lots of documents that don't match the filter."
— Trusted Support Staff

Summary

  • Your new implementation is better for user experience (guaranteeing a full page of filtered results), but may increase database bandwidth if the filter is selective.
  • The commented implementation is more bandwidth-efficient if most images match, but can return fewer results than requested.

This tradeoff is a known limitation in Convex when filtering cannot be done via an index [Best Practices][Discord support thread].

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:

#

@shell pond Kapa.ai is still learning and improving, please let me know how I did by reacting below.

shell pond
#

but my new implementation doesnt read ALL the database, right ?

#

@still fossil right ?

still fossilBOT
#

Let me check the knowledge base for you again. One moment! :books:

#

Yes, your understanding is correct.

  • The commented implementation (.take(args.initialNumItems)) fetches a fixed number of workspaceImages and then filters them. If few or none match your filter (e.g., hasAlpha), you may return fewer results than requested, even if more matching images exist further in the table. This is because pagination happens before filtering.
  • The new implementation iterates through all workspaceImages (using the index), applies the filter as it goes, and stops once it has enough matching results. This ensures you return up to the requested number of filtered results, but it may read and process more documents, increasing database bandwidth if many images are filtered out.

This tradeoff—between returning a full page of filtered results and potentially higher bandwidth usage—is a known and documented pattern in Convex when filtering cannot be done via an index. The approach is valid, but you should be aware of the bandwidth implications if your filter is selective and many documents are skipped before finding enough matches. All documents read, even those filtered out, count toward your database bandwidth usage [Best Practices][Discord: filter index based on multiple possible values?].

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:

#

@shell pond Kapa.ai is still learning and improving, please let me know how I did by reacting below.

shell pond
#

@still fossil but it read the database one by one right, not getting ALL workspace images and THEN filtering ?

still fossilBOT
#

Let me check the knowledge base for you again. One moment! :books:

#

Yes, your new implementation reads the database one by one—it iterates through the workspaceImages using an async iterator, fetching and filtering each image as it goes, and stops once it has enough matching results. It does not load all workspaceImages into memory first and then filter them.

This approach is described in the Convex documentation: you can use JavaScript's for await...of to loop through the table one document at a time, applying your filter as you go. This is more efficient than collecting all documents and then filtering, especially when you only need a subset of results [Advanced filtering techniques].

"Even with this optimization you are still just looping over the table to find the first post that matches and may hit your function limits. Using indexes is still the way to go."
— Convex Docs

So, your understanding is correct: the new implementation reads and filters documents one by one, not all at once.

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon: