#Join tables query / paginate

73 messages · Page 1 of 1 (latest)

proud shoal
#

An important limitation in Convex that affects how we need to structure queries, especially for paginated results with joins. Let me explain the problem and show hacky workarounds:

The Problem

In my current code, i am trying to:

  1. Paginate products
  2. Then fetch variations for those products
  3. Then filter based on those variations (colors, sizes)

This approach is problematic because:

  1. The pagination happens BEFORE the joins and filters on variations
  2. This means I might get fewer results than requested after filtering
  3. I can't do true SQL-like JOINs in a single query

For example, if I request 10 products:

  • I might get 10 products from the initial pagination
  • After filtering by variations, I might end up with only 3-4 products that actually match
  • The user sees fewer items than expected

Possible Solutions

  1. Denormalization Approach
// Store available colors and sizes directly on the product document
interface Product {
  _id: Id<"products">;
  // ... other fields ...
  availableColors: Id<"productColors">[];
  availableSizes: Array<{
    sizeCategoryId: Id<"sizeCategories">;
    values: string[];
  }>;
  hasInventory: boolean;
}
  1. Materialized Views Approach
// Create a separate table that pre-joins product data
interface ProductView {
  _id: Id<"productViews">;
  productId: Id<"products">;
  availableColors: Id<"productColors">[];
  availableSizes: Array<{
    sizeCategoryId: Id<"sizeCategories">;
    values: string[];
  }>;
  // ... other fields needed for filtering
}
  1. Batch Loading Approach
// Load more products than needed initially, then filter
const BATCH_MULTIPLIER = 3; // Load 3x more products than requested
const batchSize = paginationOpts.numItems * BATCH_MULTIPLIER;

// Modified pagination options
const batchPaginationOpts = {
  ...paginationOpts,
  numItems: batchSize
};
steady obsidianBOT
#

Thanks for posting in #1088161997662724167.
Reminder: If you have a Convex Pro account, use the Convex Dashboard to file support tickets.

    - Provide context: What are you trying to achieve, what is the end-user interaction, what are you seeing? (full error message, command output, etc.)
    - Use [search.convex.dev](https://search.convex.dev) to search Docs, Stack, and Discord all at once.
    - Additionally, you can post your questions in the Convex Community's #1228095053885476985 channel to receive a response from AI.
    - Avoid tagging staff unless specifically instructed.

    Thank you!
wind swan
#

It seems the easiest way to start would be to do (3), maybe fetching 100 at a time. You can always filter the UI to only show the first 10, then have a load more / etc. button keep showing 10 until a new page is actually needed from the backend.
For products, fetching a lot at once seems like a good move. They're unlikely to change, so that query won't be re-fetching frequently. In a messaging app you might query fewer, because every new message re-fetches up to the page size.

Another approach to consider if there aren't a ton of products is to paginate through them all to get them client-side, then do your filtering there. it'd be crazy fast since it wouldn't need a server round-trip, and you may end up doing less DB bandwidth if the filters change frequently. Or fetch the first 1k products & metadata (ProductView), and apply filters client-side, and only fetch more if there aren't enough products after the filters

proud shoal
#

none of this would work. those are thousands of products. fitler can be very detialed and return only few of the products. sometime i need like 100 paginate requests just to find right products. this is huge limitation.

#

here also someone mentioned this problem in youtube comments

#

join query is pretty basic feature that is missing in convex

proud shoal
#

also i cant put variation data into product table because its not possible to filter json content inside cell

#

so basically there is 0 solution for this problem

slender glacier
#

This is an interesting problem. The key question is: do you want to pre-filter or post-filter? i.e. do you want to fetch X documents that all match your filter, or do you want to fetch Y documents and then check which ones match your filter until you have X results.

If the former, you'll need to encode the filter condition into an index range. This may be via a join table, so you paginate on the table that joins variations -> products, with an index on the variation.

If the latter, you'll need to potentially fetch an unbounded number of results in one query. You can do this in Convex using some of the convex-helpers like https://github.com/get-convex/convex-helpers/blob/main/packages/convex-helpers/server/pagination.ts#L127 , but I agree it's not easy.

A simpler version of this is to fetch a fixed page size, filter in the query, and then call loadMore on the client automatically until you get enough results. This is what @wind swan suggested, i think, and it's what I would do if I had a complex filtered query to paginate.

GitHub

A collection of useful code to complement the official packages. - get-convex/convex-helpers

proud shoal
#

well everything of this sounds not like a real solution. join query functionality would be great. like every database has

proud shoal
#

here is an example:

left join files f
   on u.id = f.user_id```
#
  1. there are limits how many documents can be loaded into memory in convex. this means filtering needs to happen on the database side
  2. i cant paginate and than filter. it needs to happen on one run. otherwise i will get fewer results or no results
#

the only solution that i am using currently is huge batch size and multiple paginate requests till everything is loaded. basically what could have be done in one query requires multiple executions and requests from the client

slender glacier
slender glacier
#

Sql is a language. Mysql/postgres/etc are databases which cannot run this query

proud shoal
#

joining two tables users and files of the user and filter by file type

#

but its mysql query

slender glacier
#

It sounds like you fundamentally disagree with the Convex principle that OLTP queries should be fast and bounded. If you want to do large unbounded queries we recommend OLAP databases, which can execute arbitrary sql

proud shoal
#

i dont disagree i just can find solution for simple method to paginate and filter at the same time

#

filter based on another table i mean

#

paginate 1 table + filter based on 2nd table

slender glacier
proud shoal
#

it would be easy in js but i cant load all the table into memory

#

SELECT * FROM
(SELECT * FROM users LIMIT 10 OFFSET 10) AS u
LEFT JOIN files f
ON u.id = f.user_id
WHERE f.mime_type = 'jpg'

slender glacier
proud shoal
#

the query paginates users and filters bases on files table

slender glacier
#

i think you might be aiming for

SELECT * FROM users u
LEFT JOIN files f
    ON u.id = f.user_id
WHERE f.mime_type = 'jpg'
LIMIT 10 OFFSET 10
proud shoal
#

yes

slender glacier
#

how would you feel about a plain-javascript .filter function you could attach to a paginated query, that would filter out all of the results before reaching the page size. the downside would be if the filter is sparse, it could read too much data & be slow or throw error

proud shoal
#

this will not work if there are thouthands of documents

slender glacier
#

correct

proud shoal
#

and thats the case 🙂

slender glacier
#

nothing will work if there are thousands od documents

#

because then it's an OLAP query -- can't be made fast or efficient

proud shoal
#

there are 1k products. each of product has 1-10 variations

#

means 10k variations at max

slender glacier
proud shoal
#

well i may not hit limits now but in future probably

#

well if you would allow to have cached queries without those limits it would be possible to prefilter in js

#

this would mean each filter result would be stored in cache

#

and if it requires too much cache just pay for extra storage

#

so basically filtering and joining data would be done in js and stored in cache but need to disable limits for this

cobalt mango
#

I have almost literally the exact same scenario that you've got @proud shoal (same domain, even, with products and variations thereof), and I solved it with the "materialized view" approach. You can use the convex-helpers database triggers feature to keep this up-to-date. If you have array or object fields which you need to filter on, you can flatten those in your "view" table.

proud shoal
#

so basically secondary table that updates based on triggers

cobalt mango
#

I'm not the database expert that many Convex employees are, but I think the frustrating part here is basically that SQL will let you run a very inefficient query, and it will get slower and slower as your dataset grows, and eventually it may become pathological (by which I mean, cause your whole database to fall down). Convex imposes limits so that you are forced to architect your code such that you can't fall into this trap. So like Lee said, you can either use an OLAP database for this kind of query, or you can put in a little more architectural effort to ensure that it remains fast and efficient.

proud shoal
#

with secondary table i would also have a problem. i have variations table existing exactly for filtering. because if the data would be stored in json inside products table i would not be able to use index or filter based on this

#
    amount: v.float64(),
    brandArticle: v.optional(v.string()),
    color: v.string(),
    productColor: v.id("productColors"),
    productId: v.id("products"),
    sizeId: v.id("sizes"),
    variationIndex: v.float64(),
  })```
#

thats why i have 2 tables products and products variations. so queries can use variations for filter

cobalt mango
#

I don't think I follow what the problem is, exactly. Where are the JSON fields?

You should be able to just join the variations with the products in your "view" table, and thus every field will be flattened (and indexable). I agree that if you just tried to merge your products table with your productVariations table such that you end up with 1 row per product, and stored each variation as an array field on this new "view" document, your problem won't be solved.

proud shoal
#

as far i know its not possible to filter based on array of numbers / strings inside field.
if flatten table would have a list of colors for example in one field that still cant be used as filter

cobalt mango
#

Right, you would need to flatten those colors, too. Like so:

type Product = {
  colors: v.array(v.string()),
};
type ProductsTable = Product[];

// should become

type SearchableProduct = {
  productId: v.id("products"),
  color: v.string()
};
type SearchableProductsTable = SearchableProduct[];

// so if you have

const myProducts: ProductsTable = [{
  _id: "abc123",
  colors: ["blue", "green"]
}]

// you need to turn this into

const mySearchableProducts: SearchableProductsTable = [
  {
    productId: "abc123",
    color: "blue"
  },
  {
    productId: "abc123",
    color: "green"
  },
]
proud shoal
cobalt mango
#

Like normal, you just add an index for the fields you want to filter on (color in this case) and use withIndex to filter on them

proud shoal
cobalt mango
# proud shoal but how do you filter by one value on array of values? do you use js filter or d...

When you create this new table ("view"), you need to flatten (distribute, join) every field containing nested data or reference to a foreign table (these are functionally the same thing for our purposes here). That's what I was trying to show above—if you have a field with an array, you need to distribute each element of the array across a new field (colors -> color). So if you have 1 product with 2 colors, you need to have 2 rows in your "view" table, each representing a different variation (different color) of the same product.

proud shoal
#

hm what if color amount is dynamic?

#

just pregenerate like 10 colors?

cobalt mango
#

I don't think that matters, does it? You're just filtering on a string field, color, like above

#

Let me try to show you what it would look like if you modeled colors differently

#
products: defineTable({
  ...
}),
productColors: defineTable({
  color: v.string(),
  productId: v.id("products")
})

This is the same thing from a cardinality (I think that's the right term) perspective

#

As storing each color in a colors field on the products table

#

There's the same number of possible combinations of product and color

#

So just think about how you would join products and productColors to get a "view" table

#

That will look the same as if you instead model color variations with a colors field on the products table

#

At the end of the day, your "view" table will look like this:

searchableProducts: defineTable({
  productId: v.id("products"),
  color: v.string(),
})
#

Which is just the result of joining products and productColors

proud shoal
#

ok undersstand thanks

#

pretty hacky 😄

cobalt mango
#

No problem!