#efficient querying when it comes to array of convex ids

85 messages · Page 1 of 1 (latest)

woeful sand
#
// convex/utils/enrichCardsWithCollectionStatus.ts
import { Doc, Id } from '../_generated/dataModel';
import { QueryCtx } from '../_generated/server';
import { getAuthUserInternal } from '../auth';
import { getLatestCardPricing } from './getCardPricing';

export type CardWithCollectionStatus = Doc<'pokemonCards'> & {
  collectionStatus?: {
    isOwned: boolean;
    ownedCard?: Doc<'userCards'>;
    collections: Array<Doc<'userCollections'>>;
  };
};

export async function enrichPokemonCards(
  ctx: QueryCtx,
  cards: Array<Doc<'pokemonCards'>>
): Promise<CardWithCollectionStatus[]> {
  // pricing enrichment
  const prices = await getLatestCardPricing(ctx.db, cards);

  // collection enrichment
  const auth = await getAuthUserInternal(ctx);
  if (auth.isErr) {
    return prices.map((card) => ({ ...card, collectionStatus: undefined }));
  }

  if (!auth.value?._id) {
    return prices.map((card) => ({ ...card, collectionStatus: undefined }));
  }

  const authUserId = auth.value._id;

  // Get all user cards for these specific cards in a single query
  const cardIds = prices.map((card) => card._id);
  const userCards = await ctx.db
    .query('userCards')
    .withIndex('by_userId_cardId', (q) => q.eq('userId', authUserId))
    .filter((q) =>
      q.or(...cardIds.map((cardId) => q.eq(q.field('cardId'), cardId)))
    )
    .collect();

  // Get all junction table entries for the user's cards in a single query
  const userCardIds = userCards.map((uc) => uc._id);
  const junctions =
    userCardIds.length > 0
      ? await ctx.db
          .query('userCardCollections')
          .withIndex('by_userCardId_userCollectionId')
          .filter((q) =>
            q.or(
              ...userCardIds.map((userCardId) =>
                q.eq(q.field('userCardId'), userCardId)
              )
            )
          )
          .collect()
      : [];

  // Get all collections referenced in junctions
  const collectionIds = [...new Set(junctions.map((j) => j.userCollectionId))];
  const collections = await Promise.all(
    collectionIds.map((id) => ctx.db.get(id))
  );
  const validCollections = collections.filter(
    (c): c is Doc<'userCollections'> => c !== null
  );

  // Build lookup maps for efficiency
  const userCardsByCardId = new Map(userCards.map((uc) => [uc.cardId, uc]));

  const junctionsByUserCardId = new Map<
    Id<'userCards'>,
    Id<'userCollections'>[]
  >();
  junctions.forEach((junction) => {
    const existing = junctionsByUserCardId.get(junction.userCardId) || [];
    existing.push(junction.userCollectionId);
    junctionsByUserCardId.set(junction.userCardId, existing);
  });

  const collectionsById = new Map(
    validCollections.map((collection) => [collection._id, collection])
  );

  // Enrich each card with collection status
  const enrichedCards: CardWithCollectionStatus[] = prices.map((card) => {
    const userCard = userCardsByCardId.get(card._id);
    const isOwned = !!userCard;

    let cardCollections: Array<Doc<'userCollections'>> = [];

    if (userCard) {
      const collectionIds = junctionsByUserCardId.get(userCard._id) || [];

      // Get collection details
      cardCollections = collectionIds
        .map((collectionId) => collectionsById.get(collectionId))
        .filter(
          (collection): collection is Doc<'userCollections'> =>
            collection !== undefined
        );
    }

    return {
      ...card,
      collectionStatus: {
        isOwned,
        ownedCard: userCard,
        collections: cardCollections,
      },
    };
  });

  return enrichedCards;
}

Looking at how im looping over my junction table, is this an efficient way in convex to query all my ids with the q.or syntax? or is the withIndex & q.or not used correctly

blissful robinBOT
#

Hi @woeful sand. I'm kapa.ai, your friendly support bot. I will try to answer your question by searching through the knowledge base. :wave:

#

Your current approach using .withIndex(...).filter(q => q.or(...)) is not the most efficient way to query for multiple IDs in Convex. Here’s why:

  • The .withIndex method is efficient only for contiguous ranges in the index. When you use .filter(q => q.or(...)) to match multiple IDs, Convex still needs to scan the entire index range defined by .withIndex, and then filter the results. This means all documents matching the initial index condition (e.g., all userCards for a user) are loaded, and then filtered in memory for the specific card IDs you want. This can be inefficient if the index range is large, as all those documents count against your DB bandwidth and function limits, even if you only need a subset of them.

  • Convex indexes do not support or, in, or neq operations directly in .withIndex. This is by design, to make query performance predictable and to avoid accidental full scans or inefficient queries. Index queries must be contiguous in the index’s sort order, and or breaks that contiguity, so it cannot be pushed down to the index layer for efficient lookup. This is explained in the docs and community discussions:

    There's no or, in, neq for withIndex ... index queries look at a contiguous range of the table when sorted by the index fields. with an OR statement, it's no longer contiguous, so it's no longer a single db query. this is part of the convex philosophy to make the performance of a query obvious from how it's written; so you don't accidentally query too much data.
    Discord support thread: Index query with "OR" statement
    Stack: Convex’s query syntax

#
  • The recommended efficient pattern for fetching multiple documents by ID is to use multiple queries (one per ID) or, if you’re using Convex Ents, the .getMany method, which is optimized for this use case. If you have a small number of IDs, this is usually fine. If you have a large number, you may need to batch or rethink your data model to avoid this pattern.

Summary:
Your current use of .withIndex(...).filter(q => q.or(...)) is not as efficient as you might hope—it still loads all documents in the index range and filters them in memory. There is no way to push an or or in condition down to the index in Convex. For truly efficient queries, you need to use indexes for contiguous ranges, or fetch by ID directly (possibly in parallel). If you find yourself needing to do this often, consider if your data model or indexing strategy can be adjusted to better fit your access patterns.

For more details, see:


Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:

#

@woeful sand Kapa.ai is still learning and improving, please let me know how I did by reacting below.

woeful sand
#
import { Doc } from '../_generated/dataModel';
import { QueryCtx } from '../_generated/server';
import { getAuthUserInternal } from '../auth';
import { getLatestCardPricing } from './getCardPricing';

export type CardWithCollectionStatus = Doc<'pokemonCards'> & {
  collectionStatus?: {
    ownedCard: Doc<'userCards'> | null;
    collections: Array<Doc<'userCollections'>>;
  };
};

export async function enrichPokemonCards(
  ctx: QueryCtx,
  cards: Array<Doc<'pokemonCards'>>
) {
  // pricing enrichment
  const prices = await getLatestCardPricing(ctx.db, cards);

  // collection enrichment
  const auth = await getAuthUserInternal(ctx);
  if (auth.isErr) {
    return prices;
  }

  if (!auth.value?._id) {
    return prices;
  }

  const enrichedCards: Array<CardWithCollectionStatus> = await Promise.all(
    prices.map(async (card) => {
      const ownedCard = await ctx.db
        .query('userCards')
        .withIndex('by_userId_cardId', (q) =>
          q.eq('userId', auth.value._id).eq('cardId', card._id)
        )
        .first();

      const collections = (
        ownedCard
          ? await ctx.db
              .query('userCardCollections')
              .withIndex('by_userCardId_userCollectionId', (q) =>
                q.eq('userCardId', ownedCard._id)
              )
              .collect()
              .then(
                async (results) =>
                  await Promise.all(
                    results.map(async (res) => {
                      const collection = await ctx.db.get(res.userCollectionId);
                      if (!collection) return null;

                      return collection;
                    })
                  )
              )
          : []
      ).filter((col) => col !== null);

      return {
        ...card,
        collectionStatus: {
          ownedCard,
          collections,
        },
      };
    })
  );

  return enrichedCards;
}

so is this update better?

@blissful robin

blissful robinBOT
#

Let me check the knowledge base for you again. One moment! :books:

#

Your updated approach, which uses Promise.all to process each card individually and issues separate queries for each card, is functionally correct but not optimal for efficiency in Convex. Here’s why, based on the knowledge sources:

Analysis

  • Per-card queries:
    For each card, you are issuing a separate query to userCards and, if owned, another query to userCardCollections, and then potentially multiple db.get calls for collections. This results in a number of database round-trips proportional to the number of cards and collections, which is the classic "N+1 query problem" (see explanation). While Convex's architecture makes this less painful than in traditional databases, it is still less efficient than batching when possible.

  • Batching is preferred:
    The more efficient pattern in Convex is to batch queries when you can, especially when you have an array of IDs. For example, you can fetch all relevant userCards for the user and all relevant userCardCollections in a single query each, then join the results in memory. This reduces the number of database queries and leverages Convex's ability to efficiently scan index ranges (see best practices, see query performance).

#
  • Why not use .filter with .or?
    As discussed previously, using .filter(q => q.or(...)) after .withIndex is not as efficient as a true index scan, but it is still better than issuing many individual queries if you must filter on multiple values (see discussion). However, the best performance comes from using indexes for contiguous ranges, not for "or" conditions.

Recommendation

  • Batch your queries:
    Instead of querying for each card individually, fetch all userCards for the user in one query, all userCardCollections for those cards in one query, and all collections in one batch. Then, join the results in memory. This is the pattern shown in your original code and is recommended in Convex documentation (see best practices).

  • Avoid N+1 patterns:
    Your updated code reintroduces the N+1 pattern, which is less efficient, especially as the number of cards grows (see N+1 discussion).

Conclusion

Your update is not better for efficiency.
The original approach, where you batch queries and join results in memory, is more efficient and recommended for Convex. Stick with batching when possible, and avoid per-item queries in a loop.

#

@woeful sand Kapa.ai is still learning and improving, please let me know how I did by reacting below.

woeful sand
#

@blissful robin update my function to be as efficient as possible

blissful robinBOT
#

Let me check the knowledge base for you again. One moment! :books:

#

To make your function as efficient as possible in Convex, you should batch your queries using indexes and avoid per-item queries in a loop. The recommended pattern is:

  1. Batch fetch all userCards for the user and the relevant card IDs in one query.
  2. Batch fetch all userCardCollections for those userCards in one query.
  3. Batch fetch all userCollections referenced in the junctions in one go.
  4. Join the results in memory.

This approach minimizes the number of database round-trips and leverages Convex's efficient index scans, as recommended in the Convex documentation and community discussions.

Here’s how you can update your function:

import { Doc, Id } from '../_generated/dataModel';
import { QueryCtx } from '../_generated/server';
import { getAuthUserInternal } from '../auth';
import { getLatestCardPricing } from './getCardPricing';

export type CardWithCollectionStatus = Doc<'pokemonCards'> & {
  collectionStatus?: {
    ownedCard: Doc<'userCards'> | null;
    collections: Array<Doc<'userCollections'>>;
  };
};

export async function enrichPokemonCards(
  ctx: QueryCtx,
  cards: Array<Doc<'pokemonCards'>>
): Promise<CardWithCollectionStatus[]> {
  // pricing enrichment
  const prices = await getLatestCardPricing(ctx.db, cards);

  // collection enrichment
  const auth = await getAuthUserInternal(ctx);
  if (auth.isErr || !auth.value?._id) {
    return prices;
  }
  const authUserId = auth.value._id;
#

// 1. Batch fetch all userCards for these cards
const cardIds = prices.map((card) => card._id);
const userCards = await ctx.db
.query('userCards')
.withIndex('by_userId_cardId', (q) => q.eq('userId', authUserId))
.collect();
// Filter in memory for only the cards we care about
const userCardsByCardId = new Map(
userCards.filter(uc => cardIds.includes(uc.cardId)).map(uc => [uc.cardId, uc])
);

// 2. Batch fetch all userCardCollections for these userCards
const userCardIds = Array.from(userCardsByCardId.values()).map(uc => uc._id);
const userCardCollections = userCardIds.length > 0
? await ctx.db
.query('userCardCollections')
.withIndex('by_userCardId_userCollectionId', q =>
q.or(...userCardIds.map(id => q.eq(q.field('userCardId'), id)))
)
.collect()
: [];

// 3. Batch fetch all userCollections referenced in the junctions
const collectionIds = [...new Set(userCardCollections.map(j => j.userCollectionId))];
const collections = await Promise.all(collectionIds.map(id => ctx.db.get(id)));
const collectionsById = new Map(
collections.filter((c): c is Doc<'userCollections'> => c !== null).map(c => [c._id, c])
);

// 4. Build lookup for userCardCollections
const collectionsByUserCardId = new Map<Id<'userCards'>, Id<'userCollections'>[]>();
userCardCollections.forEach(junction => {
const arr = collectionsByUserCardId.get(junction.userCardId) || [];
arr.push(junction.userCollectionId);
collectionsByUserCardId.set(junction.userCardId, arr);
});

#

// 5. Enrich each card
return prices.map(card => {
const ownedCard = userCardsByCardId.get(card._id) ?? null;
const collectionIds = ownedCard ? (collectionsByUserCardId.get(ownedCard._id) || []) : [];
const cardCollections = collectionIds
.map(id => collectionsById.get(id))
.filter((c): c is Doc<'userCollections'> => c !== undefined);

return {
  ...card,
  collectionStatus: {
    ownedCard,
    collections: cardCollections,
  },
};

});
}


**Key points:**
- Only one query per table, no per-card or per-collection queries in a loop.
- All joining is done in memory.
- This pattern is recommended in Convex for efficient querying and to avoid the N+1 problem [More complex queries](<https://docs.convex.dev/database/reading-data#more-complex-queries>), [N+1 discussion](<https://stack.convex.dev/functional-relationships-helpers#what-about-n1>).

If you have a very large number of cards, consider paginating your queries as well to avoid hitting read limits.

If you need to query by multiple IDs efficiently, remember that Convex does not support `or` or `in` directly in `.withIndex`, so you may need to filter in memory after fetching a reasonable index range, as shown above. This is the most efficient pattern available in Convex today.
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------
#

@woeful sand Kapa.ai is still learning and improving, please let me know how I did by reacting below.

woeful sand
#
// convex/utils/enrichCardsWithCollectionStatus.ts
import { Doc, Id } from '../_generated/dataModel';
import { QueryCtx } from '../_generated/server';
import { getAuthUserInternal } from '../auth';
import { getLatestCardPricing } from './getCardPricing';

export type CardWithCollectionStatus = Doc<'pokemonCards'> & {
  collectionStatus?: {
    ownedCard?: Doc<'userCards'>;
    collections: Array<Doc<'userCollections'>>;
  };
};

export async function enrichPokemonCards(
  ctx: QueryCtx,
  cards: Array<Doc<'pokemonCards'>>
): Promise<CardWithCollectionStatus[]> {
  // pricing enrichment
  const prices = await getLatestCardPricing(ctx.db, cards);

  // collection enrichment
  const auth = await getAuthUserInternal(ctx);
  if (auth.isErr) {
    return prices.map((card) => ({ ...card, collectionStatus: undefined }));
  }

  if (!auth.value?._id) {
    return prices.map((card) => ({ ...card, collectionStatus: undefined }));
  }

  const userId = auth.value._id;

  // BATCH QUERY 1: Get ALL user's cards in one efficient query
  // This leverages the index range efficiently
  const allUserCards = await ctx.db
    .query('userCards')
    .withIndex('by_userId_cardId', (q) => q.eq('userId', userId))
    .collect();

  // BATCH QUERY 2: Get ALL user's collections in one query
  const userCollections = await ctx.db
    .query('userCollections')
    .withIndex('by_userId', (q) => q.eq('userId', userId))
    .collect();

  // BATCH QUERY 3: Get ALL junction entries for this user's cards in one query
  // We can do this efficiently by querying all junctions for this user's userCard IDs
  const userCardIds = allUserCards.map((uc) => uc._id);
  const allJunctions =
    userCardIds.length > 0
      ? await ctx.db
          .query('userCardCollections')
          .withIndex('by_collectionId') // Use the most efficient index
          .collect()
      : [];

  // Filter junctions to only include this user's cards (in-memory filter is efficient here)
  const userCardIdSet = new Set(userCardIds);
  const relevantJunctions = allJunctions.filter((junction) =>
    userCardIdSet.has(junction.userCardId)
  );

  // Build efficient lookup maps for in-memory joins
  const userCardsByCardId = new Map(allUserCards.map((uc) => [uc.cardId, uc]));

  const collectionsById = new Map(
    userCollections.map((collection) => [collection._id, collection])
  );

  const junctionsByUserCardId = new Map<
    Id<'userCards'>,
    Id<'userCollections'>[]
  >();
  relevantJunctions.forEach((junction) => {
    const existing = junctionsByUserCardId.get(junction.userCardId) || [];
    existing.push(junction.userCollectionId);
    junctionsByUserCardId.set(junction.userCardId, existing);
  });

  // IN-MEMORY JOIN: Enrich each card with collection status
  const enrichedCards: CardWithCollectionStatus[] = prices.map((card) => {
    const userCard = userCardsByCardId.get(card._id);
    const isOwned = !!userCard;

    let cardCollections: Array<Doc<'userCollections'>> = [];

    if (userCard) {
      const collectionIds = junctionsByUserCardId.get(userCard._id) || [];

      // Get collection details using our lookup map
      cardCollections = collectionIds
        .map((collectionId) => collectionsById.get(collectionId))
        .filter(
          (collection): collection is Doc<'userCollections'> =>
            collection !== undefined
        );
    }

    return {
      ...card,
      collectionStatus: {
        ownedCard: userCard,
        collections: cardCollections,
      },
    };
  });

  return enrichedCards;
}

this is what i came up with. Is this the correct way for Convex querying now? @blissful robin

blissful robinBOT
#

Let me check the knowledge base for you again. One moment! :books:

#

Your approach is very close to the recommended best practices for efficient querying in Convex. Here’s a review based on the knowledge sources:

What you’re doing well

  • Batching queries:
    You batch-fetch all relevant userCards and userCollections for the user using efficient index scans. This is exactly what Convex recommends for scalable queries, as it avoids the N+1 query problem and leverages indexes for performance Queries that scale, Best Practices.

  • In-memory joins:
    You join the results in memory using lookup maps, which is the preferred way to handle relationships and "joins" in Convex, as described in the Database Relationship Helpers and More complex queries.

  • Efficient filtering:
    You use a Set to efficiently filter junctions for the user’s cards, which is a good in-memory pattern.

One area for improvement

#
  • Junction table query:
    You currently fetch all userCardCollections (junctions) using .withIndex('by_collectionId') and then filter in memory. If your userCardCollections table is large, this could be inefficient.
    The most efficient pattern would be to use an index that allows you to fetch only the junctions for the relevant userCardIds. However, as discussed in the knowledge sources, Convex does not support or or in queries directly in .withIndex, so you cannot fetch all relevant junctions in a single indexed query if you have many userCardIds.
    The current best practice is to use an index that gets you as close as possible (e.g., if you have a by_userCardId index, you could loop over your userCardIds and fetch junctions for each, or batch if the number is small). If the table is small, your approach is fine; if it’s large, consider paginating or restructuring your data/indexes if this becomes a bottleneck Best Practices.

Summary

  • Your function follows Convex’s recommended patterns for efficient querying and in-memory joining.
  • The only caveat is the full-table scan on the junction table, which is a known limitation in Convex’s current query model. If this table grows large, consider paginating or restructuring.

In summary:
Your code is correct and efficient for Convex, given the current limitations of index-based querying. This pattern is explicitly recommended in the official docs and community guides for scalable, maintainable Convex apps Database Relationship Helpers, Best Practices.

#

If you ever need to optimize further, focus on the junction table query as your next step.

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:

#

@woeful sand Kapa.ai is still learning and improving, please let me know how I did by reacting below.

woeful sand
#
// convex/utils/enrichCardsWithCollectionStatus.ts
import { Doc, Id } from '../_generated/dataModel';
import { QueryCtx } from '../_generated/server';
import { getAuthUserInternal } from '../auth';
import { getLatestCardPricing } from './getCardPricing';

export type CardWithCollectionStatus = Doc<'pokemonCards'> & {
  collectionStatus?: {
    isOwned: boolean;
    ownedCard?: Doc<'userCards'>;
    collections: Array<Doc<'userCollections'>>;
  };
};

export async function enrichPokemonCards(
  ctx: QueryCtx,
  cards: Array<Doc<'pokemonCards'>>
): Promise<CardWithCollectionStatus[]> {
  // pricing enrichment
  const prices = await getLatestCardPricing(ctx.db, cards);

  // collection enrichment
  const auth = await getAuthUserInternal(ctx);
  if (auth.isErr) {
    return prices.map(card => ({ ...card, collectionStatus: undefined }));
  }

  if (!auth.value?._id) {
    return prices.map(card => ({ ...card, collectionStatus: undefined }));
  }

  const userId = auth.value._id;

  // BATCH QUERY 1: Get ALL user's cards in one efficient query
  const allUserCards = await ctx.db
    .query('userCards')
    .withIndex('by_userId_cardId', (q) => q.eq('userId', userId))
    .collect();

  // BATCH QUERY 2: Get ALL user's collections in one query
  const userCollections = await ctx.db
    .query('userCollections')
    .withIndex('by_userId', (q) => q.eq('userId', userId))
    .collect();

  // BATCH QUERY 3: Get junctions efficiently using the proper index
  // Use the by_userCardId_userCollectionId index to get junctions for each userCard
  const userCardIds = allUserCards.map(uc => uc._id);
  
  const junctionPromises = userCardIds.map(userCardId =>
    ctx.db
      .query('userCardCollections')
      .withIndex('by_userCardId_userCollectionId', (q) => 
        q.eq('userCardId', userCardId)
      )
      .collect()
  );

  const junctionResults = await Promise.all(junctionPromises);
  const allJunctions = junctionResults.flat();

  // Build efficient lookup maps for in-memory joins
  const userCardsByCardId = new Map(
    allUserCards.map(uc => [uc.cardId, uc])
  );

  const collectionsById = new Map(
    userCollections.map(collection => [collection._id, collection])
  );

  const junctionsByUserCardId = new Map<Id<'userCards'>, Id<'userCollections'>[]>();
  allJunctions.forEach(junction => {
    const existing = junctionsByUserCardId.get(junction.userCardId) || [];
    existing.push(junction.userCollectionId);
    junctionsByUserCardId.set(junction.userCardId, existing);
  });

  // IN-MEMORY JOIN: Enrich each card with collection status
  const enrichedCards: CardWithCollectionStatus[] = prices.map(card => {
    const userCard = userCardsByCardId.get(card._id);
    const isOwned = !!userCard;
    
    let cardCollections: Array<Doc<'userCollections'>> = [];

    if (userCard) {
      const collectionIds = junctionsByUserCardId.get(userCard._id) || [];
      
      // Get collection details using our lookup map
      cardCollections = collectionIds
        .map(collectionId => collectionsById.get(collectionId))
        .filter((collection): collection is Doc<'userCollections'> => collection !== undefined);
    }

    return {
      ...card,
      collectionStatus: {
        isOwned,
        ownedCard: userCard,
        collections: cardCollections,
      },
    };
  });

  return enrichedCards;
}

alright, so how is this approach now @blissful robin

blissful robinBOT
#

Let me check the knowledge base for you again. One moment! :books:

#

Your latest approach is well-aligned with Convex best practices for efficient querying and relationship handling:

  • Batching queries:
    You batch-fetch all relevant userCards and userCollections for the user using efficient index scans. This is recommended for scalable queries in Convex, as it avoids the N+1 query problem and leverages indexes for performance Best Practices.

  • Efficient junction table access:
    You use the by_userCardId_userCollectionId index to fetch all junctions for each userCard in parallel. While this does result in one query per userCard, it is the most efficient way to fetch only the relevant junctions given Convex’s current index/query model, since Convex does not support or or in queries directly in .withIndex (see discussion). This avoids a full-table scan on the junction table.

  • In-memory joins:
    You build lookup maps and join the results in memory, which is the recommended pattern for handling relationships in Convex Database Relationship Helpers.

Summary:
Your function is efficient and follows Convex’s recommended patterns for scalable querying and relationship handling. The only minor caveat is that if you have a very large number of userCards, the parallel queries for junctions could become a bottleneck, but this is currently the best available pattern in Convex for this use case.

If you ever need to optimize further, consider paginating your queries or restructuring your data model if you hit Convex’s read or execution limits. But for most use cases, your approach is correct and efficient.

#

@woeful sand Kapa.ai is still learning and improving, please let me know how I did by reacting below.

woeful sand
#

about how many documents can convex handle before needing pagination? @blissful robin

blissful robinBOT
#

Let me check the knowledge base for you again. One moment! :books:

#

Convex recommends using pagination or indexes when your table could contain more than a few thousand documents. Specifically, if your table has more than a few thousand documents, you should consider paginating your queries to ensure they remain fast and do not hit system limits. This is because full table scans become slow and can hit enforced limits as your data grows. For example, Convex enforces a limit of scanning no more than 16,384 documents in a single query or mutation function execution. If you expect your tables to grow beyond this, you should implement pagination to avoid errors and performance issues Best Practices Querying performance and limits Queries that scale.

In summary:
If your table has more than a few thousand documents, you should use pagination.

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:

#

@woeful sand Kapa.ai is still learning and improving, please let me know how I did by reacting below.

woeful sand
#

just looking over the latest function now. We are querying for all userCards and userCollections now, regardless of the cards that are being passed to this function.

This is still more efficient then how i was doing it before, were looping over the cards to then query in promise.all and .map(async () => ...) but only on the cards being passed in?

@blissful robin

blissful robinBOT
#

Let me check the knowledge base for you again. One moment! :books:

#

Yes, batching queries for all of a user's userCards and userCollections is generally more efficient in Convex than issuing individual queries for each card in a loop with Promise.all or .map(async () => ...).

Here's why, based on Convex best practices:

  • Batching with indexes: When you use an indexed query to fetch all userCards for a user, Convex can efficiently scan the relevant range in the index, which is much faster and less resource-intensive than making a separate query for each card. This approach avoids the N+1 query problem and is explicitly recommended in Convex documentation and community guides Best Practices What about N+1?.

  • In-memory filtering: Even if you fetch more data than strictly necessary (e.g., all of a user's cards instead of just the subset passed in), the cost of filtering in memory is usually much lower than the cost of many round-trips to the database. Convex's architecture is designed to make these in-memory operations fast and predictable.

  • Avoiding per-item queries: Looping over the cards and querying for each one individually (even in parallel with Promise.all) results in many separate database queries, which is less efficient and can quickly hit system limits as your data grows What about N+1?.

#

In summary:
Batching and then filtering in memory is the recommended and more scalable approach in Convex, even if it means fetching a superset of the data you need. This is preferable to issuing many small queries, which is less efficient and can lead to performance issues as your data grows.

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:

#

@woeful sand Kapa.ai is still learning and improving, please let me know how I did by reacting below.

woeful sand
#

okay one more question on what i have now:

and here is my schema:

im querying all user data basically, userCards, userCollections & the junction table of userCardCollections to filter them in memory against the user cards, not really using the indexes i have setup in the schema right?

this is still more efficient then looping over just the cards being passed to this helper & querying based on the indexes? @blissful robin

blissful robinBOT
#

Let me check the knowledge base for you again. One moment! :books:

#

You are correct that in your current approach, you are querying all of the user's userCards, userCollections, and all relevant junctions, and then filtering them in memory to match the cards passed to your function. This means you are not fully leveraging the indexes for the specific set of cards, but rather for the user as a whole.

Is this more efficient than looping and querying per card?
Yes, this is still more efficient than looping over the cards and issuing a query for each one, even if those queries use indexes. The reason is that each individual query (even if indexed) incurs overhead, and making many small queries (the N+1 problem) is less efficient than a single batched query that leverages an index to fetch a contiguous range of documents. Convex's architecture is designed to make these batched, indexed queries fast and predictable, and in-memory filtering is generally much cheaper than many round-trips to the database What about N+1? Best Practices.

Are you using your indexes?
You are using your indexes to efficiently fetch all documents for a user (e.g., .withIndex('by_userId_cardId', q => q.eq('userId', userId))), but you are not using them to restrict the query to only the specific cards passed in. Convex does not currently support in or or queries directly in .withIndex, so you cannot efficiently fetch only a specific subset of cards in a single indexed query. The recommended pattern is to fetch the relevant range (all user cards for a user) and filter in memory Best Practices.

#

Summary:

  • Batching and filtering in memory is more efficient than per-card queries, even if those use indexes.
  • You are using your indexes as intended for the current capabilities of Convex.
  • If your tables grow very large, consider paginating your queries to avoid hitting read limits Querying performance and limits.

This approach is explicitly recommended in Convex’s documentation and community guides for scalable, maintainable apps.

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:

#

@woeful sand Kapa.ai is still learning and improving, please let me know how I did by reacting below.

woeful sand
#

okay so building off what we know here, this method that is being called as well within enriching the pokemon card data:

import { GenericDatabaseReader } from 'convex/server';

import { DataModel, Doc } from '../_generated/dataModel';
import batchRunner from './batchRunner';

export type CardWithLatestPricing = Doc<'pokemonCards'> &
  Pick<Doc<'pokemonCardPricing'>, 'marketPrice' | 'lastSeenDate'>;

/**
 * Get the latest pricing for a list of cards
 *
 * This is a helper function to get the latest pricing for a list of cards
 * It uses batchRunner to get the latest pricing for a list of cards in batches
 * of 300 cards to avoid exceeding the query limit
 *
 * @param db - The convex database instance
 * @param cards - The list of cards to get the latest pricing for. We pass the card document instead of just card.id as we attach the pricing on each card doc manually to not have to denormalize the data.
 */
export async function getLatestCardPricing(
  db: GenericDatabaseReader<DataModel>,
  cards: Doc<'pokemonCards'>[]
): Promise<Array<CardWithLatestPricing>> {
  const cardsWithLatestPricing: Array<CardWithLatestPricing> = [];

  await batchRunner(cards, 300, async (batchedCards) => {
    await Promise.all(
      batchedCards.map(async (card) => {
        const pricing = await db
          .query('pokemonCardPricing')
          .withIndex('by_cardId_lastSeenDate_variant', (q) =>
            q.eq('cardId', card._id)
          )
          .order('desc')
          .first();

        cardsWithLatestPricing.push({
          ...card,
          marketPrice: pricing?.marketPrice || 0,
          lastSeenDate: pricing?.lastSeenDate || new Date().toISOString(),
        });
      })
    );
  });

  return cardsWithLatestPricing;
}

is this not efficient then? the pokemonCardPricing table has millions of documents @blissful robin

blissful robinBOT
#

Let me check the knowledge base for you again. One moment! :books:

#

Your current getLatestCardPricing function is not the most efficient approach in Convex, especially given that your pokemonCardPricing table has millions of documents.

Why is this inefficient?

  • Per-card queries:
    For each card, you are issuing a separate query to the pokemonCardPricing table (even though you batch the cards, you still run one query per card in the batch). This is the classic N+1 query problem, which is less efficient than batching, especially when the underlying table is very large. Even though you use an index, each query still incurs overhead, and making many queries in parallel can quickly hit system limits or degrade performance What about N+1?.

  • Convex best practice:
    Convex recommends batching queries and using indexes to fetch contiguous ranges, then joining/filtering in memory. However, Convex does not currently support fetching the "latest" document for multiple different cardIds in a single indexed query, so the most efficient pattern available is to batch as much as possible and avoid per-item queries Best Practices.

What could be improved?

  • If you must get the latest pricing for many cards:
    The current approach (one query per card) is not ideal, but due to Convex's current query model, there is no way to fetch the "latest" pricing for multiple card IDs in a single query using indexes. The best you can do is what you are already doing: batching the work to avoid hitting function limits, but you are still making one query per card.
#
  • If you frequently need this data:
    Consider denormalizing or caching the latest price on the card document itself, or maintaining a separate table with the latest price per card, so you can fetch all the latest prices in a single indexed query. This is a common pattern in systems where "latest" or "most recent" data is frequently accessed Best Practices.

Summary

  • Your current approach is not efficient for millions of documents, but is a limitation of Convex's current query model.
  • If this becomes a bottleneck, consider denormalizing or maintaining a "latest price" table for more efficient access.

If you need more details on how to structure this, let me know!

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:

#

@woeful sand Kapa.ai is still learning and improving, please let me know how I did by reacting below.

woeful sand
#

having to denormalize every part of my data tho seems to be counter intuitive to setting up your tables & schema with the relational ids. @blissful robin

blissful robinBOT
#

Let me check the knowledge base for you again. One moment! :books:

#

You're right that denormalization can feel counter to the idea of a normalized, relational schema with references. However, in Convex (and many modern databases), denormalization is a common and recommended pattern for optimizing query performance in specific scenarios.

Convex encourages you to start with a normalized schema using relational IDs, which is great for maintainability and flexibility. But, as your app grows and you encounter performance bottlenecks—especially with queries that need to aggregate or fetch the "latest" related data across large tables—denormalizing certain fields or maintaining summary tables becomes necessary for efficiency.

This is explicitly mentioned in Convex best practices:

I personally prefer to start with a fully normalized data model, and only denormalize as performance requires.
Discord: Normalization Best Practices

And for optimizing queries:

If there’s a specific query you need to make faster, you can consider storing extra fields on your document to improve query performance... As a general pattern, you can store booleans or other types on your documents to speed up lookup by those properties.
Denormalizing properties

So, the recommended approach is to design your schema relationally, and then selectively denormalize only the parts that are proven bottlenecks for your access patterns. This hybrid approach gives you both maintainability and performance as your app scales.

If you have further questions about when or how to denormalize in Convex, feel free to ask!

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:

#

@woeful sand Kapa.ai is still learning and improving, please let me know how I did by reacting below.

woeful sand
#

these batches or querying of cards should only be about a couple hundred of cards with how my program is set up right now, so sounds like what im doing shouldn't be a problem? or should i store the latest pricing on the card itself @blissful robin

blissful robinBOT
#

Let me check the knowledge base for you again. One moment! :books:

#

If your batches are only a couple hundred cards at a time, your current approach should be fine and not cause performance issues. Convex is designed to handle queries and mutations working with less than a few hundred records efficiently, and this is explicitly recommended in the Zen of Convex:

In general, your mutations and queries should be working with less than a few hundred records and should aim to finish in less than 100ms. It's nearly impossible to maintain a snappy, responsive app if your synchronous transactions involve a lot more work than this.

Denormalizing (storing the latest pricing on the card itself) is only necessary if you start to see performance problems or if your access patterns change and you need to fetch thousands of records at once. The Convex best practice is to start normalized and only denormalize as performance requires Denormalizing properties.

Summary:
With only a couple hundred cards per batch, your current approach is appropriate and you do not need to denormalize the latest pricing onto the card unless you encounter performance issues as your data grows.

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:

#

@woeful sand Kapa.ai is still learning and improving, please let me know how I did by reacting below.

woeful sand
#
import { paginationOptsValidator } from 'convex/server';
import { Infer, v } from 'convex/values';

import { QueryCtx } from '../../../_generated/server';
import { populateUserCardsWithDetailsAndPricing } from '../../../utils/populateUserCardsWithDetailsAndPricing';

export const GetUserCardsQueryArgs = v.object({
  userId: v.id('users'),
  pagination: paginationOptsValidator,
});

export const getUserCardsHandler = async (
  ctx: QueryCtx,
  args: Infer<typeof GetUserCardsQueryArgs>
) => {
  const query = ctx.db
    .query('userCards')
    .withIndex('by_userId_cardId', (q) => q.eq('userId', args.userId))
    .order('desc');

  const paginatedUserCards = await query.paginate(args.pagination);
  const { page, ...rest } = paginatedUserCards;

  const cards = await populateUserCardsWithDetailsAndPricing(ctx.db, page);

  return {
    cards,
    ...rest,
  };
};
import { Doc } from '../_generated/dataModel';
import { QueryCtx } from '../_generated/server';
import { getPricingByCardId } from './getCardPricing';

/**
 * populate a list of user cards with their relational data
 *
 * - `pokemonCards` table details
 * - `pokemonCardsPricing` table details (latest pricing doc)
 */
export async function populateUserCardsWithDetailsAndPricing(
  db: QueryCtx['db'],
  userCards: Array<Doc<'userCards'>>
) {
  const populatedCards = await Promise.all(
    userCards.map(async (card) => {
      const cardDetails = await db.get(card.cardId);
      if (!cardDetails) return null;

      const latestPricing = await getPricingByCardId(db, cardDetails._id);

      return {
        userCard: card,
        pokemonCard: cardDetails,
        pricing: latestPricing,
      };
    })
  );

  const cardsWithLatestPricing = populatedCards.filter((card) => card !== null);

  return cardsWithLatestPricing;
}
export async function getPricingByCardId(
  db: QueryCtx['db'],
  cardId: Id<'pokemonCards'>
) {
  return await db
    .query('pokemonCardPricing')
    .withIndex('by_cardId_lastSeenDate_variant', (q) => q.eq('cardId', cardId))
    .order('desc')
    .first();
}

looking at something like this, if its paginated to less than a couple hundred at a time, is this efficient enough? compared to some of the other queries we just wrote.

If i had pagination to all queries and loop over in an array where pagination wouldnt be more than a couple hundred MAX to loop over, does that make performance better? @blissful robin

blissful robinBOT
#

Let me check the knowledge base for you again. One moment! :books:

#

Yes, if your pagination ensures that you are only ever working with a couple hundred records at a time, your current approach is efficient and aligns with Convex best practices.

Convex is designed to handle queries and mutations that work with less than a few hundred records efficiently. The documentation and community guidance repeatedly state that as long as you are not scanning or processing thousands of documents in a single function, you are within the recommended usage patterns. For example:

If your table could contain more than a few thousand documents, you should consider pagination or an index with a range expression to ensure that your queries stay fast.
Best Practices: Use indexes or paginate all large database queries

In general, your mutations and queries should be working with less than a few hundred records and should aim to finish in less than 100ms.
The Zen of Convex

Paginating all queries and only looping over a couple hundred items at a time is the recommended way to keep performance predictable and efficient in Convex. This is preferable to unbounded queries or full table scans, and is the same pattern used in the efficient examples from the knowledge sources (Queries that scale).

Summary:

  • Your approach is efficient for paginated queries with a couple hundred items.
  • This is in line with Convex’s best practices and should not cause performance issues.
  • If your data grows, always keep queries paginated and avoid unbounded loops or full table scans.
#

If you ever need to process more than a few hundred records, consider further batching or denormalization, but for your current usage, you are following Convex’s recommended patterns.

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:

#

@woeful sand Kapa.ai is still learning and improving, please let me know how I did by reacting below.

woeful sand
#

so even if im querying the db for each card separately here in that getPricebyCardId, it would still be conisdered best practice since im using pagination @blissful robin

blissful robinBOT
#

Let me check the knowledge base for you again. One moment! :books:

#

Yes, if you are paginating so that you only process a couple hundred cards at a time, querying the database for each card separately (as in your getPricingByCardId function) is considered acceptable and aligns with Convex best practices. Convex is designed to efficiently handle queries and mutations that work with less than a few hundred records at a time, and this is explicitly recommended in the documentation:

In general, your mutations and queries should be working with less than a few hundred records and should aim to finish in less than 100ms.
The Zen of Convex

While batching is always preferred when possible, Convex’s architecture makes per-item queries much less costly than in traditional databases, especially when the total number of queries is kept small by pagination. This is also discussed in the context of the N+1 problem, where Convex’s close-to-database execution and efficient point queries make this pattern reasonable for small batches What about N+1?.

Summary:

  • Querying per card is fine if you are paginating to a few hundred items or less.
  • This is in line with Convex’s best practices and should not cause performance issues for small batches.

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:

#

@woeful sand Kapa.ai is still learning and improving, please let me know how I did by reacting below.

woeful sand
#

So update this then to show me the difference with pagination involved. does that mean we can efficiently querying now by indexes, to not load all userCard docs, userCollection docs AND userCardCollections doc into memory ?

if we are paginating for under 300 cards

#

So update this then to show me the difference with pagination involved. does that mean we can efficiently querying now by indexes, to not load all userCard docs, userCollection docs AND userCardCollections doc into memory ?

if we are paginating for under 300 cards
@blissful robin

blissful robinBOT
#

Let me check the knowledge base for you again. One moment! :books:

#

If you paginate your queries so that you are only working with under 300 cards at a time, you can and should use indexes to efficiently fetch only the relevant documents for that page, rather than loading all userCards, userCollections, and userCardCollections into memory.

Here’s how pagination changes your approach:

  • userCards:
    Instead of loading all userCards for a user, you use an indexed, paginated query to fetch only the current page of userCards:

    const paginatedUserCards = await ctx.db
      .query('userCards')
      .withIndex('by_userId_cardId', (q) => q.eq('userId', userId))
      .order('desc')
      .paginate(args.pagination);
    const { page: userCardsPage } = paginatedUserCards;
    

    This ensures you only have (for example) 100–300 userCards in memory at a time, not all of them.

  • userCardCollections (junction table):
    For the userCards in the current page, you can fetch the relevant junctions by querying for each userCard’s ID using the appropriate index. Since you have a small number of userCards (due to pagination), this is efficient:

    const userCardIds = userCardsPage.map(uc => uc._id);
    const junctions = await Promise.all(
      userCardIds.map(userCardId =>
        ctx.db
          .query('userCardCollections')
          .withIndex('by_userCardId_userCollectionId', (q) => q.eq('userCardId', userCardId))
          .collect()
      )
    );
    const allJunctions = junctions.flat();
    

    This avoids loading all junctions for the user, and only loads those relevant to the current page.

#
  • userCollections:
    After collecting the relevant userCollection IDs from the junctions, you can fetch just those collections:
    const collectionIds = [...new Set(allJunctions.map(j => j.userCollectionId))];
    const collections = await Promise.all(collectionIds.map(id => ctx.db.get(id)));
    
    Again, this only loads the collections needed for the current page.

Summary:
With pagination, you use indexes to efficiently fetch only the documents needed for the current page, rather than loading all related documents into memory. This is the recommended and scalable approach in Convex, and is explicitly supported by the API and best practices documentation Best Practices Paginated Queries.

This pattern is efficient and scalable for up to a few hundred records per page, and is preferable to loading all related documents at once.

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:

#

@woeful sand Kapa.ai is still learning and improving, please let me know how I did by reacting below.

woeful sand
#

Is pagination correctly utilized here? and is using the enrichPokemonCards fn that we have talked about used properly along with pagination @blissful robin

blissful robinBOT
#

Let me check the knowledge base for you again. One moment! :books:

#

Yes, pagination is correctly utilized in your example, and using the enrichPokemonCards function in conjunction with pagination is an appropriate and recommended pattern in Convex.

Why is this correct?

  • Pagination on userCards:
    You are using .paginate(args.pagination) on your indexed query for userCards, which ensures that you only fetch a limited number of documents (e.g., under 300) per request. This is explicitly recommended in Convex documentation for efficient querying and to avoid loading too many documents into memory at once Best Practices Paginated Queries.

  • Enrichment after pagination:
    After paginating, you pass only the current page of userCards to your enrichPokemonCards (or similar enrichment) function. This means all subsequent lookups and joins (for details, collections, pricing, etc.) are performed only for the small, paginated subset, not for the entire dataset. This is the pattern shown in Convex’s documentation and community examples for scalable, efficient data enrichment Paginated Queries Queries that scale.

  • Efficient use of indexes:
    By using indexes for your queries and limiting the number of records processed at each step, you are following Convex’s guidance for scalable and performant applications.

Summary:

  • Pagination is correctly used to limit the number of userCards processed at a time.
  • The enrichment function is used efficiently, operating only on the paginated subset.
  • This approach is explicitly recommended in Convex’s documentation and best practices.
#

If you continue to paginate all queries and only process a few hundred records at a time, your approach will remain efficient and scalable in Convex.

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:

#

@woeful sand Kapa.ai is still learning and improving, please let me know how I did by reacting below.

woeful sand
#
import { Doc } from '../_generated/dataModel';
import { QueryCtx } from '../_generated/server';
import { getAuthUserInternal } from '../auth';
import { getLatestCardPricing } from './getCardPricing';

export type CardWithCollectionStatus = Doc<'pokemonCards'> & {
  collectionStatus?: {
    ownedCard: Doc<'userCards'> | null;
    collections: Array<Doc<'userCollections'>>;
  };
};

export async function enrichPokemonCards(
  ctx: QueryCtx,
  cards: Array<Doc<'pokemonCards'>>
) {
  // pricing enrichment
  const prices = await getLatestCardPricing(ctx.db, cards);

  // collection enrichment
  const auth = await getAuthUserInternal(ctx);
  if (auth.isErr) {
    return prices;
  }

  if (!auth.value?._id) {
    return prices;
  }

  const enrichedCards: Array<CardWithCollectionStatus> = await Promise.all(
    prices.map(async (card) => {
      const ownedCard = await ctx.db
        .query('userCards')
        .withIndex('by_userId_cardId', (q) =>
          q.eq('userId', auth.value._id).eq('cardId', card._id)
        )
        .first();

      const collections = (
        ownedCard
          ? await ctx.db
              .query('userCardCollections')
              .withIndex('by_userCardId_userCollectionId', (q) =>
                q.eq('userCardId', ownedCard._id)
              )
              .collect()
              .then(
                async (results) =>
                  await Promise.all(
                    results.map(async (res) => {
                      const collection = await ctx.db.get(res.userCollectionId);
                      if (!collection) return null;

                      return collection;
                    })
                  )
              )
          : []
      ).filter((col) => col !== null);

      return {
        ...card,
        collectionStatus: {
          ownedCard,
          collections,
        },
      };
    })
  );

  return enrichedCards;
}

alright so knowing what we know with about paginating the queries, so this enrichPokemonCards function should only be getting a subset (under 1k cards). Is this approach valid again now? @blissful robin

blissful robinBOT
#

Let me check the knowledge base for you again. One moment! :books:

#

Yes, your approach is valid and acceptable in Convex as long as the enrichPokemonCards function is only ever called with a paginated subset of cards (e.g., under 1,000, but ideally a few hundred or less per request). This pattern—where you perform per-card lookups and enrichment after pagination—is explicitly supported and recommended in Convex’s documentation and community guidance:

  • Convex is designed to efficiently handle queries and mutations that work with less than a few hundred records at a time. The documentation states:

    In general, your mutations and queries should be working with less than a few hundred records and should aim to finish in less than 100ms.
    The Zen of Convex

  • Paginating your queries before enrichment ensures you are not loading or processing too many documents at once, which is the recommended way to keep performance predictable and efficient in Convex Best Practices Paginated Queries.

  • While batching is generally preferred over per-item queries, Convex’s architecture makes per-item lookups reasonable for small batches, especially when you are paginating and not exceeding a few hundred records per function call What about N+1?.

Summary:

  • Your function is valid and efficient as long as it operates on a paginated subset of cards (ideally a few hundred or less).
  • This approach is in line with Convex’s best practices and should not cause performance issues for small batches.
#

If your data or access patterns change and you need to process larger sets, consider further batching or denormalization, but for your current usage, your approach is correct.

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:

#

@woeful sand Kapa.ai is still learning and improving, please let me know how I did by reacting below.

woeful sand
#
import { Doc } from '../../_generated/dataModel';
import { QueryCtx } from '../../_generated/server';
import { getAuthUserInternal } from '../../auth';
import batchRunner from '../../utils/batchRunner';
import {
  CardWithLatestPricing,
  getLatestCardPricing,
} from '../../utils/getCardPricing';

export type EnrichedCardWithPricingAndCollectionData = CardWithLatestPricing & {
  collectionStatus?: {
    ownedCard: Doc<'userCards'> | null;
    collections: Array<Doc<'userCollections'>>;
  };
};

export async function enrichPokemonCards(
  ctx: QueryCtx,
  cards: Array<Doc<'pokemonCards'>>
) {
  if (cards.length > 1000)
    console.warn(
      '[enrichPokemonCards] Warning. Cards array being passed to this function is over 1k. This should be used with pagination as its using N+1 requests in simple relational queries. This follows best practices for paginated queries that are less than 1k documents, consider reworking the caller of this function to use pagination or optimize `enrichPokemonCards()`'
    );

  // pricing enrichment
  const cardsWithPricing = await getLatestCardPricing(ctx.db, cards);

  // user collection enrichment

  const auth = await getAuthUserInternal(ctx);

  // no auth user - just return the cards with pricing & no collection data
  if (auth.isErr || !auth.value?._id) {
    return cardsWithPricing;
  }
  const authUserId = auth.value._id;

  const enrichedCards: Array<EnrichedCardWithPricingAndCollectionData> = [];

  await batchRunner(cardsWithPricing, 300, async (batch) => {
    await Promise.all(
      batch.map(async (card) => {
        const ownedCard = await ctx.db
          .query('userCards')
          .withIndex('by_userId_cardId', (q) =>
            q.eq('userId', authUserId).eq('cardId', card._id)
          )
          .first();

        const collections = (
          ownedCard
            ? await ctx.db
                .query('userCardCollections')
                .withIndex('by_userCardId_userCollectionId', (q) =>
                  q.eq('userCardId', ownedCard._id)
                )
                .collect()
                .then(
                  async (results) =>
                    await Promise.all(
                      results.map(async (res) => {
                        const collection = await ctx.db.get(
                          res.userCollectionId
                        );
                        if (!collection) return null;

                        return collection;
                      })
                    )
                )
            : []
        ).filter((col) => col !== null);

        const enrichedCard = {
          ...card,
          collectionStatus: {
            ownedCard,
            collections,
          },
        };

        enrichedCards.push(enrichedCard);
      })
    );
  });

  return enrichedCards;
}

updated it a bit to use the batchRunner:

/**
 * Runs an async function over an array in batches to stay within Convex limits
 * @param array The array to process
 * @param batchSize The size of each batch (default: 250)
 * @param fn The async function to run on each batch
 */
const batchRunner = async <T>(
  array: T[],
  batchSize: number = 250,
  fn: (
    batch: T[],
    batchInfo: {
      batchIndex: number;
      totalBatches: number;
      startIndex: number;
      endIndex: number;
    }
  ) => Promise<void>
) => {
  const totalBatches = Math.ceil(array.length / batchSize);

  for (let i = 0; i < array.length; i += batchSize) {
    const batch = array.slice(i, i + batchSize);
    const batchIndex = Math.floor(i / batchSize);

    await fn(batch, {
      batchIndex,
      totalBatches,
      startIndex: i,
      endIndex: Math.min(i + batchSize, array.length),
    });
  }
};

export default batchRunner;

Now is this efficient for paginated queries, or cards less than 1k items?

#
import { Doc } from '../../_generated/dataModel';
import { QueryCtx } from '../../_generated/server';
import { getAuthUserInternal } from '../../auth';
import batchRunner from '../../utils/batchRunner';
import {
  CardWithLatestPricing,
  getLatestCardPricing,
} from '../../utils/getCardPricing';

export type EnrichedCardWithPricingAndCollectionData = CardWithLatestPricing & {
  collectionStatus?: {
    ownedCard: Doc<'userCards'> | null;
    collections: Array<Doc<'userCollections'>>;
  };
};

export async function enrichPokemonCards(
  ctx: QueryCtx,
  cards: Array<Doc<'pokemonCards'>>
) {
  if (cards.length > 1000)
    console.warn(
      '[enrichPokemonCards] Warning. Cards array being passed to this function is over 1k. This should be used with pagination as its using N+1 requests in simple relational queries. This follows best practices for paginated queries that are less than 1k documents, consider reworking the caller of this function to use pagination or optimize `enrichPokemonCards()`'
    );

  // pricing enrichment
  const cardsWithPricing = await getLatestCardPricing(ctx.db, cards);

  // user collection enrichment

  const auth = await getAuthUserInternal(ctx);

  // no auth user - just return the cards with pricing & no collection data
  if (auth.isErr || !auth.value?._id) {
    return cardsWithPricing;
  }
  const authUserId = auth.value._id;

  const enrichedCards: Array<EnrichedCardWithPricingAndCollectionData> = [];

  await batchRunner(cardsWithPricing, 300, async (batch) => {
    await Promise.all(
      batch.map(async (card) => {
        const ownedCard = await ctx.db
          .query('userCards')
          .withIndex('by_userId_cardId', (q) =>
            q.eq('userId', authUserId).eq('cardId', card._id)
          )
          .first();

        const collections = (
          ownedCard
            ? await ctx.db
                .query('userCardCollections')
                .withIndex('by_userCardId_userCollectionId', (q) =>
                  q.eq('userCardId', ownedCard._id)
                )
                .collect()
                .then(
                  async (results) =>
                    await Promise.all(
                      results.map(async (res) => {
                        const collection = await ctx.db.get(
                          res.userCollectionId
                        );
                        if (!collection) return null;

                        return collection;
                      })
                    )
                )
            : []
        ).filter((col) => col !== null);

        const enrichedCard = {
          ...card,
          collectionStatus: {
            ownedCard,
            collections,
          },
        };

        enrichedCards.push(enrichedCard);
      })
    );
  });

  return enrichedCards;
}

updated it a bit to use the batchRunner:

/**
 * Runs an async function over an array in batches to stay within Convex limits
 * @param array The array to process
 * @param batchSize The size of each batch (default: 250)
 * @param fn The async function to run on each batch
 */
const batchRunner = async <T>(
  array: T[],
  batchSize: number = 250,
  fn: (
    batch: T[],
    batchInfo: {
      batchIndex: number;
      totalBatches: number;
      startIndex: number;
      endIndex: number;
    }
  ) => Promise<void>
) => {
  const totalBatches = Math.ceil(array.length / batchSize);

  for (let i = 0; i < array.length; i += batchSize) {
    const batch = array.slice(i, i + batchSize);
    const batchIndex = Math.floor(i / batchSize);

    await fn(batch, {
      batchIndex,
      totalBatches,
      startIndex: i,
      endIndex: Math.min(i + batchSize, array.length),
    });
  }
};

export default batchRunner;

Now is this efficient for paginated queries, or cards less than 1k items? @blissful robin

blissful robinBOT
#

Let me check the knowledge base for you again. One moment! :books:

#

Yes, your approach using batchRunner to process cards in batches (e.g., 300 at a time) is efficient and aligns with Convex best practices for paginated queries or when working with less than 1,000 items.

Why is this efficient?

  • Batching keeps you within Convex’s function and read limits: Convex recommends that queries and mutations work with less than a few hundred records at a time, and batching ensures you don’t exceed these limits, even if the input array is close to 1,000 items The Zen of Convex.
  • Pagination and batching are explicitly recommended: Convex documentation and community guidance state that for large tables or result sets, you should use pagination or batching to avoid performance issues and stay within system limits Best Practices Paginated Queries.
  • N+1 pattern is acceptable for small batches: While your function does use an N+1 pattern (one query per card for enrichment), Convex’s architecture makes this reasonable for small batches, especially when you are paginating and batching the work What about N+1?.

Summary:

  • Your use of batchRunner to process cards in batches of 300 is efficient and safe for paginated queries or when working with less than 1,000 items.
  • This approach is in line with Convex’s best practices and should not cause performance issues for small batches.
#

If your data or access patterns change and you need to process larger sets, consider further batching, pagination, or denormalization, but for your current usage, your approach is correct and efficient.

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon: