#Join

14 messages · Page 1 of 1 (latest)

heavy basin
#

const profileTagsWithRank = await ctx.db
.query("profileTags")
.withIndex("by_tag", (q) => q.eq("tag", tag.slug))
.join("profile", (q) => q.db.get(q.field("profile")))
.filter((q) => q.neq(q.field("profile"), null))
.order("asc", (q) => q.field("profile.rank"))
.take(4);

How can I do this properly?

plain barnBOT
#

Hi @heavy basin. I'm kapa.ai, your friendly support bot. I will try to answer your question by searching through the knowledge base. :wave:

#

It looks like you're trying to join profile data with profile tags and order by a rank field. There are a few issues with your current approach. Convex doesn't support the kind of direct join operation you're attempting with the .join() method, and the ordering by a nested field (profile.rank) isn't supported either.

Here's how you can properly implement this query in Convex:

// First, query the profileTags with the given tag
const profileTagsWithTag = await ctx.db
  .query("profileTags")
  .withIndex("by_tag", (q) => q.eq("tag", tag.slug))
  .collect();

// Then, fetch the associated profiles and combine the data
const profilesWithRank = await Promise.all(
  profileTagsWithTag.map(async (profileTag) => {
    const profile = await ctx.db.get(profileTag.profile);
    if (profile === null) return null;
    return { ...profileTag, profile };
  })
);

// Filter out any null profiles
const validProfiles = profilesWithRank.filter(item => item !== null);

// Sort by the profile's rank
validProfiles.sort((a, b) => a.profile.rank - b.profile.rank);

// Take only the first 4 results
const topFourProfiles = validProfiles.slice(0, 4);

This approach follows Convex's recommended pattern for joining data from related tables:

  1. First, query one table using an index for efficiency Introduction to Indexes and Query Performance
  2. Then, use Promise.all with map to fetch the related documents Database Relationship Helpers
  3. Finally, perform any additional filtering, sorting, and limiting in JavaScript
#

For more complex relationships, you might also consider using the getManyVia helper from the convex-helpers package, which simplifies this pattern Database Relationship Helpers.

If you need to paginate this query for larger datasets, you would need to implement a different approach using cursor-based pagination.

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:

#

@heavy basin Kapa.ai is still learning and improving, please let me know how I did by reacting below.

heavy basin
#

@plain barn I have 500,000 records this won't work. How can I make this efficent?

plain barnBOT
#

Let me check the knowledge base for you again. One moment! :books:

#

You're right - with 500,000 records, the approach I suggested won't work efficiently due to Convex's limits on how many documents can be processed in a single query. Convex has an 8MB limit for data read in a single function execution, and you'll hit this limit with large datasets.

Here are more efficient approaches for handling large datasets in Convex:

1. Implement proper pagination

Instead of trying to fetch all records at once, implement pagination:

export const getProfilesByTag = query({
  args: { tag: v.string(), cursor: v.optional(v.string()) },
  handler: async (ctx, args) => {
    return await ctx.db
      .query("profileTags")
      .withIndex("by_tag", (q) => q.eq("tag", args.tag))
      .paginate({ cursor: args.cursor });
  }
});

Then on the client side, you can load pages as needed and fetch the related profiles for each page.

2. Use offset-based pagination with the aggregate component

For jumping to specific pages, you can use the offset-based pagination from the aggregate component:

// Using the @convex-dev/aggregate package
const { key } = await profileTags.at(ctx, offset, { 
  namespace: tag.slug 
});

Offset-based Pagination

3. Maintain counters on write

Instead of counting on read, maintain counters when documents are created or modified:

// When adding a new profile tag
export const addProfileTag = mutation({
  handler: async (ctx, args) => {
    // Add the tag
    await ctx.db.insert("profileTags", {...});
    
    // Increment the counter for this tag
    await ctx.db.patch(counterDocId, {
      count: { $increment: 1 }
    });
  }
});

4. Consider denormalizing data

#

For your specific use case, you might want to denormalize some profile data into the profileTags table to avoid joins altogether:

// Store rank directly in profileTags
const profileTagsSchema = defineTable({
  tag: v.string(),
  profile: v.id("profiles"),
  profileRank: v.number(), // Denormalized from profile
  // other fields...
}).index("by_tag_rank", ["tag", "profileRank"]);

Then your query becomes much simpler:

const topProfiles = await ctx.db
  .query("profileTags")
  .withIndex("by_tag_rank", (q) => 
    q.eq("tag", tag.slug)
  )
  .order("asc")
  .take(4);

5. Stream query for processing large datasets

For operations like backfilling or migrations, use streaming:

// In an action (not a query or mutation)
for await (const doc of ctx.db.query("profileTags").withIndex("by_tag")) {
  // Process in batches
  // Be careful not to exceed function limits
}

Remember that with 500,000 records, you'll need to be very careful about how you structure your queries and indexes to avoid hitting Convex's limits.

Queries that scale

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:

#

@heavy basin Kapa.ai is still learning and improving, please let me know how I did by reacting below.

heavy basin
#

@plain barn Can I use the aggregate component to help with this?

plain barnBOT
#

Let me check the knowledge base for you again. One moment! :books:

#

Yes, the aggregate component can definitely help with your scenario of handling 500,000 records efficiently. It's designed specifically for cases like yours where you need to work with large datasets without hitting Convex's 8MB read limit.

Here's how you can use the aggregate component for your use case:

Using Aggregate for Offset-Based Pagination

The aggregate component allows you to implement offset-based pagination, which is perfect for your situation:

import { TableAggregate } from "@convex-dev/aggregate";
import { components } from "./_generated/api";
import { DataModel } from "./_generated/dataModel";

// Define your aggregate
const profileTagsAggregate = new TableAggregate<{
  Namespace: string; // tag slug
  Key: number; // profile rank
  DataModel: DataModel;
  TableName: "profileTags";
}>(components.aggregate, {
  namespace: (doc) => doc.tag,
  sortKey: (doc) => doc.profileRank, // Assuming you've denormalized the rank
});

// Query function with pagination
export const getProfilesByTagPaginated = query({
  args: { tag: v.string(), offset: v.number(), numItems: v.number() },
  handler: async (ctx, { tag, offset, numItems }) => {
    // Get the key at the specified offset
    const { key } = await profileTagsAggregate.at(ctx, offset, { namespace: tag });
    
    // Use that key to paginate
    return await ctx.db.query("profileTags")
      .withIndex("by_tag_rank", q => 
        q.eq("tag", tag).gte("profileRank", key)
      )
      .take(numItems);
  },
});

This approach allows you to jump to any page in your results without having to scan through all previous records, making it much more efficient for large datasets.

Setting Up the Aggregate Component

  1. First, install the component:
    npm install @convex-dev/aggregate
    
#
  1. Create a convex.config.ts file in your convex/ folder:

    import { defineApp } from "convex/server";
    import aggregate from "@convex-dev/aggregate/convex.config";
    
    const app = defineApp();
    app.use(aggregate);
    export default app;
    
  2. Set up triggers to keep the aggregate in sync with your data:

    import { Triggers } from "convex-helpers/server/triggers";
    
    const triggers = new Triggers<DataModel>();
    triggers.register("profileTags", profileTagsAggregate.trigger());
    
    export const wrapDB = (ctx) => triggers.wrapDB(ctx);
    
  3. Backfill existing data (in batches to avoid the 8MB limit):

    export const backfillAggregates = internalMutation({
      handler: async (ctx) => {
        await profileTagsAggregate.clear(ctx);
        // Schedule a backfill job that processes in batches
      },
    });
    

The aggregate component is specifically designed to handle large datasets efficiently by maintaining a denormalized data structure that allows for fast counting, summing, and offset-based access without having to scan the entire table.

Aggregate Component
Offset-based Pagination

Remember that for this to work optimally, you should denormalize the profile rank into your profileTags table as mentioned in our previous conversation.

Note: you can ask me a follow up question by @ mentioning me again :speech_balloon: