#[Aggregate]: `aggregate.at` exceedingly slow?


frank cradle
#

Hey, we are using the shuffle example at https://github.com/get-convex/aggregate/blob/main/example/convex/shuffle.ts but are finding the aggregate.at(ctx, index) to be exceedingly slow.

See the snippet below: the Promise.all takes almost a full second (measured with console.time) for just 30 elements. Is this expected?

    // indexes has length = 30
    const indexes = allIndexes.slice(offset, offset + numItems);
    console.time("atIndexes");
    const atIndexes = await Promise.all(
      indexes.map((i) => profileSubmissionAggregate.at(ctx, i)),
    );
    console.timeEnd("atIndexes"); // ~1000 milliseconds

inland slate
#

I think the answer here would be to expose a batched at.
Currently every call to ctx.run* spins up a new v8 isolate - each call to a component runs in its own environment which can add ~50ms, which in a batch like this can get quite noticeable! Can you file an issue on the repo? Hopefully can get that out as early as next week

slim ginkgo
#

yeah, I've needed this too! batch operations for the aggregate component

#

very ➕ on this feature

frank cradle
#

This problem is rearing its head now in another function that calls an aggregate inside Promise.all. The code below on an array of length 170 takes 4000 ms 😱 (measured by console.time)

    await Promise.all(
      nonRejected.map(async (video) => {
        const watchCount = await watchPercentageByVideo.count(ctx, {
          namespace: video._id,
          bounds,
        });
        watchCountMap.push({ videoId: video._id, watchCount });
      }),
    );
#

This is a pretty hefty aggregate, however, as it's a TableAggregate on a table with half a million documents. Since I mostly use it for counts, maybe setting rootLazy to false would make them quicker here?

inland slate
#

The actual count is fast, but spinning up 170 separate v8 isolates is what takes time, and it's possible those are serialized across the component boundary for determinism/serializability reasons.
I'll bump this up on my backlog, thanks for chiming in
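
As a rough sanity check, the two timings reported in this thread can be turned into an implied per-call cost. This is only a back-of-the-envelope sketch: it assumes the component calls effectively run one after another (which is a guess above, not a confirmed fact), and `perCallMs` is a helper name made up for the illustration.

```typescript
// Implied per-call cost from the timings reported in this thread.
// Assumption: calls are effectively serialized, so total time ~ calls * per-call cost.
function perCallMs(totalMs: number, calls: number): number {
  if (calls <= 0) throw new Error("calls must be positive");
  return totalMs / calls;
}

const atCost = perCallMs(1000, 30); // 30 x aggregate.at, ~1000 ms total
const countCost = perCallMs(4000, 170); // 170 x count, ~4000 ms total

console.log(atCost.toFixed(1), countCost.toFixed(1)); // prints "33.3 23.5"
```

Both figures land in the tens of milliseconds per call, the same order of magnitude as the ~50ms overhead mentioned earlier, which is consistent with fixed per-call overhead dominating rather than the count itself.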

frank cradle
#

Hey @inland slate, any idea on the approx timeline of this? We still have sad users in production with >4 sec execution times on simple queries! 🥲

inland slate
#

I kicked off a coding agent to implement it, and it's making surprisingly good progress (I have mostly been let down by them so far).
You can follow along here: https://github.com/get-convex/aggregate/pull/43
And there's already an npm package you can test out:
npm i https://pkg.pr.new/get-convex/aggregate/@convex-dev/aggregate@43
If you could do a test of that in dev with representative data, that would be a big relief to make sure it solves the problem


frank cradle
#

Thanks Ian! I installed the package and tried it out, but it seems to be lacking support for passing different namespaces to the array of arguments?

At the moment, you can vary the bounds:

const queries = [
  { bounds: { lower: { key: 0 }, upper: { key: 10 } } },
  { bounds: { lower: { key: 10 }, upper: { key: 20 } } }
];
await aggregate.batchCount(ctx, { queries }); // works

But not differing namespaces:

  const bounds = { ... };

  const queries = [
    {
      bounds,
      namespace: video1._id,
    },
    {
      bounds,
      namespace: video2._id,
    },
  ];
  await aggregate.batchCount(ctx, { queries }); // doesn't work

I am using the latter pattern: I keep bounds constant and vary the namespace for each call to .count that I want batched. I guess the API should support both patterns, and combinations of the two.

#

Looking briefly at the implementation of batchCount, maybe the approach to support different namespaces is something like this?

export async function batchCount(
  ctx: RunQueryCtx,
  ...opts: NamespacedOpts<{ query: { bounds?: Bounds<K, ID> } }, Namespace>[]
): Promise<number[]> {
  const queryArgs: { k1?: Position; k2?: Position; namespace: Namespace }[] =
    opts.map((opt) => {
      const namespace = namespaceFromOpts(opt);
      const query = opt[0]!.query;

      const { k1, k2 } = boundsToPositions(query.bounds);

      const queryArg = {
        k1,
        k2,
        namespace,
      };
      return queryArg;
    });

  const results = await ctx.runQuery(
    this.component.btree.aggregateBetweenBatch,
    {
      queries: queryArgs,
    },
  );

  return results.map((result: { count: number }) => result.count);
}
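
On the calling side, the earlier Promise.all loop could then collapse into one query list plus a zip of the results. The sketch below only covers that plain-data part and is not tied to the real component: `Bounds`, `CountQuery`, `buildQueries` and `zipCounts` are hypothetical names for illustration, the counts are made-up placeholders, and it assumes a batch call returns counts in the same order as the queries.

```typescript
// Hypothetical shapes for illustration; not the component's actual types.
type Bounds = { lower: { key: number }; upper: { key: number } };
type CountQuery = { namespace: string; bounds: Bounds };

// One query per video: bounds stay constant, the namespace varies.
function buildQueries(videoIds: string[], bounds: Bounds): CountQuery[] {
  return videoIds.map((id) => ({ namespace: id, bounds }));
}

// Pair each count with its video, assuming results come back in query order.
function zipCounts(
  videoIds: string[],
  counts: number[],
): { videoId: string; watchCount: number }[] {
  if (videoIds.length !== counts.length) {
    throw new Error("expected one count per query");
  }
  return videoIds.map((videoId, i) => ({ videoId, watchCount: counts[i]! }));
}

const videoIds = ["video1", "video2"];
const queries = buildQueries(videoIds, {
  lower: { key: 0 },
  upper: { key: 10 },
});

// In a real handler these would come from a single batched component call
// that accepts per-query namespaces (assumed here, not confirmed to exist).
const counts = [12, 7]; // placeholder values, made up for the example
const watchCountMap = zipCounts(videoIds, counts);
```

The point of the design is that the component boundary is crossed once for the whole list instead of once per video.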