#Get the count of the query without using collect()

21 messages · Page 1 of 1 (latest)

vocal laurel
#

Hi, I'm curious to learn if there is a way to get the total count of the particular query, not paginated, just regular query.
I can't find anything in the docs nor here in the conversations.

Thank you!

burnt rapids
buoyant wedge
#

There isn’t currently, you have to collect and count. If you need this, I’d recommend storing any counts explicitly in a table, use a migration to get initial count(s) and increment/decrement from there. Not great, but possible.

vocal laurel
#

@burnt rapids yes, total count of the query.
@buoyant wedge thank you, i'm doing that which is not great.
As a matter of fact Firebase added this capability not long ago, as the demand was very high.

buoyant wedge
#

cc/ @last finch (we need a way to at-mention convex team btw)

#

@vocal laurel curious, can you share a little more on what your use case is?

last finch
#

@vocal laurel i hear you- there has been some interest in both full table counts and query count. one related thread with some thinking: https://discord.com/channels/1019350475847499849/1166106601543962715
One thing that is worth considering is that when you do a count of a query in SQL, unless it’s a full table count or something it specifically keeps statistics on, it has to scan all the related rows to get the count. So it’s just a more bandwidth-efficient version of .collect().length. However if you have 10k+ documents in the query, sql will just take longer, where Convex will currently refuse to read that much. That’s why Firebase didn’t have it for so long, and why their docs say

scale proportionally to the number of index entries scanned. Latency increases with the number of items in the aggregation.

for firebase, it’s especially important since all of the data would otherwise be sent all the way to the client. In Convex, your data is only going a short hop to your function.

Denormalizing as a pattern is the efficient solution, and something I want to make some user-space helpers for, so it feels less manual & safer to get regularly used statistics.

I know the answer isn’t exactly what you want, but I’m curious how workable it sounds? What is your used case so we can better evaluate alternatives?

icy bloom
#

@last finch firestore got count/sum/avg on client not that long ago, and it doesn't fetch all the document, of course 🙂

#

i was hoping to see triggers in Convex(which could be, theoretically, used for these kind of problems), and was promised they're coming, but it seems like it's not a high priority

burnt rapids
last finch
last finch
# icy bloom <@897754604790480906> firestore got count/sum/avg on client not that long ago, a...

Yes firestore doesn't send all documents to the client, but it does have to scan them all, assuming based on their docs:

scale proportionally to the number of index entries scanned. Latency increases with the number of items in the aggregation.
So it's similar: your backend (which is close to the DB) scans the documents and sends the client the count. It just currently will incur more function <-> DB bandwidth, whereas firestore absorbs that cost for you.

I think we're agreeing, but lmk if I missed something

last finch
icy bloom
#

I'm really interested in a normal db-driven triggers, imho, they're the best feature of firestore, basically for any event-driven flows it's a must. I'll wait, I'm patient and i know how hard these things are to implement 🙂

buoyant wedge
#

@icy bloom for what use case are you seeing the current api's not supporting triggers well? I'm doing event triggers pretty minimally, but it's working so far. Curious what your experience has been.

icy bloom
#

Anything that remotely resembles CQRS pattern. When you have a flow of commands and events that trigger each other. Imagine you have a command like "create patient", after it's executed and an event "patient created" was emitted and persisted into a db, you might want to execute multiple different commands related to that event, establishing some defaults, setting some cron jobs, etc. You definitely can do that programmatically, but alternatively, db can become a source of truth and context for all the related actions you want to execute, following your app's business logic

buoyant wedge
#

Gotcha, makes sense. Curious to see what Ian has cooking for user space triggers.

last finch
#

My first pass is akin to the RLS pattern - a db wrapper that executes actions on db writes.
The API I'm thinking of would let you register functions like

  tableName: async (ctx, { oldDoc, newDoc })
  // insert: oldDoc is null
  // delete: newDoc is null
  // update: both are set
  otherTable: ... 
}```
and expose a `dbWithoutTriggers` in `ctx` if you don't want to cascade them. 

From there you could build thin layers to:
- schedule a mutation/ action for each doc (to not bloat the transaction if it doesn't need to be transactional)
- configure denormalization rules, including statistics
- assign auto-incrementing IDs 
- do write-time zod validation

The shortcomings are:
- You'd have to enforce that all writes use the wrapped DB. Something like a lint rule that prohibits bare mutation / internalMutation definitions. 
- Data inserted from the CLI or dashboard or if you stream in via fivetran wouldn't trigger the updates.
- Doing a backfill is complicated and out of scope for a first version - you'd need to process existing documents then atomically switch to new events, while also having captured changes to older documents. Needs its own metadata table. Haven't given it much thought.
- Not v1, but you could ask for build-time cycle detection by configuring which tables you have access to write in each . Maybe extend to allow you to capture specific fields changed in the table to enable finer-grain cycle detection at build or run time.

The primitive is just a `db` wrapper, from there it'd hopefully be easy to make your own behavior. To insert rules at a deeper layer that capture dashboard/CLI edits would require a deeper integration.

None of this is a promise - just a sketch of what I'm thinking - wdyt?
vocal laurel
#

Thank you very much everyone for your thoughts and ideas.
@buoyant wedge the use cases of getting query count can be limitless, for example, I'd like to know the count of businesses that are open 24/7 and have pizza. I don't need the data of qualifying records because 1) I may want to paginate 2) UI doesn't need to show all the records, rather just the total count for user is enough or 3) it may cost me to read each record (this is the case in FB, not sure about convex)

so doing something like this would be ideal

...
.query("businesses")
 .filter((q) => q.eq(q.field("isOpen247"), args.isOpen247))
 .filter((q) => q.eq(q.field("pizzeria"), args.pizzeria))
 .count(); <<---

from the thread, I'm getting this feeling that it's either not possible atm or doesn't have priority, hopefully in future then?

last finch
#

If you have fewer than thousands of records I’d try implementing with collect and see if it ends up being an issue. Queries that don’t change data are cached, so all users will see the same results and you only pay db bandwidth once. So you could make a query specifically for this, if it is <8k records

#

But yes it’s not currently implemented and not in active development but we’re thinking about it