Get the count of the query without using collect() | Convex Community | Page 1

vocal laurel Nov 30, 2023, 5:18 AM

#

Hi, I'm curious to learn if there is a way to get the total count of the particular query, not paginated, just regular query.
I can't find anything in the docs nor here in the conversations.

Thank you!

burnt rapids Nov 30, 2023, 6:55 AM

#

vocal laurel Hi, I'm curious to learn if there is a way to get the total count of the particu...

What do you mean by count? Do you mean the sql COUNT? Or something else

buoyant wedge Nov 30, 2023, 7:11 AM

#

There isn’t currently, you have to collect and count. If you need this, I’d recommend storing any counts explicitly in a table, use a migration to get initial count(s) and increment/decrement from there. Not great, but possible.

vocal laurel Dec 1, 2023, 6:19 AM

#

@burnt rapids yes, total count of the query.
@buoyant wedge thank you, i'm doing that which is not great.
As a matter of fact Firebase added this capability not long ago, as the demand was very high.

buoyant wedge Dec 1, 2023, 1:31 PM

#

cc/ @last finch (we need a way to at-mention convex team btw)

#

@vocal laurel curious, can you share a little more on what your use case is?

last finch Dec 1, 2023, 4:45 PM

#

@vocal laurel i hear you- there has been some interest in both full table counts and query count. one related thread with some thinking: https://discord.com/channels/1019350475847499849/1166106601543962715
One thing that is worth considering is that when you do a count of a query in SQL, unless it’s a full table count or something it specifically keeps statistics on, it has to scan all the related rows to get the count. So it’s just a more bandwidth-efficient version of .collect().length. However if you have 10k+ documents in the query, sql will just take longer, where Convex will currently refuse to read that much. That’s why Firebase didn’t have it for so long, and why their docs say

scale proportionally to the number of index entries scanned. Latency increases with the number of items in the aggregation.

for firebase, it’s especially important since all of the data would otherwise be sent all the way to the client. In Convex, your data is only going a short hop to your function.

Denormalizing as a pattern is the efficient solution, and something I want to make some user-space helpers for, so it feels less manual & safer to get regularly used statistics.

I know the answer isn’t exactly what you want, but I’m curious how workable it sounds? What is your used case so we can better evaluate alternatives?

icy bloom Dec 1, 2023, 7:05 PM

#

@last finch firestore got count/sum/avg on client not that long ago, and it doesn't fetch all the document, of course 🙂

#

i was hoping to see triggers in Convex(which could be, theoretically, used for these kind of problems), and was promised they're coming, but it seems like it's not a high priority

burnt rapids Dec 1, 2023, 7:19 PM

#

last finch <@1165760069607104586> i hear you- there has been some interest in both full tab...

does convex not store the table count anyway in its table metadata since when you add a row it has to add a new one to the index? So should this not be fetched from there is does it not work that way.

last finch Dec 1, 2023, 11:30 PM

#

burnt rapids does convex not store the table count anyway in its table metadata since when yo...

We could expose a total count, but not a count for a specific query

last finch Dec 1, 2023, 11:33 PM

#

icy bloom <@897754604790480906> firestore got count/sum/avg on client not that long ago, a...

Yes firestore doesn't send all documents to the client, but it does have to scan them all, assuming based on their docs:

scale proportionally to the number of index entries scanned. Latency increases with the number of items in the aggregation.
So it's similar: your backend (which is close to the DB) scans the documents and sends the client the count. It just currently will incur more function <-> DB bandwidth, whereas firestore absorbs that cost for you.

I think we're agreeing, but lmk if I missed something

last finch Dec 1, 2023, 11:35 PM

#

icy bloom i was hoping to see triggers in Convex(which could be, theoretically, used for t...

I actually have been working on some code to make user-space triggers easier to write - hopefully something in the next week or two. The full-on trigger feature is dependent on current Component workstream which is currently underway

icy bloom Dec 1, 2023, 11:52 PM

#

I'm really interested in a normal db-driven triggers, imho, they're the best feature of firestore, basically for any event-driven flows it's a must. I'll wait, I'm patient and i know how hard these things are to implement 🙂

buoyant wedge Dec 1, 2023, 11:57 PM

#

@icy bloom for what use case are you seeing the current api's not supporting triggers well? I'm doing event triggers pretty minimally, but it's working so far. Curious what your experience has been.

icy bloom Dec 2, 2023, 12:12 AM

#

Anything that remotely resembles CQRS pattern. When you have a flow of commands and events that trigger each other. Imagine you have a command like "create patient", after it's executed and an event "patient created" was emitted and persisted into a db, you might want to execute multiple different commands related to that event, establishing some defaults, setting some cron jobs, etc. You definitely can do that programmatically, but alternatively, db can become a source of truth and context for all the related actions you want to execute, following your app's business logic

buoyant wedge Dec 2, 2023, 12:17 AM

#

Gotcha, makes sense. Curious to see what Ian has cooking for user space triggers.

last finch Dec 2, 2023, 4:25 AM

#

My first pass is akin to the RLS pattern - a db wrapper that executes actions on db writes.
The API I'm thinking of would let you register functions like

  tableName: async (ctx, { oldDoc, newDoc })
  // insert: oldDoc is null
  // delete: newDoc is null
  // update: both are set
  otherTable: ... 
}```
and expose a `dbWithoutTriggers` in `ctx` if you don't want to cascade them. 

From there you could build thin layers to:
- schedule a mutation/ action for each doc (to not bloat the transaction if it doesn't need to be transactional)
- configure denormalization rules, including statistics
- assign auto-incrementing IDs 
- do write-time zod validation

The shortcomings are:
- You'd have to enforce that all writes use the wrapped DB. Something like a lint rule that prohibits bare mutation / internalMutation definitions. 
- Data inserted from the CLI or dashboard or if you stream in via fivetran wouldn't trigger the updates.
- Doing a backfill is complicated and out of scope for a first version - you'd need to process existing documents then atomically switch to new events, while also having captured changes to older documents. Needs its own metadata table. Haven't given it much thought.
- Not v1, but you could ask for build-time cycle detection by configuring which tables you have access to write in each . Maybe extend to allow you to capture specific fields changed in the table to enable finer-grain cycle detection at build or run time.

The primitive is just a `db` wrapper, from there it'd hopefully be easy to make your own behavior. To insert rules at a deeper layer that capture dashboard/CLI edits would require a deeper integration.

None of this is a promise - just a sketch of what I'm thinking - wdyt?

vocal laurel Dec 2, 2023, 9:05 AM

#

Thank you very much everyone for your thoughts and ideas.
@buoyant wedge the use cases of getting query count can be limitless, for example, I'd like to know the count of businesses that are open 24/7 and have pizza. I don't need the data of qualifying records because 1) I may want to paginate 2) UI doesn't need to show all the records, rather just the total count for user is enough or 3) it may cost me to read each record (this is the case in FB, not sure about convex)

so doing something like this would be ideal

...
.query("businesses")
 .filter((q) => q.eq(q.field("isOpen247"), args.isOpen247))
 .filter((q) => q.eq(q.field("pizzeria"), args.pizzeria))
 .count(); <<---

from the thread, I'm getting this feeling that it's either not possible atm or doesn't have priority, hopefully in future then?

last finch Dec 2, 2023, 6:01 PM

#

If you have fewer than thousands of records I’d try implementing with collect and see if it ends up being an issue. Queries that don’t change data are cached, so all users will see the same results and you only pay db bandwidth once. So you could make a query specifically for this, if it is <8k records

#

But yes it’s not currently implemented and not in active development but we’re thinking about it

#Get the count of the query without using collect()