#Search Optimization with a large collection [MongoDB]

34 messages · Page 1 of 1 (latest)

cyan zinc
#

I have a collection with 100k+ documents and counting. Right now the search functionality in the list view for the collection is quite slow. I have tried optimizing with indexes, but its still operating quite slowly.

For example, I have a collection called 'jobs' and i want to search it by fields 'id', 'title_raw', and 'company_raw'. My assumption is that the app is creating a query like this:

{
$or: [
{ company_raw: { $regex: 'Accounts Payable Clerk', $options: 'i' } },
{ title_raw: { $regex: 'Accounts Payable Clerk', $options: 'i' } },
{ _id: { $regex: 'Accounts Payable Clerk', $options: 'i' } }
]
}

is that correct?

This runs pretty fast in mongosh or when I run the aggregation from my own code, but i takes 30-50 seconds in the Payload List view.

My questions are:

  1. Am I understanding the search query properly?
  2. Is there any way to create custom logic for function that provides data to this view without having to rewrite the whole component? I've tried writing my own tables as an override but they were quite a pain to maintain.
silent bronzeBOT
viscid marlin
#

Hey @cyan zinc! Appreciate your patience here. We'll get you some answers here shortly.

warm charm
#

Hey @cyan zinc are you on 2.0?

night torrent
#

^ Specifically payload v2.0.14 and db-mongodb v1.0.5 or higher. That's when a major performance optimization was introduced for searching/filtering through a large amount of documents

warm charm
#

also, @cyan zinc, do you have drafts enabled on that collection?

cyan zinc
#

hey! sorry i just re-engaged with this issue I was having. these are the versions I'm using:

"payload": "^2.8.2",
"@payloadcms/db-mongodb": "^1.3.2",

#

I do not have drafts enabled for that collection

#

and a jam of the speed

#

Well after i went to make the jam, it's going pretty fast! maybe there were some caching of package issues

#

actually seems to be a bit in consistent. will send that jam after all

#

as you can see, the first two queries are fast and then the last one is slow. maybe it's the query itself and nothing to do with payload but if you have any hunches, let me know!

warm charm
#

this is super weird

#

very strange that the first few queries were lightning fast and then for some reason medical-assistant was not

#

is it related to the medical assistant search being 2 words by chance?

#

can you narrow that down?

cyan zinc
#

It does look like i get a faster result with one word medical vs two words medical assistant

#

I'm also noticing that if you pass the search parameter to the api endpoint, it doesn't keyword search. is that by design?

warm charm
#

hmmmmmmm

#

so, the search param in the admin UI is used to separate out what is entered in the search bar vs. what is built up manually in the filters dropdown

#

but search is just combined into the typical where URL param for the REST API

#

re: one vs two words, here's a bit of info about what's happening in the background when you search multiple words

#

Payload basically splits out a multi word string into an $and query, where each word is searched for separately

#

so it's not super different if there's only one word vs. many words

#

do you have the fields indexed properly?

cyan zinc
#

Thank you that's really helpful! I figured as much about the search , just wanted to check and make sure that was by design. so that makes sense.

#

The more I experiment, the less it makes sense why some queries are slower than others. I'm wondering if it's actually the sort parameter that makes it really slow. i do have indexes set up for each searched field and popular sort fields (including the one i'm using in the jams)

#

If i'm searching my three fields, do i need a compound index? The research I did told me that a compound index wouldn't help and when i added one it, it didn't improve.

it's possbile that searching by three fields is the problem. When i drop it down to 1 though, i'm still getting wildely varying speeds of execution. and it actually does not appear to matter how many words i search.

Generally, when I want to search a MongoDB database like this I would use a search index. Are there any plans to support that kind of querying?

warm charm
#

yes, adding support for a search index is definitely on the radar

#

@steep plover and I will get it added to our roadmap officially

steep plover
#

Yes