SKIPINITIALSCAN | Redis

halcyon zenith May 1, 2024, 5:32 PM

What is the use case for SKIPINITIALSCAN?

If I have an existing index (in this case one with a VECTOR field), can I enable it, update a million records, and then disable it?

Does that even make sense?

The issue I'm dealing with right now is a saturated CPU after updating a large number of records.

Thanks!

gloomy ferry May 1, 2024, 6:21 PM

This is my understanding, but I haven't tested it, so take it with a grain of salt.

That said, when you first create an index, Redis scans every key that matches the prefix and type, for example all Hashes whose keys start with bigfoot:sighting:. It does this in a separate thread so as not to lock up the main Redis thread. SKIPINITIALSCAN tells it not to do that, so any existing data will not be indexed.

Once the index has been created, adding a matching Hash should still trigger indexing for that key.
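To make the option's placement concrete, here is a minimal sketch that only assembles the FT.CREATE arguments; it does not talk to Redis. The index name idx:sightings, the fields title and embedding, and DIM 4 are all hypothetical; only the bigfoot:sighting: prefix comes from the example above. SKIPINITIALSCAN goes with the other index options, before SCHEMA.

```python
# Sketch: build FT.CREATE arguments for a HASH index with a VECTOR field.
# Index name, field names, and dimension are hypothetical placeholders.

def build_ft_create(index, prefix, vector_dim, skip_initial_scan=False):
    """Return the argument list for an FT.CREATE command."""
    args = ["FT.CREATE", index, "ON", "HASH", "PREFIX", "1", prefix]
    if skip_initial_scan:
        # Skip scanning keys that already exist; only later writes get indexed.
        args.append("SKIPINITIALSCAN")
    # The HNSW algorithm takes a count of the attribute tokens that follow it.
    vector_attrs = ["TYPE", "FLOAT32", "DIM", str(vector_dim),
                    "DISTANCE_METRIC", "COSINE"]
    args += ["SCHEMA",
             "title", "TEXT",
             "embedding", "VECTOR", "HNSW", str(len(vector_attrs)), *vector_attrs]
    return args


cmd = build_ft_create("idx:sightings", "bigfoot:sighting:", 4, skip_initial_scan=True)
print(" ".join(cmd))
```

With redis-py, the same list could be sent as r.execute_command(*cmd) against a server with the search module loaded.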

The notes in the docs are a little confusing. The first one states:

Attribute number limits: RediSearch supports up to 1024 attributes per schema, out of which at most 128 can be TEXT attributes. On 32 bit builds, at most 64 attributes can be TEXT attributes. The more attributes you have, the larger your index, as each additional 8 attributes require one extra byte per index record to encode. You can always use the NOFIELDS option and not encode attribute information into the index, for saving space, if you do not need filtering by text attributes. This will still allow filtering by numeric and geo attributes.

This is saying that instead of using SKIPINITIALSCAN, you could use NOFIELDS, which still allows you to search NUMERIC and GEO fields. My guess is that you would not be able to use VECTOR fields either.
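Taking the quoted rule of thumb literally (one extra byte per index record for every additional 8 attributes), the per-record overhead grows in steps of 8. This is a rough reading of that sentence, not RediSearch's actual on-disk encoding, and the index-record count below is hypothetical.

```python
import math

def field_flag_bytes(num_attributes, nofields=False):
    """Approximate per-record bytes spent on attribute flags (docs' rule of thumb)."""
    if nofields:
        # NOFIELDS: attribute information is not encoded into index records.
        return 0
    # One byte per started group of 8 attributes.
    return math.ceil(num_attributes / 8)


index_records = 1_000_000  # hypothetical number of inverted-index records
for n in (8, 9, 16, 64):
    per_record = field_flag_bytes(n)
    print(f"{n:>3} attributes: {per_record} byte(s)/record, "
          f"~{per_record * index_records:,} bytes total")
```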

rugged briar May 2, 2024, 12:51 PM

@halcyon zenith - are you updating the vector fields? Most likely the CPU spikes come from building the HNSW index. SKIPINITIALSCAN only skips the initial scan over the keyspace when an index is created or altered.
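For a sense of what "updating the vector fields" means on the wire: a VECTOR field in a hash holds a raw binary blob, and per the explanation above, each such write has to be inserted into the HNSW graph. A minimal sketch, assuming FLOAT32 vectors packed little-endian and the hypothetical key bigfoot:sighting:1 and field embedding; it only prepares HSET arguments and does not connect to Redis.

```python
import struct

def vector_to_blob(values):
    """Pack floats as little-endian 32-bit floats (assumed layout for TYPE FLOAT32)."""
    return struct.pack(f"<{len(values)}f", *values)

def build_hset(key, field, values):
    """Return HSET arguments that overwrite one vector field of a hash."""
    return ["HSET", key, field, vector_to_blob(values)]


cmd = build_hset("bigfoot:sighting:1", "embedding", [0.1, 0.2, 0.3, 0.4])
print(cmd[:3], len(cmd[3]), "bytes")  # 4 dimensions x 4 bytes each
```

A million of these writes means a million HNSW insertions whether or not the index was created with SKIPINITIALSCAN, since that option only affects the scan at FT.CREATE or FT.ALTER time.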

halcyon zenith May 2, 2024, 1:59 PM

Yeah, it seems the CPU spikes enough that it starts to interfere with actually setting the hashes.

We were trying to brainstorm ways to pause indexing, update the hashes, and then resume indexing.

Are there established patterns for this sort of thing?