INCRBY not storing value even though there is more space in key? | Redis | Page 1

long spire Aug 29, 2024, 11:18 PM

#

I've created a key, let's call myKey with a K of 100 and all the defaults for other parameters.

If I do

TOPK.LIST myKey

In my specific example, I have 14 different values of very high counts that are popped out.

Then if I do

TOPK.INCRBY myKey someNewValue 99999
TOPK.LIST myKey

I still see only 14 keys! I expect to see 15 keys with the new value persisted. What's going on here?

long spire Aug 29, 2024, 11:44 PM

#

This is probably a bug.

I found out that if I do

TOPK.INCRBY myKey More 99999

it does not persist

if I do

TOPK.INCRBY myKey more 99999

it persists.

any idea what's going on?

#

if I use a new key, then it works btw. If I use this specific key, or recreate this specific key with the same elements, then it runs into this issue

sinful zodiac Aug 29, 2024, 11:52 PM

#

What cliient are you using to perform those ops?

long spire Aug 29, 2024, 11:52 PM

#

stackexchange

GitHub

GitHub - StackExchange/StackExchange.Redis: General purpose redis c...

General purpose redis client. Contribute to StackExchange/StackExchange.Redis development by creating an account on GitHub.

sinful zodiac Aug 29, 2024, 11:54 PM

#

I remember RedisInsight cli had similar problem with case-sensitivity. Could be intentionally made to be strict case mode.

long spire Aug 29, 2024, 11:57 PM

#

would that issue apply to all words? If I add "Something" then it works

#

I also tried adding M, Mo, Mor and they all worked.

Could it be a hashing issue? Btw I also tested directly with the CLI and ran into the same issue. I don't think it's with the client.

#

/opt/homebrew/bin/redis-cli

sinful zodiac Aug 30, 2024, 12:17 AM

#

Hm, might be keywords for parser... Not sure about that. 😦

long spire Aug 30, 2024, 12:22 AM

#

any ideas on resolving this issue?

#

like always lowercasing maybe?

sinful zodiac Aug 30, 2024, 12:24 AM

#

Sounds reasonable. May be @bright lake has some inputs on that, I remember she did good videos about Probabilistic structs.

keen storm Aug 30, 2024, 1:04 PM

#

When you define a TopK, you give it a height and a width. The height is the number of hashing functions used and correspond to arrays of counts and hashed values. The arrays themselves are width long. When you update the count of a value in a TopK, you hash the value and modulus it with the width to get an index into all of the arrays. If any of those indices are occupied, that is a chance that the count will decrease. That chance is based on the count with larger counts being less likely to decrease.

My guess is that you're getting collisions on that particular word in all of your arrays and it's not adding the word. This is the sort of thing that can happen with probabilisitic data structures. To test this hypothesis, check the count of all the other keys before and after and see if they decrease. If so, it's working as desigend. To mitigate this problem, adjust the height and width of your TopK data structure.

I also have a talk where I explain how TopK works (and some other probabilisitic data structures). It's been recorded a few times and posted to YouTube. It's called Understanding Probabilistic Data Structures with 112,092 UFO Sightings. With a title like that, it should be easy to find. 😉

long spire Aug 30, 2024, 3:23 PM

#

Is the hash based on the hostname or counts or something? I can only reproduce it on a particular redis host.

#

Correction - it only happens when the counts of the existing values are high.

#

to reproduce

del the key
reserve the key with 100 slots
incrby the key with the 13 keys with 10K increment
incrby the problematic key with any increment
notice that the problematic key isn't added

#

If in step 3, I incrby the 13 keys with only a value of 1, then I'm able to add the problematic key

#

btw there are three problematic values that don't get added. I think at the scale of sub 100 values, I'd imagine the case of hash collision should be extremely low with the default parameters.

But maybe I'm wrong, the defaults need to be tuned.

#

`10.136.5.12:27783> TOPK.LIST heartbeat_cardinality:mock_cardinality_field

94
60
69
80
49
59
55
64
53
75
79
99
90
39
35
45
65
84
40
50
44
20
54
30
58
38
15
25
13
34
10.136.5.12:27783> TOPK.INCRBY heartbeat_cardinality:mock_cardinality_field 1 100
89
10.136.5.12:27783> TOPK.LIST heartbeat_cardinality:mock_cardinality_field
94
60
59
55
69
80
64
49
39
53
75
79
99
90
35
45
44
65
84
40
54
50
20
30
58
38
15
25
13`

#

Sorry for the word vomit, but I finally had an example I'd be allowed to share!

#

as we can see, the value 1 was not added. Also tested in the other example, and I looked at your video, the value did not change.

Also at the very least the string should enter the heap right?

#

entering the heap shouldn't be blocked due to hash collision 🤔

#

To test this hypothesis, check the count of all the other keys before and after and see if they decrease.

The counts were unchanged. It's almost as if topK was never even consulted - like there's a bloom filter that's somehow based on the count as well blocking the new entry. I proved it in a separate example that contains some internal information. I made all of the counts be 10000, and then I tried adding "More" with an increment of 99999 and it did not get added.

#

I obfuscated this here:

`<IP_ADDRESS>:<PORT>> TOPK.LIST key WITHCOUNT

Value
(integer) 20000
Value
(integer) 20000
Value
(integer) 20000
Value
(integer) 20000
Value
(integer) 20000
Value
(integer) 20000
Value
(integer) 20000
Value
(integer) 20000
Value
(integer) 20000
Value
(integer) 20000
Value
(integer) 20000
Value
(integer) 20000
Value
(integer) 20000
<IP_ADDRESS>:<PORT>> TOPK.INCRBY key More 19999
(nil)
<IP_ADDRESS>:<PORT>> TOPK.LIST key WITHCOUNT
Value
(integer) 20000
Value
(integer) 20000
Value
(integer) 20000
Value
(integer) 20000
Value
(integer) 20000
Value
(integer) 20000
Value
(integer) 20000
Value
(integer) 20000
Value
(integer) 20000
Value
(integer) 20000
Value
(integer) 20000
Value
(integer) 20000
Value
(integer) 20000`

#

I also noticed that the count of the key decrementing is probabalistic - I ran the incrby for "More" about 100 times with the default .9 with no change.

long spire Sep 3, 2024, 5:28 PM

#

@keen storm - So I'm not sure why it's not accepting values, but a wider width helps with accepting new values. Do these values look right to you?

K - 100 (desired)

Width - 1000 (k * log(k) ~ 650)
Depth - 8
Decay - .9

keen storm Sep 3, 2024, 5:59 PM

#

I misspoke when I said this bit here:

To test this hypothesis, check the count of all the other keys before and after and see if they decrease. If so, it's working as desigend. To mitigate this problem, adjust the height and width of your TopK data structure.
It's very possible that the counts won't go down.

Wider and deeper will both help. Ultimately, probabilisitic data structures are, probabilisitic. So, sometimes what you add to them won't be added or will be removed later.

long spire Sep 3, 2024, 6:02 PM

#

makes sense! given that, do you think my chances of collisions are more reasonable given the above settings?

Also even if counts don't go down, your talk goes on to mention that the value gets added to the heap afterwards. Is there additional probabilistic computation that decides what gets added? because it doesn't seem probabilistic if it happens 100/100 times. I had assumed just the counting portion of topK was probabilistic.

keen storm Sep 3, 2024, 9:01 PM

#

My guess is that the code removes (or doesn't add) items to the heap when the count in the heavy keeper is 0.

#INCRBY not storing value even though there is more space in key?