#INCRBY not storing value even though there is more space in key?

30 messages · Page 1 of 1 (latest)

long spire
#

I've created a key, let's call myKey with a K of 100 and all the defaults for other parameters.

If I do

TOPK.LIST myKey

In my specific example, I have 14 different values of very high counts that are popped out.

Then if I do

TOPK.INCRBY myKey someNewValue 99999
TOPK.LIST myKey

I still see only 14 keys! I expect to see 15 keys with the new value persisted. What's going on here?

long spire
#

This is probably a bug.

I found out that if I do

TOPK.INCRBY myKey More 99999

it does not persist

if I do

TOPK.INCRBY myKey more 99999

it persists.

any idea what's going on?

#

if I use a new key, then it works btw. If I use this specific key, or recreate this specific key with the same elements, then it runs into this issue

sinful zodiac
#

What cliient are you using to perform those ops?

long spire
sinful zodiac
#

I remember RedisInsight cli had similar problem with case-sensitivity. Could be intentionally made to be strict case mode.

long spire
#

would that issue apply to all words? If I add "Something" then it works

#

I also tried adding M, Mo, Mor and they all worked.

Could it be a hashing issue? Btw I also tested directly with the CLI and ran into the same issue. I don't think it's with the client.

#

/opt/homebrew/bin/redis-cli

sinful zodiac
#

Hm, might be keywords for parser... Not sure about that. 😦

long spire
#

any ideas on resolving this issue?

#

like always lowercasing maybe?

sinful zodiac
#

Sounds reasonable. May be @bright lake has some inputs on that, I remember she did good videos about Probabilistic structs.

keen storm
#

When you define a TopK, you give it a height and a width. The height is the number of hashing functions used and correspond to arrays of counts and hashed values. The arrays themselves are width long. When you update the count of a value in a TopK, you hash the value and modulus it with the width to get an index into all of the arrays. If any of those indices are occupied, that is a chance that the count will decrease. That chance is based on the count with larger counts being less likely to decrease.

My guess is that you're getting collisions on that particular word in all of your arrays and it's not adding the word. This is the sort of thing that can happen with probabilisitic data structures. To test this hypothesis, check the count of all the other keys before and after and see if they decrease. If so, it's working as desigend. To mitigate this problem, adjust the height and width of your TopK data structure.

I also have a talk where I explain how TopK works (and some other probabilisitic data structures). It's been recorded a few times and posted to YouTube. It's called Understanding Probabilistic Data Structures with 112,092 UFO Sightings. With a title like that, it should be easy to find. 😉

long spire
#

Is the hash based on the hostname or counts or something? I can only reproduce it on a particular redis host.

#

Correction - it only happens when the counts of the existing values are high.

#

to reproduce

  1. del the key
  2. reserve the key with 100 slots
  3. incrby the key with the 13 keys with 10K increment
  4. incrby the problematic key with any increment
  5. notice that the problematic key isn't added
#

If in step 3, I incrby the 13 keys with only a value of 1, then I'm able to add the problematic key

#

btw there are three problematic values that don't get added. I think at the scale of sub 100 values, I'd imagine the case of hash collision should be extremely low with the default parameters.

But maybe I'm wrong, the defaults need to be tuned.

#

`10.136.5.12:27783> TOPK.LIST heartbeat_cardinality:mock_cardinality_field

  1. 94
  2. 60
  3. 69
  4. 80
  5. 49
  6. 59
  7. 55
  8. 64
  9. 53
  10. 75
  11. 79
  12. 99
  13. 90
  14. 39
  15. 35
  16. 45
  17. 65
  18. 84
  19. 40
  20. 50
  21. 44
  22. 20
  23. 54
  24. 30
  25. 58
  26. 38
  27. 15
  28. 25
  29. 13
  30. 34
    10.136.5.12:27783> TOPK.INCRBY heartbeat_cardinality:mock_cardinality_field 1 100
  31. 89
    10.136.5.12:27783> TOPK.LIST heartbeat_cardinality:mock_cardinality_field
  32. 94
  33. 60
  34. 59
  35. 55
  36. 69
  37. 80
  38. 64
  39. 49
  40. 39
  41. 53
  42. 75
  43. 79
  44. 99
  45. 90
  46. 35
  47. 45
  48. 44
  49. 65
  50. 84
  51. 40
  52. 54
  53. 50
  54. 20
  55. 30
  56. 58
  57. 38
  58. 15
  59. 25
  60. 13`
#

Sorry for the word vomit, but I finally had an example I'd be allowed to share!

#

as we can see, the value 1 was not added. Also tested in the other example, and I looked at your video, the value did not change.

Also at the very least the string should enter the heap right?

#

entering the heap shouldn't be blocked due to hash collision 🤔

#

To test this hypothesis, check the count of all the other keys before and after and see if they decrease.

The counts were unchanged. It's almost as if topK was never even consulted - like there's a bloom filter that's somehow based on the count as well blocking the new entry. I proved it in a separate example that contains some internal information. I made all of the counts be 10000, and then I tried adding "More" with an increment of 99999 and it did not get added.

#

I obfuscated this here:

`<IP_ADDRESS>:<PORT>> TOPK.LIST key WITHCOUNT

  1. Value
  2. (integer) 20000
  3. Value
  4. (integer) 20000
  5. Value
  6. (integer) 20000
  7. Value
  8. (integer) 20000
  9. Value
  10. (integer) 20000
  11. Value
  12. (integer) 20000
  13. Value
  14. (integer) 20000
  15. Value
  16. (integer) 20000
  17. Value
  18. (integer) 20000
  19. Value
  20. (integer) 20000
  21. Value
  22. (integer) 20000
  23. Value
  24. (integer) 20000
  25. Value
  26. (integer) 20000
    <IP_ADDRESS>:<PORT>> TOPK.INCRBY key More 19999
  27. (nil)
    <IP_ADDRESS>:<PORT>> TOPK.LIST key WITHCOUNT
  28. Value
  29. (integer) 20000
  30. Value
  31. (integer) 20000
  32. Value
  33. (integer) 20000
  34. Value
  35. (integer) 20000
  36. Value
  37. (integer) 20000
  38. Value
  39. (integer) 20000
  40. Value
  41. (integer) 20000
  42. Value
  43. (integer) 20000
  44. Value
  45. (integer) 20000
  46. Value
  47. (integer) 20000
  48. Value
  49. (integer) 20000
  50. Value
  51. (integer) 20000
  52. Value
  53. (integer) 20000`
#

I also noticed that the count of the key decrementing is probabalistic - I ran the incrby for "More" about 100 times with the default .9 with no change.

long spire
#

@keen storm - So I'm not sure why it's not accepting values, but a wider width helps with accepting new values. Do these values look right to you?

K - 100 (desired)

Width - 1000 (k * log(k) ~ 650)
Depth - 8
Decay - .9

keen storm
#

I misspoke when I said this bit here:

To test this hypothesis, check the count of all the other keys before and after and see if they decrease. If so, it's working as desigend. To mitigate this problem, adjust the height and width of your TopK data structure.
It's very possible that the counts won't go down.

Wider and deeper will both help. Ultimately, probabilisitic data structures are, probabilisitic. So, sometimes what you add to them won't be added or will be removed later.

long spire
#

makes sense! given that, do you think my chances of collisions are more reasonable given the above settings?

Also even if counts don't go down, your talk goes on to mention that the value gets added to the heap afterwards. Is there additional probabilistic computation that decides what gets added? because it doesn't seem probabilistic if it happens 100/100 times. I had assumed just the counting portion of topK was probabilistic.

keen storm
#

My guess is that the code removes (or doesn't add) items to the heap when the count in the heavy keeper is 0.