#Quadratic probing hashmap implementation in C++20

78 messages · Page 1 of 1 (latest)

worldly flame
#

Hello, I've implemented a hashmap using quadratic probing as a form of collision handling in C++20.

I'm specifically looking for some possible but reasonable:

  • Optimizations that could be made (speed and space).
  • Optimal usage of C++20 features.
  • Make the code look better.
  • Redundancy reduction + code size reduction.
  • Modern practices for the C++20 standard.
cedar peak
#
        HashMap& operator=(HashMap&& moved) noexcept {
            if(this != &moved) {
                _entries = std::move(moved._entries);
                _entries_len = moved._entries_len;
            }

            return *this;
        }

just spotted this, generally moving something leaves it in a valid but undetermined state, it seems like you are moving the entries, but just copying the entries_len what might be a bit odd as calling _get_size() might now return values that don't make to much sense anymore

#

it's not a huge deal in this case as you can't realy index on the structure

#

but it might be worthwhile to set ensure the entries are zeroed out or something

worldly flame
#

I see, so what should I do?

#

Transfer ownership to the moved class?

#

Via: _entries_len = std::move(moved._entries_len);?

worldly flame
worldly flame
cedar peak
#

It's not the ownership transfer that's important for that its just an int, but I would just copy it and zero it so it's 0 like entries themselves

eager ice
# worldly flame

keep in mind that quadratic probing and exponential growth might not be a good idea, because quadratic probing doesn't actually reach all buckets

for example, say that the initial entry_index is 3, and you have a capacity of 8, now the following buckets are traversed:

3 + 0 * 0 = 3
3 + 1 * 1 = 4
3 + 2 * 2 = 7
3 + 3 * 3 = 1
3 + 4 * 4 = 3
3 + 5 * 5 = 4
...

you don't actually cover all buckets and could loop endlessly in search for an empty bucket, if 3, 4, 7, and 1 are already taken, but say, 2 is still empty

this is because there are limited quadratic residues, namely (p + 1)/2 for a prime modulus, and even fewer for a non-prime such as a power of two

#

what you're doing is particularly problematic because you only grow if

if(_entries_len == capacity) {

normally you grow when half the capacity is reached or less

#

anyhow, it's really important that you don't create an infinite loop when looking for buckets, which is basically guaranteed with your implementation

#

try to grow to a prime number p instead, and never fill more than (p + 1) / 2 buckets to ensure that you will find an empty one

#

or just use a different probing method which guarantees that you traverse all buckets

#

quadratic probing doesn't go well with powers of two

worldly flame
worldly flame
#

And I should reduce the size at which I grow?

eager ice
#

it's all regarding this function

void _insert(const std::pair<K, V>& kv_pair) noexcept {
    std::size_t capacity = _get_capacity();

    if(_entries_len == capacity) {
        std::cout << "Skid?\n";
        _extend_hashmap(capacity * 2);
    }

    // h(k, i) = h(k) + c_1i + c_2i^2 % m
    std::size_t entry_index = _hashing_func(kv_pair.first);

    for(std::size_t i = 0; _entries[entry_index].is_used; ++i) {
        entry_index = (entry_index + i * i) % capacity;
    }

    _entries[entry_index].kv_pair = kv_pair;     
    _entries[entry_index].is_used = true;
    ++_entries_len;
}
worldly flame
#

I have actually updated the code from when I first changed it, I should send that when I can

eager ice
eager ice
#

if 75% of buckets are full, you might still not find any empty buckets

worldly flame
#

Which results in endless traversal?

eager ice
#

yeah

worldly flame
#

And the solution is to grow the entry vector sooner to ensure that we have an abundance of empty buckets

eager ice
#

yeah, but that's only part of the solution, since powers of two don't give you enough quadratic residues

worldly flame
#

And I should grow it by the length you said earlier

#

(Prime_number + 1) / 2?

#

Or no

eager ice
#

yeah you should grow to the next prime number or the next prime number close to a power of two

worldly flame
#

Mmmh okay

eager ice
#

powers of two are bad because e.g. 4096 has 684 residues

#

so you'd need to grow when 684 buckets are used, which is just ~17%

#

or maybe just use a better probing scheme that eventually traverses all buckets

worldly flame
#

What do you have in mind?

#

I've implemented linear probing before and I wanted to implement quadratic probing because it's new to me and looks fun

#
  • modern hashmaps use them
#

Rust's hashmap uses them

eager ice
#

the point of quadratic probing is that it doesn't produce as much clustering as linear probing but it gradually increase step size, which is cache friendly

#

but you could achieve that trade-off with any number of functions

worldly flame
#

Is it going to be static or will it have to be relative to a number?

eager ice
#

you could grow to the next larger or nest lower prime number than a power of two

#

LCG = linear congruential generator

worldly flame
#

God what's that from

#

That sounds familiar

#

Isn't that used in PRNG engines?

worldly flame
#

The capacity or length?

eager ice
worldly flame
#

Oooh what's that?

#

And that's what I'll extend by?

eager ice
#

In mathematics, a Mersenne prime is a prime number that is one less than a power of two. That is, it is a prime number of the form Mn = 2n − 1 for some integer n. They are named after Marin Mersenne, a French Minim friar, who studied them in the early 17th century. If n is a composite number then so is 2n − 1. Therefore, an equivalent defini...

worldly flame
#

So I should do something like this:

if(_entries_len == (capacity / 2) + 1) {
  T prime = get_mersenne_prime(capacity);
  extend_hashmap(prime);
}
```?
eager ice
#

it's (p + 1) / 2 residues

#

and just grow to the next prime number, it doesn't have to a be a mersenne prime, although if you stick to finding prime numbers close to powers of two, you are likely to find a mersenne prime

worldly flame
worldly flame
#
if(_entries_len == (capacity + 1) / 2) {
    _extend_hashmap(capacity * find_nearest_prime(capacity));
}
#

Does this look correct?

#

Sorry, I haven't worked on it all day

eager ice
worldly flame
#

Okay that makes much more sense ~_~

#

Why the: * 2 - 1?

eager ice
# worldly flame Why the: ``* 2 - 1``?

because mersenne primes are either one lower or one greater than a power of two, so if you start out e.g. 17, you get to the next one with (17 - 1) * 2 - 1 = 31

worldly flame
#

The rounding function doesn’t fetch the nearest mersenne

eager ice
#

you just need some prime

#

the point of sticking to powers of two is that you keep relatively large growth, instead of growing from say, 19 to 23

#

the reason why mersenne primes are useful is that they can be found quickly

worldly flame
#

Right okay

#

Thanks

spare bolt