Confused about hash table insertion time (closed hashing using linear probing) | Together C & C++ | Page 1

fast dragonBOT Jul 12, 2024, 7:34 PM

#

When your question is answered use !solved to mark the question as resolved.

Remember to ask specific questions, provide necessary details, and reduce your question to its simplest form. For tips on how to ask a good question use !howto ask.

stuck heart Jul 12, 2024, 7:39 PM

#

Confused about hash table insertion time (closed hashing using linear probing)

cobalt dove Jul 12, 2024, 11:39 PM

#

Disclaimer: So far I've only looked at your testing program:

First of all try to change your get_time function to this:

static inline double get_time(void) {
    struct timespec t;
    clock_gettime(CLOCK_MONOTONIC, &t);
    return t.tv_sec + t.tv_nsec * 1e-9;
}

The reason for that is on the manpage of gettimeofday:

The time returned by gettimeofday() is affected by discontinuous
jumps in the system time (e.g., if the system administrator
manually changes the system time). If you need a monotonically
increasing clock, see clock_gettime(2).

Secondly you should definetly use double instead of float for percent and why do you have the / 100? Just make it like so:

for (double percent = 0.002; percent < 1.0; percent += 0.002) {
    const int n = TABLE_CAPACITY * percent;
    ...
```This makes much more sense since mathematically 100% is the same as 1.

cobalt dove Jul 13, 2024, 12:01 AM

#

errrr?

Why do you even have a struct called bucket? You're not using buckets. That's just confusing to read
You really shouldn't use % tab->capacity to calculate the index. Rather be smart and in the constructor set tab->capacity to be the next bigger power of 2, then have an additional variable mask that you'll use instead of tab->capacity but you'll use bit-AND with it (much faster).
Here's how the constructor would change:

tab->capacity = nextPow2(capacity);
tab->mask = tab->capacity - 1;
```in theory you can even entirely remove `tab->capacity`, but keeping it doesn't hurt.
And here's how you would calculate the index:
```c
int hash = hash_function(key) & mask;  // I kept your variable name for consistency
// but hopefully this shows why the name 'hash' doesn't make sense since it's 
// not a hash but an index into your map
// Your 'hash_function' is just the identify function, so it just returns 'key' again

nocturne shore Jul 13, 2024, 12:05 AM

#

Y'all aren't addressing the question.

cobalt dove Jul 13, 2024, 12:05 AM

#

And on a second look I don't quiet get this line:

T *= 1e9;             /* Convert to microseconds. */
```shouldn't that convert that to nanoseconds?
Like, in your `get_time` function you even do:
```c
tv.tv_usec * 1e-6;
```which means you must've obviously known the conversions, no?

cobalt dove Jul 13, 2024, 12:06 AM

#

nocturne shore Y'all aren't addressing the question.

Cause I can't see anything particularly wrong with the hashtable that could explain such a behaviour

nocturne shore Jul 13, 2024, 12:06 AM

#

Do you consider the difference statistically significant? How so?

nocturne shore Jul 13, 2024, 12:07 AM

#

cobalt dove Cause I can't see anything particularly wrong with the hashtable that could expl...

Okay. So you decided to talk about unrelated things instead?

cobalt dove Jul 13, 2024, 12:08 AM

#

Da fuck mate? Calm down.
I decided to talk about potential root causes for the issue, like him using gettimeofday instead of clock_gettime and him doing

T *= 1e9;             /* Convert to microseconds. */
```which should be 
```c
T *= 1e6;
```and other things

nocturne shore Jul 13, 2024, 12:08 AM

#

cobalt dove Da fuck mate? Calm down. I decided to talk about potential root causes for the i...

I'm calm.

cobalt dove Jul 13, 2024, 12:09 AM

#

Yeah, it works but have you read the quote from the manpage I copied?

#

clock_gettime with a CLOCK_MONTONIC is simply more accurate (and it also gives you nanosecond precision instead of microsecond precision) than gettimeofday

#

Btw, don't use time (ms) if by ms you mean microseconds since ms stands for milliseoconds basically everywhere. The short version for microseconds is us

nocturne shore Jul 13, 2024, 12:16 AM

#

I think this result could be correct.

#

I think it would be a good exercise to try to show why it is or isn't correct.

cobalt dove Jul 13, 2024, 12:21 AM

#

Okay, then your professor made a mistake because he's clearly not printing out microseconds per operation, but he's printing out nanoseconds per operation (and he named it milliseconds in the output 🐒 ).

But yeah, considering we measured this with an insufficient clock (as I said a few times now: gettimeofday only provides micro-second precision, not nano-second precision, so if he tries to convert that to nanoseconds it's always going to be flawed, even if the clock was perfect which it isn't since it can be affected by system time jumps) this looks like a to be expected result, especially since the variance in the nanosecond range is so incredibly sensitive to how your CPU felt during that timespan.

I also think that even a table size of 1'000'000 like you have it will still mostly fit into your CPU caches. If you truly want to have some memory reads you'd need to make the table even bigger.

(You can also measure the program with perf to see how many cache/branch misses you have)

fast dragonBOT Jul 13, 2024, 12:25 AM

#

Thank you and let us know if you have any more questions!

This thread is now set to auto-hide after an hour of inactivity

#Confused about hash table insertion time (closed hashing using linear probing)