#HashMap Collision Question

1 messages · Page 1 of 1 (latest)

acoustic girder
#

I am confused.

According to many websites, including the ones below, a HashMap must have unique non-repeating keys, but the values can be repeated.

https://data-flair.training/blogs/hashset-vs-hashmap-in-java/#:~:text=A HashMap class is also,can be added to it.
https://stackoverflow.com/questions/27128247/hashmap-duplicate-values-identify-the-duplicates (Top answer)

Why then, is there a collision if two different keys have the same value, as described in this website?

https://www.baeldung.com/java-hashmap-advanced

Take the code shown below.

import java.util.*;
public class MyClass {
    public static void main(String args[]) {
        HashMap<Integer, String> test = new HashMap<Integer, String>();
        test.put(1, "Joe");
        test.put(2, "Tim");
        test.put(3, "Harry");
        test.put(4, "Tim");
        System.out.println(test);
    }
}

I am not getting any errors despite both the keys of 2 and 4 having the same value, "Tim". I don't understand when a collision could occur. It all seems contradictory. Its legal to have the same value for different keys but apparently that can also cause a collision?

Can anyone provide an example of a HashMap collision and explain how that is different from the example I shared in the code snippet above?

Thank you.

Baeldung

A quick and practical guide to Hashmap's internals

Learn about hashset and hashmap in java with examples. See difference between java HashSet vs HashMap and when to use them.

pearl glacierBOT
#

<@&987246717831381062> please have a look, thanks.

pearl glacierBOT
#

While you are waiting for getting help, here are some tips to improve your experience:

Code is much easier to read if posted with syntax highlighting and proper formatting.

If nobody is calling back, that usually means that your question was not well asked and hence nobody feels confident enough answering. Try to use your time to elaborate, provide details, context, more code, examples and maybe some screenshots. With enough info, someone knows the answer for sure.

Don't forget to close your thread using the command </help-thread close:1027500463647621170> when your question has been answered, thanks.

#

Here is an AI assisted attempt to answer your question 🤖. Maybe it helps! In any case, a human is on the way 👍. To continue talking to the AI, you can use </chatgpt:1108714622413963314>.

#

ining.net/java-hashmap/

https://www.geeksforgeeks.org/hashmap-containskey-method-in-java/

However, I am confused about what happens when there is a collision in a HashMap.

Can someone please explain how collisions are handled in a HashMap?

cerulean hinge
#

HashMap (and most other Maps) don't have any rules for values besides some disallowing null values. If you use the same key again you'll overwrite the previous value. Hash collisions are a more advanced topic (that only applies to keys). They don't affect the actual behaviour, but can slow the map down. You usually don't have to worry about them if you have a resonably good hash function for your key type (all jdk classes should have a decent one, except for URL).

acoustic girder
# cerulean hinge `HashMap` (and most other `Map`s) don't have any rules for values besides some ...

Thanks! I have an interview coming up where I'll be asked how HashMap collisions occur and how they can be resolved.

Even this website says that a collision is when 2+ keys are assigned the same value. https://www.educative.io/answers/hash-table-collision-resolution But I thought values can be repeated - multiple keys can have the same value, so long as the keys are unique. So it kind of seems like a collision is legal? I'm not sure.

tribal cliff
#

Where did you see that it was from same value ?

acoustic girder
# tribal cliff Collisions happen when two keys have the same hash, not value

I guess my next question is, what is the difference between a hash and a value in a java HashMap? Even this website says a collision is when two or more keys are mapped to the same value. https://www.geeksforgeeks.org/hashing-in-java/

#

I asked ChatGPT and it basically said - Collision is when two or more values have the same key. Is that one way to think about it?

cerulean hinge
#

They are referring to two or more keys having the same hash value

#

It's very confusingly written

acoustic girder
tribal cliff
#

You have to understand what a map is in order to understand key-value

acoustic girder
#

How are hash values calculated? Is it done implicitly?

#

The hashCode() function is called to calculate the hash value, is that done under the hood?

cerulean hinge
#

HashMap and HashTable both call that

acoustic girder
#

Got it. I'm assuming hashCode() is called everytime a new entry is put into the HashMap? And anytime the hash value for any two keys are the same, there is a collision.

#

Also, according to this website (https://www.educative.io/answers/hash-table-collision-resolution), the hash value is calculated by the following formula -

H(i) % length of HashMap. Is that always the formula used to calculate the hash values? And I'm assuming hash values for each key-value pair is recalculated anytime something is added/removed as the size of the HashMap changes?

tribal cliff
# acoustic girder Also, according to this website (https://www.educative.io/answers/hash-table-col...

The way a hashmap/set works, is that it uses an array, and use hashes to find the index
But what if the hash is 10000000 ? It can't create an array this length, so it will do hash % array.length so it is distributed within the length of the array, and yes, this will increases collisions
When there is a collision, both keys are stored in the same index (called bucket), in the form of a linked list

acoustic girder
# tribal cliff The way a hashmap/set works, is that it uses an array, and use hashes to find t...

That makes a lot of sense. So a hashmap is essentially an array. The hash value is basically the index in the array that a key-value pair is located in.

Arrays have a maximum size (can't be longer than 2,147,483,647 elements) so hashing is done to distribute data within the length of the biggest array possible. This could result in collisions, where some key-value pairs are in the same array index.

Collisions are more likely to occur when hashmaps are larger with more key-value pairs.

Hope I've got that down correctly.

Thanks for the explanation.

tribal cliff
# acoustic girder That makes a lot of sense. So a hashmap is essentially an array. The hash value ...

A few things, a hash can't be bigger than 2 billion since it's an int anyway so it's not the problem
The problem is that you don't want to create a 2Gb array just to store one key
So for example the array will have a length of 10 and all the hashes will be mod 10
Collisions are not more likely with more data, since the map just have to increase the array length
Also note that even without the % length, two hashes may have collisions

acoustic girder
tribal cliff
acoustic girder
# tribal cliff No, it's not the length of the hashmap, it wouldn't make sense, assume that it i...

Got it. Thanks so much. I don't think I need to worry about hashmaps turning into red black trees for my interview but its good to know.

I guess the size of the array and calculating hashcodes are not something I need to worry about doing myself since all that happens under the hood when using the HashMap class in Java.

I understand what the purpose of calculating a hashcode for hashmaps is, and how the same hashcode for multiple key-value pairs results in a collision. That's what matters.

Thanks for taking the time to help me. Appreciate it

tribal cliff
# acoustic girder Got it. Thanks so much. I don't think I need to worry about hashmaps turning int...

Note that the array size is called capacity, the how-full-before-resize is called load factor
Note that both are written in the hashmap doc, so just check it
Note that from the source code, default capacity is 16, default load factor is 0.75, but also note that this is specific to java HashMap, it may different in other languages
And note that HashMap has constructors where you can specify custom capacity and load factor

wary anchor
# acoustic girder Also, according to this website (https://www.educative.io/answers/hash-table-col...

jlyk, it isn't always the length of the array that causes the collision. it also can by the hash method but it's not that common. for example let's build a hashing function for String, be aware this won't be a very good example to actually use in a HashMap but just for the sake of understanding for now.

public static int hash(String string, int length) {
    byte[] bytes = string.getBytes();
    int hash = 0;
    for (int i = 0; i < bytes.length; i++)
        hash += bytes[i];
    return hash % length;
}

So what we're doing we get the string's bytes because each letter represents a number for example "a" converted to 97, "b" converted to 98 etc. And then after we know each letter's number we sum all of them together into one in hash. Basically turning the String into numeric number to be able to store it as index. We then modulo it to the length to make sure the number isn't larger than the length to be valid index. Alright so there's 2 possible collisions here. which is the first way you mentioned, array being too big that the hash method starts returning same hashes or the way it hashes it being too common. this hashing method in the example isn't really good. why? because it can return the same hash with different keys for example "aaab" is exactly the same hash as "abaa" making it return the exact same hash with just different order of the letters or even so the letters sum is equal to another letters'. You could make the hashing more unique for example by hashing the index of each letter too to make the place of letters included in the number making it harder for it to have hashing collision with the same letters in different order. so basically the HashMap hashes

#

the key to use it as index in the array then assign that index the value required.

Here's the current implementation of HashMap's hashing method:

    static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }
fresh torrentBOT
#
public static <K, V> HashMap<K, V> newHashMap(
  int numMappings
)

Creates a new, empty HashMap suitable for the expected number of mappings. The returned map uses the default load factor of 0.75, and its initial capacity is generally large enough so that the expected number of mappings can be added without resizing the map.

wanton quarry
#

If you know the approximate number of keys then you can directly use this method

#

It will make the necessary calculations and create a hashmap with the desired initial capacity

pearl glacierBOT
#

Closed the thread due to inactivity.

If your question was not resolved yet, feel free to just post a message to reopen it, or create a new thread. But try to improve the quality of your question to make it easier to help you 👍

pearl glacierBOT