Whether to use hash map or list? | Together Java

chrome widget Oct 3, 2023, 3:52 PM

#

I have an application that processes messages from Kafka. The object I receive from Kafka contains a value that is basically an ID, and for each Kafka message I have to process it as follows:

Use that ID value to check whether it is in the list of IDs.
If it is not in the list, add the ID to the list and process the message.

I wonder whether, instead of saving the ID to the list, it would be more efficient to save it in a hash map and check whether a particular key exists in the map, instead of checking whether the ID is in the list.
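For reference, the list-based flow described above might look roughly like this in Java. This is a minimal sketch: the class name `ListDedup` and the method `markIfNew` are hypothetical, and IDs are assumed to be `long` values.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the current approach: remember seen IDs in a List.
class ListDedup {
    private final List<Long> seenIds = new ArrayList<>();

    // Returns true if the ID was not seen before (so the message should be processed).
    boolean markIfNew(long id) {
        if (seenIds.contains(id)) { // linear scan over every stored ID: O(n)
            return false;
        }
        seenIds.add(id);            // append: amortized O(1)
        return true;
    }

    public static void main(String[] args) {
        ListDedup dedup = new ListDedup();
        System.out.println(dedup.markIfNew(7L)); // true
        System.out.println(dedup.markIfNew(7L)); // false
    }
}
```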

spiral slate (BOT) Oct 3, 2023, 3:52 PM

#

<@&987246399047479336> please have a look, thanks.

spiral slate (BOT) Oct 3, 2023, 3:52 PM

#

spiral slate <@&987246399047479336> please have a look, thanks.

#

While you are waiting for help, here are some tips to improve your experience:

Code is much easier to read if posted with syntax highlighting and proper formatting.

If nobody replies, that usually means that your question was not well asked and hence nobody feels confident enough to answer. Try to use your time to elaborate: provide details, context, more code, examples and maybe some screenshots. With enough info, someone will know the answer for sure.

Don't forget to close your thread using the command `/help-thread close` when your question has been answered, thanks.

#

Here is an AI-assisted attempt to answer your question 🤖. Maybe it helps! In any case, a human is on the way 👍. To continue talking to the AI, you can use `/chatgpt`.

#

… If it is not, process the message and add the ID to the list.

In this scenario, using a hash map would be more efficient than using a list.

A hash map provides average-case constant time complexity (O(1)) for both insertion and lookup, while a list has linear time complexity (O(n)) for lookup, because `contains` has to scan the elements one by one. This means that as your data grows, hash map lookups stay roughly constant in cost, while list lookups get slower in proportion to the number of stored IDs.

Additionally, a hash map allows you to quickly check if an ID exists in the data set or not. If you use a list, you would need to iterate through each element to find the ID, which can be time-consuming for large data sets.

Therefore, using a hash map would provide better performance and efficiency for your application in this case.
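A sketch of the hash-map variant from the question, assuming the ID is used as the key and the value is only a placeholder (`Boolean.TRUE` here, an arbitrary choice), since only key presence matters. `containsKey` and `put` run in expected O(1) time given reasonably distributed hash codes. The class name is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: the ID is the key, the value is just a placeholder.
class MapDedup {
    private final Map<Long, Boolean> seen = new HashMap<>();

    // Returns true if the ID was not seen before.
    boolean markIfNew(long id) {
        if (seen.containsKey(id)) { // expected O(1) hash lookup
            return false;
        }
        seen.put(id, Boolean.TRUE); // expected O(1) insert
        return true;
    }

    public static void main(String[] args) {
        MapDedup dedup = new MapDedup();
        System.out.println(dedup.markIfNew(42L)); // true
        System.out.println(dedup.markIfNew(42L)); // false
    }
}
```

Note that this is not thread-safe, which the thread gets into below.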

jade spoke Oct 3, 2023, 4:27 PM

#

HashMap is usually better if you're performing a lot of searches, or if you have a lot of IDs to look up.

#

Could also use HashSet, if you only care whether a certain ID is present or not.
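With a `HashSet` the check and the insert collapse into one call, because `Set.add` returns `false` when the element is already present. A minimal sketch (the class name is hypothetical):

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: a HashSet stores only keys, no placeholder values needed.
class SetDedup {
    private final Set<Long> seen = new HashSet<>();

    // add() returns false if the ID is already in the set,
    // so lookup and insert happen in a single expected-O(1) call.
    boolean markIfNew(long id) {
        return seen.add(id);
    }

    public static void main(String[] args) {
        SetDedup dedup = new SetDedup();
        if (dedup.markIfNew(42L)) {
            System.out.println("processing 42");
        }
        if (!dedup.markIfNew(42L)) {
            System.out.println("42 already seen, skipping");
        }
    }
}
```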

#

Btw, one more thing I should mention: neither of them is thread-safe, if that's relevant to your use case.

chrome widget Oct 3, 2023, 4:38 PM

#

jade spoke Btw one more thing i should mention, both are not thread-safe if that's relevant...

What's thread safe?

jade spoke Oct 3, 2023, 4:39 PM

#

In a multi-threaded environment, if several threads read and modify it at the same time, you can't rely on what it returns. Things can break.

#

You might wanna open a new thread to keep this relevant to OP's question. 😄

chrome widget Oct 3, 2023, 4:53 PM

#

@jade spoke actually I can see now that I use ConcurrentSkipListSet. Yeah, thread safety is relevant for my case. Is there anything better than ConcurrentSkipListSet?
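For comparison, `ConcurrentSkipListSet` is thread-safe but sorted, and its Javadoc gives expected average log(n) cost for `contains`, `add` and `remove`, rather than the expected O(1) of a hash-based set. The sorting can still be useful: the sketch below (hypothetical names, and it assumes older IDs are numerically smaller, which may not hold for your IDs) drops all IDs below a threshold through a `headSet` view.

```java
import java.util.concurrent.ConcurrentSkipListSet;

// Hypothetical sketch using the set the thread already relies on.
class SkipListDedup {
    // Thread-safe, sorted set; contains/add/remove are expected O(log n).
    private final ConcurrentSkipListSet<Long> seen = new ConcurrentSkipListSet<>();

    boolean markIfNew(long id) {
        return seen.add(id); // false if the ID is already present
    }

    // Removes every ID strictly below the threshold.
    // Assumption: IDs grow over time, so small IDs are "old".
    void forgetBelow(long threshold) {
        seen.headSet(threshold).clear(); // headSet is a live view backed by the set
    }

    int size() {
        return seen.size();
    }

    public static void main(String[] args) {
        SkipListDedup dedup = new SkipListDedup();
        dedup.markIfNew(5L);
        dedup.markIfNew(10L);
        dedup.markIfNew(5L);   // duplicate, ignored
        dedup.forgetBelow(8L); // drops 5
        System.out.println(dedup.size()); // 1
    }
}
```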

jade spoke Oct 3, 2023, 4:55 PM

#

Might wanna look into Hashtable, I think it's synchronized

chrome widget Oct 3, 2023, 4:57 PM

#

jade spoke Might wanna look into `Hashtable`, I think it's synchronized

I think I could use ConcurrentHashMap
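If you go with `ConcurrentHashMap`, one detail matters: calling `containsKey` and then `put` is two separate operations, so two threads can both see the ID as missing and both process the message. `putIfAbsent` performs the check and the insert atomically. A minimal sketch, where storing a first-seen timestamp as the value is a hypothetical choice:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch: ID -> time it was first seen (the value is just an example).
class ConcurrentMapDedup {
    private final ConcurrentMap<Long, Long> firstSeen = new ConcurrentHashMap<>();

    // Racy alternative (avoid): if (!firstSeen.containsKey(id)) firstSeen.put(id, now);
    // putIfAbsent returns null only for the one caller that actually inserted the key.
    boolean markIfNew(long id, long nowMillis) {
        return firstSeen.putIfAbsent(id, nowMillis) == null;
    }

    public static void main(String[] args) {
        ConcurrentMapDedup dedup = new ConcurrentMapDedup();
        long now = System.currentTimeMillis();
        System.out.println(dedup.markIfNew(1L, now)); // true
        System.out.println(dedup.markIfNew(1L, now)); // false
    }
}
```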

jade spoke Oct 3, 2023, 4:59 PM

#

Although I don't think you would need `get`

#

But just read up on the docs in case you're missing something; otherwise you have Hashtable.

chrome widget Oct 3, 2023, 5:04 PM

#

What are the differences between hash map and hash table?

jade spoke Oct 3, 2023, 6:10 PM

#

I personally haven't used it, mostly just HashMap and HashSet. Hashtable came up in a quick search for a thread-safe way to perform the same operation. I would suggest reading the docs or investigating Hashtable further, to be on the safe side.

vernal drum Oct 4, 2023, 5:46 AM

#

Hey there!

sonic blade (BOT) Oct 4, 2023, 5:46 AM

#

Hashtable

📦 java.base/java.util

public class Hashtable<K, V>
  extends Dictionary<K, V>
  implements Map<K, V>, Cloneable, Serializable

This class implements a hash table, which maps keys to values. Any non-null object can be used as a key or as a value.

To successfully store and retrieve objects from a hashtable, the objects used as keys must implement the hashCode method and the equals method.

An instance of Hashtable has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created. Note that the hash table is open: in the case of a "hash collision", a single bucket stores multiple entries, which must be searched sequentially. The load factor

round cove Oct 4, 2023, 7:47 AM

#

sonic blade

That's legacy, don't use it
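To make the `Hashtable` vs `HashMap` difference concrete: `Hashtable` is a legacy class whose public methods are `synchronized`, and it rejects `null` keys and values, while `HashMap` is unsynchronized and allows one `null` key and `null` values. The `Hashtable` Javadoc itself recommends `HashMap` when thread safety is not needed and `ConcurrentHashMap` for highly concurrent use. A small demonstration (the class name is illustrative):

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

class HashtableVsHashMap {
    // HashMap accepts a null key (at most one) and null values.
    static boolean hashMapAcceptsNullKey() {
        Map<String, String> map = new HashMap<>();
        map.put(null, "ok");
        return "ok".equals(map.get(null));
    }

    // Hashtable throws NullPointerException for a null key (or value).
    static boolean hashtableRejectsNullKey() {
        Map<String, String> table = new Hashtable<>();
        try {
            table.put(null, "boom");
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(hashMapAcceptsNullKey());   // true
        System.out.println(hashtableRejectsNullKey()); // true
    }
}
```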

lusty junco Oct 4, 2023, 7:51 AM

#

( @vernal drum )

#

likely confused with HashMap, which is the proper class to use

vernal drum Oct 4, 2023, 7:54 AM

#

lusty junco likely confused with `HashMap`, which is the proper class to use

Hey! No, I'm not, I thought Hashtable was still in use

#

I haven't used it since 2017, though

#

So yeah it's HashMap

amber mist Oct 4, 2023, 7:55 AM

#

Why does it need to be synced? Isn't the ID you check the same one you use as the message key in the Kafka topic?

#

So the same listener always gets the same IDs? (Unless you're in an autoscaling multi-pod environment.)

#

But I agree, just use `Collections.synchronizedMap(new HashMap<...>())`
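A sketch of the `Collections.synchronizedMap` suggestion, assuming `Long` keys and `Boolean` placeholder values. Each call on the wrapper locks the whole map, and since Java 8 the wrapper's `putIfAbsent` also runs under that lock. Iteration is the exception: the `Collections.synchronizedMap` Javadoc requires you to synchronize on the wrapper manually while iterating over it or its views. Class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; the class and method names are illustrative.
class SyncMapDedup {
    private final Map<Long, Boolean> seen = Collections.synchronizedMap(new HashMap<>());

    // One call, one lock acquisition: check and insert cannot interleave.
    boolean markIfNew(long id) {
        return seen.putIfAbsent(id, Boolean.TRUE) == null;
    }

    // Iterating a view needs the wrapper's lock, per the Javadoc.
    List<Long> snapshotIds() {
        List<Long> copy = new ArrayList<>();
        synchronized (seen) {
            for (Long id : seen.keySet()) {
                copy.add(id);
            }
        }
        return copy;
    }

    public static void main(String[] args) {
        SyncMapDedup dedup = new SyncMapDedup();
        dedup.markIfNew(3L);
        dedup.markIfNew(3L);
        System.out.println(dedup.snapshotIds()); // [3]
    }
}
```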

round cove Oct 4, 2023, 7:59 AM

#

jade spoke Might wanna look into `Hashtable`, I think it's synchronized

As said above, don't use it, @chrome widget

round cove Oct 4, 2023, 8:00 AM

#

chrome widget <@1014989258165072028> actually I can see now that I use ConcurrentSkipListSet. ...

Are you sure that it is relevant, and only for the map, nothing else?

chrome widget Oct 4, 2023, 8:02 AM

#

round cove Are you sure that it is relevant, and only the map, nothing else ?

Yeah, I think I could use ConcurrentHashMap. Adding data and checking whether the map contains a particular key is O(1) in the average case. Also it is synchronized, which is good because every so often I remove elements.

round cove Oct 4, 2023, 8:09 AM

#

chrome widget Yeah, I think that I could use ConcurrentHashMap. For adding data and check whet...

I don't understand your sentence. What don't you know?
And don't put synchronized blindly around your code.

chrome widget Oct 4, 2023, 8:15 AM

#

round cove I don't understand your sentence, what do you not know ? And don't put synchroni...

Edited sentence

I am not using synchronized. I guess that by using ConcurrentHashMap, only one thread can access the map.

round cove Oct 4, 2023, 8:17 AM

#

chrome widget Edited sentence I am not using synchronized, I guess that by using ConcurrentHa...

But then are you sure about the code surrounding this? That there is no concurrent access there?

chrome widget Oct 4, 2023, 8:20 AM

#

round cove But then are you sure about the surrounding of this code ? That there is no conc...

Yeah there is no concurrent access, that is fine

round cove Oct 4, 2023, 8:22 AM

#

chrome widget Yeah there is no concurrent access, that is fine

So you only have concurrent access to the map, but nothing else?

chrome widget Oct 4, 2023, 8:23 AM

#

round cove So you only have a concurrent access for the map but nothing else ?

Yeah

marsh pelican Oct 4, 2023, 8:25 AM

#

A map is actually meant for key-value pairs.

chrome widget Oct 4, 2023, 8:30 AM

#

chrome widget Edited sentence I am not using synchronized, I guess that by using ConcurrentHa...

ConcurrentHashMap allows any thread to access the map just like normal.
However, it keeps each individual operation thread-safe. If 2 threads update the same entry at the same time, one has to wait for the other to finish its update, while reads generally don't block.

#

If your app needs to access it concurrently, the same problem would arise when using a list

#

As for hashmap vs list, it really depends on how many items you have

#

A hashmap is fast. But a list can be significantly faster when you don't have a lot of elements

#

Also, as said above, a hashmap is for mapping keys to values.

#

If you don't have values, you should use a HashSet

round cove Oct 4, 2023, 8:34 AM

#

chrome widget As for hashmap vs list, it really depends on how many items you have

No, it simply depends on whether you want access by key or not.

chrome widget Oct 4, 2023, 8:35 AM

#

Searching a list vs looking up in a hashset
Item count makes the difference there

#

The question says that they currently use a list search + insert if it doesn't exist
I'm saying that that could be faster than using a hashmap / hashset when the item count is small

chrome widget Oct 4, 2023, 8:36 AM

#

chrome widget If you don't have values, you should use a `HashSet`

I need concurrency because at some point I delete items, and there is no ConcurrentHashSet.

#

`ConcurrentHashMap.newKeySet();`

#

Creates concurrent (hash) Set

#

That would be the set equivalent of ConcurrentHashMap

#

Similar to normal hashset

#

except synchronization ofc
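A sketch of `ConcurrentHashMap.newKeySet()` (available since Java 8) used as a concurrent set. The thread count and ID range are arbitrary test values and the class name is hypothetical: four threads feed the same 1,000 IDs, and because `add` is atomic, each ID is reported as new to exactly one thread, so exactly 1,000 messages get processed.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class KeySetDedup {
    // A thread-safe Set backed by a ConcurrentHashMap; add/contains are expected O(1).
    private final Set<Long> seen = ConcurrentHashMap.newKeySet();

    boolean markIfNew(long id) {
        return seen.add(id); // atomic: only one caller gets true per ID
    }

    // Simulates 4 consumer threads receiving the same 1,000 IDs.
    static int processedCount() {
        KeySetDedup dedup = new KeySetDedup();
        AtomicInteger processed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int t = 0; t < 4; t++) {
            pool.submit(() -> {
                for (long id = 0; id < 1_000; id++) {
                    if (dedup.markIfNew(id)) {
                        processed.incrementAndGet();
                    }
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return processed.get();
    }

    public static void main(String[] args) {
        System.out.println(processedCount()); // 1000
    }
}
```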

round cove Oct 4, 2023, 8:38 AM

#

chrome widget Searching a list vs looking up in a hashset Item count makes the difference ther...

Speed doesn't matter

chrome widget Oct 4, 2023, 8:39 AM

#

round cove Speed doesn't matter

> I wonder whether instead of saving id to the list it would be more efficient to save it in the hashmap
I read this as a speed question

chrome widget Oct 4, 2023, 8:40 AM

#

chrome widget `ConcurrentHashMap.newKeySet();`

Is the time complexity of access then O(1)?

#

Bro I read your messages

#

it's not like they disappear when someone else talks

round cove Oct 4, 2023, 8:40 AM

#

chrome widget > I wonder whether instead of saving id to the list it would be more efficient t...

The answer is it doesn't matter.
The synchronization will eat up any performance gain anyway.

chrome widget Oct 4, 2023, 8:40 AM

#

That's fair

chrome widget Oct 4, 2023, 8:40 AM

#

chrome widget Is time complexity of acces then O(1)?

Similar to a normal HashSet (so O(1) on average), but indeed synchronization will take away a lot

jade spoke Oct 4, 2023, 8:41 AM

#

Can we do hashmap but sync the add/delete only?
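On syncing only add/delete: with a plain `HashMap` or `HashSet` that is not enough, because an unsynchronized read can run in the middle of a write (for example during a resize) and, under the Java Memory Model, is not guaranteed to see other threads' writes. One common alternative is a `ReentrantReadWriteLock`, where reads share a lock and writes take it exclusively. This is a sketch with hypothetical names; `ConcurrentHashMap` already gives you thread-safe, mostly non-blocking reads without writing any locking yourself.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: a plain HashSet guarded by a read/write lock.
class ReadWriteDedup {
    private final Set<Long> seen = new HashSet<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    // Many threads may hold the read lock at the same time.
    boolean contains(long id) {
        lock.readLock().lock();
        try {
            return seen.contains(id);
        } finally {
            lock.readLock().unlock();
        }
    }

    // Writers get exclusive access; check-and-add happens under one lock.
    boolean markIfNew(long id) {
        lock.writeLock().lock();
        try {
            return seen.add(id);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Clearing is a write, so it also takes the exclusive lock.
    void removeAll() {
        lock.writeLock().lock();
        try {
            seen.clear();
        } finally {
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        ReadWriteDedup dedup = new ReadWriteDedup();
        dedup.markIfNew(1L);
        System.out.println(dedup.contains(1L)); // true
        dedup.removeAll();
        System.out.println(dedup.contains(1L)); // false
    }
}
```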

chrome widget Oct 4, 2023, 8:42 AM

#

¯\_(ツ)_/¯

#

I'd answer this with
A Set vs List wouldn't really matter when looking at speed cuz of synchronization
A Set would be clearer in this case, since you need unique elements.

#

So I see benefit in replacing the list with a set

chrome widget Oct 4, 2023, 8:50 AM

#

chrome widget I'd answer this with A Set vs List wouldn't really matter when looking at speed ...

A Set vs List wouldn't really matter when looking at speed cuz of synchronization

How do you explain that?

#

Do I need to worry about synchronization if at some point I remove all elements from the set, while at the same time another element might be added to it?

chrome widget Oct 4, 2023, 9:18 AM

#

If multiple threads access (or modify) it at some point, you gotta have synchronization

#

But since synchronization makes things wait for each other, that would give a way bigger performance difference than the difference with list vs set
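For the remove-everything-while-adding case, a concurrent set such as `ConcurrentHashMap.newKeySet()` will not throw or corrupt its state, but the `ConcurrentHashMap` Javadoc notes that for aggregate operations like `clear`, concurrent operations may see only some of the removals. So an ID added while `clear()` is running may or may not survive it. The sketch below (class name and values are hypothetical) shows that the outcome is either an empty set or a set holding just the new ID; if your logic needs a stricter guarantee, you would need extra coordination, such as a lock around both operations.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class ClearWhileAdding {
    // Returns the set size after one thread clears while another adds.
    static int sizeAfterRace() {
        Set<Long> seen = ConcurrentHashMap.newKeySet();
        for (long id = 0; id < 10_000; id++) {
            seen.add(id);
        }

        Thread clearer = new Thread(seen::clear);
        Thread adder = new Thread(() -> seen.add(99_999L));
        clearer.start();
        adder.start();
        try {
            clearer.join();
            adder.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        // No exception either way; 99_999 survives only if it was added
        // after clear() had already passed over its bin.
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println(sizeAfterRace()); // 0 or 1, depending on timing
    }
}
```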

#

Imo I would go for the set, because a set is for unique elements, which I believe you want

#

Keep in mind tho

#

When we say 'performance difference'

#

That is still most likely negligible

#

Unless you do this extremely often

#

One other question: what is the difference between using Long and long? I know that Long is an object.

#

Yes, that is basically the difference. They are just wrappers that contain the primitive value.
Long, Integer and so on exist mainly because of the Java limitation that you can't use primitives with generics

#

So List<int> isn't allowed

#

That's why those wrapper classes exist
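A small illustration of the `Long`/`long` point, plus one pitfall that is easy to hit with an ID set: `contains(Object)` accepts any object, so passing an `int` literal boxes it to `Integer`, which is never equal to a `Long`, even with the same numeric value. The class name is illustrative.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class BoxingDemo {
    static Set<Long> buildIds() {
        // List<long> would not compile: generics need reference types like Long.
        List<Long> ids = new ArrayList<>();
        ids.add(42L);            // autoboxing: long -> Long
        long first = ids.get(0); // unboxing: Long -> long
        System.out.println("first id = " + first);
        return new HashSet<>(ids);
    }

    public static void main(String[] args) {
        Set<Long> seen = buildIds();
        System.out.println(seen.contains(42L)); // true
        // 42 is an int literal, boxed to Integer; Integer.equals(Long) is false.
        System.out.println(seen.contains(42));  // false

        // A Long can be null; unboxing a null Long throws NullPointerException.
        Long missing = null;
        try {
            long value = missing;
            System.out.println(value);
        } catch (NullPointerException e) {
            System.out.println("cannot unbox null");
        }
    }
}
```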

round cove Oct 4, 2023, 12:18 PM

#

chrome widget Do I need to worry about synchronization if I am at some point removing all elem...

I really can't say if you are misusing those classes or if you have concurrency problems or not, you should show your code
