#Whether to use hash map or list?

1 messages · Page 1 of 1 (latest)

chrome widget
#

I have one application where I process messages from Kafka, in the object that I receive from the Kafka, I have one value that is basically ID, after each Kafka message I have to process message in the following way:

  • use that id value to check if it is in the IDs list
  • if it is not in the list, then I add that id to the list and process it

I wonder whether instead of saving id to the list it would be more efficient to save it in the hashmap and then check whether there exist particular key in the hash map, instead of checking whether there is id in the list

spiral slateBOT
#

<@&987246399047479336> please have a look, thanks.

spiral slateBOT
#

While you are waiting for getting help, here are some tips to improve your experience:

Code is much easier to read if posted with syntax highlighting and proper formatting.

If nobody is calling back, that usually means that your question was not well asked and hence nobody feels confident enough answering. Try to use your time to elaborate, provide details, context, more code, examples and maybe some screenshots. With enough info, someone knows the answer for sure.

Don't forget to close your thread using the command </help-thread close:1027500463647621170> when your question has been answered, thanks.

#

Here is an AI assisted attempt to answer your question 🤖. Maybe it helps! In any case, a human is on the way 👍. To continue talking to the AI, you can use </chatgpt:1108714622413963314>.

#

s not, process the message and add the ID to the list.

In this scenario, using a hash map would be more efficient than using a list.

A hash map provides constant time complexity (O(1)) for both insertion and lookup operations, while a list has linear time complexity (O(n)) for lookup operations. This means that as the size of your data grows, the performance of a hash map will remain constant, while the performance of a list will degrade.

Additionally, a hash map allows you to quickly check if an ID exists in the data set or not. If you use a list, you would need to iterate through each element to find the ID, which can be time-consuming for large data sets.

Therefore, using a hash map would provide better performance and efficiency for your application in this case.

jade spoke
#

HashMap is usually better if you're performing a lot of searches or even if you have a lot of ids to look up for.

#

Could also use HashSet, if you just care about if a certain id is present or not.

#

Btw one more thing i should mention, both are not thread-safe if that's relevant to your use case.

jade spoke
#

in multi-threaded environment, you can not rely on what it says. Things can break.

#

You might wanna open a new thread, to keep this relevant to OP's questions.😄

chrome widget
#

@jade spoke actually I can see now that I use ConcurrentSkipListSet. Yeah thread-safety is relevant for my case. Is there anything better other than ConcurrentSkipListSet?

jade spoke
#

Might wanna look into HashTable, think its synchronized

chrome widget
jade spoke
#

altho i dont think you would need get

#

but just read up the docs just in case if you're missing something otherwise you have HashTable

chrome widget
#

What are the differences between hash map and hash table?

jade spoke
#

I personally havent used em, mostly just hashmap and hashset. Hash table was something that came up on a quick search about a thread safe way to perform same operation. I would suggest reading up the docs or more investigation around hash table to be on safer side.

vernal drum
#

Hey there!

sonic bladeBOT
#
public class Hashtable<K, V>
  extends Dictionary<K, V>
  implements Map<K, V>, Cloneable, Serializable

This class implements a hash table, which maps keys to values. Any non-null object can be used as a key or as a value.

To successfully store and retrieve objects from a hashtable, the objects used as keys must implement the hashCode method and the equals method.

An instance of Hashtable has two parameters that affect its performance: initial capacity and load factor . The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created. Note that the hash table is open : in the case of a "hash collision", a single bucket stores multiple entries, which must be searched sequentially. The load factor

round cove
lusty junco
#

( @vernal drum )

#

likely confused with HashMap, which is the proper class to use

vernal drum
#

I dnt use it thought since 2017

#

So yeah it's HashMap

amber mist
#

Why does it need to be synced. Isnt the id you check for the same you use to map the message in the kafka topic?

#

So the same listener always gets the same ids? (unless youre in a autoscaling multi pod environment)

#

but agree, just use Collections.synchronizedMap(new HashMap<...>())

round cove
round cove
chrome widget
round cove
chrome widget
round cove
chrome widget
round cove
marsh pelican
#

map is actually used for key and value pair

chrome widget
#

If your app needs to access it concurrently, the same problem would arise when using a list

#

As for hashmap vs list, it really depends on how many items you have

#

A hashmap is fast. But a list can be significantly faster when you don't have a lot of elements

#

Also, as said above, a hashmap is for mapping keys to values.

#

If you don't have values, you should use a HashSet

round cove
chrome widget
#

Searching a list vs looking up in a hashset
Item count makes the difference there

#

The question says that they currently use a list search + insert if it doesn't exist
I'm saying that that could be faster than using a hashmap / hashset when the item count is small

chrome widget
#

ConcurrentHashMap.newKeySet();

#

Creates concurrent (hash) Set

#

That would be the set equivalent of ConcurrentHashMap

#

Similar to normal hashset

#

except synchronization ofc

chrome widget
# round cove Speed doesn't matter

I wonder whether instead of saving id to the list it would be more efficient to save it in the hashmap
I'm implying this as a speed question

chrome widget
#

Bro I read your messages

#

it's not like they dissapear when some1 else talks

round cove
chrome widget
#

That's fair

chrome widget
jade spoke
#

Can we do hashmap but sync the add/delete only?

chrome widget
#

¯_(ツ)_/¯

#

I'd answer this with
A Set vs List wouldn't really matter when looking at speed cuz of synchronization
A Set would be clearer in this case, since you need unique elements.

#

So I see benefit in replacing the list with a set

chrome widget
#

Do I need to worry about synchronization if I am at some point removing all elements from the set but maybe at the same time I will add one element to existing set when I try to remove all of them?

chrome widget
#

If you have multiple threads accessing it at some point (or modifying), you gotta have synchronization

#

But since synchronization makes things wait for each other, that would give a way bigger performance difference than the difference with list vs set

#

Imo I would go for the set, because a set is for unique elements, which I believe you want

#

Keep in mind tho

#

When we say 'performance difference'

#

That is still most likely nihil

#

Unless you do this extremely often

#

One other question, what is difference between using Long and long, I know that Long is object

#

Yes that is basically the difference. they are just wrappers that contain the primitive type.
Long Integer and so on are basically because of the java limitation that you can't use primitives with generics

#

So List<int> isn't allowed

#

That's why those wrapper classes exist

round cove