#Performance and Construction Advice (Priority Queue Lib)

84 messages ยท Page 1 of 1 (latest)

thick mountain
#

Hi all, I've been working on a small crate for the past week learning rust. Was hoping someone might be able to look over it and provide some feedback on any issues or things I could be doing better. I've spent quite a bit of time in flame graph optimizing things but I'm not sure that I'm doing all that I can.

Thanks in advance to anyone willing to provide feedback!

https://github.com/JustinTimperio/rpq

GitHub

RPQ is a high performance embeddable double priority queue with complex priority ordering guarantees - GitHub - JustinTimperio/rpq: RPQ is a high performance embeddable double priority queue wit...

sacred trellis
# thick mountain Hi all, I've been working on a small crate for the past week learning rust. Was ...

Hello ๐Ÿ‘‹

Ok a few things I can suggest to improve performance would be:

  • If you don't have a particular need of hashing or a strong requirement then I would suggest replacing hashmaps with Vec<(datatype, datatype)> this will improve speed by eliminating the hashing overhead on each element insersation and in vector datatype there are also some really efficient functions like swap_remove() which can really improve speed if used in the right way.

  • I would look into the build profile flags like lto, strip, codegen-units, etc which you can tweak it for release builds which can give some really good performance.

Also, for a complete guide on different other optimisation techniques you can use I would suggest going through the performance book here ๐Ÿ™‚ :

https://nnethercote.github.io/perf-book/introduction.html

I hope this helps ๐Ÿ™‚

granite lynx
sacred trellis
granite lynx
sacred trellis
# granite lynx I have no idea what you're talking about.

Ok I would suggest this post on time complexity of hashing:

https://crypto.stackexchange.com/questions/67448/what-is-the-time-complexity-of-computing-a-cryptographic-hash-function-random-or

I think it will help understand I am saying more clearly.

Also, now after reading this now if you look into how hashmaps work you will see that if you insert any new key value pair what it does is it takes the key and then it uses a hashing function on it and then sorts all the keys based on this newly hashed key now as you can see already the overhead is there like first it needs to hash then it needs to sort and sorting itself can at best be O(n log(n)) now adding both O(k+ n) + O(n log(n)) it just gets worst like why would you want to add such a time complexity on every new key value inserted to your hashmap like it is not very good in performance that's why if you come from c++ there they have talks about data structure where they say try to avoid hashmaps as much as possible use data structures which are memory contingent and also if you don't care about hashing or ddos protection just use vectors they perform way better.

Also, you might argue rust uses swisstable yes I agree it does and it is memory contingent that's why I didn't mention about memory contingency like we have already solved that but what is the hit in performance here is first hashing and then sorting so the overhead internally of a data structure like hashmaps and hashsets are high so that's I suggested going with vectors than with this data structures if you don't have the hard requirements as I mentioned. ๐Ÿ˜…

I know again this sounds confusing but to understand this I think you might need to study like how internally how each data structure work and you will really understand what I meant. ๐Ÿ˜…

granite lynx
#

I think you're misunderstanding something.

sacred trellis
granite lynx
#

Hashmaps just do an array indexing operation. That doesn't require sorting.

sacred trellis
granite lynx
sacred trellis
# granite lynx Because hashmaps are fast for key-value lookups.

Yes true but insertions are slow. And for a queue I don't think hashmaps are great as a whole like inserting each element O(n+k) while inseting an element in vector is just O(1) and O(n) (if you insert from front) quite good in alot of cases while remove is also the same O(1) from back and O(n) from front. And get is O(1) in vector. So you see what I mean that's why even std library we have a vector based deque vecdeque though note it is not contingenious. I hope you got the idea. ๐Ÿ˜…

granite lynx
#

Inserting into a hashmap is O(1)

sacred trellis
# granite lynx Inserting into a hashmap is O(1)

Well what about the hashing part it will be an overhead right?

Though don't take this whole discussion as an argument or debate just maybe with our discussion we may get things cleared like I mean all the doubts everything cleared who knows. ๐Ÿ˜…

granite lynx
#

There is an overhead for hashing, yes. But if you need to search for something corresponding to a key, a HashMap will be better than a Vec, since searching in a Vec is O(n), while searching a HashMap is O(1).

sacred trellis
short cove
# granite lynx Inserting into a hashmap is O(1)

(amorized yes, but if you insert enough it will have to grow eventually, which is gonna be O(n) as it will have to basically construct a bigger hashmap again from scratch (which is also true for vec, it may need to resize as well)

deep garnet
# thick mountain Hi all, I've been working on a small crate for the past week learning rust. Was ...

https://github.com/JustinTimperio/rpq/blob/master/src/pq.rs#L200
There are pub fns that are commented as for "internal use only". These should be pub(crate) or module-private if possible, or should at least be #[doc(hidden)]

GitHub

RPQ is a high performance embeddable double priority queue with complex priority ordering guarantees - JustinTimperio/rpq

iron cairn
#

how do you write like an ai

sacred trellis
short cove
#

if you do random lookups by key, using a vec of tuples is only gonna be better than a hashmap if there are very few elements

sacred trellis
# iron cairn how do you write like an ai

Well, Idk about that tbh like why whatever I write sounds like AI generated though in reality I am typing myself with my thoughts. So no idea why it sounds like that. ๐Ÿ˜…

sacred trellis
thick mountain
thick mountain
#

@here I have resolved a number of the issues in the library and have merged in the RC1 that fixes some bugs and resolves some of the performance and synchronization issues. If anyone would like to take a look feel free

sacred trellis
# thick mountain For my use case Vec is indeed faster than a HashMap given that typically priorit...

Well after reviewing your PR. I think I would suggest a few things:

  • The choice of Vecdeque is good but one thing to note it is not a continginous data structures meaning it will have cache misses alot which can really slow things down if you are reading data alot which is not great. So I would suggest if you can just implement your own dequeue from scratch using purely vectors as they tend to be contigenous and less prone to cache misses. Thus improving performance.

Also, when you implement your own deque structure and if you need to have the need to implement find/search function I think you can take the help of iterators for that like iter + find or iter + filter though I assume they should be O(1) in time complexity seeing the lazy nature of iterators and reading the std library docs though I will still leave the confirmation of whether it is O(1) on Helix(noop_noob) & noratrieb because I could be wrong with this one ๐Ÿ˜… .

  • I would suggest using iterators more than loops because they have zero cost abstraction and are slightly more performant than loops which becomes more pronounced if you iterate over large number of elements.

Other than that I would suggest setting up clippy lints like:

#![forbid(unsafe_code, clippy::panic)]
#![deny(missing_docs, clippy::missing_docs_in_private_items, clippy::perf)]
#![warn(clippy::cognitive_complexity, rust_2018_idioms)]

Which can really help diagnose your code easily and suggest code suggestion to improve code readability and performance. Though note don't copy and paste the above example blindly I would strongly suggest reading each lint carefully before you do so. ๐Ÿ™‚

granite lynx
sacred trellis
# granite lynx Why on earth would you think that cobbling up a deque from Vecs manually would h...

Well if you read the last paragraph on the vecdequeue docs in the std library it says:

Since VecDeque is a ring buffer, its elements are not necessarily contiguous in memory. If you want to access the elements as a single slice, such as for efficient sorting, you can use make_contiguous. It rotates the VecDeque so that its elements do not wrap, and returns a mutable slice to the now-contiguous element sequence.

Where it says that it is not memory contigent meaning all the elements are not stored in one after the other memory which cause cache misses when you are iterating over this data structure. To understand this I would suggest this video to you:

https://www.youtube.com/watch?v=247cXLkYt2M

And for iterators I think I had seen a benchmark on this and even I have done and it is slightly I mean marginally but quite noticeable when iterating over a large number of elements. ๐Ÿ™‚

Why is the first loop 10x faster than the second, despite doing the exact same work?

Follow me on:
Twitter: https://twitter.com/iced_coffee_dev
Github: https://github.com/simondevyoutube/

In this video we talk a bit about memory, the cpu caches (specifically the L1/L2/L3 cache), hardware prefetching, and how all this comes together for arrays....

โ–ถ Play video
rose veldt
rose veldt
sacred trellis
# rose veldt i would like to point out that a VecDeque is continuous in so much as the alloca...

Well I read this in the std library docs that's why I made the suggestion ๐Ÿ˜… :

Since VecDeque is a ring buffer, its elements are not necessarily contiguous in memory. If you want to access the elements as a single slice, such as for efficient sorting, you can use make_contiguous. It rotates the VecDeque so that its elements do not wrap, and returns a mutable slice to the now-contiguous element sequence.

rose veldt
halcyon portal
#

Ye, a vecdeque at worst means you cache two memory regions that add up to the same size as a single contiguous one

#

That's not a big deal

#

Caches dislike your data being scattered in memory. A VecDeque is a bit more scattered than a Vec, but by a tiny margin

rose veldt
#

its not like a linked list that has potentially one page per element

sacred trellis
sacred trellis
iron cairn
#

you have these clippy lints, yet you use Box<dyn Error>

thick mountain
iron cairn
#

yes.

thick mountain
#

k im gonna have to sit down and do that

iron cairn
#

look into thiserror

short cove
#

if youre a library and dont have that many different errors you probably just want to implement your own error types by hand without any other library

iron cairn
#

possiblu

#

but you can inline the thiserror macro anyways

#

and sometimes you have a lot of error types

short cove
thick mountain
#

i think there maybe 10 low level errors that just get passed up to other error returns so prob can hand roll it

thick mountain
# halcyon portal That's not a big deal

so is the consensus that VecDeqeue is the correct choice here? The impl i have is about ~15% away in speed from a standard binary heap, and almost all of the overhead is futex syscalls so that data structure is thread safe

sacred trellis
# iron cairn why are you randomly suggesting clippy lints? rather odd clippy lints too?

Well just as a tip to help them with maintaining code easily. Also, some lints are not bad at all like there is clippy::perf lint for lints specific to performance and idioms one which I think gives suggestions that makes code more readable I feel. Also congnitive complexity one which suggests code changes that reduce mental strain while reading code. ๐Ÿ™‚

iron cairn
#

clippy::perf is enabled by default though

#

i have problems with #![deny] in general, and #![forbid(panic)] is simply silly

sacred trellis
iron cairn
#

dont use error stack

#

error stack is not for libs

sacred trellis
iron cairn
#

just stop them from adding panics that are wrong?

rose veldt
#

A wild panic has appeared!

#

neon used #![forbid(panic)], it was mildly effective

iron cairn
#

its not like panics are invisible and you cant see them

#

does clippy::panic look at # Panics too

#

no its literally just panic!?

thick mountain
#

pardon my ignorance but what exactly is clippy?

rose veldt
iron cairn
rose veldt
sacred trellis
thick mountain
sacred trellis
sacred trellis
rose veldt
#

though snafu leaks its involvement sometimes

sacred trellis
rose veldt
#

error_stack like crates use grouped error types ment for reporting to humans, not for programmatic response to them

#

most of the time a library should not assume it's errors will just be displayed

sacred trellis
iron cairn
#

i think that panic should be legal

#

panic is not a mistake

#

panic is not a problem to be eliminated