# Benchmarking with Criterion
looks generally reasonable in terms of "writing correct benchmark code".
arguably the benchmark isn't fair, because stats_channel includes the cost of having a receiver, while stats_counter does not include the cost of anything else reading the atomic. I'm not sure whether that's a significant concern, though, or what sort of code would be a good test in that regard.
however, a bigger concern, I would say, is that stats_atomic is so tiny (it is executing one CPU instruction in a loop) that its performance characteristics here are not going to reflect how it performs when it is combined with code doing actual work.
this is the most micro of micro-benchmarks, and it's easy for those to be unrealistic
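One way to probe the fairness question is to give the atomic version a reader too. This is a hypothetical sketch: the original stats_channel / stats_atomic code isn't shown, so `report_via_channel` and `report_via_atomic_with_reader` are made-up stand-ins, and `std::sync::mpsc` replaces Crossbeam's channel so the example needs no dependencies. The monitor thread polling the counter is meant to approximate a progress display; concurrent reads from another core will likely make the counter's cache line move between cores, which the bare loop never pays for.

```rust
use std::hint::black_box;
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::{mpsc, Arc};
use std::thread;

/// Hypothetical stand-in for `stats_channel`: every report is a message, and a
/// collector thread receives and sums them (so the receiver's cost is included).
pub fn report_via_channel(events: u64) -> u64 {
    let (tx, rx) = mpsc::channel::<u64>();
    let collector = thread::spawn(move || rx.iter().sum::<u64>());
    for _ in 0..events {
        tx.send(1).expect("collector hung up");
    }
    drop(tx); // closing the channel ends the collector's `rx.iter()` loop
    collector.join().expect("collector panicked")
}

/// Hypothetical stand-in for `stats_atomic`, plus a monitor thread that keeps
/// polling the counter, so the cost of a concurrent reader is included too.
pub fn report_via_atomic_with_reader(events: u64) -> u64 {
    let counter = Arc::new(AtomicU64::new(0));
    let done = Arc::new(AtomicBool::new(false));
    let monitor = {
        let counter = Arc::clone(&counter);
        let done = Arc::clone(&done);
        thread::spawn(move || {
            while !done.load(Ordering::Acquire) {
                // Simulates a progress display reading the counter.
                let _ = black_box(counter.load(Ordering::Relaxed));
            }
        })
    };
    for _ in 0..events {
        // Relaxed is enough for a statistics counter: only the total matters.
        counter.fetch_add(1, Ordering::Relaxed);
    }
    done.store(true, Ordering::Release);
    monitor.join().expect("monitor panicked");
    counter.load(Ordering::Relaxed)
}
```

Both functions are still micro-benchmarks with no real work in the loop, so they address the fairness point but not the bigger concern above.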
Gotcha. Indeed, it's a very minimal one, which might not correspond to anything real.
My goal is basically to compare thread statistics reporting via a Crossbeam channel versus atomics.
sure
I'm saying that probably, what you actually want to know is:
"what will these reporting strategies cost when in use reporting the progress of some other code?"
and the benchmark tells you something, but it does not actually tell you the answer to that question
also, the atomic update is basically always going to be cheaper than anything else, because every other kind of thread synchronization is (usually) built on atomics; e.g. Sender::send() is going to start with some atomic operation and then also do more things
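To illustrate the layering point, here is a toy spinlock. It is a hypothetical `SpinLock`, not how `std::sync::Mutex` or Crossbeam are actually implemented: acquiring it starts with one atomic compare-exchange, and everything else (spinning under contention, running the critical section, the release store) is extra cost on top of that single operation.

```rust
use std::cell::UnsafeCell;
use std::hint;
use std::sync::atomic::{AtomicBool, Ordering};

/// Toy lock for illustration only: no fairness, no sleeping, and if the
/// closure passed to `with_lock` panics, the lock stays held forever.
pub struct SpinLock<T> {
    locked: AtomicBool,
    value: UnsafeCell<T>,
}

// Safety: all access to `value` is serialized by the `locked` flag.
unsafe impl<T: Send> Sync for SpinLock<T> {}

impl<T> SpinLock<T> {
    pub const fn new(value: T) -> Self {
        Self { locked: AtomicBool::new(false), value: UnsafeCell::new(value) }
    }

    /// Runs `f` with exclusive access to the protected value.
    pub fn with_lock<R>(&self, f: impl FnOnce(&mut T) -> R) -> R {
        // The core cost: an atomic read-modify-write, retried while contended.
        while self
            .locked
            .compare_exchange_weak(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            hint::spin_loop();
        }
        // Safety: we hold the lock, so no other thread can touch `value`.
        let result = f(unsafe { &mut *self.value.get() });
        // Another atomic operation to hand the lock back.
        self.locked.store(false, Ordering::Release);
        result
    }
}
```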
Gotcha. So you would write actual workers with a real workload and benchmark the reporting in that context?
Sure, I just want to measure it to justify my choices.
I "would write actual workers" if someone was paying me to make and justify a design decision. Not necessarily in every benchmark I ever write. Just, you should always keep in mind the ways in which your benchmark is unrealistic, and in this case, the big one is "the CPU is not executing any other instructions doing actual work".
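A minimal sketch of what "actual workers" could look like, assuming a hypothetical CPU-bound `do_work` standing in for one real fuzzing iteration. The harness `time_with_work` (also an illustrative name, not a library function) runs the work and calls a reporting closure after each unit, so the same loop can be timed with a no-op closure, an atomic increment, or a channel send.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical unit of "actual work" (a few hundred xorshift steps) standing
/// in for one fuzzing iteration; replace it with the real workload.
pub fn do_work(seed: u64) -> u64 {
    let mut x = seed | 1; // xorshift must not start from 0
    for _ in 0..256 {
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
    }
    x
}

/// Runs `iters` units of work, calling `report` after each one, and returns
/// the elapsed wall-clock time.
pub fn time_with_work<F: FnMut()>(iters: u64, mut report: F) -> Duration {
    let start = Instant::now();
    let mut acc = 0u64;
    for i in 0..iters {
        acc = acc.wrapping_add(do_work(black_box(i)));
        report();
    }
    let _ = black_box(acc); // keep the work from being optimized away
    start.elapsed()
}
```

Run it with `|| {}` as a baseline, then with `|| { counter.fetch_add(1, Ordering::Relaxed); }` and `|| tx.send(1).unwrap()`. The interesting number is the difference from the baseline relative to the time spent in `do_work`, not the absolute cost of one report. Wrapping the same closures in a Criterion benchmark would give statistically sturdier numbers than a single `Instant` measurement.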
Got it, thanks a lot!
Along the same lines, regarding benchmarking Mutex against RwLock for a shared Vec<Vec<u8>> that is read very often but written less than 1% of the time: RwLock naturally outperforms Mutex there, right?
that's the sort of thing where you really do need to test with closer to actual workload
because the results will depend hugely on the timing of accesses
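A hedged sketch of a harness that at least lets the access pattern vary. `SharedCorpus` and `run_mix` are hypothetical names; the thread count, write ratio, and entry sizes are placeholders that should be tuned to mirror the real fuzzer. It is also worth knowing that the std docs leave `RwLock`'s reader/writer priority policy to the underlying OS, so results can differ across platforms.

```rust
use std::hint::black_box;
use std::sync::{Mutex, RwLock};
use std::thread;
use std::time::{Duration, Instant};

/// Same shape as the corpus in the question.
pub type Corpus = Vec<Vec<u8>>;

/// Minimal abstraction so one workload can drive both lock types.
pub trait SharedCorpus: Sync {
    fn read_entry_len(&self, i: usize) -> usize;
    fn push_entry(&self, entry: Vec<u8>);
}

impl SharedCorpus for Mutex<Corpus> {
    fn read_entry_len(&self, i: usize) -> usize {
        let corpus = self.lock().unwrap();
        corpus[i % corpus.len()].len()
    }
    fn push_entry(&self, entry: Vec<u8>) {
        self.lock().unwrap().push(entry);
    }
}

impl SharedCorpus for RwLock<Corpus> {
    fn read_entry_len(&self, i: usize) -> usize {
        let corpus = self.read().unwrap(); // shared lock: readers don't block each other
        corpus[i % corpus.len()].len()
    }
    fn push_entry(&self, entry: Vec<u8>) {
        self.write().unwrap().push(entry); // exclusive lock: waits for all readers
    }
}

/// Runs `threads` workers doing `ops` operations each; every `write_every`-th
/// operation appends an entry (100 gives ~1% writes), the rest are reads.
/// Assumes a non-empty corpus and `write_every >= 1`.
pub fn run_mix<C: SharedCorpus>(corpus: &C, threads: usize, ops: usize, write_every: usize) -> Duration {
    let start = Instant::now();
    thread::scope(|s| {
        for t in 0..threads {
            s.spawn(move || {
                let mut sink = 0usize;
                for i in 0..ops {
                    if i % write_every == write_every - 1 {
                        corpus.push_entry(vec![t as u8; 16]);
                    } else {
                        sink = sink.wrapping_add(corpus.read_entry_len(i + t));
                    }
                }
                let _ = black_box(sink);
            });
        }
    });
    start.elapsed()
}
```

The reads here hold the lock only for a length lookup; if the real workers hold it while running an input, the critical section is much longer and the comparison can change considerably.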
also, for read-often-write-rarely use cases you should consider using https://docs.rs/arc-swap/ (which makes Arc itself atomic) or using a channel to distribute the new data to the workers. The advantage of these strategies over RwLock is that the readers never take a lock and therefore can never be blocked on the writer or block the writer.
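A std-only sketch of the channel option (arc-swap's own API isn't shown here). `Distributor` and `Snapshot` are hypothetical names: the writer builds a new immutable version, wraps it in an `Arc`, and sends a clone to each worker; each worker reads its own snapshot without any lock and picks up newer versions when convenient. Because `std::sync::mpsc` has a single consumer, the sketch keeps one channel per worker.

```rust
use std::sync::mpsc::{self, Receiver, Sender};
use std::sync::Arc;

pub type Corpus = Vec<Vec<u8>>;

/// Writer side: owns the latest version and pushes a cheap `Arc` clone of
/// each new version to every subscribed worker.
pub struct Distributor {
    current: Arc<Corpus>,
    workers: Vec<Sender<Arc<Corpus>>>,
}

/// Worker side: reads its own snapshot without taking any lock.
pub struct Snapshot {
    current: Arc<Corpus>,
    updates: Receiver<Arc<Corpus>>,
}

impl Distributor {
    pub fn new(initial: Corpus) -> Self {
        Self { current: Arc::new(initial), workers: Vec::new() }
    }

    /// Registers a worker; it starts from the current version.
    pub fn subscribe(&mut self) -> Snapshot {
        let (tx, rx) = mpsc::channel();
        self.workers.push(tx);
        Snapshot { current: Arc::clone(&self.current), updates: rx }
    }

    /// Publishes a new version. Old versions are freed once no worker holds them.
    pub fn publish(&mut self, corpus: Corpus) {
        self.current = Arc::new(corpus);
        for tx in &self.workers {
            // A worker that has exited has dropped its receiver; ignore the error.
            let _ = tx.send(Arc::clone(&self.current));
        }
    }
}

impl Snapshot {
    /// Picks up any pending versions (keeping the newest) and returns the corpus.
    pub fn refresh(&mut self) -> &Corpus {
        while let Ok(newer) = self.updates.try_recv() {
            self.current = newer;
        }
        &self.current
    }
}
```

A worker would call `refresh()` at the top of each iteration (or every N iterations); between calls it never synchronizes with the writer at all.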
Ooh, that looks really interesting indeed, thanks a lot! Idk if I write that rarely, tho (it's for a fuzzer, basically for the shared corpus), but it's worth looking into
if you write often it might even be better, because, again, less blocking
the cost is just that you have to allocate copies of what you write instead of mutating it in-place
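That copy cost can be seen directly with `Arc::make_mut`, which is one standard way to do copy-on-write in Rust (the helper name `append_cow` is hypothetical). If any reader still holds the current `Arc`, `make_mut` clones the whole corpus before mutating; if the writer holds the only reference, it mutates in place.

```rust
use std::sync::Arc;

pub type Corpus = Vec<Vec<u8>>;

/// Copy-on-write append: clones the entire corpus only if some reader still
/// holds the current version; otherwise pushes in place.
pub fn append_cow(corpus: &mut Arc<Corpus>, entry: Vec<u8>) {
    Arc::make_mut(corpus).push(entry);
}
```

If cloning every input on each write is too expensive, storing entries as `Arc<[u8]>` (so the corpus is `Vec<Arc<[u8]>>`) means a copy duplicates only the list of pointers, not the inputs themselves.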
not a problem in my case, since I work locally and only sync periodically with the other threads
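For the "work locally, sync periodically" pattern, a hedged sketch of a worker-side buffer: new corpus entries accumulate locally and are sent to the coordinating thread as one batch, so synchronization happens once per batch rather than once per entry. `LocalBatch` is a hypothetical name, and "periodically" is interpreted here as a count threshold; a time-based flush would work the same way.

```rust
use std::sync::mpsc::Sender;

/// Hypothetical worker-side buffer: entries stay local until a batch is full.
pub struct LocalBatch {
    pending: Vec<Vec<u8>>,
    batch_size: usize,
    tx: Sender<Vec<Vec<u8>>>,
}

impl LocalBatch {
    pub fn new(batch_size: usize, tx: Sender<Vec<Vec<u8>>>) -> Self {
        Self { pending: Vec::new(), batch_size, tx }
    }

    /// Records an interesting input; syncs once `batch_size` entries are pending.
    pub fn add(&mut self, entry: Vec<u8>) {
        self.pending.push(entry);
        if self.pending.len() >= self.batch_size {
            self.flush();
        }
    }

    /// Sends everything pending as a single message.
    pub fn flush(&mut self) {
        if !self.pending.is_empty() {
            let batch = std::mem::take(&mut self.pending);
            // If the coordinator has gone away, the batch is simply dropped.
            let _ = self.tx.send(batch);
        }
    }
}
```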