#better way for doing this
any help?
I haven't had much chance to look at it yet, but at first glance the locate function could be rewritten as:
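```rust
// Assumes CACHE is the thread_local RefCell<HashMap<u8, Option<(usize, usize)>>> from your original code
pub fn locate(board: &[&[u8]], target_ch: u8) -> Option<(usize, usize)> {
    CACHE.with(|cache| {
        // Look up target_ch; the search closure only runs if it isn't cached yet
        *cache.borrow_mut().entry(target_ch).or_insert_with(|| {
            board.iter().enumerate().find_map(|(i, row)| {
                row.iter().enumerate().find_map(|(j, elem)| {
                    if *elem == target_ch {
                        Some((i, j))
                    } else {
                        None
                    }
                })
            })
        })
    })
}
```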
This probably isn't much of a performance gain here (apart from inserting in place with the entry API), but it allows easy parallelisation with Rayon if you want it: change iter to par_iter and find_map to find_map_any:
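```rust
use rayon::prelude::*; // brings par_iter and the parallel adapters into scope

pub fn locate(board: &[&[u8]], target_ch: u8) -> Option<(usize, usize)> {
    CACHE.with(|cache| {
        *cache.borrow_mut().entry(target_ch).or_insert_with(|| {
            // Rows are searched in parallel; find_map_any returns whichever match
            // a thread finds, not necessarily the first one in row order
            board.par_iter().enumerate().find_map_any(|(i, row)| {
                row.par_iter().enumerate().find_map_any(|(j, elem)| {
                    if *elem == target_ch {
                        Some((i, j))
                    } else {
                        None
                    }
                })
            })
        })
    })
}
```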
That said, parallelising this is usually only worth it on large grids.
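For reference, here is a self-contained sequential sketch you can run as-is. The thread_local CACHE definition is an assumption reconstructed from how locate uses it (a RefCell<HashMap<u8, Option<(usize, usize)>>>), and the inner enumerate/find_map/if-else is swapped for Iterator::position, which performs the same per-row search more compactly. Note that the cache is keyed only on the target character, so it will return stale answers if you call locate on a different board from the same thread.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

thread_local! {
    // Assumed cache shape: target byte -> position of its first occurrence (or None).
    // Keyed only on the byte, so results go stale if `board` changes between calls.
    static CACHE: RefCell<HashMap<u8, Option<(usize, usize)>>> = RefCell::new(HashMap::new());
}

pub fn locate(board: &[&[u8]], target_ch: u8) -> Option<(usize, usize)> {
    CACHE.with(|cache| {
        // Search only on a cache miss, then copy the cached Option out
        *cache.borrow_mut().entry(target_ch).or_insert_with(|| {
            board.iter().enumerate().find_map(|(i, row)| {
                // position() gives the index of the first matching byte in this row
                row.iter().position(|&elem| elem == target_ch).map(|j| (i, j))
            })
        })
    })
}

fn main() {
    let rows = ["..#", "#@.", "..."];
    let board: Vec<&[u8]> = rows.iter().map(|r| r.as_bytes()).collect();
    println!("{:?}", locate(&board, b'@')); // Some((1, 1))
    println!("{:?}", locate(&board, b'Z')); // None
}
```

The Rayon version above only differs in the iterator calls, so the same simplification with position applies to the inner loop if you keep the row search sequential and parallelise only across rows.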
can you explain the second code?
```rust
*cache.borrow_mut().entry(target_ch).or_insert_with(|| {
    board.par_iter().enumerate().find_map_any(|(i, row)| {
```
Yeah, sure:
- cache.borrow_mut() gets a mutable reference (a RefMut) to the HashMap, as before
- HashMap::entry takes a key and returns an Entry, which is either occupied (the key is already in the map) or vacant (it isn't)
- Entry::or_insert_with takes a closure that produces the value to store under the key if the entry is vacant; if the key is already present, the closure is never called. Either way it returns a &mut to the value, which the leading * then copies out
- board.par_iter() produces a parallel iterator over the collection. Rayon provides many of the same adapters as std::iter::Iterator, except that they can run in parallel, and Rayon handles splitting the work across its thread pool behind the scenes
- enumerate() is available here because par_iter on a slice produces an IndexedParallelIterator, which supports random access and therefore extra adapters such as enumerate
- find_map_any is like the regular find_map, except that Rayon can run it in parallel: it can check multiple rows at the same time on different threads, and it stops as soon as one invocation returns Some. It returns any match, not necessarily the first, so if the character occurs more than once the result can vary between runs; find_map_first returns the same match as the sequential version
- The elements within a row can be searched in parallel the same way, with a nested par_iter/enumerate/find_map_any
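A tiny std-only sketch of the entry/or_insert_with behaviour described in the list: the closure runs on the first lookup of a key and is skipped on later ones. The get_or_compute helper and its doubling "computation" are made up for illustration; in locate, the closure is the grid search.

```rust
use std::collections::HashMap;

// Hypothetical helper: returns the cached value for `key`, running the
// "expensive" closure only when the key is not in the map yet.
// `calls` counts how many times that closure actually ran.
fn get_or_compute(cache: &mut HashMap<u8, u32>, key: u8, calls: &mut u32) -> u32 {
    *cache.entry(key).or_insert_with(|| {
        *calls += 1; // only reached when the entry is vacant
        u32::from(key) * 2 // stand-in for a real computation such as the grid search
    })
}

fn main() {
    let mut cache = HashMap::new();
    let mut calls = 0;
    let first = get_or_compute(&mut cache, b'@', &mut calls); // vacant: closure runs
    let second = get_or_compute(&mut cache, b'@', &mut calls); // occupied: closure skipped
    println!("{first} {second} calls={calls}"); // 128 128 calls=1
}
```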
i see, so is that already parallel, or do we make it more parallel?
If you use plain iter instead of par_iter, none of that is parallel: it uses the std::iter::Iterator API, which is single-threaded. Switching to par_iter is what unlocks the parallelism Rayon provides.
ahh true
and for this part
can you also suggest a better way of doing it?
btw how do i check if this is faster than the older one?
You can use a benchmarking crate such as Criterion, which is fairly easy to use: https://bheisler.github.io/criterion.rs/book/getting_started.html
You'd benchmark the parent function, i.e. whatever is publicly visible. You wouldn't be able to benchmark nbrs in isolation, since it isn't accessible from your Criterion benchmarks.
A quick-and-dirty alternative is to time portions of the code yourself, like this:
```rust
use std::time::Instant;

fn main() {
    let start_time = Instant::now();
    // do some stuff
    for _ in 0..10 {
        println!("Doing Stuff...");
    }
    let duration = start_time.elapsed();
    println!("Took {} seconds", duration.as_secs_f32());
}
```
```text
Doing Stuff...
Doing Stuff...
Doing Stuff...
Doing Stuff...
Doing Stuff...
Doing Stuff...
Doing Stuff...
Doing Stuff...
Doing Stuff...
Doing Stuff...
Took 0.000022211 seconds
```
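To check whether one version is faster than another without extra crates, the Instant approach above can be extended to run each version many times and average the result. This is a sketch under some assumptions: locate_old and locate_new are hypothetical stand-ins for your two versions, written without the cache (after the first call every cached lookup is just a HashMap hit, so you would only be timing that), and std::hint::black_box (Rust 1.66+) is used to stop the optimiser from discarding the work.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

// Hypothetical "old" version: index-based nested loops.
fn locate_old(board: &[&[u8]], target: u8) -> Option<(usize, usize)> {
    for i in 0..board.len() {
        for j in 0..board[i].len() {
            if board[i][j] == target {
                return Some((i, j));
            }
        }
    }
    None
}

// Hypothetical "new" version: iterator-based.
fn locate_new(board: &[&[u8]], target: u8) -> Option<(usize, usize)> {
    board
        .iter()
        .enumerate()
        .find_map(|(i, row)| row.iter().position(|&c| c == target).map(|j| (i, j)))
}

// Runs `f` `iters` times and returns the average wall-clock time per call.
fn average_time<T>(iters: u32, mut f: impl FnMut() -> T) -> Duration {
    assert!(iters > 0, "need at least one iteration");
    let start = Instant::now();
    for _ in 0..iters {
        // black_box keeps the optimiser from deleting the unused result
        black_box(f());
    }
    start.elapsed() / iters
}

fn main() {
    // 200x200 grid of '.', with one 'X' near the bottom so both versions scan most of it
    let mut grid = vec![vec![b'.'; 200]; 200];
    grid[190][100] = b'X';
    let rows: Vec<&[u8]> = grid.iter().map(|r| r.as_slice()).collect();
    let board: &[&[u8]] = &rows;

    let old = average_time(500, || locate_old(black_box(board), black_box(b'X')));
    let new = average_time(500, || locate_new(black_box(board), black_box(b'X')));
    println!("old: {old:?}/call, new: {new:?}/call");
}
```

Run it with cargo run --release, since debug builds are much slower and can distort the comparison. Timings from a loop like this are noisy between runs; handling that noise statistically is what Criterion adds.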
sorry i still don't get it, how would this work without changing the function?
You'll have to change the outer function to be able to time the inner one. If you want to benchmark without changing any function, you can only benchmark the outer function, because that's all that's publicly visible.
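Concretely, "changing the outer function" just means putting the Instant calls inside it, around the inner work you want to measure. In this sketch nbrs and total_nbrs are hypothetical (a count of '#' cells among each cell's neighbours), standing in for your private helper and the public function that calls it; the eprintln! is temporary instrumentation to delete once you're done.

```rust
use std::time::Instant;

// Hypothetical private helper standing in for `nbrs`:
// counts '#' cells among the up-to-8 neighbours of (i, j).
fn nbrs(board: &[&[u8]], i: usize, j: usize) -> usize {
    let mut count = 0;
    for di in -1isize..=1 {
        for dj in -1isize..=1 {
            if di == 0 && dj == 0 {
                continue; // skip the cell itself
            }
            // checked_add_signed (Rust 1.66+) gives None instead of underflowing at the top/left edge
            let (Some(ni), Some(nj)) = (i.checked_add_signed(di), j.checked_add_signed(dj)) else {
                continue;
            };
            // get() returns None past the bottom/right edge
            if board.get(ni).and_then(|row| row.get(nj)) == Some(&b'#') {
                count += 1;
            }
        }
    }
    count
}

// Hypothetical public function: the only thing a benchmark outside the module can call.
pub fn total_nbrs(board: &[&[u8]]) -> usize {
    let start = Instant::now(); // temporary instrumentation
    let mut total = 0;
    for (i, row) in board.iter().enumerate() {
        for j in 0..row.len() {
            total += nbrs(board, i, j);
        }
    }
    eprintln!("nbrs loop took {:?}", start.elapsed()); // remove once you're done measuring
    total
}

fn main() {
    let rows = ["#.#", "...", "#.#"];
    let board: Vec<&[u8]> = rows.iter().map(|r| r.as_bytes()).collect();
    println!("{}", total_nbrs(&board)); // 12
}
```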
ohh okay, i'll try that
and
Yeah, parallelizing isn't always a gain, especially for small workloads; the benefits generally grow as the input gets larger.
what do you think should be done here then to make it parallel and faster?
@pearl compass sorry to bother but can i get some help with this one pls
parallel != faster
yes, but i'm asked to make it as parallel as it can be, and fast as well
sometimes the answer to having faster code is to not make it parallel
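To see what the parallel version involves without Rayon, here is a std-only sketch using std::thread::scope (Rust 1.63+) in place of Rayon's thread pool: the rows are split into one chunk per thread, each thread scans its chunk sequentially, and the chunk results are checked in order so the answer matches the sequential version. The locate_par name and n_threads parameter are hypothetical. Unlike find_map_any there is no early cancellation, so every thread scans its whole chunk, and spawning threads has a fixed cost; on small grids this will likely be slower than the plain sequential loop, so measure both.

```rust
use std::thread;

// Hypothetical std-only parallel search: splits the rows into one chunk per
// thread, scans each chunk sequentially on its own scoped thread, then checks
// the chunks in order so the first match in row-major order wins.
fn locate_par(board: &[&[u8]], target: u8, n_threads: usize) -> Option<(usize, usize)> {
    if board.is_empty() {
        return None;
    }
    let n = n_threads.max(1);
    let chunk_len = (board.len() + n - 1) / n; // ceiling division
    thread::scope(|s| {
        let handles: Vec<_> = board
            .chunks(chunk_len)
            .enumerate()
            .map(|(c, rows)| {
                s.spawn(move || {
                    rows.iter().enumerate().find_map(|(k, row)| {
                        // translate the chunk-local row index back to a board row
                        row.iter().position(|&ch| ch == target).map(|j| (c * chunk_len + k, j))
                    })
                })
            })
            .collect();
        // Any handles not joined here are joined automatically when the scope ends.
        handles.into_iter().find_map(|h| h.join().unwrap())
    })
}

fn main() {
    let mut grid = vec![vec![b'.'; 1_000]; 1_000];
    grid[900][10] = b'X';
    let board: Vec<&[u8]> = grid.iter().map(|r| r.as_slice()).collect();
    // Fall back to 4 threads if the core count can't be determined
    let threads = thread::available_parallelism().map(|n| n.get()).unwrap_or(4);
    println!("{:?}", locate_par(&board, b'X', threads)); // Some((900, 10))
}
```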
true, but do you have any recommendations on making it better
or something
how do i add that using the terminal?
cargo criterion?
you don't add that in the terminal. have you read the docs for Criterion?
have you read the criterion book
no
is there another way other than benchmarking?
since i've never used that before
how will you measure performance if you don't benchmark?