Optimizing loops within a prime generator | Rust Programming Language Community | Page 1

random arrow Sep 15, 2023, 9:23 AM

#

I have made several prime number generators that don't make use of the sieve of Eratosthenes on purpose. Both functions work in a similar way; I tried to make the latter more optimized than the previous one by making it run through less checks (tests to verify whether a candidate is prime) however it runs much slower. It seems to me that it performs less well due to the type of iteration I used. Is there any way to make the second function (V4) run faster?

Benchmark results:```
V2 benchmark
Last prime: 15485863
Primes generated: 1000000
Number of checks (modulus operations): 505508948
Elapsed: 7.499s

V4 benchmark
Last prime: 15485863
Primes generated: 1000000
Number of checks (modulus operations): 496617235
Elapsed: 9.652s

#

Code:```rust
fn prime_generator_2(n: usize) {
let mut primes: Vec<u32> = vec![2, 3];
let mut prime_candidate: u32 = 3;
let mut num_checks: u32 = 0;

'candidate_iterator: loop {
    prime_candidate += 2;
    let sqrt_prime_candidate = (prime_candidate as f32).sqrt() as u32;

    for divider in &primes {
        num_checks += 1; // not essential, just for benchmarks
        if prime_candidate % divider == 0 { continue 'candidate_iterator; }

        if divider > &sqrt_prime_candidate {
            primes.push(prime_candidate);
            break;
        }
    }

    if primes.len() >= n { break; }
}

println!("Last prime: {}", primes[primes.len() - 1]);
println!("Primes generated: {}", primes.len());
println!("Number of checks (modulus operations): {}", num_checks);

}

fn prime_generator_4(n: usize) {
let mut primes: Vec<u32> = vec![2, 3];
let mut limiter: usize = 1;
let mut prev_threshold: u32 = 5;
let mut curr_threshold: u32 = 9;
let mut num_checks: u32 = 0;

'threshold_iter: loop {
    'candidate_iter: for prime_candidate in (prev_threshold..curr_threshold).step_by(2) {
        for i in 1..limiter {
            num_checks += 1; // not essential, just for benchmarks
            if (prime_candidate % primes[i]) == 0 { continue 'candidate_iter; }
        }

        primes.push(prime_candidate);

        if primes.len() >= n { break 'threshold_iter; }
    }

    limiter += 1;
    prev_threshold = curr_threshold + 2;
    curr_threshold = primes[limiter].pow(2);
}

println!("Last prime: {}", primes[primes.len() - 1]);
println!("Primes generated: {}", primes.len());
println!("Number of checks (modulus operations): {}", num_checks);

}

lunar rose Sep 15, 2023, 9:39 AM

#

in prime_generator_2, you have .sqrt() called n-times for no reason. just put outside the loop

#

if you can allocate the primes Vec once would be better ```rust
let mut primes: Vec<u32> = Vec::with_capacity(n);
primes.push(2);
primes.push(3);

random arrow Sep 15, 2023, 9:56 AM

#

I could do that as a safety measure

#

But really its the prime_generator_4 that needs optimization

random arrow Sep 15, 2023, 10:01 AM

#

lunar rose in `prime_generator_2`, you have `.sqrt()` called n-times for no reason. just pu...

I think it's necessary to put it inside cause it is dependent on the current prime candidate value

#

I kinda figured this method out by myself but as you can see from the benchmarks the number of modulo operations is lower in prime_generator_4 than in prime_generator_2 so it should be more efficient but it runs slower

random arrow Sep 15, 2023, 10:05 AM

#

lunar rose if you can allocate the primes Vec once would be better ```rust let mut primes: ...

Noted! That should help for both functions

snow wren Sep 15, 2023, 10:06 AM

#

Your code fails to test the highest integer at most equal to the square root of the current number.

Let's start from is_prime(3). sqrt(3) is approximately 1.7, which truncates to 1, and 2..1 is an empty range, so you skip the loop altogether and return true by default.

2..2 is also an empty range, which means you also pass-by-default all numbers from 4 to 8 inclusive.

2..3 is not an empty range, but it does not contain 3. Only 2. Thus, you only test 9 % 2 == 0, which naturally isn't true, so you don't return false and end up passing 9 as well.
You also accept 11, 13, and 15 without testing the 3 divisor.

#

The correct fix is to replace .. with ..=, an inclusive range (and naturally, remove the + 1)

lunar rose Sep 15, 2023, 10:07 AM

#

random arrow I think it's necessary to put it inside cause it is dependent on the current pri...

yeah my bad.

random arrow Sep 15, 2023, 10:09 AM

#

snow wren Your code fails to test the highest integer at most equal to the square root of ...

My code works properly, there are no issues with the prime numbers it generated

snow wren Sep 15, 2023, 10:10 AM

#

...thinks on second thought, adding +1 to the RHS or making the range inclusive are the same thing

#

Apologies, I started writing that before I realized the range was the "issue"

snow wren Sep 15, 2023, 10:10 AM

#

snow wren ..._thinks_ on second thought, adding `+1` to the RHS or making the range inclus...

(well, they differ if you ask is_prime(u64::MAX) I guess)

random arrow Sep 15, 2023, 10:15 AM

#

It's probably because of the loops maybe I dunno ferrisCluelesser

lunar rose Sep 15, 2023, 12:13 PM

#

random arrow Noted! That should help for both functions

do you have a max for n ? my thinking is you can probably offset sqrt(n) from runtime to compile time by using lazy_static crate and store precaculated sqrt in a hashmap and in your prime_generator, you can do precalculated_sqrt.get(n).unwrap_or(n.sqrt())

rose cloud Sep 15, 2023, 5:28 PM

#

so

#

did anybody mention --release yet

#

i didn't see OP actually explicitly say that
if it's release, it might be optimizing the simpler algo better, or maybe OP is right and the algo just does extra work
if it's debug, turn on release

#

    'threshold_iter: loop {
        'candidate_iter: for prime_candidate in (prev_threshold..curr_threshold).step_by(2) {
            for i in 1..limiter {
                num_checks += 1;
                                   // bounds check?
                                   //       vvv
                if (prime_candidate % primes[i]) == 0 { continue 'candidate_iter; }
            }

            primes.push(prime_candidate);

            if primes.len() >= n { break 'threshold_iter; }
        }
        // is this guaranteed to be in bounds?
        limiter += 1;
        prev_threshold = curr_threshold + 2;
                          // bounds check?
                          //   vvvvvvvvv
        curr_threshold = primes[limiter].pow(2);
    }

#

compare with prime_generator_2 which uses native iteration rather than indexed

#

surely this won't make a difference in release mode
surely

    'threshold_iter: loop {
        'candidate_iter: for prime_candidate in (prev_threshold..curr_threshold).step_by(2) {
-           for i in 1..limiter {
+           for prime in primes[1..limiter] {
                num_checks += 1; // not essential, just for benchmarks
-               if (prime_candidate % primes[i]) == 0 { continue 'candidate_iter; }
+               if (prime_candidate % prime) == 0 { continue 'candidate_iter; }
            }

            primes.push(prime_candidate);

            if primes.len() >= n { break 'threshold_iter; }
        }

        limiter += 1;
        prev_threshold = curr_threshold + 2;
        curr_threshold = primes[limiter].pow(2);
    }

random arrow Sep 15, 2023, 6:36 PM

#

rose cloud i didn't see OP actually explicitly say that if it's release, it might be optimi...

Nope, I wasn't in release mode

random arrow Sep 15, 2023, 6:44 PM

#

rose cloud surely this won't make a difference in release mode surely ```diff 'threshol...

This seems to be doing the trick (1 second faster than prime_number_generator_2 so 2 seconds improvement from previous code). Damn I've never seen the range function used within an array index, wish I knew that earlier (feels better on the eyes too). Anyway thanks a lot!!! ferristhumbsup

random arrow Sep 15, 2023, 6:54 PM

#

rose cloud ```rs 'threshold_iter: loop { 'candidate_iter: for prime_candidate i...

Nothing to worry about concerning the bounds; the main idea is that for numbers under n^2, it is unnecessary to test factors equal to or beyond n. Say for numbers under 25, I only need to test 3 as a divider (15 is divisible by 3). Also the current threshold is a square (hence not a prime) so I can skip it for the next iteration of previous threshold. I keep adding 2 for the next prime candidate since I am always iterating over odd numbers after three.

#

Basing the bounds on the squares of primes is the reciprocal idea of testing factors up to the sqrt of a prime candidate, if that makes any sense ferrisDerp

rose cloud Sep 16, 2023, 4:46 AM

#

random arrow Nope, I wasn't in release mode

try running the same benchmarks in release

#

should give you drastically faster results

#

but you'd have to "black box" a few things so that the optimizer doesn't accidentally replace your entire function with a constant

#

something like

// inside your benchmark function
prime_generator_4(std::hint::black_box(n));

#

a lot of debug data is tracked during debug mode

#

stack traces, bounds checks, none of the functions are inlined either

random arrow Sep 16, 2023, 4:51 AM

#

rose cloud but you'd have to "black box" a few things so that the optimizer doesn't acciden...

What is "black boxing"?

rose cloud Sep 16, 2023, 4:51 AM

#

random arrow What is "black boxing"?

basically telling the optimizer "no, you can't assume that this value is known"

#

because if i do some complex operation like

let x = 5;
let y = 7;
let z = x + y;

#

the compiler will just say

let z = 12;

#

it will skip over all the calculations

#

so for example i could write fibonacci(10) and the compiler would just replace it with the 10th fibonacci number without calculating anything

#

just an extra precaution

random arrow Sep 16, 2023, 4:53 AM

#

rose cloud it will skip over all the calculations

And that would basically give me inaccurate idea of the actual performance of the generators?

rose cloud Sep 16, 2023, 4:53 AM

#

yea

random arrow Sep 16, 2023, 4:53 AM

#

I see

rose cloud Sep 16, 2023, 4:53 AM

#

it often gives ideas like "oh my god my function ran in 1 nanosecond! that's so blazingly fast! 🔥 🔥 🔥"

#

then you look at how the optimizer actually wrote the program and it's literally just print the result

#

or something

rose cloud Sep 16, 2023, 4:56 AM

#

rose cloud stack traces, bounds checks, none of the functions are inlined either

also in debug mode, for loops i think have to go through this process:

for item in array[start..end] {
    do_something(item);
}
``````rs
assert!(end <= array.len());
let slice = array[start..end];
let mut _iterator = slice.into_iter();
loop {
    match _iterator.next() {
        Some(item) => do_something(item),
        None => break,
    }
}

#

and every tick, the iterator has to call .next(), increment counter, check that it's not exhausted, wrap the next item in a Some, return it, match over it, before it does something

#

in release mode it'd just inline .next() so it doesn't have to do a bunch of jumping around

#

and the Some() => {}, None => {} stuff just don't need to exist at all

#

it'd just be a normal C for loop

#

anyway i'll talk to you in like 20min

random arrow Sep 16, 2023, 4:58 AM

#

The new benchmarks: ```
V2 benchmark
Last prime: 67867967
Primes generated: 4000000
Number of checks (modulo operations): 3707195204
Elapsed: 11.800s

V3 benchmark
Last prime: 67867967
Primes generated: 4000000
Number of checks (modulo operations): 3668727080
Elapsed: 9.418s

V4 benchmark
Last prime: 67867967
Primes generated: 4000000
Number of checks (modulo operations): 3668727080
Elapsed: 9.382s

#Optimizing loops within a prime generator