#Optimizing loops within a prime generator

54 messages · Page 1 of 1 (latest)

random arrow
#

I have made several prime number generators that don't make use of the sieve of Eratosthenes on purpose. Both functions work in a similar way; I tried to make the latter more optimized than the previous one by making it run through less checks (tests to verify whether a candidate is prime) however it runs much slower. It seems to me that it performs less well due to the type of iteration I used. Is there any way to make the second function (V4) run faster?

Benchmark results:```
V2 benchmark
Last prime: 15485863
Primes generated: 1000000
Number of checks (modulus operations): 505508948
Elapsed: 7.499s

V4 benchmark
Last prime: 15485863
Primes generated: 1000000
Number of checks (modulus operations): 496617235
Elapsed: 9.652s

#

Code:```rust
fn prime_generator_2(n: usize) {
let mut primes: Vec<u32> = vec![2, 3];
let mut prime_candidate: u32 = 3;
let mut num_checks: u32 = 0;

'candidate_iterator: loop {
    prime_candidate += 2;
    let sqrt_prime_candidate = (prime_candidate as f32).sqrt() as u32;

    for divider in &primes {
        num_checks += 1; // not essential, just for benchmarks
        if prime_candidate % divider == 0 { continue 'candidate_iterator; }

        if divider > &sqrt_prime_candidate {
            primes.push(prime_candidate);
            break;
        }
    }

    if primes.len() >= n { break; }
}

println!("Last prime: {}", primes[primes.len() - 1]);
println!("Primes generated: {}", primes.len());
println!("Number of checks (modulus operations): {}", num_checks);

}

fn prime_generator_4(n: usize) {
let mut primes: Vec<u32> = vec![2, 3];
let mut limiter: usize = 1;
let mut prev_threshold: u32 = 5;
let mut curr_threshold: u32 = 9;
let mut num_checks: u32 = 0;

'threshold_iter: loop {
    'candidate_iter: for prime_candidate in (prev_threshold..curr_threshold).step_by(2) {
        for i in 1..limiter {
            num_checks += 1; // not essential, just for benchmarks
            if (prime_candidate % primes[i]) == 0 { continue 'candidate_iter; }
        }

        primes.push(prime_candidate);

        if primes.len() >= n { break 'threshold_iter; }
    }

    limiter += 1;
    prev_threshold = curr_threshold + 2;
    curr_threshold = primes[limiter].pow(2);
}

println!("Last prime: {}", primes[primes.len() - 1]);
println!("Primes generated: {}", primes.len());
println!("Number of checks (modulus operations): {}", num_checks);

}

lunar rose
#

in prime_generator_2, you have .sqrt() called n-times for no reason. just put outside the loop

#

if you can allocate the primes Vec once would be better ```rust
let mut primes: Vec<u32> = Vec::with_capacity(n);
primes.push(2);
primes.push(3);

random arrow
#

I could do that as a safety measure

#

But really its the prime_generator_4 that needs optimization

random arrow
#

I kinda figured this method out by myself but as you can see from the benchmarks the number of modulo operations is lower in prime_generator_4 than in prime_generator_2 so it should be more efficient but it runs slower

random arrow
snow wren
#

Your code fails to test the highest integer at most equal to the square root of the current number.

Let's start from is_prime(3). sqrt(3) is approximately 1.7, which truncates to 1, and 2..1 is an empty range, so you skip the loop altogether and return true by default.

2..2 is also an empty range, which means you also pass-by-default all numbers from 4 to 8 inclusive.

2..3 is not an empty range, but it does not contain 3. Only 2. Thus, you only test 9 % 2 == 0, which naturally isn't true, so you don't return false and end up passing 9 as well.
You also accept 11, 13, and 15 without testing the 3 divisor.

#

The correct fix is to replace .. with ..=, an inclusive range (and naturally, remove the + 1)

random arrow
snow wren
#

...thinks on second thought, adding +1 to the RHS or making the range inclusive are the same thing

#

Apologies, I started writing that before I realized the range was the "issue"

snow wren
random arrow
#

It's probably because of the loops maybe I dunno ferrisCluelesser

lunar rose
# random arrow Noted! That should help for both functions

do you have a max for n ? my thinking is you can probably offset sqrt(n) from runtime to compile time by using lazy_static crate and store precaculated sqrt in a hashmap and in your prime_generator, you can do precalculated_sqrt.get(n).unwrap_or(n.sqrt())

rose cloud
#

so

#

did anybody mention --release yet

#

i didn't see OP actually explicitly say that
if it's release, it might be optimizing the simpler algo better, or maybe OP is right and the algo just does extra work
if it's debug, turn on release

#
    'threshold_iter: loop {
        'candidate_iter: for prime_candidate in (prev_threshold..curr_threshold).step_by(2) {
            for i in 1..limiter {
                num_checks += 1;
                                   // bounds check?
                                   //       vvv
                if (prime_candidate % primes[i]) == 0 { continue 'candidate_iter; }
            }

            primes.push(prime_candidate);

            if primes.len() >= n { break 'threshold_iter; }
        }
        // is this guaranteed to be in bounds?
        limiter += 1;
        prev_threshold = curr_threshold + 2;
                          // bounds check?
                          //   vvvvvvvvv
        curr_threshold = primes[limiter].pow(2);
    }
#

compare with prime_generator_2 which uses native iteration rather than indexed

#

surely this won't make a difference in release mode
surely

    'threshold_iter: loop {
        'candidate_iter: for prime_candidate in (prev_threshold..curr_threshold).step_by(2) {
-           for i in 1..limiter {
+           for prime in primes[1..limiter] {
                num_checks += 1; // not essential, just for benchmarks
-               if (prime_candidate % primes[i]) == 0 { continue 'candidate_iter; }
+               if (prime_candidate % prime) == 0 { continue 'candidate_iter; }
            }

            primes.push(prime_candidate);

            if primes.len() >= n { break 'threshold_iter; }
        }

        limiter += 1;
        prev_threshold = curr_threshold + 2;
        curr_threshold = primes[limiter].pow(2);
    }
random arrow
random arrow
random arrow
# rose cloud ```rs 'threshold_iter: loop { 'candidate_iter: for prime_candidate i...

Nothing to worry about concerning the bounds; the main idea is that for numbers under n^2, it is unnecessary to test factors equal to or beyond n. Say for numbers under 25, I only need to test 3 as a divider (15 is divisible by 3). Also the current threshold is a square (hence not a prime) so I can skip it for the next iteration of previous threshold. I keep adding 2 for the next prime candidate since I am always iterating over odd numbers after three.

#

Basing the bounds on the squares of primes is the reciprocal idea of testing factors up to the sqrt of a prime candidate, if that makes any senseferrisDerp

rose cloud
#

should give you drastically faster results

#

but you'd have to "black box" a few things so that the optimizer doesn't accidentally replace your entire function with a constant

#

something like

// inside your benchmark function
prime_generator_4(std::hint::black_box(n));
#

a lot of debug data is tracked during debug mode

#

stack traces, bounds checks, none of the functions are inlined either

rose cloud
#

because if i do some complex operation like

let x = 5;
let y = 7;
let z = x + y;
#

the compiler will just say

let z = 12;
#

it will skip over all the calculations

#

so for example i could write fibonacci(10) and the compiler would just replace it with the 10th fibonacci number without calculating anything

#

just an extra precaution

random arrow
rose cloud
#

yea

random arrow
#

I see

rose cloud
#

it often gives ideas like "oh my god my function ran in 1 nanosecond! that's so blazingly fast! 🔥 🔥 🔥"

#

then you look at how the optimizer actually wrote the program and it's literally just print the result

#

or something

rose cloud
#

and every tick, the iterator has to call .next(), increment counter, check that it's not exhausted, wrap the next item in a Some, return it, match over it, before it does something

#

in release mode it'd just inline .next() so it doesn't have to do a bunch of jumping around

#

and the Some() => {}, None => {} stuff just don't need to exist at all

#

it'd just be a normal C for loop

#

anyway i'll talk to you in like 20min

random arrow
#

The new benchmarks: ```
V2 benchmark
Last prime: 67867967
Primes generated: 4000000
Number of checks (modulo operations): 3707195204
Elapsed: 11.800s

V3 benchmark
Last prime: 67867967
Primes generated: 4000000
Number of checks (modulo operations): 3668727080
Elapsed: 9.418s

V4 benchmark
Last prime: 67867967
Primes generated: 4000000
Number of checks (modulo operations): 3668727080
Elapsed: 9.382s