Optimizing my collatz code | Together C & C++ | Page 2

stone plover Oct 22, 2023, 6:05 PM

#

python code automatically manages memory

torn quarry Oct 22, 2023, 6:06 PM

#

but it was always the same numpy array, no?

stone plover Oct 22, 2023, 6:06 PM

#

realloc isn't used on the giant_buffer

#

its being used on numbers and counters

torn quarry Oct 22, 2023, 6:06 PM

#

what other buffers exist?

#

I mean: what for?

stone plover Oct 22, 2023, 6:07 PM

#

I'll sent the code and I'll explain

#

int basic_collatz(uint64_t n) {
    uint64_t count = 0;
    uint64_t arr_counter = 0;
    uint64_t size = n;
    uint64_t* numbers;
    uint64_t* counters;
 
    numbers = malloc(size * sizeof(uint64_t));
    counters = malloc(size * sizeof(uint64_t));
    
    while(n != 1) {
        if (n % 2 == 0) {
 
            if (arr_counter >= size) {
                size *= 2;
                numbers = realloc(numbers, size * sizeof(uint64_t));
                counters = realloc(counters, size * sizeof(uint64_t));
            }
 
            assert(arr_counter < size);
            numbers[arr_counter] = n;
            counters[arr_counter] = count;
 
            n /= 2;
            count++;
            arr_counter++;
 
        } else {
 
            if (arr_counter >= size) {
                size *= 2;
                numbers = realloc(numbers, size * sizeof(uint64_t));
                counters = realloc(counters, size * sizeof(uint64_t));
            }
 
            assert(arr_counter < size);
            numbers[arr_counter] = n;
            counters[arr_counter] = count;
 
            n = ((3 * n) + 1) / 2;
            count+=2;
            arr_counter++;
 
        }
 
    }
 
    for (uint64_t i = 0; i < arr_counter; i++) {
        giant_buffer[numbers[i]] = count - counters[i];
    }
 
    free(numbers);
    free(counters);
 
    return count;
}```

#

do you want me to send the python code that is easier to read?

torn quarry Oct 22, 2023, 6:07 PM

#

yes

stone plover Oct 22, 2023, 6:07 PM

#

def basic_algo(n):
    global mem
    c = 0
    numbers = []
    counters = []

    while n > 1:
                
        if n % 2 == 0:

            numbers.append(n)
            counters.append(c)

            n = n // 2
            c+=1

        else:

            numbers.append(n)
            counters.append(c)

            n = (3 * n + 1) // 2
            c+=2

    for i in range(len(numbers)):
        mem[numbers[i]] = c - counters[i]
    
    return c```

torn quarry Oct 22, 2023, 6:07 PM

#

thx!

stone plover Oct 22, 2023, 6:07 PM

#

numbers and counters

#

I cache all the numbers that were computed in the end

#

I do it at the end because I don't know how far they are from 1 (collatz..)

torn quarry Oct 22, 2023, 6:08 PM

#

okay but that's where your problem is

stone plover Oct 22, 2023, 6:08 PM

#

the program is on numbers[i]

torn quarry Oct 22, 2023, 6:08 PM

#

mem has been defined with capacity end-start

#

so it is too short to hold all the results basic_algo might need

stone plover Oct 22, 2023, 6:09 PM

#

no

torn quarry Oct 22, 2023, 6:09 PM

#

you can cache what basic_algo does, but you have no proof it fits into mem

stone plover Oct 22, 2023, 6:09 PM

#

the problem isn't the mem

#

numbers[i] returns an absurd huge number

torn quarry Oct 22, 2023, 6:10 PM

#

yeah that's what I mean

stone plover Oct 22, 2023, 6:10 PM

#

@stone plover on the call `basic_collatz(103561)`, loop iteration `i=23` from the final loop is causing `numbers[i]` to result in `129140162```

torn quarry Oct 22, 2023, 6:10 PM

#

mem is not meant to track an entire orbit

#

(it just cann't do it, by definition)

#

numbers and counters are tracking a full orbit

#

(the sequence of numbers that are hit by the basic algo, from a giving starting point)

#

but those numbers can because extremely big compared to your problem range

stone plover Oct 22, 2023, 6:11 PM

#

oh

#

so I can just add a check

#

that if the number is bigger than the range

#

I don't cache it

torn quarry Oct 22, 2023, 6:11 PM

#

yes

stone plover Oct 22, 2023, 6:11 PM

#

Wonderful!

#

Thanks!

#

I thought the number that numbers[i] was returning wasn't a computed number

#

However I verified it is

#

Thanks for helping me understand it

torn quarry Oct 22, 2023, 6:12 PM

#

you're welcome!

#

don't thx me, thx python and its ~~verbose~~ ~~high level~~ ~~decent~~ sane-for-the-mind style 🤣 hahahaha

torn quarry Oct 22, 2023, 6:15 PM

#

stone plover that if the number is bigger than the range

and because you now know what you're going to cache, fits into mem,

#

please remove those realloc ; it's totally unnecessary now 😉

stone plover Oct 22, 2023, 6:15 PM

#

what

#

The reallocs are not for mem

torn quarry Oct 22, 2023, 6:16 PM

#

but cann't you mutate mem while iterating?

stone plover Oct 22, 2023, 6:16 PM

#

mutate?

torn quarry Oct 22, 2023, 6:16 PM

#

ah, maybe it will become silly

stone plover Oct 22, 2023, 6:16 PM

#

:_:

torn quarry Oct 22, 2023, 6:17 PM

#

is mem an array of signed integers?

stone plover Oct 22, 2023, 6:17 PM

#

confusion immerses

#

no?

#

mem is the giant buffer

#

that we cache all numbers

#

we would compute in the future

torn quarry Oct 22, 2023, 6:18 PM

#

okay nvm 😄 I had an idea but, I think it will get too mystic

stone plover Oct 22, 2023, 6:18 PM

#

curiosity hits

#

I don't know I want to hear it I have wasted a lot of my time on this project but curiosity hits hard

#

I made some slight modifications should be all good now

#

//create simple collatz algorithm
int basic_collatz(uint64_t n) {
    uint64_t count = 0;
    uint64_t arr_counter = 0;
    uint64_t size = n;
    uint64_t* numbers;
    uint64_t* counters;

    numbers = malloc(size * sizeof(uint64_t));
    counters = malloc(size * sizeof(uint64_t));
    
    while(n != 1) {
        if (n % 2 == 0) {

            if (arr_counter >= size) {
                size *= 2;
                numbers = realloc(numbers, size * sizeof(uint64_t));
                counters = realloc(counters, size * sizeof(uint64_t));
            }

            if (n > max_number) {
                n/=2;
                count++;
            }
            else {
            numbers[arr_counter] = n;
            counters[arr_counter] = count;

            n /= 2;
            count++;
            arr_counter++;
            }

        } else {

            if (arr_counter >= size) {
                size *= 2;
                numbers = realloc(numbers, size * sizeof(uint64_t));
                counters = realloc(counters, size * sizeof(uint64_t));
            }

            if (n > max_number) {
                n = ((3 * n) + 1) / 2;
                count+=2;
            }
            else {
                
            numbers[arr_counter] = n;
            counters[arr_counter] = count;

            n = ((3 * n) + 1) / 2;
            count+=2;
            arr_counter++;

            }

        }

    }

    for (uint64_t i = 0; i < arr_counter; i++) {
        giant_buffer[numbers[i]] = count - counters[i];
    }

    free(numbers);
    free(counters);

    return count;
}```

#

I added that if ( n > max_number )

#

Alright it works now 🙂

#

a wonderful number of 2.7 seconds on O0

torn quarry Oct 22, 2023, 6:21 PM

#

def basic_algo(n):
    global mem
    # `mem` will get mutated by this method. We keep track of the mutations
    # by using the sign as a flag. 
    c = 0

    while n > 1:
        if n < mem_size and mem[n] == 0:
           mem[n] = -c
                
        if n % 2 == 0:
            n = n // 2
            c+=1
        else:
            n = (3 * n + 1) // 2
            c+=2

    for i in range(len(mem)):
        if mem[i] < 0: mem[i] = c + mem[i]
    
    return c

stone plover Oct 22, 2023, 6:21 PM

#

lets see the O3

torn quarry Oct 22, 2023, 6:21 PM

#

would that work ?

stone plover Oct 22, 2023, 6:22 PM

#

why are you storing mem[n] = -c?

#

it should be mem[n] = c no?

torn quarry Oct 22, 2023, 6:22 PM

#

I use the sign as a way to keep track of the indices I'm mutating

stone plover Oct 22, 2023, 6:22 PM

#

so after you remove it from mem[i] = c - mem[i]

torn quarry Oct 22, 2023, 6:22 PM

#

ah yeah sorry

stone plover Oct 22, 2023, 6:22 PM

#

but yeah

torn quarry Oct 22, 2023, 6:22 PM

#

fixed

stone plover Oct 22, 2023, 6:22 PM

#

That should probably work perfectly

#

and I remove the need of buffers

#

let me backup this code that I have right now

#

and I'll try to implement that

#

sounds easy

torn quarry Oct 22, 2023, 6:23 PM

#

(I haven't added the check on n! so beware

stone plover Oct 22, 2023, 6:23 PM

#

Yeah I have noticed

#

I'll add it

torn quarry Oct 22, 2023, 6:23 PM

#

fixed

#

ok !

stone plover Oct 22, 2023, 6:23 PM

#

this saves all the buffer and the reallocations

#

thats perfect!

#

anyways lets see some beautiful O3 performance 🙂

torn quarry Oct 22, 2023, 6:24 PM

#

maybe measure it before checking if it's really nicer 😄 hahaha

#

it's a bit hacky.. maybe a comment would be useful to understand why we use negative numbers

stone plover Oct 22, 2023, 6:24 PM

#

I still don't get why you use negative

#

I think it should be like this:

#

def basic_algo(n):
    global mem
    c = 0

    while n > 1:
        if n < mem_size and mem[n] == 0:
           mem[n] = c # CHANGED THIS
                
        if n % 2 == 0:
            n = n // 2
            c+=1
        else:
            n = (3 * n + 1) // 2
            c+=2

    for i in range(len(mem)):
        if mem[i] < 0: mem[i] = c - mem[i] #CHANGED THIS AS WELL
    
    return c```

#

Then mem can be unsigned

torn quarry Oct 22, 2023, 6:25 PM

#

no

#

because if you do that

stone plover Oct 22, 2023, 6:25 PM

#

I changed the + to -

torn quarry Oct 22, 2023, 6:25 PM

#

next time you'll need to increment this counter (the old counters buffer), you won't see it

#

because it has positive sign

stone plover Oct 22, 2023, 6:26 PM

#

wait

torn quarry Oct 22, 2023, 6:26 PM

#

it really must stay negative, with a + at the end. otherwise you keep track of what entry is in "mutation" state

#

no? I'm saying something wrong?

stone plover Oct 22, 2023, 6:26 PM

#

I think?

torn quarry Oct 22, 2023, 6:26 PM

#

aaaaaaah, maybe I'm too prudent yes : because you have no cycles

#

so you don't come back on the same entry

stone plover Oct 22, 2023, 6:27 PM

#

on the while loop

torn quarry Oct 22, 2023, 6:27 PM

#

okay fair

stone plover Oct 22, 2023, 6:27 PM

#

it will count how many iterations it has done until now

torn quarry Oct 22, 2023, 6:27 PM

#

yeah okay okay fair to me! I trust

stone plover Oct 22, 2023, 6:27 PM

#

no maybe I am wrong

#

I need to be sure

#

xd

torn quarry Oct 22, 2023, 6:27 PM

#

(I mean, let's just run the python code and see if it dramatically fails 🤣

stone plover Oct 22, 2023, 6:27 PM

#

oh right

torn quarry Oct 22, 2023, 6:27 PM

#

haha

stone plover Oct 22, 2023, 6:27 PM

#

yeah let me fix it and ill run it

torn quarry Oct 22, 2023, 6:28 PM

#

before refactoring the C one, maybe playing with Python is helpful 😄

#

prototyping and shell-with-classes : where python is not that bad, I guess 😄

stone plover Oct 22, 2023, 6:29 PM

#

oh wait

#

Oh I got what you meant about the plus

#

yeah you're right

#

I hadn't notice the check you have down there

#

is it really worth to do that

#

I mean

#

I guess ?

#

it should 🙂

torn quarry Oct 22, 2023, 6:30 PM

#

yeah

stone plover Oct 22, 2023, 6:30 PM

#

however

torn quarry Oct 22, 2023, 6:30 PM

#

because otherwise you don't know where you need to do + c

stone plover Oct 22, 2023, 6:30 PM

#

torn quarry because otherwise you don't know where you need to do + c

yea that

#

however is it faster

#

to run all over the mem

#

with a loop

#

or two loop

#

two custom buffers

#

with x specific iterations

torn quarry Oct 22, 2023, 6:30 PM

#

idk

stone plover Oct 22, 2023, 6:30 PM

#

buffers seem more optimal

torn quarry Oct 22, 2023, 6:31 PM

#

to be honest, I think both are good in their own ways

stone plover Oct 22, 2023, 6:31 PM

#

we could only use one buffer to keep track of what numbers to change

#

and then use your approach

torn quarry Oct 22, 2023, 6:31 PM

#

because this last loop, you see,

stone plover Oct 22, 2023, 6:31 PM

#

to change those numbers

torn quarry Oct 22, 2023, 6:31 PM

#

it could be vectorized

#

(by C, I mean)

stone plover Oct 22, 2023, 6:31 PM

#

oh

#

well I think

#

we could use an implementation of the two

torn quarry Oct 22, 2023, 6:31 PM

#

maybe writing it with a ternary would make it more obvious for the compiler ; idk

stone plover Oct 22, 2023, 6:31 PM

#

have an array

#

that holds the numbers

#

and store the number of iterations on mem

#

and then just

torn quarry Oct 22, 2023, 6:32 PM

#

it also depends on how you realloc, and how much this method is effectively called

stone plover Oct 22, 2023, 6:32 PM

#

change the numbers

torn quarry Oct 22, 2023, 6:32 PM

#

because for some cases, you might only hit it rarely

#

(when you start low)

#

on other cases maybe you hit it every time

stone plover Oct 22, 2023, 6:32 PM

#

hm

#

well the implementation of the two sounds better

#

don't you think?

#

use realloc only once for one buffer

#

store the counters in the mem

#

and then with a for loop

#

change the numbers to their supposed iterations

torn quarry Oct 22, 2023, 6:33 PM

#

tbh I don't know

#

it's difficult to reason here because the complexity of the basic algorithm is difficult to comprehend

#

so maybe O(n) is "ridiculously small" compared to the main loop anyway

#

I would just check both implementations (realloc as you have, and no realloc at all),
in different scenarios

check if there is really a winner in all cases

if not, keep the one that is easier to understand

#

(assuming both are bugfree, ofc...)

stone plover Oct 22, 2023, 8:17 PM

#

One quick question just wondering

#

why if we put as a max number: 1.000.000.000 it gets a segmentation fault?

#

it can't handle such big numbers?

#

(nvm I think its because of the 32 bit compiler settings)

#

!close

leaden obsidianBOT Oct 22, 2023, 8:24 PM

#

stone plover !close

Message is already solved

torn quarry Oct 22, 2023, 8:38 PM

#

stone plover (nvm I think its because of the 32 bit compiler settings)

yeah I guess it's related ; not an expert but ig on 32 bits, pointers have 32 bits lengths, but you send 64 bits integers

#

that's actually a limitation of the cache approach

#

(you can never hope to cache integers that go beyond the size_t number

#

maybe you should disallow caching for those cases

#

(the algo will be slower... yes. but it will remain correct.

#

(note that you cannot just check the program inputs... because the orbits are not bounded by the inputs (as your previous issue showed))

torn quarry Oct 22, 2023, 8:45 PM

#

stone plover !close

actually, more generally, I think you should get more picky on the cache

first bound the size of the cache by a number you're sure pointers can reprsent
try to calloc until it succeeds ; if not, retry with a smaller size
add a guard everytime you try to access the buffer (IMO that might be a good case to write a structure or a macro, so you'll have a unified way to access this buffer)

#

you could also check if malloc couldn't be used, if calloc fails (say: your stack is much smaller than your heap) ; idk what others would think about it

stone plover Oct 22, 2023, 8:50 PM

#

It's okay

#

It's an assignment after all

#

Not an actual program I'll use in my daily life or distribute xd

#

All the test cases work just fine

#

So there's no problem

torn quarry Oct 22, 2023, 9:13 PM

#

okay 😄

stone plover Oct 23, 2023, 6:53 AM

#

I hadn't read the assignment right

#

It's true after all, that whoever writes the fastest code, gets bonus

#

💀

#

Maybe after all I'll do more optimizations and more micro optimizations

#

I asked my professor, he said some students are fighting in milliseconds

#

I'm done

stone plover Oct 23, 2023, 7:50 AM

#

📎 collatz.c

prime ravine Oct 23, 2023, 7:52 AM

#

you can probably reuse the numbers and counters buffers, instead of having to allocate for each number

stone plover Oct 23, 2023, 7:52 AM

#

Yeah I guess that's possible

#

Okay that's one optimization

#

NOTED

#

If you have any other ideas lmk

#

Alright I fixed that

#

run some test cases

#

everything is working fine

stone plover Oct 23, 2023, 8:44 AM

#

@prime ravine

#

My teacher said something about bss section and that i could utilize that maybe in some way

#

Any ideas what that is?

prime ravine Oct 23, 2023, 8:45 AM

#

bss section is basically where statically allocated variables (that haven't been assigned values) get stored

#

so if you know the size of some buffer, for example, you can store it in the bss section and avoid an allocation at runtime

stone plover Oct 23, 2023, 8:46 AM

#

I see

prime ravine Oct 23, 2023, 8:46 AM

#

i guess you could use that for giant_buffer here?

#

wait no nvm

stone plover Oct 23, 2023, 8:46 AM

#

I could

prime ravine Oct 23, 2023, 8:46 AM

#

that's not true, you don't know the size at compile- time

stone plover Oct 23, 2023, 8:46 AM

#

I can guess it

prime ravine Oct 23, 2023, 8:47 AM

#

then you lose program correctness

stone plover Oct 23, 2023, 8:47 AM

#

hear me out

#

the assignment says

#

the numbers will be given from the range (-100m, 100m)

#

(all negative numbers return 0)

#

so I can assume the maximum number size to be 100m + 1

prime ravine Oct 23, 2023, 8:48 AM

#

yep, i guess that's true

stone plover Oct 23, 2023, 8:48 AM

#

How would I do that?

#

can you give me an example code block that I use bss?

prime ravine Oct 23, 2023, 8:49 AM

#

you just statically declare giant_buffer

stone plover Oct 23, 2023, 8:49 AM

#

how do I do that?

prime ravine Oct 23, 2023, 8:49 AM

#

the actual usage of .bss is an implementation detail

stone plover Oct 23, 2023, 8:49 AM

#

so

#

how do I set the size of a buffer without alloc

prime ravine Oct 23, 2023, 8:50 AM

#

just declare at the top of your file uint64_t giant_buffer[100000001]

#

just declare it as an array

stone plover Oct 23, 2023, 8:50 AM

#

Okay

prime ravine Oct 23, 2023, 8:50 AM

#

(this will create a p large binary though, like 1GB)

stone plover Oct 23, 2023, 8:50 AM

#

what

#

idk we'll see

prime ravine Oct 23, 2023, 8:51 AM

#

the .bss section is part of the binary itself

#

that's how you avoid the allocation

#

(you don't really avoid the allocation, it just gets mapped into memory as part of the image load)

stone plover Oct 23, 2023, 8:51 AM

#

If it doesn't really give any performance boost

#

I'll remove it then

#

time to benchmark

#

warning: assignment to ‘uint64_t’ {aka ‘long unsigned int’} from ‘uint64_t *’ {aka ‘long unsigned int
’} makes integer from pointer without a cast```

#

what is this error?

prime ravine Oct 23, 2023, 8:53 AM

#

you haven't shown me the line of code that generates it

#

so i can't tell

stone plover Oct 23, 2023, 8:53 AM

#

oh mb

#

collatz.c: In function ‘basic_collatz’:
collatz.c:84:34: warning: assignment to ‘uint64_t *’ {aka ‘long unsigned int *’} from ‘uint64_t’ {aka ‘long unsigned in
’} makes pointer from integer without a cast [-Wint-conversion]
   84 |         giant_buffer[numbers[i]] = count - counters[i];```

#

There are tons of warnings

#

in the same format

prime ravine Oct 23, 2023, 8:54 AM

#

you've declared giant_buffer wrong

stone plover Oct 23, 2023, 8:54 AM

#

uint64_t* giant_buffer[100000001];

prime ravine Oct 23, 2023, 8:54 AM

#

have you declared it as an array of uint64_t *s instead of uint64_ts?

stone plover Oct 23, 2023, 8:54 AM

#

isn't it an array?

#

That's how I had declared it until now

#

OH

prime ravine Oct 23, 2023, 8:55 AM

#

that's an array of pointers

stone plover Oct 23, 2023, 8:55 AM

#

Yeah I got it

#

17k is the file

prime ravine Oct 23, 2023, 8:56 AM

#

hmm

stone plover Oct 23, 2023, 8:56 AM

#

program works

#

why would it be 1G?

#

100.000.000 * 64 bit

prime ravine Oct 23, 2023, 8:57 AM

#

ohhhh i was wrong about bss

#

it does get allocated at runtime

#

the size is simply recorded in the object file, that makes sense

#

i'm dubious this will give you a performance speedup

stone plover Oct 23, 2023, 8:58 AM

#

(dubious mean?)

#

(will not give any?)

prime ravine Oct 23, 2023, 8:58 AM

#

i am doubtful that you'll get one

stone plover Oct 23, 2023, 8:58 AM

#

oh alright

#

well

#

its still a micro optimization

#

collatz.c:8:10: error: variably modified ‘giant_buffer’ at file scope
    8 | uint64_t giant_buffer[BUFFER_SIZE];
      |          ^~~~~~~~~~~~
collatz.c: In function ‘main’:```

#

I tried making a const value BUFFER_SIZE

#

and set that to the giant_buffer

#

what's the error?

prime ravine Oct 23, 2023, 9:01 AM

#

const value != constant expression

#

you can use a macro instead if you like

stone plover Oct 23, 2023, 9:01 AM

#

what's a macro?

prime ravine Oct 23, 2023, 9:02 AM

#

#define

stone plover Oct 23, 2023, 9:02 AM

#

no idea

#

oh okay I went to read the docs

#

so I can just basically:

#define BUFFER_SIZE 100000000;```

prime ravine Oct 23, 2023, 9:04 AM

#

yup

#

macros get expanded before the compiler runs

stone plover Oct 23, 2023, 9:04 AM

#

I don't need any extras or something

prime ravine Oct 23, 2023, 9:04 AM

#

nope

stone plover Oct 23, 2023, 9:04 AM

#

sounds good

prime ravine Oct 23, 2023, 9:04 AM

#

so the compiler actually sees uint64_t buffer[100000000], not uint64_t buffer[BUF_SIZE]

stone plover Oct 23, 2023, 9:04 AM

#

aaa

#

noice

#

alright

#

trilon@M1911:~/hw0-TR1LON/collatz/src$ gcc -O0 -m32 -Wall -Wextra -Werror -pedantic -o collatz collatz.c
collatz.c:8:10: error: size ‘3200000032’ of array ‘giant_buffer’ exceeds maximum object size ‘2147483647’
    8 | uint64_t giant_buffer[BUFFER_SIZE];```

#

Bruh moment

#

Macro is 400.000.004

prime ravine Oct 23, 2023, 9:11 AM

#

why is that?

#

and not 100 000 001?

stone plover Oct 23, 2023, 9:12 AM

#

so I can cache more values that will be calculated

prime ravine Oct 23, 2023, 9:12 AM

#

...but you said the max was 100 million

stone plover Oct 23, 2023, 9:12 AM

#

yes

#

but collatz numbers can go above 100m

prime ravine Oct 23, 2023, 9:12 AM

#

your original code doesn't account for that

stone plover Oct 23, 2023, 9:12 AM

#

it does

#

basically

#

my original code says:

#

if the number that is calculated is bigger than the cache:

#

dont cache it

#

However

#

since I have a static cache now

#

I said let's make it bigger

#

if there's no easy fix

#

I'll just make it smaller and not cache big numbers

prime ravine Oct 23, 2023, 9:13 AM

#

i really doubt that using bss will give you a performance increase

#

but try it and see what happens

stone plover Oct 23, 2023, 9:14 AM

#

how do I fix that error

prime ravine Oct 23, 2023, 9:14 AM

#

you reduce the size of the buffer?

stone plover Oct 23, 2023, 9:14 AM

#

oh

#

alright

#

nothing changes

#

performance wise

#

o well

#

whatever

prime ravine Oct 23, 2023, 9:15 AM

#

yup, thought so

stone plover Oct 23, 2023, 9:16 AM

#

I'll leave it like that

#

if you think of anything else

#

don't hesitate

#

Current time is at:

trilon@M1911:~/hw0-TR1LON/collatz/src$ time ./collatz 100 100000000
950

real    0m4.581s
user    0m4.360s
sys     0m0.220s```

prime ravine Oct 23, 2023, 9:19 AM

#

i wonder... how slow does it run without memoising?

stone plover Oct 23, 2023, 9:19 AM

#

x10 slower

#

I think

prime ravine Oct 23, 2023, 9:19 AM

#

hmm

#

im wondering if you could get some performance boosts by multithreading

#

(but then you have issues with sharing the same cache)

stone plover Oct 23, 2023, 9:21 AM

#

I guess

#

I thought of it too

#

however I think we aren't even allowed to use more than one thread

#

the system we run the tests on I think limits it

prime ravine Oct 23, 2023, 9:22 AM

#

okay, that's fair

stone plover Oct 23, 2023, 9:22 AM

#

Also

#

why would we have issues with the cache?

#

Threads can't access eachother's memory?

prime ravine Oct 23, 2023, 9:23 AM

#

because you can share the cache between threads

#

i.e. have shared memory

stone plover Oct 23, 2023, 9:23 AM

#

yeah so where's the problem?

stone plover Oct 23, 2023, 9:23 AM

#

prime ravine (but then you have issues with sharing the same cache)

prime ravine Oct 23, 2023, 9:23 AM

#

... so then the threads can access each other's memory

#

which means you're at risk of race conditions

stone plover Oct 23, 2023, 9:24 AM

#

can't we split jobs equally from the start?

#

no we can't

#

because things arent cached

#

;/

#

well

#

there's only one solution

#

GPU computing

#

xD

#

nvm I need CUDA which I can't install to the main system

#

Time on the uni's machine 😄

linux12:/home/users/sdi23xxxxx/program>time ./collatz 100 100000000
950
3.929u 0.275s 0:04.20 99.7%     0+0k 0+0io 0pf+0w```

torn quarry Oct 23, 2023, 11:10 AM

#

stone plover Time on the uni's machine 😄 ```c linux12:/home/users/sdi23xxxxx/program>time ....

have you tried my naive idea of never allocate extra buffers?

stone plover Oct 23, 2023, 11:11 AM

#

torn quarry have you tried my naive idea of never allocate extra buffers?

No I haven't

#

But since I allocate them only once

#

on the main() function now

#

it shouldn't really be a performance difference

#

right?

torn quarry Oct 23, 2023, 11:11 AM

#

ah! ok

#

and never realloc?

stone plover Oct 23, 2023, 11:11 AM

#

I use realloc

#

but its being used very rare

#

wait I'll implement a counter

#

to see how many times it's being used

#

do you think the if statement is causing bottleneck?

torn quarry Oct 23, 2023, 11:12 AM

#

wait if you send me the code, I could compare both on my machine

#

so I'm not bothering you

stone plover Oct 23, 2023, 11:12 AM

#

oh alright

#

sure give me a second

#

📎 collatz.c

#

Here's the current code

torn quarry Oct 23, 2023, 11:13 AM

#

and how do you compile it eventually?

stone plover Oct 23, 2023, 11:13 AM

#

right

#

gcc -O0 -m32 -Wall -Wextra -Werror -pedantic -o collatz collatz.c

torn quarry Oct 23, 2023, 11:13 AM

#

ok!

#

and typical start and end, what do you choose? 100 and 100_000_000 ?

stone plover Oct 23, 2023, 11:14 AM

#

yes

torn quarry Oct 23, 2023, 11:21 AM

#

okay, 957 for the alternative approach ; so I guess not better if the competition already happens at the millis!

#

ah no wait

torn quarry Oct 23, 2023, 11:23 AM

#

stone plover yes

what command do you use to time it?

stone plover Oct 23, 2023, 11:24 AM

#

are you on linux

#

if yes

#

time ./collatz 100 100000000

#

If you are on windows

#

you can use a program called ptime

torn quarry Oct 23, 2023, 11:25 AM

#

nah it's fine, I can use time

stone plover Oct 23, 2023, 11:25 AM

#

or use the untrustworthy powershell command: ps Measure-Command { ./collatz.exe 100 100000000 }

#

Alright

torn quarry Oct 23, 2023, 11:26 AM

#

okay so the alternative approach takes much more time! so not good

stone plover Oct 23, 2023, 11:26 AM

#

looping through the giant buffer?

torn quarry Oct 23, 2023, 11:26 AM

#

yes

#

so with -O0 :

#

2.70 vs 3.40

#

with -O3:

#

1.66 vs 1.95

stone plover Oct 23, 2023, 11:27 AM

#

I see

#

your computer achieves a really fast time non the less

torn quarry Oct 23, 2023, 11:28 AM

#

(don"t especially surprise me that it's not the same percentage anymore ; likely due to vectorization)

#

oh yeah that's maybe a machine difference

stone plover Oct 23, 2023, 11:28 AM

#

Yeah I guess

torn quarry Oct 23, 2023, 11:28 AM

#

I was looking at the proportion only

torn quarry Oct 23, 2023, 11:28 AM

#

stone plover Yeah I guess

ah! also

#

my compiler doesn't not understand m32

#

so I had to remove that flag

#

maybe that's also ~~why~~ something that can impact the measured millis

stone plover Oct 23, 2023, 11:29 AM

#

oh yeah

#

you need an addon for m32

#

it impacts more things

#

so careful

#

gcc by default complies a program to 64 bit, if you compile it after in 32 bit there may be errors

torn quarry Oct 23, 2023, 11:35 AM

#

yeah

stone plover Oct 23, 2023, 11:35 AM

#

Do you know

#

if implementing gpu computing

#

is extremely hard

torn quarry Oct 23, 2023, 11:36 AM

#

(it's funny, I'm trying your code without realloc ; just don't realloc the buffer and if it's filled, stop adding a cache)

#

intuitivelt one could expect the number of holes to get lower, so the needs to realloc would be small

#

but okay anyway!

#

no idk about gpu things

stone plover Oct 23, 2023, 11:38 AM

#

wait

#

I have forgotten a check

#

I just noticed

#

            else {
            numbers[arr_counter] = n;
            counters[arr_counter] = count;```

#

I don't check if the n is already computed

torn quarry Oct 23, 2023, 11:39 AM

#

ah oops

stone plover Oct 23, 2023, 11:39 AM

#

but

#

hm

#

I don't really know if I can skip anything here

#

well

#

I think I can just add a check up on the while loop

#

that should work just fine actually

torn quarry Oct 23, 2023, 11:40 AM

#

btw, if I may , I think you have some duplicated code

stone plover Oct 23, 2023, 11:41 AM

#

Where?

torn quarry Oct 23, 2023, 11:41 AM

#

basic_collatz

stone plover Oct 23, 2023, 11:41 AM

#

        if (giant_buffer[n] != 0) {
            break;
        }``` I think this should be good on `basic_collatz`

torn quarry Oct 23, 2023, 11:43 AM

#

//create simple collatz algorithm
int basic_collatz(uint64_t n) {
    uint64_t count = 0;
    uint64_t arr_counter = 0;
    
    while(n != 1) {
        if (n % 2 == 0) {

            if (arr_counter >= SIZE) {
                SIZE *= 2;
                numbers = realloc(numbers, SIZE * sizeof(uint64_t));
                counters = realloc(counters, SIZE * sizeof(uint64_t));
            }

            if (n > max_number) {
                n/=2;
                count++;
            }
            else {
            numbers[arr_counter] = n;
            counters[arr_counter] = count;

            n /= 2;
            count++;
            arr_counter++;
            }

        } else {

            if (arr_counter >= SIZE) {
                SIZE *= 2;
                numbers = realloc(numbers, SIZE * sizeof(uint64_t));
                counters = realloc(counters, SIZE * sizeof(uint64_t));
            }

            if (n > max_number) {
                n = ((3 * n) + 1) / 2;
                count+=2;
            }
            else {
                
            numbers[arr_counter] = n;
            counters[arr_counter] = count;

            n = ((3 * n) + 1) / 2;
            count+=2;
            arr_counter++;

            }

        }

    }

    for (uint64_t i = 0; i < arr_counter; i++) {
        giant_buffer[numbers[i]] = count - counters[i];
    }
    return count;
}

#

can likely become

//create simple collatz algorithm
int basic_collatz(uint64_t n) {
    uint64_t count = 0;
    uint64_t arr_counter = 0;
    
    while(n != 1) {
        if(n <= max_number) {
            if (arr_counter >= SIZE) {
                SIZE *= 2;
                numbers = realloc(numbers, SIZE * sizeof(uint64_t));
                counters = realloc(counters, SIZE * sizeof(uint64_t));
            }
            numbers[arr_counter++] = n;
            counters[arr_counter] = count;
        }

        if (n % 2 == 0) {
            n /= 2;
            count++;
        } else {
            n = ((3 * n) + 1) / 2;
            count+=2;
        }

    }

    for (uint64_t i = 0; i < arr_counter; i++) {
        giant_buffer[numbers[i]] = count - counters[i];
    }
    return count;
}

#

(if I don't mistake)

stone plover Oct 23, 2023, 11:44 AM

#

oh

#

you merged the n <= max_number

#

I think

#

right?

torn quarry Oct 23, 2023, 11:45 AM

#

yes

#

and the resizing also

stone plover Oct 23, 2023, 11:45 AM

#

well I guess it seems good

#

take the arr_counter++ outside from the buffer

#

so its more readable

torn quarry Oct 23, 2023, 11:46 AM

#

imo the code as it is now, is highly unsafe : if your realloc fails, you'll get a segfault

#

I would add some checks there

stone plover Oct 23, 2023, 11:46 AM

#

I'll see what I can do

#

Professor said if I have the fastest code in class I need to do a presentation

#

Presentation to 200 people xd

torn quarry Oct 23, 2023, 11:46 AM

#

(also : since you keep track of the indices that you mutate in numbers, I'm not sure qhy you actually need counters:
can you just print the count in giant_buffer and rectify later?

stone plover Oct 23, 2023, 11:47 AM

#

Oh

#

save the numbers in giant_buffer

#

and have only numbers

#

If I understood correctly

torn quarry Oct 23, 2023, 11:47 AM

#

yeah ; since you know (in numbers) what you are mutating

#

you can save them and rectify later

stone plover Oct 23, 2023, 11:47 AM

#

yeah yeah

#

alright got it

#

I'll go eat now

#

I'll be back and I'll apply the changes

torn quarry Oct 23, 2023, 11:50 AM

#

it's kind of frustrating to get a competitive thing with bonus, based on "one time millis" 😄 if ever you're not first, don't hesitate to take a step back 😄

#

(an approach that works well on (start,end) might not work quite as well on different inputs 😉 since I see you're quite investing time into it, don't take it too personnally if it doesn't pay off 🙂 what you'd have learned might be the best take-away)

stone plover Oct 23, 2023, 12:12 PM

#

Well I'm pretty happy that I have learned so many about C in such a small period

#

I mean obviously I won't be as excited if I'm not the first, but it's not that bad

stone plover Oct 23, 2023, 12:50 PM

#

torn quarry can likely become ```c //create simple collatz algorithm int basic_collatz(uint6...

there's a problem with this one

#

trilon@AK-47:~/collatz$ time ./collatz 100 100000000
952

real 0m2.465s
user 0m2.365s
sys 0m0.100s

#

I got a value of 952 instead of 950

#

Alright nvm fixed it

#

The

numbers[arr_counter++] = n;
``` caused the error

#

I changed it to

#

arr_counter++;
numbers[arr_counter] = n;```

torn quarry Oct 23, 2023, 12:52 PM

#

ah yeah sorry, I haven't really tested before submitting

stone plover Oct 23, 2023, 12:52 PM

#

Also I'll remove the numbers[]

#

and use the giant_buffer

#

i'll also add the check to see if its computed already

#

break; ``` exists in C right?

prime ravine Oct 23, 2023, 12:54 PM

#

yup

stone plover Oct 23, 2023, 12:54 PM

#

Alright thanks!

#

hmm

#

for some reason now it has incorrect values

#

Oh I know why wait

#

Well I don't see any performance difference

#

But I guess its good practice to have a more simple and compact code

#

o oh

#

its not good

#

it's producing false results

#

I think the problem is that check

#

it's producing wrong results 💀

#

Alright I got the backup

#

I'll add some extra variables to check how many times other things get accessed

torn quarry Oct 23, 2023, 1:31 PM

#

ah sorry if I brought a mess 😄 hahahaha

#

#oops

stone plover Oct 23, 2023, 1:48 PM

#

I'll make a python script that runs test cases

#

and checks if the output of the program was correct

stone plover Oct 24, 2023, 10:54 AM

#

hm

#

@torn quarry I think we still call the basic function a lot of times

#

trilon@AK-47:~/collatz$ ./collatz 23504 100000000
950
Called 7301 times.
trilon@AK-47:~/collatz$ ./collatz 1 100000000
950
Called 2 times.
trilon@AK-47:~/collatz$ ./collatz 3000 100000000
950
Called 935 times.```

#

let me test the code at the python code to see if we access it the same amount

#

(it should be)

torn quarry Oct 24, 2023, 11:00 AM

#

for now in the basic function, you add in cache if the entry was not computed yet

#

what happens if you just, exit the function if the entry has been discovered already ? (because it means you're now in an orbit you've discovered)

#

(that is suppose to "return fast" in case you eventually meet a trajectory you've discovered before)

stone plover Oct 24, 2023, 11:02 AM

#

torn quarry what happens if you just, exit the function if the entry has been discovered alr...

for some reason if I do that it gives incorrect results

#

(no idea why actually)

torn quarry Oct 24, 2023, 11:02 AM

#

ah because you cannot rectify or something?

stone plover Oct 24, 2023, 11:02 AM

#

hm

torn quarry Oct 24, 2023, 11:02 AM

#

idk

stone plover Oct 24, 2023, 11:02 AM

#

let me think about it a bit

#

if (giant_buffer[n] != 0) {  //If it's calculated:
  c += giant_buffer[n];
}```

torn quarry Oct 24, 2023, 11:04 AM

#

for example ; 5 goes to 16 -> 8 -> 4 -> 2 -> 1
so when you're at 5, you actually don't have to go to 1, because you might already have computed 4 from a previous computation

stone plover Oct 24, 2023, 11:05 AM

#

yeah I got it

stone plover Oct 24, 2023, 11:05 AM

#

stone plover ```c if (giant_buffer[n] != 0) { //If it's calculated: c += giant_buffer[n]; ...

^^

#

I had implemented it

#

but it gave incorrect values

torn quarry Oct 24, 2023, 11:05 AM

#

ah

stone plover Oct 24, 2023, 11:05 AM

#

Probably I did something wrong

#

I'll give it another try

torn quarry Oct 24, 2023, 11:06 AM

#

maybe those cases are also too rare to really matter, idk

stone plover Oct 24, 2023, 11:06 AM

#

I'll add a counter

#

to see how rare are these cases

torn quarry Oct 24, 2023, 11:07 AM

#

maybe more like a score I think :

counter before early return / counter returned

the closest to 1, the closest to 1 you went

stone plover Oct 24, 2023, 11:07 AM

#

I found the error

#

I had incorrect brackets I THINK

#

lemme try to implement it again

torn quarry Oct 24, 2023, 11:10 AM

#

wait

#

I think you're right,there is something that is not supposed to be working

#

because,

stone plover Oct 24, 2023, 11:10 AM

#

Alright

#

It IS useful

stone plover Oct 24, 2023, 11:10 AM

#

torn quarry I think you're right,there is something that is not supposed to be working

It's working

#

trilon@AK-47:~/collatz$ ./collatz 14040 100000000
950
4366```

torn quarry Oct 24, 2023, 11:10 AM

#

ah, interesting

stone plover Oct 24, 2023, 11:11 AM

#

4366 times was giant_buffer[n] used

torn quarry Oct 24, 2023, 11:11 AM

#

because, iirc, but maybe I've stopped followng at some point,

#

the entire goal of this basic function was to never actually read from the cache

#

(compared to the main loop)

#

but ! I don't remember why

#

maybe I'm mistaking

stone plover Oct 24, 2023, 11:11 AM

#

Oh

#

I think you're right

#

yeah

#

Maybe now we are calling the function even more times

torn quarry Oct 24, 2023, 11:12 AM

#

but actually, it was two days ago already

#

so maybe the code had changed since then, and it's now completely useless to keep having this basic function 😄 hahaha

stone plover Oct 24, 2023, 11:12 AM

#

wait I'll compile an old backup and time them

torn quarry Oct 24, 2023, 11:12 AM

#

tbh I haven't followed entirely 🤣

stone plover Oct 24, 2023, 11:12 AM

#

¯_(ツ)_/¯

torn quarry Oct 24, 2023, 11:13 AM

#

yeah no I cann't remember

#

I do remember we restarted from [some state], then wanted to introduce cache ; then there was somthing in the code that was not friendly to handle reading caches in some cases, because the cache was not set yet ;

#

but it was sooo long ago 😄 like sunday morning? and we're tuesday afternoon here 😄 no chance my memory is large enough to handle that

#

haha

stone plover Oct 24, 2023, 11:17 AM

#

uhh

#

trilon@AK-47:~/collatz$ time ./collatz 14043 100000000
950
CALLED: 4366
GIANT: 4365

real    0m1.699s
user    0m1.578s
sys     0m0.120s
trilon@AK-47:~/collatz$ time ./test 13043 100000000
950
BASIC: 4064

real    0m1.698s
user    0m1.655s
sys     0m0.040s```

#

the first one is with the giant_buffer[n]

#

the second one is without

#

they achieve literally the same time

#

do we care about user or sys?

#

oh wait

#

wrong number on the 2nd one

#

OOPS

#

retry test

#

trilon@AK-47:~/collatz$ time ./collatz 14043 100000000
950
CALLED: 4366
GIANT: 4365

real    0m1.641s
user    0m1.550s
sys     0m0.090s
trilon@AK-47:~/collatz$ time ./test 14043 100000000
950
BASIC: 4366

real    0m1.647s
user    0m1.487s
sys     0m0.160s
trilon@AK-47:~/collatz$```

#

No difference

#

The function is called the same exact amount

#

however there's no performance difference

#

how is that possible

#

Can you check if I have done any mistakes xD

#

I think the code is good

#

//create simple collatz algorithm
int basic_collatz(uint64_t n) {
    uint64_t count = 0;
    uint64_t arr_counter = 0;
    times_basic_called++;
    
    while(n != 1) {

        if (giant_buffer[n] != 0) {
            times_giant_used++;
            count+=giant_buffer[n];
            n = 1;
        } 
        else {

          if (n % 2 == 0) {
  
              if (arr_counter >= SIZE) {
                  SIZE *= 2;
                  numbers = realloc(numbers, SIZE * sizeof(uint64_t));
              }
  
              if (n > max_number) {
                  n/=2;
                  count++;
              }
              else {
              numbers[arr_counter] = n;
              giant_buffer[n] = count;
  
              n /= 2;
              count++;
              arr_counter++;
              }

            } else {

                if (arr_counter >= SIZE) {
                    SIZE *= 2;
                    numbers = realloc(numbers, SIZE * sizeof(uint64_t));
                }

                if (n > max_number) {
                    n = ((3 * n) + 1) / 2;
                    count+=2;
                }
                else {
                    
                numbers[arr_counter] = n;
                giant_buffer[n] = count;

                n = ((3 * n) + 1) / 2;
                count+=2;
                arr_counter++;

                }

            }
        }

    }

    for (uint64_t i = 0; i < arr_counter; i++) {
        giant_buffer[numbers[i]] = count - giant_buffer[numbers[i]];
    }
    return count;
}```

#

I think I should do something about the if's and else

#

we have kinda alot

#

📎 collatz.c

#

@prime ravine sorry to bother you

#

do you know assembly?

#

https://godbolt.org/z/sd9fhvEas <---- I want to apply any optimizations that I can apply manually that -O3 applies

Compiler Explorer - C

#define BUFFER_SIZE 200000000

uint64_t giant_buffer[BUFFER_SIZE];
uint64_t max_number;
uint64_t* numbers;
uint64_t SIZE = 16432;
uint64_t times_basic_called = 0;
uint64_t times_giant_used = 0;

//create fast_atoi, exits if number is <= 0
int fast_atoi( const char * str )
{
uint64_t val = 0;
if (str[0] == '-' || str[0] == '0') {
...

prime ravine Oct 24, 2023, 11:40 AM

#

are you allowed to inline assembly for your solution?

stone plover Oct 24, 2023, 11:40 AM

#

I don't think so...

#

Well it doesn't say I am not allowed to

#

but I don't think

#

I'm allowed either

prime ravine Oct 24, 2023, 11:41 AM

#

oh so you're trying to make the C code have the optimisations in it?

stone plover Oct 24, 2023, 11:41 AM

#

yes

#

basically

#

I'm forced to compile with O0

prime ravine Oct 24, 2023, 11:42 AM

#

you won't be able to do it entirely, since the optimiser simply has more power than you have

#

you could try just unrolling some loops by hand

#

seeing if that gets you anywhere

stone plover Oct 24, 2023, 11:42 AM

#

I mean yeah

#

That's what I mean

#

The thing is I don't really understand assembly

#

I just know some uhh functions idk how u call them

#

the mov,push etc

#

cmp

#

Can you point out some obvious ones?

#

I noticed there's no .main on the O3

#

oh there is mb

torn quarry Oct 24, 2023, 11:47 AM

#

if I may, I think there are other measures that would be relevant

stone plover Oct 24, 2023, 11:47 AM

#

Like?

torn quarry Oct 24, 2023, 11:48 AM

#

for example: can you print the numbers of elements in the buffer that are never actually filled

#

(the ones smaller than start)

stone plover Oct 24, 2023, 11:48 AM

#

Well to be honest

#

I don't know if that's faster or not

#

we Could:

#

collatz(2, start)

#

collatz(start, end)

#

xD

#

but that ruins the whole point

#

of the assignment so ig not

stone plover Oct 24, 2023, 11:49 AM

#

torn quarry for example: can you print the numbers of elements in the buffer that are never ...

And why would I do that?

torn quarry Oct 24, 2023, 11:49 AM

#

but maybe the point is not the right one, yes

#

because take for example this path:

#

5 -> 16 -> 8 -> 4 -> 2 -> 1

#

and stay you start at 3

#

3 will give you the path

#

3 -> 10 -> 5 ...

stone plover Oct 24, 2023, 11:50 AM

#

right

#

so

torn quarry Oct 24, 2023, 11:50 AM

#

(wait I try to formulate the observation in my mind 😄 hahaha)

stone plover Oct 24, 2023, 11:51 AM

#

We need an algorithm to choose the best numbers first

#

And then guess for the others

torn quarry Oct 24, 2023, 11:51 AM

#

I'm wondering if it wouldn't be possible to actually buffer a bit more than numbers, but more like, a kind of dependency graph

stone plover Oct 24, 2023, 11:51 AM

#

BTW

torn quarry Oct 24, 2023, 11:51 AM

#

that actually encodes the path we're taking

#

but I'm not sure ; I have to think

stone plover Oct 24, 2023, 11:53 AM

#

The professor told me another guy is talking about a guessing algorithm (that guesses which number has the biggest sequence without actually calculating it), but neither my professor trusts it as much until he sees it

torn quarry Oct 24, 2023, 11:54 AM

#

you see in the current approach, when you hit 3, you're going to fill up the full path
3 -> 10 -> 5 -> 16 -> 8 -> 4 -> 2 -> 1

stone plover Oct 24, 2023, 11:54 AM

#

hm

#

May I ask

#

Do you think implementing reverse collatz would be better?

torn quarry Oct 24, 2023, 11:55 AM

#

but maybe you could just go to 1, and while iterating , you store information like:

entries[10] = { .pivot = 3, distance = 1 }

#

so that you don't have to go back and fill elements in your path

#

next time you hit 10, you know you only have to resolve it ;

#

I'm not sure that can be better, that said

#

the point is that if you allow pivot to be always smaller than the entry index, you can maybe will that buffer "very" fast,

#

and then re-traverse it while finding the maximal value

prime ravine Oct 24, 2023, 11:58 AM

#

what's the size of giant_buffer?

torn quarry Oct 24, 2023, 11:58 AM

#

epty entries are { pivot = -1 , distance = 0 } ; resolved ones are { -1 , > 0 }

prime ravine Oct 24, 2023, 11:58 AM

#

you could get away with storing uint32_t, now?

stone plover Oct 24, 2023, 12:00 PM

#

prime ravine you could get away with storing uint32_t, now?

I could?

#

When I tried doing so I can weird values

#

and errors

stone plover Oct 24, 2023, 12:00 PM

#

torn quarry epty entries are `{ pivot = -1 , distance = 0 }` ; resolved ones are `{ -1 , > 0...

you have burned my mind

torn quarry Oct 24, 2023, 12:01 PM

#

I'm giving it a try and I'll see

stone plover Oct 24, 2023, 12:07 PM

#

@torn quarry

#

I found some useful information in a research paper

#

📎 Collatz_Conjecture_Properties_and_Algorithms.pdf

#

Page 10

#

Has all the tricks proven by math to filter out numbers

#

I think

torn quarry Oct 24, 2023, 12:34 PM

#

can you tell me @stone plover

#

what are the maximal values for start 1 and
end = 10
end = 15
end = 25

stone plover Oct 24, 2023, 12:37 PM

#

oh

#

sorry

#

I had gone to eat

#

yeah I will

#

20, 20, 24

#

if your numbers are

#

19, 19, 23

#

it is correct

#

I add +1 in the end

#

@torn quarry

torn quarry Oct 24, 2023, 12:38 PM

#

ok !

torn quarry Oct 24, 2023, 12:48 PM

#

stone plover <@741186984684421143>

and what's your current time in millis?

#

(-O0)

stone plover Oct 24, 2023, 12:49 PM

#

we can't change in different systems

#

judge*

#

compare**

#

we can't compare in different systems

torn quarry Oct 24, 2023, 12:50 PM

#

stone plover

I picked that one ^

stone plover Oct 24, 2023, 12:50 PM

#

It's the latest one

torn quarry Oct 24, 2023, 12:50 PM

#

you run with

#

start=?

#

end=100 000 000

stone plover Oct 24, 2023, 12:50 PM

#

yes

torn quarry Oct 24, 2023, 12:52 PM

#

okay so unfortunately, my idea didn't turn to be fast enough

#

I think I make cache misses way too much

stone plover Oct 24, 2023, 12:53 PM

#

oof

#

Should we stop bothering with this

#

I mean probably

#

there are optimizations that we could find

torn quarry Oct 24, 2023, 1:27 PM

#

hmmmm, conceptually, I'm not quite sure to understand why my code is so slower

#

but I guess it has to do with the fact I traverse the numbers once without actually resolving anything

#

(and then I have to perform very long jumps, I guess)

#

hmmmmmm, no, nop nop nop --- there is something else I do wrong
because the last O(n) loop that resolves everything, takes 1% of the time -- I'll check a bit later, I must have forgotten a check

stone plover Oct 24, 2023, 1:36 PM

#

Oh I see

torn quarry Oct 24, 2023, 1:37 PM

#

aaaaaah okay I think I get it! with my approach, there is a significiant draw back, that I have to wait my cursor comes back at an index below the starting point

for example, s = 7, I will run the algo but I need to wait to come below 7 to stop

#

while in your case, you stop as soon as you have something in the cache (I guess?)

#

I think that's where I'm loosing all my time

#

so for example, when I hit 7,

I hit then 22 and I claim
value(22) = value(7) + 1
I continue and hit 11, so I claim
value(11) = value(7) + 2
and so on;

and I can never stop while I'm not below 7 ; even if I already discovered 11, I need to wait to come back below 7 because I need the dependencies to be always below the index ; hu hu , unfortunate

#

yeah yeah yeah, that's it I think

I just removed the check I made and performed one computation "too much", and I just made +50% perf decrease. so I guess that's why I'm so slow 😄 hahaha

stone plover Oct 24, 2023, 1:58 PM

#

I see

stone plover Oct 25, 2023, 8:46 AM

#

I guess then this project is finished

#

I'll try to make the code a bit clearer

#

Write some README.md

#

Explain the code

#

And it's finished

stone plover Nov 3, 2023, 9:24 AM

#

hello

stone plover Nov 3, 2023, 1:14 PM

#

@formal wyvern give me a second I'll post the latest code

#

📎 collatz.c 📎 collatz.s

#

Any ideas?

formal wyvern Nov 3, 2023, 1:17 PM

#

What exactly is the goal of the assignment?

stone plover Nov 3, 2023, 1:17 PM

#

Calculate the fastest

formal wyvern Nov 3, 2023, 1:17 PM

#

Count the number of steps each number takes to reach 1?

stone plover Nov 3, 2023, 1:17 PM

#

Yes

#

Find the biggest collatz sequence

#

In a range of numbers

formal wyvern Nov 3, 2023, 1:19 PM

#

Are start and end bounded?

stone plover Nov 3, 2023, 1:19 PM

#

Bounded meaning?

#

Specific?

#

The range of numbers is [-100m, +100m]

#

The fastest time test will run (probably) on 100 -> 100.000.000

formal wyvern Nov 3, 2023, 1:21 PM

#

Do we know something about the inputs?

#

For example, will they always be valid?

stone plover Nov 3, 2023, 1:21 PM

#

Yes

#

The start will never be bigger than end

#

And the numbers will be valid from the range I mentioned

#

I would give you the assignment paper but it's in Greek so...

formal wyvern Nov 3, 2023, 1:25 PM

#

stone plover And the numbers will be valid from the range I mentioned

Then why do you have error checking?

stone plover Nov 3, 2023, 1:25 PM

#

I have only for negative numbers

#

The range supports negative numbers

#

However

#

The assignment says if the number is negative print 0 and exit

#

So

#

Yeah ig

#

Jude has helped me as well but we can't find anything more than this to optimize except using the O3 optimizations

#

Which makes it run about x2.1 times after

#

I think

#

Lmk if you have any ideas or questions @formal wyvern

formal wyvern Nov 3, 2023, 1:45 PM

#

stone plover Lmk if you have any ideas or questions <@253894450290491393>

Have you tried SIMD?

stone plover Nov 3, 2023, 1:54 PM

#

No idea what that is

#

If it requires threads we are not allowed to use

#

Oh alright got it

#

However how would I apply that in my code? I need to calculate all the numbers in order since I use memoization

strange agate Nov 3, 2023, 2:03 PM

#

@stone plover You've been looking at this for weeks has your teacher not shown you what he did yet

stone plover Nov 3, 2023, 2:04 PM

#

I have achieved a better time than my teacher

#

However

#

The first place (There's a leaderboard apparently) gets a bonus on his grade

stone plover Nov 3, 2023, 2:04 PM

#

strange agate <@303823845494751233> You've been looking at this for weeks has your teacher not...

The assignment is due in 3 days

strange agate Nov 3, 2023, 2:05 PM

#

stone plover I have achieved a better time than my teacher

What did you do to get a better score than your teacher

stone plover Nov 3, 2023, 2:05 PM

#

Memoization

#

And some other tricks

#

Using mod 4

stone plover Nov 3, 2023, 2:06 PM

#

stone plover

The code is here

formal wyvern Nov 3, 2023, 2:06 PM

#

Why is your code so hideous?

stone plover Nov 3, 2023, 2:06 PM

#

Hideous?

strange agate Nov 3, 2023, 2:07 PM

#

formal wyvern Why is your code so hideous?

That's a pretty rude thing to say in a help channel

#

Not really something we want to see

formal wyvern Nov 3, 2023, 2:07 PM

#

That's true, my bad.

stone plover Nov 3, 2023, 2:08 PM

#

Well my experience in writing C is about 3-3,5 weeks sorry if my code is not very clear and well written

strange agate Nov 3, 2023, 2:10 PM

#

It's not bad

#

                if (arr_counter >= SIZE) {
                    SIZE *= 2;
                    numbers = realloc(numbers, SIZE * sizeof(uint32_t));
                }

                if (n <= max_number) {
                    numbers[arr_counter++] = n;
                    giant_buffer[n] = count;
                }
```this is repeated twice, this could be abstracted to a function to deal with reallocating your buffer and setting stuff

stone plover Nov 3, 2023, 2:11 PM

#

I suppose I could do that

#

I don't have a computer rn

#

I'll fix it in a bit thanks!

strange agate Nov 3, 2023, 2:12 PM

#

what's the difference between numbers and giant_buffer

stone plover Nov 3, 2023, 2:12 PM

#

numbers save temporarily numbers

#

The reason I save them is because I don't know the sequence for such number

#

So in the end I add them to the giant buffer after I have reached 0

strange agate Nov 3, 2023, 2:13 PM

#

Hmm interesting

stone plover Nov 3, 2023, 2:16 PM

#

I didn't explain it very well let me rephrase it

#

As I'm calculating for n, I cache every number I calculate in the process. (If I haven't already calculated obviously). After I finish calculating for n, I add to the cache all the numbers the giant buffer with the difference of the biggest count to the smaller

#

I hope I made some sense

strange agate Nov 3, 2023, 2:28 PM

#

It makes sense

#

Is one of your rules that you can't multithread?

stone plover Nov 3, 2023, 2:29 PM

#

Yep

#

Hurubon mentioned SMID but I can't see any way that I can apply that in my code

#

Since I don't calculate many values in the same time

#

I calculate one each loop

formal wyvern Nov 3, 2023, 2:29 PM

#

How does this compare with your solution?

#

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <stdbool.h>
#include <limits.h>

int buffer[1 << 28] = { 1, 1 };

int collatz_length(int n)
{
    if (buffer[n] == 0 && n % 2 == 0)
    {
        buffer[n] = 1 + collatz_length(n / 2);
    }
    else if (buffer[n] == 0 && n % 2 == 1)
    {
        buffer[n] = 2 + collatz_length((3 * n + 1) / 2);
    }
    
    return buffer[n];
}

int main(int argc, char** argv)
{
    if (argc != 3)
    {
        printf("Usage: collatz <start> <end>");
        return 1;
    }

    int start = atoi(argv[1]);
    int end   = atoi(argv[2]);

    if (start <= 0 || end <= 0)
    {
        printf("0");
        return 1;
    }

    int max = INT_MIN;

    for (; start <= end; start += 1)
    {
        int const result = collatz_length(start);

        if (result > max)
        {
            max = result;
        }
    }

    printf("%d", max);
}

strange agate Nov 3, 2023, 2:30 PM

#

It's verry difficult to try to apply simd to collatz

stone plover Nov 3, 2023, 2:31 PM

#

formal wyvern ```c #include <stdio.h> #include <stdlib.h> #include <stdint.h> #include <stdboo...

Let me give it a try

formal wyvern Nov 3, 2023, 2:33 PM

#

It might not work, change the 1 << 16 to something like 1 << 28.

formal wyvern Nov 3, 2023, 2:40 PM

#

stone plover Let me give it a try

What's the verdict?

stone plover Nov 3, 2023, 2:41 PM

#

Sorry family needed me for a bit

#

Im running the test now

#

test2.c:52:18: error: ‘result’ undeclared (first use in this function)
   52 |     printf("%d", result);
      |                  ^~~~~~
test2.c:52:18: note: each undeclared identifier is reported only once for each function it appears in```

#

I think you meant max let me change it

#

trilon@AK-47:~/collatz$ time ./test2 100 100000000
Segmentation fault

real    0m0.001s
user    0m0.001s
sys     0m0.000s```

formal wyvern Nov 3, 2023, 2:45 PM

#

stone plover ```c trilon@AK-47:~/collatz$ time ./test2 100 100000000 Segmentation fault real...

I changed the code a bit.

#

Try to run it again.

stone plover Nov 3, 2023, 2:46 PM

#

segfault

#

trilon@AK-47:~/collatz$ time ./test2 100 100000000
Segmentation fault

real 0m0.007s
user 0m0.007s
sys 0m0.000s

#

compiler settings

#

gcc -O0 -m32 -Wall -Wextra -Werror -pedantic -o test2 test2.c

formal wyvern Nov 3, 2023, 2:47 PM

#

Hmm, let me see why it segfaults.

stone plover Nov 3, 2023, 2:48 PM

#

1.1G file?

strange agate Nov 3, 2023, 2:50 PM

#

The way the array is initialized is problematic

#

If you write int buffer[1 << 28]; and then set buffer[0] = 1; buffer[1] = 1; in main then the array can live in .bss and the file won't be 1.1GB

stone plover Nov 3, 2023, 2:51 PM

#

Also

formal wyvern Nov 3, 2023, 2:51 PM

#

strange agate If you write `int buffer[1 << 28];` and then set `buffer[0] = 1;` `buffer[1] = 1...

Yes, but it won't be zero-initialized.

#

Which is what I want.

stone plover Nov 3, 2023, 2:51 PM

#

Oh no

strange agate Nov 3, 2023, 2:51 PM

#

formal wyvern Yes, but it won't be zero-initialized.

It will be zero initialized

stone plover Nov 3, 2023, 2:51 PM

#

Biggest size of .c file is 8kb ??????

#

why is that even a thing

#

well whatever

strange agate Nov 3, 2023, 2:52 PM

#

I'm not sure why it's segfaulting though. Could try running in a debugger to see why though

stone plover Nov 3, 2023, 2:52 PM

#

strange agate I'm not sure why it's segfaulting though. Could try running in a debugger to see...

I tried using godbolt

formal wyvern Nov 3, 2023, 2:52 PM

#

strange agate I'm not sure why it's segfaulting though. Could try running in a debugger to see...

Working on it.

stone plover Nov 3, 2023, 2:52 PM

#

it didn't like the 1.1gb

strange agate Nov 3, 2023, 2:52 PM

#

Yeah CE won't like that

stone plover Nov 3, 2023, 2:52 PM

#

visual studio code debugger might work let me check

strange agate Nov 3, 2023, 2:52 PM

#

stone plover Biggest size of .c file is 8kb ??????

The C standard sets some minimums the compiler must support, e.g. 64 arguments in a function all and whatnot

#

In practice compilers support more

stone plover Nov 3, 2023, 2:53 PM

#

Alright

stone plover Nov 3, 2023, 2:54 PM

#

stone plover visual studio code debugger might work let me check

I can't get it to run bruh

strange agate Nov 3, 2023, 2:56 PM

#

where does it say the error is

#

It should be either a stack overflow from recursion or out of bounds buffer access

stone plover Nov 3, 2023, 2:58 PM

#

yeah I can't even run the code because my vs code is broken

strange agate Nov 3, 2023, 2:58 PM

#

hmm

stone plover Nov 3, 2023, 2:59 PM

#

trying to fix it give me a sec

formal wyvern Nov 3, 2023, 3:04 PM

#

@stone plover I tried giving it 1 << 29 space and it seems not to crash on godbolt.

stone plover Nov 3, 2023, 3:04 PM

#

alright let me give it 1 << 29

#

did you use the command I gave you

formal wyvern Nov 3, 2023, 3:05 PM

#

stone plover alright let me give it 1 << 29

Try (1 << 29) - 1.

stone plover Nov 3, 2023, 3:05 PM

#

arguments*

formal wyvern Nov 3, 2023, 3:06 PM

#

Yes.

stone plover Nov 3, 2023, 3:06 PM

#

formal wyvern Try (1 << 29) - 1.

yeah I noticed that

#

it still gives me a seg fault

#

wait

#

I tried gcc test2.c

#

/usr/lib/gcc/x86_64-linux-gnu/11/crtbeginS.o: in function `deregister_tm_clones':
crtstuff.c:(.text+0x3): relocation truncated to fit: R_X86_64_PC32 against `.tm_clone_table'
crtstuff.c:(.text+0xa): relocation truncated to fit: R_X86_64_PC32 against symbol `__TMC_END__' defined in .data section in a.out
/usr/lib/gcc/x86_64-linux-gnu/11/crtbeginS.o: in function `register_tm_clones':
crtstuff.c:(.text+0x33): relocation truncated to fit: R_X86_64_PC32 against `.tm_clone_table'
crtstuff.c:(.text+0x3a): relocation truncated to fit: R_X86_64_PC32 against symbol `__TMC_END__' defined in .data section in a.out
/usr/lib/gcc/x86_64-linux-gnu/11/crtbeginS.o: in function `__do_global_dtors_aux':
crtstuff.c:(.text+0x76): relocation truncated to fit: R_X86_64_PC32 against `.bss'
crtstuff.c:(.text+0x9e): relocation truncated to fit: R_X86_64_PC32 against `.bss'
collect2: error: ld returned 1 exit status```

strange agate Nov 3, 2023, 3:08 PM

#

1<<29 is too big

stone plover Nov 3, 2023, 3:08 PM

#

yeah we used -1

#

(1<<29) - 1

strange agate Nov 3, 2023, 3:08 PM

#

Still too big

#

Just calloc it

stone plover Nov 3, 2023, 3:09 PM

#

Do we even need such a big cache

strange agate Nov 3, 2023, 3:09 PM

#

strange agate Still too big

The reason is the memory model of the binary itself is limited to a couple gigabytes, unless you use a different model which may be slower

stone plover Nov 3, 2023, 3:09 PM

#

lets just calloc it and test it

#

it shouldn't have such a huge difference anyways I think

formal wyvern Nov 3, 2023, 3:10 PM

#

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <stdbool.h>
#include <limits.h>


static inline int collatz_length(int* const cache, int n)
{
    printf("Input: %d\n", n);
    if (cache[n] == 0 && n % 2 == 0)
    {
        cache[n] = 1 + collatz_length(cache, n / 2);
    }
    else if (cache[n] == 0 && n % 2 == 1)
    {
        cache[n] = 2 + collatz_length(cache, (3 * n + 1) / 2);
    }
    
    return cache[n];
}

int main(int argc, char** argv)
{
    if (argc != 3)
    {
        printf("Usage: collatz <start> <end>");
        return 1;
    }

    int start = atoi(argv[1]);
    int end   = atoi(argv[2]);

    int* const cache = calloc(1U << 31, sizeof(int));
    cache[1] = 1;

    if (start <= 0 || end <= 0)
    {
        printf("0");
        return 1;
    }

    int max = INT_MIN;

    for (; start <= end; start += 1)
    {
        int const result = collatz_length(cache, start);

        if (result > max)
        {
            max = result;
        }
    }

    printf("%d", max);
}

stone plover Nov 3, 2023, 3:11 PM

#

test2.c: In function ‘collatz_length’:
test2.c:13:41: error: passing argument 1 of ‘collatz_length’ makes pointer from integer without a cast [-Werror=int-conversion]
   13 |         cache[n] = 1 + collatz_length(n / 2);
      |                                       ~~^~~
      |                                         |
      |                                         int
test2.c:8:45: note: expected ‘int * const’ but argument is of type ‘int’
    8 | static inline int collatz_length(int* const cache, int n)
      |                                  ~~~~~~~~~~~^~~~~
test2.c:13:24: error: too few arguments to function ‘collatz_length’
   13 |         cache[n] = 1 + collatz_length(n / 2);
      |                        ^~~~~~~~~~~~~~
test2.c:8:19: note: declared here
    8 | static inline int collatz_length(int* const cache, int n)
      |                   ^~~~~~~~~~~~~~
test2.c:17:51: error: passing argument 1 of ‘collatz_length’ makes pointer from integer without a cast [-Werror=int-conversion]
   17 |         cache[n] = 2 + collatz_length((3 * n + 1) / 2);
      |                                       ~~~~~~~~~~~~^~~
      |                                                   |
      |                                                   int
test2.c:8:45: note: expected ‘int * const’ but argument is of type ‘int’
    8 | static inline int collatz_length(int* const cache, int n)
      |                                  ~~~~~~~~~~~^~~~~
test2.c:17:24: error: too few arguments to function ‘collatz_length’
   17 |         cache[n] = 2 + collatz_length((3 * n + 1) / 2);
      |                        ^~~~~~~~~~~~~~
test2.c:8:19: note: declared here
    8 | static inline int collatz_length(int* const cache, int n)
      |                   ^~~~~~~~~~~~~~
test2.c: In function ‘main’:
test2.c:34:33: error: left shift count >= width of type [-Werror=shift-count-overflow]
   34 |     int* const cache = calloc(1 << 32, sizeof(int));
      |                                 ```

formal wyvern Nov 3, 2023, 3:12 PM

#

Fixed.

strange agate Nov 3, 2023, 3:12 PM

#

just do this https://godbolt.org/z/4949MYx3f

stone plover Nov 3, 2023, 3:13 PM

#

test2.c:34:24: error: argument 1 value ‘2147483648’ exceeds maximum object size 2147483647 [-Werror=alloc-size-larger-than=]
   34 |     int* const cache = calloc(1 << 31, sizeof(int));
      |                        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from test2.c:2:
/usr/include/stdlib.h:543:14: note: in a call to allocation function ‘calloc’ declared here
  543 | extern void *calloc (size_t __nmemb, size_t __size)
      |              ^~~~~~
cc1: all warnings being treated as errors```

formal wyvern Nov 3, 2023, 3:14 PM

#

Try Zelis' version.

#

Although that gives SIGKILL in godbolt.

stone plover Nov 3, 2023, 3:14 PM

#

formal wyvern Although that gives SIGKILL in godbolt.

Alright

#

test2.c: In function ‘main’:
test2.c:35:13: error: product ‘1073741824 * 4’ of arguments 1 and 2 exceeds ‘SIZE_MAX’ [-Werror=alloc-size-larger-than=]
   35 |     cache = calloc(1 << 30, sizeof(int));
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from test2.c:2:
/usr/include/stdlib.h:543:14: note: in a call to allocation function ‘calloc’ declared here
  543 | extern void *calloc (size_t __nmemb, size_t __size)
      |              ^~~~~~
cc1: all warnings being treated as errors```

strange agate Nov 3, 2023, 3:14 PM

#

formal wyvern Although that gives SIGKILL in godbolt.

That'd be the OOM protection

stone plover Nov 3, 2023, 3:15 PM

#

isn't 1<<30 huge

strange agate Nov 3, 2023, 3:15 PM

#

stone plover ```c test2.c: In function ‘main’: test2.c:35:13: error: product ‘1073741824 * 4’...

try reducing the cache size then

formal wyvern Nov 3, 2023, 3:15 PM

#

Yes.

#

I put it back down to 16 and I just get [truncated].

#

Which is kind of sus, though.

stone plover Nov 3, 2023, 3:16 PM

#

seg fault

#

with 100 10000

#

Input: 148958

#

also why you assume the value is cached?

strange agate Nov 3, 2023, 3:17 PM

#

I think I expect the max recursion depth to be exceeded

#

You see sequence lengths in the thousands

stone plover Nov 3, 2023, 3:17 PM

#

strange agate I think I expect the max recursion depth to be exceeded

oh that

#

yeah

#

@formal wyvern Why would this approach work better?

formal wyvern Nov 3, 2023, 3:20 PM

#

It wouldn't necessarily, I was just wondering how the simple case compares to your solution.

stone plover Nov 3, 2023, 3:21 PM

#

Also that solution would necesarily work

#

Or maybe it would hmm

#

SMID isn't applicable here right?

strange agate Nov 3, 2023, 3:30 PM

#

Not particularly

stone plover Nov 3, 2023, 3:31 PM

#

Not particularly it is

#

or its not

strange agate Nov 3, 2023, 3:46 PM

#

It’s theoretically possible to use SIMD but the benefit is very questionable

#

It’d be complex at the very least

stone plover Nov 3, 2023, 11:17 PM

#

Thanks a ton for your help! Really appreciate it!

stone plover Nov 3, 2023, 11:59 PM

#

Haha

#Optimizing my collatz code