#Optimizing my collatz code

1875 messages · Page 2 of 2 (latest)

stone plover
#

python code automatically manages memory

torn quarry
#

but it was always the same numpy array, no?

stone plover
#

realloc isn't used on the giant_buffer

#

its being used on numbers and counters

torn quarry
#

what other buffers exist?

#

I mean: what for?

stone plover
#

I'll sent the code and I'll explain

#
int basic_collatz(uint64_t n) {
    uint64_t count = 0;
    uint64_t arr_counter = 0;
    uint64_t size = n;
    uint64_t* numbers;
    uint64_t* counters;
 
    numbers = malloc(size * sizeof(uint64_t));
    counters = malloc(size * sizeof(uint64_t));
    
    while(n != 1) {
        if (n % 2 == 0) {
 
            if (arr_counter >= size) {
                size *= 2;
                numbers = realloc(numbers, size * sizeof(uint64_t));
                counters = realloc(counters, size * sizeof(uint64_t));
            }
 
            assert(arr_counter < size);
            numbers[arr_counter] = n;
            counters[arr_counter] = count;
 
            n /= 2;
            count++;
            arr_counter++;
 
        } else {
 
            if (arr_counter >= size) {
                size *= 2;
                numbers = realloc(numbers, size * sizeof(uint64_t));
                counters = realloc(counters, size * sizeof(uint64_t));
            }
 
            assert(arr_counter < size);
            numbers[arr_counter] = n;
            counters[arr_counter] = count;
 
            n = ((3 * n) + 1) / 2;
            count+=2;
            arr_counter++;
 
        }
 
    }
 
    for (uint64_t i = 0; i < arr_counter; i++) {
        giant_buffer[numbers[i]] = count - counters[i];
    }
 
    free(numbers);
    free(counters);
 
    return count;
}```
#

do you want me to send the python code that is easier to read?

torn quarry
#

yes

stone plover
#
def basic_algo(n):
    global mem
    c = 0
    numbers = []
    counters = []

    while n > 1:
                
        if n % 2 == 0:

            numbers.append(n)
            counters.append(c)

            n = n // 2
            c+=1

        else:

            numbers.append(n)
            counters.append(c)

            n = (3 * n + 1) // 2
            c+=2

    for i in range(len(numbers)):
        mem[numbers[i]] = c - counters[i]
    
    return c```
torn quarry
#

thx!

stone plover
#

numbers and counters

#

I cache all the numbers that were computed in the end

#

I do it at the end because I don't know how far they are from 1 (collatz..)

torn quarry
#

okay but that's where your problem is

stone plover
#

the program is on numbers[i]

torn quarry
#

mem has been defined with capacity end-start

#

so it is too short to hold all the results basic_algo might need

stone plover
#

no

torn quarry
#

you can cache what basic_algo does, but you have no proof it fits into mem

stone plover
#

the problem isn't the mem

#

numbers[i] returns an absurd huge number

torn quarry
#

yeah that's what I mean

stone plover
#
@stone plover on the call `basic_collatz(103561)`, loop iteration `i=23` from the final loop is causing `numbers[i]` to result in `129140162```
torn quarry
#

mem is not meant to track an entire orbit

#

(it just cann't do it, by definition)

#

numbers and counters are tracking a full orbit

#

(the sequence of numbers that are hit by the basic algo, from a giving starting point)

#

but those numbers can because extremely big compared to your problem range

stone plover
#

oh

#

so I can just add a check

#

that if the number is bigger than the range

#

I don't cache it

torn quarry
#

yes

stone plover
#

Wonderful!

#

Thanks!

#

I thought the number that numbers[i] was returning wasn't a computed number

#

However I verified it is

#

Thanks for helping me understand it

torn quarry
#

you're welcome!

#

don't thx me, thx python and its verbose high level decent sane-for-the-mind style 🤣 hahahaha

torn quarry
#

please remove those realloc ; it's totally unnecessary now 😉

stone plover
#

what

#

The reallocs are not for mem

torn quarry
#

but cann't you mutate mem while iterating?

stone plover
#

mutate?

torn quarry
#

ah, maybe it will become silly

stone plover
#

:_:

torn quarry
#

is mem an array of signed integers?

stone plover
#

confusion immerses

#

no?

#

mem is the giant buffer

#

that we cache all numbers

#

we would compute in the future

torn quarry
#

okay nvm 😄 I had an idea but, I think it will get too mystic

stone plover
#

curiosity hits

#

I don't know I want to hear it I have wasted a lot of my time on this project but curiosity hits hard

#

I made some slight modifications should be all good now

#
//create simple collatz algorithm
int basic_collatz(uint64_t n) {
    uint64_t count = 0;
    uint64_t arr_counter = 0;
    uint64_t size = n;
    uint64_t* numbers;
    uint64_t* counters;

    numbers = malloc(size * sizeof(uint64_t));
    counters = malloc(size * sizeof(uint64_t));
    
    while(n != 1) {
        if (n % 2 == 0) {

            if (arr_counter >= size) {
                size *= 2;
                numbers = realloc(numbers, size * sizeof(uint64_t));
                counters = realloc(counters, size * sizeof(uint64_t));
            }

            if (n > max_number) {
                n/=2;
                count++;
            }
            else {
            numbers[arr_counter] = n;
            counters[arr_counter] = count;

            n /= 2;
            count++;
            arr_counter++;
            }

        } else {

            if (arr_counter >= size) {
                size *= 2;
                numbers = realloc(numbers, size * sizeof(uint64_t));
                counters = realloc(counters, size * sizeof(uint64_t));
            }

            if (n > max_number) {
                n = ((3 * n) + 1) / 2;
                count+=2;
            }
            else {
                
            numbers[arr_counter] = n;
            counters[arr_counter] = count;

            n = ((3 * n) + 1) / 2;
            count+=2;
            arr_counter++;

            }

        }

    }

    for (uint64_t i = 0; i < arr_counter; i++) {
        giant_buffer[numbers[i]] = count - counters[i];
    }

    free(numbers);
    free(counters);

    return count;
}```
#

I added that if ( n > max_number )

#

Alright it works now 🙂

#

a wonderful number of 2.7 seconds on O0

torn quarry
#
def basic_algo(n):
    global mem
    # `mem` will get mutated by this method. We keep track of the mutations
    # by using the sign as a flag. 
    c = 0

    while n > 1:
        if n < mem_size and mem[n] == 0:
           mem[n] = -c
                
        if n % 2 == 0:
            n = n // 2
            c+=1
        else:
            n = (3 * n + 1) // 2
            c+=2

    for i in range(len(mem)):
        if mem[i] < 0: mem[i] = c + mem[i]
    
    return c
stone plover
#

lets see the O3

torn quarry
#

would that work ?

stone plover
#

why are you storing mem[n] = -c?

#

it should be mem[n] = c no?

torn quarry
#

I use the sign as a way to keep track of the indices I'm mutating

stone plover
#

so after you remove it from mem[i] = c - mem[i]

torn quarry
#

ah yeah sorry

stone plover
#

but yeah

torn quarry
#

fixed

stone plover
#

That should probably work perfectly

#

and I remove the need of buffers

#

let me backup this code that I have right now

#

and I'll try to implement that

#

sounds easy

torn quarry
#

(I haven't added the check on n! so beware

stone plover
#

Yeah I have noticed

#

I'll add it

torn quarry
#

fixed

#

ok !

stone plover
#

this saves all the buffer and the reallocations

#

thats perfect!

#

anyways lets see some beautiful O3 performance 🙂

torn quarry
#

maybe measure it before checking if it's really nicer 😄 hahaha

#

it's a bit hacky.. maybe a comment would be useful to understand why we use negative numbers

stone plover
#

I still don't get why you use negative

#

I think it should be like this:

#
def basic_algo(n):
    global mem
    c = 0

    while n > 1:
        if n < mem_size and mem[n] == 0:
           mem[n] = c # CHANGED THIS
                
        if n % 2 == 0:
            n = n // 2
            c+=1
        else:
            n = (3 * n + 1) // 2
            c+=2

    for i in range(len(mem)):
        if mem[i] < 0: mem[i] = c - mem[i] #CHANGED THIS AS WELL
    
    return c```
#

Then mem can be unsigned

torn quarry
#

no

#

because if you do that

stone plover
#

I changed the + to -

torn quarry
#

next time you'll need to increment this counter (the old counters buffer), you won't see it

#

because it has positive sign

stone plover
#

wait

torn quarry
#

it really must stay negative, with a + at the end. otherwise you keep track of what entry is in "mutation" state

#

no? I'm saying something wrong?

stone plover
#

I think?

torn quarry
#

aaaaaaah, maybe I'm too prudent yes : because you have no cycles

#

so you don't come back on the same entry

stone plover
#

on the while loop

torn quarry
#

okay fair

stone plover
#

it will count how many iterations it has done until now

torn quarry
#

yeah okay okay fair to me! I trust

stone plover
#

no maybe I am wrong

#

I need to be sure

#

xd

torn quarry
#

(I mean, let's just run the python code and see if it dramatically fails 🤣

stone plover
#

oh right

torn quarry
#

haha

stone plover
#

yeah let me fix it and ill run it

torn quarry
#

before refactoring the C one, maybe playing with Python is helpful 😄

#

prototyping and shell-with-classes : where python is not that bad, I guess 😄

stone plover
#

oh wait

#

Oh I got what you meant about the plus

#

yeah you're right

#

I hadn't notice the check you have down there

#

is it really worth to do that

#

I mean

#

I guess ?

#

it should 🙂

torn quarry
#

yeah

stone plover
#

however

torn quarry
#

because otherwise you don't know where you need to do + c

stone plover
#

however is it faster

#

to run all over the mem

#

with a loop

#

or two loop

#

two custom buffers

#

with x specific iterations

torn quarry
#

idk

stone plover
#

buffers seem more optimal

torn quarry
#

to be honest, I think both are good in their own ways

stone plover
#

we could only use one buffer to keep track of what numbers to change

#

and then use your approach

torn quarry
#

because this last loop, you see,

stone plover
#

to change those numbers

torn quarry
#

it could be vectorized

#

(by C, I mean)

stone plover
#

oh

#

well I think

#

we could use an implementation of the two

torn quarry
#

maybe writing it with a ternary would make it more obvious for the compiler ; idk

stone plover
#

have an array

#

that holds the numbers

#

and store the number of iterations on mem

#

and then just

torn quarry
#

it also depends on how you realloc, and how much this method is effectively called

stone plover
#

change the numbers

torn quarry
#

because for some cases, you might only hit it rarely

#

(when you start low)

#

on other cases maybe you hit it every time

stone plover
#

hm

#

well the implementation of the two sounds better

#

don't you think?

#

use realloc only once for one buffer

#

store the counters in the mem

#

and then with a for loop

#

change the numbers to their supposed iterations

torn quarry
#

tbh I don't know

#

it's difficult to reason here because the complexity of the basic algorithm is difficult to comprehend

#

so maybe O(n) is "ridiculously small" compared to the main loop anyway

#

I would just check both implementations (realloc as you have, and no realloc at all),
in different scenarios

check if there is really a winner in all cases

if not, keep the one that is easier to understand

#

(assuming both are bugfree, ofc...)

stone plover
#

One quick question just wondering

#

why if we put as a max number: 1.000.000.000 it gets a segmentation fault?

#

it can't handle such big numbers?

#

(nvm I think its because of the 32 bit compiler settings)

#

!close

leaden obsidianBOT
torn quarry
#

that's actually a limitation of the cache approach

#

(you can never hope to cache integers that go beyond the size_t number

#

maybe you should disallow caching for those cases

#

(the algo will be slower... yes. but it will remain correct.

#

(note that you cannot just check the program inputs... because the orbits are not bounded by the inputs (as your previous issue showed))

torn quarry
# stone plover !close

actually, more generally, I think you should get more picky on the cache

  • first bound the size of the cache by a number you're sure pointers can reprsent
  • try to calloc until it succeeds ; if not, retry with a smaller size
  • add a guard everytime you try to access the buffer (IMO that might be a good case to write a structure or a macro, so you'll have a unified way to access this buffer)
#

you could also check if malloc couldn't be used, if calloc fails (say: your stack is much smaller than your heap) ; idk what others would think about it

stone plover
#

It's okay

#

It's an assignment after all

#

Not an actual program I'll use in my daily life or distribute xd

#

All the test cases work just fine

#

So there's no problem

torn quarry
#

okay 😄

stone plover
#

I hadn't read the assignment right

#

It's true after all, that whoever writes the fastest code, gets bonus

#

💀

#

Maybe after all I'll do more optimizations and more micro optimizations

#

I asked my professor, he said some students are fighting in milliseconds

#

I'm done

stone plover
prime ravine
#

you can probably reuse the numbers and counters buffers, instead of having to allocate for each number

stone plover
#

Yeah I guess that's possible

#

Okay that's one optimization

#

NOTED

#

If you have any other ideas lmk

#

Alright I fixed that

#

run some test cases

#

everything is working fine

stone plover
#

@prime ravine

#

My teacher said something about bss section and that i could utilize that maybe in some way

#

Any ideas what that is?

prime ravine
#

bss section is basically where statically allocated variables (that haven't been assigned values) get stored

#

so if you know the size of some buffer, for example, you can store it in the bss section and avoid an allocation at runtime

stone plover
#

I see

prime ravine
#

i guess you could use that for giant_buffer here?

#

wait no nvm

stone plover
#

I could

prime ravine
#

that's not true, you don't know the size at compile- time

stone plover
#

I can guess it

prime ravine
#

then you lose program correctness

stone plover
#

hear me out

#

the assignment says

#

the numbers will be given from the range (-100m, 100m)

#

(all negative numbers return 0)

#

so I can assume the maximum number size to be 100m + 1

prime ravine
#

yep, i guess that's true

stone plover
#

How would I do that?

#

can you give me an example code block that I use bss?

prime ravine
#

you just statically declare giant_buffer

stone plover
#

how do I do that?

prime ravine
#

the actual usage of .bss is an implementation detail

stone plover
#

so

#

how do I set the size of a buffer without alloc

prime ravine
#

just declare at the top of your file uint64_t giant_buffer[100000001]

#

just declare it as an array

stone plover
#

Okay

prime ravine
#

(this will create a p large binary though, like 1GB)

stone plover
#

what

#

idk we'll see

prime ravine
#

the .bss section is part of the binary itself

#

that's how you avoid the allocation

#

(you don't really avoid the allocation, it just gets mapped into memory as part of the image load)

stone plover
#

If it doesn't really give any performance boost

#

I'll remove it then

#

time to benchmark

#
warning: assignment to ‘uint64_t’ {aka ‘long unsigned int’} from ‘uint64_t *’ {aka ‘long unsigned int
’} makes integer from pointer without a cast```
#

what is this error?

prime ravine
#

you haven't shown me the line of code that generates it

#

so i can't tell

stone plover
#

oh mb

#
collatz.c: In function ‘basic_collatz’:
collatz.c:84:34: warning: assignment to ‘uint64_t *’ {aka ‘long unsigned int *’} from ‘uint64_t’ {aka ‘long unsigned in
’} makes pointer from integer without a cast [-Wint-conversion]
   84 |         giant_buffer[numbers[i]] = count - counters[i];```
#

There are tons of warnings

#

in the same format

prime ravine
#

you've declared giant_buffer wrong

stone plover
#

uint64_t* giant_buffer[100000001];

prime ravine
#

have you declared it as an array of uint64_t *s instead of uint64_ts?

stone plover
#

isn't it an array?

#

That's how I had declared it until now

#

OH

prime ravine
#

that's an array of pointers

stone plover
#

Yeah I got it

#

17k is the file

prime ravine
#

hmm

stone plover
#

program works

#

why would it be 1G?

#

100.000.000 * 64 bit

prime ravine
#

ohhhh i was wrong about bss

#

it does get allocated at runtime

#

the size is simply recorded in the object file, that makes sense

#

i'm dubious this will give you a performance speedup

stone plover
#

(dubious mean?)

#

(will not give any?)

prime ravine
#

i am doubtful that you'll get one

stone plover
#

oh alright

#

well

#

its still a micro optimization

#
collatz.c:8:10: error: variably modified ‘giant_buffer’ at file scope
    8 | uint64_t giant_buffer[BUFFER_SIZE];
      |          ^~~~~~~~~~~~
collatz.c: In function ‘main’:```
#

I tried making a const value BUFFER_SIZE

#

and set that to the giant_buffer

#

what's the error?

prime ravine
#

const value != constant expression

#

you can use a macro instead if you like

stone plover
#

what's a macro?

prime ravine
#

#define

stone plover
#

no idea

#

oh okay I went to read the docs

#

so I can just basically:

#define BUFFER_SIZE 100000000;```
prime ravine
#

yup

#

macros get expanded before the compiler runs

stone plover
#

I don't need any extras or something

prime ravine
#

nope

stone plover
#

sounds good

prime ravine
#

so the compiler actually sees uint64_t buffer[100000000], not uint64_t buffer[BUF_SIZE]

stone plover
#

aaa

#

noice

#

alright

#
trilon@M1911:~/hw0-TR1LON/collatz/src$ gcc -O0 -m32 -Wall -Wextra -Werror -pedantic -o collatz collatz.c
collatz.c:8:10: error: size ‘3200000032’ of array ‘giant_buffer’ exceeds maximum object size ‘21474836478 | uint64_t giant_buffer[BUFFER_SIZE];```
#

Bruh moment

#
Macro is 400.000.004
prime ravine
#

why is that?

#

and not 100 000 001?

stone plover
#

so I can cache more values that will be calculated

prime ravine
#

...but you said the max was 100 million

stone plover
#

yes

#

but collatz numbers can go above 100m

prime ravine
#

your original code doesn't account for that

stone plover
#

it does

#

basically

#

my original code says:

#

if the number that is calculated is bigger than the cache:

#

dont cache it

#

However

#

since I have a static cache now

#

I said let's make it bigger

#

if there's no easy fix

#

I'll just make it smaller and not cache big numbers

prime ravine
#

i really doubt that using bss will give you a performance increase

#

but try it and see what happens

stone plover
#

how do I fix that error

prime ravine
#

you reduce the size of the buffer?

stone plover
#

oh

#

alright

#

nothing changes

#

performance wise

#

o well

#

whatever

prime ravine
#

yup, thought so

stone plover
#

I'll leave it like that

#

if you think of anything else

#

don't hesitate

#

Current time is at:

trilon@M1911:~/hw0-TR1LON/collatz/src$ time ./collatz 100 100000000
950

real    0m4.581s
user    0m4.360s
sys     0m0.220s```
prime ravine
#

i wonder... how slow does it run without memoising?

stone plover
#

x10 slower

#

I think

prime ravine
#

hmm

#

im wondering if you could get some performance boosts by multithreading

#

(but then you have issues with sharing the same cache)

stone plover
#

I guess

#

I thought of it too

#

however I think we aren't even allowed to use more than one thread

#

the system we run the tests on I think limits it

prime ravine
#

okay, that's fair

stone plover
#

Also

#

why would we have issues with the cache?

#

Threads can't access eachother's memory?

prime ravine
#

because you can share the cache between threads

#

i.e. have shared memory

stone plover
#

yeah so where's the problem?

prime ravine
#

... so then the threads can access each other's memory

#

which means you're at risk of race conditions

stone plover
#

can't we split jobs equally from the start?

#

no we can't

#

because things arent cached

#

;/

#

well

#

there's only one solution

#

GPU computing

#

xD

#

nvm I need CUDA which I can't install to the main system

#

Time on the uni's machine 😄

linux12:/home/users/sdi23xxxxx/program>time ./collatz 100 100000000
950
3.929u 0.275s 0:04.20 99.7%     0+0k 0+0io 0pf+0w```
torn quarry
stone plover
#

But since I allocate them only once

#

on the main() function now

#

it shouldn't really be a performance difference

#

right?

torn quarry
#

ah! ok

#

and never realloc?

stone plover
#

I use realloc

#

but its being used very rare

#

wait I'll implement a counter

#

to see how many times it's being used

#

do you think the if statement is causing bottleneck?

torn quarry
#

wait if you send me the code, I could compare both on my machine

#

so I'm not bothering you

stone plover
#

oh alright

#

sure give me a second

#

Here's the current code

torn quarry
#

and how do you compile it eventually?

stone plover
#

right

#

gcc -O0 -m32 -Wall -Wextra -Werror -pedantic -o collatz collatz.c

torn quarry
#

ok!

#

and typical start and end, what do you choose? 100 and 100_000_000 ?

stone plover
#

yes

torn quarry
#

okay, 957 for the alternative approach ; so I guess not better if the competition already happens at the millis!

#

ah no wait

torn quarry
stone plover
#

are you on linux

#

if yes

#

time ./collatz 100 100000000

#

If you are on windows

#

you can use a program called ptime

torn quarry
#

nah it's fine, I can use time

stone plover
#

or use the untrustworthy powershell command: ps Measure-Command { ./collatz.exe 100 100000000 }

#

Alright

torn quarry
#

okay so the alternative approach takes much more time! so not good

stone plover
#

looping through the giant buffer?

torn quarry
#

yes

#

so with -O0 :

#

2.70 vs 3.40

#

with -O3:

#

1.66 vs 1.95

stone plover
#

I see

#

your computer achieves a really fast time non the less

torn quarry
#

(don"t especially surprise me that it's not the same percentage anymore ; likely due to vectorization)

#

oh yeah that's maybe a machine difference

stone plover
#

Yeah I guess

torn quarry
#

I was looking at the proportion only

torn quarry
#

my compiler doesn't not understand m32

#

so I had to remove that flag

#

maybe that's also why something that can impact the measured millis

stone plover
#

oh yeah

#

you need an addon for m32

#

it impacts more things

#

so careful

#

gcc by default complies a program to 64 bit, if you compile it after in 32 bit there may be errors

torn quarry
#

yeah

stone plover
#

Do you know

#

if implementing gpu computing

#

is extremely hard

torn quarry
#

(it's funny, I'm trying your code without realloc ; just don't realloc the buffer and if it's filled, stop adding a cache)

#

intuitivelt one could expect the number of holes to get lower, so the needs to realloc would be small

#

but okay anyway!

#

no idk about gpu things

stone plover
#

wait

#

I have forgotten a check

#

I just noticed

#
            else {
            numbers[arr_counter] = n;
            counters[arr_counter] = count;```
#

I don't check if the n is already computed

torn quarry
#

ah oops

stone plover
#

but

#

hm

#

I don't really know if I can skip anything here

#

well

#

I think I can just add a check up on the while loop

#

that should work just fine actually

torn quarry
#

btw, if I may , I think you have some duplicated code

stone plover
#

Where?

torn quarry
#

basic_collatz

stone plover
#
        if (giant_buffer[n] != 0) {
            break;
        }``` I think this should be good on `basic_collatz`
torn quarry
#
//create simple collatz algorithm
int basic_collatz(uint64_t n) {
    uint64_t count = 0;
    uint64_t arr_counter = 0;
    
    while(n != 1) {
        if (n % 2 == 0) {

            if (arr_counter >= SIZE) {
                SIZE *= 2;
                numbers = realloc(numbers, SIZE * sizeof(uint64_t));
                counters = realloc(counters, SIZE * sizeof(uint64_t));
            }

            if (n > max_number) {
                n/=2;
                count++;
            }
            else {
            numbers[arr_counter] = n;
            counters[arr_counter] = count;

            n /= 2;
            count++;
            arr_counter++;
            }

        } else {

            if (arr_counter >= SIZE) {
                SIZE *= 2;
                numbers = realloc(numbers, SIZE * sizeof(uint64_t));
                counters = realloc(counters, SIZE * sizeof(uint64_t));
            }

            if (n > max_number) {
                n = ((3 * n) + 1) / 2;
                count+=2;
            }
            else {
                
            numbers[arr_counter] = n;
            counters[arr_counter] = count;

            n = ((3 * n) + 1) / 2;
            count+=2;
            arr_counter++;

            }

        }

    }

    for (uint64_t i = 0; i < arr_counter; i++) {
        giant_buffer[numbers[i]] = count - counters[i];
    }
    return count;
}
#

can likely become

//create simple collatz algorithm
int basic_collatz(uint64_t n) {
    uint64_t count = 0;
    uint64_t arr_counter = 0;
    
    while(n != 1) {
        if(n <= max_number) {
            if (arr_counter >= SIZE) {
                SIZE *= 2;
                numbers = realloc(numbers, SIZE * sizeof(uint64_t));
                counters = realloc(counters, SIZE * sizeof(uint64_t));
            }
            numbers[arr_counter++] = n;
            counters[arr_counter] = count;
        }

        if (n % 2 == 0) {
            n /= 2;
            count++;
        } else {
            n = ((3 * n) + 1) / 2;
            count+=2;
        }

    }

    for (uint64_t i = 0; i < arr_counter; i++) {
        giant_buffer[numbers[i]] = count - counters[i];
    }
    return count;
}
#

(if I don't mistake)

stone plover
#

oh

#

you merged the n <= max_number

#

I think

#

right?

torn quarry
#

yes

#

and the resizing also

stone plover
#

well I guess it seems good

#

take the arr_counter++ outside from the buffer

#

so its more readable

torn quarry
#

imo the code as it is now, is highly unsafe : if your realloc fails, you'll get a segfault

#

I would add some checks there

stone plover
#

I'll see what I can do

#

Professor said if I have the fastest code in class I need to do a presentation

#

Presentation to 200 people xd

torn quarry
#

(also : since you keep track of the indices that you mutate in numbers, I'm not sure qhy you actually need counters:
can you just print the count in giant_buffer and rectify later?

stone plover
#

Oh

#

save the numbers in giant_buffer

#

and have only numbers

#

If I understood correctly

torn quarry
#

yeah ; since you know (in numbers) what you are mutating

#

you can save them and rectify later

stone plover
#

yeah yeah

#

alright got it

#

I'll go eat now

#

I'll be back and I'll apply the changes

torn quarry
#

it's kind of frustrating to get a competitive thing with bonus, based on "one time millis" 😄 if ever you're not first, don't hesitate to take a step back 😄

#

(an approach that works well on (start,end) might not work quite as well on different inputs 😉 since I see you're quite investing time into it, don't take it too personnally if it doesn't pay off 🙂 what you'd have learned might be the best take-away)

stone plover
#

Well I'm pretty happy that I have learned so many about C in such a small period

#

I mean obviously I won't be as excited if I'm not the first, but it's not that bad

stone plover
#

trilon@AK-47:~/collatz$ time ./collatz 100 100000000
952

real 0m2.465s
user 0m2.365s
sys 0m0.100s

#

I got a value of 952 instead of 950

#

Alright nvm fixed it

#

The

numbers[arr_counter++] = n;
``` caused the error
#

I changed it to

#
arr_counter++;
numbers[arr_counter] = n;```
torn quarry
#

ah yeah sorry, I haven't really tested before submitting

stone plover
#

Also I'll remove the numbers[]

#

and use the giant_buffer

#

i'll also add the check to see if its computed already

#
break; ``` exists in C right?
prime ravine
#

yup

stone plover
#

Alright thanks!

#

hmm

#

for some reason now it has incorrect values

#

Oh I know why wait

#

Well I don't see any performance difference

#

But I guess its good practice to have a more simple and compact code

#

o oh

#

its not good

#

it's producing false results

#

I think the problem is that check

#

it's producing wrong results 💀

#

Alright I got the backup

#

I'll add some extra variables to check how many times other things get accessed

torn quarry
#

ah sorry if I brought a mess 😄 hahahaha

#

#oops

stone plover
#

I'll make a python script that runs test cases

#

and checks if the output of the program was correct

stone plover
#

hm

#

@torn quarry I think we still call the basic function a lot of times

#
trilon@AK-47:~/collatz$ ./collatz 23504 100000000
950
Called 7301 times.
trilon@AK-47:~/collatz$ ./collatz 1 100000000
950
Called 2 times.
trilon@AK-47:~/collatz$ ./collatz 3000 100000000
950
Called 935 times.```
#

let me test the code at the python code to see if we access it the same amount

#

(it should be)

torn quarry
#

for now in the basic function, you add in cache if the entry was not computed yet

#

what happens if you just, exit the function if the entry has been discovered already ? (because it means you're now in an orbit you've discovered)

#

(that is suppose to "return fast" in case you eventually meet a trajectory you've discovered before)

stone plover
#

(no idea why actually)

torn quarry
#

ah because you cannot rectify or something?

stone plover
#

hm

torn quarry
#

idk

stone plover
#

let me think about it a bit

#
if (giant_buffer[n] != 0) {  //If it's calculated:
  c += giant_buffer[n];
}```
torn quarry
#

for example ; 5 goes to 16 -> 8 -> 4 -> 2 -> 1
so when you're at 5, you actually don't have to go to 1, because you might already have computed 4 from a previous computation

stone plover
#

yeah I got it

stone plover
#

I had implemented it

#

but it gave incorrect values

torn quarry
#

ah

stone plover
#

Probably I did something wrong

#

I'll give it another try

torn quarry
#

maybe those cases are also too rare to really matter, idk

stone plover
#

I'll add a counter

#

to see how rare are these cases

torn quarry
#

maybe more like a score I think :

counter before early return / counter returned

the closest to 1, the closest to 1 you went

stone plover
#

I found the error

#

I had incorrect brackets I THINK

#

lemme try to implement it again

torn quarry
#

wait

#

I think you're right,there is something that is not supposed to be working

#

because,

stone plover
#

Alright

#

It IS useful

stone plover
#
trilon@AK-47:~/collatz$ ./collatz 14040 100000000
950
4366```
torn quarry
#

ah, interesting

stone plover
#

4366 times was giant_buffer[n] used

torn quarry
#

because, iirc, but maybe I've stopped followng at some point,

#

the entire goal of this basic function was to never actually read from the cache

#

(compared to the main loop)

#

but ! I don't remember why

#

maybe I'm mistaking

stone plover
#

Oh

#

I think you're right

#

yeah

#

Maybe now we are calling the function even more times

torn quarry
#

but actually, it was two days ago already

#

so maybe the code had changed since then, and it's now completely useless to keep having this basic function 😄 hahaha

stone plover
#

wait I'll compile an old backup and time them

torn quarry
#

tbh I haven't followed entirely 🤣

stone plover
#

¯_(ツ)_/¯

torn quarry
#

yeah no I cann't remember

#

I do remember we restarted from [some state], then wanted to introduce cache ; then there was somthing in the code that was not friendly to handle reading caches in some cases, because the cache was not set yet ;

#

but it was sooo long ago 😄 like sunday morning? and we're tuesday afternoon here 😄 no chance my memory is large enough to handle that

#

haha

stone plover
#

uhh

#
trilon@AK-47:~/collatz$ time ./collatz 14043 100000000
950
CALLED: 4366
GIANT: 4365

real    0m1.699s
user    0m1.578s
sys     0m0.120s
trilon@AK-47:~/collatz$ time ./test 13043 100000000
950
BASIC: 4064

real    0m1.698s
user    0m1.655s
sys     0m0.040s```
#

the first one is with the giant_buffer[n]

#

the second one is without

#

they achieve literally the same time

#

do we care about user or sys?

#

oh wait

#

wrong number on the 2nd one

#

OOPS

#

retry test

#
trilon@AK-47:~/collatz$ time ./collatz 14043 100000000
950
CALLED: 4366
GIANT: 4365

real    0m1.641s
user    0m1.550s
sys     0m0.090s
trilon@AK-47:~/collatz$ time ./test 14043 100000000
950
BASIC: 4366

real    0m1.647s
user    0m1.487s
sys     0m0.160s
trilon@AK-47:~/collatz$```
#

No difference

#

The function is called the same exact amount

#

however there's no performance difference

#

how is that possible

#

Can you check if I have done any mistakes xD

#

I think the code is good

#
//create simple collatz algorithm
int basic_collatz(uint64_t n) {
    uint64_t count = 0;
    uint64_t arr_counter = 0;
    times_basic_called++;
    
    while(n != 1) {

        if (giant_buffer[n] != 0) {
            times_giant_used++;
            count+=giant_buffer[n];
            n = 1;
        } 
        else {

          if (n % 2 == 0) {
  
              if (arr_counter >= SIZE) {
                  SIZE *= 2;
                  numbers = realloc(numbers, SIZE * sizeof(uint64_t));
              }
  
              if (n > max_number) {
                  n/=2;
                  count++;
              }
              else {
              numbers[arr_counter] = n;
              giant_buffer[n] = count;
  
              n /= 2;
              count++;
              arr_counter++;
              }

            } else {

                if (arr_counter >= SIZE) {
                    SIZE *= 2;
                    numbers = realloc(numbers, SIZE * sizeof(uint64_t));
                }

                if (n > max_number) {
                    n = ((3 * n) + 1) / 2;
                    count+=2;
                }
                else {
                    
                numbers[arr_counter] = n;
                giant_buffer[n] = count;

                n = ((3 * n) + 1) / 2;
                count+=2;
                arr_counter++;

                }

            }
        }

    }

    for (uint64_t i = 0; i < arr_counter; i++) {
        giant_buffer[numbers[i]] = count - giant_buffer[numbers[i]];
    }
    return count;
}```
#

I think I should do something about the if's and else

#

we have kinda alot

#

@prime ravine sorry to bother you

#

do you know assembly?

#

https://godbolt.org/z/sd9fhvEas <---- I want to apply any optimizations that I can apply manually that -O3 applies

prime ravine
#

are you allowed to inline assembly for your solution?

stone plover
#

I don't think so...

#

Well it doesn't say I am not allowed to

#

but I don't think

#

I'm allowed either

prime ravine
#

oh so you're trying to make the C code have the optimisations in it?

stone plover
#

yes

#

basically

#

I'm forced to compile with O0

prime ravine
#

you won't be able to do it entirely, since the optimiser simply has more power than you have

#

you could try just unrolling some loops by hand

#

seeing if that gets you anywhere

stone plover
#

I mean yeah

#

That's what I mean

#

The thing is I don't really understand assembly

#

I just know some uhh functions idk how u call them

#

the mov,push etc

#

cmp

#

Can you point out some obvious ones?

#

I noticed there's no .main on the O3

#

oh there is mb

torn quarry
#

if I may, I think there are other measures that would be relevant

stone plover
#

Like?

torn quarry
#

for example: can you print the numbers of elements in the buffer that are never actually filled

#

(the ones smaller than start)

stone plover
#

Well to be honest

#

I don't know if that's faster or not

#

we Could:

#

collatz(2, start)

#

collatz(start, end)

#

xD

#

but that ruins the whole point

#

of the assignment so ig not

torn quarry
#

but maybe the point is not the right one, yes

#

because take for example this path:

#

5 -> 16 -> 8 -> 4 -> 2 -> 1

#

and stay you start at 3

#

3 will give you the path

#

3 -> 10 -> 5 ...

stone plover
#

right

#

so

torn quarry
#

(wait I try to formulate the observation in my mind 😄 hahaha)

stone plover
#

We need an algorithm to choose the best numbers first

#

And then guess for the others

torn quarry
#

I'm wondering if it wouldn't be possible to actually buffer a bit more than numbers, but more like, a kind of dependency graph

stone plover
#

BTW

torn quarry
#

that actually encodes the path we're taking

#

but I'm not sure ; I have to think

stone plover
#

The professor told me another guy is talking about a guessing algorithm (that guesses which number has the biggest sequence without actually calculating it), but neither my professor trusts it as much until he sees it

torn quarry
#

you see in the current approach, when you hit 3, you're going to fill up the full path
3 -> 10 -> 5 -> 16 -> 8 -> 4 -> 2 -> 1

stone plover
#

hm

#

May I ask

#

Do you think implementing reverse collatz would be better?

torn quarry
#

but maybe you could just go to 1, and while iterating , you store information like:

entries[10] = { .pivot = 3, distance = 1 }
#

so that you don't have to go back and fill elements in your path

#

next time you hit 10, you know you only have to resolve it ;

#

I'm not sure that can be better, that said

#

the point is that if you allow pivot to be always smaller than the entry index, you can maybe will that buffer "very" fast,

#

and then re-traverse it while finding the maximal value

prime ravine
#

what's the size of giant_buffer?

torn quarry
#

epty entries are { pivot = -1 , distance = 0 } ; resolved ones are { -1 , > 0 }

prime ravine
#

you could get away with storing uint32_t, now?

stone plover
#

When I tried doing so I can weird values

#

and errors

torn quarry
#

I'm giving it a try and I'll see

stone plover
#

@torn quarry

#

I found some useful information in a research paper

#

Page 10

#

Has all the tricks proven by math to filter out numbers

#

I think

torn quarry
#

can you tell me @stone plover

#

what are the maximal values for start 1 and
end = 10
end = 15
end = 25

stone plover
#

oh

#

sorry

#

I had gone to eat

#

yeah I will

#

20, 20, 24

#

if your numbers are

#

19, 19, 23

#

it is correct

#

I add +1 in the end

#

@torn quarry

torn quarry
#

ok !

torn quarry
#

(-O0)

stone plover
#

we can't change in different systems

#

judge*

#

compare**

#

we can't compare in different systems

torn quarry
stone plover
#

It's the latest one

torn quarry
#

you run with

#

start=?

#

end=100 000 000

stone plover
#

yes

torn quarry
#

okay so unfortunately, my idea didn't turn to be fast enough

#

I think I make cache misses way too much

stone plover
#

oof

#

Should we stop bothering with this

#

I mean probably

#

there are optimizations that we could find

torn quarry
#

hmmmm, conceptually, I'm not quite sure to understand why my code is so slower

#

but I guess it has to do with the fact I traverse the numbers once without actually resolving anything

#

(and then I have to perform very long jumps, I guess)

#

hmmmmmm, no, nop nop nop --- there is something else I do wrong
because the last O(n) loop that resolves everything, takes 1% of the time -- I'll check a bit later, I must have forgotten a check

stone plover
#

Oh I see

torn quarry
#

aaaaaah okay I think I get it! with my approach, there is a significiant draw back, that I have to wait my cursor comes back at an index below the starting point

for example, s = 7, I will run the algo but I need to wait to come below 7 to stop

#

while in your case, you stop as soon as you have something in the cache (I guess?)

#

I think that's where I'm loosing all my time

#

so for example, when I hit 7,

I hit then 22 and I claim
value(22) = value(7) + 1
I continue and hit 11, so I claim
value(11) = value(7) + 2
and so on;

and I can never stop while I'm not below 7 ; even if I already discovered 11, I need to wait to come back below 7 because I need the dependencies to be always below the index ; hu hu , unfortunate

#

yeah yeah yeah, that's it I think

I just removed the check I made and performed one computation "too much", and I just made +50% perf decrease. so I guess that's why I'm so slow 😄 hahaha

stone plover
#

I see

stone plover
#

I guess then this project is finished

#

I'll try to make the code a bit clearer

#

Explain the code

#

And it's finished

stone plover
#

hello

stone plover
#

@formal wyvern give me a second I'll post the latest code

#

Any ideas?

formal wyvern
#

What exactly is the goal of the assignment?

stone plover
#

Calculate the fastest

formal wyvern
#

Count the number of steps each number takes to reach 1?

stone plover
#

Yes

#

Find the biggest collatz sequence

#

In a range of numbers

formal wyvern
#

Are start and end bounded?

stone plover
#

Bounded meaning?

#

Specific?

#

The range of numbers is [-100m, +100m]

#

The fastest time test will run (probably) on 100 -> 100.000.000

formal wyvern
#

Do we know something about the inputs?

#

For example, will they always be valid?

stone plover
#

Yes

#

The start will never be bigger than end

#

And the numbers will be valid from the range I mentioned

#

I would give you the assignment paper but it's in Greek so...

formal wyvern
stone plover
#

I have only for negative numbers

#

The range supports negative numbers

#

However

#

The assignment says if the number is negative print 0 and exit

#

So

#

Yeah ig

#

Jude has helped me as well but we can't find anything more than this to optimize except using the O3 optimizations

#

Which makes it run about x2.1 times after

#

I think

#

Lmk if you have any ideas or questions @formal wyvern

formal wyvern
stone plover
#

No idea what that is

#

If it requires threads we are not allowed to use

#

Oh alright got it

#

However how would I apply that in my code? I need to calculate all the numbers in order since I use memoization

strange agate
#

@stone plover You've been looking at this for weeks has your teacher not shown you what he did yet

stone plover
#

I have achieved a better time than my teacher

#

However

#

The first place (There's a leaderboard apparently) gets a bonus on his grade

stone plover
strange agate
stone plover
#

Memoization

#

And some other tricks

#

Using mod 4

stone plover
formal wyvern
#

Why is your code so hideous?

stone plover
#

Hideous?

strange agate
#

Not really something we want to see

formal wyvern
#

That's true, my bad.

stone plover
#

Well my experience in writing C is about 3-3,5 weeks sorry if my code is not very clear and well written

strange agate
#

It's not bad

#
                if (arr_counter >= SIZE) {
                    SIZE *= 2;
                    numbers = realloc(numbers, SIZE * sizeof(uint32_t));
                }

                if (n <= max_number) {
                    numbers[arr_counter++] = n;
                    giant_buffer[n] = count;
                }
```this is repeated twice, this could be abstracted to a function to deal with reallocating your buffer and setting stuff
stone plover
#

I suppose I could do that

#

I don't have a computer rn

#

I'll fix it in a bit thanks!

strange agate
#

what's the difference between numbers and giant_buffer

stone plover
#

numbers save temporarily numbers

#

The reason I save them is because I don't know the sequence for such number

#

So in the end I add them to the giant buffer after I have reached 0

strange agate
#

Hmm interesting

stone plover
#

I didn't explain it very well let me rephrase it

#

As I'm calculating for n, I cache every number I calculate in the process. (If I haven't already calculated obviously). After I finish calculating for n, I add to the cache all the numbers the giant buffer with the difference of the biggest count to the smaller

#

I hope I made some sense

strange agate
#

It makes sense

#

Is one of your rules that you can't multithread?

stone plover
#

Yep

#

Hurubon mentioned SMID but I can't see any way that I can apply that in my code

#

Since I don't calculate many values in the same time

#

I calculate one each loop

formal wyvern
#

How does this compare with your solution?

#
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <stdbool.h>
#include <limits.h>

int buffer[1 << 28] = { 1, 1 };

int collatz_length(int n)
{
    if (buffer[n] == 0 && n % 2 == 0)
    {
        buffer[n] = 1 + collatz_length(n / 2);
    }
    else if (buffer[n] == 0 && n % 2 == 1)
    {
        buffer[n] = 2 + collatz_length((3 * n + 1) / 2);
    }
    
    return buffer[n];
}

int main(int argc, char** argv)
{
    if (argc != 3)
    {
        printf("Usage: collatz <start> <end>");
        return 1;
    }

    int start = atoi(argv[1]);
    int end   = atoi(argv[2]);

    if (start <= 0 || end <= 0)
    {
        printf("0");
        return 1;
    }

    int max = INT_MIN;

    for (; start <= end; start += 1)
    {
        int const result = collatz_length(start);

        if (result > max)
        {
            max = result;
        }
    }

    printf("%d", max);
}
strange agate
#

It's verry difficult to try to apply simd to collatz

formal wyvern
#

It might not work, change the 1 << 16 to something like 1 << 28.

formal wyvern
stone plover
#

Sorry family needed me for a bit

#

Im running the test now

#
test2.c:52:18: error: ‘result’ undeclared (first use in this function)
   52 |     printf("%d", result);
      |                  ^~~~~~
test2.c:52:18: note: each undeclared identifier is reported only once for each function it appears in```
#

I think you meant max let me change it

#
trilon@AK-47:~/collatz$ time ./test2 100 100000000
Segmentation fault

real    0m0.001s
user    0m0.001s
sys     0m0.000s```
formal wyvern
#

Try to run it again.

stone plover
#

segfault

#

trilon@AK-47:~/collatz$ time ./test2 100 100000000
Segmentation fault

real 0m0.007s
user 0m0.007s
sys 0m0.000s

#

compiler settings

#

gcc -O0 -m32 -Wall -Wextra -Werror -pedantic -o test2 test2.c

formal wyvern
#

Hmm, let me see why it segfaults.

stone plover
#

1.1G file?

strange agate
#

The way the array is initialized is problematic

#

If you write int buffer[1 << 28]; and then set buffer[0] = 1; buffer[1] = 1; in main then the array can live in .bss and the file won't be 1.1GB

stone plover
#

Also

formal wyvern
#

Which is what I want.

stone plover
#

Oh no

strange agate
stone plover
#

Biggest size of .c file is 8kb ??????

#

why is that even a thing

#

well whatever

strange agate
#

I'm not sure why it's segfaulting though. Could try running in a debugger to see why though

stone plover
#

it didn't like the 1.1gb

strange agate
#

Yeah CE won't like that

stone plover
#

visual studio code debugger might work let me check

strange agate
#

In practice compilers support more

stone plover
#

Alright

stone plover
strange agate
#

where does it say the error is

#

It should be either a stack overflow from recursion or out of bounds buffer access

stone plover
#

yeah I can't even run the code because my vs code is broken

strange agate
#

hmm

stone plover
#

trying to fix it give me a sec

formal wyvern
#

@stone plover I tried giving it 1 << 29 space and it seems not to crash on godbolt.

stone plover
#

alright let me give it 1 << 29

#

did you use the command I gave you

formal wyvern
stone plover
#

arguments*

formal wyvern
#

Yes.

stone plover
#

it still gives me a seg fault

#

wait

#

I tried gcc test2.c

#
/usr/lib/gcc/x86_64-linux-gnu/11/crtbeginS.o: in function `deregister_tm_clones':
crtstuff.c:(.text+0x3): relocation truncated to fit: R_X86_64_PC32 against `.tm_clone_table'
crtstuff.c:(.text+0xa): relocation truncated to fit: R_X86_64_PC32 against symbol `__TMC_END__' defined in .data section in a.out
/usr/lib/gcc/x86_64-linux-gnu/11/crtbeginS.o: in function `register_tm_clones':
crtstuff.c:(.text+0x33): relocation truncated to fit: R_X86_64_PC32 against `.tm_clone_table'
crtstuff.c:(.text+0x3a): relocation truncated to fit: R_X86_64_PC32 against symbol `__TMC_END__' defined in .data section in a.out
/usr/lib/gcc/x86_64-linux-gnu/11/crtbeginS.o: in function `__do_global_dtors_aux':
crtstuff.c:(.text+0x76): relocation truncated to fit: R_X86_64_PC32 against `.bss'
crtstuff.c:(.text+0x9e): relocation truncated to fit: R_X86_64_PC32 against `.bss'
collect2: error: ld returned 1 exit status```
strange agate
#

1<<29 is too big

stone plover
#

yeah we used -1

#

(1<<29) - 1

strange agate
#

Still too big

#

Just calloc it

stone plover
#

Do we even need such a big cache

strange agate
# strange agate Still too big

The reason is the memory model of the binary itself is limited to a couple gigabytes, unless you use a different model which may be slower

stone plover
#

lets just calloc it and test it

#

it shouldn't have such a huge difference anyways I think

formal wyvern
#
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <stdbool.h>
#include <limits.h>


static inline int collatz_length(int* const cache, int n)
{
    printf("Input: %d\n", n);
    if (cache[n] == 0 && n % 2 == 0)
    {
        cache[n] = 1 + collatz_length(cache, n / 2);
    }
    else if (cache[n] == 0 && n % 2 == 1)
    {
        cache[n] = 2 + collatz_length(cache, (3 * n + 1) / 2);
    }
    
    return cache[n];
}

int main(int argc, char** argv)
{
    if (argc != 3)
    {
        printf("Usage: collatz <start> <end>");
        return 1;
    }

    int start = atoi(argv[1]);
    int end   = atoi(argv[2]);

    int* const cache = calloc(1U << 31, sizeof(int));
    cache[1] = 1;

    if (start <= 0 || end <= 0)
    {
        printf("0");
        return 1;
    }

    int max = INT_MIN;

    for (; start <= end; start += 1)
    {
        int const result = collatz_length(cache, start);

        if (result > max)
        {
            max = result;
        }
    }

    printf("%d", max);
}
stone plover
#
test2.c: In function ‘collatz_length’:
test2.c:13:41: error: passing argument 1 of ‘collatz_length’ makes pointer from integer without a cast [-Werror=int-conversion]
   13 |         cache[n] = 1 + collatz_length(n / 2);
      |                                       ~~^~~
      |                                         |
      |                                         int
test2.c:8:45: note: expected ‘int * const’ but argument is of type ‘int8 | static inline int collatz_length(int* const cache, int n)
      |                                  ~~~~~~~~~~~^~~~~
test2.c:13:24: error: too few arguments to function ‘collatz_length’
   13 |         cache[n] = 1 + collatz_length(n / 2);
      |                        ^~~~~~~~~~~~~~
test2.c:8:19: note: declared here
    8 | static inline int collatz_length(int* const cache, int n)
      |                   ^~~~~~~~~~~~~~
test2.c:17:51: error: passing argument 1 of ‘collatz_length’ makes pointer from integer without a cast [-Werror=int-conversion]
   17 |         cache[n] = 2 + collatz_length((3 * n + 1) / 2);
      |                                       ~~~~~~~~~~~~^~~
      |                                                   |
      |                                                   int
test2.c:8:45: note: expected ‘int * const’ but argument is of type ‘int8 | static inline int collatz_length(int* const cache, int n)
      |                                  ~~~~~~~~~~~^~~~~
test2.c:17:24: error: too few arguments to function ‘collatz_length’
   17 |         cache[n] = 2 + collatz_length((3 * n + 1) / 2);
      |                        ^~~~~~~~~~~~~~
test2.c:8:19: note: declared here
    8 | static inline int collatz_length(int* const cache, int n)
      |                   ^~~~~~~~~~~~~~
test2.c: In function ‘main’:
test2.c:34:33: error: left shift count >= width of type [-Werror=shift-count-overflow]
   34 |     int* const cache = calloc(1 << 32, sizeof(int));
      |                                 ```
formal wyvern
#

Fixed.

strange agate
stone plover
#
test2.c:34:24: error: argument 1 value ‘2147483648’ exceeds maximum object size 2147483647 [-Werror=alloc-size-larger-than=]
   34 |     int* const cache = calloc(1 << 31, sizeof(int));
      |                        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from test2.c:2:
/usr/include/stdlib.h:543:14: note: in a call to allocation function ‘calloc’ declared here
  543 | extern void *calloc (size_t __nmemb, size_t __size)
      |              ^~~~~~
cc1: all warnings being treated as errors```
formal wyvern
#

Try Zelis' version.

#

Although that gives SIGKILL in godbolt.

stone plover
#
test2.c: In function ‘main’:
test2.c:35:13: error: product ‘1073741824 * 4’ of arguments 1 and 2 exceeds ‘SIZE_MAX’ [-Werror=alloc-size-larger-than=]
   35 |     cache = calloc(1 << 30, sizeof(int));
      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from test2.c:2:
/usr/include/stdlib.h:543:14: note: in a call to allocation function ‘calloc’ declared here
  543 | extern void *calloc (size_t __nmemb, size_t __size)
      |              ^~~~~~
cc1: all warnings being treated as errors```
strange agate
stone plover
#

isn't 1<<30 huge

formal wyvern
#

Yes.

#

I put it back down to 16 and I just get [truncated].

#

Which is kind of sus, though.

stone plover
#

seg fault

#

with 100 10000

#

Input: 148958

#

also why you assume the value is cached?

strange agate
#

I think I expect the max recursion depth to be exceeded

#

You see sequence lengths in the thousands

stone plover
#

yeah

#

@formal wyvern Why would this approach work better?

formal wyvern
#

It wouldn't necessarily, I was just wondering how the simple case compares to your solution.

stone plover
#

Also that solution would necesarily work

#

Or maybe it would hmm

#

SMID isn't applicable here right?

strange agate
#

Not particularly

stone plover
#

Not particularly it is

#

or its not

strange agate
#

It’s theoretically possible to use SIMD but the benefit is very questionable

#

It’d be complex at the very least

stone plover
#

Thanks a ton for your help! Really appreciate it!

stone plover
#

Haha