Hello... I have a loop that iterates from 0 to 2^48, and I would like to use more than one thread to speed it up. How would I go about this? The loop basically takes a variable and performs a bunch of checks on it. If the variable passes all the checks, the loop prints it to a file.
If anyone would like to look at the program in question: https://github.com/Colin-Henry/cubiomesWorkspaceForAASSG/blob/main/netherFilters.c (it may confuse you more than it explains things). I'd like to multithread the for (currentStructureSeed = startingStructureSeed; currentStructureSeed <= endingStructureSeed; currentStructureSeed++) loop, and the variable mentioned above is currentStructureSeed.
#multithreading in c, how?
When your question is answered use !solved to mark the question as resolved.
Remember to ask specific questions, provide necessary details, and reduce your question to its simplest form. For tips on how to ask a good question use !howto ask.
Probably look into pthreads (the most popular option), or, if you want to use the standardized, language-native threading interface added in C11: https://en.cppreference.com/w/c/thread
There's also a Windows-specific API. Probably use that if you only plan to run on Windows.
!man pthread_create
pthread_create - create a new thread
Synopsis
#include <pthread.h>
int pthread_create(pthread_t *restrict thread,
const pthread_attr_t *restrict attr,
void *(*start_routine)(void *),
void *restrict arg);
Compile and link with -pthread.
!man pthread_join
pthread_join - join with a terminated thread
Synopsis
#include <pthread.h>
int pthread_join(pthread_t thread, void **retval);
Compile and link with -pthread.
Note that you need to compile and link with -pthread.
Hello @gusty girder
If you need to create 2^48 threads, I don't think that's normally possible because of resource limits.
As far as I know, you can create around 1024 threads in a process.
So you could create several processes instead, using the fork function.
Or there may be a way to increase the limit, but I don't know exactly.
Thanks.
Linux doesn't have a thread per process limit: https://stackoverflow.com/a/344292
Also, OP doesn't want to create a separate thread for each iteration; that'd be really stupid. He probably just wants to split the work across N threads.
@gusty girder However, especially with the side effects you have (i.e. writing to a file), you need to be able to guarantee some order. That's why threads would require so much overhead: before a thread writes, you'd have to make sure no other thread still needs to write results from earlier iterations.
So multithreading would be a pain in the arse to set up, and because of all that synchronization and order checking, I don't think you would actually see any speedup.
What you could potentially use is SIMD.
However if you want to iterate from 0 to 2⁴⁸, then good luck because that'll take some time
;compile
print(f"{2**48 = :,}")
Program Output
2**48 = 281,474,976,710,656
elmonkeking | 92ms | python | Python 3.12 | godbolt.org
If I don’t care what order the results are in, does that speed stuff up?
Yeah, good luck counting to 281 trillion
yes
Because then you don't need to worry about ordering at all, which means you can truly parallelize this (apart from the bit where you write to the file).
The bright side is it will only take 51 hours if I multithread (extrapolating from smaller, single-threaded tests)
How many cores are there on your CPU?
how many cores do you have available on the machine it is running on?
Anywhere up to 4000
Oh, yeah. That'll speed things up
How often do you need to write to the file?
Every time something passes all the tests, so roughly 1 in 1000 iterations
okay, so every thousandth iteration.
That'll still be quite a lot of data.
What are you writing to the file?
For each successful iteration?
Just the variable. It will take ~2TB of data
Ah okay, just wanted to make sure you know what you're up for
Thanks lol, yeah, I’ve done a handful of tests to make sure everything is within scope. Now I just have to actually write the code
If you don’t mind, can you explain parallelization in C?
Is it the same pthreads and everything, or something else?
Yeah okay, then my suggestion would be to just use all 4000 cores by creating 4000 threads, each handling 2⁴⁸ / 4000 iterations.
yeah
Thread 0 starts at 0, Thread 1 starts at 2⁴⁸ / 4000, Thread 2 starts at 2⁴⁸ / 4000 * 2, Thread 3 starts at 2⁴⁸ / 4000 * 3 and so on
Ah ok thx
Just make sure you don't miss any seeds due to rounding errors. Maybe choose 4096 threads so the division is exact, since 4096 = 2¹² divides 2⁴⁸ evenly.
I’m likely not going to use the full thing, to be courteous to other users, since it’s a shared computing cluster, but I’ll pick a power of 2.
how much can you reasonably use?
Like 256?
I’d say 256 or 512
kk, good luck
Thx
Just make sure you've covered all edge cases. You wouldn't want your program to get stuck in an endless loop somewhere,
or to produce faulty data.
@gusty girder Has your question been resolved? If so, type !solved :)
!solved
Thank you and let us know if you have any more questions!
This thread is now set to auto-hide after an hour of inactivity
Well, you can do pthreads, CUDA,
or OpenMP.