CUDA programming summation | Together C & C++

autumn jungle Oct 5, 2023, 10:35 PM

#

I don't have code to speak of yet, but the idea is that I will do something like
for (y = 1; y <= n; y++)
    for (x = 1; x <= n; x++)
        some_variable += a / start(b);

Where a and b are signed integers calculated within the function. The issue, of course, is that I am adding the results of all of these calculations together, and n ideally goes up to 100,000, so that is many, many calculations. But I cannot just directly call += on some_variable because of the race condition. The obvious answer would be to save all the results in an array, but the results are doubles, so I would need an array of 100000^2 doubles (about 80 GB), which is slightly too big.

I've read up on reduction but I'd just like some help conceptualising it. Like, do I split it into 100000^2/256 blocks of 256 threads each and store the value of each of those blocks? But then I've got the same issue of needing so many places to store data so as not to cause a race condition. Is there a way to run 1000 blocks, halt, sum those blocks, then run the next 1000 blocks, halt, etc.?

Thx in advance for help and sorry if it's a very newbie question

gloomy plume Oct 5, 2023, 10:38 PM

#

yes, the basic idea with a reduction is that the order in which you compute the sum doesn't matter. so you can compute multiple subsums in parallel and then sum those up. and you can do this recursively, i.e., take the subsums, and compute multiple subsums of those in parallel, etc. until you're left with a single sum.
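
To make the idea concrete for the double loop in the question, here is a minimal CUDA sketch (not code from the thread). It assumes a placeholder `term(x, y)` in place of the real a/start(b) calculation, and the names `sum_terms`, `sum_on_gpu`, `kThreadsPerBlock`, and the default of 1024 blocks are illustrative choices. Each thread walks the n*n index space with a grid-stride loop and keeps a private running sum, so the full table of results is never stored. Each block then combines its threads' sums in shared memory and writes a single partial, and the host adds up the partials. Note that floating-point addition is not exactly associative, so the result can differ in the last bits from a sequential loop.

```cuda
#include <numeric>
#include <vector>
#include <cuda_runtime.h>

constexpr int kThreadsPerBlock = 256;  // must be a power of two for the tree step below

// Hypothetical stand-in for the real per-(x, y) calculation from the question.
__host__ __device__ inline double term(long long x, long long y) {
    return 1.0 / static_cast<double>(x * y);
}

// Each block writes exactly one partial sum to block_sums[blockIdx.x].
__global__ void sum_terms(long long n, double* block_sums) {
    __shared__ double partial[kThreadsPerBlock];

    const long long total = n * n;
    const long long stride = static_cast<long long>(gridDim.x) * blockDim.x;
    const long long start = static_cast<long long>(blockIdx.x) * blockDim.x + threadIdx.x;

    // Grid-stride loop: a fixed number of threads covers all n*n terms,
    // each thread accumulating into a private variable (no race).
    double local = 0.0;
    for (long long i = start; i < total; i += stride) {
        const long long y = i / n + 1;  // 1..n
        const long long x = i % n + 1;  // 1..n
        local += term(x, y);
    }
    partial[threadIdx.x] = local;
    __syncthreads();

    // Tree reduction in shared memory: halve the number of active threads each step.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (threadIdx.x < s) {
            partial[threadIdx.x] += partial[threadIdx.x + s];
        }
        __syncthreads();
    }

    if (threadIdx.x == 0) {
        block_sums[blockIdx.x] = partial[0];
    }
}

// Host wrapper; CUDA error checking is omitted to keep the sketch short.
double sum_on_gpu(long long n, int blocks = 1024) {
    double* d_block_sums = nullptr;
    cudaMalloc(&d_block_sums, blocks * sizeof(double));

    sum_terms<<<blocks, kThreadsPerBlock>>>(n, d_block_sums);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    std::vector<double> h_block_sums(blocks);
    cudaMemcpy(h_block_sums.data(), d_block_sums, blocks * sizeof(double),
               cudaMemcpyDeviceToHost);
    cudaFree(d_block_sums);

    // Final step of the reduction: add the per-block partials on the host.
    return std::accumulate(h_block_sums.begin(), h_block_sums.end(), 0.0);
}
```

With the placeholder term, `sum_on_gpu(2)` adds 1 + 1/2 + 1/2 + 1/4 and should return 2.25. The storage needed is one double per block (1024 here) no matter how large n is, which sidesteps the 100000^2-element array.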

autumn jungle Oct 5, 2023, 10:39 PM

#

And in what format would that look like?

gloomy plume Oct 5, 2023, 10:39 PM

#

not sure what you mean

autumn jungle Oct 5, 2023, 10:39 PM

#

Ig my question boils down to what blocks are

gloomy plume Oct 5, 2023, 10:39 PM

#

you mean thread blocks?

autumn jungle Oct 5, 2023, 10:40 PM

#

Like, are blocks calculated one after another with the threads being the parallel part, or can you have multiple blocks in parallel, with each block computing its threads one by one?

gloomy plume Oct 5, 2023, 10:40 PM

#

ok so you're completely new to this

autumn jungle Oct 5, 2023, 10:40 PM

#

Yes

gloomy plume Oct 5, 2023, 10:40 PM

#

get a good book or read the cuda programming guide https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html

#

start with something simpler than reduction

#

there is much to learn ahead of you

autumn jungle Oct 5, 2023, 10:42 PM

#

So there is no easy way of having, say, a billion separate calculations and then adding them all together

#

Reduction would be the way to go?

gloomy plume Oct 5, 2023, 10:42 PM

#

yes, reduction is the way to go

autumn jungle Oct 5, 2023, 10:42 PM

#

Ok

#

Knowing I was on the right path is good, I didn't want to accidentally reinvent the wheel

#

And then later find there was a much simpler method

gloomy plume Oct 5, 2023, 10:44 PM

#

i think the main question here is whether you want to learn CUDA or just sum up a ton of numbers

#

there are libraries like thrust you could use that take care of all those things for you
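
As a rough illustration of the library route, the sketch below uses Thrust's `transform_reduce` over a counting iterator, so no array of inputs is materialised. The functor `TermAt`, the wrapper `sum_with_thrust`, and the 1/(x*y) term are hypothetical placeholders for the real calculation. Whether a range as large as 100000^2 elements is handled well depends on the Thrust version, so treat this as a starting point rather than a tested solution for that size.

```cuda
#include <thrust/execution_policy.h>
#include <thrust/functional.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/transform_reduce.h>

// Maps a flat index i in [0, n*n) to a placeholder term for the pair (x, y).
struct TermAt {
    long long n;

    __host__ __device__ double operator()(long long i) const {
        const long long y = i / n + 1;  // 1..n
        const long long x = i % n + 1;  // 1..n
        return 1.0 / static_cast<double>(x * y);
    }
};

double sum_with_thrust(long long n) {
    thrust::counting_iterator<long long> first(0);
    thrust::counting_iterator<long long> last(n * n);

    // Transform each index into a term and sum the terms on the device.
    return thrust::transform_reduce(thrust::device, first, last,
                                    TermAt{n}, 0.0, thrust::plus<double>());
}
```

Here Thrust chooses the block and thread layout itself, which is the "takes care of all those things for you" part. `sum_with_thrust(2)` should give 2.25 with this placeholder term.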

autumn jungle Oct 5, 2023, 10:45 PM

#

Ohh I do want to learn CUDA

gloomy plume Oct 5, 2023, 10:45 PM

#

ok then start with the programming guide

autumn jungle Oct 5, 2023, 10:45 PM

#

I'm just using my method of having a project that requires it as it gives motivation

gloomy plume Oct 5, 2023, 10:45 PM

#

or with a book, e.g.: https://shop.elsevier.com/books/programming-massively-parallel-processors/hwu/978-0-323-91231-0

#

massively parallel programming is a vast topic

#

you need to learn it from the ground up with some proper guidance

autumn jungle Oct 5, 2023, 10:46 PM

#

I always seem to stumble onto vast topics

#

That's cs for u

#

Do you know where I might find info about the CUDA cores of an RTX 3060 laptop variant? I'm assuming it isn't the same as the desktop variant, but I can't find info for it anywhere

gloomy plume Oct 5, 2023, 10:51 PM

#

wikipedia has it all

#

sec

#

https://en.wikipedia.org/wiki/GeForce_30_series#Laptop

#

there you go

#

3840 cuda cores
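
The same kind of information can also be read from the card itself. The sketch below queries the CUDA runtime with `cudaGetDeviceProperties`; the helper names `sm_count` and `print_device_summary` are made up for this example. The runtime reports the number of SMs rather than a CUDA core count, so the core count has to be derived from the architecture. Assuming 128 FP32 cores per SM for compute capability 8.6, 3840 cores would correspond to 30 SMs.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Returns the number of SMs on the given device, or -1 if the query fails.
int sm_count(int device = 0) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) {
        return -1;
    }
    return prop.multiProcessorCount;
}

// Prints a one-line summary of the device's name, compute capability and limits.
void print_device_summary(int device = 0) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) {
        std::printf("could not query CUDA device %d\n", device);
        return;
    }
    std::printf("%s: compute capability %d.%d, %d SMs, "
                "max %d threads per block, max %d threads per SM\n",
                prop.name, prop.major, prop.minor, prop.multiProcessorCount,
                prop.maxThreadsPerBlock, prop.maxThreadsPerMultiProcessor);
}
```

Calling `print_device_summary()` from `main` prints the values for device 0. The exact output depends on the installed GPU.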

autumn jungle Oct 5, 2023, 10:53 PM

#

Thank you

#

So hardware-wise, does an SM execute one thread block in its entirety and then go on to the next one? Or does it execute the blocks in parallel as well?

gloomy plume Oct 5, 2023, 10:57 PM

#

check out the cuda programming guide or a book, it's all explained there

#

like, i'm happy to help, but i can't give you a whole multi-hour introductory course to cuda here i'm afraid

#

an sm generally holds multiple blocks at the same time and executes them concurrently; it doesn't have to finish one block before starting the next
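
How many blocks an SM can hold at once depends on the kernel's resource use and the block size, and the runtime can report it. The sketch below asks `cudaOccupancyMaxActiveBlocksPerMultiprocessor` for a trivial kernel; `dummy_kernel` and `resident_blocks_per_sm` are hypothetical names used only for this illustration, and a real kernel with more registers or shared memory may get a lower number.

```cuda
#include <cuda_runtime.h>

// Trivial kernel used only so the occupancy query has something to inspect.
__global__ void dummy_kernel(double* out) {
    out[blockIdx.x * blockDim.x + threadIdx.x] = 0.0;
}

// Returns how many blocks of dummy_kernel can be resident on one SM at the
// given block size (no dynamic shared memory), or -1 if the query fails.
int resident_blocks_per_sm(int threads_per_block) {
    int blocks = 0;
    const cudaError_t err = cudaOccupancyMaxActiveBlocksPerMultiprocessor(
        &blocks, dummy_kernel, threads_per_block, 0);
    return err == cudaSuccess ? blocks : -1;
}
```

For example, `resident_blocks_per_sm(256)` returns the number of 256-thread blocks that fit on one SM of the current device. A value above 1 means several blocks share the SM at the same time.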

autumn jungle Oct 5, 2023, 11:02 PM

#

gloomy plume like, i'm happy to help, but i can't give you a whole multi-hour introductory co...

Yep I appreciate that. Trying to wrap my head around the programming guide

gloomy plume Oct 5, 2023, 11:03 PM

#

sure, feel free to ask questions. but probably better just do so in #concurrency-and-parallelism or #graphics-gamedev as they come up

#

in here, people are unlikely to find them

autumn jungle Oct 5, 2023, 11:04 PM

#

Alright will do thx

idle dragonBOT Oct 5, 2023, 11:09 PM

#

@autumn jungle Has your question been resolved? If so, run !solved :)

autumn jungle Oct 6, 2023, 12:06 AM

#

!solved
