#CUDA programming summation

autumn jungle
#

I don't have code to speak of yet but the idea is that I will do a
for (y = 1; y <= n; y++)
    for (x = 1; x <= n; x++)
        some_variable += a / sqrt(b);

Where a and b are signed integers calculated within the function. The issue, of course, is that I am adding the results of all of these calculations together, and n ideally goes up to 100,000, so this is many, many calculations. I cannot just directly call += on some_variable because of the race condition. The obvious answer would be to save all the results in an array, but the results are doubles, so I would need an array of 100000^2 doubles, which is slightly too big.

I've read up on reduction, but I'd just like some help conceptualising it. Do I split it into 100000^2/256 blocks of 256 threads each and store the value of each of those blocks? But then I've got the same issue of needing so many places to store data so as not to cause a race condition. Is there a way to run 1000 blocks, halt, and sum those blocks, then run the next 1000 blocks, halt, and so on?

Thx in advance for help and sorry if it's a very newbie question
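
One way to picture the storage side of this question: nobody has to keep all n^2 results (for scale, 100000^2 doubles at 8 bytes each would be 80 GB). Each worker can keep a single running total for the terms it is responsible for, so only as many doubles are stored as there are workers. Below is a minimal CPU-side C++ sketch of that idea, not CUDA code. The names `term` and `partial_sums` and the worker count are hypothetical, and `term` is only a stand-in for the real a / sqrt(b) computation, which the thread does not show.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the per-(x, y) term; the real a and b are not given.
double term(std::int64_t x, std::int64_t y) {
    return static_cast<double>(x) / std::sqrt(static_cast<double>(y));
}

// Each of `workers` logical workers walks the n*n index space with a stride of
// `workers` and accumulates into its own slot. No two workers ever write to the
// same location, and only `workers` doubles are stored.
std::vector<double> partial_sums(std::int64_t n, std::int64_t workers) {
    std::vector<double> partial(static_cast<std::size_t>(workers), 0.0);
    const std::int64_t total = n * n;
    for (std::int64_t w = 0; w < workers; ++w) {        // on a GPU these would run in parallel
        for (std::int64_t i = w; i < total; i += workers) {
            const std::int64_t x = i % n + 1;           // 1-based, as in the loops above
            const std::int64_t y = i / n + 1;
            partial[static_cast<std::size_t>(w)] += term(x, y);
        }
    }
    return partial;
}
```

Calling `partial_sums(n, 256)` would leave 256 values to add up at the end, regardless of how large n is. On a GPU the outer loop would presumably map to threads, with the small final sum handled by a reduction.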

gloomy plume
#

yes, the basic idea with a reduction is that the order in which you compute the sum doesn't matter. so you can compute multiple subsums in parallel and then sum those up. and you can do this recursively, i.e., take the subsums, and compute multiple subsums of those in parallel, etc. until you're left with a single sum.
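
The recursive subsum idea described above can be sketched on the CPU in plain C++. This is only an illustration of the pattern, not CUDA code: `tree_reduce` is a hypothetical name, and the inner loop that runs sequentially here is the part that the threads of a block would carry out in parallel, with a synchronization between passes.

```cpp
#include <cstddef>
#include <vector>

// Pairwise (tree) reduction: in every pass, element i absorbs element
// i + half, roughly halving the number of live values until one remains.
double tree_reduce(std::vector<double> values) {    // taken by value: it is overwritten
    if (values.empty()) return 0.0;
    std::size_t active = values.size();
    while (active > 1) {
        const std::size_t half = (active + 1) / 2;   // round up so odd counts work
        // Every iteration writes a different slot in [0, active - half) and
        // reads a different slot in [half, active), so the iterations are
        // independent of each other and could run in parallel.
        for (std::size_t i = 0; i + half < active; ++i) {
            values[i] += values[i + half];
        }
        active = half;
    }
    return values[0];
}
```

For 8 inputs this takes 3 passes instead of 7 sequential additions. Note that floating-point addition is not exactly associative, so the result of a tree-shaped sum can differ in the last few bits from a left-to-right sum.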

autumn jungle
#

And in what format would that look like?

gloomy plume
#

not sure what you mean

autumn jungle
#

I guess my question boils down to what blocks are

gloomy plume
#

you mean thread blocks?

autumn jungle
#

Like, are blocks calculated one after another with the threads being the parallel part, or can you have multiple blocks in parallel, with the threads inside each block computed one by one?

gloomy plume
#

ok so you're completely new to this

autumn jungle
#

Yes

gloomy plume
#
1. start with something simpler than reduction
#

there is much to learn ahead of you

autumn jungle
#

So there is no easy way of having, say, a billion separate calculations and then adding them all together?

#

Reduction would be the way to go?

gloomy plume
#

yes, reduction is the way to go

autumn jungle
#

Ok

#

Knowing I was on the right path is good, I didn't want to accidentally reinvent the wheel

#

And then later find there was a much simpler method

gloomy plume
#

i think the main question here is whether you want to learn CUDA or just sum up a ton of numbers

#

there are libraries like thrust you could use that take care of all those things for you

autumn jungle
#

Ohh I do want to learn CUDA

gloomy plume
#

ok then start with the programming guide

autumn jungle
#

I'm just using my method of having a project that requires it as it gives motivation

gloomy plume
#

massively parallel programming is a vast topic

#

you need to learn it from the ground up with some proper guidance

autumn jungle
#

I always seem to stumble onto vast topics

#

That's cs for u

#

Do you know where I might find info about the CUDA cores of the RTX 3060 laptop variant? I'm assuming it isn't the same as the desktop variant, but I can't find info for it anywhere

gloomy plume
#

wikipedia has it all

#

sec

#

there you go

#

3840 cuda cores

autumn jungle
#

Thank you

#

So hardware-wise, does an SM execute one thread block in its entirety and then go on to the next one? Or does it execute the blocks in parallel as well?

gloomy plume
#

check out the cuda programming guide or a book, it's all explained there

#

like, i'm happy to help, but i can't give you a whole multi-hour introductory course to cuda here i'm afraid

#

an SM generally holds multiple blocks at once and executes them concurrently

autumn jungle
gloomy plume
#

sure, feel free to ask questions. but probably better just do so in #concurrency-and-parallelism or #graphics-gamedev as they come up

#

in here, people are unlikely to find them

autumn jungle
#

Alright will do thx

idle dragonBOT
#

@autumn jungle Has your question been resolved? If so, run !solved :)

autumn jungle
#

!solved