I don't have much code to speak of yet, but the idea is that I will do something like:
    for (int y = 1; y <= n; y++) {
        for (int x = 1; x <= n; x++) {
            // a and b are signed integers computed from x and y
            some_variable += a / start(b);
        }
    }
where a and b are signed integers calculated within the function. The issue, of course, is that I am adding the results of all of these calculations together, and n should ideally go up to 100,000, which means 100,000^2 = 10^10 calculations. I can't just call += on some_variable directly from every thread because of the race condition. The obvious answer would be to save all the results in an array and sum them afterwards, but the results are doubles, so I would need an array of 100,000^2 doubles (about 80 GB), which is slightly too big.
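One way to avoid both the race and the n*n array is to launch a fixed number of threads and have each one loop over many (x, y) pairs (a "grid-stride loop"), adding into its own private variable and writing a single result to its own slot. The sketch below assumes a hypothetical `term(x, y)` standing in for `a/start(b)` (here a = x and b = x + y, chosen only because the total is known to be n*n/2, since x/(x+y) + y/(x+y) = 1). The names `per_thread_sums` and `gpu_sum_per_thread`, and the 1024 blocks of 256 threads, are illustrative assumptions, not anything from the question.

```cuda
#include <cassert>
#include <cuda_runtime.h>
#include <vector>

// Hypothetical stand-in for the question's a/start(b): here a = x and b = x + y.
__host__ __device__ inline double term(long long x, long long y) {
    return static_cast<double>(x) / static_cast<double>(x + y);
}

// Grid-stride loop: a fixed number of threads covers all n*n (x, y) pairs.
// Each thread adds into its own register and writes only its own slot, so
// there is no race, and storage is blocks*threads doubles instead of n*n.
__global__ void per_thread_sums(long long n, double* thread_sums) {
    long long tid = (long long)blockIdx.x * blockDim.x + threadIdx.x;
    long long stride = (long long)blockDim.x * gridDim.x;
    double local = 0.0;
    for (long long i = tid; i < n * n; i += stride) {  // long long: n*n overflows int
        long long x = i % n + 1;  // 1..n
        long long y = i / n + 1;  // 1..n
        local += term(x, y);
    }
    thread_sums[tid] = local;
}

double gpu_sum_per_thread(long long n, int blocks = 1024, int threads = 256) {
    assert(n > 0 && blocks > 0 && threads > 0);
    size_t slots = (size_t)blocks * threads;  // 262,144 doubles (2 MB) with the defaults
    double* d_sums = nullptr;
    cudaMalloc(&d_sums, slots * sizeof(double));
    per_thread_sums<<<blocks, threads>>>(n, d_sums);
    std::vector<double> h_sums(slots);
    // cudaMemcpy waits for the kernel to finish before copying back.
    cudaMemcpy(h_sums.data(), d_sums, slots * sizeof(double), cudaMemcpyDeviceToHost);
    cudaFree(d_sums);
    double total = 0.0;
    for (double s : h_sums) total += s;  // final sum of the per-thread partials on the host
    return total;
}
```

With n = 100,000 each of the 262,144 threads handles roughly 38,000 terms, so the number of threads, and therefore the storage, no longer depends on n. Error checking on the CUDA calls is left out to keep the sketch short.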
I've read about reduction, but I'd like some help conceptualising it. For example, do I split the work into 100,000^2/256 blocks of 256 threads each and store the value of each of those blocks? But then I have the same issue of needing a huge number of storage locations to avoid a race condition. Is there a way to run 1000 blocks, halt, and sum those blocks, then run the next 1000 blocks, halt, and so on?
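A common shape for this kind of reduction is: a fixed grid (so no 100,000^2/256 blocks and no halting between batches), a grid-stride loop per thread, then a tree reduction of the block's 256 values in shared memory so that only one double per block is stored. The sketch below is a minimal version under the same assumptions as before: `term(x, y)` is a hypothetical stand-in for `a/start(b)` (a = x, b = x + y, whose total is n*n/2), and `block_sums_kernel`, `gpu_sum_blocks`, `kThreads = 256` and 1024 blocks are illustrative choices.

```cuda
#include <cassert>
#include <cuda_runtime.h>
#include <vector>

constexpr int kThreads = 256;  // threads per block; must be a power of two for the halving loop

// Hypothetical stand-in for the question's a/start(b): here a = x and b = x + y.
__host__ __device__ inline double term(long long x, long long y) {
    return static_cast<double>(x) / static_cast<double>(x + y);
}

__global__ void block_sums_kernel(long long n, double* block_sums) {
    __shared__ double cache[kThreads];

    // Step 1: grid-stride loop; each thread accumulates privately (no race).
    long long stride = (long long)kThreads * gridDim.x;
    double local = 0.0;
    for (long long i = (long long)blockIdx.x * kThreads + threadIdx.x; i < n * n; i += stride) {
        local += term(i % n + 1, i / n + 1);  // x = i % n + 1, y = i / n + 1
    }
    cache[threadIdx.x] = local;
    __syncthreads();

    // Step 2: tree reduction in shared memory, halving the active threads each pass.
    for (int s = kThreads / 2; s > 0; s >>= 1) {
        if (threadIdx.x < s) cache[threadIdx.x] += cache[threadIdx.x + s];
        __syncthreads();  // outside the if, so every thread in the block reaches it
    }

    // Step 3: only one double per block leaves the block.
    if (threadIdx.x == 0) block_sums[blockIdx.x] = cache[0];
}

double gpu_sum_blocks(long long n, int blocks = 1024) {
    assert(n > 0 && blocks > 0);
    double* d_sums = nullptr;
    cudaMalloc(&d_sums, blocks * sizeof(double));
    block_sums_kernel<<<blocks, kThreads>>>(n, d_sums);  // must launch with kThreads threads
    std::vector<double> h_sums(blocks);
    cudaMemcpy(h_sums.data(), d_sums, blocks * sizeof(double), cudaMemcpyDeviceToHost);
    cudaFree(d_sums);
    double total = 0.0;
    for (double s : h_sums) total += s;  // only `blocks` values left to add on the host
    return total;
}
```

The last step adds the 1024 block results on the CPU, which is cheap. Alternatives would be a second small kernel, or `atomicAdd` on a single double, which requires a GPU of compute capability 6.0 or newer.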
Thx in advance for the help, and sorry if it's a very newbie question.