#Looping over pixels in a compute shader


hasty kettle
#

I'm looking to replace my colour replacement script workflow with a compute shader to try and increase realtime performance, and I've watched a small video to get the gist of how they work.

My question is, does the compute shader system have a similar API for getting each pixel of a texture and modifying it, or is the workflow completely different?


sleek storm

The workflow is quite different. You can write for and while loops in compute shaders, but they are not needed here (and looping over every pixel in a single thread would perform much worse than the C# script). What you do instead is dispatch enough groups/threads to process each pixel exactly once. The problem then becomes finding the texture pixel index that corresponds to the thread index, and reading and modifying that pixel. There are countless ways to do the indexing and the choice probably doesn't matter too much; you just need a one-to-one match between threads and pixels. The easiest is something like 8x8x1 threads per group and (width/8)x(height/8)x1 groups, which makes the indexing trivial: each SV_DispatchThreadID directly matches a pixel index.
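To make the thread-to-pixel match concrete, here is a small CPU-side Python sketch (not shader code) of how a dispatch thread ID is built from a group ID and a thread ID within the group. The function names and the 16x8 texture are made up for illustration, and it assumes the texture size is an exact multiple of 8. On the GPU these threads run in parallel; the loops here only enumerate them.

```python
GROUP_SIZE = (8, 8)  # threads per group in x and y, as in the 8x8x1 example


def dispatch_thread_id(group_id, group_thread_id, group_size=GROUP_SIZE):
    """Mimic SV_DispatchThreadID: group ID * group size + thread ID within the group."""
    return (
        group_id[0] * group_size[0] + group_thread_id[0],
        group_id[1] * group_size[1] + group_thread_id[1],
    )


def simulate_dispatch(width, height, group_size=GROUP_SIZE):
    """List every dispatch thread ID produced by (width/8) x (height/8) groups."""
    groups_x = width // group_size[0]   # assumes width is divisible by the group width
    groups_y = height // group_size[1]  # assumes height is divisible by the group height
    ids = []
    for gy in range(groups_y):
        for gx in range(groups_x):
            for ty in range(group_size[1]):
                for tx in range(group_size[0]):
                    ids.append(dispatch_thread_id((gx, gy), (tx, ty), group_size))
    return ids


if __name__ == "__main__":
    ids = simulate_dispatch(16, 8)
    # Number of threads launched, and number of distinct pixel indices they map to.
    print(len(ids), len(set(ids)))
```

If the two printed numbers are equal and match width*height, every pixel gets exactly one thread.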

In the shader code, you can access the pixel directly with something like texture[id] = whatever, where id is a uint2 (which can be taken directly from SV_DispatchThreadID) and texture is an RWTexture2D. With compute shaders you access the pixels with a 2D integer index rather than the UV coordinates you would use in regular shaders.
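If you are used to thinking in UVs, the relation between the two is simple arithmetic. A minimal Python sketch, assuming the common convention that UVs run from 0 to 1 and that pixel centres sit at half-integer positions; the helper names are hypothetical, not part of any API:

```python
def pixel_to_uv(x, y, width, height):
    """UV of the centre of pixel (x, y); the +0.5 moves from the pixel corner to its centre."""
    return ((x + 0.5) / width, (y + 0.5) / height)


def uv_to_pixel(u, v, width, height):
    """Integer pixel index that contains the given UV (u, v in [0, 1]), clamped to the texture."""
    x = min(int(u * width), width - 1)
    y = min(int(v * height), height - 1)
    return (x, y)


if __name__ == "__main__":
    print(pixel_to_uv(0, 0, 4, 4))
    print(uv_to_pixel(1.0, 1.0, 4, 4))
```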

Note that if the texture size is not divisible by the number of threads in a group (8 in both directions in the 8x8x1 example), you need to round the division result up (ceil), which means some extra threads will execute beyond the edge of the texture. I don't remember for sure whether you strictly need to account for that, but the safer option is a bounds check in the shader code so those threads don't try to access pixels outside the texture.
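The rounding and the bounds check can be sketched on the CPU like this. It is a Python simulation under the 8x8 group assumption, with made-up function names; in the real shader the check would be an early return at the top of the kernel.

```python
def group_count(size, threads_per_group=8):
    """Number of groups needed to cover `size` pixels, rounded up (integer ceil)."""
    return (size + threads_per_group - 1) // threads_per_group


def run_kernel(width, height, threads_per_group=8):
    """Simulate the dispatch; return (pixels written, threads skipped by the bounds check)."""
    written = set()
    skipped = 0
    threads_x = group_count(width, threads_per_group) * threads_per_group
    threads_y = group_count(height, threads_per_group) * threads_per_group
    for y in range(threads_y):
        for x in range(threads_x):
            # Bounds check: threads past the texture edge do nothing.
            if x >= width or y >= height:
                skipped += 1
                continue
            written.add((x, y))
    return written, skipped


if __name__ == "__main__":
    written, skipped = run_kernel(20, 10)
    print(group_count(20), group_count(10), len(written), skipped)
```

For a 20x10 texture this launches 3x2 groups, so 24x16 threads, and only the 200 threads inside the texture write anything.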

In case the threads and groups are confusing, this visualization may help: https://www.reddit.com/r/Unity3D/comments/1eywb95/a_visual_guide_to_the_structure_of_compute_shaders/. I think it shows very clearly what each ID means, at least. Feel free to ask for more info if anything is unclear; my rambling is probably all over the place at this time of day.

sleek storm
#

Maybe worth mentioning that even though the SV_DispatchThreadIDs correspond to the pixel indices in this example, the compute shader has no clue what the indices are for (the texture, in this case). When you dispatch a compute shader, it just executes the shader X times, and it is entirely up to you what you do in each execution and how you use the indices. You could achieve exactly the same thing with a group size of 64x1x1 and (width*height/64)x1x1 groups. You would have the same number of threads running your shader code, but the indices would be different, so you would need your own indexing logic in the shader to go from the 1D index to a unique 2D pixel index (easy with division and modulo).
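The division-and-modulo mapping for the 64x1x1 variant might look like this. Again a CPU-side Python sketch with hypothetical names, assuming row-major order (pixel 0 is the first pixel of the first row); the group count is rounded up and the leftover threads are skipped.

```python
def index_1d_to_2d(i, width):
    """Row-major mapping: modulo gives the column (x), integer division gives the row (y)."""
    return (i % width, i // width)


def simulate_1d_dispatch(width, height, threads_per_group=64):
    """Return the 2D pixel index handled by each in-bounds thread of a 1D dispatch."""
    total = width * height
    groups = (total + threads_per_group - 1) // threads_per_group  # rounded up
    pixels = []
    for i in range(groups * threads_per_group):
        if i >= total:
            continue  # extra threads from the rounding have no pixel to process
        pixels.append(index_1d_to_2d(i, width))
    return pixels


if __name__ == "__main__":
    print(index_1d_to_2d(13, 10))
    print(len(simulate_1d_dispatch(10, 7)))
```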

For the most part, the number of groups and the number of threads in each does not matter too much for performance. The one thing to be careful about is that the total number of threads in each group should be a multiple of 64 (8x8x4, for example, is 256 threads in total, which is a multiple of 64). 1x1x1 groups would probably be the easiest to work with, but you would most likely be throwing away about 97% or 98.4% of your GPU's potential because of how GPU hardware works.

GPUs basically can't run one thread at a time like CPUs can; they always execute threads in hardware groups of 32 or 64 (depending on the architecture) that run simultaneously. These hardware groups are not necessarily the same as the "groups" you define when dispatching a compute shader, but they are what executes the thread groups you define. With a thread group size of 1x1x1, 32 or 64 GPU threads would fire for every single pixel, leaving 31 or 63 of them doing nothing. There are probably other workable thread group sizes, and GPU drivers could potentially optimize for too-small groups automatically, idk. Keeping the number of threads in a group a multiple of 64 at least used to be a safe solution.
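The wasted share is easy to work out. A back-of-the-envelope Python sketch, assuming the simplified model above where each hardware group serves only one thread group (real hardware and drivers may behave differently); the function name is made up:

```python
def wasted_fraction(threads_per_group, hardware_width):
    """Fraction of hardware lanes left idle when one thread group is executed."""
    hardware_groups = -(-threads_per_group // hardware_width)  # ceil division
    lanes_fired = hardware_groups * hardware_width
    return 1 - threads_per_group / lanes_fired


if __name__ == "__main__":
    for width in (32, 64):
        print(width, wasted_fraction(1, width), wasted_fraction(64, width))
```

With one thread per group, 31 of 32 or 63 of 64 lanes are idle; with 64 threads per group nothing is wasted in either case.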

#

A group size larger than 64 is totally fine though, as long as it is a multiple of 64. If the group size was 8x8x4, for example, that would amount to 256 threads in the virtual group. The GPU would then fire either 4 or 8 actual GPU groups (again depending on the architecture, I think AMD and Nvidia had it different) to process the whole virtual group.
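The 4-or-8 figure is just the thread count divided by the hardware width. A tiny Python check, with a hypothetical helper name and the 32/64 widths taken from the explanation above:

```python
from math import prod  # available in Python 3.8+


def hardware_groups_needed(group_size, hardware_width):
    """How many hardware groups it takes to run one thread group of the given 3D size."""
    threads = prod(group_size)  # e.g. 8 * 8 * 4 = 256
    return -(-threads // hardware_width)  # ceil division


if __name__ == "__main__":
    print(hardware_groups_needed((8, 8, 4), 64))
    print(hardware_groups_needed((8, 8, 4), 32))
```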

#

Enough for now, sorry in advance if that is too much to swallow at once

hasty kettle
#

haha that's fine, it's late so I'm not really in a state to read it now, but this is definitely invaluable advice

#

compute shaders seem a bit daunting as someone who hasn't even touched fragment shaders lol

sleek storm
#

It's quite late here too, so I'm sure I messed something up in the explanation. Do ask about anything that was unclear, or anything else about compute shaders or HLSL, once you get to it. Compute shaders do look very cryptic with all the dispatches, threads and groups, especially when you are only just starting with shaders. In some ways, though, I think writing the shader code itself is closer to writing C# in a compute shader than in a fragment shader.

late thistle
#

If the modification does not need to persist, color LUTs in a fragment shader are a simple alternative