My compute shader compiles to 20+ temporary registers, and I get warnings that I should use no more than 16 for optimal performance. For example:
cs_5_0
dcl_globalFlags refactoringAllowed
dcl_constantbuffer CB0[3], immediateIndexed
dcl_uav_structured u0, 4
dcl_uav_structured u1, 4
dcl_uav_structured u2, 4
dcl_uav_structured u3, 4
dcl_input vThreadID.xyz
dcl_temps 29 // <--------- This is too high
dcl_thread_group 32, 32, 1
How can I reduce it? I have tried rearranging my code in various ways to cut down on temporary variables, but the compiler produces nearly identical results regardless.