#How to reduce temporary registers in HLSL?


rotund knoll

My compiled shader reports that it uses 20+ temporary registers, and I'm getting warnings that I should use at most 16 for optimal performance. For example:

cs_5_0
dcl_globalFlags refactoringAllowed
dcl_constantbuffer CB0[3], immediateIndexed
dcl_uav_structured u0, 4
dcl_uav_structured u1, 4
dcl_uav_structured u2, 4
dcl_uav_structured u3, 4
dcl_input vThreadID.xyz
dcl_temps 29  // <--------- This is too high
dcl_thread_group 32, 32, 1

How can I reduce it? I tried rearranging my code in various ways to cut down on temporary variables, but the compiler comes up with nearly identical results regardless.

spiral quiver

In my experience, there is no easy way to reduce register usage unless you change the algorithm or where your data comes from. You can use an approximating algorithm, or save your calculation results into a buffer first. That being said, high register usage doesn't always mean bad performance. I recommend reading this article: https://gpuopen.com/learn/occupancy-explained/

AMD GPUOpen

In this blog post we will try to demystify what exactly occupancy is, which factors limit occupancy, and how to use tools to identify occupancy-limited workloads.
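
To make the "save your calculation results into a buffer first" idea concrete, here is a rough sketch of splitting one heavy compute shader into two lighter passes. Everything in it is an assumption for illustration: the original shader source was not posted, so the buffer names (gInput, gIntermediate, gOutput), the constants (gWidth, gHeight), the entry points (PassA, PassB) and the math are all hypothetical stand-ins. The point is only that each pass keeps fewer values live at once, so the compiler may be able to allocate fewer temporaries per pass.

```hlsl
// Hypothetical two-pass split. None of these names come from the original shader.
cbuffer Params : register(b0)
{
    uint gWidth;   // assumed: number of elements per row
    uint gHeight;  // assumed: number of rows
};

StructuredBuffer<float>   gInput        : register(t0);
RWStructuredBuffer<float> gIntermediate : register(u0); // scratch buffer written by PassA
RWStructuredBuffer<float> gOutput       : register(u1);

// Pass A: do the first half of the work and store the result,
// instead of keeping it live in registers for the rest of the shader.
[numthreads(32, 32, 1)]
void PassA(uint3 id : SV_DispatchThreadID)
{
    if (id.x >= gWidth || id.y >= gHeight)
        return;

    uint i = id.y * gWidth + id.x;
    float x = gInput[i];

    // Stand-in for the register-hungry part of the real algorithm.
    float partial = sin(x) * cos(x) + sqrt(abs(x));

    gIntermediate[i] = partial;
}

// Pass B: reload the stored result and finish the computation.
[numthreads(32, 32, 1)]
void PassB(uint3 id : SV_DispatchThreadID)
{
    if (id.x >= gWidth || id.y >= gHeight)
        return;

    uint i = id.y * gWidth + id.x;
    float partial = gIntermediate[i];

    // Stand-in for the second half of the real algorithm.
    gOutput[i] = partial * partial + 1.0f;
}
```

Each entry point would be compiled separately (for example with fxc /T cs_5_0 /E PassA and again with /E PassB) and dispatched one after the other. On D3D12 a UAV barrier on the scratch buffer is needed between the two dispatches. The trade-off is an extra dispatch plus the memory traffic for gIntermediate, so this is only worth it if profiling shows the register count is actually limiting occupancy.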