# Those are both totally valid approaches
Does it change per target?
When I've looked that's been the case.
Don't wanna spread misinfo otherwise.
Spent some time generating code in my testbed to make sure I wasn't making things up. What I noticed was that it doesn't branch early enough to save any performance. It still does the math for both sides of the branch. Screenshot from Unity 2022.3.15f1 - URP 14.0.9
It's definitely not the same as a lerp, but in terms of performance I still don't think it's saving anything to approach it this way.
here's the code if you're curious
Same deal w/ 2022.3.15f1 HDRP
Same deal w/ 2023.2.3f1 HDRP too
It's not handling the branch node in a way that saves performance via branching. It might as well be a lerp node (just with a few fewer instructions).
& branching on the GPU can save a TON of performance on the right card
this approach just assumes that most GPUs can't branch and doesn't use it effectively based on what I can see in the compiled code
Keywords work fine and I mention those below, but it's a different technique.
I hope that covers everything! Lemme know if anything else needs elaborating or if I missed something. I'll try to use more specific language in the future re: functions.
Lerp was something I learned very early as a tool for branching without conditionals. The pattern would look like branchResult = lerp(false, true, boolAsFloat) vs. in this case Unity's branchResult = boolAsFloat ? true : false. I don't know if there are still people online learning to use lerp as an alternative to branching, but I probably shouldn't have assumed that was universally understood as being used that way in addition to blending.
To give an idea of how easily swapped those two are: Unity actually used lerp in early versions of Shader Graph.
https://docs.unity3d.com/Packages/com.unity.shadergraph@6.9/manual/Branch-Node.html
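To make the lerp vs. ternary comparison concrete, here's a minimal CPU-side sketch in C. It's not shader code, and the names lerpf, select_lerp and select_ternary are made up for illustration. It assumes lerp is the usual a + t * (b - a). The two selects agree when the predicate is exactly 0 or 1 and differ for anything in between, which is one way to see why the ternary isn't quite the same thing as a lerp.

```c
/* Linear interpolation: a + t * (b - a), the usual lerp arithmetic. */
static float lerpf(float a, float b, float t)
{
    return a + t * (b - a);
}

/* Older pattern: the predicate (expected to be 0.0 or 1.0) is the blend weight. */
static float select_lerp(float predicate, float if_true, float if_false)
{
    return lerpf(if_false, if_true, predicate);
}

/* Ternary pattern: any nonzero predicate picks if_true outright, no blending. */
static float select_ternary(float predicate, float if_true, float if_false)
{
    return (predicate != 0.0f) ? if_true : if_false;
}
```

With a predicate of 0.5, select_lerp(0.5f, 10.0f, 20.0f) blends to 15, while select_ternary(0.5f, 10.0f, 20.0f) returns 10.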
I'd love to talk to other people more about branching on the gpu in general. It seems like a bit of a dark art. Something that could use a good spreadsheet to describe support for it per gpu.
(in bed atm phone posting)
Not trying to be rude but you're not going deep enough. You're looking at a ShaderLab output, not generated assembly. When this gets compiled by the HLSL compiler you'll find that a lot of this code gets stripped, or restructured, in a way that does indeed branch, especially if [unity_branch] gets translated into an HLSL [branch] attribute
No worries, I appreciate you catching me & following up! I was a little bothered that you didn't give more context at first but it's good to be working thru stuff like this with other people. I'm definitely susceptible to making mistakes on my own.
I don't know how to read a branch command in assembly tbh. I'll look around for that next.
using ternary:
using if else
interestingly without the texture sample my if/else code looks like this:
but so far ternary hasn't tried to save any performance even when I've made it easy (from what I can read it's just picking one of two memory addresses and returning it)
this is compiling for OpenGLCore
the shadergraph code is really hard to read when compiled lol
does the same MOVC thing as the ternary in vertex fragment
in OpenGLCore at least
I also just don't see where UNITY_BRANCH would go here. That usually comes before an if / else code block.
which the ternary isn't
According to the source code I could find online, it's always the ternary too:
https://github.com/needle-mirror/com.unity.shadergraph/blob/a6168ec50e623893fed4ff1dfa65ad83dcddc4e3/Editor/Data/Nodes/Utility/Logic/BranchNode.cs#L20
without a distinct UNITY_BRANCH keyword
I think it's actually kind of confusing the way they have it written here since UNITY_BRANCH and Unity_Branch_float look very similar.
but Unity_Branch_float isn't anywhere else in the Unity codebase I can find. It's unique to Shadergraph.
^ I think this is the one you're talking about.
Ok I tried the same thing in other APIs too and couldn't notice much of a difference between them. When they decide to branch is pretty consistent.
My takeaway from this is still that there's no performance savings from using the branch operator. If there is, it's definitely not on the scale you would assume you're getting with an if / else statement, since all of the commands on both sides of the branch still run. Sometimes the results are comparable to an if / else statement, but only when the if / else statement doesn't have an explicit HLSL [branch] before it (which is pretty much the worst case).
It's possible that if you go one layer deeper and look at how the commands are processed on the device, certain manufacturers optimize things away, but I think it's highly unlikely.
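As a rough picture of what "both sides of the branch still run" means, here's a CPU-side C analogy. The names are hypothetical and this is not Unity's generated code. The first function mirrors a Branch-node-style select that receives inputs which were already computed; the second is a plain if / else. On a GPU the compiler is free to flatten or restructure either form, so treat this as an illustration of the source-level difference only, not of what any particular driver does.

```c
/* Call counters so the amount of work done is observable. */
static int true_side_calls = 0;
static int false_side_calls = 0;

/* Stand-ins for the (possibly expensive) math feeding each side. */
static float expensive_true_side(float x)  { true_side_calls++;  return x * 2.0f; }
static float expensive_false_side(float x) { false_side_calls++; return x + 1.0f; }

/* Branch-node style: both inputs are computed before the select happens. */
static float branch_node_style(float predicate, float x)
{
    float if_true  = expensive_true_side(x);
    float if_false = expensive_false_side(x);
    return (predicate != 0.0f) ? if_true : if_false;
}

/* if / else style: only the taken side is evaluated. */
static float if_else_style(float predicate, float x)
{
    if (predicate != 0.0f) {
        return expensive_true_side(x);
    }
    return expensive_false_side(x);
}
```

One call to branch_node_style bumps both counters, while one call to if_else_style bumps only the counter for the side that was taken.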