#Those are both totally valid approaches


bold nest
#

this is not true..

summer dragon
#

Does it change per target?

#

When I've looked that's been the case.

#

Don't wanna spread misinfo otherwise.

summer dragon
#

Spent some time generating code in my testbed to make sure I wasn't making things up. What I noticed was that it doesn't branch early enough to save any performance. It still does the math for both sides of the branch. Screenshot from Unity 2022.3.15f1 - URP 14.0.9

#

It's definitely not the same as a lerp, but in terms of performance I still don't think it's saving anything to approach it this way.

#

Same deal w/ 2022.3.15f1 HDRP

#

Same deal w/ 2023.2.3f1 HDRP too

#

It's not handling the branch node in a way that saves performance via branching. It might as well be a lerp node (just with a few fewer instructions).

#

& branching on the GPU can save a TON of performance on the right card

#

this approach just assumes that most GPUs can't branch and doesn't use it effectively based on what I can see in the compiled code

#

Keywords work fine and I mention those below, but it's a different technique.

summer dragon
# bold nest this is not true..

I hope that covers everything! Lemme know if anything else needs elaborating or if I missed something. I'll try to use more specific language in the future re: functions.

Lerp was something I learned very early as a tool for branching without conditionals. The pattern would look like branchResult = lerp(false, true, boolAsFloat), versus Unity's branchResult = boolAsFloat ? true : false in this case. I don't know if there are still people online learning to use lerp as an alternative to branching, but I probably shouldn't have assumed it was universally understood as being used that way in addition to blending.
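For anyone who hasn't seen the two patterns side by side, here is a minimal HLSL sketch. The helper names SelectWithLerp and SelectWithTernary are hypothetical and only there for illustration; lerp() and the ternary operator are standard HLSL. It assumes the predicate arrives as a float that is exactly 0 or 1.

```hlsl
// Hypothetical helpers for illustration; not part of Unity or Shader Graph.

// Select via lerp: both inputs are already computed, then blended.
// lerp(a, b, t) evaluates a + t * (b - a), so t == 0 gives ifFalse
// and t == 1 gives ifTrue. Values of t in between blend the two.
float4 SelectWithLerp(float4 ifFalse, float4 ifTrue, float t)
{
    return lerp(ifFalse, ifTrue, t);
}

// Select via ternary: both inputs are still computed by the caller,
// but the result is picked rather than blended. Any nonzero predicate
// counts as true.
float4 SelectWithTernary(float4 ifFalse, float4 ifTrue, float predicate)
{
    return predicate ? ifTrue : ifFalse;
}
```

In both cases the two inputs are evaluated before the choice is made, so neither form skips work on its own. One difference to keep in mind: because lerp does arithmetic on both inputs, an Inf or NaN in the unselected input can leak into the result, while a pure select just discards it.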

summer dragon
#

I'd love to talk to other people more about branching on the GPU in general. It seems like a bit of a dark art. Something that could use a good spreadsheet to describe support for it per GPU.

bold nest
summer dragon
#

No worries, I appreciate you catching me & following up! I was a little bothered that you didn't give more context at first, but it's good to be working thru stuff like this with other people. I'm definitely susceptible to making mistakes on my own.

I don't know how to read a branch command in assembly tbh. I'll look around for that next.

summer dragon
#

using ternary:

#

using if else

#

interestingly without the texture sample my if/else code looks like this:

#

but so far ternary hasn't tried to save any performance even when I've made it easy (from what I can read it's just picking one of two memory addresses and returning it)

#

this is compiling for OpenGLCore

#

the shadergraph code is really hard to read when compiled lol

#

does the same MOVC thing as the ternary in a vertex/fragment shader

#

in OpenGLCore at least

summer dragon
#

which the ternary isn't

#
GitHub

[Mirrored from UPM, not affiliated with Unity Technologies.] 📦 The Shader Graph package adds a visual Shader editing tool to Unity. You can use this tool to create Shaders in a visual way instead o...

#

without a distinct UNITY_BRANCH keyword

#

I think it's actually kind of confusing the way they have it written here, since UNITY_BRANCH and Unity_Branch_float look very similar.

#

but Unity_Branch_float isn't anywhere else in the Unity codebase I can find. It's unique to Shadergraph.
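To make the naming difference concrete, here is a rough sketch of the two side by side. Unity_Branch_float is written the way the Shader Graph Branch node documentation lists its generated code. The UNITY_BRANCH fallback define and the PickWithBranchHint function are assumptions added so the snippet stands alone; in a real Unity shader the macro comes from Unity's own includes and may expand to nothing on some targets.

```hlsl
// Assumption: fallback so this compiles outside Unity. Inside Unity the
// macro is provided by the engine's shader includes.
#ifndef UNITY_BRANCH
#define UNITY_BRANCH [branch]
#endif

// The function Shader Graph generates for the Branch node:
// a plain ternary select, with no branch attribute involved.
void Unity_Branch_float(float Predicate, float True, float False, out float Out)
{
    Out = Predicate ? True : False;
}

// Hypothetical function showing where UNITY_BRANCH is used:
// as a compiler hint placed directly in front of an if statement.
float PickWithBranchHint(float predicate, float a, float b)
{
    float result;
    UNITY_BRANCH
    if (predicate != 0.0)
    {
        result = a;
    }
    else
    {
        result = b;
    }
    return result;
}
```

So despite the similar names, one is a generated helper that selects between two finished values and the other is an attribute macro that asks the compiler for real flow control.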

#

^ I think this is the one you're talking about.

#

Ok I tried the same thing in other apis too and couldn't notice much of a difference between them. When they decide to branch is pretty consistent.

summer dragon
#

My takeaway from this is still that there's no performance savings from using the branch operator. If there is, it's definitely not on the scale you would assume you're getting with an if / else statement, since all of the commands on both sides of the branch still run. Sometimes the results are comparable to an if / else statement, but only when the if / else statement doesn't have an explicit HLSL [branch] before it (which is pretty much the worst case).
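A small sketch of the comparison described above, under stated assumptions: _ExpensiveTex, sampler_ExpensiveTex, ShadeWithTernary, and ShadeWithBranch are hypothetical names, and the texture fetch stands in for whatever costly work sits on one side of the condition. SampleLevel is used instead of Sample because gradient-based sampling inside divergent flow control tends to make the compiler refuse or flatten the branch.

```hlsl
// Hypothetical resources for illustration only.
Texture2D _ExpensiveTex;
SamplerState sampler_ExpensiveTex;

// Ternary version: the texture is fetched unconditionally and the
// result is selected afterwards, so the fetch cost is always paid.
float4 ShadeWithTernary(float2 uv, float useTexture)
{
    float4 sampled = _ExpensiveTex.SampleLevel(sampler_ExpensiveTex, uv, 0);
    float4 fallback = float4(1.0, 1.0, 1.0, 1.0);
    return (useTexture != 0.0) ? sampled : fallback;
}

// if with an explicit [branch]: asks the compiler to emit real flow
// control so the fetch can be skipped when the condition is false.
float4 ShadeWithBranch(float2 uv, float useTexture)
{
    float4 result = float4(1.0, 1.0, 1.0, 1.0);
    [branch]
    if (useTexture != 0.0)
    {
        result = _ExpensiveTex.SampleLevel(sampler_ExpensiveTex, uv, 0);
    }
    return result;
}
```

Whether the second form actually wins still depends on the hardware and on how coherent the condition is across neighboring pixels: if threads that execute together disagree on the condition, the GPU ends up running both paths anyway. Checking the compiled output per target, as done above, is the only way to know what you got.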

It's possible that if you go one layer deeper and look at how the commands are processed on the device, certain manufacturers might optimize things away, but I think it's highly unlikely.