#HLSL PBR Shader vs Shader Graph


lapis hatch
#

Hello Unity guys!
My team is planning some heavy optimizations and we're putting together plans on turning our Shader Graphs into HLSL.
URP.
PBR shader - we already have a custom one done in Graph (not using Unity's built in version for reasons decided before I joined) and we'd like to HLSL it.
I've tried making a very stripped-down glass-only shader in HLSL as a first step, but that shader ended up having a very long compile time (300 ms / 150 ms) compared to the Shader Graph version (2.0 ms / 1.8 ms).
We're not 100% sure why. Working theory is that the includes we put into the shader are bloating it, but that is the point I can't find a clear answer for online.

So does anybody know these answers?

  1. Do unnecessary includes cause bloat to compile time or runtime performance?
  2. Does Shader Graph have logic to strip out what's not needed that a hand-written HLSL shader does not have by default?
#

The shader should be really simple, but it doesn't seem to work without the InputData and SurfaceData structs.

We're wondering if
#include "Packages/com.unity.render-pipelines.universal/ShaderLibrary/Core.hlsl"
#include "Packages/com.unity.render-pipelines.universal/ShaderLibrary/Lighting.hlsl"
is bringing unnecessary or problematic bloat
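
For reference, here is a minimal sketch of what a stripped-down lit pass built on those two includes might look like. It assumes URP 10 or newer, where `UniversalFragmentPBR(InputData, SurfaceData)` is available from `Lighting.hlsl`. The shader name and the three material properties are hypothetical, and shadows, fog, lightmaps, and additional lights are deliberately left out, so this is not the team's actual shader.

```hlsl
Shader "Hypothetical/MinimalLit"
{
    Properties
    {
        // Hypothetical properties, chosen only for illustration.
        _BaseColor ("Base Color", Color) = (1, 1, 1, 1)
        _Metallic ("Metallic", Range(0, 1)) = 0
        _Smoothness ("Smoothness", Range(0, 1)) = 0.5
    }
    SubShader
    {
        Tags { "RenderType" = "Opaque" "RenderPipeline" = "UniversalPipeline" }

        Pass
        {
            Name "ForwardLit"
            Tags { "LightMode" = "UniversalForward" }

            HLSLPROGRAM
            #pragma vertex vert
            #pragma fragment frag

            #include "Packages/com.unity.render-pipelines.universal/ShaderLibrary/Core.hlsl"
            #include "Packages/com.unity.render-pipelines.universal/ShaderLibrary/Lighting.hlsl"

            CBUFFER_START(UnityPerMaterial)
                half4 _BaseColor;
                half _Metallic;
                half _Smoothness;
            CBUFFER_END

            struct Attributes
            {
                float4 positionOS : POSITION;
                float3 normalOS   : NORMAL;
            };

            struct Varyings
            {
                float4 positionCS : SV_POSITION;
                float3 positionWS : TEXCOORD0;
                float3 normalWS   : TEXCOORD1;
            };

            Varyings vert(Attributes IN)
            {
                Varyings OUT;
                VertexPositionInputs pos = GetVertexPositionInputs(IN.positionOS.xyz);
                VertexNormalInputs nrm = GetVertexNormalInputs(IN.normalOS);
                OUT.positionCS = pos.positionCS;
                OUT.positionWS = pos.positionWS;
                OUT.normalWS = nrm.normalWS;
                return OUT;
            }

            half4 frag(Varyings IN) : SV_Target
            {
                // Zero-initialize, then fill only the fields this sketch needs.
                InputData inputData = (InputData)0;
                inputData.positionWS = IN.positionWS;
                inputData.normalWS = normalize(IN.normalWS);
                inputData.viewDirectionWS = GetWorldSpaceNormalizeViewDir(IN.positionWS);
                inputData.bakedGI = SampleSH(inputData.normalWS); // ambient from spherical harmonics

                SurfaceData surfaceData = (SurfaceData)0;
                surfaceData.albedo = _BaseColor.rgb;
                surfaceData.metallic = _Metallic;
                surfaceData.smoothness = _Smoothness;
                surfaceData.occlusion = 1.0;
                surfaceData.alpha = _BaseColor.a;

                return UniversalFragmentPBR(inputData, surfaceData);
            }
            ENDHLSL
        }
    }
}
```

Zero-initializing both structs and setting only the needed fields keeps the pass short. Library functions that are included but never called should normally be dropped by the compiler, so the two includes by themselves are unlikely to add instructions to the final program.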

magic gyro
#

Maybe clarify what the issue is. At one point you say compile time, at another, runtime performance.
The difference in compile time could be simply due to caching. There's no way the first compilation of a shader graph would be just 2ms.

lapis hatch
#

yeah, the question is about both

#

It's a general shader writing question

#

> The difference in compile time could be simply due to caching. There's no way the first compilation of a shader graph would be just 2ms.

We thought this too but that 2.0 ms is the time our (current) logging method was showing us.
We're thrown for a loop about what should be basic info because our measurements are all over the place

so yeah I'm more so asking for some confirmation or clear answers on these basic things

magic gyro
#

Unnecessary includes shouldn't bloat compile time too much. The compiler just copy-pastes the include code before the shader code so that all symbols are resolved. It might be a difference of microseconds. And there's no effect on runtime performance whatsoever.

#

Shader graph might strip some stuff.
It's hard to say really, as it's part of the internal engine implementation.

You can generate the HLSL shader code from the graph to see what it looks like.

#

Also, 300ms is very fast for shader compilation.

lapis hatch
#

Yeah we've been doing that, but the size of the generated shader code is not directly proportional to the compile time or the runtime performance

magic gyro
#

For runtime performance you might want to use dedicated graphics debugging tools like PIX or maybe RenderDoc.

lapis hatch
#

we're at his mercy and we're all kinda running in circles 🫠

magic gyro
#

Compile time is unrelated to loading time

#

It happens during the build.

lapis hatch
#

Not the way we're doing it, a lot of shaders get compiled the first time they're rendered.

magic gyro
#

How are you debugging/confirming it? Sharing some more context would be helpful.

lapis hatch
#

will get back to you in a moment

magic gyro
#

Maybe there's a small compilation step to the GPU binary. You should try looking at and comparing the final GLSL (or SPIR-V) code of your shader and the shader graph.

lapis hatch
#

How can I see that? I haven't heard of those before

magic gyro
#

Select the shader, there's a button to show generated code and a platform drop down.

#

Actually, that's for the HLSL code.
Compile and show code is probably the button you need.

lapis hatch
#

Even when we try various methods of prewarming, Unity consistently skips certain variants of shaders until they show up in-game on a camera.
We've been wrestling with trying to get everything included that we need without bloating our initial binary download size with unneeded variants

magic gyro
#

Can you take a screenshot of what you see?

magic gyro
#

Compile and show code

lapis hatch
#

oh 🤦‍♂️

#

now I see it

magic gyro
#

Btw, just my personal opinion, but by disregarding engine features (like Shader Graph or the standard shader), you might be creating more problems for your project that will manifest in the future.
I've seen so many times how developers "implement our own custom thing" just to find out in the end (usually after release or close to it) that it creates more performance issues for them in the long run. You might think that default features introduce a lot of overhead, but this overhead could be crucial for other features that improve performance far more than the overhead costs, and the custom solution does not support them.

One example I've seen in a recent project is using some silly partition loading solution (basically instantiating prefabs into a scene) that made it impossible to use static batching properly/easily. They apparently were trying to solve memory issues, but ended up introducing a lot of CPU-side performance issues. When the project was ported to Switch 2, all of this bullshit surfaced and the team is now struggling to get decent fps without introducing breaking changes.

#

What I'm trying to say is, don't just go for a "custom solution" without deep consideration and understanding of the engine, the project, and the involved technologies (computer graphics, rendering, shader compilation, mobile rendering, the Metal graphics API, and such in your case).
And even if there are issues with the default solution, the engine usually provides the "correct" ways to address them, without resorting to custom solutions.

#

I've seen people fail numerous times because they thought they were smarter than the engine developers.

#

But yeah, that's just me bragging.
And anyway, investigating the issue can be beneficial for improving your understanding on how things work.

lapis hatch
#

meetings ran long

#

We're making a mobile game that's trying to run way too many things and they're already talking about reducing framerate from 60 to 30 because of overheat

Shader Graph isn't made for super optimized pipelines, it's just not

magic gyro
#

That's arguable. In the worst case you can use the unlit shader graph as a base. It should be super fast.

#

Anyways, before making any claims you should use dedicated GPU profiling tools.

lapis hatch
#

yeah we know how to make unlit, we need a pbr one

#

we have also tried dedicated profiling tools and nobody can get them working :v

#

something about our game just won't communicate with them

#

long story short we're not new devs or an indie team here, we know the leap we're trying to make

lapis hatch
#

I'm not trying to argue, I'm working in a studio with timelines too tight to let us do the profiling we need to do and should have done months ago

#

we have to work off some assumptions and the assumption of "shader graph includes stuff we don't need and is not performant for mobile" is just a given from every interaction about it I've ever had :v
What do you mean it's arguable?

lapis hatch
#

Can you answer my question first?

lapis hatch
#

and directors that, again, haven't given us the bandwidth to do this properly.

lapis hatch
#

A default graph immediately compiles to something like 20,000 lines of code without actually including any nodes

#

It has 6 or 7 passes in it by default

#

Includes variants we don't want

#

it uses a bunch more instructions than are necessary by assigning things to variables internally that follow the visual flow of the graph but are not the most efficient way of passing information

#

etc

magic gyro
#
  1. The number of lines of code means absolutely nothing. You need to trace the actual execution path.
  2. By default, the shader graph would be something like a standard shader. That also implies many switches for disabling features that are not used.
  3. Passes are not really a shader thing. There's no guarantee they are called at all. This is controlled by other rendering settings, the render path used in the project, and so on.
  4. Variant count doesn't affect performance. Also, the build strips variants that are not actually used.
lapis hatch
#

Variant count does affect performance when it's impacting our build time and our load times. We only want one specific thing but unity is building variants we didn't ask for
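
The disagreement about variants comes down largely to how keywords are declared. As a rough sketch (the `_USE_DETAIL` keyword and the `ApplyDetail` helper are hypothetical, made up for illustration), the two common declaration styles behave differently at build time:

```hlsl
// Fragment meant to sit inside an HLSLPROGRAM block of a pass.

// multi_compile: every listed combination is compiled for the build
// (two variants here: keyword off / keyword on), unless a build-time
// stripping step removes some of them.
#pragma multi_compile _ _MAIN_LIGHT_SHADOWS

// shader_feature: a variant is kept only if a material included in the
// build enables the keyword. _USE_DETAIL is a hypothetical keyword.
#pragma shader_feature_local _USE_DETAIL

// Hypothetical helper showing how a keyword selects code at compile time.
half3 ApplyDetail(half3 albedo, half3 detail)
{
#if defined(_USE_DETAIL)
    return albedo * detail;
#else
    return albedo;
#endif
}
```

Under these assumptions, keywords declared with `multi_compile` multiply the variant count regardless of material usage, while `shader_feature` keywords are the ones the build can drop when nothing references them. A hand-written shader that declares only the keywords it needs can therefore produce far fewer variants, which affects build size and first-use compilation rather than per-frame GPU cost.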

lapis hatch
#

You still haven't answered my question.

lapis hatch
#

You're just being defensive now.
Why do you assume shader graph IS performant when every actual visible metric says it's bloated?

magic gyro
#

Because in the end it's just generated shader code.

lapis hatch
#

And that shader code is bloated

magic gyro
#

You didn't provide a single metric that is actually related to performance directly.

#

Code bloat is not related to performance.

lapis hatch
#

I just can't believe you at this point. You're saying what goes against every other easily google-able source and expert I've ever talked to about this.

magic gyro
#

And the actual reality is that your custom shader takes more time to compile to the GPU binaries right now (at runtime). Is that not right?

lapis hatch
#

That is not right.

#

That inconsistency is what started me asking these questions. I'm now convinced the shader graph code was included in the build and my custom one simply wasn't. That's easy to get around.

magic gyro
#

OK, you do you.

#

The thing with the "online experts" is that most of them are not experts at all and operate purely on assumptions. They never actually debugged or analyzed the said shaders with dedicated profiling tools to find what exactly affects performance and how.

lapis hatch
#

Please don't assume.
I'm not going off "online experts", those merely line up with what I have directly seen on previous projects and what I have learned from OFFline experts and seniors I sit beside in my career.
