#Material.SetXXX performance
What render pipeline are you using? On the built-in render pipeline, you should probably use the MaterialPropertyBlock
unfortunately I'm using URP
I think that would only improve on the Set function per material, but the problem is also that I have a lot of materials to animate
Expensive in what way?
URP uses the SRP batcher which should at least in theory reduce the cost of multiple materials significantly. Are you sure it's the SetXX methods especially that are slow? Also if you don't mind, please provide more information on what you are actually trying to animate so we could give more precise recommendations
Profiling using superluminal, most of the time is spent in the material.SetXXX functions
More specifically:
There are disintegration effects on all the objects that make the meshes disappear. These are disintegrated by animating a radius & position for all of these objects, independently
the radius and position are the material properties in this case, and of course only the radius is the one changing most of the time
How many exactly?
between 10 and 200 roughly?
Each frame I suppose?
yes.. 😄
Talking about this, I think I have an idea
I can create a custom buffer that I fill with these animated properties on a job
Then assign to the materials an ID to read from this buffer
of course that removes the properties from the editor completely, but at least it could work
it's a lot more complicated to manage though
Also 200 objects with 4 floats each doesn't sound like particularly much to me. How long does that take? Are you sure you are not creating new material instances every frame or something like that? Just pure SetXXX on existing materials?
I will make a profiler capture
SetXXX itself is most probably just putting the value to a dictionary of sorts. I would assume that only stores the values on CPU side which are moved to the GPU only when the rendering is done
apart from the SetXXX, there are some significant improvements to be made of course, but ignoring those, the cost remains very high
ah it's not visible, but the percentages are relative to the total cost of the EpicenterController.Update() function
of course profiled in release~
wow I say that but it's not true, let me redo that 🤦
well, somewhat surprisingly, the result is almost the same in release
Is 0.1 ms all it takes though? One obvious thing you could do is to pack both radius and position into one Vector4, with w being the radius for example
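To make the packing suggestion concrete, here is a minimal language-neutral sketch in Python (not Unity code). The names `Vec4`, `pack_epicenter` and `unpack_epicenter` are hypothetical; the only point is that one 4-component value can carry what previously needed two separate property writes.

```python
from typing import NamedTuple


class Vec4(NamedTuple):
    """Stand-in for a 4-component vector (hypothetical, not Unity's Vector4)."""
    x: float
    y: float
    z: float
    w: float


def pack_epicenter(position, radius):
    """Pack an (x, y, z) position and a radius into one 4-component value."""
    x, y, z = position
    return Vec4(x, y, z, radius)


def unpack_epicenter(packed):
    """What the shader side would do: xyz is the position, w is the radius."""
    return (packed.x, packed.y, packed.z), packed.w


packed = pack_epicenter((1.0, 2.0, 3.0), 0.5)
position, radius = unpack_epicenter(packed)
```

With this layout each material needs one set call per update instead of two.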
Btw how many times is the inner for loop run? Do you have many materials on each of those 200 ish objects?
Oh wait, is the code only setting the properties for one of the objects? I thought it was doing it for them all with the outer loop
Use the hierarchy mode.
look at the total number: 9ms (release) vs. 13ms (debug) for 100 instances
and the cost inside the update for the SetXX functions which are slightly higher in release (relatively)
I'll look at the # of loops and calls to SetXXX, but regardless of the size of the inner loop, the number of meshes/materials/properties animated won't change.
yeah, only one of the objects, which can have multiple renderers
It's very hard to come to any conclusions from the timeline view. Use hierarchy view...
And deep profiling if it doesn't show the specific calls in update
you can see the % of time this function call takes in the frame, the total ms it takes, and the breakdown of the function in the superluminal capture, is that not enough?
Not enough. I trust the profiler more than superluminal, but you're not using it correctly, not providing enough info.
SetVector/SetFloat should just be queuing a gpu buffer update. The method itself should be pretty lightweight.
Btw you can read transform.position directly from the model matrix in the shader, assuming all those renderers are on the same object/transform. Ultimately I'm still having a hard time believing a SetVector call would take that long
Probably each instance loops like 100+ times. 20000 calls to set a property could probably account for 9ms
using the profiler properly would reveal it all
Just tried myself, for me setting a vector and a float around 6k times took 9ms. That is maybe not great either but as discussed earlier, you can get rid of one of the set calls (by combining and potentially reading the position from the matrix) and can always consider more optimized approaches (like using single buffer).
Now I'm interested in how many times you are setting the exact same data to the materials, in other words, how many times the inner loop runs (outerLoopIterations * innerLoopIterations) on average. If that is not something very high, I'm having a hard time trying to explain how that could take that long
I wonder if using the property ID instead of its name would help, but I suspect there is a cost to updating data on the GPU now (unless it's actually queued to be updated at a later point)
Good point, though at least in my test setup there's not much of an impact (caching the index saves around 10% in time). I always thought the set calls just set a value in a map/dict and the values are only sent to the GPU at the rendering stage. I don't have any info on that though, I just thought it would allow more optimizations that way
It's hard to say, I can't find anything explaining this.
The cost could also be the constant managed to native jump and conversion to apply these changes.
Doing animation in shader only or using GPU buffers with compute shaders would be best to reduce this problem
I thought the cost would be mostly due to native calls, though HasProperty is way cheaper, which I assume also does a native call (but I can't check now)
the call to Shaders.Contains comes out even cheaper; this is iterating over an array with 3 elements on average and comparing class instances, which shouldn't be that much slower than a hashmap addition (if that's really all SetXX was doing)
I don't know the # of calls yet, I'm not in the project, but I doubt I can change much about that anyway (in our worst case they all need to update)
I was mostly interested to see if there were any native APIs I could use to bypass it, but it seems not, which leaves us with the graphics buffer solution
Before this, we were driving the material parameter with the animator and we had no performance issues. We switched away from that for practical reasons (too many objects to animate by hand).
Perhaps another solution could be to bake the animations at edit time, though that kind of thing is never straightforward in my experience
can you give more detail as to what you are "animating" by changing material parameters?
it may be wise to pack data into some buffer or texture and use global time in shader to produce the effect
Even if it's 6k materials, it's too much to be rendered in one frame, let alone have properties modified. 🤔
I guess the SRP batcher does some magic here? Don't know how this works under the hood. In any case I will look at the numbers again tomorrow.
It's for disintegration / restoration effects on game objects, which may have multiple meshes & materials each
There are some points in the game where there are a lot of these animating at the same time
We did consider using a start + end time for the animation, which in terms of performance will work better of course. I don't remember why this wasn't an option. I will ask again.
They can be partially disintegrated as well and maintain that state, so there's some juggling to be done with the parameters. But that sounds feasible to me.
You can use the frame debugger to find out what is happening to draw a frame
You could also reduce the rate at which you update the materials. Especially for objects far away. Sort of an LOD system. Or update them all in one centralized place and split over several frames.
Yeah in any case GPU-wise we are still OK from what I could tell.
we have a top-down game, so most of them are at the same distance, but it's not a bad idea (but doing some culling for this is definitely a bit of work as well)
Is it something like Vampire Survivors with hundreds (more like thousands?) of units on screen?
they are buildings actually, where many mesh parts are animated individually (for construction mostly), which is why the numbers are so high
of course artist time is a factor there too
Yeah, for something like that, reducing the update rate (or having an update queue with limited updates per frame) is pretty common I think.
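A minimal sketch of such an update queue, written in Python as a language-neutral model rather than Unity code. `UpdateQueue`, `budget` and `tick` are hypothetical names; `apply` stands in for whatever writes the material properties for one object.

```python
from collections import deque


class UpdateQueue:
    """Round-robin queue that processes at most `budget` items per frame."""

    def __init__(self, items, budget):
        if budget < 1:
            raise ValueError("budget must be at least 1")
        self._queue = deque(items)
        self._budget = budget

    def tick(self, apply):
        """Run `apply` on up to `budget` items, then rotate them to the back."""
        count = min(self._budget, len(self._queue))
        for _ in range(count):
            item = self._queue.popleft()
            apply(item)
            self._queue.append(item)
        return count


# Five objects, two updates allowed per frame, three frames simulated.
updated = []
queue = UpdateQueue(items=range(5), budget=2)
for _frame in range(3):
    queue.tick(updated.append)
```

Each object is then refreshed once every `ceil(n / budget)` frames, so the per-frame cost is capped at the price of a visibly lower update rate for each object.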
Also DOTS obviously.
yeah that's what I did for our terrain updates and such, it's all jobified and spread out over frames
but there's no native material API so I got stuck on this mentally
there are many good suggestions in this thread 🙏
it is very pleasing to see how effective burst + jobs are, it makes all workloads effectively disappear (*from the main thread)
I was talking more about ECS (as part of DOTS), which probably does have a way to update materials (or whatever it uses) more effectively.
Generally with hundreds/thousands of updating objects, you want to go for ECS.
Usually you have a chance to combine the meshes and materials into one, if they're within the same "object" anyway
And like mentioned before at some point the animation could be handled by the shader entirely, if it happens at a predictable rate
Even if the animation in the shader would run on a global timer, you could subtract elapsed time as a per-instance offset for this material to restart the animation just for it
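The per-instance offset idea can be sketched as follows, in Python as a stand-in for the shader math. All names and the linear radius ramp are assumptions for illustration; the thread does not say how the radius actually evolves.

```python
def disintegration_progress(global_time, start_time, duration):
    """Normalized 0..1 progress for one instance, driven by a shared clock."""
    if duration <= 0:
        raise ValueError("duration must be positive")
    t = (global_time - start_time) / duration
    return max(0.0, min(1.0, t))


def radius_at(global_time, start_time, duration, start_radius, end_radius):
    """Interpolate the radius linearly between its start and end values."""
    p = disintegration_progress(global_time, start_time, duration)
    return start_radius + (end_radius - start_radius) * p


# An instance that started at t=10 with a 4 second animation, sampled at t=12.
halfway = radius_at(12.0, start_time=10.0, duration=4.0,
                    start_radius=0.0, end_radius=8.0)
```

Under this scheme the CPU writes `start_time`, `duration` and the two radii only when an animation starts or changes, not every frame. A partially disintegrated state would be held by choosing an `end_radius` below the full value.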
You could have some mask texture that controls how the effect changes over time too if that helps combining meshes again.
for now I ended up baking all the animations to a timeline animation clip, as it's the least invasive solution (it was already using timeline animations)
I think on average we had like 20-30 meshes per object, so for 100 objects, each with 2 materials, let's say 6000 SetXXX calls? this more or less matches what you measured, @quiet oyster
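The estimate above can be checked with a few lines of arithmetic. This Python sketch uses only the numbers quoted in the thread; treating one "update" as a single material write, and scaling the 9 ms measurement linearly, are assumptions.

```python
def updates_per_frame(objects, meshes_per_object, materials_per_mesh):
    """Number of material updates if every material is touched every frame."""
    return objects * meshes_per_object * materials_per_mesh


low = updates_per_frame(100, 20, 2)
high = updates_per_frame(100, 30, 2)

# Per-update cost implied by the test reported earlier in the thread:
# a vector plus a float set about 6000 times in roughly 9 ms.
ms_per_update = 9.0 / 6000

estimate_low_ms = low * ms_per_update
estimate_high_ms = high * ms_per_update
```

This gives 4000 to 6000 updates per frame and roughly 6 to 9 ms, which lines up with the 9 ms reported for 100 instances.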
throwing this kind of thing into ECS is possible of course, but I'm not a big fan of mixing ECS and game object workflows
merging the meshes is definitely something to explore, though in terms of performance right now we're OK (the SRP batcher does a good job, there are only 30 batches)
Then make it all ECS
sure sure, give me half a year to refactor everything
With 40-60 SetXXX calls with identical data, you would definitely benefit from the buffer solution where you would upload all the data at once and read from the buffer based on index
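A rough model of the buffer approach, in Python rather than Unity code. `EpicenterBuffer` and its layout (four floats per animated object) are hypothetical; in Unity the same role would presumably be played by a GPU buffer that is filled once per frame and indexed in the shader.

```python
from array import array

FLOATS_PER_ENTRY = 4  # x, y, z, radius


class EpicenterBuffer:
    """Flat float buffer holding one (x, y, z, radius) entry per animated object."""

    def __init__(self, capacity):
        self.data = array("f", [0.0] * (capacity * FLOATS_PER_ENTRY))

    def write(self, index, position, radius):
        """Overwrite the entry at `index`; every material using it sees the change."""
        base = index * FLOATS_PER_ENTRY
        self.data[base:base + 3] = array("f", position)
        self.data[base + 3] = radius

    def read(self, index):
        """What the shader side would do: fetch the entry by its stored index."""
        base = index * FLOATS_PER_ENTRY
        return tuple(self.data[base:base + FLOATS_PER_ENTRY])


buffer = EpicenterBuffer(capacity=200)
buffer.write(7, (1.0, 2.0, 3.0), 0.5)  # one write per object per frame
entry = buffer.read(7)                  # all 40-60 materials of object 7 read this
```

Each material then stores only its object's index, set once at startup. The per-frame work becomes one write per object plus a single upload of the whole buffer.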
Each of 40-60 meshes has 2 unique materials, or shared?
If they are shared, or you combine them to use shared materials you might need just 1-2 materials across all of them even if they remain separate meshes
1-2 instances I mean
there might be some shared materials, this is indeed a good point
I wonder what the cost is of animating so many individually instanced materials on the rendering side vs. using shared materials, especially when it comes to the CPU part of the render loop; something to try in the future
If you reduce the number of material instances you need to update per object from 50 to 1, the cost probably would scale linearly to 1/50
Not totally sure but in that case the only variable would be the number of instances you need to update, wouldn't matter how many mesh renderers are sharing it