#Need optimization Ideas


brittle harness
#

Hello there, so ... I was following the Strawman videos to make my own version of the Strawman sample project.

https://github.com/Touhma/ECS-Strawman

Everything seems to run pretty well, but I'm wondering what's next on the optimisation side. Everything works well at a low entity count (5k), but as soon as the count goes to 100x that, it's struggling.

What can be done to improve performance at that level?

From what I'm seeing, the issue is not coming from the systems code, which is Burst-compiled, but more from the rendering side.

In this example I'm using ECS 1.0.16.

So i'm wondering :

What's next ?
What is going on ?
What can I do to reduce the rendering time issue? (if the issue actually comes from there)

Any resources will be appreciated 🙂

visual geyser
#

hierarchy profiler is useless for multithreaded code

#

use timeline and figure out what's the candidate for optimizations

brittle harness
#

A lot of idling

visual geyser
#

looks like you are gpu bottlenecked?

#

not sure

#

check your GPU load

brittle harness
#

It's a 2080 Ti graphics card, it should be able to show a few hundred thousand basic meshes x)

visual geyser
#

how do you render stuff?

brittle harness
#

Very Basic Mesh too

visual geyser
#

but how do you render it

brittle harness
#

Basic Baker with a prefab reference
With a basic Spawner Script :

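    using Unity.Collections;
    using Unity.Entities;
    using Unity.Transforms;
    using Random = Unity.Mathematics.Random;

    // Spawns config.SpawnCount copies of the baked prefab once, then disables itself.
    // ConfigComponent, DancerComponent and WalkerComponent are the project's own types.
    [UpdateInGroup(typeof(InitializationSystemGroup))]
    public partial struct SpawnSystem : ISystem
    {
        // Only run once the config singleton exists.
        public void OnCreate(ref SystemState state)
            => state.RequireForUpdate<ConfigComponent>();

        public void OnUpdate(ref SystemState state)
        {
            ConfigComponent config = SystemAPI.GetSingleton<ConfigComponent>();

            // Batch-instantiate every entity in a single call.
            NativeArray<Entity> instances = state.EntityManager.Instantiate
                (config.Prefab, config.SpawnCount, Allocator.Temp);

            Random rand = new(config.RandomSeed);

            foreach (Entity entity in instances)
            {
                RefRW<LocalTransform> transform = SystemAPI.GetComponentRW<LocalTransform>(entity);
                RefRW<DancerComponent> dancer = SystemAPI.GetComponentRW<DancerComponent>(entity);
                RefRW<WalkerComponent> walker = SystemAPI.GetComponentRW<WalkerComponent>(entity);

                // NextOnDisk and NextYRotation are not part of Unity.Mathematics;
                // presumably extension helpers defined in the project.
                transform.ValueRW = LocalTransform.FromPositionRotation(rand.NextOnDisk() * config.SpawnRadius, rand.NextYRotation());

                dancer.ValueRW = DancerComponent.Random(rand.NextUInt());
                walker.ValueRW = WalkerComponent.Random(rand.NextUInt());
            }

            // Spawn only once: stop this system from updating again.
            state.Enabled = false;
        }
    }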
#
  • Very simple prefab
#

The project uses URP, nothing weird done to it

#

Basic setup

visual geyser
#

so you just instantiate baked prefab?

brittle harness
#

Yup

#

It's a "Very Basic" example 🙂

#

No GPU instancing, no nothing.

I'm just taking a basic mesh and a basic URP material, putting them in a prefab, then instantiating it a few thousand times after the prefab is baked.

visual geyser
#

If you look away with camera, is it still bad?

quasi vault
#

100x 5k?

brittle harness
#

500 000 (500k) @quasi vault

quasi vault
#

Yeah...

#

These are probably all set up as dynamic

brittle harness
#

In the batch renderer, you mean?

quasi vault
#

As in you haven't marked them static

#

Well that's definitely true as you're changing transform

brittle harness
#

"haven't named them static" -> Not sure I follow you here

quasi vault
#

Marked sorry

#

Typing on phone

visual geyser
#

I think half a mil meshes is too much even for 2080ti

quasi vault
#

Anyway, just trying to point out that you're trying to push a gigabit of transform data to the GPU each second

quasi vault
#

But it's hard to know what is stalling exactly

brittle harness
#

let me check if that's really what's happening

Just turned off the system updating the transform and I'll mark them as static 😄

visual geyser
#

so bandwidth is 616 GB/s

quasi vault
#

Where are you getting that?

visual geyser
#

some tech site

quasi vault
#

That's the bandwidth of the GPU reading its own memory

#

Not the bandwidth of PCIe 4.0

visual geyser
#

ah

quasi vault
#

Which is 32 GB/s

#

On an x16 link

#

Hence they should be fine throughput-wise, it's still very high

#

But that doesn't mean it's not causing latency
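
A rough back-of-the-envelope version of that throughput argument, as a minimal sketch. The 64 bytes per instance (one 4x4 float matrix) and the 60 frames per second are assumptions, not measurements from the project, and the real upload layout used by the renderer may differ. `TransformUpload` and its methods are hypothetical names used only for illustration.

```csharp
using System;

static class TransformUpload
{
    // Bytes sent per second if every instance uploads one matrix every frame.
    // bytesPerInstance is an assumption (64 = one 4x4 matrix of floats).
    public static double BytesPerSecond(long instances, int bytesPerInstance, double framesPerSecond)
        => instances * (double)bytesPerInstance * framesPerSecond;

    // Fraction of a link's theoretical bandwidth that the upload would consume.
    public static double LinkUtilisation(double bytesPerSecond, double linkBytesPerSecond)
        => bytesPerSecond / linkBytesPerSecond;
}
```

With 500 000 instances this gives 1.92 GB/s, which is about 6% of a 32 GB/s PCIe 4.0 x16 link. The 2080 Ti itself is a PCIe 3.0 card, so roughly 16 GB/s and about 12% would be the closer figure. Either way the estimate agrees with the point above: raw throughput has headroom, which says nothing about latency.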

visual geyser
#

all right

brittle harness
#

Just so you know, in the baker of the prefab: TransformUsageFlags.None -> doesn't seem to change anything

visual geyser
#

on game object

#

transform usage flags are additive

#

mesh renderer baker adds Dynamic

#

your None adds nothing

#

in the end you have Dynamic - thus nothing changes
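
The additive behaviour described here can be sketched in plain C#, outside Unity. `UsageFlags` is a hypothetical stand-in for `TransformUsageFlags`, reduced to the values mentioned in this thread, and `Combine` is an assumption about how requests from several bakers get merged (a bitwise OR), not the actual baking code.

```csharp
using System;

// Hypothetical stand-in for Unity's TransformUsageFlags (only the values discussed here).
[Flags]
enum UsageFlags
{
    None = 0,
    Renderable = 1,
    Dynamic = 2,
}

static class UsageFlagsSketch
{
    // Merge the flags requested by every baker with a bitwise OR:
    // a request can add a flag, but it can never remove one.
    public static UsageFlags Combine(params UsageFlags[] requests)
    {
        UsageFlags result = UsageFlags.None;
        foreach (UsageFlags request in requests)
            result |= request;
        return result;
    }
}
```

Under that assumption, `UsageFlagsSketch.Combine(UsageFlags.None, UsageFlags.Dynamic)` is still `Dynamic`: the `None` request contributes no bits, so the entity ends up exactly as dynamic as before.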

quasi vault
#

Hmm thought mesh renderer adds renderable?
Anyway all prefabs are forced dynamic and something else probably does it anyway. Need to hit that static.

brittle harness
#

Technically saving 3ms

quasi vault
#

Gpu still high I assume?

#

(this 1 hand texting not going well)

brittle harness
#

Yup

#

100% used that's definitely a gpu bottleneck

#

lol

#

How the hell do people show 2 million cubes on the screen without burning their cards?

visual geyser
#

afaik, BRG only batches in small ones

quasi vault
#

Just for curiosity, I don't think it has an effect in editor but can you go to project settings and turn off graphics jobs

visual geyser
#

like 100 per batch

#

so for such massive instancing you might want a custom solution

brittle harness
quasi vault
#

Wait

#

You have 761 million tris

#

A normal game might have 5-20 mill

brittle harness
#

wait

#

that doesn't add up ...

#

the mesh is 684 triangles

sooooo times 500k should be about half that

quasi vault
#

Yeah then a light?

#

But even half that is really high

#

So yes 2 million cubes would be 11x less tris 😅
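
A quick sanity check of the triangle counts being compared, as a minimal sketch. The 12 triangles per cube and the idea that a shadow-casting light redraws every mesh once more (`passes = 2`) are assumptions; the stats panel in the thread does not confirm either. `TriangleBudget` is a hypothetical helper.

```csharp
static class TriangleBudget
{
    // Triangles submitted per frame: mesh size times instance count,
    // times the number of passes that draw the geometry (1 = main pass only).
    public static long TotalTriangles(long trianglesPerMesh, long instances, int passes = 1)
        => trianglesPerMesh * instances * passes;
}
```

`TotalTriangles(684, 500_000)` is 342 million for the main pass and 684 million with one extra shadow pass, which is in the neighbourhood of the 761 million reported. `TotalTriangles(12, 2_000_000)` is 24 million, so the 2 million cube demos are pushing far fewer triangles than this scene.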

brittle harness
#

yeah but not 11x more performance

quasi vault
#

Wouldn't expect it to be since it's a lot more batches, just pointing out the comparison you're making

#

CPU time is halved though

brittle harness
#

Yep 🙂

#

But the GPU is still getting hammered

quasi vault
#

Side note, 2 million objects is the hard limit in unreal engine for nanite

#

As in, it's not expected that you'd get close to this

brittle harness
#

What's weird IMO tho is why the CPU time is halved but not the GPU time

#

I mean I disabled Vsync so obviously the gpu would run full power all the time

quasi vault
#

Render thread != gpu time

brittle harness
#

Just the calcs on the CPU to send to the GPU ?

quasi vault
#

Render thread is a dedicated thread on the CPU for submitting work to the GPU

brittle harness
#

yeah with the

  • strawman mesh : 48ms on "other" & 22.25 ms on shadows
  • Cube Mesh : 21ms & 7.4ms
#

Definitely has a huuuge impact

#

With a Quad Mesh : 4.65 ms & 1ms

#

I would expect to have a bigger fps gain tho
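
One way to read those three measurements is cost per triangle rather than cost per frame. The sketch below assumes 500 000 instances in every test, 12 triangles for the cube and 2 for the quad, and that the "other" and shadow timings can simply be added together; none of that is confirmed in the thread. `MeshCost` is a hypothetical helper.

```csharp
static class MeshCost
{
    // Measured milliseconds divided by the millions of triangles submitted
    // (trianglesPerMesh * instances). A flat value across meshes would mean
    // the cost scales with triangle count alone.
    public static double MsPerMillionTriangles(double measuredMs, long trianglesPerMesh, long instances)
        => measuredMs / (trianglesPerMesh * instances / 1_000_000.0);
}
```

With the numbers above this comes out at about 0.21 ms per million triangles for the strawman mesh (70.25 ms), about 4.7 for the cube (28.4 ms) and 5.65 for the quad (5.65 ms). The per-triangle cost climbs steeply as the mesh gets simpler, which hints that something other than triangle count, such as a fixed cost per instance, takes over. This sketch cannot tell which, but it would fit with the fps gain being smaller than the triangle reduction.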