#Help me optimise a simple system.

halcyon mortar
#

Hello, I've done all the research I can and everything I can think of to optimise an enemy move system. With it I can get about 3 million enemies moving around at 20fps. I know this seems a bit ridiculous, but I want MORE!

Perhaps I'm using Jobs incorrectly, or I'm missing out on Burst performance? In the profiler, the time to move the 3 mil entities is split 50/50 between my EnemyMoveJob and a local transform job (which I believe I can't do anything about). Perhaps I shouldn't be queuing to the ECB in my job and should find a different way to do it?

Here are the scripts in question (execution order from top to bottom):

[BurstCompile] 
public void OnUpdate(ref SystemState state)
{
    //stuff     
    EntityCommandBuffer ecb = new(Allocator.TempJob);
    state.Dependency = new EnemyMoveJob
    {
        //stuff
    }.ScheduleParallel(_enemyQuery, state.Dependency);

    state.CompleteDependency();
    ecb.Playback(state.EntityManager);
    ecb.Dispose();
} 
[BurstCompile] 
public partial struct EnemyMoveJob : IJobEntity
{
    [ReadOnly] public float DeltaTime;
    [ReadOnly] public BlobAssetReference<EnemyTargetsBlob> Targets;
    [ReadOnly] public float2 Offset;
    [ReadOnly] public GameQuality Quality;
    [ReadOnly] public int MaxTarget;

    public EntityCommandBuffer.ParallelWriter ECB;

    public void Execute([ChunkIndexInQuery] int chunkIndex, EnemyAspect enemy)
    {
        enemy.MoveTowardsNextTarget(DeltaTime, Targets, Offset, Quality);
        if (enemy.Order >= MaxTarget)
        {
            ECB.SetComponentEnabled<EnemyDeleteData>(chunkIndex, enemy.EntityReference, true);
        }
    }
}
    // Method on EnemyAspect, called from EnemyMoveJob.Execute above
    public void MoveTowardsNextTarget(float deltaTime, BlobAssetReference<EnemyTargetsBlob> Targets, float2 RandomFloat2, GameQuality Quality)
    {
        if (Order >= Targets.Value.Targets.Length) return;

        if (TargetOffset.Equals(float3.zero))
        {
            _enemy.ValueRW.TargetOffset = new float3(RandomFloat2.x, 0, RandomFloat2.y);
            _enemy.ValueRW.Distance -= (ushort) math.distance(TargetOffset + EnemyPosition, Targets.Value.Targets[0].Location);
        }

        float3 location = Targets.Value.Targets[Order].Location + TargetOffset;
        float speedDelta = Speed * deltaTime * 2;

        if (math.distance(location, _transform.ValueRO.Position) < speedDelta) _enemy.ValueRW.Order++; //go to different target
        if (Order >= Targets.Value.Targets.Length) return; //reached final target

        float3 direction = location - _transform.ValueRO.Position;

        if (Quality == GameQuality.High)
              _transform.ValueRW.Rotation = 
              math.slerp(EnemyRotation, quaternion.LookRotation(direction, _transform.ValueRO.Up()), speedDelta);
        else if (Quality == GameQuality.Medium) 
              _transform.ValueRW.Rotation = 
              quaternion.LookRotation(direction, _transform.ValueRO.Up());

        if (Quality == GameQuality.Low) _transform.ValueRW.Position += speedDelta / 2 * math.normalize(direction);
        else _transform.ValueRW.Position += speedDelta / 2 * _transform.ValueRO.Forward();

        _enemy.ValueRW.Distance++;
    }

Sorry if the code is difficult to read, I've tried to simplify it as much as I could without removing relevant stuff. Any help would be much appreciated and I would love to learn new optimisations to apply to my other scripts (assuming that there are more remaining).

I understand that there is some performance on the table by minimising the amount of data in my Enemy component, however most of the memory on an enemy entity is rendering/transform data. In total I'm fitting about 50 enemies per chunk. I'll do some deep profiling at some point to see what the most demanding part of moving an enemy is, though usually it isn't the math part considering SIMD, and that's pretty much all I'm doing.

Am I doing something that is a big nono with stuff like this? Am I passing in too many variables into a method or something?

midnight bramble
#
  1. EnemyMoveJob - don't use ECB, use lookup and set value directly
  2. Get rid of all the code below. Use the built-in BeginSimulationEntityCommandBufferSystem.Singleton for the ECB instead
    state.CompleteDependency();
    ecb.Playback(state.EntityManager);
    ecb.Dispose();
#

those 2 are like the worst performance killers
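
A minimal sketch of point 2, assuming the job still records ECB commands. `EnemyProgress`, the placeholder `MaxTarget` value, and this trimmed-down `EnemyMoveJob`/`EnemyMoveSystem` pair are hypothetical stand-ins rather than the original poster's code, and it assumes every enemy already carries an `EnemyDeleteData` component. The command buffer comes from the built-in `BeginSimulationEntityCommandBufferSystem` singleton, which plays it back and disposes it on its own update, so `OnUpdate` never calls `CompleteDependency`, `Playback`, or `Dispose`.

```csharp
using Unity.Burst;
using Unity.Entities;

// Hypothetical stand-ins for the thread's components.
public struct EnemyDeleteData : IComponentData, IEnableableComponent { }
public struct EnemyProgress : IComponentData { public int Order; }

[BurstCompile]
public partial struct EnemyMoveJob : IJobEntity
{
    public int MaxTarget;
    public EntityCommandBuffer.ParallelWriter ECB;

    void Execute([ChunkIndexInQuery] int chunkIndex, Entity entity, in EnemyProgress progress)
    {
        // Movement omitted. The command is only recorded here;
        // it is applied later when the ECB system plays the buffer back.
        if (progress.Order >= MaxTarget)
            ECB.SetComponentEnabled<EnemyDeleteData>(chunkIndex, entity, true);
    }
}

[BurstCompile]
public partial struct EnemyMoveSystem : ISystem
{
    [BurstCompile]
    public void OnUpdate(ref SystemState state)
    {
        // Buffer owned by the built-in ECB system: no manual Playback or Dispose.
        var ecbSingleton = SystemAPI.GetSingleton<BeginSimulationEntityCommandBufferSystem.Singleton>();
        EntityCommandBuffer ecb = ecbSingleton.CreateCommandBuffer(state.WorldUnmanaged);

        // Schedule and return; no CompleteDependency, so no sync point in this system.
        state.Dependency = new EnemyMoveJob
        {
            MaxTarget = 8, // placeholder value
            ECB = ecb.AsParallelWriter()
        }.ScheduleParallel(state.Dependency);
    }
}
```

The trade-off is that the enabled state only changes when the ECB system next runs, rather than at the end of this system.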

midnight bramble
#

while instead you can let other systems run, scheduling more jobs and making use of threads much more

#

in fact, this job doesn't even need ecb

midnight bramble
#

use existing ones

#

generally

#

for best performance all you need is just one

#

BeginSimulationEntityCommandBufferSystem

#

and avoid everything else

#

here you only do structural changes: creation/destruction of entities

#

everything else must be done from within jobs directly

#

just remember

#

each sync point you do - wastes not just performance of system where you do it, it wastes performance for any other system that ran before it

halcyon mortar
#

How can I do SetComponentEnabled from a job without an ECB? You mentioned component lookup... what should I be looking for? All I can find for SetComponentEnabled is on ECB and SystemAPI.

midnight bramble
#

it has same method
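
A rough sketch of what "same method" could look like with `ComponentLookup<T>.SetComponentEnabled`. `EnemyProgress`, `FlagFinishedEnemiesJob`, `FlagFinishedEnemiesSystem`, and the `MaxTarget` value are hypothetical names for illustration, and the sketch assumes every enemy already has an `EnemyDeleteData` component.

```csharp
using Unity.Burst;
using Unity.Collections.LowLevel.Unsafe;
using Unity.Entities;

// Hypothetical stand-ins for the thread's components.
public struct EnemyDeleteData : IComponentData, IEnableableComponent { }
public struct EnemyProgress : IComponentData { public int Order; }

[BurstCompile]
public partial struct FlagFinishedEnemiesJob : IJobEntity
{
    public int MaxTarget;

    // Each Execute call only touches the enabled bit of its own entity,
    // so the parallel-write restriction is lifted for this lookup.
    [NativeDisableParallelForRestriction]
    public ComponentLookup<EnemyDeleteData> DeleteLookup;

    void Execute(Entity entity, in EnemyProgress progress)
    {
        // Written directly on the worker thread: no ECB, no playback.
        if (progress.Order >= MaxTarget)
            DeleteLookup.SetComponentEnabled(entity, true);
    }
}

[BurstCompile]
public partial struct FlagFinishedEnemiesSystem : ISystem
{
    [BurstCompile]
    public void OnUpdate(ref SystemState state)
    {
        state.Dependency = new FlagFinishedEnemiesJob
        {
            MaxTarget = 8, // placeholder value
            DeleteLookup = SystemAPI.GetComponentLookup<EnemyDeleteData>()
        }.ScheduleParallel(state.Dependency);
    }
}
```

Enabling or disabling a component is not a structural change, which is why it can happen inside the job instead of being queued for the main thread.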

midnight bramble
#

on job struct

#

actually

#

it's irrelevant

#

your job runs regardless already

#

switching ecb with lookup won't change anything for functionality

halcyon mortar
# midnight bramble other than getting rid of ecb and sync point - not sure. Actual algo can probabl...

damn.

I see this video https://www.youtube.com/watch?v=1R5WFZk86kE&t=40s and this one https://www.youtube.com/watch?v=zPnwmsTokso which are simulating millions of objects incredibly performantly (well, the first one wouldn't really be that many, but you get the point), while in Unity simply changing the transform of millions of objects cripples performance. At this point, would a custom transform system be the way to get more?

Follow me on Twitter for more updates: https://twitter.com/ProgrammerLin

It took a bit longer than expected, but the fluid simulation design from a few months ago has been implemented in our voxel project. It uses the same cell/particle hybrid method and is therefore fully volumetric with no height range limit beyond the 0-4095 y world boundary...

A high resolution look at Havok Physics technology demonstrated at Sony's live event in New York. This demo shows a million particle real-time physics simulation running on the GPU of the Playstation 4.
midnight bramble
#

which is the reason why your performance is bad

#

as well as usage of ECB to just set enabled state or component vs using lookup directly

#

all of that is performance killers

#

you can write 99% of project perfectly, but just 1% of code with sync points will ruin it completely

halcyon mortar
#

hmmm, I'll do some tests in a new project and see if sync points are the problem

#

probably are though

#

considering my use of ecb

midnight bramble
#

each time you do something with ecb

#

you just literally queue the work for main thread

halcyon mortar
#

yeah... a while ago I read the DOTS best practices on Unity Learn, and it showed a graph of structural changes and whatnot, with ECB being literally 100x slower than other options

midnight bramble
#

what you did with ECB can be done directly on worker thread with lookup

#

oh wait

#

you don't even need lookup

#

EnabledRefRW is what you need
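
One way this might look, as a sketch rather than a drop-in replacement. `EnemyProgress` and `MarkFinishedEnemiesJob` are hypothetical names. The `WithOptions` attribute is an assumption about how to make the job's generated query also match enemies whose `EnemyDeleteData` is currently disabled; it has no effect if the job is scheduled with an explicit query (as the original is, with `_enemyQuery`), so that query would need the equivalent option, and the exact behaviour is worth checking against your Entities version.

```csharp
using Unity.Burst;
using Unity.Entities;

// Hypothetical stand-ins for the thread's components.
public struct EnemyDeleteData : IComponentData, IEnableableComponent { }
public struct EnemyProgress : IComponentData { public int Order; }

[BurstCompile]
// Assumption: ignore enabled state so enemies with the flag still disabled are visited.
[WithOptions(EntityQueryOptions.IgnoreComponentEnabledState)]
public partial struct MarkFinishedEnemiesJob : IJobEntity
{
    public int MaxTarget;

    void Execute(in EnemyProgress progress, EnabledRefRW<EnemyDeleteData> deleteFlag)
    {
        // Flips the enabled bit in place on the worker thread:
        // no ECB, no lookup, no Entity parameter needed.
        if (progress.Order >= MaxTarget)
            deleteFlag.ValueRW = true;
    }
}
```

Compared with a lookup, this accesses the enabled bits through the chunk being iterated, so there is no per-entity random access.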

halcyon mortar
#

now ima test on actual project

midnight bramble
#

your best bet - get rid of transform hierarchies

halcyon mortar
#

Compute local to world is the same while my job is taking 2ms longer

#

let me apply the previously discussed optimisations

halcyon mortar
#

if so then I should work on memory optimisations

midnight bramble
#

not really

#

transform hierarchies are fully recursive and lookup based

#

best bet - just get rid of them

#

that's it

hallow hedge
#

yes if you want max performance you can't have children

#

calculating hierarchies is expensive

halcyon mortar
#

Yup, I'm already doing that, so I guess there's nothing left to optimise in that case

lofty wyvern
#

do you render them? if not maybe you can remove localtoworld component

halcyon mortar
#

I do render them (I'm just benchmarking without rendering). I might make a custom renderer involving instancing at some point, but even then I believe I'd still need the LocalToWorld (or I could make my own LocalToWorld, but I doubt I can make one that is better than the current one)

midnight bramble
#

just get rid of any child that is not actually rendered

#

also

#

take a look at 1.1 experimental

#

as it has optimizations for this

halcyon mortar
#

oh sick

halcyon mortar
#

damn

halcyon mortar
#

its depressingly bad

midnight bramble
#

I have doubts it's actual rendering that is struggling

#

I mean

#

dispatching draw commands

#

rendering itself is shader dependent mostly

hallow hedge
#

no one's rendering millions of the same object this way

hallow hedge
#

are you actually rendering 1million things on screen at once?
That'd be 8 pixels per object @4k, or 2 pixels per object at 1080p
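
For reference, the arithmetic behind those per-object figures, assuming exactly one million objects spread evenly over a 3840x2160 (4K) or 1920x1080 screen:

```latex
\[
\frac{3840 \times 2160}{10^{6}} = \frac{8\,294\,400}{10^{6}} \approx 8.3 \ \text{pixels per object at 4K},
\qquad
\frac{1920 \times 1080}{10^{6}} = \frac{2\,073\,600}{10^{6}} \approx 2.1 \ \text{pixels per object at 1080p}
\]
```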

#

you wouldn't see anything

hallow hedge
#

it looks like 3-4 objects

hallow hedge
#

sounds like you need a voxel engine

#

if they're all dynamic, that's 3.6GB/s of LocalToWorld data for 1 million entities that you need to move from the CPU to the GPU at 60fps
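
That figure can be reconstructed as follows, assuming each LocalToWorld is a 4x4 matrix of 32-bit floats (64 bytes) and every matrix is re-uploaded every frame:

```latex
\[
10^{6}\ \text{entities} \times 64\ \tfrac{\text{B}}{\text{entity}} \times 60\ \tfrac{\text{frames}}{\text{s}}
= 3.84 \times 10^{9}\ \tfrac{\text{B}}{\text{s}}
\approx 3.58\ \text{GiB/s}
\]
```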

#

your memory usage alone must be pushing up there

hallow hedge
#

gpu?

#

both?

halcyon mortar
#

in entities memory, so I'm assuming CPU, lemme check GPU

halcyon mortar
#

I think I'm calculating wrong

hallow hedge
#

nah that seems fine

#

you dont have any textures loaded

#

it's not like pushing this is adding new memory

#

it's just replacing the old stuff

halcyon mortar
#

makes sense

hallow hedge
#

can't really tell you because I have no idea what you're trying to do

hallow hedge
#

ok it looked like you were trying to create ships or something out of lots of tiny parts