# Using thousands of CollisionWorld overlap spheres: what performance should I expect?

spring ore

Hi all, I'm trying to build a Boids system with the simplest possible approach: spawning prefabs with a sphere "Physics Shape" and running a world overlap sphere query every X frames.
So I created an IJobEntity, scheduled with ScheduleParallel from a system every 5 frames or so. I save the "collisions" for further processing in several other systems/jobs.

The thing is that I can only spawn about 5k entities (prefabs: a simple circle sprite with a sphere physics shape and NO collision response). Above 5k the frame rate drops to 13 fps, and with 10k it drops to 0.3 fps (more than 110 ms of CPU time).

Is this the expected result? With the official Boids sample I can spawn more than 100k with no drop in performance (the fish, with animations, lights, behaviour, etc.). I know the Boids sample doesn't use physics but its own spatial partitioning and data-oriented design, but I was wondering if I'm doing something wrong. Should I do my own spatial partitioning and forget about physics for this many entities?

I already have a multi-threaded physics step in place. I didn't touch the physics settings (apart from the layer interactions, 1x1).

Here is the code (for each entity, run a CollisionWorld overlap sphere centred on the entity and save the results to a buffer on that entity):
https://gist.github.com/Surt/07e41e0e02874c1ebde7c26cc22a4264

P.S. I'm writing to the lookup unsafely because using a parallel Command Buffer slows it down A LOT more, and nobody uses that BufferLookup except for reading.
Also, I'm using the debugger. Most of the CPU time is spent in these parallel jobs (70 ms with 10k entities), apart from the render loop for so many entities.

patent locust

It's odd to use BufferLookup when you're only accessing the buffer of the current entity and not passing entities to the other methods listed. If you only access the current entity per iteration, you can just take a DynamicBuffer parameter (ref DynamicBuffer<TBufferTypeHere> buffer) instead.

Allocating a new NativeList per entity is needlessly costly. You can avoid most of that by creating the list once per worker and re-using it across executions (until it needs to resize, of course).
You could do something like this:

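```csharp
using Unity.Burst;
using Unity.Burst.Intrinsics;
using Unity.Collections;
using Unity.Collections.LowLevel.Unsafe;
using Unity.Entities;
using Unity.Physics;
using Unity.Transforms;

public struct NeighborHit : IBufferElementData { public Entity Entity; } // placeholder for your own buffer element
[BurstCompile]
partial struct Job : IJobEntity, IJobEntityChunkBeginEnd
{
    [ReadOnly] public CollisionWorld CollisionWorld; // only the CollisionWorld, not the whole PhysicsWorldSingleton
    public float Radius;
    [NativeDisableContainerSafetyRestriction] private NativeList<DistanceHit> _distanceHits;

    public bool OnChunkBegin(in ArchetypeChunk chunk, int unfilteredChunkIndex, bool useEnabledMask, in v128 chunkEnabledMask)
    {
        if (!_distanceHits.IsCreated) // allocate once, then keep re-using it (Temp memory needs no Dispose)
            _distanceHits = new NativeList<DistanceHit>(64, Allocator.Temp); // 64: assumed initial capacity
        return true; // return true to execute the job for this chunk
    }

    // Also required by IJobEntityChunkBeginEnd; nothing to do at the end of a chunk.
    public void OnChunkEnd(in ArchetypeChunk chunk, int unfilteredChunkIndex, bool useEnabledMask, in v128 chunkEnabledMask, bool chunkWasExecuted) { }
    // The current entity's buffer arrives as a parameter, so no BufferLookup is needed.
    void Execute(in LocalTransform transform, DynamicBuffer<NeighborHit> neighbors)
    {
        _distanceHits.Clear(); // the query appends hits, so reset the reused list for each entity
        CollisionWorld.OverlapSphere(transform.Position, Radius, ref _distanceHits, CollisionFilter.Default);
        neighbors.Clear();
        for (int i = 0; i < _distanceHits.Length; i++) neighbors.Add(new NeighborHit { Entity = _distanceHits[i].Entity });
    }
}
```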

Allocator.Temp is one of the allocator types that does not require disposal, as it's automatically deallocated by the runtime / job system.
https://docs.unity3d.com/Packages/com.unity.entities@1.2/manual/allocators-overview.html

You don't need to pass in the whole PhysicsWorldSingleton to the job, you can just pass the CollisionWorld.

BurstCompile isn't needed on the Execute method. There's a great breakdown about it here:
https://forum.unity.com/threads/when-where-and-why-to-put-burstcompile-with-mild-under-the-hood-explanation.1344539/

spring ore

Great recommendations! I didn't know how to handle a native container allocated per chunk.
CollisionWorld only, got it!

I will try right now 🙂

spring ore

Great! About 3 ms less per job with all the optimizations. I can use OnChunkBegin to optimize other systems too, and replacing the BufferLookup with the entity's own buffer as an Execute parameter works like a charm! Thanks a lot!

spring ore

With 10k entities, I gain about 3 ms per job compared to the previous version.
Still not what I expected. I suppose I should be doing this some other way?

spring ore

That's also after changing every component parameter of Execute to RefRO (except the buffer).

spring ore

Now it collapses at 20k entities. I was expecting at least 100k 😄

exotic fiber

You're just not going to be doing 100k overlap spheres in a frame at normal boid density
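
Back-of-the-envelope version of that point, assuming each boid sees on average k neighbours inside its query radius (k = 30 below is an illustrative figure, not a measurement): every overlap query must at least visit and report each neighbour it finds, so one full pass over N boids does work on the order of

$$
W \;\gtrsim\; N \cdot k, \qquad N = 100\,000,\; k = 30 \;\Rightarrow\; W \gtrsim 3\,000\,000 \text{ hits per pass}
$$

on top of the tree traversal each query performs. Denser flocks raise k, so the cost grows with density as well as with N.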

spring ore

That's OK, I just didn't know what to expect. I've seen videos showing 100k cubes falling with physics, people talking about millions of entities, etc., and for this use case I needed some perspective.
I wanted to spread the overlap sphere work asynchronously over 2 or 3 frames and then run the next jobs. I'll try that if I can figure out how. Would I have to move the system outside the simulation system group for that?

tawdry aurora

I guess since you're using the physics world, you'd also have to remove that dependency, or keep the world alive independently across the next frames while you query it (no idea if that's possible).

Another approach could be to spread the queries out over a few frames.
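
A minimal sketch of that idea in plain C#, so it runs outside Unity: round-robin time slicing, where each frame only re-queries the entities whose index falls in the current slice and the rest keep their previous neighbour results. `QuerySlicing`, `CurrentSlice`, `ShouldQuery` and the slice count are hypothetical names, not Unity API.

```csharp
using System;

// Round-robin time slicing: spread the neighbour queries over `sliceCount` frames.
// Each entity is re-queried once every `sliceCount` frames and keeps its old results in between.
public static class QuerySlicing
{
    // Which slice is due this frame: 0, 1, ..., sliceCount - 1, then wrap around.
    public static int CurrentSlice(long frameCount, int sliceCount)
    {
        if (sliceCount <= 0) throw new ArgumentOutOfRangeException(nameof(sliceCount));
        return (int)(frameCount % sliceCount);
    }

    // entityIndex and frameCount are assumed non-negative (e.g. something like Entity.Index in DOTS).
    public static bool ShouldQuery(int entityIndex, long frameCount, int sliceCount) =>
        entityIndex % sliceCount == CurrentSlice(frameCount, sliceCount);
}
```

Inside an IJobEntity, the frame counter and slice count would be job fields and `Execute` would return early for entities outside the current slice (keyed on something stable such as `Entity.Index`). With 3 slices and 10k boids that is roughly 3.3k overlap queries per frame, at the cost of neighbour data that can be up to 2 frames stale. Each frame's job still completes within that frame, so nothing has to be kept alive across frames.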

I'm not very well versed in Unity Physics performance, but I'm sure a proper custom spatial partitioning will perform better, simply because you know more about your ("simple") problem than Unity Physics does (and the Boids sample is no different in that regard).
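
A sketch of the kind of custom spatial partitioning meant here: a uniform grid (spatial hash) rebuilt from the boid positions, where a radius query only inspects the 27 cells around the query point. It is plain C# with `System.Numerics.Vector3` so it runs standalone; `SpatialHashGrid`, `Rebuild` and `QueryRadius` are hypothetical names. It assumes the cell size is at least the query radius, which is what guarantees every neighbour lies in an adjacent cell.

```csharp
using System;
using System.Collections.Generic;
using System.Numerics;

// Uniform-grid spatial hash: bucket point indices by integer cell coordinates.
public sealed class SpatialHashGrid
{
    private readonly float _cellSize;
    private readonly Dictionary<(int, int, int), List<int>> _cells = new Dictionary<(int, int, int), List<int>>();
    private readonly List<Vector3> _points = new List<Vector3>();

    public SpatialHashGrid(float cellSize)
    {
        if (cellSize <= 0f) throw new ArgumentOutOfRangeException(nameof(cellSize));
        _cellSize = cellSize;
    }

    private (int, int, int) CellOf(Vector3 p) =>
        ((int)MathF.Floor(p.X / _cellSize), (int)MathF.Floor(p.Y / _cellSize), (int)MathF.Floor(p.Z / _cellSize));

    // Rebuild from the current positions (e.g. once per frame, before any queries).
    public void Rebuild(IReadOnlyList<Vector3> positions)
    {
        _cells.Clear();
        _points.Clear();
        for (int i = 0; i < positions.Count; i++)
        {
            _points.Add(positions[i]);
            var key = CellOf(positions[i]);
            if (!_cells.TryGetValue(key, out var bucket))
            {
                bucket = new List<int>();
                _cells[key] = bucket;
            }
            bucket.Add(i);
        }
    }

    // Appends the indices of all points within `radius` of `center` (except `selfIndex`) to `results`.
    public void QueryRadius(Vector3 center, float radius, int selfIndex, List<int> results)
    {
        if (radius > _cellSize) throw new ArgumentException("radius must not exceed the cell size");
        var (cx, cy, cz) = CellOf(center);
        float r2 = radius * radius;
        for (int x = cx - 1; x <= cx + 1; x++)
        for (int y = cy - 1; y <= cy + 1; y++)
        for (int z = cz - 1; z <= cz + 1; z++)
        {
            if (!_cells.TryGetValue((x, y, z), out var bucket)) continue; // empty cell
            foreach (int i in bucket)
                if (i != selfIndex && Vector3.DistanceSquared(_points[i], center) <= r2)
                    results.Add(i);
        }
    }
}
```

In DOTS the buckets would more typically live in a `NativeParallelMultiHashMap<int, int>` (cell hash to entity index) filled by a parallel job each frame, with the per-boid queries in a second job. The design choice is the same either way: with a cell size close to the neighbour radius, each query touches only nearby boids instead of walking a general-purpose physics structure.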

spring ore

Yeah, the Boids sample (https://github.com/Unity-Technologies/EntityComponentSystemSamples/blob/master/EntitiesSamples/Assets/Boids/README.md) doesn't use physics at all. On top of that, they build their own matrices and transformations, with the data laid out to perform optimally for the use case.

The thing is that I'm lazy 😄 and I love working with higher abstractions. Unity Physics does have its own spatial partitioning ("bounding volumes" or something like that), but clearly my code is a brute-force, raw use of physics.

I still have no clue how to let the ScheduleParallel job run asynchronously. I set up a system that executes the job only if more than 10 frames have passed since the last run, but it still always tries to finish within the frame in which it's called. The way you propose is valid: I can batch the processing manually with a custom batch. But still 🙂 lazy.

exotic fiber

Any job that uses component data will finish within a frame: at the latest, it completes the next time the same system runs, exactly one frame later,

but it'll nearly always be forced to complete by something before that

(like the physics system updating again)

spring ore

I see. It seems obvious after thinking about it: sync points, mutable data... I should do the "batching" per frame instead.