#Disposing command buffers is taking up an unusual amount of time

lament marsh
#

In my systems I use ECB's in the standard sort of way. I use a cached EntityCommandBufferSystem to create a new command buffer, pass it to the job and call AddJobHandleForProducer with my job's handle.

I noticed that disposing these ECB's is taking up a lot of time.

Here's an example where 45 of my systems in this group are taking almost a whole ms to dispose their ECB's. If I drill down with the deep profiler, it's apparently calling Array.CustomResize 3,117 times in the dispose call, which seems... wrong?

This is Entities 1.0 pre.15. And you can see it occur in the game on an empty map where the player has not done anything.

This appears to be the code responsible in EntityCommandBuffer.cs. The resizes might be expected, I just wanted to raise a post because it seemed a bit sus.

            // Arrays of entities captured from an input EntityQuery at record time are always cleaned up
            // at Dispose time.
            var entityArraysCleanupList = chain->m_Cleanup->EntityArraysCleanupList;
            while (entityArraysCleanupList != null)
            {
                var prev = entityArraysCleanupList->Prev;
                Memory.Unmanaged.Free(entityArraysCleanupList->Ptr, m_Data->m_Allocator);
                entityArraysCleanupList = prev;
            }
            chain->m_Cleanup->EntityArraysCleanupList = null;

            Memory.Unmanaged.Free(chain->m_Cleanup, m_Data->m_Allocator);

            while (chain->m_Tail != null)
            {
                var prev = chain->m_Tail->Prev;
                Memory.Unmanaged.Free(chain->m_Tail, m_Data->m_Allocator);
                chain->m_Tail = prev;
            }

            chain->m_Head = null;
            if (chain->m_NextChain != null)
            {
                FreeChain(chain->m_NextChain, playbackPolicy, didPlayback);
                Memory.Unmanaged.Free(chain->m_NextChain, m_Data->m_Allocator);
                chain->m_NextChain = null;
            }
cloud shadow
#

I debugged this a bunch previously. Can you benchmark in an actual build?

#

I found it significantly faster without safety enabled.

#

I think I went from 0.05-0.1ms per dispose to 0.01ms

#

That said you do seem to have a lot of command buffers running per frame

#

Side note: what you describe isn't really the standard way of doing this anymore, since it doesn't work with ISystem. I would suggest the singleton way is standard now (and there's no need for AddJobHandleForProducer)
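
A minimal sketch of the singleton pattern mentioned here, assuming Entities 1.0 and the built-in `EndSimulationEntityCommandBufferSystem`. The `Lifetime` component, `ExpireSystem` and `ExpireJob` are hypothetical names used only for illustration.

```csharp
using Unity.Burst;
using Unity.Entities;

// Hypothetical component, used only for this illustration.
public struct Lifetime : IComponentData { public float Seconds; }

[BurstCompile]
public partial struct ExpireSystem : ISystem
{
    [BurstCompile]
    public void OnUpdate(ref SystemState state)
    {
        // Fetching the singleton ties this system's jobs to the ECB system,
        // so no AddJobHandleForProducer call is made here.
        var ecb = SystemAPI
            .GetSingleton<EndSimulationEntityCommandBufferSystem.Singleton>()
            .CreateCommandBuffer(state.WorldUnmanaged);

        new ExpireJob { Ecb = ecb, DeltaTime = SystemAPI.Time.DeltaTime }.Schedule();
    }
}

[BurstCompile]
public partial struct ExpireJob : IJobEntity
{
    public EntityCommandBuffer Ecb;
    public float DeltaTime;

    void Execute(Entity entity, ref Lifetime lifetime)
    {
        lifetime.Seconds -= DeltaTime;
        if (lifetime.Seconds <= 0f)
            Ecb.DestroyEntity(entity); // played back later by the ECB system
    }
}
```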

lament gyro
#

having a ton of systems doing a ton of stuff is the #1 time where isystem will help

#

it won't help with the ecb dispose time, though

#

i don't think, anyway

lament marsh
#

My systems generally don't do anything except kick off an IJobEntity job though, I'm skeptical I'll get any substantial performance gains

#

I suppose I can do a small sample and do some profiling to get an idea of how much it will help

lament gyro
#

you can save a fair amount of managed job scheduling overhead by bursting your update

lucid dust
#

We've updated to 2022.2.21f1 and ECB's are still taking a long time to dispose, even in release mode. According to Visual Studio, 1/12 of the mainthread time is going to ECB Dispose calls.

We've converted 137 of our systems to ISystem (with 99 still left on SystemBase), but the time going to Dispose hasn't decreased at all.

Any way to eliminate this overhead?

cloud shadow
#

how many ecbs are you creating a frame to start with?

fossil torrent
#

Out of curiosity, are you using ECB commands that target EntityQuery and/or NativeArray<Entity> in these command buffers?

lucid dust
# cloud shadow how many ecbs are you creating a frame to start with?

I'd say it's probably somewhere in the realm of ~100? I'm going to start measuring it: both how many buffers and how many operations they've queued. Just trying to analyze if there are a lot of updates or something. I'd say ~50% of our jobs use a single command buffer; the rest don't use a command buffer at all.

The pattern we have in code is:
Inside OnUpdate(SystemState) we create a command buffer and pass it to a single job that we spawn as part of the update. The job then uses the command buffer and generally updates an entity or two with AddComponent() or SetComponent()

I don't think we target EntityQuery or NativeArray<Entity> in the buffers--but I'm not sure about every usage in the code. Is there a specific method you're concerned about, @fossil torrent ?

#

41 of the jobs declare a EntityCommandBuffer and 63 of them declare a EntityCommandBuffer.ParallelWriter, so 100 is probably really close. That said, many jobs only run if there are matching entities, so that'd be an upper limit--in the case of a base with a lot of the various elements.

lucid dust
#

Is it bad that we're creating 1 command buffer per job (that needs them) on every frame? I thought the docs advised against reusing them.

#

(sorry for the delay getting back-I didn't see your messages and was away for a couple days)

fossil torrent
#

Any of the ECB commands targeting EntityQuery are (currently) far slower than you'd possibly expect them to be, and internally generate a potentially large array of entities to cleanup under the hood. But a) that's unlikely to be the problem here; that would show up as a spike during recording and/or playback, not dispose. and b) FWIW now that I've scared you away from using these commands, I've actually just merged a fix for them which should show up in a future Entities hotfix.

#

> Is it bad that we're creating 1 command buffer per job (that needs them) on every frame? I thought the docs advised against reusing them.
No, that's the expected usage. You definitely don't want to have multiple parallel jobs writing to the same command buffer; their sortIndex values would likely stomp all over each other and you'd get weird/unexpected playback behavior. Multiple threads within the same job is fine though; that's what the EntityCommandBuffer.ParallelWriter interface is for.
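
As a rough sketch of that usage, assuming Entities 1.0: one ECB is created for one job, and the job records through `EntityCommandBuffer.ParallelWriter` with a per-chunk sort key. `Fuel`, `OutOfFuel`, `BurnFuelSystem` and `BurnFuelJob` are hypothetical names.

```csharp
using Unity.Burst;
using Unity.Entities;

// Hypothetical components, used only for this illustration.
public struct Fuel : IComponentData { public float Remaining; }
public struct OutOfFuel : IComponentData { }

[BurstCompile]
public partial struct BurnFuelSystem : ISystem
{
    [BurstCompile]
    public void OnUpdate(ref SystemState state)
    {
        // One command buffer for this one job; it is not shared with other jobs.
        var ecb = SystemAPI
            .GetSingleton<EndSimulationEntityCommandBufferSystem.Singleton>()
            .CreateCommandBuffer(state.WorldUnmanaged);

        new BurnFuelJob
        {
            Ecb = ecb.AsParallelWriter(),
            DeltaTime = SystemAPI.Time.DeltaTime
        }.ScheduleParallel();
    }
}

[BurstCompile]
[WithNone(typeof(OutOfFuel))]
public partial struct BurnFuelJob : IJobEntity
{
    public EntityCommandBuffer.ParallelWriter Ecb;
    public float DeltaTime;

    // The chunk index serves as the sort key, so playback order is
    // deterministic across the worker threads of this single job.
    void Execute([ChunkIndexInQuery] int sortKey, Entity entity, ref Fuel fuel)
    {
        fuel.Remaining -= DeltaTime;
        if (fuel.Remaining <= 0f)
            Ecb.AddComponent<OutOfFuel>(sortKey, entity);
    }
}
```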

There's also "reusing" in the sense of "create an ECB with PlaybackPolicy.MultiPlayback, record it once, and play it back multiple times". That is supported & should work, but I'd be curious to hear if anybody's actually found a great reason to do so; it smells like a feature that was added speculatively rather than to address any particular use case.

#

Are you on the latest 1.0.10 release? The code you posted back in February from FreeChain() was changed a few months back to not be implemented recursively. I don't necessarily think that would affect dispose performance significantly, but let's at least make sure that we're looking at the same code while investigating (and that you're not running into a problem we've already fixed)

whole ermine
#

> Any of the ECB commands targeting EntityQuery are (currently) far slower than you'd possibly expect them to be
Meaning @fossil torrent, that it's more efficient to add the remove/add action entity by entity if you have a IJobEntity running with the same query ?

fossil torrent
#

> Meaning @CortAtUnity, that it's more efficient to add the remove/add action entity by entity if you have a IJobEntity running with the same query ?
Meaning that in 1.0.10 and earlier, ECB commands targeting an EntityQuery will literally just degenerate into applying the change entity-by-entity. I wouldn't suggest explicitly doing it entity-by-entity if you can avoid it; that just guarantees you'll always get the same slow behavior. Just sit tight and wait for a change in the next hotfix that will make the query-targeting commands much, much faster.

#

I'll do some profiling of ECB dispose() to see what could be causing the Resize() bottleneck. This is certainly surprising; ECBs definitely make several small allocations which must be freed individually (the abstraction of "a big flat list of commands" is very not representative of the actual implementation), but I still wouldn't expect Dispose to take more time than playback under any real circumstances.

fossil torrent
#

@viscid cedar @lucid dust Would it be possible to get a self-contained example of a system that records, plays back & disposes a command buffer where the dispose is taking an unusually long time? I'm wondering if it has something to do with the specific commands recorded, or how they're recorded.

lucid dust
#

Lim has a different project. It's a reasonable ask and I'll work on it. I'm part time so it may take a bit.

#

Also, we are on an old entity version. I'll work on updating to the latest, but there's a backwards compatibility issue with entities that requires a mass update to our project. That's why we haven't upgraded... But I'll make that happen too.

fossil torrent
#

I whipped up a quick perf test (100x ECBs, containing 1000 CreateEntity commands recorded on the main thread). Median times:

  • Record: 1590 microseconds
  • Playback: 1794 microseconds
  • Dispose: 43 microseconds

So, I don't think it's something, like, universally systemic or anything. I'd love to see a more specific repro.
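
For anyone wanting to try a similar measurement, here is a rough sketch. It is not the test used above: it times a single run with `Stopwatch` rather than taking a median, and `EcbTimingSketch` is a hypothetical helper name.

```csharp
using System.Diagnostics;
using Unity.Collections;
using Unity.Entities;

public static class EcbTimingSketch
{
    public static void Run(int bufferCount = 100, int commandsPerBuffer = 1000)
    {
        // Throwaway world so the created entities do not touch the game world.
        using var world = new World("EcbTimingSketch");
        var buffers = new EntityCommandBuffer[bufferCount];

        // Record on the main thread.
        var timer = Stopwatch.StartNew();
        for (int i = 0; i < bufferCount; i++)
        {
            var ecb = new EntityCommandBuffer(Allocator.TempJob);
            for (int c = 0; c < commandsPerBuffer; c++)
                ecb.CreateEntity();
            buffers[i] = ecb;
        }
        double recordUs = timer.Elapsed.TotalMilliseconds * 1000.0;

        // Play back every buffer.
        timer.Restart();
        for (int i = 0; i < bufferCount; i++)
            buffers[i].Playback(world.EntityManager);
        double playbackUs = timer.Elapsed.TotalMilliseconds * 1000.0;

        // Dispose every buffer.
        timer.Restart();
        for (int i = 0; i < bufferCount; i++)
            buffers[i].Dispose();
        double disposeUs = timer.Elapsed.TotalMilliseconds * 1000.0;

        UnityEngine.Debug.Log(
            $"record {recordUs:F0} us, playback {playbackUs:F0} us, dispose {disposeUs:F0} us");
    }
}
```

Note that these buffers are created directly with `new EntityCommandBuffer(...)`, so the numbers do not include any cost from going through an ECB system.
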
lucid dust
#

Yeah, that's definitely not what the visual studio profiler is saying. So I agree that something is fishy.

reef charm
#

Profiler can sometimes be deceptive, e.g. in the attached snapshot it is NOT the "BoatTransformSystem_MY" that takes up 224ms (it actually only takes 0.08ms per frame). I think it is also not the "AISCoreSpawnerjob" (self is 2.5ms); rather, ExecuteJobFunction.Invoke() is waiting 224ms for something (no idea what: 2 seconds before and after this job the average frame time is ~2ms)

#

could something like this be at play here?

lament gyro
#

i use superluminal in situations like this, and it usually finds the culprit. but also we're researching some ways to surface this kind of thing inside unity in future versions

fossil torrent
#

It's a good point about the profiler making functions look slow when actually waiting for some long-running job to finish. In this case though, ECB.Dispose() should not be running concurrently with any DOTS jobs (it's already synced on running jobs before initiating ECB playback). It's conceivable that some non-DOTS job could be running in the background and gumming up the allocator performance, but I wouldn't think that would be consistent enough to show up in a profiler capture.

fossil torrent
#

Is that the correct ticket? That one looks like a GPU perf issue that turned into a feature request for the graphics package.

fossil torrent
#

Thank you, that's more like it! I'll take a look soon, probably next week.

lucid dust
#

@fossil torrent So... good news and bad news.

I've updated to the latest entities (1.0.0-pre.65 ) and the performance is unchanged. Updated Entities, Physics, and Entities Graphics.

To try to find the culprit, I built some commands to enable and disable systems in game, and I got those working. Unfortunately, I don't see any one system that's tanking the performance.

Performance seems to linearly decline with each system that I enable, even though most systems aren't doing much in the mainthread--just launching a job. The profiler also doesn't show any outliers--it shows every system using 0% or 0.1% CPU. I'm getting around 210 FPS with all my systems enabled and 420 FPS with them all disabled.

Here's a typical OnUpdate():

    [BurstCompile]
    public void OnUpdate(ref SystemState systemState)
    {
      systemState.Dependency = new KrilloJob
      {
        AllRotationParameters = SystemAPI.GetComponentLookup<RotationParameters>(),
        LocalTransformLookup = SystemAPI.GetComponentLookup<LocalTransform>(),
        RotationSpeed = 800
      }.ScheduleParallel(_entityQuery, systemState.Dependency);
    }

Updating entities took lots of time, so I just started my analysis. I will try to get you a specific repro--but nothing is jumping out just yet.

#

Okay, I think I have a reproducible testcase. I did the following in the OnUpdate() linked above and just enabling that one system dropped FPS from 400 to 6:

    [BurstCompile]
    public void OnUpdate(ref SystemState systemState)
    {
      for (int x = 0; x < 10000; x++)
      {
        var cb = SystemAPI.GetSingleton<AsteroPreTransformSimulationSystemBufferSystem.Singleton>()
          .CreateCommandBuffer(systemState.WorldUnmanaged).AsParallelWriter();
      }
    }

Removing the .AsParallelWriter() makes the FPS drop MUCH less pronounced--it only drops to 181 FPS (instead of 6).

So I think a key to the performance issue is you have to use ParallelWriter ECB's. Not sure if your quick perf test did that...

#

@fossil torrent ^

#

(Note: I didn't do anything with the ECBs. I just created them.)

lucid dust
#

Uh... I just hit update in the package manager and that's what it took me to.

cloud shadow
#

yeah it's bugged

#

if you have a preview version installed it won't show the full versions

#

either manually edit manifest

lucid dust
#

2022.2.21f1

cloud shadow
#

or remove it from package manager

#

then it will appear

#

and you can install it

lucid dust
#

Ah. Okay. Good to know, thanks.

I'll have to update that later. Hopefully less stuff breaks this time around 🙂

lucid dust
#

Yeah, I'm on a slightly older Unity, but the Entities package is what really matters.

cloud shadow
#

(that's a completely unrealistic amount of command buffers...)

lucid dust
#

agreed. I was just trying to reproduce the scenario. But we are allocating at least 100/frame. So at 100 FPS, we're making approximately 10,000 ECB's/second

cloud shadow
#

we have 147 in our published application

#

and from experience, that is way too much tbh

#

it's a clear sign of poor architecture

cloud shadow
#

not from the performance side of it

#

but from a software architect side

lucid dust
#

@cloud shadow So I removed the packages, and the pre-release ones are still the only ones showing up.

cloud shadow
#

it means
a. you're doing a lot of structural changes, this is going to be worse performance than your command buffers
b. it's really hard to debug problems
c. timing issues will creep in over time causing really hard to track down bugs
d. a bunch of other stuff

lucid dust
#

(from the Unity registries)

cloud shadow
#

which will be much larger (and this project was pretty large...)

#

if you have 0 addcomponent, 0 removecomponent and only 1 destroy component in your project you really don't need a lot more ecbs ^_^'

lucid dust
#

@cloud shadow The pattern we have mostly is 1 ECB per system that possibly does updates back to entities in the gameworld. In a lot of cases, there's just no updates

#

(on any given frame)

cloud shadow
#

it still has to dispose it

#

the first line of playback is like, is empty? ignore

#

you can even make it faster if you want

#

there is a

#

ShouldPlayback

#

bool you can set to false

#

to early out like as fast as humanly possible

#
        void PlaybackInternal(EntityDataAccess* mgr)
        {
            EnforceSingleThreadOwnership();

            if (!ShouldPlayback || m_Data == null)
cloud shadow
#

if you know it's empty just set that ShouldPlayback to false

#

well if you're on a low end platform, you should really be aiming for like as few ecb as possible

#

having done a bunch of optimization for past-gen consoles, which are even slower than most mobiles, merging systems to reduce ECBs is a legit thing

#

and i imagine this is the type of thing you might need to do for mobile

viscid cedar
#

Ya I have plan to further reduce ecb as much possible but still I hope official can improve this

cloud shadow
#

but yes ShouldPlayback = false
if you want to early out of playback
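
A small sketch of that early-out, assuming the `ShouldPlayback` property is settable as described above. Commands are recorded on the main thread here so the system knows whether anything was recorded. `Health` and `ReapSystem` are hypothetical names.

```csharp
using Unity.Burst;
using Unity.Entities;

// Hypothetical component, used only for this illustration.
public struct Health : IComponentData { public float Value; }

[BurstCompile]
public partial struct ReapSystem : ISystem
{
    [BurstCompile]
    public void OnUpdate(ref SystemState state)
    {
        var ecb = SystemAPI
            .GetSingleton<EndSimulationEntityCommandBufferSystem.Singleton>()
            .CreateCommandBuffer(state.WorldUnmanaged);

        bool recordedAnything = false;
        foreach (var (health, entity) in
                 SystemAPI.Query<RefRO<Health>>().WithEntityAccess())
        {
            if (health.ValueRO.Value > 0f)
                continue;
            ecb.DestroyEntity(entity);
            recordedAnything = true;
        }

        // Nothing recorded: tell playback to return immediately.
        // The ECB system still disposes the buffer afterwards.
        if (!recordedAnything)
            ecb.ShouldPlayback = false;
    }
}
```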

viscid cedar
#

👀 I guess most developers will still ecb everywhere like a boss

lucid dust
#

I'll try to update to the latest non-prerelease entities and such tomorrow. There are a few bugs that it introduces that I'll have to go fix now.

cloud shadow
#

i don't know what's up with your results but it's much less noticeable for me

#

even on our work project where we create a lot, in il2cpp builds (in editor it's slow) empty ecb really isn't that expensive

lucid dust
#

We are using mono builds (but the affected methods are bursted)

#

So that should make IL2CPP not matter, right?

cloud shadow
#

and playback is only bursted once it starts walking the chains

lucid dust
#

Ah. Gotcha. Makes sense. We wanted to use IL2CPP (and it gives a small performance uplift), but it broke some of our code so we had to roll it back.

lucid dust
#

@fossil torrent I reran the test from my earlier message above and I'm getting 16-18 FPS on Entities 1.0.10 creating 10k Parallel ECB's per frame. So it appears that the latest entities is faster than 1.0.0-pre.65, but it still has a huge cost for ParallelWriter ECB's.

I'm getting ~315 FPS when I disable the creation of the 10k ParallelECB's on the latest entities.

shrewd violet
#

I had the same problem as you guys here. Tertle also suggested I use ECBs as little as possible. I tried it and it worked. One suggestion: if you can't get rid of the structural change in the job, you can use RequireForUpdate in OnCreate so the system doesn't create a parallel ECB every frame when it has nothing to do
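
A minimal sketch of that suggestion, assuming Entities 1.0. With `RequireForUpdate` in `OnCreate`, `OnUpdate` (and the ECB creation inside it) is skipped on frames where no matching entity exists. `SpawnRequest`, `SpawnSystem` and `SpawnJob` are hypothetical names.

```csharp
using Unity.Burst;
using Unity.Entities;

// Hypothetical request component, used only for this illustration.
public struct SpawnRequest : IComponentData { public Entity Prefab; }

[BurstCompile]
public partial struct SpawnSystem : ISystem
{
    [BurstCompile]
    public void OnCreate(ref SystemState state)
    {
        // Skip OnUpdate entirely while no SpawnRequest exists.
        state.RequireForUpdate<SpawnRequest>();
    }

    [BurstCompile]
    public void OnUpdate(ref SystemState state)
    {
        // Only reached on frames with at least one SpawnRequest entity.
        var ecb = SystemAPI
            .GetSingleton<EndSimulationEntityCommandBufferSystem.Singleton>()
            .CreateCommandBuffer(state.WorldUnmanaged)
            .AsParallelWriter();

        new SpawnJob { Ecb = ecb }.ScheduleParallel();
    }
}

[BurstCompile]
public partial struct SpawnJob : IJobEntity
{
    public EntityCommandBuffer.ParallelWriter Ecb;

    void Execute([ChunkIndexInQuery] int sortKey, Entity entity, in SpawnRequest request)
    {
        Ecb.Instantiate(sortKey, request.Prefab);
        Ecb.DestroyEntity(sortKey, entity); // consume the request
    }
}
```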

cloud shadow
#

(command buffers are like 5x faster in build to dispose)

shrewd violet
#

yeah, but it's quite confusing when we profile in the editor; we can't figure out what's really causing the performance problem

cloud shadow
#

oh yeah profiling in editor isnt ideal

lament marsh
#

But it's not like in an actual game the ECB systems are our bottleneck. It's just the disposes on the main thread. And this happens even in an empty map when a huge portion of our systems aren't even running

#

And in the wild FF performs pretty damn well for this stage. People are making large factories, spending 50+ hours on single saves. I'd be surprised if my architecture was so fundamentally bad that it basically has to be rewritten to get under this 20 ECB threshold.

lament marsh
#

I try to avoid structural changes as well. Most uses of ECB's are just updating data

#

then probably entity creation

fossil torrent
#

> Most uses of ECB's are just updating data
@lament marsh Can you elaborate on that? Internally we really only see three use cases for ECBs at runtime (in no particular order):

  1. Adding/removing components from a job, to represent entity state changes.
  2. Creating/instantiating new entities from a job.
  3. Destroying entities from a job.

For case 1, enableable components provide a workaround in many cases where this is a bottleneck, since they can be toggled immediately from job code.
For cases 2 and 3, we believe that a special-case solution tailored specifically for these use cases would have significantly higher performance than we can get from a general-purpose feature like the ECB.

Are there any ECB use cases not captured above that we should be considering?
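
To illustrate the enableable-component workaround for case 1, here is a hedged sketch assuming Entities 1.0. The state is switched off directly inside the job, with no command buffer and no structural change. `Burning`, `BurnTimer` and `BurnTimerJob` are hypothetical names.

```csharp
using Unity.Burst;
using Unity.Entities;

// Hypothetical state tag; enabling/disabling it is not a structural change.
public struct Burning : IComponentData, IEnableableComponent { }

// Hypothetical data component holding the remaining burn time.
public struct BurnTimer : IComponentData { public float Remaining; }

[BurstCompile]
public partial struct BurnTimerJob : IJobEntity
{
    public float DeltaTime;

    // Only entities whose Burning component is currently enabled are visited.
    void Execute(ref BurnTimer timer, EnabledRefRW<Burning> burning)
    {
        timer.Remaining -= DeltaTime;
        if (timer.Remaining <= 0f)
            burning.ValueRW = false; // takes effect immediately
    }
}

[BurstCompile]
public partial struct BurnTimerSystem : ISystem
{
    [BurstCompile]
    public void OnUpdate(ref SystemState state)
    {
        new BurnTimerJob { DeltaTime = SystemAPI.Time.DeltaTime }.ScheduleParallel();
    }
}
```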

#

Specifically, when you say "updating data", I'm imagining an ECB that is nothing but SetComponent commands, and that would strike me as a red flag.

lament marsh
# fossil torrent > Most uses of ECB's are just updating data @lament marsh Can you elabor...

Yes, there are a huge number of SetComponent commands in Final Factory's code base.

Practically speaking, designing your data in real world scenarios such that you always are only operating on the data being passed to the Execute function and only modifying data in the chunk on the archetype's components is extremely difficult. In most cases, there just aren't performance concerns anyway.

In Final Factory's hot path systems, where performance is critical, I do my best to not use ECB's in this way, but otherwise, I'm not taking special care to avoid them due to the architectural complexity (I might just be bad at ECS data design though!)

Some other notes:

For (1), I started this project several years ago before enableable components, so I end up just using booleans on components and exiting the Execute functions right away if necessary in relevant systems most of the time, rather than Add or Removing components due to the performance ramifications of structural changes.

For 2 and 3, I have yet to run into performance issues with deleting or creating entities, even though FF potentially creates and deletes thousands of entities every tick on big factories.

fossil torrent
#

Totally understood re: enableable components, that's a big change to make for a project that's already in development for years.
Would modifying the data through ComponentLookup be an option instead of SetComponent commands?
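
A sketch of what that could look like, assuming Entities 1.0: the job writes to another entity's component through `ComponentLookup<T>` instead of recording a SetComponent command. The job is scheduled with `Schedule()` (a single worker thread) so that writes to arbitrary entities cannot race. `Health`, `Target`, `DamagePerSecond` and the system and job names are hypothetical.

```csharp
using Unity.Burst;
using Unity.Entities;

// Hypothetical components, used only for this illustration.
public struct Health : IComponentData { public float Value; }
public struct Target : IComponentData { public Entity Value; }
public struct DamagePerSecond : IComponentData { public float Value; }

[BurstCompile]
public partial struct DamageTargetJob : IJobEntity
{
    public ComponentLookup<Health> HealthLookup; // read/write access
    public float DeltaTime;

    void Execute(in Target target, in DamagePerSecond damage)
    {
        if (!HealthLookup.HasComponent(target.Value))
            return;

        // Direct write to the target entity; no command buffer involved.
        var health = HealthLookup[target.Value];
        health.Value -= damage.Value * DeltaTime;
        HealthLookup[target.Value] = health;
    }
}

[BurstCompile]
public partial struct DamageTargetSystem : ISystem
{
    [BurstCompile]
    public void OnUpdate(ref SystemState state)
    {
        new DamageTargetJob
        {
            HealthLookup = SystemAPI.GetComponentLookup<Health>(),
            DeltaTime = SystemAPI.Time.DeltaTime
        }.Schedule();
    }
}
```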

lament marsh
#

There's probably a lot of low hanging architectural fruit to fix up so I can reduce the need for this.

I've mostly just been alone figuring this stuff out for almost 5 years, except for some forum posts and this discord. I'm sure if Unity devs or someone like Tertle could see the code base all kinds of mistakes could be pointed out 😅

fossil torrent
#

There's a special place in my heart for the early adopters 🙂 A lot of our work is naturally driven by making the current/future versions better, and it's not always easy to make sure existing users can benefit from that work as well. It's not that we don't appreciate you though, and we'll help when we can!
It sounds like you're doing the best you can under the circumstances with ECBs. At least being aware of the cost & avoiding them when they're not necessary where perf is a concern is 90% of the battle.

lucid dust
#

We can probably remove a modest number of them. I've been helping slims with a lot of the mass mechanical changes (like the effort to update almost every one of our 200 systems' queries to adopt the latest entities package, due to new validation rules).

#

That said, is there anything that can be done, especially with respect to the performance of the parallel writers ecbs? They're about 100x slower than the nonparallel

#

(and were you able to reproduce the reduction I posted?)

#

I think the key thing for us with ECBs is that they need to be created and torn down on every frame, just in case the Execute needs one, as long as there are entities matching the query. Even an optimization to lazily create them would help tremendously.

#

Now that we know that they're expensive we could look at potentially sharing the ECBs across all systems in a system group but that will be a huge lift and potentially introduce race conditions.

fossil torrent
# lucid dust Okay, I think I have a reproducible testcase. I did the following in the OnUpda...

This was a useful hint; calling EntityCommandBuffer.AsParallelWriter() adds 128 new memory allocations to the ECB right off the bat, each of which must later be freed even if nothing is written to them. This is to ensure that all worker threads have their own thread-local buffer to record commands into, without requiring any inter-thread synchronization. Just adding a call to AsParallelWriter in my perf test bumps the median time to dispose 100 ECBs from 46 microseconds to 200 microseconds -- still not in the same ballpark as recording/playback, but a significant increase nonetheless. I can look into lazily allocating these buffers until they're actually needed; that shouldn't be too hard.
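
The difference between eager and lazy per-thread allocation can be shown with a toy model in plain C#. This is not Unity's implementation: `ToyCommandBuffer` and `ToyDemo` are hypothetical, and only the count of 128 per-thread buffers is taken from the description above.

```csharp
using System;

// Toy stand-in for a command buffer with per-thread storage.
public sealed class ToyCommandBuffer
{
    public const int ThreadSlots = 128; // per-thread buffer count quoted above
    private readonly int[][] _perThread = new int[ThreadSlots][];

    public int AllocationCount { get; private set; }

    public ToyCommandBuffer(bool eager)
    {
        if (!eager)
            return;
        // Eager: allocate every thread's buffer up front, used or not.
        for (int slot = 0; slot < ThreadSlots; slot++)
            Allocate(slot);
    }

    public void Record(int slot, int command)
    {
        // Lazy: allocate a thread's buffer on its first recorded command.
        if (_perThread[slot] == null)
            Allocate(slot);
        _perThread[slot][0] = command;
    }

    private void Allocate(int slot)
    {
        _perThread[slot] = new int[4];
        AllocationCount++;
    }
}

public static class ToyDemo
{
    // Total allocations (each needing a later free) across many buffers.
    public static int TotalAllocations(int bufferCount, bool eager, int recordsPerBuffer)
    {
        int total = 0;
        for (int i = 0; i < bufferCount; i++)
        {
            var buffer = new ToyCommandBuffer(eager);
            for (int r = 0; r < recordsPerBuffer; r++)
                buffer.Record(r % ToyCommandBuffer.ThreadSlots, r);
            total += buffer.AllocationCount;
        }
        return total;
    }

    public static void Main()
    {
        Console.WriteLine(TotalAllocations(100, eager: true, recordsPerBuffer: 0));  // 12800
        Console.WriteLine(TotalAllocations(100, eager: false, recordsPerBuffer: 0)); // 0
        Console.WriteLine(TotalAllocations(100, eager: false, recordsPerBuffer: 2)); // 200
    }
}
```

In this model, 100 empty buffers cost 12,800 allocations when allocated eagerly and none when allocated lazily.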

I also don't have Burst enabled in my test; I should certainly do that to get a better sense of how these times compare in practice.

I wonder if looking up the singleton for each iteration of the loop is a significant part of the hit you're seeing. Can you try hoisting that outside the for loop & see how it affects your timing?
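
For reference, a hoisted variant of the repro might look like the sketch below. It uses the built-in `EndSimulationEntityCommandBufferSystem` as a stand-in for the project-specific ECB system, and `HoistedReproSystem` is a hypothetical name.

```csharp
using Unity.Burst;
using Unity.Entities;

[BurstCompile]
public partial struct HoistedReproSystem : ISystem
{
    [BurstCompile]
    public void OnUpdate(ref SystemState systemState)
    {
        // Look up the singleton once, outside the loop.
        var singleton = SystemAPI
            .GetSingleton<EndSimulationEntityCommandBufferSystem.Singleton>();

        for (int x = 0; x < 10000; x++)
        {
            // Only buffer creation and AsParallelWriter remain in the loop.
            var cb = singleton
                .CreateCommandBuffer(systemState.WorldUnmanaged)
                .AsParallelWriter();
        }
    }
}
```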

viscid cedar
# fossil torrent There's a special place in my heart for the early adopters 🙂 A lot of our work ...

🤔 Actually, even for ECS newcomers, I think it's still better to improve ECBs so people can use them like a boss. I believe 90%+ of Unity developers will just spam a lot of ECBs and then regret it. From what I can see there are 2 ways to fix it, but I'm not sure how hard the solutions are to implement:

  1. Make it possible to create an ECB inside a job, instead of only on the main thread as is required now. Only ECB recording would happen off the main thread; ECB playback would still be on the main thread.
  2. Completely skip playback and dispose for an ECB that did nothing, i.e. an empty ECB with no create entity/add component/remove component/destroy entity commands.
lucid dust
# fossil torrent This was a useful hint; calling `EntityCommandBuffer.AsParallelWriter()` adds 12...

I could, but it wouldn't represent what we're doing in our systems.

For each system that needs a ECB, we get the singleton and then create the command buffer.

Assuming 100 systems needing ECBs at 100 fps, this means 10,000 iterations of the snippet I sent you per second.

Also, if I change to a nonparallel writer, FPS jumped from 6->180 without changing the singleton get... So I think it's not the problem.

Is your test using IL2CPP or mono?

fossil torrent
#

Mono, it's an in-editor perf test

lucid dust
#

Okay. That was what I was using too.

fossil torrent
#

but I've just added Burst

lucid dust
#

But I was in release mode using burst for the create.

I think the dispose and playback are not bursted, though?

#

I saw basically that the snippet I sent took 160ms (for the 10k iterations)

#

Sounds like somehow on your system it's roughly 8x faster?

#

Oh... Yeah, on the latest entities I saw a 3x speedup. So now you're... 2.5x faster

#

My CPU is a 7945hx, btw.

lucid dust
#

Should be significantly slower than my CPU

lucid dust
#

Cort, If you want, I'll also offer to get into a virtual call if you want me to give a short presentation into what I've found takes overhead time in Unity.

fossil torrent
#

100 empty ECBs, median for 1000 tests, from a Burst-compiled function

create + call AsParallelWriter: 128 usec
dispose: 136 usec

#

without AsParallelWriter, that drops to 26 usec and 20 usec respectively

#

That's creating all command buffers inline with new EntityCommandBuffer(), not going through an ECB system.
Wait, ECB systems are still SystemBase? facepalm Okay so that's a mistake.

lucid dust
#

Not only systembase but a huge amount of boilerplate

#

Not sure why the create and dispose are SO much faster for you than me...

#

That's 26ms for 10,000

#

We had to basically copypasta for each ecbsystem from some of the unity ecbsystems

fossil torrent
#

That's probably why, I'm creating & disposing them inline in Burst instead of through an ECB system, which would be a much more realistic benchmark. And the ECB system dispose would not be using Burst, because it's a SystemBase. Playback is using Burst internally, even if you call it from Mono, so that's not an issue. But it would definitely amplify any unusual Dispose costs.

lucid dust
#

That is correct. Dispose is NOT bursted.

#

(I was trying to say that earlier). 🙂

fossil torrent
#

That we can fix. Let me update the benchmark to use ECB systems like a real application has to.

lucid dust
#

Sounds good.

Like I said, happy to get on a chat if you desire to show you what I'm seeing, if it would help.

Thanks for all your attention on this. Bursted dispose outside main thread would be amazing.

And having the ECB lazily instantiate... Possibly as part of the framework --not even bespoke code-- would be even better.

#

Command buffer creation is basically expensive boilerplate code that... If you don't match the ECB system with the system group will introduce subtle bugs. Ideally it should be all framework hidden.

fossil torrent
#

Yeah, this thread has already yielded a couple solid ideas for improving both create & dispose time. I just want to make sure I can reproduce the original perf issue first, so I can be confident it's adequately addressed

#

10K empty ECBs, created w/AsParallelWriter in a Burst-compiled ISystem through an ECB system, using the ECB system's update to playback and dispose, median of 100 tests

create: 12.2ms
playback and dispose: 28.83 ms

#

Same test without AsParallelWriter:

create: 2.55ms
playback and dispose: 9.08 ms

#

So there's a baseline. I'll get to work optimizing from there and report back.

#

> Like I said, happy to get on a chat if you desire to show you what I'm seeing, if it would help.
I think I have enough to go on for now with the improvements already proposed; if that doesn't end up making a dent in this benchmark, I'll take you up on that

fossil torrent
#

What's hilarious is that ECBs allocated from an ECBSystem use a rewindable linear allocator, not a general-purpose allocator. So all these individual Dispose() calls are basically no-ops; all the memory for the whole frame is "freed" at once once per frame when the buffer is rewound and recycled. All the cost here is just iterating through lists of 128*10,000 individual allocations to call a no-op function on each one. double facepalm

lucid dust
#

Yup. You can see a massive number of array resizes in the profiler.

#

Your times look correct now.

#

You're gonna add 100fps to FF if you fix the problem you just mentioned. 🙂

fossil torrent
#

Camera performance? Not the DOTS forums, in that case; unless it's some specifically DOTS-related interaction

lucid dust
#

Nope. Just didn't know if you knew the best place to provide feedback for that.

fossil torrent
#

Sorry, I don't. I live in a comfortable DOTS-shaped hole and know very little about how the rest of the company gets stuff done.

lucid dust
#

Well thanks for your time. Pretty sure you've got a handle on the underlying issue now. Ping me if you need any extra help and I'll be sure to respond!

#

Again, I'd love it if ECBs could be built into the framework and lazy instantiated so we don't have to create them at all!

fossil torrent
#

Yeah, I've got several solid ideas for improvements from this thread today:

  • Lazily allocate each thread's first command block in AsParallelWriter
  • Skip individual deallocations for an ECB's command blocks if we know the allocator will auto-dispose them
  • Make ECB Systems an ISystem (or at least make their OnUpdate burst-compiled)
  • Move ECB disposal to an async IJob

I expect that the first two items will already provide such a huge benefit that the last two may not be necessary, but we'll see!
lucid dust
#

Keep in mind that, unless we missed something, ECB systems have to be bespoke created by us to support iSystem... About 50-100 lines of boilerplate per ECB system

cloud shadow
#

I actually wrote an isystem ecbs in 0.51, was interesting

#

But it kind of breaks managed operations so surprised it can be considered

#

Just a side note Lothsahn: obviously you don't want to refactor an existing project, but there are alternatives to an ECB for setting data.
It's faster to just write to, say, a native queue and read it back in a separate IJob and apply it. You're doing the exact same work that an ECB would do in playback, but you get to do it on a separate thread with no sync point.
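
A hedged sketch of that queue-based alternative, assuming Entities 1.0 and Unity.Collections. It only covers setting component data; structural changes still need an ECB or the EntityManager. `Health`, `Attack`, `DamageEvent` and the job and system names are hypothetical.

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Entities;
using Unity.Jobs;

// Hypothetical types, used only for this illustration.
public struct Health : IComponentData { public float Value; }
public struct Attack : IComponentData { public Entity Target; public float Damage; }
public struct DamageEvent { public Entity Target; public float Amount; }

[BurstCompile]
public partial struct RecordDamageJob : IJobEntity
{
    public NativeQueue<DamageEvent>.ParallelWriter Events;

    void Execute(in Attack attack)
    {
        Events.Enqueue(new DamageEvent { Target = attack.Target, Amount = attack.Damage });
    }
}

[BurstCompile]
public struct ApplyDamageJob : IJob
{
    public NativeQueue<DamageEvent> Events;
    public ComponentLookup<Health> HealthLookup;

    public void Execute()
    {
        // Drains the queue on a worker thread; no main-thread sync point.
        while (Events.TryDequeue(out var e))
        {
            if (!HealthLookup.HasComponent(e.Target))
                continue;
            var health = HealthLookup[e.Target];
            health.Value -= e.Amount;
            HealthLookup[e.Target] = health;
        }
    }
}

[BurstCompile]
public partial struct DamageSystem : ISystem
{
    NativeQueue<DamageEvent> _events;

    public void OnCreate(ref SystemState state)
    {
        _events = new NativeQueue<DamageEvent>(Allocator.Persistent);
    }

    public void OnDestroy(ref SystemState state)
    {
        state.CompleteDependency(); // make sure no job still uses the queue
        _events.Dispose();
    }

    [BurstCompile]
    public void OnUpdate(ref SystemState state)
    {
        var record = new RecordDamageJob { Events = _events.AsParallelWriter() }
            .ScheduleParallel(state.Dependency);

        state.Dependency = new ApplyDamageJob
        {
            Events = _events,
            HealthLookup = SystemAPI.GetComponentLookup<Health>()
        }.Schedule(record);
    }
}
```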

fossil torrent
#

Okay, good news: a one-line change in EntityCommandBuffer.Dispose() makes most of this problem go away. Change

if (m_Data != null)

to

if (m_Data != null && !m_Data->m_Allocator.IsAutoDispose)

that dropped my playback-and-dispose time to 7.8ms, whether AsParallelWriter is used or not.

lucid dust
#

New release plz! 😉

fossil torrent
#

We're churning out hotfixes every few weeks at this point, you won't have to wait long

#

Go ahead and try it locally though

lucid dust
#

I don't have access to unity sources, I don't think...

fossil torrent
#

it's in the package source

#

you'll need to use a local copy of the package if you're not already; if so, don't worry about it, just sit tight and we'll fix it officially soon enough. The package manager gets angry if you modify package sources behind its back, and reverts them.

lucid dust
#

Just using package manager for now.

fossil torrent
#

Lazily allocating the per-thread command blocks is totally orthogonal to this fix and will likely affect ECB create time more than dispose, but I'd expect it would pretty much eliminate the cost of AsParallelWriter entirely. So in my benchmark you'd get something very close to no-parallel-writers 2.55ms instead of 12.2ms

lucid dust
fossil torrent
icy moth
whole ermine
fossil torrent
lucid dust
#

Thanks so much, Cort... Not just for the fix, but for the community involvement and great interaction. Literally some of the best tech support I've gotten in my many years as a dev.

fossil torrent
#

For completeness: we can't easily make ECBSystems into ISystems because of the lack of inheritance for structs. But we can easily wrap the buffer.Dispose() calls in a Burst-compiled function; that knocks the playback-and-dispose time for 10K empty ECBs from 7.8ms to 6.4ms.
I think that's enough to merge for now. Moving the dispose work into a separate thread is worth exploring, but less likely to be a clear win; it'll depend on whether each ECB system has enough ECBs queued up to justify the cost of scheduling a job vs. just disposing them immediately. And of course disposal is much faster now anyway.

fiery valve
lament gyro
#

use SystemAPI.GetSingleton<yourECBSystem.Singleton>().CreateCommandBuffer() and it does it for you

#

(and works from bursted isystems)
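
A minimal sketch of the singleton approach, assuming Entities 1.0, where (as far as I recall) the singleton's `CreateCommandBuffer` takes the `WorldUnmanaged` as an argument. The `Lifetime` component, the job, and the system are hypothetical names for illustration.

```csharp
using Unity.Burst;
using Unity.Entities;

// Hypothetical component, for illustration only.
public struct Lifetime : IComponentData { public float Remaining; }

[BurstCompile]
public partial struct DestroyExpiredJob : IJobEntity
{
    public EntityCommandBuffer.ParallelWriter Ecb;

    void Execute([ChunkIndexInQuery] int sortKey, Entity entity, in Lifetime lifetime)
    {
        if (lifetime.Remaining <= 0f)
            Ecb.DestroyEntity(sortKey, entity);
    }
}

public partial struct DestroyExpiredSystem : ISystem
{
    public void OnCreate(ref SystemState state) { }
    public void OnDestroy(ref SystemState state) { }

    [BurstCompile]
    public void OnUpdate(ref SystemState state)
    {
        // Fetching the singleton registers this system as a reader of it.
        // No cached ECB system reference and no AddJobHandleForProducer call.
        var ecb = SystemAPI
            .GetSingleton<EndSimulationEntityCommandBufferSystem.Singleton>()
            .CreateCommandBuffer(state.WorldUnmanaged);

        state.Dependency = new DestroyExpiredJob { Ecb = ecb.AsParallelWriter() }
            .ScheduleParallel(state.Dependency);
    }
}
```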

whole ermine
#

does it work with SystemBase too? or is it just for ISystem?

lament gyro
#

both

fiery valve
lament gyro
#

eh? no?

#

i don't think there's any gotchas to using it. just create your ecb's that way, schedule whatever jobs against them from the same system, and the dep manager will take care of it

fiery valve
#

I'll dig into the codegen to understand how that works then, thanks

lament gyro
#

the codegen is just a distraction. the way it works is that when you say GetSingleton, your system gets added as a "reader" of the ecb system's singleton component. the ecb system will then mark itself as a "writer" of that same component, and then call complete all jobs before flushing its list of pending buffers. this will complete jobs scheduled by your system

#

it's kind of an abuse of the dependency manager, but it works great for this particular case

#

the key thing about it is that because you're calling GetSingleton instead of GetSingletonRW, multiple ECB jobs can run in parallel against multiple ECBs created against the same system's singleton, because they're marked as "readers"

#

even though they're actually writing to the ecb inside the thing

fiery valve
#

I see, thanks!

I wouldn't go as far as to say codegen is just a distraction when it contains so many crucial implementation details, but you did save me some time there 😄

lament gyro
#

yeah fair enough. it was just gonna be a bit of a wild goose chase in this case

#

and boy howdy does it make things less verbose

round moon
cloud shadow
#

And that's what you need to get to your type of scale

lucid dust
#

@cloud shadow btw, thanks for all your suggestions. We'll definitely look into it, but we're in crunch time for release. That said, I think with the improvement Cort did, it won't be so necessary to reduce the number of ECB 🙂

#

But hopefully we can just avoid ECB's entirely by making our own nativearrays as you suggested 🙂

cloud shadow
#

good luck with your release!

lucid dust
#

@fossil torrent Looked up how to do manual packages and did so. I can confirm that just your one dispose fix takes our game from 260 FPS to 330 FPS. Huge improvement!

shrewd violet
lament gyro
#

we're not supposed to quote dates because we're always wrong, but reasonably soonish

shrewd violet
#

got it, looking forward to a big change from the ECB
I have another related question: I tried to disable ECBs by using ShouldPlayback, but even when every ECB is turned off, the ECB system still waits for other jobs. Is this normal?

fossil torrent
#

Yeah, what Elliot said. Real Soon Now(tm).
I can confirm that the change has been made & pushed to our main repository, and I'm working on getting it into the release branch for the next Entities 1.0 hotfix. Whether it makes it there or not, or when that hotfix is actually published depends on many factors outside my control; QA could decide it's too risky to rush in without further testing, or something actually breaks & it needs to be reverted for more iteration, or something else unrelated breaks & delays the whole release while we sort that out. You know how software goes.

fossil torrent
# shrewd violet got it, looking forward to a big change from the ECB I have another question rel...

EntityCommandBufferSystem.OnUpdate() calls a method FlushPendingBuffers() that handles playing back all the enqueued command buffers. The first thing that method does is call CompleteDependency() (which handles waiting for jobs from systems that use the recommended ECB system singleton approach) and then m_ProducerHandle.Complete() (which handles waiting for jobs from systems that use the old AddJobHandleForProducer() mechanism). From your screenshot, this looks like the latter case.
So, it's certainly expected that we'd still see a call to these methods. If there are literally zero jobs registered as input dependencies for the ECB system, I'd expect both calls to early out almost immediately; the fact that that's not happening tells me that there's still ECBs from somewhere being registered against this ECB system, even if your code isn't doing so. Possibly from within DOTS itself?

viscid cedar
lament gyro
#

we haven't thought of the magic yet, anyway 🙂

#

if after cort's optimizations, somebody comes to us with a profile where they converted an ecb system to isystem and it went way faster, we'd be more likely to prioritize it, but atm i think there are bigger fish to fry (like the completedependency() thing)

viscid cedar
lament gyro
#

yup. big project

viscid cedar
lament gyro
#

lol i'll get right on that

#

(jk, these are fundamental issues and also i'm not totally convinced that such a phone would be capable of running rollback netcode even if we did everything perfectly. but it should get somewhat better in the future)

shrewd violet
lament gyro
#

it's a really excellent question. if you're willing to try something, can you see what happens if you put the line if (m_PendingBuffers.Length == 0) return; at the top of FlushPendingBuffers() in Packages/com.unity.entities/Unity.Entities/EntityCommandBufferSystem.cs ? (note you'll have to internalize the package to try it, but it might be worth it if it eliminates the stall)

#

this will fix it if there are actually 0 pending ecb's for this system on a given frame, but will still stall if there's even 1. the latter problem is rather more complicated to fix.

whole ermine
# fossil torrent Yeah, what Elliot said. Real Soon Now(tm). I can confirm that the change has bee...

Hey Cort,
I think there is a hole with record atPlayback.

I was blocked by a simple case because with one Add/RemoveComponent request you can only modify one component.

Example: I have a system that takes entities with FooComponent and no (WorkComponent & CooldownComponent), launches jobs on those entities, and adds WorkComponent and CooldownComponent via a query command.

When the cooldown is done, CooldownComponent is removed, so entities with FooComponent & WorkComponent are considered done for this system.

The problem is that if I add WorkComponent and CooldownComponent with two separate commands against my initial query, then after the first command no entities match the query anymore, so the second command has nothing to apply to.

Can we have an Add/RemoveComponent<T,T1,T2,T3> to handle that kind of case?

cloud shadow
#

There should already be a version that takes ComponentTypes (plural)

whole ermine
#

oh indeed, with the ComponentTypeSet overload 👍
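
A minimal sketch of the ComponentTypeSet overload, assuming the Entities 1.0 signature that takes an `EntityQueryCaptureMode`. The component names come from the example above, but their definitions here are placeholders, and the `WithNone` query is only one reading of "no (WorkComponent & CooldownComponent)".

```csharp
using Unity.Entities;

// Placeholder definitions for the components named in the example.
public struct FooComponent : IComponentData { public int Value; }
public struct WorkComponent : IComponentData { public int Value; }
public struct CooldownComponent : IComponentData { public float Remaining; }

public partial struct StartWorkSystem : ISystem
{
    EntityQuery m_Query;

    public void OnCreate(ref SystemState state)
    {
        m_Query = SystemAPI.QueryBuilder()
            .WithAll<FooComponent>()
            .WithNone<WorkComponent, CooldownComponent>()
            .Build();
    }

    public void OnDestroy(ref SystemState state) { }

    public void OnUpdate(ref SystemState state)
    {
        var ecb = SystemAPI
            .GetSingleton<EndSimulationEntityCommandBufferSystem.Singleton>()
            .CreateCommandBuffer(state.WorldUnmanaged);

        // Both components go in with a single command, so the query is
        // evaluated once at playback and every matching entity gets both.
        var types = new ComponentTypeSet(
            ComponentType.ReadWrite<WorkComponent>(),
            ComponentType.ReadWrite<CooldownComponent>());

        ecb.AddComponent(m_Query, types, EntityQueryCaptureMode.AtPlayback);
    }
}
```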

viscid cedar
cloud shadow
#

The improvements were already pushed weren't they

#

From your stack isn't this just the sync point causing you issues?

#

The ECBs seem very fine here

viscid cedar
cloud shadow
#

Yeah I was wondering what the hell was up with that number

fossil torrent
#

The path from the "EntityCommandBuffer.Playback" marker to the "CompleteAllJobs" marker is through the internal method BeforeStructuralChange(), which (as the name implies) is called before a structural change occurs. Every EntityManager structural change includes a Begin/EndStructuralChanges pair, but in EntityCommandBuffer playback we try to batch multiple changes into a single begin/end pair to minimize overhead. If you're seeing 556 calls to CompleteAllJobs during the playback of a single ECB, it means that ECB was broken into 556 batches of structural changes.

#

At a slightly higher level, the method to grep for in EntityCommandBuffer.cs is CommitStructuralChanges(). This ends the current batch of structural changes and starts a new one, so that's where each "CompleteAllJobs" marker is coming from. I see 23 matches in my copy of EntityCommandBuffer.cs, mostly corresponding to the following command varieties:

  1. manipulating managed components
  2. commands that apply an operation to an EntityQuery
  3. commands that apply an operation to a NativeArray<Entity> (which variant #2 degenerates to if CaptureAtRecord semantics are active)
#

so @viscid cedar what commands are in the ECB that's calling CompleteAllJobs so many times?

viscid cedar
hollow edge
viscid cedar
hollow edge
#

and it's practically required for predicted spawning

#

otherwise you spawn multiple times

viscid cedar
#

@fossil torrent I believe the root cause of these 500+ JobHandle.Complete calls is that I have quite a lot of systems using IJobEntity, and since an ECB can't be created inside the job, each system has to create its command buffer on the main thread. So every frame, even when I do nothing, I still pay the cost of these ECBs, even though 99% of the time they are just empty command buffers. It's what I mentioned to you earlier in this thread: ECBs that do nothing should simply be skipped.

fossil torrent
#

That could be happening elsewhere, but that's not what I'm seeing in the profile snapshot you posted. That's one EntityCommandBuffer.Playback() call triggering 500+ CompleteAllJobs calls.

#

Calling .Playback() on an empty ECB now returns almost immediately. I can see a way to make it return even more immediately-er (defer the EnforceSingleThreadOwnership() call and associated safety-handle check until after the empty/invalid ECB early-out checks), but I think you'd need a ton of empty ECBs before that's a significant bottleneck.

viscid cedar
# fossil torrent Calling `.Playback()` on an empty ECB now returns almost immediately. I can see ...

Can you get that improvement into the next release or a later one? This profile snapshot is from Android, where it takes more than 0.50ms, which is quite expensive for a mobile platform. Often something that costs nothing on desktop is extremely expensive on mobile. I'd like it to be nearly zero ms for empty ECBs. I guess there's no other way to generate 500+ CompleteAllJobs calls, but I will continue to investigate.

fossil torrent
#

Right, and this profile snapshot shows 500+ JobHandle.Complete calls from a single ECB playback. An empty ECB playback would have zero JobHandle.Complete calls. This is not an empty-ECB-overhead problem; this is a single command buffer with hundreds of fairly expensive commands in it.

#

In the profiler, we should be emitting markers with the recording system's name before playing back each ECB from an ECB system. It's odd that we're not seeing that in your profiler screenshot; it jumps right from BeginInitializationEntityCommandBufferSystem to EntityCommandBuffer.Playback(). If you want to try to add the instrumentation yourself and see where the expensive buffer is coming from, the code in EntityCommandBufferSystem looks like this:

#if ENABLE_PROFILER
var system = World.Unmanaged.TryGetSystemStateForId(buffer.SystemID); // System is likely to be in our world.
if (system == null) system = World.FindSystemStateForId(buffer.SystemID);
if (system != null) system->m_ProfilerMarker.Begin();
buffer.Playback(EntityManager);
if (system != null) system->m_ProfilerMarker.End();
#else
buffer.Playback(EntityManager);
#endif

As for determining a command buffer's contents, the best way at the moment is to catch it in the debugger and view it in the watch window; ECBs have a custom debug visualizer that lets you explore their contents in a much friendlier representation than the messy reality.
Another thing to try would be setting EntityCommandBuffer.PLAYBACK_WITH_TRACE to true, which logs info on every command played back. But that's going to be a lot of log spew to dig through.

viscid cedar
fossil torrent
#

Is there a command buffer that's adding RequestSceneLoaded to a bunch of entities? Is that coming from your code, or Entities package code?

viscid cedar
reef charm
#

Please forgive me if I am not fully following this interesting discussion: it appears to me the easier fix is to avoid creating 500 RequestSceneLoaded components within one frame, rather than refactoring it into an enableable component? A case where I would vote for refactoring is the DisableRenderingTag in the Entity Graphics package, where it appears much more likely that thousands of tags get created per frame.

fossil torrent
#

What's the command you're using to add/remove RequestSceneLoaded in the ECB? Just adding an unmanaged tag component 500 times shouldn't result in 500 JobHandle.Complete calls.

fossil torrent
cloud shadow
#

Are you doing ECB calls from some type of fixed update? (predictive?)

viscid cedar
viscid cedar
hollow edge
viscid cedar
#

Alright. CommandBuffer.AddComponent(entity, new RequestSceneLoaded { LoadFlags = SceneLoadFlags.LoadAdditive });

viscid cedar
#

Ok. Removed SceneLoadFlags.LoadAdditive still the same result. Anyway I'm using CommandBuffer.AddComponent<RequestSceneLoaded>(entity) and CommandBuffer.RemoveComponent<RequestSceneLoaded>(entity)

next dune
fossil torrent
fossil torrent
# viscid cedar Ok. Removed SceneLoadFlags.LoadAdditive still the same result. Anyway I'm using ...

These commands (AddComponent(entity, RequestSceneLoaded) and RemoveComponent(entity, RequestSceneLoaded)) should not trigger a per-command JobHandle.Complete() call inside ECB playback. AFAICT that only happens for commands that manipulate managed components, or which add/remove components through an EntityQuery. So, I'm once again skeptical that this command is the root cause of the 500+ JobHandle.Complete() calls you're seeing.

viscid cedar
# fossil torrent These commands (`AddComponent(entity, RequestSceneLoaded)` and `RemoveComponent(...

The screenshots are from before and after commenting out the CommandBuffer calls. I just realized that it's caused by CommandBuffer.AddComponent<RequestSceneLoaded>(entity) alone. By the way, I call CommandBuffer.AddComponent<RequestSceneLoaded>(entity) in an IJob while looping over a NativeArray<Entity> of length 11, so it loops 11 times every frame, but I still don't expect that many JobHandle.Complete() calls. My guess at how it works: at around 60 fps, 60 frames x 11 entities = 660 JobHandle.Complete() calls.

hollow edge
viscid cedar
hollow edge
#

so you can collect all entities that need it added into one array, and those that need it removed into another

viscid cedar
hollow edge
viscid cedar
#

Pass array into CommandBuffer still get the same result

hollow edge
reef charm
viscid cedar
hollow edge
viscid cedar
#

It's just the same result when it spikes

hollow edge
#

wait a second

#

It says it's only one CompleteAllJobs

#

I think this number equals the total number of jobs you've scheduled since the last sync point

#

Basically: all it does is just syncing all jobs

#

and I guess if you will do this manually, without ECB by calling CompleteAllJobs you will likely get the same amount of jobHandle.Complete

fossil torrent
#

Ah, so the issue may not be that ECB is completing more jobs than it should, but that you really do just have hundreds of jobs to complete before playback?

cloud shadow
#

Mobile + prediction loop + too many systems?

viscid cedar
# fossil torrent Ah, so the issue may not be that ECB is completing more jobs than it should, but...

Not sure if I understand your question. If I comment out the entire IJob that does AddComponent(entity, RequestSceneLoaded) and RemoveComponent(entity, RequestSceneLoaded), or just comment out AddComponent(entity, RequestSceneLoaded) inside the IJob, the insane number of JobHandle.Complete calls disappears completely. So I can conclude that this single IJob alone generates them, and it only runs in a single ISystem update in InitializationSystemGroup. I also found that when the project has DOTS Netcode, the number of JobHandle.Complete calls increases significantly; my guess is that InitializationSystemGroup gets a higher update rate. From what I've observed, the official fix would be to make RequestSceneLoaded an enableable component, so there's no add/remove RequestSceneLoaded structural change every frame and thus no more insane number of JobHandle.Complete calls.

hollow edge
#

scene components make no difference here

#

any sync point will sync 600 jobs

#

just create a system that updates near that ECB and do state.EntityManager.CreateEntity(); of some sort

viscid cedar
hollow edge
#

it will create a sync point and I believe you'll see same amount of jobs
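
A minimal sketch of that experiment, assuming a throwaway diagnostic system (the name `ForceSyncPointSystem` is hypothetical). It makes a structural change directly through the EntityManager, with no ECB involved, so the profiler can show whether a comparable number of JobHandle.Complete calls appears under it.

```csharp
using Unity.Entities;

// Diagnostic only: remove after profiling.
// BeginInitializationEntityCommandBufferSystem is ordered first in this group,
// so this system runs somewhere after it in the same group.
[UpdateInGroup(typeof(InitializationSystemGroup))]
public partial struct ForceSyncPointSystem : ISystem
{
    public void OnCreate(ref SystemState state) { }
    public void OnDestroy(ref SystemState state) { }

    public void OnUpdate(ref SystemState state)
    {
        // A structural change made on the main thread, outside any ECB.
        var entity = state.EntityManager.CreateEntity();

        // Clean up immediately so the test does not leak entities.
        state.EntityManager.DestroyEntity(entity);
    }
}
```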

hollow edge
#

job you schedule is only one of them

#

sync points from structural changes require all jobs to complete

#

doesn't matter where they are coming from

viscid cedar
#

🤔 I see. This also means I really need the official team to make RequestSceneLoaded an enableable component, since mobile can't bear this insane number of JobHandle.Complete calls; I need to make sure there are no structural changes 99% of the time.
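
For reference, a minimal sketch of what toggling an enableable component looks like, assuming a hypothetical `WantsLoaded` component of your own. This is not the package's RequestSceneLoaded, which this thread describes as a regular component that is added and removed.

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Entities;
using Unity.Jobs;

// Hypothetical enableable component, for illustration only.
public struct WantsLoaded : IComponentData, IEnableableComponent
{
    public int Flags;
}

[BurstCompile]
public struct ToggleWantsLoadedJob : IJob
{
    [ReadOnly] public NativeArray<Entity> Targets;
    public ComponentLookup<WantsLoaded> Lookup;
    public bool Enable;

    public void Execute()
    {
        for (int i = 0; i < Targets.Length; i++)
        {
            // Flipping the enabled bit is not a structural change,
            // so it needs no ECB and no sync point.
            if (Lookup.HasComponent(Targets[i]))
                Lookup.SetComponentEnabled(Targets[i], Enable);
        }
    }
}
```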

hollow edge
#

That could potentially improve performance drastically

#

consider massive amount of jobs you have

cloud shadow
#

just don't use your ecbs for things like this unless you can stop your system updating completely (requireForUpdate)

#

you aren't doing scene loads every frame so don't cause a sync point every frame

#

if you only have to do a complete once every 30 seconds you won't even notice it
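
A minimal sketch of the RequireForUpdate gating, assuming scene loads are requested through a hypothetical `SceneLoadRequest` entity. While no such entity exists, the system does not update, so it creates no ECB and gives the ECB system nothing to wait on.

```csharp
using Unity.Entities;

// Hypothetical request: one entity per pending scene load.
public struct SceneLoadRequest : IComponentData
{
    public Entity SceneEntity;
}

public partial struct SceneLoadRequestSystem : ISystem
{
    public void OnCreate(ref SystemState state)
    {
        // OnUpdate is skipped entirely on frames with no request.
        state.RequireForUpdate<SceneLoadRequest>();
    }

    public void OnDestroy(ref SystemState state) { }

    public void OnUpdate(ref SystemState state)
    {
        var ecb = SystemAPI
            .GetSingleton<BeginInitializationEntityCommandBufferSystem.Singleton>()
            .CreateCommandBuffer(state.WorldUnmanaged);

        foreach (var (request, entity) in
                 SystemAPI.Query<RefRO<SceneLoadRequest>>().WithEntityAccess())
        {
            // RequestSceneLoaded is the component discussed in this thread.
            ecb.AddComponent<RequestSceneLoaded>(request.ValueRO.SceneEntity);
            ecb.DestroyEntity(entity); // consume the request
        }
    }
}
```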

hollow edge
#

latest update feature

viscid cedar
cloud shadow
# hollow edge latest update feature

are you sure? like the literal first line of ecbs is

        internal void FlushPendingBuffers(bool playBack)
        {
            if (m_PendingBuffers->IsEmpty)
                return;

            CompleteDependency();
#

it doesn't go through the dispose chain if it's empty

cloud shadow
#

the return is if no CreateCommandBuffer() has been called

#

protected UnsafeList<EntityCommandBuffer>* m_PendingBuffers;

#

that's just the list of all command buffers created

hollow edge
#

🤔

cloud shadow
#

ecbs can't know if it's been used

#

until it completes dependencies

hollow edge
cloud shadow
#

because it's used in jobs

cloud shadow
#

yes no command buffers recorded

#

which means no CreateCommandBuffer() called

hollow edge
#

ah...

#

bruh

cloud shadow
#

i.e. if you don't use BeginSimulation it won't call complete anymore

hollow edge
#

well

#

easy to solve with custom buffer

cloud shadow
#

it's main thread, it can't know if your job has passed something to it without completing the job dependency

hollow edge
#

always false before any commands passed

cloud shadow
#

that already exists

hollow edge
#

if command passed - set to true

#

should be thread-safe without interlocks

cloud shadow
#

ecb ShouldPlayback

cloud shadow
#

it's not valid

hollow edge
#

ah, true

#

command might be written in future

cloud shadow
#

it's only completing the job chain it needs to check

fossil torrent
#

So if I understand correctly, the issue is that you need to create an ECB every frame and register it for playback, and before it's played back it has to complete the recording job (causing a stall if the recording job has a ton of dependencies), and in most cases the recording job didn't record any command anyway, so neither the playback nor the complete-dependency stall is necessary. Is that accurate?

#

On one hand, I can see the value in being able to avoid creating an ECB and registering it for playback until you're certain it's actually needed. I'll have to give that some thought. The simplest workaround I can think of would be if you could run some sort of synchronous main-thread pre-pass to quickly determine whether any scenes needed to be loaded, and only create the ECB and schedule the job if there's actual work to do. But that may not be feasible in this case.
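
The pre-pass workaround could look roughly like this, assuming pending work is discoverable through a query. `PendingSceneLoad`, the job, and the system are hypothetical names. Unlike gating the whole system, the check sits inside OnUpdate, so other work in the same system can still run.

```csharp
using Unity.Entities;

// Hypothetical marker for "a scene needs loading".
public struct PendingSceneLoad : IComponentData
{
    public Entity SceneEntity;
}

public partial struct RecordSceneLoadsJob : IJobEntity
{
    public EntityCommandBuffer Ecb;

    void Execute(Entity entity, in PendingSceneLoad pending)
    {
        Ecb.AddComponent<RequestSceneLoaded>(pending.SceneEntity);
        Ecb.DestroyEntity(entity); // consume the request
    }
}

public partial struct SceneLoadPrePassSystem : ISystem
{
    EntityQuery m_Pending;

    public void OnCreate(ref SystemState state)
    {
        m_Pending = SystemAPI.QueryBuilder().WithAll<PendingSceneLoad>().Build();
    }

    public void OnDestroy(ref SystemState state) { }

    public void OnUpdate(ref SystemState state)
    {
        // Cheap main-thread check that completes no jobs. If nothing is
        // pending, no ECB is created and nothing is registered for playback.
        if (m_Pending.IsEmptyIgnoreFilter)
            return;

        var ecb = SystemAPI
            .GetSingleton<BeginInitializationEntityCommandBufferSystem.Singleton>()
            .CreateCommandBuffer(state.WorldUnmanaged);

        state.Dependency = new RecordSceneLoadsJob { Ecb = ecb }
            .Schedule(m_Pending, state.Dependency);
    }
}
```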

#

On the other hand, if the ECB recording job has ~600 dependency jobs to complete...I mean, something is going to have to pay the cost of completing those jobs at some point. I'm worried that any herculean effort to avoid the stall at ECB playback time is just going to cause it to pop up somewhere else instead.

hollow edge
fossil torrent
fossil torrent
#

Yeah, hundreds of jobs does not necessarily imply hundreds of Complete() calls. This is specifically a consequence of the way each system tracks its read and write dependencies. Each component type that the system registers as a reader or writer has 0-N read-dependency jobs and 0-1 write-dependency job handles associated with it, and each of these handles is completed individually.
But the system in question is an ECB playback system, so my question is now how an ECB system ends up with read/write dependencies on hundreds of components.

hollow edge
#

It's not completing dependency

#

it's completing all jobs

fossil torrent
#

Right, but if you look at the implementation of ComponentDependencyManager.CompleteAllJobs() (which is the method emitting that marker), it's looping over the registered reader/writer types of the system and (individually) completing each type's read and write dependency jobs. So it's not completing all jobs, just all jobs potentially relevant to the current system according to its registered component type dependencies. But that would be a heck of a function name.

#

@viscid cedar Out of curiosity, does your application have O(556) systems that create command buffers targeting the BeginInitialization ECB system?

viscid cedar
# fossil torrent Yeah, hundreds of jobs does not necessarily imply hundreds of Complete() calls. ...

From what I know, it's a design issue in the ComponentDependencyManager that the official team has already identified. I think the hundreds of Complete() calls will drop significantly once the ComponentDependencyManager is improved. This design issue also causes main-thread stalls that can cut fps by more than half.

The problem is the ComponentDependencyManager currently works using ComponentTypes to know if two jobs using components will create a data race. This is not actually optimal though -- it should really look at jobs reading/writing to component types in the same archetype but it doesn't today.

viscid cedar
fossil torrent
#

What's the source of that quote re: the ComponentDependencyManager?