#[BUG] Domain allocator crashes Editor

1 messages · Page 1 of 1 (latest)

tepid harness
#

Simple repro, having code like this crashes editor in OnDispose, replacing Domain to Persistent solves that. Latest 1.4 packages, latest burst, Editor 6.3.0b3 (Happens in previous versions too)

[BurstCompile]
public partial struct TstCrash : ISystem
{
    private UnsafeList<UnsafeList<int>> test;

    [BurstCompile]
    public void OnCreate(ref SystemState state)
    {
        test = new UnsafeList<UnsafeList<int>>(64 * 64 * 16, Allocator.Domain);
        for (int i = 0; i < 64 * 64 * 16; i++)
        {
            test.Add(new UnsafeList<int>(1, Allocator.Domain));
        }
    }

    [BurstCompile]
    public void OnDestroy(ref SystemState state)
    {
        for (int i = 0; i < test.Length; i++)
        {
            test[i].Dispose();
        }

        test.Dispose();
    }

    [BurstCompile]
    public void OnUpdate(ref SystemState state)
    {
        
    }
}

In addition having 128*128*16 will crash in OnCreate. Docs states that exceeding l2 cache only reduce performance, I don’t expect it to crash. I’m aware that Unity frees memory for Domain automatically, but docs also tells that we can manually free it at any time. Hi @tough smelt whom should I bother with that?

tough smelt
#

Unsurprising, sounds like a path fraught with peril. Please write up a bug and I'll see where it lands

tepid harness
#

Ok will fill bug report, but repro code above is all you need, 100% repro rate blobnod

tough smelt
#

that's definitely handy

tepid harness
#

Sure I just spending time while Unity creates project for bug report )))

#

@tough smelt CASE IN-117219

#

You can change size in TstCrash.cs (uncomment second line) to reproduce crash in OnCreate, otherwise it will crash in OnDestroy

tough smelt
#

thanks

tepid harness
#

Bump 😄 Afaik @celest grotto also reached that bug

celest grotto
#

Yep. Switched to persistent and no more crashing

agile shoal
#

So the short answer here is that the Domain allocator has a hardcoded limit on the number of allocations it can track. You're hitting that limit in the 128x128x16 case, which is why it crashes in OnCreate. We need to make that error case more clear and not crash.

The crash in OnDestroy is from a similar case of hitting the limit, but it's not clear to me whether it's caused by a bug in the implementation, or a flaw in the design. I'll need to investigate it further.

tepid harness
#

Good to know about allocation limit and yeah definitely should be handled with proper error AND 100% should be mentioned in docs

celest grotto
tepid harness
#

You meant persistent maybe?
Edit: ah I’ve read it wrong, sorry, you meant malloctracked with tempjob case, I’ve read it as temp job and persistent (like 2 cases)

agile shoal
#

Ah, it looks like we have another, smaller limit on the number of Domain-allocated frees per frame that we expect.

#

Currently we allow 2^18 Domain allocations and support freeing 2^16 of them in a single frame. Looking at your numbers, that would account for both the OnCreate and OnDestroy crashes.

#

We should probably just bump that latter limit to match the total number.

agile shoal
# tepid harness Malloctracked is not an allocator )

MallocTracked is used by the TempJob and Persistent allocators when you have leak-tracking enabled. It maxes out at 2^20 allocations and won't crash if you go past the limit, but those surplus allocations won't be tracked for leak-detection.

#

I'll update the bug with this information and get that moved along. Thanks for the repro case, it made it easy to track down.

tepid harness
#

Ok, then in the end as minimum we could expect not a crash but exception, and alloc free calls count match right? And as bonus point in can be mentioned in allocators docs section?blushie

#

Thanks for the info! And future improvements

tepid harness
#

@agile shoal sorry for ping, but will hijack that thread as seems it's also Allocator.Domain related, is something flashes for you with that stack?

#

Switching to manually allocate container per thread (see below) and crash disappears:

[NativeDisableContainerSafetyRestriction]
private NativeList<float> _list;

public void Execute(int index) 
{
    if(!_list.IsCreated)
        _list = new NativeList<float>(10, Allocator.Temp);

    // Do stuff...
}

Logic behind logically the same (list per thread, but with pool is just a bit more memory efficient), the difference here is mostly allocator

agile shoal
#

No need to apologize for the ping, I appreciate it.

#

Does this happen with any use of the PooledNativeList, or is it the same bug based on the (undocumented) size limit of the Domain allocator?

wet magnet
#

(I should say, I or anyone at work, has never had an issue with my PooledNativeList and it's used quite a lot. Also it's been shipped with my libraries for 7 months without anyone else mentioning an issue)

tepid harness
# agile shoal Does this happen with any use of the PooledNativeList, or is it the same bug bas...

Nah we have that only in one of our projects, it happens only after using in specific job, tho simple one and data used is small enough and usage the same as in any other job which is not crashing anything, in context of allocations, as there is usually list per thread at a time which then reused by other jobs (whole point of pooled list) I believe it’s definitely not exceed limit of allocations per frame

tepid harness
#

And I believe if it would be our Malloc or Free we would see stack from our code (as in initial stacktrace) but here is purely in engine C++ guts we have no access (

agile shoal
#

I've got a fix for this is heading to trunk, and backports will follow once it lands. I can keep you posted.

agile shoal
#

You (and @wet magnet) may be interested to know a few implementation details for performance reasons if you are manually freeing large numbers of Domain-allocated allocations.

When you make an allocation with the Domain allocator, we add that pointer to an array for tracking. If you manually dispose an allocation, we add it to a "to-free" list. Once a frame, we walk the allocated list and check if any of them were "freed" which is a O(m*n) operation (where m is the number of standing allocations, and n is the number of manually freed allocations that frame).

We walk both of those arrays in reverse and swap-back-remove, so it turns out if you deallocate in the same order you allocated, you'll get significantly better performance. Thought this might be helpful, since it's usually customary to do the opposite - free in the opposite order that things were allocated.

This is an implementation detail, so it's subject to change, but we don't have any plans to change it as of now.

#

I did experiment with iterating the to-free list forward while keeping the allocated list in reverse, since that would seem to better support the usual pattern of alloc A, B, C -> free C, B, A, but because we're using swap-back to remove items, it ends up shuffling the to-free list while we're working through it, and the best case ends up slower.

#

The actual domain reload auto-free is faster either way, since it's just a linear walk through the allocated list.

tepid harness
#

Nice! Thanks for the fix and info @agile shoal 🔥

agile shoal
#

Fix landed in 6000.5.0a5, 6000.4.0b5, 6000.3.5f1 and 6000.0.66f1

  • Domain allocator will print errors and not crash if user allocates more than the allocator is able to track, or if the user frees more than the number of allocations it can track per frame.
  • Increased the total allowed number of frees-per-frame to be equal to maximum number of allocations. Freeing this number of tracked allocations will be quite slow, but it will no longer crash.
  • Added documentation including the total number of tracked allocations allowed.

Feel free to ping me if you have any additional problems