# What stops this vectorizing?

jagged knot
#

I have this code inside a job

```cs
for (var i = 0; i < this.onBuffer.Length; i++)
{
    remainings[i] = math.max(0, remainings[i] - this.DeltaTime);
    this.onBuffer[i] = remainings[i] != 0;
}
```

which I was curious about vectorizing
#

I recreated a version in a unit test, both as a function pointer and a job just to start testing

```cs
[BurstCompile]
public static void Basic([NoAlias] float* remainings, [NoAlias] bool* buffer, int length, float deltaTime)
{
    for (var i = 0; i < length; i++)
    {
        remainings[i] = math.max(0, remainings[i] - deltaTime);
        buffer[i] = remainings[i] != 0;
    }
}

[BurstCompile]
public struct BasicJob : IJob
{
    public NativeArray<float> Remainings;
    public NativeArray<bool> Buffer;

    public float DeltaTime;

    public void Execute()
    {
        for (var i = 0; i < this.Remainings.Length; i++)
        {
            this.Remainings[i] = math.max(0, this.Remainings[i] - this.DeltaTime);
            this.Buffer[i] = this.Remainings[i] != 0;
        }
    }
}
```
#

and it seems to generate quite the large chunk of vectorized code

#

the thing i can't figure out is, why this is not being generated in the original job

fast dove
#

Maybe it's the use of the accessor?

#

Too much of it at least

jagged knot
#

I even simplified the original system into a copy test

```cs
var testBuffer = new NativeArray<bool>(chunk.Count, Allocator.Temp);
var remainings = chunk.GetNativeArray(ref this.RemainingHandle).Reinterpret<float>();
for (var i = 0; i < testBuffer.Length; i++)
{
    remainings[i] = math.max(0, remainings[i] - this.DeltaTime);
    testBuffer[i] = remainings[i] != 0;
}
```
#

it still generates the original non-vectorized code

fast dove
#

Can you use ref?

#

For both

jagged knot
#

they're pointers?

#

if i simply move this

```cs
// {
//     remainings[i] = math.max(0, remainings[i] - this.DeltaTime);
//     this.onBuffer[i] = remainings[i] != 0;
// }

CalculateOn(remainings, this.onBuffer.GetUnsafePtr(), this.onBuffer.Length, this.DeltaTime);
```
into a method
#

it now works

fast dove
#

I mean instead of [i] use ref ElementAt

#

Which is only accessed once

jagged knot
#

```cs
float* remainings = (float*)chunk.GetRequiredComponentDataPtrRW(ref this.RemainingHandle);
```

fast dove
#

Oh

jagged knot
#

even copying everything into locals still doesn't vectorize

```cs
var deltaTime = this.DeltaTime;
var length = this.onBuffer.Length;
var buffer = this.onBuffer;

for (var i = 0; i < length; i++)
{
    remainings[i] = math.max(0, remainings[i] - deltaTime);
    buffer[i] = remainings[i] != 0;
}
```
fast dove
#

Can you then calculate final ptr only once?

jagged knot
#

but if i do this exact code in a separate method it works fine

fast dove
#

So you avoid using [i] multiple times

jagged knot
#

the question is why it works in 1 situation

#

but not the other

#

ok figured it out
this doesn't work

```cs
public static void CalculateOn(float* remainings, NativeArray<bool> buffer, int length, float deltaTime)
{
    for (var i = 0; i < length; i++)
    {
        remainings[i] = math.max(0, remainings[i] - deltaTime);
        buffer[i] = remainings[i] != 0;
    }
}
```

this works
```cs
public static void CalculateOn([NoAlias] float* remainings, [NoAlias] NativeArray<bool> buffer, int length, float deltaTime)
{
    for (var i = 0; i < length; i++)
    {
        remainings[i] = math.max(0, remainings[i] - deltaTime);
        buffer[i] = remainings[i] != 0;
    }
}
```
#

it thinks the array could alias because of the NativeDisableContainerSafetyRestriction
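One way to test the aliasing theory rather than infer it from the disassembly is Burst's compile-time aliasing checks. This is a minimal sketch, assuming the Burst package's `Unity.Burst.CompilerServices.Aliasing` class is available and optimizations are enabled (the checks are only evaluated then). The class and method names `AliasCheck` and `CalculateOnChecked` are made up for illustration. If Burst cannot prove that the two pointers never overlap, the check is expected to produce a compiler error instead of a silently scalar loop.

```cs
using Unity.Burst;
using Unity.Burst.CompilerServices;

// Hypothetical container class for the sketch; not from the original thread.
[BurstCompile]
public static unsafe class AliasCheck
{
    [BurstCompile]
    public static void CalculateOnChecked([NoAlias] float* remainings, [NoAlias] bool* buffer, int length, float deltaTime)
    {
        // Compile-time check: Burst should report an error here if it
        // believes the two pointers may overlap. Outside Burst it does nothing.
        Aliasing.ExpectNotAliased(remainings, buffer);

        for (var i = 0; i < length; i++)
        {
            var remaining = remainings[i] - deltaTime;
            remaining = remaining < 0f ? 0f : remaining; // same as max(0, x) for finite values
            remainings[i] = remaining;
            buffer[i] = remaining != 0f;
        }
    }
}
```

Dropping the two `[NoAlias]` attributes from this sketch should make the check fail, which would match what the thread observed with the `NativeArray<bool>` parameter.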

#

so the question is, how do i make this pattern work then =S

#
```cs
[NativeDisableContainerSafetyRestriction] // Only initialized in the job
private NativeList<bool> onBuffer;
```
doesn't seem to help
#

(the weird thing is, the vectorized loop is like 5x more instructions, is it actually faster 🤔)

#
```cs
[BurstCompile]
public static void Scalar(float* remainings, bool* buffer, int length, float deltaTime)
{
    for (var i = 0; i < length; i++)
    {
        remainings[i] = math.max(0, remainings[i] - deltaTime);
        buffer[i] = remainings[i] != 0;
    }
}

[BurstCompile]
public static void Vectorized([NoAlias] float* remainings, [NoAlias] bool* buffer, int length, float deltaTime)
{
    for (var i = 0; i < length; i++)
    {
        remainings[i] = math.max(0, remainings[i] - deltaTime);
        buffer[i] = remainings[i] != 0;
    }
}
```
#

is the simple repro, but yeah, not sure how to stop Burst thinking this is aliasing in the actual code
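A repro like this can be made self-checking so that falling off the vectorized path becomes a compile error rather than something spotted in the inspector. The sketch below assumes Burst's experimental loop intrinsics, which as far as I know are only available when the `UNITY_BURST_EXPERIMENTAL_LOOP_INTRINSICS` scripting define is set, and which may change between Burst versions. `VectorizationCheck` and `VectorizedChecked` are hypothetical names.

```cs
using Unity.Burst;
using Unity.Burst.CompilerServices;

// Hypothetical container class for the sketch; not from the original thread.
[BurstCompile]
public static unsafe class VectorizationCheck
{
    [BurstCompile]
    public static void VectorizedChecked([NoAlias] float* remainings, [NoAlias] bool* buffer, int length, float deltaTime)
    {
        for (var i = 0; i < length; i++)
        {
#if UNITY_BURST_EXPERIMENTAL_LOOP_INTRINSICS
            // Experimental: asks Burst to fail compilation if this loop
            // does not end up vectorized.
            Loop.ExpectVectorized();
#endif
            var remaining = remainings[i] - deltaTime;
            remaining = remaining < 0f ? 0f : remaining; // same as max(0, x) for finite values
            remainings[i] = remaining;
            buffer[i] = remaining != 0f;
        }
    }
}
```

The counterpart `Loop.ExpectNotVectorized()` could be placed in the `Scalar` variant in the same way, under the same assumptions.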

#

obviously i can just have this as the method, but it's more about the general principle because this is a pattern I use frequently

#

even throwing it on the struct doesn't help

```cs
[BurstCompile]
public unsafe struct UpdateTimeJob : IJobChunk
```
#

burst is insistent these alias

#

after all that

#

it's such a minor bump in performance due to the huge amount of extra instructions anyway =S

sage slate
#

and also why andreas added intrinsics, because he hated when this stuff happened to him with cpp compilers

fading mason
#

FWIW I am also pretty firmly in the camp of "if you want vectorized code, write vectorized code". It's great that Burst supports auto-vectorization, but it's very easy to fall off that path without realizing it (or to figure out why it happened even if you do realize it, as this thread solidly demonstrates)
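For this particular loop, "write vectorized code" does not have to mean hardware intrinsics. A rough sketch using the `Unity.Mathematics` vector types is below. It is not from the thread and has not been benchmarked. It assumes `bool4` is laid out as four contiguous one-byte values, so a `bool*` can be reinterpreted as a `bool4*`. `ExplicitSimdSketch` and `CalculateOn4` are hypothetical names.

```cs
using Unity.Burst;
using Unity.Mathematics;

// Hypothetical container class for the sketch; not from the original thread.
[BurstCompile]
public static unsafe class ExplicitSimdSketch
{
    [BurstCompile]
    public static void CalculateOn4([NoAlias] float* remainings, [NoAlias] bool* buffer, int length, float deltaTime)
    {
        // Process four elements per iteration through float4 / bool4 views.
        var wide = length / 4;
        var remainings4 = (float4*)remainings;
        var buffer4 = (bool4*)buffer; // assumption: bool4 is 4 packed bytes

        for (var i = 0; i < wide; i++)
        {
            var r = math.max(float4.zero, remainings4[i] - deltaTime);
            remainings4[i] = r;
            buffer4[i] = r != 0f; // component-wise compare, yields a bool4
        }

        // Scalar tail for the last length % 4 elements.
        for (var i = wide * 4; i < length; i++)
        {
            var r = math.max(0f, remainings[i] - deltaTime);
            remainings[i] = r;
            buffer[i] = r != 0f;
        }
    }
}
```

The trade-off is the one raised in the thread: the width is fixed at four lanes by hand, but the result no longer depends on the auto-vectorizer's aliasing analysis.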

jagged knot
#

Putting [NoAlias] on the field or job didn't do anything

#

On the field in particular I feel should solve this

alpine yew
#

I feel like NativeDisableContainerSafetyRestriction has max priority here; don't know if the Burst team would change that behaviour, unfortunately

sage slate
#

personally, if i had put this amount of investigation into a non-vectorized situation i would 100% be using intrinsics for said situation

jagged knot
#

To me this is less about getting this vectorisation to work, more about understanding what's going on and how I can code efficiently by default.

#

This pattern of allocating per thread containers but having to use disable safety to make it work is something I often do to minimise allocations.

#

And it's the first time I've stumbled upon a problem with it

#

just in case it wasn't clear because I've realized I never mentioned this in the post, I'm only using NativeDisableContainerSafetyRestriction because the native container isn't passed in, it's only initialized in the job but safety system doesn't like this.

```cs
[NativeDisableContainerSafetyRestriction] // Only initialized in the job
private NativeList<bool> onBuffer;

/// <inheritdoc/>
public void Execute(in ArchetypeChunk chunk, int unfilteredChunkIndex, bool useEnabledMask, in v128 chunkEnabledMask)
{
    if (!this.onBuffer.IsCreated)
    {
        this.onBuffer = new NativeList<bool>(chunk.Count, Allocator.Temp);
    }

    this.onBuffer.ResizeUninitialized(chunk.Count);
```
#

doing it with an unsafe list doesn't help either,

```cs
[NoAlias]
private UnsafeList<bool> onBuffer;
```
sage slate
#

i mean, maybe they were both incorrect, but matt pharr concluded in that blog post series, and andreas concluded in his oft-cited intrinsics talk, that one should just never assume that you can code in a non-intrinsics, non-ispc way that will reliably be vectorized