#Can you create a NativeArray of interfaces for use in Jobs?

sour vine
#

I want to have an IJobFor that has a NativeArray of an interface. The reason is that I am doing mesh operations and want to fill the array with different operations to execute on the mesh in order (bend, twist, offset, etc.).

So the job will iterate over each vertex in a mesh, and iterate over each operation, executing it on the vertex.

Something sort of like this (there would be more to it than this, but this is the basic idea I was thinking of).
Of course you can't have a NativeArray of interfaces, so what would be the solution here?

interface IMeshOperator {
  void Operate(int vertexIndex);
}

struct MeshJob : IJobFor {
  public NativeArray<IMeshOperator> operations;

  public void Execute(int index)
  {
    for (int i = 0; i < operations.Length; i++) {
      operations[i].Operate(index);
    }
  }
}
thorny otter
#

no

sour vine
#

Right... so what would be the solution to my design issue?

thorny otter
#

so this whole idea needs to be redone to fit into data-oriented design

sour vine
thorny otter
#

this works

sour vine
#

Yeah, but it would be considerably slower, no? Since a full iteration of all the vertices would need to happen before the next job could run

#

Since each operation would need to iterate over all of the vertices.

thorny otter
#

Burst optimizes the shit out of it

sour vine
#

I mean, performing all of the operations in a single job would be faster than running a job for each operation in order.

#

The job for one operation could not start until the job for the previous one ended.

thorny otter
#

because branchless logic gets vectorized (4x-8x performance boost)

#

if you really want it to be in one job you'd need a switch statement

#

and literally write it all like that

sour vine
# thorny otter because branchless logic gets vectorized (4x-8x performance boost)

Most of it will be completely branchless. But I don't see what that has to do with job execution order, or performance of doing multiple vs a single.
I don't care if it is a single job or not. I just want to make it as performant as I can so it can handle large amounts of vertices. But also be maintainable. I prefer not to have to edit the struct every time I want to add a new operation type.

thorny otter
#

if you do switch statement - no vectorization for sure

sour vine
thorny otter
#

the obvious way: schedule a job for each piece of logic
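A minimal sketch of the one-job-per-operation approach, assuming Unity's Jobs, Burst, Collections and Mathematics packages are installed. `TwistJob`, `OffsetJob`, `MeshPipeline` and their fields are hypothetical names for illustration, not code from this thread. Each handle is passed as the dependency of the next job, so the operations run in order while each one can still be split across worker threads.

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;
using Unity.Mathematics;

// Hypothetical operation: rotate each vertex around the Y axis
// by an angle proportional to its height.
[BurstCompile]
struct TwistJob : IJobFor
{
    public NativeArray<float3> vertices;
    public float radiansPerUnit;

    public void Execute(int index)
    {
        float3 v = vertices[index];
        math.sincos(v.y * radiansPerUnit, out float s, out float c);
        vertices[index] = new float3(c * v.x - s * v.z, v.y, s * v.x + c * v.z);
    }
}

// Hypothetical operation: translate each vertex by a constant.
[BurstCompile]
struct OffsetJob : IJobFor
{
    public NativeArray<float3> vertices;
    public float3 offset;

    public void Execute(int index)
    {
        vertices[index] += offset;
    }
}

static class MeshPipeline
{
    public static void Run(NativeArray<float3> vertices)
    {
        var twist = new TwistJob { vertices = vertices, radiansPerUnit = 0.5f };
        var offset = new OffsetJob { vertices = vertices, offset = new float3(0f, 1f, 0f) };

        // The second job depends on the first, so it only starts once
        // every vertex has been twisted.
        JobHandle twistHandle = twist.ScheduleParallel(vertices.Length, 64, default);
        JobHandle offsetHandle = offset.ScheduleParallel(vertices.Length, 64, twistHandle);

        // Completing the last handle also completes everything it depends on.
        offsetHandle.Complete();
    }
}
```

Adding a new operation type here means adding a new job struct and one more `ScheduleParallel` call, without touching the existing jobs. The batch size of 64 is an arbitrary starting value.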

sour vine
thorny otter
#

switch will destroy vectorization

#

also

#

if all your logic must run concurrently

#

you don't really need a job

#

unless you want that logic to happen async

#

and do something else in between

sour vine
#

I keep forgetting that Burst works on static methods. But jobs would still be faster because you can use IJobParallelFor, right?

thorny otter
#

and also you can lose due to scheduling cost

#

so in other words: it depends

sour vine
#

I imagine with a large data set (lots of vertices) the multithreading outweighs the scheduling cost.

thorny otter
#

yep

sour vine
#

So, guess I will go with scheduling the jobs sequentially

#

Thank you very much for your help! 😄

foggy sinew
flint laurel
# thorny otter it's exactly opposite

while the only actual answer is to profile, my intuition is that this is not correct, at least if the meshes are large enough not to fit in cache, or if the jobs run far enough apart that the mesh data gets evicted between them. even if burst vectorizes everything perfectly, you still have to pay for N times the memory bandwidth to stream all the vertices in and out for each distinct job. with simd load and store instructions, the memory bandwidth is indeed somewhat better, but not as good as just having to load it fewer times, i'm betting.

moreover, you could get parallelism while also using the switch statement and loading everything only once, if you're parallel across different sections of vertices in the mesh (assuming the mesh is partitionable somehow)

sour vine
sour vine
flint laurel
#

i think the best possible scenario is to have a single large IJobFor where each little bit of the IJobFor operates in parallel over a distinct range of vertices

thorny otter
#

if it's thread safe

#

you can run all jobs in parallel

sour vine
#

No, all the jobs operate on the same set of data, but there are subsections of the data.

I am generating tree meshes. So the mesh could, for example, be all of the branches, so I can set up an IJobFor to run one execution per branch. Or I can do it per vertex or per loop or whatever.
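A rough sketch of the "one execution per branch" variant, assuming each branch owns a contiguous, non-overlapping vertex range. `PerBranchJob`, `branchRanges` and the range layout are hypothetical. Because each `Execute` call writes a whole range rather than only element `index`, the sketch lifts the parallel-for write restriction with `[NativeDisableParallelForRestriction]`; that is only safe while the ranges really do not overlap.

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Collections.LowLevel.Unsafe;
using Unity.Jobs;
using Unity.Mathematics;

[BurstCompile]
struct PerBranchJob : IJobFor
{
    // Hypothetical layout: x = first vertex of the branch, y = vertex count.
    [ReadOnly] public NativeArray<int2> branchRanges;

    // Each Execute call writes outside its own index, so the default
    // safety restriction is disabled. Ranges must not overlap.
    [NativeDisableParallelForRestriction]
    public NativeArray<float3> vertices;

    public float3 offset;

    public void Execute(int branch)
    {
        int2 range = branchRanges[branch];
        int end = range.x + range.y;

        // All operations for this branch could be applied here, one after
        // another, while the vertex is already loaded.
        for (int i = range.x; i < end; i++)
        {
            vertices[i] += offset;
        }
    }
}

static class PerBranchScheduling
{
    public static JobHandle Schedule(NativeArray<int2> ranges, NativeArray<float3> vertices)
    {
        var job = new PerBranchJob
        {
            branchRanges = ranges,
            vertices = vertices,
            offset = new float3(0f, 1f, 0f)
        };

        // One work item per branch; a batch size of 1 lets branches
        // spread across worker threads individually.
        return job.ScheduleParallel(ranges.Length, 1, default);
    }
}
```

Whether this beats one job per operation depends on the mesh size and how evenly the branches are sized, so it would need profiling.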

sour vine
flint laurel
#

i thought IJobFor was just the newer IJobParallelFor

#

but yeah, either one

#

with IJobFor you have to say ScheduleParallel to get it to be parallel

#

but otherwise i think it's the same

sour vine
#

Oh really? You might be right, the docs don't really give any indication one way or the other. But now that you say that, it sounds familiar

#

I could probably do it in a single job, but it wouldn't be pretty haha. I will have to do some testing I guess.

foggy sinew
#

I think the idea here is to have one big twist job for example that operates on an array of all the verts that need twisting

#

Or am I wrong?

sour vine
#

I mean, whatever is faster, but also doesn't make a super class that is impossible to edit

bright prawn
#

Use DynamicBuffer as an alternative

thorny otter
sour vine
#

I guess I could do something like this. Where I keep arrays of each type of operation, and then a second array that contains the 'id' of the operation, and the index of it inside of that array.

public struct MeshJob : IJobFor
{
  public NativeArray<TwistOperation> twistOperations; // 0
  public NativeArray<BendOperation> bendOperations; // 1

  public NativeArray<int2> order;

  public void Execute(int index)
  {
    for (int i = 0; i < order.Length; i++)
    {
      if (order[i].x == 0)
      {
        twistOperations[order[i].y].Run(..);
      }
      else if (order[i].x == 1)
      {
        bendOperations[order[i].y].Run(..);
      }
    }
  }
}
#

(int2 would be replaced with a custom struct which would be an enum and an int)

#

It would at least prevent the main job from being massive. But does require modifying the main mesh job to add a new type of mesh operation.

flint laurel
#

this .run thing seems iffy

#

and oopy

#

and why is your vertex and index data not on this job

#

how do you decide what operations need to apply to what sections of the vertex and index buffers?

thorny otter
#

if you simply want to spread it across files

#

just make the struct partial

#

and declare partial methods

sour vine
#
struct BendOperation {
  public float bendAmount;
  public void Run(..someVertexInformation..) {
    // does something with the vertex data
  }
}
#

Oh wait, yeah I see it. Kind of oopy.

thorny otter
#

it's not possible in that way too

sour vine
#

is it not?

thorny otter
#

oh wait

#

if it's not interfaced

#

and only contains same logic

#

then it is possible

#

but feels like a pain to manage ngl

sour vine
#

I guess I could make Run a static method and just have BendOperation be the data that is passed to it.
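A rough sketch of that idea in plain C#, with a single `float` standing in for the vertex data so it runs outside Unity. All names (`OpKind`, `OpRef`, `OffsetOperation`, `ScaleOperation`, `MeshOps`) are hypothetical, and the two operations are simple stand-ins rather than real bend or twist math. The operation structs hold only data, the logic lives in static methods, and an ordered list of (kind, index) pairs picks which one to run, as in the `order` array above.

```csharp
using System;

enum OpKind { Offset, Scale }

// Operation structs carry parameters only, no behaviour.
struct OffsetOperation { public float amount; }
struct ScaleOperation { public float factor; }

// Replaces the int2: which operation array to use, and the index into it.
struct OpRef
{
    public OpKind kind;
    public int index;
}

static class MeshOps
{
    public static float Offset(in OffsetOperation op, float value) => value + op.amount;

    public static float Scale(in ScaleOperation op, float value) => value * op.factor;

    // Applies every operation to one value, in the order given.
    public static float ApplyAll(
        OpRef[] order,
        OffsetOperation[] offsets,
        ScaleOperation[] scales,
        float value)
    {
        for (int i = 0; i < order.Length; i++)
        {
            switch (order[i].kind)
            {
                case OpKind.Offset:
                    value = Offset(offsets[order[i].index], value);
                    break;
                case OpKind.Scale:
                    value = Scale(scales[order[i].index], value);
                    break;
            }
        }
        return value;
    }
}
```

With one offset of 1 followed by one scale of 3, `ApplyAll` turns 2 into (2 + 1) * 3 = 9. In a job, the managed arrays would become `NativeArray` fields. A new operation type still needs a new enum value and a new `case`, which is the maintenance cost raised earlier in the thread.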

sour vine
#

Maybe just running each operation in its own job really is the way to go. But I'm really not sure what the performance would be like

thorny otter
#

just test

sour vine
#

I mean, I could test I guess. But it will be a pain to set up the test.

#

Probably the thing to do though

thorny otter
#

do you have approx amount of data to process btw?

#

as in how much CPU must load

sour vine
#

Could be as low as like 100 vertices, or up to like 50,000 on the high end if I had to guess

thorny otter
#

that's not much

#

modern CPUs have 16 MB of L3 cache

sour vine
#

Maybe, but the mesh is regenerated every value change, like as a slider changes. So if it takes too long it will look bad. And the jobs are not super cheap

#

At least some of them aren't

flint laurel
#

i still want to know how do you decide which vertices get bends applied to them and which get twists etc

#

is it just a global list of operations, which each may access any part of the vertex buffer? or what happens

sour vine
flint laurel
#

i see. so it is kind of random per operation

#

so you can't really partition the meshes easily to do it in parallel

#

so i guess what i would do is either a) do it in separate jobs like you said, if the perf is not too bad, or b) i would compute the start and end vertices for each operation before scheduling the job (or in an earlier job), and bin the operations by which partition of the vertex data they access. and then you can schedule a parallel job where each parallel shard accesses the operations that are relevant for its partition of vertex data

sour vine
flint laurel
#

ah, i see

sour vine
#

Right, so, please let me know if I am doing this test wrong... because right now they are basically exactly the same. Once you get to 1 million, there is like a 3ms difference
https://gdl.space/ilivefezar.cs

#

But I feel like I maybe shouldn't be calling Complete() right away? That just makes it run on the main thread basically right?

thorny otter
#

to avoid Schedule overhead

#

and you can just Complete() offsetHandle

#

on second

#

since it depends on first

sour vine
#

Right, artifact of debugging, thanks.

thorny otter
#

wait

#

why are you doing it like this

sour vine
#

What do you mean?

thorny otter
#

why you do for loop inside Execute?

#

or is that important part of logic?

#

not optimization?

sour vine
#

Emulating looping over a set of vertices

thorny otter
#

so it's a logic?

#

not just some optimization?

sour vine
#

Yeah, but makes no difference

thorny otter
#

it does, because when you schedule normally

#

as it's meant to be

#

Burst works at its best

#

I mean, when you operate specifically per index

#

without branching

sour vine
#

Right, so my test is faulty because I am not scheduling it properly?

thorny otter
#

no

#

It's not about scheduling

#

it's about Job's execute

#

and it's not necessarily faulty

#

I just don't know what exactly you need to do

#

but it felt like you were trying to emulate batches

#

which is not needed at all

sour vine
#

I thought it would make a difference, but doesn't make any at all it seems. Might be easier logic wise though

thorny otter
#

simply having
Execute(int index) { result[index] = data[index] * logic; }

#

works best

sour vine
#

Also, damn, I knew burst is fast. But it takes 420ms to execute with one million points. But only 65ms once you compile the jobs with burst

thorny otter
#

kinda a lot tbh

#

could definitely make it way smaller with multithreading

sour vine
#

You mean not calling Complete right away?

thorny otter
#

no

#

I mean

#

ScheduleParallel

#

not just Schedule

#

Completing right away is necessary for benchmark obviously

sour vine
thorny otter
#

in real game scenario

#

it's best to schedule at start of frame

#

and complete at the end
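One way this could look in a MonoBehaviour, as a sketch only: schedule in `Update` and complete in `LateUpdate`, so the main thread can do other work while the job runs. `DeformOnUpdate`, `FrameOffsetJob` and the fixed vertex count are hypothetical, and using `Update`/`LateUpdate` as "start" and "end" of the frame is an assumption about the project.

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;
using Unity.Mathematics;
using UnityEngine;

// Hypothetical per-vertex job used only for this example.
[BurstCompile]
struct FrameOffsetJob : IJobFor
{
    public NativeArray<float3> vertices;
    public float3 offset;

    public void Execute(int index)
    {
        vertices[index] += offset;
    }
}

public class DeformOnUpdate : MonoBehaviour
{
    NativeArray<float3> vertices;
    JobHandle handle;

    void Awake()
    {
        // Persistent because the array outlives a single frame.
        vertices = new NativeArray<float3>(50000, Allocator.Persistent);
    }

    void Update()
    {
        var job = new FrameOffsetJob
        {
            vertices = vertices,
            offset = new float3(0f, Time.deltaTime, 0f)
        };

        // Kick the work off early; do not touch `vertices` until Complete().
        handle = job.ScheduleParallel(vertices.Length, 64, default);
    }

    void LateUpdate()
    {
        // Blocks only if the job has not finished yet.
        handle.Complete();
        // The vertices are safe to read here, e.g. to upload to a mesh.
    }

    void OnDestroy()
    {
        handle.Complete();
        if (vertices.IsCreated) vertices.Dispose();
    }
}
```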

thorny otter
#

you have loop

#

Burst has to compile it

#

it can't vectorize it

#

you lose 4x-8x boost

sour vine
#

Well, look at that, you're right. Down to ~22ms

#

From 430 down to 22...

#

(from removing the loop)

thorny otter
#

now do ScheduleParallel

sour vine
#

Dang, down to 4ms now with a batch count of 8

thorny otter
#

could play with batch count even further, but I don't know much details on that tbh

#

just make sure batch is big enough

#

so scheduling a separate thread batch is justified

#

feels like you can make it to 2ms

sour vine
#

You do see that overhead from the multiple jobs now. Having the separate jobs is about 2ms slower (6ms)

thorny otter
#

as 1mil tbh is not even that much

sour vine
#

Yeah, for sure

#

Btw, giving it more batches doesn't make a difference (4, 8, 16, 64, all gave the same time)

#

(might be a small difference up at ~32 , as it goes between 4 and 3 ms)

thorny otter
#

how about 256

#

or 1024

sour vine
#

Nah

#

Maybe takes 3ms a bit more often?

#

Up at 5 million it takes about 20ms for the single job, and for the two jobs it takes 25ms. So you start to see the overhead a bit more

thorny otter
#

in a build it's even better

#

since no safety checks

#

and burst is AOT

#

instead of editor JIT

sour vine
#

But, that isn't too bad I think. I will probably do the multiple jobs still since it is more flexible and easier to manage the code.
