#archived-dots

1 messages · Page 234 of 1

robust scaffold
#

Ah, singleton entity is an entity that only exists by itself

#

no worries, im stuck coding my own project and Im trying to map out the component architecture in my head

glacial hazel
robust scaffold
#

Normally, you'll have an entity archetype (like a class) with a lot of entities constructed from it (like an array of identical class type but different properties)

glacial hazel
#

Aaaah

robust scaffold
#

A singleton entity is a single entity of a single archetype (class). Like when you create an object from a class and only that 1 object.

glacial hazel
#

I think with your last message I understood better

#

How do you create an archetype of entities?

glacial hazel
#

And can an archetype inherit from another

#

Or I'm thinking like object oriented haha

robust scaffold
#

Just completely throw inheritance out the window

#

There is "polymorphism" but it's completely pointer based and reinterpreting things.

#

There is no inheritance, there is, largely, no references. Everything is value typed and isolated.

glacial hazel
#

Those components, are you simulating real life? xD

robust scaffold
#

Economic simulator

glacial hazel
#

Wow

robust scaffold
#

That is possibly the most pure Object Oriented coding I have ever seen. And Im converting it to DOTS

#

The inheritance and event based programming is insane

#

It makes my eyes water seeing the garbage code

glacial hazel
#

Wow haha

#

And you're doing it because you want a massive simulation I guess

robust scaffold
#

It's not mine, it's some guy's academic model

robust scaffold
#

I think it's perfect for DOTS, once I somehow wrap my head around some of the relational aspects that Im struggling to convert to DOTS

glacial hazel
#

Sounds cool!

haughty rampart
robust scaffold
glacial hazel
#

And what's the use of an archetype

robust scaffold
#

It's the "object" type used to define the array that the data will be located in

glacial hazel
#

So you run a system in all the entities that belong to an archetype?

robust scaffold
#

Like how you make var arrayOfStrings = new String[15]

#

That string is the archetype of that array

robust scaffold
glacial hazel
#

Ok ok, so it's a way to group your entities then

#

Or categorize them I guess

robust scaffold
#

Generally, the translation between Unity DOTS terminology is this: archetype = type, component = property, chunk = array.
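To make that mapping concrete, here is a minimal sketch in plain C rather than Unity code. The names `PopChunk`, `chunk_add`, `chunk_total_wealth` and the capacity of 4 are all made up for illustration; the only point is that an archetype fixes which component columns exist, and a chunk stores those columns as arrays.

```c
#include <stddef.h>

#define CHUNK_CAPACITY 4  /* hypothetical; chosen small to keep the sketch readable */

/* One "archetype": every entity stored here has exactly these two components. */
typedef struct {
    float  wealth[CHUNK_CAPACITY];  /* component A, one slot per entity */
    int    age[CHUNK_CAPACITY];     /* component B, one slot per entity */
    size_t count;                   /* entities currently stored */
} PopChunk;

/* Append an entity's component values; returns its index, or -1 if the chunk is full. */
int chunk_add(PopChunk *chunk, float wealth, int age) {
    if (chunk->count >= CHUNK_CAPACITY) return -1;
    size_t i = chunk->count++;
    chunk->wealth[i] = wealth;
    chunk->age[i] = age;
    return (int)i;
}

/* A "system": walks one component column for every entity in the chunk. */
float chunk_total_wealth(const PopChunk *chunk) {
    float total = 0.0f;
    for (size_t i = 0; i < chunk->count; ++i) total += chunk->wealth[i];
    return total;
}
```

Because each component sits in its own contiguous array, a system that only needs `wealth` never touches the `age` data.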

safe lintel
#

@robust scaffold not really sure how you make a game without structural changes

robust scaffold
#

of course, structural changes will have to happen, because how else are the entities created in the first place, but minimizing them is my goal

safe lintel
#

polymorphism 🤔

robust scaffold
#

At initialization, it sets the index (since the target entity has not been created yet). At the end, it reads the index, looks up the actual entity corresponding to that index, and sets the entity. No structural change, no entity shifting. Can all be done in parallel.
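The two-pass idea can be sketched roughly like this in plain C. `Entity`, `Link` and `resolve_links` are hypothetical stand-ins, not Unity types; the sketch only assumes that the targets end up in an array whose positions match the stored indices.

```c
#include <stddef.h>

typedef struct { int id; } Entity;  /* stand-in for an entity handle */

typedef struct {
    int    target_index;  /* written at initialization, before the targets exist */
    Entity target;        /* filled in once the targets have been created */
} Link;

/* Second pass: swap each stored index for the entity created at that index.
   Every link is written independently, so the loop could be split across threads. */
void resolve_links(Link *links, size_t link_count,
                   const Entity *created, size_t created_count) {
    for (size_t i = 0; i < link_count; ++i) {
        int idx = links[i].target_index;
        if (idx >= 0 && (size_t)idx < created_count)
            links[i].target = created[idx];
    }
}
```

Nothing is added or removed from any array during the second pass, which is the property the message describes as "no structural change".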

#

And Franco, ignore what I'm doing. This is advanced DOTS

glacial hazel
coarse turtle
#

yes

robust scaffold
safe lintel
#

that is beyond my comprehension at this point 🥲

robust scaffold
#

in that example, ushort is 2 bytes wide, the other 6 bytes are ignored.

safe lintel
#

tbh with a working fps prototype, the major slowdowns are really physics and hybrid unityengine stuff, not any jobs/systems doing structural changes

robust scaffold
#

So with my 2.5 million entities, every little bit of micro-optimization matters.

safe lintel
#

@robust scaffold 👍 understandable, thats totally not my use case

robust scaffold
#

When ya think about it, DOTS inheritance is simultaneously non-existent and extremely fluid compared to C#.

#

All one has to do is identify which struct is sized larger than the other and where each property is located, then you can switch between any two struct types completely free.

glacial hazel
#

But probably when you re-interpret the data to the other struct it won't make sense. Unless the other struct has the same data, and more

#

More than it won't make sense, it won't be meaningful

#

Like if they're totally 2 different structs

robust scaffold
#

Something like that. An initial enum identifies which type the following data corresponds to. Then a switch statement is used to properly access the data within each struct type

haughty rampart
#

what's field offset for? any docs? haven't actually seen that anywhere

robust scaffold
#

Thats how one can mimic polymorphism within a component without structural changes.
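The code that was posted as an image did not survive in this log, so as a rough stand-in here is the same pattern in plain C, where a `union` plays the role that explicit field offsets play in C#. `PayloadKind`, `Payload` and `payload_as_double` are invented names for illustration only.

```c
#include <stdint.h>

typedef enum { KIND_COUNT, KIND_PRICE } PayloadKind;

typedef struct {
    PayloadKind kind;    /* leading tag: says how to read the bytes below */
    union {
        uint16_t count;  /* uses 2 bytes; the rest of the slot is ignored */
        double   price;  /* uses all 8 bytes */
    } data;              /* both members share the same storage */
} Payload;

/* Read the payload through whichever member the tag says is valid. */
double payload_as_double(const Payload *p) {
    switch (p->kind) {
    case KIND_COUNT: return (double)p->data.count;
    case KIND_PRICE: return p->data.price;
    }
    return 0.0;  /* unknown tag */
}
```

The component keeps one fixed size no matter which variant it holds, so switching variants only rewrites bytes in place and never moves the entity to a different archetype.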

coarse turtle
robust scaffold
#

Ha, sniped

coarse turtle
#

🙂

north bay
haughty rampart
robust scaffold
haughty rampart
robust scaffold
haughty rampart
#

yeah, i'll definitely do that as well. just did not know about fieldoffset until now.

glacial hazel
#

So... if you have one value stored there in that piece of memory, can you interpret that piece of memory, without changing it as more than one struct?

robust scaffold
glacial hazel
#

Or is to re-use that piece of memory, after changing some values, and then re-interpret it as another component

robust scaffold
#

Well that isnt the union example I put up earlier but it's along the same strategy

#

TShared and TDestination are identical structs, just typed differently because SharedComponentData and ComponentData can not be one component.

#

So I use the magic of pointer casting to just reinterpret TShared to TDestination and set the chunk component as TShared without any other changes.
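A minimal sketch of that reinterpretation in plain C, assuming two separately named structs with identical layout. `SharedData`, `DestinationData` and their fields are hypothetical, and `memcpy` is used here as the portable C way to reinterpret bytes; the original relies on pointer casts in unsafe C#.

```c
#include <string.h>

typedef struct { int province; float tax; } SharedData;       /* hypothetical */
typedef struct { int province; float tax; } DestinationData;  /* same layout, different type */

/* Compile-time guard: the reinterpretation only makes sense for equal sizes. */
_Static_assert(sizeof(SharedData) == sizeof(DestinationData),
               "layouts must match");

/* Byte-for-byte copy into the other type; no per-field conversion happens. */
DestinationData reinterpret_shared(const SharedData *src) {
    DestinationData dst;
    memcpy(&dst, src, sizeof dst);
    return dst;
}
```

If the two structs ever drift apart in size, the static assertion stops the build instead of silently producing garbage.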

#

If I could access chunk components by dynamic type handles, this would definitely be an event based system.

#

Well, it is an event based system

#

it requires the existence of a singleton containing ProvinceChangedEnable as a flag, then runs two systems mirroring the shared onto the chunk component. Since this is to run maybe once a minute or more, it's not a buffer element event

glacial hazel
#

I think I'm not ready to completely understand this haha. Anyways, I'll leave the chat now, thank you for your super kindness, wish you the best in your interesting-looking project! Probably I'll ask more next week or something, thanks!

robust scaffold
glacial hazel
#

My last question is... (sorry can't avoid it xD)

robust scaffold
#

im here mostly all day, my code is garbo anyways

#

and I need to scrap and rewrite the entire thing

glacial hazel
#

Do you think that DOTS is more complicated or hard to code than regular C# and Monobehaviours, or just different, and looks complicated because I'm just not familiarized with it?

haughty rampart
robust scaffold
haughty rampart
safe lintel
#

@glacial hazel I think dots is way easier to code for than old monobehaviours, I dont think kornflaks makes it easy on himself but I cant go back to non dots workflows, Ive tried several times and dots just is much more simple and elegant imo. Might take time to adjust though

glacial hazel
#

Wow, those are very good and hopeful answers

#

Awesome, I'll try to learn 🤓

robust scaffold
#

I mean, look at the code I need to write just to reset all of an entity's census component values to 0:

#

On the other hand, that operation over about 2.5 million entities is so fast, i had difficulties finding the actual operation time

#

0.01ms and even then it was a blip.

haughty rampart
#

boilerplate code will be reduced even more as dots is developed btw

robust scaffold
#

hopefully, a major feature Unity has promised is improved code gen to allow for C# macros.

haughty rampart
#

what would a c# macro be? i know cpp macros but i doubt that's what you're referring to

robust scaffold
#

Unity maps methods to C++ then codegens from there.

haughty rampart
robust scaffold
#

Yea, public release of 0.18 was scrapped. They're on 0.20 internally

#

They're still coding, we're stuck on 0.17 with Entities.ForEach stuck codegen'ing to IJobChunk code.

haughty rampart
#

they said they want to move loads of things to source generators. that's what i'm really looking forward to

robust scaffold
#

Microsoft these days are focusing heavily on source generation integration. C#9.0 or something talked a lot about source generators

#

When I read the preliminary patch notes, I can smell Unity all over it

haughty rampart
#

i've written a lot of source generators already. they are so awesome. and 2 days ago i realized they just dropped the 2.0 api for source generators.

north bay
#

I heard rumors that they got rid of IL weaving completely and moved to source generators, really excited for that

robust scaffold
#

Yep. Will take a few years though for 9.0 improvements to trickle down to unity. Hence why DOTS wont be production ready for a few years, along with other issues.

north bay
#

I had to manually rewrite all my Entities.ForEach into custom jobs because IL weaving is so abysmally slow
Nowadays I don't even touch Entities.ForEach anymore..

robust scaffold
# haughty rampart wdym 9.0 improvements?

I think it's C#9, it's one of the newest C# versions. I dont have much experience with source gens but people on the forums were estimating that a lot of Unity's issues stem from the lack of comprehensive source gen integration into C# compared to something like C++.

#

And the newest C# just dropped those very same source gen improvements. Just my thoughts personally but I think unity may have greased some wheels over at the Gates headquarters to push through what they want in C#

haughty rampart
haughty rampart
#

that are source generators

robust scaffold
#

No clue what those do and they wont be available for Unity programmers for years but yea. Maybe that was what Unity was waiting for?

haughty rampart
#

but it's not really linked to the C# version actually

haughty rampart
robust scaffold
#

I am 100% sure record structs are not allowed in unity and that came in.... 8.0?

haughty rampart
#

il2cpp is......good but not related to dots or anything really

haughty rampart
#

did you mean record classes?

robust scaffold
haughty rampart
robust scaffold
#

9.0, Record value typed components that contain pointers / references instead of pure values.

haughty rampart
robust scaffold
#

Not a class, they can be unmanaged and burstable

#

Basically just C# implementation of pointers

haughty rampart
robust scaffold
#

I swear they're value typed like structs and enums

haughty rampart
#

wait a moment please

robust scaffold
haughty rampart
robust scaffold
#

And, ewwww, inheritance.

#

Disgusting. Back to my list of void pointers

haughty rampart
haughty rampart
#

but they too have nothing to do with pointers

muted star
#

Turbo Makes Games has some tutorials which aren't outdated

robust scaffold
robust scaffold
#

Well, not pointers. References.

#

Damn, I must be confusing them with the Span<> implementation then

haughty rampart
#

what i am looking forward to in C# 10 is `with` support for normal structs

#

that will also be insanely awesome for dots

coarse turtle
#

is this on the c# 10 spec page? Nvm found it in C# 9 lol

robust scaffold
haughty rampart
haughty rampart
robust scaffold
haughty rampart
#

i'm totally gonna use it loads and loads

robust scaffold
#

I just reinterpret all my native arrays into ints or floats then jump indices to access single properties but that also works
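Roughly what "reinterpret and jump indices" could look like, sketched in plain C. `Census` and `sum_field` are made-up names, and the approach assumes a struct made only of floats with no padding, which the static assertion checks.

```c
#include <stddef.h>

typedef struct { float births; float deaths; float migrants; } Census;  /* hypothetical */

/* The stride trick below is only valid if the struct is exactly three packed floats. */
_Static_assert(sizeof(Census) == 3 * sizeof(float), "Census must be three packed floats");

/* Sum one field across all structs by walking the raw floats with a fixed stride. */
float sum_field(const Census *items, size_t count, size_t field_index) {
    const float *raw = (const float *)items;
    const size_t stride = sizeof(Census) / sizeof(float);
    float total = 0.0f;
    for (size_t i = 0; i < count; ++i)
        total += raw[i * stride + field_index];
    return total;
}
```

The field is chosen by a number instead of a name, so one function can serve every property of the struct.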

#

Finally, got this thing to vectorize

haughty rampart
#

why don't you inline the method anyway?

robust scaffold
#

This is not vectorized, length from batch in chunk does not result in vectorization for some reason

robust scaffold
haughty rampart
#

ahhh. ok. so that's something you'll change for release?

robust scaffold
#

Yea. Just for debugging here. I'll change it to AggressiveInlining when I'm happy with it.

#

I isolate the loop functions so I can see what it's compiling into using burst and it's a real simple NoInlining to check it

#

What the fuck, using batchInChunk.Count is vectorized now

haughty rampart
#

XD

robust scaffold
#

No it isnt anymore. Burst, what the fuck

#

Glorious purple

haughty rampart
#

maybe it thinks it aliases with something?

robust scaffold
#

The header has nothing it can possibly alias with

haughty rampart
#

yeah idk. was just thinking

robust scaffold
#

Okay, now it's not vectorized. Now Im paranoid. I need to check all my other functions if they're still vectorized

#

Now it's not vectorized and using the native-array's length

#

Burst. Why you do this?

haughty rampart
#

burst is fully deterministic. there must be something that burst can't reliably get enough information of

robust scaffold
#

wait, if you read the vectorized code, it says vpaddd

haughty rampart
#

something i'm curious about is how dots will be moddable.......it seems pristine for it......yet i am not sure how i would inject new code into a compiled project

robust scaffold
#

If Burst can support modding, DOTS can

haughty rampart
haughty rampart
#

how'd i miss that

robust scaffold
#

I figured out the difference in vectorization / not vectorized. There is no vectorized set. Only vectorized math functions. *= 0 is equivalent to set 0... but that doesnt make sense. It's still being set to the ultimate value.

#

wait

#

Vectorized set

#

Burst. What the fuck

#

and now it's not vectorized anymore... identical code. Just refreshed the assets

haughty rampart
robust scaffold
haughty rampart
#

i think the last i read burst in depth was with 1.5

robust scaffold
#

Honestly at the level of effort for bursted mods, I might as well fork the codebase and modify the source code myself. If it's open source...

haughty rampart
#

um, maybe try setting burst compilation to synchronous only. because otherwise i think the problem you face is a side effect.

robust scaffold
#

It's basically merging burst outputs. Fairly dangerous on the security side. Literally no scripting limits

haughty rampart
#

i hate modding apis that are like 'ohhhh, only visual scripting for youuuuu'

#

cough vrchat cough

robust scaffold
haughty rampart
agile dome
#

Bonus points if you allow automatic mod syncing from the server 😛

robust scaffold
haughty rampart
haughty rampart
robust scaffold
haughty rampart
#

and tbh. i can very much do without such people

robust scaffold
#

The average person wont assume that the mod they're loading is a raw DLL. After all, nearly every other popular game they play uses Lua or some other scripting language to enforce a "safety" on the mods they download. So they'll assume the same.

haughty rampart
robust scaffold
#

I mean, power to ya if you can somehow communicate the risk of the mods you load in your game as unprotected and raw compiled code but I just open source my game and ask whoever bothers to play to compile it themselves

haughty rampart
#

that doesn't work for titles you wanna get some profit from

robust scaffold
haughty rampart
#

and srsly. you can literally make a whole popup in your game about the risk of using mods which the user has to agree to so there's your risk communication

robust scaffold
haughty rampart
robust scaffold
haughty rampart
#

and there really shouldn't be

robust scaffold
#

Well, that question is for lawyers, the legal team, and how much liability one assumes by providing a program that blindly trusts DLL extensions provided to it upon non-admin confirmation from a user.

haughty rampart
#

wth. the user AGREES to use the dll because he has to separately download it, put it in the folder and start the game. and you have a whole section in your AGB and a huge clause at the start of the game where the user has to agree to know the risks of using mods

agile dome
#

That seems reasonable, especially if you don't advertise or build additional tooling for it yourself. If you were doing some sort of mod syncing setup, I believe it would be a good idea to take more responsibility than that.

haughty rampart
#

@robust scaffold did synchronous compilation reap more expected results?

robust scaffold
haughty rampart
#

Hm strange

robust scaffold
#

I added a blank new line, now it's vectorized

#

definitely a race condition somewhere

#

This reliably vectorizes, when an array operates on another

#

but not when one of them is constant

robust scaffold
#

I think I figured it out. The burst inspector lies

#

The method stamps are not the same

#

Even if I reload unity a few times / refresh assets

#

Burst doesnt want to update

#

This is the proper stamp, with un-vectorized application of 0

haughty rampart
#

ah, so it keeps a prev version in the inspector?

robust scaffold
#

I actually dont know any more. I've long since renamed that method to see if renaming things update the inspector

#

it still has the nonvectorizedclear name

#

im gonna try and restart unity, see if that helps

haughty rampart
#

hm, seems like a bug to me. best file a bug report on the forum

robust scaffold
#

There ya go. Now it refreshed. It's just a text stamp. I dont think it matters

#

Yea. It's not supposed to be vectorized. the burst inspector just seemed to stall at an old version using text logic

#

Is that a bug? Yes. Worth reporting? Nah. Restarting unity thankfully isnt bad enough to be worth writing up a bug report

robust scaffold
#

By instead initializing the value that the array is being set to outside the vectorized code section, it seems to now know reliably that the setting of the array can be vectorized

haughty rampart
robust scaffold
#

adding the in parameter to the parameter list changes the code drastically. I dont know how to read it though. Give me a sec to wait for unity to restart

robust scaffold
haughty rampart
robust scaffold
#

Using the in parameter changes the vmovdqu to vmovups

haughty rampart
#

with int you copy the int to the method. with in you reference a readonly pointer to the int for the method

#

for int that does not really matter since pointer and int are usually same size

robust scaffold
#

dqu or ups? which one is faster?

haughty rampart
#

i have no idea.

robust scaffold
#

removing the in parameter, it has less code but also has that additional vmovd in front

haughty rampart
#

i'd imagine dqu though

#

yeah, vmovd probably copies the int to the method since it's a mov

robust scaffold
#

well, is throughput higher number better?

#

Latency is the same for both

haughty rampart
#

where are the docs? would be appreciated if you can link that^^ i'm interested

robust scaffold
#

that's for intel, which my computer is using

#

AMD also has a version as well but it's not as nice. It's a pure text file

haughty rampart
#

oh btw. throughput: lower is better

#

@robust scaffold

robust scaffold
haughty rampart
#

it's CPI

#

cycles per instruction

robust scaffold
#

oh duh, didnt see that

#

or well, recognize it

haughty rampart
#

see, i knew dqu was better

#

but you have an additional mov

robust scaffold
#

yea, and the loop goes around the dqu, so the small upfront cost of vmovd probably is insignificant... probably

haughty rampart
#

actually yeah

robust scaffold
#

Those arrows introduced in burst 1.7 are really nice

haughty rampart
#

fuck all people who say micro optimization is irrelevant. WE'RE DOING NANO OPTIMIZATION

robust scaffold
#

yea, it's insignificant

haughty rampart
#

yeah

robust scaffold
#

and this is setting entities to 0. Before vectorization, it already took 0.01ms singlethreaded

#

Even if I cut the time of operation in half, it'll still show up as 0.01ms as the profiler doesnt show nano-seconds

#

BUT PURPLE.

haughty rampart
#

with 0 entities your code performs worse than ups, with 1 entity it performs equally, and with 2+ it performs better than ups

robust scaffold
#

See, micro-optimization. Reducing runtime by microseconds. Nano-optimization, reducing runtime by nano-seconds (counting them CPU cycles)

#

Yep, if one wants a loop that sets an array to vectorize, declare the value as a const outside the vectorized method and pass it in. Do not directly set values within the vectorized function.

robust scaffold
#

That's the ultimate vectorized code. Spent....3 hours figuring that out.

#

original code replaced reset value with 0.

viral sonnet
#

damn, I wanted to ask half an hour ago if you tried ExpectVectorized

#

was that all it took now?

robust scaffold
#

ExpectVectorized will throw an error

#

it cant detect much beyond simple addition of 2 arrays

#

and im working on a different one now

viral sonnet
#

ah ok, i think expectvectorized does nothing else. total troll method lol

robust scaffold
#

Ive replaced the inline conditional with a math.select and moved the constant values to method parameters

robust scaffold
robust scaffold
#

Read the method call and check the parameter list

#

if it isnt updated to the list in the code itself, the burst output failed to update

#

Restart unity

viral sonnet
#

haven't really followed it. oh sheet, that sucks.

robust scaffold
#

and it'll fix itself

#

for example, that doesnt match the code image above

#

so I need to restart unity

#

and then check the burst inspector again

viral sonnet
#

if you can make a small repo, post it in the forum and be famous like tertle!

robust scaffold
#

nah, i'll accept my anonymity.

#

See, now it updates after a restart

#

lets see what changed...

#

absolutely nothing at all

#

well no, theres no vectorized commands before the loop like in the old version...

robust scaffold
viral sonnet
#

seems like an oversight

#

so census[i] = 0 doesn't vec but census[i] = resetValue as parameter does vec?

robust scaffold
#

yep

#

and resetValue is a const 0

viral sonnet
#

interesting find

stiff skiff
#

Wait really?!

robust scaffold
#

100% certain. I've stared at this section of code for hours

#

vmovss is not a good command. Let me see if using a direct pointer helps instead of ref NativeReference

#

Nevermind, it's literally identical. Well fuck

stiff skiff
#

Whats that command again to have the burst compiler complain if something isnt vectorized?

robust scaffold
#

What the fuck, now it's unrolled

stiff skiff
#

And its using packed singles instead as well

robust scaffold
#

I HAVE DONE NOTHING DIFFERENT EXCEPT RESTART UNITY AND IT UNROLLS ITSELF FOR ME????????

#

BURST, WHY

stiff skiff
#

So are you sure that inlined 0 was the issue with your example above 😉 ?

robust scaffold
#

inlining the 0 into the for loop will break vectorization

#

replacing it with a parameter whose value is set by a constant integer outside the function will result in a vectorized function

stiff skiff
#

As a joke, could you XOR it with itself, and see how it likes that

#

as there is a vxorps

robust scaffold
stiff skiff
#

I guess the issue with the hardcoded 0, is that there might not be a vmovps that takes an immediate value

robust scaffold
stiff skiff
#

and with the const instead it becomes a memory load

robust scaffold
#

Nope

robust scaffold
#

vxorps exists, i know it does. Unity is doing vxorps against itself to set values to 0 in their actual code

#

but detecting it in my custom code doesnt work

stiff skiff
#

could you split reading and writing census into 2 lines?

robust scaffold
#

Like that?

#

no change in resulting burst

stiff skiff
#

Yes that

robust scaffold
#

no change in resulting burst

stiff skiff
#

Which I guess is a weird way of writing 0 haha

robust scaffold
#

all arrangements result in that

stiff skiff
#

I should really setup a test project for messing with this some time

#

wait

#

is that just straight up doing memset on your whole array?

#

removing the loop

#

Which would make sense

robust scaffold
#

wait, is that good?

#

it... seems better?

stiff skiff
#

well there is no loop in there right?

viral sonnet
#

yes

stiff skiff
#

So it figured out you are setting the same value to the whole array

#

and instead uses memset to just write 0 into that entire memory block

robust scaffold
#

okay, so memset > vectorized > not vectorized
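For reference, the two shapes being compared look roughly like this in plain C. This is only a sketch of the idea with hypothetical function names, not the Burst code from the screenshots; an optimizing compiler is free to turn the first form into the second when it can prove every element receives the same byte pattern.

```c
#include <stddef.h>
#include <string.h>

/* Element-by-element reset; the value arrives as a parameter, as in the chat. */
void reset_loop(int *census, size_t count, int reset_value) {
    for (size_t i = 0; i < count; ++i)
        census[i] = reset_value;
}

/* Zero-only reset: one bulk call clears the whole memory block. */
void reset_memset(int *census, size_t count) {
    memset(census, 0, count * sizeof *census);
}
```

Note that `memset` fills bytes, so it is a drop-in replacement only for values such as 0 whose every byte is identical.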

#

goooooood to know

stiff skiff
#

vectorizing is not some magical improvement

robust scaffold
#

well I just wasted 3 hours

stiff skiff
#

hell, it might sometimes be SLOWER

robust scaffold
#

god, I wish i knew assembly

stiff skiff
#

some basic assembly knowledge is always nice to have

robust scaffold
#

Well, I have none. I'm a nuclear engineer, not a comp sci

#

Ask me about neutrons, and I can do whatever ya need. Ask me about assembly, and I have no clue

stiff skiff
#

The idea of "vector" code, is that you can do the same instruction on multiple things

#

using (one of) a specialized instruction set that the target CPU supports

viral sonnet
#

basically any memX stuff you can do, memset, memcpy tends to be the fastest thing there is really for cpus

robust scaffold
#

right, identify mem-X. Really should color it something special, like uhhhhhh red?

viral sonnet
#

would be helpful yeah

robust scaffold
#

My next task is this massive chunk, see what I can do to improve it. Now I doubt i can do anything mem-X related to it

viral sonnet
#

doesn't look like it at a glance

robust scaffold
#

It's decently purple

viral sonnet
#

yeah looks good

#

all the inner loops are vec'd

robust scaffold
#

now mad is not doing it in one line. It's cutting it up into vmulss vaddss vdivss

stiff skiff
#

Is this vectorizing tho?

robust scaffold
#

it's purple

stiff skiff
#

Sure, but its not using packed singles anywhere

#

Well, it's doing so a bit further down I guess

viral sonnet
#

at the top it moves all to xmm4 etc.

robust scaffold
#

This is with no optimization. Purple everywhere...

#

Fun

stiff skiff
#

Again, it's all single scalar, purple or not

#

the purple just means it's an AVX instruction right?

robust scaffold
#

yep

stiff skiff
#

That doesn't mean it's better sadly

viral sonnet
#

yeah, zero has a point

robust scaffold
#

hrm, true

stiff skiff
#

If you want, I can run you through the assembly you have there on a short voice chat?

#

Give you a bit more of an understanding of what it does, what it means

robust scaffold
#

There's the vfmadd I was looking for, but yea

stiff skiff
#

All good. It's just too much to write here haha.

robust scaffold
#

just take a glance through that. Other than the lack of singles packing, anything else I can do to that?

stiff skiff
#

But in short, the "vectorized" stuff, are the assembly instructions that you see end in PS, not SS

robust scaffold
#

alright, aim for maximum number of PS

stiff skiff
#

PD is also fine (for doubles)

robust scaffold
#

p stands for packed I assume

stiff skiff
#

Yes

#

Also, this kind of optimization, is really only worth it in your hot loop, if the profiler says it's something that needs work

#

Though it's a fun puzzle for learning

robust scaffold
#

I want to get as much using PS as possible so I can identify what patterns result in it

#

this of course doesnt, and I need to figure out what does

stiff skiff
#

you could see it like this, you could take paper and draw 4 vertical columns in it, write the input at the top, and do the EXACT same step on every of those 4 inputs, until you get to the result, on all 4, at the same time. Then it can be vectorized
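The paper-and-columns picture can be written down as ordinary C. In this sketch the inner loop over `LANES` is the "four columns": every lane performs the identical multiply-add with no branching and no dependence on its neighbours, which is the shape a vectorizer looks for. `mad_columns` and `LANES` are illustrative names, and the element count is assumed to be a multiple of four to keep the sketch short.

```c
#include <stddef.h>

#define LANES 4  /* the four "columns" from the analogy */

/* out[i] = a[i] * b[i] + c[i], with the same step applied to every lane.
   Assumes count is a multiple of LANES. */
void mad_columns(const float *a, const float *b, const float *c,
                 float *out, size_t count) {
    for (size_t base = 0; base < count; base += LANES) {
        for (size_t lane = 0; lane < LANES; ++lane) {
            size_t i = base + lane;
            out[i] = a[i] * b[i] + c[i];
        }
    }
}
```

If one lane needed a different operation, or needed the result of the lane next to it, the columns would no longer move in lockstep and the loop would typically fall back to scalar instructions.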

robust scaffold
#

very simple, does result in vps

stiff skiff
#

vmovps ymm0, .... this loads the 4 floats (packed single) into ymm0

robust scaffold
#

This addition however breaks the v_ps

#

if it's just by itself, it works fine:

stiff skiff
#

This is a loop within the loop over inflation.length right?

robust scaffold
#

yep

stiff skiff
#

you mean that it works with the loop being 1, but not with it being 2

robust scaffold
robust scaffold
#

Thats what it looks like with values > 1.

#

no more v_ps

#

and i know there's a maddvps version

#

thats replacing the maddss

stiff skiff
#

Pretty sure it's unrolling that loop when it's 1

robust scaffold
#

well, it's not a loop at 1

stiff skiff
#

though I agree, it should be able to make it into a mad thats done twice with the loop. dunno why not

#

what happens if you literally just copy the line twice, instead of the loop

robust scaffold
#

now it identifies the v_ps version of madd
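The actual code was posted as an image and is not in this log, so the following is only a guess at the general shape, in plain C with made-up names. It shows the same arithmetic written once with a short inner loop and once with that loop copied out by hand, which is the change being tested here.

```c
#include <stddef.h>

/* Inner-loop form: two multiply-add steps per element, driven by k. */
void apply_steps_loop(float *value, size_t count,
                      const float rate[2], const float offset[2]) {
    for (size_t i = 0; i < count; ++i)
        for (size_t k = 0; k < 2; ++k)
            value[i] = value[i] * rate[k] + offset[k];
}

/* Same arithmetic with the inner loop written out by hand. */
void apply_steps_unrolled(float *value, size_t count,
                          const float rate[2], const float offset[2]) {
    for (size_t i = 0; i < count; ++i) {
        value[i] = value[i] * rate[0] + offset[0];
        value[i] = value[i] * rate[1] + offset[1];
    }
}
```

Both functions compute identical results; any difference is purely in what the compiler's unroller and vectorizer make of them.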

stiff skiff
#

kek

#

burst 1.7.0

"Change the optimization pipeline to run the loop unroller exclusively after the loop vectorizer. This improves codegen in a lot of cases (mostly because the SLP vectorizer is unable to vectorize all the code that the loop unroller could have)."
#

I guess this is screwing this specific example over

robust scaffold
#

time to downgrade

stiff skiff
#

Would be interesting to see if this behaviour "fixes" itself in an older version

#

Because it feels like a bug

#

However, I've bashed my head against stupid bugs enough today. Time for sleep! Lemme know how the downgrade went 😉

robust scaffold
#

well manually typing it out does retain vps

robust scaffold
#

1.6.2, produces the unrolled variant properly

safe lintel
#

lol my charactercontroller job burst output is 80k lines, i dont know how one would find the time to optimize this

robust scaffold
#

yep, im going line by line

#

and 1.7.0 inspector is very buggy, 1.6.2 is good

#

found the limit

#

it's actually 99 loops

#

nooo, i cant outsmart the unroller

robust scaffold
#

Looks like I need to manually type out the v_ps functions. Fun

robust scaffold
#

Assuming direct control over the assembly:

#

seems like the compiler can identify what patterns result in MAD float operations. Huh. Maybe manually coding out vectorized code is the way forward

#

This program can now only run on windows based intel computers within the last 3 years. Support for other computers will require payment

viral sonnet
#

my test has some shocking results honestly

#

spellstats1 profile marker is nearly double the time of spellstats2

robust scaffold
#

yea, the lookup mem-copies every single struct you call to a "local" variable they then hand to the job

#

the exposed returns a pointer you directly access
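The difference between the two lookups can be sketched in plain C. `SpellStats`, `get_copy` and `get_ref` are hypothetical names and this is not the Unity API, just the copy-versus-pointer distinction being described.

```c
#include <stddef.h>

typedef struct { float stats[16]; } SpellStats;  /* hypothetical 64-byte component */

/* Lookup by value: the whole struct is copied out for the caller. */
SpellStats get_copy(const SpellStats *table, size_t index) {
    return table[index];
}

/* Lookup by pointer: the caller reads the stored struct in place. */
const SpellStats *get_ref(const SpellStats *table, size_t index) {
    return &table[index];
}
```

The copy costs more the larger the struct gets, but it is a snapshot that later writes cannot change; the pointer is cheap and always sees the live data, which is exactly why it is harder to make safe across threads.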

viral sonnet
#

unity has made some terrible design with this. wow

#

I have to change so much code

#

but also quite happy that i can improve a LOT

robust scaffold
#

it's to enforce ECS style upon entities. If you have a direct reference, you can place methods on components and have them get called on their properties. You can even have inheritance in them with direct reference

viral sonnet
#

😄

robust scaffold
#

also god damn, it is surprisingly really hard to manually vectorize something

viral sonnet
#

Not sure I can follow what you mean with ECS style. I want to read fast, CDFE can't read fast when it copies. That's terrible design and I see no safety reasons, or other excuse really

robust scaffold
viral sonnet
#

Why is writing not fine?

robust scaffold
#

unity can not monitor what happens when a reference is returned from an entity. So they cant enforce thread safety. Why they didnt give us another option I have no clue.

viral sonnet
#

I get it when it would skip chunk version increments but you can get a RW pointer just fine, it's what they do anyway

#

They can't ensure that anyway 😄

#

I mean, they prevent it pretty strictly unless you put the Disable tag on it. Then it's do what you want, even race conditions

robust scaffold
#

Yea, i dont know why they didnt ship an option like exposed CDFE's get reference or something.

viral sonnet
#

with their design they just prevent using Interlocked easily

#

I saw it myself, such issues are handled really well with it. No reason to circle around with writing, reading and iterating on endless data

robust scaffold
#

I give up manually vectorizing my code

#

I cant get that to work. Well the automatic definitely does but not the other manual additions

#

the compile works fine but not all entities are added, some are skipped, some are double added

viral sonnet
#

I think you have the greatest usage of burst and vectorization with that project. Hardly any game code looks like this. It's financial software if I remember correctly, right?

robust scaffold
#

yea, simulating it

viral sonnet
#

that's really unique for unity. I think Burst has never been tested that much lol

robust scaffold
#

i just wanna get my toes wet manually vectorizing something so when I actually need it, like in the very long and twisting calculations, I can just take over from burst and vectorize it myself

viral sonnet
#

Interested what the forum guys have to say about my find. So many use CDFE and so many fuck their performance with it

robust scaffold
#

but I got my shitty manually vectorized thing to work at least. Packing consists of 8 floats, not 4.

viral sonnet
#

ah cool!

robust scaffold
#

lets see about that performance...

viral sonnet
#

my 2 big systems rely quite heavily on CDFE. Gonna rewrite all this tomorrow and see how much it'll improve. But now I'm off to bed. Have a good night o/

zenith wyvern
pearl orchid
muted star
#

dyor

glacial hazel
#

CDFE is ComponentDataFromEntity?

#

What's the non slow alternative for that?

wary anchor
#

https://forum.unity.com/threads/dots-releases-latest-release-dots-0-17.1044523/
Should we be sticking with 2020.3.9f1 rather than any of the newer subversions of 2020.3?
And should I stick to these versions of eg burst, collections etc listed in this thread? I'm trying to track down the cause of a crash and I want to rule out having the wrong versions of any of the relevant packages!

full epoch
muted star
safe lintel
#

I keep updating to the latest 2020 and it hasnt been an issue so far

robust scaffold
robust scaffold
#

Just let us shoot ourselves in the foot Unity. Do it. Dewwwwww it.

viral sonnet
#

Great answer from Joachim. Looking forward what they come up with. Really glad they are acknowledging the problem. It cropped up a few times in the forum.

#

Later on I'll come around in changing all CDFEs to read only pointers. Really excited where it'll end up.

#

I already have a tight barrier around my code to not have any structural changes in there. Personally I'm completely on the safe side

robust scaffold
#

I have a value from 0 to 7. I need to generate a 32 byte (or 8 int) wide mask from it efficiently.
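
A minimal sketch of one way to build such a mask, assuming plain C# without Burst intrinsics: lane i is all ones (-1) when i is below the count and zero otherwise. `TailMask.Build` is a hypothetical helper, not something from the thread.

```csharp
using System;

public static class TailMask
{
    // Hypothetical helper: builds an 8-lane int mask (32 bytes) for `count` valid lanes.
    // Lane i is -1 (all bits set) when i < count, otherwise 0.
    public static int[] Build(int count)
    {
        if (count < 0 || count > 8)
            throw new ArgumentOutOfRangeException(nameof(count));

        var mask = new int[8];
        for (int lane = 0; lane < 8; lane++)
            mask[lane] = lane < count ? -1 : 0;
        return mask;
    }
}
```

With AVX2 the same comparison can be done on all 8 lanes at once by comparing a broadcast of the count against the constant lane indices 0..7 (the `vpcmpgtd` instruction).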

robust scaffold
#

what the absolute flying fuck

stiff skiff
#

Chunky

robust scaffold
stiff skiff
#

Please note that the source hints you see in there, dont always exactly line up with what the assembly is for

robust scaffold
#

works without burst, doesnt work when burst is on...

#

i dont know if that's really worth it... I couldnt get rid of the for loop anyways

stiff skiff
#

What is this for, and what is it supposed to do?

robust scaffold
#

overly elaborate adding 1 to every value in the array

#

that's the remainder factor for the 1 - 7 values

robust scaffold
#

When I need to manually vectorize something, i'll need to figure this out

#

alright, I think I broke burst. 1.7.0 is really unstable... well no shit

stiff skiff
#

I dont think you understand what mm256_add_epi32 does

robust scaffold
#

Adding 2 values lengthwise?

stiff skiff
#

And I really recommend against using these intrinsics without a very in depth understanding

robust scaffold
#

it cant be that hard to understand, i mean it works without burst

robust scaffold
#

I mean, is that really that hard to understand?

robust scaffold
#

It's resulting in the same exact code (well not unrolled), what's wrong with using add_epi32?

haughty rampart
# robust scaffold

btw beware that default(v256) is old syntax. new code should be written with just v256 mask = default

robust scaffold
haughty rampart
#

yeah. it produces the same output so it doesn't change anything but lots of people who worked with c# for years are slow in catching up with new syntax

robust scaffold
#

alright, good to know

#

imma try and eliminate the for loop with macro bitshifting

#

the main reason why I'm doing this very elaborate removal of a for loop is that the code I actually intend to type out the assembly for wont be as simple as addition. It will instead be located in a macro function taking only v256 indices.

haughty rampart
robust scaffold
haughty rampart
# robust scaffold Should I stop using `var` and instead strongly type my variables?

that of course depends on the developer. for the compiler it's the same either way. i personally never use var, i just wanna know what type to expect without having to rely on an ide. and most often you're just shifting the type from the left side to the right side anyway. the roslyn guidelines specify exactly when to use var:
- use var when the type is immediately visible, e.g. `var name = "Mind"` or `var person = new Person()`
- do NOT use var in cases like `var name = new Person().Name` or similar
- use var in for loops
- do NOT use var in foreach loops

#

i personally see even less reason to use var now because `var person = new Person()` can now easily be written as `Person person = new()`

viral sonnet
#

"do NOT use var in foreach loops" - explain pls 😄

robust scaffold
haughty rampart
haughty rampart
robust scaffold
#

The clutter does get annoying at times, like defining archetypes.

haughty rampart
#

yeah i personally dislike that

robust scaffold
#

nearly doubles the length of the actual code

haughty rampart
#

exactly

viral sonnet
#

I adopted var with ECS, too much clutter yeah

robust scaffold
#

And hours later, I've successfully developed a "generic" operation to properly handle a variable sized input array and allow for manual vectorization

#

All of that is identical to this:

#

Well, slower by about 20% from just glancing through the profiler
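
For reference, the general shape of that pattern can be sketched in plain C#, under assumptions: process the array in blocks of 8 and handle the 0 to 7 leftover values separately. The scalar inner loop stands in for a single 256-bit packed add, and `AddOneBlocked` is a hypothetical name, not the code from the screenshots.

```csharp
public static class BlockedAdd
{
    // Adds 1 to every element, 8 values per block plus a scalar tail.
    public static void AddOneBlocked(int[] values)
    {
        const int Width = 8; // 8 ints fill one 256-bit register
        int blockEnd = values.Length - (values.Length % Width);

        // Full blocks: each inner loop corresponds to one packed add.
        for (int i = 0; i < blockEnd; i += Width)
            for (int lane = 0; lane < Width; lane++)
                values[i + lane] += 1;

        // Tail: the remaining 0 to 7 values, handled one at a time.
        for (int i = blockEnd; i < values.Length; i++)
            values[i] += 1;
    }
}
```

The result is the same as a plain `for` loop over the whole array; only the tail handling differs between implementations.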

#

I now understand why everything should be packed into 32 byte wide structs. Makes life so much easier.

#

ya know, that is an option. Literally pad your structs to be 32 bytes wide

#

of course, ya need to fill it with something because thats room for 8 floats and if ya need only 1 float, thats 7 floats of wasted memory. Or 3 doubles

viral sonnet
#

or use [StructLayout(LayoutKind.Sequential, Size = 64)]

robust scaffold
#

64's too large, 32 is actually the width of v256

#

you want it to be implicitly convertible to v256 without size difference

viral sonnet
#

i mean to use structlayout

#

just copy pasted 🙂

robust scaffold
#

Well yea, that works. Or explicit. Still, that's 28 bytes of wasted space if you stick with floats
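
A small sketch of the padded layout being discussed, assuming plain .NET rather than a Unity component; `PaddedValue` and `PaddedValueInfo` are hypothetical names used only for illustration.

```csharp
using System.Runtime.InteropServices;

// One float of payload padded out to 32 bytes, the width of a 256-bit vector.
[StructLayout(LayoutKind.Sequential, Size = 32)]
public struct PaddedValue
{
    public float Value; // 4 bytes used, the remaining 28 bytes are padding
}

public static class PaddedValueInfo
{
    // Reports the marshalled size of the padded struct in bytes.
    public static int SizeInBytes() => Marshal.SizeOf<PaddedValue>();
}
```

`SizeInBytes()` reports 32 here, which makes the 28 wasted bytes per float explicit.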

viral sonnet
#

isn't that kind of a given with padding?

robust scaffold
#

just something to keep in mind when designing components. Either keep them single value only or expand to fill 32 bytes.

#

when ya think about it, padding 28 bytes onto a float is completely worthless, turning the vectorized function into literally a single operation.

#

I was thinking about maybe merging 4 entities into 1... somehow

viral sonnet
#

yeah, doesn't help I think

robust scaffold
#

4 or 8 entities

#

all of this wouldn't be a problem if I could somehow force chunk sizes to contain multiples of 4 or 8 entities, so the arrays produced from them are aligned with no remainder when converted to v256.

#

hrm, thats an idea

viral sonnet
#

the chunk layout could certainly be designed for that. how would the last chunk be handled by vectorization when there are only 2 entities left?

#

and I see you have lots of iterations, what's your general count?

robust scaffold
#

the additional padding of more entities to result in evenly divisible by 4 (or 8) entities

viral sonnet
#

so you would ensure even the last chunk has a multiple of 4 or 8?

robust scaffold
#

those entities will be tagged with PaddingEntity component or something so the values within them dont actually result in changes to gameplay

robust scaffold
#

im trying to logic out the reasoning right now but that's the gist of it. And seeing if there's any hint to dynamically assigned chunk sizes in the training code that may fuck with it

viral sonnet
#

sure, if you can make sure the entity count fits. or do you mean that by padding? essentially creating useless entities just to keep the multiples up

viral sonnet
#

could be a pain to handle but yeah, I think the logic is sound

robust scaffold
#

dummy entities of the exact same archetype except with a component containing a bool determining if it's a dummy padding entity or not

viral sonnet
#

there's a way to change the chunk size. I've tried that once but it didn't do anything so /shrug

robust scaffold
#

because the archetype must be identical between legit entities and not, there cant be a zero sized padding tag since that changes archetype

viral sonnet
#

yeah right, that would break it

robust scaffold
#

the main problem with my code right now is that I use a loooooooot of shared component data. Well, not a lot, just one. But it fractures my chunks into about 3000 possibilities.

#

So the maximum padding entity count is 3000 x 7 (where 1 legit entity exists per shared component possibility). Thats 21,000 entities wasted.

#

in the grand scheme of things, 21,000 entities out of 2,500,000 entities is barely a blip...
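
Spelling out the arithmetic, as a rough sketch: a group of n entities needs p(n) padding entities to reach a multiple of 8, which is at most 7 per group. With about 3000 groups that gives the worst case quoted above.

```latex
p(n) = (8 - n \bmod 8) \bmod 8 \le 7, \qquad
3000 \times 7 = 21{,}000, \qquad
\frac{21{,}000}{2{,}500{,}000} = 0.0084 = 0.84\%
```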

#

I will need to profile extensively the performance comparison between the tail implementation I screenshotted above and using padding entities.

viral sonnet
#

that's not good. can't you bring the data into something else? like a nhm

#

and I see you have lots of iterations, what's your general count? - is your usual iteration count 2.5M?

robust scaffold
#

if I were to instead transfer the data into a NHM, I would require twice the memory now. One storing the original entities and now one massive collection containing the copy and an additional <7 indices to make it divisible by 8. That's unworkable.

robust scaffold
viral sonnet
#

that's quite bonkers. and you have less than 0.1ms for it?

robust scaffold
#

the operation above? No, it's ~6 ms.

#

just adding 1 to a component of all 2.5M entities takes 6 ms

viral sonnet
#

ok, kind of relieved. I think I would've deleted my code otherwise haha

robust scaffold
#

it's 0.6ms

#

i didnt turn on burst

viral sonnet
#

really fast, and you're also writing that amount back

robust scaffold
#

thats with manual vectorization

viral sonnet
#

so with normal code Burst couldn't figure it out?

robust scaffold
viral sonnet
#

what do you mean with manual vectorization then?

robust scaffold
#

I just need a really simple case where I can compare the burst outputs so everything is aligned as expected

#

I type out the assembly in the code

viral sonnet
#

ah i see

robust scaffold
#

is identical to

#

except there's about 12 more lines to support the manual assembly version compared to burst automatic really simple code

#

that was what the screenshot above was doing

#

actually, manual vectorization is about 1ms faster (in total frame accumulated time) than burst's automatic implementation. I think it's the various safety checks required

#

While manual vectorization is accessing the data directly

viral sonnet
#

that's quite a lot

#

say, have you the other posted solution running for GetAsRef?

robust scaffold
#

Yea the one you shared. I got it

#
```csharp
using Unity.Collections.LowLevel.Unsafe;
using Unity.Entities;

public static class ComponentDataFromEntityExtensions
{
    // Returns a reference to the component data instead of a copy.
    public static unsafe ref T GetAsRef<T>(this ComponentDataFromEntity<T> componentDataFromEntity,
        Entity entity) where T : struct, IComponentData
    {
        // Reinterpret the CDFE as the struct that exposes its fields publicly.
        var entityPrivate =
            (ExposedComponentDataFromEntity<T>*) UnsafeUtility.AddressOf(ref componentDataFromEntity);

#if ENABLE_UNITY_COLLECTIONS_CHECKS
        AtomicSafetyHandle.CheckWriteAndThrow(entityPrivate->m_Safety);
#endif
        entityPrivate->m_EntityComponentStore->AssertEntityHasComponent(entity, entityPrivate->m_TypeIndex);

        entityPrivate->CheckComponentIsZeroSized();

        // Fetch the read/write pointer and hand it back as a ref.
        void* ptr = entityPrivate->m_EntityComponentStore->GetComponentDataWithTypeRW(entity,
            entityPrivate->m_TypeIndex, entityPrivate->m_GlobalSystemVersion,
            ref entityPrivate->m_Cache);
        return ref UnsafeUtility.AsRef<T>(ptr);
    }
}
```
viral sonnet
#

ah, that's the one from the other guy (forgot his name) are you using that in your main code base?

robust scaffold
#

Yep. works well without needing to import another package

#

Unity should really just expose it, it's literally a single line removed from the normal CDFE...

#

Also hrm, I disabled the safety checks in burst and restarted unity to clean the burst cache, yet it's still about maybe 0.5ms slower than manually vectorized. Even with my absolute shit tail method call.

viral sonnet
#

where do you have the struct ExposedComponentDataFromEntity code? it's bugging me about protection levels

robust scaffold
# viral sonnet where do you have the struct ExposedComponentDataFromEntity code? it's bugging m...
```csharp
using System;
using System.Diagnostics;
using Unity.Collections.LowLevel.Unsafe;
using Unity.Entities;

// Mirrors the field layout of ComponentDataFromEntity<T> with public fields.
internal struct ExposedComponentDataFromEntity<T> where T : struct, IComponentData
{
#if ENABLE_UNITY_COLLECTIONS_CHECKS
    public readonly AtomicSafetyHandle m_Safety;
#endif
    [NativeDisableUnsafePtrRestriction] public readonly unsafe EntityComponentStore* m_EntityComponentStore;
    public readonly int m_TypeIndex;
    public readonly uint m_GlobalSystemVersion;
#if ENABLE_UNITY_COLLECTIONS_CHECKS
    public readonly bool m_IsZeroSized; // cache of whether T is zero-sized
#endif
    public LookupCache m_Cache;

    [Conditional("ENABLE_UNITY_COLLECTIONS_CHECKS")]
    public void CheckComponentIsZeroSized()
    {
#if ENABLE_UNITY_COLLECTIONS_CHECKS
        if (m_IsZeroSized)
            throw new ArgumentException(
                $"ComponentDataFromEntity<{typeof(T)}> indexer can not index the component because it is zero sized, you can use Exists instead.");
#endif
    }
}
```
#

basically CDFE but with everything public

#

You need to add these 3 lines in a separate .asmref file located in the folder those functions are in:

#
```
{
    "reference": "Unity.Entities"
}
```
#

that bypasses the "internal" function protection nonsense

viral sonnet
#

ahh, that's it. thanks!

robust scaffold
#

Np, i am starving. Im gonna go find some food then maybe actually set up Unity Performance Tests instead of just looking at the profiler and eyeballing it

viral sonnet
#

enjoy, don't code while starving! I've worked with the unity perf test and it's wonky from timings

#

hope you have more stable results with it

robust scaffold
#

I might just use the profiler markers - recorder built in methods to just output data into the debug logger.

#

Does performance test hook onto profiler markers?

#

no it doesnt, alright I'll roll my own performance testing solution. Profile markers work in burst anyways and Im pretty sure Unity's performance test cant.

viral sonnet
#

if you profile the editor they will show up

#

but I'd roll with your own, let it really run, much more stable

viral sonnet
#

performance: 2 > 3 > 1

#

shame ref is a little slower than the pointer. it's so much easier to write and read

#

the ToSpellStats extension seems to make no difference if I use "this ref" or not. Which would mean if the struct is already a ref there's no by value parameter involved. good that it works like this, wouldn't have expected it though

viral sonnet
#

those profiler markers are really unreliable

#

or the cpu/worker threads are 🙂

#

i have 4 ways now and from measuring it, I honestly can't tell which one is the fastest

#

well, the first is always the slowest but the other ones are fluctuating like crazy

#

i have direct pointer access, casting the pointer with ref instead of UnsafeUtility and one with UnsafeUtility.AsRef

#

I should write a test for it. this seems pointless

#

or I should stop wasting time because it's clear anyway which is fastest. the one most annoying to write -.-

robust scaffold
#

But yea. Getting and modifying the pointer directly skips a lot of safety checks

viral sonnet
#

I only read from it, in the process a new struct is created

robust scaffold
#

Ah. Hrm. Interesting

viral sonnet
#

the SpellStats struct

robust scaffold
#

Mark spellstats as volatile. Just a prefix. Dont know if it'll change anything

viral sonnet
#

I'd need to make an isolated test to really see what's going on with regards to safety checks, etc...

#

spellstats is a local to the thread. but good point, I need to read into volatile and how it could be useful to me in some contexts

robust scaffold
#

Volatile will just prevent the compiler from optimizing away previous actions. For example, you're doing nothing but writing to spellstats

#

The compiler may be deleting actions by 1 and 2 that may skew the results

viral sonnet
#

ok, I see, that may help indeed

#

Local variables cannot be declared volatile

#

well, let's see if it makes a difference as property

#

hm, doesn't really work in that context.

#

can't really set it as a private property for the job, i'll try anyway

#

i remember not being able to set anything private in jobs but let's see

#

a volatile field can't be of the type "spellstats". lol screw this 😄

viral sonnet
#

have you ever worked with ants profiler or something? where you can profile and see the timing of every single line of code. that's so fucking useful, never got it to work for unity and for burst I didn't even try anymore. that would be so cool to have

#

I'm honestly a little lost where the cpu time is going

viral sonnet
#

really popular for c# development, app or web

robust scaffold
#

oh, never heard of it before

#

i only do unity anyways

#

Holy shit

viral sonnet
#

lol, don't remember how much it was back then when I used it. Not much money for a company 😄 but for an individual, yeah. I also hate the subscriptions that are so common now -.-

#

you don't own anything anymore, kind of sucks

#

tight! visual studio can do the same, let's see how much I can get out of it

#

just the execute isn't that useful

#

ok, VS can profile all my code. holy shit 😄 also with burst

#

not quite sure how to interpret this 😄

robust scaffold
#
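```csharp
// Needs System.Linq for Average/Select/Sum; _times holds 100 samples.
protected override void OnUpdate()
{
    if (_recorder.enabled)
    {
        // Store the time the marker recorded, in nanoseconds.
        _times[_index++] = _recorder.elapsedNanoseconds;
        
        if (_index < _times.Length)
            return;

        _recorder.enabled = false;

        var mean = _times.Average();
        // Sample standard deviation; 1/99 is 1/(n - 1) for n = 100 samples.
        var standardDev = math.sqrt(1d / 99d *
                                    _times.Select(time => time - mean).Select(innerPow => innerPow * innerPow)
                                        .Sum());
        // Convert nanoseconds to milliseconds, rounded to 3 decimals.
        Debug.Log($"[Results] Mean: {math.round(mean / 1e3d) / 1e3d}ms." +
                  $" SD: {math.round(standardDev / 1e3d) / 1e3d}ms.");
    }

    // Hold space to start a new recording run.
    if (!Input.GetKey(KeyCode.Space)) 
        return;
    
    _index = 0;
    _recorder.enabled = true;
}
```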
#

Just hammered out a performance "recorder". Press spacebar to output general statistics of the area of code recorded by the profiler marker

#

it records 100 frames then logs the statistics to the debug log
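
Written out, the statistics the snippet logs are the mean and the sample standard deviation over the n = 100 recorded frame times (the `1d / 99d` factor is the 1/(n - 1) term). The symbols below are just notation for this note:

```latex
\bar{t} = \frac{1}{n}\sum_{i=1}^{n} t_i, \qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} \left(t_i - \bar{t}\right)^2}, \qquad n = 100
```

Both values are in nanoseconds until the final division, which rounds to whole microseconds and then divides by 1000 to print milliseconds.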

viral sonnet
#

good stuff!

#

this seems weird

robust scaffold
#

i dont know if the numbers mean what ya think it means

#

because why would setting a local variable take so much numbers / time?

#

Also thats what the output looks like

viral sonnet
#

it's just setting values from a struct to a local variable. no idea why it's so high

#

VS calls the value cpu unit

#

maybe similar to tick?

#

it's reliable but this part sticks out

robust scaffold
#

it shouldnt take that long though

viral sonnet
#

part of interlocked and holy shit HasComponent ...

robust scaffold
#

did you run the game in order to get those values?

viral sonnet
#

attached to the editor

robust scaffold
#

huh, i dont know then

#

maybe it's counting the number of commands run?

viral sonnet
#

well, all numbers seem kind of expected. the only thing that sticks out is the var source I sent before, that one I don't get

robust scaffold
#

Also that's automatic burst vectorization's performance

#

nearly 8 ms difference, why do I not believe that?

viral sonnet
#

what was the one you posted before? without burst?

#

or manual?

robust scaffold
#

manual yea

#

Thats what the profiler says, 8ms is even more than the total runtime

viral sonnet
#

can you disable the markers when you take your recording?

#

it's massive overhead

robust scaffold
#

markers dont take any overhead, well any more than when the profiler is looking at it

viral sonnet
#

not true, my frames take 5 times longer with markers

robust scaffold
#

Well it's not much different

viral sonnet
#

interesting, I have very different results with sampling. maybe i'm a bit excessive, 13 x 250k

robust scaffold
#

yea, you really only need what, 100 frames? 500 at most. A good sample size. Not 250k....

viral sonnet
#

depends on what you want to measure 😉 i'm measuring the different code paths inside the method

robust scaffold
#

ah, im just measuring addition so its not much

viral sonnet
#

k, seems negligible then

sinful cipher
#

I once had a bad experience with declaring struct fields inside of a struct of the same type (with the intention of making some kind of node structure).
Maybe you can already figure this did not go very well.

However, nesting NativeCollections inside of other NativeCollections should not produce any problems, right? (E.g. NativeArray<NativeArray<int>>)

sinful cipher
viral sonnet
#

i think it's the pointer to a pointer problem

robust scaffold
#

Generally you want to linearize your data to maintain cache coherency. Burst will optimize by just adding to the pointer the type size. By nesting, you prevent that optimization.

viral sonnet
#

which means non-linear memory layout

sinful cipher
#

Use case:
NativeHashMap<Entity, NativeArray<VertexData>> to associate large amount of vertex data with a particular Entity

viral sonnet
#

use nativemultihashmap

sinful cipher
#

Hmm...

#

Also if VertexData for a particular Entity is swapped in and out frequently?

robust scaffold
#

But if you require that the order of data remain constant (NMHM does not guarantee order when set in parallel), use NativeHashMap<Entity, UnsafeList<VertexData>>

#

You can not nest Native-X containers. They're structs, but with safety checks on they carry a managed DisposeSentinel reference, so they aren't blittable.

sinful cipher
#

VertexData elements for a particular Entity should appear in sequence, but nothing more

robust scaffold
#

Does the vertex data change? Like a single value in that list change?

sinful cipher
#

I vaguely recall encountering an issue when attempting to put a NativeArray inside an IComponentData. However, the IComponentData will do nothing more than hold a pointer/reference to the array, right?

robust scaffold
#

Do you know the vertex data at load time and never expect to change it? Maybe swap for a different vertex set?

robust scaffold
viral sonnet
#

the vertex data being swapped in and out frequently is a problem. can you work around it?

sinful cipher
robust scaffold
#

Vertex data is not meant to be placed in native containers. Far too large

#

If vertex data does not change often (maybe once every few seconds at most), bake the vertex data into Blob Assets.

#

BlobAssets are basically read only massive native containers that you can also nest things with. And they're intended to contain nested arrays

#

Unfortunately, they're also just arrays of blittable types. So no strings (unless you code it into char[]) and no pointers.

#

And they were designed for vertex and mesh data in mind. Something that should change very rarely because they're very expensive to set up.

sinful cipher
#

The problem is not necessarily one of performance, but one of ease of maintenance. Right now I have IComponentData that acts as a key/index to a range of VertexData within a NativeArray shared by multiple mesh Entities.

The reason for this shared NativeArray: My mesh calculation jobs operate on a batch of meshes, as its wasteful to do 1000 jobs for meshes that consist of just a few triangles.

However I am not very pleased with how complex everything has become. So I wish to simplify/streamline it as much as possible. Which is why I'm exploring options such as NativeHashMap<Entity, NativeArray<VertexData>> and such.

robust scaffold
#

that will mean a unique set of VertexData for every entity though unless you start fragmenting your chunks

sinful cipher
#

I'll look into the UnsafeList. So far I have not used it before

robust scaffold
#

UnsafeList is basically NativeList with all its safety checks stripped out

sinful cipher
#

These jobs can last multiple frames so I'm not that concerned about how the chunks will behave and such

sinful cipher
robust scaffold
#

Ah, if they last multiple frames, you can not attach Unsafe directly to a component

sinful cipher
#

Oh wait so unlike NativeList, UnsafeList can be put in an IComponentData?

robust scaffold
#

Jobs involving entity data can not last longer than a single frame. So yea, the only way forward is NativeHashMap<Entity, UnsafeList<VertexData>>

robust scaffold
sinful cipher
#

I see

robust scaffold
#

NativeList / NativeArray are also raw pointers to buffers but with a little bit special features

sinful cipher
#

I see I see

#

thanks alot

robust scaffold
#

Burst is basically built around operating on NativeArrays so other containers are basically second class. They could work but dont optimize well. I dont believe they can even autovectorize.

#

Which is what I'm largely discovering with my tests on dynamic buffers. A bit of an issue

sinful cipher
robust scaffold
#

We can get a pointer to the actual buffer using .AsNativeArray().GetUnsafePointer() so manual vectorization should work...

#

But if I could do it, Burst should be able to as well.

sinful cipher
#

Hmm

robust scaffold
#

Alright, the profiler marker can measure singlethreaded very well. Why is multithreaded so high?

#

Well that's just clear proof, auto-vectorized is garbage for singlethreaded job. Or I forgot to turn off safety checks

#

Yes. I did forget to turn off safety checks. Yea, that checks out with my understanding.

#

Automatic vectorization

#

Manual vectorization, what the fuck

#

it's faster than adding one to every index????

viral sonnet
#

oh wow, by a lot

robust scaffold
#

It's adding a random number to every value, doesnt really matter what it is since it'll constantly overflow but still

viral sonnet
#

weren't you saying you used the same operations that burst made?

robust scaffold
#

yea, except Automatic isnt vectorized at all. Burst didnt seem to understand the NextInt command applied to all values

viral sonnet
#

ah

robust scaffold
#

im very suspicious of the manually vectorized results anyways. Somehow faster than += 1 to every index? No fucking way

haughty rampart
#

well i mean manual vectorization is nice, but with that you lose platform dependent compilation, don't you?

robust scaffold
#

And yes, i've turned off leak detection and safety checks for both

robust scaffold
#

Only intel windows 10 computers from the last 3 years can run this program, sorry

haughty rampart
#

so i mean i am glad that you are pushing the boundaries but is it really worth the effort in the long run? starting to doubt it

#

i am all about performance, but if it breaks on any future machine without rewriting pretty much everything.......

robust scaffold
#

5.7 ms down to 1.6 ms (maybe) just by spending a few hours typing out the assembly yourself? That's big

#

and it's not even the assembly as burst handles a lot of the pain as well even using manual commands

robust scaffold
robust scaffold
#

The burst processor detector is compile time constant so all the extra code is stripped

#

Burst compiles it all then passes the relevant data based on whatever starts up the program

haughty rampart
#

hmmm. i mean i see that it definitely can make a big difference, but i don't see bigger studios (or even small ones) writing burstable code with manual vectorization

robust scaffold
#

And honestly all you need to do is support AVX2 which combines all the sub-instructions:

#

There's also Neon but psh, who has an AMD computer in 2014 anyways

haughty rampart
#

who has an amd pc in 2021 XD

robust scaffold
#

"Those taiwanese chip developers are dead and will never develop a viable chip anyways, hahahaha" - Intel Execs circa 201X.

haughty rampart
#

i'm definitely still intrigued though.....keep going

robust scaffold
#

wait no, ARM is mobile phones

#

Even better, I will never be targeting ARM ever

haughty rampart
#

someday....

robust scaffold
#

microprocessors from Intel and AMD proposed by Intel in March 2008 and first supported by Intel with the Sandy Bridge[1] processor shipping in Q1 2011 and later on by AMD with the Bulldozer[2] processor shipping in Q3

#

Yea, intel and AMD are collaborating and running AVX2 instructions. So that is all I need to bother to support

haughty rampart
#

yeah, seems decent

robust scaffold
#

ARM / Neon controls over 98% of the mobile market though. So if you need to optimize for that, use ARM instructions

#

Each feature level above provides a compile-time check to test if the feature level is present at compile-time:

haughty rampart
#

yeah ok. i literally can't wait for my colleagues to freak when seeing advanced code though. XD gonna be insanely funny

robust scaffold
#

Yea, it's compile time constant. So no need to worry about 150+ lines of manually vectorized code to just add 1 to every index of an array slowing ya down. Burst got ya covered.

haughty rampart
#

ok nice

robust scaffold
#

I believe it's set by the build target as well. Guess that does do something in the end.

#

I mean, the manual vectorization does result in regular C# code if you follow it all the way down in the source. Except it's structured in a way that the Burst pattern recognition can easily map.

haughty rampart
#

yeah that's of course understood

robust scaffold
#

So if you dont leave a fallback method and still compile for a platform that doesnt have that instruction set, it'll still work. Maybe not as well optimized though due to the massive amount of extra code.

haughty rampart
#

meh. it's still bursted code. will still be pretty good

robust scaffold
#

yea, I repeated the tests and confirmed that each entity is being randomly set a value. That's 2.5x faster than regular bursted code.

#

Manual

haughty rampart
#

pristine 👌

robust scaffold
#

Automatic

#

massive difference, manual unrolled the random value generation into a series of identical code and with the AVX2 commands as well whereas automatic just has the generic mov.

#

And the AVX2 command includes vpaddd, so it's a packed addition. Full vectorization.

robust scaffold
haughty rampart
#

think so

robust scaffold
#

I dont recommend it if you plan on doing manual vectorization. Burst eventually stopped updating its cache of burst function outputs and wouldnt recompile, even if I reset the editor and turned burst off and on again

#

I think manual vectorization broke something and it didnt reset even after reload. I had to go back to 1.6.2 for everything to work properly.

#

1.7.0 introduced "cached" versions of bursted functions to speed up burst compilation inside a file and I think I broke something

haughty rampart
#

hm yeah. hope that gets fixed in a release soon

robust scaffold
#

Well yea, Burst 1.7.0 isnt even available for the "public" yet and only accessible through manual version request

#

Still, the arrows are beautiful. God I miss them in 1.6.2

haughty rampart
#

oh definitely

robust scaffold
# haughty rampart oh definitely

Also a big tip I've largely discovered today. Use [AssumeRange()] everywhere you can. Burst loves it and drastically changes the resulting compiled assembly for the better with it.

haughty rampart
#

oh i'm kinda an advocate to use any attribute whenever available as much as possible

robust scaffold
#

That assume range starting 1 removes a wasted "if empty" return check in the assembly.

haughty rampart
#

oh awesome

robust scaffold
#

Just one of the attribute I've seen actually do something

haughty rampart
#

[NoAlias]

robust scaffold
#

And the int remainder max value of 7 occasionally allows Burst to unroll and fully vectorize things as well. 8 ints is a magic number and allows for easy packing

robust scaffold
#

I'm pretty sure I can also vectorize the random number generation.... pretty sure

#

Alright, once you multithread it, the gap between manual (top) and automatic (bottom) Burst isn't that large. ~0.1 ms so a 15% improvement when you manually vectorize. Pretty much what Burst themselves say to expect.

#

TLDR version: If you have a single-threaded, very math-heavy job, manually vectorize. You can get massive speedups by doing so. Multithreaded jobs like those working on Entities, ehhhhh. Massive amount of time for a 15% improvement (that will get larger though the fewer cores the computer has). Performance critical items should be manually vectorized as expected.

#

And the fact that it has taken me ~20 hours to create a functional manually vectorized job that does += 1 on every index of an array kinda tells ya how much effort a beginner in assembly (no previous knowledge) has to put in to do something so incredibly simple.

haughty rampart
robust scaffold
haughty rampart
#

well....burst still is part of dots

robust scaffold
# haughty rampart none said it'd be trivial

Yea, I wish it was though. But that's autovectorization. As I mentioned earlier today, I wish there was a way to define chunk entity sizes. Multiples of 8 would make manual vectorization literally as easy as just automatic vectorization.

haughty rampart
#

maybe in the future

robust scaffold
#

The vast majority of my time coding was spent trying to get the remaining 1 - 7 components vectorized without using a value-wise for loop

#

Well fuck, valuewise for loop is faster
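
For reference, a minimal sketch of the "vector body plus value-wise tail" shape being described. It uses `System.Numerics.Vector<int>` as a stand-in, not the Burst AVX2 intrinsics from the conversation, so the lane count is whatever the hardware gives (8 ints on AVX2). The class and method names are hypothetical:

```csharp
using System;
using System.Numerics;

public static class AddOneDemo
{
    // Adds 1 to every element: full-width vector steps first, scalar loop for the leftovers.
    public static void AddOne(int[] data)
    {
        int width = Vector<int>.Count;   // lanes per vector, hardware dependent
        Vector<int> one = Vector<int>.One;
        int i = 0;

        // Vector body: handles as many full-width blocks as fit.
        for (; i <= data.Length - width; i += width)
        {
            var v = new Vector<int>(data, i);
            (v + one).CopyTo(data, i);
        }

        // Value-wise tail: the remaining 0 to width-1 elements.
        for (; i < data.Length; i++)
        {
            data[i] += 1;
        }
    }

    public static void Main()
    {
        var data = new int[19];
        for (int k = 0; k < data.Length; k++) data[k] = k;

        AddOne(data);
        Console.WriteLine(string.Join(",", data));
    }
}
```

The tail loop is the plain `for` loop that turned out to be faster than trying to vectorize the last few elements.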

viral sonnet
#

Have you thought about putting float4 or int4 in a comp instead of float and int? might be easier to get at least the multiple of 4

robust scaffold
viral sonnet
#

it's kind of weird though for the entity index itself

#

but if that doesn't matter

robust scaffold
#

If i only need 1 float, I'm wasting 7 float worth of space if I pad it out

#

Entity index determines the size of the native array I'm computing over

#

a float4 in an entity is equivalent to 4 entities with 1 float

viral sonnet
#

that's true with pointer access

robust scaffold
#

And that is what the native array is, a linear pointer access point

viral sonnet
#

when the cache line is 64 bytes, reading a native array at [0], the next 4, assuming ints should be free, right?

#

eh, more but you get the meaning 🙂

robust scaffold
#

Vectorized random int generation. Pulled from the NextInt() function itself

#

wellll, no. This will result in 8 identical ints

viral sonnet
#

I just put a marker in a job that has no code in between and it's still measuring like 1.87ms

viral sonnet
#

Yeah, I don't know what it's measuring exactly. Itself? lol

robust scaffold
#

code sample?

viral sonnet
#

lol

robust scaffold
#

let me try....

viral sonnet
#

the marker is in a method that's called 250k times

#

but, it shouldn't do that?

robust scaffold
#

mine are called about 4,000 times

viral sonnet
#

it should be literally 0

robust scaffold
#

Nah, there's an overhead, even for mine which is 4k

viral sonnet
#

that's annoying

robust scaffold
#

0.078/4*250 = 4.875ms, so it's not a linear cost at least

#

as long as the number of entities and method calls are the same, the overhead should be constant

#

comparisons still work, just not perfect comparisons

viral sonnet
#

well, at least I can tell now that the variable assign from a struct that VS was so heavily profiling is garbage measurement. i mean, it was weird to have it be like 1.7ms, but when an empty marker has 1.5ms it's quite okay.

#

and disappointing what VS is measuring. i really dislike these kinds of false positives that i'm chasing sometimes. just a waste of time -.-

robust scaffold
#

huuuuh, I think random is broken. They're missing pointers

#

NextInt() from random gives the same exact value every time

viral sonnet
#

the Unity.Mathematics.Random?

robust scaffold
#

because they're not modifying their state directly

#

yea

viral sonnet
#

use it with ref

robust scaffold
#

next state doesn't actually change the state...

viral sonnet
#

otherwise it returns the same value

robust scaffold
#

oh

viral sonnet
#

i had the same thing in the beginning and was sooo weirded out because sometimes all my casters were just missing the target!
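
A minimal sketch of why that happens, using a made-up `TinyRandom` struct as a stand-in for a struct-based generator like `Unity.Mathematics.Random` (the xorshift step here is only illustrative). Because it is a struct, passing it by value hands the callee a copy, so the caller's state never advances:

```csharp
using System;

// Hypothetical stand-in: state is a plain field and NextInt() advances it in place.
public struct TinyRandom
{
    public uint state;

    public TinyRandom(uint seed) { state = seed; }

    // One xorshift step; mutates the field on this instance only. Seed must be nonzero.
    public int NextInt()
    {
        state ^= state << 13;
        state ^= state >> 17;
        state ^= state << 5;
        return (int)state;
    }
}

public static class RandomCopyDemo
{
    // Receives a copy: the caller's state is never advanced.
    public static int RollByValue(TinyRandom rng) => rng.NextInt();

    // Receives a reference: the caller's state advances.
    public static int RollByRef(ref TinyRandom rng) => rng.NextInt();

    public static void Main()
    {
        var rng = new TinyRandom(1234u);

        int a = RollByValue(rng);
        int b = RollByValue(rng);
        Console.WriteLine(a == b); // the copy was advanced, not rng, so both rolls match

        int c = RollByRef(ref rng);
        int d = RollByRef(ref rng);
        Console.WriteLine(c == d); // rng itself advanced between the two rolls
    }
}
```

The same applies to storing the generator in a job or component: read it, roll, and write the updated struct back.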

thorn fossil
#

Where do I dispose of a referenced native array in a job? The calling function or inside the job itself? [BurstCompile] private struct CreatePathJob : IJob { public NativeArray<AstarNode> nodeArray;

viral sonnet
#

i think you know by now but state is a field

robust scaffold
robust scaffold
#

but the math isn't compiled away so it did its job in making the compiler do things

robust scaffold
# thorn fossil thanks

it's a job that schedules disposal of the container so just chain it onto the back of the job using it

viral sonnet
#

oh nice, so vectorized random is working now?

robust scaffold
#

no, it'll keep returning the same value

#

what it's doing is preventing burst from memsetting away all the logic

thorn fossil
# robust scaffold it's a job that schedules disposal of the container so just chain it onto the ba...

Like this? ```
private void FindPath(Vector3Int start, Vector3Int end)
{
    NativeArray<AstarNode> nativeArray = new NativeArray<AstarNode>(nodeArray.Length, Allocator.Temp);
    for (int i = 0; i < nodeArray.Length; i++)
    {
        nativeArray[i] = nodeArray[i];
    }

    Vector3Int startCell = navigationMap.WorldToCell(start);
    Vector3Int endCell = navigationMap.WorldToCell(end);

    CreatePathJob job = new CreatePathJob
    {
        nodeArray = nativeArray,
        start = new int2(startCell.x, startCell.y),
        end = new int2(endCell.x, endCell.y),
        bounds = bounds
    };

    job.Run();

    nativeArray.Dispose();
}```
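
The snippet above runs the job synchronously with `Run()`, so disposing right after it is fine. For a scheduled job, the "chain it onto the back" advice might look roughly like this. `FillJob` and `DisposeChainExample` are hypothetical names; `NativeArray<T>.Dispose(JobHandle)` is the disposal overload that takes a dependency:

```csharp
using Unity.Collections;
using Unity.Jobs;

// Hypothetical job, only here to have something that uses the array.
public struct FillJob : IJob
{
    public NativeArray<int> data;

    public void Execute()
    {
        for (int i = 0; i < data.Length; i++)
        {
            data[i] = i;
        }
    }
}

public static class DisposeChainExample
{
    public static JobHandle ScheduleAndDispose(int length)
    {
        // TempJob rather than Temp, since the array has to outlive this call.
        var data = new NativeArray<int>(length, Allocator.TempJob);

        JobHandle work = new FillJob { data = data }.Schedule();

        // Schedules the deallocation to run after `work` and returns a handle for it.
        JobHandle cleanup = data.Dispose(work);
        return cleanup;
    }
}
```

The caller still needs to complete or chain the returned handle at some point, and must not touch the array after scheduling the disposal.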
robust scaffold
#

so this is replicating a complex and expensive task: the manually vectorized version is identical in logic to the autovectorized (or not) version.

robust scaffold
viral sonnet
#

the rewrite to use CDFE pointers shaved off around 1ms. I had hopes it would be more but it's honestly not that bad. as it's multi threaded it's really over 8ms overall

robust scaffold
viral sonnet
#

I've thought a lot about getting around the random access but it usually means writing out data and starting another iteration, which is always more expensive than just taking the hit. some mechanics will never be that fast but I still can't stop thinking about solving this 😄

robust scaffold