#archived-dots
then the chunk is sparse? lol sorry, that bit is confusing me
if I have X1/Y1/Z1 archetype and I read X1, what's in the cacheline? Y1/Z1 or X2/X3?
Definitely X2 and X3. Why would we be including other, unnecessary components?
why unnecessary? when you expect to work on one archetype, chances are high you read the other IComps too in the next few lines of code
Bevy is (afaik) the most advanced ECS right now
its ECS is kinda like Flecs and Entt both into one, with a sprinkle of some of DOTS ideas
This seems like a decent explanation, maybe it will help
rust is on the same perf tier as cpp/c. The case with Burst is that it uses some tricks that aid vectorization
so burst can be a bit faster than cpp if it happens to autovectorize better than cpp code
which is honestly not that hard to see
outside of vectorization, burst should be same speed as C/Cpp/rust
and rust also does those autovectorization helpers, but they are harder to get compiled in the compiler due to the insane layers of inlining going on. Still better chance to vectorize stuff than cpp which almost never vectorizes
bevy stuff could be compared quite a bit to burst due to the extra compiler checks and limitations going on
rust does not let you run parallel for systems if the compiler complains
Burst inspector is so nice. And the option for no inlining.
offtopic but: every time i look at cpp i think to myself: wtf? who came up with this syntax. it writes AND reads terrible. and the language itself 'compiles' with hacks left and right
"A chunk is divided into parallel arrays: one for each component type of the archetype and one array for the entity ID's themselves. " - This explains why the components are stored in their own arrays: for a chunk with archetype, say, ABCDEFG, we often only want to loop through a subset of the components, like A and B, rather than through all of them. If components of a single entity were instead packed together, looping through a subset of the components would require wastefully accessing the memory of the other components we don't care about. - from that I must again conclude, Unity is using what bevy calls sparse but chunked up in 16k
```
for (all chunks of the archetypes that include A and B)
    for (i = 0; i < chunk.count; i++)
        // a and b of the same entity
        var a = chunk.A[i]
        var b = chunk.B[i]
```
their pseudocode here wastes cache lines, yes?
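To make the cache-line question concrete, here is a minimal sketch in plain C# of the parallel-array layout the quoted docs describe. It is not the Entities API: `Chunk`, `A`, `B`, `C` and the 64-byte cache line are assumptions for illustration only. With this layout, reading `A[i]` pulls neighbouring `A` values into the cache line, not the `B` or `C` of the same entity.

```csharp
using System;

// Hypothetical stand-in for one chunk of an archetype with components A, B and C.
// Each component type gets its own contiguous array (structure of arrays).
public sealed class Chunk
{
    public readonly float[] A;
    public readonly float[] B;
    public readonly float[] C;
    public int Count;

    public Chunk(int capacity)
    {
        A = new float[capacity];
        B = new float[capacity];
        C = new float[capacity];
    }

    // How many consecutive values of one component fit in a cache line
    // (64 bytes is an assumption; it is typical for current x86 CPUs).
    public static int ValuesPerCacheLine(int cacheLineBytes = 64) =>
        cacheLineBytes / sizeof(float);

    // A system that only needs A and B walks two arrays linearly
    // and never touches the memory that holds C.
    public float SumAB()
    {
        float sum = 0f;
        for (int i = 0; i < Count; i++)
            sum += A[i] + B[i];
        return sum;
    }
}
```

So the loop in the pseudocode streams through two arrays at once; whether that counts as "wasting" cache lines depends on how many of the archetype's components a system actually reads.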
The answer to who came up with the syntax: An old danish guy in the 80s
Though I guess the inventor of C should get a lot of blame as well
syntax wise, c# is probably my favorite language. although there are a few things that i also don't enjoy. e.g. switch statements. they really don't feel like they belong in c# imo. switch expressions look like they belong much more
Dont like the inline switch? I think its okay.
I definitely liked C# until I got into rust
no. i LOVE the new switch expressions. the old switch statements don't look like c#. aka
switch var
{
case 1:
break;
}
Ya know. I wish I could answer but I dont know enough about bevy to answer properly
I only read the linked page, from the bullet points Unity has the worst of archetypal and sparse set. They use chunks with summed up IComp arrays, slower iteration on a per entity basis and costly adding/removing, less parallelism. I'm honestly perplexed what I'm reading
when I find out that chunks are not even linearly aligned I'm throwing tables
I'm not sure but the guy who wrote it describes bevy's archetype storage as being the same as Unity's
the descriptions from unity and the description of archetypal ecs on bevy doesn't match though
I read that bevy post long ago and also thought Unity Entities would work like that but then conflicting statements cropped up and now I'm just confused
this is what I understand from how Unity Entities works regarding memory layout
incrementing the pointer from CompA1 along would result in A2, A3, A4, etc...
if A4 is actually aligned, I don't know. just assuming
from how allocating works chunk2 could be somewhere else in memory
Pointers dont work across chunks. The chunks themselves are not aligned
I think the chunks themselves are sparse but the internals of the chunks are archetype
and they patented that shit ...
it really wouldn't make too much sense aligning the chunks as well
sure it does, depends on the algorithm
but having chunks not aligned is really the least of the problems I'm seeing
no. what if you add multiple hundred entities? or even 1 more than you can put in your already aligned chunks? you'd have to reorder your whole memory
it'd only really make sense if you'd have lots of persistent entities where the count doesn't change
are you saying compa1, compa2, compa3 different components? or different entities of the same component
different entities of the same component
my project uses a pretty big archetype of 10+ comps. that layout seems terrible for that purpose
haha nice
that said i don't really understand why you think this is a bad layout, or maybe i'm misunderstanding how you think the memory is laid out
so you have lots of wasted cachelines then?
if I read CompA1 I'll have CompA2/3/4 in the cache line. If I don't use this data in the next lines of code and read other comps all those cachelines would be wasted, right?
so, especially when you have big archetypes the memory layout would be better when you read CompA1 and get compB1, compC1, etc...
except the whole point of this archetype is to allow simd code
```
{
    if (!batchInChunk.Has(this.healthType) || !batchInChunk.Has(this.healthMaxType))
    {
        return;
    }

    var output = this.healthNormalized[this.healthNormalizedEntity];
    var healths = batchInChunk.GetNativeArray(this.healthType);
    var healthMax = batchInChunk.GetNativeArray(this.healthMaxType);
    var outputArray = output.AsNativeArray().GetSubArray(indexOfFirstEntityInQuery, healths.Length);

    for (var i = 0; i < outputArray.Length; i++)
    {
        // Unity.Burst.CompilerServices.Loop.ExpectVectorized();
        outputArray[i] = new HealthNormalized { Normalized = healths[i].Value / (float)healthMax[i].Value };
    }
}
```
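For anyone reading along without the Entities package, the core of the job body above boils down to something like the sketch below. It uses plain arrays instead of `NativeArray`, and `HealthNormalizer` plus the `int` health type are assumptions, not the poster's actual code. The point is only the loop shape: no branches and no dependency between iterations, which is what gives a vectorizer a chance.

```csharp
using System;

public static class HealthNormalizer
{
    // Hypothetical plain-array version of the normalization loop.
    // Each output element depends only on the inputs at the same index.
    public static void Normalize(int[] health, int[] healthMax, float[] output)
    {
        int n = Math.Min(output.Length, Math.Min(health.Length, healthMax.Length));
        for (int i = 0; i < n; i++)
        {
            // One divide per element, read from two linear arrays, written to a third.
            output[i] = health[i] / (float)healthMax[i];
        }
    }
}
```

Whether a given compiler actually vectorizes this is something to confirm in its output (for Burst, the inspector), not something the source code guarantees.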
take my AI system for example
i can REALLY fast normalize huge amounts of data for every entity
no doubt, great code for that. I can hardly bring my code to a one lined simd code
is this style of code the norm in your project or an outlier?
hm, I've chopped up my code before but that resulted in so many iterations it was not worth it
i spend a lot of time in burst inspector (1.7 is amazing btw)
burst is what we wish the DOTS team was
burst is dots...
i assume you mean entities
that said, entities team is great if they talk to you
i do get that the forum silence is annoying though
say, what are you doing with the output array then? from that piece of code you don't write it back to the chunk?
? you don't need to write it back to the chunk
oh wait
sorry was thinking wrong code
that array is directly passed to my AI system
i do a lot of code on direct array access
basically i precompute all my conditions etc i need for utility
interesting approach
in very optimized simd branchless code
then pass them to a giant graph which is branching, does decisions etc
Is healthNormalized just Normalized? Is there a reason why you're not reinterpreting outputArray to a float array?
mostly because i intended to change it to hold more data at some point
i don't need float accuracy
i'm going to compact it to like 8-16 bytes of actual normalized, a few bytes to store if health is max, health is low, etc
Interesting. I'm primarily optimizing for easy v256 access, not memory size.
(from a compiler point of view, there's no difference though in the current code and just being a float)
so yeah thats basically why i haven't updated it yet
because i can't write it nicely without breaking vectorization
tertle, do you have a targeting system for distance check, angles, etc..?
but when it hits my graph i no longer need it to be simd friendly
sorry not exactly sure what you mean by that
like for AI targeting?
yeah
i do at work, not in my personal project yet
so the entity knows if the target is in range or something
i do have a full vision / fog of war system though
generally, how do you handle entity relationships? Like when one entity needs to query another to obtain data like location
actually wrote that like 2 years ago, kind of my first dots library
i don't need code, just interested if it's simd
do you just store the entity as a property of a component?
where is that available?
the alternative is storing the pointer to the relevant component on the target entity, since Im pretty sure that is fixed so long as the target entity does not change
don't think i ever released it. i haven't touched it in 2 years though
but it's kind of in my back pocket whenever i need it in future for something i'm working on
(i do intend to go back up, use it and improve it at some point)
did you use CDFE for random lookups?
if i need to but not always
my orca/rvo (local avoidance) implementation doesn't use it
whats the alternative?
^
standup just started 🤢, be back in like 15
i'll see if i can show what i mean by that
looking forward to it
question: i am referencing a scriptableobject in a conversion system of mine. the entity is in a subscene. when i change the scriptableobject with the subscene closed, the values are not updated
Dont quote me on this but when you close a subscene, it caches all data within the subscene upon moment of closure in a bit array
so any information pulled outside the subscene that is assigned in editor is not changed until you open and reclose it
unfortunately that is also not quite true. when i open the subscene and close it again, it still doesn't update. only when you explicitly press 'reimport' it updates
i've also tried declaring an asset dependency on the conversion system, but that does not work either
i haven't tried this in ages
but have you tried
GameObjectConversionSystem.DeclareAssetDependency()
Can you force an update during conversion? Like Asset.Reimport()?
```
/// <summary>
/// Declares that the conversion result of the target GameObject depends on a source asset. Any changes to the
/// source asset should trigger a reconversion of the dependent GameObject.
/// </summary>
/// <param name="target">The GameObject that has a dependency.</param>
/// <param name="dependsOn">The Object that the target depends on. This must be an asset.</param>
public void DeclareAssetDependency(GameObject target, UnityObject dependsOn) =>
    m_MappingSystem.Dependencies.DependOnAsset(target, dependsOn);
```
tried, after i googled it, but either i'm using it incorrectly or it doesn't work
yeah thats why i prefaced it with i haven't tried in ages because i had a feeling i had this experience as well
i think it works with some things but for some reason scriptableobjects weren't working
i really like subscenes, but that's.........less ideal
let me just open up my master project
i feel like i had a workaround for this ages ago but can't be certain
also, other question: someone said that Unity.Animation.Hybrid.CurveEvaluator is the slowest unity has to offer? (been a few days but i just remembered? not 100% sure tho) what other evaluators are there?
is there any word on when there will be word about the upcoming DOTS direction?
wdym? dots is actively being worked on
yes, worked on but what direction has it taken since last we heard.
Nope
i guess a direction of internal development 🤣
bummer... My custom entity solution + jobs framework meets my needs, but it would be nice to abandon physx.
So this is my implementation of the ORCA algorithm, with 200k entities
With this many entities the actual algorithm amazingly executes in just 1.58ms on my CPU
well, dots will at least get compatible with newer unity releases at the end of the year again
*entities
the problem i have is, the nearest neighbour calculation is not as fast as i'd like.
I'm currently using a quantized hashmap, which takes 2.41ms to quantize all entities and then 4.24ms to find the nearest neighbour
if I could find a faster algorithm i could scale this amazingly
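As a rough picture of what a "quantized hashmap" neighbour search can look like, here is a small single-threaded sketch in plain C#. It is not the poster's implementation: `SpatialHash`, the 2D positions, and the 3x3 cell search are all assumptions. It has the same two phases mentioned above, a quantize pass that buckets points by cell and a query pass that looks through nearby cells.

```csharp
using System;
using System.Collections.Generic;

public sealed class SpatialHash
{
    private readonly float _cellSize;
    private readonly float[] _xs;
    private readonly float[] _ys;
    private readonly Dictionary<(int, int), List<int>> _cells =
        new Dictionary<(int, int), List<int>>();

    public SpatialHash(float[] xs, float[] ys, float cellSize)
    {
        _xs = xs;
        _ys = ys;
        _cellSize = cellSize;

        // Quantize pass: bucket every point by its integer cell coordinate.
        for (int i = 0; i < xs.Length; i++)
        {
            var key = Cell(xs[i], ys[i]);
            if (!_cells.TryGetValue(key, out var list))
                _cells[key] = list = new List<int>();
            list.Add(i);
        }
    }

    private (int, int) Cell(float x, float y) =>
        ((int)Math.Floor(x / _cellSize), (int)Math.Floor(y / _cellSize));

    // Query pass: closest other point inside the 3x3 block of cells around `index`.
    // Returns -1 if that block holds no other point. The answer is only guaranteed
    // to be the true nearest neighbour when it lies within one cell size.
    public int NearestNeighbour(int index)
    {
        var (cx, cy) = Cell(_xs[index], _ys[index]);
        int best = -1;
        float bestSq = float.MaxValue;

        for (int dx = -1; dx <= 1; dx++)
        for (int dy = -1; dy <= 1; dy++)
        {
            if (!_cells.TryGetValue((cx + dx, cy + dy), out var list)) continue;
            foreach (int other in list)
            {
                if (other == index) continue;
                float ex = _xs[other] - _xs[index];
                float ey = _ys[other] - _ys[index];
                float sq = ex * ex + ey * ey;
                if (sq < bestSq) { bestSq = sq; best = other; }
            }
        }
        return best;
    }
}
```

A `Dictionary` of lists is the easy-to-read version; a job-friendly variant would more likely sort entities by cell key into flat arrays, which is a different trade-off and not shown here.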
So the solution to avoid cdfe is to just pre-fetch all positions and pass it forward?
(the nearest neighbour calc is still 4x faster than unitys BVH in physics, 8x faster than arongranberg quadtree for writing, 10% slower for reading)
this runs entirely in IJobFor wait that's not true anymore. it uses IJobEntityBatch but it still uses a NativeArray with precomputed data indexed to entityInQueryIndex
it's basically a standalone simulation that reads from entity world then writes back at end
I tend to write a lot of my high performance simulations as standalone from entities
sure, but I am not a fan of the current gameobject conversion process, nor do I feel a pure entities solution is viable for my needs at the moment. I ended up creating my own light weight entity system which used cached gameobjects for rendering and physx and this works fine for my purposes. However, physx would be nice to drop for unity physics.
As far as conversion goes the only thing I heard was a random forum post hinting that they might be changing that whole thing. That was a while ago though, who knows if they've dropped that idea since then
I am hoping.
Ahh okay, I see. Interesting idea
Seems like you'd lose a lot of the architectural benefits of ecs at that point
found anything @rotund token ?
cool stuff, sounds like you don't use entities much in this except for Translation?
not really something I can translate to my problem though 😦
has a bunch of other config components per entity
etc
these are read | written to directly
anyway my point with this one was just that, every problem often has a different solution
what's your take on properties in IComponentData ? any downside?
i think properties that expose data in a specific way are good
but i wouldn't use properties over fields just because you can't inspect them
the target system aside, this is what I'm dealing with: https://pastebin.com/4srprrzf lots of small mechanics that need simple conditions and write backs for persistent data. not sure how simd'able this is. I'd need so many little jobs and from what I've tested the iteration job scheduling is not worth it, so I have huge job. (not that I prefer that)
you probably can't. this is the type of thing i precompute though (maybe sliced over multiple frames depending on what it is) because it might be used in multiple places
like a creature having vision on a player might be used to determine targeting, to determine if it can attack etc
while a creature having previous vision on a player might be used to determine if it should search for a player in a specific area, or throw a blind attack etc
like, i guess properties are kinda against DOP, understandably, but like, we're still in c#, so the c# i do write i'd like to keep up to standards. aka, no public fields, only public properties. it's not something i am terribly inconvenienced by but you know....
Components don't fit into that mindset. They shouldn't really have private data since systems are what are modifying them
oh i agree, i only use public fields for IComponentData|Buffer and readonly structs
true. they don't have private data, only public. which is why PUBLIC, should be properties in 'normal' c#, is what i'm asking about
Im 99% sure autoproperties work in Entities....
(addendum: i don't use public, but internal, much better imo. yet properties still apply to internal)
but now this discussion is getting me to doubt myself
They aren't serializable
they work fine. they don't appear in the inspector
[CreateProperty] doesnt work?
wrong. just add [field: SerializeField]
you simply expose the backing field to the editor then
Ahh, that wasn't a thing the last time I used unity
c# properties have a getter/setter methods. That's their whole point. If you don't need it then you can use fields. Has nothing to do with public really
And yes, that's what I was getting at too. If it's public data why do you need a property
auto properties are a thing. it's just c# standard NOT to have public / internal fields. only properties
because of this ^^
But the standard is built around the idea you are writing typical C# with classes that have a good reason to hide internal data via properties
You could argue it's just c# standard NOT to have public / internal fields. only properties because of OOP
should this standard apply with DOD?
yeah, that's what i'm curious about
sorry to interrupt - i'm spawning a prefab and setting its collider filter, and it seems to be also setting the filter from another prefab spawn.. does this make any sense?
from memory colliders are shared, there's a toggle to make them spawn unique
wondering if physics batches same geometry colliders
hmm, even when it's a completely separate prefab and collider?
Force Unique
basically it's two separate character prefabs, each have essentially the same capsule
huh
i never noticed that
i wonder if that would still share collider between all instances of specifically that prefab?
no
ie still be efficient, or would it then spawn 100 unique colliders for each
so this i think would imply..
tooltip explains it
may share collider data with other objects if they have the same inputs
this is why your prefabs are sharing
that all objects with a say box collider of same size or whatever the criteria is
would not be able to have different collision filters unless you specify unique
if you specify the collision filters on the physics shape they will generate separate for you
the issue is you're changing them at runtime
i'd take a bold claim and just say 'no' for the time being. sometimes i just wish i had more knowledge on how methods and stuff is saved in memory, then i'd probably be able to get to a more complete answer
this has taken me a day and a half of debugging to get to the bottom of, why raycasts were misbehaving
A property is just a syntactic wrapper around get/set methods that get generated by the compiler
well thanks @rotund token that helps
exactly which means the method has to 'be' somewhere
so are you wasting cycles going there and then just returning the value?
etc
I'm not sure what you mean. They're just methods. You can inspect them via reflection even
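The reflection point can be checked directly. The sketch below uses a made-up `Health` struct (not from anyone's project here) with one public field and one auto-property, and asks the runtime what the compiler generated for the property.

```csharp
using System;
using System.Reflection;

public struct Health
{
    public int Field;                  // plain public field
    public int Property { get; set; }  // auto-property: generated get/set methods plus a hidden backing field
}

public static class PropertyInspector
{
    // Name of the compiler-generated getter method for Health.Property.
    public static string GetterName() =>
        typeof(Health).GetProperty(nameof(Health.Property)).GetGetMethod().Name;

    // Counts instance fields, including the non-public backing field of the auto-property.
    public static int InstanceFieldCount() =>
        typeof(Health)
            .GetFields(BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic)
            .Length;
}
```

`GetterName()` returns `get_Property`, and `InstanceFieldCount()` returns 2, so the struct's data is the same size either way. The methods live in the type's code, not in each instance, and trivial getters like this are normally candidates for inlining.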
Im pretty sure the compiler can optimize away the wrapper method to the base acquisition of the value field.
unfortunately like i said, i don't yet have deep knowledge. but as i understand it, imagine it at least, you have data in memory......good......you can work with that data.....but how do you know HOW to work with it?.....well.....there's gotta be somewhere (method)data that is saved that describes HOW to work with the data
Pretttty sure. Because I have not seen an equivalent to the burst inspector for basic C#.
burst is very aggressive at stripping away, too aggressive sometimes
hm, yeah i guess this of course also happens, which is very nice.
but are you sure?
I have not experienced any issue with aggressive inlining. The 1.7 change to prioritize vectorization over loop unrolling is very annoying though.
interesting haven't really looked into that
Thanks, this is very helpful!
can you give an example of when that causes bad issues
i detailed my very long rant a few... weeks ago in this discord channel. Let me see if I can find it
Hmm, I put it in a local variable outside my loop but it was still complaining. Perhaps the read only will help. Thanks!
and heh "looped into that"
anyway. another question: i can't seem to assign a new mesh to RenderMesh.mesh anymore. i know this was possible, but when i loaded up the project it just wouldn't work anymore
'assigning' does not make sense
are you setting the shared component?
em.SetSharedComponent(entity, new RenderMesh {mesh = mymesh});
you were able to do Rendermesh.mesh = new Mesh() previously
that would do nothing on any existing entity
you are just writing to a local copy of a struct
yeah why didn't i do that before? i wonder now......
but this 100000% worked
at some point
if you do something like
renderMesh.mesh.SetVertices(newverts)
this would work
because the mesh is a reference
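The difference between the two cases is ordinary C# struct/class semantics, shown here with stand-in types. `FakeMesh`, `FakeRenderMesh` and `CopyDemo` are hypothetical and have nothing to do with the real `RenderMesh` storage beyond being "a struct that holds a reference".

```csharp
using System.Collections.Generic;

public sealed class FakeMesh
{
    public List<float> Vertices = new List<float>();
}

public struct FakeRenderMesh
{
    public FakeMesh mesh;
}

public static class CopyDemo
{
    // Stand-in for wherever the component value is actually stored.
    public static FakeRenderMesh Stored = new FakeRenderMesh { mesh = new FakeMesh() };

    // Returning a struct hands the caller a copy.
    public static FakeRenderMesh Get() => Stored;

    public static bool ReassignOnCopyChangesStored()
    {
        var original = Stored.mesh;
        var local = Get();
        local.mesh = new FakeMesh();  // only the local copy points at the new mesh
        return !object.ReferenceEquals(Stored.mesh, original);
    }

    public static bool MutateThroughCopyChangesStored()
    {
        var local = Get();
        local.mesh.Vertices.Add(1f);  // same FakeMesh object that Stored.mesh refers to
        return Stored.mesh.Vertices.Count > 0;
    }
}
```

`ReassignOnCopyChangesStored()` is false and `MutateThroughCopyChangesStored()` is true: replacing the reference on a copy is lost unless the struct is written back, while mutating the referenced object is visible to everyone sharing it.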
yeah but it changes ALL meshes since they are shared
what i did previously was new Mesh()
mesh.SetVertices()
rendermesh.mesh = mesh
is that before after?
and it worked. anyway, i'm just gonna use SetSharedComponentData as is probably correct anyway
oh wait after before?
The above image is what 1.7 produces for a math.mad. The bottom is 1.6
1.6 properly produces an unrolled MADD operation, 1.7 splits it into scalar add and multiply
1.6 is also properly packed operation. 1.7 is scalar for loop
?
talking to kornflaks
oh ok
have you tried OptimizeFor.Performance
not that code specifically, it's using only the internal for loop. Ignore the outer k
i guess it's this change
Change the optimization pipeline to run the loop unroller exclusively after the loop vectorizer. This improves codegen in a lot of cases (mostly because the SLP vectorizer is unable to vectorize all the code that the loop unroller could have).
Yes. I've set my AOT burst for OptimizeFor.Performance and using the lowest float precision
Yep. That's what is annoying in 1.7
have you brought it up with the burst devs
Eh, they probably know what they're doing. So I'll just type out my manual AVX2 commands
@zenith wyvern you remember when Topher mentioned whole scene conversion as a big change coming down the road? Wonder what happened to that guy, was nice having a member of the dots team in here
that won't change anything except in builds, i just meant to [BurstCompile(OptimizeFor = OptimizeFor.Performance)] but reading the changelog i suspect it won't matter
Yea, that as well. I tried a lot of things to get the unroller to work but it just doesnt seem to unroll anything at all now in 1.7
interesting, i stopped using 1.7 because of some issues with the new inspector (errors that would stop it rendering)
I do remember. And yeah, he used to be pretty active, it was nice. I remember him popping up not too long ago for a sec
might have to pay attention to my performance benchmarks when i upgrade again
oh how i wished unity had support for c# 10 already. Rendermesh with { mesh = mesh } would be so nice
Oh yea, that as well. Burst inspector wont update due to how they changed burst compiling. There was some issue with cached functions and not re-compiling them. Going back to 1.7 in 2022.1 A16, I dont have that problem anymore thankfully. Only A12 had that problem.
it's been reliably updating the inspector in A16. Although stability isnt the best though if you try updating the code with the inspector open. Had a few crashes when that happened. Just make sure to close the inspector, then change ya code, then recompile, then reopen the inspector.
i'm only using 2021.2, apart from a quick look i haven't touched 2022.1 yet
No clue if they backported it then. Probably. Burst is a package, not a core engine process.
yeah you can just add it manually
I'm just using the release version of virtual texturing of 2022 HDRP. No other reason why I'm on alpha.
1.7 still supports 2019.4+
```
var mask = default(v256);
var maskPtr = (int*) UnsafeUtility.AddressOf(ref mask);
for (var index = 0; index < remainder; index++)
    maskPtr[index] = int.MinValue;

// Operation.
var finalAddition = Operation(vPtr[length / 8], sumValue);
var location = UnsafeUtility.AddressOf(ref vPtr[length / 8]);
X86.Avx2.mm256_maskstore_epi32(location, mask, finalAddition);
```
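For readers who don't have the intrinsics in their head, this is a scalar emulation of what the tail handling above is doing, written in plain C# with arrays. `TailMask` and its methods are hypothetical names; the only fact relied on is that a masked 32-bit store writes the lanes whose mask element has its top bit set, which is why `int.MinValue` is used as the "on" value.

```csharp
using System;

public static class TailMask
{
    public const int Lanes = 8; // eight 32-bit lanes in a 256-bit register

    // Lane i is switched on when i < remainder, mirroring the loop above.
    public static int[] Build(int remainder)
    {
        var mask = new int[Lanes];
        for (int i = 0; i < remainder && i < Lanes; i++)
            mask[i] = int.MinValue; // 0x80000000: only the sign bit matters
        return mask;
    }

    // Scalar emulation of a masked store: lanes with the sign bit clear are
    // neither written nor touched, so they may lie past the end of the array.
    public static void MaskStore(int[] destination, int offset, int[] mask, int[] values)
    {
        for (int i = 0; i < Lanes; i++)
        {
            if (mask[i] < 0)
                destination[offset + i] = values[i];
        }
    }
}
```

With an 11-element destination, `offset` 8 and `remainder` 3, only elements 8, 9 and 10 are written, which is the whole reason for building the mask instead of doing a full-width store on the last block.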
Ya know, I thought I was onto something when I made that chunk of code. Sadly not.
default;
it's what the burst source does. I'm following those to a T. I know I could strongly type the mask then use default;
ah ok. XD
var maskPtr = (int*) UnsafeUtility.AddressOf(ref mask);
for (var index = 0; index < remainder; index++)
maskPtr[index] = int.MinValue;
wish they'd just add an indexer
i just wrote an extension for it
think i did something like
```
{
    fixed (v128* v = &a)
    {
        ((float*)v)[index] = value;
    }
}
```
this just made the rest of my code unstable. now when i do follow up operations it tells me 'BufferTypeHandle<T> has been invalidated because of a structural change.' dealt with that at some point?
sounds like you're looping a buffer (or storing one) then using entitymanager
any structural change of EM invalidates all buffers
omg but i can't set the data at the END because after setting, i also need to update the material
use an ECB or do buffer.ToArray and loop that
Changing shared components is a structural change that requires a memcopy of all entities within a chunk headed by that shared component to a new chunk
So dont change shared components often
yeah i know. i guess i'll figure something out.
can't use ECB btw, because those don't work with .Run()
ECB should work with Run()
yes you can, i said ECB not ECB system
which is the only thing supported for SharedComponentData
it just wont work if you use EntityManager along side it
it's either pure ECB or pure EM
var ecb = new EntityCommandBuffer(Allocator.TempJob);
Entities.ForEach((Entity entity) => { ecb.SetComponent(entity, new Component()); ... }).Run();
ecb.Playback(EntityManager);
ecb.Dispose();
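The record-then-playback idea behind that snippet can be shown without Entities at all. This is a toy sketch, not the real `EntityCommandBuffer`: `MiniCommandBuffer` and the `List<int>` "world" are made up. It shows why deferring changes avoids invalidating whatever you are iterating, which is the same class of problem as the invalidated buffer handle above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical miniature of a command buffer: record now, apply later.
public sealed class MiniCommandBuffer
{
    private readonly List<Action<List<int>>> _commands = new List<Action<List<int>>>();

    // Records an "add" instead of performing it immediately.
    public void Add(int value) => _commands.Add(world => world.Add(value));

    // Applies every recorded command in order, then clears the buffer.
    public void Playback(List<int> world)
    {
        foreach (var command in _commands)
            command(world);
        _commands.Clear();
    }
}

public static class CommandBufferDemo
{
    public static List<int> DuplicateEvens(List<int> world)
    {
        var ecb = new MiniCommandBuffer();

        // Calling world.Add(value) inside this loop would throw
        // InvalidOperationException because the collection changes mid-iteration.
        foreach (int value in world)
        {
            if (value % 2 == 0)
                ecb.Add(value);
        }

        ecb.Playback(world); // structural changes happen after iteration has finished
        return world;
    }
}
```

Starting from `1, 2, 3, 4` this ends with `1, 2, 3, 4, 2, 4`. The alternative mentioned above, copying the buffer to an array and looping over the copy, solves the same problem from the other side.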
i never liked using EM, i prefer ECB, but i always get exceptions when using it with .Run()
are you using entities.foreach?
you could just be missing WithStructuralChanges().Run()
OOOOh, well, i guess i always use ECB System
the issue is i think they're either looping a dynamic buffer
dynamicbuffer<val> something;
foreach(var v in something) { EM.AddComponent() }
or reading from it
EM.AddComponent();
something[0]
so what's the difference between entitycommandBuffersystem.GetOrCreateCommandBuffer() and new EntityCommandBuffer(Allocator.TempJob) ?
The buffer system assigns playback at the ECBS. Manually creating a new ECB requires you to call .Playback(EM) yourself.
The command buffer system determines when in your system order your buffer will run
ah, and that doesn't work with .Run() ?
it should
Command buffers work in run
strange
there's just no real reason to use a deferred command buffer in run
because you've already sync pointed, just get it over with
Yea. Just skip the middleman and directly conduct structural changes with EM.
Yes if you're on the main thread just use entitymanager. It's faster
and yes ideally just use EM
but there are conveniences of using a local ECB just to avoid issues with dynamic buffer
well, found a different solution. thanks for the help. i was reading a dynamicbuffer after applying. just moved that to before applying the structural change
it's only a temporary solution anyway. i'd much rather use the new Mesh.AllocateWritableMeshData(); api. but i found it kinda incompatible with entity jobs.
How so? It's been a LONG time but I remember using it before
In the end just using Mesh.SetVertexBufferData with native arrays turned out to be a better solution for me
but how? you can't allocate it inside a job, so you have to know how much you need BEFORE, and you can either apply it to ALL meshes at once, or 1. either way, it immediately disposes everything after you call Mesh.ApplyAndDisposeWritableMeshData() and how are you supposed to use that if you only get 1 entity mesh at a time? entities.foreach / jobforentitybatch the same here
Yes, you end up losing all the data after you apply it, this is why I went back to using dynamic buffers/native arrays
But it does work fine in jobs
yeah same, but i'd rather not since that IS the new dots mesh api afaik
I guess it's more useful for if you have a massive amount of mesh data you need to calculate and apply all at once
First step, move that for loop into a separate function.
well i do, but HOW if i am looping through entities
The SetVertexBufferData/SetIndexBufferParams api is still pretty new and fast
yeah haven't looked into that one yet, mainly because i couldn't immediately grasp what is needed there
What do you mean? You calculate all your data then apply it on the main thread afterwards
Now there should be a way to merge those two conditions.
yeah, but you still get 1 entity at a time with a job
Simple example of what I was using it for
I use this AllocateWritableMeshData for my terrain chunks
It sounds like you have too much data attached to a single entity? You can also just use IJobFor with native arrays to have more control over how it gets parallelized
validSpell_Array[i] = IgnoreGlobalCooldown_Array[i] || gcdEnding_Array[i] < tick;
not what i mean. Entities.Foreach(Rendermesh rendermesh)
{
Mesh.ApplyAndDisposeMeshData(rendermesh);
}
that's something you can do, but you're not applying it to each entity then
You can't apply your mesh to multiple entities at once but you can apply your mesh data to multiple meshes at once with the array/list versions of ApplyAndDisposeWritableMeshData
But yeah, because changing an entity's mesh involves a structural change there's nothing you can do about that except maybe combine your mesh if possible
@robust scaffold thanks for helping! It's not working when using the logical operator though. hm
And like Tertle said using a command buffer can potentially help by offloading all the structural changes to a sync point
it does work when I comment out one of them
i mean normal SetVertices() etc, works for now and i'll probably look at the VertexBuffer variants but i'm not sure how far Mesh.ApplyAndDispose...() will get me
If you're changing your mesh data a lot it's a dead end in my experience. If you're just doing one huge mesh change and done, then it can be a lot faster from my testing
And just switching to SetVertexBuffer should be a huge saving since you can avoid the allocation from copying arrays
it's just terrain generation on a chunk by chunk basis. so generally i'm only creating it once
yeah, i'll do that definitely
Is there an error message or does the logic not work?
Make sure you invert the gcdEnding_Array[i] check to less than < rather than the greater than > you used in your code snippet.
@robust scaffold this works now
I dont know if boolean checks get vectorized. Maybe. Move that for loop out into a separate function then [MethodImpl(MethodImplOptions.NoInlining)] to check. I think it should be
don't care about logic bombs right now lol. the comparison is wrong because I tried some stuff. important thing is that it's vectorized now. hard to read if I can also use the globalCooldowns comp array because it has a few values in it. there is purple code in but ... eh ... all chinese
Move it out then post it here. Let me see
I am also pretty sure you want to use less than< to get the same logic as the one you posted. In that second line.
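For reference, the condition being discussed looks like this as a standalone loop. The array names follow the snippet posted earlier, but the element types (`bool`, `uint`) and the `SpellValidity` wrapper are assumptions. The sketch uses the non-short-circuit `|` so the loop body has no data-dependent branch; whether that, or `||`, ends up vectorized under Burst is something to confirm in the inspector.

```csharp
public static class SpellValidity
{
    // A spell is valid if it ignores the global cooldown,
    // or if its cooldown ended before the current tick.
    public static void Compute(bool[] ignoreGlobalCooldown, uint[] gcdEnding, uint tick, bool[] validSpell)
    {
        for (int i = 0; i < validSpell.Length; i++)
        {
            // `|` evaluates both operands every time, unlike `||`.
            validSpell[i] = ignoreGlobalCooldown[i] | (gcdEnding[i] < tick);
        }
    }
}
```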
yes!
what's the diff between .AsNativeArray() and .ToNativeArray()? any significance?
weird to read. the method seems like it's not inlined
.ToNativeArray() returns a memcopy of the original array. Any changes to a NativeArray originating from .ToNativeArray() are not applied to the original array. .AsNativeArray() returns a pointer wrapper around the original array.
So changes done to a native array from a .AsNativeArray() are mirrored on the original array.
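The same view-versus-copy distinction exists in plain C#, which may be an easier way to picture it. In this sketch `Span<int>` plays the role of `.AsNativeArray()` (a window onto the same memory) and `Clone()` plays the role of `.ToNativeArray()` (a separate allocation). This is an analogy only, not how the Unity containers are implemented.

```csharp
using System;

public static class ViewVsCopy
{
    public static int[] Demo()
    {
        int[] buffer = { 1, 2, 3 };

        Span<int> view = buffer.AsSpan();    // view: same memory as buffer
        int[] copy = (int[])buffer.Clone();  // copy: new memory with the same values

        view[0] = 10; // writes through to buffer
        copy[1] = 20; // affects only the copy

        return buffer;
    }
}
```

`Demo()` returns `10, 2, 3`: the write through the view landed in the original, the write to the copy did not. The view is also cheaper because nothing is allocated or copied, but it is only valid for as long as the memory behind it is.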
ok, so As is also more performant
definitely
dynamic buffer source
Well, it's packed and vectorized. Mission accomplished.
Well yes but it serves a specific purpose. A lot of functions only accept native arrays, so you can reinterpret a dynamic buffer as a native array and the changes to the array will affect the buffer
Native array is also a bit faster as it has less safety checks, though maybe that gets stripped away on an actual build
ok, i hope it doesn't break anything if i then dispose the original nativearray
In my opinion, dynamic buffers are terrible. Horrible performance. Horrible memory usage. No automatic vectorization. I just use fixed buffers.
It will very much break if you dispose a native array you've reinterpreted from a buffer
The only good part is that you can stick the data outside the chunk storage using a buffer size of 0. Helps with managing pointer reference callbacks.
i mean when passing it to the mesh api. cause i guess it's gonna be reallocated for it again
If you're talking about SetVertexBufferData that doesn't cause allocations
It does some fancy thing and copies your data directly to the gpu
If you dispose of the original container after a .ToNativeArray(), nothing will happen. If you dispose of the original after .AsNativeArray(), the resulting NativeArray will break.
well, okay, no AsNativeArray for me then since i remove all the vertex data from my entities again to save memory
interesting, so you mean basically just a class/system level NativeArray when you say 'fixed buffer'?
well you should always do AsArray on them before using them
Yeah any time you're accessing a buffer a lot, reinterpreting it first will make a huge difference
No. Fixed buffer as in public fixed structType[17]
Huh..
so how is that used?
Can't you just use FixedList?
It returns a pointer but you can access it using a [] indexer and acts just like a normal array.
i guess dynamic is, dynamic
Those work fine in components
fixed list really
Fixed list wastes a single slot for length. Using a fixed field allows for you to define a constant length so it doesnt take up a spot and it's a C# built in property
so basically i'd guess fixed buffer approach is a pre-allocated non-dynamic thing, similar but without the overhead of a dynamic array?
Ahh, okay. I have no idea what that means but I assume you know what you're talking about
The only reason you would use a FixedList is for the already coded equality check. Otherwise it's a complete waste.
Oh and if you've toggled off Unsafe code. Fixed requires unsafe.
```csharp
public unsafe struct Deltas : IComponentData
{
    public const int Length = 8;
    private fixed int _elements[Length];
    [CreateProperty]
    private int[] Elements => new []{_elements[0], _elements[1], _elements[2], _elements[3], _elements[4],
        _elements[5], _elements[6], _elements[7]};
    public ref int this[int index] => ref _elements[index];
}
```
Now I could just use Delta IComponentData like a regular array.
so, simd code with an IComp that has more than one field is not working, right?
does putting methods in a struct like that have any non DOD overhead?
No? It just requires manual vectorization. Burst I dont think can recognize inter-component vectorization
The Elements? What methods?
yeah the elements thing i guess
That is a purely editor inspector property. Used in selecting the entity in the EntityDebugger and the data will show up in a nice list format. In build, because nothing references it, it gets stripped out by the compiler.
dafuq is the [CreateProperty] attribute
It's for EntityDebugger inspector view
oh
ahhh... how in gods name do you know all this
When you click on an entity in the debugger, there's properties that show up. fixed arrays do not show up built in so you need to implement a [CreateProperty] to make it show up. Like a .ToString() implementation.
I'm not 100% sure it gets stripped but I can ensure that it will using a compiler #if UNITY_EDITOR check but eh.
that makes sense yeah, i was looking again over the code and kinda figured that must be it
```csharp
/// <summary>
/// String displayed on the map itself.
/// </summary>
[HideInInspector] public FixedString32 Name;
/// <summary>
/// String used for debug purposes in Unity Editor Inspector.
/// </summary>
[CreateProperty] private string NameValue => Name.ToString();
/// <summary>
/// String used in the subtext of the province window.
/// </summary>
[HideInInspector] public FixedString32 Capital;
/// <summary>
/// String used for debug purposes in Unity Editor Inspector.
/// </summary>
[CreateProperty] private string CapitalValue => Capital.ToString();
```
so the public ref int this[int index] does that allow indexing the component directly?
isn't that technically a non-editor method?
That's for ease of use. So I dont need to do Delta._elements[index]. I can just use Delta[index].
It'll compile to the same exact thing in Burst.
I checked for that.
ahhh
nice
10 points for that
yeah that's pretty nice
someone was complaining a while back about big overhead with DynamicBuffer
this is definitely a nice alternative
Yea. There's a thread on the forum. We replicated it here. And I couldnt get burst to vectorize any math working upon DynamicBuffer.AsNativeArray() for some reason. No clue. Manual vectorization would probably work though
Nice, i'd pin that little fixed struct snippet if i could
buuuuut fixed arrays are not dynamic
Anything is dynamic if you make the size large enough
oooof. then you always have to check if it actually contains a valid value, which is very difficult with structs and you're wasting loads of memory
*the index
well i think the useful thing with this is, you maybe often use dynamic buffers just for small amounts of data, so this is essentially i'd assume a far quicker way to do that
Yep. That is what FixedList is doing. One of the slots is used for length.
i always find it such a pain working with only half populated arrays, when i can't ensure to fully populate the array, i'd rather have a list
i guess the difference would be there if, say you have 100k entities with buffers
of course
i'm not personally doing that but it's nice to have the option
definitely
Yea. I got that Deltas fixed int[] on 2.5M entities. And adding it to another fixed int[] on the same entity requires 6.5ms for all entities.
And that's fully vectorized addition. Non-vectorized would require ~100ms.
yeah for this many entities fixed things are definitely the way to go
interesting i'd love to know the actual meaning of that last statement
vectorized?
Vectorized means those nice juicy purple commands in Burst.
https://docs.unity3d.com/Packages/com.unity.burst@1.6/manual/docs/OptimizationGuidelines-Aliasing.html here you can visually see what vectorizing does
does it optimise by packing multiple single value operations into vectors
right
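To make "packing multiple single value operations into vectors" concrete, here is a plain .NET sketch using `System.Numerics.Vector<int>` instead of Burst. The helper names `AddScalar` and `AddVectorized` and the sample arrays are invented for illustration; Burst does this transformation for you when it can prove it is safe.

```csharp
// Plain .NET sketch (System.Numerics), not Burst output.
int[] inventories = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
int[] deltas = { 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10 };

AddVectorized(inventories, deltas);
System.Console.WriteLine(string.Join(",", inventories));
// 10,11,12,13,14,15,16,17,18,19,20

// Scalar version: one int per add.
void AddScalar(int[] target, int[] addend)
{
    for (var n = 0; n < target.Length; n++)
        target[n] += addend[n];
}

// Vector version: Vector<int>.Count ints per add (8 on 256-bit hardware).
void AddVectorized(int[] target, int[] addend)
{
    var width = System.Numerics.Vector<int>.Count;
    var n = 0;
    for (; n <= target.Length - width; n += width)
    {
        var a = new System.Numerics.Vector<int>(target, n);
        var b = new System.Numerics.Vector<int>(addend, n);
        (a + b).CopyTo(target, n);
    }
    // Leftover elements that do not fill a whole vector.
    for (; n < target.Length; n++)
        target[n] += addend[n];
}
```

The purple instructions in the Burst inspector are the machine-level equivalent of the `a + b` line: one instruction, several lanes.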
Basically the best analogy I have ever seen
not quite what's going on though
so uhhh... how do you ensure vectorization of adding components
if you code in shaders, it's basically a wavefront.
shudders shaders......
ah wait you means literally summing values between int[]'s on different entities
I have Component A with a single field int. I want to add that int onto Component B. So B.int += A.int.
Yes
```csharp
for (var i = 0; i < deltaArray.Length; i++)
    for (var j = 0; j < Deltas.Length; j++)
        inventories[i][j] += deltaArray[i][j];
```
not sure what's going on here. using a NativeArray<uint> has way more purple simd arithmetics than uint*
Unfortunately, burst can not identify this as vectorizable automatically
More purple doesnt mean better
More purple vp_____ does mean better though
hm okay but I'm not understanding why the many differences occur
post the two burst inspectors then. I'll try to figure out whats going on
```csharp
for (var i = 0; i < length; i++)
    inventories[i] = X86.Avx2.mm256_add_epi32(deltas[i], inventories[i]);
```
well shit
Those two code snippets are doing the same exact thing. But the second is about 20x faster.
so that's for specific build target only i guess
Yes. For Windows only.
Mac and Linux are screwed. Because I dont want to code them.
does android allow such tomfoolery?
The nativeArray version is unrolling the loop into 4 operations in sequence
It is faster
I believe you can get the same result using uint* if you place [AssumeRange()] everywhere. I think
Yes, Android uses NEON intrinsics: https://blog.unity.com/games/updated-guide-for-using-neon-intrinsics-in-unity-burst
Interesting stuff, thanks for sharing your insights @robust scaffold
NP. This is the magical world of micro-optimization. Now I've gotta figure out if burst can unroll my code if I implicit convert to a native array instead of using pointers...
that kinda what enzi is doing above?
I tried var globalCooldowns = chunk.GetNativeArray(GlobalCooldownEnding_ReadHandle).Reinterpret<uint>(); as parameter now and it's not unrolling the loop. only when I use the job parameter NativeArray<uint> ... this is weird
Yep. I wanna see if I can replicate it
```csharp
public unsafe partial struct SpellCastJobSIMD_GCD : IJobChunk
{
    [ReadOnly] public ComponentTypeHandle<GlobalCooldownEnding> GlobalCooldownEnding_ReadHandle;
    public uint tick;
    public NativeArray<uint> gcdEnding_Array;
    public NativeArray<bool> ignoreGlobalCooldown_Array;
    public NativeArray<bool> validSpell_Array;
    public void Execute(ArchetypeChunk chunk, int chunkIndex, int firstEntityIndex)
    {
        int chunkCount = chunk.Count;
        var globalCooldowns = chunk.GetNativeArray(GlobalCooldownEnding_ReadHandle).Reinterpret<uint>();
        Lala(chunkCount, globalCooldowns);
    }
    [MethodImpl(MethodImplOptions.NoInlining)]
    public void Lala(int chunkCount, NativeArray<uint> globalCooldowns)
    {
        for (int i = 0; i < chunkCount; i++)
        {
            validSpell_Array[i] = ignoreGlobalCooldown_Array[i];
            validSpell_Array[i] |= globalCooldowns[i] < tick;
        }
    }
}
```
gcdEnding_Array in the last line, unrolled, globalCooldowns not unrolled
feels like I can't get hello world code right ... lol
Are you using ISystemBase? Why are you using IJobChunk? IJobEntityBatch is better (I'm 99% sure).
As for the second, uh, hrm.
true, I should change it
am i right in saying they're phasing out all those job types and leaving only systems with foreach?
Damn, no unrolling.
Definitely not. The lambda foreach compiles to the struct based job.
curious what assumption burst makes and pretty interesting to get automatic unrolling with nativearray, seems like some kind of optimization is kicking in
nah, both styles will stay.
I honestly have no clue what causes the unroller to kick in. And I really need to figure out how...
this was the comment i'd read:
Oh. That's the old SystemBase.
i basically came into ecs post-foreach so wasn't sure how that all fit together
That's the stuff up here near the file class
And IJobForEach was an abomination. A really nice abomination as it taught me how to use struct based jobs but horrible for actual coding.
the 2 system classes are phased out and replaced with Systembase and IJobForEach is also old stuff and overshadowed by IJobChunk and IJobEntityBatch now
yeah i do remember it being messy, based on digging into it a little previously
The ComponentSystem and JobComponentSystem were early versions of SystemBase. Like Monobehavior for GO scripting.
i def think i need to look into the benefits of using the chunk and batch jobs but so far happy enough just using foreach
IJobForEach was basically a struct version of Entities.ForEach.
you'll smoothly gravitate towards struct based jobs
what are the main benefits over foreach?
no limits, more control to put it simply
i mean it can be annoying being limited to the number of 'arguments' in a foreach
Back then, making the jump from the sinking ship that was IJobForEach to the new (back then) IJobChunk was a small hop. Now between Entities.ForEach() and IJobEntityBatch is like traversing the niagara falls on a tight rope. You can get a lot wrong and if you do, your entire program dies.
hehe not sure if that's a good or bad thing
small example, you mostly read from a comp and have only conditional writes. you can't do that in Entities.ForEach
Entities.ForEach would need a ref so you'd touch every chunk and therefore make chunk.DidChange useless
so wouldn't that require some pre-fetch thing to queue up all the writable components before actually running the logic
so like, gather all Translation comps that are writeable, and all Rotation comps, but there could be different amounts of either etc
nah, you'd just have a ComponentTypeHandle for read and write (having 2 of the same comp is also not possible in ForEach) and then acquire the write handle only when needed
rather than in for each i'd guess it just cues up entity count * Translation components and so on
yeah good point re having two of the same type, that can be frustrating
yeah, or having a ComponentTypeHandle for read and a ComponentDataFromEntity for the same comp to write
just out of pure curiosity, it'd be interesting to see a comparison with that ie performance difference
I've delayed writing struct based jobs way too long. should have switched sooner and I can only recommend it to anyone. Ignore Entities.ForEach after 1 month of learning ECS
yeah i can understand
i have to say i do really like foreach, in that it's really compact
The performance ranges from equivalent to ForEach to many many times faster if you get vectorized code.
Yea. And hammering them out is fairly simple. I use an occasional Entities.ForEach() when I just need a placeholder to do something
I found setting local variables to use inside a lambda so annoying, or utilizing methods and not have one blob of code
And inspecting the burst output from ForEach is basically impossible
Because ForEach gets turned into an IJobChunk by a pre-processor (I think using source generators, maybe not) and is then compiled by Burst.
i bet there are some framework store assets starting to appear that utilize jobified ecs-ified code to do typical game stuff
inventory systems and weapons, health, targeting, etc
the Job system is fully functional and fully released. So yea. Go ahead and make frameworks.
i genuinely think that once there's more roll-in of general stuff like terrain, pathfinding, etc etc, dots could literally catapult the performance profile of unity completely
And thats where Unity is stuck for the past ~ 1 year. Welcome to the club / purgatory
i'd guess they're working on exposing dots in a more user-friendly way for that reason - user friendly gui / authoring approach with dod underneath
hehe yeah
man i complained for long enough that hardware was getting faster but software was, not really improving that much
oop has a lot to answer for
Intel / AMD are rolling out 512 bit wide instructions. That's vectors with 16 ints in a single operation.
class for this, class for that, multiple layers of inheritance, data duplicated and scattered around memory with consummate abandon
phones getting more and more powerful, still lag out on ordinary tasks
there are way too many code monkeys in the field who are happy that they can write code with only a few bugs
Yep. And operations with only a single float or int mathematics. The instruction bit size is 256 generally on modern CPUs and OOP single int only uses 32 out of 256 per operation. Complete waste.
sledgehammer / nut situation
'sledgehammer to crack a nut'
how about - perfectly arranged array of nuts, to fit precisely the area of the sledgehammer's impact surface
can someone give me a quick headsup on ISystembase? i mean i understand that you can then burst compile everything, but how's that supposed to work with all the things that are non-burstable in the first place? e.g. commandbuffersystem.getorcreatebuffer() and all this initialization stuff?
eh, there's no use in getting the loop unrolled. time for the burst forum?
if energy is truly going to be the new currency, i can see poorly built apps and software essentially get penalized out of the market
might take 10 years but i doubt more than that
why should energy become the new currency?
ahh just that whole buzz-wordy thing regards the incoming green revolution, carbon credits, and the impending collapse of the fiat monetary system
microsoft patent 666 ... lol - use your body as crypto miner
but it's true, unity are ahead of the game, they're positioned well for this
ISystemBase is a really niche interface. I am 99% sure you can use .GetOrCreateBuffer(). If you cant, you'll need a separate function with [BurstDiscard] to acquire it then pass it into the main bursted OnUpdate() for use in IJobChunks.
battery usage is king
joachim said don't use ISystemBase yet
fusion. liquid reactors. doubt that energy will be insane. at least if humanity doesn't fucking screw everything
Fuck the CTO. He said that Interlocked is bad. Thus, he is bad.
haha have you looked around recently
doesn't burst just completely ignore it then? wouldn't it be 'null' then
dreaming of utopia and here we are getting jabbed and forced to use subpar "green" energy. might take a few years for that to come all true
i mean that humanity screws everything is nothing new. still, one can hope
BurstDiscard just means it wont be burst compiled. It still will be run. Also, if you dont know the in and out of Burst, you shouldnt use ISystemBase. It's the extreme bleeding edge of DOTS and even I gave up from how restrictive it is.
i think ISystemBase is one of the major things Unity is working on right now
ok. i'm still very interested in it though. but yeah, i dabbled with it and it's really tight
and the current system is not even in prototype state
It's in pre-alpha. The only reason why it exists is because people on the forums hyped it so much that the CTO got sick of the forums and went radio silent for the entire year. Pretty sure.
i had no intention on really using it currently. just couldn't wrap my head around on how pretty much all the non-burstable things are outside of jobs currently should work with ISystemBase then
Forget all the bugginess of HybridRenderer, I legitimately believe ISystemBase is the cause of Unity's current radio silence. Because the CTO saw the resulting outcry over how garbage it was released and just gave up on publishing the next dev versions until 1.0.
what's the issue with hybrid renderer anyway? why is it shunned?
Horrible performance.
Nah. I got far better overhead performance on V1
performance is still better on v2
i'd really like a native dots renderer though. why'd they have to go for a hybrid approach for the exact thing that slows down frames the most???
kinda need it for skinned meshes tho
HR is pretty garbage for a 2021 renderer
i don't think there's a more performant way in that
You would basically need to open up a third render pipeline purely for DOTS. Project Tiny was doing that and it was very rudimentary. Basically Godot / open source engine level.
well i guess there's shader based animation systems that work as an alternative to skinning
And people are already up in arms about the URP and HDRP split. Imagine a DOTSRP as well.
thinking out loud to no-one in particular here
which v2 is using
people are stupid. that's all
uhm, wasn't there a burst sub forum?
Yep. Under scripting. Basically dead.
The developers are not dead though. They do check both the burst forum and DOTS. They fixed an issue with current Entities earlier this month with 1.6 or 1.7 when it broke something.
I dont remember what but they do care about us languishing in development hell.
to add extra salt, they do care about devs who have no idea about simd code /s not ceral
well, hope I get some answer. I don't understand it and I found no workaround
which problem?
the unroll problem
and i found the forum. Yea, post it there. The devs respond relatively quickly (faster than the current DOTS team ETA of 1 year+)
done, posted it in the wrong subforum at first ...
it's been fun. time to go, good night all o/
wait, try removing the ref. You dont need it
same, 2am here, laters folks
@robust scaffold any experience with ref struct IComponentData btw?
I just use straight pointers. I havent touched ref structs before in any meaningful sense.
ref struct just means they can only be allocated on the stack
but idk if that has any effect in dots
i'm curious though
Hrm, let me look into that. I've gotta reproduce enzi's loop unrolling issue first though
sure
Use a ref struct or a readonly ref struct, such as Span<T> or ReadOnlySpan<T>, to work with blocks of memory as a sequence of bytes.
Nope. Cant be used in bursted jobs.
Span / ReadOnlySpan can not cross Mainthread - Job thread boundary
Thus, you can not use them as a IComponentData
You can use them as a struct within a IComponentData, let me dig up an example. But it must remain either Mainthread only or Jobthread only
huh. curious. why is that a limitation though? on the first look that doesn't make sense
Burst limitation
Would be invalid because the span is being used across the managed/Burst boundary. The reason this is not supported is that there is additional complexity in C#'s implementation of the span types because they can also support taking spans into managed data types (like a managed array).
ah, well that makes maybe more sense
yeah that applies to span only though afaics
https://www.jacksondunstan.com/articles/5051 Scroll down to public ref struct Enumerator.
ref structs are supported in burst, just cant cross boundaries
hmmm ok
also wouldn't it make more sense to use preincrement, especially for burst, instead of postincrement?
cause in cpp there definitely is a performance difference. shouldn't burst be affected too then?
For that article? No clue. I havent tried it or optimized it myself but it works.
like ++var vs var++? There's quite a debate online.
i only use preincrement anyway.
(mostly)
because i did dig deeper into how it compiles down even in c#
yeah, they are mostly interchangeable. as long as you don't chain them into other operations
which i don't recommend
var++ vectorized would just burst compile into mm256_add_epi32(source, new v256(1)). The location of that new v256(1) doesnt matter.
And the location of that command on the code execution chain wouldnt matter either
So yea, preincrement and post increment ++ location doesnt matter.
array[var++] != array[++var]
Well yea. That's just re-organizing where the command for increment is located.
The command will still need to be run, the data will still need to be read. Reading first then writing.... ya know. var++ should be faster than ++var.
usually ++var is faster
var++ is read var, write var + 1.
++var is read var, write var + 1, read var.
There is no difference what so ever
```csharp
[BurstCompile]
private struct IncrementTest : IJob
{
    public NativeArray<int> TestArray, OutputArray;
    public void Execute()
    {
        Method(TestArray, OutputArray);
    }
    [MethodImpl(MethodImplOptions.NoInlining)]
    private void Method(NativeArray<int> array, NativeArray<int> output)
    {
        for (var i = 0; i < array.Length; i++)
        {
            output[i] = array[i]++;
        }
    }
}
```
Now there's a difference
and pre-increment requires an additional infrastructure command
so pre-increment is in fact slower than post increment. Different data sure, but the action itself is slower
hmmm. cause i know that i got super frowned upon in my internships when using post increment. 'preincrement is faster'
Definitely faster than using post increment and then modifying it to result in pre-increment value
it very much depends on the data you want to obtain
ah yeah. ok. so maybe not so wrong afterall
If you need preincrement values, use it. If you need post increment, use that. If you're doing loop increment and dont need the return value, there is no difference
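A tiny plain C# check of the rule above. The variable names are made up; the point is only that the two forms differ in the value they hand back, not in what they do to the variable.

```csharp
// Pre- vs post-increment: same side effect, different returned value.
int[] data = { 10, 20, 30 };

var i = 0;
var post = data[i++]; // reads data[0], then i becomes 1

var j = 0;
var pre = data[++j];  // j becomes 1 first, then reads data[1]

System.Console.WriteLine($"{post} {pre} {i} {j}"); // 10 20 1 1

// As a bare statement the returned value is discarded,
// so both forms leave the variable in the same state.
var a = 0; a++;
var b = 0; ++b;
System.Console.WriteLine(a == b); // True
```

So in a plain `for` loop header the choice is style only; it matters once the expression's value is actually used, as in `array[var++]` versus `array[++var]`.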
guess there's no downside to preincrement then
Yes. Do not use post increment and then add one in order to get pre increment
well, not much. it's just slightly worse.
fine.......slightly worse
it is one command more
And I will stick to using post increment where the return value isnt needed. Because C++ is written as C++ and not ++C.
and yet people that write c++ get mad at you when you use postincrement
Those people are dumb.
like i said. had a few internships at medium to bigger companies. they all frowned upon post increment
XD
thats the difference, I dont have internships. hahaha
isn't the biggest issue with for that you need a way to limit checking of the loop against its end point? And that Jobs have somehow solved this by optimising the length check?
I vaguely remember reading on a JacksonDunstan blog somewhere that .length is optimised in such a way that it's better than caching the value into for loops. But I could be wrong...
Alright, I have unfucked my code by simply believing in the stuff I read on stack overflow
```csharp
// https://stackoverflow.com/questions/36488675/is-there-an-inverse-instruction-to-the-movemask-instruction-in-intel-avx2/36491672#36491672
var vShiftCount = new v256(31, 30, 29, 28, 27, 26, 25, 24);
var vMask = new v256(uint.MaxValue >> (32 - remainder));
vMask = X86.Avx2.mm256_sllv_epi32(vMask, vShiftCount);
// Operation.
var finalAddition = Operation(vPtr[length / 8], ref sumValue);
var location = UnsafeUtility.AddressOf(ref vPtr[length / 8]);
X86.Avx2.mm256_maskstore_epi32(location, vMask, finalAddition);
```
```csharp
var fPtr = (int*) vPtr;
for (var index = length - remainder; index < length; index++)
    fPtr[index] += sumValue.NextInt();
```
those two chunks of code are the same
According to the Z test: "The value of z is -3.52941. The value of p is .00042. The result is significant at p < .05." There is definite improvement in using a bitmask and maskstore over Burst's loop implementation
```csharp
private v256 Operation(v256 value, ref Random sumValue)
{
    var additionFactor = new v256(sumValue.NextInt(), sumValue.NextInt(), sumValue.NextInt(),
        sumValue.NextInt(), sumValue.NextInt(), sumValue.NextInt(), sumValue.NextInt(), sumValue.NextInt());
    return X86.Avx2.mm256_add_epi32(value, additionFactor);
}
```
Yes. the improvement is 0.005ms, or 5 microseconds but this will now allow for a "simple" single manually vectorized method to apply for all non-v256 aligned arrays using this remainder code.
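For anyone trying to follow the bit trick in the mask construction above, here is a scalar re-creation in plain C#. `LaneMask` is a hypothetical helper written for this sketch, not part of Burst; it mirrors the `uint.MaxValue >> (32 - remainder)` value and the per-lane shift counts 31..24, and relies on the masked store only looking at the top bit of each 32-bit lane.

```csharp
// Scalar re-creation of the lane mask, one bool per 32-bit lane.
// Assumes 1 <= remainder <= 7.
bool[] LaneMask(int remainder)
{
    var lowBits = uint.MaxValue >> (32 - remainder); // lowest `remainder` bits set
    var active = new bool[8];
    for (var lane = 0; lane < 8; lane++)
    {
        var shifted = lowBits << (31 - lane);        // moves bit `lane` into the top bit
        active[lane] = (shifted & 0x80000000u) != 0; // lane is stored only if the top bit is set
    }
    return active;
}

System.Console.WriteLine(string.Join(",", LaneMask(3)));
// True,True,True,False,False,False,False,False
```

One thing to watch: C# only uses the low 5 bits of a shift count on `uint`, so with `remainder == 0` the expression `uint.MaxValue >> 32` is all ones and every lane would be marked active. The tail store should simply be skipped when the length is already a multiple of 8.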
well, not a lot of sleep. @robust scaffold still going
so my unroll problem is definitely an aliasing problem
but NoAlias tags also don't change anything
You're a trooper enzi
pretty finicky this whole simd thing. got a lot of different results now with essentially the same code but never the one preferred
are you in europe i take it?
ah cool.. UK here
greetings overseas o/
hope things are well in Austria, crazy times atm
i'm alright having remote work but yeah it's pretty bad. everyone's going insane
yeah same i've been working at home the whole pandemic
the blessed ones
haha yeah been lucky!
although i'm not sure if being stuck at home coding all day is a good thing
Been doing it since 2018. I also miss having a talented team, the company I'm at now had a pretty shitty team so I don't miss them. lol Got more work done alone than with 3 other developers
sadly game dev studios are pretty much non-existent here in austria
hah yeah there's nothing worse than people not pulling their weight
the only great studio is moon studios with ori where I know the art director and some others but they were also a remote company from the beginning.
i'm on my own so pretty much doing 3d modelling and animation stuff as well as code etc.. it gives a good balance and i really enjoy both, kinda tough trying to do everything on my own though, takes time
yeah, or making things worse. it's unbelievable how much someone can fk up
ah cool! i suck at 3d modelling and everything on the art side. I can only make up for it on the programming side
yeah sometimes when people get a salary they lose the edge and or don't care, of course there are always people that are just bad
freelancing and being self employed definitely gives you that edge, you need to fight to stay afloat
exactly, most don't give a fuck about their work or craft. quite sad honestly. not people I want to be around with. lack of enthusiasm kills teams
yeah, take away that security and you'll see sink or swim
some of us have no choice about that
we had this one backend dev who wrote functional code, every line took ages and his routines took nearly 30-90seconds to finish when scale increased. just frustrating
'you had one job' hehe
or worse, the backend guy leaves and nobody knows what in the hell his code does
then I come along, rewrote the same thing and it finished in under 1 second. the job I'm doing now, there's not a single line from the team anymore. lol
you need a payrise
it's so hard to quantify good coders
yeah i guess it is
i'm the only developer now, payment is pretty good. sadly the product isn't going that well
i've always been more a generalist ( although i've spend a massive chunk of my freelance time coding, c#, c++, python, even flash actionscript ) so i've never been in a dedicated team of coders as such.. generally a team of multi-disciplinaries
i think that's the best setup. have talented people who don't step on each others toes. every larger company I've been is full of shitty coders and those have the biggest ego. hard to work with. big ego and coders go weirdly hand in hand. I've a big ego too I suppose but my judge is the machine lol - so mostly I'm pretty humble
yeah that is terrible
i hate that as well.
hehe yeah i think us people who love to code tend towards the aspergers side of the social scale, can be a mix of arrogance and social ineptitude.. not always of course
or the inability to learn from mistakes. so many tense up, not sure if it's behaviour learned in school. don't make mistakes otherwise you're bad. so many then just double down at work. I dropped out in school because programming was like 2% and the rest was boring electronics hardware. self taught since I was 15
yeah it can be tough in the workplace to be called out on mistakes or bad judgement, nobody enjoys that, but everybody does it at some point
best thing is take it on the chin and learn from it really
exactly, making mistakes is normal but it's not accepted as normal
self taught since i was 13. well....started at least. currently in a private university getting bachelor in gamesprogramming
nice
definitely. making mistakes is good. because when you get it right you actually achieve something
that's also the first thing they taught us in uni
right, I like the process a lot.
also people who don't take time to refactor are a pain
if you need to and don't do it, that sucks. the kind of spaghetti code I had to read and untangle ... oh boy
imo there's always some refactoring you can do. that's just natural when your code base grows
gladly I think with enough experience refactoring gets less important
definitely. but it'll still be good when your code base grows
worse are the people you are tasked with to rewrite their implementation. cause they've used an unfitting approach
yes. but i mean that is still most projects
in DOD I hardly had to. If I had to, there was something wrong in my architecture
definitely
Rewriting code isn't refactoring though
exactly
true, what was kind of funny is that I never bought much into OOP. Guess starting with C was a big part of it. Never cared much for polymorphism, inheritance, overly generic architectures and such
OOP is great. loved working with it. but it's just an unfitting approach for games
but it's a big difference if you work with 100+ people or alone or in a small team lol
with 100+ people you need to take the enterprise approach
meaning?
hm, hard to explain I guess. you know java and their love for generics, factories, interfaces and dependency injection?
100+ people write generic code, sometimes code that isn't even used but could be used. every code is very much programmable and like a module.
ah. so pretty much standard oop
it's needed because otherwise all those people can't really work together and are blocking each other constantly
I'd call it hardcore oop but yeah
i mean that's how oop should be written anyway
that's debatable hehe, in large companies the answer is yes though
oop is all about being pluggable
I don't think that was ever the intent but it's where we are at
shrug
so much of what oop is now was born out of necessity so people can work with and not so much what the machine wants, does that make sense?
OOP was never about the machine. it was always about making it easy for people
~~probably~~ true
100% true
yeah, can't think of anything that would say otherwise
oop set us back collectively I feel like. simd and multithreading is so old by now and we hardly have any grasp to use it
not saying it's easy but we lost at least 10 years what we are just starting now
maybe even much more
yeah, but no one really cares about multithreading. not even AAA companies. that's pain
but that's the way it goes. even happened in GPUs where we are just starting to untap potential with compute shaders
it's mindblowing to think unreal wrote a better rasterizer in compute shaders than hardware
once this style gets on the hardware, wow, graphics will be through the roof
there are optical cpu prototypes. they are sooooo much faster than electrical cpu's. but it's also scary to think about how this 'speed' might reset the industry once more
oh wow, not up-to-date on this
yeah i think we're literally hitting the limits of how fast electrons can physically move around at this point
yeah been hitting that limit a long time ago
there's a limit to how small we can make the process sizes of the cpu's etc right
kind of fascinating how they can still push it a little further
i guess quantum or some other thing might solve a few issues, if inadvertently spawning entire universes is a small side-effect
quantum doesn't work for games / normal applications though
i'm not sure if quantum ever takes off for generic computing
imagine a quantum pathfinding solution
need a lot of qbits
the way I understood it is that quantum computers are terrible at scaling
i'm imagining.........and i'm not a fan
yeah i don't think we actually even fully understand how they do what they do fully tbh
they can easily give you ALL the paths for 1 entity. ...but not the path for ALL entities
you are limited to the available qbits. so I can't imagine how that would ever work out
the only practical thing for quantum computers is to destroy any fixed size encryption
there are other uses as well
always amuses me when people kind of assume science has all the answers and we're so advanced.. actually if you really just look at it, we're barely dipping our toes into the depths of reality atm, we literally hit a brick wall at sub-atomic stuff, there's a looooot more to learn
i'm sure, encryption is just a really big one
if you'd have the main internet server of a region as a quantum pc, and entangle it with another region, you would pretty much eliminate latency
okay if you go into entanglement then it just gets sick
exactly, entanglement basically hints at a whole other level of reality that is so far outwith our current paradigm
pretty incredible stuff
i'd like that. then i could fucking play games with my friends in australia and usa
but how many qbits would that take really?
faster than light communication
we would need to replace hard disks with qbits, right?
no
qbits are the cpu
basically
but you'd have to store it somewhere like memory
.......in the cpu?
it's not data
it's physical things
...representing state/data?
You operate on qbits, dont you?
fine......you could 'save' things in qbits too. but that's like only using the caches of common cpus for storing data......access this pc's cpu by your own cpu....
say we have a quantum computer with 100 qbits and entangled on the other side of the world the same 100 qbits. I send you a picture using the 100 qbits. ok, cool. then I send another one. where does it go? it doesn't just stack
Hmm well as far as copying qbits go...
that's not how qbits work
I dont think you can copy qbits
correct.
if you could, that wouldn't even make quantum encryption secure
isn't entanglement basically copying?
So yeah in that sense, storing qbits is not actually a thing...
no. it's using 2 own qbits and just 'links' them together
hm, but taking the inverse I can get the same result?
also entanglement only works with 2. put another in and the first entanglement breaks
Well ok you can store em to operate on later, but you have to remember you can only operate on them once.
not sure i get what you mean
And as @haughty rampart said, you'd be storing the actual physical entities I believe
correct
i remember this experiment where they had a quantum (thingy? photon?) entangled on opposite sides of the world. the state on one side switched accordingly. not sure how that translates to a qbit
yeah that's what entanglement is
the principle of the photon is the same as a qbit, right? true/false/not observed
stateless unless observed
Its better to think of a qbit as a standing wave that can consist of literally any combination of wave frequencies.
From this you can derive how powerful of a tool these are: all that possible data losslessly compressed in just 1 composite wave
Out of curiosity i was looking up the first computers, the Differential engine and later the Analytical engine.. they obviously used mechanical hardware to perform computations and the later one stored results on punch cards etc.. kinda feel as if the current state of quantum computing is kinda like that.. be interesting to see how it evolves
However, as you said, reading makes this composite wave "collapse" and you cant do anything afterwards with it
we are "collapsing" waves also in our computers with binary quantisation
the frequency isn't really just 0 or 1 (max peak) - not sure where the difference is honestly
There is no actual "binary superposition" going on
the frequency is EITHER 1 OR 0
A standing wave (and thus a qbit) can have any frequency simultaneously encoded within it
So its 2 additional things going on:
- Simultaneous frequencies
- All the way up to infinity
(At least, thats my current understanding)
how complex is this standing wave? just a sine?
I suppose? But remember you can reconstruct literally any wave function given enough sine waves at various frequencies
reads like it can be an infinite amount of frequencies?
i mean, you can do that anyway, just need the energy to keep it up. if it's preserved then it's really funky
Maybe you run into physical limits eventually
planck-related stuff
This is about all I know about quantum computers
(And some of it may not necessarily be fully correct)
cries in writing simd code ...
both methods result in the same assembly. uncomment the lines and Method is not getting unrolled
bool in burst job? did i miss something?
what's the problem with bool?
No I remember those not being supported by the Burst compiler
I suppose they must have changed that at some point
I wrote my own NativeBool thats just a byte wrapper lol
I guess I can throw that away now
yeah
as far as i can remember i've always been able to use bool in burst
Hmm seems it used to be a restriction at some point
hahah strange huh
i think the reason is because bool in c# is actually a byte and not a bit so it requires conversion from managed to unmanaged space
but byte is blittable
@viral sonnet
also there is no 'bit' type
byte is afaik the smallest allocatable type anyway
byte is, but it's converted to a true bit, so from 8 bits to 1 in unmanaged space
Some of us used DOTS before bool was legal in Burst. We made a custom struct that could represent a bool which was internally just a byte value. We called it ByteBool.
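For anyone who never saw one of these, a minimal sketch of what such a byte-backed bool wrapper could look like. This is a reconstruction for illustration only: the member names and conversions are assumptions, not the actual ByteBool or NativeBool from anyone's framework.

```csharp
using System;

// Hypothetical reconstruction of a byte-backed bool; not the original ByteBool/NativeBool.
public struct ByteBool : IEquatable<ByteBool>
{
    // 0 means false, 1 means true. A single byte keeps the struct blittable.
    private byte value;

    public ByteBool(bool b)
    {
        value = (byte)(b ? 1 : 0);
    }

    // Implicit conversions let the wrapper be used wherever a bool is expected.
    public static implicit operator bool(ByteBool b) => b.value != 0;
    public static implicit operator ByteBool(bool b) => new ByteBool(b);

    public bool Equals(ByteBool other) => (value != 0) == (other.value != 0);
    public override bool Equals(object obj) => obj is ByteBool other && Equals(other);
    public override int GetHashCode() => value != 0 ? 1 : 0;
    public override string ToString() => ((bool)this).ToString();
}
```

With the implicit conversions, `ByteBool flag = true; if (flag) { ... }` reads like a normal bool, which is presumably why swapping it back out for bool later is mostly a find-and-replace.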
good old days of inject
Yes!
Some of our frameworks still actually use it. We're just too lazy to change it.
oh man loved those days haha
<old man voice> back when updates were weekly!
Your code gets deprecated on a weekly basis.
damn, a pointer in a comp throws the subscene serialization off
Exception thrown during SubScene import: System.ArgumentException: Blittable component type 'TargetMatrix' contains a (potentially nested) pointer field. Serializing bare pointers will likely lead to runtime errors. Remove this field and consider serializing the data it points to another way such as by using a BlobAssetReference or a [Serializable] ISharedComponent. If for whatever reason the pointer field should in fact be serialized, add the [ChunkSerializable] attribute to your type to bypass this error.
At least the error message is well written and [ChunkSerializable] fixes it
although it's weird that the field can't be just ignored for subscene serialization
Mine was NativeBool
Suddenly nothing compiles because a bunch of methods had just vanished overnight
so weird, both of those happen every frame in my stress test. and I don't get why HasComponent is so costly
do you have safety on?
Hey, I currently have a system whereby I'm using a thread pool to run noise generation tasks that are configured via a node graph. The node graph object itself is kept as a static singleton in the class that schedules the thread generation tasks. Currently I'm having it run a single thread which receives a copied instance of the node graph generator object to pre-emptively prevent access issues since I was wanting them to be able to be processed in parallel. Since I use the generator as a lock for each one, it would effectively negate the processing gains of using threads in the first place. Or, at least that is my current understanding at the moment.
I was looking into maybe converting it into using the Jobs system, which will require some refactoring due to the limitations on what data types you can pass to it, but in doing so, I'm not sure if it will cooperate with the noise generator instance if it is calling the static method from within a parallel job. Obviously, it's not something I can pass in via a struct parameter. If I use parallel processing as my noise graph functions now, this would effectively return thousands of separate instances each batch just to sample each individual point. However, if I alter the return function to use a persistent instance, won't that force the job workers to compete for who gets to use the generator.getvalue function?
Summarily: If I have a job running as an IParallelFor that is calling a static method that returns a singleton instance, is there a risk of the function deadlocking the job threads if the function relies on a static instanced singleton that processes the return output for the noise function?
You can't access mutable static data (or call static functions that access mutable static data) from inside a job
You can use https://docs.unity3d.com/Packages/com.unity.burst@1.7/manual/docs/AdvancedUsages.html#shared-static but it's slow
With how the safety system works you're meant to pass in all data a job needs into the job when you create it. It will prevent you from accessing any non-const non-read-only external data
And I don't mean theoretically, it will either give you a compiler error or an exception
Yeah, that's what I'm trying to see if I can work around. But it's kind of a catch-22 situation in that I'm trying to use it to generate the values from inside the job and then pass them out and copied them over into a managed array, but I can't generate them if they can't get access to the generator from inside the job in the first place.
The job wants you to give it the data pre-made, but the jobs purpose is to find out what the data is
If you want the speed from burst you have to play by its rules. A job is supposed to be completely self-contained until it's done.
I guess to me it sounds like you need to re-think how you want to set up your jobs, there should be a set order where you generate your input data that gets pushed into the next job
And so on until it's done
You can have a native array that holds your job data, you let the job run until it's done, call complete on the job handle so unity knows it's safe to access, copy your data to your managed array, then restart the process
You can even do a memcopy so the exchange will be quite fast and not allocate
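A rough sketch of that flow, for reference: fill a NativeArray inside a Burst job, complete the handle, then copy out to a managed array. `SampleNoiseJob`, `NoiseRunner` and the sine-based stand-in for a noise function are made up for illustration; the point is only that the job receives plain values and a native container up front and touches no static state.

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;
using Unity.Mathematics;

// Hypothetical job: every input it needs is copied into the struct before scheduling.
[BurstCompile]
public struct SampleNoiseJob : IJobParallelFor
{
    public float frequency;                          // plain value, safe to copy into the job
    [WriteOnly] public NativeArray<float> results;   // output container owned by the caller

    public void Execute(int index)
    {
        // Stand-in for a real noise function: any pure function of the inputs works here.
        results[index] = 0.5f + 0.5f * math.sin(index * frequency);
    }
}

public static class NoiseRunner
{
    public static float[] Run(int count, float frequency)
    {
        var results = new NativeArray<float>(count, Allocator.TempJob);
        var job = new SampleNoiseJob { frequency = frequency, results = results };

        // Schedule across workers, then block until the job has finished.
        JobHandle handle = job.Schedule(count, 64);
        handle.Complete();

        // Only after Complete() is it safe to read the container on the main thread.
        var managed = new float[count];
        results.CopyTo(managed);
        results.Dispose();
        return managed;
    }
}
```

In a real setup you would probably keep the NativeArray and the managed array alive between runs instead of allocating each time, and call `Complete()` a frame later rather than immediately.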
I already have that setup, though, but that's not the issue. On each iteration I'm wanting to have the job be like,
"Hey noise gen! What is the 0-1 value of this position? points over to its struct values ",
and the noise gen goes, "Ok, Uhhhh... like 0.2.".
But I want to know if I can make that kind of static call from within the job itself.
Like, if all the function were to return was a float instead of the entire generator class?
Currently it returns the generator itself, and then that gets used in the thread, but if I refactor the code to just return the actual value from the generator instead, would that be sufficient to work around the job system data restrictions?
No, you can't. It's still returning mutable static data. You would have to pre-generate your values and pass them in
Or copy the generator and pass it in
Is it possible to return non-mutable static data? It's not like I need it to change once the value is returned
Non-mutable means it's a compile-time constant. Not that it's just not changing anymore
Ohhh... ok
In that case you can easily just generate all your data into a native container and pass that to your job
Or rework your generator so it's burst friendly
Hmm... what is it about the shared static that makes it slow?
Dunno, I've never used it but I've heard it's slow
Idk if it's possible to rework the generator in its current state, considering all the nodes are being derived from ScriptableObject. I was wanting to use the jobs themselves to compute along the node graph hierarchy. The values don't come pre-computed out of the box as the generator doesn't know what they are until something calls its GetValue function
Unrelated, but another thing going forward I was thinking about for much later, is can you tell the job system to dedicate a specific number of workers to an individual task, when implementing just IJob and not doing parallel processing? For example: if you want to conserve a certain number of worker threads for other stuff, and just run this type of job on a max of, say, 2 workers?
that doesn't really help as it'd stop the other stuff using the threads!
you can just use IJobFor, pass in the max threads you want to use and then calculate the index yourself within the job
I managed to do this. Hold on
Something like this, maths might be bad but you can figure it out.
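```csharp
[BurstCompile]
private struct MaxThreadJob : IJobFor
{
    public const int MaxThreads = 2;

    public NativeArray<int> Data;

    // Scheduled with an iteration count of MaxThreads, so each index acts as one "thread".
    public void Execute(int threadIndex)
    {
        // Each thread handles one contiguous, equally sized slice of the data.
        var workPerThread = this.Data.Length / MaxThreads;

        for (var i = 0; i < workPerThread; i++)
        {
            DoWork(threadIndex * workPerThread + i);
        }

        // Thread 0 also picks up whatever is left over after the even split.
        if (threadIndex == 0)
        {
            var remainingWork = this.Data.Length % MaxThreads;

            for (var i = 0; i < remainingWork; i++)
            {
                DoWork(workPerThread * MaxThreads + i);
            }
        }
    }

    public void DoWork(int index)
    {
    }
}
```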
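Since the maths was flagged as possibly off, here is a small plain C# check (no Unity needed) that replays the same index split for every thread index and counts how often each element gets visited. `PartitionCheck` and `CountVisits` are hypothetical names. The remainder is taken as the data length modulo the thread count, which is the leftover after the even split.

```csharp
using System;

public static class PartitionCheck
{
    // Replays the per-thread index math and returns how many times each index was visited.
    public static int[] CountVisits(int length, int maxThreads)
    {
        var visits = new int[length];
        int workPerThread = length / maxThreads;

        for (int threadIndex = 0; threadIndex < maxThreads; threadIndex++)
        {
            // Even split: one contiguous slice per thread index.
            for (int i = 0; i < workPerThread; i++)
            {
                visits[threadIndex * workPerThread + i]++;
            }

            // Leftover elements after the even split are handled by thread 0.
            if (threadIndex == 0)
            {
                int remainingWork = length % maxThreads;
                for (int i = 0; i < remainingWork; i++)
                {
                    visits[workPerThread * maxThreads + i]++;
                }
            }
        }

        return visits;
    }
}
```

Every entry of `CountVisits(length, maxThreads)` should be exactly 1 if the split is right. On the Unity side the job would presumably be scheduled with something like `job.ScheduleParallel(MaxThreadJob.MaxThreads, 1, dependency)` so that `Execute` runs once per thread index.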
@low fiber
Ok so this is the basic principle:
- You have a `JobHandle[]` with length less than or equal to `JobsUtility.JobWorkerMaximumCount`.
- You have an `int cursorPos`.
- Every time you schedule one of those "worker restricted jobs", you add the JobHandle from the array at index `cursorPos` as a dependency when scheduling that job. Then you increment `cursorPos` (and of course wrap it back to 0 after exceeding the `JobHandle[]` length).
So you're keeping track of it with your own counter, essentially?
Well I dont know what kind of counter you're thinking of
Think of it as a number of "job chains"
Thought experiment: If you always keep the JobHandle of your last scheduled Job, and always add it as dependency to the next Job you schedule, they can only ever occupy 1 worker thread, right?
So from there you just keep multiple "last scheduled JobHandles" equal to the number of worker threads you want, and "cycle" between them with the aforementioned cursorPos
Oh, I think I get it. So you maintain a linked chain of sorts for each worker you want to use and when you add a new job you're essentially tacking it onto the end of one of the chains?
Yes! I think you get it
What happens if the chain gets used up before you add a new job? Just not add a dependency if the chain is empty?
I dont know what happens if you add a completed job as a dependency for a new job, but instead of finding out in some painful way, I just check if the JobHandle was indeed completed. If it has, I just skip adding it as a dependency to the new one.
Also I make sure the "last scheduled JobHandle" is always a combination (through JobHandle.CombineDependencies(...)) of the "previous last JobHandle" and the new JobHandle
(Not sure if these things are strictly necessary. I just included every precaution I could think of and called it a day.)
So maybe you already figured but the way you check if a chain is empty is by checking if the last scheduled job has completed. There is no need to keep a record of how many jobs are currently in it or anything.
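Putting the job chain idea into code, a minimal sketch under a few assumptions: `JobChainLimiter` is a made-up helper, and the caller passes in a delegate that schedules its own job with the dependency it is given. It keeps one "last scheduled" handle per chain, skips the dependency when that chain is already idle, and combines handles as a precaution, as described above.

```csharp
using System;
using Unity.Jobs;

// Hypothetical helper: jobs scheduled through it occupy at most maxWorkers worker threads.
public sealed class JobChainLimiter
{
    private readonly JobHandle[] chains;   // last scheduled handle of each chain
    private int cursorPos;                 // chain that receives the next job

    public JobChainLimiter(int maxWorkers)
    {
        if (maxWorkers < 1) throw new ArgumentOutOfRangeException(nameof(maxWorkers));
        chains = new JobHandle[maxWorkers];
    }

    // scheduleWithDependency should call job.Schedule(dependency) and return the new handle.
    public JobHandle Schedule(Func<JobHandle, JobHandle> scheduleWithDependency)
    {
        JobHandle last = chains[cursorPos];
        bool chainIdle = last.IsCompleted;

        // An idle chain contributes no dependency; otherwise the new job waits for the chain's tail.
        JobHandle handle = scheduleWithDependency(chainIdle ? default : last);

        // Precaution from the discussion: keep the previous tail combined with the new handle.
        chains[cursorPos] = chainIdle ? handle : JobHandle.CombineDependencies(last, handle);

        // Round-robin over the chains.
        cursorPos = (cursorPos + 1) % chains.Length;
        return handle;
    }
}
```

Usage would look something like `limiter.Schedule(dep => myJob.Schedule(dep))` for an `IJob` struct `myJob`. Because every job in a chain depends on the one before it, each chain can only ever run one job at a time, so N chains means at most N busy workers.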
can generic jobs still not be bursted?
burst does support generic jobs
No, IL2CPP build
What profiler is that screenshot from? Looks nice
I'm building IL2CPP in Unity 2021.2.5, and I'm getting this spammed in the runtime log:
```
ExecutionEngineException: Attempting to call method 'Unity.Entities.FastEquality+CompareImpl`1[[Unity.Rendering.FrozenRenderSceneTag, Unity.Rendering.Hybrid, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null]]::CompareFunc' for which no ahead of time (AOT) code was generated.
```
There doesn't seem to be anything I can do since some of this stuff is private. I have been told that IL2CPP doesn't work with 2021.2 and DOTS, is that the case?
Intel vTune
I didnt know that worked well with unity
@calm edge dots is only officially supported on 2020 lts
@viral sonnet Burst team responded (amazingly)
The compiler doesn't know the aliasing properties of the globalCooldowns property as compared to the other members, and so it won't unroll or vectorize the loop. Native containers that are resident in jobs have implicit aliasing guarantees (they cannot alias), and because Burst is aware of that it can do a better job. You could try putting [NoAlias] on the globalCooldowns member of Method to check if that helps?
gonna try it now
#archived-dots message already thought of that, [NoAlias] doesn't help
and I ran into another problem that auto vectorization was breaking with having the A | B logic. it was working with either A or B
I'm going back time and time again and I'm realising I'm not able to simd my code. There are so many parameters residing in blob data alone that I'm not able to figure out how this would work. In the code posted I've the naive NativeArray<bool> ignoreGlobalCooldown_Array. To get this array data I have to loop over every spellcaster, find out the spell that it's casting, get the spell data and the actual ignore global cooldown flag and write it back to the array. for something that wants to be able to cast 250k in a single frame, I'm not sure this would even end up being faster. and this is only for 1 variable. I've like I don't know, 30-50? If anything I'd need to write every single variable into a separate array, because auto vec is not understanding structs. Or I'm left with writing manual simd code and then I'll start worrying about my sanity. haha
Currently I'm trying to figure out why HasComponent takes so long: #archived-dots message At first I thought something weird is going on with measurement but it always crops up in vTune and Visual studio profiling.
maybe it's taking so long because of the interlocked beforehand. otherwise I've no explanation
hm, no interlocked. still quite the spike. so weird
everything else is in the 0.6-1.0ms range
oh and @robust scaffold as we were talking about using a pointer for getting the localtoworld matrix instead of copying it, it's working out quite well. the job for 250k takes around 0.4ms parallelized so really fast considering 250k random lookups. what made it slow before was the float4x4 copying. i think it took around 2ms with that. the actual targetsystem that uses the pointer now is slightly slower but only slightly. 0.3ms range so overall this is a huge win.
Thats good. Just find a way to invalidate the pointer when the target entity is destroyed somehow
I got it
unrolling doesn't work with the native array, it requires the pointer
That was what I was mentioning when native array as method parameter doesnt work as well. I think
cool! so [NoAlias] isn't even needed?
Give me a sec
There's an issue
Note, this will throw an error during operation due to the fact that the component type handle was marked [ReadOnly] while the data is accessed using `.GetUnsafePtr()`
However, using the proper pointer via `.GetUnsafeReadOnlyPtr()` does NOT result in any loop unrolling. Which is interesting.
Burst is somehow having an issue with a read only pointer (it's the same pointer, different checks to obtain) and not a read / write pointer. You can still write to a read only pointer, the check is only at the method call, but burst unroller can not operate with the small difference.
Method(chunkCount, (uint*) globalCooldowns.GetUnsafeReadOnlyPtr());
This will not unroll but will work.
Method(chunkCount, (uint*) globalCooldowns.GetUnsafePtr());
This will unroll but the safety checker will throw an error in operation.
Remove the [ReadOnly] from the ComponentTypeHandle and the latter will work but the scheduling may be suboptimal
whaaat? how strange is that
no clue
```csharp
public static unsafe void* GetUnsafePtr<T>(this NativeArray<T> nativeArray) where T : struct
{
    AtomicSafetyHandle.CheckWriteAndThrow(nativeArray.m_Safety);
    return nativeArray.m_Buffer;
}

public static unsafe void* GetUnsafeReadOnlyPtr<T>(this NativeArray<T> nativeArray) where T : struct
{
    AtomicSafetyHandle.CheckReadAndThrow(nativeArray.m_Safety);
    return nativeArray.m_Buffer;
}
```
The code returns the same pointer
```csharp
/// <summary>
/// <para>Checks if the handle can be read from. Throws an exception if already destroyed or a job is currently writing to th
/// </summary>
/// <param name="handle">Safety handle.</param>
[Conditional("ENABLE_UNITY_COLLECTIONS_CHECKS")]
public static unsafe void CheckReadAndThrow(AtomicSafetyHandle handle)
{
    int* versionNode = (int*) (void*) handle.versionNode;
    if (handle.version == (*versionNode & -7))
        return;
    AtomicSafetyHandle.CheckReadAndThrowNoEarlyOut(handle);
}

/// <summary>
/// <para>Checks if the handle can be written to. Throws an exception if already destroyed or a job is currently reading or w
/// </summary>
/// <param name="handle">Safety handle.</param>
[Conditional("ENABLE_UNITY_COLLECTIONS_CHECKS")]
public static unsafe void CheckWriteAndThrow(AtomicSafetyHandle handle)
{
    int* versionNode = (int*) (void*) handle.versionNode;
    if (handle.version == (*versionNode & -6))
        return;
    AtomicSafetyHandle.CheckWriteAndThrowNoEarlyOut(handle);
}
```
The code uses the same exact data
just read only will not unroll
I wrote that in a reply to the thread you created in the burst forums, maybe the devs will figure it out
great! thanks a lot!
as I'm still with 1.6 I'll see if I get the same results. you mentioned something about having your code not unrolled in 1.7 if I remember correctly
Right, the unroller was modified in 1.7. Try it then. Maybe you might get better results.
remove the ref?
no ref, I copy/pasted your code from the post
give me a sec, im failing to reproduce it as well
but im getting the 1.7 bug where the inspector doesnt refresh again, annoying
i should be testing burst in an empty project...
i'm gonna try with 1.7 now
```csharp
public void Execute(ArchetypeChunk chunk, int chunkIndex)
{
    int chunkCount = chunk.Count;
    var globalCooldowns = chunk.GetNativeArray(GlobalCooldownEnding_ReadHandle).Reinterpret<uint>();
    Method(chunkCount, (uint*) globalCooldowns.GetUnsafePtr());
}

[MethodImpl(MethodImplOptions.NoInlining)]
private void Method(int chunkCount, [NoAlias] uint* globalCooldowns)
{
    for (int i = 0; i < chunkCount; i++)
    {
        validSpell_Array[i] = ignoreGlobalCooldown_Array[i];
        validSpell_Array[i] |= globalCooldowns[i] < tick;
    }
}
```
I had to use no-alias
Yea, GetUnsafeReadOnlyPtr() works with unrolling if you use [NoAlias]
oh, do you have [NoAlias] on the public fields too?
```csharp
public unsafe partial struct SpellCastJobSIMD_GCD : IJobEntityBatch
{
    [ReadOnly] public ComponentTypeHandle<Census> GlobalCooldownEnding_ReadHandle;
    public uint tick;
    public NativeArray<uint> gcdEnding_Array;
    public NativeArray<bool> ignoreGlobalCooldown_Array;
    public NativeArray<bool> validSpell_Array;
```
Nope
Sometimes I get this and just killing the process and restarting solves it
may not necessarily be related to what you're trying at the moment