#archived-dots
Because the citizenship is used for a random access to the national interest float, which I then apply, vectorized, to all bank accounts within the population of that chunk
So wait...
Whenever you create a new Citizen entity, do you add the Citizenship chunk component immediately, or at a later stage?
related: do you create new Citizen entities within a job?
This kind of sounds like what I'm attempting to do: minimizing random access
- I create a citizen entity, then add a <SharedCitizenship> component.
- I set that <SharedCitizenship> component (which is just a single entity) to the country entity it is a citizen of.
- Then in another IJobEntityBatch (not bursted and on the main thread), I read the <SharedCitizenship> of that chunk then set the chunk component <ChunkCitizenship> to that value.
Yep. I spent 6 hours last night converting my entire loading sequence to a job thread.
Hahaha I knew it! I was thinking of something similar, but wasnt sure if this was the dumb way of doing it.
SharedCitizen is just IComponentData though, not ISharedComponentData?
It's a shared component data, hence the prefix
```csharp
private struct MirrorStaticToChunk : IJobEntityBatch
{
    public EntityManager Em;
    [ReadOnly] public SharedComponentTypeHandle<CitizenshipShared> CitizenshipShared;
    public ComponentTypeHandle<CitizenshipChunk> CitizenshipChunk;
    [ReadOnly] public SharedComponentTypeHandle<AreaShared> AreaShared;
    public ComponentTypeHandle<AreaChunk> AreaChunk;

    public void Execute(ArchetypeChunk batchInChunk, int batchIndex)
    {
        // Mirror the shared citizenship value into the chunk component.
        var countryEntity = batchInChunk.GetSharedComponentData(CitizenshipShared, Em).Country;
        batchInChunk.SetChunkComponentData(CitizenshipChunk, new CitizenshipChunk(countryEntity));

        // Area is optional: skip chunks that have no AreaChunk component.
        if (!batchInChunk.HasChunkComponent(AreaChunk))
            return;

        var areaIndex = batchInChunk.GetSharedComponentData(AreaShared, Em).Index;
        batchInChunk.SetChunkComponentData(AreaChunk, new AreaChunk(areaIndex));
    }
}
```
Like, archetype component? 🙂
Oh wait I see -- it doesnt happen in a job
what
i made that suggestion a while ago. not that it exists, sorry
thats not a thing... right? A component that can fracture chunks
oh, hahaha. Made me worried for a sec
I mean burst job
So wait, I can access SharedComponentData in non-burst jobs?
Nope. Just a "regular" threaded job
Must be on the main thread. Basically a glorified for loop.
Not an actual "job".
Haha yeah thats what I was thinking: this doesnt necessarily need to be a job at all
Hmmm....
```csharp
new MirrorStaticToChunk
{
    Em = EntityManager,
    CitizenshipShared = GetSharedComponentTypeHandle<CitizenshipShared>(),
    CitizenshipChunk = GetComponentTypeHandle<CitizenshipChunk>(),
    AreaShared = GetSharedComponentTypeHandle<AreaShared>(),
    AreaChunk = GetComponentTypeHandle<AreaChunk>()
}.Run(GetEntityQuery(_sharedChunkDesc));
```
Well yea, you can do this using good old fashioned For() and ForEach() loops but eh
Hey thanks a lot, this clears things up quite a bit.
Time to tear down some stuff again lol
that's new
yea, I've been working on this project of mine for.... 7 years now? Started out using javascript, went to GO unity, then blindly fumbling around in DOTS, and now I'm neck deep in vectorizing code
can't schedule IJobEntityBatchWithIndex with a batchcount of more than 1
Null reference from within DOTS? What? How
lol, that's quite a ride
I also had something nice going in Unity with regular C#, but performance was abysmal. Hence I'm now here counting bits and what not
right, i need to figure out if bitwise operations is vectorized
if it is, that unlocks a whole new world of premature micro-optimization
"premature" lmao
Does the entity query you are operating on have entities within it? If .GetCount() is throwing an error, that typically means there are no entities. Well, the job shouldnt be running though.
DOTS is a premature optimizer's wet dream
i tried spellCasterQuery.CalculateEntityCount() and it returns 0
lol not sure what's going on. it's only doing that if I increase that batchcount
huh, that should be an early return then. IJobEntityBatch was introduced with 0.17 and "fresh" out of the oven. IJobEntityBatchWithIndex even more so.
Optimization can be kind of addicting. Especially if you have this excellent profiler that tells you exactly where time is spent.
Yea. Solid green bars filling the screen. So beautiful
damn, the algorithm needs the index :/
10 ms becomes 1 ms
best feeling
lol yeah
batch index?
the indexOfFirstEntityInQuery so I can safely write to the nativestream
When I finally got the first loop vectorization to work and 99ms got cut down to 6.7ms, holy shit. Bottle that feeling and sell it instead of what I'm making
Hrm, true. Well, you usually dont need more than 1 batch per chunk... usually.
We run it on a loop through KornFlak's mind. And the chemical it makes his brain secrete goes into every KornFlak's Simple Wafer's Wafer Cookie. Come home to the impossible flavor of your own completion.
When they stated 25x or 50x performance improvements in some of the first DOTS presentations, they werent kidding
Yea but the codebase I'm "converting" is made in the most dense Object Oriented style I have ever seen (written in Java).
The inheritance of some of these classes is 8 classes deep
and now I'm trying to convert it into the most dense ECS format for performance. It is quite the challenge
Kind of crazy how late i learned about the whole CPU cache relation with performance. It was a little before DOTS came out, so when it did, it was as if they knew I wanted this.
lol
I mean look at this nonsense
classes within classes within classes within classes. Relational values that relate to other relational values
that's why devs hate OOP
My approach is currently a bit more pragmatic perhaps. I have like a time budget for a particular set of systems (e.g. <1ms / update), and also try to favor ease of reading & maintenance instead of performance if possible
I noticed especially with Java the class inheritance hierarchy can get very deep.
Usually not so bad with C#, although the feature set is pretty similar 🤔
yep, and so many relational values. Bank account of one changes means a logging method and a dozen other values also change
and it's all delegate functions so they're dynamically called
And I'm gonna figure out how to parallel and maybe even vectorize a clearing house market exchange for millions of entities...
Is this all old stuff you wrote yourself that you're converting now?
Nope. This is some rando's online that I cant even get in touch with
I just like it and I wanna build a game on top of it. It's the most feature complete entity based economic simulator out there
I see 🤔
When I first found this about... 3 years back. I thought: Entity based? Wow, that'll be easy to convert to Entity component system. hahaha. no
Its like learning how to program all over again
For me, things were going well until I added 3 more such planes. I was already at 35 fps by then on a decent PC.
VR? Nice. Reminds me of War Thunder kinda
Not VR. The clouds are proper volumetric ones though. Spherical planet/world. Decent aero simulation (although still a really rough approximation)
Not quite sure where I'll take it but I couldnt experiment any further due to said performance issues
Bitwise operations are vectorized.
This is revolutionary, kinda
I cant read any of this haha. When did you learn to read this stuff?
i have no clue what vmoveXXXXX or whatever are. But they're purple and not green and have v in front for vectorized operation
left column is CPU instruction and right column is arguments?
I see
haha 👍
The goal is the least amount of operations and the most amount of purple
Yep. Bitwise flag check operations are also vectorized. Woooo
This might help with that other guy, @viral sonnet . Have you considered bitwise flags?
ah sick, not really but i'll plan to
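For anyone reading along, a minimal sketch of the bitwise-flag idea in plain C#, not taken from anyone's project: the `UnitFlags` enum and the `CountWithFlags` helper are hypothetical names. The loop body is a mask, a compare and an add with no `if` statement, which is the shape auto-vectorizers tend to cope with. Whether Burst actually vectorizes it still has to be checked in the inspector.

```csharp
using System;

[Flags]
public enum UnitFlags : uint   // hypothetical flags, for illustration only
{
    None    = 0,
    Alive   = 1u << 0,
    Casting = 1u << 1,
    Stunned = 1u << 2,
}

public static class FlagSketch
{
    // Counts entries that have every bit in `required` set, without an if statement.
    public static int CountWithFlags(uint[] flags, uint required)
    {
        int count = 0;
        for (int i = 0; i < flags.Length; i++)
        {
            // (flags & required) == required is a pure bitwise test;
            // the conditional expression turns the bool into 0 or 1 so it can be summed.
            count += (flags[i] & required) == required ? 1 : 0;
        }
        return count;
    }
}
```

Calling `FlagSketch.CountWithFlags(flags, (uint)(UnitFlags.Alive | UnitFlags.Casting))` counts the units that are both alive and casting.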
This is what my testing method call looks like
And this is the "gold" standard of vectorization. 3 lines. Vectorized addition and set.
of course, it aint doing much but it's real fast
damn i was about to ask, does this do the same as in the previous screenshot?
Nope, that's the original code (before testing some other stuff)
ohh right
that will return true for ExpectVectorization(). Really simple code
The bitwise check and select code will not
Why the unsafe pointer over the native array?
As mentioned earlier with Enzi: "Burst's auto-vectorization works entirely on pointers. It doesnt seem to understand that ref NativeArray<float> is identical to float*. The pointer will get vectorized. The ref Nativearray will not."
I was unable to get any purple with ref Nativearray, give me a sec
Wow, thats pretty bad
Also, instead of in float doing a normal float would likely work better as well?
I should remember this.
Should remove a pointer deref in vaddss
actually no
actually I have no fucking clue what is going on
there seems to be two different addition commands
good?
unrolling seems good
unrolling in shaders always good
Yes in this case that would be good
lol
Removing the in float
Moving to ref NativeArray<float>
alright, no difference between the float* and ref NativeArray<float>
Good to know
alright, lesson today: remove the in parameter, it doesnt seem to do anything useful
Since it's using LLVM to do these optimizations, it's pretty smart at optimizing code
I'm still very new to vectorization. God I wish Unity would publish a list of all commands and operations vectorized.
Though I wish the debug tools were a little better to help people understand what is happening, and what got optimized
What do you mean with that?
Some commands can be vectorized, some can not. Like bitwise operations (conditionals and movement) are vectorized (just learned a few minutes ago) but some other operations are not
Wouldnt this be determined by a CPU's instruction set?
Ah right, thats not "really" a Unity thing
Also conditions "can" be vectorized
just not automatically (I think..)
Like setting the int value of a pointer to 0. Can not be vectorized.
Interlocked isnt vectorized either
then there's got to be a list somewhere....
Of all cpu instructions 😛 ?
Of all v- prefixed operations yea
idk you probably already went here https://en.wikipedia.org/wiki/Advanced_Vector_Extensions
Advanced Vector Extensions (AVX, also known as Sandy Bridge New Extensions) are extensions to the x86 instruction set architecture for microprocessors from Intel and AMD proposed by Intel in March 2008 and first supported by Intel with the Sandy Bridge processor shipping in Q1 2011 and later on by AMD with the Bulldozer processor shipping in Q3 ...
Yea I saw that but that can not be all of it.
Oracle Solaris Mnemonic Intel/AMD Mnemonic Description Reference vaddpd ADDPD
(from my link)
now I gotta figure out how to use all these packed instructions. There's a lot of them
AMD's manual: https://www.amd.com/system/files/TechDocs/26568.pdf
seems more extensive?
my god, how in the world am i gonna learn this. It's like staring at DOTS when I first began. Except with even less community support but a hell of a lot better documentation
Assembly is not something you ever fully learn
Gotta figure out if this works with burst and what burst can recognize. I aint gonna start scribbling out my own assembly functions
Just learn where to find it, and search every instruction you dont understand
Suggestion: Maybe '0' is too implicit? Worth a try to assign '(int) 0' or '0u' (unsigned?) instead?
I'm pretty sure it's because the pointer itself is not aligned linearly to allow for packing of values
Required for use in interlocked.add
Since I'm obtaining a temporary value from ComponentDataFromEntity
Could it have something to do with StructLayout? As in, perhaps you could fix this with an explicit layout?
The main requirement is that Interlocked requires a reference / pointer to the target integer value
oh wait let me read your previous thing
Using CDFE, it only returns a temporary value and unity does something else with setting it.
People on the forums are begging for a ref ElementAt() version for ComponentDataFromEntity that would make interlocked work directly. But it doesnt, so this "hack" is needed
You'd think it just some number
It is. It's a memory location though.
anyways, gotta convert all my pointer* to ref NativeArray and remove unsafe.
Oh you're setting the int at the pointer location to zero, not the pointer itself?
Yes. The pointer does not change. The value that the pointer is directing to is being set to 0
Ooh.
So the same thing with "public int Ptr;" instead of "public int* Ptr;" would vectorize?
nvm I'm guessing too much lol
Well, making the pointer no longer a memory pointer but instead a value int, it will vectorize
I'll let you do your thing :p
Wouldnt do anything though
I wont be able to interlock add it in a parallel job though.
which is why I need that pointer, to obtain a ref-able variable to shove into Interlocked.Add()
ComponentDataFromEntity returns a copy of the actual variable (for readonly safety). But that is annoying when I have a threadsafe operation called Interlocked.
Thing is, Unity CTO, the guy in the forums and overall command of DOTS, says to avoid using Interlocked. But he's wrong and I'm right.
What was his reasoning?
What even is the point of Interlocked if thread-safety is already guaranteed by the job system?
When you want to step outside unity's thread safety and into more exotic threading mechanics.
I'm needing to sum up the total for all chunks depending on the citizenship of that chunk. And that's to be done in parallel
And yea, i measured my performance. Singlethreaded, 4ms. Multithreaded + Atomic Interlocked, 0.7ms. No shit unity.
Oooh I see
You write to 1 summed value from multiple threads
Yep.
Although
you in effect have a map where the key is the country and the value is the sum
I can't seem to get anything vectorized :/
One "thread" per chunk that then sums to a single component on an entity. Multiple threads access that component. Requires use of an atomic addition to prevent race conditions
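The pattern described above, sketched in plain .NET rather than as a Unity job (an assumption, so it runs anywhere): `chunkCountry` and `chunkSum` are hypothetical stand-ins for each chunk's citizenship and its locally summed balance. Each chunk is handled by one worker, and only the final write into the shared per-country total goes through `Interlocked.Add`.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class CountryTotals
{
    // chunkCountry[i]: index of the country that chunk i belongs to (hypothetical layout).
    // chunkSum[i]: balance already summed inside chunk i.
    public static long[] SumPerCountry(int[] chunkCountry, long[] chunkSum, int countryCount)
    {
        var totals = new long[countryCount];
        Parallel.For(0, chunkCountry.Length, i =>
        {
            // Many chunks can map to the same country, so the add must be atomic.
            Interlocked.Add(ref totals[chunkCountry[i]], chunkSum[i]);
        });
        return totals;
    }
}
```

In effect this is the country-to-sum map mentioned earlier, with the country index as the key. The atomic add only needs a `ref` to the target integer, which is why the copy returned by ComponentDataFromEntity gets in the way.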
Ah, we did some burst inspection. No need for pointers. You can use ref var.
I was apparently dumb earlier
as for the rest, give me a sec
ah ok, seemed weird because ref var should be the same under the hood. well, just semantics.
sure, thanks! 🙂
Oh yea, that conditional if statement will break vectorization
Ah fuck, I got class in 8 minutes. I dont know if I can finish typing this out and checking the inspector
I'll be back in.... god 2 hours?
no issue, thanks a lot for your help so far! 🙂
Huh, this is not vectorized and I dont quite have enough time to trial and error to figure out why
That however is vectorized. And yet the one I just hammered together is not. No clue why
i'll think about it when i get back in 2 hours
hm, i guess writing to a single value is the problem.
no, single value does work
actually yea. That might be the issue
well, it is writing to delta[0] which is a single value
and that is vectorized
do I need to set the output to be a single array?
either way, running late
eh, that was a bit of a goose chase for me. :/ only arithmetic can be vectorized. if you have any kind of complex code with branching it won't work. i hoped something had changed since i last wrote simd 10 years ago. seems to be pretty much the same restrictive shit. haha
Branching is possible. Only inline condition ? A : B.
Not very optimized though
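A sketch of the inline-conditional form mentioned above, in plain C# with a hypothetical helper name (`ClampNegativesToZero`). Both sides of `? :` are cheap values with no side effects, so a compiler is free to turn the expression into a select instead of a jump. Whether Burst does that for a given loop is something to confirm in the inspector; this snippet does not guarantee it.

```csharp
public static class SelectSketch
{
    // Writes 0 for negative inputs and the input itself otherwise,
    // using a conditional expression rather than an if block.
    public static void ClampNegativesToZero(float[] src, float[] dst)
    {
        for (int i = 0; i < src.Length; i++)
        {
            float v = src[i];
            dst[i] = v < 0f ? 0f : v;
        }
    }
}
```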
Is there a better or built-in way to determine 'maxChunkEntityCount' as I do here?
I'd like to avoid using EntityQuery's CreateArchetypeChunkArray() on the main thread
EntityQuery provides CalculateChunkCount() and CalculateEntityCount(), but not something like "CalculateMaxEntityCountPerChunk()"
Related: Are entities of the same archetype split into multiple chunks? As in; chunks always have a finite size?
The maximum number of entities in a chunk can be calculated at compile time. I believe the chunk size is 16KB; you add up the sizes of the components on an entity and divide the chunk size by that.
Its somewhere. Unity pulls that data for the entity debugger.
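A back-of-the-envelope version of that calculation in plain C#. The 16KB figure comes from the message above and the component sizes in the usage note are made up. The real capacity Unity reports will be somewhat lower, because the chunk also stores a header and the Entity values themselves. `EstimateChunkCapacity` is a hypothetical helper, not a Unity API.

```csharp
using System;

public static class ChunkMath
{
    public const int ChunkSizeBytes = 16 * 1024; // chunk size as stated in the chat

    // Rough upper bound: how many entities fit if only component data were stored.
    public static int EstimateChunkCapacity(params int[] componentSizesInBytes)
    {
        int bytesPerEntity = 0;
        foreach (int size in componentSizesInBytes)
            bytesPerEntity += size;

        // An archetype with no sized components has no meaningful estimate here.
        if (bytesPerEntity <= 0)
            throw new ArgumentException("At least one component with a non-zero size is required.");

        return ChunkSizeBytes / bytesPerEntity;
    }
}
```

For example, `ChunkMath.EstimateChunkCapacity(4, 8, 12)` divides 16384 bytes by 24 bytes per entity, which gives 682 after integer truncation.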
Although I can avoid having to deal with this if I knew how to use NativeMultiHashMap lol,
Maybe its about time I try using NativeMultiHashMap properly
@near mesa do you happen to have an example of iterating through NativeMultiHashMap values for a given key? 😋
The official ECS Boids sample is about the only one I can find so far
I know how to do it, done it plenty of times before a big overhaul of my code. But none on hand. Will need to write up some code manually.
Are you looking for the max entity capacity that chunks which match the query can have?
Some post on the internet suggests "NativeHashMap<TKey, NativeList<TValue>>". Feels a bit dubious to nest NativeCollections like that.
Yeah exactly
Do you know which component your query will have? If so you can use EntityManager.CreateArchetype(components).ChunkCapacity
Oh so I can request it from the archetype?
Wait lets see
Oh right there is such a thing
Although this won't get me anywhere as I would need to know what Archetypes can occur beforehand
I guess I'm better off going for another approach
Oh i now found plenty of NativeMultiHashMap examples: https://github.com/search?q=NativeMultiHashMap&type=code
I found something I can try
Another example with an Enumerator instead of Iterator... Hmm
This should be sufficient for now.
Thanks everyone
In my head canon Arowx is the shit post account from Joachim Ante
that would be so funny 🤣
lmao
Enumerator is easier to loop through personally.
Lol NativeHashSet<T> is just a wrapper for NativeHashMap<T, bool>
Yep
Burst 1.7.0 is apparently on changelog with some really nice improvements to the inspector yet it's not in the package list yet. What version of unity are you using?
i wonder how the asm for readonly span gets optimized in burst 1.7.0
Added support for System.Span<T> and System.ReadOnlySpan<T> within Bursted code. These types are not allowed as entry-point arguments.
AH HA, GOT IT. Just had to manually type out the fucking burst package version into the package manifest file. Unity, you will not deny me my cutting edge burst
2020.1.9f
lol, I probably won't update to burst 1.7 immediately, but I imagine the no entry point arguments means they can't be used as function parameters?
The package is public but unlisted. Now lets see these improvements to the inspector
Cant be used to pass between mainthread and jobs. So no job struct parameters but probably can be used if defined within the Execute() or other local functions
The cross boundary between mainthread and bursted jobs is so restrictive.
Cool, makes sense. You would not want to pass Span & ReadOnly Span as a member in a struct when scheduling a job.
Thats the thing, you. Yea.
Holy shit, it's so fancy now
i dont know what any of these things are doing but apparently it collapses the default burst code leaving behind only those derived from C# code
and then those arrows are.... something?
That's the archetype scheduling code
What I really wish is for custom comments from C# code that appear inside burst inspector. So I know where my code is actually located. Although these arrows help a lot
Does anyone here spend any effort in ensuring their code remains Unity editor 'hot reload-proof'? (Or is it not really worth it)
hot reload?
you mean the domain reload option?
like you want it to not recompile after you edit your source while your game is live?
Do a code change while the editor is in play mode and have it seamlessly continue running after recompilation without any state loss or corruption
I liked the feature when I initially started working with Unity. Although over time it has become harder and harder to properly maintain compatibility with this
well, afaik hot reload in .net only became an official feature with .NET 6. I think with burst it becomes a bit more different (I'm not entirely sure how the generated burst dll is linked to the mono runtime.)
No clue how that would work. If it could work with DOTS / burst in the first place so eh
there is live link. But I have yet to try it.
Live link? Well the feature Im talking about has been in unity for years
all it does is serialize whatever is in runtime at the moment
recompile
then deserialize all state again and continue the simulation from there
Although of course you need to make sure all state is serializable
- deserializable
Yea I get what you're saying. TBF, I don't really know how well it works with DOTS. I usually just leave my editor at Recompile After Finish Playing
Usually anything you put in jobs will already be serializable... although... pointers may not be
Yea, so not sure how well it works with native containers.
any gist of how Span is useful for us? Does it mean anything when we can use NativeArray?
IMO is more trouble than its worth but I was wondering if some are actually doing things "right" in this regard (hot-reload compatibility).
Might be relevant to your needs: https://forum.unity.com/threads/improving-iteration-time-on-c-script-changes.1184446/
xfoofx also worked on the burst compiler team.
increased waiting times (from 2s previously to 4s/6s)
Wait, it's not normal to wait 4s/6s for recompiling?
typically no
Oh no compilation times are fine for me. I separate my stuff into several assemblies. I was just wondering if this is something that is actually valued a lot in the "real world".
oh hot reloading is very convenient
but it breaks easily right?
we don't use microsoft c# 🙂
I've only done it in C, never with Unity directly. I actually thought livelink was suppose to be the "hot reload" solution for Unity DOTS
any "static Dictionary<GameObject, ExtraProperties>" you have lying around will have to be accounted for and dealt with properly for (de-)serialization
what do you mean? Our code is in C#. Microsoft makes C#
```csharp
struct MutableStruct { public int Value; }
...
Span<MutableStruct> spanOfStructs = new MutableStruct[1];
spanOfStructs[0].Value = 42;
Assert.Equal(42, spanOfStructs[0].Value);
var listOfStructs = new List<MutableStruct> { new MutableStruct() };
listOfStructs[0].Value = 42; // Error CS1612: the return value is not a variable
```
Technically we have Mono. Anyway, I've already read that article. What's the difference to NativeArraySpan and NativeSlice
https://github.com/Unity-Technologies/mono this is the version of mono that unity maintains.
Live Link sounds like it does not imply it will employ actual hot reloading. Just that it will receive the updated bundles and - I guess - relaunch the game with the new stuff.
Baked into the language? No need for class declarations and such?
Basically span seems to be able to return ref values to elements of the array (removing need to create temp vars then reset the array element). Nothing particularly fancy.
That's useful if they have implemented it that way. NativeArray lacks a ref
var t1 = (int*) new NativeArray<int>().GetUnsafePtr();
Well there is a pointer sure
didn't mean that we can't do it. it's just easier to write
If you need to understand NativeSlice; I worked with it before
I wonder where ref ElementAt() comes from. Which collection has it?
NativeList has it
Ah, because nativelist is a pointer to a buffer. Not the buffer itself
FixedList has ref ElementAt
You might be able to use UnsafeUtility.ArrayElementAsRef<T> to write a ref ElementAt for NativeArray
Right, people have said to roll your own extension method using that
yes, this works too
yea, I actually only just found out about that. I've been rolling with
return ref *(((T*)NativeArray.GetUnsafePtr()) + i);
Does that work with non-continuous collections? NHM/NMHM?
NHM just returns the struct. all the collections leave a lot of flexibility out
IMO all should be able to return refs/pointers
At least the unsafe versions should
Unity cares too much about handholding the programmer. Let us shoot ourselves in the foot.
Change the optimization pipeline to run the loop unroller exclusively after the loop vectorizer. This improves codegen in a lot of cases (mostly because the SLP vectorizer is unable to vectorize all the code that the loop unroller could have). -> wonder if this would improve my code at some places 🤔
yeah, i really dislike this in the NativeX space. handholding equals bad performance and that's not why I'm using these
beautiful rainbow
it's like playing factorio again
Unity really is a superior factorio
I literally can not play that game anymore without thinking: "I could be working on my project right now."
What happens to NativeCollections allocated with Allocator.Persistent after exiting play mode?
memory leak
lol
well that explains some things
Oh right I wasnt sure if the leak detection would pick this up but it does. The error will come when attempting to start play mode for the second time.
Hrm
New Burst 1.7.0 seems to have broken something with EntityManager.MoveEntitiesFrom()
nevermind, not burst, just my shitty code
I dont have to fix/pin anything when working with UnsafeUtility.MemCpy if working with NativeCollections right?
(Native to Managed)
(Native to Native)
I mean it works but maybe I introduced some safety issue here
You know that Unity has the extension methods CopyFrom and CopyTo
And the managed version
Yeah but I'm also converting the struct type during the copy
e.g. this allows fast conversion from Vector3[] to NativeArray<float3>
.Reinterpret<>() the array before copying? Ah, from managed to native
Yeah Reinterpret basically
well, you can copy to a NativeArray<Vector3>() then .Reinterpret<float3>() to get it in the end
without having to make your own version memcopy
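The same idea outside Unity, as a hedged sketch: `MemoryMarshal.Cast` reinterprets a span of one struct as another struct of the same size, so a single `CopyTo` converts while copying, with no `fixed` block or manual memcpy. `Vec3A` and `Vec3B` are hypothetical stand-ins for Vector3 and float3. This only illustrates the technique and says nothing about how NativeArray's Reinterpret is implemented.

```csharp
using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential)]
public struct Vec3A { public float X, Y, Z; }   // stand-in for Vector3

[StructLayout(LayoutKind.Sequential)]
public struct Vec3B { public float x, y, z; }   // stand-in for float3

public static class ReinterpretCopy
{
    // Copies src into dst, treating each Vec3A as a Vec3B with identical memory layout.
    // dst must be at least as long as src, otherwise CopyTo throws.
    public static void Copy(Vec3A[] src, Vec3B[] dst)
    {
        ReadOnlySpan<Vec3B> view = MemoryMarshal.Cast<Vec3A, Vec3B>(src.AsSpan());
        view.CopyTo(dst);
    }
}
```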
Is my version correct, though? :p
maybe? Unity has a lot more safety checks for read access
I'm primarily concerned about the fixed {} block
I had to do it for the managed array, but not for native
hmm I'll check what Unity has then
This was from before Unity had all those extensions I believe
(void*) ((IntPtr) (void*) gcHandle.AddrOfPinnedObject() + dstIndex * UnsafeUtility.SizeOf<T>())
Unity has some really strange casting here
yea looks right
Unity's reinterpret does an allocation here that I dont have to do :/
Also, this just in: EntityManager.MoveEntitiesFrom() can not handle chunk components
will throw null reference exceptions
Hmm alright. So far I havent seen strange things happen so I suppose I'll just go with it for now.
also you can just do
fixed (T* ptr = array) {
}
where did you find this?
Unity's CopyTo / CopyFrom implementation
found in NativeArray
Ooh right.
Looks like it basically does what I'm doing
without the fixed { } block somehow
Although now that I think of it, the fixed { } block probably just compiles to what Unity has in their implementation
i'm at a point where I'd like to start a job from a job 😄
hahaha, yep
Its like the "I want to put a prefab in a prefab" for people working with DOTS
at some point everyone gets there
then disappointment hits
or figure out a better way to create data in a job that is used to start the actual job then. NativeStream is the best solution I've found but even that is not async in that sense, takes a lot of time and blocks the main job
hello guys do you know any best tutorial for dots 2021 😦
Yeah I often chain Jobs for this reason
iam begging to understand ECS system
hey me too
Oh boy. So are we.
every version has differnt defenition
like
World.Active.EntityManager
1.8 dosent exist
WTH ??
its keep changing
0.18 is dead. Long live 0.21 (eta 203X)
I just stick with Entities 0.17 because I understand it and it works
loop vectorization and pain
it's World.DefaultGameObjectInjectionWorld.EntityManager for a long time now
impossible
can i punch my face
wait, are you using hybrid renderer?
Ah, yea. Dont.
lol
WHAT DO YOU SUGGEST
i have so many instantiated objects
around
10,000 to 50,000
units
runs 0.1fps
Use com.unity.entities. Documentation is here: https://docs.unity3d.com/Packages/com.unity.entities@latest
good lord, can you type sentences?
@frigid wigeon Type properly please
Basically understand what is Entities first (without even displaying anything). Then try to use Hybrid Renderer to try and render things
We're all here trying to understand what Entities is so, yea.
😦
I still use Graphics.DrawMesh instead of the hybrid renderer because at least I know it will work
even though I'm leaving a lot of performance on the table
Also still Unity 2020.1.9f. Unless there is a compelling reason to update, I usually dont.
Burst I think has minimum required version of 2020.2+
Huh, it works over here for me
Maybe its a newer version you're thinking about
Here's my journey: Understand how to schedule an IJob -> Understand how to schedule an IJobParallelFor -> Understand what Burst is -> Install Entities
HybridRenderer is good for static/non-moving objects out of the box. For many moving ones it gets a lot more complicated because the Transform/Hierarchy system gets in the way and needs to be tamed
I'm using Burst 1.4.1
2019.4+. Not 2020.2. Dont know where I got that from
2021.2+ for the standalone version... I think. no clue what BurstAoTCompiler is
Oh Ahead-of-time compiler?
yea
JIT in editor, AoT in build
Ah
I've compiled my project in 2020.3 perfectly fine so no clue what it means
But yea, dont use DOTS right now. It's not friendly to new programmers
iam not new 😦
Stick with regular Unity. DOTS will not be ready for another 2 or 3 years
Do you know what Burst is?
Yep, that's Entities. Really dense "documentation" that's great as an overview but honestly doesnt explain any DOTS programming patterns at all
thats true
And here's Burst: https://docs.unity3d.com/Packages/com.unity.burst@1.7/manual/index.html
thanks 😄
Forget entities, it's basically useless until you know what Burst is inside and out. Try to start in MonoBehaviour scripts located on your camera and try to do things with IJobs
this will fix my frame issues 😦 ?
Theoretically: Yes. Practically: No. Not unless you basically rewrite your entire game from the ground up.
Burst isnt just a fancy compiler, it's a "~lifestyle~".
DOTS in a nutshell is: "Is this Burst compile-able? Will Burst be able to read this? Can Burst vectorize this? How can I order my entity components to allow for Burst to optimize this?"
Burst burst burst burst.
One little golden tag, so much pain.
This is how I started.
I only started using the actual entity component system after collecting data to put into jobs took 100 times longer than executing the actual job itself :p
Same here. I started with mass amounts of NativeArrays filled with data before I realized that Entities does that for me.
dear mother of god
i know
Oh jesus
so
"Saved by batching: 0"
lel
i have to optimize the game 😄
DOTS will not help. Custom render pipeline will.
URP ?
I doubt the bottleneck is information processing. You need to go beyond URP and HDRP and code your own Scriptable Render Pipeline to handle that many entities.
And how many entities is that? 10k? That's nothing.
😮
It could be. Updating transforms, submitting draw calls, calculating transform matrices....
gets super expensive if you have this amount of stuff on screen
At that point, you need to use a compute shader and do all those calculations GPU side and remove the CPU - GPU transfer bottleneck
For me, total time spent on Graphics.DrawMesh regularly exceeds time spent on updating all my ComponentSystems
@frigid wigeon From my experience, you need to get the "Batches" number down. 41852 is too much for any computer
How many vertices are in that character mesh?
let me check that out !
What are you going to try? :p
Lod system?
thats crazy
iam planning to implement it today
What is too bad about SkinnedMeshRenderer is that there is no GPU instancing support for it
you need to merge those meshes a bit before considering converting them to entities, DOTS is awesome but 41,000 batches with 38,000 shadow casters and 10,000 animators playing simultaneously is just waaaay too much
I believe rendering 40000 of the same thing should be possible if you could use GPU instancing.
but not with animated character though.
i dont think GPUI works with Entities does it?
No clue, but you could use Entities to calculate all the transform matrices, then use Graphics.DrawMeshInstanced(Mesh, Matrix4x4[]) or something like that
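A rough sketch of the "calculate the matrices, then draw instanced" idea, written as plain .NET so it runs outside Unity. `System.Numerics.Matrix4x4` stands in for `UnityEngine.Matrix4x4`, and `InstanceBatcher` is a made-up name for illustration. The one Unity fact relied on is that `Graphics.DrawMeshInstanced` accepts at most 1023 instances per call, so the matrices have to be split into batches first.

```csharp
using System;
using System.Collections.Generic;
using System.Numerics;

// Hypothetical helper: not part of Unity or Entities.
public static class InstanceBatcher
{
    // Graphics.DrawMeshInstanced draws at most 1023 instances per call.
    public const int MaxInstancesPerCall = 1023;

    // Builds one translation matrix per position
    // (the "calculate all the transform matrices" step).
    public static Matrix4x4[] BuildMatrices(IReadOnlyList<Vector3> positions)
    {
        var matrices = new Matrix4x4[positions.Count];
        for (int i = 0; i < positions.Count; i++)
            matrices[i] = Matrix4x4.CreateTranslation(positions[i]);
        return matrices;
    }

    // Splits the matrices into arrays small enough for one instanced draw each.
    public static List<Matrix4x4[]> SplitIntoBatches(Matrix4x4[] matrices)
    {
        var batches = new List<Matrix4x4[]>();
        for (int start = 0; start < matrices.Length; start += MaxInstancesPerCall)
        {
            int count = Math.Min(MaxInstancesPerCall, matrices.Length - start);
            var batch = new Matrix4x4[count];
            Array.Copy(matrices, start, batch, 0, count);
            batches.Add(batch);
        }
        return batches;
    }
}
```

In a Unity project each batch would then be handed to one instanced draw call per frame, with the matrix building moved into a job.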
i love you
I was thinking about this the entire day, I can probably fix it within 3 days
I would merge the meshes in batches of maybe 100, then convert the resulting meshes into Entities, then shove those merged meshes into Sub-Scenes in maybe batches of 100 themselves
Does batching even work with SkinnedMeshRenderer?
lol guess not
662.4M tris, what is this madness? 😄
nope, i know it wouldnt work with Entities anyway, GPUI i wasnt sure about though
doing well I'd say
I'm literally writing down notes
from all your comments
I'M LOVING IT
❤️ ❤️
I test everything out
my calculator says one mesh has 66k triangles. I think Overwatch has like 17k
🙂
I would use LOD to break a character up into cylinders and spheres for limbs and heads. Then use GPU instancing to just blast thousands of them onto the screen without too much trouble.
those are pretty detailed and like what, 10-15 on screen? what you want to do is really not possible with animated characters
even if you could render them, the animation system breaks down
one part in DOTS that's REALLY experimental is the animation package which could help in this case
but I honestly doubt it for 10k.
I remember once seeing this crazy Unity game or tech demo that had like 10000 animated characters. This was from before DOTS.
No clue how it was done
@frigid wigeon https://medium.com/chenjd-xyz/how-to-render-10-000-animated-characters-with-20-draw-calls-in-unity-e30a3036349a
I found this
This person uses GPU Instancing, BUT...
this is quite clever
he uses an "animation map" to manipulate vertices in the vertex shader to make it appear animated
ref NativeReference<float> parameter does not result in vectorized addition while float* [] does. So odd.
6.19ms single threaded for 250k...
That's really good.
With float*[]
O.o
it's really not. the really bad thing is it's even slower in parallel
float pointer has less commands but the ref nativereference seems to be unrolled
6ms for 250k entities? Hrm
maybe it is good, in the context of 16.6ms frame budget it's not though 😄
Oh, yea, that bad.
Do you have to use an NMHM (NativeMultiHashMap)? What are you doing with the ints per entity?
Are you summing these together per entity?
the parallel spells can't directly apply the damage amount to the target otherwise I have race conditions between the threads. so the damage amounts are written out and then brought together in the NHM and then they are applied to the entities
Ah ha, allow me to introduce you to the magical world of atomic calculations
Enter stage left: Interlocked. https://docs.microsoft.com/en-us/dotnet/api/system.threading.interlocked?view=net-5.0
those just kill performance further
I had this same issue with summing entity count to dynamic entity targets
I've seen them in action with nativelist.parallelwriter
i've to check if CDFE is atomic
maybe I don't need to do this lol
hear me out, i've profiled this myself. Cut a 4ms singlethreaded job over 2.5 million entities to 0.07ms parallel
Fuck the CTO, he doesnt know what he's talking about. Interlocked is amazing
hm, in my test i'm writing to one single target. that's really worst case
CDFE is not atomic. However, by using pointers in combination with interlocked, you can atomically add values together
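To make the idea concrete outside Unity, here is a minimal plain .NET sketch of many threads adding damage to shared targets with `Interlocked.Add`. `DamageAccumulator` and its parameters are invented for the example, and `Parallel.For` stands in for a parallel job.

```csharp
using System.Threading;
using System.Threading.Tasks;

// Hypothetical example type, not from the conversation.
public static class DamageAccumulator
{
    // Applies every hit to its target's running total from many threads at once.
    // Interlocked.Add makes each read-modify-write atomic, so no hit is lost even
    // when several threads land on the same target.
    public static int[] ApplyHits(int targetCount, int[] hitTargets, int[] hitAmounts)
    {
        var totals = new int[targetCount];
        Parallel.For(0, hitTargets.Length, i =>
        {
            Interlocked.Add(ref totals[hitTargets[i]], hitAmounts[i]);
        });
        return totals;
    }
}
```

With a plain `totals[...] += amount` in the loop body, the same code would silently drop updates whenever two threads touch one target.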
Ehhhh, yea
But try it out. If it's equal performance, well you'll cut down on the memory requirements of the nativestream and NMHM
```csharp
public unsafe struct Census : ISystemStateComponentData, IDisposable
{
    // ReSharper disable once Unity.RedundantHideInInspectorAttribute
    [HideInInspector] public int* Ptr;
    // ReSharper disable once Unity.RedundantAttributeOnTarget
    [SerializeField] public int Value => Ptr[0];
    public Census(bool i)
    {
        // Malloc and MemClear take sizes in bytes: allocate and zero one full int.
        Ptr = (int*) UnsafeUtility.Malloc(sizeof(int), 4, Allocator.Persistent);
        UnsafeUtility.MemClear(Ptr, sizeof(int));
    }
    public void Dispose()
    {
        UnsafeUtility.Free(Ptr, Allocator.Persistent);
        Ptr = null;
    }
}
```
That is the target value, the collection of summed ints (or float or whatever floats ya boat).
```csharp
[BurstCompile]
private struct IteratePopChunks : IJobEntityBatch
{
    [ReadOnly] public ComponentTypeHandle<CitizenshipChunk> Citizenship;
    [NativeDisableParallelForRestriction] public ComponentDataFromEntity<Census> Census;
    public void Execute(ArchetypeChunk batchInChunk, int batchIndex)
    {
        var countryEntity = batchInChunk.GetChunkComponentData(Citizenship).Country;
        unsafe
        {
            // Enabling interlocked pointer from component data from entity cuts time
            // from 4 ms as single-thread to 0.07 ms multi-thread (1 ms total)
            Interlocked.Add(ref Census[countryEntity].Ptr[0], batchInChunk.Count);
        }
    }
}
```
That is the process. In this case, I'm summing to the target all relevant chunk counts. In parallel.
If you're using anything other than int variants, you'll need to make your own interlocked extension that adds floats together. Plenty on Stack Overflow
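One common shape for such a float extension is a compare-and-swap retry loop. The sketch below is plain .NET and only an illustration; `InterlockedFloat` is a made-up name, and whether Burst accepts this exact code was not checked here.

```csharp
using System;
using System.Threading;

// Hypothetical helper: atomic add for floats built on CompareExchange.
public static class InterlockedFloat
{
    // Atomically adds 'value' to 'target' and returns the new total.
    public static float Add(ref float target, float value)
    {
        float observed = Volatile.Read(ref target);
        while (true)
        {
            float desired = observed + value;
            // CompareExchange stores 'desired' only if 'target' still holds
            // 'observed', and always returns what was in 'target' beforehand.
            float previous = Interlocked.CompareExchange(ref target, desired, observed);
            // Compare bit patterns so a NaN cannot cause an endless retry.
            if (BitConverter.SingleToInt32Bits(previous) == BitConverter.SingleToInt32Bits(observed))
                return desired;
            observed = previous; // another thread won the race, retry with its value
        }
    }
}
```

Unlike the integer version, this one really can spin under contention, so it is worth profiling with many writers on one target.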
interesting usage of ISystemStateComponentData, I assume this also works for IComponentData
ECS, GPU instancing, LOD, general optimization code, and an animation map are the solutions to fix this bloody issue 😄
Yea but you need to manually dispose of the pointer. So system state or whatever else is needed
cool thanks, gonna give this a try
Did you ever check performance of:
- Writing to a NativeHashMap<int, int> where key is batchIndex and value is sum.
- Scheduling a second job (just single-threaded) with first job as dependency, in which you sum these values?
I'm kind of curious how that compares to your current implementation.
This is what I would do given the same problem, and is probably what that CTO would want you to do, right?
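For comparison, the two-pass alternative could look roughly like this in plain .NET. A plain array indexed by batch stands in for the `NativeHashMap<int, int>` keyed by `batchIndex`, and `TwoPassSum` is a name invented for the sketch.

```csharp
using System.Threading.Tasks;

// Hypothetical example of "write per-batch partial sums, then reduce".
public static class TwoPassSum
{
    public static int[] SumPerCountry(int[][] batches, int[] batchCountry, int countryCount)
    {
        // Pass 1 (parallel): each batch writes only to its own slot, so threads
        // never share an element and no atomics are needed.
        var partial = new int[batches.Length];
        Parallel.For(0, batches.Length, batchIndex =>
        {
            int sum = 0;
            foreach (int value in batches[batchIndex]) sum += value;
            partial[batchIndex] = sum;
        });

        // Pass 2 (single-threaded): fold the partial sums into per-country totals.
        var totals = new int[countryCount];
        for (int b = 0; b < partial.Length; b++)
            totals[batchCountry[b]] += partial[b];
        return totals;
    }
}
```

The trade-off against Interlocked is the intermediate buffer plus a second dependent job, in exchange for zero contention in the parallel pass.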
Yea. Or completely redesign your data structure to remove the need to have intermediate buffers. Somehow
But fuck 'em. Interlocked is my God (read Daddy Gates) given right and I will use it everywhere I can.
lol
Although I have to admit, I can't identify anything that could go super wrong here. Worst case is one or more threads have to wait for each other to finish their integer addition.
Yea, thats what it does. Just waits in a while loop until a slot opens for it to then add itself onto the value
it's not the most optimal but it completely eliminates any need for intermediate buffers
I wonder what kind of abuse Joachim was thinking of when he wrote that post
Maybe the story would be different if you would iterate over Entities instead of Chunks (to sum some ComponentData value or something)
True. There's only about 7k chunks. Imagine 2.5M entities using interlocked
7k is nothing, largely
i barely get to 7k entities lol so its a lot to me
bump those numbers up. Stress test ya systems. 700k entities good
Anyway, cool stuff man. Really interesting to see this kind of approach
Honestly, i'm expecting use of around 100k entities at most. But 2.5M and maintaining good CPU times will be key. As only with many many entities will small optimizations show up in the profiler (outside of the usual random changes)
look at my sad worker thread utilization
155ms, what the fuck
holy shit. I'm here counting the ms until my 144fps system will require shifting to FixedUpdate and you're here with 8 FPS
EntityCommandBuffer playback for AddComponent and RemoveComponent takes super long for some reason
its only this bad when I move the "camera" several kilometers in a second
singlethreaded structural change. Of course it'll be very expensive
with normal traversal speed its better, but still not good enough
ooh 😮
And it's on the main thread so it blocks graphics update
There's no way around that unless you're willing to shift all your entities onto a separate world then shift it back, which may be even more expensive
Hmmm...
actually, give me a sec. Let me see how much my 3M entity transfer between worlds cost
implemented Interlocked.Add now and it's working with burst disabled. when I enable it it's throwing errors and that I should disable burst to check ... haha
gonna work on this tomorrow, I think overall it will be an improvement. so thanks again
most of the trouble comes from having to update MeshColliders, MeshRenderers, and normal map textures and such
So I have to return to the main thread every once in a while to do that.
So you're saying Worlds update completely in parallel?
Yes, kinda. You can move a world entirely to a Job thread, independent of the main thread and the graphics update loop. Allowing for a solid FPS while a different world is slogging through a job
HHmmmmmmmmmmm
Assuming that the move chunks job timing scales linearly, the move chunk will cost 0.27ms for 7k entities
Which is nothing
that already beats what I currently have
EconAuthoring Creation job is veeerrrrrryyyyyyyyy expensive. Over 5 seconds in RT. But it's divided over 300 frames allowing for a solid non-blocked 60FPS
burst was acting weird. it's working now. kind of like a free removed race condition. interlocked is my god now too
Performance? What's the timing?
could free up 2 jobs and the write/read to nativestream. pretty huge tbh
#1 cause of slow code and complicated systems for me has been keeping ObjectPools with Meshes and MeshColliders and such. Every TerrainPatchEntity too far to be visible should not take up a large Mesh, so it is released back into a pool. But then reallocating these takes long and is hard to jobify.
I hope DOTS physics and rendering soon becomes good enough for me to use for this...
41ms before and 30ms now.
Interlocked is our God Daddy Gates' given right. Good to see another convert to the Church of Interlocked.
Parallel? That's pretty bad
that's the whole frame timing. i can't measure just the interlocked timing right now
But I guess it is the worst possible case of 250k targeting 1 entity
Ah, Well it shaved off 11ms so good.
No clue about terrain. Unity's supposedly planning something for native DOTS terrain but that was promised 3 years ago
By the way, would it be faster to create a new Entity with an extra component in its archetype instead of adding a component to an existing Entity?
Yes.
My terrain is spherical so I dont expect I can use anything Unity will offer.
thats why you usually always pre-define the components located on every entity in its archetype
True. If regular Terrain doesnt work, DOTS wont either
well not the best absolute timings with profiling now but one worker thread takes 36ms and 2.58ms is the Interlocked.Add
Have you tried to do the job in parallel?
Wtf. I shall try this tomorrow then.
With interlocked add, parallel scheduling is possible
289ms overall and 20ms for interlocked overall
it's already 8 worker threads 🙂
i'd say this was very much worth it because every other solution is pretty bad
2.58ms vs the what, 6ms earlier?
And the worst case scenario. That's over a 2x improvement in performance.
11-12ms before. if not more.
yes, it's absolute worst case. like, it can't get any worse haha
yeah i'm gonna stick to interlocked now for race conditions. all other solutions are not working out and i'm here for performance not for winning some prize in style and finesse
Yep. All in a day's work. Anyways I've gotta go to sleep. I'll be around tomorrow or whenever lurking in this channel. Because Burst is Love. Burst is Life.
me too, have a good night
Interlocked can not be vectorized. If you can vectorize, that has a looooot better performance
The main cost of interlocked usage in my case (the census) is the inability to vectorize "reset" the value to 0. Because of the pointer redirection, it can not be properly vectorized.
See you around guys. Thanks for all the great help so far.
im pretty sure I could solve that with intrinsics. And Im looking for it... tomorrow. Sleep now. Yes
Is there any work around to pass a string into a job, to be used in a switch case statement? I know you can pass it in a NativeArray<char> form, but I'm not sure how to switch case on that, any ideas?
How many possible strings and how many switch cases do you have? This might make it easier to figure out how to best do this. And consider what the string is indicative of, and if you can swap it to an enum or int before passing.
Basically, I wanted to do some sort of delegate function passing into my job system, but have since found out that's not really possible. So now I want to replace it with a switch case with all the possible static functions referenced inside the switch case. But the switch case will be under constant additions, as I'm pretty much always creating new "algorithms" for it to run. Atm, I have each algorithm as a class with a name and run function deriving from some shared interface. I've got all these classes stored inside a dictionary for other external non-job uses. They are paired with an ID, but that ID is different on each run so I couldn't use that to identify them inside of the job. Which is why I went with a name, which ends up being the class's name. An enum does seem useful here, do you think that would work?
It's a bit of a shame I couldnt get any delegation working cause the switch case needs to be maintained constantly
This sounds like an ideal case (if you'll excuse the pun) for a big enum. That will give you very clear naming, making code easy to read, and switches love nothing more than enums, even if they're huge.
enums are superficially a naming of ints, so if you need to be dynamic, you can use those ints in the case, too, by passing in a variable that's got the int value of the case/enum you want, as things change. This can get a bit fiddly, but much less so than messing with strings, dictionaries and classes. Also, I don't know how you're doing it, but those functions need to get into Jobs without reference to the classes/instances... as Jobs don't like references. Have a read of this, for how to think in terms of the world of Jobs vs References: https://www.jacksondunstan.com/articles/5397
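A small sketch of the enum-plus-switch approach in plain C#. The algorithm names and the `Resolve` helper are invented for illustration; the point is that the string is turned into an enum once, outside the job, and only the enum (effectively an int) travels into the hot path.

```csharp
using System;

// Hypothetical set of algorithms; add a member here for each new one.
public enum AlgorithmKind
{
    Double = 0,
    Square = 1,
    Negate = 2,
}

public static class Algorithms
{
    // Done once on the main thread: map a class or display name to the enum.
    public static AlgorithmKind Resolve(string name)
    {
        if (Enum.TryParse<AlgorithmKind>(name, out var kind))
            return kind;
        throw new ArgumentException("Unknown algorithm: " + name, nameof(name));
    }

    // The part that would live inside the job: no strings, no references.
    public static int Run(AlgorithmKind kind, int input)
    {
        switch (kind)
        {
            case AlgorithmKind.Double: return input * 2;
            case AlgorithmKind.Square: return input * input;
            case AlgorithmKind.Negate: return -input;
            default: throw new ArgumentOutOfRangeException(nameof(kind));
        }
    }
}
```

Because the enum values are fixed at compile time, they stay stable between runs, unlike the per-run IDs mentioned above.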
Thanks! Atm, I'm calling a static function inside of the job. What I tried to do was to pass in a class and call its static method, but the class was a reference type so I couldn't. So is it actually possible to pass in a "static function" to a job?
This is very interesting, so that allows you to reference managed types, but not to read them? So I wouldn't be able to wrap my algorithm class instance and make the call to its static function, would I?
I think you'll have to find a way to alleviate class based thinking, altogether. Not merely so that what you do works, but so that it works well. Do you know much about the stack vs the heap, and how that differentiates what's going on? If so, you've gotta (to get the performance boost of Jobs and Burst) think about how to get as much as possible onto the stack. If this doesn't mean anything, then start thinking in terms of arrays of structs, like NativeArray<structTypeYouMade> so that you can do multiple things at once. Perhaps the best thing I can impart is this: structs can have functions, and they travel with the struct! So if you can pass in a struct, it's got the function you might need within it, ready to go.
Can I pass a struct in and have it guaranteed that the struct does in fact have this function?
you can pass in a struct with the method or in your job struct, call a static method with all the arguments you need to do the operation
But if I want to replace this struct with another, how can I call that function instead? Without having to change the code
You might want to consider Burst function pointers
I think you can also have an interface on the struct and hint the compiler that it is completely blittable
struct JobStruct<T> : IJob where T : unmanaged, SomeInterface {
public T YourStructYouWantToCall;
public void Execute() {
YourStructYouWantToCall.SomeMethod();
}
}
Thanks I'll check this out
I don't think the struct constraint is enough for that as a struct isnt always blittable
yea, unmanaged works better, couldn't remember if you can have trailing interfaces after unmanaged. Not really sure why I thought you couldn't have interfaces trailing after unmanaged in the first place.
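Here is the same generic-constraint idea as a runnable plain C# sketch. `IAlgorithm`, `DoubleIt`, `AddOffset` and `AlgorithmJob<T>` are hypothetical names, and a managed `int[]` stands in for what would be a `NativeArray<int>` inside a real Unity job.

```csharp
// Hypothetical interface every "algorithm" struct implements.
public interface IAlgorithm
{
    int Run(int input);
}

public struct DoubleIt : IAlgorithm
{
    public int Run(int input) => input * 2;
}

public struct AddOffset : IAlgorithm
{
    public int Offset;
    public int Run(int input) => input + Offset;
}

// 'unmanaged' guarantees T holds no references; the interface guarantees Run exists.
// Because T is a concrete struct at each use, the call needs no boxing.
public struct AlgorithmJob<T> where T : unmanaged, IAlgorithm
{
    public T Algorithm;
    public int[] Data; // stand-in for a NativeArray<int> in a real job

    public void Execute()
    {
        for (int i = 0; i < Data.Length; i++)
            Data[i] = Algorithm.Run(Data[i]);
    }
}
```

Swapping algorithms then means changing the type argument, for example `AlgorithmJob<DoubleIt>` instead of `AlgorithmJob<AddOffset>`, rather than editing the job body.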
Yes, it's always in there, but must be its own function, and must be operating in its own way, without reliance on outer references, or any references. Everything in a struct is copied into a Job, when the struct is passed in. This is known as "copy by value" or some other nonsense. Programming terminology is ridiculous. In other words, don't try to overthink this. If you start thinking about how to put different functions inside a struct, you're going to get into a pickle, and need to consider the idea of state, and perhaps find a way to determine which Jobs to issue rather than having the work inside the Jobs make that determination.
Alright, a few hours later I tracked down my bug. Shared component data properties are not registered with the entity remap functionality. Entity fields / properties located in SharedComponentData are not properly re-indexed upon transfer to a new world. It is thus mandatory to create a temporary component data representing the intended shared component data, then manually assign it after the entity transfer to the other world. The more you know.
thats a problem because shared component assignment and the resulting mem-copy is extremely expensive. I'm trying to get loading to be non-blocking but if I do the shared component assignment on mainthread, that's a giant lag spike...
Also huh, unity transport, the DOTS ultra lowlevel multiplayer API has been releasing updates regularly and quietly
And apparently it's pre-release flagged. That means full release within the next quarter
That may mean a big announcement on Entities front, because why would they release Transport and not release a "newer" version of Entities?
Hey KornFlaks o/
lol what are these patch notes in transport: Some public APIs that should have always been internal are now internal
'ello. Just looking around the blog and there was a mention in the newly announced Unity Multiplayer Support a small tidbit on the upcoming full release of Transport (along with integration with the new multiplayer services). And when I checked the changelog, it's 1.0 already
yeah, pretty good to see. also hope that means good things for entities and netcode
I'm currently playing around getting anything vectorized but Burst doesn't want to
can you see anything?
Ah, yea. That pointer redirection fucks with autovectorization
I have the same issue, described yesterday about resetting the unmanaged value the pointer was referencing
ah, i feared that. the health struct has more values in it, hm
so in essence we have to use blittable data type arrays?
You have 2 paths forward.
- Accept the non-vectorization. Isolate the pointer and value combo into its own component and take it out when you need to and eat the performance cost as payment for the magic of Interlocked.
- Learn assembly. Code out specific assembly instructions to manually pack the ultimate value and then handwrite the vectorization:
https://blog.unity.com/games/updated-guide-for-using-neon-intrinsics-in-unity-burst
Yea. Direct references to those data, no redirection
I mean indirect references do work. Or else how do NativeList variables get auto-vectorized when iterated through (I think). Unity recognizes that something is a pointer and the ultimate value is a blittable data type but that must be identified in Burst itself.
However that is under question as DynamicBuffers are not autovectorized despite being a pointer to a NativeArray equivalent (which also cant be auto-vectorized apparently or I couldnt get it to work.)
huh, this is really quite limiting :/ had hopes that burst would be smarter.
Dynamic buffers not being vectorized is really concerning. I need to do a lot more tests to figure out why
i know vectorization only from shader programming and this is nearly 10 years ago. but it worked for structs like float3/4 etc... maybe I'm off but this is the same thing
somewhat disappointing that this is not further along
Vectorization according to the assembly is taking a reference to a value then assuming the next 4 or 8 pointers after it also point to the exact same data type. Packing those pointers together, then doing math on all of them in parallel.
The problem here is that while the pointer itself is packed linearly and able to be vectorized, the int value that the pointer references is not packed linearly
Clearly there is a way to pack it linearly, that's the entire magic of Entities, taking pointers and "dynamically" linearizing component data behind the scenes to allow for burst vectorization.
so that I get this right, the problem is with the Ptr[0]?
Yep
ah ok, makes sense
output[i] += temporaryHealth[i].Ptr would be vectorized, but it's literally adding the pointer's memory index to a value which will produce garbage.
output[i] += temporaryHealth[i].Ptr[0] is not vectorized due to applying the pointer's memory redirection to acquire the actual useful information.
Now that Ptr[0] can also produce a pointer, you can ref the output of that, and thus you can manually pack and vectorize the addition yourself if you use Burst Intrinsics (aka Assembly).
good to know, i'll hold off in this case 🙂
the job doesn't make much sense anyway hehe
a relic from yesterday so I can write the interlocked.add test faster
grab output[i], output[i+1], output[i+2], output[i+3] pointers, pack them, then grab temporaryHealth[i].Ptr[0], temporaryHealth[i+1].Ptr[0], temporaryHealth[i+2].Ptr[0], temporaryHealth[i+3].Ptr[0], pack the pointers, then vaddXX them both.
it'll probably be 1.1x or 1.2x faster since you're doing it manually
The main problem of doing this is that you'll have to swap commands to pack and add for about a dozen or so different processor assembly commands:
So one add function now becomes a 50 line switch statement monstrosity
haha, yeah, no, I get flashbacks to the c++ engine I worked in the past. but thanks for the suggestion
yea. It's an option if it's the largest contributor to your performance cost, but manually hardcoding vectorization where Burst doesn't recognize it is a very tedious job.
But if it turns your game from a 40FPS to 60FPS (8.3 ms difference per frame), yea it'll be worth it
Ehhhhh, never profiled it myself. But it's literally free performance. Might as well do it.
Well, free in that you assume likely / unlikely properly
Hint.Assume reads weird: The assume intrinsic is powerful and dangerous - telling the compiler that a condition is always true
What sense does that make?
When you're debugging and need to see what that branch will say in the inspector, Burst will automatically delete the other branches once a value is assumed true
That way you dont get confused by the million other random shit in the inspector
yeah, i just read about safety checks also. okay, not too interesting for me 🙂
Since you cant see what is which branch if the comments dont line up perfectly (I wish we could add compiler comments my god, would make looking through the inspector so much easier).
that would be really useful! most relevant code I find at the bottom lol
Turn on synchronous burst compiling. No-Inlining functions will appear near the top of the inspector.
Asynchronous burst compiling shuffles the burst output and makes reading through it very confusing
Huh, the inclusion of BurstCompile compile synchronously throws a lot of errors
wait, function pointer
Twisting the burst compiler and C# itself to allow for type inheritance and generic reactions in the form of "spells". A good read
Definitely in no way optimal performance.
Reminder for self tomorrow. Test element at ref function compatible with interlocked and vectorization. Check if stack alloc returns ref-able value.
thanks for the link. sounds really interesting. i'll finish first and ponder what that could gain me
right now the biggest value I'd see is not having so many branches
which is BIG
You still have your testing interlocked job?
How about you try this test for me. Because I can not get it out of my head
And I dont want to walk 30 min back in the rain to code it myself
for the race condition on the health value? or the temporaryHealth job?
Temporary health job
Replace temporary health with this I'm gonna write out
```csharp
[StructLayout(Explicit)]
public unsafe struct TempHealth : IComponentData
{
    [FieldOffset(0)] public int Value;
    public unsafe ref int GetReference()
    {
        fixed(int* ptr = &Value)
        {
            return ref ptr[0];
        }
    }
}
```
gonna try 🙂 give me a min
Replace the .Ptr[0] in the interlocked.add with .GetReference(). Replace the summation section with .Value.
I wrote that out by memory on my phone. I can't recall what class Explicit is in, so please fill that in with autocomplete
thanks, I wanted to get rid of the manual alloc/dispose
the job is not vectorized
Did it at least work?
i think reading and writing to healths[i].health is the problem
i'll try the interlocked now
+= is vectorized. I've done it many times
Wait. .health
The subsection of the struct, ya, that is not vectorized
Isolate the health parameter and the addition will be vectorized.
success!
What? How?
what you said. isolate the health parameter
Ah. I thought you just recompile it again
You can just reinterpret the health array to int and remove the need for the extra .health
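Outside Burst, the "reinterpret the single-field struct array as ints, then add" idea can be sketched in plain .NET with `MemoryMarshal.Cast` and `System.Numerics.Vector<int>`. This is an analogy, not the Burst code path: in Unity the equivalent step would be `NativeArray.Reinterpret` and Burst's auto-vectorizer. `Health` and `HealthMath` are names made up for the example.

```csharp
using System;
using System.Numerics;
using System.Runtime.InteropServices;

// Single-field struct: its memory layout is identical to a plain int.
[StructLayout(LayoutKind.Sequential)]
public struct Health
{
    public int Value;
}

public static class HealthMath
{
    // Adds healths[i].Value into output[i] using SIMD-width chunks.
    public static void AddInto(int[] output, Health[] healths)
    {
        if (output.Length != healths.Length)
            throw new ArgumentException("Arrays must have the same length.");

        // View the struct array as contiguous ints without copying.
        Span<int> values = MemoryMarshal.Cast<Health, int>(healths.AsSpan());

        int width = Vector<int>.Count;
        int i = 0;
        for (; i <= output.Length - width; i += width)
        {
            var sum = new Vector<int>(output, i) + new Vector<int>(values.Slice(i, width));
            sum.CopyTo(output, i);
        }
        // Scalar tail for the elements that do not fill a whole vector.
        for (; i < output.Length; i++)
            output[i] += values[i];
    }
}
```

The cast only works because the data is already packed linearly, which is exactly what the pointer-per-element layout discussed earlier gives up.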
yeah, I saw that you were doing this in your code
would be nice to have a method for vectorized code with more than just 1 value in a struct
I might have some ideas
but first I'm testing the interlocked. you're such a blessing, I actually wanted to rewrite the need for the allocation and then you were just typing it in discord for me 😄
I was hit by inspiration while rain was pouring down my neck. Maybe I should walk outside without my umbrella more often...
Be careful before you catch a cold!
I wanna hear them once ya done testing interlocked
well, not sure if the ideas are any good. i think the memory layout has to be sequential ints here so there's really no circumventing this other than to use single values in the struct
I'm personally looking into NativeSlice. If it could "slice" every odd or even index, I could reinterpret a struct containing 2 ints into an int array, then native slice every even index to obtain the first property / int, using slice with stride to skip every 4 bytes (skipping the odd properties). Then operate on that (assuming it retains pointers to the original array) with vectorized operations and boom, selective vectorization on a struct with multiple properties.
Or I could just set the for loop index increment to +2 instead of ++. I dont know if that is vectorizable.
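The "+2 instead of ++" variant is easy to try in plain .NET. The sketch below reinterprets a two-int struct array as ints and walks every second element; it shows that the view writes through to the original array, but makes no claim about whether Burst or the JIT will vectorize a strided loop. `Stats` and `StridedAccess` are invented names.

```csharp
using System;
using System.Runtime.InteropServices;

// Two ints per element: Health sits at even int indices, Armor at odd ones.
[StructLayout(LayoutKind.Sequential)]
public struct Stats
{
    public int Health;
    public int Armor;
}

public static class StridedAccess
{
    // Sums only the Health fields by stepping over the Armor fields.
    public static int SumHealth(Stats[] stats)
    {
        Span<int> ints = MemoryMarshal.Cast<Stats, int>(stats.AsSpan());
        int total = 0;
        for (int i = 0; i < ints.Length; i += 2)
            total += ints[i];
        return total;
    }

    // Writes through the reinterpreted view, so the original array changes.
    public static void AddToHealth(Stats[] stats, int amount)
    {
        Span<int> ints = MemoryMarshal.Cast<Stats, int>(stats.AsSpan());
        for (int i = 0; i < ints.Length; i += 2)
            ints[i] += amount;
    }
}
```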
hm, interesting, the health value doesn't get updated anymore
Interlocked.Add(ref Health_WriteLookup[destination].GetHealthReference(), amount);
Nooo, fuck
I thought I was onto something. Damn
I'm gonna try it myself but yea, I doubt its gonna work if that doesnt
This is going beyond what I've done before, can you remove the fixed() and just return ref value?
I really doubt that will pass the compiler
doesn't compile. structs can't return this or members by reference
Shame. I'll need to dig into stack alloc to bake in the fixed and I might be able to ref return the [0] index of a stack alloc int[0].
Well, stack alloc needs disposal so that won't remove the Dispose() requirement. Damn
Remove the ref return. Just return the Ptr itself then use .Ptr[0] in the interlocked.
Wait no. It doesnt work like that...
I need to do some research tomorrow to see how I can get a returnable pointer...
Or dig into CDFE to see if I can hook into the entities list before they copy the data for reading
ok, the problem is that CDFE doesn't return the ref. it would work otherwise I think
Yea
It returns a copy. I just remembered. Which is why I had that pointer manually allocated hoops to jump through at creation
You could probably write a wrapper that does this but remove the CopyPtrToStructure
you'll probably need to edit the source directly though
lol just wanted to post that code 😄 exactly. I'll look into exposing this. not having refs on CDFE pains me for a while
since EntityComponentStore is private.
I've been using DreamingImLatios exposed package for a while https://github.com/Dreaming381/Latios-Framework/tree/v0.4.2/EntitiesExposed It can be added there. He already uses the EntityComponentStore
cool didn't know that existed
I had some trouble with his package version so here's my own
i might use the Myri package for audio 👂
@robust scaffold success! I've exposed CDFE and the GetReference is working as expected!
here's the updated version
and don't forget using Unity.Entities.Exposed
off to bed, have a good one o/
slight modification:
```csharp
/// <summary>
/// Returns a read/write reference to the entity's component.
/// </summary>
/// <param name="entity">The target entity being accessed.</param>
/// <returns>Direct reference to component on entity for modification.</returns>
public ref T GetReference(Entity entity)
{
#if ENABLE_UNITY_COLLECTIONS_CHECKS
    AtomicSafetyHandle.CheckWriteAndThrow(m_Safety);
#endif
    m_EntityComponentStore->AssertEntityHasComponent(entity, m_TypeIndex);
    CheckComponentIsZeroSized();
    void* ptr = m_EntityComponentStore->GetComponentDataWithTypeRW(entity, m_TypeIndex, m_GlobalSystemVersion, ref m_Cache);
    return ref UnsafeUtility.AsRef<T>(ptr);
}
```
Changes GetComponentData to Read/Write access
And merged in the AsRef<>() into the return function so unsafe is not needed in the SystemBase code
And a bit of documentation doesnt hurt.
hey o/ yeah, we will need to cleanup some stuff. what i posted is a proof of concept
Yep, it works. And it's proven to work. Now if unity could actually implement it...
sucks they are so reluctant to add it. having refs/pointers is so powerful. it feels like best of both worlds for c# and c
They pass around NativeArray's GetUnsafePointer() so they have the power and willingness to do it
there are so many instances where you get a struct copy and then have to set it again when you just change one value. there's no sense in it
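The copy-then-set-again complaint, and what a ref return changes, can be shown with a tiny plain C# stand-in. `ComponentStore` and `Unit` are hypothetical; the indexer mimics the copy semantics of `ComponentDataFromEntity`, while `GetReference` mimics the exposed version.

```csharp
using System.Threading;

public struct Unit
{
    public int Health;
    public int Armor;
}

// Hypothetical stand-in for a component lookup backed by contiguous storage.
public sealed class ComponentStore
{
    private readonly Unit[] _data;

    public ComponentStore(int count) { _data = new Unit[count]; }

    // Copy semantics: callers get a copy and must write the whole struct back.
    // (store[0].Health = 5 does not even compile, because it would edit a temporary.)
    public Unit this[int index]
    {
        get => _data[index];
        set => _data[index] = value;
    }

    // Reference semantics: callers edit the stored struct in place.
    public ref Unit GetReference(int index) => ref _data[index];
}

public static class ComponentStoreDemo
{
    public static int Run()
    {
        var store = new ComponentStore(1);

        // Copy, change one field, set it again.
        Unit copy = store[0];
        copy.Health = 10;
        store[0] = copy;

        // Same change in place, and the ref also works with Interlocked.
        store.GetReference(0).Armor = 3;
        Interlocked.Add(ref store.GetReference(0).Health, 5);

        return store[0].Health + store[0].Armor; // 15 + 3
    }
}
```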
i thought that's the whole point of NativeContainers!
otherwise they have all the power and the dev is scrambling around
but yeah, if they don't do it. we have the capability to make it work regardless
when all is cleaned up, i'll post it in the forums. guess some will be interested in it
and maybe they have already targeted it for the next entities release
one year has now passed by since 0.17 was released internally
kind of crazy because I'm really fucking interested what they have done in all this time. I'm certainly not accusing them of slacking around
```csharp
public static class CustomSystemBaseExtensions
{
    public static ComponentDataFromEntityExposed<T> GetWriteReferenceFromEntity<T>(this ComponentSystemBase systemBase)
        where T : struct, IComponentData
    {
        return systemBase.EntityManager.GetExposedCDFE<T>(false);
    }
}
```
this.GetWriteReferenceFromEntity<Census>();
Unfortunately the "this." is required but it's close enough to default system base method calling
var t1 = this.GetWriteReferenceFromEntity<Census>();
ref var t2 = ref t1.GetReference(Entity.Null);
I'm guessing that the next Entities release will be a full blown 1.0.0-pre.1
It's what, 0.21 in the manifest leaks or was it bumped to 0.23?
It's been at least 2 full releases internally since they went offline
yeah, it's still so weird that they stopped releasing. apparently they run on a completely different unity version
yea, 2020.3-DOTS
All the fancy new toys like UI toolkit even still targets 2020LTS even if 2021LTS is out.
i thought using 2020.3 would mean they release this unity version. but they never did. might be a big bug fest incoming
Normally 2020 would be on life support, critical bugs but otherwise pretty much finalized, but the "cutting edge" features are still getting backported
no never. it's far too soon for that. there's still gonna be loads of changes before we're anywhere near 1.0.0
All the burst major features are still min 2020.2 while burst itself is 2019.1
Transport is headed out for shipping to public release soon and the documentation is even more sparse than regular Entities
joachim said next release will be soon some days ago. i hope this means this year 🙂
transport is for dots AND gameobjects though (if i remember correctly) anyway, that is NETWORKING that is not that hard to finalize. because it contains no actual logic
ha, not happening. Probably early 2022, maybe Feb / March.
nah, they said: at the end of this year at the earliest. so it's possible
transport has no dependency to entities, so the release means little for it
I guess the data in it really is generic
still a good thing because overall netcode relies on it and that's already heavily delayed
"com.unity.entities": "0.20.0-preview.26". DOTS still on 0.21 (0.20 internally). From the training packages updated a month ago
can you reference that in a project or where did you get this from?
i'm interested
Hmm can't you use something like this so you don't have to expose a different ComponentDataFromEntity struct?
https://hatebin.com/tmdbhonipp
I think I'm gonna add that to my AsRef extensions lol
Seems a bit more logical than our implementation where we just wholesale copied the EntityManager implementation to get access to the component data storage
Anyways, i got yet more classes, be back in.... 2 or so hours. Maybe I can code during class?
Good thinking to just typecast it. I'm sure we can improve this a lot
Hrm, I don't have access to EntityComponentStore without assembly modification
Ye you need to put it into the asmref
Yep, definitely a lot simpler to use.
Hii. I'm learning DOTS, and.... I'd appreciate any help xD. I'm very used to code with "event Action<>" because it helps to keep different functionalities decoupled.
Are there any kind of events in DOTS? Or would I have to use something like tag components (without data), along with entity queries constantly searching for an entity with that component, to "simulate" an event? What other options are there?
no. there are no events in dots. and it's unlikely that something similar will come to dots. if you truly do need events, here's a video where this is implemented. https://www.youtube.com/watch?v=fkJ-7pqnRGo
There's a near daily thread on event implementation in DOTS. Because there is no "official" way.
Awesome, thank you
Wow, that's crazy
keep in mind that events are anti data-oriented-design
Aah... so that means that I'm still thinking "the old way"
Yea, ideally you should somehow design a different way other than events
yes
keep in mind, everything you want to solve with creating or writing more data is also bad 😉
the best case is, do the task where it occurs
there's a lot of lazy patterns going around handling this and none are any good for performance. if you don't care about it, fine, although i wonder why dots then
exactly
a good rule of thumb: the best speed-ups come from burst code, vectorization and good data layout
and most things are doable that way anyways
and the ones that aren't.......may be in the future
No idea what you mean with vectorization
vectorization is when your cpu does e.g. 4 copy operations at once with a single SIMD instruction, instead of 1 at a time
Aaah.. ok... where could I read more about that?
I think I need to see concrete examples to understand it better
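For a concrete picture, here is a minimal sketch in plain C# using `System.Numerics.Vector<float>`. It is not Burst code and the names (`VectorizationSketch`, `AddScalar`, `AddVectorized`) are made up for illustration. Burst normally does this transformation for you when a loop is simple enough; the sketch only spells out by hand what "several operations at once" means.

```csharp
using System;
using System.Numerics;

// Hypothetical illustration, not part of Unity or Burst.
public static class VectorizationSketch
{
    // Scalar version: one addition per loop iteration.
    public static void AddScalar(float[] a, float[] b, float[] result)
    {
        for (int i = 0; i < a.Length; i++)
            result[i] = a[i] + b[i];
    }

    // Vectorized version: Vector<float>.Count additions per loop iteration.
    public static void AddVectorized(float[] a, float[] b, float[] result)
    {
        int width = Vector<float>.Count; // hardware dependent, typically 4 or 8
        int i = 0;

        // Main loop: load a full vector from each input, add, store.
        for (; i <= a.Length - width; i += width)
        {
            var va = new Vector<float>(a, i);
            var vb = new Vector<float>(b, i);
            (va + vb).CopyTo(result, i);
        }

        // Scalar tail for the elements that do not fill a whole vector.
        for (; i < a.Length; i++)
            result[i] = a[i] + b[i];
    }
}
```

Both methods produce the same output. The second one just needs far fewer loop iterations, which is where the speed-up comes from when the data is laid out contiguously.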
Thank you!
I have so many questions lol
For example, no idea how to read this:
.LBB0_4:
vmovups ymm0, ymmword ptr [rcx - 96]
vmovups ymm1, ymmword ptr [rcx - 64]
vmovups ymm2, ymmword ptr [rcx - 32]
vmovups ymm3, ymmword ptr [rcx]
vmovups ymmword ptr [rdx - 96], ymm0
vmovups ymmword ptr [rdx - 64], ymm1
vmovups ymmword ptr [rdx - 32], ymm2
vmovups ymmword ptr [rdx], ymm3
sub rdx, -128
sub rcx, -128
add rsi, -32
jne .LBB0_4
test r10d, r10d
je .LBB0_8
Looks so cryptic
that's assembly language
assembly is pretty easy to learn. you don't really need to learn it but if you want to know why exactly something doesn't run optimally, it can be useful.
but you don't really need to learn it for this documentation since they describe the important parts below the code snippets
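As a rough reading aid, the listing above looks like an unrolled vectorized copy loop: four 32-byte loads from the address in `rcx`, four 32-byte stores to the address in `rdx`, both pointers advanced by 128 bytes, and a counter in `rsi` decreased by 32. The sketch below is a plain C# approximation of that behaviour, assuming 4-byte elements such as `float` (128 bytes / 32 elements). The class and method names are hypothetical, and the tail loop is only a guess at what follows the `test r10d, r10d` check.

```csharp
using System;

// Hypothetical C# approximation of the assembly loop, for reading purposes only.
public static class UnrolledCopySketch
{
    public static void Copy(float[] src, float[] dst, int count)
    {
        const int lanes = 8;                // one ymm register holds 8 floats (256 bits)
        const int perIteration = 4 * lanes; // four registers per pass = 32 floats = 128 bytes

        int mainCount = count - (count % perIteration);
        int i = 0;

        // Corresponds to the body between .LBB0_4 and "jne .LBB0_4".
        for (; i < mainCount; i += perIteration)
        {
            // Each Array.Copy stands in for one vmovups load plus one vmovups store.
            Array.Copy(src, i,             dst, i,             lanes); // ymm0
            Array.Copy(src, i + lanes,     dst, i + lanes,     lanes); // ymm1
            Array.Copy(src, i + 2 * lanes, dst, i + 2 * lanes, lanes); // ymm2
            Array.Copy(src, i + 3 * lanes, dst, i + 3 * lanes, lanes); // ymm3
        }

        // Assumed remainder handling: copy leftover elements one at a time.
        for (; i < count; i++)
            dst[i] = src[i];
    }
}
```

The unrolling (four registers per pass) is why the optimized assembly is longer than a naive loop yet does far fewer iterations.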
assembly is really interesting. there are small pieces of code that look clean, and then there are huge pieces that look worse at first glance but perform way better than the small ones (in their use cases). a rule of thumb for assembly that i find useful is:
if the words are long: it's optimized, if they are short, it's not XD
does anyone know in which country the dots team is working?
why's that important?
because I might move and I'm going through the options 🙂
I think it's pretty spread out
yeah
You have Burbank, CA and I'm pretty sure there's one in Helsinki
Probably some in france & copenhagen 🤷
long good, that's what she said
ymmX is a 256-bit register that packs several values (e.g. 8 floats). ymmword ptr [...] is a 32-byte chunk of memory at that address. vXXXXX is the vectorized operation on those packed values.
the vXXXXXX is purple and purple is good.
And yea, don't worry about burst optimization right now. First learn to walk before running. Walking in this case is getting a job to be bursted in the first place.
Ok haha, thank you!
Bursting is more than just adding [BurstCompile] to a job. It relies on a restricted subset of C# that Unity calls HPC#, High Performance C#, which feels like an entirely new language "hiding" under C#.
And first step, ditch the lambda Entities.ForEach(). Too restrictive, doesn't vectorize. Use IJobEntityBatch struct based jobs.
Whaaat, didn't know that. And where can I read about that, or is it not out yet?
Burst is fully released, has been for... years?, but it's like a completely different coding language. Like C# suddenly wanted to be C++.
Is IJobForEachWithEntity<> the same as the lambda Entities.ForEach()?
That's why I see pointers and stuff
ForEachWithEntity has been deprecated for years. IJobEntityBatch is ForEach equivalent.
Yep. All that stuff we were talking about over the past few days is burst code
So I'm watching "deprecated tutorials". Where do you recommend I learn from?
Entities.ForEach is actually compiled to IJobEntityChunk. IJobEntityBatch is newer and more optimized
So I don't have to bother you guys all the time haha xD
Yeaaa......... yea. That's DOTS. There are no tutorials. Just us here on this discord
Ah, feck. You're the tutorial
We learn by hours of trial, error, and figuring out why it didn't work
And, there are no DOTS developers helping you?
The Unity DOTS team itself has gone deep undercover and is running ultra silent. Nothing constructive from them.
Why do you think that is?
there still is a literal gag order on talking about DOTS for all Unity employees outside the C-suite top level. It's been in development hell for about a year now
Just ask the Burst employees if something is compatible with DOTS and they will not respond, because they are not allowed to.
Hmm. crazy. And do you have a theory of why they're not allowed to?
Development hell. When progress on code remains stagnant because there's no clear vision forward
DOTS has literal billions of dollars being thrown at it, that's a lot of money for investors that may be going nowhere
So no news coming out to panic investors
I don't remember where I read this but apparently half the core Unity company is working purely on DOTS. Around 40 - 50 people. A normal team like the guys working on the entirety of UI Toolkit (the new UI system) is only 4 people.
Wow. Hopefully there is news at the next Unite Now
I'm not too hopeful to learn DOTS after you telling me this though xD
hahaha, my sweet summer child
news? no news.
if they even mention the existence of DOTS
There is supposedly mandatory company wide regular training in DOTS though. Everyone from the people maintaining HDRP to audio have to know DOTS. So it's "alive".
That was from Brian's last public appearance a few months back
Ok, so it's worth it
Crazy that we had this chat on Unity's official Discord 😅
It's worth it ya. How new are you to DOTS? have you made anything with IJobParallel?
I'm super new I'd say, I'm just understanding basic stuff. I've done a lot of stuff with the C# Job System though, but without ECS. And I'm watching all the material that I can find. But there is fundamental stuff that I don't understand, I think
If you shouldn't use "events" in dots. How would you code a button that when the player presses it opens a door? Or any basic example like that
i use entities as events
ie if a button is pushed, ButtonJob spawns an entity with a ButtonPushedComponent that contains data that is then processed/consumed by other systems looking only for the ButtonPushedComponent component
ewww, structural change.
I don't have any events for now but I'm planning on maybe a single singleton containing a buffer with either a function pointer containing the event or an enum mapping to an ultimate event switch statement containing the logic.
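A minimal sketch of that buffer-plus-enum idea, in plain C# without any Unity types. Everything here (`GameEventKind`, `GameEvent`, `EventBufferSketch`) is hypothetical; in an actual DOTS project the list would presumably be a dynamic buffer on a singleton entity and `Drain` would live in a system's update. The point is only that events become plain data that gets appended during the frame and consumed in one pass by a switch, with no structural change.

```csharp
using System.Collections.Generic;

// Hypothetical event kinds; the enum maps to the logic in the switch below.
public enum GameEventKind { OpenDoor, CloseDoor }

// An event is just data: what happened and which door it targets.
public struct GameEvent
{
    public GameEventKind Kind;
    public int DoorIndex;
}

public sealed class EventBufferSketch
{
    private readonly List<GameEvent> _pending = new List<GameEvent>();

    // "Component" data the events act on: one flag per door.
    public readonly bool[] DoorIsOpen;

    public EventBufferSketch(int doorCount)
    {
        DoorIsOpen = new bool[doorCount];
    }

    // Producer side: e.g. called when a button is pressed.
    public void Push(GameEventKind kind, int doorIndex)
    {
        _pending.Add(new GameEvent { Kind = kind, DoorIndex = doorIndex });
    }

    // Consumer side: process every pending event once, then clear the buffer.
    // Returns how many events were handled.
    public int Drain()
    {
        int handled = _pending.Count;
        for (int i = 0; i < _pending.Count; i++)
        {
            GameEvent e = _pending[i];
            switch (e.Kind)
            {
                case GameEventKind.OpenDoor:
                    DoorIsOpen[e.DoorIndex] = true;
                    break;
                case GameEventKind.CloseDoor:
                    DoorIsOpen[e.DoorIndex] = false;
                    break;
            }
        }
        _pending.Clear();
        return handled;
    }
}
```

Compared with spawning one entity per event, this keeps the archetype layout untouched; the trade-off is that every consumer has to know about the shared buffer.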
Also
God, Burst's dev team is on point in the forums. Literally an hour to reproduce an error then respond that they're fixing it. Imagine if DOTS was remotely as responsive.
So the logic wouldn't be run by a System?
Like the logic of "opening the door"
Yep. The system is where all the logic is supposed to be. You don't want any logic inside the components
Also, isn't Burst part of DOTS? Or is it not, because you can also use it with the C# job system?
Ah ok, I was thinking more like a Monobehaviour instead of a Component xD
Nope. Burst is Core Unity. Along with Jobs, Mathematics, and Collections (and soon to be Transport).
Aaaaah right!. makes sense
Components are like the values in an array. Systems are like the logic occurring inside and around a for loop accessing that data array
And to finish that analogy, Entities are the index of the component in that array
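That analogy can be written down almost literally. The sketch below is plain C#, not the Entities API, and all names are made up: two parallel arrays play the role of components, the loop body is the system, and the loop index is the entity.

```csharp
// Hypothetical illustration of the array analogy; not Unity code.
public static class EcsAnalogySketch
{
    // "System": logic that runs over the component arrays.
    // positions and velocities are the "components"; the index is the "entity".
    public static void MoveSystem(float[] positions, float[] velocities, float deltaTime)
    {
        for (int entity = 0; entity < positions.Length; entity++)
        {
            positions[entity] += velocities[entity] * deltaTime;
        }
    }
}
```

For example, with positions `{ 0, 10 }`, velocities `{ 1, -2 }` and a `deltaTime` of `0.5`, one call leaves the positions at `{ 0.5, 9 }`. The components hold no logic at all; they are only the data the loop reads and writes.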
Yeah, I guess my question is... if you make a job, that's part of a system class. But this job is called in response to an event, instead of running every frame....
Does it matter if that job that you're gonna run is part of a system class, or can it be in any class?
Yea, honestly it really depends on the frequency of that event
I do have the singleton entity based event flagging currently:
But they're created and operated on once every 3 or 4 seconds, if not more. That's hundreds of frames where they wouldn't exist.
Now if an event may occur every frame or every 10th frame, yea I'll assign a buffer location on a global singleton entity
Not sure what you mean with "singleton entity", for me singletons are a class with a static self-reference that you can access from any code... And I don't understand an entity as a class, but as a collection of components
Also I feel guilty of asking questions because I feel like I'm stealing your time, maybe you guys should make the tutorials xD
More efficient than answering one by one haha