#Marching cube help to reduce game load times / performance improvements


rocky oxide
#

A place for me to post my tests and work on fixing my marching cubes system's load times using Task and async.

#

first i tested with a larger base chunk size and smaller World and LargeChunk sizes, which resulted in an expectedly more expensive Chunk.MarchCube() call, both at load time and when terraforming

#

next i tested with a smaller base chunk and a smaller world chunk, with bigger LargeChunk sizes. This resulted in a much faster load time: even though the number of individual chunks went up from 512 to 16,384, there were a lot fewer Chunk.MarchCube() calls. The recorded data when terraforming showed the same, but with the profiler turned off it felt much choppier to use the smaller chunk sizes with the bigger LargeChunk.

analog leaf
#

Well first of all 27 mb of garbage for a single frame is insane

fair gyro
#

You might want to post relevant code alongside those profiler data.

rocky oxide
#

and the final test, with a larger World size and smaller chunk and LargeChunk sizes, resulted in a much faster load time and was the only one where i could comfortably watch the screen and know what's happening. which is interesting, because it had the highest call count of the three tests and requires the most GameObjects to be spawned, at a whopping 524k

analog leaf
#

!code

pale comet BOT
rocky oxide
#

wrong one :/

rocky oxide
analog leaf
#

Doesn’t matter

rocky oxide
#

think thats it

#

though i guess World Generator if we go in order of first used to last should be first not second

analog leaf
#

It’s kinda hard to analyse this code on my phone. I will try to get back to it when I can on a proper PC.

rocky oxide
#

code's a mess huh, sorry 😅

analog leaf
#

But in the meantime it might be worth your time to explore the job system and Burst. There is only so much you can do in a single thread.

#

Not a mess no. That website does not have syntax highlighting and ofc a phone screen is small.

#

Compared to my two 24 inch monitors that is

rocky oxide
#

been dreading that answer but yeah think i might have to

rocky oxide
#

ugh these damn reference types are making this harder than i had hoped. Edit: seen NativeList, going to use that

#

new one fun :)

fair gyro
#
    void MarchCube(Vector3Int position, int step)
    {
        // A new 8-element array is allocated on every call.
        float[] Cube = new float[8];
rocky oxide
#

ahh but move to caller?

fair gyro
#

Yes, let the caller CreateMeshData allocate just once, and pass the same one to MarchCube, instead of every call of MarchCube needing to allocate one and then throwing it away after it's done.
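
A minimal sketch of that change in plain C#, not the code from the thread. The `Func` density lookup, the `isoLevel` parameter, the corner ordering and the surface-cube count are all assumptions made for illustration; only the names `MarchCube` and `CreateMeshData` come from the discussion. It shows the buffer being allocated once by the caller and reused for every cube.

```csharp
using System;

public static class ChunkSketch
{
    // Corner offsets of a unit cube; the ordering is arbitrary in this sketch.
    static readonly int[,] Corners =
    {
        { 0, 0, 0 }, { 1, 0, 0 }, { 1, 1, 0 }, { 0, 1, 0 },
        { 0, 0, 1 }, { 1, 0, 1 }, { 1, 1, 1 }, { 0, 1, 1 }
    };

    // Samples the 8 corners into the caller-owned buffer and returns the
    // 8-bit case index (bit i is set when corner i is below the iso level).
    public static int MarchCube(Func<int, int, int, float> density,
                                int x, int y, int z, float[] cube, float isoLevel)
    {
        int caseIndex = 0;
        for (int i = 0; i < 8; i++)
        {
            cube[i] = density(x + Corners[i, 0], y + Corners[i, 1], z + Corners[i, 2]);
            if (cube[i] < isoLevel) caseIndex |= 1 << i;
        }
        return caseIndex;
    }

    // Allocates the scratch buffer once, then reuses it for every cube.
    // Returns how many cubes straddle the surface (case index not 0 or 255).
    public static int CreateMeshData(Func<int, int, int, float> density,
                                     int size, float isoLevel)
    {
        float[] cube = new float[8]; // one allocation per chunk, not per cube
        int surfaceCubes = 0;
        for (int x = 0; x < size; x++)
        for (int y = 0; y < size; y++)
        for (int z = 0; z < size; z++)
        {
            int caseIndex = MarchCube(density, x, y, z, cube, isoLevel);
            if (caseIndex != 0 && caseIndex != 255) surfaceCubes++;
        }
        return surfaceCubes;
    }
}
```

With a 4x4x4 chunk this makes one array allocation instead of 64, and the saving grows with the number of cubes per chunk.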

rocky oxide
#

will definitely help since there were thousands of these chunks, thanks

fair gyro
#

I'm not really available to fully digest your code and understand your goal, but I would guess there is probably a lot of low-hanging fruit like this that you can clean up.

#

Additionally, engineering is also a lot about reframing the problem. If A is a hard problem to solve but you don't actually need to do A, instead you can just do B, then that might also solve your issue.

#

Burst and the job system will help, but they're not a silver bullet: rewriting your code for them won't magically make it fast.

rocky oxide
fair gyro
#

Yeah.

rocky oxide
#

though i'm told jobs don't like reference data like the float[]. is there a native array?

fair gyro
#

And this optimization doesn't just apply to managed code. Even when you are writing Burst code you would still want to minimize allocating native arrays, so it still applies there.

rocky oxide
#

oh wait wouldn't that just undo that if i did this, since pushing it to jobs would un-reference it and cause the reallocation, no?

fair gyro
#

Basically point is that, instead of throwing away your existing code and praying a rewrite in Burst will magically fix all your problems, try to understand why your existing code is slow.

rocky oxide
fair gyro
# rocky oxide oh wait wouldn't that just undo that if i did this since when pushing to jobs w...

If you are going to write your code with Burst, you wouldn't be able to use reference types at all. You would instead be allocating native arrays and passing them around, and if you naively "translate" the code you will run into the exact same performance pitfall: instead of allocating just one native array in CreateMeshData, you end up allocating a native array in every call of MarchCube and then deallocating it right afterwards, killing your performance.

rocky oxide
#

so not what i want to do. wonder if there's any way i can pull the whole CreateMeshData into the same function so as to not need to move it about

fair gyro
#

I mean you absolutely can rewrite your existing code into Burst and you will very likely get a decent performance boost, but yes the point is that you still need to understand the problem with your existing code to know what to fix.

rocky oxide
fair gyro
#

If you fix all the issues with your existing code, it might even be fast enough that you don't need a rewrite at all.

fair gyro
#

You move the code in MarchCube directly into the caller.

rocky oxide
#

this is it in jobs and not the first code you saw

fair gyro
#

Oh I meant you can still keep CreateMeshData and MarchCube separate instead of one giant block of code.

rocky oxide
#

nah if i want to have the float[] and use jobs but not reallocate the data i figured putting the whole loop into the MarchCube function would do just that

fair gyro
#

But as I said, you should revert to your old code, then analyze and fix it, before considering a rewrite with Burst and jobs.

#

Otherwise it's a waste of time and effort.

rocky oxide
limpid oyster
#

You should consider making it properly async first. From the look at the profiler screenshot, you're processing 800 chunks in one frame.

limpid oyster
#

The one at the start of the thread.

rocky oxide
limpid oyster
#

What do you mean by "at the same time"?

#

You're currently processing 800. That's what's clear from the screenshot.

rocky oxide
#

sorry intended to run in the same frame

#

i see that

limpid oyster
#

I don't know how many. It depends on how much time each chunk takes.

#

If one takes 16ms to process, then yes.

rocky oxide
#

i mean the test had 524k chunks total i wouldn't want them to run one frame per one chunk

#

especially when running them on the same frame works out to about 1k per second (though i'm not sure how many it would be if it was one per frame)

rocky oxide
limpid oyster
#

Next, I'd look at the profiler again for more clues.

rocky oxide
# limpid oyster Ah, OK, so it does work async.

i mean yeah it does unsynchronize the calls so as to not call it all at once, but i'll need to find a sweet spot or it'll call too little at a time and take a lot of frames, or call way too many and have no frames

rocky oxide
fair gyro
rocky oxide
#

still takes 2 minutes for the 524k chunks, or 25,165,824 cubes (i think it's this many), to be marched :/

fair gyro
#

2 minutes might be a bit long, but 25 million cubes is a lot, especially if (as I understand it) you are actually building meshes for them.

rocky oxide
fair gyro
#

I'm not sure how fast you expect it to be, but consider what I said before: engineering is a lot about reframing problems. Do you need all 500k chunks to be processed right away before the player can play, or do you only need some of them (e.g. the ones where the player spawns in) to be processed right away, while the rest can be processed in the background or on demand as the player moves around?
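
One way to picture "only the ones where the player spawns in first" is to order the work by distance from the player. This is a hedged sketch in plain C#: `ChunkPriority`, `NearestFirst` and the integer chunk coordinates are hypothetical names, not anything from the thread's project.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ChunkPriority
{
    // Lists every chunk coordinate within `radius` chunks of the player
    // (a cube-shaped region), ordered nearest first.
    public static List<(int x, int y, int z)> NearestFirst(
        (int x, int y, int z) player, int radius)
    {
        var coords = new List<(int x, int y, int z)>();
        for (int dx = -radius; dx <= radius; dx++)
        for (int dy = -radius; dy <= radius; dy++)
        for (int dz = -radius; dz <= radius; dz++)
            coords.Add((player.x + dx, player.y + dy, player.z + dz));

        // OrderBy is a stable sort, so equally distant chunks keep scan order.
        return coords.OrderBy(c => SqrDistance(c, player)).ToList();
    }

    // Squared Euclidean distance between two chunk coordinates.
    static int SqrDistance((int x, int y, int z) a, (int x, int y, int z) b)
    {
        int dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
        return dx * dx + dy * dy + dz * dz;
    }
}
```

The chunks at the front of the list can be generated before the player gets control, and the rest can be fed to whatever background or per-frame processing is already in place.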

rocky oxide
fair gyro
#

Instead of drilling on how to make this solution faster, take a step back and consider if you are tackling the wrong problem.

rocky oxide
#

i can try again of course, since the creation speed of the chunks has increased a lot

fair gyro
#

Spawning GOs is pretty slow, yeah, and you cannot move it to a background thread, but you can always do something like "limit spawning to only x chunks per frame."

#

As long as the player doesn't move so fast that "spawning only x chunks per frame" cannot keep up, it's irrelevant if a chunk shows up a few frames later.
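
A rough sketch of the "x chunks per frame" idea in plain C#. `ChunkSpawnQueue`, `Tick` and the integer chunk ids are hypothetical; in Unity, `Tick()` would presumably be called once per `Update` and the `spawn` callback would do the actual GameObject work.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: queues finished chunk ids and spawns at most
// maxPerFrame of them each time Tick() is called.
public sealed class ChunkSpawnQueue
{
    private readonly Queue<int> pending = new Queue<int>();
    private readonly int maxPerFrame;
    private readonly Action<int> spawn;

    public ChunkSpawnQueue(int maxPerFrame, Action<int> spawn)
    {
        this.maxPerFrame = maxPerFrame;
        this.spawn = spawn;
    }

    public int PendingCount => pending.Count;

    public void Enqueue(int chunkId) => pending.Enqueue(chunkId);

    // Spawns up to maxPerFrame queued chunks and returns how many were spawned.
    public int Tick()
    {
        int spawned = 0;
        while (spawned < maxPerFrame && pending.Count > 0)
        {
            spawn(pending.Dequeue());
            spawned++;
        }
        return spawned;
    }
}
```

The budget here is a fixed count, which is the simplest form. A time budget per frame would adapt better to chunks that vary in cost, at the price of less predictable behaviour.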

rocky oxide
#

honestly i can get it up to spawning 10-ish thousand a second, and the current biggest drawback is the await Task, cause it limits what's happening, but it's meant to

rocky oxide
#

also as far as i understand what this means

#

pretty sure it's the async holding the generation back but i'm not sure

limpid oyster
restive breach
#

I would argue for implementing an ObjectPool for the chunks. The player wouldn't be able to see all the chunks all the time, so you could just keep an area around the player, store the chunk data for all chunks, and recycle GameObjects to show the data of the chunks in that area

#

if you do that, new chunks won't allocate new GameObjects, so you avoid relying on the main thread for Instantiate

#

(would also have the side effect of not needing a gameobject per chunk)
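
A minimal pooling sketch in plain C#, assuming nothing about the thread's chunk classes. `ChunkViewPool`, `Rent` and `Return` are made-up names for illustration; `T` stands in for whatever object displays a chunk.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical generic pool: hands out recycled instances when available
// and only calls the factory when the free list is empty.
public sealed class ChunkViewPool<T> where T : class
{
    private readonly Stack<T> free = new Stack<T>();
    private readonly Func<T> create;

    // How many instances the factory has actually created so far.
    public int CreatedCount { get; private set; }

    public ChunkViewPool(Func<T> create)
    {
        this.create = create;
    }

    public T Rent()
    {
        if (free.Count > 0) return free.Pop();
        CreatedCount++;
        return create();
    }

    // The caller is responsible for resetting the view before reuse.
    public void Return(T view) => free.Push(view);
}
```

When a chunk leaves the area around the player its view goes back with `Return`, and the chunk entering on the other side picks it up with `Rent`, so the number of live objects stays close to the number of visible chunks.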

rocky oxide
#

wonder what'll happen if i do the opposite and add another Chunk size so it goes WorldChunk to LargeChunk to SmallChunk to Chunk 🤔

rocky oxide
#

nope, also way too laggy to be practical. guess the 2 chunk system was the sweet spot

rocky oxide
limpid oyster
fair gyro
#

Complete is a bit of a confusing name: the job would still complete even without you calling it, just some time later. It would be better named BlockUntilCompletion or something.

analog leaf
#

Or CompleteRightFuckingHereRightFuckingNow()

#

Also I noticed you do not have BurstCompile on your job

rocky oxide
#

I can try though

rocky oxide
#

Nope, throws the error "The previously scheduled job writes to NativeList vertices. You must call JobHandle.Complete before you can read data from the NativeList"

#

Unless you want me to somehow call all of BuildMesh(); and BuildUV(); inside the job

#

And yeah mesh is not a value type so i can't

limpid oyster
rocky oxide
limpid oyster
# rocky oxide do you know of a way to have a job call a function after it's done? so as to let...

I don't think there are any callbacks that the job calls. You'll need to check the handle in Update, async or a coroutine to see if it's complete.
That being said, it's kind of the wrong approach to using jobs.
The whole point of jobs is that you can schedule them somewhere at the start of the frame, forget about them and do some other work on the main thread, and lastly collect the results later in the frame, when they are likely complete.
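
The schedule-then-collect-later pattern can be sketched without Unity. In this sketch `Task<T>` is a stand-in for a scheduled job; it is not Unity's `JobHandle`, and `DeferredMeshWork` and `TryCollect` are invented names. The point is only the shape: start the work, keep the main loop running, and poll without blocking.

```csharp
using System;
using System.Threading.Tasks;

// Stand-in sketch: a Task<T> plays the role of a scheduled job here.
public sealed class DeferredMeshWork<T>
{
    private readonly Task<T> task;

    // Starts the work on a thread-pool thread immediately.
    public DeferredMeshWork(Func<T> work)
    {
        task = Task.Run(work);
    }

    // Non-blocking check: returns false until the background work is done.
    // If the work threw, reading Result rethrows as an AggregateException.
    public bool TryCollect(out T result)
    {
        if (!task.IsCompleted)
        {
            result = default(T);
            return false;
        }
        result = task.Result; // already finished, so this does not block
        return true;
    }
}
```

A per-frame update would call `TryCollect` once and carry on with other work when it returns false, then build the mesh on the main thread on the frame where it returns true.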

rocky oxide
#

sorry i mean how would i implement it

limpid oyster
# rocky oxide sorry i mean how would i implement it

Well, you'll need your whole thing to be async (or a coroutine) basically, including the method that schedules and waits for the job. Then after scheduling, loop in a while loop with a yield for some time, until the handle's IsCompleted (or something like that) returns true.

#

Though it's not ideal, since I think jobs are meant to be completed within a frame, while async/coroutines don't have that many "resume points" in the frame.

rocky oxide
limpid oyster
rocky oxide
#

after changing it a bit it isn't throwing an error, but it's also not the fastest: ~6fps

#

but it is spawning 2.5k per frame i think

#

though it also doesn't have lag spikes, the framerate stays consistent

limpid oyster
rocky oxide
#

fair, eating currently so it'll take me a bit

analog leaf