#Compute buffer


dusky glade
#

Sorry to bother you again, I'm stuck. What should computeBufferStartIndex be when I am creating and removing arrays? I have a list of arrays that is managed by another script. Arrays are added to and removed from that list according to my frustum culling calculation. I am trying to send each array in that list to the buffer using a for loop, but I can't figure out how to determine computeBufferStartIndex every frame.

robust pelican
dusky glade
#

The problem is that the number of arrays is inconsistent. It changes almost every frame based on my cell-based frustum culling system (each cell represents an array here). In this case I have to create a new buffer with a different count and dispose of it every frame when the number of arrays changes. I heard that costs performance. Can you suggest something to avoid creating and disposing every frame? I still can't think of a good solution.
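
The replies to this question did not survive in the log, so the following is only one common way to handle it, not necessarily what was suggested here: allocate the buffer once at a worst-case capacity and pack the visible arrays back to back, so each array's start index is the running total of the lengths written before it. A minimal Python sketch of that bookkeeping follows. `PackedBuffer`, `set_data`, and `pack_visible` are hypothetical stand-ins for illustration, not Unity API.

```python
class PackedBuffer:
    """Stand-in for a GPU buffer that is allocated once at a fixed capacity."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = [None] * capacity
        self.count = 0  # number of valid elements this frame

    def set_data(self, source, start_index):
        """Copy `source` into the buffer beginning at `start_index`."""
        end = start_index + len(source)
        if end > self.capacity:
            raise ValueError("buffer capacity exceeded")
        self.data[start_index:end] = source


def pack_visible(buffer, visible_arrays):
    """Write all visible arrays contiguously and return the total element count."""
    start = 0  # running start index: sum of the lengths written so far
    for arr in visible_arrays:
        buffer.set_data(arr, start)
        start += len(arr)
    buffer.count = start  # elements past this index are stale and must not be drawn
    return start


buf = PackedBuffer(capacity=8)           # created once, reused every frame
total = pack_visible(buf, [[1, 2, 3], [4], [5, 6]])
```

In Unity terms, the running `start` would play the role of computeBufferStartIndex, and the returned total would be the instance count to draw. The buffer itself is never recreated, only partially overwritten.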

robust pelican
dusky glade
robust pelican
dusky glade
#

thank you dude...i guess finally i found a way...let's see if it works properly

dusky glade
#

@robust pelican Now I have two different types of meshes to render, so I need to call DrawMeshIndirect two times, right? As far as I know, I have to pass the position buffer of the instances to the material. Is there any way to draw both types of meshes with a single shader and material? I tried: when I pass only the position buffer of type A to the shader, it works fine. But when I send two different position buffers, one for type A and one for type B, how should I use them to render each mesh with its respective buffer?

robust pelican
dusky glade
robust pelican
#

Sure, just use the same material (or 2 materials using the same shader) for both calls

dusky glade
robust pelican
#

What both buffers..?

#

You only need 1 buffer

#

All it is is positions, right?

#

You pass the positions of the objects you want to render. There's no need to identify them as "grass" or "flower" positions

#

They're all just positions in the end

dusky glade
#

then won't both meshes render at every position?

robust pelican
#

No, you pass only the correct buffer to each drawcall

#
Draw Flowers:
  Bind flowers position buffer
  Draw flowers mesh
Draw Grass:
  Bind grass position buffer
  Draw grass mesh

makes sense?

dusky glade
#

did it explain the problem?

dusky glade
#

@robust pelican is there something i'm doing wrong?

robust pelican
dusky glade
robust pelican
#

Share via paste site.
!code

quiet radishBOT
dusky glade
dusky glade
robust pelican
dusky glade
#

thanks

dusky glade
#

it worked properly

dusky glade
robust pelican
#

Probably no way to do that.
If you're just bothered about duplicating it manually, you could instantiate the material copy via code.

dusky glade
#

it's alright, no problem to duplicate those...

#

@robust pelican Sorry to bother you again. Another thing I want to bring up: I have my frustum culling system in a compute shader, but I also want to eliminate objects that are not in view because they are behind other objects. How do I filter out the objects behind? More clearly, how do I detect which objects are in view (not completely covered by other objects)? Can you help me with this? I searched for a while, but all the articles and forum questions are about doing it on the CPU. How should I do it in the compute shader?

robust pelican
#

Well, you'll basically have to implement occlusion culling on the GPU, which is not as easy imho.
Unity has an occlusion culling system that requires baking the occlusion data in the editor. If you're procedurally generating stuff, you'll need a dynamic occlusion culling system.

I've no clue how to implement one. I'd assume that would require sending a buffer of object bounds to a compute shader and doing some kind of raycasting in it. To be honest, at this point you might be better off writing your own engine as it seems like you're going against everything that unity provides you as a developer.

dusky glade
#

i think, i should not dive that deep:)

dusky glade
#

@robust pelican I was curious: can I use it without the Vulkan API?

#

it's too slow on mobile

#

like, OpenGL ES 3.0 gives 60 fps, whereas switching to Vulkan makes it worse, around 45 fps, without any other changes, according to research I did months ago

#

idk why vulkan api is too slow

#

but with open gles 3.0 I see no grass when playing in mobile

#

when switching to the Vulkan graphics API, I get to see the grass and it works fine, but Vulkan reduces performance a lot, idk why

#

can I somehow get rid of vulkan?

robust pelican
#

The performance drop is probably due to rendering your grass

#

I think OpenGL ES 3.0 just doesn't support instanced indirect drawing (or whatever you were using), so the grass is not rendered at all.

dusky glade
dusky glade
#

I disabled grass

#

still 42-45 fps

#

about 5 months ago I found out opengl es is faster than vulkan

#

so I switched it as default back then

#

but now it needs vulkan

#

the doc says that it can also run in opengl es 3.1

robust pelican
#

Okay. Might want to profile the performance. Typically, the difference between the graphics apis only leads to performance improvements, not the other way around

dusky glade
robust pelican
dusky glade
#

the grass still isn't appearing...

#

i doubt it's something with the code

#

I have also checked 3.2

robust pelican
# dusky glade

It also mentioned only supporting 4 compute buffers at a time, which could be what's limiting the rendering.

dusky glade
#

still not working

dusky glade
dusky glade
#

as I have profiled the main culprit is gfx.waitforpresentongfxthread

#

in opengl it stays around 12ms

#

consistently

#

but in vulkan , it has spikes , sometimes 9ms , sometimes 33ms periodically

#

idk, vulkan was supposed to be better , it seems it's just a hoax

dusky glade
#

I have found out that some phones with Mali or Adreno GPUs show a decrease in performance on the Vulkan API

robust pelican
dusky glade
#

lots of people have submitted this bug , but no improvement actually

robust pelican
#

Preferably both the 9 ms and 33ms

dusky glade
#

I have never profiled gpu module, lemme try , i will inform you the details

#

yeah

dusky glade
#

well, GPU profiling doesn't work with Vulkan

robust pelican
#

Well, then there's not much you can do I guess.

#

If you really want to dive deep, you might be able to profile with the dedicated vulkan profiling tools.

#

Also, might want to have a look at logcat during the gles 3 build. There should be something about your grass draw calls.

dusky glade
#

i wanna bother you more a bit, if you don't mind...

#

is there any way to minimize the performance cost of the SetData function for a compute buffer? Something like jobs, or anything else that you know about?

robust pelican
#

Can you share the profiler data?

dusky glade
dusky glade
#

Right now I am only drawing one type of foliage. There are 7 more, so it looks like it will take about 1 ms or maybe more.

robust pelican
# dusky glade

Does it expand into more calls if you profile with deep profiling?

robust pelican
# dusky glade

Might want to try BeginWrite/EndWrite. The docs say it could be faster.

dusky glade
#

i'll try it, thanks

robust pelican
#

But also maybe consider not updating the whole array (how many elements does it have?) every frame.

dusky glade
#

it changes every frame

#

because the number of grass cells in the visible frustum changes every now and then

robust pelican
dusky glade
#

Right now I am actually sending a 4x4 matrix to both the compute buffer and the shader, as I have a different scale, position and rotation for each grass instance.

dusky glade
robust pelican
robust pelican
robust pelican
#

When a chunk enters the view, that's when you update the buffer.

#

Or exits.

#

Unless these objects change position/scale/rotation every frame?
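
The "update only when a chunk enters or exits" idea can be sketched as a comparison between last frame's visible set and this frame's. This is a minimal Python illustration under the assumption that chunks have stable IDs and static instance data. `ChunkUploader` and its `uploads` counter are hypothetical names, and the counter stands in for the real buffer upload.

```python
class ChunkUploader:
    """Re-uploads instance data only when the set of visible chunks changes."""

    def __init__(self):
        self.visible = frozenset()
        self.uploads = 0  # counts how often an upload would have happened

    def update(self, visible_now):
        """Return True if the buffer had to be refreshed this frame."""
        visible_now = frozenset(visible_now)
        if visible_now == self.visible:
            return False  # same chunks as last frame: keep the buffer as is
        entered = visible_now - self.visible
        exited = self.visible - visible_now
        self.visible = visible_now
        # A real implementation would write `entered` chunks into the buffer
        # and release the ranges used by `exited` chunks here.
        self.uploads += 1
        return True


uploader = ChunkUploader()
results = [uploader.update(ids) for ids in ({1, 2}, {2, 1}, {1, 2}, {2, 3})]
```

Over the four frames in the example, only the first and the last trigger an upload, because the visible set is unchanged in between.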

dusky glade
# robust pelican The choice is not to update all the element every frame. I think I told you befo...

I have cells in my scene. Visible cells are added to a list, and the list is cleared every frame. Then I send the array to the buffer. I tried what you said earlier, but my chunk has about 300k instances. Iterating over all of those every frame has a heavy performance cost even on the GPU. Moreover, I am targeting mobile, so that's why I didn't dive any deeper into it.

#

to be honest, i don't know how to do it more accurately

robust pelican
dusky glade
robust pelican
dusky glade
robust pelican
#

Can't you just render visible chunks and not render the invisible ones?

#

If you're gonna frustum check each grass instance, it's gonna kill the performance regardless of whether you do it on CPU or GPU.

#

Rendering some extra instances that are in a half-culled chunk is not gonna hit the performance as much as what you're trying to do.

dusky glade
robust pelican
#

Isn't that the whole point of using chunks? Otherwise you might as well just have all the grass in one huge array.
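
Culling per chunk rather than per instance comes down to one bounding-box test per chunk. The sketch below shows the usual conservative box-versus-planes check in Python. It assumes each plane is given as `(nx, ny, nz, d)` with the normal pointing into the visible volume, so a point is inside when `n·p + d >= 0`. The function names and the box-shaped test volume are illustrative assumptions, not taken from the thread.

```python
def aabb_outside_plane(box_min, box_max, plane):
    """True if the whole box lies on the negative side of the plane."""
    nx, ny, nz, d = plane
    # Pick the box corner that lies farthest along the plane normal.
    px = box_max[0] if nx >= 0 else box_min[0]
    py = box_max[1] if ny >= 0 else box_min[1]
    pz = box_max[2] if nz >= 0 else box_min[2]
    # If even that corner is behind the plane, the whole box is.
    return nx * px + ny * py + nz * pz + d < 0


def chunk_visible(box_min, box_max, planes):
    """Conservative test: a chunk is culled only if fully outside some plane."""
    return not any(aabb_outside_plane(box_min, box_max, p) for p in planes)


# Stand-in for a frustum: the region 0 <= x, y, z <= 10 described by six planes.
planes = [
    (1, 0, 0, 0), (-1, 0, 0, 10),
    (0, 1, 0, 0), (0, -1, 0, 10),
    (0, 0, 1, 0), (0, 0, -1, 10),
]
fully_outside = chunk_visible((-5, 0, 0), (-1, 1, 1), planes)
straddling = chunk_visible((-1, 0, 0), (1, 1, 1), planes)
```

A chunk that straddles the boundary is kept whole, which is exactly the "some extra instances in a half-culled chunk" trade-off described above.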

robust pelican
dusky glade
dusky glade
robust pelican
dusky glade
#

When I store the instance data in the buffer, where is it actually stored? In memory? I'm asking because I have 64 huge chunks all loaded in a single scene. In total those chunks contain about 8-9 million foliage instances. That much data adds up to a huge size. Will that be alright?

#

@robust pelican what do you think?

robust pelican
robust pelican
dusky glade
#

4x4 matrices

robust pelican
#

Computers are all about bytes and bits. It doesn't care how many flowers you have

#

Well, one matrix is 16 * 4 bytes, assuming it's floats.

#

So 64 bytes per element.

dusky glade
#

yeah

robust pelican
#

How many elements do you have?

dusky glade
#

about 8 million in the whole open world scene

#

not sure about the exact count, the foliage was painted during the level design process... but nearly 8 million

robust pelican
#
  1. You don't need to upload the whole world to the GPU.
  2. Even if you did, that would be around 500 MB I guess. It's not small, but it's not uncommon for games to have buffers of such size.
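
The estimate above can be checked with a few lines of arithmetic. This Python sketch uses the figures quoted in the thread (16 floats of 4 bytes per matrix, about 8 million instances overall, about 300k per chunk); the function name is only for illustration.

```python
FLOATS_PER_MATRIX = 16   # one 4x4 matrix per instance
BYTES_PER_FLOAT = 4      # assuming 32-bit floats
BYTES_PER_INSTANCE = FLOATS_PER_MATRIX * BYTES_PER_FLOAT  # 64 bytes


def buffer_bytes(instance_count):
    """Size in bytes of a buffer holding one matrix per instance."""
    return instance_count * BYTES_PER_INSTANCE


whole_world = buffer_bytes(8_000_000)   # 512,000,000 bytes, about 512 MB
single_chunk = buffer_bytes(300_000)    # 19,200,000 bytes, about 19 MB
whole_world_mib = whole_world / 2**20   # about 488 MiB
```

So the whole world would be roughly 512 MB, while a single chunk of 300k instances is about 19 MB.
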
dusky glade
robust pelican
#

Yeah, I know

#

I never suggested uploading the whole world data to the GPU

dusky glade
#

Now when I move through the scene, sometimes I stand between 4 chunks, sometimes 2, and sometimes on only one. That requires getting the data into the buffer, but each chunk contains up to about 300k elements. Uploading this data to the buffer sometimes causes long spikes, screen freezes and other performance issues. So I thought I could split each chunk further into multiple cells.

dusky glade
#

I think I will just stick with it as it is right now, rendering these is such a heavy load... but you really helped a lot.

#

i guess i will have to bother you again later....don't mind😅

robust pelican
#

Here's something I should have mentioned long time ago:
Almost any kind of optimization is a trade-off between different things. In your case you're trading off memory usage for CPU and GPU time. How much you want to trade off one for the other is up to you. But there is no magical "optimize" button that you press that makes all your worries go away without a price.

This is why I'm very sceptical when people lightly start calling lower-level APIs, like draw calls, in their code. Unity already provides the most optimal rendering workflow for most general cases. There's even GPU instancing on the material and terrain grass.

I bet that when you started this thing you thought that indirect instanced rendering is that magical button that would provide you an ideal solution. But it is not. While it gives you more control over how the rendering is happening, compared to the tools that unity provides by default, it also brings in certain drawbacks.

robust pelican
#

I assume this was before you even started this thread?

dusky glade
#

And I knew that indirect instancing must have some limitations too, but compared to other systems it is just better, as far as I know right now.

dusky glade
# robust pelican Another thing is, you mention long spikes and freezing here, but did you actuall...

Right now, with chunks divided into cells, less data has to be set (around 10k-35k elements). If I don't use cells and use the huge chunks directly, the data will also be huge (around 800k elements when standing between 3 or 4 chunks), causing performance issues, just because I have to repeatedly change the data in the buffer based on player movement. And yeah, that was before this thread. Right now I don't get spikes, just that cost of around 0.8 ms. I think this is the trade-off. I also have to think about memory.

robust pelican
dusky glade
#

Yeah, I thought about that after you said it. But as it is targeted at mobile devices, I think it will be too heavy. There's a lot more in the scene: the buildings, streets and other stuff.

#

@robust pelican Anyway, if you don't mind answering, how long have you been doing game dev? You seem to know a lot about this stuff.