#Compute buffer
sorry to bother you again, i'm stuck...what should computeBufferStartIndex be when i am creating and removing arrays? i have a list of arrays which is managed by another script. in that list, some arrays are removed and some are added according to my frustum culling calculation. here i am trying to send each array from that list to the buffer using a for loop, and i can't figure out how to determine computeBufferStartIndex every frame.
Let's assume you always want to send 15 values, where each array has 3 values and there are 5 arrays. Then the first array would be written to index 0 of the buffer, the second array to index 3, the third to index 6, etc.
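A minimal sketch of that indexing rule, assuming the arrays are simply packed back to back: each start index is the sum of the lengths of all arrays before it. The class and method names here are made up for illustration and are not part of any Unity API.

```cs
using System;
using System.Collections.Generic;

public static class BufferOffsets
{
    // Returns the start index of each array when they are packed back to back.
    public static int[] StartIndices(IReadOnlyList<float[]> arrays)
    {
        var starts = new int[arrays.Count];
        int next = 0;
        for (int i = 0; i < arrays.Count; i++)
        {
            starts[i] = next;          // this array begins where the previous one ended
            next += arrays[i].Length;  // works for equal or unequal lengths
        }
        return starts;
    }

    public static void Main()
    {
        var arrays = new List<float[]>();
        for (int i = 0; i < 5; i++) arrays.Add(new float[3]);
        Console.WriteLine(string.Join(", ", StartIndices(arrays))); // 0, 3, 6, 9, 12
    }
}
```

Each value in the result is what would be passed as computeBufferStartIndex for the matching array.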
the problem is that the number of arrays is inconsistent, it changes about every frame based on my cell based frustum culling system (each cell represents an array here). in this case i have to create a new buffer with a different count and dispose it every frame when my number of arrays changes. i heard that costs performance. can you suggest something to avoid creating and disposing every frame? i still can't think of any good solution
Well, you can't have a resizable buffer on GPU. You'll have to recreate it every time, which is bad.
In this situation, you'd usually estimate the max possible size of the buffer and make it that big. Then you can also use another small buffer to tell which index range is valid in the big buffer.
i have a large space with about 50 million grass instance positions...looping through all of them to see if they are inside the frustum, even on the gpu, is extremely costly. that's why i made the cell based system...but now i can't find any way to unite these fragments...
Yes, I understand that from your previous explanation. The advice doesn't change: use SetData to write the data of the visible cells into your big buffer.
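A rough sketch of the "one big buffer" idea, assuming a CPU-side staging array that is allocated once at the estimated maximum size. The class name and the overflow policy (drop cells that don't fit) are assumptions for illustration. In Unity the upload step would be a single call to the ComputeBuffer.SetData overload that takes a managed start index, a compute buffer start index and a count.

```cs
using System;
using System.Collections.Generic;

public sealed class VisibleInstancePacker
{
    private readonly float[] staging;   // allocated once, never resized

    public float[] Staging => staging;
    public int UsedCount { get; private set; }  // how much of the buffer is valid this frame

    public VisibleInstancePacker(int maxElements)
    {
        staging = new float[maxElements];
    }

    // Copies the visible cells back to back and records the valid element count.
    public void Pack(IEnumerable<float[]> visibleCells)
    {
        int next = 0;
        foreach (var cell in visibleCells)
        {
            if (next + cell.Length > staging.Length) break; // over budget: drop the rest
            Array.Copy(cell, 0, staging, next, cell.Length);
            next += cell.Length;
        }
        UsedCount = next;
        // In Unity (assumption about surrounding code): bigBuffer.SetData(staging, 0, 0, UsedCount);
    }
}
```

Only UsedCount changes from frame to frame, so nothing has to be created or disposed while the set of visible cells changes.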
thank you dude...i guess finally i found a way...let's see if it works properly
@robust pelican now i have two different types of meshes to render, so i need to call DrawMeshIndirect two times, right? as far as i know, i have to pass the position buffer of the instances to the material. is there any way to draw both types of meshes with a single shader and material? i tried: when i pass the position buffer of type A to the shader, it works fine, but when i send two different position buffers for the type A and type B meshes, how should i use both buffers so that each mesh renders with its own buffer?
Yes, you'll need to call it twice. If you want to render them in one drawcall, you'll need to combine the meshes. Basically what different batching techniques do.
well, i'll call it twice, but can i use only one shader to render both the meshes with two different buffers of positions?
Sure, just use the same material (or 2 materials using the same shader) for both calls
how do i use both the buffers in the shader then? here is my simple unlit shader to test
What both buffers..?
You only need 1 buffer
All it is is positions, right?
You pass the positions of the objects you want to render. There's no need to identify them as "grass" or "flower" positions
They're all just positions in the end
then won't both meshes render at every position?
No, you pass only the correct buffer to each drawcall
Draw Flowers:
Bind flowers position buffer
Draw flowers mesh
Draw Grass:
Bind grass position buffer
Draw grass mesh
makes sense?
yeah, it totally does, let me try to implement this first, sometimes things get really confusing..
@robust pelican if you have enough time, please look at this...
did it explain the problem?
@robust pelican is there something i'm doing wrong?
Well, share the relevant code.
here it is
Share via paste site.
!code
📃 Large Code Blocks
Use links to services like:
https://gdl.space/, https://paste.ofcode.org/, https://hatebin.com/, https://paste.myst.rs/, https://hastebin.com/
📃 Inline Code
Surround code with three backquotes. Not quotation marks.
To format as C#, add cs to the first line:
```cs
// Your code here
```
Add a comment with a line number if there is an error message.
here
Try duplicating the material and using it for one of the scripts.
thanks
it worked properly
thank you so much.....anyway, i was wondering if there is any way to do this whole thing without duplicating the material...can you advise me on that?
Probably no way to do that.
If you're just bothered about duplicating it manually, you could instantiate the material copy via code.
it's alright, no problem to duplicate those...
@robust pelican sorry to bother you again. another thing i want to bring up: i have my frustum culling system in a compute shader, but i also want to eliminate the objects which are not in view because they are behind other objects. how do i filter out the objects behind? more clearly, how do i detect which objects are in view (not completely covered by other objects)? i searched for a while, but all the articles and forum questions are about doing it on the cpu. how should i do it in the compute shader?
Well, you'll basically have to implement occlusion culling on the GPU, which is not as easy imho.
Unity has an occlusion culling system that requires baking the occlusion data in the editor. If you're procedurally generating stuff, you'll need a dynamic occlusion culling system.
I've no clue how to implement one. I'd assume that would require sending a buffer of object bounds to a compute shader and doing some kind of raycasting in it. To be honest, at this point you might be better off writing your own engine as it seems like you're going against everything that unity provides you as a developer.
i think, i should not dive that deep:)
@robust pelican I was curious, can I use it without the vulkan api?
it's too slow on mobile
like opengl es 3.0 gives 60 fps, while switching to vulkan makes it worse, around 45 fps, without any other changes, according to the testing i did months ago
idk why vulkan api is too slow
but with open gles 3.0 I see no grass when playing in mobile
when switching to vulkan graphics api , I get to see the grass, works fine, but vulkan reduce a lot of performance idk why
can I somehow get rid of vulkan?
The performance drop is probably due to rendering your grass
I think OpenGL ES 3.0 just doesn't support instanced indirect drawing (or whatever you were using), so the grass is not rendered at all.
This function only works on platforms that support compute shaders.
https://docs.unity3d.com/ScriptReference/Graphics.DrawMeshInstancedIndirect.html
no, I tried
I disabled grass
still 42-45 fps
about 5 months ago I found out opengl es is faster than vulkan
so I switched it as default back then
but now it needs vulkan
the doc says that it can also run in opengl es 3.1
Okay. Might want to profile the performance. Typically, the difference between the graphics apis only leads to performance improvements, not the other way around
Well, are you building with 3.1 selected? You mentioned 3 earlier.
still grass isnt appearing ...
i doubt it's something with the code
I have also checked 3.2
It also mentions only supporting 4 compute buffers at a time, which could be what's limiting the rendering.
still not working
I will look into it...4 compute buffer limitation
as far as I have profiled, the main culprit is Gfx.WaitForPresentOnGfxThread
in opengl it stays around 12ms
consistently
but in vulkan , it has spikes , sometimes 9ms , sometimes 33ms periodically
idk, vulkan was supposed to be better , it seems it's just a hoax
I have found out that some phones with mali, adreno gpu shows decrease of performance on vulkan api
It's very weird. Assuming it's rendering the same picture, there shouldn't be jumps in execution time.
lots of people have submitted this bug , but no improvement actually
Can you profile with the gpu module?
Preferably both the 9 ms and 33ms
Well, then there's not much you can do I guess.
If you really want to dive deep, you might be able to profile with the dedicated vulkan profiling tools.
Also, might want to have a look at logcat during the gles 3 build. There should be something about your grass draw calls.
alright, things are going fine right now....after i am using unity's later version, it seems things have turned out well...
i wanna bother you more a bit, if you don't mind...
is there any way to minimize the performance cost of SetData function for a compute buffer? anything like job or anything else that you know about
How much CPU time is it taking?
Can you share the profiler data?
about 0.12 ms
right now i am only drawing one type of foliages, there're 7 more, looks like it will take about ~1ms or maybe more
Does it expand into more calls if you profile with deep profiling?
Might want to try BeginWrite/EndWrite. The docs say it could be faster.
i'll try it, thanks
But also maybe consider not updating the whole array (how many elements does it have?) every frame.
about 10k-35k....
it changes every frame
cz the number of cells of grass in visible frustum changes every now and then
Is each element just float3?
right now i am actually sending 4x4 matrix to both compute buffer and shader as i have different scale, pos and rotation for each grass...
now it has to send 16 floats per element....maybe this is why it is much slower...but do i have any choice other than that?
64 bytes per element, so roughly 0.6-2.2 MB for 10k-35k elements. I'm not sure reducing that is going to make a huge difference. Try the BeginWrite thing first.
The choice is not to update all the elements every frame. I think I told you before to only send the changed data.
alright
When a chunk enters the view, that's when you update the buffer.
Or exits.
Unless these objects change position/scale/rotation every frame?
i have cells in my scene, visible cells are added into a list and the list is cleared every frame...then i send the array to the buffer....i tried what you said earlier...but in my chunk, i have about 300k instances....to iterate over all those every frame, even the gpu shows some heavy performance cost...moreover, i am targeting mobile platform...so that's why i didn't dive any deep into that matter
to be honest, i don't know how to do it more accurately
Why do you need to iterate 300k instances? Only iterate those that are actually in the view.
on the gpu, i have to find the visible instances...then append and then send the append count to the material
You can just pass ranges of indices that are actually visible. And iterate them.
to find the visible instances, i have to iterate through all the instance positions....and what i am using now lifts a heavy load off the gpu, since the gpu only has to iterate over about 10k-35k positions to find the positions inside the frustum...
Why though?
Can't you just render visible chunks and not render the invisible ones?
If you're gonna frustum check each grass instance, it's gonna kill the performance regardless of whether you do it on the CPU or the GPU.
Rendering some extra instances that are in a half-culled chunk is not gonna hit the performance as much as what you're trying to do.
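A sketch of what testing whole chunks instead of single instances could look like, assuming each chunk has an axis-aligned bounding box and the frustum is given as planes whose normals point inward. It uses System.Numerics so it runs outside Unity; the names are illustrative. The test is conservative: a chunk that straddles a plane counts as visible, which matches the point about tolerating a few extra instances. On the Unity CPU side, GeometryUtility.TestPlanesAABB covers the same kind of check.

```cs
using System;
using System.Numerics;

public static class ChunkCulling
{
    // True if the box (center plus half-size extents) is at least partly on the
    // inner side of every plane. Plane normals are assumed to point into the frustum.
    public static bool IsBoxVisible(Plane[] planes, Vector3 center, Vector3 extents)
    {
        foreach (var p in planes)
        {
            // Signed distance from the box center to the plane.
            float distance = Vector3.Dot(p.Normal, center) + p.D;
            // Projected half-size of the box along the plane normal.
            float radius = extents.X * Math.Abs(p.Normal.X)
                         + extents.Y * Math.Abs(p.Normal.Y)
                         + extents.Z * Math.Abs(p.Normal.Z);
            if (distance + radius < 0f) return false; // fully behind this plane
        }
        return true;
    }
}
```

With this, the per-frame work scales with the number of chunks or cells rather than with the number of grass instances.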
i didn't think it like that...but how do i do that range thing?
Isn't that the whole point of using chunks? Otherwise you might as well just have all the grass in one huge array.
Same as what you're doing now. Pass the ranges in a structured buffer.
i should just try it now, thank you so much, i will share the result with you
can you please clarify this a bit more? i don't understand how should i pass the ranges...should i just again gather the positions from the chunks and set the data whenever i enter or exit a chunk?
As you put each chunk's data in the buffer, you should record the start index and size of that chunk's data. Then add this to another buffer with indices.
struct ChunkIndices
{
int start;
int count;
}
Then you iterate this buffer on the GPU, and for each entry you iterate the given index range in the big buffer with the data.
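A small CPU-side sketch of that range scheme, assuming the chunk data is packed once and only the list of visible ranges changes afterwards. ChunkIndices mirrors the struct above; Pack and Resolve are hypothetical helper names. Resolve does on the CPU what one GPU thread would do: turn a flat "n-th visible instance" number into an index in the big buffer.

```cs
using System;
using System.Collections.Generic;

public struct ChunkIndices
{
    public int start;
    public int count;
}

public static class ChunkRanges
{
    // Packs chunks back to back into bigBuffer and records where each one landed.
    public static ChunkIndices[] Pack(IReadOnlyList<float[]> chunks, float[] bigBuffer)
    {
        var ranges = new ChunkIndices[chunks.Count];
        int next = 0;
        for (int i = 0; i < chunks.Count; i++)
        {
            Array.Copy(chunks[i], 0, bigBuffer, next, chunks[i].Length);
            ranges[i] = new ChunkIndices { start = next, count = chunks[i].Length };
            next += chunks[i].Length;
        }
        return ranges;
    }

    // Maps the n-th visible instance to its index in the big buffer, or -1 if out of range.
    public static int Resolve(IReadOnlyList<ChunkIndices> visible, int flatIndex)
    {
        foreach (var r in visible)
        {
            if (flatIndex < r.count) return r.start + flatIndex;
            flatIndex -= r.count; // skip this range and keep looking
        }
        return -1;
    }
}
```

When a chunk enters or leaves the view, only the short list of visible ranges has to be uploaded again, not the instance data itself.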
when i store the instances data in the buffer, where is it stored actually? in memory? i'm asking cz i have 64 huge chunks all loaded in one single scene....all those chunks in total contains about 8-9 million foliages....these much data makes a huge sized data....will it be alright then?
@robust pelican what do you think?
The buffer is obviously stored in memory. It's normally stored in VRAM, but when you upload or modify it, a copy of it is stored in RAM as well, though the specifics depend on the engine, drivers, hardware and platform.
What exactly does the buffer contain? What is a foliage in this context?
grass, flowers, tiny plants etc
There's no such data type as grass, flowers and tiny plants. What kind of data does it contain?
4x4 matrices
Computers are all about bytes and bits. It doesn't care how many flowers you have
Well, one matrix is 16 * 4 bytes, assuming it's floats.
So 64 bytes per element.
yeah
How many elements do you have?
about 8 millions in the whole open world scene
not sure about the count, those foliages were painted during the level designing process...but nearly 8 millions
- You don't need to upload the whole world to the GPU.
- Even if you did, that would be around 500 MB I guess (8 million × 64 bytes). It's not small, but it's not uncommon for games to have buffers of that size.
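The size estimates in this thread are just element count times 64 bytes. A tiny sketch of that arithmetic, with the element counts taken from the messages above and the helper name made up for illustration:

```cs
using System;

public static class BufferSize
{
    // One 4x4 float matrix: 16 floats of 4 bytes each.
    public const int BytesPerMatrix = 16 * sizeof(float); // 64

    public static long Bytes(long elementCount) => elementCount * BytesPerMatrix;

    public static void Main()
    {
        Console.WriteLine(Bytes(8_000_000)); // 512000000, the whole world
        Console.WriteLine(Bytes(10_000));    // 640000, a light frame
        Console.WriteLine(Bytes(35_000));    // 2240000, a heavy frame
    }
}
```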
that's why i divided the scene into 64 chunks to split things up
now when i move in the scene, sometimes i stand between 4 chunks, sometimes 2, and sometimes only on one....that requires getting the data into the buffer...but each chunk contains almost 300k elements or fewer...uploading this data to the buffer sometimes caused long spikes, screen freezes and other performance issues....so i thought i could split each chunk again into multiple cells
and then you know the rest
i think i would just stick to as it is right now, rendering these is such a heavy load...but you really helped a lot.
i guess i will have to bother you again later....don't mind😅
Here's something I should have mentioned long time ago:
Almost any kind of optimization is a trade-off between different things. In your case you're trading off memory usage against CPU and GPU time. How much you want to trade one for the other is up to you. But there is no magical "optimize" button that you press that makes all your worries go away without a price.
This is why I'm very sceptical when people lightly start calling lower-level APIs, like draw calls, in their code: Unity already provides the most optimal rendering workflow for most general cases. There's even GPU instancing on the material and terrain grass.
I bet that when you started this thing you thought that indirect instanced rendering is that magical button that would provide you an ideal solution. But it is not. While it gives you more control over how the rendering is happening, compared to the tools that unity provides by default, it also brings in certain drawbacks.
Another thing is, you mention long spikes and freezing here, but did you actually profile it correctly and understand the cause?
I assume this was before you even started this thread?
right now i am not using terrain system, turned the terrain into 64 gameobjects and retrieved the grass in draft scene only to gather their positions and other data
and i knew that indirect instancing too must have some limitations, but compared to other systems this is just better...as far as i know right now
right now, when i am using cells inside the chunks, less data has to be set (around 10k - 35k elements), but if i don't use cells and directly use the huge chunks, the data will also be huge (around 800k when standing between 3 or 4 chunks), causing performance issues...just because i have to repeatedly change the data in the buffer based on the player movement....and yeah, it was before this thread, right now i don't get spikes...just that cost of around 0.8 ms, i think this is the trade-off..i also have to think about memory....
Yes, just 0.8 ms is actually a pretty cheap trade-off. You could avoid it almost entirely if you upload the whole world data at the start of the game and then just upload a very small indices buffer. But then you'd need to be ready for an extra 500 MB on the GPU.
yeah i thought about that after you said it...but as it is targeted at mobile devices, i think it will be too heavy...there's a lot more in the scene, the buildings, streets and other stuff...
@robust pelican anyway, if you don't mind answering...how long have you been doing game dev? you seem to know a lot about this stuff...