# Instanced Grass Rendering
Ok. We can move onto the compute shader.
```hlsl
#ifndef PROCEDURL_INSTANCE_INCLUDED
#define PROCEDURL_INSTANCE_INCLUDED

StructuredBuffer<float3> _GrassPositions;
StructuredBuffer<int> _GrassIndices;

#if UNITY_ANY_INSTANCING_ENABLED
void vertInstancingSetup() {}
#endif

void GetInstancingPos_float(in float3 PositionWS, out float3 Out)
{
    Out = 0;
#if UNITY_ANY_INSTANCING_ENABLED
    int grassIndex = _GrassIndices[unity_InstanceID];
    Out = PositionWS + _GrassPositions[grassIndex];
#endif
}

#endif
```
this is what i got
If there are no errors in Shader Graph, we can move on to the compute shader
It would show it in the graph editor, next to the Custom Function node
The world position is probably getting applied twice. This won't be a problem in the instanced draw call.
alr
it feels like object also moves on z
oh no it's just the rotation shit
yeah it's just doubled
actually it's 4x, not 2x
For the compute shader, to keep things simple and try to get something working quickly, let's ignore frustum culling for now. So it's just filling the fullRes and lod buffers based on their distance from the camera.
yup
Do you need any help with that or can you get started with that?
If you want the data to be set up the way I suggested, then the append buffers should be int.
You can also do float3 if you want. That saves one buffer read in the grass shader, but uses 3 times more memory.
You may also want to add more data later, like an up direction, so the grass points up from the ground normal, so the int will save even more memory there.
Ok, since this is working with a one dimensional array, I would suggest changing numthreads to (16, 1, 1) and id to uint.
done
The rest should be straightforward.
No, the threads already take care of that for you. id is the index you want to read from.
CSMain will be called for each grass position.
There may be some cases where a loop is necessary, but not in this case.
imagine for every grasspos looping through the entire position set
which is nearly 100k
so it's 100k squared operations
wasted
so, it's smth like:
void CSMain (uint id : SV_DispatchThreadID)
{
if (distance(positions[id], camPosition) < 25)
{
fullRes.Append(id);
}
else
{
lod.Append(id);
}
}
i can move 25 to a variable later
am i right?
grassCompute.Dispatch(16, 1, 1, 1); also this is what i should write?
@tight dragon
No. You should calculate the number of thread groups to dispatch: var numDispatchesX = Mathf.CeilToInt(totalCount / (float)NUM_THREAD_X);
NUM_THREAD_X = 16 in your case
Then grassCompute.Dispatch(numDispatchesX, 1, 1);
it accepts 4 args tho
groupZ which idk about
```cs
grassCompute.Dispatch(Mathf.CeilToInt(positions.count / 16f), 1, 1, 1);
```
The kernel index comes first
it's FindKernel yeah
Or simply count the #pragma kernel lines from the top, starting from 0
in your compute shader source
```cs
grassCompute.Dispatch(grassCompute.FindKernel("CSMain"), Mathf.CeilToInt(positions.count / 16f), 1, 1);
```
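A small sketch of the group-count math from the messages above, in plain C# so it can be checked outside Unity. `DispatchMath` and `GroupCount` are made-up names for illustration, not Unity API. The point is that dividing two ints drops the partial last group, so the count has to be rounded up.

```cs
using System;

public static class DispatchMath
{
    // Number of thread groups needed so that groups * threadsPerGroup >= itemCount.
    // Integer ceil-division: avoids the truncation of plain itemCount / threadsPerGroup.
    public static int GroupCount(int itemCount, int threadsPerGroup)
    {
        if (threadsPerGroup <= 0)
            throw new ArgumentOutOfRangeException(nameof(threadsPerGroup));
        return (itemCount + threadsPerGroup - 1) / threadsPerGroup;
    }
}
```

With 16 threads per group, `GroupCount(100000, 16)` gives 6250 and `GroupCount(100001, 16)` gives 6251, while plain `100001 / 16` gives 6250 and would skip the last position. Once the count is rounded up, the last group can run past the end of the buffer, so the kernel needs a bounds check on `id`.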
bruh i wrote those things like month ago
and forgot everything already
so these 3 buffers look like this
```cs
fullResBuffer = new GraphicsBuffer(GraphicsBuffer.Target.Append | GraphicsBuffer.Target.Structured, density, 4);
lodBuffer = new GraphicsBuffer(GraphicsBuffer.Target.Append | GraphicsBuffer.Target.Structured, density, 4);
```
first has stride 12 cuz float3
and other two have 4 cuz int
now what about this code
```cs
public void Update() {
    fullResBuffer.SetCounterValue(0);
    lodBuffer.SetCounterValue(0);
    grassCompute.SetFloats("camPosition", cam.transform.position.x, cam.transform.position.y, cam.transform.position.z);
    grassCompute.Dispatch(grassCompute.FindKernel("CSMain"), Mathf.CeilToInt(positions.count / 16f), 1, 1);
    GraphicsBuffer.CopyCount(fullResBuffer, gBuffer, 0);
    Graphics.RenderMeshIndirect(rparams, mesh, gBuffer);
    GraphicsBuffer.CopyCount(lodBuffer, gBuffer, 0);
    Graphics.RenderMeshIndirect(rparams, lodMesh, gBuffer);
}
```
what do i read and where
You need to set the buffers to the compute shader before dispatching it
forgor
set
```cs
grassCompute.SetBuffer(kernelId, "positions", positions);
grassCompute.SetBuffer(kernelId, "fullRes", fullResBuffer);
grassCompute.SetBuffer(kernelId, "lod", lodBuffer);
```
Ok. The draw call args buffer is still incorrectly initialized. CopyCount should be setting the 4th byte offset, not the 0th. And you're not setting the index count, which is the first argument.
dstOffsetBytes determines where in the buffer it gets copied to. There are 5 uints in the arguments buffer, 4 bytes each. The first one is indexCountPerInstance, the number of indices in the mesh (3 per triangle). The second one is instanceCount.
The remaining three are not needed for this and can remain as zero.
and then i need gBuffer.SetData()?
Yes, to set the indexCountPerInstance
The docs for RenderMeshIndirect have an example of that.
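To make the byte offsets concrete, here is a rough plain-C# mirror of the five-uint layout described above. `DrawIndexedArgsMirror` and `ArgsLayout` are made-up names for illustration. In Unity the real type is `GraphicsBuffer.IndirectDrawIndexedArgs`, whose `size` constant should match the 20 bytes computed here.

```cs
using System;
using System.Runtime.InteropServices;

// Illustrative mirror of the indexed indirect args layout: 5 uints, 4 bytes each.
[StructLayout(LayoutKind.Sequential)]
public struct DrawIndexedArgsMirror
{
    public uint indexCountPerInstance; // number of indices in the mesh (3 per triangle)
    public uint instanceCount;         // how many instances to draw (written by CopyCount)
    public uint startIndex;
    public uint baseVertexIndex;
    public uint startInstance;
}

public static class ArgsLayout
{
    // Size of one command in bytes.
    public static int CommandSize => Marshal.SizeOf<DrawIndexedArgsMirror>();

    // Byte offset of instanceCount inside command number commandIndex.
    public static int InstanceCountOffset(int commandIndex)
    {
        int fieldOffset = (int)Marshal.OffsetOf<DrawIndexedArgsMirror>("instanceCount");
        return commandIndex * CommandSize + fieldOffset;
    }
}
```

`ArgsLayout.InstanceCountOffset(0)` is 4, which is the dstOffsetBytes to pass to CopyCount for a single-command buffer. A mesh made of 8 triangles has 8 * 3 = 24 indices, so that is the value indexCountPerInstance expects.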
in case i have 4 planes
which means i have 8 triangles
indexCountPerInstance is 8
right?
```cs
void Update()
{
    if (lastDensity != density) UpdateDensity(density);
    fullResBuffer.SetCounterValue(0);
    lodBuffer.SetCounterValue(0);
    grassCompute.SetFloats("camPosition", cam.transform.position.x, cam.transform.position.y, cam.transform.position.z);
    grassCompute.Dispatch(kernelId, Mathf.CeilToInt(positions.count / 16f), 1, 1);
    GraphicsBuffer.CopyCount(fullResBuffer, gBuffer, 4);
    var cmds = new GraphicsBuffer.IndirectDrawIndexedArgs[fullResBuffer.count];
    for (int i = 0; i < fullResBuffer.count; i++)
    {
        cmds[i].indexCountPerInstance = 8;
        cmds[0].instanceCount = 1;
    }
    gBuffer.SetData(cmds);
    Graphics.RenderMeshIndirect(rparams, mesh, gBuffer, lodBuffer.count);
    GraphicsBuffer.CopyCount(lodBuffer, gBuffer, 4);
    cmds = new GraphicsBuffer.IndirectDrawIndexedArgs[lodBuffer.count];
    for (int i = 0; i < lodBuffer.count; i++)
    {
        cmds[i].indexCountPerInstance = 8;
        cmds[0].instanceCount = 1;
    }
    gBuffer.SetData(cmds);
    Graphics.RenderMeshIndirect(rparams, lodMesh, gBuffer, lodBuffer.count);
}
```
that's what i got but i'm unsure
what i don't get is where are positions written
are they part of a command or smth
The docs show how you can get the index count from the Mesh
The positions are in the positions buffer, which you will send to the shader through the material, as well as the other buffer.
commandCount should be only 1. Your args buffer only has 1 command in it.
And instance count should definitely not be 1, that's how many grass instances you want to render
You have to call SetData first, because it overwrites instance count as well, then write over it with the real count using CopyCount.
alr i fixed the triangle count
and the commands
```cs
grassCompute.Dispatch(kernelId, Mathf.CeilToInt(positions.count / 16f), 1, 1);
var cmds = new GraphicsBuffer.IndirectDrawIndexedArgs[1];
cmds[0].indexCountPerInstance = mesh.GetIndexCount(0);
cmds[0].instanceCount = (uint)fullResBuffer.count;
gBuffer.SetData(cmds);
GraphicsBuffer.CopyCount(fullResBuffer, gBuffer, 4);
Graphics.RenderMeshIndirect(rparams, mesh, gBuffer, gBuffer.count);
cmds[0].instanceCount = (uint)lodBuffer.count;
gBuffer.SetData(cmds);
GraphicsBuffer.CopyCount(lodBuffer, gBuffer, 4);
Graphics.RenderMeshIndirect(rparams, lodMesh, gBuffer, gBuffer.count);
```
is that right?
how do i send grass position from a pos buffer to the shader through the material
and what's the other buffer
The fullRes or lod buffer, depending on the draw call
You can set it on the material or in a MaterialPropertyBlock assigned to the RenderParams
```cs
var results = new NativeArray<RaycastHit>(density, Allocator.TempJob);
var commands = new NativeArray<RaycastCommand>(density, Allocator.TempJob);
for (int i = 0; i < density; i++)
{
    Vector3 randomPoint = new Vector3(Random.Range(-transform.localScale.x / 2, transform.localScale.x / 2), 4096, Random.Range(-transform.localScale.z / 2, transform.localScale.z / 2)) + transform.position;
    Debug.Log(randomPoint);
    commands[i] = new RaycastCommand(randomPoint, Vector3.down, QueryParameters.Default, 8192);
}
JobHandle handle = RaycastCommand.ScheduleBatch(commands, results, 1, density, default);
handle.Complete();
commands.Dispose();
List<Vector3> coords = new();
foreach (var hit in results)
{
    if (hit.collider != null && hit.collider.gameObject.TryGetComponent<GrassSupport>(out _))
    {
        coords.Add(hit.point);
    }
}
results.Dispose();
Debug.Log(coords.Count);
```
i get a good set of positions
but when i schedule the thing and read the results, it's empty
coords.count returns 0
and also this warning
also debugged results.length and it's proper
but for some reason raycasts from like (2, 4096, 0) straight down with length 8192 return a null, but it's just impossible
since plane has collider
You have "maxHits" in ScheduleBatch set to "density". "maxHits" determines the max number of hits PER raycast, not in total. It's only relevant if you're using the hitMultipleFaces option. It should be set to 1 here.
oh
```cs
var commands = new NativeArray<RaycastCommand>(density, Allocator.TempJob);
for (int i = 0; i < density; i++)
{
    commands[i] = new RaycastCommand(RandomPoint(), Vector3.down, QueryParameters.Default, 8192);
}
JobHandle handle = RaycastCommand.ScheduleBatch(commands, results, 1, 1, default);
```
it workz
lol
xD
well, it's now time to render all this stuff
```cs
var cmds = new GraphicsBuffer.IndirectDrawIndexedArgs[1];
cmds[0].indexCountPerInstance = mesh.GetIndexCount(0);
cmds[0].instanceCount = (uint)fullResBuffer.count;
gBuffer.SetData(cmds);
GraphicsBuffer.CopyCount(fullResBuffer, gBuffer, 4);
Graphics.RenderMeshIndirect(rparams, mesh, gBuffer, gBuffer.count);
cmds[0].indexCountPerInstance = mesh.GetIndexCount(0);
cmds[0].instanceCount = (uint)lodBuffer.count;
gBuffer.SetData(cmds);
GraphicsBuffer.CopyCount(lodBuffer, gBuffer, 4);
Graphics.RenderMeshIndirect(rparams, lodMesh, gBuffer, gBuffer.count);
```
is this correct
like no more errors in this part
firstly drawing fullres and then lod
No, you removed the CopyCount
.count is the max size. Only the GPU knows the actual count after the compute shader dispatch. CopyCount copies the count from the internal count in the buffer into the args buffer, all in GPU memory. The CPU never finds out the actual count, that's the point of Indirect draw calls.
Sorry, I missed it in the code.
But there's no reason to set instanceCount to anything there, because it gets overwritten immediately by CopyCount.
I don't like gBuffer.count being passed as the command count in RenderMeshIndirect. It should always be 1 here.
You're using the full mesh index count for the lod indexCountPerInstance
alr
```cs
fullResBuffer.SetCounterValue(0);
lodBuffer.SetCounterValue(0);
grassCompute.SetFloats("camPosition", cam.transform.position.x, cam.transform.position.y, cam.transform.position.z);
grassCompute.Dispatch(kernelId, Mathf.CeilToInt(positions.count / 16f), 1, 1);
var cmds = new GraphicsBuffer.IndirectDrawIndexedArgs[1];
cmds[0].indexCountPerInstance = mesh.GetIndexCount(0);
gBuffer.SetData(cmds);
GraphicsBuffer.CopyCount(fullResBuffer, gBuffer, 4);
Graphics.RenderMeshIndirect(rparams, mesh, gBuffer, 1);
cmds[0].indexCountPerInstance = lodMesh.GetIndexCount(0);
gBuffer.SetData(cmds);
GraphicsBuffer.CopyCount(lodBuffer, gBuffer, 4);
Graphics.RenderMeshIndirect(rparams, lodMesh, gBuffer, 1);
```
is it proper now?
the entire code takes like 4ms to complete
is there a way to get how many ms Dispatch takes each frame
The Dispatch call on the CPU should be negligible, because it's just adding the command to a queue. To measure the GPU performance, you need a third party graphics debugger, like RenderDoc or Nsight
Alr, so the last part is drawing each grass on its own place
Ig that goes after the dispatch
by increasing number of X threads i get some boost, right?
like if there are 64 threads instead of 16/8
No. That number just determines how it gets batched, but it will always get distributed to as many compute threads as possible.
What data to where
position data to all 100k grasses
rparams.SetMatrixArray("_ObjectToWorld", data); or smth
You have to assign a new MaterialPropertyBlock, which you should only create once.
Then you can do rparams.SetBuffer("_GrassPosition", positions) and rparams.SetBuffer("_GrassIndices", fullResBuffer/lodBuffer)
You will have to set the _GrassIndices to different buffers depending on the draw call.
Yes, and it would be rparams.matProps.SetBuffer, my mistake.
Or matPropBlock.SetBuffer
That's fine
Yes, that's fine
so
```cs
rparams = new RenderParams(material);
matPropBlock = new MaterialPropertyBlock();
rparams.matProps = matPropBlock;
matPropBlock.SetBuffer("_GrassPosition", positions);
```
Make sure the buffer names match the ones used in the hlsl file.
yeah did it
and then in Update
```cs
grassCompute.Dispatch(kernelId, Mathf.CeilToInt(positions.count / 16f), 1, 1);
var cmds = new GraphicsBuffer.IndirectDrawIndexedArgs[1];
cmds[0].indexCountPerInstance = mesh.GetIndexCount(0);
gBuffer.SetData(cmds);
GraphicsBuffer.CopyCount(fullResBuffer, gBuffer, 4);
rparams.matProps.SetBuffer("_GrassIndices", fullResBuffer);
Graphics.RenderMeshIndirect(rparams, mesh, gBuffer, 1);
cmds[0].indexCountPerInstance = lodMesh.GetIndexCount(0);
gBuffer.SetData(cmds);
GraphicsBuffer.CopyCount(lodBuffer, gBuffer, 4);
rparams.matProps.SetBuffer("_GrassIndices", lodBuffer);
Graphics.RenderMeshIndirect(rparams, lodMesh, gBuffer, 1);
```
Looks good
there is only one grass spawning for some reason
and fps is 5
i swear we missed smth
what those ints mean
like the position id
They are probably all being drawn in the same position.
The int is the index into the positions buffer.
Screenshot the Start method as well
The last line isn't assigning any variable.
it's useless here ig
But it's probably defaulting to zero so it's probably not making any difference.
one sec
nah ig i just cut the assigning part
that's the updatedensity
in case i switch from 100k to just 100
Just to make sure the right part of the .hlsl file is working, try changing the Out = PositionWS; back to Out = 0;. If it disappears, then it means something is wrong.
Ok, and then to test if unity_InstanceID is correct, lets try just placing them in a line on the X axis based on their instance id:
Out = PositionWS + float3(unity_InstanceID, 0, 0);
If it works, you'll see all of the grass in a long line.
Can you screenshot the Shader Graph nodes again?
sec
wait
it's not the unityid not working
it's just the hlsl
i just replaced id with 50
so all grasses should be drawn on 50/0/0
but grass stays in place
That's why I wanted to see the shader, to see how it uses the output
That's a bit weird. You're taking the world position, then trying to transform it from object to world space, then converting the output (which is in world space) from object space to world space.
You're supposed to omit the _float. It's fine, you would get an error if it was wrong.
See the difference here?
first node ig
And the one after the custom function
If you replace it with:
Out = PositionWS + float3(10, 0, 0);
Does it move the grass?
Can you try resaving/recompiling the shader graph?
Are you definitely using the right shader on the material you're using in the draw call?
wait
waaaait
LOL
LOL
LOOOOL
Somewhere deep in my project files
there is a freaking old grass material
which is attached to the script
it's like a month old
Happens
It will keep happening, just part of being a game developer. Over time you'll learn to assume nothing and check everything.
And you're CPU bound there, since the app time and CPU main time are the same. Your GPU can probably do it even faster if the CPU wasn't taking so long.
Btw, that stats window will not count these draw calls. It's not aware of them. That's why Tris and Verts are so low.
are those bounds of entire grass zone
Yes
or each individual grass object
Everything combined
Sure
rparams.worldBounds = new Bounds(transform.position, transform.localScale);
oh scale is always positive sure
just a point
i got some kind of bug ig
far grass is ok, it's just a cross, but close grass is weird
it should look this way
feels like missing vertices
Looks like the wrong indexCountPerInstance
it's imported from blender but still
it's 4 planes which means 16 verts
and 8 tris
Ok, so then it's probably because the data in the buffer is not being copied after the first RenderMeshIndirect, so it will end up using the same args as the low res one, with fewer triangles.
A quick solution to that is to make the args buffer 2 count instead of 1.
Yes, but it's my mistake. The GPU won't read that args buffer until later, at which point you've already overwritten it with the lod data.
alr
It's not a big change.
Just change the count of gBuffer to 2 when you create it.
Then cmds to 2 count.
And assign cmds[0] and cmds[1] in each step instead of overwriting the same one.
like dis
In this case, you will only need to call gBuffer.SetData once. Just assign both cmds first and call SetData once
You have one more missing thing
You have to tell the lod RenderMeshIndirect to use the second command, by adding a fifth parameter 1, which is the startCommand it starts reading from.
Oh and you have to fix the second CopyCount, because it's still copying only to the first command.
second mesh is just not getting rendered
GraphicsBuffer.CopyCount(lodBuffer, gBuffer, GraphicsBuffer.IndirectDrawIndexedArgs.size + 4)
shouldn't i also change command count to 2 in second render mesh ?
... ,gBuffer, 2, 1);
No, then it will try to read two commands from the buffer, starting at index 1
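Since the screenshots are not part of the log, here is a rough sketch of what the two-command version described above could look like when put together. It reuses the field names from this thread (`gBuffer`, `fullResBuffer`, `lodBuffer`, `rparams`). The class name `GrassIndirectDrawSketch` is made up, and the append buffers are assumed to be created, reset and filled by the compute dispatch elsewhere. Treat it as an outline, not the exact code from the project.

```cs
using UnityEngine;

public class GrassIndirectDrawSketch : MonoBehaviour
{
    // Assumed to be assigned or created elsewhere, as in the snippets above.
    public Mesh mesh, lodMesh;
    public Material material;
    public GraphicsBuffer fullResBuffer, lodBuffer; // append buffers filled by the compute shader

    GraphicsBuffer gBuffer;
    GraphicsBuffer.IndirectDrawIndexedArgs[] cmds;
    RenderParams rparams;

    void Start()
    {
        // Two commands in one args buffer: [0] = full-res mesh, [1] = LOD mesh.
        gBuffer = new GraphicsBuffer(GraphicsBuffer.Target.IndirectArguments, 2,
                                     GraphicsBuffer.IndirectDrawIndexedArgs.size);
        cmds = new GraphicsBuffer.IndirectDrawIndexedArgs[2];
        rparams = new RenderParams(material)
        {
            worldBounds = new Bounds(transform.position, transform.localScale),
            matProps = new MaterialPropertyBlock()
        };
    }

    void Update()
    {
        // Counter reset and compute dispatch are assumed to have run before this point.
        cmds[0].indexCountPerInstance = mesh.GetIndexCount(0);
        cmds[1].indexCountPerInstance = lodMesh.GetIndexCount(0);
        gBuffer.SetData(cmds); // single upload; instanceCount is still 0 in both commands

        // Overwrite each command's instanceCount (byte offset 4 inside the command) on the GPU.
        int commandSize = GraphicsBuffer.IndirectDrawIndexedArgs.size;
        GraphicsBuffer.CopyCount(fullResBuffer, gBuffer, 0 * commandSize + 4);
        GraphicsBuffer.CopyCount(lodBuffer, gBuffer, 1 * commandSize + 4);

        // Draw one command each: startCommand 0 for full-res, 1 for LOD.
        rparams.matProps.SetBuffer("_GrassIndices", fullResBuffer);
        Graphics.RenderMeshIndirect(rparams, mesh, gBuffer, 1, 0);
        rparams.matProps.SetBuffer("_GrassIndices", lodBuffer);
        Graphics.RenderMeshIndirect(rparams, lodMesh, gBuffer, 1, 1);
    }

    void OnDestroy()
    {
        gBuffer?.Release();
    }
}
```

Because each draw reads its own command, nothing in the args buffer is overwritten between the two RenderMeshIndirect calls, which is what caused the full-res mesh to be drawn with the LOD index count earlier.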
alr
buffers seem so illogical
but as you experiment more you actually get why they are structured this way
Byte offsets are annoying and often a cause of bugs.
well
the thing caused a glitch
those grasses in the background flash and move around
not only in the background
just everywhere outside of fullRes mesh zone
can't i just use two different GraphicsBuffer.IndirectDrawIndexedArgs[1]?
Screenshot of the current code please
You're copying from the fullResBuffer twice
How many is that?
Nice. You're probably still CPU bound, so the FPS you're at now is not because of the grass. You can probably get more FPS if you fix whatever is making the CPU slow down.
Hmm, maybe not. Not sure why the Stats window is saying CPU main is taking 16.8 ms when it's clearly much lower than that.
Maybe I'm wrong, maybe it's just waiting for the GPU.
In which case, yeah, that's the FPS of the grass.
editor loop
But you'll get a bit more FPS once you add frustum culling.
without grass it's 250
when this code runs it's 60 but also 100k meshes bruh
it's giant
i had like 15 before
with RenderMeshInstanced
how do i cull this
You'll want to use https://docs.unity3d.com/ScriptReference/GeometryUtility.CalculateFrustumPlanes.html to calculate the frustum planes, pass these into the compute shader and perform a distance check against the current position.
You can find examples of this by searching "unity compute shader frustum culling"
Since all your grass meshes are the same size and shape, it's simplest to treat them as spheres for culling. Then you just check the signed distance from the sphere's center to each frustum plane: if it's less than the negative radius for any plane, the sphere is completely outside the frustum, so don't append it to any buffer.
Other implementations you find online might be more complicated, comparing the AABB/Bounds of the mesh against the frustum planes, which is unnecessary here.
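A minimal sketch of the sphere test, written on the C# side so the logic is easy to check before porting it into the compute shader. `GrassCullingSketch`, `SphereVisible` and `PackPlanes` are made-up names, and packing each plane into a Vector4 is just one assumed way to pass the planes to the shader.

```cs
using UnityEngine;

public static class GrassCullingSketch
{
    // True when the sphere is at least partly inside every plane.
    // Assumes the plane normals point into the frustum, as returned by
    // GeometryUtility.CalculateFrustumPlanes.
    public static bool SphereVisible(Plane[] planes, Vector3 center, float radius)
    {
        foreach (Plane plane in planes)
        {
            // Signed distance: negative means the center is behind the plane.
            if (plane.GetDistanceToPoint(center) < -radius)
                return false; // completely outside this plane, so cull
        }
        return true;
    }

    // Packs the six camera planes as (normal.xyz, distance) so they can be
    // uploaded to a compute shader as an array of float4.
    public static Vector4[] PackPlanes(Camera cam)
    {
        Plane[] planes = GeometryUtility.CalculateFrustumPlanes(cam);
        var packed = new Vector4[planes.Length];
        for (int i = 0; i < planes.Length; i++)
        {
            Vector3 n = planes[i].normal;
            packed[i] = new Vector4(n.x, n.y, n.z, planes[i].distance);
        }
        return packed;
    }
}
```

In the kernel the same check becomes `dot(plane.xyz, position) + plane.w < -radius` for each of the six planes, and a position that fails any of them is simply not appended.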
i cant rly find any planes approach
I'll try adapting it tomorrow
lez go
@tight dragon
I got no fps boost for some reason xD
Ig its hard even for gpu to go through 100k grasses and calculate dot product with planes for each
without calling rendermeshindirect i get like 200fps
but those 2 calls eat 140 of it and i get 60
It probably helps that the grass meshes don't have many vertices. The frustum culling is mainly reducing the number of vertices that get shaded, the number of pixels is still the same.
i got some weird issue
if i move grass zone from 0/0/0 to some other place, everything works like shit
generated positions set is proper, i checked several times
also since i used height, i can definitely see positions copying the geometry
but all grass is moved from where it should be to some other coords on a diagonal
idk what's it doing here
when it should be all around the terrain
I think I know what the problem may be. I remember someone noticing that Unity was using the center of the worldBounds to move the transformation matrix.
Because they are in world space, not relative to the grass zone.
Try removing PositionWS and just use GrassPositions
so well, how do i update the shader to render grass on the correct place
```hlsl
void GetInstancingPos_float(in float3 PositionWS, out float3 Out)
{
    Out = 0;
#if UNITY_ANY_INSTANCING_ENABLED
    int grassIndex = _GrassIndices[unity_InstanceID];
    Out = _GrassPositions[grassIndex];
#endif
}
```
Yes
Oh sorry, PositionWS is where the vertex positions are
Try feeding in just the object position into the Custom Function and revert the code.
u mean this?
So, remove the Transform node before the Custom Function
Yes
lez gooo
waaaaait
is this grass occlusion culled automatically?
fps skyrockets when there's no grass in view
No, there's just vastly less overdraw when the terrain is in the way.
Only if you feel you need more performance. But if frustum culling didn't give you much, occlusion culling won't be any better. More complicated to run.
is there a way to fix this
like one side is more lit that another
it's really noticable
grasses look like painted shields
One side is facing the sun more than the other. You can overwrite the normals in the shader to all face in the same direction, for example up.
In the Shader Graph
this part?
Easier to change the Normal in the vertex stage
lez go
perfect
is that a common thing for every game tho
gpu instancing grass meshes
Yes, it's common for grass normals to always face up
but what about the geometry
like, do devs in every game just instance crossed planes
or some higher-poly stuff can also be instantiated
im asking because...
I'm not sure. I think there's a variety.
i have no clue how to fix this
without adding additional planes
I think there are games that have individually modeled blades of grass.
yeah but it feels too AAA for me
i'll probably just rotate these 4 planes a bit
so they look normal from up
@tight dragon what about trees XD
you ever worked with them?
dk if tree should be a single mesh or one mesh for tree and tons of them for leaves
No I haven't. I'd say the leaves should be separate.
i decided to allow several different grass textures to be drawn in each grass zone
does that mean i create an array of buffers with length of textures amount
The easiest thing is to make a texture atlas and just offset what part of it you're reading from based on some index. That way, you can keep everything in the same draw call.
I would change the positions buffer into a buffer of a struct that contains a position and a texture index.
You can also use a texture array if you're fine with each texture being the same resolution.
is there a way to randomly select it in the grass lit shader
Sure, you can generate a pseudo random index based on the unity_InstanceID.
oh truuuuue
Or, since your grass is already in a random order due to the random raycasts, you can just modulo the instance ID with the number of textures in the array
int textureIndex = unity_InstanceID % textureCount;
Well, actually, that might not be possible since you're adding everything into an AppendBuffer. The order of that is not guaranteed to be the same every frame.
So then you might need to generate a pseudo random number from the position instead.
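One possible way to get a stable index from the position, sketched in plain C# so it can be tested on the CPU. `GrassTextureIndex` and `FromPosition` are made-up names and the mixing constants are arbitrary choices, not anything Unity provides. The same integer math could be ported to HLSL, using `asuint` for the bit reinterpretation.

```cs
using System;

public static class GrassTextureIndex
{
    // Maps a world position to an index in [0, textureCount).
    // The result depends only on the position, not on the instance ID,
    // so it stays the same even if the append order changes between frames.
    public static int FromPosition(float x, float z, int textureCount)
    {
        if (textureCount <= 0)
            throw new ArgumentOutOfRangeException(nameof(textureCount));
        unchecked
        {
            // Reinterpret the float bits as integers and combine them.
            uint h = (uint)BitConverter.SingleToInt32Bits(x) * 73856093u
                   ^ (uint)BitConverter.SingleToInt32Bits(z) * 19349663u;
            // A few cheap mixing steps to spread the bits around.
            h ^= h >> 16;
            h *= 0x45d9f3bu;
            h ^= h >> 16;
            return (int)(h % (uint)textureCount);
        }
    }
}
```

This only works if the position fed in is the stored grass position from the buffer, since any per-frame change to the input would change the index.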
yo @tight dragon , should this thing work for HDRP as well?
i might have messed up the shader graph or hdrp just doesn't support the same thing
strange thing -> when you don't look at the grass mesh itself, its texture is not rendered
so if i use the Vertex -> Position to set grass position, it will ofc change place, but will only be visible while the mesh itself is in view
straaaaaaaange
this part
```cs
GraphicsBuffer.CopyCount(fullResBuffer, gBuffer, 4);
rparams.matProps.SetBuffer("_GrassIndices", fullResBuffer);
Graphics.RenderMeshIndirect(rparams, mesh, gBuffer, 1);
GraphicsBuffer.CopyCount(lodBuffer, gBuffer, GraphicsBuffer.IndirectDrawIndexedArgs.size + 4);
rparams.matProps.SetBuffer("_GrassIndices", lodBuffer);
Graphics.RenderMeshIndirect(rparams, lodMesh, gBuffer, 1, 1);
```
is now useless
I don't know. HDRP is a complicated beast that doesn't really like to be poked.
yeah, but Unity 6.0 added GPU Runtime Occlusion Culling
That won't work with this.
which only works on HDRP version
i don't need it to occlude grass
i need it to occlude everything else
since i can't bake occlusion culling
I think the thing that's breaking is the hacky workaround you did in Shader Graph to get instancing support.
https://forum.unity.com/threads/draw-rendermeshindirect-srp-not-working-on-some-machines.1579917/