#Iris - A Journey through OpenGL and beyond to learn Graphics
each 'node' is actually 8 nodes
so one index points to 8 nodes
each of the 8 internal nodes has one child index, which represents 8 nodes
Oh I see, child_offsets is also [8]
How the heck do you build that? Is there a way to take a normal BVH with 8 children per node, and create that?
you just 'hoist' things up a level
Yeah I think I kind of see it. I think you need to do it bottom up right?
Also i'm curious why you used AABBs and not bounding spheres btw
btw jasmine what is your pfp
i am staring at it a lot but its just a mush to me i see it
here TempNode is a 'normal' BVH
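For readers following along, a minimal sketch of the 'hoisting' idea. This is not the code from the chat: the fields of `TempNode`, the `PackedNode` layout, and the `hoist` function are all assumed for illustration, and this version recurses top-down and leaves unused slots zeroed.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TempNode:
    """Hypothetical 'normal' BVH node: one bound, up to 8 children, or a leaf."""
    bound: object
    children: list = field(default_factory=list)
    group: Optional[int] = None  # set only on leaves (index of a LOD group)


@dataclass
class PackedNode:
    """Hypothetical packed node: 8 child bounds plus 8 child offsets."""
    bounds: list = field(default_factory=lambda: [None] * 8)
    child_offsets: list = field(default_factory=lambda: [0] * 8)
    is_leaf: list = field(default_factory=lambda: [False] * 8)


def hoist(node, out):
    """Store the bounds of node's children one level up, in a single PackedNode.

    Returns the index of the new PackedNode in `out`. The bound of `node`
    itself is not stored here; it lives in the slot of the parent PackedNode.
    """
    assert 0 < len(node.children) <= 8
    index = len(out)
    packed = PackedNode()
    out.append(packed)  # reserve the slot first so the root lands at index 0
    for slot, child in enumerate(node.children):
        packed.bounds[slot] = child.bound
        if child.group is not None:
            packed.is_leaf[slot] = True
            packed.child_offsets[slot] = child.group
        else:
            packed.child_offsets[slot] = hoist(child, out)
    return index
```

One queued index then stands for all 8 child bounds of a node, which is the property discussed above.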
Girl on a rocket hang glider thing with a rust logo in the corner. Iirc it used to be the Rust zulip icon, but I can't find it anymore
i used bounding spheres but didn't wanna implement the entire sphere merging thesis thing, so i switched to AABBs for tight bounds
Oh it's not hard to do an approximate method btw
I'll stick to spheres then
Anyways time to go redo my plan for the BVH yet again
yeah i use an approximate thing for lod spheres
but approximate is bad when you want the tightest bounds you can get
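A sketch of one approximate sphere-merging approach, for illustration only. The chat does not say which method either person uses. `merge_spheres` returns the smallest sphere enclosing two spheres, and `merge_many` (an assumed helper) folds a list pairwise, which is conservative but generally not the tightest possible bound for more than two spheres.

```python
import math


def merge_spheres(a, b):
    """Smallest sphere enclosing spheres a and b, each given as (cx, cy, cz, r)."""
    ax, ay, az, ar = a
    bx, by, bz, br = b
    dx, dy, dz = bx - ax, by - ay, bz - az
    d = math.sqrt(dx * dx + dy * dy + dz * dz)
    if d + br <= ar:  # b is entirely inside a
        return a
    if d + ar <= br:  # a is entirely inside b
        return b
    r = (d + ar + br) / 2.0
    t = (r - ar) / d  # d > 0 here, otherwise one of the cases above applies
    return (ax + dx * t, ay + dy * t, az + dz * t, r)


def merge_many(spheres):
    """Pairwise fold: always encloses every input, but can be looser than optimal."""
    result = spheres[0]
    for s in spheres[1:]:
        result = merge_spheres(result, s)
    return result
```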
hmm
no
in jasmine's blog she just tests all clusters
which makes sense
but that is not hierarchical
i dont understand
so you have groups with a self and parent error
if you dont have locked edges, you must have a dag. A dag implies that parents can share children after simplification. So when a parent goes through culling, survives, and then sends its children to be processed, duplicates can appear because other parents have the same children
is there a DAG here
how do you build the bvh
each LOD gets a SAH-optimized BVH
SAH?
each lod has a full bvh?
oooooooooooooooooooooooooooooooooooooooooooooooooooooooh
in order
i mean they're combined into one BVH at the end
i just make one BVH per LOD and then combine them to make building easier
yea i see it
no SAH optimization because the AABBs are all the same
wait no i dont see it
When building your DAG, make a list of LOD groups for each LOD.
Afterwards, build a BVH for each LOD level, with your leaf nodes being the LOD groups.
but if the groups are the nodes then its a dag
how
yes that's the meshlet DAG
which is reduced down to a local decision
each group can locally decide if it should render
so
n meshlets in lod n form a group
and m meshlets in lod n+1 come out
i assume here youd cull lod n+1, survives then spawn work for lod n
this is a dag tho (cause multiple meshlets in lod n+1 point to the same group), so you cant build a bvh for this
actually yea you can build a bvh
but it will have duplicates
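A tiny sketch of the duplicate concern raised here, with made-up names (`children_of`, `expand`, and the meshlet and group labels are hypothetical). If two LOD n+1 meshlets came from the same LOD n group, expanding surviving parents top-down emits that group twice.

```python
def expand(surviving_parents, children_of):
    """Naive top-down expansion: every surviving parent queues all its children."""
    work = []
    for parent in surviving_parents:
        work.extend(children_of[parent])
    return work


# Two LOD n+1 meshlets that were simplified from the same LOD n group.
children_of = {"meshlet_a": ["group_0"], "meshlet_b": ["group_0"]}
queued = expand(["meshlet_a", "meshlet_b"], children_of)
# queued now holds "group_0" twice
```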
can you be more specific
you said you understood and then you went back to your original misunderstanding in 20 microseconds

i forgot to write lod
no i said i understand for the case when you cull all lods in parallel
but then you said you somehow combine all bvhs of all lods
huh
both the cases are the same
so you make a bvh over the roots?
yeah
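A rough sketch of "one BVH per LOD, then a BVH over the roots". Everything here is assumed for illustration: the tree is binary with a median split on x instead of 8-wide, nodes are plain dicts, and `build_bvh` and `build_combined` are hypothetical names.

```python
def union(a, b):
    """Union of two AABBs given as ((minx, miny, minz), (maxx, maxy, maxz))."""
    return (tuple(map(min, a[0], b[0])), tuple(map(max, a[1], b[1])))


def build_bvh(groups):
    """groups: non-empty list of (group_id, aabb). Leaves are the LOD groups."""
    if len(groups) == 1:
        group_id, aabb = groups[0]
        return {"aabb": aabb, "group": group_id}
    # Median split on the x centroid (sum of min.x and max.x orders the same).
    ordered = sorted(groups, key=lambda g: g[1][0][0] + g[1][1][0])
    mid = len(ordered) // 2
    kids = [build_bvh(ordered[:mid]), build_bvh(ordered[mid:])]
    return {"aabb": union(kids[0]["aabb"], kids[1]["aabb"]), "children": kids}


def build_combined(groups_by_lod):
    """Build one BVH per LOD level, then put a single root over the per-LOD roots."""
    roots = [build_bvh(groups) for groups in groups_by_lod if groups]
    aabb = roots[0]["aabb"]
    for root in roots[1:]:
        aabb = union(aabb, root["aabb"])
    return {"aabb": aabb, "children": roots}
```

The combined root exists only so traversal has a single entry point; the per-LOD trees stay independent underneath it.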
why
so i have one thing to work with lol
but the DAG is completely irrelevant
it doesn't matter
what matters is that each group has a local and parent error
yea if all lods are separately evaluated then its irrelevant i see that
and thus can decide if it should render locally
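A sketch of the local decision, assuming both errors are already projected to pixels and that error never decreases from a group to its DAG parent. The function name and the 1 pixel threshold are assumptions.

```python
def should_render(self_error_px, parent_error_px, threshold_px=1.0):
    """A group renders when it is accurate enough but its parent is not."""
    return self_error_px < threshold_px <= parent_error_px


# One chain through the LODs: (self error, parent error) per level, finest first.
chain = [(0.0, 0.3), (0.3, 0.8), (0.8, 2.5), (2.5, float("inf"))]
selected = [lod for lod, (s, p) in enumerate(chain) if should_render(s, p)]
```

With monotonic errors exactly one level along the chain passes, so no group needs to know what its neighbours decided.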
is that how nanite does it also?
yes
huh
idk how they build the BVH
seems a bit wasteful
the bvh is only for occlusion cull right?
can it discard nodes on error?
i just do frustum and occlusion cull because why not
so occlusion + frustum?
^
by storing max parent error
can the bvh estimate what children are needed?
if max parent error is less than a pixel, all groups under the current node are going to have parent error less than a pixel
therefore
they are too detailed
parent error in this case is DAG parent error
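A sketch of the node-level test, assuming errors are already in pixels (a real traversal would project the stored error per node using a conservative distance). `max_parent_error` and `subtree_too_detailed` are hypothetical names.

```python
def max_parent_error(groups):
    """Value a BVH node would store: the largest parent error of any group below it."""
    return max(parent_error for _self_error, parent_error in groups)


def subtree_too_detailed(node_max_parent_error_px, threshold_px=1.0):
    """If even the largest parent error is under the threshold, every group
    below this node has a good-enough parent, so none of them should render."""
    return node_max_parent_error_px < threshold_px


# (self error, parent error) for the groups under one node.
groups_under_node = [(0.0, 0.2), (0.1, 0.6), (0.3, 0.9)]
skip = subtree_too_detailed(max_parent_error(groups_under_node))
```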
Ok so you have this node as your root. What steps are you doing? Does each thread test the 8 children, and then write which of the 8 children that pass, or?
i split one node across 8 threads
so each thread tests one and writes the child if it passes
if it's a leaf (pointing to a group), it writes all of the meshlets into the meshlet cull input buf
then meshlet cull runs at the very end to check self errors and bin hw/sw
and frustum and occlusion cull ofc
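A CPU-side simulation of the traversal described above, as a sketch only. The dict layout for nodes, the `is_visible` callback (standing in for the frustum, occlusion and error tests), and the function name are assumptions; on the GPU the inner loop would be 8 threads running in parallel.

```python
def traverse(nodes, groups, is_visible, root=0):
    """nodes: list of dicts with 8-slot 'bounds', 'children' and 'is_leaf' lists.
    groups: maps a group index to the meshlets in that group.
    Returns the meshlet cull input buffer."""
    queue = [root]
    meshlet_cull_input = []
    while queue:
        next_queue = []
        for node_index in queue:      # one queued index covers 8 child slots
            node = nodes[node_index]
            for lane in range(8):     # one thread per slot on the GPU
                bound = node["bounds"][lane]
                if bound is None or not is_visible(bound):
                    continue          # empty slot or culled child
                child = node["children"][lane]
                if node["is_leaf"][lane]:
                    # Leaf: write every meshlet of the group for the later meshlet cull.
                    meshlet_cull_input.extend(groups[child])
                else:
                    next_queue.append(child)
        queue = next_queue
    return meshlet_cull_input
```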
Ok but why split one node across 8 threads?
Why not just do the normal BVH stuff of having one node point to 8 children, test the node, and then if it passes add its 8 children to a buffer for the next run
And have one node per thread
Can you explain more with a concrete example? I'm having trouble visualizing this
since i have 1 node struct representing 8 real nodes, i only need 1 index in the queue to represent 8 nodes
if 1 index was 1 node, i would need 8 indices to represent 8 nodes
thus, 8x the mem
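The memory argument as back-of-the-envelope arithmetic, assuming 32-bit queue indices (the index size and function names are assumptions, not stated in the chat).

```python
INDEX_BYTES = 4  # assuming 32-bit indices


def queue_bytes_packed(passing_children):
    """One index per passing child; that index stands for the child's 8 children."""
    return passing_children * INDEX_BYTES


def queue_bytes_unpacked(passing_nodes, branching=8):
    """One index per real node: each passing node writes all of its children."""
    return passing_nodes * branching * INDEX_BYTES
```

For 1000 passing entries this gives 4000 bytes against 32000 bytes, the 8x from the message above. Passing `branching=16` shows what the same comparison would look like for a 16-wide tree.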
i should try a bvh16 
No you wouldn't. Just write the parent node index.
E.g. given node 0 with children [1, 7], write node 0
And then in the next stage 8 threads pull node 0, and process 1-7
Although idk how you would handle < 8 children hmmm
that's an extra dependent mem load
you can handle less than 8 the same way i currently do (by just have zeroed nodes)
Ok so the options are basically:
8 threads per node
Each thread processes one child of the node, doing frustum + occlusion culling (different per thread), and LOD test (the same for each thread in the node, I think?)
Each thread with a passing child writes one node index (containing 8 children of the child) to the output buffer
OR
1 thread per node
LOD test, frustum, + occlusion culling using the AABB of the node itself and not the children
If passing, write 8 indices (8 children of the node) to the output buffer
the lod test would be different for every thread in both cases
might also wanna experiment with bvh16 or bvh4
god this is so complicated
Gotta traverse the BVH
then once you get a leaf, it's a lod group
So then iterate that to get meshlets
And then test the meshlets too
Yeah, just, so many damn pieces
The BVH nodes, where the leaves point to LOD groups, and then the meshlets within the group...
And there's so many different ways you can inline different parts of the traversal or distribute it across threads
great read
Glad you enjoyed it! I'm realizing I need a comment section on my blog so I can see if people are actually engaging with it or not
AI?
sorry lol
"Glad you enjoyed it!" immediately made me think of ai
that crappy snapchat ai
oh lol
how much faster is sw raster for you guys?
in my scenes its around 20-30%
a lot worse than what nanite ppl claim with up to 3x
Vs mesh shaders or vertex shaders?
It's much faster than vertex shaders, but from what I've heard not much faster than mesh shaders, _at reasonable triangle sizes_
You'll get more improvements if you have really tiny 1 pixel triangles
Also the dag structure matters a lot more than the raster method
vs mesh shaders
but im already testing 1 pix triangles
hmm
ill say hw raster only then
Faster is faster. If you've already done the work, might as well keep it.
You should read my blog though, I covered this
20-30% is a pretty big boost tbh
if it's already done, keep it
id have to impl the splitting into hw and sw meshlets
i wonder where the nanite guys got the 3x number
i dont think vertex shaded meshlets are that much worse
mesh shaders didn't exist back then
having to generate index buffers is slow
also nanite has much smaller triangles than you probably
sw raster is only really faster if all triangles are about 1-3 px
in my test they are subpixel mostly
so they are much smaller than nanite on average
maybe that is a problem for sw raster
in ue nanite they are much larger tho
maybe that changed
80% are subpixel culled
yes
makes sense then
the hw raster and pixel quads are what's slow
if everything is being culled, mesh is gonna be almost as fast as sw
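For context, a sketch of one common form of subpixel (small triangle) culling. The chat does not say which test is actually used. This version assumes one sample per pixel at the pixel center (x + 0.5, y + 0.5) and only looks at the screen-space bounding box, so it is conservative: it never rejects a triangle that covers a pixel center.

```python
import math


def covers_no_pixel_center(tri):
    """tri: three (x, y) screen-space points in pixels.
    True when the bounding box contains no pixel center on at least one axis."""
    xs = [p[0] for p in tri]
    ys = [p[1] for p in tri]

    def axis_misses(lo, hi):
        # Centers sit at k + 0.5; check whether any integer k has lo <= k + 0.5 <= hi.
        return math.ceil(lo - 0.5) > math.floor(hi - 0.5)

    return axis_misses(min(xs), max(xs)) or axis_misses(min(ys), max(ys))
```

Triangles rejected this way never reach the rasterizer at all, which fits the point above that when most triangles are culled the raster method matters less.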
im guessing the vertex shader and cull work is disproportionate in my test yea
cause most tris are culled
the actual rasterization is not a bottleneck
but the meshlet processing is
i need lods
edges are about 3-5px
when things get small it hits 2-3px total per tri
@fiery bolt what does your world format look like
Taking some time off virtual geometry stuff to go back to RT lighting instead.
i think u've encouraged me to try and start compressing my meshlets
Enjoy! It's very cool.
You might also want to read https://gpuopen.com/learn/mesh_shaders/mesh_shaders-meshlet_compression/. It came out after I already wrote everything, but they have a slightly different way of doing things.
nice, thanks!
@primal shadow im calling you out
from afar your pfp looks like some sort of pokemon: the wings being ears, the girl being the mouth
Huh, weird
It's stolen from the rust zulip, although idt they use the icon anymore
And I have no idea what the original source is
where is lvstri
btw lukasino i made a renderdoc like inspector for taskgraph
ill put it into daxa as a util some time this quarter
I know that you have it in timberdoodle but wasnt expecting it to get into daxa as util
i separated it mostly from tido now so should be mostly easy
oh yeah you meant something else 
The image inspection came to my mind
My brain is not braining for past few days
yea thats what will come
in the util
it injects itself into task graph so you can inspect images in attachments
cool
what would the use of something like that be?
its renderdoc lite in real time
debugging features in your engine
very nice since my renderdoc capture is always broken somehow
i couldnt use it since forever cause i have rt
tbf I havent used renderdoc in months
they finally made it just ignore rt
yeah
yeah but you'd need buffers and etc too
yep buffers are the big downside
i hope i can extract layouts and stuff from slang
yeah personally I'd just open in renderdoc and use it
renderdoc constantly doesnt work
I forgor the name but they have this feature
doesnt work with lots of extensions
If it works thats other thing
nsight?
true
I mean neither do you 
I thought you didn't have buffer view support yet?
yep


fr tho the real time view is super duper convenient
also much better than spamming printf 
debug printf works for you? 
yes
lucky
heh
hmm it also works for me
I just go to vkconfig get the printf preset and launch my thing and add printf to my shader
tools, am i right
thats it
ok maybe I should try again sometime
I should also implement a pass viewer and debug view or something like you guys
lambdas at home dropped few months ago
ifunc?
function interface
my issue is that the newest slang version doesn't build on windows if I disable some targets or something
rust moment 
Come to dark side(C++)
real
hell naw
you got to suffer like us
just use daxa
in rust?
there were some rust wrappers around daxas c api
bro I'll just use my own render graph at that point lol
I proposed to nuke daxa c api since nobody uses it
heh?
I have my own 'RHI'
Come to the darkest side and use C++ on both host and device
I wish but I am not touching metal
hahahaha
the right one is anyone suggesting daxa, and the left is other person
same goes for nabla
(dont tell devsh i said this)
they will be enlightened soon
do you need a new vacuum? the daxa 3000 has the best ....
my callbacks are fully lifetimed 
btw @primal shadow if you make nanite, do you also plan to make vsms?
Vsms are kinda needed to capture the detail of the high poly geo and avoid popping.
I feel like it doesnt make much sense to have nanite geo if the shadows still pop/dont show the small details
rt shadows?
that would prob be what i would try
good luck with rebuilding the AS for the nanite changing geo
ah forgot about that
rtx megageometry hard req 
Nah. Right now just standard CSM, but eventually RT based lighting.
lets hope mega geo will do the trick
That's only necessary for GI though
You already have the high poly gbuffer, you can just trace direct lighting off of that and have it be good quality
i was thinking about the shadows for direct lighting
the trace still needs the geo in the bvh to be in that quality too to some extent tho
Yeah, just RT from the gbuffer you rasterized. Why wouldn't that work?
I don't think so. Mayyyybe for small occluding details you would, but that's what the screen trace is for.
Screen trace + RT direct lighting should be plenty.
you defo need the screen trace for the small details
And depending on what mega geometry is it might involve scrapping the entire raster system I've written anyways
the future is chrome raytraced
I hope so
I have absolutely 0 motivation to work on stuff rn
I have everything sketched out for the VG BVH changes, but just don't feel like sitting down and writing it
I wonder what happened here
shhh, he is eeping
I see Meshoptimizer recently added meshopt_partitionClusters which could finally replace METIS.
free nanite?
Tried my hand at it. Can't get the LOD chain as deep as I'd like; many triangles are getting stuck at LOD 1.
Not using meshopt_buildMeshletsFlex yet but the Stanford bunny won't even simplify into LOD 1.
Let me see if I can compare METIS.
A DAGon.
i think your meshlets or groups or both are fucked
Both. The initial meshlets are very rough.
Initial clusters with meshopt_buildMeshlets vs meshopt_buildMeshletsFlex.
btw when exporting did you merge vertices and shade smooth
that mesh looks horrid
you should use the original scan
and merge vertices in blender on import and then shade smooth
Thanks. Will give it a try.
Downloaded from https://casual-effects.com/g3d/data10/index.html#mesh11 and merged vertices by distance in Blender.
Not sure how correct my DAG cut is, looking at https://jglrxavpok.github.io/, but seems to be more or less what Meshoptimizer's "free nanite" example gives me.
you could check out my code
You should have a debug view to write the lod index or the computed size on screen of a meshlet
It helps a lot
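A sketch of the kind of debug coloring suggested here: map each LOD index to a distinct color and write that instead of shading. The function name, the golden-ratio hue step, and the saturation value are assumptions; in an engine this would live in a shader rather than in Python.

```python
import colorsys


def lod_debug_color(lod_index, hue_step=0.61803398875):
    """Map a LOD index to an 8-bit RGB color; neighbouring LODs get distant hues."""
    hue = (lod_index * hue_step) % 1.0
    r, g, b = colorsys.hsv_to_rgb(hue, 0.8, 1.0)
    return (round(r * 255), round(g * 255), round(b * 255))
```

The same mapping works for a meshlet or group index, which makes a bad DAG cut or stuck LODs visible at a glance.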

Slowly getting my life back together
woah
He returns
This was a good sanity check.
Looking over compression and now streaming so I can throw a trillion triangle Deccer cube in.
lvstri!!!! Welcome back
Party's over. Hide the frog food.
was it easy to implement?
or at least significantly easier than using metis
Yes. See the nanite.cpp demo example in the meshoptimizer repo.
Implemented streaming to find that I still have cracks in my DAG with larger assets like Activision Caldera or Intel Jungle Ruins.
Curious now as to what Unreal would produce for these.
Early this year, NVIDIA released their new raytracing technology, RTX Mega Geometry, alongside an impressive Zorah demo. This demo was distributed as a ~100 GB Unreal Engine scene - that can only be opened in a special branch of Unreal Engine, NvRTX. The demo showcased the application of new driver-exposed raytracing features, specifically clust...
headline describes my average realtime application

