#Iris - A Journey through OpenGL and beyond to learn Graphics

fiery bolt
#

1 ms for culling

#

and about 9 ms for raster

#

i need software raster now

fiery bolt
#

ok holy fuck it's rendering 1.2 million meshlets that doesn't sound good bleaker_kekw

#

same scene in unreal takes 1.92 ms

glass sphinx
#

nice culling hole 😈

#

very impressive tho

#

is this lodding also?

fiery bolt
primal shadow
#

@wicked notch what heuristic did you use for software raster vs hardware raster again? I'm having a hard time figuring out a good metric. Nanite briefly mentions that they compute some kind of longest triangle edge data per cluster and use that iirc.

fiery bolt
#

project that to screenspace and software raster if it's less than 16 pixels or something iirc?
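A rough sketch of the heuristic described above: project a cluster's longest triangle edge to screen space and pick software raster when it falls under a pixel threshold. The function names, the 16 px default, and the use of a single view depth per cluster are assumptions for illustration, not Nanite's actual code.

```python
import math

def projected_edge_pixels(edge_len_world, view_depth, fov_y_radians, screen_height_px):
    """Approximate on-screen length in pixels of an edge facing the camera at view_depth."""
    # Height of the view frustum at this depth, in world units.
    frustum_height = 2.0 * view_depth * math.tan(fov_y_radians / 2.0)
    return edge_len_world * screen_height_px / frustum_height

def use_software_raster(max_edge_len_world, view_depth, fov_y_radians,
                        screen_height_px, threshold_px=16.0):
    """True when the cluster's longest edge projects below threshold_px (hypothetical default)."""
    px = projected_edge_pixels(max_edge_len_world, view_depth, fov_y_radians, screen_height_px)
    return px < threshold_px
```

Using the nearest depth of the cluster for `view_depth` keeps the estimate conservative, since closer edges project larger.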

primal shadow
#

For now I've just given up on tweaking it, left it as something silly, and have moved on to other bits

#

Apparently Brian Karis read my blog post though, awesome!

wicked notch
#

if we never hear from you again, UE has won

primal shadow
#

He said he enjoyed it 😅

fiery bolt
#

unlike that tencent talk

runic surge
faint crane
#

I saw both of Tencent's talks between Advances and Moving Mobile Graphics but couldn't understand most of it.

#

Wish I got a picture, but Jasmine's article was linked in Advances. Slides are up, let me link it.

fiery bolt
#

ok so I just read the tencent slides and... it's just nanite but worse?

#

it's minimum effort nanite reimpl

#

instance cull, parallelize all meshlets, no software raster, and some wacky lod curve

faint crane
#

Could the lod curve somehow compensate for lack of software raster? They were targeting mid-high end mobile.

wispy spear
#

@solar sentinel's too : )

#

our frogs invading the world

wicked notch
#

buuut the benefits of sw raster are elsewhere too

#

for example with VSM, sw raster is pretty much the greatest source of speedup

delicate rain
#

lies, culling is

wicked notch
#

culling is the second greatest source of speedup KEKW

#

nah saky's right it's culling first and then sw rast

#

but the advantage of not going through the regular pipeline for triangles is massive

delicate rain
#

I was this close to getting into another insane argument

wicked notch
#

especially for small ones

delicate rain
#

no more

wicked notch
#

wink wink bistro trees

delicate rain
#

bistro trees bleakekw

wicked notch
#

I genuinely believe that sw rast for bistro trees would fix everything

#

but alas

#

I still don't have nanite shippable bleakekw

delicate rain
#

my bsm is also still borked

#

I've been distracted by shiny rays

wicked notch
#

same

solar sentinel
#

I'm doing hardware raster as well. Software raster doesn't pay off for anything larger than a pixel sized triangle. In the UE source code, it's literally called MicroPoly raster (...for a reason). There's overshading to compute hardware gradients. Even Nanite bins and rasterizes wider triangles via HW.

fiery bolt
#

nanite does triangles smaller than 16 pixels or so with SW

#

not just pixel-sized

wispy spear
solar sentinel
#

Holy shit

solar sentinel
solar sentinel
primal shadow
#

I still haven't figured out a good heuristic for SW vs HW. Nsight shows like +/- 0.5ms all the time for the raster pass, making measuring perf impossible. Idk how to improve when the results are so variable.

frank sail
#

Make the test scene bigger

solar sentinel
solar sentinel
primal shadow
wicked notch
#

more.meme.gif

primal shadow
solar sentinel
#

But total number of issued calls is once per material.

primal shadow
solar sentinel
#

I'm not doing Nanite.

#

I'm doing visibility buffer rendering only 🙂

primal shadow
#

Ohhh ok

fiery bolt
#

my stress test scene has like 100 billion tris total bleakekw

#

spends about 10 ms in raster (hw, no sw)

#

well, i impld sw inside my mesh shader and that bumped it up to 20 ms lol

primal shadow
#

Not home rn, will check later

fiery bolt
#

@wicked notch I sped up mesh generation significantly by generating meshlet AABBs correctly instead of going through every single vertex in the mesh bleaker_kekw

wicked notch
#

crazy how that works KEKW

fiery bolt
#

it even speeds up rendering because now meshlets are being culled!

velvet marsh
#

is your AABB structure something you wrote yourself?

fiery bolt
#

yeah

#

i wrote some goofy bvh too

#

it... works?

#

the nanite presentation glossed over how it works so idk how correct i am

primal shadow
#

@fiery bolt I forget who is who, did you do the WebGPU Nanite impl?

#

(Scthe on github)

fiery bolt
#

nope not me

#

i'm doing it in rust and vulkan

primal shadow
#

@fiery bolt is your code open source anywhere? I'd like to check out your meshlet DAG building/simplification code

fiery bolt
#

i think i'm using meshopt for simplification currently, not my own thing

#

changed to meshopt to debug something then forgor to switch back KEKW

primal shadow
#

Looks... very similar to my own code 😅

#

Did you base it off of mine? (which is totally fine, just curious what your process was)

primal shadow
#

Nice, happy it helped

fiery bolt
primal shadow
#

If you end up learning anything, let me know please!

#

So far today's experiments have revealed:

  • Setting target error = 1.0 helps a lot. No reason to limit the target error.
  • 255 v / 128 t is better than 64/64 (meshopt won't let me do 256 vertices :P)
fiery bolt
#

yeah i did the first one

#

and nanite does the latter

#

i didn't do the latter exactly because meshopt doesn't like 256 lol

#

i do 128/128

#

well, 128/124 because mesh shaders

primal shadow
#

I got better triangle fill rate with 255/128 vs 128/128

fiery bolt
#

what's your test model?

primal shadow
#

Stanford dragon, currently

fiery bolt
#

that is shade flat by default so you have zero vertex reuse

#

i had that issue with all the three stanford models

primal shadow
#

I have:

  • Stanford dragon
  • Stanford bunny
  • Jinx from Arcane from Sketchfab merged down to 1 mesh (I don't really support multiple materials per meshlet mesh)
  • Icelandic cliffs from Quixel Megascans
fiery bolt
#

every tri has unique vertices
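A minimal sketch of what "zero vertex reuse" means and how welding by position restores it. This assumes exact position matches (no tolerance) and ignores normals and UVs; the helper name is hypothetical.

```python
def weld_by_position(positions, indices):
    """Merge vertices with identical positions and return (welded_positions, welded_indices)."""
    remap = {}              # position -> new vertex index
    welded_positions = []
    welded_indices = []
    for i in indices:
        key = tuple(positions[i])
        if key not in remap:
            remap[key] = len(welded_positions)
            welded_positions.append(key)
        welded_indices.append(remap[key])
    return welded_positions, welded_indices

# A flat-shaded quad: two triangles, six vertices, none shared.
positions = [(0, 0, 0), (1, 0, 0), (1, 1, 0),
             (0, 0, 0), (1, 1, 0), (0, 1, 0)]
indices = list(range(6))
welded_positions, welded_indices = weld_by_position(positions, indices)
```

After welding, the quad uses 4 vertices for 6 indices, so a meshlet's vertex limit stops being the bottleneck before its triangle limit.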

#

the bunny too

#

oh yeah i also use the max of error instead of adding the max child

fiery bolt
primal shadow
#

So if LOD 0 has 0 error, LOD 1 has 10 error, and LOD 2 has 20 error

#

LOD 2's total error should be 30 relative to the base mesh (LOD 0)

fiery bolt
#

not really

#

because it's a world space displacement

#

higher LODs will naturally have a higher error

primal shadow
#

Why? And that's tangential, no?

fiery bolt
#

lemme try to ms paint something

primal shadow
#

You don't want to know error relative to the previous LOD, you want to know error relative to the original mesh right?

fiery bolt
#

actually no idk how to draw it lmao

#

basically, when you simplify, you're collapsing edges into vertices

#

and the simplification error is the maximum displacement a vertex moved (sort of)

#

as you simplify more, edges get longer

#

so when you collapse, you get a higher error naturally

primal shadow
#

Sure, and that's a consistent metric for DAG cut purposes, yes

fiery bolt
#

i think this would lead to double-counting if you add

#

making your error higher than it should be

primal shadow
#

But that means your error projection is no longer saying "is this LOD imperceptible from the base mesh at this distance"

fiery bolt
#

the nanite presentation says they max, not add iirc

#

so in this case, if you're collapsing the red edge

#

you get this as the output

#

then you collapse this

#

leading to this

primal shadow
#

Comparison of 255/128 vs 128/128 on icelandic cliffs btw: https://paste.rs/AFqkf.txt. Meshlet occupancy is a map of triangles_per_meshlet:count_of_meshlets key:value
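For reference, an occupancy map of that shape can be built as sketched below. The function names and the fill-rate metric are assumptions for illustration, not the code that produced the linked paste.

```python
from collections import Counter

def meshlet_occupancy(triangle_counts):
    """Map triangles_per_meshlet -> count_of_meshlets, sorted by triangle count."""
    return dict(sorted(Counter(triangle_counts).items()))

def average_fill(occupancy, max_triangles):
    """Fraction of the available triangle slots that are actually used."""
    total_meshlets = sum(occupancy.values())
    total_triangles = sum(tris * count for tris, count in occupancy.items())
    return total_triangles / (total_meshlets * max_triangles)

occupancy = meshlet_occupancy([128, 128, 64, 128])
fill = average_fill(occupancy, max_triangles=128)
```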

fiery bolt
#

the total displacement in the two simplification steps is equal to half the length

#

which is equivalent to the second collapse error

primal shadow
#

can you rephrase that last bit?

#

I understand the images, but trying to understand the implications still

fiery bolt
#

so if you compare the first and last image, the vertex in the center has moved half the total length, right?

#

that's your final error of the simplification

primal shadow
#

That's true

#

Ok makes sense to me then

fiery bolt
#

yeah, that's equivalent to the displacement done by the second collapse

primal shadow
#

Also I reread the nanite slides rq

#

They say they do the max of child's parent error for the BVH

#

They don't mention how they handle choosing the parent error in the first place though

fiery bolt
#

the parent error bit is higher

#

slide 66

primal shadow
fiery bolt
#

yeah at least as large implies max imo

#

@wicked notch what does nanite do

wicked notch
#

it's a leq comparison

#

so it doesn't matter

#

if parent >= threshold && current <= threshold
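A small sketch of that cut test, assuming each cluster's own error and its parent's error have already been projected to screen space. The dictionary layout and names are hypothetical.

```python
import math

def select_cut(clusters, threshold):
    """Pick clusters whose own error is acceptable but whose parent's error is not."""
    return [c["name"] for c in clusters
            if c["parent_error"] >= threshold and c["error"] <= threshold]

# One chain through the DAG, with monotonically increasing errors.
chain = [
    {"name": "lod0", "error": 0.0,  "parent_error": 10.0},
    {"name": "lod1", "error": 10.0, "parent_error": 20.0},
    {"name": "lod2", "error": 20.0, "parent_error": math.inf},
]
```

With monotonic errors exactly one level passes for a given threshold, except when the threshold lands exactly on an error value, where both comparisons being inclusive selects two levels.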

fiery bolt
#

no for building error

#

is it max or add

wicked notch
#

add I think

#

dont member, let me access the secret sauce (UE's source)

primal shadow
# fiery bolt yeah at least as large implies max imo

So if that's true, then it's this?:

  • LOD 0 clusters have error = 0.0, parent_error = INFINITY
  • Group and simplify LOD 0 to form a group
  • Group error = max(group_error_from_simplify, all_child_errors)
  • LOD 0 parent errors = group_error
  • LOD 1 error (not parent) = group_error
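The steps in that list can be sketched as follows. This is an illustration of the max-based propagation only; the `Cluster` type, the function name, and the hard-coded simplify errors are assumptions, and real code would get the error from the simplifier.

```python
import math
from dataclasses import dataclass

@dataclass
class Cluster:
    error: float = 0.0               # LOD 0 clusters start with zero error
    parent_error: float = math.inf   # no parent yet

def build_parent_level(children, simplify_error, parent_count):
    """Group `children`, then return the clusters of the next LOD."""
    # Group error is at least as large as every child error (max, not add).
    group_error = max([simplify_error] + [c.error for c in children])
    for child in children:
        child.parent_error = group_error
    return [Cluster(error=group_error) for _ in range(parent_count)]

lod0 = [Cluster() for _ in range(4)]
lod1 = build_parent_level(lod0, simplify_error=10.0, parent_count=2)
# A simplify error smaller than the children's error is clamped up by the max.
lod2 = build_parent_level(lod1, simplify_error=7.0, parent_count=1)
```

Taking the max keeps errors monotonic up the DAG even when a later simplification step reports a smaller error than an earlier one.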
fiery bolt
#

yeah

wicked notch
#

UE is dead

#

😔

fiery bolt
fiery bolt
#

why does megascans not have a way to sort by tri count smh

wicked notch
#

it is indeed add

fiery bolt
#

they add child errors to the parent error?

wicked notch
#

the bounds of the group directly affect the error calculation

#

which is shrimply copied

fiery bolt
#

me no understand

wicked notch
#

let me show you da wae

#

actually I am retarded

#

yes it's max

fiery bolt
#

lol

#

wtf is that formatting

wicked notch
#

UE formatting

fiery bolt
wicked notch
#

this carries over from 1996 code probably

#

still as beautiful as ever 30 years ago froge_love

primal shadow
#

Where the heck does LODError come from?

fiery bolt
#

i assume that's set to 0

primal shadow
#

Seems like a field in Cluster

#

C++ has implicit this-> I guess

fiery bolt
#

it does

#

does megascans not have a high poly source asset with more than 2 mil tris

wicked notch
#

blender -> subdivision modifier -> simple

#

be careful because you can turn those 2 million triangles into 2 quintillion

#

speaking from experience bleakforg

fiery bolt
#

also for runtime error calculation, you should store the group bounding sphere and place your error test sphere on that, instead of at the center of the group

#

@primal shadow ^

fiery bolt
#

i'll just use lucy then

fiery bolt
#

currently you project a sphere with center = group center and radius = error, right?

primal shadow
#

Yeah

fiery bolt
#

that undercounts error for tris closer than the center

primal shadow
#

?

fiery bolt
#

you have triangles closer to the camera than the group center right

#

so if you test the error at the group center, those triangles might have an error higher than what you calculate

#

page 11

primal shadow
#

Wait so what's the solution?

#

Use the meshlet's center, and don't have any center for the group?

fiery bolt
#

store the entire bounding sphere of the group

#

and place the error test sphere at the closest point to the camera

primal shadow
#

Oh I see

#

Store the actual bounding sphere for the group, including radius

fiery bolt
#

yeah

primal shadow
#

Calculate closest point on the sphere to the camera

#

Then place the sphere at center = closest point, radius = error
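A minimal sketch of that placement. Returning the camera position when the camera is inside the bounding sphere is an assumption made here to keep the function total; the function name is hypothetical.

```python
import math

def error_sphere_center(camera, center, radius):
    """Closest point on the group bounding sphere to the camera."""
    to_center = [c - p for c, p in zip(center, camera)]
    dist = math.sqrt(sum(x * x for x in to_center))
    if dist <= radius:
        # Camera inside the bounding sphere: assumed fallback, snap to the camera.
        return tuple(camera)
    # Walk from the camera towards the center, stopping at the sphere surface.
    s = (dist - radius) / dist
    return tuple(p + x * s for p, x in zip(camera, to_center))
```

The error sphere (radius = error) is then projected from this point instead of from the group center, so the estimate is never smaller than the error of the nearest triangles.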

fiery bolt
#

yep!

#

the only issue is that idk what to do when the camera is inside the sphere

#

bounding or error sphere

#

because then your sqrt(d2 - r2) gets a negative value in it

primal shadow
#

Bleh that's going to be a lot more involved change, I'll add this to the TODO list

fiery bolt
#

which is sus

primal shadow
#

Agreed, hmm

#

Does this really matter though?

fiery bolt
#

i'm not sure

primal shadow
#

I mean yeah we want the LOD calculation to be as accurate as possible, but I feel like it's such a handwave to begin with...

fiery bolt
#

idk what the source of my overculling is KEKW

#

it's probably occlusion culling

#

but it might be this

primal shadow
#

My occlusion culling is just broken

#

Using SPD to build the depth pyramid is not correct unfortunately

fiery bolt
#

i ported themaister's code and the hzb seems to be correct

#

but it's still broken

primal shadow
#

Is that the modified SPD one? I wanted to use that

#

I have a question though

fiery bolt
#

yeah

primal shadow
#

Say your depth texture is a non-power-of-2

#

What size do you make mip 0 of the depth pyramid?

fiery bolt
#

half of that

#

rounded down

primal shadow
#

So 1800: 1800/2 = 900, rounded down to 512?

fiery bolt
#

nono

#

just 900

#

by round down i meant if it's odd

primal shadow
#

So you don't enforce that the new depth pyramid is a power of 2? Hmm

fiery bolt
#

nope

primal shadow
#

I think that may be wrong, 1s

#
            (depth_view.get_view_width() + 63u) & ~63u,
            (depth_view.get_view_height() + 63u) & ~63u,
#

I think that rounds up to the nearest multiple of 64?
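A quick check of that bit trick, alongside the plain "half, rounded down" sizing mentioned earlier. The function names are hypothetical; the arithmetic is the same as in the snippet above.

```python
def round_up_to_multiple_of_64(x):
    """Same trick as (x + 63u) & ~63u: round up to the next multiple of 64."""
    return (x + 63) & ~63

def half_rounded_down(x):
    """Mip size as half the source size, rounded down, never below 1."""
    return max(x // 2, 1)
```

So a 1800-wide depth texture gets padded to 1856 by the first function, while plain halving gives a 900-wide mip 0.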

fiery bolt
#

yeah

primal shadow
#

Are you doing that? That may be your issue if not

fiery bolt
#

yeah i'm not

#

but why is he doing that

#

is it because each workgroup does a 64x64 block

#

and that he doesn't bound check in the shader thonk

primal shadow
#

Probably? Idk

#

Not sure if it's bounds checks though

#

Could be correctness

#

Like, extra bounds checks probably matters a lot less than extra vram usage from this

fiery bolt
#

i'll add that and check

primal shadow
#

Hopefully you can figure it out, I don't really want to go stare at hiz code 😭

fiery bolt
#

i'll let you know

primal shadow
#

Wait before you start I have another question

#

When simplifying meshlet groups, which edges do you need to lock?

fiery bolt
#

ones that are only a part of one tri

primal shadow
#

elaborate?

fiery bolt
#

uhhhh

#

you need to lock the border

primal shadow
#

Which border though?

fiery bolt
#

of the group

primal shadow
#

The border of the group, or border between meshlets?

fiery bolt
#

you don't want to lock between meshlets no

#

only the group
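A rough sketch of finding the group border: count how many triangles in the merged group use each edge, and lock the vertices of edges used only once. This assumes the indices are welded by position (no duplicated vertices along seams); the function name is hypothetical.

```python
from collections import Counter

def group_border_vertices(group_triangles):
    """Vertices on edges that belong to only one triangle of the merged group."""
    edge_use = Counter()
    for a, b, c in group_triangles:
        for u, v in ((a, b), (b, c), (c, a)):
            edge_use[(min(u, v), max(u, v))] += 1
    locked = set()
    for (u, v), uses in edge_use.items():
        if uses == 1:
            locked.update((u, v))
    return locked

# A fan of four triangles around vertex 0: only the outer ring is border.
fan = [(0, 1, 2), (0, 2, 3), (0, 3, 4), (0, 4, 1)]
locked = group_border_vertices(fan)
```

Edges between meshlets inside the group are used twice once the group's triangles are merged, so they stay free to collapse.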

#

that's the secret sauce to nanite

#

oh yeah i also use larger groups than 4

#

i think it helps simplification perf

primal shadow
#

What do you use?

fiery bolt
#

i do 8 meshlets per group

#

you might wanna test it out though

primal shadow
#

The groups are too big for meshopt to split 😅

primal shadow
#

Oh btw, are you using METIS or meshopt for clusterizing?

#

Interested to know if you compared the two at all

fiery bolt
#

might try metis

fiery bolt
primal shadow
fiery bolt
#

in each group?

#

i do the same

#

it shouldn't be causing issues

#

meshopt clusterizes the source mesh after all

#

and it does lucy (28 mil tris) in 2-3 min

fiery bolt
#

oh i also made this shitty python script that automatically tiles any gltf

primal shadow
fiery bolt
primal shadow
#

Yes but not for meshlet meshes

#

Our asset processing APIs are sadly fairly poor atm, so I don't have a good way to convert GLTF -> bunch of meshlet meshes + scene file

#

(yet)

fiery bolt
#

ah

#

my issue is that my engine is not an engine

#

the only thing it can do is load gltfs KEKW

primal shadow
#

Join Bevy 🦀

fiery bolt
#

does bevy do vulkan cutecatNE

primal shadow
#

No, wgpu.

#

There is an alternative vulkan backend in a non-official crate, but it doesn't work with any existing stuff ofc.

faint crane
fiery bolt
#

lol

#

you aren't a billion dollar company doing a siggraph presentation though

faint crane
#

Now I just need an excuse to skip on my externally broken occlusion culling.

#

I swear it was broken when I found it. No way to put it back together.

fiery bolt
#

no no

#

you have a week

#

it better be working by then

primal shadow
loud crag
loud crag
#

it's something i'd put under "code correctness" which would be the first thing to work on

primal shadow
#

There's too much else to work on in the mean time, small stuff like this is not a priority.

fiery bolt
#

did some optimization and now it culls 1 million stanford dragons in 2 ms froge_love

#

raster takes 23 ms though (only hw, no sw) KEKW

#

800 billion tris at 30 fps

ebon ruin
#

But can it support foliage

wicked notch
#

overdraw is not real it can't hurt you

#

overdraw:

fiery bolt
#

i have achieved scene independence

#

7.2 trillion triangles at 60 fps

#

@primal shadow the edge detection really does help simplification massively

#

thanks for the insight

primal shadow
#

edge detection?

#

but uhh glad to help

fiery bolt
#

like the edge classification

#

internal/external

primal shadow
#

oh, np

#

is it not super slow for you?

#

at build time

fiery bolt
#

not really no

#

the other stanford dragon with 7.2 million tris imported in about 1.5 min

primal shadow
#

¯\_(ツ)_/¯

#

Well, I'll take a closer look at your code soon

#

I need a break from DAG building 😅

fiery bolt
#

it's probably the multithreading tbf

#

i'm running on a 12900k

primal shadow
#

what does your highest LOD look like?

#

Does it collapse to a sphere?

fiery bolt
#

good question lemme check

primal shadow
#

Also do you have a link to your github? I lost it

fiery bolt
#

it is... whatever this is

fiery bolt
primal shadow
#

I mean it's kinda dragon shaped

#

sure

#

Managed not to turn into a sphere

fiery bolt
#

probably because meshopt doesn't generate vertices

primal shadow
#

You went back to your own simplifier?

#

Also hey, at some point I'd appreciate an explanation on how to implement subpixel SW raster

#

The stuff I found online never made sense to me

#

And it seems like you implemented it

fiery bolt
#

but i will soon

#

i'll let you know when i do

fiery bolt
primal shadow
fiery bolt
#

yeah

primal shadow
fiery bolt
#

my sw rasterizer is actually really bad

#

doing everything in hardware is very slightly faster lmao

primal shadow
#

Really? I saw way faster speeds with SW raster

#

Did you have mesh shaders already though?

fiery bolt
#

and i do backface culling in the mesh shader

primal shadow
#

That explains it. Nanite was started before mesh shaders, and mesh shaders give a lot of similar speedup.

fiery bolt
#

i think with a bit of optimization and async compute overlap i can eke out a fair bit of perf with sw tbh

#

mainly the async compute overlap

#

the mesh shaders are completely bottlenecked on hw so they have terrible util

primal shadow
#

Yeah I don't have mesh shaders or async overlap 😭

fiery bolt
#

impl mesh shaders into wgpu + naga and use them on native froge_evil

primal shadow
#

Too much work... I hate touching wgpu/naga

#

Not my kind of thing

fiery bolt
#

naga isn't terrible

#

i rewrote/restructured it for module-level scoping long ago

#

wasn't too bad

#

no idea how the code is today though

wicked notch
ebon ruin
#

Question about Nanite/what you guys do

#

Meshlets are merged as the LOD gets lower to prevent edge cruft

#

Doesn't this rely on the LOD of both meshlets decreasing? So what would happen if you need to lower one meshlet's LOD, but another meshlet must stay higher?

fiery bolt
#

LOD isn't decided at meshlet level, but at meshlet group level

#

and you only lock the boundary of the group

#

and after simplification, you split the group into meshlets again

primal shadow
fiery bolt
#

yeah

#

i think that part of my code is broken actually

#

i see cracks

ebon ruin
#

i see

#

thank you all for your time

primal shadow
#

I hate tangents

#

Idk why my code isn't correct but it's not

fiery bolt
#

compute analytic derivatives instead of using ddx/ddy perhaps

#

@primal shadow oh yeah i also found out that my border vertex detection was completely wrong, fixed it and pushed to my branch (PRed to your bevy repo)

#

might wanna test it out again lol

primal shadow
fiery bolt
#

ah

#

no idea then KEKW

primal shadow
fiery bolt
#

will do if i remember

#

lmao i fixed edge classification which led to fixing cracks which led to significant occlusion culling improvements which now means i can do 7.2 trillion triangles in under 3 ms

#

4.3 +- 0.3 if i lock clocks to base

wispy spear
#

!remindme 3d remind jasmine about the thing

vivid boughBOT
#
New Reminder | ID:72591537

Alright deccer, I'll remind you in 3 days about:

remind jasmine about the thing

fiery bolt
#

mobile drivers coming in clutch (words you'd never thought you'd ever see)

wispy spear
#

do you mean crutches?

fiery bolt
#

@languid vector methinks i have found a solution

#

the original sphere projection algo

#

that takes near clipping into account

#

project the full bounds to the screen, and scale the error using that

languid vector
#

pog thanks!

fiery bolt
#

idk if it works yet

languid vector
#

let me guess - you classified edges without generating shadow index buffer with geometry-only data?

#

the only thing I wonder is how the hell you manage to do 7.2 trillion triangles in 4ms

#

Is hierarchy this major optimization?

fiery bolt
#

so it was unlocking group borders

#

leading to cracks

fiery bolt
#

you can get rid of a bunch of instances early

#

and never even consider more than 1 or 2 meshlets for them

#

you also do frustum and occlusion culling inside hierarchy traversal

#

so you can save a bunch of bandwidth too

fiery bolt
languid vector
#

I've only done the frustum culling so far

#

I can't really understand what I'm bound by, since not a single metric in Nsight is loaded more than 80%

fiery bolt
#

memory

languid vector
#

though "warp can't launch" is insanely high

fiery bolt
#

or raster

fiery bolt
languid vector
languid vector
fiery bolt
#

what's ISBE alloc stalled at

languid vector
#

I am not sure I understand what ISBE is 😅

fiery bolt
#

idk either

#

it's just a metric

languid vector
fiery bolt
#

I think it's memory allocation for mesh shader outputs

#

if your rasterizer is cooked you'll stall at that because it's still processing other tris

languid vector
#

I also do nanite-style quantization so it should have saved a bit of bandwidth

#

I mean, it is nearly x4 compression

fiery bolt
#

I have zero compression lol

#

it's not about mesh shader read bandwidth, it's about how fast the rasterizer can chug triangles

#

you're raster bound

languid vector
#

I have never thought I will be raster bound bleakekw

#

and I guess the only way to solve it occlusion culling + sw raster?

#

afaik occlusion culling is a massive optimization

#

really massive

fiery bolt
#

how large is your test scene

languid vector
fiery bolt
#

how many tris is that

#

you should use the xyzrgb dragon froge_love

#

it's got 7.2 million tris in it

#

or Lucy froge_love

languid vector
fiery bolt
#

27 mil iirc

languid vector
fiery bolt
languid vector
#

1660 Ti

fiery bolt
#

yeah you need occlusion cull

languid vector
#

I guess so kekw

fiery bolt
languid vector
fiery bolt
#

on a 12900k

#

full multithreaded

languid vector
fiery bolt
#

still better than unreal

languid vector
#

unreal takes longer?

fiery bolt
#

I let it run for an hour once and it froze

#

so I killed it

#

if this shit works I can finally start streaming froge_love

languid vector
fiery bolt
#

yeah

languid vector
#

I was just about to ask you if you already have implementation kekw

#

my dumb ass can't handle the maths now

fiery bolt
#

I'm making coffee so I can consume copious amounts of caffeine to understand wtf the paper does

languid vector
#

I personally can't understand the meaning of "conservativeness" in all these papers

fiery bolt
languid vector
fiery bolt
#

your bounding box must be larger than the object, never smaller

#

because otherwise you will cull too much

languid vector
#

yess, it makes sense now. thanks!

#

alr, it has 3 out variables:

out vec2 perpendicularDirection, out vec2 U, out vec2 L

now need to figure out what tf to do with it

fiery bolt
#

uhhhhhhhhhhh error is decreasing as i get closer

languid vector
fiery bolt
#

i've done something wrong

languid vector
#

so as I understand, it takes view and sphere and outputs a polygon with N sides with coordinates in screen-space? I guess I only need AABB from the sphere

fiery bolt
#

the core algorithm takes a sphere in view space and tells you the min, max along a specific axis

languid vector
#

so I simply build an AABB and then compute its area that I use as error estimation?

#

"simply build" using this algorithm I mean

fiery bolt
#

no just one axis

#

and you scale error with that

languid vector
#

ah I see

#

so phi is always 0?

fiery bolt
#

i'm doing y axis so it's always pi / 2

languid vector
#

yup, makes sense

#

does it output size in screen-space?

#

or I need to project it further

fiery bolt
#

it outputs in view space

#

everything is in view space

languid vector
#

I see

#

I guess you are better at maths than me, so if you don't mind, can I come back with questions if have some after reading of the paper?

fiery bolt
#

i don't think i am KEKW

#

but sure

languid vector
#

thanks!

languid vector
#

hold on, I am just stupid

fiery bolt
#

same

languid vector
#

no, it indeed returns nans and I can't understand why

languid vector
#

@fiery bolt so I figured out the issue. I have inverse Z camera and was passing inverted clip planes to the algorithm 🤡

#

tho haven't figured out how to use U and L for projection yet

fiery bolt
#

you should be doing it in view space though thonk

#

inverse Z shouldn't matter

languid vector
#

I mean, near plane was further than far plane

#

which produced a tons of NaNs due to sqrt of negative value

fiery bolt
#

ah lmao

languid vector
#

did u figure out how to use U and L?

fiery bolt
#

that's the screenspace bounds

languid vector
#

projected to the far plane?

fiery bolt
#

no, projected to the screen

#

which is the near plane

languid vector
#

amazing, so no need to project it?

#

if it's already projected to the screen

fiery bolt
#

i think no

#

actually no, the sample code they've given projects

languid vector
#

Though it doesn't take FOV into account

#

which is kinda strange

fiery bolt
#

yeah you need to project

languid vector
#

how do I project projected stuff bleakekw

fiery bolt
#

idk, the sample code does it

#

just project it frog_dum

languid vector
#

Alr I see, I will have a look at sample code

#

so yeah, it works relatively awful without reprojection

fiery bolt
#

i think i have something that sorta works?

languid vector
fiery bolt
#

i rewrote the code a few times

#

and projected

languid vector
#
vec2 parent_U, parent_L;
    GetBoundsForPhiLengyel(0.0f, parent_projected_bounding_sphere.center, parent_projected_bounding_sphere.radius, camera_data.near_clip_distance, camera_data.far_clip_distance, parent_U, parent_L);

    vec4 parent_projected_points[2];
    parent_projected_points[0] = camera_data.proj * vec4(parent_U.x, 0.0f, camera_data.near_clip_distance, 1.0f);
    parent_projected_points[1] = camera_data.proj * vec4(parent_L.x, 0.0f, camera_data.near_clip_distance, 1.0f);

    const float parent_result_error = (parent_projected_points[1].x / parent_projected_points[1].w) - (parent_projected_points[0].x / parent_projected_points[0].w);

this is how I do it

fiery bolt
#
    public f32 project_error(f32x4 bounds, f32 error) {
        let center = mul(this.mv, f32x4(bounds.xyz, 1.f)).xyz;
        let radius = bounds.w * this.scale;
        let err_frac = error / bounds.w;
        
        if ((center.z + radius) <= this.near) return 0.f;

        let dist2 = dot(center, center);
        let a = sqrt(dist2 - center.z * center.z);
        let t2 = dist2 - radius * radius;
        let t = sqrt(max(t2, 0.f));
        let in_sphere = t2 <= 0.f;

        // Tangent points in the 2D (a, z) plane: x along the axis, y = view-space depth.
        // Named `tangents` so the array does not shadow the `bounds` parameter.
        f32x2 tangents[2];
        // cos(theta) = t / dist
        // sin(theta) = r / dist
        // T = (rotate(theta) * (a, z) / dist) * t,
        // removing the dist divide in cos, sin
        // ncos(theta) = t
        // nsin(theta) = r
        // rotate(theta) == rotate(ntheta) / dist
        // therefore, T = (rotate(ntheta) * (a, z) / dist2) * t
        // saving us two divides and a sqrt!
        var v = in_sphere ? f32x2(0.f) : f32x2(t, radius);
        let clip_sphere = (center.z + radius) >= this.near;
        let off = this.near - center.z;
        var k = sqrt(radius * radius - off * off);

        [unroll]
        for (int i = 0; i < 2; i++) {
            if (!in_sphere) 
                tangents[i] = mul(f32x2x2(v.x, v.y, -v.y, v.x), f32x2(a, center.z)) * v.x / dist2;
            let clip_bound = in_sphere || (tangents[i].y < this.near);
            if (clip_sphere && clip_bound)
                tangents[i] = f32x2(a + k, this.near);
            v.y = -v.y;
            k = -k;
        }

        let ndc_size = abs(tangents[0].x / tangents[0].y - tangents[1].x / tangents[1].y) * this.h;
        // NDC size has a range of [0, 2] mapping to [0, height], 
        // but don't divide by 2 because the error is divided by 2 at build-time.
        return ndc_size * err_frac * this.screen.y;
    }
languid vector
#

wait

#

why ur code is in rust

fiery bolt
#

slang

languid vector
#

got it

fiery bolt
#

my cpu code is in rust tho

languid vector
#

Alr, I will get back to the code tomorrow, gotta sleep. it is 3 AM in my country kekw

fiery bolt
#

you still have 3 hours till bedtime!

wicked notch
#

make that 6

primal shadow
#

Time to debug tangents again

primal shadow
fiery bolt
#

why not?

primal shadow
#

you'd have different projections for different parts of the same group, which makes no sense

fiery bolt
#

how would you?

primal shadow
#

and when you do a BVH, it's based on the group, not individual meshlets

fiery bolt
#

you're using the group LOD sphere

primal shadow
#

wait what culling sphere do you use?

fiery bolt
#

a merged sphere of all lower lods' group lod sphere

#

to do BVH you need a bounding sphere or else traversal won't be monotonic

fiery bolt
#

oooooooook even this doesn't work

#

time to bust out the pen and paper

languid vector
#

I mean if camera is inside the sphere

#

Just snap to camera position?

dull oyster
#

I still don't understand why the sphere should not just be at the center of the group

languid vector
#

You should never underestimate error, you can only overestimate

#

here are 2 groups that have the same error. Obviously the left one will have perceptually bigger error because it is bigger by itself, so it will be closer to camera

fiery bolt
#

but it's rendering way too much

#

like, 2 million meshlets

#

my culling queues are filling up bleaker_kekw

languid vector
fiery bolt
#

debubbing time

dull oyster
# languid vector So test is conservative

The test is made so that the error is not (in theory) perceptible, wouldn't making it more conservative just use more VRAM when you could use a lower fidelity lod for the same visual result?

fiery bolt
#

the test only guarantees the error isn't perceptible when you use bounds

dull oyster
#

I understand that it keeps higher fidelity lods more, but I'm not sure it's really necessary

fiery bolt
#

if you just use the center of the group it can't guarantee that

dull oyster
#

I may be missing the point completely but it bothers me that I don't see the issue ^^'

languid vector
fiery bolt
# dull oyster I may be missing the point completely but it bothers me that I don't see the iss...
  1. assuming error at the center is 0.999 px, the error for all triangles in the group closer than the center (which should be about half) will be more than a pixel
  2. when you do BVH, you need the outermost node's projected error to always bound that of its children, so if you just use the center of the node, groups closer than that (again, about half), might have a higher projected error than the BVH node itself
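A tiny numeric illustration of point 1, assuming projected error scales as error / distance. The scale factor and the distances are made-up values chosen so the center lands just under one pixel.

```python
def projected_error_px(error, distance, pixels_per_unit_at_1):
    """Projected size in pixels of a world-space error at the given view distance."""
    return error * pixels_per_unit_at_1 / distance

error = 0.00999      # world-space error of the group (hypothetical)
scale = 1000.0       # pixels per world unit at distance 1 (hypothetical)
center_dist = 10.0   # distance from camera to the group center
radius = 4.0         # group bounding sphere radius

at_center = projected_error_px(error, center_dist, scale)            # ~0.999 px
at_nearest = projected_error_px(error, center_dist - radius, scale)  # ~1.665 px
```

Testing at the center passes the one-pixel threshold, while the same error measured at the nearest point of the bounds does not.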
languid vector
fiery bolt
#

i was sending my code but discord seems to be blocking it bleaker_kekw

languid vector
#

💀

fiery bolt
#

i assume it's being blocked for spam

#
    // 2D Polyhedral Bounds of a Clipped, Perspective-Projected 3D Sphere (Michael Mara, Morgan McGuire).
    // We get the projected bounds on the axis that is the longest upon projection (need to be conservative!),
    // which is the one from (0, 0) to the sphere's center.
    public f32 perceptible_error_distance(f32x4 bounds) {
        let center = mul(this.mv, f32x4(bounds.xyz, 1.f)).xyz;
        let radius = bounds.w * this.scale;

        if (center.z + radius <= this.near)
            return 0.f;

        let dist2 = dot(center, center);
        let a = sqrt(dist2 - center.z * center.z);
        let proj_center = f32x2(a, center.z);
        let t2 = dist2 - radius * radius;
        var t = sqrt(max(t2, 0.f));
        let in_sphere = t2 < 0.f;

        // cos(theta) = t / dist
        // sin(theta) = r / dist
        // T = t * rotate(theta) * proj_center / dist,
        // removing the dist divide in cos, sin
        // ncos(theta) = t
        // nsin(theta) = r
        // rotate(theta) == rotate(ntheta) / dist
        // therefore, T = t * rotate(ntheta) * proj_center / dist2
        // saving us two divides and a sqrt!
        let ncos = t;
        let nsin = radius;
        let wt_z = dot(f32x2(-nsin, ncos), proj_center) / dist2;
        var t_z = t * wt_z;

        if (in_sphere || t_z < this.near) {
            // let off = this.near - center.z;
            // let k = sqrt(radius * radius - off * off);
            // let t = f32x2(a + k, this.near);
            t_z = this.near;
        }

        return t_z;
    }
#
    public f32 error_perceptible_at(f32 error) {
        // Don't divide by 2 because the error is already divided by 2 during build.
        return this.screen.y * this.h * this.min_scale * error;
    }

    public bool should_visit_bvh(f32x4 lod_bounds, f32 parent_error) {
        return this.perceptible_error_distance(lod_bounds) <= this.error_perceptible_at(parent_error);
    }

    public bool should_render(f32x4 lod_bounds, f32 error) {
        return this.perceptible_error_distance(lod_bounds) > this.error_perceptible_at(error);
    }
#

there

#

idk why i'm returning t_z

languid vector
#

and it works?

fiery bolt
#

well it doesn't have holes

#

or overlapping meshlets

languid vector
#

but rendering too much?

fiery bolt
#

yeah it does that

languid vector
#

we can always increase the threshold tho kekw

fiery bolt
#

idk how correct it is

jagged valve
fiery bolt
#

bindless everything

#

struct defs are duplicated though froge_sad

jagged valve
fiery bolt
#

why is removing frustum culling reducing the number of meshlets drawn

#

but increasing the number of bvh nodes traversed (as it should)

languid vector
primal shadow
#

@fiery bolt ok I'm taking a look at your PR today

#

If I pass ptr::null() for vertex_locks, and add back SimplifyOptions::LockBorder, it should be equivalent to the old code, right? For comparison purposes

#

So uhh it's a bit... aggressive

#

231 -> 4 meshlets is... a choice 😅

#

let me try this on the cliff instead of bunnies

#

Cliffs:

#

Something's a bit off

#

Anyways back to compression

fiery bolt
#

are these numbers with the edge detection or without?

primal shadow
#

I used your vertex locks for your pr

#

And used nullptr for mine

fiery bolt
#

hmmmmm

fiery bolt
#

so it doesn't mean that the whole mesh was 231 meshlets and it's now 4

#

what's the threshold at which you reject a simplified group?

fiery bolt
#

my frustum culling has been wrong all along...

#

lmfao

#

@languid vector 7.2 trillion tris in 1.46 +- 0.29 ms

languid vector
#

I am rendering 500m tris at 30ms

#

:c

fiery bolt
#

occlusion culling

languid vector
#

but without occlusion culling

#

yeah

#

and without hierarchy

#

and without sw raster

fiery bolt
#

occlusion cull is a bigger win than hierarchy

#

i'm gonna tune my sw raster thresholds rn

#

it's not that much of a difference

#

around 20% boost

languid vector
#

what was the best boost for your virtual geom renderer?

#

I mean runtime performance

fiery bolt
#

wdym

primal shadow
#

also true

primal shadow
#

Tried 65%

LOD: 0, meshlet count: 15616, meshlet occupancy counts: {128: 15615, 88: 1, }
LOD: 1, meshlet count: 8066, meshlet occupancy counts: {128: 6406, 127: 1114, 64: 352, 63: 156, 126: 29, 62: 7, 43: 1, 125: 1, }
LOD: 2, meshlet count: 4719, meshlet occupancy counts: {128: 2925, 64: 507, 63: 465, 127: 222, 32: 157, 31: 133, 62: 98, 96: 62, 95: 59, 126: 27, 30: 20, 94: 17, 61: 13, 29: 5, 125: 3, 93: 2, 46: 1, 28: 1, 66: 1, 85: 1, }
LOD: 3, meshlet count: 1358, meshlet occupancy counts: {128: 811, 32: 11, 111: 9, 64: 9, 16: 9, 48: 9, 11: 9, 113: 8, 109: 8, 9: 8, 118: 8, 10: 8, 79: 8, 95: 8, 23: 7, 47: 7, 120: 7, 122: 7, 29: 7, 63: 7, 73: 7, 94: 7, 6: 7, 127: 7, 93: 7, 106: 7, 99: 6, 34: 6, 45: 6, 117: 6, 70: 6, 12: 6, 75: 6, 97: 6, 33: 6, 55: 6, 5: 5, 98: 5, 19: 5, 53: 5, 7: 5, 112: 5, 74: 5, 100: 5, 14: 5, 22: 5, 65: 5, 101: 5, 110: 5, 25: 5, 28: 5, 49: 5, 27: 5, 68: 5, 61: 4, 51: 4, 56: 4, 119: 4, 126: 4, 44: 4, 13: 4, 24: 4, 60: 4, 102: 4, 71: 4, 15: 4, 123: 4, 18: 4, 76: 4, 39: 4, 50: 4, 124: 4, 26: 4, 115: 4, 116: 4, 17: 4, 30: 4, 91: 4, 31: 4, 80: 4, 2: 4, 96: 4, 37: 4, 52: 3, 46: 3, 43: 3, 62: 3, 121: 3, 125: 3, 4: 3, 90: 3, 89: 3, 72: 3, 20: 3, 85: 3, 84: 3, 114: 3, 35: 3, 40: 2, 83: 2, 8: 2, 1: 2, 41: 2, 3: 2, 103: 2, 21: 2, 108: 2, 81: 2, 82: 2, 57: 2, 42: 2, 36: 2, 78: 2, 88: 2, 38: 2, 104: 1, 58: 1, 77: 1, 107: 1, 69: 1, 86: 1, }
fiery bolt
#

is that the cliff?

#

huh i get
561, 284, 152, 80, 40, 19, 10, 6, 3, 2
meshlets per lod

#

for the bunny

primal shadow
fiery bolt
#

hm that's weird

primal shadow
#

Idk. I'm done with trying to improve DAG building atm.

#

Next todos are compress meshlet data and then BVH-based persistent culling

fiery bolt
#

wgsl doesn't support coherency

primal shadow
#

Hmm, do you need it?

fiery bolt
#

yeah coherency is required for any non-atomic writes to be visible to other workgroups in the same dispatch

#

same reason it's needed for SPD

primal shadow
#

Which part needs those?

fiery bolt
primal shadow
#

Right I know what it does, I'm curious what needs it though for persistent queues

#

Oh wait

fiery bolt
#

you're writing to a queue lol

primal shadow
#

you can atomically increment the queue counter

#

but your write won't be visible lol

fiery bolt
#

yep
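
The problem described above (the counter bump is atomic, but the payload written into the reserved slot is not guaranteed to be visible yet) has a direct CPU analogy. The sketch below is that analogy only, written with Rust atomics; it is not shader code, `Queue`, `push`, and `try_read` are hypothetical names, and it sidesteps the GPU coherency question by making the payload itself an atomic with release/acquire ordering.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Sentinel meaning "slot reserved or unused, payload not published yet".
const EMPTY: u32 = u32::MAX;

pub struct Queue {
    tail: AtomicU32,
    slots: Vec<AtomicU32>,
}

impl Queue {
    pub fn new(capacity: usize) -> Self {
        Self {
            tail: AtomicU32::new(0),
            slots: (0..capacity).map(|_| AtomicU32::new(EMPTY)).collect(),
        }
    }

    /// Reserve a slot with an atomic add, then publish the payload.
    /// Returns false if the queue is full.
    pub fn push(&self, node: u32) -> bool {
        let slot = self.tail.fetch_add(1, Ordering::Relaxed) as usize;
        if slot >= self.slots.len() {
            return false;
        }
        // The release store is what makes the payload visible to readers.
        self.slots[slot].store(node, Ordering::Release);
        true
    }

    /// Returns None while a slot is empty or reserved but not yet published.
    pub fn try_read(&self, slot: usize) -> Option<u32> {
        let value = self.slots.get(slot)?.load(Ordering::Acquire);
        (value != EMPTY).then_some(value)
    }
}
```

A consumer that sees the bumped counter but gets `None` from `try_read` has to retry, which is where the forward progress question comes in.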

primal shadow
#

yeah I see

fiery bolt
#

I do dependent dispatches

primal shadow
#

Isn't it literally just adding coherent (glsl) / globallycoherent (hlsl) to the buffer declaration though?

fiery bolt
#

and so does nanite on PC

primal shadow
#

Ok I'll patch naga, easy enough

fiery bolt
#

oh yeah you also need forward progress guarantees

primal shadow
#

That, or spirv passthrough in bevy

fiery bolt
#

yeah that works

primal shadow
#

let me find the spirv thing

fiery bolt
#

metal on M series fails that

primal shadow
fiery bolt
#

so you have to fall back to dependent dispatches for apple silicon

loud crag
#

what's that

fiery bolt
#

it'll just keep spinning

fiery bolt
#

no API actually gives that to you

fiery bolt
#

but it mostly works on nvidia and amd

fiery bolt
#

dependent dispatches work well enough™️

#

I might do persistent queues after streaming

#

maybe

wicked notch
#

I'm pretty sure cuda does define forward progress guarantees

#

so that's why it mostly works on NV

fiery bolt
#

amd too

#

idk about intel

wicked notch
#

PT workloads and dynamic parallelism is pretty big in cuda land

#

intel who?

loud crag
fiery bolt
#

should I do the whole fixed size page thing or half-ass dynamic allocation thonk

wicked notch
#

fixed page all the way

#

why would you even want dynamic alloc

#

just do tlsf on gpu

#

it's just a couple of fls and ffs

#

you might need a lock tho

#

actually yeah no binning is bad KEKW

#

do standard gpu malloc

fiery bolt
primal shadow
#

@languid vector I have some more questions on the global mesh compression when you get a chance

  1. I don't get the idea behind the function you sent me a bit ago to calculate the bitrate/step_size for the global/mesh quantization grid. Should this not be a fixed value used for all of your meshes? Also, what's even the point of the global quantization since you store your meshlet centers in a full vec3<f32> anyways, no?
  2. Given that each meshlet stores a bitstream of vertex positions, for meshlet X with triangle index Y, how do you read the vertex position data? With fixed-size positions, i.e. one vec4<f32> per vertex, I can just store one u32 per meshlet pointing to the start of the meshlet's vertices in the large array of vertices, and each triangle index can be a single u8 pointing to an offset off of that starting position. But I'm not sure how to structure things with a bitstream.
  3. When quantizing, I'm not entirely sure how to handle sign-ness (i.e. negative or positive). For meshlets, I guess I could map -radius..radius to 0..diameter, and then do ceil2(log2(diameter)) to determine the bitrate. For meshes/global grid, I guess I just store the meshlet centers as a full vec3<f32> still? (but quantized to the grid instead of absolute coordinates)
wicked notch
fiery bolt
#

without Yet More Indirection

wicked notch
#

literally who cares

#

you're already bw limited

#

one more buffer

#

and maybe another one after that

fiery bolt
#

no but then I have to deal with those buffers

wicked notch
#

wym deal

#

alloc the buffer done

fiery bolt
#

dynamic gpu malloc

wicked notch
#

ye

#

gabe has an impl of gpu malloc

#

you can go take inspiration

fiery bolt
#

gonna take a lot of inspiration

wicked notch
#

me when I commit theft

ebon ruin
fiery bolt
#

oh yeah it's also more work when building

wicked notch
#

never heard of er

fiery bolt
#

me neither

wicked notch
#

do you mean pointers maybe

ebon ruin
#

I thought you were a vulkan man

wicked notch
#

I am yes

fiery bolt
#

I just put a Tex2D<f32> in my push constants

wicked notch
#

I am a vulkan 1.3 man

ebon ruin
#

are you pulling my leg

fiery bolt
#

and it just works

wicked notch
ebon ruin
#

thats so hot

#

i wish i learned that

wicked notch
#

descriptor sets are a bad dream that don't exist anymore

ebon ruin
#

now im not so scared of bulkan

#

thanks guys

fiery bolt
#

we have vulkaned someone else froge_love

wicked notch
#

we need more fresh blood for john khronos 🙏

ebon ruin
#

not yet

fiery bolt
ebon ruin
#

i am a GL guy

wicked notch
#

not for long

ebon ruin
#

yes for long

#

vulkan offers me nothing

fiery bolt
#

pointers

wicked notch
#

pointers

fiery bolt
#

lol

ebon ruin
#

useless

fiery bolt
#

no

ebon ruin
#

meaningless

fiery bolt
#

nanite makes you a real clown

ebon ruin
#

its all meaningless

wicked notch
#

watch out for edge cases

ebon ruin
#

the only thing desirable is hw rt but every time i mention it L says “No.”

ebon ruin
fiery bolt
#

ok now make it fast

ebon ruin
#

no

wicked notch
#

you did nanite on scratch but you're scared of vulkan

#

that doesn't compute

fiery bolt
#

28 trillion trongles in 1.5ms

ebon ruin
#

i'm not scared of vulkan its just useless to me

wicked notch
#

it maybe is

#

but pointers

fiery bolt
#

hit that and we'll let you not use vulkan

ebon ruin
#

pointing to what?

wicked notch
#

memory

ebon ruin
#

I have not even implemented PBR yet

#

and by the looks of it

fiery bolt
#

if you can't do 25 trillion triangles in 1.5ms you have to use vulkan

ebon ruin
#

neither have you

fiery bolt
#

exactly!

#

I even deleted my brdf.hlsl when porting to slang

wicked notch
#

rendering equation?

#

do you mean triangle rasterization equations?

fiery bolt
#

or edge equations

wicked notch
#

quadric error metrics maybe perhaps

fiery bolt
#

my error projection makes no sense

fiery bolt
#

I just did the screen space projection thing and then multiplied two things

wicked notch
#

I literally just mul the radius with the projection

#

and scale it

fiery bolt
#

because just using t led to holes

wicked notch
#

best error function ever

ebon ruin
#

this is like the dark wizards trying to convince me to use dark magic

fiery bolt
#

yes

#

and you know you want it

#

you need it

wicked notch
#

show him

wicked notch
#

the slang push const

fiery bolt
wicked notch
#

give him a taste of slang's ultimate power when combined with vk13

wicked notch
#

(don't tell him about the driver bugs and invalid spirv and out of spec optimizations)

fiery bolt
#

fuck nvidia's driver

wicked notch
#

all my homies hate nv

ebon ruin
#

me too

wicked notch
#

I've found like 3 nvidia bugs when using slang

#

it's crazy

ebon ruin
#

i shouldve gone with rx

fiery bolt
#

fucking useless thing crashes with a misaligned addr if i use groupshared mem in my software rasterizer

wicked notch
#

I have zero idea how they say slang is "production ready"

fiery bolt
#

because slang is

#

their driver isn't

wicked notch
#

actually true

#

garbage driver

fiery bolt
#

i've looked through the entire spirv slang shits out

wicked notch
#

"release" "driver"

fiery bolt
#

it's perfectly fine

fiery bolt
#

works now

#

now i have three definitions of my gpu scene structs bleaker_kekw

wicked notch
#

how is dxc's output any different

fiery bolt
#

the only thing i can see is that dxc uses unsigned array lengths

#

slang uses signed

wicked notch
#

bruh

fiery bolt
#

this was in a minimized diff that just wrote zeroes to the arr

#

still crashed with slang so i assume that's the issue

wicked notch
#

nv driver btw

#

quality software™️

fiery bolt
#

however, using groupshared arrays in my stolen port of SPD works

wicked notch
#

ye that probably just "happens" to work

#

idk

fiery bolt
#

and using groupshared arrays in my mesh shader also works

wicked notch
#

the whole thing is ub

fiery bolt
#

slang default InterlockedAdd uses device scope for everything so i wrote my own spirv asm thingy for VMM (and all barriers etc)

#

it still died

#

do you think novideo will hire me if i tell them i'll fix their shit shader compiler

#

oh yeah VMM make available | make visible doesn't work either for some reason so i had to split my dispatches

wicked notch
#

seek god

#

actually seek grass

#

then god

fiery bolt
faint crane
#

Haven't implemented that yet.

primal shadow
buoyant summit
#

@wicked notch in cuda and the likes you use a special dispatch for forward progress

#

and the api requires that your dispatch size is under some limit

#

for forward progress guarantees to hold

#

and the limit varies device to device

#

and yes that would be useful in vk no less, I'm just saying it's a bit tricky to use

#

it's not super straightforward

wicked notch
#

ye it's flimsy

wispy spear
#

@primal shadow did you do the thing?

#

i think it was about a/the PR

primal shadow
primal shadow
#
pub struct MeshletMesh {
    /// Bitstream-packed vertex positions.
    pub vertex_positions: Arc<[u8]>,
    /// Octahedral compressed normals and uncompressed texture coordinates for vertices.
    pub vertex_attributes: Arc<[u8]>,
    /// Triangle indices for meshlets.
    pub indices: Arc<[u8]>,
    /// The list of meshlets making up this mesh.
    pub meshlets: Arc<[Meshlet]>,
    /// Spherical bounding volumes.
    pub bounding_spheres: Arc<[MeshletBoundingSpheres]>,
}

/// A single meshlet within a [`MeshletMesh`].
#[repr(C)]
pub struct Meshlet {
    /// The bit offset within the parent mesh's [`MeshletMesh::vertex_positions`] buffer where the vertex positions for this meshlet begin.
    pub start_vertex_position_bit: u32,
    /// The offset within the parent mesh's [`MeshletMesh::vertex_attributes`] buffer where the vertex attributes for this meshlet begin.
    pub start_vertex_attribute_id: u32,
    /// The offset within the parent mesh's [`MeshletMesh::indices`] buffer where the indices for this meshlet begin.
    pub start_index_id: u32,
    /// The amount of vertices in this meshlet.
    pub vertex_count: u8,
    /// The amount of triangles in this meshlet.
    pub triangle_count: u8,
    /// Number of bits used to quantize vertex positions within this meshlet.
    pub bits_per_vertex_position: u8,
    /// Unused. (TODO: Get rid of this in the disk representation?)
    pub padding: u8,
}

Ok, got this so far.

wicked notch
#

why is everything Arc nervous

loud crag
#

what's Arc anyway

#

I only know it from objc's Automatic Reference Counting

frank sail
#

I thought it was atomic ref counted

#

Idk what the ref countedness means in the context of a single uint

wicked notch
#

it's just shared_ptr

primal shadow
wicked notch
#

damn

#

can't you just make MeshletMesh arc?

primal shadow
#

It was either Arc (shared_ptr) or Box (unique_ptr)

fiery bolt
#

or you can do a custom DST cutecatNE

primal shadow
#

?

fiery bolt
#

you can make your own DST with a header and an unsized tail of bytes bleakekw

primal shadow
#

Ehhh maybe some other time if it becomes an issue lol

primal shadow
#

Compression continues to frustrate me greatly

#

I still have not seen good explanations for how half of it works

#

Mostly the purpose of the global grid

primal shadow
#

Nor do I think that's right

#

I think nanite was saying you need the same grid to avoid cracks between objects you would get from different grids

#

But unclear why they have one in the first place

glass sphinx
primal shadow
#

Right, but it's missing some context

#

Yes if you do different grid sizes per mesh there would be issues

#

But what's the purpose of the grid in the first place?

#

Like you could also solve the same problem by... Not quantizing anything to a grid

glass sphinx
#

how would you quantize it

fiery bolt
#

without quantizing you need the full 12 bytes for position

#

and it won't byte-compress very well I think

primal shadow
#

But you can compress with the per-cluster encoding, which afaik(?) is lossless(?)

#

So it'll compress to a more compact bitstream regardless

#

I think the global quantizing step beforehand is just to reduce the precision, in order to need less bits for the second step? Idk for sure

#

And then it makes a bunch of other things more complicated

#

There's two steps. The quantization with a fixed step size for all meshes, and then encoding vertices per-cluster

#

I think the purpose of the first is to reduce excess precision (so less bits are needed), and the second is just a more compact encoding of the same data
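
A minimal sketch of those two steps as described here, assuming the shared grid step is `2^-quantization_bits` and that each channel is stored as an unsigned offset from the meshlet's minimum. All names are hypothetical, and this is one reading of the scheme rather than Nanite's or Bevy's actual code.

```rust
/// Step 1 (lossy): snap a position to a grid shared by every mesh.
/// `quantization_bits` is an assumed parameter; the step size is 2^-quantization_bits.
pub fn quantize_to_grid(p: [f32; 3], quantization_bits: u8) -> [i32; 3] {
    let scale = (1u32 << quantization_bits) as f32;
    p.map(|c| (c * scale).round() as i32)
}

/// Per-channel encoding parameters for one meshlet.
pub struct ChannelEncoding {
    pub min: i32,
    pub bits: u8,
}

/// Step 2 (lossless): store each value as `value - min`, using just enough bits
/// to cover the range. Subtracting the minimum also removes the sign problem.
pub fn channel_encoding(values: &[i32]) -> ChannelEncoding {
    let min = *values.iter().min().expect("meshlet has at least one vertex");
    let max = *values.iter().max().expect("meshlet has at least one vertex");
    let range = (max - min) as u32;
    let bits = (32 - range.leading_zeros()) as u8;
    ChannelEncoding { min, bits }
}

/// Inverse of both steps: offset -> grid coordinate -> position.
pub fn dequantize(offset: u32, min: i32, quantization_bits: u8) -> f32 {
    (min + offset as i32) as f32 / (1u32 << quantization_bits) as f32
}
```

For example, grid values 20, 23, and 27 on one axis have a range of 7, so that channel needs 3 bits per vertex plus the stored minimum of 20.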

glass sphinx
#

if your compression doesn't yield the exact same vertex positions for border vertices you get cracks

#

i guess the world grid helps keep the quantization consistent

#

i don't think the quantization is lossless

wide shadow
#

it isnt going to be the easiest thing to do

#

also encountered slang bug...

wispy spear
#

looks like caldera hotel

wicked notch
#

why do I recognize that this is bistro

wide shadow
#

its bistro + sponza 😉

#

shoot its very lossless...

wispy spear
#

oh hehe

wicked notch
#

ship it

wicked notch
#

very nice, I'll check it out in a bit

primal shadow
#

Interesting issue I've found, shadows look too bad atm in bevy for meshlets

#

It's choosing too low of a lod I'm guessing, and the error is very visible when viewed by the main camera

#

I probably need to add a lod bias to shadow views

wicked notch
#

Unreal fixes this using SSS btw

#

they mentioned that in the original talk

primal shadow
#

Ironically nanite does the same, but they bias towards a less accurate lod, as VSM is so high res anyways it's never a problem

#

Mhmm also true

wicked notch
#

VSM is too powerful

primal shadow
#

Nah Ray tracing is

#

VSM is ok

wicked notch
#

fake

#

blatant RT propaganda

#

try again

primal shadow
#

Rasterization is a hack and does not work for non-primary views

#

And I won't pretend otherwise

frank sail
#

Banned

wicked notch
#

banned

finite yacht
#

Promoted to Admin

frank sail
#

not you too

loud crag
fiery bolt
#

don't waste your 0.25 rays per pixel on shadows, use them for GI and real specular smh

primal shadow
#

Ok after much research, I finally understand nanite's vertex quantization now

primal shadow
#
/// A single meshlet within a [`MeshletMesh`].
#[derive(Copy, Clone, Pod, Zeroable)]
#[repr(C)]
pub struct Meshlet {
    /// The bit offset within the parent mesh's [`MeshletMesh::vertex_positions`] buffer where the vertex positions for this meshlet begin.
    pub start_vertex_position_bit: u32,
    /// The offset within the parent mesh's [`MeshletMesh::vertex_normals`] and [`MeshletMesh::vertex_uvs`] buffers
    /// where non-position vertex attributes for this meshlet begin.
    pub start_vertex_attribute_id: u32,
    /// The offset within the parent mesh's [`MeshletMesh::indices`] buffer where the indices for this meshlet begin.
    pub start_index_id: u32,
    /// The amount of vertices in this meshlet.
    pub vertex_count: u8,
    /// The amount of triangles in this meshlet.
    pub triangle_count: u8,
    /// Number of bits used to quantize vertex positions within this meshlet.
    pub quantization_bits: u8,
    /// Number of bits used to store the X channel of vertex positions within this meshlet.
    pub bits_per_vertex_position_channel_x: u8,
    /// Number of bits used to store the Y channel of vertex positions within this meshlet.
    pub bits_per_vertex_position_channel_y: u8,
    /// Number of bits used to store the Z channel of vertex positions within this meshlet.
    pub bits_per_vertex_position_channel_z: u8,
    /// Unused. (TODO: Get rid of this in the disk representation?)
    pub padding: u16,
    /// Minimum quantized X channel value of vertex positions within this meshlet.
    pub min_vertex_position_channel_x: f32,
    /// Minimum quantized Y channel value of vertex positions within this meshlet.
    pub min_vertex_position_channel_y: f32,
    /// Minimum quantized Z channel value of vertex positions within this meshlet.
    pub min_vertex_position_channel_z: f32,
}
#

The perfect 256 bits of metadata

wide shadow
ebon ruin
#

i am uh

#

making Nanite

#

on Scratch

fiery bolt
#

good

primal shadow
#

Update: It's been taking a while due to learning and then being sick, but my fever finally broke and I finished all the CPU-side changes for compressed per-meshlet vertices.

#

Just need to figure out how to do the GPU bitstream reader
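
For reference, a CPU-side sketch of what such a bitstream reader might look like, assuming the stream is viewed as `u32` words with values packed least-significant-bit first. `read_bits` and `write_bits` are hypothetical helpers, and the `u64` used here is a convenience; a shader without 64-bit integers would need to combine the two words with shifts instead.

```rust
/// Read `bits` (0..=32) bits starting at `bit_offset`, LSB-first.
pub fn read_bits(words: &[u32], bit_offset: usize, bits: u32) -> u32 {
    if bits == 0 {
        return 0;
    }
    let word = bit_offset / 32;
    let shift = (bit_offset % 32) as u32;
    // Join the current word with the next one so a value may straddle the boundary.
    let lo = words[word] as u64;
    let hi = *words.get(word + 1).unwrap_or(&0) as u64;
    let mask = (1u64 << bits) - 1;
    ((((hi << 32) | lo) >> shift) & mask) as u32
}

/// Append the low `bits` bits of `value` to the stream (the matching writer).
pub fn write_bits(words: &mut Vec<u32>, bit_len: &mut usize, value: u32, bits: u32) {
    for i in 0..bits {
        let pos = *bit_len + i as usize;
        if pos / 32 >= words.len() {
            words.push(0);
        }
        if ((value >> i) & 1) == 1 {
            words[pos / 32] |= 1u32 << (pos % 32);
        }
    }
    *bit_len += bits as usize;
}
```

Under this layout, vertex `i` of a meshlet would start at `start_vertex_position_bit + i * (bits_x + bits_y + bits_z)`, with the three channels read back to back. That addressing is an assumption about the layout, not something confirmed above.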

#

I am also using a fixed quantization factor per mesh rn, nanite has an "auto" mode that chooses the best one, I'll have to figure out how they did that.

fiery bolt
#

don't all meshes need to have the same quantization factor

#

because it would lead to cracks otherwise
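
The cracks concern is easy to reproduce on a single coordinate. A tiny sketch, assuming a power-of-two grid step (`snap` is a hypothetical helper, not engine code): the same shared border vertex lands on different positions when two meshes use different quantization factors.

```rust
/// Snap a coordinate to a grid with step 2^-bits and return it as a float again.
pub fn snap(x: f32, bits: u8) -> f32 {
    let scale = (1u32 << bits) as f32;
    (x * scale).round() / scale
}
```

With `x = 0.3`, a 2-bit grid gives 0.25 while a 4-bit grid gives 0.3125, so two meshes meeting at that vertex would no longer line up.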

primal shadow
#

Maybe, idk. The original nanite presentation said it's user-selectable and has to be the same for different meshes, but unreal has an "auto" option.

#

Choose the precision this mesh should use when generating the Nanite mesh. Auto determines the appropriate precision based on the size of the mesh. The precision can be overridden to improve precision or optimize disk footprint.