#Iris - A Journey through OpenGL and beyond to learn Graphics


primal shadow
#

For reference here's the roadmap:

  • Removing material depth writing from the raster pass
  • Implicit tangents
  • Moving to persistent culling over a BVH-per-object of cluster groups instead of the huge flat dispatch over clusters
  • Software raster and making hardware raster less memory-hungry without mesh shaders (blocked on wgpu texture atomics)
  • Better mesh conversion workflows and asset processing, and making it faster and higher quality
  • Compute-based shading and software VRS (long term)
  • Streaming (long term)
left jacinth
#

@wicked notch I copied the compute rasterization function you had, and modified it to render quads instead of triangles, and of course I know your code is directly ported from UE5 source, which I can see correlates very highly. I was wondering if you had thought at all about the licensing of UE5's source

#

the specific file itself just says "all rights reserved" at the top

#

but also they say this

#

you can't copy paste UE code without being subject to royalties, but you can "learn" from UE code...

#

At what point does it count as copying

#

especially if the code in question is a software triangle rasterizer, which was understood and modified to instead be a software quad rasterizer

wicked notch
#

yeah, I was definitely thinking of removing that and replacing it, I guess I'll expedite that

left jacinth
#

what would you replace it with?

#

I don't understand copyright law man...

wicked notch
#

back in the day I was looking at various softrast methods

#

I dunno what to replace the subpixel calcs with tho

#

I gotta look at something public somehow, iirc AMD has some subpixel AA going on in their FFX repos

faint crane
#

Does "it was revealed to me in a dream" not work with Unreal's lawyers?

wicked notch
#

I mean my project is purely for personal use

#

so I don't really care KEKW

left jacinth
#

^

left jacinth
#

is there a place to find the logic for how hardware computes the scan-lines so that you make sure they match?

#

I didn't do too much searching but didn't find anything obvious

#

I don't really even know where to look

wicked notch
#

exam was so easy I almost forgot it existed (I was late thank god the prof was too KEKW)

left jacinth
#

dang

rocky schooner
#

Does the market have any demand for OpenGL?

wispy spear
#

the market has demand for people who understand graphics and compute pipelines, the api does not matter much

buoyant summit
#

no, I personally don't demand any opengl, tyvm (I'm the market)

frank sail
#

Customers (of any variety) dgaf about graphics APIs beyond what they may see when launching a game

#

Hmm, imagine launching cod 202x and seeing the choice to use fwog or vuk backends

#

add daxa to the list

wicked notch
#

how do you know cod doesn't use daxa already

#

potrick may be pulling some strings there

velvet marsh
#

the way I read this is that the market demand is for engineers who don't have problems learning a new graphics api quickly

wicked thorn
#

a

#

b

#

c

#

d

wispy spear
#

damn, italians do talk a lot

#

🤏 🤌

wicked notch
#

real

ebon ruin
#

but why ask here of all places?

severe dome
#

it is a graphics server

ebon ruin
#

but specifically in this thread

severe dome
#

oh lol i didn't even notice that lmao

primal shadow
#

@dull oyster something I thought of, given that we need to calculate the screen space AABB for occlusion culling anyways, why even bother with calculating the screen space diameter/radius of a cluster using a different formula? I guess it's slightly cheaper, which matters when you have millions of clusters that are likely to be skipped?

wicked notch
#

GPU lost GPU rights

#

only compute machine now

pale horizon
#

Time to do a software renderer on GPU froge_love

loud crag
#

the RTX 60 series will just be a big fat compute processor with nothing else

buoyant summit
#

AMD doesn't put drawing hw into that at all, nor do they put format conversion and sampling stuff

#

NV has some very tiny and weak drawing hw and not sure about sampling etc

primal shadow
#

Hmm, I can't figure out how to map the meshoptimizer meshlet triangles/indices into a meshlet-local index

#
let index_ids = meshlet.start_index_id + vec3(triangle_id * 3u) + vec3(0u, 1u, 2u);
let indices = meshlet.start_vertex_id + vec3(get_meshlet_index(index_ids.x), get_meshlet_index(index_ids.y), get_meshlet_index(index_ids.z));
let vertex_ids = vec3(meshlet_vertex_ids[indices.x], meshlet_vertex_ids[indices.y], meshlet_vertex_ids[indices.z]);
let vertex_1 = unpack_meshlet_vertex(meshlet_vertex_data[vertex_ids.x]);
let vertex_2 = unpack_meshlet_vertex(meshlet_vertex_data[vertex_ids.y]);
let vertex_3 = unpack_meshlet_vertex(meshlet_vertex_data[vertex_ids.z]);
#

I have 64 vertices and 64 triangles max per meshlet

#

You can take the triangle_id [0, 64) and map that to an index [0, 192)

#

Then adding that to the start_index_id of the meshlet gives you the position within the larger buffer

#

And then you can do the same thing with vertices

#

But now I have a 64 thread workgroup load 1 vertex per thread into shared memory

#

And then 1 thread per triangle can load those vertices and build the triangle

#
let index_ids = meshlet.start_index_id + (local_invocation_id.x * 3u) + vec3(0u, 1u, 2u);
let index_1 = get_meshlet_index(index_ids[0]) - meshlet.start_vertex_id;
let index_2 = get_meshlet_index(index_ids[1]) - meshlet.start_vertex_id;
let index_3 = get_meshlet_index(index_ids[2]) - meshlet.start_vertex_id;
let vertex_1 = screen_space_vertices[index_1];
let vertex_2 = screen_space_vertices[index_2];
let vertex_3 = screen_space_vertices[index_3];
#

But I'm not actually sure my index calculations are correct here. I don't think subtracting by start_vertex_id works to get the meshlet indices back into the [0, 64) range :/
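For what it's worth, if the index buffer holds meshoptimizer's `meshlet_triangles` output unchanged, the subtraction should not be needed. Those are `u8` indices that are already meshlet-local, in [0, vertex_count), and only `meshlet_vertices` maps them back to the original vertex buffer. A minimal CPU-side Rust sketch of that layout, where the struct mirrors `meshopt_Meshlet` and the two helper functions are hypothetical names for illustration:

```rust
/// Mirrors the fields of meshoptimizer's `meshopt_Meshlet`.
struct Meshlet {
    vertex_offset: u32,   // start of this meshlet's slice in `meshlet_vertices`
    triangle_offset: u32, // start of this meshlet's slice in `meshlet_triangles`
    vertex_count: u32,
    triangle_count: u32,
}

/// Hypothetical helper: meshlet-local vertex indices of one triangle.
/// Each returned value is already in [0, vertex_count).
fn local_triangle(meshlet: &Meshlet, meshlet_triangles: &[u8], triangle_id: u32) -> [u32; 3] {
    debug_assert!(triangle_id < meshlet.triangle_count);
    let base = (meshlet.triangle_offset + triangle_id * 3) as usize;
    [
        meshlet_triangles[base] as u32,
        meshlet_triangles[base + 1] as u32,
        meshlet_triangles[base + 2] as u32,
    ]
}

/// Hypothetical helper: map local indices to indices into the original vertex buffer.
fn global_triangle(meshlet: &Meshlet, meshlet_vertices: &[u32], local: [u32; 3]) -> [u32; 3] {
    local.map(|i| {
        debug_assert!(i < meshlet.vertex_count);
        meshlet_vertices[(meshlet.vertex_offset + i) as usize]
    })
}
```

Under that assumption, the shared-memory version would index `screen_space_vertices` directly with the value read from the index buffer, and only the per-vertex load would go through `meshlet_vertices`.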

wide shadow
primal shadow
wispy spear
#

lustri got sucked in completely by the education programs

primal shadow
#

Zeux isn't in this discord, right? I have a question :/

#

I'll use github discussions

wispy spear
#

he used to

#

hes on the vk server iirc

primal shadow
#

There's a VK server? til

wispy spear
#

-,-

primal shadow
wicked notch
#

2/3 exams done

#

last one and I graduate let's goo

wispy spear
#

good luck with the last one my froggi

primal shadow
#

gl

#

Bunnies fight in hell (bugged)

dull oyster
faint crane
#

Incredible that every single statement is wrong.

dull oyster
#

"not compatible with raytracing", not with that attitude that's for sure

#

also "not that useful compared to traditional LOD", artists and coders have been struggling a lot with them, a Nanite-like system more or less makes it automatic

#

well obviously I'm preaching to the choir here
but still

wicked notch
#

the cooking's been on hold for a while now sadly, but I've been thinking a while about RT with nanite

wicked notch
dull oyster
#

My tests are currently on hold for the GP Direct demo, but that's what I stopped on: a BLAS per cluster group, and an instance inside the TLAS per instance of the cluster group*

wicked notch
#

did you ever get around measuring tracing and building perf (for the TLAS)?

#

compared to full LOD base BLASes

primal shadow
#

Traverse Research has also done RT with hierarchical LODs, but idr if they ever explained how

wicked notch
dull oyster
#

I did not attempt to measure the difference between LOD 0 and this BLAS-per-group attempt

#

Partly because it is not finished
Partly because I don't care, I want to stream the cluster in and out of memory, so I will need to be able to do it with raytracing enabled too

primal shadow
wicked notch
dull oyster
dull oyster
glass sphinx
#

everyone uses compressed clusters anyways lmao thats already not compatible with rt

delicate rain
#

There already is a paper on it

wicked notch
#

damn

#

we don't deserve saky

delicate rain
#

Not done by me

#

Lmao

#

Soon though perhaps 🧙‍♂️

#

I'm kinda cooking the topic while on vacation, exploring directions

wicked notch
delicate rain
#

There are also two papers from adobe, talking about displacement without tessellation for rt

delicate rain
#

I didn't read yet, but I got the impression of it being nanite

wicked notch
#

it doesn't rely on meshlets at the very least

frank sail
#

more like nanot

wicked notch
#

I'm reading it for the first time rn

delicate rain
#

I'll try to read it today or tomorrow

wispy spear
#

its from the utah pot people?

wicked notch
#

it actually looks like something Brian Karis talked about before going to meshlets

delicate rain
delicate rain
#

I have a lot of learning ahead of me

wicked notch
#

this is actually good ol' nanite

delicate rain
#

Nice nice, thank you!

wicked notch
#

Intel suggests creating new BVH formats inside Vulkan/DX12 to make nanite + RT easier

wicked notch
#

very nice

delicate rain
#

Regardless of that, I wanted to say that I'll probably be hopping onto the nanite hype

#

Just in rt world

wicked notch
#

let's fucking gooo.meme

delicate rain
#

Btw what are good resources for nanite, still the deep dive into nanite video, or something else?

dull oyster
wicked notch
wispy spear
#

(you guys can ping me any time, no need for the space :P)

wicked notch
#

I found the thing

#

Plan on attending HPG 2023 in Delft, Netherlands, June 26-28, 2023.
Sign up for conference emails at http://eepurl.com/hZvXb1 .

HPG 2022 In-Person Event Keynote: The Journey to Nanite
Brian Karis, Epic Games
August 7, 2022, Fletcher Challenge Theatre, Harbour Centre, Simon Fraser University
https://www.highperformancegraphics.org/2022/in-person...

wicked notch
primal shadow
glass sphinx
primal shadow
#

Do you save data that way vs referencing a single set of vertices for all meshlets?

#

I'll have to look into delta position/uv though

#

Iirc nanite makes position relative to the cluster bounding sphere center

#

Not sure if that's only disk, or runtime too, as extra fetches for the bounding sphere data seems bleh

delicate rain
wicked notch
#

one more wrinkle for the brain

primal shadow
#

@wicked notch you use the screen-space AABB size of the meshlet for determining SW/HW raster right? For the first pass, do you compute one AABB using last frame's transform for culling against last frame's depth pyramid, and a second using the current frame's transform for choosing SW/HW?

wicked notch
#

yes
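A rough Rust sketch of that second decision, assuming the current-frame AABB is already in pixels. The `ScreenAabb` type, the `choose_raster` helper, and the 64-pixel threshold are all made up for illustration and are not values from either renderer:

```rust
#[derive(Debug, PartialEq)]
enum Raster {
    Software,
    Hardware,
}

/// Screen-space bounding box of a cluster, in pixels (hypothetical type).
struct ScreenAabb {
    min: [f32; 2],
    max: [f32; 2],
}

/// Assumed cutoff: clusters no larger than this on either axis go to software raster.
const SOFTWARE_RASTER_MAX_PIXELS: f32 = 64.0;

/// Picks the raster path from the AABB computed with the *current* frame's transform.
/// The AABB computed with the previous frame's transform is only used for the
/// occlusion test against the previous depth pyramid, so it is not an input here.
fn choose_raster(current_frame_aabb: &ScreenAabb) -> Raster {
    let width = current_frame_aabb.max[0] - current_frame_aabb.min[0];
    let height = current_frame_aabb.max[1] - current_frame_aabb.min[1];
    if width.max(height) <= SOFTWARE_RASTER_MAX_PIXELS {
        Raster::Software
    } else {
        Raster::Hardware
    }
}
```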

primal shadow
#

I have most of software raster prototyped, just need to write the actual triangle raster code 😛

#

Spent some time removing the off-the-shelf serializer I was using in favor of writing the bytes out myself. Meshlet asset loading is ~9x faster now, and probably uses less temporary memory https://github.com/bevyengine/bevy/pull/14193
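For anyone curious what "writing the bytes out myself" can look like, here is a minimal sketch using only the Rust standard library. It is not the format used in the linked PR. The length-prefixed little-endian layout and the `write_u32_slice` / `read_u32_vec` names are assumptions made for the example:

```rust
use std::io::{self, Read, Write};

/// Writes a u64 element count followed by each value as 4 little-endian bytes.
fn write_u32_slice<W: Write>(writer: &mut W, data: &[u32]) -> io::Result<()> {
    writer.write_all(&(data.len() as u64).to_le_bytes())?;
    for value in data {
        writer.write_all(&value.to_le_bytes())?;
    }
    Ok(())
}

/// Reads back what `write_u32_slice` wrote.
/// Note: a real loader should validate `len` before allocating from untrusted input.
fn read_u32_vec<R: Read>(reader: &mut R) -> io::Result<Vec<u32>> {
    let mut len_bytes = [0u8; 8];
    reader.read_exact(&mut len_bytes)?;
    let len = u64::from_le_bytes(len_bytes) as usize;

    let mut out = Vec::with_capacity(len);
    let mut buf = [0u8; 4];
    for _ in 0..len {
        reader.read_exact(&mut buf)?;
        out.push(u32::from_le_bytes(buf));
    }
    Ok(out)
}
```

Skipping a general-purpose serializer like this trades flexibility for a fixed layout, which is usually where the load-time win comes from.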

buoyant summit
#

but also lods aren't as necessary with rt as they are with rast

#

shorter build times, less memory and only somewhat faster trace

delicate rain
#

In the paper I posted earlier they did seem to provide a significant speedup

wheat haven
#

~~stealing from~~ reading Retina code has me questioning some of my long-held C++ habits, how dare you try and make me improve

pale horizon
#

~~stole code from~~ trained my brain LLM on 😎

wispy spear
#

lustri, i wish you good luck for the eksems

wicked notch
#

I've done 16 of them what's one more

wispy spear
#

still 🙂

#

what sleep deprivation does to a mf

wicked notch
#

real

glass sphinx
#

lvstri we need to absorb you into daxa

wispy spear
glass sphinx
#

grand unified vulkan-abstraction theory

#

also jaker and martty and everyone else

#

we can all join and make one crippled library instead of many

#

😳

glass sphinx
#

at least i feel like there should be a lib to recommend to people other than using opengl thats more of a c api on vulkan

wispy spear
#

webgpu?

glass sphinx
#

yea that is ok

#

but nih

#

but we can nih together

wispy spear
#

i should be quiet, im totally unqualified for this

glass sphinx
#

webgpu also has some old-isms because of safety

#

i think we can be more ghetto

glass sphinx
#

we need an army

pale horizon
glass sphinx
#

vkguide does not have a good vulkan abstraction for regular use tho

#

its tutorial code

wispy spear
#

im also not a fan and abandoned it

minor root
#

One vulkan lib to rule them all

wheat haven
wispy spear
#

the problem is, that it feels like every other week there is a new way of how one should use vulkan

#

similar to every week a new js framework comes out which every web-dev needs to use

frank sail
#

the right way is my way

wispy spear
#

: )

#

and then there is the whole shader compiler bs in between which doesnt seem to allow certain things sometimes and you cant use vulkan as intended or something like that

frank sail
#

hmmm sounds like fuddery

#

I'm not particularly hindered by using glslang, as far as glsl compilation goes

wispy spear
#

good good

pale horizon
frank sail
#

except with vulkan it's like, actual hardware tiers rather than moronic API limitations being lifted

wispy spear
#

i remember when those tiers came around in dx12, and nobody used them

#

or dx11_2?

wispy spear
#

perhaps certain features of modern vulkan make sense for specific use cases only, but that was never clear or obvious to me, when reading/following those loose conversations about "what one should use/do" when it comes to bulkan

#

an actual guide would be nice (vkguide is not that)

#

also also dont listen to my random bs too much, i am not even focusing on vk right now, but i will from january onwards! mark my words lol

pale horizon
#

99% of what vkguide teaches is all that you need gpAkkoShrug
You can disregard what anyone else says KEKW

wispy spear
#

i dont like vkguide

#

its half assed

#

and a similar thing like logl right now - it feels like

frank sail
#

so vulkan offers multiple ways to do everything, but each way has genuine pros and cons

wispy spear
#

yeah

#

a "guide" (perhaps the wrong word) around those pros and cons for way x y and z could be something to build upon

frank sail
#

a simple guide won't cut it if you want to be able to pick the right tool every time

#

you have to read lots of stuff

#

like there's a bunch of ways to upload data, and even though my requirements are pretty narrow it's still hard to choose

wispy spear
#

yeah thats the stuff im talking about,

#

those things are mentioned across the server somewhat frequently too

#

but its 20 different people all the time 🙂

frank sail
#

like in #1128020727380054046 right now kekwfroggified

wispy spear
#

: )

frank sail
#

maybe you could make a rube goldberg uploading system like what devsh and co. cooked up

wispy spear
#

maybe i cant read, but thats exactly what the pros keep saying

#

but then its person a vs person b "yes you shouldnt do it this way, do it that way" 🙂

frank sail
#

at some point you have to decide for yourself based on the information presented

wispy spear
#

ofc

#

if you just want to display voxels for a minecraft clone its perhaps very different

#

compared to some open world terrain/planetrender streaming thingy

#

or maybe not, and you could use the same mekanism for both things

frank sail
#

it's also way too easy to overthink unless you have a concrete problem in front of you

wispy spear
#

yep

#

it would help if you know what you want to make exactly

frank sail
#

most of the time it's kinda obvious what you need to do to solve a problem in vulkan

#

like most cases of uploading data

#

don't tell the others I said this, but vkCmdUpdateBuffer works quite well in many cases

wispy spear
#

: )

frank sail
#

e.g. for a basic model viewer you could get away with vkCmdUpdateBuffer for per-frame data and then a simple duplicated buffer for per-object uniforms

#

or just vkCmdUpdateBuffer if you have few enough objects frognant

distant lodge
loud crag
#

vkguide is my lord and saviour

wicked notch
#

vblanco in shambles (it's joever)

#

I just used the khronos samples to learn vk honestly

#

I already knew GL so it was pretty easy froge

#

the GL -> Vk pipeline is real

delicate rain
#

Meh imo it doesn't really matter what you use to learn, even if you pick up some not so optimal practices, just listening to people on this server will eradicate most of them anyways

#

The biggest step imo is understanding the API, everything else you can just learn from other nih abstractions

glass sphinx
distant lodge
#

I'm glad vkguide/this server got me off of cached command buffers and on pipeline dynamic state

#

I think vulkan-tutorial presents this now too

delicate rain
#

It basically all stands on the fact that someone will implement a part of the algorithm in hardware

#

Otherwise it reduces memory bandwidth but compute cost will go nuclear

#

It's also unfit for use with current hw rt acceleration

#

So they basically say "currently this will only work in software but it will be slow. If someone adds a new hardware unit and makes it a part of the API it will be fast"

#

Sadge

buoyant summit
#

tbf

delicate rain
#

Interesting, I did not know that, but still I was kinda talking from the user perspective

buoyant summit
#

yes

delicate rain
#

As in as of right now it's basically useless for me 😅

buoyant summit
#

ok nvm

#

yes

#

that stuff in paper requires hw changes because it affects

#

but

#

I think global LOD is still useful

#

and global LOD doesn't affect traversal

#

with global LOD you'd just swap in differently detailed BLAS depending hit feedback or whatever

#

this is arguably more useful for something like a game as it lets you use less memory for the scene than needing full LOD in memory at all times

delicate rain
#

They mentioned that in the introduction - I thought you already could do that with the API (I know close to nothing about the API though)

buoyant summit
#

yes you could but

#

for very high tri meshes build cost is oof

#

and that's the bit that would be nice to address

delicate rain
#

I see, I need to experiment and build some intuition

buoyant summit
#

otherwise yes just do global LODs with rt

#

I think LOD decisions at traversal time are kinda whatever

delicate rain
#

There is also the paper from Intel about micro poly rt stuff, guess I'll read that next

buoyant summit
delicate rain
#

But I'm kinda worried

buoyant summit
#

but it only modifies build AFAIU

delicate rain
buoyant summit
#

so the hw can do it

#

just the api kinda can't (just for now, hopefully)

#

well it can but not in a very useful capacity I mean

#

you'll succumb to AS build costs with the kind of meshes they're working with

delicate rain
#

I wonder how sophisticated the hw BVHs are

buoyant summit
#

@hallow cedar

hallow cedar
#

what

buoyant summit
delicate rain
#

It seems like that's 90% of all the rt research now - more efficient bvh (aabb-kdop-obb)

hallow cedar
#

ah

#

nvidia i think quite a bit more than amd, intel potentially too

#

amd is a pretty standard BVH4, nvidia was doing CWBVH-y stuff I heard somewhere but I'm really not sure

delicate rain
#

I imagine it's all secret, especially the details

hallow cedar
#

well amd's not really

delicate rain
#

But AMD's is slow no?

hallow cedar
buoyant summit
#

all vendors likely use a simple bvh of some kind

#

but some vendors have more interesting hw bits for traversal

hallow cedar
#

speaking of i still have next to no idea about rdna4 rt and that's kinda weird to me

buoyant summit
#

rdna4 removes RT hw mindblown mindblown mindblown

hallow cedar
#

most accurate hw leak

delicate rain
buoyant summit
#

bro

#

get amd

delicate rain
#

So if you really wanna publish/research in that field, you're bound to software

buoyant summit
#

(or intel if you like extra pain)

#

hw rt is there

#

the open drivers are there

#

buy steam deck

#

you can hack the vk driver in whatever way you want

#

do whatever

#

with hw

#

directly

delicate rain
#

Uh

buoyant summit
#

nv rt in nvk does not yet exist and who knows when will

#

but the setup for rt on nv is a bit complex so it will be some time

#

I doubt you'd be up for REing nv prop blob

delicate rain
#

Yeah no and even if I were, there's a bunch of external limitations forced on me so it wouldn't be viable anyways

buoyant summit
#

ye

buoyant summit
#

very easy

delicate rain
#

Guess there is still the adobe displacement stuff that looked interesting though

buoyant summit
#

I swear on me gpu

hallow cedar
#

be careful with that

#

it might leave you hanging

pale horizon
#

They should remove TAA hw next

buoyant summit
#

what if there's no taa hw 😳

#

bro wants to remove the entire gpu

hallow cedar
#

CUs can do TAA with shaders -> remove CUs

#

only command processor pls thxbai

buoyant summit
#

computing on command processor be like frog_turtle frog_turtle frog_turtle frog_turtle frog_turtle

pale horizon
#

Retvrn to software rendering

hallow cedar
#

but what if software taa

#

maybe we will have to go outside

buoyant summit
pale horizon
#

Remove TAA CPU instruction

buoyant summit
#

those don't have drawing hw

#

or format conversion, image load/store, sampling support

#

just software

pale horizon
buoyant summit
#

just compute

#

only CUs

#

no drawing

hallow cedar
#

but what if our eyes do taa internally
your eyes do taa get them out get them out get them out get them out get them out get them out get them out get them out get them out get them out get them out get them out get them out get them out get them out get them out get them out get them out get them out get them out get them out get them out get them out get them out get them out get them out get them out get them out get them out get them out get them out get them out get them out get them out get them out get them out get them out get them out get them out get them out get them out get them out get them out get them out get them out get them out get them out get them out get them out get them out get them out get them out get them out get them out get them out get them out get them out get them out

pale horizon
#

Me when I turn my head in game with TAA fr fr

pale horizon
frank sail
delicate rain
#

I was mostly meming

#

Although I've heard some hate from Patrick

frank sail
#

Oh, AMD's rt acceleration is indeed a lot slower, but it's still very usable

delicate rain
#

Last I've heard it's 4-5x faster than software, but maybe that is outdated

#

I want to do hw rt so much, but it's hard to find something to wrap it with (as in a topic for uni)

fiery bolt
primal shadow
wispy spear
#

stolen from ue sources

wicked notch
#

you can see that in AMD's small triangle culling code

#

I don't feel qualified to explain that (also because it's been so long) but AMD should have everything you need

primal shadow
primal shadow
#

I didn't find anything there

wispy spear
#

finish it first

faint ruin
#

oh wait I didn’t even realize Elias was the one that wrote that vulkan article about how he learned wowza

#

sorry my mind goes a million miles a second sometimes

wicked notch
#

elias made it

#

insane

faint ruin
#

I’m a little dense sometimes

wicked notch
#

it's ok

#

don't worry about it chief

faint ruin
#

I’ll finish Vkguide up first
I’ve been super happy with it this far, probably the best thing to do before asking further questions

wispy spear
#

doesnt hurt to take notes about things which make no sense yet

#

and there is a post (something something vkguide) in #1019722539116802068 from vblanco, the author of that guide

faint ruin
#

thank ya thank ya

pale horizon
#

What was the deleted post about? 😅

faint ruin
# pale horizon What was the deleted post about? 😅

Knee jerk reaction to delete it buuuut
My initial question was essentially asking once one finishes vkguide, compared to say Vulkan-tutorial, it’s much more applied in that you’re making an engine. The question was essentially in regards to making a smaller renderer inspired by vkguide to better fit a different use case, or if basing a new project off what I did in vkguide makes sense where things do not see

#

Like, if I wanted to make a simulation application or a tool that lets me test different rendering techniques or run different demos of implemented graphics papers. Or I want to make a nice terrain generator. How can I take what I learn in vkguide into these tutorials in a way that tailors it to my use cases?

pale horizon
#

The renderer in vkguide is already quite minimal, and you’ll likely need much more for a “small” renderer, tbh
You can drop “drawing transparent objects” part if you don’t need it.

Also you don’t need “GPU driven” part most likely, unless you plan to render 1000s of objects

wispy spear
#

i would read the whole thing anyway

faint ruin
# wispy spear i would read the whole thing anyway

That’s my plan to at least read up to and through gpu driven rendering (maybe not before actually iterating on ideas), cause I have time to learn and wanna improve my understanding and application of the API

pale horizon
faint ruin
#

I have ideas on projects I wanna take on, it’s just a matter of reasonably converting what I have here to my needs.

I also feel a certain way of “cheating” using sample code and tutorials like this without actually understanding the underlying API (which happens through reading and actually getting your hands dirty)

#

But that’s a personal disposition I’m working on discarding

#

it makes no sense to have that especially if I am new to Vulkan.

wispy spear
#

you also better create a post in #1019722539116802068 about the projects you have in mind, like the other frogs did/do too

faint ruin
#

oh no, I will not be subject to the threats of those who came before me
post my projects I shall

wispy spear
#

hehe

faint ruin
#

Vulkan has been fun, learning it has been exhausting but I like the challenge
it definitely helps to have done stuff with other APIs too
I’m hyped, thanks guys

pale horizon
#

I can also highly recommend getting into bindless textures ASAP
Descriptor sets are not that good and bindless + buffer direct access allows you to mostly get rid of them

faint ruin
pale horizon
#

It’s not like they’re “bad”. It’s just managing/allocating/defining them is PITA when with bindless you just pass your textures with push constants and that’s it

faint ruin
#

I see

#

Push constants seems to be something I might wanna use later on

#

But yeah, the process for setting everything up with the sets and layouts does seem rather tedious

pale horizon
faint ruin
#

seems like I’ll keep digging through then

faint ruin
pale horizon
#

Push constants are fundamentals of vk 😅
But yeah, it’s good to reference multiple resources. Also TU Wien lectures are quite good indeed.

#

Don’t worry about “not getting” it - you’ll understand more with practice and when you solve real problems and add code on your own.

#

I think I know like 10% of vk and still I managed to make something presentable

faint ruin
#

Not necessarily the 10%, but the knowledge that leaning into the fundamentals and just… working with it. You don’t need to learn everything at once

#

It’s like a fighting game. You don’t need to learn every combo at the beginning

pale horizon
#

Yeah. You’ll rewrite it 10 times anyways, lmao

faint ruin
#

LOL probably true

faint ruin
wicked notch
#

man

#

you ever try to sit down and study

#

but while you read your eyes go out of focus

#

I think I'm going to die soon

hallow cedar
#

while you read
wish I'd get that far

#

i know the "going out of focus" stuff though

#

it usually happens when I'm tired

wicked notch
#

it's barely 5pm tho agonyfrog

#

the 32°C of hellfire in my room doesn't help

hallow cedar
#

ah you too

#

i had 35°C earlier this week

#

death

delicate rain
#

Usually happens for me the first hour or so that I start

#

Once I get into it properly it's usually fine

#

And then once I get tired it comes back yeah

buoyant summit
#

today was my first moderately productive day in a while actually

#

did some CTS work that I can't talk about details of

#

god VK-GL-CTS is a chonker

hallow cedar
#

it's insane

#

on my old laptop I had to basically compile until I run OOM, then compile with lower thread-count (singlethreaded at time) for a bit, then abort and continue with higher thread count

#

because -j1 is incredibly dog slow but -j6 (yes it's a hexcore 🐸) makes me run oom

buoyant summit
#

buy chungus ram

#

why are there no laptops with option for 64G

#

or 128G

#

pls

#

ayymd how can I survive off 32G with a 12c SMT2 part

#

my current work laptop is actually a

#

4c SMT2 16G thing

#

which is...

#

well..

#

it works

hallow cedar
#

i was considering retrofitting 64g into my laptop

buoyant summit
#

with swap it's actually fine, so browser etc can get swapped out while important stuff gets to use ram

hallow cedar
#

went with 32g for now and it's ok

buoyant summit
#

you should

#

32G felt a bit uncomfortable on my 5900X PC

hallow cedar
#

yes on pc i'm 64G

#

on laptop i'm 8c16t32gb

#

but mediumterm I kinda want to look at alternatives

buoyant summit
#

I just want a nice reasonable laptop

hallow cedar
#

when I get a non-shit internet connection I'm considering just having a remote power switch for my pc and doing remote-y dev

buoyant summit
#

where 5p69e cpu pls

hallow cedar
#

I suppose my case is sorta special though

buoyant summit
#

ye that sounds nice

hallow cedar
#

I need chonker gpu for rt, but I've come to hate gamer bricks

hallow cedar
#

though I'd like to do boot/kernel selection perhaps

#

hmmmm

#

anyway I should probably ramble about this in my own thread instead 🐸 sorry

glass sphinx
wicked notch
#

btw it's two hours later and the situation has barely improved

#

here's a conversation between me & my friends about the situation at hand (the exam is tomorrow)

#

first message is "how is it going"

loud crag
#

L

wicked notch
velvet marsh
#

it's going to be ok, unless it won't

buoyant summit
#

tbh heat is also temporary

frank sail
#

life is also temporary

velvet marsh
#

do you all need a hug today

wicked notch
#

I need to pass this godforsaken exam

wheat haven
#

what subject

wicked notch
#

image processing and neural networks

#

it's 6 credits but it should be 1000

#

there's like 5 books we had as material

#

my notes are 600k words and 400 pages

#

it's insane

velvet marsh
#

sounds over the top and unnecessary

wicked notch
#

lads

#

it is

#

finally

#

(for real this time)

#

over

faint crane
#

Is it over, or are we so back?

wicked notch
#

we are truly back

#

exam time is finally over

delicate rain
#

Ez clap

#

Time to shill some gp

wicked notch
#

ngl I was so lucky bleakekw

#

I knew nothing about morphological processing (had to skip because it was impossible to understand all that garbage in a few days)

#

but they didn't ask me about it so

frank sail
#

you did it laddie

frank sail
delicate rain
#

Wtf

#

I think I took like 500 words worth of notes accumulated across all my years studying lol

frank sail
#

I was so stupid I would take "notes" in math lectures just copying what the prof wrote, hoping it would make me magically understand the topic better

#

normally I'd barely write down anything though

delicate rain
#

I did that a little the first year

#

But yeah just listen and try to understand

#

Much better spent time than copying and not catching what he says

frank sail
#

my math prof was foreign and had an accent that I could hardly understand (my bad hearing didn't help either) froge_bleak

delicate rain
#

Oh god, yeah that does not sound good bleakekw

frank sail
#

anyway I somehow pulled through

delicate rain
#

Having pretty much all lectures on YouTube kinda allowed me to get away with almost no notes too btw

pale horizon
#

LVSTRI gonna come up with novel waifu generators next

wispy spear
#

wb lustri, time to bikeshed UE6 clone

loud crag
#

since idk why this works, I'm a little confused

#

because that does backface culling, right?

#

since it does det > 0.0

#

however in my Metal renderer that does frontface culling, and I need to do det < 0.0 instead

#

is this related to Y+ being up instead of down?

wide shadow
#

the code is from here I guess and if you do proj_mat[1][1] *= -1.0f to flip Y axis then you do frontface culling instead of backface culling
https://zeux.io/2023/04/28/triangle-backface-culling/

wicked notch
loud crag
#

ok thank

velvet marsh
#

a negative determinant indicates a reflection

#

so it's not just a up thing, but a handedness thing

#

you could have a different up axis and still have a positive determinant

#

and you could have the same up axis and a negative determinant
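To make the handedness point concrete, here is a minimal sketch (not the code from the linked post). `signed_area_2x` and `flip_y` are hypothetical helper names. Mirroring one axis flips the sign of the determinant, while flipping both axes (a 180 degree rotation) keeps it, which is why the `det > 0.0` test changes meaning only when the handedness changes.

```rust
/// Twice the signed area of the 2D triangle (a, b, c): the determinant of
/// the 2x2 matrix whose columns are the edge vectors (b - a) and (c - a).
fn signed_area_2x(a: [f32; 2], b: [f32; 2], c: [f32; 2]) -> f32 {
    (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
}

/// Mirrors a point across the X axis, i.e. flips Y (a reflection).
fn flip_y(p: [f32; 2]) -> [f32; 2] {
    [p[0], -p[1]]
}

/// Flips both axes, which is a rotation by 180 degrees, not a reflection.
fn flip_xy(p: [f32; 2]) -> [f32; 2] {
    [-p[0], -p[1]]
}
```

So a renderer that negates `proj_mat[1][1]` sees the opposite sign for the same triangle and has to swap the comparison (or the winding) to keep culling the same faces.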

primal shadow
#

Help me brainstorm names that are better than meshlets, but less verbose than virtual geometry 😅
I was thinking maybe MicroMesh, but that's kinda a thing already (RT technique)

wicked notch
#

meshlet = cluster

#

if that's what you want

wispy spear
#

micromesh is a thing already

primal shadow
#

No a name for the user-facing feature in bevy. Something catchy like nanite.

wispy spear
#

i linked the papers somewhere in frogfood a week ago or so

primal shadow
wicked notch
#

that's weird

#

could've just done cluster and clusterInstance innit?

#

or is that reserved somewhere else

primal shadow
#

Ehh that's a bit verbose with how often I use the word cluster 😅

wicked notch
#

this is a matter of preference ofc

#

personally I'd rather a little bit of verbosity if that means not using two synonymous words in different contexts

primal shadow
#

It's subject to change, I need to clean all my code up, it's on the TODO. I'll think about it.

faint crane
primal shadow
#

I want to try and come up with a better name for the feature though since it's confusing that it's called meshlet, which is also used everywhere in the code 😛

wispy spear
#

who cares whether rust goes into nodejs? (im sorry if thats you cody)

wicked notch
faint crane
#

Nah, alcohol as a naming scheme.

wispy spear
#

ah

wicked notch
#

(in UE HLOD is a very different thing from virtual geometry)

primal shadow
#

We have HLODs already as a separate thing, although I think we call them VisibilityRange

wispy spear
#

just keep the same naming scheme jasmine, everyone doing it knows what x and y means, and you cooking up some new terminology will be conchfusing

#

unless you want to coin those new shit for some weird reason in the bevy world, to stand out or something idk

primal shadow
#

Ok forget the variable naming

#

I need a new name for the rendering feature itself, like Nanite

wispy spear
#

Bevvite 😉

primal shadow
#

Currently it's just "MeshletPlugin" and "MeshletMesh"

#

Which are terrible

wispy spear
#

Meshlette

#

MeshlettifiedMesh

primal shadow
#

Those are so much worse 😅

wispy spear
#

ey 😄

#

why not just Mesh

#

and the engine decides whether it needs to meshlettify it or not, via some switch

primal shadow
#

We have a Mesh, but it's an entirely separate thing, and would be very confusing to users who don't use the feature

wispy spear
#

remove it then

wicked notch
#

do you fancy acronyms? KEKW

primal shadow
#

Rip transparency and animated meshes I guess 😛

#

Sure, acronyms are fine

wispy spear
#

and make Mesh 2.0 the new Mesh

#

and add animations back in heh

wicked notch
#

CBVHLOD (cluster based virtualized hierarchical level of detail bleakforg)

primal shadow
wicked notch
#

intel skill issue

wispy spear
#

thats what you get for targeting peasant hardware

primal shadow
wicked notch
primal shadow
wicked notch
#

the more confusing the better the acronym

primal shadow
#

I think I'll just go with VirtualMeshPlugin and VirtualMesh 😛

wicked notch
#

but yeah naming is hard

primal shadow
#

What makes a mesh "virtual"? Idk it's a meaningless term, but it sounds good

wicked notch
#

I think the virtual is mostly referring to virtual texturing, as nanite author explains it the logical transition was "virtual texturing" (decoupling memory budget and texture sizes from rendering) to "virtual geometry" (again, removing concerns about poly budgets and automating LODs)

primal shadow
#

No I know, but for like non-rendering people, virtual is such a meaningless term lol

wicked notch
#

o

#

yeah idk how to explain it to non rendering people

primal shadow
#

The other annoying thing with VirtualMesh is that it has mesh in the name

#

So now I'm going to have dumb variables like virtual_mesh_meshlets

#

Which I mean, ig it's fine

#

Best I can come up with

wicked notch
#

tell your users "you better pray I don't call my next variables x and y"

#

make them appreciate smart

velvet marsh
#

vmeshes is my contribution

primal shadow
#

VMesh is not bad, thanks

wicked notch
#

nanogoon

frank sail
#

geographs

faint crane
#

nanot

buoyant summit
#

megametry

#

ur welcome

buoyant summit
primal shadow
#

@wicked notch do you understand nanite's scanline code? I don't get how it's solving for the passing X-interval

#

It's also not working for me 😦

primal shadow
#

Nvm think I got it

#

Idk why I swear I've tried this code before, but now it's suddenly working...

#

Mostly working, some pixels are slightly off still, but close!

primal shadow
#

Haven't finished the heuristics for SW/HW render switching, but only getting maybe a ~10% speedup vs HW only... Nowhere near what Nanite is quoting.

#

On the plus side I save a lot of memory having to only allocate data per-cluster instead of per-triangle, as I don't have access to mesh shaders to make HW raster fast

wispy spear
#

time to extend webgpu to support mesh shaders

primal shadow
#

I suppose I could use spirv passthrough and write it in glsl instead ig..

wispy spear
#

what is naga? the shader middleware?

primal shadow
#

Shader transpiler. Wgsl -> msl/hlsl/spirv/glsl

wispy spear
#

ah

fiery bolt
primal shadow
fiery bolt
#

try out some raw megascans assets perhaps

primal shadow
#

Trying to figure out why tf triangles wrap around to the other side of the screen when doing scanline...

primal shadow
#

I don't have any time to experiment though, as I need to focus on meshlets :((

wispy spear
#

ah oki

primal shadow
#

And my stupid fucking shader won't work and renderdoc is giving me different results when viewing different threads

#

Any idea why renderdoc says thread 2 writes a "good" value to LDS, but then thread 0 reads garbage at index 2??

frank sail
#

Well, interactions between multiple threads

primal shadow
#

Really? Ahh

#

Well then idk how to even debug this tbh

frank sail
#

Ok, it doesn't run other threads at all

To debug a compute thread simply go to the compute shader section of the pipeline state viewer and enter the group and thread ID of the thread you would like to debug. This thread will be debugged in isolation with no other threads in the group running.

This means there can be no synchronisation with any other compute thread running and the debugging will run from start to finish as if no other thread had run.

primal shadow
#

Yeahh idk how to even go about trying to figure out why my triangle is wrapping around the screen when it should be clipped

frank sail
#

Make a version of it that doesn't use shared memory

primal shadow
#

I guess do all the math on the CPU?

#

Oh, good idea lol

faint crane
#

Or bunnies apparently.

wispy spear
#

unfortunately im not in the gaming/graphics industry and therefore neither at siggraph nor hpg nor any other giraffics related conference : (

faint crane
#

I can spread the gospel of VSM or whatever acronym we want to give it.

wispy spear
#

please do 🙂

primal shadow
#

Which is obviously backwards...

#

How even though?? ahh

#
// Compute triangle bounding box
let min_x = u32(min3(vertex_0.x, vertex_1.x, vertex_2.x));
let min_y = u32(min3(vertex_0.y, vertex_1.y, vertex_2.y));
var max_x = u32(ceil(max3(vertex_0.x, vertex_1.x, vertex_2.x)));
var max_y = u32(ceil(max3(vertex_0.y, vertex_1.y, vertex_2.y)));
max_x = min(max_x, u32(view.viewport.z) - 1u);
max_y = min(max_y, u32(view.viewport.w) - 1u);
#

I think it's because I clamp the max values to be within screen bounds, but not min...

wicked notch
#

it's UE's so beware KEKW

primal shadow
#

Ah ok

#

Still, I figured out the issue myself, and I just wanted to check if it's ok to early-out here or not

#

Which yeah, it's fine

#

Ok so that fixes the weird wrap-around issue
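A rough CPU-side sketch of the fix being described, written in Rust rather than WGSL and not taken from the actual shader. `PixelBounds` and `triangle_bounds` are hypothetical names. The assumptions: clamp both the min and the max corner to the viewport, do the clamping in floating point before converting to `u32`, and early-out when the box is entirely off-screen.

```rust
/// Pixel-space bounding box, inclusive on both ends.
#[derive(Debug, PartialEq)]
struct PixelBounds {
    min_x: u32,
    min_y: u32,
    max_x: u32,
    max_y: u32,
}

/// Returns the clamped bounding box of a screen-space triangle, or `None`
/// when the triangle cannot touch any pixel of the viewport.
fn triangle_bounds(v: [[f32; 2]; 3], viewport_w: u32, viewport_h: u32) -> Option<PixelBounds> {
    if viewport_w == 0 || viewport_h == 0 {
        return None;
    }
    let min_x = v[0][0].min(v[1][0]).min(v[2][0]).floor();
    let min_y = v[0][1].min(v[1][1]).min(v[2][1]).floor();
    let max_x = v[0][0].max(v[1][0]).max(v[2][0]).ceil();
    let max_y = v[0][1].max(v[1][1]).max(v[2][1]).ceil();

    // Early-out: the whole box lies outside the viewport.
    if max_x < 0.0 || max_y < 0.0 || min_x >= viewport_w as f32 || min_y >= viewport_h as f32 {
        return None;
    }

    // Clamp while still in float, so a negative min never reaches the
    // unsigned conversion and min can never end up larger than max.
    Some(PixelBounds {
        min_x: min_x.max(0.0) as u32,
        min_y: min_y.max(0.0) as u32,
        max_x: max_x.min((viewport_w - 1) as f32) as u32,
        max_y: max_y.min((viewport_h - 1) as f32) as u32,
    })
}
```

For a 100x100 viewport, a triangle hanging off the left edge gets `min_x = 0`, and one that is fully past the right edge returns `None` instead of producing a box with `min_x > max_x`.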

#

Things are still a bit pixelated with the scanline variant though, not sure why

#

might be an off-by-1 pixel somewhere

#

Like something's weird here

primal shadow
#

Idk how Nanite quotes 1.1ms for raster time in lumen in the land of nanite

#

That's insane

fiery bolt
#

the persistent threads bvh traversal probably saves a bunch of time, as well as whatever micro-opt they did in the compute rasterizer

#

still, wasn't it 2-3 ms?

#

or was that the second demo

primal shadow
#

No it was 1.1ms just for the raster, nothing else

#

The only thing I can think of is I use 64 triangle clusters, and tend to have low fill rate. They use 128 triangle clusters (better warp occupancy), and probably do better on fill rate.

fiery bolt
#

meshopt asserts at more than 124 tris per meshlet doesn't it KEKW

#

and the simplifier doesn't work all too well (it went from simplifying 14000 -> 144 meshlets once)

#

I need to make my simplifier work

wicked notch
#

I'm cookin lads

#

I'm cooking something special

primal shadow
#

What of?

wicked notch
#

a special sauce

#

the sauce that clusterizes and simplifies stuff

#

that I promised jaker 10 decades ago KEKW

faint crane
#

There was a presentation on dynamic LoD and “nanomesh” for mobile at the part 1 of “advancements in real-time graphics in games” course. Slides expected to be out next week.

#

I recall them having a 32 bit visibility buffer which stored cluster and triangle id at 25 and 7 bits. Also an exponential distance based heuristic for selecting a cut and culling.
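A sketch of what a 25 + 7 bit split could look like. The bit order (cluster ID in the high bits) and the function names are assumptions, not something from the presentation. 7 bits covers triangle IDs 0..=127, so clusters of up to 128 triangles fit.

```rust
const TRIANGLE_BITS: u32 = 7;
const TRIANGLE_MASK: u32 = (1 << TRIANGLE_BITS) - 1; // 0x7F
const MAX_CLUSTER_ID: u32 = (1 << 25) - 1;

/// Packs a cluster ID (25 bits) and a triangle ID (7 bits) into one u32.
fn pack_visibility(cluster_id: u32, triangle_id: u32) -> u32 {
    debug_assert!(cluster_id <= MAX_CLUSTER_ID);
    debug_assert!(triangle_id <= TRIANGLE_MASK);
    (cluster_id << TRIANGLE_BITS) | (triangle_id & TRIANGLE_MASK)
}

/// Recovers (cluster_id, triangle_id) from a packed visibility value.
fn unpack_visibility(packed: u32) -> (u32, u32) {
    (packed >> TRIANGLE_BITS, packed & TRIANGLE_MASK)
}
```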

dull oyster
#

an exponential distance based heuristic for selecting a cut
Exactly what I need for selecting clusters for raytracing 👀

faint crane
#

They have another presentation Thursday. I’ll have to try my hand at it when slides are out since I couldn’t follow verbally.

#

Results were impressive for high and low end mobile though. Very encouraging.

primal shadow
#

At gdc iirc

fiery bolt
faint crane
#

Your Bevy article was referenced on screen.

fiery bolt
primal shadow
faint crane
#

Visibility Buffer Rendering during part 2

primal shadow
#

Oh, cool! Are the slides posted anywhere?

#

Website doesn't have them yet 😦

faint crane
#

Sometime next week. Photography isn’t allowed so I didn’t try.

#

You can have this though.

wide shadow
frank sail
#

loading it is a pain in the butt I'm finding

#

it's split into a million tiny usd files

fiery bolt
glass sphinx
#

no way blender can handle that

#

prob faster to write a cpp program to do that

fiery bolt
wispy spear
#

is that the atomium sculpture in brussels

fiery bolt
#

that's a random table i stole obtained legally from megascans at max quality

wispy spear
fiery bolt
#

(for a total of 3 billion tris)

wicked notch
#

plus it was never meant for this shit

#

I'll roll my own

#

and then give it to you frogs

buoyant summit
#

make sure it's written in rust

#

or I will oxidize the pins on your cpu

#

or pads, depending on which ones your cpu has

wicked notch
#

this channel is anti rust

#

I like my iron clean and with no oxygen

fiery bolt
#

all my code is rust frog_bath

faint crane
#

"Variable Rate Shading with Visibility Buffer"

primal shadow
#

Ok so I peeked at nanite's code and was able to fix my scanline SW raster

#

Adding the "is point in triangle" test to each pixel of the scanline instead of assuming it's covered fixed it

#

Not sure why that's necessary, I guess some kind of partial coverage/subpixel thing
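For reference, a generic per-pixel "is point in triangle" test using edge functions. This is a textbook sketch, not Nanite's or Bevy's code. The names are hypothetical, it samples at pixel centers, accepts either winding, and ignores fill rules (pixels exactly on an edge count as covered).

```rust
/// Edge function: positive on one side of the directed edge a -> b,
/// negative on the other, zero exactly on the line.
fn edge(a: [f32; 2], b: [f32; 2], p: [f32; 2]) -> f32 {
    (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
}

/// True when the center of pixel (px, py) lies inside the triangle `v`.
fn pixel_center_in_triangle(v: [[f32; 2]; 3], px: u32, py: u32) -> bool {
    let p = [px as f32 + 0.5, py as f32 + 0.5];
    let e0 = edge(v[0], v[1], p);
    let e1 = edge(v[1], v[2], p);
    let e2 = edge(v[2], v[0], p);
    // Inside means the point is on the same side of all three edges.
    (e0 >= 0.0 && e1 >= 0.0 && e2 >= 0.0) || (e0 <= 0.0 && e1 <= 0.0 && e2 <= 0.0)
}
```

A scanline loop that rounds its X-interval outward will visit pixels whose centers are just outside the triangle, and a test like this rejects them.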

wicked notch
#

man

#

the kitchen is overloaded

#

why are there so many edge cases

#

I've been doing this for hours and I'm nowhere near done

#

and it's 3am

#

guess I'll keep going tomorrow, my eyes are not staying open bleakekw

#

this reminds me of self balancing rbtrees but worse

#

"if the node has no black successors, at least one ancestor is red or if it has no left children and at least one right child or if ..."

ebon ruin
#

sleep bro

#

or you’ll become munted and die

fiery bolt
primal shadow
#

Did a massive refactor of the >1000 lines MeshletGpuScene into separate parts. Took ages, but much cleaner now, minus one of the new subsystems that's basically a copy paste. I still need to clean that up, and then optimize things so that we're not spending ~1ms/frame of CPU time on extract + preparing resources. https://github.com/JMS55/bevy/commit/b8ab371a2566c5c0f2fe743224854f8cad452b12

#

Overall CPU timings

wicked notch
#

pog

#

gotta read this

faint crane
#

I only attended SIGGRAPH, missed this was going on until they held a concluding panel at SIGGRAPH.

#

Just need a METIS killer.

wicked notch
#

I'm cooking

#

I only need to figure out testing because spamming node allocations is fun and all but eh kekkedsadge

#

I wonder if someone made some test suite for this algo

faint crane
#

Test in prod. PR to Unreal.

wicked notch
#

you know what

#

you're actually a genius

#

I forgot I had ue cloned and built

#

I'll just drop my shit in there and see what happens KEKW

faint crane
#

Cat is in lap. Maybe I gained a brain cell.

#

METIS but with good parameter names would be the most exciting thing out of this last week TBH.

#

I tried porting it once, but saw a bunch of GOTOs and cursed control flow.

primal shadow
faint crane
#

Froge computing when?

primal shadow
#

This paper seeks to close the gap by defining a block-compressed geometry format that is designed for arbitrary geometry topologies and can be directly consumed by future fixed-function hardware.

#

It's a bit of both, they say you can use it in mesh shaders and stuff, but also suggest it as a future hardware-level representation

primal shadow
#

Capped off the weekend by opening up the next meshlet PR and writing up the description for it https://github.com/bevyengine/bevy/pull/14623. Once this is merged, I plan to improve the CPU performance, and then either look into fixing occlusion culling bugs, or do persistent threads style culling and save a large chunk of time + eliminate a large amount of memory allocations.

fiery bolt
#

occlusion culling bugs froge_bleak

#

that's what I'm trying to fix (or was anyways until I decided to rewrite all my shaders in slang)

primal shadow
#

I'm using SPD for my depth pyramid generation, which is definitely not conservative 😛

#

I might just have to give up and write a more complex, slower, multi-dispatch downsampling pass

fiery bolt
#

you can set your own reduction tho

primal shadow
#

Only the reduction op (e.g. average, max, min, whatever)

#

But the problem is sometimes you need to read more than a 2x2 when downsampling

#

Something like that, I haven't looked into it too much

fiery bolt
#

yeah a max (for inverse z) should be conservative

primal shadow
#

Overview A hierarchical depth buffer is a multi-level depth (Z) buffer used as an acceleration structure for depth queries. As with normal texture mip chains, the dimensions of each level are generally successive power-of-2 fractions of the full-resolution buffer’s dimensions. In this article I present two techniques for generating a hierarchica...

fiery bolt
#

I was also fixing my SPD before rewriting everything in slang lol

#

dxc infinite looped when I made my 6th mip globallycoherent bleakekw

primal shadow
#

Fun. I don't even have access to that, I just split it into 2 dispatches (the second is only needed if you have a large enough initial texture, which is usually no)

primal shadow
#

idr off the top of my head

fiery bolt
#

it'd be 2^7 or 8 depending on how you generate mips

#

unless you go to quarter res directly

fiery bolt
#

@primal shadow i noticed you use the current camera view in the early pass - shouldn't that be the previous frame view?

primal shadow
#

Oh you know what, maybe I bind previous view as the current view? Hmm

fiery bolt
primal shadow
#

Yeah it's not that

#

I really don't know how this got messed up, because I explicitly fixed it in a PR dedicated to this kind of thing 😅

#

Maybe a lost commit? idk

fiery bolt
#

lol

primal shadow
#

I'm so confused what my code is doing 🤔

#

Oh ok I think I've fixed it

fiery bolt
#

hmmm 90ms for 3 billion tris, I should probably get started with the bvh and persistent threads cyberpoonk

#

I'm gonna TDR so much

wispy spear
#

increase the TDR timeout think

dull oyster
#

render 1 billion tris and temporally accumulate them

wicked notch
#

or just raster

fiery bolt
wicked notch
#

did you profile 🐸

fiery bolt
#

and nosight shows me that my task shader has the shittiest occupancy known to man

#

like, 8 warps per SM

#

30% instruction issue rate

wicked notch
#

show TS

fiery bolt
#

TS?

wicked notch
#

task shader

fiery bolt
#

ah

#

the worst code you're ever gonna read

wicked notch
#

I mean you could've done the meshlet emit count thingy with subgroup ops instead of atomicAdd'ing one every time

#

but it's fine otherwise

fiery bolt
wicked notch
#

doesn't need the atomic anymore innit

#

because your group size will be equal to subgroup size

fiery bolt
#

my workgroup size isn't subgroup size thonk

wicked notch
#

yes

#

you will set it to that though if you want it to work KEKW

#

with subgroup ops that is

fiery bolt
#

well I guess

wicked notch
#

but it's fine either way

fiery bolt
#

yeah

wicked notch
#

profile your shader and look which inst is taking up the most cycles

fiery bolt
wicked notch
#

me when LGSB

fiery bolt
#

actually, can nosight tell me the number of mesh workgroups dispatched

wicked notch
#

update to the beta nosight

fiery bolt
#

wait where do you get beta versions

#

am i stupid

wicked notch
#

it's actually just the 2024.2 version

fiery bolt
#

oh

#

lmao

wicked notch
#

nsight is in release only the other stuff is in public beta

#

I'm dumb

fiery bolt
wicked notch
#

I just updated too

#

when september comes around I need to see if I can ask my supervisor to sign an NDA with NVIDIA

#

so I can get nsight pro froge_love

fiery bolt
#

where do i get the mesh shader stats, i've been using 2024.2 for a bit and never seen them

wicked notch
#

it's been a long time since I last used nsight bleakekw

#

(literally only 2 months)

fiery bolt
#

i need to get myself a supervisor, i'm just a lowly first year froge_sad

#

well technically second soon

#

top stall 1: LGSB 49% bleaker_kekw

#

i wonder if i can (ab)use mesh shaders to switch between hardware and software raster in the same shader thonk

#

SetMeshOutputCounts(0, 0) and software raster or something

ebon ruin
#

What did gp do before virtual geometry lol

fiery bolt
#

actually release games

wicked notch
#

too real

fiery bolt
#

i reduced my mesh shader payload size and went from 60 to 27 ms

#

what the fuck

wicked notch
#

you should keep it to less than 104 bytes

#

as per nv's suggestions

fiery bolt
#

i was at... 768

wicked notch
#

incredible

fiery bolt
#

now it's 256

#

one index per meshlet

#

how do i make it smaller

fiery bolt
#

good idea

#

but my late culling pass has fragmented indices

#

i should get rid of the whole meshlet pointer shit tbh

loud crag
wicked notch
#

gpu issue ig

faint crane
#

There are frogs in your computer eating up all the threads.

frank sail
#

frogs in my computer forgeeep

ebon ruin
#

i wonder if apple had a good reason to use their own graphics api

#

it wouldnt surprise me if they did it just to make programmers slightly more suicidal

buoyant summit
loud crag
#

in what way

#

more details pls

buoyant summit
#

cc @craggy shale

buoyant summit
# loud crag in what way

in doing lots of invalid transforms to these ops like moving across control flow, and not using correct tangles (sets of threads that participate in a simd_ op)

loud crag
#

so the asahi compiler wouldnt have had the same hang issues?

#

how tf can they not compile correctly for their own hardware

#

also still not sure what you mean

#

too eeepy for this

faint crane
#

Frogs in your brain.

buoyant summit
primal shadow
fiery bolt
#

I don't understand what exactly they use to bound their BVH froge_sad

#

probably need to put some thought into it

#

but first I need to fix my cursed task shader abuse

fiery bolt
loud crag
#

i remember a presentation from apple about that in llvm and how they managed that

buoyant summit
#

idk

#

you can make llvm actually work

#

NV has done that somehow

#

@craggy shale pls msl simd_ miscompilation examples

loud crag
#

the guy from apple i spoke to about this said he had no idea why my shader was causing it by looking at it briefly

#

just blamed it on mesh shaders

craggy shale
primal shadow
loud crag
#

havent we all done this now

#

min reduction sampler or with just min() when reading the 4 samples

loud crag
primal shadow
loud crag
#

with the nvidia spd

buoyant summit
# loud crag i cant figure out mastodon search so i can‘t find shit
#

it's old tho so maybe they have fixed things since

#

gob also had a screenshot somewhere where they cse'd simd_add(1) across control flow

primal shadow
wicked notch
#

no changes required

#

it was very straightforward actually

loud crag
#

oh wait i get it

#

if i was using 32 meshlets per invocation, i had no loop (optimised away) and then it worked fine

#

anything else and it hung, probably from using the simd stuff in the loop

#

might still be related tbf

#

though again im no expert on these compiler shenanigans so no idea what this “loop reconvergence” means

#

mayhaps i could inspect the generated IR and see whats going on

fiery bolt
# loud crag ye lvstri does this and ive done this (semi-successfully)

https://themaister.net/blog/2024/01/

Building the mip-chain in one pass is great for performance, but causes some problems. With NPOT textures and single pass, there is no obvious way to create a functional HiZ, and the go-to shader for this, FidelityFX SPD, doesn’t support that use case.

The problem is that the size of mip-maps round down, so if we have a 7×7 texture, LOD 1 is 3×3 and LOD 2 is 1×1. In LOD2, we will be able to query a 4×4 depth region, but the edge pixels are forgotten.

The “obvious” workaround is to pad the texture to POT, but that is a horrible waste of VRAM. The solution I went with instead was to fold in the neighbors as the mips are reduced. This makes it so that the edge pixels in each LOD also remembers depth information for pixels which were truncated away due to NPOT rounding.

I rolled a custom HiZ shader similar to SPD with some extra subgroup shenanigans because why not (SubgroupShuffleXor with 4 and 8).
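A CPU-side illustration of the "fold in the neighbors" idea from the quote above, one mip at a time. This is not the single-pass shader from that post. `DepthMip` and `downsample_conservative` are hypothetical names, and the reduction (`f32::min` or `f32::max`) depends on your depth convention. When a source dimension is odd, the last destination row or column also absorbs the leftover source row or column, so no edge depth is forgotten.

```rust
/// One level of a depth pyramid, stored row-major.
struct DepthMip {
    width: usize,
    height: usize,
    texels: Vec<f32>,
}

/// Builds the next (half-size, rounded down) mip with a conservative reduction.
fn downsample_conservative(src: &DepthMip, reduce: fn(f32, f32) -> f32) -> DepthMip {
    assert!(src.width > 0 && src.height > 0);
    assert_eq!(src.texels.len(), src.width * src.height);
    let width = (src.width / 2).max(1);
    let height = (src.height / 2).max(1);
    let mut texels = Vec::with_capacity(width * height);
    for y in 0..height {
        // Source rows covered by this destination row. The last row
        // extends to the end of the source to pick up an odd leftover row.
        let y0 = 2 * y;
        let y1 = if y == height - 1 { src.height - 1 } else { 2 * y + 1 };
        for x in 0..width {
            let x0 = 2 * x;
            let x1 = if x == width - 1 { src.width - 1 } else { 2 * x + 1 };
            let mut acc = src.texels[y0 * src.width + x0];
            for sy in y0..=y1 {
                for sx in x0..=x1 {
                    acc = reduce(acc, src.texels[sy * src.width + sx]);
                }
            }
            texels.push(acc);
        }
    }
    DepthMip { width, height, texels }
}
```

With a 7x7 source this produces a 3x3 mip whose last row and column each cover three source texels instead of two, which is the case a plain 2x2 reduction misses.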

primal shadow
wicked notch
#

this refactor is going very well

#

I say to myself as I wonder why I can't keep working on the same thing

fiery bolt
#

9.6 ms to lod and cull 210384384 meshlets thonk

#

85% SM utilization

#

i doubt i can eke out more perf out of this

#

which means it's BVH time

wicked notch
#

single ownership rules

glass sphinx
#

this is now the meshlet rendering channel

wicked notch
#

always has been

fiery bolt
fiery bolt
wicked notch
#

our nanite is better

#

by virtue of being open source

#

and that we're all learning from each other froge_love

fiery bolt
#

my occlusion culling still decides to cull shit randomly

#

for no reason whatsoever

#

it's so random i think the bounding sphere generation is fucking up somewhere

primal shadow
fiery bolt
#

your performance is now gonna halve lol

primal shadow
#

Lol we'll see

fiery bolt
#

actually no it might improve

primal shadow
#

I have a ton of stuff to implement still, there's so much I can improve

fiery bolt
#

because it'll switch to the lower LOD earlier

fiery bolt
#

if only this occlusion culling started to work properly...

primal shadow
#

Mine is broken, I need to fix it

fiery bolt
#

whose isn't

primal shadow
#

The fold might be better written as a map and then a max, hmmm

fiery bolt
#

i do map(..).reduce(f32::max).unwrap()
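The two spellings side by side, on a hypothetical slice of per-cluster errors (what is actually being folded in the real code is not shown here). `f32` is not `Ord`, so `Iterator::max` is unavailable and `reduce(f32::max)` is the usual substitute.

```rust
/// fold: needs an explicit starting value, never fails on empty input.
fn max_error_fold(errors: &[f32]) -> f32 {
    errors.iter().fold(f32::NEG_INFINITY, |acc, &e| acc.max(e))
}

/// reduce: no seed value, but returns None for an empty input,
/// which is why the chat version ends in `.unwrap()`.
fn max_error_reduce(errors: &[f32]) -> Option<f32> {
    errors.iter().copied().reduce(f32::max)
}
```

The `reduce` form makes the empty case visible in the type; the `fold` form silently returns the seed.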

faint crane
fiery bolt
#

unfortunately it doesn't

#

maybe i should let it run for longer

wicked notch
#

I now have the greatest timeline abstraction thanks to Dolkar

glass sphinx
#

dolkar is indeed the timeline god

wispy spear
loud crag
#

@wicked notch in your idea of sparse vsm, when would you free pages that are not visible? In the normal VSM we have those two buffers for (1) free pages, and (2) cached/invisible pages, and we allocate from free pages before re-using cached pages.
Now the problem I am having is this line in the Metal docs:

If the heap runs out of memory, Metal skips any remaining tiles in the request.
This means I need some way of figuring out when to free cached pages, so that I don't run out of memory... What were you imagining to solve this kind of problem?

wispy spear
#

count up the space you consume as you request pages, and if above a certain threshold you flush old pages?

loud crag
#

hmm sure i guess i could query the amount of sparse memory I am currently using, and then when I get close to OOM I could just evict a few unused but cached pages

wicked notch
loud crag
#

what about caching though

wicked notch
#

we only cache in view pages

loud crag
#

oh bruh

#

ok ig that simplifies a lot

wicked notch
#

to keep out of view pages cached you can just implement an LRU cache

loud crag
#

what's that

wicked notch
#

and pop pages as needed before allocating new ones

wicked notch
#

one of many

#

least recently used (LRU)
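A tiny LRU sketch for out-of-view pages, assuming pages are identified by a plain `u32` and the cache has a fixed page budget. `LruPageCache` is a hypothetical name and the linear search is for brevity only; a real one would pair a hash map with a linked list or similar.

```rust
use std::collections::VecDeque;

/// Pages ordered from least recently used (front) to most recent (back).
struct LruPageCache {
    capacity: usize,
    order: VecDeque<u32>,
}

impl LruPageCache {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0);
        Self { capacity, order: VecDeque::with_capacity(capacity) }
    }

    /// Marks `page` as used. If the page is new and the cache is full,
    /// the least recently used page is popped and returned so the caller
    /// can free or reuse its memory before allocating.
    fn touch(&mut self, page: u32) -> Option<u32> {
        if let Some(pos) = self.order.iter().position(|&p| p == page) {
            // Already cached: move it to the most-recent end.
            self.order.remove(pos);
            self.order.push_back(page);
            return None;
        }
        let evicted = if self.order.len() == self.capacity {
            self.order.pop_front()
        } else {
            None
        };
        self.order.push_back(page);
        evicted
    }
}
```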

loud crag
#

well that could be something for later then

wicked notch
#

it's actually the easiest solution but my dumb ass brain couldn't reason about it

#

so you just have one timeline semaphore per queue as usual

#

but the timeline value is shared between them and it's always monotonically increasing

#

to ensure that stuff gets deleted properly, you just collect every queue's last reached value (from the semaphore if needed) and take the min of that

#

then you use that as the value to compare when deleting stuff

#

also if a queue isn't used for a while, you just fast forward its last reached value to the current maximum value (otherwise the deletion queue would get stuck)
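My reading of the scheme above as a CPU-only sketch; it is not Dolkar's or anyone's actual abstraction, and all names are hypothetical. Assumptions: one shared, monotonically increasing counter hands out signal values for every queue, "not used for a while" is modeled as "has no submission in flight", and `mark_reached` stands in for reading the queue's timeline semaphore.

```rust
struct QueueState {
    last_submitted: u64,
    last_reached: u64,
}

struct DeletionQueue<T> {
    next_value: u64, // shared timeline counter across all queues
    queues: Vec<QueueState>,
    pending: Vec<(u64, T)>,
}

impl<T> DeletionQueue<T> {
    fn new(queue_count: usize) -> Self {
        let queues = (0..queue_count)
            .map(|_| QueueState { last_submitted: 0, last_reached: 0 })
            .collect();
        Self { next_value: 0, queues, pending: Vec::new() }
    }

    /// Records a submission on `queue`; returns the value it will signal.
    fn submit(&mut self, queue: usize) -> u64 {
        self.next_value += 1;
        self.queues[queue].last_submitted = self.next_value;
        self.next_value
    }

    /// Reports the value a queue's semaphore has reached so far.
    fn mark_reached(&mut self, queue: usize, value: u64) {
        let q = &mut self.queues[queue];
        q.last_reached = q.last_reached.max(value);
    }

    /// Tags the resource with the current timeline value.
    fn defer_delete(&mut self, resource: T) {
        self.pending.push((self.next_value, resource));
    }

    /// Returns every resource that no queue can still be using.
    fn collect(&mut self) -> Vec<T> {
        // Fast-forward idle queues so they do not hold the minimum back.
        for q in &mut self.queues {
            if q.last_reached >= q.last_submitted {
                q.last_reached = self.next_value;
            }
        }
        let safe = self.queues.iter().map(|q| q.last_reached).min().unwrap_or(self.next_value);
        let (ready, pending): (Vec<_>, Vec<_>) = std::mem::take(&mut self.pending)
            .into_iter()
            .partition(|(value, _)| *value <= safe);
        self.pending = pending;
        ready.into_iter().map(|(_, resource)| resource).collect()
    }
}
```

With two queues, a resource deleted right after a submission on queue 0 stays pending until queue 0 reaches that value; the untouched queue 1 is fast-forwarded and never blocks it.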

#

I took heavy inspiration (to avoid saying straight up copied) from Dolkar's abstraction

buoyant summit
#

I see

#

cool

fiery bolt
#

i just realized, instead of taking LOD error as a sphere with center = centroid of the group, shouldn't it be a sphere with center = closest point of the group bounding sphere thonk

primal shadow
#

Wdym closest point?

fiery bolt
#

so you'd have to store the group bounding sphere, project that, and then place your 'lod sphere' at center - (0, 0, radius)

delicate rain
#

LRU cache is cope, what I'll do is sort based on the cascade index, so that I free the cached pages in the lowest cascade first

#

Since those will have the shortest lifespan anyways
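That ordering is basically a sort key. A minimal sketch, assuming each cached page records which cascade it belongs to (`CachedPage` and `pick_pages_to_free` are hypothetical names):

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct CachedPage {
    cascade: u32,
    page: u32,
}

/// Picks up to `count` cached pages to free, lowest cascade first.
fn pick_pages_to_free(mut cached: Vec<CachedPage>, count: usize) -> Vec<CachedPage> {
    // Stable sort: pages within the same cascade keep their original order.
    cached.sort_by_key(|p| p.cascade);
    cached.truncate(count);
    cached
}
```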

wicked notch
#

smart

fiery bolt
#

why is meshopt generating meshlets with an average of 42 tris

frank sail
#

meshopt knows the meaning of life

wicked notch
#

real

#

I'm almost back btw

#

I just need this one last feature I promise

#

just one last feature

fiery bolt