Iris - A Journey through OpenGL and beyond to learn Graphics | Graphics Programming | Page 19

glass sphinx Apr 3, 2024, 9:37 PM

#

huh the fuck

#

for me per primitive attributes work fine

buoyant summit Apr 3, 2024, 9:37 PM

#

that doesn't work like that in all companies 💀

wicked notch Apr 3, 2024, 9:37 PM

#

glass sphinx for me per primitive attributes work fine

they decorate everything with perprimitiveext BUT the builtin variables

buoyant summit Apr 3, 2024, 9:37 PM

#

if slang was intel and intel had a driver bug they'd probably add a workaround to the compiler

wicked notch Apr 3, 2024, 9:37 PM

#

so gl_PrimitiveID and gl_CullPrimitiveEXT are not marked

glass sphinx Apr 3, 2024, 9:38 PM

#

wicked notch they decorate everything with perprimitiveext BUT the builtin variables

ahaaa

#

what is primitive id

wicked notch Apr 3, 2024, 9:38 PM

#

gl_MeshPrimitivesEXT[id].gl_PrimitiveID

#

I use this to output primitive id to visbuffer

glass sphinx Apr 3, 2024, 9:38 PM

#

what does it do over an index

#

i always just passed a uint

#

watbulb

wicked notch Apr 3, 2024, 9:39 PM

#

it's probably the same thing, one is per vertex and one is per primitive

#

idk

#

gl_CullPrimitiveEXT tho bleakekw

glass sphinx Apr 3, 2024, 9:40 PM

#

wicked notch gl_CullPrimitiveEXT tho bleakekw

yea that is a problem

glass sphinx Apr 3, 2024, 9:40 PM

#

wicked notch it's probably the same thing, one is per vertex and one is per primitive

from reading opengl spec it seems you are not supposed to write that

#

its just an input for some shader stages

wicked notch Apr 3, 2024, 9:40 PM

#

glass sphinx from reading opengl spec it seems you are not supposed to write that

perprimitive?

glass sphinx Apr 3, 2024, 9:41 PM

#

in geo shaders its an output

#

no the prim id

wicked notch Apr 3, 2024, 9:41 PM

#

oh you mean gl_PrimitiveID

glass sphinx Apr 3, 2024, 9:41 PM

#

yee

wicked notch Apr 3, 2024, 9:41 PM

#

I mean

#


```glsl
perprimitiveEXT out gl_MeshPerPrimitiveEXT {
  int  gl_PrimitiveID;
  int  gl_Layer;
  int  gl_ViewportIndex;
  bool gl_CullPrimitiveEXT;
  int  gl_PrimitiveShadingRateEXT;
} gl_MeshPrimitivesEXT[];
```
vk spec says this is writeonly

#

https://github.com/KhronosGroup/Vulkan-Docs/blob/main/proposals/VK_EXT_mesh_shader.adoc#34-glsl-changes

glass sphinx Apr 3, 2024, 9:42 PM

#

hmm

#

cool that its documented in the mesh shader ext tho

buoyant summit Apr 3, 2024, 9:46 PM

#

tbf I also don't really get the point of mesh shader gl_PrimitiveID beside some weird compat with frag shaders that were consuming/expecting gl_PrimitiveID

glass sphinx Apr 3, 2024, 9:52 PM

#

yea

buoyant summit Apr 3, 2024, 9:55 PM

#

btw I found a use case for secondary command buffers

#

you can defer decisions like loadOp, storeOp and image layouts until later point in time

glass sphinx Apr 3, 2024, 9:56 PM

#

but cant you do that with normal cmd buffers too

buoyant summit Apr 3, 2024, 9:56 PM

#

no

#

suspend/resume requires that loadOp etc match

#

and image layouts

glass sphinx Apr 3, 2024, 9:56 PM

#

suspend and resume rendering needs the

#

uuuh

#

ok

#

interesting

buoyant summit Apr 3, 2024, 9:56 PM

#

tbh my abstraction is just kinda weird

#

but secondary cmdbufs turned out to be rather handy for it

glass sphinx Apr 3, 2024, 9:57 PM

#

nice

#

not weird if it works

delicate rain Apr 3, 2024, 9:58 PM

#

I never understood the secondary command buffers

#

I thought they were what work graphs are meant to be

buoyant summit Apr 3, 2024, 9:58 PM

#

frog_think

#

do you mean indirect commands

#

secondary command buffers are like normal command buffers except they need less information before you can record a draw

delicate rain Apr 3, 2024, 9:59 PM

#

Ah okay I was heavily misinformed then

glass sphinx Apr 3, 2024, 9:59 PM

#

i dont like the name command buffer

#

buffer too overloaded

#

command list better

delicate rain Apr 3, 2024, 9:59 PM

#

I thought secondary command buffers could be dispatched from primary ones

buoyant summit Apr 3, 2024, 9:59 PM

#

🥱

#

yes

delicate rain Apr 3, 2024, 10:00 PM

#

Hmmmm

wicked notch Apr 3, 2024, 10:00 PM

#

glass sphinx command list better

the microsoft employee side of you is leaking

buoyant summit Apr 3, 2024, 10:00 PM

#

me on my way to rename descriptor_buffer to descriptor_list

glass sphinx Apr 3, 2024, 10:00 PM

#

wicked notch the microsoft employee side of you is leaking

buy office

#

NOW

wicked notch Apr 3, 2024, 10:00 PM

#

I have university account

#

froge

glass sphinx Apr 3, 2024, 10:01 PM

#

ok good

buoyant summit Apr 3, 2024, 10:01 PM

#

tbh one thing I'd like is if memory management for cmdbufs was more explicit

#

CommandPool feels ugh

glass sphinx Apr 3, 2024, 10:01 PM

#

true

#

bikeshed tho

buoyant summit Apr 3, 2024, 10:01 PM

#

can I just please provide a callback that driver will call every now and then froge_sad

buoyant summit Apr 3, 2024, 10:02 PM

#

glass sphinx bikeshed tho

real

#

btw

#

VK_EXT_nested_command_buffer is a thing apparently

#

so you can recurse with command buffers very deeply

#

idk who asked for that, but

glass sphinx Apr 3, 2024, 10:02 PM

#

lool

wispy spear Apr 3, 2024, 10:04 PM

#

John Daxa probably

delicate rain Apr 3, 2024, 10:04 PM

#

wispy spear John Daxa probably

Common misconception

#

He actually is called "Join Daxa"

buoyant summit Apr 3, 2024, 10:04 PM

#

Jane Daxa

wispy spear Apr 3, 2024, 10:05 PM

#

my bad i meant Jérôme Daxa

wicked notch Apr 4, 2024, 12:52 AM

#

https://github.com/KhronosGroup/SPIRV-Tools/issues/4919 turns out, there was already an issue

#

it's been sitting there for 2 years

#

thank you John Khronos bleakekw

loud crag Apr 4, 2024, 4:55 AM

#

in the default implementation yeah but I have a custom one which dynamically creates a set with all required images

#

hmm no that works

#

all my textures are SRGB generally

frank sail Apr 4, 2024, 6:59 AM

#

did you see this already lvstri
https://themaister.net/blog/2024/01/17/modernizing-granites-mesh-rendering/

pale horizon Apr 4, 2024, 11:22 AM

#

loud crag all my textures are SRGB generally

Well, you have a custom backend, so you probably do some sRGB trickery
But also try drawing some ColorEdit and see if displayed color matches there

#

Also you can open ImGui's style editor in DemoWindow and color pick the colors to see if they match
Simple sRGB-cope implementations usually have subtly wrong colors

#

(especially FrameBg, FrameBgHovered etc.)

loud crag Apr 4, 2024, 12:11 PM

#

pale horizon Well, you have a custom backend, so you probably do some sRGB trickery But also ...

I just convert the vertex colors to linear

primal shadow Apr 4, 2024, 4:12 PM

#

@wicked notch have you started software raster? Do you know why nanite forces HW raster for triangles that need to be clipped to the screen?

wicked notch Apr 4, 2024, 4:12 PM

#

clipping is faster in HW

primal shadow Apr 4, 2024, 4:17 PM

#

Isn't it just a min/max of each screen space vertex to clamp to the screen bounds?

wicked notch Apr 4, 2024, 4:37 PM

#

it depends

#

but tbh I didn't think about it too hard, I just accepted what Brian said KEKW

glass sphinx Apr 4, 2024, 4:39 PM

#

primal shadow Isn't it just a min/max of each screen space vertex to clamp to the screen bound...

no

wheat haven Apr 4, 2024, 4:53 PM

#

primal shadow Isn't it just a min/max of each screen space vertex to clamp to the screen bound...

if it was just a min/max clamp you'd change the shape of the triangle

delicate rain Apr 4, 2024, 4:53 PM

#

primal shadow Isn't it just a min/max of each screen space vertex to clamp to the screen bound...

I don't even think this gives you clipped triangle

#

yeah exactly

primal shadow Apr 4, 2024, 4:54 PM

#

Ah I see

#

Oh wait I misunderstood what this tutorial I'm reading is doing

#

They were clipping the aabb for the triangle, not the triangle vertices
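The difference between the two can be sketched in C++. This is a minimal illustration, not code from the tutorial or from Nanite; `Vec2`, `PixelRect`, `floor_clamped` and `clamped_triangle_bounds` are names made up here. Only the pixel rectangle that the rasterizer loops over gets clamped to the screen, while the triangle's vertices (and therefore its edge equations) stay untouched.

```cpp
#include <algorithm>
#include <cmath>

struct Vec2 { float x, y; };

// Inclusive pixel rectangle; it is empty when min > max on either axis.
struct PixelRect { int min_x, min_y, max_x, max_y; };

// Floor and clamp while still in float, so the cast to int is always in range.
static int floor_clamped(float v, int lo, int hi)
{
    const float f = std::min(std::max(std::floor(v), static_cast<float>(lo)),
                             static_cast<float>(hi));
    return static_cast<int>(f);
}

// Bounding rectangle of the triangle, clamped to a width x height screen.
// The vertices a, b, c are not modified, so the triangle keeps its shape.
PixelRect clamped_triangle_bounds(Vec2 a, Vec2 b, Vec2 c, int width, int height)
{
    PixelRect r;
    r.min_x = floor_clamped(std::min({a.x, b.x, c.x}), 0, width);
    r.min_y = floor_clamped(std::min({a.y, b.y, c.y}), 0, height);
    r.max_x = floor_clamped(std::max({a.x, b.x, c.x}), -1, width - 1);
    r.max_y = floor_clamped(std::max({a.y, b.y, c.y}), -1, height - 1);
    return r;
}
```

A rasterizer would then test each pixel inside the rectangle against the original edge functions; a fully off-screen triangle yields an empty rectangle.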

delicate rain Apr 4, 2024, 4:55 PM

#

https://cent.felk.cvut.cz/courses/PGR2/lectures/10-architekturyII.pdf
Look at this if you're interested
Starting Slide 46

primal shadow Apr 4, 2024, 4:56 PM

#

Thanks!

wispy spear Apr 4, 2024, 5:17 PM

#

delicate rain https://cent.felk.cvut.cz/courses/PGR2/lectures/10-architekturyII.pdf Look this ...

why are paper filenames always ass 😦

delicate rain Apr 4, 2024, 5:19 PM

#

it is a lecture slides thingy

wispy spear Apr 4, 2024, 5:19 PM

#

still

#

same

delicate rain Apr 4, 2024, 5:20 PM

#

hmm I don't see the issue here. It is a lecture 10 - which is about GPU architectures (architektury in Czech) part II

frank sail Apr 4, 2024, 5:21 PM

#

wispy spear why are paper filenames always ass 😦

I thought you liked funny spellings

primal shadow Apr 4, 2024, 6:11 PM

#

Does anyone understand what nanite talks about when it comes to solving for the x interval for software raster?

#

I suppose I should read the PDF they link as reference [71] 😛

primal shadow Apr 4, 2024, 6:44 PM

#

Nope I read through cure's GitHub project and can't actually find what they're doing

primal shadow Apr 4, 2024, 7:11 PM

#

@wicked notch how do you decide between SW/HW raster per cluster? Nanite presentation says all 3 triangle edges < 32 pixels long. But uhh, how do you determine that at the cluster level? Compute some kind of average triangle edge length per cluster?

wicked notch Apr 4, 2024, 7:14 PM

#

I personally just do aabb to viewport space

#

it's good enough for me

primal shadow Apr 4, 2024, 7:20 PM

#

And then what, SW rasterize anything with a small viewport area?

wicked notch Apr 4, 2024, 7:56 PM

#

yes

primal shadow Apr 4, 2024, 7:57 PM

#

Makes sense, thanks

wicked notch Apr 4, 2024, 7:57 PM

#

primal shadow And then what, SW rasterize anything with a small viewport area?

if the aabb is less than 32 pixels in each direction
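The per-cluster decision described here could look roughly like this in C++. It is a sketch under assumptions: `ScreenAabb`, `RasterPath` and `choose_raster_path` are hypothetical names, the AABB is assumed to be already projected to viewport pixels, and the 32 pixel threshold is the one quoted in the chat.

```cpp
// Screen-space bounding box of a cluster, in pixels (hypothetical type).
struct ScreenAabb {
    float min_x, min_y, max_x, max_y;
};

enum class RasterPath { Software, Hardware };

// A cluster goes to the software rasterizer only when its projected AABB
// is smaller than the threshold on both axes; everything else stays on HW.
RasterPath choose_raster_path(const ScreenAabb& box, float max_extent_px = 32.0f)
{
    const float width  = box.max_x - box.min_x;
    const float height = box.max_y - box.min_y;
    const bool small = width < max_extent_px && height < max_extent_px;
    return small ? RasterPath::Software : RasterPath::Hardware;
}
```

In a real culling pass the result would decide which of the two cluster lists the cluster ID gets appended to.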

primal shadow Apr 4, 2024, 7:57 PM

#

I'm almost ready to do software raster, I just need some final wgpu changes to do image atomics. Going to use buffers in the meantime.

#

I also need to figure out how to rework my culling code to feed into the SW/HW raster. Probably switch to two atomic lists of cluster IDs.

#

And then indirect dispatches for the SW raster / HW index buffer write.

wicked notch Apr 6, 2024, 1:34 AM

#

https://github.com/shader-slang/slang/pull/3895#issuecomment-2040601784
mfw

wispy spear Apr 6, 2024, 10:35 AM

#

isnt that what potti said?

#

the frogs over there fiks shit rather kwik

glass sphinx Apr 6, 2024, 11:29 AM

#

yes

wispy spear Apr 6, 2024, 11:37 AM

#

thats how we will get LVSTRINV soon

frank sail Apr 6, 2024, 4:29 PM

#

LPSTRI

wispy spear Apr 6, 2024, 4:39 PM

#

that reminds me of pastries

#

https://tenor.com/view/carlton-the-bear-bakery-pastries-food-tim-hortons-gif-27115405


wispy spear Apr 6, 2024, 10:28 PM

#

luigi? i think you should try to get into novideo

#

after school

#

just like pixel went to valve

#

having a frog at important spots in the industry helps to fight John Khronos

wicked notch Apr 6, 2024, 10:34 PM

#

you're asking me to go to the dark side

wispy spear Apr 6, 2024, 10:35 PM

#

explicitly

buoyant summit Apr 6, 2024, 10:35 PM

#

why is nv dark side

wicked notch Apr 6, 2024, 10:35 PM

#

nv bad amd good

#

isn't that how it works

buoyant summit Apr 6, 2024, 10:36 PM

#

no

#

everyone bäd

wicked notch Apr 6, 2024, 10:36 PM

#

true that

frank sail Apr 6, 2024, 10:56 PM

#

my megacorp is better than your megacorp

delicate rain Apr 6, 2024, 10:58 PM

#

NV money

#

Go nv

#

I love monopoly

#

Long live nv

delicate rain Apr 8, 2024, 1:43 AM

#

@wicked notch I move here to not pollute Vulkan

#

I don't get it

#

do you do one dispatch per task shader workgroup?

#

or per subgroup?

#

(my actual knowledge of mesh shaders is limited)

wicked notch Apr 8, 2024, 1:45 AM

#

workgroup is 128
subgroup is 32
when I call vkCmdDrawMeshTasksEXT(1, 1, 1) I dispatch one task shader to cull 128 meshlets

#

if all of those survive I will have 4 mesh shader commands (EmitMeshTasksEXT()) each with a workgroup dispatch size of 32

#

what I'm trying to do is reduce the number of EmitMeshTasksEXT calls so that NV command processor doesn't commit sudoku

delicate rain Apr 8, 2024, 1:47 AM

#

okay so you still cull 32 meshlets per task subgroup?

#

uh

#

I'll just tell you what we do in Tido

wicked notch Apr 8, 2024, 1:48 AM

#

yes

#

32 per subgroup, 128 per workgroup

delicate rain Apr 8, 2024, 1:48 AM

#

each task shader invoc culls 32 meshlets, we get base offset, subgroup offset (I think) and survivor bitmask

#

then we dispatch mesh shader (32 threads) for each surviving meshlet

loud crag Apr 8, 2024, 1:48 AM

#

yeah survivor bitmask sounds like a good idea

wicked notch Apr 8, 2024, 1:48 AM

#

what's the survivor bitmask for

delicate rain Apr 8, 2024, 1:49 AM

#

to know what meshlet mesh shader works on

wicked notch Apr 8, 2024, 1:49 AM

#

you already know that don't you

loud crag Apr 8, 2024, 1:49 AM

#

replaces your deltaIDs from your payload

delicate rain Apr 8, 2024, 1:49 AM

#

nth mesh dispatch looks for nth set bit in the bitmask

wicked notch Apr 8, 2024, 1:49 AM

#

ah

#

I see

delicate rain Apr 8, 2024, 1:50 AM

#

your payload shrinks 32 times

#

(or 8 if you use weirdo uints)

wicked notch Apr 8, 2024, 1:50 AM

#

so you do an exclusive workgroup bitcount?

#

if there even is an intrinsic for that

#

maybe it's only subgroup bitcount

delicate rain Apr 8, 2024, 1:50 AM

#

we cooked with Patrick

#

wait

#

func wave32_find_nth_set_bit(uint mask, uint bit) -> uint
{
    // Each thread tests a bit in the mask.
    // The nth bit is the nth thread.
    let wave_lane_bit_mask = 1u << WaveGetLaneIndex();
    let is_nth_bit_set = ((mask & wave_lane_bit_mask) != 0) ? 1u : 0u;
    // WavePrefixSum is exclusive; adding the lane's own bit makes it inclusive.
    let set_bits_prefix_sum = WavePrefixSum(is_nth_bit_set) + is_nth_bit_set;

    // True on every lane whose inclusive count equals bit + 1; the lowest such lane is the answer.
    let does_nth_bit_match_group = set_bits_prefix_sum == (bit + 1);
    uint4 ballot = WaveActiveBallot(does_nth_bit_match_group);
    // 100 is an out-of-range sentinel for lanes that do not match.
    uint first_set_bit = WaveActiveMin((ballot.x & wave_lane_bit_mask) != 0 ? WaveGetLaneIndex() : 100);
    return first_set_bit;
}

#

returns the position of nth set bit in a bitmask, uniform for entire subgroup
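A scalar CPU reference for the same lookup is handy for checking the wave version. This is a minimal C++ sketch, not the Tido code; `find_nth_set_bit` and the sentinel `kNoSuchBit` are names and choices made up here (the shader above uses 100 as its sentinel).

```cpp
#include <cstdint>

// Returned when the mask has n or fewer set bits (an assumption of this sketch).
constexpr uint32_t kNoSuchBit = 32;

// Returns the bit position of the n-th set bit in mask, with n counted from 0.
uint32_t find_nth_set_bit(uint32_t mask, uint32_t n)
{
    // Drop the n lowest set bits; this is a no-op once mask reaches 0.
    for (uint32_t i = 0; i < n; ++i) {
        mask &= mask - 1;
    }
    if (mask == 0) {
        return kNoSuchBit;
    }
    // The lowest remaining set bit is the one we want.
    uint32_t pos = 0;
    while ((mask & 1u) == 0) {
        mask >>= 1;
        ++pos;
    }
    return pos;
}
```

With a survivor mask of `0x2D` (bits 0, 2, 3 and 5 set), mesh workgroup 1 would pick meshlet 2 and mesh workgroup 3 would pick meshlet 5.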

wicked notch Apr 8, 2024, 1:52 AM

#

why would you do this instead of just using subgroupBallotExclusiveBitCount bleakekw

delicate rain Apr 8, 2024, 1:53 AM

#

I think it's not in slang

#

bleakekw

wicked notch Apr 8, 2024, 1:53 AM

#

ah so slang cope

#

epic

loud crag Apr 8, 2024, 1:53 AM

#

delicate rain I think it's not in slang

no spirv intrinsics?

delicate rain Apr 8, 2024, 1:53 AM

#

wicked notch why would you do this instead of just using subgroupBallotExclusiveBitCount ...

but even if you did exclusive bit count

#

how do you know what thread to broadcast from

#

like the thread itself knows sure

#

but how do the others know

wicked notch Apr 8, 2024, 1:54 AM

#

interesting question

#

intuitively, each subgroup invocation calculates its own exclusive bit count

#

so instead of a ballot exclusive bit count, you do a regular bit count on the mask

delicate rain Apr 8, 2024, 1:55 AM

#

well you want a subgroup per meshlet

#

so all 32 threads should agree on the meshlet

#

no?

#

or do you have a different mapping

wicked notch Apr 8, 2024, 1:56 AM

#

I don't, you are indeed correct

#

4am brain syndrome

delicate rain Apr 8, 2024, 1:57 AM

#

the exclusive/inclusive bit count would just replace the WavePrefixSum and two lines above

wicked notch Apr 8, 2024, 1:57 AM

#

so it's a workgroup bit count

delicate rain Apr 8, 2024, 1:57 AM

#

the ballot and min you still need

wicked notch Apr 8, 2024, 1:57 AM

#

yeah it's basically what you and patrick cooked

#

your code is now mine

delicate rain Apr 8, 2024, 1:57 AM

#

yes yes

#

I will need soft shadows soon, so let's call it an exchange 🤝

glass sphinx Apr 8, 2024, 7:40 AM

#

wicked notch workgroup is 128 subgroup is 32 when I call `vkCmdDrawMeshTasksEXT(1, 1, 1)` I d...

why do you use such large workgroups

#

nv recommends 32 still

#

i also always lost perf on larger groups

#

the survivor list reduced the lsb pressure quite a bit that was very nice

#

larger groups are probably better when you have larger payload

#

i see now why you did it

#

https://tenor.com/view/allen-haff-allen-lee-haff-auction-hunters-just-a-ah-fan-yep-gif-26404237


wicked notch Apr 8, 2024, 9:24 AM

#

the idea was to reduce the number of invocations needed to cull and draw

buoyant summit Apr 8, 2024, 9:33 AM

#

glass sphinx i also always lost perf on larger groups

mesh workgroups are fake on nv, it's just a single hw subgroup cycling through api subgroups, compiler munches your shader

wicked notch Apr 8, 2024, 9:39 AM

#

buoyant summit mesh workgroups are fake on nv, it's just a single hw subgroup cycling through a...

wait wat

#

can I at least expect gl_SubgroupID to work as I would expect it to work?

#

when workgroup size is multiple of subgroup size

glass sphinx Apr 8, 2024, 9:42 AM

#

wicked notch can I at least expect gl_SubgroupID to work as I would expect it to work?

yes

glass sphinx Apr 8, 2024, 9:47 AM

#

buoyant summit mesh workgroups are fake on nv, it's just a single hw subgroup cycling through a...

what bird whispered this forbidden knowledge to you

buoyant summit Apr 8, 2024, 10:10 AM

#

wicked notch can I at least expect gl_SubgroupID to work as I would expect it to work?

yes, that's just an impl detail

buoyant summit Apr 8, 2024, 10:11 AM

#

glass sphinx what bird whispered this forbidden knowledge to you

nvk people

glass sphinx Apr 8, 2024, 10:12 AM

#

nice

loud crag Apr 8, 2024, 12:02 PM

#

glass sphinx what bird whispered this forbidden knowledge to you

well its kinda obvious given the NV extension is always one dimensional

buoyant summit Apr 8, 2024, 11:43 PM

#

https://github.com/shader-slang/slang/pull/3907

GitHub

WIP: Init expressions for struct fields support, #3738 by ArielG-NV...

Following commit handles init expressions of struct's.
The general implementation follows C++ init expression rules for derived classes.
The logic was implemented after type resolution (Semanti...

#

slang your beloved (NOT MINE THOUGH)

wicked notch Apr 8, 2024, 11:44 PM

#

holy

#

now we only need VMM and we're golden

glass sphinx Apr 8, 2024, 11:50 PM

#

buoyant summit https://github.com/shader-slang/slang/pull/3907

they really are sweet hearts

#

they just implement it

#

i love them

glass sphinx Apr 8, 2024, 11:50 PM

#

buoyant summit slang your beloved (NOT MINE THOUGH)

cope cope cope

#

https://tenor.com/view/cool-fun-white-cat-dance-cool-and-fun-times-gif-16435335956387921912


#

^ you when a slang release drops (at least this one)

wicked notch Apr 9, 2024, 12:19 AM

#

@glass sphinx do you have docs on spirv intrinsics for slang

buoyant summit Apr 9, 2024, 10:20 AM

#

trying to review a slang patch and it makes my brain boil slightly

glass sphinx Apr 9, 2024, 10:21 AM

#

wicked notch @glass sphinx do you have docs on spirv intrinsics for slang

afaik there are only tests

buoyant summit Apr 9, 2024, 10:26 AM

#

where slang discord/irc..

#

-emit-ir, handy

buoyant summit Apr 9, 2024, 11:12 AM

#

#

I think and proof read a lot before posting

wispy spear Apr 9, 2024, 11:14 AM

#

hehe

#

better than not proof reading at all ❤️

frank sail Apr 9, 2024, 11:16 AM

#

I wonder how nano copes with email

buoyant summit Apr 9, 2024, 11:16 AM

#

frank sail I wonder how nano copes with email

I just re-read my thingy many times before sending

frank sail Apr 9, 2024, 11:16 AM

#

Same

buoyant summit Apr 9, 2024, 11:16 AM

#

but also by not sending any emails often

frank sail Apr 9, 2024, 11:16 AM

#

I hate email

wispy spear Apr 9, 2024, 11:17 AM

#

yeah it sucks

buoyant summit Apr 9, 2024, 11:17 AM

#

agree but I like receiving notifications on my issues/MRs/whatever in a centralized box

#

very handy

frank sail Apr 9, 2024, 11:17 AM

#

Sure but I mean for direct communication

buoyant summit Apr 9, 2024, 11:17 AM

#

yeah for direct communication it kinda sucks

frank sail Apr 9, 2024, 11:18 AM

#

Especially when mfs splinter threads and then the inbox becomes impossible to navigate bleakekw (might be a client/skill issue idk)

buoyant summit Apr 9, 2024, 11:18 AM

#

yes, skill issue

wispy spear Apr 9, 2024, 11:20 AM

#

plus all the signature junk polluting every thread

buoyant summit Apr 9, 2024, 11:24 AM

#

frank sail Especially when mfs splinter threads and then the inbox becomes impossible to na...

frank sail Apr 9, 2024, 11:24 AM

#

what client is this

buoyant summit Apr 9, 2024, 11:25 AM

#

thunderbird

frank sail Apr 9, 2024, 11:25 AM

#

in Outlook it's just a list in chronological order, at least by default

#

ass

buoyant summit Apr 9, 2024, 11:25 AM

#

never used outlook

frank sail Apr 9, 2024, 11:25 AM

#

I use it for work unfortunately

#

I did use Thunderbird once. Perhaps I should use it again

buoyant summit Apr 9, 2024, 11:26 AM

#

considering outlook is supposed to be backed by a big chungus corporation it probably has comparable functionality

#

so mayhaps just do a bit of googling

frank sail Apr 9, 2024, 11:26 AM

#

Idk what to search for I guess

#

I'll figure it out

buoyant summit Apr 9, 2024, 11:27 AM

#

do you use folders

#

or do you just look at your inbox

#

create folder + setup a filter by something

frank sail Apr 9, 2024, 11:27 AM

#

I have a bunch of filters and folders already

#

I just need to make it look nice

delicate rain Apr 9, 2024, 11:27 AM

#

Damn email powerusers

frank sail Apr 9, 2024, 11:28 AM

#

It's necessary when you get dozens of emails every day bleakekw

buoyant summit Apr 9, 2024, 11:28 AM

#

froge_bleak

delicate rain Apr 9, 2024, 11:29 AM

#

I contribute to one repo and they have the GitHub actions setup to try and build after each commit for like 15 different setups. So after each commit you receive 15 emails of the build succeeding or failing

frank sail Apr 9, 2024, 11:30 AM

#

hmm I only get emails when CI fails in my thing (Fwog)

delicate rain Apr 9, 2024, 11:30 AM

#

Or maybe it's that, but their gh actions are broken so everything fails 🥸

#

I don't really read them, because it was just garbage every time I did

#

900k+ lines of compiler errors

finite quartz Apr 9, 2024, 11:31 AM

#

frank sail I hate email

Yeah, takes me hours to write anything, I hate it too

glass sphinx Apr 9, 2024, 1:04 PM

#

frank sail I wonder how nano copes with email

i always send cryptic shit

#

if end to end encryption gets outlawed at some point im safe

#

i trained all the ppl i know to decipher my shit

#

no one else will understand

primal shadow Apr 9, 2024, 2:27 PM

#

My depth down sampling is missing some pixels on npot textures

#

And therefore the way I project bounding spheres to screen space is wrong. Ahhh.

delicate rain Apr 9, 2024, 3:18 PM

#

https://github.com/LVSTRI/Retina/blob/508dfb0ba11e6ffaba7633c07d2f5613e06472b9/src/Retina/Sandbox/Shaders/GBufferResolve.frag.glsl#L145 @wicked notch mister what is this cope? bleakekw

wicked notch Apr 9, 2024, 3:19 PM

#

it's what one does to avoid pagefaulting bleakekw

#

though tbh I don't think adding a page request is too bad

delicate rain Apr 9, 2024, 3:21 PM

#

hmm with PCF this is easy, you just request the PCF region to be in the same clip, but I have no clue about how to do it with pcss

wicked notch Apr 9, 2024, 3:22 PM

#

same thing except it's max(blockerSearchRadius, maxFilterRadius)

delicate rain Apr 9, 2024, 3:22 PM

#

right, but those can be massive no?

wicked notch Apr 9, 2024, 3:22 PM

#

oh

#

yes bleakekw

delicate rain Apr 9, 2024, 3:23 PM

#

yeah...

#

cope it is

wicked notch Apr 9, 2024, 3:23 PM

#

I have to try pagefaulting

#

it's really just an atomicAdd, how bad can it be

delicate rain Apr 9, 2024, 3:23 PM

#

I don't follow

wicked notch Apr 9, 2024, 3:24 PM

#

when filtering, if a page is not allocated you just emplace an allocation request in the list

delicate rain Apr 9, 2024, 3:24 PM

#

to be drawn next frame?

wicked notch Apr 9, 2024, 3:27 PM

#

ye
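The "just an atomicAdd" request list can be mimicked on the CPU with `std::atomic`. This is a sketch under assumptions, not Retina's code: `PageRequestList` and its members are hypothetical, the list has a fixed capacity, and requests past the capacity are simply dropped. A shader version would do the same with an atomic counter in a buffer.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <vector>

// Fixed-capacity list of virtual pages that were sampled but not allocated.
struct PageRequestList {
    std::atomic<uint32_t> count{0};
    std::vector<uint32_t> pages;

    explicit PageRequestList(std::size_t capacity) : pages(capacity) {}

    // Reserve a slot with one atomic add, then write the page into it.
    // Returns false when the list is already full and the request is dropped.
    bool request(uint32_t virtual_page)
    {
        const uint32_t slot = count.fetch_add(1, std::memory_order_relaxed);
        if (slot >= pages.size()) {
            return false;
        }
        pages[slot] = virtual_page;
        return true;
    }

    // Number of valid entries; the counter may overshoot the capacity.
    uint32_t size() const
    {
        const uint32_t n = count.load(std::memory_order_relaxed);
        const uint32_t cap = static_cast<uint32_t>(pages.size());
        return n < cap ? n : cap;
    }
};
```

The list would be consumed at the start of the next frame to allocate and draw the requested pages, then reset. Many texels will hit the same missing page, so a real implementation would probably want some deduplication before the add.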

faint crane Apr 9, 2024, 9:20 PM

#

https://zeux.io/2024/04/09/meshlet-triangle-locality/

Meshlet triangle locality matters

When working with mesh shaders, the geometry needs to be split into meshlets: small geometry chunks where each meshlet has a set of vertices and triangle indices that refer to the vertices inside each meshlet. Mesh shader then has to transform all vertices and emit all transformed vertices and triangles through the shader API to the rasterizer. ...

wicked notch Apr 10, 2024, 12:09 AM

#

@primal shadow you're building your own version of SPD right?

#

can you pin me the discussions you had with devsh, I can't find them

primal shadow Apr 10, 2024, 12:15 AM

#

wicked notch @primal shadow you're building your own version of SPD right?

No I'm not working on anything ATM. I'm on vacation and don't have my desktop, and my laptop is broken.

wicked notch Apr 10, 2024, 12:15 AM

#

rip

primal shadow Apr 10, 2024, 12:16 AM

#

wicked notch can you pin me the discussions you had with devsh, I can't find them

#graphics-techniques message

#

Depth down sampling for the initial pass is hard though. I messed up and need to fix it.

wicked notch Apr 10, 2024, 2:50 PM

#

halving time for hzb copy + reduce, 'ery nice

#

150us -> 80us

#

now I can spam this for all 16 clipmaps

glass sphinx Apr 10, 2024, 4:01 PM

#

that will be 1.3 ms

wicked notch Apr 10, 2024, 4:04 PM

#

yes bleakekw

glass sphinx Apr 10, 2024, 4:05 PM

#

frognant

#

why dont you do single pass downsampling

wicked notch Apr 10, 2024, 4:06 PM

#

because amd cannot comprehend the existence of operating systems other than windows

#

this is kinda sorta single pass

#

except it's 3 pass KEKW

glass sphinx Apr 10, 2024, 4:10 PM

#

oooff

#

wait but you can just copy the tido single pass downsampler

#

it just doesnt properly work on the bottom and right edge if you have a non pot res

wicked notch Apr 10, 2024, 4:10 PM

#

does it work for stuff that is non square

#

like 2048x1024

glass sphinx Apr 10, 2024, 4:11 PM

#

yea

wicked notch Apr 10, 2024, 4:11 PM

#

nice

wicked notch Apr 10, 2024, 4:11 PM

#

wicked notch like 2048x1024

how long does it take to downshrimple this image?

#

assume it's just single channel float

glass sphinx Apr 10, 2024, 4:12 PM

#

i was thinking of rescaling the lowest mip to some pot size

glass sphinx Apr 10, 2024, 4:12 PM

#

wicked notch how long does it take to downshrimple this image?

~20 microseconds

#

on 4080

#

fully memory bound

wicked notch Apr 10, 2024, 4:12 PM

#

hmm

glass sphinx Apr 10, 2024, 4:12 PM

#

wait no its less

#

at 1440p it's 20 microsecs

wicked notch Apr 10, 2024, 4:12 PM

#

for me it takes 30 microseconds to copy the image and 30 microseconds to fully downsample it

glass sphinx Apr 10, 2024, 4:14 PM

#

tidos should be 11.5 microseconds for 2048*1024

#

on 4080

wicked notch Apr 10, 2024, 4:14 PM

#

does that include the initial reduction

glass sphinx Apr 10, 2024, 4:14 PM

#

no

wicked notch Apr 10, 2024, 4:14 PM

#

ok so your thing is 2.9x faster

glass sphinx Apr 10, 2024, 4:15 PM

#

arguably i should remove the writeout for the lowest 2-3 mips not just mip0

#

culling on that fine scale is unnecessary

#

also slower even potentially

#

i need to test it again

delicate rain Apr 10, 2024, 4:17 PM

#

wicked notch ok so your thing is 2.9x faster

on a nuclear GPU

wicked notch Apr 10, 2024, 4:18 PM

#

btw saky

#

hear me out

#

when drawing the vsms

#

I bind temporary virtual resolution depth texture to get the sweet early Z

primal shadow Apr 10, 2024, 4:18 PM

#

@glass sphinx do you have code to share? I realized my down sampling does not work for npot textures :(.

wicked notch Apr 10, 2024, 4:18 PM

#

now let's say I build HZB out of this temporary depth texture

#

in the initialization step, I query for the active page table, and write 0 if page is inactive, or the depth if it is active

delicate rain Apr 10, 2024, 4:19 PM

#

primal shadow @glass sphinx do you have code to share? I realized my down sampling doe...

https://github.com/Sunset-Flock/Timberdoodle/blob/main/src/rendering/rasterize_visbuffer/gen_hiz.glsl

wicked notch Apr 10, 2024, 4:19 PM

#

then I reduce the thing and cull

#

culling outputs a bitmask of visible meshlets

#

this repeats for each clipmap

#

downside is subchannel switch

delicate rain Apr 10, 2024, 4:20 PM

#

uhh

primal shadow Apr 10, 2024, 4:21 PM

#

delicate rain <https://github.com/Sunset-Flock/Timberdoodle/blob/main/src/rendering/rasterize_...

Ty, I will steal and port to wgsl 🙂

delicate rain Apr 10, 2024, 4:21 PM

#

wicked notch then I reduce the thing and cull

so you cull your lower clips with the depth written out by the higher clips?

#

I don't fully follow

#

also it makes it so that you have to barrier between each clip draw

#

16 barriers each frame

#

yucky

wicked notch Apr 10, 2024, 4:22 PM

#

no I don't need barriers

#

or yes I do

#

but only for the depth texture

#

transitioning from undefined because I discard the results

primal shadow Apr 10, 2024, 4:22 PM

#

wicked notch culling outputs a bitmask of visible meshlets

I switched to 2 bits per meshlet btw. 0 = not eligible (wrong lod, instance culled, failed frustum cull), 1 = first pass draw (passed last frame occlusion cull), 2 = second pass candidate (failed last frame occlusion cull), 3 = second pass draw (passed current frame occlusion cull)
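Packing those four states at 2 bits per meshlet could look like this in C++. It is only a sketch: the enum names paraphrase the message above, and `set_state` / `get_state` plus the 16-meshlets-per-word layout are assumptions made here, not the actual implementation.

```cpp
#include <cstdint>
#include <vector>

enum class MeshletState : uint32_t {
    NotEligible         = 0, // wrong lod, instance culled, failed frustum cull
    FirstPassDraw       = 1, // passed last frame occlusion cull
    SecondPassCandidate = 2, // failed last frame occlusion cull
    SecondPassDraw      = 3  // passed current frame occlusion cull
};

// 16 meshlets fit in one 32-bit word at 2 bits each.
void set_state(std::vector<uint32_t>& words, uint32_t meshlet, MeshletState state)
{
    const uint32_t word  = meshlet / 16;
    const uint32_t shift = (meshlet % 16) * 2;
    words[word] = (words[word] & ~(3u << shift)) | (static_cast<uint32_t>(state) << shift);
}

MeshletState get_state(const std::vector<uint32_t>& words, uint32_t meshlet)
{
    const uint32_t word  = meshlet / 16;
    const uint32_t shift = (meshlet % 16) * 2;
    return static_cast<MeshletState>((words[word] >> shift) & 3u);
}
```

On the GPU the read-modify-write in `set_state` would need to be an atomic operation, or each thread would need to own a whole word.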

delicate rain Apr 10, 2024, 4:23 PM

#

wicked notch transitioning from undefined because I discard the results

well you need a full clear no?

wicked notch Apr 10, 2024, 4:23 PM

#

yes

delicate rain Apr 10, 2024, 4:23 PM

#

yeah

wicked notch Apr 10, 2024, 4:23 PM

#

but it's accelerated so it doesn't count

delicate rain Apr 10, 2024, 4:23 PM

#

yeah but the draws don't overlap

#

thats why I mainly don't like the temp Z buffer

wicked notch Apr 10, 2024, 4:24 PM

#

they don't but I'm hoping for hiz culling to make up for the difference

primal shadow Apr 10, 2024, 4:24 PM

#

Also I asked this question earlier and didn't get a response, do we really need to do an explicit frustum cull for each meshlet? We have to occlusion cull anyways, which involves projecting the culling sphere to a screen space aabb. We can just check if it's valid, no?

glass sphinx Apr 10, 2024, 4:24 PM

#

primal shadow @glass sphinx do you have code to share? I realized my down sampling doe...

mine also doesnt. I still need to add a variant that respects borders

delicate rain Apr 10, 2024, 4:24 PM

#

wicked notch they don't but I'm hoping for hiz culling to make up for the difference

I don't understand how you'd use the temp z for hiz cull

#

like you draw the biggest clip, and cull the lower one against hiz from the biggest clip?

#

ie draw clip 8 - build hiz of 8 - cull clip 7 against 8 - draw clip 7 - build hiz 7 ....

wicked notch Apr 10, 2024, 4:25 PM

#

no

#

it's like this

primal shadow Apr 10, 2024, 4:27 PM

#

glass sphinx mine also doesnt. I still need to add a variant that respects borders

There's https://miketuritzin.com/post/hierarchical-depth-buffers, but idk how optimized it is.
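The idea from that article, as far as it applies to the npot problem, is that a destination texel has to read an extra source row or column whenever the source dimension is odd. Below is a CPU sketch of one reduction step in C++. `DepthMip` and `reduce_mip_max` are hypothetical names, and the `max` reduction assumes larger depth means farther away; with reversed Z it would be `min` instead.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

struct DepthMip {
    uint32_t width, height;
    std::vector<float> texels; // row-major, width * height entries
};

// One HZB reduction step that stays conservative for odd source sizes.
// Assumes src.width >= 1 and src.height >= 1.
DepthMip reduce_mip_max(const DepthMip& src)
{
    DepthMip dst;
    dst.width  = std::max<uint32_t>(1, src.width / 2);
    dst.height = std::max<uint32_t>(1, src.height / 2);
    dst.texels.resize(static_cast<std::size_t>(dst.width) * dst.height);

    // With an odd source size each destination texel overlaps three source texels.
    const uint32_t span_x = (src.width  & 1u) ? 2u : 1u;
    const uint32_t span_y = (src.height & 1u) ? 2u : 1u;

    for (uint32_t y = 0; y < dst.height; ++y) {
        for (uint32_t x = 0; x < dst.width; ++x) {
            const uint32_t x_end = std::min(2 * x + span_x, src.width - 1);
            const uint32_t y_end = std::min(2 * y + span_y, src.height - 1);
            float d = src.texels[static_cast<std::size_t>(2 * y) * src.width + 2 * x];
            for (uint32_t sy = 2 * y; sy <= y_end; ++sy) {
                for (uint32_t sx = 2 * x; sx <= x_end; ++sx) {
                    d = std::max(d, src.texels[static_cast<std::size_t>(sy) * src.width + sx]);
                }
            }
            dst.texels[static_cast<std::size_t>(y) * dst.width + x] = d;
        }
    }
    return dst;
}
```

A single pass downsampler would do the same footprint logic per mip, which is where the bottom and right edges tend to go wrong.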

wicked notch Apr 10, 2024, 4:27 PM

#

```
for clipmap in clipmaps {
  Clear(tempDepthBuffer);
  Draw(tempDepthBuffer, previousFrameVisibilityMask[clipmap.Index]);
  Copy(tempDepthBuffer, clipmap.HZB); // also performs an initial reduction, checks whether the current depth tile
                                      // is active in the page table, if it isn't active the depth written is going to be 0
  Reduce(clipmap.HZB);
  Cull(clipmap.HZB, currentFrameVisibilityMask[clipmap.Index]);
  Barrier();
}
```

delicate rain Apr 10, 2024, 4:29 PM

#

right, but I don't see how this is different than doing it without temp hiz?

#

you can just as well draw previousFrameVisibilityMask into the VSM itself

#

reduce the VSM

#

and cull against that

#

you don't really need the temp z for this no?

#

or am I missing something again bleakekw

wicked notch Apr 10, 2024, 4:30 PM

#

now that I think about it you're right I don't

#

I can just make a dispatch over the full virtual resolution and each thread checks for the virtual page table, gets physical texel and write that (or 0 if page isn't active)

#

this way draws overlap too

delicate rain Apr 10, 2024, 4:32 PM

#

and we can do that inplace in the physical memory itself, (we just make the physical memory have mips, that way you can reduce over the physical memory)

wicked notch Apr 10, 2024, 4:33 PM

#

ye but I don't understand how to cull against the actual VSM with mips bleakekw

delicate rain Apr 10, 2024, 4:35 PM

#

I mean there will be some index math but I don't see how it would not be doable

#

conceptually it is all the same no?

wicked notch Apr 10, 2024, 4:36 PM

#

I mean

#

yeah

#

but it feels too clamplicated compared to just having a good ole HZB

delicate rain Apr 10, 2024, 4:37 PM

#

It will be a bit more involved yeah

wicked notch Apr 10, 2024, 4:37 PM

#

thinking about it, a quad could span 4 different physical pages

delicate rain Apr 10, 2024, 4:37 PM

#

yes

wicked notch Apr 10, 2024, 4:38 PM

#

(one per corner)

delicate rain Apr 10, 2024, 4:38 PM

#

so you need to go from NDC -> VSM_PAGE -> PHYSICAL_STORAGE

#

or, over some mip cutoff just NDC -> VSM_HIZ

#

it's just an indirection

#

(I'm saying it super nonchalantly, as if I won't run into 50 bugs trying to implement this)
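The VSM_PAGE -> PHYSICAL_STORAGE part of that indirection can be sketched on the CPU roughly as follows. This is a minimal sketch under stated assumptions, not anyone's actual implementation: `PageTable`, `VirtualToPhysical`, the `kInvalidPage` marker and the row-major physical atlas layout are all hypothetical, and the NDC-to-texel step is left out. Only the 128-texel page size is taken from the conversation (it is mentioned later on).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical constants and layout, chosen only for illustration.
constexpr uint32_t kPageSize = 128;              // texels per page side
constexpr uint32_t kPageShift = 7;               // log2(kPageSize)
constexpr uint32_t kInvalidPage = 0xFFFFFFFFu;   // marks a page with no physical backing

struct Texel { uint32_t x, y; };

struct PageTable {
  uint32_t pagesPerSide;          // virtual pages along one axis
  uint32_t physicalPagesPerRow;   // physical pages per row of the atlas
  std::vector<uint32_t> entries;  // physical page index, or kInvalidPage
};

// Translates a virtual texel into a physical texel.
// Returns false when the virtual page is not backed; `out` is left untouched then.
bool VirtualToPhysical(const PageTable& table, Texel v, Texel& out) {
  const uint32_t pageX = v.x >> kPageShift;
  const uint32_t pageY = v.y >> kPageShift;
  assert(pageX < table.pagesPerSide && pageY < table.pagesPerSide);

  const uint32_t entry = table.entries[pageY * table.pagesPerSide + pageX];
  if (entry == kInvalidPage) {
    return false;
  }
  // Corner of the physical page inside the atlas, plus the offset within the page.
  const uint32_t cornerX = (entry % table.physicalPagesPerRow) * kPageSize;
  const uint32_t cornerY = (entry / table.physicalPagesPerRow) * kPageSize;
  out = Texel{cornerX + (v.x & (kPageSize - 1)), cornerY + (v.y & (kPageSize - 1))};
  return true;
}
```

A caller that gets `false` back would substitute whatever depth value it treats as "nothing here" for unbacked pages.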

wicked notch Apr 10, 2024, 4:56 PM

#

too real

#

I'll try out the ez version

frank sail Apr 11, 2024, 1:07 AM

#

buoyant summit

btw I found the button to enable this view in outlook (show as conversations) froge_love

distant lodge Apr 11, 2024, 2:58 PM

#

look what came in the mail today https://jglrxavpok.github.io/2024/04/02/recreating-nanite-runtime-lod-selection.html

buoyant summit Apr 11, 2024, 2:59 PM

#

@wicked notch make a blog

#

or at least a https://mastodon.gamedev.place/ account

distant lodge Apr 11, 2024, 2:59 PM

#

please make a blog, I don't wanna read twitter clones

buoyant summit Apr 11, 2024, 3:01 PM

#

posting short messages can be easier for the author to write

hallow cedar Apr 11, 2024, 3:01 PM

#

jokes on you i procrastinate doing either

distant lodge Apr 11, 2024, 3:01 PM

#

post them into your text editor

#

and then upload it to a web host

buoyant summit Apr 11, 2024, 3:01 PM

#

so it's a choice between lustri not writing anything (and you thus not reading) vs reading something that's perhaps not very quality

distant lodge Apr 11, 2024, 3:01 PM

#

one weird trick

buoyant summit Apr 11, 2024, 3:01 PM

#

distant lodge post them into your text editor

bro

#

leave it up to lustri to decide

distant lodge Apr 11, 2024, 3:02 PM

#

true, I should probably give in and make a memestodon account so I can continue passively trawling for GP info

buoyant summit Apr 11, 2024, 3:02 PM

#

and you VILL be reading microblogs on twitter clones

#

if the poster chooses

buoyant summit Apr 11, 2024, 3:06 PM

#

distant lodge true, I should probably give in and make a memestodon account so I can continue ...

just pick an instance that doesn't tolerate and/or enable morons

#

or other instances will block the instance you've picked which might reduce the quality of your experience

buoyant summit Apr 11, 2024, 5:43 PM

#

@dull oyster didnotread fully but why do you do this:


```cpp
vk::DeviceAddress vertexBufferAddress = (vk::DeviceAddress)-1;
vk::DeviceAddress indexBufferAddress = (vk::DeviceAddress)-1;
```

#

why not just 0

#

0 is null & invalid
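A small sketch of the "just use 0" suggestion. `DeviceAddress` here is a plain 64-bit stand-in for `vk::DeviceAddress` so the snippet builds without Vulkan headers, and `MeshBuffers` / `HasGeometry` are hypothetical names. As far as I know the Vulkan spec reserves address 0 as a null value for buffer device addresses, which is what makes 0 usable as the "not set" marker.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for vk::DeviceAddress (assumption: a 64-bit unsigned integer).
using DeviceAddress = uint64_t;

// Hypothetical mesh record; 0 means "no buffer bound yet".
struct MeshBuffers {
  DeviceAddress vertexBufferAddress = 0;
  DeviceAddress indexBufferAddress = 0;
};

// True only when both addresses have been filled in.
bool HasGeometry(const MeshBuffers& mesh) {
  return mesh.vertexBufferAddress != 0 && mesh.indexBufferAddress != 0;
}
```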

frank sail Apr 11, 2024, 7:25 PM

#

buoyant summit or at least a <https://mastodon.gamedev.place/> account

not Twitter froge_sad

primal shadow Apr 11, 2024, 7:28 PM

#

I use my own blog on GitHub pages. Took like an hour to setup.

dull oyster Apr 11, 2024, 9:56 PM

#

buoyant summit @dull oyster didnotread fully but why do you do t...

Honestly? Don't really know why I did that

wicked notch Apr 12, 2024, 12:36 AM

#

I really should sleep because my eyes are failing me but the vsm HiZ experiment is almost done

wispy spear Apr 12, 2024, 12:37 AM

#

allez dodo! 🦤

wicked notch Apr 12, 2024, 12:54 AM

#

```glsl
const uint cornerX = index & log2(position);
const uint cornerY = index >> log2(position);
```

#

mmm yes

#

bleakekw
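For reference, one guess at what that snippet was meant to compute: splitting a linear index into a 2D corner when the row width is a power of two. This is an assumption about the intent, not the author's fix, and `IndexToCorner` and `Corner` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

struct Corner { uint32_t x, y; };

// Splits a linear index into (x, y) for a row width that is a power of two:
// x is the low bits (index mod width), y is the remaining high bits (index / width).
Corner IndexToCorner(uint32_t index, uint32_t width) {
  assert(width != 0 && (width & (width - 1)) == 0);
  uint32_t shift = 0;
  while ((1u << shift) < width) {
    ++shift;  // ends with shift == log2(width)
  }
  return Corner{index & (width - 1), index >> shift};
}
```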

frank sail Apr 12, 2024, 12:56 AM

#

are you doing two-pass hiz

wicked notch Apr 12, 2024, 12:56 AM

#

just single pass rn

frank sail Apr 12, 2024, 12:56 AM

#

because I think you either have to do that or make pages active for two frames so geometry can render properly

wicked notch Apr 12, 2024, 12:58 AM

#

that does make sense

frank sail Apr 12, 2024, 12:58 AM

#

I tried one-pass hi-z and didn't think of that issue, then it was broken and I deleted it all froge_bleak

wicked notch Apr 12, 2024, 12:59 AM

#

ye I'm willing to accept some artifacts for now

#

I'll do two pass after this works properly and I get the perf boost I'm expecting

wicked notch Apr 12, 2024, 1:20 AM

#

buckle up because I'm about to create a 2048x2048x16x4 image

wispy spear Apr 12, 2024, 10:23 AM

#

i smell tachyons

#

https://tenor.com/view/borg-wormhole-star-trek-picard-cube-wormhole-borg-gif-21804210


wicked notch Apr 12, 2024, 3:15 PM

#

ah yes

#

1ms spent on building the HZB bleakekw

delicate rain Apr 12, 2024, 3:16 PM

#

Does it at least go brr when drawing?

wicked notch Apr 12, 2024, 3:17 PM

#

I will test soon

#

meanwhile

#

```glsl
float SampleVirtualShadow(in uvec2 position) {
  const uint power = findMSB(VIRTUAL_SHADOW_PAGE_SIZE);
  const uvec2 virtualPagePosition = position >> power;
  const uint virtualPage = imageLoad(g_VirtualShadowVirtualPageTableImage, ivec3(virtualPagePosition, gl_WorkGroupID.z)).x;
  if (!VirtualShadowIsPageBacked(virtualPage)) {
    return 0;
  }

  const ivec2 physicalTexelCorner = VirtualShadowCalculatePhysicalTexelCorner(virtualPage);
  const ivec2 physicalTexel = physicalTexelCorner + ivec2(position & uvec2(VIRTUAL_SHADOW_PHYSICAL_PAGE_SIZE - 1));
  return uintBitsToFloat(imageLoad(g_VirtualShadowPhysicalMemoryImage, physicalTexel).x);
}

RetinaGroupSize(16, 16, 1)
void main() {
  const uvec2 position = gl_GlobalInvocationID.xy;
  const uvec2 virtualPosition = position << 1;
  const vec4 samples = vec4(
    SampleVirtualShadow(virtualPosition + uvec2(0, 0)),
    SampleVirtualShadow(virtualPosition + uvec2(0, 1)),
    SampleVirtualShadow(virtualPosition + uvec2(1, 0)),
    SampleVirtualShadow(virtualPosition + uvec2(1, 1))
  );
  const float v = min(min(samples.x, samples.y), min(samples.z, samples.w));
  imageStore(g_OutputImage, ivec3(position, gl_WorkGroupID.z), vec4(v));
}
```

can I make this better in any way

#

I stall on TEXTHR though so I doubt that bleakekw

delicate rain Apr 12, 2024, 3:18 PM

#

On phone

#

Will check when I'm back on pc

#

2ms shadowmap draw seems fine no?

#

(from the capture )

wicked notch Apr 12, 2024, 3:18 PM

#

that's without hzb, I will make culling go brr now

delicate rain Apr 12, 2024, 3:19 PM

#

This is bistro 16 clips?

wicked notch Apr 12, 2024, 3:19 PM

#

nono sponza

#

because it loads faster so I can test without waiting kekkedsadge

#

mfw no async load

delicate rain Apr 12, 2024, 3:19 PM

#

Would be pog if it was bistro

wicked notch Apr 12, 2024, 3:19 PM

#

stay tuned

#

soon™️

delicate rain Apr 12, 2024, 3:20 PM

#

wicked notch mfw no async load

Ye it's pain to do

#

But fun when it works

#

Launch times are instant

#

Hmm, how much faster is Sponza compared to bistro?

wicked notch Apr 12, 2024, 3:21 PM

#

a lot

#

but it's not comparable to regular raster either way

#

I think I can make the copy go faster if I abuse the fact that page size is 128

#

I just allocate a ton of shared memory and put the samples there

delicate rain Apr 12, 2024, 3:23 PM

#

Okok, was just wonderin

wicked notch Apr 12, 2024, 4:30 PM

#

I reduce memory traffic

#

memory subsystem is happy

#

I am happy

#

froge

#

copy takes literally 3x less time

#

hell yeah

delicate rain Apr 12, 2024, 4:44 PM

#

At the mere cost of 380 megabytes of VRAM we get hiz
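A back-of-the-envelope check of that figure. The sketch assumes the "2048x2048x16x4" image mentioned earlier means 2048x2048 texels, 16 layers and 4 bytes per texel, with the 12-level mip chain mentioned later in the chat; `MipChainBytes` is a hypothetical helper. Under those assumptions the total comes to about 358 MB before any driver padding or alignment, which is in the same ballpark as the number quoted.

```cpp
#include <cassert>
#include <cstdint>

// Bytes needed for a square, layered image with `mips` levels,
// ignoring any alignment or padding the driver may add.
uint64_t MipChainBytes(uint32_t size, uint32_t layers, uint32_t mips, uint32_t bytesPerTexel) {
  assert(size > 0 && mips > 0 && mips <= 32);
  uint64_t texels = 0;
  for (uint32_t level = 0; level < mips; ++level) {
    uint64_t side = size >> level;
    if (side == 0) {
      side = 1;  // mip sizes never drop below one texel
    }
    texels += side * side;
  }
  return texels * layers * bytesPerTexel;
}
```

With these inputs, `MipChainBytes(2048, 16, 12, 4)` is 357,913,920 bytes, of which the top mip alone is 268,435,456 bytes (256 MiB).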

wicked notch Apr 12, 2024, 4:46 PM

#

less than what bistro would take in an acceleration structure KEKW

#

plus this is the dumb shitty naive method

#

if we reduce the VSM itself we get infinite power

delicate rain Apr 12, 2024, 5:02 PM

#

The if is the important part bleakekw

wicked notch Apr 12, 2024, 5:04 PM

#

I believe in you

delicate rain Apr 12, 2024, 5:04 PM

#

I'm slowly burning out

#

Uni is killing me

wicked notch Apr 12, 2024, 5:08 PM

#

ight moment of truth

delicate rain Apr 12, 2024, 5:09 PM

#

Speed

wicked notch Apr 12, 2024, 5:10 PM

#

I'm rendering at the blistering pace of VK_ERROR_DEVICE_LOST milliseconds per frame

delicate rain Apr 12, 2024, 5:11 PM

#

I'm curious btw, are validation layers useful to you?

wicked notch Apr 12, 2024, 5:11 PM

#

somewhat

delicate rain Apr 12, 2024, 5:11 PM

#

Recently I can't really do anything other than shader print with them

wicked notch Apr 12, 2024, 5:11 PM

#

I've started to ignore sync val

#

yes

#

god bless debugPrintf

delicate rain Apr 12, 2024, 5:12 PM

#

Yeah, GPU based just crashes or misdiagnoses so the app doesn't run properly

#

Sync doesn't get bindless

#

And everything else is done by daxa

wicked notch Apr 12, 2024, 5:13 PM

#

holy mother of all pog

#

it is working

#

ladies and gentlemen

#

800 microseconds raster on bistro

delicate rain Apr 12, 2024, 5:14 PM

#

Letsgo

#

Damn that's fast

wicked notch Apr 12, 2024, 5:15 PM

#

it takes longer to build the HZB bleakekw

delicate rain Apr 12, 2024, 5:15 PM

#

If we combine it with hpb cull and caching it will be light speed

wicked notch Apr 12, 2024, 5:15 PM

#

this is already hpb

delicate rain Apr 12, 2024, 5:16 PM

#

Do you know if it actually works?

wicked notch Apr 12, 2024, 5:16 PM

#

I write far depth into non backed pages

delicate rain Apr 12, 2024, 5:16 PM

#

Like do you have Shadows?

delicate rain Apr 12, 2024, 5:16 PM

#

wicked notch this is already hpb

Aha I see

wicked notch Apr 12, 2024, 5:16 PM

#

oh yeah I forgot to do that KEKW

#

let me write basic sampling one sec

delicate rain Apr 12, 2024, 5:16 PM

#

Crossing my fingers

wicked notch Apr 12, 2024, 5:17 PM

#

btw I still get 14ms in this damned spot

#

at this point I have no idea why bleakekw

delicate rain Apr 12, 2024, 5:17 PM

#

Uhhhhh

#

Where is that?

#

Under some of the bushes?

wicked notch Apr 12, 2024, 5:17 PM

#

bushes

#

ye

#

here

delicate rain Apr 12, 2024, 5:18 PM

#

I'll try if I get that too soon

#

Do you have alpha discard?

wicked notch Apr 12, 2024, 5:18 PM

#

nop

delicate rain Apr 12, 2024, 5:19 PM

#

Hmm weird

loud crag Apr 12, 2024, 5:19 PM

#

wicked notch here

do you query memory budget every frame for those statistics

delicate rain Apr 12, 2024, 5:20 PM

#

Btw is it two pass?

wicked notch Apr 12, 2024, 5:20 PM

#

one pass

wicked notch Apr 12, 2024, 5:20 PM

#

loud crag do you query memory budget every frame for those statistics

ye

delicate rain Apr 12, 2024, 5:21 PM

#

So you just keep depth from previous frame and cull against that or?

wicked notch Apr 12, 2024, 5:21 PM

#

no I use previous frame visibility mask

#

I do that because it's easier to switch to two pass this way

delicate rain Apr 12, 2024, 5:22 PM

#

Previous frame visibility mask to do what?

#

(idk how single pass culling works) agonyfrog

wicked notch Apr 12, 2024, 5:23 PM

#

previous frame visibility mask to cull current frame meshlets

#

it's a bogus approach

#

but it makes it easier to do two pass

#

Like I only have to do another culling pass and xor the visibility masks together

delicate rain Apr 12, 2024, 5:25 PM

#

Hmm okay what is visibility mask? I thought it's just a bit mask of visible/notvisible for each cascade?

wicked notch Apr 12, 2024, 5:25 PM

#

something to mark meshlets as visible/not visible

#

each bit encodes visibility of a single meshlet

delicate rain Apr 12, 2024, 5:26 PM

#

How do you cull against that

#

Huuuh I don't understand

wicked notch Apr 12, 2024, 5:27 PM

#

```glsl
bool IsMeshletVisible(uint meshletInstanceIndex) {
  const uint maskIndex = meshletInstanceIndex >> 6u;
  const uint bitIndex = meshletInstanceIndex & 0x3fu;
  const uint64_t mask = RetinaDereference(g_VisibleMeshletBuffer)[maskIndex];
  return (mask & (uint64_t(1) << bitIndex)) != 0;
}
```

#

this is in the task shaderino

wispy spear Apr 12, 2024, 5:29 PM

#

3x fu 😛

delicate rain Apr 12, 2024, 5:29 PM

#

Tido has super sampling froge

delicate rain Apr 12, 2024, 5:31 PM

#

wicked notch ```glsl bool IsMeshletVisible(uint meshletInstanceIndex) { const uint maskInde...

How do you set this mask?

wicked notch Apr 12, 2024, 5:32 PM

#

when I cull

#

```glsl
void main() {
  const uint meshletInstanceIndex = gl_GlobalInvocationID.x;
  if (meshletInstanceIndex >= u_MeshletCount) {
    return;
  }
  ...
  const uint meshletVisibilityMaskIndex = meshletInstanceIndex >> 6u;
  const uint meshletVisibilityMaskBit = meshletInstanceIndex & 0x3fu;
  if (IsMeshletVisible(aabb, viewInfo, transform)) {
    atomicOr(RetinaDereference(g_VisibleMeshletBuffer)[meshletVisibilityMaskIndex], uint64_t(1) << meshletVisibilityMaskBit);
  } else {
    atomicAnd(RetinaDereference(g_VisibleMeshletBuffer)[meshletVisibilityMaskIndex], ~(uint64_t(1) << meshletVisibilityMaskBit));
  }
}
```

#

let me take a moment to curse at shaderc

#

```
Failed to compile shader 'D:/Dev/CLion/Retina/src/Retina/Sandbox/Shaders/GBufferResolve.frag.glsl': shaderc: internal error: compilation succeeded but failed to optimize: ID '128[%g_VirtualShadowInfoBuffer]' defined in block '9[%9]' does not dominate its use in block '258[%258]'
  %258 = OpLabel
```

fuck you

wispy spear Apr 12, 2024, 5:34 PM

#

i understand 0, but its utterly fascinating, and im not trying to be sarcastic or anything. its froge_love

#

maybe we lure the slang peeps onto our server and chain them somewhere in the basement, so that they can immediately fix those things

delicate rain Apr 12, 2024, 5:36 PM

#

wicked notch ```glsl void main() { const uint meshletInstanceIndex = gl_GlobalInvocationID....

So if a meshlet stops being visible, how can it become visible again?

wicked notch Apr 12, 2024, 5:37 PM

#

shadows do work with culling

#

and they're blazing fast to raster

#

VSM has been conquered

wicked notch Apr 12, 2024, 5:38 PM

#

delicate rain So if a meshlet stops being visible, how can it become visible again?

the mask is cleared every frame

#

if meshlet passes both frustum and hiz test it is set as visible
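A CPU-side sketch of that per-frame mask, mirroring the 64-bits-per-word layout used in the GLSL above. `VisibilityMask` and `CullFrame` are hypothetical names, and `passesTests[i]` simply stands in for the combined frustum and Hi-Z result, which is not modelled here.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One bit per meshlet instance, packed into 64-bit words.
class VisibilityMask {
 public:
  explicit VisibilityMask(uint32_t meshletCount) : words_((meshletCount + 63) / 64, 0) {}

  void Clear() { words_.assign(words_.size(), uint64_t{0}); }

  void Set(uint32_t index) {
    assert((index >> 6) < words_.size());
    words_[index >> 6] |= uint64_t{1} << (index & 63u);
  }

  bool Test(uint32_t index) const {
    assert((index >> 6) < words_.size());
    return ((words_[index >> 6] >> (index & 63u)) & 1u) != 0;
  }

 private:
  std::vector<uint64_t> words_;
};

// Clears the mask, then marks every meshlet whose tests passed this frame.
void CullFrame(VisibilityMask& mask, const std::vector<bool>& passesTests) {
  mask.Clear();
  for (uint32_t i = 0; i < passesTests.size(); ++i) {
    if (passesTests[i]) {
      mask.Set(i);
    }
  }
}
```

Because the mask is rebuilt from scratch each frame, a meshlet that failed the tests last frame is visible again as soon as it passes them.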

delicate rain Apr 12, 2024, 5:39 PM

#

wicked notch and they're blazing fast to raster

POG

#

okay so you draw meshlets visible last frame, build hiz, draw everything again and cull against hiz?

wicked notch Apr 12, 2024, 5:40 PM

#

correct, minus the "draw everything again"

delicate rain Apr 12, 2024, 5:40 PM

#

draw meshlets visible last frame, build hiz and...??

wicked notch Apr 12, 2024, 5:40 PM

#

cull for next frame

delicate rain Apr 12, 2024, 5:41 PM

#

when do you draw what was not visible last frame

wicked notch Apr 12, 2024, 5:41 PM

#

don't try too hard to understand this method, it's garbage KEKW

delicate rain Apr 12, 2024, 5:41 PM

#

why is it so hard for me to understand culling man

#

I stg I have huge skill issue

wicked notch Apr 12, 2024, 5:41 PM

#

delicate rain when do you draw what was not visible last frame

in theory after you build the hiz yes

#

I'm just not doing that rn for shrimplicity

loud crag Apr 12, 2024, 5:42 PM

#

wicked notch shadows do work with culling

damn, same scene for me but just using four csm cascades and I have 40fps instead of >200 :(

delicate rain Apr 12, 2024, 5:42 PM

#

wicked notch in theory after you build the hiz yes

but that means that you can miss stuff that was not visible last frame and just became visible this frame no?

wicked notch Apr 12, 2024, 5:43 PM

#

ye that's when you xor the masks

delicate rain Apr 12, 2024, 5:43 PM

#

for a frame I guess, because it is visible this frame so you draw it next frame

wicked notch Apr 12, 2024, 5:43 PM

#

two pass occlusion culling works like this

#

```
Draw(visibleLastFrame);
BuildHZB();
CullAndXor();
Draw(disoccludedCurrentFrame);
```

#

I do this right now

```
Draw(visibleLastFrame);
BuildHZB();
Cull(); // for next frame
```
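One way to read the second draw of the two-pass variant in mask terms: it only needs the meshlets that pass culling now and were not already drawn in the first pass. The sketch below uses and-not for that, which is an assumption about the intent. The chat says xor, and a plain xor would also flag meshlets that were visible last frame and are occluded now. `Mask` and `Disoccluded` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Mask = std::vector<uint64_t>;  // one bit per meshlet

// Meshlets for the second draw: visible after culling against the fresh HZB,
// minus everything the first pass already drew.
Mask Disoccluded(const Mask& visibleLastFrame, const Mask& visibleNow) {
  assert(visibleLastFrame.size() == visibleNow.size());
  Mask result(visibleNow.size());
  for (std::size_t i = 0; i < visibleNow.size(); ++i) {
    result[i] = visibleNow[i] & ~visibleLastFrame[i];
  }
  return result;
}
```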

delicate rain Apr 12, 2024, 5:47 PM

#

right so you do have one frame of delay

wicked notch Apr 12, 2024, 5:48 PM

#

yes

#

that's why what I'm doing is bad

delicate rain Apr 12, 2024, 5:48 PM

#

okay now I get it

#

sorry for the confuslment

wicked notch Apr 12, 2024, 5:49 PM

#

issa ok frogeheart

#

bliss

#

pure bliss

delicate rain Apr 12, 2024, 5:49 PM

#

yeah this is awesome

wicked notch Apr 12, 2024, 5:49 PM

#

1ms

delicate rain Apr 12, 2024, 5:49 PM

#

what gpu btw?

wicked notch Apr 12, 2024, 5:50 PM

#

3070

hallow cedar Apr 12, 2024, 5:50 PM

#

inb4 4090

#

aw

#

in after

delicate rain Apr 12, 2024, 5:50 PM

#

wicked notch 3070

I do 1ms with no hiz on 4070 ti s

#

hmmm

#

I like this a lot

wicked notch Apr 12, 2024, 5:54 PM

#

now potrick must figure out a way to make hzb build go brrr

#

because currently that takes more than VSM raster KEKW

delicate rain Apr 12, 2024, 5:58 PM

#

btw do you use your heuristic or the original Jaker one?

#

for clip selection

wicked notch Apr 12, 2024, 5:58 PM

#

I use mine because it's shrimpler

#

in practice it should be equiv

delicate rain Apr 12, 2024, 5:59 PM

#

I keep running out of resolution on the 0th clip on a 2k monitor bleakekw

wicked notch Apr 12, 2024, 5:59 PM

#

with my heuristic?

delicate rain Apr 12, 2024, 5:59 PM

#

I need to turn my 0th clip world size to 8 meters to work

delicate rain Apr 12, 2024, 5:59 PM

#

wicked notch with my heuristic?

nono yours its impossible

#

look at little sneaky guy

glass sphinx Apr 12, 2024, 6:03 PM

#

eeeeviiiiill

wicked notch Apr 12, 2024, 6:04 PM

#

potrick

#

make hzb build go brr

glass sphinx Apr 12, 2024, 6:45 PM

#

livstris neuron is glowing

#

i did too much irl to program in free time last weeks

#

froge_yeehaw

delicate rain Apr 12, 2024, 6:46 PM

#

I wonder if better culling makes it faster to not cache at all

wispy spear Apr 12, 2024, 6:46 PM

#

glass sphinx froge_yeehaw

time to get back into the saddle

glass sphinx Apr 12, 2024, 6:58 PM

#

@wicked notch we can prob just not write the lowest mips for hiz

#

maybe even only on page level

#

if we do it at page level it will be like 10 mics for all clips

#

idk how fucked the culling will be with that tho

wicked notch Apr 12, 2024, 6:59 PM

#

ye I think the real strat is just doing reduction on the VSM itself

#

per page, that is

glass sphinx Apr 12, 2024, 6:59 PM

#

tru

#

two pass

#

hmmhmm

wicked notch Apr 12, 2024, 6:59 PM

#

I think it can work, because it's effectively the same thing, just the sampling gets more convoluted

glass sphinx Apr 12, 2024, 6:59 PM

#

yes

#

wmart

#

its getting complicated

wispy spear Apr 12, 2024, 7:01 PM

#

https://tenor.com/view/make-it-so-jean-luc-picard-star-trek-the-next-generation-make-it-happen-gif-23455354


astral field Apr 12, 2024, 10:54 PM

#

wispy spear maybe we lure the slang peeps onto our server and chain them somewhere in the ba...

solution: make an issue

#

forgefumbsup issues fix everything

wispy spear Apr 12, 2024, 10:55 PM

#

astral field solution: make an issue

is that you maisonbleu?

astral field Apr 12, 2024, 10:58 PM

#

wispy spear is that you maisonbleu?

I'm not a blue house?

wispy spear Apr 12, 2024, 10:58 PM

#

: )

astral field Apr 12, 2024, 10:58 PM

#

bleakekw oh, it wasn't french

wispy spear Apr 12, 2024, 10:58 PM

#

i dont remember your nickname but you fly the team-effort-d3d11 role 🙂

#

or you are the 8bit guy

astral field Apr 12, 2024, 10:58 PM

#

I'm 8 bit, yea

wispy spear Apr 12, 2024, 10:59 PM

#

ah

astral field Apr 12, 2024, 10:59 PM

#

OH you changed your name

wispy spear Apr 12, 2024, 10:59 PM

#

you changed your whole account

astral field Apr 12, 2024, 10:59 PM

#

nah, only username

#

8bit->nibble

#

half the bits

wispy spear Apr 12, 2024, 10:59 PM

#

nibble is 4 bits

astral field Apr 12, 2024, 10:59 PM

#

half the intelligence

wispy spear Apr 12, 2024, 11:00 PM

#

fair

astral field Apr 12, 2024, 11:00 PM

#

i had no idea you changed user, thought you disappeared

wispy spear Apr 12, 2024, 11:00 PM

#

ill never disappear

astral field Apr 12, 2024, 11:01 PM

#

froge_yeehaw

#

gpu devs never disappear, only manifest differently

wispy spear Apr 12, 2024, 11:02 PM

#

im not a gpu dev though

astral field Apr 12, 2024, 11:05 PM

#

froge_evil

wicked notch Apr 12, 2024, 11:13 PM

#

I ponder

frank sail Apr 12, 2024, 11:14 PM

#

https://tenor.com/view/pondering-pondering-my-orb-my-orb-orb-pondering-my-pondering-my-orb-orb-gif-24060364


wicked notch Apr 12, 2024, 11:15 PM

#

if the first clipmap is smol

#

frametime goes up a lot in that goddamned spot

#

near the bushes that is

#

it's fine literally everywhere else

#

I need to inspecc more

#

renderdoc really hates my 2k^2 x16 layers x12 mips image though kekkedsadge

#

so here is a partially reduced version of the first clipmap in that area

wispy spear Apr 12, 2024, 11:18 PM

#

could that be overdraw in the bushes killing the perf?

wicked notch Apr 12, 2024, 11:20 PM

#

it should be mitigated by hiz

wispy spear Apr 12, 2024, 11:20 PM

#

maybe debug draw the bushes like potti did few months ago

#

frozen frustum and fly around the bush

#

to see if it do hiz or hiznt

wicked notch Apr 12, 2024, 11:21 PM

#

oh you sparked an idea in me

#

page heatmap

#

brb

#

renderdoc is sometimes bogus though smh

frank sail Apr 12, 2024, 11:23 PM

#

wicked notch page heatmap

ahh cool idea

delicate rain Apr 12, 2024, 11:24 PM

#

Why do we come up with the coolest of ideas when the article's deadline stares us down???

wispy spear Apr 12, 2024, 11:25 PM

#

some people work better when there is pressure

delicate rain Apr 12, 2024, 11:25 PM

#

I like to blame my weak discipline

wispy spear Apr 12, 2024, 11:25 PM

#

out comes ze diamonds : )

wicked notch Apr 12, 2024, 11:32 PM

#

there is a singularity at the center of bistro

#

idk what that is bleakekw

astral field Apr 12, 2024, 11:33 PM

#

found the problem at least KEKW

frank sail Apr 12, 2024, 11:33 PM

#

tbf that's where most of the detail actually is

wispy spear Apr 12, 2024, 11:33 PM

#

explains all the bushes

#

they were trying to hide some shit

wicked notch Apr 12, 2024, 11:33 PM

#

still something must be going very wrong

#

how come hiz isn't able to cope

wispy spear Apr 12, 2024, 11:34 PM

#

is it possible that the geometry there is just simply fucked (read not ideal) and they really couldnt just be bothered

#

mayhaps needs some work in blender

wicked notch Apr 12, 2024, 11:35 PM

#

nah I see now what's going on

wispy spear Apr 12, 2024, 11:35 PM

#

ah

wicked notch Apr 12, 2024, 11:35 PM

#

and yes it is the geometry being horrid

#

but it's something worse

#

big meshlets

#

foliage is a completely disconnected mesh and meshoptimizer already doesn't care about spatial locality

#

so it just grabs whatever triangles it has available to build the meshlets, this makes the AABB huge

#

hence hiz can't cope
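Why a huge AABB hurts can be seen from how a Hi-Z test usually picks its mip. A common approach (not confirmed to be what is used here) is to choose the level where the projected box covers roughly one or two texels per axis, so a big box lands on a coarse mip whose texels hold the farthest depth over a large area and the test almost never rejects. `HzbMipForExtent` is a hypothetical helper showing just that level selection.

```cpp
#include <cassert>
#include <cstdint>

// Picks the smallest mip level whose texel size (2^level pixels) is at least
// the larger side of the projected AABB, clamped to the last available mip.
uint32_t HzbMipForExtent(uint32_t widthPx, uint32_t heightPx, uint32_t maxMip) {
  assert(widthPx > 0 && heightPx > 0 && maxMip < 32);
  const uint32_t extent = widthPx > heightPx ? widthPx : heightPx;
  uint32_t level = 0;
  while (level < maxMip && (1u << level) < extent) {
    ++level;
  }
  return level;
}
```

A 16x8 pixel box tests against mip 4, while a 300x200 pixel box from a scattered foliage meshlet already ends up on mip 9.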

wispy spear Apr 12, 2024, 11:36 PM

#

that means you need a carefully crafted scene to properly generate meshlets for all the newfangledisms

wicked notch Apr 12, 2024, 11:36 PM

#

..or

wispy spear Apr 12, 2024, 11:37 PM

#

fix meshoptimizer

wicked notch Apr 12, 2024, 11:37 PM

#

I do nanite

#

as I promised

wispy spear Apr 12, 2024, 11:37 PM

#

brainworm

#

you have to help joker to write the vsm paper first

wicked notch Apr 12, 2024, 11:37 PM

#

yes I haven't done much bleakekw

#

I'll redeem myself

frank sail Apr 12, 2024, 11:39 PM

#

you were never obligated to help with the writing

wicked notch Apr 12, 2024, 11:39 PM

#

bullshit

#

it is my duty as a member of the VSM team to do work

astral field Apr 12, 2024, 11:40 PM

#

wicked notch bullshit

frogeshit*

wicked notch Apr 12, 2024, 11:40 PM

#

I must uphold that duty

frank sail Apr 12, 2024, 11:40 PM

#

I invite you to read what we have so far on overleaf

wicked notch Apr 12, 2024, 11:40 PM

#

I shall

frank sail Apr 12, 2024, 11:40 PM

#

feel free to add comments, make edits, etc.

#

obviously 😄

wicked notch Apr 12, 2024, 11:41 PM

#

I'll note down stuff I think I can work on

#

we use tido yes?

frank sail Apr 12, 2024, 11:41 PM

#

yeah

#

code isn't due for a few more weeks so we are not focusing on that at all

wicked notch Apr 12, 2024, 11:42 PM

#

gud

delicate rain Apr 12, 2024, 11:44 PM

#

wicked notch we use tido yes?

I wrote what we do in tido

wispy spear Apr 13, 2024, 12:57 PM

#

looks like LVSTRI blogs bout his college setup

#

https://www.youtube.com/watch?v=sWHfVPqTovY

YouTube

Edward

Linux Based Productivity setup - College edition 2024

Excuse me but lately I've really busy. Please disenjoy and leave a dislike.
This video was edited too with ffmpeg, recorded with my smartphone.


#

its an italian keyboard layout but he tinkered with some keys

wispy spear Apr 13, 2024, 5:22 PM

#

lustri i also just noticed your cmake installing deps thingy, thats neat 🙂

#

also also, you dont need to tell cmake about your headers, cpp is enough, when declaring a target add_executable/library

wicked notch Apr 13, 2024, 5:32 PM

#

wispy spear also also, you dont need to tell cmake about your headers, cpp is enough, when d...

ik, do I do that schtuff somewhere

#

I think I do target_sources everywhere, maybe there's some stray header

wispy spear Apr 13, 2024, 5:33 PM

#

wicked notch ik, do I do that schtuff somewhere

root CMakeLists.txt

#

set(RETINA_HEADERS

#

and you still use cgltf, i will tell @loud crag

wicked notch Apr 13, 2024, 5:34 PM

#

wispy spear set(RETINA_HEADERS

I dun see dis thonk

wispy spear Apr 13, 2024, 5:34 PM

#

oh did you perchance remove it already :3

wicked notch Apr 13, 2024, 5:34 PM

#

mayhaps

#

I don't remember ever having that but 🅱️erhaps my brain is failing me

#

as it usually does

wispy spear Apr 13, 2024, 5:35 PM

#

oh

#

looks like i was on an old commit p_Cry

#

.Entry is still called .Entry though hehe

#

man your code is so readable

wicked notch Apr 13, 2024, 5:36 PM

#

is it

#

I still need to make rg to separate the passes

#

rn everything is in application bleakekw

wispy spear Apr 13, 2024, 5:37 PM

#

i fink it is, still not sure about the C for classes, but i very much stole the S and E prefix too now 🙂 and the auto Foo() -> ism

wicked notch Apr 13, 2024, 5:38 PM

#

does the thing compile for you on lunix btw

#

I think new GCC should be out

wispy spear Apr 13, 2024, 5:38 PM

#

ah let me check

#

uh let me check that too

#

last time was 13.2.1

#

it still is

wicked notch Apr 13, 2024, 5:40 PM

#

hm perhaps it was just clang18

#

why is gcc so slow to release smh

wispy spear Apr 13, 2024, 5:40 PM

#

dlss commit is outdated it seems

#

[cmake] fatal: Fetched in submodule path 'NVIDIAImageScaling', but it did not contain 35e13ba316c98eeecf16f37eae70ce88019911f6. Direct fetching of that commit failed.
[cmake] CMake Error at nvdia_dlss-subbuild/nvdia_dlss-populate-prefix/tmp/nvdia_dlss-populate-gitclone.cmake:62 (message):

#

it should be disabled on lunix anyway i suppose, perhaps with some autodetection and a message?

wicked notch Apr 13, 2024, 5:41 PM

#

ye

wispy spear Apr 13, 2024, 5:44 PM

#

trying clang 17.0.6

#

ah lol?

#

dlss cloning worked now

#

it cant find #include <vulkan/vk_enum_string_helper.h>

wicked notch Apr 13, 2024, 5:46 PM

#

wot

wispy spear Apr 13, 2024, 5:46 PM

#

hopefully not a new vksdk ;c

wicked notch Apr 13, 2024, 5:46 PM

#

it's 275

#

but it's been there since the beginning of time I think

wispy spear Apr 13, 2024, 5:46 PM

#

narf

#

im on 268

wicked notch Apr 13, 2024, 5:47 PM

#

can you check in your sdk

#

is it actually missing

wispy spear Apr 13, 2024, 5:47 PM

#

eh hang on

wicked notch Apr 13, 2024, 5:47 PM

#

the string helper has been there since 1.2 at least

wispy spear Apr 13, 2024, 5:47 PM

#

this is weird

#

vulkan/vulkan.hpp is coming from /usr/include/vulkan wtf

#

```
[deccer@rootfs ~]$ sudo pacman -R vulkan-devel
checking dependencies...
error: failed to prepare transaction (could not satisfy dependencies)
:: removing spirv-tools breaks dependency 'spirv-tools' required by glslang
:: removing vulkan-headers breaks dependency 'vulkan-headers' required by qt6-base
:: removing spirv-tools breaks dependency 'spirv-tools' required by shaderc
```

;C

wicked notch Apr 13, 2024, 5:49 PM

#

vulkan-headers methinks

wispy spear Apr 13, 2024, 5:49 PM

#

i have no memories installing anything qt6y

wicked notch Apr 13, 2024, 5:50 PM

#

kde I think is qt

wispy spear Apr 13, 2024, 5:50 PM

#

yeah, but im an xfce fanboy

#

ok let me fiddle

wicked notch Apr 13, 2024, 5:50 PM

#

issa ok, I'll fix linux building once and for all now

wispy spear Apr 13, 2024, 5:50 PM

#

haha

#

ah

#

its gnuplot

#

and that stupid patchpanel for thingy pipewire

#

and obs : )

#

removing ffmpeg breaks dependency 'ffmpeg' required by firefox

#

nice

wicked notch Apr 13, 2024, 5:53 PM

#

don't nuke your os KEKW

wispy spear Apr 13, 2024, 5:53 PM

#

: )

#

good thing is its ez to reinstall

#

but why would firefox have a hard dependency to ffmpeg

#

thats new

#

important question is, if i remove firefox and reinstall it, will it rember my current 250 open tabs 😛

#

ok i have to clean the tabs first anyway... will do that first

#

and i should also start working on local build containers

wispy spear Apr 13, 2024, 7:50 PM

#

down to 150 tabs : >

wispy spear Apr 13, 2024, 10:50 PM

#

down to 14

loud crag Apr 13, 2024, 11:46 PM

#

wispy spear also also, you dont need to tell cmake about your headers, cpp is enough, when d...

misinfo

frank sail Apr 13, 2024, 11:46 PM

#

how is that misinfo?

#

headers aren't treated as individual translation units

loud crag Apr 13, 2024, 11:47 PM

#

IDEs or analyzer tools want that and might break or fail to analyse the headers if you dont add the headers

frank sail Apr 13, 2024, 11:48 PM

#

perhap

wispy spear Apr 13, 2024, 11:48 PM

#

had no trouble with clion or vscode-cpp so far at least

#

in my cmake/openglstarted and other cppisms for now

wicked notch Apr 13, 2024, 11:50 PM

#

loud crag IDEs or analyzer tools want that and might break or fail to analyse the headers ...

clion werks fine without

primal shadow Apr 15, 2024, 2:55 AM

#

Finished revamping my 2pass occlusion culling to work with LODs, and be a lot cleaner/better in general!

#

Reusing the previous frame depth pyramid, instead of explicitly tracking cluster visibility between frames

loud crag Apr 15, 2024, 3:46 AM

#

@wicked notch you know if nvpro_pyramid works well on non-nvidia gpus? i’m thinking of using it, too, but reading comments in the source like „subgroupSize other than 32 is not tested, should work, message [email protected] if not“ is throwing me off a little

#

perhaps ffx spd is a better fit for me

primal shadow Apr 15, 2024, 3:47 AM

#

loud crag @wicked notch you know if nvpro_pyramid works well on non-nvidia gpus? i...

Neither SPD or nvpro_pyramid work for non-power2 textures either, which is a huge pain. I haven't seen anyone handle it.

loud crag Apr 15, 2024, 3:54 AM

#

huh, are you sure?

#

the nvpro dispatcher seems like it can handle anything that is a multiple of 4 by default

#

that number is templated, though, so perhaps it supports other factors, too

primal shadow Apr 15, 2024, 4:00 AM

#

It probably doesn't enforce power of 2 for the textures or generates

#

Which would prevent you from using mips of a single texture

glass sphinx Apr 15, 2024, 6:40 AM

#

primal shadow Reusing the previous frame depth pyramid, instead of explicitly tracking cluster...

not tracking visibility reduced perf a lot in my tests

#

maybe i misunderstand what you do

#

but if you use last frame's depth i assume you draw twice, once culled against the last frame and once culled against the partial new frame

primal shadow Apr 15, 2024, 12:09 PM

#

glass sphinx but if you use last frame's depth i assume you draw twice, once culled against the las...

Yes

#

I'm doing exactly what nanite does

primal shadow Apr 15, 2024, 4:42 PM

#

Started porting SPD to bevy. Non power of 2 is actually easy. The samples out of bounds will just return 0, which ends up ensuring conservative depth, at the cost of a poorer culling on the edges of the screen. Totally fine imo.
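
A minimal sketch of why out-of-bounds reads returning 0 stay conservative. It is plain Python rather than the actual SPD shader, and it assumes reverse-Z (0.0 is the far plane), a min reduction over each 2x2 block, and mip sizes rounded up; `downsample_min` is a hypothetical helper name.

```python
def downsample_min(depth, oob_value=0.0):
    """Build the next mip of a depth pyramid by taking the min of each 2x2 block.

    Assumes reverse-Z (0.0 = far plane), so min() keeps the farthest depth.
    Texels outside the source image read as oob_value, mimicking a load
    that returns 0 when it goes out of bounds.
    """
    h, w = len(depth), len(depth[0])
    # Round up so an odd-sized source keeps its last row and column.
    out_h, out_w = (h + 1) // 2, (w + 1) // 2

    def load(y, x):
        return depth[y][x] if y < h and x < w else oob_value

    return [
        [
            min(load(2 * y, 2 * x), load(2 * y, 2 * x + 1),
                load(2 * y + 1, 2 * x), load(2 * y + 1, 2 * x + 1))
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


# 3x3 source: not a power of two, so the edge blocks read out of bounds.
mip0 = [
    [0.9, 0.8, 0.7],
    [0.6, 0.5, 0.4],
    [0.3, 0.2, 0.1],
]
mip1 = downsample_min(mip0)
```

Every block that touches the edge collapses to 0.0, the farthest possible depth under this convention, so nothing gets wrongly culled there; the cost is that those edge texels can no longer occlude anything.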

wicked notch Apr 15, 2024, 4:46 PM

#

nvpro_pyramid does support NPOT

primal shadow Apr 16, 2024, 3:44 AM

#

Almost finished with porting SPD!

loud crag Apr 16, 2024, 6:50 AM

#

dlss be working beautifully

wicked notch Apr 16, 2024, 9:39 AM

#

it's disabled

#

should probably do something to enable/disable it easily KEKW

wispy spear Apr 16, 2024, 10:08 AM

#

you can use the already existing shortcut alt+f4 to reload the thing

buoyant summit Apr 16, 2024, 9:16 PM

#

https://github.com/shader-slang/slang/pull/3907

GitHub

Init expressions for struct fields support, #3738 by ArielG-NV · Pu...

Following commit handles init expressions of struct's + inheritance of constructors.
#3878
The general implementation follows C++ init expression rules for derived classes.
The logic is general...

astral field Apr 17, 2024, 12:51 AM

#

c++ classes froge_yeehaw

buoyant summit Apr 17, 2024, 1:23 AM

#

watbulb

#

slang has crappy classes

#

well it doesn't have classes, only structs, and those aren't crappy themselves, but the semantics for `this` in methods is hlsl-tier garbage

#

because `this` is basically `inout`

#

:barf:

astral field Apr 17, 2024, 1:27 AM

#

make an issue gpAkkoShrug

primal shadow Apr 17, 2024, 1:58 AM

#

Ok finished SPD port to Bevy/WGSL and using it in my nanite-impl 🙂

#

Much better perf

glass sphinx Apr 17, 2024, 12:11 PM

#

buoyant summit because `this` is basically `inout`

ah wtf

astral field Apr 17, 2024, 10:27 PM

#

glass sphinx ah wtf

it's inout for hlsl

#

since hlsl be hlsl

#

spirv is actually a `this` I think

glass sphinx Apr 17, 2024, 10:28 PM

#

spirv doesnt have a concept of classes at all

astral field Apr 17, 2024, 10:28 PM

#

not classes

#

but ptr's

#

glsl target also supports class stuff in slang

glass sphinx Apr 17, 2024, 10:29 PM

#

does spirv have storage class - generic pointers?

astral field Apr 17, 2024, 10:37 PM

#

glass sphinx does spirv have storage class - generic pointers?

yes

#

buoyant summit Apr 17, 2024, 10:38 PM

#

I don't like spir-v generic pointer tbh

#

or rather

#

I'm kinda suspicious about just throwing it into vk as is

astral field Apr 17, 2024, 10:56 PM

#

they're good if good tooling could use them (vcc & slang one day)

#

although I don't know the extent to which they can be used (never checked up deeply), I only really know about restricted ptr's

glass sphinx Apr 17, 2024, 11:47 PM

#

froge_love

buoyant summit Apr 17, 2024, 11:48 PM

#

glass sphinx froge_love

it's not a thing in vk so don't get too excited (though slang probably easily could be taught to emit that and then fed into shady to lower generic ptrs to tags frognant )

#

and the generic pointers issue I opened a while ago is on a really slow cook

primal shadow Apr 18, 2024, 4:10 AM

#

I'm hitting the max dispatch limit for culling workgroups 😬

#

memory also continues to be really annoying to allocate large amounts of

frank sail Apr 18, 2024, 4:12 AM

#

primal shadow I'm hitting the max dispatch limit for culling workgroups 😬

how big

primal shadow Apr 18, 2024, 4:17 AM

#

Caused by:
    In a ComputePass
      note: encoder = `<CommandBuffer-(3, 3, Vulkan)>`
    In a dispatch command, indirect:false
      note: compute pipeline = `meshlet_culling_first_pipeline`
    Each current dispatch group size dimension ([103039, 1, 1]) must be less or equal to 65536

#

I can do the same dumb trick I did for the other pass and make it a 3d dispatch I then remap to 1d in order to get more workgroups 😛

#

Or use bigger workgroup sizes ig. Currently it's 64x1x1.
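
The 3D-dispatch-remapped-to-1D trick can be sketched like this. It is host-side Python for illustration only; `split_dispatch` and `linear_workgroup_id` are hypothetical names, and the per-dimension limit is passed in as a parameter (65535 below is just an example value).

```python
import math


def split_dispatch(total_workgroups, max_per_dim):
    """Pick (x, y, z) dispatch dimensions that cover at least
    total_workgroups, with every dimension <= max_per_dim."""
    x = max(1, min(total_workgroups, max_per_dim))
    y = max(1, min(math.ceil(total_workgroups / x), max_per_dim))
    z = max(1, math.ceil(total_workgroups / (x * y)))
    if z > max_per_dim:
        raise ValueError("too many workgroups for a single dispatch")
    return x, y, z


def linear_workgroup_id(wg_id, num_wgs):
    """Flatten a 3D workgroup id back into the 1D id the shader wants."""
    x, y, z = wg_id
    nx, ny, _ = num_wgs
    return x + nx * (y + ny * z)


# The failing dispatch from the error above needed 103039 workgroups.
dims = split_dispatch(103039, 65535)
last_valid = linear_workgroup_id((37503, 1, 0), dims)
```

The grid usually overshoots the requested count, so the shader still has to early-out when the flattened id is greater than or equal to the real workgroup count.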

wheat haven Apr 18, 2024, 6:17 AM

#

primal shadow I can do the same dumb trick I did for the other pass and make it a 3d dispatch ...

exactly what I did when I hit that snag

primal shadow Apr 18, 2024, 6:34 AM

#

wheat haven exactly what I did when I hit that snag

Why must I though :/. Why can't drivers be smart.

craggy shale Apr 18, 2024, 6:46 PM

#

because actual magic doesn't exist and "magic" means heuristics and heuristics suck

wispy spear Apr 18, 2024, 9:09 PM

#

im also confused about cmake 🙂

#

one cmakelists does if (RETINA_ENABLE_PROFILER) the other one does if (${RETINA_ENABLE_PROFILER})

wicked notch Apr 18, 2024, 9:12 PM

#

I think it's equivalent

#

but ye I should change this

wispy spear Apr 18, 2024, 9:13 PM

#

i can't get it to work either

#

i stole the profiler one

#

no matter whether i use set or option with ON, with or without CACHE or CACHE BOOL, it won't pick it up

#

unless i really have to yoink build/ physically and not just Delete Cache & Reconfigure

loud crag Apr 19, 2024, 7:31 AM

#

wicked notch I think it's equivalent

yeah in this context it’s the same but it does have a different meaning

astral field Apr 19, 2024, 10:21 AM

#

primal shadow ``` Caused by: In a ComputePass note: encoder = `<CommandBuffer-(3, 3,...

2^16 probably not 100% random

delicate rain Apr 19, 2024, 9:34 PM

#

Mister mister @wicked notch please show your shadows

wicked notch Apr 19, 2024, 9:37 PM

#

uh

#

are you in a hurry

#

gimme 20 mins

delicate rain Apr 19, 2024, 9:39 PM

#

no

#

send gh

wicked notch Apr 19, 2024, 9:39 PM

#

sure

#

https://github.com/LVSTRI/Retina

#

jaker was able to compile with VS

delicate rain Apr 19, 2024, 9:39 PM

#

Thank you froge_love

wicked notch Apr 19, 2024, 9:39 PM

#

clang is a requirement

frank sail Apr 19, 2024, 9:45 PM

#

you can install clang through the visual studio installer if you want to use VS

primal shadow Apr 19, 2024, 9:56 PM

#

Behold: Wayy too many bunnies for my renderer to currently handle!

#

wicked notch Apr 19, 2024, 9:56 PM

#

that occupancy is sad

#

bleakekw

primal shadow Apr 19, 2024, 10:02 PM

#

Yes. Here's a breakdown of problems:

* The two green vkCmdCopyBuffer()s (4ms each) are staging buffer copies. I'll probably want to use ReBAR and directly map buffers
* The write_index_buffer pass (10.5ms each, 2 per view) are slooooow and have horrendous occupancy. Couple of things I can try:
  - Have culling pass write out a list of all visible clusters, indirect draw to spawn write_index_buffer pass workgroups, instead of spawning 1 wg per cluster and just using early exit if they were culled
  - Try to merge it with the culling pass
  - Add software raster and hope it reduces the need for hardware raster
  - Add mesh shaders to wgpu
* Raster (2 per view, 3.3/4.3 for main/shadow view in the first one, and then second is basically free because occlusion culling)
  - I'm 83% PES+VPC throughput, which I think means primitive assembler limited? Nothing I can do to fix this really. Software raster + mesh shaders might help, again.

wicked notch Apr 19, 2024, 10:03 PM

#

I don't remember our method for writing meshlet index buffers to be that slow tbh

#

idk how it came to be like this

primal shadow Apr 19, 2024, 10:05 PM

#

This is the shader https://github.com/bevyengine/bevy/blob/0ebd414dbc13ef77b84627ca5d0e82b72cd50262/crates/bevy_pbr/src/meshlet/write_index_buffer.wgsl. 1 workgroup per instanced cluster in the entire scene regardless of culling.

wicked notch Apr 19, 2024, 10:10 PM

#

looks fine

#

idk lol

#

what happens if you profile your shader with nsight

primal shadow Apr 19, 2024, 10:11 PM

#

Hold on had to go walk somewhere. Give me 15m to get back to my PC.

#

I'm suspicious that the spawn 1 wg per cluster and early exit if culled is too slow

#

Because notice both are slow, first and second pass

#

Even though the second pass should be doing nothing

glass sphinx Apr 19, 2024, 10:17 PM

#

primal shadow Yes. Here's a breakdown of problems: * The two green vkCmdCopyBuffer()s (4ms eac...

pes + vpc probably means hw culling is the limiting factor

#

like backface and frustum

#

also im pretty sure nanite uses no index buffer

#

writing it is just too slow

wicked notch Apr 19, 2024, 10:20 PM

#

Jasmine doesn't either

#

they use a regular non indexed draw

glass sphinx Apr 19, 2024, 10:20 PM

#

what is the write index buffer doing then

primal shadow Apr 19, 2024, 10:21 PM

#

glass sphinx pes + vpc probably means hw culling is the limiting factor

I swear I turned off backface culling in my pipeline descriptor... I'll have to check

frank sail Apr 19, 2024, 10:21 PM

#

vpc = viewport culling

#

turning off back face culling won't help there

primal shadow Apr 19, 2024, 10:21 PM

#

glass sphinx also im pretty sure nanite uses no index buffer

In my testing using a whole indirect draw args per cluster was wayyy worse. Maybe it's better when almost all of your clusters are software rasterized though...

glass sphinx Apr 19, 2024, 10:22 PM

#

yes it is

#

idk what your index buffer is

#

writing out the meshlets should be at most 1/10th of the draw time

primal shadow Apr 19, 2024, 10:23 PM

#

glass sphinx idk what your index buffer is

1 u32 (cluster id + triangle id) per triangle, so that the vertex shader knows what to draw
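
For illustration, packing a cluster id and a triangle id into one 32-bit value could look like the sketch below. The 6-bit triangle field (up to 64 triangles per cluster) and the names `pack` and `unpack` are assumptions made for this example, not taken from the shader linked above.

```python
TRIANGLE_BITS = 6  # assumption: at most 64 triangles per cluster
TRIANGLE_MASK = (1 << TRIANGLE_BITS) - 1


def pack(cluster_id, triangle_id):
    """Pack (cluster id, triangle id) into a single 32-bit value."""
    assert 0 <= triangle_id <= TRIANGLE_MASK
    assert 0 <= cluster_id < (1 << (32 - TRIANGLE_BITS))
    return (cluster_id << TRIANGLE_BITS) | triangle_id


def unpack(packed):
    """Recover (cluster id, triangle id) from a packed 32-bit value."""
    return packed >> TRIANGLE_BITS, packed & TRIANGLE_MASK


packed = pack(5, 3)
```

With this split the cluster id gets the remaining 26 bits, which caps the scene at 2^26 instanced clusters per draw.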

glass sphinx Apr 19, 2024, 10:24 PM

#

so your index buffer is really a primitive buffer, i see

#

hmmm interesting that that's so slow

wicked notch Apr 19, 2024, 10:24 PM

#

ye there should be no reason to be so slow

#

send the trace, I am curious

glass sphinx Apr 19, 2024, 10:25 PM

#

yes

#

ah btw

#

remove the wg barrier

#

the atomic will be optimized by the hw if you are on nv

#

the barrier will most likely make it slower

primal shadow Apr 19, 2024, 10:26 PM

#

Will do when I'm home, 5m

wicked notch Apr 19, 2024, 10:26 PM

#

glass sphinx the atomic will be optimized by the hw if you are on nv

wot

#

how does that work

glass sphinx Apr 19, 2024, 10:27 PM

#

amd and nv both have heavy hw optimizations once you have extreme contention

#

so if all threads in a warp use the same address

#

it will catch that and do one atomic + warp prefix sum instead

#

instead of warp size n atomics

#

as soon as one thread diverges you die
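
Written out by hand, "one atomic + warp prefix sum" looks roughly like the simulation below. This is plain Python standing in for one subgroup and says nothing about what any particular GPU actually does internally; `aggregated_append` is a hypothetical name.

```python
def aggregated_append(counter, lane_counts):
    """Simulate one subgroup reserving output space with a single atomic add.

    counter is a one-element list standing in for the global atomic counter.
    lane_counts[i] is how many items lane i wants to append.
    Returns the start offset of each lane's slice.
    """
    # Exclusive prefix sum across the lanes of the subgroup.
    local_offsets = []
    running = 0
    for count in lane_counts:
        local_offsets.append(running)
        running += count

    # One elected lane performs the atomic add for the whole subgroup...
    base = counter[0]
    counter[0] += running

    # ...and the returned base is broadcast back to every lane.
    return [base + offset for offset in local_offsets]


counter = [10]
offsets = aggregated_append(counter, [2, 0, 3, 1])
```

The point is that the global counter is touched once per subgroup instead of once per lane, while each lane still ends up with its own non-overlapping slice.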

wicked notch Apr 19, 2024, 10:28 PM

#

in that case there is only one thread in the wg doing the atomic tho

glass sphinx Apr 19, 2024, 10:28 PM

#

yea i believe this will be slower

#

the gain is lower than the cost of the barrier

#

so what im suggesting is doing the atomic in all threads and let the hw catch it and make it fast

#

this way you don't pay for the barrier

#

maybe that will make no difference tho

#

it can make a big difference

#

but its hit or miss

glass sphinx Apr 19, 2024, 10:31 PM

#

primal shadow Yes. Here's a breakdown of problems: * The two green vkCmdCopyBuffer()s (4ms eac...

what i did is write out a bitmask per cluster instead and early out in the vertex shader instead

#

this way you write way less

#

that was fast af

wicked notch Apr 19, 2024, 10:33 PM

#

so what you're saying is that the driver just does this

```cpp
const uint local_offset = subgroupExclusiveAdd(meshlet_primitive_count);
uint global_offset = 0;
if (subgroupElect()) {
  global_offset = atomicAdd(..., meshlet_primitive_count * 3) / 3;
}
```

frank sail Apr 19, 2024, 10:33 PM

#

shrimple as dat

glass sphinx Apr 19, 2024, 10:33 PM

#

not the driver

#

its the actual hw

wicked notch Apr 19, 2024, 10:33 PM

#

hardware accelerated subgroup memes

glass sphinx Apr 19, 2024, 10:33 PM

#

wicked notch so what you're saying is that the driver just does this ```cpp const uint local_...

yes

#

its the same with branches btw

#

if all threads have same condition only one path is taken

#

there are more instructions like this where the hw will check if all threads use the same x to go faster

frank sail Apr 19, 2024, 10:35 PM

#

the sc will perform optimizations like this btw

#

on AMD

glass sphinx Apr 19, 2024, 10:35 PM

#

sc?

frank sail Apr 19, 2024, 10:35 PM

#

the hardware also does stuff

glass sphinx Apr 19, 2024, 10:35 PM

#

shader compiler

wicked notch Apr 19, 2024, 10:35 PM

#

imagine trusting any compiler for glsl

glass sphinx Apr 19, 2024, 10:35 PM

#

yes

#

but for runtime data its nicer to have the hw do it

frank sail Apr 19, 2024, 10:35 PM

#

wicked notch imagine trusting any compiler for glsl

it's not a glsl compiler smart

glass sphinx Apr 19, 2024, 10:35 PM

#

so it can do it even if its unknown

primal shadow Apr 19, 2024, 10:36 PM

#

Ok I'm back, let me see

primal shadow Apr 19, 2024, 10:38 PM

#

glass sphinx what i did is write out a bitmask per cluster instead and early out in the verte...

This is in reference to what?

glass sphinx Apr 19, 2024, 10:39 PM

#

the mask masks tris in a meshlet

#

so one bit is one triangle

#

i had a limit of up to 64 tris per cluster so it was just 2 uints
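
A small sketch of that per-cluster triangle mask, assuming two 32-bit words with bit `i` meaning "triangle `i` is visible". The helper names are hypothetical and this is Python for illustration, not shader code.

```python
def set_visible(mask_words, triangle_id):
    """Mark a triangle as visible in a 64-bit mask stored as two 32-bit words."""
    mask_words[triangle_id // 32] |= 1 << (triangle_id % 32)


def is_visible(mask_words, triangle_id):
    """Test whether a triangle's bit is set."""
    return (mask_words[triangle_id // 32] >> (triangle_id % 32)) & 1 == 1


mask = [0, 0]  # two words cover 64 triangles
for tri in (0, 33, 63):
    set_visible(mask, tri)
```

The culling pass then writes 8 bytes per cluster instead of one value per surviving triangle, and the vertex stage checks the bit to decide whether to emit a degenerate vertex.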

primal shadow Apr 19, 2024, 10:39 PM

#

glass sphinx remove the wg barrier

Here?

    // Reserve space in the buffer for this meshlet's triangles, and broadcast the start of that slice to all threads
    if triangle_id == 0u {
        draw_index_buffer_start_workgroup = atomicAdd(&draw_indirect_args.vertex_count, meshlet.triangle_count * 3u);
        draw_index_buffer_start_workgroup /= 3u;
    }
    workgroupBarrier();

How is removing it safe? Other threads may read the value before thread 0 writes it no?

wicked notch Apr 19, 2024, 10:39 PM

#

glass sphinx i had a limit of up to 64 tris per cluster so it was just 2 uints

did you try the prefix sum method btw

#

I wanted to try it but I then just hopped on the mesh shader train KEKW

glass sphinx Apr 19, 2024, 10:40 PM

#

i did it yes

#

i did all methods that i know

wicked notch Apr 19, 2024, 10:40 PM

#

and the tri mask was the best?

glass sphinx Apr 19, 2024, 10:40 PM

#

on a 4080

wicked notch Apr 19, 2024, 10:40 PM

#

well

#

ye I would imagine blasting 100 million vertices for a 4080 is just another tuesday KEKW

glass sphinx Apr 19, 2024, 10:41 PM

#

well, on a 1080ti it was kinda the same

wicked notch Apr 19, 2024, 10:41 PM

#

hmm interesting

#

you just write invalid vertices if the tri is masked right?

glass sphinx Apr 19, 2024, 10:41 PM

#

yea

#

i just send them to (-2, -2, -2)

primal shadow Apr 19, 2024, 10:42 PM

#

glass sphinx i had a limit of up to 64 tris per cluster so it was just 2 uints

So instead of writing a buffer of cluster|triangle IDs so each vertex invocation knows what data to fetch, you wrote out a list of just cluster IDs (1 per cluster), and then hardcoded the draw size to 64 * clusters? And then each vertex can find its cluster via vertex_id % 64, and then you just output NaN for excess triangles?

glass sphinx Apr 19, 2024, 10:42 PM

#

i should have used nan as amd fast-paths those

glass sphinx Apr 19, 2024, 10:42 PM

#

primal shadow So instead of writing a buffer of cluster|triangle IDs so each vertex invocation...

yes
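
One way the index math for such a fixed-size draw could work, as a sketch: it assumes a non-indexed triangle list with 3 vertices per triangle and 64 triangle slots reserved per cluster. `decode_vertex` is a hypothetical helper, not anyone's actual shader.

```python
TRIANGLES_PER_CLUSTER = 64  # fixed number of slots reserved per cluster
VERTICES_PER_TRIANGLE = 3


def decode_vertex(vertex_index):
    """Split a flat vertex index into (cluster slot, triangle, corner)."""
    triangle_index = vertex_index // VERTICES_PER_TRIANGLE
    corner = vertex_index % VERTICES_PER_TRIANGLE
    cluster_slot = triangle_index // TRIANGLES_PER_CLUSTER
    triangle = triangle_index % TRIANGLES_PER_CLUSTER
    return cluster_slot, triangle, corner


def should_emit(triangle, triangle_count):
    """Slots past the cluster's real triangle count get a degenerate vertex."""
    return triangle < triangle_count


first_of_second_cluster = decode_vertex(192)
```

The draw is then launched with `64 * 3 * cluster_count` vertices, and every invocation whose triangle slot is unused or masked out pays for a vertex shader launch that produces nothing, which is the extra cost mentioned in the replies that follow.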

primal shadow Apr 19, 2024, 10:42 PM

#

I tried that. It was slow...

glass sphinx Apr 19, 2024, 10:42 PM

#

what was slow

primal shadow Apr 19, 2024, 10:42 PM

#

The draws. All the extra vertex invocations were expensive.

glass sphinx Apr 19, 2024, 10:42 PM

#

yea but now your compute is slow

primal shadow Apr 19, 2024, 10:43 PM

#

hrmm maybe it's worth a second test...
