Iris - A Journey through OpenGL and beyond to learn Graphics | Graphics Programming | Page 12

wicked notch Oct 9, 2023, 8:29 PM

#

```
for view in views
  cull(); // uses the previous frame's HZB
  draw();
mark_pages();
make_hzb();
```

delicate rain Oct 9, 2023, 8:30 PM

#

oh you mean latency in the hzb?

wicked notch Oct 9, 2023, 8:30 PM

#

yeah

#

actually

#

I'm stupid

#

we have the frame of latency regardless

#

epic

delicate rain Oct 9, 2023, 8:31 PM

#

It probably won't matter much anyways

wicked notch Oct 9, 2023, 8:31 PM

#

yeah

#

alright

#

time to write

delicate rain Oct 9, 2023, 8:31 PM

#

my man is speedrunning this

wicked notch Oct 9, 2023, 8:32 PM

#

I am

#

it's my life

#

it's now or never

#

I ain't gonna live forever 🎶

delicate rain Oct 9, 2023, 8:32 PM

#

https://tenor.com/view/wunga-katawunga-respect-respekt-gif-18341871


wicked notch Oct 9, 2023, 8:33 PM

#

(it's a reference to Bon Jovi's Livin' on a Prayer song, in case someone doesn't catch that)

frank sail Oct 9, 2023, 8:34 PM

#

I thought it was a reference to another song but ye

wicked notch Oct 9, 2023, 9:12 PM

#

```
vec4 project_screen_aabb(in aabb_t aabb, in mat4 transform, in mat4 proj_view) {
    const vec3[] corners = vec3[](
        vec3(aabb.min.x, aabb.min.y, aabb.min.z),
        vec3(aabb.min.x, aabb.min.y, aabb.max.z),
        vec3(aabb.min.x, aabb.max.y, aabb.min.z),
        vec3(aabb.min.x, aabb.max.y, aabb.max.z),
        vec3(aabb.max.x, aabb.min.y, aabb.min.z),
        vec3(aabb.max.x, aabb.min.y, aabb.max.z),
        vec3(aabb.max.x, aabb.max.y, aabb.min.z),
        vec3(aabb.max.x, aabb.max.y, aabb.max.z)
    );
    vec2 min_xy = vec2(1.0);
    vec2 max_xy = vec2(0.0);
    for (uint i = 0; i < 8; ++i) {
        const vec4 clip = proj_view * transform * vec4(corners[i], 1.0);
        const vec2 ndc = clamp(clip.xy / clip.w, -1.0, 1.0);
        const vec2 uv = ndc * vec2(0.5, -0.5) + 0.5;
        min_xy = min(min_xy, uv);
        max_xy = max(max_xy, uv);
    }
    return vec4(min_xy, max_xy);
}

bool is_meshlet_visible(in vec4 box_uvs) {
    const vec2 min_xy = box_uvs.xy;
    const vec2 max_xy = box_uvs.zw;
    const vec2 hzb_size = vec2(imageSize(u_vsm_hzb));
    const float max_mip = 1 + floor(log2(max(hzb_size.x, hzb_size.y)));
    const float width = (max_xy.x - min_xy.x) * hzb_size.x;
    const float height = (max_xy.y - min_xy.y) * hzb_size.y;
    const float mip = clamp(ceil(log2(max(width, height))), 0, max_mip);
    const bvec4 samples = bvec4(
        bool(texelFetch(u_vsm_hzb, ivec2(box_uvs.xy * hzb_size), int(mip)).x),
        bool(texelFetch(u_vsm_hzb, ivec2(box_uvs.zy * hzb_size), int(mip)).x),
        bool(texelFetch(u_vsm_hzb, ivec2(box_uvs.xw * hzb_size), int(mip)).x),
        bool(texelFetch(u_vsm_hzb, ivec2(box_uvs.zw * hzb_size), int(mip)).x)
    );
    return any(samples);
}
```
this be kinda weird
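
Editor's sketch, not part of the original thread: one possible source of the weirdness is that `texelFetch` takes integer texel coordinates within the selected mip level, whereas the snippet scales the UVs by the mip 0 size. Also, for a full mip chain the highest valid index is `log2(size)`, one less than the `1 + floor(log2(...))` used as the clamp bound. The C++ below only mirrors that arithmetic on the CPU. It assumes a square, power-of-two HZB, and the names `Texel`, `HzbLookup` and `pick_hzb_lookup` are made up for illustration rather than taken from the renderer.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper types for illustration only.
struct Texel { int x, y; };
struct HzbLookup { int mip; Texel min_texel; Texel max_texel; };

// min_*/max_* describe the projected box in [0, 1] UV space.
// hzb_size is the width of mip 0, assumed square and a power of two.
inline HzbLookup pick_hzb_lookup(float min_u, float min_v, float max_u, float max_v, int hzb_size) {
    assert(hzb_size > 0 && (hzb_size & (hzb_size - 1)) == 0);

    // Highest valid mip index: a 128 wide texture has mips 0..7.
    int max_mip = 0;
    while ((hzb_size >> (max_mip + 1)) > 0) ++max_mip;

    // Box extent in mip 0 texels, at least one texel so log2 stays finite.
    const float width = (max_u - min_u) * static_cast<float>(hzb_size);
    const float height = (max_v - min_v) * static_cast<float>(hzb_size);
    const float extent = std::max(std::max(width, height), 1.0f);
    const int mip = std::clamp(static_cast<int>(std::ceil(std::log2(extent))), 0, max_mip);

    // Integer texel coordinates are scaled by the size of the chosen mip, not mip 0.
    const int mip_size = std::max(hzb_size >> mip, 1);
    const auto to_texel = [mip_size](float u, float v) {
        return Texel{
            std::clamp(static_cast<int>(u * static_cast<float>(mip_size)), 0, mip_size - 1),
            std::clamp(static_cast<int>(v * static_cast<float>(mip_size)), 0, mip_size - 1)};
    };
    return HzbLookup{mip, to_texel(min_u, min_v), to_texel(max_u, max_v)};
}
```

The four corner fetches in the shader would then use `min_texel` and `max_texel` at `mip`, which keeps every fetch inside the bounds of that level.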

wicked notch Oct 10, 2023, 12:42 AM

#

583 device lost and counting

frank sail Oct 10, 2023, 1:11 AM

#

BORN TO TDR / GPU IS A FUCK / Crash Em All 1989 / I am trash man / 410,757,864,530 LOST DEVICES

wispy spear Oct 10, 2023, 1:28 AM

#

lol

wicked notch Oct 10, 2023, 10:15 AM

#

so first culling attempt

#

still takes a fuckton of time to rasterize

#

the regular raster path takes 47 microseconds (without culling lol), VSM raster takes 1ms (with culling)

#

And the VSM hardware raster shader is sus

#

But I don't see anything extremely slow with it, except the likely divergence lol

#

How loading gl_FragCoord causes a short scoreboard dependency is beyond me

frank sail Oct 10, 2023, 10:23 AM

#

can you show PTX in the right side

#

SPIR-V is very fake

wicked notch Oct 10, 2023, 10:24 AM

#

hmm I don't see how to show it

#

frank sail Oct 10, 2023, 10:27 AM

#

rip

#

I forgor you can't see real assembly

wicked notch Oct 10, 2023, 10:27 AM

#

can AMD hardware help here

#

maybe show how AMD compiles this shader

frank sail Oct 10, 2023, 10:30 AM

#

download RGP

wicked notch Oct 10, 2023, 10:39 AM

#

there you go AMD man

frank sail Oct 10, 2023, 10:45 AM

#

I guess gl_Position is stored in four SGPRs which have almost no latency to load

#

so uh I guess that won't cause a stall on AMD

#

idk what NV is doing, but it's probably similarish (they don't have SGPRs, but gl_Position should come from some low-latency memory somewhere)

wicked notch Oct 10, 2023, 10:47 AM

#

so how fix

frank sail Oct 10, 2023, 10:50 AM

#

ok actually that s_load_b128 is loading gl_Position into four SGPRs from memory somewhere (s[0:1] is two SGPRs holding an address)

frank sail Oct 10, 2023, 10:52 AM

#

wicked notch so how fix

idk

#

I'm guessing the source mapping is just misinfo

wicked notch Oct 10, 2023, 10:52 AM

#

epic

#

caching time then

#

"The best rasterization technique is no rasterization"
~Sun Tzu

wicked notch Oct 10, 2023, 11:15 AM

#

wicked notch so first culling attempt

another sus thing is the PROP util

#

how is PROP the top sol when it handles early-late Z/depth testing and blending

#

all of which are disabled in the VSM pipeline bleakekw

frank sail Oct 10, 2023, 11:17 AM

#

the tools are breaking down on us brother

#

we must finish this journey alone

wicked notch Oct 10, 2023, 11:18 AM

#

I'm so conchfused rn bleakekw

frank sail Oct 10, 2023, 11:19 AM

#

if it makes you feel any better, my hzb is broken and idk why

glass sphinx Oct 10, 2023, 11:33 AM

#

Btw since turing nvidia has uniform registers, which are nearly identical to sgprs on amd afaik

#

they are also integer only

frank sail Oct 10, 2023, 11:33 AM

#

wdym integer only

#

like they store unformatted data?

glass sphinx Oct 10, 2023, 11:34 AM

#

the sgpr operations can not operate on floating point

frank sail Oct 10, 2023, 11:34 AM

#

well that's all registers

glass sphinx Oct 10, 2023, 11:34 AM

#

i guess they can load whatever

frank sail Oct 10, 2023, 11:34 AM

#

ok

#

so you have to load them into vector registers to do floating point meth

glass sphinx Oct 10, 2023, 11:34 AM

#

yea thats the same on amd

#

i had to learn that the hard way at work

#

adding one mul made the vgpr use explode by 12

#

🥸 sometimes compiler optimizations bite my ass when they suddenly get turned off by some bingus

frank sail Oct 10, 2023, 11:35 AM

#

glass sphinx yea thats the same on amd

Scalar ALU (SALU) instructions operate on values that are common to all work-items in the wave. These
operations consist of 32-bit integer or float arithmetic, and 32- or 64-bit bit-wise operations. The SALU also can
perform operations directly on the Program Counter, allowing the program to create a call stack in SGPRs.
Many operations also set the Scalar Condition Code bit (SCC) to indicate the result of a comparison, a carry-out,
or whether the instruction result was zero.

glass sphinx Oct 10, 2023, 11:36 AM

#

i can assure you amd can not do floating point ops with the scalar alu

glass sphinx Oct 10, 2023, 11:36 AM

#

frank sail > Scalar ALU (SALU) instructions operate on values that are common to all work-i...

where is this from?

frank sail Oct 10, 2023, 11:36 AM

#

the official RDNA 3 ISA guide

#

you may have been looking at RDNA 1 or 2

glass sphinx Oct 10, 2023, 11:37 AM

#

afaik rdna 3 also doesnt have it yet

#

hmm

frank sail Oct 10, 2023, 11:37 AM

#

what makes you say that?

glass sphinx Oct 10, 2023, 11:37 AM

#

i believe i searched for it in the isa before

#

how are they named

#

i cant find any s_XXX_f32 instructions

#

I cant find a single scalar f32 instruction 😦

frank sail Oct 10, 2023, 11:40 AM

#

I guess my snippet is misinfo because neither can I

glass sphinx Oct 10, 2023, 11:40 AM

#

Scalar ALU (SALU) instructions operate on values that are common to all work-items in the wave. These
operations consist of 32-bit integer or float arithmetic, and 32- or 64-bit bit-wise operations. The SALU also can
This is really specific on saying float arithmetic works huuuuuuuuh

#

i guess its just very unfortunate wording

frank sail Oct 10, 2023, 11:41 AM

#

the registers can be used for VALU ops but ye

glass sphinx Oct 10, 2023, 11:41 AM

#

mhmm

#

but there are rumors that rdna 4 gets float support on the salu

#

that would be sick

wicked notch Oct 10, 2023, 12:00 PM

#

conchfusion levels are reaching heights I didn't think were possible

#

@frank sail if you disable caching in your thing, what does the GPU trace look like

frank sail Oct 10, 2023, 12:01 PM

#

99% vsm drawing 1% everything else

#

ok let me give you better info

wicked notch Oct 10, 2023, 12:03 PM

#

ye can you show the trace

frank sail Oct 10, 2023, 12:06 PM

#

oops one sec

#

got distracted with manmade horrors in #questions

#

~~hold up imma delete that pic~~

#

better crop + relevant pass is selected so the counters are accurate

wicked notch Oct 10, 2023, 12:09 PM

#

Right yeah this looks like a healthy trace

#

pixel warps very high and top SOL = SM

wicked notch Oct 10, 2023, 12:10 PM

#

wicked notch so first culling attempt

but this is completely bogus

frank sail Oct 10, 2023, 12:10 PM

#

hmm

#

I'm drawing every vsm btw

wicked notch Oct 10, 2023, 12:13 PM

#

yeah same

#

#

let me get the simple scene

frank sail Oct 10, 2023, 12:15 PM

#

I gotta schleep but I'll talk l9er

frank sail Oct 10, 2023, 12:20 PM

#

wicked notch let me get the simple scene

Ye btw I was testing on this

#

Which I guess has low geometry complexity but whatever

#

I wonder if low geometry density is the Achilles' heel of this since there will be too much overdraw

#

Not overdraw but rather fs invocations

wicked notch Oct 10, 2023, 12:26 PM

#

baffling to say the least

#

is it mesh shaders?

#

somehow mesh shaders are garbage for this?

#

I dunno

#

let me try something

frank sail Oct 10, 2023, 12:28 PM

#

Look at what the active warps are

#

For me it was 99.9% PS warps

wicked notch Oct 10, 2023, 12:29 PM

#

For me it's 30% PS warps with an occasional spike to 60%

#

frank sail Oct 10, 2023, 12:30 PM

#

Btw I hypothesize that reducing the VSM resolution could improve perf as we'll have far fewer fragments for geometry hanging off the edge of the visible bounds

#

As we've learned, only a small fraction of the VSM is in use at a time which makes it a viable change

frank sail Oct 10, 2023, 12:32 PM

#

wicked notch

Did u make sure to select that pass so we're looking at the right counters

#

If so, then those are spooky numbers, Mason

wicked notch Oct 10, 2023, 12:32 PM

#

uhh one sec

#

there it is

#

spooky numbers indeed

raven orchid Oct 10, 2023, 12:34 PM

#

frank sail Btw I hypothesize that reducing the VSM resolution could improve perf as we'll h...

Are you thinking just halve the virtual res?

#

The other option would bring back readback

frank sail Oct 10, 2023, 12:35 PM

#

Yeah or even smaller if possible

#

True bleakekw

raven orchid Oct 10, 2023, 12:35 PM

#

I’m using 8k personally

wicked notch Oct 10, 2023, 12:41 PM

#

I'll let my brain machine work

#

you go schlepp jaker, you don't have to suffer with me bleakekw

frank sail Oct 10, 2023, 12:42 PM

#

Hold on I just had an unhinged idea

#

Are 8192^2 textures a thing that we can make

#

We could make an 8 bit stencil texture and use that for early-stencil

wicked notch Oct 10, 2023, 12:43 PM

#

that is so unhinged I have no idea how that works 💀

raven orchid Oct 10, 2023, 12:43 PM

#

Oh dang that brings back week 1 vsm memories

frank sail Oct 10, 2023, 12:44 PM

#

wicked notch that is so unhinged I have no idea how that works 💀

We would have to populate the texture to indicate where dirty pages are

#

But the beauty of it is that you just need one total, as long as you're okay with

```
foreach view:
  CullMeshlets(view);
  Draw(view);
```

wicked notch Oct 10, 2023, 12:46 PM

#

why is a 8192 texture required tho

frank sail Oct 10, 2023, 12:46 PM

#

Now it's a question of whether the early-s will actually save any perf

frank sail Oct 10, 2023, 12:46 PM

#

wicked notch why is a 8192 texture required tho

Because we can't make a 16k^2 texture

wicked notch Oct 10, 2023, 12:46 PM

#

I severely doubt that

raven orchid Oct 10, 2023, 12:47 PM

#

We asked froyo that I think

#

Answer was…. Crap

#

Something

frank sail Oct 10, 2023, 12:47 PM

#

Hmm

raven orchid Oct 10, 2023, 12:47 PM

#

frank sail Because we can't make a 16k^2 texture

My potato gpu let me

frank sail Oct 10, 2023, 12:47 PM

#

I'm bottlenecked by PS invocations so it'd be nice to skip a bunch of them

#

Not as good as viewport culling, but it's perhaps something

raven orchid Oct 10, 2023, 12:48 PM

#

Can you resize a viewport using the gpu

frank sail Oct 10, 2023, 12:49 PM

#

raven orchid My potato gpu let me

At some point I'd guess that filling out the bigass texture will take a lot of time

frank sail Oct 10, 2023, 12:49 PM

#

raven orchid Can you resize a viewport using the gpu

No 😢

raven orchid Oct 10, 2023, 12:49 PM

#

Crap

frank sail Oct 10, 2023, 12:51 PM

#

Device generated commands

For each draw in a sequence, the following can be specified:

a different shader group

a number of vertex buffer bindings

a different index buffer, with an optional dynamic offset and index type

a number of different push constants

a flag that encodes the primitive winding

#

Rip formatting

raven orchid Oct 10, 2023, 12:52 PM

#

Hmmm

#

So this issue

#

Is it that too many meshlets are spilling into non dirty or non resident pages so there is wasted work?

frank sail Oct 10, 2023, 12:53 PM

#

Setting the front face from the gpu seems useless

frank sail Oct 10, 2023, 12:53 PM

#

raven orchid Is it that too many meshlets are spilling into non dirty or non resident pages s...

Ye that's the theory

#

It should only be bad for large meshlets, however

wicked notch Oct 10, 2023, 12:54 PM

#

We both use 64/64 ultra small meshlets sir

frank sail Oct 10, 2023, 12:54 PM

#

I mean physically lol

wicked notch Oct 10, 2023, 12:54 PM

#

Hmmm

frank sail Oct 10, 2023, 12:54 PM

#

Low geometry density is the problem

wicked notch Oct 10, 2023, 12:54 PM

#

oh yeah

#

yes

#

the solution is obviously subdividing the geometry

frank sail Oct 10, 2023, 12:55 PM

#

We need nanite 1 tri/px

wicked notch Oct 10, 2023, 12:55 PM

#

frank sail We need nanite 1 tri/px

Oh yeah, that's probably why it works so well for them

#

damn the small indie company Epic Games really did think it through

frank sail Oct 10, 2023, 12:56 PM

#

Other solutions

per triangle culling
per triangle clipping against min/max
sw rast

raven orchid Oct 10, 2023, 12:56 PM

#

So we just need to take a small detour and implement full nanite

#

Then our vsms are done

wicked notch Oct 10, 2023, 12:56 PM

#

well that was in the todo list for me KEKW

#

I "just" need to figure out graph partitioning

frank sail Oct 10, 2023, 12:56 PM

#

sw rast might actually be viable here

#

I can just lift lvstri's impl frogeheart

raven orchid Oct 10, 2023, 12:57 PM

#

Also others:

readback to modify viewport but 1-3 frames behind
only render 1/4 of each clip per frame

#

Wait so what kind of speed up did hzb give?

wicked notch Oct 10, 2023, 12:58 PM

#

My ultra naive shrimple HZB did a pretty good job at removing wasted mesh shader invocations

#

but well

#

it's still ultra bad

frank sail Oct 10, 2023, 12:59 PM

#

raven orchid Wait so what kind of speed up did hzb give?

It's currently broken so I can't say

#

But preliminary results say: beeg

wicked notch Oct 10, 2023, 12:59 PM

#

ye it's overall beeg

#

but not enough™️

raven orchid Oct 10, 2023, 1:00 PM

#

Other option is like

frank sail Oct 10, 2023, 1:00 PM

#

Btw another solution

increase the size of the smallest VSMs so meshlets aren't big compared to them

raven orchid Oct 10, 2023, 1:00 PM

#

How tiny do we need the base clip map

#

Oh yeah that’s where I was going right now lol

#

My confession is I never wanted vsm quality shadows

#

I just wanted csm but with sparse caching

wicked notch Oct 10, 2023, 1:01 PM

#

heresy

frank sail Oct 10, 2023, 1:01 PM

#

Have you heard of parallax corrected cached shadows

raven orchid Oct 10, 2023, 1:01 PM

#

So I toned down my base clip map lol

#

Sounds familiar

frank sail Oct 10, 2023, 1:03 PM

#

Tldr solution for the sun rotation with cached shadows

#

You can still have it rotate in huge quantized steps, but you do a ray marching hack to find where the shadow "would be" if the sun was at another angle

raven orchid Oct 10, 2023, 1:04 PM

#

Does that basically let you reuse old cached data?

frank sail Oct 10, 2023, 1:04 PM

#

So you can get the appearance of smooth rotation while rendering the shadows very infrequently

raven orchid Oct 10, 2023, 1:04 PM

#

Oh wow that actually sounds badly needed

frank sail Oct 10, 2023, 1:04 PM

#

It's only in GPU Zen 2

#

But you can Google it and find the online bits

frank sail Oct 10, 2023, 1:05 PM

#

raven orchid Oh wow that actually sounds badly needed

Idk how it works with dynamic geometry unfortunately

#

I think far cry 5 (the game that uses them iirc) only did the parallax thing for their long distance adaptive shadows

#

They did regular csm up to like 30m

raven orchid Oct 10, 2023, 1:08 PM

#

I still think vsm has the potential to replace csm fully

wicked notch Oct 10, 2023, 1:10 PM

#

caching + good culling has potential

#

but caching + good culling + 1 tri/px should be best

#

you just need a GPU capable of rasterizing SCREEN_RESOLUTION triangles N times bleakekw

raven orchid Oct 10, 2023, 1:12 PM

#

Caching + good culling + viewport resize + temporal shadow update budget + parallax corrected + stencil rejection (maybe???)

wicked notch Oct 10, 2023, 1:13 PM

#

extremely tedious

raven orchid Oct 10, 2023, 1:13 PM

#

And yeah if you just go full nanite that’ll be the best

wicked notch Oct 10, 2023, 1:13 PM

#

but possible KEKW

frank sail Oct 10, 2023, 1:13 PM

#

wicked notch you just need a GPU capable of rasterizing SCREEN_RESOLUTION triangles N times <...

Oh ye the other part is that nanite can make these be 1px in light space

#

I'm working with 1 resolution of triangles

wicked notch Oct 10, 2023, 1:14 PM

#

ye they do LOD per clipmap

#

and then do SSS to fix discrepancies

frank sail Oct 10, 2023, 1:14 PM

#

Tf

frank sail Oct 10, 2023, 1:15 PM

#

raven orchid Caching + good culling + viewport resize + temporal shadow update budget + paral...

This is how shadow map enjoyers cope

wicked notch Oct 10, 2023, 1:16 PM

#

yes I'm a SHADOWMAP enjoyer
S help
H orror
A re
D neverending
O this is a legitimate call for help
W
M
A
P

raven orchid Oct 10, 2023, 1:17 PM

#

raven orchid So I toned down my base clip map lol

Also I should mention I did this but

#

Shadow quality is still way better than csm for the same perf

#

So despite the nightmare fuel I think vsm is still worth it

frank sail Oct 10, 2023, 1:18 PM

#

God it better be

#

raven orchid Oct 10, 2023, 1:19 PM

#

4 cascades at 8k with csm killed my gpu

#

But vsm it actually works lol

wicked notch Oct 10, 2023, 1:19 PM

#

frank sail Oct 10, 2023, 1:20 PM

#

raven orchid But vsm it actually works lol

So basically treating VSM as a nuclear bomb that makes CSM problems disappear

raven orchid Oct 10, 2023, 1:20 PM

#

That’s how I’m using it lmao

#

Like my shadow bias is now 0

frank sail Oct 10, 2023, 1:20 PM

#

Tfw no 30 clipmaps at 16k res 😔

raven orchid Oct 10, 2023, 1:20 PM

#

Consistent across all scenes

#

Actually I can do more than 4 and perf stays the same

#

Probably 10 would be doable

frank sail Oct 10, 2023, 1:21 PM

#

raven orchid Like my shadow bias is now 0

Literally 0? Because idk how that'd work

raven orchid Oct 10, 2023, 1:21 PM

#

Well I still do normal vector offsetting during sampling

frank sail Oct 10, 2023, 1:21 PM

#

O

raven orchid Oct 10, 2023, 1:21 PM

#

But slope biasing I removed

frank sail Oct 10, 2023, 1:22 PM

#

You still need a slope bias hmm

#

Or do you mean polygon offset is gone

raven orchid Oct 10, 2023, 1:22 PM

#

Yeah that one

frank sail Oct 10, 2023, 1:23 PM

#

There is a formula to calculate the exact bias needed but I couldn't get it to work so I just added more constants bleakekw

#

I'll get to it later lol

#

Along with proper filtering/10k lines of SMRT

#

Anyways I gotta sleep fr

wispy spear Oct 16, 2023, 11:34 PM

#

hmm

#

not sure if this is of any interest to you my frog

#

https://youtu.be/uLnxLemdVlQ?t=154

YouTube

Gamefromscratch

Stunning Fantasy Environments Bundle - 1 Week Only!

The Mini But Mighty Fantasy Unreal Engine Environments is a one week only tiny bundle of very high quality fantasy environments. While it's for Unreal Engine, as you can see in the video it's trivial to get the assets into the Godot Engine and even Blender. (Unreal has amazing built-in exporting capabilities)

Links
https://www.humblebundle.c...


delicate rain Oct 16, 2023, 11:48 PM

#

Oh the castle looks awesome

wispy spear Oct 16, 2023, 11:49 PM

#

yeah

#

thats how the castle thing in UE4 demo should have looked like xD

delicate rain Oct 16, 2023, 11:50 PM

#

Lmao that would have been insane

wispy spear Oct 16, 2023, 11:52 PM

#

when you watch it now, its quite underwhelming somehow

delicate rain Oct 16, 2023, 11:53 PM

#

Really? Those are some standards you have mister

wispy spear Oct 16, 2023, 11:58 PM

#

it was stunning back then

wicked notch Oct 23, 2023, 11:25 AM

#

ok so

#

I have been stumped on the great PROP incident for the past week

#

I have zero clue as to what to do besides implement caching

#

but that just sounds like a bandaid to some ultra weird underlying problem

frank sail Oct 23, 2023, 11:27 AM

#

You can always pause this and begin caching and hzb

#

More like h"z"b since there isn't a z

wicked notch Oct 23, 2023, 11:28 AM

#

yeah I figure that's all I can do

#

but caching still scares me

#

whatever Saky did was terrifying bleakekw

frank sail Oct 23, 2023, 11:28 AM

#

Ah

#

I think you can do something shrimpler, like what I did

#

(which still took me weeks to implement because smooth brain)

wicked notch Oct 23, 2023, 11:29 AM

#

ye perchance

frank sail Oct 23, 2023, 11:29 AM

#

first step is getting the stable addresses

#

So you can get sharp shadows everywhere, but while keeping the page addresses the same

#

Second step is making sure pages are marked correctly (dirty pages should only be the ones that were just alloc'd this frame, pages that were already visible and continue to be visible should be untouched)

#

Third step is making the pyramid of dirty pages and culling against it like an hzb
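
Editor's sketch, not part of the original thread: the third step can be pictured as an OR-reduction pyramid over per-page dirty flags, queried with a page-space rectangle much like an HZB. The C++ below is a CPU-side illustration under assumed names (`PagePyramid`, `build_page_pyramid`, `overlaps_dirty_page`) and an assumed power-of-two page grid. It is not the code from either renderer discussed here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical "hierarchical page buffer": level 0 holds one dirty flag per page,
// each coarser level ORs a 2x2 block of the level below.
struct PagePyramid {
    std::vector<std::vector<std::uint8_t>> levels;
    std::vector<int> sizes; // pages per side at each level
};

inline PagePyramid build_page_pyramid(const std::vector<std::uint8_t>& dirty, int pages_per_side) {
    assert(pages_per_side > 0 && (pages_per_side & (pages_per_side - 1)) == 0);
    assert(dirty.size() == static_cast<std::size_t>(pages_per_side) * pages_per_side);
    PagePyramid p;
    p.levels.push_back(dirty);
    p.sizes.push_back(pages_per_side);
    for (int size = pages_per_side / 2; size >= 1; size /= 2) {
        const std::vector<std::uint8_t>& prev = p.levels.back();
        const int prev_size = size * 2;
        std::vector<std::uint8_t> next(static_cast<std::size_t>(size) * size, 0);
        for (int y = 0; y < size; ++y) {
            for (int x = 0; x < size; ++x) {
                const int sx = 2 * x;
                const int sy = 2 * y;
                next[y * size + x] = static_cast<std::uint8_t>(
                    prev[sy * prev_size + sx] | prev[sy * prev_size + sx + 1] |
                    prev[(sy + 1) * prev_size + sx] | prev[(sy + 1) * prev_size + sx + 1]);
            }
        }
        p.levels.push_back(std::move(next));
        p.sizes.push_back(size);
    }
    return p;
}

// Conservative test: true if the inclusive page rectangle [x0, x1] x [y0, y1]
// may touch a dirty page. It can report true for a clean rectangle at coarse
// levels, but never false for a rectangle that does contain a dirty page.
inline bool overlaps_dirty_page(const PagePyramid& p, int x0, int y0, int x1, int y1) {
    assert(!p.levels.empty());
    assert(0 <= x0 && x0 <= x1 && x1 < p.sizes[0]);
    assert(0 <= y0 && y0 <= y1 && y1 < p.sizes[0]);
    // Climb until the rectangle spans at most 2x2 entries.
    int level = 0;
    const int last = static_cast<int>(p.levels.size()) - 1;
    while (level < last &&
           (((x1 >> level) - (x0 >> level)) > 1 || ((y1 >> level) - (y0 >> level)) > 1)) {
        ++level;
    }
    const int size = p.sizes[level];
    for (int y = y0 >> level; y <= (y1 >> level); ++y) {
        for (int x = x0 >> level; x <= (x1 >> level); ++x) {
            if (p.levels[level][y * size + x] != 0) return true;
        }
    }
    return false;
}
```

A meshlet whose projected bounds only cover clean pages would be culled, since redrawing it cannot change any cached page.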

wicked notch Oct 23, 2023, 11:42 AM

#

nice n easy

#

what about the snapping

frank sail Oct 23, 2023, 11:43 AM

#

It took me like a week but it's literally 10-20 loc

#

That's part of step 1

#

It hurts my fingers to write how it works on mobile

#

This fn can be cheaply called every frame
https://github.com/JuanDiegoMontoya/Frogfood/blob/526f7a6645207abbcb05a26a1f778b3ba05580d2/src/techniques/VirtualShadowMaps.cpp#L349

#

(If there is no rotation)

#

Basically this function computes

A view matrix that is snapped to the grid to use for rendering the shadow map
An offset to apply in the fragment shader of rendering the shadow map which "corrects" the page address. This is needed because the fragment shader only knows about window space (gl_FragCoord)

#

Every other lookup into the VSM uses the stable viewproj, which is placed at the origin, and fract(uv)
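
Editor's sketch, not part of the original thread and not the code at the linked Frogfood commit: the idea described above can be approximated as snapping the light-space position to whole pages, remembering how many pages were skipped, and wrapping page coordinates the way `fract(uv)` wraps UVs. The names `SnapResult`, `snap_to_page_grid` and `wrap_page` are hypothetical, and rounding down with `floor` is an assumption.

```cpp
#include <cassert>
#include <cmath>

struct SnapResult {
    float snapped_x, snapped_y;       // light-space position snapped to the page grid
    int page_offset_x, page_offset_y; // whole pages between the stable origin and the snapped position
};

// page_world_size is the light-space extent covered by one page (assumed > 0).
inline SnapResult snap_to_page_grid(float light_x, float light_y, float page_world_size) {
    assert(page_world_size > 0.0f);
    const int px = static_cast<int>(std::floor(light_x / page_world_size));
    const int py = static_cast<int>(std::floor(light_y / page_world_size));
    return SnapResult{static_cast<float>(px) * page_world_size,
                      static_cast<float>(py) * page_world_size, px, py};
}

// Wraps a page coordinate into [0, pages_per_side), the integer analogue of fract(uv).
inline int wrap_page(int page, int pages_per_side) {
    assert(pages_per_side > 0);
    const int m = page % pages_per_side;
    return m < 0 ? m + pages_per_side : m;
}
```

In this picture the shadow pass would render with the snapped position and add the page offset to the page derived from `gl_FragCoord`, so a given world-space location keeps the same wrapped page address as the camera moves.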

wicked notch Oct 23, 2023, 11:53 AM

#

hmmmm

#

the brain's working

frank sail Oct 23, 2023, 11:53 AM

#

If I could draw a pic

#

But I'm on mobile bleakekw

raven orchid Oct 23, 2023, 11:58 AM

#

wicked notch I have been stumped on the great PROP incident for the past week

Wait which incident was this?

#

Idk what it stands for

raven orchid Oct 23, 2023, 12:00 PM

#

frank sail More like h"z"b since there isn't a z

We need to give this a special name

#

Virtual occlusion culling, voc

#

Idk

frank sail Oct 23, 2023, 12:01 PM

#

page overdraw gulling (pog)

raven orchid Oct 23, 2023, 12:02 PM

#

I like it

#

“In order to improve performance we introduce hierarchical pog”

frank sail Oct 23, 2023, 12:03 PM

#

virtual undesirables kulling (vuk)

#

Maybe just hierarchical page culling (hpc)

#

But it isn't culling pages rgbemojiwiggled

#

"Hierarchical page buffer" can't be misinterpreted probably

raven orchid Oct 23, 2023, 12:07 PM

#

Yeah true I don’t think hpb will be misinterpreted

#

Hopefully

frank sail Oct 23, 2023, 12:08 PM

#

As long as it has "hierarchical" in the name tbh

wicked notch Oct 23, 2023, 12:08 PM

#

wicked notch baffling to say the least

This is the great PROP incident btw

#

rendering VSMs for me takes a huge amount of time, the profiler just tells me "idk what it is but the PROP unit is dying"

raven orchid Oct 23, 2023, 12:08 PM

#

Oh weird

#

Full res 16k?

wicked notch Oct 23, 2023, 12:09 PM

#

16k viewport yeah

frank sail Oct 23, 2023, 12:09 PM

#

FYI mine just renders 4k now (but 16k should be okay after the new culling)

wicked notch Oct 23, 2023, 12:10 PM

#

4k?

#

As in 4k virtual shadow map?

frank sail Oct 23, 2023, 12:11 PM

#

Yeah

#

Instead of 16k

wicked notch Oct 23, 2023, 12:11 PM

#

hm

#

what happens if you use 16k

frank sail Oct 23, 2023, 12:11 PM

#

It doesn't affect correctness at all except at very high resolutions

frank sail Oct 23, 2023, 12:11 PM

#

wicked notch what happens if you use 16k

I'm not at computer to test

#

Before culling, it worked fine but with worse perf

#

Now I suspect the perf hit is much smaller

wicked notch Oct 23, 2023, 12:12 PM

#

do you have the thing 🅱️ushed

frank sail Oct 23, 2023, 12:12 PM

#

Yes, but there is still a bug

#

A 1 liner if you want to pull

#

Oh wait I didn't push that bug

#

Hmm hold on, lemme boot and push (might be a few)

wicked notch Oct 23, 2023, 12:15 PM

#

smh imagine not pushing bugs

frank sail Oct 23, 2023, 12:15 PM

#

Well I committed it, just haven't pushed kekwfroggified

frank sail Oct 23, 2023, 12:36 PM

#

@wicked notch you can pull now

#

it probably works on linux

wicked notch Oct 23, 2023, 12:43 PM

#

btw

#

you do https://github.com/JuanDiegoMontoya/Frogfood/blob/main/data/shaders/visbuffer/CullMeshlets.comp.glsl#L174 this thing

#

but since ndc coords can go beyond [-1;1] this is potentially not the actual min, is it?

#

perhaps a fract is needed?

frank sail Oct 23, 2023, 12:45 PM

#

Oh

#

Yeah there is no actual min and max, you understood right

#

They should be infinity and negative infinity to start I guess

frank sail Oct 23, 2023, 12:47 PM

#

wicked notch perhaps a `fract` is needed?

Fract isn't needed though

#

I use a repeat sampler in this case

#

Though fract could be used instead
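
Editor's sketch, not part of the original thread: when the projected coordinates are not clamped to a known range, seeding the bounds with `1.0` and `0.0` silently widens the box. Seeding with positive and negative infinity avoids that. A minimal C++ illustration, with hypothetical `Vec2`, `Box2` and `bounds_of` names:

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <vector>

struct Vec2 { float x, y; };
struct Box2 { Vec2 min, max; };

// Bounds of a point set whose coordinates may lie anywhere, not only in [0, 1].
inline Box2 bounds_of(const std::vector<Vec2>& pts) {
    assert(!pts.empty());
    constexpr float inf = std::numeric_limits<float>::infinity();
    Box2 box{{inf, inf}, {-inf, -inf}};
    for (const Vec2& p : pts) {
        box.min.x = std::min(box.min.x, p.x);
        box.min.y = std::min(box.min.y, p.y);
        box.max.x = std::max(box.max.x, p.x);
        box.max.y = std::max(box.max.y, p.y);
    }
    return box;
}
```

With a `1.0` seed, points whose x lies in `[1.5, 2.5]` would report a minimum of `1.0`. The infinity seed reports `1.5`.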

delicate rain Oct 23, 2023, 1:03 PM

#

wicked notch but since ndc coords can go beyond [-1;1] this is potentially not the actual min...

I project the camera position by the sun view projection matrix (sun is at 0,0). Then I divide the ndc position by ndc page size and take the ceil of that and that's it you got your offset

#

That is if you want to do the shrimple Jaker view

#

Not sure if I tagged the correct message 😅

wicked notch Oct 23, 2023, 3:04 PM

#

ahhh yes

#

branch

#

we all know the typical branching operation: storing a value

wicked notch Oct 23, 2023, 5:15 PM

#

guys I think I'm going insane

#

gl_FragCoord is in window space right?

#

That means if I am rasterizing a 16384^2 viewport, gl_FragCoord should be in range [0;16384) right?

delicate rain Oct 23, 2023, 5:19 PM

#

Yeah

wicked notch Oct 23, 2023, 5:21 PM

#

ok

#

Apparently I had a massive bug

#

but the thing just worked because some genius decided that 16384 / 128 = 128

delicate rain Oct 23, 2023, 5:22 PM

#

Lmao I'm always scared of changing the numbers I have in my defines

#

Because it's 50% chance of breaking every piece of math that was previously aligned
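
Editor's sketch, not part of the original thread: at 16384 texels with 128-texel pages, the page size and the number of pages per side are both 128, so code that mixes the two up can still appear to work. Deriving one constant from the other and asserting the relationship makes the coincidence explicit. The constant and function names below are hypothetical.

```cpp
#include <cassert>

// Hypothetical constants mirroring the numbers mentioned in the chat.
constexpr int kVirtualResolution = 16384; // texels per side of the virtual shadow map
constexpr int kPageSize = 128;            // texels per side of one page
constexpr int kPagesPerSide = kVirtualResolution / kPageSize;

static_assert(kVirtualResolution % kPageSize == 0, "resolution must be a whole number of pages");
// At this resolution the two values coincide, which can hide code that swaps them.
static_assert(kPagesPerSide == kPageSize, "16384 / 128 == 128");

inline int pages_per_side(int resolution, int page_size) {
    assert(page_size > 0 && resolution % page_size == 0);
    return resolution / page_size;
}

// Maps a window-space coordinate (as in gl_FragCoord, range [0, resolution)) to a page index.
inline int page_index(int frag_coord, int page_size) {
    assert(frag_coord >= 0 && page_size > 0);
    return frag_coord / page_size;
}
```

Halving the resolution to 8192 gives 64 pages per side while the page size stays 128, which is the point where any swapped constant would start to show.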

wicked notch Oct 23, 2023, 5:26 PM

#

ye that's exactly what happened

wicked notch Oct 23, 2023, 5:42 PM

#

as it turns out

#

rendering twelve 16k shadow maps is hard

#

and as it turns out again

#

halving the resolution stopped me from being PROP-bound

wicked notch Oct 23, 2023, 6:39 PM

#

Nsight crashed my computer

#

thank you nvidia

#

```
.capacity = (1 << 24) >> 4, // 2^24 meshlets, packed
```

#

I could've just written 1 << 20

#

but this is funnier

wispy spear Oct 23, 2023, 6:59 PM

#

: D

wicked notch Oct 23, 2023, 7:17 PM

#

not bad

#

albeit the culling is slightly off

wispy spear Oct 23, 2023, 7:26 PM

#

SHRTS should be SHITS

wicked notch Oct 23, 2023, 8:19 PM

#

obligatory culling failure

#

I am very much not confident in this:

```
bool project_screen_aabb(in aabb_t aabb, in mat4 transform, in mat4 proj_view, out vec4 box_uvs) {
    const vec3[] corners = vec3[](
        vec3(aabb.min.x, aabb.min.y, aabb.min.z),
        vec3(aabb.min.x, aabb.min.y, aabb.max.z),
        vec3(aabb.min.x, aabb.max.y, aabb.min.z),
        vec3(aabb.min.x, aabb.max.y, aabb.max.z),
        vec3(aabb.max.x, aabb.min.y, aabb.min.z),
        vec3(aabb.max.x, aabb.min.y, aabb.max.z),
        vec3(aabb.max.x, aabb.max.y, aabb.min.z),
        vec3(aabb.max.x, aabb.max.y, aabb.max.z)
    );
    vec2 min_xy = vec2(1.0);
    vec2 max_xy = vec2(0.0);
    for (uint i = 0; i < 8; ++i) {
        const vec4 clip = proj_view * transform * vec4(corners[i], 1.0);
        if (clip.w <= 0.0) {
            return false;
        }
        const vec2 ndc = fract(clip.xy / clip.w);
        const vec2 uv = ndc * vec2(0.5, -0.5) + 0.5;
        min_xy = min(min_xy, uv);
        max_xy = max(max_xy, uv);
    }
    box_uvs = vec4(min_xy, max_xy);
    return true;
}
```

#

specifically the const vec2 ndc = fract(clip.xy / clip.w); part
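
Editor's sketch, not part of the original thread: one reason to be wary of wrapping each corner before taking the min and max is that a box straddling a wrap boundary comes out inverted. The one-dimensional C++ illustration below compares wrapping first with wrapping last. `Interval`, `glsl_fract`, `bounds_wrap_first` and `bounds_unwrapped` are hypothetical names, and the snippet above works on NDC before remapping to UV, so this only shows the general effect.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

struct Interval { float min, max; };

// Same definition as GLSL fract: x - floor(x).
inline float glsl_fract(float v) { return v - std::floor(v); }

// Wraps every coordinate into [0, 1) and then takes the bounds.
inline Interval bounds_wrap_first(const std::vector<float>& xs) {
    assert(!xs.empty());
    Interval r{glsl_fract(xs[0]), glsl_fract(xs[0])};
    for (float x : xs) {
        r.min = std::min(r.min, glsl_fract(x));
        r.max = std::max(r.max, glsl_fract(x));
    }
    return r;
}

// Takes the bounds on the raw coordinates; wrapping is left to the lookup
// (a repeat sampler, or fract applied per fetch).
inline Interval bounds_unwrapped(const std::vector<float>& xs) {
    assert(!xs.empty());
    Interval r{xs[0], xs[0]};
    for (float x : xs) {
        r.min = std::min(r.min, x);
        r.max = std::max(r.max, x);
    }
    return r;
}
```

For corners at `0.875` and `1.125`, wrapping first yields `[0.125, 0.875]`, which covers the region the box does not occupy, while the unwrapped bounds stay `[0.875, 1.125]` with the intended width of `0.25`.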

wispy spear Oct 23, 2023, 8:39 PM

#

hmm would it help to visualize the values?

wicked notch Oct 23, 2023, 8:40 PM

#

they're kinda difficult to visualize

#

but perchance

#

actually they may not be difficult at all

wispy spear Oct 23, 2023, 8:42 PM

#

maybe with a little debug switch in there so that you can toggle that on and off from outside

distant lodge Oct 23, 2023, 8:44 PM

#

when I was having culling trouble I rendered out debug quads of the depth buffer linearized

#

so I could see what was there

#

(debug quads of the projected sphere)

wicked notch Oct 24, 2023, 9:54 AM

#

well this is..

#

a mess

#

bleakekw

frank sail Oct 24, 2023, 9:55 AM

#

what da views doin

wicked notch Oct 24, 2023, 10:03 AM

#

ight caching time

wicked notch Oct 24, 2023, 10:30 AM

#

I be wondering

#

if before caching

#

I should reimplement my nanite

#

just to see if it actually benefits

frank sail Oct 24, 2023, 10:31 AM

#

we do a little detouring

wicked notch Oct 24, 2023, 10:31 AM

#

I have all the code ready, I just have to mash it together

#

and make it work (optionally)

delicate rain Oct 24, 2023, 10:31 AM

#

LVSTRI is procrastinating on caching, I'm procrastinating on culling - I like the dynamic

frank sail Oct 24, 2023, 10:31 AM

#

uh methinks you should do the naniteisms later

wicked notch Oct 24, 2023, 10:32 AM

#

ye perchance

wicked notch Oct 24, 2023, 11:33 PM

#

man brain fog is real

#

me: where the fuck is my phone
also me: has phone in my fucking hands with the flashlight on

#

early onset dementia fr

wispy spear Oct 24, 2023, 11:35 PM

#

lol happened to me recently as well, was under the desk to fiddle cables in the darkness 😄

#

#youarenotalone

raven orchid Oct 25, 2023, 7:07 PM

#

i have a question

#

how long did it take to implement lod 0 sw rasterizer in compute for virtual geom?

wicked notch Oct 25, 2023, 7:09 PM

#

A week maybe

#

because I am dumb

#

You can find my endless struggles in this thread bleakekw

raven orchid Oct 25, 2023, 7:10 PM

#

dang that's pretty fast tho

#

what was or is performance like?

wicked notch Oct 25, 2023, 7:10 PM

#

I did only visbuf

#

but the gains were there

#

up to 3x as Unreal tested

raven orchid Oct 25, 2023, 7:11 PM

#

interesting ok

#

guess the stuff in wip has me in a weird place

frank sail Oct 25, 2023, 7:13 PM

#

It has left me feeling shaken

raven orchid Oct 25, 2023, 7:13 PM

#

shaken belief in a culling solution?

wicked notch Oct 25, 2023, 7:13 PM

#

The early Z almost made a comeback

#

rip

frank sail Oct 25, 2023, 7:14 PM

#

devsh mentioned interpolateAtOffset which is viable

wicked notch Oct 25, 2023, 7:15 PM

#

ye

#

I gotta render some quads fr

frank sail Oct 25, 2023, 7:15 PM

#

actually you'd have to write to gl_FragDepth which would kill early z on most hw anyway

#

meh

#

actually

#

yeah just render some quads hehe

wicked notch Oct 25, 2023, 7:16 PM

#

wait why would you write to gl_FragDepth

frank sail Oct 25, 2023, 7:17 PM

#

you don't

wicked notch Oct 25, 2023, 7:17 PM

#

o

frank sail Oct 25, 2023, 7:17 PM

#

I was trippin fr

wicked notch Oct 25, 2023, 7:17 PM

#

we good then?

frank sail Oct 25, 2023, 7:17 PM

#

reasonably

#

just one quad per active page to populate the depth buffer

wicked notch Oct 25, 2023, 7:18 PM

#

how do I make the quad

#

2 / page_size?

frank sail Oct 25, 2023, 7:18 PM

#

I guess

#

Actually quads are inefficient

#

Just make a point

wicked notch Oct 25, 2023, 7:19 PM

#

page sized point?

frank sail Oct 25, 2023, 7:19 PM

#

uh hold on lemme think

#

I guess a quad actually makes sense if it needs to cover more than one sample

#

Rects would be better but only NV supports those agonyfrog

#

Quads will suffice

wicked notch Oct 25, 2023, 7:21 PM

#

hmm do we need to cover more than one shrimple

#

I'm approaching this the same way as ROC

frank sail Oct 25, 2023, 7:22 PM

#

wicked notch hmm do we need to cover more than one shrimple

Yeah unless you want to shade 128² pixels in your fs bleakekw

wicked notch Oct 25, 2023, 7:22 PM

#

but hold on

#

how do I get the quad's depth?

#

unless I render the projected meshlet's AABB?

frank sail Oct 25, 2023, 7:23 PM

#

I was thinking it would be 1 for active pages and 0 for inactive pages

#

so early-z would just cull fragments in inactive pages

wicked notch Oct 25, 2023, 7:24 PM

#

Hmm that would be reasonable

frank sail Oct 25, 2023, 7:24 PM

#

when u actually render the geometry, the regular depth will be tested against that
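
A minimal CPU sketch of the 1/0 page mask idea above, assuming a LESS depth compare and a depth range of [0, 1]. The grid size and the names `make_page_mask` and `passes_depth_test` are made up for illustration; a real version would fill a depth attachment and rely on early-z instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sizes for the sketch; the thread talks about 128x128 pages.
constexpr std::size_t kPagesPerRow = 4;
constexpr std::size_t kPageSize = 128;
constexpr std::size_t kWidth = kPagesPerRow * kPageSize;

// Builds a per-texel depth mask: 1.0 inside active pages, 0.0 everywhere else.
std::vector<float> make_page_mask(const std::vector<std::uint8_t>& active_pages) {
    assert(active_pages.size() == kPagesPerRow * kPagesPerRow);
    std::vector<float> depth(kWidth * kWidth, 0.0f);
    for (std::size_t y = 0; y < kWidth; ++y) {
        for (std::size_t x = 0; x < kWidth; ++x) {
            const std::size_t page = (y / kPageSize) * kPagesPerRow + (x / kPageSize);
            if (active_pages[page] != 0) {
                depth[y * kWidth + x] = 1.0f;
            }
        }
    }
    return depth;
}

// Emulates a LESS depth test against the mask: nothing is less than 0.0,
// so every fragment that lands in an inactive page is rejected.
bool passes_depth_test(const std::vector<float>& depth, std::size_t x, std::size_t y, float fragment_depth) {
    assert(x < kWidth && y < kWidth);
    return fragment_depth < depth[y * kWidth + x];
}
```

With only page 5 active, a fragment at texel (130, 130) passes while one at (10, 10) is rejected whatever its depth is.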

wicked notch Oct 25, 2023, 7:25 PM

#

when does MSAA come into play though thonk

#

I assume when rendering the actual VSM

frank sail Oct 25, 2023, 7:25 PM

#

it doesn't, not anymore

wicked notch Oct 25, 2023, 7:26 PM

#

rip msaa

frank sail Oct 25, 2023, 7:26 PM

#

devsh mentioned that it's only for when depth has higher samples than color which is the opposite of what he proposed

#

according to the spec

wicked notch Oct 25, 2023, 7:26 PM

#

ah rip

frank sail Oct 25, 2023, 7:26 PM

#

but we can use interpolateAtOffset anyways so MSAA always sucked

wicked notch Oct 25, 2023, 7:26 PM

#

so we declaring gl_FragCoord varying?

frank sail Oct 25, 2023, 7:27 PM

#

is that real syntax

wicked notch Oct 25, 2023, 7:27 PM

#

maychance

frank sail Oct 25, 2023, 7:27 PM

#

in vec4 gl_FragCoord; or something

#

at the very least, we can declare our own window-space vs output

wicked notch Oct 25, 2023, 7:28 PM

#

ye

#

maybe that's better

frank sail Oct 25, 2023, 7:30 PM

#

also, don't we need conservative raster for this to work

#

if each fs invocation is to shade multiple pixels

wicked notch Oct 25, 2023, 7:31 PM

#

you know what

#

Imma draw a fullscreen tringle

#

and check whether I should output a depth of 1 or 0 in the frag shader KEKW

frank sail Oct 25, 2023, 7:32 PM

#

it's either that or clear+quad per active page

#

I think the latter would be faster in cases where there are few active pages

frank sail Oct 25, 2023, 7:34 PM

#

frank sail also, don't we need conservative raster for this to work

but there is still this issue for the drawing of the vsm

wicked notch Oct 25, 2023, 7:34 PM

#

hmm I don't really have a real grasp on how conservative raster works

#

The only time I read about it was for HFTS

#

Nvidia's bullshit technique for shadows KEKW

frank sail Oct 25, 2023, 7:35 PM

#

it means a fragment is generated if the triangle touches a texel rather than if it covers the center

#

it can also mean a fragment is generated if the triangle covers the entire texel, depending on the mode

#

at least AMD and NV support the ext in vulkan

#

but in GL, only NV supports it kekkedsadge

wicked notch Oct 25, 2023, 7:37 PM

#

damn rip

#

I don't really see it though (the reason for conservative raster)

frank sail Oct 25, 2023, 7:37 PM

#

time for triangle dilation geometry shader bleakekw

distant lodge Oct 25, 2023, 7:38 PM

#

High Fructose Torn Syrup

#

you also use it for voxelizing, iirc devsh mentioned the reason you'd want it in that case over msaa voxelization

#

I just forgot

frank sail Oct 25, 2023, 7:39 PM

#

wicked notch I don't really see it though (the reason for conservative raster)

imagine, in the extreme case, that you want only one fs invocation for each page

#

so you only need to fill a 128x128 depth buffer to get early z

#

a triangle could cover a significant amount of a page without actually covering the center sample, but obviously you care about all the texels in the page being written

#

but if the triangle doesn't cover the center sample, nothing gets written unless you use conservative raster

#

realistically, you'd only do like 4x4 or 8x8 pixels in the fs so the error wouldn't be so large, but it would still be less than ideal
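
To make the centre-sample problem concrete, here is a small CPU sketch (not from the thread) that compares the usual centre test with an overestimating conservative test built from edge functions. It assumes counter-clockwise triangles given in pixel units; hardware conservative raster modes differ in their details, and every name here is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec2 { float x, y; };

// Signed edge function: >= 0 when p lies on the inner side of edge a->b
// of a counter-clockwise triangle.
float edge(Vec2 a, Vec2 b, Vec2 p) {
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

// Simplified standard rule: a fragment is generated only if the pixel centre is inside.
bool covers_center(Vec2 v0, Vec2 v1, Vec2 v2, int px, int py) {
    const Vec2 c{static_cast<float>(px) + 0.5f, static_cast<float>(py) + 0.5f};
    return edge(v0, v1, c) >= 0.0f && edge(v1, v2, c) >= 0.0f && edge(v2, v0, c) >= 0.0f;
}

// Largest value the edge function reaches anywhere inside the unit pixel centred at c.
float edge_max_over_pixel(Vec2 a, Vec2 b, Vec2 c) {
    return edge(a, b, c) + 0.5f * (std::fabs(b.y - a.y) + std::fabs(b.x - a.x));
}

// Overestimating conservative rule: accept the pixel if the triangle may touch any part of it.
bool touches_pixel(Vec2 v0, Vec2 v1, Vec2 v2, int px, int py) {
    assert(edge(v0, v1, v2) > 0.0f);  // counter-clockwise, non-degenerate
    const float x0 = static_cast<float>(px);
    const float y0 = static_cast<float>(py);
    // Reject pixels outside the triangle's bounding box first.
    if (x0 + 1.0f < std::min({v0.x, v1.x, v2.x}) || x0 > std::max({v0.x, v1.x, v2.x}) ||
        y0 + 1.0f < std::min({v0.y, v1.y, v2.y}) || y0 > std::max({v0.y, v1.y, v2.y})) {
        return false;
    }
    const Vec2 c{x0 + 0.5f, y0 + 0.5f};
    return edge_max_over_pixel(v0, v1, c) >= 0.0f &&
           edge_max_over_pixel(v1, v2, c) >= 0.0f &&
           edge_max_over_pixel(v2, v0, c) >= 0.0f;
}
```

A small triangle tucked into the corner of pixel (0, 0) misses the centre, so `covers_center` rejects it, while `touches_pixel` still reports the pixel. Swap "pixel" for "page" and that is the case described above.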

wicked notch Oct 25, 2023, 7:44 PM

#

wait hold on

distant lodge Oct 25, 2023, 7:44 PM

#

why do you wanna manually dilate though, with NV at least the conservative rasterization ext is available since maxwell afaik

wicked notch Oct 25, 2023, 7:44 PM

#

I thought we were using a depth buffer as big as the VSM but 16 bit?

distant lodge Oct 25, 2023, 7:44 PM

#

(which still blows me out but what are you gonna do)

frank sail Oct 25, 2023, 7:45 PM

#

wicked notch I thought we were using a depth buffer as big as the VSM but 16 bit?

yeah that's the fallback

frank sail Oct 25, 2023, 7:45 PM

#

distant lodge why do you wanna manually dilate though, with NV at least the conservative raste...

for gpus without it lol (also that was a meme suggestion)

wicked notch Oct 25, 2023, 7:46 PM

#

I don't think anything older than maxwell should be legally allowed to run VSM

frank sail Oct 25, 2023, 7:46 PM

#

16-bit depth is ass though

wicked notch Oct 25, 2023, 7:46 PM

#

we only store a 1 or a 0 it's fine

frank sail Oct 25, 2023, 7:46 PM

#

well I guess if it's just being used to store a binary value then ye

#

lol

distant lodge Oct 25, 2023, 7:47 PM

#

why not use a stencil buffer

frank sail Oct 25, 2023, 7:47 PM

#

almost forgot that we actually have two depth buffers bleakekw

distant lodge Oct 25, 2023, 7:47 PM

#

can't those have 1 bit fidelity

#

or storage I should say

frank sail Oct 25, 2023, 7:47 PM

#

no but they can have 8 bit storage

#

I'm kinda lukewarm on the whole idea tbh

wicked notch Oct 25, 2023, 7:49 PM

#

is there early stencil?

frank sail Oct 25, 2023, 7:49 PM

#

yes

wicked notch Oct 25, 2023, 7:50 PM

#

pog

#

I've never rendered to a stencil buffer ever in my life

frank sail Oct 25, 2023, 7:50 PM

#

same

wicked notch Oct 25, 2023, 7:50 PM

#

I don't even know how you actually render into one bleakekw

distant lodge Oct 25, 2023, 7:51 PM

#

I think you set the read/write ops in a similar way to configuring depth ops

#

they just work differently

wispy spear Oct 25, 2023, 7:52 PM

#

glStencilMask similar to glDepthMaskisms?

#

and glColorMask(false, false, false, false)

frank sail Oct 25, 2023, 7:52 PM

#

probably

wicked notch Oct 25, 2023, 7:53 PM

#

```c
typedef struct VkStencilOpState {
    VkStencilOp    failOp;
    VkStencilOp    passOp;
    VkStencilOp    depthFailOp;
    VkCompareOp    compareOp;
    uint32_t       compareMask;
    uint32_t       writeMask;
    uint32_t       reference;
} VkStencilOpState;
```
wtf is this

frank sail Oct 25, 2023, 7:53 PM

#

btw this all only helps the case where every page isn't being drawn to, i.e., when culling is also working

wicked notch Oct 25, 2023, 7:53 PM

#

frank sail btw this all only helps the case where every page isn't being drawn to, i.e., wh...

to be fair, my HZB is extremely iffy sometimes KEKW

#

and it's definitely more conservative than necessary

wispy spear Oct 25, 2023, 7:53 PM

#

https://stackoverflow.com/questions/48246302/writing-to-the-opengl-stencil-buffer

frank sail Oct 25, 2023, 7:54 PM

#

there is a learnopengl tutorial for stencil KEKW

wicked notch Oct 25, 2023, 7:55 PM

#

back to our origins huh

frank sail Oct 25, 2023, 7:56 PM

#

https://tenor.com/view/you-couldn’t-live-thanos-endgame-back-to-me-you-could-not-live-gif-16221515858415007908

Tenor

#

I'm more tempted to try gl_{Clip, Cull}Distance before this tbh

wispy spear Oct 25, 2023, 7:57 PM

#

when I'm back from memory transfers it better work

frank sail Oct 25, 2023, 7:57 PM

#

filling a big chungus stencil buffer for every draw sounds cap

#

also you literally can't even make the stencil buffer for some of the resolutions we target

wicked notch Oct 25, 2023, 7:58 PM

#

frank sail filling a big chungus stencil buffer for every draw sounds cap

ye true

#

if only we could do MSAA

wispy spear Oct 25, 2023, 7:58 PM

#

can you use a ubo/ssbo in a smol compute pass in between instead?

frank sail Oct 25, 2023, 7:58 PM

#

unless you do the fake MSAA thing with interpolateAtOffset (which requires conservative raster)

frank sail Oct 25, 2023, 7:58 PM

#

wispy spear can you use a ubo/ssbo in a smol compute pass in between instead?

wdym

wispy spear Oct 25, 2023, 7:59 PM

#

you want to write something to the stencil buffer

#

can this be replaced with a ubo/ssbo instead

wicked notch Oct 25, 2023, 7:59 PM

#

no because we lose early stencil then

#

rip

wispy spear Oct 25, 2023, 7:59 PM

#

ah thats also a thing

frank sail Oct 25, 2023, 8:00 PM

#

yeah the goal is to not emit fragment shader invocations somehow

frank sail Oct 25, 2023, 8:00 PM

#

frank sail I'm more tempted to try `gl_{Clip, Cull}Distance` before this tbh

forgot that shit wraps so this is a bad solution half the time kekkedsadge

#

well maybe there's a way to change the planes in that case

#

basically there will either be a large void in the middle that you don't want to render to, or a small square in the middle that you do want to render to

#

for posterity

#

idk how to efficiently compute the former bounds tbh. you can't do a simple min/max on the page coordinates

wicked notch Oct 25, 2023, 8:12 PM

#

you could just coerce AMD to make sparse not garbage on windows

#

and all our problems would vanish

#

use deadly force if necessary bleakekw

frank sail Oct 25, 2023, 8:15 PM

#

guess I'll try culling individual triangles against the page hierarchy

#

then sw rasterize nervous

wicked notch Oct 25, 2023, 9:15 PM

#

btw I fixed the HZB

#

```glsl
vec4 project_screen_aabb(in aabb_t aabb, in mat4 transform, in mat4 proj_view) {
    const vec3[] corners = vec3[](
        vec3(aabb.min.x, aabb.min.y, aabb.min.z),
        vec3(aabb.min.x, aabb.min.y, aabb.max.z),
        vec3(aabb.min.x, aabb.max.y, aabb.min.z),
        vec3(aabb.min.x, aabb.max.y, aabb.max.z),
        vec3(aabb.max.x, aabb.min.y, aabb.min.z),
        vec3(aabb.max.x, aabb.min.y, aabb.max.z),
        vec3(aabb.max.x, aabb.max.y, aabb.min.z),
        vec3(aabb.max.x, aabb.max.y, aabb.max.z)
    );
    vec2 min_uv = vec2(+3.402823466e+38);
    vec2[8] semi_uv;
    for (uint i = 0; i < 8; ++i) {
        const vec4 clip = proj_view * transform * vec4(corners[i], 1.0);
        const vec2 uv = (clip.xy / clip.w) * vec2(0.5, -0.5);
        min_uv = min(min_uv, uv);
        semi_uv[i] = uv;
    }
    vec2 min_xy = vec2(0.0);
    vec2 max_xy = vec2(1.0);
    for (uint i = 0; i < 8; ++i) {
        const vec2 uv = fract(semi_uv[i] + min_uv);
        min_xy = min(min_xy, uv);
        max_xy = max(max_xy, uv);
    }
    return vec4(min_xy, max_xy);
}
```

#

for posterity too

frank sail Oct 25, 2023, 9:16 PM

#

is this for your boolean hzb?

wicked notch Oct 25, 2023, 9:16 PM

#

ye

#

also, how did you fix biasing?

#

I am still only applying a mere constant offset bleakekw

frank sail Oct 25, 2023, 9:18 PM

#

did you see the desmos I spammed

#

https://www.desmos.com/calculator/nbhoiubvfj

wicked notch Oct 25, 2023, 9:19 PM

#

so the old usual tan(acos())

#

epic

frank sail Oct 25, 2023, 9:19 PM

#

there's a linalg version below

#

I impl'd this but it's fucked somehow and I didn't care enough to fix it

#

btw you also need to account for fp error (somehow)

#

I think I just added a constant 1.0 / (1 << 24) or something
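
A rough sketch of the `tan(acos())` bias plus the constant floating-point term mentioned above. The desmos graph is not reproduced here; scaling by a texel size and clamping the cosine are assumptions of this sketch, and `slope_scaled_bias` is a made-up name.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// n_dot_l: cosine of the angle between the surface normal and the light direction.
// texel_size: depth-space footprint of one shadow texel (assumed scale factor).
float slope_scaled_bias(float n_dot_l, float texel_size) {
    assert(texel_size > 0.0f);
    // Clamp so that grazing angles do not divide by zero (assumed lower bound).
    const float c = std::clamp(n_dot_l, 1e-3f, 1.0f);
    // tan(acos(c)) rewritten without trig calls: sin / cos = sqrt(1 - c^2) / c.
    const float slope = std::sqrt(1.0f - c * c) / c;
    // Constant term for floating-point error, one 24-bit depth step.
    const float fp_epsilon = 1.0f / static_cast<float>(1 << 24);
    return texel_size * slope + fp_epsilon;
}
```

For a surface facing the light head-on the slope term vanishes and only the `1.0 / (1 << 24)` constant remains; at 45 degrees the bias is roughly one texel size.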

cold sky Oct 26, 2023, 1:02 PM

#

frank sail devsh mentioned that it's only for when depth has higher samples than color whic...

you can still try it and see what happens 😛

cold sky Oct 26, 2023, 1:04 PM

#

distant lodge I just forgot

it's because MSAA is a grid, just like using a higher resolution render target, so when you voxelize you can still miss a sample and have seams/gaps in your voxelization

wicked notch Oct 26, 2023, 9:01 PM

#

#

need your data @frank sail for posterity

wicked notch Oct 26, 2023, 9:17 PM

#

sir @wispy spear can I share the sparse experiment in #experiments

#

There's a possibility that your system will lockup if you are on NV/Windows so maybe a warning should be necessary

wispy spear Oct 26, 2023, 10:20 PM

#

wicked notch sir @wispy spear can I share the sparse experiment in #experiments

of course, i dont think you need permission : )

wispy spear Oct 26, 2023, 11:12 PM

#

https://github.com/NVIDIAGameWorks/Displacement-MicroMap-Toolkit would this be of any help in your meshletisms?

GitHub

GitHub - NVIDIAGameWorks/Displacement-MicroMap-Toolkit

Contribute to NVIDIAGameWorks/Displacement-MicroMap-Toolkit development by creating an account on GitHub.

frank sail Oct 26, 2023, 11:12 PM

#

That's for their ray tracing extensions

#

Opacity and displacement micromap generation and stuff

wispy spear Oct 26, 2023, 11:13 PM

#

ah so something completely different

#

i saw meshopt showing up in there and thought there was some meshopt-isms going on few weeks or months ago

frank sail Oct 26, 2023, 11:15 PM

#

Yeah though I get how it invokes the idea of meshlets when it says "micromeshes"

wispy spear Oct 26, 2023, 11:15 PM

#

ye meshleading

wicked notch Oct 28, 2023, 2:45 PM

#

I have faith

#

With the new batching & waiting method vkQueueBindSparse does NOT cause hitches

#

when updating lots of clipmaps

#

vkQueueBindSparse's perf still sucks though

#

but it's not as bad

#

I wonder

#

Maybe setting a limit of updated pages per frame won't cause a lot of pop-in if I can keep framerate up?

#

Nsight's sparse image viewer is pretty pog though ngl

cold sky Oct 28, 2023, 3:23 PM

#

wicked notch With the new batching & waiting method vkQueueBindSparse does NOT cause hitches

wdym?

wicked notch Oct 28, 2023, 3:24 PM

#

vkQueueBindSparse + vkQueueWaitIdle KEKW

#

It is entirely possible that my previous setup with the timeline semaphore was wrong

#

I also used the graphics queue for everything, maybe using the transfer or the compute queue + timeline semaphore could give me more perf

cold sky Oct 28, 2023, 3:26 PM

#

wicked notch I also used the graphics queue for everything, maybe using the transfer or the c...

well duh

wicked notch Oct 28, 2023, 3:27 PM

#

I just gave up very quickly, this time I'll put in more work to make vkQueueBindSparse not suck and report back

#

Thanks to sharpneli & co. I now have a lot more info

cold sky Oct 28, 2023, 3:30 PM

#

wicked notch Thanks to sharpneli & co. I now have a lot more info

I'd think it's kinda self-evident that since the bindsparse happens on the GPU timeline it's better to not wait for the queue to be idle XD

wicked notch Oct 28, 2023, 3:31 PM

#

Ye, previously I was worried about the CPU overhead, but if you saw in the experiments room, if you don't spam vkQueueBindSparse calls (or you just wait for idle) the CPU overhead is basically zero

cold sky Oct 28, 2023, 3:32 PM

#

wicked notch Ye, previously I was worried about the CPU overhead, but if you saw in the exper...

shouldn't it be basically one queue bind sparse per frame?

wicked notch Oct 28, 2023, 3:32 PM

#

Yes but even then, if you call vkQueueBindSparse once per frame, if enough frames pass you have a big issue

#

because the HW queue fills up

cold sky Oct 28, 2023, 3:33 PM

#

wicked notch Yes but even then, if you call `vkQueueBindSparse` once per frame, if enough fra...

you have a big issue from frequent flipping of a page on/off

#

use temporal smoothing / fixed mem budget

wicked notch Oct 28, 2023, 3:33 PM

#

yeah I have some ideas about that

cold sky Oct 28, 2023, 3:33 PM

#

one would be to wait for a page to be unused for K frames before evicting

wicked notch Oct 28, 2023, 3:34 PM

#

I'll first try deferred page unbinding, if there's enough memory in the pool

wicked notch Oct 28, 2023, 3:34 PM

#

cold sky one would be to wait for a page to be unused for `K` frames before evicting

ye

cold sky Oct 28, 2023, 3:35 PM

#

wicked notch I'll first try deferred page unbinding, if there's enough memory in the pool

if you can spare it, always have the pages eat up all the mem (that you've set aside)

#

btw this strategy is nice because for every bind, you have a matching unbind

#

basically to page something in, you have to page something out

wicked notch Oct 28, 2023, 3:38 PM

#

Hmm yeah that does sound good

#

also lets me never actually "unbind" (as in, submit a VkSparseImageMemoryBind with a null memory handle)

#

You can just replace the page with a new offset, saving one bind operation
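
A minimal sketch of the "wait K frames, then reuse the slot" idea, assuming a fixed pool of physical slots and a monotonically increasing frame counter. `DeferredPagePool`, `Rebind` and their members are hypothetical names; the real thing would turn each returned `Rebind` into a sparse bind on the queue.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <unordered_map>
#include <vector>

// One bind to submit: map `virtual_page` onto `physical_slot` of the memory pool.
struct Rebind { std::uint32_t virtual_page; std::uint32_t physical_slot; };

class DeferredPagePool {
public:
    DeferredPagePool(std::uint32_t slot_count, std::uint64_t keep_frames) : _keep_frames(keep_frames) {
        assert(slot_count > 0);
        for (std::uint32_t slot = 0; slot < slot_count; ++slot) _free_slots.push_back(slot);
    }

    // Marks `page` as used in `frame`. Returns the bind to submit, or nothing when
    // the page is already resident or no slot can be reclaimed yet.
    std::optional<Rebind> request(std::uint32_t page, std::uint64_t frame) {
        if (auto it = _resident.find(page); it != _resident.end()) {
            it->second.last_used = frame;
            return std::nullopt;
        }
        std::uint32_t slot = 0;
        if (!_free_slots.empty()) {
            slot = _free_slots.back();
            _free_slots.pop_back();
        } else {
            // Steal the slot of the least recently used page, but only if it
            // has been unused for at least keep_frames frames.
            auto victim = _resident.end();
            for (auto it = _resident.begin(); it != _resident.end(); ++it) {
                const bool stale = frame - it->second.last_used >= _keep_frames;
                if (stale && (victim == _resident.end() || it->second.last_used < victim->second.last_used)) {
                    victim = it;
                }
            }
            if (victim == _resident.end()) return std::nullopt;
            slot = victim->second.slot;
            _resident.erase(victim);  // the old page is replaced, never explicitly unbound
        }
        _resident[page] = Entry{slot, frame};
        return Rebind{page, slot};
    }

    bool is_resident(std::uint32_t page) const { return _resident.count(page) != 0; }

private:
    struct Entry { std::uint32_t slot; std::uint64_t last_used; };
    std::uint64_t _keep_frames;
    std::vector<std::uint32_t> _free_slots;
    std::unordered_map<std::uint32_t, Entry> _resident;
};
```

Once the pool is full, every page-in reuses the slot of a stale page, so there is exactly one bind per replacement and no separate unbind.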

cold sky Oct 28, 2023, 3:40 PM

#

btw you can score all pages (Cause there's so few of them) on the GPU and run a GPU compute sort

#

(or CPU radix)

#

and if your mem budget is N pages, then you grab N most important ones

wicked notch Oct 28, 2023, 3:41 PM

#

what do I do with this score, optimal linear allocation?

#

Ah importance

cold sky Oct 28, 2023, 3:41 PM

#

figure out which ones need to be paged in the most

#

btw even if a page is not needed, you shouldn't set its importance to 0

#

you can do something like 0-1 is how likely it is to come into view, and >1 is for visible pages

cold sky Oct 28, 2023, 3:43 PM

#

cold sky and if your mem budget is N pages, then you grab N most important ones

then out of the N, you figure out how many of those are not resident and grab the top K out of that if K is your "bind per frame" limit
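
The selection described above could look roughly like this on the CPU. It is a sketch under assumptions: scores are already computed, a plain `std::sort` stands in for the GPU sort or CPU radix sort, and `PageScore` / `pick_pages_to_bind` are invented names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct PageScore {
    std::uint32_t page;  // virtual page index
    float score;         // importance, higher means more wanted
    bool resident;       // already backed by memory?
};

// Keeps the `budget_n` most important pages and returns at most
// `binds_per_frame_k` of those that are not resident yet, most important first.
std::vector<std::uint32_t> pick_pages_to_bind(std::vector<PageScore> pages,
                                              std::size_t budget_n,
                                              std::size_t binds_per_frame_k) {
    assert(binds_per_frame_k <= budget_n);
    // Sort by descending score; ties are broken by page index to stay deterministic.
    std::sort(pages.begin(), pages.end(), [](const PageScore& a, const PageScore& b) {
        return a.score != b.score ? a.score > b.score : a.page < b.page;
    });
    const std::size_t wanted = std::min(budget_n, pages.size());
    std::vector<std::uint32_t> to_bind;
    for (std::size_t i = 0; i < wanted && to_bind.size() < binds_per_frame_k; ++i) {
        if (!pages[i].resident) to_bind.push_back(pages[i].page);
    }
    return to_bind;
}
```

Pages that fall outside the top N are never bound, and the K limit only spreads the work for the ones inside it over several frames.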

wicked notch Oct 28, 2023, 3:51 PM

#

nice that's a great heuristic

#

figuring out the most important pages is gonna be a chore though

#

perhaps I can score them using the 1 / clipmap_level and 1 / distance_to_camera heuristics
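
One way the two heuristics could be combined with the "0-1 for not visible, >1 for visible" convention suggested earlier. The `+ 1` in both denominators (to avoid dividing by zero at clip 0 or distance 0) and the choice to multiply the two terms are assumptions of this sketch, and `page_importance` is a made-up name.

```cpp
#include <cassert>
#include <cstdint>

// Returns a score in (0, 1] for pages that are not visible and in (1, 2] for visible ones.
float page_importance(std::uint32_t clipmap_level, float distance_to_camera, bool visible) {
    assert(distance_to_camera >= 0.0f);
    // Finer clipmaps (lower level) and closer pages score higher.
    const float level_term = 1.0f / (1.0f + static_cast<float>(clipmap_level));
    const float distance_term = 1.0f / (1.0f + distance_to_camera);
    const float base = level_term * distance_term;
    // Visible pages are shifted above 1 so they always outrank invisible ones.
    return visible ? 1.0f + base : base;
}
```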

delicate rain Oct 28, 2023, 4:51 PM

#

wicked notch Maybe setting a limit of updated pages per frame won't cause a lot of pop-in if ...

I have 256 and it's fine

wicked notch Oct 28, 2023, 4:57 PM

#

alright I got hardware VSM back in full functionality

#

now with 100% more popin

#

I have reclaimed earlyZ and hardware filtering though

#

That one frame of latency is so sad

#

Look at ZROP go though KEKW

#

ZROP being abused in unspeakable ways is always fun

delicate rain Oct 28, 2023, 5:18 PM

#

This is with or without culling?

wicked notch Oct 28, 2023, 5:18 PM

#

Without

delicate rain Oct 28, 2023, 5:18 PM

#

16 clips?

wicked notch Oct 28, 2023, 5:18 PM

#

16 clips ye

delicate rain Oct 28, 2023, 5:18 PM

#

Wtf that's pretty good

wicked notch Oct 28, 2023, 5:18 PM

#

It is

#

It's at least twice as fast as software VSM

#

if not more

#

depending on position etc

#

The horrible thing is the frame of latency

#

That is just so bad

delicate rain Oct 28, 2023, 5:20 PM

#

I didn't really investigate real sparse. What is the process you do now/how does it work?

#

(if you don't mind going over it ofc 🥹)

wicked notch Oct 28, 2023, 5:20 PM

#

Sure thing

#

So first things first, the marking of visible pages remains the same

#

Now though, instead of sending it to another compute shader to allocate pages, we read it back on the CPU

#

Specifically I do this

#

```cpp
auto bindings = std::vector<ir::sparse_image_memory_bind_t>();
bindings.reserve(requests.size());
for (auto page = 0_u64; page < requests.size(); ++page) {
    const auto offset = ir::offset_3d_t {
        .x = static_cast<int32>((page % IRIS_VSM_VIRTUAL_PAGE_ROW_SIZE) * IRIS_VSM_VIRTUAL_PAGE_SIZE),
        .y = static_cast<int32>((page / IRIS_VSM_VIRTUAL_PAGE_ROW_SIZE) * IRIS_VSM_VIRTUAL_PAGE_SIZE),
    };
    const auto extent = ir::extent_3d_t {
        .width = IRIS_VSM_VIRTUAL_PAGE_SIZE,
        .height = IRIS_VSM_VIRTUAL_PAGE_SIZE,
        .depth = 1,
    };
    // NOTE: is_requested, is_allocated, memory_offset and entry are not defined in this
    // snippet; presumably they are derived from requests[page] and _pages[page].
    if (is_requested && !is_allocated) {
        // Newly requested page: allocate physical memory and bind it.
        const auto entry = _allocator.get().allocate();
        bindings.emplace_back(ir::sparse_image_memory_bind_t {
            .offset = offset,
            .extent = extent,
            .buffer = _buffer->slice(memory_offset, IRIS_VSM_PHYSICAL_PAGE_RESOLUTION, false),
        });
        _pages[page] = entry;
        _allocated++;
    } else if (!is_requested && is_allocated) {
        // Page no longer requested: unbind it and return its memory to the allocator.
        bindings.emplace_back(ir::sparse_image_memory_bind_t {
            .offset = offset,
            .extent = extent,
            .buffer = _buffer->slice(memory_offset, IRIS_VSM_PHYSICAL_PAGE_RESOLUTION, true),
        });
        _allocator.get().deallocate(entry);
    }
}
return bindings;
```

#

This takes in requests which is the read-back array

#

and outputs an array of sparse_image_memory_bind_t

#

Which is equivalent to VkSparseImageMemoryBind

#

It basically takes in the offset and extent of our virtual image and assigns it to a certain offset of a VkDeviceMemory

#

The next step is to send this info over to vkQueueBindSparse which is just this

```cpp
for (auto i = 0_u32; i < IRIS_VSM_CLIPMAP_COUNT; ++i) {
    const auto requests = vsm_visible_pages_buffer.as_span();
    // std::vector<ir::sparse_image_memory_bind_t>; non-const so the std::move below actually moves
    auto bindings = _vsm.images[i].make_updated_sparse_bindings(requests.subspan(
        i * IRIS_VSM_VIRTUAL_PAGE_COUNT,
        IRIS_VSM_VIRTUAL_PAGE_COUNT
    ));
    if (!bindings.empty()) {
        _sparse_bind_info.image_binds.emplace_back(ir::sparse_image_memory_bind_info_t {
            .image = std::cref(_vsm.images[i].image()),
            .bindings = std::move(bindings),
        });
    }
}
```

#

And then it's just a good ol' vk call

#

```cpp
sparse_timeline_value = _sparse_bind_semaphore->increment(2);
_sparse_bind_info.wait_semaphores = {
    { std::cref(*_sparse_bind_semaphore), {}, sparse_timeline_value }
};
_sparse_bind_info.signal_semaphores = {
    { std::cref(*_sparse_bind_semaphore), {}, sparse_timeline_value + 1 }
};
_device->compute_queue().bind_sparse(_sparse_bind_info);
```

#

I use timeline semaphores such that:

if there were no previous sparse binds, simply signal to the graphics queue once it is done
If there was a previous sparse bind, wait for it to be done and signal the graphics queue

#

Finally, sampling is super easy

#

```glsl
float sun_shadow = 0.0;
const int vsm_residency = sparseTextureLodARB(u_vsm[virtual_page.position.z], virtual_page.uv.xyz, 0, sun_shadow);
if (!sparseTexelsResidentARB(vsm_residency)) {
    sun_shadow = 1.0;
}
```

delicate rain Oct 28, 2023, 5:27 PM

#

Hmmm so the latency is only due to the readback?

wicked notch Oct 28, 2023, 5:27 PM

#

Yep

#

I can't even do anything about it because I need to update the sparse bindings through vkQueueBindSparse

delicate rain Oct 28, 2023, 5:30 PM

#

Right okay correct me if I'm wrong but the popping comes because you try to read sparse memory at a clip level where nothing is located since the requesting logic is the same as sampling logic. Aka it tries to read clip 3 while the memory is still bound to clip 4?

#

Due to the single frame lag

wicked notch Oct 28, 2023, 5:31 PM

#

It tries to read clip 3 but the memory is unbound

#

Right now I immediately unbind all pages that are not requested in the current frame

#

so if in the previous frame some page wasn't allocated, but the current frame says "now it is allocated boi", I sample an unbound page

#

The only way to fix it is to have no frames in flight (and stall before sampling) KEKW

delicate rain Oct 28, 2023, 5:34 PM

#

Hmmmm

#

I know how to fix the popping due to clip level switch

#

You just delay the clip level you sample by a single frame

#

But disocclusion and edge of frame will still pop

#

Okay I have a cursed idea

#

What if we tried to combine fake sparse and hw sparse

wicked notch Oct 28, 2023, 5:36 PM

#

how so

delicate rain Oct 28, 2023, 5:38 PM

#

We use hw sparse for pages that are only switching lod levels - since you can just delay the sampling as I described. But for pages that cover pixels that were previously not visible at all we do fake sparse with no delay

#

Then once these pages will switch lod we move them to the hw sparse path

wicked notch Oct 28, 2023, 5:39 PM

#

agonyfrog

delicate rain Oct 28, 2023, 5:39 PM

#

In your vsm page table you'd need to store whether a page is fake or hw sparse to know what to sample

#

Ah but this would double the amount of clipmaps we have to draw

wicked notch Oct 28, 2023, 5:43 PM

#

Double the clipmaps but with efficient culling that's not a problem smart

delicate rain Oct 28, 2023, 5:45 PM

#

The culling is also the same for non resident pages so we could do two-step culling: first a shared cull for both real and fake, then a second cull for the pages of real and fake separately

#

But I feel like this is extremely cursed

wicked notch Oct 28, 2023, 5:48 PM

#

Let me try getting rid of the timeline semaphore and just stalling the device

#

because that's what good people do, they stall the device

#

Ok I got a bsod

#

very good

delicate rain Oct 28, 2023, 5:56 PM

#

We just need bindSparseIndirect smh

#

Is that too much to ask for?

wicked notch Oct 28, 2023, 6:16 PM

#

goddamnit hw filtering looks so good

#

fuck me

#

wispy spear Oct 28, 2023, 6:19 PM

#

but the textures look like shit, some washed out BDU pants

wicked notch Oct 28, 2023, 7:07 PM

#

This debug view is so good

#

In white are the active pages relative to what the shadow map sees

raven orchid Oct 28, 2023, 7:09 PM

#

do you have hpb running as well?

#

like sparse depth + hpb cull

wicked notch Oct 28, 2023, 7:09 PM

#

working on it rn

#

I already have the code I just have to make it work™️

raven orchid Oct 28, 2023, 7:10 PM

#

i'm super interested in the final perf

#

adding hpb really helped mine

#

would you say

#

your current sparse approach is getting close to nvidia's GL driver impl of sparse?

wicked notch Oct 28, 2023, 7:11 PM

#

Nah

#

Mine is very rudimentary right now

#

It's just "update what changed and semaphore"

#

No deferred or staggered updates or stuff like that

raven orchid Oct 28, 2023, 7:12 PM

#

hmm ok makes sense

#

i wonder if this means sparse will be viable after all

wicked notch Oct 28, 2023, 7:13 PM

#

I hate readback with a passion though bleakekw

raven orchid Oct 28, 2023, 7:14 PM

#

delicate rain We just need bindSparseIndirect smh

let's hope for this

#

save us from the readback

#

gpu driven sparse management

wicked notch Oct 28, 2023, 7:15 PM

#

absolute garbage

wicked notch Oct 28, 2023, 7:15 PM

#

raven orchid gpu driven sparse management

I wish bleakekw

#

while we're at it, why not make it also perform well out of the box

cold sky Oct 28, 2023, 7:17 PM

#

ah rasterization, the paradigm where you heroically toil to overcome problems unknown in raytracing

raven orchid Oct 28, 2023, 7:23 PM

#

wicked notch while we're at it, why not make it also perform well out of the box

this scares me tbh

#

like from your experiment it sounds like it might be possible, but also seems very tricky to get the API to not obliterate performance

wicked notch Oct 28, 2023, 7:24 PM

#

It's tricky but completely doable

#

Right now I have the most naive implementation possible, and vkQueueBindSparse takes at most 10ms

#

When updating across all 16 clipmaps

#

Also, viewing clip 2 in nsight makes me have a device loss error?

raven orchid Oct 28, 2023, 7:30 PM

#

Dang that’s surprisingly not bad

#

Wonder how low you can get it

wicked notch Oct 28, 2023, 7:30 PM

#

mfw

delicate rain Oct 28, 2023, 7:39 PM

#

The fact that you have to have no fif makes it really bad though

#

I'm not sure there is a way to overcome that

wicked notch Oct 28, 2023, 7:40 PM

#

I still have 2 FIF

#

Thing is, you don't just need 1 frame in flight

delicate rain Oct 28, 2023, 7:40 PM

#

yeh but your shadows are scuffed

wicked notch Oct 28, 2023, 7:40 PM

#

you need 1 frame in flight and issue a complete queue stall

#

you need to do both to not have scuffed shadows

delicate rain Oct 28, 2023, 7:41 PM

#

graphics queue stall?

wicked notch Oct 28, 2023, 7:41 PM

#

ye

delicate rain Oct 28, 2023, 7:41 PM

#

https://tenor.com/view/sad-frown-rain-cat-gif-17035614

Tenor

glass sphinx Oct 28, 2023, 7:41 PM

#

is that a waitidle?

#

on queue?

wicked notch Oct 28, 2023, 7:41 PM

#

```cpp
while (is_running()) {
    mark_pages();
    readback();
    wait_idle();
    update_bindings();
    render_shadows();
}
```

glass sphinx Oct 28, 2023, 7:42 PM

#

so no frame in flight?

wicked notch Oct 28, 2023, 7:42 PM

#

no frame in flight

glass sphinx Oct 28, 2023, 7:42 PM

#

https://www.youtube.com/shorts/XPcfbKYmxpw

YouTube

Short Clips

SpongeBob Foghorn Sound Effect

▶ Play video

delicate rain Oct 28, 2023, 7:44 PM

#

why do you need the wait idle?

#

is that just how sparse works?

wicked notch Oct 28, 2023, 7:45 PM

#

nono, you can omit the wait idle, you just have to work with an older readback then

#

the wait idle is there to complete the transfer from device to host

glass sphinx Oct 28, 2023, 7:45 PM

#

thats fine tho

wicked notch Oct 28, 2023, 7:45 PM

#

waiting for transfer?

glass sphinx Oct 28, 2023, 7:46 PM

#

delayed readback

wicked notch Oct 28, 2023, 7:46 PM

#

ah yeah
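
A sketch of what "delayed readback" could look like without the `wait_idle()`: keep one readback slot per frame in flight and, at the start of a frame, consume the slot that was written that many frames ago. `ReadbackRing` is an invented name and the `std::vector` stands in for a host-visible buffer; fence handling is left out on purpose.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

template <std::size_t FramesInFlight>
class ReadbackRing {
public:
    // Stores the page requests produced during `frame` (stand-in for the GPU-to-host copy).
    void write(std::uint64_t frame, std::vector<std::uint8_t> requests) {
        _slots[frame % FramesInFlight] = Slot{frame, std::move(requests)};
    }

    // Call at the start of `frame`, before write(frame, ...). Returns the requests
    // from FramesInFlight frames ago, whose GPU work is assumed to have finished
    // because that frame's fence has already been waited on.
    std::optional<std::vector<std::uint8_t>> read_for(std::uint64_t frame) const {
        if (frame < FramesInFlight) return std::nullopt;  // nothing old enough yet
        const Slot& slot = _slots[frame % FramesInFlight];
        assert(slot.frame == frame - FramesInFlight);
        return slot.requests;
    }

private:
    struct Slot {
        std::uint64_t frame = 0;
        std::vector<std::uint8_t> requests;
    };
    std::array<Slot, FramesInFlight> _slots{};
};
```

With 2 frames in flight, frame 2 updates its bindings from what frame 0 requested, so there is no stall but the bindings lag by two frames.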

wicked notch Oct 28, 2023, 7:46 PM

#

wicked notch absolute garbage

you get this though

glass sphinx Oct 28, 2023, 7:46 PM

#

oh

#

L

#

https://tenor.com/view/minecraft-nvidia-ge-force-rtx-rtx-on-gif-21180788

Tenor

delicate rain Oct 28, 2023, 7:47 PM

#

~~We delay the user input for FIF and then the readback will be fine~~

wispy spear Oct 28, 2023, 7:47 PM

#

just show a loading screen

#

or some loading.gif on a quad

#

starfield did it

#

cod did it

#

why not showcase_vsm.exe

delicate rain Oct 28, 2023, 7:48 PM

#

we have cool shadows but you cannot move or rotate camera or else loading screen

wispy spear Oct 28, 2023, 7:48 PM

#

: D

#

ok im sorry, just here to keep the morale up

wicked notch Oct 28, 2023, 7:49 PM

#

I'll stick with HWVSM for a while more

#

after all SWVSM is a git checkout away

wicked notch Oct 28, 2023, 8:05 PM

#

I am so tempted to actually put a stall and see what happens bleakekw

#

father forgive me

#

no more scuffed shadows!

#

don't mind the giant hole though

#

Totally not issuing a full stall 💀

wicked notch Oct 28, 2023, 8:46 PM

#

so good

cold sky Oct 28, 2023, 9:54 PM

#

wicked notch you get this though

Skill issue

#

Git good

#

At fortune telling

#

Or loddin

frank sail Oct 28, 2023, 9:59 PM

#

The pop in would probably be less bad if you cached pages that weren't just on screen

wicked notch Oct 28, 2023, 10:00 PM

#

ye

#

once again caching would save my ass

frank sail Oct 28, 2023, 10:01 PM

#

there'd still be popping when you see new pages so meh

wicked notch Oct 28, 2023, 10:04 PM

#

I already have a full stall in my code rn so bleakekw

#

by writing wait_idle() I have forsaken my humanity Jaker

cold sky Oct 28, 2023, 10:06 PM

#

frank sail there'd still be popping when you see new pages so meh

well actually most of the demos of fucked up pages are fixable:

fast camera turning, second time you turn pages are there
use sampler feedback (implement your own) and if page not resident, sample a higher mip-map which is

#

the annoying thing to compensate for are disocclusions

#

run a small blur/average over your pages to score them ? So that way a not-right-now needed page feels important cause it has resident neighbours
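
(Editor's sketch, not from the chat.) One way to read the blur/average idea is a 3x3 box sum over the residency grid, so a page that is not needed right now still scores high when its neighbours are resident. The function name and the row-major grid layout are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: counts resident pages in the 3x3 neighbourhood of
// (px, py), including the page itself. `resident` is a row-major
// width * height grid where 1 means resident and 0 means not resident.
int neighbourhoodScore(const std::vector<std::uint8_t>& resident,
                       int width, int height, int px, int py) {
    assert(width > 0 && height > 0);
    assert(resident.size() == static_cast<std::size_t>(width) * height);
    int score = 0;
    for (int dy = -1; dy <= 1; ++dy) {
        for (int dx = -1; dx <= 1; ++dx) {
            const int x = px + dx;
            const int y = py + dy;
            if (x < 0 || y < 0 || x >= width || y >= height) continue; // off the grid
            score += resident[static_cast<std::size_t>(y) * width + x];
        }
    }
    return score;
}
```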

frank sail Oct 28, 2023, 10:14 PM

#

Or just use a GPU allocator for zero frames of disocclusion lag smart

cold sky Oct 28, 2023, 10:17 PM

#

frank sail Or just use a GPU allocator for zero frames of disocclusion lag <:smart:59186497...

yeah but then no HiZ

#

and HW rasterizer neatly dropping writes to non-resident pages

frank sail Oct 28, 2023, 10:19 PM

#

yeah but you actually rarely have to draw vsm pages so it's not a big deal

wicked notch Oct 28, 2023, 10:20 PM

#

moving the camera causes a lot of cache invalidations though and perf dips quite a bit

frank sail Oct 28, 2023, 10:21 PM

#

that's why you don't move the camera

cold sky Oct 28, 2023, 10:27 PM

#

frank sail yeah but you actually rarely have to draw vsm pages so it's not a big deal

unless the sun moves 😛

wicked notch Oct 28, 2023, 10:28 PM

#

if the sun moves it's joever

cold sky Oct 28, 2023, 10:39 PM

#

also you know, if dynamic occluders within the pages move too XD

frank sail Oct 28, 2023, 10:39 PM

#

Nuh uh

raven orchid Oct 28, 2023, 10:40 PM

#

I think unreal is trying to move to a dual vsm mem pool solution to handle dynamics

#

Then they just merge dynamic with static cache

frank sail Oct 28, 2023, 10:42 PM

#

Ye

cold sky Oct 28, 2023, 10:44 PM

#

raven orchid I think unreal is trying to move to a dual vsm mem pool solution to handle dynam...

but then your sparse bind takes twice as long

raven orchid Oct 28, 2023, 10:44 PM

#

Yeah idk what their plan is

#

Or maybe they’re not using real sparse api? Idk

cold sky Oct 28, 2023, 10:45 PM

#

from what I can see your biggest bottleneck is the 1 frame lag + how many pages you can bind per frame

#

not the actual drawing, which is hilarious

raven orchid Oct 28, 2023, 10:46 PM

#

Though I know mac doesn’t support sparse and I think ue5 runs there right? Maybe they’re using software sparse

delicate rain Oct 28, 2023, 10:46 PM

#

They have Nanite they don't need real sparse

cold sky Oct 28, 2023, 10:46 PM

#

raven orchid Though I know mac doesn’t support sparse and I think ue5 runs there right? Maybe...

huh? Mac used to be AMD Radeon since like 2010-ish ?

#

and AMD has sparse since then

raven orchid Oct 28, 2023, 10:47 PM

#

Oh they ditched amd

cold sky Oct 28, 2023, 10:47 PM

#

tbf Nanite is the point at which I'm like... fuck rasterizing

raven orchid Oct 28, 2023, 10:47 PM

#

At least on apple silicon it reports no sparse in the vulkan support viewer

delicate rain Oct 28, 2023, 10:47 PM

#

Also I'm pretty sure you still need screen space shadows and other stuff because otherwise VSMs just die when you have stuff like moving grass and swaying trees

cold sky Oct 28, 2023, 10:47 PM

#

raven orchid Oh they ditched amd

ofc we all know that but in order to compete M1 and M2 GPUs would have to step into AMD's shoes

wicked notch Oct 28, 2023, 10:47 PM

#

you need SSS regardless of those because of LOD KEKW

raven orchid Oct 28, 2023, 10:48 PM

#

Yeah I hope so

#

I think metal might be approaching feature parity?

delicate rain Oct 28, 2023, 10:48 PM

#

wicked notch you need SSS regardless of those because of LOD KEKW

Because of LOD?

cold sky Oct 28, 2023, 10:48 PM

#

delicate rain Because of LOD?

lod transitions as you move

wicked notch Oct 28, 2023, 10:48 PM

#

ye, iirc nanite uses LOD for their VSMs too, so when there are mismatching LODs between views, they run a screen space trace to fix the shadows

cold sky Oct 28, 2023, 10:49 PM

#

bleakekw

frank sail Oct 28, 2023, 10:49 PM

#

Lod transitions are basically unnoticeable in VSM unless the lod bias is high

delicate rain Oct 28, 2023, 10:49 PM

#

Oh you just match the lod to the clipmap level

#

Ez clap

wicked notch Oct 28, 2023, 10:49 PM

#

but then rip LOD

cold sky Oct 28, 2023, 10:49 PM

#

just #ray-tracing

#

stop it

#

get some help

delicate rain Oct 28, 2023, 10:49 PM

#

Yeeah

cold sky Oct 28, 2023, 10:49 PM

#

build yourself a BLAS

wicked notch Oct 28, 2023, 10:49 PM

#

build a blas

#

rayQueryEXT

#

life is good

delicate rain Oct 28, 2023, 10:49 PM

#

It was fun while it lasted but I'm starting to be a doubter

raven orchid Oct 28, 2023, 10:50 PM

#

I think ue5 will be all in vsm

#

But they’ll probably go rt for ue6

cold sky Oct 28, 2023, 10:50 PM

#

in the time you've spent fucking around re-implementing a SW Rasterizer or optimizing for the bottleneck of Sparse Bind, you could have made a BLAS builder

delicate rain Oct 28, 2023, 10:50 PM

#

They have rt as the highest quality settings

cold sky Oct 28, 2023, 10:51 PM

#

raven orchid I think ue5 will be all in vsm

probably the SW+Nanite kind then

#

HW render will just die on multiple lights

raven orchid Oct 28, 2023, 10:51 PM

#

Yeah like they say you can use it with non nanite but

#

They don’t recommend it froge_bleak

cold sky Oct 28, 2023, 10:52 PM

#

there's only so many renderpasses/subpasses you can churn through in a frame

#

and its not like you can have a separate viewport per layer

raven orchid Oct 28, 2023, 10:52 PM

#

I wonder what

#

I wonder how a hybrid solution would look

cold sky Oct 28, 2023, 10:53 PM

#

I guess you could do a layered render, if you kept all VSMs same size

raven orchid Oct 28, 2023, 10:53 PM

#

Rt for non nanite, vsm for nanite

wicked notch Oct 28, 2023, 10:53 PM

#

unreal tells us that we can have as many lights as we want because all VSMs are 16k and their culling is 100% effective

cold sky Oct 28, 2023, 10:53 PM

#

I always recommend #ray-tracing

wicked notch Oct 28, 2023, 10:53 PM

#

as in, inactive pages do not contribute to any cost

cold sky Oct 28, 2023, 10:54 PM

#

cold sky I always recommend #ray-tracing

do not do what UE was doing last year, do what UE will be doing next year

wicked notch Oct 28, 2023, 10:54 PM

#

They only call Nanite::Rasterize once, it rasterizes all viewports

cold sky Oct 28, 2023, 10:54 PM

#

wicked notch They only call `Nanite::Rasterize` once, it rasterizes all viewports

yeah its possible to do in SW

raven orchid Oct 28, 2023, 10:54 PM

#

cold sky I always recommend #ray-tracing

I would but idk how potato gpus will like this

cold sky Oct 28, 2023, 10:54 PM

#

a bit less in HW (viewports need to be binned)

raven orchid Oct 28, 2023, 10:55 PM

#

I actually don’t have any non potatos to test with

cold sky Oct 28, 2023, 10:55 PM

#

raven orchid I actually don’t have any non potatos to test with

don't write for today's potatoes, write for tomorrow's RTX x090 Ti

cold sky Oct 28, 2023, 10:55 PM

#

wicked notch unreal tells us that we can have as many lights as we want because all VSMs are ...

yeah light will make a certain amount of pages active

raven orchid Oct 28, 2023, 10:56 PM

#

Rip to my setup froge_sad

cold sky Oct 28, 2023, 10:56 PM

#

also each light requires that you analyze the pixels on screen and vote on which pages should be active

#

so by that virtue alone, there's a limit on lights

#

and I think it might be in the low tens per pixel (counted by area/volume of effect, not visibility/contribution)
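
(Editor's sketch, not from the chat.) The per-light "vote" can be pictured as a marking pass: every visible pixel is projected into the light's shadow map and flags the page it lands on. This CPU version assumes the projection already happened and takes the resulting UVs as input; `markPages` and the square page grid are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical marking pass for ONE light. Each entry of `shadowUvs` is the
// shadow-map UV (nominally in [0, 1)) of one visible screen pixel. The result
// is a row-major pagesPerSide^2 grid where 1 means "some pixel needs this page".
std::vector<std::uint8_t> markPages(const std::vector<std::array<float, 2>>& shadowUvs,
                                    int pagesPerSide) {
    assert(pagesPerSide > 0);
    std::vector<std::uint8_t> requested(
        static_cast<std::size_t>(pagesPerSide) * pagesPerSide, 0);
    for (const auto& uv : shadowUvs) {
        // Clamp so UVs slightly outside the map still hit an edge page.
        const int px = std::clamp(static_cast<int>(uv[0] * pagesPerSide), 0, pagesPerSide - 1);
        const int py = std::clamp(static_cast<int>(uv[1] * pagesPerSide), 0, pagesPerSide - 1);
        requested[static_cast<std::size_t>(py) * pagesPerSide + px] = 1;
    }
    return requested;
}
```

The cost scales with pixels times lights that touch them, which is the limit being described.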

raven orchid Oct 28, 2023, 10:58 PM

#

Hmmm I wonder if they publish that limit anywhere

cold sky Oct 28, 2023, 10:58 PM

#

wicked notch as in, inactive pages do not contribute to any cost

but they have a cost when you're meshlet culling

raven orchid Oct 28, 2023, 10:58 PM

#

Idk how many virtual lights they allow

cold sky Oct 28, 2023, 10:58 PM

#

raven orchid Hmmm I wonder if they publish that limit anywhere

its empirical

raven orchid Oct 28, 2023, 10:58 PM

#

Yeah I meant like

wicked notch Oct 28, 2023, 10:58 PM

#

cold sky but they have a cost when you're meshlet culling

ye that's the only issue

raven orchid Oct 28, 2023, 10:58 PM

#

I wonder if they have a “best practices” section for point lights

cold sky Oct 28, 2023, 10:58 PM

#

when FPS drops to 5, there's your limit

wicked notch Oct 28, 2023, 10:59 PM

#

you could do conservative culling to improve perf

#

as in, cull only the N most important meshlets this frame

cold sky Oct 28, 2023, 10:59 PM

#

raven orchid Idk how many virtual lights they allow

there's no point in having a global limit, as not every light is in-frustum and not every light affects (or could affect) the same number of pixels

cold sky Oct 28, 2023, 10:59 PM

#

wicked notch as in, cull only the N most important meshlets this frame

hey guess what, its called importance sampling

#

and you gib a fixed budget (of rays)
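
(Editor's sketch, not from the chat.) The "only the N most important meshlets this frame" idea amounts to a fixed budget over some importance score. How that score is computed is not stated in the chat, so `importance` here is a placeholder input and `pickMostImportant` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical budgeted selection: returns the indices of the `budget`
// highest-scoring meshlets, most important first.
std::vector<std::size_t> pickMostImportant(const std::vector<float>& importance,
                                           std::size_t budget) {
    std::vector<std::size_t> order(importance.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    const std::size_t count = std::min(budget, order.size());
    // Only the first `count` entries need to be sorted.
    std::partial_sort(order.begin(),
                      order.begin() + static_cast<std::ptrdiff_t>(count),
                      order.end(),
                      [&](std::size_t a, std::size_t b) { return importance[a] > importance[b]; });
    order.resize(count);
    assert(order.size() <= budget);
    return order;
}
```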

raven orchid Oct 28, 2023, 11:07 PM

#

Checked their docs

#

They currently don’t support per-light resolution controls and it looks like they don’t yet expose a page update budget control

wicked notch Oct 28, 2023, 11:08 PM

#

ye they only do 16k

raven orchid Oct 28, 2023, 11:08 PM

#

cold sky and I think it might be in the low tens per pixel (counted by area/volume of eff...

So this is probably being generous

#

But they’re both listed as “we’re working on it” so maybe it’ll get a bit better soon for non directional

cold sky Oct 28, 2023, 11:17 PM

#

non directional will completely wreck your page occupancy

#

persp projection blows up anything near the light source

#

and suddenly all your pages are active bleakekw

#

VSM works with directional lights cause the projection is ortho

#

actual lights in the scene there really isn't a question of "are there any empty areas in our shadowmap"

#

more like "what LoD should we page them at"

frank sail Oct 28, 2023, 11:20 PM

#

cold sky and suddenly all your pages are active bleakekw

With proper mipmap usage, that won't require you to have stupid texel density

raven orchid Oct 28, 2023, 11:20 PM

#

The virtual mip chain you mean?

cold sky Oct 28, 2023, 11:20 PM

#

cold sky actual lights in the scene there really isn't a question of "are there any empty...

like you can easily get into a situation where what the camera sees covers 80%+ of the shadowmap

cold sky Oct 28, 2023, 11:21 PM

#

frank sail With proper mipmap usage, that won't require you to have stupid texel density

more like "what LoD should we page them at"
that's what I'm getting at here

#

@wicked notch I have a silly way to optimize your HiZ

wicked notch Oct 28, 2023, 11:22 PM

#

I'm all ears

cold sky Oct 28, 2023, 11:22 PM

#

well not HiZ but early Z

#

do you need to draw your meshlets for different mip levels separately ?

wicked notch Oct 28, 2023, 11:23 PM

#

yes

#

I have 16 calls to vkCmdDrawMeshTasksEXT

#

16 is num_clipmaps

cold sky Oct 28, 2023, 11:24 PM

#

for the higher (low res) mip pages, don't draw geometry (and cull) in parts that are overlapped by higher res resident mips

wicked notch Oct 28, 2023, 11:24 PM

#

I could theoretically squash them into one

#

But then I hit the max number of meshlets that I can rasterize with that func

#

Which is 1 mil for some reason

cold sky Oct 28, 2023, 11:25 PM

#

you can downsample the higher pages into the lower pages

#

a resident high level page covers 1/4 of the page immediately below it (whether the lower one is resident or not)

#

hmm I guess you can improve both HiZ and earlyZ 🤣

#

basically only rasterize geo for resident 1/4-pages that don't have a resident page directly beneath them in the mip-chain

#

that way you won't be drawing any meshlets to the highest mips

#

at all

#

btw you don't need to interleave compute/FS, you can do the downsampling all at the end after everything has been rendered
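
(Editor's sketch, not from the chat.) The rule "only rasterize quarter-pages with no resident finer page over them" can be written as a per-page quadrant mask; the skipped quadrants would be filled later by downsampling the finer level. This assumes a mip-style layout where the finer level has twice as many pages per side. `MipLevel` and `quadrantsToRasterize` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct MipLevel {
    int pagesPerSide = 0;
    std::vector<std::uint8_t> resident; // row-major, 1 = resident

    bool isResident(int x, int y) const {
        assert(x >= 0 && y >= 0 && x < pagesPerSide && y < pagesPerSide);
        return resident[static_cast<std::size_t>(y) * pagesPerSide + x] != 0;
    }
};

// Bit q (q = qy * 2 + qx) is set when quadrant q of coarse page (cx, cy)
// still needs geometry drawn, i.e. the coarse page is resident and the finer
// page covering that quadrant is not. Returns 0 when nothing must be drawn.
std::uint8_t quadrantsToRasterize(const MipLevel& coarse, const MipLevel& fine,
                                  int cx, int cy) {
    assert(fine.pagesPerSide == 2 * coarse.pagesPerSide);
    if (!coarse.isResident(cx, cy)) return 0;
    std::uint8_t mask = 0;
    for (int qy = 0; qy < 2; ++qy) {
        for (int qx = 0; qx < 2; ++qx) {
            if (!fine.isResident(2 * cx + qx, 2 * cy + qy)) {
                mask |= static_cast<std::uint8_t>(1u << (qy * 2 + qx));
            }
        }
    }
    return mask;
}
```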

#

clever, eh?

wicked notch Oct 28, 2023, 11:30 PM

#

hm

cold sky Oct 28, 2023, 11:30 PM

#

~~this can probably cut your Mesh/Vertex processing time by 16x~~

#

nvm, still a good gain

delicate rain Oct 28, 2023, 11:32 PM

#

I suggested something similar way back, but not with rasterizing the 1/4s that were not resident; I wanted to downsample when you switch clip and the higher clip is fully covered by the lower clips

#

I'm not sure it will bring that big of an improvement though

wicked notch Oct 28, 2023, 11:34 PM

#

worth trying

wicked notch Oct 29, 2023, 12:15 PM

#

I have implemented deferred binding updates as well as batching

#

However, there's a very sad fact

#

Updating a couple dozen pages takes 5ms

frank sail Oct 29, 2023, 12:17 PM

#

let it go m8

wicked notch Oct 29, 2023, 12:17 PM

#

no

#

more tricks are to be done

#

I really wanna do caching

#

but my brain is too small

#

Jaker can I trouble you to explain how to snap the camera to page offsets once again

#

also how to mark pages dirty

#

🥺

frank sail Oct 29, 2023, 12:19 PM

#

wicked notch more tricks are to be done

I mean the API sparse lol

wicked notch Oct 29, 2023, 12:19 PM

#

ye

frank sail Oct 29, 2023, 12:19 PM

#

sounds not worth

wicked notch Oct 29, 2023, 12:20 PM

#

I'm fully convinced that caching will let HW sparse shine

#

I just don't really understand how to do caching

frank sail Oct 29, 2023, 12:20 PM

#

frogstare

frank sail Oct 29, 2023, 12:20 PM

#

wicked notch also how to mark pages dirty

if page was allocated and hasn't been rendered to, it is dirty

#

I set this bit in the allocator
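
(Editor's sketch, not from the chat.) The rule above, "allocated and not yet rendered to means dirty", fits naturally into the allocator itself. This is a toy CPU allocator with hypothetical names; the real one in the chat is not shown and may well live on the GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical page allocator that tracks a dirty bit per physical page.
class PageAllocator {
public:
    explicit PageAllocator(std::size_t pageCount) : pages_(pageCount) {}

    // Hands out the first free page and marks it dirty, because nothing has
    // been rendered into it yet. Returns nullopt when the pool is exhausted.
    std::optional<std::size_t> allocate() {
        for (std::size_t i = 0; i < pages_.size(); ++i) {
            if (!pages_[i].allocated) {
                pages_[i].allocated = true;
                pages_[i].dirty = true;
                return i;
            }
        }
        return std::nullopt;
    }

    // Called after the shadow pass has drawn into the page.
    void markRendered(std::size_t index) {
        assert(index < pages_.size() && pages_[index].allocated);
        pages_[index].dirty = false;
    }

    void release(std::size_t index) {
        assert(index < pages_.size());
        pages_[index] = Page{};
    }

    bool needsRender(std::size_t index) const {
        assert(index < pages_.size());
        return pages_[index].allocated && pages_[index].dirty;
    }

private:
    struct Page {
        bool allocated = false;
        bool dirty = false;
    };
    std::vector<Page> pages_;
};
```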

wicked notch Oct 29, 2023, 12:21 PM

#

What if a page remains allocated for two frames in a row but the camera snaps to another offset

frank sail Oct 29, 2023, 12:21 PM

#

nothing happens to it

#

because of my epic wrap addressing
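
(Editor's sketch, not from the chat.) The wrap addressing is not spelled out here, so this is only one plausible reading: page-table slots are indexed by a page's absolute coordinate modulo the table size, so a page that stays in view keeps its slot when the clipmap origin shifts by whole pages. The names and the "local + origin" convention are assumptions.

```cpp
#include <cassert>

// Modulo that stays non-negative for negative inputs.
int wrapIndex(int value, int size) {
    assert(size > 0);
    const int m = value % size;
    return m < 0 ? m + size : m;
}

// Hypothetical convention: clipmapOrigin is the window's offset in pages and
// localPage counts pages from the window's edge. Their sum names a fixed page
// in the light's stable space, and wrapping it picks the page-table slot.
int pageTableSlot(int localPage, int clipmapOrigin, int pagesPerSide) {
    return wrapIndex(localPage + clipmapOrigin, pagesPerSide);
}
```

With 8 pages per side, the page at local 5 with origin 0 and the same page at local 3 after the origin moves to 2 both map to slot 5, so nothing has to be copied or re-rendered.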

frank sail Oct 29, 2023, 12:22 PM

#

wicked notch Jaker can I trouble you to explain how to snap the camera to page offsets once a...

what part do u need explain't

frank sail Oct 29, 2023, 12:23 PM

#

wicked notch I'm fully convinced that caching will let HW sparse shine

I don't see how tbqh

wicked notch Oct 29, 2023, 12:23 PM

#

frank sail what part do u need explain't

the snapping, how do you decide when to move the origin of the clipmaps, how do you calculate that

frank sail Oct 29, 2023, 12:24 PM

#

the beauty is that there is no "decision" as in an if statement

#

but ye lemme show u

#

do you have VirtualShadowMaps.cpp open already

#

the function in question is DirectionalVirtualShadowMap::UpdateOffset

#

I can explain each line

wicked notch Oct 29, 2023, 12:25 PM

#

Alright, please do KEKW

#

I feel very dumb

frank sail Oct 29, 2023, 12:25 PM

#

I'll post it

#

this will be the reference I guess

  void DirectionalVirtualShadowMap::UpdateOffset(glm::vec3 worldOffset)
  {
    for (uint32_t i = 0; i < uniforms_.numClipmaps; i++)
    {
      // Find the offset from the un-translated view matrix
      uniforms_.clipmapStableViewProjections[i] = stableProjections[i] * stableViewMatrix;
      const auto clip = stableProjections[i] * stableViewMatrix * glm::vec4(worldOffset, 1);
      const auto ndc = clip / clip.w;
      const auto uv = glm::vec2(ndc) * 0.5f; // Don't add the 0.5, since we want the center to be 0
      const auto pageOffset = glm::ivec2(uv * glm::vec2(context_.pageTables_.Extent().width, context_.pageTables_.Extent().height));
      uniforms_.clipmapOrigins[i] = pageOffset;

      const auto ndcShift = 2.0f * glm::vec2((float)pageOffset.x / context_.pageTables_.Extent().width, (float)pageOffset.y / context_.pageTables_.Extent().height);
      
      // Shift rendering projection matrix by opposite of page offset in clip space, then apply *only* that shift to the view matrix
      const auto shiftedProjection = glm::translate(glm::mat4(1), glm::vec3(-ndcShift, 0)) * stableProjections[i];
      viewMatrices[i] = glm::inverse(stableProjections[i]) * shiftedProjection * stableViewMatrix;
    }

    uniformBuffer_.UpdateData(uniforms_);
  }

#

so first off, we are calculating a separate offset for each clipmap (since each one has a different page size)

#

the offset we want needs to be a multiple of the page size for a given clipmap

#

this line calculates the viewproj of the clipmap as if it were locked to the origin (it still has a rotation component)

uniforms_.clipmapStableViewProjections[i] = stableProjections[i] * stableViewMatrix;

wicked notch Oct 29, 2023, 12:29 PM

#

How does a stable projection differ from a non stable one

frank sail Oct 29, 2023, 12:30 PM

#

at one point I was offsetting the projection matrix rather than the view matrix, but that fucked up math for other stuff later on, so I removed it

#

now I just explicitly call them stable so I'm certain about what I'm looking at

frank sail Oct 29, 2023, 12:31 PM

#

frank sail this line calculates the viewproj of the clipmap as if it were locked to the ori...

the page offset will be computed with this matrix btw

#

it provides a reference point (coordinate space?) I guess

#

oh, btw, worldOffset is the position of the player camera. we are trying to center the clipmap on it

wicked notch Oct 29, 2023, 12:33 PM

#

hm

frank sail Oct 29, 2023, 12:34 PM

#

these three lines are transforming the player coord to [-0.5, 0.5] space (half NDC space?) of the stable clipmap viewproj we just made

      const auto clip = stableProjections[i] * stableViewMatrix * glm::vec4(worldOffset, 1);
      const auto ndc = clip / clip.w;
      const auto uv = glm::vec2(ndc) * 0.5f; // Don't add the 0.5, since we want the center to be 0

#

note that it's perfectly fine for the resulting coord to not actually be in [-0.5, 0.5]. that just means it's not within the frustum of the stable viewproj (which is highly likely for the smaller clipmaps)

#

to get the all-important offset, we just multiply that "uv" by the number of pages in the clipmap

const auto pageOffset = glm::ivec2(uv * glm::vec2(context_.pageTables_.Extent().width, context_.pageTables_.Extent().height));

#

(btw it might be better to round than to truncate, idk)

#

anywho, this offset tells us how many page widths we need to translate the clipmap camera to be approximately centered on the player

delicate rain Oct 29, 2023, 12:38 PM

#

frank sail (btw it might be better to round than to truncate, idk)

I think if you trunc you will get bad behavior when you switch from positive to negative no?

#

since -1 -> 0 <- 1

frank sail Oct 29, 2023, 12:39 PM

#

at worst you'll be off by one page which probably isn't noticeable ever

delicate rain Oct 29, 2023, 12:39 PM

#

it can be noticeable for the higher clipmaps

#

but yeah

frank sail Oct 29, 2023, 12:40 PM

#

what I mean is that the camera won't be perfectly centered on the player

#

it's not like the shadow will be wrong

delicate rain Oct 29, 2023, 12:40 PM

#

ah I see what you mean

#

yeah

frank sail Oct 29, 2023, 12:40 PM

#

that's why it's probably impossible to notice unless you are somehow looking at every page in the clipmap
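
(Editor's sketch, not from the chat.) The truncate-versus-round question comes down to how the cast treats values around zero: converting to `int` truncates toward zero, so everything in (-1, 1) collapses to page 0, which is the `-1 -> 0 <- 1` point above. The three helpers are hypothetical 1D stand-ins for the `glm::ivec2(...)` conversion in the posted code.

```cpp
#include <cassert>
#include <cmath>

// Same behaviour as the integer conversion in the posted code: truncates
// toward zero, so the bin around zero is twice as wide as the others.
int pageOffsetTrunc(float uv, int pages) {
    assert(pages > 0);
    return static_cast<int>(uv * pages);
}

// Uniform bins: always snaps to the page boundary at or below the position.
int pageOffsetFloor(float uv, int pages) {
    assert(pages > 0);
    return static_cast<int>(std::floor(uv * pages));
}

// Uniform bins centred on page boundaries: snaps to the nearest one.
int pageOffsetRound(float uv, int pages) {
    assert(pages > 0);
    return static_cast<int>(std::lround(uv * pages));
}
```

Either way the error is at most about one page of centering, which matches the "off by one page" remark.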

#

anyways

#

back to the explanation

#

the projection matrix allows us to conveniently apply a shift in NDC space by translating it

      const auto ndcShift = 2.0f * glm::vec2((float)pageOffset.x / context_.pageTables_.Extent().width, (float)pageOffset.y / context_.pageTables_.Extent().height);
      const auto shiftedProjection = glm::translate(glm::mat4(1), glm::vec3(-ndcShift, 0)) * stableProjections[i];

#

we are only interested in translating it on XY because we don't want depth to get fucked when the player moves

#

so we're basically sliding the bad boys on a plane

frank sail Oct 29, 2023, 12:45 PM

#

frank sail the projection matrix allows us to conveniently apply a shift in NDC space by tr...

btw I originally tried right-multiplying the projection by the translation (putting the projection inside the glm::translate call), but that broke somehow nervous

wicked notch Oct 29, 2023, 12:45 PM

#

one quicc question

#

the mul by 2 is to go back to ndc?

frank sail Oct 29, 2023, 12:47 PM

#

pageOffset / numberOfPages generates a UV-space value

#

but projections work in NDC (actually clip space but yolo) I guess

#

so ye u are correct
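
(Editor's sketch, not from the chat.) The factor of 2 in isolation: `pageOffset / numberOfPages` is a UV-sized step where the whole clipmap spans 1 unit, while NDC spans 2 units (-1 to 1), so the same step is twice as large there. `Vec2` and the function name are hypothetical stand-ins for the glm types in the posted code.

```cpp
#include <cassert>

struct Vec2 {
    float x;
    float y;
};

// Converts a whole-page offset into the equivalent shift in NDC units.
// UV covers [0, 1] and NDC covers [-1, 1], hence the factor of 2.
Vec2 ndcShiftFromPageOffset(int offsetX, int offsetY, int pagesX, int pagesY) {
    assert(pagesX > 0 && pagesY > 0);
    return Vec2{2.0f * static_cast<float>(offsetX) / static_cast<float>(pagesX),
                2.0f * static_cast<float>(offsetY) / static_cast<float>(pagesY)};
}
```

For example, an offset of 4 pages in a 16-page-wide table is a quarter of the clipmap, which is 0.5 in NDC.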

wicked notch Oct 29, 2023, 12:48 PM

#

if it worky it worky

frank sail Oct 29, 2023, 12:48 PM

#

here is the final line in the loop

viewMatrices[i] = glm::inverse(stableProjections[i]) * shiftedProjection * stableViewMatrix;

#

ok I think that line is stupid

#

I mean it works

wicked notch Oct 29, 2023, 12:50 PM

#

Yeah I dunno what the hell is going on here

frank sail Oct 29, 2023, 12:50 PM

#

I'm basically extracting the shift from the shiftedProjection (undoing all the projection parts) and then applying it to the view matrix

#

so it's just translating the view matrix bleakekw

frank sail Oct 29, 2023, 12:51 PM

#

frank sail the projection matrix allows us to conveniently apply a shift in NDC space by tr...

so this line is stupid because the view matrix already has this property

#

except it's view space instead of NDC, which is trivial to convert to

wicked notch Oct 29, 2023, 12:53 PM

#

so basically

#

you make a shifted projection using a stable projection

frank sail Oct 29, 2023, 12:53 PM

#

the last three lines could probably be replaced by viewMatrices[i] = glm::translate(stableViewMatrix, glm::vec3(pageOffset * frustumSize, 0));

wicked notch Oct 29, 2023, 12:53 PM

#

then you undo the "projection" part of "shifted projection" by multiplying with its inverse

#

and that's your translated view matrix

frank sail Oct 29, 2023, 12:53 PM

#

ye

#

it's kinda dumb though as I just noted
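
(Editor's sketch, not from the chat.) A one-dimensional check of why the `inverse(projection) * shiftedProjection` round trip is just a view-space translation, assuming an orthographic projection (which is what a directional light uses). `Ortho1D` and both functions are hypothetical and deliberately reduced to one axis.

```cpp
#include <cassert>

// 1D orthographic projection: ndc = scale * x + bias.
struct Ortho1D {
    float scale;
    float bias;
    float apply(float viewX) const { return scale * viewX + bias; }
    float invert(float ndc) const { return (ndc - bias) / scale; }
};

Ortho1D makeOrtho(float left, float right) {
    assert(right > left);
    return Ortho1D{2.0f / (right - left), -(right + left) / (right - left)};
}

// What the posted line does: project, shift in NDC, then undo the projection.
float viaProjectionRoundTrip(const Ortho1D& p, float viewX, float ndcShift) {
    return p.invert(p.apply(viewX) - ndcShift);
}

// The direct version: the bias cancels, leaving a plain view-space translation.
float viaViewTranslation(const Ortho1D& p, float viewX, float ndcShift) {
    return viewX - ndcShift / p.scale;
}
```

Since `scale` is 2 divided by the frustum width, `ndcShift / scale` works out to `pageOffset` times the world-space width of one page under this model, so the round trip through the projection is not strictly needed.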
