#Iris - A Journey through OpenGL and beyond to learn Graphics

1 messages · Page 10 of 1

raven orchid
#

Is this for binding individual pages? What’s the speed for binding the entire sparse texture?

wicked notch
#

binding as in vkQueueBindSparse?

raven orchid
#

I think I’m wondering if sparse can be attached to a frame buffer

#

Like the entire texture region (even if partially committed)

wicked notch
#

hol up lemme read the spec

#

ye there is no mention of sparse resources being incompatible with VkFramebuffer

wicked notch
#

I have acquired a machine capable of doing sparse

#

(for legal reasons I have to disclose that I did not, in fact, steal the machine)

frank sail
#

Grinding GP while on vacation 🫡

wispy spear
wicked notch
#

average idea of a break of a modern human

#

I also studied for exams all evening bleakekw

frank sail
#

Wtf

wispy spear
#

hook up with some locals

wicked notch
#

I do have stuff to do tonight as well

#

I thank god every day that my body seemingly has limitless energy

wicked notch
#

I don't feel tired for whatever reason bleakekw

#

maybe I am going to die

#

I am going to die soon yes

frank sail
#

I don't feel the physical sensation of being tired, but I notice that I become dumber

wicked notch
#

only possible explanation

wispy spear
#

hascheesh al jaffar al jaker... i think we will get nanite 3.0 very soon

#

running on some fujitsu t450, 2gig of ram, some pentium 4 thing

#

on a custom version of mesa

#

hmm jaffar ... reminds me of these

wicked notch
#

I can at least rest assured that AMD performs well with sparse

#

on RADV at least

#

pixel said amdgpu shouldn't be too different as well

#

so common amd W

wispy spear
#

then somebody needs to convince novideo to make it right on their chips too

frank sail
#

Yeah once ue6 is dependent on lvstri's stuff, they will

wicked notch
#

it would be cool to have an NV rep in here

#

just so I can see them and Jaker fight

wispy spear
#

we dont have a rep, but

#

someone joined their shop a year ago or so

wicked notch
#

use force if you need to

#

I have air support if needed

wispy spear
#

that opi from arm also seems to be a cool chap

#

tom olson

raven orchid
#

I did some experiments with OpenGL sparse virtual shadow maps today actually

wicked notch
#

holy pog

raven orchid
#

they seem to mostly work when used as a framebuffer target

wicked notch
#

that's music to my ears

#

very good, did you use the ARB ext?

raven orchid
#

yeah used the extension

#

support seems to be pretty strong for the 1060, so hopefully newer hardware is even better

wispy spear
#

eggcellent

raven orchid
#

but it allows me to commit and decommit, and I'm using compute to write to some readback textures which tells the CPU what to do

raven orchid
#

welll

#

performance is questionable right now lol

wicked notch
#

can I see 🥺 👉 👈

raven orchid
#

I might see if I can try stencil buffer rejection for the pages

frank sail
#

If it wurks

wicked notch
#

ye, the shrimple fact that it works is hands down incredible

raven orchid
#

I will try to capture the framebuffer

#

nsight really really hates 16k textures

#

at lesat on this gpu

#

the black parts are uncommitted

#

as the camera moves it allocs/frees pages

wicked notch
#

wrong message but aight

raven orchid
#

what do you think about stencil rejection for pages?

wicked notch
#

anyways, do you need to keep track of pages yourself?

raven orchid
#

I wonder if it would be viable

frank sail
#

Did you read the final fantasy shadows paper

raven orchid
#

well two of them

#

one for this frame, one to look at prev frame

raven orchid
wicked notch
raven orchid
#

hmmm maybe we need to ask Froyok

wicked notch
raven orchid
#

basically it seems that on OpenGL at least

#

if you read from an unalloced page, it returns all 0

#

so what if we explot that by setting alloced pages to some value other than 0

#

and have stencil rejection regions

frank sail
#

Anyways combining vsm with it would be cooler

raven orchid
#

or "if previous frame it was not alloced and this frame it is requested, alloc it"

wicked notch
raven orchid
#

maybe it is

#

this current impl is kind of hacked together just for testing

wicked notch
#

do you use morton codes btw

raven orchid
wicked notch
#

to store the pages

raven orchid
#

what are those froghorror

wicked notch
#

but they don't do VSM, their shadow map paper is mostly for point and area lights

frank sail
wicked notch
#

they have a pretty interesting idea where they make a frustum per screen tile and bin depth values, then they make a close up shadow frustum per tile, based on the binned depth values

#

but I dunno how it actually works

raven orchid
#

interesting

#

I definitely don't use those

#

the setup is

wicked notch
#

how do you access pages then thonk

raven orchid
#
  • one 16Kx16K sparse texture (hardware supported)
  • two 128x128 residency tables (regular textures)
  • a readback SSBO that the compute shader writes the changelist to
#

all I did was attach the texture to a framebuffer lol

wicked notch
#

you use textures for page tables? froghorror

raven orchid
#

yeah

#

just 2D int textures so I can do atomic ops

wicked notch
#

huh

#

you do any mip mapping by chance?

raven orchid
#

no everything is on one level for now

wicked notch
#

I see, very nice

#

how do you index into the page tables?

#

just uv / granularity?

raven orchid
#

yeah for the analysis part I just take the depth, convert to world space, then convert to shadow uvs and index the table that way

#

I might give the stencil thing a try later

wicked notch
#

If you wanted to do mips

#

I suppose you'd also mip map the page table right?

raven orchid
#

hmm

#

yeah I think you're right

#

we'd probably need a page table for each mip level right?

wicked notch
#

yeah definitely

#

making yet another branch..

#

btw how do your VSM look

#

let me see them sharp shadows

raven orchid
#

some sponza shadows

raven orchid
wicked notch
#

dayum

#

they do be sharp

raven orchid
#

not sure if it'll ever be ready to merge in

wicked notch
#

oh btw

#

do you ever evict

raven orchid
#

like evict a page from memory?

wicked notch
#

yeah

raven orchid
#

yeah right now I just have it set to

#

if a page was around last frame but not needed this frame, evict immediately

#

(not caching anything atm lol)

wicked notch
#

huh

#

how's 🅱️erf again?

raven orchid
#

not good

#

like it's definitely doing too much work each frame

wicked notch
#

wait, are you bound because of raster?

#

or because of memory management

raven orchid
#

it seems to be raster

#

I think it could be a lot better since it wastes time with non resident pages

#

which is kind of why I'm wondering if some kind of stencil method to reject invalid pages would help

#

my guess is page rejection will go a long way with performance

#

but to truly make it viable for older hardware caching would need to be really good

wicked notch
#

hmm I'm wondering what is the probability that NV's driver is actually doing some deferred eviction kind of thing

raven orchid
#

Yeah it’s probably pretty high

#

Hmmm good point though, better eviction strategy is definitely needed

wicked notch
#

my idea for Vulkan was not evicting, ever

#

instead I keep track of pages internally and just overwrite stuff where things change

#

because the idea is that you allocate big chunks of memory and then you suballocate them to sparse pages

raven orchid
#

That would work

wicked notch
#

eviction in this case would mean deallocating the big chunk of memory

raven orchid
#

Do you plan to just have 1 big memory pool?

#

Or multiple?

wicked notch
#

A few

#

well I'm thinking to make the pool size 1GiB

#

or something crazy high like that

#

it's something to experiment

raven orchid
#

One really nice thing you could exploit is caching if you go that route now that I think about it

#

Since you never remove pages

wicked notch
#

eh

#

yes

#

but that means more bookkeeping

raven orchid
#

True

wicked notch
#

also can you trust the GPU to not invalidate memory froghorror

finite quartz
raven orchid
finite quartz
wicked notch
#

I guess it's raster only

#

so maybe some other sort of culling is necessary

#

mesh shaders would be so good here

#

you can just use a task shader to cull invalid pages and dispatch only the necessary meshlets

raven orchid
#

Yeah think this one is raster bound mainly, dang

raven orchid
#

That’s massive actually

wicked notch
#

The alternative is work expansion in compute shaders

#

which should work just as well, it's just painful to implement bleakekw

#

the core idea is the same

#

check if the page is invalid, check whether any meshlets overlap that page, if both are true cull meshlets for that page

raven orchid
#

Yeah true that would work

#

I saw a paper a while back too where they estimate if a shadow will intersect the frustum

wicked notch
#

the only painful thing to check for is whether said meshlet overlaps many pages

#

they also do HZB wtf

#

HZB except for memory pages froghorror

#

so much indirection

#

I lost count

raven orchid
#

Hmmm yeah that’s also true

#

Though I do wonder how much better your perf will be since you’re using nanite 2

#

I’m testing just with regular meshes

wicked notch
#

probably a bit better

#

the reason it works so well is due to the finer grained nature of meshlets

#

it could work just as well with regular meshes to be fair

#

just perhaps less of an impact in 🅱️erf

wicked notch
#

it is time for me to attend a party

#

I shall ponder VSM there too

wispy spear
#

: )

wicked notch
#

if I'll be still lucid by then

wispy spear
#

have some fun irl

#

nanite 2.0 can wait

wicked notch
#

ye it's unlikely that I will ponder them bleakekw

wispy spear
#

we need a cooler name for it too btw 🙂

wicked notch
#

I'll leave that to my aide JStephano

wispy spear
#

dropped jaker like a hot potato

wicked notch
#

if Jaker is my right hand man, JStephano is my left hand man

wicked notch
frank sail
#

They'll be impressed

wicked notch
#

finally

#

the thing that will destroy darian

#

not even I can load this

#

it crashes while trying to allocate 20GB of VRAM (totally reasonable and valid amount of memory)

frank sail
#

yeah you need an AMD GPU to load that

wispy spear
#

one of them workstation ones with 48 jigs?

#

why one file btw 🙂 why not split it into chunks?

#

"somehow"

twin bough
#

omg

wicked notch
#

I could

#

in fact I will

twin bough
#

render it plz

#

i wanna see xd

wicked notch
#

I'll send it to darian soon

#

I have a gigabit internet here, luckily the netherlands are a civilized place

twin bough
#

seems like the future of game sizes will be bleak.. 16gb of just geometry seems...

wicked notch
#

fuck I don't even have enough space on my drive

finite quartz
wispy spear
#

wait until jakers vkspec llm turns hostile and generates mesh out of text

wicked notch
#

I envy my friend so much right now, his workstation exported 30GB of FBX and converted it into 20GB of glTF, all in 2 hours

wispy spear
#

some threadrepperino?

wicked notch
#

ye

#

one you can't actually buy

#

a 5995WX or something

#

whatever it's called bleakekw

wispy spear
#

ah the 64c one

#

a potential 96c one is coming soon 😉

distant lodge
#

is that its operating temperature

wicked notch
#

no that's the idle one

#

operating temperature is somewhere between the surface of the sun and the core of super star WR102f

wispy spear
#

wolf rayet stars are cool shit

wicked notch
#

alright then

#

I guess I should get started

#

may the god of sparse watch over me

wispy spear
wicked notch
#

ahh yes, don't you love it when a single layered/mipmapped texture takes up 32GiB of VRAM

raven orchid
#

Oh wow wtf. Just from one 16k mipmapped texture?

wicked notch
#

I also layered it 32 times over bleakekw

glass sphinx
wicked notch
#

(MJP did, god bless him, literal MVP)

frank sail
#

all this talk about vsm makes me want to make my own vsm thingy 😄

wicked notch
#

do it

#

it's the shadow technique to end all shadow techniques after all

frank sail
#

Since I have plenty of other things to look at

wicked notch
#

I'm not sure

#

on one side GL's sparse is much easier

#

on another side I know it's gonna perform very bad so fake sparse will be necessary bleakekw

frank sail
#

The impl I was thinking of wouldn't use real sparse anyway

wicked notch
#

I'm looking at SVT's slides right now

#

and I guess I get it

#

but the UV remapping to the real texture is weird

#

the hell does this mean bleakekw

#

paging is offset galore truly

frank sail
#

tl;dr indirection

wicked notch
#

I like how phys_tc doesn't exist

frank sail
#

I feel like reading code will just increase confusion at this point

#

The whole thing just seems like indirection and basic arithmetic

wicked notch
#

ok yes it's probably less clamplicated than I expected

#

I still fail to see how to translate virtual uv to physical uv though

#

ok suppose I wanna make a physical texture that will have at most 4096 128x128 pages

#

so basically a 8192^2 R32_SFLOAT texture

#

then I take a regular 2k texture and stuff its pages in the physical texture

#

now suppose I want to get the texel at uv (1, 1)

#

This will translate to page indexed (127, 127) I think?

#

which will again translate to somewhere in the physical texture, suppose the page also gives us the uv offset into the physical texture

frank sail
wicked notch
#

yes

#

My plan is to use multiple backing textures

frank sail
#

why

wicked notch
#

how do I decide how big to make the backing storage

#

isn't it better to have many smol ones

frank sail
#

Compile time constant smart

#

But yeah that's fair

wicked notch
#

ah yes

frank sail
#

If you have a strict budget that you can decide at init time, then surely you can get by with one big boy texture

wicked notch
#

anywho

#

we got the correct offset into the physical page that has the sample we're looking for

#

horray!

#

except it's just an offset, specifically it's the lower left corner of the page, we need to offset the offset by something

frank sail
#

um here's what I was imagining

vec2 virtual_uv = WorldPosToShadowUv();
ivec2 page_texel = ivec2(pagesDimensions * virtual_uv);
vec2 physical_uv = fract(pagesDimensions * virtual_uv);
int physical_index = texelFetch(s_virtualToPhysical, page_texel).x;

float shadow_depth = textureLod(s_shadowArray, vec3(physical_uv, physical_index), 0.0).x;
#

I made the backing texture an array of pages to spice things up

wicked notch
#

hmm

#

wouldn't it be phys_uv = fract(backingMemorySize * virtual_uv)

frank sail
wicked notch
#

are 0, 1 and 2 the backing textures

frank sail
#

these are pages

#

we don't know or care where they (physically) are, nor their size

#

the virtual2physical texture will tell us where they are physically located

#

and I guess what you do with size depends on how you handle the backing memory

#

maybe

#

using an array for the backing memory seems to most logical here, since it doesn't give any false impression that virtual pages map directly to physical ones

wicked notch
#

hold on

#

I must draw as well

frank sail
#

this picture is gonna be the hottest drop of 2023

wicked notch
#

here it comes

#

alright so lemme explain this shit

#

the first texture, marked with zero is the shadow map

#

let's say we are good boys and it's only 2k

#

we wanna sample in a particular texel

#

alright no prob, we go onto texture 1, the page table

#

the page table tells us the uv of the lower left corner of the physical page in the backing texture

#

very nice, let's go onto texture number 2, the humongous backing texture

#

now here's a problem, we know the page we want to sample from, but the actual texel info is lost, we cannot use the original offset because the texture sizes mismatch

#

and we don't even have an offset to begin with, since we only know the virtual uvs

#

how do we deal with translation in this case

frank sail
#

ok I realized an error in my code

#

virtualShadowMapSize * virtual_uv
should be
pagesDimensions * virtual_uv

frank sail
wicked notch
#

ok so we get the uv of the lower left corner of the page we are sampling in the virtual texture

#

then we subtract = offset into the page

#

right?

frank sail
#

er

#

maybe I mean the second step

#

actually lemme think bleakekw

wicked notch
#

pondering together

#

let's link our brains

frank sail
#

ok so I guess you can ignore the actual size of the virtual texture here (when sampling)

#

you just need to know the UV and how many pages there are on each axis

#

I'll draw another pic

wicked notch
#

fake sparse is painful

#

offsets are spooky

frank sail
#

surely knowing the dims of all the textures involved is sufficient

wicked notch
#

ok what's the actual uv then

#

the part we're missing is missing bleakekw

frank sail
#

lol I didn't want to calculate it since it would be a yucky number

#

but we know where the physical page is and the offset into it, so it would not be hard

wicked notch
#

it should be lower left corner + (0.25, 0.25) * ????

frank sail
#

1/3 is your last number innit

wicked notch
#

1 / num pages ye

#

we solved it

#

incredible

frank sail
#

how did we solve elementary skool bleakekw

wicked notch
#

listen

frank sail
#

I mean pass bleakekw

wicked notch
#

we are kindergarten students

#

it's ok

frank sail
# frank sail

I like the random red herring of specifying that the whole VSM is 256x256 and then never mentioning page sizes again

raven orchid
#

This is awesome lol

wicked notch
#

ok jaker

frank sail
raven orchid
#

Full blown virtual texturing next I guess

wicked notch
#

now solve linear mip transitions

frank sail
#

erm

#

I mean

wicked notch
frank sail
wicked notch
#

is there a spec on how shrimplers work

frank sail
#

you can use hw filtering for intra-page filtering, then handle edge cases yourself as mjp mentioned

wicked notch
#

like linear filtering in Vk or GL

frank sail
wicked notch
#

good

#

I thought it was magic implementation defined black boxes

#

like aniso bleakekw

frank sail
#

Only aniso

wicked notch
#

reinventing samplers one step at a time

#

only in GP

frank sail
#

mipmap filtering (trilinear) is actually quite shrimple

#

you just sample each mip, maybe filter the per-mip samples, then blend them together with fract(lod)

wicked notch
#

good ol lerp

raven orchid
# wicked notch

So I guess with this one the 2048^2 represents our “true” shadow map which covers the entire scene

wicked notch
wicked notch
#

it should've been 256^2 because then I would've actually been able to understand what was going on bleakekw

#

the only thing left to do is the allocator I guess

#

but that's a problem for tomorrow

#

tomorrow's me will surely be far more intelligent and not dumb at all, for sure

#

thank god the allocator is the same both for fake and real sparse

frank sail
#

Real fake allocators

wicked notch
#

how do I name this allocator

#

sparse page allocator?

#

real fake sparse page allocatornbleakekw

frank sail
#

William

wicked notch
#

william it is

wispy spear
#

Horse Conch

#

might have been a better name 😛

wicked notch
#

Damn, I wake up to another, 2 hours long, uber detailed explanation

wispy spear
#

hey thats devsh

frank sail
#

so, what heuristics are used to determine if a page is resident? do we just analyze the gbuffer each frame?

#

I guess that is easy enough

wicked notch
#

just depth

frank sail
#

ye

#

big brain: don't mark request it resident if albedo is 0 since the shadow won't be visible anyways

wicked notch
#

step 1. unproject_depth()
step 2. to_page_index()
step 3. make_resident(page_index)

frank sail
#

so the only hard part about this is the CPU readback crap

#

unless

#

allocating/freeing pages via a compute shader (only viable with fake sparse of course)

wicked notch
#

how the hecc are you supposed to allocate pages with compute

#

you need to call in API functions no?

frank sail
#

atomics or somethin

#

oh I mean if you have fixed backing memory already

wicked notch
#

hm then yes

frank sail
#

not having to do cpu readback is compelling ngl

wicked notch
#

but it's necessary once you run out of backing memory

#

otherwise it isn't very virtual at all bleakekw

frank sail
#

the virtual part is remapping pages

#

and possibly pretending you have more memory than in reality

#

you can make this work with clever use of clipmaps and fixed storage I'm sure

wicked notch
#

perchance

frank sail
#

for instance, a worst-case where you can see the whole scene can just allocate pages for a coarser map instead of filling the entire higher-detail map

#

I guess determining which level of the clipmap to use is when the heuristics get hard

#

maybe not

#

you can just prioritize coarser clipmap levels when you need to evict

#

assuming they overlap like this

frank sail
#

with real sparse, you don't need to max out your budget immediately

#

well I guess with non-sparse you can just create separate textures that act as your memory allocations, so idk

#

but yeah I think figuring out how to handle eviction in the worst cases will be the hardest part of this

wispy spear
#

could you use the other 10bits for that perhaps?

wicked notch
#

the way I was thinking about memory is the same for fake and real sparse

#

with real sparse I would've allocated 256MiB VkDeviceMemory every time

#

with fake sparse I allocate 256MiB physical textures

frank sail
wicked notch
#

but ye eviction will be very bad to handle

#

maybe if block of pages hasn't been touched in X frames evict

#

depending also maybe on the remaining budgets, how many pages are currently active, etc

frank sail
#

I think you should only evict if needed

#

that way you can avoid writes if nothing changed

wicked notch
#

ye, if the remaining budget is: "a lot", then no eviction

wispy spear
#

perhaps you could just play with different strategies

#

and see which is less weird

frank sail
#

I reckon there needs to be an atomic queue (or multiple) for pages

wicked notch
#

hm?

frank sail
#

when you allocate a page, push to the queue. when you need to dealloc, pop the oldest thing

wicked notch
#

oh, nah I just use morton codes

frank sail
#

idk I remember they did something similar for surfel allocation in surfel GI bleakekw

wicked notch
#

I make an array of 128x128 unsigned integers, last bit is the valid bit, each frame I clear everything to invalid and then I readback the valid pages and mark them

frank sail
#

alright so that gives us the invisible pages I suppose

#

but that also means no reuse unless you use another bit

wicked notch
#

what does this bit do

frank sail
#

it tells us that something has been rendered to the page at some point

#

if the page becomes visible again, we needn't re-render the page unless the sun moved or that area of the scene changed

wicked notch
#

ah

frank sail
#

which is admittedly a bit of an endgame optimization

wicked notch
#

yeah

#

lol

#

caching comes far into the future

frank sail
#

JS already has shadow map caching, hurry up frog_whip

wicked notch
#

I'm still conflicted on whether I should first finish real sparse or already move on with fake sparse

frank sail
#

if you do real sparse, I'll do fake sparse

#

then we can compare

#

I'll have to go to bed now because I need to wake up and perform manual labor is 5-6 hours bleakekw

wicked notch
#

💀

wicked notch
#

Just because

raven orchid
#

Every frame ended up being excessive

#

The read back is an ssbo that compute atomically pushes work to for the CPU

#

Though it still might be necessary to add a sort of allocator hardware budget to prevent it from trying to dealloc too many pages during one frame

wicked notch
#

hmm another problem has arose

#

Not every virtual texture will have the same resolution, some may be 16K, others may be 4K, so their page table sizes too will differ

#

Eh actually this is easily solvable, I just have to make yet another buffer storing each texture's page table offset in the main array

wicked notch
#

why are you asking this here bleakekw

runic surge
#

OH FUCK

wicked notch
#

I dun have a stretchy monitor

runic surge
#

I THOUGHT THIS WAS QUESTION

#

IM SORRY

wicked notch
#

no prob lmao

runic surge
wicked notch
#

Jaker was indeed right when he told me to mark reusable pages

#

too bad I'm stoopid

wicked notch
#
void main() {
    const ivec2 position = ivec2(gl_GlobalInvocationID.xy);
    const uvec2 size = imageSize(u_visbuffer);
    if (any(greaterThanEqual(position, size))) {
        return;
    }
    const uint depth_bits = uint((payload >> 34) & 0x3fffffff);
    if (depth_bits == 0) {
        return;
    }
    const float depth = uintBitsToFloat(depth_bits);
    const vec2 uv = (vec2(position) + vec2(0.5)) / vec2(size);
    vec4 world_position = inverse(u_camera.data.pv) * vec4(uv * 2.0 - 1.0, depth, 1.0);
    world_position /= world_position.w;
    const vec4 shadow_uv = shadow_data_ptr.pv * world_position;
    const uint shadow_tile_x = min(uint(shadow_uv.x * VSM_RESOLUTION / VSM_TILE_SIZE_X), VSM_TILE_SIZE_X - 1);
    const uint shadow_tile_y = min(uint(shadow_uv.y * VSM_RESOLUTION / VSM_TILE_SIZE_Y), VSM_TILE_SIZE_Y - 1);
    const uint shadow_tile_index = shadow_tile_x + shadow_tile_y * VSM_PAGE_COUNT;
    atomicAdd(vsm_page_req_ptr.pages[shadow_tile_index], 1);
}
``` amazing
wicked notch
#

holy god

#

this is horrifying to look at

frank sail
#

Is this real sparse

wicked notch
#

well this is just the page request part of real sparse

#

the thing where you see which pages are needed

frank sail
#

That's just analyzing the gbuffer eh

#

Depth

#

Wait so you aren't even touching the sparse texture here froghorror

wicked notch
#

ye

#

mfw writing to host memory takes time

#

shocking

frank sail
#

Ah

glass sphinx
frank sail
#

You should probably transfer to host after this pass or something

glass sphinx
#

🥸

wicked notch
#

ye I'll do some transfer shenanigans

frank sail
#

Or use device local memory

#

With host visible

wicked notch
#

Darian said it's even more painful than HOST_CACHED bleakekw

frank sail
#

Fake

#

Just try it bro

wicked notch
#

alright

#

surely much better

#

but will I read correct data?

#

or rather

#

will reading from the CPU be absurdly slow?

#

I shall now test

#

it takes half a millisecond to read 16KiB of data

#

jesus

frank sail
#

hmm

wicked notch
#

I think a dedicated transfer operation is necessary

frank sail
#

Ye

#

Seems the least finicky too, since memory types can change per device

wicked notch
#

I don't need to care about memory types though do I?

#

Like I always transfer from DEVICE_LOCAL to HOST_CACHED

wicked notch
#

yup, dedicated transfer op is a noop basically lol

#

and reading takes 2 microseconds

#

amazing

frank sail
wicked notch
#

Hmm

#

once a virtual page is mapped to a physical page, should this mapping remain the same until the virtual page is evicted?

#

or rather, overwritten, not evicted

frank sail
#

I don't see why you'd need to shuffle around allocated page tbh

wicked notch
#

maybe not for shadows

#

but for textures in general

#

actually, maybe for shadows too

#

I am suballocating from a 256MiB VkDeviceMemory after all, I need to free up space whenever possible

#

Say in frame 0 we draw to virtual page 0, this will map to physical page 7 or something; now suppose we never draw to virtual page 0 again, the physical page 7 will never be used again

#

If I don't "free" it

frank sail
#

Ok so you can shuffle allocated, but unused (freed) pages

#

That makes sense

wicked notch
#

ok good

#

all this indirection is destroying my single shared neuron

wispy spear
#

here, have mine.

wicked notch
#

ok the roadmap is as follows

#

step 1. get id of the virtual pages requested from the GPU
step 2. invalidate all pages
step 3. for each requested id, if it's not resident allocate new page

#

I am skeptical about the invalidate all pages step, but it's a logical operation so it's fine I guess

#

Jaker can you fact check

#

JS you're welcome to share the neuron and fact check too

frank sail
#

Invalidate all pages seems okay

#

I think it will even work when you add caching, provided you add a little more bookkeeping

wicked notch
#

I'm thinking of it like a linear allocator

#

that gets reset every frame

#

except already resident pages that didn't change shouldn't be touched

#

so actually hold on

frank sail
#

I'm holding for dear life

wicked notch
#

ok I got it

#

I keep track of 2 things

#

allocated pages and non allocated pages, for the allocated pages I also keep track if they are resident or not

#

when I get the requests from the GPU, there are a lot of things I have to do

#

while I was writing I decided that this is stupid

#

Better method: only keep track of allocated and unallocated pages

#

Now:

  1. requested page is already allocated => do nothing
  2. requested page is not allocated => allocate and update sparse bindings
  3. if page wasn't requested => deallocate it
#

how does this sound

raven orchid
#

Most of the time that will be really good

#

One exception I ran into was big camera changes if you immediately dealloc

wicked notch
#

the only time I call in vkQueueBindSparse is if 2 happens

raven orchid
#

Like frame 3 might get overwhelmed with dealloc requests

wicked notch
#

Hm

raven orchid
#

Then frame 4 the dealloced pages get requested again

wicked notch
#

btw deallocation in my case effectively does nothing

#

it just changes one bit in the page table

raven orchid
#

ah ok I don't think that will be a problem then

wicked notch
#

I should be safe right?

#

ok good

raven orchid
#

so memory still stays around?

wicked notch
#

ye memory and sparse bindings are untouched

raven orchid
#

Ok I think that will be fine since it's a cheap operation

wicked notch
#

I make a promise to not use that page again

#

if I break the promise I die

#

actually

#

I can break the promise

#

until I do caching

#

if I do caching and break the promise then I really will die

wispy spear
#

no worries, ill resurrect you

frank sail
#

I wonder if SDSM would even be a reasonable thing to implement with VSM

#

caching would instantly die

#

eh, it would probably suck since the whole point of SDSM is to tightly fit the shadow map to the frustum, which would play poorly with the concept of sparse allocation

twin bough
#

speaking of shadows

#

did anyone of you guys tried Tetrahedron Shadow Mapping yet

#

i want to learn it because it seems a really cool to have omni shadows on a 1x1 grid

frank sail
#

I have not seen it

twin bough
frank sail
#

only free resource about it is some student project
bleakekw

twin bough
#

i have a huge problem right now regarding omni shadows because the filter kernel leaks into the other parts of the cubemap

#

yeah 😦

#

there is a source code

frank sail
#

ok the algorithm seems relatively shrimple

#

I rarely see user-assigned clip planes actually used for anything

twin bough
#

the draw call generation is interesting

wicked notch
#

but we more than make up for it with clipmaps

wispy spear
#

@wicked notch how come you only came out of hibernation half a year ago or so and not already when you joined this frog pond 2 years ago? : )

wicked notch
#

I joined this pond 2 years ago due to my uni project I think I mentioned

wispy spear
#

oh, i may or may not have missed that

wicked notch
#

it was a Vk project and most of the questions I asked here at the time were things my classmates had written bleakekw

wispy spear
#

i dont rember seeing you being active before hence the question 😄

wicked notch
#

ye I basically wasn't active

wispy spear
#

either way, im glad you are here

distant lodge
#

I think the discord server panel even has a buzzword for this

#

it's called activation time or something

wicked notch
#

we got le page table

#

no amount of debug outputs will make me not write bugs

wicked notch
#
#define VSM_TILE_SIZE_X 128
#define VSM_TILE_SIZE_Y 128
#define VSM_CHANNEL_SIZE 4
updates.clear();
for each page_index in requested_pages {
  if (page_index == 1) {
    if (page_table[page_index].is_allocated()) {
      continue;
    }
    auto memory = VkDeviceMemory();
    auto offset = VkDeviceSize();
    page_table[page_index] = allocate_page(page_index, &memory, &offset);
    updates.emplace_back(VkSparseMemoryBind {
      .resourceOffset = VSM_TILE_SIZE_X * VSM_TILE_SIZE_Y * page_index,
      .size = VSM_TILE_SIZE_X * VSM_TILE_SIZE_Y * VSM_CHANNEL_SIZE,
      .memory = memory,
      .memoryOffset = offset
    });
  } else {  
    page_table[page_index] = deallocate_page(page_index);
  }
}
if (!updates.empty()) {
  vkQueueBindSparse(...);
}
#

ok good

#

this will quickly get out of hand for textures though

wicked notch
#

actually nevermind

#

this is completely bogus for regular textures and will work only for VSM

wicked notch
#

Ironed out it kinda looks like this

#
struct sparse_memory_block_t {
    VmaAllocation memory = {};
    VmaAllocationInfo info = {};
    uint64 allocations = 0;
};

struct sparse_memory_page_t {
    std::reference_wrapper<sparse_memory_block_t> block;
    uint64 offset = 0;
    uint64 size = 0;
};

struct sparse_image_memory_bind_t {
    image_subresource_t subresource = {};
    offset_3d_t offset = {};
    extent_3d_t extent = {};
    sparse_memory_page_t page;
};```
#

I am sort of unhappy still

frank sail
#

father

#

I crave serotonin

wicked notch
#

go do VSM

#

plenty of dopamine and serotonin once you achieve it

wicked notch
#

as it turns out

#

an unconditional vkAllocateMemory inside the render loop will lead to bad things

wicked notch
#

This is the single, most fucked up, most inefficient and most ridiculously non-scalable piece of code I have ever written, in my entire life

#

but it somehow works

#

actually I take "inefficient" back

#

it's actually pretty good bleakekw

#

averaging 30usec

#

but it is still stupidly non scalable

#

I gotta separate the "page allocator" and the "block allocator"

#

very tedious overall bleakekw

#

things left to do:

  • build a mip chain out of the page table, to be used as a "HZB"
  • build meshlet lists for the hw and sw rasterizers and cull pages
  • actually draw the shadow map based on the info above
#

oh btw, @raven orchid how bad is clearing a 16k render target? bleakekw

raven orchid
#

Well though I’ve never had the full thing in memory though

#

But clearing the sparse target every frame hasn’t been bad so far surprisingly

raven orchid
raven orchid
wicked notch
#

I draw into an R32_UINT shadow map

#

and I imageAtomicMin it with floatBitsToUint(depth) for both the hardware and software paths

#

so rip early Z

#

but I already know it won't be an issue

delicate rain
#

I'm confused didn't you do rt a little while ago? How are you doing insano shadow stuff now 🥸

wicked notch
#

Parkour

#

RT is currently stuck due to graph partitioning

delicate rain
#

I'm interested where this shadow map hole leads you to, I need inspiration, I hit a bit of a slump when it comes to my shadows

frank sail
#

We've been discussing it in here, #1090536732769927178, and #1128020727380054046

delicate rain
#

Will make sure to check it out, thanks

#

Okay I have no idea what VSMs are nor what you talk about in any of these threads but I'm sold already

#

Give me two days to catch up lol

frank sail
#

basically we are decoupling storage from our shadow map, allowing us to act as though the shadow map is huge (16k^2), at the expense of having to determine active pages and manage memory ourselves

delicate rain
#

So you are pretending you have huge shadowmap but only actually draw the sampled parts into a much smaller texture which you manage manually?

delicate rain
#

Nvm I'm starting to get it

delicate rain
# wicked notch ye caching is dead with SDSM

Taking the risk of sounding really stupid, could you not make your VSM span the frustum (similarly to SDSM) and then virtually reproject the tiles from last frame that are still in the new frustum this frame + draw the new tiles you need where the frustum moved?

frank sail
#

It could work if your frustum snaps to the world grid or something

#

That way tiles can reproject cleanly

#

I reckon you'd also have to transform light space depth from the old frame into the current

delicate rain
#

Why would you need that? The light space depth should be invariant to the frustum position no?

#

The thing that worries me the most with having small "tiles" is the amount of times I'll need to draw the scene

#

I'll prob need to yeet someones culling in order to make this viable

wicked notch
#

but you do culling per tile

#

i.e for each mesh/meshlet you determine if it overlaps any tiles that will be needed for sampling

#

if it do the you draw it

#

writes into unpaged memory get discarded with real sparse, but with fake sparse you should make sure to not write into unpaged tiles

delicate rain
#

Ohhhhh

#

Hmmm I see

#

Very smart

wicked notch
#

I dunno what kinda substances unreal devs sniff to come up with this shit but boy am I glad they do

delicate rain
#

Yeah the soda there must have a lot of minerals or smth

wicked notch
#

I am considering using a linear allocator for this shit

#

I now realize that 250 microseconds on average is far too much on the CPU side to update just 16k pages

#

I'm also considering placing a fixed amount of pages that can be updated each frame

#

but that's if the linear allocator thingy fails

glass sphinx
#

linear allocators are good

#

its really impressive how fast atomics are on gpus

#

its so good

wicked notch
#

Alright this is hard

frank sail
#

Same

wicked notch
#

Here's how I'm thinking of solving the amounts of page requests per frame

twin bough
#

That's what he said

wicked notch
#
layout (scalar, buffer_reference) restrict buffer b_page_table {
    uint8_t[] pages; // this is num_pages_total, for all virtual textures that exist
};

layout (scalar, buffer_reference) restrict buffer  b_page_req_table {
    uint count;
    uint[] pages;
};

void main() {
    // do whatever it is I have to do to get a page index for this frame or PAGE_INVALID if this pixel requests no pages.
    const uint page_index = find_virtual_page(...);
    if (page_index != PAGE_INVALID) {
        const uint8_t page_value = atomicExchange(page_table_ptr.pages[page_index], 1, gl_ScopeQueueFamily, gl_StorageSemanticsBuffer, gl_SemanticsAcquireRelease);
        if (page_value == 0) {
            const uint slot = atomicAdd(page_req_table_ptr.count, 1, gl_ScopeQueueFamily, gl_StorageSemanticsBuffer, gl_SemanticsAcquireRelease);
            page_req_table_ptr.pages[slot] = page_index;
        }
    }
}```
#

Ok pardon for the long time, I was thinking about it while I was writing bleakekw

#

I call in the atomics man: Wpotrick

#

please analyze this code bleakekw

wispy spear
#

summoning @glass sphinx

wicked notch
#

I should also note that this is to make CPU readback easier by sending in only page indices that have actually been requested, instead of all the pages

#

given that the memory is deallocated automatically every frame, since I'm switching to linear allocators

glass sphinx
#

what does find virtual page do

#

i have to go gym

#

later i help

wicked notch
#

it detects which tile of which texture this pixel is going to sample from

#

alright monkey man, have a nice workout session

wicked notch
#

Hmm I'm still thinking

#

Now that I have all requested pages, there is no way to guarantee order

#

so each frame I would end up doing a huge number of page requests

#

or wait, not exactly

#

the page requests would remain the same

#

But their location in memory could differ

#

Do I want this?

distant lodge
#

that might mean some of them are colder in cache than otherwise could be

#

but I dunno if your actual access patterns make that noticeable

wicked notch
#

Alright I was terribly wrong

#

I do need to update page bindings too with a linear allocator

frank sail
wicked notch
#

After very careful thought

#

A linear allocator is not usable for this kind of thing

#

I need some allocations to be persistent

wicked notch
#

It may work well for fake sparse though

#

since binding ops are super cheap

wicked notch
#

My brain is currently at max capacity, overheating and I still don't have a solution to this problem, despite thinking most of the day

#

The thing is also conceptually very easy too:

  1. Get page indices requested from the GPU
  2. For each page in the request, check if the requested page is resident, if it is do nothing, if it isn't, allocate and update sparse table, also if the page isn't requested deallocate it
#

And yet, this performs terribly, even with a first fit allocator

#

250us on average to update a mere 500 pages

#

How do I fix this

#

I don't think a linear allocator can fundamentally work either, I need some pages to stay resident in between frames

#

I thought also about maybe a "bump allocator" with a ring buffer, but that still doesn't sound right

distant lodge
#

kinda fallen behind this problem, but why can't you split your allocation strategy between persistent and non-persistent stuff

wicked notch
#

I'm not sure how to do that split

#

any page can be resident for any number of frames

#

"Persistent" here means, "the gpu requested this page 2 or more times in a row"

#

the first frame page 0 is requested it is allocated, if the next frame page 0 is requested again, nothing is done (no deallocation or moving of sorts)

#

english died for a sec there bleakekw

distant lodge
#

oh then rip

wicked notch
#

Does a bump allocator with a ring buffer make any sense whatsoever

distant lodge
#

maybe ¯_(ツ)_/¯

#

why not just have some central bitmask block you can check to see if a page is taken

#

I think some real OS allocators use that

wicked notch
#

For the ring buffer you mean?

#

Also that would be one huge bitmask

distant lodge
#

for any page allocator

#

how big are your pages?

wicked notch
#

It doesn't really matter, but 64KiB

distant lodge
#

idk it probably won't be that big though

wicked notch
#

It would be 16384 bits big

distant lodge
#

2048 bytes, 512 uints

#

per gig

wicked notch
#

Because in one virtual shadow map there are 2^14 pages

distant lodge
#

that's not that bad

wicked notch
#

Hm perhaps

distant lodge
#

2KiB/gig is a pretty small cost

#

all for an O(1) cache efficient residency czech

wicked notch
#

So the bump allocator I think would work like this

auto curr_ptr = bump.current();
while (is_page_allocated(curr_ptr)) {
  curr_ptr = bump.advance();
}
allocate_page(curr_ptr, ...);```
#

Doesn't look too bad

#

Advance automatically brings the ptr back to the first slot once it has reached the end

#

And is_page_allocated is said bitmask czech

distant lodge
#

you could probably use fancy bit intrinsics to do funny stuff there too

wicked notch
#

do tell

#

I like funny bits

distant lodge
#

both on the CPU and GPU there are dedicated instructions for stuff like bit count, or getting the first bit that's a 1 or the last bit that's a 1

wicked notch
#

Ah, __builtin_clz

distant lodge
#

so you can probably check 32 pages at a time if you're smart

#

or 64, not sure what you're targeting

wicked notch
#

I'm targeting my system

#

which is fairly modern, so 64 KEKW

distant lodge
#

your CPU system or GPU system

wicked notch
#

CPU

distant lodge
#

oh

#

so then yeah you can check 64 pages at a time

wicked notch
#

Maybe I didn't mention this, but this allocator should run on the CPU ye

#

I got a big brained idea

#

__builtin_clz is exactly what I need holy shit

#
const auto free = __builtin_clz(~bitmask);```
#

or not

#

eh, nah rip

#

it's not what I need

#

there must be a fancy instruction that returns the first 0 bit

distant lodge
#

it might be ffs

#

or maybe ctz?

wicked notch
#

hmm

#

likely

#

let me run some high level simulations (drawing on my tablet)

distant lodge
#

here's a higher level simulation

#

from what I read just now, beware that ctz/clz are technically undefined when x = 0

#

so you might need to be careful with it

wicked notch
#

Ok clz is definitely what I want

#

Suppose 0 is allocated and 1 is free, __builtin_clz(0011) returns 2, __builtin_clz(1000) returns 0 and __builtin_clz(0000) is undefined (good, it means it's completely full)

#

epic

distant lodge
#

undefined means it'll return whatever and you can't depend on it

#

so you have to check it separately

wicked notch
#

yeah, I'll just add a separate checc

#

so then

#

So the while loop before becomes bogus if I can check 64 things at a time

#
for each mask in list {
  if mask == 0 { continue; }
  const auto free = __builtin_clz(mask);
  allocate_page(free);
  break;
}```
distant lodge
#

I wonder if there's SIMD versions of clz

wicked notch
#

ah yes

#

loops? what are those

#

I only know 8192 bit wide vector instructions

#
int mm256_lzcnt_si256(__m256i vec)
{
    __m256i   nonzero_elem = _mm256_cmpeq_epi8(vec, _mm256_setzero_si256());
    unsigned  mask = ~_mm256_movemask_epi8(nonzero_elem);

    if (mask == 0)
        return 256;  // if this is rare, branching is probably good.

    alignas(32)  // gcc chooses to align elems anyway, with its clunky code
    uint8_t elems[32];
    _mm256_storeu_si256((__m256i*)elems, vec);

//    unsigned   lz_msk   = _lzcnt_u32(mask);
//    unsigned   idx = 31 - lz_msk;          // can use bsr to get the 31-x, because mask is known to be non-zero.
//  This takes the 31-x latency off the critical path, in parallel with final lzcnt
    unsigned   idx = bsr_nonzero(mask);
    unsigned   lz_msk = 31 - idx;
    unsigned   highest_nonzero_byte = elems[idx];
    return     lz_msk * 8 + _lzcnt_u32(highest_nonzero_byte) - 24;
               // lzcnt(byte)-24, because we don't want to count the leading 24 bits of padding.
}``` oh god what the fuck is this
distant lodge
#

looks weird

wicked notch
#

mfw gcc can't auto vectorize this shit because of control flow

#

alright that's enough bikeshedding the most optimal of clz instructions bleakekw

wicked notch
wicked notch
#
is_allocated(allocator, page) {
    index = page.index / 64;
    bit = page.index % 64;
    mask = allocator.list[index];
    return !(mask & (1 << bit));
}

deallocate(allocator, page) {
    index = page.index / 64;
    bit = page.index % 64;
    mask = allocator.list[index];
    mask |= 1 << bit;
}

allocate(allocator, page) {
    for (mask in allocator.list) {
        if (mask == 0) { continue; }
        index = 63 - __builtin_clz(mask);
        mask &= ~(1 << index);
        page.memory = allocator.memory;
        page.offset = ...;
        return VkSparseMemoryInfo(...);
    }
}

for (req in requests) {
    if (req == REQUEST_PAGE_NEEDED) {
        if (!is_allocated(allocator, pages[req])) {
            updates.emplace_back(allocate(allocator, pages[req]));
        }
    } else {
        if (is_allocated(allocator, pages[req])) {
            deallocate(allocator, pages[req]);
        }
    }
}```
#

This looks very promising

#

bit ops, how lovely

wicked notch
#

nice, unfortunately I won't be using that 😦

#

Or well

#

all is to be seen

#

so good thing you confirmed it was good

#

@raven orchid you may wanna invest in bitops too

raven orchid
#

Bitops?

#

Also trying to catch up on this

wicked notch
#

ok so, my allocator's performance was not very good

#

I was originally using a first fit, free list allocator (for some goddamn reason, no wonder it was slow) to manage my virtual pages

#

turns out you can do a much better thing

#

you use a big ass bitmask to remember which pages are allocated and which are not, then when you allocate you simply go through the bitmask and find the first free slot, using a CPU intrinsic

#

deallocating is even easier, you take the index of the page within the bitmask and the index of the bit, and you set the bit back to 1

#

now this, is blazing fast

#

like nanoseconds blazing fast

raven orchid
#

Ohhh now I see so this is specifically optimizing the problem of

#

How to find the next free slot fast

wicked notch
#

yes exactly

distant lodge
wicked notch
#

I dunno how to make that branchless bleakekw

distant lodge
#

make what specifically branchless

#

you can make all the aggregated bit checks branchless probably

distant lodge
#

hmmm

#

you just need to do

#

actually nah

#

idk

#

I was thinking you could use a special sentinel value like (mask == 0) * sentinel + (mask != 0) * __builtin_clz(mask)

#

but you'd just be deferring the branch

raven orchid
#

Hmmm

#

I think I’ll try this next if my current experiments blow up

wicked notch
#

epic

#

did you ever get around to implementing the HZB

distant lodge
raven orchid
#

No I fell into the caching rabbit hole

#

But I did increase the page group size to 32x32

distant lodge
#

nvm I think the scalar lzcnt came in SSE4, not that there's an SSE lzcnt

#

that's a rip

raven orchid
distant lodge
#

SSE4 is one of the later x86 isa extensions

wicked notch
distant lodge
#

its in AVX512 :^(

#

vplzcntd

raven orchid
#

For the physical memory itself I lost track

#

Did you decide on a “never actually free full” strat or do you release memory sometimes?

distant lodge
#

if you had a 32 core avx512 CPU you could do an allocation check in 1 cycle

wicked notch
#

the current idea is to deallocate a page block when it's empty for a number of frames

distant lodge
#

if only...

wicked notch
distant lodge
#

don't think those support avx512

#

do they?

wicked notch
#

the latest arch does

#

the 7000 series I think

distant lodge
#

simply buy one of those

wicked notch
#

requirements to run my engine: "32 core threadripper (64 preferred) with AVX512"

raven orchid
#

Another question

#

Could your current alloc strategy be partially moved to the GPU? Like could compute essentially select the next free pages atomically and just write that as work for the cpu to

wicked notch
#

uhh

#

let me think about it

#

ok yes

distant lodge
#

if you can do it with this simd crap you could totally dispatch a compute shader to do this same operation

raven orchid
#

I guess I’m thinking right now it’s probably
Gpu: I would like a page
Cpu: I will find next and alloc

Vs

Gpu: I need you to alloc this exact page
Cpu: ok

distant lodge
#

it'd be cool too

wicked notch
#

yes this is totally doable on the GPU now

#

the only thing I have to readback is the indices of the virtual pages I should update and the offsets into the device memory it should reside in

#

the only problem I see with this is when the block is completely full

raven orchid
#

Oh true

#

That would be what

#

Block fault lol

wicked notch
#

But I could issue an extra dispatch for that no prob

raven orchid
#

Unless the gpu could somehow just do some kind of virtual block strategy

#

And the cpu would alloc the block followed by the actual requests lol

wicked notch
#

man the world would be a better place if the GPU could allocate memory for itself

distant lodge
#

it'd be cool if the CPU had some kind of stable handle to GPU stuff so you didn't have to readback

raven orchid
#

Yeah 😦

distant lodge
#

vkCmdAllocateIndirect

wicked notch
#

18446744073709551615

#

18428729675200069631

#

stupid fast now

#

20usec on average

wispy spear
#

how long does UE5's impl take?

wicked notch
#

I dunno and I dun care bleakekw

#

This is different though

wispy spear
#

hehe, fair enough

#

ah

wicked notch
#

UE5 does it differently

#

But their code is so convoluted that I'm not basing it off UE at all

wispy spear
#

makes sense

wicked notch
wicked notch
#

now the hard part

#

rendering this thing bleakekw

proven laurel
distant lodge
#

what's the standard one look like?

wicked notch
#

std::countl_zeros

wicked notch
#

oh no

#

oh no...

#

all hope is lost

#

it's so joever

#

sparse is slow

#

(shocking)

wicked notch
#

page table HZB is such a ridiculous idea

frank sail
#

Impossible, it can't be done

wicked notch
#

watch me do it anyways and fail miserably bleakekw

runic surge
#

i believe lvstri

wicked notch
#

I don't

wicked notch
#

at least sync works now

#

timeline semaphores are fucking awesome

#

man this is so incredibly tedious

#

as it turns out when your entire renderer is hybrid sw/hw compute/mesh shader it isn't exactly trivial adding stuff bleakekw

wispy spear
wicked notch
wispy spear
#

ikr 😄

wicked notch
#

perfectly depicts my current struggle bleakekw

wispy spear
#

hehe

#

make sure to take a rest in between too

wicked notch
#

yeah I'll leave this for tomorrow me

wispy spear
#

NO EXCUSES!

#

oki

wicked notch
#

surely tomorrow me will be far more intelligent

wispy spear
#

sounds like my motto

frank sail
#

Don't worry, I'll finally be at my computer tonight and my implementation will be perfect

#

Inshallah

wicked notch
#

🙏

#

I shall await its completion habibi

wicked notch
#

I am extremely unhappy with how I wrote my shaders, I kept thinking "eh this is fine for now"

#

Well, it isn't fine anymore bleakekw

#

and I have also determined that HLSL still isn't as good as a drop in replacement as I'd like it to be

#

so good old GLSL it is

#

horray

wicked notch
#

I'm not convinced

#

GLSL sucks

#

sucks so bad

#

How much can I hide behind convenience functions the inline spirv?

glass sphinx
wicked notch
#

functions

#

there are a million restrictions on functions, plus there are no templates which would come in handy for some stuff in my compute rasterizer

#

it's hard to generalize something when functions suck

glass sphinx
#

since i use bda and texture handles i dont really mind it mich

wicked notch
#

I do too, but sometimes just using functions is nicer

glass sphinx
#

i dont know what you mean

#

"just using functions"

wicked notch
#

take a hypothetical check_hzb function

#
struct X {
    // can't have this in a struct in GLSL
    sampler2D hzb;
    // other data
};

struct Y {
    image2D hzb;
    // other data
};

// here a template would be useful for the core part of the algorithm, taking in both X and Y
template <typename T>
vec4 get_projected_aabb_uvs(in T x) {
    ...
}

bool is_occluded(in X info) {
    // regular HZB sample check
    ...
}

bool is_occluded(in Y info) {
    // different things, page table HZB check for example
    ...
}```
glass sphinx
#

well i have all textures virtual with handles

wicked notch
#

all this would be possible in HLSL as far as I'm reading

glass sphinx
#

so this is not a problem for me

wicked notch
#

yeah your texture handles are great

#

I don't get how they work though bleakekw

glass sphinx
#

the makros are pretty simple now

#

i generalized them back

#

so they arent typed anymore

#

just the accessor makros are

#

in shaders i mostly dont feel the need for hlsl too much yet

#

i started to dislike bda

#

because i dont have boundschecks

#

it just explodes

#

super annoying

#

so i wanna try hlsl byteaddressbuffer

wicked notch
#

ehh it's not too bad

#

oh yeah that's nice

#

I kinda want to YOLO it and move to HLSL no questions asked

glass sphinx
#

for example for rt i will use hlsl

#

cause nsight doesnt understand bda

#

so i couldnt debug it

wicked notch
#

nsight doesn't even understand timeline semaphores

#

garbage

glass sphinx
#

holy shit

#

honestly vulkan is really bad

#

dx has much better tooling sometimes

wicked notch
#

I tried NV's own sample with timeline semaphores

#

and Nsight just hung

glass sphinx
#

nv and amd also care more about dx i feel like

#

wtf

wicked notch
#

ye but DX feels shite to use

#

plus windows.h

#

I dunno about DX, maybe I'll try it sometime in the future

#

but it's yucky

glass sphinx
#

is it

#

it works in pix

#

another debugger at our whim

wicked notch
#

hmm perchance

#

eh maybe

#

there's a 10% chance I'll try D3D12 in the near future

glass sphinx
#

i dont think i ever will

#

way too kuch work

#

i can use vulkan already

wicked notch
#

oh for sure

#

I'll probably never switch to D3D12 even if it turns out to be the greatest

#

Vk works fine for me

glass sphinx
#

one of the few nice things in hlsl is that you can have multiple entry points in a file without makro spamm

#

sometimes its also quite painful without templates

#

also having member functions is also nicer to txpe

#

the syntax is much better in hlsl

wicked notch