#GPU Zen 3: Virtual Shadow Maps
I can delegate this task
hehe I was just looking for those
Beast
Hopefully it will be to our benefit
the initial estimate of 20-30 pages might be accurate, especially when we consider pictures and code snippets
I hate cramming text
anyways I'm not worried about length too much since the hard part (coming up with the algorithm itself) is done
This is the cope I need to hear
it's basically paint-by-numbers
PowerPoint is also surprisingly decent for creating pictures and diagrams
actually let me see if I can get a vtex address -> page table -> memory diagram going. Not sure if it'll work out
do we have a shared drive to collect any images we might use
no
I'll set one up
you all should get a notification for access to a folder on gdrive
btw the temp z-buffer thing might leverage hw hi-z on AMD if we clear depth to 0 (instead of 1)
then we use a compute fragment shader to "actually clear" the active pages to 1
well it wouldn't just be a win for AMD now that I think of it (I was just considering hi-z)
everyone would benefit from fragments going into inactive pages being culled before hitting the frogment shader
hmmmmmm might be able to use stencil for the same effect (hi-s is a thing afaik and early-s definitely is)
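A minimal sketch of the depth-clear trick from the messages above, written in Python for readability rather than as shader code. It assumes a LESS depth test with 1.0 as the far value; `PAGE_SIZE`, `make_depth_buffer` and `depth_test_less` are made-up names for illustration. Because inactive pages stay at 0, no fragment can pass there, and only the active pages that were reset to 1 accept writes.

```python
PAGE_SIZE = 4  # texels per page side (tiny, purely for illustration)

def make_depth_buffer(pages_x, pages_y, active_pages):
    """Clear everything to 0.0, then 'actually clear' the active pages to 1.0."""
    width, height = pages_x * PAGE_SIZE, pages_y * PAGE_SIZE
    depth = [[0.0] * width for _ in range(height)]
    for (px, py) in active_pages:
        for y in range(py * PAGE_SIZE, (py + 1) * PAGE_SIZE):
            for x in range(px * PAGE_SIZE, (px + 1) * PAGE_SIZE):
                depth[y][x] = 1.0
    return depth

def depth_test_less(depth, x, y, fragment_depth):
    """Assumed LESS test: write and return True only if the fragment is nearer."""
    if fragment_depth < depth[y][x]:
        depth[y][x] = fragment_depth
        return True
    return False
```

On real hardware the rejection would happen in early-z / hi-z before the fragment shader runs; this sketch only models the pass/fail outcome.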
This makes me wonder about the msaa idea
Can you stencil test msaa against non msaa stencil?
Ah nevermind, would be cool if we could use one stencil texel for 32 or 64 pixels at a time
This extension enables applications to use multisampled rendering with a depth/stencil sample count that is larger than the color sample count.
Well I didn't really get it, the idea I had in mind would be that you reduce the bandwidth of z/s buffer
the hardware does the bandwidth compression for msaa
What I wanted to do was test a batch of fragments against the same stencil fragment but still get the full resolution atomic write out
But it probably doesn't make sense
Because you don't get invocation for all msaa samples
Yeah nevermind
Thanks! I started uploading some stuff to the drive as options
I put a spreadsheet in that folder for us to put benchmarks in
the transient stencil buffer appears to be an anti-optimization btw 
was worth a try
btw @wooden jolt do you have hpb implemented
that gave me a huge uplift in perf
literal 10ms difference
I was trying to figure out why our perf is so different and I guess that must be why
anyways I guess the s-buffer failed because it required a draw to initialize and not just a clear (which is accelerated on probably all hw)
for benchmarks do we have a build of FF that can record that?
I haven't pushed my latest changes which adds the failed s-buffer (disabled) and some better statistics
oh ok
I was just wondering so I can run it locally on the 1060. Maybe we can even put it in experiments to get more data if we need it
However I think we won't need to spend too much time benchmarking
It should be easy to run since I use cmake fetchcontent for everything
But some undocumented code fiddling may be required for certain things (changing the scene or certain rendering parameters)
sounds good! I'll try to run that later and see how it goes
Another caveat is that benchmarkable models should be processed with gltfpack to compress the textures
But we haven't really decided what to benchmark
Bistro seems like a natural choice
bistro sounds good to me
people would also probably be interested in situational tests too but idk if we have time to pull that off
I'm thinking like foliage animation
I could probably hack in some horrible hardcoded "animation" for alpha masked geometry
But it would be easier to just have a bunny or something that just goes in a circle
that would work I think. Probably anything that results in frequent cache invalidations
(also foliage kinda kills perf in my thing so I have it disabled)
foliage is apparently also horrible in UE with VSM too
someone made a whole video on how to make it not suck as bad. I might still have it saved somewhere
saky mentioned that some of bistro's materials are slightly messed up and report that they're alpha masked when not
oh crap
this is the nvidia website one right?
wonder if the old godot ported one is any different
saky said potrick made a fixed version
@mystic lark could you provide a download if you have it available
that's awesome!
however my renderer currently doesn't account for that
(the material transparency type)
so I might just not implement it because it seems like significant work
what is reasonable in the remaining few weeks is adding page invalidations when objects move and hardcoded objects that follow a simple path
we'll see where it goes, but I feel like the final efforts for whatever renderer we use will be touch-ups and generally making it a presentable demo
yeah and local lights are out of scope
yeah lol, local is way too much
also I found the video: looks like the main optimization was to not allow certain things to invalidate the cache in the distance if they only move a tiny bit
so foreground foliage animation: invalidate
background foliage animation: just reuse slightly wrong cached data
I wonder if that could apply more generally
like for anything that moves but only barely far away
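One way the "only invalidate if it moved enough" idea could be expressed, as a rough sketch. The threshold in texels, the doubling of texel size per clipmap level, and both function names are assumptions for illustration; neither the video nor the chat specifies them.

```python
def clip_texel_size(base_texel_size, clip_level):
    """Assumed: each clipmap level covers twice the extent at the same resolution."""
    return base_texel_size * (2 ** clip_level)

def should_invalidate(displacement_world, texel_size_world, threshold_texels=0.5):
    """Invalidate only when the motion is large relative to the shadow texel size."""
    return displacement_world / texel_size_world > threshold_texels
```

With these assumptions the same 5 cm sway invalidates a fine clipmap near the camera but is ignored in a coarse one far away, which matches the foreground/background split described above.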
In gpu zen 2 there is an article about reusing shadows even when the light source moved
It uses reprojection
Can't remember the details but it involves taking a few shadow map samples to do a small search for the correct occluder I think
That's for light movement though, not objects
Maybe we can just bring both of those up in some sort of performance considerations section? Or are we only allowed to bring up what we've 100% implemented
We're not forbidden from discussing anything afaik. Papers usually have a section for "future work" where they propose ideas they haven't tried yet
I'm not sure how to approach the issues that unreal solved but we didn't get around to implementing, like local lights or "coarse pages" for volumetric shadows
From what I remember it wasn’t too bad (compared to the rest of vsm at least lol). I think it’s mostly just having a way for post processing to mark pages as needed similar to the depth analyze stage
But yeah no idea how we can talk about local lights 
The way I imagined it was that I would always mark the 'coarse' pages touching the player's frustum so they're always available to any effect
so you could have low quality shadows anywhere in the frustum
or maybe you could accept one frame of latency to get higher quality shadows for those
I think we can mention it briefly as something unreal did but we hadn't enough time for
Interesting I bet that would work. I went the 1 frame of latency approach. Like if a postfx stage tried to access unavailable data it would just mark the corresponding coarse page for next frame to get it
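The two approaches above (always-resident coarse pages versus requesting a page and getting it one frame later) can be sketched with a small bookkeeping class. This is a hypothetical illustration, not code from either renderer; `PageRequests` and its methods are invented names.

```python
class PageRequests:
    def __init__(self, always_resident=()):
        self.resident = set(always_resident)  # pages that can be sampled this frame
        self.requested_next = set()           # pages marked for the next frame

    def sample(self, page):
        """Return True if the page is usable now; otherwise mark it for next frame."""
        if page in self.resident:
            return True
        self.requested_next.add(page)
        return False

    def end_frame(self):
        """Pages requested during this frame become resident for the next one."""
        self.resident |= self.requested_next
        self.requested_next = set()
```

Passing the coarse pages touching the view frustum as `always_resident` models the first approach; leaving it empty models the one-frame-latency approach.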
Sounds good!
thanks champ
i ll add separated compressed alpha texture reads for transparencies in tido for vsms
just to see if that helps at all
@ lvstri, j stephano, saky: Let me know if there are more questions we should ask the editors
I think we can get leniency on the demo since that's not going into print
Lol and I'll tell them that the extra vsm section can probably be deleted
Wait is this actually being published somewhere
possibly. see the title of the thread
Let me know if you want a pre-editor editing pass or anything
I have contributed to a fair number of published papers and I enjoy the editing process
alright
Also a good excuse for me to read the article in maximum depth so as to implement my own VSM lol
so I actually read that old virtual shadow mapping paper to see what it does
they render the scene to a small section of the shadow map, then render the whole scene with that shadow map section to the screen
repeat for each tile
then they discuss optimizing it

Why does everyone think we are meming with these threads 
probably because we're good memers
Our humor is too real
this presentation is 3 years old, but still gold
https://research.activision.com/publications/2021/10/shadows-of-cold-war--a-scalable-approach-to-shadowing
15 cod games have come out since then
just over 1 week left btw
I added a bit to the introduction, just to dip my toes a bit so to say. From tomorrow I will try to reserve at least an hour or two each day to work on it.
Btw I modified a bit of what you've written to fit the parts I did Jaker, hope it's okay
Yeah it's 99% likely to be okay because it's all rough garbage anyways 
Forming sentences is hard 
But it was for a student conference, so I wouldn't consider it a notable publication
And I did a practice paper last semester for a seminar at TU
But it always is a struggle
I can commit literary genocide on your paper before you submit it
(i.e. be a grammar Nazi)
Please do
I find the best way is to just spit out sentences, dont care about reusing terms or sentences making perfect sense
And then just iterating iterating iterating
At least that works for me
Ye I'll keep this in mind (as I design this rhyme)
Like the "shitty first drafts" paper jaker referred to
Perfect
Getting over the writers block
Saky will know how to make it sound academic-ish
yeah, we'll make it work
Inshallah I will pump out a load of shit and trim it down
Are you doing this in overleaf, I thought I saw an overleaf screenshot at one point
It is
We won't be able to give you access to the overleaf project we're in because it has the whole book
But copy and pasting our stuff would be doable
Btw what are all the "accept changes" about, are we meant to click it or is it meant for others to do?
I looked it up
It's some review/change-tracking feature
Basically I just ctrl+A, then "accept all" the changes every day
I want to give everyone a chance to see what changed before I accept them
(and rejecting will delete the changes so don't do that)
Yeah it's just a suggestion mechanism for non destructive edits
Like google docs
Sounds good, I'll let you manage it then
Alrighty
I added some comments so that we can have scoped discussions without doing it in the comments
I can also handle all the citations, if you've never done it before
It is very easy
I think I should figure it out instead of littering the text with todos lol
I don't doubt it's easy, I'm just a bum
Either is fine, as long as they are marked as todos
We can just batch add them at the end
the draft is due the 14th 
so it can still be fairly crusty at that point. idk when the final draft is due though
Damn wasn't it March like yesterday, time goes by way too fast
Next weekend is taxes though
I have time dilation issues
You might send me the article sometime during the week or I might not have time to edit it all
already did my taxes 🤑
Team procrastination
I will be setting aside time this weekend while joker does his taxes
I don't need to do taxes because I'm unemployed 😎
Based
I'll be gone this weekend and so will have few opportunities to edit the article
You think time goes fast now just wait until you're not a student anymore
it's ok I'll be stuck doing a phd for years
doing a phd sounds like pain tbh
but I mean being a wage slave isn't much better, if at all
O fuck
That means I'm turning 25 this year
I thought it was 24
Damn
time to keel over
Where are my glasses?
You wanna go for PhD btw?
Or are you just meming
I do actually want to go for phd
Damn, based
I have no idea why
but the greatest reason is this
a cool youtuber guy makes a video on his studies and I go like: damn, I gotta get a phd now
I always tell people I'm one year older than I am so that when I turn that age I don't find it as shocking
But then I feel older than I am the rest of the time
this was basically when I was still a kid, in my last year of hs
Demongod the kind of guy to set his clock 10 minutes early so that he is everywhere on time
Even baseder
I'll link the vid
I think I get masters and I'm out
Lol how did you guess
my parents do that on the clocks in their cars and it's just annoying lmao
Must be a white people thing
It's like the Addams family where he has 3 watches that read different times and calculates the average of them
maybe it's because he looks like my dad
also like, doing it only on the clock in the car is meaningless
because you're already on the move at that point
My family's clocks all have a different time offset lol
Reminds me of the daylight savings meme
Each year I get closer to not noticing it at all
Because all my devices just auto switch
Only my oven stays true, proudly reporting the stolen hour
I didn't notice and I pulled an all nighter because it was already 5am when I noticed that the clock switched
I pulled one once and fell asleep in every class the next day
Yeah exactly
I had a roommate who would constantly pull all nighters watching youtube and then fall asleep in every class the next day
honestly quite incredible
I used to be able to do it before an exam, but I had to go everywhere first because noon would hit and my brain would give up
Ah habitual self destruction
I fear I'll lose the ability soon too
I already can't context switch as rapidly as I could even a year ago
possibly due to various GP worms in my brain
you're giving yourself drain bamage by staying up 
that can't possibly be it
I've never been able to do it, it's not like a lost ability thing for me
plus @mystic lark is an offender too
Today is a long day for me, 5am to 2am waking
it's 4am go to sleep
As long as I get 3h of sleep I can function
ah yes, under the 3-4 hour mark my alarms simply won't work on me
weird
But generally if I actually have to get up for something I actually will
Like today I knew I needed to be up at 5:30 to catch a train and I woke up on my own at 5:20 after going to bed at 12
But if I don't need to be up I will snooze my alarm indefinitely
I actually sleep a lot like 7-8 hours each day, it's just from 6am to 2pm
how can you wake up at 2pm
Yeah when I first got into GP my schedule was 5am to 1pm sleeping
My flight sim was written mostly in that schedule
I have good window blinds 
The birds are the hard part
a mediterranean can't comprehend sleeping through such hot and sunny hours
Lol if I stay up too late I can wake up at 10 and go back to sleep until 12
Anyways enough procrastinating more VSM writing please
I had an idea for hi-z culling that could work. basically, each frame you only reduce the cached pages (active pages are treated as having the far depth, so all objects touching them pass), then cull objects against that
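A rough sketch of that hi-z idea at page granularity. It assumes one conservative depth per page, smaller depth meaning closer to the light, and 1.0 as the far value; the state strings and function names are invented for illustration.

```python
FAR_DEPTH = 1.0

def build_page_hiz(page_states, page_max_depth):
    """One depth per page: farthest cached depth, or FAR for pages being redrawn."""
    hiz = {}
    for page, state in page_states.items():
        hiz[page] = page_max_depth[page] if state == "cached" else FAR_DEPTH
    return hiz

def object_visible(hiz, covered_pages, object_min_depth):
    """The object survives culling if it is not behind the stored depth in any page it touches."""
    return any(object_min_depth <= hiz[p] for p in covered_pages)
```

An object touching any active page always passes, because that page reports the far depth; it is only culled when every page it covers is cached and has nearer depth.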
I can't quite remember what I did in my failed vsm hi-z experiment but I think it was not that
The issue with hiz built on top of pages is that you get no use out of it
This is because you have no frame to frame continuity
So you need to clear the hiz when redrawing the page anyways
The only use would be to build the hiz as you are drawing the page imo
Or at least that was my point against it, but perhaps I'm misunderstanding
I’ll try to do more work on the diagrams today
Wait don’t we already have an execution flowchart? This one
Hmm true
You would get a use out of the hzb built from cached pages I think
But you cannot cull against it no?
If you have a page with hiz and an object moves into this page so you need to redraw the page
How do you cull against the hiz without potentially discarding the object?
Hmmm but you can discard the object
If it's under everything
Okay I see
Unless the object already was in the page though
If I have a cube that was previously in the page and the cube moves away from the sun it will get culled against the hiz
It will cull itself so to speak
thats fine
you draw everything twice
and cull it twice
once old depth once new depth
@mystic lark thats the cope ass way unreal and most engines draw
maybe they exclude stuff from first draw tho
like foliage or small things that wont occlude
But they draw every frame
There is no guarantee we draw every frame
So once a tile is cleared you have no idea if the previous info is relevant
Because it could be very far from the original cached position
But I wonder if we can reuse data from higher cascades that cover area where you'd like to draw lower cascades
cant we make the hiz before clear
cool idea
Idk what that will be useful for
When you clear you don't need the depth anymore
Which inherently means the hiz will not cull anything no?
I think my idea might be stupid hmm
I think my hiz idea is no better than hpb
i think they can work together
Exactly my point
we can do both
Sorry I baited you again lol
I will scream at you Patrick
i dont see the problem
ok
so end of a frame
we could build a hiz right?
of the vsm clips depths
if we can do that we can do two phase
You mean merged depths?
with two phase the first hiz could even have huge mistakes in it
like literally wrong values
it will still work
The tiles that have relevant information to hiz culling are not redrawn and the triangles that overlap them are culled by hpb anyways
And everything else has useless information
I think my brain just needs images to understand
I don't get how two pass will be useful
why
Two pass you draw everything from last frame and then everything again against this no?
Huuuh
So the idea is
- Draw visible objects
- Cull objects
- Draw objects that were not drawn in 1 but just became visible in 2
yea
Step 0 can be culling with current frame info like the frustum and hpb
draw everything twice, second pass fixes the errors from first pass
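The three steps listed above can be sketched as follows. This is a toy model of generic two-phase occlusion culling, not either renderer's code: `occluded_by` is a hypothetical stand-in for the hi-z test against the depth produced by the first pass.

```python
def two_phase_cull(objects, visible_last_frame, occluded_by):
    """
    objects: list of object ids.
    visible_last_frame: set of ids that passed culling last frame.
    occluded_by(obj, drawn): True if obj is hidden by the depth of the `drawn` set.
    """
    # Phase 1: draw what was visible last frame; its depth may be stale or incomplete.
    phase1 = [o for o in objects if o in visible_last_frame]
    drawn = set(phase1)
    # Cull: test every object against the depth produced by phase 1.
    visible_now = {o for o in objects if not occluded_by(o, drawn)}
    # Phase 2: draw objects that just became visible and were skipped in phase 1.
    phase2 = [o for o in objects if o in visible_now and o not in drawn]
    return phase1, phase2, visible_now
```

`visible_now` is what would be carried over as `visible_last_frame` for the next frame, which is why the first pass is allowed to be wrong: the second pass fixes its omissions.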
Okay but you need an extra depth buffer per clip for this no?
I think the problem I initially had with hi-z is that I only did one pass, so the one frame of latency screwed me
Hmmm
How
Maybe I don't get it
Pass 0 sounds like exactly what we have now no?
How is pass 0 in your thingy different from what we have now with frustum and hpb cull?
I think it is
But then you'd have pass 2 which does a hiz cull for next frame
And pass 3 to make sure there is not one frame of latency in the hiz culling results (as that obviously does not play well with caching)
Hmm
Maybe 1 pass hiz would have worked if I made pages dirty for an extra frame after they were allocated
why???
just make the hiz page res
and build it before you clear
I am so lost
Cull for next frame?
we are all too confused
Yeah
reduce the set of objects with hiz
but yeah
I gtg
I'm back for one second with a new thought: storing visibility for the next frame would mean every clipmap for every VSM would need that
But it's just a bitmask (and we choose the granularity) so maybe it's ok
Ok I'm gone again
I think I have a question about culling
specifically dynamic objects that move into frame
let's say we have cached shadow data from a combination of static + dynamic
a dynamic object then moves into the frustum on frame N
is there anything stopping us from reusing cached data to figure out which pages it actually invalidates?
like "it overlaps this page, but its depth won't overwrite any existing values so we don't invalidate"
I suppose we could do something akin to the hi-z test to see what pages (if any) need to be invalidated using the object aabb
so that would basically be
build hiz from still-valid cached data
test object against it to see which pages it actually invalidates
right?
Oh yeah true. I guess I was just wondering if it would work or if I’m completely confused lol
Current plan is to just loop over every page of every clipmap covered by the aabb of an object that moves on the CPU
Praying that it won't be too slow
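A sketch of what that CPU loop might look like. It assumes the AABB is already in light space, that every clipmap is a square grid of pages centred on the light-space origin, and that page size doubles per clip level; all names and parameters are hypothetical.

```python
import math

def pages_covered(aabb_min, aabb_max, clip_level, base_page_size, pages_per_side):
    """List (clip, px, py) for every page of one clipmap that the AABB overlaps."""
    page_size = base_page_size * (2 ** clip_level)
    half = pages_per_side // 2  # assumed: clipmap centred on the origin
    lo = [max(math.floor(aabb_min[i] / page_size) + half, 0) for i in range(2)]
    hi = [min(math.floor(aabb_max[i] / page_size) + half, pages_per_side - 1)
          for i in range(2)]
    return [(clip_level, x, y)
            for y in range(lo[1], hi[1] + 1)
            for x in range(lo[0], hi[0] + 1)]

def invalidate_for_moved_object(old_aabb, new_aabb, clip_count,
                                base_page_size, pages_per_side):
    """Mark pages under both the old and the new bounds in every clipmap."""
    dirty = set()
    for clip in range(clip_count):
        for aabb in (old_aabb, new_aabb):
            dirty.update(pages_covered(aabb[0], aabb[1], clip,
                                       base_page_size, pages_per_side))
    return dirty
```

The cost is proportional to the number of pages the bounds cover per clipmap, so small objects touch only a handful of pages per level.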
My issue with this is that it works unless the object is already in the page
If it already is in the page it can cull itself
So you'd need a list of meshlets per page or smth
True I guess it wouldn't work at all
I'm not sure a list of objects per page would work either
explain
If an object spans one page and moves deeper into it (depthwise) then with the proposed scheme it will be culled/not cause the page to be updated
yea thats fine
But the problem is that you don't know if the object is genuinely occluded
By fails I mean the page will not be updated and will have stale depth
Which is bad
most good occluding geo is static
This is not actual hiz we're talking about
This is for determining when to invalidate a page when an object in it moves
I think a similar scheme could work if we mark texels as being occluded by either static or dynamic geometry
If all the texels in a page are from static geo then we can safely cull dynamic geo against it
It could just be a bit in the texel itself
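A minimal sketch of the "bit in the texel" idea, assuming an integer depth where smaller means closer to the light and the lowest bit is sacrificed as the dynamic flag. The layout and the function names are assumptions for illustration only.

```python
DYNAMIC_BIT = 1

def pack_texel(depth_u32, is_dynamic):
    """Store the flag in the lowest bit of the depth (costs one bit of precision)."""
    return (depth_u32 & ~DYNAMIC_BIT) | (DYNAMIC_BIT if is_dynamic else 0)

def page_is_all_static(texels):
    return all((t & DYNAMIC_BIT) == 0 for t in texels)

def needs_invalidation(texels, object_min_depth_u32):
    """A moving dynamic object may skip invalidation only if static depth hides it."""
    if not page_is_all_static(texels):
        return True  # a dynamic texel might be the object's own stale depth
    farthest_static = max(t & ~DYNAMIC_BIT for t in texels)
    return object_min_depth_u32 < farthest_static
```

This avoids the self-culling problem raised earlier, because a page is only trusted when nothing dynamic, including the moving object itself, has written to it.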
oh shit
oooh god
death
idk if it was mentioned but unreal did a talk where they talked about VSM. How they invalidated caches for static or dynamic, etc...
https://dev.epicgames.com/community/learning/talks-and-demos/VlO2/unreal-engine-optimizing-ue5-rethinking-performance-paradigms-for-high-quality-visuals-pt-2-supporting-systems-unreal-fest-2023
cant you read motion vectors
when marking pages?
check if different to static motion
makes no sense
what if the moving object making the shadow is behind you?

they start talking about it at 6:50
You also get motion vector when rotating camera
How do you distinguish that from moving object
ty for this
Dang this talk is super good
they cache static and dynamic separately which probably simplifies certain things (edit: helps a ton)
yeah we could have 2x physical storage. One for static and one for dynamic, then they can both be sampled when applying the shadow to find the near depth
or static depth can be overlaid onto the dynamic shadow map, whichever is quicker
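Both options above come down to taking the nearer of the two depths. A tiny sketch, assuming smaller depth means closer to the light; the function names and the bias parameter are hypothetical.

```python
def in_shadow(static_depth, dynamic_depth, receiver_depth, bias=0.0):
    """Option 1: sample both caches at shading time and use the nearer occluder."""
    occluder = min(static_depth, dynamic_depth)
    return receiver_depth - bias > occluder

def overlay_static(static_page, dynamic_page):
    """Option 2: merge static depth into the dynamic page once, then sample only that."""
    return [min(s, d) for s, d in zip(static_page, dynamic_page)]
```

The first costs an extra texture read per shadow sample; the second costs a merge pass whenever the dynamic page is redrawn.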
@mystic lark we can use your app for the demo, also it's not "due" until the book is published so we have time
I asked the editor yesterday about which demo to use and whether we could submit two 😄 and his response was that he suggested using the Vulkan demo since it's more relevant
Is your code in a branch of tido?
It's in master
It needs polishing
But all code is in src/rendering/virtual_shadow_maps
Including shaders?
Yep
Do you dev on master
Yeah 🤠
Pog
It's partly slang partly glsl
Uh
It can be anything afaik
Just make it easy to build
I'm afraid the slang requirement will make it harder but idk
Cmake should do everything
It's just a single shader
Ok
I can port it if required
Yeah ideally it should be as simple as git clone + invoke cmake
Well the code is going into a single repo for the gpu Zen examples I think
Some vcpkg deps, but if vcpkg is not setup cmake clones it
Into the source directory
But this can all be changed to fit the needs
It kinda integrates a bit with the rest of timberdoodle, but I can also extract it all out into its own repo
I have no idea how strict they are on this sort of stuff
My feeling is that they're not very strict
Regardless, if you say we have time to finalize, we can all settle this later (in the worst case, extracting the code shouldn't be an issue/take long)
The book is being published in mid-may so I think the code just needs to be ready by then
That gives us actually quite a lot of time
But the draft needs to be done by the end of this week 
Yeah I know
Well this changes things a bit though, do we still wanna describe ff implementation, or shift to tido one?
Dirty bit on dynamic objects I'd say
ff doesn't implement that yet 
And then my bikeshed on the moving camera thingy I did, it doesn't work for when the sun is at low angles
I think I know the fix to it though
oh lol the camera thing scares me but yeah we'll figure it out
Maybe I'll just "let" you write about that part
Yeah, I can also just switch to what you do tbh
how does yours not work at low angles btw
Gimme 5 minutes
I'll draw
Actually, since I'm at my pc finally, I'll attempt to explain the whole thing
mmmrrrr saaakkyyyyy
its very good that you made the vsm very separated
its quite clean cut out
good for something like a book as a sample
yes yes
okay so
this is the setup
z is world space up
red is sun camera projection
oh is the issue just projective aliasing
the subparts are pages
actually I'll let you cook
uh hmm
okay okay
actually, I can just do a whole thing
and you'll ask
okay okay
I'll make an example of what breaks in my impl when the sun goes low
this is the problem I'm trying to solve
The blue is the depth disconnect that is generated by shifting the camera one tile forward
did you update the pic
I see the problem presented in the image but not the solution
also I remember we discussed this issue somewhat recently
right
So what I do to mitigate this, is I calculate the depth offset along the x and y per page
and then when storing the depth inside of the memory, I first offset it by page_coord_x * x_depth_offset + page_coord_y * y_depth_offset
in this example, there is only o1, which is the x offset
this "essentially" forces all the pages to align in the xy plane
this means, that the depth stored is disconnected from the page index it occupied in the current camera position
when doing the shadow test, you take the world position and project it by the light's camera. Following this, you also perform this depth offset, in order to be able to correctly compare between the depth stored in memory and your reprojected depth
so all of the tests are completely disconnected from the actual camera position that drew the page
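A small sketch of the per-page depth offset described above. The offset formula is the one quoted in the chat; `view_after_shift` is a toy model with assumed signs, only there to show why the stored value stops depending on where the camera was when the page was drawn.

```python
def page_depth_offset(page_x, page_y, x_depth_offset, y_depth_offset):
    """From the chat: page_coord_x * x_depth_offset + page_coord_y * y_depth_offset."""
    return page_x * x_depth_offset + page_y * y_depth_offset

def to_stored_depth(camera_depth, page_x, page_y, x_depth_offset, y_depth_offset):
    """Applied both when writing a page and when projecting a receiver for the test."""
    return camera_depth + page_depth_offset(page_x, page_y,
                                            x_depth_offset, y_depth_offset)

def view_after_shift(camera_depth, page_x, tiles_shifted, x_depth_offset):
    """Toy model (assumed signs): shifting the light camera by whole tiles along x
    lowers the page index and raises every depth by x_depth_offset per tile."""
    return camera_depth + tiles_shifted * x_depth_offset, page_x - tiles_shifted
```

Under this model the same world point gives the same stored depth before and after the camera shifts, so cached pages stay comparable with freshly projected receivers.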
so what info do you store per page exactly
in what I'm explaining, just depth
ah hmm
ok I guess that makes sense
you just need some relative offset to account for the page depth changing
yep
btw here's my scheme. the third graph shows how it fails
ah right
the issue (and I believe you have the same, it just shows up a bit differently) is that when you tilt the sun these depth offsets converge to infinity
can't you just choose to make the depth offset relative to a different plane when the sun is tilted a lot
right, yeah that is what I want to try to solve it
I force the pages to align to xz or yz plane as opposed to xy plane when the sun is near the horizon
ie do something like this
that pic confuses me 😄
sorry
let me see if I can edit ur pic
I remake
is this better?
both are the same camera position, just the offsetting is different
perhaps
the obvious issue is that the first alignment can slide along the xy plane while the second alignment can slide only along the xz plane
I think that is what I'm thinking of too
so like this
now for this, I just store an extra integer per page, which is just the camera view space offset at the time the page was drawn
so then during the depth test, I project the world position by the clipmap view matrix, add on the discrepancy between this light camera and the height stored per page, and project by the clipmap projection matrix
now the light camera can slide along the purple axis also which gives me all of my degrees of freedom back
this works the same for the second alignment also
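The per-page camera offset can be illustrated in one dimension along the light's view axis. This is a simplified sketch with assumed conventions (camera "height" measured along the view direction, depth equal to point coordinate minus camera height), not the actual matrix code.

```python
def adjusted_receiver_depth(receiver_view_z, current_camera_height, page_camera_height):
    """Re-express the receiver's view depth relative to the camera that drew the page."""
    return receiver_view_z + (current_camera_height - page_camera_height)

def in_shadow(stored_page_depth, receiver_view_z,
              current_camera_height, page_camera_height):
    """Compare against depth that was written with a different camera height."""
    return adjusted_receiver_depth(receiver_view_z, current_camera_height,
                                   page_camera_height) > stored_page_depth
```

For example, a page drawn with the camera at height 10 stores depth 40 for an occluder at coordinate 50. With the camera now at height 14, a receiver at coordinate 60 has view depth 46, which adjusts to 50 and is correctly behind the occluder.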
as an added bonus, you can push the per clip cameras very close to the player and have very good depth precision
like so
I believe you also have an issue with low light angles, because if you shift the camera in the near plane, a tiny player movement in the xy plane will cause a huge light camera shift
which again converges to infinity when light is at the horizon
actually the sun will just move up and down accordingly (along the world up axis, AKA the sun's local XY plane) when it's sideways like that
the issue is that the player will just see no shadows if it goes too far along the sun's local Z axis at any orientation (which means either going super high into the sky at noon, or far from the center of the game area at dusk)
yeah
but you can fit the clip heuristic, so that it only selects the clips that are in range so to speak no?
wdym
uh
I'm talking about my impl for reference
aha I see
yeah
I am not happy with my solution, but I found nothing better
do we know what unreal does?
no
damn
btw
my solution works fine as long as you make the light frustum long enough so the player can't reasonably leave it during gameplay
with 32 depth bits you can get pretty good precision at kilometers of distance
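A back-of-the-envelope check of that precision claim. It assumes an orthographic light projection that maps the frustum length linearly to [0, 1]; the 10 km range used in the note below is an arbitrary example, and the function name is made up. Two storage options are compared: a 32-bit float depth, whose coarsest spacing (just below 1.0) is 2^-24, and a 32-bit unsigned-normalized integer.

```python
def worst_case_depth_step(range_metres):
    """Largest gap between representable depths, converted back to metres."""
    float32_step = range_metres * 2.0 ** -24   # float32 spacing just below 1.0
    unorm32_step = range_metres / 2.0 ** 32    # uniform spacing of a 32-bit integer
    return float32_step, unorm32_step
```

Over 10 km this gives roughly 0.6 mm for float depth and a few micrometres for integer depth, which supports the "pretty good precision at kilometers" estimate under these assumptions.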
okay
then we go with your solution
I can implement it tomorrow
or later, after we write the article
yeah we should just focus on the article tbh 😄
🚲
yeah stable addressing is ultimately a small issue in the grand scheme of things 😄
I just center them at the world origin hehe

then they slide along their local xy
all of them are at the world origin?
is bistro square not at 0,0?
I added an attempt at an HPB mip diagram and culling scheme
we should just need an entry for the last bullet which I'll work on today or tomorrow
oh crap idk what I uploaded as
I'll try to give some feedback on the pics later
I just clicked save and used defaults
Idk but as long as the sun isn't super low then it will just slide to follow the player well enough
right
the vsm colab is really cool
I added a depth analysis diagram. Including Jaker's original we should have a first version for each bullet point now
Not sure what the best way to iterate on them will be
I'm gonna work on the article some more tonight. Was tired yesterday from traveling
we should have a section talking about debug drawing
@ saky
(jk we just need to work on the article for now) (also I can try contributing to tido and add this myself)
I have this
Not the physical pages part
But the memory status and page status drawing I have
displaying the physical pages should be the easiest thing too because you literally just draw the texture normally
Oh yeah true
Any sense of when the article will be ready for me to look at
If it's longer than 10 pages it's probably going to take more than a day to look through
Tomorrow I have planned like 6+ hours for it
So id say tomorrow/day after I'd hope we have all the text
And can just improve it
Alright
Sorry, your compile took too long to run and timed out. The most common causes of timeouts are:
When the timeout timeouts on printing the reasons
I see a little froggy doing some writing 
I'll be up in 8 hours or so to help
I bullied myself to stop procrastinating so am just spilling word soup haha
perfect, I'll take a look when I wake up and we can whittle it down to the good parts
or perhaps lustri could take a look while I eep 👀
since you guys are in a similar/same time zone
also start thinking about where you want to put code listings and pictures
hmm yeah, good point, I didn't really think about code snippets at all
I think we have a lot of opportunities with that
hmm
What do you think doing it like this:
Introduction
Overview
Implementation
- Setup Phase
- Drawing Phase
- Sampling Phase
- Optimizations
Results
Conclusion/Further work
is that not how it's laid out currently 
ah wait
eh it's close to how I imagine it in my head 
haha okay good
anyways yeah I agree with that structure despite how the actual article looks so far
I don't really like the Introduction and Overview sections being both in
I think they should both do the same thing
I want to use the intro section to explain context
Perchance we should rename introduction to previous work, and overview to introduction
like other techniques that try to solve these problems, and to bring up UE5
I just glanced at some random paper I had open and that's indeed how they structure it. Let's do it 👍
anyways, enough bikeshed for now, I'll go back to typing soup
ye it is the usual structure
your experience is helpful mashallah
alright I'll eep now and leave you and lustri (if/when he gets here) to your own devices
gngn
We will need to come up with a nice way to distinguish between shadow map pages and memory pages
it quickly becomes quite annoying to write their full name out, but without it it will also be unreadable
I also feel like the split between Implementation and Optimizations can be quite confusing/become hard to navigate around when explaining individual stages
Because the way we have it set up right now, we go stage by stage, explaining their purpose and function. With the Optimizations split we either:
- Describe all the steps we do without fully explaining them and only reference them later when talking about the specific optimization, which can become very confusing to the reader (why am I reading this if I have no idea what it's for?) (yet it sometimes is still necessary and a bunch of papers do it...)
- Omit the optimization parts from the initial explanation and only add them later. This can be quite clumsy as we basically discard the structured nature of our explanation and start randomly pointing back and inserting details (to do this we modify this step in this way and this step in that way...)
I guess it really depends on the nature/spirit of the article. If it is supposed to be paper-like we should just full on provide the explanation with all the optimizations included right off the bat. The adding by parts approach (second option in my message above) works if this were an educational talk/presentation, but more academic publications usually just give you everything and leave you to deal with it yourself. Also, since the whole thing is called "Implementing VSMs" imo we should focus more on the actual implementation part (i.e. the optimizations we did and how we made it viable) rather than trying to delve into the explanation of the concept itself. It is not really our invention as Epic did it first. Of course we should sufficiently explain the concepts as it should serve as a self-contained piece, but our contribution is still the implementation description, not the technique itself...
What about a flow chart like this for the Setup Phase?
Virtual and physical pages?
I agree. I wanted to put the optimizations later so the reader can understand the core algorithm first, but it has a negative impact on the overall structure
It could be useful. Also I'm quite surprised you don't have any read+write resources in a single pass 
I can make my own version of this diagram and compare, because I think we have some subtle implementation differences here
I have, in allocate shadow pages
I like it
Oh I somehow skipped it lol. I have read+write resources in other passes too though
oh and I missed one also, allocate shadow pages reads/writes memory page table too
It would be awesome if you took a look at what I wrote, so I get some feedback Jaker (if you have the time ofc), I'll prob continue writing a bit more before I go to bed, so would be nice if I already incorporated your notes
Yeah I'll eat something then take a look
awesome, thank you! 
btw I took a looksie at some other articles and they don't all have 20 pages
the decals have like 12 and a bunch is just listings, code snippets and images
we already have 5 of just text
(and one figure)
Yeah pretty much any length is allowed. I just predicted that ours would be around 20
(including figures and code)
Ah I thought it was a requirement
Can I get a look at this tonight somehow
Can you just dump it into a private overleaf document or are there too many dependencies to get it to render easily
I can, I'll do it in an hour or so
Doesn't have to be now since I won't be able to read it until tonight
I'll probably read it in like 8 hours from now
Right, I'll do it before I go to bed then, should be in a few hours or so
I have a quick meeting in 5, then I can review your changes
Don't rush the review, I'm dead I'll probably just peruse and fix wording or two
beep boop, I'm looking
holy crap that's a lot of text
I'm putting comments in overleaf
and I'll just accept everything to reduce the visual noise
Yeah it's getting a bit too much
Perfect!
I got stressed by the lack of content so I tried to add more rather than less
We can always remove
Can I just place the link to the WIP copied paper on overleaf?
probably can right, it's not like our own writing is NDAd or smth
@sweet nimbus
just dm it to demongod and the other vsm peeps
oki
@acoustic bobcat did you get the overleaf invite from saky
make sure to change the cs. to www. in the link or it'll be in Czech 
Btw we're free to publish the article on our own websites and stuff
It's stated in the FAQ
Yeah I did
I commented on the language in the chat lol
How much more text is going to be written
Idk I need to finish reviewing it myself 
I got distracted with nonuniform stuff in #opengl then went to the gym
I think a few sections still need to be completed but saky wrote a shitton earlier today
Is 5 pages enough to describe how VSM works
Or is it going to be a lot of diagrams or something
We haven't put in any diagrams or code listings yet, but we plan to
We have like three or four diagrams that just need to be added
ah I see the chat you were talking about now lol
yeah raise any comments you have as inline comments in the document
I'm reviewing it too but in the actual article thingy. I already noticed some things that need to be moved/deleted 😄
this is a pretty good article that I found by searching "perspective aliasing"
https://learn.microsoft.com/en-us/windows/win32/dxtecharts/common-techniques-to-improve-shadow-depth-maps
Is GPU Zen like GPU gems
yeah
Is there an example of the style and format of the articles so I know what the goal looks like
uh I have GPU Zen 2
Actually nvm I found a google books preview
Ok so they're similar to GPU gems, maybe a little more detailed
This one is like 10 pages of small textbook-style pages
yeah the article length is basically "however long it needs to be"
Wow these assholes made track changes a paid feature
Or maybe it's always been like that and I am used to a pro account for real publications or something idk
Anyways I will try to just leave comments where I edit things but you might have to diff it against your other copy
oh I didn't realize you were editing too
Should I just leave comments for the edits
My edits are going to be a million small things
ah
I can't contribute much to the content itself since I haven't done VSM so it's mostly going to be like rewording sentences and stuff
I'm working on the main doc right now so there will be a bunch of conflicts
fuggg no track changes sucks
I can just comment out the old version of each sentence I touch or something idk
If you're editing now I can just leave it alone and you can copy the new version over later or something
hmm i have an idea
perhaps for now you can just read it and give broader feedback
e.g. regarding the structure of it and whether you found some things confusing
then we can incorporate that into the main doc and copy it over again tomorrow for smaller edits
Sure yeah
Alright I'll just do everything with comments
They'll still show up in the track changes bar they just won't be automatically applied
that's fine
Alright made a few comments
Overall I would say I don't really understand the algorithm well enough to actually implement it but I think having diagrams and images will help a lot
Which I assume are in progress
I'm also coming at this with zero context since I've literally never touched shadow mapping so it could be that I just don't have the prerequisite knowledge to understand the article fully but I think it's also a good benchmark as well
Btw for anyone that doesn't have a GPU Zen book, here's a random article preview I found for context https://books.google.com/books?id=rA7YCwAAQBAJ&lpg=PP1&pg=PA3#v=onepage&q&f=false
I think we are still missing a bunch, images will definitely be needed as without them it's very hard to explain
I also didn't get to explain a bunch of optimizations
Thank you! I'll go over it as soon as I properly wake up
yeah we definitely assume familiarity with advanced shadow mapping algorithms
I'm not sure how much of it is that though
Like are the cascades identical to the same ones used in CSM
or is it different
I think pictures will help me understand if the descriptions are just vague or if they make sense but I'm simply lacking context
My philosophy is that ideally the text description of the paper carries enough information to understand the concept even without the figures but maybe that's unrealistic for this article
ye absolutely it should
they can be thought of the same way
there's gonna be an overhaul tonight and probably tomorrow again
Ok maybe I get it, so if there's a sun straight above, the cascades are basically rendering the scene from the top down, and in this case you're just using a depth prepass to identify which shadowmap texels are visible to the player and sparsely storing the image in chunks if so?
yeah
Ok interesting
Why so many cascades
16 is a lot of times to draw the scene even with culling
you can use any number really
but with many cascades (or clipmaps as we like to call them), you can always have just the right shadow texel density (1:1 mapping of screen pixels to shadow texels)
Ok interesting
so shadows won't be blocky if you go close or undersampled/memory-wasting if you go far
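rough C++ sketch of how picking a clipmap for roughly 1:1 texel density could work. This assumes every clipmap level has the same resolution and covers twice the world-space extent of the previous one, so the texel size doubles per level. `pixelFootprint`, `selectClipmapLevel` and all parameters are made-up names for illustration, not anyone's actual implementation:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper: world-space size of one screen pixel at a given view
// distance, for a symmetric perspective projection.
float pixelFootprint(float viewDistance, float verticalFovRadians, int screenHeightPixels) {
    assert(screenHeightPixels > 0);
    return 2.0f * viewDistance * std::tan(0.5f * verticalFovRadians) /
           static_cast<float>(screenHeightPixels);
}

// Assumption: level L has texel size level0TexelSize * 2^L.
// Picks the coarsest level whose texels are still no larger than the pixel
// footprint, then clamps to the levels that actually exist.
int selectClipmapLevel(float footprint, float level0TexelSize, int levelCount) {
    assert(footprint > 0.0f && level0TexelSize > 0.0f && levelCount > 0);
    const int level = static_cast<int>(std::floor(std::log2(footprint / level0TexelSize)));
    return std::clamp(level, 0, levelCount - 1);
}
```

so anything closer than the finest texel size just lands in level 0, and anything too far away gets the last level (which is where it would start being undersampled).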
How does it avoid the transitions of CSM though
Just because the steps are smaller between cascade resolution?
Why are they called clipmaps btw
That terminology wasn't defined in the article as far as I saw
UE5 docs call them that 🐸
pretty much. it's the transition where one shadow texel is about one screen pixel so it's almost impossible to tell, even without filtering
Clipmaps weren't defined, because I didn't want to get into the whole clipmap thingy
yeah that's a good idea lmao
So I just decided to call them cascades
Fundamentally the same concept imo
Should I write more today, or do you wanna overhaul first?
let me finish this pass
@acoustic bobcat you can see a little projective aliasing at the beginning but otherwise the shadow basically matches the pixels on the screen until the end when I get to the nearest cascade
with TAAU
with TAAU and filtering (PCSS)
Don't really know what I'm looking at tbh but yeah I didn't see any obvious transitions like CSM has
really bad tree model I made
that scene sucks because I made the normals wrong in blender somehow
this is probably more legible- here's projective aliasing that makes the seams more apparent
Yeah I see now
Still not bad at all
Why does it have those jagged edges though
yeah that's with no filtering or TAA to cope
Ah ok filtering normally takes care of that I assume
I don't remember LVSTRI's having that though and they looked to have crisp edges
I think LVSTRI only showed the good angles 🥸
You get very little aliasing when looking at the center of the bistro
you're projecting a square (a shadow map texel) onto the surface
so the projection gets all long and fucked up from the angle
it's like in games how decals get really stretched on steep terrain
yeah makes sense
VSM doesn't quite solve projective aliasing 
yeah it's hard to notice when you look at the scene from certain angles
if you tactically angle the sun at 45 degrees then there won't be any pathological moments in a city scene 
And I assume you can just blur the shadows a bit since that's realistic anyways
ah across the page boundaries mayhaps
page boundaries are fine if the pages are all resident... which isn't always the case
5 cascades here in the scene
I just mean to deal with the aliasing on the shadows themselves if you're not going to have TAA for instance
we need to extend the depth analysis pass to select all pages in a small radius instead of just the exact page the pixel landed on
or allow other passes to request pages if they fault
ye, we were talking about this with LVSTRI
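for reference, a minimal CPU-side C++ sketch of the "select all pages in a small radius" idea. In practice this would live in the depth analysis compute pass; here it's just a flat grid of request flags for one cascade. `PageRequestGrid` and `requestPagesAround` are hypothetical names, and skipping out-of-range neighbours (instead of wrapping) is an assumption:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-cascade grid of "this page was requested this frame" flags.
struct PageRequestGrid {
    int pagesPerSide;
    std::vector<std::uint8_t> requested;

    explicit PageRequestGrid(int side)
        : pagesPerSide(side), requested(std::size_t(side) * side, 0) {}

    bool isRequested(int x, int y) const {
        return requested[std::size_t(y) * pagesPerSide + x] != 0;
    }
};

// Marks the page the pixel landed on plus every page within `radius` pages of it,
// so filtering near a page border does not sample a non-resident neighbour.
void requestPagesAround(PageRequestGrid& grid, int pageX, int pageY, int radius) {
    assert(radius >= 0);
    for (int dy = -radius; dy <= radius; ++dy) {
        for (int dx = -radius; dx <= radius; ++dx) {
            const int x = pageX + dx;
            const int y = pageY + dy;
            if (x < 0 || y < 0 || x >= grid.pagesPerSide || y >= grid.pagesPerSide) {
                continue; // assumption: neighbours outside the cascade are skipped
            }
            grid.requested[std::size_t(y) * grid.pagesPerSide + x] = 1;
        }
    }
}
```

the obvious cost is that a radius of 1 can request up to 9 pages per pixel instead of 1, so the page budget goes up accordingly.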
your renders are so beautiful hehe
Skybox doing the heavy lifting
I wonder how much flicker you'll get when you request pages when filtering and get them the next frame
yeah I be wondering
I suspect the flickering will be unacceptable, but still better than simply having nothing forever
I have this issue too, took a bit of playing with the autoexposure params to get it to behave
my autoexposure is probably fine, it just has garbage input (sky and ambient in [0, 1] while everything else in [0, infinity])
and BC7 normal map 
but the BC7 normals are nothing compared to the fact that this material is just garbage
I'm glad Patrick switched to bc5
Metal pavement 😄
I have no materials 
all is just albedo
I handle metals and stuff correctly (I think). this material is just too shiny for some silly reason
let me try blenderrrr
uh oh >1gb 
yeah looks like garbo in blender too
blender really chokes on bistro damn
the new gltf loader is way quicker
only took like 10 seconds to see the mesh and maybe another 20 seconds to see materials
1 second in Tido
but yeah at least I'm not waiting 5 minutes for it
btw reading Demongods comments I think we should add a diagram explaining the mapping of everything
I'll give example
something like this scribble I did when explaining stuff to Patrick
like, this is the memory, this is a page, this is how virtual page maps to physical page (I really like the naming btw, we should explain it in the article)
I could see a comprehensive diagram like that being at the beginning for people to continuously refer to
but consider the physical size of the book isn't that big
yeah no we can definitely split it in parts/omit stuff, but I think having the upper half just so that you have some sort of visual in your head would be helpful
Maybe not all at once but I think you could get away with breaking it up and interspersing it throughout
That way we can avoid some of the initial confusion
If you can make a big one (even if just of screenshots of debug tools with arrows pointing around) I can think of how to make it really compact
Like this but with real data ideally
Just so I can be sure I'm understanding what I'm looking at to serve as a sort of ground truth for what the small diagrams should convey
Here are some pics of GPU Zen 2 next to other famous GP books for reference
Lol nice white point
Anyways I hope it helps your intuition about how the article might look in print
What's the max page limit again
Like 30ish
Long
There's no hard limits, but we probably won't even push the soft ones
Plenty of room for some diagrams then
Yeep
They'll just have to be split up to fit the page size
@mystic lark do you think we need pics to show perspective and projective aliasing or nah
the microsoft thing I linked earlier had these to showcase them
or do you think a simple description of the issue would suffice
I think the description should be enough
I feel like demongod is a bit of an anomaly in the target group of the article
his feedback is valuable though because it reveals where we make assumptions
Yeah no, definitely
We can cite another paper/source better describing these issues
So like we do a short description and cite
I wrote paragraphs describing the aliasing issues just now
I think that should be enough
do you mind if I rewrite some paragraphs
I can comment them out first if you want to be able to compare
or just the CSM paragraph
Not at all go ahead
I wrote it with the idea that we at least have something to latch on to
And now we can just rewrite, restructure and improve
In no way is what I wrote final
I have the martty virus (I can't stop using hyphens)
are we supposed to capitalize the names of techniques like Cascaded Shadow Maps, etc. as if they're proper nouns
I would do it
fug
I don't know if we actually have anything useful to say about ray tracing so I might just remove that sentence 
Yeah I have no clue about rt so I decided to leave it to you or someone else haha
RT shadows have their own set of trade-offs but yeah idk if there's any point in comparing the two
I wanted to mention it because some people will inevitably ask the question
Right yeah, but the answer will be the same as always... Rt better shadows but slow
lol yeah I guess that's pretty uninteresting
hmm we still have previous work before the intro
I forgor which one we wanted to do but this seems backwards
I feel like they both do the same thing, I guess intro could be "motivation" and stating the problem, previous works then how others attempted to solve the problem?
here's an example paper
https://www.researchgate.net/publication/220791941_Sample_distribution_Shadow_Maps
they have abstract (we don't need), intro, then related work
intro: explains the problem at hand, followed by a brief summary of their technique
Okay let's do it like that
yeah most of the previous work stuff should be in the intro
except the actual previous work lol
Wym?
I think we should put "shadows important to ground the scene, despite various improvements still remains a compromise - describe issues, what vsms do in like 6 sentences"
Ah I see, ye I agree
wym
hmm I need to sleep
Into the introduction section I mean
ok that makes sense
btw Epic motivates VSMs with high-poly geometry in their talk
like if you want tiny rocks and shit casting accurate shadows, you need high res shadow maps
anyways, that's just food for thought
intro is still a mess but I'll clean it up when I'm back
I mean, they also say that Nanite is basically required in order for VSMs to work
yeah but we can ignore that
their VSMs still work without nanite, just not as well
because there's more page invalidation or something
I'll try to add stable addressing and merge in caching
and perhaps talk about mesh shader drawing
do you mean in the article
yeah
alright
btw that yellow section at the bottom can probably be removed if we're going to explain caching sooner
Yeah that's what I meant by merge in
and you can change the specifics to fit your implementation, if necessary (if you choose to keep any of that text)
We should probably introduce the concept earlier, maybe in the initial paragraph so that we don't start talking about it out of the blue
I have a similar problem as with the heuristic explanation part
I don't really know where to fit it
as it is not really a part of the algorithm per se, but still important to talk about
yeah
I'll try to think about it a bit more and maybe look at how other papers do it
one last thing I want to add is like an algorithm listing of the whole process
"we also describe important optimizations that we implemented: caching, hierarchical page buffer, etc."
like this format
let's just use C++ code
and for code listings I was expecting to do snippets that describe a small piece of functionality
ok I guess we can put high-level stuff in there too
like this stuff
I don't mean concrete examples, more like pseudocode so
The whole process of drawing shadowmaps can thus be described by the following pseudo code:
AllocationRequests <- Mark visible pages(Depth Buffer)
Free and non visible physical pages <- Find free and non visible pages(...)
....
something like this in the algorithm format
the other articles use real languages and I think we should follow suit
it's ok if it's not your exact code
yeah, I meant it more to provide a layout of the logical steps - inputs and outputs of each - to give a nice overview, but I guess it's not really necessary
yea
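if it helps, here's how the first two steps of that pseudocode plus the allocation step could look as real C++, as a CPU-side sketch only. The elided steps are left out on purpose. `FrameState`, the function names and the "free pages first, then evict cached pages that aren't visible" policy are assumptions for illustration, not our exact code:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint32_t kNone = 0xFFFFFFFFu; // sentinel: no mapping

struct FrameState {
    std::vector<std::uint32_t> virtualToPhysical; // per virtual page: physical page or kNone
    std::vector<std::uint32_t> physicalToVirtual; // per physical page: owning virtual page or kNone
};

// Step 1: visible virtual pages that are not resident become allocation requests.
// `visible` stands in for the result of analysing the depth buffer.
std::vector<std::uint32_t> collectAllocationRequests(const FrameState& s,
                                                     const std::vector<bool>& visible) {
    assert(visible.size() == s.virtualToPhysical.size());
    std::vector<std::uint32_t> requests;
    for (std::uint32_t v = 0; v < visible.size(); ++v) {
        if (visible[v] && s.virtualToPhysical[v] == kNone) requests.push_back(v);
    }
    return requests;
}

// Step 2: free physical pages first, then cached pages whose owner is not visible.
std::vector<std::uint32_t> findFreeAndNonVisiblePages(const FrameState& s,
                                                      const std::vector<bool>& visible) {
    std::vector<std::uint32_t> freePages, nonVisiblePages;
    for (std::uint32_t p = 0; p < s.physicalToVirtual.size(); ++p) {
        const std::uint32_t owner = s.physicalToVirtual[p];
        if (owner == kNone) freePages.push_back(p);
        else if (!visible[owner]) nonVisiblePages.push_back(p);
    }
    freePages.insert(freePages.end(), nonVisiblePages.begin(), nonVisiblePages.end());
    return freePages;
}

// Step 3: hand out candidate pages to requests; reads and writes both tables.
// Returns how many requests were served.
std::size_t allocatePages(FrameState& s, const std::vector<std::uint32_t>& requests,
                          const std::vector<std::uint32_t>& candidates) {
    std::size_t served = 0;
    for (; served < requests.size() && served < candidates.size(); ++served) {
        const std::uint32_t p = candidates[served];
        const std::uint32_t v = requests[served];
        const std::uint32_t previousOwner = s.physicalToVirtual[p];
        if (previousOwner != kNone) s.virtualToPhysical[previousOwner] = kNone; // evict cached page
        s.physicalToVirtual[p] = v;
        s.virtualToPhysical[v] = p;
    }
    return served;
}
```

each function takes the previous one's output as input, which is the inputs/outputs overview I was after.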
Yeah referring to some other resource for that stuff is a good idea too
You don't need to describe everything yourselves you just need to provide some place to go read about it
The article doesn't need to stand alone from complete first principles, as long as it can be learned from first principles by reading all the references
Btw do we know which other diagrams we might need? I can help get started on those this weekend
Also if we need any of the current diagrams to be changed I can start on that too
I think we'll need something like this
probably not the whole thing, just the upper part
to show the relations between vsm cascade (clip), virtual page table, physical page table and memory pages
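the same relation in C++ form, as a minimal sketch: a virtual texel in one cascade goes through that cascade's virtual page table to a memory page, then to a texel in physical memory. The 128-texel page size, the `kNotResident` sentinel and the row-major layout of memory pages are assumptions, and all names are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

constexpr int kPageSize = 128;                      // texels per page side (assumed value)
constexpr std::uint32_t kNotResident = 0xFFFFFFFFu; // no memory page backs this virtual page

struct TexelCoord { int x; int y; };

// One virtual page table per cascade (clip): virtual page -> memory page index.
struct VirtualPageTable {
    int pagesPerSide;
    std::vector<std::uint32_t> entries;

    explicit VirtualPageTable(int side)
        : pagesPerSide(side), entries(std::size_t(side) * side, kNotResident) {}

    std::uint32_t& at(int pageX, int pageY) {
        return entries[std::size_t(pageY) * pagesPerSide + pageX];
    }
    std::uint32_t at(int pageX, int pageY) const {
        return entries[std::size_t(pageY) * pagesPerSide + pageX];
    }
};

// Translates a virtual texel coordinate into a texel coordinate in physical memory.
// Returns std::nullopt when the virtual page is not resident.
std::optional<TexelCoord> virtualToPhysical(const VirtualPageTable& table, TexelCoord v,
                                            int physicalPagesPerRow) {
    assert(physicalPagesPerRow > 0);
    const int pageX = v.x / kPageSize;
    const int pageY = v.y / kPageSize;
    assert(pageX >= 0 && pageX < table.pagesPerSide);
    assert(pageY >= 0 && pageY < table.pagesPerSide);

    const std::uint32_t physicalPage = table.at(pageX, pageY);
    if (physicalPage == kNotResident) return std::nullopt;

    // Assumption: memory pages are laid out row-major in the physical texture.
    const int physPageX = static_cast<int>(physicalPage % std::uint32_t(physicalPagesPerRow));
    const int physPageY = static_cast<int>(physicalPage / std::uint32_t(physicalPagesPerRow));
    return TexelCoord{physPageX * kPageSize + v.x % kPageSize,
                      physPageY * kPageSize + v.y % kPageSize};
}
```

the texel offset inside the page is identical on both sides, only the page origin changes, which is probably the one thing the diagram needs to make obvious.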
btw what do you use for diagrams? I'll probably want to use the same so that we get consistent look
Dang this is nice
Sounds good, I’ll use it as a reference to make a diagram
I’ve just been using PowerPoint so far
