#GPU Zen 3: Virtual Shadow Maps
im super proud to know you guys 🙂
as long as we mention TAA is fine
ok
idk how they pack figures next to each other in tex so I'm just gonna combine them in paint.net
we can leave formatting and beautification for last methinks
huh, that terrain has slightly more vertices than bistro
4.1 million vs 3.9 million
@mystic lark [sampling], should we mention both "inserting a new allocation on page fault when sampling" and "mark visible pages in a predefined radius instead of marking only a single page"?
Uhhh the new allocation on page fault imo makes no sense
It will flicker no?
Mark in a radius should be good I think
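A rough sketch of the mark-in-a-radius idea, in case it helps when writing it up. This assumes a square neighborhood measured in pages and a plain 2D visibility table; `mark_pages_in_radius` and its parameters are made-up names, not anything from Tido.

```python
def mark_pages_in_radius(visible, page_x, page_y, radius):
    """Mark every page within `radius` pages of (page_x, page_y) as visible.

    `visible` is a square 2D list of booleans indexed as visible[y][x].
    The neighborhood is a square clamped to the table bounds (assumption).
    """
    size = len(visible)
    for y in range(max(0, page_y - radius), min(size, page_y + radius + 1)):
        for x in range(max(0, page_x - radius), min(size, page_x + radius + 1)):
            visible[y][x] = True


# Marking a page on the left edge with radius 1 touches a clamped 2x3 block.
visible = [[False] * 8 for _ in range(8)]
mark_pages_in_radius(visible, 0, 3, 1)
```

With radius 0 this degenerates to the single-page marking we have now, so the radius is just a knob trading extra allocated pages for fewer page faults when filtering.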
unless you mark those allocations to not be freed somehow, yes
Yeah, then I'd just avoid it
ight
And maybe say that we have a coarse page set as we were saying in tido with Jaker
And we can use that as a fallback when we get a page fault
You can just add it below and I'll merge it in if you dont want to change stuff
Dang this results pic ended up looking really cool though. Nice!
I didn't have time to read it again btw
I'll take a look when you get feedback from the real editor
I just awoke, I kinda fell asleep right as I got back home
but I see no notes from the editor so far
They have a lot to go through so I'm not surprised it's taking them a while
looking through the articles, I don't think this review period has actually started for any of them. However, there are some very thorough review comments from a few weeks ago and earlier
Yours included? I can do another editing pass tomorrow if it's still not been commented on
yeah nothing yet. everything is in limbo lmao
It takes a while
I remember once being assigned to edit 200 pages of a 1000 page manual
took me like a month to get through
Very draining process compared to editing just 10 pages or something
even editing just 10 pages is pain, but I suppose it doesn't help that I'm also helping write it and we're on a tight deadline
Normally you have each author go through it separately and write comments which always takes like a month because people forget or whatever so it gets hammered out over a longer period of time
we were sorta doing that at a hyper pace. like 16 hour periods where we do a full editing pass + some writing
but the dust has kinda settled even though the conclusion isn't completely done and we're still in limbo 
@vestal snow for the perspective & projective aliasing pic, I think simple pics of the issue would suffice. this article has some good pics
https://learn.microsoft.com/en-us/windows/win32/dxtecharts/common-techniques-to-improve-shadow-depth-maps
huh that 1993 paper (Hierarchical Z-buffer visibility) really does seem like the origin of hi-z culling
but the version we use, being gpu-driven, is certainly a newer thing that derives from it
and their version has some weird quirks
While the basic Z-pyramid test can reject a substantial number
of polygons, it suffers from a similar difficulty to the basic octree
method. Because of the structure of the pyramid regions, a small
polygon covering the center of the image will be compared to the
Z value at the coarsest level of the pyramid. While the test is still
accurate in this case, it is not particularly powerful
Uhhh hmmm I kinda just looked up hiz culling, saw paper with 1k citations and assumed that's it
yeah idk it's probably fine
they are dummy and dont sample a 2x2 quad, which solves the issue
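To make the 2x2 point concrete, here is a small CPU-side sketch comparing the two tests. It assumes a square power-of-two depth buffer where larger depth means farther away, and a pyramid storing the max depth per texel; all function names are hypothetical and this is not the paper's code or ours.

```python
def build_max_pyramid(depth):
    """Build mip levels where each texel holds the farthest (max) depth of its 2x2 children."""
    levels = [depth]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        half = len(prev) // 2
        levels.append([
            [max(prev[2 * y][2 * x], prev[2 * y][2 * x + 1],
                 prev[2 * y + 1][2 * x], prev[2 * y + 1][2 * x + 1])
             for x in range(half)]
            for y in range(half)
        ])
    return levels


def occluded_single_texel(levels, x0, y0, x1, y1, nearest_depth):
    """Test against the finest single texel that contains the whole pixel rect."""
    level = 0
    while (x0 >> level) != (x1 >> level) or (y0 >> level) != (y1 >> level):
        level += 1
    return nearest_depth > levels[level][y0 >> level][x0 >> level]


def occluded_2x2(levels, x0, y0, x1, y1, nearest_depth):
    """Test against the (at most) 2x2 texels covering the rect at a finer level."""
    extent = max(x1 - x0 + 1, y1 - y0 + 1)
    # Smallest level whose texel size is >= the rect extent, so the rect
    # straddles at most two texels per axis.
    level = min((extent - 1).bit_length(), len(levels) - 1)
    xs = {x0 >> level, x1 >> level}
    ys = {y0 >> level, y1 >> level}
    farthest = max(levels[level][ty][tx] for ty in ys for tx in xs)
    return nearest_depth > farthest


# An occluder at depth 0.2 fills the middle of an 8x8 buffer; the rest is far (1.0).
depth = [[1.0] * 8 for _ in range(8)]
for y in range(2, 6):
    for x in range(2, 6):
        depth[y][x] = 0.2
levels = build_max_pyramid(depth)
```

A 2x2 pixel rect at the image center (pixels 3..4) with nearest depth 0.5 sits behind the occluder. The single-texel test has to climb to the coarsest level, sees 1.0, and fails to reject it, while the 2x2 test stays at level 1 and rejects it.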
I wrote a little in the results section and added a pic to the beginning. We still need a table with a benchmark for the passes
the table should just show how long each pass of VSM took in a specific scene
Do we do a fly through and average the time?
It also heavily depends on the sun angle
I was thinking of an nsight capture because you don't have per-pass metrics in your renderer yet
ok
something like this except with just one GPU (taken from GI-1.0 paper)
I was thinking of doing table for a static scene, then the same scene but with caching disabled to show the difference
a flythrough would be ideal but idk how hard it would be for you to do. I guess you can just do it by hand
I can hack in camera following a predefined path (I have that in another project) and we can then benchmark on multiple gpus
and ofc a flythrough would require those metrics and some way to average them (which tbf is easy with an exponential weighted average)
we don't need multiple GPUs
just one will get the point across, which is that it's cheap to render
Okay, well the path will be good regardless
So that we get consistent results for caching and no caching
if you want to add the perf counters and camera path in engine, that's fine, but I think those are not strictly necessary to get the data we need
I just want to be sure that we can complete this
Yea, perf counters I'll definitely add, camera path probably not then
For caching the counters are needed imo
Because the perf jumps up and down a lot with caching
I was only ever considering using nsight, not looking at the entire frame time
ye
Which would be hard to do with nsight
nsight can actually capture and average multiple frames
before you launch your app, you set it to capture multiple frames
then you check the "aggregate frames" box in the timeline after capturing
btw I do not find the "batch 10 task 0" prefix very useful in this
One issue I've had with that is the capture actually freezes the frame making the frame delta time spike up and teleport the camera 
yeah it won't be useful unless the camera is stationary
There should be a name of the task afterwards no?
there is
It's DAXA thingy I can talk to Patrick about that
it gets cut off unless I zoom in a lot
I don't care about that stuff as the user 😄
(and also the source language)
I feel like this can be misleading, as caching will be nuclear speed with stationary camera
yeah
But once you move the perf will obv drop
that's why I suggested testing without caching as well
but a path with averaged numbers would be ideal
Yeah good point @prime ice
my suggestion is to move the "slang" and "batch N task M" stuff to the suffix so it doesn't impede what I care about
We should make it a task graph option to disable/enable the prefix
or just make it a suffix
The slang draw is the actual name of the task I believe
ah lol
So slang prefix is app side
Or maybe the name of the pipeline
But those are good points, we'll change it
yeah it says pipeline name on the bar with the slang stuff
yea suffix might be better
Do you want to put an image of the nsight capture into the article?
Or just the values in a table
just in a table
but if you do the benchmark with in-app counters then you can skip the nsight thing
Benchmarks will be done after you wake up
(if you don't wake up earlier than in 5 hours)
well I do have a meeting in 5 hours so
maybe potrick can help you
I am in poland, but without the la
nice
we still need people to review changes in the article btw. there's so much green
I cleared some just now, but 1. a bunch of edits are written by me and I want at least one other person to see them; and 2. I'm about to sleep
@wooden jolt do you think you can find some time to review edits in the article
yes
I'm forced to wait for irl shit
I'll be home soon™️
alright cool
ETA 30 mins or something
So will work late again
no need to actually write much (though that would also be appreciated lol). just make sure things are sane
I went through the entire article on my phone. I accepted the changes and made some minor corrections. The later sections (sampling and conclusion) still need a bit of cooking, I'll work on them once back on PC
Someone still needs to review the sections I added, they are still pretty rough and require rewording and polishing (I can also do that if no one does it before I get back)
Ok I’ll try to put a similar picture together today
Unless you have an idea for it already and plan to upload
Tido now has timings
I already put such an image in
Shortly after I said that hehe
okay so caching matters a lot lol
left no caching right caching
Is this enough for you Jaker, or how do you imagine the benchmarking process to go?
(I pushed so you should also be able to benchmark yourself if you want)
yeah that's perfect
we could do fancy shit by writing the timestamps to a CSV file and making a graph out of it
csvprofile my beloved
the chance to make afps a concrete term too
I did ewa_exec_time = 0.99 * ewa_exec_time + 0.01 * curr_exec_time for all of these, hopefully I understood correctly
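For reference, the same exponential weighted average as a tiny sketch. The 0.99 / 0.01 weights are the ones from the message above; seeding the average with the first sample is my assumption (starting from zero would bias the early frames low), and the function names are made up.

```python
def update_ewa(ewa_exec_time, curr_exec_time, weight=0.01):
    """Blend the newest timing into the running average."""
    return (1.0 - weight) * ewa_exec_time + weight * curr_exec_time


def average_timings(samples, weight=0.01):
    """Run the EWA over a list of per-frame timings.

    Seeds with the first sample (assumption) so the result is not
    dragged toward zero during the first few hundred frames.
    """
    ewa = samples[0]
    for sample in samples[1:]:
        ewa = update_ewa(ewa, sample, weight)
    return ewa
```

With a weight of 0.01 a single spike only moves the average by 1% of the difference, which is why the numbers stay readable even though caching perf jumps around.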
I can also cook up some graph if desired, but I think it's bikeshed?
graphs would be cool but I don't think are necessary
also imo they would be more useful if you had a fixed camera flight path
okay, I'll focus on text for now
Btw, when fixing the sampling phase text I started to think
when sampling for PCF for example, wouldn't it be better to just figure out the cascade index directly by the same heuristic we use for the main pixel, instead of attempting to "blindly" look into the same page + page in clip above + page in clip below?
maybe, but doing that N times per pixel could get pretty expensive
I haven't tried it though
uh hmmm
yeah
I think we can combine it actually, attempt to look into the same page (we can trivially determine if the sample still lies in the same page) and if it crosses to another page, we can use the heuristic
I actually forgot that my impl blindly looks up and down one level lol
I saw that in LVSTRIs, I don't know if yours does that too 🕵️
mine does 😎
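A sketch of the combined approach for filter taps: reuse the center pixel's cascade while the tap stays in the same page, and only fall back to the heuristic when it crosses a page boundary. `select_cascade` is a hypothetical stand-in for the article's heuristic (here it just takes the tap's UV for simplicity), and `pages_per_side` is an assumed page-table dimension.

```python
def page_of(uv, pages_per_side):
    """Page coordinates of a shadow-map UV in [0, 1)."""
    return (int(uv[0] * pages_per_side), int(uv[1] * pages_per_side))


def cascade_for_sample(sample_uv, center_uv, center_cascade,
                       pages_per_side, select_cascade):
    """Pick the cascade for one PCF tap.

    If the tap lands in the same page as the center pixel, that page is
    known to be usable, so the center cascade is reused. Otherwise the
    (more expensive) heuristic runs for this tap only.
    """
    if page_of(sample_uv, pages_per_side) == page_of(center_uv, pages_per_side):
        return center_cascade
    return select_cascade(sample_uv)
```

For small filter kernels most taps should stay inside the center page, so the heuristic would only run for pixels near page borders. I haven't measured this, so treat the cost argument as a guess.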
now this is epic
I added timestamps into Tido vsm pass
and I think I already have an idea on how to integrate these into TaskGraph
Page faults also occur when sampling the VSM for shadows in participating media and translucent objects. - do we know how to fix translucent objects?
run the marking pass without translucents, and then while drawing every translucent pixel?
coarse pages or the thingy where page faults create allocation requests
yeah that is what I'm suggesting
I think that's also the second idea I wrote if I understood right
unless you mean drawing translucent stuff an extra time just to mark pages
aha, I assumed you meant while sampling the shadows (IE the same issue occurring during PCF) not while drawing
well translucent stuff would be forward so it would be both, no?
unless you are doing dithered transparency
I don't follow
translucent/transparent geo would be drawn in a forward rendering pass in which shadows would be applied
i.e. they wouldn't be deferred as you seem to imply
oh I see, yeah in that case I'd just draw them twice
JS said that he does the allocate-if-page-fault thing for that and volumetrics
idk how bad the artifacts are, but I bet it's pretty easy to implement
yeah no, that will be simple to do
maybe the TAA can eat it away
(shame that there is no TAA yet in tido)
I rewrote and expanded the Sampling section, so it is ready for a review from someone else
ah i expected a prepass for mbt
mboit?
that's the same thing as mboit, just from some different people
apparently they were researched independently in parallel
ah I didn't know that is also a term for it
we probably should add some images into the sampling phase to break up the wall of text we have rn
I think I'm having some trouble understanding a sentence
send
"Then one of three actions is taken, depending on the state of the page stored in the page table. If the page has not yet been allocated, its coordinates are added to the buffer storing allocation requests for this frame. If the page is already allocated, but not yet marked as visited, we mark it as visited this frame. Lastly, if the page is allocated \textit{and} marked as visible, we do nothing." (in the marking visible pages section)
why exactly is the allocation state of the page relevant here 
is it caching related
well yeah, if the page is already allocated from one of the previous frames you do not want to allocate it again
you only want to allocate the pages that are not yet backed by the physical memory (the ones that became visible this frame)
yeah but this is the marking visible pages section no?
unless I'm misunderstanding
in my impl marking visible pages is a shrimple unconditional imageStore
the allocation comes later
how do you know which pages to allocate?
It's pages that were active last frame that are being discussed here right
they could have been active any frame in the past, not just the last frame
but yeah
maybe the pass should be called something like classify pages?
if the current page is not backed but visible, an allocation request is emplaced, otherwise if the current page is not visible but backed, it is freed
ye I don't free, but you have the same logic
it should be noted that the virtual page table and active page table are different entities in my thing
but I suppose it doesn't matter what I do
if you say it's fine then you get an 
maybe the diagram will make it more clear
ye so you access the virtual page table immediately during the visible page marking
which makes sense tbh
In my impl I don't free the pages unless the memory pool is full and new pages request allocation
so what I do in the first pass is mark all pages that are visible this frame (so I don't free them later) and store all pages that are visible but not yet allocated (cached)
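The three-way decision from the quoted paragraph, sketched on the CPU. The `requested` flag is my own addition so that many pixels touching the same unallocated page only produce one request; on the GPU this would presumably need an atomic. `PageEntry` and `mark_page` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class PageEntry:
    allocated: bool = False
    visible: bool = False
    requested: bool = False  # assumption: guards against duplicate requests


def mark_page(page_table, coords, allocation_requests):
    """Apply the marking rule for one page touched by a screen pixel."""
    entry = page_table[coords]
    if not entry.allocated:
        # Not backed by physical memory yet: request an allocation once.
        if not entry.requested:
            entry.requested = True
            allocation_requests.append(coords)
    elif not entry.visible:
        # Cached from an earlier frame: mark it so it is not freed later.
        entry.visible = True
    # Allocated and already visible this frame: nothing to do.


page_table = {(0, 0): PageEntry(), (1, 0): PageEntry(allocated=True)}
requests = []
for coords in [(0, 0), (1, 0), (0, 0), (1, 0)]:
    mark_page(page_table, coords, requests)
```

After the loop, `requests` holds `(0, 0)` once and page `(1, 0)` is marked visible, which is the "mark what is visible, remember what is visible but not yet cached" behavior described above.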
I see, so confusion cleared?
yes
hmm maybe I should reword this part to make it clearer?
what do you think made you confused?
the diagram is good as well so we confirmed that it works 
BTW we really should unify the naming, sometimes we use visited and sometimes visible
I think we could mention "VPT" instead of page table generic here
I suggest visible as it is more in line with what actually happens
ye visible is better imo as well
@vestal snow this change (visited -> visible) should also be reflected in the diagram/s 🙏
Sounds good I should be at the computer soon and can push diagram changes
one more nitpick
should we really mention implementation details such as mesh shaders in the HPB culling section?
I personally think it's fine the way it's written
Uh oh yeah good point, I was also thinking about that and I'm not sure
Perhaps I go way into detail about the mapping and such
And simply saying "we use meshlets and mesh shaders" would suffice?
the reason I nitpicked about this is because in other papers you would usually see impl details in the "Our Results" section or something like that
so this is definitely not wrong per se, just probably uhh misplaced a bit?
perhaps this is best, but the "removed" part should definitely go into the results section
"In our implementation [...] we performed HPB culling against meshlet bounding boxes, giving us good culling granularity [...]"
Good idea, yeah, we can also move the section where we describe our selected vsm resolutions and stuff down into the results section
What do you think about removing the split into bookkeeping, drawing and sampling altogether and instead just having the individual steps?
I feel like the split is a bit disproportional in content as is rn
Especially if we remove the implementation details from the drawing
It will become quite plain compared to the bookkeeping part
Yeah
hmm
Not between but in the sections themselves
Look how long the bookkeeping part is
And how long the drawing part will be
inherent flaw of VSM: it's all bookkeeping 
I think the split is useful personally even if the content is very disproportional
@sweet nimbus what do you think
I have a question
In the heuristic we mention the frustum side length of the first cascade. Is it true that the first cascade side length determines all other side lengths in the Tido impl?
like second one is 2* first side length, third is double again, etc.
yep
ok cool
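The doubling rule written out, plus a purely geometric containment helper. The helper assumes all cascades are concentric squares around one center, which ignores page-grid snapping, and it is not the article's actual heuristic; both function names are made up.

```python
def cascade_side_lengths(first_side_length, cascade_count):
    """Side length of each cascade frustum: every cascade doubles the previous one."""
    return [first_side_length * (1 << i) for i in range(cascade_count)]


def smallest_containing_cascade(offset_x, offset_y, side_lengths):
    """Index of the finest cascade whose square footprint contains the point.

    (offset_x, offset_y) is the light-space offset from the shared
    cascade center (assumption). Returns None if no cascade covers it.
    """
    half_extent = max(abs(offset_x), abs(offset_y))
    for index, side in enumerate(side_lengths):
        if half_extent <= side / 2:
            return index
    return None


sides = cascade_side_lengths(2.0, 4)  # [2.0, 4.0, 8.0, 16.0]
```

Since every side length follows from the first one, the first cascade's side length is the only free parameter, which is why the heuristic only needs to mention that one.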
diagrams should be updated
perspective/projection one was deleted since Jaker's replaces it
cascade heuristic diagram is still in the review section at the end
I'll take a peeksie at it later today/early tomorrow
I think clip space should be two words
I think it's okay
Okay, let's leave it as is
Hmm maybe removing it would be fine lol
How many sub-steps are there in the bookkeeping section anyway
5
So we would instead make it 7 steps where one would be culling and drawing and the other would be sampling
Hmm maybe we keep it then
If there were like 2 substeps then it would be more obvious to merge them
I didn't mean to merge the sub steps
I meant to make drawing and sampling sub steps themselves
But it is probably bikeshed
Yeah that's what I meant but wasn't clear in my wording
Should we have some kind of showcase of a few scenes with really nice shadows to put at the very start
I just realized our first image is of artifacting shadows lol
lmao I had that thought too. I think it would be a good idea
Here is our article - let us start with the artifacts
the picture of projective aliasing is with default vsm settings in my renderer too 
I mean we can just be clear about that in the results: "vsm does not fully solve projective aliasing" etc.
there are some nice pics in the results section already fwiw
In reviews I've always been told to use assertive language in papers so "VSM is a flawless technique solving all issues other shadowing techniques struggle with"
btw VSM does not prevent you from using perspective skewing so you could advertise it with LiPSM or TSM

I already feel kind of weird talking about sampling for participating media and translucent objects when none of us implemented either
but then pages are non uniform
real
participating media like volumetrics?
that should go in future work
Oki, we can move that part or remove it if you want
I'm not familiar with these techniques but I'm assuming this is a joke suggestion
I'm half joking
I think we should still discuss it since it's an important feature of a renderer
mathematically VSM + LiPSM is possible
but this is bikeshed
it could go into the future work section
You can read what I wrote and decide
I should be able to implement it in Tido till may or whatever the deadline is
Eh probably not transparent objects
These will probably take more work to integrate with the rest of the pipeline
yeah just leave it in future work. you already have enough on your plate as it is
True dat
Is it allowed for us to mention certain things and just make it clear that we aren't bundling it in the example code, but either talk about what we tried or what UE does for it?
I don't think we should talk about what UE does
Ok, what about things we tried but aren't bundling?
If one of us tried it I think we can say that
such as volumetrics and the depth culling test you guys did
Ideally we should bundle everything we talk about with the example (Tido)
There is some time to implement stuff still at least from what I understood
But I'd avoid speculating about stuff we are not sure will work
Or just leave it to the future work section as Jaker said
Alright that makes sense to me
My volumetrics test is still kind of in a meme state so idk if I will be able to try and contribute it to Tido before the deadline
Do you use the same approach described in the sampling section of the article?
Actually I really should sleep now, I'll be back tomorrow
I read through it and I think my example code is using pretty much the same steps
Alright, then we at least know it should work
Btw are you doing the "project frustum" part or are you requesting pages to be allocated when you encounter them missing while sampling?
Btw funny observation, caching perf is 4-5 times better as long as you don't move very close to the surface
ah yes I need to test that
I mean it makes sense as when I fly close to the surface pretty much everything cached gets cleared and invalidated each frame, but I'd still expect mainly drawing a single cascade to be faster than 1400us
nsight would probably tell you something
But maybe the atomic min contention is going ham
Yeah I should take a capture
we don't need to root cause it atm
True
I'm 99% sure of that too
have you seen the mesh for the trees and bushes in bistro
I feel like the vsm journey is me and LVSTRI being like "mhmmmm what iff" and Jaker standing with a ruler above us and slapping our fingers "No! Not needed, write the article" 
it's all self intersecting quads 
Yeah I'm leaning closer and closer to not caching the lower cascades at all
btw I don't see the caching artifacts anymore
I think I triggered it just by enabling caching and moving around while close to a surface
but ye it's good now
how about we put that in future work
Yep, this is all hypothetical
It just feels bad, as our impl could have been so much better/tried out many more thingies
perfection is the enemy of good enough
we did what we could, given the circumstances
I accepted a bunch of edits and made a few of my own. it seems like the list just keeps getting longer 
Good changes! I reviewed/accepted all of them and made one or two tiny adjustments - asymptotically this surely must be converging to 0 changes 
There is still a bunch written by me that needs to be reviewed though
made some adjustments
it's joever
I double checked and current experiment code has both
Hmm I’m reading unreal’s docs and the volumetrics test almost definitely will need more time than we have. I found this: “In a future release, a more elegant solution would be where some of these effects can be marked for localized pages directly in advance rather than the current, overly conservative coarse pages being used.”
Seems like they’re still experimenting and want to move away from having to mark coarse pages?
@mystic lark do you mean for this to say "light frustum", then "cascade frustum"?
As the light frustum moves to follow the main camera, new pages, previously located on the edge just outside of the cascade frustum, might need to be drawn.
I think it should be "light frustum" both times
Yes, cascades frustum for both
went through the stuff in the drawing phase
now someone just needs to review the sampling phase
Shit I forgot. Someone should copy the contents of the article to the actual book just in case. We've made quite a few changes since then
Currently is 17 pages
I can add you as a collaborator
Just dm me your email
Have you guys gotten any editor feedback yet?
Nothing in the official document so far
@prime ice have you read the article yet?
no
(wpotrick knows what
is about, just in case anyone is wondering wtf that is :D) @prime ice go read it.
Lazy German, bad 
yes
@mystic lark there is a part of the sampling section that I might rewrite
there is a paragraph about participating media and another for translucent stuff, but really they suffer from the same problem
the solution of drawing translucent stuff twice can still be mentioned
I initially had them as a single paragraph and then I rewrote it to be two
but you can definitely merge it
I will try, inshallah
I need to be bullied into writing the final section
I'm procrastinating on it so hard
ok, that sounds better
readers will only care about having a rough idea about the perf of VSM or any other technique
I realized that the implementation doesn't even support these (afaik), so maybe this info should go in the future work?
yeah either that or we rely on JS implementing them
lol ok then I'll move it to the future work for now
I mean we rely on the fact that this works because he implemented them
I know they work, but they don't exist in your impl (the demo we are shipping)
ye(t)
we have theoretically about a week to prepare the demo
is that too little time?
the main thing is getting a tiny benchmark (which you seem to almost have) for a table and the invalidation for moving objects
Nice that’s awesome
!remindme 10 hours finish the conclusion
Alright gpgpu, I'll remind you about finish the conclusion in 10 hours. ID: 67610154
I need to be whipped into shape

I got too distracted with the vsmisms in my own renderer
!remindme 9.5 hours whip jaker back onto track
Alright deccer, I'll remind you about whip jaker back onto track in 9 hours and 30 minutes. ID: 67610452
dogjiffalternativeversion.gif comes to mind
!remindme 10 hours write dynamic object invalidation mask
Alright m_saky, I'll remind you about write dynamic object invalidation mask in 10 hours. ID: 67610485
doggos' eyes are the best
lol
we need some pictures of nice, c r i s p shadows for the VSM article header still
or not lmao. I'm not too opposed to having the first pictures be commonly-encountered shadow mapping artifacts
I wrote some more stuff at the end of the article
I dunno, the timelines are all wack now 😭
Oh uhhh I thought the ones I sent were all you needed
The two screenshots with timestamps that is
And then I forgor
What exactly is required of me?
I'll do it first thing tomorrow, sorry for my recent inactivity, uni is taking its toll
just checking because the conversation all of a sudden stopped here
and there was no "ALL STUFF SENT, REVIEWT, WE GOOD"
I've still been checking the main article each day - no reviews yet ☹️
ah its wolfgangs turn now?
Since it's our first (and only) feedback it's kinda important for us
Ye
me rn
I've been mainly fixing Tido bugs over the evenings, so that it doesn't crash on every second user action
And is actually presentable
the crashes are the spice
Half of the bugs were yours mister Bingus
And I'm not even taking into account that you tried to gaslight me into adding some arcane swapchain extension to fix them
It's fine, he is a slow learner, but he does learn
(ily Patrick don't hurt me please)
I must have dementia
when did you send that
hahahaha
nearlz worked
i will kiss zou
Here
hte day saky leaves me is when i end my minecraft world
@hazy steppe can you pin so I don't forget again 
#1168692074447642664 message
The day you write a coherent sentence is the day I cry with happiness
Hehehe yeah I know
i have a broken word quota to achieve
Parsing your sentences has been a worthy investment of my skill points
Btw Jaker it should be noted that the "mark required pages pass" heavily depends on the page size (128x128) in those measurements
It almost doubles when making pages smaller
Not sure if it's relevant, but just so that you know
damn maybe we should do benchmarks for a few more vsm/page size combinations
small/large page and small/large vsm maybe (so just four total)?
Alright m_saky, I'll remind you about MEASURE VSM LAZY in 12 hours. ID: 67661631
sorry for bothering you when you already did the original benchmark
oh no worries at all, I'm glad I can help
epic table 
I think we need to merge all the pre-draw passes lol
for now I will just sum them but it will be somewhat incorrect
perfect fit
whoops here's context
I'm not sure what the debug passes are exactly
In my thing the debug visualizations are just in the shading pass
They can be omitted I don't actually know why I even timed them
They just generate an image for imgui
Should I also include sampling measurements
Wdym
Do you have filtering
I have pcf
Are you doing the radius marking thing
Crappy one but that shouldn't matter
How does it affect perf
I'm not, I'm just looking for allocated page
So cascade chosen by heuristic, one above and one below
Uh 8 samples is like 0.6ms in 1440p I think
16 samples with super sampling were around 4-5ms though
But that's an extreme case
my friend suggested using a stacked bar graph instead of a boring table
trying to make this look good is cancer. maybe that's why no one else does it
You need better colors 😄
I am looking at other papers and some of them don't have conclusions (they just end on results + limitations + future work)
the 1-sentence conclusion I wrote sucks so I just commented it out for now
anywayyyyyyy Someone™️ should review what I have at the end currently
I need fresh eyes on that part because I've been looking at it for too long and my neuron has calcified
@wooden jolt I nominate you, as saky is busy profiling the sampling pass
as you wish boss
basically I'd just like for you to pls review the text in Results and thereafter
- is it missing any important information?
- is there unnecessary information (I'm afraid much of the future work might be)?
- basic editing concerns like poorly-worded sentences, misspellings, etc.
I've lost the ability to tell what it needs
My watchful gaze will fall upon the white sheets once more tomorrow
@sweet nimbus I reviewed the new sections btw, they look good but I think we should at least mention HZB culling right?
I managed to implement it (unoptimally) and it works wonders
what saky explained is probably how to correctly do it
or 🅱️erhaps we just shove all of that and performance optimizations into the future work & improvements section
What’s the font we’re using? I’ll switch the diagrams to use it today if we know it
Hopefully we get some Wolfgang comments soon
I think Peter will be reviewing our section
Okay I'm 95% there, I improved the timing collection and everything, but I am super tired, is it a big issue if I give the timings tomorrow?
Nope it's fine
@wooden jolt I didn't look at your feedback yet, but did you have anything to mention about the section with one sentence (local lights) 
I've been thinking about local lights, does Unreal really do VSMs for them too?
huh 
I think we should expand on the section or something lmao
idk what to say though
Feels like a horrible idea to do VSM for local lights
don't you want to mainly cache these and aggressively downsample?
UE5 does VSM for local lights
how wtf
the clipmaps behave differently
it's just a mip map no?
ye
so it's a dynamic resolution system for local lights essentially
but it still uses 99% of the vsm logic
but still, you draw the scene an additional 6 * 3 (active visible mip range) times per local point light
I don't think you need hiz for local lights, just cull against a cube "frustum"
you really just cull against a cube yeah
oh because local lights have a range
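Culling against the cube is about as simple as it gets. A sketch, assuming objects have axis-aligned bounding boxes and the light's influence is bounded by a cube with half-extent equal to its range; the function name is made up and this is not what UE does internally as far as I know.

```python
def aabb_intersects_light_cube(aabb_min, aabb_max, light_pos, light_range):
    """True if the box overlaps the cube of half-extent `light_range` around the light.

    The cube over-approximates the light's sphere of influence, so the
    test is conservative: it may keep a few objects that a sphere test
    would reject, but never rejects one the light can reach.
    """
    return all(
        aabb_min[axis] <= light_pos[axis] + light_range
        and aabb_max[axis] >= light_pos[axis] - light_range
        for axis in range(3)
    )
```

Anything failing this test can be skipped for all six faces of the light at once, before any per-face work.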
and they also cull the VSMs themselves for local lights
I'm stupit
they have a feedback pass where they just cull local lights that are not in view or do not contribute to the lighting
(or if adding more local lights would go overbudget)
hmm I'll just add that local lights are feasible with VSMs, but we "chose" not to implement them for simplicity
ye just mentioning is fine
VSM are a great fit for local lights for the same reasons they're a good fit for directional lights
but can't implement them because, you know 🕐
it looks like Computer Modern is the default. I'm not sure if they overwrote the font for the book
Latin Modern is a similar one you can download to use with diagram tools etc.
okay here are the measurements
I always included only changes from the original two measurements
so for example changing page size from 128x128 -> 64x64 changed no values when I was not caching and only changed the bookkeeping when I was caching
For sampling the "no search" means that I only attempt to sample the current pixel page and don't try to go one cascade below and above on page miss
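To pin down what "search" versus "no search" means in these measurements, a small sketch of the lookup. `is_allocated` is a hypothetical callback answering whether the page covering the sample exists in a given cascade, and trying the coarser cascade before the finer one is my assumption, since the order was never stated.

```python
def find_allocated_cascade(is_allocated, cascade, cascade_count, search=True):
    """Return the cascade to sample from, or None on a page fault.

    With search disabled only the heuristic's cascade is tried. With
    search enabled, a miss also tries one cascade above (coarser) and
    one below (finer), in that order (assumed).
    """
    candidates = [cascade]
    if search:
        candidates += [cascade + 1, cascade - 1]
    for candidate in candidates:
        if 0 <= candidate < cascade_count and is_allocated(candidate):
            return candidate
    return None
```

The extra cost of the search variant only shows up on a miss, so the gap between the two measurements should depend on how often the heuristic's page is missing.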
Berfect
for practical purposes, I think I will just use the mean numbers in this even though they don't paint a perfect picture
I do want to show how vsm/page/sample counts vary the perf though
so I'll probably make three tables
or one big table
or two medium and one small 
@wooden jolt can you elaborate on the comment you made on the optimization section? What do you mean by compute
I meant to have asked this yesterday but I forgor
I mean compute-based software raster
it's kind of a key point in VSM to make it quicc
it's all theory sadly
but you can logically draw that conclusion pretty easily
bound by PROP -> remove PROP from the equation -> enjoy speedup
but who's to say that whatever we make will be faster than PROP 
by the same argument that primitive culling in compute is faster than fixed function culling
it's not faster in mine 
that argument is just "hardware slow" bruh
yes 
certainly we can beat the hardware in some cases
but ye either way we really shouldn't try to state this as fact
and only just in theory
game theory
oh damn this reminds me
there is no backface culling in Tido yet
uhhhhhhhhhhhhhhhhhh
I'll have to fix that tomorrow
i fixed the thread's initial emojis btw 🙂
What are unreal guys cooking
Why is the tile smaller towards the shadow edge, rather than when closer to the camera
forbidden witchcraft
really though wtf 
they're probably using the shadow mask to find edges and create frustum dynamically?
it's probably the best solution to projective aliasing isn't it
there's a million different ways I can see this dynamic frustum partitioning failing tho
perhaps some kind of feedback to figure out how "full" the frustum is and the ratio between the expected size and the maximum size for said frustum
and why am I pondering this when it's 3AM
why is the frustum allocation so irregular tho what the hecc
how does that make sense? do they have multiple clip N projection matrices?
that's nuts though
Yeah I have no clue, or maybe they can afford a per page projection matrix, uhh I don't understand
Will need more pondering on the background threads, hopefully they come up with something while I eep
I have run out of background threads
can you spare some 
I need to invest in more brain wrinkles
U should download unreal yourself
Keep the enemy close, I like it
Or I can take a video for you lol
Nono it's fine haha, I will download
They also do some magic with SMRT where they dynamically determine their ray count or smth
More food for the neuron ig
I guess you can use some heuristic to tell if you're on an edge and to therefore allocate more rays
I think the final fantasy shadow paper discussed something kinda like that maybe
Actually I think that was just doing the shadow test, then seeing if the light needed to be shaded if the occlusion was less than 100%

this is giving me intrusive thoughts doubts about our whole strategy
actually I feel better now
it seems like they are a little smarter about clipmap selection- note how the boundaries are square
compared to this
some of the texels here overlap two valid pages, but choose the less detailed one from the heuristic
it barely matters, but yeah
magic
there is some data missing here
I'm sure you're aware
just the draw and bookkeeping for some stuff on the right
I didn't want to make it three tables. I'm taking suggestions on how to make it look better
For caching, the sampling timings remained the same as the original one
Das why I didn't copy them over again
ye I understand that
Ah this is what you mean
Okay I'll remeasure for these three setups you have
And fill in the values
btw didn't you add some optimizations that improve these numbers
Yeah exactly
Just the draw was affected though
But ye, I'll modify the table once I have the values again
I think I won't include the "sampling with search" results because we didn't discuss that
hmm
does anyone (not saky because he's a busy boi) want to take some pics of the various vsm debugging features in timberdoodle for the appendix
I also think we should each read the entire article again this weekend as a final pass to make sure things are good
then I will email the editor for our section and ask what's up with the timeline
it would also be useful to see numbers for 1 sample and 8 samples (no search for both)
ah and I guess the sampling numbers are for 4k, 128x128 pages?
Ye but it doesn't affect perf at all (the resolution or the page size)
Hmm but the page size should affect it
Uhh hmm
Okay, I'll also add that
that's quite odd tbh
If it didn't affect perf then I'll mention it
I'll remeasure again
I hope it doesn't take too much time
I tell designated people at work to do perf testing like this and it usually takes them a while to complete it
saky is an amd employee confirmed
All good, I'll get to it this evening (after I do dynamic objects)
My perf testing is not super rigorous, I mainly use the timestamps + sparse nsight captures to validate
Hope that's enough
perfectly fine
nsight is overkill imo unless your timestamps don't capture something
we just need to give the reader a rough idea of perf
Nsight is more of a sanity check thing
To catch bugs in my perf measure code (if there are some)
the trick is to exclude barriers in the timestamps so the perf looks better
I probably also should wrap each task with a full barrier to get the true exec time
(only for perf measurements ofc)
yea
If I can get it running on my machine I’ll give it a shot
I think saky mentioned he has new build steps posted unless I made that up
You might need to copy something into the cmakelists.txt to make it fetch vcpkg
oh I forgot about that sorryy
I remember building it for me was roughly:
- Copy vcpkg lines into cmakelists.txt
- Generate a VS solution or open the root directory in VS
- Build
- Run the exe from the root directory
adding it in a bit
JS might be able to do it with my crappy instructions if he's familiar with cmake
yeye I'll just add the vcpkg thingy into the Cmake
Well, with the vcpkg stuff
automatic download of vcpkg is now in, so all you need should be cmake and ninja
I've run into a few issues. I might have to continue tomorrow
it seems like I can't access the mirrors vcpkg is trying to use and I'm not too sure why
I'll need an error or something to go off of
I wasn’t able to work on this yesterday. I’ll post error info in #1166523139468034068
I'll stop spamming it with my stuff too
https://youtu.be/DDi62bPbWHw?t=885 you guys trying to build a car
shoot
i ll make my own early z
transient Z buffer is cope btw
you know what would be very helpful
some sort of function to feed the rasterizer stage that transforms output coords
which basically does this:
`physicalMemoryCoord = f(primitiveCoord.xyz)`
and physicalMemoryCoord becomes gl_FragCoord
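A minimal sketch of what such an `f` could look like if it were just a page-table lookup, written on the CPU for clarity. The names (`PageTable`, `to_physical`) are made up for illustration, and the 128x128 page size is taken from the measurements mentioned earlier. No such rasterizer hook is assumed to exist.

```cpp
#include <cassert>
#include <optional>
#include <vector>

constexpr int PAGE_SIZE = 128;  // texels per page side (assumed)

struct Coord { int x; int y; };

// Hypothetical page table: one entry per virtual page holding the position of
// its physical page (in pages), or nothing when the page is not allocated.
struct PageTable {
    int pages_per_axis;
    std::vector<std::optional<Coord>> entries;

    explicit PageTable(int pages_per_axis_)
        : pages_per_axis(pages_per_axis_),
          entries(static_cast<std::size_t>(pages_per_axis_) * pages_per_axis_) {}

    void map(Coord virtual_page, Coord physical_page) {
        entries[index(virtual_page)] = physical_page;
    }

    // The wished-for f(): virtual texel -> physical texel.
    // Returns nothing when the page is unallocated (the fragment would be dropped).
    std::optional<Coord> to_physical(Coord texel) const {
        assert(texel.x >= 0 && texel.y >= 0);
        const Coord page{texel.x / PAGE_SIZE, texel.y / PAGE_SIZE};
        const std::optional<Coord>& entry = entries[index(page)];
        if (!entry) return std::nullopt;
        return Coord{entry->x * PAGE_SIZE + texel.x % PAGE_SIZE,
                     entry->y * PAGE_SIZE + texel.y % PAGE_SIZE};
    }

private:
    std::size_t index(Coord page) const {
        assert(page.x >= 0 && page.x < pages_per_axis);
        assert(page.y >= 0 && page.y < pages_per_axis);
        return static_cast<std::size_t>(page.y) * pages_per_axis + page.x;
    }
};
```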
Correct hybrid ray traced shadows require a linked list of primitives per texel
but its holey
And you need conservative raster
just another rabbithole
what hw could we ask for to make hybrid raytraced shadows faster
(i.e. not using a per pixel linked list)
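For reference, the usual per-pixel linked list layout looks roughly like the sketch below: a head index per texel plus a shared node pool with an atomic allocator. This is a generic CPU-side illustration using `std::atomic`, with hypothetical names (`TexelLists`, `push`), not anything from Tido.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

constexpr uint32_t END_OF_LIST = 0xFFFFFFFFu;

struct Node {
    uint32_t primitive_id;
    uint32_t next;  // index of the next node, or END_OF_LIST
};

struct TexelLists {
    std::vector<std::atomic<uint32_t>> heads;  // one list head per texel
    std::vector<Node> nodes;                   // shared fixed-size node pool
    std::atomic<uint32_t> node_count{0};       // stand-in for a GPU atomic counter

    TexelLists(std::size_t texel_count, std::size_t max_nodes)
        : heads(texel_count), nodes(max_nodes) {
        for (auto& head : heads) head.store(END_OF_LIST);
    }

    // Prepends a primitive to a texel's list. Returns false if the pool is full.
    bool push(std::size_t texel, uint32_t primitive_id) {
        assert(texel < heads.size());
        const uint32_t index = node_count.fetch_add(1);
        if (index >= nodes.size()) return false;
        nodes[index].primitive_id = primitive_id;
        // Swap ourselves in as the new head and link to the previous one.
        nodes[index].next = heads[texel].exchange(index);
        return true;
    }

    // Walks a texel's list; meant to run after all pushes have finished.
    std::vector<uint32_t> primitives_at(std::size_t texel) const {
        assert(texel < heads.size());
        std::vector<uint32_t> out;
        for (uint32_t i = heads[texel].load(); i != END_OF_LIST; i = nodes[i].next) {
            out.push_back(nodes[i].primitive_id);
        }
        return out;
    }
};
```

The traversal is a chain of dependent loads per texel, which is where the perf pain comes from.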
the sad thing is that it's a short rabbithole 😢
figure out per pixel linked lists
rip 🅱️erf
give up because there's actually no way for optimization
hw accelerated linked lists
VK_FORMAT_R32_UINT_ARRAY_10000
@mystic lark I can't remember if you got those new benchmarks
Ah I forgot
I'll try to get them today (have some urgent uni work I need to do first)
downloading
btw patrick was testing something and apparently if you run it through nsight you get much more consistent results
also discord and other electron apps cause random spikes in the frame times
nothing major, just letting you know
I just need ruff numbers
what I did to measure was
- hit Stop Gathering and reset timings in Render Statistics
- hit Use preset camera and press `I` to override the keyframe in Shader debug menu
- manually set 0th keyframe and 0.0 keyframe progress in Shader debug menu
- hit start gathering and press `I` at the same time to start the gather and start moving the camera
- hit stop gathering once the loop is complete (8th keyframe 1.0 keyframe progress)
its a bit cope, but I am really pressed for time unfortunately ☹️
(also make sure to pull latest Tido I made some small changes)
perfect, thank you for doing it instead of me
I'm just trying to figure out why it's not loading the model (probably my working directory being fooked)
and now I'm trying to figure out how to use the VS debugger but have it set the right working dir
I'm a bit confused, so it loads the model now?
- after I pulled, the application would instantly close when running it
- I'm trying to use the debugger now, which I previously wasn't because it would set the wrong working directory (so the assets wouldn't load)
- I'm trying to set the correct working directory for the debugger so I can use it
I did, the error message is nothing 
I try to load the wrong file I think
I think I know why
it's crashing when dereferencing the ball
because the file didn't load
This should be uncommented
ah
yeah I was fiddling with this stuff
I typed it myself and had
std::filesystem::path const DEFAULT_HARDCODED_FILE = "bistro_fix_ball_compressed\\bistro_c.gltf";
Ye I told Patrick to wrap it in an if block but ig he forgot to push
Meant to answer to the message above ups
yay I can debug it now
Good good
Very nice, that’s a cool demo
I should give saky a raise
it would be cooler with a certain cube model : > (not really)
hmm why is there no deccer cubes emoji or sticker
I'll see what I can do
I needed that because I had a bug
so I had to have something to reason about what I'm seeing
📸 yep, that's going into the appendix (it's a good debug view)
we need a debug view ranking btw
I like the projected overdraw the most
(because it gives me nightmares)
btw there is a bug in my heuristic somewhere that I've not yet had time to fix
you can see it when you have visualize clip levels on
sometimes (for some reason) the shade opaque pass requests pages that have not been allocated by the mark required pages pass
that doesn't seem too bad tbh
I did find an odd flickering bug though (might not show up in this vid because my monitor is 240hz)
the search during sampling makes it go away
oh god what is that
actually it might be a caching thing
jaker it's 3am
how low is the sun?
ALSO I might've moved the sun a tiny bit by accident which would cause the cache to be wrong
actually nah it looks obviously wrong when I do that
oh yeah I also need to add invalidation on sun movement
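One simple way the sun-movement invalidation could look, as a sketch only: remember the direction the cached pages were rendered with and throw everything away when it changes. `ShadowCache` and `update_sun` are hypothetical names, and invalidating on any change at all is the conservative assumption here, not a statement about how Tido will do it.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Vec3 { float x, y, z; };

// Hypothetical cache state: the sun direction the cached pages were rendered
// with, plus one validity flag per cached page.
struct ShadowCache {
    Vec3 cached_sun_dir;
    std::vector<bool> page_valid;

    // Call once per frame. Returns true if the cache had to be invalidated.
    bool update_sun(Vec3 new_dir) {
        const float len_sq = new_dir.x * new_dir.x + new_dir.y * new_dir.y + new_dir.z * new_dir.z;
        assert(std::fabs(len_sq - 1.0f) < 1e-3f);  // expects a normalized direction

        const bool unchanged = new_dir.x == cached_sun_dir.x &&
                               new_dir.y == cached_sun_dir.y &&
                               new_dir.z == cached_sun_dir.z;
        if (unchanged) return false;

        // Any movement, however small, makes every cached page stale.
        page_valid.assign(page_valid.size(), false);
        cached_sun_dir = new_dir;
        return true;
    }
};
```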
I need a checklist
it's hard enough to trigger my bug that it doesn't really matter tbh
I gotta do this
I have a theory for this btw
I think it might be my unprojection from depth to world as I don't account for the near plane there
actually no, that is not it
eh idk, I'll solve it later 
it's no biggie fr
it's time to rest soldier
I think tido is basically shippable for the demo
well maybe the cache invalidation when sun moves but eh
I have some Java profiling that needs to be done
java?
uni garbage
knocking at my door
due by tomorrow midnight
and tomorrow I'm not here half the day
I am so happy I did all the java related exams (DB and UI)
it's in the top 3 worst exams
I don't think I can PR my function without making a new fork and going through a bunch of ceremony
it would be simpler if you copied this
#1166523139468034068 message
okay
I can also ask Patrick (I assume he will have no issue with that) and give you access
that would work too
this will be my contribution to tido, then I'll keel over and die as my life's work is complete
'tis but the beginning of a great journey
you won't die so easily
I will drag you all into nanite one way or the other
or maybe GI
hmm so many things to bikeshed
if you make a nanite library then I'll happily use it
GI is something I want to try though. the weeds of geometry processing do not interest me as much
I need to start with shrimpler stuff first
like getting the vk version of frogfood working
after this article
when's the deadline btw
idk at this point 
tomorrow's lectures are cancelled because uni exploded so I'll do some reviewing I guess
also I have to get tido working
yesterday was when we were supposed to submit the code
three weeks ago is when the draft review period was supposed to start
I'm gonna email our contact later tonight and ask what's up
we have not received any review which is kinda sus

I'm guessing the deadlines are a bit more... "flexible" than they initially seemed
also fyi: I think my hardware is too old for Tido. Would anyone else be able to take screenshots of the debug stuff?
I have another GPU laying around but it apparently also doesn't support mesh shaders 
in this machine it's a gtx 1060, and in another one it's an rx 5700
I see
Mesh shaders aren't supported until RDNA 2 (6000 series) and idk about nvidia
So turing+
yeah
apparently apple supports them, but trying to get things running on mac may be cursed
I might still try lol
Oh yea, we committed to the mesh shader pipeline when I brought over VSM, it was a nightmare maintaining both compute and mesh paths ☹️ so mesh shaders are now unfortunately required
only on M3 GPUs though
for M1 and M2 it's emulated in compute and might not actually be faster than normal vertex shaders
That makes sense
I managed to get it to compile on M1. It doesn't run (symbol missing - maybe arch mismatch somewhere), but it compiles
hmm I didn't know that
I think M3 is also getting raytracing hardware
are these good enough for the appendix? I plan on having two more pics of other debug stuff
if no one says anything in the next 30 seconds it will be added
Those are some strict deadlines you have 
yeah M3 was the big GPU update, it got that dynamic register allocation thingy, hardware mesh shading and hardware RT
the apis for which have existed since M1, but both things are emulated in compute on M1 and M2
no idea about performance numbers on M3 though, but I do have one
hmm, what if I were to capture a debug view from my engine and disguise it as saky's cooking
delightfully devilish I must say
(I just want a pic of the overdraw view)
smart
comment within the next 5 picoseconds if you don't like this
hmm maybe the colors need to be hotter
remove the missing alpha placeholder and ship it
Then we just need one from lvstri and we’ll have snuck a screenshot from all 4 implementations into the article lol
Pleeeasee 🥹 🙏
it's just an imageAtomicAdd to VSM physmem sized image 
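Spelled out as a CPU-side sketch: one counter per physical texel, bumped once for every fragment that lands on it. `OverdrawImage` and `record_fragment` are hypothetical names. In the actual shader the increment would be the GLSL `imageAtomicAdd` call on an image the size of the VSM physical memory.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a physical-memory-sized R32_UINT image.
struct OverdrawImage {
    int width;
    int height;
    std::vector<std::atomic<uint32_t>> counts;

    OverdrawImage(int width_, int height_)
        : width(width_), height(height_),
          counts(static_cast<std::size_t>(width_) * height_) {
        for (auto& c : counts) c.store(0);
    }

    // What each shadow fragment would do: add 1 to its physical texel.
    void record_fragment(int x, int y) {
        assert(x >= 0 && x < width && y >= 0 && y < height);
        counts[static_cast<std::size_t>(y) * width + x].fetch_add(1, std::memory_order_relaxed);
    }

    uint32_t at(int x, int y) const {
        assert(x >= 0 && x < width && y >= 0 && y < height);
        return counts[static_cast<std::size_t>(y) * width + x].load();
    }

    // Handy for scaling the heat-map colors in the debug view.
    uint32_t max_overdraw() const {
        uint32_t best = 0;
        for (const auto& c : counts) best = std::max(best, c.load());
        return best;
    }
};
```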
yeah this is a screengrab from paint.net lmao
If I didn't have homework due today I would immediately add it
ok I add it
I wonder why there is overdraw on triangle edges
hmm you mean overdraw from helper invocations?
it can't be helpers because those don't participate in any output
here you can see overdraw even though it's one triangle thick (you can see near plane clipping at the bottom)
If you are serious, follow my example in vsm_state to create the image. Then the cull_and_draw_pages task head in vsm.inl will need to be modified, and the call itself too, and finally the fragment shader of cull and draw pages (both opaque and transparent)
It should actually not be that hard to add, daxa makes stuff very easy
You don't need to cope with the special views I do for the memory block
gotcha
Regardless we can use Jaker's pretty views and I'll try to work on Tido a bit more tomorrow to finish up the remaining bits
perfect fit
beautiful 
I decided against having an explainer saying why debug visualizations are good and just kept it at two figures
Seconded
the text may need some light editing, but I'll let someone else deal with that 
all the article needs is for me to do those damn benchies (aaaaand perhaps one more light editing pass)
given the current pace of things, it seems I am fine to put them off until tomorrow 
So did you folks get any feedback on the article ?
nah you've had a disproportionate amount of stuff on your plate lately, let me take some load off
I also think you deserve to have the first author name since you did the most writing and contributed the whole-ass demo hehe
saky is an absolute machine recently
i added a single comment to the article btw
man this is so cool what you guys made 🙂
Yeah good job everyone! It turned out really nice
will we have a place in the article where we can add something like "special thanks to" for anyone who helped read/comment?
We can have an acknowledgements section
VSM is mentioned in this presentation near the end. should we reference it?
https://advances.realtimerendering.com/s2015/aaltonenhaar_siggraph2015_combined_final_footer_220dpi.pdf
we do not
I'll just add a sentence
fug idk how to cite presentations like this 
eh whatever I'll try
you guys can clean up my mess 
ok I hate how all the info about VSM is packed into two slides in this presentation
128k VSM 
how do they render all pages at once
I think they mean the "theoretical max resolution"
are they doing sw raster
idk how the vsm could be so big and they draw all the pages at once
unless they only draw a small portion of it at a time
idk how to even mention this now
someone with a shitter account could probably send a msg to seb?
I don't
it's way too ambiguous ye
sw raster is one way 
the other way is that you are only seeing a tiny part of the shadow map at a time
so you just draw a box around it
I guess you could construct per page projection also
But that seems nightmarishly slow
you'd need one drawlist per page
Yeah
which is bonkers 
most importantly, how do we relate this to our article?
"virtual shadow mapping has been explored before, ...?"
Yeah exactly
that is unsatisfactory imo
what were the lessons of the previous exploration of vsm
(I mean obviously we didn't use them as we just learned of this thing but yeah)
Well the issue is we have no idea what they do in the presentation
I'd tie it with the unreal citation
But eh idk, the presentation information is extremely unsatisfactory
just enough to say that they did it first 
I think it may also be worth mentioning the other virtual texturing stuff that has been explored somewhere around that part too
I have to eep now but I'll think about it
Is it okay to just dm Sebastian?
Does he have a website
You can get emails from git commits
https://github.com/sebbbi
Welp
twittershitter™️
There was a thing you could do, like putting .patch at the end of a commit url on GitHub to see the raw commit info