#Fwog and co.
for some reason, nothing renders when I run the examples through renderdoc on my work pc
then renderdoc crashes when I try loading a capture
on amd gpu?
I guess I have a UBism somewhere
commented out the whole RSM pass too just in case, but that ain't it
also note that it runs fine™ when it's not through renderdoc
do you have ogl debug callback?
ye
ok that is odd
not that you have debug callback
odd is that it's not giving clues
nvidia's debug callback is pretty vocal about every little thing
even non errors
renderdoc is doing something funky I guess
but it's only adding stuff to spy on gapi calls and shaders
see if you can open this
yeah very funny
hmm that seems suspiciously small
or maybe I haven't told you that I don't use renderdoc
I did make the window tiny though
the captures seem corrupted somehow, even a 2k capture is less than 1mb even though I have several screen-size textures
wtf even hello triangle fails
well, it doesn't crash at least
but it displays incorrectly
I'm just going to assume that my renderdoc installation is cursed since I have been having other issues with it
i can load the capture
does it look like my pic?
yes, no vbo/ebo bound
it should be one of the sub-events for the draw call you have selected
wait no
it should be higher up
in the scene pass
or base pass
RSM Scene shows valid vao+vbo et al
interesting
mesh output shows this
vs input is fine
for base geometry and rsm scene
maybe because of
sounds like bs

GL_LINES is 1 i think
and number values are interpreted as glenums in function parameters
can you try this one (it's hello triangle)
for me the triangle doesn't render at all, but when I look in the capture it is black (which is also incorrect)
purple screen here
ye it should be purple + 
ye, just shrimple purple
the mesh is fucked
color attribs are all 0 and when you click through the vertices in the vsout part, it only jumps back and forth on the start/end points of that "line" you see there
thats also what you see when you do wireframe draw in the textureviewer
vertexbuffer seems to have only 1 vec
0, 0, 1 is in there - thats positions
I was able to open the capture and the vertex positions looked correct, but the colors were wrong
a_color cant be displayed
just shows --- -- --- here, when formatted to vec4
its rgba32 according to the vao
a_color should be RGBA8_UNORM
I get the same
are you playing with a 7900 XTX?
I'm on a 6800 still
I wouldn't be having you help me debug this problem if it was an unreleased GPU 
anyways, I get the same thing here, but when I look in the VAO creation I see what I intended to put there
ye thats what i just checked too and formatted the buffer like that
looks like 256, 256, 256 as 3 values in a_color
try ubyte x3
it shows 255 255 255 for me with ubyte x3
ye same
tl;dr I think renderdoc is being dum dum
mayhaps
the vao showing funny shit is not the first time in the VTX section
failing on hello triangle is not good
resource inintitniittitiliaization for vao 57 also looks funky
I see a million glVertexArrayVertexBuffer calls after the pic I showed
exactly
thats what im referring to
those cant be all calls in all frames
otherwise the purpose of this window wouldnt make sense
ye it says frame #118
i would assume you see all calls on this resource during its creation, not during its use until whenever
can you create a new capture and check all the checkboxes
wat
capture options
ah
strange how imgui works, but the other shit not
wdym
its coming "out of the screen"
and you are looking right onto the hypotenuse
thats why we dont see anything
is that some glProvokeVertex shisms?
for me, it is somehow rendering even though the third vertex is invalid (I guess AMD makes it become (0,0) in that case?)
my VS in is this
eh I'm just gonna give up on this for now
I can report the issue lat0r, I gotta work on work stuff
please report and link the issue for updooting it
i do be rembering that when i had my r9 fury in this linuxism it also didnt work, just black screen
i fink repro should also be shrimple with the hello triangle repo
its cmake, select hellofrog and run
- capture
this only happens on AMD afaik
captures work fine for me on NV
I'll submit the report later™
hehe just try again
although none of the glsl plugins worked really well at all in vs
I have one that 
Classic case of "not my fault- it's the drivers, I swear"
when an accident takes place the driver is the primary suspect
Bugs are never a gpu devs fault
especially not crashing on minimize bugs
That was intended behavior which happened to annoy some users
If you can't handle me at my crashes, then you don't deserve me at my undefined behavior
lol
manually implementing texture bilinear interpolation is extremely cursed
if you land exactly on a texel center, you get an ambiguity which can lead to strange problems
my math was resolving it differently than textureGather was
I also discovered that my reprojection math is somehow off by half a texel
actually nvm that, I'm just trippin yo
when your sample lands exactly on the center of a texel, it could be within four different "boxes" for interpolation
take my expert drawing
if the texel (red) is in the magenta box, the bilerp weights would be (1, 1) (since it's on the top-right corner), but if it was in the green box, the bilerp weights would be (0, 0)
ah
so what I was experiencing was textureGather choosing a different box than what my math was choosing
so my weights were wrong
well yeah you snap to the centre of just one, usually top left
that was one of my solutions
the other is adding an epsilon to the uv
which is icky
no I mean bilinear is snapping to top left, getting the other 3 samples and interpolating using difference between snapped and original
ah
what I'm actually doing is just snapping to the texel and not using any interpolation if it's close enough
it is usually done by remapping uv to texture size and truncating to int, then the fractional part is your interpolation factor
ye that's what I'm doing
to get the weights
but only if the UV is not nearly on the center of a texel
why make any special cases?
I explained why
which four texels does textureGather give me if the provided texcoord is on the center of a texel?
it will give you the texel and zero lerp factors
I don't understand
textureGather gives you the four texels that would be selected for performing linear interpolation if you sampled at that point
it doesn't weigh them or anything
do you have to use gather?
you could do vec2 samplepos = uv * textureSize(tex, 0); ivec2 texelpos = ivec2(samplepos); vec2 lerpfactor = fract(samplepos);
then use texelpos to get 4 samples with texelFetch
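A minimal Python sketch of the truncate-and-fract approach described above, for anyone following along. The names `bilinear_setup` and `bilinear_fetch` are made up for illustration, and the half-texel offset is an assumption added here (OpenGL places texel centers at (i + 0.5) / size); it is not part of the one-liner in the chat.

```python
import math

def bilinear_setup(uv, size):
    """Return (base texel, lerp weights) for a software bilinear fetch.

    Assumption: texel centers sit at (i + 0.5) / size, so half a texel is
    subtracted before splitting into integer and fractional parts.
    """
    sx = uv[0] * size[0] - 0.5
    sy = uv[1] * size[1] - 0.5
    ix, iy = math.floor(sx), math.floor(sy)
    return (ix, iy), (sx - ix, sy - iy)

def bilinear_fetch(texture, uv):
    """texture is a list of rows of floats; fetches are clamped to the edge."""
    h, w = len(texture), len(texture[0])
    (ix, iy), (wx, wy) = bilinear_setup(uv, (w, h))

    def fetch(x, y):
        return texture[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    bottom = fetch(ix, iy) * (1 - wx) + fetch(ix + 1, iy) * wx
    top = fetch(ix, iy + 1) * (1 - wx) + fetch(ix + 1, iy + 1) * wx
    return bottom * (1 - wy) + top * wy
```

With this convention, a sample that lands exactly on a texel center gets weights (0, 0) and always resolves to the box whose bottom-left corner is that texel, so there is no special case for the ambiguity discussed earlier. Whether that matches what textureGather picks on a given GPU is a separate question.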
apparently this person went through similar pains as myself
https://www.reedbeta.com/blog/texture-gathers-and-coordinate-precision/
so how about dropping gather
it's implementation dependent
that branch is probably going to hurt you more than what you get as a hypothetical win from gather instead of 4 fetches
not just branch itself but the used registers to evaluate it
assuming I'm already close to losing max occupancy due to register use
not on a rtx no
which is definitely something that would need to be profiled rather than guessed
well you seem to know better so I'll leave you be
taking four samples manually might be the best way to preserve what's left of my sanity after all
actually I decided to do something else entirely 
please welcome float RejectDepth4(vec3 reprojectedUV); to the family
I think I'm going to have nightmares of temporal ghosts haunting me
holy crap when I move something in front of my eyes really fast I see ghosting
nobody believes in TAA ghosts until they trick themselves into implementing TAA
I'm too smart for that. I'm only implementing SVGF (which requires TAA on top of the spatiotemporal filter) 
I kinda wish opengl glsl had samplerless textures (you can only call texelFetch on them) like vulkan glsl does
just a minor convenience
@shell inlet I'm unsure where in the pipeline I should put the variance calculation, but I guess
. It seems like the variance can be used to drive temporal accumulation as well as the spatial blur at the end.
Maybe I should compute variance of unfiltered output (or from history?) -> temporally accumulate (using variance) -> compute variance of the output -> spatial blur (using new variance)
I see pros and cons in the different places to compute variance, and maybe there is no one true answer
I have a feeling that I need to calculate the variance twice
it seems like these guys compute variance just once near the beginning
https://youtu.be/RTtVC-Zk5yI?t=444
they also use a phat kernel for computing variance
oh wait these guys also have temporal variance 
so there is "history moments" which I guess are accumulated per-frame moments (accumulated in the same reprojection pass I guess (though the variance won't go down over time unless you do something))
so then with these history moments you can compute the temporal variance you use for driving the spatial blur at the end
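A rough Python sketch of the "history moments" idea as described in the messages above: accumulate the first and second luminance moments with the same temporal blend, then derive variance as E[x^2] - E[x]^2. The function names and the default alpha are assumptions for illustration, not taken from any particular implementation.

```python
def accumulate_moments(history, luminance, alpha=0.2):
    """Blend this frame's luminance moments into the history.

    history is (mean, mean_of_squares); alpha is the temporal blend factor.
    """
    m1, m2 = history
    m1 = (1 - alpha) * m1 + alpha * luminance
    m2 = (1 - alpha) * m2 + alpha * luminance * luminance
    return m1, m2

def variance_from_moments(moments):
    """Temporal variance estimate; clamped because rounding can push it below 0."""
    m1, m2 = moments
    return max(0.0, m2 - m1 * m1)
```

A pixel whose luminance never changes converges to zero variance, while a flickering pixel keeps a positive value that can then drive the spatial filter.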
this variance stuff adds a bunch of complexity
da paper should has every answer
i dont remember where is everything at
I find it odd that svgf uses a constant temporal blend factor of 0.2
I feel like it should be driven by variance or something
no
why
variance is not the same as temporal gradient
variance does not help fight ghosting
the idea was to give more effective samples to areas with a longer history or lower variance by using a lower alpha for those places
I swear some implementations do something like that
ah, that's a-svgf
and I'm guessing they don't use just variance for that, judging by your response
well, my idea was not exactly about approximating the temporal gradient
it would probably make temporal overblurring worse though
just heard one second of him talking, is he german by any chance?
Probably
they are everywhere
there is so much interesting stuff out there
temporal gradient is easy though, just have 1 pixel in 3x3 tile compute lighting with exactly identical surface info between 2 frames and take the difference
so you need to store a chunk of previous gbuffer or smth and use it to calculate new illumination with the up to date state of the scene, then compare it with illumination from previous frame
then the difference you have is only related to change in illumination
Seems 🦐le enough
too quiet
hows the progress going
slowly, but I think I understand almost the whole svgf paper now
I noticed that some impls use a bilateral filter for computing variance, while others use a separable gaussian without edge weights
I suppose I will try both
they prolly drop weighting to squeeze performance at the cost of losing quality
bilateral should be superior
I should note that the impl in question (which uses a separable gaussian) is specifically for shadow denoising, which means the only edge weight that matters is the depth weight
So they can get away with cheating more easily
yeah sounds reasonable
glsl has cool syntax
bool valid[2][2] = bool[2][2](bool[2](false, false), bool[2](false, false));
I was having a strugglelele with manual bilinear filtering (again), but I managed to solve it by swapping some numbers, and idk why. Here is the shadertoy I used to solve the issue
https://www.shadertoy.com/view/DsXXWB
pay attention to what you pass to bilerp
you had to swap them because you swapped the passed arguments
dammit 
wait no I intentionally swapped the arguments
the shadertoy as I sent it is the "correct" thing
even though the code looks wrong because of the swapped args
my confusion is somewhere else
vec3 Bilerp(vec3 _00, vec3 _10, vec3 _01, vec3 _11, vec2 weight)
return Bilerp(_00, _01, _10, _11, weight);
?
intentional?
yes, surprisingly
why though
when they are in the correct order, the output is this (the issue is with the inputs I pass to the function)
so I just changed the order of params because it was low effort
the problem is a misunderstanding elsewhere
i thought you changed the function?
//this really is left vertical
vec3 bottom = mix(_00, _01, weight.x);
//this is right vertical
vec3 top = mix(_10, _11, weight.x);
oh man I really messed that up
so if you make the order correct and swap in the function it will remain correct
yeah changing those to the correct thing fixes it
I appreciate you looking and confirming my dementia diagnosis 
you're welcome
wanna try fun thing?
use smoothstep(0.0, 1.0, weight.x) in your mix instead of weight.x
vec3 Bilerp(vec3 _00, vec3 _01, vec3 _10, vec3 _11, vec2 weight)
{
vec3 bottom = mix(_00, _10, smoothstep(0.0, 1.0, weight.x));
vec3 top = mix(_01, _11, smoothstep(0.0, 1.0, weight.x));
return mix(bottom, top, smoothstep(0.0, 1.0, weight.y));
}
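For reference, a Python port of the corrected Bilerp with the smoothstep variant as an option. The argument order follows the GLSL above (_XY is the texel at x offset X, y offset Y); `smoothstep01` and the `smooth` flag are names invented for this sketch.

```python
def mix(a, b, t):
    """Linear interpolation, same convention as GLSL mix()."""
    return a * (1 - t) + b * t

def smoothstep01(t):
    """smoothstep(0.0, 1.0, t): clamp, then the cubic 3t^2 - 2t^3."""
    t = min(max(t, 0.0), 1.0)
    return t * t * (3 - 2 * t)

def bilerp(c00, c01, c10, c11, weight, smooth=False):
    """Bilinear blend of four texels; weight is (x, y) in [0, 1]."""
    wx, wy = weight
    if smooth:
        wx, wy = smoothstep01(wx), smoothstep01(wy)
    bottom = mix(c00, c10, wx)  # along x at y offset 0
    top = mix(c01, c11, wx)     # along x at y offset 1
    return mix(bottom, top, wy)
```

Both variants agree at the corners and at the midpoint; the smoothstep version only changes how quickly the blend moves away from each texel.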
I'll try it after I wake up
Kinda cool effect
the right one reminds me of the first time i was playing HL1 in D3D, going super close to the rock wall
Yeah, it seems familiar somehow
Reminds me of alpha masked geometry with linear filtering
With those round edges
the right one is the hardware
Bicubic in just 4 taps (presumably each is bilinear) 
yep you need bilinear taps
Magic splines strike again
here's my crappy 7x7 estimated variance image
now I need to give it edge weights and actual kernel weights instead of using a box blur
with vs without the smoothstep
I can't see much of a difference, maybe the plain bilerp is a bit smoother
doesn't look very stable to me
I had to boost it a lot for it to be visible in renderdoc
it's actually showing [0, .2] or something
seems like some impls do a blur on the variance image itself prior to spatial filtering, so I might try that
denoising variance to denoise image 🤔
why do you think you need variance weighting anyways?
you are denoising indirect which is a low frequency signal
makes sense for direct/shadows but not so much for indirect
disoccluded areas have higher variance I guess
I'm just implementing svgf
disoccluded areas have more variance because of low spp, so you could momentarily apply more blur there
yeah but svgf is for direct and indirect
you don't have noisy direct
svgf doesn't use variance weights for indirect as far as I remember
at least certainly not in quake 2 rtx
they use spherical harmonics instead to preserve normal map detail
in the paper they just split the direct and indirect and denoise them separately
maybe I glazed over the actual differences in denoising between them
SHs are not in the paper ye
well they state that variance weights are needed to preserve high frequency details
indirect does not generally have a considerable amount of those
not until there is specular indirect
specular is also separated
and is even harder to denoise than both of them
do they even cover specular in the paper? I don't remember
specular is a whole different beast which is a pain to even reproject
and since it's highly view dependent it results in extreme ghosting with any of the naive reprojection methods you use for diffuse
they talk about it a little
they mention that one of its weaknesses is overblurring reflections
I think "beyond svgf" paper talks exclusively about specular
the paper is 15 pages with an extra 51 pages of random path tracing info stapled to it
will read tho
also what if bicubic
it's considerably smoother than bilinear
it's a copypaste of 3 functions and 1 call to sampleBicubic
should work as an alternative of texture
drop in
I'm not calling texture, so it's not drop-in
it's manual bilinear filtering except different (it's the one described in svgf)
ah okay I get what you are talking about
it's this thing in case you don't want to find the paragraph
yes I remember that
I think I mentioned this one time, maybe not here and not to you specifically
that svgf uses software bilerp like this
it works way better than the garbage I was using before
which was basically written without a reference
that's what I meant when I said reprojection is trickier than it seems
ye
actually I did say that to you and right in this thread
to the untrained eye, "software bilerp" seems equivalent to what the hardware gives you
tl;dr I should've just read the paper more carefully
svgf source code should also work and maybe even faster than only reading the paper
is the source code just falcor
reading both works best id say
source code is a chunk of falcor framework, maybe it's gone from nvidia developer page by now
in that case its just in falcor repo
yeah I started by reading falcor and some DX sample, but it was hard to understand without context
i still have the source code from nv dev on my home pc
why
ah, my linter is throwing a fit for some reason
added a new atrous pass, which doesn't look bad if I make the samples absurdly sparse (4, 8, and 16 wide)
I get low-frequency boiling artifacts which I'm not sure how to remove except by reducing the temporal alpha
going from 0.2 to 0.05 makes it a lot better in that regard
though the temporal lag is bad unless your FPS is absurdly high (which it is in this shrimple scene)
that's with 0.2 temporal alpha and the super wide spatial filter
still looks cool
when I add two more passes to the filter (so it's 1, 2, 4, 8, 16) it looks better, but the boiling is still there
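A small Python sketch of how the tap spacing grows per à-trous pass, assuming "wide" above means the distance between taps and that pass i uses a step of 2^i. `atrous_offsets` and `filter_reach` are hypothetical helper names.

```python
def atrous_offsets(pass_index, radius=1):
    """Tap offsets for one a-trous pass: a (2*radius+1)^2 kernel with holes.

    Assumption: the step between taps doubles every pass (1, 2, 4, 8, 16, ...).
    """
    step = 1 << pass_index
    return [(dx * step, dy * step)
            for dy in range(-radius, radius + 1)
            for dx in range(-radius, radius + 1)]

def filter_reach(passes, radius=1):
    """Furthest pixel (per axis) that can influence the output after all passes."""
    return sum(radius * (1 << i) for i in range(passes))
```

Under these assumptions, five passes of a 5x5 kernel (radius 2) reach 62 pixels in each direction while still costing only 25 taps per pass.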
wasnt pjbomb or the other guy rmax? talking about that boiling too some time ago?
just add pasta
its spaghetti code already ;D
idk
I wonder if the issue is that my framerate is too high
it would certainly make the boiling appear quicker
can you decrease the frequency of that boiling somehow?
I can lower the framerate
heh
I can decrease the temporal alpha so it blends more frames together
which removes the boiling, but adds temporal lag (moving the light makes the lighting update more slowly)
how frequent do you move a light
right now it's just the sun, so only when testing
or would you actually notice in an actual game using that engine?
when you focus on gameplay?
but there should be minimal temporal lag for practical purposes
of course
the default of 0.2 has almost unnoticeable lag since it's only about 5 frames of accumulation
unless your FPS is really low
it also results in that boiling you see
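To put numbers on the lag vs. boiling trade-off, here is a Python sketch of the exponential blend. The "about 5 frames" figure is consistent with the common 1/alpha rule of thumb; `frames_to_converge` is a made-up helper that counts how many frames a step change (say, moving the sun) needs before the history has absorbed a given fraction of it.

```python
def temporal_blend(history, current, alpha):
    """One step of the exponential moving average used for accumulation."""
    return (1 - alpha) * history + alpha * current

def frames_to_converge(alpha, fraction=0.9):
    """Frames until the history has moved `fraction` of the way to a new value."""
    frames, remaining = 0, 1.0
    while remaining > 1 - fraction:
        remaining *= 1 - alpha  # each frame keeps (1 - alpha) of the old error
        frames += 1
    return frames
```

With alpha = 0.2 the history is 90% of the way there after 11 frames; with alpha = 0.05 it takes 45, which is where the visible lag comes from at ordinary frame rates.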
ah
swr is also doing some GIisms
or is it "just" about implementing a paper properly now?
you can think of the temporal blend as just having extra samples to spatially filter with rather than something that solves noise on its own
right now I'm just implementing the parts of svgf that seem relevant
hmm, I feel like the variance stuff may still be helpful since the amount of noise is not constant across the image
It won't solve the boiling though
I wonder if the sampling sucks somehow, or if I just need TAA like they have in the paper
the 2nd pic shows the variance stuff in action neh?
boiling is due to variance, if you remove spatial filters and leave only temporal you will see that it's just bad, either needs longer lived history or better sampling strategies
surface sampling is just that bad
but what if we add restir
also are you storing first pass of atrous in the history?
the trick to "increase" sample count
by the way what if we, instead of using disk random, build a 2d discrete pdf out of the shadow map, based on "flux" intensity
on the fly
then use it to sample
maybe not even at full resolution to save performance
should still be pretty good
now that I think about it maybe it's not going to be pretty good 🤔
I'll be back after I eat
Yeah
I tried different steps for that pass and it didn't help much
what does it look like before you blur it?
I bet all noisy still
anyways the reason why sampling whole rsm based on intensity would not help much is because we are already pretty much sampling only emissive parts since we sample only whats visible from the sun pov
there is no way to not sample that (i.e. sample areas in shadow)
there is a room for improvement in other aspect though, the cosine still makes flux variable so maybe sampling using discrete 2d invcdf could still improve sampling but not by much I think
then there is a third factor that influences contribution - geometry term
if we could sample proportionally to it then I bet the variance could be substantially reduced
perhaps resampling can help here
These pics are after a 3x3 filter
With and without temporal blending (not in that order)
Yeah 1spp
More samples fixes it but I'd like to see if smarter filtering can help first
I also haven't tried quarter res yet
smarter than svgf is probably only RAE
but before filtering you really need to step up the convergence game of your 1spp
That's tough
I wonder if there is a trick that could improve the perf of the extra samples. Currently the sampling pattern obliterates the cache
Perhaps we could take multiple samples in a random small area
The extra samples should be cheap in theory since they are more coherent
But also maybe not, since cache lines can be evicted between each sample if you have enough competing threads
I bet the memory is laid out in morton order under the hood so as long as the tile that the taken samples cover can fit in the cache it's going to be somewhat cheap
otherwise it'd need another reload from vram to take the sample and flush the cache
that's just speculation I have no idea how real hardware operates
but that's how a software rasterizer would do
actually you tell me you work at amd
but anyhow I don't think that making samples coherent is useful because you actually want the samples to be incoherent, that way it means you are getting fresh diverse samples
It's laid out in tiles, and I don't believe they use a z curve
maybe a z curve per tile?
Take it with a grain of salt because there is a lot of conflicting info floating around so I'm not certain if I'm describing AMD hardware
Possible
iirc PS2 hw had z curve per tile
I've seen textures encoded like that in a ps2 title
which are immediately loaded into the memory from the disk
they call it preswizzled
Idk how big the tiles are either
even if today swizzling means .xyzw thing in shaders
Yeah AMD recommends using a certain function to swizzle your threads for ideal access
swizzling threads hmm
and for ideal access of what exactly?
I was talking about swizzling meaning rearranging layout of texture texels
and preswizzled meant that textures are stored as swizzled for hardware to consume directly
ARmpRed8x8 is the function
https://gpuopen.com/performance/
Compute shaders section
The function I mentioned is for writing though
But I don't see why it wouldn't work for reading if your access pattern is predetermined
this resembles some sort of z curve
not exactly like the textbook morton order
but not entirely unlike the one either
I like how the code is unreadable (I don't)
I did a test
first image = previous setup (using one per-frame rng value for influencing xi.x and xi.y in the shader)
second image = two per-frame random numbers for xi.x and xi.y
@heavy cipher is the reason we couldn't have the xi covid variant
anyways, here's when I do one rng for xi.x and xi.y (respectively) instead of both at the same time 🇨🇳
(xi.x corresponds to sample radius and xi.y corresponds to sample angle)
seems like applying the same per-frame random number to both values is worse than any of these alternatives
post-3x3 bilateral filter (left one is rng for just xi.x, right one is rng for xi.x and xi.y)
what if you stop rotating xi with two blue noise values
ah wait it's temporal not spatial that you're interested in
I guess you might want a low discrepancy sequence instead of white random
a temporal LDS?
I'm generating two white noise values per frame and using it for all threads atm
you may call that temporal I guess
I could try ur idea
the effect is that you will have less samey samples every frame
ideally, any consecutive group of ~5 frames should have evenly distributed noise (kinda like interleaved gradient noise I guess)
because white noise has high frequencies in it and causes clumps
lds is more evenly spaced or spaced in a way that allows you to explore the space more effectively
TAA impls use interleaved gradient noise so you can temporally explore space efficiently
or so I hear
however, I think a 1D LDS might be good enough
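One possible 1D low-discrepancy sequence for the per-frame value is the golden-ratio additive recurrence, sketched below in Python. This is only one candidate (nothing in the chat says which sequence gets used), and `golden_sequence` and `largest_gap` are names made up for this sketch.

```python
GOLDEN_RATIO_CONJUGATE = 0.6180339887498949  # (sqrt(5) - 1) / 2

def golden_sequence(n, start=0.0):
    """First n values of the additive recurrence x_i = frac(start + i * phi)."""
    return [(start + i * GOLDEN_RATIO_CONJUGATE) % 1.0 for i in range(1, n + 1)]

def largest_gap(values):
    """Largest empty interval between values, treating [0, 1) as a circle."""
    s = sorted(values)
    gaps = [b - a for a, b in zip(s, s[1:])]
    gaps.append(1.0 - s[-1] + s[0])  # wrap-around gap
    return max(gaps)
```

Any window of 5 consecutive values leaves no gap larger than about 0.236 of the range, which is the "evenly distributed over ~5 frames" property mentioned above. White noise gives no such guarantee.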
you have everything up on git?
I wanted to try out specular some time ago and I'm home now
maybe working with what's there will do fine
no, as I have a lot of trash in the files atm
I think I have something I can push though
ah yes, where I add the better reprojection
I don't feel like writing complex stuff so I'll probably use some solid angle sampling from gi compendium and assume it's the specular brdf (lol)
pdf=brdf
no ggx

tell me when you're done updating the repo
I'm not pushing anything else today, but I have some epic changes in the works
ok then
I won't be touching the sampling function
I'll use what's currently there
how bad?
also was thinking of plugging in a different brdf at first but then could not find a brdf
maybe I could simply kill some rays that are outside of the cone
I suppose specular really benefits from indirect occlusion
funny but doesn't look as bad without marching
just killing rays outside of the cone
this is not physically accurate and broken maths-wise
but behaves like how specular would
yep my marching algo is garbag
boosted a bit, looks like specular to me
300 samples allow it lol
really need a good way to find intersections for this to work better
nice
idk if I'll be working on it further
don't know a better marching algo
it used to work for screen space reflections
float dist = 0.05;
vec3 rayWorldPos = {0, 0, 0};
vec2 rayUV = {0, 0};
for(uint i = 0; i < 10; ++i)
{
rayWorldPos = surfaceWorldPos + R * dist;
rayUV = ProjectUV(rayWorldPos, rsm.sunViewProj);
float rayDepth = textureLod(s_rsmDepth, rayUV, 0.0).x;
rayWorldPos = UnprojectUV(rayDepth, rayUV, rsm.invSunViewProj);
dist = length(surfaceWorldPos - rayWorldPos);
}
is R the reflection vector
it is
ok, I think I see how it works
you might have better luck with an algorithm that has a constant step size and a condition to break once you get close to the surface defined by rsmDepth
thought about it too, didn't try
man idk raymarching is hard
I've been thinking about it and the best method would be to step pixel by pixel to not overshoot
otherwise it's very possible that it will return wrong intersection point and if we sample around it most if not all the samples will be zero according to brdf for sharp-ish reflections
you can use a larger step and then perform binary search at the end to refine the hit location
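A toy Python sketch of the "fixed step, then binary search" idea, reduced to one dimension so it runs on its own. `ray_height` and `surface_height` stand in for the ray's depth and the depth read from the RSM at the same position; both are hypothetical callables, not functions from the renderer.

```python
def march_with_refinement(ray_height, surface_height, t_max,
                          steps=16, refinements=8):
    """Return the distance t where the ray first goes below the surface, or None.

    Marches with a constant step, then bisects between the last sample above
    the surface and the first sample below it.
    """
    step = t_max / steps
    t_prev = 0.0
    for i in range(1, steps + 1):
        t = i * step
        if ray_height(t) <= surface_height(t):
            lo, hi = t_prev, t
            for _ in range(refinements):
                mid = 0.5 * (lo + hi)
                if ray_height(mid) <= surface_height(mid):
                    hi = mid  # still below the surface: hit is earlier
                else:
                    lo = mid  # above the surface: hit is later
            return hi
        t_prev = t
    return None  # no crossing within t_max
```

The coarse step can still skip over thin geometry; the bisection only sharpens a crossing the coarse march has already found.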
if you know how, can you make it work
?
because I can't
I've been already thinking about other ways
other ways to narrow down sampling area
I use the described technique for ssr
https://github.com/JuanDiegoMontoya/GLest-Rendererer/blob/main/glRenderer/Resources/Shaders/ssr.fs
I don't feel like trying to understand this rn so I'll wrap it up for now
kinda already satisfied my curiosity by making brute force specular
albedo modulation looks weird now
when you disable it that is
probably because of temporal filter
what's weird about it?
nothing actually
I'm tripping
this is still brute force
just looking interesting
it only reflects what the light source is seeing
I somehow managed to make the driver cause a crash when linking my program
it can compile the shader successfully though
if I comment out this loop, it stops crashing
for (int x = 0; x <= kWidth; x++)
{
for (int y = 0; y <= kWidth; y++)
{
ivec2 pos = gid + ivec2(x - kRadius, y - kRadius);
if (any(greaterThanEqual(pos, targetDim)) || any(lessThan(pos, ivec2(0))))
{
continue;
}
float weight = kernel[x][y];
vec3 c = texelFetch(s_input, pos, 0).rgb;
float lum = luminance(c);
accumSamples += weight;
mean += lum * weight;
meanSquared += lum * lum * weight;
}
}
kWidth is 5
kRadius is 2
kernel is a 5x5 array
if I change weight so it's always 1, it doesn't crash
wtf
uhh why is it not strictly less than
it's less than or equal
if kernel is 5x5 and width is 5
you are probably going to get an out of bounds?
ruh roh
yeah it no longer crashes when linking 
the <= was some vestigial thing from the previous code that was there
for (int x = -radius; x <= radius; x++)
{
for (int y = -radius; y <= radius; y++)
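A Python sketch of the same weighted mean/variance loop with the bound fixed, i.e. iterating strictly less than the kernel width. `weighted_moments` is a made-up name; the image is a list of rows of luminance values. In Python the old `<=` bound would raise an IndexError on `kernel[x][y]`, whereas in GLSL the out-of-bounds read is undefined behavior.

```python
def weighted_moments(image, gx, gy, kernel):
    """Kernel-weighted mean and variance of luminance around pixel (gx, gy)."""
    k_width = len(kernel)      # 5 for a 5x5 kernel
    k_radius = k_width // 2    # 2 for a 5x5 kernel
    height, width = len(image), len(image[0])
    accum = mean = mean_sq = 0.0
    for x in range(k_width):   # valid indices are 0 .. k_width - 1
        for y in range(k_width):
            px, py = gx + x - k_radius, gy + y - k_radius
            if not (0 <= px < width and 0 <= py < height):
                continue       # skip taps that fall outside the image
            w = kernel[x][y]
            lum = image[py][px]
            accum += w
            mean += lum * w
            mean_sq += lum * lum * w
    mean /= accum
    mean_sq /= accum
    return mean, max(0.0, mean_sq - mean * mean)
```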
later, one thing I want to try for improving perf is making the bilateral filter separable
this paper suggests that it's a decent approximation of the real thing
http://www.cecs.uci.edu/~papers/icme05/defevent/papers/cr1296.pdf
I'm late on this but: 
This is a certified hood glsl classic
maybe we can project a cone somehow onto the rsm to sample specular
basically a map from world to uv space
although physically based specular is not a cone
honestly I don't even remember how to sample TR CT
was something like reflecting view vector along the microsurface normal
and that normal follows the NDF
I'd like to just observe how samples are distributed in uv space when you discard samples outside of the cone
maybe can make a desmos graph
need some pseudo depth map manifold
ah I actually need to do homework actually
bye
good luck
hm think about it, we are already wasting samples by sampling a circle disregarding the fact that we are only interested in samples on the hemisphere around the surface normal
so many samples end up being wasted
but what if we start by projecting the hemisphere in uv space then finding the area of that projection to use in normalization term and find the jacobian determinant of the mapping to use as pdf
should end up having all samples contributing something
writing my thoughts here so I won't forget
interesting thoughts
I am inside your head
projecting the hemisphere into uv space though
how does that work
I mean, I see why it's good
by generating samples in the hemisphere in world space then transforming them with viewproj of the rsm
the hard part is deriving pdf and normalization factor
I don't really understand all the jacobian stuff other than that it's used in differential equations or something
jacobian determinant is a way to find surface element
surface element is the area of the differential
I recall something about it representing the local rate of change I think
jacobian matrix is all of the partial derivatives of each argument of a multivariate function
wikipedia has a better definition I'm sure
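A concrete case may help: for the polar-to-cartesian map used when sampling a disk (radius and angle, like xi.x and xi.y earlier), the Jacobian determinant is r, which is why the area element is r dr dtheta. The Python sketch below checks that numerically with central differences; `disk_point` and `jacobian_determinant` are names invented for the example.

```python
import math

def disk_point(r, theta):
    """Map polar coordinates (r, theta) to cartesian (x, y)."""
    return r * math.cos(theta), r * math.sin(theta)

def jacobian_determinant(f, u, v, h=1e-6):
    """Determinant of the 2x2 Jacobian of f at (u, v), via central differences."""
    xu1, yu1 = f(u + h, v)
    xu0, yu0 = f(u - h, v)
    xv1, yv1 = f(u, v + h)
    xv0, yv0 = f(u, v - h)
    dx_du, dy_du = (xu1 - xu0) / (2 * h), (yu1 - yu0) / (2 * h)
    dx_dv, dy_dv = (xv1 - xv0) / (2 * h), (yv1 - yv0) / (2 * h)
    return dx_du * dy_dv - dx_dv * dy_du
```

The determinant says how much a small patch of parameter space is stretched by the mapping, so a pdf defined over (r, theta) has to be divided by it to get the pdf over the disk. The projected-hemisphere idea would need the same factor for its own mapping.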
using a projected hemisphere will result in up to 50% fewer wasted samples according to my head math
advanced math explanations on wikipedia are usually not accessible to most people anyways
they are condensed but make sense if you have understanding of underlying concepts
there goes the J again
wat
In your thread?
I think I remember void mentioning it once in the context of IBL, idk
you made fun of void sizzling J into things all the time
Yes the Jacobians are an infection
Maybe I'll invent the counter to it called the Jakerbian
Jakerbean
I didn't have to do it for what deccer had because it was an obvious spherical to cartesian map (but without radius)
I failed to recognize it at first
what's more of interest is the case of 1/pi becoming pi after going from integral to riemann sum
at least it keeps jaker on his toes
what if you write the full name?
the denoiser is working pretty well, but there is still some annoying subtle flickering in the distance
the video is kinda hyper to show that it works "under motion"
here's a calmer one
and that is with the variance-guided luminance weight
also with the temporal alpha turned down to 0.05 instead of 0.2
I can't see how they can make 0.2 not super unstable unless the sampling is really good
it's easier to notice when everything is still and you aren't looking at a compressed video
you do fly around like someone having an asymptotic minute going on
its beautiful
its a huge step forward from the last one
I changed a bunch of stuff so I can't even remember
now I need to add a bunch of sliders so I can determine the ideal constants to hardcode
and then optimize because right now I'm doing five 5x5 bilateral filters in a row
i keep remembering to git commit things when i made a step forward, but i end up changing too much at the same place and then i cant do that anymore :3
I didn't want to commit anything because the code is quite messy right now
but sometimes i remember rider/clion have local history
you can always rewrite (change order/remove/etc) commits later, but you have a save point if you have to go back for some reason
that is amazing you are very smart
guess what the sampling is usually really good
that is, in comparison with what you have with dithered rsm
I wish I could spend some time on improving it but I am doing my homework
school slows down the technological progress in the world
I want to wrap this up soon anyways
Supplemental video to our High Performance Graphics 2018 publication.
More details at http://cg.ivd.kit.edu/atf.php
check out the 1spp footage
damn lol
it's very good in terms of convergence rate
I wish that video wasn't 720p
hehe
I see a little bit of low frequency noise still in theirs
at 58 seconds, the ceiling looks quite similar to the one in mine
which one
what if you feed it back the result from a third iteration and not first
should have even more fake sample increase
interesting idea
metal sponza doesnt look bad either
unfortunately doesn't seem to help too much
I'm guessing the problem is the new samples not being blurred enough
the history is super smooth, but when you add 20% of the current frame, it's still kinda noisy
I think the extra blurred history only helps to an extent
I can crank the alpha down a lot with minimal issues except for lag when I move the light (even disocclusions aren't bad). Then I just need to add temporal gradient estimation
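A rough sketch of why the temporal alpha matters so much here. It assumes the usual exponential moving average for accumulation and independent noise each frame; the function names are made up for illustration and are not Fwog's.

```cpp
// Exponential moving average used for temporal accumulation:
// history' = mix(history, current, alpha).
inline float Accumulate(float history, float current, float alpha)
{
  return history + alpha * (current - history);
}

// Steady-state variance of the accumulated value, relative to the per-frame
// variance. Assumes independent, identically distributed noise every frame.
// From Var(h) = (1 - a)^2 * Var(h) + a^2 * sigma^2  =>  Var(h) / sigma^2 = a / (2 - a).
inline float SteadyStateVarianceRatio(float alpha)
{
  return alpha / (2.0f - alpha);
}
```

Under those assumptions, alpha = 0.2 leaves about 11% of the per-frame variance in the output, while alpha = 0.05 leaves about 2.6%. The price is lag, since a smaller alpha takes more frames to react to a change.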
what if you make it blue noise
like use one value to rotate all xi components
don't do that it's pointless, I just did
also when will you commit the changes
well I ended up playing with it in my free time
and thought about better sampling but I'm totally out of ideas tbh, need to read more about stuff
Soon I hope
mfw I had the luminance weight disabled this whole time
float luminanceWeight = LuminanceWeight(oLuminance, cLuminance, variance, uniforms.phiLuminance);
float weight = normalWeight * depthWeight;
only noticed once I added sliders for everything
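The second line above never multiplies in `luminanceWeight`, which is the whole bug. Here is a hedged C++ port of what the combined weight presumably should look like, using the SVGF-style luminance term. The names `LuminanceWeightSketch` and `CombinedWeight` and the epsilon value are assumptions, not the actual shader code.

```cpp
#include <algorithm>
#include <cmath>

// SVGF-style luminance edge-stopping weight (hypothetical port, not the shader's code).
// A larger variance makes the filter more tolerant of luminance differences.
inline float LuminanceWeightSketch(float centerLum, float sampleLum, float variance, float phiLuminance)
{
  const float epsilon = 1e-8f; // assumed value; avoids division by zero when variance is 0
  const float sigma = std::sqrt(std::max(variance, 0.0f));
  return std::exp(-std::abs(centerLum - sampleLum) / (phiLuminance * sigma + epsilon));
}

// All three edge-stopping terms have to be multiplied for the luminance term to matter.
inline float CombinedWeight(float normalWeight, float depthWeight, float luminanceWeight)
{
  return normalWeight * depthWeight * luminanceWeight;
}
```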
what was your cmake error jaker?
something probably identical to geryu's
I recall running vcvarsall or something like that
tomorrow I can try again on my work machine
the weird thing is that after my troubleshooting, I can create vs solutions with cmake, but not open the directory with vs
the output log had basically the same error
but yeah I need to fix these nans now
NaN/INF/-ve Display my beloved
the issue was that my variance calculation was sometimes returning a number less than 0
(whose result was then fed to a sqrt)
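A minimal sketch of the usual guard, assuming the variance is derived from first and second moments as in SVGF. The function names are illustrative.

```cpp
#include <algorithm>
#include <cmath>

// Variance from the first two raw moments: Var = E[x^2] - E[x]^2.
// Rounding, or moments that were filtered separately, can push the
// difference slightly below zero, so clamp before using it.
inline float VarianceFromMoments(float mean, float meanOfSquares)
{
  return std::max(0.0f, meanOfSquares - mean * mean);
}

// Safe to call even when the raw difference would be negative:
// sqrt of a negative number is NaN, sqrt of the clamped value is 0.
inline float StdDevFromMoments(float mean, float meanOfSquares)
{
  return std::sqrt(VarianceFromMoments(mean, meanOfSquares));
}
```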
the variance guiding seems to create more problems than it solves
some disoccluded pixels flicker (rejected by spatial denoiser) due to high variance, so you need TAA or more blurring to fix it
what are you using it for anyways
I just wanted to try it
try it on direct illumination
there is more variance in indirect always
unless maybe caustics
I'm currently pestering criver in #mathematics in order to try to move it to semi-directional sampling
so that we could use importance sampling of some sort and specular
oh and by variance weight I mean illuminance weight (which is variance-guided)
the quality is a little better in the second, particularly at a distance where indirect is higher frequency
I see no difference
looks colorful tho very pleasing
always a treat to see the indirect from curtains
without vs with illuminance weight
second is darker overall, and there is a little less overblurring
by the way when using directional sampling you have more apparent blue noise patterns
that's very blue
so when sampling random points on a surface there is inherently less correlation between samples?
idk tbh
I was also able to get a visible blue noise pattern (though not as strong as yours) by just randomizing xi.x per frame
now gotta wait for criver to go online and discuss the issues
maybe we should move the conversation over here
maybe
I see you guys are in deep discussion
I normally have #mathematics muted because convos in it are so long and technical
shit I accidentally pressed on visual studio that has been minimized for a few days and is now certainly entirely swapped on hdd
it's now going to load everything in ram
my hdd usage led is now always active
gotta love how much ram vs is hogging
almost like chrome
I found that occasionally restarting it helps
I sometimes kill its processes
such as vcpkgsrv.exe
not lethal, it simply creates fresh instances that have less hogged
this variance guiding is pretty ass without TAA
I'm also not boosting the variance of recently disoccluded pixels, but I doubt that matters much
one thing I noticed is that the RSM resolution heavily influences the performance of sampling it (2048^2 vs 512^2)
the difference in indirect lighting quality is not very obvious
sampling is about 5x cheaper when the RSM is 512^2 compared to 2048^2
when the RSM resolution is too low, the indirect lighting starts looking pretty bad (128^2)
but at least it only takes a few ms to take 40 samples
I need to experiment with having multiple RSMs, i.e., having multiple indirect light casters
alright, I pushed everything
the first 2 pics from the last batch of pics, really looks neat
now I gotta optimize it so it runs on your iGPU
atm it's about 4-5ms on my GPU, which is too slow
on an iGPU it will probably be 40-50ms
the first opt I'll try is using separable bilateral filters
and then doing everything at quarter res
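For scale: a 5x5 pass reads 25 taps per pixel, while a horizontal pass followed by a vertical one reads 5 + 5 = 10. Bilateral weights are not truly separable, so this is an approximation. Below is a sketch of one such 1D pass on the CPU, with a single guide channel (say, depth) standing in for the real edge-stopping terms. The function name, the `phi` parameter and the exponential falloff are assumptions for illustration.

```cpp
#include <cmath>
#include <cstdlib>
#include <vector>

// One horizontal pass of a 5-tap edge-aware a-trous filter over a single row.
// `guide` is a per-pixel value (e.g. depth) that stops blurring across edges.
// `stepSize` is the a-trous stride (1, 2, 4, ...); `phi` must be > 0.
inline std::vector<float> BilateralPass1D(const std::vector<float>& color,
                                          const std::vector<float>& guide,
                                          int stepSize, float phi)
{
  const float kernel[3] = {3.0f / 8.0f, 1.0f / 4.0f, 1.0f / 16.0f}; // B3-spline taps by |offset|
  const int n = static_cast<int>(color.size());
  std::vector<float> result(color.size());
  for (int x = 0; x < n; ++x)
  {
    float sum = 0.0f;
    float weightSum = 0.0f;
    for (int tap = -2; tap <= 2; ++tap)
    {
      const int sx = x + tap * stepSize;
      if (sx < 0 || sx >= n) continue; // skip taps that fall off the row
      const float edge = std::exp(-std::abs(guide[x] - guide[sx]) / phi);
      const float w = kernel[std::abs(tap)] * edge;
      sum += w * color[sx];
      weightSum += w;
    }
    result[x] = sum / weightSum; // the center tap always contributes, so weightSum > 0
  }
  return result;
}
```

Running the same function over columns gives the vertical half of the pair.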
i fink it makes sense not to target igpus with these kind of things anyway
GI should be off
Mayhaps
I want to run it at 4k in like 2ms though
What's sad is that it'd still be slower than gi-1.0 which looks way better than this
you could make use of simple irradiance maps placed around for igpus
and I'm not talking about preintegrated ones
I'm talking about dot luts
kinda spherical harmonics I guess
there was this sonic game on dreamcast that did this
you basically have a gradient texture that covers -1 to 1 range of the dot product of the normal and some direction
or "GI" on igpus just takes much longer, spreading it out over 30 frames or whatever
and you find the dot product map it to [0; 1] and sample from the texture to get the simple irradiance
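The dot-LUT lookup described above can be sketched like this. It uses a single-channel gradient and does the linear filtering by hand; on the GPU it would presumably be a 1D texture with linear filtering and clamp-to-edge. The names are made up for the example.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>

// Samples a 1D gradient with linear filtering. `u` is expected in [0, 1]
// and is clamped, which mimics clamp-to-edge addressing.
template <std::size_t N>
float SampleGradient(const std::array<float, N>& lut, float u)
{
  static_assert(N >= 2, "need at least two texels to interpolate");
  const float x = std::min(std::max(u, 0.0f), 1.0f) * static_cast<float>(N - 1);
  const std::size_t i0 = std::min(static_cast<std::size_t>(x), N - 2);
  const float t = x - static_cast<float>(i0);
  return lut[i0] + t * (lut[i0 + 1] - lut[i0]);
}

// nDotD is dot(normal, direction) for unit vectors, so it lies in [-1, 1].
// Remap it to [0, 1] and use it as the texture coordinate.
template <std::size_t N>
float IrradianceFromDot(const std::array<float, N>& lut, float nDotD)
{
  return SampleGradient(lut, nDotD * 0.5f + 0.5f);
}
```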
btw turns out the rsm sampling using hemisphere thing is very hard, rest in peace
That's quite similar to what I'm doing for mie scattering in the volumetric example
did you know you could optimize the compute light function
float sqr(float x) { return x*x; }
vec3 ComputePixelLight(vec3 surfaceWorldPos, vec3 surfaceNormal, vec3 rsmFlux, vec3 rsmWorldPos, vec3 rsmNormal)
{
  vec3 pathSegmentVec = rsmWorldPos - surfaceWorldPos;
  float cosines = max(0.0, dot(surfaceNormal, pathSegmentVec)) * max(0.0, -dot(rsmNormal, pathSegmentVec));
  // Clamp distance to prevent singularity.
  float d = max(length(pathSegmentVec), 0.03);
  // Inverse square attenuation. d^4 is due to us not normalizing the two ray directions in the numerator.
  return rsmFlux * cosines / sqr(sqr(d));
}
I once did that partially but you reverted the changes
also two cosines is not geometry term
geometry term is two cosines divided by squared distance
I think the paper calls it the geometry term, but not in reference to an actual BRDF or anything. Maybe a better name would be angular attenuation or something
what's the optimization?
1 sub and 1 negation instead of 2 subs, and 2 muls instead of 3
d = d * d
d = d * d
vs
d = d*d*d*d
I would expect the shader compiler to do that one at least
maybe but I'm not relying on that
I don't worry too much about small arithmetic optimizations
also you can negate the dot product instead of negating the vector
oh and also the float d = max(length(pathSegmentVec), 0.03);
used distance() before, where I believe a third sub would take place under the hood
also true
though on AMD negation is literally free, so it won't matter there
I guess I'll move the sign over to the dot
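A quick CPU-side sanity check that the d^4 form matches the textbook "two cosines over squared distance". Each unnormalized dot product carries an extra factor of d, so the product carries d^2, and dividing by d^4 leaves cos * cos / d^2. The `Vec3` type and function names are stand-ins, and the flux factor is left out since it is the same in both.

```cpp
#include <algorithm>
#include <cmath>

struct Vec3 { float x, y, z; };
inline Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline float Dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
inline float Sqr(float x) { return x * x; }

// Reference form: scale the direction by 1/d first, then divide by d^2.
inline float WeightNormalized(Vec3 surfacePos, Vec3 surfaceNormal, Vec3 rsmPos, Vec3 rsmNormal)
{
  const Vec3 v = rsmPos - surfacePos;
  const float d = std::max(std::sqrt(Dot(v, v)), 0.03f); // clamp to avoid the singularity
  const Vec3 dir = {v.x / d, v.y / d, v.z / d};
  const float cosines = std::max(0.0f, Dot(surfaceNormal, dir)) * std::max(0.0f, -Dot(rsmNormal, dir));
  return cosines / Sqr(d);
}

// Form from the chat: one subtraction, negate the dot product instead of the
// vector, skip the normalization and divide by d^4.
inline float WeightUnnormalized(Vec3 surfacePos, Vec3 surfaceNormal, Vec3 rsmPos, Vec3 rsmNormal)
{
  const Vec3 v = rsmPos - surfacePos;
  const float cosines = std::max(0.0f, Dot(surfaceNormal, v)) * std::max(0.0f, -Dot(rsmNormal, v));
  const float d = std::max(std::sqrt(Dot(v, v)), 0.03f);
  return cosines / Sqr(Sqr(d));
}
```

Both return the same value up to rounding. Whether the second form actually saves instructions depends on the compiler, which is what the disassembly diff later in the thread is about.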
I diffed the assembly (RDNA2) of the original and your version and they were identical
wait no
my diff tool is special
what's the change
one sec, windiff got messed up when I reinstalled VS on this machine
wow comp is so useful
Name of first file to compare: a.txt
Name of second file to compare: b.txt
Comparing a.txt and b.txt...
Files are different sizes.
lol
diffing assembly is poopy
once there is even the slightest change, everything becomes different (this is with some random vscode diff extension I found)
new idea is to analyze them with shae.exe
overall, the optimized version has 4 fewer instructions
and vgpr pressure goes from 31 to 29
it was not in vain
God I wish I could analyze OpenGL apps with RGP
Kinda crazy how everything got reordered when that one function was changed
I can send the (dis)assembly in a sec
so uhh I don't think I would like to compare them
what analysis you did was plenty enough
understandable
so are we going to implement software ray tracing in fwog
sw rt is something I eventually want to explore since a lot of techniques use it, but right now I have too much stuff on my plate to do it
does fwog at least have the means to implement it?
two level acceleration data structure is probably taking it too far though since even with rsm there are no dynamic things
but building one bvh for the whole scene and packing it into a buffer texture should be enough
ok maybe not, maybe using ssbo is better
buffer textures would be useful only for two level ADS
yeah, it exposes all the useful features of OpenGL 4.6 (+ bindless textures)
since you could exploit the bindless textures extension
and treat it as BDA
and BDA is very useful for two level ADSes
I see
though performance is worse than the true bindless buffers
but I have RTX so who cares
it can chew it ezpz
using a single giant SSBO for all BLASes is probably okay
one for TLAS, one for BLAS
it'd definitely be fine for demo purposes
probably, but code will be more convoluted
you'd abstract away buffer access to shrimplify it
you could implement a SSBO pool class that grows automatically like vector
and abstract the memory thing away at least on the host side
on the device side the shader might get messy
maybe with macros it could be saved a bit
instead of the TLAS pointing to a buffer, it points to a segment of the BLAS buffer
I don't think it would be too bad tbh
maybe, sounds good enough to me
actual buffer pointers would be nicer, but this is what we got
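A rough host-side sketch of the "one giant buffer, hand out segments" idea. It is a plain CPU mirror that grows like `std::vector`. Uploading it to an actual SSBO after it grows is left out, and nothing here is Fwog API: `Segment` and `BufferPool` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// A range inside the shared buffer, measured in elements. This is what a
// TLAS entry would store instead of a real buffer pointer.
struct Segment
{
  std::uint32_t offset;
  std::uint32_t count;
};

// Hypothetical CPU-side mirror of one big SSBO that grows automatically.
template <typename T>
class BufferPool
{
public:
  // Appends the items to the end of the pool and returns where they landed.
  Segment Append(const std::vector<T>& items)
  {
    const Segment segment{static_cast<std::uint32_t>(storage_.size()),
                          static_cast<std::uint32_t>(items.size())};
    storage_.insert(storage_.end(), items.begin(), items.end());
    return segment;
  }

  // Element access relative to a segment, like the shader would do with
  // buffer[segment.offset + index].
  const T& At(Segment segment, std::uint32_t index) const
  {
    assert(index < segment.count);
    return storage_[segment.offset + index];
  }

  // Contiguous contents, ready to be uploaded in one go.
  const std::vector<T>& Data() const { return storage_; }

private:
  std::vector<T> storage_;
};
```

Offsets stay valid when the pool grows because they are indices rather than pointers, which is also why the same numbers can be used directly in the shader.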
I also recommend using madmann's bvh lib
that's cheating
but writing your own is going to be a pain when there is a lib like that out there
I wouldn't call it cheating if it has more common sense
nih destroyed
or you're saying you'd like to make bvh building a core feature of fwog?
examples are already something to behold though
ever since I came and said what if we dither-filter your rsm
ye I should advertise those more
there's also galunga I believe?
ideally, the examples can be used for learning
ok so
that was a powerful brainworm 
BLAS stores triangle indices
TLAS stores offsets into BVH buffer
wait nope
it stores renderables
renderables store offsets into BVH buffer
ah
ok it stores indices to renderables
there should be another buffer with them
you would have a matrix and material and BVH offset and shit per each renderable struct
also need to fetch tris
so this info too
you'll have to pack your geometry too then
just a buffer per vertex attribute type I guess
unless you store interleaved
array of structs
this is a limitation in the sense that your geometry will have to be all of same format or you'll have to make a buffer for all permutations
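To make the draft concrete, here is one possible layout for the per-renderable record and the TLAS leaf that points at it. Every field name is a guess for illustration. The only hard part is that the C++ struct has to match what the shader sees, which the static asserts check against std430 rules (a mat4 is 64 bytes and 16-byte aligned, so four trailing uints pad the struct out to 80).

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical per-renderable record, mirrored by a GLSL struct in an
// std430 buffer: mat4 followed by four 32-bit uints.
struct GpuRenderable
{
  float worldFromObject[16];   // column-major 4x4 transform
  std::uint32_t bvhRootOffset; // first node of this mesh's BVH in the BVH buffer
  std::uint32_t indexOffset;   // first triangle index in the shared index buffer
  std::uint32_t vertexOffset;  // first vertex in the shared vertex buffer
  std::uint32_t materialIndex; // entry in a material buffer
};

static_assert(sizeof(GpuRenderable) == 80, "must match the GLSL struct size");
static_assert(offsetof(GpuRenderable, bvhRootOffset) == 64, "uints follow the matrix");

// TLAS leaves then only need an index into the array of GpuRenderable.
struct TlasLeaf
{
  std::uint32_t renderableIndex;
};
```

With one shared vertex format the offsets are enough. Supporting several formats would mean one such buffer set per format, which is the limitation mentioned above.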
yeah there is a lot of things to consider
better make a draft in some text file or something
can't keep track of all of it in your mind
yeah
but are you set on making two level ADS?
I thought it's fine to have one BVH for the whole scene
still need to pack geometry though
if I were to implement sw rt, I'd want to do this
you will do it because I am in this thread
as soon as I get free time I will start a fork and you will most likely get dragged in purely by how contagious my enthusiasm is
you can just make a new project that uses fwog, no need to fork
but you can fork if you want
true
but that will become my project and I don't want that, I want to contribute to fwog
I would happily link your project under the "users of fwog" section
should be the first library with the most over the top examples
examples:
hello triangle
deferred rendering
reflective shadow maps with denoising
real time software ray tracing with denoising
AAA game we made for fun in our spare time
I like it, the asymptotic upper bound is O(n^n) for the example complexity
I just realized that the RSM is definitely being affected by the depth bias in the shadow pass
the backsides of these curtains are no longer being strongly illuminated when I turn off the depth bias (ignore incorrect shadows)
with depth bias
so you're writing the depth with an offset huh
it really do be like that
I guess it's time to put the offset in the shader that applies the shadow
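A minimal sketch of the difference, assuming a standard depth-compare shadow test where larger depth means farther from the light. If the bias is baked into the depth write, everything that reads that depth (the RSM included) sees shifted values; applying it at lookup time leaves the stored depth exact. The function name and parameters are made up.

```cpp
// Returns true when the receiver is lit.
// `storedDepth` is the unbiased depth read from the shadow map,
// `receiverDepth` is the shaded point's depth in light space,
// `bias` is applied here, at comparison time, instead of during the depth pass.
inline bool IsLit(float receiverDepth, float storedDepth, float bias)
{
  return receiverDepth - bias <= storedDepth;
}
```

A surface comparing against its own slightly quantized depth still passes thanks to the bias, while anything clearly behind an occluder still fails.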
remember there was an offset along the normal? I wonder why the paper did that
probably not but you made me recall it
I kinda like how curtains are translucent though
it was a neat effect, but horrible to denoise
why is it horrible to denoise?
there were not many valid samples, since the only samples it could use were ones on the interior of the wrinkles of the curtain
and what samples there were, were all pretty weak since none were directly facing the surface being shaded
btw does feeding samples from third atrous not help at all?
I tried and it didn't seem to do much
I'm normally feeding back the output of the first atrous pass already
seemed like the history was already denoised enough, and it was the current frame's samples that were causing issues
plz add my optimization
#extension GL_GOOGLE_include_directive : enable

this is not an extension that's implemented on any desktop gpu
on nvidia at least
nvidia actually implements part of it, but that's not why I have it
well it causes a crash for me
I have it just so my linter doesn't complain about #includes
ye that's fine
When performing implicit conversion for binary operators, there may be multiple data types to which the two operands can be converted. **For example, when adding an int value to a uint value, both values can be implicitly converted to uint, float, and double. In such cases, a floating-point type is chosen if either operand has a floating-point type.** Otherwise, an unsigned integer type is chosen if either operand has an unsigned integer type. Otherwise, a signed integer type is chosen.
what driver are you on?
hell if I know
ok, what GPU?
2060
I'm on nv as well (3070), so that's weird
I've always had to cast to float
even on 1050
and on amd too
some old radeon hd series
odd, I never had to do it in modern glsl
in shadertoy I have to do it cuz it uses some ancient glsl es
but the spec (for glsl 4.6) says right there that this conversion is legal
I also ran the code on my AMD machine today (rx 6800) and it ran fine
look I can see depth bias causing leaking
shadows in gltf viewer now use fragment shader offset
fyi the "alpha moments" and "phi luminance" sliders do nothing atm since luminance weighting is disabled
somewhere around line 59 in Bilateral.h.glsl is where it's disabled (I just set the weight to 1)
I was sus that it doesn't
svgf and 5 samples seems stable
I smell something coming from the case
must be gpu
made the shadows stochastic because I deleted shadow samplers
only for gltf viewer though
finally a use for variance weights?
though denoising shadows temporally will lead to ghosting no doubt
unless you add temporal gradients or drop history (or make alpha smaller) on sun angle change
I ain't denoising no shadows
just making something serviceable that isn't blocky af
svgf made you weary hasn't it
knees weak, arms are heavy
with your specular sampling