#Fwog and co.
I guess my point is that sampling a hemisphere means you can make your samples not be wasted
but yeah
can't really do that
we can build a distribution that samples only emissive parts of the surface
now how practical that would be is a different question, and the answer is most likely very impractical, aka are you crazy or what
I'm okay with keeping this purely diffuse for now
trying to figure out svgf alone is hard enough
I'll probably settle for a scuffed version of it anyways since increasing the sample count is relatively cheap
bro I can't
I now want to make specular rsm
help
the brainworm is brainworming my brain
just add phong brdf or something real quick to scratch your brainworm
tho I don't have time atm
literally moving out tomorrow
I will revisit it maybe in a month if lucky
ah, gl on the move
in your honor I'm going to calculate depth gradients and use it to fix the atrous filter
I think we could narrow down the area where we sample the specular on the rsm
by again marching to find intersection point first, and then generating a disk around it
marching for every sample is dumb but for reflected direction not as much
ye, not a bad idea
actually there is already a volumetric fog impl in fwog
but with indirect from rsm?
ah, not yet 
not yet 😈
tbh the way it's implemented, it wouldn't be too hard to add
and since the fog samples are big in screen space, I could use the dumb RSM sampling method and get free smoothing
speaking of samples
going from 1 to 2 samples literally doubles the execution time of the rsm shader
cache thrashing?
evidently
well at least you empirically evaluated that algorithm is O(N)
since I'm now doing temporal filtering, it seems like a larger blue noise texture has a positive overall effect, although it still performs worse with the spatial filter alone (as in these pics, 16x16 vs 256x256)
when temporal accumulation is enabled, instead of boiling artifacts I get splotches which are probably easier to clean up
yeah, I gotta work on that
and then also hysteresis when you store first pass of subsampled filter to be reused in the next frame
this effectively increases sample count
ye I sent a pic of that from the svgf paper
I also gotta figure out this variance stuff, but that can come later
also
- sample and denoise at quarter res, then upscale
- VRS to better allocate rsm samples (high variance regions get more samples, low variance regions get fewer)
I wonder how this disocclusion blur is implemented
https://youtu.be/3EdE38iRn2A?t=225
I have a history buffer, so it's easy to tell when a pixel was disoccluded
maybe it's just an extra atrous pass on disoccluded pixels
ah dammit, it was in the video all along
the disocclusion blur is like 80% of the algorithm 
disocclusion blur is just blur except for disoccluded areas
real question is though whether this disocclusion blur is stored in history or is only a 1-frame-long step to mask lack of samples
I would guess it's not stored
@golden schooner do you have any ideas on include handling? Every time I link my code I see bits that could just be in a header 😩
I could just make the include path be the directory the file is in
i have something indeed
or I could allow the user to pass an array of include paths
not added to enginekit yet, but i prototyped something
when you do #include <Foobar.glsl> it will do 2 things
either find the file relative to the programdir + Data/Shaders (thats where all my shaders sit)
or when the include was named "EngineKit.Graphics.GpuMaterial.virtual.glsl"
the include handler for that one will generate a glsl equivalent of the GpuMaterial struct at that include spot
does it make any sense?
I'm confused about the GpuMaterial thing
are you just saying the contents of the include get pasted in like usual?
GpuMaterial is my cpu side struct describing the elements for the materialbuffer for instance
or GpuLight for the lights
or GpuBaseInformation
ye
in case of an actual file, i just include the contents of the file as is
in case of a "virtual" file (marked by .virtual. in the filename) i lookup the struct (via reflection) in the namespace provided by the filename
I'm thinking of doing something more "standard"
and then generate a glsl equivalent out of that type/struct, and insert it
i did that virtual thing in d3d11 a lot back when
then i dont need to write all the supporting structs in glsl myself
ye that'd be nice
just a little reflection majeek
it's a bit complicated to add in C++ though
ye
so I just want to stick to regular includes
fair
i have a regex for detecting includes
and named the capture for the include name
I'm just gonna use stb_include since it does manual parsing and probably handles all the edge cases
so now I'm wondering how to design the interface
ye
in my case, at the stage where you read in the shader source, before passing it to glShaderSource..., i would plug in stbinclude, and then hand over the processed string to opengl
I think the main problem is that stb_include only accepts a single include dir
but I can probably change that
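A minimal sketch of the "array of include paths" idea, assuming a plain text-substitution include handler. This is not stb_include and not Fwog's API: `FileLoader`, `FindInclude`, and `ExpandIncludes` are hypothetical names, and the loader is a callback so the search logic can be tried without touching the disk. It does not emit `#line` directives or guard against double inclusion.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical loader: returns the file's text, or nullopt if the path does not exist.
using FileLoader = std::function<std::optional<std::string>(const std::string& path)>;

// Tries each include directory in order and returns the first file that loads.
std::optional<std::string> FindInclude(const std::string& name,
                                       const std::vector<std::string>& includeDirs,
                                       const FileLoader& load)
{
  for (const auto& dir : includeDirs)
  {
    if (auto text = load(dir + "/" + name))
      return text;
  }
  return std::nullopt;
}

// Replaces lines of the form `#include "name"` or `#include <name>` with the file's contents.
// Includes that cannot be resolved are left in place so the GLSL compiler reports them.
std::string ExpandIncludes(const std::string& source,
                           const std::vector<std::string>& includeDirs,
                           const FileLoader& load,
                           int depth = 0)
{
  assert(depth < 32 && "include depth limit hit (cyclic include?)");
  std::istringstream in(source);
  std::string out;
  std::string line;
  while (std::getline(in, line))
  {
    const auto hash = line.find_first_not_of(" \t");
    const bool isInclude = hash != std::string::npos && line.compare(hash, 8, "#include") == 0;
    if (isInclude)
    {
      const auto open = line.find_first_of("\"<", hash + 8);
      const auto close = open == std::string::npos ? open : line.find_first_of("\">", open + 1);
      if (close != std::string::npos)
      {
        const std::string name = line.substr(open + 1, close - open - 1);
        if (auto text = FindInclude(name, includeDirs, load))
        {
          // Included files may themselves contain includes.
          out += ExpandIncludes(*text, includeDirs, load, depth + 1);
          continue;
        }
      }
    }
    out += line;
    out += '\n';
  }
  return out;
}
```

The expanded string is what would then be handed to glShaderSource. Passing the directory of the including file as the first entry of `includeDirs` gives the "relative to the file" behavior as well.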
what do you think on implementing software ray tracing and adding an unwrapping lib to create baked AO maps for RSM?
not very practical for dynamic scenes but will compensate for some lack of visibility
tho at this point may as well bake the whole indirect
I guess if I get the chance I'll start by making perfect specular via rsm
which I'll then extend to rough specular by sampling around the perfect reflection point
maybe I'll try tomorrow if no assignments
I hope my poor laptop could handle the rsm
if I make this run at quarter res then it might perform acceptably on these low-end devices
I just realized something cool
the denoising pass has a constant cost
which means you could have a bunch of RSMs and only sample each once
then it all gets denoised at once
--------
SamplerCache.cpp:46
--------
// TODO: determine whether int white values should be 1 or 255
i believe it stays int or uint, when you use the i or ui variants of glSamplerParameterXv
if i got page 183 of glspec46 right
hmm also you were mentioning something about border color and clamp_to_edge but i forgor what it was exactly, glspec also only has some vague stuff going, it was bordercolor needs to be set when address mode is clamp_to_edge, wasnt it?
Clamp to edge clamps to the last texel
Clamp to border is the one where you need to set the border color
ah
Also I think both 1 and 255 are wrong for integer formats
is it not dependent on format
that would also mean samplercache could omit setting the bordercolor when clamp to border is not set
I thought if the texture was float you'd write float bits there
Yes
That's why there are different ones
But my point is that integer textures can go above 255 if they have more than 8 bits
Data conversions are performed as specified in section 2.2.1, with these exceptions:
• If the values for TEXTURE_BORDER_COLOR are specified with TexParameterIiv or TexParameterIuiv, they are unmodified and stored with an internal data type of integer. If specified with TexParameteriv, they are converted to floating-point using equation 2.2. Otherwise, the values are unmodified and stored as floating-point.
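A small sketch of why neither 1 nor 255 is a general answer for integer formats. It assumes "white" should mean the largest value a channel can hold, which is only one possible reading of the TODO. `BorderClass`, `MaxUnsigned`, and `MaxSigned` are made-up names, not SamplerCache code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical classification of a texture format, which decides how the border color is passed:
//   Float       -> glSamplerParameterfv  (stored as floating-point)
//   SignedInt   -> glSamplerParameterIiv (stored unmodified as integer)
//   UnsignedInt -> glSamplerParameterIuiv (stored unmodified as integer)
enum class BorderClass { Float, SignedInt, UnsignedInt };

// Largest value an unsigned channel with the given bit count can hold.
inline uint32_t MaxUnsigned(int bits)
{
  assert(bits >= 1 && bits <= 32);
  return bits == 32 ? 0xFFFFFFFFu : (1u << bits) - 1u;
}

// Largest value a signed (two's complement) channel with the given bit count can hold.
inline int32_t MaxSigned(int bits)
{
  assert(bits >= 2 && bits <= 32);
  return static_cast<int32_t>(MaxUnsigned(bits - 1));
}
```

With this reading, an R8UI border would be 255 but an R16UI border would be 65535, so the value depends on the format's bit depth rather than being a single constant.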
default: FWOG_UNREACHABLE; break;
no assert? megamind meme
its just breadcrumbs jaker left behind for us
honestly I'm not the one to point that out, which is why my project is not open source
not the one to point fingers rather
i dont mind being finger pointed at my code
most of the time there is good feedback
I mind being finger pointed because I speedrun implementations
it just ticks me off if someone says it's not optimal or clean when I know it's not
understandable
I actually put effort in my nickname
Unreachable is an assert in debug mode
epic
its what, can't start declaration with number, void with return, return misspelled, g isn't valid in hex, wrong brace, semicolon after brace?
This is pretty cool
I wish there was something that explained how exactly the depth gradient is supposed to be used for the bilateral filter weight
I get that it's supposed to help with denoising surfaces viewed at an oblique angle
but just looking at some code is not great for learning
I think I'm going to just write unscientific jank and eyeball the results
mathematical rigor is too hard to achieve
hello scene-dependent constants my old friend
float depthThreshold = dot(1, abs(Ddxy));
translates to 1*|ddx| + 1*|ddy|
yeah that part is not hard to understand
you don't understand what gradients are?
idk it's just taking me some time to soak in all this code
gradient is a rate of change
I know what a gradient is
also there are different ways to get the depth gradient, and I can't find the shader where they compute that
if you view the surface straight on there is less tolerance because rate of change in depth is smaller
so I'm just gonna make a dumb one that does (f(x + 1) - f(x), f(y + 1) - f(y))
but for oblique there is more tolerance needed as nearby pixels will have more depth difference
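One way to turn that tolerance idea into a filter weight, sketched here in the shape of the SVGF-style depth edge-stopping function as I understand it: the allowed depth difference scales with how much depth is expected to change over the tap offset. `Vec2`, `DepthWeight`, and the `sigmaZ` default are assumptions for illustration, and a real filter would likely need a scene-tuned constant.

```cpp
#include <cassert>
#include <cmath>

struct Vec2 { float x, y; };

// zCenter:       depth at the pixel being filtered
// zTap:          depth at the neighboring tap
// depthGradient: (dz/dx, dz/dy) in depth units per pixel at the center pixel
// pixelOffset:   tap position minus center position, in pixels
// Returns a weight in [0, 1]; 1 means "same surface as far as depth can tell".
float DepthWeight(float zCenter, float zTap, Vec2 depthGradient, Vec2 pixelOffset, float sigmaZ = 1.0f)
{
  // How much depth change the local slope predicts for this offset.
  // Near zero when viewed straight on, large at oblique angles.
  const float expectedChange =
      std::fabs(depthGradient.x * pixelOffset.x + depthGradient.y * pixelOffset.y);
  const float epsilon = 1e-6f; // avoids division by zero on camera-facing surfaces
  return std::exp(-std::fabs(zCenter - zTap) / (sigmaZ * expectedChange + epsilon));
}
```

On a sloped plane the tap's depth difference is about equal to `expectedChange`, so the weight stays well above zero. The same difference on a surface with zero gradient drives the weight to zero.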
there are built in gradients in glsl
only for fragment shaders
yes and you store it in gbuffers
and via some extension in vulkan glsl
when you rasterize
// Remap partial depth derivatives at z0 from [1,1] pixel offset to a new pixel offset.
float2 RemapDdxy(in float z0, in float2 ddxy, in float2 pixelOffset)
{
    // Perspective correction for non-linear depth interpolation.
    // Ref: https://www.scratchapixel.com/lessons/3d-basic-rendering/rasterization-practical-implementation/visibility-problem-depth-buffer-depth-interpolation
    // Given a linear depth interpolation for finding z at offset q along z0 to z1
    //      z = 1 / (1 / z0 * (1 - q) + 1 / z1 * q)
    // and z1 = z0 + ddxy, where z1 is at a unit pixel offset [1, 1]
    // z can be calculated via ddxy as
    //
    //      z = (z0 + ddxy) / (1 + (1-q) / z0 * ddxy)
    float2 z = (z0 + ddxy) / (1 + ((1 - pixelOffset) / z0) * ddxy);
    return sign(pixelOffset) * (z - z0);
}
I was specifically trying to avoid that
I could compute it in image space with acceptably few artifacts probably
I think you need a ratio of change over distance
and if it's bad I could use a more advanced method like this
https://atyuwen.github.io/posts/normal-reconstruction/
because it's the definition of a gradient
(change in value)/(change in argument)
y/x
derivative definition version lim(h->0){(f(x+h)-f(x))/h}
ye I'd just use the finite difference to get the gradient in image space
I don't rember the units of measure of argument change in builtin
need to look up glslspec
now, there will be artifacts at corners with a naive image space method, but who cares
pretty sure it's implementation-dependent
but in practice it uses the neighboring pixel (since 2x2 pixels are shaded in lockstep)
and if you use the coarse variants then the same samples in a quad (2x2 pixels) might be reused
what I'm going to do is calculate the naive image-space gradient and store that in a texture for now
it should be enough to fix most of the ugliness™️
reading some articles seems like it only takes a difference between values
ye
it's finite difference but on the rasterized triangle instead of in the image
which means no edge artifacts where planes meet
you can branch at the edge
if, say, you take the difference between the current and right pixel, you can instead take the difference with the left pixel at the right edge
same result
I meant geometric edges
well not really but at least it will try to extrapolate instead of just taking difference with itself (if clamped) or with wrapped around pixel
but ye I'll take that into account for screen space
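A minimal CPU-side sketch of the naive image-space gradient with the screen-edge branch described above. It uses a forward difference everywhere except the last column and row, where it switches to a backward difference instead of clamping or wrapping. `Gradient` and `DepthGradientAt` are hypothetical names, and the depth buffer is assumed to be a row-major array.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Gradient { float dx, dy; };

// depth holds width * height values in row-major order.
Gradient DepthGradientAt(const std::vector<float>& depth, int width, int height, int x, int y)
{
  assert(width > 1 && height > 1);
  assert(x >= 0 && x < width && y >= 0 && y < height);
  auto at = [&](int px, int py) {
    return depth[static_cast<std::size_t>(py) * static_cast<std::size_t>(width) + static_cast<std::size_t>(px)];
  };
  // Forward difference f(x + 1) - f(x); at the last column/row use f(x) - f(x - 1) instead.
  const float dx = (x + 1 < width) ? at(x + 1, y) - at(x, y) : at(x, y) - at(x - 1, y);
  const float dy = (y + 1 < height) ? at(x, y + 1) - at(x, y) : at(x, y) - at(x, y - 1);
  return {dx, dy};
}
```

This only handles the edge of the screen. It still produces wrong gradients at geometric edges, where the neighboring pixel belongs to a different surface.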
so opting out of storing gradients is to avoid being memory bound?
e.g., finite difference in screen space will give incorrect results for a bunch of pixels here
https://i.imgur.com/t60gJBO.png
because it's assumed to be one big surface?
also to reduce the requirements from the renderer. I'd prefer to hide as much stuff in the file where I do all the RSM-specific things
can fight that with triangle indices
this is also one of the heuristics in weighting bilateral filters btw
I could imagine
I linked one technique that relies on the depth buffer only to identify the surface
tl;dr sample the neighboring pixels and use the depth of the two that are closest to this one to form the surface
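A rough sketch of that tl;dr along one axis, assuming the simplest three-tap form: compare the left and right neighbors and keep the difference with whichever one is closer in depth, on the theory that it is more likely to lie on the same surface. The linked post may differ in detail, and `SlopeFromClosestNeighbor` is a made-up name.

```cpp
#include <cassert>
#include <cmath>

// Returns the signed per-pixel depth slope along one axis, computed from the
// neighbor whose depth is closest to the center pixel's depth.
float SlopeFromClosestNeighbor(float left, float center, float right)
{
  const float backward = center - left;  // slope implied by the left neighbor
  const float forward = right - center;  // slope implied by the right neighbor
  return std::fabs(backward) < std::fabs(forward) ? backward : forward;
}
```

Running this once horizontally and once vertically gives a gradient that ignores the far side of a depth discontinuity, at the cost of two extra depth taps per pixel compared with a plain forward difference.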
@long robin what's the plan anyways?
to make a better filter?
also imagine pulling off temporal gradients later... for RSM
this is basically a rt pipeline denoiser so that you could swap RSM with path tracer and it'll work OK
given that it can work with 1spp that is
because rt is just that much more demanding when compared to sampling textures with whatever other math there is
can fwog bindless textures?
- create depth gradients
- add 3x3 bilateral blur (regardless of disocclusion)
- create moments
- upgrade atrous denoiser to use depth gradients and gamer moments
- sample and denoise at quarter res instead of screen res
- use VRS to further guide samples
yes
ok then if you also add bindless buffers I could make a pt example for fun
it's kinda in beta though
but it is usable (I use bindless textures in GPU-driven example)
there is a NV only extension NV_shader_buffer_load/store which gives you bindless buffers via pointers
BDA for ogl basically
I'm mainly interested in ubiquitous extensions
amd is not as generous
you can pretend with bindless textures
nor intel but what do you expect from intel
I guess buffer textures may do but they're ugly in code
also for whatever reason slow as hell on nv
slower than BDA extension
yes bindless handles work
huh, interesting
they're a texture after all
I don't remember can you alias memory buffers with vbo and texture at the same time?
probably can because I would have rejected the idea otherwise back then
I hate dupes
you can do it, and probably legally even, provided you use the correct barriers
also this denoiser- idk if it's going to be ideal for path tracers since it's my first attempt at a spatiotemporal denoiser
RSM is kinda forgiving because you can take extra samples cheaply
I would argue it's going to do better with pt due to importance sampling
RSM samples are distributed very suboptimally
surface integrals are very bad like that
mayhaps, but I predict 1SPP PT will be very expensive compared to 1SPP RSM
it's a bold prediction
indubitably
and very divergent
and so very not hw accelerated since this is 
either way, pretty much any good GI solution will involve path tracing packaged in various ways
except those two
what about lightcuts?
is that even a good GI solution
all the recent ones I've seen use PT somehow
- surfels is PT with a fancy world space cache
- DDGI is PT with a fancy world space cache
- GI-1.0 is PT with a fancy world and screen space cache
- RTXGI is PT with a fancy world space cache
everyone is keen to put them cores to use
I'll have to take a look at lightcuts though since I have no clue at all how it works
not worth the effort if you ask me
in modern day and age
it's basically rasterizing the scene from multiple angles and caching the results
hmm
the paper is only 10 pages long so I can give it a skim
after I sleep
but it sounds not ideal for real-time from your description
its underlying technique is instant radiosity so it's not
any updates?
The five most recent commits
https://github.com/JuanDiegoMontoya/Fwog/commits/examples-refactor
The only thing of significance is making example 5 use the example framework, which was the original purpose of this branch
I see, truly minor
on an unrelated note I think I won't be working on fwog due to being busy for months ahead
Np, go work on important stuff 
more like do things to prove I know what I know
college is so stupid
I came there already knowing beyond what they can offer
I picked a thesis that they will have trouble understanding
pure comedy
Lol
you could just drop out and do your own thing
Naw, you need that piece of paper to tell employers to give you a job
then, just finish skool summa cum laude, while focusing on your hobbies on the side
meanwhile I was struggling with "Math II" 3 years ago 🥲
must be nice to have talent
theres no such thing
when people attribute your success to talent 😩

i parsed skool summa as skooma, which uh, is an acceptable minor if you are having a too easy time
opengl programmers scare me
this is what teardown does before rendering a fullscreen quad whose sole purpose is to unnecessarily copy a texture to another texture
the scary part is also the amount of objects in this thing
what you see there is a specially crafted binary signature in the command buffer designed for nv blob engineers and jakeronis to patch out in the driver
Ah, yes. I should have guessed
Teardown is such an interesting renderer, but it uses such a crusty old version of OpenGL and all the associated crusty practices involved
enable blend?
another Teardown classic (in the g-buffer pass)
I guess that explains why albedo in the g-buffer is stored in a RGB16_UNORM texture 
my brother in christ stores the input albedo texture as RGBA8_UNORM
and storing normals in RGB8_SNORM without any special encoding 
minus half a sin because there are no curved surfaces, but still keeping half a sin because RGB8
also storing linear depth in R16_UNORM while also having nonlinear depth in D32_UNORM (and clearly not using a reverse z projection)
tearing down teardown in teardown teardown 
I'm writing a frame breakdown, but I don't want it to "accidentally" turn into a giant roast of the renderer
these people actually shipped a successful game
I also couldn't get Nsight to connect to the game, so I can't tell how much of an impact any of these problems actually have on the game's perf
well, I suppose there is always this option 😈
something doesn't add up here (first row = whole frame)
😶
bruh moment
😭
@heavy cipher just so I am 100% clear, one should use an sRGB render target instead of storing pow(color, 2.2) into a unorm target, correct?
because what Teardown is doing is sampling a RGBA8 color texture, then doing pow(color, 2.2) and storing it into an RGB16_UNORM target
thats okay
depends on how you use it after
yeah I'm looking rn
sympler, no sampling, as dora would say
looks like they are just sampling it and using it with no further transformation
it is confusing
they are doing color math with this ^2.2'd value
I feel like the whole reason they are using a RGB16_unorm target is to mitigate precision loss from the ^2.2
weird
maybe I should refrain from passing judgement in my breakdown because I don't know wtf they are doing
I'll just mention it as something that is weird and I don't understand it
which one is more correcter though
spoilers
||the first image has the ^2.2 line commented out, while the second is unchanged||
I can't find if they are doing srgb encoding/"gamma correction" at the end of the pipeline like a normal person, so I can only assume that they're "doing it" here with the albedo???
sadly, modifying the shader they use to blit the texture to the screen affects a bunch of places, so I can't test my hypothesis
interesting to note that the swapchain is sRGB, but doesn't that mean it's doing linear->nonlinear to the color again 
ah i missed that albedo was unencoded too
well idk what the encoding of albedo is in the material texture
but a normal person would put it in nonlinear srgb texture and have the sampler do the nonlinear->linear transform for lighting and shit
but this is a gfx programmer
they do lin2nonlin
so if it is already nonlin, its just worse
isn't albedo normally stored in nonlinear sRGB?
then you put it in an sRGB texture so it's linear when you sample
but yeah when you write to one, it does linear2nonlinear
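For reference, a sketch of the two conversions an sRGB texture format does for you, next to the `pow(color, 2.2)` shortcut. The piecewise functions are the standard sRGB transfer curves. The function names are made up here, and this says nothing about what Teardown's shaders actually intend.

```cpp
#include <cassert>
#include <cmath>

// Linear -> nonlinear sRGB. This is what the hardware applies when writing to an sRGB render target.
float LinearToSrgb(float c)
{
  return c <= 0.0031308f ? 12.92f * c : 1.055f * std::pow(c, 1.0f / 2.4f) - 0.055f;
}

// Nonlinear sRGB -> linear. This is what the hardware applies when sampling an sRGB texture.
float SrgbToLinear(float c)
{
  return c <= 0.04045f ? c / 12.92f : std::pow((c + 0.055f) / 1.055f, 2.4f);
}

// The common shortcut: close to SrgbToLinear for mid-tones, noticeably off near black.
float Gamma22Decode(float c)
{
  return std::pow(c, 2.2f);
}
```

Doing the decode in the shader and storing the result in an 8-bit unorm target would lose precision in the darks, which fits the guess that the 16-bit target is there to compensate.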
ye but i am just repeating what you said in off topi
ok so to confirm, is that color pipeline weird to you too?
Yeah but some of it can be explained with "but opengl!!! Buggy!!!!!"
not color meth tho
Well that can be an artistic choice
Its an NPR renderer anyway
kinda
they have some clearly PBR-inspired code in here
ok, maybe the real albedo g-buffer texture for lighting is obfuscated through a glBlit
the real albedo is the friends we made along the way
lightColor *= texture2D(uScreen, b2).rgb;
that's the usage in the shader where it actually looks like it's going from albedo to shaded color
fwog
I need to just forget about this color stuff because it's melting my brain
it's just a color, but it burns
renderdoc should have an overlay that shows discarded fragments
it probably wouldn't be easy to implement though
my naive idea would be to patch shaders and add a write to a hidden SSBO before discarding
how naive
huh, I imagine the overdraw overlays probably do something similar to what I described
looks like they are unaffected by discards, which I guess means they are implemented with an SSBO/image atomic add
would it be possible to contact the guy and ask?
ok
hehe
I did in fact ask him, in case you didn't click my link
well Baldur immediately thought of a better solution than what I thought would be possible
using the stencil buffer no less
downside is no heatmap
i was referring to the author of teardown 😉 not baldur
Oh, you were referring to the previous conversation
That would basically be cheating
But it would be nice to be able to ask him any questions I have about it
"hey so I've been reverse engineering ur game mind helping me out?"
"for educational purposes, I am also asking for a friend"
retweet
another ism in teardown
the focal plane texture is generated and unused when DoF is disabled
it's a draw to a 1x1 texture though, so who cares
you
soulja boy tell em
this looks correcter
linear colorspace is known to look darker like that one below
when put on a srgb display
both images are in the same comment, which one did you mean 
don't you like seeing a man pull off complex voxel renderer with GI and fail on a basic task of handling srgb
also when you see the pow 2.2 in code it automatically looks sus
when there are hardware formats for doing every aspect of srgb correction for you
vertex colors?
aren't those vec4s so you don't need to store them in nonlinear format?
enough precision to get away with using linears
but originally they are in some format, which is usually not f32
unless you like spending bw on that for some reason
very much so
and i dunno if opengl lets you do ILS to srgb
it is possible in vk since 1.1 or some ext
ILS?
that'd be stupid not to
rendering to a srgb texture is same thing as storing to it or am I wrong?
it is different
and when you do render to srgb it converts for you
rendering is well supported
rendering goes through the "framebuffer" hw
ILS does not
wdym?
you can imageStore and imageLoad in fragment shader
yes
does it use different parts of hardware to store and to write to framebuffer in one shader invocation?
you don't store to the FB with ILS
you store to some different image
and yes, different HW
ILS conceptually is just like SSBO stores/loads, but to images
I see
can't really imagine what hardware would do to allow both in fragment shader though, but it's cool because we can use that to voxelize the scene by rendering every triangle using raster pipeline and store each fragment to 3d texture
yes, this is something that's indeed done
imagine it as a side-effect of the FS
the FS writes to some bit of global memory for your image / SSBO store, then passes the FB value to the special hardware magic
minor correction: there is no GI in this game (though it looks like there might be)
no, you have to create a non-srgb view to use them
for writing data, teardown uses fragment shader output exclusively anyways
well ye
but i didn't know if that was allowed
ok, I thought you were wondering if you could get the automagic format conversion
only direct?
how does it have non-black shadows? (disregarding penumbra)
constant ambient?
yeah, I think it depends on the time of day
well no
it has IBL
so it's not constant ambient exactly
according to my experiments it should leak like crazy
and well if you think about it for more than 5 seconds
the IBL is just the skybox
like learnopengl IBL basically
it's used for ambient color
you cannot use skybox IBL as is for diffuse
you need to preconvolve it
preintegrate
yuh
the details will be in the breakdown
see there is a separate term for it
I guess the skybox as-is is the radiance map?
you can even do the integration with it in the shader by doing limited amount of samples
I haven't looked at the IBL yet but all I've seen is a single un-convoluted skybox cubemap
imagine being a sample 😳
what's up with my crazy ideas
you took your crazy pills today
I took a lot of ray tracing pills for a few years
convoluted rendering
very
also I found some more albedo textures being used (sampled) and they're all unorm
quite epic
if it's using radiance map without preintegrating then it would look silly as all pixels on the map will only influence those specific directions they lie at
maybe it is using a low mip?
e.g. a star map would look silly because all surfaces would on average be black-ish and those that happen to coincide with a star would look bright
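A brute-force sketch of what "preintegrate" means for diffuse: irradiance at a surface is the cosine-weighted integral of radiance over the whole hemisphere, so every texel of the skybox contributes instead of only the one along the normal. It assumes a scalar radiance function and a normal pointing along +Z to keep it short. `RadianceFn` and `IrradianceUpFacing` are hypothetical names, and a real implementation would run this per output texel of a small cubemap.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

struct Vec3 { float x, y, z; };

// Hypothetical stand-in for a skybox lookup: radiance arriving from a unit direction.
using RadianceFn = std::function<float(Vec3)>;

// E = integral over the hemisphere of L(w) * cos(theta) dw, for a normal of +Z,
// evaluated with a midpoint rule in spherical coordinates (dw = sin(theta) dtheta dphi).
float IrradianceUpFacing(const RadianceFn& radiance, int thetaSteps = 64, int phiSteps = 128)
{
  assert(thetaSteps > 0 && phiSteps > 0);
  const float pi = 3.14159265358979f;
  const float dTheta = (0.5f * pi) / static_cast<float>(thetaSteps);
  const float dPhi = (2.0f * pi) / static_cast<float>(phiSteps);
  float sum = 0.0f;
  for (int i = 0; i < thetaSteps; ++i)
  {
    const float theta = (static_cast<float>(i) + 0.5f) * dTheta;
    for (int j = 0; j < phiSteps; ++j)
    {
      const float phi = (static_cast<float>(j) + 0.5f) * dPhi;
      const Vec3 dir = {std::sin(theta) * std::cos(phi), std::sin(theta) * std::sin(phi), std::cos(theta)};
      sum += radiance(dir) * std::cos(theta) * std::sin(theta) * dTheta * dPhi;
    }
  }
  return sum;
}
```

A uniformly white sky integrates to pi, which is why the result is usually divided by pi (or multiplied by albedo / pi) when used as a diffuse ambient term.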
tfw convoluting using glGenerateMipmap
(fake) (not real)
convoluting 
unironically a way to do it, but not exactly correct
for rough reflections
any reflections, including rough
by going down the lod
I would still convolve the thing and store at different mip levels to be at least somewhat more correct
fully correct thing would interpolate between fullres preconvolved cubemaps
why is it convolved exactly though
maybe someone managed to preintegrate an irradiance map with a blur kernel?
hmmm but a convolution is an integral iirc
maybe it is the same thing
use convolved instead of convoluted when writing the article (not real advice)(truth)
volve these nuts
thanks gaben
that is generally what is done, yeah
is there spec at all tho?
or just diffoose
there are shiny materials
roughness, metallic, and reflectivity are written to the g-boofer
yeah, apparently this game is half PBR
the blend between diffuse and specular seems proportional to the reflectivity
"based on physics from a different reality"
wdym?
so why are you smoking teardown rn?
mathematically
I told some people in another server that I will make a frame breakdown of it
plus I was interested in seeing how its renderer worked
I have already answered my own question in that chain of messages
convolvulating
the higher mips are just a convenient place to store them
I know, it becomes progressively more low frequency
we are in violent agreement
that's what happens when two people try to argue over objective truth
objective truth in a discussion about Teardown
bikeshed is full of opinions
that is my opinion
so I thought about the rsm specular some more
and realized that it would only show the bright parts illuminated by the directional light source
and what's worse it would show them through every obstacle on the way
still might be worth trying out
does your deferred pipe have material info already?
specifically diffuse and specular parts
because it would be more handy to have those as opposed to metallic and calculate fresnel and friends for every light source
imo
maybe not so good on bandwidth
does it assume anything to make those specular highlights from point lights?
yeah
all materials dielectrics?
you could think of it like that
so that's why it looks decent
for metals there is no ambient term because ambient is pretty much diffuse
and there is no diffuse for metals
so pretty much all metals end up black
in a renderer without reflection probes
or without SSR
you can cheat and sample skybox but that looks weird with all the leaking
unity games used to look like that
there is still no diffuse
it will look like smudged highlight
unless you gradually, with roughness, add ambient to it which will look even weirder
if every material is 100% rough it will be alright maybe
I get you aren't serious
I'm half serious in that if all the metal materials have a roughness of 1, it probably wouldn't look too inaccurate
completely rough metals look like poopoo though
didn't Gustafsson already do a breakdown of sorts on stream once
sounds somewhat familiar but I'd rather spend 20 hours figuring out how things work than 1 hour
@rugged notch also thinking about it more, it's way more useful IMO to have this sort of info in a textual format
but I suppose I can cheat and look at that stream
just have a breakdown on stream
brings in the big views
also i bet he didn't go: "haha this is where i commit srgb warcrimes"
have a meltdown during a breakdown of a stream of a breakdown of teardown
"srgb war crimes" is a phrase that I shall appropriate
instead of frogment shader, we should say 🐸ment shader
am I tripping, or is this explanation of how depth buffers work completely wrong 🐸
https://youtu.be/0VzE8ROwC58?t=1221
I wonder if them catching flak for this is why they deleted the twitch vod
"the hardware gets confused" 
i am not sure what he is explaining
isn't he talking about not having depth info inside the rasterized boxes
yeah but you can set the depth yourself or discard if you hit nothing
I don't see the poor interaction here that requires weird hacky workarounds like copying the depth buffer several times in the g-buffer pass
let me watch 1x more
I'm also speaking from having looked at the frame, in which there are multiple of these
where they copy linear depth to another texture so it can be read in subsequent draws
this is how they use it
perhaps this is just an optimization?
but they could use conservative depth + early fragment tests
i think he is talking about losing early depth
so depth is correct but you need to trace all fragments, all the time
not with discard
well on modern hardware you can do it
but these guys want to target old hardware that can't run this game for some reason
the way of the worm
are the boxes sorted?
can nv do re-z?
idk but who cares 
even on amd its not always a win
anyways, they might have looked into this and this was better perf
my perception has been tainted ever since the 2022 srgb incident
make the review into a diss track and release under the pseudonym jaker 1.5
what if instead of discarding, they just set the depth to a very large value
ah but they'd need to render front faces instead of back faces
sending the fragment to the shadow realm
I wonder why they render back faces only
what would this accomplish?
the ability to use early-z on inferior hardware
you can't have both early-z and increase the depth on that hw
assuming a non-reversed z
that would just be a discard with extra steps and slavery
wot
I don't get it
conservative depth is about decreasing depth
some hw can only do (atomic) depth test-and-set
i think
I thought it was just about assumptions, e.g., early discard is okay if you promise to only write depth values that wouldn't break it
yeah, so writing a depth value that is larger than the original one breaks it
(again with a LEQ test)
I drew a diagram to explain my thoughts
this SO answer seems to agree, but it doesn't mention hardware specifics
https://stackoverflow.com/questions/31624437/understanding-gl-arb-conservative-depth-extension
if you have already failed the early test, the shader will not run
what are you writing then?
i have no idea what you are talking about then jaker
nor do I have any idea what you're talking about
if you have a LEQ test, and (depth_greater), the only legal fragdepth you can write is the same depth you already have
or rather in this case i imagine the implementation will just turn off early tests
with conservative depth you can't change the depth value in a way that would make the early test fail
thats the important bit
I concur
so if the test is LEQ, increasing the depth might make it fail, where it originally passed
hence that is not compatible with conservative depth
okay, I think I see where our thoughts are diverging
I am looking at it from another angle
if the early-z test fails, you cannot write a depth that would make it pass
if the early test fails the shader is not run bruh
yes I worded that poorly
what I mean is that it is illegal to write a depth that would make it pass when you enable conservative z, if it failed the early z test
is this SO answer wrong?
With the GL_LESS comparison function, fragments that fail the depth test will have depth values larger than the current value in the depth buffer. This means that the early depth test can be used without affecting the result with depth_greater:
If the early depth test is applied, it eliminates fragments with depth larger than the current value before the fragment shader.
If the early depth test is not applied, the fragment will be processed by the fragment shader. Since it's guaranteed that the fragment shader will only make the value larger, it will still be larger than the current depth value, and will be eliminated by the depth test after the fragment shader.
ye
either you have it running in conservative depth or not
if it is not conservative depth, then your annotation means nothing
you just get a late test
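The argument from the quoted SO answer (GL_LESS test, shader promises depth_greater) can be checked with a tiny CPU-side model. This is only a sketch under stated assumptions: `depthTestLess`, `lateZOnly` and `earlyThenLateZ` are made-up names, the "shader" is modelled as adding a non-negative offset to the interpolated depth, and none of this reflects how any particular hardware implements it.

```cpp
#include <cassert>

// GL_LESS: the incoming fragment passes when it is closer than the stored depth.
inline bool depthTestLess(float incoming, float stored) { return incoming < stored; }

// Late-Z only: the shader always runs, then its written depth is tested and set.
// shaderOffset >= 0 models the depth_greater promise (hypothetical model).
inline float lateZOnly(float stored, float interpolated, float shaderOffset)
{
    assert(shaderOffset >= 0.0f);
    const float written = interpolated + shaderOffset;
    return depthTestLess(written, stored) ? written : stored;
}

// Early reject on the interpolated depth, then the usual late test and set.
inline float earlyThenLateZ(float stored, float interpolated, float shaderOffset)
{
    assert(shaderOffset >= 0.0f);
    if (!depthTestLess(interpolated, stored))
        return stored; // early test failed: the shader never runs
    const float written = interpolated + shaderOffset;
    return depthTestLess(written, stored) ? written : stored;
}
```

Both paths leave the same value in the depth buffer, because a fragment that fails the early test could only have moved farther away. A fragment that passes the early test still needs the late test, since the shader may push it behind the stored depth.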
okay now let's consider an entirely hypothetical scenario
if the hardware did do an early test there, would it be wrong?
so we have early test + late test (if the early test passed)
or is this precisely what re-z is
I can't see the case in which early z + depth_less would result in an optimization though (in a non-reverse-z renderer) (and by optimization I mean early z actually being used)
so the thought I have rn is that conservative z does nothing except maybe allow extra early-z tests on certain AMD hardware, but it feels wrong
however, the extension spec was written by AMD employees, so...
my understanding is that amd has the following modes:
LATE_Z - depth test and set after fragment shader
EARLY_Z_THEN_LATE_Z - depth test and set after fragment shader (when certain conditions are met this becomes EARLY_Z)
RE_Z - depth test before the fragment shader, depth test and set after the fragment shader
EARLY_Z_THEN_RE_Z - depth test before the fragment shader, depth test and set after the fragment shader (when certain conditions are met this becomes EARLY_Z)
I wonder what those "certain conditions" are
that reads like sarcasm but I'm genuinely curious
are they explained in the doc you're reading?
when I search "re-z" in this server I see an ancient message from you
#vulkan message
https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/amd/vulkan/radv_pipeline.c#L5024
http://developer.amd.com/wordpress/media/2013/10/cayman_3D_registers_v2.pdf
https://www.yumpu.com/en/document/read/43374261/mantle-programming-guide-and-api-reference
https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/gallium/drivers/radeonsi/si_state_shaders.cpp#L1890
enjoy my sources
if only one of us worked for amd
imagine how much more info we could have
indeed, but my coworkers are all sleeping rn
and I haven't looked at prop drivers very much because I'm not a driver dev
and unfortunately there is no way for me to plug into the company and absorb all of its knowledge instantly
this is reZ really
but apparently reZ is just shit in perf usually
or rather, this is a weaker one because this could also be implemented via two depth test and sets
Wouldn't depth_less prevent any sort of early test though
I mean conceptually
no, i think you are misunderstanding what this is for
Mayhaps
you have a conservative test first
ie you rast'ed the bbox backside
then you update the depth buffer with a closer value
but maybe it is I who misunderstand this stuff
Let's ignore the Teardown example
It seems to me like the point of conservative depth is to allow for more early z testing
I hope you agree
And it seems like the only scenario where it would ever enable more early z is this one
And the parallel discussion is what hardware is actually able to do that "optimization" (which apparently isn't always an optimization)
If that is wrong then I'm probably going to suffer from an aneurysm because I even drew a picture and everything
ok i think you are right
i had it wrong
the other conclusion is that apparently the hw does fuckall to accelerate this 😛
thanks for the insight 🙏
No u
@shell inlet I learned something interesting
apparently you can approximate the depth gradient (not really the gradient, but the adaptive factor you use based on the surface slope) using the dot product of the surface normal and the camera forward vector
it seems like it would break down as you get farther from the center of the screen, but maybe it's hard to notice
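A rough sketch of what such a slope factor could look like. Everything here is an assumption for illustration: `Vec3`, `depthToleranceScale` and the `minCos` clamp are hypothetical, the vectors are assumed to be unit length and in the same space, and the 1 / |cos| shape is just one guess at how the dot product might be turned into a tolerance scale.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

inline float dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Hypothetical helper: widens the depth rejection tolerance on slanted surfaces.
// Returns 1 when the surface faces the camera head-on and grows toward
// 1 / minCos as the surface approaches a grazing angle.
inline float depthToleranceScale(const Vec3& normal, const Vec3& cameraForward, float minCos = 0.05f)
{
    assert(minCos > 0.0f);
    const float cosTheta = std::abs(dot(normal, cameraForward));
    return 1.0f / std::max(cosTheta, minCos);
}
```

Since it uses the camera forward vector rather than the per-pixel view direction, the factor is only exact at the center of the screen, which matches the concern above about it breaking down toward the edges.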
yes
👋 👋👋👋👋👋👋👋 👋 👋👋👋👋👋👋👋 👋 👋👋👋👋👋👋👋 👋 👋👋👋👋👋👋👋
tl;dr ask an employee to give you encrypted registry keys to set or add a line in the driver
yeah lmao actually I've tried that but didn't push because it didn't help much
the problem was in something else however
in the kernel itself being too sparse, not the rejection heuristics
I didn't try to fix it because I knew you were working already on a better filter altogether
@long robin do you have magic code in your lib that makes drivers pick best GPUs instead of integrated ones
I accidentally opened a pr to your repo trying to merge your refactor branch with my main lol
how to use github
delete the pr later unless you want to meme about it
you can remove commits with rebase -i
desktop github client sucks
you dont say
: )
github also sucks in general
time to learn some git cli
no please I dont want to turn into a linux user
if you have clion or rider or any other jetbrains based product, they have very nice git clients as well built in
and should support rebase too
also fwog doesnt run on my laptop
their git gui is well integrated into the jetbrains products
because it probably doesnt have the magic code
i dont have jetbrains only visual studio by microsoft
stop pushing me to change my toolchains
it wont work
i dont even use cmake
thats ok
bro what happened to fwog
truly fighting to run the gltf example
when building, copy textures doesn't happen, only copy models and copy shaders
and it causes runtime errors
but that's after I added magic code
before that it was picking integrated gpu
@long robin basically add this code somewhere
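#ifdef _WIN32
// Exported symbols that the NVIDIA and AMD drivers look for on dual-GPU laptops
extern "C" {
  __declspec(dllexport) unsigned long NvOptimusEnablement = 0x00000001; // NVIDIA Optimus
  __declspec(dllexport) int AmdPowerXpressRequestHighPerformance = 1; // AMD PowerXpress
}
#endif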
those exports tell drivers to pick gaming GPUs
in laptops
also had to tweak the cmake configuration haphazardly to find a working one because it defaults to non building project
idk who to blame here and if it's fixable
nice fps
when I minimized it it crashed
filtered runs 30 on average
but it's 1spp so expected
you could have applied the plum color scheme at least
that was my fav in win9x days 🙂
I like classic
it was aero by default and I changed to classic
and didn't touch colour because I like blue
anyways nah I don't think I'm going to attempt to work on fwog on this slow laptop
it was worth it to see how it performs though
and that I pointed out that it will not pick proper gpu on most laptops
the ones that have two of them
had to dig up my old opengl engine to look for the magic exports
you could have asked as well, or just use the searchbox on this server 😉
im at a hotel away from home attending college
digging up old projects is not a rare thing for me so it was faster than google search query
because i still remember how things are designed in them more or less
can recall where specific code is at
are you asking to segue into something else?
or specifically curious when I get back at home
well then it's just one more week and im moving back
but I will have a lot of homework to do
thought you were gone for half a year or so now, and not being able to use your pc
it's distanced
I attend personally only for 3 weeks
the rest is self educayshun and homework/coursework/internships
and now next year thesis
i see
I need to figure out if pasting the magic vars is the ideal thing or if I should expose this some other way
for opengl its the only way, unless you force it in the control panel
Well yeah
I guess those vars need to be set before creating the GL context
Plus that code only works on Windows
use vk to do device sel and swapchain and do interop to do the rest in GL
expose like as a part of the api?
is my internet bad or is discord bad
anyways you can only ifdef it with some macro, there's nothing more to do about it
I can just put it in Application.cpp actually
app.cpp is the base class of the others
Yeah all the examples use it
I did not
I wonder why
Maybe it's a bug in window resizing
Perhaps it's trying to make 0x0 render targets
maybe render() can be skipped when window is minimized
Let's see what's causing the crash first
hmm i found something earlier
and reminded me of your open issue at renderdoc
GL_INTEL_performance_queries
has metrics for the thing you are asking
...
31 FragmentsKillCount 0x940b uint64 Number of times Fragment Shader performed a fragment-discard operation for fragment or sample
...
Lol you called me jaker although my GitHub handle is different
lol
reflex, let me massage it
: D
baldur might know either way
looks like the window is indeed becoming 0x0 and it trips an assert in glm::perspective
should be an easy fix. I just need to add the thing to Application.cpp as well
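A minimal sketch of the kind of guard that avoids the 0x0 case. This is not the actual fix that was pushed. `hasDrawableArea` and `safeAspectRatio` are hypothetical helpers, and the idea is simply to skip rendering (or at least never feed a degenerate aspect ratio to glm::perspective) while the window is minimized.

```cpp
#include <cassert>

// Hypothetical helper: true when the window has something to draw into.
// A minimized window typically reports a 0x0 framebuffer.
inline bool hasDrawableArea(int width, int height)
{
    assert(width >= 0 && height >= 0); // negative sizes would be a caller bug
    return width > 0 && height > 0;
}

// Hypothetical helper: aspect ratio that never divides by zero.
// Falls back to 1.0 when there is no drawable area.
inline float safeAspectRatio(int width, int height)
{
    if (!hasDrawableArea(width, height))
        return 1.0f;
    return static_cast<float>(width) / static_cast<float>(height);
}
```

In a main loop this could be used as an early-out, e.g. only calling the render function when `hasDrawableArea(windowWidth, windowHeight)` is true, where `windowWidth` and `windowHeight` stand for whatever the windowing library reports.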
@shell inlet I pushed the high perf magic numbers and the window minimize crash fix
I'm also not sure
I was secretly seeking validation this whole time
🤰 anyways, time to work on the denoiser once more
the real validation layer is the friends we made along the way
hmm @long robin i noticed you use GL_GOOGLE_include_directive
but isnt that only available in glslangvalidator and shaderc?
Yeah, I'm only enabling it though
Not requiring it
Basically I only use it so the linter doesn't complain about my includes
hmm but
will it include the files? or do you have a custom handler somewhere? (i didnt bother to check)
correct behaviour is just enabled, not required 😛
I'm using stb_include
Therefore I don't care if the extension is actually implemented
ah i rember you mentioning that one now
Yes it's literally just so I can press ctrl+s and not have the linter complain
The extension I mean
I couldn't figure out a more convenient way to do it
you said yourself here deccer
The best part is that Nvidia's driver actually implements it (the new line directives)
nvidia implements the ARB version, ye
No, I mean Nvidia will compile your #extension GL_GOOGLE_include_directive : require as well as the #line directives it adds
And it will actually use the line directives for error reporting
ah
tfw you send jjj to a random text channel cause you were watching a youtube video and wanted to rewind a minute but didn't have youtube as the focused window
What does j in discord do
j is a keyboard shortcut to 'go back 10 seconds' in youtube. l goes forward 10 seconds. k is play/pause
nothing. its called I had the wrong window open when I hit j on the keyboard
i use arrow keys 🙂
arrow keys are good for finer scrubbing, cause its 5s intervals
5 second increment pleb
is it?
You should've pressed enter
I should of. Wouldn't of been the first time!
of times is the charm
shhhh of is my calling card
at least you dont work on work stuff right now 🙂
You should of kept it a secret
I would of but wheres the fun in that
heh
I haven't figured out what we are doing for next week GP meetup. I half wanna do a 'lets play with fwog'
@daring narwhal spit it out
what was i typing in here
np 🙂
It might be interesting if you use a small framework (e.g., the application class in examples-refactor) to speed things up
jaker coerce them into implementing the volume clustered thing
but it doesn't fundamentally change anything about designing a renderer, so it would pretty much just be that but with some convenient stuff from fwog
maybe peeps could see if fwog makes sense, architectural wise, if its easy to use to write advanced stuff, or if stuff is missing
sounds like a cool idea
What a spooky level of scrutiny to be subjected to
shut up and deal with it
I haven't even merged examples-refactor yet
ah speaking of
i was able to get my old business laptop working again, has a 960m in it and your volume renderer and rsm thing run way smoofer
i'm just shooting ideas, we don't have to do it jaker 🙂
isnt it just the RSM part?
All of the examples got heavily refactored
ah
But also there are a bunch of RSM improvements and other random stuff tacked on
Yeah, I can probably merge it soon then open a branch for RSM stuff
maybe not this week, but soon ™️
I wonder how reprojection error is handled for normal and depth rejection heuristics
perhaps a shrimple bilinear interpolation works for those
eh, there is still overzealous rejection on edges when turning the camera
mayhaps I shrimply select the pixel in the 2x2 that closest matches the original pixel in depth or normal
this seems to clear up some of the unnecessary rejection
vec4 depthPrevs = textureGather(s_gDepthPrev, reprojectedUV.xy, 0);
vec4 diffs = abs(depthPrevs - reprojectedUV.z);
float depthPrev;
if (diffs[0] < diffs[1] && diffs[0] < diffs[2] && diffs[0] < diffs[3])
  depthPrev = depthPrevs[0];
else if (diffs[1] < diffs[2] && diffs[1] < diffs[3])
  depthPrev = depthPrevs[1];
else if (diffs[2] < diffs[3])
  depthPrev = depthPrevs[2];
else
  depthPrev = depthPrevs[3];
but the #line is not part of that ext
well, nv can compile C-style #line directives, wherever they come from
yes, but thats part of glsl, not the ext, afaik
wait whats the c-style ones then
okay, but the file you linked is the glsl spec
what I am saying is that NV can compile these
I assumed they came from the google include directive extension, but according to you that is false
you are out of line kowalski
no u
jaker im going to center fwog's windows
did you change anything in Application in examples-refactor?
Since when?
since the branch was created
Yes, I have changed it
oki then ill base the branch off of this one then
oki of ill base of branch of of of of then // for thomas
Yeah, cuz I ain't done
fair
then i wont touch deez nuts
its been 11hrs since i made this PR? feels like it was 2hrs ago
hmm what's the default font in VS 2022 called
Consolas?
my work PC has some old-looking font
it's really consolas by default?
on my home PC it looks different, more modern
ye looks washed out
why is it so big is my question
no thats just discord
4k monitor
looks sharp, its serif
looks like jaker increased fontsize
I changed the zoom in the text editor because of that
or ctrl+mousewheeled a little
I don't think monitor resolution plays a role in the font scale either
I have this tho
because the bigger it is the smaller every thing is by default unless you tweak it
good job internet made me feel like im using internet explorer
that will also change in the future 🙂
restarting because you installed an extension
promises promises
when they move addons out of process
it's not funny that even on my modern pc visual studio often lags
they are probably in a tremendous tech debt
yea
