#Fwog and co.
So the compiled backend should have them just fine I guess, even when calling from rust
yeah
the shader tool bakes the shader permutations and reflection data into a bunch of headers
oh btw my epic glUniform calls don't work on AMD 
glGetUniformLocation returns -1 for those

time to make vendor-specific paths 
wouldnt the workaround also work on nvidia
the workaround I was thinking of would be to simply use the high bindings on AMD, which doesn't work on NV
orrrrrrr I can figure something else out
I love how Intel is completely out of the equation 
I have a laptop with a 12th gen intel chip because the ryzen models were all sold out
I guess I'll just be running my app on my main machine
not a real vendor 
GL_MAX_IMAGE_UNITS on arc is 8, which means I have to do whatever I do for NV and pray that it works
I'd rather not have to make a hack though
I really just want glslang to automap bindings but not be dumb by starting at a uselessly high index
I finally figured out how to manipulate glslang into doing what I want
I just needed these flags
--amb --stb comp 8 --ssb comp 8 --suavb comp 0 --sib comp 0
the idea is to force textures and samplers to start at index 8, then auto-mapped bindings for images will start at index 0
right i see
Very nice
guess it means I can remove le hacks now
Let us know if it works fine
if it does execute le push
So that I can execute le pull
the current thing works on nvidia atm
so you can le pull whenever
you have an nv gpu, right?
Yes
then it should Just Work™ as the uploaded version has the glUniform hack
Inshallah pull I must
"The current GL state uses a sampler (6) that has depth comparisons disabled, with a texture object (56) with a depth format, by a shader that samples it with a shadow sampler. This will result in undefined behavior." 
ignore it
it's annoying because these things are reported before the actual draw

uh
these are the samplers that the gl backend creates
the only shadow sampler that I made is in the gltf viewer sample in fwog, which you are apparently not building
glBindSampler(theUnitYouWantToUnbind, 0);
Which units do you bind
Interesting
all my homies love using sampler objects
I will now do something that for sure does NOT involve a for (i = 0 to GL_MAX_COMBINED_TEXTURE_IMAGE_UNITS)

Would you look at that, it worked NOT doing that thing I very explicitly said I wasn't going to do!
incredible
now would be a good time to start using shrimpler objects yourself
I suppose so 
so afaik fsr2 is an upscaling thing but would you also do aa with it by downscaling again or something like that?
I think render resolution and present resolution being equal should be TAA
Or well, the best TAA AMD could come up with
it's also TAA even when it's upscaling
very cool
yeh ill definitely replace my crappy MSAA with it once i get the bindings sorted out
but you could get even more AA by combining FSR 2 with supersampling, I suppose
probably not huge gains I'd imagine
ye MSAA and FSR 2 don't mix anyways

Holy shit
Now this is some good TAA
Compared to that garbage I did this is incredible
Big blurry mess when moving though (my fault) 
How does FSR2 want its motion vectors again?
I rendered mine as rg16f, then passed {renderWidth, renderHeight} to the motion vector scale param
||it still artifacts, but not too badly I guess||

no motion vs me shaking the camera violently
honestly not that bad
yeah it's better than I expected
maybe I did integrate it properly
well, I'm still missing proper motion vectors on the skybox
but idk, it still seems more aliased under motion compared to the vulkan sample
opengl issue
It's a bit too sharp maybe?
Ah, yeah I did
It do be kinda iffy when shaking your mouse violently
But under normal human being conditions it's great
i mean this is zoomed in
Yeah we're pixel picking, it's great, thank you Jaker
I will make sure to pay you in pretty pictures
Like this frog, for example

Might be worth taking a look at the fsr2 cauldron samples btw
did you delete the media submodule in your fork
It seemed like the image was cleaner than my own
I don't remember fsr2 being this bad when shaking camera
this
I didn't explicitly delete it
i fought git submodules for a while to get rid of it because downloading a couple gb of assets isnt exactly great
I'm guessing my motion vectors are still slightly wack
how calculate?
ndc_prev - ndc_curr
Variables are poorly named
I tried calculating motion vectors by doing the world->ndc transform in the fs, then subtracting, but somehow that made my motion vectors huge
format for mv image?
rg16f
both seem fine
?
passed fragment world to fs?
yeah
I feel like I made a mistake, but it was like 2 lines of code so idk lmao
did you set proper motion vector scale?
If your application computes motion vectors in another space - for example normalized device coordinate space - then you may use the motionVectorScale field of the FfxFsr2DispatchDescription structure to instruct FSR2 to adjust them to match the expected range for FSR2. The code examples below illustrate how motion vectors may be scaled to screen space. The example HLSL and C++ code below illustrates how NDC-space motion vectors can be scaled using the FSR2 host API.
dispatchParameters.motionVectorScale.x = (float)renderWidth;
dispatchParameters.motionVectorScale.y = (float)renderHeight;
also lol that formatting
I also tried different permutations of negating the jitter Y, but the current config seems to be the most stable
Damn, FSR2 is taking 2ms 
what gpu
3070 currently
what resolution?
A little under 1080p
oh and what upscaling factor
None, render and present resolutions are equal
ah
fsr2 isn't optimized for pure TAA, but your perf shouldn't be that bad still
perf should be closer to 1ms according to this chart (looking at the 6800 column)
https://github.com/JuanDiegoMontoya/FidelityFX-FSR2#performance
Here's the breakdown btw
I'm doing a lot of suboptimal things in the backend
- excess memory barriers
- excess gl calls (glGetUniformLocation and glUniform1i)
- no fp16
- no subgroups
also, the persistently mapped buffers may not be offering the best performance
I think no subgroups hurts perf a lot in the SPD pass
not sure about no fp16 though
Completely unrelated, but do you know why my second and third cascade take so much longer to render? 
yeah, they're rendering more stuff
The second, third and fourth cascade have the same amounts of (indirect) drawcalls
Only frustum
the later cascades should have larger frustums, no?
Yeah but I'm using a very low lambda so 2nd 3rd and 4th enclose the whole bistro
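For anyone following along, the "lambda" here presumably refers to the usual practical split scheme for cascaded shadow maps, which blends logarithmic and uniform splits. A minimal sketch under that assumption (`cascadeSplits` and its parameters are hypothetical names, not from the renderer being discussed):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Returns the far distance of each cascade.
// lambda = 0 gives uniform splits, lambda = 1 gives logarithmic splits.
std::vector<float> cascadeSplits(float nearZ, float farZ, int count, float lambda) {
    assert(nearZ > 0.0f && farZ > nearZ && count > 0);
    std::vector<float> splits(count);
    for (int i = 1; i <= count; ++i) {
        const float t = static_cast<float>(i) / static_cast<float>(count);
        const float logSplit = nearZ * std::pow(farZ / nearZ, t);  // logarithmic scheme
        const float uniSplit = nearZ + (farZ - nearZ) * t;         // uniform scheme
        splits[i - 1] = lambda * logSplit + (1.0f - lambda) * uniSplit;
    }
    return splits;
}
```

With a low lambda the splits are close to uniform, so every cascade after the first covers a large slice of the view range, which would fit the later cascades enclosing the whole scene.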
@long robin check this out https://github.com/GPUOpen-Effects/FidelityFX-FSR2/issues/22
it's not a surprise to me that it would take longer than the first cascade then
Of course, but the problem is the second cascade
it takes 0.66ms while the third takes 1.5ms
Yes
You can see compute warps skyrocket in the third cascade too
But there's no dispatch in here...
and unit throughput goes wack on the third cascade
Ah well I'll figure it out, I'll stop leaking my thread into this lol
trying to find fsr2 code that does sampling with motion vectors
why is it so hard
it had to be the most convoluted code ever
so they say
For example, a motion vector for a pixel in the upper-left corner of the screen with a value of <width, height> would represent a motion that traversed the full width and height of the input surfaces, originating from the bottom-right corner.
doesn't that mean that the scale is actually width/2 and height/2? If ndc is [-1; 1], such that from say (-1, -1), you travel to (1, 1) the distance is 2 on each axis, then you want to scale such that
(1 - (-1))x = width
2x = width
x = width/2
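The derivation above can be checked numerically. This is only a sketch of the arithmetic, assuming motion vectors are stored as previous minus current position in NDC; `Vec2`, `ndcMotion`, `ndcMotionVectorScale` and `toPixels` are made-up helper names, not FSR2 API:

```cpp
#include <cassert>

struct Vec2 { float x, y; };

// Motion vector in NDC units ([-1, 1] per axis): previous minus current position.
Vec2 ndcMotion(Vec2 ndcPrev, Vec2 ndcCurr) {
    return {ndcPrev.x - ndcCurr.x, ndcPrev.y - ndcCurr.y};
}

// Scale that converts an NDC-space motion vector to pixels.
// NDC spans 2 units across the screen, so one NDC unit is half the resolution.
Vec2 ndcMotionVectorScale(int renderWidth, int renderHeight) {
    assert(renderWidth > 0 && renderHeight > 0);
    return {renderWidth * 0.5f, renderHeight * 0.5f};
}

// Applies the per-axis scale to a motion vector.
Vec2 toPixels(Vec2 motion, Vec2 scale) {
    return {motion.x * scale.x, motion.y * scale.y};
}
```

A corner-to-corner move, `ndcMotion({1, 1}, {-1, -1})`, is 2 NDC units per axis. With the halved scale for 1920x1080 it comes out as exactly 1920x1080 pixels, which matches the "traversed the full width and height" wording in the docs.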
@long robin plz respond
don't tell me ur already asleep
I was afk
Yeah that's what it would seem like
I'll test when I'm no longer on the john ||producing my best code||
its extremely well documented 
the empty line after all the enum names annoys me
me too lol
however, it is consistent
well, you see, um

at least the * is left at the type
As it should
i have a huge respect for you jaker
for dinglefarting all that together within a few days
Yeah thats honestly quite impressive
working as intended or not
john amd?
the other john
Competitor John
hmm it seems like simply dividing the motion vector scale by 2 fixed things
that seems so random
that looks quite nice
there is still a little flickering under no motion, but I think that's caused by the high frequency detail in the scene
normally there would be indirect lighting to reduce the contrast
do you happen to know if ffxFsr2ContextDispatch is thread safe?
wdym
or just access to the context in general
tbh not that i would ever call it from different threads
just curious if i should wrap it for safety
right for opengl it wouldnt be very safe ofc
you cannot call it from multiple threads simultaneously
not sure why you'd want to do that anyways
yeah true
ill just put a little disclaimer
the bindings are done i think so i can start integrating
you can call it from a different thread than the one that initialized it
thats good enough
but not multiple threads at once

the context and backend each store a bunch of state that gets modified when you dispatch
and the state is modified in a non-atomic way, so overlapping dispatches would fook everything up
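If you did want the "wrap it for safety" option, the simplest thing is probably a mutex around every dispatch. A rough sketch, with a fake context standing in for the real one (`FakeUpscalerContext`, `SerializedUpscaler` and `hammer` are hypothetical and only illustrate the locking, they are not FSR2 types):

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a context whose dispatch mutates plain, non-atomic state.
struct FakeUpscalerContext {
    int frameIndex = 0;
    void dispatch() { ++frameIndex; }
};

// Serializes access so dispatches from different threads never overlap.
class SerializedUpscaler {
public:
    void dispatch() {
        std::lock_guard<std::mutex> lock(mutex_);
        context_.dispatch();
    }
    int frameIndex() {
        std::lock_guard<std::mutex> lock(mutex_);
        return context_.frameIndex;
    }
private:
    std::mutex mutex_;
    FakeUpscalerContext context_;
};

// Runs `threads` workers that each dispatch `perThread` times and
// returns the number of dispatches the context recorded.
int hammer(SerializedUpscaler& upscaler, int threads, int perThread) {
    assert(threads > 0 && perThread > 0);
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&upscaler, perThread] {
            for (int i = 0; i < perThread; ++i) upscaler.dispatch();
        });
    }
    for (auto& w : workers) w.join();
    return upscaler.frameIndex();
}
```

Note that a mutex only prevents overlapping dispatches. With OpenGL the GL context also has to be current on whichever thread makes the call, which this does not handle.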
yeah makes sense
i havent really looked at the impl
just mindlessly copied headers to rust 
take a look at these files to see what kind of state there is
https://github.com/JuanDiegoMontoya/FidelityFX-FSR2/blob/master/src/ffx-fsr2-api/gl/ffx_fsr2_gl.cpp
https://github.com/JuanDiegoMontoya/FidelityFX-FSR2/blob/master/src/ffx-fsr2-api/ffx_fsr2.cpp
here's a vid of me adding indirect lighting to remove the flickering and own the libs
lol the flickering is still there
prolly not
I don't observe flickering btw
what do you do for motion vectors and scale
```glsl
vec2 calculate_velocity() {
    vec4 clip_pos = i_clip_pos;
    vec4 prev_clip_pos = i_prev_clip_pos;
    clip_pos /= clip_pos.w;
    prev_clip_pos /= prev_clip_pos.w;
    return prev_clip_pos.xy - clip_pos.xy;
}
```
In FS
scale is just render resolution
I just use ffxFsr2GetJitterPhaseCount and ffxFsr2GetJitterOffset
In VS I construct a shrimple mat4 with the jitter offsets
I do this
float jitterX{};
float jitterY{};
ffxFsr2GetJitterOffset(&jitterX, &jitterY, frameIndex, ffxFsr2GetJitterPhaseCount(renderWidth, windowWidth));
const float jitterOffsetX = 2.0f * jitterX / (float)renderWidth;
const float jitterOffsetY = 2.0f * jitterY / (float)renderHeight;
const auto jitterMatrix = glm::translate(glm::mat4(1), glm::vec3(jitterOffsetX, jitterOffsetY, 0));
const auto projUnjittered = glm::perspectiveZO(cameraFovY, renderWidth / (float)renderHeight, cameraNear, cameraFar);
const auto projJittered = jitterMatrix * projUnjittered;
...
dispatchDesc.jitterOffset = {jitterX, jitterY},
I think my motion vectors are okay actually
```glsl
gl_Position = clip_pos + vec4(u_jitter * clip_pos.w, 0.0, 0.0);
```
Here's what I actually do
spooky
Indeed
I just pass this frame's and last frame's unjittered viewproj
Yeah, same
what's the deal with u_jitter in your math then
Right now I don't jitter the projection at all
u_jitter is a vec2 of the offsets fsr2 gives me
oh, this is for gl_Position
what do you pass for the jitter offset in the dispatch description?
the output of ffxFsr2GetJitterOffset?
Output of that yes
what about for u_jitter?
jitter_offset * 2.0f / glm::vec2(window.width, window.height)
Where jitter_offset is the output of ffxFsr2GetJitterOffset
oh you're using window size and I'm using render size
Ah well remember my render size and present size are one and the same
ah
I think it should be render size though?
yeah the guide says so
const float jitterX = 2.0f * jitterX / (float)renderWidth;
const float jitterY = -2.0f * jitterY / (float)renderHeight;
except we don't negate the Y because opengl
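To make the unit conversion explicit: the jitter offset is in pixels at render resolution, and NDC is 2 units wide, hence the `2.0f * jitter / renderSize`. A small sketch of just that conversion (`Vec2` and `jitterToNdc` are made-up names; `flipY` is a hypothetical switch covering the negated Y from the guide versus the non-negated one used here for OpenGL):

```cpp
#include <cassert>

struct Vec2 { float x, y; };

// Converts a sub-pixel jitter offset (in render-resolution pixels) to an NDC-space
// translation. NDC is 2 units across, so one pixel is 2 / renderSize NDC units.
Vec2 jitterToNdc(Vec2 jitterPixels, int renderWidth, int renderHeight, bool flipY) {
    assert(renderWidth > 0 && renderHeight > 0);
    const float ySign = flipY ? -1.0f : 1.0f;
    return {2.0f * jitterPixels.x / static_cast<float>(renderWidth),
            ySign * 2.0f * jitterPixels.y / static_cast<float>(renderHeight)};
}
```

The result is what goes into the translation matrix (or `u_jitter`), while the dispatch description still gets the raw pixel offsets.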
Best API
Turns out math works (huge discovery)
I'm gonna try rendering with 1x upscaling
also someone should make a PR or poke someone at AMD to update the motion vector section to include correct info
yeah, someone
joker
joker works at amd? Well that's why hes so mentally unstable
idk I wouldn't ask the first guy, maybe the 2nd
That's true, what happened to Jaker 1?
yeah, this is 1x upscale (TAA basically)
I can only see 4 blocks
H264 still the king btw
seems pretty poopy if its supposed to flicker ngl
production games use fsr?
yes some spoodermoon
tons of games use fsr2 
ok I was about to list some but ok a lot of games use fsr2
I basically don't play any modern games so I legit wouldn't know
the crysis crisis
bababooey
Spooky
this is the official shrimple
How the hell do I not see any flickering?
you aren't looking hard enough 
what's going on with the yellow balls inside
also it's harder for you to observe because you aren't doing any upscaling
lvstri built different
yeah, it's to show how it interacts with particlezzzz
do you need to recreate the fsr2 context if the display size changes?
yeah
Yes
you should already be creating all window-size-related resources anyways
Yeah I do that somewhat implicitly
destroy_the_world()
eh ill think about it later
it's very easy in opengl
I literally just do a reassignment 

Checkmate Vulkaners
FfxFsr2Message fpMessage can this be null btw
Ah btw did you remove the asserts for commandList and device
```cpp
.commandList = (void*)0x1,
```
Might look like I was on crack while coding
I have not touched those yet
seems useful to have so ill implement it anyway
I forgor about the giant magnifier that comes with the samples
don't turn the sharpening > 1
this is not sharpening, this is coarse sanding at this point
how important is it to actually use the samplers from a gltf
because atm mine don't have a lod bias since I load the scene before creating the fsr2 context, so all the textures are blurry
I'm wondering if there are gltf files that actually use a "non standard" sampler, if that makes sense
because I would like to just use one hardcoded sampler of my choice
in prop engines, it's dictated by the integration with external editors and how much the artists care (which usually comes down to style)
general purpose engines obviously have to care
and bespoke engines i would imagine almost always hardcode samplers or expose it in-editor
if you want to be able to arbitrarily take a gltf file off sketchfab
there is going to be stuff that doesn't work due to samplers, i'd imagine mostly due to the address difference though, not the sampling
is there an LOD setting with samplers?
I imagine some pixel-style models want point filtering too
yeah, you can set a lod bias
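For context on the blurry textures: as far as I recall, the FSR2 integration guide suggests a negative mip bias of `log2(renderResolution / displayResolution) - 1.0` when rendering below display resolution. A sketch of that formula (`fsr2MipBias` is a hypothetical helper name, and the formula is quoted from memory, so double check it against the docs):

```cpp
#include <cassert>
#include <cmath>

// Suggested texture LOD bias when rendering at renderWidth and presenting at displayWidth.
// Equal resolutions give -1.0; rendering at half resolution gives -2.0.
float fsr2MipBias(int renderWidth, int displayWidth) {
    assert(renderWidth > 0 && displayWidth > 0);
    const float ratio = static_cast<float>(renderWidth) / static_cast<float>(displayWidth);
    return std::log2(ratio) - 1.0f;
}
```

Since the bias depends on the current render resolution, building samplers on the fly (rather than baking them into the loaded model) makes it easy to keep it up to date.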
i see
having 100% coverage of gltf is a relatively dumb goal imo though
unless your name is godot, UE, or unity
just normie stuff like address and filter mode seem exposed by gltf
ye, I just want to support a "reasonable" number of models for testing
artists are really good at finding weird ways to use things that exist, and working around things that don't exist
I have an idea w.r.t. samplers anyways
I can just construct them on the fly
rather than baking them into the loaded model
hehe this is definitely what UVs were intended for right
yeah like generate normal map from albedo, fiddle with the generated normal map such that it doesn't contain unit vectors anymore
change output color space in the engine to some obscure one because it "looks better"

level designers are more unhinged
most of them start out modding games, a lot of the time without documentation
completely ignore requests to make models to scale cause "you can scale in-engine by eyeballing anyways"
2m tall chair
but yeah, if your goal is just to import random gltf models, you're probably gonna find edge cases that don't work because people do really weird things
samplers are probably pretty low on the graph of cost vs reward
I fixed the issue already; my question was dumm
soon anticipating artists to use ai to generate illustrations and then it turns out ai autocompleted a chunk of it to be identical to some reference they trained it on
goner froger
an absolutely wild lad
anyways, textures don't turn to blob when I lower the render res now
crazy how much better quality looks than balanced imo
I think quality is the fullres TAA via upscaling and downsampling
yeah
its not surprising given what it is
but often quality/balanced shows little actual quality change
vs balanced to performance
e.g. video encoding
or graphics settings in general tbh
Quality is 1.5x upscale
i.e., they're all upscaling, just to different degrees
balanced is 1.7x upscaling (per dimension, so about 2.9x by area) and performance is 2.0x
and ultra perf is 3x
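Treating the numbers above as per-dimension ratios, the render resolution for each mode falls out of a division. A minimal sketch (`QualityMode`, `upscaleRatio`, `Extent` and `renderResolution` are hypothetical names here; FSR2 ships its own helpers for this, which are not reproduced):

```cpp
#include <cassert>
#include <cstdint>

enum class QualityMode { Quality, Balanced, Performance, UltraPerformance };

// Per-dimension upscale ratio for each mode, using the values quoted in the chat.
float upscaleRatio(QualityMode mode) {
    switch (mode) {
        case QualityMode::Quality:          return 1.5f;
        case QualityMode::Balanced:         return 1.7f;
        case QualityMode::Performance:      return 2.0f;
        case QualityMode::UltraPerformance: return 3.0f;
    }
    return 1.0f;  // unreachable for valid enum values
}

struct Extent { std::uint32_t width, height; };

// Render resolution for a given display resolution, truncated toward zero.
Extent renderResolution(Extent display, QualityMode mode) {
    assert(display.width > 0 && display.height > 0);
    const float ratio = upscaleRatio(mode);
    return {static_cast<std::uint32_t>(static_cast<float>(display.width) / ratio),
            static_cast<std::uint32_t>(static_cast<float>(display.height) / ratio)};
}
```

At 1920x1080, Quality renders at 1280x720 and Performance at 960x540, so the pixel count drops by roughly 2.25x and 4x respectively.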
damn that's unexpected
i would not expect 1.5x and 1.7x to look perceivably different
would i be cruel for asking for a no FSR comparison?
@oak garden you could make multiple fsr2 contexts if you wanted to multithread
then you could provide different command lists and get parallel fsr2s
not sure why you'd want that though lmao
That's a bit weird yeah lmao
fsr looks barely different when I use the wrong depth convention
I wonder if it even matters
thank you resharper

@dapper gorge latest fsr2 stuff has goodies like not relying on extreme UB and FFX_FSR2_ALLOW_NULL_DEVICE_AND_COMMAND_LIST
also it should use fp16 now, if supported
does fsr2 even build on linux
nop
very cool
it probably wouldn't be too hard to take that person's efforts and make it work with the gl backend
eh but you're a vulkan guzzler, so the pr already has what you want
true
i could probably integrate it in my fork eventually
i think i messed up the build and i cant get it to generate cmake manually 
btw don't use my fork if you're targeting non-gl
I probably screwed up the cmake somehow
ah i can generate it from the cmd line outside vs
yeah
same
I think it's the shader tool generating hundreds of permutations with every thread on your pc
renderdoc doesn't seem to like something I added recently
ah, it's this
if (!subgroupSupported)
{
    return FFX_ERROR_BACKEND_API_ERROR; // GL_KHR_shader_subgroup is required
}
renderdoc does not report support for the subgroup ext (or most other exts for that matter)
yet the ext works
how do you set subgroupSupported?
er
bool subgroupSupported = false;
GLint numExtensions{};
glGetIntegerv(GL_NUM_EXTENSIONS, &numExtensions);
for (GLint i = 0; i < numExtensions; i++)
{
    const auto* extensionString = reinterpret_cast<const char*>(backendContext->glFunctionTable.glGetStringi(GL_EXTENSIONS, i));
    if (strcmp(extensionString, "GL_KHR_shader_subgroup") == 0)
    {
        GLint supportedStages{};
        backendContext->glFunctionTable.glGetIntegerv(GL_SUBGROUP_SUPPORTED_STAGES_KHR, &supportedStages);
        if (supportedStages & GL_COMPUTE_SHADER_BIT)
        {
            subgroupSupported = true;
        }
    }
    if (strcmp(extensionString, "GL_NV_gpu_shader5") == 0 || strcmp(extensionString, "GL_AMD_gpu_shader_half_float") == 0)
    {
        deviceCapabilities->fp16Supported = true;
    }
}
if (!subgroupSupported)
{
    return FFX_ERROR_BACKEND_API_ERROR; // GL_KHR_shader_subgroup is required
}
ok. I'm now on the latest and greatest amd opengl drivers, and querying it like that doesn't actually list GL_KHR_shader_subgroup even though you can use it in glsl
man, the new nsight (gpu traces in particular) is way better
no more horrendous lag when a capture is open, interface looks a lot more like RGP
there is even an analysis view that shows estimated gain by fixing various things
too bad the handy little tips don't appear on compute passes, which is where I spend about 90% of my frame
Superb
I read somewhere that gpu trace supports async compute (pretty nice), I think the normal frame capture doesnt. I guess it means this (https://developer.nvidia.com/blog/the-peak-performance-analysis-method-for-optimizing-any-gpu-workload/) is outdated now
yeah :'(, fortunately there is a blog post telling us how to migrate
do you have any links for it
Yeah, one sec
it's this @viral haven
https://developer.nvidia.com/blog/migrating-from-range-profiler-to-gpu-trace-in-nsight-graphics/
Thanks 
if its not in #graphics-resources perhaps its a good item for there
while you're posting articles I've been looking at this particularly good set of resources on morton curves
http://johnsietsma.com/2019/12/05/morton-order-introduction/
@digital lion have you already reported the bug with the AMD driver not reporting support for the subgroup ext
I am experiencing it right now 
it's not even reporting support for GL_AMD_gpu_shader_half_float
they really want gl to die
wait nvm it is reporting the half float ext
they really only try to kill gl
well there was the gl/dx11 driver overhaul somewhat recently
that was pretty nice, but a lot of the changes were undocumented
good 
I am still wrapping my head around Vulkan about one week after starting studying it, I was up and running with GL within the first few days
it does take some time
GL must remain
byeah
const auto* vendor = reinterpret_cast<const char*>(backendContext->glFunctionTable.glGetString(GL_VENDOR));
if (strstr(vendor, "ATI") || strstr(vendor, "AMD"))
{
    subgroupSupported = true;
}
Incredible
the last thing I want to add to the backend (frontend technically) is support for gl crusty z convention, but my question in #mathematics hasn't gotten any attention yet (I'm praying that criver sees it)
also praying for driver engineers to resolve my ticket for the subgroup thing
could go do some angry yelling in their office
Just update the driver yourself and release a sneaky update not even the engineers know about
subgroupSupported = strstr(vendor, "ATI") || strstr(vendor, "AMD");
that would set it to false if it was reported as supported on nv
sheesh
then simply
subgroupSupported |= strstr(vendor, "ATI") || strstr(vendor, "AMD");
isSubgroupSupported
imo that doesnt make it much cleaner
perhaps ATI doesnt make so much sense, since you probably wont be able to run anything involving Fwog on an actual ATI
and yet
so really the test against the "AMD" string is unnecessary afaik, I just have it to hedge my bets
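Pulling the pieces of this exchange together, the check plus the vendor workaround could be factored into one function that is testable without a GL context. This is only a sketch: `isSubgroupSupported` takes the extension list, the compute-stage flag and the vendor string as plain inputs, which is an assumption for illustration and not how the backend is actually structured.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// extensions: strings enumerated via glGetStringi(GL_EXTENSIONS, i)
// computeStageSupported: whether the supported-stages query includes compute
// vendor: the GL_VENDOR string
bool isSubgroupSupported(const std::vector<std::string>& extensions,
                         bool computeStageSupported,
                         const char* vendor) {
    assert(vendor != nullptr);
    bool supported = false;
    for (const auto& ext : extensions) {
        if (ext == "GL_KHR_shader_subgroup" && computeStageSupported) {
            supported = true;
        }
    }
    // Workaround: the AMD driver may not list the extension even though it works,
    // so OR in the vendor check instead of overwriting the reported result.
    supported |= std::strstr(vendor, "ATI") != nullptr || std::strstr(vendor, "AMD") != nullptr;
    return supported;
}
```

`strstr` is case-sensitive, which matters here: "NVIDIA Corporation" contains a lowercase "ati" but not "ATI", so it does not trip the workaround.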
ATI's ghost will forever haunt us
on the bright side, the new driver reports GL_EXT_nonuniform_qualifier 
@digital lion you may enjoy this information
he knows
how
i believe we talked about that like a month ago or so already
when AMD released their new gl driver
I know he knows that the driver supports the ext, but it finally reports that when you enumerate exts
instead of being secretly supported
np
monke_sounds.ogg
Speaking of monke, I almost forgot to watch Oliver eating an apple today
hwat
is the ticket you've created public, I'd like to follow it
It's not public
you can just give us your credentials to look at the ticket even if it is not public
you could also install a remote access tool like parsec or horizon for us to take a peek
you could also send us hourly screenshots of the tickets progress
it's the least you could do, really
does fsr2 blur your dithered transparency and shadows?
I don't have transparency atm, but it sorta blurs shadows
however, the shadows don't have per-frame rng, so it's noisy no matter what
so it just reconstructs the dither pattern?
I guess yeah
lol
the worst part is that it reconstructs the low-res dither pattern
so you get extra big chunky bits
debug build btw
rsm settings no longer reset whenever you change the resolution
fsr2 actually resolves some amount of noise pretty well
it breaks when the noise is super high frequency (1spp)
hmm I guess this is #wip-worthy
no, I think it's good how it is
you did clip space and then pass (width,height) / 2 to fsr2 right
I did ndc space, then resolution/2 to fsr2
right
ah, I think I know why my renderer is so slow in renderdoc
I'm doing this for roughly 5k draws
I need to
- sort mah draws
- put all the uniforms in a buffer so I don't call glBufferData a billion times
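The two to-do items above could look roughly like this on the CPU side. A sketch only: `Draw`, `sortDraws`, `packUniforms` and `countPipelineSwitches` are hypothetical, and a single float stands in for whatever per-draw uniforms the renderer really has.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct Draw {
    std::uint32_t pipelineId;
    std::uint32_t materialId;
    float uniformValue;  // stand-in for the per-draw uniform data
};

// Sort by pipeline first, then material, so state changes are batched together.
void sortDraws(std::vector<Draw>& draws) {
    std::stable_sort(draws.begin(), draws.end(), [](const Draw& a, const Draw& b) {
        if (a.pipelineId != b.pipelineId) return a.pipelineId < b.pipelineId;
        return a.materialId < b.materialId;
    });
}

// Gather every draw's uniforms into one contiguous array, meant to be uploaded
// with a single buffer update and indexed per draw in the shader.
std::vector<float> packUniforms(const std::vector<Draw>& draws) {
    std::vector<float> packed;
    packed.reserve(draws.size());
    for (const auto& d : draws) packed.push_back(d.uniformValue);
    assert(packed.size() == draws.size());
    return packed;
}

// Counts how many times the pipeline would need to be (re)bound in submission order.
int countPipelineSwitches(const std::vector<Draw>& draws) {
    int switches = 0;
    for (std::size_t i = 0; i < draws.size(); ++i) {
        if (i == 0 || draws[i].pipelineId != draws[i - 1].pipelineId) ++switches;
    }
    return switches;
}
```

Sorting four interleaved draws across two pipelines takes the bind count from 4 to 2, and the packed array replaces one buffer update per draw with one per frame.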
apparently this math for reprojection is wrong (I'm plugging motion vectors into the denoiser instead of using matrix math)
vec2 reprojectedUV = uv + textureLod(s_gMotion, uv, 0.0).xy;
what?
something else is wrong methinks
reprojected uv is kinda odd name
my motion vectors should be in uv space now
o_motion = ((v_oldPos.xy / v_oldPos.w) - (v_curPos.xy / v_curPos.w)) * 0.5;
I thought you were talking about projecting from last frame onto the current frame, but I guess not
I'm going from current frame to last
but why what do you mean
motion vectors do tell the displacement needed to arrive at the position of the current fragment on the previous frame
so you treat the current one as the origin for motion vectors
and get the difference (target - origin)
yeah I get how they work
there was just a bit of a terminology mishap
I think my motion vectors are fine actually
I don't think this is in uv
o_motion = ((v_oldPos.xy / v_oldPos.w) - (v_curPos.xy / v_curPos.w)) * 0.5;
NDC to UV
[-1; 1] -> [0; 1] : (x+1)/2
start from NDC
(prev+1)/2 - (curr+1)/2
simplify
(prev+1 - curr+1)/2
(prev - curr + 2)/2
something's odd, you have
(prev - curr)/2
did I do the maths wrong
no, I think this is another terminology moment tho
maybe
"uv" means they are suitable for math in uv space
"ndc" motion vectors are suitable for math in ndc space [-1, 1], but "uv" motion vectors are halved because the space is half as large
if you want to pass width and height to motion vector scale in FSR2 I think you should use what I derived
full screen step from corner to corner
prev = (-1, -1)
cur = (1, 1)
get the UV space motion vector
(prev - curr + 2)/2
((-1, -1) - (1, 1) + 2)/2
((1, 1) - (3, 3))/2
(-2, -2)/2
(-1, -1)
full step in UV [0; 1] achieved
I am basing it on their claim that motion vectors are done such that each tells amount of pixels displaced together with the scale factor
For example, a motion vector for a pixel in the upper-left corner of the screen with a value of <width, height> would represent a motion that traversed the full width and height of the input surfaces, originating from the bottom-right corner.
I can only interpret this as (0, 0) + (width, height) travels entire screen, meaning that you want [0; 1] UV and [width, height] remapping scale factor
here is my maf
motion_ndc = ndc_old - ndc_cur
motion_uv = uv_old - uv_cur
motion_uv = (ndc_old*.5+.5) - (ndc_cur*.5+.5)
motion_uv = 0.5 * (ndc_old - ndc_cur)
anyways, my current motion vectors work fine for fsr2
the docs are also wrong about the motion vectors, they expect "uv" motion vectors rather than "ndc" ones (assuming the motion scale passed to the API is equal to the screen resolution)
there are two issues about this
https://github.com/GPUOpen-Effects/FidelityFX-FSR2/issues/22
https://github.com/GPUOpen-Effects/FidelityFX-FSR2/issues/78
is uv you're talking about zero to one?
it's the difference between coordinates in uv space
actually doesn't matter anyways because motion vectors only denote displacement, doesn't matter which space it is because of that and that you can scale them
what's clear is that in the end your space must be [0, width]*[0; height]
yeah
late reply but displacement renders offset term obsolete because
b - a = c
make a relationship such that c is constant displacement between a and b, by expressing b as dependent on a and displacement c
b = a + c
substitute
(a + c) - a = c
now add arbitrary offset x
((a + x) + c) - (a + x) = c
(a + x) + c - (a + x) = c
cancel out terms
c = c
still holds
so (prev - curr + 2)/2 and (prev - curr)/2 are equivalent
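Written out with parentheses kept, the offset argument looks like this (a sketch; $p$ and $c$ are the previous and current NDC positions and $(w, h)$ the render resolution, symbols not used in the chat). The $+1$ from the NDC-to-UV remap cancels in the subtraction, so no $+2$ term survives and the UV-space motion vector is simply half the NDC one:

```latex
\begin{aligned}
m_{uv} &= \frac{p + 1}{2} - \frac{c + 1}{2}
        = \frac{(p + 1) - (c + 1)}{2}
        = \frac{p - c}{2} \\
m_{px} &= m_{uv} \cdot (w, h)
        = (p - c) \cdot \left(\tfrac{w}{2}, \tfrac{h}{2}\right)
\end{aligned}
```

So either store `(prev - curr) * 0.5` and pass the full resolution as the scale, or store `prev - curr` and pass half the resolution. Both give the same pixel displacement.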
Jaker did you fix FSR2 perf
No
My investigation yielded that I'm severely VRAM bottlenecked in gl for some reason
There was a comment in the Vulkan backend indicating that fp16 should be disabled for a certain pass on Nvidia due to high VRAM use
I already did that though
Time to boop your colleagues
I wonder if maybe it's the resources I pass to fsr2 being too thicc
Lol I'm not gonna have them waste time looking at this
Hmm cache hit rates are fine for me in the slow passes
hmm that's not the worst
but also not the best lol
do you have the latest version of the backend?
Subchannel switches
I donut know
(they imply the workload is switching between graphics/compute/copy)
I think there's some kind of bug within my app tbh
I just don't know what it is
There are more subchannel switches in completely random positions
Are you using persistently mapped buffers
i would also like to know
I have them in the fsr backend and I wonder if they're causing issues
None
i believe nv just randomly shits in some commands into your command stream
i saw these randomly in many captures
Interesting
why does it do that
Keeps ya on yer toes
With the advanced trace dwm.exe appears for some reason
Right in the middle of fsr lol
I also need to check the shaders again to see if the extensions are actually being used
Especially subgroup
(Ignore the many subchannel switches in shadow map rendering, those are my fault)
re dwm, schwapping vram resources perchance?
you never know with gl
you hung the bar too high
These also might simply be regular wfis due to memory barriers
I think it's more likely that nsight is just being dumb though
Can I get a caption on this one
me giving virtual headpat to Jaker for his opengl heroisms
that much better
Same, your hard work is appreciated Jaker 🫡
jaker the wallpaper man, getting fsr2 to work in greenland
yes it is 🫡

compared to how much without fsr2?
that's how long fsr2 takes, sorry I wasn't clear
no i am just an idiot
here is what I'm comparing it against
https://github.com/GPUOpen-Effects/FidelityFX-FSR2#performance
what you said earlier makes absolute sense
ah
can i just run it after checking out the fsr2-test branch?
oki
i could give it a try here if you want
lemme write the fat commit msg
lunix + gtx1060
it won't run on loonix
oh, right, you also mentioned that yestergestern
I'm also using windows.h-isms to detect renderdoc
what makes it windows exclusive
#include <windows.h>
- shader tool being an exe
- some code relying on windows.h and msvc C extensions
- cmake being hardcoded to reject non-windows builds
all of which are fixable, and in fact someone has made a PR on the fsr2 github to make it build on linux
I'm still open to someone porting the stuff to make it work on linux to my branch
but that one needs massaging
bro why would they do all of the above
i suppose because people more likely live on windows
🤷
idk, especially the shader tool one is baffling
ja
yeah the shader tool i dont understand
someone needs to get the stick out
the entire shader compiler folder is like 30 Mb
the equivalent python script is like 300 loc
their shader compiler, glslangvalidator, dxcompiler.dll and dxir.dll
well you kinda have to ship dxc/fxc binaries
kinda feels like hastily hacked together somehow
probably some choices made in the beginning that ended up in the final thing since they didn't affect the target
yeah
which is fine
its more like a POC anyway i guess, and AAA studios have the wo:manpower/$$$ to implement it properly
i do have to say i was pleasantly surprised by how shrimple it is to integrate
@golden schooner try pulling the latest from the fsr2-test branch and running 03_gltf_viewer
02_deferred is broken because I haven't added motion vectors to it yet, and rsm now requires them
examples are not built by default btw
oh
you need to enable a thingy
is that new?
it's relatively new (I changed it probably a month or two ago)
around the time I was adding docs I think
don't worry, I've already confused myself with that change 
oki what do i enable where?
ah
found it
option(FWOG_BUILD_EXAMPLES "Build the example projects for Fwog." FALSE)
ye, i dont use cmake-gui
I don't even use cmake-gui anymore except to build other projects
just VS's integration
i just let cmake addon in vscode do all the things
there is probably a nice button for you to check too
and cashually switch a FALSE to TRUE in one's cmakelist
nah, i have to edit the .txts
but building/selecting targets works
its all good
[cmake] CMake Error at build/_deps/fsr2-src/CMakeLists.txt:69 (message):
[cmake] Cannot find MSVC toolset version 142 or greater. Please make sure Visual
[cmake] Studio 2019 or newer installed
I dont understand why americans put 2 spaces after . irritates the shit out of me
fsr2 isn't very friendly to devs who don't conform
hehe
only old people do that here
dragonslayer does that too
he's old
imo it isn't as bad as when people put a space before punctuation (ahem gob)
hes en baguette beljeeque, they do be plenking, but ye
when I'm done with this branch, I'll probably turn this into its own example that is quarantined behind a flag
or maybe its own repo
I was thinkin about it
i think thats a french thing
i dont do that
as if I need more evidence that frenchies are crazed
well you didnt ask for it, but zer you go
pengu you are not from the bagette part of beljeek, you are from the cheese corner of beljeekistan
hehe
how do i say that
hmm
he's kinda used to the comfy "mom provides everything for me"
l'estench d'echeese
ah
bit stuck up
kind of a pain in the backdoor
but i move out in a month so ill tolerate it
eggcellent
for a bit longer
my new place is much nicer
private kitchen and bathroom is a huge plus
+1
btw my issue was that I wasn't applying the jitter to my uv before reprojecting
it was working before because I was accidentally passing a jittered invViewProj without realizing 
because it affects the location of the reprojected uv
the subpixel location matters since I filter
the only thing that confuses me is that the frame I'm reprojecting onto was jittered in a different way, so I feel like that should matter too
wait I get it, the jitter makes up for the displacement in the target resolution
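a minimal sketch of the uv fix described above, in plain C++ instead of shader code. it assumes the jitter is given in pixels as the offset of the actual sample position from the pixel center; `Vec2`, `pixelCenterUv` and `jitteredSampleUv` are made-up names, and the sign of the jitter depends on how it gets baked into the projection matrix, so treat it as an illustration only:

```cpp
#include <cmath>

struct Vec2 { float x, y; };

// UV of the pixel center, i.e. (gid + 0.5) / dim.
inline Vec2 pixelCenterUv(int gx, int gy, Vec2 dim) {
    return { (gx + 0.5f) / dim.x, (gy + 0.5f) / dim.y };
}

// UV of the point that was actually shaded this frame.
// ASSUMPTION: jitterPx is the sample offset from the pixel center, in pixels.
// Unprojecting this UV with the unjittered invViewProj should land on the
// same point as unprojecting the pixel center with a jittered invViewProj.
inline Vec2 jitteredSampleUv(int gx, int gy, Vec2 dim, Vec2 jitterPx) {
    return { (gx + 0.5f + jitterPx.x) / dim.x,
             (gy + 0.5f + jitterPx.y) / dim.y };
}
```

with a 4x4 image and a jitter of (0.25, -0.25), pixel (1, 2) moves from uv (0.375, 0.625) to (0.4375, 0.5625), which is a subpixel shift that matters once the reprojected lookup is filtered.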
say you have 1/4 render resolution, then full resolution would have 4x4 tile + 4x4 jitter offset as the pixel
or more than just the 4x4 grid if you account for TAA
this rsm denoising stuff is all happening at the render resolution too though 
hmm, I think I see what you mean
but I'm also pretty sure it's not exclusive to upscaling, but rather any TAA with per-frame jittering
TAA at least makes sense in motion, with upscaling I still have no idea what happens when you just move around and low resolution geometry changes shape
I think a big part of it is that internal upscaled depth and color are maintained between frames (in the TAAU impl)
so you can compare the low-res input with the reconstructed image from last frame and do some magic heuristics to reconstruct the current frame
I don't think upscaling depth helps anything in motion
motion invalidates any reconstruction
just rotating camera is not motion, what I'm talking about is real moving and deforming geometry, especially the edges
and even worse stuff like moving blades of grass
or hair
it's still possible to get high quality motion vectors from things like that
I think getting good motion vectors from literally everything is one of the painful parts of TAA(U) integration in a real engine
this isn't about motion vectors anymore though I just segued into something different at this point
at least FSR 2 lets you provide masks for things like transparent stuff and scrolling textures
where're we at now
ltt's sponsor segment
my point is that if all you have is 1 pixel per tile when geometry moves you gotta drop it to pure blurry low res quality
upscaling only makes sense for somewhat still geometry to me
have you read this
https://github.com/GPUOpen-Effects/FidelityFX-FSR2#the-technique
yeah it doesn't have any insights
just outsights
what does "when geometry moves" mean exactly?
are you referring to e.g., objects having different transforms between frames
imagine a bending leaf which is also over some distant background
there is a hard edge between those which you need to reconstruct with just 1 pixel per some tile
this leaf moves such that you only get 1 pixel per tile each frame, all different tiles
what do you reconstruct?
does that sound like it's possible to get meaningful info?
at least with just TAA you have a proper 1:1 geometry outline every frame and extra info is subpixel
what is a tile
pixel in input resolution?
render resolution is 1 pixel in a tile of the target resolution
each frame you render 1 pixel in a tile as a consequence
from that you'd reconstruct the full image
given that it's all static of course, it'd converge to ground truth
but in motion naah
how does it work when there are 1.5 pixels per tile
simple, you don't actually treat pixels as discrete with reconstruction
shrimple as that
it's just easier to think of it as tile
in reality it's continuous
you can splat samples onto the target image and eventually it converges to the full resolution
that's what wavelet upscalers do iirc
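a toy 1D sketch of the "splat samples onto the target image" idea for a fully static scene. it assumes an integer upscale factor and jitter given in render-resolution pixels; `targetPixelForSample` and `coveredTargetPixels` are made-up names, and real upscalers use filtered, weighted accumulation rather than this nearest-pixel splat:

```cpp
#include <cmath>
#include <vector>

// Target-resolution pixel that a jittered low-res sample lands in.
// ASSUMPTION: jitterPx is the sample offset from the pixel center,
// in render-resolution pixels, within [-0.5, 0.5).
inline int targetPixelForSample(int renderX, float jitterPx, int scale) {
    float scenePos = (renderX + 0.5f + jitterPx) * scale; // in target pixels
    return static_cast<int>(std::floor(scenePos));
}

// Counts how many target pixels received at least one sample after
// rendering one frame per jitter value (static scene, no motion).
inline int coveredTargetPixels(int renderWidth, int scale,
                               const std::vector<float>& jitters) {
    std::vector<bool> covered(static_cast<size_t>(renderWidth) * scale, false);
    for (float j : jitters) {
        for (int x = 0; x < renderWidth; ++x) {
            int t = targetPixelForSample(x, j, scale);
            if (t >= 0 && t < static_cast<int>(covered.size())) covered[t] = true;
        }
    }
    int count = 0;
    for (bool c : covered) count += c ? 1 : 0;
    return count;
}
```

with a 4x scale, a single unjittered frame fills one target pixel per tile, while four frames with jitters -0.375, -0.125, 0.125, 0.375 fill all four. that is the static case converging; once geometry moves between frames, the older samples no longer describe the same surface.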
the problem is still a problem, when stuff moves it's hard to make sense of it
when the scene is static and you only rotate the view you still get pretty small error because motion vectors compensate for the change
and only rotating the camera is not deforming anything
what is deforming
my enthusiasm to explain it all over again
imagine a cubemap
you put it on the skybox
rotating a camera does not alter anything
but now if clouds move that's deforming geometry
does that make it clearer
I see
what I don't get is how that's fundamentally different from transforming the camera
because maths
you can still get disocclusions and such when the camera moves
moves yes rotates no
that was the missing part for me
moving camera is really same as moving object relative to it
mathematically speaking
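a tiny 2D sketch of why "moves yes, rotates no": two points on the same view ray but at different depths stay on top of each other under a pure camera rotation, while a camera translation separates them (parallax, which is where disocclusions come from). the pinhole model and the names `Point`, `project`, `rotateCamera` and `translateCamera` are assumptions made for illustration:

```cpp
#include <cmath>

struct Point { float x, z; }; // 2D camera space: x to the right, z forward

// Pinhole projection onto the image line at z = 1.
inline float project(Point p) { return p.x / p.z; }

// Camera-space position of a point after rotating the camera
// (a linear map, so points on one ray through the origin stay on one ray).
inline Point rotateCamera(Point p, float angle) {
    float c = std::cos(angle), s = std::sin(angle);
    return { c * p.x - s * p.z, s * p.x + c * p.z };
}

// Camera-space position of a point after moving the camera by dx along x.
inline Point translateCamera(Point p, float dx) {
    return { p.x - dx, p.z };
}
```

for a near point (0.5, 1) and a far point (1, 2), both project to 0.5. after a rotation they still share one screen position, so reprojection does not depend on depth. after moving the camera 0.2 to the right they project to 0.3 and 0.4, so whatever was hidden behind the near point becomes visible.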
I still don't get how info is irrecoverable except when disocclusions occur
unless that was the point you were making
because we also have changing shading especially if motion is fast
specular is nasty adds lots of high frequency details
add a bump map it's a nightmare
so you either have bad ghosting or bad quality from dropping history
I guess you could somewhat "compensate" by not using the negative sampler lod offset lmao
so now everything looks awful, but at least it's not blocky under movement
ok I need to sleep now
gn
mfw when opengl
it's like the driver delays the dma until the perfect moment to troll you
the driver senses profiling so it randomises behavior
yeah btw I can't even reproduce the fucked up graph
oh wait
now, with glBufferSubData, I get the epic dma at the beginning of the pass
guess I won't worry about it then
the one with longer frametime and empty space is persistently mapped?
yeah
but it's probably random chance that caused it
now I need to figure out why I'm being horribly bottlenecked by vram throughput
even the vk one gets screwed by random dmas in the middle of the pass, but at least it's not uber vram bottlenecked (instead, it's L1 + SM throughput)
btw @digital lion, the subgroup bug should be fixed in the next driver release
amazing
the difference between the gl shaders and the vk shaders is almost nil, so I guess it's either an epic GL moment, NV moment (since it doesn't happen on AMD), or just the inputs I'm providing being too thicc
meh, reduced my color input & output to R11_G11_B10F (from RGBA16F) and almost 0 difference in perf
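for reference, a back-of-the-envelope sketch of what that format change saves. RGBA16F is 8 bytes per texel and R11_G11_B10F is 4, so the raw footprint of each color image halves; the `Format` enum and helper names are made up, and the 1920x1080 size is only an example, not the resolution used in the thread:

```cpp
#include <cstdint>

enum class Format { RGBA16F, R11G11B10F };

// Bytes per texel: 4 x 16-bit floats vs. 11 + 11 + 10 bits packed in 32.
constexpr std::uint64_t bytesPerTexel(Format f) {
    return f == Format::RGBA16F ? 8u : 4u;
}

// Raw size of a single mip level, ignoring padding and compression.
constexpr std::uint64_t imageBytes(std::uint32_t width, std::uint32_t height, Format f) {
    return static_cast<std::uint64_t>(width) * height * bytesPerTexel(f);
}
```

at 1920x1080 that is 16588800 bytes vs. 8294400 bytes per image. if halving it changes almost nothing, the color input/output size is probably not what is eating the vram throughput.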
I blame nv for making bad hw
I have been lurkin here for the past few days and have still been confused if this is still fwog thread
this is the "& co." part of it
Oooh!
there are some fwog example updates interspersed throughout
Fwog now has a FSR2 sample that has random performance on NV 
How does NV hardware take 300 microseconds to switch to DMA mode smh
fsr2 performs pretty consistently poorly on this (3-4x worse than the equivalent AMD card)
it's doing a copy and waiting for it I think
(seriously though, my problem is also high vram and occupancy in the accumulate pass on NV)
Except when I toggle fp16, there is no epic perf difference
should I
documenting my fsr2 journey, starting somewhere around here: #1019779751600205955 message
it gon be lit
hype
Epic
type
finally it pays off to pester you to write
can confirm that this works btw
you can target multiple envs with glslang like so: --target-env opengl --target-env spirv1.3
you need to run them separately then neh?
no, you can combine multiple envs at once
I literally invoke the shader compiler with these args:
-compiler=glslang -e main --target-env opengl --target-env spirv1.3 --amb --stb comp 8 --ssb comp 8 --sib comp 0 --suavb comp 0 -Os -S comp -DFFX_GLSL=1
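a simplified model of what those binding flags are meant to achieve: sampled resources (textures/samplers) get auto-mapped starting at binding 8, storage images starting at 0, so the images stay inside a small image-unit limit. this is NOT glslang's actual resolver, just a sketch of the intended outcome; `Kind` and `assignBindings` are made-up names:

```cpp
#include <vector>

enum class Kind { Sampled, StorageImage };

// Hands out bindings in declaration order, with one running counter per
// resource class, each starting at its own base (the "shift" value).
inline std::vector<int> assignBindings(const std::vector<Kind>& resources,
                                       int sampledBase, int imageBase) {
    int nextSampled = sampledBase;
    int nextImage = imageBase;
    std::vector<int> bindings;
    bindings.reserve(resources.size());
    for (Kind k : resources) {
        bindings.push_back(k == Kind::Sampled ? nextSampled++ : nextImage++);
    }
    return bindings;
}
```

with bases 8 and 0, a shader declaring {sampled, image, sampled, image} ends up with bindings {8, 0, 9, 1} in this model.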
cant arget
I'm accepting ideas for memes (funny or not) to put in this
I'm thinking about putting the stick-in-spokes meme for when I dared to try using glUniform to set image bindings in a spirv shader
I already have 1000 words wtf
why can't I pump out actually useful stuff this quickly
I'm just mildly frustrated that I haven't worked on the other article that much lately
ill just leave that here https://developer.download.nvidia.com/opengl/tutorials/bindless_graphics.pdf
I recently noticed that my fix for my temporal reprojection (incorporating jitter) adds a small bias which makes it look like the pixels are moving up and right
it's weird because the jitter is distributed around 0, so no single direction should be favored over time
vec2 uv = (vec2(gid) + 0.5) / uniforms.targetDim;
it also only happens when I add the jitter
I think I'm not accounting for last frame's jitter but that sounds like a problem for tomorrow me
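a hedged sketch of what accounting for both jitters could look like. the cause is only suspected in the chat, so this is not a confirmed fix. it assumes both jitters are sample offsets from the pixel center in pixels, and that `motionUv` is the uv delta from the current to the previous frame for unjittered positions; `previousFrameUv` is a made-up name:

```cpp
#include <cmath>

struct Vec2 { float x, y; };

// Where to look in the previous (also jittered) frame for the surface point
// that pixel (gx, gy) shaded in the current frame.
// ASSUMPTION: jitter = sample offset from the pixel center, in pixels.
inline Vec2 previousFrameUv(int gx, int gy, Vec2 dim,
                            Vec2 jitterCurPx, Vec2 jitterPrevPx, Vec2 motionUv) {
    // Scene position shaded now: pixel center + current jitter.
    // The previous image is shifted by its own jitter, so subtract that.
    float px = gx + 0.5f + jitterCurPx.x - jitterPrevPx.x;
    float py = gy + 0.5f + jitterCurPx.y - jitterPrevPx.y;
    return { px / dim.x + motionUv.x, py / dim.y + motionUv.y };
}
```

for a static scene with equal jitters this collapses to the plain pixel center. with different jitters the lookup moves by (jitterCur - jitterPrev) / dim, and that term is missing if only the current frame's jitter is applied.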
truly mental
if I were to write opengl-like softraster app today I would name shader interpreter source file jaker.cpp
to make it accurate you need to leave any bugs you find in it
yep, a minute or so, thats fine
yeah
ill try "large canyon with rocks scattered around the place. a dense fog covers the bottom"
its not as "accurate" as you would think, or as what you may have played with on the stable diffusion/openai discord
oi nice
can you share the image nonetheless?
sure
perhaps DM? then we dont pollute jaker's living room
are these HDR
no, they are jpegs
agx (copied from jasper) vs aces
the toe on aces deletes so many details
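to put a number on the toe: a sketch using Krzysztof Narkowicz's commonly used curve fit for ACES. the chat does not say which ACES implementation the screenshots used, so the choice of this fit is an assumption; `acesFit` is a made-up name and the input is linear scene color before display encoding:

```cpp
#include <algorithm>

// Narkowicz's ACES filmic curve fit, clamped to [0, 1].
// ASSUMPTION: this particular fit stands in for "aces" in the comparison.
inline float acesFit(float x) {
    const float a = 2.51f, b = 0.03f, c = 2.43f, d = 0.59f, e = 0.14f;
    float y = (x * (a * x + b)) / (x * (c * x + d) + e);
    return std::min(std::max(y, 0.0f), 1.0f);
}
```

an input of 0.01 comes out at roughly 0.0038, so dark values are pushed well below where a linear mapping would put them, which is how shadow detail gets crushed.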
anyways, I guess tony mcmapface looks similar to the left pic
we need to see the path to white
are your monitors calibrated?
also slight shift to purple in blues in agx
mine is not calibrated because it burns in srgb mode
but the difference is in luma calibration only
so it's either no srgb or buying a new monitor for me
my brother had his monitors at 250 nits and saturation cranked up
it was like staring into those pits of glowing green goo in half life
what's with the colorimeter thing?
are you colorimetring the colors of your surroundings as a hobby?
I used it to calibrate my monitors lol
and everyone else's monitors that I come across
but there's like
dedicated sRGB mode in each monitor's internal config
just choosing it should be enough
like built-in hardware
sRGB as opposed to HDR of some sort?
I have no clue how hdr is supposed to work
most monitors do not come from the factory well-calibrated anyways
plus you need to account for suboptimal viewing conditions
I know there are color spaces with wider gamut
in that case hardware should come with settings for those as well
what is a suboptimal viewing condition
ambient light > 0
displays are black and absorb most ambient radiation though



