#Iris - A Journey through OpenGL and beyond to learn Graphics
yeah
like the params you supply to glm::ortho
couldn't figure it out so I'm gonna keep my unhinged inverse thing
I think you should project the basis from ndc virtual space into world space -> x = offset * unprojected_basis -> translate by x
you also need to translate by z in order for the camera to slide along the same plane no?
not in view space
right but you translate the view matrix that does not mean translating in the view space no?
or am I dumb
idk
o
thats why you need to unproject from the viewspace of the vsm first
or multiply by the vsm view matrix to get it into vsm space translate there and then go back to world space
these should be identical
or do glm::inverse(stableProjections[i]) * shiftedProjection * stableViewMatrix; 
I have no clue what that does lmao
tru
I don't like how translating a view matrix does not translate in view space
someone should fix that
you can do that by doing projection * view_space_translation * view_space * world_space_translation
bug fix for math.exe
this translates in view space?
yes because you first project into view space and then translate
which does translation in view space
I had the order wrong my bad
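A small sketch of the ordering point above, using plain arrays instead of glm so it stands alone. The column-vector convention (`p' = M * p`) matches glm; the helper names and the 90 degree rotation standing in for a view matrix are assumptions made only for illustration. With this convention `translation * view` moves things in view space, while `view * translation` (which is what `glm::translate(view, v)` computes) moves them in world space.

```cpp
#include <array>
#include <cstddef>

using Vec4 = std::array<float, 4>;
using Mat4 = std::array<std::array<float, 4>, 4>; // m[row][col], column vectors: p' = M * p

Mat4 identity() {
    Mat4 m{};
    for (std::size_t i = 0; i < 4; ++i) m[i][i] = 1.0f;
    return m;
}

Mat4 translation(float x, float y, float z) {
    Mat4 m = identity();
    m[0][3] = x; m[1][3] = y; m[2][3] = z;
    return m;
}

// Hypothetical stand-in for a view matrix: rotate 90 degrees about Z (+X maps onto +Y).
Mat4 rotation_z_90() {
    Mat4 m = identity();
    m[0][0] = 0.0f; m[0][1] = -1.0f;
    m[1][0] = 1.0f; m[1][1] = 0.0f;
    return m;
}

Mat4 mul(const Mat4& a, const Mat4& b) {
    Mat4 r{};
    for (std::size_t i = 0; i < 4; ++i)
        for (std::size_t j = 0; j < 4; ++j)
            for (std::size_t k = 0; k < 4; ++k)
                r[i][j] += a[i][k] * b[k][j];
    return r;
}

Vec4 transform(const Mat4& m, const Vec4& p) {
    Vec4 r{};
    for (std::size_t i = 0; i < 4; ++i)
        for (std::size_t k = 0; k < 4; ++k)
            r[i] += m[i][k] * p[k];
    return r;
}
```

Transforming the world origin with `mul(view, t)` lands on (0, 1, 0), because the world-space step along X is rotated by the view; with `mul(t, view)` it lands on (1, 0, 0), a step along the view-space X axis.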
Are glm::ortho's parameters in ndc units or world units?
world
Hmmmm
I have come to the realization that with HWVSM none of the techniques you guys use for caching work, so I'll have to invent my own
either that, or I go back to SWVSM
you should do vulkan 1.2 compatible SWVSM :^^^)
that can run on a gtx 760
at 60fps on bistro
how come?
In any possible case, I can't simply set gl_Position and write
I need to shift the ndc position by some factor and have it wrap around to raster to the correct page
i took the liberty to acquire a potential new vsm customer in #wip
Hpb building should still work though right?
Like if your internal marker says the page is cached, don't include it in hpb
Which means hpb culling will delete it 
ye that's fine
the big issue is stable addressing
Oh yeah
I've had some success with fractional shifting of render matrices and caching
And then for addressing I use the untranslated version
yeah, my only hope right now is to do some unholy math
much more unholy than what Jaker cooked up 
abusing div by zero incoming
ok simplest case
truncate camera's position
time to draw
suppose camera position = stable origin = 0
then view = stable_view
now what happens when I move one unit to the right
only god knows (jk I'm computing)
: )
yep, you're kinda fucked
btw Light Space Perspective Shadow Maps used to do some unholy trick with Z/W divide to have shit wraparound with weird perspective
but here you need to wraparound by 2 axes
in HW the only way to do that is to set up 4 separate viewports and multi-cast/cull your meshlets into them
ngl, SW raster (at least SW Raster Output Processing) is sounding nicer and nicer by the day
lvstri for sw raster
didn't you say yesterday nanite only needs to call render shadows once?
wow that's extremely powerful
maybe one day, while sitting on the loo, it will hit you : ) but not today hehe
You literally don't need that here because you address pages yourself
Even with hw sparse actually
what do you mean
Idk just felt like saying stuff
Actually I'm probably wrong about hw sparse there but I can't think bc I just woke up
giving me false hope smh
no hardware :(
I thought I'd take a bit of a detour and do ImGUI
so that I can have sexy debug visuals like Jaker and Saky
however there is a small issue, to render an image in ImGUI I need to give it a descriptor set with an image bound
...How do I make this descriptor set, I don't have access to the internal layout ImGUI creates
either guess or read the source
I need the handle itself tho
The VkDescriptorSetLayout inside of ImGUI
how the hell do I do this 
I don't wanna create a new backend
@delicate rain how does daxa handle it
This is the universe calling me back to GL innit
I believe we have our own shaders into which we pass the daxa::ImageIds
and samplers
so you have your own backend, epic
Ok since I really don't want to spend the night writing a backend for shitty ImGUI
Yeah
I'll do something cursed
hehe the purple fits
I mean obviously there must be a way to use the premade backend
Just find someone's code that uses it
you make a new backend
thats how daxa handles it
the imgui backends are not meant to be used really they are mostly examples
you should make your own
i love how the devil emoji is daxa colored
tf
it's not impossible to use them though 
it should also be fairly simple to make an imgui backend
its literally just shoving vertices and indices into a vbo/ibo, loading 2 shrimple shaders, and textures : )
they are fine
but usually they start to be annoying at some point
for daxa it took me a few hours (i had to debug a while cause the stupit in my head)
Thank god vulkan is well designed
lol
as it turns out you don't need to create a descriptor set out of the exact handle, you just need the layouts to be compatible
that is something i noticed during my early days with bulkan
the validation layer didnt always slap me in the face, only occasionally
while i was expecting it would at all times i fook something up
like having uniforms and shaders not be compatible
i think i forgor a part in the shader, while it was described in all the descriptorisms or other way around, it didnt yell at me
I was recently creating graphics pipeline with zero color and depth attachments and it didn't say anything
and still wrote to the first bound output from frag shader
ah i think i had something like that too
also does not complain when you fuck up the usage flags - for example when you don't set texture as sampled it will say nothing but the sampler reads will just return nothing
very fun to debug haha
speaking of vkguide, i should also continue : (
yes!
memory transfer nonsense : )
That should be fine though
I do that for VSM because I just need image store
yeah but I was binding two color attachments during begin rendering
which should not be compatible with the pipeline no?
Yeah that seems wrong
pog
This is my first time setting up ImGui on Vulkan lol
how do I get rid of that shitty border around my window
not the viewport, the main window
this border
Certain imgui calls require registering a license
All of my imguiisms are in Gui.cpp
was about to say, perhaps steal from frogfood, because of c# isms and whatnot
Including the pirated style
and dont forget that one little thing
style.WindowMenuButtonPosition = ImGuiDir.None;
to get rid of that dropdown thingy next to the window "tab"
I don't get the point of that button tbh
I had to use trial and error to find the correct offset 
probably different per font too somehow
It is
it's the ascent of the font probably
sick style
but the fix was using those two little shits
```cpp
ImGui::PushStyleVar(ImGuiStyleVar_WindowBorderSize, 0.0f);
ImGui::PushStyleVar(ImGuiStyleVar_WindowPadding, ImVec2());
```
now border is no more
ye those were the ones i mean
what is this lvstri, irisvk 2.0?
irisvk except it's not a VSM only showcase 
irisvuk
irisfukall
and there's no sRGB issues at all!
amazing
epic
there's a severe lack of shadows
but I'm too deep into the refactoring now
trying to decipher the magic in your magic numbers... it sure is magic
It's just a pixel offset
There's no easy way to scroll them, you'd have to unbind them all and bind them scrolled
continue reading the convo
This?
yep
```glsl
// they are the same image
layout (rgba32f, set = 0, binding = 0) uniform writeonly image2D u_history_storage;
layout (set = 0, binding = 1) uniform sampler2D u_history_sampled;
void main() {
    vec4 x = textureLod(u_history_sampled, gl_GlobalInvocationID / vec2(textureSize(u_history_sampled, 0)), 0);
    x = do_something(x);
    imageStore(u_history_storage, gl_GlobalInvocationID.xy, x);
}
```
is this legal
If u never read after a write to that texel, I think it's fine
But I do not think stores are visible to samplers without a pipeline barrier
Ye I don't want to read again after I write
in GENERAL layout, yes
there's no guarantee, but they might become visible whenever
this is UB territory
there's no way to mark the sampler2D as coherent so any writes to the Storage image have no guarantee of showing up
furthermore, you "might" accidentally tap other pixels if you have the wrong sampler set (bilinear interpolation, etc)
why aren't you just using the image itself ?
like imageLoad and then imageStore to the same location ?
btw writeonly images don't need the format in the layout
its literally "barely legal", like maaaybe maaaybe if you do all the following it will work:
- no interpolation, literal NEAREST sampler
- only draw a single pixel with the pipeline at any location, and have an image barrier before and after
- image layout is GENERAL
etc.
in Vulkan they both need it, unless you have the WithoutFormat feature enabled
and read images too if you have ext image load formatted
oof
OpenGL best API
but true, 99% of devices support write without format
so Nabla requires that feature
but still in Vulkan you need to add some stuff to VkStruct when making the image view IIRC
its not magically enabled on everything
IMHO this fuckery is super "Not Worth It" (TM)
I wanted to use linear filtering when reading
I could do it myself but eh
ye this is very ub
Ye I can't lol
you write the texels with the invocations
I'll just use two images
and then you want to read at an offset of 0.5
It's ub if any other thread writes to the sampled footprint
btw, your original image would read texel values 0.5 away from pixel center, and store them to pixel center
this basically computes a quasi 1/2 downsample
this textureLod(u_history_sampled, gl_GlobalInvocationID / vec2(textureSize(u_history_sampled, 0)), 0); will give you 0.5 pixel less in each axis
so when globalinvID is 0,0 you end up tapping pixels {0,0} {-1,0}, {0,-1}, {-1,-1}
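The half-texel point can be checked on paper with a 1D sketch. This assumes the usual convention that texel centers sit at integer + 0.5 in unnormalized texel space; the function names are made up for the example.

```cpp
#include <cmath>

struct Footprint {
    int first_texel;        // lower of the two texels a bilinear tap blends
    float weight_of_second; // blend weight of first_texel + 1
};

// 1D bilinear footprint for normalized coordinate u on a texture `size` texels wide.
Footprint bilinear_footprint(float u, int size) {
    const float texel_space = u * static_cast<float>(size) - 0.5f;
    const float base = std::floor(texel_space);
    return {static_cast<int>(base), texel_space - base};
}

// What the quick demo shader computes: the invocation index divided by the size.
float uv_without_offset(int invocation, int size) {
    return static_cast<float>(invocation) / static_cast<float>(size);
}

// Sampling at the texel center instead.
float uv_at_texel_center(int invocation, int size) {
    return (static_cast<float>(invocation) + 0.5f) / static_cast<float>(size);
}
```

For invocation 0 the un-offset coordinate blends texels -1 and 0 with equal weight, while the centered coordinate hits texel 0 with no contribution from its neighbor.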
yeah that was just quick way to demonstrate my issue
I do this now
```glsl
#version 460
layout (local_size_x = 16, local_size_y = 16) in;
layout (set = 0, binding = 0) uniform sampler2D u_velocity;
layout (set = 0, binding = 1) uniform sampler2D u_color;
layout (set = 0, binding = 2) uniform sampler2D u_history_sampled;
layout (rgba32f, set = 0, binding = 3) uniform image2D u_final_color;
void main() {
    const vec2 resolution = vec2(textureSize(u_color, 0));
    // imageStore takes signed coordinates, so convert once up front
    const ivec2 pixel = ivec2(gl_GlobalInvocationID.xy);
    if (any(greaterThanEqual(pixel, ivec2(resolution)))) {
        return;
    }
    // sample at the texel center
    const vec2 uv = (vec2(pixel) + 0.5) / resolution;
    const vec2 velocity = textureLod(u_velocity, uv, 0).rg;
    const vec2 prev_uv = uv - velocity;
    const vec4 current_color = textureLod(u_color, uv, 0);
    const vec4 previous_color = textureLod(u_history_sampled, prev_uv, 0);
    imageStore(u_final_color, pixel, mix(current_color, previous_color, 0.9));
}
```
ah you're doing TAA
all images are different so no UB
btw use texelFetch
or Jaker's FSR2 on GL
or ask @hallow umbra for help, as he's poured ungodly amounts of time into TAA
maybe got some code to throw at you, esp that you're using visbuffer
real
FSR4 when
FSR 4 free on gpuopen.com
effort spent making good TAA is better spent tweaking your masks and stuff to give FSR/streamline the highest quality inputs you can
@hallow umbra what happened to your sparkly TAA world btw? its been a while since you posted pics of progress : )
nothing, i just did everything interesting that i could think of
nop, i'm just out of ideas for anything graphics related
i don't feel motivated making things that someone else already did and better
and i investigated all techniques that i thought are underlooked
you could give virtual shadow maps a try ;P
assist lvstri/saky/jaker unlocking its secrets
volumetric frog
uhh
I think NV knows
In my frag shader, I output a zero motion vector for now
And DLAA starts up in "NO_MV_MODE"
however as soon as I put some other value in it, NGX immediately switches to "LOWRES_MV_MODE"
how the fuck does it know
I actually don't even need to put any other value in it, if I do some operation that results in 0, it also switches
???
driver detects access to api, and flips a switch perhaps
Ok so NV calculates motion vectors like this in their sample
```hlsl
void main(
    in float4 i_position : SV_Position,
    in float2 i_uv : UV,
    out float4 o_color : SV_Target0
)
{
    o_color = 0;
#if USE_STENCIL
    uint stencil = t_GBufferStencil[i_position.xy].y;
    if ((stencil & g_TemporalAA.stencilMask) == g_TemporalAA.stencilMask)
        discard;
#endif
    float depth = t_GBufferDepth[i_position.xy].x;
    float4 clipPos;
    clipPos.x = i_uv.x * 2 - 1;
    clipPos.y = 1 - i_uv.y * 2;
    clipPos.z = depth;
    clipPos.w = 1;
    float4 prevClipPos = mul(clipPos, g_TemporalAA.reprojectionMatrix);
    if (prevClipPos.w <= 0)
        return;
    prevClipPos.xyz /= prevClipPos.w;
    float2 prevUV;
    prevUV.x = 0.5 + prevClipPos.x * 0.5;
    prevUV.y = 0.5 - prevClipPos.y * 0.5;
    float2 prevWindowPos = prevUV * g_TemporalAA.previousViewSize + g_TemporalAA.previousViewOrigin;
    o_color.xy = prevWindowPos.xy - i_position.xy;
}
```
And I gotta say, what the fuck
is this
What is a reprojection matrix
Probably last frames clip->world
Uhh
maybe
```cpp
viewReprojection = inverse(view->GetViewMatrix()) * viewPrevious->GetViewMatrix();
reprojectionMatrix = inverse(view->GetProjectionMatrix(false)) * affineToHomogeneous(viewReprojection) * viewPrevious->GetProjectionMatrix(false);
```
It's whatever this does
```cpp
template <typename T, int n>
matrix<T, n+1, n+1> affineToHomogeneous(affine<T, n> const & a)
{
    matrix<T, n+1, n+1> result;
    for (int i = 0; i < n; ++i)
    {
        for (int j = 0; j < n; ++j)
            result[i][j] = a.m_linear[i][j];
        result[i][n] = T(0);
    }
    for (int j = 0; j < n; ++j)
        result[n][j] = a.m_translation[j];
    result[n][n] = T(1);
    return result;
}
```
???????????????????????????
What's wrong with just doing vertex * prevMVP - vertex*thisMVP in your vert shader?
Why do you have to do this cursed solution
Thing is, I dunno if it is actually correct, because NV's docs tell me to do this
Whatever this means
#questions message
Here, check this out
In my thing stuff looks very aliased when I move around
I have no clue what's going on, why do they read the velocity from the texture only if both XY are nonzero?
perhaps negative/zero velocities need special treatment?
Isn't negative velocity just moving in the opposite direction of positive velocity? (Aka back positive forward negative or the other way around)
Btw this builds a 4x4 matrix from some weirdo 3x3 matrix + translation I think
So the reprojection matrix goes thisFrameClip -> thisFrameView -> world -> prevFrameView -> prevFrameClip (the shader does mul(clipPos, reprojectionMatrix) and gets prevClipPos out)
btw
I am starting to think my derivative calculation isn't accurate enough
and my motion vectors are fine
Other fun fact NVSDK_NGX_VK_Feature_Eval_Params::Sharpness does apparently nothing
Maybe it only works for DLSS, I'm using DLAA
Can you just use DLSS I thought FSR is the only one which you can freely use?
ye you can just clone it and use it
its source available?
I doubt that
is what allowed?
to use it
do they not even have headers?
Oh ye they do have headers
ah ok
man
AA is so nice
but you know what would be nicer
Figure out why the fuck I get unstable AA when moving around at the edges of triangles
mister mister do you have code for your project using the ktx thingy?
Why no fsr2
Libktx?
yes
I do
I'm finally writing a scene loader
That's coming too
I have decided that shipping spirv isn't so bad after all
You need a magnifier homie
I do
I figured you had changed your mind when you were suddenly okay with shipping a 33mb dll 
yeah 
btw, I suggest you take my code only as a demo of how to load KTX into Vulkan
Where
For proper KTX management check out Jaker's code, he actually checks whether a texture is supercompressed, needs transcoding, etc.
No as in, I do need a magnifier 
I'm looking at both, but is there really not a way to load directly into a staging buffer?
There is but it's garbage
bruuuh I'm starting to understand handmade ppl
Ktx also has a GL upload function but it sucks too 
I guess it is impossible to just take in a single pointer to a buffer into which we like to load our data
nono I AM THE LIBRARY I RETURN THE POINTER
ugh
Just memcpy my boy
Jaker can you run FSR2 in AA mode like this? (normals only, bistro)
Yes but it's not optimized for that
Fsr2 does a bunch of unnecessary work when you use it for 1x upscale (AA)
I'm not looking at 🅱️erf
Just if a correct impl of FSR2 also suffers from the same aliasing
Yeeeeeee
It's hard for me to tell if your vid shows poopy aliasing or compression artifacts (which are worsened due to discord mobile sucking)
Btw you will have to edit this shader to display normals
https://github.com/JuanDiegoMontoya/Frogfood/blob/main/data/shaders/ShadeDeferredPbr.frag.glsl
that video looks neat nonetheless
It's probably worse without any AA
the tree would probably go bonkers without
yeah definitely worse without AA
welp, I've run all possible sanity checks, it looks like my impl of DLSS is without errors
At least, without obvious errors 
here is yet another fun fact about DLSS
Here's good ol sponza
looks pretty bad innit, well it is performance mode DLSS
Now here's NV's sponza
Also in performance mode, same resolution
wait lemme remove the shadows
There it is
notice any difference?
How in god's name does their sample app, from which I stole all the code, look so much better
what in the everloving fuck
rename your shiddy.exe to whatevernvused.exe
actually
let me try that
I swear to god if something changes I'm pulling DLSS out
drivers might use hashes not filenames anyway i suppose
I'm changing the AppID
ok DLSS is safe
nothing changed
I'll try asking on NV's forums/discord
was worf a try
Different tonemapper? Lighting model? Post processing stack?
AI denoisers are super fragile and sensitive to "subpixel patterns" in your inputs
I've had the OptiX one refuse to work cause Mitsuba splatted samples to multiple pixels with a gaussian kernel
you have a pronounced difference in lighting
and quite a lot of moire on your curtains
I disabled all post processing in the sample app btw
What I don't understand rn is the moiré
I am using the same LOD bias as they are using
Did you check RenderDoc
Compare render and display resolutions, image formats, sampler state, etc
One thing I am noticing
In my stuff, I can only see the edges of objects jittered
Even with a 6.0x magnifier
In the sample app though I can see everything jittering 
Are you jittering your view or projection matrix
The proj matrix
```cpp
const auto jitter = sample_jitter(_device->frame_counter().current(), _state.dlss.jitter_count);
const auto jitter_translation = glm::vec3(2.0f * jitter / glm::vec2(_state.dlss.render_resolution), 0.0f);
const auto jitter_matrix = glm::translate(glm::mat4(1.0f), jitter_translation);
view.jittered_projection = jitter_matrix * view.projection;
```
As nvidia tells me to do
rip idk
I thought your bug could be that you only jitter gl_Position but not the other attributes
I solved it
I forgor I was using a different view struct for the visbuffer resolve
goddamnit
How
with the power of friendship and copy pasting code from the sample app
the dethfrog had a hand in this probably
I kinda have a defcon0 situation on my hands
debugPrintfEXT gives me device loss
this bug is megaweird ngl
also DLSS is enabling the deprecated VK_EXT_buffer_device_address
so that's fucking up my validation layers too
I fear integrating DLSS should be the last possible step of any engine
because it makes debugging impossible 
Old code old code
This is why FOSS is best
You can go in and fix that shit
ye this absolutely sucks
sigh
looks like I've been debugging nothing for two hours
!remindme 12h open debug printf issue
Alright lvstri, I'll remind you about open debug printf issue in 12 hours. ID: 62513782
and just now I notice that my page table isn't actually wrapping around 
epic texture viewer achieved
worst texture viewer in the world btw
did you add asteroids
asteroids? wym
that is because mister has no caching still
daily reminder to add caching mister LVSTRI
heh
```glsl
#define sampler_partially_bound decorate_with_string("update_after_bind|partially_bound")
layout (local_size_x = 16, local_size_y = 16) in;
layout (set = 0, binding = IRIS_TEXTURE_TYPE_2D_SFLOAT) sampler_partially_bound uniform sampler2D u_texture_2d_sfloat;
layout (set = 0, binding = IRIS_TEXTURE_TYPE_2D_SINT) sampler_partially_bound uniform isampler2D u_texture_2d_sint;
layout (set = 0, binding = IRIS_TEXTURE_TYPE_2D_UINT) sampler_partially_bound uniform usampler2D u_texture_2d_uint;
layout (set = 0, binding = IRIS_TEXTURE_TYPE_2D_ARRAY_SFLOAT) sampler_partially_bound uniform sampler2DArray u_texture_2d_array_sfloat;
layout (set = 0, binding = IRIS_TEXTURE_TYPE_2D_ARRAY_SINT) sampler_partially_bound uniform isampler2DArray u_texture_2d_array_sint;
layout (set = 0, binding = IRIS_TEXTURE_TYPE_2D_ARRAY_UINT) sampler_partially_bound uniform usampler2DArray u_texture_2d_array_uint;
```
ahhh yes
modern GLSL code
```cpp
static auto make_descriptor_binding_flag_from_decoration(const std::string& decoration) -> descriptor_binding_flag_t {
    const auto split = split_decoration_string(decoration);
    auto result = descriptor_binding_flag_t();
    for (const auto& each : split) {
        if (each == "update_after_bind") {
            result |= ir::descriptor_binding_flag_t::e_update_after_bind;
        } else if (each == "update_unused_while_pending") {
            result |= ir::descriptor_binding_flag_t::e_update_unused_while_pending;
        } else if (each == "partially_bound") {
            result |= ir::descriptor_binding_flag_t::e_partially_bound;
        } else if (each == "variable_descriptor_count") {
            result |= ir::descriptor_binding_flag_t::e_variable_descriptor_count;
        }
    }
    return result;
}
```
mmmm
love it
why dont you alias them to the same binding
hmm the e_ is ugly too, its quite obvious that its an enum already, otherwise iris* is quite sexy code wise
because I'd have to refactor my entire reflection system
it's not
I still have to steal your gpu table of resources
one day I'll be 100% bindless
hmm we should make use of discord's soundboard
Doesn't that require you to be in voice
now i think daxas descriptor code shrunk down to like 300loc for all descriptor management and i added a lot of validation
server wide soundboard
hahaha yeeeesss
my pipeline.cpp is like 6000 lines
its time to wear purple
now i agree with that message
that guy reminds me of my racoon, who comes visit here every once in a while
omg i love saky so much
live footage of potrick asking me to use daxa
i actually look like that irl
its more like this, potrick == picard, crusher == lvstri, they even sit on daxa coloured chairs
lvstri making sure the daxa-fwog monopoly can never happen
this is perfect
```hlsl
static float3 RandomVectorInCone(in float3 direction, in float angle) {
    const uint3 pixelCoord = DispatchRaysIndex();
    const uint3 dispatchDimension = DispatchRaysDimensions();
    const uint pixelIndex = pixelCoord.y * dispatchDimension.x + pixelCoord.x;
    const uint sampleIndex = RayTraceCB.CurrSampleIdx;
    uint state = pixelIndex * sampleIndex;
    const float phi = RandomPCG(state) * 2 * 3.14159265358979323846;
    // z is uniform in [cos(angle), 1], i.e. uniform over the spherical cap
    const float z = RandomPCG(state) * (1 - cos(angle)) + cos(angle);
    const float x = sqrt(1 - z * z) * cos(phi);
    const float y = sqrt(1 - z * z) * sin(phi);
    // note: the tangent degenerates when direction is parallel to (0, 1, 0)
    const float3 tangent = normalize(cross(float3(0, 1, 0), direction));
    const float3 bitangent = cross(direction, tangent);
    const float3x3 rotation = float3x3(tangent, bitangent, direction);
    return normalize(mul(float3(x, y, z), rotation));
}
```
for posterity
0.00872665
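A CPU-side sketch of the same cone sampling, for anyone who wants to sanity check it outside a shader. The random numbers are passed in as `u1` and `u2` rather than generated by PCG, and the fallback helper axis for directions near (0, 1, 0) is an addition that is not in the snippet above; all names are illustrative. For scale, 0.00872665 rad is about 0.5 degrees; reading it as the cone angle is a guess.

```cpp
#include <algorithm>
#include <cmath>

struct Vec3 {
    float x;
    float y;
    float z;
};

float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

Vec3 normalize(Vec3 v) {
    const float inv_length = 1.0f / std::sqrt(dot(v, v));
    return {v.x * inv_length, v.y * inv_length, v.z * inv_length};
}

// Uniformly samples a direction within `angle` radians of the unit vector `direction`.
// u1 and u2 are uniform random numbers in [0, 1).
Vec3 sample_cone(Vec3 direction, float angle, float u1, float u2) {
    const float cos_angle = std::cos(angle);
    const float z = u2 * (1.0f - cos_angle) + cos_angle; // uniform in [cos(angle), 1]
    const float phi = u1 * 2.0f * 3.14159265358979323846f;
    const float r = std::sqrt(std::max(0.0f, 1.0f - z * z));
    const float x = r * std::cos(phi);
    const float y = r * std::sin(phi);
    // Assumption: switch helper axis when direction is nearly parallel to +Y.
    const Vec3 helper = std::fabs(direction.y) < 0.999f ? Vec3{0.0f, 1.0f, 0.0f}
                                                         : Vec3{1.0f, 0.0f, 0.0f};
    const Vec3 tangent = normalize(cross(helper, direction));
    const Vec3 bitangent = cross(direction, tangent);
    return normalize({tangent.x * x + bitangent.x * y + direction.x * z,
                      tangent.y * x + bitangent.y * y + direction.y * z,
                      tangent.z * x + bitangent.z * y + direction.z * z});
}
```

Every sample should satisfy `dot(sample, direction) >= cos(angle)` up to floating point error, which is an easy property to assert.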
Don't mind if i yoink that
make sure to send screenshots of your results with the code
lvstri what gpu do you have
3070 doc
nice
I may or may not be procrastinating on caching for my shadows with RT
ngl the RT API in Vulkan is super convoluted wtf
it also has lots of options that are just not useful
like cpu side build
early days
btw @frank sail
I figured out a very much more shrimpler way of doing your unhinged glm::inverse(bababooey) * baba_is_you * stable_view
please god yes
```cpp
const auto clip_world_position = view.stable_proj_view * glm::vec4(_camera.position(), 1.0f);
const auto uv_world_position = (glm::vec2(clip_world_position) / clip_world_position.w) * 0.5f;
const auto page_offset = glm::ivec2(uv_world_position * glm::vec2(IRIS_VSM_VIRTUAL_PAGE_ROW_SIZE));
const auto ndc_shift = 2.0f * (glm::vec2(page_offset) / glm::vec2(IRIS_VSM_VIRTUAL_PAGE_ROW_SIZE));
const auto world_page_offset = view.inv_stable_proj_view * glm::vec4(ndc_shift, 0.0f, 1.0f);
const auto world_page_offset_shift = glm::vec3(-world_page_offset);
const auto shifted_view = glm::translate(view.stable_view, world_page_offset_shift);
view.view = shifted_view;
```
will analyze in a bit
Does this suffer from "if player moves too far away then Z range is fucked" problem I wonder
probably
because I'm supposedly translating the view matrix to where the player is, to the nearest page
rip
the solution for z will be more complicated
you will need a per-page z offset or something
when we discussed, I understood that it did not solve that problem
o
so saky's thing is massively more clamplicated but it doesn't solve the problem?
that's how I understood it, idk
for orthographic projection, fp32 makes no sense
use/emulate unorm32
with fp32 you waste 2 bits
and you have a logarithmic distribution of the remaining 30
ye but I can't do atomicMin on a unorm32 image can I
you can do it on a uint32 image tho
unorm is just that, but with an implicit division by U32_MAX
I suppose you will have to do your math in fixed point to see any real benefit though
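A CPU sketch of the "uint32 image plus implicit division" idea. `std::atomic` with a compare-exchange loop stands in for the shader's image `atomicMin`, and `double` is used only so the rounding is exact on the host; the function names are invented for the example.

```cpp
#include <algorithm>
#include <atomic>
#include <cmath>
#include <cstdint>

// Depth in [0, 1] to a 32-bit unorm value. The encoding is monotonic,
// so an integer min is also a depth min.
std::uint32_t encode_unorm32(double depth) {
    const double clamped = std::clamp(depth, 0.0, 1.0);
    return static_cast<std::uint32_t>(std::llround(clamped * 4294967295.0));
}

double decode_unorm32(std::uint32_t value) {
    return static_cast<double>(value) / 4294967295.0;
}

// Host-side stand-in for atomicMin on a uint image texel.
void atomic_min(std::atomic<std::uint32_t>& texel, std::uint32_t value) {
    std::uint32_t current = texel.load(std::memory_order_relaxed);
    while (value < current &&
           !texel.compare_exchange_weak(current, value, std::memory_order_relaxed)) {
        // `current` was refreshed by the failed exchange; retry while still smaller.
    }
}
```

Clearing the texel to `0xFFFFFFFF` and then applying `atomic_min` with each encoded depth leaves the nearest depth in the texel.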
It does solve it
pog
I have to have per page z offset
ah
And my thingy
And a bit more logic and it still has some quirks
So id suggest just go for sliding along the plane 
can't we fix by translating the origin of the world somehow
perhaps by recreating the stable view matrix to point at another center
You need to correct the depths then
That's what I do
Uh maybe I misunderstood actually
you can invalidate everything if the player goes too far from the origin (on the light-space z axis), if you want to use minimal effort
then you can shift the light camera
what's an invalidation every time you move 2000km
good luck getting sufficient z precision
rip
you could make the frustum length like 1000 units and then shift every 500 (with a buffer zone to prevent the player from constantly triggering full refreshes by moving past a threshold)
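One way to read the "shift every 500 with a buffer zone" suggestion, as a sketch. The 50 unit buffer, the struct name, and the choice to re-center on the nearest multiple of the step are all assumptions; the chat only gives the 1000 and 500 figures.

```cpp
#include <cmath>

struct LightDepthWindow {
    float center = 0.0f;       // light-space z the frustum is currently centered on
    float step = 500.0f;       // re-centering granularity
    float buffer_zone = 50.0f; // hypothetical hysteresis margin

    // Returns true when the window moved, meaning cached pages need a full refresh.
    bool update(float player_light_z) {
        if (std::fabs(player_light_z - center) <= 0.5f * step + buffer_zone) {
            return false;
        }
        center = std::round(player_light_z / step) * step;
        return true;
    }
};
```

With these numbers the window jumps from 0 to 500 once the player passes 300, but only jumps back once the player drops below 200, so hovering around 250 does not trigger repeated refreshes.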
Btw how do you deal with player going into negative coordinates from the origin? Won't it shift the sun camera underneath the terrain?
depth clamping 
yeah not much you can do there except make the frustum longer
most game content probably won't span such a huge area
vertically, that is
I want star citizen planets 🥸
make a bigger frustum
full scale planets yes
My frustum size increases with each clipmap
you don't need insane precision when you are 50,000,000 km from the surface
So it actually is fine
everything would be so much easier if we had infinite memory and infinite precision smh
raytracing is bad for my health
it's 3am and I am staring at path traced power plant
I have been staring at it for 10 minutes
this is a cry for help
go to sleep and dream about path traced frogs
thank god tomorrow is saturday
i am sitting in a similar boat
i should sleep
Now he gets it
are those frame times supposed to be normal?
i have no clue but usually frame times don't become sinusoidal
depends on your definition of normal
babe wake up new frametimes just dropped
technically speaking
average frametime is 🔥
just ignore the 1% lows
does this mean my blocker search has not enough shrimples?
Looks fine to me
me when shown literally any kind of contact hardening:
but perhaps the light size is too big 
Add sliders for sample count, width, etc
ye
I'll allow it
nice
I could recognize that rust texture anywhere
sauce
so it seems like UE5 do be making an HZB for the VSM
tbh I think hzb would work if you have a two-pass approach
remember that unreal's meshlets are 99% of the time smaller than a page
so they can do HZB per page
we gotta think more heavily about it 
ye I'm doing that rn
ALSO
I determined that HZB is only helpful for dynamic geometry
if a page wasn't previously visible (when the camera moves or the light rotates), then it never had meaningful depth to cull against
so all geometry that touches that page must be rendered
anyways, here's the idea:
1. the usual: mark & allocate visible pages, clear dirty physical pages, etc.
2. build HPB and cull visible objects against it (visible objects are determined from step 4 of last frame)
3. render remaining visible objects
4. build HZB and cull objects against it
5. render objects whose visibility changed from 0 to 1 (this is essential to avoid getting fucked by the cached nature of pages)
again, HZB only helps when moving geometry can cause an already-visible page to become invalidated
idk if geometry can move in any of our engines 
HPB however is useful for everything
and I don't think they can be cleverly combined like I originally thought
actually I think they can be merged if you put HPB in step 4
wait uh
with merged HPB+HZB in step 4, if you see a new page, objects may not be rendered to it in the first render, but they shouldn't be culled in step 4 as the HPB+HZB will be empty, which means they should be rendered in step 5
the idea requires storing object visibility until the next frame, which is numViews * numObjects bits of storage
where an object is presumably a meshlet
thank god for uint64_t 
well even if you have a million meshlets and 16 views, 16 million bits is only 2 MB
it's not as bad as trying to store the maximum number of indices for every view 
it's like 3 orders of magnitude less storage
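The storage estimate and the "visibility changed from 0 to 1" test can be sketched with a packed bit array. The class and function names are illustrative, and this runs on the CPU only to show the bookkeeping; a real version would live in GPU buffers.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One bit per (view, object) pair, rounded up to whole bytes.
constexpr std::size_t visibility_bytes(std::size_t views, std::size_t objects) {
    return (views * objects + 7) / 8;
}

class VisibilityBits {
public:
    VisibilityBits(std::size_t views, std::size_t objects)
        : objects_(objects), words_((views * objects + 63) / 64, std::uint64_t{0}) {}

    bool get(std::size_t view, std::size_t object) const {
        const std::size_t bit = view * objects_ + object;
        return ((words_[bit / 64] >> (bit % 64)) & 1u) != 0;
    }

    void set(std::size_t view, std::size_t object, bool visible) {
        const std::size_t bit = view * objects_ + object;
        const std::uint64_t mask = std::uint64_t{1} << (bit % 64);
        if (visible) {
            words_[bit / 64] |= mask;
        } else {
            words_[bit / 64] &= ~mask;
        }
    }

private:
    std::size_t objects_;
    std::vector<std::uint64_t> words_;
};

// Objects to draw in the second pass: hidden last frame, visible after the HZB test.
bool became_visible(const VisibilityBits& previous, const VisibilityBits& current,
                    std::size_t view, std::size_t object) {
    return current.get(view, object) && !previous.get(view, object);
}
```

`visibility_bytes(16, 1000000)` evaluates to 2,000,000 bytes, matching the 2 MB figure above.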
tru
alright it's time to switch things up a bit
I shall put VSM on the back burner for a while
potrick while you're here
do you mind explaining a bit how daxa's resource table work on the C++ side
how are BufferIds and SamplerIds created, bound to descriptors and destroyed specifically
I am still a believer
in this endeavor
I think HZB will help when you have animated thingy which just sways for example and you are redrawing the tile each frame
I have not abandoned you guys lol
the VSM train is still going strong
'Tis but one of my usual detours
hello
Example for Buffer:
- creating it gives you an id (index + version)
- index of id indexes into cpu side array of ImplBuffers (the metadata for the buffer)
- index indexes into a descriptor set binding array
- when creating the buffer it's immediately written to the mega descriptor set
- daxa only has one descriptor set that has update after bind and some other flags set to make it convenient
- when calling destroy on the buffer it becomes a zombie
- zombies live until all already submitted commands running at the point in time when you call destroy are done
- daxa checks when they are done and actually performs destructions in Device::collect_garbage
- it uses timeline semaphores tracking submits on a cpu and gpu timeline
- actually destroying the buffer writes a dummy in the place of the dead buffer to avoid dangling descriptors
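A rough sketch of the scheme described in that list: ids made of index plus version, deferred destruction through a zombie list, and slot reuse once the GPU timeline has caught up. This is not daxa's code; every name is hypothetical, and the descriptor writes are reduced to comments.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct BufferId {
    std::uint32_t index;   // slot in the table and in the descriptor array
    std::uint32_t version; // bumped on destroy so stale ids can be detected
};

class BufferTable {
public:
    BufferId create() {
        std::uint32_t index;
        if (!free_.empty()) {
            index = free_.back();
            free_.pop_back();
        } else {
            index = static_cast<std::uint32_t>(slots_.size());
            slots_.push_back({});
        }
        slots_[index].alive = true;
        // A real table would write the buffer's descriptor at `index` here.
        return {index, slots_[index].version};
    }

    bool is_valid(BufferId id) const {
        return id.index < slots_.size() && slots_[id.index].alive &&
               slots_[id.index].version == id.version;
    }

    // `last_submitted` is the CPU timeline value of the latest submit that may use the buffer.
    void destroy(BufferId id, std::uint64_t last_submitted) {
        if (!is_valid(id)) return;
        slots_[id.index].alive = false;
        ++slots_[id.index].version;
        zombies_.push_back({id.index, last_submitted});
    }

    // `gpu_completed` is the timeline value the GPU has reached.
    void collect_garbage(std::uint64_t gpu_completed) {
        std::size_t kept = 0;
        for (std::size_t i = 0; i < zombies_.size(); ++i) {
            if (zombies_[i].wait_value <= gpu_completed) {
                // A real table would write a dummy descriptor here before reuse.
                free_.push_back(zombies_[i].index);
            } else {
                zombies_[kept++] = zombies_[i];
            }
        }
        zombies_.resize(kept);
    }

    std::size_t zombie_count() const { return zombies_.size(); }

private:
    struct Slot {
        std::uint32_t version = 0;
        bool alive = false;
    };
    struct Zombie {
        std::uint32_t index;
        std::uint64_t wait_value;
    };
    std::vector<Slot> slots_;
    std::vector<std::uint32_t> free_;
    std::vector<Zombie> zombies_;
};
```

The version check is what makes a stale id harmless: after the slot is reused, the old id still points at the same index but no longer validates.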
epic
then to access the buffer do you use push const or something?
to index in the buffer table that is
you either put it in a push constant or bind it as a uniform buffer (yeeaaa i know, i am not sure if i wanna keep them uniform buffers but daxa has them atm)
lustri, traitor
idk
why not coarser granularity
I was writing code for bvh building for hardware RT
it's probably best innit
maybe meshlets
except meshlet triangle upper bound is 65536 triangles 
perfectly balanced
Lvstri uses daxa
I stop using phobos
I think I'll start using other people's stuff after I make a render graph for my own stuff
a mostly complete rendergraph is massive pain and work
thats when i ~~enslaved saky~~ saky joined. Without our combined brains it would have been impossible
@delicate rain tell your tg pain
something something sparse something? : )
Currently thinking about full bindless, but I'm wondering if I have enough descriptor set bindings
I need:
- binding for everything that is VK_DESCRIPTOR_TYPE_SAMPLER
- binding for everything that is VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE
- binding for everything that is VK_DESCRIPTOR_TYPE_STORAGE_IMAGE
Possibly even one for VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER but let's not 
you could use mutable descriptors and make everything in the same binding
but I think support for that is extremely low 
inshallah we will have d3d12isms
do I actually need that
ok support is actually decent on newer desktop gpus
no, but it allows you to jam everything into one binding
mainly it's used for porting d3d12 apps to vulkan 
ye I could just do this
```glsl
// Aliased arrays share a binding but need distinct names; GLSL rejects redeclaring an identifier.
layout (set = 0, binding = 0) uniform sampler u_sampler_table[];
layout (set = 0, binding = 0) uniform samplerShadow u_sampler_shadow_table[];
layout (set = 1, binding = 0) uniform image1D u_image_1d_table[];
layout (set = 1, binding = 0) uniform image2D u_image_2d_table[];
layout (set = 1, binding = 0) uniform image3D u_image_3d_table[];
layout (set = 1, binding = 0) uniform image1DArray u_image_1d_array_table[];
layout (set = 1, binding = 0) uniform image2DArray u_image_2d_array_table[];
// ...
```
it also wastes memory because it essentially makes your descriptors a union
it's ok I'll just allocate enough memory that nobody's ever going to use
128MiB of descriptor mem sounds reasonable enough
oh cool, can image descriptors alias without any extension?
I believe that's what daxa does
seems legit
you need to do formatless ext etc but ye
ye but everyone has it so eh
I'm having a cursed idea
so you know how textures can be streamed in and out right
I'm thinking of shader_resource_table.remove_resource() and shader_resource_table.insert_resource();
Instead of having dummy descriptors, I just use a flat hash map that keeps track of all my textures and textures ids
so I can remap indices on the fly
I'm not sure either approach is doable either

I'm searching for the laziest, easiest way I can do this
leveraging the power of flat hash maps
you basically just need to allocate indices for descriptors, right
or are you trying to solve something else
ye I just need to allocate indices for descriptors
and be able to remove them without shifting shit around
like a page allocator where 1 descriptor = 1 page
no need to overthink it 
yeah just keep a list of free indices or something
a "free list" of sorts
nah slow
__builtin_clz baby
even shrimpler
and I can reuse the old crusty VSM page allocator I did on the CPU side
when I still believed in hw sparse
it's a great approach
but allocations are O(n), rip perf
worst case 
when you have billions of descriptors that is
a performance disaster in vblancospeak
so more like O(sqrt(n))
ye but O(n/64) is asymptotically equivalent to O(n)
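For reference, a minimal sketch of the bitmap idea from the messages above: one bit per descriptor slot, scanned 64 slots at a time. It uses the GCC/Clang builtin `__builtin_ctzll` (count trailing zeros) rather than `__builtin_clz`, so it hands out the lowest free index first; that choice, and the names `BitmapIndexAllocator` and `kInvalidIndex`, are assumptions for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One bit per descriptor slot; a set bit means "free".
class BitmapIndexAllocator {
public:
    static constexpr uint32_t kInvalidIndex = 0xFFFFFFFFu;

    explicit BitmapIndexAllocator(uint32_t capacity)
        : capacity_(capacity), words_((capacity + 63) / 64, ~uint64_t{0}) {
        // Clear the padding bits past `capacity` in the last word.
        const uint32_t tail = capacity % 64;
        if (tail != 0) {
            words_.back() = (uint64_t{1} << tail) - 1;
        }
    }

    // Worst case scans every word: O(n / 64), which is still O(n).
    uint32_t allocate() {
        for (uint32_t w = 0; w < words_.size(); ++w) {
            if (words_[w] == 0) continue;  // all 64 slots in this word are taken
            const uint32_t bit = static_cast<uint32_t>(__builtin_ctzll(words_[w]));
            words_[w] &= ~(uint64_t{1} << bit);  // mark the slot as used
            return w * 64 + bit;
        }
        return kInvalidIndex;
    }

    void free(uint32_t index) {
        assert(index < capacity_);
        words_[index / 64] |= uint64_t{1} << (index % 64);
    }

private:
    uint32_t capacity_;
    std::vector<uint64_t> words_;
};
```

Freeing never moves other slots around, so indices already handed to shaders stay valid, which is the "remove them without shifting" requirement from earlier.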
Ye
The secret ingredient is lying
Ttl?
time to live
so a descriptor slot remains allocated as long as its TTL > 0
and it's freed when TTL = 0
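A rough sketch of that TTL rule, assuming the countdown starts when the slot is released and the initial TTL is something like the number of frames in flight. `TtlSlotRecycler`, `release` and `tick` are made-up names for illustration.

```cpp
#include <cstdint>
#include <vector>

class TtlSlotRecycler {
public:
    explicit TtlSlotRecycler(uint32_t initial_ttl) : initial_ttl_(initial_ttl) {}

    // The slot is no longer wanted, but in-flight frames may still read it.
    void release(uint32_t slot) { pending_.push_back({slot, initial_ttl_}); }

    // Call once per frame. Returns the slots whose TTL reached zero this frame.
    std::vector<uint32_t> tick() {
        std::vector<uint32_t> freed;
        std::vector<Pending> still_pending;
        for (Pending p : pending_) {
            if (p.ttl > 0) --p.ttl;
            if (p.ttl == 0) {
                freed.push_back(p.slot);  // safe to hand back to the index allocator
            } else {
                still_pending.push_back(p);
            }
        }
        pending_.swap(still_pending);
        return freed;
    }

private:
    struct Pending {
        uint32_t slot;
        uint32_t ttl;
    };
    uint32_t initial_ttl_;
    std::vector<Pending> pending_;
};
```

With an initial TTL of 2, a slot released during one frame only comes back from the second `tick()` after that, so nothing recorded in the meantime can see it reused.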
Or be a man and overwrite it while in use
Btw I did not implement ttl for my gpu allocator in my voxel engine and chunks would sometimes artifact for a frame or two when you modified stuff 
The real issue was that I had occlusion culling data that was used next frame
But the end result was the same
technically
I can update while in use
I'm just not sure how lenient drivers are
but UPDATE_AFTER_BIND actually allows it 
I don't think it means modifying the descriptor that is in use though 
One weird trick to not have any validation errors
here's another issue, double buffered resources
hmm
actually
not an issue
I can just allocate 2 ids
```cpp
void init() {
    view_buffer = shader_resource_table->allocate_shared_buffer_resource<frames_in_flight>(); // std::vector<buffer_id_t>(2);
}
void render() {
    thing = shader_resource_table->get_buffer_slice({ .id = view_buffer[current_frame] }); // returns buffer_slice_t, refreshes the cache
    thing.insert(...);
}
```
oh god this is gonna take ages
I can basically just delete pipeline.cpp and remake it from scratch
amazing
lovely
hmm hmm my stomach bubblin'
sexy
with good ol macros it's a little less
now comes the hard part
the shader resource table™
hehe its pretty crazy how do you use all of them, must be some big ass if else block in main, no?
perhaps next evolution is generating the shaders to whatever you need it generate to
I make a macro that generates more macros that access these
the usage I'm going for is this
```glsl
void main() {
    vec3 payload = IRIS_STORAGE_IMAGE_2D_LOAD(uint32, image_id).xyz;
}
```
I make sure to give publicity to daxa 
ah
beautiful
its readable, i like it
I wouldn't say this part is especially readable but the rest is manageable at least 
```glsl
#define _IRIS_ACQUIRE_COMBINED_SAMPLER(dimension, type, image_id, sampler_id) sampler##dimension(_IRIS_ACQUIRE_SAMPLED_IMAGE(dimension, type, image_id), u_sampler_table[sampler_id])
#define _IRIS_ACQUIRE_COMBINED_SAMPLER_SHADOW(dimension, type, image_id, sampler_id) sampler##dimension##Shadow(_IRIS_ACQUIRE_SAMPLED_IMAGE(dimension, type, image_id), u_sampler_shadow_table[sampler_id])
```
can you return opaque types from functions in glsl
like
```glsl
image2D id_to_descriptor(uint id) {
    return table[id];
}
```
I can't
but I have achieved epic syntax regardless
```glsl
#define output_image iris_image_accessor(restrict_write, u_output_image_id)
#define texture_2d_sfloat iris_combined_sampler_2d(float32, u_texture_id, u_sampler_id)
#define texture_2d_sint iris_combined_sampler_2d(int32, u_texture_id, u_sampler_id)
#define texture_2d_uint iris_combined_sampler_2d(uint32, u_texture_id, u_sampler_id)
#define texture_2d_array_sfloat iris_combined_sampler_2d_array(float32, u_texture_id, u_sampler_id)
#define texture_2d_array_sint iris_combined_sampler_2d_array(int32, u_texture_id, u_sampler_id)
#define texture_2d_array_uint iris_combined_sampler_2d_array(uint32, u_texture_id, u_sampler_id)
```
totally not inspired by daxa 
```glsl
#version 460
#include "bindings.glsl"

iris_declare_storage_image_descriptor_qualified(restrict_read, restrict readonly, image2D);
iris_declare_storage_image_descriptor_qualified(restrict_write, restrict writeonly, image2D);

#define input_image iris_image_accessor(restrict_read, u_input_image_id)
#define output_image iris_image_accessor(restrict_write, u_output_image_id)

layout (scalar, push_constant) restrict readonly uniform u_push_constant {
    uint u_input_image_id;
    uint u_output_image_id;
};

layout (local_size_x = 16, local_size_y = 16) in;

void main() {
    const ivec2 size = imageSize(input_image);
    if (any(greaterThanEqual(gl_GlobalInvocationID.xy, size))) {
        return;
    }
    const ivec2 position = ivec2(gl_GlobalInvocationID.xy);
    const vec4 payload = imageLoad(input_image, position);
    imageStore(output_image, position, vec4(linear_as_srgb(tonemap(payload.xyz)), 1.0));
}
```
wouldnt surprise me if daxa was inspired by sweatshop.pl
god fucking damnit this is so good
I can literally stop thinking about anything
just handle = srt->allocate(); and buffer = srt->acquire(handle);
it's crazy
it really is amazing
i tried to see if devsh had any bindless util like that but he doesnt seem to have any
designing and deciding on the macros was pure pain in my head
endless changes
now im happy
btw
what do you use for dummy descriptors
do you just create a 1x1 image or something
yes
epic
but im considering using the robustness vulkan feature stuff
it's quite sad we can't use null
you actually can
but you need some feature
it can apparently tank perf
so im scared of it
but dx12 has it default enabled for everything afaik
so cant be too bad
robustness also makes it legal to read and write out of bounds
it ignores writes and on reads you get 0
device loss be gone
REAL
maybe i make it optional or something idk
but i think its very nice that dx12 saves you and forced gpu makers to implement hw acceleration for these checks
i also vaguely remember it was for mobile maybe
so desktop might not care
mobile is not real so we're good
yea
I doubt the Vk/Dx drivers are much different either
random insane fact: Ada Lovelace has 128 bit atomic cas
ah, i see
i really need to advance deeper into gpu drivenisms in order to understand all this
Hmm
me wonder
couple the ShaRT with frames in flight or not
shit I have a devilish idea
one ShaRT per frame in flight

a devshish idea?
yes

if one ponders the orb, one shall realize that two frames in flight might have completely different ShaRTs
DLSS3 knockoff

oh
did i just reinvent it
i feel like i dropped into a barrel full of toxic waste and grew superpowers lol
take your time frogking
thic
it's 1MiB now 
hmmmmm
Since updating a descriptor set only comes with allocation/deallocation
should I make that RAII-style
I'm lighting the daxa beam
mr potrick, how do you handle buffers/images in the gpu resource table that change within a frame in flight
so for example, say you have a camera buffer that you update every frame
well
either i just alloc from a per frame staging buffer (device local or host local depending on what its used for)
or i make an array inside the buffer of the cam info
and pass the index to it
write the appropriate part
the table doesnt know anything but creation and deletion
I don't think there is any specific resource table handling to them
the indices are 100% tied to the resources
so when i change a resource between frames i pass different ids to the shaders
what about differences in bindings between frames?
expand
e.g:
frame 0: I want to allocate these two images, please update this frame's descriptor set with the two images
frame 1: I want to allocate three more images, please update this frame's descriptor set with the three images
afaik calling vkUpdateDescriptorSets is illegal while the set is in use

i dont understand
there is one descriptor set
you can update slots that arent used
its fine if the set is in use
the only restriction is that the specific slots within the descriptor array arent actually used
it's fine as long as you don't touch slots that are in use?
yes
pog
so for the camera buffer, staging and then vkCmdCopy?
well its actually basically orthogonal to daxa @wicked notch
No?
what I do is either:
- have a single buffer that i copy to from staging once a frame
- simply use device local host visible scratch memory that i get an offset into (linear alloc) then write and pass the bda to the shader, so no staging or copy. Just instant cpu write then use on gpu, ultra fast.
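A small sketch of the linear (bump) allocation behind the second option, tracking offsets only. The names `LinearAllocator` and `ScratchAllocation` are hypothetical; in the real thing the returned offset would be added to the mapped pointer for the CPU write and to the buffer's device address for the bda passed to the shader.

```cpp
#include <cassert>
#include <cstdint>

struct ScratchAllocation {
    uint64_t offset;  // byte offset into the scratch buffer
    bool ok;          // false when the scratch buffer is exhausted
};

class LinearAllocator {
public:
    explicit LinearAllocator(uint64_t capacity) : capacity_(capacity) {}

    // `alignment` must be a power of two.
    ScratchAllocation allocate(uint64_t size, uint64_t alignment) {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
        // Round the current head up to the requested alignment.
        const uint64_t aligned = (head_ + alignment - 1) & ~(alignment - 1);
        if (aligned > capacity_ || size > capacity_ - aligned) {
            return {0, false};
        }
        head_ = aligned + size;
        return {aligned, true};
    }

    // Call once the GPU has finished reading everything allocated so far.
    void reset() { head_ = 0; }

private:
    uint64_t capacity_;
    uint64_t head_ = 0;
};
```

Nothing is ever freed individually; the whole region is recycled at once, which is why something still has to track when the GPU is done with it.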
you can do that
i usually dont anymore
bar memory is sexier
The nice thing about 2) is that daxa (task graph) handles the lifetime of the memory for you
why only one set though? why not have 1 set per frame in flight
so you can update one while reading the others
what if next frame, I free 1 image, add its index to the freelist, and then try to write into that
imagine I have image A bound to a slot, next frame I remove this image but also register a new one, if I add A's old index to the freelist immediately then it will be picked up as the index for the new image
If you free a buffer or an image it is only freed after GPU frame cnt catches up to the point on CPU frame cnt where it was freed
and if I have a frame in flight, something might be reading that slot
it uses way more memory to have multiple sets
there is also no benefit afaik
daxa also doesnt know what fif is
that's interesting
So there is no way for you to use a slot that is currently in use on the GPU still
Ye I'm trying to design this such that FIF are trivial too
It is deferred
so you don't have any core systems that rely on FIF like deletion queues
that wont ever be a problem
its checked and deferred
i do
it doesnt rely on fif
I don't think relying on fif is good anywhere
it checks a timeline semaphore
It's too arbitrary
there is a cpu and gpu submit timeline
I might want to use daxa for compute sim where there is no bound on fif
destructions are deferred until the gpu catches up to the cpu at the timepoint of destruction call
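A minimal sketch of that deferral, assuming a single queue and one monotonically increasing submit counter. `DeferredDestroyQueue` and its methods are invented names, not daxa's implementation; the GPU-side value would typically come from a timeline semaphore, e.g. via `vkGetSemaphoreCounterValue`.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <utility>

class DeferredDestroyQueue {
public:
    // Call for every queue submit; the returned value is what the GPU
    // timeline is expected to reach when that submit has finished.
    uint64_t on_submit() { return ++cpu_timeline_; }

    // Remember the CPU timeline value at the moment destroy was requested.
    void destroy_deferred(std::function<void()> destroy) {
        zombies_.push_back({cpu_timeline_, std::move(destroy)});
    }

    // Call periodically with the last value the GPU has completed.
    void collect_garbage(uint64_t gpu_timeline) {
        // Zombies are queued in increasing wait_value order, so checking the front is enough.
        while (!zombies_.empty() && zombies_.front().wait_value <= gpu_timeline) {
            zombies_.front().destroy();
            zombies_.pop_front();
        }
    }

    std::size_t zombie_count() const { return zombies_.size(); }

private:
    struct Zombie {
        uint64_t wait_value;
        std::function<void()> destroy;
    };
    uint64_t cpu_timeline_ = 0;
    std::deque<Zombie> zombies_;
};
```

Only work submitted before the destroy call is covered: a command that is recorded but not yet submitted gets a later timeline value than the zombie waits for.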
its very easy to implement
daxa still has a cleanup function that should be called once a frame to do housekeeping
but its not tied to fif or anything like that
It is arbitrary, what daxa has is much better
FIF is pretty convenient for anywhere you need to do n-buffered CPU-GPU sync
do you do timeline semaphores for all of that
CPU-GPU timeline makes more sense though
just be aware to not do the cherno bullshit of abstractions that have n copies inherently per resource per fif
Yeah but that is users responsibility
Handling fif for readback and everything I mean
btw im working on a tg facelift atm that will make it easier to use, more powerful AND less loc
yeah that's interesting though, I think if I had timeline based deletion logic I could pull FIF code out of my core vulkan context
Thinking about embedding fif into abstractions makes my brain hurt
Too complex for me
i got that from dolkar, they do that too. Very awesome. Its also very simple to implement. Just keep in mind that it wont defer non-submitted commands
my context just owns the counter
and I mainly use it to scale buffer capacity for stuff like uploading instances
easy, daxa user doesnt even know what it is really cause it just works tm
gabe forgot most things about descriptors
I did too
good
le as that


