# Iris - A Journey through OpenGL and beyond to learn Graphics
They did troll me quite hard, ngl
Searching the topic on Google yields countless stackoverflow questions lol
Up there with null pointers for sure :^)
I am one foot away from doing a good ole glReadPixels
Why did nobody think of making a debugger for shaders?!
renderdoc is
not with your usual break points, but you can pretty much do printf-like debugging with it
martty was working on some debugger thing
Intro: GPUs are complex beasts - and certainly more mysterious in some ways than CPUs which come with ample amounts of documentation and manuals. Aspiring graphics programmers (too insignificant to have a devrel contact to give insight) are sometimes left to scrounge old GDC presentations on performance tips, with very little known about the inne...
I am so happy
Finally
Now I can ponder why humans can't ever reach an agreement on anything
Why GLSL/HLSL don't follow the same goddamned conventions
Quick backup to not lose this
Also, I have no idea what the "stabilize" part of the code does, if anyone can explain it that would be much appreciated
Looks like it snaps the light matrix's offset somehow
Probably to prevent shimmering/swimming artifacts when the camera moves slightly
Looks like it
It has something to do with filtering later too somehow.
According to the blog post I am following to do this
It's huge but it's very good, I should credit this
it's a great post
btw, I don't see where it mentions the stabilization thing in the post
is the code for it in the accompanying code?
You could
Does it make sense to?
I'm not sure tbh
I'd lean towards no since it aims to denoise shadows that are very expensive to generate, whereas you can just take more samples and/or apply a cheaper denoiser in your case
Are you using random disk sampling for your shadows yet 
That's exactly why I asked
I am using Poisson Disk sampling right now
I mean
I would be using Poisson Disk if my shadows weren't broken
But yeah, that's the plan.
are you using hardware pcf as well
that's like an extra 4x total samples for free™
Very broken 😦
hmm it's not terribly wrong though
maybe it's just something shrimple like using the wrong matrix for some of the cascades
I think either my PSSM is broken or the shadow frustum's min Z is
Leaning towards the latter.
This is why I'm leaning towards the second by the way.
No idea what happened in Slice 1
Slice 0 is empty too for some unknown reason
1st and 2nd matrices are definitely broken in some way I don't know.
I can't find any problems though...
Turns out there are problems
Going CPU side
One day GPU debuggers will exist I'm sure.
Even though it's been 40 years
maybe you can help martty with work on his
I don't think I have the required skills, I wish I could help though
Also it looks like it works only on AMD GPUs sadly 😦
But it makes sense, since they have open source drivers after all
doesnt mean it has to stay on amd, and i think you are quite smart
just seeing how much effort you put into small things
If only I'd decided to do everything on the CPU first..
It was such a trivial error that I'm truly ashamed of anything I've done
```
struct cascade_data_t {
    mat4 projection;
    mat4 view;
    mat4 global;
    mat4 pv;
    vec4 scale;
    vec4 offset; // w is split
};
```
is supposed to be
```
struct cascade_data_t {
    mat4 projection;
    mat4 view;
    mat4 pv;
    mat4 global;
    vec4 scale;
    vec4 offset; // w is split
};
```
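The fix above comes down to the CPU-side struct and the GLSL block disagreeing on member order. One cheap guard, sketched here under the assumption that the CPU mirror is a plain standard-layout struct (the `mat4`/`vec4` stand-ins are hypothetical, not GLM types), is to pin every offset with `static_assert` so a reorder on either side fails to compile:

```cpp
#include <cstddef>  // offsetof

// Hypothetical stand-ins for the CPU math types (no GLM dependency).
struct mat4 { float m[16]; };
struct vec4 { float v[4]; };

// CPU-side mirror of the GLSL cascade_data_t block, in the corrected order.
struct cascade_data_t {
    mat4 projection;  // offset 0
    mat4 view;        // offset 64
    mat4 pv;          // offset 128
    mat4 global;      // offset 192
    vec4 scale;       // offset 256
    vec4 offset;      // offset 272, w is split
};

// If either side reorders members, these fire at compile time instead of
// silently feeding the wrong matrix to the shader.
static_assert(offsetof(cascade_data_t, pv) == 128, "pv must follow view");
static_assert(offsetof(cascade_data_t, global) == 192, "global must follow pv");
static_assert(offsetof(cascade_data_t, scale) == 256, "scale follows the matrices");
static_assert(offsetof(cascade_data_t, offset) == 272, "offset is the last member");
static_assert(sizeof(cascade_data_t) == 288, "matches the std140/std430 size");
```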
We did it
I'll make sure to thoroughly comment my compute shader so that people can avoid having multiple strokes over the course of weeks
(Like I did)
And we also have SDSMs
Finally
Turns out I had to remap the linearized Z back into [0;1] by dividing it by far - near
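For reference, a minimal sketch of that linearize-then-remap step on the CPU, assuming a standard (non-reversed-Z) OpenGL perspective projection with the default [-1;1] NDC depth range and the usual `(z - near) / (far - near)` normalization. Note that the depth-buffer value has to be moved to NDC before the linearization formula applies; the function names are illustrative:

```cpp
#include <cassert>
#include <cmath>

// Depth-buffer value [0, 1] -> positive view-space distance.
// Assumes a standard GL perspective projection, default depth range, no glClipControl.
float linearize_depth(float depth, float near_plane, float far_plane) {
    const float z_ndc = depth * 2.0f - 1.0f;  // window [0, 1] -> NDC [-1, 1]
    return (2.0f * near_plane * far_plane) /
           (far_plane + near_plane - z_ndc * (far_plane - near_plane));
}

// View-space distance in [near, far] -> [0, 1].
float remap_to_unit(float linear_z, float near_plane, float far_plane) {
    return (linear_z - near_plane) / (far_plane - near_plane);
}
```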
Almost subpixel perfect
Is it, what's wrong?
it could just be the poor quality of the textures ofc, but it looks so washed out that the colors seem to roll around somehow
Does this look bad as well?
you can see that uglyness on the pillar behind the thing
It does to me, to be honest
yeah it looks schtrange
mayhaps initial textures are just shit
if you can disable everything you have going and just output albedo
mayhaps we can see whats fooking with the colors
yeah
now the plant pot thing looks somewhat like red marble (in the top image) - how it should be
Storing every texture as linear SRGB and disabling the SRGB framebuffer results in this
Which is reasonable I guess?
albedo/emissive needs to be srgb
this looks correct-er
That's what I learned
But using SRGB gives this...
What am I doing wrong 🤔
Not at all
Here you can see my output shader https://github.com/LVSTRI/Iris/blob/master/shaders/4.1/simple.frag#L160
hmm looks harmless, but unrelated to that, you could save one texture sample there
if you sample your base color into a vec4 (diffusergb+alpha)
True, but I'm severely CPU limited at the moment, not exactly struggling on GPU compute
at 2K my GPU is pulling 10 watts
I have decided to put this on low priority
Since I have one huge pressing issue
I am severely CPU limited
And I don't have a weak CPU
I am only getting 20 FPS on San Miguel
With my GPU basically asleep
How can I go and fix this problem?
albedo textures should have sRGB internal format, and you should have framebuffer sRGB enabled too
Should I do something about the 3786 issues Nsight is reporting
nah it's probably fine 
Yes I do have that, unfortunately that results in ugly colors as seen in the picture above 😦
hmm
Maybe your tone mapper is scuffed
I'd have to look at a capture to see if the srgbisms are correct, tbh
basically, there are a few places where you need to think about sRGB:
- where you create material textures (albedo should almost always be sRGB, everything else not)
- just before you output color to the screen (you either need to apply the linear-to-nonlinear sRGB conversion yourself, or enable framebuffer sRGB so it happens automagically)
- in deferred renderers, it's useful to make your albedo gbuffer texture sRGB to maximize precision (but it won't affect absolute correctness, just how much banding there is)
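The conversions the GPU performs for you (decode when sampling a `GL_SRGB8_ALPHA8` texture, encode when writing to an sRGB framebuffer with `GL_FRAMEBUFFER_SRGB` enabled) are the standard piecewise sRGB transfer functions. A per-channel sketch for reference, e.g. for doing the output conversion manually in a shader:

```cpp
#include <cassert>
#include <cmath>

// sRGB-encoded value in [0, 1] -> linear value (what sampling an sRGB texture does).
float srgb_to_linear(float c) {
    return c <= 0.04045f ? c / 12.92f
                         : std::pow((c + 0.055f) / 1.055f, 2.4f);
}

// Linear value in [0, 1] -> sRGB-encoded value (what an sRGB framebuffer write does).
float linear_to_srgb(float c) {
    return c <= 0.0031308f ? c * 12.92f
                           : 1.055f * std::pow(c, 1.0f / 2.4f) - 0.055f;
}
```

Because `linear_to_srgb(srgb_to_linear(c))` is the identity, storing everything as UNORM and skipping the framebuffer conversion looks right as long as nothing changes the color in between; as soon as lighting scales it, the math happens in the wrong space. Encoding manually *and* enabling `GL_FRAMEBUFFER_SRGB` would apply the curve twice.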
This capture satisfies the conditions you just laid out btw
It has a SRGB framebuffer, GLFW_SRGB_CAPABLE is true and albedo textures are stored as GL_SRGB8_ALPHA8
should be all good then
I'll send a pic of how albedo looks on my machine
Btw one trick you can use is to display albedo only and see if it looks exactly as it does when you open the model's actual texture image
Thing is, albedo looks fine if I output that only
But if I multiply it by a small constant, say 0.025 then it gets ugly quickly
I don't see the issue tbh
The issue is that this thing (SRGB framebuffer + SRGB textures)
Looks worse than this (Non SRGB framebuffer + Non SRGB textures)
for some unknown reason
Non sRGB everything cancels out to the correct result when you have no lighting
!remindme 5 hours check this capture
Alright Jaker, I'll remind you about check this capture in 5 hours. ID: 56576618
Ah I forgor to mention one thing
If I make the offscreen framebuffer SRGB too
It looks fine
Disregard that
It doesn't 
here is albedo in mine
direct lighting (I have a tonemapper so it's not exactly comparable)
Yours looks... a bit darker?
If my eyes and monitor are portraying colors correctly
hmm
I'm using the gltf from the khronos gltf samples repo
also you need to enable anisotropic filtering
look how blurry the floor is next to the lion bust
but that's irrelevant hehe
True, I quickly did it heh
Ah by the way, do you know any good denoisers to use in combo with Poisson Disk sampling?
I don't know of any specifically for pcf
a small (3x3) bilateral filter might work for sun shadows specifically
otherwise you just take more samples I guess
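A 3x3 joint bilateral filter on the shadow mask could look like the sketch below (CPU code for readability; in practice it would be a fragment or compute pass). It assumes a screen-space shadow mask plus view-space depth as the edge-stopping guide; `spatial_sigma` and `depth_sigma` are hypothetical tuning parameters:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// 3x3 joint bilateral filter: blur the mask, but give little weight to
// neighbours whose depth differs a lot from the center pixel.
std::vector<float> bilateral_3x3(const std::vector<float>& mask,
                                 const std::vector<float>& depth,
                                 int width, int height,
                                 float spatial_sigma, float depth_sigma) {
    std::vector<float> out(mask.size());
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const float center_depth = depth[y * width + x];
            float sum = 0.0f, weight_sum = 0.0f;
            for (int dy = -1; dy <= 1; ++dy) {
                for (int dx = -1; dx <= 1; ++dx) {
                    const int sx = x + dx, sy = y + dy;
                    if (sx < 0 || sy < 0 || sx >= width || sy >= height) continue;
                    const float dd = depth[sy * width + sx] - center_depth;
                    const float w_spatial =
                        std::exp(-(dx * dx + dy * dy) / (2.0f * spatial_sigma * spatial_sigma));
                    const float w_range = std::exp(-(dd * dd) / (2.0f * depth_sigma * depth_sigma));
                    const float w = w_spatial * w_range;
                    sum += mask[sy * width + sx] * w;
                    weight_sum += w;
                }
            }
            out[y * width + x] = sum / weight_sum;  // center tap has weight 1, so never 0
        }
    }
    return out;
}
```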
PCF is very cheap in that regard
And ideally you have hardware pcf enabled too (via shadow samplers)
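Roughly what a shadow sampler (`GL_TEXTURE_COMPARE_MODE = GL_COMPARE_REF_TO_TEXTURE` with `GL_LINEAR` filtering) returns, emulated on the CPU: four depth comparisons whose 0/1 results are bilinearly weighted, which is where the "extra 4x samples for free" comes from. The GL spec leaves the exact filtering implementation-dependent, so treat this as an approximation of typical hardware, assuming a `GL_LEQUAL` compare function and clamp-to-edge addressing:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Emulates one hardware-PCF fetch from a size x size shadow map.
float pcf_2x2(const std::vector<float>& shadow_map, int size,
              float u, float v, float ref_depth) {
    // Texel-space position, shifted so texel centers land on integers.
    const float tx = u * static_cast<float>(size) - 0.5f;
    const float ty = v * static_cast<float>(size) - 0.5f;
    const int x0 = static_cast<int>(std::floor(tx));
    const int y0 = static_cast<int>(std::floor(ty));
    const float fx = tx - static_cast<float>(x0);
    const float fy = ty - static_cast<float>(y0);

    // One depth comparison: 1 if lit, 0 if shadowed (GL_LEQUAL, clamp-to-edge).
    auto lit = [&](int x, int y) {
        x = std::min(std::max(x, 0), size - 1);
        y = std::min(std::max(y, 0), size - 1);
        return ref_depth <= shadow_map[y * size + x] ? 1.0f : 0.0f;
    };

    // Bilinear weighting of the four 0/1 results.
    const float top = lit(x0, y0) * (1.0f - fx) + lit(x0 + 1, y0) * fx;
    const float bottom = lit(x0, y0 + 1) * (1.0f - fx) + lit(x0 + 1, y0 + 1) * fx;
    return top * (1.0f - fy) + bottom * fy;
}
```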
I'll send a rdc capture of my thing in a sec
Yeah I just did that
I have to leave soon so it'll be easier for me to do that than to look at yours
Sure, thanks a lot
frick it's 300 mb
one sec gotta use a smoller scene
this model is SM_Deccer_Cubes_Textured_Embedded.gltf
I'm drawing the g-buffer albedo on top of everything else in this one (at the end of the shading pass)
that's fine
Should I be concerned?
nah it's normal
for my thing
I call glInvalidateNamedFramebufferData before drawing the base pass, so it looks like that in renderdoc
Tbh, I think there's a good chance that your thing isn't wrong
Looks like it was my oversight
Nah it was wrong
I said making the offscreen framebuffer sRGB didn't work
Turns out it did, but I forgor to set it to sRGB when resizing
so it was just going back to UNORM
rip
Works fine now, onto poisson disk
This is going to take a long time to implement isn't it..
I'll just end the day with cascade to cascade interpolation
this is neat
Did you know that you can see noise in the penumbra of dynamic light shadows in half life 2
hahaha i dont actually know why they messed that up
on later source games (p2, csgo) they just used conventional filters
I launched HL2:DM a while ago and saw that most object shadows were just chunky hardware pcf, but for some reason I couldn't use the frickin flashlight
mp_flashlight 1
> most object shadows were just chunky hardware pcf,

thats the RTT shadows
anyways, here's what I'm talking about, but in gmod
yeah
its this
if im correct
they even have contact hardening
unfortunately it doesnt work
rip
Before I take the leap into the never-ending rabbit hole of MDI
Should I apply the bilateral filter before the albedo pass?
the noise doesn't bother me personally
btw, are you using blue noise?
or some kind of low discrepancy sequence
Just a regular poisson disk.
Hmm I'm sampling the whole disk sequentially, as in:
```
for each in poisson_disk { texture(shadow_map, uv + each * texel_size); }
```
A vec2[] with a lot of random values
how do you make those random values
Right now they're static, but I could make a poisson generator make them every frame, should I?
not yet
can you describe how you generate the random numbers though? this is the important detail I'm wondering about
because random numbers matter a lot, and sometimes you want certain kinds of random numbers
The ones in the poisson disk?
ye
I just took the array from some random GLSL sample I found on github 
Nope 😦
notice how there is no clumping and no voids, but the samples still appear to be somewhat randomly placed
Ah I see
white noise suffers from clumps and voids, which, while eventually converging to the correct result (with enough samples), is visually displeasing
I'll whip together a quick shadertoy to show how blue noise vs white noise looks for dithering
Alright
and then theres IGN
too soon
Oh hi, I have a new member in this rambling thread
Blue noise looks like it's the wrong way around?
the goal is to approximate a black2white gradient with a limited palette
this is how it looks for me
Oh it's correct then
Fair enough
The bottom one I thought was "wrong" but it's all good then.
ah
yeah the bottom one is showing what happens when you just quantize without any noise
anyways, the value of blue noise in sampling is that it gives you nice coverage of the sampling domain with few samples
low discrepancy sequences have a similar property (scroll down to the funny pictures)
https://en.wikipedia.org/wiki/Low-discrepancy_sequence
Hmm, so I use the output of blue noise to sample the shadow map then right?
yeah
blueNoise.rg
well, there's two places that need randomness
first, you're taking multiple samples per pixel, so you need some LDS/blue noise for those samples
second, you want each pixel to have a different seed to start with, otherwise you get big banding artifacts
resolution * frag_coord + time to the rescue
what I do is use a hammersley sequence to generate random samples on a disk for the samples, then sample a blue noise texture to generate per-pixel noise
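A Hammersley point is `(i / N, radical_inverse(i))`, where the base-2 radical inverse mirrors the bits of `i` across the binary point. The common bit-twiddling version is sketched below; in GLSL the chain of shifts can be replaced by `bitfieldReverse`:

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Base-2 radical inverse (Van der Corput): reverse the 32 bits of i and
// read them as a binary fraction in [0, 1).
float radical_inverse_vdc(std::uint32_t bits) {
    bits = (bits << 16u) | (bits >> 16u);
    bits = ((bits & 0x55555555u) << 1u) | ((bits & 0xAAAAAAAAu) >> 1u);
    bits = ((bits & 0x33333333u) << 2u) | ((bits & 0xCCCCCCCCu) >> 2u);
    bits = ((bits & 0x0F0F0F0Fu) << 4u) | ((bits & 0xF0F0F0F0u) >> 4u);
    bits = ((bits & 0x00FF00FFu) << 8u) | ((bits & 0xFF00FF00u) >> 8u);
    return static_cast<float>(bits) * 2.3283064365386963e-10f;  // / 2^32
}

// i-th point of an n-point Hammersley set in [0, 1)^2.
std::pair<float, float> hammersley(std::uint32_t i, std::uint32_t n) {
    return {static_cast<float>(i) / static_cast<float>(n), radical_inverse_vdc(i)};
}
```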
So uh
I'll show code
```
uv = hammersley(...);
shadow = texture(shadow_map, shadow_uv + texture(blue_noise, uv).rg * texel_size);
```
it's a little more involved than that, because hammersley only gives 2 random numbers in [0, 1], but we actually want a radius + angle (for disk sampling)
vec2 xi = 2 random numbers
float r = radius (I use sqrt because I want to uniformly sample the disk, no sqrt means you get more samples near the center)
then I do polar to cartesian on the next line
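Put together, the radius/angle step looks like this. The sketch assumes `xi` comes from a sequence such as Hammersley in [0, 1)^2 and returns an offset on the unit disk, which you would then scale by the kernel radius in texels:

```cpp
#include <cassert>
#include <cmath>
#include <utility>

// Maps xi in [0, 1)^2 to the unit disk with uniform area density:
// r = sqrt(xi.x), theta = 2*pi*xi.y, then polar -> cartesian.
std::pair<float, float> sample_disk(float xi_x, float xi_y) {
    constexpr float kTwoPi = 6.28318530717958647692f;
    const float r = std::sqrt(xi_x);
    const float theta = xi_y * kTwoPi;
    return {r * std::cos(theta), r * std::sin(theta)};
}
```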
Why does sqrt give you uniformness?
think of it this way
the circumference of a small circle is smaller than the circumference of a large circle
that is, at smaller values of r, points (at random angles) clump up more
I'm explaining this horribly
Nah I get it
Square root graph in [0;1] more or less looks like ln
you can find a proof of sqrt being the correct normalization thingy on stackoverflow
And it makes sense to sample less/(more?) at smaller radiuses
yeah so you want fewer small radii because points are closer on those
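The short version of the proof, using inverse transform sampling on the unit disk: the fraction of the area within radius $r$ is

$$
P(R \le r) = \frac{\pi r^2}{\pi \cdot 1^2} = r^2, \qquad 0 \le r \le 1
$$

Setting $u = r^2$ with $u \sim U[0, 1)$ and solving for $r$ gives $r = \sqrt{u}$. Using $r = u$ instead would put half of the samples inside $r = 0.5$, a region that holds only a quarter of the disk's area.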
I'll take the proof
Alright next query.
now I need to explain toroidal shifts
(AKA why there is fract in my code)
I don't see you sampling blue noise here, am I supposed to use vec2(r * cos(theta), r * sin(theta)) to sample the blue noise?
oh that's because this code isn't actually using blue noise 
but I can show you where you would plug it in
that shitty hash function on the line I linked should be replaced by sampling a blue noise texture
the fract, btw, is used for combining the two noise sources correctly
not exactly
you want to sample exactly the texel at gl_FragCoord % blueNoiseTextureSize
so you use texelFetch for that
well I did something stupid in my code that uses blue noise
textureLod(s_blueNoise, (vec2(gid) + 0.5) / textureSize(s_blueNoise, 0), 0).xy;
pretend gid is the same as gl_FragCoord (I use it because this is from a compute shader rather than a fragment shader)
idk why I didn't use texelFetch
We don't like division so texture(blue_noise, fract(gl_FragCoord / vec2(textureSize(blue_noise, 0))))

notice how we still divide, epic
I guess you can argue that mod is a lot slower on GPUs
but it's literally one op so who cares
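Both ideas from this exchange in one sketch: wrap the pixel coordinate into the tiled noise texture (what `texelFetch` at `gl_FragCoord % size` does), then combine the per-pixel noise with the shared low-discrepancy sample via a toroidal shift, also known as a Cranley-Patterson rotation. The helper names are hypothetical:

```cpp
#include <cassert>
#include <cmath>

// Wraps a pixel coordinate into a tiled noise texture, like
// ivec2(gl_FragCoord.xy) % textureSize(blue_noise, 0) feeding texelFetch.
// Plain % is fine here because pixel coordinates are never negative.
int wrap_texel(int pixel, int noise_size) {
    return pixel % noise_size;
}

// Toroidal shift: offset a shared low-discrepancy sample by per-pixel noise
// and wrap back into [0, 1), like GLSL fract().
float toroidal_shift(float sample, float per_pixel_noise) {
    const float shifted = sample + per_pixel_noise;
    return shifted - std::floor(shifted);
}
```

Every pixel still walks the same well-distributed sequence, just rotated by its own noise value, so neighbouring pixels don't share the same pattern (no banding) while each pixel keeps the sequence's even coverage.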
ah, I present you with two methods
A whopping two, incredible
method 1 (medium effort): use this thing
https://github.com/gao-duan/BlueNoise
method 2 (low effort): copy textures from here (what I always do)
http://momentsingraphics.de/BlueNoise.html
method 3 (high effort): implement your own algorithm
https://blog.demofox.org/2017/10/20/generating-blue-noise-sample-points-with-mitchells-best-candidate-algorithm/
pranked, I gave you three options
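For method 3, Mitchell's best-candidate algorithm is short enough to sketch. In this version the number of random candidates grows with the number of points already placed (following the idea in the linked post; the exact multiplier `k` is an assumption), and distances are measured on the unit torus so the point set tiles. Note that it produces blue-noise *sample points*; a blue-noise *texture* like the downloadable ones is usually made with a different algorithm (e.g. void-and-cluster). Cost is roughly cubic in the point count, so it suits a few hundred points:

```cpp
#include <cassert>
#include <cmath>
#include <random>
#include <utility>
#include <vector>

using Point = std::pair<float, float>;

// Squared distance on the unit torus: points near opposite edges count as close.
float toroidal_dist2(const Point& a, const Point& b) {
    float dx = std::fabs(a.first - b.first);
    float dy = std::fabs(a.second - b.second);
    dx = std::fmin(dx, 1.0f - dx);
    dy = std::fmin(dy, 1.0f - dy);
    return dx * dx + dy * dy;
}

// For each new point, try k * (points so far + 1) random candidates and keep
// the one whose nearest existing point is farthest away.
std::vector<Point> best_candidate(int count, int k, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<float> uni(0.0f, 1.0f);
    std::vector<Point> points;
    for (int n = 0; n < count; ++n) {
        Point best{};
        float best_dist = -1.0f;
        const int candidates = k * (static_cast<int>(points.size()) + 1);
        for (int c = 0; c < candidates; ++c) {
            const Point cand{uni(rng), uni(rng)};
            float nearest = 2.0f;  // above the max squared torus distance (0.5)
            for (const Point& p : points) nearest = std::fmin(nearest, toroidal_dist2(cand, p));
            if (nearest > best_dist) {
                best_dist = nearest;
                best = cand;
            }
        }
        points.push_back(best);
    }
    return points;
}
```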
btw here's some more reading
https://psychopath.io/post/2014_06_28_low_discrepancy_sequences
then, after all this, you can do the MDI or bilateral filteringisms if you feel the shadows are still too noisy
I wonder if you could implement blue noise generators on the GPU
afaik one challenge is that blue noise algorithms tend to be very serial
well these big brained people used DSP theory to make one on the GPU
https://tellusim.com/blue-noise/
blazing fast
6 minutes for 1024^2 blue noise on a 2080 Ti
What's a good resolution, 128^2?
if you want to go even deeper, you could check this out. I haven't managed to fully comprehend this article myself
https://developer.nvidia.com/blog/rendering-in-real-time-with-spatiotemporal-blue-noise-textures-part-1/
just try different ones and see what looks best
too small usually gives you obvious tiling artifacts
the biggest one I have in my repo is 256^2, so I guess I found that to be satisfactory for my uses
i saw void use 128^2 and have used it since, it feels like it also looks better than 32^2
I think it depends on what you use it for
ja
i used the big one for shadow
but it could also just be a brain messing with your vision thing
256^2 vs 16^2 blue noise for my rsm
Does this look good?
```
for (uint i = 0; i < sampled_count; ++i) {
    const ivec2 noise_size = textureSize(blue_noise, 0);
    const ivec2 noise_texel = ivec2(int(gl_FragCoord.x) % noise_size.x, int(gl_FragCoord.y) % noise_size.y);
    const vec2 xi = fract(sample_hammersley(i, sampled_count) + texelFetch(blue_noise, noise_texel, 0).xy);
    const float r = sqrt(xi.x);
    const float theta = xi.y * 2.0 * 3.1415926535897932384626433832795;
    const vec2 offset = vec2(r * cos(theta), r * sin(theta));
    const vec2 s_uv = shadow_frag_pos.xy + offset * scaled_texel_size;
    shadow_factor += texture(shadow_map, vec4(s_uv, cascade, light_depth)).r;
}
```
Shadows look very soft and nice
can i see closeup?
looks nice
also 3.1415926535897932384626433832795; u might wanna set up defines for common math constants
Hmm closeup is very sharp.
That's because I am using SDSM.
How can I soften shadows based on how far I am from them?
Too sharp
No, a regular layered framebuffer
ah
```
float calculate_kernel_shift() {
    float d = texture(depth, gl_FragCoord.xy / vec2(textureSize(depth, 0))).r;
    // depth-buffer value [0;1] -> NDC [-1;1] before linearizing (assumes the default GL depth range, no glClipControl)
    const float z_ndc = d * 2.0 - 1.0;
    d = (2.0 * camera.near * camera.far) / (camera.far + camera.near - z_ndc * (camera.far - camera.near));
    const float x = ((d - camera.near) / (camera.far - camera.near)) * 2.0 - 1.0;
    return clamp(1 / (1 + exp(-x)), 0.0, 1.0);
}
```
is this a good way of mapping linear depth [-1;1] to a sigmoid?
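One thing worth noting about the sigmoid: for `x` in [-1;1] the logistic function only spans about [0.269, 0.731], so the `clamp` never does anything and the kernel scale never reaches its extremes. A sketch that rescales the curve to hit 0 and 1 exactly, with a hypothetical `steepness` knob controlling how sharp the transition is:

```cpp
#include <cassert>
#include <cmath>

float logistic(float x) { return 1.0f / (1.0f + std::exp(-x)); }

// Maps normalized linear depth t in [0, 1] to a kernel scale in [0, 1] along an
// S-curve. Larger 'steepness' makes the transition sharper around t = 0.5.
float sigmoid_kernel_scale(float t, float steepness) {
    const float x = (t * 2.0f - 1.0f) * steepness;  // [0, 1] -> [-steepness, steepness]
    const float lo = logistic(-steepness);
    const float hi = logistic(steepness);
    return (logistic(x) - lo) / (hi - lo);  // rescale so the ends hit 0 and 1
}
```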
Damn my current mesh and model classes are completely incompatible with MDI 
heh realizing that is the first step to mdi
from looking through this thread you are making great progress
Thanks, I'm slowing down a bit since the things I want to do are getting exponentially harder 
But I'll still try to keep up
Hmm MDI is all about huge buffers
Real max chungus buffers
But even those humongous buffers will eventually run out (at least if I allocate them reasonably big and not take the whole VRAM with me)
How to handle this... hmm
Perhaps, each mesh is actually just a vertex offset + index offset + pointer to the "global" VBO/EBO?
When drawing I could probably do vector<vector<MESH_ID>> where the first vector is the index of the vbo
I would then use them like this?
```
vector<vector<mesh_handle>> stuff(MAX_VBOS, vector<mesh_handle>());
// when loading a mesh
auto [vbo_id, mesh_handle] = load_mesh();
stuff[vbo_id].emplace_back(mesh_handle);
// much later
for each [vbo_id, meshes] in stuff {
    bind_vbo(vbo_id);
    vector<MDI_struct> more_stuff;
    for each mesh in meshes {
        more_stuff.push_back({ mesh.indices, mesh.offsets, etc });
    }
    memcpy(mapped_indirect_buffer_ptr, more_stuff.data(), size_bytes(more_stuff));
    glMultiDrawElementsIndirect(...);
}
```
What to do about different shaders then, hmm...
Even more grouping, perhaps
vector<vector<vector<mesh_handle>>>
[vbo_id][shader_id][mesh_id]
I have no idea if I'm on the right track, how do you guys handle this?
mayhaps keep it a little more shrimple
here is what i do
i stuff all my mesh primitives into 1 vbo/ebo pair
via something called MeshPool
MeshPool keeps track of VertexOffset/VertexCount/IndexOffset/IndexCount per mesh primitive
and owns the VBO/EBO
i do something similar for my materials
then when i draw meshprimitive1, i grab it from the pool (vertexoffset/count/indexoffset/count) and add an entry in my indirectbuffer
and at the same time into my instancebuffer (instancebuffer is just my worldmatrix+materialindex per mesh i want to render)
then i bind all those
vbo/ebo/materialbuffer/instancebuffer/indirectbuffer and glMultiDrawElementsIndirect
all my materials use bindless textures
```
...
layout(location = 2) out flat int v_mesh_material_id;
...
struct GpuModelMeshInstance
{
    mat4 WorldMatrix;
    ivec4 MaterialId;
};

layout(binding = 1, std430) readonly buffer InstanceBuffer
{
    GpuModelMeshInstance Instances[];
} instanceBuffer;

void main()
{
    GpuModelMeshInstance modelMeshInstance = instanceBuffer.Instances[gl_BaseInstance + gl_DrawID];
    v_position = (modelMeshInstance.WorldMatrix * vec4(i_position, 1.0)).xyz;
    ...
    v_mesh_material_id = modelMeshInstance.MaterialId.x;
    ...
    gl_Position = cameraInformation.ProjectionMatrix * cameraInformation.ViewMatrix * vec4(v_position, 1.0);
}
```
thats my vs
```
...
#extension GL_ARB_bindless_texture : enable
#extension GL_ARB_gpu_shader_int64 : enable
...
layout(location = 2) in flat int v_mesh_material_id;
...
struct GpuMaterial
{
    vec4 BaseColor;
    vec4 MetalnessRoughnessOcclusion;
    vec4 EmissiveColor;
    uvec2 BaseColorTexture;
    uvec2 NormalTexture;
    uvec2 MetalnessRoughnessTexture;
    uvec2 SpecularTexture;
    uvec2 OcclusionTexture;
    uvec2 EmissiveTexture;
};

layout(binding = 2, std430) buffer MaterialBuffer
{
    GpuMaterial[] Materials;
} materialBuffer;
...
void main()
{
    GpuMaterial material = materialBuffer.Materials[v_mesh_material_id];
    ...
}
```
and my fs
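A minimal sketch of the MeshPool bookkeeping described above, assuming a single shared VBO/EBO pair and linear (append-only) allocation. The command struct follows the layout `glMultiDrawElementsIndirect` expects; `MeshPool` and `MeshRange` are illustrative names, not the poster's actual code:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Command layout read by glMultiDrawElementsIndirect (five 32-bit fields).
struct DrawElementsIndirectCommand {
    std::uint32_t count;           // number of indices to draw
    std::uint32_t instance_count;  // usually 1
    std::uint32_t first_index;     // offset into the shared EBO, in indices
    std::int32_t  base_vertex;     // offset into the shared VBO, in vertices
    std::uint32_t base_instance;   // optional slot into the instance buffer
};
static_assert(sizeof(DrawElementsIndirectCommand) == 20, "must be tightly packed");

// Where one mesh primitive lives inside the shared VBO/EBO.
struct MeshRange {
    std::uint32_t vertex_offset, vertex_count;
    std::uint32_t index_offset, index_count;
};

// Hypothetical MeshPool: appends meshes one after another and remembers
// their ranges so draws can be turned into indirect commands.
class MeshPool {
public:
    std::uint32_t add(std::uint32_t vertex_count, std::uint32_t index_count) {
        meshes_.push_back({next_vertex_, vertex_count, next_index_, index_count});
        next_vertex_ += vertex_count;
        next_index_ += index_count;
        // A real pool would upload the vertex/index data here,
        // e.g. with glNamedBufferSubData at the matching byte offsets.
        return static_cast<std::uint32_t>(meshes_.size() - 1);
    }

    DrawElementsIndirectCommand command_for(std::uint32_t mesh_id,
                                            std::uint32_t instance_slot) const {
        const MeshRange& m = meshes_[mesh_id];
        return {m.index_count, 1u, m.index_offset,
                static_cast<std::int32_t>(m.vertex_offset), instance_slot};
    }

private:
    std::vector<MeshRange> meshes_;
    std::uint32_t next_vertex_ = 0;
    std::uint32_t next_index_ = 0;
};
```

Whether `base_instance` should carry the instance-buffer slot depends on how the vertex shader indexes: with `gl_BaseInstance + gl_DrawID` as in the shader above, filling `base_instance` with the per-draw slot as well would count it twice, so one of the two is normally left at zero.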
Hmm yes but if two meshes in vbo 0 need different shaders, you need a way to differentiate right?
Can this happen, is it useless?
It could happen
I guess this doesn't concern VBOs though, just how you group them for draw
for each vbo -> for each shader -> for each mesh
you could keep vbo/ebo/instancebuffer and just switch the pipeline to some other shader mayhaps a different materialpool, and then its also just glMDI
I have one shader for my mdi stuff
"pbr"
and one big vertex buffer and index buffer
then, all I have to manage are the little meshies
for an actual game, you'd probably want a way to put a different shader on a mesh (for effects and stuff)
perhaps group them by that shader
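Grouping by shader then reduces to bucketing draws by a (buffer set, shader) key, with one `glMultiDrawElementsIndirect` call per bucket. A sketch where `DrawKey` and the ids are hypothetical; `std::map` orders buckets by buffer set first, so consecutive batches tend to reuse the same VBO/EBO binding:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical batching key: which VBO/EBO pair a mesh lives in and which
// shader program draws it.
struct DrawKey {
    std::uint32_t buffer_set;
    std::uint32_t shader;
    bool operator<(const DrawKey& other) const {
        if (buffer_set != other.buffer_set) return buffer_set < other.buffer_set;
        return shader < other.shader;
    }
};

// Buckets visible mesh ids by key. Each bucket would become one
// "bind buffers + program, upload commands, glMultiDrawElementsIndirect" step.
std::map<DrawKey, std::vector<std::uint32_t>> group_draws(
    const std::vector<std::pair<DrawKey, std::uint32_t>>& visible_meshes) {
    std::map<DrawKey, std::vector<std::uint32_t>> batches;
    for (const auto& [key, mesh_id] : visible_meshes) {
        batches[key].push_back(mesh_id);
    }
    return batches;
}
```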
Fair enough, I'll think about what happens if the VBO fills up later
What allocator do you guys use?
Regular free-list allocator? Something fancier?
anything you like
a shrimple linear allocator will work for a shrimple app
consider if you actually need to be able to delete stuff
I'm an ambitious person
My snow game thingy needs to be open world because I like open worlds
So I think I need to stream meshes in and out?
yeah
you might have to consider LOD too
and keep various lods of your mesh in the vbo/ebo etc
Yes, to keep things spicy
: )
Alright, free-list it is
another thing with the mdiism you can do
which i also want to do at some point, not just lod
but also cull primitives
and have that compute shader populate the indirectbuffer
Ah the thing where the GPU feeds itself, it was in the AZDO thing yes
It's very far down in my list™
I made a crappy mdi allocator for my voxel engine
It just tracks blocks of memory and splits them up or combines them as necessary
O(N) for everything, but it works fine since I don't have a billion tiny allocations
I smell overengineering 
when are you going to write your LustriOS to run your LustriGL-mmo on it?
I'm not that ambitious 
the lie detector detected: its a lie
Considering I am yet to write a single line to actually implement the things I'm talking about here, I confirm the lie detector's result
Of course not, slow and steady
By the way, any feedback on this thing: https://github.com/LVSTRI/Iris/blob/master/shaders/4.1/simple.frag#L229
Since SDSMs are too damn precise when very close to the shadows, I needed to figure out a way to increase the PCF kernel, but using view_depth gave me too much shimmering, so I settled with this
Can I do better?
Btw glsl has a built in bitfieldReverse function
You have to make a hack in shadertoy because glsl es is scuffed beyond belief
Nice, not so nice for shadertoy though
I wouldn't want to spend more time on shadows, but is a bilateral filter + view depth based kernel size going to reduce shimmering?
bilateral filter will not fix temporal shimmering
But temporal accumulation will?
yeah, some form of it
but that's a big brain worm
the view depth based kernel size may work
How big of a worm?
you can spend infinite time tweaking your temporal reprojection
one does not shrimply implement it "correctly"
Interesting
it also requires deep integration with the rest of your renderer to support motion vectors
How would this temporal accumulation work, at a high level I mean
Just the overview of the algorithm
game studios sometimes employ one or two people whose sole job is to make sure TAA works

The basic idea is to temporally jitter your samples, then reuse those samples in the next frame
so you are distributing samples both spatially and temporally
Hmm
reusing temporal samples is tricky because the scene is dynamic
you have to figure out how much stuff has moved (with motion vectors) and detect where disocclusion has occurred (areas that were hidden or off-screen in the previous frame)
failing to do any of that perfectly can lead to a myriad of artifacts. You probably know about TAA ghosting already, which is when your sample rejection heuristic didn't reject a sample it should've
Sure
I am one of the few people that hates all kind of ghosting/shimmering
DLSS, DLAA, FSR, TAA, you name it
I don't like them, I'd rather stick with FXAA
But for shadows only it shouldn't be a big deal right?
Well purely screen space techniques will have pretty bad shimmering (specular aliasing) compared to TAA
But no ghosting
Yeah, probably
I have a presentation somewhere
```
render_jittered_shadow_maps_and_motion_vectors();
if (check_if_any_bad_things_happened_such_as_disocclusion()) {
    discard_history_or_something();
}
read_previous_screen_space_shadow_only_pass_and_merge_with_history();
render_scene_with_new_screen_space_history_buffer();
```
I think
I'm unsure whether I need to jitter the shadow maps themselves, or a shadow-only render of the scene from the camera's POV (or neither of them and I'm just rambling)
I haven't done temporal shadows so I'll theorize with you
I guess jittering the shadow projection is what you want
When sampling or when drawing?
Because swimming is produced by aliasing in the shadow map
When drawing the shadow map
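Jittering the shadow projection when drawing could look like this: offset the light's orthographic projection by a sub-texel amount from a low-discrepancy sequence each frame. This sketch assumes a Halton(2, 3) sequence cycling over `period` frames, a common choice for TAA-style jitter but not something the thread settled on; `shadow_jitter` is a hypothetical helper:

```cpp
#include <cassert>
#include <utility>

// Radical inverse of 'index' in 'base' (one Halton sequence element).
float halton(unsigned index, unsigned base) {
    float result = 0.0f;
    float f = 1.0f;
    while (index > 0) {
        f /= static_cast<float>(base);
        result += f * static_cast<float>(index % base);
        index /= base;
    }
    return result;
}

// NDC offset for frame 'frame' of a shadow map 'resolution' texels wide,
// cycling through 'period' Halton(2, 3) points.
std::pair<float, float> shadow_jitter(unsigned frame, unsigned period, float resolution) {
    const unsigned i = frame % period + 1;  // index 0 would always give (0, 0)
    const float jx = halton(i, 2) - 0.5f;   // in texels, [-0.5, 0.5)
    const float jy = halton(i, 3) - 0.5f;
    // One texel spans 2 / resolution in NDC, since NDC covers [-1, 1].
    return {jx * 2.0f / resolution, jy * 2.0f / resolution};
}
```

The offset would be added to the x/y translation of the light's orthographic projection (where `w = 1`, so clip space equals NDC) before building the shadow `pv`. On its own this only makes the aliasing change every frame; it pays off together with temporal accumulation of the shadow mask.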
I see
temporally filtering shadows is a bit funky though
you have to write the shadow mask to a texture and filter that I think
which means that you can practically filter a limited number of shadows (like just from the sun)
generic TAA is nice here because it applies to everything
There will be no point lights in my game, only area lights 
The mask is basically everything you do for shadows except multiplying it by the lighting result
aka how occluded the light is
Filter that here applies to "do temporal accumulation" right?
And then that results will get used by the final pass where I draw pretty colors multiplied by the filtered result?
It makes sense, the only thing I have no idea how to do is "do temporal accumulation"
maybe there is some clever way to filter multiple light shadows efficiently that I'm not aware of
I'm sure it's more involved than mix(old, new, 1 / (frames + 1))
"draw the rest of the owl"
Yeah exactly 
well it kinda boils down to that
in fact, that code will work if nothing moves
I legit suggest trying that to start
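As a starting point, that `mix(old, new, 1 / (frames + 1))` is a running mean of the shadow mask. A CPU sketch of the history logic, assuming the whole history is simply reset whenever the camera or light moves (no reprojection yet); `ShadowHistory` is a hypothetical name:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// GLSL-style mix: a * (1 - t) + b * t.
float mix(float a, float b, float t) { return a * (1.0f - t) + b * t; }

// Naive temporal accumulation of a screen-space shadow mask. With weight
// 1 / (frames + 1) the history is the mean of every mask seen since the last
// reset, which is only valid while nothing moves.
struct ShadowHistory {
    std::vector<float> value;
    unsigned frames = 0;

    void accumulate(const std::vector<float>& current) {
        if (value.size() != current.size()) {  // first frame or resize
            value.assign(current.size(), 0.0f);
            frames = 0;
        }
        const float t = 1.0f / static_cast<float>(frames + 1);
        for (std::size_t i = 0; i < value.size(); ++i) {
            value[i] = mix(value[i], current[i], t);
        }
        ++frames;
    }

    // Call when the camera or light moves; a later version would reproject instead.
    void reset() { frames = 0; }
};
```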
I will
then upgrading it with reprojection
but now I sleepy
Good theorycrafting session, I'll see if I can find a paper or something on this
I found the link I was looking for btw
https://developer.download.nvidia.com/gameworks/events/GDC2016/msalvi_temporal_supersampling.pdf
I awoke.
I also read the slides and they're basically telling me to apply TAA to the mask and sample that
But TAA is a huge jar of worms.
luckily you can get away with more hacks when it's just shadows
Therefore TSSM™ (Temporal Supersampling Shadow Mapping (New technique patented by Jaker and Yours Truly)) is postponed
embrace imperfection
your shadows are pretty awesome already
way cooler than anything I've done for them
It's on the list™
noice
Time for MDI
"just add contact hardening" 
its a can of worms
its a worm of cans
do they sell cups of wriggling worms
wouldnt surprise me they have shops outside, even in the US like Lidl and Aldi
We're back to our humble origins.
And we have an indirect triangle
noice
noice, even srgb correct it seems hehe
I'm wondering something though, whether I should switch to: every attribute gets its own vertex buffer
Instead of every attribute is tightly packed into one vertex buffer, I wonder if that makes handling multiple vertex formats easier?
i personally just use interleaved vertexformats
the only difference for you is how you setup the vao, everything else remains the same
you also may or may not save a few bytes here or there, depending on what attribute you want to source
Upon pondering I have decided that making one VBO per attribute actually makes my life harder.
Interleaved it is.
maybe try both
see what works best for you or the scenario you want to render
maybe interleaved makes more sense for a general purpose mesh, but separate buffers make more sense for something like a planet or so
I've thought about that, but in the case of missing attributes, if you don't want to specify separate offsets for each attribute in each VBO, you'd have to pad with zeros
Creating lots of waste
usually meshes for some pbrism have positions and same amounts of normals uvs and tangents etc
I am thinking.
VAO store the format of the attributes in the VBO.
Is it then reasonable to create one VAO per VBO/EBO combination?
With these VBO/EBO being, of course, very large
imo you should only have one vao per vertex format
then use glVertexArrayAttribFormat and co. to facilitate that
Yeah but another vertex format implicitly means another VBO (In my thing at least)
and glVertexArrayVertexBuffer/glVertexArrayElementBuffer to bind VBO/EBO when necessary
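With one VAO per vertex format, the per-format information is just attribute offsets and a stride. A sketch for the interleaved static-geometry format, assuming float attributes with no padding; the comments show where each value would go in the DSA calls mentioned above:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical interleaved layout for the "static geometry" vertex format.
struct StaticVertex {
    float position[3];  // attribute 0
    float normal[3];    // attribute 1
    float uv[2];        // attribute 2
    float tangent[4];   // attribute 3 (w = handedness)
};

// Setup for the single VAO of this format (vao/vbo/ebo are GLuint handles):
//   glEnableVertexArrayAttrib(vao, 0);
//   glVertexArrayAttribFormat(vao, 0, 3, GL_FLOAT, GL_FALSE, kPositionOffset);
//   glVertexArrayAttribBinding(vao, 0, 0);
//   ... same for attributes 1-3 with their offsets ...
// Per batch, swap buffers without touching the format:
//   glVertexArrayVertexBuffer(vao, 0, vbo, 0, kStride);
//   glVertexArrayElementBuffer(vao, ebo);
constexpr std::size_t kPositionOffset = offsetof(StaticVertex, position);
constexpr std::size_t kNormalOffset   = offsetof(StaticVertex, normal);
constexpr std::size_t kUvOffset       = offsetof(StaticVertex, uv);
constexpr std::size_t kTangentOffset  = offsetof(StaticVertex, tangent);
constexpr std::size_t kStride         = sizeof(StaticVertex);
static_assert(kStride == 48, "no padding expected between float members");
```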
I found that if I stuffed more than one vertex format in a VBO, offsets would get funky
you can group geometry into various vbos if you need/want
That's why I decided this one-to-one relationship between vertex formats and VBOs
in general geometry is usually the same though, pos/normal/uv/tangents for static geometry + weights for animated
What do you mean?
game worlds are made out of the same type of geometry
on top of that you might have things like foliage or UI elements or special geometry for effects of sorts, those dont share the "typical" vertex format
be it your space game with big asteroid fields, or your open world mmo with dunes and rocks or cities
you might only have 4, 5, 6 vertex formats
1) your pos/normal/uv/tangents for static geometry... houses... rocks... statues... ships
2) same as 1) but with weights and bonematrices for animated geometry
3) some debug nonsense, perhaps simple lines or grids, which is just position and color
4) perhaps some font rendering
5) mayhaps part of 4) UI elements... just pos and uv for instance
maybe 1) and 2) are split into several vbos/ebo maybe not
Hmm.
when noobs ask in #opengl we usually suggest just making 1 vao per mesh, when they start out
95% of them will never come back and ask for more because it was just some homework or so, and doing that (vao per mesh) is also completely fine
there is another camp of people
those who reuse 1 vao for everything, just like jasper said he reuses 1 fbo for all the shtuff
I have decided.
anyways, sounds like lvstri will use 1 vao per vbo anyways due to mdi
One VAO per vertex format it is.
Nah, I forgor about EBOs
So there actually wasn't this one-to-one relationship, I would end up creating redundant VAOs where I could've just done glVertexArrayElementBuffer
I'm a little confused
Wouldn't you normally have 1 vbo (or set of vbos) and one index buffer per vao with mdi
Yeah, but say your VBO is full but your EBO can still house stuff
Attempting to insert a new mesh would result in a new VBO but the same EBO
So two mesh belonging to the same EBO could have different VBOs and vice versa.
i am also a little conchfused now
if you assume all your geometry has verticles and indicles
then you always bind both
and provide both (data wise)
if you add a new mesh, you add vertices to the vbo and indices to the ebo
if you cant fit, then its a different problem
you could try to evict old geometry/indices or simply your renderer knows it needs to switch to a different set of vbo/ebo because the other is "full"
But VBO and EBO are not tied
they are kinda
ofc you can store stuff in there however you want, and make it super complicated
Say vbo[0] is full but ebo[0] is not, a new mesh would go in vbo[1] and ebo[0]
then your renderer needs to keep track of that
Hmm so you're saying I could just make a VBO and EBO a single "package"
yeah
Definitely sounds simpler.
like my MeshPool class for instance
owns 1 vbo and 1 ebo (for now, i dont need more right now but while talking to you right now ...)
i could make it own as many as it wanted, but also just keep track of what mesh lives in what "pair" of vbo/ebo
and then bind the right ones later during draw
i might have to split my glMDI into n calls too
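The "VBO + EBO as one package" idea can be sketched on the CPU side like this, assuming plain linear (bump) allocation inside each pair and made-up capacities. Each mesh goes into the first pair with room for both its vertices and its indices, a new pair is opened when none fits, and the returned location records which pair the mesh lives in so draws can later be split into one MDI call per pair. `MeshPool`, `BufferPair` and `MeshLocation` are hypothetical names (not anyone's actual class from this thread), and the GL buffer objects are left out so only the bookkeeping is shown.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Where a mesh ended up (hypothetical type): which VBO/EBO pair, and offsets in elements.
struct MeshLocation {
    std::size_t pair;          // index of the VBO/EBO pair
    std::uint32_t baseVertex;  // first vertex inside that pair's VBO
    std::uint32_t firstIndex;  // first index inside that pair's EBO
    std::uint32_t indexCount;
};

// One VBO + one EBO treated as a single package; only the fill levels are tracked.
struct BufferPair {
    std::uint32_t vertexCapacity, indexCapacity;
    std::uint32_t vertexUsed = 0, indexUsed = 0;

    bool fits(std::uint32_t vertices, std::uint32_t indices) const {
        return vertexCapacity - vertexUsed >= vertices &&
               indexCapacity - indexUsed >= indices;
    }
};

// Hypothetical pool: linear allocation, first pair with room for BOTH buffers wins.
class MeshPool {
public:
    MeshPool(std::uint32_t vertexCapacity, std::uint32_t indexCapacity)
        : vertexCapacity_(vertexCapacity), indexCapacity_(indexCapacity) {}

    MeshLocation add(std::uint32_t vertexCount, std::uint32_t indexCount) {
        if (vertexCount > vertexCapacity_ || indexCount > indexCapacity_)
            throw std::length_error("mesh is larger than a whole VBO/EBO pair");

        std::size_t p = 0;
        while (p < pairs_.size() && !pairs_[p].fits(vertexCount, indexCount)) ++p;
        if (p == pairs_.size())  // a real pool would glCreateBuffers() a new pair here
            pairs_.push_back(BufferPair{vertexCapacity_, indexCapacity_});

        BufferPair& pair = pairs_[p];
        const MeshLocation loc{p, pair.vertexUsed, pair.indexUsed, indexCount};
        pair.vertexUsed += vertexCount;
        pair.indexUsed += indexCount;
        return loc;
    }

    std::size_t pairCount() const { return pairs_.size(); }

private:
    std::uint32_t vertexCapacity_, indexCapacity_;
    std::vector<BufferPair> pairs_;
};
```

Drawing then means one `glMultiDrawElementsIndirect` per pair: attach that pair's buffers to the VAO with `glVertexArrayVertexBuffer` / `glVertexArrayElementBuffer`, then submit only the commands whose `pair` matches.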
or even more perverse, when you do vertexfetch/pull you could bind several vbo/ibo as ssbo, then might not need to "batch" or "split" your mdi calls, but then you still need to keep track which mesh sits in which of them so that you can index from the vertexshader
I don't even consider the possibility of a vbo or ibo being overfilled in mine
just make beeg buffers
allocate(glGetInteger(GL_VRAM))
I hope MeshPool isn't trademarked because I'm stealing the name
: )
absolutely naive
i keep MeshPrimitive around for some reason
thats the actual vertex container on the cpu, which i get from my gltfloader, pos/normal/uv/andwhatnot
Ah you went for the linear allocator, fair fair.
yeah
I wrote a generic free-list one because it might turn out useful for other stuff in the future maybe?
perhaps
Not sure but it's pretty short and I already knew how to make one
Speaking of, I always wondered if the best we can do for best-fit is O(N)? Are there fancier methods?
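On the O(N) question: best-fit doesn't have to scan. If the free blocks are also kept in a size-ordered tree, the smallest block that fits is a single `lower_bound`, so O(log N). Segregated-fit allocators such as TLSF go further and get O(1) using bitmaps of size classes, in exchange for a "good fit" rather than an exact best fit. A sketch of the tree version, assuming sizes and offsets in arbitrary units, no alignment handling, and no double-free checks; `BestFitAllocator` is a hypothetical name.

```cpp
#include <cstdint>
#include <iterator>
#include <map>
#include <optional>

// Best-fit allocator over [0, capacity), e.g. a region of one big GPU buffer.
// Free blocks are indexed twice: by offset (to merge neighbours on release) and by
// size (so "smallest block that fits" is one lower_bound, O(log N)).
class BestFitAllocator {
public:
    explicit BestFitAllocator(std::uint64_t capacity) {
        if (capacity > 0) insertFree(0, capacity);
    }

    std::optional<std::uint64_t> allocate(std::uint64_t size) {
        if (size == 0) return std::nullopt;
        const auto it = bySize_.lower_bound(size);  // smallest free block >= size
        if (it == bySize_.end()) return std::nullopt;
        const std::uint64_t blockSize = it->first, offset = it->second;
        eraseFree(offset, blockSize);
        if (blockSize > size) insertFree(offset + size, blockSize - size);  // keep the tail
        return offset;
    }

    // Assumes (offset, size) is a block previously returned by allocate().
    void release(std::uint64_t offset, std::uint64_t size) {
        // Merge with the next free block if it starts exactly where this one ends.
        const auto next = byOffset_.lower_bound(offset);
        if (next != byOffset_.end() && offset + size == next->first) {
            size += next->second;
            eraseFree(next->first, next->second);
        }
        // Merge with the previous free block if it ends exactly where this one starts.
        const auto after = byOffset_.lower_bound(offset);
        if (after != byOffset_.begin()) {
            const auto prev = std::prev(after);
            if (prev->first + prev->second == offset) {
                offset = prev->first;
                size += prev->second;
                eraseFree(prev->first, prev->second);
            }
        }
        insertFree(offset, size);
    }

private:
    void insertFree(std::uint64_t offset, std::uint64_t size) {
        byOffset_.emplace(offset, size);
        bySize_.emplace(size, offset);
    }
    void eraseFree(std::uint64_t offset, std::uint64_t size) {
        byOffset_.erase(offset);
        // Equal sizes share a key, so pick the entry with the matching offset.
        auto [first, last] = bySize_.equal_range(size);
        for (auto it = first; it != last; ++it) {
            if (it->second == offset) { bySize_.erase(it); break; }
        }
    }

    std::map<std::uint64_t, std::uint64_t> byOffset_;     // offset -> size
    std::multimap<std::uint64_t, std::uint64_t> bySize_;  // size -> offset
};
```

The price is two tree updates per operation instead of one list splice, which is irrelevant at the allocation rates discussed here.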
i suppose it depends entirely on your use case or how you want to write/use code
Not that I particularly care tbh
and performance is probably not really of concern, but who knows
Yeah, it's not like this allocator will be used profusely, even when streaming data
the allocator I mentioned was used for storing voxel meshes (which change fairly frequently) and it was never even close to being a bottleneck
Yeah we're absolutely fine here, alright, time to do the meshpool™️
The mesh pool is done and it was surprisingly not painful™️ to do.
All that remains is the grouping and batching
Then I should be able to plug this bad boy into my model loader.
Emphasis on the should.
gl_DrawID my absolute beloved.
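The grouping/batching step could look roughly like this, assuming each mesh already knows which VBO/EBO pair it lives in plus its `baseVertex`/`firstIndex` (the hypothetical `DrawItem` below). The command struct mirrors the field order of `DrawElementsIndirectCommand`, which `glMultiDrawElementsIndirect` reads as five 32-bit values (20 bytes). One thing to keep in mind with `gl_DrawID`: it restarts at 0 for every multi-draw call, so once draws are split into several MDI calls, per-object data needs some other global index. Here `baseInstance` carries it, which is one common choice but an assumption on my part.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Same field order as the GL DrawElementsIndirectCommand read by glMultiDrawElementsIndirect.
struct DrawElementsIndirectCommand {
    std::uint32_t count;          // number of indices to draw
    std::uint32_t instanceCount;
    std::uint32_t firstIndex;     // offset into the EBO, in indices
    std::int32_t  baseVertex;     // added to every fetched index
    std::uint32_t baseInstance;   // reused here to carry a global object id (assumption)
};
static_assert(sizeof(DrawElementsIndirectCommand) == 20, "must match the GL layout");

// Hypothetical per-object draw info coming out of the mesh pool.
struct DrawItem {
    std::size_t pair;             // which VBO/EBO pair the mesh lives in
    std::uint32_t indexCount, firstIndex, baseVertex;
};

// Group draws by buffer pair: one command list, hence one MDI call, per pair.
std::map<std::size_t, std::vector<DrawElementsIndirectCommand>>
buildBatches(const std::vector<DrawItem>& items) {
    std::map<std::size_t, std::vector<DrawElementsIndirectCommand>> batches;
    for (std::uint32_t objectId = 0; objectId < items.size(); ++objectId) {
        const DrawItem& d = items[objectId];
        batches[d.pair].push_back({d.indexCount, 1u, d.firstIndex,
                                   static_cast<std::int32_t>(d.baseVertex), objectId});
    }
    return batches;
}
```

In the vertex shader, `gl_DrawID` then indexes per-draw data within one batch; to address a global per-object array across batches, either read `gl_BaseInstance` (GLSL 4.60 / ARB_shader_draw_parameters) or add a per-batch offset uniform to `gl_DrawID`.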
how's perf atm
Probably meaningless to measure, there's no significant load anywhere, I'll report once I get shadows and compute up and running on the indirect renderer
noice
mayhaps introduce Labels for your things too
and pass them down to glObjectLabel that makes debugging schlightly more useful
Labels?
Hmm right now my model class takes in a "mesh pool", does that make sense or should I do something else 🤔
i think those are separate
or rather it depends
Model can be the sole container of the verticles/indicles/materials you loaded from a file
or its your "abstraction" for how you setup your world
where Model is just a handle knowing about pooled material and the pooled mesh counterpart
Aha, so separating loading from uploading
if you still need vertexdata after uploading, you can keep it around
but ideally you (and i, though i haven't done it yet) shouldn't need to keep vertex data around after uploading
boundingsphere/boxes can be calculated from vertexpositions at loading time, navmeshes or the likes are probably created in a similar fashion
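Computing bounds while the CPU-side positions are still around, so the vertex data can be dropped after upload, might look like the sketch below. It uses the AABB centre plus the farthest vertex as the sphere: simple, and it always encloses every point, but it is not the minimal sphere (Ritter's or Welzl's algorithms get tighter ones). `Vec3`, `Bounds` and `computeBounds` are hypothetical names.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct Vec3 { float x, y, z; };

// Hypothetical load-time bounds: AABB plus an enclosing (not minimal) sphere.
struct Bounds {
    Vec3 min, max;   // axis-aligned box
    Vec3 center;     // sphere centre (= box centre)
    float radius;    // distance to the farthest vertex
};

// Assumes at least one position.
Bounds computeBounds(const std::vector<Vec3>& positions) {
    Vec3 lo = positions.front(), hi = positions.front();
    for (const Vec3& p : positions) {
        lo = {std::min(lo.x, p.x), std::min(lo.y, p.y), std::min(lo.z, p.z)};
        hi = {std::max(hi.x, p.x), std::max(hi.y, p.y), std::max(hi.z, p.z)};
    }
    const Vec3 c{(lo.x + hi.x) * 0.5f, (lo.y + hi.y) * 0.5f, (lo.z + hi.z) * 0.5f};
    float r2 = 0.0f;  // squared radius; sqrt only once at the end
    for (const Vec3& p : positions) {
        const float dx = p.x - c.x, dy = p.y - c.y, dz = p.z - c.z;
        r2 = std::max(r2, dx * dx + dy * dy + dz * dz);
    }
    return {lo, hi, c, std::sqrt(r2)};
}
```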
it probably depends on the use case/situation
Does it make sense to have a renderable class which contains all the information required to draw that thing?
I.e: texture handles, mesh handles, the model it belongs to, etc.
Are you using bindless textures
Yes, I'm trying to debug why they don't work
Well, RenderDoc doesn't support bindless textures
You now must use nsight graphics to debug
RenderDoc supports Vulkan "bindless" though 
idk
you can do bindless-like things in GL without actual bindless
such as using array textures
or you can use regular textures and just batch MDI draws by material (which is probably the most practical non-bindless approach)
I'll try using nsight for a while..
I have good news by the way
Well, both good and bad.
Good: bindless textures work!
Bad: mipmaps are... just gone?
I guess vulkan bindless is easier to implement because it's really just "bind everything and index" rather than truly binding nothing
And mipmaps are back
what was the buge?
glTextureStorage2D with levels = 1 instead of floor(log2(max(w, h))) + 1
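Presumably the `glTexImage2D` path worked because `glGenerateMipmap` on a mutable texture allocates the missing levels by itself, whereas `glTextureStorage2D` creates immutable storage with exactly `levels` levels, so a full chain needs `floor(log2(max(w, h))) + 1`. An integer version of that formula avoids float `log2` rounding; `fullMipCount` is a hypothetical helper name.

```cpp
#include <algorithm>
#include <cstdint>

// Number of levels in a full mip chain down to 1x1: floor(log2(max(w, h))) + 1,
// computed by repeated halving instead of a floating-point log2.
std::uint32_t fullMipCount(std::uint32_t width, std::uint32_t height) {
    std::uint32_t size = std::max(width, height);
    std::uint32_t levels = 1;
    while (size > 1) {
        size >>= 1;
        ++levels;
    }
    return levels;
}
```

Usage would be along the lines of `glTextureStorage2D(tex, fullMipCount(w, h), GL_RGBA8, w, h);`, then uploading level 0 with `glTextureSubImage2D` and calling `glGenerateTextureMipmap(tex);`.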
I didn't have to do this with glTexImage2D
Also half of my draw loop is now filling up buffers 
glTexImage sucksssss
True, I also took the opportunity and "modernized" my thing.
Now everything is DSA.
excellent
Much much better to use to be honest, it's quite nice.
much good. very sane
glBindThing(thing); glDoThingToThing(); is API dreamed up by the deranged
i hope that will haunt these people forever
i remember learning opengl and my jaw was on the floor
was that before or after learning vulkan
because I'd imagine that most wouldn't realize the API is absolutely unhinged if they learned GL first
Yeah, to be honest if you told me OpenGL was well designed I'd probably believe you.
I didn't have many issues using Core 3.3
modern GL is a lot better, but still has a myriad of questionable design choices
for example, there isn't a single struct in the entire API
The only thing I wish OpenGL had is more debug features.
The debug callback really didn't help me all that much...
For some reason when the error occurs, it's always on glXSwapBuffers or something
I find the callback to be quite helpful in diagnosing API misuse
Not on the site of the actual call
I... am not sure?
```cpp
#if !defined(NDEBUG)
glEnable(GL_DEBUG_OUTPUT);
glDebugMessageCallback([] (GLenum source,
                           GLenum type,
                           GLuint id,
                           GLenum severity,
                           GLsizei length,
                           const GLchar* message,
                           const void*) {
    if (severity == GL_DEBUG_SEVERITY_NOTIFICATION) {
        return;
    }
    std::cout << "debug callback: " << message << std::endl;
    if (severity == GL_DEBUG_SEVERITY_HIGH) {
        debug_break();
    }
}, nullptr);
#endif
```
Here
add glEnable(GL_DEBUG_OUTPUT_SYNCHRONOUS); to that snippet
then make an invalid GL call somewhere like glDrawArrays(1234, 0, 0);
Interesting, unfortunately I am not able at the moment since I borked the thing
Will try as soon as I unborke the borke
before, but I just found binding stuff really awkward
i got used to it then I switched to vulkan 
heh
despite gl being somewhat clunky, i know how to get a tringle on a screen with no effort
if i did the same in vulkan, id probably get half of the frametimes
aye I don't blame you. vulkan is a beast
im still too afraid to play with it
I'll try Vulkan sooner or later, probably after my Snow™️ thingy
So very far into the future 
from what I've seen in this thread, I think you've been really productive with OpenGL
I have a feeling you'll like 🅱️ulkan
but also you're already productive af with OpenGL, so maybe don't ruin that by using vulkan 
I like 🦐le stuff to be honest
this 100%. i wish I had stayed with opengl longer instead of jumping to vulkan
Now, if the unhingedness of the API outweighs the 🦐le-ness then it's another matter entirely 
you just gotta make/find the right wrapper to re-hinge it
shilling time: #1019779751600205955
Ah yes, the "un-unhinged OpenGL™️"
lol some of the cupboards in my house have that mechanism and it's a pain in the ass
yeah
it keeps schlipping
exactly because of that reason
and when you try to balance, you have to do all 4 of them not just the 2
fuck doors
all my homies love windows (the kind that you peer through)
you just got played
I have been reading
512MiB of garbage data past the end of my buffer
And the driver just goes: "yeah sure, why not"
Instead of crashing, brilliant.
And we're back!
Now my GPU is getting utilized quite well in this scene, previously one frame took 70ms, now it's 5ms.
Yes, well, in this scene there were something like 30'000 draw calls 
I will soon, once I figure out why textures work fine normally but stop working in NSight
oh?
Yeah, these textures are very clearly wrong.
But it works fine if I don't run my app in NSight...
most likely you're missing samplers
glGetTextureSamplerHandle + glMakeTextureHandleResident
Hmm I can see all my textures resident in NSight though..
It's probably the texture ids that are wrong?
This tree is sus.
imposter tree
In NSight it's even more sus.
For the record, these colors are not random, for some reason NSight gets different texture ids...
I am failing to understand this "bug".
Assimp is so awfully slow in debug mode this is painful..
new task: integrate a gltf loader 
I am literally idle
Waiting for Assimp to finish loading.
For every minute that passes like this the priority of integrating a faster loader grows..
could you perchance link a release build of ass imp to your debug builds
Hmm, that may work as a temporary fix for this madness.
I didn't think about that.
How awful does your CMake have to be to completely break if I link a release build
Goddamnit
my cmake template uses cgltf
Yeah, I'm wasting too much time on this
Fighting the linker just to get this stupid loader to work isn't really worth it.
I'll grab this real quick, can you link it?
integrating a gltf loader may take some time
well you implement stuff insanely quickly, so maybe not 
Turns out I don't need to change loader because the bug is not there.
Apparently if I make my buffer so huge that the entire model fits in one VA0/VBO/EBO set, then NSight likes it.
why did you use a zero for VAO but 'o' for the other ones 
I didn't even notice until now 
The weird thing is that texture indices are completely disconnected from VAO/VBO/EBOs
Not even that, it's just a counter that gets incremented
Although I plan to make this counter "local" to the model.
yeah thats a little brainworm
atm i use strings for that still
for materials i do the same as with meshes, i throw them into a MaterialPool
that also means i have a cpu side Material, a gpu side GpuMaterial and PooledMaterial which is pretty much the global index of the material i then use to update my MaterialIndex in the instanceism
if that makes any sense
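A rough sketch of the Material / GpuMaterial / PooledMaterial split described above, assuming materials are deduplicated by name (matching the "strings for that still" comment) and that `GpuMaterial` stores bindless texture handles as 64-bit integers, which is how `sampler2D` values are laid out in a buffer with ARB_bindless_texture. All type names and fields are hypothetical; the point is that a model-local texture/material counter turns into a global index once the material lands in the pool.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical CPU-side material as it comes out of the loader.
struct Material {
    std::string name;
    std::uint64_t baseColorHandle = 0;  // bindless handle, 0 = none
    std::uint64_t normalHandle = 0;
};

// Hypothetical GPU mirror: two 64-bit handles, 16 bytes, std430-friendly.
struct GpuMaterial {
    std::uint64_t baseColorHandle;
    std::uint64_t normalHandle;
};
static_assert(sizeof(GpuMaterial) == 16, "no padding expected");

// What a model/instance keeps: only the global index into the material buffer.
struct PooledMaterial { std::uint32_t index; };

class MaterialPool {
public:
    PooledMaterial add(const Material& m) {
        auto [it, inserted] =
            byName_.try_emplace(m.name, static_cast<std::uint32_t>(gpu_.size()));
        if (inserted) gpu_.push_back({m.baseColorHandle, m.normalHandle});
        return {it->second};
    }
    // Upload with glNamedBufferSubData (or similar) whenever this changes.
    const std::vector<GpuMaterial>& gpuData() const { return gpu_; }

private:
    std::unordered_map<std::string, std::uint32_t> byName_;
    std::vector<GpuMaterial> gpu_;
};
```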
Yeah I had something similar in mind
Anyway
I think I entered the backrooms
Which level is this again?
The incredibly funny thing is that this happens strictly in NSight
When a mesh doesn't fit in the huge VBO
Also, can you show me your (hopefully correct) render of bistro?
I need some ground truth here
I have no idea what is bugged and what's not.
From this angle is good
Interior too if possible
I need to download the scene first
do you know where I can find a gltf of it
Just interior and exterior
oh jeez I selected shaded view
"Compiling shaders..."
the door to the "main cafe" seems a little screwed up with both interior and exterior
looks like there are two doors that are almost overlapping
I know my thing is the least trustworthy possible, but it looks like that for me too.
I'd show you a pic, but my blender is frozen trying to export as gltf 
it's hammering one core, probably 90% overhead from the python interpreter
it finished finally
Yeah, it's sad 

it worked eventually
Interesting
now I gotta test it in my indirect renderer

bounding box view is comically useless here lol
Onto figuring out how I went from broken to more broken then.
I do have to wonder how every other model I've tested just works (outside nsight) but I'll leave that for later™️
now im curious too
blender uses 1 core to import and 1 core to export
disk io is also almost 0
smells like its not optimized at all
66ms sounds like a lot, and that's just the exterior
i mdi this scene like 3 times (2x directional light shadow) 1x gbuffer
Everything I do borks it further 
Nice shading
None of us have AA though 😢
Needs shadows to make it more canny
Hmm, how much vram do you have deccer?
This scene uses over 16 gigs on my AMD machine since I don't have compressed textures
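Rough numbers for why uncompressed textures eat VRAM: an RGBA8 texel is 4 bytes and a full mip chain adds about a third on top of level 0, so a single 4096x4096 RGBA8 texture with mips is about 85 MiB. BC7 stores each 4x4 block in 16 bytes (1 byte per texel), so the same chain is roughly a quarter of that, about 21 MiB. The helper below (hypothetical name) just sums the levels; the 4096x4096 case is illustrative, not a measurement of Bistro.

```cpp
#include <algorithm>
#include <cstdint>

// Bytes used by a full mip chain of an uncompressed texture.
// Assumes non-zero width and height.
std::uint64_t mipChainBytes(std::uint32_t width, std::uint32_t height,
                            std::uint32_t bytesPerTexel) {
    std::uint64_t total = 0;
    for (;;) {
        total += std::uint64_t(width) * height * bytesPerTexel;
        if (width == 1 && height == 1) break;
        width = std::max<std::uint32_t>(1, width / 2);   // each level halves,
        height = std::max<std::uint32_t>(1, height / 2); // clamped to 1
    }
    return total;
}
```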
i got my hands on a 1060, this one has 6gb
ram wise, it was like 12GiB
I have been debugging an issue that doesn't exist for a long time.
Indices are good in NSight now
Just by removing layout (bindless_sampler) uniform and glMakeTextureHandleResidentARB
Which means my bindless texture usage is illegal in some way?
did you fix your debug callback setup yet
Yeah, been silent all this time 
classic
I really didn't have to comment out much from the shader
Just this chunk:
```glsl
layout (bindless_sampler) uniform;
layout (std430, binding = 5) readonly restrict buffer b_texture {
    sampler2D[] textures;
};
```
And somehow NSight now likes my program
you should read the extension spec for GL_ARB_bindless_texture to make sure you didn't goof somehow
So it took 1 day and a few minutes this morning
To remove layout (bindless_sampler) uniform; (was causing big issues)
And make sure I was reading the correct set of UVs
Today I learned there can be more than one set of UVs in a mesh
i dont think its even necessary, that layout thing, i dont have it anywhere in my shaders at least
I will save this snippet because it's pretty cool though:
```glsl
vec3 hsv = vec3(fract(M_GOLDEN_CONJ * (i_diffuse_texture + 1)), 0.5, 0.95);
if (i_diffuse_texture == -1) {
    hsv = vec3(0.0, 0.0, 0.0);
}
diffuse = hsv_to_rgb(hsv);
```
It does this cool thing.
For each number it associates a completely unique color
Because math apparently
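The math behind it: multiples of the golden ratio conjugate (phi - 1, about 0.618) taken modulo 1 never repeat (phi is irrational) and spread out evenly over [0, 1) as a low-discrepancy sequence, so consecutive ids get well-separated hues. A C++ port for checking it on the CPU; `hsvToRgb` is a standard HSV to RGB conversion written here as an assumption, since the original `hsv_to_rgb` isn't shown, and `idToColor` is a hypothetical name.

```cpp
#include <array>
#include <cmath>

using Rgb = std::array<float, 3>;

// Standard HSV -> RGB, all components in [0, 1].
Rgb hsvToRgb(float h, float s, float v) {
    const float h6 = h * 6.0f;
    const int sector = static_cast<int>(std::floor(h6)) % 6;
    const float f = h6 - std::floor(h6);
    const float p = v * (1.0f - s);
    const float q = v * (1.0f - s * f);
    const float t = v * (1.0f - s * (1.0f - f));
    switch (sector) {
        case 0: return {v, t, p};
        case 1: return {q, v, p};
        case 2: return {p, v, t};
        case 3: return {p, q, v};
        case 4: return {t, p, v};
        default: return {v, p, q};
    }
}

// Same idea as the GLSL: -1 means "no texture" -> black, otherwise a golden-ratio hue.
Rgb idToColor(int id) {
    constexpr float kGoldenConj = 0.618033988749895f;  // phi - 1
    if (id == -1) return {0.0f, 0.0f, 0.0f};
    const float x = kGoldenConj * static_cast<float>(id + 1);
    const float hue = x - std::floor(x);                 // fract()
    return hsvToRgb(hue, 0.5f, 0.95f);
}
```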
yeah, you find this "question" or "talk" scattered across the server sometimes
when people implement vbuffer or just go mdi and want to visualize materialids or any other id
implodee (i think?) used a cool cosine-based function for that, i believe it's in #wip
#wip message
Make sure you fix the spelling before using the code
And we're back to where we were (now without bugs).
Nice job
NSight still sends me to the backrooms if I dare to use more than one VBO
Holy shit
I have no idea how this worked before
But I fixed it, this was a super extra large bug.
Alright, NOW all (obvious) bugs have been fixed
i dont even know what you are talking about
when you talk about nvidia bugs
perhaps this warrants a minimal code example that everyone can try to reproduce, and if its an actual bug we could raise it somewhere
No no it's my fault haha
I fixed it, the indirect groups I was creating were mismatched at some point with the objects
ah : )
Damn KTX is a lot of work
Let me know when you get it to work so I can copy you
Alright, I rewrote the entire loading thing for the third time
Let's see if I've gotten good enough to make this work first try (spoiler: no)
It doesn't even compile

First segfault of many to come
Is there no DSA glCompressedTexImage2D?
I guess not
Thankfully I don't really need that, I actually need the Sub thing.
also no immutable storage for them
And we segfaulted once more, the counter is at 12 for now.
Memory is hard
Lovely vertex data
actually, I think you're supposed to use glTextureStorage2D, then upload data with glCompressedTextureSubImage2D
or maybe you can do it all at once with glTextureStorage2D
I'm doing this and "it works"
"It works" meaning "I don't segfault but because the rest of my program is broken beyond god's help I can't test if it actually works"
the ref page for glTexStorage2D is probably wrong, since it doesn't include compressed texture formats
internalformat must be one of the sized internal formats given in Table 1 below, one of the sized depth-component formats GL_DEPTH_COMPONENT32F, GL_DEPTH_COMPONENT24, or GL_DEPTH_COMPONENT16, one of the combined depth-stencil formats, GL_DEPTH32F_STENCIL8, or GL_DEPTH24_STENCIL8, or the stencil-only format, GL_STENCIL_INDEX8.
very sus
I saw how KTX-Software implements ktxTexture2_GLUpload and it uses glTexStorage2D + glCompressedTexSubImage2D with compressed formats so we're good (assuming Khronos doesn't L I E)
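The pattern KTX-Software uses, allocate once with storage and then upload each level with a compressed sub-image call, needs the right `imageSize` per level. For the BC formats that is the number of 4x4 blocks times the block size: 8 bytes for BC1/BC4, 16 bytes for BC2/BC3/BC5/BC6H/BC7, with partial blocks at the edges rounded up. A small helper for that (hypothetical name):

```cpp
#include <cstdint>

// imageSize for one mip level of a 4x4 block-compressed (BCn) texture.
// blockBytes: 8 for BC1/BC4, 16 for BC2/BC3/BC5/BC6H/BC7.
std::uint32_t bcLevelSize(std::uint32_t width, std::uint32_t height,
                          std::uint32_t blockBytes) {
    const std::uint32_t blocksX = (width + 3) / 4;   // partial blocks round up
    const std::uint32_t blocksY = (height + 3) / 4;
    return blocksX * blocksY * blockBytes;
}
```

Per level the DSA call would then be something like `glCompressedTextureSubImage2D(tex, level, 0, 0, w, h, GL_COMPRESSED_RGBA_BPTC_UNORM, bcLevelSize(w, h, 16), data);`, with `w`/`h` halved (minimum 1) for each level.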
Also gltfpack was quantizing my vertices
So we got funny vertex data.
I was looking at the GL spec, but it's quite annoying as it takes you on a wild goose chase trying to find what's legal
Yeah I spent a whopping 5 minutes looking at glTextureStorage2D for any sign of compressed formats
I gave up and went to see an actual implementation
Does libktx come with a compressor, or is it just for reading and writing ktx files?
Ideally, I'd like to be able to load gltfs with or without compressed textures, but always have compressed textures internally (for minimizing bandwidth and VRAM usage)
Hmm that lib sounds familiar
Looks pretty nice
Do you invoke it at runtime?
I was considering compressonator, but this looks much more automagic for my needs
Not yet but I plan to
Hopefully I'll have something that looks at the glTF, tells me if it's optimized and if it isn't I invoke gltfpack
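One possible heuristic for "is this glTF already optimized", assuming the file came out of gltfpack: look at the top-level `extensionsUsed` array. Quantized geometry is declared with `KHR_mesh_quantization`, meshopt-compressed buffers with `EXT_meshopt_compression`, and Basis Universal textures in KTX2 with `KHR_texture_basisu`. The functions below only take the already-parsed list (JSON parsing is left to whatever glTF loader is in use), and which extensions count as "optimized" is a judgment call; both function names are hypothetical.

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct GltfOptimization {
    bool quantizedGeometry;   // KHR_mesh_quantization
    bool meshoptCompressed;   // EXT_meshopt_compression
    bool compressedTextures;  // KHR_texture_basisu (KTX2 / Basis Universal)
};

// Hypothetical helper: inspects the glTF's top-level "extensionsUsed" names.
GltfOptimization inspectExtensions(const std::vector<std::string>& extensionsUsed) {
    auto has = [&](const char* name) {
        return std::find(extensionsUsed.begin(), extensionsUsed.end(), name) !=
               extensionsUsed.end();
    };
    return {has("KHR_mesh_quantization"), has("EXT_meshopt_compression"),
            has("KHR_texture_basisu")};
}

// Policy (an assumption): run gltfpack unless textures are already GPU-compressed.
bool needsGltfpack(const std::vector<std::string>& extensionsUsed) {
    return !inspectExtensions(extensionsUsed).compressedTextures;
}
```

The quantization flag also matters for the vertex format: quantized attributes are no longer plain floats, which is one way to end up with "funny vertex data" if the VAO still declares every attribute as `GL_FLOAT`.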
Well it's a start...
hOLY POG IT WORKS
Now I can load bistro diffuse + normal maps without killing my GPU in the process
How's 🅱️erf
How do you determine if it's optimized ๐ค
Solid 4ms
