#Iris - A Journey through OpenGL and beyond to learn Graphics
It's kind of naive, I just see if the glTF has basisu images in it.
Ideally I'd want to set a flag somewhere in the extras section
https://github.com/LVSTRI/Iris/blob/master/src/texture.cpp#L28
https://github.com/LVSTRI/Iris/blob/master/src/model.cpp#L65
This is the implementation by the way
Feel free to steal as I have stolen from others 
I've never seen std::type_identity before
auto* ktx = std::type_identity_t<ktxTexture2*>();
apparently it's useful for implicit type conversions to the template type
My reasoning for using that is I don't like declaring the type on the left
uh
Call me crazy, I deserve it
you can just write auto* ktx = ktxTexture2*{};
nvm 
you have to write it like (int*){}
er
(ktxTexture2*){}
Interesting.
damn, libktx looks super easy to use
I guess my life is a lie then
for value types, you can indeed write auto foo = Foo{}; (or Foo()), which I do a lot
I guess pointer syntax is brain damaged, so you can't write it exactly like those
Truer words have never been spoketh
I learn something new about C++ every time I read your code
what's the advantage of defining swap for a custom type instead of just defining a move constructor and move assignment operator?
I don't understand it fully either, but apparently it avoids repetition and favors ADL
my move semantics are cursed, lemme show you
It's called the copy-and-swap idiom
Buffer::Buffer(Buffer&& old) noexcept
: size_(std::exchange(old.size_, 0)),
storageFlags_(std::exchange(old.storageFlags_, BufferStorageFlag::NONE)),
id_(std::exchange(old.id_, 0)),
mappedMemory_(std::exchange(old.mappedMemory_, nullptr))
{
}
Buffer& Buffer::operator=(Buffer&& old) noexcept
{
if (&old == this)
return *this;
this->~Buffer();
return *new (this) Buffer(std::move(old));
}
I did some epic spec reading with others to ensure this is actually legal code
iirc there is an edge case with this pattern when you have a pointer to a subobject of the object that gets destroyed
where the spec isn't clear on whether it's UB
but no one should (or can) be making pointers to members of Buffer, so it's fine here
Reject the spec, embrace "it works on my machine"
anyways, the whole point of this is to reduce duplication in favor of spooky placement new
60% of the time, it works every time
I'll keep my "swap" thanks 
you need move semantics anyways, no?
swapping isn't the only reason I need em
idk, this is #bikeshed material
If I didn't misunderstand that idiom, copy-and-swap works in all cases
Because you define only one copy assignment op, taking self by value
and you just swap that with *this
yeah, it seems to work
If you pass anything that's not an rvalue (xvalue or prvalue) it copies it
otherwise, move constructed
Then you swap it with either the copied state, or "unspecified" state (depending always on whether you move or not) which agrees to move semantics
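For reference, a minimal sketch of the copy-and-swap idiom as described above. `Handle` is a hypothetical stand-in (it is not the `Buffer` class from this chat), and the "resource" is just an int so the example stays self-contained:

```cpp
#include <utility>

// Hypothetical resource wrapper used only for illustration.
class Handle {
public:
    Handle() = default;
    explicit Handle(int id) : id_(id) {}
    Handle(const Handle& other) : id_(other.id_) {}  // stands in for a real deep copy
    Handle(Handle&& other) noexcept : id_(std::exchange(other.id_, 0)) {}

    // Single assignment operator: the by-value parameter is copy-constructed
    // from lvalues and move-constructed from rvalues.
    Handle& operator=(Handle other) noexcept {
        swap(*this, other);
        return *this;  // the previous state now lives in `other` and dies with it
    }

    // Hidden friend: found through ADL when called with Handle arguments.
    friend void swap(Handle& a, Handle& b) noexcept {
        using std::swap;
        swap(a.id_, b.id_);
    }

    int id() const { return id_; }

private:
    int id_ = 0;
};
```

Assigning from an lvalue leaves the source untouched, while assigning from `std::move(x)` leaves `x` holding id 0, which is the moved-from state chosen in this sketch.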
I don't know where ADL comes into play but I never understood ADL
Or anything about C++'s overloading resolution rules
yeah, they're absurdly complex
all I know is that it enables this construct
endl(std::cout);
oh, it also lets you do this
std::string a{"hello"};
std::string b{"world"};
swap(a, b);
which means you don't have to force a particular version of swap, if your thing happens to define it in its namespace/class/whatever
std::swap means you will get something slightly less optimal than if you implemented your own swap, since it always move-constructs a temporary instead of just swapping each member
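A small sketch of how ADL picks a custom swap, assuming a hypothetical `gfx::Texture` type (the `swap_calls` counter exists only to make the chosen overload observable). Generic code writes `using std::swap;` and then calls `swap` unqualified, so a swap defined next to the type wins and `std::swap` remains the fallback:

```cpp
#include <string>
#include <utility>

namespace gfx {  // hypothetical namespace for the example
struct Texture {
    unsigned id = 0;
    int swap_calls = 0;  // instrumentation only
};

// Custom swap living next to the type; ADL finds it for Texture arguments.
inline void swap(Texture& a, Texture& b) noexcept {
    std::swap(a.id, b.id);
    ++a.swap_calls;
    ++b.swap_calls;
}
}  // namespace gfx

// Generic helper: falls back to std::swap when the type has no swap of its own.
template <typename T>
void exchange_values(T& a, T& b) {
    using std::swap;
    swap(a, b);  // unqualified call, so ADL can select gfx::swap
}
```

Writing `std::swap(a, b)` directly would bypass `gfx::swap` and go through the generic move-based implementation.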
maybe once they add reflection, we can finally have an optimal std::swap 
Good morning friends
Today I discovered that cutting electricity for half a day is legal in this country
Can't really do anything about it if it's an issue in the electrical distribution network.
Anyways now that we have Indirect Drawing and Shadows figured out, I will now ponder where to go next.
Possible candidates are OIT and Frustum Culling
OIT is really interesting because it uses Linked Lists on the GPU which to me is fairly wild.
that's just one possible implementation of OIT
if you want the state of the art, you probably want one of these (second one requires RoVs which aren't on AMD)
http://momentsingraphics.de/MissingTMBOITCode.html
https://www.intel.com/content/www/us/en/developer/articles/technical/oit-approximation-with-pixel-synchronization.html
But no linked lists 😦
Are there other things that use GPU linked lists? I'm very curious to try them
linked lists generally aren't what you want to be doing on the GPU
may I suggest occlusion culling
all done in a compoot shader
well, there's raster occlusion culling, which uses the frogment shader
Occlusion culling in compute hmm.
look up "hi-z occlusion culling"
it's a bit more complex than raster occlusion culling
you can use both at the same time if you wish
Alright I have pondered enough, it's time to set up frustum + occlusion culling
so, how did you compress sponza
did you just run compressonator with some flags?
and it produced ktx2 files out of png?
I ended up not using Compressonator as it was far too overkill for my purposes, instead gltfpack is very automagic
```
gltfpack -i .\bistro\bistro.gltf -o .\compressed\bistro\bistro.glb -tc -tq 10 -vpf -kn -km -ke -noq
```
I just had to run this and boom: everything is compressed
ah neat
Somehow I always manage to forget GPUs are parallel machines
Hmm I still get occasional flickering for some reason?
Even though this was most likely the issue, there is still flickering once every second or so, randomly
Shader is just this:
void main() {
uint index = gl_GlobalInvocationID.x;
if (index == 0) {
for (uint i = 0; i < u_draw_count; ++i) {
draw_count[i] = 0;
}
}
barrier();
if (index < u_object_count) {
const object_info_t object = objects[index];
const mat4 local_transform = local_transforms[object.local_transform];
const mat4 global_transform = global_transforms[object.global_transform];
const mat4 model = global_transform * local_transform;
if (is_object_visible(object, model)) {
const uint slot = atomicAdd(draw_count[object.group_index], 1);
indirect_commands[slot + object.group_offset] = object.command;
object_shift[slot + object.group_offset].object_id = index;
}
}
}
(is_object_visible always returns true for now)
do you call glMemoryBarrier in your host code
I have no idea what that is (so no)
it's necessary to make your program correct
Interesting, I'll read up on that
it ensures incoherent reads and writes (SSBO and image stores from shaders) are visible/completed to future operations
so if you write some indirect commands in a shader, then do glMultiDrawElementsIndirect, you need glMemoryBarrier(GL_COMMAND_BARRIER_BIT); between them
otherwise the driver cannot see that the MDI command depends on the dispatch and issue the corresponding synchronization and cache flush/invalidation
I see
So just like CPUs atomics then
I assume "visible" and "available" mean the same thing (I mean, not that visible == available, just that visible/available on the CPU are the same on the GPU)?
It's basically a cache flush
there is just one concept of memory visibility in opengl
btw, DX11 doesn't have this, so the driver has to issue conservative barriers between every pass that does incoherent writes
which means you can't mess up sync, but you also cannot get maximum perf
anyways, flickering like what you have is typically a symptom of a synchronization issue
you can check it by inserting glFinish or glMemoryBarrier(GL_ALL_BARRIER_BITS) after every draw/dispatch that does SSBO/image writes
Very interesting
Why does cache become incoherent on the GPU itself though?
Like, the GPU is feeding itself data, how does it fail to maintain coherency?
Oh it's probably because each GPU SM has its own cache and any workgroup writing data is not guaranteed to be the same reading it?
GPU cache coherency protocols are very basic compared to CPU ones
and often rely on manual flushes and sync
idk all the details though
So hold on a sec
I also do a depth reduce and setup shadow cascades in compute
...Am I supposed to insert barriers here too?
prob
But it's been working fine up until now with no errors from the debug callback 
any time you want a write to be visible or a read to have finished before the next pass that consumes the memory, you need a barrier
So this:
```cpp
depth_reduce_init_shader.bind();
offscreen_attachment[1].bind_texture(0);
depth_reduce_attachments[0].bind_image_texture(0, 0, false, 0, GL_WRITE_ONLY);
camera_buffer.bind_base(1);
glDispatchCompute(depth_reduce_wgc[0].x, depth_reduce_wgc[0].y, 1);
depth_reduce_shader.bind();
for (auto i = 1; i < depth_reduce_wgc.size(); i++) {
    depth_reduce_attachments[i - 1].bind_image_texture(0, 0, false, 0, GL_READ_ONLY);
    depth_reduce_attachments[i].bind_image_texture(1, 0, false, 0, GL_WRITE_ONLY);
    glDispatchCompute(depth_reduce_wgc[i].x, depth_reduce_wgc[i].y, 1);
}
```
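becomes this?
```cpp
...
glDispatchCompute(depth_reduce_wgc[0].x, depth_reduce_wgc[0].y, 1);
// The next dispatch reads this image with image loads, so image-access visibility is required.
glMemoryBarrier(GL_SHADER_IMAGE_ACCESS_BARRIER_BIT);
depth_reduce_shader.bind();
for (auto i = 1; i < depth_reduce_wgc.size(); i++) {
    ...
    glDispatchCompute(depth_reduce_wgc[i].x, depth_reduce_wgc[i].y, 1);
    glMemoryBarrier(GL_SHADER_IMAGE_ACCESS_BARRIER_BIT);
}
```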
ye
Hmm I also read from the depth's sampler though, do I need a barrier here too 🤔
Before the first dispatch, I mean
The first dispatch here, reads from a framebuffer attachment (the depth attachment, written to by a Z-prepass)
Should I do glMemoryBarrier(GL_FRAMEBUFFER_BARRIER_BIT);?
The spec isn't very clear on the rules.
SHADER_IMAGE_ACCESS_BARRIER_BIT: Memory accesses using shader built-in image load, store, and atomic functions issued after the barrier will reflect data written by shaders prior to the barrier. Additionally, image stores and atomics issued after the barrier will not execute until all memory accesses (e.g., loads, stores, texture fetches, vertex fetches) initiated prior to the barrier complete.
btw, the ref pages do not mention this critical information for some of the barrier bits, so you best refer to the spec
I guess that is to say that you need a barrier before the first dispatch as well
By the way
We might have a situation
Using glFinish(); I still see occasional flickering 
spoopy
idk if glFinish technically makes writes visible, so maybe don't use that
or use glFinish+glMemoryBarrier if you're super paranoid
where did you put it
After the dispatch that writes indirect commands
damn
maybe there is a race within that shader. Lemme look at it again
shader looks okay
I thought atomics could only be done on variables that were neither readonly nor writeonly
Compiler doesn't really complain but you are right lol
It doesn't really make sense for it to be writeonly
Same flickering though
I guess try debugging with nsight
or somehow simplifying the shader (e.g., removing the atomic and just using the global invocation id, if possible)
wait
Can you explain the loop at the beginning of your shader
Since the shader increments the draw_count, I need a way to reset it between invocations
So that it doesn't grow to infinity and beyond
What do you expect barrier() to do
Make all other threads wait for the first thread to finish initializing draw_count?
Did you know that barrier only synchronizes threads within a single workgroup
i.e., it is not global sync
so the code is probably wrong if you have more than one wg
Try clearing the buffer with this
https://registry.khronos.org/OpenGL-Refpages/gl4/html/glClearBufferSubData.xhtml
Yeah...
no more flickering
How many more absolutely vital pieces of information am I missing I wonder 
just read the whole gl and glsl specs before continuing 
usually takes me a lot longer than that
and me ^3 that
Alright that's a wrap, tomorrow we'll have actual frustum culling (and possibly even occlusion)
By the way I only just realized that gltfpack merges primitives if they have the same node and material
Thankfully it's open source so I could simply add a flag and build from source
```glsl
bool intersect_aabb_plane(in aabb_t aabb, in vec4 plane) {
    const vec3 normal = plane.xyz;
    const vec3 size = aabb.size.xyz;
    const vec3 center = aabb.center.xyz;
    const float radius = dot(size, abs(normal));
    return -radius <= dot(normal, center) - plane.w;
}

bool is_object_visible(in object_info_t object, in mat4 model) {
    const aabb_t aabb = object.aabb;
    const vec3 world_aabb_max = vec3(model * vec4(aabb.max.xyz, 1.0));
    const vec3 world_aabb_min = vec3(model * vec4(aabb.min.xyz, 1.0));
    const vec3 world_aabb_center = (world_aabb_max + world_aabb_min) / 2.0;
    const vec3 world_aabb_extents = world_aabb_max - world_aabb_center;
    const aabb_t global_aabb = aabb_t(
        vec4(world_aabb_min, 0.0),
        vec4(world_aabb_max, 0.0),
        vec4(world_aabb_center, 0.0),
        vec4(world_aabb_extents, 0.0));
    for (int i = 0; i < 6; ++i) {
        if (!intersect_aabb_plane(global_aabb, frustum.planes[i])) {
            return false;
        }
    }
    return true;
}
```
It was disappointingly trivial to implement...
Learning about AABBs with mouse picking was worth it 
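For anyone who wants to sanity-check the plane test on the CPU, here is a rough C++ port of the same idea. The `Vec3`, `Plane` and `Aabb` types are made up for this sketch, and it assumes plane normals point into the frustum with `distance` playing the role of `plane.w`:

```cpp
#include <array>
#include <cmath>

struct Vec3 { float x, y, z; };
struct Plane { Vec3 normal; float distance; };  // points on the plane satisfy dot(normal, p) == distance
struct Aabb { Vec3 center; Vec3 extent; };      // extent = half-size along each axis

inline float dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// True if the box touches the plane or lies on its positive side.
inline bool aabb_on_or_inside_plane(const Aabb& box, const Plane& plane) {
    const Vec3 abs_normal{std::fabs(plane.normal.x), std::fabs(plane.normal.y), std::fabs(plane.normal.z)};
    // Projected half-size of the box along the plane normal.
    const float radius = dot(box.extent, abs_normal);
    const float signed_distance = dot(plane.normal, box.center) - plane.distance;
    return signed_distance >= -radius;
}

// The box is culled as soon as it is fully behind any one plane.
inline bool aabb_in_frustum(const Aabb& box, const std::array<Plane, 6>& planes) {
    for (const Plane& plane : planes) {
        if (!aabb_on_or_inside_plane(box, plane)) {
            return false;
        }
    }
    return true;
}
```

This test is conservative: a box near a frustum corner can pass all six planes while still being outside, which is usually acceptable for culling.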
Is it worth doing this for shadow maps too? 🤔
Hmm, frustum culling shadow cascades doesn't really work sadly, I need more tolerance I guess?
Turns out my intuition for culling shadow lights was completely off
I'm now following this scary looking paper: https://arisilvennoinen.github.io/Publications/Shadow_Caster_Culling_for_Efficient_Shadow_Mapping.pdf but it's probably better if I do occlusion culling first
CHC++ Uses hardware occlusion queries, but from what I'm reading they are fairly inefficient due to CPU stalling, can conditional rendering fix this?
Maybe HiZ culling is the way to go?
Ye
Hardware occlusion queries aren't great these days since you can just write to a buffer now
But how do you do occlusion culling for shadows?
HiZ requires depth
But shadows are depth
here's a classic algorithm:
- Render objects that were marked visible to depth
- Perform occlusion culling against depth, marking visible objects
if you do it all on the GPU, there is just one frame of latency between an object being marked visible, and actually being drawn
But you can add a third step to remove that latency
By simply drawing the objects whose visibility changed from 0 to 1 this frame
I uh
How do you do step 2 without rendering all objects
You need the depth of every object to check whether the object is visible or not?
Step 2 depends on the implementation
For hi-z, it means performing the test for every object's bounding volume
For raster occlusion culling, it means drawing the bounding volume for every object (which is hopefully substantially cheaper than actually drawing every object)
Raster is cool because it's so shrimple
Here's a sample that implements it
https://github.com/JuanDiegoMontoya/Fwog/blob/main/example/05_gpu_driven.cpp
Wow you actually render cubes
Incredible
It's not like you told me already
...if I'm dumb
thats a cute pic
Shadow frustum culling for some unknown reason does not work
Isn't it the same exact thing? As frustum culling for perspective projections I mean.
Even though I explicitly disable near plane culling, it looks like it's doing it anyways...?
I think
Nevermind I don't
The only difference is where your planes are
Ight it works now
bug was actually pretty obvious, you get one LVSTRI point if you spot it
Actually nevermind, that was just placebo, I'm not culling anything now 
I actually have no idea now 🤔
```glsl
bool is_aabb_inside_plane(in aabb_t aabb, in mat4 model, in vec4 plane) {
    const vec3 normal = plane.xyz;
    const vec3 extent = aabb.extent.xyz;
    const vec3 center = aabb.center.xyz;
    const float radius = dot(extent, abs(normal));
    return -radius <= (dot(normal, center) - plane.w);
}

bool is_object_visible(in object_info_t object, in mat4 model) {
    const aabb_t aabb = object.aabb;
    const vec3 world_aabb_min = vec3(model * vec4(aabb.min.xyz, 1.0));
    const vec3 world_aabb_max = vec3(model * vec4(aabb.max.xyz, 1.0));
    const vec3 world_aabb_center = vec3(model * vec4(aabb.center.xyz, 1.0));
    const vec3 right = vec3(model[0]) * aabb.extent.x;
    const vec3 up = vec3(model[1]) * aabb.extent.y;
    const vec3 forward = vec3(-model[2]) * aabb.extent.z;
    const vec3 world_extent = vec3(
        abs(dot(vec3(1, 0, 0), right)) +
        abs(dot(vec3(1, 0, 0), up)) +
        abs(dot(vec3(1, 0, 0), forward)),
        abs(dot(vec3(0, 1, 0), right)) +
        abs(dot(vec3(0, 1, 0), up)) +
        abs(dot(vec3(0, 1, 0), forward)),
        abs(dot(vec3(0, 0, 1), right)) +
        abs(dot(vec3(0, 0, 1), up)) +
        abs(dot(vec3(0, 0, 1), forward)));
    const aabb_t world_aabb = aabb_t(
        vec4(world_aabb_min, 1.0),
        vec4(world_aabb_max, 1.0),
        vec4(world_aabb_center, 1.0),
        vec4(world_extent, 1.0));
    const uint planes = bool(u_disable_near_culling) ? 5 : 6;
    for (uint i = 0; i < planes; ++i) {
        if (!is_aabb_inside_plane(world_aabb, model, frustum.planes[i])) {
            return false;
        }
    }
    return true;
}
```
This should be fine?
I mean, it works perfectly fine for a perspective projection, why not for shadows?
it's just culling completely visible objects for some unknown reason?
They aren't even z < 0
It's only the first cascade as well...
I'm lighting the Jaker beacon
ask chatgpt what's wrong with your code 
Why would you even suggest that 
cuz I'm a lazy bastard
ngl I actually asked chatgpt, but I can't tell if its answer is correct
probably because I don't understand 100% of the math in the original code
maybe it'll help if you walk me through the math,
style
btw, is_aabb_inside_plane has an unused parameter
and arguably is_object_visible should take an aabb_t instead of an object_info_t, if all you need from it is the AABB
Yeah that's me checking various things
Anyways the math is as follows:
- Translate AABB's center to world space: model * vec4(center, 1)
- Translate and correct AABB's extents (should account for rotations and scales, we use the first 3 columns of the model matrix to correct this)
- Check if the AABB is on or inside all 6 planes (or 5 if near culling is disabled, last plane is the near plane): we basically take the signed distance from the plane to the center of the AABB and check that it's -radius or more
The sign of dot(normal, center) - plane.w tells whether the center is inside or outside the plane
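The extent correction in the second bullet can be sketched on the CPU like this. `Affine` is a hypothetical stand-in for the upper 3x4 part of the model matrix (three columns plus translation). The world-space half-extents are the sum of the absolute values of each scaled column, which is what the `world_extent` computation in the shader expands to:

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

// Hypothetical stand-in for a column-major mat4: 3 basis columns plus translation.
struct Affine {
    Vec3 col0, col1, col2;
    Vec3 translation;
};

struct Aabb { Vec3 center; Vec3 extent; };  // extent = half-size along each axis

// Returns the axis-aligned box that encloses the transformed input box.
inline Aabb transform_aabb(const Affine& m, const Aabb& box) {
    const Vec3 c = box.center;
    const Vec3 e = box.extent;
    Aabb out;
    // The center transforms like any point.
    out.center = {
        m.col0.x * c.x + m.col1.x * c.y + m.col2.x * c.z + m.translation.x,
        m.col0.y * c.x + m.col1.y * c.y + m.col2.y * c.z + m.translation.y,
        m.col0.z * c.x + m.col1.z * c.y + m.col2.z * c.z + m.translation.z};
    // Extents use |M| so rotations and negative scales can only grow the box.
    out.extent = {
        std::fabs(m.col0.x) * e.x + std::fabs(m.col1.x) * e.y + std::fabs(m.col2.x) * e.z,
        std::fabs(m.col0.y) * e.x + std::fabs(m.col1.y) * e.y + std::fabs(m.col2.y) * e.z,
        std::fabs(m.col0.z) * e.x + std::fabs(m.col1.z) * e.y + std::fabs(m.col2.z) * e.z};
    return out;
}
```

For a 90 degree rotation about Z, a box with extents (1, 2, 3) comes out with extents (2, 1, 3), as expected.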
Is there any way I can debug a compute shader?
Out of pure curiosity, what did ChatGPT answer?
It said something about the computation for world_extent being wrong
I have discovered
A thing
Actually multiple things.
First off my signs are completely broken.
Second, distances from the plane origins are garbage
Third.
I have no idea how to fix all this 
Therefore I'll grab a man's best friends: pen and paper, and write down stuff.
how dare you disrespect man's true best friend
With inverse(view)
[0] = {iris::plane_t} {normal=[0.609994292 0 0.792405844], distance=-4.57495737}
[1] = {iris::plane_t} {normal=[0.609994292 0 -0.792405844], distance=-4.57495737}
[2] = {iris::plane_t} {normal=[0.5 0.866025447 0], distance=-2.88397455}
[3] = {iris::plane_t} {normal=[0.5 -0.866025447 -0], distance=-4.61602545}
[4] = {iris::plane_t} {normal=[-1 -0 -0], distance=-504.5}
[5] = {iris::plane_t} {normal=[1 0 0], distance=-7.4000001}
With inverse(pv)
[0] = {iris::plane_t} {normal=[-0.609995067 0 0.792405247], distance=-4.57496309}
[1] = {iris::plane_t} {normal=[-0.609995067 0 -0.792405247], distance=-4.57496309}
[2] = {iris::plane_t} {normal=[-0.500000775 0.86602503 0], distance=-4.61603069}
[3] = {iris::plane_t} {normal=[-0.500000775 -0.86602503 0], distance=-2.88398075}
[4] = {iris::plane_t} {normal=[1 0 0], distance=-504.5}
[5] = {iris::plane_t} {normal=[-1 0 0], distance=-7.4000001}
Why god
I fixed the thing
I finally achieved inner peace.
@derhass helped me a lot, honorable mention here.
good luck
Hmm, the "first frame" is very important in raster occlusion culling apparently, but I still don't understand the "core loop" very well
If the first frame I perform no occlusion culling, the next frame I am supposed to mark all objects that were visible the previous frame and render them?
What about changes though? Reprojection of the depth buffer?
you can run a frame before the gameloop starts
And I guess any time I can't reliably reproject
you don't need to reproject anything 

reprojection is for if you want to reuse an old depth buffer for new object positions
but you don't have to do that
I described some methods above that don't require reprojection
a dogjiff.gif also doesnt require reprojection
I suppose the core idea is that, instead of reprojecting, you use the object visibility from last frame instead
Also, the first frame isn't a special case when you do this
yus
How do I know if an object is completely occluded though...
No samples pass the depth test? Doesn't that require an occlusion query 🤔
Oh wait that's what early Z is for right?
early z + ssbo write in fs
I can feel more brain expansion
technically that code is UB btw
since there is a race if there are multiple fragments 
atomicExchange
atomicCompSwap
either one works
atomicCompSwap(data[draw_id].is_visible, 0, 1)
So that you only do this once
efishenshy
writing the same value from multiple threads is technically UB according to the spec, but works on actual hw
Yeah makes sense, there's no reason why it wouldn't
I imagine it's faster than atomics since you don't have to serialize access to the memory controller
Now what vertices do I feed the GPU?
Hmm
I suppose I could first do frustum culling, get the number of visible objects there, then build another indirect buffer with instanceCount = number of objects in frustum and draw cubes?
I don't really understand this...
a cube lol
The comment says 24 vertices, but I think it's actually 14
Oh god this is going to be a mess isn't it
If I want to do occlusion culling for shadows too, that is
Each cascade gets its own indirect buffer...
it shouldn't be too bad if you abstract it properly
This piece of code is copy-pasted 5 times 
Uhh
I have these two buffer bindings here:
```glsl
layout (std430, binding = 7) restrict buffer b_occlusion_draw_count {
    uint occlusion_draw_count;
};

layout (std430, binding = 8) writeonly restrict buffer b_occlusion_indirect_command {
    indirect_command_t occlusion_command;
};
```
Because I need atomics basically
Is there any way I can avoid creating a whole SSBO for a single uint?
you could just use a glUniform1ui
^
what happened to your foo_t naming convention
Hmm but I'd still need to bind this buffer to GL_PARAMETER_BUFFER
I changed it, it didn't make much sense for buffers and uniforms and stuff
are you doing glMDICount?
Yes
isn't there GL_DRAW_INDIRECT_BUFFER? or does that not work with MDI?
Yes but you also need to bind a buffer to tell GL how many indirect commands to consume, that's bound to GL_PARAMETER_BUFFER
those are 2 buffers
oh lol
the buffer types are weird
this is effectively a counter buffer
Naming couldn't be worse yes 
75% of the problems people have when learning gl would be solved with better names

anyhow, doing hi-z culling here?
Raster based, because it's more 🦐le
I'm currently taking the output of the frustum culling pass to draw only the AABBs that are in frustum
Sorry, which thing?
was thinking about the conditional rendering occlusion stuff opengl has
but pretty sure the perf of that ain't that great
Ah no, the idea is to rasterize AABBs and check visibility with early Z
So you basically leverage early Z to write into an SSBO the visibility of any object
right
Then you just write indirect commands that pass this test (if they managed to pass frustum culling first that is)
Jaker, why is your depth write disabled when rendering visible bounding boxes?
Ah I see.
Do we use last frame's depth buffer as well hmm
"This pass comes after the scene pass because it relies on a depth buffer to have already been created" do we really need this though?
noooooo
yes, otherwise you're testing occlusion against an empty depth buffer
But what about this
which part
It's a bullet point but it's all the same technique
the temporal coherence just means that object visibility will be almost the same from frame to frame
objects visible last frame are probably visible this frame, etc
So rip shadow mapping occlusion culling?
If the depth buffer already needs to be created then we can't really cull anything can we?
Unless it's acceptable for shadow mapping culling to just use last frames' depth?
it should work for shadows, even if the frustum changes every frame
I think you have some confusion about exactly what data is taken from the last frame
lemme go through it again
- clear depth buffer
- render visible objects
- render bounding boxes for occlusion testing
- render objects whose visibility changed from 0 to 1 this frame (optional step to prevent one frame of pop-in when objects become visible)
the visibility info from the occlusion testing step is used in the "render visible objects" step of the next frame
visibility=draw commands or whatever
the thingy with reprojection basically uses the last frame's depth buffer, and swaps occlusion testing and object rendering (so u test first)
which I guess is more intuitive lol
the problem is that reprojection is not perfect and leads to arguably worse artifacts (false occlusion) compared to one frame of lag (which, again, can be mitigated by step 3 above)
About step 3
Can I just do this:
```glsl
void main() {
    const uint prev = visibility[i_object_id] & 0x1;
    visibility[i_object_id] = (prev << 1u) | 1;
}
```
Yeah, it's useless
red herring smh
Tbh I'd use an if statement just to make it obvious
To keep track of change
if (lastFrameVisibility[obj_id] == 0)
{
objectsThatNeedToBeDrawnThisFrame[obj_id] = 1;
}
thisFrameVisibility[obj_id] = 1;
you can certainly reduce the number of buffers needed here
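One possible way to shrink this to a single buffer, sketched on the CPU. The bit layout (bit 0 = visible this frame, bit 1 = visible last frame) and all function names are assumptions made for this example, not something from Fwog or Iris. The shift happens once per object at the start of the frame rather than in the fragment shader, so multiple fragments only ever set the same bit:

```cpp
#include <cstdint>
#include <vector>

// Assumed layout: bit 0 = visible this frame, bit 1 = visible last frame.
constexpr std::uint32_t kVisibleNow = 1u << 0;
constexpr std::uint32_t kVisiblePrev = 1u << 1;

// Run once per object before occlusion testing: "now" becomes "prev", "now" is cleared.
inline std::uint32_t begin_frame(std::uint32_t state) {
    return (state & kVisibleNow) << 1;
}

// What the bounding-box pass would do for every object that passes the depth test.
inline std::uint32_t mark_visible(std::uint32_t state) {
    return state | kVisibleNow;
}

// Visibility went from 0 to 1 this frame.
inline bool became_visible(std::uint32_t state) {
    return (state & kVisibleNow) != 0 && (state & kVisiblePrev) == 0;
}

// Objects for the optional third pass that hides one frame of pop-in.
inline std::vector<std::uint32_t> newly_visible(const std::vector<std::uint32_t>& states) {
    std::vector<std::uint32_t> ids;
    for (std::uint32_t i = 0; i < states.size(); ++i) {
        if (became_visible(states[i])) {
            ids.push_back(i);
        }
    }
    return ids;
}
```

On the GPU, `begin_frame` would map to a small compute pass or a buffer update, and `mark_visible` to an atomic OR or plain store in the fragment shader.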
Yeah, I'll just make it work first
On my way copy pasting 5 times the same snippet (again)
hehe
Actually screw that, I don't even know if it works only for objects in my perspective
Let's just test primary camera perspective first
It didn't work 😦
Wait I didn't bind the shader
It worksn't
Ah my shift is wrong
Goddamnit

It's "working" but it doesn't seem to cull anything more than frustum culling did
Make a debug mode that draws the bounding boxes to the screen
HiZ is looking more and more appealing... there are loads of issues with ROC apparently
Well, it's to be expected when dealing with bounding boxes, I wish ROC was a bit more conservative though
like what
I don't know the exact reason, but some flickering can be observed if you are inside an AABB
Sometimes not even just flickering, you get the object seemingly transparent due to its visibility changing every frame
Here for example
I am using your Fwog because I nuked ROC 
Hmm
a shrimple way to mitigate it is to always draw the object if the camera is very close to its bounding box
Makes sense, what about this?
The object is not very close to the camera here, perhaps some offset could be applied to the AABB's position?
I can't see the aabb of the object in question
Well... we are inside it 
all it takes for the artifact to appear is for the camera to be inside the aabb and to not be looking at any other side of the object
I wonder if you can use clip planes for this
Hmm
I already have frustum planes from frustum culling
I would shrimply have to invert the condition
Tbh it's easier to just draw the thingy if you're inside the aabb
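A minimal sketch of that check, assuming the box is stored as world-space min/max corners. The `margin` parameter is an assumption added here (for example the near plane distance) so the object is also force-drawn when the camera is just outside the box:

```cpp
struct Vec3 { float x, y, z; };
struct Aabb { Vec3 min; Vec3 max; };  // world-space corners

// True if the eye is inside the box, or within `margin` of it on every axis.
// Objects for which this returns true skip the occlusion test and are always drawn.
inline bool camera_inside_aabb(const Vec3& eye, const Aabb& box, float margin) {
    return eye.x >= box.min.x - margin && eye.x <= box.max.x + margin &&
           eye.y >= box.min.y - margin && eye.y <= box.max.y + margin &&
           eye.z >= box.min.z - margin && eye.z <= box.max.z + margin;
}
```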
ideally, you wouldn't have objects with giant bounding boxes like that
Many scenes merge primitives so their bounding boxes are huge
I de-nuked ROC
Ah by the way, I can't find any documentation about NSight's profiler magic words
you'd get much finer culling if you used it on meshlets
vpc = viewport culling
idk what pes is
Also PCIe throughput is reporting 16GB/s, should I be worried about that?
They only happen at the end of shadow map draws
Actually... yes lol
I was rewriting the whole texture buffer, object buffer, indirect buffer for each call to "perform_frustum_cull"
Truly incredible
Zoo wee mama
Looks like checking if you are inside the AABB works fine
Or rather, I have not found any more artifacts yet.
By the way what's this
meshlets are little baby meshes
like 64-128 verts each
might be too fine though, since each aabb is like 14-24 verts
Meshoptimizer can emit meshlets
They are typically used with mesh shaders
This is super interesting holy
Apparently it's a whole new rendering paradigm?
Gone are vertex -> geometry -> raster?
they are automatically enabled
you just have to check that your implementation supports it
very good
Old: model_t
New: meshlet_group_t
To be honest I'm not understanding much, if anything at all 
The task shader is optional apparently, which is great for me, one less thing to worry about
you use task shaders to cull meshlets, basically
are you switching to vk already? 
No way
just using the cursed gl nv meme shader ext
ah
cursed because probably no one uses it 🐸

I know everyone's given up on GL, but I have not
I couldn't help but see there's no AMD version of this mesh shader extension

meme shading lol
AMD (pejorative)
heh
```cpp
auto meshlet_triangles = std::vector<uint32>(max_meshlets * max_triangles * 3);
```
I stared at that 3 for a solid minute
Before remembering: "Wait a minute, a triangle has 3 points"
Incredible.
smh
struct Tringle
{
uint32_t verticeeeez[3];
};
std :: vector < Tringle >(max_meshlets * max_triangles);
```cpp
struct meshlet_t {
    uint32 vertex_offset = 0;
    uint32 vertex_count = 0;
    uint32 triangle_offset = 0;
    uint32 triangle_count = 0;
    // these index into meshlet_group_t::vertices
    std::vector<uint32> vertices;
    // gl_PrimitiveIndicesNV I guess
    std::vector<uint8> triangles;
};

struct meshlet_group_t {
    std::vector<meshlet_t> meshlets;
    std::vector<vertex_format_t> vertices;
    std::vector<uint32> indices;
};
```
I have no idea what I'm doing
I'm relying on meshoptimizer's documentation which is not much
```cpp
constexpr auto max_vertices = 64u;
constexpr auto max_triangles = 124u;
constexpr auto cone_weight = 0.0f;

// Worst-case meshlet count, used to size the output arrays.
const auto max_meshlets = meshopt_buildMeshletsBound(indices.size(), max_vertices, max_triangles);
auto meshlets = std::vector<meshopt_Meshlet>(max_meshlets);
auto meshlet_vertices = std::vector<uint32>(max_meshlets * max_vertices);
auto meshlet_triangles = std::vector<uint8>(max_meshlets * max_triangles * 3);
const auto meshlet_count = meshopt_buildMeshlets(
    meshlets.data(),
    meshlet_vertices.data(),
    meshlet_triangles.data(),
    indices.data(),
    indices.size(),
    (const float32*)vertices.data(),
    vertices.size(),
    sizeof(vertex_format_t),
    max_vertices,
    max_triangles,
    cone_weight);

// Trim the arrays to what was actually written (triangle data is padded to 4 bytes).
auto& last_meshlet = meshlets[meshlet_count - 1];
meshlet_vertices.resize(last_meshlet.vertex_offset + last_meshlet.vertex_count);
meshlet_triangles.resize(last_meshlet.triangle_offset + ((last_meshlet.triangle_count * 3 + 3) & ~3));
meshlets.resize(meshlet_count);
```
This "works"
i.e: it doesn't crash
oh jeez
I guess the "triangles" are indices into the meshlet itself?
I guess cone_weight is some value to influence how faces are grouped w.r.t. their normal
idk, we're all guessing
It's something regarding cone based culling but I dunno
"cone_weight should be left as 0 if cluster cone culling is not used, and set to a value between 0 and 1 to balance cone culling efficiency with other forms of culling like frustum or occlusion culling."
Whatever this means
yeah, ultimately you would use it to make normal-based culling better
but optimizing for that too much means you might group far apart vertices together, making frustum and occlusion culling worse
What I would really like to know right now is whatever the hell gl_PrimitiveIndicesNV is
Since it's a uint8 and NVIDIA only allows up to 256 - 1 triangles in a meshlet I suppose they are indices that index into the meshlet vertices?
i.e: gl_MeshVerticesNV
When each mesh shader work group completes, it emits an output mesh
consisting of
...
* an array of vertex index values written to the built-in output array
gl_PrimitiveIndicesNV, where each output primitive has a set of one,
two, or three indices that identify the output vertices in the mesh used
to form the primitive.
So... yes?
gl_PrimitiveIndicesNV are indices for gl_MeshVerticesNV?
I'll just assume yes for now
I'll know if I'm wrong because I will see funny triangles (or none at all)
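To make the indexing concrete, here is a small CPU-side sketch of how meshoptimizer's meshlet output maps back to the original vertex buffer. The `Meshlet` struct mirrors the fields of `meshopt_Meshlet`, while `expand_meshlet` is a hypothetical helper written for this example. Each `uint8` in the triangle array is a meshlet-local index, and `meshlet_vertices` turns it into an index into the original vertex buffer:

```cpp
#include <cstdint>
#include <vector>

// Mirrors the fields of meshoptimizer's meshopt_Meshlet.
struct Meshlet {
    std::uint32_t vertex_offset;    // first entry in meshlet_vertices
    std::uint32_t triangle_offset;  // first entry in meshlet_triangles
    std::uint32_t vertex_count;
    std::uint32_t triangle_count;
};

// Hypothetical helper: rebuilds a classic index list for one meshlet.
inline std::vector<std::uint32_t> expand_meshlet(
    const Meshlet& meshlet,
    const std::vector<std::uint32_t>& meshlet_vertices,
    const std::vector<std::uint8_t>& meshlet_triangles) {
    std::vector<std::uint32_t> indices;
    indices.reserve(meshlet.triangle_count * 3);
    for (std::uint32_t i = 0; i < meshlet.triangle_count * 3; ++i) {
        // Local index in [0, vertex_count), the kind of value a mesh shader
        // would write to gl_PrimitiveIndicesNV.
        const std::uint8_t local = meshlet_triangles[meshlet.triangle_offset + i];
        // Global index into the original vertex buffer.
        indices.push_back(meshlet_vertices[meshlet.vertex_offset + local]);
    }
    return indices;
}
```

Drawing the expanded indices with a regular indexed draw is a cheap way to validate the meshlet data before involving mesh shaders at all.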
where did you see gl_MeshVerticesNV
```glsl
out gl_MeshPerVertexNV {
    vec4 gl_Position;
    float gl_PointSize;
    float gl_ClipDistance[];
    float gl_CullDistance[];
} gl_MeshVerticesNV[];
```
It's this thing apparently
hmm the roblox guy, the same guy who made meshoptimizer, forgor his name atm, has a video series of meshletisms going iirc
its vulkan, but still
I'll look it up one sec
I have to figure out if I need the original index buffer or not
I'm 80% sure I don't
https://github.com/zeux/niagara
We will kick off the Vulkan stream series by discussing what we're going to be building and the general approach; then we'll start writing code to get a triangle on screen.
i think it was towards the end, meshlet culling
fun fact, sponza subdivides in 3515 meshlets
(Assuming I didn't break any laws of physics)
We are back to the origins once more.
For the third time 
Jaker, could you translate into (comprehensible) english what the first parameter of glDrawMeshTasksNV does?
Wait, it's simply an offset into gl_WorkGroupID.x
Why would I ever need this
if you want to draw a subset of meshlets
consider it a convenience parameter, like how dispatches are 3D
reminds me of this frog_shush I made
the hand needs an outline though
Uh
------------
Internal error: assembly compile error for mesh shader at offset 30836:
-- error message --
line 1003, column 8: error: unknown opcode modifier
-- internal assembly text --
!!NVmp5.0
OPTION NV_internal;
OPTION NV_shader_storage_buffer;
OPTION NV_bindless_texture;
GROUP_SIZE 1;
PRIMITIVE_TYPE TRIANGLES;
PRIMITIVES_OUT 124;
VERTICES_OUT 64;
# cgc version 3.4.0001, build date Apr 13 2023
# command line args:
#vendor NVIDIA Corporation
#version 3.4.0.1 COP Build Date Apr 13 2023
#profile gp5mp
#program main
#semantic b_transform_buffer : SBO_BUFFER[0]
#semantic b_meshlet_buffer : SBO_BUFFER[1]
#semantic b_vertex_buffer : SBO_BUFFER[2]
#semantic pv
#semantic meshlet.5 : __LOCAL
#var uint3 gl_WorkGroupID : $vin.CTAID : CTAID[0] : -1 : 1
#var float4 gl_MeshVerticesNV[0].gl_Position : $vout.POSITION : HPOS[32] : -1 : 1
#var float gl_MeshVerticesNV[0].gl_PointSize : $vout.PSIZE : : -1 : 0
#var float gl_MeshVerticesNV[0].gl_ClipDistance[0] : : : -1 : 0
#var float gl_MeshVerticesNV[0].gl_Cul
I managed to crash NVIDIA's internal shader compiler
oof
Removing this:
```glsl
layout (location = 0) in flat uint o_meshlet_id;
```
fixes it
Ah I see
```glsl
layout (location = 0) out t_per_vertex {
    uint meshlet_id;
} o_per_vertex[];
```
Where the hell do I put the flat
lol
out flat t_per_vertex doesn't work
....how do I send flat attributes to the frag shader?
It's per vertex not per primitive sadly
and from the mesh shader, it should be easy
You have access to the whole meshlet in the mesh shader, no?
You should be able to do whatever you want with a little creativity
So far my creativity has caused the NVIDIA internal compiler to crash 4 times

I figured it out btw
```glsl
layout (location = 0) out t_per_vertex {
    flat uint meshlet_id;
} o_per_vertex[];
```
this is legal apparently
The spec says it's not
But NVIDIA is NVIDIA I guess
What do you mean nsight does not support debugging their own fucking extension
Only D3D12 Mesh Shaders are supported in NSight apparently..
At least deccer's cubes work
But I'll leave it like this
There's no way I can continue with mesh shading if I can't even debug...
Pretty sad
I once again forgor a triangle has 3 vertices
pog
Now I just have to figure out why everything borks when there's more than one meshlet group
While I figure that out, here's good froge
frogchamp
83796 meshlets and 5942 meshlet groups (or if you are old fashioned "meshes")
Occupancy is dead though 
I'm hitting the memory limit 11 times out of 10 
```glsl
#version 460 core
#extension GL_NV_mesh_shader : require

struct meshlet_t {
    uint triangle_count;
    uint vertex_count;
    uint vertex_offset;
    uint[64] vertices;
    uint[384] triangles;
    uint mesh_index;
};

struct vertex_format_t {
    vec4 position;
    vec4 normal;
    vec4 uv;
    vec4 tangent;
};

layout (local_size_x = 1) in;
layout (triangles, max_vertices = 64, max_primitives = 124) out;

layout (location = 0) out t_per_vertex {
    flat uint meshlet_id;
} o_per_vertex[];

layout (location = 0) uniform mat4 pv;

layout (std430, binding = 0) readonly restrict buffer b_transform_buffer {
    mat4[] transforms;
};

layout (std430, binding = 1) readonly restrict buffer b_meshlet_buffer {
    meshlet_t[] meshlets;
};

layout (std430, binding = 2) readonly restrict buffer b_vertex_buffer {
    vertex_format_t[] vertices;
};

void main() {
    const uint workgroup_index = gl_WorkGroupID.x;
    const meshlet_t meshlet = meshlets[workgroup_index];
    const mat4 transform = transforms[meshlet.mesh_index];
    for (uint i = 0; i < meshlet.vertex_count; ++i) {
        const vertex_format_t vertex = vertices[meshlet.vertex_offset + meshlet.vertices[i]];
        gl_MeshVerticesNV[i].gl_Position = pv * transform * vec4(vertex.position.xyz, 1.0);
        o_per_vertex[i].meshlet_id = workgroup_index;
    }
    const uint index_count = meshlet.triangle_count;
    gl_PrimitiveCountNV = index_count;
    for (uint i = 0; i < index_count * 3; ++i) {
        gl_PrimitiveIndicesNV[i] = meshlet.triangles[i];
    }
}
```
Backup
i think perprimitiveNV qualifier is what you want
Interesting, I'll check it out
I think this struct is the culprit by the way:
```glsl
struct meshlet_t {
    uint triangle_count;
    uint vertex_count;
    uint vertex_offset;
    uint[64] vertices;
    uint[384] triangles;
    uint mesh_index;
};
```
Each thread loading 2KiB of data isn't ideal I think 
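To put a number on that: under std430 every member of the struct is a 4-byte uint, so one meshlet comes to 1808 bytes. The C++ sketch below mirrors the struct on the CPU side and compares it with a hypothetical packed variant that stores four 8-bit triangle indices per uint. The packed layout and the `unpack_index` helper are assumptions for illustration, not something from the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// CPU-side mirror of the GLSL meshlet_t above (every member is a 4-byte uint).
struct meshlet_wide_t {
    std::uint32_t triangle_count;
    std::uint32_t vertex_count;
    std::uint32_t vertex_offset;
    std::uint32_t vertices[64];
    std::uint32_t triangles[384];
    std::uint32_t mesh_index;
};

// Hypothetical packed variant: four 8-bit triangle indices per uint.
struct meshlet_packed_t {
    std::uint32_t triangle_count;
    std::uint32_t vertex_count;
    std::uint32_t vertex_offset;
    std::uint32_t vertices[64];
    std::uint32_t triangles[384 / 4];
    std::uint32_t mesh_index;
};

static_assert(sizeof(meshlet_wide_t) == 1808, "(3 + 64 + 384 + 1) * 4 bytes");
static_assert(sizeof(meshlet_packed_t) == 656, "(3 + 64 + 96 + 1) * 4 bytes");

// Extracts the i-th 8-bit index from the packed array (lowest byte first).
constexpr std::uint32_t unpack_index(const std::uint32_t* packed, std::size_t i) {
    return (packed[i / 4] >> ((i % 4) * 8)) & 0xffu;
}
```

The same shift-and-mask would work in GLSL, since the indices never exceed 255.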
the bad occupancy you mean? That's probably because of the layout (local_size_x = 1) in; so it's only using 1/subgroupSize of the hardware
Ah yeah I changed that, I now use 32
Occupancy is still horrible, though better by a factor of 2
btw didnt you start learning opengl only a few months ago how can you already be messing with mesh shaders and shit thats crazy
I just copied the nvidia sample 
Also jaker helped me translate the spec into human language
pretty sure I was doing learnopengl shit a couple months in
I am still doing that by the way, I implemented normal mapping on the way
I didn't have this server when I started tho
Yeah, this server is a gold mine
what's funny is that I recently submitted a bug report to nvidia and got a reply basically telling me to rtfm
this thread singlehandedly making me question what I've been spending my time on
I haven't done a bunch of the things lvstri is doing here
the shadows are way cooler already
and I haven't even touched mesh shaders
You gave me the filtering algorithm though
yeah they look great :3
probably look back through this thread when I go back to making my shadows look better
I also spent a solid week fixing bugs in my shadows 
and look how it paid off
Everything regarding shadows is here btw: https://github.com/LVSTRI/Iris/blob/master/shaders/5.0/setup_shadows.comp
While filtering is here: https://github.com/LVSTRI/Iris/blob/master/shaders/5.0/main.frag#L100
ty, I've saved those links for later to have a browse
although with what I'm currently doing I don't care about shadows too much
but i'll definitely take a look after
I somewhat fixed the occupancy by putting everything in buffers, but it's still limited because there's too much data in each thread?
What the hell is a subgroupBallot and subgroupVote
We're electing the next US president?
Ah, GL_KHR_shader_subgroup isn't even supported in OpenGL
There is no apparent way of scheduling work to the mesh shader from the task shader that allows culling in OpenGL without KHR_shader_subgroup
I guess this is as far as we go huh...
Pretty sad, I was having a ton of fun even without a debugger
Final image with Mesh Shaders, 32 threads per workgroup, 32 vertices MAX (1 to 1), 124 primitives MAX, 174777 meshlets
It's time to go back to our origins, with good ol' vertex shaders.
they are supported. NVIDIA, AMD and Intel have them in OpenGL. Just not core which is sad
I see, then they are buried under the second page result of google, because first page is Vulkan only 
I don't think the next core opengl version will come for a long time lol
I'm sure it'll come within the heat death of the universe
actually one type of subgroup operations is core (since 4.6) : ARB_shader_group_vote
Interesting, I'll eventually come back to mesh shaders, maybe I'll figure these out.
Hi-Z time!
what
Nevermind that, you can do culling with GL, I just refused to go past the first few google hits lol
I also didn't bother to look at how ballotThreadNV actually worked, turns out it's extremely useful
Yes indeed, unfortunately it's very different from the NV one.
That said, GPUs are scary.
gpus are epiiiiiiic
GPUs are trying to take over the world, they can already feed themselves data to work with.
With task shaders they can even dispatch work to themselves
KHR_shader_subgroup_ballot is a superset of ARB_shader_ballot which is basically NV_shader_thread_group
if this continues, lustri is publishing some api more capable than vulkan, soon
you read it here first
I doubt that, but I may or may not try to hack bindless textures into RenderDoc.
It's really a pain having to disable textures every time I want to debug with it.
yes please
that also reminds me i wanted to readd a texturearray path for that reason to my shit
you'd be a legend if you did it
it's probably not easy to implement if baldurk has refused to do it a million times already
hmm, maybe you can still hack a non-complete, non-performant version in somehow
jaker and i are worthy guinea pigs
How do I know the size of the mips in a texture created with glTextureStorage2D(..., mips)?
If the base level is 1024x768, what's level 1?
The spec says how they're created
But basically it's just max(1, floor(res / 2)) for each level
Very nice
So if I specify floor(log2(max(w, h))) + 1 mips I'll have
0 -> 1024x768
1 -> 512x384
...
9 -> 2x1
10 -> 1x1
yeah, probably
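The table above can be checked with a few lines of C++. This is only a sketch of the two formulas quoted in the thread (`floor(log2(max(w, h))) + 1` levels, each `max(1, floor(res / 2))` of the previous one); the function names are made up.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>

// Number of levels in a full mip chain: floor(log2(max(w, h))) + 1.
std::uint32_t mip_count(std::uint32_t width, std::uint32_t height) {
    assert(width > 0 && height > 0);
    std::uint32_t levels = 1;
    for (std::uint32_t size = std::max(width, height); size > 1; size >>= 1) {
        ++levels;
    }
    return levels;
}

// Size of one level: repeated max(1, floor(size / 2)), which for integers
// equals max(1, size >> level). Assumes level < 32.
std::pair<std::uint32_t, std::uint32_t> mip_size(std::uint32_t width, std::uint32_t height, std::uint32_t level) {
    assert(level < 32);
    return {
        std::max<std::uint32_t>(1, width >> level),
        std::max<std::uint32_t>(1, height >> level),
    };
}
```

For 1024x768 this gives 11 levels, with level 9 at 2x1 and level 10 at 1x1, matching the list.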
I worry about the probably, but it'll be fine
I almost always mince my statements so I can't be wrong 
not curry?
that comes later
Anything that combines textureGather and textureLod?
interesting AMD has an extension just for that
There is this stupid edge that doesn't go away
```glsl
#version 460 core

layout (local_size_x = 16, local_size_y = 16, local_size_z = 1) in;

layout (location = 0) uniform uint u_level;
layout (binding = 0) uniform sampler2D u_in_depth;
layout (binding = 1, r32f) uniform writeonly image2D u_out_depth;

void main() {
    const uvec2 coord = gl_GlobalInvocationID.xy;
    const ivec2 size = imageSize(u_out_depth);
    if (all(lessThan(coord, size))) {
        const vec4 depth = vec4(
            textureLod(u_in_depth, ((vec2(coord) + vec2(0.0, 0.0) + 0.5) / vec2(size)), u_level).r,
            textureLod(u_in_depth, ((vec2(coord) + vec2(1.0, 0.0) + 0.5) / vec2(size)), u_level).r,
            textureLod(u_in_depth, ((vec2(coord) + vec2(0.0, 1.0) + 0.5) / vec2(size)), u_level).r,
            textureLod(u_in_depth, ((vec2(coord) + vec2(1.0, 1.0) + 0.5) / vec2(size)), u_level).r);
        imageStore(u_out_depth, ivec2(coord), vec4(max(max(depth.x, depth.y), max(depth.z, depth.w))));
    }
}
```
What's wrong here?
hmm wrong addressmode perhaps in your u_in_depth?
clamp_to_edge vs clamp_to_border?
(i am just talking out of my ass here)
Yeah it's clamp_to_edge
what if you clamp to border and bordercolly to black?
Wait hold on, I'm an idiot.
Only the first mip is clamp_to_edge
Lovely, it works now
Hmm, HiZ culling has the opposite problem as ROC
It's not very conservative
Mayhaps I have some errors in my implementation.
what was ROC again?
Raster Occlusion Culling
never heard that before
Wot
That doesn't make a lot of sense yes, I meant to say "only the original depth buffer is clamp to edge, the actual hiz mip chain was clamp to border"
I should probably start using sampler objects.
Sampler objects are bae
Hmm HiZ is super fast
By a factor of 2 at least
Still broken though 
It's possible that D3D/Vulkan conventions are biasing this article
try glClipControl to change the depth range, then call the corresponding glm function to generate a new projection
I'd have to change all my 60 shaders to use [0;1] instead of [-1;1]
But honestly it's worth it, I don't know who the hell thought -1,1 depth was a good idea, I hope he's repenting
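On the API side the switch is `glClipControl(GL_LOWER_LEFT, GL_ZERO_TO_ONE)` plus a matching projection (GLM offers `GLM_FORCE_DEPTH_ZERO_TO_ONE` or the `_ZO` function variants). The C++ sketch below only illustrates what changes numerically, using the standard right-handed perspective depth terms; the function names are made up and nothing here is taken from Iris.

```cpp
#include <cassert>
#include <cmath>

// NDC depth under OpenGL's default [-1, 1] convention.
// Right-handed view space: the camera looks down -z, so visible z_view is negative.
float ndc_depth_negative_one_to_one(float z_view, float z_near, float z_far) {
    assert(z_near > 0.0f && z_far > z_near);
    const float z_clip = -(z_far + z_near) / (z_far - z_near) * z_view
        - 2.0f * z_far * z_near / (z_far - z_near);
    const float w_clip = -z_view;
    return z_clip / w_clip;
}

// NDC depth under the [0, 1] convention used with GL_ZERO_TO_ONE.
float ndc_depth_zero_to_one(float z_view, float z_near, float z_far) {
    assert(z_near > 0.0f && z_far > z_near);
    const float z_clip = z_far / (z_near - z_far) * z_view
        - z_far * z_near / (z_far - z_near);
    const float w_clip = -z_view;
    return z_clip / w_clip;
}
```

Both map the far plane to 1, but the near plane lands on -1 in the first case and on 0 in the second, which is why any shader that reconstructs positions from depth has to be updated along with the projection.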
Finally, I switched to a sane(r) NDC system
No more depth [-1,1] bullshit. (HiZ is still broken though)
Hmm these are the AABB's uvs that HiZ is seeing, I'm not sure I see anything wrong with them
Maybe the scale?
I am failing to understand HiZ
And the funny part is I don't know why I'm failing at understanding it, it's quite straightforward.
I'll start again from square 0, building the HiZ mip chain
are you failing to understand HiZ as a whole, or just a particular bit of its implementation?
I don't know, I feel like I understand it, but when I try and apply "fixes" that I think are causing me problems, everything breaks (or nothing changes at all).
Also:
```glsl
const mat4 pv = u_cascade_layer == -1 ? camera.pv : cascades[u_cascade_layer].pv;
// project AABB in clip space
vec4 ndc_corner = pv * model * vec4(aabb_corners[i], 1.0);
ndc_corner.z = max(ndc_corner.z, 0.0);
ndc_corner /= ndc_corner.w;
```
Why the hell we max out ndc's Z before perspective div is a mystery
when you say "i dont know" you need to say it like jimmy yang in his chinese accent... "oh.... i dont knoow"
then why do you have it
His fault
I mean, it makes sense to do that, just not before perspective division
We don't care about objects behind the near plane after all.
does it even matter if you do the clamp before or after perspective division
the eventual value will be 0 either way
ASUS ROC (Republic of Gamers)
Alright enough complaining, I'll get to work seriously now
Ah just one thing, could you find some HiZ samples for me? I can't find anything other than Niagara and the one linked above
I'd like to see some common ground, hopefully
vkguide.dev has one
I like this variable name
NoofInstances
N o o f
Is there anything equivalent to VK_STRUCTURE_TYPE_SAMPLER_REDUCTION_MODE_CREATE_INFO_EXT in OpenGL?
Looks like vkguide is using it as a foolproof way of reducing a depth image
Also that's one long struct name lol
no
it's not hard to implement yourself, it'll just perform worse since you have to explicitly take four shrimples
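As a rough picture of what "implement it yourself" means, here is a CPU-side C++ sketch of one reduction step that keeps the farthest depth of each footprint. It assumes non-reversed depth (larger is farther) and uses one common way of handling odd source sizes, widening the footprint to three texels along the odd axis so no texel is dropped. The type and function names are made up; a compute shader version would do the same per invocation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct depth_image_t {
    std::size_t width;
    std::size_t height;
    std::vector<float> texels; // row-major, width * height entries
};

// Builds the next mip level by taking the max (farthest) depth of each footprint.
depth_image_t reduce_depth_max(const depth_image_t& source) {
    assert(source.width > 0 && source.height > 0);
    assert(source.texels.size() == source.width * source.height);
    auto result = depth_image_t();
    result.width = std::max<std::size_t>(1, source.width / 2);
    result.height = std::max<std::size_t>(1, source.height / 2);
    result.texels.resize(result.width * result.height);
    // 2 texels per axis normally, 3 when that axis of the source is odd.
    const std::size_t footprint_x = (source.width % 2 == 1) ? 3 : 2;
    const std::size_t footprint_y = (source.height % 2 == 1) ? 3 : 2;
    for (std::size_t y = 0; y < result.height; ++y) {
        for (std::size_t x = 0; x < result.width; ++x) {
            const std::size_t x_end = std::min(x * 2 + footprint_x, source.width);
            const std::size_t y_end = std::min(y * 2 + footprint_y, source.height);
            float farthest = source.texels[(y * 2) * source.width + x * 2];
            for (std::size_t sy = y * 2; sy < y_end; ++sy) {
                for (std::size_t sx = x * 2; sx < x_end; ++sx) {
                    farthest = std::max(farthest, source.texels[sy * source.width + sx]);
                }
            }
            result.texels[y * result.width + x] = farthest;
        }
    }
    return result;
}
```

With reverse depth the `std::max` would become `std::min`.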
just go straight to whatever you want to call your gpu architecture + api which will be the successor of vk
where xxReduceDepth() is a thing
:(
it's ez pz
return 0;
or return 1; if you don't have reverse depth 
How
In god's holy name
is level 9
lower than level 10
How does this even happen
isn't that expected behavior
the final mip should be really deep since it's the deepest of all pixels
Well yes, it would be
except I omitted one crucial detail
level 8 is also higher than level 9
Which is absolutely bonkers
maybe you don't handle reducing odd resolutions correctly
you have to do something special for those iirc
yes
Which is why I made this:
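```cpp
// Returns the largest power of two strictly below v (0 when v <= 1).
// Assumes v <= 2^31, otherwise the shift wraps around.
static auto previous_power_two(iris::uint32 v) noexcept -> iris::uint32 {
    auto r = iris::uint32(1); // unsigned, so the comparison with v has matching signedness
    while (r < v) {
        r <<= 1;
    }
    return r >> 1;
}
```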
Here projectSphere seems to take a world space sphere, but that doesn't seem right does it? https://github.com/zeux/niagara/blob/4e3e21440e4b7d0699bcc5d46f2efbe1e0050946/src/shaders/drawcull.comp.glsl#L82
DX12 equivalent if you're curious
https://github.com/microsoft/DirectX-Specs/blob/master/d3d/MeshShader.md#primitive-attributes
Good morning friends.
Day 2 of debugging HiZ, hopefully I'll have a clue what's going on by today
did it come to you in your dreams
idk, never implemented it
I mean, I know how it works conceptually
do depth reduction, project object bounds (AABBs or spheres), gather some texels, etc.
Could you find the original SIGGRAPH slides or the original paper? I can't seem to find it lol
I see you don't go past google hit #5 too 
is there something fundamental about hi-z culling that you're unsure about?
A bit of this
This is deccer cubes
These are the screen space bounding rectangles that HiZ is seeing
Now, suppose the red cube is huge, except there's a hole in the middle that allows you to see the purple and green cubes
Would they get culled because the bounding box's depth is lower than in the HiZ's?
shouldn't be a problem
the bounding boxes cannot occlude other bounding boxes
they are simply a conservative testing volume
which you test against the depth buffer
So the bounding rects simply serve as an "index" against the HiZ chain?
they serve the same purpose as the bounding volumes in ROC
a volume that conservatively encapsulates an object which, if visible, is an indication that the object itself is probably visible as well
hi-z and ROC are really just different ways of testing the bounding volumes against the depth buffer
Ah I was going with the opposite (and wrong) intuition: if the volume is NOT visible then the object is culled
that is correct though
if the conservative bounding volume isn't visible, then the object itself definitely isn't
Hmm I see
but if the bounding volume is visible, then the object is only probably visible (because the bounding volume is conservative)
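The logic discussed above can be sketched on the CPU in C++. This is a simplified illustration under stated assumptions, not the Niagara or Iris implementation: depth is in [0, 1] with larger meaning farther, each pyramid texel stores the farthest depth it covers, and the bounds are already projected to a screen rectangle. All names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One level of the depth pyramid; each texel stores the farthest depth it covers.
struct depth_level_t {
    std::size_t width;
    std::size_t height;
    std::vector<float> texels; // row-major
};

// Screen-space bounds of an object in level 0 pixels, plus its nearest depth.
struct screen_rect_t {
    float min_x, min_y, max_x, max_y;
    float nearest_depth;
};

bool is_occluded(const std::vector<depth_level_t>& pyramid, const screen_rect_t& rect) {
    assert(!pyramid.empty());
    // Pick the level whose texels are at least as large as the rectangle,
    // so the rectangle only touches a handful of texels.
    const float extent = std::max({ rect.max_x - rect.min_x, rect.max_y - rect.min_y, 1.0f });
    const auto wanted = static_cast<std::size_t>(std::ceil(std::log2(extent)));
    const depth_level_t& base = pyramid.front();
    const depth_level_t& mip = pyramid[std::min(wanted, pyramid.size() - 1)];

    const auto to_texel = [](float pixel, std::size_t base_size, std::size_t mip_size) {
        const float texel = std::floor(pixel * float(mip_size) / float(base_size));
        return static_cast<std::size_t>(std::clamp(texel, 0.0f, float(mip_size - 1)));
    };
    const std::size_t x0 = to_texel(rect.min_x, base.width, mip.width);
    const std::size_t x1 = to_texel(rect.max_x, base.width, mip.width);
    const std::size_t y0 = to_texel(rect.min_y, base.height, mip.height);
    const std::size_t y1 = to_texel(rect.max_y, base.height, mip.height);

    float farthest = 0.0f;
    for (std::size_t y = y0; y <= y1; ++y) {
        for (std::size_t x = x0; x <= x1; ++x) {
            farthest = std::max(farthest, mip.texels[y * mip.width + x]);
        }
    }
    // Occluded only if even the nearest point of the bounds lies behind
    // the farthest depth already stored for that screen area.
    return rect.nearest_depth > farthest;
}
```

A hole in the occluder shows up as a far depth value in the pyramid, so anything behind it stays visible, which is the conservative behaviour described above.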
I think I'm getting close to solving it.
I only need to figure out the projectSphere's weird projection thing

I would like to order one death please
You would think.
As a human person
That GL_NEAREST takes the closest pixel to the UV coordinates specified in textureLod
GL_CLOSEST does
It just truncates
badumm tsss
ah
because gl is so unhinged that you have to specify min and mip in the same parameter
otherwise you implicitly disable mips
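To spell out the "same parameter" point: GL_TEXTURE_MIN_FILTER takes one enum that encodes both the within-level filter and the between-level filter, and the two plain values (GL_NEAREST, GL_LINEAR) only ever read the base level. A small C++ sketch of that mapping follows; the enum types and function name are made up, and the hex values are the standard OpenGL constants, written out so the snippet compiles without GL headers.

```cpp
#include <cassert>
#include <cstdint>

enum class filter_t { nearest, linear };         // filtering within a mip level
enum class mip_mode_t { none, nearest, linear }; // filtering between mip levels

// Combines the two choices into the single GL_TEXTURE_MIN_FILTER value.
std::uint32_t min_filter_enum(filter_t filter, mip_mode_t mip) {
    const bool is_linear = filter == filter_t::linear;
    switch (mip) {
        // GL_LINEAR : GL_NEAREST (mips are ignored)
        case mip_mode_t::none:    return is_linear ? 0x2601u : 0x2600u;
        // GL_LINEAR_MIPMAP_NEAREST : GL_NEAREST_MIPMAP_NEAREST
        case mip_mode_t::nearest: return is_linear ? 0x2701u : 0x2700u;
        // GL_LINEAR_MIPMAP_LINEAR : GL_NEAREST_MIPMAP_LINEAR
        case mip_mode_t::linear:  return is_linear ? 0x2703u : 0x2702u;
    }
    assert(false && "unreachable");
    return 0x2600u;
}
```

For a depth pyramid read with textureLod, the result of `min_filter_enum(filter_t::nearest, mip_mode_t::nearest)` is what would be passed to `glSamplerParameteri(sampler, GL_TEXTURE_MIN_FILTER, ...)`.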
wouldntvedve happend with the right abstraction
fwog user #2???
I want to say this was a good experience
OpenGL is like a diving in some closed mines experience
But this really wasn't, I just hope OpenGL doesn't fail me again...
crystal clear water everywhere, but as soon as you dive and cause a wave shock, shit is going to hit the fan
or maybe it's lvstri's evil villain Vulkan backstory
haha
By the way, in the few months I have only had a few dozen debug messages
there are usually not many anyway
Not once has the debug message callback come to my help like "hey, you might want to do GL_NEAREST_MIPMAP_NEAREST"
Did you ever enable the synchronous thingy
Yes since you told me
glValidationLayer :C
Haven't had the chance to use it though, since literally not a single message has happened
Well it's not an api error to do that methinks
if the api would have known what you wanted to achieve
Right, I just don't think implicitly disabling mips is a good way of handling this
Give me literally any message, don't just do stuff silently
don't be so hard on either of you (you and OpenGL)
the latter will have more opportunities to bite you in the butt
pengu jaker and i were thinking/toying with some glValidation layer thing a while ago, but its super dead
Blue is mip level, red is value sampled from the uvs of that level
that thing could/should/might have picked that one up
I checked level 9, and the mip was just 1.0
So thank you, OpenGL.
I'm sure this would not have happened with any saner API
.
But it is what it is..
At least I have occlusion culling now
Actually, I have 4 different fully functional algorithms of doing occlusion culling 
noice
Because yes, I rewrote the entire algorithm 4 times, in different ways
And they all work perfectly fine
Now, onto removing the over 500 lines of code purely for debugging purposes 
After this I think I'll be taking a big chamomile cup
I'm kind of irritated.
You're using the wrong API then, my friend
https://github.com/GraphicsProgramming/gl-validation-layer is the thing i was talking about
I came back.
I checked this out too, but it looks like it's abandoned?
It's merely taking a nap
I do think sometimes that I could "just" switch to Vulkan, "just" use meshlets, "just" use meshlet culling which is super accurate and all the bleeding edge features you can name
Except the "just" 
I saw some Vulkan code today as a result of trying to figure out why HiZ was not working and it's very thicc
I don't really want to give up the convenience of GL yet.
Maybe you could do vkguide as a side-side project
The unhingedness does not outweigh the convenience (yet)
ja its kinda dead
You could always use a nice gl wrapper ahem (ignore broken docs build)
https://github.com/JuanDiegoMontoya/Fwog
or switch to c# and use my experiment
At least I know I'm not alone fighting the non-existent debugging messages
But we accept contributors
Anyways I have now calmed down and I'm not mad anymore at GL
I want to do something smol next.
Anti-Aliasing
Do something easy like cutting edge TAA
Except I won't implement TAA myself, I'll just use FSR2 since it's open source (Thank you Jaker)
One small issue
Only Vulkan and dx12 backend are provided
So you'd have to make a gl one
Why do I get the feeling it's not... small
Yeah
You work at AMD right? Just make an OpenGL version duh
Sad
If you want, I can start working on it. I've been needing an excuse to do it
What do you mean it's not supported on OpenGL by the way? Isn't it just a spatio-temporal upscaling algorithm?
Like, why does it require Vulkan or DX12?
It's not just a shader you invoke. Fsr2 also needs a bunch of internal resources and stuff
from 1 to 10? 9.2 id say
Probably not super hard, provided you understand the requirements
...Translating Vulkan code (which I have 0 idea about) to OpenGL code (which I have at most 1 idea about) isn't exactly in my skillset
I am unironically willing to try it
I need to better understand fsr2 for my job anyways 
FXAA
FXAA sounds good
SMAA is like improved FXAA if you're willing to suffer through a much more complicated implementation
you got it figured out at least, you've been doing great things 
eyeris
I hope I don't bring shame to any women in Germany bearing that name with my shitty OpenGL stuff then 
Lovely edges
oi that looks cool
@wicked notch are you following the catlike coding tutorial (I think that was the one I used)
ah
+1 for sorting your messages for better readabliktliblity
the second link only implements half of FXAA btw
it doesn't do the end-of-edge search, which is pretty important for reducing geometric aliasing
I see, I'll try catlike coding then
The paper should implement everything anyways?
How many aa techniques are there anyway
Too many 
Visualizing AABBs when in FXAA debug mode is kinda weird
looks smoof to me
Yeah it's definitely better
this scene is probably also not a good scene to show off antialiasing
I've been trying to find new scenes, no luck unfortunately
the directxsamples might have one with 2 telephone poles and a wire
The tree in bistro gets blurred a lot lol
I'll download that
ah bistro also has those lights hanging across on wire
These DirectX samples made me remember how attracted I am to Mesh Shaders
They are so good
Anyways testing scene right now
After a bit more testing I found that FXAA was indeed working..
It's just.. well, not very good 
Do I do TAA or do I not do TAA...
Do it!
No temporal accumulation?
I mean something different
some old TAA impls have like 5 frames of history that they reproject and test against
but modern impls use the last frame only, since it's faster and more stable
Interesting
It's always either Marco Salvi or Akenine-Moller
oh hi marc(o)
They wrote 90% of the papers
I went with ROC in the end
HiZ is too damn conservative for my tastes
Anyways TAA, I think I get the gist of the algorithm but there's still some parts I can't figure out quite well cough neighborhood clipping
But we'll deal with that later.
never tried but I think for ROC GL_NV_representative_fragment_test can be useful
could be worth a try, you just need to do glEnable(GL_REPRESENTATIVE_FRAGMENT_TEST_NV)