#Iris - A Journey through OpenGL and beyond to learn Graphics


frank sail
#

Does gltfpack set a flag

wicked notch
#

It's kind of naive, I just see if the glTF has basisu images in it.

#

Ideally I'd want to set a flag somewhere in the extras section

#

Feel free to steal as I have stolen from others KEKW

frank sail
#

I've never seen std::type_identity before
auto* ktx = std::type_identity_t<ktxTexture2*>();

#

apparently it's useful for implicit type conversions to the template type

wicked notch
#

My reasoning for using that is I don't like declaring the type on the left

frank sail
#

uh

wicked notch
#

Call me crazy, I deserve it

frank sail
#

you can just write auto* ktx = ktxTexture2*{};

#

nvm bleakekw

#

you have to write it like (int*){}

#

er

#

(ktxTexture2*){}

wicked notch
#

Interesting.

frank sail
#

damn, libktx looks super easy to use

wicked notch
#

I guess my life is a lie then

frank sail
#

for value types, you can indeed write auto foo = Foo{}; (or Foo()), which I do a lot

#

I guess pointer syntax is brain damaged, so you can't write it exactly like those
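
A minimal sketch of the spellings discussed above, assuming a hypothetical stand-in type `Texture` in place of libktx's `ktxTexture2`. The functional-cast form `T{}` only accepts a simple type name, and `Texture*` is not one, which is why an alias is needed. The hand-rolled `same_t` alias below behaves like C++20's `std::type_identity_t` for this purpose.

```cpp
#include <cassert>

// Hypothetical stand-in for an opaque library type such as ktxTexture2.
struct Texture { int width = 0; };

// Hand-rolled equivalent of C++20's std::type_identity_t: an alias that
// names T itself, so same_t<Texture*> counts as a simple type name.
template <class T>
using same_t = T;

// Each function returns a value-initialized (null) pointer.
inline Texture* null_via_alias() {
    auto* p = same_t<Texture*>();  // same shape as std::type_identity_t<Texture*>()
    return p;
}

inline Texture* null_via_using() {
    using TexturePtr = Texture*;
    auto* p = TexturePtr{};        // works because TexturePtr is a plain type name
    return p;
}

inline Texture* null_via_cast() {
    auto* p = static_cast<Texture*>(nullptr);  // most explicit spelling
    return p;
}
```

All three keep the type on the right-hand side; which one reads best is a matter of taste.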

frank sail
#

I learn something new about C++ every time I read your code

#

what's the advantage of defining swap for a custom type instead of just defining a move constructor and move assignment operator?

wicked notch
#

I don't understand it fully either, but apparently it avoids repetition and favors ADL

frank sail
#

my move semantics are cursed, lemme show you

wicked notch
#

It's called the copy-and-swap idiom

frank sail
#
  Buffer::Buffer(Buffer&& old) noexcept
    : size_(std::exchange(old.size_, 0)),
      storageFlags_(std::exchange(old.storageFlags_, BufferStorageFlag::NONE)),
      id_(std::exchange(old.id_, 0)),
      mappedMemory_(std::exchange(old.mappedMemory_, nullptr))
  {
  }

  Buffer& Buffer::operator=(Buffer&& old) noexcept
  {
    if (&old == this)
      return *this;
    this->~Buffer();
    return *new (this) Buffer(std::move(old));
  }
wicked notch
#

I always flinch when I see a placement new

#

for some reason bleakekw

frank sail
#

I did some epic spec reading with others to ensure this is actually legal code

#

iirc there is an edge case with this pattern when you have a pointer to a subobject of the object that gets destroyed

#

where the spec isn't clear on whether it's UB

#

but no one should (or can) be making pointers to members of Buffer, so it's fine here

wicked notch
#

Reject the spec, embrace "it works on my machine"

frank sail
#

anyways, the whole point of this is to reduce duplication in favor of spooky placement new

#

60% of the time, it works every time

wicked notch
#

I'll keep my "swap" thanks KEKW

frank sail
#

you need move semantics anyways, no?

#

swapping isn't the only reason I need em

#

idk, this is #bikeshed-😇 material

wicked notch
#

If I didn't misunderstand that idiom, copy-and-swap works in all cases

#

Because you define only one copy assignment op, taking self by value

#

and you just swap that with *this

frank sail
#

yeah, it seems to work

wicked notch
#

If you pass anything that's not an xvalue or rvalue it copies it

#

otherwise, move constructed

#

Then you swap it with either the copied state, or "unspecified" state (depending always on whether you move or not) which agrees to move semantics
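
For reference, a small sketch of the copy-and-swap idiom as described in the messages above. The `Blob` class and its members are made up for illustration and are not taken from either project.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical resource-owning type used only to illustrate the idiom.
class Blob {
public:
    Blob() = default;
    explicit Blob(std::size_t size) : data_(new int[size]()), size_(size) {}
    Blob(const Blob& other) : data_(new int[other.size_]), size_(other.size_) {
        for (std::size_t i = 0; i < size_; ++i) data_[i] = other.data_[i];
    }
    Blob(Blob&& other) noexcept
        : data_(std::exchange(other.data_, nullptr)),
          size_(std::exchange(other.size_, 0)) {}
    ~Blob() { delete[] data_; }

    // One assignment operator for both cases: the by-value parameter is
    // copy-constructed from lvalues and move-constructed from rvalues.
    Blob& operator=(Blob other) noexcept {
        swap(*this, other);
        return *this;  // `other` now owns the old state and releases it
    }

    // Hidden friend, found through argument-dependent lookup (ADL).
    friend void swap(Blob& a, Blob& b) noexcept {
        using std::swap;
        swap(a.data_, b.data_);
        swap(a.size_, b.size_);
    }

    std::size_t size() const { return size_; }
    int& operator[](std::size_t i) { assert(i < size_); return data_[i]; }

private:
    int* data_ = nullptr;
    std::size_t size_ = 0;
};
```

Assigning from an lvalue leaves the source untouched, while assigning from `std::move(x)` leaves `x` empty, because the move constructor resets the moved-from members.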

#

I don't know where ADL comes into play but I never understood ADL

#

Or anything about C++'s overloading resolution rules

frank sail
#

yeah, they're absurdly complex

#

all I know is that it enables this construct
endl(std::cout);

#

oh, it also lets you do this

    std::string a{"hello"};
    std::string b{"world"};
    swap(a, b);
#

which means you don't have to force a particular version of swap, if your thing happens to define it in its namespace/class/whatever

#

std::swap means you will get something slightly less optimal than if you implemented your own swap, since it always move-constructs a temporary instead of just swapping each member
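
A short sketch of how ADL picks a custom `swap`, assuming a hypothetical `gfx::Handle` type. The `custom_swaps` counter exists only to make the chosen overload observable.

```cpp
#include <cassert>
#include <string>
#include <utility>

namespace gfx {  // hypothetical namespace for illustration

struct Handle {
    unsigned id = 0;
    int custom_swaps = 0;  // counts calls to gfx::swap, for demonstration only
};

// Custom swap in the type's own namespace; ADL finds it for gfx::Handle.
inline void swap(Handle& a, Handle& b) noexcept {
    std::swap(a.id, b.id);
    ++a.custom_swaps;
    ++b.custom_swaps;
}

}  // namespace gfx

// Generic code: `using std::swap` supplies the fallback, and the
// unqualified call lets ADL select a better overload when one exists.
template <class T>
void exchange_values(T& a, T& b) {
    using std::swap;
    swap(a, b);
}
```

Calling `std::swap(a, b)` directly would skip `gfx::swap` and go through the generic move-based version instead.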

#

maybe once they add reflection, we can finally have an optimal std::swap bleakekw

wicked notch
#

Good morning friends

#

Today I discovered that cutting electricity for half a day is legal in this country

#

Can't really do anything about it if it's an issue in the electrical distribution network.

#

Anyways now that we have Indirect Drawing and Shadows figured out, I will now ponder where to go next.

#

Possible candidates are OIT and Frustum Culling

#

OIT is really interesting because it uses Linked Lists on the GPU which to me is fairly wild.

frank sail
#

that's just one possible implementation of OIT

wicked notch
#

But no linked lists 😦

#

Are there other things that use GPU Linked Lists? I'm very curious to try them

frank sail
#

linked lists generally aren't what you want to be doing on the GPU 😄

frank sail
#

all done in a compoot shader

#

well, there's raster occlusion culling, which uses the frogment shader

wicked notch
#

Occlusion culling in compute hmm.

frank sail
#

look up "hi-z occlusion culling"

#

it's a bit more complex than raster occlusion culling

#

you can use both at the same time if you wish

wicked notch
#

Alright I have pondered enough, it's time to set up frustum + occlusion culling

wispy spear
#

so, how did you compress sponza

#

did you just run compressonator with some flags?

#

and it produced ktx2 files out of png?

wicked notch
#

I ended up not using Compressonator as it was far too overkill for my purposes, instead gltfpack is very automagic

#
```
gltfpack -i .\bistro\bistro.gltf -o .\compressed\bistro\bistro.glb -tc -tq 10 -vpf -kn -km -ke -noq
```
#

I just had to run this and boom: everything is compressed

wispy spear
#

ah neat

wicked notch
#

Somehow I always manage to forget GPUs are parallel machines

#

Hmm I still get occasional flickering for some reason?

#

Even though this was most likely the issue, there is still flickering once every second or so, randomly

#

Shader is just this:

void main() {
    uint index = gl_GlobalInvocationID.x;
    if (index == 0) {
        for (uint i = 0; i < u_draw_count; ++i) {
            draw_count[i] = 0;
        }
    }
    barrier();
    if (index < u_object_count) {
        const object_info_t object = objects[index];
        const mat4 local_transform = local_transforms[object.local_transform];
        const mat4 global_transform = global_transforms[object.global_transform];
        const mat4 model = global_transform * local_transform;
        if (is_object_visible(object, model)) {
            const uint slot = atomicAdd(draw_count[object.group_index], 1);
            indirect_commands[slot + object.group_offset] = object.command;
            object_shift[slot + object.group_offset].object_id = index;
        }
    }
}
#

(is_object_visible always returns true for now)

frank sail
#

do you call glMemoryBarrier in your host code

wicked notch
#

I have no idea what that is (so no)

frank sail
#

it's necessary to make your program correct

wicked notch
#

Interesting, I'll read up on that

frank sail
#

it ensures incoherent reads and writes (SSBO and image stores from shaders) are visible/completed to future operations

#

so if you write some indirect commands in a shader, then do glMultiDrawElementsIndirect, you need glMemoryBarrier(GL_COMMAND_BARRIER_BIT); between them

#

otherwise the driver cannot see that the MDI command depends on the dispatch and issue the corresponding synchronization and cache flush/invalidation

wicked notch
#

I see

#

So just like CPUs atomics then

#

I assume "visible" and "available" mean the same thing (I mean, not that visible == available, just that visible/available on the CPU are the same on the GPU)?

frank sail
#

glMakeVisible and glMakeAvailable bleakekw

#

(jk those don't exist)

wicked notch
#

It's basically a cache flush

frank sail
#

there is just one concept of memory visibility in opengl

#

btw, DX11 doesn't have this, so the driver has to issue conservative barriers between every pass that does incoherent writes

#

which means you can't mess up sync, but you also cannot get maximum perf

#

anyways, flickering like what you have is typically a symptom of a synchronization issue

#

you can check it by inserting glFinish or glMemoryBarrier(GL_ALL_BARRIER_BITS) after every draw/dispatch that does SSBO/image writes

wicked notch
#

Very interesting

#

Why does cache become incoherent on the GPU itself though?

#

Like, the GPU is feeding itself data, how does it fail to maintain coherency?

#

Oh it's probably because each GPU SM has its own cache and any workgroup writing data is not guaranteed to be the same reading it?

frank sail
#

GPU cache coherency protocols are very basic compared to CPU ones

#

and often rely on manual flushes and sync

#

idk all the details though

wicked notch
#

So hold on a sec

#

I also do a depth reduce and setup shadow cascades in compute

#

...Am I supposed to insert barriers here too?

frank sail
#

prob

wicked notch
#

But it's been working fine up until now with no errors from the debug callback frog_sweat

frank sail
#

any time you want a write to be visible or a read to have finished before the next pass that consumes the memory, you need a barrier

wicked notch
#

So this:

```cpp
depth_reduce_init_shader.bind();
offscreen_attachment[1].bind_texture(0);
depth_reduce_attachments[0].bind_image_texture(0, 0, false, 0, GL_WRITE_ONLY);
camera_buffer.bind_base(1);
glDispatchCompute(depth_reduce_wgc[0].x, depth_reduce_wgc[0].y, 1);

depth_reduce_shader.bind();
for (auto i = 1; i < depth_reduce_wgc.size(); i++) {
    depth_reduce_attachments[i - 1].bind_image_texture(0, 0, false, 0, GL_READ_ONLY);
    depth_reduce_attachments[i].bind_image_texture(1, 0, false, 0, GL_WRITE_ONLY);
    glDispatchCompute(depth_reduce_wgc[i].x, depth_reduce_wgc[i].y, 1);
}
```
becomes this:
```cpp
...
glDispatchCompute(depth_reduce_wgc[0].x, depth_reduce_wgc[0].y, 1);
glMemoryBarrier();
depth_reduce_shader.bind();
for (auto i = 1; i < depth_reduce_wgc.size(); i++) {
    ...
    glDispatchCompute(depth_reduce_wgc[i].x, depth_reduce_wgc[i].y, 1);
    glMemoryBarrier();
}
```
?
frank sail
#

ye

wicked notch
#

Hmm I also read from the depth's sampler though, do I need a barrier here too 🤔

#

Before the first dispatch, I mean

frank sail
#

wdym before the first dispatch

#

like in some draw?

wicked notch
#

Should I do glMemoryBarrier(GL_FRAMEBUFFER_BARRIER_BIT);?

#

The spec isn't very clear on the rules.

frank sail
#

SHADER_IMAGE_ACCESS_BARRIER_BIT: Memory accesses using shader built-in image load, store, and atomic functions issued after the barrier will reflect data written by shaders prior to the barrier. Additionally, image stores and atomics issued after the barrier will not execute until all memory accesses (e.g., loads, stores, texture fetches, vertex fetches) initiated prior to the barrier complete.

#

btw, the ref pages do not mention this critical information for some of the barrier bits, so you best refer to the spec

wicked notch
#

Hm

#

Interesting

frank sail
#

I guess that is to say that you need a barrier before the first dispatch as well

wicked notch
#

By the way

#

We might have a situation

#

Using glFinish(); I still see occasional flickering nervous

frank sail
#

spoopy

#

idk if glFinish technically makes writes visible, so maybe don't use that

#

or use glFinish+glMemoryBarrier if you're super paranoid 😄

wicked notch
#

Uhhh

#
```cpp
glMemoryBarrier(GL_ALL_BARRIER_BITS);
glFinish();
```
still flickers
#

oh no

frank sail
#

where did you put it

wicked notch
#

After the dispatch that writes indirect commands

frank sail
#

damn

#

maybe there is a race within that shader. Lemme look at it again

#

shader looks okay

wicked notch
#

I pushed this broken stuff

frank sail
#

I thought atomics could only be done on variables that were neither readonly nor writeonly

wicked notch
#

Compiler doesn't really complain but you are right lol

#

It doesn't really make sense for it to be writeonly

#

Same flickering though

frank sail
#

I guess try debugging with nsight

#

or somehow simplifying the shader (e.g., removing the atomic and just using the global invocation id, if possible)

#

wait

#

Can you explain the loop at the beginning of your shader

wicked notch
#

Since the shader increments the draw_count, I need a way to reset it between invocations

#

So that it doesn't grow to infinity and beyond

frank sail
#

What do you expect barrier() to do

wicked notch
#

Make all other threads wait for the first thread to finish initializing draw_count?

frank sail
#

Did you know that barrier only synchronizes threads within a single workgroup

#

i.e., it is not global sync

#

so the code is probably wrong if you have more than one wg
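
A CPU-side analogy of the compaction pattern, as a sketch only. It assumes `std::atomic::fetch_add` stands in for GLSL's `atomicAdd`, and that each loop iteration stands in for one shader invocation. The point is that the counters are reset before any invocation runs, instead of inside invocation 0 behind `barrier()`. The log does not say which fix was actually applied; on the GPU this might be a buffer clear or a separate dispatch followed by a memory barrier.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Object {
    std::uint32_t group_index;   // which draw group the object belongs to
    std::uint32_t group_offset;  // first command slot owned by that group
    bool visible;                // stand-in for is_object_visible()
};

// Emulates one culling pass. Returns, per slot, the index of the object
// written there, or -1 if the slot stayed empty.
inline std::vector<int> cull_pass(const std::vector<Object>& objects,
                                  std::vector<std::atomic<std::uint32_t>>& draw_count,
                                  std::size_t slot_count) {
    // Reset once, before any "invocation" runs, so no invocation can
    // observe a stale counter from the previous pass.
    for (auto& count : draw_count) count.store(0);

    std::vector<int> slots(slot_count, -1);
    for (std::size_t index = 0; index < objects.size(); ++index) {
        const Object& object = objects[index];
        if (!object.visible) continue;
        // Reserve the next free slot inside the object's group.
        const std::uint32_t slot = draw_count[object.group_index].fetch_add(1);
        assert(object.group_offset + slot < slot_count);
        slots[object.group_offset + slot] = static_cast<int>(index);
    }
    return slots;
}
```

Running the pass twice on the same input gives the same counts, which is the property the in-shader reset loop failed to guarantee across workgroups.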

wicked notch
#

me:

wicked notch
#

Yeah...

#

no more flickering

#

How many more absolutely vital pieces of information am I missing I wonder KEKW

frank sail
#

just read the whole gl and glsl specs before continuing smart

wicked notch
#

Took a whole day just to set up Frustum Culling

#

Very promising dare I say KEKW

frank sail
#

usually takes me a lot longer than that 🙂

wispy spear
#

and me ^3 that

wicked notch
#

Alright that's a wrap, tomorrow we'll have actual frustum culling (and possibly even occlusion)

wicked notch
#

By the way I only just realized that gltfpack merges primitives if they have the same node and material

#

Thankfully it's open source so I could simply add a flag and build from source

wicked notch
#
```glsl
bool intersect_aabb_plane(in aabb_t aabb, in vec4 plane) {
    const vec3 normal = plane.xyz;
    const vec3 size = aabb.size.xyz;
    const vec3 center = aabb.center.xyz;
    const float radius = dot(size, abs(normal));
    return -radius <= dot(normal, center) - plane.w;
}

bool is_object_visible(in object_info_t object, in mat4 model) {
    const aabb_t aabb = object.aabb;
    const vec3 world_aabb_max = vec3(model * vec4(aabb.max.xyz, 1.0));
    const vec3 world_aabb_min = vec3(model * vec4(aabb.min.xyz, 1.0));
    const vec3 world_aabb_center = (world_aabb_max + world_aabb_min) / 2.0;
    const vec3 world_aabb_extents = world_aabb_max - world_aabb_center;

    const aabb_t global_aabb = aabb_t(
        vec4(world_aabb_min, 0.0),
        vec4(world_aabb_max, 0.0),
        vec4(world_aabb_center, 0.0),
        vec4(world_aabb_extents, 0.0));
    for (int i = 0; i < 6; ++i) {
        if (!intersect_aabb_plane(global_aabb, frustum.planes[i])) {
            return false;
        }
    }

    return true;
}
```
It was disappointingly trivial to implement...
#

Learning about AABBs with mouse picking was worth it KEKW

wicked notch
#

Is it worth doing this for shadow maps too? 🤔

wicked notch
#

Hmm, frustum culling shadow cascades doesn't really work sadly, I need more tolerance I guess?

wicked notch
#

Turns out my intuition for culling shadow lights was completely off

wicked notch
#

CHC++ Uses hardware occlusion queries, but from what I'm reading they are fairly inefficient due to CPU stalling, can conditional rendering fix this?

#

Maybe HiZ culling is the way to go?

frank sail
#

Hardware occlusion queries aren't great these days since you can just write to a buffer now

wicked notch
#

But how do you do occlusion culling for shadows?

#

HiZ requires depth

#

But shadows are depth

frank sail
#

uh

#

you do it the exact same way as usual

frank sail
#

here's a classic algorithm:

  1. Render objects that were marked visible to depth
  2. Perform occlusion culling against depth, marking visible objects
#

if you do it all on the GPU, there is just one frame of latency between an object being marked visible, and actually being drawn

#

But you can add a third step to remove that latency

#

By simply drawing the objects whose visibility changed from 0 to 1 this frame

wicked notch
#

I uh

#

How do you do step 2 without rendering all objects

#

You need the depth of every object to check whether the object is visible or not?

frank sail
#

Step 2 depends on the implementation

#

For hi-z, it means performing the test for every object's bounding volume

#

For raster occlusion culling, it means drawing the bounding volume for every object (which is hopefully substantially cheaper than actually drawing every object)

#

Raster is cool because it's so shrimple

wicked notch
#

Wow you actually render cubes

#

Incredible

#

It's not like you told me already

#

...if I'm dumb

wispy spear
#

thats a cute pic

wicked notch
#

Shadow frustum culling for some unknown reason does not work

#

Isn't it the same exact thing? As frustum culling for perspective projections I mean.

#

Even though I explicitly disable near plane culling, it looks like it's doing it anyways...?

#

I think

#

Nevermind I don't

wicked notch
#

Ight it works now

wicked notch
#

Actually nevermind, that was just placebo, I'm not culling anything now nervous

#

I actually have no idea now 🤔

#
```glsl
bool is_aabb_inside_plane(in aabb_t aabb, in mat4 model, in vec4 plane) {
    const vec3 normal = plane.xyz;
    const vec3 extent = aabb.extent.xyz;
    const vec3 center = aabb.center.xyz;
    const float radius = dot(extent, abs(normal));
    return -radius <= (dot(normal, center) - plane.w);
}

bool is_object_visible(in object_info_t object, in mat4 model) {
    const aabb_t aabb = object.aabb;
    const vec3 world_aabb_min = vec3(model * vec4(aabb.min.xyz, 1.0));
    const vec3 world_aabb_max = vec3(model * vec4(aabb.max.xyz, 1.0));
    const vec3 world_aabb_center = vec3(model * vec4(aabb.center.xyz, 1.0));
    const vec3 right = vec3(model[0]) * aabb.extent.x;
    const vec3 up = vec3(model[1]) * aabb.extent.y;
    const vec3 forward = vec3(-model[2]) * aabb.extent.z;

    const vec3 world_extent = vec3(
        abs(dot(vec3(1, 0, 0), right)) +
        abs(dot(vec3(1, 0, 0), up)) +
        abs(dot(vec3(1, 0, 0), forward)),

        abs(dot(vec3(0, 1, 0), right)) +
        abs(dot(vec3(0, 1, 0), up)) +
        abs(dot(vec3(0, 1, 0), forward)),

        abs(dot(vec3(0, 0, 1), right)) +
        abs(dot(vec3(0, 0, 1), up)) +
        abs(dot(vec3(0, 0, 1), forward)));

    const aabb_t world_aabb = aabb_t(
        vec4(world_aabb_min, 1.0),
        vec4(world_aabb_max, 1.0),
        vec4(world_aabb_center, 1.0),
        vec4(world_extent, 1.0));
    const uint planes = bool(u_disable_near_culling) ? 5 : 6;
    for (uint i = 0; i < planes; ++i) {
        if (!is_aabb_inside_plane(world_aabb, model, frustum.planes[i])) {
            return false;
        }
    }

    return true;
}
```
This should be fine?
#

I mean, it works perfectly fine for a perspective projection, why not for shadows?

#

it's just culling completely visible objects for some unknown reason?

#

They aren't even z < 0

#

It's only the first cascade as well...

#

I'm lighting the Jaker beacon

frank sail
#

ask chatgpt what's wrong with your code bleakekw

wicked notch
#

Why would you even suggest that frog_gone

frank sail
#

cuz I'm a lazy bastard

frank sail
#

ngl I actually asked chatgpt, but I can't tell if its answer is correct nervous

#

probably because I don't understand 100% of the math in the original code

#

maybe it'll help if you walk me through the math, rubberducky style 😄

#

btw, is_aabb_inside_plane has an unused parameter

#

and arguably is_object_visible should take an aabb_t instead of an object_info_t, if all you need from it is the AABB

wicked notch
#

Yeah that's me checking various things

#

Anyways the math is as follows:

  • Translate AABB's center to world space model * vec4(center, 1)
  • Translate and correct AABB's extents (should account for rotations and scales, we use the first 3 columns of the model matrix to correct this)
  • Check if the AABB is on or inside all 6 planes (or 5 if near culling is disabled, last plane is the near plane), we basically take the signed distance from the plane's origin to the center of the AABB and check if it's within radius or more
#

dot(normal, center) gives whether the point is inside or outside the plane
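
A CPU-side sketch of the math walked through above, under stated assumptions: the `Transform`, `Aabb` and `Plane` types are hypothetical, and a plane is stored as `(normal, w)` with a point counted as inside when `dot(normal, p) - w >= 0`, matching the comparison in the shader. The world-space extent is computed with the absolute values of the model matrix's basis columns, which is a common way to keep the box axis-aligned after rotation and scale. This is not claimed to be the fix that was eventually found.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Hypothetical affine transform: three basis columns plus a translation.
struct Transform { Vec3 col0, col1, col2, translation; };

struct Aabb { Vec3 center, extent; };  // extent = half-size along each axis

// Plane as (normal, w); a point p is inside when dot(normal, p) - w >= 0.
struct Plane { Vec3 normal; float w; };

inline Aabb to_world(const Aabb& local, const Transform& m) {
    const Vec3 c = local.center;
    const Vec3 e = local.extent;
    Aabb out;
    // Center: ordinary point transform.
    out.center = {m.col0.x * c.x + m.col1.x * c.y + m.col2.x * c.z + m.translation.x,
                  m.col0.y * c.x + m.col1.y * c.y + m.col2.y * c.z + m.translation.y,
                  m.col0.z * c.x + m.col1.z * c.y + m.col2.z * c.z + m.translation.z};
    // Extent: |M| * extent, so rotation and scale can only grow the box.
    out.extent = {std::fabs(m.col0.x) * e.x + std::fabs(m.col1.x) * e.y + std::fabs(m.col2.x) * e.z,
                  std::fabs(m.col0.y) * e.x + std::fabs(m.col1.y) * e.y + std::fabs(m.col2.y) * e.z,
                  std::fabs(m.col0.z) * e.x + std::fabs(m.col1.z) * e.y + std::fabs(m.col2.z) * e.z};
    return out;
}

inline bool is_on_or_inside(const Aabb& box, const Plane& plane) {
    const Vec3 n = plane.normal;
    // Projection of the box's half-size onto the plane normal.
    const float radius = box.extent.x * std::fabs(n.x) +
                         box.extent.y * std::fabs(n.y) +
                         box.extent.z * std::fabs(n.z);
    return dot(n, box.center) - plane.w >= -radius;
}

inline bool is_visible(const Aabb& world_box, const std::vector<Plane>& planes) {
    for (const Plane& plane : planes) {
        if (!is_on_or_inside(world_box, plane)) return false;
    }
    return true;
}
```

The whole test hinges on the plane normals pointing into the frustum and on `w` using the same sign convention as the plane extraction code, so those are the first things worth checking when visible objects get culled.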

#

Is there any way I can debug a compute shader?

frank sail
#

no 😦

#

well, not in gl

wicked notch
#

Out of pure curiosity, what did nogpt answer?

frank sail
#

It said something about the computation for world_extent being wrong

wicked notch
#

I have discovered

#

A thing

#

Actually multiple things.

#

First off my signs are completely broken.

wicked notch
#

Second, distances from the plane origins are garbage

#

Third.

#

I have no idea how to fix all this KEKW

#

Therefore I'll grab a man's best friends: pen and paper, and write down stuff.

frank sail
#

how dare you disrespect man's true best friend

wicked notch
#

True, pen and paper are actually a man's oldest friend.

#

A tool as old as time

wicked notch
#
With inverse(view)
[0] = {iris::plane_t} {normal=[0.609994292 0 0.792405844], distance=-4.57495737}
[1] = {iris::plane_t} {normal=[0.609994292 0 -0.792405844], distance=-4.57495737}
[2] = {iris::plane_t} {normal=[0.5 0.866025447 0], distance=-2.88397455}
[3] = {iris::plane_t} {normal=[0.5 -0.866025447 -0], distance=-4.61602545}
[4] = {iris::plane_t} {normal=[-1 -0 -0], distance=-504.5}
[5] = {iris::plane_t} {normal=[1 0 0], distance=-7.4000001}

With inverse(pv)
[0] = {iris::plane_t} {normal=[-0.609995067 0 0.792405247], distance=-4.57496309}
[1] = {iris::plane_t} {normal=[-0.609995067 0 -0.792405247], distance=-4.57496309}
[2] = {iris::plane_t} {normal=[-0.500000775 0.86602503 0], distance=-4.61603069}
[3] = {iris::plane_t} {normal=[-0.500000775 -0.86602503 0], distance=-2.88398075}
[4] = {iris::plane_t} {normal=[1 0 0], distance=-504.5}
[5] = {iris::plane_t} {normal=[-1 0 0], distance=-7.4000001}
#

Why god

wicked notch
#

I fixed the thing

#

I finally achieved inner peace.

#

@ derhass helped me a lot, honorable mention here.

wicked notch
#

Alright

#

Now occlusion culling

#

Wish me luck frog_sweat

wispy spear
#

good luck

wicked notch
#

Hmm, the "first frame" is very important in raster occlusion culling apparently, but I still don't understand the "core loop" very well

#

If the first frame I perform no occlusion culling, the next frame I am supposed to mark all objects that were visible the previous frame and render them?

#

What about changes though? Reprojection of the depth buffer?

wispy spear
#

you can run a frame before the gameloop starts

wicked notch
#

And I guess any time I can't reliably reproject 😅

frank sail
#

you don't need to reproject anything frogstare

frank sail
#

reprojection is for if you want to reuse an old depth buffer for new object positions

#

but you don't have to do that

frank sail
#

I described some methods above that don't require reprojection

wispy spear
#

a dogjiff.gif also doesnt require reprojection

frank sail
#

I suppose the core idea is that, instead of reprojecting, you use the object visibility from last frame instead

#

Also, the first frame isn't a special case when you do this

wicked notch
#

Yes, I see

#

Something like this I suppose

frank sail
#

yus

wicked notch
#

How do I know if an object is completely occluded though...

#

No samples pass the depth test? Doesn't that require an occlusion query 🤔

#

Oh wait that's what early Z is for right?

frank sail
#

early z + ssbo write in fs

wicked notch
#

If no samples pass the depth test then there will be no writes?

#

Crazy

#

Huge

wicked notch
#

I can feel more brain expansion

frank sail
#

technically that code is UB btw

#

since there is a race if there are multiple fragments frog_gone

wicked notch
#

atomicCompareExchange?

#

Or whatever it's called in GLSL

frank sail
#

atomicExchange

wicked notch
#

atomicCompSwap

frank sail
#

either one works

wicked notch
#

atomicCompSwap(data[draw_id].is_visible, 0, 1)

#

So that you only do this once

#

efishenshy

frank sail
#

writing the same value from multiple threads is technically UB according to the spec, but works on actual hw

wicked notch
#

Yeah makes sense, there's no reason why it wouldn't

frank sail
#

I imagine it's faster than atomics since you don't have to serialize access to the memory controller

wicked notch
#

Now what vertices do I feed the GPU?

#

Hmm

#

I suppose I could first do frustum culling, get the number of visible objects there, then build another indirect buffer with instanceCount = number of objects in frustum and draw cubes?

#
I don't really understand this...
frank sail
#

The comment says 24 vertices, but I think it's actually 14

wicked notch
#

1e-1

#

What

#

-1.0?

frank sail
#

lmao

#

I must've been testing different epsilon values

wicked notch
#

Oh god this is going to be a mess isn't it

#

If I want to do occlusion culling for shadows too, that is

#

Each cascade gets its own indirect buffer...

frank sail
#

it shouldn't be too bad if you abstract it properly

wicked notch
#

This piece of code is copy-pasted 5 times KEKW

#

Uhh

#

I have these two buffer bindings here:

```glsl
layout (std430, binding = 7) restrict buffer b_occlusion_draw_count {
    uint occlusion_draw_count;
};

layout (std430, binding = 8) writeonly restrict buffer b_occlusion_indirect_command {
    indirect_command_t occlusion_command;
};
```
#

Because I need atomics basically

#

Is there any way I can avoid creating a whole SSBO for a single uint?

wispy spear
#

you could just use a glUniform1ui

proven laurel
#

^

wispy spear
#

what happened to your foo_t naming convention

wicked notch
#

Hmm but I'd still need to bind this buffer to GL_PARAMETER_BUFFER

wispy spear
#

are you doing glMDICount?

wicked notch
#

Yes

proven laurel
#

isn't there GL_DRAW_INDIRECT_BUFFER? or does that not work with MDI?

wicked notch
#

Yes but you also need to bind a buffer to tell GL how many indirect commands to consume, that's bound to GL_PARAMETER_BUFFER

wispy spear
#

those are 2 buffers

proven laurel
#

oh lol

wispy spear
#

ye you need two

#

just looksied in the spec

proven laurel
#

the buffer types are weird

proven laurel
#

this is effectively a counter buffer

wicked notch
#

Naming couldn't be worse yes KEKW

wispy spear
#

May Fiffths

#

almost to the day 🙂

proven laurel
#

75% of the problems people have when learning gl would be solved with better names

#

anyhow, doing hi-z culling here?

wicked notch
#

Raster based, because it's more 🦐le

wicked notch
#

I'm currently taking the output of the frustum culling pass to draw only the AABBs that are in frustum

proven laurel
#

oh you mean like

#

that opengl occlusion thing?

wicked notch
#

Sorry, which thing?

proven laurel
#

was thinking about the conditional rendering occlusion stuff opengl has

#

but pretty sure the perf of that ain't that great

wicked notch
#

Ah no, the idea is to rasterize AABBs and check visibility with early Z

#

So you basically leverage early Z to write into an SSBO the visibility of any object

proven laurel
#

right

wicked notch
#

Then you just write indirect commands that pass this test (if they managed to pass frustum culling first that is)

wicked notch
#

Jaker, why is your depth write disabled when rendering visible bounding boxes?

#

Ah I see.

wicked notch
#

Do we use last frame's depth buffer as well hmm

#

"This pass comes after the scene pass because it relies on a depth buffer to have already been created" do we really need this though?

wicked notch
#

But what about this

frank sail
#

which part

wicked notch
#

It's a bullet point but it's all the same technique

frank sail
#

the temporal coherence just means that object visibility will be almost the same from frame to frame

#

objects visible last frame are probably visible this frame, etc

wicked notch
#

So rip shadow mapping occlusion culling?

#

If the depth buffer already needs to be created then we can't really cull anything can we?

#

Unless it's acceptable for shadow mapping culling to just use last frames' depth?

frank sail
#

it should work for shadows, even if the frustum changes every frame

#

I think you have some confusion about exactly what data is taken from the last frame

wicked notch
#

I'm very bad at tracking resources nervous

#

But I think just depth?

#

Uh

#

Not sure frog_sweat

frank sail
#

lemme go through it again

#
  0. clear depth buffer
  1. render visible objects
  2. render bounding boxes for occlusion testing
  3. render objects whose visibility changed from 0 to 1 this frame (optional step to prevent one frame of pop-in when objects become visible)
#

the visibility info from step 2 is used in step 1 of the next frame

#

visibility=draw commands or whatever
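
The frame loop just described can be sketched on the CPU like this. It is a simplification under stated assumptions: `box_passes_depth_test` is a hypothetical input standing in for "some fragment of the object's bounding box survived the early depth test", whereas on the GPU that result comes from rasterizing the boxes against the depth written by the main pass.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct FrameDraws {
    std::vector<std::size_t> main_pass;  // drawn from last frame's visibility
    std::vector<std::size_t> late_pass;  // drawn because they just became visible
};

// `visible` persists between frames and is updated in place.
inline FrameDraws run_frame(std::vector<std::uint8_t>& visible,
                            const std::vector<std::uint8_t>& box_passes_depth_test) {
    assert(visible.size() == box_passes_depth_test.size());
    FrameDraws draws;

    // Render the objects that the previous frame marked visible.
    for (std::size_t i = 0; i < visible.size(); ++i) {
        if (visible[i]) draws.main_pass.push_back(i);
    }

    // Test every bounding box against the resulting depth and record
    // the visibility that the next frame's main pass will consume.
    for (std::size_t i = 0; i < visible.size(); ++i) {
        const bool was_visible = visible[i] != 0;
        const bool is_visible = box_passes_depth_test[i] != 0;
        // Optional late pass: draw objects that went from hidden to visible
        // so they do not pop in one frame late.
        if (!was_visible && is_visible) draws.late_pass.push_back(i);
        visible[i] = is_visible ? 1 : 0;
    }
    return draws;
}
```

Starting from an all-zero `visible` array, the first frame draws everything through the late pass, which is why it needs no special handling.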

wicked notch
#

๐Ÿธ ๐Ÿ’ก

#

Got it, thanks

frank sail
#

the thingy with reprojection basically uses the last frame's depth buffer, and swaps occlusion testing and object rendering (so u test first)

#

which I guess is more intuitive lol

#

the problem is that reprojection is not perfect and leads to arguably worse artifacts (false occlusion) compared to one frame of lag (which, again, can be mitigated by step 3 above)

wicked notch
#

About step 3

#

Can I just do this:

```glsl
void main() {
    const uint prev = visibility[i_object_id] & 0x1;
    visibility[i_object_id] = (prev << 1u) | 1;
}
```
frank sail
#

uh

#

why the long shift

wicked notch
#

Yeah, it's useless

frank sail
#

red herring smh

wicked notch
#

Does this work as in last = curr?

#

Idea is bit 1 is last and bit 0 is curr

frank sail
#

Tbh I'd use an if statement just to make it obvious 😄

wicked notch
#

To keep track of change

frank sail
#

if (lastFrameVisibility[obj_id] == 0)
{
    objectsThatNeedToBeDrawnThisFrame[obj_id] = 1;
}
thisFrameVisibility[obj_id] = 1;

#

you can certainly reduce the number of buffers needed here
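
One possible way to fold the three buffers into a single word per object, following the "bit 1 is last, bit 0 is curr" idea from earlier. This is a sketch with hypothetical helper names, not the layout either project ended up using.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kCurrent = 0x1u;   // bit 0: visible this frame
constexpr std::uint32_t kPrevious = 0x2u;  // bit 1: visible last frame

// Start of a frame: this frame's result becomes last frame's result,
// and the current bit is cleared.
inline std::uint32_t begin_frame(std::uint32_t state) {
    return (state & kCurrent) << 1u;
}

// A bounding-box fragment survived the depth test. Setting the bit is
// idempotent, so repeating it for many fragments gives the same value.
inline std::uint32_t mark_visible(std::uint32_t state) {
    return state | kCurrent;
}

// Visible now but not last frame: candidate for the late draw.
inline bool needs_late_draw(std::uint32_t state) {
    return (state & (kCurrent | kPrevious)) == kCurrent;
}
```

On the GPU the read-modify-write in `mark_visible` would race between fragments, so an atomic OR would be the closer analogue there.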

wicked notch
#

Yeah, I'll just make it work first

#

On my way copy pasting 5 times the same snippet (again)

frank sail
#

hehe

wicked notch
#

Actually screw that, I don't even know if it works only for objects in my perspective

#

Let's just test primary camera perspective first

#

It didn't work 😦

#

Wait I didn't bind the shader

#

It worksn't

#

Ah my shift is wrong

#

Goddamnit

wicked notch
#

It's "working" but it doesn't seem to cull anything more than frustum culling did

frank sail
#

Make a debug mode that draws the bounding boxes to the screen

wicked notch
#

Alright, occlusion culling works

#

Except it's slower than... a depth prepass?

wicked notch
#

HiZ is looking more and more appealing... there are loads of issues with ROC apparently

#

Well, it's to be expected when dealing with bounding boxes, I wish ROC was a bit more conservative though

wicked notch
#

I don't know the exact reason, but some flickering can be observed if you are inside an AABB

#

Sometimes not even just flickering, you get the object seemingly transparent due to its visibility changing every frame

#

Here for example

#

I am using your Fwog because I nuked ROC KEKW

frank sail
#

ah

#

that artifact happens when an object occludes its own bounding box

wicked notch
#

Hmm

frank sail
#

a shrimple way to mitigate it is to always draw the object if the camera is very close to its bounding box

wicked notch
#

The object is not very close to the camera here, perhaps some offset could be applied to the AABB's position?

frank sail
#

I can't see the aabb of the object in question

wicked notch
#

Well... we are inside it KEKW

frank sail
#

all it takes for the artifact to appear is for the camera to be inside the aabb and to not be looking at any other side of the object

#

I wonder if you can use clip planes for this

wicked notch
#

Hmm

#

I already have frustum planes from frustum culling

#

I would shrimply have to invert the condition

frank sail
#

Tbh it's easier to just draw the thingy if you're inside the aabb

wicked notch
#

Yeah

#

But that kills occlusion culling basically KEKW

frank sail
#

ideally, you wouldn't have objects with giant bounding boxes like that

wicked notch
#

Many scenes merge primitives so their bounding boxes are huge

#

I de-nuked ROC

#

Ah by the way, I can't find any documentation about NSight's profiler magic words

frank sail
wicked notch
#

How the hell do I read this

#

What is PES+VPC

frank sail
#

vpc = viewport culling

#

idk what pes is

wicked notch
#

Also PCIe throughput is reporting 16GB/s, should I be worried about that?

#

They only happen at the end of shadow map draws

frank sail
#

Are you uploading something 🤔

#

Or maybe some buffer is in host memory

wicked notch
#

Actually... yes lol

#

I was rewriting the whole texture buffer, object buffer, indirect buffer for each call to "perform_frustum_cull"

#

Truly incredible

frank sail
#

Zoo wee mama

wicked notch
#

Looks like checking if you are inside the AABB works fine

#

Or rather, I have not found any more artifacts yet.
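
The workaround settled on above (skip the occlusion test and always draw an object when the camera sits inside its bounding box) comes down to a point-in-box check. A minimal C++ sketch, assuming a world-space AABB stored as min/max corners; `vec3_t`, `aabb_t`, `contains` and `needs_occlusion_test` are hypothetical names, not Iris code:

```cpp
// Hypothetical types for illustration only.
struct vec3_t { float x, y, z; };
struct aabb_t { vec3_t min, max; };

// True when point p lies inside (or on the surface of) the box.
constexpr bool contains(const aabb_t& box, const vec3_t& p) {
    return p.x >= box.min.x && p.x <= box.max.x &&
           p.y >= box.min.y && p.y <= box.max.y &&
           p.z >= box.min.z && p.z <= box.max.z;
}

// An object whose box contains the camera is always drawn; only the
// remaining objects go through the occlusion test.
constexpr bool needs_occlusion_test(const aabb_t& box, const vec3_t& camera) {
    return !contains(box, camera);
}
```

In a GPU culling pass the same comparison could run per object before the depth test, at the cost of never culling objects with huge merged bounding boxes while the camera is inside them.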

wicked notch
frank sail
#

meshlets are little baby meshes

#

like 64-128 verts each

#

might be too fine though, since each aabb is like 14-24 verts

#

Meshoptimizer can emit meshlets

#

They are typically used with mesh shaders

wicked notch
#

This is super interesting holy

#

Apparently it's a whole new rendering paradigm?

#

Gone are vertex -> geometry -> raster?

frank sail
#

everything before fs is replaced by task and mesh shaders

#

it's pretty low level

wicked notch
#

No worries, it'll only be a very small detour

#

How do you enable extensions?

frank sail
#

they are automatically enabled

#

you just have to check that your implementation supports it

wicked notch
wicked notch
#

Old: model_t
New: meshlet_group_t

#

To be honest I'm not understanding much, if anything at all nervous

#

The task shader is optional apparently, which is great for me, one less thing to worry about

frank sail
#

you use task shaders to cull meshlets, basically

wispy spear
#

are you switching to vk already? froge_evil

wicked notch
#

No way

frank sail
#

just using the cursed gl nv meme shader ext

wicked notch
#

I'll stay on GL for a long time

#

Why is it cursed?

wispy spear
#

ah

frank sail
#

cursed because probably no one uses it 🐸

wicked notch
#

I know everyone's given up on GL, but I have not

#

I couldn't help but see there's no AMD version of this mesh shader extension

frank sail
#

yup

#

you only get cross platform meme shading on nv

#

I mean vk lol

proven laurel
#

meme shading lol

wispy spear
#

who cares about AMD and INTEL anyway

frank sail
#

AMD (pejorative)

wispy spear
#

heh

wicked notch
#
```cpp
auto meshlet_triangles = std::vector<uint32>(max_meshlets * max_triangles * 3);
```
I stared at that 3 for a solid minute
#

Before remembering: "Wait a minute, a triangle has 3 points"

#

Incredible.

wispy spear
#

tis a multi tringle

#

one per universe

frank sail
wicked notch
#
```cpp
struct meshlet_t {
    uint32 vertex_offset = 0;
    uint32 vertex_count = 0;
    uint32 triangle_offset = 0;
    uint32 triangle_count = 0;

    // these index into meshlet_group_t::vertices
    std::vector<uint32> vertices;
    // gl_PrimitiveIndicesNV I guess
    std::vector<uint8> triangles;
};

struct meshlet_group_t {
    std::vector<meshlet_t> meshlets;
    std::vector<vertex_format_t> vertices;
    std::vector<uint32> indices;
};
```
I have no idea what I'm doing KEKW
frank sail
#

are you copying a sample from somewhere

#

also, I N D I R E C T I O N

wicked notch
#

I'm relying on meshoptimizer's documentation which is not much

#
```cpp
constexpr auto max_vertices = 64u;
constexpr auto max_triangles = 124u;
constexpr auto cone_weight = 0.0f;
const auto max_meshlets = meshopt_buildMeshletsBound(indices.size(), max_vertices, max_triangles);
auto meshlets = std::vector<meshopt_Meshlet>(max_meshlets);
auto meshlet_vertices = std::vector<uint32>(max_meshlets * max_vertices);
auto meshlet_triangles = std::vector<uint8>(max_meshlets * max_triangles * 3);
const auto meshlet_count = meshopt_buildMeshlets(
    meshlets.data(),
    meshlet_vertices.data(),
    meshlet_triangles.data(),
    indices.data(),
    indices.size(),
    (const float32*)vertices.data(),
    vertices.size(),
    sizeof(vertex_format_t),
    max_vertices,
    max_triangles,
    cone_weight);

auto& last_meshlet = meshlets[meshlet_count - 1];
meshlet_vertices.resize(last_meshlet.vertex_offset + last_meshlet.vertex_count);
meshlet_triangles.resize(last_meshlet.triangle_offset + ((last_meshlet.triangle_count * 3 + 3) & ~3));
meshlets.resize(meshlet_count);
```
This "works"
#

i.e: it doesn't crash

frank sail
#

oh jeez

wicked notch
#

I guess the "triangles" are indices into the meshlet itself?

frank sail
#

I guess cone_weight is some value to influence how faces are grouped w.r.t. their normal

#

idk, we're all guessing

wicked notch
#

It's something regarding cone based culling but I dunno

#

"cone_weight should be left as 0 if cluster cone culling is not used, and set to a value between 0 and 1 to balance cone culling efficiency with other forms of culling like frustum or occlusion culling."
Whatever this means

frank sail
#

yeah, ultimately you would use it to make normal-based culling better

#

but optimizing for that too much means you might group far apart vertices together, making frustum and occlusion culling worse

wicked notch
#

What I would really like to know right now is whatever the hell gl_PrimitiveIndicesNV is

#

Since it's a uint8 and NVIDIA only allows up to 256 - 1 triangles in a meshlet I suppose they are indices that index into the meshlet vertices?

#

i.e: gl_MeshVerticesNV

frank sail
#
When each mesh shader work group completes, it emits an output mesh
    consisting of
...
  * an array of vertex index values written to the built-in output array
      gl_PrimitiveIndicesNV, where each output primitive has a set of one,
      two, or three indices that identify the output vertices in the mesh used
      to form the primitive.
wicked notch
#

So... yes?

#

gl_PrimitiveIndicesNV are indices for gl_MeshVerticesNV?

#

I'll just assume yes for now

#

I'll know if I'm wrong because I will see funny triangles (or none at all)
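
For reference, the guess above amounts to a two-level lookup: the 8-bit triangle entries are local indices into the meshlet's own vertex list, and that list in turn holds indices into the original vertex buffer. A minimal sketch, assuming the `vertex_offset` / `triangle_offset` layout of `meshopt_Meshlet`; `meshlet_view_t`, `resolve_corner` and the sample data are made up for illustration:

```cpp
#include <cstdint>

// Hypothetical stand-in that mirrors the offset fields of meshopt_Meshlet.
struct meshlet_view_t {
    std::uint32_t vertex_offset;    // first entry in meshlet_vertices
    std::uint32_t triangle_offset;  // first entry in meshlet_triangles
};

// Maps corner i (0 .. triangle_count * 3 - 1) of a meshlet to an index
// into the original vertex buffer.
constexpr std::uint32_t resolve_corner(
    const meshlet_view_t& m,
    const std::uint32_t* meshlet_vertices,
    const std::uint8_t* meshlet_triangles,
    std::uint32_t i) {
    // Local index: the value a mesh shader would write to gl_PrimitiveIndicesNV.
    const std::uint8_t local = meshlet_triangles[m.triangle_offset + i];
    // Global index: which vertex to fetch for that output slot.
    return meshlet_vertices[m.vertex_offset + local];
}

// Sample data: one meshlet with a single triangle over vertices 7, 2 and 9.
constexpr std::uint32_t sample_vertices[] = {7, 2, 9};
constexpr std::uint8_t sample_triangles[] = {0, 1, 2, 0};  // padded to 4 bytes
constexpr meshlet_view_t sample_meshlet = {0, 0};
```

Under this reading the original index buffer is only needed as input to the meshlet builder, not at draw time.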

frank sail
#

where did you see gl_MeshVerticesNV

frank sail
#

I don't see it in the spec

wicked notch
#
```glsl
out gl_MeshPerVertexNV {
     vec4  gl_Position;
     float gl_PointSize;
     float gl_ClipDistance[];
     float gl_CullDistance[];
  } gl_MeshVerticesNV[];
```
#

It's this thing apparently

wispy spear
#

hmm the roblox guy, the same guy who made meshoptimizer, forgor his name atm, has a video series of meshletisms going iirc

#

its vulkan, but still

wicked notch
#

I'll look it up one sec

#

I have to figure out if I need the original index buffer or not

#

I'm 80% sure I don't

wispy spear
#

i think it was towards the end, meshlet culling

wicked notch
#

fun fact, sponza subdivides into 3515 meshlets

#

(Assuming I didn't break any laws of physics)

wicked notch
#

We are back to the origins once more.

#

For the third time KEKW

#

Jaker, could you translate into (comprehensible) english what the first parameter of glDrawMeshTasksNV does?

#

Wait it's simply an offset into gl_WorkGroupID.x

#

Why would I ever need this

frank sail
#

if you want to draw a subset of meshlets

wicked notch
#

Hmm

#

I lately type "hmm" a lot, I should make an emoji 🐸 🤔

#

frog_think

frank sail
#

consider it a convenience parameter, like how dispatches are 3D

frank sail
wicked notch
#

Yeah, that's fair.

#

deccer if you please.

#

Oh shit this is garbage, one second

frank sail
#

#1027528776717975592

#

reminds me of this frog_shush I made

#

the hand needs an outline though

wicked notch
#

Yeah, I'll do it 'morrow.

#

I would like to see sponza before I sleepy

wicked notch
#

Uh

#
------------
Internal error: assembly compile error for mesh shader at offset 30836:
-- error message --
line 1003, column 8:  error: unknown opcode modifier
-- internal assembly text --
!!NVmp5.0
OPTION NV_internal;
OPTION NV_shader_storage_buffer;
OPTION NV_bindless_texture;
GROUP_SIZE 1;
PRIMITIVE_TYPE TRIANGLES;
PRIMITIVES_OUT 124;
VERTICES_OUT 64;
# cgc version 3.4.0001, build date Apr 13 2023
# command line args:
#vendor NVIDIA Corporation
#version 3.4.0.1 COP Build Date Apr 13 2023
#profile gp5mp
#program main
#semantic b_transform_buffer : SBO_BUFFER[0]
#semantic b_meshlet_buffer : SBO_BUFFER[1]
#semantic b_vertex_buffer : SBO_BUFFER[2]
#semantic pv
#semantic meshlet.5 : __LOCAL
#var uint3 gl_WorkGroupID : $vin.CTAID : CTAID[0] : -1 : 1
#var float4 gl_MeshVerticesNV[0].gl_Position : $vout.POSITION : HPOS[32] : -1 : 1
#var float gl_MeshVerticesNV[0].gl_PointSize : $vout.PSIZE :  : -1 : 0
#var float gl_MeshVerticesNV[0].gl_ClipDistance[0] :  :  : -1 : 0
#var float gl_MeshVerticesNV[0].gl_Cul

I managed to crash NVIDIA's internal shader compiler

frank sail
#

oof

wicked notch
#

Removing this:

```glsl
layout (location = 0) in flat uint o_meshlet_id;
```
fixes it
#

Ah I see

#
```glsl
layout (location = 0) out t_per_vertex {
    uint meshlet_id;
} o_per_vertex[];
```
#

Where the hell do I put the flat

#

lol

#

out flat t_per_vertex doesn't work

frank sail
#

idk

#

you probably can't

wicked notch
#

....how do I send flat attributes to the frag shader?

frank sail
#

you can write the same value three times

#

ez flat

wicked notch
#

It's per vertex not per primitive sadly

frank sail
#

and from the mesh shader, it should be easy

#

You have access to the whole meshlet in the mesh shader, no?

#

You should be able to do whatever you want with a little creativity

wicked notch
#

So far my creativity has caused the NVIDIA internal compiler to crash 4 times

#

I figured it out btw

#
```glsl
layout (location = 0) out t_per_vertex {
    flat uint meshlet_id;
} o_per_vertex[];
```
this is legal apparently
#

The spec says it's not

#

But NVIDIA is NVIDIA I guess

#

What do you mean nsight does not support debugging their own fucking extension

#

Only D3D12 Mesh Shaders are supported in NSight apparently..

wicked notch
#

At least deccer's cubes work

#

But I'll leave it like this

#

There's no way I can continue with mesh shading if I can't even debug...

#

Pretty sad

wicked notch
#

I once again forgor a triangle has 3 vertices

#

Now I just have to figure out why everything borks when there's more than one meshlet group

#

While I figure that out, here's good froge

frank sail
#

frogchamp

wicked notch
#

83796 meshlets and 5942 meshlet groups (or if you are old fashioned "meshes" 😄)

#

Occupancy is dead though nervous

#

I'm hitting the memory limit 11 times out of 10 KEKW

#
```glsl
#version 460 core
#extension GL_NV_mesh_shader : require

struct meshlet_t {
    uint triangle_count;
    uint vertex_count;
    uint vertex_offset;
    uint[64] vertices;
    uint[384] triangles;
    uint mesh_index;
};

struct vertex_format_t {
    vec4 position;
    vec4 normal;
    vec4 uv;
    vec4 tangent;
};

layout (local_size_x = 1) in;

layout (triangles, max_vertices = 64, max_primitives = 124) out;

layout (location = 0) out t_per_vertex {
    flat uint meshlet_id;
} o_per_vertex[];

layout (location = 0) uniform mat4 pv;

layout (std430, binding = 0) readonly restrict buffer b_transform_buffer {
    mat4[] transforms;
};

layout (std430, binding = 1) readonly restrict buffer b_meshlet_buffer {
    meshlet_t[] meshlets;
};

layout (std430, binding = 2) readonly restrict buffer b_vertex_buffer {
    vertex_format_t[] vertices;
};

void main() {
    const uint workgroup_index = gl_WorkGroupID.x;
    const meshlet_t meshlet = meshlets[workgroup_index];
    const mat4 transform = transforms[meshlet.mesh_index];

    for (uint i = 0; i < meshlet.vertex_count; ++i) {
        const vertex_format_t vertex = vertices[meshlet.vertex_offset + meshlet.vertices[i]];
        gl_MeshVerticesNV[i].gl_Position = pv * transform * vec4(vertex.position.xyz, 1.0);
        o_per_vertex[i].meshlet_id = workgroup_index;
    }

    const uint index_count = meshlet.triangle_count;
    gl_PrimitiveCountNV = index_count;
    for (uint i = 0; i < index_count * 3; ++i) {
        gl_PrimitiveIndicesNV[i] = meshlet.triangles[i];
    }
}
```
Backup
finite yacht
wicked notch
#

Interesting, I'll check it out

#

I think this struct is the culprit by the way:

```glsl
struct meshlet_t {
    uint triangle_count;
    uint vertex_count;
    uint vertex_offset;
    uint[64] vertices;
    uint[384] triangles;
    uint mesh_index;
};
```
#

Each thread loading 2KiB of data isn't ideal I think KEKW

finite yacht
#

the bad occupancy you mean? That's probably because of the layout (local_size_x = 1) in; line, so it's only using 1/subgroupSize of the hardware

wicked notch
#

Ah yeah I changed that, I now use 32

#

Occupancy is still horrible, though better by a factor of 2

finite yacht
#

btw didnt you start learning opengl only a few months ago how can you already be messing with mesh shaders and shit thats crazy

wicked notch
#

I just copied the nvidia sample KEKW

#

Also jaker helped me translate the spec into human language

frank sail
#

pretty sure I was doing learnopengl shit a couple months in

wicked notch
#

I am still doing that by the way, I implemented normal mapping on the way

frank sail
#

I didn't have this server when I started tho

wicked notch
#

Yeah, this server is a gold mine

frank sail
#

#1019779751600205955 message

cunning atlas
frank sail
#

I haven't done a bunch of the things lvstri is doing here

#

the shadows are way cooler already

#

and I haven't even touched mesh shaders

wicked notch
#

You gave me the filtering algorithm though

cunning atlas
#

yeah they look great :3

#

probably look back through this thread when I go back to making my shadows look better

wicked notch
#

I also spent a solid week fixing bugs in my shadows nervous

cunning atlas
#

and look how it paid off 😉

wicked notch
cunning atlas
#

ty, I've saved those links for later to have a browse

#

although with what I'm currently doing I don't care about shadows too much

#

but i'll definitely take a look after

wicked notch
#

I somewhat fixed the occupancy by putting everything in buffers, but it's still limited because there's too much data in each thread?

wicked notch
#

What the hell is a subgroupBallot and subgroupVote

#

We're electing the next US president?

#

Ah, GL_KHR_shader_subgroup isn't even supported in OpenGL

#

There is no apparent way of scheduling work to the mesh shader from the task shader that allows culling in OpenGL without KHR_shader_subgroup

#

I guess this is as far as we go huh...

#

Pretty sad, I was having a ton of fun even without a debugger

#

Final image with Mesh Shaders, 32 threads per workgroup, 32 vertices MAX (1 to 1), 124 primitives MAX, 174777 meshlets

#

It's time to go back to our origins, with good ol' vertex shaders.

finite yacht
wicked notch
#

I see, then they are buried under the second page result of google, because first page is Vulkan only KEKW

proven laurel
wicked notch
#

I'm sure it'll come within the heat death of the universe

finite yacht
#

actually one type of subgroup operations is core (since 4.6) : ARB_shader_group_vote

wicked notch
#

Interesting, I'll eventually come back to mesh shaders, maybe I'll figure these out.

wicked notch
#

Hi-Z time!

wicked notch
#

Nevermind that, you can do culling with GL, I just refused to go past the first few google hits lol

#

I also didn't bother to look at how ballotThreadNV actually worked, turns out it's extremely useful

frank sail
wicked notch
#

Yes indeed, unfortunately it's very different from the NV one.

#

That said, GPUs are scary.

frank sail
#

gpus are epiiiiiiic

wicked notch
#

GPUs are trying to take over the world, they can already feed themselves data to work with.

#

With task shaders they can even dispatch work to themselves

finite yacht
#

KHR_shader_subgroup_ballot is a superset of ARB_shader_ballot which is basically NV_shader_thread_group

wispy spear
#

if this continues, lustri is publishing some api more capable than vulkan, soon

#

you read it here first

wicked notch
#

I doubt that, but I may or may not try to hack bindless textures into RenderDoc.

#

It's really a pain having to disable textures every time I want to debug with it.

wispy spear
#

yes please 😄

#

that also reminds me i wanted to readd a texturearray path for that reason to my shit

frank sail
#

you'd be a 🦵end if you did it

#

it's probably not easy to implement if baldurk has refused to do it a million times already

#

hmm, maybe you can still hack a non-complete, non-performant version in somehow

wispy spear
#

jaker and i are worthy guinea pigs

wicked notch
#

How do I know the size of the mips in a texture created with glTextureStorage2D(..., mips)? 🤔

#

If the base level is 1024x768, what's level 1?

frank sail
#

The spec says how they're created

#

But basically it's just max(1, floor(res / 2)) for each level

wicked notch
#

Very nice

#

So if I specify floor(log2(max(w, h))) + 1 mips I'll have
0 -> 1024x768
1 -> 512x384
...
9 -> 2x1
10 -> 1x1
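
The rule quoted above, max(1, floor(res / 2)) per level, and the level count floor(log2(max(w, h))) + 1 can be checked with a few lines of integer arithmetic. A minimal sketch; `mip_extent` and `mip_count` are hypothetical helper names, and the shift assumes `level` is below 32 and the sizes are non-zero:

```cpp
#include <cstdint>

// Extent of one dimension at a given mip level: halve per level, round down,
// clamp to 1.
constexpr std::uint32_t mip_extent(std::uint32_t base, std::uint32_t level) {
    const std::uint32_t shifted = base >> level;
    return shifted > 0 ? shifted : 1;
}

// Number of levels in a full chain: floor(log2(max(width, height))) + 1.
constexpr std::uint32_t mip_count(std::uint32_t width, std::uint32_t height) {
    std::uint32_t largest = width > height ? width : height;
    std::uint32_t levels = 1;
    while (largest > 1) {
        largest >>= 1;
        ++levels;
    }
    return levels;
}
```

For a 1024x768 base this gives 11 levels, with level 9 at 2x1 and level 10 at 1x1, matching the table above.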

frank sail
#

yeah, probably

wicked notch
#

I worry about the probably, but it'll be fine

frank sail
#

I almost always mince my statements so I can't be wrong smart

wispy spear
#

not curry?

frank sail
#

that comes later

wicked notch
#

Anything that combines textureGather and textureLod?

wispy spear
#

textureGatherLod

#

jk

finite yacht
#

interesting AMD has an extension just for that

wicked notch
#

There is this stupid edge that doesn't go away

#
```glsl
#version 460 core

layout (local_size_x = 16, local_size_y = 16, local_size_z = 1) in;

layout (location = 0) uniform uint u_level;

layout (binding = 0) uniform sampler2D u_in_depth;
layout (binding = 1, r32f) uniform writeonly image2D u_out_depth;

void main() {
    const uvec2 coord = gl_GlobalInvocationID.xy;
    const ivec2 size = imageSize(u_out_depth);
    if (all(lessThan(coord, size))) {
        const vec4 depth = vec4(
            textureLod(u_in_depth, ((vec2(coord) + vec2(0.0, 0.0) + 0.5) / vec2(size)), u_level).r,
            textureLod(u_in_depth, ((vec2(coord) + vec2(1.0, 0.0) + 0.5) / vec2(size)), u_level).r,
            textureLod(u_in_depth, ((vec2(coord) + vec2(0.0, 1.0) + 0.5) / vec2(size)), u_level).r,
            textureLod(u_in_depth, ((vec2(coord) + vec2(1.0, 1.0) + 0.5) / vec2(size)), u_level).r);
        imageStore(u_out_depth, ivec2(coord), vec4(max(max(depth.x, depth.y), max(depth.z, depth.w))));
    }
}
```
What's wrong here?
wispy spear
#

hmm wrong addressmode perhaps in your u_in_depth?

#

clamp_to_edge vs clamp_to_border?

#

(i am just talking out of my ass here)

wicked notch
#

Yeah it's clamp_to_edge

wispy spear
#

what if you clamp to border and bordercolly to black?

wicked notch
#

Wait hold on, I'm an idiot.

#

Only the first mip is clamp_to_edge

#

Lovely, it works now

wicked notch
#

Hmm, HiZ culling has the opposite problem as ROC

#

It's not very conservative

#

Mayhaps I have some errors in my implementation.

wispy spear
#

what was ROC again?

wicked notch
#

Raster Occlusion Culling

wispy spear
#

never heard that before

frank sail
wicked notch
#

That doesn't make a lot of sense yes, I meant to say "only the original depth buffer is clamp to edge, the actual hiz mip chain was clamp to border"

#

I should probably start using sampler objects.

frank sail
#

Sampler objects are bae

wicked notch
#

Hmm HiZ is super fast

#

By a factor of 2 at least

#

Still broken though KEKW

#

It's possible that D3D/Vulkan conventions are biasing this article

frank sail
#

try glClipControl to change the depth range, then call the corresponding glm function to generate a new projection
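
In GL terms this suggestion would be something like `glClipControl(GL_LOWER_LEFT, GL_ZERO_TO_ONE)` (core since OpenGL 4.5) paired with a zero-to-one projection such as `glm::perspectiveZO`. The sketch below only illustrates what changes for the depth value itself, assuming a standard right-handed perspective projection with a finite far plane; `ndc_depth_no` and `ndc_depth_zo` are hypothetical names:

```cpp
// NDC depth of a point at positive view-space distance d, for near plane n
// and far plane f, using the default OpenGL convention (z in [-1, 1]).
constexpr float ndc_depth_no(float n, float f, float d) {
    return (f + n) / (f - n) - (2.0f * f * n) / ((f - n) * d);
}

// Same point with a zero-to-one projection (z in [0, 1]).
constexpr float ndc_depth_zo(float n, float f, float d) {
    return f / (f - n) - (f * n) / ((f - n) * d);
}
```

With the zero-to-one convention the NDC z value already matches what ends up in the depth buffer, so shaders that reconstruct position from depth no longer need the `* 2.0 - 1.0` remap.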

wicked notch
#

I'd have to change all my 60 shaders to use [0;1] instead of [-1;1]

#

But honestly it's worth it, I don't know who the hell thought -1,1 depth was a good idea, I hope he's repenting

wicked notch
#

Finally, I switched to a sane(r) NDC system

#

No more depth [-1,1] bullshit. (HiZ is still broken though KEKW)

wicked notch
#

Hmm these are the AABB's uvs that HiZ is seeing, I'm not sure I see anything wrong with them

#

Maybe the scale?

wicked notch
#

I am failing to understand HiZ

#

And the funny part is I don't know why I'm failing at understanding it, it's quite straightforward.

#

I'll start again from square 0, building the HiZ mip chain

frank sail
#

are you failing to understand HiZ as a whole, or just a particular bit of its implementation?

wicked notch
#

I don't know, I feel like I understand it, but when I try and apply "fixes" that I think are causing me problems, everything breaks (or nothing changes at all).

#

Also:

```glsl
const mat4 pv = u_cascade_layer == -1 ? camera.pv : cascades[u_cascade_layer].pv;
// project AABB in clip space
vec4 ndc_corner = pv * model * vec4(aabb_corners[i], 1.0);
ndc_corner.z = max(ndc_corner.z, 0.0);
ndc_corner /= ndc_corner.w;
```
Why the hell we max out ndc's Z before perspective div is a mystery
wispy spear
#

when you say "i dont know" you need to say it like jimmy yang in his chinese accent... "oh.... i dont knoow"

wicked notch
#

His fault

#

I mean, it makes sense to do that, just not before perspective division

#

We don't care about objects behind the near plane after all.

frank sail
#

does it even matter if you do the clamp before or after perspective division

#

the eventual value will be 0 either way

wicked notch
#

I guess

#

I'm grasping at straws here frog_gone

#

ROC was much easier froge_sad

frank sail
#

ASUS ROC (Republic of Camers)

wicked notch
#

Alright enough complaining, I'll get to work seriously now

#

Ah just one thing, could you find some HiZ samples for me? I can't find anything other than Niagara and the one linked above

#

I'd like to see some common ground, hopefully

frank sail
#

vkguide.dev has one

frank sail
#

N o o f

wicked notch
#

Is there anything equivalent to VK_STRUCTURE_TYPE_SAMPLER_REDUCTION_MODE_CREATE_INFO_EXT in OpenGL?

#

Looks like vkguide is using it as a foolproof way of reducing a depth image

#

Also that's one long struct name lol

frank sail
#

no

wicked notch
frank sail
wicked notch
#

I am gigastupid then

#

Can't seem to implement a proper depth reduction

wispy spear
#

just go straight to whatever you want to call your gpu architecture + api which will be the successor of vk

#

where xxReduceDepth() is a thing

wicked notch
#

:(

frank sail
#

or return 1; if you don't have reverse depth smart

wicked notch
#

How

#

In god's holy name

#

is level 9

#

lower than level 10

#

How does this even happen

frank sail
#

isn't that expected behavior

#

the final mip should be really deep since it's the deepest of all pixels

wicked notch
#

Well yes, it would be

#

except I omitted one crucial detail

#

level 8 is also higher than level 9

#

Which is absolutely bonkers

frank sail
#

maybe you don't handle reducing odd resolutions correctly

#

you have to do something special for those iirc

wicked notch
#

yes

#

Which is why I made this:

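```cpp
// Largest power of two strictly less than v (returns 0 for v <= 1).
static auto previous_power_two(iris::uint32 v) noexcept -> iris::uint32 {
    // unsigned, so the comparison with v does not mix signedness
    iris::uint32 r = 1;
    while (r < v) {
        r <<= 1;
    }
    return r >> 1;
}
```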
wicked notch
#

Mip chain is fully working now
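
Rounding the Hi-Z size down to a power of two is one way around the odd-resolution problem raised above. An alternative that is sometimes used keeps the original size and widens the reduction footprint when a source dimension is odd. The following is a CPU-side sketch of that idea, not the Iris shader: it assumes non-reversed depth in [0, 1] (so the farthest value is the max), and `load_clamped`, `reduce_max` and `sample_level` are hypothetical names:

```cpp
#include <algorithm>

// Reads one texel with coordinates clamped to the image (like CLAMP_TO_EDGE).
constexpr float load_clamped(const float* src, int w, int h, int x, int y) {
    const int cx = std::min(std::max(x, 0), w - 1);
    const int cy = std::min(std::max(y, 0), h - 1);
    return src[cy * w + cx];
}

// Farthest (largest) depth among the source texels covered by destination
// texel (x, y) of the next, half-sized level.
constexpr float reduce_max(const float* src, int src_w, int src_h, int x, int y) {
    // When a source dimension is odd, flooring the size means a destination
    // texel overlaps three source texels along that axis instead of two.
    const int span_x = (src_w % 2 == 1) ? 3 : 2;
    const int span_y = (src_h % 2 == 1) ? 3 : 2;
    float result = 0.0f;
    for (int dy = 0; dy < span_y; ++dy) {
        for (int dx = 0; dx < span_x; ++dx) {
            result = std::max(result, load_clamped(src, src_w, src_h, 2 * x + dx, 2 * y + dy));
        }
    }
    return result;
}

// Sample 3x3 depth level; the farthest texel sits in the last column.
constexpr float sample_level[] = {
    0.1f, 0.2f, 0.9f,
    0.3f, 0.4f, 0.5f,
    0.2f, 0.1f, 0.6f,
};
```

Reducing the 3x3 sample to a single texel with a plain 2x2 footprint would only see the top-left four values and miss the 0.9 in the last column, which is the kind of error that makes a coarser level look closer than a finer one.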

#

Onto figuring out the dumb algorithm

wicked notch
#

This is very frustrating.

#

I'll stop for now

wicked notch
wicked notch
#

Good morning friends.

#

Day 2 of debugging HiZ, hopefully I'll have a clue what's going on by today

frank sail
#

did it come to you in your dreams

wicked notch
#

Sadly not

#

By the way, if you know how this works, could you give me a hint?

frank sail
#

idk, never implemented it

#

I mean, I know how it works conceptually

#

do depth reduction, project object bounds (AABBs or spheres), gather some texels, etc.

wicked notch
#

Could you find the original SIGGRAPH slides or the original paper? I can't seem to find it lol

frank sail
#

I'll try

#

I failed

wicked notch
#

I see you don't go past google hit #5 too bleakekw

frank sail
#

is there something fundamental about hi-z culling that you're unsure about?

wicked notch
#

This is deccer cubes

#

™️

#

These are the screen space bounding rectangles that HiZ is seeing

#

Now, suppose the red cube is huge, except there's a hole in the middle that allows you to see the purple and green cubes

#

Would they get culled because the bounding box's depth is lower than in the HiZ's?

frank sail
#

shouldn't be a problem

#

the bounding boxes cannot occlude other bounding boxes

#

they are simply a conservative testing volume

#

which you test against the depth buffer

wicked notch
#

So the bounding rects simply serve as an "index" against the HiZ chain?

frank sail
#

they serve the same purpose as the bounding volumes in ROC

#

a volume that conservatively encapsulates an object which, if visible, is an indication that the object itself is probably visible as well

#

hi-z and ROC are really just different ways of testing the bounding volumes against the depth buffer

wicked notch
#

Ah I was going with the opposite (and wrong) intuition: if the volume is NOT visible then the object is culled

frank sail
#

that is correct though

#

if the conservative bounding volume isn't visible, then the object itself definitely isn't

wicked notch
#

Hmm I see

frank sail
#

but if the bounding volume is visible, then the object is only probably visible (because the bounding volume is conservative)
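
Put as code, the test described above has two parts: choose a Hi-Z level coarse enough that the projected bounds cover only a couple of texels, then compare the closest depth of the bounding volume with the farthest depth stored there. A minimal sketch, assuming non-reversed depth in [0, 1] and a max-reduced chain; `hiz_level` and `is_occluded` are hypothetical names, and the level formula is one common choice rather than the only one:

```cpp
#include <cstdint>

// Smallest mip level at which a screen rectangle of the given pixel size
// spans at most two texels per axis: ceil(log2(max(width, height))).
constexpr std::uint32_t hiz_level(std::uint32_t width_px, std::uint32_t height_px) {
    const std::uint32_t largest = width_px > height_px ? width_px : height_px;
    std::uint32_t level = 0;
    while ((1u << level) < largest) {
        ++level;
    }
    return level;
}

// Smaller depth is closer. The object is culled only if the closest point of
// its bounding volume lies behind the farthest depth stored in the Hi-Z
// texels it covers; otherwise it is (conservatively) treated as visible.
constexpr bool is_occluded(float box_nearest_depth, float hiz_farthest_depth) {
    return box_nearest_depth > hiz_farthest_depth;
}
```

A 100x40 pixel rectangle, for example, would be tested against level 7, where one texel covers 128x128 pixels of the base level.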

wicked notch
#

Alright

#

Thanks a lot for the additional insight

wicked notch
#

I think I'm getting close to solving it.

#

I only need to figure out the projectSphere's weird projection thing

wicked notch
#

Heh

#

Hahah

#

Aahahahha

frank sail
wicked notch
#

I would like to order one death please

#

You would think.

#

As a human person

#

That GL_NEAREST takes the closest pixel to the UV coordinates specified in textureLod

wispy spear
#

GL_CLOSEST does

frank sail
#

It just truncates 🙂

wicked notch
#

Except if you have mips

#

You need GL_NEAREST_MIPMAP_NEAREST

wispy spear
#

badumm tsss

frank sail
#

ah

#

because gl is so unhinged that you have to specify min and mip in the same parameter

#

otherwise you implicitly disable mips
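
The gotcha here is that with a min filter of plain GL_NEAREST (or GL_LINEAR) the sampler only ever reads the base level, so the LOD passed to textureLod has no effect. In practice the fix is a call along the lines of `glSamplerParameteri(sampler, GL_TEXTURE_MIN_FILTER, GL_NEAREST_MIPMAP_NEAREST)`. The sketch below is a small self-contained check one could run on a filter value before relying on explicit LODs; the `k_*` constants repeat the values from the OpenGL headers and `min_filter_uses_mips` is a hypothetical helper:

```cpp
#include <cstdint>

// Values from the OpenGL headers, repeated so the sketch is self-contained.
constexpr std::uint32_t k_nearest = 0x2600;                 // GL_NEAREST
constexpr std::uint32_t k_linear = 0x2601;                  // GL_LINEAR
constexpr std::uint32_t k_nearest_mipmap_nearest = 0x2700;  // GL_NEAREST_MIPMAP_NEAREST
constexpr std::uint32_t k_linear_mipmap_linear = 0x2703;    // GL_LINEAR_MIPMAP_LINEAR

// True when a minification filter lets the sampler read levels other than
// the base level (the four *_MIPMAP_* values are contiguous).
constexpr bool min_filter_uses_mips(std::uint32_t min_filter) {
    return min_filter >= k_nearest_mipmap_nearest && min_filter <= k_linear_mipmap_linear;
}
```

A debug build could assert this for every sampler bound to a pass that samples specific mip levels, which would have flagged the Hi-Z sampler immediately.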

wispy spear
#

wouldntvedve happend with the right abstraction

frank sail
#

fwog user #2???

wispy spear
wicked notch
#

I want to say this was a good experience

wispy spear
#

OpenGL is like a diving in some closed mines experience

wicked notch
#

But this really wasn't, I just hope OpenGL doesn't fail me again...

wispy spear
#

crystal clear water everywhere, but as soon as you dive and cause a wave shock, shit is going to hit the fan

frank sail
#

or maybe it's lvstri's evil villain Vulkan backstory

wispy spear
#

haha

wicked notch
#

By the way, in the past few months I have only had a few dozen debug messages

wispy spear
#

there are usually not many anyway

wicked notch
#

Not once has the debug message callback come to my help like "hey, you might want to do GL_NEAREST_MIPMAP_NEAREST"

frank sail
#

Did you ever enable the synchronous thingy

wicked notch
#

Yes since you told me

wispy spear
#

glValidationLayer :C

wicked notch
#

Haven't had the chance to use it though, since literally not a single message has happened

frank sail
wicked notch
#

It should be

#

fucking hell

#

Don't just implicitly disable shit

wispy spear
#

if the api would have known what you wanted to achieve

wicked notch
#

Right, I just don't think implicitly disabling mips is a good way of handling this

#

Give me literally any message, don't just do stuff silently

wispy spear
#

don't be so hard on either of you (you and OpenGL)

#

the latter will have more opportunities to bite you in the butt

wicked notch
#

I can only imagine

#

By the way

#

Here's how I diagnosed this

wispy spear
#

pengu jaker and i were thinking/toying with some glValidation layer thing a while ago, but its super dead

wicked notch
#

Blue is mip level, red is value sampled from the uvs of that level

wispy spear
#

that thing could/should/might have picked that one up

wicked notch
#

I checked level 9, and the mip was just 1.0

#

So thank you, OpenGL.

#

I'm sure this would not have happened with any saner API KEKW.

#

But it is what it is..

#

At least I have occlusion culling now

#

Actually, I have 4 different fully functional algorithms of doing occlusion culling KEKW

wispy spear
#

noice

wicked notch
#

Because yes, I rewrote the entire algorithm 4 times, in different ways

#

And they all work perfectly fine

#

Now, onto removing the over 500 lines of code purely for debugging purposes KEKW

#

After this I think I'll be taking a big chamomile cup

#

I'm kind of irritated.

frank sail
wispy spear
wicked notch
#

I came back.

wicked notch
#

😦

frank sail
#

It's merely taking a nap

wicked notch
#

Except the "just" KEKW

frank sail
wicked notch
#

I saw some Vulkan code today as a result of trying to figure out why HiZ was not working and it's very thicc

#

I don't really want to give up the convenience of GL yet.

frank sail
#

Maybe you could do vkguide as a side-side project

wicked notch
#

The unhingedness does not outweigh the convenience (yet)

wispy spear
#

yeah it's kinda dead 😦

frank sail
wispy spear
#

or switch to c# and use my experiment 😛

wicked notch
frank sail
wicked notch
#

Anyways I have now calmed down and I'm not mad anymore at GL

#

I want to do something smol next.

#

Anti-Aliasing

frank sail
#

Do something easy like cutting edge TAA

wicked notch
#

Except I won't implement TAA myself, I'll just use FSR2 since it's open source (Thank you Jaker)

frank sail
#

One small issue

#

Only Vulkan and dx12 backend are provided

#

So you'd have to make a gl one

wicked notch
#

Why do I get the feeling it's not... small

#

Yeah

#

You work at AMD right? Just make an OpenGL version duh

frank sail
#

I can make one in my personal time

#

But there won't be an official one

wicked notch
#

Sad

frank sail
#

If you want, I can start working on it. I've been needing an excuse to do it

wicked notch
#

What do you mean it's not supported on OpenGL by the way? Isn't it just a spatio-temporal upscaling algorithm?

#

Like, why does it require Vulkan or DX12?

frank sail
#

It's not just a shader you invoke. Fsr2 also needs a bunch of internal resources and stuff

wicked notch
#

Hmm

#

How hard is it to port to OpenGL?

wispy spear
#

from 1 to 10? 9.2 id say

frank sail
frank sail
wicked notch
frank sail
#

I am unironically willing to try it

#

I need to better understand fsr2 for my job anyways KEKW

wispy spear
#

after the volume clustered renderer~~?~~!

#

Jaker:

wicked notch
#

I don't really handle transparency for now so I'm safe from this thing?

frank sail
wicked notch
#

Ah yes

#

Lines

#

Yeah ok, I don't understand shit KEKW

#

Hmm where to go next I wonder

frank sail
#

FXAA

wicked notch
#

FXAA sounds good

frank sail
#

SMAA is like improved FXAA if you're willing to suffer through a much more complicated implementation

cunning atlas
#

you got it figured out at least, you've been doing great things frogapprove

wispy spear
#

did you know Iris is also a female firstname in germany

#

not just the eye part

frank sail
#

eyeris

wicked notch
wispy spear
#

haha

#

na, dont worry

wicked notch
#

Lovely edges

wispy spear
#

oi that looks cool

frank sail
#

@wicked notch are you following the catlike coding tutorial (I think that was the one I used)

frank sail
#

ah

wispy spear
#

+1 for sorting your messages for better readabliktliblity

frank sail
#

the second link only implements half of FXAA btw

#

it doesn't do the end-of-edge search, which is pretty important for reducing geometric aliasing
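
A rough sketch of what the end-of-edge search does, reduced to one dimension. This is not the reference FXAA shader: `walkEdge`, `edgeBlendOffset`, and the idea of passing the luma samples along the edge as a plain array are all assumptions made for illustration. The real algorithm samples a texture in both directions along the detected edge and also checks that the luma change at the nearest end has the expected sign, which is omitted here.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical helper: walks along the edge from `start` in direction `dir`
// (+1 or -1) until the edge luma deviates from `localAverage` by at least
// `threshold`, or until the step budget or the array bounds run out.
// Returns the number of steps taken.
int walkEdge(const std::vector<float>& edgeLuma, int start, int dir,
             float localAverage, float threshold, int maxSteps)
{
    int steps = 0;
    int i = start;
    while (steps < maxSteps) {
        i += dir;
        if (i < 0 || i >= static_cast<int>(edgeLuma.size()))
            break; // ran off the sampled span
        ++steps;
        if (std::fabs(edgeLuma[i] - localAverage) >= threshold)
            break; // found the end of the edge
    }
    return steps;
}

// Sub-pixel offset across the edge, in pixels, in the range [0, 0.5].
// The closer the pixel sits to one end of the edge, the larger the offset.
float edgeBlendOffset(const std::vector<float>& edgeLuma, int start,
                      float localAverage, float threshold, int maxSteps)
{
    const int distNeg = walkEdge(edgeLuma, start, -1, localAverage, threshold, maxSteps);
    const int distPos = walkEdge(edgeLuma, start, +1, localAverage, threshold, maxSteps);
    const int span = distNeg + distPos;
    if (span == 0)
        return 0.0f; // nothing to search, leave the pixel alone
    const float nearest = static_cast<float>(std::min(distNeg, distPos));
    return 0.5f - nearest / static_cast<float>(span);
}
```

Without this search every edge pixel gets roughly the same blend, which is why long, nearly horizontal or vertical edges still look stair-stepped in implementations that skip it.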

wicked notch
#

I see, I'll try Catlike Coding then

wicked notch
#

The paper should implement everything anyways?

frank sail
#

yeah

twin bough
#

How many AA techniques are there anyway?

wicked notch
#

Too many KEKW

wicked notch
#

2 hour wait for a haircut but we got it done

#

Time to finish FXAA

wicked notch
#

Visualizing AABBs when in FXAA debug mode is kinda weird

wicked notch
#

It's not very... anti-aliased.

#

The AABB debug lines are pretty smooth though

wispy spear
#

looks smoof to me

wicked notch
#

Yeah it's definitely better

wispy spear
#

this scene is probably also not a good scene to show off antialiasing

wicked notch
#

I've been trying to find new scenes, no luck unfortunately 😦

wispy spear
#

the DirectX samples might have one with 2 telephone poles and a wire

wicked notch
#

The tree in bistro gets blurred a lot lol

wicked notch
wispy spear
#

ah bistro also has those lights hanging across on a wire

wicked notch
#

These DirectX samples made me remember how attracted I am to Mesh Shaders

#

They are so good

#

Anyways testing scene right now

wicked notch
#

After a bit more testing I found that FXAA was indeed working..

#

It's just.. well, not very good nervous

wicked notch
#

Do I do TAA or do I not do TAA...

finite yacht
#

Do it!

wicked notch
#

I'll try asking my orb (the magic 8 ball)

#

Looks like I will do TAA

frank sail
#

Sweet

#

all I ask is that you use a TAA impl that relies on a single frame of history

wicked notch
#

No temporal accumulation?

frank sail
#

I mean something different

#

some old TAA impls have like 5 frames of history that they reproject and test against

#

but modern impls use the last frame only, since it's faster and more stable

wicked notch
#

Interesting

wicked notch
#

It's always either Marco Salvi or Akenine-Moller

frank sail
#

oh hi marc(o)

wicked notch
#

They wrote 90% of the papers

wicked notch
#

I went with ROC in the end

#

HiZ is too damn conservative for my tastes

#

Anyways TAA, I think I get the gist of the algorithm but there are still some parts I can't quite figure out *cough* neighborhood clipping

#

But we'll deal with that later.
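
For reference, a hedged sketch of one common form of neighborhood clipping: the history colour is pulled towards the centre of the colour bounding box of the current frame's 3x3 neighborhood until it lies inside the box. The names `Vec3` and `clipToAabb` are made up here, and the box (`aabbMin`, `aabbMax`) is assumed to have been computed already from the neighboring pixels.

```cpp
#include <algorithm>
#include <cmath>

struct Vec3 { float x, y, z; };

// Clips `history` towards the centre of the box [aabbMin, aabbMax].
// Colours already inside the box are returned unchanged.
Vec3 clipToAabb(const Vec3& aabbMin, const Vec3& aabbMax, const Vec3& history)
{
    const float eps = 1e-5f; // avoids division by zero for a degenerate box
    const Vec3 center{ 0.5f * (aabbMin.x + aabbMax.x),
                       0.5f * (aabbMin.y + aabbMax.y),
                       0.5f * (aabbMin.z + aabbMax.z) };
    const Vec3 extent{ 0.5f * (aabbMax.x - aabbMin.x) + eps,
                       0.5f * (aabbMax.y - aabbMin.y) + eps,
                       0.5f * (aabbMax.z - aabbMin.z) + eps };
    const Vec3 offset{ history.x - center.x,
                       history.y - center.y,
                       history.z - center.z };

    // How far outside the box the history is, measured in box half-sizes.
    const float scale = std::max({ std::fabs(offset.x / extent.x),
                                   std::fabs(offset.y / extent.y),
                                   std::fabs(offset.z / extent.z) });
    if (scale <= 1.0f)
        return history; // inside the box: keep the history as is

    return { center.x + offset.x / scale,
             center.y + offset.y / scale,
             center.z + offset.z / scale };
}
```

A plain per-channel clamp to the same box is the simpler alternative. Clipping along the line to the centre tends to preserve the hue of the history better, at the cost of a few more instructions.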

finite yacht
#

never tried but I think for ROC GL_NV_representative_fragment_test can be useful

#

could be worth a try, you just need to do glEnable(GL_REPRESENTATIVE_FRAGMENT_TEST_NV)

wicked notch
#

I'll try it right now since it seems easy enough

#

Ah it's a performance thing, let's see..

#

Damn, this actually reduced my frametimes by 1ms lol

#

Unfortunately it's NV only..

finite yacht
#

yeah. Just test if it's available before enabling it ez
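
A small sketch of that availability check, kept free of any GL context so it can be compiled on its own. `hasExtension` is a hypothetical helper. In a real renderer the list would be filled by looping `glGetStringi(GL_EXTENSIONS, i)` up to the count returned by `glGetIntegerv(GL_NUM_EXTENSIONS, ...)`.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Returns true if `name` appears in the list of extension strings
// reported by the driver.
bool hasExtension(const std::vector<std::string>& extensions, const std::string& name)
{
    return std::find(extensions.begin(), extensions.end(), name) != extensions.end();
}

// Intended use inside the renderer (needs a live GL context, so shown as a comment):
//
//   if (hasExtension(extensions, "GL_NV_representative_fragment_test"))
//       glEnable(GL_REPRESENTATIVE_FRAGMENT_TEST_NV);
```

On drivers that do not expose the extension the occlusion pass simply runs without the optimization, so results stay the same and only the speedup is lost.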

#

I am surprised it makes such a big difference. Wouldn't expect drawing a few AABBs and writing to an SSBO to be that expensive