Iris - A Journey through OpenGL and beyond to learn Graphics | Graphics Programming | Page 6

wicked notch Jun 25, 2023, 10:33 AM

#

wtf

frank sail Jun 25, 2023, 10:35 AM

#

it means you can't write Foo& const foo

#

because references are immutable
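For the C++ reading of that error, a minimal sketch may help. `Foo`, `FooRef`, `read_only`, `bump` and `demo` are hypothetical names used only for illustration. It shows the difference between a reference to a const object (`const Foo&`) and a const applied to the reference itself, which the language ignores when it arrives through an alias.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical type, only here for illustration.
struct Foo {
    int value = 0;
};

// `const Foo&` is a reference to a const Foo: reads are fine, writes are rejected.
int read_only(const Foo& foo) {
    // foo.value = 1;  // would not compile, the referent is const through this reference
    return foo.value;
}

// `Foo&` is a reference to a mutable Foo.
void bump(Foo& foo) {
    ++foo.value;
}

// Writing `Foo& const foo` directly is ill-formed. When the const arrives through
// an alias it is ignored, because a reference can never be reseated anyway.
using FooRef = Foo&;
static_assert(std::is_same<const FooRef, Foo&>::value,
              "const applied to a reference type is dropped");

// Tiny self-check tying the two functions together.
bool demo() {
    Foo foo;
    bump(foo);
    assert(read_only(foo) == 1);
    return read_only(foo) == 1;
}
```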

wicked notch Jun 25, 2023, 10:36 AM

#

Hmm

#

That was not my intention, I wanted to do the classic™️ const Foo& foo

frank sail Jun 25, 2023, 10:38 AM

#

oh this is in a shader frogstare

wicked notch Jun 25, 2023, 10:38 AM

#

Yes, I wanted to do this basically:

```glsl
layout (buffer_reference) buffer BDA {};
void main() {
    const BDA ptr = BDA(address);
}
```

frank sail Jun 25, 2023, 10:38 AM

#

ye saw your #vulkan post 😄

#

I just assumed C++

#

I think what you can do is make another struct (or buffer declaration) where all the members are const, then cast the address to that

#

honestly quite incredible

#

can you put the readonly qualifier on the buffer declaration

wicked notch Jun 25, 2023, 10:41 AM

#

Actually yeah

#

That might do the tricc

#

But there is another problem

#

Wait, there is no problem

#

I am shrimply dum

#

I forgor GL_EXT_shader_explicit_arithmetic_types

#

[validation] Validation Error: [ UNASSIGNED-Device address out of bounds ] Object 0: handle = 0x26427633360, type = VK_OBJECT_TYPE_QUEUE; | MessageID = 0x1a898625 | Device address 0x111d0030 access out of bounds.  Command buffer (0x26432dba6a0). Draw Index 0. Pipeline (0x89e60f0000000042). Shader Module (0x5c5283000000003e). Shader Instruction Index = 226.  Stage = Vertex. Vertex Index = 2 Instance Index = 0.  Shader validation error occurred in file ../shaders/0.1/main.vert at line 43.Unable to find suitable #line directive in SPIR-V OpSource.

#

Validation truly is, broken (as in too powerful)

#

How do you even detect this KEKW

wispy spear Jun 25, 2023, 10:46 AM

#

by running it

#

and catching this error

#

wrapping it in a function you can query

#

frog_shrimple

wicked notch Jun 25, 2023, 10:58 AM

#

my first BDA triangle

#

BDA is so good

wispy spear Jun 25, 2023, 10:58 AM

#

noice

wicked notch Jun 25, 2023, 10:58 AM

#

I have no idea how I've lived without it for so long

wicked notch Jun 25, 2023, 3:10 PM

#

VK_EXT_mesh_shader has been conquered

#

And it is faster than anything I have ever written in OpenGL somefuckinghow

#

HOW is this faster than my microoptimized, single drawcall, indexed meshlet emulator

#

This doesn't make any friggin sense KEKW

#

This doesn't even have vertex quantization, or anything at all really

#

It's just bruteforce meshlets and it's somehow faster..

#

All the time I spent optimizing the emulation in GL 🥲

wicked notch Jun 25, 2023, 3:54 PM

#

After a very slight optimization, still no quantization, this is what it looks like

#

500 microseconds to render bistro (no culling at all) KEKW

finite yacht Jun 25, 2023, 5:04 PM

#

for comparison against vertex shader I am getting 740 microseconds on a RX 5700 XT, also without any culling

wicked notch Jun 25, 2023, 5:05 PM

#

Very nice

finite yacht Jun 25, 2023, 5:09 PM

#

rtx 3070 is a more performant card so some difference definitely comes just from that

#

you render exterior only, yes?

wicked notch Jun 25, 2023, 5:10 PM

#

No actually, interior as well

finite yacht Jun 25, 2023, 5:26 PM

#

what the. The new amd drivers dont have GL_ARB_texture_compression_bptc anymore.
(Need to have some compressed internal format, otherwise interior fills vram completely)

#

another bug ticket it is

#

yes I consider that a bug

wicked notch Jun 25, 2023, 5:28 PM

#

rip

#

Wait

#

My Vulkan thing has no textures

#

holy shit KEKW

#

I completely forgot about implementing textures

frank sail Jun 25, 2023, 5:36 PM

#

finite yacht what the. The new amd drivers dont have `GL_ARB_texture_compression_bptc` anymor...

Good thing bptc is core now

wicked notch Jun 25, 2023, 5:39 PM

#

430 usecs for exterior only

#

Not much difference

finite yacht Jun 25, 2023, 5:48 PM

#

i am dumb the extension string I was checking for had a typo

finite yacht Jun 25, 2023, 5:50 PM

#

frank sail Good thing bptc is core now

forgot that

wicked notch Jun 25, 2023, 5:53 PM

#

I am stupid: with traditional rendering, is what gets written to the depth buffer just gl_Position.z after the w division?

frank sail Jun 25, 2023, 5:57 PM

#

and after the viewport transform, I believe

#

but I've never used manual depth range so idk

wicked notch Jun 25, 2023, 6:00 PM

#

This should work in Vulkan too, right?

frank sail Jun 25, 2023, 6:01 PM

#

Yes

wicked notch Jun 25, 2023, 6:01 PM

#

So I can just use gl_FragCoord.z
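The depth question above can be sketched on the CPU. This is a minimal illustration, not the author's code: `Vec4`, `vulkan_window_depth` and `opengl_window_depth` are hypothetical names, and the default ranges assume a viewport depth range of [0, 1]. It assumes no depth bias, no depth clamp and no write to `gl_FragDepth`, in which case the value stored in the depth buffer is the window-space depth that `gl_FragCoord.z` exposes.

```cpp
#include <cassert>

// Hypothetical clip-space position, i.e. what the vertex stage writes to gl_Position.
struct Vec4 {
    float x, y, z, w;
};

// Vulkan: NDC z lies in [0, 1]; the viewport transform is
// z_window = minDepth + (maxDepth - minDepth) * z_ndc.
float vulkan_window_depth(const Vec4& clip, float min_depth = 0.0f, float max_depth = 1.0f) {
    assert(clip.w != 0.0f);
    const float z_ndc = clip.z / clip.w;  // perspective (w) division
    return min_depth + (max_depth - min_depth) * z_ndc;
}

// OpenGL with default clip control: NDC z lies in [-1, 1]; the viewport transform is
// z_window = (far - near) / 2 * z_ndc + (near + far) / 2.
float opengl_window_depth(const Vec4& clip, float near_range = 0.0f, float far_range = 1.0f) {
    assert(clip.w != 0.0f);
    const float z_ndc = clip.z / clip.w;  // perspective (w) division
    return 0.5f * (far_range - near_range) * z_ndc + 0.5f * (near_range + far_range);
}
```

With the default [0, 1] depth range the Vulkan transform is the identity on NDC z, which is why the w division alone looks like the whole story there.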

#

epic

wicked notch Jun 25, 2023, 7:55 PM

#

I have huge respect for Unreal Engine devs

#

I already had huge respect but now that I'm trying to do what they do

#

It's incredibly painful bleakekw

#

nsight does not support R64_UINT textures

#

The dudes over at epic really just did Nanite without debuggers bleakekw

wispy spear Jun 25, 2023, 7:57 PM

#

or has tools which can handle it

#

or early versions of nsight etc, im sure there is some cooperation happening

wicked notch Jun 25, 2023, 7:57 PM

#

Likely yeah, they probably have their own debuggers and tools tbh

wispy spear Jun 25, 2023, 7:58 PM

#

that should encourage you to make one yourself too : >

wicked notch Jun 25, 2023, 7:58 PM

#

I am only one human being

wispy spear Jun 25, 2023, 7:58 PM

#

but a smart one

wicked notch Jun 25, 2023, 7:58 PM

#

It's a numbers problem KEKW

wispy spear Jun 25, 2023, 7:58 PM

#

doesnt make you less schmart 🙂

wicked notch Jun 25, 2023, 8:40 PM

#

Ladies and gents

#

We got the thing

#

we have depth, meshlet ID and primitive ID inside a 64bit framebuffer

#

hybrid software/hardware rasterization coming soon™️

#

```glsl
#version 460
#extension GL_ARB_separate_shader_objects : enable
#extension GL_EXT_shader_explicit_arithmetic_types : enable
#extension GL_EXT_shader_image_int64 : enable
#extension GL_EXT_mesh_shader : enable

layout (location = 0) in i_vertex_data_block {
    flat uint i_meshlet_id;
};

layout (r64ui, set = 0, binding = 1) uniform u64image2D u_visbuffer;

void main() {
    // Layout: [63..34] depth bits, [33..7] meshlet ID, [6..0] primitive ID.
    const uint64_t depth = uint64_t(floatBitsToUint(gl_FragCoord.z) & 0x3fffffffu);
    const uint64_t payload =
        (uint64_t(depth) << 34) |
        // Mask before shifting: masking after the shift would drop the top 7 bits of the ID.
        ((uint64_t(i_meshlet_id) & 0x07ffffff) << 7) |
        ((uint64_t(gl_PrimitiveID)) & 0x7f);
    imageAtomicMax(u_visbuffer, ivec2(gl_FragCoord.xy), payload);
}
```
world's weirdest fragment shader KEKW

wispy spear Jun 25, 2023, 8:50 PM

#

name the constants too please

#

(not location/binding/set 🙂)
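A CPU-side sketch of the same 64-bit payload with the constants named, as requested above. The bit widths (30 depth, 27 meshlet, 7 primitive) are read off the shader's masks and shifts; every identifier here is hypothetical. Depth sits in the top bits so that a plain integer max orders fragments by depth first, which presumably pairs with the reverse-Z projection mentioned later in the thread.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Bit budget of the visibility payload (assumed from the shader's masks and shifts).
constexpr unsigned PRIMITIVE_BITS = 7;
constexpr unsigned MESHLET_BITS = 27;
constexpr unsigned DEPTH_BITS = 30;
static_assert(PRIMITIVE_BITS + MESHLET_BITS + DEPTH_BITS == 64, "payload must fill 64 bits");

constexpr std::uint64_t PRIMITIVE_MASK = (std::uint64_t{1} << PRIMITIVE_BITS) - 1;
constexpr std::uint64_t MESHLET_MASK = (std::uint64_t{1} << MESHLET_BITS) - 1;
constexpr std::uint64_t DEPTH_MASK = (std::uint64_t{1} << DEPTH_BITS) - 1;
constexpr unsigned MESHLET_SHIFT = PRIMITIVE_BITS;
constexpr unsigned DEPTH_SHIFT = PRIMITIVE_BITS + MESHLET_BITS;

// Equivalent of GLSL floatBitsToUint.
std::uint32_t float_bits(float value) {
    std::uint32_t bits = 0;
    std::memcpy(&bits, &value, sizeof(bits));
    return bits;
}

std::uint64_t pack_visibility(float depth, std::uint32_t meshlet_id, std::uint32_t primitive_id) {
    // Floats in [0, 1] have their top two bits clear, so 30 bits keep the full value.
    assert(depth >= 0.0f && depth <= 1.0f);
    return ((float_bits(depth) & DEPTH_MASK) << DEPTH_SHIFT) |
           ((meshlet_id & MESHLET_MASK) << MESHLET_SHIFT) |
           (primitive_id & PRIMITIVE_MASK);
}

std::uint32_t unpack_depth_bits(std::uint64_t payload) {
    return static_cast<std::uint32_t>((payload >> DEPTH_SHIFT) & DEPTH_MASK);
}

std::uint32_t unpack_meshlet(std::uint64_t payload) {
    return static_cast<std::uint32_t>((payload >> MESHLET_SHIFT) & MESHLET_MASK);
}

std::uint32_t unpack_primitive(std::uint64_t payload) {
    return static_cast<std::uint32_t>(payload & PRIMITIVE_MASK);
}
```

The 27-bit meshlet field caps the scene at 134217728 meshlet IDs, and the 7-bit primitive field caps a meshlet at 128 primitives.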

wicked notch Jun 25, 2023, 9:37 PM

#

Uhh

#

I think position is kinda broken bleakekw

#

```glsl
vec3 unproject_depth(in float depth, in vec2 uv) {
    const vec4 ndc = vec4(uv * 2.0 - 1.0, depth, 1.0);
    const vec4 world = inverse(u_camera.data.pv) * ndc;
    return world.xyz / world.w;
}
const vec3 position = unproject_depth(depth, gl_FragCoord.xy / vec2(resolution));
```

#

Am I stupid or is this fine?

frank sail Jun 25, 2023, 9:41 PM

#

your world variable should be called clip

#

or something bleakekw

wicked notch Jun 25, 2023, 9:41 PM

#

I don't think that's what's causing the problem bleakekw

#

Also you sure?

frank sail Jun 25, 2023, 9:41 PM

#

anyways, looks okay

wicked notch Jun 25, 2023, 9:42 PM

#

If you invert PV and transform NDC you get M

frank sail Jun 25, 2023, 9:42 PM

#

yeah so you actually just have world but with funny w

#

misinfo

wicked notch Jun 25, 2023, 9:42 PM

#

world_with_funny_w

wispy spear Jun 25, 2023, 9:43 PM

#

looks cool though

wicked notch Jun 25, 2023, 9:46 PM

#

I fixed

#

somehow gl_FragCoord.xy / resolution is different than getting uvs from vertex shader

wispy spear Jun 25, 2023, 9:51 PM

#

barythingy vs perspective perhaps?

wicked notch Jun 25, 2023, 9:52 PM

#

Probable

wicked notch Jun 25, 2023, 10:35 PM

#

very correct interpolation

#

KEKW

wispy spear Jun 25, 2023, 10:40 PM

#

leave it like that

#

now make the game 🙂

wicked notch Jun 25, 2023, 10:44 PM

#

this is required to make the game

#

Very important to have nanite KEKW

wispy spear Jun 25, 2023, 10:45 PM

#

good progress either way 🙂

wicked notch Jun 25, 2023, 10:48 PM

#

We have le normals

#

I can still do analytical partial derivatives let's goo

#

I don't understand the need to rescale the derivatives though thonk

#

perhaps doing the math by hand could be beneficial

wicked notch Jun 26, 2023, 8:34 AM

#

Is EmitMeshTasksEXT just a fancy vkCmdDispatch except it's in a task shader? frog_thinkk

frank sail Jun 26, 2023, 8:35 AM

#

Yes

wicked notch Jun 26, 2023, 8:35 AM

#

Hmm

frank sail Jun 26, 2023, 8:35 AM

#

Too late, I sleep now. Gn frogeheart

wicked notch Jun 26, 2023, 8:36 AM

#

Gn sir

#

By the time you will awoke from your slumber, meshlet culling shall be fully functional

wicked notch Jun 26, 2023, 9:55 AM

#

wicked notch Jun 26, 2023, 12:06 PM

#

Hmm, I don't understand glsl subgroupBallotExclusiveBitCount

#

gl_SubgroupSize could be 64 in case of AMD, so how does this return uint? frog_thinkk

#

Does subgroupBallotExclusiveBitCount perhaps count bits up until gl_SubgroupInvocationID (excluded)?

proven laurel Jun 26, 2023, 12:11 PM

#

wicked notch `gl_SubgroupSize` could be 64 in case of AMD, so how does this return `uint`? ...

wdym

wicked notch Jun 26, 2023, 12:12 PM

#

subgroupBallotExclusiveBitCount returns uint, that's 32 bits, not enough to hold all subgroup ballots (for wave64's)

proven laurel Jun 26, 2023, 12:12 PM

#

uint subgroupBallotExclusiveBitCount(uvec4 value) returns the exclusive scan of the number of bits set in value, only counting the bottom gl_SubgroupSize bits (we'll cover what an exclusive scan is later).

#

number of bits

wicked notch Jun 26, 2023, 12:13 PM

#

You see, I can't read

#

KEKW

proven laurel Jun 26, 2023, 12:13 PM

#

KEKW

wicked notch Jun 26, 2023, 12:13 PM

#

Makes sense then

#

only counting the bottom gl_SubgroupSize bits is an extremely convoluted and cryptic way of saying "we only count bits up until the current subgroup's ballot excluded"

#

Unless that's not actually what it's saying

#

It would make sense though

#

Since you can do stuff[base + subgroupBallotExclusiveBitCount(vote)] = other_stuff
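A small CPU emulation of that compaction pattern, as a sketch under assumptions. The ballot is modelled as one 64-bit mask (enough for a wave64 subgroup), and `exclusive_bit_count` and `compact_visible` are hypothetical names. The result is a count of set bits below the calling invocation, so it never exceeds 63 and fits comfortably in a `uint`.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of set ballot bits strictly below `invocation`
// (the per-invocation value subgroupBallotExclusiveBitCount yields).
std::uint32_t exclusive_bit_count(std::uint64_t ballot, std::uint32_t invocation) {
    assert(invocation < 64);
    const std::uint64_t below = (std::uint64_t{1} << invocation) - 1;
    return static_cast<std::uint32_t>(std::bitset<64>(ballot & below).count());
}

// Stream compaction: every surviving lane writes its index to a unique, dense slot.
std::vector<std::uint32_t> compact_visible(const std::vector<bool>& visible) {
    assert(visible.size() <= 64);
    std::uint64_t ballot = 0;
    for (std::size_t i = 0; i < visible.size(); ++i) {
        if (visible[i]) ballot |= std::uint64_t{1} << i;  // emulates subgroupBallot
    }
    std::vector<std::uint32_t> survivors(std::bitset<64>(ballot).count());
    for (std::size_t i = 0; i < visible.size(); ++i) {
        if (visible[i]) {  // culled lanes must not write, their slot belongs to a survivor
            const auto lane = static_cast<std::uint32_t>(i);
            survivors[exclusive_bit_count(ballot, lane)] = lane;
        }
    }
    return survivors;
}
```

For lanes `{visible, culled, visible, visible}` the survivors land in slots 0, 1 and 2 with no gaps, which is the `stuff[base + count] = other_stuff` idiom from the message above.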

#

```glsl
layout (local_size_x = WORKGROUP_SIZE, local_size_y = 1, local_size_z = 1) in;

layout (push_constant) uniform pc_data_block {
    uint64_t meshlet_address;
    uint64_t vertex_address;
    uint64_t index_address;
    uint64_t primitive_address;
    uint64_t transforms_address;
    uint meshlet_count;
};

taskPayloadSharedEXT task_payload_t payload;

void main() {
    const uint meshlet_id = gl_GlobalInvocationID.x;
    const bool is_visible = meshlet_id < meshlet_count /*&& frustum_cull(meshlet_id)*/;
    const uvec4 vote = subgroupBallot(is_visible);
    const uint surviving = subgroupBallotBitCount(vote);
    const uint offset_index = subgroupBallotExclusiveBitCount(vote);
    payload.base_meshlet_id = gl_WorkGroupID.x * WORKGROUP_SIZE;
    // Only survivors write: a culled invocation has the same exclusive count as the next survivor.
    if (is_visible) {
        payload.meshlet_offset[offset_index] = uint8_t(gl_LocalInvocationID.x);
    }
    if (gl_LocalInvocationID.x == 0) {
        EmitMeshTasksEXT(surviving, 1, 1);
    }
}
```
World's dumbest task shader

#

holy shit it works

wicked notch Jun 26, 2023, 3:28 PM

#

Mfw it's easier to write task shaders than to do frustum culling with infinite/reverse Z projections

#

I am dumb and stupid

#

I always forget to do prim_count * 3 instead of just prim_count

#

ffs

wicked notch Jun 26, 2023, 5:32 PM

#

Turns out the projection is fine

#

My plane extraction method also works fine (I think)

wicked notch Jun 26, 2023, 6:45 PM

#

It took a long while

#

But we did it

#

world's most efficient rasterizer

#

But we do not stop here

#

We can be efficienter

raven orchid Jun 26, 2023, 9:06 PM

#

is this the page where you'll be documenting nanite impl progress?

wicked notch Jun 26, 2023, 9:08 PM

#

Yes

#

It's mostly memes + me ranting about stuff I don't like though KEKW

#

Notice that I use the easy way out, i.e. I use mesh shaders

#

Nanite emulates them, I do have an OpenGL prototype for mesh shader emulation, but it's a pain to work with

raven orchid Jun 26, 2023, 9:17 PM

#

Awesome gonna follow this

#

Do they do that so they can support apis that don’t have mesh shaders?

wicked notch Jun 26, 2023, 9:21 PM

#

Yeah

#

You could (in theory) run nanite on 10 year old GPUs if you really wanted to KEKW

#

The minimum requirement is just 64 bit buffer atomics

wicked notch Jun 26, 2023, 9:42 PM

#

Alright next on the list are

HiZ occlusion culling
Primitive culling
Cluster screenspace area classification

#

And hopefully I can begin with compute rasterization soon™️

wicked notch Jun 27, 2023, 2:57 PM

#

Hmm, gltfpack doesn't seem to be able to generate instances on its own

#

I mean it makes sense that one model = one instance, but eh

wicked notch Jun 27, 2023, 3:38 PM

#

cgltf chokes on EXT_mesh_gpu_instancing nervous

wicked notch Jun 27, 2023, 5:44 PM

#

I have been pondering

#

```cpp
struct meshlet_glsl_t {
    uint32 vertex_offset = 0;
    uint32 index_offset = 0;
    uint32 primitive_offset = 0;
    uint32 index_count = 0;
    uint32 primitive_count = 0;
    uint32 group_id = 0;
    alignas(alignof(float32)) aabb_t aabb = {};
};
```

#

Given that my meshlet struct is about 48 bytes

wispy spear Jun 27, 2023, 5:45 PM

#

https://tenor.com/view/contemplating-dr-randall-mindy-leonardo-dicaprio-dont-look-up-thinking-gif-24584744


wicked notch Jun 27, 2023, 5:45 PM

#

Soon to become just 40, replacing the AABB with the sphere

#

Say a mesh subdivides in N meshlets and this mesh has M instances

#

Does it make sense to have N * M meshlets?

#

There will be a ton of redundancy...

wispy spear Jun 27, 2023, 5:48 PM

#

do you predict that this change will have a big positive impact on $PERFORMANCE?

wicked notch Jun 27, 2023, 5:48 PM

#

I don't care about that yet

#

I am just brainstorming how to do instanced rendering with meshlets

#

But it will have a huge impact on memory

#

Right now I'm uploading M times the same vertices to the GPU, over and over again nervous
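One way to picture the trade-off is a sketch where the N meshlets are stored once and each of the N * M drawn meshlets is only a small record pointing at a shared meshlet and an instance. This is an assumption about a possible layout, not necessarily what the renderer ended up doing. `Meshlet`, `MeshletInstance`, `resolve` and the two size helpers are hypothetical, and the 48-byte meshlet mirrors the struct above with the AABB taken to be six floats.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// 48-byte meshlet, assumed layout: six 32-bit offsets/counts plus an AABB of six floats.
struct Meshlet {
    std::uint32_t vertex_offset, index_offset, primitive_offset;
    std::uint32_t index_count, primitive_count, group_id;
    float aabb_min[3];
    float aabb_max[3];
};
static_assert(sizeof(Meshlet) == 48, "matches the size quoted in the thread");

// Hypothetical per-draw record: which shared meshlet, and which instance transform.
struct MeshletInstance {
    std::uint32_t meshlet_id;
    std::uint32_t instance_id;
};
static_assert(sizeof(MeshletInstance) == 8, "two 32-bit indices");

// Looks up the shared meshlet an instance record refers to.
const Meshlet& resolve(const MeshletInstance& instance, const std::vector<Meshlet>& meshlets) {
    assert(instance.meshlet_id < meshlets.size());
    return meshlets[instance.meshlet_id];
}

// Meshlet metadata if every instance gets its own full copy.
std::size_t duplicated_bytes(std::size_t meshlets, std::size_t instances) {
    return meshlets * instances * sizeof(Meshlet);
}

// Meshlet metadata if meshlets are shared and only small instance records are replicated.
std::size_t shared_bytes(std::size_t meshlets, std::size_t instances) {
    return meshlets * sizeof(Meshlet) + meshlets * instances * sizeof(MeshletInstance);
}
```

For 1000 meshlets and 100 instances this is 4,800,000 bytes duplicated against 848,000 bytes shared, and the vertex and index data only needs to be uploaded once in the shared layout.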

wispy spear Jun 27, 2023, 5:49 PM

#

hmm

wicked notch Jun 27, 2023, 6:19 PM

#

#graphics-techniques message

wispy spear Jun 27, 2023, 6:29 PM

#

wicked notch https://discord.com/channels/318590007881236480/600645448394342402/1123315174447...

#

pinned it too, jaker talks so much it would just go under again

wicked notch Jun 27, 2023, 6:30 PM

#

Thanks m8

wicked notch Jun 27, 2023, 6:58 PM

#

Instances!

wicked notch Jun 27, 2023, 8:59 PM

#

Hmm my instancing is borking meshoptimizer somehow

#

It can't generate a proper vertex remap for bistro only

#

Perhaps I am breaking some assumption meshopt makes

wicked notch Jun 28, 2023, 10:21 AM

#

Just now I realize how small a number 134217728 is nervous

#

With big scenes the number of meshlets is insane

#

Powerplant alone is 12 million (instanced) meshlets

wispy spear Jun 28, 2023, 10:42 AM

#

oof

wicked notch Jun 28, 2023, 12:57 PM

#

I have now reduced the number of meshlets considerably

#

at the cost of more memory

#

for fucks sake

#

Why can't I have 1 terabyte of VRAM

#

Everything would be so much easier

wicked notch Jun 28, 2023, 1:13 PM

#

I was debugging why displaying normals would cause a device lost

#

Even though displaying meshlet ID would work just fine

#

Turns out I was passing 0 as the meshlet instance buffer address

#

How the hell was it working before

#

???

#

Alright now powerplant is 200628 meshlets froge

raven orchid Jun 28, 2023, 2:34 PM

#

Does performance improve to a point with more meshlets or degrade? Is it a balancing act to keep the number within a good range?

wicked notch Jun 28, 2023, 2:37 PM

#

Of course performance scales pretty much linearly with the number of meshlets, but occupancy stays the same

#

Whether you send 100 meshlets or 100000 meshlets the GPU will happily process them at full speed

#

Perhaps it would be a cool experiment to merge ALL meshes into a single huge mesh and derive meshlets from that

wispy spear Jun 28, 2023, 3:41 PM

#

powerplant's primitives are also quite awkward

#

all the pipes are one, iirc for example

wicked notch Jun 28, 2023, 6:13 PM

#

Hmm

#

I currently do depth testing with a shrimple imageAtomicMax

#

Actually, nevermind whatever I was thinking KEKW

wicked notch Jun 28, 2023, 9:30 PM

#

I really don't like HiZ

#

It's overly conservative at times, you have to handle disocclusion events, frame 0 is a special case

wispy spear Jun 28, 2023, 9:31 PM

#

you could play Rust or PubG instead

wicked notch Jun 28, 2023, 9:32 PM

#

frog_thinkk

wispy spear Jun 28, 2023, 9:32 PM

#

didnt you ponder around the other day, that hiz doesnt do anything for your already good meshlet renderisms?

#

or is that a different thing

wicked notch Jun 28, 2023, 9:33 PM

#

Perhaps I misspoke, occlusion culling is definitely useful

#

I just have a personal feud with HiZ KEKW

#

The impl is also straight from Niagara so..

#

Anyways, I don't think it's practical to do ROC with meshlets, far too many AABBs lol

#

The culling would not be worth it I think, perhaps I could experiment

#

It wouldn't integrate very well with the TASK/MESH pipeline but eh

#

It's just one more buffer, what's wrong with that

#

What do you guys predict, will ROC be worth it?

#

Bets are open

frank sail Jun 28, 2023, 9:42 PM

#

republic of china

wicked notch Jun 28, 2023, 9:46 PM

#

frank sail republic of china

yes

wispy spear Jun 28, 2023, 9:46 PM

#

ah

wicked notch Jun 28, 2023, 9:46 PM

#

Tbh I kinda want to just leave HiZ as is and come back to it later

#

I would like to move onto the next step, which is cluster area/error estimation

wispy spear Jun 28, 2023, 9:47 PM

#

isnt roc "just"(tm) yaymd 's new cuda?

wicked notch Jun 28, 2023, 9:48 PM

#

And software rasterization for the big gains

wicked notch Jun 28, 2023, 9:48 PM

#

wispy spear isnt roc "just"(tm) yaymd 's new cuda?

That's ROCm KEKW

wispy spear Jun 28, 2023, 9:48 PM

#

ah you are tralking about something else

#

republic of coomers

wicked notch Jun 28, 2023, 9:48 PM

#

Yes, ROC is raster occlusion culling

wispy spear Jun 28, 2023, 9:48 PM

#

oops : > you got me

wicked notch Jun 28, 2023, 9:49 PM

#

Anyways I have a small problem

#

Wpotrick did not do the cluster area estimation yet

wispy spear Jun 28, 2023, 9:49 PM

#

~~there are exercises for that~~

wicked notch Jun 28, 2023, 9:49 PM

#

Which means I am on my own nervous

wispy spear Jun 28, 2023, 9:49 PM

#

perhaps pester him to figure something out with you

#

you are the only one actually trying to achieve something here

#

@glass sphinx lustri is complaining that you did not do cluster area estiminiation yet

glass sphinx Jun 28, 2023, 9:51 PM

#

frog_thinkk

#

im too lazy

wicked notch Jun 28, 2023, 9:51 PM

#

I am not

wispy spear Jun 28, 2023, 9:51 PM

#

put your heads together

#

get it done

glass sphinx Jun 28, 2023, 9:52 PM

#

i only touch opengl with a stick

wispy spear Jun 28, 2023, 9:52 PM

#

🥢

#

here have 2

glass sphinx Jun 28, 2023, 9:52 PM

#

we can plan it together

wicked notch Jun 28, 2023, 9:52 PM

#

Worry not, I am currently using our lord and savior vulkan

glass sphinx Jun 28, 2023, 9:53 PM

#

goooood

#

so

#

did you ever want to join a cult before?

#

i have good news

#

we are recruiting

wicked notch Jun 28, 2023, 10:05 PM

#

Incredible

#

What kind of cult

frank sail Jun 28, 2023, 10:06 PM

#

https://tenor.com/view/obiwan-star-wars-dont-try-it-ewan-mc-gregor-revenge-of-the-sith-gif-7971802


glass sphinx Jun 28, 2023, 10:18 PM

#

bleakekw

#

~~first you need to sign a clause that makes you my property forever~~

#

i am gathering the convincement crew

delicate rain Jun 28, 2023, 10:20 PM

#

It's a good and friendly cult

#

you only need to recruit 5 more ppl as a payoff

#

for getting introduced

#

it's worth it tho

delicate rain Jun 28, 2023, 10:21 PM

#

delicate rain you only need to recruit 5 more ppl as a payoff

we can make it 3 if you are cool

#

which you seem you are

glass sphinx Jun 28, 2023, 10:21 PM

#

are you a c++ person @wicked notch ?

left jacinth Jun 28, 2023, 10:22 PM

#

Did somebody say Daxa?

wicked notch Jun 28, 2023, 10:22 PM

#

They're raiding me nervous

delicate rain Jun 28, 2023, 10:22 PM

#

https://tenor.com/view/anchorman-assemble-will-ferrell-seashell-blow-gif-20525265


glass sphinx Jun 28, 2023, 10:22 PM

#

you have no escape

#

https://tenor.com/view/pakistan-army-pak-army-ssg-commando-gif-24705824


wicked notch Jun 28, 2023, 10:22 PM

#

deccer look what you did

glass sphinx Jun 28, 2023, 10:23 PM

#

we come and conquer

wicked notch Jun 28, 2023, 10:23 PM

#

Anyways, I'm honored about this invitation, but I'm afraid I can't join your cult 😦

glass sphinx Jun 28, 2023, 10:24 PM

#

👹

delicate rain Jun 28, 2023, 10:24 PM

#

https://tenor.com/view/turn-up-the-heat-a-little-parker-stevenson-louis-osmond-greenhouse-academy-push-them-a-little-bit-more-gif-18492906


glass sphinx Jun 28, 2023, 10:24 PM

#

https://tenor.com/view/wrong-not-nope-gif-22294097


wicked notch Jun 28, 2023, 10:24 PM

#

bleakekw

glass sphinx Jun 28, 2023, 10:25 PM

#

anyways we need more daxa people to ~~compensate my lazyness~~ get even more features in

#

if you ever feel the need to completely rewrite everything for no reason in daxa we are ~~eating you alive~~ always here

wicked notch Jun 28, 2023, 10:26 PM

#

Sure, I will eventually get tired of writing stuff on my own bleakekw

glass sphinx Jun 28, 2023, 10:26 PM

#

merge it into daxa

delicate rain Jun 28, 2023, 10:26 PM

#

Suffering is meant to be shared

glass sphinx Jun 28, 2023, 10:26 PM

#

real

#

real

delicate rain Jun 28, 2023, 10:26 PM

#

especially when it's caused by GP

wicked notch Jun 28, 2023, 10:26 PM

#

delicate rain Suffering is meant to be shared

Truer words have never been spoketh

glass sphinx Jun 28, 2023, 10:27 PM

#

btw is there a way to see the post quickly?

#

it seems like i must scroll up throu 100 quadrillion messages

wicked notch Jun 28, 2023, 10:28 PM

#

Ah yeah, unfortunately I did not have the foresight to pin the initial thing bleakekw

#

But worry not, I just recently started with Vulkan

#

I went straight for mesh shaders (the EXT one)

glass sphinx Jun 28, 2023, 10:29 PM

#

template addict very good

left jacinth Jun 28, 2023, 10:29 PM

#

Holy shit Patrick relax on the cult behavior

glass sphinx Jun 28, 2023, 10:30 PM

#

wicked notch But worry not, I just recently started with Vulkan

nice

delicate rain Jun 28, 2023, 10:30 PM

#

https://tenor.com/view/lhp-satan-dance-cult-lucifer-gif-17172667


glass sphinx Jun 28, 2023, 10:30 PM

#

https://tenor.com/view/loona-이달소-이달의소녀-moon-circle-gif-20696589


glass sphinx Jun 28, 2023, 10:31 PM

#

wicked notch Ah yeah, unfortunately I did not have the foresight to pin the initial thing ...

discord is so shit

#

how do they miss obvious features like that i cant see the original post

#

frog_think

wispy spear Jun 28, 2023, 10:32 PM

#

before you fine daxa people try to sell more carpet to my italic friend here, help him figure out cluster-mesh-thingy

#

then he's all yours 😄

wicked notch Jun 28, 2023, 10:32 PM

#

Look at him, abandoning me like this

glass sphinx Jun 28, 2023, 10:32 PM

#

https://tenor.com/view/patrick-star-gif-26050999


wispy spear Jun 28, 2023, 10:32 PM

#

i get a commission, its worf it

left jacinth Jun 28, 2023, 10:33 PM

#

Lmao

delicate rain Jun 28, 2023, 10:33 PM

#

wispy spear before you fine daxa people try to sell more carpet to my italic friend here, he...

This smells of being useful, they don't teach that at Daxa academy 😦

glass sphinx Jun 28, 2023, 10:33 PM

#

crimes

left jacinth Jun 28, 2023, 10:33 PM

#

??

#

What the balls are you saying saky

delicate rain Jun 28, 2023, 10:34 PM

#

https://tenor.com/view/sad-frown-rain-cat-gif-17035614


glass sphinx Jun 28, 2023, 10:34 PM

#

@wicked notch can you link and pin the github?

wicked notch Jun 28, 2023, 10:34 PM

#

Sure

#

One sec

frank sail Jun 28, 2023, 10:35 PM

#

glass sphinx btw is there a way to see the post quickly?

search for a common keyword and sort by old

#

discord is very cool

wicked notch Jun 28, 2023, 10:36 PM

#

OpenGL garbage: https://github.com/LVSTRI/Iris
Vulkan garbage: https://github.com/LVSTRI/IrisVk

#

I suppose you won't be much interested in the GL one

glass sphinx Jun 28, 2023, 10:37 PM

#

soon daxa ~~garbage~~ pure gold

wispy spear Jun 28, 2023, 10:37 PM

#

its not garbage, its brainworm material tbf

wispy spear Jun 28, 2023, 10:37 PM

#

wicked notch OpenGL garbage: <https://github.com/LVSTRI/Iris> Vulkan garbage: <https://github...

left jacinth Jun 28, 2023, 10:37 PM

#

Daxa is not garbage

#

Daxa is literally the best vulkan abstraction possible

#

Except it's missing a couple things 💀

wispy spear Jun 28, 2023, 10:38 PM

#

no sales pitches we need solutions

glass sphinx Jun 28, 2023, 10:38 PM

#

the code looks quite clean

#

better than other things i have seen here

glass sphinx Jun 28, 2023, 10:39 PM

#

left jacinth Except it's missing a couple things 💀

that is where lvstri comes in

#

i like your code

wicked notch Jun 28, 2023, 10:42 PM

#

Thanks, it's missing a lot of niceties though

delicate rain Jun 28, 2023, 10:42 PM

#

The code looks awesome

wicked notch Jun 28, 2023, 10:42 PM

#

A proper render graph for example bleakekw

glass sphinx Jun 28, 2023, 10:42 PM

#

damn this code is super clean

delicate rain Jun 28, 2023, 10:42 PM

#

I mean, thats a pretty tall task for a nicety 😄

glass sphinx Jun 28, 2023, 10:42 PM

#

task list burned oput so much of by brain

#

since task list i cant type properly anymore

wicked notch Jun 28, 2023, 10:43 PM

#

bleakekw

#

The worm needs more brain mass

frank sail Jun 28, 2023, 10:43 PM

#

glass sphinx since task list i cant type properly anymore

same except for me it was after filling in sTypes all day

glass sphinx Jun 28, 2023, 10:44 PM

#

😿

#

stypes to be pain

delicate rain Jun 28, 2023, 10:44 PM

#

we had pain bugs caused by omitting them by accident in some places

wicked notch Jul 1, 2023, 2:32 PM

#

As expected, ROC does not perform very well

#

I'm conflicted

wicked notch Jul 2, 2023, 6:22 PM

#

I have ordered 128GB of RAM

#

finally we can fit Moana Island in System Ram KEKW

wispy spear Jul 2, 2023, 6:28 PM

#

😄

wicked notch Jul 2, 2023, 9:15 PM

#

I don't understand how Darianopolis managed to export FBX in unreal tbh

#

I'm trying it with Moana assets and it exports a botched version, the lowest LOD possible KEKW

wispy spear Jul 2, 2023, 9:17 PM

#

summon him and start the interlocution

#

https://tenor.com/view/picard-4gifs-star-trek-va-la-barre-gif-24687536


glass sphinx Jul 2, 2023, 9:38 PM

#

THERE ARE FOUUUUURR LIGHTS

wispy spear Jul 2, 2023, 10:05 PM

#

at last

glass sphinx Jul 3, 2023, 12:29 PM

#

TNG is really good

glass sphinx Jul 3, 2023, 9:22 PM

#

@wispy spear pro tip: of the new Star Treks (which are all poopy), the newest one (Strange New Worlds) is really good

#

big recommendation

#

Pike is wonderful

wispy spear Jul 3, 2023, 9:23 PM

#

@glass sphinx you're late

#

already watched all of it

#

and yes, you're right

#

what I find pretty crap is Spock and the nurse, I found his wife much cooler

#

and the nunien tzung lady also gets annoying every time with her gorn shit

#

shame the Andorian engine room guy is gone, he was the best

frank sail Jul 3, 2023, 9:27 PM

#

https://tenor.com/view/do-you-speaka-any-english-speaka-speaka-english-jackie-chan-rush-hour-gif-18426418


wispy spear Jul 3, 2023, 9:27 PM

#

das a jacky chan movie

#

we're just lamenting about the latest star trek isms, where strange new worlds is the better out of the 3 new shows

wicked notch Jul 3, 2023, 10:10 PM

#

Star trek things aside

#

I think I have the best possible implementation of HiZ my brain can manage

#

And it is reasonably conservative

#

A little demo

frank sail Jul 3, 2023, 10:15 PM

#

aliasing aaaaaaah

#

jk nice cooling

wicked notch Jul 3, 2023, 10:15 PM

#

FSR2 will come soon™️

#

Also is it just me or are these normals fucked

#

This is the first time I've seen this nervous

frank sail Jul 3, 2023, 10:17 PM

#

hm

glass sphinx Jul 3, 2023, 10:17 PM

#

wispy spear shame the Andorian engine room guy is gone, he was the best

he really was the best

#

super sad

#

but he had very good episodes

frank sail Jul 3, 2023, 10:17 PM

#

if you don't remap your normals, half of them will be black

wicked notch Jul 3, 2023, 10:17 PM

#

Yeah, I meant the green being -X

frank sail Jul 3, 2023, 10:18 PM

#

o

wicked notch Jul 3, 2023, 10:18 PM

#

On the little thingy in the middle

frank sail Jul 3, 2023, 10:18 PM

#

da green been

#

hehe

glass sphinx Jul 3, 2023, 10:18 PM

#

n * 0.5 + 0.5
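The remap above, written out as a minimal sketch. `Vec3` and `encode_normal_color` are hypothetical names. A unit normal has components in [-1, 1], and a color target clamps negatives to 0, so each component is scaled and biased into [0, 1] before display.

```cpp
#include <cassert>

struct Vec3 {
    float x, y, z;
};

// Maps each normal component from [-1, 1] to [0, 1] so negative directions stay visible.
Vec3 encode_normal_color(const Vec3& n) {
    assert(n.x >= -1.0f && n.x <= 1.0f);
    assert(n.y >= -1.0f && n.y <= 1.0f);
    assert(n.z >= -1.0f && n.z <= 1.0f);
    return Vec3{n.x * 0.5f + 0.5f, n.y * 0.5f + 0.5f, n.z * 0.5f + 0.5f};
}
```

With this encoding a normal pointing along -X has a red channel of 0 rather than being clamped together with every other negative value.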

wicked notch Jul 3, 2023, 10:18 PM

#

ye this do not be good

glass sphinx Jul 3, 2023, 10:22 PM

#

huuh

wicked notch Jul 3, 2023, 10:28 PM

#

So uh

#

Area of a triangle is bh/2

#

What's the area of a bunch of em

frank sail Jul 3, 2023, 10:28 PM

#

bh/2 * numberOfTris

wicked notch Jul 3, 2023, 10:28 PM

#

Course it is

frank sail Jul 3, 2023, 10:29 PM

#

The formula to calculate the area of a regular polygon is, Area = (number of sides × length of one side × apothem)/2, where the value of apothem can be calculated using the formula, Apothem = [(length of one side)/{2 ×(tan(180/number of sides))}].

wicked notch Jul 3, 2023, 10:29 PM

#

Das a lot of data

#

Well not a lot

#

But hm

#

I wonder how Unreal does it, I was thinking of computing the area of a clip space projected AABB around the cluster

#

So like, (box.z - box.x) * (box.w - box.y)

#

But this is extremely skewed towards hardware rendering

#

And hardware rendering is cringe

#

I should also determine clusters whose triangles are going to be clipped

#

Given that we do things per vertex, I could check if vertex.xy > 1.0 || vertex.xy < -1.0

#

And mark the cluster as hardware rasterizeable if so

#

Cutting the area of the projected AABB in half could be good

#

Depending on uh, literally everything

#

Jaker, could you lend me your braincell

frank sail Jul 3, 2023, 10:35 PM

#

I can lend a froge a braincell

#

ok so you're trying to see how big a cluster is in screen space as a heuristic to determine which rasterizer to use?

wicked notch Jul 3, 2023, 10:36 PM

#

yes

frank sail Jul 3, 2023, 10:36 PM

#

did you already give up on getting the bounding box

wicked notch Jul 3, 2023, 10:36 PM

#

Later on to determine which lod to use, but that's a story for future me

#

I have not

#

It's the only way I could think of bleakekw

#

Of course I will try it, I would like to hear other smorter/dumbere ways

wispy spear Jul 3, 2023, 10:39 PM

#

that is cool btw, just saw the motion picture in all normal glory

wicked notch Jul 3, 2023, 10:40 PM

#

I still don't have textures btw bleakekw

wispy spear Jul 3, 2023, 10:40 PM

#

soon(tm)

wicked notch Jul 3, 2023, 10:40 PM

#

I won't introduce a memory bottleneck immediately so that I can still see gains from my ridiculous quest towards Nanite

#

Or well, a scuffed, dumber and worse version of Nanite KEKW

frank sail Jul 3, 2023, 10:44 PM

#

btw there is a way to get the screen space area of a projected sphere

#

https://iquilezles.org/articles/sphereproj/

wicked notch Jul 3, 2023, 10:45 PM

#

Inigo Quilez to the rescue pog

frank sail Jul 3, 2023, 10:46 PM

#

I think it might be more ideal than the screen space AABB in some instances

#

idk if it's better in general

wicked notch Jul 4, 2023, 11:33 AM

#

huh

#

L2 broke through its own limits

proven laurel Jul 4, 2023, 11:35 AM

#

wicked notch huh

https://tenor.com/view/oblivion-elder-scrolls-stop-right-there-stop-criminal-scum-gif-16138905


glass sphinx Jul 4, 2023, 1:34 PM

#

wicked notch huh

i recently got 25tb/s vram throughput

#

profilers do be sniffing glue sometimes

proven laurel Jul 4, 2023, 2:35 PM

#

glass sphinx i recently got 25tb/s vram throughput

https://tenor.com/view/now-old-man-the-future-is-now-old-man-gif-9677657


wicked notch Jul 4, 2023, 2:36 PM

#

ultra wrong cluster area detection KEKW

wicked notch Jul 4, 2023, 2:59 PM

#

At least clip detection is working (no white pixels at the edges)

#

Alright we did it boys

#

Blue = Small Area = Software Raster
Red = Big Area = Hardware Raster
Black = Clipped = Hardware Raster

#

Everything converges to blue as distance grows as expected

wicked notch Jul 4, 2023, 3:46 PM

#

Now I gotta do soft rast nervous

wispy spear Jul 4, 2023, 4:11 PM

#

damn this thing is getting better and better ❤️

summer gyro Jul 4, 2023, 4:14 PM

#

Is this still opengl ?
Or have you wandered off to vulkan ?

#

This stuff looks soo cool

wicked notch Jul 4, 2023, 4:17 PM

#

I have unfortunately defected to Vulkan 😦

#

But I still have good uses for GL

summer gyro Jul 4, 2023, 4:17 PM

#

Fair enough

#

I also recently started learning vulkan
And the validation layers are so much better than the debug call backs

glass sphinx Jul 4, 2023, 4:18 PM

#

wicked notch But I still have good uses for GL

hmhmmm

#

aha uhu

#

tell me

wicked notch Jul 4, 2023, 4:19 PM

#

Well uh

#

I have IrisGL that works pretty well, it has shadows and all, so I can use that thing as a reference sometimes

#

Don't slander my boy GL 😭

wispy spear Jul 4, 2023, 4:22 PM

#

oh?

#

this is vookan already?

wicked notch Jul 4, 2023, 4:23 PM

#

Yes, the last updates have been in Vulkan

glass sphinx Jul 4, 2023, 4:24 PM

#

LVSTRI one of the people i would hire if i could

#

workoholics that write pretty code are very good for business

wicked notch Jul 4, 2023, 4:34 PM

#

By the way I managed to export a cute scene from UE5

#

Here

#

It's pretty nice

raven orchid Jul 4, 2023, 8:15 PM

#

wicked notch Yes, the last updates have been in Vulkan

Switched everything to Vulkan or doing hybrid with gl?

wispy spear Jul 4, 2023, 8:16 PM

#

wouldnt surprise me if lustri interops some dx12 into the mix for some obscure reason heh

wicked notch Jul 4, 2023, 8:20 PM

#

I did not go that far into madness yet bleakekw

#

It's fully Vulkan

wicked notch Jul 4, 2023, 8:56 PM

#

I think it's finally time to add textures

wicked notch Jul 4, 2023, 11:47 PM

#

https://github.com/alecjacobson/common-3d-test-models for later™️

wicked notch Jul 5, 2023, 9:41 AM

#

RAM has arrived let's goo

#

I shall use every last friggin byte

wispy spear Jul 5, 2023, 10:18 AM

#

https://tenor.com/view/sheep-gif-4520105


wicked notch Jul 5, 2023, 2:46 PM

#

Man, 128GB of RAM feels truly liberating

#

Unreal uses up to 64 and I still have 64 left KEKW

wicked notch Jul 5, 2023, 5:01 PM

#

with a super long delay, textures (meshlet flavor)

raven orchid Jul 5, 2023, 7:07 PM

#

Hey nice!

#

What causes it to need so much system ram? Are you having to stream from system to vram a lot or does it mostly stay in system?

wicked notch Jul 5, 2023, 7:31 PM

#

Not yet, I got so much RAM because of Unreal Engine (which I use as editor) and blender nervous

#

They were taking up so much RAM I couldn't bear it anymore

proven laurel Jul 5, 2023, 7:34 PM

#

wicked notch Not yet, I got so much RAM because of Unreal Engine (which I use as editor) and ...

Are you running both in parallel

proven laurel Jul 5, 2023, 7:34 PM

#

wicked notch Man, 128GB of RAM feels truly liberating

three extra chrome tabs is really nice isn't it?

wicked notch Jul 5, 2023, 7:35 PM

#

proven laurel Are you running both in parallel

No but blender sucks up a lot of RAM KEKW

proven laurel Jul 5, 2023, 7:35 PM

#

converting Bistro from FBX to GLTF was my breaking point to go from 16GB to 32GB KEKW

finite quartz Jul 5, 2023, 7:39 PM

#

I went from 32 to 64 because I was often swapping because of Painter... froghorror (We like RAM and VRAM a lot)

proven laurel Jul 5, 2023, 7:41 PM

#

finite quartz I went from 32 to 64 because I was often swapping because of Painter... <:frogho...

Substance Painter?

#

3D software just eats up all ram and vram lol

finite quartz Jul 5, 2023, 7:41 PM

#

proven laurel Substance Painter?

yeah

#

we eat all the vram we can :p

proven laurel Jul 5, 2023, 7:43 PM

#

well I'll see if my GPU has enough for the stuff I want to do KEKW

wicked notch Jul 6, 2023, 9:29 AM

#

Hmm some clusters have triangles scattered about all over the place

#

The hell is meshoptimizer doing

wicked notch Jul 6, 2023, 11:49 PM

#

It is now time

#

compute rasterizer

#

except I am sleep

#

So that is deferred to tomorrow KEKW

glass sphinx Jul 7, 2023, 12:10 AM

#

btw what is your educational status?

#

hs? uni? working?

frank sail Jul 7, 2023, 1:27 AM

#

uni

#

(I am observing lvstri through walls)

glass sphinx Jul 7, 2023, 1:36 AM

#

frogapprove

wicked notch Jul 7, 2023, 8:07 AM

#

frank sail uni

I confirm this

#

Jaker do be livin in my walls

wicked notch Jul 7, 2023, 1:12 PM

#

bistro if you draw only big triangles

#

KEKW

wicked notch Jul 7, 2023, 1:52 PM

#

Hmm

#

How should I schedule work for my compute rasterizer

#

Perhaps one workgroup per meshlet and one thread per primitive?

#

It's gonna have a lot of dead invocations...

#

primitive_count is never going to be MAX_PRIMITIVES

#

More indirection could solve this though bleakekw

wicked notch Jul 7, 2023, 2:46 PM

#

World's most absolutely unhinged and stupid software rasterizer bleakekw

#

polygon fill: todo

#

Hmm this isn't very promising

#

Granted this is probably the most inefficient way possible of doing a compute rasterizer

#

But right now it's also not doing much...

#

Perhaps I am spending a lot of time idle

#

Uhhhh

#

How the hell do I read this?

#

Oh well, I can see that the only two if statements are taking up a combined 120% of the time spent in the shader bleakekw

wicked notch Jul 7, 2023, 3:38 PM

#

I think the added indirection is necessary

#

I am spending over 300 microseconds idle

wispy spear Jul 7, 2023, 3:41 PM

#

is there a way to get rid of that if by somehow sorting the data beforehand?

wicked notch Jul 7, 2023, 3:43 PM

#

Yeah

#

I need to do an extra processing step

#

I should make a buffer with all software rasterized meshlets and another with an index to all software rasterized primitives

#

Actually no

#

it's probably better if I make a single buffer with primitiveID | meshletID << 7
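
A rough C++ sketch of that packing, assuming 32-bit entries: seven low bits hold the primitive index (so at most 128 primitives per meshlet) and the remaining 25 bits hold the meshlet ID. Since `<<` binds tighter than `|`, the expression in the message already parses as `primitiveID | (meshletID << 7)`. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kPrimitiveBits = 7;
constexpr std::uint32_t kPrimitiveMask = (1u << kPrimitiveBits) - 1u;  // 0x7F

// Pack one (meshlet, primitive) pair into a single 32-bit buffer entry.
inline std::uint32_t pack_primitive(std::uint32_t meshlet_id, std::uint32_t primitive_id) {
    assert(primitive_id <= kPrimitiveMask);                 // 0..127
    assert(meshlet_id < (1u << (32u - kPrimitiveBits)));    // fits in 25 bits
    return primitive_id | (meshlet_id << kPrimitiveBits);
}

inline std::uint32_t unpack_primitive_id(std::uint32_t packed) {
    return packed & kPrimitiveMask;
}

inline std::uint32_t unpack_meshlet_id(std::uint32_t packed) {
    return packed >> kPrimitiveBits;
}
```

With such a buffer every thread of the rasterizer dispatch maps to exactly one live primitive, at the cost of an extra pass that writes the entries.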

#

WG size will be 256

#

And we round down as usual

#

looping over all primitives vs looping over all meshlets hmmmmmm

#

deep ponderation

#

Looks like nanite does the former

#

And it is somehow faster

#

Even deeper pondering

wicked notch Jul 7, 2023, 4:22 PM

#

I am dispatching 476394 workgroups, each workgroup has a local size of 128

#

The average number of primitives is about 64

#

Which means that of 60978432 threads, 30489216 are doing nothing bleakekw
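
The arithmetic behind those numbers, as a small C++ sketch. It assumes one workgroup per meshlet and one thread per primitive, as described earlier; the struct and function names are invented for illustration.

```cpp
#include <cassert>
#include <cstdint>

struct DispatchUtilisation {
    std::uint64_t total_threads;
    std::uint64_t idle_threads;
};

// One workgroup per meshlet, one thread per primitive: every thread whose
// index is >= the meshlet's primitive count has nothing to do.
inline DispatchUtilisation estimate_idle_threads(std::uint64_t workgroups,
                                                 std::uint64_t local_size,
                                                 std::uint64_t avg_primitives) {
    assert(avg_primitives <= local_size);
    const std::uint64_t total = workgroups * local_size;
    const std::uint64_t busy = workgroups * avg_primitives;
    return {total, total - busy};
}
```

Calling `estimate_idle_threads(476394, 128, 64)` reproduces the figures in the message: 60978432 threads in total, 30489216 of them idle.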

wispy spear Jul 7, 2023, 4:31 PM

#

fouf, sounds like a lot of wasted potential

proven laurel Jul 7, 2023, 4:57 PM

#

with meshlets you generally want to try to merge as much as possible

#

otherwise you get low occupancy

wicked notch Jul 7, 2023, 5:02 PM

#

wicked notch it's probably better if I make a single buffer with primitiveID | meshletID << 7

Yeah but I just realized with this solution I am shrimply moving the problem elsewhere bleakekw

#

Perhaps making smaller meshlets is better

#

As a quick sanity check I tried making the meshlets smaller, but the time taken by the rasterizer is the same....

#

frog_thinkk

wicked notch Jul 7, 2023, 5:32 PM

#

Any change I make impacts minimally, looks like idle threads are not the bottleneck?

#

I strongly believe I am doing something wrong, I will inspect more closely

#

The barriers and loads are taking up most of the time, hmm

wicked notch Jul 7, 2023, 8:27 PM

#

@frank sail could you help a smol-brained frog in need?

frank sail Jul 7, 2023, 8:28 PM

#

wicked notch Jul 7, 2023, 8:31 PM

#

wicked notch The barriers and loads are taking up most of the time, hmm

Can you extract any useful information from this

frank sail Jul 7, 2023, 8:37 PM

#

no

#

what are the throughputs

wicked notch Jul 7, 2023, 8:38 PM

#

This is with hardware raster (consider only meshlet_cull_and_draw)

frank sail Jul 7, 2023, 8:38 PM

#

what part of the frame am I looking at

wicked notch Jul 7, 2023, 8:39 PM

#

meshlet_cull_and_draw (I have disabled culling btw, for testing hw/sw)

frank sail Jul 7, 2023, 8:39 PM

#

hmm

#

what's the occupancy

wicked notch Jul 7, 2023, 8:40 PM

#

This is software

wicked notch Jul 7, 2023, 8:40 PM

#

frank sail what's the occupancy

Occupancy is good both on HW raster and SW raster

#

Thread coherency is also 99%

#

The software rasterizer also doesn't do primitive filling, it just renders points

#

I could show you the code if you are interested

#

But it's really basic

#

```
shared meshlet_data_t meshlet;
shared vec3[64] vertices;

void main() {
    const uint meshlet_instance_id = gl_WorkGroupID.x;
    const uint meshlet_id = meshlet_instances[meshlet_instance_id].meshlet_id;
    const uint instance_id = meshlet_instances[meshlet_instance_id].instance_id;

    if (is_candidate_sw_raster(meshlet_id) && gl_LocalInvocationID.x == 0) {
        meshlet = meshlet_data[meshlet_id];
    }
    barrier();

    if (!is_candidate_sw_raster(meshlet_id)) {
        return;
    }

    const mat4 transform = instances[instance_id];
    if (gl_LocalInvocationID.x < meshlet.index_count) {
        vertices[gl_LocalInvocationID.x] = fetch_vertex_and_project_to_ndc(gl_LocalInvocationID.x);
    }
    barrier();

    const uint primitive_id = gl_LocalInvocationID.x;
    if (primitive_id < meshlet.primitive_count) {
        const vec3[] triangles = rasterize_triangles(primitive_id);
        imageAtomicMax(visbuffer, ivec2(triangles[0].xy), triangles[0].z);
        imageAtomicMax(visbuffer, ivec2(triangles[1].xy), triangles[1].z);
        imageAtomicMax(visbuffer, ivec2(triangles[2].xy), triangles[2].z);
    }
}
```

#

This is the gist

#

The things that take the most are the two barriers (and I have no idea why)

frank sail Jul 7, 2023, 8:47 PM

#

remove the barriers bleakekw

wicked notch Jul 7, 2023, 8:48 PM

#

wg size is (128, 1, 1) and dispatch size is (meshlet_count, 1, 1) btw

#

Or one workgroup per meshlet, one thread per primitive

#

It is almost a 1 to 1 copy of what unreal does bleakekw

frank sail Jul 7, 2023, 8:50 PM

#

hmm I guess you can't really change the wg size

#

but yeah barrier with a big wg will be slow

wicked notch Jul 7, 2023, 8:51 PM

#

If I make the WG smaller I get 3x worse perf

#

bleakekw

frank sail Jul 7, 2023, 8:51 PM

#

ouphe

wispy spear Jul 7, 2023, 8:51 PM

#

make wg bigger then anismart

wicked notch Jul 7, 2023, 8:51 PM

#

2x worse perf

frank sail Jul 7, 2023, 8:51 PM

#

make it samer

wispy spear Jul 7, 2023, 8:52 PM

#

heh

#

splitting this into multiple passes would not help?

wicked notch Jul 7, 2023, 8:52 PM

#

Is this barrier really that destructive?

```
if (gl_LocalInvocationID.x == 0 && is_sw_rast) {
    meshlet = meshlet_ptr.data[meshlet_id];
}
barrier();
```

frank sail Jul 7, 2023, 8:52 PM

#

wispy spear splitting this into multiple passes would not help?

That'd add a lot of VRAM traffic

wispy spear Jul 7, 2023, 8:52 PM

#

ah

wicked notch Jul 7, 2023, 8:52 PM

#

It is copying something like, 64 bytes of data in shared memory

#

How is this barrier taking half the time spent in the shader

frank sail Jul 7, 2023, 8:53 PM

#

well it's possible it's just an artifact of how the profiler reports things

#

in RGP, actual load instructions will appear very cheap, but then their cost will show up at s_waitcnt instructions

wicked notch Jul 7, 2023, 8:57 PM

#

Right

#

except there are literal noops before this barrier

#

except the load

frank sail Jul 7, 2023, 8:57 PM

#

just one teensy weensy little load

wicked notch Jul 7, 2023, 8:57 PM

#

Like legit it's just this

void main() {
    const uint meshlet_instance_id = gl_WorkGroupID.x;

    const uint meshlet_id = instance_ptr.data[meshlet_instance_id].meshlet_id;
    const uint instance_id = instance_ptr.data[meshlet_instance_id].instance_id;
    const uint primitive_id = gl_LocalInvocationID.x;
    const bool is_sw_rast = cluster_class_ptr.data[meshlet_instance_id] == CLUSTER_CLASS_SW_RASTER;
    
    if (gl_LocalInvocationID.x == 0 && is_sw_rast) {
        meshlet = meshlet_ptr.data[meshlet_id];
    }
    barrier();
}

#

Oh wait there is another load

#

the classification

#

Let me try forcing SW

#

It's the same

#

I am dying

#

Alright

#

I will rasterize NOTHING

#

?????????????????????

#

16 milliseconds for doing nothing

#

KEKW

#

What the hell is happening

frank sail Jul 7, 2023, 9:05 PM

#

are you dispatching a lot of wgs or something

wicked notch Jul 7, 2023, 9:05 PM

#

uh

#

maybe

#

is 500'000 considered a lot

#

btw I cracked it

#

it was the imageAtomicMax

#

I have to thank the profiler for misleading me and doing absolutely nothing to help me figure it out KEKW

#

Now it's taking the same as the hardware rasterizer

#

Which is still terrible

#

it should be taking 3x less time than HW raster in this particular case (according to unreal)

glass sphinx Jul 7, 2023, 9:18 PM

#

well

wicked notch Jul 7, 2023, 9:24 PM

#

glass sphinx well

https://tenor.com/view/wellwellwell-3dmodel-gif-18496577


wicked notch Jul 7, 2023, 9:44 PM

#

Was I wrong to expect a crazy speed up maybe?

#

Ah beautiful

#

On today's episode of: things that make no sense

#

Thai scene: 30 million primitives so many small triangles, compute matches raster (compute should be faster)
Bistro scene: 5 million primitives and many big triangles, compute is faster than raster

#

heh

glass sphinx Jul 7, 2023, 9:59 PM

#

maybe your heuristic to check for median tri size is off

wicked notch Jul 7, 2023, 9:59 PM

#

I am doing full software vs full raster right now

glass sphinx Jul 7, 2023, 10:00 PM

#

ah ok

distant lodge Jul 8, 2023, 1:06 AM

#

I'm surprised that it's possible to match/beat raster hw at all

frank sail Jul 8, 2023, 1:06 AM

#

its perf breaks down when you have really bad quad occupancy

distant lodge Jul 8, 2023, 1:06 AM

#

man I should've started following earlier

frank sail Jul 8, 2023, 1:06 AM

#

with thin or tiny tris

distant lodge Jul 8, 2023, 1:07 AM

#

is one of the drawbacks that you need to store a buffer of all triangles ever instanced in your scene? I can't imagine you can beat the hardware vertex cache

frank sail Jul 8, 2023, 1:09 AM

#

idk how much of a perf uplift hw vertex reuse is, but clearly it's not unbeatable

#

you also lose a lot of potential vertex reuse when you render unconnected meshlets

distant lodge Jul 8, 2023, 1:12 AM

#

true

#

I wonder what the total memory requirement to render bistro is (minus images)

frank sail Jul 8, 2023, 1:12 AM

#

btw, I wonder how much vertex reuse you can get with shared memory

#

oh wait

#

I think lvstri is already loading verts to shared mem

distant lodge Jul 8, 2023, 1:16 AM

#

oh I think I get it, because you're working with meshlets you have a known bound on the triangles you're working on

#

so your shared mem gets loaded with your meshlet's vertices and you go from there

frank sail Jul 8, 2023, 1:18 AM

#

I guess you could do something like this

layout(local_size_x = 128) in; // max number of verts in a meshlet

fetch and shade vertex[localInvocationID]
store transformed vertex in shared memory
barrier()
if (localInvocationID < numPrimsInThisMeshlet)
  assemble primitive[localInvocationID]
  rasterize primitive
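
The two phases in that pseudocode can be emulated on the CPU to see where the vertex reuse comes from. This is only a sketch: `Meshlet`, `MeshletStats` and `process_meshlet` are hypothetical names, the loops stand in for the 128 threads of one workgroup, and the fixed-size array stands in for `shared` memory.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Vec3 { float x, y, z; };

struct Meshlet {
    std::vector<Vec3> vertices;                          // unique vertices
    std::vector<std::array<std::uint8_t, 3>> triangles;  // local vertex indices
};

struct MeshletStats {
    std::size_t vertex_transforms;
    std::size_t triangles_assembled;
};

template <typename Transform, typename Rasterize>
MeshletStats process_meshlet(const Meshlet& m, Transform transform, Rasterize rasterize) {
    constexpr std::size_t kGroupSize = 128;
    assert(m.vertices.size() <= kGroupSize && m.triangles.size() <= kGroupSize);
    std::array<Vec3, kGroupSize> shared_vertices{};  // stand-in for shared memory
    MeshletStats stats{0, 0};

    // Phase 1: each "thread" shades at most one vertex and stores the result.
    for (std::size_t tid = 0; tid < kGroupSize; ++tid) {
        if (tid < m.vertices.size()) {
            shared_vertices[tid] = transform(m.vertices[tid]);
            ++stats.vertex_transforms;
        }
    }

    // On the GPU a barrier() separates the two phases.

    // Phase 2: each "thread" assembles and rasterizes at most one primitive,
    // reading the already transformed vertices instead of shading them again.
    for (std::size_t tid = 0; tid < kGroupSize; ++tid) {
        if (tid < m.triangles.size()) {
            const std::array<std::uint8_t, 3>& t = m.triangles[tid];
            rasterize(shared_vertices[t[0]], shared_vertices[t[1]], shared_vertices[t[2]]);
            ++stats.triangles_assembled;
        }
    }
    return stats;
}
```

For a quad made of two triangles this shades 4 vertices rather than the 6 a naive per-triangle fetch would, which is the reuse being discussed.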

distant lodge Jul 8, 2023, 1:21 AM

#

yeah, rasterizing 1 primitive per thread sounds kinda funky though

#

is that what the hardware does

frank sail Jul 8, 2023, 1:21 AM

#

well, there is dedicated hw for rasterizing prims, so it's super fast

#

but in this case, your prims are only a few pixels large at most

distant lodge Jul 8, 2023, 1:22 AM

#

though I guess since we're talking small triangles specifically that this is used for

#

yeah

#

software rasterization in compute always seemed pretty cumbersome to me because both the vertex and fragment operations essentially decompress into a ton more data to process in the next stage

#

but it makes sense how this technique deals with it

frank sail Jul 8, 2023, 1:25 AM

#

it works well in very constrained situations

#

in nanite, they're working with tiny triangles in a visbuffer-like renderer (so the fragment shader is literally just writing depth and a triangle/instance ID)

distant lodge Jul 8, 2023, 1:28 AM

#

yeah

#

deceptively simple and insanely clever

frank sail Jul 8, 2023, 1:30 AM

#

and best of all, infinitely bikesheddable

glass sphinx Jul 8, 2023, 1:35 AM

#

distant lodge is one of the drawbacks that you need to store a buffer of all triangles ever in...

no

#

you can do the same tricks as meshshaders do

#

you shade all verts in a meshlet within a workgroup

#

then share the results and create triangles

#

then rasterize

frank sail Jul 8, 2023, 1:35 AM

#

my frogge

glass sphinx Jul 8, 2023, 1:36 AM

#

that can and will beat the hw vertex cache hard

glass sphinx Jul 8, 2023, 1:38 AM

#

distant lodge software rasterization in compute always seemed pretty cumbersome to me because ...

you should probably never attempt to even do any frag shading

#

the only strong usecase with software raster is to write a visbuffer with depth

cedar seal Jul 8, 2023, 3:51 AM

#

frog shading

wicked notch Jul 8, 2023, 11:05 AM

#

I think I cracked the code

#

Perhaps the reason my compute raster was so garbage, was due to unconditionally imageAtomicMax'ing

#

I should've just done the classic

```
for (uint y = min.y; y < max.y; ++y) {
    for (uint x = min.x; x < max.x; ++x) {
        if (is_inside_triangle(x, y, ...)) {
            imageAtomicMax(visbuffer, ...);
        }
    }
}
```
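
A CPU-side C++ sketch of that loop, with the inside test spelled out as three edge functions evaluated at pixel centres. It assumes counter-clockwise triangles in pixel coordinates; `rasterize_bbox` and the `emit` callback are hypothetical and stand in for the conditional `imageAtomicMax`. Edges are treated as inclusive, so there is no top-left fill rule here.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec2 { float x, y; };

// Twice the signed area of (a, b, p); positive when p is left of a -> b.
inline float edge(Vec2 a, Vec2 b, Vec2 p) {
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

// Walk the triangle's bounding box and call emit(x, y) only for pixels whose
// centre is inside the triangle. Returns the number of covered pixels.
template <typename Emit>
int rasterize_bbox(Vec2 v0, Vec2 v1, Vec2 v2, int width, int height, Emit emit) {
    assert(width > 0 && height > 0);
    if (edge(v0, v1, v2) <= 0.0f) {
        return 0;  // back-facing or degenerate
    }
    const int min_x = std::max(0, static_cast<int>(std::floor(std::min({v0.x, v1.x, v2.x}))));
    const int min_y = std::max(0, static_cast<int>(std::floor(std::min({v0.y, v1.y, v2.y}))));
    const int max_x = std::min(width - 1, static_cast<int>(std::ceil(std::max({v0.x, v1.x, v2.x}))));
    const int max_y = std::min(height - 1, static_cast<int>(std::ceil(std::max({v0.y, v1.y, v2.y}))));
    int covered = 0;
    for (int y = min_y; y <= max_y; ++y) {
        for (int x = min_x; x <= max_x; ++x) {
            const Vec2 p{static_cast<float>(x) + 0.5f, static_cast<float>(y) + 0.5f};
            if (edge(v1, v2, p) >= 0.0f && edge(v2, v0, p) >= 0.0f && edge(v0, v1, p) >= 0.0f) {
                emit(x, y);  // the atomic write would go here
                ++covered;
            }
        }
    }
    return covered;
}
```

Skipping the write for pixels outside the triangle is what reduces the number of atomics compared with unconditionally writing every candidate.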

raven orchid Jul 8, 2023, 5:20 PM

#

I’m guessing when you switched to conditional atomic the level of contention went way down?

wicked notch Jul 8, 2023, 5:49 PM

#

I'm still testing right now, results will be in soon™️

wicked notch Jul 8, 2023, 6:21 PM

#

Alright results are in

#

And what sad results these are, occupancy remained the same, after all, rasterizing pixel sized triangles is quite easy

#

As did the time taken for Thai (2.83)

#

I am going to assume something is fundamentally wrong with the way I build this software rasterizer

#

Until I find someone to pester about this, software raster is on hold 😦

distant lodge Jul 8, 2023, 6:23 PM

#

rip, what algorithm do you use to actually rasterize btw

wicked notch Jul 8, 2023, 6:24 PM

#

I tried this one: https://github.com/ssloy/tinyrenderer/wiki/Lesson-2:-Triangle-rasterization-and-back-face-culling

#

Perhaps I should try Unreal's algorithm as well

distant lodge Jul 8, 2023, 6:26 PM

#

which algorithm? scanline? checking if a pixel is in the triangle in an AABB?

wicked notch Jul 8, 2023, 6:27 PM

#

For each pixel in the triangle bounds yes

distant lodge Jul 8, 2023, 6:27 PM

#

#

and you're rendering 1 triangle/thread?

wicked notch Jul 8, 2023, 6:28 PM

#

Yeah it is quite bad, but somehow still manages good occupancy

#

Yes one prim per thread

#

I'll try unreal's approach

distant lodge Jul 8, 2023, 6:28 PM

#

what's unreal's approach?

wicked notch Jul 8, 2023, 6:28 PM

#

They have a hybrid scanline/pixel in AABB method

#

They choose one based on triangle screen footprint

#

https://github.com/EpicGames/UnrealEngine/blob/ue5-early-access/Engine/Shaders/Private/Nanite/Rasterizer.usf#L151

distant lodge Jul 8, 2023, 6:30 PM

#

it 404s

#

what's the secret to read UE code

wicked notch Jul 8, 2023, 6:30 PM

#

Let me find the sacred link once again

#

https://www.unrealengine.com/en-US/ue-on-github

distant lodge Jul 8, 2023, 6:32 PM

#

oh gross I need a UE account and need to connect it

wicked notch Jul 8, 2023, 6:32 PM

#

Yes, very sad

wicked notch Jul 9, 2023, 10:02 AM

#

I was thinking about the way I classify meshlet area

#

Could I compute the perfect area of a cluster in local space at load time and then scale that based on view distance and the transform's scale? thonk

#

Well it doesn't really matter right now, gotta solve sw raster first, I'll give it a few tries more and then move on

wispy spear Jul 9, 2023, 10:17 AM

#

does that not depend on how (as in where) you look at the mesh, which you cant possibly know at load time?

wicked notch Jul 9, 2023, 10:21 AM

#

Yes, at load time you compute a "baseline", the true area of the cluster in local space

#

Then, the idea is to scale that area based on view distance and transform's scale
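
One way that scaling could look, as a hedged C++ sketch. It assumes a uniform transform scale, a cluster small relative to its distance, and a focal length expressed in pixels; none of these are stated in the conversation, and `estimate_screen_area` is a made-up name. Orientation is ignored, so the result is only a rough upper bound: a cluster seen edge-on covers far fewer pixels.

```cpp
#include <cassert>

// Scale a precomputed local-space surface area to an approximate
// screen-space area in pixels squared.
inline float estimate_screen_area(float local_area,
                                  float uniform_scale,
                                  float view_distance,
                                  float focal_length_px) {
    assert(view_distance > 0.0f);
    // Area grows with the square of a uniform scale factor.
    const float world_area = local_area * uniform_scale * uniform_scale;
    // Under perspective projection a length at distance d covers
    // focal_length_px / d pixels per world unit.
    const float pixels_per_unit = focal_length_px / view_distance;
    return world_area * pixels_per_unit * pixels_per_unit;
}
```

Doubling the view distance divides the estimate by four, and doubling the scale multiplies it by four.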

#

Hmm perhaps this does not work with disconnected clusters (i.e. triangles that are not connected but share the same cluster ID)

distant lodge Jul 9, 2023, 10:24 AM

#

aren't you generally trying to avoid having those though

wicked notch Jul 9, 2023, 10:29 AM

#

Yes but meshoptimizer can't help but make disconnected clusters sometimes

#

I might make my own meshletizer

#

Or hack into meshoptimizer and fix that "bug"

wispy spear Jul 9, 2023, 10:37 AM

#

wicked notch Yes, at load time you compute a "baseline", the true area of the cluster in loca...

sounds like some uv unwrap, which also contains depth information : > (but ignore me, im just blabbering)

wicked notch Jul 9, 2023, 12:55 PM

#

I have reached a conclusion

#

Actually two conclusions

#

Conclusion #1: I was indeed doing fundamentally flawed calculations

#

Conclusion #2:

```
void rasterize(in vec3[3] triangle, in uint64_t payload) {
    const vec4 bounds = make_bounding_box(triangle[0], triangle[1], triangle[2]);
    const uint start_x = uint(bounds.x);
    const uint start_y = uint(bounds.y);
    const uint end_x = uint(bounds.z);
    const uint end_y = uint(bounds.w);
    for (uint x = start_x; x < end_x; ++x) {
        for (uint y = start_y; y < end_y; ++y) {
            const vec3 barycentric = make_barycentric(triangle[0], triangle[1], triangle[2], vec2(x, y));
            if (barycentric.x < 0.0 || barycentric.y < 0.0 || barycentric.z < 0.0) {
                continue;
            }
            const float z = dot(barycentric, vec3(triangle[0].z, triangle[1].z, triangle[2].z));
            imageAtomicMax(u_visbuffer, ivec2(x, y), (uint64_t(floatBitsToUint(z)) << 34) | payload);
        }
    }
}
```

This is pure and utter garbage bleakekw

#

it doesn't respect any rasterization spec ever created
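
The 64-bit value written by `imageAtomicMax` in that function can be mimicked on the CPU. This C++ sketch assumes depth lies in [0, 1] and that a larger depth should win, which matches the use of a max; `pack_visbuffer` and `atomic_max` are hypothetical helpers, and the compare-exchange loop only stands in for the image atomic.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>

constexpr unsigned kPayloadBits = 34;
constexpr std::uint64_t kPayloadMask = (std::uint64_t{1} << kPayloadBits) - 1;

// Equivalent of floatBitsToUint.
inline std::uint32_t float_bits(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

// Depth goes in the high bits so that comparing the packed integers compares
// depth first. Non-negative IEEE-754 floats sort the same way as their bit
// patterns, and for values below 2.0 bits 30 and 31 are zero, so shifting a
// 32-bit pattern left by 34 loses nothing.
inline std::uint64_t pack_visbuffer(float depth, std::uint64_t payload) {
    assert(depth >= 0.0f && depth < 2.0f);
    assert(payload <= kPayloadMask);
    return (std::uint64_t{float_bits(depth)} << kPayloadBits) | payload;
}

// Compare-and-swap loop that keeps the larger of the stored and new values.
inline void atomic_max(std::atomic<std::uint64_t>& texel, std::uint64_t value) {
    std::uint64_t current = texel.load(std::memory_order_relaxed);
    while (current < value &&
           !texel.compare_exchange_weak(current, value, std::memory_order_relaxed)) {
        // `current` is refreshed by compare_exchange_weak on failure.
    }
}
```

After writing depths 0.25, 0.75 and 0.5 to the same texel, the payload that survives is the one written with 0.75.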

wicked notch Jul 9, 2023, 1:31 PM

#

Am I overthinking this? What the hell is NaniteViewAndInvViewSize and NaniteViewRect

#

https://github.com/EpicGames/UnrealEngine/blob/ue5-early-access/Engine/Shaders/Private/Nanite/Rasterizer.usf#L317C1-L318C111

wicked notch Jul 9, 2023, 4:01 PM

#

I am so sad

#

Unreal's rasterizer literal copy does nothing to help

glass sphinx Jul 9, 2023, 4:02 PM

#

wicked notch I am so sad

what gpu?

wicked notch Jul 9, 2023, 4:02 PM

#

RTX 3070

glass sphinx Jul 9, 2023, 4:02 PM

#

strange

wicked notch Jul 9, 2023, 4:07 PM

#

Perhaps their clusterizer is that much better than Meshoptimizer?

wicked notch Jul 9, 2023, 9:21 PM

#

HLSL peeps, what does select do exactly

#

It's scary how little HLSL is documented

wispy spear Jul 9, 2023, 9:32 PM

#

hmm never seen select in hlsl before, only know it from socket nonsense

wicked notch Jul 9, 2023, 9:33 PM

#

It's old behaviour for ?: apparently

#

Thing is, what ?: does isn't documented either for vector types KEKW

wispy spear Jul 9, 2023, 9:34 PM

#

: (

#

sad times

minor root Jul 9, 2023, 10:07 PM

#

Guessing purely from naming select is any(…)

#

Then if it’s different from ?: then that should be all

frank sail Jul 9, 2023, 10:10 PM

#

select sounds more like glsl's step
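
As far as I know, HLSL 2021 made `?:` short-circuit on scalars only and added `select(cond, a, b)` for the old vector behaviour: a component-wise pick, closer to GLSL's `mix(b, a, cond)` with a boolean vector than to `any` or `step`. A C++ sketch of that semantics, with `select_components` as a made-up name:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Per component: take if_true[i] where cond[i] is set, otherwise if_false[i].
template <typename T, std::size_t N>
std::array<T, N> select_components(const std::array<bool, N>& cond,
                                   const std::array<T, N>& if_true,
                                   const std::array<T, N>& if_false) {
    std::array<T, N> out{};
    for (std::size_t i = 0; i < N; ++i) {
        out[i] = cond[i] ? if_true[i] : if_false[i];
    }
    return out;
}
```

Both value operands are evaluated in full; only the choice happens per component.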

wicked notch Jul 9, 2023, 10:11 PM

#

Regardless, something is, once again, fundamentally wrong

#

Mesh shaders are nice but they lack flexibility in choosing a meshlet's size

#

On NV it's either 64/126 or death

wispy spear Jul 9, 2023, 10:13 PM

#

write about it, perhaps it tickles some $GPUVENDOR engineer's interest

wicked notch Jul 9, 2023, 10:13 PM

#

I don't think they will change their schtuff because I can't make a compute rasterizer efficient bleakekw

wispy spear Jul 9, 2023, 10:13 PM

#

heh

wicked notch Jul 9, 2023, 10:15 PM

#

There's nothing wrong with 64/126 per se, it's the workgroup size mismatch that kills me

#

And NV likes 128 a lot more (for compute)

#

Task/Mesh is 32 only

#

But regardless, occupancy is fine, I'm always and forever limited by VRAM

frank sail Jul 9, 2023, 10:17 PM

#

too bad you're in uncharted territory with this stuff

wispy spear Jul 9, 2023, 10:17 PM

#

you could ask peeps to run your stuff on different hardware, if that helps

#

to collect some data

wicked notch Jul 9, 2023, 10:18 PM

#

AMD hardware likes completely different things from NVIDIA's bleakekw

frank sail Jul 9, 2023, 10:18 PM

#

package telemetry with your app to collect extra data

wicked notch Jul 9, 2023, 10:18 PM

#

NVIDIA likes a WG of 32 for task/mesh, AMD likes one vertex/primitive per invocation

frank sail Jul 9, 2023, 10:19 PM

#

wicked notch AMD hardware likes completely different things from NVIDIA's <:bleakekw:10825983...

give it a little ifdef, as a treat

wicked notch Jul 9, 2023, 10:19 PM

#

Yeah I might

#

I don't have AMD hardware though bleakekw

#

Hey uh, Jaker

wispy spear Jul 9, 2023, 10:19 PM

#

or compile 2 binaries

wicked notch Jul 9, 2023, 10:19 PM

#

Could you send a little treat

minor root Jul 9, 2023, 10:19 PM

#

bleakekw

frank sail Jul 9, 2023, 10:19 PM

#

I send you my regards

wicked notch Jul 9, 2023, 10:19 PM

#

A 7900xtx will suffice

minor root Jul 9, 2023, 10:19 PM

#

A small offering for salvation

#

Inshallah lvstri will revolutionise rasterization

wicked notch Jul 9, 2023, 10:20 PM

#

I am merely copying Unreal

#

bleakekw

minor root Jul 9, 2023, 10:20 PM

#

Revolution!

wispy spear Jul 9, 2023, 10:21 PM

#

people who wrote those shaders for unreal are probably in the copyright notice or commit log

#

mayhaps reach out

wicked notch Jul 9, 2023, 10:21 PM

#

I could

#

worst they could do is send Hitmen to terminate me due to copyright violation

wispy spear Jul 9, 2023, 10:21 PM

#

or have you hired

wicked notch Jul 9, 2023, 10:31 PM

#

Jaker

#

you are at AMD

#

explain what primitive shaders are and how they are different from mesh shaders

#

it's your patent after all

frank sail Jul 9, 2023, 10:33 PM

#

what have you googled

wicked notch Jul 9, 2023, 10:33 PM

#

```
"The vast majority of triangles are software rasterised using hyper-optimised compute shaders specifically designed for the advantages we can exploit," explains Brian Karis. "As a result, we've been able to leave hardware rasterisers in the dust at this specific task. Software rasterisation is a core component of Nanite that allows it to achieve what it does. We can't beat hardware rasterisers in all cases though so we'll use hardware when we've determined it's the faster path. On PlayStation 5 we use primitive shaders for that path which is considerably faster than using the old pipeline we had before with vertex shaders."
```
from: <https://www.eurogamer.net/digitalfoundry-2020-unreal-engine-5-playstation-5-tech-demo-analysis>

#

And
https://www.resetera.com/threads/primitive-shader-amds-patent-deep-dive.186831/

frank sail Jul 9, 2023, 10:34 PM

#

here is moar info
https://timur.hu/blog/2022/what-is-ngg

Timur’s blog

What is NGG and shader culling on AMD RDNA GPUs?

NGG (Next Generation Geometry) is the technology that is responsible for any vertex and geometry processing in AMD RDNA GPUs. I decided to do a write-up about my experience implementing it in RADV, which is the Vulkan driver used by many Linux systems, including the Steam Deck. I will also talk about shader culling on RDNA GPUs.

wicked notch Jul 9, 2023, 10:34 PM

#

epic

frank sail Jul 9, 2023, 10:36 PM

#

wicked notch ``` "The vast majority of triangles are software rasterised using hyper-optimise...

not sure how close you can get to handwritten primitive shaders, but I guess using mesh shaders and following best practices will probably get you there

wicked notch Jul 9, 2023, 10:37 PM

#

I wanted to cope with: "Maybe my mesh shader is so good it doesn't need a software rasterizer"

#

Or something cringe like that bleakekw

#

But it turns out the PS5 does something different

#

Damn you AMD

frank sail Jul 9, 2023, 10:38 PM

#

forget about ps5

#

nanite runs on AMD PC GPUs too

wicked notch Jul 9, 2023, 10:39 PM

#

Yes but they use vertex shaders

#

That means their soft rast path is faster*
*in certain cases

frank sail Jul 9, 2023, 10:39 PM

#

well I guess get a ps5 devkit then

#

or get into driver dev

wicked notch Jul 9, 2023, 10:40 PM

#

I might need to acquire some AMD hw

#

But that would be one hell of a detour bleakekw

frank sail Jul 9, 2023, 10:40 PM

#

de2our

#

no way
https://youtu.be/vYqlbzrtI9Y

YouTube

High-Performance Graphics


Multum In Parvo: Level of Detail and Approximation Models at the Graphics Nexus
Tamy Boubekeur, Adobe Research
Keynote - HPG 2023 - Day 3


wicked notch Jul 9, 2023, 11:00 PM

#

Noice I skimmed through looks great

#

Bookmarked

#

Jaker how hard is driver dev

frank sail Jul 9, 2023, 11:01 PM

#

idk, I don't do it

wicked notch Jul 9, 2023, 11:01 PM

#

who does it

frank sail Jul 9, 2023, 11:02 PM

#

@ pixelduck @ nanokatze @ mohamexiety @ martty @ pac85

wicked notch Jul 9, 2023, 11:03 PM

#

Is driver dev on NV impossible?

frank sail Jul 9, 2023, 11:03 PM

#

noyes

wicked notch Jul 9, 2023, 11:03 PM

#

They don't share anything and the only open source driver sucks (I heard at least)

#

Hmm a 6950xt costs 600 robux

frank sail Jul 9, 2023, 11:04 PM

#

wicked notch They don't share anything and the only open source driver sucks (I heard at leas...

best I can do is refer you to your nearest nvidia representative bleakekw

wicked notch Jul 9, 2023, 11:04 PM

#

nearest? you mean in my walls

frank sail Jul 9, 2023, 11:05 PM

#

we're roommates (in your walls)

wispy spear Jul 9, 2023, 11:05 PM

#

toomuchvoltage is at nvidia iirc 😉

frank sail Jul 9, 2023, 11:05 PM

#

working on drivers though?

wispy spear Jul 9, 2023, 11:05 PM

#

not sure

frank sail Jul 9, 2023, 11:05 PM

#

or devtech stuff mayhap

wicked notch Jul 9, 2023, 11:07 PM

#

Btw

#

I went back to our old friend GL

#

And the vertex path here matches compute raster and mesh shaders on Vulkan as well

#

I have never seen 3 ridiculously different techniques agree on performance so much

#

What the hell

#

I want to profile unreal

frank sail Jul 9, 2023, 11:13 PM

#

unreal is instrumented

wicked notch Jul 9, 2023, 11:13 PM

#

It is time to compile unreal from source, wish me luck

frank sail Jul 9, 2023, 11:13 PM

#

with gpu frame marquers

wicked notch Jul 9, 2023, 11:13 PM

#

frank sail unreal is instrumented

Yes actually

frank sail Jul 9, 2023, 11:13 PM

#

I'm also stating that as a fact

wicked notch Jul 9, 2023, 11:13 PM

#

I should first see if they list sw/hw timings

frank sail Jul 9, 2023, 11:13 PM

#

because I have to profile unreal every day at work bleakekw

wicked notch Jul 9, 2023, 11:13 PM

#

epic

#

do you know how to get nanite sw/hw timings

#

Sparing me from the tedious documentation crunching bleakekw

frank sail Jul 9, 2023, 11:15 PM

#

uh you put D3D12.EmitRgpFrameMarkers=1 in DefaultEngine.ini and then profile it with RGP :^)

#

https://gpuopen.com/unreal-engine-performance-guide/

wicked notch Jul 9, 2023, 11:15 PM

#

"go buy AMD hardware scrub"

frank sail Jul 9, 2023, 11:16 PM

#

ue has frame markers for other thingies

#

just gotta figure out how to enable them

wicked notch Jul 9, 2023, 11:16 PM

#

How do you profile then

frank sail Jul 9, 2023, 11:16 PM

#

wdym

wicked notch Jul 9, 2023, 11:16 PM

#

Connect from RGP to Unreal somehow?

frank sail Jul 9, 2023, 11:17 PM

#

you just hook up your favorite gpu profiler to the game you're profiling

wicked notch Jul 9, 2023, 11:17 PM

#

Ah, do you have to export the game

frank sail Jul 9, 2023, 11:18 PM

#

for rgp, you just need to have rdp (the program that hooks into vulkan, dx12, and opencl apps) running before you launch the app

frank sail Jul 9, 2023, 11:18 PM

#

wicked notch Ah, do you have to export the game

you can connect to the engine or a standalone build of le app in question

wicked notch Jul 9, 2023, 11:18 PM

#

Oh that's great then

#

I don't want to wait 4 months for Unreal to package my stupid app with one mesh in it bleakekw

#

Last time I tried packaging my test thingy it took 1 hour

#

It had literally no actors beside a static mesh

#

Perhaps I did sumthing wrong

frank sail Jul 9, 2023, 11:20 PM

#

probably

wicked notch Jul 10, 2023, 12:17 PM

#

I have done a lot of investigations

#

Rendering all kinds of scenes, bistro subdivided into oblivion as well bleakekw

#

I subdivided everything into at least 100 million triangles

#

And I am starting to see some gains. It appears that THE WHOLE viewport has to be covered in pixel-sized triangles for the HW raster to be much slower than the SW raster

#

Perhaps Nanite really is only needed for stupid amounts of triangles

#

i.e: 1 billion+

glass sphinx Jul 10, 2023, 12:33 PM

#

nanite only renders around 100 mil at 4k i believe

#

sw raster can also be great to allow for larger view distances

#

or later lod switching

wicked notch Jul 10, 2023, 12:36 PM

#

Yeah, also overdraw I guess is much better with SW for larger draw distances

wicked notch Jul 10, 2023, 2:00 PM

#

sadness

#

Why does Windows freak out when I go over the physical memory limit, it's using far more than 48GB bleakekw

glass sphinx Jul 10, 2023, 2:28 PM

#

blender is such a slow boy sometimes

wicked notch Jul 10, 2023, 5:03 PM

#

I just re-read unreal's slides for the, uh

#

7th time KEKW

#

And they say "we software rasterize all clusters where at least one triangle is more than 32 pixels wide"

frank sail Jul 10, 2023, 5:04 PM

#

huh

cedar seal Jul 10, 2023, 5:04 PM

#

At least?

wicked notch Jul 10, 2023, 5:04 PM

#

At most*

#

No I mean

cedar seal Jul 10, 2023, 5:05 PM

#

More?

wicked notch Jul 10, 2023, 5:05 PM

#

I can't fucking write bleakekw

cedar seal Jul 10, 2023, 5:05 PM

#

No more?

wicked notch Jul 10, 2023, 5:05 PM

#

If all triangles are less than 32 pixels wide they software rasterize

#

That makes sense

cedar seal Jul 10, 2023, 5:05 PM

#

That would make sense yes

#

And hw rasterization rejects exactly same triangles?

wicked notch Jul 10, 2023, 5:07 PM

#

Now, does it make sense to have a compute shader dispatched with meshlet_count workgroups of size (MAX_PRIMITIVES, 1, 1) that can check for that

#

It should also cull now that I think about it

wicked notch Jul 10, 2023, 5:08 PM

#

cedar seal And hw rasterization rejects exactly same triangles?

HW raster gets the clusters that won't be SW rasterized yes
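
The rule as stated in the chat (software rasterize a cluster only if every triangle in it is under 32 pixels wide, otherwise hand it to the hardware rasterizer) can be sketched on the CPU as below. This is a minimal illustration, not Unreal's implementation: `Vec2`, `Triangle`, `RasterPath`, `max_extent_px` and `classify_cluster` are hypothetical names, and "wide" is assumed here to mean the longer side of the triangle's screen-space bounding box.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <vector>

struct Vec2 { float x, y; };            // screen-space position in pixels
using Triangle = std::array<Vec2, 3>;

enum class RasterPath { Software, Hardware };

// Longer side of the triangle's screen-space bounding box, in pixels.
// ASSUMPTION: this is what "N pixels wide" means; the chat does not define it.
inline float max_extent_px(const Triangle& t) {
    const float min_x = std::min({t[0].x, t[1].x, t[2].x});
    const float max_x = std::max({t[0].x, t[1].x, t[2].x});
    const float min_y = std::min({t[0].y, t[1].y, t[2].y});
    const float max_y = std::max({t[0].y, t[1].y, t[2].y});
    return std::max(max_x - min_x, max_y - min_y);
}

// A cluster goes to the software rasterizer only if ALL of its triangles are
// below the threshold; a single large triangle sends it to the hardware path.
inline RasterPath classify_cluster(const std::vector<Triangle>& tris,
                                   float threshold_px = 32.0f) {
    assert(!tris.empty());
    for (const Triangle& t : tris) {
        if (max_extent_px(t) >= threshold_px) {
            return RasterPath::Hardware;
        }
    }
    return RasterPath::Software;
}
```

In a compute shader the per-triangle test would typically be spread over the invocations of a workgroup and reduced, but the decision per cluster is the same.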

cedar seal Jul 10, 2023, 5:08 PM

#

Workgroup sizing has always been sort of a mystery to me

#

Only good way I know is to use microbenchmark

wicked notch Jul 10, 2023, 5:28 PM

#

I've googled without results, where do I find that tool

cedar seal Jul 10, 2023, 5:37 PM

#

I mean you write your own microbenchmarks that tell performance of different group sizes.

#

Brute force search essentially

#

Results can be and often are gpu specific, so the benchmarks need to be run on each installation.
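
The brute force search described above could look roughly like this on the host side. It is only a sketch: `pick_group_size` and `BenchResult` are made-up names, and `measure_ms` is a hypothetical callback that is assumed to run the dispatch with the given workgroup size and return its GPU time in milliseconds (for example from timestamp queries).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <limits>
#include <vector>

struct BenchResult {
    std::uint32_t group_size;  // fastest candidate found
    double best_ms;            // its best measured time
};

// Try every candidate workgroup size and keep the fastest one.
// measure_ms is supplied by the caller (hypothetical: it would rebuild or
// specialize the pipeline for `size`, dispatch, and read back the GPU time).
inline BenchResult pick_group_size(
        const std::vector<std::uint32_t>& candidates,
        const std::function<double(std::uint32_t)>& measure_ms,
        int repeats = 5) {
    assert(!candidates.empty() && repeats > 0);
    BenchResult best{0, std::numeric_limits<double>::infinity()};
    for (std::uint32_t size : candidates) {
        // Take the minimum of several runs to filter out scheduling noise.
        double fastest = std::numeric_limits<double>::infinity();
        for (int i = 0; i < repeats; ++i) {
            fastest = std::min(fastest, measure_ms(size));
        }
        if (fastest < best.best_ms) {
            best = {size, fastest};
        }
    }
    return best;
}
```

Since results are often GPU specific, the chosen size would be cached per device rather than hardcoded.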

wicked notch Jul 10, 2023, 6:36 PM

#

```
shared meshlet_glsl_t s_meshlets[MESHLETS_PER_WORKGROUP];
shared mat4 s_pvm[MESHLETS_PER_WORKGROUP];
shared vec3 s_vertices[MAX_VERTICES * MESHLETS_PER_WORKGROUP];
shared uint s_primitive_size[MESHLETS_PER_WORKGROUP];
```

Do you guys think this is too much shared data? bleakekw

raven orchid Jul 10, 2023, 6:45 PM

#

How big are the constants

wicked notch Jul 10, 2023, 6:46 PM

#

MESHLETS_PER_WORKGROUP is 4, MAX_VERTICES = MAX_PRIMITIVES = 64
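
With those constants the footprint of the four shared arrays can be estimated with a bit of arithmetic. The sketch below is a rough budget, not a measurement: the layout of `meshlet_glsl_t` is not shown in the chat, so 16 bytes is an assumption, and the per-element size of a `vec3` in shared memory depends on the compiler (12 bytes if packed tightly, 16 if padded). All function and constant names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Constants quoted in the chat.
constexpr std::size_t kMeshletsPerWorkgroup = 4;
constexpr std::size_t kMaxVertices = 64;

// ASSUMPTION: meshlet_glsl_t is four 32-bit fields; the real struct is not shown.
constexpr std::size_t kAssumedMeshletBytes = 16;
constexpr std::size_t kMat4Bytes = 16 * sizeof(float);      // 64 bytes
constexpr std::size_t kUintBytes = sizeof(std::uint32_t);   // 4 bytes

// Vulkan requires maxComputeSharedMemorySize to be at least 16384 bytes.
constexpr std::size_t kVulkanMinSharedLimit = 16384;

// Total bytes used by the four shared arrays for a given vec3 element stride.
constexpr std::size_t shared_bytes(std::size_t vec3_stride) {
    return kMeshletsPerWorkgroup * kAssumedMeshletBytes          // s_meshlets
         + kMeshletsPerWorkgroup * kMat4Bytes                    // s_pvm
         + kMaxVertices * kMeshletsPerWorkgroup * vec3_stride    // s_vertices
         + kMeshletsPerWorkgroup * kUintBytes;                   // s_primitive_size
}

inline bool fits_in_min_limit(std::size_t vec3_stride) {
    assert(vec3_stride == 12 || vec3_stride == 16);
    return shared_bytes(vec3_stride) <= kVulkanMinSharedLimit;
}
```

Under these assumptions the total is 3408 bytes with packed `vec3` and 4432 bytes with padded `vec3`, well under the guaranteed minimum limit. Whether it hurts occupancy on a particular GPU is a separate question that only profiling answers.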

wicked notch Jul 10, 2023, 7:08 PM

#

Eh I expected nothing and of course this classifier is bad bleakekw

#

With this new classifier I have perfect accuracy

#

Except it takes about the same time it takes for me to rasterize the entire model bleakekw

#

How the hell does Unreal do this

#

damnit

glass sphinx Jul 10, 2023, 7:23 PM

#

this is probably an area that needs a ton of testing on big maps used in production

wicked notch Jul 10, 2023, 7:50 PM

#

occupancy is great though KEKW

#

note: the "cull" part is a LIE, there is no culling right now

glass sphinx Jul 10, 2023, 7:51 PM

#

MASSIVE

wispy spear Jul 10, 2023, 7:53 PM

#

so what is the actual problem right now?

#

you can only render 10mio tris compared to UE's 100mio?

wicked notch Jul 10, 2023, 7:55 PM

#

Only one problem

#

I can't efficiently classify clusters

wispy spear Jul 10, 2023, 7:55 PM

#

compared to what

wicked notch Jul 10, 2023, 7:55 PM

#

To right now

wispy spear Jul 10, 2023, 7:55 PM

#

but in what does it manifest itself

wicked notch Jul 10, 2023, 7:55 PM

#

1.5ms to classify clusters is terrible

wispy spear Jul 10, 2023, 7:55 PM

#

ah

#

how long does UE take?

wicked notch Jul 10, 2023, 7:56 PM

#

I have no idea, but considering the whole nanite pass takes less than 2 milliseconds..

wispy spear Jul 10, 2023, 7:56 PM

#

its broken down into clusters already neh?

wicked notch Jul 10, 2023, 7:56 PM

#

Yes, the classify shader is as efficient as I could make it

wispy spear Jul 10, 2023, 7:56 PM

#

ah

#

ship it

#

provide debug/release binaries

#

let other frogs try it out

wicked notch Jul 10, 2023, 7:57 PM

#

https://hastebin.com/share/uqawimogom.glsl

wispy spear Jul 10, 2023, 7:57 PM

#

perhaps driver/hardware combos fuck with the results

wicked notch Jul 10, 2023, 7:57 PM

#

This is the shader if anyone wants to take a look

wicked notch Jul 10, 2023, 7:57 PM

#

wispy spear ship it

I will test on other hardware just to see yes

frank sail Jul 10, 2023, 7:57 PM

#

have you profiled UE with nsight yet

wispy spear Jul 10, 2023, 7:58 PM

#

or power states

wicked notch Jul 10, 2023, 7:58 PM

#

frank sail have you profiled UE with nsight yet

I'll do it right now

#

Opening UE5 at the speed of light

frank sail Jul 10, 2023, 7:58 PM

#

wispy spear Jul 10, 2023, 7:58 PM

#

ram really got cheap af btw

#

i was wondering if i should go from 64 to 128 too 🙂

raven orchid Jul 10, 2023, 8:02 PM

#

wicked notch <https://hastebin.com/share/uqawimogom.glsl>

This shader is the one taking 1.5?

wicked notch Jul 10, 2023, 8:03 PM

#

yes

wispy spear Jul 10, 2023, 8:03 PM

#

uqawimogom 🙂

#

explains a lot heh

raven orchid Jul 10, 2023, 8:06 PM

#

I wonder if the atomic ops are eating most of its time

frank sail Jul 10, 2023, 8:07 PM

#

inb4 bank conflicts

#

not sure where that'd appear in the profiler. maybe under VRAM

wicked notch Jul 10, 2023, 8:07 PM

#

raven orchid I wonder if the atomic ops are eating most of its time

It's VRAM bound to hell and back

#

I'd say vertex fetch and transform are eating up my precious milliseconds

#

Unfortunately I have no idea what LGSB is KEKW

frank sail Jul 10, 2023, 8:08 PM

#

is there something you can do to simulate fetching fewer vertices/less vertex data to see if that helps perf

wicked notch Jul 10, 2023, 8:09 PM

#

Yes, fewer verts does help

frank sail Jul 10, 2023, 8:09 PM

#

what about atomics

wicked notch Jul 10, 2023, 8:09 PM

#

64 verts / 64 prims meshlets are the best

#

64 / 126 recommended by nvidia is kinda trash

frank sail Jul 10, 2023, 8:09 PM

#

try replacing atomics with regular ops and see if perf changes (ignoring the brokenness)

wispy spear Jul 10, 2023, 8:09 PM

#

LGSB = LarGe String Buffer

wicked notch Jul 10, 2023, 8:13 PM

#

removing the atomics quite literally changed nothing bleakekw

#

Damn alright

#

I guess AMD needs to invent infinity cache except for memory bandwidth

#

Infinite Memory Bandwidth (AMD patent pending)

frank sail Jul 10, 2023, 8:14 PM

#

How compact are your vertices

#

Maybe you can shave some bytes off

wicked notch Jul 10, 2023, 8:15 PM

#

I can

#

I have purposely left quantization for later™️

#

But I guess it's my only shot at better performance
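
One common way to shave bytes off vertex positions is to store each component as a 16-bit normalized integer relative to a bounding box, which halves a 12-byte float position to 6 bytes. The sketch below shows that idea only; it is not the scheme used in this project, and `QuantizedPos`, `quantize` and `dequantize` are hypothetical names. It assumes a non-degenerate box (`hi > lo` on every axis).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

struct Vec3 { float x, y, z; };                  // 12 bytes
struct QuantizedPos { std::uint16_t x, y, z; };  // 6 bytes

// Map v from [lo, hi] to a 16-bit unsigned normalized integer.
inline std::uint16_t quantize_unorm16(float v, float lo, float hi) {
    assert(hi > lo);  // a flat axis would need special handling
    const float t = std::min(std::max((v - lo) / (hi - lo), 0.0f), 1.0f);
    return static_cast<std::uint16_t>(std::lround(t * 65535.0f));
}

// Inverse mapping; this is the part a shader would run after fetching.
inline float dequantize_unorm16(std::uint16_t q, float lo, float hi) {
    return lo + (static_cast<float>(q) / 65535.0f) * (hi - lo);
}

// Quantize a position against a bounding box (per mesh or per meshlet).
inline QuantizedPos quantize(const Vec3& p, const Vec3& bmin, const Vec3& bmax) {
    return {quantize_unorm16(p.x, bmin.x, bmax.x),
            quantize_unorm16(p.y, bmin.y, bmax.y),
            quantize_unorm16(p.z, bmin.z, bmax.z)};
}

inline Vec3 dequantize(const QuantizedPos& q, const Vec3& bmin, const Vec3& bmax) {
    return {dequantize_unorm16(q.x, bmin.x, bmax.x),
            dequantize_unorm16(q.y, bmin.y, bmax.y),
            dequantize_unorm16(q.z, bmin.z, bmax.z)};
}
```

The maximum error per axis is about half a quantization step, i.e. the box extent divided by 131070, so a tighter box (per meshlet rather than per mesh) gives better precision for the same 6 bytes.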

wispy spear Jul 10, 2023, 8:17 PM

#

out of curiosity i tried to open various nsight docs and tried putting LGSB into their searchboxes, 0 hits

#

which is quite weird

wicked notch Jul 10, 2023, 8:25 PM

#

It's here: https://docs.nvidia.com/nsight-graphics/UserGuide/

#

C+F: Long Scoreboard

#

It's basically "wait for this memory load to complete"

wispy spear Jul 10, 2023, 8:27 PM

#

ah

#

appendix wouldvebeenve nice

wicked notch Jul 10, 2023, 8:28 PM

#

It's a pain in the arse to search for docs for NV

#

Yes

wispy spear Jul 10, 2023, 8:28 PM

#

can you press F1 in that LGSB column/row within nsight

#

or is there a ? button in the system menu like good old win9x windows had

wicked notch Jul 10, 2023, 8:29 PM

#

I hovered over LGSB and it said "Long Scoreboard" yes

wispy spear Jul 10, 2023, 8:29 PM

#

ah lol

#

showing myself out

wicked notch Jul 10, 2023, 8:30 PM

#

Nah I didn't notice myself at first bleakekw

#

Docs should be easily available

wicked notch Jul 10, 2023, 9:35 PM

#

when the classification stage takes more time than the rasterizer

#

I love absolutely nonsensical results

wispy spear Jul 10, 2023, 9:40 PM

#

did you talk to devsh yet?

wicked notch Jul 10, 2023, 9:40 PM

#

He's not active recently so I couldn't catch him

wispy spear Jul 10, 2023, 9:40 PM

#

dont be afraid to boop him, im sure he has a few bits and bops to say about this

wicked notch Jul 10, 2023, 9:59 PM

#

By the way something is 100% amiss with meshlets

#

There is no way in hell these 3 meshlets (remember, they have 64 triangles inside them) have a triangle MORE than 32 pixels wide

#

Red is hardware, Blue is software btw

#

Unless they are part of the same meshlet, which again makes no sense because meshlets should be contiguous

frank sail Jul 10, 2023, 10:04 PM

#

inb4 bad perf is due to a bug causing you to render 64x more stuff

wicked notch Jul 10, 2023, 10:07 PM

#

imma open an issue with meshoptimizer

#

Nevermind

#

zeux doesn't agree with me

#

https://github.com/zeux/meshoptimizer/issues/531

#

mfw
I’m not sure I agree with this being a mistake. For mesh shading implementation on desktop my assessment has been that under filling meshlets results in more efficiency loss than the extra culling is worth.

#

At least I have peace of mind anything I've done so far wasn't wrong

wicked notch Jul 10, 2023, 10:49 PM

#

Aight I guess I'll make my own custom meshopt based on the actual meshopt

#

It's funny, anything made by zeux I end up making my own fork
gltfpack => I have my own
meshopt => soon™️

#

I guess our opinions are very different bleakekw

frank sail Jul 10, 2023, 10:50 PM

#

lvstri trying to render 1 (one) mesh

wicked notch Jul 10, 2023, 10:51 PM

#

frank sail lvstri trying to render 1 (one) mesh

Honestly lol

#

Anything I do does not conform to any generally agreed upon standard, I have different requirements for everything bleakekw

#

Feels like I'm reinventing the universe

#

I'll follow deccus suggestion and take a break

#

Jaker you now have an employee

#

I expect paychecks

frank sail Jul 10, 2023, 10:53 PM

#

you could also implement a different part of the renderer if you don't want to break too hard

#

like shading

#

but yeah don't burn yourself doing too much schtuff

wicked notch Jul 10, 2023, 10:54 PM

#

perhaps I'll port what I had back in opengl yes

#

Shadow maps took 4ms to render with culling back then

#

I wonder how much that improves with this ridiculously optimized pipeline

frank sail Jul 10, 2023, 10:55 PM

#

5ms bleakekw

#

will be interesting to see though froge

wispy spear Jul 10, 2023, 10:57 PM

#

and speak to devsh

#

since potrick is rather useless 😄

#

and or write about your findings and problems

#

publish it somehow

wicked notch Jul 10, 2023, 10:57 PM

#

potrick has helped me a lot with instancing, he's good

wispy spear Jul 10, 2023, 10:58 PM

#

(i know, i was just kidding)

wicked notch Jul 10, 2023, 10:58 PM

#

Also I think he just wants me to join his cult bleakekw

distant lodge Jul 10, 2023, 10:58 PM

#

devsh has done full on meshlet stuff? I know he mentioned doing a visbuffer

wispy spear Jul 10, 2023, 10:58 PM

#

yeah the so called brainworm

wicked notch Jul 10, 2023, 10:58 PM

#

epic, I'll pong him tomorrow I guess

wispy spear Jul 10, 2023, 10:59 PM

#

doesnt hurt
