#Foundation - My adventure through Graphics Programming


supple cliff
#

we are working on async in taskgraph atm

merry laurel
#

I use task graph for really long time

supple cliff
#

nice

merry laurel
supple cliff
#

we already added the "concurrent" use enums for it

#

they also work already if you want to overlap tasks that use the same resource

merry laurel
#

nice. I have sw raster so compute async for it would be nice 😈

#

finally

#

my texture streaming is now instant

#

with 32 threads it should be XD

supple cliff
#

nice

#

i think i m getting the 9800x3d

#

or the 16 core variant if it has dual x3d

merry laurel
#

yeah

#

nemez said it wont

supple cliff
#

im choping

merry laurel
#

I went for 7950x because it had nice discount like most things that I got

#

9800x3d looking really nice but I want 32 threads with dual cache 😭

#

this upgrade is insane 7700HQ -> 7950X and 1050M -> 4080 super

paper ore
#

then you should have gone for some epyc

spring sandal
#

Btw how many total additional shit did you buy before it worked?

merry laurel
#

....

paper ore
#

it felt like everything 3 times

#

including stickers for the tower

spring sandal
#

Its a serious question, deccersmart

merry laurel
#

1x 7950X, 2x motherboards, 4x ram kits

#

fucking a lot of money

paper ore
#

oh

#

so the 2nd mobo worked already

#

was not dead cpu then

summer sigil
#

I'm finally upgrading from a 2600 to a 5800x, if I can get a used 5800x cheap off ebay

merry laurel
#

nice

merry laurel
paper ore
#

i got my 5950x off ebay too, with mobo cooler and 32gb ram, for 320 european pesos, but i got a 64gb kit new

paper ore
merry laurel
#

yeah

#

so in total 2x 7950x, 3x motherboard and 5x ram kits

#

summer job money coming in clutch

#

also my microphone is waiting for me

spring sandal
merry laurel
#

php dev

#

not really. I was doing a lot of plumbing for their mono repo

#

docker, configs, databases and php + symfony bullshit

#

Also I learned to use mysql to code gen sql for me KEKW

spring sandal
#

How did you get the job? Looking at local job market and its just destroyed(

merry laurel
#

first job I got through a friend who is a designer; he worked for a company and they needed a programmer. I was hired on the spot basically. second job I got because my high school has mandatory internships in 3rd and 4th grade. I got there through a friend; they had never done internships before. After 14 days he sat me down and asked a bunch of questions, that's where he learnt that I am a high schooler, and I got a job there

#

They are expecting me after I graduate HS

#

before that I worked from my 15 years bunch of other jobs

#

I am lucky bastard

#

first job my github helped a bunch

spring sandal
merry laurel
#

I mean its not just a luck but a lot of work also

#

In my high school there is a day of companies (rough translation) where like 60 of them come to our school and try to recruit us

spring sandal
#

My college doesnt offer internships) Hopefully uni will offer something

merry laurel
#

same here. We look for them

#

They say you have to have internship and dont offer some. So you gotta look on internet

spring sandal
#

So if you dont have one they just kick you out?

merry laurel
#

nah

#

but they will find you one

merry laurel
#

I am running bistro + sponza with 1.2k fps

#

threat interactive in shambles

merry laurel
#

got even clang to compile my project froge_love

merry laurel
#

mesh shader time

supple cliff
#

king time

merry laurel
#

and then nanite LODs rendering bleaker_kekw

#

I gotta figure that one

paper ore
#

somebody whined about mesh shaders being slower than ordinary shit the other day

merry laurel
#

where?

supple cliff
merry laurel
#

yeah

#

for sure

#

If I remember correctly you moved shit from task to compute?

#

I want to try implement my own work graphs

#

vk spec my beloved

paper ore
#

my b

supple cliff
#

i can switch

#

i am trying to assess work expansion perf still

merry laurel
#

👀

supple cliff
#

for task shaders poo2 wins

#

but its very close in compute culling

merry laurel
#

oh boy thats a lot of bikeshed

supple cliff
#

yea it is

merry laurel
#

people in academia should be banned from writing code

#

because oh god that shit is unreadable

#

I found some DGC code. bit disappointed since you cant really execute commands in the same way as work graphs. its more or less an indirect dispatch/draw call but with more flexibility

merry laurel
#

VK_INDIRECT_COMMANDS_TOKEN_TYPE_DISPATCH_EXT 😈

#

what the actual fuck

#

They removed support in newer driver version? KEKW

#

or is this beta

#

anyway

supple cliff
#

afaik this driver is a rollback due to a security bug

#

see the vulkan version decreasing as well

merry laurel
#

Oh completely forgot about that

merry laurel
#

fastest running windows server 2008 R2 bleaker_kekw

paper ore
#

why

merry laurel
#

For school bleaker_kekw

merry laurel
#

Also I am forced to write documentation

#

I dont want to talk how we were forced to use MS DOS

spring sandal
#

why? are you studying archaeology?

merry laurel
#

sadly I am studying HS computer science

supple cliff
#

horse science

merry laurel
#

FOR THE EMPEROR

merry laurel
#

why 80% of time is spent in resolve vis buffer shader bleakekw

#

Found the culprit: stupid atomic ops

#
const f32 mip_map = log2(max(length(uv_grad.ddx * 65536u), length(uv_grad.ddy * 65536u)));
const u32 wave_mask = WaveActiveBitOr(65536u >> u32(max(0, mip_map)));
if(WaveIsFirstLane()) {
    InterlockedOr(push.uses.u_readback_material[mesh.material_index], wave_mask);
}
summer sigil
#

What's with the wave ops?

merry laurel
#

this is the code snippet causing this bad perf

#

from 2.2 ms -> 0.6 ms KEKW

merry laurel
summer sigil
#

What is this pass doing?

merry laurel
#

its resolving vis buffer to color image + writes which texture size is needed for material
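A rough sketch of how the readback side could turn one mask from `u_readback_material` into a texture size. It assumes each bit written by the shader (`65536u >> mip`) equals the resolution that was requested, so the highest set bit is the largest size any pixel asked for. `requested_texture_size`, `highest_set_bit` and the 16 px floor constant are made-up names for illustration, not code from the project.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Assumption: smallest size that always stays resident (16x16 px is mentioned later in the chat).
constexpr std::uint32_t kMinResidentSize = 16;

// Returns the highest set bit of mask, or 0 if no bit is set.
inline std::uint32_t highest_set_bit(std::uint32_t mask) {
    for (std::uint32_t bit = 1u << 31; bit != 0; bit >>= 1) {
        if ((mask & bit) != 0) return bit;
    }
    return 0;
}

// Hypothetical decode of one readback slot: largest requested size,
// clamped between the always-resident floor and the texture's real size.
inline std::uint32_t requested_texture_size(std::uint32_t mask, std::uint32_t full_size) {
    assert(full_size > 0);
    const std::uint32_t floor_size = std::min(kMinResidentSize, full_size);
    return std::min(std::max(highest_set_bit(mask), floor_size), full_size);
}
```

A material that no pixel touched this frame reads back as 0 and falls to the resident floor, which would match the "keep 16x16 px in VRAM" behaviour if that reading of the shader is right.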

summer sigil
#

Oh, is that for your texture streaming?

merry laurel
#

yep

summer sigil
#

Makes sense now

merry laurel
#

the WaveIsFirstLane should reduce the amount of atomics to just one per subgroup if I am correct
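To make the "one atomic per subgroup" idea concrete, here is a small CPU simulation in C++. It is only a model: lanes are grouped into fixed-size waves, each wave ORs its bits together (standing in for `WaveActiveBitOr`), and a single `fetch_or` is issued per wave (standing in for the `WaveIsFirstLane` + `InterlockedOr` pair). It also assumes every lane in a wave targets the same slot; lanes with different material indices would need extra handling. `or_reduce_per_wave` and `ReduceResult` are hypothetical names.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct ReduceResult {
    std::uint32_t mask;      // final value of the shared slot
    std::size_t atomic_ops;  // how many atomics were issued
};

// Simulates a wave-level OR reduction followed by one atomic per wave.
inline ReduceResult or_reduce_per_wave(const std::vector<std::uint32_t>& lane_bits,
                                       std::size_t wave_size = 32) {
    assert(wave_size > 0);
    std::atomic<std::uint32_t> target{0};
    std::size_t atomic_ops = 0;
    for (std::size_t base = 0; base < lane_bits.size(); base += wave_size) {
        const std::size_t end = std::min(base + wave_size, lane_bits.size());
        std::uint32_t wave_mask = 0;
        for (std::size_t lane = base; lane < end; ++lane) {
            wave_mask |= lane_bits[lane];  // stand-in for WaveActiveBitOr
        }
        target.fetch_or(wave_mask, std::memory_order_relaxed);  // "first lane" only
        ++atomic_ops;
    }
    return {target.load(std::memory_order_relaxed), atomic_ops};
}
```

With 64 lanes and a wave size of 32 this issues 2 atomics instead of 64 while producing the same mask.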

#

I think it could also be caused by that buffer being host mapped so I can do readback

#

I will fix that and see if it improves

#

This weekend I want to add mesh streaming

merry laurel
#

fuuuuuuuck

merry laurel
#

forking daxa for debug info

merry laurel
#

nsight is just stupid froge_sad

supple cliff
#

slang debug info doesnt work

#

add -g2 to the cmd line

#

i have it in a commit but didnt submit

merry laurel
supple cliff
#

yea nsight has 0 slang support

#

makes no sense cause it doesnt need anything

#

it should just eat it but doesnt

#

g2 makes it work in aftermath crashes tho lmao

merry laurel
#

terrible readback performance has been fixed 2.6-2.2ms -> 0.4-0.6ms

merry laurel
#

I hate culling

pale vortex
#

Why tho

merry laurel
#

Rn I am adding culling for meshes and its broken

#

but works fine for meshlets

merry laurel
#

I just fixed it. It was so stupid

merry laurel
#

Mesh streaming is in

#

I perform frustum + hiz culling which is quite strict and I am evicting meshes after 255 frames
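A minimal sketch of what "evict after 255 frames" could look like, assuming a per-mesh 8-bit counter of frames since the mesh last passed culling (the 255 limit fits a `uint8_t`, but that is a guess). `MeshResidency` and `tick_residency` are hypothetical names, and real loading would be asynchronous rather than a flag flip.

```cpp
#include <cassert>
#include <cstdint>

struct MeshResidency {
    std::uint8_t frames_since_visible = 0;
    bool resident = false;
};

// Call once per frame per mesh. Returns true on the frame the mesh should be evicted.
inline bool tick_residency(MeshResidency& mesh, bool visible_this_frame) {
    if (visible_this_frame) {
        mesh.frames_since_visible = 0;
        mesh.resident = true;  // stand-in for "request load if not loaded"
        return false;
    }
    if (!mesh.resident) return false;
    if (mesh.frames_since_visible == 255) {
        mesh.resident = false;  // stand-in for freeing the GPU buffers
        mesh.frames_since_visible = 0;
        return true;
    }
    ++mesh.frames_since_visible;
    return false;
}
```

A longer timeout or a distance-based exception would slot into the same function, which is roughly the kind of tuning discussed right after this message.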

#

In whole scene there is over 1600 meshes

paper ore
#

so

#

with a 240hz monitor

#

when i do a 360 😄

#

i see all sorts of plopping and loading

merry laurel
#

this would be fixed with faster decompression and less strict frustum culling and maybe getting rid of hiz culling

#

also distance based culling

#

so some meshes around camera would be kept

#

For example you dont see texture pop in for texture streaming because there are much fewer of them and I still keep 16x16 px in VRAM

#

It also might be creating the buffer, which is slow. it can take up to 10 ms

merry laurel
paper ore
#

perhaps this will be depending on what kind of game you make with it

#

a car racing game which is rather linear you can just evict asap

#

with some openworld fps nonsense not so fast

#

openworld rts with some isometric camera faster again et al

merry laurel
#

Either way it should be fast

#

Its time to make it faster

supple cliff
#

cracked

#

i looove tracy traces

#

oooh so sexy

merry laurel
#

made it "instant" with caching and running it in release build KEKW

#

so much threads

#

Funny thing this would be faster(my guess everything would be loaded in 1 frame) if threads werent too slow to pick up the jobs

merry laurel
#

thanks to task and mesh shader I need to basically change all my code

#

also tragic embed fail

merry laurel
#

I just hit a slang compiler bug 💀

paper ore
#

report it

#

the frogs usually respond rather kwik

merry laurel
#

this is the code that makes slang explode

```
groupshared u32 hw_meshlet_count;
[shader("amplification")]
[numthreads(32, 1, 1)]
void draw_meshlets_task(u32vec3 local_thread_index : SV_GroupThreadID, u32vec3 global_thread_index : SV_DispatchThreadID, u32vec3 group_index : SV_GroupID) {
    const u32 count = push.uses.u_meshlets_data->count; // reading from pointer in push constant
    
    if(local_thread_index.x == 0) {
        printf("%u\n", count); // commenting this out makes the validation layer error disappear

        // commenting this out also makes the validation layer error disappear.
        // either the printf or the atomic code below has to be commented out in order to not crash
        u32 i = 0;
        InterlockedAdd(hw_meshlet_count, 1, i);
        hw_meshlet_count = i;
    }

    GroupMemoryBarrierWithGroupSync();
}

struct Vertex {
    f32vec4 sv_position : SV_Position;
};

struct Primtive {
    nointerpolation [[vk::location(0)]] u32 vis_index;
    bool cull_primitive : SV_CullPrimitive;
};

[outputtopology("triangle")]
[shader("mesh")]
[numthreads(1, 1, 1)]
void draw_meshlets_mesh(
    in payload MeshPayload payload,
    u32vec3 local_thread_index : SV_GroupThreadID,
    u32vec3 group_index : SV_GroupID,
    u32vec3 global_thread_index : SV_DispatchThreadID,
    OutputIndices<u32vec3, MAX_TRIANGLES_PER_MESHLET> out_indices,
    OutputVertices<Vertex, MAX_VERTICES_PER_MESHLET> out_vertices,
    OutputPrimitives<Primtive, MAX_TRIANGLES_PER_MESHLET> out_primitives) {}

[shader("fragment")]
void draw_meshlets_frag(in Vertex vertex, in Primtive primitive) {}
```
paper ore
#

report it 🙂

merry laurel
#

updated slang and its gone KEKW

merry laurel
#

I am fighting task graph allocator

#

@supple cliff Is there some stupid way to make the buffer larger?

#

found it

supple cliff
#

yes in constructor

merry laurel
#

so stupid KEKW

paper ore
#

how many trillion of vertices is this?

merry laurel
#

this many 44 224 440 125 vertices

paper ore
#

0.0044224440125 trillion vertices

merry laurel
#

I need to optimize

merry laurel
#

I have 24 MB buffer for each bistro so multiply that by 125 bistros

#

how much memory it is

paper ore
#

oofy

#

3000

merry laurel
#

this was optimization for my 1050

#

I will remove it and do mesh shaders

paper ore
#

4080 has 12? or 16? giggerinos of vram

merry laurel
#

16

#

also other thing that I need to fix is my task graph allocator

#

remove dependence on it

#

also the memory budget textures vs geometry is insane KEKW

paper ore
#

ok so 16/3 is 5

#

that means you can do 725 bistros

#

0.02 TRILLION vertices 😄

merry laurel
#

I dont know why adding buffer_ptr to task makes engine crash in submit

merry laurel
#

same scene but 15x less memory

paper ore
#

ok so 725 times, times 15

#

0.02 times 15 is 0.3 TRILLION no? 😄

merry laurel
#

dammit

#

I want my 1 trillion vertices

summer sigil
#

You can compress meshlets very easily

paper ore
#

put 3 more gpus in the system

merry laurel
#

I loaded 1000 bistros and its 909 MB

#

Its chugging on sw raster KEKW

paper ore
#

: D

merry laurel
#

hiz is broken again bleakekw

merry laurel
#

so much saving

paper ore
#

the heck

merry laurel
#

git stash my beloved

merry laurel
#

Okay now I need to optimize CPU side

merry laurel
#

oh boy I am iterating over 154k entities 💀

#

easy 70ms of frame time gone

#

added special flag for root entities

#

and now transforms

pale vortex
#

what impl are you using for them?

#

for the iteration I mean

#

I don't think 154k is a lot for ecs

merry laurel
#

I am using flecs. I was iterating over each entity in query and checking if it doesnt have parent

#

that cost me 60-70ms

#

I tried the flecs way but It looks like its sameish

#

So I went for special flag which changed that from 60-70ms -> 50us

pale vortex
#

yes that's how I store parenting info too

#

the hierarchy is another set of components where entities have a HierarchyParent component

#

but it is strange that flecs doesn't provide this out of the box

merry laurel
#

flecs has its own. imo its kind of weird

merry laurel
#

this is how you assign parent to child

#

this is how you iterate over children

#

query_transforms = world->query_builder<GlobalTransformComponent, LocalTransformComponent, GlobalTransformComponent*>().term_at(3).cascade(flecs::ChildOf).optional().build();

#

this is for iterating over parents and cascading to children

pale vortex
#

something must be wrong, 70ms is enormous for hierarchy traversal

merry laurel
#

yeah

pale vortex
#

wait

#

you're not running debug are you

#

is this a debug build

merry laurel
#

I am

#

always debug KEKW

pale vortex
#

well eventually you'll want a develop build with optimizations on, but do check what flecs docs say about this

#

and if it requires optimizations to be on to be fast

merry laurel
#

now its also fast in debug so I am not complaining KEKW

pale vortex
#

yeah but if all you want is to see which are the root level entities, then yeah you solved it quickly

#

but what if you want which are the entities that are two levels deep within their hierarchy that have a particular property?

#

you're back at 70ms

#

in general it doesn't make sense to benchmark debug because it reflects nothing real

merry laurel
#

I am not planning to turning this into game engine or game in near future. So jank solutions until then

pale vortex
#

it certainly doesn't reflect the performance your user will see

#

the code will be changed so much by the optimizer (the assembly I mean) that it won't resemble debug at all, most likely

merry laurel
#

yeah. I got quite big uplift for streaming

pale vortex
#

these are the components that I use for my hierarchy

struct HierarchyParent
{
    SceneEntityId ParentId;
};

struct HierarchyChildren
{
    std::vector<SceneEntityId> Children;
};

struct IsRootInHierarchy
{

};

I don't have 154k entities for sure tho so I never benchmarked it

#

it's a bit of work when the hierarchy changes but iterating afterwards should be reasonably quick
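As a sketch of why a root flag plus stored children makes iteration cheap: start from the flagged roots and push results down, so no entity ever has to be asked "do you have a parent?". This is not flecs or entt code. Transforms are reduced to a single float offset to keep it short, `SceneEntityId` is assumed to be a dense index, and `HierarchyNode` / `update_global_offsets` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using SceneEntityId = std::uint32_t;  // assumption: ids index into a dense array

struct HierarchyNode {
    float local_offset = 0.0f;            // stand-in for a local transform
    float global_offset = 0.0f;           // stand-in for a global transform
    bool is_root = false;                 // plays the role of IsRootInHierarchy
    std::vector<SceneEntityId> children;  // plays the role of HierarchyChildren
};

// Propagates offsets from roots to descendants with an explicit stack.
inline void update_global_offsets(std::vector<HierarchyNode>& nodes) {
    std::vector<SceneEntityId> stack;
    for (std::size_t id = 0; id < nodes.size(); ++id) {
        if (nodes[id].is_root) {
            nodes[id].global_offset = nodes[id].local_offset;
            stack.push_back(static_cast<SceneEntityId>(id));
        }
    }
    while (!stack.empty()) {
        const SceneEntityId parent = stack.back();
        stack.pop_back();
        for (const SceneEntityId child : nodes[parent].children) {
            assert(child < nodes.size());
            nodes[child].global_offset =
                nodes[parent].global_offset + nodes[child].local_offset;
            stack.push_back(child);
        }
    }
}
```

In an actual ECS the first loop would be a query on the root tag instead of a scan over every node.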

merry laurel
#

I had something similar with entt. flecs has its own way and I am lazy to learn it. It took me a while to query all parent and children transforms

merry laurel
#

Todo list for me:

  1. pack transforms and readback for meshes
  2. make mesh shaders faster
  3. try nanite lods bleakekw
#

No clue how I will select the correct lod

paper ore
#

by area perhaps

merry laurel
#

thats the easy part

paper ore
#

now draw the rest of the fucking owl

#

: >

merry laurel
#

but you need to make sure you dont select both an upper and a lower lod level at once (basically rendering the meshlets of 2 lods just overlaying each other)
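One way to avoid drawing two LODs on top of each other, as far as I understand the Nanite-style approach, is to store two errors per cluster and draw it only when its own error is acceptable and its parent's is not. A sketch under that assumption follows. The errors are taken as already projected to the screen, and both must be projected with the same bounds so parent and child agree. `LodCluster`, `should_draw` and `count_drawn` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

struct LodCluster {
    float self_error;    // error of this cluster's own simplification
    float parent_error;  // error of the coarser group replacing it (+inf at the top)
};

// Draw only where "good enough" flips: own error fits, parent's does not.
inline bool should_draw(const LodCluster& cluster, float threshold) {
    return cluster.self_error <= threshold && cluster.parent_error > threshold;
}

// Counts how many clusters along one parent chain pass the test.
inline std::size_t count_drawn(const std::vector<LodCluster>& chain, float threshold) {
    std::size_t drawn = 0;
    for (const LodCluster& cluster : chain) {
        if (should_draw(cluster, threshold)) ++drawn;
    }
    return drawn;
}
```

As long as error never decreases going up the chain, exactly one level passes for any threshold, so the two LODs cannot overlap.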

paper ore
#

a little overdraw hasnt hurt anyone yet :>

merry laurel
paper ore
#

ignore me please, i have no clue about any of this advanced tek

#

people in #1090390868449558618 know probably or have ideas

merry laurel
#

I will try to figure this thing. It is going to be fun

#

first I will get the lods on the screen

#

Also I hope nvidia devs will bless us with nsight update 🙏

summer sigil
merry laurel
merry laurel
#

No clue why going out of bounds on gpu doesnt prompt crash for me....

paper ore
#

perhaps robustness is on/off by default?

merry laurel
#

pretty sure its off

#

6656 bytes -> 208 bytes
18.8 MB -> 12.6MB

#

nice savings

merry laurel
#

Today I will move the rest of the code to mesh shaders and hopefully do some optimization.

#

Then 🥁 🥁 🥁 ||nanite lods||

proud marten
merry laurel
#

So I moved code to mesh shaders. It was pain to do so. I was hitting some edge cases

#

Now finally meshlets lods

#

Gotta do a lot of reading

merry laurel
#

also got this baby

merry laurel
#

I am thinking of taking pause on this project and try hw rt

supple cliff
#

daxa rt?

merry laurel
supple cliff
#

cool cool

#

the shader binding table stuff is still very awkward

#

but we couldnt find a good abstraction yet

#

jaisero worked on it a while but i didnt check on it

#

So i am eager to see what you come up with

merry laurel
#

yeah I was looking at shader binding table and it was iffy. I have some idea but idk

merry laurel
#

@supple cliff My idea to get rid of indices is to wrap it into objects that would hide the index and make it readable.

#

This is the idea I got without going much in depth

merry laurel
#

I might have shot myself into foot with offline asset baking and some assumptions with meshlets

#

also I unload meshes from VRAM so another bullet hole in my foot

merry laurel
#

quite chonker

paper ore
#

is that your current project?

merry laurel
#

yeah if you render your scene in 0.2ms and you are sending 500mbps it fills up quite fast

merry laurel
paper ore
#

ah

#

but

#

what does it show exactly

merry laurel
#

it shows the amount of data that tracy received from application that is being profiled

pale vortex
#

when you really want to make sure you get 60fps even after your game runs for 100 hours

paper ore
#

i see

merry laurel
#

much better

merry laurel
#

sw raster

#

sw raster disabled

#

quite abysmal

summer sigil
#

Uh, why's it so slow? Mine is way faster 🤔

#

Did you cache vertices in wg memory?

supple cliff
#

i also have way more occupancy, its at 80% or so for me

#

maybe its the scene

merry laurel
#

I do cache in wg memory and the scene in question

#

its 2544037 meshlets in question to be sw rendered nervous

#

both in early and late pass

#

Only thing to help me would be LODs

merry laurel
#

finally slang support froge_love

summer sigil
#

Does nsight have anything that tells you which variables are living long?

#

It just tells you number of live registers, but I don't see how to turn that into something actionable

merry laurel
#

I will be able to check in 10 hours after school bleaker_kekw

supple cliff
#

it shows what register usage certain lines cause

merry laurel
#

I've got no clue

#

This is the first time I am able to profile the frame and see shaders

merry laurel
#

made it bit better

merry laurel
#

now this is the worst offender

#

I made other change which cut it by 0.5ms in total. But these changes are quite incremental. The biggest problem is how many meshlets are rendered in first place

merry laurel
#

I found easy trick to compute normal matrix on gpu

#

and its dirt cheap

#
```
    cross(transform_matrix[1].xyz, transform_matrix[2].xyz),
    cross(transform_matrix[2].xyz, transform_matrix[0].xyz),
    cross(transform_matrix[0].xyz, transform_matrix[1].xyz)
);
```
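For context, the three cross products above form the cofactor matrix of the 3x3 part of the transform, which equals the determinant times the inverse transpose. That is why it works as a normal matrix up to scale, with a final normalize. A small C++ sketch of the same trick, assuming a row-major `Mat3` made of three rows; `Vec3`, `Mat3`, `cofactor_matrix` and `mul` are names invented here.

```cpp
#include <array>
#include <cassert>

struct Vec3 { float x, y, z; };

inline Vec3 cross(const Vec3& a, const Vec3& b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

inline float dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

using Mat3 = std::array<Vec3, 3>;  // three rows

// Cofactor matrix: det(M) * inverse(M)^T, built from cross products of rows.
inline Mat3 cofactor_matrix(const Mat3& m) {
    return Mat3{{cross(m[1], m[2]), cross(m[2], m[0]), cross(m[0], m[1])}};
}

// Matrix * column vector.
inline Vec3 mul(const Mat3& m, const Vec3& v) {
    return {dot(m[0], v), dot(m[1], v), dot(m[2], v)};
}
```

No inverse or division is needed, so it also behaves for singular matrices. For a transform with negative determinant (mirroring) the result points the opposite way, so the sign may need flipping.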
merry laurel
#

My gpu transform info went from 128 bytes to 84 bytes after using a 3x4 matrix and a 3x3 matrix. Now it can be just a 3x4 matrix (48 bytes). After separating position, scale and quaternion from the matrix it can be just 40 bytes
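The byte counts can be checked with plain structs. This is only arithmetic on tightly packed floats. What the original 128 bytes contained is not stated, so two 4x4 matrices is a guess, and all struct names are hypothetical. GPU buffer layout rules may add padding that C++ does not.

```cpp
#include <cassert>
#include <cstddef>

struct Mat4x4 { float m[16]; };  // 64 bytes
struct Mat3x4 { float m[12]; };  // 48 bytes
struct Mat3x3 { float m[9]; };   // 36 bytes

// Assumption: the 128-byte version held a world matrix and a normal matrix as 4x4.
struct TransformFull { Mat4x4 world; Mat4x4 normal; };

// 3x4 world matrix + 3x3 normal matrix.
struct TransformCompact { Mat3x4 world; Mat3x3 normal; };

// Position, scale and rotation quaternion stored separately.
struct TransformTrs {
    float position[3];
    float scale[3];
    float rotation[4];
};

static_assert(sizeof(TransformFull) == 128, "2 * 64");
static_assert(sizeof(TransformCompact) == 84, "48 + 36");
static_assert(sizeof(Mat3x4) == 48, "12 floats");
static_assert(sizeof(TransformTrs) == 40, "12 + 12 + 16");
```

The 48-byte step works because the normal matrix can be derived from the 3x4 matrix in the shader instead of being stored.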

merry laurel
#

lods are borked froge_sad

merry laurel
#

I had some big chunks missing

#

it seems it really hates roofs of bistro since they are made just out of one meshlet

merry laurel
paper ore
#

what does overdraw look like?

merry laurel
#

honestly I wonder too KEKW

paper ore
#

luckily i know a tool which can visualize that 😛

merry laurel
devout niche
summer sigil
#

How do y'all deal with small meshlets/triangles getting stuck and never getting simplified?

#

Like meshopt does a great job of evenly splitting things up into meshlets for the first LOD

#

But then the next level it gets kinda screwed up, and it often doesn't simplify out

merry laurel
merry laurel
merry laurel
#

one leaf is one meshlet bleakekw

summer sigil
#

ugh

#

Unreal solved it somehow

merry laurel
#

I wonder how

#

in my case I dont see how the meshlets would be more simplified

paper ore
merry laurel
#

yeah froge_sad

paper ore
#

bistro bad

merry laurel
paper ore
#

yeah

merry laurel
#

@devout niche here are the stats

#

13.5m meshlets

paper ore
#

didnt we have trillions of vertices already?

#

: )

merry laurel
#

nope

#

sw raster is chugging, destroying performance so much

#

only good thing is I finally got lods in

paper ore
#

if only that could be hardware accelerated 😄

merry laurel
#

so I can focus on performance

paper ore
#

its an oxymoron

merry laurel
#

I know frogegreenexcited . It was to my previous message

merry laurel
paper ore
#

do that first

#

gp can wait

merry laurel
#

thesis and the final exams are in few months

#

so its fine

#

they are in may

#

so many changes

summer sigil
#

@small osprey I'm thinking about it, and how does a separate BVH tree per LOD level make sense? Wouldn't all the groups in a LOD level have similar errors?? What's the BVH really accelerating here?

#

Frustum/Occlusion culling, maybe?? Idk

small osprey
summer sigil
#

But like what's the point of a tree then?

#

Why not just group meshlets by LOD, and then have like 1 two level tree where you select a LOD level, and then look at every meshlet in the LOD

small osprey
#

that's discrete lods again

#

remember, we have similar world space errors, but they're not similar after screen space transformation

summer sigil
#

Yes, but you can still have a 2-level tree

small osprey
#

it depends on absolute error as well as bounds

summer sigil
#

I don't get how you're accelerating anything by having a multi level tree

small osprey
#

because the value you're traversing on depends on abs error and bounds

summer sigil
#

I'm not getting it :/

small osprey
#

you project error to the screen

#

so that depends on where the group is

summer sigil
#

right, and?

small osprey
#

so you can discard huge parts of the lod-bvh if a node high up has too much error

#

say you have a lod transition halfway across the mesh from the camera

#

you will traverse the half of the lod-bvh for lod n, and half of it for lod n + 1

#

if you had a two level tree you'd traverse two whole lods

#

now imagine you have 6 lod transitions because it's a massive mesh

#

with a two level tree you'd go through 6 lods worth of meshlets

#

with a bvh it's roughly similar

#

i think atleast

#

i just did it because that's what the slides said they did lmfao
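A toy version of the pruning argument above, assuming each BVH node stores the maximum parent error of everything below it (my reading of the Nanite slides, not code from anyone in this chat). If even that projected error is under the threshold, a coarser LOD already covers the whole subtree and it can be skipped without visiting its groups. Bounds are 1D intervals and the projection is just `error * scale / distance` to keep it short. All names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct LodBvhNode {
    float center;                       // 1D stand-in for a bounding sphere centre
    float radius;
    float max_parent_error;             // max parent error of all groups below
    float self_error;                   // only meaningful for leaves
    std::vector<std::size_t> children;  // empty => leaf (a meshlet group)
};

struct TraversalStats {
    std::size_t nodes_visited = 0;
    std::vector<std::size_t> selected;  // leaf indices chosen for drawing
};

inline float projected_error(float world_error, const LodBvhNode& node,
                             float camera, float scale) {
    const float distance = std::max(std::fabs(node.center - camera) - node.radius, 1e-3f);
    return world_error * scale / distance;
}

inline TraversalStats select_groups(const std::vector<LodBvhNode>& nodes, std::size_t root,
                                    float camera, float scale, float threshold) {
    TraversalStats stats;
    std::vector<std::size_t> stack{root};
    while (!stack.empty()) {
        const std::size_t index = stack.back();
        stack.pop_back();
        const LodBvhNode& node = nodes[index];
        ++stats.nodes_visited;
        // A coarser level is already accurate enough for everything below: prune.
        if (projected_error(node.max_parent_error, node, camera, scale) <= threshold) continue;
        if (node.children.empty()) {
            if (projected_error(node.self_error, node, camera, scale) <= threshold) {
                stats.selected.push_back(index);
            }
        } else {
            for (const std::size_t child : node.children) stack.push_back(child);
        }
    }
    return stats;
}
```

With a flat per-LOD list every group would be tested. Here a subtree of fine groups far from the camera costs one node visit.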

summer sigil
#

ok, so

#

the LOD selection is based only on LOD spheres

#

So do you build your AABBs for the BVH based on the group LOD spheres?

#

I know LOD selection depends in part based on the meshlet group position

#

But like I don't quite see how the BVH interacts with that

small osprey
#

abs error is not identical across an entire LOD

#

it varies considerably

summer sigil
#

Does it? Ok

small osprey
#

it's just whatever meshopt spits out as the highest edge collapse error

summer sigil
#

Remind me how I build the leaves of the BVH though? It's meshlet groups, but what do I use for determining the AABB bounds for its BVH node?

small osprey
#

if some part of the mesh has lower tri density, error will be higher for the same lod

#

individual meshlet AABBs iirc

summer sigil
#

You're not using meshlet groups for your BVH leaves?

small osprey
#

uhhhh i forgor lemme check what i actually do

#

so group lod spheres are the union of all meshlet lod spheres

summer sigil
#

ok yes

#

then how do you start building the BVH?

small osprey
#

over actual group AABBs apparently

#

like, just union of meshlet AABBs

#

don't ask why

summer sigil
#

right, that's what I thought

#

but then how do you guarantee that error is monotonic down the BVH?

small osprey
#

just don't use it for SAH

#

maybe i should

#

also my error projection still isn't quite right so it's not actually monotonic always froge_sad

#

things like disappearing sometimes

#

but very rarely

summer sigil
small osprey
#

shouldn't be a correctness problem, maybe just a perf issue

#

because when converting the temporary build bvh to the full bvh i recursively merge all lod and cull bounds again

#

and max out abs errors

#

idk tho

#

i wrote this code with zero refs because karis doesn't talk about it in the slides kekkedsadge

#

it's a miracle it even somewhat works

summer sigil
#

yeah :/

merry laurel
#

what the actual fuck

#

I just went from 30 ms -> 12 ms

#

So stupid

supple cliff
merry laurel
#

quite similar

summer sigil
#

I don't know why it gets so much worse over time D:

#

how I'm grouping clusters must be poorly done

#

that, or splitting the group ig

merry laurel
#

how are you able to see individual lods?

#

also I might be too tired but I dont see anything really wrong

#

number of meshlets is going down

summer sigil
merry laurel
#

I need to play more with the nanite lods but first I want to optimize the rendering

summer sigil
#

I'm going to try adding spatial links between clusters and see if it helps

merry laurel
#

👀

#

Tell me later if it improved the grouping

summer sigil
#

I'm not sure how to do it tbh

#

I want to focus on minimizing shared edges

#

But also have it so that if there's an equal number of shared edges, then it uses spatial weights
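A sketch of the shared-edge part of that weighting: count, for every pair of meshlets, how many triangle edges they have in common. Those counts could then serve as graph edge weights for a partitioner. The partitioner call itself and the spatial tie-break are left out, and `shared_edge_weights` and the type aliases are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <map>
#include <set>
#include <utility>
#include <vector>

using Triangle = std::array<std::uint32_t, 3>;
using Meshlet = std::vector<Triangle>;
using Edge = std::pair<std::uint32_t, std::uint32_t>;      // (min vertex, max vertex)
using MeshletPair = std::pair<std::size_t, std::size_t>;   // (lower index, higher index)

// For each pair of meshlets, counts the edges that appear in both.
inline std::map<MeshletPair, int> shared_edge_weights(const std::vector<Meshlet>& meshlets) {
    std::map<Edge, std::set<std::size_t>> edge_users;
    for (std::size_t m = 0; m < meshlets.size(); ++m) {
        for (const Triangle& tri : meshlets[m]) {
            for (std::size_t i = 0; i < 3; ++i) {
                const std::uint32_t a = tri[i];
                const std::uint32_t b = tri[(i + 1) % 3];
                edge_users[Edge(std::min(a, b), std::max(a, b))].insert(m);
            }
        }
    }
    std::map<MeshletPair, int> weights;
    for (const auto& entry : edge_users) {
        const std::set<std::size_t>& users = entry.second;
        for (auto first = users.begin(); first != users.end(); ++first) {
            for (auto second = std::next(first); second != users.end(); ++second) {
                ++weights[MeshletPair(*first, *second)];
            }
        }
    }
    return weights;
}
```

One option for the tie-break would be to scale these counts by a large constant and add a small distance-based term, so shared edges always dominate.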

small osprey
summer sigil
#

maybe it's the splitting step thats the problem

#

and not the grouping

small osprey
#

which means that clustering groups after simplification is messing up yeah

#

what's your target tri count?

summer sigil
small osprey
#

no like percentage for simplify

summer sigil
summer sigil
small osprey
#

try 45% maybe

#

oh yeah how large are your groups

summer sigil
#

8 meshlets

summer sigil
#

I keep ending up with these tiny meshlets :/

small osprey
merry laurel
#

How long do your assets take to build?

summer sigil
#

pretty much instant for the bunny, much longer for larger meshes

summer sigil
small osprey
#

rip

small osprey
#

something isn't quite linear

summer sigil
#

I need to build a way to visualize groups, so that I can see if it's the grouping, the simplification, or the splitting step that's the issue

small osprey
#

jasmine's blog and the nanite slides personally

merry laurel
#

Thank you. Nanite is similar to normal meshlet rendering but with nanite lods and software rasterization in compute. I will send you resources when I come home froge_love

paper ore
#

jglrxavpok is also on the serveur, jmsine too 😛

#

lvstri and wpotti also have some clue

merry laurel
#

quite a lot

#

I need to learn gpu work expansion 😔

merry laurel
#

does anybody know what this means?

#

I love how nsight has 0 useful docs

merry laurel
#

I need to fix my random crashes...

paper ore
#

0.0029 TRILLION vertices

merry laurel
#

I should get rid of the vertex count. its the unique vertices count

paper ore
#

then show the real values

small osprey
#

it's not a real bottleneck

merry laurel
#

Nvidia should fix their stuff

small osprey
#

you should go work for them

merry laurel
#

would like to but I need to get out of the high school first froge_sad

#

so in 4 years(also after college) bleakekw

merry laurel
#

yep

#

still stuck in high school

small osprey
#

i miss high school

merry laurel
#

after I graduate I will have long summer holidays, about 4 months, and then college starts...

small osprey
#

life was so fun back then

#

now i speedrun 2 months of coursework in 3 days after the deadline

merry laurel
#

I hope college wont suck that much

#

I already know some cs stuff from hs which is like 2 semesters

merry laurel
small osprey
#

you will soon enough KEKW

merry laurel
#

friends and classmates are good. But boy doing nothing or some worthless stuff

#

at least I know how to configure ms dos and set up vlans

merry laurel
#

I wont be missing waking up at 5:00 to get to school

small osprey
#

ok yeah i don't miss that

#

i love waking up at 4 pm and missing all my lectures

#

it's great

merry laurel
#

night owl I see

#

I am in last grade and there is almost nothing to teach and final exams are around the corner. So my thursdays and fridays look like: get to school at 7:00, sit 2 hours of some linux stuff, sit 2 hours of java stuff, english and another 2 hours of java where the teacher shows us how to brew beer

#

Also today we had some students from a college present why to choose their college. Not joking, the pros of their college is they have their own brewery and pub in the faculty

small osprey
#

yeah 12th grade for me was vibe, chemistry, then vibe

small osprey
#

surely you're in germany

merry laurel
#

a lot of colleges have a pub in their faculties KEKW

merry laurel
small osprey
#

lol

merry laurel
#

czechia

small osprey
#

we've got like 3 student union run bars

merry laurel
#

we drink 2x more beer than germans

small osprey
#

no brewery tho

merry laurel
#

this is just beer

#

we have own wine and hard alcohol

small osprey
#

surely the UK is gonna beat y'all on pure alcohol consumption right

merry laurel
#

nope

#

My beloved slivovice

#

90% pure alcohol

small osprey
#

yum

merry laurel
#

doubles as nail polish remover

small osprey
#

nail polish or nail polish remover

merry laurel
#

In a few days christmas holidays, which I will spend preparing for final exams bleakekw

small osprey
merry laurel
#

Standard is your native language: 2 essays, a grammar exam, and you read 20 books and they ask you questions about everything, even the authors. Then English or Math; English is an easy essay, grammar and an oral exam. Math is self explanatory. But I am in a computer science high school so I have additional programming, databases, networking, hardware and some stuff about boolean algebra, processors, etc... and then also a high school thesis (basically a bachelor thesis)

#

this is what awaits me

#

in few months

#

Each cs subject is 30 questions

#

It is a lot to remember

#

All these exams happen in 1.5 week window

small osprey
#

wtf

merry laurel
#

Also I will have my college entrance exams before I graduate

#

yeah thats why I am bleaker_kekw about final exams

summer sigil
#

Ok, so here's what my meshlet groups look like at LOD 1

#

I'm gonna try METIS for grouping triangles into meshlets

merry laurel
#

it looks a bit better at least from this angle

summer sigil
#

Anyone have unreal they can run the mesh through and see how it looks in nanite?

merry laurel
#

I am downloading unreal if you can send me the model file I can try

summer sigil
merry laurel
#

thats going to take some time

summer sigil
#

Nw 😅 . I appreciate the help.

#

See if Nanite can visualize meshlet groups, or only meshlets.

#

And try to get close enough to visualize LOD 0 and 1 please

merry laurel
#

keeping me warm froge_love

#

@summer sigil 😉

summer sigil
#

Those are clusters and not cluster groups, right?

#

Thanks!

merry laurel
#

clusters

summer sigil
#

Hmm what is "patches"?

merry laurel
#

I wonder too

#

triangles?

#

unreal now has nanite tessellation so its maybe related to that?

paper ore
#

the term patches suggests that yeah

summer sigil
#

Ohh probably

#

I was going to say, it could be cluster groups, but I never saw them use that term.

#

Tessellation is more likely, given that it was added after their 2021 presentation

#

Anyways I think there probably are improvements I could make to the cluster grouping

#

But I have a feeling that my main issue rn is meshopt's meshlet algorithm performs poorly for Nanite

#

Small meshlets are very problematic when you need to make the next LOD

#

And it doesn't prioritize minimizing locked edges

#

So, I'm gonna try METIS

merry laurel
#

They have done some serious improvements to rendering and materials

summer sigil
#

Yeah I've read that. I haven't spent any time on material optimization or ergonomics yet.

#

I plan to eventually though

merry laurel
#

Same here

summer sigil
#

Really hoping we get device generated commands in wgpu before I have to tackle that...

merry laurel
#

I want to spend a few months on nanite until I graduate, then I will have fun with shading and other stuff

merry laurel
#

or also materials

summer sigil
#

For materials

#

Doing 1 dispatch per material in your project is expensive

merry laurel
#

brute force depth comps per material 💀

summer sigil
#

Having a pass analyze the visbuffer, and then write out 1 dispatch per material actually needed is much cheaper.
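
A rough CPU-side sketch of that idea, not anyone's actual renderer code: count how many pixels each material covers, then emit one dispatch only for materials that show up. It assumes a material ID has already been resolved per pixel (a real visbuffer stores cluster/triangle IDs), and `MaterialDispatch`, `build_material_dispatches` and the workgroup size are made-up names for illustration.

```rust
/// Arguments for one per-material dispatch (hypothetical layout).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MaterialDispatch {
    pub material_id: u32,
    pub pixel_count: u32,
    pub workgroups_x: u32,
}

/// Count the pixels covered by each material and emit one dispatch per
/// material that is actually visible. Assumes every ID in
/// `pixel_material_ids` is below `material_count`.
pub fn build_material_dispatches(
    pixel_material_ids: &[u32],
    material_count: usize,
    workgroup_size: u32,
) -> Vec<MaterialDispatch> {
    assert!(workgroup_size > 0, "workgroup_size must be at least 1");

    // Pass 1: histogram of pixels per material.
    let mut counts = vec![0u32; material_count];
    for &id in pixel_material_ids {
        counts[id as usize] += 1;
    }

    // Pass 2: materials with zero pixels get no dispatch at all.
    let mut dispatches = Vec::new();
    for (id, &count) in counts.iter().enumerate() {
        if count > 0 {
            dispatches.push(MaterialDispatch {
                material_id: id as u32,
                pixel_count: count,
                workgroups_x: count.div_ceil(workgroup_size),
            });
        }
    }
    dispatches
}
```

On the GPU the histogram would be atomics in a compute pass and the output would be indirect dispatch arguments, but the shape of the work is the same.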

merry laurel
#

yeah they do something similar

#

Also they do some VRS fun

summer sigil
summer sigil
merry laurel
#

I will clean up my code and fix CPU perf issue

#

Then try loading bunny

#

Few weeks ago I tried reading dgc code and it was awful

#

good thing they allow you to create your own structure and tell it what to read

#

also are you running meshopt_optimizeMeshlet for each meshlet? @summer sigil

summer sigil
#

Yeah, it's part of the rust bindings

#

Like build_meshlets calls that internally

merry laurel
summer sigil
#

What's wrong with it?

#

oh, the boiling

#

That's inevitable

#

All you can do is design your game to work around it, try to hide it with texture detail, tweak some variables, etc.

merry laurel
#

😭

merry laurel
#

I am being so gaslighted rn

#

I am setting positions correctly but flecs is not doing its thing

summer sigil
#

I kind of made progress on using METIS for meshlet generation instead of meshoptimizer

#

Not quite working though...

merry laurel
#

it doesnt provide any weights to metis at all

summer sigil
#

I'll have to look

merry laurel
#

I am hitting some weird bug/ub inside my ecs froge_sad
No clue why its happening so ditching this and will work on something else

summer sigil
#

Ok so metis is choosing triangles for meshlets that are completely disjoint, ahhh

#

so, my graph edges are wrong, maybe

merry laurel
#

lvstri cooked some code while back

#

let me search more

summer sigil
#

Maybe it doesn't work if I do it on vertices

#

I'm partitioning triangles based on shared vertices

#

Although maybe my vertex IDs are wrong, hmmm

merry laurel
#

he did his own

#

Found it

#

@summer sigil ☝️

summer sigil
#

mmhm

merry laurel
summer sigil
#

no

#

I'm just using part graph kway on triangles

merry laurel
#

then that might be it 😅

summer sigil
#

oh hang on I might know

summer sigil
#

ah yeah that fixed it

#

I was essentially saying triangle_id = indices[i] / 3

#

when it should be triangle_id = i / 3
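
For anyone hitting the same thing, a minimal sketch of the difference, assuming a plain triangle list index buffer (`triangles_per_vertex` is a made-up helper name, not from the METIS or meshopt bindings):

```rust
/// For every vertex, list the triangles that reference it.
/// Assumes `indices` is a triangle list (length is a multiple of 3)
/// and every index is below `vertex_count`.
pub fn triangles_per_vertex(indices: &[u32], vertex_count: usize) -> Vec<Vec<u32>> {
    let mut out = vec![Vec::new(); vertex_count];
    for (i, &vertex_id) in indices.iter().enumerate() {
        // The triangle is identified by the *position* in the index buffer,
        // not by the vertex ID stored at that position.
        let triangle_id = (i / 3) as u32;
        out[vertex_id as usize].push(triangle_id);
    }
    out
}
```

Using `indices[i] / 3` instead would turn vertex IDs into bogus triangle IDs, which would explain graph edges between triangles that are nowhere near each other.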

#

The issue is ofc these tiny meshlets still

#

And I'm only setting partition size to 120, not 128

#

Because otherwise metis might go over and hit 129 or 130

merry laurel
#

😔

summer sigil
#

Annoying that METIS has these tiny meshlets too

#

But manual recursive bisection might help, idk

summer sigil
#

Ok so even if there are small meshlets at LOD 0, the good news is that they don't get "stuck" as you build further LODs anymore! This is a visualization of the groups (not meshlets)

#

no tiny groups!

#

And if I zoom out on the meshlets view, no tiny meshlets stuck!

graceful spear
#

😭 I come here to look at an interesting project and I get ptsd from networking this year

#

I did all 30 of my teacher's Packet Tracer assignments in the last 3 days of school

merry laurel
#

I am doing some cleaning up rn. I dropped VRAM usage for mesh instances by 50% and for meshlet instances by 25%.
My todo list:

  1. optimize culling
  2. try getting rid of indirection in rendering(meshlet instance index -> meshlet instance data)
  3. get rid of the stupid populate meshlets shader that I have, whose sole job is to iterate over all meshlets in a mesh and write them out
summer sigil
#

Finally, fixed LOD 0! #showcase message

small osprey
#

what's the secret frogsippy

summer sigil
#

Using METIS instead of meshopt to build meshlets, and a lot of careful tweaking of METIS

#

I use 128t:255v meshlets

small osprey
#

that does not sound fun

summer sigil
#

And for triangle partitioning, you want to:

  • Build a graph where triangles are nodes, edges are between triangles that share an edge/vertex, with the edge weight being the count of shared edges/vertices
  • Set ufactor to 1
  • Use recursive bisection for the partitioning method (still need to confirm that this is needed over kway)
  • Aim for triangle_count.div_ceil(126) partitions
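
The graph part of the list above could look roughly like this. It is only a sketch under a few assumptions: edge weight is the number of shared vertices (so an edge-sharing pair gets weight 2), there are no degenerate triangles, and the output uses the CSR layout that METIS expects (`xadj`, `adjncy`, `adjwgt`) with `i32` as the index type, which depends on how METIS was built. `TriangleGraph`, `build_triangle_graph` and `partition_count` are names invented for the example.

```rust
use std::collections::BTreeMap;

/// Triangle adjacency graph in CSR form.
pub struct TriangleGraph {
    pub xadj: Vec<i32>,   // neighbours of triangle t are adjncy[xadj[t]..xadj[t + 1]]
    pub adjncy: Vec<i32>, // neighbouring triangle IDs
    pub adjwgt: Vec<i32>, // number of vertices shared with that neighbour
}

pub fn build_triangle_graph(indices: &[u32], vertex_count: usize) -> TriangleGraph {
    let triangle_count = indices.len() / 3;

    // vertex -> triangles that use it (triangle ID is i / 3, not indices[i] / 3)
    let mut vertex_to_triangles = vec![Vec::new(); vertex_count];
    for (i, &v) in indices.iter().enumerate() {
        vertex_to_triangles[v as usize].push(i / 3);
    }

    let mut xadj = vec![0i32];
    let mut adjncy = Vec::new();
    let mut adjwgt = Vec::new();
    for t in 0..triangle_count {
        // neighbour triangle -> count of shared vertices
        let mut shared: BTreeMap<usize, i32> = BTreeMap::new();
        for &v in &indices[t * 3..t * 3 + 3] {
            for &other in &vertex_to_triangles[v as usize] {
                if other != t {
                    *shared.entry(other).or_insert(0) += 1;
                }
            }
        }
        for (other, weight) in shared {
            adjncy.push(other as i32);
            adjwgt.push(weight);
        }
        xadj.push(adjncy.len() as i32);
    }
    TriangleGraph { xadj, adjncy, adjwgt }
}

/// Number of partitions to ask for, leaving a little headroom below 128.
pub fn partition_count(triangle_count: usize) -> usize {
    triangle_count.div_ceil(126)
}
```

The graph comes out symmetric (every edge is listed from both ends with the same weight), which is what METIS requires.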
#

Still need to test kway again, then test on more meshes, and then work on the meshlet partitioning for groups

small osprey
#

hmmm bisection would allow it to be multithreaded too

summer sigil
#

Rn I'm still using a target group size of 8 with ufactor=200 and rejecting <5% simplified groups. All this needs tweaking still.

summer sigil
small osprey
#

ah

#

maybe i'll try doing it myself (eventually...)

summer sigil
#

But from what I've heard the best way to partition things is to recursively call part_recursive(partitions = 2) until you're close to the target partition size
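
The control flow of that would be something like the sketch below. The actual two-way split is passed in as a closure, because calling METIS itself is out of scope here; `split_in_half` is only a stand-in that cuts the list in the middle and knows nothing about the mesh. All names are hypothetical.

```rust
/// Keep bisecting until every part has at most `max_size` items.
/// `bisect` must return two non-empty halves.
pub fn recursive_bisect<F>(
    items: Vec<u32>,
    max_size: usize,
    bisect: &F,
    out: &mut Vec<Vec<u32>>,
) where
    F: Fn(Vec<u32>) -> (Vec<u32>, Vec<u32>),
{
    assert!(max_size > 0, "max_size must be at least 1");
    if items.len() <= max_size {
        out.push(items);
        return;
    }
    let (left, right) = bisect(items);
    assert!(
        !left.is_empty() && !right.is_empty(),
        "bisect must return two non-empty halves"
    );
    recursive_bisect(left, max_size, bisect, out);
    recursive_bisect(right, max_size, bisect, out);
}

/// Stand-in bisector: splits the list in the middle. A real one would
/// run a 2-way graph partition on the triangle graph instead.
pub fn split_in_half(mut items: Vec<u32>) -> (Vec<u32>, Vec<u32>) {
    let right = items.split_off(items.len() / 2);
    (items, right)
}
```

The two recursive calls are independent of each other, so they could run on separate threads.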

#

I very much suggest spending time on the DAG build

small osprey
#

would this be optimal for group partitioning too?

summer sigil
#

It's the most important thing for perf

summer sigil
small osprey
#

recursive bisection in general

summer sigil
#

Yes

small osprey
#

so i have to fix that too KEKW

summer sigil
#

Not sure it's entirely perfect, but it's the rough idea

#

I think unreal uses METIS_PartGraphRecursive instead of METIS_PartGraphKway like that paper does

small osprey
#

oh someone made nanite their dissertation?

#

cute

summer sigil
#

A couple people have

small osprey
#

well i guess i have a fallback if i don't find something new till then lol

summer sigil
#

Otherwise if you go for 126 in a single go, most hit 126 and are slightly empty

summer sigil
supple cliff
devout niche
summer sigil
supple cliff
#

better in what way?

#

culling and vertex reuse?

summer sigil
#

Idk about vertex reuse, but culling yes

#

Meshopt generates a lot more uniform meshlets

summer sigil
#

I've been thinking about BVH based lod/culling and it just doesn't seem to make sense

#

These are the amounts of meshlet groups per LOD level
1984
1084
566
284
142
71
36
18
9
5
3
2
1
1
1

#

if you have one BVH per level, that's such an unbalanced tree

#

I mean it's still better than parallel brute force, but

small osprey
#

so each level has 8x the capacity

summer sigil
#

Wdym?

small osprey
#

each level has 8 children

#

so every level you add increases capacity by 8x

#

so every ~3 LODs adds a level

#

which isn't as bad
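
Quick arithmetic check of that, assuming a full 8-wide tree per LOD and counting only the levels above the leaf groups (`bvh_depth` is a made-up helper):

```rust
/// Smallest depth d such that branching^d >= leaf_count.
pub fn bvh_depth(leaf_count: u32, branching: u32) -> u32 {
    assert!(branching >= 2, "branching factor must be at least 2");
    let mut depth = 0;
    let mut capacity: u64 = 1;
    while capacity < leaf_count as u64 {
        capacity *= branching as u64;
        depth += 1;
    }
    depth
}

/// Depth of an 8-wide tree for each LOD's group count.
pub fn depths_per_lod(group_counts: &[u32]) -> Vec<u32> {
    group_counts.iter().map(|&n| bvh_depth(n, 8)).collect()
}
```

For the group counts posted above (1984, 1084, 566, ... down to 1) this gives depths 4, 4, 4, 3, 3, 3, 2, 2, 2, 1, 1, 1, 0, 0, 0, so the tree loses one level roughly every 3 LODs.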

summer sigil
#

No but I mean

#

You make a separate bvh per lod

#

But each lod has ~half as many groups

#

So you end up with basically all of the nodes concentrated to the left side of the tree, no?

#

Which I mean I suppose is fine, idk

summer sigil
#

The overall tree

small osprey
#

I guess yeah

#

idk maybe nanite does something smarter than me

#

it probably does

#

but my thingy is fast enough™️

summer sigil
#

I mean yeah it'll be faster

#

Just feels off to me

paper ore
#

its time yall nanite people cook up a program/slides where all your various tekneeks are compared against each other

#

ze bunny scene, ze bistor scene, heck why not even the deccer cube scene (the complex ones ina 20x20x20 cube)

#

because i have total overview who is implementing what based on what and what outcome each of yalls stuff has/does/makes

supple cliff
summer sigil
supple cliff
#

i see what you mean

#

i wouldnt really worry about that

#

maybe the largest lods should start earlier in the tree somewhat

#

i think you can try to balance that also

#

wont be perfect but you cant balance that anyway as the lods will just have different depths

summer sigil
#

It's wildly unbalanced

supple cliff
#

well thats fine, its just the nature of the tree that comes from combining different tree roots

#

you have to traverse a big part of many trees anyways so its mostly like traversing many trees, just a nicer interface to have one

small osprey
#

what I explained was just how I did it

#

I dunno how nanite actually does it

summer sigil
merry laurel
#

Finally back. To do some work on this project

#

I made rendering a bit faster by getting rid of indirection for meshlet instance data from queue. It decreased frame time by 0.4-0.3ms. It aint that much for +-7ms frame.

#

Only downside is increased bandwidth and memory needed.

#

Also, I am now limited to half of the meshlets that I could have before

#

In my case that is 33 554 432 meshlet instances

merry laurel
#

A bit late but happy new year frogs froge_yeehaw

paper ore
#

happy new frog my year

merry laurel
#

I hope dwm.exe burns in hell

#

messing up my profiling

#

but looks like I got another 0.2ms from frame time

summer sigil
#

@small osprey the leaf nodes you build your BVH out of are groups of clusters. Are the clusters pre or post simplification? I.e. the clusters you grouped before simplifying, or clusters you make after simplifying?

small osprey
#

post simplify i think

summer sigil
#

hrm

#

I need to fix that then

#

It makes sense, but I realize my stuff is wrong now

small osprey
#

actually wait

#

lemme check the code

#

yeah i lied

#

groups are before simplify

#

but their parent err is filled in after simplification

summer sigil
#

right, that's how I had it, but then why does that make sense?

#

"if error is low enough, display group"

#

...but the error is the amount of deformation of the simplified group (new meshlets). Not the original group.

small osprey
#

because only parent error is stored in the bvh

#

the bvh traversal visits all groups with parent error that isn't detailed enough

#

then meshlet cull goes through and only renders meshlets with self-error that's detailed enough
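
As a sketch, the two tests described above are just two comparisons against the same threshold. This version loops over a flat list of groups instead of walking a BVH, and assumes the errors are already projected into the same units as the threshold (e.g. pixels), which a real implementation would compute per frame from bounding spheres. `Group`, `Meshlet` and `select_meshlets` are illustrative names, not the actual code.

```rust
pub struct Meshlet {
    pub id: u32,
    /// Error of this meshlet relative to the original mesh.
    pub self_error: f32,
}

pub struct Group {
    /// Error of the simplified version that replaces this group,
    /// filled in after simplification.
    pub parent_error: f32,
    pub meshlets: Vec<Meshlet>,
}

/// Returns the IDs of the meshlets to render for a given error threshold.
pub fn select_meshlets(groups: &[Group], threshold: f32) -> Vec<u32> {
    let mut visible = Vec::new();
    for group in groups {
        // "Traversal" test: if the parent is already detailed enough,
        // the parent gets drawn instead, so skip this group entirely.
        if group.parent_error <= threshold {
            continue;
        }
        // Meshlet cull: only draw meshlets that are themselves detailed enough.
        for meshlet in &group.meshlets {
            if meshlet.self_error <= threshold {
                visible.push(meshlet.id);
            }
        }
    }
    visible
}
```

With a chain of LODs where each level's `self_error` equals the `parent_error` of the level below it, exactly one level passes both tests for any threshold, which is what keeps the cut through the DAG consistent.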

summer sigil
#

hmmm

#

need to think on this more when I'm not exhausted

merry laurel
#

damn you hiz culling

#

but got another 0.3ms off frame time

paper ore
#

you must be in negative milliseconds by now 😄

merry laurel
#

I wish

#

So my 125 bistro + sponza scene is running around +-4.2ms when seeing everything

#

I shaved off +-1ms from it

#

the major gain was when I removed the indirection to fetch meshlet data

#

threads now directly fetch the needed data to fetch mesh, meshlet and transform data

#

Today's win is replacing the populate meshlets shader with a prefix sum and binary search.
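
Roughly what that trade looks like, as a CPU sketch rather than the actual shader code: instead of writing out one entry per meshlet, keep an exclusive prefix sum of meshlet counts per mesh instance and let each thread binary search for the instance that owns its global meshlet index. Function names are made up for the example.

```rust
/// offsets[i] = number of meshlets in all instances before instance i.
pub fn exclusive_prefix_sum(counts: &[u32]) -> Vec<u32> {
    let mut offsets = Vec::with_capacity(counts.len());
    let mut running = 0u32;
    for &count in counts {
        offsets.push(running);
        running += count;
    }
    offsets
}

/// Maps a global meshlet index to (instance index, meshlet index inside
/// that instance). The caller must make sure `global_index` is below the
/// total meshlet count.
pub fn locate_meshlet(offsets: &[u32], global_index: u32) -> (usize, u32) {
    assert!(!offsets.is_empty(), "need at least one instance");
    // Number of offsets <= global_index; the last of those is the owner.
    // Instances with zero meshlets share an offset with their successor,
    // so they are skipped automatically.
    let instance = offsets.partition_point(|&o| o <= global_index) - 1;
    (instance, global_index - offsets[instance])
}
```

The lookup costs O(log n) per thread, but the buffer that had one entry per meshlet instance goes away.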

#

Next step is bvh culling, idk if I will do it

paper ore
#

now add VSM

merry laurel
merry laurel
#

I can finally focus on writing the high school thesis/blog

paper ore
#

ah

merry laurel
north helm
#

Lets gooooo

#

Cant wait to read that

small osprey
#

VSM cringe just RT everything

merry laurel
#

then why nanite in first place KEKW

merry laurel
#

I have to write it in 2 different languages

small osprey
#

too slow

#

I still dk how I'm gonna nicely RT stuff ngl

#

rn I just throw the high poly into a bvh

merry laurel
#

I am planning to have PT reference to my raster cope

#

But rn sleep is in order and studying for stupid literature exam on 5 books + authors

merry laurel
#

windows moment

#

I found out that windows does this randomly

paper ore
#

does what?

merry laurel
#

puts magnifier icon on desktop icons

paper ore
#

looks like its indexing the file then

#

time to switch to linux

merry laurel
paper ore
#

you must have that enabled then

#

otherwise its off by default

merry laurel
#

I used linux for a while but physx omniverse support kind of pissed me off after fighting it for 6 hours, and I rage switched to using windows KEKW

merry laurel
paper ore
#

maybe you pressed a magic win+shift+something shortcut triggering the accessibility-isms

#

i used to get that switching fucking IME accidentally

#

left alt + left shift

merry laurel
#

Anyway tomorrow school starts again bleakekw
This half of the school year will be spent going to school doing nothing, gym, studying for college and the final exam

#

I will have college entrance exams in 20 days. Hopefully you don't really need to study, it's a logic and reading exam 🤷‍♂️

#

other exams are months away

paper ore
#

i wish you best of luck 🙂

merry laurel
#

This is just a third of the questions for the computer science part of the final exam

paper ore
#

Maua Zaba is missing on the list

merry laurel
#

Literature is bleakekw

merry laurel
paper ore
#

: (

merry laurel
#

Each file is circa 2/3 4A and some more

merry laurel
#

So yeah this half of the last grade is not going to be fun

#

but I will be rewarded with 4 months of holidays

#

and then college starts froge_sad

merry laurel
# proud marten MINES CANCELED FOR SNOW

Dont talk about snow. Czech weather pulled out some funny card. It didn't snow for 20 days, but today it snowed, then we had warm weather so some of the snow turned into ice

#

So my groceries looked like this

#

Literally every 5 metres

proud marten
#

LOL

north helm
merry laurel
#

Basically PhysX 5

north helm
#

Okay

#

I got it to compile on linux

#

But it was hell

#

Basically you have to compile it manually by selecting the source files you want 🥲

#

Their cmakelist is doomed

merry laurel
north helm
#

NVIDIA moment

merry laurel
north helm
#

And the amount of GCC warnings is insane

merry laurel
#

I am clang user which was fun also

north helm
merry laurel
north helm
#

Okay i'll give it when i start my computer

merry laurel
#

🫡

north helm
#

@merry laurel

#

i use it as a github submodule

summer sigil
#

Figuring out how to build the bvh is quite frustrating:/

merry laurel
north helm
summer sigil
merry laurel
#

very fun bleakekw

small osprey
#

just copy my code forgderp1

summer sigil
small osprey
#

lol

#

when yoinking code I find it helpful to just stare at the function I'm copying until I understand what it does KEKW

merry laurel
merry laurel
#

God I love breaking changes

#

No more fancy ui

#

I guess its time to also use live++

merry laurel
#

Got it integrated but it doesnt want to hot reload

#

it seems it cant find changes

merry laurel
#

Okay, after fighting with cmake and linker flags, I managed to load the translation units, only to learn that thanks to daxa I cant reload my code because my pch includes it everywhere KEKW

#

good riddance

#

that is going to be a lot of fixing

paper ore
#

time to write your own OS

#

which fixes all that : >

north helm
#

@merry laurel can your renderer render this ?

#

mine without culling and shading on a rtx 4070 is at 0.2fps

merry laurel
north helm
#

no

merry laurel
#

then no 💀

#

my asset manager will explode

north helm
#

it does not have uvs lol

#

i had to tweak my asset manager to load it

#

but this looks like the ultimate test for your renderer

merry laurel
#

I will try to give it some uvs and materials so my renderer survives

north helm
#

i gave it 0 0 for all uvs x)

merry laurel
#

I dont even handle that

#

gltf file has to have uvs otherwise boom

#

Also I hope it fits into my meshlet limit of 33M meshlets

north helm
#

this thing is so huge that it broke entt's ecs by having too many entities x)

merry laurel
#

My current test bench has 125k to 231k entities

#

So it should be fine

north helm
#

i dont have the count but i had to force the library to use uint64_t for entities

merry laurel
#

Oh okay then my CPU will suffer

north helm
#

when loading it was using 28G of ram

#

and maybe some swap

merry laurel
#

I iterate over all entities twice to check if dirty

#

I wanted to fix that but flecs wasn't behaving right

north helm
#

flecs ?

merry laurel
#

entt but better

north helm
#

why is it better ?

merry laurel
#

It scales better

#

Generally faster

north helm
#

oh okay

#

maybe i will change

merry laurel
#

it is bit weird to implement

#

compared to entt

north helm
#

do you have scene graph with your ecs ?

merry laurel
#

you mean hierarchy then yes

north helm
#

how do you manage this ?

merry laurel
#

elaborate more?

north helm
#

a scene graph is not data oriented so i cant see how it can fit with an ecs

#

your nodes contains only ecs ids ?

merry laurel
#

I use flecs to do some magic for me. You could imagine it as a database with queries, same as an sql select. Flecs does the heavy lifting to keep it fast in the data oriented paradigm

#

this is cpu only

#

gpu is plain data oriented

#

but I have some ecs hierarchy there

north helm
#

so flecs provides hierarchy methods ?

#

because entt is linear

merry laurel
#

yes

#

you can ask for parent components, etc..

north helm
#

ooooohhhhhhh

#

lets goooo

#

thats so cool, thank you for your advice

merry laurel
#

Now I will bully windows to give me more ram

#

where is other 46GB of ram

north helm
#

trying to load it in blender ?

paper ore
#

you could add a static page file with fixed min=max to extend that

merry laurel
north helm
#

does blender have culling and optimizations to render those huge meshes ?

paper ore
#

lvstri also had a 128gb page file iirc on top of his 128gb of physical ram to export ue city hehe

north helm
#

i just deleted my swapfile to avoid destroying my ssd x)

#

i dont know if i can still load this model

paper ore
#

you get frustum culling with addons

merry laurel
#

blender is so shitty software

#

at least for these big scenes

#

I am tempted to make my own blender

north helm
#

implement your nanite cope in a blender plugin

merry laurel
#

NO

paper ore
#

no better yet, coerce donmccurdy into rewriting the gltf import/export plugins to run natively not via python

north helm
#

💀

merry laurel
#

RamMap is goated software

#

now my computer is actually usable

merry laurel
#

Blender was already choking on caldera

#

Funny thing about caldera usda file

#

It has all the LODs but blender usd support sucks and will show you the worst

north helm
#

ooohhh i forgor the warzone map

#

i have to try it

merry laurel
#

good luck exporting it

north helm
#

do you have it ? 👉 👈 🥺

paper ore
#

yeah caldera was a pain in the ass

merry laurel
#

nope, gave up

paper ore
#

4hrs or so for the airfield piece

merry laurel
#

because of how stupid blender is

paper ore
#

single threaded too 🙂