Iris - A Journey through OpenGL and beyond to learn Graphics | Graphics Programming | Page 15

wicked notch Jan 8, 2024, 9:11 PM

#

I guess I should also do something about this 💀

wispy spear Jan 8, 2024, 9:17 PM

#

: )

wicked notch Jan 8, 2024, 9:17 PM

#

bistro be looking very square KEKW

wispy spear Jan 8, 2024, 9:17 PM

#

its the bistro square after all

#

ah and because you dont take transforms into account, the lights are gone here too : >

wicked notch Jan 8, 2024, 9:18 PM

#

yep :(

#

top priority right now is figuring out how UE deals with partitions bigger than the prim upper bound

#

and also fixing horrible meshlet islands

wispy spear Jan 8, 2024, 9:19 PM

#

when is next exam?

wicked notch Jan 8, 2024, 9:20 PM

#

23rd with a project deadline the 16th KEKW

wicked notch Jan 8, 2024, 9:50 PM

#

I also have to think about how to store this DAG

#

we also need a name for this tech

#

current candidates are:

Frognite
Nanofrog

#

I'll also ask Suslik tomorrow if he still remembers how his clusterizer worked

wispy spear Jan 8, 2024, 10:02 PM

#

Frogfrog

#

Nanonite

wicked notch Jan 8, 2024, 11:32 PM

#

I can't sleep like this

#

I have a trillion ideas I want to try for this but exams and I gotta sleep

#

why does life

frank sail Jan 8, 2024, 11:32 PM

#

when is the blog dropping btw

#

froge_yeehaw

wicked notch Jan 8, 2024, 11:33 PM

#

whenever I can form coherent sentences about graphs and kdtrees KEKW

#

I think I'll shelve the custom clusterizer for now and try using meshoptimizer's

#

maybe it's viable maybe it isn't, regardless I have about 20 more steps to get nanite

#

the next one is figuring out the DAG (and consequently meshlet borders, locking and adaptive simplification)

frank sail Jan 8, 2024, 11:35 PM

#

I'll work on lumen while you do nanite

wicked notch Jan 8, 2024, 11:36 PM

#

epic

frank sail Jan 8, 2024, 11:36 PM

#

games

wicked notch Jan 8, 2024, 11:36 PM

#

one day we'll merge our powers and combine VSM Lumen and Nanite

#

and then we'll get cease and desists from Epic bleakekw

frank sail Jan 8, 2024, 11:37 PM

#

have you determined if it's possible to implement a subset of nanite and still observe a benefit

#

i.e., reduce the work for smaller gains

#

because having any auto lod at all is very useful

wicked notch Jan 8, 2024, 11:37 PM

#

nanite is super scalable yeah

#

already I can decide the simplification factor

frank sail Jan 8, 2024, 11:38 PM

#

I mean in terms of implementation effort

wicked notch Jan 8, 2024, 11:38 PM

#

oh

#

I'm going for the simplest possible impl

#

So far it's a bumpy road

#

I think the major hurdles will be due to edge cases

#

don't care yet about those but I'll try to mitigate them when I see one

#

I also realized that I don't understand mesh shading at all agonyfrog

#

as in "what is a vertex index buffer"

#

"for indirection into the vertex buffer allowing vertex reuse"

#

is a very valid answer and the one I'd have given until yesterday

#

I gotta get a deeper understanding on that though to make good™️ partitioning

#

this doesn't matter if you just use meshopt for the initial clustering and "split" steps

#

although meshoptimizer is suboptimal for nanite, I confirmed that

#

long meshlets + potentially discontinuous islands are terrible

#

this means that two different meshlets might share two borders

frank sail Jan 8, 2024, 11:43 PM

#

is that why you were making your own clusteroni

wicked notch Jan 8, 2024, 11:43 PM

#

which breaks some assumptions

wicked notch Jan 8, 2024, 11:44 PM

#

frank sail is that why you were making your own clusteroni

I'm trying and failing kekkedsadge

#

I'll ask suslik for help tomorrow

frank sail Jan 8, 2024, 11:44 PM

#

sucking is the first step to being kinda good at something

wicked notch Jan 8, 2024, 11:46 PM

#

the two major issues with my own clusterizer right now are that my heuristics suck

#

and that partition size is not guaranteed

wicked notch Jan 8, 2024, 11:46 PM

#

wicked notch I guess I should also do something about this 💀

so this happens bleakekw

wispy spear Jan 8, 2024, 11:47 PM

#

does it really matter?

wicked notch Jan 8, 2024, 11:47 PM

#

yes

wispy spear Jan 8, 2024, 11:47 PM

#

when you merge all tringles into one list, and then just partition by 128 tringles

wicked notch Jan 8, 2024, 11:47 PM

#

mesh shaders have a strict upper bound

wispy spear Jan 8, 2024, 11:47 PM

#

and fill up with NANs at the end

wicked notch Jan 8, 2024, 11:48 PM

#

wispy spear and fill up with NANs at the end

that's one solution
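A sketch of the "merge everything and chunk by 128" idea. Instead of NaN positions, this hypothetical helper (`chunkAndPad` is not from any library) pads the last group with degenerate triangles (all three indices equal): they have zero area, so the rasterizer produces no fragments for them. The padding still wastes mesh-shader output slots, and plain chunking ignores spatial locality, which is why subdividing or a proper clusterizer is the better option.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

constexpr std::size_t kMaxTrianglesPerGroup = 128;

// Splits `indices` (3 per triangle) into groups of exactly kMaxTrianglesPerGroup triangles.
// The last group is padded with degenerate triangles (pad, pad, pad) that reuse its first index.
std::vector<std::vector<std::uint32_t>> chunkAndPad(const std::vector<std::uint32_t>& indices) {
    std::vector<std::vector<std::uint32_t>> groups;
    const std::size_t groupIndexCount = kMaxTrianglesPerGroup * 3;
    for (std::size_t start = 0; start < indices.size(); start += groupIndexCount) {
        const std::size_t end = std::min(start + groupIndexCount, indices.size());
        std::vector<std::uint32_t> group(indices.begin() + static_cast<std::ptrdiff_t>(start),
                                         indices.begin() + static_cast<std::ptrdiff_t>(end));
        const std::uint32_t pad = group.front();
        group.resize(groupIndexCount, pad); // fill the remaining slots with zero-area triangles
        groups.push_back(std::move(group));
    }
    return groups;
}
```

With 300 input triangles this yields three groups, the last holding 44 real triangles followed by 84 degenerate ones.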

frank sail Jan 8, 2024, 11:48 PM

#

wicked notch I guess I should also do something about this 💀

Subtract one from the primitive count smh

wicked notch Jan 8, 2024, 11:48 PM

#

subdividing might be a better one

wispy spear Jan 8, 2024, 11:48 PM

#

ah

#

that i even understand : )

#

or you smuggle in a little low poly frog mesh

#

in 127 variations, to fill up all possible scenarios

#

when the last meshlet only has 1, 2, 3, 4, 5 etc vertices then the frog127, frog126, frog125etc mesh goes here 🙂

frank sail Jan 8, 2024, 11:50 PM

#

would be funny to use little frogs as a debug mesh

wispy spear Jan 8, 2024, 11:50 PM

#

https://tenor.com/view/self-five-tina-fey-selfhighfive-30rock-gif-5277274


wicked notch Jan 8, 2024, 11:50 PM

#

I do use the good froge

wispy spear Jan 8, 2024, 11:50 PM

#

the big froge works nicely

wicked notch Jan 8, 2024, 11:50 PM

#

it's one primitive, it's reasonably complex and it has meshlets of all shapes and kinds

frank sail Jan 8, 2024, 11:51 PM

#

also that made me think about a possible low poly aesthetic for a game that uses as few vertices as possible to represent an object

wispy spear Jan 8, 2024, 11:51 PM

#

the 2 vertex castle

#

in the distance

frank sail Jan 8, 2024, 11:51 PM

#

VK_LINES castle

wispy spear Jan 8, 2024, 11:51 PM

#

that one big tower which towers over all the other parts of it

#

you keep increasing GL_LINE_WIDTH until its big enough to switch to actual mesh, as you come closer

#

lustri, write your ideas down and then focus on your exams

#

unless you can convince sir brain to hire you out of school before all that

twin bough Jan 9, 2024, 2:55 AM

#

budget unreal

#

damn

#

great job

wicked notch Jan 9, 2024, 11:39 AM

#

this be a meshlet

#

how do I find the edges of said meshlet bleakekw

#

do I have to take into account winding order?

wispy spear Jan 9, 2024, 11:48 AM

#

criver might know?

wicked notch Jan 9, 2024, 11:57 AM

#

perchance

wicked notch Jan 9, 2024, 12:12 PM

#

huh

#

maybe border edges can only connect two vertices

#

wait

#

I mean

#

border vertices can only be connected by two edges

#

or maybe 3

#

at most 3

#

can an inner vertex be connected by 3 edges though?

#

yes..

#

hmm

#

#

oh hold on

#

a boundary edge is only referenced by one primitive

#

so if I encounter the same edge twice when scanning primitives then it's not a boundary edge
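The boundary-edge rule above as a counting pass, in a minimal sketch assuming a flat triangle index list (`edgeKey` and `findBoundaryEdges` are hypothetical names). Each edge is stored with its smaller index first, so winding order doesn't matter, and any edge seen exactly once is a boundary edge.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Pack an undirected edge into one 64-bit key, smaller index in the high bits,
// so (a, b) and (b, a) produce the same key regardless of winding order.
std::uint64_t edgeKey(std::uint32_t a, std::uint32_t b) {
    if (a > b) {
        std::swap(a, b);
    }
    return (static_cast<std::uint64_t>(a) << 32) | b;
}

// Returns the keys of all edges referenced by exactly one triangle.
std::vector<std::uint64_t> findBoundaryEdges(const std::vector<std::uint32_t>& indices) {
    std::unordered_map<std::uint64_t, std::uint32_t> edgeCounts;
    for (std::size_t t = 0; t + 2 < indices.size(); t += 3) {
        const std::uint32_t i0 = indices[t], i1 = indices[t + 1], i2 = indices[t + 2];
        ++edgeCounts[edgeKey(i0, i1)];
        ++edgeCounts[edgeKey(i1, i2)];
        ++edgeCounts[edgeKey(i2, i0)];
    }
    std::vector<std::uint64_t> boundary;
    for (const auto& [key, count] : edgeCounts) {
        if (count == 1) {
            boundary.push_back(key);
        }
    }
    return boundary;
}
```

For a quad made of two triangles sharing one diagonal, this returns the four outer edges and drops the shared one.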

#

now if I have this index buffer [0, 1, 2]

#

How do I traverse it?

#

0 -> 1
1 -> 2
2 -> 0?

#

Rather how does the GPU do it

glass sphinx Jan 9, 2024, 12:27 PM

#

yes

#

wait

#

traverse in what sense

#

why would it need it?

#

it just rasterizes

wicked notch Jan 9, 2024, 12:28 PM

#

the rasterizer gets a list of vertices

#

let's say it's just a triangle

#

[0, 1, 2] is its index buffer

#

I guess the order of the indices defines the winding order?

glass sphinx Jan 9, 2024, 12:29 PM

#

yea thats it

wicked notch Jan 9, 2024, 12:29 PM

#

assuming the vertices stay static

glass sphinx Jan 9, 2024, 12:29 PM

#

but that is defined by the positions as well

frank sail Jan 9, 2024, 12:29 PM

#

the winding order is determined by the signed area of the triangle

#

which you can calculate from the determinant of the vertices or sumshit
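The determinant remark, worked through for 2D positions: twice the signed area is the z-component of the cross product of two triangle edges. This small sketch (the `Vec2` type and function are illustrative) shows that rotating the indices keeps the sign while swapping two of them flips it. Positive means counter-clockwise in a y-up frame; a y-down screen space flips the convention.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Vec2 {
    float x, y;
};

// Twice the signed area of the triangle referenced by (i0, i1, i2):
// > 0 counter-clockwise (y-up), < 0 clockwise, 0 degenerate.
float doubleSignedArea(const std::vector<Vec2>& positions, std::uint32_t i0, std::uint32_t i1, std::uint32_t i2) {
    const Vec2& p0 = positions[i0];
    const Vec2& p1 = positions[i1];
    const Vec2& p2 = positions[i2];
    return (p1.x - p0.x) * (p2.y - p0.y) - (p2.x - p0.x) * (p1.y - p0.y);
}
```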

wicked notch Jan 9, 2024, 12:31 PM

#

right yeah

#

but I mean

#

the signed area changes if I swap around indices in the index buffer

frank sail Jan 9, 2024, 12:31 PM

#

it do

wicked notch Jan 9, 2024, 12:31 PM

#

ok I think I can do this maybe

frank sail Jan 9, 2024, 12:31 PM

#

yes you can

wicked notch Jan 9, 2024, 12:31 PM

#

where's my hashmap

wicked notch Jan 9, 2024, 12:50 PM

#

my man Federico was rendering 373-million-triangle meshes in 2008 with a fucking 6800 GT and OpenGL 2

wicked notch Jan 9, 2024, 1:15 PM

#

it works

#

holy shit it works

#

let's fucking go

#

don't mind the uh

#

the uh

#

```cpp
auto clusterEdges = std::vector<std::vector<std::pair<std::uint32_t, std::uint32_t>>>(currentClusters.size());
// note: the standard library provides no std::hash for std::pair, so this map needs a custom hasher to compile
auto clusterEdgeCounts = std::vector<std::unordered_map<std::pair<std::uint32_t, std::uint32_t>, std::uint32_t>>(currentClusters.size());
```

#

KEKW

#

it's ok this is at build time it isn't meant to be performant

#

it's not that I can't code

#

nope, totally unrelated

pale horizon Jan 9, 2024, 1:19 PM

#

Cache: sayonara...

wicked notch Jan 9, 2024, 1:20 PM

#

might as well write CacheMiss<CacheMiss<CacheMiss<uint32>>>

pale horizon Jan 9, 2024, 1:20 PM

#

List<List<Pair<
List<HashMap<...

hallow umbra Jan 9, 2024, 1:20 PM

#

and a diet coke

frank sail Jan 9, 2024, 1:20 PM

#

buy one of the X3D cpus for epic cache to minimize the damage of this situation

wicked notch Jan 9, 2024, 1:21 PM

#

true

#

buy AMD

#

(not sponsored)

#

ok now we can partition

#

and then make a shrimplify and split

#

and then make a dag

#

then runtime LOD selection

#

very easy

wicked notch Jan 9, 2024, 2:07 PM

#

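```cpp
// Assumes meshoptimizer-style clusters (meshopt_Meshlet: vertex_offset, triangle_offset, triangle_count)
// with `meshletVertices` / `meshletTriangles` arrays. The edge0..edge2 definitions are an assumption
// (they were missing from the snippet): they map local indices back to global vertex IDs so that
// edges can be compared across clusters. Needs <algorithm>, <cstdint>, <execution>, <utility>, <vector>.
for (auto index = 0; const auto& cluster : currentClusters) {
    auto clusterEdges = std::vector<std::uint64_t>();
    for (auto k = 0u; k < cluster.triangle_count; ++k) {
        const auto basePrimitiveId = cluster.triangle_offset + k * 3;
        const auto v0 = meshletVertices[cluster.vertex_offset + meshletTriangles[basePrimitiveId + 0]];
        const auto v1 = meshletVertices[cluster.vertex_offset + meshletTriangles[basePrimitiveId + 1]];
        const auto v2 = meshletVertices[cluster.vertex_offset + meshletTriangles[basePrimitiveId + 2]];
        auto edge0 = std::pair{v0, v1};
        auto edge1 = std::pair{v1, v2};
        auto edge2 = std::pair{v2, v0};
        // canonicalize each edge (smaller index first) so winding order doesn't matter
        if (edge0.first > edge0.second) {
            std::swap(edge0.first, edge0.second);
        }
        if (edge1.first > edge1.second) {
            std::swap(edge1.first, edge1.second);
        }
        if (edge2.first > edge2.second) {
            std::swap(edge2.first, edge2.second);
        }
        clusterEdges.emplace_back(static_cast<std::uint64_t>(edge0.first) << 32 | edge0.second);
        clusterEdges.emplace_back(static_cast<std::uint64_t>(edge1.first) << 32 | edge1.second);
        clusterEdges.emplace_back(static_cast<std::uint64_t>(edge2.first) << 32 | edge2.second);
    }
    // after sorting, an edge shared by two triangles appears twice in a row
    std::sort(std::execution::par, clusterEdges.begin(), clusterEdges.end());
    auto clusterBorderEdges = std::vector<std::uint64_t>();
    clusterBorderEdges.reserve(clusterEdges.size());
    // every cluster has at least one triangle, so clusterEdges.size() >= 3 here
    if (clusterEdges[0] != clusterEdges[1]) {
        clusterBorderEdges.emplace_back(clusterEdges[0]);
    }
    // an edge that differs from both neighbours occurs exactly once: it is a border edge
    for (std::size_t k = 1; k + 1 < clusterEdges.size(); ++k) {
        const auto& previousEdge = clusterEdges[k - 1];
        const auto& edge = clusterEdges[k];
        const auto& nextEdge = clusterEdges[k + 1];
        if (edge != previousEdge && edge != nextEdge) {
            clusterBorderEdges.emplace_back(edge);
        }
    }
    if (clusterEdges[clusterEdges.size() - 2] != clusterEdges[clusterEdges.size() - 1]) {
        clusterBorderEdges.emplace_back(clusterEdges[clusterEdges.size() - 1]);
    }
    // clusterBorderEdges (sorted) is the border of cluster `index`; store it as needed
    ++index;
}
```
I made it less cancer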

#

I guess now that I sort this I can make a sliding window search to look for shared meshlet borders
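With each cluster's border edges already sorted, the shared border of two clusters can be found with a single linear merge rather than hash lookups. A sketch using `std::set_intersection` (the `sharedBorder` wrapper is hypothetical); both inputs must be sorted and use the same packed 64-bit edge keys as the snippet above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <vector>

// a, b: sorted packed edge keys (min << 32 | max) of two clusters' borders.
// Returns the edges both clusters have in common, i.e. their shared border.
std::vector<std::uint64_t> sharedBorder(const std::vector<std::uint64_t>& a,
                                        const std::vector<std::uint64_t>& b) {
    std::vector<std::uint64_t> shared;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(shared));
    return shared;
}
```

The size of the result is also a natural edge weight for the cluster graph.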

#

although borders can get pretty big

#

eh it's fine, CPUs are fast KEKW

wicked notch Jan 9, 2024, 2:51 PM

#

and since I'm already breaking half of the GLTF spec, why not break it even more, I'll define custom attributes to store dual graph topology & stuff

wicked notch Jan 9, 2024, 3:13 PM

#

hmm I guess edge weights now are the number of shared boundary edges

#

wait hold up

#

```rs
uniqueEdges = List<HashMap<uint64, uint64>>();
for (index, cluster) in clusters {
  clusterEdges = HashMap<uint64, uint64>();
  for (id, _) in cluster.primitives {
    const auto edges = [
      sort((cluster.triangles[id * 3 + 0], cluster.triangles[id * 3 + 1])),
      sort((cluster.triangles[id * 3 + 1], cluster.triangles[id * 3 + 2])),
      sort((cluster.triangles[id * 3 + 2], cluster.triangles[id * 3 + 0])),
    ];
    for edge in edges {
      ++clusterEdges[(edge.0 << 32) | edge.1];
    }
  }
  // keep only edges used by exactly one triangle: the cluster's border
  uniqueEdges[index] = clusterEdges
    .erase((edge, count) => count > 1);
}
clusterBorders = HashMap<uint64, List<(uint64, List<uint64>)>>();
for (i, _) in clusters {
  for (j, _) in clusters {
    if i == j {
      continue;
    }
    // fresh list for every neighbouring cluster j
    sharedEdges = List<uint64>();
    for edge in uniqueEdges[i] {
      if uniqueEdges[j].contains(edge) {
        sharedEdges.push(edge);
      }
    }
    if sharedEdges.isEmpty() {
      continue;
    }
    clusterBorders[i].push((j, sharedEdges));
  }
}
```

#

damn

#

wait no still wrong

#

shit

#

ok now it's good (maybe)

#

ok pseudocode helps

wicked notch Jan 9, 2024, 3:55 PM

#

```cpp
std::unordered_map<std::uint64_t, std::vector<std::pair<std::uint64_t, std::vector<std::uint64_t>>>>()
```

#

it got worse

wispy spear Jan 9, 2024, 5:30 PM

#

use using

#

: )

wicked notch Jan 9, 2024, 5:32 PM

#

That's heresy

wispy spear Jan 9, 2024, 5:34 PM

#

i like how you exaggerated with the std:: for uint64_t 😄

wicked notch Jan 9, 2024, 5:37 PM

#

I could've copy pasted my own integer types

#

but I feel like it's too late now bleakekw

#

Anyways I bothered Suslik again

#

~~I hope he doesn't kill me~~

wicked notch Jan 9, 2024, 6:19 PM

#

shit

#

meshopt_simplify doesn't have a way for me to lock edges

#

fffffuck

wicked notch Jan 9, 2024, 8:07 PM

#

how far is the future zeux

#

it's been two weeks

#

agonyfrog

wispy spear Jan 9, 2024, 8:18 PM

#

what were you asking? 😄

wicked notch Jan 9, 2024, 8:22 PM

#

It's someone else's PR

#

apparently a company is making nanite too bleakekw

#

https://github.com/zeux/meshoptimizer/pull/601
https://blog.traverseresearch.nl/creating-a-directed-acyclic-graph-from-a-mesh-1329e57286e5

frank sail Jan 9, 2024, 8:24 PM

#

I bet you could get hired at traverse

#

Then implement nanite for them

wicked notch Jan 9, 2024, 8:41 PM

#

it's a start

#

alright

wispy spear Jan 9, 2024, 8:42 PM

#

we soon need a meshlet containing emoji

wicked notch Jan 9, 2024, 8:45 PM

#

shit

#

the partitions are wrong

#

how tf

#

ok time to go back to balls

#

I might need bigger balls though

#

goddamnit

#

it's the disconnected meshlets

#

fuuuuck

#

meshopt_simplify also fails spectacularly to halve the number of triangles

#

damn

#

it's going wrong in so many different places at once froge_bleak

glass sphinx Jan 9, 2024, 9:23 PM

#

wicked notch I might need bigger balls though

https://tenor.com/view/cat-cat-memes-cat-images-cat-meme-gif-4644773688486402896


wicked notch Jan 9, 2024, 9:28 PM

#

absolutely fantastic

#

beautiful even

wicked notch Jan 9, 2024, 9:36 PM

#

wicked notch meshopt_simplify also fails spectacularly to halve the number of triangles

nevermind

#

my ability to code dwindles with every second

#

I don't even know why I assumed meshoptimizer was wrong instead of my ridiculously dodgy code

#

meshopt_simplify works so flawlessly it's scary

#

alright

#

no seams/tears

#

let's fucking gooo

#

but suzanne had a bit of a lobotomy KEKW

#

alright moment of truth

#

let's randomize vertex selection

#

those traverse guys are full of shit

#

beautiful unchanged borders

#

we're nearly done boys

#

all I have to do is the goddamn DAG and we win

wicked notch Jan 9, 2024, 10:27 PM

#

vertex weights are bad somehow

#

I'm having way too much fun

#

this is glorious

wispy spear Jan 9, 2024, 10:33 PM

#

: )

#

that makes me happy

#

is that froge again?

wicked notch Jan 9, 2024, 10:33 PM

#

this is just a ball KEKW

wispy spear Jan 9, 2024, 10:33 PM

#

oukay : )

wicked notch Jan 9, 2024, 10:34 PM

#

we're inside a ball, looking out

#

it's easier to visualize this way

wispy spear Jan 9, 2024, 10:34 PM

#

yeah i can see we are inside

wicked notch Jan 9, 2024, 10:34 PM

#

man this is too good

#

we're almost there boys

#

and then I'll replace every one of your renderers with this

wispy spear Jan 9, 2024, 10:34 PM

#

all this meshletism is because the gpu has an easier time scheduling work for those smoller islands of verticles?

wicked notch Jan 9, 2024, 10:35 PM

#

ye

#

also better/easier culling
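As an example of the "easier culling" point: once geometry is split into small meshlets, each meshlet can carry its own bounding sphere and be rejected against a frustum plane before any of its triangles are touched, which is far finer-grained than culling a whole mesh primitive. A sketch with illustrative types (`Vec3`, `Sphere`, `Plane` and both functions are hypothetical); the sphere is a loose centroid-based one, not a minimal enclosing sphere.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

struct Vec3 {
    float x, y, z;
};

struct Sphere {
    Vec3 center;
    float radius;
};

// Plane with unit normal n; points with dot(n, p) + d >= 0 are inside.
struct Plane {
    Vec3 n;
    float d;
};

// Loose bounding sphere of a non-empty meshlet: centroid, radius = distance to the farthest vertex.
Sphere meshletBounds(const std::vector<Vec3>& positions) {
    Vec3 c{0.0f, 0.0f, 0.0f};
    for (const Vec3& p : positions) {
        c.x += p.x;
        c.y += p.y;
        c.z += p.z;
    }
    const float inv = 1.0f / static_cast<float>(positions.size());
    c = {c.x * inv, c.y * inv, c.z * inv};
    float r = 0.0f;
    for (const Vec3& p : positions) {
        const float dx = p.x - c.x, dy = p.y - c.y, dz = p.z - c.z;
        r = std::max(r, std::sqrt(dx * dx + dy * dy + dz * dz));
    }
    return {c, r};
}

// True if the whole sphere lies outside the plane, i.e. the meshlet can be skipped.
bool culledByPlane(const Sphere& s, const Plane& pl) {
    const float dist = pl.n.x * s.center.x + pl.n.y * s.center.y + pl.n.z * s.center.z + pl.d;
    return dist < -s.radius;
}
```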

wispy spear Jan 9, 2024, 10:35 PM

#

yeah

#

it sounds counter intuitive at first

#

more vertex groups to be aabb or whatever checked

#

than just the original mesh primitives

#

i cant wait to unlock this achievement too hehe

wicked notch Jan 9, 2024, 10:37 PM

#

I'll delegate Jaker to write documentation for my nanite

#

so that you can go ahead and implement it

wispy spear Jan 9, 2024, 10:39 PM

#

: D

#

Saky should implement it

wicked notch Jan 9, 2024, 10:40 PM

#

oh yeah, potrick and saky were working on nanite 2 as well

wispy spear Jan 9, 2024, 10:41 PM

#

were?

wicked notch Jan 9, 2024, 10:42 PM

#

saky got stuck writing VSM then KEKW

#

and potrick got the culling brainworm

#

alright this is it for today

#

tomorrow I'll do stable partitioning and cross cluster boundary edge weights

#

and maybe I'll figure out how to hack glTF to store le epic clusters

wispy spear Jan 9, 2024, 10:49 PM

#

sounds like a plan

wicked notch Jan 9, 2024, 10:49 PM

#

now it's time to study

#

until I fall asleep

wispy spear Jan 9, 2024, 10:50 PM

#

https://tenor.com/view/boy-math-school-asia-study-gif-18879119


wicked notch Jan 9, 2024, 10:50 PM

#

wispy spear https://tenor.com/view/boy-math-school-asia-study-gif-18879119

real

wicked notch Jan 10, 2024, 12:04 PM

#

hmm maybe the usual limits are causing clusters to be too smol

#

with 128/128 I get sparser but more consistent clusters

#

the usual game of tradeoffs

wicked notch Jan 10, 2024, 1:24 PM

#

ok 128/128 is gud enough

#

now

#

what do I do

#

store everything needed to build the DAG and build it at runtime
serialize the DAG and somehow also the vertex/index data referenced by said DAG

#

I feel like the first one is easier

wicked notch Jan 10, 2024, 2:02 PM

#

friggin METIS is crashing again on edge weights

#

ok time for asan

wicked notch Jan 10, 2024, 3:50 PM

#

goddamnit

#

this shit is so slow

#

```cpp
for (auto k = 0; k < meshClusters.size(); ++k) {
    auto borderingClusters = std::vector<std::pair<std::uint64_t, std::vector<std::uint64_t>>>();
    for (auto z = 0; z < meshClusters.size(); ++z) {
        if (k == z) {
            continue;
        }
        auto sharedEdges = std::vector<std::uint64_t>();
        for (const auto& edge : uniqueEdges[k]) {
            if (uniqueEdges[z].contains(edge)) {
                sharedEdges.emplace_back(edge);
            }
        }
        if (sharedEdges.empty()) {
            continue;
        }
        borderingClusters.emplace_back(z, std::move(sharedEdges));
    }
    clusterBorders[k] = std::move(borderingClusters);
}
```

#

and no shit

#

O(n^2)

#

is indeed slow KEKW
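One way around the O(n^2) pair loop (a sketch, not what the code above does): invert the mapping. Walk every cluster's border edges once, remember which cluster first claimed each edge, and when a later cluster hits the same edge, bump the weight of that cluster pair. The cost is roughly linear in the total number of border edges, plus a log factor from the ordered `std::map`. `buildClusterAdjacency` is a hypothetical name; an edge shared by more than two clusters (non-manifold geometry) only links back to its first owner here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <unordered_map>
#include <utility>
#include <vector>

// borderEdges[c] holds the packed border-edge keys of cluster c.
// Returns, for every pair of touching clusters (lo, hi), the number of edges they share.
std::map<std::pair<std::size_t, std::size_t>, std::uint32_t>
buildClusterAdjacency(const std::vector<std::vector<std::uint64_t>>& borderEdges) {
    std::unordered_map<std::uint64_t, std::size_t> edgeOwner; // edge -> first cluster that had it
    std::map<std::pair<std::size_t, std::size_t>, std::uint32_t> weights;
    for (std::size_t cluster = 0; cluster < borderEdges.size(); ++cluster) {
        for (const std::uint64_t edge : borderEdges[cluster]) {
            const auto [it, inserted] = edgeOwner.try_emplace(edge, cluster);
            if (!inserted && it->second != cluster) {
                // clusters are visited in increasing order, so the owner is the smaller id
                ++weights[{it->second, cluster}];
            }
        }
    }
    return weights;
}
```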

wicked notch Jan 10, 2024, 4:31 PM

#

Ah my weights are inverted

#

shiet

wispy spear Jan 10, 2024, 5:31 PM

#

I'll bring back the idea of 127 variants of a frog, to fill up the remaining vertices in a meshlet

wicked notch Jan 10, 2024, 5:42 PM

#

no worries

#

I have fixed everything

#

™️

#

ok now I really have to start to think on how to store this shit

#

watching meshlets at different LODs interact perfectly is fun

#

but I want to see them switching at runtime

#

so

#

glTF

#

wonderful format

#

how do I shove meshlet in it

wicked notch Jan 10, 2024, 7:43 PM

#

wicked notch I have fixed everything

turns out, I didn't fix everything

primal shadow Jan 10, 2024, 7:57 PM

#

wicked notch how do I shove meshlet in it

Bevy is working on a meshlet extension for gltf for our use case

wicked notch Jan 10, 2024, 8:01 PM

#

epic

#

unfortunately I am going insane

#

so the border issue is irrelevant

#

but the "generating good meshlets" issue is still alive and kicking

frank sail Jan 10, 2024, 8:02 PM

#

wicked notch friggin METIS is crashing again on edge weights

I thought you stopped using METIS

wicked notch Jan 10, 2024, 8:03 PM

#

I'm using both METIS and my own graph algos

#

some stuff is just painful to implement KEKW

frank sail Jan 10, 2024, 8:03 PM

#

ah

wicked notch Jan 10, 2024, 8:04 PM

#

but it's ok, METIS isn't giving much trouble now that I know what the cryptic shit Karypis wrote in his document means

#

anyways the issue is

#

neither METIS nor I can generate functional meshlets

#

as in, I cannot generate meshlets that stay under the 128 primitive limit

#

I can't enforce that limit

#

further still, meshoptimizer still greatly prefers vertex reuse to spatial locality

primal shadow Jan 10, 2024, 8:10 PM

#

Have you seen the algorithms in this paper? https://github.com/Senbyo/meshletmaker/blob/main/README.md


wicked notch Jan 10, 2024, 8:11 PM

#

I have the paper open in my browser but I still have to read it kekkedsadge

#

maybe I'll read it now, draw inspiration from it

#

Sponza is also a ridiculous mesh

frank sail Jan 10, 2024, 8:26 PM

#

are the big triangles making you down

wicked notch Jan 10, 2024, 8:26 PM

#

no

#

it's the pillars

#

it's a single mesh

#

so the clusterizer just assumes it's contiguous

#

but it isn't

frank sail Jan 10, 2024, 8:27 PM

#

is dis old sponza

wicked notch Jan 10, 2024, 8:27 PM

#

ye

frank sail Jan 10, 2024, 8:27 PM

#

wicked notch so the clusterizer just assumes it's contiguous

is that your code

wicked notch Jan 10, 2024, 8:28 PM

#

it's meshoptimizer's

#

but the same issue happens with graph partitioning

#

the absolute funniest thing

#

is that Unreal doesn't give a shit

#

and just clusterizes all of sponza at once

frank sail Jan 10, 2024, 8:29 PM

#

hmm so make a big soup and then submit the triangles to the clusteroni

wicked notch Jan 10, 2024, 8:29 PM

#

yeah

#

#

it's just one primitive

#

which is great, but you still have discontinuous islands in your graph

#

the pillars and the arches share no vertices

#

for some goddamn reason
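Detecting those islands up front is cheap. This minimal union-find sketch (hypothetical helper, not meshoptimizer or Unreal code) labels every triangle with its connected component, where two triangles are connected if they reference a common vertex index. Pieces exported with duplicated rather than shared vertices count as separate islands even if they touch geometrically.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <unordered_map>
#include <utility>
#include <vector>

// Disjoint-set forest with path halving.
struct DisjointSet {
    std::vector<std::size_t> parent;
    explicit DisjointSet(std::size_t n) : parent(n) { std::iota(parent.begin(), parent.end(), std::size_t{0}); }
    std::size_t find(std::size_t x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];
            x = parent[x];
        }
        return x;
    }
    void unite(std::size_t a, std::size_t b) { parent[find(a)] = find(b); }
};

// Returns one island label per triangle, plus the number of islands.
std::pair<std::vector<std::size_t>, std::size_t> labelIslands(const std::vector<std::uint32_t>& indices) {
    const std::size_t triangleCount = indices.size() / 3;
    DisjointSet sets(triangleCount);
    std::unordered_map<std::uint32_t, std::size_t> firstTriangleOfVertex;
    for (std::size_t t = 0; t < triangleCount; ++t) {
        for (std::size_t corner = 0; corner < 3; ++corner) {
            const auto [it, inserted] = firstTriangleOfVertex.try_emplace(indices[t * 3 + corner], t);
            if (!inserted) {
                sets.unite(t, it->second); // shares a vertex with an earlier triangle
            }
        }
    }
    std::unordered_map<std::size_t, std::size_t> rootToLabel;
    std::vector<std::size_t> labels(triangleCount);
    for (std::size_t t = 0; t < triangleCount; ++t) {
        labels[t] = rootToLabel.try_emplace(sets.find(t), rootToLabel.size()).first->second;
    }
    return {labels, rootToLabel.size()};
}
```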

frank sail Jan 10, 2024, 8:30 PM

#

so don't put them in the same cluster

#

shrimple as that

wicked notch Jan 10, 2024, 8:31 PM

#

and if I don't I get bad partitioning later because something that should have been a border, actually isn't

#

bad partitioning = less than ideal partitions

#

which in turn restricts the simplifier from simplifying

#

and in turn the split step cannot actually split anything due to the mesh being too simple

wicked notch Jan 10, 2024, 9:09 PM

#

ok I figured out how to make the micro index buffer

#

thank you zeux for commenting your code KEKW
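For reference, the idea behind a meshlet's micro index buffer: each meshlet keeps a small list of the global vertex indices it uses, and its triangles index into that list with 8-bit local indices, which is enough for a 128-vertex limit. This is a simplified sketch of the concept with made-up names, not meshoptimizer's actual `meshopt_buildMeshlets` code.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

struct LocalMeshlet {
    std::vector<std::uint32_t> vertices; // global vertex indices, in first-use order
    std::vector<std::uint8_t> triangles; // 3 local indices per triangle, each < vertices.size()
};

// Converts one meshlet's global triangle indices into a vertex list + micro index buffer.
// Assumes the caller already guaranteed at most 256 unique vertices.
LocalMeshlet buildMicroIndices(const std::vector<std::uint32_t>& globalIndices) {
    LocalMeshlet meshlet;
    std::unordered_map<std::uint32_t, std::uint8_t> globalToLocal;
    meshlet.triangles.reserve(globalIndices.size());
    for (const std::uint32_t global : globalIndices) {
        const auto [it, inserted] =
            globalToLocal.try_emplace(global, static_cast<std::uint8_t>(meshlet.vertices.size()));
        if (inserted) {
            meshlet.vertices.push_back(global); // first time this meshlet sees the vertex
        }
        meshlet.triangles.push_back(it->second);
    }
    return meshlet;
}
```

On the GPU, a vertex is then fetched as `vertexBuffer[vertices[micro]]`, so shared vertices inside a meshlet are loaded once.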

#

new approach

wicked notch Jan 10, 2024, 9:42 PM

#

I think I cracked it

#

ok maybe I did crack it

#

This is unreal's ball

#

#

This is mine

#

I think this is good

buoyant summit Jan 10, 2024, 9:56 PM

#

wow looks like your clusters are more rectangular too........

#

instead of a more stretched aspect ratio

wicked notch Jan 10, 2024, 9:56 PM

#

Unreal generates twice as many clusters as well for some reason

#

```
LogStaticMesh:   Input: 4130 Clusters, 520344 Triangles and 1199795 Vertices
LogStaticMesh:   Output without splits: 4130 Clusters, 520344 Triangles and 1199795 Vertices
LogStaticMesh:   Output with splits: 8233 Clusters, 520344 Triangles and 1202735 Vertices
LogStaticMesh: Material Stats - Unique Materials: 1, Fast Path Clusters: 8233, Slow Path Clusters: 0, 1 Material: 8233, 2 Materials: 0, 3 Materials: 0, At Least 4 Materials: 0
```

wicked notch Jan 10, 2024, 10:21 PM

#

I gotta look at UE's source to figure out how the hell they're coercing their graph partitioning to stay true to the primitive limit

wicked notch Jan 10, 2024, 10:38 PM

#

ok they just bisect the graph when it's greater than the expected partition size

#

figures

wicked notch Jan 11, 2024, 12:27 AM

#

ok, random ideas I got while I was studying

#

One DAG per glTF primitive
the DAG only has one root node if the primitive has fewer than 128 triangles, or two leaf nodes and one root node if it has more than 128 but fewer than 255
The simplifier reaching the target goal doesn't matter
Each partition must be composed of at least 4 clusters
Make a better algo for searching connected clusters, because O(n^2) where n=numClusters is absolute shit (it's probably parallelizable)

wispy spear Jan 11, 2024, 11:47 AM

#

do you have to find all these clusters during runtime????

#

is all offline nonsense right?

wicked notch Jan 11, 2024, 11:50 AM

#

ye this is all offline

wispy spear Jan 11, 2024, 11:53 AM

#

then dont put too much brain-cell-lets into this

wicked notch Jan 11, 2024, 5:05 PM

#

I think

#

maybe I'm approaching this in a wrong way

#

instead of hoping partitioning and clustering will work with generic meshes of a generic triangle count

#

why not make sure everything is a multiple of 128

#

so basically subdivision to make sure a mesh is always at least 128 triangles

#

let me check what happens if I feed unreal engine a single triangle

#

ok unreal doesn't tessellate/displace a cube or a triangle

#

I assume they simply skip all the steps and make the DAG with a single root node

#

yup

#

```
LogStaticMesh: Display: Building static mesh Mesh...
LogStaticMesh: Adjacency [0.00s], tris: 192, UVs 2
LogStaticMesh: Clustering [0.00s]. Ratio: 1.000000
LogStaticMesh: Leaves [0.00s]
LogStaticMesh: Reduce [0.00s]
LogStaticMesh: Fallback 0/1 [0.00s], num tris: 192
LogStaticMesh: ConstrainClusters:
LogStaticMesh:   Input: 3 Clusters, 318 Triangles and 277 Vertices
LogStaticMesh:   Output without splits: 3 Clusters, 318 Triangles and 277 Vertices
LogStaticMesh:   Output with splits: 3 Clusters, 318 Triangles and 277 Vertices
LogStaticMesh: Material Stats - Unique Materials: 1, Fast Path Clusters: 3, Slow Path Clusters: 0, 1 Material: 3, 2 Materials: 0, 3 Materials: 0, At Least 4 Materials: 0
```

#

if a mesh is >128 but <255 then Unreal does some shenanigans

#

I assume this is simple subdivision and not catmull-clark

buoyant summit Jan 11, 2024, 5:17 PM

#

catmull-clark subdivision would obviously be bad because it changes the surface

#

(as in the set of points)

#

tbh I kinda hate catmull-clark even when using it as an authoring tool

#

would be nice to have a subdivision algorithm that, as you subdivide, tends to where you'd trace your shadows basically

wicked notch Jan 11, 2024, 5:20 PM

#

yeah that would be nice

buoyant summit Jan 11, 2024, 5:20 PM

#

the last two lines are me going on a tangent, just to be clear, ignore those

wicked notch Jan 11, 2024, 5:20 PM

#

no more scuffed shadows on spheres

buoyant summit Jan 11, 2024, 5:21 PM

#

scuffed shadows on curved surfaces is already a solved problem

wicked notch Jan 11, 2024, 5:21 PM

#

anyways I'll try looking for some subdivision algorithms/libraries that can guarantee me N output triangles

#

any suggestions?

buoyant summit Jan 11, 2024, 5:23 PM

#

tbh I'm not sure that can exist

#

for general N

wicked notch Jan 11, 2024, 5:23 PM

#

yep "guarantee" is a strong word

#

I want something that can at least approach N

buoyant summit Jan 11, 2024, 5:24 PM

#

you could just split random edges (preferably ones that have big triangles lying on them, so that your triangles tend to be about the same size) until you arrive at N

wicked notch Jan 11, 2024, 5:24 PM

#

hmm

#

makes sense

#

```rs
if (primitiveCount % 128 > 32) {
  area = CalculatePrimitiveArea(mesh);
  edges = CalculateWeightedPrimitiveAdjacency(mesh, area);
  subdivided = SubdivideMesh(mesh, edges);
}
```
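The edge-splitting suggestion (split edges, preferring big triangles, until the count reaches N) can be sketched as follows. This is a hypothetical helper, not Unreal's method: it repeatedly takes the largest-area triangle, splits its longest edge at the midpoint, and splits every triangle using that edge, so no T-junctions appear and the surface is unchanged (midpoints lie on the original flat triangles). Each step adds one triangle per face on that edge, so it can overshoot N by one; it is O(N·T) and only positions are handled.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

struct P3 {
    float x, y, z;
};

static P3 sub(const P3& a, const P3& b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static float length(const P3& v) { return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z); }
static float area(const P3& a, const P3& b, const P3& c) {
    const P3 u = sub(b, a), v = sub(c, a);
    const P3 n{u.y * v.z - u.z * v.y, u.z * v.x - u.x * v.z, u.x * v.y - u.y * v.x};
    return 0.5f * length(n);
}

// Splits edges until the mesh has at least targetTriangles triangles.
void splitUntil(std::vector<P3>& positions, std::vector<std::uint32_t>& indices, std::size_t targetTriangles) {
    while (!indices.empty() && indices.size() / 3 < targetTriangles) {
        // 1. find the largest triangle
        std::size_t best = 0;
        float bestArea = -1.0f;
        for (std::size_t t = 0; t < indices.size(); t += 3) {
            const float a = area(positions[indices[t]], positions[indices[t + 1]], positions[indices[t + 2]]);
            if (a > bestArea) {
                bestArea = a;
                best = t;
            }
        }
        // 2. pick its longest edge (ea, eb)
        std::uint32_t ea = 0, eb = 0;
        float longest = -1.0f;
        for (std::size_t k = 0; k < 3; ++k) {
            const std::uint32_t a = indices[best + k], b = indices[best + (k + 1) % 3];
            const float len = length(sub(positions[a], positions[b]));
            if (len > longest) {
                longest = len;
                ea = a;
                eb = b;
            }
        }
        // 3. append the midpoint (computed before push_back, which may reallocate)
        const P3 mid{(positions[ea].x + positions[eb].x) * 0.5f, (positions[ea].y + positions[eb].y) * 0.5f,
                     (positions[ea].z + positions[eb].z) * 0.5f};
        const auto m = static_cast<std::uint32_t>(positions.size());
        positions.push_back(mid);
        // 4. split every original triangle containing (ea, eb): (a, b, c) -> (a, m, c) + (m, b, c)
        const std::size_t originalCount = indices.size();
        for (std::size_t t = 0; t < originalCount; t += 3) {
            for (std::size_t k = 0; k < 3; ++k) {
                const std::uint32_t a = indices[t + k], b = indices[t + (k + 1) % 3], c = indices[t + (k + 2) % 3];
                if ((a == ea && b == eb) || (a == eb && b == ea)) {
                    indices[t] = a;
                    indices[t + 1] = m;
                    indices[t + 2] = c;
                    indices.push_back(m);
                    indices.push_back(b);
                    indices.push_back(c);
                    break;
                }
            }
        }
    }
}
```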

buoyant summit Jan 11, 2024, 5:31 PM

#

btw obviously you can subdivide an edge not into two edges but e.g. 3 edges

#

or 5 edges

#

maybe that's worthwhile

wispy spear Jan 11, 2024, 5:34 PM

#

this is not relevant or is it? https://www.youtube.com/watch?v=FFWgQZsfwy8 for some reason i watched it last night


Jonathan Dupuy

A Halfedge Refinement Rule for Parallel Catmull Clark Subdivision (...


wicked notch Jan 11, 2024, 5:38 PM

#

this is catmull-clark which changes the surface so unfortunately not for me

#

but maybe I can apply this knowledge

primal shadow Jan 12, 2024, 3:26 AM

#

@wicked notch do you have a reference for calculating tangents from screen space derivatives?

#

Something that can be used with the mikktspace normal system bevy already uses

wicked notch Jan 12, 2024, 10:30 AM

#

I used this: http://www.thetenthplanet.de/archives/1180 there are some precision issues you'll have to fix if you want a decent result

#

Otherwise I now prefer just calculating tangents on the host if they're missing

wicked notch Jan 12, 2024, 11:47 AM

#

so here's another thing I didn't understand at first

#

you don't use a graph partitioning algorithm to do clusters and hope they don't go over 128

#

you use a graph partitioning algorithm to recursively bisect a graph until the partitions are the right size
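The recursive-bisection idea in its simplest form: keep splitting until every partition fits the limit. In this sketch the real graph partitioner (METIS in this log) is replaced by a placeholder that just cuts the node list in half, so it only illustrates the control flow and the size guarantee, not partition quality. All names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Placeholder for a real 2-way graph partition; here it simply halves the node list.
void placeholderBisect(const std::vector<std::uint32_t>& nodes,
                       std::vector<std::uint32_t>& left, std::vector<std::uint32_t>& right) {
    const auto half = static_cast<std::ptrdiff_t>(nodes.size() / 2);
    left.assign(nodes.begin(), nodes.begin() + half);
    right.assign(nodes.begin() + half, nodes.end());
}

// Recursively bisects until every partition has at most maxPartitionSize nodes.
// maxPartitionSize must be at least 1, otherwise the recursion never terminates.
void recursiveBisect(const std::vector<std::uint32_t>& nodes, std::size_t maxPartitionSize,
                     std::vector<std::vector<std::uint32_t>>& out) {
    if (nodes.size() <= maxPartitionSize) {
        out.push_back(nodes);
        return;
    }
    std::vector<std::uint32_t> left, right;
    placeholderBisect(nodes, left, right);
    recursiveBisect(left, maxPartitionSize, out);
    recursiveBisect(right, maxPartitionSize, out);
}
```

With 385 nodes and a limit of 128 this ends with four partitions of 96 to 97 nodes; uneven partition weights exist precisely so that a real bisection lands near the target size instead of at arbitrary halves.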

wicked notch Jan 13, 2024, 12:42 PM

#

ok graph bisection

#

are Unreal peeps smoking crack or am I dumb

#

```cpp
real_t PartitionWeights[] = {
    float( TargetNumPartitions / 2 ) / TargetNumPartitions,
    1.0f - float( TargetNumPartitions / 2 ) / TargetNumPartitions
};
```

frank sail Jan 13, 2024, 12:49 PM

#

wicked notch are Unreal peeps smoking crack or am I dumb

how do you think they made unreal

wicked notch Jan 13, 2024, 12:49 PM

#

isn't this just [0.5, 0.5]

#

x / 2 / x is just 1/2

frank sail Jan 13, 2024, 12:49 PM

#

what is the type of TargetNumPartitions

wicked notch Jan 13, 2024, 12:49 PM

#

oh yeah

frank sail Jan 13, 2024, 12:50 PM

#

presumably not float

wicked notch Jan 13, 2024, 12:50 PM

#

le integer division

frank sail Jan 13, 2024, 12:50 PM

#

in programming, we do special math
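The integer-division behavior, checked numerically: with an integer `n`, `n / 2` truncates before the conversion to float, so the first weight is exactly 0.5 only for even `n`. A minimal sketch mirroring the two expressions in the snippet (`partitionWeights` is a hypothetical name):

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Same arithmetic as PartitionWeights: the integer division happens before the float conversion.
std::array<float, 2> partitionWeights(int targetNumPartitions) {
    const float first = float(targetNumPartitions / 2) / targetNumPartitions;
    return {first, 1.0f - first};
}
```

For 3 partitions this gives 1/3 and 2/3, for 4 it gives 0.5 and 0.5, and for 5 it gives 0.4 and 0.6: each side gets a share proportional to a whole number of target-size partitions.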

wicked notch Jan 13, 2024, 12:57 PM

#

ok so for a graph with N nodes

#

I gotta do this

#

```rs
minPartitionSize = 124;
maxPartitionSize = 128;
targetPartitionSize = (minPartitionSize + maxPartitionSize) / 2;
targetPartitionCount = Max<int32>(2, Round(nodesCount / float32(targetPartitionSize)));
partitionWeights = [
  float32(targetPartitionCount / 2) / targetPartitionCount,
  1.0 - (float32(targetPartitionCount / 2) / targetPartitionCount),
];
```

#

and so this way if I have 385 nodes

#

I do
`targetPartitionCount = Max<int32>(2, Round(385 / float32(126)));` which is just 3
`float32(3 / 2) / 3` which is 0.333
`1.0 - float32(3 / 2) / 3` which is 0.666

#

ok good
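The partition-count arithmetic above in compilable form. This sketch assumes `Round` rounds half away from zero, as `std::lround` does; the 124/128 bounds are the values from the pseudocode, and the function name is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

constexpr int kMinPartitionSize = 124;
constexpr int kMaxPartitionSize = 128;
constexpr int kTargetPartitionSize = (kMinPartitionSize + kMaxPartitionSize) / 2; // 126

// Number of partitions to aim for; never fewer than 2, since every step is a bisection.
int targetPartitionCount(int nodeCount) {
    const long rounded = std::lround(static_cast<float>(nodeCount) / static_cast<float>(kTargetPartitionSize));
    return std::max(2, static_cast<int>(rounded));
}
```

385 nodes give 3 partitions (385 / 126 ≈ 3.06), 1000 nodes give 8, and anything up to about 1.5 target sizes is clamped to 2.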

#

then what I do is

wicked notch Jan 13, 2024, 1:19 PM

#

```rs
BisectGraph(graph) {
  (_, partitions) = Partition(2, graph);
  front = 0;
  back = graph->GetVertexCount() - 1; // last valid index
  swap = [];
  while front <= back {
    while front <= back && partitions[front] == 0 {
      swap[front] = front;
      front++;
    }
    while front <= back && partitions[back] == 1 {
      swap[back] = back;
      back--;
    }
    if front < back {
      // partitions[front] == 1 and partitions[back] == 0: exchange them (this is the vertex remap)
      swap[front] = back;
      swap[back] = front;
      front++;
      back--;
    }
  }
  split = front; // [0, split) is partition 0, [split, count) is partition 1
  partitionSize = [split, graph->GetVertexCount() - split];
  // make new graphs if partition size still too big
}

RecursiveBisectGraph(graph) {
  output = BisectGraph(graph);
  if output[0] && output[1] {
    RecursiveBisectGraph(output[0]);
    RecursiveBisectGraph(output[1]);
  }
}
```
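The front/back loop is essentially a two-pointer partition: it walks inward from both ends and swaps nodes sitting on the wrong side, so all partition-0 nodes end up first and `split` marks where partition 1 begins. A self-contained sketch of just that step (`bisectReorder` is a hypothetical name); `order` plays the role of the vertex remap, and `back` starts at `count - 1`, since starting at `count` would read one past the end of `partitions`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <utility>
#include <vector>

// partitions[i] is 0 or 1 (e.g. the output of a 2-way graph partitioner).
// Fills `order` so that every partition-0 node comes first, and returns the split position:
// order[0, split) belongs to partition 0, order[split, n) to partition 1.
std::size_t bisectReorder(const std::vector<int>& partitions, std::vector<std::uint32_t>& order) {
    order.resize(partitions.size());
    std::iota(order.begin(), order.end(), 0u);
    std::ptrdiff_t front = 0;
    std::ptrdiff_t back = static_cast<std::ptrdiff_t>(partitions.size()) - 1;
    while (front <= back) {
        while (front <= back && partitions[front] == 0) {
            ++front;
        }
        while (front <= back && partitions[back] == 1) {
            --back;
        }
        if (front < back) {
            // front holds a partition-1 node and back a partition-0 node: exchange them
            std::swap(order[front], order[back]);
            ++front;
            --back;
        }
    }
    return static_cast<std::size_t>(front);
}
```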

wispy spear Jan 13, 2024, 1:28 PM

#

wicked notch I do `targetPartitionCount = Max<int32>(2, Round(385 / float32(126)));` which is...

isnt float(3/2) still 1.0?

wicked notch Jan 13, 2024, 1:28 PM

#

yep

wispy spear Jan 13, 2024, 1:28 PM

#

and 1.0f / 3 is 0.33

wicked notch Jan 13, 2024, 1:28 PM

#

the point is to have partition weights be either 0.333, 0.666 (or more) or 0.5, 0.5 (or less)

wispy spear Jan 13, 2024, 1:28 PM

#

ok, looks like you just didnt update your calculation there : )

wicked notch Jan 13, 2024, 10:07 PM

#

I hate the fact that I've gotten used to reading unreal's source bleakekw

pale horizon Jan 13, 2024, 11:21 PM

#

Don’t say it to Tim to not be thrown into his CBT dungeon

twin bough Jan 14, 2024, 5:29 AM

#

Inb4 lvstri gets poached by epic

pale horizon Jan 14, 2024, 10:35 AM

#

Inb4 lvstri is actually Tim in disguise KEKW

wispy spear Jan 14, 2024, 10:49 AM

#

inb4 lvstri works at epic already, and uses "exams" as excuses for "i need to finish implementing nanonite 3.0 with my father brian or he will shoe me again"

twin bough Jan 14, 2024, 11:05 AM

#

pale horizon Inb4 lvstri is actually Tim in disguise KEKW

https://tenor.com/view/tim-sweeney-tim-sweeney-tim-sweeney-jumpscare-jumpscare-gif-8536783965545243764


pale horizon Jan 14, 2024, 11:07 AM

#

#

Same vibes

wicked notch Jan 14, 2024, 12:29 PM

#

I bisected a graph

#

and I took an entire friggin hour to understand whatever the hell I was doing

wicked notch Jan 14, 2024, 12:30 PM

#

wicked notch ```rs BisectGraph(graph) { (_, partitions) = Partition(2, graph); front = 0;...

the `swap` array filled in by the [front, back] sweep is a vertex remap

frank sail Jan 14, 2024, 12:30 PM

#

but ye did it lad frogapprove

wicked notch Jan 14, 2024, 12:38 PM

#

this is so unintuitive it's crazy

wicked notch Jan 14, 2024, 2:02 PM

#

I also gotta remap the remap
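Remapping the remap amounts to composing permutations: a sub-graph's remap is expressed in slots local to that sub-graph, so to get back to original vertex ids it has to be looked up through the parent's remap. A minimal sketch, assuming the child covers the parent slots `[offset, offset + size)` and both remaps use the convention "slot i holds old id `remap[i]`" (`ComposeRemap` is a made-up name).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// parentRemap[slot]: original vertex id stored in that slot of the parent ordering.
// localRemap[j]:     slot, relative to `offset`, that the child's slot j was taken from.
// Returns the original vertex id for each slot of the child range.
std::vector<int> ComposeRemap(const std::vector<int>& parentRemap,
                              const std::vector<int>& localRemap, int offset) {
  std::vector<int> result(localRemap.size());
  for (std::size_t j = 0; j < localRemap.size(); ++j) {
    const int parentSlot = offset + localRemap[j];
    assert(parentSlot >= 0 && parentSlot < static_cast<int>(parentRemap.size()));
    result[j] = parentRemap[static_cast<std::size_t>(parentSlot)];
  }
  return result;
}
```

If the parent order is `{4, 1, 2, 3, 0, 5}` and the right half (offset 3) is reordered with local remap `{2, 1, 0}`, the composed result is `{5, 0, 3}`.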

#

beautiful

wicked notch Jan 14, 2024, 3:45 PM

#

unreal does a radix sort of some kind

#

and then hashes the triangle IDs with the distance from their center as the hash

#

wtf

wicked notch Jan 14, 2024, 9:36 PM

#

ahh yep

#

as usual, more problems came up!

#

maybe if I allow looser partitioning I can solve this easily

wicked notch Jan 14, 2024, 9:58 PM

#

nope

#

sigh

wicked notch Jan 14, 2024, 10:27 PM

#

Ah I see

#

it's fucking foliage

#

amazing

#

the offending things are always foliage because they have the most disjoint meshes

#

(and some windows, because they are the same mesh for some goddamn reason, so their triangles are disjoint as well)
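One way to quantify how "disjoint" a mesh is before clustering it is to count its connected triangle islands. This union-find sketch is an illustrative diagnostic, not something taken from the clusterizers discussed here, and it assumes an indexed triangle list where triangles are connected only through shared vertex indices (so vertices split along UV or normal seams count as separate islands). `DisjointSet` and `CountIslands` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

struct DisjointSet {
  std::vector<std::uint32_t> parent;
  explicit DisjointSet(std::size_t n) : parent(n) { std::iota(parent.begin(), parent.end(), 0u); }
  std::uint32_t Find(std::uint32_t x) {
    while (parent[x] != x) {
      parent[x] = parent[parent[x]];  // path halving
      x = parent[x];
    }
    return x;
  }
  void Unite(std::uint32_t a, std::uint32_t b) {
    const std::uint32_t ra = Find(a);
    const std::uint32_t rb = Find(b);
    if (ra != rb) parent[ra] = rb;
  }
};

// Number of connected triangle islands in an indexed triangle list.
std::size_t CountIslands(const std::vector<std::uint32_t>& indices, std::size_t vertexCount) {
  DisjointSet set(vertexCount);
  std::vector<bool> used(vertexCount, false);
  for (std::size_t i = 0; i + 2 < indices.size(); i += 3) {
    set.Unite(indices[i], indices[i + 1]);
    set.Unite(indices[i + 1], indices[i + 2]);
    used[indices[i]] = true;
    used[indices[i + 1]] = true;
    used[indices[i + 2]] = true;
  }
  std::size_t islands = 0;
  for (std::size_t v = 0; v < vertexCount; ++v) {
    const auto id = static_cast<std::uint32_t>(v);
    if (used[v] && set.Find(id) == id) ++islands;  // one root per island
  }
  return islands;
}
```

Meshes like the foliage and the shared window mesh would likely show a high island count relative to their triangle count.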

#

I could maybe tessellate

wispy spear Jan 14, 2024, 10:34 PM

#

are you doing that per primitive?

wicked notch Jan 14, 2024, 10:34 PM

#

ye

#

oh yeah another thing I could do is per-triangle-range materials

#

like unreal does

wispy spear Jan 14, 2024, 10:35 PM

#

crazy that this is so hard

wicked notch Jan 14, 2024, 10:36 PM

#

I can't even see what unreal does with bistro because it can't import bistro kekkedsadge

#

it literally crashes while building nanite meshes

#

anyways, brain fog is really starting to kick in so I'll leave this for after-the-16th me

wispy spear Jan 14, 2024, 10:41 PM

#

hmm debugging unreal is not an option neh?

wicked notch Jan 14, 2024, 11:04 PM

#

I'll try tomorrow

#

but unreal is just pain

#

it takes 10 minutes to import bistro when compiling unreal in release mode

#

one order of magnitude more in debug mode

pale horizon Jan 14, 2024, 11:04 PM

#

EPIC, HIRE THIS MAN KEKW

wicked notch Jan 14, 2024, 11:05 PM

#

brian already solved foliage kekkedsadge

wispy spear Jan 14, 2024, 11:05 PM

#

time to make an unreal without all the clutter and baggage from the past : >

#

to be able to import bistro hassle free

#

to be able to implement VSM 2.0 and nan~~okatze~~ite 3.0

wicked notch Jan 15, 2024, 12:11 AM

#

ok I solved it

#

with a garbage solution

#

(even more bisections)

#

holy shit it's garbage

#

ok subdivision is literally a requirement

#

TODO: make subdivision
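For reference, the simplest form of subdivision is a 1-to-4 midpoint split: each edge gets one new vertex shared by the triangles on both sides of it, and every triangle becomes four. This sketch only rebuilds the index buffer (new vertex ids start at `vertexCount`; their positions would be the averages of each edge's endpoints) and is a plain linear split, not one of OpenSubdiv's smoothing schemes such as Loop or Catmull-Clark. `SubdivideMidpoint` is a hypothetical helper.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

struct Subdivided {
  std::vector<std::uint32_t> indices;  // four triangles per input triangle
  std::uint32_t vertexCount;           // original vertices + one new vertex per unique edge
};

Subdivided SubdivideMidpoint(const std::vector<std::uint32_t>& indices, std::uint32_t vertexCount) {
  std::map<std::pair<std::uint32_t, std::uint32_t>, std::uint32_t> edgeMidpoint;
  std::uint32_t next = vertexCount;
  // Shared midpoint vertex for edge (a, b), created the first time either triangle asks.
  auto midpoint = [&](std::uint32_t a, std::uint32_t b) {
    const std::pair<std::uint32_t, std::uint32_t> key{std::min(a, b), std::max(a, b)};
    const auto [it, inserted] = edgeMidpoint.try_emplace(key, next);
    if (inserted) ++next;
    return it->second;
  };
  Subdivided out{{}, 0};
  out.indices.reserve(indices.size() * 4);
  for (std::size_t i = 0; i + 2 < indices.size(); i += 3) {
    const std::uint32_t a = indices[i], b = indices[i + 1], c = indices[i + 2];
    const std::uint32_t ab = midpoint(a, b), bc = midpoint(b, c), ca = midpoint(c, a);
    // Three corner triangles plus the centre one, all keeping the input winding.
    out.indices.insert(out.indices.end(), {a, ab, ca, ab, b, bc, ca, bc, c, ab, bc, ca});
  }
  out.vertexCount = next;
  return out;
}
```

Two triangles sharing an edge become 8 triangles over 9 vertices: the 4 originals plus one per unique edge.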

wispy spear Jan 15, 2024, 12:39 AM

#

let Delaunay and Worley come to you in your dreams

wicked notch Jan 15, 2024, 1:16 AM

#

OpenSubdiv is covered by the Apache license, and is free to use for commercial or non-commercial use. This is the same code that Pixar uses internally for animated film production. Our intent is to encourage high performance accurate subdiv drawing by giving away the "good stuff".

#

I didn't know Pixar was this chad

#

https://tenor.com/view/mujikcboro-seriymujik-gif-24361533


primal shadow Jan 15, 2024, 1:50 AM

#

@wicked notch can it ever happen that a meshlet rendered in the first pass (was visible last frame and not frustum culled this frame) won't be visible after performing occlusion culling in the second pass?

wicked notch Jan 15, 2024, 1:52 AM

#

no, the second pass only cares about what wasn't visible before

primal shadow Jan 15, 2024, 1:54 AM

#

hmm ok, so I need to find why things are breaking...

#

oh wait lol

#

I'm reusing the previous occlusion buffer for the current one, but never clearing it 😛

distant lodge Jan 15, 2024, 2:55 PM

#

OpenSubdiv is pretty nice iirc, I believe it's what blender uses for at least one of its subdivision modifiers

#

I recall cloning it a while back for that reason

primal shadow Jan 15, 2024, 6:01 PM

#

Fixed bugs, got a lot more performance back lol

#

I was accidentally reusing a buffer without clearing it, so once a meshlet became visible it would never become invisible again, leading to the first pass basically rendering every meshlet every time

#

And then for shadow views, I didn't realize the frustum was not set up to be uploaded to the GPU, so culling wasn't working at all

#

So I fixed those bugs and now it's much faster lmao
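A CPU-side sketch of one common variant of the two-pass scheme, mostly to show where the per-frame visibility has to be cleared: pass 1 draws what was visible last frame and is still in the frustum, then every in-frustum meshlet is re-tested against the depth pyramid built from pass 1 to refresh the visibility bits, and pass 2 draws only the newly visible ones. `frustumVisible` and `occludedByHzb` are placeholder callbacks for the real GPU tests, and visibility is a `std::vector<bool>` instead of a GPU buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct CullResult {
  std::vector<int> firstPass;   // visible last frame and still in the frustum
  std::vector<int> secondPass;  // newly visible against this frame's depth pyramid
};

// lastVisible: per-meshlet bits written last frame (read only here).
// currentVisible: this frame's bits; cleared before any writes.
template <class FrustumTest, class OcclusionTest>
CullResult TwoPassCull(const std::vector<bool>& lastVisible, std::vector<bool>& currentVisible,
                       FrustumTest frustumVisible, OcclusionTest occludedByHzb) {
  const std::size_t count = lastVisible.size();
  currentVisible.assign(count, false);  // the clear: never start from stale bits
  CullResult r;
  // Pass 1: draw last frame's visible set; the depth pyramid is then built from these draws.
  for (std::size_t m = 0; m < count; ++m) {
    if (lastVisible[m] && frustumVisible(m)) r.firstPass.push_back(static_cast<int>(m));
  }
  // Pass 2: re-test everything in the frustum against the new pyramid.
  for (std::size_t m = 0; m < count; ++m) {
    if (!frustumVisible(m) || occludedByHzb(m)) continue;
    currentVisible[m] = true;  // feeds next frame's pass 1
    if (!lastVisible[m]) r.secondPass.push_back(static_cast<int>(m));  // draw only what pass 1 skipped
  }
  return r;
}
```

Without the `assign`, bits are only ever set, so the set of meshlets drawn in pass 1 keeps growing until it covers everything.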

wicked notch Jan 15, 2024, 8:51 PM

#

alright, opensubdiv is nice but it's slow as hell

#

before I jump into opensubdiv's gpu backends I'd want to try MT

wicked notch Jan 15, 2024, 9:19 PM

#

this worked incredibly well though

#

holy shit

primal shadow Jan 15, 2024, 9:34 PM

#

MT?

wicked notch Jan 15, 2024, 9:36 PM

#

multithreading

#

btw I now understand what Jensen meant by "the more you buy the more you save"

#

the more you draw the more perf you have

#

that's right, the more I tessellate the better the clusterizer does

frank sail Jan 15, 2024, 10:20 PM

#

@wicked notch how do you cope with buffers in Vulkan

#

particularly updating them

#

I sense three usages:

1. Update whole thing every frame
2. Update occasionally
3. Never update from CPU (except maybe clearing it with the clear command)

#

Anyways 1 and 3 are easy to solve in a vacuum

wicked notch Jan 15, 2024, 10:23 PM

#

my public API for buffers is kinda barebones at the moment, I just have this

```cpp
auto Write(const T& value, uint64 offset = 0) noexcept -> void;
auto Write(std::span<const T> values, uint64 offset = 0) noexcept -> void;
auto Write(const void* data, uint64 size, uint64 offset) noexcept -> void;
```

frank sail Jan 15, 2024, 10:23 PM

#

is this buffered

wicked notch Jan 15, 2024, 10:24 PM

#

no unbuffered

frank sail Jan 15, 2024, 10:24 PM

#

alright so you manually buffer it

wicked notch Jan 15, 2024, 10:24 PM

#

yeah

frank sail Jan 15, 2024, 10:24 PM

#

🥖

wicked notch Jan 15, 2024, 10:25 PM

#

I think it may be worth having something like `BufferedTypedBuffer<T>` or something

#

even though the name is garbage
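A minimal sketch of the `BufferedTypedBuffer<T>` idea: one copy of the data per frame in flight, with writes only touching the copy selected for the current frame so the GPU can keep reading the others. The class name comes from the chat, but the rest is assumed: `std::vector<T>` stands in for persistently mapped buffer memory, `FramesInFlight` is a template parameter, and the `std::span` overload from the API above is replaced with `const std::vector<T>&` to stay within C++17.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

template <class T, std::size_t FramesInFlight = 2>
class BufferedTypedBuffer {
 public:
  explicit BufferedTypedBuffer(std::size_t elementCount) {
    for (auto& copy : copies_) copy.resize(elementCount);
  }

  // Select this frame's copy. Assumes the caller has already waited for the frame
  // that last used the same slot (frameIndex - FramesInFlight).
  void BeginFrame(std::uint64_t frameIndex) {
    current_ = static_cast<std::size_t>(frameIndex % FramesInFlight);
  }

  void Write(const T& value, std::size_t offset = 0) { copies_[current_].at(offset) = value; }

  void Write(const std::vector<T>& values, std::size_t offset = 0) {
    auto& copy = copies_[current_];
    assert(offset + values.size() <= copy.size());
    std::copy(values.begin(), values.end(), copy.begin() + static_cast<std::ptrdiff_t>(offset));
  }

  const std::vector<T>& Current() const { return copies_[current_]; }  // bound this frame
  const std::vector<T>& Slot(std::size_t i) const { return copies_.at(i); }

 private:
  std::array<std::vector<T>, FramesInFlight> copies_;  // stand-in for mapped GPU memory
  std::size_t current_ = 0;
};
```

The trade-off is the one raised in the chat: memory is multiplied by `FramesInFlight`, and data written in one frame does not appear in the other copies unless it is written again.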

frank sail Jan 15, 2024, 10:25 PM

#

The problem is coming up with a unified abstraction for these uses

#

But it's probably not that hard, just inherit fam

wicked notch Jan 15, 2024, 10:25 PM

#

yep

#

don't inherit too much otherwise you end up like java

#

new InputStreamReader(new BufferedInputReader(new StreamAdapter(new FuckThisShit())))

frank sail Jan 15, 2024, 10:26 PM

#

I could template too but then everything is now in the header bleakekw

#

Btw I think devsh handles case 2 by using an upload "pool"

#

That is the rarest case tbh and I think I only do it for scene geometry updates

#

I suppose I could hack it by treating those as N-buffered, but rip memory

wicked notch Jan 15, 2024, 10:31 PM

#

tbh unless you're moving lots of MiB of data around per frame, it's probably fine to just write it all

#

or at least, update sparsely without buffering

#

for everything else I think the staging pool that devsh presents is quite good

frank sail Jan 15, 2024, 10:34 PM

#

hmm I better get crackin on this

pale horizon Jan 16, 2024, 2:14 PM

#

wicked notch my public API for buffers is kinda barebones at the moment, I just have this ```...

I didn’t even parse that as C++ at first lmao

cold sky Jan 16, 2024, 7:23 PM

#

frank sail I suppose I could hack it by treating those as N-buffered, but rip memory

RIP memory indeed, but if you always use the same amount, that's fine

#

my thing is only more optimal when you do 64mb one frame, and 1mb another

frank sail Jan 16, 2024, 7:23 PM

#

yeah there's nothing you can do when you write the whole thing every frame

cold sky Jan 16, 2024, 7:25 PM

#

wicked notch I think it may be worth to have something like `BufferedTypedBuffer<T>` or somet...

this is the biggest mistake you'll ever make 😛

distant lodge Jan 16, 2024, 8:01 PM

#

devsh style upload pools are nice but I still n-buffer stuff I write out every frame

#

I mainly use them for staging mesh/texture uploads

cold sky Jan 16, 2024, 8:02 PM

#

sooon depri will give me "Even More Power" (TM)

#

hue hue hue hue hue

frank sail Jan 16, 2024, 8:02 PM

#

devsh palpatine incident

distant lodge Jan 16, 2024, 8:02 PM

#

unlimited buffering

cold sky Jan 16, 2024, 8:03 PM

#

well he's rigging the TimelineSemaphore Deferred std::function<> to a Pool Allocator

#

for dem bindless descriptor sets

distant lodge Jan 16, 2024, 8:04 PM

#

that's what I do for my fixed size upload pool essentially but I don't see where bindless comes in

cold sky Jan 16, 2024, 8:04 PM

#

so I can allocate/deallocate

cold sky Jan 16, 2024, 8:04 PM

#

distant lodge that's what I do for my fixed size upload pool essentially but I don't see where...

basically I go alloc_addr() and it gives me a slot into an array descriptor binding that's currently free

#

when I'm done using a texture and want to mark a descriptor array item as ready for overwrite, I want to do free_addr(slot) right?

#

but I probably want to latch that on the semaphore value that the last frame that uses said texture will signal
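The "latch the free on a semaphore value" idea can be sketched as a queue of deferred functors keyed by the timeline value that has to be reached before each one may run. This is a hypothetical single-threaded simplification (`DeferredFreeQueue`, `Latch` and `Poll` are invented names, not Nabla's API), and the caller passes in the completed value, which in Vulkan would come from something like `vkGetSemaphoreCounterValue`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <utility>

class DeferredFreeQueue {
 public:
  // Run `fn` once the timeline semaphore has reached `value`.
  void Latch(std::uint64_t value, std::function<void()> fn) {
    pending_.emplace(value, std::move(fn));
  }

  // Runs, in value order, every functor whose value is <= completedValue.
  // Returns how many ran.
  std::size_t Poll(std::uint64_t completedValue) {
    std::size_t ran = 0;
    while (!pending_.empty() && pending_.begin()->first <= completedValue) {
      auto fn = std::move(pending_.begin()->second);
      pending_.erase(pending_.begin());  // erase first so `fn` may latch new work safely
      fn();
      ++ran;
    }
    return ran;
  }

  bool Empty() const { return pending_.empty(); }

 private:
  std::multimap<std::uint64_t, std::function<void()>> pending_;
};
```

A descriptor slot freed while frame N is in flight would be latched on the value frame N signals, and a later `Poll` hands it back to the allocator.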

distant lodge Jan 16, 2024, 8:06 PM

#

what I do for that is I have per-frame-in-flight (FIF) implicit linked free lists and append them to the tail of the global free list

cold sky Jan 16, 2024, 8:07 PM

#

yea, but that's dumb

#

it kinda implies you only have one TS

distant lodge Jan 16, 2024, 8:07 PM

#

so basically I have a `std::vector<BindlessHandle>` that's all live and dead handles, then `BindlessHandle m_waitingFrees[FRAMES_IN_FLIGHT];` and `BindlessHandle m_freeListHead/Tail;`

#

yeah but you can easily extend this to work with more TS if you get rid of the FIF
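A rough sketch of the frames-in-flight scheme described above, assuming `FRAMES_IN_FLIGHT = 2` and that a `BindlessHandle` is just a slot index. Dead handles reuse their `next_` entry as the link of an implicit free list; a freed handle goes on the current frame's waiting list, which is spliced onto the tail of the global free list when that frame slot comes around again. Names and the single-threaded design are illustrative, not the actual code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using BindlessHandle = std::uint32_t;
constexpr BindlessHandle kNull = 0xFFFFFFFFu;
constexpr std::size_t FRAMES_IN_FLIGHT = 2;

class BindlessAllocator {
 public:
  BindlessHandle Alloc() {
    if (freeHead_ != kNull) {  // pop from the global free list
      const BindlessHandle h = freeHead_;
      freeHead_ = next_[h];
      if (freeHead_ == kNull) freeTail_ = kNull;
      return h;
    }
    next_.push_back(kNull);  // no free handle: grow
    return static_cast<BindlessHandle>(next_.size() - 1);
  }

  // The GPU may still use `h` this frame, so park it on this frame's list.
  void Free(BindlessHandle h) {
    List& list = waiting_[frame_];
    next_[h] = list.head;
    list.head = h;
    if (list.tail == kNull) list.tail = h;
  }

  // Call after waiting for the frame that last used this slot.
  void BeginFrame(std::uint64_t frameIndex) {
    frame_ = static_cast<std::size_t>(frameIndex % FRAMES_IN_FLIGHT);
    List& list = waiting_[frame_];
    if (list.head == kNull) return;
    // Those handles are now safe: append them to the tail of the global free list.
    if (freeTail_ == kNull) freeHead_ = list.head; else next_[freeTail_] = list.head;
    freeTail_ = list.tail;
    list = List{};
  }

 private:
  struct List {
    BindlessHandle head = kNull;
    BindlessHandle tail = kNull;
  };
  std::vector<BindlessHandle> next_;  // for dead handles: index of the next free handle
  std::array<List, FRAMES_IN_FLIGHT> waiting_{};
  BindlessHandle freeHead_ = kNull;
  BindlessHandle freeTail_ = kNull;
  std::size_t frame_ = 0;
};
```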

cold sky Jan 16, 2024, 8:08 PM

#

eeeeh

#

trust me I thought about it

#

it gets suuuuper messy with multi-threading

#

also I don't just do free lists, I do arbitrary functors

distant lodge Jan 16, 2024, 8:08 PM

#

just get better locks

cold sky Jan 16, 2024, 8:09 PM

#

it just so happens that 99% of the time that functor is a free/deallocate functor

distant lodge Jan 16, 2024, 8:09 PM

#

the functors do throw a wrench in it yeah but with update-after-bind and whatnot the only thing that's in my critical section is disturbing the index list for alloc/free

#

I use those fancy 1-byte WebKit-style locks

cold sky Jan 16, 2024, 8:09 PM

#

distant lodge just get better locks

it gets super messy, because you don't want to run the functor under a lock

distant lodge Jan 16, 2024, 8:09 PM

#

and yielding switches fibers instead of blocking the thread

#

so just locking is no big deal for me

cold sky Jan 16, 2024, 8:11 PM

#

and you also don't want to run the functor from some unexpected thread/actor

#

this is why my latch lists are partitioned per-semaphore-per-resource

#

i.e. for the same semaphore Down Streaming Buffer doesn't keep its latched frees interleaved with Up Streaming Buffer and MegaDescriptorSet latched frees

#

We don't have a master-cleaner/submitter/waiter thread, so sticking the deferred events in the semaphore makes no sense

#

also you need to recognise those systems for what they are == garbage collection

cold sky Jan 16, 2024, 8:13 PM

#

cold sky this is why my latch lists are partitioned per-semaphore-per-resource

^ this is much better

#

I dont want to be running data-download consumption host callbacks when I just want to poll if I can free a single descriptor when I allocate a slot, all because I stick all my events on the same semaphore's queue

#

if I slapped all events in the same queue, I'd be susceptible to random pauses like that

distant lodge Jan 16, 2024, 8:16 PM

#

I just give stuff like that a dedicated update() method, though I guess you'd still be susceptible to the pauses simply from the fact you contend the same lock as the GC cycle

cold sky Jan 16, 2024, 8:16 PM

#

its also kinda important when the deferred event is a free and the resource is not thread-safe

#

at least my event queues live in the resource, so if I externally synchronise access to the resource, I wont get any nasty surprises like unsynchronized callback execution

cold sky Jan 16, 2024, 8:18 PM

#

distant lodge the functors do throw a wrench in it yeah but with update-after-bind and whatnot...

functors are juicy: https://github.com/Devsh-Graphics-Programming/Nabla-Examples-and-Tests/blob/9e980e729bb6e2813d0b2a60e20c182b837d4fce/05_StreamingAndBufferDeviceAddressApp/main.cpp#L283

#

that's a free latched with a data consumption callback when you're moving memory GPU -> Host

#

the fun part is when you use this little function: https://github.com/Devsh-Graphics-Programming/Nabla/blob/84d9aca59ccf8790b118bc9ddb63772078e20efc/include/nbl/video/utilities/IUtilities.h#L499

#

so you can actually download a 2GB VkBuffer through a 64MB HOST_VISIBLE buffer

#

in lots of little submits of 1 copy command

#

and every single time you overflow (can't allocate) the free callbacks will get run until a timeout

wicked notch Jan 16, 2024, 8:23 PM

#

this is a bigger brainworm than I'd recommend for safe consumption

#

the epic staging pool is good but going farther than that with this stuff is nervous

distant lodge Jan 16, 2024, 8:24 PM

#

I see that it functions like an upload buffer but the opposite way around essentially

#

but I also think having thread safety issues with the output of your intermediate data download functors is a skill issue

distant lodge Jan 16, 2024, 8:24 PM

#

wicked notch the epic staging pool is good but going farther than that with this stuff is <:n...

you don't know how much you want it until you do

cold sky Jan 16, 2024, 8:25 PM

#

distant lodge I see that it functions like an upload buffer but the opposite way around essnet...

upload buffer is far easier to write

frank sail Jan 16, 2024, 8:25 PM

#

glBufferSubData

distant lodge Jan 16, 2024, 8:25 PM

#

mine aren't stage 5 brainworm yet and don't do the crazy stuff devsh's do like converting your image format live

cold sky Jan 16, 2024, 8:26 PM

#

cause when you want to push 2GB of data to Device from Host through a 64MB buffer, you just need to wait for Device to finish copying the previous chunk of staging buf to destination before you overwrite it (the staging)

#

download buffer is faaar weirder

#

cause Host needs to do shit with the data in the 64MB mapped staging BEFORE you free it (and device overwrites with a new chunk)

#

so you need a callback that runs AFTER semaphore signal and BEFORE the free, which you need to perform to make progress in the copy-submit loop
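To make that ordering concrete, here is a single-threaded simulation of downloading a large buffer through a small staging buffer, one chunk at a time: the "device" copies a chunk into staging and signals a timeline value, the host consume callback reads staging only after that value is reached, and only then is staging reused for the next chunk. `FakeDevice` and `DownloadThroughStaging` are hypothetical stand-ins; a real implementation would keep several chunks in flight through a suballocator.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a GPU queue plus timeline semaphore.
struct FakeDevice {
  std::uint64_t completed = 0;                   // last value the "semaphore" reached
  std::vector<std::function<void()>> submitted;  // work not executed yet, in order

  // Returns the timeline value that will be signalled when `work` has run.
  std::uint64_t Submit(std::function<void()> work) {
    submitted.push_back(std::move(work));
    return completed + submitted.size();
  }
  // Executes queued work in order until `value` has been signalled.
  void WaitFor(std::uint64_t value) {
    while (completed < value) {
      assert(!submitted.empty());
      submitted.front()();
      submitted.erase(submitted.begin());
      ++completed;
    }
  }
};

// Moves `source` ("device memory") into `destination` (host) through a small staging buffer.
void DownloadThroughStaging(FakeDevice& device, const std::vector<std::uint8_t>& source,
                            std::vector<std::uint8_t>& destination, std::size_t stagingSize) {
  std::vector<std::uint8_t> staging(stagingSize);  // the small HOST_VISIBLE buffer
  destination.resize(source.size());
  for (std::size_t offset = 0; offset < source.size(); offset += stagingSize) {
    const std::size_t size = std::min(stagingSize, source.size() - offset);
    const auto at = static_cast<std::ptrdiff_t>(offset);
    // 1. Submit the device -> staging copy for this chunk.
    const std::uint64_t signal =
        device.Submit([&, at, size] { std::copy_n(source.begin() + at, size, staging.begin()); });
    // 2. Wait for the semaphore value of that submit.
    device.WaitFor(signal);
    // 3. Consume callback: the host reads staging BEFORE the range is handed back...
    std::copy_n(staging.begin(), size, destination.begin() + at);
    // 4. ...and only after that can the next chunk overwrite it.
  }
}
```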

distant lodge Jan 16, 2024, 8:28 PM

#

right ok I'm starting to see

cold sky Jan 16, 2024, 8:29 PM

#

in the upstreaming direction you already have the 2GB Host source and 2GB Device destination laying about

#

and if you don't, you'll naturally make multiple calls of exactly the chunk size that the Host produces at a time

distant lodge Jan 16, 2024, 8:29 PM

#

so with multithreading it, on top of everything else, you have to take care that the execution time of the free functors doesn't bleed into the execution time of enqueueing

cold sky Jan 16, 2024, 8:30 PM

#

I can explicitly force all ready functors to run with a cull_frees()

#

https://github.com/Devsh-Graphics-Programming/Nabla-Examples-and-Tests/blob/9e980e729bb6e2813d0b2a60e20c182b837d4fce/05_StreamingAndBufferDeviceAddressApp/main.cpp#L313

#

or just block XD

#

`while (m_downStreamingBuffer->cull_frees()) {}`

#

😛

#

generally speaking what the buffer will do is try to run 1 or 2 functors every time you allocate

#

via a poll()

#

cause otherwise your allocator runs out of memory and you get faux fragmentation

#

ofc if it can't service your request, then it will do a wait instead of a poll

#

the allocate call takes a timeout
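The allocate-side behaviour described above (retire a couple of ready deferred frees on every allocation, and only block, with a timeout, when the request cannot be served) could look roughly like this. It is a hypothetical simplification, not Nabla's allocator: a byte counter replaces a real suballocator (so there is no fragmentation), and the timeline semaphore is modelled with `std::mutex` and `std::condition_variable`.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <map>
#include <mutex>

// Minimal host-side model of a timeline semaphore.
class Timeline {
 public:
  void Signal(std::uint64_t value) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      value_ = std::max(value_, value);
    }
    cv_.notify_all();
  }
  std::uint64_t Value() {
    std::lock_guard<std::mutex> lock(mutex_);
    return value_;
  }
  // True if `value` was reached before the timeout.
  bool Wait(std::uint64_t value, std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lock(mutex_);
    return cv_.wait_for(lock, timeout, [&] { return value_ >= value; });
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  std::uint64_t value_ = 0;
};

class StagingPool {
 public:
  StagingPool(Timeline& timeline, std::size_t capacity) : timeline_(timeline), free_(capacity) {}

  // Returns false if `size` bytes could not be obtained before the timeout.
  bool Allocate(std::size_t size, std::chrono::milliseconds timeout) {
    Poll(2);  // cheap path: retire at most a couple of frees that are already safe
    while (free_ < size) {
      if (latched_.empty()) return false;  // nothing in flight will ever come back
      if (!timeline_.Wait(latched_.begin()->first, timeout)) return false;  // GPU still busy
      Poll(std::numeric_limits<std::size_t>::max());
    }
    free_ -= size;
    return true;
  }

  // Give `size` bytes back once the timeline reaches `value`.
  void LatchFree(std::uint64_t value, std::size_t size) { latched_.emplace(value, size); }
  std::size_t FreeBytes() const { return free_; }

 private:
  void Poll(std::size_t maxCount) {
    const std::uint64_t completed = timeline_.Value();
    for (std::size_t n = 0;
         n < maxCount && !latched_.empty() && latched_.begin()->first <= completed; ++n) {
      free_ += latched_.begin()->second;
      latched_.erase(latched_.begin());
    }
  }

  Timeline& timeline_;
  std::size_t free_;
  std::multimap<std::uint64_t, std::size_t> latched_;  // timeline value -> bytes to return
};
```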

distant lodge Jan 16, 2024, 8:33 PM

#

the way I'd do it is probably just to have a dedicated update() method that calls a parallel scatter of the functors with a gather function pending on it to free the indices when it's done, if I was assuming that running them synchronously would affect execution elsewhere

#

and if there's no space to allocate, just enqueue your functor into the not ready list directly

cold sky Jan 16, 2024, 8:33 PM

#

distant lodge the way I'd do it is probably just to have a dedicated `update()` method that ca...

and then you need to remember to call the update() method

#

after upgrading to timeline semas

#

I literally never have to run that

#

unless I want the functors to run "earlier than"

cold sky Jan 16, 2024, 8:34 PM

#

cold sky I literally never have to run that

functors will [implicitly] either run during the next allocate() or in the destructor of the staging buffer

distant lodge Jan 16, 2024, 8:36 PM

#

so it's valid usage to allocate staging buffers that get deleted when you're done staging?

cold sky Jan 16, 2024, 8:36 PM

#

yep

#

it's just that the destructor might stall you

#

you'll deadlock if you havent submitted the copies yet though

distant lodge Jan 16, 2024, 8:37 PM

#

funky, I'd much rather have to remember to call the update method lol

#

doesn't sound nearly worth it for surprise sync points

cold sky Jan 16, 2024, 8:38 PM

#

its totally worth it

#

cause we can write code that does not give shits about overflows

#

like an faux-immediate mode CAD renderer

#

that can attempt to draw 400MB of CPU produced geometry through a 64MB ReBAR buffer

distant lodge Jan 16, 2024, 8:39 PM

#

mine can handle the overflows, I just need to have an uploadPool.update() in my update loop somewhere

cold sky Jan 16, 2024, 8:39 PM

#

yeah and we don't

#

you just check the condition for "would be overflow"

#

and do a "hidden transparent submit"

#

which signals the semaphore value that you latched all the resource frees from waaaay before you knew there'd be an overflow

#

then you wait on the semaphore

#

reset the commandbuffer and begin it again

#

go back to business as usual with the rest of the code
and while it's trying to allocate all those resources, it unknowingly releases resources from deferred functors

distant lodge Jan 16, 2024, 8:41 PM

#

I do all that except replace waiting on the semaphore for enqueueing the functor

cold sky Jan 16, 2024, 8:41 PM

#

you need to wait on the semaphore cause you've run out of space

distant lodge Jan 16, 2024, 8:41 PM

#

I opted to structure specifically around never waiting on anything

cold sky Jan 16, 2024, 8:42 PM

#

literally no forward progress can be made

distant lodge Jan 16, 2024, 8:42 PM

#

yeah that's why I have the update function though, it does the same TS check you'd do when enqueueing another write job, except if nothing is ready I don't block any threads

#

which is much better for me because fibers = avoid OS blocks

cold sky Jan 16, 2024, 8:43 PM

#

potatoe/tomatoe

#

you still need to suspend execution of the "Immediate Draw" routine

#

either by waiting, switching or yielding

distant lodge Jan 16, 2024, 8:43 PM

#

yeah that's true, luckily I don't have to port GDI/oldGL code lol

cold sky Jan 16, 2024, 8:44 PM

#

the thing is, with your system I'd have to spam update() literally everywhere

#

so I can make progress

#

or whenever an allocation fails

#

my update() is just rolled into the allocation routine implicitly

distant lodge Jan 16, 2024, 8:45 PM

#

yeah I guess the major downside is you have to architect everything around being async-friendly

cold sky Jan 16, 2024, 8:45 PM

#

wdym everything?

distant lodge Jan 16, 2024, 8:46 PM

#

everything downstream of your uploads and downloads I mean

cold sky Jan 16, 2024, 8:46 PM

#

nope

distant lodge Jan 16, 2024, 8:46 PM

#

I meant with my system

#

if you block on stuff you don't

cold sky Jan 16, 2024, 8:47 PM

#

the whole code outside of the utilities thinks the commandbuffer never got reset

#

it goes into the function begun

#

and comes out begun as well

#

the only way it can know from the outside that there was a hidden submit is because the nextSemaphoreWait.value has incremented

cold sky Jan 16, 2024, 8:48 PM

#

distant lodge if you block on stuff you don't

actually you gave me a funny idea 😛

#

right now

#

to overlap Host and Device better

#

maybe I should have a lambda or some condition which causes artificial allocation failure / overflow

#

i.e. if I have a 128MB staging buffer, the host doesn't suballocate the whole 128MB, write it, submit and block

#

but rather tries to request 32 or 64mb at a time max

#

and if there's only 96 or 64mb free, treat it as an overflow

#

then I could have it block on a much older submit

#

eeeeh

#

but then I'd need multiple commandbuffers to round-robin

#

cause I can't reset a pending one

#

hmm

#

ok I might actually implement this as an improvement

#

aside from the intendedNextSubmit where the last cmdbuf is the resettable scratch, I could have N of them going round robin

#

that will be stage 69 brainworm

#

then if you have the right sized N, the actual function with overflows might actually never block (if Host is slow enough on the callbacks or data filling)

distant lodge Jan 16, 2024, 9:55 PM

#

honestly that's a pretty neat idea, would need some tuning to be worthwhile though

cold sky Jan 16, 2024, 10:10 PM

#

I'm not really hungry for perf

#

Just usability and correctness

wicked notch Jan 16, 2024, 11:46 PM

#

back to nanite

#

I have sent the project in

#

alright so tessellation is done (it saves my ass so much you have no idea, at least for the initial clustering)

#

strict graph partitioning is done

#

simplification is done

#

what's left is

- figure out how I'm gonna store this shit in glTF
- building the DAG

frank sail Jan 16, 2024, 11:53 PM

#

LVSTRI_nanite_at_home

wicked notch Jan 16, 2024, 11:55 PM

#

we're almost done boys

#

just a little more and we'll have flawless LODs

#

I accept suggestions for storing meshletisms in glTF

#

like does one mesh = one LOD group?

#

how do I express the parent - child relationship of the DAG

frank sail Jan 16, 2024, 11:56 PM

#

via edges

wicked notch Jan 16, 2024, 11:56 PM

#

yeah but in gltf

frank sail Jan 16, 2024, 11:56 PM

#

hehe idk

wicked notch Jan 16, 2024, 11:56 PM

#

maybe it's just better if I make a custom format

frank sail Jan 16, 2024, 11:56 PM

#

hmm copy how nodes do it?

#

you could store it as a custom binary blob in the gltf

wicked notch Jan 16, 2024, 11:57 PM

#

my DAG is gonna be CSR

#

so it's just two arrays

frank sail Jan 16, 2024, 11:57 PM

#

CSR?

wicked notch Jan 16, 2024, 11:57 PM

#

compressed sparse row

frank sail Jan 16, 2024, 11:58 PM

#

wat dat

wicked notch Jan 16, 2024, 11:58 PM

#

like a graph that connects vertex 0 to 2 and 3, vertex 1 to 0 and 2, and vertex 2 to 0, 1 and 3 is just written like this
[0, 2, 4, 7]
[2, 3, 0, 2, 0, 1, 3]

frank sail Jan 16, 2024, 11:59 PM

#

ah noice

#

so is each row a layer of the dag

#

i.e. the first row is just root nodes

wicked notch Jan 17, 2024, 12:01 AM

#

the first array is offsets into the second array

#

the second array is edges

frank sail Jan 17, 2024, 12:01 AM

#

o

wicked notch Jan 17, 2024, 12:01 AM

#

I shall explain better

#

vertex 0 => look up range in the first array: start = range[0]; end = range[1], edges connecting vertex 0 are in the second array starting at 0 and ending at 2 (excluded)
vertex 1 => look up range in the first array: start = range[1]; end = range[2], edges connecting vertex 1 are in the second array starting at 2 and ending at 4 (excluded)
...
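The same lookup as code, built from the example graph above (vertex 0 → 2, 3; vertex 1 → 0, 2; vertex 2 → 0, 1, 3). A minimal sketch with illustrative names; the offsets array has one entry more than there are rows, so the end of row `v` is always `offsets[v + 1]`.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

struct CsrGraph {
  std::vector<std::uint32_t> offsets;  // size = row count + 1
  std::vector<std::uint32_t> edges;    // all adjacency lists, back to back

  // Build from per-vertex adjacency lists.
  static CsrGraph FromAdjacency(const std::vector<std::vector<std::uint32_t>>& adjacency) {
    CsrGraph g;
    g.offsets.push_back(0);
    for (const auto& list : adjacency) {
      g.edges.insert(g.edges.end(), list.begin(), list.end());
      g.offsets.push_back(static_cast<std::uint32_t>(g.edges.size()));
    }
    return g;
  }

  // Half-open range [begin, end) of vertex v's neighbours inside `edges`.
  std::pair<const std::uint32_t*, const std::uint32_t*> Neighbors(std::uint32_t v) const {
    return {edges.data() + offsets[v], edges.data() + offsets[v + 1]};
  }
};
```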

frank sail Jan 17, 2024, 12:04 AM

#

interesting

#

I see they use this for sparse matrices too

#

btw how do you select cuts at runtime kekkedsadge

primal shadow Jan 17, 2024, 12:14 AM

#

wicked notch what's left is - figure out how I'm gonna store this shit in glTF - building th...

Custom gltf extension, store it as binary serialized data.

wicked notch Jan 17, 2024, 12:14 AM

#

frank sail btw how do you select cuts at runtime kekkedsadge

that's easy

#

process all leaf clusters, if their parent's error is too big but the children are good, add them to the list, otherwise add the parent to another list for further processing
repeat until all clusters have been processed
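A sequential sketch of that bottom-up selection, under simplifying assumptions: every cluster has a single parent (a tree, whereas the real structure is a DAG of cluster groups), errors are already projected and never decrease towards the root, and leaves are always acceptable as a fallback. With monotonic errors the decision for a cluster depends only on its parent's error, so siblings always agree and a parent is queued at most once. `Cluster` and `SelectCut` are illustrative names.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Cluster {
  int parent;   // -1 for the root
  float error;  // projected error; assumed to never decrease towards the root
};

// Start at the leaves; keep a cluster when its parent is too coarse (or it has none),
// otherwise hand the parent to the next round. Returns the selected clusters, sorted.
std::vector<int> SelectCut(const std::vector<Cluster>& clusters, const std::vector<int>& leaves,
                           float threshold) {
  std::vector<int> cut;
  std::vector<bool> queued(clusters.size(), false);
  std::vector<int> work = leaves;
  while (!work.empty()) {
    std::vector<int> next;
    for (int c : work) {
      const int p = clusters[c].parent;
      if (p < 0 || clusters[p].error > threshold) {
        cut.push_back(c);  // parent too coarse: this cluster is drawn
      } else if (!queued[p]) {
        queued[p] = true;  // parent is good enough: consider it next round
        next.push_back(p);
      }
    }
    work.swap(next);
  }
  std::sort(cut.begin(), cut.end());
  return cut;
}
```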

primal shadow Jan 17, 2024, 12:15 AM

#

Also @wicked notch when you're done please make a write up of the preprocessing algorithm and save me a lot of future pain 😄

#

And tbh how you structured the dispatches over meshlets/instances would be good to compare how I did it against, but less important.

wicked notch Jan 17, 2024, 12:16 AM

#

remember that I use mesh shaders

primal shadow Jan 17, 2024, 12:17 AM

#

Oh you do, nvm. Well actually no, I'd still like to know

wicked notch Jan 17, 2024, 12:17 AM

#

I use your same approach btw if I recall

primal shadow Jan 17, 2024, 12:17 AM

#

Doing a dispatch with one workgroup per meshlet takes like 0.1ms per 2^16 meshlets, which doesn't scale great :((

#

To write out the index buffer

wicked notch Jan 17, 2024, 12:18 AM

#

```
  uint32 IndexOffset;
  uint32 IndexCount;
  uint32 PrimitiveOffset;
  uint32 PrimitiveCount;
};

struct MeshletInstance {
  uint32 MeshletIndex;
  uint32 TransformIndex;
  uint32 MaterialIndex;
  // other per instance data
};
```

wicked notch Jan 17, 2024, 12:18 AM

#

primal shadow Doing a dispatch with one workgroup per meshlet takes like 0.1ms per 2^16 meshle...

if the writing + culling is still combined in one dispatch, then that's your bottleneck

primal shadow Jan 17, 2024, 12:19 AM

#

No they're separate

wicked notch Jan 17, 2024, 12:19 AM

#

otherwise idk, perchance check what nsight has to say

primal shadow Jan 17, 2024, 12:19 AM

#

Which improved things a lot, but it's still a bit expensive

primal shadow Jan 17, 2024, 12:19 AM

#

wicked notch otherwise idk, perchance check what nsight has to say

It's simply the overhead of so many dispatches :/. Wgpu kinda limits me here. I can only do 2^16 meshlets/workgroups per dispatch, and it has to put a barrier between every dispatch.

frank sail Jan 17, 2024, 12:20 AM

#

wicked notch process all leaf clusters, if their parent's error is too big but the children a...

Do you do this in a compute shader or something

wicked notch Jan 17, 2024, 12:20 AM

#

frank sail Do you do this in a compute shader or something

yes

wicked notch Jan 17, 2024, 12:20 AM

#

primal shadow It's simply the overhead of so many dispatches :/. Wgpu kinda limits me here. I ...

rip

#

ask the wgpu lads to up the limit KEKW

#

2^16 as workgroup limit is kinda garbage

frank sail Jan 17, 2024, 12:21 AM

#

wicked notch remember that I use mesh shaders

Wouldn't it work with fake mesh shaders

primal shadow Jan 17, 2024, 12:22 AM

#

wicked notch 2^16 as workgroup limit is kinda garbage

It's because webgpu enforces one limit for all dispatch dimensions instead of letting X be much higher :p. But yeah, I'm going to try and fix that in wgpu...

#

But back on topic, please do a writeup of the preprocessing 🙂

#

When you finish*

frank sail Jan 17, 2024, 12:22 AM

#

Lowest common denominator API bleakekw

wicked notch Jan 17, 2024, 12:23 AM

#

primal shadow But back on topic, please do a writeup of the preprocessing 🙂

I will, it's very shrimple though

#

(it isn't)

#

it's just the edge cases

#

they're literally limitless

#

infinite edge cases

primal shadow Jan 17, 2024, 12:25 AM

#

wicked notch I will, it's very shrimple though

Yeah I don't believe you LMAO.

frank sail Jan 17, 2024, 12:25 AM

#

#

primal shadow Jan 17, 2024, 12:27 AM

#

wicked notch 2^16 as workgroup limit is kinda garbage

Oh wait iirc it's because of AMD. AMD doesn't give you higher limits...

frank sail Jan 17, 2024, 12:30 AM

#

2^16 * 128 threads is still like 8 million though

primal shadow Jan 17, 2024, 12:30 AM

#

It's 1 workgroup per meshlet

wicked notch Jan 17, 2024, 12:31 AM

#

i guess you can dispatch more in the second dimension

#

should work just fine tbh
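Splitting one long 1D workload over two dispatch dimensions, as suggested: pick X up to the per-dimension limit, add enough Y rows to cover the rest, and have each workgroup flatten its ID and bail out past the end. A sketch with an assumed limit of 65535 workgroups per dimension (WebGPU's default `maxComputeWorkgroupsPerDimension`); the shader-side part is written as the equivalent C++ arithmetic, and the function names are made up.

```cpp
#include <cassert>
#include <cstdint>

struct DispatchSize {
  std::uint32_t x;
  std::uint32_t y;
};

// Enough workgroups for `count` items (e.g. one per meshlet), respecting a per-dimension limit.
DispatchSize SplitDispatch(std::uint32_t count, std::uint32_t maxPerDimension = 65535) {
  if (count <= maxPerDimension) return {count, 1};
  const std::uint64_t rows = (std::uint64_t{count} + maxPerDimension - 1) / maxPerDimension;  // ceil
  return {maxPerDimension, static_cast<std::uint32_t>(rows)};
}

// Shader side (as C++): flatten the 2D workgroup ID and skip padding in the last row.
bool FlattenWorkgroupId(std::uint32_t idX, std::uint32_t idY, DispatchSize size,
                        std::uint32_t count, std::uint32_t& itemIndex) {
  itemIndex = idY * size.x + idX;
  return itemIndex < count;
}
```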

primal shadow Jan 17, 2024, 12:31 AM

#

Hmm, perhaps? Let me check what the max overall dimensions are.

#

That's a good idea if it's possible though

wicked notch Jan 17, 2024, 12:32 AM

#

btw @frank sail can you link again that thing about persistent threads in kompute

frank sail Jan 17, 2024, 12:33 AM

#

Uhh

#

Was it a paper

wicked notch Jan 17, 2024, 12:33 AM

#

my froge brain does not allow me to remember events past 2 weeks old

faint crane Jan 17, 2024, 12:33 AM

#

“It was revealed to me in a dream.”

frank sail Jan 17, 2024, 12:33 AM

#

https://www.classes.cs.uchicago.edu/archive/2016/winter/32001-1/papers/AStudyofPersistentThreadsStyleGPUProgrammingforGPGPUWorkloads.pdf
First Google result

wicked notch Jan 17, 2024, 12:34 AM

#

should've just googled smh

#

who's gonna tell LVSTRI

#

oh yeah this is the part that requires UB to work

#

fun

#

wait hold up

delicate rain Jan 17, 2024, 12:36 AM

#

By UB do you mean "Unreal 5 Behavior" ?

wicked notch Jan 17, 2024, 12:36 AM

#

delicate rain By UB do you mean "Unreal 5 Behavior" ?

true

#

ight so

#

MPMC queues on the gpu

#

what's the exit condition

#

and can I just atomicAdd(taskCount, -1)

#

nope

#

alright

primal shadow Jan 17, 2024, 12:40 AM

#

So uhh trying to spawn 40x40 Stanford bunnies leads to
`Buffer binding 6 range 2905695912 exceeds max_*_buffer_binding_size limit 2147483648`

#

That's the index buffer

#

I simply can't bind that big of a buffer 😭

#

Idk what to do about that

wicked notch Jan 17, 2024, 12:40 AM

#

uh

#

BDA?

#

KEKW

#

you have that right?

primal shadow Jan 17, 2024, 12:42 AM

#

Device pointers? No lol

wicked notch Jan 17, 2024, 12:44 AM

#

damn

#

time to write another wgpu complaint then

primal shadow Jan 17, 2024, 12:45 AM

#

Already done... I opened an issue a few weeks ago

wicked notch Jan 17, 2024, 1:02 AM

#

thank god for the wayback machine

#

https://web.archive.org/web/20160722075531/https://mediatech.aalto.fi/~timo/publications/aila2009hpg_paper.pdf

#

```glsl
void main() {
  while (true) {
    const uint currentTaskIndex = atomicAdd(completedTaskCount, 1);
    if (currentTaskIndex >= maxTasks) { // index maxTasks is already out of range
      return;
    }
    const Cluster currentCluster = clusters[taskIndirection[currentTaskIndex]];
    const Cluster parentCluster = clusters[currentCluster.parent];
    if (ProjectError(currentCluster) <= threshold && ProjectError(parentCluster) > threshold) {
      ProcessCluster(currentCluster); // this cluster is good and its parent is too coarse
    } else {
      ScheduleWork(parentCluster);
      atomicAdd(completedTaskCount, -1);
    }
  }
}
```

#

I guess idk

wicked notch Jan 17, 2024, 1:41 AM

#

this doesn't say when all clusters have been processed though

#

ye this requires more thinking
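The missing piece is a termination condition. A common approach for persistent-thread work queues is to count outstanding work: increment before a task is pushed, decrement only after a task has finished (including pushing anything it spawned), and let workers exit once that count reaches zero, since the queue is then empty and nobody can add to it. Below is a CPU analogue using `std::thread` and a mutex-protected queue, purely to show the counting; it sidesteps both the lock-free MPMC queue and the GPU forward-progress question, and `RunPersistent` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

using PushFn = std::function<void(int)>;

// Runs `process` on every task, including tasks that `process` spawns through `push`,
// using `threadCount` persistent workers. Returns once all work has been done.
void RunPersistent(const std::vector<int>& initial, int threadCount,
                   const std::function<void(int, const PushFn&)>& process) {
  std::mutex mutex;
  std::deque<int> queue(initial.begin(), initial.end());
  // Invariant: outstanding == tasks in the queue + tasks currently being processed.
  std::size_t outstanding = queue.size();

  const PushFn push = [&](int task) {
    std::lock_guard<std::mutex> lock(mutex);
    ++outstanding;  // counted before the spawning task is marked as finished
    queue.push_back(task);
  };

  auto worker = [&] {
    for (;;) {
      int task = 0;
      bool haveTask = false;
      {
        std::lock_guard<std::mutex> lock(mutex);
        if (outstanding == 0) return;  // queue empty and no running task can add more
        if (!queue.empty()) {
          task = queue.front();
          queue.pop_front();
          haveTask = true;
        }
      }
      if (!haveTask) {  // others are still working and may push more: retry
        std::this_thread::yield();
        continue;
      }
      process(task, push);
      std::lock_guard<std::mutex> lock(mutex);
      --outstanding;
    }
  };

  std::vector<std::thread> threads;
  for (int i = 0; i < threadCount; ++i) threads.emplace_back(worker);
  for (auto& thread : threads) thread.join();
}
```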

primal shadow Jan 17, 2024, 4:19 AM

#

According to some testing, my renderer performs worse than regular CPU-driven draw calls...

#

and has serious memory usage issues from having to have a giant index buffer, it's basically unusable for anything practical :/

frank sail Jan 17, 2024, 4:20 AM

#

why does the index buffer need to be giant

#

bigger than usual?

primal shadow Jan 17, 2024, 4:20 AM

#

Feels like all my work was wasted unless I go try and add atomic image support and u64s to wgpu/naga

buoyant summit Jan 17, 2024, 4:21 AM

#

wgpu/naga moment,,,

primal shadow Jan 17, 2024, 4:21 AM

#

I need one giant index buffer that stores a u32 per vertex of all possible triangles in the scene

#

and just binding that large of a buffer is a problem, I reach the max bindable limit...

#

Buffer binding 6 range 2905695912 exceeds max_*_buffer_binding_size limit 2147483648

frank sail Jan 17, 2024, 4:21 AM

#

primal shadow I need one giant index buffer that stores a u32 per vertex of all possible trian...

isn't that the same if you use regular models frog_thinkk

primal shadow Jan 17, 2024, 4:22 AM

#

And that's like nearly 3gb of just an index buffer

primal shadow Jan 17, 2024, 4:22 AM

#

frank sail isn't that the same if you use regular models frog_thinkk

Not with instancing, no

frank sail Jan 17, 2024, 4:22 AM

#

ah

primal shadow Jan 17, 2024, 4:22 AM

#

Because I have to allocate it per triangle, regardless of instancing

frank sail Jan 17, 2024, 4:22 AM

#

ok I see

primal shadow Jan 17, 2024, 4:22 AM

#

Same with all the extra per-meshlet data, to a much lesser extent

frank sail Jan 17, 2024, 4:22 AM

#

surely there is a way you can reuse indices

#

I bet more indirection would do the trick

primal shadow Jan 17, 2024, 4:25 AM

#

I don't see any way. I do a single draw_indirect() and encode the meshlet ID + triangle ID into each index, and then the vertex shader takes the vertex index (the index I wrote) and extracts the meshlet ID + triangle ID for it to get the vertex for.

#

or not quite meshlet ID + triangle ID, meshlet ID + meshlet index or something, idr exactly

frank sail Jan 17, 2024, 4:27 AM

#

does webgpu have MDI?

primal shadow Jan 17, 2024, 4:27 AM

#

It does. Performed extremely poorly when I tested doing one draw per meshlet though.

#

From what I can tell, it's either mesh shaders, or software raster

frank sail Jan 17, 2024, 4:27 AM

#

yeah one subdraw per meshlet is not ideal

#

but maybe you can group them or use larger meshlets

#

or geometry shaders bleakekw (jk)

primal shadow Jan 17, 2024, 4:29 AM

#

I think I'll at least try larger meshlets

#

I'm doing 64x64 meshlets, maybe 64x128 performs better

frank sail Jan 17, 2024, 4:29 AM

#

what are the two numbers

#

i forgor

#

vertices x triangles?

primal shadow Jan 17, 2024, 4:36 AM

#

yeah

frank sail Jan 17, 2024, 4:39 AM

#

maybe doing way bigger meshlets would make mdi+instancing viable. I've seen one renderer use 1024 triangle meshlets

#

Though that was ray tracing

wicked notch Jan 17, 2024, 11:36 AM

#

primal shadow According to some testing, my renderer performs worse than regular CPU-driven dr...

show nsight trace of both

wicked notch Jan 17, 2024, 3:14 PM

#

btw I have an idea for software meshletisms

primal shadow Jan 17, 2024, 5:30 PM

#

wicked notch show nsight trace of both

Will later TN

wicked notch Jan 17, 2024, 10:28 PM

#

@primal shadow you up?

primal shadow Jan 17, 2024, 10:32 PM

#

wicked notch @primal shadow you up?

Yes, I'm at work though. I won't be able to test my stuff for another ~3-4h

wicked notch Jan 17, 2024, 10:32 PM

#

ah rip

#

do you remember how much time it takes for you to rasterize bistro

#

both generating the index buffer and rasterizing the visbuffer

#

it doesn't have to be accurate

primal shadow Jan 17, 2024, 10:34 PM

#

I never tested on bistro because I don't have a good system for converting whole scenes yet

wicked notch Jan 17, 2024, 10:34 PM

#

ight, we'll test later with one of your scenes

#

but you can technically cut your memory consumption by a factor of 3 if you accept unindexed rendering

#

without degenerate tringles

primal shadow Jan 17, 2024, 10:35 PM

#

Err, how?

#

I don't do indexed rendering as is really, the index buffer just encodes the meshlet and triangle data for the vertex shader to load

wicked notch Jan 17, 2024, 10:36 PM

#

by doing this

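```glsl
// One invocation reserves room for all of this meshlet's triangles
// (3 vertices each) in the non-indexed indirect draw.
if (gl_LocalInvocationID.x == 0) {
    const uint currentIndexCount = atomicAdd(g_meshletDrawIndirectBuffer.VertexCount, meshlet.PrimitiveCount * 3);
    s_meshletBasePrimitive = currentIndexCount / 3;
}
barrier(); // everyone waits until the base index has been written

// One u32 per triangle: meshlet instance in the high bits,
// primitive ID in the low 7 bits (so at most 128 triangles per meshlet).
const uint currentPrimitiveId = gl_LocalInvocationID.x;
if (currentPrimitiveId < meshlet.PrimitiveCount) {
    g_meshletVisiblePrimitiveBuffer[s_meshletBasePrimitive + currentPrimitiveId] = meshletInstanceIndex << 7 | currentPrimitiveId;
}
```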

#

And this

```glsl
// Non-indexed draw: vertices 3t, 3t+1 and 3t+2 all belong to visible triangle t.
const uint visiblePrimitiveIndex = gl_VertexIndex / 3;
const uint visiblePrimitive = g_meshletVisiblePrimitiveBuffer[visiblePrimitiveIndex];
const uint meshletInstanceIndex = visiblePrimitive >> 7u;
const uint primitiveId = visiblePrimitive & 0x7fu;
// `meshlet` is assumed to be fetched via meshletInstanceIndex at this point (not shown).
const uint primitiveCycle = gl_VertexIndex % 3; // corner 0, 1 or 2 of the triangle
const uint primitiveIndex = uint(g_meshletPrimitiveBuffer[meshlet.PrimitiveOffset + primitiveId * 3 + primitiveCycle]);
const uint vertexIndex = g_meshletIndexBuffer[meshlet.IndexOffset + primitiveIndex];
const vec3 position = g_meshletPositionBuffer[meshlet.VertexOffset + vertexIndex];
```
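The two snippets agree on one packed u32 per visible triangle. A CPU-side C++ round-trip check, assuming the 7-bit primitive ID / 25-bit meshlet-instance split implied by `<< 7` and `& 0x7fu`; `PackVisiblePrimitive` and `DecodeVertex` are hypothetical helpers, not part of either renderer:

```cpp
#include <cassert>
#include <cstdint>

// Low 7 bits: primitive (triangle) ID inside the meshlet.
// High 25 bits: meshlet instance index.
constexpr std::uint32_t kPrimitiveBits = 7;
constexpr std::uint32_t kPrimitiveMask = (1u << kPrimitiveBits) - 1;  // 0x7f

std::uint32_t PackVisiblePrimitive(std::uint32_t meshletInstance, std::uint32_t primitiveId) {
    assert(primitiveId <= kPrimitiveMask);
    assert(meshletInstance < (1u << (32 - kPrimitiveBits)));
    return meshletInstance << kPrimitiveBits | primitiveId;
}

struct DecodedVertex {
    std::uint32_t meshletInstance;
    std::uint32_t primitiveId;
    std::uint32_t corner;  // 0, 1 or 2
};

// What the vertex shader recovers from gl_VertexIndex in a non-indexed draw.
DecodedVertex DecodeVertex(const std::uint32_t* visiblePrimitives, std::uint32_t vertexIndex) {
    const std::uint32_t packed = visiblePrimitives[vertexIndex / 3];
    return {packed >> kPrimitiveBits, packed & kPrimitiveMask, vertexIndex % 3};
}
```

The 7-bit field caps a meshlet at 128 triangles (so 64x64 and 64x128 meshlets both fit), and the remaining 25 bits allow about 33.5 million meshlet instances in the buffer.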

primal shadow Jan 17, 2024, 10:39 PM

#

That's what I currently have basically, no? You still have an index buffer of size triangle_count * 3

wicked notch Jan 17, 2024, 10:39 PM

#

look closely

#

I never mul by 3 when indexing the "index buffer" (aka g_meshletVisiblePrimitiveBuffer)

#

it's sized meshletCount * triangleCount

#

or well, meshletInstanceCount * triangleCount

primal shadow Jan 17, 2024, 10:41 PM

#

Oh so you're doing an indirect draw of vertex count size still

wicked notch Jan 17, 2024, 10:41 PM

#

the size of the draw doesn't change yeah

primal shadow Jan 17, 2024, 10:42 PM

#

But putting the triangle IDs in a separate buffer and having the vertices load them

#

Rather than the index buffer

wicked notch Jan 17, 2024, 10:42 PM

#

you could also load the primitive indices directly here

#

it doesn't really matter

primal shadow Jan 17, 2024, 10:42 PM

#

Sensible yeah. Still a lot of data, but less so.

#

Let me try that later, thanks

wicked notch Jan 17, 2024, 10:42 PM

#

it's exactly 3 times less memory

primal shadow Jan 17, 2024, 10:51 PM

#

1gb of data per 250 million triangle instances (excluding asset data)

#

Still a lot, but more manageable than 3gb
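The numbers line up if the binding range from the wgpu error is exactly the u32-per-vertex buffer. A small C++ arithmetic check, assuming 3 x 4 bytes per triangle before the change and 4 bytes after; the helper names are made up for illustration:

```cpp
#include <cassert>
#include <cstdint>

// Bytes for one u32 per vertex (3 per triangle) vs one u32 per triangle.
constexpr std::uint64_t PerVertexIdBytes(std::uint64_t triangles) { return triangles * 3 * 4; }
constexpr std::uint64_t PerTriangleIdBytes(std::uint64_t triangles) { return triangles * 4; }

// Largest triangle count whose buffer still fits in a single binding.
constexpr std::uint64_t MaxTriangles(std::uint64_t bindingLimit, std::uint64_t bytesPerTriangle) {
    return bindingLimit / bytesPerTriangle;
}

constexpr std::uint64_t kBindingLimit = 2147483648ull;     // limit from the error (2 GiB)
constexpr std::uint64_t kReportedRange = 2905695912ull;    // range from the error
constexpr std::uint64_t kTriangles = kReportedRange / 12;  // assumes 3 x u32 per triangle
```

Under that assumption the scene is about 242 million triangle instances; one u32 per triangle comes to roughly 0.97 GB, which fits under the 2 GiB binding limit and matches the ~1 GB per 250 million figure above. The per-vertex layout tops out near 179 million triangles per binding, the per-triangle one near 537 million.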

wicked notch Jan 17, 2024, 10:51 PM

#

The next step is just budgeting (and consequently, LoD)

primal shadow Jan 17, 2024, 10:52 PM

#

Theoretically I need to allocate the whole amount regardless of lods though

wicked notch Jan 17, 2024, 10:52 PM

#

No, you allocate a fixed budget

primal shadow Jan 17, 2024, 10:52 PM

#

Unless I'm ok estimating and allocating a lower amount of data on the premise that it won't all be used due to LODs/culling

wicked notch Jan 17, 2024, 10:53 PM

#

it's just the classic budgeting problem

#

allocate a fixed budget, work with that, hope that culling and LoD don't make you go over budget
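A sketch of the fixed-budget approach in C++ (CPU-side model, hypothetical names): reserve a whole meshlet's range with one atomic add, the same way the per-meshlet `atomicAdd` does, write only what fits, and count the overflow so the renderer can notice it went over budget. Keeping the partial range versus dropping the whole meshlet is a design choice; this version keeps the partial range.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

// Fixed-size visible-triangle buffer shared by all producers.
struct BudgetedBuffer {
    explicit BudgetedBuffer(std::uint32_t budget) : data(budget) {}
    std::vector<std::uint32_t> data;
    std::atomic<std::uint32_t> cursor{0};   // may run past the budget
    std::atomic<std::uint32_t> dropped{0};  // entries that did not fit

    // Reserve `count` entries at once and write as many as fit.
    // Returns the number of entries actually written.
    std::uint32_t Append(const std::uint32_t* values, std::uint32_t count) {
        const std::uint32_t base = cursor.fetch_add(count);
        const std::uint32_t budget = static_cast<std::uint32_t>(data.size());
        const std::uint32_t fits = base >= budget ? 0 : std::min(count, budget - base);
        for (std::uint32_t i = 0; i < fits; ++i) data[base + i] = values[i];
        dropped.fetch_add(count - fits);
        return fits;
    }

    // Entries the draw should consume: the cursor clamped to the budget.
    std::uint32_t Used() const {
        return std::min<std::uint32_t>(cursor.load(), static_cast<std::uint32_t>(data.size()));
    }
};
```

A nonzero `dropped` count after the pass is the "went over budget" signal; what to do with it (coarser LoD next frame, a bigger buffer) is up to the renderer.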

primal shadow Jan 17, 2024, 10:53 PM

#

Mhm

primal shadow Jan 18, 2024, 12:51 AM

#

I wonder if I can do without uploading per triangle data at all

#

Just upload visible meshlet IDs and do some kind of prefix sum to inform how many triangles there are per meshlet, as a running count

#

And then the vertex shader would do some kind of binary search to find their meshlet id
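That idea can be sketched in C++ as a CPU-side model with hypothetical names. An exclusive prefix sum over the visible meshlets' triangle counts gives each meshlet's first global triangle, and a vertex recovers its meshlet with a binary search on `gl_VertexIndex / 3`. Meshlets with zero visible triangles are handled because `upper_bound` skips past any run of equal offsets.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// offsets[i] = first global triangle of visible meshlet i; offsets.back() = total.
std::vector<std::uint32_t> ExclusiveScan(const std::vector<std::uint32_t>& counts) {
    std::vector<std::uint32_t> offsets(counts.size() + 1, 0);
    for (std::size_t i = 0; i < counts.size(); ++i) offsets[i + 1] = offsets[i] + counts[i];
    return offsets;
}

struct TriangleRef {
    std::uint32_t visibleMeshlet;  // index into the visible-meshlet list
    std::uint32_t localTriangle;   // triangle inside that meshlet
};

TriangleRef FindTriangle(const std::vector<std::uint32_t>& offsets, std::uint32_t globalTriangle) {
    assert(globalTriangle < offsets.back());
    // First offset greater than the triangle; its meshlet is the one just before it.
    const auto it = std::upper_bound(offsets.begin(), offsets.end(), globalTriangle);
    const auto meshlet = static_cast<std::uint32_t>(it - offsets.begin() - 1);
    return {meshlet, globalTriangle - offsets[meshlet]};
}
```

The draw would then cover `offsets.back() * 3` vertices. This trades the per-triangle buffer for about log2(visible meshlets) extra loads per vertex; whether that is cheaper on a given GPU would need measuring.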

wicked notch Jan 18, 2024, 1:02 AM

#

ye that's also a possibility

wicked notch Jan 18, 2024, 1:55 AM

#

interestingly, I still fall apart on disjoint meshes

#

I should probably detect those cases and fallback to kdtree partitioning, instead of adding more logic to this

#

this is a pain

#

it's 3am ffs

#

I eep

primal shadow Jan 18, 2024, 2:46 AM

#

scene

#

meshlets

#

regular renderer

primal shadow Jan 18, 2024, 3:07 AM

#

I need to test this on a real scene tbh...

primal shadow Jan 18, 2024, 4:00 AM

#

I think my occlusion culling is not working 🤔

primal shadow Jan 18, 2024, 5:52 AM

#

Apparently my meshlet bounding spheres are not correct

primal shadow Jan 18, 2024, 7:15 PM

#

Fixed occlusion culling

wicked notch Jan 18, 2024, 9:52 PM

#

fuck it

#

I'll just DFS into the graph to find disconnected-ness
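A sketch of that DFS in C++, assuming an indexed triangle list where two triangles count as connected when they share a vertex index (sharing an edge would be a stricter alternative); `LabelComponents` is a hypothetical name. The labels could then drive separate partitioning of each disjoint piece, for example with the k-d tree fallback mentioned earlier in the thread.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

using Triangle = std::array<std::uint32_t, 3>;

// Label each triangle with a connected-component ID. Unwelded duplicate
// vertices do not connect anything, so they show up as separate components.
std::vector<std::uint32_t> LabelComponents(const std::vector<Triangle>& tris,
                                           std::uint32_t vertexCount,
                                           std::uint32_t& componentCount) {
    // vertex -> triangles that use it
    std::vector<std::vector<std::uint32_t>> trisOfVertex(vertexCount);
    for (std::uint32_t t = 0; t < tris.size(); ++t)
        for (std::uint32_t v : tris[t]) trisOfVertex[v].push_back(t);

    constexpr std::uint32_t kUnvisited = UINT32_MAX;
    std::vector<std::uint32_t> label(tris.size(), kUnvisited);
    std::vector<std::uint32_t> stack;
    componentCount = 0;
    for (std::uint32_t seed = 0; seed < tris.size(); ++seed) {
        if (label[seed] != kUnvisited) continue;
        label[seed] = componentCount;
        stack.push_back(seed);
        // Iterative DFS, so large meshes cannot overflow the call stack.
        while (!stack.empty()) {
            const std::uint32_t t = stack.back();
            stack.pop_back();
            for (std::uint32_t v : tris[t])
                for (std::uint32_t n : trisOfVertex[v])
                    if (label[n] == kUnvisited) {
                        label[n] = componentCount;
                        stack.push_back(n);
                    }
        }
        ++componentCount;
    }
    return label;
}
```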

wicked notch Jan 19, 2024, 1:41 AM

#

I also realized a fatal flaw in my shader resource table just now bleakekw

primal shadow Jan 19, 2024, 1:48 AM

#

@wicked notch do you have any idea how to do culling for orthographic projections?

wicked notch Jan 19, 2024, 1:48 AM

#

the same way you do for perspective projections

primal shadow Jan 19, 2024, 1:48 AM

#

hrm

wicked notch Jan 19, 2024, 1:49 AM

#

there is no change in the math if you use the Gribb-Hartmann method to extract frustum planes

#

HZB remains precisely the same
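The Gribb-Hartmann extraction (after Gil Gribb and Klaus Hartmann) only adds and subtracts rows of the projection or view-projection matrix, so nothing in it depends on the projection being perspective. A C++ sketch, assuming row-major `m[row][col]` with column vectors and a clip space where -w <= x, y <= w and 0 <= z <= w (Vulkan/D3D depth); with OpenGL's -w <= z <= w the near plane would be row3 + row2 instead. `Ortho` is a hypothetical test projection mapping x, y in [-h, h] and z in [n, f] to that clip space.

```cpp
#include <array>
#include <cassert>
#include <cmath>

struct Plane { float a, b, c, d; };                // inside: a*x + b*y + c*z + d >= 0
using Mat4 = std::array<std::array<float, 4>, 4>;  // m[row][col], clip = M * (x, y, z, 1)

std::array<Plane, 6> ExtractPlanes(const Mat4& m) {
    auto combine = [&m](int row, float sign) {
        return Plane{m[3][0] + sign * m[row][0], m[3][1] + sign * m[row][1],
                     m[3][2] + sign * m[row][2], m[3][3] + sign * m[row][3]};
    };
    std::array<Plane, 6> planes = {
        combine(0, 1.0f), combine(0, -1.0f),        // x >= -w, x <= w
        combine(1, 1.0f), combine(1, -1.0f),        // y >= -w, y <= w
        Plane{m[2][0], m[2][1], m[2][2], m[2][3]},  // near: z >= 0
        combine(2, -1.0f),                          // far: z <= w
    };
    for (Plane& p : planes) {  // normalize so plane values are real distances
        const float len = std::sqrt(p.a * p.a + p.b * p.b + p.c * p.c);
        p = {p.a / len, p.b / len, p.c / len, p.d / len};
    }
    return planes;
}

bool SphereVisible(const std::array<Plane, 6>& planes, float x, float y, float z, float radius) {
    for (const Plane& p : planes)
        if (p.a * x + p.b * y + p.c * z + p.d < -radius) return false;
    return true;
}

Mat4 Ortho(float halfExtent, float n, float f) {
    Mat4 m{};
    m[0][0] = 1.0f / halfExtent;
    m[1][1] = 1.0f / halfExtent;
    m[2][2] = 1.0f / (f - n);
    m[2][3] = -n / (f - n);
    m[3][3] = 1.0f;
    return m;
}
```

Passing a perspective view-projection matrix to `ExtractPlanes` works the same way; only the depth conversion used for the HZB comparison differs between the two projections.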

primal shadow Jan 19, 2024, 6:16 AM

#

It was the way I was converting depth 😛

#

Forgot I had to adjust that for ortho
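The difference comes from how stored depth relates to view depth. A C++ sketch, assuming a [0, 1] depth range with 0 at the near plane, no reversed Z, and positive view depth (the thread does not say which conventions this renderer uses): perspective depth is hyperbolic in view depth, orthographic depth is linear, so running the perspective conversion on an orthographic depth buffer gives wrong view depths for any test that compares them.

```cpp
#include <cassert>
#include <cmath>

// Perspective: d = f / (f - n) - f * n / ((f - n) * z), non-linear in z.
float ViewDepthFromPerspective(float d, float n, float f) {
    return f * n / (f - d * (f - n));
}

// Orthographic: d = (z - n) / (f - n), linear in z.
float ViewDepthFromOrtho(float d, float n, float f) {
    return n + d * (f - n);
}
```

With n = 0.1 and f = 100, a stored depth of 0.5 means a view depth of about 0.2 under the perspective formula but about 50 under the orthographic one.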

wicked notch Jan 19, 2024, 5:49 PM

#

I think adding fake connectivity edges with a huge weight should make this epic

wicked notch Jan 20, 2024, 1:46 AM

#

TODO: look more into voronoi++
also figure out disjoint set union and vertex hash discretization
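For the vertex hash discretization item, a minimal C++ sketch of grid welding: quantize each position to a cell of size `cellSize` and use the cell as a key, so all vertices in one cell map to the first vertex seen there. `WeldByGrid` is a hypothetical name, and it uses an ordered `std::map` for brevity where a hash map with a custom key hash would be the hashed variant. Known limitation: two nearly identical vertices on opposite sides of a cell boundary do not weld; checking neighboring cells is a common way around that.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <map>
#include <vector>

using Vec3 = std::array<float, 3>;

// Returns remap[i] = index of the representative vertex for vertex i.
std::vector<std::uint32_t> WeldByGrid(const std::vector<Vec3>& positions, float cellSize) {
    std::map<std::array<std::int64_t, 3>, std::uint32_t> firstInCell;
    std::vector<std::uint32_t> remap(positions.size());
    for (std::uint32_t i = 0; i < positions.size(); ++i) {
        std::array<std::int64_t, 3> cell;
        for (int k = 0; k < 3; ++k)
            cell[k] = static_cast<std::int64_t>(std::floor(positions[i][k] / cellSize));
        // emplace does nothing if the cell exists, so the first vertex seen wins.
        remap[i] = firstInCell.emplace(cell, i).first->second;
    }
    return remap;
}
```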

primal shadow Jan 20, 2024, 7:16 AM

#

@wicked notch I kind of want to try LODs. What do I need to know?

buoyant summit Jan 20, 2024, 11:18 AM

#

@wicked notch https://jglrxavpok.github.io/2024/01/19/recreating-nanite-lod-generation.html they have some stuff about welding "close-enough" vertices together there

jglrxavpok’s blog

Recreating Nanite: LOD generation

Random thoughts and progress on my personal projects.

wicked notch Jan 20, 2024, 11:56 AM

#

what an article to wake up to

frank sail Jan 21, 2024, 7:02 AM

#

that article seems like exactly what you are working on lol

#

now you can copy his homework. how serendipitous

wicked notch Jan 21, 2024, 12:14 PM

#

he's using meshoptimizer for clusterization, which is what I'm struggling with

#

but the "welding vertices together" is extremely valuable info

#

so I'll be copying that indeed KEKW

wispy spear Jan 21, 2024, 12:17 PM

#

wispy spear let Delaunay and Worley come to you in your dreams

wasnt too far fetched then

wicked notch Jan 21, 2024, 12:23 PM

#

voronoi is definitely helping me

#

I really gotta thank all the magic math guys

#

it's basically the dual of a Delaunay triangulation

wicked notch Jan 21, 2024, 1:00 PM

#

I'm also about to commit crimes against unreal engine

#

none of these planes are connected together, all vertices are unique

wispy spear Jan 21, 2024, 1:11 PM

#

thats how Cities Skylines 2 renders forests, no? 😄

wicked notch Jan 21, 2024, 1:17 PM

#

ok blender did not understand the assignment

#

I'll do it myself

wicked notch Jan 21, 2024, 2:10 PM

#

epic

#

it crashes unreal

#

welp

#

here's the mesh

📎 disjoint_planes_single.glb

#

it's 100% valid gltf

wicked notch Jan 21, 2024, 2:32 PM

#

I managed to import it

#

and yep, the real deal fails hard here

wicked notch Jan 21, 2024, 2:53 PM

#

real footage of me abusing unreal

wispy spear Jan 21, 2024, 3:02 PM

#

hehe my engine does not support untringleized stuff

wicked notch Jan 22, 2024, 1:24 AM

#

btw @frank sail

#

remember the eternally broken HZB?

#

for some reason it's not broken anymore

#

I have no idea why this even works

#

but if during the HzbCopy shader, instead of doing an initial reduction, you just take the sample and shove it into level 0 and then reduce from there

#

you get no flickering or artifacting of any kind
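For reference, a typical "reduce from there" step after level 0 can be sketched in C++ as follows. This is a CPU model with hypothetical names, not the HzbReduce shader from the thread, and it assumes non-reversed depth so the farthest value is the max (reversed Z would use min). The detail that often goes wrong is odd sizes: the last target texel must also cover the extra source row or column.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Level {
    int width, height;
    std::vector<float> depth;  // row-major
    float At(int x, int y) const { return depth[static_cast<std::size_t>(y) * width + x]; }
};

// Next HZB mip: each texel keeps the farthest depth of its 2x2 footprint,
// widened to 3 texels on the last column/row when the source size is odd.
Level Reduce(const Level& src) {
    Level dst{std::max(1, src.width / 2), std::max(1, src.height / 2), {}};
    dst.depth.resize(static_cast<std::size_t>(dst.width) * dst.height);
    for (int y = 0; y < dst.height; ++y) {
        for (int x = 0; x < dst.width; ++x) {
            const int x0 = 2 * x, y0 = 2 * y;
            const int x1 = (x == dst.width - 1) ? src.width - 1 : x0 + 1;
            const int y1 = (y == dst.height - 1) ? src.height - 1 : y0 + 1;
            float farthest = 0.0f;
            for (int sy = y0; sy <= y1; ++sy)
                for (int sx = x0; sx <= x1; ++sx)
                    farthest = std::max(farthest, src.At(sx, sy));
            dst.depth[static_cast<std::size_t>(y) * dst.width + x] = farthest;
        }
    }
    return dst;
}
```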

frank sail Jan 22, 2024, 1:41 AM

#

wicked notch for some reason it's not broken anymore

#

the gnomes invaded your PC and fixed it

wicked notch Jan 22, 2024, 1:42 AM

#

can you try it out in frogfood as well, just to check whether I'm going crazy or not

#

because it actually works

#

I don't know why it actually works

frank sail Jan 22, 2024, 2:39 AM

#

wicked notch but if during the HzbCopy shader, instead of doing an initial reduction, you jus...

can you show the code

#

after you wake up

wicked notch Jan 22, 2024, 11:18 AM

#

HzbCopy
HzbReduce
Cull

frank sail Jan 22, 2024, 11:20 AM

#

wicked notch but if during the HzbCopy shader, instead of doing an initial reduction, you jus...

erm what I mean is the code for this step

#

sounds like 1-2 lines

wicked notch Jan 22, 2024, 11:21 AM

#

oh yeah

frank sail Jan 22, 2024, 11:21 AM

#

originally we took the nearest 4 texels or so

#

but it sounds like you want to do something different

wicked notch Jan 22, 2024, 11:21 AM

#

it's just this
```glsl
void main() {
    const ivec2 sourceSize = textureSize(g_inputDepthImage, 0);
    const ivec2 targetSize = imageSize(g_hzbMainImage);
    if (any(greaterThanEqual(gl_GlobalInvocationID.xy, targetSize))) {
        return;
    }
    // Point sample: exactly one source texel per target texel, no reduction.
    // With ratio > 1 some source texels are never read.
    const vec2 ratio = vec2(sourceSize) / vec2(targetSize);
    const ivec2 sourcePosition = ivec2(vec2(gl_GlobalInvocationID.xy) * ratio + 0.5);
    const ivec2 destPosition = ivec2(gl_GlobalInvocationID.xy);

    const float depth = GetSample(g_inputDepthImage, sourcePosition);
    imageStore(g_hzbMainImage, destPosition, vec4(depth));
}
```

#

GetSample has an additional safeguard where I do min(position, textureSize(depth, 0) - 1)
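Why a point-sampled copy is suspicious: with a 1920x1080 source and a 1024x1024 target the horizontal ratio is 1.875, so `ivec2(x * ratio + 0.5)` reads one texel and skips the ones in between, and the pyramid is no longer guaranteed to hold the farthest depth under each texel. A conservative alternative covers every source texel the target texel overlaps. A CPU-side C++ sketch with hypothetical names, assuming depth in [0, 1] and non-reversed Z so "farthest" means max (reversed Z would use min):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct DepthImage {
    int width, height;
    std::vector<float> texels;  // row-major
    float At(int x, int y) const { return texels[static_cast<std::size_t>(y) * width + x]; }
};

// Each target texel stores the max depth of every source texel its footprint touches.
DepthImage ConservativeHzbCopy(const DepthImage& src, int dstW, int dstH) {
    assert(dstW > 0 && dstH > 0);
    DepthImage dst{dstW, dstH, std::vector<float>(static_cast<std::size_t>(dstW) * dstH)};
    const double rx = double(src.width) / dstW;
    const double ry = double(src.height) / dstH;
    for (int y = 0; y < dstH; ++y) {
        for (int x = 0; x < dstW; ++x) {
            // Source texel range [x0, x1] x [y0, y1] overlapped by this target texel.
            const int x0 = int(std::floor(x * rx));
            const int x1 = std::min(src.width - 1, int(std::ceil((x + 1) * rx)) - 1);
            const int y0 = int(std::floor(y * ry));
            const int y1 = std::min(src.height - 1, int(std::ceil((y + 1) * ry)) - 1);
            float farthest = 0.0f;
            for (int sy = y0; sy <= y1; ++sy)
                for (int sx = x0; sx <= x1; ++sx)
                    farthest = std::max(farthest, src.At(sx, sy));
            dst.texels[static_cast<std::size_t>(y) * dstW + x] = farthest;
        }
    }
    return dst;
}
```

For example, copying the row {0.1, 0.2, 0.9} into two texels gives {0.2, 0.9}, while the point-sampled shader would read source texels 0 and 2 and store {0.1, 0.9}, recording a nearer depth than the 0.2 surface the first texel also covers.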

frank sail Jan 22, 2024, 11:21 AM

#

aight

#

how could that possibly be correct doe bleakekw

wicked notch Jan 22, 2024, 11:22 AM

#

I have zero clue

#

this should not work

frank sail Jan 22, 2024, 11:22 AM

#

wait what does GetSample do exactly

wicked notch Jan 22, 2024, 11:22 AM

#

and yet it does

frank sail Jan 22, 2024, 11:22 AM

#

is it just texelFetch

wicked notch Jan 22, 2024, 11:22 AM

#

frank sail wait what does `GetSample` do exactly

```glsl
float GetSample(restrict readonly image2D image, ivec2 coord) {
    return imageLoad(image, min(coord, imageSize(image) - 1)).x;
}
```

frank sail Jan 22, 2024, 11:22 AM

#

ok yeah

#

how tf

wicked notch Jan 22, 2024, 11:22 AM

#

I do not know

frank sail Jan 22, 2024, 11:22 AM

#

my guess is that your code subtly fails still

wicked notch Jan 22, 2024, 11:22 AM

#

yes

#

I did not observe this

frank sail Jan 22, 2024, 11:23 AM

#

I can't test rn because I'm deep in this uncommitted code (and need to eep soon)

wicked notch Jan 22, 2024, 11:23 AM

#

I tested on the classic intel sponza long tile and it works fine

#

which scene was used the last time we observed HZB failure?

frank sail Jan 22, 2024, 11:24 AM

#

uh I think khronos sponza

#

that one had the worst artifacts for me

wicked notch Jan 22, 2024, 11:29 AM

#

after observing extensively for 5 minutes, I cannot see artifacts

#

it slightly underculls though

frank sail Jan 22, 2024, 11:29 AM

#

but it does cull something

wicked notch Jan 22, 2024, 11:30 AM

#

it does cull a lot yes

frank sail Jan 22, 2024, 11:30 AM

#

when you say it underculls, does that mean it's bugged

wicked notch Jan 22, 2024, 11:31 AM

#

no it culls every smol meshlet inside

#

the camera is outside of sponza

frank sail Jan 22, 2024, 11:32 AM

#

ah so this is just not culling the yuge meshlets

#

which is expected

wicked notch Jan 22, 2024, 11:32 AM

#

works fine here too

#

this is at 1920x1080 with a 1024^2 HZB

#

it makes zero sense

wispy spear Jan 22, 2024, 11:33 AM

#

driver update in between perhaps, which fixed something?

wicked notch Jan 22, 2024, 11:34 AM

#

I've had geforce experience that is nagging me to update my drivers for a while now KEKW

wispy spear Jan 22, 2024, 11:34 AM

#

or jaker made someone add IrisVk.exe to "the list"

wicked notch Jan 22, 2024, 11:34 AM

#

ShooterGame.exe

faint crane Jan 22, 2024, 5:55 PM

#

We experienced some solar radiation storms up north which may have flipped the bit you needed to fix Hi-Z culling.

wicked notch Jan 22, 2024, 10:30 PM

#

to avoid clogging the #engine-dev channel

#

we have unity

#

it is now time to reverse engineer the shit out of this engine

wispy spear Jan 22, 2024, 10:31 PM

#

~~does it have hzb and vsm? 😄~~

wicked notch Jan 22, 2024, 10:32 PM

#

maybe it does have hzb, but surely not VSM

#

that means it's 1-0 already for GP

wispy spear Jan 22, 2024, 10:34 PM

#

hehe

#

i also saw this popping up in my recommendations https://www.youtube.com/watch?v=w99UcsgkUgE

YouTube

Keenan Crane

Geometry Processing with Intrinsic Triangulations (Day I)

This video is the first in a series of two lectures given by Keenan Crane at the Harvard FRG Workshop on Geometric Methods for Analyzing Discrete Shapes: https://cmsa.fas.harvard.edu/frg-2021/

Day II: https://www.youtube.com/watch?v=JQ2burHX710

Abstract: The intrinsic viewpoint was a hallmark of 19th century geometry, enabling one to reason ab...


wicked notch Jan 22, 2024, 10:37 PM

#

oh nice

#

bookmarked

wispy spear Jan 22, 2024, 10:38 PM

#

seems to be a multipart-er thing

wicked notch Jan 22, 2024, 10:39 PM

#

oh wow, unity has shit default shadows

#

#

peter panning mmm yes

#

oh damn unity uses dx11 by default KEKW

wheat haven Jan 22, 2024, 10:48 PM

#

it feels like dx11 is still the default for most things these days

wicked notch Jan 22, 2024, 10:52 PM

#

uh

#

is unity shit by default or am I just too inexperienced with unity

wispy spear Jan 22, 2024, 10:52 PM

#

both perchance

#

im sorry, i couldnt resist

wicked notch Jan 22, 2024, 10:53 PM

#

just take a look at this capture

📎 x.rdc

wicked notch Jan 22, 2024, 10:53 PM

#

wispy spear im sorry, i couldnt resist

you're right tho KEKW

#

besides the plane being 200 triangles of allah

#

they don't do any indirect shenanigans, meshletisms, HZBs

wheat haven Jan 22, 2024, 10:56 PM

#

yeah I see z-prepass, CSM, rendering, postfx

#

and one other pass which might be bloom? I can't tell

wicked notch Jan 22, 2024, 10:57 PM

#

it's SSS

wheat haven Jan 22, 2024, 10:57 PM

#

ah

wicked notch Jan 22, 2024, 10:57 PM

#

of all things KEKW

#

wait hold up

#

I need to use HDRP if I want the good™️ stuff
