#Iris - A Journey through OpenGL and beyond to learn Graphics

frank sail
#

devsh bingo:
he will shill his lib to you (free space)

wispy spear
#

hehe

#

just stand your ground

distant lodge
#

assert your dominance by shilling your lib to him

wicked notch
#

I think I just had a huge brain moment

#

Actually nevermind

#

It's view dependent so it won't work

#

goddamnit

#

pure sadness

#

I thought marking the biggest primitive in the cluster and checking only that would be smart, but it isn't

wicked notch
#

since meshlets are disconnected, I can't guarantee that such primitive will be in view when others are

#

What does this meeean

#

How can I do this in a way that doesn't take all my goddamn compute power

distant lodge
#

you'd think just the AABB extents would give you a close enough read on that

wicked notch
#

yes but they won't, because meshoptimizer prioritizes topological efficiency over locality

#

Which means clusters are very much welcome to have disconnected triangles that span the entire mesh

distant lodge
#

oh I get what you mean, but I have a feeling that meshopt's clusterizer doesn't work like nanite's

#

and I think zeux's opinion on locality only holds if you're not making a hybrid hw-sw rasterizer

wicked notch
#

Yes I get that, this all translates to "I have to make my own clusterizer that matches my requirements"

#

I have no idea how to though bleakekw

distant lodge
#

you could look at nanite's, or start simple

#

maybe try converting your meshes to triangle strips

#

and then subdivide those strips into equal pieces

#

that should give you some kind of locality
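
The strip idea above can be sketched on the CPU. This is only a minimal sketch, not anyone's actual clusterizer: `StripCluster` and `split_strip` are hypothetical names, and it assumes the mesh has already been converted to one continuous triangle strip without restart indices. It only computes which index range of the strip each cluster would cover.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One cluster cut out of a triangle strip: a contiguous run of strip indices.
struct StripCluster {
    std::size_t first_index;  // offset into the strip's index array
    std::size_t index_count;  // triangles in this cluster + 2
};

// Splits a strip of `strip_index_count` indices into clusters of at most
// `max_triangles` triangles each. Triangle i of a strip uses indices
// i, i+1, i+2, so neighbouring clusters overlap by two indices.
std::vector<StripCluster> split_strip(std::size_t strip_index_count,
                                      std::size_t max_triangles) {
    assert(max_triangles > 0);
    std::vector<StripCluster> clusters;
    if (strip_index_count < 3) return clusters;  // no complete triangle
    const std::size_t triangle_count = strip_index_count - 2;
    for (std::size_t first = 0; first < triangle_count; first += max_triangles) {
        const std::size_t triangles = std::min(max_triangles, triangle_count - first);
        clusters.push_back({first, triangles + 2});
    }
    return clusters;
}
```

Keeping `max_triangles` even means every cluster starts on an even triangle of the strip, so the alternating winding of strip triangles stays consistent from cluster to cluster.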

wicked notch
#

I'll try to find Unreal's clusterizer, their folder structure is kinda insane KEKW

cedar seal
#

Wouldn't something like octree give you locality?

wicked notch
#

Possibly, I'd have to experiment

wicked notch
#

These are all the software rasterized clusters

#

For some reason I don't understand there are clusters that are way too big

#

Ah I wasn't abs'ing properly

#

Now it's good
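
The chat does not show the classification code, so the following is just one plausible reading of the "abs'ing" fix: measure a cluster's screen-space bounding box in pixels and send it to the software rasterizer only when it is small. `screen_extent_px`, `prefers_software_raster` and the pixel threshold are hypothetical, chosen for illustration.

```cpp
#include <cassert>
#include <cmath>

struct Vec2 { float x, y; };

// Size in pixels of a cluster's screen-space bounding box, given two
// opposite corners in NDC ([-1, 1] on both axes).
Vec2 screen_extent_px(Vec2 ndc_min, Vec2 ndc_max, Vec2 viewport_px) {
    // fabs matters: after projection (or a Y flip) "min" and "max" can swap,
    // and a negative extent would make every cluster look tiny.
    return { std::fabs(ndc_max.x - ndc_min.x) * 0.5f * viewport_px.x,
             std::fabs(ndc_max.y - ndc_min.y) * 0.5f * viewport_px.y };
}

// True when the cluster is small enough on screen for the software path.
bool prefers_software_raster(Vec2 ndc_min, Vec2 ndc_max, Vec2 viewport_px,
                             float max_extent_px) {
    assert(max_extent_px > 0.0f);
    const Vec2 extent = screen_extent_px(ndc_min, ndc_max, viewport_px);
    return extent.x <= max_extent_px && extent.y <= max_extent_px;
}
```

Without the absolute value, a flipped axis produces a negative extent that always passes the "small enough" test, which would match the symptom of oversized clusters ending up in the software path.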

#

Not gonna lie it's kinda cool to see it work in practice, even if 50% of the potentially software rasterizable clusters are hardware rasterized...

#

Looks pretty good

#

(still no culling btw)

wicked notch
#

With everything enabled it's absurdly fast

#

You can kinda see when the rasterizer chokes because of small tris, for 100 microseconds or so vertex occupancy is great but pixel occupancy is very low, then for the next 200 microseconds everything is good

wispy spear
#

you are an animal

#

Nanite 2.0 soon

cedar seal
#

Frognite

wicked notch
#

Yes

#

That'll be the name of this

cedar seal
#

Is this iris thing somewhere in github?

wispy spear
#

check pins

frank sail
#

da peeens

wicked notch
#

so

#

here's Sponza

#

very cute huh?

#

except uh

#

Perhaps I subdivided it a bit too much

frank sail
#

you got that signature nanite look though

wispy spear
#

now add Lumen 2.0

#

perhaps call it Nemul 🙂

wicked notch
#

GI is terrifying

distant lodge
#

you're right

wispy spear
#

says the person who just reshrimplemented Nanite

wicked notch
#

The hard part is the LOD DAG

wispy spear
#

ah

#

thats where you should also talk to devsh

wicked notch
#

yes

wispy spear
#

he has a little video series on his discord explaining it

wicked notch
#

btw I'm curious to check if I effectively wasted my time, or if this is actually better

#

so I'm packaging this high poly sponza™️ for you guys to test

distant lodge
#

oh wtf so he actually implemented his own meshleting LOD

#

I'm hyped to see if it works on my 760

#

hyped to see if I even have the video memory for it

frank sail
#

I don't think lod is in yet

#

But I'm hyped

#

Oh you're talking about devsh

distant lodge
#

yeah

wispy spear
#

my 2x780ti were able to run devsh's LOD thing with 25mio tringles iirc, 1.5years ago

wicked notch
#

Go ahead boys, post screenshots and perf data

wispy spear
#

only if it runs on lunix too 😛

wicked notch
#

It's just a model

#

The high poly sponza

#

To test with your own schtuff

wispy spear
#

ah

#

fuck

#

im blind now

wicked notch
#

Here's my perf data

#

1.96ms to rasterize
2.22ms total with textures
no culling

wispy spear
#

1.x GB 😛

#

frametime 41ms

wicked notch
#

Hmm, can I see what's taking so much time

wicked notch
#

In NSight

raven orchid
#

I sometimes get “not supported on this hardware” messages with parts of the profiler

wicked notch
#

Actually yeah, I forgor that GPU trace only works with new GPUs

#

You should still have the frame profiler though

raven orchid
#

yeah frame profiler should work

#

dang wish the full profiler worked 😦

#

for this mesh did you subdivide it in your thing and then export or how did you gen?

wicked notch
#

Regular catmull-clark subdivision

wicked notch
#

I downloaded some library

frank sail
#

I think demongod said it doesn't work on his pascal gpu though

raven orchid
#

yeah with opengl it gets mad for some reason

frank sail
#

also make sure you run nsight as admin or sudo

raven orchid
#

well it gets mad for 2 reasons: A) it does not like pascal

frank sail
#

that's another classic

raven orchid
#

B) does not support anything but DX12 or VK

frank sail
#

I can use it just fine for my opengl apps

#

just not the shader profiler

raven orchid
#

can you profile the shaders

#

ah yeah

#

😦

#

same

frank sail
#

gpu trace is good enough for me 90% of the time

#

e.g., if it says you're vram bottlenecked, it's generally quite obvious what lines would impact that

raven orchid
#

yeah true, usually you can frogdelet through the code for that stage and guess the bad parts

wicked notch
#

mr jaker

#

what do you have to say about this kind of occupancy

frank sail
#

register usage seems kinda high, but I'm not sure how to interpret the graph exactly

#

oh this is my own thing lol

frank sail
#

what mesh are you drawing

wicked notch
#

smh doesn't even recognize his own brainchild

frank sail
#

I see

#

how many primitives

wicked notch
#

25 million

frank sail
#

I mean in terms of draw calls

#

Doesn't seem like too many tbh, which is good

wicked notch
#

Ah yeah, it's very little

frank sail
#

All I can say is that the RSM pass is the only thing I attempted to optimize

#

The rest is just a very naive deferred renderer

wicked notch
#

25 draw calls btw

frank sail
#

Lol the sample RSM pass is super L2 bound because it's doing random samples

raven orchid
#

I tried to open it in blender and i don’t have enough ram for that lol

wicked notch
#

1080p

#

No FSR2

frank sail
#

nice

#

fsr2 performs horribly on nv anyways

#

also, this is definitely vertex bound

wicked notch
#

do I have your permission to implement my thing in frogfooding

frank sail
#

what thing frogstare

#

meshlet renderer?

wicked notch
#

yes, without hybrid sw/hw though

#

Because I can't do that in GL 😦

frank sail
#

does that require mesh shading?

wicked notch
#

no

#

good old vertex shader

#

I am planning to switch back to vertex shaders as well

#

I like being able to run on more than 0.01% of systems bleakekw

frank sail
#

does your technique require any deps

#

besides maybe meshopt

wicked notch
#

just meshopt

frank sail
#

yeah you can add it if you want

#

though

#

it's in very early dev atm

#

I didn't originally plan frogfood as a collaborative thingy, but this feature seems lit enough that I'm willing to try

wicked notch
#

Epique

wicked notch
#

nope

#

I need that only because of hybrid sw/hw raster

frank sail
#

oh because regular shaders

#

noice

wicked notch
#

If I were able to write to D32 attachments from compute life would be so much easier (and no 64 bit atomics either)

frank sail
#

you can write to a D32 attachment and then copy

#

oops I mean R32

#

it's ugly, but works

wicked notch
#

How do you merge 2 D32 attachments?

frank sail
#

wdym

wicked notch
#

Uhhh

#

It could work?

#

I mean I need to store meshlet_id and primitive_id

#

so how would I update the meshlet_id and primitive_id only if the depth test passed?

frank sail
#

cas loop bleakekw

#

but yeah just frogment shader is fine

wicked notch
#

I could yolo it like this:

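```glsl
// Racy on purpose: load, compare and store are separate operations, so two
// invocations touching the same pixel can overwrite each other.
const float stored_depth = uintBitsToFloat(imageLoad(depth, position).x);
if (stored_depth > current_depth) {
    // YOLO
    imageStore(depth, position, uvec4(floatBitsToUint(current_depth)));
    imageStore(visbuffer, position, uvec4(meshlet_id << 24 | primitive_id));
}
```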
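
For comparison, here is a CPU sketch of the race-free variant hinted at by "cas loop" and "64 bit atomics": depth and payload are packed into one 64-bit word so a single atomic operation performs both the depth test and the visbuffer write. `pack_depth_payload` and `atomic_min_u64` are hypothetical names, and the sketch assumes non-negative depth where smaller means closer, as in the snippet above.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>

// Depth in the high 32 bits, payload in the low 32 bits. For non-negative
// floats the IEEE-754 bit pattern sorts like the value itself, so comparing
// the packed integers compares depth first.
std::uint64_t pack_depth_payload(float depth, std::uint32_t payload) {
    assert(depth >= 0.0f);
    std::uint32_t bits;
    std::memcpy(&bits, &depth, sizeof bits);
    return (std::uint64_t{bits} << 32) | payload;
}

// Atomic "store if smaller", written as a compare-and-swap loop.
// Returns true if `candidate` won the depth test.
bool atomic_min_u64(std::atomic<std::uint64_t>& pixel, std::uint64_t candidate) {
    std::uint64_t current = pixel.load(std::memory_order_relaxed);
    while (candidate < current) {
        // On failure `current` is refreshed with the value another thread wrote.
        if (pixel.compare_exchange_weak(current, candidate,
                                        std::memory_order_relaxed)) {
            return true;
        }
    }
    return false;
}
```

On a GPU with 64-bit image or buffer atomics the whole loop collapses into a single atomic min, which is why that feature keeps coming up for software rasterization.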
frank sail
#

ship it

wicked notch
#

The only thing that's kinda meh with the old vertex shader approach is the preprocessing step

#

It takes a good third of the raster time*

#

Perhaps one could cache this

raven orchid
#

Reprocesses all vertices each frame?

wicked notch
#

Yeah, it builds an index buffer with meshlet_id << 7 | primitive_id & 0x7f

frank sail
#

chonky

wicked notch
#

The shader's super easy as well

#
```glsl
shared uint base_index[MESHLET_PER_WORK_GROUP];
shared uint base_primitive[MESHLET_PER_WORK_GROUP];
shared uint primitive_count[MESHLET_PER_WORK_GROUP];

void main() {
    // 64 invocations per meshlet, one invocation per triangle
    const uint meshlet_base_id = gl_WorkGroupID.x * MESHLET_PER_WORK_GROUP;
    const uint meshlet_offset = gl_LocalInvocationID.x / 64;
    const uint meshlet_id = meshlet_base_id + meshlet_offset;
    const uint local_id = gl_LocalInvocationID.x % 64;
    const uint index = local_id * 3;

    // The first invocation of each meshlet reserves its range in the output index buffer
    if (meshlet_id < meshlet_count && local_id == 0) {
        const meshlet_t meshlet = meshlets[meshlet_id];
        base_index[meshlet_offset] = atomicAdd(o_command.index_count, meshlet.primitive_count * 3);
        base_primitive[meshlet_offset] = meshlet.base_primitive;
        primitive_count[meshlet_offset] = meshlet.primitive_count;
    }
    barrier();
    // Each invocation then writes the three packed indices of its triangle:
    // meshlet ID in the upper 24 bits, meshlet-local index in the lower 8
    if (meshlet_id < meshlet_count && local_id < primitive_count[meshlet_offset]) {
        o_meshlet.indices[base_index[meshlet_offset] + index + 0] = (meshlet_id << 8u) | (primitives[base_primitive[meshlet_offset] + index + 0] & 0xffu);
        o_meshlet.indices[base_index[meshlet_offset] + index + 1] = (meshlet_id << 8u) | (primitives[base_primitive[meshlet_offset] + index + 1] & 0xffu);
        o_meshlet.indices[base_index[meshlet_offset] + index + 2] = (meshlet_id << 8u) | (primitives[base_primitive[meshlet_offset] + index + 2] & 0xffu);
    }
}
```
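
The packing used by this shader is easy to check on the CPU. A minimal sketch, assuming the 8-bit split from the shader; `pack_index`, `unpack_index` and the constant names are hypothetical. The vertex shader would presumably apply the same unpacking to the index it receives to recover the meshlet and the meshlet-local vertex.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t LOCAL_INDEX_BITS = 8;  // matches << 8u and & 0xffu
constexpr std::uint32_t LOCAL_INDEX_MASK = (1u << LOCAL_INDEX_BITS) - 1u;

// Meshlet ID in the upper 24 bits, meshlet-local index in the lower 8.
std::uint32_t pack_index(std::uint32_t meshlet_id, std::uint32_t local_index) {
    assert(meshlet_id < (1u << (32 - LOCAL_INDEX_BITS)));  // only 24 bits left
    return (meshlet_id << LOCAL_INDEX_BITS) | (local_index & LOCAL_INDEX_MASK);
}

struct UnpackedIndex {
    std::uint32_t meshlet_id;
    std::uint32_t local_index;
};

UnpackedIndex unpack_index(std::uint32_t packed) {
    return { packed >> LOCAL_INDEX_BITS, packed & LOCAL_INDEX_MASK };
}
```

With 8 bits for the local index, about 16.7 million meshlets fit in the remaining 24 bits before the scheme needs wider indices.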
wispy spear
#

FFU 🙂

#

should have a name too

wicked notch
#
```glsl
#define PRIMITIVE_MASK (MAX_PRIMITIVES - 1)
```
bleakekw
frank sail
#

not too bad

wicked notch
#

I made it better

#

One less memory load goes a long way

wispy spear
#

how many ms does that save?

wicked notch
#

About half

#

0.29 -> 0.15

wispy spear
#

not bad

#

means more juice for potential denoisers/giisms later

wicked notch
#

I feel like the driver is hiding some more juice though

#

The hardware itself too

wispy spear
#

that is why you should write about all that

#

and make it available for others outside our GP bubble

raven orchid
#

How does nanite do lods? For some reason I didn’t think they were anymore

distant lodge
#

the LODs are like the bread and butter technically speaking

wicked notch
#

Given a list of N clusters, they build a DAG where the leaves are the most detailed LODs

glass sphinx
#

virtualized geo makes it possible to draw so much

raven orchid
#

Yeah I guess when I hear lod I think chunky stuff from older games

glass sphinx
#

its really complex, especially the hierarchy building is very complex and involved

#

nanite can dynamically lod parts of objects even

wicked notch
#

Then for each LOD level until we reach the root, we take M clusters (let's say M=4) with the most shared boundaries and shrimplify them as one group. The shrimplification halves the triangle count of the merged group; after that we split the result into two clusters, and those become the parents

wicked notch
#

We simplify clusters in groups because otherwise we would have cracks where cluster boundaries don't match, so we try to group clusters with the most shared boundaries, and leave the outer boundaries of the group unchanged

#

This is repeated until we reach the root which is always MAX_PRIMITIVES triangles
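
To get a feel for how deep such a DAG gets, here is a small counting sketch. It is a simplification: it only tracks how many clusters exist per level, assuming every full group of `group_size` clusters yields `group_size / 2` parents and that leftovers are grouped the same way. `lod_level_cluster_counts` is a hypothetical helper, not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Number of clusters on each level, from the leaves (index 0) up to the root.
std::vector<std::size_t> lod_level_cluster_counts(std::size_t leaf_clusters,
                                                  std::size_t group_size = 4) {
    assert(leaf_clusters > 0 && group_size >= 2 && group_size % 2 == 0);
    std::vector<std::size_t> counts{leaf_clusters};
    std::size_t n = leaf_clusters;
    while (n > 1) {
        const std::size_t full_groups = n / group_size;
        const std::size_t leftover = n % group_size;
        // Halving the triangles of a group halves the clusters needed for it.
        n = full_groups * (group_size / 2) + (leftover + 1) / 2;
        counts.push_back(n);
    }
    return counts;
}
```

Since each level roughly halves the cluster count, a mesh with a million leaf clusters ends up with about 20 levels, and the whole DAG holds roughly twice as many clusters as the leaf level alone.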

raven orchid
#

That’s super cool

#

So right now is yours rendering tons of stuff but at max lod?

wicked notch
#

How to choose the correct cut for the DAG is even more clamplicated, it's based on screen space projected error using a quadric error metric
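
The cut selection can at least be sketched in its commonly described form, leaving aside how the error values themselves are computed (the hard part mentioned above). The sketch assumes each cluster stores a world-space error that never decreases towards the root; `projected_error_px` and `in_lod_cut` are hypothetical names, and the projection is the plain perspective scale of a length at a given distance.

```cpp
#include <cassert>
#include <cmath>

// Projects a world-space error (a length) to pixels for a perspective camera.
float projected_error_px(float world_error, float distance,
                         float fov_y_radians, float viewport_height_px) {
    assert(distance > 0.0f);
    const float pixels_per_unit_at_distance_1 =
        viewport_height_px / (2.0f * std::tan(fov_y_radians * 0.5f));
    return world_error * pixels_per_unit_at_distance_1 / distance;
}

// A cluster is part of the cut when it is accurate enough itself
// but its coarser parent is not.
bool in_lod_cut(float own_error_px, float parent_error_px, float threshold_px) {
    return own_error_px <= threshold_px && parent_error_px > threshold_px;
}
```

Because the test only needs a cluster's own error and its parent's error, every cluster can be evaluated independently in a compute shader. Treating the root's parent error as infinite and a leaf's own error as zero guarantees that exactly one level is picked along every path, provided the errors are monotonic.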

wicked notch
#

But honestly the error thing flew so high over my head that I'm thinking of not implementing that bleakekw

#

Brian Karis himself said it took over one year of full development just to get the error metric right

raven orchid
#

Wow

#

Rnd for nanite and lumen was extreme

wicked notch
#

Everytime I hear this I get scared frog_gone

raven orchid
#

Oh yeah 1 man year he said

wispy spear
#

its supposed to be simple to be implemented

#

otherwise they wouldntve said "simplify"

wicked notch
#

I was thinking of doing some good old shading

#

lighting, shadowing, real time SDF based global illumination

#

But for shadows I would need to call the whole Frognite™️ (patent pending) pipeline N times

#

So multiview?

#

Do I just dispatch more workgroups (for software)?

#

And I guess I could just use native multiview for mesh shaders

wicked notch
#

Aaaaaaaand gltfpack can't properly convert normal maps to BC5_RG

#

epic

frank sail
#

what does it do?

wicked notch
#

It expects the normal map to be transcoded to BC1_RGB

#

so it doesn't swizzle the green channel of the normal map to its alpha channel

frank sail
#

can't wait for your release of 2gltf2pack

#

after you implement Frognite™️ in Frogfood®️ ofc

wicked notch
#

I do plan on releasing Frogmen™️ too

#

(lumen but with frogs)

#

SDFDDGI caught my eye

frank sail
#

more like Lufrog

wispy spear
#

Froglight

wicked notch
#

Before I'll make ridiculous breaking changes

#

Here's some stupid N dot L * base_color shading bleakekw

#

zeux says "it doesn't take into consideration topology very much"

#

Which is exactly what I need KEKW

#

I wonder how many curses I will get from Khronos for integrating DirectX tools with their pristine Vulkan™️ API

#

I also need to start thinking seriously about multiview

#

I am not sure I can use VK_KHR_multiview because I don't bind any color attachments whatsoever in my entire pipeline

#

(except for output to swapchain)

#
```c
typedef struct VkRenderPassMultiviewCreateInfo {
    VkStructureType    sType;
    const void*        pNext;
    uint32_t           subpassCount;
    const uint32_t*    pViewMasks;
    uint32_t           dependencyCount;
    const int32_t*     pViewOffsets;
    uint32_t           correlationMaskCount;
    const uint32_t*    pCorrelationMasks;
} VkRenderPassMultiviewCreateInfo;
```
I'm really not sure how to use this bleakekw
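
As a rough guide to the struct: `pViewMasks` holds one bitmask per subpass, where bit i set means that draws in the subpass are broadcast to view i, and the struct is chained into `VkRenderPassCreateInfo::pNext`. The sketch below only builds such a mask, so it compiles without the Vulkan headers; `view_mask_for` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>

// Bit i set means "broadcast this subpass's draws to view i".
std::uint32_t view_mask_for(std::uint32_t view_count) {
    assert(view_count >= 1 && view_count <= 32);
    // Shifting a 32-bit value by 32 is undefined, so handle that case apart.
    return view_count == 32 ? 0xffffffffu : (1u << view_count) - 1u;
}
```

For a render pass with a single subpass, one would presumably set `subpassCount = 1`, point `pViewMasks` at this mask, leave `dependencyCount` at 0, and reuse the same mask as the only correlation mask. The number of views is limited by the device's `maxMultiviewViewCount`, which is often far below 32. With dynamic rendering the equivalent field is `VkRenderingInfo::viewMask`.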
wispy spear
#

i also noticed something funny

#

our pluginshinanians exposed quantization via gltf, but the plugin itself has some quantificitione options already

wicked notch
#

Does blender document their quantization shenanigans

#

gltfpack is open (blender is too but good luck navigating the 10 million loc black box of stuff bleakekw)

wispy spear
#

i have not checked tbh

#

and its the gltf plugin i was talking about not blender itself

wicked notch
#

Ah it's a plugin

#

By the way

#

Some implementations may not support multiview in conjunction with mesh shaders, geometry shaders or tessellation shaders.

#

This is very sad 😭

wispy spear
#

fook

wicked notch
#

Why would you not support multiview with the feature that literally needs multiview more than any other feature bleakekw

wispy spear
#

I have no idea what multiview is

wicked notch
#

basically fancy gl_Layer

wispy spear
#

ah

wicked notch
#

if you make a framebuffer with layers, you can draw to each layer by writing to gl_Layer

#

multiview gives you gl_ViewIndex and it's read-only

wispy spear
#

yeah, rather than employing a gs

wicked notch
#

You set the number of views you want to render to when you set up the thing and then all the draw commands are broadcast to each view

#

Nanite does this to render all shadow maps, for all lights, in all viewports simultaneously

wispy spear
#

oof that sounds neat

#

something i need too at some point for my shadowisms 🙂

#

or lightprobes perhaps?

frank sail
#

Ah so that's how they do vsm

wicked notch
#

ye

#

I wonder how to do multiview for compute

wispy spear
#

glMultiDispatch

frank sail
#

It would be cool if we could schedule barriers from the GPU

wicked notch
#

True but I don't think I need barriers

#

Perhaps I could shrimply put the unused y dimension to use

frank sail
#

or the hidden w dimension that driver devs don't want you to know about

wicked notch
#

The only issue I have is with memory

#

but it's only a couple million uints

#

so a couple megabytes at worst will become a couple hundred

#

With 100 multiviews
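
The "unused y dimension" idea and its memory cost can be sketched as follows. This is a sketch under assumptions: the shader would read the view from `gl_WorkGroupID.y`, each view gets its own slice of the output buffer, and all helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

struct DispatchSize { std::uint32_t x, y, z; };

// x covers the meshlets, y selects the view.
DispatchSize multiview_dispatch(std::uint32_t meshlet_count,
                                std::uint32_t meshlets_per_group,
                                std::uint32_t view_count) {
    assert(meshlets_per_group > 0 && view_count > 0);
    const std::uint32_t groups_x =
        (meshlet_count + meshlets_per_group - 1) / meshlets_per_group;
    return { groups_x, view_count, 1 };
}

// First element of the slice that view `view_index` writes to.
std::uint64_t view_slice_offset(std::uint32_t view_index,
                                std::uint64_t elements_per_view) {
    return view_index * elements_per_view;
}

// Total size of the output buffer when every view stores uint elements.
std::uint64_t total_output_bytes(std::uint64_t elements_per_view,
                                 std::uint32_t view_count) {
    return elements_per_view * view_count * sizeof(std::uint32_t);
}
```

With 2 million uints per view this gives 8 MB for one view and 800 MB for 100 views, so the estimate of a few hundred megabytes is the right order of magnitude.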

frank sail
#

what's the issue with multiview?

#

also, do you do culling for every view frog_sweat

wicked notch
#

course I do it's extremely fast

#

HZB build + cull + classify doesn't even show up in gputrace right now

frank sail
#

even for 100 views?

wicked notch
#

Combined is less than 50 microsecs

frank sail
#

waw

frank sail
#

ah

wicked notch
#

For 100 views you could estimate 5 milliseconds just to cull bleakekw

#

But 100 views is just a huge upperbound, I think unreal supports up to 128

#

But they cache their whole pipeline/scene and make it persist across frames and more shenanigans beyond my comprehension bleakekw

frank sail
#

ouf

cedar seal
#

You really should fork gltfpack. Most of the processing is in meshoptimizer, gltfpack is essentially almost "just" a wrapper for meshoptimizer.

#

Or even file issues / pull requests

wicked notch
#

TODO
Test MaterialID depth buffering:

  1. Rasterize visbuffer
  2. Fullscreen triangle, load visbuffer per pixel and write gl_FragDepth = uintBitsToFloat(material_id);, depth test set to ALWAYS
  3. Draw more fullscreen triangles, one per material, depth test set to EQUAL
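
One detail worth testing in step 2: bit-casting a small material ID to float yields a denormal value, and it is worth checking that the hardware does not flush those to zero on the way to the depth buffer. As an alternative (a swapped-in encoding, not the one from the TODO), the ID can be mapped to a normalized depth. A minimal sketch, assuming a 32-bit float depth buffer; both function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Maps a material ID to a depth in [0, 1).
float material_id_to_depth(std::uint32_t material_id, std::uint32_t max_materials) {
    assert(max_materials > 0 && max_materials <= (1u << 16));
    assert(material_id < max_materials);
    return float(material_id) / float(max_materials);
}

// Inverse mapping; rounding absorbs the float division error.
std::uint32_t depth_to_material_id(float depth, std::uint32_t max_materials) {
    return std::uint32_t(std::lround(depth * float(max_materials)));
}
```

The per-material fullscreen triangles in step 3 would then be drawn at the depth `material_id_to_depth(id, max_materials)`, so the EQUAL test passes only on pixels that belong to that material.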
wispy spear
#

not sure if you are into bundles, but https://www.youtube.com/watch?v=6_BBgz5-H20

The Unreal Engine Mega Pack is a huge collection of high quality 3d environment assets.
https://www.humblebundle.com/software/unreal-engine-mega-pack-software?partner=gamefromscratch

This pack from Hivemind contains thousands of 3D objects and blueprints for creating a wide variety of maps, including Viking, Medieval, Harbours, Churches, Houses...

#

for the triangle counts 🙂

wicked notch
#

epic

#

Hopefully the gltf exporter doesn't die as usual bleakekw

wispy spear
#

heh

wicked notch
#

@frank sail

#

I drew a lil something

#

Feast your eyes upon my huge drawing skills

#

This is with regards to SMRT

#

As far as I've understood at least

wispy spear
#

reads like al hamdu lillah

wicked notch
#

It's actually english but my writing skills shit

wispy spear
#

heh, you havent seen my handwriting

wicked notch
#

white text is "Sun"
blue text is "Hit"
green text is "March until hit"

wispy spear
#

ah lol

wicked notch
#

How does this produce contact hardening though? thonk

#

Also you seem to define something like a "heightmap thickness" which I have no idea where it comes in

wicked notch
#

the farther away we are, the less likely we are to hit any object?

frank sail
#

Farther away means the ground-to-light ray has more time to diverge before hitting a blocker

#

Idk if you've played counter strike, but you might know in shooter games that peeking when you're near a corner will reveal the enemy more quickly than peeking from far away

wicked notch
#

yes

#

I get that, what about the height map thickness though?

frank sail
#

well that's just a heuristic

#

the depth map doesn't tell us the true geometry of everything behind it, so we have to guess somehow

#

so we just say the depth map is a solid wall of N thickness with absolutely nothing behind it (except for the surface we're shading)
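
The thickness heuristic can be illustrated with a one-dimensional toy version of the ray march. This is not Unreal's implementation, only a sketch of the "wall of N thickness" idea: `depths` is a single row of a shadow map, the ray moves through shadow-map space in fixed steps, and `ray_hits_blocker` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// depths[i] is the nearest blocker depth seen from the light at texel i.
// The ray starts at (start_texel, start_depth) and advances linearly.
bool ray_hits_blocker(const std::vector<float>& depths,
                      float start_texel, float start_depth,
                      float texel_step, float depth_step,
                      int steps, float thickness) {
    assert(thickness > 0.0f);
    float texel = start_texel;
    float depth = start_depth;
    for (int i = 0; i < steps; ++i) {
        texel += texel_step;
        depth += depth_step;
        if (texel < 0.0f || texel >= float(depths.size())) return false;  // left the map
        const float blocker = depths[std::size_t(texel)];
        // The depth map is treated as a slab: solid from `blocker`
        // to `blocker + thickness`, empty behind it.
        if (depth >= blocker && depth <= blocker + thickness) return true;
    }
    return false;
}
```

With a thick slab the ray registers the blocker; with a thin one it can step straight past it, which is one way the light leaking mentioned later shows up.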

wicked notch
#

So instead of the tree we have a huge wall

frank sail
#

unreal uses some additional heuristics that are suggested by console commands

wicked notch
#

From the perspective of the ray at least

frank sail
#

yeah, it's a wall with the outline of a tree

wicked notch
#

I'm testing SMRT in unreal right now lol

#

This is SMRT with 1spp and 1rpp

#

pretty blocky

frank sail
#

btw the UE docs don't cover the new SMRT console commands

#

they remained the same since the 5.0 release

wicked notch
#

Now the obligatory question

#

What are the disadvantages of SMRT?

#

To me it looks like free contact hardening lol

#

Also what happens when the blocker is not in this cascade thonk

frank sail
#

it has light leaking

frank sail
#

you ought to be able to trace within multiple cascades I guess

wicked notch
#

damn

#

Tracing within multiple cascades sounds baad

#

Let's say we got 8 shadow rays

#

8 steps per cascade

#

On average we can assume the blocker has a 50% probability of being in the current cascade

frank sail
#

what I mean is just switching to a different cascade when you go outside the bounds of one

wicked notch
#

hm

frank sail
#

the shadow ray is typically pretty short (even when the blocker is far away, you just terminate the ray), so it's unlikely you'll ever go between more than two

#

at least according to my mental heuristics (I haven't actually implemented this with >1 cascade lel)

wicked notch
#

How do I know when I'm out of bounds though thonk

#

shadow_clip_pos.xy >= 1.0 or <= 0.0?

#

Or maybe when the ray is above the shadow map?
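
One possible answer, sketched on the CPU: project the current ray sample into each cascade and use the first cascade whose shadow-map UV and depth range contain it. The sketch assumes coordinates already remapped to [0, 1] (clip-space xy would be tested against [-1, 1] instead); `ShadowCoord`, `inside_cascade` and `select_cascade` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct ShadowCoord { float u, v, depth; };  // shadow-map UV plus light-space depth

bool inside_cascade(const ShadowCoord& c) {
    return c.u >= 0.0f && c.u <= 1.0f &&
           c.v >= 0.0f && c.v <= 1.0f &&
           c.depth >= 0.0f && c.depth <= 1.0f;
}

// coords[i] is the same ray sample projected into cascade i, ordered from the
// finest cascade to the coarsest. Returns -1 if no cascade contains it.
int select_cascade(const std::vector<ShadowCoord>& coords) {
    assert(!coords.empty());
    for (std::size_t i = 0; i < coords.size(); ++i) {
        if (inside_cascade(coords[i])) return int(i);
    }
    return -1;
}
```

Re-running the selection at every ray step lets the march hop to a coarser cascade as soon as it leaves the current one, and a result of -1 is a natural point to terminate the ray.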

glass sphinx
#

btw do you do entity culling on the gpu?

wicked notch
#

yes

#

In the "cull_and_classify" step

glass sphinx
#

froge_love manual barriers

glass sphinx
#

not full entities

#

im looking at cull_classify.comp

wicked notch
#

Ah right now it's not the latest version

#

But yes, I am culling meshlets

#

meshlet instances

glass sphinx
#

but not entities before that

wicked notch
#

What do you mean by entity?

glass sphinx
#

i guess im asking if you cull full meshes

#

before you cull the meshlets

wicked notch
#

Ah, nope but good call

#

I should probably do that as well

glass sphinx
#

i was gonna ask how if you did

#

because its actually annoying as fuck

#

because you have an asymetric work expansion from mesh to meshlets

#

each mesh can return a different count of meshlets

#

so you need to either do an indirect draw count, where you cull in the vertex shader and discard all vertices, starting MESH_COUNT draws each having an indirect draw with PER_MESH_MESHLET_COUNT as the number of vertices

#

or you do compute work expansion

#

by doing a prefix sum and then binary search

#

but soon tm we may get work graphs which solve this issue btw

wicked notch
#

Hmm sounds hard indeed

#

Yeah, soon KEKW

glass sphinx
#

this is exactly what work graphs would solve

#

sadge that we need to wait

glass sphinx
#

making it ultra simple and efficient

#

ALSO WHY IS THERE NO DISPATCHINDIRECTCOUNT

#

😿

#

@frank sail give it to me

wicked notch
#

Wat

glass sphinx
#

i need multi dispatch indirect

wicked notch
#

??

frank sail
#

why do you want that

wicked notch
#

Wtf there is no dispatch indirect count

#

How

wicked notch
#

It's super useful

glass sphinx
frank sail
#

just increase the size of the dispatch

wicked notch
#

no

frank sail
#

unless you also want indirect global barriers and such

glass sphinx
#

if we get work graphs i wont care at all i wont use any indirect anymore only work graphs if they dont suck

glass sphinx
#

prefix sum and binary search is the most efficient way aside from draw indirect count abuse

glass sphinx
#

i need to map from global thread index to meshlet index and mesh index

#

each mesh not culled has different counts of meshlets

#

i need to iterate over the surviving meshes meshlets

#

so i cant just use the thread index to index meshlets

frank sail
#

can you explain how you would use multi dispatch indirect

#

and how that is not mappable to regular indirect dispatch

#

if you need ordering, then you're basically asking for indirect barriers and command submission, which I agree would be cool and useful

glass sphinx
#
  • i would make a buffer containing n dispatch indirect structs
  • each containing the meshlet count / workgroupsize rounded up as the x parameter, 1,1 for y and z
  • i would populate this in mesh culling, each surviving mesh appends to this buffer filling the dispatch values
  • then dispatch indirect count over the array of dispatch infos, each working to cull meshlets for a mesh
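
The buffer described in these bullets can be sketched on the CPU. Keep in mind that the consuming "dispatch indirect count" call is the hypothetical part, since the chat itself notes that no such command exists. `DispatchArgs` mirrors the three-uint layout of `VkDispatchIndirectCommand`; the function name is made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Same three-uint layout as VkDispatchIndirectCommand.
struct DispatchArgs { std::uint32_t x, y, z; };

// One entry per mesh that survived culling: enough workgroups in x to cover
// all of its meshlets, 1 in y and z.
std::vector<DispatchArgs> build_meshlet_cull_dispatches(
        const std::vector<std::uint32_t>& surviving_meshlet_counts,
        std::uint32_t workgroup_size) {
    assert(workgroup_size > 0);
    std::vector<DispatchArgs> dispatches;
    dispatches.reserve(surviving_meshlet_counts.size());
    for (const std::uint32_t meshlet_count : surviving_meshlet_counts) {
        const std::uint32_t groups =
            (meshlet_count + workgroup_size - 1) / workgroup_size;
        dispatches.push_back({groups, 1u, 1u});
    }
    return dispatches;
}
```

On the GPU, the mesh culling shader would append these entries with an atomic counter, and that counter is exactly the "count" an indirect-count dispatch would need to read.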
#

this can be done with task shaders as well btw

#

vkCmdDrawMeshTasksIndirectCountEXT or whatever

#

but it has the stupit shit with setting up drawing and all that for no reason

glass sphinx
#

if you have n meshes

#

each have m[N] meshlets

#

this is why we have draw indirect count

wicked notch
#

You could probably have a buffer with the counts of each meshlet and divide the global ID by some upper bound

frank sail
#

what if you did an indirect dispatch where you use Y or Z to indicate how many meshlets you produced or whatever

glass sphinx
#

its super slow like that

#

so you would get like 95% of the grid wasted

#

one way to fix this

#

is to simply not be gpu driven

wicked notch
#

Unacceptable

glass sphinx
#

and record a dispatch per mesh, and then use predicates or whatever they are called

#

to cull

#

but that is actually much slower

#

as you need an extra dispatch for all draws

frank sail
#

I'm sure the issue is trivially solvable with another level of indirection

glass sphinx
#

it is

#

i do a prefix sum

#

of an array containing meshlet counts

wicked notch
#

prefix sum over the meshlet counts?

#

Hm

glass sphinx
#

for each nonculled mesh

#

then i binary search for each thread in a fat dispatch

#

if they find two counts they are in between

wicked notch
#

Each thread does binsearch?

glass sphinx
#

they found their mesh

#

and can then subtract that mesh's prefix sum from their id

#

to get meshlet id
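
The prefix sum plus binary search expansion described here maps directly to a few lines of CPU code. A minimal sketch with hypothetical names: `exclusive_prefix_sum` builds the offsets from the per-mesh meshlet counts, and `find_meshlet` is what each thread of the fat dispatch would run with its global thread index.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// offsets[i] = number of meshlets in all meshes before mesh i;
// the extra last element is the total meshlet count.
std::vector<std::uint32_t> exclusive_prefix_sum(const std::vector<std::uint32_t>& counts) {
    std::vector<std::uint32_t> offsets(counts.size() + 1, 0);
    for (std::size_t i = 0; i < counts.size(); ++i) {
        offsets[i + 1] = offsets[i] + counts[i];
    }
    return offsets;
}

struct MeshletRef {
    std::uint32_t mesh_index;     // index into the list of surviving meshes
    std::uint32_t meshlet_index;  // meshlet within that mesh
};

// Maps a global thread index to (mesh, meshlet) with a binary search.
MeshletRef find_meshlet(const std::vector<std::uint32_t>& offsets, std::uint32_t thread_id) {
    assert(!offsets.empty() && thread_id < offsets.back());
    // First offset strictly greater than thread_id; the mesh is the one before it.
    const auto it = std::upper_bound(offsets.begin(), offsets.end(), thread_id);
    const std::uint32_t mesh = std::uint32_t(it - offsets.begin()) - 1;
    return { mesh, thread_id - offsets[mesh] };
}
```

The search costs about log2(mesh count) reads per thread, and neighbouring threads mostly walk the same path through the offsets, so the reads tend to hit cache. Meshes with zero meshlets are skipped automatically because `upper_bound` steps over equal offsets.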

wicked notch
glass sphinx
#

🤓 👆

wicked notch
#

But isn't binsearch for each thread slow as fucc

glass sphinx
#

i had to ponder the orb for that one

glass sphinx
#

at least for small entity counts

#

another solution would be to have different expansion rates

#

so inside the mesh cull shader you do

  • test mesh
  • if meshlet count < 128 append to 128 buffer
  • if meshlet count < 512 append to 512 buffer
  • ...
#

then later, for each of these buffers, you dispatch with the number of entries in x, and y is the multiplier that takes you from the workgroup size to that buffer's meshlet count

wicked notch
#

This sucks

glass sphinx
#

this will probably waste around 70% worst case or so

wicked notch
#

I like binsearch after all bleakekw

glass sphinx
#

if you do it power of two steps, it will be at most 50% ignoring anything under 32 or what ever warp size is
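
The 50% bound for power-of-two buckets is easy to verify with a small sketch. `bucket_capacity` and `wasted_fraction` are hypothetical helpers, and the minimum capacity of 32 stands in for the warp size mentioned above.

```cpp
#include <cassert>
#include <cstdint>

// Smallest power-of-two bucket (at least min_capacity) that fits the mesh.
std::uint32_t bucket_capacity(std::uint32_t meshlet_count,
                              std::uint32_t min_capacity = 32) {
    assert(meshlet_count > 0 && meshlet_count <= (1u << 31));
    std::uint32_t capacity = min_capacity;
    while (capacity < meshlet_count) capacity *= 2;
    return capacity;
}

// Fraction of the threads dispatched for this mesh that have nothing to do.
float wasted_fraction(std::uint32_t meshlet_count,
                      std::uint32_t min_capacity = 32) {
    const std::uint32_t capacity = bucket_capacity(meshlet_count, min_capacity);
    return float(capacity - meshlet_count) / float(capacity);
}
```

A mesh only lands in a bucket when it is larger than half that bucket's capacity, so the idle fraction stays below 50% for everything above the minimum bucket; only meshes smaller than `min_capacity` can waste more.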

#

i believe it is the best way to combine them

#

so have an if on massive meshes

#

like idk > 1024 meshlets

wicked notch
#

Ye prolly the best

glass sphinx
#

and put them in the buffer for big dispatch or so

#

but its so much work and so stupit

#

GIVE ME DISPATCH INDIRECT COUUUUNNNTTTT

#

it is actually more efficient to do a draw indirect count and cull in vertex shaders im pretty sure

#

just to not do all the shit inbetween

wicked notch
#

I really fail to understand why there is no indirect count for dispatch

#

such a basic thing

frank sail
#

because nobody needed it misinfo

wicked notch
#

Well mr potrick needs it now (and I will be as well in the near future)

glass sphinx
wicked notch
#

So we'll be raiding Khronos HQ

frank sail
#

whip out the copium bois

glass sphinx
#

if we get workgraphs this is all not important

#

they are indirect on all steroids at the same time

wicked notch
#

Workgraphs would solve so many issues with GPU driven it's crazy

glass sphinx
#

yes

frank sail
#

did someone already mention doing a bunch of DispatchIndirect (up to a fixed max) on the CPU and letting the GPU populate each one

#

truly one of the GPU-driven strategies of all time

glass sphinx
#

yep

glass sphinx
#

i think its actually how it should be done if i wasnt full gpu driven

frank sail
#

why do you need predication

glass sphinx
#

i heard from the mountains that some vendors like it over 0 dispatches

wicked notch
#

The olympus gods

frank sail
#

can't you treat it like MultiDispatchIndirect from the GPU side (no count)

glass sphinx
#

i guess 0 is fine

#

would be cool to loop

#

omg give me the command processor

#

i will program it

#

😟

frank sail
#

abandon vulkan and become an amdgpu main

wicked notch
#

Why stop at that

#

Expose the whole warp scheduler

#

I'll program it myself

frank sail
#

oh, so you want to replace fixed-function bits of the hw? bleakekw

wicked notch
#

While you're at it expose the whole memory subsystem, so I won't need CPU readback to update gpu mem pages bleakekw

glass sphinx
#

ok ok listen to me:
i prerecord a command buffer with 1 million dispatches or some other high number, then have predication around every 100 or so.
Then i fill them as i need, enabling predicates to unlock more dispatches.
I then reuse that cmd buffer every frame

wicked notch
#

TODO: look at SDF tracing and probe tracing

#

Voxels scare me

#

Actually any kind of data that is supposed to be stored in a data structure that's not a simple ass array scares me bleakekw

raven orchid
#

@wicked notch I haven't done SDF tracing yet

#

for probe tracing are you talking about

wispy spear
#

its about time you do 🙂

raven orchid
#

yeah I think SDF is used for things like Godot 4

frank sail
#

I believe the data structure is baked though

raven orchid
#

oh dang so they're not redoing it

#

probes tend to be baked though

#

I think

#

from what I remember for things like Division 2

#

I think their probes end up using sort of offline preprocessing so that each one can cache which surfaces they can see

wispy spear
frank sail
#

why it posted here KEKW

#

but also, very cool

wispy spear
#

luschtri asked to have dfdx(worldpos) vischuellized

frank sail
#

It looks super kewl

wispy spear
#

yeah

#

disco bounding lines

#

im surprised you dont try to sell me "you need fsr2"

#

: >

wicked notch
#

I was doing an experimentationes with dFdx

#

You can transfer the knowledge I gained to frogfooding btw

#
```glsl
mat3 TBN = mat3(0.0);
{
    const vec3[] world_positions = vec3[](
        vec3(transform * vec4(positions[0], 1.0)),
        vec3(transform * vec4(positions[1], 1.0)),
        vec3(transform * vec4(positions[2], 1.0))
    );
    const vec3 ddx_position = analytical_ddx(derivatives, world_positions);
    const vec3 ddy_position = analytical_ddy(derivatives, world_positions);
    const vec2 ddx_uv = uv_grad.ddx;
    const vec2 ddy_uv = uv_grad.ddy;

    const vec3 N = w_normal;
    const vec3 T = normalize(ddx_position * ddy_uv.y - ddy_position * ddx_uv.y);
    const vec3 B = -normalize(cross(N, T));

    TBN = mat3(T, B, N);
}
```
#

Here's how I do TBN now

#

No tangents required

#
```glsl
// Weighted sum of the three per-vertex values, one component at a time;
// derivatives.ddx supplies the per-vertex weights.
vec3 analytical_ddx(in partial_derivatives_t derivatives, in vec3[3] values) {
    return vec3(
        dot(derivatives.ddx, vec3(values[0].x, values[1].x, values[2].x)),
        dot(derivatives.ddx, vec3(values[0].y, values[1].y, values[2].y)),
        dot(derivatives.ddx, vec3(values[0].z, values[1].z, values[2].z))
    );
}

// Same as above, using the derivatives.ddy weights.
vec3 analytical_ddy(in partial_derivatives_t derivatives, in vec3[3] values) {
    return vec3(
        dot(derivatives.ddy, vec3(values[0].x, values[1].x, values[2].x)),
        dot(derivatives.ddy, vec3(values[0].y, values[1].y, values[2].y)),
        dot(derivatives.ddy, vec3(values[0].z, values[1].z, values[2].z))
    );
}
```
With this
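For anyone following along, the idea behind those two helpers can be checked on the CPU: an attribute's screen-space gradient is the per-vertex values weighted by the gradients of the barycentrics. A minimal C++ sketch, assuming a plain 2D triangle with affine interpolation (no perspective correction). `BaryGrad`, `barycentric_gradients` and `attribute_gradient` are made-up names for illustration, not the renderer's `partial_derivatives_t`.

```cpp
#include <array>
#include <cassert>

struct Vec2 { float x, y; };

// d(lambda_i)/dx and d(lambda_i)/dy for the three barycentric weights.
struct BaryGrad {
    std::array<float, 3> ddx;
    std::array<float, 3> ddy;
};

// Screen-space gradients of the barycentrics of a 2D triangle.
// Assumption: affine interpolation only (no perspective correction).
inline BaryGrad barycentric_gradients(Vec2 p0, Vec2 p1, Vec2 p2) {
    const float d = (p1.y - p2.y) * (p0.x - p2.x) + (p2.x - p1.x) * (p0.y - p2.y);
    assert(d != 0.0f && "degenerate triangle");
    BaryGrad g{};
    g.ddx = { (p1.y - p2.y) / d, (p2.y - p0.y) / d, 0.0f };
    g.ddy = { (p2.x - p1.x) / d, (p0.x - p2.x) / d, 0.0f };
    // The weights sum to 1, so their gradients sum to 0.
    g.ddx[2] = -g.ddx[0] - g.ddx[1];
    g.ddy[2] = -g.ddy[0] - g.ddy[1];
    return g;
}

// Same shape as analytical_ddx/ddy: dot(gradients, per-vertex values).
inline float attribute_gradient(const std::array<float, 3>& grad,
                                const std::array<float, 3>& values) {
    return grad[0] * values[0] + grad[1] * values[1] + grad[2] * values[2];
}
```

Feeding it the vertex values of a linear function such as f(x, y) = 2x + 3y + 1 should give back its slopes, which is a handy sanity check before trusting the shader version.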
wispy spear
#

is that from how to reconstruct normals out of thin air?

wicked notch
#

No but that's the next step bleakekw

wispy spear
#

heh

wicked notch
#

I could actually calculate normals analytically right now

wispy spear
#

i rember there was a blog flying around wrt to that, recently

wicked notch
#

It's just normalize(cross(v[2] - v[0], v[1] - v[0]))
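A small C++ sketch of that flat-normal formula, keeping the operand order from the message above. `Vec3`, `cross`, `normalize` and `face_normal` are hand-rolled stand-ins (assumptions), not the renderer's types. Which way the result points depends on the winding convention: with this order, a triangle that is counter-clockwise when viewed from +Z gets a normal pointing toward -Z, so swap the two operands if your front faces are counter-clockwise.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

inline Vec3 operator-(Vec3 a, Vec3 b) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }

inline Vec3 cross(Vec3 a, Vec3 b) {
    return { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x };
}

inline Vec3 normalize(Vec3 v) {
    const float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    assert(len > 0.0f && "zero-area triangle has no normal");
    return { v.x / len, v.y / len, v.z / len };
}

// Flat (per-face) normal, same operand order as in the chat message.
inline Vec3 face_normal(Vec3 v0, Vec3 v1, Vec3 v2) {
    return normalize(cross(v2 - v0, v1 - v0));
}
```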

wispy spear
#

cheeky

wicked notch
#

So the vertex data becomes just position and UV

#

And we can quantize both of them perfectly bleakekw

#

Road to 0 byte vertex format

raven orchid
#

dang that's pretty cool

wispy spear
#

hehe

frank sail
wispy spear
#

this is brainworm 2.0

#

perhaps you can smear a little dithering over it, nobody will notice non smoof norbels

frank sail
wispy spear
#

powerplant.obj.vs.glsl

frank sail
#

I think you mean, tessellate the mesh so much that multiple pixels do not share a triangle

#

that's how you get smooth face balls

wicked notch
#

That's the objective with Nanite anyways 🚠

frank sail
#

Eckszacktly

wicked notch
#

I wonder if I could compute a gradient for smoothizing the normals

frank sail
#

You have to get the neighboring faces too

wicked notch
#

we can't use subgroup ops in frag shaders right? 😦

frank sail
#

Which means you pass a half edge structure to the GPU bleakekw

#

There are certain subgroup ops that you can use

#

Like the quad ones

#

I think that's it

wicked notch
#

Is there no WaveReadAcrossQuadLaneX or something like that

frank sail
#

yeah there is

#

ctrl f quad

wicked notch
#

subgroupQuadBroadcast what a shit name

#

I thought this was a "write" operation, not a read one

wispy spear
#

(facepalming at the name, not you)

frank sail
#

It actually broadcasts an FM radio signal when you call it

wispy spear
#

so you can tune in?

frank sail
#

Ye so we can look at dem quads

wicked notch
#

I prefer triangles

minor root
#

did you end up figuring out the dFdx thingy

wicked notch
#

yes

minor root
#

ah

#

this thing

#

its neat

#

hows perf

wicked notch
minor root
wicked notch
#

400 microseconds in total for bistro with sampling and all

#

At this point I should begin optimizing the memory bandwidth of the GPU because that's my bottleneck bleakekw

#

Multiview is coming soon

frank sail
#

You'll port that to #1128020727380054046 right frogstare

wicked notch
#

multiview ain't supported on GL 😭

minor root
#

you'll port all this to #1073361699651989584 right

wicked notch
#

It's all open sus

frank sail
wicked notch
#

ye for oculus only

minor root
wicked notch
#

Though I have no idea if any other vendor silently supports this ext bleakekw

frank sail
#

Czech gpuopen

#

Wrong gpu website

#

I meant gpuinfo

wispy spear
#

youll port all this to #1019740157798273024 right

#

hehe we do a little funny

wicked notch
#

Holy shit it's supported

frank sail
#

Wdym

wicked notch
#

Incredible

wispy spear
#

brainworm 3.0 unlocked

frank sail
#

You know what to do now

wispy spear
#

there is a OVR_multiview2 too

frank sail
#

OVR = OpenVR

wicked notch
wispy spear
#

i difuger

wicked notch
#

You go ahead and implement all the PBRisms

wispy spear
#

you can study tomorrow evening

frank sail
#

bad parenting deccer bleakekw

wicked notch
#

Anyways, returning to GI one sec

#

@raven orchid How exactly do VPL work?

raven orchid
#

Then single or multi bounce lighting just becomes spawning virtual lights around the scene

#

Which I think can somewhat be related to probe based lighting too

wicked notch
#

Hmm

raven orchid
#

For that instead of VPLs they spawn probes

#

And probes capture light info for regions of the world

minor root
#

that seems pretty neat

wispy spear
wicked notch
#

The gains are from avoiding expensive barriers and state changes

#

If you are doing basic things with no barriers multiview is unlikely to make a diff

minor root
#

another unanswered nvidia forum post bleakekw

#

they really do not reply at all

wicked notch
#

typical

twin musk
#

unfortunately both have been shortened to ovr

frank sail
#

pranked

twin musk
proven laurel
#

I am considering getting it but don't want to waste money KEKW

wicked notch
#

I forgor 💀

#

one sec

wicked notch
#

Alright I produced a functional FBX

#

ignore blender not responding bleakekw

wicked notch
#

success

#

It's not a very high poly scene though, just 20 million instanced triangles

wispy spear
#

what does instanced mean in this context?

#

are those wall towers the same mesh and those have been instanced?

wicked notch
#

yes, the trees too

wispy spear
#

ah

#

kewl

wicked notch
#

It's also just 34MB lol

#

it doesn't have any textures sadly

#

Perhaps Unreal's FBX exporter is unable to export textures?

wispy spear
#

yup it seems that way

#

i also had no luck so far

wicked notch
#

triangle dust KEKW

wispy spear
#

: )

#

how many this time?

wicked notch
#

34 million

#

smol increase

#

normals are also a bit fucced

wispy spear
#

looks ok to me

#

are you sure you loaded them not as srgb 😛 (i know there are no maps yet)

wicked notch
#

It's stupidly small somehow lol

wispy spear
#

uhm

#

this mesh turns my "modelviewer" black and imgui wont show up either 😄

#

4:54:54

wicked notch
#

epic KEKW

#

renderdoc says anything useful?

wispy spear
#

i doubt i can even take a capture 😛

wicked notch
#

Perhaps you could display the instance ID or the primitive ID

#

here's the color func I use

#
```glsl
vec3 hsv_to_rgb(in vec3 hsv) {
    const vec3 rgb = saturate(abs(mod(hsv.x * 6.0 + vec3(0.0, 4.0, 2.0), 6.0) - 3.0) - 1.0);
    return hsv.z * mix(vec3(1.0), rgb, hsv.y);
}
```
#

You use it like this

```glsl
hsv_to_rgb(vec3(float(gl_PrimitiveID) * M_GOLDEN_CONJ, 0.875, 0.85))
```
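If you want to try the same ID-to-color trick outside a shader (for example in a CPU-side debug view), here is a rough C++ port. Assumptions: `saturate` is a clamp to [0, 1] (it is not a GLSL built-in, so it is presumably a helper), and `M_GOLDEN_CONJ` is the golden ratio conjugate 1/phi, roughly 0.618. `glsl_mod`, `hue_channel`, `kGoldenConj` and `debug_color` are names invented for this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

// GLSL-style mod: x - y * floor(x / y), result in [0, y) for y > 0.
inline float glsl_mod(float x, float y) { return x - y * std::floor(x / y); }

inline float saturate(float v) { return std::clamp(v, 0.0f, 1.0f); }

// One channel of the fully saturated hue ramp.
inline float hue_channel(float h6, float offset) {
    return saturate(std::fabs(glsl_mod(h6 + offset, 6.0f) - 3.0f) - 1.0f);
}

inline Vec3 hsv_to_rgb(Vec3 hsv) {
    assert(hsv.y >= 0.0f && hsv.y <= 1.0f && "saturation expected in [0, 1]");
    const float h6 = hsv.x * 6.0f;
    const Vec3 rgb = { hue_channel(h6, 0.0f), hue_channel(h6, 4.0f), hue_channel(h6, 2.0f) };
    // mix(1, rgb, s) = 1 + s * (rgb - 1), then scale by value.
    auto shade = [&](float c) { return hsv.z * (1.0f + hsv.y * (c - 1.0f)); };
    return { shade(rgb.x), shade(rgb.y), shade(rgb.z) };
}

// Assumed value for M_GOLDEN_CONJ: the golden ratio conjugate, 1/phi.
constexpr float kGoldenConj = 0.6180339887f;

// Consecutive IDs land on well-separated hues.
inline Vec3 debug_color(unsigned id) {
    return hsv_to_rgb({ static_cast<float>(id) * kGoldenConj, 0.875f, 0.85f });
}
```

Multiplying the ID by an irrational step before the wrap-around is what keeps neighbouring primitives from getting near-identical colors.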
wispy spear
#

ah

#

42k nodes, its at 24k or so

#

that seems to take ages to load 🙂

wicked notch
#

great

#

I wonder why it takes ages to load, I can load it in a few ms

wispy spear
#

because my code is shit presumably

wicked notch
#

incredible

wispy spear
#
```
[16:55:33 DBG] SharpGltfMeshLoader: Loading Material MI_Fountain_Water_Inst
[17:00:08 DBG] SharpGltfMeshLoader: Loaded 46961 primitives from /home/deccer/Personal/Code/Projects/lessGravity/OpenSpace/src/OpenSpace.Main/bin/Debug/net7.0/Data/Props/bazaar.glb
```
4.5min 😄
#

ok this is weird, it has finished loading everything, but screen is black XD

#

wtf

#

its busy creating the meshpool out of those 43k meshprims 😄

#

ok i have to work on that hehe

wicked notch
#

Ah you don't handle instancing nervous

wispy spear
#

yes, i dont handle it

#

ie i dont check gltf extensions

wicked notch
#

Ah no need

#

it's not using EXT_mesh_instancing

#

It's just using regular gltf node instancing

wispy spear
#

ah

wicked notch
#

where multiple nodes reference the same mesh
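A sketch of what handling that kind of instancing can look like on the loader side: group nodes by the mesh index they reference, so each mesh is processed once and every referencing node only contributes a transform. `Node`, `InstanceBatch` and `batch_by_mesh` are hypothetical types for illustration, not SharpGLTF's or any other loader's API.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical, stripped-down view of a loaded glTF scene.
struct Node {
    int mesh;            // index into the meshes array, or -1 for no mesh
    float transform[16]; // world matrix, column-major
};

struct InstanceBatch {
    int mesh;                              // processed/uploaded once
    std::vector<std::size_t> node_indices; // one instance per referencing node
};

// Group nodes by the mesh they reference so each mesh is handled once.
inline std::vector<InstanceBatch> batch_by_mesh(const std::vector<Node>& nodes) {
    std::vector<InstanceBatch> batches;
    std::unordered_map<int, std::size_t> batch_of_mesh;
    for (std::size_t i = 0; i < nodes.size(); ++i) {
        const int mesh = nodes[i].mesh;
        if (mesh < 0) {
            continue; // pure transform node
        }
        auto [it, inserted] = batch_of_mesh.try_emplace(mesh, batches.size());
        if (inserted) {
            batches.push_back({ mesh, {} });
        }
        assert(batches[it->second].mesh == mesh);
        batches[it->second].node_indices.push_back(i);
    }
    return batches;
}
```

With a scene like the one discussed here (tens of thousands of nodes, far fewer unique meshes), the expensive per-mesh work then scales with the number of unique meshes instead of the number of nodes.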

wispy spear
#

i just iterate over all nodes

#

and meshes should be handled properly, i believe my deccer cubes work the same way

#

thanks for this fucked model : > to show me how shit my code is

wicked notch
#

I blame unreal

wispy spear
#

na, my code is also actually shit

#

too much memory copy bs and allocation shenanigans

wicked notch
#

Huh

#

Apparently EXT mesh shaders require the "geometryShader" feature to be enabled thonk

wispy spear
#

ugh

wicked notch
#

when unreal engine

wispy spear
#

where did the other 64gig go?

#

didnt you upgrade to 128?

wicked notch
#

Yes, but I constantly bluescreened bleakekw

#

MEMORY_MANAGEMENT or some stuff

#

Turns out my CPU's IMC did NOT like 128GB

#

smh Jaker

#

fix your CPUs

wispy spear
#

fook

proven laurel
proven laurel
minor root
#

why did you buy 64 gb of ram before checking if your cpu can handle it bleakekw

wicked notch
#

It was on the QVL

#

I blame the QVL

wicked notch
#

TODO

#

learn how the fuck they managed this

#

Hold on

#

They just went: "alright no available API allows us to do efficient BVH rebuild what do we do"

#

"we obviously forgo hardware acceleration and just use Embree and make our own BVH!"

#

amazing

#

If they can make a GPU accelerated, dense BVH based on clusters

#

Why can't AMD or NVIDIA

#

bruh

proven laurel
#

which is why I meant could be a timing issue

wicked notch
#

Alright boys

#

it is time for one of my usual detours

#

Like last time with mesh shaders, we can all see it didn't turn into anything serious

#

It's not like I'm in a rabbit hole 9km deep into meshlets

#

Totally not that

#

Anyways this time the detour will be RayTracing!

#

After the last exam is done, I will spend day and night learning about BVHs on the GPU (doing them myself, no VK_KHR_ray_tracing)

#

Then I will re-read the paper about nanite style RT LODs

#

and finally I will try making an issue on Vulkan-Docs to see how the big brains over at Khronos will receive it

wicked notch
#

Also will probably buy a 7600XT

#

Because RADV

glass sphinx
#

bleakekw the driver sink hole you will fall in gives me enough time to catch up again

#

i can draw again btw

#

i am slowly clawing back my power in the rewrite

wicked notch
#

I saw your impl of the entity culling btw, I think I get it now

#

Amazing ideas behind it

wicked notch
#

You guys remember the ballz

#

It's time to rewrite the raytracer, on the CPU with a proper BVH this time bleakekw

frank sail
#

Nice balls homie, solid 8/10

wicked notch
#
```
void trace(bvh, origin, direction) -> color {
  auto ray = { origin, direction };
  const auto max_bounces = 32;
  for (i = 0; i < max_bounces; ++i) {
    auto hit = bvh.traverse(ray);
    if (!hit) {
      break;
    }
    ray.origin = hit.point;
    ray.direction = random_direction_in_hemisphere(hit.normal);
  }
}
```
hmm
#

deep thought
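The one piece of that loop that is easy to pin down in isolation is `random_direction_in_hemisphere`. A minimal C++ sketch, assuming a uniform (not cosine-weighted) distribution and a unit-length normal: rejection-sample a point in the unit ball, normalize it, and mirror it if it ends up below the surface. The `Vec3` type and the RNG parameter are assumptions added for the sketch.

```cpp
#include <cassert>
#include <cmath>
#include <random>

struct Vec3 { float x, y, z; };

inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Uniform direction on the hemisphere around `normal` (assumed unit length).
// Rejection-samples the unit ball, normalizes, then flips into the hemisphere.
template <typename Rng>
Vec3 random_direction_in_hemisphere(Vec3 normal, Rng& rng) {
    assert(std::fabs(dot(normal, normal) - 1.0f) < 1e-3f && "normal must be unit length");
    std::uniform_real_distribution<float> dist(-1.0f, 1.0f);
    for (;;) {
        const Vec3 p = { dist(rng), dist(rng), dist(rng) };
        const float len2 = dot(p, p);
        if (len2 > 1.0f || len2 < 1e-6f) {
            continue; // outside the ball, or too short to normalize safely
        }
        const float inv_len = 1.0f / std::sqrt(len2);
        Vec3 d = { p.x * inv_len, p.y * inv_len, p.z * inv_len };
        if (dot(d, normal) < 0.0f) {
            d = { -d.x, -d.y, -d.z }; // mirror into the normal's hemisphere
        }
        return d;
    }
}
```

When this runs from several threads, each thread should own its generator (for example one `std::mt19937` per worker), since the standard engines are not safe to share without synchronization.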

wicked notch
#

does each primitive need to store an ID to the mesh it pertains?

#

So each BVH node will contain the ID of the primitive and the ID of the mesh?

fluid jungle
wicked notch
#

I just went on the usual adobe color picker and chose something that looked nice

fluid jungle
#

looks very vibrant

wicked notch
#

Also I wanted to mimic Sebastian Lague's layout so that I had a good reference image

wicked notch
#

I have not

#

why did you not send me these when I first asked you months ago

frank sail
#

my dementia only allows me to remember a small number of things at a time

fluid jungle
frank sail
#

btw on AMD, an ID (often used for materials) is stored in triangle nodes

#

and each triangle node can store up to four triangles arranged as a fan (so only five total positions have to be stored)

wicked notch
#

Damn BVHs are heavy

#

The library I'm using stores bounding box (24 bytes) and 2 indices (8 bytes)

frank sail
#

box nodes hold "pointers" (indices) to four children as well as their bounds, stored as f16 or f32 (so there are two kinds of box nodes)

wicked notch
#

Overall I can see how to send this thing to the GPU though

#

it's basically a flat tree stored as a vector
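To make the "24 bytes of bounds plus 8 bytes of indices, stored flat" picture concrete, here is a hypothetical node layout with those sizes. This is an illustrative guess, not the actual struct of the library being used: the field names, the meaning of the two indices, and the "children sit next to each other" convention are all assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical node layout matching the sizes quoted above:
// 6 floats of bounds (24 bytes) + 2 x 32-bit indices (8 bytes).
struct BvhNode {
    float bounds[6];      // min.x, min.y, min.z, max.x, max.y, max.z
    std::uint32_t first;  // inner: index of left child; leaf: first primitive
    std::uint32_t count;  // 0 for inner nodes, primitive count for leaves

    bool is_leaf() const { return count != 0; }
};

static_assert(sizeof(BvhNode) == 32, "node should pack to 32 bytes");

// The tree is one flat array with the root at index 0.
using FlatBvh = std::vector<BvhNode>;

// Index of the left child; the right child is assumed to be at +1.
inline std::uint32_t left_child(const BvhNode& node) {
    assert(!node.is_leaf());
    return node.first;
}

// Size of the buffer needed to copy the whole tree to the GPU as-is.
inline std::size_t upload_size_bytes(const FlatBvh& bvh) {
    return bvh.size() * sizeof(BvhNode);
}
```

Because nodes refer to each other by index rather than by pointer, a vector like this can be copied into a storage buffer without any fix-up pass.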

frank sail
#

so I guess you're not doing hw rt

wicked notch
#

Not yet™️

#

I first have to understand the basics before I can trust the hardware to do it right bleakekw

frank sail
#

tru

#

for more info, check the RDNA 2/3 ISA guides and search for IMAGE_BVH_INTERSECT_RAY

#

well it might not be that much more info

minor root
#

@wicked notch sir web wizard

#
<div style="display:flex;align-items:center;">
                <input
                  type="checkbox"
                  id="{{component.id}}-checkbox-{{option}}"
                  name="{{component.id}}"
                  value="{{option}}"
                  [checked]="component.val.includes(';{{option}};')"
                  (change)="onCheckboxChange($event)"
                  style="width: 10%; height: 30px;">
                <label for="{{component.id}}-checkbox-{{option}}" style="">{{option}}</label>
              </div>

why the label nicely centered but the checkbox not

minor root
#

ok bruh the input was inheriting some css that messed it up

#

thanks previous devs

wicked notch
#

The exams have finished

#

I now have endless free time

glass sphinx
#

LVSTRI be working in the secret 25h each day

wicked notch
#

I will admit my average sleep time this month was 3 hours

minor root
finite yacht
wide shadow
wicked notch
#

Alright for now I'll shrimply store another indirection vector

#

purpose is mesh_id = ind[prim_id]
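A sketch of how such an indirection vector could be built, assuming the primitives of all meshes are concatenated mesh after mesh into one global array (that layout is an assumption, as are the names `build_mesh_indirection` and `mesh_of`):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Build the prim_id -> mesh_id table, assuming primitives are laid out
// mesh after mesh in one global array.
inline std::vector<std::uint32_t> build_mesh_indirection(
        const std::vector<std::uint32_t>& triangles_per_mesh) {
    std::vector<std::uint32_t> ind;
    for (std::uint32_t mesh_id = 0; mesh_id < triangles_per_mesh.size(); ++mesh_id) {
        // Append one entry per triangle of this mesh.
        ind.insert(ind.end(), static_cast<std::size_t>(triangles_per_mesh[mesh_id]), mesh_id);
    }
    return ind;
}

inline std::uint32_t mesh_of(const std::vector<std::uint32_t>& ind, std::uint32_t prim_id) {
    assert(prim_id < ind.size());
    return ind[prim_id]; // mesh_id = ind[prim_id]
}
```

This costs four bytes per primitive, which is part of why splitting the scene into a top-level structure over per-mesh structures is attractive: the mesh ID then falls out of the traversal itself.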

#

I see now why we have two BVHs bleakekw

finite yacht
#

you have a blas for each mesh right?

wicked notch
#

Ye that's the plan at least

finite yacht
#

so when iterating through the blases and traversing each one cant you remember the index, just like you remember closest hit pos
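That suggestion, sketched in C++: loop over the per-mesh structures, keep the closest hit so far, and record which mesh it came from alongside it. The per-BLAS traversal is abstracted as a callback here, so `BlasHit`, `SceneHit`, `BlasTraversal` and `closest_hit` are all placeholders for illustration, not a real API.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <limits>
#include <optional>
#include <vector>

struct BlasHit { float t; std::uint32_t prim_id; };
struct SceneHit { float t; std::uint32_t prim_id; std::uint32_t mesh_id; };

// Stand-in for "traverse this BLAS with the ray, looking for hits closer than t_max".
using BlasTraversal = std::function<std::optional<BlasHit>(float t_max)>;

// Brute-force loop over all BLASes; remembers the mesh index of the closest hit.
inline std::optional<SceneHit> closest_hit(const std::vector<BlasTraversal>& blases) {
    std::optional<SceneHit> best;
    float t_max = std::numeric_limits<float>::infinity();
    for (std::uint32_t mesh_id = 0; mesh_id < blases.size(); ++mesh_id) {
        assert(blases[mesh_id] && "empty traversal callback");
        const auto hit = blases[mesh_id](t_max);
        if (hit && hit->t < t_max) {
            t_max = hit->t; // later BLASes only need to beat this distance
            best = SceneHit{ hit->t, hit->prim_id, mesh_id };
        }
    }
    return best;
}
```

A top-level BVH over the mesh bounds would replace the linear loop, but the bookkeeping stays the same.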

wicked notch
#

Perhaps that would be best

#

Also my "mesh" right now is a single triangle KEKW

#

I'll just try tracing this tringle for now

wispy spear
#

soon lvstri will be snatched by some big $GPUVENDOR where he is put in the basement to work on $TECH and we will never see/hear/read from him anymore

wicked notch
#

behold

#

A photorealistic render of a triangle in a scene with no light sources

wispy spear
wicked notch
#

The primitives in the BVH have to be NDC don't they

#

No actually that's wrong

#

hmm

wicked notch
#

I've lost count of how many times I had to draw a tringle

#

But here we are again

#

with bonus barycentric coordinates

#

I decided that everything shall be world space for simplicity

wispy spear
#

is it srgb though?

wicked notch
#

now it is

wicked notch
#

oh shit

#

how do we parallelize trasversal

frank sail
#

put it in a shader

wispy spear
#

Parallel.ForEach(trasversals, trasversal => {});

#

i should be quiet, i cant do any gp really : >

frank sail
#

also, wdym "parallelize traversal"? like you want one ray's traversal to be parallelized?

wicked notch
#

Perhaps it's just this library I'm using, but their BVH trasversal function isn't really thread safe

#

it does execute in parallel internally I think though

frank sail
#

tf

#

how

#

traversal should be thread safe

wicked notch
#

intersect() isn't marked const

#

so that could be why

#

it modifies internal state or something

frank sail
#

what library is this

wicked notch
frank sail
#

madmann is here btw

wicked notch
#

actually hold on

#

I'm dumb

#

yeah I'm dumb

#

the ray required for trasversal isn't const

#

but intersect is

frank sail
wicked notch
#

so... safe?

#

ish

#

I mean I should lock the ray bleakekw

frank sail
#

what does the function return?

wicked notch
#

intersect? Nothing

frank sail
#

actually can you just tell me what file it's in

wicked notch
#

it takes a function that is supposed to iterate over some primitives and intersect each one with the ray

frank sail
#

ok so I guess the ray is mutable in case one of the callbacks needs to mutate it

#

rays also store tmin and tmax

wicked notch
#

hmm yes

#

makes sense

frank sail
#

so it should be perfectly fine to call that fn from many threads

wicked notch
#

yep

#

I am invoking UB

#

I wonder why I didn't doubt myself before doubting the lib

#

smh

#

Also fun fact, none of the internal usages of ray change its state apparently

#

Perhaps I'm missing something?

frank sail
#

czech the exshrimples

#

oh ok so tmin and tmax are actually the min and max distance for the ray to travel

wicked notch
#

casual 40ms

#

to trace a single tringle

#

amazing

frank sail
#

so I guess it's sorta an inout ray param

wicked notch
#

Actually, not during trasversal

#

during ray-triangle intersection
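Here is a sketch of why a ray with `tmin`/`tmax` ends up being an in-out parameter: a Möller-Trumbore triangle test that, on a hit inside the current range, shrinks `tmax` so that later primitives have to be closer to count. The `Ray` and `Vec3` types are hand-rolled for the example and are not the library's own.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

struct Vec3 { float x, y, z; };

inline Vec3 operator-(Vec3 a, Vec3 b) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
inline Vec3 cross(Vec3 a, Vec3 b) {
    return { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x };
}

struct Ray {
    Vec3 origin;
    Vec3 direction;
    float tmin = 0.0f;
    float tmax = std::numeric_limits<float>::max();
};

// Moller-Trumbore. On a hit inside [tmin, tmax] it shrinks ray.tmax,
// which is why the ray is taken by mutable reference.
inline bool intersect_triangle(Ray& ray, Vec3 v0, Vec3 v1, Vec3 v2) {
    assert(ray.tmin <= ray.tmax);
    const Vec3 e1 = v1 - v0;
    const Vec3 e2 = v2 - v0;
    const Vec3 p = cross(ray.direction, e2);
    const float det = dot(e1, p);
    if (std::fabs(det) < 1e-8f) return false; // ray parallel to the triangle
    const float inv_det = 1.0f / det;
    const Vec3 s = ray.origin - v0;
    const float u = dot(s, p) * inv_det;
    if (u < 0.0f || u > 1.0f) return false;
    const Vec3 q = cross(s, e1);
    const float v = dot(ray.direction, q) * inv_det;
    if (v < 0.0f || u + v > 1.0f) return false;
    const float t = dot(e2, q) * inv_det;
    if (t < ray.tmin || t > ray.tmax) return false;
    ray.tmax = t; // anything found later must be closer than this
    return true;
}
```

As long as every thread traces its own `Ray` object, mutating `tmax` like this is not a threading problem. Sharing one ray between threads would be a data race.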

frank sail
#

close enough 😎

wispy spear
#

mayhaps duing tlasversar

wicked notch
#

I wonder why not make tmin and tmax optionally atomic

#

Holy pog 4ms

frank sail
wicked notch
#

ya got a point

#

I wonder if loading in suzanne would be a good idea

#

Damn, 10ms

#

Not bad

#

Now I'll load intel sponza bleakekw

frank sail
#

hmm how many tris are in suzanne?

#

if intel sponza has 1000x as many tris, I expect only about a 10x decrease in perf (assuming bvh2)

wicked notch
#

to be fair

#

logN is great
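A back-of-the-envelope check of the log N argument, assuming a perfectly balanced binary BVH with one primitive per leaf (real builders are not perfectly balanced, and traversal cost is not purely a function of depth, so treat this as a rough estimate only). `balanced_bvh2_depth` is a name made up for the sketch.

```cpp
#include <cassert>
#include <cstdint>

// Depth of a perfectly balanced binary BVH with one primitive per leaf,
// i.e. ceil(log2(n)).
inline std::uint32_t balanced_bvh2_depth(std::uint64_t primitive_count) {
    assert(primitive_count > 0);
    std::uint32_t depth = 0;
    std::uint64_t leaves = 1;
    while (leaves < primitive_count) {
        leaves *= 2; // each extra level doubles the number of leaves
        ++depth;
    }
    return depth;
}
```

Going from a thousand to a million primitives takes the depth from 10 to 20 levels, so a 1000x larger scene only adds about ten levels to walk per ray.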

frank sail
#

oh nice only 5x decrease

wicked notch
#

quite nice tbh

frank sail
#

not bad

wicked notch
#

I'll go back to cornell box though bleakekw

frank sail
#

this is cpu too

wicked notch
#

ye fully CPU

#
```cpp
executor.for_each(0, height, [&](size_t start, size_t end) {
    for (auto y = start; y < end; ++y) {
        for (auto x = 0u; x < width; ++x) {
            // Samples are accumulated here and averaged after the loop.
            auto color = glm::vec4(0.0f, 0.0f, 0.0f, 1.0f);
            for (auto s = 0u; s < spp; ++s) {
                // Unproject the pixel at two depths to get a world-space ray.
                const auto u = x / static_cast<float>(width - 1);
                const auto v = y / static_cast<float>(height - 1);
                const auto uv_near = glm::vec4(glm::vec2(u, v) * 2.0f - 1.0f, 0.0f, 1.0f);
                const auto uv_far = glm::vec4(glm::vec2(u, v) * 2.0f - 1.0f, 0.1f, 1.0f);
                auto world_near = inv_pv * uv_near;
                auto world_far = inv_pv * uv_far;
                world_near /= world_near.w;
                world_far /= world_far.w;

                auto ray = bvh_ray(
                    as_vec3(world_near),
                    as_vec3(glm::normalize(world_far - world_near)));
                auto bary = glm::vec3(0.0f);
                auto hit = intersect(bvh, ray, [&](size_t i) {
                    if (auto hit = perm_prims[i].intersect(ray)) {
                        const auto& [b_u, b_v] = *hit;
                        bary = glm::vec3(b_u, b_v, 1.0f - b_u - b_v);
                        return true;
                    }
                    return false;
                });
                if (hit != -1) {
                    // Accumulate instead of overwriting, so the division
                    // by spp below is a real average for spp > 1.
                    color += glm::vec4(bary, 0.0f);
                }
            }
            color /= spp;
            image[y * width + x] = encode_rgba(glm::vec4(as_srgb(color), 1.0f));
        }
    }
});
```
#

Amazing

frank sail
#

now do path tracing

wicked notch
#

soon™️

frank sail
wicked notch
#

ye

distant lodge
#

if it's 43ms fully CPU you could totally have it go realtime on your GPU

frank sail
#

oke that's ebic

wicked notch
#

BVHv2's

wicked notch
#

but I'm slowly beginning to expand my brain mass

frank sail
#

because madmann is using the SmallStack thingy with a fixed size

wicked notch
#

Yeah but

#

Does AMD parallelize trasversal

frank sail
#

what does that mean

wicked notch
#

as in, instead of

```cpp
while (!stack.is_empty()) {
    // traverse
}
```
frank sail
#

each thread has its own ray and does its own traversal and intersection

wicked notch
#

You do something fancier

frank sail
#

and each thread has its own stack

wicked notch
#

Fair enough

frank sail
#

the shader compiler generates a traversal kernel

#

the only thing the hardware accelerates on AMD is bvh node and triangle intersection

#

the actual traversal is just regular code

wicked notch
#

Very nice

frank sail
#

using a "stackless" (actually fixed size stack) method
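To make "each thread has its own ray and its own fixed-size stack" concrete, here is a CPU-side C++ sketch of a traversal loop with a small local array as the stack. The node layout (children stored next to each other, `count == 0` meaning inner node), the stack size of 64 and all the names are assumptions for illustration. This is neither AMD's generated traversal kernel nor the BVH library's code.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

struct Ray { float origin[3]; float inv_dir[3]; float tmin, tmax; };

struct Node {
    float lo[3], hi[3];
    std::uint32_t first; // inner: left child (right is first + 1); leaf: first primitive
    std::uint32_t count; // 0 for inner nodes
};

// Slab test against the node's axis-aligned box.
inline bool hits_box(const Ray& r, const Node& n) {
    float t0 = r.tmin, t1 = r.tmax;
    for (int a = 0; a < 3; ++a) {
        float t_near = (n.lo[a] - r.origin[a]) * r.inv_dir[a];
        float t_far = (n.hi[a] - r.origin[a]) * r.inv_dir[a];
        if (t_near > t_far) std::swap(t_near, t_far);
        t0 = std::max(t0, t_near);
        t1 = std::min(t1, t_far);
    }
    return t0 <= t1;
}

constexpr std::size_t kStackSize = 64;

// Calls on_leaf(first, count) for every leaf whose box the ray hits.
// The stack is a local array, so every thread calling this gets its own.
template <typename OnLeaf>
void traverse(const std::vector<Node>& nodes, const Ray& ray, OnLeaf&& on_leaf) {
    std::array<std::uint32_t, kStackSize> stack{};
    std::size_t top = 0;
    stack[top++] = 0; // start at the root
    while (top != 0) {
        const Node& node = nodes[stack[--top]];
        if (!hits_box(ray, node)) continue;
        if (node.count != 0) {
            on_leaf(node.first, node.count);
            continue;
        }
        assert(top + 2 <= kStackSize && "BVH deeper than the fixed stack");
        stack[top++] = node.first;
        stack[top++] = node.first + 1;
    }
}
```

Nothing in the loop is shared between rays apart from the read-only node array, which is why running one ray per thread needs no synchronization.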

wicked notch
#

I guess trasversal is inherently difficult to parallelize

#

I mean, where would you even begin

#

Each step depends on the previous

frank sail
#

stop thinking about parallelizing traversal bleakekw

#

each thread has its own ray to worry about

wicked notch
#

I guess I might bleakekw

#

nah jk

#

I'll do it the lame, easy way

#

Oh I got an idea

frank sail
#

unless you have a scene with 10^100 triangles and literally only a single ray, parallelizing traversal doesn't seem very helpful

wicked notch
#

Perhaps work expansion could help

#

each work package does the trasversal for its own level

#

and dispatches more work for the next level

#

until leaves are reached

#

ok I'll stop now

frank sail
#

actually that does seem kinda interesting for making memory access more coherent

#

but you'd have one dispatch per level of the bvh, and each dispatch would become increasingly incoherent as there are more nodes

#

prolly not worth tbhbh

wicked notch
#

I mean if the big brains at NV and AMD are doing it this way then it's not worth thinking about just yet

#

anyways

#

I'm quite happy I managed to understand BVHs this quickly

#

I was expecting a more gruesome and bloody thing

frank sail
#

what you're proposing sounds like distributing the work of a single ray's traversal to several threads. but you're already gonna have millions of rays, so you can shrimply have each thread compute one ray

wicked notch
#

ye sounds about right

#

we gotta shade cornell box

frank sail
#

wdym it looks pretty shaded already

#

hol up, now it's shaded

distant lodge
#

this is what AMD devrel does to your shaders

#

they don't want you to know this

wicked notch
#

Alright boys

#

poll time

#

Do I first shade this on the CPU or do I immediately start writing a shader

frank sail
#

for learning purposes it's probably easier to start with the CPU

#

and you also don't have to use a shit (shading) lang while you're learning

wicked notch
#

How much do I have to pay you for you to make a good shading language btw

frank sail
#

uhh

#

tree fiddy

wicked notch
#

deal

frank sail
#

(approx)

distant lodge
#

just write your own shading lang that will fix what's wrong with GLSL for real this time

wicked notch
#

I have no idea how to write languages

wispy spear
#

call it glsl 2.0

frank sail
#

too bad shading languages still require some knowledge of graphics APIs

#

e.g., you still need a concept of resource binding

wicked notch
#

I know what that is fortunately bleakekw

frank sail
#

if you're using cutting-edge vulkan, at least you can use BDA and descriptor indexing to shrimplify that stuff a bit

#

but I doubt you can make anything cuda-like without also providing your own API that wraps stuff nicely

wicked notch
#

I don't want cuda like tbh

#

I want to learn how to do RT in glsl

#

so I can use it in Iris KEKW

#

And frogfood as well

frank sail
#

I'm just saying that shading languages suck and cuda is a much nicer environment to use

wicked notch
#

ye true

frank sail
#

and I'd like to be able to have something similar for graphics

wicked notch
#

you are at AMD

frank sail
#

without vendor lock-in bleakekw

wicked notch
#

just pester some graphics engineer or something

#

you can use advanced tactics like:
guns
guns
more guns
intercontinental ballistic missiles (in case they escape)

frank sail
#

the tf2 mercenary approach to persuasion

wicked notch
#

Alright I have materials and normals

#

tomorrow we'll be doing good ol path tracing

wicked notch
#

as it turns out

#

intersecting a bvh is hard

minor root
#

it do be