#Iris - A Journey through OpenGL and beyond to learn Graphics
did you push the latest tbb fix?
it should be good now
[cmake] Cannot find source file:
[cmake]
[cmake] /home/deccer/Code/External/Retina/Dependencies/cgltf/Implementation.c
smh : ) elias mentioned it earlier
incredible
let me reclone this mf
I think the issue is caused by not using fastgltf
fastgltf issue
indeed, why are you not using fastgltf already
Git is goofy ah when changing case sensitivity on Windows
it do be, indeed
maybe I should wipe my windows drive
yeah windows tends to have trouble when you tell it to rename a file with only case differences
you need to remove CGLTF completely and add cgltf again
otherwise theres a git config for casing but i wouldnt fiddle with it
@wicked notch how do you abstract vkCmdPushConstants?
with the byte array thingy
```cpp
template <typename... Args>
auto CCommandBuffer::PushConstants(Args&&... args) noexcept -> CCommandBuffer& {
  RETINA_PROFILE_SCOPED();
  return PushConstants(0_u32, std::span<const uint8>(Core::MakeByteArray(std::forward<Args>(args)...)));
}

auto CCommandBuffer::PushConstants(uint32 offset, std::span<const uint8> values) noexcept -> CCommandBuffer& {
  RETINA_PROFILE_SCOPED();
  const auto& currentPipeline = *_currentState.Pipeline;
  const auto& layout = currentPipeline.GetLayout();
  vkCmdPushConstants(_handle, layout.Handle, AsEnumCounterpart(layout.PushConstant.Stages), offset, values.size_bytes(), values.data());
  return *this;
}
```
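For anyone wondering what the "byte array thingy" might look like: here is a minimal sketch, assuming it simply copies every argument back to back into one array. `PackBytes` and `ReadAt` are hypothetical names, not Retina's actual `Core::MakeByteArray`. The packing is tight with no alignment padding, so the shader-side block has to use a matching layout (scalar layout, or members ordered so no padding is needed).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical stand-in for a MakeByteArray-style helper: copies each
// trivially copyable argument back to back into one byte array.
template <typename... Args>
auto PackBytes(const Args&... args) noexcept
    -> std::array<std::uint8_t, (sizeof(Args) + ... + 0)> {
  static_assert((std::is_trivially_copyable_v<Args> && ...),
                "arguments must be trivially copyable");
  std::array<std::uint8_t, (sizeof(Args) + ... + 0)> bytes = {};
  std::size_t offset = 0;
  // Fold over the pack: memcpy one argument, then advance the write offset.
  ((std::memcpy(bytes.data() + offset, &args, sizeof(Args)),
    offset += sizeof(Args)),
   ...);
  return bytes;
}

// Reads a T back out of the packed bytes (handy for checking the layout).
template <typename T, std::size_t N>
auto ReadAt(const std::array<std::uint8_t, N>& bytes, std::size_t offset) noexcept -> T {
  static_assert(std::is_trivially_copyable_v<T>);
  assert(offset + sizeof(T) <= N);
  T value;
  std::memcpy(&value, bytes.data() + offset, sizeof(T));
  return value;
}
```

The resulting array converts to a `std::span<const std::uint8_t>`, which is the shape the second `PushConstants` overload above takes.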
damn I need a current pipeline thingy
actually I don't
actually I do for the stage flags
unless
oh man, I'm based
.stageFlags = VK_SHADER_STAGE_ALL,
valid strategy
I have that in my global pipeline layout
honestly quite incredible
btw I have this handy thing
```cpp
class TriviallyCopyableByteSpan : public std::span<const std::byte>
{
public:
  template<typename T>
    requires std::is_trivially_copyable_v<T>
  TriviallyCopyableByteSpan(const T& t) : std::span<const std::byte>(std::as_bytes(std::span{&t, static_cast<size_t>(1)}))
  {
  }

  template<typename T>
    requires std::is_trivially_copyable_v<T>
  TriviallyCopyableByteSpan(std::span<const T> t) : std::span<const std::byte>(std::as_bytes(t))
  {
  }

  template<typename T>
    requires std::is_trivially_copyable_v<T>
  TriviallyCopyableByteSpan(std::span<T> t) : std::span<const std::byte>(std::as_bytes(t))
  {
  }
};
```
lvstrium
@wicked notch would you be interested in collaborating on a d3d12 renderer when you have time?
for d3d12 learning
since you mentioned an interest
damn you want to go to the dark side
maybe i should too : >
im currently pissed about lunix
its ok if you just want to go with lvstri : )
Just a possible future thing, no pressure
so D3D13 then
I have no goals except to learn d3d12 gooder so I'd be willing to work on any part of it and let you do what you want
@wispy spear you could join the fun too
Unless
timberdoodle and $unnamed-thingy will actually be bf and gf
mor like grumpy-aunt and alcoholic-uncle
(babafroggy and gooeyfroggy)
The future is bright 
hehe maybe we pick a new pet
nothing like some awesome shadows 
there is nothing wrong in this image btw
it's just your imagination
Just add some bias, lmao, ez fix, bro
and tell Mr Moiree to piss off
I just realized that log scaling in PSSM is really just whatever we were doing with VSM at the beginning
which is POT scaling
real
but I still have faith in SD(S?)VSM
the thing where you reduce your depth buffer and do magic
I don't see how but perhaps
I switched from an actual index buffer to a buffer of triangle IDs for meshlets. No perf improvement, but 1/3 memory usage per triangle, and it opens up the possibility of bigger meshlet sizes.
how does that work? doesn't that remove your vertex cache?
you already give up your vertex cache if you do SW meshlets innit
The indices weren't real indices, they were just an index into a different buffer anyways.
I wasn't using a real vertex buffer
But I'm not really familiar with the hardware vertex cache stuff so idk
I'm not using a real vertex buffer either, but I'm still using a real index buffer
regardless of that the perf implications are minimal
Regardless, didn't seem to be much of any perf difference on an RTX 3080 on windows
potrick already tested
Yeah
seriously? that's surprising to me
¯\_(ツ)_/¯
I suppose the vertex cache doesn't help much when you're super limited by registers and memory
so you just do
uint triangleId = gl_VertexID / 3;
uint vertexId = gl_VertexID % 3;
?
Anyways roadmap is currently:
- Finish LODs
- Software raster
- Experiment with meshlet sizing
- Single pass depth pyramid generation (overhead/barriers between passes is sloooow)
it wouldn't be the opposite because % 3 can only ever be 0 1 2
uhh
Basically, let me find the commit
https://github.com/bevyengine/bevy/pull/10164/commits/8ac2e942b3e9cc5679eb1abf5c56b9328b1a479f visibility_buffer_raster.wgsl
```wgsl
let packed_ids = draw_index_buffer[vertex_index / 3u];
let cluster_id = packed_ids >> 8u;
let triangle_id = extractBits(packed_ids, 0u, 8u);
let index_id = (triangle_id * 3u) + (vertex_index % 3u);
```
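The same decode written out CPU-side, as a sketch for poking at the bit layout. It assumes what the WGSL implies: one 32-bit entry per visible triangle, cluster id in the upper 24 bits and triangle id in the lower 8. The names are hypothetical. Presumably the "1/3 memory" comes from storing 4 bytes per triangle instead of three 4-byte indices.

```cpp
#include <cassert>
#include <cstdint>

// Packs a cluster id (upper 24 bits) and a triangle id within that
// cluster (lower 8 bits) into one 32-bit entry.
inline std::uint32_t PackTriangleId(std::uint32_t clusterId, std::uint32_t triangleId) {
  assert(clusterId < (1u << 24) && triangleId < (1u << 8));
  return (clusterId << 8u) | triangleId;
}

struct DecodedVertex {
  std::uint32_t clusterId;
  std::uint32_t triangleId;
  std::uint32_t indexId;  // position in the cluster's local index list
};

// Mirrors the shader: vertexIndex / 3 selects the packed entry,
// vertexIndex % 3 selects the corner of that triangle.
inline DecodedVertex DecodeVertex(const std::uint32_t* drawTriangleBuffer,
                                  std::uint32_t vertexIndex) {
  const std::uint32_t packed = drawTriangleBuffer[vertexIndex / 3u];
  const std::uint32_t clusterId = packed >> 8u;
  const std::uint32_t triangleId = packed & 0xFFu;
  return {clusterId, triangleId, triangleId * 3u + (vertexIndex % 3u)};
}
```

An 8-bit triangle id caps a cluster at 256 triangles, which leaves room above the 64-triangle meshlets mentioned earlier.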
huh, maybe I'll give that a try
I'm still allocating worst case triangle buffer size though
After everything else is finished, I want to experiment with dynamically allocating it based on some heuristics and maybe buffer readback, and probably just spill over into software raster if I run out of room
I was just about to say
the solution to that is fixed budget + readback from gpu
dynamic size is not practical
Yeah. I want to do LODs + software raster first, since that'll greatly change how much I actually need
Also I'm limited to 2gb max, which is uhh
storage buffer moment
4 bytes per triangle
imagine 🅱️ointers
city sample is 35 billion
LODs and software raster means I should never hit that limit idt, that feels like more than enough now
gotta crank those numbers up
per frame? Or total in the scene without culling/LODs
ok yeah, the 2gb limit just affects the actual amount I can render at once
I can't export the big version :(
Well technically, in each of the two passes
35 billion triangles...ah yes, a classic 391GB index buffer
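The numbers above can be checked with a few lines of arithmetic. This sketch assumes 32-bit indices, that "391GB" is meant in GiB, and that the 2gb storage buffer limit is 2 GiB; the function names are made up for illustration.

```cpp
#include <cstdint>

constexpr std::uint64_t kGiB = 1ull << 30;

// Classic index buffer: 3 indices per triangle, 4 bytes per index.
constexpr std::uint64_t IndexBufferBytes(std::uint64_t triangles) {
  return triangles * 3ull * 4ull;
}

// Triangle-ID buffer: one 4-byte entry per triangle.
constexpr std::uint64_t TrianglesThatFit(std::uint64_t bufferBytes) {
  return bufferBytes / 4ull;
}

// 35 billion triangles * 12 bytes = 420e9 bytes, a bit over 391 GiB.
static_assert(IndexBufferBytes(35'000'000'000ull) / kGiB == 391);

// A 2 GiB buffer at 4 bytes per triangle holds 2^29 (about 537 million) triangles.
static_assert(TrianglesThatFit(2ull * kGiB) == 536'870'912ull);
```

So the limit only caps how many triangles can be in flight at once (per pass), not the total scene size.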
@delicate rain where's daxa's imgui port
behold cope
how much cope we talkin
It's fine actually
I was just memin
also how are you already further along in porting VSMs than me??
are the layout colors already fixed for sRGB brain damage
oh there I do some cope
cope is all we can do
I precompile two shaders, one for sRGB and one for UNORM, so one does manual gamma correction and one does not
these are the arrays here https://github.com/Ipotrick/Daxa/blob/master/src/utils/impl_imgui_spv.hpp
I then decide on these based on the swapchain format iirc
only on very big gpus
like 4080
and you still get a little perf from index buffer but remember i calculated in the index buffer generation
the draw is still faster but on 4080 just unindexed was nearly the same
1080ti was a much bigger diff
I just realized the ImageCount refers to frames in flights in imgui
actually insane
Alright, time to sit down and figure out why my LOD builder is not working
Assertion failed: v < vertex_count, file vendor/src/simplifier.cpp, line 63 wuh
oh wait
hmmmmm
fetchcontent is unable to fetch fmtlib : (
it does happen sometimes
I have no idea why 
can commit history change at random
so commit hashes that are valid are rendered invalid by a rebase?
if possible find actually tagged commits
it depends
some can also just disappear
because orphaned
tagged commits are releases?
yes
well
its 2 things
a release happens on a tag
tags can exist without having a release hehe
but yeah
and CGLTF is also still there, but i guess i have to reclone
I made sure CGLTF is now upper case everywhere
ok you can reclone now
I fixed fmtlib's commit hashes
I'll fix the others as well

glMultiDrawArraysIndirectBindlessCountNV
yeah
from 2015
anywho
fails to configure
bunch of macro/templateism
(freshly cloned)
and with clang
bruh
no rush : )
So uhh yeah these are 4 clusters that metis chose to group ???
remember
metis has no spatial info
it only has topology info
you must set weights yourself
Right, I was thinking that's the problem
But like what do I do? Multiply the shared edge count weight by the spatial distance?
whatever you want
you can compute a centroid for each meshlet
and multiply that by the greatest of the edges
you can do what you said as well
just find a good function, maybe with the help of desmos
I didn't care about maximum number of shared edges
I just used distance
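One possible shape for such a weight function, purely as a sketch. METIS minimises the total weight of the edges it cuts and expects positive integer weights, so pairs that should stay grouped need the larger weight. That is why this version scales by inverse centroid distance rather than multiplying by the distance itself; that choice, the `* 1000` scale, and every name here are assumptions, not what anyone in the thread shipped.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

struct Vec3 {
  float x, y, z;
};

inline float Distance(Vec3 a, Vec3 b) {
  const float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
  return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// Hypothetical edge weight between two meshlets for the partitioner.
// More shared edges and closer centroids both increase the weight.
inline std::int32_t GroupingWeight(std::uint32_t sharedEdges, Vec3 centroidA,
                                   Vec3 centroidB, float meshExtent) {
  assert(meshExtent > 0.0f);
  // 0 when the centroids coincide, about 1 at opposite ends of the mesh.
  const float normalized = Distance(centroidA, centroidB) / meshExtent;
  // In (0, 1], larger when the meshlets are closer together.
  const float proximity = 1.0f / (1.0f + normalized);
  const float weight = (1.0f + static_cast<float>(sharedEdges)) * proximity * 1000.0f;
  // Clamp to at least 1 so the weight stays a positive integer.
  return std::max<std::int32_t>(1, static_cast<std::int32_t>(std::lround(weight)));
}
```

Plotting `proximity` against distance (desmos works) is a quick way to see whether far-apart meshlets are penalised enough for a given mesh.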
ughh ok, thanks
Ok turns out none of my meshlets have shared edges
So uhhh yeah
@wicked notch you use the u8 meshlet index/vertex things to identify shared edges yes?
I use the full index
meshletIndices[offset + meshletPrimitives[offset + id]]
I think
let me check
yes the full index
Also not working, wtf
```rust
let meshopt_meshlet = meshlets.meshlets[meshlet_id];
for triangle_id in 0..meshopt_meshlet.triangle_count {
    let index_id1 = meshopt_meshlet.triangle_offset + (triangle_id * 3);
    let index_id2 = index_id1 + 1;
    let index_id3 = index_id2 + 1;
    let index1 = meshlets.triangles[index_id1 as usize] as u32;
    let index2 = meshlets.triangles[index_id2 as usize] as u32;
    let index3 = meshlets.triangles[index_id3 as usize] as u32;
    let vertex_id1 = meshopt_meshlet.vertex_offset + index1;
    let vertex_id2 = meshopt_meshlet.vertex_offset + index2;
    let vertex_id3 = meshopt_meshlet.vertex_offset + index3;
    let v0 = meshlets.vertices[vertex_id1 as usize];
    let v1 = meshlets.vertices[vertex_id2 as usize];
    let v2 = meshlets.vertices[vertex_id3 as usize];
    meshlet_triangle_edges.insert((v0.min(v1), v0.max(v1)));
    meshlet_triangle_edges.insert((v0.min(v2), v0.max(v2)));
    meshlet_triangle_edges.insert((v1.min(v2), v1.max(v2)));
}
```
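A stripped-down C++ version of the same shared-edge test, as a sketch with hypothetical names. It takes each meshlet's triangles as full mesh-wide vertex indices (the "full index" mentioned above), stores every edge as a sorted pair, and counts the overlap between two meshlets. It also shows the failure mode: if an exporter duplicates vertices, the two meshlets never reference the same index and the count is zero even when they touch geometrically.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

using Edge = std::pair<std::uint32_t, std::uint32_t>;

// Collects the undirected edges of a triangle list. `indices` holds
// mesh-wide vertex indices, three per triangle.
inline std::set<Edge> CollectEdges(const std::vector<std::uint32_t>& indices) {
  assert(indices.size() % 3 == 0);
  std::set<Edge> edges;
  for (std::size_t i = 0; i + 2 < indices.size(); i += 3) {
    const std::uint32_t v[3] = {indices[i], indices[i + 1], indices[i + 2]};
    for (int e = 0; e < 3; ++e) {
      const std::uint32_t a = v[e];
      const std::uint32_t b = v[(e + 1) % 3];
      // Sort the pair so (a, b) and (b, a) count as the same edge.
      edges.insert({std::min(a, b), std::max(a, b)});
    }
  }
  return edges;
}

// Number of edges present in both meshlets.
inline std::size_t CountSharedEdges(const std::set<Edge>& a, const std::set<Edge>& b) {
  std::size_t shared = 0;
  for (const Edge& edge : a) {
    shared += b.count(edge);
  }
  return shared;
}
```

Comparing by index only works after identical vertices have been merged, so a welding or remapping pass has to come first.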
regarding this btw, it appears gcc has had a bug about C++'s grammar for a decade and that clang doesn't support expected on linux yet with libstdc++
actually incredible
fuck me
: D
it's actually my fault, if I weren't using c++23 this wouldn't have happened probably
terrible git practices
yeah
even if you're rebasing to merge a PR it shouldn't affect history though
so wtf are they doing
squash should also kill commits
yeah that's a rebase
but you wouldn't put a PR's commit hash into fetchcontent
you can't, unless you clone the PR author's repo instead
gpus are scary sometimes
I was wondering why my loading times were very slow and it turns out I was processing and generating meshlets for every gltf primitive 4 times over
but then I realized the rendering took basically the same time (100us difference in pure raster)
oopsi
Ok I think the suzanne head is just dumb and has unique vertices for every single triangle
Progress, kinda works
Some meshlets have those really teeny triangles in weird formations that can't be simplified though
LOD 0 meshlets without any adjacent meshlets
@wicked notch when you're back online: Blender seems to be weird and when you export meshes with tangents it duplicates vertices and separates the mesh into parts, which ruins the LOD generation code. How are you dealing with this? Are you somehow deduplicating/hashing vertices?
You know what I'll go figure out how to calculate tangents during the visbuffer resolve in the shader
ye, I did mention already I was running the entire meshoptimizer's optimization pipeline before anything and for each new lod level
always reoptimize your meshlets, METIS breaks down hard if you don't optimize
yes that is the problem with meshopt, it has a really lax heuristic on connection
in short, it does not care about whether a meshlet is fully spatially connected or not
that's why using meshoptimizer to generate meshlets doesn't work
I wonder, why imgui decides to reupload buffers instead of just using device_local | host_visible memory
the peeps might lack the knowledge of how one should do shit efficiently according to x, which might not have been ideal n months ago
I realized it's just for backwards compat
we didn't have device_local | host_visible mem until recently 
my talking out of my ass was correct lol
```glsl
void main() {
  const SVertexFormat vertex = g_VertexBuffer.Data[gl_VertexIndex];
  const vec4 linearColor = vec4(RetinaToLinearFromNonLinear(vertex.Color.rgb), vertex.Color.a);
  o_Uv = vertex.Uv;
  o_Color = linearColor;
  gl_Position = vec4(vertex.Position * u_Scale + u_Translate, 0.0, 1.0);
}
```
epic cope
vert ye
oops lol
i dont know why i dont do the srgbisms there too
i have a convoluted logic in my code which schwitzes FRAMEBUFFER_SRGB on or off
lustri you are a genius
no this is just imgui brain damage, there's nothing genius about this but copium 
shut up and take the compliment
blending dies you know
man i hate it when github just picks a random commit for this stuff
I copied permalink specifically
also if you display textures in imgui they'll look wrong
Lies
Ok, this one might bite me 
bro look at the green lol
you're interpolating non linear colors
and blending nonlinearly too
you can't just ignore srgb and hope it works lol
we've had this discussion already a few days ago in #questions
#questions message
Fuk 
Also which green and where
you will receive mail from my lawyer
nevermind that, it's just discord artifacts
But yeah, displaying textures might reveal the badness of this method and I'll resort to cope
it's joever
time to write custom imgui backend
I guess you could pass RGB8 image view to it too, but you need VK_IMAGE_CREATE_MUTABLE_FORMAT and this is becoming giga-copium and might be slow
Actually custom ImGui backend doesn't sound like that bad of a deal
Especially if you don't need multiple viewport/window support
But also you can do bindless textures there and no AddTexture copium
Because calling ImGui_ImplVulkan_AddTexture beforehand sure sounds like extra work, when you can just
ImGui::Image(bindlessId);
real
this is partly why I did custom backend
behold
```glsl
#include <Retina/Retina.glsl>
#include <Retina/Utility.glsl>

layout (location = 0) in vec4 i_Color;
layout (location = 1) in vec2 i_Uv;

layout (location = 0) out vec4 o_Pixel;

RetinaDeclarePushConstant() {
  uint u_VertexBufferId;
  uint u_SamplerId;
  uint u_TextureId;
  vec2 u_Scale;
  vec2 u_Translate;
};

#define g_Texture RetinaGetSampledImage(Texture2D, u_TextureId)
#define g_Sampler RetinaGetSampler(u_SamplerId)

void main() {
  if (RetinaIsHandleValid(u_TextureId)) {
    o_Pixel = i_Color * texture(sampler2D(g_Texture, g_Sampler), i_Uv);
  } else {
    o_Pixel = i_Color;
  }
}
```
btw you don't need to do this lol
you can go the copium way and convert all your style colors to linear
this is lossy but it werks and you don't need to write a backend
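A sketch of what "convert all your style colors to linear" amounts to, and why it is lossy. The transfer functions are the standard sRGB piecewise curves; `StyleColorToLinear` and `ToUnorm8` are hypothetical helpers, and the 8-bit quantisation stands in for ImGui packing vertex colours into 8 bits per channel.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Standard sRGB decode, input and output in [0, 1].
inline float SrgbToLinear(float c) {
  return c <= 0.04045f ? c / 12.92f : std::pow((c + 0.055f) / 1.055f, 2.4f);
}

// Standard sRGB encode, the inverse of the function above.
inline float LinearToSrgb(float c) {
  return c <= 0.0031308f ? c * 12.92f : 1.055f * std::pow(c, 1.0f / 2.4f) - 0.055f;
}

struct Color {
  float r, g, b, a;
};

// Converts a style colour authored in sRGB; alpha is not gamma encoded
// and is passed through untouched.
inline Color StyleColorToLinear(Color srgb) {
  return {SrgbToLinear(srgb.r), SrgbToLinear(srgb.g), SrgbToLinear(srgb.b), srgb.a};
}

// Quantises a [0, 1] channel to 8 bits, rounding to nearest.
inline std::uint8_t ToUnorm8(float c) {
  return static_cast<std::uint8_t>(std::lround(std::clamp(c, 0.0f, 1.0f) * 255.0f));
}
```

The loss shows up in the darks: sRGB values 8/255 and 10/255 both land on linear 1/255 after quantisation, so distinct dark style colours can collapse into one. It also does nothing for colours the user edits at runtime, which is the colour picker problem raised below.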
As pat has said, it doesn't work for color pickers
- I really want to be able to just
ImGui::Image(bindlessId)
Just write your own ImGui(tm)
I was wondering why my texture upload wasn't working, turns out I'm a dumbass and I didn't have the alpha channel on in renderdoc
custom backend is bae
CopiumGUI
imgui also supports just single-channel (which I use)
Retina.GUI's first ever test run is complete
I promise the text isn't fucked like it is in the image 
idk why it is fucked up
because this is faster to load 
: >
shame that the floor is just a quad in oldsponza
would look much cooler with the usa states pattern
now make imgui purple and you're set
i second that
give theme
Make tlottes theme plz
is that that one white one?
https://github.com/Eearslya/Luna/blob/dev/Luna/Source/Renderer/UIManager.cpp#L221 (these colors are srgb)
CIE1337
then these are linear
shit
because I output imgui onto srgb
it's joever
banned
I decided to properly convert all the colors instead of just coping with unorm target
I went for converting into vertex shader
also fair
I just ran a quick script to convert and re-output a C++ array 
especially when you consider color pickers, converting in the shader won't always work
I will just make my own color picker
Ugh getting METIS to respect the meshlet vertex/triangle size maximums is gonna suckkkkk though
it's not that bad
man
it feels so incredibly good to just write ImGui::Image(handle)
```cpp
_imGuiContext->Render(*_tonemap.MainImage, commandBuffer, [&] noexcept {
  ImGui::Begin("Hello, world!");
  ImGui::Image(GUI::AsTextureHandle(_visbuffer.DepthImage.GetHandle()), { 1280, 720 });
  ImGui::End();
});
```
look at this
it's glorious
I'm gonna steal your backend
Nothing personnel, kid
Mine will probably be 10x longer because I don't have so many magic abstractions
imgooey backend is rather trivial methinks
its just populating vbo/ebo and providing two simple shaders
populating by iterating over imgooeys cmdists and thats all
hehe i support resizing
but thats just another 3 lines or so
ye I just allocated 1 million vertices and indices and that's it
"Just don't draw too much UI" tactic?
yeah that should last for a while
if GUI has to draw more than 1 million vertices it's joever for other reasons
Simplest
editor
i think i have 64k default size
Good. Otherwise you'd be stuck in Cherno engine hell
Plz no C#, sorry, deccer
(It's another bikeshed hell, adding "scripting")
scripting is rather necessary though innit
Depends. You can do a lot with just C++
c# is the only sane choice for an editor, the tooling is world class, unironically
I linked spelunky psp in my thread, it has some good and sane C++ gameplay code
Lua is not bad, but no typing is awful. Not sure how good luau's typing is
What if we made another language, use some Rust mannerisms, and remove operator precedence? Call it WGSL or something.
and call it Rust (with a capital R)
But basically you can get very far with just C++ and then start moving stuff to scripts
It's not like you'll have a team of people who need to iterate quickly and suck at C++
what if I suck at C++
But yeah, I was a fan of Lua, but doing more gameplay programming in Go, I understand how much typing helps and how your APIs became less copium
yeah we will support luigi either way
I like making an editor simply because playing with sliders and buttons during runtime is fun
anyone with windows, clang and time to waste want to try this? 
linux is broken until gcc can figure shit out
eh why not
I kinda hate the fact that I basically require the latest bleeding edge shit to compile my stuff lol
what working dir
src/Retina/Entry
the asserts tho
oh wait shit
wait the working dir is wrong
my bad
working dir is wherever the exe is at
I don't think it likes the separate build dir
hm
wait did you change the model path
I load bistro on startup iirc
in this commit
I didn't do nothin
ye, you should probably put something inside the Models folder
cmake -S . -B Build -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++
cmake --build Build --parallel 8
it's in src/Retina/Sandbox/Assets
but I didn't upload a default model, silly me
also change the path to the model in src/Retina/Sandbox/SandboxApplication.cpp
skill issue on my part indeed
tons of hazards tho
it's actually the same two hazards if you limit duplicated messages
but they are fake
it's just syncval being unable to cope with bindless resources being bound at the same time, even when never dynamically used
Delete this
Submit the bug NOW if real 
imguiio has a fontscale thingy, mayhaps it helps
which one?
that work graph thing
or was that potrick who posted it
was a red name : >
#general message
found it
@wicked notch what the heck is the error metric meshopt simplify gives you? I can't find any documentation on it
just quadric innit
it's the average of the quadric error metric for all vertices after simplification iirc
Hmm, so how do you select what LOD to use? Is there some formula for calculating the acceptable amount of error at a given distance?
there isn't, it's up to you
you can project the error to the sphere bounds and get a screen size estimate of the error
then pick LOD based on that estimate
you can also do it super naively and scale the error by the distance
as long as your error function is monotonic it's no problem
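A sketch of the naive variant: project the error to a pixel size and pick the coarsest LOD that stays under a threshold. It assumes a symmetric perspective projection, an error already converted to world units, and a per-LOD error list that never decreases (the monotonic requirement above). All names are hypothetical, and this is whole-mesh LOD picking, not the per-cluster test discussed later.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Approximate on-screen size, in pixels, of a world-space length
// `worldError` seen at view-space distance `distance`.
inline float ProjectedErrorPixels(float worldError, float distance,
                                  float verticalFovRadians, float viewportHeightPixels) {
  assert(distance > 0.0f);
  // Pixels covered by one world unit at distance 1.
  const float pixelsPerUnit = viewportHeightPixels / (2.0f * std::tan(verticalFovRadians * 0.5f));
  return worldError * pixelsPerUnit / distance;
}

// Picks the coarsest LOD whose projected error is within the budget.
// lodErrors[0] is LOD 0 (error 0); entries must be non-decreasing.
inline std::size_t SelectLod(const std::vector<float>& lodErrors, float distance,
                             float verticalFovRadians, float viewportHeightPixels,
                             float maxErrorPixels = 1.0f) {
  assert(!lodErrors.empty());
  std::size_t selected = 0;
  for (std::size_t lod = 0; lod < lodErrors.size(); ++lod) {
    const float pixels =
        ProjectedErrorPixels(lodErrors[lod], distance, verticalFovRadians, viewportHeightPixels);
    if (pixels > maxErrorPixels) {
      break;  // Monotonic errors: every coarser LOD is worse, so stop here.
    }
    selected = lod;
  }
  return selected;
}
```

With a 90 degree vertical FOV on a 1080 pixel tall viewport, one world unit at distance 1 covers about 540 pixels, so an error of 0.01 units drops below one pixel at roughly 5.4 units away.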
Like take the meshlet bounding sphere, and scale the radius by the error?
So you have your screen-space AABB you use for culling
So area of that, or use the sphere exactly?
And then i guess your heuristic is < 1 pixel worth of error
sure, area of the bounding rect is fine too
awesome, ty
Also more questions but you don't have to answer these yet: How do I size the dispatches for persistent threads? 1 single workgroup of 64x1x1? Something else?
And is 1 thread per node, or 1 wg or?
that I have no idea 
I was planning to do LOD selection on the CPU
as a temporary cope
then maybe I'll just do the naive BVH traversal thing in compute
or use an octree or something, idk
but I don't think I'll be doing persistent threads
Nanite does some kind of angle-based error projection, but doesn't explain it
Also @wicked notch I will continue to annoy you with questions (let me know if you mind). Do you have any tips for forcing the error to be monotonic over node depth? Maybe walk the tree backwards and make bounding sphere size * error be the max of child,parent for each node?
I mean you can shrimply do that when building the dag
From what I remember, meshoptimizer outputs an error which represents the error in terms of relative mesh extents
That is, from what I understand, if the error is 0.01,then it represents a displacement of 1% of the mesh size.
What I chose to do is to project a sphere of radius error*mesh scale and another of radius parentError * mesh scale, compute the projected radius and use those projected radiuses for determining whether a given cluster is visible.
I can link to the code if you want.
anti taa freaks about to lose their collective minds
Now compare MSAA and TAA plz 
you have taa??
I can't because I don't support inferior rasterization techniques like forward
Forward made a comeback, you fool 
(looks good, though, I'm not a TAA hater, I'm just lazy)
I talked to someone else that did a nanite kinda thing, and they actually suggested using 0.5 * longest_view_axis_in_pixels * sqrt(error), then that gives you the length of a line that you can project to find the depth where it would be < 1 pixel, and then you can compare that depth to the meshlets bounding sphere depth in view space.
Which I don't quite understand, but they said it works better than the projected aabb size
Not sure I understand what longest_view_axis_in_pixels is supposed to represent, is it the distance (in screen/view space) between the two furthest visible points of the mesh?
Just in case : Small note about meshoptimizer, the returned simplification error is already the square root of the error computed by meshoptimizer's simplification.
Oh good to know, I'll have to remove the manual sqrt, thank you
It's max of your texture dimensions of your viewport
So like 1920x1080 it's 1920.0
So you form a line of length 1920.0 * 0.5 * error, and then find the depth where it's less than a pixel
And then compare the meshlet bounding sphere depth
is that to determine which lod to load?
Sounds more efficient than what I do
anyone who doesn't instinctively read sqrt as "squirt" is lying
Yes
I'm starting on LODs
I have to go write a dag traversal shader :p
did you fix your meshlet generation problems yet
traversing a dag isn't really key to lod selection btw
you can use any datastructure you want, even a simple list of clusters (with parent id)
all you need to select a lod is the current error and the parent's error
Kinda. I found out it was blenders tangent exporter that was splitting the mesh and causing issues. Decided to just fix the mesh manually and move on, as I eventually want to get rid of precalculated vertex tangents.
Yeah that's what I have now, just a flat list. I'm not entirely convinced it works though... Hold on brb
suzanne is indeed fucked
you should still optimize your meshes with meshopt anyways
It happened with the Stanford bunny too
Yeah it's on my list to go figure out all the meshopt index buffer remappers and whatever
I'm not convinced about the single dag cut
Like doesn't it depend not just on the error, but also the bounding spheres of each meshlet?
And the perspective you view it at etc
you make the parent use a very conservative bounding sphere
and you also make its error strictly greater than all of its children
you can probably account for perspective distortion by doing some projective magic
but to be honest, it should be good enough if you're very conservative with the error
although that might result in higher lods than necessary but our gpus are big and strong
I think my single brain cell is strong enough to try my hand at this now that everyone else has figured out and explained the hard parts.
I should rename this thread to nanite HQ
Ok @wicked notch let's say you have 4 nodes a, b, c, d
D is a child of C is a child of b is a child of a
If you evaluate the meshlets as a flat list to render
D has low enough error to render
And so does C
But A and B does not
Hmmm ok I see nvm
Only d would render
each node should have a parent id as well
otherwise you can't do the checking properly
Yes but
Hmmm
Ok yeah I understand
If A has too high error, but b, c, and d have low enough
Then A wouldn't render
B would render
C wouldn't render because its parent's error is low enough
And same for d
Right? So you end up with only 1 cluster, B
Which is the cheapest to render
in that case yes
While still maintaining little enough error
Ok I understand how this works to ensure a single cut then
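The A/B/C/D walkthrough above, as a sketch over a flat cluster list. Each cluster carries its own error and a copy of its parent's error; the errors are treated as already projected to whatever unit the threshold uses. Names are hypothetical, and root clusters get an infinite parent error so that something always renders.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

struct Cluster {
  float selfError;
  // Copy of the parent's error; infinity for clusters with no parent.
  float parentError = std::numeric_limits<float>::infinity();
};

// A cluster is drawn when it is good enough and its parent is not.
inline bool ShouldRender(const Cluster& cluster, float threshold) {
  const bool lodIsOk = cluster.selfError <= threshold;
  const bool parentLodIsOk = cluster.parentError <= threshold;
  return lodIsOk && !parentLodIsOk;
}

// Evaluates every cluster independently, no tree traversal needed.
inline std::vector<std::size_t> SelectCut(const std::vector<Cluster>& clusters, float threshold) {
  std::vector<std::size_t> rendered;
  for (std::size_t i = 0; i < clusters.size(); ++i) {
    if (ShouldRender(clusters[i], threshold)) {
      rendered.push_back(i);
    }
  }
  return rendered;
}
```

For the chain A (error 4), B (2), C (1), D (0) with a threshold of 3, only B passes: A is too coarse, and C and D are rejected because their parents are already acceptable. This only yields exactly one cluster per chain if errors never decrease towards the root.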
I've internalized it
Before I didn't trust it
Also @wicked notch I had a thought
Rn I'm doing 1 thread per wg to cull cluster, then a separate dispatch to write the triangle buffer at 1 workgroup per cluster (1 thread per triangle)
Doing 1 thread per cluster to cull and then write all 64 triangles was too expensive
But what if I did 1 workgroup per cluster, but combined both the culling and triangle id writing passes into 1.
you had this before didn't you
this is (probably, though extensively tested) not going to be faster because only one thread will be doing the culling, stalling the other threads in the workgroup
Did I? Actually idr if I tested that. I know 1 thread per cluster, writing all 64 triangles was too slow.
It gets annoying with two pass and lods
@dull oyster oh hey, just realized you're the one writing the nanite blog, thanks! It was a useful reference at several points for me
Wait a minute I'm getting confused. Which nodes are the parents? The original meshlets, or the simplified versions?
should be the simplified versions
But then how do you have 1 parent for each node?
Lets say you have 4 clusters, you merge them, and split into 2.
Nodes have more than 1 parent
Is the parent you store for LOD selection supposed to be the parent with the highest error?
Ok, I think I need to use meshopt_computeClusterBounds() to make a new bounding sphere for the whole group
which is used for LOD, separate from the meshlet bounding spheres
Lod 0 (bottom of the DAG):
self_error = 0
self_bounds = computed from meshlets
parent_error = error of simplified group this meshlet is in when building lod 1
parent_bounds = computed simplified group when building lod 1
Lod 1 (parent of lod 0):
self_error = computed when simplifying groups of lod 0 meshlets
self_bounds = computed from meshlets from simplified group
parent_error = error of simplified group this meshlet is in when building lod 2
parent_bounds = computed simplified group when building lod 2
and so on
so the error in each group feeds to the next one
but the bounds for self are always calculated on the exact meshlet, while the parent bounds are calculated on the group as a whole
no, you need both parents
what, how do you do the runtime LOD checking then if you have multiple parents??
you select either the parent(s) or the children
you can't just select one parent and one child
you must choose between either the parent(s) or the children
Sorry, you've lost me
For each cluster, you only render it if the parent error is too high, and the self error is low enough
But which parent, if you have multiple?
you calculate the error in the merge and simplify step
in the split step you simply copy the error over
Yes, but you also need a bounding sphere for the parent
So, which parent's bounding sphere?
again, there is only one bounding sphere
the one you calculate in the merge & simplify step
a bounding sphere over the entire simplified group, before splitting, yes?
yep
Yeah that's what I figured out last night lol, ok good
Each meshlet has a bounding sphere, but you also make a separate bounding sphere for the group before splitting, which is what gets used as the parent bounds
I believe you should make a bounding sphere for the child group as well
but I'm not sure on that one
the child group, ?
if you look at this example
suppose the dag on the right is your whole mesh
there are no more parents or children
at draw time, you either choose to draw the two parent clusters or the four child clusters
there are no more valid states
ok agreed so far
therefore, isn't it more practical to treat the whole dag as a tree with two nodes?
you either have the parent "group" or the "child" group
hmm, I'm not sure that scales to a whole DAG though
yeah, it's an interesting idea
Ok, done with adding the parent info to the LOD system
Next is rip out my culling shader and make it choose LODs
Going to rip out the 2pass culling for now
@wicked notch turns out I've not been accounting for perspective warping when projecting the bounding sphere for the meshlet to do culling
regardless of LODs
welp
rip
Doing the local->world->view space conversions on the center point, and then adding the radius is not correct
I think I have a similar problem.
And I've mostly been taking note from your struggle. 
things would be so easy if y'all just used bounding boxes instead of spheres 
Vkguide lied to me
I took my occlusion culling code mostly from them, and they didn't account for the warped perspective:P
re fisiks
@faint crane can you coerce someone into taking intel spoonza and turn it into some stylized "wood block" model
where you can pull out the columns and then the shit starts to collapse
or you throw your stick into the wall et al
something like sis
Definitely stealing that idea.
mayhaps in 2 or 3 versions, one with bigger blocks, smoller ones and even smoller ones or something like that
luigi?
is that you?
A guide on how to make a game engine in a weekend.
Source code - https://github.com/iris-engine-dev/bric_a_brac
trying to pretend to be a calzone, when you really are a blob of cheddar
(it made me smile when i saw the link to gibhut in the video description, i know its not you :D)
success?!?
I think so!

This guy must have more than one brain cell.
nice
You're further than I am though no xD?
how do we motivate you lustri
I haven't started lol. My LOD builder definitely is very mediocre.
Deccer Cubes actually crashed my engine as per #926896734284689428.
well you were warned
oh no
Oh ok nvm it was a dumb mistake
There's some subtle flickering now though, might need TAA
oh wait no, I was dumb again ahh
yeah still have this problem, ugh
how the heck do I debug this
why would this require TAA 
Blend subpixel error
Not sure that's what it is though
I have no idea why no clusters are rendering
I guess there's no possible cluster where error < 1 is possible
wait hold on, that makes no sense
idk what I'm talking about btw
lod 0 has error = 0
I don't think it's necessary, but i haven't implemented it yet no
what I mean is that it doesn't seem right for there to be tiny subpixel error that causes pixel-wide holes or whatever
ignore the flickering
that was a bugged thing
the real issue is that it's sometimes rendering nothing
which, shouldn't ever be possible??
LOD 0 should always be able to render
The only time it chooses not to, is if LOD 1 has < 1 pixel of error
I can't answer that
in which case LOD 1+ would render instead...
it seems to happen right at the transition between LODs
Ok I did some debugging
This is right during the transition between two nodes, where it glitches out and renders nothing:
For parent cluster:
lod_is_ok = false
parent_lod_is_ok = false
For child cluster:
lod_is_ok = true
parent_lod_is_ok = true
it renders if lod_is_ok && !parent_lod_is_ok
this is confusing af though
child_parent_lod = parent_lod
so how is it false for one, but true for the other
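For reference, the render condition being debugged here can be written as a small predicate. This is only a minimal sketch: the struct and function names are hypothetical, and it assumes both errors have already been projected to pixels and are compared against a 1 pixel threshold. It also shows why the numbers above produce a hole: if the child sees a parent error below the threshold while the parent sees its own error above it, neither cluster passes the test.

```cpp
#include <cassert>

// Hypothetical per-cluster data; both errors are already projected to pixels.
struct ClusterErrors {
    float self_error_px;    // error of this cluster's own LOD
    float parent_error_px;  // error of the coarser LOD that would replace it
};

// A cluster is drawn when it is accurate enough but its parent is not.
inline bool should_render(const ClusterErrors& c, float threshold_px = 1.0f) {
    assert(threshold_px > 0.0f);
    const bool lod_is_ok = c.self_error_px < threshold_px;
    const bool parent_lod_is_ok = c.parent_error_px < threshold_px;
    return lod_is_ok && !parent_lod_is_ok;
}
```

For the cut to be watertight, the value a child stores as its parent error has to be the same number the parent stores as its own error, evaluated with the same bounding sphere.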
@wicked notch ok I'm stuck, help please. Have you seen anything where no clusters render for a chunk at all, right on the boundary where if you move slightly further or closer it renders either the child or parent? Like right at the boundary between LOD changes, sometimes it renders nothing if you can get the distance just right.
ok so parent and child errors are not the same, ???
parent error in the child shader is 0.00013
self error in the parent shader is 0.00039
...suspiciously 1/3 as much?
man this sucks, I have no idea why they're different -_-
Ok other question, do you use the same bounding sphere that you use for culling as you use for determining whether the current cluster is at the right lod?
I feel like I'm messing up somewhere here
Did you use meshopt_simplifyScale at all?
Also I think I should be adding the error across LODs...
How do you calculate them?
You must ensure that the error is always increasing.
After simplifying a group, I take the max of the parents' errors, and add it to the current error, to make sure I don't have multiple lod with the same error. I'm not entirely sure it doesn't accumulate too much error, but it looks fine.
And finally I set the parents' parentError to the value I just computed
Not parents' error actually, I mean the errors of the previous lod that was used to generate the simplified cluster.
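The accumulation described above could look roughly like this. It is a sketch under assumptions, not the actual code from either project: the function name is made up, and "child errors" means the errors of the previous-LOD clusters that were merged and simplified into the new group.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: error stored for a newly simplified group.
// It adds the group's own simplification error to the largest error among the
// clusters it was built from, so the stored error never decreases up the DAG.
inline float accumulate_group_error(float simplification_error,
                                    const std::vector<float>& child_errors) {
    assert(simplification_error >= 0.0f);
    float max_child = 0.0f;
    for (float e : child_errors) {
        max_child = std::max(max_child, e);
    }
    return simplification_error + max_child;
}
```

The returned value would then be written both as the new clusters' own error and as the parentError of the clusters they replace. Note it is only strictly increasing when the simplification error itself is greater than zero.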
I haven't gotten to the rendering stage yet due to heavy thesis writing 
Tbh the error seems so nebulous
Idk how to check if the error is subpixel, when it's a made up value -_-
#graphics-techniques message
@wicked notch my cluster builder is not ever able to get to 1 cluster. It seems to get stuck somewhere around 1000 left. Do I need to start welding vertices? Or remap the vertices and optimize them?
Or should I be able to get to 1 cluster even without them?
both
optimizing vertices is really a necessity for metis to work properly
you need zero duplicated vertices
it's really easy
vertex welding tho... 
Which functions did you use again? Like the reorder for vertex cache or overdraw or whatever ones probably don't make sense right, because that's not applicable
Hash?
Gotcha. Will try when I'm home.
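One way to get to zero duplicated vertices is to hash the vertex data and build a remap table. The sketch below is standalone and only an illustration: it dedups on exact position bits, all names are hypothetical, and it is not what anyone in this thread said they use (meshoptimizer ships meshopt_generateVertexRemap for the same job). Real welding of nearly-equal positions or of vertices split by UV seams needs more than this.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <unordered_map>
#include <vector>

struct Vec3 { float x, y, z; };
static_assert(sizeof(Vec3) == 3 * sizeof(std::uint32_t), "Vec3 must be tightly packed");

using PositionKey = std::array<std::uint32_t, 3>;

// FNV-1a style hash over the three 32-bit words of a position.
struct PositionKeyHash {
    std::size_t operator()(const PositionKey& k) const noexcept {
        std::uint64_t h = 1469598103934665603ull;
        for (std::uint32_t v : k) { h ^= v; h *= 1099511628211ull; }
        return static_cast<std::size_t>(h);
    }
};

struct WeldResult {
    std::vector<Vec3> vertices;          // unique vertices, in first-use order
    std::vector<std::uint32_t> indices;  // index buffer rewritten to use them
};

// Removes bit-identical duplicate positions. Note that -0.0f and 0.0f differ.
inline WeldResult weld_exact(const std::vector<Vec3>& vertices,
                             const std::vector<std::uint32_t>& indices) {
    WeldResult out;
    std::unordered_map<PositionKey, std::uint32_t, PositionKeyHash> seen;
    out.indices.reserve(indices.size());
    for (std::uint32_t i : indices) {
        assert(i < vertices.size());
        PositionKey key;
        std::memcpy(key.data(), &vertices[i], sizeof(key));
        const auto next_index = static_cast<std::uint32_t>(out.vertices.size());
        auto [it, inserted] = seen.try_emplace(key, next_index);
        if (inserted) out.vertices.push_back(vertices[i]);
        out.indices.push_back(it->second);
    }
    return out;
}
```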
https://github.com/JMS55/bevy/blob/meshlet-lods/crates/bevy_pbr/src/meshlet/from_mesh.rs#L73-L75
You take the max of the simplified group and the child errors here.
If a parent is not simplified much, its error will be small, therefore its error will be <= child errors, so the resulting error for the parent will be equal to the child errors. And clusters with parentError == selfError will never be rendered.
Also, simplification is done iteratively, so the error of the parent is relative to the children. The error stored inside the meshlets must be somewhat "global", i.e. the error of a parent must no longer depend on the child.
To solve this issue, I personally add the child errors to the parent's error to make it increase. Not sure if that's the correct way to do it, but works fine for me.
https://github.com/jglrxavpok/Carrot/blob/a9319a15a195d6e24eafe657e15da486a47e7e45/asset_tools/fertilizer/models/ModelProcessing.cpp#L842-L852
Sure, but there's another issue. The final error calculations also depend on bounding spheres...
And it's unclear how the fuck to even project the error
Like theoretically, it should be add all error along the subtree so you get the total error, which is what you do
In world space
And then divide by the original mesh's diameter? Or something?
Not even sure if that makes sense
the Nanite presentation is (perhaps intentionally) vague about this
Yes, I was thinking of emailing Brian Karis and asking for advice lol
I use only the position of the bounding sphere, not its size.
But! I multiply the error returned by meshoptimizer with the result of meshopt_simplifyScale on the vertices of the meshlet group.
To select the LOD, for each cluster I project two spheres:
XYZ of bounding sphere and Radius of meshlet error
XYZ of parent bounding sphere and Radius of parentError
I compute the projected radius in pixels, and check them against my threshold (max 1 pixel of error)
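A rough sketch of that projection step, under stated assumptions: a symmetric perspective camera, a small-angle approximation for the sphere, and hypothetical names throughout. `group_scale` stands in for whatever meshopt_simplifyScale returned for the meshlet group, and the error is treated as a world-space radius centered on the bounding sphere position.

```cpp
#include <cassert>
#include <cmath>

// Approximate screen-space radius, in pixels, of a sphere whose center is
// `distance` away from the camera. Ignores off-axis distortion.
inline float projected_radius_px(float radius, float distance,
                                 float fov_y_radians, float viewport_height_px) {
    assert(distance > 0.0f);
    const float cot_half_fov = 1.0f / std::tan(0.5f * fov_y_radians);
    return radius * cot_half_fov * 0.5f * viewport_height_px / distance;
}

// Hypothetical camera description used only by this sketch.
struct Camera {
    float fov_y_radians;
    float viewport_height_px;
};

// relative_error * group_scale gives the error in world units; that length is
// projected as a sphere radius and compared against the pixel threshold.
inline bool error_is_subpixel(float relative_error, float group_scale,
                              float distance_to_sphere_center, const Camera& cam,
                              float threshold_px = 1.0f) {
    const float world_error = relative_error * group_scale;
    return projected_radius_px(world_error, distance_to_sphere_center,
                               cam.fov_y_radians, cam.viewport_height_px) < threshold_px;
}
```

Running this once with the cluster's own sphere and error and once with the parent's sphere and parentError gives the two booleans used for LOD selection.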
Why does error become radius? What's the logic behind that?
It comes from the way I understand the error returned by meshoptimizer, which to me must match the interpretation of target_error:
target_error represents the error relative to mesh extents that can be tolerated, e.g. 0.01 = 1% deformation; value range [0..1]
I understand it as how much the shape of the mesh has been modified, as in how much smaller/bigger the mesh got in terms of its size. Of course not all parts of the mesh may shrink/grow that much so the actual bounding sphere may not change, but some parts may have "shrunk" by at most error%
lustri when he finished exams
Same
hmm now i want some pizza too : >
I think it makes sense not to use the bounding sphere of the meshlet to do lod selection
But use a new bounding sphere based on the group
In addition to the parent bounding sphere also being from a group
I finished 2 back to back just a few hours ago
After spending more time looking at LOD selection/error, I'm even more confused
I've scoured several github projects and nothing makes sense anymore -_-
I'm convinced there's no actual way to calculate the amount of error introduced by simplification in screenspace to determine the LOD
in any consistent way
Actually I think my impl is mostly right, but I made two mistakes:
Error needs to be converted to be monotonic with respect to object size, and the easiest way to do that is precalculate lod_error = (object_space_simplification_error) / (bounding_sphere_diameter) (I think this makes sense...?)
For calculating LOD, I always need to use the group bounding spheres, and never the per-meshlet tighter culling spheres. So each meshlet actually needs 3 bounding spheres: self culling, self lod, and parent lod
Meshlets of the same group need to make the same LOD decision, so you indeed need to use the group bounding sphere for LOD selection.
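To make the "3 bounding spheres" idea concrete, here is a hypothetical data layout plus the error normalization mentioned above. Field and function names are assumptions for illustration only, and whether dividing by the diameter is the right normalization is exactly what is still being debated in this thread.

```cpp
#include <cassert>

struct Sphere { float x, y, z, radius; };

// Hypothetical per-meshlet bounds. The two LOD spheres are shared by every
// meshlet in the same group, so the whole group makes the same LOD decision.
struct MeshletBounds {
    Sphere self_culling;  // tight sphere around this meshlet, culling only
    Sphere self_lod;      // sphere of the group this meshlet belongs to
    Sphere parent_lod;    // sphere of the group that replaces it at the next LOD
};

// lod_error = object_space_simplification_error / bounding_sphere_diameter
inline float normalized_lod_error(float object_space_error, const Sphere& group_sphere) {
    assert(group_sphere.radius > 0.0f);
    return object_space_error / (2.0f * group_sphere.radius);
}
```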
@wicked notch Mister LVSTRI how do you compress bistro with gltf pack/ do you have compressed version that you could share? I suspect the normal textures are compressed wrong, could that be the case? Do they need any specific treatment?
you're in luck, if you wait about 10 minutes you'll have a proper version of gltfpack
I was just working on that lol
https://github.com/LVSTRI/meshoptimizer @delicate rain
there you go
What did you change, if you don't mind me being curious?
Oh lmao
Nice, thank you
Is this also the version that doesn't destroy the scene hierarchy?
you can just use the -kn flag for that I think
the version I had that made gltfpack leave the hierarchy untouched was kinda dumb because it was a time where I didn't use meshletisms
I can add it easily tho
Aha I see, well the flag worked well enough the last time, I think it should be fine for now
What do I set MESHOPT_BASISU_PATH to? found it
i just looked at the dlss commit on retina for fun and saw you're adding RETINA_INLINE on a templated constexpr function... why?
gcc doesn't inline some functions when they're not called from a constexpr context
idk why, I dumped the inlining heuristic cost and it looked like they should've been inlined
oh is inline gnu::always_inline? nvm then
what does that do
can someone explain ktx ? its still all fuzzy to me
@wicked notch unrelated but btw ImGui::Combo is the "old" api and the code for the quality preset is making me feel weird with those lambdas
it's basically a simple swizzle from RG to RA
BC5 expects one channel in the R component and another in the A component
yes RA
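A sketch of what such a swizzle might look like on the CPU, assuming the context is two-channel tangent-space normal maps in RGBA8, with X moved into the color channels and Y into alpha before encoding. Names are hypothetical, and replicating R into G and B is an assumption about the convention, not something stated in this thread. The second helper shows the usual Z reconstruction on the decode side.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

struct Rgba8 { std::uint8_t r, g, b, a; };

// Moves the normal's Y from G into A and replicates X across RGB.
inline void swizzle_rg_to_ra(std::vector<Rgba8>& pixels) {
    for (Rgba8& p : pixels) {
        p.a = p.g;  // Y goes to alpha
        p.g = p.r;  // X replicated into the remaining color channels
        p.b = p.r;
    }
}

// Rebuilds Z of a unit normal from X and Y in [-1, 1], clamping rounding error.
inline float reconstruct_normal_z(float x, float y) {
    return std::sqrt(std::max(0.0f, 1.0f - x * x - y * y));
}
```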
oh lol weird
indeed
ktx can contain basisu textures which can quickly be transformed into BC textures or ASTC textures, so you can target mobile & desktop
yes this is the exact problem we have
can you tell me more about basisu?
yeah thats the gltf extension which allows ktx textures
to use them you need the library i linked
ah so basis u is used by ktx and gltf has an extension that allows for ktx/basis u textures
ye
and its meant as a thing that can be run everywhere
whereas if you'd only use DDS with a BC7 image for example that will only run on desktop hardware
and this is like a middle ground
basisu is just a supercompression scheme yes
ktx supports basisu for compressed textures which you then transcode at runtime to BCn
Big W, got proper LOD selection working!!
Lod selection changing is slightly noticeable, it's not completely imperceptible sadly. But it's fairly good.
And no more glitches!
make sure to document it, maybe even blog about it
Mhm I'm not confident yet, but once I finish it all I plan to do either a mini-blog on specifically the LOD building, or a bigger one on the entire renderer
no excuses, ill have my eyes on you hehe
Subscribe to my RSS feed https://jms55.github.io/
but does it work with disconnected meshes :p
Fs no lol
I have a bunch of stuff to improve on the LOD builder
ye that's the truly cancerous part
I can't actually get to 1 node, I just stop after 10 levels atm
building a graph of meshlets was hard enough
but making it work on all sorts of meshes is truly horrible
ye, the solution is vertex welding but that's hard too
Mhmm I'll work on it in the future
I'm actually going to have to take a 3 week break I've just realized, I'm going to be traveling and won't have my desktop
ye that's epic
and tomorrow the initial meshlet stuff (everything before LODs) will be merged into bevy main finally
Please let me know what you do for error metrics and projection!
Theoretically, there's more than just position error
It's not just a matter of pixels rendered
There's also things like BSDF error etc
Today I just ended an involuntary 3 week break and will try my hand at this.
Couldn't do anything on my Mac.
so you bought a pc?
My PC has been waiting to render some Suzannes and Sponzas. Mac crashed on Deccer Cubes.
Good luck lol. My code is open source if you need a reference.
make sure to credit all the frogs, jasmine
frogs?
the ones living in your head
@dull oyster I'm not sure projecting the error and checking < 1% even makes any sense
The error is relative to the previous meshlet, theoretically
So it's not even a function of deformation from the original mesh
That's why I add the error from the meshlets used to generate the simplified ones
Mhm I was thinking about that. But then you would want to scale it by the original mesh's bounding sphere, probably.
To get deformity from the original mesh
then use that percentage to scale the projected size of the LOD bounding sphere
so if a higher LOD group is 10% deformed from the original mesh (after dividing by the original LOD 0 bounding sphere diameter)
Then you project the higher LOD's bounding sphere to the screen, and multiply by 0.1
I haven't thought over if that makes sense or not, but something along those lines, probably?
I multiply all my errors by the group scale (meshopt_simplifyScale) to have all errors in world units instead of being relative.
I think it could work.
By the way, it is not projecting and checking if <1% but checking if <1 pixel
Yeah
But idk not convinced about anything
A nice side effect of TAA is that shadows get automagically nicer even without filtering
anyways, the framework for nanite is complete
now it's time to work on the graph algorithms
first: imma make my own graph partitioner, I've found all sorts of bugs (read: segfaults) when specifying edge weights in METIS, the docs don't say anything specific about valid edge weight values, but even if I limit them to the [0, 2^16-1] or [1, 2^16-1] range it sometimes crashes
second: it follows from the first that I need a half edge structure, this is actually good because I can start doing some preprocessing and mesh validation here already
I'll make my own data structures as well just to minimize the amount of obscure, abandoned libraries I need
also I gotta figure out a good format to store my DAG in gltf (and not break the existing data)
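For anyone following along, a half-edge structure can start as small as a "twin" table built from the index buffer. This is a minimal sketch with hypothetical names, assuming an indexed triangle list with consistent winding and already-welded vertices. Half-edges without a twin are boundary edges, which is the kind of validation mentioned above.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

constexpr std::uint32_t kNoTwin = 0xFFFFFFFFu;

// Half-edge h belongs to triangle h / 3 and runs from indices[h] to
// indices[next_half_edge(h)].
inline std::uint32_t next_half_edge(std::uint32_t h) {
    return (h % 3 == 2) ? h - 2 : h + 1;
}

inline std::uint64_t edge_key(std::uint32_t from, std::uint32_t to) {
    return (static_cast<std::uint64_t>(from) << 32) | to;
}

// Returns, for each half-edge, the opposite half-edge or kNoTwin.
// On non-manifold input the last half-edge seen for a directed edge wins.
inline std::vector<std::uint32_t> build_twins(const std::vector<std::uint32_t>& indices) {
    assert(indices.size() % 3 == 0);
    const auto count = static_cast<std::uint32_t>(indices.size());
    std::unordered_map<std::uint64_t, std::uint32_t> directed;
    directed.reserve(indices.size());
    for (std::uint32_t h = 0; h < count; ++h) {
        directed[edge_key(indices[h], indices[next_half_edge(h)])] = h;
    }
    std::vector<std::uint32_t> twins(indices.size(), kNoTwin);
    for (std::uint32_t h = 0; h < count; ++h) {
        // The twin runs along the same edge in the opposite direction.
        auto it = directed.find(edge_key(indices[next_half_edge(h)], indices[h]));
        if (it != directed.end()) twins[h] = it->second;
    }
    return twins;
}
```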
write graph partitioner in rust :frog_crab:
@wicked notch do you have a collection of meshes to test LOD building edge cases?
Also I'm curious, what do you set as your target_error to meshoptimizer's simplify
#1090390868449558618 message
once you can simplify this to one cluster, you're good to go 
I haven't found a good value yet
Wtf is wrong with it? Your renderer can manage it though?
And what should it end up looking like, a flat plane?
no of course I can't 
There's no way I'll manage then
I have zero clue, but ideally it should be <128 triangles
the thing the model tests is how well you can weld vertices
because if you try to simplify that model with meshopt_simplify, you'll never get anywhere
Hmm ok, thanks
plus you need to watch out for UV error
Yeah I don't have any way of dealing with UV seams
@wicked notch if you're still awake, I have a question for u (related to what I posted in #vulkan)
does your renderer run under renderdoc?
have you been able to use edit-and-continue on your original source code (by emitting debug info)
lemme checc
I need a toggle to easily delete DLSS
huh
if I edit it dumps every include in one single file
I can still debug just fine tho
as of right now I don't
I removed the thing to test something else
but I'll integrate it back tomorrow
hmm renderdoc is happy to show me the source if there are no includes
maybe my glslang includer is fooked
mfw tomorrow I'll have to sing the national hymn and I don't remember it 
why does our hymn last for 4 fucking minutes
New LVSTRI character arc dropped
tbf Americans have to recite our "pledge of allegiance" every morning in grade 1-12 
it's technically optional but socially you're pressured to do it anyways
ye I just reverted to using my scuffed stb_include fork and now it "works" (it shows the mostly unmolested source)
wgsl
if you add bindless I will stop considering webgpu a meme
even opengl has bindless cmon
We have it, but it's not very good
But you can use wgsl with vulkan if you want
It's a nice shader language regardless
We have limited forms of pointers, idk what you mean by templates or descriptor aliasing
pointers = buffer pointers (not logical pointers)
templates = generics
descriptor aliasing = reinterpret memory through different descriptor declarations on the same binding
nope to all 3, I think

-force-glsl-scalar-layout -fvk-use-entrypoint-name -emit-spirv-directly -matrix-layout-column-major -O0 -lang slang
I am going insane
this dogshit IR thinks SV_PrimitiveID is a global param which is good
except it doesn't store any references to any parent type
so
```
struct PerPrimitive {
    uint PrimitiveID : SV_PrimitiveID;
};
```
the struct is lost in translation
incredible
so I literally have to check whether the current stage is mesh shader
hey @wicked notch what's that cope technique for splitting cascades called again
the intel one
where you analyze the depth buffer
SDSM
that's right, thanks
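For context, the core idea of SDSM (Sample Distribution Shadow Maps) is to derive the cascade range from the depths actually visible in the frame instead of from the camera's near and far planes. Below is a CPU-side sketch of that idea with hypothetical names, assuming linear view-space depths and a purely logarithmic split. Real implementations do the min/max reduction on the GPU and may partition differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

struct DepthBounds { float near_z; float far_z; };

// Min/max over visible samples. Depths at or beyond `max_valid_depth`
// (for example sky pixels) are skipped. Returns `fallback` if nothing is left.
inline DepthBounds reduce_depth_bounds(const std::vector<float>& linear_depths,
                                       float max_valid_depth, DepthBounds fallback) {
    DepthBounds b{max_valid_depth, 0.0f};
    bool any = false;
    for (float z : linear_depths) {
        if (z <= 0.0f || z >= max_valid_depth) continue;
        b.near_z = std::min(b.near_z, z);
        b.far_z = std::max(b.far_z, z);
        any = true;
    }
    return any ? b : fallback;
}

// Far distance of each cascade using a logarithmic split:
// split_i = near * (far / near)^(i / N), for i = 1..N.
inline std::vector<float> log_splits(DepthBounds b, int cascade_count) {
    assert(b.near_z > 0.0f && b.far_z >= b.near_z && cascade_count > 0);
    std::vector<float> splits(static_cast<std::size_t>(cascade_count));
    for (int i = 1; i <= cascade_count; ++i) {
        const float t = static_cast<float>(i) / static_cast<float>(cascade_count);
        splits[static_cast<std::size_t>(i - 1)] = b.near_z * std::pow(b.far_z / b.near_z, t);
    }
    return splits;
}
```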
Just testing volumetric f(r)og in Unreal Engine again
thats me :)
hello im necroposting bc I've not looked at this thread in a while and i like feeling useful
Since then I wrote my own backend which uses some unholy hacks from DiligentEngine to achieve proper alpha blending, color pickers and I also support putting non-linear colors into ImGui's color pickers (which are converted to RGB and back)
hell yea
you can do that, just keep the actual textureid around, which you need anyway for getting the handle in the firstplace
I was replying to the srgb stuff and my mention
ah, i blame elias for not properly separating messages then : >
can someone please recap what exactly the issues are with current implementations besides that imgui uses srgb for everything?
and for me Image::Image(imageView) works perfectly without needing to change anything
That's the only issue. But a pretty big one
Try to display _SRGB texture via ImGui::Image call, for example
Hmm, I thought it took descriptor set
it does
Oh ye I think I just make a unorm view and then send it off to imgui
Requires mutable format for all textures though. Kinda cringe
Is it though
PR ready for initial meshlet LODs. Didn't do vertex hashing yet, just the basics. https://github.com/bevyengine/bevy/pull/12755
lvstri is missing as a reference
I have nothing to link to lol, otherwise I would
He has a github with the same name
Ah sure, I'll add him (and others) in a bit to a credits section. The references is intended to help reviewers, I wasn't thinking about credit yet.
There will also be a blog post when bevy 0.14 releases in a few months, and I'll credit people more thoroughly at that point
(it's fine you don't have to add me, I did nothing in particular lol)
Nah you helped a lot! You answered my many questions lol. I do want to give people credit
Lvstri unknowingly improving the rust ecosystem
lustri you raised your voice at the right time in the right levels
das a lot of todos
me: hmm this PerPrimitiveEXT decoration is missing
literally every possible tool in the Graphics Programming stack breaks
NV Driver: broken
spirv-val: broken
slang: would you believe it, broken

why not come back to the holy land of (G)ood (L)uck
ah yes, making my life harder for no reason 
heh i tried
potti mentioned that the slang ppl fix shit in a timely fashion
its really cool tho that slang is nv
cause they can ping the driver peeps


