#Iris - A Journey through OpenGL and beyond to learn Graphics
Once for the main camera and once for each shadow cascade
Performance was actually a bit worse than not doing ROC on shadows, but I assume it scales better?
oh ok. Is it actually worth it for the shadows.... Yeah about to ask
With representative_fragment_test it looks like it's viable
Like it takes much less time than the shadow map passes themselves
Previously it was ~400us for each cascade
Depending on the cascade of course.
yeah but without the extension idk. Because the shadow map generation pass the fragment shader does basically nothing
so that you are not saving anything
Ye
I was looking at other methods for shadow caster culling, but they are all so ridiculously complicated that it turned me off
Like building a whole octree to accelerate culling and stuff.. crazy stuff
are you already doing compute frustum culling
Yes
Takes no time at all
I'm surprised by how little time it takes honestly lol
It's the small gray rect
Less than 10 microseconds 
yep its amazing and so useful
16 bit depth can help speed up shadow map generation. and dont use a geometry shader if you are
Yeah no geometry shader
I was thinking about going full reverse Z because ROC has some precision issues on very small AABBs
Such as floor planes etc..
like false negatives, where the aabb doesnt produce any fragments and the mesh is not drawn?
but then the mesh wouldnt be drawn in the first place i guess
No, it's because of temporal incoherence due to the camera moving relatively quickly and ROC not being able to keep up with last frame's depth
I guess at least
I could fix this by making the AABBs slightly bigger though, now that I think about it
To add some buffering
last frame's depth should be irrelevant
My bad, current frame's depth*
the temporal incoherence is solved by the step that renders just-disoccluded objects
Ah yeah, I forgot about that
I found that adding some epsilon to the AABB works pretty good at "solving" this temporal incoherence too
And it doesn't reduce the effectiveness of culling by much
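```glsl
void main() {
    ...
    // Unit cube centered on the origin, one corner per vertex.
    vec3 position = make_cube(gl_VertexID) - 0.5;
    const vec3 aabb_max = object_info.aabb.max.xyz;
    const vec3 aabb_min = object_info.aabb.min.xyz;
    const vec3 aabb_size = (aabb_max - aabb_min) * 0.5; // half-extents
    const vec3 aabb_center = object_info.aabb.center.xyz;
    // Camera position in the object's local space. Points are transformed by the
    // plain inverse; transpose(inverse(...)) is the normal matrix, not a point transform.
    const vec3 local_view_pos = vec3(inverse(transform) * vec4(camera.position, 1.0));
    // Pad the half-extents (the "epsilon" mentioned above) before scaling the cube.
    position *= (aabb_size + 20) * 2.0;
    position += aabb_center;
    if (all(lessThan(abs(local_view_pos - aabb_center), aabb_size))) {
        // Camera is inside the AABB: mark the object visible and cull the vertex.
        roc_visibility[object_id] = 1;
        gl_Position = vec4(-2.0, -2.0, -1.0, 1.0);
    } else {
        o_object_id = object_id;
        gl_Position = camera.pv * transform * vec4(position, 1.0);
    }
}
```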
LODs are also something I could do in the future, but I would like some hugemonogous scenes to test with first
I'm getting sidetracked here a lot though, hehe, TAA is the primary objective.
The infamous reprojection is here
Although now that I'm beginning to understand TAA I know I can't just apply this to the shadow map hmm...
I would definitely need a shadow mask.. maybe?
Oh well, temporal supersampling shadows can wait for now
yeah
shivoa mentioned that shadow masks can be trivially TAA'd in the context of the new zelda game having awful shadow aliasing
it's probably worth it if you're suffering from bad aliasing from the sun shadow
Ahh I see
Speaking of bad aliasing, Elden Ring on ultra had pretty bad aliasing too..
I wonder why they don't just filter the thing..
For Zelda, is the nintendo switch unable to perform shadow filtering efficiently?
spatial filtering doesn't remove temporal aliasing
Yeah, I mean spatio temporal filtering
Beaten by a beginner in OpenGL smh
tbf, they have to carefully consider all features they add since they will eat into the rendering budget
Ye jokes aside I'm sure they err on the side of caution most of the time
that being said, temporally filtering a shadow mask should be quite cheap in the grand scheme of things
they probably just don't have motion vectors or some other thing that would be dealbreaker when it comes to this
adding those would likely make rendering the scene more expensive
that's quite a bit of extra bandwidth
Hmm I see
imagine if the g-buffer was already like 96 bits, an extra 32 or 64 could affect perf quite a lot
I am actually limited by VRAM throughput just by rendering shadow maps 
And that's on desktop, I imagine mobile and low power chips have even lower bandwidth?
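To put rough numbers on the g-buffer remark above, here is a back-of-the-envelope sketch. It assumes a 1920x1080 target at 60 FPS and counts a single write per pixel per frame (no overdraw, no compression, no reads). The resolution, frame rate and the function name are illustrative assumptions, not figures from the conversation.

```cpp
#include <cassert>
#include <cstdint>

// Bytes written per second to a render target, assuming every pixel is
// written exactly once per frame (hypothetical best case).
inline std::uint64_t gbuffer_write_bytes_per_second(std::uint64_t width,
                                                    std::uint64_t height,
                                                    std::uint64_t bits_per_pixel,
                                                    std::uint64_t frames_per_second) {
    assert(bits_per_pixel % 8 == 0); // whole bytes only, to keep the math exact
    return width * height * bits_per_pixel * frames_per_second / 8;
}
```

Under those assumptions a 96-bit g-buffer costs about 1.49 GB/s of writes and a 128-bit one about 1.99 GB/s, so the extra 32 bits add a third more write traffic before any overdraw or later reads are counted.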
i dont understand the hype for zelda
Maybe it's good I don't know. I never played it
it's allegedly a fun game
I know people really liked the previous one, so of course they would be excited for a sequel
hmm, the lets plays i watched so far were meh, but thats probably just me
I like FPS and Arena more anyways
An open world like Elden Ring would be good too, I loved it
also didnt like that one, but that applies to all dark souls and clones
arena?
Anyways, after I get TAA done I really want to experiment with Atmospheric Scattering, I'm getting bored of the void sky
By arena I mean stuff like ULTRAKILL or DOOM
mmm ultrakill is special
They are considered arena shooters right?
it's relegated to boomer shooter now
I thought "standard" was more like CoD or something
nah its all the same
Anyways, I like DOOM
Very cool indeed
DRG is so good
🪨and🪨
hehe
celeste and hollow knight are also pretty good if you want a break from fpses
I played both last year and was pretty blown away
i know the author of celeste
Personally?
no
Ah
well its made by 2 peeps
they hang out on the FNA discord
(FNA = XNA reimplementation)
i didnt like celeste either btw : (
You mustn't be easily impressed like I am
perhaps i prefer more space-y stuff
I wish Starfield will be huge
Except it's Bethesda...
I also want a proper space game, Star Citizen is the closest we get
same
But it's permanently beta
Freelancer was superb
then i played Everspace 2 recently which is praised as the new freelancer or better even
but the end was VERY disappointing
I thought about it a little more and the shadows in totk can probably still be TAA'd for static objects without motion vectors
marking dynamic objects to ignore would still require 1 bit in the gbuffer
This do be some good anti aliasing
why can I see the stone behind some curtains
Twas a little bugged 
This is not
(There's the comparison in #wip)
I wonder how I should handle stuff that just pops into existence, what do I use as the previous frame's model matrix?
Just a zeroed out mat4?
just using the current one sounds more correct
you can also have a bias mask that controls how strongly to blend between history and current color
Hmm yes makes sense
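A minimal sketch of the bias-mask idea, assuming the usual exponential history blend in the TAA resolve. `base_alpha`, `bias` and both function names are hypothetical; the conversation does not say how the mask is stored or combined.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Color { float r, g, b; };

// Blend weight of the current frame: 0 keeps only history, 1 keeps only the
// current color. `bias` comes from the (hypothetical) per-pixel mask:
// 0 means normal accumulation, 1 means discard history entirely.
inline float taa_blend_factor(float base_alpha, float bias) {
    assert(base_alpha >= 0.0f && base_alpha <= 1.0f);
    assert(bias >= 0.0f && bias <= 1.0f);
    return std::clamp(base_alpha + bias * (1.0f - base_alpha), 0.0f, 1.0f);
}

// Linear interpolation between the reprojected history and the current color.
inline Color taa_resolve(Color history, Color current, float alpha) {
    auto mix = [alpha](float h, float c) { return h + (c - h) * alpha; };
    return { mix(history.r, current.r),
             mix(history.g, current.g),
             mix(history.b, current.b) };
}
```

An object that just popped into existence has no valid history, so besides reusing the current model matrix as the "previous" one, its pixels could be written to the mask with a bias of 1 for that first frame.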
Motion Vectors™️
Full TAA is here, it looks incredibly good
I should filter it a bit more though, so objects in the distance don't shimmer, but I know that I can tweak and tune TAA for the next 10 years and it'll never be perfect
I will patiently wait for Mr. Jaker to port FSR2 to OpenGL so I can have free antialiasing AND upscaling
He was hard at work with FSR2 this morning so that's good
Where to go next hmm
I kinda want to continue with mesh shaders, but the situation's dire in OpenGL..
Next is Atmospheric Scattering which is really cool
And some UI wouldn't be too bad as well..
Oh, terrain rendering as well!
I will have to get started, sooner or later, to making open worlds after all
nah
next is volume clustered rendering
and then is void's GI thingy
before all else
In this video Volume Tiled Forward Shading is described. Volume Tiled Forward Shading is based on Tiled and Clustered Forward Shading from Ola Olsson et. al.
In this video, the Sponza (Crytek, 2010) scene is used to demonstrate Volume Tiled Forward Shading. By first constructing a Bounding Volume Hierarchy (BVH) over the lights, we can achieve ...
This?
yes
Psychedelic Lighting Simulator I see
hehe
It's basically point light culling?
i mean i couldnt tell the difference between 500 and 1mio lights
not just point light i guess
This has been successfully added to the TODO list™️
Hmm I really don't want to implement a GUI myself, is there any library I could use? I have found this: https://github.com/Immediate-Mode-UI/Nuklear
Lel
deccer has found someone worthy of the task
Progress report on Operation Black Mage (porting FSR2 to OpenGL)
Uh
The shaders and cmake are probably done, now I just need to make the backend
Btw, how is your TAA under motion
looks neat still
Even the best TAA implementations I've seen shimmer a little bit in motion
Visual representation of Jaker's progress:
Jk by the way, it's your personal time so I'll be patient
: )
~~Perhaps I'll switch to Vulkan faster than you port it~~
By the way RMLUI looks very interesting, finally my useless HTML and CSS skills will be put to use
On a whim I decided to actually look at the vulkan tutorial in my bookmarks
This is.. frightening lol
Words I've never seen before 
Tbf you only have to do like 70% of that once in your whole life
Since it's boilerplate
Yeah but why is there a physical and a logical device, what does that even mean
For SLI or something?
Ah alright it's explained, perhaps I should just read
VkPhysicalDevice is the actual hardware, VkDevice (the logical device) lets you specify what extensions and features of the VkPhysicalDevice you want to use and stuff like that?
To draw to a VkImage acquired from the swap chain, we have to wrap it into a VkImageView and VkFramebuffer. An image view references a specific part of an image to be used, and a framebuffer references image views
Ah yes, indirection.
A recommended method, which has been proved over the years :p
how come you fiddle with Vk now?
Nothing special, I just wanted to check out how much I can delay using Vulkan 
heh
honestly i set up my device stuff once and I look at it only when I need to enable something
if you use C++ vkbootstrap helps a lot with the setup too
I really like this Vulkan thingy where you put your stuff into structs and then use these as parameters
It reminds me of my framebuffer constructor with 10 parameters or something
I'll probably adopt this
don't forget that OpenGL features zero structs
almost
a mind boggling choice tbh
the indirect structs don't count
15 goddamned parameters 
You'd love fwog hehe
The next time OpenGL manages to anger me with its unhingedness I'll switch to Fwog
you can do whatever, I just like shilling
Yes, I think I've used enough raw OpenGL, so I'll gladly accept your shilling
wouldnt surprise me that you switch to vk and make a better Fwovk before jaker even considers switchting to vk
Fvog
and martty will get inspiration from it for vuk2
```cpp
int main() {
    if (!glfwInit()) {
        IRIS_LOG_ERROR("failed to initialize glfw");
        return -1;
    }
    // No OpenGL context: the window is only a surface target for Vulkan.
    glfwWindowHint(GLFW_CLIENT_API, GLFW_NO_API);
    glfwWindowHint(GLFW_RESIZABLE, GLFW_FALSE);
    auto* window = glfwCreateWindow(800, 600, "IrisVk", nullptr, nullptr);
    // instance
    {
        auto count = 0_u32;
        // Instance extensions GLFW needs in order to create a surface later.
        const auto** extensions = glfwGetRequiredInstanceExtensions(&count);
        auto application_info = VkApplicationInfo();
        application_info.sType = VK_STRUCTURE_TYPE_APPLICATION_INFO;
        application_info.pNext = nullptr;
        application_info.pApplicationName = "IrisVk";
        application_info.applicationVersion = VK_MAKE_API_VERSION(0, 1, 0, 0);
        application_info.pEngineName = "IrisVk";
        application_info.engineVersion = VK_MAKE_API_VERSION(0, 1, 0, 0);
        application_info.apiVersion = VK_API_VERSION_1_3;
        auto instance_info = VkInstanceCreateInfo();
        instance_info.sType = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO;
        instance_info.pNext = nullptr;
        instance_info.flags = {};
        instance_info.pApplicationInfo = &application_info;
        instance_info.enabledLayerCount = 0;
        instance_info.ppEnabledLayerNames = nullptr;
        instance_info.enabledExtensionCount = count;
        instance_info.ppEnabledExtensionNames = extensions;
        auto instance = VkInstance();
        auto result = vkCreateInstance(&instance_info, nullptr, &instance);
        if (result != VK_SUCCESS) {
            IRIS_LOG_ERROR("failed to create instance");
            return -1;
        }
        IRIS_LOG_INFO("instance created: %p", (const void*)instance);
    }
    while (!glfwWindowShouldClose(window)) {
        glfwPollEvents();
    }
    glfwDestroyWindow(window);
    glfwTerminate();
    return 0;
}
```
This is what I wrote in the past 30 minutes
And I'm already tired

It's 40 lines of code
You need less than 30 for a colored triangle
glfwDestroyTheChild

Does it actually?
big chads don't call either
true
let os mommy clean up after you
and claim that their shit works whifout calling sdlInit when using sdl
what about cleaning up before you?
wut
there was a dude somewhere in #vulkan or idk where, claiming that
Hmm I don't really like RML, it forces you to use OpenGL 3.3 if I didn't misunderstand anything?
It also comes with its own glad and stuff
Actually maybe not
Yeah okay, after using RmlUi for a few hours I decided it's not very comfortable to use.. I think it's more intended for Apps and stuff instead of real time 3D
dearimgui looks promising, it's even used by Rockstar Games lol
I will eventually, but I'll probably use something easier like Electron or something like that
I would like to see my render targets right now mostly
yeah then dear imgooey
babagooey
Is that the docking branch
Yes, I've discovered that after I initially did this all manually without docking of course 
Try drawing ImGui without framebuffer srgb
It'll fix some of the colors, but too bad it's wrong either way 
Hmm need everything as linear sRGB and then perform the conversion manually in the TAA resolve pass?
You should be drawing ImGui at the very end of everything
And without framebuffer sRGB, because in theory it should already be sRGB (thus it doesn't need the linear-srgb conversion)
But doesn't OpenGL convert back to linear space when sampling from textures?
I suppose ImGui samples from my output texture when I give it the texture id to ImGui::Image, then it will write that into the default framebuffer and since it's not sRGB it will be stored in linear space?
can you not draw imgui onto the final image after resolving everything else
or even once you blitted everything to fb0, draw imgui then
I'm talking about imgui itself, not the random image you draw with ImGui::Image
Ah by the way, my pipeline is simply: Depth Prepass -> Frustum + Occlusion Culling -> Depth Reduce -> Shadow Setup -> Shadow Render -> Color Pass (SRGB texture) -> TAA Setup -> TAA Resolve -> Draw UI (To SRGB framebuffer)
ImGui is drawn at the very end, to the default framebuffer
and I'm saying to disable framebuffer srgb right before that
It should make your imgui darker
Ah I see
glDisable(GL_FRAMEBUFFER_SRGB)
I can tell it's on because the colors are too bright
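A small sketch of why the UI looks too bright with `GL_FRAMEBUFFER_SRGB` enabled, using the standard sRGB transfer functions. The assumption here is that ImGui's colors are already sRGB-encoded values, so the framebuffer's linear-to-sRGB conversion ends up encoding them a second time. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Linear -> sRGB, the conversion applied on write when GL_FRAMEBUFFER_SRGB
// is enabled and the target is an sRGB framebuffer.
inline float srgb_encode(float linear) {
    assert(linear >= 0.0f && linear <= 1.0f);
    return linear <= 0.0031308f
        ? 12.92f * linear
        : 1.055f * std::pow(linear, 1.0f / 2.4f) - 0.055f;
}

// sRGB -> linear, the conversion applied when sampling an sRGB texture.
inline float srgb_decode(float encoded) {
    assert(encoded >= 0.0f && encoded <= 1.0f);
    return encoded <= 0.04045f
        ? encoded / 12.92f
        : std::pow((encoded + 0.055f) / 1.055f, 2.4f);
}
```

Encoding a value that is already encoded pushes mid-tones up again, for example `srgb_encode(srgb_encode(0.5f))` is larger than `srgb_encode(0.5f)`, which matches the "too bright" symptom described above.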
Nope it still looks very bad
Btw, the author of ImGui literally doesn't understand color encodings, so it's wrong no matter what 
https://github.com/ocornut/imgui/issues/578
The gui colors look correct at least
But now your scene is super dark lol
I don't care too much about color correctness of the UI to be quite honest
I'm afraid to step into uncharted territory (#vulkan)
I looked again at whatever little Vulkan I wrote yesterday and I just realized I'm passing over 15 parameters to a single function
The whole "ApplicationInfo" and stuff..
I think that's enough Vulkan for today
230 lines and still no pretty colors on the screen
Time to learn about atmospheric scattering!
just a few hundred more lines and you'll have a clear color going

Do you recommend anything for atmospheric scattering? Besides scratchapixel's article?
Does Fwog have any samples I can ~~copy~~ reference?
I only have a local mie scattering fog thingy
which isn't suitable for atmospheres, since you need rayleigh scattering for dat
Rip, I'll read scratchapixel first and then this then
there are probably also some siggraph pbr course thingies you can look at
I also have this thing sitting in a tab
https://www.ea.com/frostbite/news/physically-based-sky-atmosphere-and-cloud-rendering
There's also the thing about stars and clouds
And fog
And rain
Making an atmosphere is hard
fog is something I can help with
Lovely, one piece at a time I'll solve this puzzle hopefully
I'm not sure about rain rendering techniques
you can probably get something decent with a particle system
I guess there are quite a few components to a weather system though
clouds, precipitation, ground effects (like puddles, snow, 'wettening', etc.)
Hmm yes, the most difficult thing would be to dynamically update each object based on how it reacts to rain
Why can't the dumb silicate rock figure it out on its own smh
I wrote about particles and teardown (which has some nice weather effects) in my blog
https://juandiegomontoya.github.io/
how do objects react to rain?
The things you wrote
with shriwwle
ah
Objects become more shiny or their roughness is accentuated
I think some tomb raider game had a presentation where they mentioned this stuff
I'm getting very ahead of myself though, atmosphere first.
It seems like the easiest of them all
Thanks for all the links, big preesh
ah yeah uncharted 4 was the one
https://advances.realtimerendering.com/other/2016/naughty_dog/NaughtyDog_TechArt_Final.pdf
they use some wetness mask
btw ignore the mip fog in that presentation 
Me: Heavily constrained by VRAM Throughput
Also me: What's 6 or 7 more full screen passes
How do games even do this
Without running into absurd amounts of render targets
you'll get there
lol. lmao
btw the bandwidth isn't that bad when you read/write to each pixel once
full screen passes are generally super cheap
There's no way we're reading once per pixel from the wetness mask is there 
one trick is combining a lot of full screen passes into a single one also
Hmm, like making a huge shader?
t h i c c
How many buffer bindings was the limit per shader stage? 

it looks like this is just affecting the values they write to the g-buffer in the scene pass
Hmm I see
wetness is an additional material input thingy that they use
so it's one extra texture read I guess
I am still using a regular forward renderer, should I go deferred?
but it doesn't affect how chonky the g-buffer is
Even more render targets 
deferred is nice because it makes a lot of post processing stuff possible
like ssao, ssr, etc.
I'll have to think carefully about what I put in the gbuffer
I'm thinking just uvs and depth?
everythingggggg

that's a tough question
"standard" deferred renderers will put surface material info in the g-buffer (material type, normals, roughness, metallic, ao) and depth
Yeah but that's a lot of bandwidth
boohoo
the new hotness is visibility rendering, which is basically depth+triangle&instance ID, then you can fetch material parameters in the shading pass
Look, I don't want my occupancy to look like it's starving
meh
visibility buffer is quite a bit more effort for probably 0 gain in most projects
UE still uses a massive g-buffer in its regular rasterized renderer and that performs fine
Does depth + uv not work?
how do you know what texture to sample with just the UV
now how do you get mipmap+anisotropic filtering
just pass 4 more floats for UV gradient on x and y 
depth + uv + material id + ddx/ddy
Do I only need derivatives for aniso?
If I had meshlets I could easily calculate derivatives
Just get 3 vertices with gl_MeshPrimitiveID, compute barycentrics, ezpz
if you have the triangle ID, you can compute the derivatives (and UV, etc.) analytically
http://filmicworlds.com/blog/visibility-buffer-rendering-with-material-graphs/
you need instance ID and triangle ID to uniquely identify a triangle
implying you'll have more than one vertex/index buffer
It's easy if you have only one ofc
I made it like this because I thought it was more appropriate 
Jokes aside it's actually very easy to solve
well, if you're going to optimize meaningless stuff, you better go the full way
with one mega vertex/index buffer, you can do MDI
Every object already stores their index to their VBOs and EBOs
So I just need one more id, object id
And I can uniquely identify any triangle
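One way the "object id plus triangle id" pair could be stored in a visibility buffer is a single packed 32-bit value. This is only a sketch: the 12/20 bit split, the constants and the function names are assumptions, and a real renderer would pick the split from its own object and triangle counts.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical split: low 20 bits for the triangle, high 12 bits for the object.
constexpr std::uint32_t kTriangleBits = 20;
constexpr std::uint32_t kTriangleMask = (1u << kTriangleBits) - 1u;
constexpr std::uint32_t kMaxObjects = 1u << (32u - kTriangleBits);

struct VisibilityId {
    std::uint32_t object_id;
    std::uint32_t triangle_id;
};

// Pack both ids into the value written to the visibility target.
inline std::uint32_t pack_visibility(VisibilityId id) {
    assert(id.object_id < kMaxObjects);
    assert(id.triangle_id <= kTriangleMask);
    return (id.object_id << kTriangleBits) | id.triangle_id;
}

// Recover both ids in the shading pass.
inline VisibilityId unpack_visibility(std::uint32_t packed) {
    return { packed >> kTriangleBits, packed & kTriangleMask };
}
```

With the object id recovered, the shading pass can look up which vertex and index ranges the object uses and fetch the three vertices of the triangle from there.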
homie is speedrunning graphics
True this is pretty meaningless lol
I'll just keep my good old forward renderer until I need a reason to change it
I guess you can write a thin g-buffer if you need that stuff for post-fx
like what I think π©
does
Atmosphere is cooler than random api plumbing
Btw I do MDI already, in case you forgor
I just preallocate 256MiB and if it spills I make another 256MiB
oops 
you may need to up that if you do visbuffer
or mayhaps make some adjustments to the algorithm
I guess you could record the buffer id and do multiple shading passes (or one where you have all the buffers bound)
The scheduler in my brain says this process (deferred) is very low priority
But I will think about it
I'm not saying you should do it. I'm saying that it will happen at some point
Beware the pipeline!
@wicked notch what do you remember about the reason for physical devices and logical devices being separate things? I just (hopefully correctly) remembered that one logical device can correspond to multiple physical devices, e.g., for SLI
I think it's the other way around
One physical -> many logical
Yeah, a VkPhysicalDevice is the actual hardware while the VkDevice specifies what features and extensions of that specific physical device you want to use
https://stackoverflow.com/questions/31833776/howd-multi-gpu-programming-work-with-vulkan
The idea with this is that the SLI aggregation is exposed as a single VkDevice, which is created from a number of VkPhysicalDevices. Each internal physical device is a "sub-device". You can query sub-devices and some properties about them. Memory allocations are specific to a particular sub-device. Resource objects (buffers and images) are not specific to a sub-device, but they can be associated with different memory allocations on the different sub-devices.
tutorial vs so answer, unstoppable force vs immoveable object
I think this is a bit different
I think you can ask for certain things before getting the logical device
sadly my knowledge of vulkan device creation is quite limited
anyways, the tutorial provided reason enough to have such a separation
the SLI thingy is just an aside (and mostly irrelevant)
If you can query for SLI in OpenGL I'd just do:
```cpp
if (is_sli_enabled) {
    printf("bro wasted his money lmao");
    debug_break();
}
```
Rate my (non-antialiased) sun
I forgor everything about my physics classes, how do you calculate the sun's trajectory over time again? 
sunpos = sunpos + 1
x = cos(time)
y = sin(time)
fisically accurate
I remember it more involved but that's accurate enough I guess?
making it have an offset is up to the reader to figure out
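The cos/sin suggestion above, written out as a small sketch with the offset included. It is a toy model only: the sun moves on a circle in the XY plane and nothing about latitude, axial tilt or seasons is modeled. `day_length` and `phase` are hypothetical parameters.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

// Unit vector pointing toward the sun. `time_of_day` and `day_length` share
// the same unit (for example hours); `phase` is an angular offset in radians.
inline Vec3 sun_direction(float time_of_day, float day_length, float phase = 0.0f) {
    assert(day_length > 0.0f);
    const float two_pi = 6.283185307179586f;
    const float angle = two_pi * (time_of_day / day_length) + phase;
    return { std::cos(angle), std::sin(angle), 0.0f };
}
```

With a 24 hour day and no phase, the sun sits on the horizon at hour 0 and straight up at hour 6, so the phase is what shifts sunrise to a sensible time.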
good thing void is not here atm :>
@ void !!! math alert !!!
Now we can start doing the atmosphere thingy
Looks like a perfect UK atmosphere already
Still no clear color has been achieved
At least I have a swapchain now, I have no idea how OpenGL abstracts the swapchain away from me...
Anyways it's cool seeing the inner workings of how things work effectively
it's cursed
the windowing system defines it
you can write to it by binding framebuffer 0, but you can't read from it or query anything about its attachments
its a little sad
glfw gives you a minimal amount of control over the swapchain when you create the window
one could simulate a swapchain, with an ordinary fbo, and you never care about fb0
and your swapchain::present is just a glBlitNamedFramebuffer(..to fb0...)
i am toying with that idea for some time now, and prohaps remove BeginRenderToSwapchain and replace it with an acchual schwapschain object
hmm, how do you render to the actual swapchain then
is it just a secret blit somewhere
not a bad idea tbh
I could copy that idea to shrimplify fwog (by removing RenderToSwapchain stuff)
: )
technically
you could still draw your UI after it if you wanted with a little gymnastics to put it between blit and swap, or you also just draw into the swapbuffer-fbo
pretty much like you would do in dx11
Hmm I can't find the necessary motivation to study Atmospheric Scattering
Exams are also coming up 
Rayleigh will still be here too
I'll definitely be there for the FSR2 inauguration
I guess I'll try to continue with the Vulkan triangle instead of randomly reading scratchapixel
We're at day 3 and counting by the way 
The more I sink into Vulkan the less I understand how OpenGL works
How anything "just works" in GL is purely beyond me at this point
ObenGL is just Vulkan minus info structs and synchronization thingies (i forgot the word)
That's normal when learning Vulkan
I wanted to notify I reached the "Synchronization and Command Buffers" phase
If you don't hear from me within 30 minutes, that means Khronos has claimed my soul
Alright
I came back
I have this.
...But I understood close to nothing about synchronization
I understood a bit of synchronization
Which I am pretty happy about, I at least have 1 clue of what the thing is doing
That being said I will definitely do atmosphere tomorrow, while I digest the casual 1k lines for a single triangle..
I also realized earlier I am not even using a vertex buffer, that's probably another thousand lines 
Don’t worry, there’s a good portion of that 1k you won’t have to mess with all the time
czech and german number plates
OpenGL: Driver does lots of mostly useful stuff for you, hides some nastyness.
Vulkan: You deal with everything yourself.
I have a question. Do you do the ROC with an indirect draw or not
alright thats exactly what I was going to ask
so I need to add an other indirect buffer
Yeah, and it's just a single indirect command too if you use Jaker's trick
Which is to do a TRIANGLE_STRIP with the number of objects that were visible as instanceCount
Roc?
Raster Occlusion Culling
so the compute frustum culling would modify the instanceCount of this single draw command?
Yes
i smell an other layer of indirection
i was also thinking about that last night
when you let your cs cull all sorts of things, then how to transfer the right gl_DrawID, to fetch the instance data for the object you want to render
unless you also pass that to the cs, and let it also store the instance data that way it does for indirects
but gave up thinking further a minute later
Along the indirect buffer to write, I also pass another buffer yes
Its sole purpose is to store the relationship between the index of the object and the index of the thread in the compute shader that culled the object
That's where the additional layer of indirection comes from
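A CPU-side sketch of what the culling pass produces in this scheme: one indirect command whose instance count is the number of surviving objects, plus a buffer mapping each instance back to its object. On the GPU this would be a compute shader appending with an atomic counter. The struct field order follows OpenGL's `DrawArraysIndirectCommand`; the 14-vertex cube strip and all other names are assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Same field order as OpenGL's DrawArraysIndirectCommand.
struct DrawArraysIndirectCommand {
    std::uint32_t count;
    std::uint32_t instance_count;
    std::uint32_t first;
    std::uint32_t base_instance;
};

struct RocDraw {
    DrawArraysIndirectCommand command;
    // Indexed by gl_InstanceID in the vertex shader, yields the object index.
    std::vector<std::uint32_t> instance_to_object;
};

inline RocDraw build_roc_draw(const std::vector<bool>& passed_frustum_cull) {
    RocDraw draw{};
    draw.command.count = 14; // assumption: a cube drawn as a 14-vertex triangle strip
    for (std::uint32_t object = 0; object < passed_frustum_cull.size(); ++object) {
        if (passed_frustum_cull[object]) {
            // GPU equivalent: slot = atomicAdd(instance_count, 1); map[slot] = object;
            draw.instance_to_object.push_back(object);
        }
    }
    draw.command.instance_count =
        static_cast<std::uint32_t>(draw.instance_to_object.size());
    assert(draw.instance_to_object.size() == draw.command.instance_count);
    return draw;
}
```

The mapping buffer is the extra layer of indirection mentioned above: the bounding-box vertex shader only knows its instance index, so it has to look up which object that instance stands for.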
I never thought that mate :p
i do/did
I can see how the compute shader culling works, but how does roc work?
You rasterize every object's bounding volume and check it against the depth buffer
You also abuse EarlyZ so that you can simply have a fragment shader that does visibility[object] = 1
Because all the samples that don't pass will not enter the fragment shader
Where does the depth buffer come from?
You use this frame's depth buffer to write visibility for the next frame
How does it deal with false negatives?
Generally object "pop-in" is not noticeable unless at low framerates but you can fix that by also drawing all the objects that changed visibility from 0 to 1
You keep two visibility buffers and copy current to last at the end of the frame
Then in the culling compute shader, you just check both last and current
So the pipeline looks like this:
```cpp
frustum_cull();
draw_all_visible();
occlusion_cull();
draw_changed_visibility();
```
At least I think it was this way?
All of these in gpu?
Yes
Wait no it's the first occlusion cull that you don't need
Yeah, we have the visibility from the previous frame
I should give it a try
Do you use bounding spheres and bounding boxes, any other shapes?
I just use regular AABBs, I used bounding spheres for HiZ but I prefer ROC
republic of china
What step is roc? draw*?
occlusion_cull
and draw_changed_visibility would also be something you implement due to roc
Okay now it makes a bit more sense
chicken-and-egg problem hehe
I suck at explaining
egg and chicken problem
naw you did good
or at least no worse than myself when explaining this
even i understood it somehow
- frustum cull - clear
- draw all visible - this is just normal draw, fills depth buffer
- roc - draw bounding volumes, update visibility buffer for next frame
Like that?
ye
Pretty much
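The bookkeeping behind the two visibility buffers can be sketched on the CPU like this. On the GPU both would be buffers written by the bounding-volume fragment shader and read by the culling compute shader. The names are hypothetical, and clearing `current` at the end of the frame is an assumption the conversation does not spell out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct RocVisibility {
    std::vector<std::uint8_t> last;    // what the main draw of this frame used
    std::vector<std::uint8_t> current; // written by the bounding-volume pass
};

// Objects whose visibility went from 0 to 1 this frame. These are the ones
// the extra "draw changed visibility" pass renders to hide pop-in.
inline std::vector<std::uint32_t> newly_visible(const RocVisibility& v) {
    assert(v.last.size() == v.current.size());
    std::vector<std::uint32_t> ids;
    for (std::uint32_t i = 0; i < v.current.size(); ++i) {
        if (v.current[i] == 1 && v.last[i] == 0) {
            ids.push_back(i);
        }
    }
    return ids;
}

// End of frame: current becomes last, and current is reset for the next pass.
inline void end_frame(RocVisibility& v) {
    v.last = v.current;
    std::fill(v.current.begin(), v.current.end(), std::uint8_t{0});
}
```

Checking both buffers in the culling shader is what lets an object that only just became visible still be drawn in the same frame, instead of one frame late.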
Does roc execute fragment shader for every fragment for the bounding shape?
there is also an nv extension that allows just one fragment to be rasterized
https://registry.khronos.org/OpenGL/extensions/NV/NV_representative_fragment_test.txt
Unless you do glEnable(GL_REPRESENTATIVE_FRAGMENT_TEST_NV); ye
That would be one fragment per triangle, I see. Cool stuff.
Is this worth on mobile as well?
roc in general?
no clue here, I don't do mobile
it makes huge savings on desktop, so I'd imagine that some of it would translate to mobile hw
@wicked notch try Nsight Graphics
It's gone 
They removed it in this version
welp
Ah one thing I'm doing is this though
```
foreach cascade {
    dispatch() // does frustum culling, writes indirect commands
    glMultiDrawIndirectCount(); // draws
}
```
Could this cause problems?
nvidia doesn't like "subchannel switches" (changing between compute and graphics work) because it forces a WFI each time
but that will manifest as a tiny gap of no work on the GPU
and you have only like 4 cascades, so it's no biggie
Yeah, but there's a stupid amount of compute warps active
Where do these all come from
That's a good question
have you seen this extension
https://registry.khronos.org/OpenGL/extensions/ARB/ARB_texture_filter_minmax.txt
it seems supported on new hw from both vendors
it's all yours, my friend

@wicked notch have you figured out how to view multi-frame gpu traces in nsight
for some reason it's only showing me the first frame
I presume this means it'll capture 5 consecutive frames anyways
Yeah, as far as I know it just calculates the frame deltas
this grayed out button tells me that something weird is happening
Hmm probably an estimate or weird shit happening
I also have the thing greyed out for some reason
Mayhaps it's due to the debug group markers?
ah
when I change the metric set to throughput metrics, now I can aggregate the frames
much better
epic
I didn't even notice the difference between advanced and throughput
Documentation on nvidia's site is outdated 
yeah
this post shows some "srcunit" metrics that have been renamed since
https://developer.nvidia.com/blog/optimizing-vk-vkr-and-dx12-dxr-applications-using-nsight-graphics-gpu-trace-advanced-mode-metrics/
you can now find them by just searching "L2"
btw you can analyze a range with shift+drag (I accidentally discovered it a few days ago)
Nice
This looks more updated indeed, I hope it's not just for VK and D3D
yeah I'm looking at my gl app with it right now
I am a bit conchfused by C#'s generics system
Given the following:
```csharp
private static void GenericThing<T>(T x)
    where T : IThing
{
    x.DoThing();
}
private static void InterfaceThing(IThing x)
{
    x.DoThing();
}
```
What is the difference between the first and the second thingy?
If the generic type is constrained, then what's the point? 
one is compiletime the other is runtime
Hmm
Looking at the IL code both seem to do a callvirt does that mean I'm doing something wrong?
no
What do you mean by compile time?
i think constructed questions like that dont lead to anywhere
you cant do typeof(x) in interfaceThing, but you can do typeof(T)
Hmm I see
and use it to run code
if typeof(x) == typeof(whatever) for instance
but you could in the generic thing
What I don't understand is why the GenericThing is doing a virtual call
callvirt is used for more or less everything, typically the jit will devirtualize it if it can
Ooh
So let's say I have this:
```csharp
Thing thing = new Thing();
IThing thing2 = new Thing();
GenericThing(thing);
GenericThing(thing2);
```
I assume the first will be devirtualized while the second will not?
never had to dig down that deep
Hehe, I enjoy getting to know the language
id rely on the compiler to lower properly
I want to explore C#'s guts as much as possible
good exercise right there
you can also emit IL with c#, but its been years since i did that last time, it was in the summer of 2005
I have this very nice thing on the right
i think both will probably be devirtualized, since even though thing2 has type IThing the jit can go back and see that it's got to be of type Thing
Hmm I see
So if the compiler can prove that T is going to be Thing then it will devirtualize the call
Epic
i'm not 100% certain on that though, i could be completely wrong there!
That's fine, I'll be reading stuff on C#'s JIT compiler anyways
Guesswork and theorycrafting are appreciated as well
its interesting to see how people approach new languages : )
https://www.youtube.com/watch?v=4yALYEINbyI This is a great talk tbh
Explains a lot about generics, devirtualization and stuff
Unfortunately no one really knows how the JIT compiler actually performs devirtualization, but this issue is pretty good: https://github.com/dotnet/runtime/issues/7541
I've seen from the various threads that the rule of thumb to avoid hindering the JIT compiler's ability to devirtualize is:
- Sealed classes
- Don't assign T to an interface, just use `var`
- Generics and `typeof` are your best friend
i am not sure i understand 2)
With 2 I mean this:
```cs
IThing thing = new Thing();
```
is potentially dangerous
While
```cs
Thing thing = new Thing();
```
is safe
ah, but it shouldnt
Yeah of course, this is a shrimple example
The compiler is big brained enough I'm sure
its a good thing that all the dotnet/and c#isms are also worked upon in public
i need to make more use of spans too - https://learn.microsoft.com/en-us/archive/msdn-magazine/2018/january/csharp-all-about-span-exploring-a-new-net-mainstay
weird how this article is inbetween all this other crap
C# has been pretty fun so far, however Avalonia's AXAML is a bit of a pain to use
Why couldn't it just be HTML smh
i agree avalonias xaml is scuffed
WPF's xaml is >> all
HTML is not an option, its not xhtml
ImGui
dear imgui is the end-all be-all of gui solutions
Alright
Trying to render the City Sample (even the small one) caused my GPU's driver to reset infinite times, I had to hard-reset my PC 
Turns out that if two or more meshes are equal, different vertex buffers with the same contents are still created for them, which is really suboptimal
So I was basically loading the full 116 million triangles for no reason at all, spilling it into system RAM, and even into the page file
are you going to break the mesh into chunks?
physical ones
and stream/load on demand?
is it even "large" (dimension wise) enough? π
I think for now I'll shrimply NOT allocate vertex buffers with the same contents lol
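A rough sketch of the "don't allocate vertex buffers with the same contents" idea, assuming buffers are keyed on their raw bytes. `vertex_buffer_cache_t`, `buffer_handle_t` and `get_or_create` are made-up names for illustration, and the handle is a plain counter standing in for a real GPU buffer:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical handle type; a real renderer would store a GL buffer name or similar.
using buffer_handle_t = std::uint32_t;

class vertex_buffer_cache_t {
public:
    // Returns the handle created earlier for identical bytes, or hands out a new one.
    buffer_handle_t get_or_create(const std::vector<std::byte>& bytes) {
        // The raw bytes are the key, so equal meshes map to the same entry.
        std::string key(reinterpret_cast<const char*>(bytes.data()), bytes.size());
        auto [it, inserted] = _buffers.try_emplace(std::move(key), _next_handle);
        if (inserted) {
            // Assumption: this is where the actual GPU allocation and upload would go.
            ++_next_handle;
        }
        return it->second;
    }

    std::size_t unique_count() const { return _buffers.size(); }

private:
    std::unordered_map<std::string, buffer_handle_t> _buffers;
    buffer_handle_t _next_handle = 1;
};
```

Keying on the full contents keeps a CPU copy of every unique buffer. Keying on a content hash, or on the glTF mesh identity, would likely be cheaper, at the cost of handling collisions.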
Then I might do some software meshlet shenanigans perhaps
Streaming is still a dream I don't think I can achieve right now, but soon™
: ) oki
the stream dream meme
Also, I'll upload the city sample if any of you want to break your engine try it out yourself
be careful with your git repo
License issues?
size and bandwiff
I dunno anything about licensing tbh
Ah, yeah 300MiB is not much but still quite heavy
there is a stupid limit for unpaid git plans
perhaps you can upload it to discord
and link from zer
not sure what the non nitro upload limit is, with nitro its 500mibs iirc
only 100mb for plebs like us
I have a hetzner storage box but I have no idea how to setup permissions 
For a quicc upload
Try it out, let me know how many levels of backrooms you'll fly through
sudo chmod +777 -R /
(dont run that)
Hehe I've used linux for a long time
890mb π
ngl I've actually fallen for that joke way back
hehe
On a VPS though so nothing catastrophic
i got rid of all my servers over the years
SharpGLTF.Validation.SchemaException: Accessor[208] _count: 0 must be greater or equal to 1.Model generated by <Unreal Engine 5.2.0> seems to be malformed; Please, check the file at https://github.khronos.org/glTF-Validator/
lets see if vscode can load it
Hmm, blender could load it so I assumed it had no errors
gltf seems indeed kaputt here and there
according to the red indicators of the linter/validator thing
you need to rewrite UE into the german umlaut u and yell it, puts more emfasis on the sillyness
oo ee
"i blame Ü"
why ü smiling
201 validation messages for /home/deccer/Private/Code/Projects/lessGravity/OpenSpace/src/OpenSpace.Assets/Data/Props/Small_City_LVL/Small_City_LVL.gltf
haha, i didnt see it until you said it
cgltf doesn't seem to complain though I did pass it through gltfpack
So perhaps it did some weird shenanigans
hmm its also just warnings but ye
Just do the big ignore™
```json
{
    "bufferView": 208,
    "count": 0,
    "type": "VEC3",
    "componentType": 5126
},
{
    "bufferView": 209,
    "count": 0,
    "type": "VEC3",
    "componentType": 5126
},
{
    "bufferView": 210,
    "count": 0,
    "type": "VEC4",
    "componentType": 5126
},
```
for those kind of things
I mean, yeah a 0 sized buffer is nonsensical 
or is it?

heh
hmm i probably have to open another iShoe with that lib
since validation is pretty much off already
I'm not touching my home pc until I'm off work
i was about to say that an excuse is about to come .. something about work
you live in ze wong timezone
no u
nuh uh
So, it's exam season but I have not been idling, I have done quite a bit of research on a lot of stuff: How Nanite works, Micropolygon Software Rasterizers, Visbuffers and Vulkan.
- Regarding "software" meshlets (i.e. without mesh shaders), it turns out you lose quite a bit of performance because you lose the vertex cache. I'm not sure if this'll be a problem in the future with hugely detailed meshes, we'll see.
- Micropolygon Software Rasterizers are incredibly cool and they somehow beat a hardware rasterizer. It turns out they also work really well with a 64-bit Visibility Buffer.
- Visibility buffers are another thing that's incredibly cool: if you store the cluster index and the triangle ID within that cluster, along with the depth, you have all the information you need to render a triangle.
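As a sketch of what such a 64-bit visibility buffer texel could look like: depth goes in the high 32 bits and the cluster index plus triangle ID share the low 32 bits. The 25/7 bit split, `visbuffer_payload_t`, `pack_visbuffer` and `unpack_visbuffer` are all assumptions made for illustration, not something stated in the chat:

```cpp
#include <cstdint>
#include <cstring>

// Assumed split: 7 bits of triangle ID (up to 128 triangles per cluster), 25 bits of cluster index.
constexpr std::uint32_t triangle_bits = 7;
constexpr std::uint32_t triangle_mask = (1u << triangle_bits) - 1u;

struct visbuffer_payload_t {
    float depth;
    std::uint32_t cluster_id;
    std::uint32_t triangle_id;
};

// Depth sits in the high 32 bits so that comparing packed values compares depth first.
// For non-negative floats the raw bit pattern orders the same way as the value.
inline std::uint64_t pack_visbuffer(float depth, std::uint32_t cluster_id, std::uint32_t triangle_id) {
    std::uint32_t depth_bits = 0;
    std::memcpy(&depth_bits, &depth, sizeof(depth_bits));
    const std::uint32_t id = (cluster_id << triangle_bits) | (triangle_id & triangle_mask);
    return (static_cast<std::uint64_t>(depth_bits) << 32) | id;
}

inline visbuffer_payload_t unpack_visbuffer(std::uint64_t packed) {
    const std::uint32_t depth_bits = static_cast<std::uint32_t>(packed >> 32);
    const std::uint32_t id = static_cast<std::uint32_t>(packed & 0xFFFFFFFFu);
    visbuffer_payload_t out{};
    std::memcpy(&out.depth, &depth_bits, sizeof(out.depth));
    out.cluster_id = id >> triangle_bits;
    out.triangle_id = id & triangle_mask;
    return out;
}
```

With this layout a later shading pass only needs the packed value to find the cluster, the triangle inside it, and the depth.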
I, however, am getting VERY ahead of myself, given that I don't even render a single triangle with my current Vulkan abstraction 
I'm also quite tired of using raw OpenGL and don't have the life force to abstract it on my own, so I guess Fwog will have another user!
Unfortunately I don't have time to properly sit down and program as I used to, once exam season ends I should start again
Alright, I have one hour before my brain batteries run out, it's time to clone Fwog
Jaker I expect techsupport within 2 nanoseconds of my requests
Thanks
There is an error here https://github.com/JuanDiegoMontoya/Fwog/blob/main/include/Fwog/Rendering.h#L79
This tells me you always use a -1 to 1 depth range
Unforgivable
I uh
unrelated, but I also need to fix a bunch of missing enumerators because they didn't appear in the refpages when I was copying them 
before or after working on them articles?
babogey

I forgot what the missing enumerators were though so I'll forget about it altogether
Regardless of Jaker's questionable time management skills
We have Mr. Triangle for the 28595th time
Time to do hybrid software-hardware rasterization with LOD'ing and basically reimplement nanite
Hold on
OpenGL does support 64 bit atomics right?
I mean 64 bit image atomics, sorry
what's the ext called
ill revoke the first tringle being rendered properly
GL_EXT_extension
GL_EXT_shader_image_int64
close
I don't see it on gpuinfo
maybe it's one of those glsl extensions that "just works" when you add it on supporting drivers
So is this concerning? "0(3) : error C0202: extension EXT_shader_image_int64 not supported\n"
no
How do I feed SPV to your thing
that's the neat part
uh
would you like me to add a way to feed spirv to fwog shaders
Nah, don't bother, it's too painful
Alright change of plans
I will not do the 64 bit visbuffer in OpenGL
I'll shrimply do software meshlets
I mean for buffers
Hm
maybe your epic renderer could be enough pressure to get vendors to add 64-bit image atomics
Yeah, let's not do 64 bit visbuffer in GL 
its been 1.5years or so since devsh showed off his
of a visbuffer
and if you look at #vulkan right now you can tell it took a toll
jebus they talk faster than i can read
I do actually remember being able to enable GL_EXT_shader_image_int64 and declare u64image3D in the shader without compile error on amd
but the new amd drivers are weird. Cant say if it works when you have backing storage and actually read/write it
Epic
Btw, how do mesh shaders handle vertex cache? We do basically both vertex and index pulling here, does it not matter because clusters are small enough to fit in L1 anyways?
my educated guess is that they don't, at least on AMD hardware
since mesh shaders bypass the part of the hardware (the geometry engine) that implements vertex reuse
lemme know if there is anything in fwog that sucks
Ayeaye chief
vertices[v_offset + vertex_indices[i_offset + primitive_indices[p_offset + thread_index]]]
Lovely three way indirection 
How in god's holy name is this struct not trivially copyable
```cpp
struct raw_meshlet_t {
    uint32 vertex_offset = 0;
    uint32 index_offset = 0;
    uint32 index_count = 0;
    uint32 triangle_offset = 0;
    uint32 triangle_count = 0;
    // custom data
    uint32 group_id;
};
```
Is C++ shitting me?
```cpp
TriviallyCopyableByteSpan(std::span<T> t) : std::span<const std::byte>(std::as_bytes(t))
```
??????
Ah TriviallyCopyableSpan accepts a single T too for some reason smh
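For what it's worth, the struct itself does pass the standard trait, which fits the conclusion that the error came from overload selection in the span wrapper rather than from the type. A minimal check, assuming `uint32` is an alias for `std::uint32_t`:

```cpp
#include <cstdint>
#include <type_traits>

using uint32 = std::uint32_t; // assumption: the project's alias for a 32-bit unsigned integer

struct raw_meshlet_t {
    uint32 vertex_offset = 0;
    uint32 index_offset = 0;
    uint32 index_count = 0;
    uint32 triangle_offset = 0;
    uint32 triangle_count = 0;
    // custom data
    uint32 group_id;
};

// Default member initializers make the default constructor non-trivial,
// but copying is still a plain byte copy, so the type stays trivially copyable.
static_assert(std::is_trivially_copyable_v<raw_meshlet_t>);
static_assert(!std::is_trivially_default_constructible_v<raw_meshlet_t>);
static_assert(std::is_standard_layout_v<raw_meshlet_t>);
```

So the default initializers only affect "trivial" in the default-construction sense, not whether the bytes can be copied into a GPU buffer.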
Alright, we got software mesh shaders in Fwog
It's what I call "rendering meshlets without mesh shaders"™ lol
The pipeline is actually very simple: from the meshlet buffer I build indirect commands with a vertex count of triangle_count * 3 and one instance
Then the vertex shader just fetches primitives, indices and vertices from gl_DrawID
```glsl
void main() {
    const meshlet_t meshlet = meshlets[gl_DrawID];
    const mat4 transform = transforms[meshlet.group_id];
    o_meshlet_id = gl_DrawID;
    // three-way indirection: primitive -> meshlet-local index -> vertex
    const uint primitive_index = primitives[meshlet.triangle_offset + gl_VertexID];
    const uint vertex_index = indices[meshlet.index_offset + primitive_index];
    const vertex_format_t vertex = vertices[meshlet.vertex_offset + vertex_index];
    gl_Position = camera.pv * transform * vec4(vertex.position.xyz, 1.0);
}
```
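The CPU side of that pipeline could look roughly like this: one non-indexed indirect draw per meshlet, so that `gl_DrawID` selects the meshlet and `gl_VertexID` runs from 0 to `triangle_count * 3 - 1`. The command layout follows OpenGL's `DrawArraysIndirectCommand`; `meshlet_t` here is a trimmed-down stand-in and `build_meshlet_commands` is a hypothetical helper:

```cpp
#include <cstdint>
#include <vector>

// Trimmed-down stand-in for the meshlet struct used by the shader (assumption).
struct meshlet_t {
    std::uint32_t vertex_offset;
    std::uint32_t index_offset;
    std::uint32_t triangle_offset;
    std::uint32_t triangle_count;
    std::uint32_t group_id;
};

// Same field order as OpenGL's DrawArraysIndirectCommand.
struct draw_arrays_indirect_command_t {
    std::uint32_t count;
    std::uint32_t instance_count;
    std::uint32_t first;
    std::uint32_t base_instance;
};

// One draw per meshlet: three vertices per triangle, a single instance.
inline std::vector<draw_arrays_indirect_command_t>
build_meshlet_commands(const std::vector<meshlet_t>& meshlets) {
    std::vector<draw_arrays_indirect_command_t> commands;
    commands.reserve(meshlets.size());
    for (const meshlet_t& meshlet : meshlets) {
        // first = 0 keeps gl_VertexID relative to the meshlet's own primitive range.
        commands.push_back({meshlet.triangle_count * 3u, 1u, 0u, 0u});
    }
    return commands;
}
```

The resulting array would be uploaded to an indirect buffer and submitted with a single multi-draw indirect call.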
Only problem is uh, L1 hit rates being 20% 
ok so this could easily be done in actual mesh shaders when available on the hardware
the fact that the vertex shader manually fetches the index makes it sounds slow
Yeah, I didn't change anything on building meshlets or the buffers from my previous mesh shaders attempt
Yep
This is one sad frame capture lol
where should I be looking at to see its bad? (i dont use nsight and stuff)
"Unit Throughput" being very smol
Also "SM Occupancy" is all grey
The blue thingy in SM Occupancy is how many vertex warps were in flight
Less vertex warps = less throughput = less ms
Actually more ms but you get the point 
the meshlets are generated by some library?
Yes, I use meshoptimizer
Hmm that's a pain indeed, but you only need these two functions to build meshlets: meshopt_buildMeshletsBound and meshopt_buildMeshlets
So perhaps you could just DllImport your way through these two
yeah If I really wanted, I would do that. Like I am trying to do right now with fsr2 and failing horribly
but first I need to look for a faster mesh shader emulation. I remember there was compute shader emulation: https://tellusim.com/mesh-shader-emulation/
have you looked into that?
great
Much better
But L1 hit rates are still pure garbage
This is shrimply MDI + Index Buffer + Post-T&L cache
Compute will come after I study 
fun learning time has ended, now it is boring learning time
i have cooked up a new horrible way of studying
factorio on one monitor, watching lectures i missed on the other
very effective
I can't focus on more than one thing at a time
if I'm programming with a video playing on the other monitor, I don't retain anything from the video
and vice versa
it depends for me
the lecture is mostly audio, and the material is not extremely complicated
at best I'll tab out of a game while I'm waiting to respawn so I can read a sentence or two of a blog post
Same, I tried watching some doom horror mod showcase while programming, and I don't remember shit 2 hours later
ok it only "works" until you try to use imageStore. Because then you get a runtime exception in glLinkProgram...
i wonder if others can reproduce it. @frank sail next time you are on on your amd card and you have time can you try compiling and linking this please (as fragment shader):
```glsl
#extension GL_EXT_shader_image_int64 : require
#extension GL_ARB_gpu_shader_int64 : require
layout(binding = 0) restrict writeonly uniform u64image3D ImgResult;
void main() {
    imageStore(ImgResult, ivec3(0), uvec4(0));
}
```
you might need the format in the binding thing too
layout(binding = 0, xxx) blabla blabla; where the xxx is
its writeonly so the format is not needed
ah
(but i also tried with)
there's an extension that makes the format unneeded even when it's not writeonly
GL_EXT_shader_image_load_formatted
one really has to read the little txt files of the extensions first to know when to use which, neh?
Epic
what's the deal with these ktx compile options
https://github.com/LVSTRI/Iris/blob/master/CMakeLists.txt#L67
I dunno what's the deal either but I couldn't compile KTX because it uses -Werror and it has warnings so...
I just yoinked warnings out of the equation 
Probably just an issue with my outdated MSVC tools
I'm about to find out if that's the case
yeah there were a bunch of warnings lol
@wicked notch are you shrimply assuming that all textures in your gltfs are compressed
https://github.com/LVSTRI/Iris/blob/848591e2f896d03a17a34a9d3eaaf0b6c77d5cc1/src/model.cpp#L81
I don't see a call to texture_t::create in that file, so I guess so
just learned how basis universal works too and it seems quite nice
imagine being the poor schmuck who has to write the BU->ASTC transcoder 
Yeah, it's a very primitive model loader, the Vulkan one is much better
soon™
btw
how do you obtain the ktx images
do you just use the toktx tool in ktx-software
ah
sweet
doesn't look like blender can output basisu textures or draco meshes 
anyways, I should™ have compressed texture support pushed to fwog tomorrow. It's already in, but I have to test before committing
Lovely, I'll be the first in the world to use Fwog's compressed textures™
Referencing: https://tellusim.com/mesh-shader-emulation/ How can they possibly use only one DrawElements call?
Hmm I suppose they just call it with the max number of indices
I do not understand this quite well...
same way you can use glDrawArrays(... big number);
and get the triangle ID from gl_VertexID / 3 (and the corner from gl_VertexID % 3) to index into some other buffer to draw many triangles/sprites in 2d games
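A tiny sketch of that decomposition, assuming three vertices per triangle in one big non-indexed draw: integer division gives the triangle, the remainder gives the corner within it. `triangle_corner_t` and `decompose_vertex_id` are illustrative names only:

```cpp
#include <cstdint>

struct triangle_corner_t {
    std::uint32_t triangle_id; // which triangle this vertex belongs to
    std::uint32_t corner;      // 0, 1 or 2 within that triangle
};

// Mirrors what a vertex shader would compute from gl_VertexID.
constexpr triangle_corner_t decompose_vertex_id(std::uint32_t vertex_id) {
    return {vertex_id / 3u, vertex_id % 3u};
}
```

For example, vertex 7 is corner 1 of triangle 2, and `triangle_id` is the value used to index the per-triangle or per-sprite buffer.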
Does gl_VertexID reset when gl_InstanceID is incremented?
I crashed the driver so hard I got a BSOD
Is this even possible?
Ah I see
mfw windows shits itself when I add 3059167329581762395871623049847601294875601948657 to my vertex offset
lvstri trying to read vertices from another timeline
I have cooked up a solution
The only problem is that it's a garbage solution
Basically for each vertex ID, store the meshlet it belongs to
And is gl_VertexID just the content of the index buffer?
```
the index of the vertex currently being processed. When using non-indexed rendering, it is the effective index of the current vertex (the number of vertices processed + the first value). For indexed rendering, it is the index used to fetch this vertex from the buffer.
```
What does "index used to fetch this vertex from the buffer" mean?
vertices[base_vertex + gl_VertexID]?
yes but base_vertex is already taken into account
Wait, is it the index in the index buffer or the index in the vertex buffer?
Say my index buffer is [2, 0, 1], I call glDrawElements(3, 1, ...)
gl_VertexID represents the index of the current index being processed?
So like vertices[indices[gl_VertexID]]
for indexed drawing its the value of the index not the index of the index in the element buffer.
So if you do indexed drawing and want to manually fetch the vertex you'd do vertices[gl_VertexID]
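To make that concrete, here is a small CPU-side simulation of the values `gl_VertexID` takes during an indexed draw, assuming OpenGL's behaviour of adding the base vertex to each fetched index. `vertex_ids_for_indexed_draw` is a made-up helper, not a GL call:

```cpp
#include <cstdint>
#include <vector>

// Returns the sequence of gl_VertexID values the vertex shader would observe.
// The ID is the value read from the index buffer (plus base vertex),
// not the position of that value inside the index buffer.
inline std::vector<std::uint32_t>
vertex_ids_for_indexed_draw(const std::vector<std::uint32_t>& indices, std::uint32_t base_vertex) {
    std::vector<std::uint32_t> ids;
    ids.reserve(indices.size());
    for (const std::uint32_t index : indices) {
        ids.push_back(index + base_vertex);
    }
    return ids;
}
```

With the index buffer `[2, 0, 1]` and no base vertex, the shader sees 2, 0, 1, so a manual fetch is just `vertices[gl_VertexID]`.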
Ah I see, there's a big fat dumb monkey in my brain
Thanks for engaging in my meaningless ramble, I understand now
are you on the path of VisBuffer again?
its not an uncommon question
its not meaningless btw, its akschually kwite interesting
I can't really do that in GL unfortunately
Or rather, I can, but I'm not brave enough to hack spirv into OpenGL and pray for the best (and that the driver doesn't notice)
devsh did it
How did she/he/they manage?
its a brainworm
Yeah well, I figured 
ask him, he has a dedicated channel on his discord explaining it in 6, 7 vidjeos
I'll see if I catch him in #vulkan sometime, I see they're quite active in there
Yeah but Vulkan is a beast
I can do meshlet rendering in less than 200 lines in OpenGL 
why not? I am not so familiar about visbuf
Regular visbuffer is perfectly doable with OpenGL, what I'm trying to do requires 64 bit image atomics and I recall that the extension for it was not present/very broken in OpenGL?
Also there's no GL_R64UI image format so... there's that 
why does it have to be 64bit?
Because I can put depth in it as well
ah packing
And if I can put depth in it I can do imageAtomicMax and do hybrid software rasterization in compute
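The reason a single 64-bit atomic max is enough can be sketched on the CPU with `std::atomic`. This is only an analogy for what `imageAtomicMax` would do per texel, assuming depth sits in the high 32 bits and larger depth bits mean "closer" (as with reverse Z). `atomic_max` is a hand-written helper:

```cpp
#include <atomic>
#include <cstdint>

// Stores `value` only if it is greater than what is currently in `target`.
// Because depth occupies the high bits, the winner is the closest fragment,
// and its triangle/cluster ID in the low bits is written in the same operation.
inline void atomic_max(std::atomic<std::uint64_t>& target, std::uint64_t value) {
    std::uint64_t current = target.load(std::memory_order_relaxed);
    while (current < value &&
           !target.compare_exchange_weak(current, value, std::memory_order_relaxed)) {
        // On failure `current` is refreshed with the latest value and the test repeats.
    }
}
```

Two 32-bit channels cannot be updated together like this, which is presumably why a GL_RG32UI target does not help here.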
I am not aware if I could do the same without imageAtomicMax or with a GL_RG32UI target
you could add it into llvmpipe perhaps
Drivers are magic as far as I know, I don't think I should be allowed anywhere near them 
im sure you could figure something out heh
I'd guess Vulkan is far less effort than hacking the driver


discord wont be running away
