#Luna Engine - C++ and Vulkan

2381 messages · Page 3 of 3 (latest)

prisma folio
#

one invocation is one meshlet

#

and the shader doesn't even use local invocation ID

#

it only uses global

#
void main() {
  const uint meshletId = gl_GlobalInvocationID.x + (BatchID * MeshletsPerBatch);
  if (meshletId >= Uniforms.MeshletCount) { return; }

  bool isVisible = CullMeshletFrustum(meshletId);
  if (isVisible) {
    const uint index = atomicAdd(CullTriangleDispatch.Commands[BatchID].x, 1);
    VisibleMeshlets.Indices[index + (BatchID * MeshletsPerBatch)] = meshletId;
  }
}
lone moon
#

it do look reasonable

#

I feel like the early return is useless but it's fine nvm

prisma folio
#

yeah I have no idea why changing the workgroup size fixed it

#

what is the default if you don't give one, anyway? does it default to 1, or some device-specific amount?

cloud osprey
#

1

prisma folio
#

and atomic adds work across invocations / outside of local group?

lone moon
#

yes

prisma folio
#

then I guess it really must be some kind of intel skill issue, idk how else to explain it

prisma folio
#

tfw CullMeshlets is outputting more meshlets than were given to it froghorror

west hamlet
#

its just predicting the future for you already

prisma folio
#

it's got to be something about the atomic add

#

do I need a barrier() somewhere..?

#

if CullMeshlets has a local size of 1, there's no error, but I'm back to missing half of my meshlets

#

if it's anything above 1, the meshlets appear, but I start apparently counting meshlets twice

#

damnit, just needed to adjust the bounds check

prisma folio
#

so, setting the local size of cullmeshlets didn't actually fix the issue, it just made it look better because it was going outside the bounds of its own batch
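
A CPU model of the effect described above, as a rough sketch. It assumes each batch launches a rounded-up number of invocations, so with a local size above 1 some invocations land past the end of their own batch and pick up meshlets that belong to the next one. `RunBatch` and `checkBatchBounds` are hypothetical names, and the extra per-batch check is only a guess at what "adjust the bounds check" meant.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Returns the meshlet IDs one batch would process.
// `invocations` is workgroups * local size, which can exceed meshletsPerBatch.
std::vector<uint32_t> RunBatch(uint32_t batchId, uint32_t meshletsPerBatch,
                               uint32_t meshletCount, uint32_t invocations,
                               bool checkBatchBounds) {
  assert(meshletsPerBatch > 0);
  std::vector<uint32_t> processed;
  for (uint32_t globalId = 0; globalId < invocations; ++globalId) {
    // Assumed fix: stay inside this batch's own range.
    if (checkBatchBounds && globalId >= meshletsPerBatch) { continue; }
    const uint32_t meshletId = globalId + batchId * meshletsPerBatch;
    // Check from the posted shader: stay inside the total meshlet count.
    if (meshletId >= meshletCount) { continue; }
    processed.push_back(meshletId);
  }
  return processed;
}
```

With 10 meshlets per batch and a local size of 8 (16 invocations), batch 0 touches 16 meshlets without the per-batch check and 10 with it.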

#

still have the problem of other batches not appearing, except in renderdoc

prisma folio
#

oh my god

#

it's...the validation layers

#

somehow disabling validation makes it all work

#

this thing right here is the cause of all my headache, what the hell

prisma folio
#

in any case, meshletisms seem to be complete

#

now I guess I need to learn what this "hi-z" nonsense is

west hamlet
#

: ) cool

cloud osprey
prisma folio
#

it's either the layer or the layer confusing Intel

#

that one specific checkbox causes and fixes the issue

#

looking at the layer code I can't see anything immediately wrong

#

also why don't we have a vkCmdDispatchIndirectCount damnit

prisma folio
#

home for the weekend, now I get to try all this fun on my 3080

#

first thing I should probably do is get some timestamp queries

lone moon
#

pixel says nobody wanted it

#

well I do

#

and that's enough for all vendors to bend over and impl it

#

but also potrick and now you want it

#

I demand justice

prisma folio
#

how expensive is it to have a dispatch that does nothing but early return

#

I consolidated the entire cullmeshlets/culltriangles compute into 2 calls, but it means that at least half of the invocations just bail out

#

is it better to have 4 separate but properly-sized indirect calls?

lone moon
#

idk but I'd assume the less dispatches the better

#

you can merge the meshlet culling and primitive into one dispatch too

prisma folio
#

I guess I could, couldn't I? hmm

#

this is why I need timestamps, I need answers only profiling can give

lone moon
#

I can already tell you meshlet gen for bistro (interior and exterior) takes 50us on my 3070

prisma folio
#

gen? as in meshopt_buildMeshlets?

lone moon
#

the compute shader that outputs index buffer

prisma folio
#

ah, do you have a single compute shader?

lone moon
#

in frogfood yes

prisma folio
#

frogfood has 2 tho

lone moon
#

actually no

#

ye in my old thing

prisma folio
#

there's a new thing?

lone moon
#

the new thing is mesh shaders

prisma folio
#

well yeah of course

#

but frogfood has CullMeshlets + CullTriangles, no?

lone moon
#

yeah jaker split it at some point

#

when he wanted primitive culling

#

but you can merge both as well

#

I did so before moving to mesh shaders

prisma folio
#

yeah I should definitely try mesh shaders, but it would mean I'd have to maintain both paths or abandon developing at work XD

lone moon
#

the best solution is mesh shaders ofc but this approach is great too tbh

prisma folio
#
vkAllocateMemory(): pAllocateInfo->allocationSize is 359013904 bytes from heap 2,but size of that heap is only 224395264 bytes.

does VMA just...not look at heap sizes before allocating? I didn't tell it to use any specific memory type, it chose that one

prisma folio
#

single compute shader has been accomplished, with a small feedback buffer for culling stats

prisma folio
#

tried adding cone culling for fun...yep, 3% of the meshlets for the whole scene. def not worth much 😅

#

probably not worth the extra memory bandwidth at all

cloud osprey
#

How are you getting the culling numbers? Readback?

prisma folio
#

ye, small storage buffer

layout(set = 1, binding = 9, scalar) restrict buffer VisBufferStatsBuffer {
  VisBufferStats Stats;
};
prisma folio
#

timestamps are in, maybe implot next, but that's enough for tonight

lone moon
#

zero reason to use cone culling with software meshlets

prisma folio
#

I already have the determinant part from culltriangles yeah

#

I was just curious if it would help to cull them in bulk before doing individuals

#

So if I'm understanding Hi-Z right, you're basically taking mips of the depth buffer where the value is the closest depth value for that section, so you can prevent drawing objects behind closer geometry?

#

How well does that work when the camera moves quickly/on the edges of the screen?

lone moon
#

perfectly fine if you do two pass HZB

#

it artifacts if you do one pass only

prisma folio
#

What's two pass?

lone moon
#

render previous visible objects
build hzb
render disoccluded objects

cloud osprey
#

yet there are three steps, curious

prisma folio
#

How do you determine the disoccluded objects?

cloud osprey
#

they are the ones that were not rendered in step 1, but were not culled

lone moon
#

you test for visibility against hzb and if previous visibility was 0 and current visibility is 1 then it's a disocclusion event
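
The disocclusion rule can be written as a tiny predicate. This is a minimal sketch with hypothetical names (`Pass2Action`, `ClassifyForSecondPass`), assuming one visibility bit is stored per object from the previous frame.

```cpp
#include <cstdint>

enum class Pass2Action : std::uint8_t { Skip, Draw };

// Pass 1 draws everything whose previous-frame visibility was 1.
// The HZB is then built from that depth, and pass 2 re-tests every object:
// only objects that were hidden before and pass the HZB test now are drawn.
constexpr Pass2Action ClassifyForSecondPass(bool wasVisible, bool isVisibleNow) {
  return (!wasVisible && isVisibleNow) ? Pass2Action::Draw : Pass2Action::Skip;
}
```

Objects that were visible last frame are skipped in pass 2 because pass 1 already drew them.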

prisma folio
#

Is it a hard requirement that hzb images are power-of-two?

#

or could I just make a normal mip chain

cloud osprey
#

you'd have to cope with downsampling odd resolutions but it could work

#

it's just easier to reason about when it's PoT tbh

prisma folio
#

hmm, may have to mess with my render graph

#

I can make images that are constant size or a multiplier to framebuffer size, but not "previous power of two of framebuffer size"

#

I guess I can make "Fixed", "Relative", and "PoTRelative"? 😅
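
For reference, "previous power of two" is cheap to compute at graph bake time. A small sketch; `PreviousPowerOfTwo` is a hypothetical helper name, and C++20's `std::bit_floor` does the same job where available.

```cpp
#include <cstdint>

// Largest power of two that is <= value. Returns 0 for an input of 0.
constexpr std::uint32_t PreviousPowerOfTwo(std::uint32_t value) {
  if (value == 0) { return 0; }
  std::uint32_t result = 1;
  // Comparing against value / 2 avoids overflow when doubling.
  while (result <= value / 2) { result *= 2; }
  return result;
}
```

A hypothetical 1920x1080 framebuffer would give a 1024x1024 HZB base level under this rule.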

cloud osprey
#

why not let people get the framebuffer size

prisma folio
#

because technically the framebuffer size doesn't exist until the graph is executed

#

actually wait, hmm

cloud osprey
#

It could reduce API surface area since you could get rid of the other options

#

Or keep them if they simplify things, idc

prisma folio
#

I forgot framebuffer size is baked into the graph, passes CAN retrieve the size of other attachments at bake time, that works

prisma folio
#

I think I need to ditch this render graph for now, it's...putting up a lot of resistance to hzb

prisma folio
#

Does this seem right to you guys? Not sure if my hzb reduce is acting as it should

#

Mip 0:

#

Mip 4:

#

shouldn't it always be getting brighter (higher values)?

lone moon
#

idk

#

are you doing reverse Z

prisma folio
#

yeah

lone moon
#

then yes

prisma folio
#

yes it's wrong?

west hamlet
lone moon
#

yes it should be getting brighter

#

@prisma folio I'm stupid

#

with reverse Z you do min reduction

#

therefore it's not supposed to get brighter, sorry
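
A CPU sketch of one reduction step under reverse Z (near = 1, far = 0). Each output texel keeps the minimum of its 2x2 footprint, which is the farthest depth in that region, so higher mips can only stay the same or get darker. `ReduceMin2x2` is a hypothetical name and even source dimensions are assumed for brevity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Downsamples a row-major depth image by 2 in each dimension using min.
std::vector<float> ReduceMin2x2(const std::vector<float>& src,
                                std::size_t width, std::size_t height) {
  assert(src.size() == width * height);
  assert(width % 2 == 0 && height % 2 == 0);
  const std::size_t outWidth = width / 2;
  const std::size_t outHeight = height / 2;
  std::vector<float> dst(outWidth * outHeight);
  for (std::size_t y = 0; y < outHeight; ++y) {
    for (std::size_t x = 0; x < outWidth; ++x) {
      const std::size_t top = (2 * y) * width + 2 * x;
      const std::size_t bottom = top + width;
      dst[y * outWidth + x] =
          std::min({src[top], src[top + 1], src[bottom], src[bottom + 1]});
    }
  }
  return dst;
}
```

The cull test is then conservative: an object can be rejected only when its nearest depth is still farther than the farthest occluder depth stored for the region it covers.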

prisma folio
#

then I'm confused, what do the mips actually do?

#

if we do min all the time, we're checking what the furthest depth value in that region is, no?

#

actually wait, now that I'm thinking, hmm

cloud osprey
#

you calculate how large the screen space bounding box of the object is, then sample a prefiltered mip based on that

#

otherwise you would have to sample a lot of pixels for large objects in order to determine whether they can be culled

prisma folio
#

well hzb is certainly doing...something

prisma folio
#

I feel like there has to be some kind of Y-flip that I'm not doing, because the way this screenshot looks is as if the ceiling is a mirrored mask for the meshlets on the floor (notice the gap in the ceiling matching below)

west hamlet
#

if you copy from fwog then probably

prisma folio
#

I've tried flipping so many Ys and can't figure out which one it should be

west hamlet
#

hehe

cloud osprey
#

U should write a debug visualization instead of flipping the table Y

west hamlet
#

like a floating big arrow in Y shape

#

which indicates how the Y goes

prisma folio
#

I'm actually trying something now

#

drawing a box that represents the uv bounds of a specific meshlet

#

problem is, it looks perfect

#

which narrows the issue to CullQuadHiZ

prisma folio
#

jesus, I think I've finally got it

#

meshlet cull is already up to 1ms though which is interesting, didn't you say your cull only took 50us LVSTRI?

cloud osprey
#

yeah my cull is almost instant

#

I can do it 10+ times per frame without noticeable perf loss

prisma folio
#

I'mma grab nsight, I don't trust my timestamps yet

#

aaaand it crashes allocating a command buffer, nice

lone moon
#

3070

prisma folio
#

including hzb?

lone moon
#

probably

#

maybe not

prisma folio
#

nsight said 0.41ms for my whole dispatch

lone moon
#

sounds about right

prisma folio
#

does it? 0.41ms isn't exactly a noop

lone moon
#

there may or may not be some more optimizations you can do

#

idk how you implemented the thingy

prisma folio
#

I never realized that the interior is a little, uh..off 😄

#

I assume it's just because the interior and exterior were never meant to be merged

prisma folio
#

hmm, slight issue with hzb; not sure if this is because it's picking the wrong mip or what

#

that vertical plank in front of the bottles gets culled

prisma folio
#

back on the intel igpu, oof

prisma folio
cloud osprey
#

Are you on Intel

prisma folio
#

right now yeah, but even on the 3080 it was a good deal slower than what LVSTRI mentioned

cloud osprey
#

I split the two shaders specifically to make it easier to achieve peak occupancy

#

I mean you could use persistent threads to have constant max occupancy in one dispatch, but that'd be hard

#

And you'd have to use a sketchy ahh spinlock

prisma folio
#

hmm, guess I can try splitting them again as an experiment

lone moon
#

oh yeah

#

that's why you split them

#

I forgor

prisma folio
#

when LVSTRI said 50us I figured splitting them wasn't all that important, but I guess it might be 😄

cloud osprey
#

culling was slow for me until I split them

prisma folio
#

with validation turned off it's currently ~11ms

cloud osprey
#

I should also warn you about copying that primitive culling code, because I don't think it's 100% correct bleakekw

#

I'm mainly skeptical about the small primitive culling function

prisma folio
#

minor improvement KEKW

west hamlet
#

: )

#

just flip dividend and divisor when its > 1

prisma folio
#

now that I think about it, I think this issue is caused by going straight from framebuffer size to previous power of 2

#

some pixels end up getting skipped so the hiz buffer isn't entirely accurate

prisma folio
#

changing it to start at framebufferSize / 2 causes weird artifacts, it doesn't like not being pot

#

dunno how else to fix it other than going to next power of two

#

which would make it a 2048/1024 image

regal elk
#

if you're going from the vkguide version, there's an off by 1 on picking the mip level to sample

#

try adding 1 to the calculated mip level in the cull shader

prisma folio
#

vkguide has hi-z?

#

ooooh, and min reduction samplers, hmmmm

prisma folio
#

+1 seems to have fixed the issues I had, and min reduction means I can have just 1 sample , neat
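
A sketch of the mip selection with the +1 applied, written as plain C++ rather than the actual cull shader. `SelectHzbMip` and its parameters are hypothetical. One plausible reading of why the +1 helps: at `floor(log2(extent))` the bounds can still span almost two texels and straddle three, while one level coarser they span less than one texel and fit inside a 2x2 footprint.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// uvWidth / uvHeight: size of the object's screen-space bounds in [0, 1] UV units.
// hzbWidth / hzbHeight: size of HZB mip 0 in texels.
int SelectHzbMip(float uvWidth, float uvHeight, int hzbWidth, int hzbHeight,
                 int mipCount) {
  assert(mipCount > 0);
  const float texelsWide = uvWidth * static_cast<float>(hzbWidth);
  const float texelsHigh = uvHeight * static_cast<float>(hzbHeight);
  // Clamp to one texel so log2 never goes negative.
  const float extent = std::max({texelsWide, texelsHigh, 1.0f});
  // floor(log2) picks the level where the bounds span 1..2 texels;
  // the +1 goes one level coarser, as suggested above.
  const int level = static_cast<int>(std::floor(std::log2(extent))) + 1;
  return std::min(level, mipCount - 1);
}
```

With a min reduction sampler, a single filtered fetch at that level returns the minimum over the 2x2 footprint, which is where the "just 1 sample" saving comes from.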

regal elk
#

yeah that +1 really threw me for a loop back when I tried it, hence why I remember the solution offhand lol

prisma folio
#

yeah it didn't show up all the time but it came up whenever there was something right up against the camera with a small gap to look through

#

looks pretty good for overdraw

regal elk
#

yeah I remember what made it infuriating was that it almost worked right

#

I ended up drawing debug spheres and overlaying sub-quads of the depth buffer for where the AABBs projected, not even sure how I ultimately fixed it though

prisma folio
#

all of this descriptor chat in #vulkan has me thinking again, about scrapping the entire descriptor set setup from granite and going with a daxa "just bind everything ever" approach

cloud osprey
#

my plan for #1128020727380054046 is to make a single beeg descriptor set with a shrimple allocator

prisma folio
#

iirc that's basically what daxa does, the CPU handle for buffers and images is just an index into the descriptor array, so the CPU and GPU "handles" are the same
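
A minimal sketch of the "handle is just an index" idea. `SlotAllocator` and `kInvalidSlot` are hypothetical names, not daxa's API; the point is only that the value handed out on the CPU is the same value a shader would use to index the descriptor array.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr std::uint32_t kInvalidSlot = 0xFFFFFFFFu;

class SlotAllocator {
 public:
  explicit SlotAllocator(std::uint32_t capacity) : _capacity(capacity) {}

  // Returns a slot index, reusing freed slots first, or kInvalidSlot when full.
  std::uint32_t Allocate() {
    if (!_freeSlots.empty()) {
      const std::uint32_t slot = _freeSlots.back();
      _freeSlots.pop_back();
      return slot;
    }
    return (_nextSlot < _capacity) ? _nextSlot++ : kInvalidSlot;
  }

  // Makes a previously allocated slot available again.
  void Free(std::uint32_t slot) {
    assert(slot < _nextSlot);
    _freeSlots.push_back(slot);
  }

 private:
  std::uint32_t _capacity;
  std::uint32_t _nextSlot = 0;
  std::vector<std::uint32_t> _freeSlots;
};
```

A real implementation would also delay reuse of a freed slot until the GPU has finished with it, which this sketch leaves out.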

cloud osprey
#

I'm still considering how BDA fits in to all this

#

I guess it means I can make an array of buffer pointers and put that (along with the other descriptor stuff) in a header and include that everywhere

#

without infecting every shader source with struct definitions I mean

prisma folio
#

yeah I was thinking either 1 storage buffer with an array of pointers or a push constant with a BDA to said storage buffer

cloud osprey
#

you could have the push constant hold a pointer to a buffer with an array of buffer references

prisma folio
#

exactly

#
layout(scalar, binding = DAXA_BUFFER_DEVICE_ADDRESS_BUFFER_BINDING, set = 0) restrict readonly buffer daxa_BufferDeviceAddressBufferBlock { daxa_u64 addresses[]; }
daxa_buffer_device_address_buffer;

daxa apparently goes with the "one storage buffer" approach

#
#define DAXA_STORAGE_BUFFER_BINDING 0
#define DAXA_STORAGE_IMAGE_BINDING 1
#define DAXA_SAMPLED_IMAGE_BINDING 2
#define DAXA_SAMPLER_BINDING 3
#define DAXA_BUFFER_DEVICE_ADDRESS_BUFFER_BINDING 4
#define DAXA_ACCELERATION_STRUCTURE_BINDING 5

one master descriptor set with 6 bindings

lone moon
#

you need only 3 btw

#

but yes this method gud

#

I can vouch for it

#

since I stole it from potrick KEKW

cloud osprey
#

btw, if you have a fat array of buffer addresses, how do you tell each shader where the buffers it cares about are?

#

do you just send some push constants with indices

#

or perhaps just one buffer pointer with an array of indices

lone moon
#

you send ids through push constants

prisma folio
#

probably a push constant with an index to a "master" buffer that has other indices

cloud osprey
#

ok ye what lvstri showed is what I was wondering

prisma folio
#

ah okay so it's basically descriptor aliasing

#

got it

cloud osprey
#

yeah

#

huge array of addresses, then you tell the shader where X buffer is and it will fetch and reinterpret the pointer

prisma folio
#

yeah, you just have the same huge array of pointers defined N times with N different types

cloud osprey
#

ye

lone moon
#

no actually

#

the buffer with all the addresses is declared once

#

as array of uint64_t

#

you then reinterpret

cloud osprey
lone moon
#

defining your preferred buffer type

#

for the other stuff though yes you alias

cloud osprey
lone moon
#

alias all the images with all the qualifiers etc

#

since this is time consuming you:
A. automate this and generate all the macro permutations
B. define only a subset of aliases and let the user choose further aliasing with restrict/readonly/...

#

I recommend B

prisma folio
#

the buffer with the addresses is defined multiple times, no? each different type has its own layout(set = 0, binding = 0) of the same buffer

lone moon
#

nope, just once

cloud osprey
#

storage_images_restrict_coherent_volatile_readonly_writeonly[i] bleakekw

lone moon
#

you don't need to alias the address buffer

#

since you're gonna be reinterpreting later, with BDA

prisma folio
#

I'm confused because that's exactly what the code you linked is doing though

lone moon
#

oh

#

yeah, I should've used the daxa example

#

Potrick does it better than I do

#

I'm describing what potrick does

#

What I'm doing is I'm aliasing storage buffers directly

#

because I made a grave mistake

#

But you shan't commit the same mistake, do it the good™ way

#

I'll refactor this later bleakekw

cloud osprey
#

where is this defined RetinaGetStorageBufferMember

lone moon
#

Bindings.glsl

prisma folio
#

I'm trying to read daxa code but I can't find any actual shaders, just the includes

lone moon
#

you should probably look at daxa's "daxa.glsl" for the correct way to do this

prisma folio
#

how does one reinterpret a buffer

lone moon
#

and then you do this "Type name = Type(masterAddressBuffer[id])"

#

Like a functional style cast
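
A CPU-side analogy in C++, not GLSL. The address table is declared once as untyped 64-bit values and each lookup chooses the type it wants at the point of use. `Meshlet` and `Reinterpret` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Meshlet {
  std::uint32_t VertexOffset;
  std::uint32_t TriangleCount;
};

// Looks up an untyped address by id and views it as a pointer to T,
// mirroring the "Type(masterAddressBuffer[id])" cast in the shader.
template <typename T>
const T* Reinterpret(const std::vector<std::uint64_t>& addressTable,
                     std::uint32_t id) {
  assert(id < addressTable.size());
  return reinterpret_cast<const T*>(
      static_cast<std::uintptr_t>(addressTable[id]));
}
```

The id is what would travel through push constants, and the table itself never needs a typed declaration.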

prisma folio
#

that simple? huh

#

shame intel doesn't have descriptor buffer

lone moon
#

descriptor buffer is nicer than the horrid descriptorpool/set thingy

#

but thankfully you write this once and forget about it forever

#

so it don't matter much

prisma folio
#

yeah you really only need 1 pool/set for this anyway

lone moon
#

ye

prisma folio
#

also thinking about nuking buffer usage from my api and just setting it all

cloud osprey
#

dew it

prisma folio
#

every day I stray further from granite

prisma folio
#

like in rchit when it's getting textures

lone moon
#

it is needed

#

I am just lazy

prisma folio
#

okay cool just making sure I wasn't missing something

prisma folio
#

hmmm, starting to suspect something's up with the occlusion cull. no meshlets fail the hi-z, even at this angle? 🤔

prisma folio
#

Random thoughts to investigate:

  • Singletons vs static classes vs namespaced loose functions
  • Custom memory allocator? Plus containers that can use it
  • Windowing system that supports 1 main window + any number of child windows (imgui viewports)
  • Separate render thread
  • Object creation: device->CreateTexture() vs Texture::Create()
  • Shared GLSL/C++ headers
west hamlet
#

at first point, might as well add dependency injection too, could come in handy later when object creation device->Foo

cloud osprey
#

Shared headers are trivial btw

#

Just ifdef __cplusplus
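
A sketch of what such a shared header might look like. `CullStats` and its members are hypothetical, and the C++ side assumes the shader uses a scalar-style layout so the two `uint` members are tightly packed.

```cpp
// Only the C++ compiler sees this part; a GLSL preprocessor skips it.
#ifdef __cplusplus
#include <cstdint>
using uint = std::uint32_t;  // lets the GLSL type name compile as C++
#endif

// Seen by both C++ and GLSL.
struct CullStats {
  uint VisibleMeshlets;
  uint VisibleTriangles;
};

#ifdef __cplusplus
// Catches accidental layout drift on the C++ side at compile time.
static_assert(sizeof(CullStats) == 2 * sizeof(std::uint32_t),
              "CullStats must stay tightly packed");
#endif
```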

prisma folio
#

yeah I'm just trying to think of where to put them; I don't want the shaders to be in my cpp folders because where will they go if I build for distro

cloud osprey
#

I just put them next to my shaders nervous

#

that way it's easy for the scuffed shader includers to find them

#

and on the c++ side I just have to add the shader folder to the list of includes

west hamlet
#

thats a build system issue not a file organization issue

prisma folio
#

everyone's doing cool shadow stuff but me, I need to start doing things again

prisma folio
#

usually it's deccer who tries to resuscitate me

west hamlet
prisma folio
#

behold a project, this time using a lot of Retina-isms to make the code cleaner @lone moon

west hamlet
#

SEVEN MONTHS

#

: )

#

wb my frog

prisma folio
#

been at this for the past week or so, bikeshedding what kind of code style I wanted

west hamlet
#

you should have waited 60 more milliseconds

#

for 15:16:17:180

prisma folio
#

you have discord timestamps in milliseconds?

west hamlet
#

no but log entries my frog

prisma folio
#

ah

true willow
#

god damn

#

it's alive

prisma folio
#

time for me to write nothing vulkan-related for the rest of the day

#

I need filesystem access

lone moon
#

oh shit this isn't google

#

how to delete discord message

#

jokes aside I'm stoked that my code served as inspiration for you

#

how I'll sleep tonight knowing that at least one (1) person found it useful:

prisma folio
#

Ye, it's a bit of a different paradigm from what I'm used to (all of the static Create functions were weird to me) but it's kinda grown on me

prisma folio
#
template<typename T>
class MyClass {
  T* Thing = nullptr;
  [[nodiscard]] constexpr auto operator*() const noexcept -> T& { return *Thing; }
};

Question for the C++ spec experts, would this operator automatically switch between returning T& and const T& depending on the const-ness of the class?

#

Or do I need a separate declaration for both?

west hamlet
#

isnt that only with that declthingy

#

decltype(auto)

prisma folio
#

I feel like using decltype(auto) would make it just return a copy though

west hamlet
#

auto may drop cv qualifiers i read somewhere few weeks ago when i was wondering what decltype was

#

decltype(auto) preserves that and reference-ness

prisma folio
#
<source>:16:24: error: static assertion failed
   16 |     static_assert(std::is_same_v<decltype(*c2), const int&>);

rip

#

decltype(auto) doesn't save it either

west hamlet
#

shiat

prisma folio
#

guess I do need separate definitions

prisma folio
#

if I don't have a postfix return type at all it does return T, interesting

true willow
#

returning auto is always returning value

prisma folio
#
LUNA_NODISCARD constexpr auto Get() noexcept -> T*;
LUNA_NODISCARD constexpr auto Get() const noexcept -> const T*;

LUNA_NODISCARD constexpr auto operator*() noexcept -> T&;
LUNA_NODISCARD constexpr auto operator*() const noexcept -> const T&;
LUNA_NODISCARD constexpr auto operator->() noexcept -> T*;
LUNA_NODISCARD constexpr auto operator->() const noexcept -> const T*;

mildly annoying but oh well
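
A compile-time check of the two-overload approach, reduced to `operator*`. `Handle` is a hypothetical stand-in for the real class and plain `[[nodiscard]]` replaces the `LUNA_NODISCARD` macro.

```cpp
#include <type_traits>
#include <utility>

template <typename T>
class Handle {
 public:
  explicit constexpr Handle(T* thing) noexcept : _thing(thing) {}

  // Chosen for non-const handles.
  [[nodiscard]] constexpr auto operator*() noexcept -> T& { return *_thing; }
  // Chosen for const handles; without this overload a const Handle<int>
  // could not be dereferenced, and a single const overload returning T&
  // would hand out a mutable reference.
  [[nodiscard]] constexpr auto operator*() const noexcept -> const T& { return *_thing; }

 private:
  T* _thing = nullptr;
};

static_assert(std::is_same_v<decltype(*std::declval<Handle<int>&>()), int&>);
static_assert(std::is_same_v<decltype(*std::declval<const Handle<int>&>()), const int&>);
```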

true willow
#

you'd do auto& for ref

prisma folio
#

oooh

true willow
#

decltype(auto) forwards whatever is returned by the function

prisma folio
#

and forwarding is always done by ref?

#

or does the builtin dereference operator just return references

true willow
#

not sure

#

probably builtin dereference returns lvalue reference

#

you can get rid of it with std::remove_reference

prisma folio
#
    int* a;
    static_assert(std::is_same_v<decltype(*a), int&>);

looks like dereferencing a pointer just gives a reference, TIL

#

never had to think about these kinds of semantics before

true willow
#

class being const would mean that the member is a const pointer to non const T

#

though if T is const T then it should work I think
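
The three deduction forms side by side, as a small sketch with a hypothetical `Box` type. All members are `const` to show that a const object only makes the pointer const, not the `int` it points to.

```cpp
#include <type_traits>
#include <utility>

struct Box {
  int* Ptr;

  // Plain auto drops the reference: returns a copy (int).
  auto ByAuto() const { return *Ptr; }
  // auto& deduces int and adds the reference back: returns int&.
  auto& ByAutoRef() const { return *Ptr; }
  // decltype(auto) forwards decltype(*Ptr), which is int& because
  // dereferencing a pointer yields an lvalue.
  decltype(auto) ByDecltypeAuto() const { return *Ptr; }
};

static_assert(std::is_same_v<decltype(std::declval<const Box&>().ByAuto()), int>);
static_assert(std::is_same_v<decltype(std::declval<const Box&>().ByAutoRef()), int&>);
static_assert(std::is_same_v<decltype(std::declval<const Box&>().ByDecltypeAuto()), int&>);
```

None of the three produces `const int&` on a const `Box`, which matches the failed `static_assert` earlier in the thread.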

inland sphinx
#

I am ignorant, is const not just a compile time signal to the compiler to error on attempts to write to something?

#

why would it have any impact on pointers and references at execution times if so

true willow
#

because it lets compiler do more optimizations knowing that something doesn't change

inland sphinx
#

ahh that makes sense

true willow
#

it is even UB to cast away constness and modify through something that was created as const

inland sphinx
#

the lesson seems to be to never use const

#

jk

inland sphinx
#

maybe with a unity build it can figure those things out?

true willow
#

shouldn't be any different from link time optimizations

inland sphinx
#

ReSharper C++ I think warned me about const issues; I wasn't working on a template though

true willow
#

and yes compiler can analyze code to decide which transformations to apply but it's less reliable than being explicitly const correct

prisma folio
#

const-correctness is also the way to go because it signals to developers what they should and shouldn't do

true willow
#

yeah should make use of the syntax to clear up semantics wherever possible, they even made a language all about it it's called rust

inland sphinx
#

we don't need to throw out the baby with the bathwater

#

very apt phrase in this context I think

cloud osprey
#

It's only not UB if the original object was not const

cloud osprey
#

Oh I switched your first two words in my mind and thought you were asking a question

prisma folio
cloud osprey
#

Ye

prisma folio
#

that's bizarre to me but alrighty

true willow
#

it was created not const so it makes sense it's not const even if you slap const on it

#

I think that's basically why const cast exists to begin with

cloud osprey
#

I thought it existed because there are old C APIs that have no notion of const correctness

prisma folio
#

I guess that makes sense, I figured const cast was for passing objects to functions that weren't written with const even if they don't modify the object

cloud osprey
#

Yeah

prisma folio
#

I thought const_cast was basically "Take away the const so I can pass it to this function, but I promise I won't actually modify it"

cloud osprey
#

That's how it should be used
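
Both uses from the discussion in one sketch. `LegacyLength`, `Length`, and `ForceSet` are hypothetical; `LegacyLength` stands in for an old C-style function that takes a non-const pointer but never writes through it.

```cpp
#include <cassert>
#include <cstddef>

// Old-style signature: not const-correct, but read-only in practice.
std::size_t LegacyLength(char* text) {
  assert(text != nullptr);
  std::size_t length = 0;
  while (text[length] != '\0') { ++length; }
  return length;
}

// Const-correct wrapper: the cast is safe because nothing is modified.
std::size_t Length(const char* text) {
  return LegacyLength(const_cast<char*>(text));
}

// Writes through a const reference. This is well defined only when the
// object being referred to was not itself created const; calling it on a
// `const int` object would be undefined behavior.
void ForceSet(const int& target, int value) {
  const_cast<int&>(target) = value;
}
```

Calling `ForceSet` on an ordinary `int` variable is fine, since the const was only added by the reference and not by the object's own declaration.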

inland sphinx
#

what are the rules for where "begin" is

cloud osprey
#

Hm?

inland sphinx
#

if I am assigning a literal value for the first time to a const variable it is clear that this value was originally declared as const

#

but

#

sometimes we get data from places other than literal assignment

#

what if I'm reading from io

#

what if the value was derived from a library

cloud osprey
#

it doesn't matter what you initialize the object with

#

what matters is that the object is const

inland sphinx
#

oh I see

#

we are passing by reference in Eearslya's example

#

I get it, it's a copy otherwise

prisma folio
#

Anyone here use the shaderc lib from the SDK? The CMake documentation says that shaderc_combined is a static library, but I'm getting a runtime mismatch error.

lld-link: error: /failifmismatch: mismatch detected for 'RuntimeLibrary':
>>> libcpmtd.lib(xstrxfrm.obj) has value MTd_StaticDebug
>>> shaderc_combinedd.lib(shaderc.obj) has value MDd_DynamicDebug
clang++: error: linker command failed with exit code 1 (use -v to see invocation)