# Downsides of Programmable Index Pulling

shadow prawn
#

Currently I use the global shared vertex and index buffer strategy to do multidraw, but this adds a choke point when loading new mesh data because it has to be allocated into these arrays. I would like to move away from that to a pure-bindless approach.

My idea is to have a shared buffer of buffer addresses only (making allocation much simpler, since every entry is the same size) and look up the vertex and index buffers inside of the vertex shader in the multidraw dispatch.

Is this a good idea, and what are the downsides of doing this?
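
For readers following along, here is a rough CPU-side sketch of the lookup described above: a table of fixed-size records, each holding the addresses of one mesh's index and vertex data, read per vertex inside the shader. Everything named here (`Vertex`, `MeshRecord`, `pullVertex`) is hypothetical, and plain pointers stand in for the GPU buffer addresses; the thread does not show the actual layout.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical vertex layout; the thread does not specify one.
struct Vertex { float x, y, z; };

// One fixed-size table entry per mesh. On the GPU these would be 64-bit
// buffer addresses; plain pointers stand in for them in this sketch.
struct MeshRecord {
    const std::uint32_t* indices;
    const Vertex* vertices;
    std::uint32_t indexCount;
};

// What the vertex shader would do in a non-indexed draw: treat the incoming
// vertex id as a position in the index stream, then fetch the vertex by hand.
inline Vertex pullVertex(const std::vector<MeshRecord>& table,
                         std::uint32_t drawId, std::uint32_t vertexId) {
    assert(drawId < table.size());
    const MeshRecord& mesh = table[drawId];
    assert(vertexId < mesh.indexCount);
    const std::uint32_t index = mesh.indices[vertexId];  // manual index fetch
    return mesh.vertices[index];                         // manual vertex fetch
}
```

Because every `MeshRecord` is the same size, adding a mesh only means claiming one free slot in the table, which is the allocation simplification the question is after.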

mellow sable
#

index buffers let the hardware reuse already-processed vertices, which saves computation and bandwidth, and is sometimes required to get full geometry throughput (like on older AMD cards)
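
To make the reuse point concrete, the sketch below counts how many vertices would need shading for a given index stream. It assumes a simple FIFO cache of shaded vertices; real hardware uses its own, undocumented reuse schemes, so treat this purely as an illustration. `countShadedVertices` is a hypothetical helper.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Counts vertex shader invocations for an index stream, assuming a FIFO cache
// of already-shaded vertices. cacheSize == 0 models a draw with no reuse at
// all, which is what manual index pulling in a non-indexed draw amounts to.
inline std::size_t countShadedVertices(const std::vector<std::uint32_t>& indices,
                                       std::size_t cacheSize) {
    std::deque<std::uint32_t> cache;
    std::size_t shaded = 0;
    for (std::uint32_t index : indices) {
        if (std::find(cache.begin(), cache.end(), index) != cache.end())
            continue;                      // hit: reuse the earlier result
        ++shaded;                          // miss: the vertex is shaded again
        if (cacheSize == 0) continue;      // nothing to remember without a cache
        if (cache.size() == cacheSize) cache.pop_front();
        cache.push_back(index);
    }
    return shaded;
}
```

For a quad stored as `{0, 1, 2, 2, 1, 3}`, this counts 6 shaded vertices with no cache and 4 with a 16-entry cache.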

shadow prawn
#

suppose I only care about modern hardware

#

does that change the equation

mellow sable
#

you still can't benefit from index buffers with that approach; those can only be bound at the command buffer level

shadow prawn
#

is this somewhere that mesh shaders would help? since you can populate those as part of the dispatch

mellow sable
#

well that's a bit inaccurate, you can certainly benefit from the data compression aspect

#

mesh shaders set up their own index buffers in each workgroup/meshlet, yes

#

but unless your data has been processed to fit that, you may end up less efficient than a legacy vertex shader + index buffer
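
The "processed to fit that" step usually means splitting the mesh into meshlets ahead of time. Below is a minimal greedy sketch of that preprocessing, assuming caller-chosen limits on vertices and triangles per meshlet. `Meshlet` and `buildMeshlets` are hypothetical names, and production builders also optimize for locality and culling bounds, which this does not attempt.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

struct Meshlet {
    std::vector<std::uint32_t> vertices;     // unique global vertex indices
    std::vector<std::uint8_t> localIndices;  // 3 per triangle, into `vertices`
};

// Walks the triangle list in order and starts a new meshlet whenever the next
// triangle would exceed either limit.
inline std::vector<Meshlet> buildMeshlets(const std::vector<std::uint32_t>& indices,
                                          std::size_t maxVertices,
                                          std::size_t maxTriangles) {
    assert(indices.size() % 3 == 0);
    assert(maxVertices >= 3 && maxVertices <= 256 && maxTriangles >= 1);
    std::vector<Meshlet> meshlets;
    Meshlet current;
    std::unordered_map<std::uint32_t, std::uint8_t> remap;  // global -> local

    for (std::size_t t = 0; t < indices.size(); t += 3) {
        // Conservative count of vertices this triangle would add.
        std::size_t newVertices = 0;
        for (std::size_t k = 0; k < 3; ++k)
            if (remap.count(indices[t + k]) == 0) ++newVertices;

        const bool full =
            current.vertices.size() + newVertices > maxVertices ||
            current.localIndices.size() / 3 + 1 > maxTriangles;
        if (full) {
            meshlets.push_back(current);
            current = Meshlet{};
            remap.clear();
        }
        for (std::size_t k = 0; k < 3; ++k) {
            const std::uint32_t global = indices[t + k];
            auto it = remap.find(global);
            if (it == remap.end()) {
                const auto local = static_cast<std::uint8_t>(current.vertices.size());
                it = remap.emplace(global, local).first;
                current.vertices.push_back(global);
            }
            current.localIndices.push_back(it->second);
        }
    }
    if (!current.localIndices.empty()) meshlets.push_back(current);
    return meshlets;
}
```

Each meshlet carries its own small index list, so vertex reuse happens inside the workgroup rather than through a bound index buffer.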

pine drum
#

the only way to do bindless/etc indices is indeed just mesh shaders

#

unless you have some sort of system where you cull and write indices post-cull
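
One way to read the post-cull idea: a pass tests each triangle and writes the indices of the survivors into a fresh, tightly packed index list that a later draw consumes. The serial sketch below assumes a caller-supplied visibility test; on the GPU this would more likely be a compute pass appending through an atomic counter. `compactVisibleTriangles` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

using TriangleTest =
    std::function<bool(std::uint32_t, std::uint32_t, std::uint32_t)>;

// Copies the indices of every triangle that passes `isVisible` into a new
// index list, preserving the original triangle order.
inline std::vector<std::uint32_t> compactVisibleTriangles(
    const std::vector<std::uint32_t>& indices, const TriangleTest& isVisible) {
    assert(indices.size() % 3 == 0);
    std::vector<std::uint32_t> out;
    out.reserve(indices.size());
    for (std::size_t t = 0; t < indices.size(); t += 3) {
        const std::uint32_t a = indices[t];
        const std::uint32_t b = indices[t + 1];
        const std::uint32_t c = indices[t + 2];
        if (!isVisible(a, b, c)) continue;  // culled: write nothing
        out.push_back(a);
        out.push_back(b);
        out.push_back(c);
    }
    return out;
}
```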

mellow sable
#

considering how cheap command recording is in VK and especially that device-generated commands are a thing, the need to do bindless index buffers is not that big the way I see it

#

at some point drawing the world in one draw is more of a flex than anything else

shadow prawn
#

I'm finding that command buffer recording in vk is a lot more expensive than I thought it would be, especially in debug

#

I'm only recording a couple hundred commands but that's already taking around 14ms

mellow sable
#

you can record hundreds of thousands of draws before hitting bottlenecks

#

that's down to your scene traversal being slow most likely, not an API bottleneck

shadow prawn
#

it is not. if I stub out the vulkan commands my encode code drops to just 1 ms.

#

In release it is faster, encode only takes about 3ms total. but the slowness in debug is a problem for me

mellow sable
#

debug/release doesn't affect driver code so it's unclear why that'd make a difference

shadow prawn
#

just what i've measured 🤷

mellow sable
#

what do your commands do? do you upload data, switch pipelines, rebuild new ones? or do you just bind PC/descriptor data and spam draws

shadow prawn
#

I encode one CB and submit it at the start to batch upload data, then I go through encoding all of the draw commands for the various stages (culling, prepass, ray tracing, etc)

#

i don't build pipelines at encode time, those are all built beforehand

pine drum
#

you can easily do a few thousand drawcalls per millisecond

#

it's so fast that your bottleneck will be literally anywhere else