Downsides of Programmable Index Pulling | Graphics Programming | Page 1

shadow prawn Apr 11, 2026, 4:46 PM

#

Currently I use the global shared vertex and index buffer strategy to do multidraw, but this adds a choke point when loading new mesh data because it has to be allocated into these arrays. I would like to move away from that to a pure-bindless approach.

My idea is to have a shared buffer of buffer addresses only (making allocation much simpler, since every entry is the same size) and look up the vertex and index buffers inside of the vertex shader in the multidraw dispatch.

Is this a good idea, and what are the downsides of doing this?
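
A minimal CPU-side sketch of the idea above, assuming plain Python lists stand in for GPU buffers and their device addresses. `MeshBuffers`, `AddressTable`, and `pull_vertex` are hypothetical names used only for illustration; they are not part of Vulkan or any library. It shows why fixed-size entries make allocation a simple free list, and what the vertex shader would have to fetch by hand for each invocation.

```python
from dataclasses import dataclass


@dataclass
class MeshBuffers:
    """Stand-in for one mesh's buffers (device addresses on a real GPU)."""
    vertices: list  # (x, y, z) tuples
    indices: list   # ints indexing into `vertices`


class AddressTable:
    """Table of same-sized entries, so allocation is just a free list."""

    def __init__(self):
        self.slots = []
        self.free = []

    def allocate(self, mesh):
        # Reuse a released slot if one exists, otherwise grow the table.
        if self.free:
            slot = self.free.pop()
            self.slots[slot] = mesh
        else:
            slot = len(self.slots)
            self.slots.append(mesh)
        return slot

    def release(self, slot):
        self.slots[slot] = None
        self.free.append(slot)


def pull_vertex(table, mesh_slot, vertex_id):
    """Per-invocation work of the vertex shader in a non-indexed draw."""
    mesh = table.slots[mesh_slot]
    index = mesh.indices[vertex_id]  # manual index fetch
    return mesh.vertices[index]      # manual vertex fetch


# Usage: one quad made of two triangles.
table = AddressTable()
quad = MeshBuffers(
    vertices=[(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0)],
    indices=[0, 1, 2, 2, 1, 3],
)
slot = table.allocate(quad)
positions = [pull_vertex(table, slot, i) for i in range(len(quad.indices))]
```

Note that `pull_vertex` runs once per index (six times for the quad), even though the quad only has four distinct vertices.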

mellow sable Apr 11, 2026, 5:23 PM

#

index buffers allow hardware to reuse processed vertices, which saves computation and bandwidth, and is sometimes required to get full geometric throughput (like on older AMD cards)
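
To make the reuse point concrete, here is a rough sketch that counts vertex shader invocations for an indexed draw versus a non-indexed draw that pulls indices manually. It assumes a simple FIFO post-transform cache of 16 entries; real hardware uses different and vendor-specific reuse schemes, so treat the numbers as illustrative only. Both function names are hypothetical.

```python
from collections import deque


def count_indexed_runs(indices, cache_size=16):
    """Shader invocations for an indexed draw under a FIFO cache model."""
    cache = deque(maxlen=cache_size)  # oldest entry is evicted when full
    runs = 0
    for index in indices:
        if index not in cache:
            runs += 1            # cache miss: the vertex must be shaded
            cache.append(index)
    return runs


def count_pulled_runs(indices):
    """With manual index pulling every vertex ID is unique to the hardware,
    so this model assumes no reuse: one invocation per index."""
    return len(indices)


# A quad made of two triangles sharing an edge.
quad_indices = [0, 1, 2, 2, 1, 3]
indexed_runs = count_indexed_runs(quad_indices)
pulled_runs = count_pulled_runs(quad_indices)
```

For the quad, the indexed path shades 4 vertices and the pulled path shades 6. On regular, well-ordered meshes the gap can get considerably larger.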

shadow prawn Apr 11, 2026, 5:23 PM

#

suppose I only care about modern hardware

#

does that change the equation

mellow sable Apr 11, 2026, 5:24 PM

#

you still can't benefit from index buffers that way, since you can only bind those at the command buffer level

shadow prawn Apr 11, 2026, 5:24 PM

#

is this somewhere that mesh shaders would help? since you can populate those as part of the dispatch

mellow sable Apr 11, 2026, 5:25 PM

#

well that's a bit inaccurate, you can certainly benefit from the data compression aspect

#

mesh shaders set up their own index buffers in each workgroup/meshlet, yes

#

but unless your data has been processed to fit that, you may be less efficient than the legacy vertex shader + index buffer path
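
As a sketch of what "processed to fit" could mean, here is a naive greedy meshlet builder: it walks the index buffer in order and starts a new meshlet whenever the vertex or triangle budget would be exceeded. The limits of 64 vertices and 126 triangles are an assumption (commonly quoted defaults, not something from this thread), and `build_meshlets` is a hypothetical helper. Production tools also optimize for locality, which this does not.

```python
def build_meshlets(indices, max_vertices=64, max_triangles=126):
    """Greedily split a triangle list into meshlets with local index buffers."""
    if max_vertices < 3 or max_triangles < 1:
        raise ValueError("a meshlet must be able to hold at least one triangle")

    meshlets = []
    vertices, local_indices, remap = [], [], {}

    # Ignore any trailing indices that do not form a full triangle.
    for t in range(0, len(indices) - len(indices) % 3, 3):
        tri = indices[t:t + 3]
        new = [v for v in dict.fromkeys(tri) if v not in remap]

        over_vertices = len(vertices) + len(new) > max_vertices
        over_triangles = len(local_indices) // 3 + 1 > max_triangles
        if local_indices and (over_vertices or over_triangles):
            # Flush the current meshlet and start a fresh one.
            meshlets.append({"vertices": vertices, "local_indices": local_indices})
            vertices, local_indices, remap = [], [], {}
            new = list(dict.fromkeys(tri))

        for v in new:
            remap[v] = len(vertices)  # global vertex index -> meshlet-local index
            vertices.append(v)
        local_indices.extend(remap[v] for v in tri)

    if local_indices:
        meshlets.append({"vertices": vertices, "local_indices": local_indices})
    return meshlets


# Usage: the quad fits in one meshlet with the default limits.
meshlets = build_meshlets([0, 1, 2, 2, 1, 3])
```

Each meshlet carries its own small vertex list and local indices, which is the per-workgroup index buffer described above.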

pine drum Apr 12, 2026, 6:28 PM

#

the only way to do bindless/etc indices is indeed just mesh shaders

#

unless you have some sort of system where you cull and write indices post-cull
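
A rough CPU-side sketch of the cull-then-write-indices idea, assuming a per-triangle visibility test. On a GPU this would more likely be a compute pass appending to an output index buffer that an indirect draw then consumes; `compact_visible_indices` and the visibility predicate here are hypothetical.

```python
def compact_visible_indices(indices, is_triangle_visible):
    """Write out only the indices of triangles that survive culling."""
    out = []
    # Ignore any trailing indices that do not form a full triangle.
    for t in range(0, len(indices) - len(indices) % 3, 3):
        tri = tuple(indices[t:t + 3])
        if is_triangle_visible(tri):
            out.extend(tri)
    return out


# Usage: pretend every triangle touching vertex 3 is culled.
visible = compact_visible_indices(
    [0, 1, 2, 2, 1, 3],
    lambda tri: 3 not in tri,
)
```

The compacted output is an ordinary index buffer, so the draw that consumes it can still use the hardware's indexed path.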

mellow sable Apr 12, 2026, 6:33 PM

#

considering how cheap command recording is in VK and especially that device-generated commands are a thing, the need to do bindless index buffers is not that big the way I see it

#

at some point drawing the world in one draw is more of a flex than anything else

shadow prawn Apr 12, 2026, 6:34 PM

#

I'm finding that command buffer recording in vk is a lot more expensive than I thought it would be, especially in debug

#

I'm only recording a couple hundred commands but that's already taking around 14ms

mellow sable Apr 12, 2026, 6:34 PM

#

you can record hundreds of thousands of draws before hitting bottlenecks

#

that's down to your scene traversal being slow most likely, not an API bottleneck

shadow prawn Apr 12, 2026, 6:35 PM

#

it is not. if I stub out the vulkan commands my encode code drops to just 1 ms.

#

In release it is faster, encode only takes about 3ms total. but the slowness in debug is a problem for me

mellow sable Apr 12, 2026, 6:36 PM

#

debug/release doesn't affect driver code so it's unclear why that'd make a difference

shadow prawn Apr 12, 2026, 6:37 PM

#

just what i've measured 🤷

mellow sable Apr 12, 2026, 6:39 PM

#

what do your commands do? do you upload data, switch pipelines, rebuild new ones? or do you just bind push constant/descriptor data and spam draws

shadow prawn Apr 12, 2026, 6:40 PM

#

I encode one CB and submit it at the start to batch upload data, then I go through encoding all of the draw commands for the various stages (culling, prepass, ray tracing, etc)

#

i don't build pipelines at encode time, those are all built beforehand

pine drum Apr 12, 2026, 7:02 PM

#

shadow prawn I'm only recording a couple hundred commands but that's already taking around 14...

in debug

#

you can easily do a few thousand drawcalls per millisecond

#

it's so fast that your bottleneck will be literally anywhere else
