#Sluggish Rendering More Than 50 Quads


junior karma
#

Hey all, I'm fairly new to Vulkan and may have a fundamental misunderstanding. I'm developing on macOS on a super beefy Mac Studio, so the problem may come down to performance or compatibility issues with MoltenVK, which we have to use there.

Here's the rust/ash repo I'm working on https://github.com/wizebin/ash-base
And here's the file with the code below in it https://github.com/wizebin/ash-base/blob/3fa3747cd9773ff53342035a51fbb88ff6bc42df/src/main.rs
Here's the constant defining how many quads to create and render https://github.com/wizebin/ash-base/blob/3fa3747cd9773ff53342035a51fbb88ff6bc42df/src/main.rs#L529

I'm following several samples, primarily the main sample from the Vulkan ash crate https://github.com/ash-rs/ash/tree/master/ash-examples/src, and I'm seeing terrible performance.

With just 200 quads, this function (found at https://github.com/wizebin/ash-base/blob/3fa3747cd9773ff53342035a51fbb88ff6bc42df/src/main.rs#L617) takes about 80ms to run, far beyond the roughly 16ms per frame needed for 60fps.

record_submit_commandbuffer(
        // SNIPPED FOR DISCORD MESSAGE LENGTH
        device.cmd_end_render_pass(draw_command_buffer);
    },
);

This ends up making the scene quite jittery. I am able to render significantly more quads with the same content on the CPU when rendering with SDL, so it must be an issue with the way I'm using Vulkan. The process of updating quads and resending data to the GPU is incredibly fast, but the render itself is taking ages.

I attached a flamegraph of the execution with the function above highlighted in blue

Can somebody give me a few pointers about possible issues so I can get started with resolving these performance issues?

tender thicket
#

I can't find anything obviously wrong with the code. So I ran it on my system, and aside from being unstable and needing small fixes it ran perfectly fine.

I suspect this is some MoltenVK-related issue. Unfortunately I cannot help you there, but hopefully this will at least help you narrow down what the issue might be.

proud plover
#

wait... does it take 80ms on the cpu?

junior karma
#

It's 80ms total. I'm not sure whether that's CPU or GPU time; it's just the wall-clock delay between entering the command buffer submission function and returning from it.
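One way to find out where the 80ms goes is to time recording, submission, and the fence wait separately instead of timing the whole function. This is only a minimal sketch: `time_phase` and `FrameTimings` are hypothetical helpers, not part of ash or ash-base, and it assumes the three phases can each be wrapped in a closure.

```rust
use std::time::{Duration, Instant};

/// Runs `phase` and returns its result together with the wall-clock time it took.
/// Hypothetical helper, not part of ash.
fn time_phase<T>(phase: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let result = phase();
    (result, start.elapsed())
}

/// Hypothetical per-frame breakdown of the submission function.
#[derive(Debug, Default)]
struct FrameTimings {
    /// CPU time spent recording commands into the command buffer.
    record: Duration,
    /// CPU time spent inside the queue submission call.
    submit: Duration,
    /// Time spent blocked on the fence, i.e. waiting for earlier GPU work.
    wait: Duration,
}

impl FrameTimings {
    /// Sum of all measured phases.
    fn total(&self) -> Duration {
        self.record + self.submit + self.wait
    }

    /// Name of the phase that took the longest.
    fn slowest(&self) -> &'static str {
        let phases = [
            ("record", self.record),
            ("submit", self.submit),
            ("wait", self.wait),
        ];
        phases
            .iter()
            .max_by_key(|(_, duration)| *duration)
            .map(|(name, _)| *name)
            .unwrap_or("record")
    }
}
```

In the real code the closures would wrap the command recording, the `queue_submit` call, and the `wait_for_fences` call. If `wait` dominates, the CPU is mostly stalled on the GPU (or the driver layer); if `record` dominates, the cost is on the CPU side.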

proud plover
#

you could start by commenting everything in the closure out and seeing what happens

junior karma
#

It does run pretty well for me with just 4 images, but when pumping the quad count up to 200 it really starts to suffer

tender thicket
# junior karma Dang, thank you for looking into it, I'm sure I'll grind through the fixes you f...

Well, it was mostly just removing macOS-specific code that wouldn't build. One issue in particular though was that you selected the swapchain extent based on reported limits. The problem is that some window managers (Wayland in particular) allow you to freely choose the size, so they report a min of 0 and a max of u32::MAX. Effectively your code was trying to create a 4 billion by 4 billion sized swapchain 😅
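A common way to handle this is to use the surface's current extent when it is defined, and otherwise clamp the window size into the reported range. Rough sketch below; `Extent2D` and `SurfaceCaps` are stand-ins for `vk::Extent2D` and the relevant fields of `vk::SurfaceCapabilitiesKHR` so the snippet compiles on its own, and `choose_extent` is a hypothetical name, not what ash-base actually does.

```rust
/// Stand-in for `vk::Extent2D` so the sketch has no dependencies.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Extent2D {
    width: u32,
    height: u32,
}

/// Stand-in for the extent fields of `vk::SurfaceCapabilitiesKHR`.
struct SurfaceCaps {
    current_extent: Extent2D,
    min_image_extent: Extent2D,
    max_image_extent: Extent2D,
}

/// Picks a swapchain extent (hypothetical helper).
/// A current extent width of `u32::MAX` is the Vulkan convention for
/// "the surface size is determined by the swapchain", so in that case
/// the window size is clamped into the allowed range instead.
fn choose_extent(caps: &SurfaceCaps, window_size: Extent2D) -> Extent2D {
    if caps.current_extent.width != u32::MAX {
        return caps.current_extent;
    }
    Extent2D {
        width: window_size
            .width
            .clamp(caps.min_image_extent.width, caps.max_image_extent.width),
        height: window_size
            .height
            .clamp(caps.min_image_extent.height, caps.max_image_extent.height),
    }
}
```

The important part is that the reported max is only ever used as an upper bound for the window size, never as the extent itself.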

mellow rose
#

do you store the vertices in ram

#

i had an issue where i created a buffer with VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT and it basically was sending data from ram every frame 🙈
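For reference, which memory a buffer ends up in is decided by the memory type index picked at allocation time. Minimal sketch of that lookup, assuming the property flags are passed in as plain `u32` bitmasks instead of ash's `vk::MemoryPropertyFlags` so it runs standalone; `find_memory_type` is a hypothetical helper name.

```rust
// Bit values of the corresponding VkMemoryPropertyFlagBits.
const DEVICE_LOCAL: u32 = 0x1; // VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT
const HOST_VISIBLE: u32 = 0x2; // VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT
const HOST_COHERENT: u32 = 0x4; // VK_MEMORY_PROPERTY_HOST_COHERENT_BIT

/// Returns the index of the first memory type that is allowed by `type_bits`
/// (the `memory_type_bits` field of the buffer's memory requirements) and has
/// every flag in `required`. `memory_types` holds the property flags of each
/// memory type the device reports (Vulkan allows at most 32).
fn find_memory_type(type_bits: u32, required: u32, memory_types: &[u32]) -> Option<u32> {
    memory_types
        .iter()
        .enumerate()
        .find(|&(index, &flags)| {
            let allowed = type_bits & (1u32 << index) != 0;
            allowed && (flags & required) == required
        })
        .map(|(index, _)| index as u32)
}
```

Asking for `DEVICE_LOCAL` for data that rarely changes, and `HOST_VISIBLE | HOST_COHERENT` only for staging or per-frame uploads, is the usual split. Whether that matters on a given machine depends on the memory types the driver exposes.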

junior karma
#

Hmm, that certainly may be, I'll look at that during my next debugging attempt. I have ruled out textures but haven't had time to rule out any other system yet.

It did stick out to me that the tutorials seemed to create and free the vertex buffer instead of keeping the buffer around for future updates. On the other hand, I couldn't find a single example that modified vertices on the CPU at all.
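For what it's worth, keeping one buffer mapped and overwriting its contents each frame could look roughly like the sketch below. It assumes the mapped memory is exposed as a `&mut [u8]` (a `Vec<u8>` can stand in for it when testing); the `Vertex` layout and `write_vertices` are hypothetical and not taken from ash-base.

```rust
/// Hypothetical vertex layout: only `f32` fields, so there is no padding.
#[repr(C)]
#[derive(Clone, Copy)]
struct Vertex {
    pos: [f32; 4],
    uv: [f32; 2],
}

/// Copies `vertices` to the start of `mapped`, which stands in for a
/// persistently mapped, host-visible vertex buffer. Returns the number of
/// bytes written, or an error if the buffer is too small.
fn write_vertices(mapped: &mut [u8], vertices: &[Vertex]) -> Result<usize, String> {
    let byte_len = std::mem::size_of_val(vertices);
    if byte_len > mapped.len() {
        return Err(format!(
            "vertex data needs {} bytes but the buffer holds {}",
            byte_len,
            mapped.len()
        ));
    }
    // SAFETY: `Vertex` is `repr(C)`, `Copy`, and made only of `f32` values,
    // so viewing the slice as initialized bytes is valid.
    let bytes = unsafe { std::slice::from_raw_parts(vertices.as_ptr() as *const u8, byte_len) };
    mapped[..byte_len].copy_from_slice(bytes);
    Ok(byte_len)
}
```

With this approach the buffer is created and mapped once at startup, and each frame only pays for the copy. Without coherent memory the written range would also need to be flushed.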

I'm so inexperienced, and there are so few large-scale samples in Rust, that I completely lack confidence in any conclusion without thorough testing.

Running on Windows resulted in absolutely lightspeed performance, so it may simply be the MoltenVK layer 🫠

wraith cedar
#

did you come up with a solution?