Sluggish Rendering More Than 50 Quads | Vulkan API Discord | Page 1

junior karma Apr 10, 2024, 4:38 PM

#

Hey all, I'm fairly new to Vulkan and may have a fundamental misunderstanding. I'm developing on macOS on a super beefy Mac Studio, so my question may be skewed by performance or compatibility issues, since we have to use MoltenVK.

Here's the Rust/ash repo I'm working on: https://github.com/wizebin/ash-base
Here's the file containing the code below: https://github.com/wizebin/ash-base/blob/3fa3747cd9773ff53342035a51fbb88ff6bc42df/src/main.rs
And here's the constant defining how many quads to create and render: https://github.com/wizebin/ash-base/blob/3fa3747cd9773ff53342035a51fbb88ff6bc42df/src/main.rs#L529

I'm following several samples, primarily the main sample from the ash Vulkan crate https://github.com/ash-rs/ash/tree/master/ash-examples/src, and I'm seeing terrible performance.

With just 200 quads, this function (found at https://github.com/wizebin/ash-base/blob/3fa3747cd9773ff53342035a51fbb88ff6bc42df/src/main.rs#L617) takes about 80ms to run, well beyond the 16ms needed for 60fps:

record_submit_commandbuffer(
        // SNIPPED FOR DISCORD MESSAGE LENGTH
        device.cmd_end_render_pass(draw_command_buffer);
    },
);

This ends up making the scene quite jittery. I can render significantly more quads with the same content on the CPU, rendering with SDL, so it must be an issue with the way I'm using Vulkan. Updating the quads and re-sending the data to the GPU is incredibly fast, but the render itself is taking ages.

I attached a flamegraph of the execution, with the function above highlighted in blue.

Can somebody give me a few pointers about possible issues so I can get started with resolving these performance issues?

Screenshot_2024-04-10_at_12.29.12_PM.png

tender thicket Apr 10, 2024, 7:56 PM

#

I can't find anything obviously wrong with the code. So I ran it on my system, and aside from being unstable and needing small fixes, it ran perfectly fine.

I suspect this is some MoltenVK-related issue. Unfortunately I cannot help you there, but hopefully this will at least help you narrow down what the issue might be.

proud plover Apr 10, 2024, 8:40 PM

#

wait... does it take 80ms on the cpu?

junior karma Apr 10, 2024, 8:46 PM

#

It's just 80ms total. I'm not sure if that's CPU or GPU; it's the total delay between the command buffer submission function starting and completing.
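One way to narrow down where those 80ms go is to time the phases separately. This is only a sketch using `std::time::Instant`: `timed`, `FrameTimings`, and `time_frame` are hypothetical names, and the three closures are assumed to wrap whatever the real code does for command recording, queue submission, and waiting on the fence. The wait figure is only an upper bound on GPU time, not a precise measurement.

```rust
use std::time::{Duration, Instant};

/// Runs `f` and returns its result together with the wall-clock time it took.
pub fn timed<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let out = f();
    (out, start.elapsed())
}

/// Wall-clock time spent in each phase of one frame (hypothetical breakdown).
#[derive(Debug, Default, Clone, Copy)]
pub struct FrameTimings {
    /// CPU time spent recording commands.
    pub record: Duration,
    /// CPU time spent inside the submit call itself.
    pub submit: Duration,
    /// Time blocked waiting for the GPU to signal completion.
    pub wait: Duration,
}

/// Times the three phases one after another, in the order they are passed.
pub fn time_frame(
    record: impl FnOnce(),
    submit: impl FnOnce(),
    wait: impl FnOnce(),
) -> FrameTimings {
    let ((), record) = timed(record);
    let ((), submit) = timed(submit);
    let ((), wait) = timed(wait);
    FrameTimings { record, submit, wait }
}
```

If `record` or `submit` dominates, the cost is on the CPU side (possibly inside the translation layer); if `wait` dominates, the GPU or presentation is the more likely suspect.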

proud plover Apr 10, 2024, 8:48 PM

#

you could start by commenting everything in the closure out and seeing what happens

junior karma Apr 10, 2024, 8:50 PM

#

tender thicket I cant find anything obviously wrong with the code. So i ran it on my system and...

Dang, thank you for looking into it. I'm sure I'll grind through the fixes you found eventually as well, but if you happen to have a second to point them out, that would be greatly appreciated!

@proud plover Thanks for the advice, my next few steps are just to binary search performance and see if I can identify the culprit

#

It does run pretty well for me with just 4 images, but when pumping the quad count up to 200 it really starts to suffer

tender thicket Apr 10, 2024, 8:52 PM

#

junior karma Dang, thank you for looking into it, I'm sure I'll grind through the fixes you f...

Well, it was mostly just removing macOS-specific code that wouldn't build. One issue in particular, though, was that you selected the swapchain extent based on the reported limits. The problem is that some window managers (Wayland in particular) allow you to freely choose the size, so they report a min of 0 and a max of u32::MAX. Effectively, your code was trying to create a 4 billion by 4 billion swapchain 😅
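A minimal sketch of the usual way around this, not the repo's actual code: treat a current extent of `u32::MAX` as "the application picks the size" and clamp the window size into the reported range. `Extent2D` here is a stand-in for `vk::Extent2D` so the snippet builds without ash, and `choose_swapchain_extent` is a hypothetical helper name.

```rust
/// Stand-in for `vk::Extent2D`, so the sketch compiles without ash.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Extent2D {
    pub width: u32,
    pub height: u32,
}

/// Hypothetical helper. `current`, `min`, and `max` mirror the
/// `currentExtent`, `minImageExtent`, and `maxImageExtent` fields of
/// `VkSurfaceCapabilitiesKHR`; `window` is the window size in pixels.
pub fn choose_swapchain_extent(
    current: Extent2D,
    min: Extent2D,
    max: Extent2D,
    window: Extent2D,
) -> Extent2D {
    // A current extent of u32::MAX means the surface size is determined by
    // the swapchain, so the application has to pick one itself.
    if current.width != u32::MAX {
        return current;
    }
    // Clamp the window size into the supported range instead of using the
    // limits directly. Note: `clamp` panics if min > max.
    Extent2D {
        width: window.width.clamp(min.width, max.width),
        height: window.height.clamp(min.height, max.height),
    }
}
```

With a 1280x720 window and limits of 0 to `u32::MAX`, this returns 1280x720 rather than the maximum.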

mellow rose Apr 12, 2024, 4:16 PM

#

do you store the vertices in RAM?

#

I had an issue where I created a buffer with VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT and it was basically sending data from RAM every frame 🙈
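For reference, a rough sketch of how a memory type is usually picked by property flags, so a buffer can be placed in DEVICE_LOCAL memory instead of plain HOST_VISIBLE memory. The flag constants use the bit values from the Vulkan spec; `find_memory_type` is a hypothetical helper, and `type_flags` is assumed to hold the `propertyFlags` of each memory type reported by the physical device. On unified-memory hardware a single type may carry both flags, so this may not be the culprit here.

```rust
// Bit values of VkMemoryPropertyFlagBits from the Vulkan spec.
pub const DEVICE_LOCAL: u32 = 0x1;
pub const HOST_VISIBLE: u32 = 0x2;
pub const HOST_COHERENT: u32 = 0x4;

/// Hypothetical helper. `type_flags[i]` is assumed to be the propertyFlags of
/// memory type `i` (from VkPhysicalDeviceMemoryProperties), and `type_bits`
/// the memoryTypeBits mask from VkMemoryRequirements.
/// Returns the first allowed type that has every `required` flag set.
pub fn find_memory_type(type_flags: &[u32], type_bits: u32, required: u32) -> Option<u32> {
    type_flags.iter().enumerate().find_map(|(i, &flags)| {
        // Bit i of the mask says whether the resource may use memory type i.
        let allowed = (type_bits & (1u32 << i)) != 0;
        let has_flags = (flags & required) == required;
        (allowed && has_flags).then_some(i as u32)
    })
}
```

A vertex buffer that rarely changes would typically ask for `DEVICE_LOCAL` and be filled through a `HOST_VISIBLE | HOST_COHERENT` staging buffer; data rewritten every frame is a judgment call between the two.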

junior karma Apr 13, 2024, 1:07 AM

#

Hmm, that certainly may be, I'll look at that during my next debugging attempt. I have ruled out textures but haven't had time to rule out any other system yet.

It did stick out to me that the tutorials seemed to create and free the vertex buffer instead of keeping the buffer around for future updates. On the other hand, I couldn't find a single example that modified vertices on the CPU at all.

I'm so inexperienced, and there are so few large-scale samples in Rust, that I completely lack confidence in any conclusion without thorough testing.

~~Running on windows resulted in absolutely lightspeed performance, so it may simply be the moltenvk layer~~ 🫠

wraith cedar Apr 16, 2024, 10:40 AM

#

junior karma Hmm, that certainly may be, I'll look at that during my next debugging attempt. ...

MoltenVK tends to be sensitive to some things

#

did you come up with a solution?
