Strange Timing Question | Vulkan API Discord | Page 1

warm tusk Aug 9, 2025, 9:02 PM

#

EDIT: summary of progress here #1403846020504358932 message , so you don't have to read the whole thread

original message:

I've got a vulkan game that I'm profiling using tracy, and am confused about the results. I'm using a pretty bog standard game loop, and the specific bits that I'm profiling seem well within the 16ms budget. what's strange is I'm getting a cadence where my fence wait is very long every other frame, and then both frames on my swapchain flush in quick sequence.

any idea what gives?

(note: the stackframes I've "zonescoped" are sparse, but I promise there's no mysterious CPU load being unaccounted for. the "earth" GPU frames are my most expensive renderpass, but the rest of my renderpasses are also there, just way small by comparison)

warm tusk Aug 9, 2025, 10:49 PM

#

oof. turns out I was using 2 frames in flight when I should have been using 3. whoops!

warm tusk Aug 16, 2025, 1:27 AM

#

update, nope: that didn't account for it. turns out it's a timing thing, and I happened to toggle some stuff that degraded performance which "fixed" it. still no idea what's wrong 🙁

hoary zinc Aug 16, 2025, 1:52 AM

#

Looks CPU bound here. Try adding dedicated tracy zones for: vkAcquireNextImageKHR, vkResetFences, vkWaitForFences, vkQueueSubmit, vkQueuePresentKHR.

warm tusk Aug 16, 2025, 1:55 AM

#

ty for answer, sorry for splitting this thread (I wasn't getting any traction here before 😛 )
but I have a more up to date tracy screenshot here #beginners message (I have tracy zones scoped around the fence waits, and the acquire + waits + resets + submit + present, from CPU side, all happen within a tiny sliver, with the exception of the fence waits)

#

dumping post from #beginners here:

I'm running into absolutely wacky frame timings, and I just cannot figure it out for the life of me. I'm using a bog standard frame loop, doing nothing fancy, I've copy/pasted my loop into ChatGPT and it can't find anything wrong with it, I've pored over it line by line making sure I understand what's happening, and something just still isn't making sense. anyone wanna take a stab at what I could possibly be doing wrong?
(attached: a tracy profile screenshot showing the ridiculousness; gray = waiting on fence)
some details:

VK_PRESENT_MODE_FIFO_KHR
2 frames in flight
3 swapchain images
extremely minimal CPU (~0.5ms/frame) and GPU (~3ms/frame) workload
1 fence per fif (wait at top of frame, trigger w/ submit)
also keep track of frame/swapchain index fence associations and make sure to also wait on that fence
1 semaphore per fif (submit pWaitSemaphores, triggered on vkAcquireNextImageKHR)
1 semaphore per swapchain img (present pWaitSemaphores, triggered on submit pSignalSemaphores)

with such small workloads, what I'd expect to happen is, there'd be constant pressure on the sync objects to immediately snap up rendering w/ every vblank. but I'm getting these crazy waits.

anyone see something obviously wrong in my logic?

hoary zinc Aug 16, 2025, 1:58 AM

#

roger. could u show the code for fencewait and fenceimagewait?

warm tusk Aug 16, 2025, 1:59 AM

#

here's my whole loop code:

{
  vkWaitForFences(gvCore.device, 1, &gvLoop.frameResourceFences[gvLoop.curfif], VK_TRUE, UINT64_MAX);

  VkResult swapchainResult = VK_SUCCESS;
  {
    u32 imageIndex;
    swapchainResult = vkAcquireNextImageKHR(gvCore.device, gvWindow.swapchain, UINT64_MAX, gvLoop.frameImageAvailableSemaphores[gvLoop.curfif], VK_NULL_HANDLE, &imageIndex);

    if(swapchainResult == VK_SUCCESS || swapchainResult == VK_SUBOPTIMAL_KHR)
    {
      if (gvLoop.imageFrameResourceFences[imageIndex] != VK_NULL_HANDLE)
      {
        vkWaitForFences(gvCore.device, 1, &gvLoop.imageFrameResourceFences[imageIndex], VK_TRUE, UINT64_MAX);
      }
      gvLoop.imageFrameResourceFences[imageIndex] = gvLoop.frameResourceFences[gvLoop.curfif];
      vkResetFences(gvCore.device, 1, &gvLoop.frameResourceFences[gvLoop.curfif]);

      uploadEnvData();
      VkCommandBuffer commandBuffer = startRenderCommandBuffer();
      appendPreRenderCommands(commandBuffer);
      appendRenderCommands(commandBuffer, imageIndex, true, true);
      endRenderCommandBuffer(commandBuffer);

      VkSubmitInfo submitInfo={};
      submitInfo.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO;
      submitInfo.waitSemaphoreCount = 1;
      submitInfo.pWaitSemaphores = &gvLoop.frameImageAvailableSemaphores[gvLoop.curfif];
      VkPipelineStageFlags waitStages[] = {VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT};
      submitInfo.pWaitDstStageMask = waitStages;

      submitInfo.commandBufferCount = 1;
      submitInfo.pCommandBuffers = &gvLoop.commandBuffers[gvLoop.curfif];
      submitInfo.signalSemaphoreCount = 1;
      submitInfo.pSignalSemaphores = &gvLoop.imageRenderFinishedSemaphores[imageIndex];

      VkPresentInfoKHR presentInfo={};
      presentInfo.sType = VK_STRUCTURE_TYPE_PRESENT_INFO_KHR;
      presentInfo.waitSemaphoreCount = 1;
      presentInfo.pWaitSemaphores = &gvLoop.imageRenderFinishedSemaphores[imageIndex];

      presentInfo.swapchainCount = 1;
      presentInfo.pSwapchains = &gvWindow.swapchain;
      presentInfo.pImageIndices = &imageIndex;
      presentInfo.pResults = nul;

      CHECK_VKCMD(vkQueueSubmit(gvCore.graphicsQueue, 1, &submitInfo, gvLoop.frameResourceFences[gvLoop.curfif]),"failed to submit draw command buffer");

      swapchainResult = vkQueuePresentKHR(gvCore.presentQueue, &presentInfo);
    }
    else
    {
      vkResetFences(gvCore.device, 1, &gvLoop.frameResourceFences[gvLoop.curfif]);
    }
  }

  // swapchain result is always VK_SUCCESS in my problematic case, so don't worry about it
}
gvLoop.curfif = (gvLoop.curfif+1)%gvConfig.nfif;

#

(I took out the VkZoneScoped calls and other profiling stuff for clarity, but they're scoped around the two vkWaitForFences calls)

hoary zinc Aug 16, 2025, 2:29 AM

#

the issue I see: u use the same index (gvLoop.curfif) for swapchain image and FIF. This is the top mistake in Vulkan.

Solution: for the swapchain image u need to use the dedicated index which vkAcquireNextImageKHR returns.

For FIF index u need to brainlessly increment your own index and reset to 0.

#

@hildarthedorf told the legit strategy how to organize FIF and swapchain using OG binary semaphores.

For each FIF:

semaphore (acquire)
fence (for submit, wait and reset)
command pool + command buffer (for best practices)

For each swapchain image:

semaphore (render end)
framebuffer (connected to swapchain image)

The workflow

vkWaitForFences
vkAcquireNextImageKHR (fif->acquire)
vkResetFences
vkResetCommandPool
Begin command buffer
Issue commands
End command buffer
vkQueueSubmit(wait sem: fif->acquire, signal sem: swapchain->renderEnd)
vkQueuePresentKHR(wait sem: swapchain->renderEnd)

Frame in flight index is incremented and reset to zero. Swapchain image index is returned by vkAcquireNextImageKHR.

FIF and swapchain image count could be anything. For example in my Android app I have two FIF and four swapchain images.
▶️ This could be simplified by using timeline semaphores. Unfortunately I never touched them yet because my target devices are limited by Vulkan 1.1 for mobile tile GPUs. Pay attention that VkFramebuffer and VkRenderPass ~~crap~~ features are not needed on Desktop Vulkan. You should adjust approach using dynamic rendering.
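
The indexing rule in the workflow above can be sketched without any Vulkan calls. This is only an illustration under stated assumptions: plain integers stand in for the real handles, and `FrameSync`, `pickSync`, `nextFif` and `replay` are hypothetical names, not part of anyone's code in this thread. The fence and the acquire semaphore follow the app's own FIF counter, while the render-end semaphore follows whatever image index the acquire returned.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Which sync objects one frame uses. Indices stand in for Vulkan handles.
struct FrameSync {
    uint32_t fenceIndex;              // per-FIF fence (wait, reset, submit)
    uint32_t acquireSemaphoreIndex;   // per-FIF semaphore (acquire -> submit wait)
    uint32_t renderEndSemaphoreIndex; // per-image semaphore (submit signal -> present wait)
};

// fif is the app's own counter; imageIndex is whatever the acquire returned.
inline FrameSync pickSync(uint32_t fif, uint32_t imageIndex) {
    return FrameSync{fif, fif, imageIndex};
}

// The FIF index is simply incremented and wrapped back to zero.
inline uint32_t nextFif(uint32_t fif, uint32_t fifCount) {
    return (fif + 1) % fifCount;
}

// Replays a (hypothetical) sequence of acquired image indices and records
// which sync objects each frame would pick.
inline std::vector<FrameSync> replay(const std::vector<uint32_t>& acquired,
                                     uint32_t fifCount) {
    std::vector<FrameSync> out;
    uint32_t fif = 0;
    for (uint32_t image : acquired) {
        out.push_back(pickSync(fif, image));
        fif = nextFif(fif, fifCount);
    }
    return out;
}
```

With two FIF and an acquire order such as 0, 2, 1, 0, the fence index alternates 0, 1, 0, 1 no matter which image comes back, which is the independence the post describes.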

warm tusk Aug 16, 2025, 2:31 AM

#

reading through the hildar post now, but I think I'm already doing what you are describing? I have independent curfif and imageIndex; I cycle curfif (just after/outside of this loop, sorry I failed to include that), and use the imageIndex returned from the acquire

#

(adding this now to the loop above: gvLoop.curfif = (gvLoop.curfif+1)%gvConfig.nfif;)

hoary zinc Aug 16, 2025, 2:32 AM

#

i'm looking at the code. line 12, line 6 and line 61. Obviously the same index

warm tusk Aug 16, 2025, 2:35 AM

#

there are two vkWaitForFences, though: line 2 and line 12. line 2 waits on the curfif fence, and line 12 waits on the fence that was last associated with that imageIndex (note: line 14 associates the fences in frameResourceFences with the currently used image index, they're not two separate lists of fences)

hoary zinc Aug 16, 2025, 2:36 AM

#

let me compare code flow more carefully...

#

line 18 is strange. what's the purpose here? U only need to wait on one fence per frame.

warm tusk Aug 16, 2025, 2:42 AM

#

sorry, I've been tinkering w/ the posted code to get rid of extraneous info. what are you thinking is line 18?

hoary zinc Aug 16, 2025, 2:43 AM

#

I have no clue what's going on. I'm comparing with my implementation for Windows and Android platform. I have none of this. So maybe u could explain why additional vkWaitForFences is needed?

warm tusk Aug 16, 2025, 2:50 AM

#

by "none of this" you mean just the second vkWaitForFences? or is there other stuff you're not doing?

the situation is, I have a fence for every fif- I don't have any fences other than that. the second list is just an association of existing fences with their corresponding imageIndex.

so what might happen is:

- frame 1: fif:0, imageIndex 0
- frame 2: fif:1, imageIndex 0

and I'd wait on fif:1 fence for frame 2, but I also would wait on fif:0 fence for frame 2, because I'm trying to use an index that was associated with a previous frame.

I'd think this would be rare, but my tracy profiling shows that it hits, so 🤷
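
The association described above can be modelled in a few lines. This is a sketch under assumptions, not the poster's code: `ImageFenceTracker` and `beginFrame` are hypothetical names, and FIF numbers stand in for the fences themselves. It also skips the extra wait when the image was last used by the same FIF, on the assumption that the wait at the top of the frame already covered that fence.

```cpp
#include <cassert>
#include <vector>

// Remembers which FIF last rendered into each swapchain image.
struct ImageFenceTracker {
    std::vector<int> imageToFif; // -1 means no frame has used this image yet

    explicit ImageFenceTracker(int imageCount) : imageToFif(imageCount, -1) {}

    // Records that `fif` now owns `imageIndex`. Returns the FIF whose fence
    // needs an additional wait, or -1 when no extra wait is needed.
    int beginFrame(int fif, int imageIndex) {
        int previous = imageToFif[imageIndex];
        imageToFif[imageIndex] = fif;
        return (previous != -1 && previous != fif) ? previous : -1;
    }
};
```

Feeding it the two frames from the example (fif 0 with image 0, then fif 1 with image 0) gives no extra wait for the first frame and an extra wait on fif 0's fence for the second.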

hoary zinc Aug 16, 2025, 2:54 AM

#

Roger. I suggest to implement the simple case first. Here is my suggestion:

#

📎 message.txt

#

https://github.com/Goshido/android-vulkan/blob/%2392/tools/editor/src/render_session.cpp#L918-L1094

GitHub

android-vulkan/tools/editor/src/render_session.cpp at #92 · Goshid...

This repository is a project for learning Vulkan API, constraint based 3D physics, Lua scripting, spatial sound rendering, HTML+CSS UI rendering. - Goshido/android-vulkan

warm tusk Aug 16, 2025, 2:55 AM

#

I guess there's no need for me to stall CPU side while waiting for a submit involving an in-use swapchain image index, because presumably the semaphore would hold off on letting that submit do anything until the swapchain actually gives up the image anyways. but ftr, if I just comment out that additional wait, I still get the ridiculous timing

#

looking at your code now 👀

hoary zinc Aug 16, 2025, 3:01 AM

#

are u rendering a complex scene or a single triangle? The GPU timeline could be wrong because of a missing tracy zone.

warm tusk Aug 16, 2025, 3:04 AM

#

a complex scene, but not that complex. I'm TracyVkZoneing all the render command appendations (there's one "big" one, called "earth", and a bunch of small ones)

hoary zinc Aug 16, 2025, 3:07 AM

#

do u do any cross command buffer synchronizations? because I see at least two parallel command buffer executions on GPU timeline

warm tusk Aug 16, 2025, 3:08 AM

#

nope, one big command buffer and one big submit

warm tusk Aug 16, 2025, 4:07 AM

#

ok I'm actually still working through hildar's post very slowly and carefully to not miss anything, and I think the only inconsistency with what I have is that hildar has "one framebuffer for each swapchain image", where I have "one framebuffer for each fif X swapchain image"

white echo Aug 16, 2025, 4:11 AM

#

it doesn't impact perf but like ... why?

warm tusk Aug 16, 2025, 4:15 AM

#

I have a set of frame resources per fif, so that's uniform buffers, storage buffers, intermediate images, depth buffers, etc..., and a set of present images per nswapchainimages. so the thought is, I might get assigned to present to one swapchain image while also being assigned a different set of other frame resources?

white echo Aug 16, 2025, 4:16 AM

#

yes but sets and framebuffers are separate objects

#

the only concern here is depth or any other attachment which has no concern on a single queue cuz every frame is sequential and you don't care about the previous frame's data

#

also why you only need one depth image total usually

warm tusk Aug 16, 2025, 4:21 AM

#

ok. I mean that'd be a nice GPU memory optimization to take advantage of in the future, but for now I just want to figure out this stutter lol

white echo Aug 16, 2025, 4:26 AM

#

soo quick qn, why are you waiting on fences after acquiring and why are you doing this

if (gvLoop.imageFrameResourceFences[imageIndex] != VK_NULL_HANDLE)
      {
        vkWaitForFences(gvCore.device, 1, &gvLoop.imageFrameResourceFences[imageIndex], VK_TRUE, UINT64_MAX);
      }
      gvLoop.imageFrameResourceFences[imageIndex] = gvLoop.frameResourceFences[gvLoop.curfif];
      vkResetFences(gvCore.device, 1, &gvLoop.frameResourceFences[gvLoop.curfif]);

#

seems to me that it is likely the source of your problem cuz you might be waiting on a fence that waits for 2 frames

warm tusk Aug 16, 2025, 4:31 AM

#

#1403846020504358932 message
tl;dr: I think you're right that's overkill, but commenting it out doesn't resolve the issue

white echo Aug 16, 2025, 4:32 AM

#

move the wait before acquire and then only use gvLoop.frameResourceFences[gvLoop.curfif]
delete the whole gvLoop.imageFrameResourceFences[imageIndex]

warm tusk Aug 16, 2025, 4:34 AM

#

I already have a wait before the acquire (using frameResourceFences[curfif]); the imageFrameResourceFences[imageIndex] is an additional wait, and removing it doesn't solve the issue

#

#1403846020504358932 message

white echo Aug 16, 2025, 4:34 AM

#

ohhhh missed the first line lmao

#

well I don't see anything else wrong with the code you have shown 🤔

warm tusk Aug 16, 2025, 4:47 AM

#

yeah, damn. wth am I doing wrong... 🤔

warm tusk Aug 16, 2025, 5:10 AM

#

ok so I've wrapped a TracyVkZone around my whole frame's command buffer (which verifies that yes, the "earth" commands are 99% of the time spent), but I've also implemented VK_EXT_calibrated_timestamps, so the timing of the GPU and CPU stuff should be ~ in sync. and when I hover over one of my GPU loads, it highlights (vertical thin pink line on the left side of the screenshot) where on the CPU the command was issued. and so it looks like I'm issuing two frames of commands, then chilling on a fence for a very long time, while presumably a semaphore is preventing the corresponding fif's commands from flushing(?), then the semaphore relents, the commands are flushed, the fence is triggered, and I can again issue two fifs.

so what would be holding on to a semaphore for 31ms, with no visible GPU load? is there a way I can get tracy to see some more internals there?

warm tusk Aug 16, 2025, 5:31 AM

#

equal parts fascinating and frustrating: if I start my loop with a vkDeviceWaitIdle(gvCore.device); right before my vkWaitForFences, I get a rock solid 60fps 🤦

#

(I don't count this as "problem solved", btw. clearly there's something wrong w/ my sync logic that this fixes it...)

hoary zinc Aug 16, 2025, 5:40 AM

#

wait could u try another app? Make sure that every single implicit layer is off. Especially Riva Tuner.

#

Maybe the issue is not with the code but with your environment.

warm tusk Aug 16, 2025, 5:45 AM

#

how can I try another app? I'd have to build it with tracy integration?

hoary zinc Aug 16, 2025, 5:45 AM

#

vk cube

#

i think u can distinguish 30 fps vs 60 fps by eye

warm tusk Aug 16, 2025, 5:57 AM

#

the issue isn't merely that it's 30fps to 60fps, but that the 30fps I experience is super stuttery and terrible. but that should work too- how can I quickly and easily run vk cube?

hoary zinc Aug 16, 2025, 6:00 AM

#

it should be inside VulkanSDK package. Are u on Windows or Linux?

warm tusk Aug 16, 2025, 6:05 AM

#

windows

#

(but also heads up, I really doubt I'm running anything that would universally hamstring vulkan applications. happy to run this quick test to prove it though)

#

ok yep found Vulkan Cube.exe, and it looks like it's running 60fps

hoary zinc Aug 16, 2025, 6:09 AM

#

warm tusk ok yep found `Vulkan Cube.exe`, and it looks like it's running 60fps

roger. let's check which layers are loaded. Is ur app a console app? Or a "windows" app?

warm tusk Aug 16, 2025, 6:10 AM

#

windows

hoary zinc Aug 16, 2025, 6:11 AM

#

bad. we need a console output. vulkan-1.dll could print loading order if u run app with VK_LOADER_DEBUG=all

#

layer stack for VkInstance and VkDevice

#

@white echo do u know how to output this with vkconfig maybe?

warm tusk Aug 16, 2025, 6:16 AM

#

I just ran it through the vulkan configurator and got:


Vulkan Development Status:
- Vulkan Layers Controlled by "Validation" configuration
- Environment variables:
    - VULKAN_SDK: C:\VulkanSDK\1.3.283.0
    - VK_LOCAL: C:\Users\phild\VulkanSDK
- Vulkan Loader version: 1.4.309
- User-Defined Layers locations:
    - VK_LAYER_PATH variable: None
    - Per-configuration paths: None
    - VK_ADD_LAYER_PATH variable: None
- `vk_layer_settings.txt` uses the default platform path:
    C:\Users\phild\AppData\Local\LunarG\vkconfig\override
- Available Layers:
    - VK_LAYER_NV_optimus
    - VK_LAYER_NV_present
    - VK_LAYER_RENDERDOC_Capture
    - VK_LAYER_OBS_HOOK
    - VK_LAYER_VALVE_steam_overlay
    - VK_LAYER_VALVE_steam_fossilize
    - VK_LAYER_LUNARG_api_dump
    - VK_LAYER_LUNARG_gfxreconstruct
    - VK_LAYER_KHRONOS_synchronization2
    - VK_LAYER_KHRONOS_validation
    - VK_LAYER_LUNARG_monitor
    - VK_LAYER_LUNARG_screenshot
    - VK_LAYER_KHRONOS_profiles
    - VK_LAYER_KHRONOS_shader_object
- Physical Devices:
    - Intel(R) Iris(R) Xe Graphics with Vulkan 1.3.293
        - deviceUUID: 8680A0A7040000000002000000000000
        - driverUUID: 33322E302E3130312E36303738000000
    - NVIDIA GeForce RTX 4050 Laptop GPU with Vulkan 1.4.312
        - deviceUUID: 4FCCE0F1BC31734D47A646C685209BCF
        - driverUUID: 2A9A7E7F0F015AF8B3D06EE0131CF715

Launching Vulkan Application:
- Application: nightondeck.exe
- Executable: C:\archive\projects\github\nightondeck\package\release\nightondeck.exe
- Working Directory: C:\archive\projects\github\nightondeck\package\release
- Log file: C:\Users\phild\VulkanSDK\nightondeck.txt

257 countries

  [+0.01s] ini_borders

  [+0.15s] ini_window
Vulkan Instance Extensions Requesting:
  VK_KHR_surface
  VK_KHR_win32_surface

WARNING-CreateInstance-status-message(INFO / SPEC): msgNum: 601872502 - Validation Information: [ WARNING-CreateInstance-status-message ] Object 0: handle = 0x2b239d4c050, type = VK_OBJECT_TYPE_INSTANCE; | MessageID = 0x23dfd876 | vkCreateInstance():  Khronos Validation Layer Active:
    Settings File: Found at C:\Users\phild\AppData\Local\LunarG\vkconfig\override\vk_layer_settings.txt specified by VkConfig application override.
    Current Enables: None.
    Current Disables: None.

    Objects: 1
        [0] 0x2b239d4c050, type: 1, name: NULL

  [+0.06s] ini_graphics
Vulkan Device Extensions Requesting:
  VK_EXT_calibrated_timestamps
  VK_KHR_swapchain

Vulkan API Version: 1.4.312
Vulkan Driver Version: 580.352.0

WARNING-vkGetDeviceProcAddr-device(WARN / SPEC): msgNum: 582089644 - Validation Warning: [ WARNING-vkGetDeviceProcAddr-device ] | MessageID = 0x22b1fbac | vkGetDeviceProcAddr(): pName is trying to grab vkGetPhysicalDeviceCalibrateableTimeDomainsEXT which is an instance level function
    Objects: 0

shared staging used: 184.7MB of 512.0MB (36.1%)

  [+0.92s] ini_graphicscontent
  [+0.00s] ini_audio

There are 0 game controller(s) attached (0 joystick(s))
  [+0.00s] ini_game
[+1.24s] init

writing file pipelines.cache

total time 20423380762 (7.00s)
[+1.24s] init
  [+0.01s] ini_borders
  [+0.15s] ini_window
  [+0.06s] ini_graphics
  [+0.92s] ini_graphicscontent
    [+0.12s] ini_vulkanshared
    [+0.00s] ini_vulkanoffscreens
  [+0.00s] ini_audio
  [+0.00s] ini_game

arena memory used (7063 allocs, 2 arenas) 100% 262144b (0.2MB) of 262144b (0.2MB)
arena memory used (1 allocs) 0% 0b (0.0MB) of 131072b (0.1MB)

Process terminated

hoary zinc Aug 16, 2025, 6:18 AM

#

does not count. it's needed to be run via app code. App is responsible for demanding explicit layers for example

#

could u quickly add console entry point for ur app? I mean main function and switch linker settings to produce console version of your app. U could get HINSTANCE via GetModuleHandleW ( nullptr ); The rest of WinAPI code should not change.

white echo Aug 16, 2025, 6:27 AM

#

you need to compile as a console app basically

#

vkconfig doesn't output VK_LOADER_DEBUG iirc

warm tusk Aug 16, 2025, 6:36 AM

#

ok, I have it building and running as a console app. what do you want me to do with it?

hoary zinc Aug 16, 2025, 6:43 AM

#

open terminal (cmd.exe)
execute set VK_LOADER_DEBUG=all
run your app in that terminal
show the Vulkan loader output for VkInstance layers and VkDevice layers

warm tusk Aug 16, 2025, 6:48 AM

#

📎 message.txt

hoary zinc Aug 16, 2025, 6:50 AM

#

vkCreateDevice layer callstack setup to:
   <Application>
     ||
   <Loader>
     ||
   VK_LAYER_NV_optimus
           Type: Implicit
           Enabled By: Implicit Layer
               Disable Env Var:  DISABLE_LAYER_NV_OPTIMUS_1
           Manifest: C:\WINDOWS\System32\DriverStore\FileRepository\nvdmi.inf_amd64_a2b59b092685856e\nv-vk64.json
           Library:  C:\WINDOWS\System32\DriverStore\FileRepository\nvdmi.inf_amd64_a2b59b092685856e\.\nvoglv64.dll
     ||
   VK_LAYER_NV_present
           Type: Implicit
           Enabled By: Implicit Layer
               Disable Env Var:  DISABLE_LAYER_NV_GR2608_1
           Manifest: C:\WINDOWS\System32\DriverStore\FileRepository\nvdmi.inf_amd64_a2b59b092685856e\nv-vk64.json
           Library:  C:\WINDOWS\System32\DriverStore\FileRepository\nvdmi.inf_amd64_a2b59b092685856e\.\nvoglv64.dll
     ||
   VK_LAYER_OBS_HOOK
           Type: Implicit
           Enabled By: Implicit Layer
               Disable Env Var:  DISABLE_VULKAN_OBS_CAPTURE
           Manifest: C:\ProgramData\obs-studio-hook\obs-vulkan64.json
           Library:  C:\ProgramData\obs-studio-hook\.\graphics-hook64.dll
     ||
   <Device>
       Using "NVIDIA GeForce RTX 4050 Laptop GPU" with driver: "C:\WINDOWS\System32\DriverStore\FileRepository\nvdmi.inf_amd64_a2b59b092685856e\.\nvoglv64.dll"

Ur layers.

#

let's try to disable every single one of them. Do u see Disable Env Var:?

#

so the Idea is to define it as env variable while running your app. U can use same trick as before with set VK_LOADER_DEBUG=all
For example here is my config in visual studio

#

=1 is intentional here.

#

after that validate that nothing except GPU driver is below ur app

warm tusk Aug 16, 2025, 6:53 AM

#

yep, that did not change anything :/

hoary zinc Aug 16, 2025, 6:54 AM

#

alr. What's VVL status? Any sync validation/core validation errors/warnings?

warm tusk Aug 16, 2025, 6:55 AM

#

oh wait nvm one sec

hoary zinc Aug 16, 2025, 6:57 AM

#

for RivaTuner (usually installed with MSI Afterburner) u need to do this

warm tusk Aug 16, 2025, 6:59 AM

#

agh ok sorry. I missed one of the settings, and then briefly thought it was fixed, but it quickly went back to its old stuttery pattern

#

what is rivatuner?

hoary zinc Aug 16, 2025, 7:01 AM

#

if u do not use it - ignore it. It's special because it works even though the Vulkan Loader cannot detect it. usually this stuff is used for this, as in the example for a DX game.

warm tusk Aug 16, 2025, 7:02 AM

#

ah. yeah, I don't use it. but anyways, I've also disabled the layers, and no luck 🙁

hoary zinc Aug 16, 2025, 7:02 AM

#

hoary zinc alr. What's VVL status? Any sync validation/core validation errors/warnings?

?

warm tusk Aug 16, 2025, 7:04 AM

#

no validation warnings (lol I actually fixed the one I had missed when enabling the sync stuff for the vk tracy, that was exposed when running it through the configurator)

hoary zinc Aug 16, 2025, 7:05 AM

#

NVIDIA 580.97 driver?

warm tusk Aug 16, 2025, 7:05 AM

#

yep

#

ah, nope

#

but close

#

I'll load that one up now

#

(I usually keep pretty up to date. I'd be surprised if mine is more than a few months old)

#

updated, problem persists

hoary zinc Aug 16, 2025, 7:20 AM

#

out of ideas 😢

warm tusk Aug 16, 2025, 7:21 AM

#

me too 😮‍💨
thanks for taking the time to troubleshoot with me though

flint moon Aug 16, 2025, 11:49 AM

#

@warm tusk I encountered the same issue some 2-3 years ago, too, and by now I am pretty sure that this is a bug in Tracy. Tracy sometimes calls vkGetCalibratedTimestampsEXT excessively (1000+ calls per frame) because the calibrated timestamp deviation may change at any time and Tracy does not take this into account correctly. See here: https://github.com/wolfpld/tracy/blob/master/public/tracy/TracyVulkan.hpp#L366. If m_deviation is close to 0, it may take a lot of calls to vkGetCalibratedTimestampsEXT to finish the loop, which is obviously slow.

GitHub

tracy/public/tracy/TracyVulkan.hpp at master · wolfpld/tracy

Frame profiler. Contribute to wolfpld/tracy development by creating an account on GitHub.
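
The failure mode described above, a loop that retries until the reported deviation is at or below a stored threshold, can be sketched as follows. This is not Tracy's code: `DeviationSampler` is a hypothetical stand-in for one calibrated-timestamp query, and `queriesUntilAccepted` and its `maxQueries` cap are assumptions added so the sketch always terminates.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical stand-in for one calibrated-timestamp query.
// Returns the maximum deviation reported for that query.
using DeviationSampler = std::function<uint64_t()>;

// Queries repeatedly until a sample's deviation is within `threshold`,
// or until `maxQueries` is reached. Returns the number of queries made.
inline int queriesUntilAccepted(const DeviationSampler& sample,
                                uint64_t threshold,
                                int maxQueries) {
    int queries = 0;
    uint64_t deviation = 0;
    do {
        deviation = sample();
        ++queries;
    } while (deviation > threshold && queries < maxQueries);
    return queries;
}
```

If the threshold was captured at a moment when the deviation happened to be near zero, almost every later sample exceeds it and the loop spins until the cap, which is the "1000+ calls per frame" pattern the post describes.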

warm tusk Aug 16, 2025, 4:55 PM

#

I appreciate the suggestion, but I don't think that's it in my case. I was getting problematic profiles before I enabled the calibrated timestamps ext, and can see the general pattern of every-other spiky frame times from my in-game frame timer (with tracy disabled). :/

warm tusk Aug 16, 2025, 5:34 PM

#

putting together a summary message, so anyone else doesn't need to read through the whole thread 😛

issue: (see tracy screenshot) I'm seeing alternating fence waits of ~30ms and ~1ms, even with extremely minimal CPU & GPU workloads

my current loop (stripped down for simplicity):

v0 loop()
{
  vkWaitForFences(gvCore.device, 1, &gvLoop.frameResourceFences[gvLoop.curfif], VK_TRUE, UINT64_MAX);

  u32 imageIndex;
  VkResult swapchainResult = vkAcquireNextImageKHR(gvCore.device, gvWindow.swapchain, UINT64_MAX, gvLoop.frameImageAvailableSemaphores[gvLoop.curfif], VK_NULL_HANDLE, &imageIndex);
  //in practice, swapchainResult = VK_SUCCESS and problem persists, so don't worry about it

  vkResetFences(gvCore.device, 1, &gvLoop.frameResourceFences[gvLoop.curfif]);

  uploadEnvBuffs();

  VkCommandBuffer commandBuffer = startRenderCommandBuffer();
  appendRenderCommands(commandBuffer);
  endRenderCommandBuffer(commandBuffer);

  VkSubmitInfo submitInfo={};
  submitInfo.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO;
  submitInfo.waitSemaphoreCount = 1;
  submitInfo.pWaitSemaphores = &gvLoop.frameImageAvailableSemaphores[gvLoop.curfif];
  VkPipelineStageFlags waitStages[] = {VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT};
  submitInfo.pWaitDstStageMask = waitStages;

  submitInfo.commandBufferCount = 1;
  submitInfo.pCommandBuffers = &gvLoop.commandBuffers[gvLoop.curfif];
  submitInfo.signalSemaphoreCount = 1;
  submitInfo.pSignalSemaphores = &gvLoop.imageRenderFinishedSemaphores[imageIndex];

  VkPresentInfoKHR presentInfo={};
  presentInfo.sType = VK_STRUCTURE_TYPE_PRESENT_INFO_KHR;
  presentInfo.waitSemaphoreCount = 1;
  presentInfo.pWaitSemaphores = &gvLoop.imageRenderFinishedSemaphores[imageIndex];

  presentInfo.swapchainCount = 1;
  presentInfo.pSwapchains = &gvWindow.swapchain;
  presentInfo.pImageIndices = &imageIndex;
  presentInfo.pResults = nul;

  CHECK_VKCMD(vkQueueSubmit(gvCore.graphicsQueue, 1, &submitInfo, gvLoop.frameResourceFences[gvLoop.curfif]),"failed to submit draw command buffer");

  swapchainResult = vkQueuePresentKHR(gvCore.presentQueue, &presentInfo);
  gvLoop.curfif = (gvLoop.curfif+1)%gvConfig.nfif;
}

loop details:

2 fif (issue also occurs with 3)
3 swapchain images
VK_PRESENT_MODE_FIFO_KHR
< 3ms CPU/frame, < 3ms GPU/frame
no validation warnings
disabled all layers
updated graphics driver
if I throw a vkDeviceWaitIdle right before the wait for fences, I get a rock solid 60fps (but would obviously like to not need to do this)

what might I be doing wrong? I'd be willing to do anything (well, not anything...) to get to the bottom of this- if someone wants to zoom control my screen, or if you'd like compensation, etc...

toxic pewter Aug 16, 2025, 6:53 PM

#

Do you also see it with your GPU manufacturer profiler?

warm tusk Aug 16, 2025, 7:02 PM

#

I haven't tried (I've never used nvidia's profiler before), but I'll give that a go.
though I'm not worried about this being a fault of the profiler: I can feel it in-game, even with any profilers disabled (and can also verify it w/ the basic in-game frametime display)

warm tusk Aug 16, 2025, 8:11 PM

#

ok I've pulled it up in NVIDIA Nsight Systems, and not going to lie, I don't really understand how to interpret the results lol. reading through the "getting started" docs on their website, but is there anything specific you think I should be looking for with that?

#

ah shoot looks like Nsight Systems is different from Nsight Graphics.

warm tusk Aug 16, 2025, 8:43 PM

#

ok welp, I launched Nsight Graphics and it looks like that's a single-frame-capture tool? (similar to renderdoc?) which isn't showing me anything especially useful.

warm tusk Aug 17, 2025, 5:52 PM

#

AHA! an update! I tried just logging my fif and imageIndex per frame (omg I should have done this way earlier...), and I'm getting this, starting from frame 0:

WTF. fif is oscillating correctly, but imageIndex is... why the hell is it giving me repeats and rarely using the third frame?!?!

toxic pewter Aug 17, 2025, 5:52 PM

#

You control frame in flight yourself so it's normal that it's "oscillating"

#

imageIndex comes from vkAcquireNextImageKHR which has no guarantee on the order

warm tusk Aug 17, 2025, 5:56 PM

#

"no guarantee", right- in the same way that an OS "doesn't guarantee" that any time will be spent on any given thread.
so, I have to be able to handle any out of order frames (which I thankfully do- my game doesn't crash or anything). but if it's giving me 001100110011 out of 3 swapchain images, then something is not working as intended
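
A per-frame log of acquired image indices like the one mentioned above is easier to judge once it is summarized. The helper below is a hypothetical sketch (`AcquireStats` and `analyzeAcquireLog` are made-up names, not from the poster's code). It counts how often each image index was handed back and how many frames got the same index as the frame before.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct AcquireStats {
    std::vector<std::size_t> usesPerImage; // how often each image index was returned
    std::size_t immediateRepeats = 0;      // frames whose index equals the previous frame's
};

// Summarizes a log of image indices returned by successive acquires.
inline AcquireStats analyzeAcquireLog(const std::vector<uint32_t>& log,
                                      uint32_t imageCount) {
    AcquireStats stats;
    stats.usesPerImage.assign(imageCount, 0);
    for (std::size_t i = 0; i < log.size(); ++i) {
        if (log[i] < imageCount) {
            ++stats.usesPerImage[log[i]]; // ignore out-of-range entries
        }
        if (i > 0 && log[i] == log[i - 1]) {
            ++stats.immediateRepeats;
        }
    }
    return stats;
}
```

For the 001100110011 pattern with 3 swapchain images, this reports six uses each of images 0 and 1, none of image 2, and six immediate repeats.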

warm tusk Aug 17, 2025, 9:12 PM

#

in case anyone is curious, this might be the culprit: #synchronization message

fallow wind Aug 17, 2025, 9:32 PM

#

warm tusk putting together a summary message, so anyone else doesn't need to read through ...

How do I read that diagram? What are the little things on top and the bars on the bottom, exactly?

warm tusk Aug 17, 2025, 9:35 PM

#

the yellow/green/red bars at the very top are frame times (each bar is a frame). the purple highlight in that zone is the 5-or-so frames I'm zoomed into in the bottom part of the diagram.

#

the bottom part then, is a timeline broken up into two areas: the red lines are on the GPU, and the purple/gray are on the CPU

#

looking at the purple, you'll see one long frame (mostly taken up by fencewait), then one very short frame (so short it's cut off fra|)

fallow wind Aug 17, 2025, 9:38 PM

#

The red bars up top, why are they in two rows?

#

Or, I mean the red bars in the middle, I guess.

warm tusk Aug 17, 2025, 9:38 PM

#

my mouse is hovering over a GPU submit workload, which shows a red sliver ~ 30ms prior highlighting when on the CPU it was submitted

#

dunno, two vulkan "contexts"? (not totally sure what that means under the hood)

#

but it (in this case anyways, not sure if this is true by rule) corresponds to a fif's workload

#

(so one blob is one fif workload, and the blob next to it but on another row corresponds to the next fif's workload)

fallow wind Aug 17, 2025, 9:40 PM

#

So FiF0 is on top and FiF1 is on the bottom? That sort of thing?

warm tusk Aug 17, 2025, 9:40 PM

#

yep

fallow wind Aug 17, 2025, 9:43 PM

#

And the brown fencewait bar is the wait on the FiF's fence up at the top of your rendering loop?

warm tusk Aug 17, 2025, 9:46 PM

#

yes

fallow wind Aug 17, 2025, 9:49 PM

#

Where in that loop do you increment curfif?

warm tusk Aug 17, 2025, 9:50 PM

#

right at the end

#

oh whoops, I cut it off in my copy/paste

#

let me fix that

#

fixed

fallow wind Aug 17, 2025, 9:57 PM

#

So weird. That makes no sense at all.

#

What platform is this on? Like this is where I start seriously wondering about broken drivers and stuff like that...

#

If the swapchain is well behaved then the only reason it should be handing back the same image index twice (in FIFO mode especially) is if you stalled long enough for the desktop compositor to have entirely finished using that image you last presented before you next call acquire.

misty glade Aug 17, 2025, 9:59 PM

#

You mentioned how using vkDeviceWaitIdle() fixes the issue, what about vkQueueWaitIdle()?

warm tusk Aug 17, 2025, 9:59 PM

#

fallow wind So weird. That makes no sense at all.

lol I appreciate the validation 😂

#

but yes, this is on an up to date windows 11 laptop using an up to date nvidia driver on a laptop 4050 GPU

toxic pewter Aug 17, 2025, 10:00 PM

#

Yeah OS + vendor can help

#

Yeah ok it shouldn't happen on this setup

warm tusk Aug 17, 2025, 10:01 PM

#

fallow wind If the swapchain is well behaved then the only reason it should be handing back ...

when I use my cpu-side "chill out until 16ms has passed" code in there, I get swapchain frame 0,0,0,0,0...

fallow wind Aug 17, 2025, 10:01 PM

#

Is your EXE accidentally named some_game_that_had_broken_codde_that_nv_hacked_around.exe or something? Clutching at straws here, I know...

warm tusk Aug 17, 2025, 10:01 PM

#

misty glade You mentioned how using vkDeviceWaitIdle() fixes the issue, what about vkQueueWa...

haven't tried, but I expect it'd work identically (I'm using one queue)

toxic pewter Aug 17, 2025, 10:02 PM

#

Broken implicit layer maybe?

misty glade Aug 17, 2025, 10:02 PM

#

warm tusk haven't tried, but I expect it'd work identically (I'm using one queue)

Wouldn't vkDeviceWaitIdle() also wait on the present queue?

#

Just trying ideas here

warm tusk Aug 17, 2025, 10:02 PM

#

fallow wind Is your EXE accidentally named `some_game_that_had_broken_codde_that_nv_hacked_a...

LOL it's called nightondeck.exe, which I doubt nvidia is doing anything special for

warm tusk Aug 17, 2025, 10:03 PM

#

toxic pewter Broken implicit layer maybe?

already tried disabling all implicit layers (there's chatter about it further up the thread if you want to see details)

warm tusk Aug 17, 2025, 10:03 PM

#

misty glade Wouldn't vkDeviceWaitIdle() also wait on the present queue?

yes, which is why I think vkDeviceWaitIdle essentially == vkQueueWaitIdle in this case

warm tusk Aug 17, 2025, 10:04 PM

#

misty glade Just trying ideas here

yep! appreciate the help brainstorming! 😁

misty glade Aug 17, 2025, 10:05 PM

#

^_^

#

Hmm

#

Maybe try reducing the problem space

#

How about two swapchain images

#

instead of three

#

or did you already try that lol

#

Aren't you supposed to use the current frame index for the proper semaphore for VkPresentInfoKHR, not the imageIndex

fallow wind Aug 17, 2025, 10:07 PM

#

I'm still a bit confused about the fencewait bar. Something is off there. So you're waiting to rerecord the FiF0 data, and that fence correctly blocks you until your last submitted FiF0 clears the GPU. Okay. But you don't start trying to rerecord FiF0 until a while after FiF1 has already finished rendering, which seems like it's too late. Recording the next FiF0 should happen on top of FiF1 being recorded...

misty glade Aug 17, 2025, 10:08 PM

#

misty glade Aren't you supposed to use the current frame index for the proper semaphore for ...

This is what my engine does:

Semaphore[] finishedSemaphores = lastPass.getFinishedSemaphores();

VkPresentInfoKHR presentInfo = VkPresentInfoKHR.calloc(stack);
presentInfo.sType(VK_STRUCTURE_TYPE_PRESENT_INFO_KHR);
presentInfo.pWaitSemaphores(stack.longs(
        ((VulkanSemaphore[]) finishedSemaphores)[frameIndex].getHandle()
));
presentInfo.swapchainCount(1);
presentInfo.pSwapchains(stack.longs(swapchain.getHandle()));
presentInfo.pImageIndices(pImageIndex);

vkQueuePresentKHR(presentQueue, presentInfo);

frameIndex = (frameIndex + 1) % FRAMES_IN_FLIGHT;

warm tusk Aug 17, 2025, 10:10 PM

#

fallow wind I'm still a bit confused about the `fencewait` bar. Something is off there. So y...

the issue is the submit tied to the fif has the present of its used swapchain image as a wait semaphore

#

so we're waiting on the fence which is waiting on the completion of a GPU workload which is waiting to even begin processing until its corresponding acquired swapchain frame is available which is waiting on vsync

#

you can see in the diagram (below the frame timing bars) are two rows of "frame" spans. the top row is the span of the CPU frame (== to the width of the parent-most purple bar), and the bar below it is vsync, locked at 16.6ms.

you'll notice both GPU workloads are "released" to begin work right on a vsync frame boundary, which makes sense

#

I could mark up that diagram with notes on exactly when each fence/semaphore is taken/released, and it will all check out and be correct.

the issue is that I'm for some godforsaken reason being dished 001100110011.

misty glade Aug 17, 2025, 10:18 PM

#

warm tusk putting together a summary message, so anyone else doesn't need to read through ...

This kinda doesn't make sense, why are you using gvLoop.curfif for frameResourceFences and frameImageAvailableSemaphores but imageIndex for imageRenderFinishedSemaphores

warm tusk Aug 17, 2025, 10:18 PM

#

(I wouldn't be surprised if the situation driving this madness is just that I'm working with such tiny GPU workloads, and that's... confusing the driver that's responsible for predictively picking which swapchain image to next assign, which is preferring to try to reuse frames to minimize... GPU memory cache thrashing? I dunno... and it's mistaking the slow frame for thinking something took a long time? 🤷)

#

(^ pulling 100% out of my ass)

fallow wind Aug 17, 2025, 10:20 PM

#

Do you have a way of marking which fencewait belongs to which of the red GPU bars and where each of those GPU workloads was submitted?

warm tusk Aug 17, 2025, 10:21 PM

#

misty glade This kinda doesn't make sense, why are you using `gvLoop.curfif` for `frameResou...

because the presentation doesn't need to care which fif completed, it needs to know which swapchain frame is ready

fallow wind Aug 17, 2025, 10:21 PM

#

Oh, okay, I think I figured it out. The tiny pink sliver to the right of fencewait is recording that frame.

#

And the submit is at the end of that.

#

Got it.

misty glade Aug 17, 2025, 10:24 PM

#

warm tusk because the presentation doesn't need to care which fif completed, it needs to k...

yeah you're right

#

that also wouldn't fix vkAcquireNextImageKHR returning weird indices

#

that's so weird

warm tusk Aug 17, 2025, 10:26 PM

#

misty glade How about two swapchain images

oddly enough, that "fixes" the problem, but does so by giving swapchain image 0,0,0,0,0... 😳

misty glade Aug 17, 2025, 10:28 PM

#

warm tusk oddly enough, that "fixes" the problem, but does so by giving swapchain image `0...

Your vkAcquireNextImageKHR needs help... >_<

warm tusk Aug 17, 2025, 10:29 PM

#

warm tusk oddly enough, that "fixes" the problem, but does so by giving swapchain image `0...

eh nvm, spoke too soon. left it running long enough and it went from 0,0,0,0 to 0,1,0,1,0,1 and back to 0,0,1,1,0,0,1,1, 😭

misty glade Aug 17, 2025, 10:30 PM

#

Maybe try it on a different machine

#

Or use your iGPU as the selected device

warm tusk Aug 17, 2025, 10:31 PM

#

ah iGPU is a good call 👀

misty glade Aug 17, 2025, 10:31 PM

#

My nvidia gpu handled some glsl stuff weirdly compared to my igpu so it might help

warm tusk Aug 17, 2025, 10:32 PM

#

iGPU works perfect. 0,1,2,0,1,2,0,1,2

misty glade Aug 17, 2025, 10:33 PM

#

How long did you leave it running?

warm tusk Aug 17, 2025, 10:35 PM

#

this whole time since you suggested it

#

rock solid

#

ridiculous 🤦

#

man and app bootup time is like 5x faster w/ igpu too 😭

#

I could have just been using that this whole time

#

(don't get me wrong, I want to fix the sync issues in the general case, not just for my machine)

#

so if there is still an issue in my setup, I do want to work to understand and fix it. but is it really as simple as "4050 drivers expect massive workloads and their swapchain distribution predictive logic gets confused when that's not the case"?

misty glade Aug 17, 2025, 10:41 PM

#

warm tusk so if there _is_ still an issue in my setup, I do want to work to understand and...

I think the more GPUs/machines you try on, the more it will become clear whether there's something wrong with your code or the 4050 Vulkan driver

warm tusk Aug 17, 2025, 10:42 PM

#

would you feel comfortable running an .exe from me? 😛

misty glade Aug 17, 2025, 10:42 PM

#

warm tusk would you feel comfortable running an .exe from me? 😛

I need to setup a VM and everything...>_<

#

no, sorry lol

warm tusk Aug 17, 2025, 10:43 PM

#

hah fair

misty glade Aug 17, 2025, 10:43 PM

#

I don't have an nvidia gpu with me rn anyway

#

Mine's an Intel Iris Xe

warm tusk Aug 17, 2025, 10:44 PM

#

I mean the test case is something other than my graphics card. no need to be nvidia.

misty glade Aug 17, 2025, 10:44 PM

#

yeah

#

does your app build for linux?

warm tusk Aug 17, 2025, 10:47 PM

#

it does, need to quick run it through the build machine first

#

one sec

fallow wind Aug 17, 2025, 10:47 PM

#

I think you've just got stupid drivers that're going into some weird compat mode or something like that.

#

Like your code that you've posted looks right.

misty glade Aug 17, 2025, 10:52 PM

#

you could run your app under linux with the official nvidia drivers

if it works, then it's a windows 4050 driver issue, and if it doesn't, that means your vulkan api usage is still broken and maybe the igpu was just not caring

warm tusk Aug 17, 2025, 10:55 PM

#

I don't have linux installed on this machine :/

misty glade Aug 17, 2025, 10:58 PM

#

warm tusk I don't have linux installed on this machine :/

live usb perhaps?

#

I think you'll run out of space pretty quickly though

warm tusk Aug 19, 2025, 2:13 AM

#

still haven't got around to testing it on linux, for now I'm just assuming it's an odd behavior of my graphics driver (if anyone wants to volunteer to test it on their system, lmk!)

one thing I realized, though, is that, because it is still hitting every present with a new frame, the only real "issue" is an input/physics sampling one.

that is, when I pass actual delta times into my game loop:

~2ms passes
get input
advance physics ~2ms
frame drawn from this pov
~30ms passes
get input
advance physics ~30ms
frame drawn from this pov
repeat

but the frames are presented precisely 16ms apart, which looks super juttery because it's alternating showing 2ms of advanced physics every 16ms, and 30ms of advanced physics every 16ms.

if instead, I just hardcode 16ms into advancing the world, regardless of the fact that actually only either 2ms has passed or 30ms has passed, then it actually looks totally smooth. the only real "issue" is the really inconsistent input sampling, up to +30ms more latency than I'd get otherwise (which isn't good, but in my particular case, is "acceptable").
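A minimal sketch of the difference described above, in C++. The function names (`worldAdvancePerFrame`, `jitterMs`, `totalMs`) are made up for illustration and are not from the actual game loop; the 2ms/30ms deltas are the approximate values quoted above, and 16ms stands in for the vsync period.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// How far the world advances (ms) before each presented frame.
// fixedStepMs > 0: ignore the measured delta and always advance by fixedStepMs.
// fixedStepMs <= 0: advance by the measured CPU frame delta.
std::vector<double> worldAdvancePerFrame(const std::vector<double>& measuredDeltasMs,
                                         double fixedStepMs) {
    std::vector<double> advance;
    advance.reserve(measuredDeltasMs.size());
    for (double dt : measuredDeltasMs) {
        advance.push_back(fixedStepMs > 0.0 ? fixedStepMs : dt);
    }
    return advance;
}

// Spread between the largest and smallest per-frame advance (0 means perfectly even).
double jitterMs(const std::vector<double>& advance) {
    assert(!advance.empty());
    auto mm = std::minmax_element(advance.begin(), advance.end());
    return *mm.second - *mm.first;
}

// Total simulated time across all frames.
double totalMs(const std::vector<double>& advance) {
    return std::accumulate(advance.begin(), advance.end(), 0.0);
}
```

With measured deltas of 2, 30, 2, 30 the spread between frames is 28ms, while the fixed 16ms step has no spread at all. Both advance the world by 64ms over four frames, so the fixed step only stays in sync with wall-clock time as long as presents really do land about 16ms apart.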

#

(all that said: if someone here knows somebody at nvidia, and thinks they'd be interested in correcting a vulkan acquire image degeneracy on laptop-4050's, I'd be happy to cooperate with them to fix it!)

storm crypt Aug 23, 2025, 11:25 AM

#

I have the exact same issue and did some digging.
It seems like just a rather common encounter / misconception, found like two related posts on the first page in the Vulkan subreddit. (FIFO giving random images / Vsync causes extreme stuttering)

Don't think it's a driver bug or anything, the AcquireNextImage thing is mostly a red herring. The real issue is related to Frames In Flight, explained in detail by Erfan Ahmadi here.

If I understand correctly, it can happen when the host is submitting frames way faster than the device can present. Sometimes multiple frames are executed in quick succession (the short delay) before any present can happen, so the following frames will need to wait (the long delay) until the presentation engine is ready, thus causing severe visual and input lag. It's fine in Mailbox or other modes 'cause any extra frames are consumed immediately instead of being queued as in FIFO.

Removing FiF or adding waits around submits can solve this but then it loses the advantage of using FiF in the first place.
The above blog post and nvpro_core2 apparently fix it using timeline semaphores or a similar technique.

Total beginner btw, I have no idea about most of it but hope it helps. 🙂

white echo Aug 23, 2025, 12:03 PM

#

any way of syncing with the cpu with timeline semaphores you can do with fences

#

the signaling order in the nvpro reddit thread you linked is also the standard FIF fence signaling order

white echo Aug 23, 2025, 12:20 PM

#

also your first link only relates to latency of input, the issue we are talking about here is there is a 2 vsync gpu pause every FIF number of frames, as in,
draw frame 0 -> draw frame 1 -> gpu goes to sleep for 2 vsync amount of time -> draw frame 2 -> draw frame 3

storm crypt Aug 23, 2025, 6:18 PM

#

I guess the main point is some (me) would expect the process to get blocked on the fence or acquireImage when the queue is full, but it's actually the GPU waiting (the 2 vsyncs) on the sema until the image is usable.

#

Yeah, the timeline sema is probably just different way to do the same thing. I just thought maybe they somehow prevent the host from going too far ahead.

warm tusk Aug 24, 2025, 4:06 AM

#

if vkAcquireNextImageKHR would just hand out images approximately cyclically (012012012 rather than 0011001100) for me, then I wouldn't have any issue. in that world, even if I'm able to push through both frames in flight essentially instantly: I'll constantly be two frames ahead, and I'll be spending a vast majority of the time just waiting at the door for each "present" to free up my next submit, which would finish instantly and leave me back at the door waiting for the next present. in other words, it'd render 1 frame every 16ms, get its work done instantly, and chill for most of the frame. perfect!

but, since it's alternating doubles, I get the absolute worst case situation. I need to wait on a fence to issue a submit, but the submit also has to wait for its corresponding swapchain image to finish any presents it's still busy with to begin work.

so here's the timing:

A: fif 0
A: acquire swapchain 0
A: submit (fif:0,sc:0)
A: GPU processes (fif:0,sc:0) [fif 0 released]
A: GPU holds on presenting swapchain 0, waiting for vsync
B: fif 1
B: acquire swapchain 1
B: submit (fif:1,sc:1)
B: GPU processes (fif:1,sc:1) [fif 1 released]
B: GPU holds on presenting swapchain 1, waiting for vsync
C: fif 0
C: acquire swapchain 1 <- why 1?!?!
C: submit (fif:0,sc:1)
C: GPU holds on processing (fif:0,sc:1), because swapchain 1 hasn't actually yet been released, because B hasn't yet presented
D: fif 1
D: acquire swapchain 0
D: submit (fif:1,sc:0)
D: GPU holds on processing (fif:1,sc:0), because swapchain 0 hasn't actually yet been released, because A hasn't yet presented
E: CPU holds on fif 0 because C hasn't processed

ok phew. at this point, everything is saturated, and we're waiting on our first vsync signal to trigger a present and start releasing the holds. continuing on:

vsync signal
A: presents [swapchain 0 released]
D would be free to start processing, but it can't, because command buffers submitted on the same queue are processed in order, and C is still held up

so we've presented a frame, but nothing was actually freed to resume! now we wait a full 16ms for the next vsync. after that wait:

vsync signal
B: presents [swapchain 1 released]
C: GPU processes (fif:0,sc:1) [fif 0 released]
D: GPU processes (fif:1,sc:0) [fif 1 released]

at this point, all swapchains are released, and all fifs are free, so we start over.

#

^ looks more complicated than it is. the dastardly line is C's acquiring the swapchain used by B (resulting in D acquiring the swapchain used by A). so we saturate the semaphores and fences, but when A fully completes, it does nothing to release C.
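The timeline above can be reproduced with a toy model. This is a sketch under the same simplifications as the walk-through, not a description of what the driver actually does: one in-order queue, a fixed 16ms vsync, 1ms of GPU work per frame, acquire never blocks the CPU, and an image counts as released at the vsync where it is presented. `simulateSubmitTimes` and its parameters are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the time (ms) at which the CPU submits each frame.
// pattern: swapchain image indices handed out by acquire, repeated as needed.
std::vector<std::int64_t> simulateSubmitTimes(const std::vector<int>& pattern,
                                              int frameCount,
                                              int framesInFlight,
                                              std::int64_t vsyncMs = 16,
                                              std::int64_t gpuMs = 1) {
    assert(!pattern.empty() && framesInFlight > 0 && vsyncMs > 0);
    int imageCount = *std::max_element(pattern.begin(), pattern.end()) + 1;
    std::vector<std::int64_t> imageFree(imageCount, 0);  // when each image is released
    std::vector<std::int64_t> submit(frameCount), gpuDone(frameCount);
    std::int64_t lastShown = 0;  // vsync at which the previous frame was presented

    for (int i = 0; i < frameCount; ++i) {
        int image = pattern[static_cast<std::size_t>(i) % pattern.size()];

        // CPU: wait on the fence of the frame that last used this fif slot.
        std::int64_t prevSubmit = i > 0 ? submit[i - 1] : 0;
        std::int64_t fence = i >= framesInFlight ? gpuDone[i - framesInFlight] : 0;
        submit[i] = std::max(prevSubmit, fence);

        // GPU: in-order queue, and the submit waits on the image's semaphore.
        std::int64_t prevDone = i > 0 ? gpuDone[i - 1] : 0;
        std::int64_t start = std::max({submit[i], prevDone, imageFree[image]});
        gpuDone[i] = start + gpuMs;

        // FIFO present: one image per vsync, in submission order.
        std::int64_t nextVsync = ((gpuDone[i] + vsyncMs - 1) / vsyncMs) * vsyncMs;
        std::int64_t shown = std::max(nextVsync, lastShown + vsyncMs);
        lastShown = shown;
        imageFree[image] = shown;
    }
    return submit;
}
```

Feeding it the order 0,1,1,0 with 2 fif gives submit times 0, 0, 1, 2, 33, 34, 65, 66, 97 (the short/long cadence), while 0,1,2 gives 0, 0, 1, 2, 3, 17, 33, 49, 65, which settles into one submit every 16ms.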

white echo Aug 24, 2025, 4:34 AM

#

@warm tusk add a fence to acquire, you can prob reuse the FIF fence since you have to wait for it before recording/submitting
when you get 0,0,1,1, it should cancel out FIF but when you get 0,1,2 it should let FIF work but you may get less cpu time than without it

warm tusk Aug 24, 2025, 3:21 PM

#

lol isn't that what we were discussing here? #1403846020504358932 message

but what do you mean "{in the degenerate case} it should cancel out FIF"? regardless, putting that back in doesn't resolve the issue

cobalt hound Aug 24, 2025, 4:33 PM

#

Something to keep in mind with frames in flight is input delay. With triple buffering, one frame can be on screen while two more wait to be presented, and the two waiting frames could carry the same input information as the one being presented. For a game running at 60fps that adds a fluctuating input delay of roughly 16ms to 48ms, which is noticeable and can be interpreted as lag, especially when there is additional input delay from other sources.

cobalt hound Aug 24, 2025, 4:50 PM

#

also:
when outputting the image indices returned from vkAcquireNextImageKHR from my current project running on an RTX 3080 Ti:

idx: 0
idx: 0
idx: 0
idx: 0
idx: 0
idx: 0
idx: 0
idx: 0
idx: 0
idx: 0
idx: 0
idx: 0
idx: 0
idx: 0
idx: 0
idx: 0
idx: 0
idx: 0
idx: 0
idx: 0

some context:
i'm doing double buffering.
before writing to the swapchain image i asynchronously wait on the previous VkSwapchainPresentFenceInfoKHR fence to be signaled, because that was the only way i got rid of some validation error ... it doesn't matter though, because before waiting on that i do physics and render the gbuffers. writing to the swapchain image is the post process step, so the work before it takes more than enough time for the previous image to be presented

warm tusk Aug 24, 2025, 4:58 PM

#

double buffering meaning "two frames in flight"? because you are not double buffering present with that swapchain index pattern lol. and if by "async wait on previous present fence" you mean you're sitting on a fence (CPU waiting) until present of previous swapchain image to acquire the next swapchain image and submit the post process write to that image, then you're kinda forcing a not-double-buffering pattern.

cobalt hound Aug 24, 2025, 5:06 PM

#

well to be fair, i'm not doing double buffering on the presentation side. however, the post process step takes less than a millisecond for me, so it's fine to do it like this, since the stuff that is done before takes significantly more time

#

the point i wanted to make is that the indices returned from vkAcquireNextImageKHR on nvidia drivers don't seem to come in an oscillating order, so the issue you're having might not be from that

warm tusk Aug 24, 2025, 5:19 PM

#

oh, I can also get 0,0,0,0... if I use 1 fif (in other words, if I wait CPU side for the completion of the previous submit before acquiring the swapchainimg and launching the next one). and that works! the driver clearly notices that all presentation frames are free so there's no need to cycle.
the issue is that I'm running on a 4050, so of course it'll be able to perform a full frame's work in series. but I'd like to ship this to run on a 4050 and a 950, and I'm frustrated that the only reason the 4050 is stalling is because of an incoherent swapchain presentation order (the 0,1,1,0,0,1,1,0...).

I should be able to saturate the fences/semaphores, and when the presentation engine releases resources, I snap them up and immediately re-saturate, regardless of how fast I can resaturate. it should be the periodic release of resources that drives the period of the whole system. I feel like this is a pretty fundamental ask?

cobalt hound Aug 24, 2025, 5:33 PM

#

warm tusk oh, I can also get `0,0,0,0...` if I use 1 fif (in other words, if I wait CPU si...

wdym by saturating fences/semaphores?

warm tusk Aug 24, 2025, 5:38 PM

#

yeah I've been trying to figure out what word is most correct to use there, and that was the best I could think of. but you're right that's not quite correct 😛

what I'm trying to get at is, if I'm able to produce @ a faster rate than needed, I can put up gates at various parts of the pipeline to limit output. I'm calling a gate "saturated" here if the thing it's gating is waiting on the other side of that gate.

#

like w/ a multithreaded queue: you can consider the producer/consumer sides of the queue. the producer fills the queue to the brim, but is then stuck at the gate, unable to fill it more until a consumer comes along and frees up some resources in the queue. in this case, my application is the producer, and vsync is the consumer. the producer has "saturated" this queue because it's in wait 99% of the time w/ a full queue, and just responds to any release of resources by immediately resaturating.
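A rough sketch of that producer/consumer picture, as a deterministic simulation rather than real threads. `producerPushTimes` and its parameters are invented for illustration: the consumer stands in for vsync and drains one item per tick, and the producer stands in for the application.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Times (ms) at which the producer manages to push each item into a bounded
// queue that a periodic consumer drains at most one item per tick.
std::vector<long> producerPushTimes(int itemCount, int capacity,
                                    long consumePeriodMs, long produceMs) {
    assert(capacity > 0 && consumePeriodMs > 0);
    std::vector<long> push(itemCount), pop(itemCount);
    for (int i = 0; i < itemCount; ++i) {
        // The producer starts on the next item right after pushing the previous one.
        long ready = (i > 0 ? push[i - 1] : 0) + produceMs;
        // A slot is free once the item `capacity` places earlier has been consumed.
        long slotFree = i >= capacity ? pop[i - capacity] : 0;
        push[i] = std::max(ready, slotFree);

        // The consumer takes this item at the first tick where it is available
        // and the previous item has already been taken on an earlier tick.
        long nextTick = ((push[i] + consumePeriodMs - 1) / consumePeriodMs) * consumePeriodMs;
        long prevPop = i > 0 ? pop[i - 1] : 0;
        pop[i] = std::max(nextTick, prevPop + consumePeriodMs);
    }
    return push;
}
```

With a capacity of 2, a 16ms consumer and a 1ms producer, the pushes land at 1, 2, 16, 32, 48, 64: after the queue fills, the producer is gated and the consumer's period drives everything. With a 20ms producer the pushes land every 20ms and the gate never closes.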

cobalt hound Aug 24, 2025, 5:48 PM

#

i see

warm tusk Aug 24, 2025, 5:50 PM

#

(the opposite side of that would be if my game can't produce content to fill the queue fast enough to keep up with the consumer- so it's the consumer that just sits around waiting for something to plop in the queue that it can grab, and the producer is just constantly doing work to add to the queue and immediately starting work on the next thing. unfortunately, in this situation, you'd say "the producer thread is saturated" because it's 100% busy doing work. so me overloading the word to say "the gate is saturated" to mean the opposite is... fairly considered confusing 😛 )

cobalt hound Aug 24, 2025, 6:01 PM

#

warm tusk oh, I can also get `0,0,0,0...` if I use 1 fif (in other words, if I wait CPU si...

i guess the best way you could resolve your incoherent presentation order is by fixing it yourself, e.g. don't index into your frame resources by using imageIndex
rather have a variable which is frameResourceIndex = (frameResourceIndex + 1) % frameResourceCount; after each successful vkAcquireNextImageKHR call.
only use imageIndex for selecting the swapchain image
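A small sketch of that split, with no Vulkan calls in it. `FrameSlots` and `slotsForAcquires` are hypothetical names; the comments follow the indexing used in the loop discussed in this thread (fence, image-available semaphore and command buffer by frame resource index, render-finished semaphore by image index).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct FrameSlots {
    std::uint32_t frameResourceIndex;  // fence, image-available semaphore, command buffer
    std::uint32_t imageIndex;          // swapchain image, render-finished semaphore
};

// For a sequence of image indices returned by acquire, compute which slot each
// frame uses. The frame resource index cycles on its own and never depends on
// whatever order the swapchain hands images out in.
std::vector<FrameSlots> slotsForAcquires(const std::vector<std::uint32_t>& acquiredImageIndices,
                                         std::uint32_t frameResourceCount) {
    assert(frameResourceCount > 0);
    std::vector<FrameSlots> slots;
    std::uint32_t frameResourceIndex = 0;
    for (std::uint32_t imageIndex : acquiredImageIndices) {
        slots.push_back({frameResourceIndex, imageIndex});
        frameResourceIndex = (frameResourceIndex + 1) % frameResourceCount;
    }
    return slots;
}
```

For acquires of 0,1,1,0 with two frame resources, the frame resource index still goes 0,1,0,1 while the image index follows the acquire order. This keeps CPU-side resource reuse well defined, but on its own it does not change which image a submit has to wait on.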

warm tusk Aug 24, 2025, 6:06 PM

#

that's what I have here #1403846020504358932 message (where curfif is what you've called frameResourceIndex)

the issue is the submit wait semaphore on the swapchain's present, which is 1. necessary, because I can't be rendering to an image that is already in queue to present, and 2. causing every other frame to wait for two vsyncs to be able to get to work (frame A:swap index 0, B:1, C:1- A and B can get submitted immediately, C needs to wait for both A and B to finish presenting before it can get to work). see #1403846020504358932 message

cobalt hound Aug 24, 2025, 6:22 PM

#

and when you disable vsync, e.g. use VK_PRESENT_MODE_IMMEDIATE_KHR or VK_PRESENT_MODE_FIFO_RELAXED_KHR?

warm tusk Aug 24, 2025, 6:30 PM

#

lol well IMMEDIATE just means I'm rendering/updating at like 1k fps, so yeah, no stuttering (but battery drain will be insane). and I'm actually not sure about FIFO_RELAXED- what is that?

cobalt hound Aug 24, 2025, 6:32 PM

#

warm tusk lol well `IMMEDIATE` just means I'm rendering/updating at like 1k fps, so yeah, ...

its a mix of FIFO and IMMEDIATE

cobalt hound Aug 24, 2025, 6:32 PM

#

warm tusk lol well `IMMEDIATE` just means I'm rendering/updating at like 1k fps, so yeah, ...

so have you tried disabling it?

warm tusk Aug 24, 2025, 6:32 PM

#

have I tried IMMEDIATE? yes. and it does what I've said.

#

I'm reading the spec on RELAXED now, but already I notice that it's not a guaranteed presentation mode :/

#

also, it looks like RELAXED's value is that it won't wait past a missed vsync. so a late frame has the opportunity to tear into a current one. which is not my problem

cobalt hound Aug 24, 2025, 6:51 PM

#

warm tusk if `vkAcquireNextImageKHR` would just hand out images approximately cyclically (`012...

shouldn't it be:
~~C: fif 0~~ C: fif 2 ?
because if fif is 0, frame resources from A will be used for C right?

#

and by that do: vkWaitForFences(gvCore.device, 1, &gvLoop.frameResourceFences[gvLoop.curfif], VK_TRUE, UINT64_MAX); with curfif=0, e.g. wait for A to be presented

warm tusk Aug 24, 2025, 7:02 PM

#

It’s fif 0,1,0,1 because I’m only using 2 fif (to limit input latency). But the problem persists even w/ 3 fif (just with a slightly more complicated gate saturation timeline that I’d rather not write out 😛).

#

But C fence doesn’t wait on A to be presented despite sharing a fif index, because A’s command buffer submission has actually already finished processing and released the gate

cobalt hound Aug 24, 2025, 7:09 PM

#

warm tusk It’s fif 0,1,0,1 because I’m only using 2 fif (to limit lag input latency). But ...

what does gvLoop.frameResourceFences[gvLoop.curfif] wait on?

#

oh its the cmd buffer

warm tusk Aug 24, 2025, 7:11 PM

#

each frame submits a command buffer, and presents a frame. The fences unlock at the completion of the command buffer submission’s GPU processing, but do not care about the present.

cobalt hound Aug 24, 2025, 7:18 PM

#

warm tusk putting together a summary message, so anyone else doesn't need to read through ...

could you paste the full function?

warm tusk Aug 24, 2025, 7:20 PM

#

That is the full function. Or do you mean “including the various profiling macros and recovery code paths that don’t get hit in this case”?

cobalt hound Aug 24, 2025, 7:21 PM

#

warm tusk That is the full function. Or do you mean “including the various profiling macro...

you said it was stripped down for simplicity

warm tusk Aug 24, 2025, 7:22 PM

#

Yep, on those grounds (removed profiling macros and unhit code paths). But I can post the actual full thing if you’d like as well- just heads up it’s the same info but more obfuscated 😛

cobalt hound Aug 24, 2025, 7:23 PM

#

would still like to see it

warm tusk Aug 24, 2025, 7:29 PM

#

unedited loop code

📎 message.txt

cobalt hound Aug 24, 2025, 7:58 PM

#

warm tusk unedited loop code

thanks!

hm you mentioned that doing vkDeviceWaitIdle on each frame fixes things right?

warm tusk Aug 24, 2025, 8:01 PM

#

yeah, and that actually makes vkAcquireNextImageKHR just return 0,0,0,0...

cobalt hound Aug 24, 2025, 8:17 PM

#

warm tusk yeah, and that actually makes `vkAcquireNextImageKHR` just return `0,0,0,0...

VkSubmitInfo submitInfo = {};
submitInfo.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO;
submitInfo.waitSemaphoreCount = 1;
submitInfo.pWaitSemaphores = &gvLoop.frameImageAvailableSemaphores[gvLoop.curfif];
VkPipelineStageFlags waitStages[] = { VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT };
submitInfo.pWaitDstStageMask = waitStages;

submitInfo.commandBufferCount = 1;
submitInfo.pCommandBuffers = &gvLoop.commandBuffers[gvLoop.curfif];
submitInfo.signalSemaphoreCount = 1;
submitInfo.pSignalSemaphores = &gvLoop.imageRenderFinishedSemaphores[imageIndex];

in your submit info the only thing you're waiting on is the frameImageAvailableSemaphores semaphore.
which means that your gpu is free to render frame A and B at the same time.

theory:
in frame 0 and 1 your cpu submits rendering commands for fifs A and B; since there is no dependency, your gpu decides to render them both at once.
in frame 2 you wait for A to finish; once that happens you submit the rendering commands for the next frame.
in frame 3 you wait for B to finish; however, since it was rendered along with A it's already finished and the next commands can be submitted.
in frame 4 you wait for A to finish, while rendering B as well ... round and round it goes ...

however when you do a HARD_WAIT with vkDeviceWaitIdle B is not submitted before A has completed ...

warm tusk Aug 24, 2025, 8:59 PM

#

I'm confused. lol I've already outlined exactly what happens here: #1403846020504358932 message

and there also is no "both at once"- I'm using a single graphics queue. but it does process A and Bs commands in quick succession (#1403846020504358932 message)

the issue is that C's command buffer cannot start processing on the GPU (even though it can be submitted!) until B's swapchain image finishes presenting. D also can't start processing because it was enqueued after C's, even though once A presents all semaphores are unlocked.

so we have

enqueue/process both A and B (in quick sequence)
enqueue C and eventually D but can't process
A presents, but that unlocks nothing
B presents, which unlocks C
C and D process (in quick sequence)

#

maybe I should make another thread that consolidates all the most up to date information. but the current question I want to answer is:

How can I force acquire swapchain to not give degenerate frame ordering?

or

How can I architect my game loop to process input/physics with a periodicity driven by an easily met vsync even in the case of degenerate frame ordering?

cobalt hound Aug 24, 2025, 9:20 PM

#

warm tusk I'm confused. lol I've already outlined exactly what happens here: https://disco...

well, a gpu can very well do many things at once, so there can be "both at once" even though you're only using a single graphics queue.
in your code i cannot find anything that orders the execution of frames in flight submissions, be it by the host or device

#

from Vulkan's Implicit Synchronization Guarantees:

Submission order is a fundamental ordering in Vulkan, giving meaning to the order in which action and synchronization commands are recorded and submitted to a single queue. Explicit and implicit ordering guarantees between commands in Vulkan all work on the premise that this ordering is meaningful. This order does not itself define any execution or memory dependencies; synchronization commands and other orderings within the API use this ordering to define their scopes.

warm tusk Aug 24, 2025, 11:41 PM

#

cobalt hound well a gpus can very well do many things at once, so there is many at once, even...

ok fair, but in the profiling screenshot I linked, it is clear they are in-practice being completed sequentially.
but regardless (sequentially-and-quickly vs actually-in-parallel), the problem is as stated above.

cobalt hound Aug 24, 2025, 11:42 PM

#

warm tusk ok fair, but in the profiling screenshot I linked, it is clear they are in-pract...

its very important, because that might be the cause of everything.
also could you link me the screenshot you mentioned?

cobalt hound Aug 24, 2025, 11:46 PM

#

warm tusk I'm confused. lol I've already outlined exactly what happens here: https://disco...

also, anything submitted to a queue might be executed out of submission order if there is no explicit synchronization that creates a memory or execution dependency

warm tusk Aug 24, 2025, 11:47 PM

#

cobalt hound its very important, because that might be the cause of everything. also could yo...

it's the second link in the post you replied to

cobalt hound Aug 24, 2025, 11:51 PM

#

#

my theory is that the rendering of A and B overlaps because there is no explicit synchronization between them, or any kind of dependency

#

if it was so, the fix would be to make any submission wait on the previous submission

warm tusk Aug 24, 2025, 11:53 PM

#

cobalt hound also, the order in which anything is submitted to a queue, might be executed out...

let's slow down. my above descriptions describe the situation which justifies every single wait. I'm no longer asking any questions about why anything is waiting. if you want to point to any wait in the profiling screenshot, I will be able to tell you exactly what it's waiting on and why.

the issue is that the waits are "long short long short long short", which if I drive the processing between snapshots using those waits, I end up with snapshots after 2ms of physics processing and after 30ms of physics processing, displayed every 16ms, which looks like stuttering

#

and your diagram's annotations are not correct. the red chunks on the upper timelines (not the frame graph) are alternating frame submit loads

#

one sec, I'll annotate it myself

cobalt hound Aug 24, 2025, 11:54 PM

#

they are just to illustrate the point im trying to make

#

that your GPU is working on two frames at once

warm tusk Aug 24, 2025, 11:58 PM

#

they literally are not. what the hell.
the red blobs are each a frame's GPU submit being processed.

#

in the profiler I can hover over the GPU submit loads and it shows the corresponding frame from which they were submitted. notice how each blob is executed sequentially.

cobalt hound Aug 25, 2025, 12:01 AM

#

submission order != execution order

warm tusk Aug 25, 2025, 12:01 AM

#

not by constraint, but here, it literally is

cobalt hound Aug 25, 2025, 12:01 AM

#

whats the name of these red blobs?

warm tusk Aug 25, 2025, 12:02 AM

#

"earth", but that's because that's the name of the renderpass they're processing.

but yo. I can hover over them, and see the corresponding CPU frame they were submitted on.

cobalt hound Aug 25, 2025, 12:03 AM

#

wheres the corresponding tracy scope for those?

warm tusk Aug 25, 2025, 12:04 AM

#

like on the CPU? it's submitted with the submit. what?

cobalt hound Aug 25, 2025, 12:05 AM

#

what is responsible for them showing up?

#

is it some tracy option, or something you do in code?

warm tusk Aug 25, 2025, 12:06 AM

#

yes, it flags renderpasses in the command buffer that get submitted during record

cobalt hound Aug 25, 2025, 12:20 AM

#

warm tusk they literally are not. what the hell. the red blobs are each a frame's GPU subm...

the fenceWait scope represents waiting for the submission to finish, but in your annotation the A, B, C, D fenceWaits all end way before the red earth bars begin

#

nvm

warm tusk Aug 25, 2025, 12:24 AM

#

also worth noting that that isn't the most up to date profiling screenshot (it's just the first one I came across when looking for a link that showed the long,short,long,short behavior). the one linked includes the fenceImageWait step, which is commented out in my latest loop. (it also might have been taken before I integrated the gpu sync extension which lets tracy ~precisely place the GPU and CPU timelines relative to each other- though the difference between before/after isn't significant)

the image here #1403846020504358932 message is more up to date

cobalt hound Aug 25, 2025, 12:30 AM

#

how does the tracy capture look with HARD_WAIT?

#

as comparison that might help

warm tusk Aug 25, 2025, 12:30 AM

#

what question are you trying to answer? I have already accounted for why every wait exists. do you not believe my analysis?

#

I can attempt to write it up more clearly if #1403846020504358932 message wasn't clear enough

cobalt hound Aug 25, 2025, 12:33 AM

#

warm tusk what question are you trying to answer? I have already accounted for why every w...

i'm trying to help you fix a bug. and comparing the two different tracy captures might help

warm tusk Aug 25, 2025, 12:34 AM

#

I appreciate the help, but this post is growing to 300 posts long, and I have had back and forths on the topic with multiple people, and you are not showing a willingness to first understand existing findings

#

#1403846020504358932 message <- here is a capture with a vk wait idle

cobalt hound Aug 25, 2025, 12:52 AM

#

what controls these?

cobalt hound Aug 25, 2025, 12:53 AM

#

warm tusk what question are you trying to answer? I have already accounted for why every w...

the question i'm trying to answer is whether the information tracy gives you is valid

white echo Aug 25, 2025, 1:01 AM

#

warm tusk lol isn't that what we were discussing here? https://discord.com/channels/427551...

lol I mean it didn't make sense then, but given the current context and the theory that the 0,0,1,1 was causing it, it seemed like a simple test
just to make sure, you did

wait for fences
reset fence
acquire with same fence
wait for fences
reset fence
record cb
submit with same fence

right

warm tusk Aug 25, 2025, 1:09 AM

#

cobalt hound the question i'm trying to answer is whether the information tracy gives you is ...

ok. thanks for the time you have put in, truly. but I don't think we're even close to on the same page, and respectfully, I can't spend any more time assisting this line of inquiry. the behavior tracy shows is extremely consistent with the synchronization data sent to vulkan (again, please just check this out, if you're interested in understanding that: #1403846020504358932 message).

a spec can explicitly not guarantee something like execution order, but a driver can absolutely implement the spec in a way that happens to often respect that order. that non-discrepancy is not enough for me to start questioning a widely used tool used in the most trivial way, especially when an out-of-order swapchain image distribution fully explains the issue.

if I'm overlooking something, and there's more to it that would justify this line of inquiry (like if you're the developer of tracy and are familiar with some shortcoming or something?), then ok. but you've gotta sell me more than just demanding various profiles.

warm tusk Aug 25, 2025, 1:11 AM

#

white echo lol I mean it didn't made sense then, but given the current context and theory t...

oh that is not quite what I'm currently doing. I'm doing more like:

wait for fif fence
acquire (but not with a fence- can you acquire with a fence?)
wait for imageIndex fence
reset fences
record cb
submit w/ fif fence and imageindex fence

white echo Aug 25, 2025, 1:12 AM

#

warm tusk oh that is _not_ quite what I'm currently doing. I'm doing more like: ``` wait f...

you need to reset before it can be used again

#

or it's still in the signaled state
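A toy model of the reset rule, run against the order white echo listed above. `ToyFence` and `run_frame` are hypothetical names, not the Vulkan API. The sketch only mirrors the rule that a fence handed to an acquire or a submit must be unsignaled, and it assumes pending work completes the moment it is waited on.

```cpp
#include <stdexcept>

// Toy stand-in for a fence. NOT the Vulkan API, only its signaled/unsignaled rules.
struct ToyFence {
    bool signaled = true;   // starts signaled so the very first wait returns
    bool pending  = false;  // attached to work that has not completed yet

    // Hand the fence to an acquire or a submit: it must be unsignaled and unused.
    void attach() {
        if (signaled || pending)
            throw std::logic_error("fence must be reset before reuse");
        pending = true;
    }
    // Block until signaled. In this toy, pending work completes instantly.
    void wait() {
        if (pending) { pending = false; signaled = true; }
        if (!signaled)
            throw std::logic_error("waiting on a fence nothing will signal");
    }
    // Return the fence to the unsignaled state.
    void reset() {
        if (pending)
            throw std::logic_error("cannot reset a fence that is still in use");
        signaled = false;
    }
};

// One frame in the listed order, using one fence for both acquire and submit.
void run_frame(ToyFence& fif) {
    fif.wait();    // wait for fences (previous submit on this slot)
    fif.reset();   // reset fence
    fif.attach();  // acquire with same fence
    fif.wait();    // wait for fences (image is actually available)
    fif.reset();   // reset fence
                   // record cb
    fif.attach();  // submit with same fence
}
```

Calling `run_frame` repeatedly on one `ToyFence` never throws, while skipping either reset makes the following `attach()` throw, which is the "still in the signaled state" case.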

warm tusk Aug 25, 2025, 1:12 AM

#

I do reset the fences before reuse?

white echo Aug 25, 2025, 1:13 AM

#

ohhh I thought you pass it to acquire, missed the brackets lol

#

VkResult vkAcquireNextImageKHR(
    VkDevice        device,
    VkSwapchainKHR  swapchain,
    uint64_t        timeout,
    VkSemaphore     semaphore,
    VkFence         fence,
    uint32_t*       pImageIndex);

#

that's what I meant

#

not your image index fence stuff

warm tusk Aug 25, 2025, 1:15 AM

#

OH got it. no I haven't tried that. but wait why would that help? what am I trying to hold off on doing CPU side until the image is acquired?

white echo Aug 25, 2025, 1:15 AM

#

so just wait for FIF then reset then acquire with FIF then wait and reset again

white echo Aug 25, 2025, 1:15 AM

#

warm tusk OH got it. no I haven't tried that. but wait why would that help? what am I tryi...

if you get 0,0,1,1 then FIF cancels out cuz you'll wait for the previous frame to finish

#

if you get 0,1,2,0,1,2 then FIF still should work
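A small sketch of why the image distribution matters. It assumes the loop warm tusk described: one fence per frame-in-flight slot plus one fence per swapchain image, both waited on before recording. `wait_distance` is a hypothetical helper, and image indices are assumed to be non-negative.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// For each frame, returns how many frames back the newest submission is that
// the CPU must wait on before recording (0 means there is nothing to wait on).
// A distance of 1 means the frame waits on the frame submitted just before it.
std::vector<int> wait_distance(const std::vector<int>& acquired_image,
                               int frames_in_flight) {
    int image_count = 0;
    for (int index : acquired_image) image_count = std::max(image_count, index + 1);

    std::vector<int> last_frame_for_slot(frames_in_flight, -1);
    std::vector<int> last_frame_for_image(image_count, -1);
    std::vector<int> distance;

    for (std::size_t i = 0; i < acquired_image.size(); ++i) {
        const int frame = static_cast<int>(i);
        const int slot = frame % frames_in_flight;
        const int image = acquired_image[i];
        // The frame waits on both fences, so the newer submission dominates.
        const int newest = std::max(last_frame_for_slot[slot],
                                    last_frame_for_image[image]);
        distance.push_back(newest < 0 ? 0 : frame - newest);
        last_frame_for_slot[slot] = frame;
        last_frame_for_image[image] = frame;
    }
    return distance;
}
```

With two frames in flight, the sequence 0,0,1,1,0,0,1,1 gives distances 0,1,2,1,2,1,2,1: every other frame waits on the frame right before it, matching the long, short, long, short pattern. The sequence 0,1,2,0,1,2 gives 0,0,2,2,2,2, so no frame waits on its immediate predecessor.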

warm tusk Aug 25, 2025, 1:16 AM

#

k gimme a sec to think through this

warm tusk Aug 25, 2025, 1:33 AM

#

trying it out, seeing some very strange behavior. one is that I'm spending a majority of time on this new acquire fence wait, which does trigger a 0,1,2,0,1,2... swapchain image distribution, but every once in a while I get stuck on a frame for 10s

white echo Aug 25, 2025, 1:34 AM

#

I think I can declare you as cursed lmao

warm tusk Aug 25, 2025, 1:34 AM

#

#

ah wait the big 10s thing might be the tracy collection lag issue that someone else was posting about earlier

white echo Aug 25, 2025, 1:39 AM

#

lmao

#

well try it without tracy and a standard fps monitor and graph it out

#

should not go 30 fps sometimes

#

if everything looks fine then we should prob move on from this cursed thread lol

warm tusk Aug 25, 2025, 1:43 AM

#

ok wow yep, looks great. total bummer that I have to do that, and also that for some reason that kills tracy performance. but yeah is this a step that should be added to the "standard frame sync logic" to encourage cyclic acquire distribution?

#

and also ftr, I don't necessarily need to reuse the fif fence, right? I don't even need multiple fences for this. I could just have one "acquire fence" that I reuse every frame?

white echo Aug 25, 2025, 1:45 AM

#

I mean needing to care about this when you're drawing nothing is rather niche, going by how no one knows how to solve this lmao

warm tusk Aug 25, 2025, 1:45 AM

#

(because it's always added, waited on, reset in the full frame)

white echo Aug 25, 2025, 1:45 AM

#

warm tusk and also ftr, I _don't_ necessarily need to reuse the fif fence, right? I don't ...

I mean reusing the FIF fence saves you one fence creation soo

warm tusk Aug 25, 2025, 1:45 AM

#

ok sure. I was just making sure I was understanding the usage pattern here 👍

white echo Aug 25, 2025, 1:49 AM

#

lol 2 textured spheres and some 2D sprites is fairly close to nothing for the gpu

warm tusk Aug 25, 2025, 1:50 AM

#

sure, but a lot of people make games rendering even less

#

(my point is it shouldn't be that niche)

white echo Aug 25, 2025, 1:51 AM

#

well you can blog about it

warm tusk Aug 25, 2025, 1:51 AM

#

LOL you mean write a blog about this thread?

white echo Aug 25, 2025, 1:51 AM

#

yes

warm tusk Aug 25, 2025, 1:51 AM

#

ugh more work...

#Strange Timing Question

For each FIF:

For each swapchain image:

The workflow