#Image Generation Stuck Until New Requests Are Sent


heavy field
#

It typically takes 5–10 seconds to generate an image. However, sometimes a request sits in the queue and doesn't start processing until another request is sent.

This issue occurred multiple times today, and I recorded it. In the example I captured, my friend's first request stayed in the queue for an unusually long time. I asked him to send a second request to try and trigger the first one to start processing, but that didn’t work. When he submitted a third request, it finally caused the first request to begin processing, followed by the second, and then the third.

This issue occurs frequently and significantly impacts the user experience, as requests that should complete in 5–10 seconds end up taking several minutes.

heavy field
hollow totem
#

Isn't it because ComfyUI is still loading?

heavy field
hollow totem
#

Yes, and I see the same worker processed 3 requests after waiting for a few moments

#

Can you check the logs of the workers?

#

Maybe you should check if your worker is ready or still hasn't called the handler function

hollow totem
#

Then send 3-5 requests immediately, without any real delay between them

heavy field
#

This doesn’t happen all the time, but it pops up about once a week and definitely hurts the user experience. I even had a meeting with @gritty marlin to try and show the issue live, but it didn’t happen then (which was a good thing haha)

We agreed that if it happened again and I could catch it on video, that would be helpful. So that’s what I’ve done now, so I hope this helps the team figure out what’s going wrong

hollow totem
# heavy field

I noticed the scaling here is weird (inefficient) though: 1 worker ends up serving your requests but 3 end up running?

#

I'm not sure how to help without extra details; maybe logs would help. But contacting staff and letting them debug it on RunPod's side could definitely help

heavy field
hollow totem
#

Did you open a support ticket for this?

#

Maybe you can edit your endpoint and check whether the scaling type is queue delay or request count?

#

I can't see the logs well, but if you can, try rerunning your request and check whether your workers are in a ready state to receive and process it (add some prints after the model has loaded)
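A minimal sketch of what "add some prints after the model has loaded" could look like, so the worker logs show whether a job arrived before or after the worker was ready. `load_model` is a hypothetical stand-in for the real ComfyUI/model startup, and the returned dictionary shape is only illustrative.

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("worker")


def load_model():
    """Hypothetical stand-in for the real (slow) model/ComfyUI startup."""
    time.sleep(0.01)
    return lambda prompt: f"image for {prompt!r}"


log.info("worker starting, loading model...")
_t0 = time.monotonic()
MODEL = load_model()  # runs once per worker process, before any job is handled
log.info("model loaded in %.2fs, worker ready", time.monotonic() - _t0)


def handler(job):
    """Process one job and log when the worker actually picks it up."""
    log.info("job %s received", job.get("id"))
    result = MODEL(job["input"]["prompt"])
    log.info("job %s finished", job.get("id"))
    return {"image": result}


# In a real worker the handler would be registered with the RunPod SDK,
# for example: runpod.serverless.start({"handler": handler})
```

If "worker ready" appears in the logs well before "job ... received" for a request that sat in the queue, the delay is not caused by model loading.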

heavy field
gritty marlin
#

thanks for reporting this @heavy field and thanks for your support already @hollow totem

minor solstice
#

I also run into this issue a lot!
Requests sometimes get stuck in the queue while all workers stay idle and don't pick anything off the queue.
It can sometimes be unstuck by sending more requests so that many are waiting in the queue; then the first worker goes from idle to running

silk anchor
#

@gritty marlin Hi Tim, I'm Muxin from X

heavy field
#

Hi, has anyone found a manual solution to this? My app is ready but because of this issue, sometimes I can generate an image, sometimes it's stuck/lost in the queue and never generates.

#

Once this is fixed we can launch our app

hollow totem
#

perhaps create a ticket first

heavy field
#

For those who have the same problem: I created a support ticket, tried different methods but it is not fixed yet. We will launch our app as soon as the problem is fixed.

little shell
#

If you are just creating your endpoint or it’s initializing with no ready worker, do not submit a job yet. There’s a bug we’re tracking about this. At most, the job sits in queue for two minutes if it was queued before a single worker is ready. It also pushes past the queue if another job is queued after a worker is ready.

heavy field
# little shell If you are just creating your endpoint or it’s initializing with no ready worker...

Thank you for sharing that, Dean. Unfortunately, the issue occurs when workers are ready/idle.

Additionally, I've noticed several times where workers kept running for over 8 minutes despite having no jobs in the queue or in progress, and created a new thread here: https://discord.com/channels/912829806415085598/1389378980959752273

Looking forward to launching our app as soon as these issues are fixed! 🙏

heavy field
#

Let me also share an update about this issue

hollow totem
#

What's up

heavy field
#

So the issue is that requests get stuck in the queue, but they are processed once we send new requests. What we noticed is that sending new requests (1, 2, or sometimes more) triggers something that changes the status of our first request from in queue to in progress.

#

and we noticed that clicking the "generate" button many times always worked to generate images, because the extra requests triggered whatever was holding up the first one

hollow totem
#

So like some requests don't scale up workers?

#

And gets stuck?

heavy field
#

to work around this, we updated our app so that clicking the "generate" button sends the same request 5 times to RunPod. the app then polls the status of each one, and as soon as 1 of the 5 requests is in progress, we cancel the other 4 immediately. this way an image is much more likely to be generated, and no duplicate images are produced

#

this is a temporary solution we found for the issue, but I hope RunPod devs can fix it so there's no need to do this
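A minimal sketch of the duplicate-and-cancel workaround described above, not the app's actual code. The `submit`, `get_status` and `cancel` callables are hypothetical wrappers around the endpoint's run, status and cancel operations, and the status strings `IN_PROGRESS` and `COMPLETED` are assumptions about what the status call returns.

```python
import time
from typing import Any, Callable, Dict, List, Optional


def race_duplicates(
    submit: Callable[[Dict[str, Any]], str],
    get_status: Callable[[str], Optional[str]],
    cancel: Callable[[str], Any],
    payload: Dict[str, Any],
    copies: int = 5,
    poll_interval: float = 0.5,
    timeout: float = 120.0,
) -> Optional[str]:
    """Submit `copies` identical jobs, keep the first one a worker picks up,
    and cancel the rest. Returns the winning job id, or None on timeout."""
    job_ids: List[str] = [submit(payload) for _ in range(copies)]
    started = {"IN_PROGRESS", "COMPLETED"}  # assumed status names
    deadline = time.monotonic() + timeout

    while time.monotonic() < deadline:
        for job_id in job_ids:
            if get_status(job_id) in started:
                # One copy is being worked on: cancel every other copy.
                for other in job_ids:
                    if other != job_id:
                        cancel(other)
                return job_id
        time.sleep(poll_interval)

    # Nothing started before the deadline: clean up all copies.
    for job_id in job_ids:
        cancel(job_id)
    return None
```

Note that a second copy can still start between a status poll and the cancel call, so the cancel operation has to be able to stop a job that is already in progress for this to avoid duplicate images.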

heavy field
hollow totem
#

What if you used a new endpoint

#

Like, clone it, wait until everything is initialized, then test again

#

Did you report this endpoint in a ticket already?

heavy field
hollow totem
#

Staff should be able to check what's wrong. Is there no update yet?
What's the last thing they said?

heavy field
# hollow totem staffs should be able to check whats wrong, is there no update yet? whats the la...

we've been in touch almost every day and they've been helpful, I really appreciate their time and help. the last thing they said was that they noticed my setup was using SDK version 1.7.9. there have been improvements and bug fixes since then, so they recommended upgrading to 1.7.12

thanks to Tim, I've upgraded it today, but I haven't switched my app back to the old method (sending 1 request at a time rather than 5) yet. I will keep you posted on whether upgrading the SDK fixes the issue
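For anyone checking their own setup, here is a small sketch that compares the installed SDK version against the 1.7.12 recommended in this thread. It assumes the SDK is installed as the Python package named `runpod`; the helper names are made up for illustration, and pre-release suffixes are ignored.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def parse_version(text: str) -> tuple:
    """Turn '1.7.12' into (1, 7, 12) so versions compare numerically."""
    return tuple(int(part) for part in re.findall(r"\d+", text)[:3])


def needs_upgrade(installed: str, recommended: str = "1.7.12") -> bool:
    """True when the installed version is older than the recommended one."""
    return parse_version(installed) < parse_version(recommended)


try:
    installed = version("runpod")  # assumption: the package name is "runpod"
    print(f"runpod {installed}, upgrade needed: {needs_upgrade(installed)}")
except PackageNotFoundError:
    print("runpod SDK is not installed in this environment")
```

Comparing tuples rather than raw strings matters here, because as plain text "1.7.9" sorts after "1.7.12".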