Jobs not being processed in a timely manner | Laravel | Page 1

teal mica Mar 13, 2024, 12:15 AM

#

So I'm using Horizon to handle my jobs in production, and I'm submitting 50-70 jobs to the queue at a time. Some of them take as long as 3-4 minutes to actually enter the queue, even though each job takes only 0.2-0.5 seconds on average to run.

Here's my Horizon config in case it's relevant, I have it maxing out at 30 processes

'supervisor-1' => [
    'minProcesses' => 1,
    'maxProcesses' => 30,
    'balanceMaxShift' => 5,
    'balanceCooldown' => 5,
    'tries' => 5,
],

rapid orchid Mar 13, 2024, 9:35 AM

#

Are there any other jobs being processed? And what do you mean by "enter the queue"? Jobs are queued immediately when you dispatch them.

teal mica Mar 13, 2024, 1:19 PM

#

rapid orchid Are there any other jobs being processed? And what do you mean by "enter the que...

Sorry my phrasing may not be 100% accurate, when I say “enter the queue” I mean that they actually get picked up by a worker

#

I have one long running job (on a different queue) that takes about a minute, it kicks off these smaller ones that are user notifications

heavy kettle Mar 15, 2024, 1:05 PM

#

Is the same supervisor running both queues? Can you show the whole config in horizon.php under the "Queue Worker Configuration" section?

teal mica Mar 15, 2024, 4:27 PM

#

heavy kettle Is the same supervisor running both queues? Can you show the whole config in hor...

i'm not sure how to tell that, I think they're separate though?

'defaults' => [
        'supervisor-1' => [
            'connection' => 'redis',
            'queue' => ['default'],
            'balance' => 'auto',
            'maxProcesses' => 1,
            'maxTime' => 0,
            'maxJobs' => 0,
            'memory' => 128,
            'tries' => 1,
            'timeout' => 600,
            'nice' => 0,
        ],
    ],

    'environments' => [
        'production' => [
            'supervisor-1' => [
                'minProcesses' => 1,
                'maxProcesses' => 10,
                'balanceMaxShift' => 5,
                'balanceCooldown' => 2,
                'tries' => 5,
            ],
            'supervisor-long-running' => [
                'connection' => 'redis-long-running',
                'queue' => [
                    'long-running-queue',
                ],
                'balance' => 'auto',
                'minProcesses' => 2,
                'maxProcesses' => 5,
                'tries' => 5,
                'timeout' => 900,
                'balanceMaxShift' => 5,
                'balanceCooldown' => 2
            ],
        ],

        'local' => [
            'supervisor-1' => [
                'maxProcesses' => 3,
            ],
            'supervisor-long-running' => [
                'connection' => 'redis-long-running',
                'queue' => [
                    'long-running-queue',
                ],
                'balance' => 'simple',
                'processes' => 9,
                'tries' => 2,
                'timeout' => 900,
            ],
        ],
    ],

#

The main job runs on the long-running-queue, it sends out notifications that get queued on the default queue, where the delay is happening

#

based on the timings of the notifications, i'm not entirely convinced my default queue is scaling up at all...

heavy kettle Mar 15, 2024, 4:40 PM

#

Looks like an incomplete config file maybe....

You are setting up 'supervisor-1', but not 'supervisor-long-running' up top...I would start with putting everything into the "defaults" then modify per environment to keep it cleaner

teal mica Mar 15, 2024, 4:42 PM

#

heavy kettle Looks like an incomplete config file maybe.... You are setting up 'supervisor-1...

will do that for sure now

heavy kettle Mar 15, 2024, 4:43 PM

#

Try that...it will clean up the config, then may make it easier to debug

teal mica Mar 15, 2024, 4:49 PM

#

'defaults' => [
        'supervisor-1' => [
            'connection' => 'redis',
            'queue' => ['default'],
            'balance' => 'auto',
            'minProcesses' => 1,
            'balanceMaxShift' => 5,
            'balanceCooldown' => 2,
            'maxTime' => 0,
            'maxJobs' => 0,
            'memory' => 128,
            'tries' => 5,
            'timeout' => 600,
            'nice' => 0,
        ],
        'supervisor-long-running' => [
            'connection' => 'redis-long-running',
            'queue' => ['long-running-queue'],
            'balance' => 'auto',
            'minProcesses' => 1,
            'balanceMaxShift' => 5,
            'balanceCooldown' => 2,
            'timeout' => 900,
            'maxTime' => 0,
            'maxJobs' => 0,
            'memory' => 128,
            'tries' => 1,
            'nice' => 0,
        ]
    ],

    'environments' => [
        'production' => [
            'supervisor-1' => [
                'maxProcesses' => 10,
            ],
            'supervisor-long-running' => [
                'maxProcesses' => 5,
            ],
        ],

        'dev' => [
            'supervisor-1' => [
                'maxProcesses' => 5,
            ],
            'supervisor-long-running' => [
                'maxProcesses' => 2,
            ],
        ],

        'local' => [
            'supervisor-1' => [
                'maxProcesses' => 3,
            ],
            'supervisor-long-running' => [
                'maxProcesses' => 1,
            ],
        ],
    ],

#

hopefully that's a bit better to read

#

just deployed it, will be monitoring things for a bit

heavy kettle Mar 15, 2024, 4:54 PM

#

Can you send a screenshot of horizon 'home' page that shows # of processes, etc

teal mica Mar 15, 2024, 4:56 PM

#

sure thing

heavy kettle Mar 15, 2024, 5:05 PM

#

How are you dispatching those jobs?

teal mica Mar 15, 2024, 5:10 PM

#

heavy kettle How are you dispatching those jobs?

the main longer running job triggers notifications to users, these notifications have ShouldQueue on them

$subscription->user->notify(new NewMoviesNotification($newMovies[$theater->id], $theater, $subscription));

class NewMoviesNotification extends Notification implements ShouldQueue

heavy kettle Mar 15, 2024, 5:12 PM

#

Ok. So it's the notification ones that are the issue, correct?

teal mica Mar 15, 2024, 5:13 PM

#

yeah, those are the ones that are waiting in the queue too long

heavy kettle Mar 15, 2024, 5:13 PM

#

How are you dispatching the long running job?

#

I'm wondering if it is actually running on the default queue

teal mica Mar 15, 2024, 5:14 PM

#

that one is dispatched via scheduler, looks like this

$schedule->call(function () {
    UpdateTheatersPaid::dispatch()->onQueue('long-running-queue');
})->everyFiveMinutes();

heavy kettle Mar 15, 2024, 5:15 PM

#

In config/queue.php what is "retry_after" set to?

#

If you have long running jobs, this setting can cause some really weird behavior if it's set too short

teal mica Mar 15, 2024, 5:16 PM

#

looks like for default its 900, the long running queue is 1200

'redis' => [
    'driver' => 'redis',
    'connection' => 'default',
    'queue' => env('REDIS_QUEUE', 'default'),
    'retry_after' => 900,
    'block_for' => null,
    'after_commit' => false,
],
'redis-long-running' => [
    'driver' => 'redis',
    'connection' => 'default',
    'queue' => 'default',
    'retry_after' => 1200,
],
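
For context on the `retry_after` question: the Laravel queue documentation recommends that a worker's `timeout` be at least several seconds shorter than the connection's `retry_after`, otherwise a job can be released and picked up a second time while the first attempt is still running. A minimal sketch of that check, using the values posted in this thread (the function name and the 5-second margin are hypothetical, not part of Laravel):

```php
<?php

/**
 * Returns true when the worker timeout leaves at least $margin seconds
 * before the connection's retry_after would release the job again.
 * Hypothetical helper for illustration only.
 */
function timeoutIsSafe(int $timeout, int $retryAfter, int $margin = 5): bool
{
    return $timeout + $margin <= $retryAfter;
}

// supervisor-1: timeout 600 on the 'redis' connection (retry_after 900)
var_export(timeoutIsSafe(600, 900));
echo PHP_EOL;

// supervisor-long-running: timeout 900 on 'redis-long-running' (retry_after 1200)
var_export(timeoutIsSafe(900, 1200));
echo PHP_EOL;
```

Both pairs pass this check, so by this measure alone the posted `retry_after` values do not look too short.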

heavy kettle Mar 15, 2024, 5:17 PM

#

Why do you have a separate connection to 'redis-long-running' here?

teal mica Mar 15, 2024, 5:18 PM

#

good question, i set this up forever ago so honestly I don't remember why it's like this

heavy kettle Mar 15, 2024, 5:18 PM

#

Ok. That seems a bit odd to me, but not sure if that would cause the issue or not since it's still going to the same driver/connection from there.

#

Can you confirm the "notify" jobs only run AFTER the long running is complete?

teal mica Mar 15, 2024, 5:20 PM

#

heavy kettle Can you confirm the "notify" jobs only run AFTER the long running is complete?

i don't think so? they get dispatched throughout the long running job (could be early in its run or very late)

#

it dispatches anywhere from 50-500 in one run

heavy kettle Mar 15, 2024, 5:20 PM

#

Ok. But what I want to know is NOT when the first is dispatched, but does it wait to actually process until the long running is done? Or do they start processing before it's done?

teal mica Mar 15, 2024, 5:21 PM

#

i haven't explicitly added anything to tell it to wait, so it is processing them while the long running job is still running

heavy kettle Mar 15, 2024, 5:22 PM

#

Right, but I'm thinking about debugging and it would be good to know.

What are your queue settings inside the .env file? I don't need the username or password.

#

Specifically QUEUE_CONNECTION

teal mica Mar 15, 2024, 5:23 PM

#

QUEUE_CONNECTION=redis
REDIS_HOST=127.0.0.1
REDIS_PORT=6379

heavy kettle Mar 15, 2024, 5:23 PM

#

Ok...just making sure it wasn't 'sync'

teal mica Mar 15, 2024, 5:23 PM

#

lol that would be a silly mistake

heavy kettle Mar 15, 2024, 5:23 PM

#

Haha...I've done worse. 😂

teal mica Mar 15, 2024, 5:24 PM

#

would it make sense to hold the notifications until the main job is finished? or is that kinda irrelevant

heavy kettle Mar 15, 2024, 5:25 PM

#

That's irrelevant, and IMO, defeats the purpose of the queue system....it's designed to allow this very thing.

teal mica Mar 15, 2024, 5:25 PM

#

yeah that was my thought as well

heavy kettle Mar 15, 2024, 5:25 PM

#

Do you have this setup locally where you can test something?

#

At the top of your long running task add this line:

Log::debug('Start long runner:'.now()->toString());

At the end:

Log::debug('End long runner:'.now()->toString());

At top of notification job:

Log::debug('Send notification:'.now()->toString());

#

Let this run and share the log

teal mica Mar 15, 2024, 5:30 PM

#

I'll just add this to production, it's really hard for me to test this stuff locally due to data differences

#

added them, going to monitor the logs for a bit now

#

(this may take an indeterminate amount of time to come back, these notifications are not consistent throughout the day)

teal mica Mar 15, 2024, 6:06 PM

#

looks like __construct gets triggered when job is initially queued, times are identical

[2024-03-15 18:05:42] production.DEBUG: Send notification:Fri Mar 15 2024 18:05:42 GMT+0000
[2024-03-15 18:05:42] production.DEBUG: Send notification:Fri Mar 15 2024 18:05:42 GMT+0000
[2024-03-15 18:05:42] production.DEBUG: Send notification:Fri Mar 15 2024 18:05:42 GMT+0000
[2024-03-15 18:05:42] production.DEBUG: Send notification:Fri Mar 15 2024 18:05:42 GMT+0000
[2024-03-15 18:05:42] production.DEBUG: Send notification:Fri Mar 15 2024 18:05:42 GMT+0000
[2024-03-15 18:05:42] production.DEBUG: Send notification:Fri Mar 15 2024 18:05:42 GMT+0000
[2024-03-15 18:05:42] production.DEBUG: Send notification:Fri Mar 15 2024 18:05:42 GMT+0000
[2024-03-15 18:05:42] production.DEBUG: Send notification:Fri Mar 15 2024 18:05:42 GMT+0000
[2024-03-15 18:05:42] production.DEBUG: Send notification:Fri Mar 15 2024 18:05:42 GMT+0000
[2024-03-15 18:05:42] production.DEBUG: Send notification:Fri Mar 15 2024 18:05:42 GMT+0000
[2024-03-15 18:05:42] production.DEBUG: Send notification:Fri Mar 15 2024 18:05:42 GMT+0000
[2024-03-15 18:05:42] production.DEBUG: Send notification:Fri Mar 15 2024 18:05:42 GMT+0000
[2024-03-15 18:05:42] production.DEBUG: Send notification:Fri Mar 15 2024 18:05:42 GMT+0000
[2024-03-15 18:05:42] production.DEBUG: Send notification:Fri Mar 15 2024 18:05:42 GMT+0000
[2024-03-15 18:05:42] production.DEBUG: Send notification:Fri Mar 15 2024 18:05:42 GMT+0000
[2024-03-15 18:05:42] production.DEBUG: Send notification:Fri Mar 15 2024 18:05:42 GMT+0000
[2024-03-15 18:05:42] production.DEBUG: Send notification:Fri Mar 15 2024 18:05:42 GMT+0000
[2024-03-15 18:05:42] production.DEBUG: Send notification:Fri Mar 15 2024 18:05:42 GMT+0000
[2024-03-15 18:05:42] production.DEBUG: Send notification:Fri Mar 15 2024 18:05:42 GMT+0000
[2024-03-15 18:05:42] production.DEBUG: Send notification:Fri Mar 15 2024 18:05:42 GMT+0000
[2024-03-15 18:05:42] production.DEBUG: Send notification:Fri Mar 15 2024 18:05:42 GMT+0000
[2024-03-15 18:05:42] production.DEBUG: Send notification:Fri Mar 15 2024 18:05:42 GMT+0000
[2024-03-15 18:05:42] production.DEBUG: Send notification:Fri Mar 15 2024 18:05:42 GMT+0000
[2024-03-15 18:05:42] production.DEBUG: Send notification:Fri Mar 15 2024 18:05:42 GMT+0000
[2024-03-15 18:05:42] production.DEBUG: Send notification:Fri Mar 15 2024 18:05:42 GMT+0000
[2024-03-15 18:05:42] production.DEBUG: Send notification:Fri Mar 15 2024 18:05:42 GMT+0000
[2024-03-15 18:05:42] production.DEBUG: Send notification:Fri Mar 15 2024 18:05:42 GMT+0000

#

longest one to send here was ~9 seconds after initial queue

heavy kettle Mar 15, 2024, 6:11 PM

#

Ok. Could you put a log into when the notification job actually runs?

#

How many notifications will come from a single long running job?

teal mica Mar 15, 2024, 6:12 PM

#

heavy kettle Ok. Could you put a log into when the notification job actually runs?

where would that log go? if not in __construct where

teal mica Mar 15, 2024, 6:12 PM

#

heavy kettle How many notifications will come from a single long running job?

it's extremely variable, anywhere from 50-500

heavy kettle Mar 15, 2024, 6:15 PM

#

teal mica where would that log go? if not in __construct where

May need to do this and listen for the event

#

https://laravel.com/docs/10.x/notifications#notification-sent-event
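
A minimal sketch of what listening for that event could look like, assuming a Laravel version that supports typed closure listeners and that the listener is registered in `AppServiceProvider` (the log message and context keys are made up for illustration). For a queued notification the `NotificationSent` event fires in the worker after the channel has delivered it, so its timestamp reflects when the job actually ran rather than when it was queued:

```php
<?php

namespace App\Providers;

use Illuminate\Notifications\Events\NotificationSent;
use Illuminate\Support\Facades\Event;
use Illuminate\Support\Facades\Log;
use Illuminate\Support\ServiceProvider;

class AppServiceProvider extends ServiceProvider
{
    public function boot(): void
    {
        // Log the moment a notification has actually been delivered by a channel.
        Event::listen(function (NotificationSent $event) {
            Log::debug('Notification sent', [
                'notification' => get_class($event->notification),
                'channel' => $event->channel,
                'at' => now()->toDateTimeString(),
            ]);
        });
    }
}
```

Comparing these entries with the `__construct` log lines gives the time each notification spent waiting in the queue.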

teal mica Mar 15, 2024, 6:16 PM

#

ah yeah that should work

#

deploying that

heavy kettle Mar 15, 2024, 6:18 PM

#

teal mica it's extremely variable, anywhere from 50-500

Ok....so a notification is taking around 500ms to send, if you have 8 notifications, then it would take 4 seconds to process if only 1 worker......right?

teal mica Mar 15, 2024, 6:18 PM

#

yeah one notification takes anywhere from 0.25 - 0.5 seconds to run

heavy kettle Mar 15, 2024, 6:19 PM

#

Ok....so, if you ended up getting 500, then 250 seconds to run wouldn't seem out of the ordinary?

teal mica Mar 15, 2024, 6:20 PM

#

well, it's supposed to scale up to 20 workers, assuming that each worker should be responsible for ~25 notifications

#

so in theory a maximum of like 12-15 seconds
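
The arithmetic in the last few messages can be written out as a rough estimate. This is only a sketch: it assumes every worker is already running, jobs are spread evenly, and each job takes the same time. The function name is hypothetical.

```php
<?php

/**
 * Rough lower bound on the time needed to drain a queue, assuming all
 * workers are already up and each handles an equal share of the jobs.
 */
function estimateDrainSeconds(int $jobs, float $secondsPerJob, int $workers): float
{
    if ($workers < 1) {
        throw new InvalidArgumentException('At least one worker is required.');
    }

    // Each worker processes ceil(jobs / workers) jobs back to back.
    $jobsPerWorker = (int) ceil($jobs / $workers);

    return $jobsPerWorker * $secondsPerJob;
}

echo estimateDrainSeconds(500, 0.5, 1), PHP_EOL;  // 250
echo estimateDrainSeconds(500, 0.5, 20), PHP_EOL; // 12.5
echo estimateDrainSeconds(500, 0.5, 10), PHP_EOL; // 25
```

With a single worker, 500 notifications at 0.5 seconds each come to about 250 seconds; with 20 workers the same batch comes to about 12.5 seconds, and with the 10 workers actually configured about 25 seconds.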

heavy kettle Mar 15, 2024, 6:20 PM

#

Ok...it's only set to 10, so that may be the first thing: 'maxProcesses' => 10,

#

The other is that you are using a scaling strategy....turn this off for a test to see

#

Basically, laravel only adds 1 worker every 3 seconds or something like this with scaling....so, if you have a very light load, you may only have 1 worker available, then start processing, 3 seconds later a new worker is spawned, etc
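
The ramp-up speed described here is controlled by the `balanceMaxShift` and `balanceCooldown` options that already appear in the config posted earlier. As far as I know, Horizon's documented example uses a shift of 1 and a cooldown of 3, which means at most one process is created or destroyed every three seconds. The sketch below is a deliberately simplified model of that ramp-up, not Horizon's actual balancing algorithm, and the function name is hypothetical:

```php
<?php

/**
 * Simplified model: starting from $start workers, add at most $maxShift
 * workers every $cooldown seconds until $target workers are running.
 * This approximates the ramp-up only; it is not Horizon's real algorithm.
 */
function secondsToReachWorkers(int $start, int $target, int $maxShift, int $cooldown): int
{
    if ($maxShift < 1) {
        throw new InvalidArgumentException('maxShift must be at least 1.');
    }

    // Number of scaling rounds needed to close the gap.
    $rounds = (int) ceil(max(0, $target - $start) / $maxShift);

    return $rounds * $cooldown;
}

echo secondsToReachWorkers(1, 10, 1, 3), PHP_EOL; // 27
echo secondsToReachWorkers(1, 10, 5, 2), PHP_EOL; // 4
```

Under this model, a shift of 1 with a cooldown of 3 needs roughly 27 seconds to go from 1 to 10 workers, while the posted production values (shift 5, cooldown 2) would need only about 4 seconds.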

teal mica Mar 15, 2024, 6:22 PM

#

so you're saying to try it with 20 workers on at all times yeah?

teal mica Mar 15, 2024, 6:23 PM

#

heavy kettle Ok...it's only set to 10, so that may be the first thing: 'maxProcesses' => 10,

yeah i had changed it at one point today but thats beside the point lol

heavy kettle Mar 15, 2024, 6:23 PM

#

Yes. Just to test, then you can tweak the balance settings if that is the issue

teal mica Mar 15, 2024, 6:23 PM

#

will update config now

heavy kettle Mar 15, 2024, 6:23 PM

#

balance => false

teal mica Mar 15, 2024, 6:24 PM

#

'balance' => false,
'minProcesses' => 20,
'maxProcesses' => 20,

heavy kettle Mar 15, 2024, 6:24 PM

#

Sorry. Hold on

teal mica Mar 15, 2024, 6:24 PM

#

that should work right?

heavy kettle Mar 15, 2024, 6:26 PM

#

Go ahead and try that. You don't have an autoScalingStrategy configured, so it may also be that....but try this first

teal mica Mar 15, 2024, 6:26 PM

#

oh interesting

#

okay i'll try what i posted and see

heavy kettle Mar 15, 2024, 6:27 PM

#

Once deployed, send a screenshot of horizon dashboard again

teal mica Mar 15, 2024, 6:27 PM

#

looks good

heavy kettle Mar 15, 2024, 6:28 PM

#

Cool. This should give us the answer

#

If they process fast, it's the autoScalingStrategy that is too slow to grow
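
If auto-balancing is turned back on after the test, the `autoScalingStrategy` option can be set explicitly on the supervisor. A sketch of what that fragment of `config/horizon.php` might look like; the specific values below are illustrative assumptions, not a recommendation from this thread:

```php
// config/horizon.php (fragment, illustrative values)
'supervisor-1' => [
    'connection' => 'redis',
    'queue' => ['default'],
    'balance' => 'auto',
    // 'time' scales on the estimated time to clear the queue,
    // 'size' scales on the number of jobs waiting in the queue.
    'autoScalingStrategy' => 'size',
    'minProcesses' => 1,
    'maxProcesses' => 20,
    'balanceMaxShift' => 5,
    'balanceCooldown' => 2,
],
```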

#

The other thing I would consider is putting the notifications into chunks. This would help a bunch
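
One possible reading of the chunking suggestion is to have each queued job handle a group of recipients instead of a single notification. A minimal plain-PHP sketch of the grouping step, assuming recipients are identified by integer IDs (the function name and the chunk size of 50 are hypothetical):

```php
<?php

/**
 * Split recipient IDs into fixed-size groups so that one queued job
 * can process a whole group rather than a single notification.
 *
 * @param  int[]  $recipientIds
 * @return int[][]
 */
function chunkRecipients(array $recipientIds, int $chunkSize): array
{
    if ($chunkSize < 1) {
        throw new InvalidArgumentException('Chunk size must be at least 1.');
    }

    return array_chunk($recipientIds, $chunkSize);
}

$chunks = chunkRecipients(range(1, 500), 50);

echo count($chunks), PHP_EOL;    // 10
echo count($chunks[9]), PHP_EOL; // 50
```

With 500 recipients and a chunk size of 50, the queue would hold 10 jobs instead of 500, at the cost of each job running longer.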

teal mica Mar 15, 2024, 6:31 PM

#

will let you know next time it triggers some notifications

#

also thanks for the help btw this has been something i've been scratching my head about for a while

heavy kettle Mar 15, 2024, 6:34 PM

#

Of course...happy to help

#

How big is this server? CPU? RAM?

teal mica Mar 15, 2024, 7:18 PM

#

it just sent one notification that seemingly took like a minute to get processed?

teal mica Mar 15, 2024, 7:18 PM

#

heavy kettle How big is this server? CPU? RAM?

4 cpu cores 8gb of ram

heavy kettle Mar 15, 2024, 7:20 PM

#

teal mica 4 cpu cores 8gb of ram

Can you send the full screen with jobs including that one?

heavy kettle Mar 15, 2024, 7:20 PM

#

teal mica 4 cpu cores 8gb of ram

Ok. Just making sure it wasn't 0.5 vCPU and 512 MB RAM.

teal mica Mar 15, 2024, 7:21 PM

#

just sent out a bunch that were all sent out very quick

[2024-03-15 19:26:40] production.DEBUG: Send notification:Fri Mar 15 2024 19:26:40 GMT+0000
[2024-03-15 19:26:41] production.DEBUG: Send notification:Fri Mar 15 2024 19:26:41 GMT+0000
[2024-03-15 19:26:43] production.DEBUG: Send notification:Fri Mar 15 2024 19:26:43 GMT+0000
[2024-03-15 19:26:44] production.DEBUG: Send notification:Fri Mar 15 2024 19:26:44 GMT+0000
[2024-03-15 19:26:44] production.DEBUG: Send notification:Fri Mar 15 2024 19:26:44 GMT+0000
[2024-03-15 19:26:44] production.DEBUG: Send notification:Fri Mar 15 2024 19:26:44 GMT+0000
[2024-03-15 19:26:44] production.DEBUG: Send notification:Fri Mar 15 2024 19:26:44 GMT+0000
[2024-03-15 19:26:44] production.DEBUG: Send notification:Fri Mar 15 2024 19:26:44 GMT+0000
[2024-03-15 19:26:44] production.DEBUG: Send notification:Fri Mar 15 2024 19:26:44 GMT+0000
[2024-03-15 19:26:44] production.DEBUG: Send notification:Fri Mar 15 2024 19:26:44 GMT+0000
[2024-03-15 19:26:44] production.DEBUG: Send notification:Fri Mar 15 2024 19:26:44 GMT+0000
[2024-03-15 19:26:44] production.DEBUG: Send notification:Fri Mar 15 2024 19:26:44 GMT+0000
[2024-03-15 19:26:44] production.DEBUG: Send notification:Fri Mar 15 2024 19:26:44 GMT+0000
[2024-03-15 19:26:44] production.DEBUG: Send notification:Fri Mar 15 2024 19:26:44 GMT+0000
[2024-03-15 19:26:44] production.DEBUG: Send notification:Fri Mar 15 2024 19:26:44 GMT+0000
[2024-03-15 19:26:44] production.DEBUG: Send notification:Fri Mar 15 2024 19:26:44 GMT+0000
[2024-03-15 19:26:44] production.DEBUG: Send notification:Fri Mar 15 2024 19:26:44 GMT+0000
[2024-03-15 19:26:44] production.DEBUG: Send notification:Fri Mar 15 2024 19:26:44 GMT+0000
[2024-03-15 19:26:44] production.DEBUG: Send notification:Fri Mar 15 2024 19:26:44 GMT+0000
[2024-03-15 19:26:44] production.DEBUG: Send notification:Fri Mar 15 2024 19:26:44 GMT+0000

#

but they sat in the queue for a minute

#

another wave just went through a minute after that too

#

these were all queued for 2 minutes

#

so it seems like it does 20 (after waiting an initial minute), then waits another minute then does 20 more

heavy kettle Mar 15, 2024, 7:45 PM

#

Hmmmm....do you have any throttling in place?

teal mica Mar 15, 2024, 8:03 PM

#

nothing that i'm aware of, at least nothing i've done intentionally

#

if it helps i can grant temp access to the repo or something?

heavy kettle Mar 15, 2024, 8:31 PM

#

Sure. I'll take a look bmckay959
