#Jobs not being processed in a timely manner

1 messages · Page 1 of 1 (latest)

teal mica
#

So I'm using Horizon to handle my jobs in production and at one time I'm submitting 50-70 jobs into the queue, some of them are taking as long as 3-4 minutes to actually enter the queue, despite the job itself taking on average 0.2-0.5 seconds to run.

Here's my Horizon config in case it's relevant, I have it maxing out at 30 processes

'supervisor-1' => [
    'minProcesses' => 1,
    'maxProcesses' => 30,
    'balanceMaxShift' => 5,
    'balanceCooldown' => 5,
    'tries' => 5,
],
rapid orchid
#

Are there any other jobs being processed? And what do you mean by "enter the queue"? Jobs are queued immediately when you dispatch them.

teal mica
#

I have one long running job (on a different queue) that takes about a minute, it kicks off these smaller ones that are user notifications

heavy kettle
#

Is the same supervisor running both queues? Can you show the whole config in horizon.php under the "Queue Worker Configuration" section?

teal mica
# heavy kettle Is the same supervisor running both queues? Can you show the whole config in hor...

i'm not sure how to tell that, I think they're separate though?

'defaults' => [
        'supervisor-1' => [
            'connection' => 'redis',
            'queue' => ['default'],
            'balance' => 'auto',
            'maxProcesses' => 1,
            'maxTime' => 0,
            'maxJobs' => 0,
            'memory' => 128,
            'tries' => 1,
            'timeout' => 600,
            'nice' => 0,
        ],
    ],

    'environments' => [
        'production' => [
            'supervisor-1' => [
                'minProcesses' => 1,
                'maxProcesses' => 10,
                'balanceMaxShift' => 5,
                'balanceCooldown' => 2,
                'tries' => 5,
            ],
            'supervisor-long-running' => [
                'connection' => 'redis-long-running',
                'queue' => [
                    'long-running-queue',
                ],
                'balance' => 'auto',
                'minProcesses' => 2,
                'maxProcesses' => 5,
                'tries' => 5,
                'timeout' => 900,
                'balanceMaxShift' => 5,
                'balanceCooldown' => 2
            ],
        ],

        'local' => [
            'supervisor-1' => [
                'maxProcesses' => 3,
            ],
            'supervisor-long-running' => [
                'connection' => 'redis-long-running',
                'queue' => [
                    'long-running-queue',
                ],
                'balance' => 'simple',
                'processes' => 9,
                'tries' => 2,
                'timeout' => 900,
            ],
        ],
    ],
#

The main job runs on the long-running-queue, it sends out notifications that get queued on the default queue, where the delay is happening

#

based on the timings of the notifications, i'm not entirely convinced my default queue is scaling up at all...

heavy kettle
#

Looks like an incomplete config file maybe....

You are setting up 'supervisor-1', but not 'supervisor-long-running' up top...I would start with putting everything into the "defaults" then modify per environment to keep it cleaner

heavy kettle
#

Try that...it will clean up the config, then may make it easier to debug

teal mica
#
'defaults' => [
        'supervisor-1' => [
            'connection' => 'redis',
            'queue' => ['default'],
            'balance' => 'auto',
            'minProcesses' => 1,
            'balanceMaxShift' => 5,
            'balanceCooldown' => 2,
            'maxTime' => 0,
            'maxJobs' => 0,
            'memory' => 128,
            'tries' => 5,
            'timeout' => 600,
            'nice' => 0,
        ],
        'supervisor-long-running' => [
            'connection' => 'redis-long-running',
            'queue' => ['long-running-queue'],
            'balance' => 'auto',
            'minProcesses' => 1,
            'balanceMaxShift' => 5,
            'balanceCooldown' => 2,
            'timeout' => 900,
            'maxTime' => 0,
            'maxJobs' => 0,
            'memory' => 128,
            'tries' => 1,
            'nice' => 0,
        ]
    ],

    'environments' => [
        'production' => [
            'supervisor-1' => [
                'maxProcesses' => 10,
            ],
            'supervisor-long-running' => [
                'maxProcesses' => 5,
            ],
        ],

        'dev' => [
            'supervisor-1' => [
                'maxProcesses' => 5,
            ],
            'supervisor-long-running' => [
                'supervisor-long-running' => [
                    'maxProcesses' => 2,
                ],
            ],
        ],

        'local' => [
            'supervisor-1' => [
                'maxProcesses' => 3,
            ],
            'supervisor-long-running' => [
                'supervisor-long-running' => [
                    'maxProcesses' => 1,
                ],
            ],
        ],
    ],
#

hopefully that's a bit better to read

#

just deployed it, will be monitoring things for a bit

heavy kettle
#

Can you send a screenshot of horizon 'home' page that shows # of processes, etc

teal mica
#

sure thing

heavy kettle
#

How are you dispatching those jobs?

teal mica
# heavy kettle How are you dispatching those jobs?

the main longer running job triggers notifications to users, these notifications have shouldqueue on them

$subscription->user->notify(new NewMoviesNotification($newMovies[$theater->id], $theater, $subscription));

class NewMoviesNotification extends Notification implements ShouldQueue

heavy kettle
#

Ok. So it's the notification ones that are the issue, correct?

teal mica
#

yeah, those are the ones that are waiting in the queue too long

heavy kettle
#

How are you dispatching the long running job?

#

I'm wondering if it is actually running on the default queue

teal mica
#

that one is dispatched via scheduler, looks like this

$schedule->call(function () {
    UpdateTheatersPaid::dispatch()->onQueue('long-running-queue');
})->everyFiveMinutes();
heavy kettle
#

In config/queue.php what is "retry_after" set to?

#

If you have long running jobs, this setting can really throw some weird occurances if it's set too short

teal mica
#

looks like for default its 900, the long running queue is 1200

'redis' => [
    'driver' => 'redis',
    'connection' => 'default',
    'queue' => env('REDIS_QUEUE', 'default'),
    'retry_after' => 900,
    'block_for' => null,
    'after_commit' => false,
],
'redis-long-running' => [
    'driver' => 'redis',
    'connection' => 'default',
    'queue' => 'default',
    'retry_after' => 1200,
],
heavy kettle
#

Why do you have a seperate connection to 'redis-long-running' here?

teal mica
#

good question, i set this up forever ago so honestly I don't remember why it's like this

heavy kettle
#

Ok. That seems a bit odd to me, but nto sure if that would cause the issue or not since it's still going to the same driver/connection from there.

#

Can you confirm the "notify" jobs only run AFTER thelong running is complete?

teal mica
#

it dispatches anywhere from 50-500 in one run

heavy kettle
#

Ok. But what I want to know is NOT when the first is dispatched, but does it wait to actually process until the long running is done? Or do they start processing before it's done?

teal mica
#

i haven't explicitly added anything to tell it to wait, so it is processing them while the long running job is still running

heavy kettle
#

Right, but I'm thinking about debugging and it would be good to know.

What are your queue settings inside the .ENV? Don't need UN and PW

#

Specifically QUEUE_CONNECTION

teal mica
#
QUEUE_CONNECTION=redis
REDIS_HOST=127.0.0.1
REDIS_PORT=6379
heavy kettle
#

Ok...just making sure it wasn't 'sync'

teal mica
#

lol that would be a silly mistake

heavy kettle
#

Haha...I've done worse. 😂

teal mica
#

would it make sense to hold the notifications until the main job is finished? or is that kinda irrelevant

heavy kettle
#

That's irrelevant, and IMO, defeats the purpose of the queue system....it's designed to allow this very thing.

teal mica
#

yeah that was my thought as well

heavy kettle
#

Do you have this setup locally where you can test something?

#

At the top of your long running task add this line:

Log::debug('Start long runner:'.now()->toString())

At the end:

Log::debug('End long runner:'.now()->toString())

At top of notification job:

Log::debug('Send notification:'.now()->toString())

#

Let this run and share the log

teal mica
#

I'll just add this to production, it's really hard for me to test this stuff locally due to data differences

#

added them, going to monitor the logs for a bit now

#

(this may take an indeterminate amount of time to come back, these notifications are not consistent throughout the day)

teal mica
#

looks like __construct gets triggered when job is initially queued, times are identical

[2024-03-15 18:05:42] production.DEBUG: Send notification:Fri Mar 15 2024 18:05:42 GMT+0000
[2024-03-15 18:05:42] production.DEBUG: Send notification:Fri Mar 15 2024 18:05:42 GMT+0000
[2024-03-15 18:05:42] production.DEBUG: Send notification:Fri Mar 15 2024 18:05:42 GMT+0000
[2024-03-15 18:05:42] production.DEBUG: Send notification:Fri Mar 15 2024 18:05:42 GMT+0000
[2024-03-15 18:05:42] production.DEBUG: Send notification:Fri Mar 15 2024 18:05:42 GMT+0000
[2024-03-15 18:05:42] production.DEBUG: Send notification:Fri Mar 15 2024 18:05:42 GMT+0000
[2024-03-15 18:05:42] production.DEBUG: Send notification:Fri Mar 15 2024 18:05:42 GMT+0000
[2024-03-15 18:05:42] production.DEBUG: Send notification:Fri Mar 15 2024 18:05:42 GMT+0000
[2024-03-15 18:05:42] production.DEBUG: Send notification:Fri Mar 15 2024 18:05:42 GMT+0000
[2024-03-15 18:05:42] production.DEBUG: Send notification:Fri Mar 15 2024 18:05:42 GMT+0000
[2024-03-15 18:05:42] production.DEBUG: Send notification:Fri Mar 15 2024 18:05:42 GMT+0000
[2024-03-15 18:05:42] production.DEBUG: Send notification:Fri Mar 15 2024 18:05:42 GMT+0000
[2024-03-15 18:05:42] production.DEBUG: Send notification:Fri Mar 15 2024 18:05:42 GMT+0000
[2024-03-15 18:05:42] production.DEBUG: Send notification:Fri Mar 15 2024 18:05:42 GMT+0000
[2024-03-15 18:05:42] production.DEBUG: Send notification:Fri Mar 15 2024 18:05:42 GMT+0000
[2024-03-15 18:05:42] production.DEBUG: Send notification:Fri Mar 15 2024 18:05:42 GMT+0000
[2024-03-15 18:05:42] production.DEBUG: Send notification:Fri Mar 15 2024 18:05:42 GMT+0000
[2024-03-15 18:05:42] production.DEBUG: Send notification:Fri Mar 15 2024 18:05:42 GMT+0000
[2024-03-15 18:05:42] production.DEBUG: Send notification:Fri Mar 15 2024 18:05:42 GMT+0000
[2024-03-15 18:05:42] production.DEBUG: Send notification:Fri Mar 15 2024 18:05:42 GMT+0000
[2024-03-15 18:05:42] production.DEBUG: Send notification:Fri Mar 15 2024 18:05:42 GMT+0000
[2024-03-15 18:05:42] production.DEBUG: Send notification:Fri Mar 15 2024 18:05:42 GMT+0000
[2024-03-15 18:05:42] production.DEBUG: Send notification:Fri Mar 15 2024 18:05:42 GMT+0000
[2024-03-15 18:05:42] production.DEBUG: Send notification:Fri Mar 15 2024 18:05:42 GMT+0000
[2024-03-15 18:05:42] production.DEBUG: Send notification:Fri Mar 15 2024 18:05:42 GMT+0000
[2024-03-15 18:05:42] production.DEBUG: Send notification:Fri Mar 15 2024 18:05:42 GMT+0000
[2024-03-15 18:05:42] production.DEBUG: Send notification:Fri Mar 15 2024 18:05:42 GMT+0000
#

longest one to send here was ~9 seconds after initial queue

heavy kettle
#

Ok. Could you put a log into when the notification job actually runs?

#

How many notifications will come from a single long running job?

teal mica
teal mica
teal mica
#

ah yeah that should work

#

deploying that

heavy kettle
teal mica
#

yeah one notification takes anywhere from 0.25 - 0.5 seconds to run

heavy kettle
#

Ok....so, if you ended up getting 500, then 250 seconds to run wouldn't seem out of the ordinary?

teal mica
#

well, it's supposed to scale up to 20 workers, assuming that each worker should be responsible for ~25 notifications

#

so in theory a maximum of like 12-15 seconds

heavy kettle
#

Ok...it's only set to 10, so that may be the first thing.: 'maxProcesses' => 10,

#

The other is that you are using a scaling strategy....turn this off for a test to see

#

Basically, laravel only adds 1 worker every 3 seconds or something like this with scaling....so, if you have a very light load, you may only have 1 worker available, then start processing, 3 seconds later a new worker is spawned, etc

teal mica
#

so you're saying to try it with 20 workers on at all times yeah?

teal mica
heavy kettle
#

Yes. Just to test, then you can tweak the balance settings if that is the issue

teal mica
#

will update config now

heavy kettle
#

balance => false

teal mica
#

'balance' => false,
'minProcesses' => 20,
'maxProcesses' => 20,

heavy kettle
#

Sorry. Hold on

teal mica
#

that should work right?

heavy kettle
#

Go ahead and try that. You don't have an autoScalingStrategy configured, so it may also be that....but try this first

teal mica
#

oh interesting

#

okay i'll try what i posted and see

heavy kettle
#

Once deployed, send a screenshot of horizon dashboard again

teal mica
#

looks good

heavy kettle
#

Cool. This should give us the answer

#

If they process fast, it's the autoScalingStrategy that is too slow to grow

#

The other thing I would consider is putting the notifications into chunks. This would help a bunch

teal mica
#

will let you know next time it triggers some notifications

#

also thanks for the help btw this has been something i've been scratching my head about for a while

heavy kettle
#

Of course...happy to help

#

How big is this server? CPU? RAM?

teal mica
#

it just sent one notification that seemingly took like a minute to get processed?

teal mica
heavy kettle
heavy kettle
teal mica
#

just sent out a bunch that were all sent out very quick

[2024-03-15 19:26:40] production.DEBUG: Send notification:Fri Mar 15 2024 19:26:40 GMT+0000
[2024-03-15 19:26:41] production.DEBUG: Send notification:Fri Mar 15 2024 19:26:41 GMT+0000
[2024-03-15 19:26:43] production.DEBUG: Send notification:Fri Mar 15 2024 19:26:43 GMT+0000
[2024-03-15 19:26:44] production.DEBUG: Send notification:Fri Mar 15 2024 19:26:44 GMT+0000
[2024-03-15 19:26:44] production.DEBUG: Send notification:Fri Mar 15 2024 19:26:44 GMT+0000
[2024-03-15 19:26:44] production.DEBUG: Send notification:Fri Mar 15 2024 19:26:44 GMT+0000
[2024-03-15 19:26:44] production.DEBUG: Send notification:Fri Mar 15 2024 19:26:44 GMT+0000
[2024-03-15 19:26:44] production.DEBUG: Send notification:Fri Mar 15 2024 19:26:44 GMT+0000
[2024-03-15 19:26:44] production.DEBUG: Send notification:Fri Mar 15 2024 19:26:44 GMT+0000
[2024-03-15 19:26:44] production.DEBUG: Send notification:Fri Mar 15 2024 19:26:44 GMT+0000
[2024-03-15 19:26:44] production.DEBUG: Send notification:Fri Mar 15 2024 19:26:44 GMT+0000
[2024-03-15 19:26:44] production.DEBUG: Send notification:Fri Mar 15 2024 19:26:44 GMT+0000
[2024-03-15 19:26:44] production.DEBUG: Send notification:Fri Mar 15 2024 19:26:44 GMT+0000
[2024-03-15 19:26:44] production.DEBUG: Send notification:Fri Mar 15 2024 19:26:44 GMT+0000
[2024-03-15 19:26:44] production.DEBUG: Send notification:Fri Mar 15 2024 19:26:44 GMT+0000
[2024-03-15 19:26:44] production.DEBUG: Send notification:Fri Mar 15 2024 19:26:44 GMT+0000
[2024-03-15 19:26:44] production.DEBUG: Send notification:Fri Mar 15 2024 19:26:44 GMT+0000
[2024-03-15 19:26:44] production.DEBUG: Send notification:Fri Mar 15 2024 19:26:44 GMT+0000
[2024-03-15 19:26:44] production.DEBUG: Send notification:Fri Mar 15 2024 19:26:44 GMT+0000
[2024-03-15 19:26:44] production.DEBUG: Send notification:Fri Mar 15 2024 19:26:44 GMT+0000

#

but they sat in the queue for a minute

#

another wave just went through a minute after that too

#

these were all queued for 2 minutes

#

so it seems like it does 20 (after waiting an initial minute), then waits another minute then does 20 more

heavy kettle
#

Hmmmm....do you have any throttling in place?

teal mica
#

nothing that i'm aware of, at least nothing i've done intentionally

#

if it helps i can grant temp access to the repo or something?

heavy kettle
#

Sure. I'll take a look bmckay959