#If a DJANGO process is failing because of the workers

lean sail
#

I do some maths in my Django app on Railway. It's complicated, and CPU bound. I have the current Developer account (the $5 one).

The maths fails once deployed on some runs with this error:

[2023-06-05 14:02:41 +0000] [74] [CRITICAL] WORKER TIMEOUT (pid:140)
[2023-06-05 14:02:41 +0000] [140] [INFO] Worker exiting (pid: 140)
[2023-06-05 14:02:42 +0000] [74] [WARNING] Worker with pid 140 was terminated due to signal 9

Similar to this SO

I'm not doing anything smart right now like managing workers, async things or using specific multi-threading libs.

I'm not sure how to solve this issue (without further reducing the data set size, which is already as small as possible). I currently have 2 possible ideas:

  1. Follow the advice on the SO, to increase the allocated worker memory
  2. Add a FastAPI service to the railway deployment and somehow offload the process to there

Can anyone share any suggestions to solve this, or assess the likelihood of either of my plans working?
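
For context, the `WORKER TIMEOUT` line in the log above is Gunicorn's master process killing a worker that stayed silent longer than its `timeout` setting (30 seconds by default), which is what a long CPU-bound request does to a sync worker. A minimal config sketch that raises the limit could look like the following. The specific values are illustrative assumptions, not tuned recommendations, and a longer timeout only hides the slow computation rather than fixing it.

```python
# gunicorn.conf.py: a sketch; every value below is an assumption to adjust.

# Seconds a worker may stay silent before the master kills and restarts it.
# Gunicorn's default is 30.
timeout = 120

# Seconds a worker gets to finish in-flight requests after a restart signal.
graceful_timeout = 30

# Number of worker processes handling requests.
workers = 2
```

It would be passed to Gunicorn with `gunicorn -c gunicorn.conf.py myproject.wsgi`, where `myproject` is a placeholder for the real Django project module.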

obtuse wolfBOT
#

Project ID: 8a43b0e6-5720-40f3-8ee1-27f04ba76646

lean sail
#

8a43b0e6-5720-40f3-8ee1-27f04ba76646

ruby gulch
#

can you show me a screenshot of your service metrics during the times the worker gets killed?

ruby gulch
#

when did you upgrade to the dev plan

lean sail
#

A few days ago, last week, around wed

ruby gulch
#

okay good so your service would be on the dev plan

#

show me the logs with the kill message

lean sail
# ruby gulch show me the logs with the kill message

[2023-06-05 14:02:41 +0000] [74] [CRITICAL] WORKER TIMEOUT (pid:140)
[2023-06-05 14:02:41 +0000] [140] [INFO] Worker exiting (pid: 140)
[2023-06-05 14:02:42 +0000] [74] [WARNING] Worker with pid 140 was terminated due to signal 9
[2023-06-05 14:02:42 +0000] [205] [INFO] Booting worker with pid: 205
<array_function internals>:200: RuntimeWarning: invalid value encountered in cast

ruby gulch
#

how big would you say your dataset is?

lean sail
ruby gulch
#

cool

lean sail
# ruby gulch how big would you say your dataset is?

Run locally, the answer to your question is:

engi-alpha-django-web-1  | <class 'pandas.core.frame.DataFrame'>
engi-alpha-django-web-1  | RangeIndex: 336 entries, 0 to 335
engi-alpha-django-web-1  | Data columns (total 2 columns):
engi-alpha-django-web-1  |  #   Column  Non-Null Count  Dtype              
engi-alpha-django-web-1  | ---  ------  --------------  -----              
engi-alpha-django-web-1  |  0   ds      336 non-null    datetime64[ns, UTC]
engi-alpha-django-web-1  |  1   y       336 non-null    float64            
engi-alpha-django-web-1  | dtypes: datetime64[ns, UTC](1), float64(1)
engi-alpha-django-web-1  | memory usage: 5.4 KB

So not big. The answer to my problem, however, is not the data size: some of the input data (particularly the sets that fail) are mostly 0 values, so the maths goes ballistic. It can technically cope with 0, but the way the maths gets executed becomes too intensive, and that's where the timeout occurs on the deploy.
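
An upfront check for that mostly-zero case could be sketched roughly as below. The function names and the 0.5 threshold are hypothetical, chosen only for illustration; the right cut-off depends on the maths being run. The functions take any iterable of numbers, so a pandas column such as `df["y"]` should work as input.

```python
from typing import Iterable


def zero_fraction(values: Iterable[float]) -> float:
    """Return the share of entries equal to 0 (0.0 for empty input)."""
    items = list(values)
    if not items:
        return 0.0
    zeros = sum(1 for v in items if v == 0)
    return zeros / len(items)


def check_input(values: Iterable[float], max_zero_fraction: float = 0.5) -> None:
    """Raise ValueError when more than max_zero_fraction of the values are 0.

    The 0.5 default is an assumed placeholder, not a derived limit.
    """
    frac = zero_fraction(values)
    if frac > max_zero_fraction:
        raise ValueError(
            f"{frac:.0%} of the values are zero; refusing to run the computation"
        )
```

Calling `check_input(df["y"])` in the view before the heavy computation starts would let the app return a validation error straight away instead of running until the worker is killed.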

So that's the core issue, and I guess I need to do better UI/upfront checks for that. But my original question is still floating in my head: if I spun up another Railway service just to do the maths, and then got Django to call it via an API, how does Railway allocate resources between those 2 services? Would you expect a performance gain from that architecture, or, because everything is fundamentally using the same resources, would it not necessarily help? (This might just be a 'how long is a piece of string / suck it and see' answer, but experienced insight might save me some labour 🙂)

ruby gulch
#

every service gets 8 vCPU and 8 GB of memory, so offloading work to a worker service is totally a viable option
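
The Django side of that offloading could be sketched as a small HTTP client like the one below. The function names, the JSON payload shape and the 60 second timeout are all assumptions for illustration; nothing here is a Railway or FastAPI API, only the Python standard library.

```python
import json
import urllib.request


def build_request(url: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request for the (hypothetical) maths service."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_maths_service(url: str, payload: dict, timeout: float = 60.0) -> dict:
    """Send the payload to the worker service and return its JSON reply.

    urlopen raises an exception if the call fails or takes longer than
    `timeout` seconds, so the caller can turn that into an error response.
    """
    request = build_request(url, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

One caveat with this design: the Django worker still blocks while it waits for the reply, so the client timeout would need to stay below Gunicorn's own worker timeout, otherwise the same `WORKER TIMEOUT` kill can still happen on the Django side.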

lean sail
ruby gulch
#

yep, but if you actually use that many resources constantly you will face a very hefty bill

lean sail
ruby gulch
#

no problem, come back if you need any more help 🙂