#worker keeps dying while training a LoRA model


paper mason
#

Even after setting the worker to active, it keeps dying after about 2 minutes. Is there a way to prevent this?

karmic palm
paper mason
#

Removing the execution timeout fixed it.

karmic palm
#

@dull scaffold this may be a bug in RunPod

dull scaffold
#

@paper mason would you mind providing the endpoint ID or some more info about the Docker image you're using?

paper mason
#

I'm not sure it's a bug. I think it worked as intended, since the execution timeout was set. The endpoint ID is z398ywur6g1041. The Docker image is a custom one I made for training a FLUX LoRA model.

#

I just thought it was unexpected because I don't remember checking that box. I think it's checked by default when you create a worker.

earnest ore
#

This behavior is intentional. The execution timeout stops a worker from running indefinitely, for example because of a bug in the code or a runaway long-running process, which could otherwise drain all your credits.
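
For long training jobs, an alternative to removing the timeout entirely may be to raise it per job. Below is a minimal sketch, assuming the serverless `/run` endpoint accepts a `policy.executionTimeout` value in milliseconds (worth checking against the current RunPod docs before relying on it). The helper `build_run_request`, the placeholder endpoint ID, and the `steps` input field are hypothetical and only for illustration. The snippet only builds the request and does not send it.

```python
import json

# Assumed base URL for RunPod serverless endpoints; verify against the docs.
RUNPOD_API_BASE = "https://api.runpod.ai/v2"


def build_run_request(endpoint_id, job_input, timeout_minutes):
    """Build (url, payload) for an async job with a per-job execution timeout.

    Hypothetical helper: it only assembles the request, it does not send it.
    """
    if timeout_minutes <= 0:
        raise ValueError("timeout_minutes must be positive")
    url = f"{RUNPOD_API_BASE}/{endpoint_id}/run"
    payload = {
        "input": job_input,
        # Assumption: the timeout is expressed in milliseconds.
        "policy": {"executionTimeout": int(timeout_minutes * 60 * 1000)},
    }
    return url, payload


# Example: allow a LoRA training job up to 3 hours instead of the default limit.
# "steps" is a made-up input field; use whatever your handler expects.
url, payload = build_run_request(
    "your-endpoint-id", {"steps": 2000}, timeout_minutes=180
)
print(url)
print(json.dumps(payload, indent=2))
```

You would POST the payload to the URL with your API key in the `Authorization` header. Keeping a generous but finite timeout preserves the protection against a stuck job draining your credits.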