#worker keeps dying while training a lora model
Hmm yeah, I wonder if this is normal. The idle timeout doesn't seem to come into play, since the worker is active as it's supposed to be.
Removing the execution timeout fixed it.
@dull scaffold this may be a bug in RunPod
@paper mason would you mind providing the endpoint ID or some more info about the Docker image you used?
I'm not sure it's a bug. I think it worked as intended, since I had set the execution timeout. The endpoint ID is z398ywur6g1041. The Docker image is a custom one I made for training a Flux LoRA model.
I just thought it was unexpected because I don't remember checking that box. I think it's checked by default when you create a worker.
This behavior is intentional. The execution timeout is designed to prevent a worker from running indefinitely, which could happen if there’s a bug in the code or a long-running process that could potentially drain all your credits.
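For a long training job, one option besides removing the timeout is to raise it for that job only. The snippet below is a rough sketch, not an official client: it assumes the per-request `policy.executionTimeout` field (in milliseconds) described in RunPod's execution policy docs, so check the current docs before relying on it. `build_run_payload` and the `steps` input key are made-up names for illustration, and nothing is sent over the network here.

```python
def build_run_payload(job_input: dict, timeout_minutes: float) -> dict:
    """Build a request body for a serverless job with a per-job execution timeout.

    Assumption: the endpoint accepts a "policy" object whose
    "executionTimeout" value is given in milliseconds.
    """
    if timeout_minutes <= 0:
        raise ValueError("timeout_minutes must be positive")
    return {
        "input": job_input,
        # Convert minutes to milliseconds for the assumed policy field.
        "policy": {"executionTimeout": int(timeout_minutes * 60 * 1000)},
    }


# Hypothetical input for a LoRA training run, allowed up to two hours.
payload = build_run_payload({"steps": 2000}, timeout_minutes=120)
```

Keeping a generous but finite timeout still protects your credits if the training script hangs, which is the point of the setting.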