#Disable pod restarts


vivid pasture
#

Hi, I’m trying to use runpod for a large distributed fault-tolerant inference job. My containers pull from a global work queue and exit when they don’t see any work for a while. However, it seems like runpod’s default behavior is to simply restart the pod, which means I have to micromanage when to terminate my pods. Can I disable the automatic restarts and just have the pod terminate when PID 1 exits, like running a docker container normally does?

noble rivet
#

Having similar problem

latent tree
#

I recommend reading the env var "RUNPOD_POD_ID" and using it to terminate the pod

#

RUNPOD_POD_ID

#

It's set in every pod

noble rivet
#

Do you mean I should add, for example, an API request at the end of my job to terminate the pod?

latent tree
#

Yes, when you want to terminate the pod, just send a terminate request to the RunPod API (GraphQL or REST)

#

Then just terminate with pod id
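
A rough sketch of what that request could look like from inside the pod, using the GraphQL route. The `podTerminate` mutation and the endpoint URL are taken from my reading of the RunPod API docs and should be checked against the current version; `RUNPOD_API_KEY` is assumed to hold a key that is allowed to terminate this pod, and the function names are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: terminate the current pod through the GraphQL API.
# Assumptions: the podTerminate mutation and the endpoint below match the
# current RunPod API docs, and RUNPOD_API_KEY is allowed to terminate this pod.

# Build the JSON body for the terminate mutation (hypothetical helper).
terminate_payload() {
  printf '{"query": "mutation { podTerminate(input: {podId: \\"%s\\"}) }"}' "$1"
}

# Send the request; --fail makes curl exit non-zero on HTTP errors.
terminate_via_api() {
  curl --fail --silent --show-error \
    --request POST \
    --header 'content-type: application/json' \
    --url "https://api.runpod.io/graphql?api_key=${RUNPOD_API_KEY}" \
    --data "$(terminate_payload "$RUNPOD_POD_ID")"
}
```

Calling `terminate_via_api` as the last step of the job would then take the place of a manual terminate.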

verbal elk
#

If you are using the official image

#

It should have runpodctl built in

#

With a proper pod-scoped API key

#

So it should be able to kill itself with

runpodctl remove pod "$RUNPOD_POD_ID"
#

I've confirmed that it works with the official RunPod PyTorch image

#

And have used it in ffmpeg testing for 400+ pods

#

Secure cloud had zero issues
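
For the original question (pod should go away when the main process exits), the runpodctl approach above could be wrapped around the worker roughly like this. It is only a sketch: the function names and the worker command are placeholders, and it assumes `runpodctl` is on the PATH and already has credentials that may remove this pod.

```shell
#!/usr/bin/env bash
# Sketch of an entrypoint wrapper: run the worker, then remove this pod.
# Assumes runpodctl is installed and configured inside the container.

# Remove the current pod; refuse to run if RUNPOD_POD_ID is missing.
terminate_self() {
  if [ -z "${RUNPOD_POD_ID:-}" ]; then
    echo "RUNPOD_POD_ID is not set; not running on a pod?" >&2
    return 1
  fi
  runpodctl remove pod "$RUNPOD_POD_ID"
}

# Run the given command, then remove the pod whether it succeeded or not.
# Returns the command's own exit status.
run_then_terminate() {
  "$@"
  _worker_status=$?
  terminate_self || echo "self-termination failed" >&2
  return "$_worker_status"
}

# Usage (hypothetical worker command):
# run_then_terminate python worker.py
```

Removing the pod even when the worker fails avoids the restart loop, at the cost of losing the container for debugging; gate `terminate_self` on the exit status if you want failed pods to stick around.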

noble rivet
#

awesome, I'm not using the official image but i will bake that into my own

noble rivet
#

There's a PR on the linked issue which fixes it

spiral moon