Hi, I’m trying to use runpod for a large distributed fault-tolerant inference job. My containers pull from a global work queue and exit when they don’t see any work for a while. However, it seems like runpod’s default behavior is to simply restart the pod, which means I have to micromanage when to terminate my pods. Can I disable the automatic restarts and just have the pod terminate when PID 1 exits, like running a docker container normally does?
#Disable pod restarts
Having similar problem
How do you exit the pod? Is it by terminating the pod?
I recommend just reading the RUNPOD_POD_ID environment variable and using it to terminate the pod.
RUNPOD_POD_ID
It's set in every pod.
Do you mean I should add, for example, an API request toward the end of my job to terminate the pod?
When you want to terminate the pod, yes, just send a terminate request to the RunPod API, either GraphQL or REST.
Then just terminate using the pod ID.
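For anyone finding this later, here is a rough sketch of what that API call could look like from inside the pod. The endpoint URL and the podTerminate mutation are written as I recall them from RunPod's GraphQL docs, so check them against the current docs before relying on this. RUNPOD_API_KEY is an assumption here: substitute whatever variable holds a key with permission to terminate the pod. The function names are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: terminate the current pod through RunPod's GraphQL API.
# Assumptions (verify against current RunPod docs): the endpoint URL,
# the podTerminate mutation, and RUNPOD_API_KEY holding a usable key.

# Build the GraphQL request body for terminating one pod (hypothetical helper).
terminate_payload() {
  printf '{"query": "mutation { podTerminate(input: {podId: \\"%s\\"}) }"}' "$1"
}

# Send the terminate request for the pod this script is running in.
terminate_via_api() {
  curl --silent --show-error --fail \
    --request POST \
    --header 'content-type: application/json' \
    --url "https://api.runpod.io/graphql?api_key=${RUNPOD_API_KEY}" \
    --data "$(terminate_payload "$RUNPOD_POD_ID")"
}
```

Calling terminate_via_api as the last step of the job, once the queue has been empty for long enough, would then remove the pod instead of letting it restart.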
If you are using the official image,
it should have runpodctl built in,
with a proper pod-scoped API key.
So it should be able to kill itself with:
runpodctl remove pod "$RUNPOD_POD_ID"
I've confirmed that it works with the official RunPod PyTorch image.
And I have used it in ffmpeg testing across 400+ pods.
Secure cloud had zero issues
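To tie this back to the original question, here is a minimal sketch of an entrypoint wrapper that runs the worker and then removes the pod once the worker exits, which approximates "terminate when PID 1 exits". It assumes runpodctl is on the PATH and already configured with a key that can remove the pod. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: run a worker command, then remove this pod when it exits.
# Assumes runpodctl is installed and configured inside the image.

# Remove the pod this script is running in.
terminate_self() {
  # RUNPOD_POD_ID is the per-pod environment variable mentioned in this thread.
  if [ -z "${RUNPOD_POD_ID:-}" ]; then
    echo "RUNPOD_POD_ID is not set; refusing to call runpodctl" >&2
    return 1
  fi
  runpodctl remove pod "$RUNPOD_POD_ID"
}

# Run the given command, remove the pod afterwards, and
# return the command's own exit status.
run_then_terminate() {
  local status=0
  "$@" || status=$?
  terminate_self || echo "self-termination failed" >&2
  return "$status"
}
```

To use it, append a line such as run_then_terminate python worker.py at the end of the script (worker.py is a placeholder for your own queue worker) and make the script the container's start command. The pod is removed whether the worker exits cleanly or with an error, so persist any logs you need before the worker exits.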
Awesome, I'm not using the official image, but I will bake that into my own.
https://github.com/runpod/runpodctl/issues/202 The official runpodctl install script currently fails, but I managed to install it using a workaround.
Official install script fails wget -qO- cli.runpod.net | sudo bash Installing runpodctl... All system requirements satisfied. Latest version of runpodctl: v1.14.11 Failed to download runpodctl. Thi...
There's a PR on the linked issue that fixes it.
I approved this PR, thank you for showing me it exists.