#My pod has randomly crashed several times today, and I've received emails from Runpod about the issues.

peak magnet
#

Today, my pod has crashed a few times, to the point where I'm receiving emails from Runpod about the issues. How can I fix this?

modest tokenBOT
#

To help others find answers, you can mark your question as solved via Right click solution message -> Apps -> ✅ Mark Solution

thin axle
#

Could you provide some more information?

worldly grove
#

Maybe some logs or a screenshot of your pod tab would help.

peak magnet
#

Here's a snapshot of the audit logs, where I'm stopping and starting pods that have disconnected in the middle of processes.

thin axle
#

@peak magnet I deleted your post because it leaked your email. Are you running ComfyUI from the web terminal?

peak magnet
#

Thank you! Didn't know that would be an issue. Yes, I'm launching Comfy either through Jupyter's terminal or the native terminal.

#

And in the middle of creating, I'll get a "connection closed" message, and several hours of work are lost. SUPER frustrating.

#

What can we do to keep a solid connection?

thin axle
#

Use plain SSH and run the process inside tmux.
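
To make the tmux suggestion concrete, here is a minimal sketch. The helper name `run_in_tmux`, the session name, and the ComfyUI path in the usage note are all assumptions for illustration, not something from this thread. The idea is that the process belongs to the tmux server on the pod, so it keeps running when the SSH or web terminal connection drops.

```shell
# Hypothetical helper: run a command inside a named, detached tmux session.
# Usage: run_in_tmux SESSION_NAME COMMAND...
run_in_tmux() {
    session="$1"
    shift
    if ! command -v tmux >/dev/null 2>&1; then
        echo "tmux not found; install it first (e.g. apt update && apt -y install tmux)" >&2
        return 1
    fi
    # Reuse the session if it already exists, otherwise create it detached.
    if ! tmux has-session -t "$session" 2>/dev/null; then
        tmux new-session -d -s "$session"
    fi
    # Type the command into the session's shell and press Enter.
    tmux send-keys -t "$session" "$*" Enter
}
```

For example, `run_in_tmux comfy "cd /workspace/ComfyUI && python main.py"` (path assumed) starts the process, `tmux attach -t comfy` shows its output, and Ctrl-b followed by d detaches again without stopping it.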

peak magnet
#

Are there tutorials available on how to set that up?

#

And what's the difference in the user experience?

thin axle
#

Usually you want to set up SSH keys on your machine and add the public key on the RunPod settings page.
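
A rough sketch of the key setup described above, assuming OpenSSH is installed locally. The helper name `ensure_ssh_key` and the key file name are made up for illustration, and the empty passphrase is only there to keep the example short.

```shell
# Hypothetical helper: create an ed25519 key pair if it does not exist yet,
# then print the public half so it can be pasted into the settings page.
# Usage: ensure_ssh_key [KEY_PATH]
ensure_ssh_key() {
    key_path="${1:-$HOME/.ssh/id_ed25519_runpod}"   # file name is an assumption
    mkdir -p "$(dirname "$key_path")"
    if [ ! -f "$key_path" ]; then
        # -N "" sets an empty passphrase; choose a real one for everyday use.
        ssh-keygen -q -t ed25519 -f "$key_path" -N "" -C "runpod" || return 1
    fi
    cat "$key_path.pub"
}
```

After adding the printed public key to your account, connecting would look something like `ssh -i ~/.ssh/id_ed25519_runpod -p <port> root@<pod-ip>`, with the address and port taken from the pod's connection details.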

peak magnet
#

Something like this??

peak magnet
#

Using WSL or Windows Subsystem for Linux to Setup SSH Public/Private Key Pair for Vast.ai and Runpod. Secure your XenBlocks Cloud Miner!

SSH Key Pair Guide: https://github.com/TreeCityWes/VastSSHKeyPair/blob/main/VastSSHKeyPair.md

Vast.ai GPU Rental: https://cloud.vast.ai/?ref_id=88736

Xen.game: https://xen.game/treecitywes

GDXen: https://w...

worldly grove
#

Yes, what he did is the same thing: generating a public key using WSL, adding it to the platform, then connecting with SSH.

thin axle
#

But you don't need to use WSL, as Windows has a built-in SSH client.

peak magnet
#

Hey guys... My pod just disconnected again, and I was using SSH via Terminus.

south stump
#

Your pod ran out of system memory (RAM, not VRAM) and the Linux kernel killed the process. Your pod was not disconnected. Try using the filter at the top of the page to ensure that your pod gets more system memory assigned to it.
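
One way to look at the pod's memory limit from inside the container is sketched below. It assumes the standard Linux cgroup files are visible in the pod, which may not hold on every image, and the function name `show_memory_limit` is made up. A process killed this way typically exits with status 137 (128 plus signal 9).

```shell
# Hypothetical helper: print the container's memory limit, current usage,
# and the OOM kill counter if the kernel exposes one.
show_memory_limit() {
    if [ -r /sys/fs/cgroup/memory.max ]; then
        # cgroup v2 layout
        echo "limit:   $(cat /sys/fs/cgroup/memory.max)"
        echo "current: $(cat /sys/fs/cgroup/memory.current 2>/dev/null)"
        grep oom_kill /sys/fs/cgroup/memory.events 2>/dev/null || true
    elif [ -r /sys/fs/cgroup/memory/memory.limit_in_bytes ]; then
        # cgroup v1 layout
        echo "limit:   $(cat /sys/fs/cgroup/memory/memory.limit_in_bytes)"
        echo "current: $(cat /sys/fs/cgroup/memory/memory.usage_in_bytes 2>/dev/null)"
        grep oom_kill /sys/fs/cgroup/memory/memory.oom_control 2>/dev/null || true
    else
        echo "no cgroup memory files found"
    fi
    # /proc/meminfo usually reports the host's RAM, not the pod's limit.
    grep -E '^(MemTotal|MemAvailable):' /proc/meminfo 2>/dev/null || true
    return 0
}
```

If the `oom_kill` counter goes up after a crash, that supports the out-of-memory explanation rather than a network disconnect.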

#

Which template is this, by the way? You can load tcmalloc to try to improve memory management. That's what A1111 and Forge do, because they ran out of memory when switching models too frequently.

peak magnet
south stump
#

Why do you come here asking for help if you know better than everyone here?

#

And the PyTorch template does not include libtcmalloc, so install it and implement it as I suggested.

#

Without libtcmalloc, Stable Diffusion eventually runs out of memory.

peak magnet
#

And would the command to install be:

#

pip install libtcmalloc-minimal4
TCMALLOC="$(ldconfig -p | grep -Po "libtcmalloc.so.\d" | head -n 1)"
export LD_PRELOAD="${TCMALLOC}"

worldly grove
#

Yep hep

#

Try it

peak magnet
#

pip install libtcmalloc-minimal4
ERROR: Could not find a version that satisfies the requirement libtcmalloc-minimal4 (from versions: none)
ERROR: No matching distribution found for libtcmalloc-minimal4

worldly grove
#

Try looking at other scripts, like the setup scripts in the RunPod workers or RunPod templates.

#

There should be some examples of a working tcmalloc install.

#

I'm not on my pc right now so can't help much sorry

south stump
peak magnet
thin axle
#

@peak magnet
apt-get install google-perftools

#

This is the correct way to install tcmalloc.

worldly grove
#

oof

modest tokenBOT
peak magnet
#

Got this error when trying to install:

#

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
E: Unable to locate package google-perftools

thin axle
#

Run apt update first.

south stump
#

And the google one is called libgoogle-perftools4

#

apt update && apt -y install libtcmalloc-minimal4 libgoogle-perftools4

peak magnet
#

apt update
apt update && apt -y install libtcmalloc-minimal4 libgoogle-perftools4
TCMALLOC="$(ldconfig -p | grep -Po "libtcmalloc.so.\d" | head -n 1)"
export LD_PRELOAD="${TCMALLOC}"
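
A slightly more defensive variation on the commands above, offered as a sketch rather than a fix anyone in the thread posted. The function names are made up. The pattern also accepts `libtcmalloc_minimal.so.N`, which the original pattern skips, and `LD_PRELOAD` is only set when a library was actually found.

```shell
# Hypothetical helper: pick the first tcmalloc soname from `ldconfig -p` output
# read on stdin. Matches both the full and the minimal variant.
pick_tcmalloc() {
    grep -Eo 'libtcmalloc(_minimal)?\.so\.[0-9]+' | head -n 1
}

# Hypothetical helper: export LD_PRELOAD only if tcmalloc is installed.
preload_tcmalloc() {
    TCMALLOC="$(ldconfig -p 2>/dev/null | pick_tcmalloc)"
    if [ -z "$TCMALLOC" ]; then
        echo "tcmalloc not found; install libgoogle-perftools4 or libtcmalloc-minimal4 first" >&2
        return 1
    fi
    export LD_PRELOAD="$TCMALLOC"
    echo "LD_PRELOAD set to $LD_PRELOAD"
}
```

Call `preload_tcmalloc` in the same shell (or tmux session) that then launches the Python process, since the variable only affects programs started after it is exported.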

south stump