# Issues in SE region causing a massive number of jobs to be retried

vapid vigil
#

The issues in the screenshot are causing 10% of my jobs to be retried in the SE region. Please fix this; it's not happening in the CA region.

#

Obviously I am referring to the "Connection timeout" errors, which cause the job results to fail to be returned, and not the single exception among them.

serene pebble
#

@vapid vigil DO YOU MIND SUBMITTING THIS AS A TICKET ON THE WEBSITE? IT'S EASIER TO ESCALATE

vapid vigil
#

No need to shout but sure 😁

serene pebble
#

oops, sorry for the caps

serene pebble
#

done

lavish hearth
#

hahaha

#

wait SE?

vapid vigil
#

I said 10% are retried NOT ALL 🤦‍♂️

lavish hearth
#

I'm using dev on SE

lavish hearth
#

well, good luck with your problem

vapid vigil
#

RunPod needs to check it out. I switched to CA in the meantime, and it works fine without any issues.

lavish hearth
#

great to hear

vapid vigil
#

I was using CA originally but switched to SE because my jobs were failing. That turned out to be my own Redis server running out of memory (OOM), not a RunPod issue.

#

So I upgraded my ElastiCache instance on AWS from cache.t3.medium to cache.m4.large, and now it's fine.
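
For anyone hitting something similar, here is a minimal sketch of how Redis memory pressure could be checked before blaming the platform. It assumes the redis-py client and a single node endpoint; the host name, the 0.9 threshold, and the helper names `memory_pressure`, `is_near_oom`, and `check_live` are made up for illustration and are not from my actual setup. On a cluster, each node would need to be checked separately.

```python
from typing import Optional


def memory_pressure(info: dict) -> Optional[float]:
    """Return used_memory / maxmemory as a fraction.

    `info` is the dict returned by the Redis INFO command's memory section.
    Returns None when maxmemory is 0 or missing, i.e. no limit is configured.
    """
    used = int(info.get("used_memory", 0))
    limit = int(info.get("maxmemory", 0))
    if limit <= 0:
        return None
    return used / limit


def is_near_oom(info: dict, threshold: float = 0.9) -> bool:
    """True when memory use is at or above `threshold` of the configured limit.

    The 0.9 default is an arbitrary illustrative value, not a recommendation.
    """
    ratio = memory_pressure(info)
    return ratio is not None and ratio >= threshold


def check_live(host: str, port: int = 6379) -> bool:
    """Query one Redis node and report whether it looks close to OOM.

    Requires the third-party `redis` package (redis-py); imported here so the
    pure helpers above can be used without it.
    """
    import redis

    client = redis.Redis(host=host, port=port)
    return is_near_oom(client.info("memory"))
```

Calling `check_live("my-cache.example.internal")` (a placeholder host) against each node would show whether the cache is the bottleneck before switching regions.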

lavish hearth
#

Wow, you use ElastiCache?

#

why not self-hosted Redis?

vapid vigil
#

Because it's a cluster, not a single instance.