# Production Redis instance down while redeployment is stuck


torn flax

This started after I renamed a Redis service's private networking address.

project/204594ed-2cbb-4f9e-924a-88cf5d199e27/service/64564b48-5b07-4326-b701-392e6a09ec5b?environmentId=767ba06f-5057-4e54-9c14-7104bd5ae0e2

torn flax

Our main production operations have been down for 30 minutes now because of this.

torn flax

The Redis service keeps randomly going down.

torn flax

Our production Redis instance is still down after numerous deployment attempts.

spring meadow

Do you have any logs that you can share?


Is the deployment itself crashing?

torn flax

Unfortunately, no build or deploy logs are being shown. Sometimes the deployment itself fails; sometimes it succeeds but goes offline after a couple of minutes. Right now there is an "active" deployment (with no logs at all), but the instance isn't reachable. Not sure if it's a state bug or a UI bug.

spring meadow

Have you tried renaming back to the original private network address?

torn flax

Good idea, I'll give it a try after this next deployment fails.

spring meadow

I'll see if I can repro this issue

torn flax

Still no luck after changing back to the original address

normal shoal

I'm seeing a similar issue (possibly related): I redeployed a service and now it can't talk to my Redis instances.

```
PHP Fatal error:  Uncaught RedisException: Connection timed out
```

FWIW this Redis instance itself was not redeployed.


Trying to SSH into the Redis instance shows:

```
📦 Your service's container is not running (status: exited)
🔧 Deploy or restart your service, then try again.
```

@torn flax do you get that too?

torn flax

I can't do that because the service is pending deletion right now, but I imagine I would get the same result as you. The container was definitely not running.

normal shoal

Redeploying fixed it for me. FWIW, from my service, `my-redis` was not resolving, but `my-redis.railway.internal` was; that may be a factor.
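To tell a name-resolution problem apart from a dead Redis container, one option is to run a quick check from inside the affected service that compares the short name with the `.railway.internal` name. This is a minimal Python sketch, assuming the service can run Python and that Redis listens on its default port 6379; the hostnames come from this thread, while `resolve` and `can_connect` are hypothetical helper names:

```python
import socket

# Hostnames taken from the thread; "my-redis" is that user's service name.
# Port 6379 is Redis's default and is an assumption here.
CANDIDATES = ["my-redis", "my-redis.railway.internal"]
REDIS_PORT = 6379


def resolve(host: str) -> list[str]:
    """Return the unique addresses `host` resolves to, or [] if lookup fails."""
    try:
        # AF_UNSPEC asks for both IPv4 and IPv6 answers.
        infos = socket.getaddrinfo(host, None, socket.AF_UNSPEC, socket.SOCK_STREAM)
    except socket.gaierror:
        return []
    # info[4] is the sockaddr tuple; its first element is the address string.
    return sorted({info[4][0] for info in infos})


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts alike.
        return False


if __name__ == "__main__":
    for name in CANDIDATES:
        addrs = resolve(name)
        reachable = bool(addrs) and can_connect(name, REDIS_PORT)
        print(f"{name}: resolves to {addrs or 'nothing'}, TCP reachable: {reachable}")
```

If a name resolves but the TCP connection still fails or times out, the problem is more likely the Redis container itself (for example the `exited` status seen above) than DNS.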

torn flax

Staff, please just delete the service that was linked in this thread. It's been pending deletion for half an hour now and blocks any other changes from going out. The environment is essentially unusable right now.

torn flax

@restive flame @austere widget sorry for the ping but something must be done here

high valley

Try asking on Railway Station; the team is more active there.

austere widget

Working on a fix for this now


This bug is so rare I haven't seen it in ~4 months.

torn flax

God bless you, Noah.

austere widget

The deletion of the Redis2 instance you had pending has now completed. cc @torn flax


Sorry you got nailed by that, not ideal at all


The bug behind that deletion issue is so rare that it's hard to reproduce consistently. However, we believe we've fixed it, and we have some changes coming down the pipe that should eliminate it for good.


If I'm able to help otherwise please let me know!

torn flax

That should do the trick, thanks for the post-mortem.

austere widget

Absolutely!