#System crashed and corrupted multiple files


nimble burrow
#

Hello all, so I made an upload last week and it somehow crashed immich_server (it even brought down my entire Docker daemon, which is exceedingly strange).
After restarting everything, quite a few images were corrupted, and all new uploads since then have also been corrupted.

I've tried regenerating thumbnails, re-pulling, and redeploying, but nothing works. I am planning on re-creating everything from scratch, but I am hoping to find a solution in case it happens again in the future.

some error logs (couldn't find the logs when immich_server crashed):

[Nest] 7  - 05/18/2025, 12:16:41 AM   ERROR [Microservices:{}] Unable to run job handler (facialRecognition/undefined): TypeError: Cannot read properties of undefined (reading 'replaceAll')
    at JobService.onJobStart (/usr/src/app/dist/services/job.service.js:167:55)
TypeError: Cannot read properties of undefined (reading 'replaceAll')
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
    at async EventRepository.onEvent (/usr/src/app/dist/repositories/event.repository.js:126:13)
    at async Worker.processJob (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:394:28)
    at async Worker.retryIfFailed (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:581:24)
Error: Missing lock for job queue-facial-recognition. failed
    at Scripts.finishedErrors (/usr/src/app/node_modules/bullmq/dist/cjs/classes/scripts.js:272:24)
    at Job.moveToFailed (/usr/src/app/node_modules/bullmq/dist/cjs/classes/job.js:427:32)
    at async handleFailed (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:379:21)
cold ridgeBOT
#

:wave: Hey @nimble burrow,

Thanks for reaching out to us. Please carefully read this message and follow the recommended actions. This will help us be more effective in our support effort and leave more time for building Immich.

References

#

Checklist

I have...

  1. :ballot_box_with_check: verified I'm on the latest release (note that mobile app releases may take some time).
  2. :ballot_box_with_check: read applicable release notes.
  3. :ballot_box_with_check: reviewed the FAQs for known issues.
  4. :ballot_box_with_check: reviewed Github for known issues.
  5. :ballot_box_with_check: tried accessing Immich via local ip (without a custom reverse proxy).
  6. :ballot_box_with_check: uploaded the relevant information (see below).
  7. :ballot_box_with_check: tried an incognito window, disabled extensions, cleared mobile app cache, logged out and back in, different browsers, etc. as applicable

(an item can be marked as "complete" by reacting with the appropriate number)

Information

In order to be able to effectively help you, we need you to provide clear information to show what the problem is. The exact details needed vary per case, but here is a list of things to consider:

  • Your docker-compose.yml and .env files.
  • Logs from all the containers and their status (see above).
  • All the troubleshooting steps you've tried so far.
  • Any recent changes you've made to Immich or your system.
  • Details about your system (both software/OS and hardware).
  • Details about your storage (filesystems, type of disks, output of commands like fdisk -l and df -h).
  • The version of the Immich server, mobile app, and other relevant pieces.
  • Any other information that you think might be relevant.

Please paste files and logs with proper code formatting, and especially avoid blurry screenshots.
Without the right information we can't work out what the problem is. Help us help you ;)

If this ticket can be closed you can use the /close command, and re-open it later if needed.

cold ridgeBOT
nocturne field
#

Could you add your compose/env and host environment @nimble burrow ?

thick temple
nimble burrow
# nocturne field Could you add your compose/env and host environment @nimble burrow ?

the compose:

services:
  immich-server:
    container_name: immich_server
    image: ghcr.io/immich-app/immich-server:${IMMICH_VERSION:-release}
    # extends:
    #   file: hwaccel.transcoding.yml
    #   service: cpu # set to one of [nvenc, quicksync, rkmpp, vaapi, vaapi-wsl] for accelerated transcoding
    volumes:
      # Do not edit the next line. If you want to change the media storage location on your system, edit the value of UPLOAD_LOCATION in the .env file
      - ${UPLOAD_LOCATION}:/usr/src/app/upload
      - /etc/localtime:/etc/localtime:ro
      - /volume1/homes/lakrymosa/Photos/PhotoLibrary/M/Mina:/volume1/homes/lakrymosa/Photos/PhotoLibrary/M/Mina
    env_file:
      - stack.env

    depends_on:
      - redis
      - database
    restart: always
    healthcheck:
      disable: false

  immich-machine-learning:
    container_name: immich_machine_learning
    # For hardware acceleration, add one of -[armnn, cuda, openvino] to the image tag.
    # Example tag: ${IMMICH_VERSION:-release}-cuda
    image: ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION:-release}
    # extends: # uncomment this section for hardware acceleration - see https://immich.app/docs/features/ml-hardware-acceleration
    #   file: hwaccel.ml.yml
    #   service: cpu # set to one of [armnn, cuda, openvino, openvino-wsl] for accelerated inference - use the `-wsl` version for WSL2 where applicable
    volumes:
      - model-cache:/cache
    env_file:
      - stack.env
    restart: always
    healthcheck:
      disable: false

  redis:
    container_name: immich_redis
    image: docker.io/redis:6.2-alpine@sha256:905c4ee67b8e0aa955331960d2aa745781e6bd89afc44a8584bfd13bc890f0ae
    healthcheck:
      test: redis-cli ping || exit 1
    restart: always

  database:
    container_name: immich_postgres
    image: docker.io/tensorchord/pgvecto-rs:pg14-v0.2.0@sha256:90724186f0a3517cf6914295b5ab410db9ce23190a2d9d0b9dd6463e3fa298f0
    environment:
      POSTGRES_PASSWORD: ${DB_PASSWORD}
      POSTGRES_USER: ${DB_USERNAME}
      POSTGRES_DB: ${DB_DATABASE_NAME}
      POSTGRES_INITDB_ARGS: '--data-checksums'
    volumes:
      # Do not edit the next line. If you want to change the database storage location on your system, edit the value of DB_DATA_LOCATION in the .env file
      - ${DB_DATA_LOCATION}:/var/lib/postgresql/data
    healthcheck:
      test: >-
        pg_isready --dbname="$${POSTGRES_DB}" --username="$${POSTGRES_USER}" || exit 1;
        Chksum="$$(psql --dbname="$${POSTGRES_DB}" --username="$${POSTGRES_USER}" --tuples-only --no-align
        --command='SELECT COALESCE(SUM(checksum_failures), 0) FROM pg_stat_database')";
        echo "checksum failure count is $$Chksum";
        [ "$$Chksum" = '0' ] || exit 1
      interval: 5m
      # start_interval: 30s
      start_period: 5m
    command: >-
      postgres
      -c shared_preload_libraries=vectors.so
      -c 'search_path="$$user", public, vectors'
      -c logging_collector=on
      -c max_wal_size=2GB
      -c shared_buffers=512MB
      -c wal_compression=on
    restart: always

volumes:
  model-cache:

networks:
  default:
    external: true
    name: nginx_network

env variables:

UPLOAD_LOCATION=/volume1/docker/immich
DB_DATA_LOCATION=/volume1/docker/immich/database
TZ=Asia/Shanghai
IMMICH_VERSION=release
DB_PASSWORD=msHGfvY8F4QLrmb4QruiKS2enaRyuGKWX3MeGiwLjna8J13X
DB_USERNAME=postgres
DB_DATABASE_NAME=immich
nimble burrow
nocturne field
#

Seems like a Synology NAS hosting it

#

Which model do you have?

#

(Right now my mind is going towards too little memory)

nimble burrow
nocturne field
#

Yeah 20 should be fine for the heaviest users, NASes often only have 4G which is just enough until you start uploading things
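
For anyone following along who wants to check how much memory the host actually has, here is a minimal sketch. It assumes a Linux host that exposes /proc/meminfo (typical, but not confirmed for this NAS in the thread), and the function name `mem_total_gib` is made up for illustration.

```python
def mem_total_gib(meminfo_text: str) -> float:
    """Return the MemTotal value from /proc/meminfo-style text, in GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # The kernel reports this value in kB (1024-byte units).
            kib = int(line.split()[1])
            return kib / (1024 * 1024)
    raise ValueError("MemTotal line not found")


# Usage on the host (assumption: /proc/meminfo exists):
#   with open("/proc/meminfo") as f:
#       print(f"{mem_total_gib(f.read()):.1f} GiB total")
```

This only reports installed memory; it says nothing about how much the containers were using at the time of the crash.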

#

If you upload new images, do they appear in the GUI?

#

Even with an "Error"

#

or just nothing

nimble burrow
#

most of the old pictures were still intact, but all the new uploads are broken.

nocturne field
#

Ok but the new ones do upload, that means the thumbnail generation is broken 🤔

nimble burrow
nocturne field
#

No the enlarged one shows a larger thumbnail 🙂

#

if you download then it will use the original

#

thumbnail is maybe a bit misleading because the large ones are 1440p

nimble burrow
#

ah I see, so yeah, the thumbnails are broken

nimble burrow
nocturne field
#

yeah sorry the GUI calls them "preview" but they are stored in the thumbnail section nonetheless

#

You don't store the thumbnails separately so I'm really scratching my head here

#

Do you know which upload caused this?

nimble burrow
#

yep, I've tried deleting it and re-uploading it, but it is still broken

#

but I have torn down everything so I am unable to produce any logs

nocturne field
#

Partition isn't full or anything ?
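
A quick way to answer the "is the partition full" question from a script, as a rough sketch. The path you pass in (for example the upload location from the env file) and the 90% threshold are assumptions chosen for illustration, not anything Immich itself checks.

```python
import shutil


def percent_used(path: str) -> float:
    """Percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return 100.0 * usage.used / usage.total


def is_nearly_full(path: str, threshold: float = 90.0) -> bool:
    """True if usage is at or above `threshold` percent (arbitrary cutoff)."""
    return percent_used(path) >= threshold


# Usage (hypothetical path taken from the env file in this thread):
#   print(percent_used("/volume1/docker/immich"))
```

The number can differ slightly from what `df -h` prints, because tools account for reserved blocks differently.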

#

There are permanent database-specific logs in DB_DATA_LOCATION/log(s) (I always forget whether there is an s)
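
Since the folder name is uncertain, a small sketch that looks in both candidates and lists the most recent log files. The Postgres startup output later in this thread mentions a directory called "log", so that is the likelier one; the function name `newest_logs` is hypothetical.

```python
from pathlib import Path


def newest_logs(data_dir: str, limit: int = 3) -> list[Path]:
    """Return up to `limit` log files under data_dir/log or data_dir/logs,
    newest first by modification time."""
    root = Path(data_dir)
    files: list[Path] = []
    for name in ("log", "logs"):  # check both spellings
        folder = root / name
        if folder.is_dir():
            files.extend(p for p in folder.iterdir() if p.is_file())
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return files[:limit]


# Usage (hypothetical; pass your own DB_DATA_LOCATION):
#   for path in newest_logs("/volume1/docker/immich/database"):
#       print(path)
```

Reading these files may need elevated permissions, because the data directory is owned by the database user inside the container.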

nimble burrow
nimble burrow
nocturne field
#

Yeah unless your file happened to be 100GB I don't think that will have caused it then 😛

nimble burrow
nocturne field
#

Is the file anything you're willing to share ?

nimble burrow
#

kinda nervous, I love immich so much, it is the coolest app in my homelab

nocturne field
#

If you want to, zip it up and upload it here or in private

nimble burrow
nocturne field
#

don't straight upload it to discord because it will clean it up

nimble burrow
#

if I can find it, what files should I upload?

nocturne field
#

Oh I was wondering if you knew the specific file it crashed on, if not then maybe it will be a wild goose chase 😅

nimble burrow
#

haha, it is just a few pictures taken on an iPhone, I don't think they are any different from the other pictures

#

it is very strange, because the said upload crashed my entire docker daemon, I thought docker containers were "containerized/compartmentalized"

#

still got no idea what happened

#

oh btw, while I am rebuilding, I've run into new problems...wondering if you can also help me with it?

nocturne field
#

Sure, just tell us 🙂

nimble burrow
#

thanks! so

#

it is an iPhone 16 pro

#

the small one, not pro max

#

and it was uploading using the immich app

#

so I am trying to rebuild today, and I wanted to bind external libraries, as you can probably see on my compose file: /volume1/homes/lakrymosa/Photos/PhotoLibrary/M/Mina:/volume1/homes/lakrymosa/Photos/PhotoLibrary/M/Mina. I had this particular binding running fine in my old immich, but today with the new build it just won't work

nocturne field
#

Just curious because I recently got an iPhone 16 and I'm seeing weird thumbnails because the live photos have multiple previews embedded in them

nimble burrow
#

very strange, I can't remember how I managed to bind it on my old immich, I bound it on the first day I deployed immich

nimble burrow
#

but live photos aren't new though, I probably uploaded live photos before

nocturne field
#

I don't think it should be this, just curious if there could be some connection

nocturne field
#

Unless there is some permission you still need to give the container on the Synology side

nimble burrow
#

yep that's what I thought, and the link is validated successfully, but the picture won't show up, and the library says 0 photos

nimble burrow
#

I am not very code-savvy, so I've tried some easy ones such as including user: 1026:100 in the compose, or putting PUID PGID in the environment variables, but it still won't work

#

and it crashed again, I had to rebuild multiple times, still stuck

#

the user: 1026:100 is my Synology administrator ID

#

I think I might need to SSH into it and try those "chmod+numbers" things
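
Before reaching for chmod, it can help to see what the plain mode bits already allow for the UID/GID in question. A rough sketch only: it ignores ACLs (which Synology shares commonly use), supplementary groups, and the permissions of parent directories, and `can_read_dir` is a made-up helper name.

```python
import os
import stat


def can_read_dir(path: str, uid: int, gid: int) -> bool:
    """Rough check: do the classic mode bits let uid/gid list and enter path?"""
    st = os.stat(path)
    if uid == 0:
        return True  # root bypasses mode bits
    if st.st_uid == uid:
        needed = stat.S_IRUSR | stat.S_IXUSR  # owner read + execute
    elif st.st_gid == gid:
        needed = stat.S_IRGRP | stat.S_IXGRP  # group read + execute
    else:
        needed = stat.S_IROTH | stat.S_IXOTH  # everyone else
    return (st.st_mode & needed) == needed


# Usage (values taken from this thread, run on the host):
#   can_read_dir("/volume1/homes/lakrymosa/Photos/PhotoLibrary/M/Mina", 1026, 100)
```

A False result here points at permissions; a True result does not prove the container can read the files, given the caveats above.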

#

Mraedis is Gandalf's Elvish name right?

#

Gandalf the white wizard

#

oh and btw, I am also reading the "data dump" section of the immich docs, it says I can use it as a backup mechanism? not sure how to do it though...I am worried that immich might crash and files become corrupted again, and I would have to rebuild...uploading a ton of photos and waiting on the system to recognize all the faces is a nightmare.

#

is there an easy way to back up? I am wondering, can I just copy and back up the "immich" folder regularly, or use those fancy "snapshot" functions, so that if things go south, I can just replace the immich folder with the backup and everything will be back online?

#

and another stupid question: there are two other family members using my immich, but I have some private photos that I would rather keep to myself, is there a way to do that in immich? I am the admin

nocturne field
#

No, Mraedis is my personal user name 😛

nocturne field
#

these are database only and are done once every 24 hours unless you've changed the settings

#

database dumps are very easy IMO

nocturne field
#

the admin also can't see anything unless they navigate to the files on the host

nimble burrow
nimble burrow
nocturne field
#

You can also adjust the frequency

#

And yes, having a copy of the 'backup' folder should be basically it as far as the database is concerned
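
Copying that folder somewhere else on a schedule could look roughly like this. It is a sketch under assumptions: the folder name (`backups` here, while the thread calls it the 'backup' folder, so check the actual name under your UPLOAD_LOCATION), the destination path, and the helper name `copy_backups` are all illustrative. As said above, it covers the database dumps only, not the photos themselves.

```python
import shutil
from datetime import datetime
from pathlib import Path


def copy_backups(upload_location: str, dest_root: str,
                 folder: str = "backups") -> Path:
    """Copy the database dump folder into a new timestamped directory."""
    src = Path(upload_location) / folder
    if not src.is_dir():
        raise FileNotFoundError(f"no dump folder at {src}")
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    dest = Path(dest_root) / f"immich-db-{stamp}"
    # copytree creates dest (and parents) and fails if dest already exists.
    shutil.copytree(src, dest)
    return dest


# Usage (hypothetical destination on another volume or disk):
#   copy_backups("/volume1/docker/immich", "/volume2/backups")
```

Keeping the copy on a different disk or machine is what makes it useful if the NAS itself misbehaves again.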

nimble burrow
nimble burrow
#

@nocturne field Hello Mraedis, I could really use some help here! I built a fresh one using the compose and env files from the official immich guide, but all the uploads are broken. I rebuilt immich multiple times yesterday trying to fix the external library mount, and every time I rebuilt, I uploaded a few photos to make sure immich was functioning properly, and it always worked well.
So right now I am scratching my head here, I haven't done anything differently. I've also tried a force refresh, using a different browser, and tearing down everything to make sure it is a fresh rebuild, but it is still broken.

#

should I upload my compose and env variables here? but everything is simply copied and pasted from the official docs

nocturne field
#

Did you also add external libraries?

nimble burrow
nocturne field
#

What's in the docker logs?

nimble burrow
#

For immich_server, all the logs are green, I don't see any errors:

#

The redis looks fine too:

1:M 19 May 2025 10:33:37.018 # WARNING Memory overcommit must be enabled! Without it, a background save or replication may fail under low memory condition. Being disabled, it can also cause failures without low memory condition, see https://github.com/jemalloc/jemalloc/issues/1328. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.

1:M 19 May 2025 10:33:37.018 * oO0OoO0OoO0Oo Valkey is starting oO0OoO0OoO0Oo

1:M 19 May 2025 10:33:37.018 * Valkey version=8.1.0, bits=64, commit=00000000, modified=0, pid=1, just started

1:M 19 May 2025 10:33:37.018 # Warning: no config file specified, using the default config. In order to specify a config file use valkey-server /path/to/valkey.conf

1:M 19 May 2025 10:33:37.019 * monotonic clock: POSIX clock_gettime

1:M 19 May 2025 10:33:37.020 * Running mode=standalone, port=6379.

1:M 19 May 2025 10:33:37.020 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.

1:M 19 May 2025 10:33:37.020 * Server initialized

1:M 19 May 2025 10:33:37.020 * Ready to accept connections tcp

1:M 19 May 2025 10:42:49.812 * 100 changes in 300 seconds. Saving...

1:M 19 May 2025 10:42:49.813 * Background saving started by pid 137

137:C 19 May 2025 10:42:50.877 * DB saved on disk

137:C 19 May 2025 10:42:50.877 * Fork CoW for RDB: current 0 MB, peak 0 MB, average 0 MB

1:M 19 May 2025 10:42:50.921 * Background saving terminated with success

1:M 19 May 2025 10:50:51.841 * 100 changes in 300 seconds. Saving...

1:M 19 May 2025 10:50:51.841 * Background saving started by pid 245

245:C 19 May 2025 10:50:54.448 * DB saved on disk

245:C 19 May 2025 10:50:54.449 * Fork CoW for RDB: current 0 MB, peak 0 MB, average 0 MB

1:M 19 May 2025 10:50:54.455 * Background saving terminated with success

1:M 19 May 2025 10:57:24.702 * 100 changes in 300 seconds. Saving...

1:M 19 May 2025 10:57:24.702 * Background saving started by pid 331

331:C 19 May 2025 10:57:29.352 * DB saved on disk

331:C 19 May 2025 10:57:29.353 * Fork CoW for RDB: current 1 MB, peak 1 MB, average 1 MB

1:M 19 May 2025 10:57:29.427 * Background saving terminated with success

1:M 19 May 2025 11:05:49.920 * 100 changes in 300 seconds. Saving...

1:M 19 May 2025 11:05:49.920 * Background saving started by pid 446

446:C 19 May 2025 11:05:52.758 * DB saved on disk

446:C 19 May 2025 11:05:52.759 * Fork CoW for RDB: current 1 MB, peak 1 MB, average 0 MB

1:M 19 May 2025 11:05:52.837 * Background saving terminated with success

1:M 19 May 2025 11:12:49.906 * 100 changes in 300 seconds. Saving...

1:M 19 May 2025 11:12:49.906 * Background saving started by pid 537

537:C 19 May 2025 11:12:50.982 * DB saved on disk

537:C 19 May 2025 11:12:50.983 * Fork CoW for RDB: current 1 MB, peak 1 MB, average 0 MB

1:M 19 May 2025 11:12:51.013 * Background saving terminated with success
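
The first warning in this log asks for vm.overcommit_memory = 1 on the host. A small sketch for checking the current value, assuming a Linux host where the setting is exposed under /proc/sys; the function name `overcommit_enabled` is made up, and nothing in this thread shows that the warning is related to the broken thumbnails.

```python
def overcommit_enabled(path: str = "/proc/sys/vm/overcommit_memory") -> bool:
    """True if the kernel setting is 1, the value the log warning asks for."""
    with open(path) as f:
        return f.read().strip() == "1"


# Usage on the host:
#   print(overcommit_enabled())
```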
#

Postgres also looks fine:

The files belonging to this database system will be owned by user "postgres".

This user must also own the server process.

The database cluster will be initialized with locale "en_US.utf8".

The default database encoding has accordingly been set to "UTF8".

The default text search configuration will be set to "english".

Data page checksums are enabled.

fixing permissions on existing directory /var/lib/postgresql/data ... ok

creating subdirectories ... ok

selecting dynamic shared memory implementation ... posix

selecting default max_connections ... 100

selecting default shared_buffers ... 128MB

selecting default time zone ... Etc/UTC

creating configuration files ... ok

running bootstrap script ... ok

performing post-bootstrap initialization ... ok

syncing data to disk ... ok

initdb: warning: enabling "trust" authentication for local connections

You can change this by editing pg_hba.conf or using the option -A, or

--auth-local and --auth-host, the next time you run initdb.

Success. You can now start the database server using:

    pg_ctl -D /var/lib/postgresql/data -l logfile start

waiting for server to start.......2025-05-19 10:33:56.289 UTC [49] LOG:  redirecting log output to logging collector process

2025-05-19 10:33:56.289 UTC [49] HINT:  Future log output will appear in directory "log".

...... done

server started

CREATE DATABASE

/usr/local/bin/docker-entrypoint.sh: ignoring /docker-entrypoint-initdb.d/*

waiting for server to shut down............ done

server stopped

PostgreSQL init process complete; ready for start up.

2025-05-19 10:34:36.489 UTC [1] LOG:  redirecting log output to logging collector process

2025-05-19 10:34:36.489 UTC [1] HINT:  Future log output will appear in directory "log".
#

machine learning:

[05/19/25 18:33:24] INFO     Starting gunicorn 23.0.0                           

[05/19/25 18:33:24] INFO     Listening at: http://[::]:3003 (8)                 

[05/19/25 18:33:24] INFO     Using worker: immich_ml.config.CustomUvicornWorker 

[05/19/25 18:33:24] INFO     Booting worker with pid: 9                         

[05/19/25 18:34:32] INFO     Started server process [9]                         

[05/19/25 18:34:32] INFO     Waiting for application startup.                   

[05/19/25 18:34:32] INFO     Created in-memory cache with unloading after 300s  

                             of inactivity.                                     

[05/19/25 18:34:32] INFO     Initialized request thread pool with 4 threads.    

[05/19/25 18:34:32] INFO     Application startup complete.      
#

All the logs above are from a fresh rebuild (built just 5 mins before I posted on Discord), so all the logs are from the very beginning

#

Really have no idea here. Maybe I rebuilt way too many times yesterday, and somehow the docker daemon cached some weird errors and corrupted my new build? doesn't make any sense. Should I restart my host (Synology NAS)?

nocturne field
#

Not seeing any errors in your logs

#

What do you mean by "uploads are broken" btw

nimble burrow
#

the thumbnails are broken

nocturne field
#

could you go into the admin section and press "missing" on the thumbnails job

#

report back how many it is queueing

nimble burrow
#

in the admin section, "GENERATE THUMBNAILS" tab, it was showing 0 active, 3 waiting. After I pressed "missing", 4 are waiting

#

this is weird, I just checked my compose file again, I did not include any external bind mount, but here it is showing an external library waiting

#

there is nothing in the external library section, and I didn't add any.

nocturne field
#

curious indeed

#

Did you check the external library section in the admin panel to make sure there is none?

#

But it sounds like your redis cache is a bit broken

nimble burrow
#

might have to reboot my synology nas, will report back

nimble burrow
#

@nocturne field Hello Mraedis, I am happy to report that after restarting my Synology, immich is working fine now (I didn't rebuild after restart, it is the same build as before restart).

nimble burrow
#

hey Mraedis, here are some additional updates: the bind mounts are also working now. At first the bind mount pictures were "broken", however, after a container restart, the thumbnails were generated properly.