#No Smart Search


old fox
  • OS: Rocky Linux 9
  • Deployment: Docker Compose
  • Immich Version: v1.121.0 & v1.134.0
  • HW:
    • AMD 3600
    • ASUS B550 mobo
    • Gigabyte GTX 1080 Ti
    • Several TB free storage
  • Reverse Proxy: SWAG

I was previously running Immich v1.121.0 when I noticed that my smart search was no longer working. Assuming I just needed an update, I updated to v1.134.0 and checked for changes to the docker-compose.yml and supporting files. I can't remember whether I checked the logs before updating, but I'm currently seeing the errors below. From what I can gather, the models are self-contained in the ML image, which ruled out my first thought that I might be blocking the location hosting the models. I've also tried opening port 3003 on the ML container, and that didn't help. Any ideas?

```
immich | [Nest] 17  - 06/13/2025, 12:40:14 AM    WARN [Api:MachineLearningRepository~0lkhpjdx] Machine learning request to "http://immich-machine-learning:3003" failed: fetch failed
immich | [Nest] 17  - 06/13/2025, 12:40:14 AM   ERROR [Api:ErrorInterceptor~0lkhpjdx] Unknown error: Error: Machine learning request '{"clip":{"textual":{"modelName":"ViT-B-32__openai","options":{"language":"en-US"}}}}' failed for all URLs
```
covert sphinx (BOT)

:wave: Hey @old fox,

Thanks for reaching out to us. Please carefully read this message and follow the recommended actions. This will help us be more effective in our support effort and leave more time for building Immich.

Checklist

I have...

  1. :ballot_box_with_check: verified I'm on the latest release (note that mobile app releases may take some time).
  2. :ballot_box_with_check: read applicable release notes.
  3. :ballot_box_with_check: reviewed the FAQs for known issues.
  4. :ballot_box_with_check: reviewed Github for known issues.
  5. :ballot_box_with_check: tried accessing Immich via local ip (without a custom reverse proxy).
  6. :ballot_box_with_check: uploaded the relevant information (see below).
  7. :ballot_box_with_check: tried an incognito window, disabled extensions, cleared mobile app cache, logged out and back in, different browsers, etc. as applicable

(an item can be marked as "complete" by reacting with the appropriate number)

Information

In order to be able to effectively help you, we need you to provide clear information to show what the problem is. The exact details needed vary per case, but here is a list of things to consider:

  • Your docker-compose.yml and .env files.
  • Logs from all the containers and their status (see above).
  • All the troubleshooting steps you've tried so far.
  • Any recent changes you've made to Immich or your system.
  • Details about your system (both software/OS and hardware).
  • Details about your storage (filesystems, type of disks, output of commands like `fdisk -l` and `df -h`).
  • The version of the Immich server, mobile app, and other relevant pieces.
  • Any other information that you think might be relevant.

Please paste files and logs with proper code formatting, and especially avoid blurry screenshots.
Without the right information we can't work out what the problem is. Help us help you ;)

If this ticket can be closed you can use the /close command, and re-open it later if needed.

old fox

```
IMMICH_VERSION=v1.134.0
```

```yaml
services:
  immich-server:
    container_name: immich
    image: ghcr.io/immich-app/immich-server:${IMMICH_VERSION}
    deploy:
      replicas: 1
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities:
                - gpu
                - compute
                - video
    env_file:
      - .env
    ports:
      - 2283:2283
    volumes:
      - ${UPLOAD_LOCATION}:/usr/src/app/upload
      - /etc/localtime:/etc/localtime:ro
    restart: unless-stopped
    depends_on:
      - redis
      - database

  immich-machine-learning:
    container_name: immich_machine_learning
    image: ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION}-cuda
    deploy:
      replicas: 1
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities:
                - gpu
    env_file:
      - .env
    volumes:
      - model-cache:/cache
    restart: unless-stopped

  redis:
    container_name: immich_redis
    image: docker.io/valkey/valkey:8-bookworm@sha256:a19bebed6a91bd5e6e2106fef015f9602a3392deeb7c9ed47548378dcee3dfc2
    healthcheck:
      test: redis-cli ping || exit 1
    deploy:
      replicas: 1
    restart: unless-stopped

  database:
    container_name: immich_postgres
    image: ghcr.io/immich-app/postgres:14-vectorchord0.3.0-pgvectors0.3.0
    deploy:
      replicas: 1
    environment:
      POSTGRES_PASSWORD: ${DB_PASSWORD}
      POSTGRES_USER: ${DB_USERNAME}
      POSTGRES_DB: ${DB_DATABASE_NAME}
      POSTGRES_INITDB_ARGS: '--data-checksums'
      DB_STORAGE_TYPE: 'HDD'
    volumes:
      - ${DB_DATA_LOCATION}:/var/lib/postgresql/data
    restart: unless-stopped
```

This has occurred with the existing model-cache as well as with a new, fresh model-cache.

old fox

FWIW, when I drop `-cuda` from the ML image tag, it works great.

eager flower

```
ERROR Worker (pid:9) was sent code 139!
```

It's segfaulting.
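Assuming `code 139` follows the common convention of reporting a signal-killed process as 128 + signal number, it decodes as 139 - 128 = 11, which is SIGSEGV (a segmentation fault) on Linux. A minimal sketch of that decoding; `describe_exit_code` is a hypothetical helper, and whether this particular log line uses the 128 + N convention is an assumption:

```python
import signal

# Shells (and many process supervisors) report "terminated by signal N" as 128 + N.
SIGNAL_EXIT_OFFSET = 128


def describe_exit_code(code: int) -> str:
    """Translate a shell-style exit status into a human-readable description."""
    if code > SIGNAL_EXIT_OFFSET:
        signum = code - SIGNAL_EXIT_OFFSET
        try:
            # signal.Signals maps a number to its platform name, e.g. 11 -> SIGSEGV.
            name = signal.Signals(signum).name
        except ValueError:
            name = f"unknown signal {signum}"
        return f"killed by {name} (signal {signum})"
    return f"exited with status {code}"


if __name__ == "__main__":
    print(describe_exit_code(139))  # the value from the log line above
```

On Linux, running it prints `killed by SIGSEGV (signal 11)`.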


What driver version are you running?

old fox

```
NVIDIA-SMI 550.100 Driver Version: 575.57.08 CUDA Version: 12.4
```


Running on a 1080 Ti, which should have compute capability 6.1.
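The `nvidia-smi` header line posted above reports two different numbers: the tool itself is 550.100 while the driver is 575.57.08. On a consistent install these two are normally identical, so a mismatch may hint at leftover userspace components from an older driver; that is an assumption, not something confirmed in this thread. A minimal sketch that parses the header and flags the mismatch (the helper names are hypothetical):

```python
import re

# Matches the summary line printed at the top of `nvidia-smi` output.
HEADER_RE = re.compile(
    r"NVIDIA-SMI\s+(?P<smi>[\d.]+)\s+"
    r"Driver Version:\s+(?P<driver>[\d.]+)\s+"
    r"CUDA Version:\s+(?P<cuda>[\d.]+)"
)


def parse_smi_header(line: str) -> dict:
    """Extract the nvidia-smi, driver, and CUDA version strings from the header line."""
    match = HEADER_RE.search(line)
    if match is None:
        raise ValueError(f"not an nvidia-smi header line: {line!r}")
    return match.groupdict()


def smi_matches_driver(line: str) -> bool:
    """True when the nvidia-smi version equals the reported driver version."""
    versions = parse_smi_header(line)
    return versions["smi"] == versions["driver"]


if __name__ == "__main__":
    header = "NVIDIA-SMI 550.100 Driver Version: 575.57.08 CUDA Version: 12.4"
    print(parse_smi_header(header))
    print("consistent" if smi_matches_driver(header) else "version mismatch")
```

For the line from this thread it reports `version mismatch`.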

eager flower
old fox

I believe so

```
dnf list installed | grep nvidia-container
libnvidia-container-tools.x86_64               1.17.8-1                            @cuda-rhel9-x86_64
libnvidia-container1.x86_64                    1.17.8-1                            @cuda-rhel9-x86_64
nvidia-container-toolkit.x86_64                1.17.8-1                            @cuda-rhel9-x86_64
nvidia-container-toolkit-base.x86_64           1.17.8-1                            @cuda-rhel9-x86_64
```
eager flower

@distant bear any idea?

old fox
```
dnf list installed | grep nvidia
kmod-nvidia-latest-dkms.x86_64                 3:575.57.08-1.el9                   @cuda-rhel9-x86_64
libnvidia-container-tools.x86_64               1.17.8-1                            @cuda-rhel9-x86_64
libnvidia-container1.x86_64                    1.17.8-1                            @cuda-rhel9-x86_64
libnvidia-gpucomp.x86_64                       3:575.57.08-1.el9                   @cuda-rhel9-x86_64
libnvidia-ml.x86_64                            3:575.57.08-1.el9                   @cuda-rhel9-x86_64
nvidia-container-toolkit.x86_64                1.17.8-1                            @cuda-rhel9-x86_64
nvidia-container-toolkit-base.x86_64           1.17.8-1                            @cuda-rhel9-x86_64
nvidia-driver.x86_64                           3:575.57.08-1.el9                   @cuda-rhel9-x86_64
nvidia-driver-libs.x86_64                      3:575.57.08-1.el9                   @cuda-rhel9-x86_64
nvidia-kmod-common.noarch                      3:575.57.08-1.el9                   @cuda-rhel9-x86_64
nvidia-modprobe.x86_64                         3:575.57.08-1.el9                   @cuda-rhel9-x86_64
```
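To confirm that every driver package is installed at a single version, and compare that against the 550.100 that `nvidia-smi` reported earlier, the `dnf list installed` output can be parsed mechanically. This is a sketch that assumes dnf's usual three-column `name.arch  [epoch:]version-release  @repo` layout; the helper names are hypothetical. The container toolkit packages are versioned separately (1.17.8), so they are excluded:

```python
import re

# Leading "N:" epoch in an RPM version field, e.g. "3:575.57.08-1.el9".
EPOCH_RE = re.compile(r"^\d+:")


def installed_versions(dnf_output: str) -> dict:
    """Map package name -> upstream version from `dnf list installed` lines."""
    versions = {}
    for line in dnf_output.splitlines():
        fields = line.split()
        if len(fields) < 3 or not fields[2].startswith("@"):
            continue  # skip the command line, headers, and wrapped lines
        name = fields[0].rsplit(".", 1)[0]  # drop the ".x86_64" / ".noarch" suffix
        version = EPOCH_RE.sub("", fields[1]).split("-", 1)[0]  # drop epoch and release
        versions[name] = version
    return versions


def driver_versions(dnf_output: str) -> set:
    """Distinct versions among driver packages, ignoring the container toolkit."""
    return {
        version
        for name, version in installed_versions(dnf_output).items()
        if "container" not in name
    }


if __name__ == "__main__":
    sample = """\
kmod-nvidia-latest-dkms.x86_64   3:575.57.08-1.el9   @cuda-rhel9-x86_64
nvidia-container-toolkit.x86_64  1.17.8-1            @cuda-rhel9-x86_64
nvidia-driver.x86_64             3:575.57.08-1.el9   @cuda-rhel9-x86_64
"""
    print(driver_versions(sample))
```

A single-element set means the RPM-managed driver packages agree with each other.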
distant bear

Segfaults are generally driver-related. I'm not sure why that specific driver would be problematic, though.

old fox

Looks like that's the latest NVIDIA driver available on Rocky 9 at the moment. Maybe I could try replacing it with the DKMS version?


Actually, I'm already on DKMS.

distant bear

Do you have the latest nvidia-container-toolkit installed?

old fox

Yup, 1.17.8 is the latest per their GitHub.


I did have a pending kernel update, so I applied it and let DKMS rebuild, but that didn't help.

old fox

@distant bear If I set `LD_LIBRARY_PATH` for the ML container to include the path to CUDA, I'm able to get further:

```bash
immich_machine_learning | 2025-10-02 22:04:45.616101011 [E:onnxruntime:Default, cuda_call.cc:118 CudaCall] CUDA failure 803: system has unsupported display driver / cuda driver combination ; GPU=-1 ; hostname=1aeb1e8be6a3 ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_execution_provider_info.cc ; line=66 ; expr=cudaGetDeviceCount(&num_devices);
immich_machine_learning | *************** EP Error ***************
immich_machine_learning | EP Error /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_execution_provider_info.cc:59 static onnxruntime::CUDAExecutionProviderInfo onnxruntime::CUDAExecutionProviderInfo::FromProviderOptions(const onnxruntime::ProviderOptions&) [ONNXRuntimeError] : 1 : FAIL : provider_options_utils.h:151 Parse Failed to parse provider option "device_id": CUDA failure 803: system has unsupported display driver / cuda driver combination ; GPU=-1 ; hostname=1aeb1e8be6a3 ; file=/onnxruntime_src/onnxruntime/core/providers/cuda/cuda_execution_provider_info.cc ; line=66 ; expr=cudaGetDeviceCount(&num_devices);
immich_machine_learning | when using ['CUDAExecutionProvider', 'CPUExecutionProvider']
immich_machine_learning | Falling back to ['CPUExecutionProvider'] and retrying.
immich_machine_learning | ****************************************
```

old fox

I'm actually confused by your documentation. You state that compute capability 5.2 is the minimum required, but then you state that CUDA 12.3 support in the driver is required. From what I'm finding online, CUDA 12 does support some of these older GPUs. Did this section of the documentation fall out of date?

> This feature allows you to use a GPU to accelerate machine learning tasks, such as Smart Search and Facial Recognition, while reducing CPU load.

distant bear

Compute capability is a different thing from the CUDA version. The model engine we use requires a certain range of compute capabilities.
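To make the distinction concrete, the two requirements are independent checks: compute capability is a property of the GPU (6.1 for the 1080 Ti, per this thread), while the CUDA version is what the installed driver supports (12.4 in the `nvidia-smi` output posted earlier). A minimal sketch using the minimums quoted in this thread (5.2 and 12.3); `check_gpu` is a hypothetical helper, and it models only the lower bound, not the upper end of the supported compute-capability range:

```python
# Minimums as quoted in this thread; check the current Immich docs before relying on them.
MIN_COMPUTE_CAPABILITY = (5, 2)
MIN_DRIVER_CUDA = (12, 3)


def parse_version(text: str) -> tuple:
    """Turn "6.1" or "12.4" into a comparable tuple like (6, 1)."""
    return tuple(int(part) for part in text.split("."))


def check_gpu(compute_capability: str, driver_cuda: str) -> list:
    """Return a list of unmet requirements (empty if both checks pass)."""
    problems = []
    if parse_version(compute_capability) < MIN_COMPUTE_CAPABILITY:
        problems.append(f"compute capability {compute_capability} is below 5.2")
    if parse_version(driver_cuda) < MIN_DRIVER_CUDA:
        problems.append(f"driver CUDA support {driver_cuda} is below 12.3")
    return problems


if __name__ == "__main__":
    # Values from this thread: GTX 1080 Ti (6.1) and nvidia-smi's "CUDA Version: 12.4".
    print(check_gpu("6.1", "12.4") or "both requirements met")
```

Tuples compare element by element, so (12, 4) correctly sorts above (12, 3), and the thread's hardware passes both checks.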

old fox

For what it's worth, setting `LD_LIBRARY_PATH` fixed all of my issues, and it looks like others have run into similar problems. You may want to consider adding the CUDA path included in the image to `LD_LIBRARY_PATH`.
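For context, `LD_LIBRARY_PATH` is a colon-separated list of directories the dynamic linker searches before the system's default library locations, so putting a directory first makes its libraries take precedence. The thread doesn't say which CUDA directory was added, so this sketch uses a placeholder (`/hypothetical/cuda/lib`) and a hypothetical helper name; it only illustrates building the value, not Immich's actual configuration:

```python
import os
from typing import Optional


def prepend_library_path(new_dir: str, current: Optional[str] = None) -> str:
    """Return an LD_LIBRARY_PATH value with new_dir first and no duplicate entries."""
    if current is None:
        current = os.environ.get("LD_LIBRARY_PATH", "")
    # Empty entries would mean "current directory" to the dynamic linker, so drop them.
    existing = [entry for entry in current.split(":") if entry and entry != new_dir]
    return ":".join([new_dir, *existing])


if __name__ == "__main__":
    # "/hypothetical/cuda/lib" is a placeholder, not the path used in this thread.
    print(prepend_library_path("/hypothetical/cuda/lib", "/usr/local/lib"))
```

This prints `/hypothetical/cuda/lib:/usr/local/lib`. In a Compose setup the resulting value would go into the ML container's environment; the correct directory depends on the image.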