Issues with machine learning | Immich

exotic steeple Oct 14, 2024, 11:42 PM

I just moved from LXC + Docker to running Podman directly on the Proxmox host, mainly because I was having a lot of issues with smart search. Everything started up fine: it was pretty much a straightforward podman-compose up -d with no changes. nvidia-smi was already working on the host, since I had it working in LXC.

However, now any time I use smart search, run a job, or do anything else, I get the following error about the worker getting killed, and then it just repeats with a different PID every time.

[10/14/24 23:39:03] INFO     Initialized request thread pool with 8 threads.    
[10/14/24 23:39:03] INFO     Application startup complete.                      
[10/14/24 23:39:03] INFO     Loading visual model 'ViT-B-32__openai' to memory  
[10/14/24 23:39:03] INFO     Setting execution providers to                     
                             ['CUDAExecutionProvider', 'CPUExecutionProvider'], 
                             in descending order of preference                  
[10/14/24 23:39:04] ERROR    Worker (pid:99) was sent SIGSEGV!                  
[10/14/24 23:39:04] INFO     Booting worker with pid: 127

gusty bramble (BOT) Oct 14, 2024, 11:42 PM

:wave: Hey @exotic steeple,

Thanks for reaching out to us. Please follow the recommended actions below; this will help us be more effective in our support effort and leave more time for building Immich.

References

Container Logs: docker compose logs (docs)
Container Status: docker compose ps (docs)
Reverse Proxy: https://immich.app/docs/administration/reverse-proxy

Checklist

:ballot_box_with_check: I have verified I'm on the latest release (note that mobile app releases may take some time).
:ballot_box_with_check: I have read applicable release notes.
:ballot_box_with_check: I have reviewed the FAQs for known issues.
:ballot_box_with_check: I have reviewed Github for known issues.
:ballot_box_with_check: I have tried accessing Immich via local ip (without a custom reverse proxy).
:ballot_box_with_check: I have uploaded the relevant logs, docker compose, and .env files, making sure to use code formatting.
:ballot_box_with_check: I have tried an incognito window, disabled extensions, cleared mobile app cache, logged out and back in, different browsers, etc. as applicable.

(an item can be marked as "complete" by reacting with the appropriate number)

If this ticket can be closed you can use the /close command, and re-open it later if needed.

gusty bramble (BOT) Oct 15, 2024, 12:01 AM

Successfully submitted, a tag has been added to inform contributors. :white_check_mark:

exotic steeple Oct 15, 2024, 12:06 AM

  immich-machine-learning:
    container_name: immich_machine_learning
    # For hardware acceleration, add one of -[armnn, cuda, openvino] to the image tag.
    # Example tag: ${IMMICH_VERSION:-release}-cuda
    image: ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION:-release}-cuda
    extends: # uncomment this section for hardware acceleration - see https://immich.app/docs/features/ml-hardware-acceleration
      file: hwaccel.ml.yml
      service: cuda # set to one of [armnn, cuda, openvino, openvino-wsl] for accelerated inference - use the `-wsl` version for WSL2 where applicable
    volumes:
      - /collective/appdata/immich/model:/cache
    env_file:
      - .env
    restart: unless-stopped

compact marsh Oct 15, 2024, 12:22 AM

Try switching smart search to CPU. If that works, your hardware acceleration settings are probably not correct.
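
A minimal CPU-only sketch of the service posted above, assuming the stock Immich compose layout: following the comments in that file, the image tag drops its -cuda suffix and the extends block is commented out, so no GPU is requested. Everything else is copied unchanged from the posted snippet.

```yaml
  immich-machine-learning:
    container_name: immich_machine_learning
    # CPU-only sketch: no "-cuda" suffix on the tag, so the default (CPU) image is pulled
    image: ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION:-release}
    # extends:  # commented out so the cuda GPU reservation from hwaccel.ml.yml is not applied
    #   file: hwaccel.ml.yml
    #   service: cuda
    volumes:
      - /collective/appdata/immich/model:/cache
    env_file:
      - .env
    restart: unless-stopped
```

If smart search works with this definition, the crash is isolated to the GPU path rather than to the models or the cache volume.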

exotic steeple Oct 15, 2024, 12:31 AM

#

what hardware acceleration settings?

compact marsh Oct 15, 2024, 1:03 AM

The way you pass through your GPU and the GPU/CUDA drivers, etc.

exotic steeple Oct 15, 2024, 1:55 AM

I don't. I literally use what the Docker Compose file provides.

I know GPU passthrough works with test cases like the one below. I'd just expect docker-compose to pass the NVIDIA GPU through the same way I do, since I have no control over that with Compose?

podman run --rm --device nvidia.com/gpu=all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi
Tue Oct 15 02:03:49 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.120                Driver Version: 550.120        CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1080        Off |   00000000:01:00.0 Off |                  N/A |
| 14%   47C    P0             48W /  230W |       0MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
...

compact marsh Oct 15, 2024, 2:05 AM

https://docs.docker.com/compose/how-tos/gpu-support/ (Docker Documentation: Enable GPU support)

exotic steeple Oct 15, 2024, 2:10 AM

Yes, that is what the docker-compose file provides through extends, which I have set and posted above. From the website:

Example of a Compose file for running a service with access to 1 GPU device

services:
  test:
    image: nvidia/cuda:12.3.1-base-ubuntu20.04
    command: nvidia-smi
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

From hwaccel.ml.yml:

root@server /c/a/immich-compose# grep -A 10 "cuda" hwaccel.ml.yml 
  cuda:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities:
                - gpu

exotic steeple Oct 15, 2024, 2:41 AM

Well, I got it working by adding this explicitly to docker-compose.yml:

    devices:
      - nvidia.com/gpu=all
    security_opt:
      - "label=disable"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities:
                - gpu
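
Put together, a sketch of what the full service could look like with those settings merged into the snippet posted earlier. Assumptions (the thread does not confirm them): the extends block is dropped because the same reservation is now inlined, and the host already has an NVIDIA CDI spec so Podman can resolve nvidia.com/gpu=all (the earlier podman run --device nvidia.com/gpu=all test suggests it does).

```yaml
  immich-machine-learning:
    container_name: immich_machine_learning
    image: ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION:-release}-cuda
    # Podman: request all NVIDIA GPUs by CDI device name (assumes a CDI spec exists on the host)
    devices:
      - nvidia.com/gpu=all
    # Disable SELinux label separation so the container can open the GPU device nodes;
    # this only has an effect on SELinux-enabled hosts
    security_opt:
      - "label=disable"
    # Same reservation as the cuda service in hwaccel.ml.yml, inlined instead of using extends
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities:
                - gpu
    volumes:
      - /collective/appdata/immich/model:/cache
    env_file:
      - .env
    restart: unless-stopped
```

Keeping the deploy section alongside devices mirrors what worked in this thread; it is not established here whether podman-compose needs both or only the devices entry.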

mental raptor Oct 15, 2024, 2:42 AM

Make sure your NVIDIA Container Toolkit is up to date. I don't have experience using GPUs with Podman, but you may also need to configure it: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html#configuring-podman

mental raptor Oct 15, 2024, 2:43 AM

Or just do that lol

exotic steeple Oct 15, 2024, 2:43 AM

Yep, I literally ran apt update and then installed nvidia-container-toolkit right before I posted this 😛
