#Issues with machine learning

exotic steeple
#

I just moved from LXC + Docker to running Podman directly on the Proxmox host, primarily because I was having a lot of issues with smart search. Everything started up fine with no issues, pretty much a straightforward podman-compose up -d with no changes. nvidia-smi was already working on the host since I had it working in LXC.

However, now any time I try to access smart search, run a job, or do anything else, I get the following errors about the worker getting killed, and then it just repeats with a different pid every time.

[10/14/24 23:39:03] INFO     Initialized request thread pool with 8 threads.    
[10/14/24 23:39:03] INFO     Application startup complete.                      
[10/14/24 23:39:03] INFO     Loading visual model 'ViT-B-32__openai' to memory  
[10/14/24 23:39:03] INFO     Setting execution providers to                     
                             ['CUDAExecutionProvider', 'CPUExecutionProvider'], 
                             in descending order of preference                  
[10/14/24 23:39:04] ERROR    Worker (pid:99) was sent SIGSEGV!                  
[10/14/24 23:39:04] INFO     Booting worker with pid: 127                      
gusty brambleBOT
#

:wave: Hey @exotic steeple,

Thanks for reaching out to us. Please follow the recommended actions below; this will help us be more effective in our support effort and leave more time for building Immich.

Checklist

  1. :ballot_box_with_check: I have verified I'm on the latest release (note that mobile app releases may take some time).
  2. :ballot_box_with_check: I have read applicable release notes.
  3. :ballot_box_with_check: I have reviewed the FAQs for known issues.
  4. :ballot_box_with_check: I have reviewed GitHub for known issues.
  5. :ballot_box_with_check: I have tried accessing Immich via local ip (without a custom reverse proxy).
  6. :ballot_box_with_check: I have uploaded the relevant logs, docker compose, and .env files, making sure to use code formatting.
  7. :ballot_box_with_check: I have tried an incognito window, disabled extensions, cleared mobile app cache, logged out and back in, different browsers, etc. as applicable

(an item can be marked as "complete" by reacting with the appropriate number)

If this ticket can be closed you can use the /close command, and re-open it later if needed.

gusty brambleBOT
exotic steeple
#
 immich-machine-learning:
    container_name: immich_machine_learning
    # For hardware acceleration, add one of -[armnn, cuda, openvino] to the image tag.
    # Example tag: ${IMMICH_VERSION:-release}-cuda
    image: ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION:-release}-cuda
    extends: # uncomment this section for hardware acceleration - see https://immich.app/docs/features/ml-hardware-acceleration
      file: hwaccel.ml.yml
      service: cuda # set to one of [armnn, cuda, openvino, openvino-wsl] for accelerated inference - use the `-wsl` version for WSL2 where applicable
    volumes:
      - /collective/appdata/immich/model:/cache
    env_file:
      - .env
    restart: unless-stopped
compact marsh
#

Try switching to CPU search. If it works, then your hardware acceleration settings might not be correct.
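
For reference, a CPU-only variant of the service posted above might look like the sketch below. It assumes the stock Immich compose layout: the `-cuda` suffix is dropped from the image tag and the `extends` block is commented out, which is what the template comments describe as the non-accelerated default. The volume path is copied from the posted file.

```yaml
  immich-machine-learning:
    container_name: immich_machine_learning
    # No hardware suffix on the tag, so inference runs on the CPU.
    image: ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION:-release}
    # extends:  # left commented out so no GPU reservation is requested
    #   file: hwaccel.ml.yml
    #   service: cuda
    volumes:
      - /collective/appdata/immich/model:/cache
    env_file:
      - .env
    restart: unless-stopped
```

If smart search works in this form, the crash is likely specific to the GPU path rather than to the models or the cache volume.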

exotic steeple
#

what hardware acceleration settings?

compact marsh
exotic steeple
#

I don't. I literally use what the docker compose provides

#

I know it works with test cases like the one below. I would just expect docker-compose to somehow pass the NVIDIA GPU through the way I do manually, since I have no control over it with compose?

podman run --rm --device nvidia.com/gpu=all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi
Tue Oct 15 02:03:49 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.120                Driver Version: 550.120        CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1080        Off |   00000000:01:00.0 Off |                  N/A |
| 14%   47C    P0             48W /  230W |       0MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
...
compact marsh
exotic steeple
#

Yes, that is what the docker-compose file provides with the extends, which I have set and have posted. From the website:

Example of a Compose file for running a service with access to 1 GPU device

services:
  test:
    image: nvidia/cuda:12.3.1-base-ubuntu20.04
    command: nvidia-smi
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
#

from hwaccel.ml.yml

root@server /c/a/immich-compose# grep -A 10 "cuda" hwaccel.ml.yml 
  cuda:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities:
                - gpu
exotic steeple
#

well i got it working by doing this explicitly in docker-compose.yml

    devices:
      - nvidia.com/gpu=all
    security_opt:
      - "label=disable"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities:
                - gpu
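
Merged into the service definition posted earlier, that change would look roughly like the sketch below. Whether the `extends` block was kept alongside the inline `deploy` section is not stated in the thread; this sketch assumes it was dropped, since the reservation is repeated inline. `nvidia.com/gpu=all` is the same CDI device name used in the `podman run` test, and `label=disable` turns off SELinux label separation for the container.

```yaml
  immich-machine-learning:
    container_name: immich_machine_learning
    image: ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION:-release}-cuda
    # Pass the GPU explicitly as a CDI device, as in the podman run test.
    devices:
      - nvidia.com/gpu=all
    # Disable SELinux label separation for this container.
    security_opt:
      - "label=disable"
    # Same reservation that hwaccel.ml.yml defines for the cuda service.
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities:
                - gpu
    volumes:
      - /collective/appdata/immich/model:/cache
    env_file:
      - .env
    restart: unless-stopped
```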
mental raptor
exotic steeple
#

yep i literally did apt update and then installed nvidia-container-toolkit right before i posted this 😛