#ML not working after newest update


severe elk
#

I just tried out the new ML update, and I've gotten it to work on immich-microservices (transcoding).
Unfortunately I didn't find success with immich-machine-learning.

Here are my ML yml files and how I referenced them in the docker-compose.

docker-compose.yml

  immich-microservices:
    container_name: immich_microservices
    image: ghcr.io/immich-app/immich-server:${IMMICH_VERSION:-release}
    extends:
      file: hwaccel.transcoding.yml
      service: quicksync
    command:
      - start.sh
      - microservices
    volumes:
      - ${UPLOAD_LOCATION}:/usr/src/app/upload
      - /etc/localtime:/etc/localtime:ro
    env_file:
      - .env
    depends_on:
      - redis
      - database
    restart: always
  immich-machine-learning:
    container_name: immich_machine_learning
    image: ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION:-release}-openvino
    extends:
      file: hwaccel.ml.yml
      service: openvino
    volumes:
      - ${CACHE_LOCATION}:/cache
    env_file:
      - .env
    restart: always

hwaccel.ml.yml

version: "3.8"
services:
  cpu: {}

  openvino:
    device_cgroup_rules:
      - "c 189:* rmw"
    devices:
      - /dev/dri:/dev/dri
    volumes:
      - /dev/bus/usb:/dev/bus/usb

hwaccel.transcoding.yml

version: "3.8"

services:

  cpu: {}

  quicksync:
    devices:
      - /dev/dri:/dev/dri
jagged horizonBOT
#

:wave: Hey @severe elk,

Thanks for reaching out to us. Please follow the recommended actions below; this will help us be more effective in our support effort and leave more time for building Immich.

Checklist

  1. :ballot_box_with_check: I have verified I'm on the latest release (note that mobile app releases may take some time).
  2. :ballot_box_with_check: I have read applicable release notes.
  3. :ballot_box_with_check: I have reviewed the FAQs for known issues.
  4. :ballot_box_with_check: I have reviewed GitHub for known issues.
  5. :ballot_box_with_check: I have tried accessing Immich via local IP (without a custom reverse proxy).
  6. :ballot_box_with_check: I have uploaded the relevant logs, docker compose, and .env files using the buttons below or the /upload command.
  7. :ballot_box_with_check: I have tried an incognito window, disabled extensions, cleared mobile app cache, logged out and back in, different browsers, etc. as applicable.

(an item can be marked as "complete" by reacting with the appropriate number)

If this ticket can be closed you can use the /close command, and re-open it later if needed.

severe elk
#

oh I see, so it wasn't a configuration error, thanks

#

I use the same CLIP too

charred elm
#

I'm back to poor old CPU 🙂 Transcoding videos does work for me, though

severe elk
#

Yeah, same

void ridge
#

Hmm... getting this for my ML. I guess the good news for me is that it's recognizing my GPU:

safe flint
#

OpenVINO is a bit flakier than CPU or CUDA. It might be incompatible with the model you're using

void ridge
#

yeah.. maybe I just rotate all the models and see which one works...

#

Also I'm using Portainer... so I can't configure cpu: {}... wouldn't it fall back to CPU?

safe flint
#

To use CPU, you can just remove the devices: section so it doesn't see the GPU
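For a Portainer-style stack where extends: can't point at a separate file, a rough sketch of that could look like the service below. It is based on the immich-machine-learning definition posted at the top of this thread with the devices: section left out; the cache volume and env file names are copied from that post and may differ in your stack. Dropping device_cgroup_rules and the /dev/bus/usb mount as well is an assumption here, since only devices: was mentioned.

```yaml
services:
  immich-machine-learning:
    container_name: immich_machine_learning
    image: ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION:-release}-openvino
    # No devices: entry, so /dev/dri is not passed through and the
    # container does not see the GPU.
    # Assumption: device_cgroup_rules and the /dev/bus/usb mount are
    # omitted too; the thread only mentions removing devices:.
    volumes:
      - ${CACHE_LOCATION}:/cache
    env_file:
      - .env
    restart: always
```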

void ridge
#

ah... I see what you mean. cpu is a placeholder service that leaves devices: empty. Thanks for clarifying.

safe flint
#

Can you try using the pr-6871-openvino image tag for immich-machine-learning? I don't have an Intel device handy, but I suspect that this might fix your issue

severe elk
safe flint
#

You can change the value of the image: line to ghcr.io/immich-app/immich-machine-learning:pr-6871-openvino
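As a sketch, applied to the immich-machine-learning service from the first post, only the image: line changes and everything else stays as it was. This is a temporary test tag, so it would presumably be switched back to the release tag afterwards.

```yaml
  immich-machine-learning:
    container_name: immich_machine_learning
    # Test tag suggested above, replacing ${IMMICH_VERSION:-release}-openvino
    image: ghcr.io/immich-app/immich-machine-learning:pr-6871-openvino
    extends:
      file: hwaccel.ml.yml
      service: openvino
    volumes:
      - ${CACHE_LOCATION}:/cache
    env_file:
      - .env
    restart: always
```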

severe elk
#

got it, give me a second

#
immich_machine_learning  | [02/03/24 00:42:51] INFO     Loading facial recognition model 'antelopev2' to
immich_machine_learning  |                              memory

immich_machine_learning  | EP Error /home/onnxruntimedev/onnxruntime/onnxruntime/python/onnxruntime_pybind_state.cc:739 std::unique_ptr<onnxruntime::IExecutionProvider> onnxruntime::python::CreateExecutionProviderInstance(const onnxruntime::SessionOptions&, const string&, const ProviderOptionsMap&) Invalid OpenVINO EP option: disable_dynamic_shapes
immich_machine_learning  |  when using ['OpenVINOExecutionProvider', 'CPUExecutionProvider']
immich_machine_learning  | Falling back to ['CPUExecutionProvider'] and retrying.
immich_machine_learning  | EP Error /home/onnxruntimedev/onnxruntime/onnxruntime/python/onnxruntime_pybind_state.cc:739 std::unique_ptr<onnxruntime::IExecutionProvider> onnxruntime::python::CreateExecutionProviderInstance(const onnxruntime::SessionOptions&, const string&, const ProviderOptionsMap&) Invalid OpenVINO EP option: disable_dynamic_shapes
immich_machine_learning  |  when using ['OpenVINOExecutionProvider', 'CPUExecutionProvider']
immich_machine_learning  | Falling back to ['CPUExecutionProvider'] and retrying.
immich_machine_learning  | [02/03/24 00:42:53] INFO     Loading clip model 'ViT-H-14-quickgelu__dfn5b' to
immich_machine_learning  |                              memory

immich_machine_learning  | EP Error /home/onnxruntimedev/onnxruntime/onnxruntime/python/onnxruntime_pybind_state.cc:739 std::unique_ptr<onnxruntime::IExecutionProvider> onnxruntime::python::CreateExecutionProviderInstance(const onnxruntime::SessionOptions&, const string&, const ProviderOptionsMap&) Invalid OpenVINO EP option: disable_dynamic_shapes
immich_machine_learning  |  when using ['OpenVINOExecutionProvider', 'CPUExecutionProvider']
immich_machine_learning  | Falling back to ['CPUExecutionProvider'] and retrying.
#

just in case:

version: "3.8"
services:
  cpu: {}

  openvino:
    device_cgroup_rules:
      - "c 189:* rmw"
    devices:
      - /dev/dri:/dev/dri
    volumes:
      - /dev/bus/usb:/dev/bus/usb
    extends:
      file: hwaccel.ml.yml
      service: openvino
safe flint
#

Hm, looks like that option was only recently renamed to disable_dynamic_shapes. Before that it was enable_dynamic_shapes and defaulted to false

safe flint
#

@severe elk Can you pull and try again?

void ridge
safe flint
#

Is that with the release image or the pr image?

void ridge
safe flint
#

Hm, when did you pull the image?

void ridge
#

around 11:40 PM EST

#

like about 5 mins before I posted the latest log

safe flint
#

Can you try pulling to see if it's the latest?

void ridge
#

yep. clearing my image now and repulling

safe flint
#

I just reverted a change so try pulling and trying again

void ridge
#

oh hey... looks promising. no error

#

one sec..

#

hope I set it up right..

safe flint
#

Hey, there we go

void ridge
#

yeah looks ok here

#

nice nice...

#

it's set up right

#

i have this much so will see how it goes...

safe flint
#

Awesome. The latest change made it so the model wasn't totally copied, just the small "instruction" file. But from your error it looks like that doesn't work, so I guess we just have to make another copy of the full thing

void ridge
#

oh.. what I also did was clear out my model-cache folder so the models downloaded again, just to be sure. not sure if that also made a difference

#

looks like it's going down by 12-15 images per second

safe flint
#

How fast was it on CPU? And what does your utilization look like?

void ridge
#

hmm.. lemme check proxmox

#

I think it was about 80% before.. didn't do a comparison.. but lemme try intel_gpu_top.... my vGPU passthrough is a bit funky on that

#

yep.. it's WORKING!!! IT'S ALIVEEEEE!!

#

thanks dude!

#

was like totally 0 before

safe flint
#

Let's gooo

severe elk
#

oh nice let me try

severe elk
#

Nice, it works! (only tested with 1 image though lol)

#

I get a different error though, which I think is unrelated. I'm asking just in case:

immich_microservices     | [Nest] 7  - 02/03/2024, 11:13:22 AM   ERROR [JobService] Unable to run job handler (smartSearch/smart-search): Error: ENOENT: no such file or directory, open 'upload/thumbs/4b543d3b-e766-49b1-a36b-528e8be14ce0/87/e4/87e43c32-706a-4185-a21c-4353598dab25.jpeg'
immich_microservices     | [Nest] 7  - 02/03/2024, 11:13:22 AM   ERROR [JobService] Error: ENOENT: no such file or directory, open 'upload/thumbs/4b543d3b-e766-49b1-a36b-528e8be14ce0/87/e4/87e43c32-706a-4185-a21c-4353598dab25.jpeg'
immich_microservices     | [Nest] 7  - 02/03/2024, 11:13:22 AM   ERROR [JobService] Object:
immich_microservices     | {
immich_microservices     |   "id": "87e43c32-706a-4185-a21c-4353598dab25"
immich_microservices     | }
immich_microservices     |
immich_microservices     | [Nest] 7  - 02/03/2024, 11:13:31 AM   ERROR [JobService] Unable to run job handler (faceDetection/face-detection): Error: ENOENT: no such file or directory, open 'upload/thumbs/4b543d3b-e766-49b1-a36b-528e8be14ce0/87/e4/87e43c32-706a-4185-a21c-4353598dab25.jpeg'
immich_microservices     | [Nest] 7  - 02/03/2024, 11:13:31 AM   ERROR [JobService] Error: ENOENT: no such file or directory, open 'upload/thumbs/4b543d3b-e766-49b1-a36b-528e8be14ce0/87/e4/87e43c32-706a-4185-a21c-4353598dab25.jpeg'
immich_microservices     | [Nest] 7  - 02/03/2024, 11:13:31 AM   ERROR [JobService] Object:
immich_microservices     | {
immich_microservices     |   "id": "87e43c32-706a-4185-a21c-4353598dab25"
immich_microservices     | }
#

it can't find a thumbnail. this file indeed doesn't exist, but what now? how do I make the error go away?

#
immich_microservices     | [Nest] 7  - 02/03/2024, 11:18:05 AM   ERROR [JobService] Unable to run job handler (thumbnailGeneration/generate-webp-thumbnail): Error: ffmpeg exited with code 1: Conversion failed!
immich_microservices     |
immich_microservices     | [Nest] 7  - 02/03/2024, 11:18:05 AM   ERROR [JobService] Error: ffmpeg exited with code 1: Conversion failed!
immich_microservices     |
immich_microservices     |     at ChildProcess.<anonymous> (/usr/src/app/node_modules/fluent-ffmpeg/lib/processor.js:182:22)
immich_microservices     |     at ChildProcess.emit (node:events:518:28)
immich_microservices     |     at ChildProcess._handle.onexit (node:internal/child_process:294:12)
immich_microservices     | [Nest] 7  - 02/03/2024, 11:18:05 AM   ERROR [JobService] Object:
immich_microservices     | {
immich_microservices     |   "id": "87e43c32-706a-4185-a21c-4353598dab25"
immich_microservices     | }

Oh, and I get the error above when running GENERATE THUMBNAILS

tame pasture
severe elk
#

do you know what's causing it?

tame pasture
#

I think this is a dependency issue

quiet sparrow
#

Just pasting here in hopes of helping with troubleshooting. Changed to your dev image, but I'm getting this error with the default model for smart search:

                             onnxruntime::openvino_ep::OVExeNetwork             
                             onnxruntime::openvino_ep::OVCore::LoadNetwork(const
                             string&, std::string&, ov::AnyMap&, std::string)   
                             [OpenVINO-EP]  Exception while Loading Network for 
                             graph:                                             
                             OpenVINOExecutionProvider_OpenVINO-EP-subgraph_910_
                             0Check 'false' failed at                           
                             src/inference/src/core.cpp:149:                    
                             invalid external data:                             
                             ExternalDataInfo(data_full_path:                   
                             693d0214-c641-11ee-b47d-0242ac1f0002, offset: 0,   
                             data_length: 9437184)                              
#
[02/08/24 05:25:56] DEBUG    Loading clip vision model 'ViT-B-32__openai'       
2024-02-08 05:25:56.374002632 [E:onnxruntime:, inference_session.cc:1754 operator()] Exception during initialization: /home/onnxruntimedev/onnxruntime/onnxruntime/core/providers/openvino/ov_interface.cc:53 onnxruntime::openvino_ep::OVExeNetwork onnxruntime::openvino_ep::OVCore::LoadNetwork(const string&, std::string&, ov::AnyMap&, std::string) [OpenVINO-EP]  Exception while Loading Network for graph: OpenVINOExecutionProvider_OpenVINO-EP-subgraph_911_0Check 'false' failed at src/inference/src/core.cpp:149:
invalid external data: ExternalDataInfo(data_full_path: 693d0214-c641-11ee-b47d-0242ac1f0002, offset: 0, data_length: 9437184)
quiet sparrow
#

Updating this rather than opening a new issue. Just updated to the latest 1.95 and iGPU is still giving the same error:

services:
  immich-machine-learning:
    container_name: immich_machine_learning
    #image: ghcr.io/immich-app/immich-machine-learning:pr-6871-openvino
    image: ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION:-release}-openvino
    device_cgroup_rules:
      - "c 189:* rmw"
    devices:
      - /dev/dri:/dev/dri
    volumes:
      - /dev/bus/usb:/dev/bus/usb
      - model-cache:/cache
    #volumes:
      #- model-cache:/cache
    env_file:
      - stack.env
    ports:
      - 3003:3003
    restart: always
safe flint
#

We're still waiting for the onnxruntime-openvino package to be updated

quiet sparrow
#

aah okay, my non-GitHub-savvy brain thought the change below addressed this in the latest update:
fix(ml): openvino not working with dynamic axes by @mertalev in #6871

safe flint
#

Unfortunately that wasn't the only bug with openvino