#openvino immich_machine_learning


copper sable
#

It seems that sometimes my system temporarily runs out of gpu resources. when that happens, the machine learning container crashes but keeps running.
immich_server

dusky emberBOT
#

:wave: Hey @copper sable,

Thanks for reaching out to us. Please follow the recommended actions below; this will help us be more effective in our support effort and leave more time for building Immich.

References

Checklist

  1. :ballot_box_with_check: I have verified I'm on the latest release (note that mobile app releases may take some time).
  2. :ballot_box_with_check: I have read applicable release notes.
  3. :ballot_box_with_check: I have reviewed the FAQs for known issues.
  4. :ballot_box_with_check: I have reviewed Github for known issues.
  5. :ballot_box_with_check: I have tried accessing Immich via local ip (without a custom reverse proxy).
  6. :ballot_box_with_check: I have uploaded the relevant logs, docker compose, and .env files, making sure to use code formatting.
  7. :ballot_box_with_check: I have tried an incognito window, disabled extensions, cleared mobile app cache, logged out and back in, different browsers, etc. as applicable

(an item can be marked as "complete" by reacting with the appropriate number)

If this ticket can be closed you can use the /close command, and re-open it later if needed.

copper sable
#

the immich_machine_learning container does not recover unless I manually reboot it, and it causes all jobs in the queue to fail. It processes one failure every 5 min.

dusky emberBOT
copper sable
#

This first crash happened after 32 minutes of running. Will monitor the next round as well

#

using image ghcr.io/immich-app/immich-machine-learning:pr-12455-openvino

#

version 1.114

#

it crashes again after 7 min. I'll try running just facial detection to see what happens

#

after this "crash" the container uses an additional 1gb of ram but is still unable to process facial jobs.

#

and crashed after 3-4 assets.

#

it seems to be loading buffalo_l twice

[09/24/24 21:00:06] INFO     Application startup complete.
[09/24/24 21:00:22] INFO     Loading detection model 'buffalo_l' to memory
[09/24/24 21:00:22] INFO     Setting execution providers to
                             ['OpenVINOExecutionProvider',
                             'CPUExecutionProvider'], in descending order of
                             preference
[09/24/24 21:00:23] INFO     Loading recognition model 'buffalo_l' to memory
[09/24/24 21:00:23] INFO     Setting execution providers to
                             ['OpenVINOExecutionProvider',
                             'CPUExecutionProvider'], in descending order of
                             preference

when loading the SigLIP model, it only loads once. Job concurrency on both is 1

obsidian trench
#

i have same issue

copper sable
#

woohoo!! im not the only one!!

#

do you have this too?
I'm getting a fair bit of these warnings in my immich_machine_learning container. Because it's a warning, is it something I should give attention?

[W:onnxruntime:, execution_frame.cc:879 VerifyOutputSizes] Expected shape from model of {1,512} does not match actual shape of {2,512} for output 683

copper sable
topaz basin
#

I'm guessing you're right at the edge of your iGPU's RAM limit, so if it uses a bit extra at some point it all falls apart

cerulean surge
topaz basin
copper sable
topaz basin
#

As of the 1.115 release

copper sable
#

my bad, I didn't see any of that in the release notes. Guess I'll update the stack tomorrow

topaz basin
#

It shouldn't use more resources than the ViT model. I just meant that buffalo_l is really two models: a detection model and a recognition model

copper sable
#

oh gotcha

topaz basin
#

Technically the recognition model has unbounded RAM usage because it will process all faces in an image in one batch, so if it encounters an image with a lot of faces it'll use a bit more RAM than usual

#

But it should always use a lot less than the ViT model you have
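
A rough sketch of why batching every face together makes memory grow with the face count. The 3x112x112 float32 input per face is an assumption about the recognition model, not something stated in the thread; the 512-wide output matches the `{1,512}` and `{2,512}` shapes in the warning above. Only the input and output tensors are counted here, so the intermediate activations that dominate in practice are not modeled.

```python
# Assumed shapes: 3x112x112 float32 input per face, 512 float32 outputs per face.
BYTES_PER_FLOAT32 = 4
INPUT_VALUES_PER_FACE = 3 * 112 * 112   # assumption, not taken from the thread
OUTPUT_VALUES_PER_FACE = 512            # matches the {N,512} shapes in the warning


def batch_io_bytes(num_faces: int) -> int:
    """Bytes for the input and output tensors of one recognition batch."""
    if num_faces < 0:
        raise ValueError("num_faces must be non-negative")
    per_face = (INPUT_VALUES_PER_FACE + OUTPUT_VALUES_PER_FACE) * BYTES_PER_FLOAT32
    return num_faces * per_face


for faces in (1, 20, 24):
    print(faces, "faces ->", batch_io_bytes(faces), "bytes")
```

The point is only the linear growth: there is no cap on the batch, so an image with many faces costs proportionally more than a typical one.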

copper sable
#

even with multiple faces?

#

I'm still using ghcr.io/immich-app/immich-machine-learning:pr-12455-openvino, should I be using release or continue with this one?

#

Is there any suggested method to automatically reboot the container to handle these crashes?

topaz basin
copper sable
#

oh yes that is tiny. It really shouldn't get anywhere close then

topaz basin
#

I wonder if this is related to the ram not being fully freed when the worker exits

topaz basin
#

Could also be related to the kernels on Synologys being super outdated

#

It's a wonder any of this stuff even works haha

copper sable
#

hahaha lovely. Just what I wanted to hear...

#

you think that guide would work with an outdated kernel?

topaz basin
#

I'm not sure about step 1 since Synology has its own setup for kernels, but the remaining steps should be fine

copper sable
#

ok thanks, guess I'll make sure backups are up to snuff and give it a go tomorrow. My current kernel is 4.4.302+

copper sable
copper sable
#

is there a way to do facial recognition via cpu but smartsearch via hardware acceleration?

topaz basin
#

There’s no easy way. I guess you could run two ML instances with a load balancer or something

topaz basin
#

There is a possible solution for you in running the models in fp16 instead of fp32. It should halve the models’ memory usage with a tiny difference in quality
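
A minimal sketch of the "halve the memory" arithmetic, using only the standard sizes of single and half precision floats. The parameter count is a made-up number for illustration, not the size of buffalo_l or any other real model.

```python
import struct

# Standard sizes of IEEE 754 single ("f") and half ("e") precision values.
FP32_BYTES = struct.calcsize("<f")
FP16_BYTES = struct.calcsize("<e")


def weights_bytes(num_params: int, bytes_per_value: int) -> int:
    """Bytes needed to store num_params values at the given width."""
    return num_params * bytes_per_value


HYPOTHETICAL_PARAMS = 50_000_000  # illustrative only, not a real model's size
fp32 = weights_bytes(HYPOTHETICAL_PARAMS, FP32_BYTES)
fp16 = weights_bytes(HYPOTHETICAL_PARAMS, FP16_BYTES)
print(f"fp32: {fp32 / 2**20:.1f} MiB, fp16: {fp16 / 2**20:.1f} MiB")
```

The same factor of two applies to any buffer stored at that precision, which is why the saving is expected to carry over to per-batch working memory as well.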

copper sable
#

thing is, it's just buffalo_l that's the problem. I could handle it crashing and restarting correctly as well, however when it crashes it takes down the entire model (this does not happen with ViT-L-16-SigLIP-384__webli). I've been running ViT-L-16-SigLIP-384__webli with no problems for about 12 hours now

topaz basin
#

Does this only happen when both tasks are being processed, or also when face detection alone is?

#

i.e. is it just the combined total ram that’s the problem, or is there something specific about the facial recognition model outside of ram issues?

copper sable
topaz basin
#

Does the container use more than 8GiB when that happens?

copper sable
#

I don't think so

#

Let me run a quick comparison

#

This is running SigLIP

#

and buffalo_l

#

the ram seems much more volatile than sigLIP, as sigLIP ran steady

#

just crashed again

topaz basin
#

Could you restart the container and run docker exec immich_machine_learning rm -r /cache/facial-recognition/buffalo_l/recognition/openvino?

copper sable
#

Will do. I just upgraded to 1.115 so I assumed I had a fresh palette

topaz basin
copper sable
#

it did

#
ash-4.4# docker attach immich_machine_learning
[09/25/24 13:11:34] INFO     Loading detection model 'buffalo_l' to memory
[09/25/24 13:11:34] INFO     Setting execution providers to
                             ['OpenVINOExecutionProvider',
                             'CPUExecutionProvider'], in descending order of
                             preference
[09/25/24 13:11:35] INFO     Loading recognition model 'buffalo_l' to memory
[09/25/24 13:11:35] INFO     Setting execution providers to
                             ['OpenVINOExecutionProvider',
                             'CPUExecutionProvider'], in descending order of
                             preference
topaz basin
#

Kind of, but openvino uses cached blobs

copper sable
#

should it be using cpu for buffalo_l? Single task being processed

topaz basin
#

It’s compiling the models right now

copper sable
#

oh alright, so just wait for cpu to drop before looking at failed logs again (if it takes longer than 5min I assume I'll have fails on the immich_server)

topaz basin
#

Yup

copper sable
#

there was one minor cpu/ram spike since the cpu dropped, but running fairly steady otherwise so far

#

actually it appears cpu and ram aren't always in sync

topaz basin
#

It seems like there’s some allocation limit this is hitting besides the 8GiB limit you mentioned

#

Can you check the server logs for the asset id of the asset that caused it to crash?

#

In the “job failed” text, there should be something like “id”: “<random-string>”
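
If the logs are long, a small helper can pull the ids out. This is only a sketch: the sample line below is made up for illustration and is not real Immich server output, and the pattern assumes the id is a UUID written after an `"id"` key with straight quotes.

```python
import re
from typing import List

# A UUID that follows an "id" key, e.g. "id": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx".
ASSET_ID = re.compile(
    r'"id"\s*:\s*"'
    r'([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})"',
    re.IGNORECASE,
)


def failed_asset_ids(log_text: str) -> List[str]:
    """Return every asset id found in the given log text, in order."""
    return ASSET_ID.findall(log_text)


# Made-up log line for illustration; real server output will look different.
sample = 'job failed {"id": "00000000-1111-2222-3333-444444444444"}'
print(failed_asset_ids(sample))
```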

copper sable
#

yep

#

got it, c1a83766-d232-450d-be6c-3c47cd89aefe

topaz basin
#

Can you open a random image in immich and change the id in the url to this id?

copper sable
#

Stacktrace
```
Error: Error: 400
    at Object.he [as ok] (http://192.168.3.92:2283/_app/immutable/chunks/fetch-client.CFZ4JfrO.js:1:2872)
    at async Yt (http://192.168.3.92:2283/_app/immutable/nodes/17.LM1klr7C.js:1:2358)
    at async Pe (http://192.168.3.92:2283/_app/immutable/chunks/entry.CYYJZztW.js:1:14634)
```
#

perhaps that's another user?

#

lemme login as another

topaz basin
#

Might be

copper sable
#

20 faces in that one

topaz basin
#

Hmm, well that at least confirms that it’s because of the batch size

#

Maybe the way openvino handles batching is more ram-intensive

copper sable
#

perhaps, it was using much more than cuda to start with anyways

After unloading the model, the ram usage is still up there, will this present a memory leak moving forwards?

                            to memory
[09/25/24 13:28:58] INFO     Setting execution providers to
                            ['OpenVINOExecutionProvider',
                            'CPUExecutionProvider'], in descending order of
                            preference
[09/25/24 13:34:00] INFO     Shutting down due to inactivity.
[09/25/24 13:34:00] INFO     Shutting down
[09/25/24 13:34:00] INFO     Waiting for application shutdown.
[09/25/24 13:34:01] INFO     Application shutdown complete.
[09/25/24 13:34:01] INFO     Finished server process [10]
[09/25/24 13:34:01] ERROR    Worker (pid:10) was sent SIGINT!
[09/25/24 13:34:01] INFO     Booting worker with pid: 120
[09/25/24 13:34:09] INFO     Started server process [120]
[09/25/24 13:34:09] INFO     Waiting for application startup.
[09/25/24 13:34:09] INFO     Created in-memory cache with unloading after 300s
                            of inactivity.
[09/25/24 13:34:09] INFO     Initialized request thread pool with 4 threads.
[09/25/24 13:34:09] INFO     Application startup complete.

I'm running a search and will wait for it to unload again just to see if I get approx the same resting ram usage

topaz basin
#

It’s a memory leak if it keeps building up each time, but if it reuses the ram from earlier then it isn’t a leak
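
One way to turn that rule of thumb into a check, as a sketch: record the idle RAM after each load/unload cycle and see whether it keeps climbing or levels off. The readings and the 0.1 GB tolerance below are hypothetical values chosen for illustration.

```python
from typing import Sequence


def looks_like_leak(idle_gb: Sequence[float], tolerance_gb: float = 0.1) -> bool:
    """True if idle memory rises by more than the tolerance after every cycle."""
    if len(idle_gb) < 3:
        raise ValueError("need at least three idle readings to judge a trend")
    steps = [later - earlier for earlier, later in zip(idle_gb, idle_gb[1:])]
    return all(step > tolerance_gb for step in steps)


# Hypothetical idle readings in GB, one per load/unload cycle.
print(looks_like_leak([2.0, 2.5, 2.5]))  # rises once, then the RAM is reused
print(looks_like_leak([2.0, 2.5, 3.0]))  # keeps building up every cycle
```

A single jump after the first cycle is treated as normal here, since memory that is kept around and reused does not grow again on later cycles.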

copper sable
#

Ok so I'll know in 5 min then. Regarding this for me, I guess I need to run buffalo_l in cpu only? It would be really nice if after crashing it would fallback on cpu, then fire up hardware acceleration for the next

#

we dropped to 2.75gb, not significantly more than the 2.21 previously

#

about 500mb more

topaz basin
#

The question is if loading the models reuses that or if it’ll spike much higher now, and what it drops to after unloading a second time

copper sable
topaz basin
#

That’s good at least

copper sable
#

running a 3rd time for consistency

#

idles about the same on the 3rd round as the 2nd

#

would error handling be possible for something like this, or is it more effort than it's worth? As of right now in this condition the buffalo_l model just hangs

topaz basin
#

It’s one thing to handle this with a cpu fallback when loading the model, but it’s tricky on-the-fly

#

The solution is generally to avoid this happening in the first place

copper sable
#

so it's not so easy to even just reload the gpu model?

copper sable
topaz basin
#

We’re just using openvino through a relatively high-level API, so the specifics of how allocations and deallocations happen aren’t up to us. #11981 is related

topaz basin
#

The issue in this case is that once it gets an allocation error with that one asset with 20 faces, it still doesn’t work for other assets that it could otherwise process. I think the fact that it can’t recover from this is an upstream issue

copper sable
#
```
                             status code returned while running
                             OpenVINO-EP-subgraph_2 node.
```

So you're thinking there's no easy way to restart it once a non-zero status code is returned?
#

also, after reading that discussion, are failed assets excluded from being run, even if you manually click missing, discover, or refresh in the admin->jobs pane?

topaz basin
#

One issue is that if it hits this situation once, there’s no reason to assume it won’t again. When there are many jobs, reloading the model from disk for each job because of a failure that happens each time is silly. Another issue is that the error isn’t necessarily fatal, so falling back to cpu or reloading from disk is problematic if it can still process other assets fine. And specifically for the CPU fallback, it could actually increase ram pressure if the model was previously on the GPU and make things worse.

copper sable
#

what would you consider fatal? This job now hangs until someone manually restarts the container, or the queue is emptied by a bunch of header timeouts

topaz basin
#

It’s essentially fatal in this case, but for CUDA it would actually work fine for other inputs

copper sable
#

ah ok.

#

I guess I'll spin up 2 ML instances then, one on cpu and one on gpu, and push the respective jobs to each via a nginx loadbalancer or so

topaz basin
#

Out of curiosity, could you download that asset with 20 faces, delete it from immich (including from the trash), disable smart search in the settings, restart the ml container and upload that asset?

#

I’m interested in whether it can process those faces in isolation without other jobs running

copper sable
topaz basin
#

But there were other face detection jobs that ran before it

copper sable
#

oh you mean as the only job that runs though the ml container

#

sure

#

would I need to delete it, or could I upload it as a different user?

topaz basin
#

Uploading as a different user is fine too

copper sable
#

1 sec clearing queues

#

just to be safe I'll restart the entire immich stack

#

I think it worked

topaz basin
#

Ooh interesting. What does the ram usage look like? How many faces did it detect?

copper sable
#

wait, I can't find the image in the library

topaz basin
#

Also enable debug logs since it’ll give a bit more info

copper sable
#

I was wrong as it found all 20 people

topaz basin
#

Did you change the min face detection setting, or is it default?

copper sable
#

let me set debugging. just trying to figure out how to delete it from trash atm

#

I believe I changed it

topaz basin
#

Try uploading the asset, deleting it and then uploading it again

#

Before the models get unloaded

copper sable
#

if I delete it, it's gone? No trash?

topaz basin
#

It should be deleted from the trash as well

copper sable
#

models have been unloaded. I'll try this a few times within shorter window

#

it doesn't flinch

topaz basin
#

Now lower the face detection threshold and try again until it detects more than 20 faces in that image

copper sable
#

[Nest] 7 - 09/25/2024, 2:38:59 PM DEBUG [Microservices:PersonService] 20 faces detected in upload/thumbs/c89ae764-04ef-4f4e-8f36-8d0247f0a12b/65/08/6508584b-68b8-445f-b07b-c7e787bb37ca-preview.webp

#

there are only 20 faces in that image

topaz basin
#

The model will disagree at some point 🙂

copper sable
#

hahaha ok

#

24 faces at 0.1

topaz basin
#

What does the ram usage look like?

copper sable
#

crashed when I set it to 0.0.
3.53 gb at 0.1

topaz basin
#

Oh lol, 0.0 is brutal

#

It would detect thousands of faces in that image

copper sable
#

haha ok. It didn't let me do 0.05

topaz basin
#

But this is good. So the issue isn’t so much the batch size but when it receives batches of different sizes

#

I’m guessing it’s doing some caching here, so it has one version of the model that it uses for a batch of 20, another version for 24, etc.

#

And presumably the effect of this scales higher when the batch size grows and is less noticeable at smaller batch sizes
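
A toy model of that guess, to show why a mix of batch sizes would hurt more than one large batch repeated. It assumes each distinct batch size keeps its own working buffer proportional to the size; this illustrates the hypothesis and is not measured OpenVINO behavior.

```python
from typing import Dict, Iterable


def retained_units(batch_sizes: Iterable[int]) -> int:
    """Total working memory kept if each distinct batch size keeps its own buffer.

    One "unit" stands for the working memory of a single face.
    """
    cache: Dict[int, int] = {}
    for size in batch_sizes:
        # A repeated size reuses its entry; a new size adds a new one.
        cache.setdefault(size, size)
    return sum(cache.values())


print(retained_units([20, 20, 20]))  # same size every time
print(retained_units([20, 24, 3]))   # three different sizes
```

Under this assumption, re-uploading the same 20-face image costs nothing extra, while changing the threshold so the same image yields 24 faces adds a second, larger entry.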

copper sable
#

oh interesting, it also fully unloaded the model.

#

after the crash I was successfully able to reupload it again after changing the detection settings. I'm guessing that after 5 min without a request it managed to unload

copper sable
topaz basin
#

It’s probably holding onto the kernels and the ram it allocated for processing that particular batch size. This lets it reuse all of this the next time it gets an input with that batch size

copper sable
topaz basin
#

Each request it gets from the server would refresh that counter, so it’d only recover sooner after the server goes through all the jobs
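
A sketch of why a busy queue keeps the model from unloading. The 300 second value comes from the "unloading after 300s of inactivity" log line earlier in the thread; the class and method names are hypothetical and this is not Immich's actual implementation.

```python
class IdleUnloadTimer:
    """Tracks whether a model has been idle long enough to unload."""

    def __init__(self, ttl_seconds: float = 300.0) -> None:
        self.ttl_seconds = ttl_seconds
        self.last_request_at = 0.0

    def on_request(self, now: float) -> None:
        # Every request, successful or not, pushes the unload deadline back.
        self.last_request_at = now

    def should_unload(self, now: float) -> bool:
        return now - self.last_request_at >= self.ttl_seconds


timer = IdleUnloadTimer()
for t in (0, 200, 400, 600):       # a queue that keeps sending requests
    timer.on_request(t)
print(timer.should_unload(800))    # only 200s since the last request
print(timer.should_unload(900))    # 300s since the last request
```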

copper sable
#

ah so not helpful then

topaz basin
#

Yeah not really

#

Based on what I’m seeing, using fp16 would probably avoid the error you’re seeing because it halves the working memory that needs to be allocated for each batch size

#

With the bonus of being much faster

copper sable
copper sable
#

the real advantage of the hardware acceleration is searching images

topaz basin
#

But how much of an effect it has depends on the specific model

#

Using cpu for facial recognition is fine too, but this might be a simpler alternative. It can also mean using a better model that normally doesn’t fit in memory

#

It’s typical to use float16 for inference partly because of this: the accuracy gain of using a better model outweighs the loss of using float16

copper sable
topaz basin
#

For search mainly. But there’s technically a better detection model than the one in buffalo_l as well

#

#1272383382487040020 message

#

But it’s not so big that you can’t run it at full precision either

copper sable
#

Interesting. so you think that model at float16 would do better than buffalo_l at 32?

topaz basin
#

Yup, at least for detection

#

Since the recognition model would still be the same for both

copper sable
#

hah, what do you mean by that? would it generally mark the same people as different people?

#

oh I get what you're saying, it would find more people, not necessarily group them better

topaz basin
#

Detection = is this a face, recognition = whose face is this

copper sable
#

right. Any idea when that might be included in an official release?

topaz basin
#

Maybe in a few weeks

copper sable
#

so by about the time immich would become stable?

#

(if that's still on the roadmap for this year)

topaz basin
#

It’d be before that. The stable release is at least two months away, if not a bit more

copper sable
#

my bad, I generally double any timeline that anyone gives me - force of habit

topaz basin
#

lol

#

Not a bad habit to have

copper sable
#

until then, any immich update would require a manual cache replacement?

copper sable
topaz basin
copper sable
#

Sorry. Let me rephrase.

If I use the model that you linked in that discussion, and replace the onnx file, would an immich update clean that and require me to replace it again?

topaz basin
#

No, immich won’t do anything with it across releases

copper sable
#

ok awesome. then it's a set-it and forget-it type thing

topaz basin
#

The only exception is that if there’s some IO error when it’s loading the model, it’ll clear the cache and download the model again

copper sable
#

but it's backwards compatible with the other one, so it shouldn't cause a fatal error, right?

topaz basin
#

Nope, it would just mean it goes back to the normal detection model

#

And this doesn’t include runtime errors from OpenVINO or anything

copper sable
#

yeah that sounds fine. If there's an IO issue I've got much larger problems

#

ok, so then go full circle. My apologies.

This new facial detection model, running at float16 in openvino versus float32 in cpu, would that be a big difference?

topaz basin
#

The difference in outputs should be small. Idk, maybe out of every 100 faces it’d detect at float32 it’d miss 1 or 2 at float16

#

I’d need to test it to give a more precise figure

copper sable
#

oh that's fairly minor. how do I go about changing the float? MACHINE_LEARNING_ANN_FP16_TURBO?

topaz basin
#

It isn’t supported at the moment, but it’s relatively easy to add

copper sable
#

oh alright, then I may just wait a moment. I tried creating a cpu fallback mode - testing it atm

topaz basin
#

You could hack it by going into the container and changing the line "precision": "FP32" in the file /usr/src/app/sessions/ort.py to "precision": "FP16"

copper sable
copper sable
topaz basin
#

Detection threshold | Face count
0.7 | 84
0.6 | 135
0.5 | 200

copper sable
#

interesting. For some reason I sit at 82 solid.

#

I ended up writing an nginx script so that if a request to the hwa container failed, it would be rerouted to the cpu container. It handled quite a few timeouts, but ultimately still crashed

topaz basin
#

It could be a difference in rounding. These results are also from a while ago so there might be a change in code behavior since then

copper sable
#

Running the float 16 works to an extent. I don't think it has any impact on my smart search model though. Running both tasks simultaneously has some issues

#

and just like that

#

the float16 model crashed

#

guess I spoke too soon

topaz basin
#

Out of resources?

copper sable
#

yea

#

was running at around 1.6-1.7gb then suddenly spiked to 4.33

#

immich_server

topaz basin
#

Could you post the ram usage too?

copper sable
topaz basin
#

Hmm, I guess it ultimately can't handle a string of images with high and different numbers of faces

#

float16 just delays the inevitable there

copper sable
#

I wonder what made the CPU spike, as that's what took it all down

topaz basin
#

What's the timestamp of the spike?

#

The CPU spike

copper sable
#

sorry it's gone. Let me re-run this to regenerate it

#

I'm wondering if my cpu is having issues using more ram than the official 8gb limit. I've seen it use 8.x ram before, but never 10+. Yes the remainder is full of cache, but ...

#

I needed to get dual rank (2Rx8) memory to recognize a 16gb module in the system

copper sable
#

so after a little playing around I made a super hacky solution to this.

I have 2 instances of immich_machine_learning running. One is hwa (and preloads clip), the other is cpu and resting idle.

I also added an nginx container, which is a load balancer of sorts. All ML tasks are sent to the load balancer, which forwards them to the hwa container. If the request comes back as a failure, it gets forwarded to the cpu container. I have an external bash script monitoring the load balancer logs, and if an error is logged, it restarts the hwa container.

I'm now using approx 12gb ram, machine learning is running ok with hwa, and so far it hasn't transitioned to the cpu container (since I ended up preloading the clip module about 5 min ago)

Edit: I also unintentionally updated to 1.116, if that makes any difference
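
The control flow described above, sketched in Python rather than nginx plus bash so it can be read in one place. Function names are hypothetical, the backends are passed in as plain callables, and the only external command used is `docker restart`; the real setup in this thread does the routing in nginx and the restart from a log-watching script.

```python
import subprocess
from typing import Callable, List, TypeVar

T = TypeVar("T")


def restart_container(name: str) -> None:
    """Restart a container by name using the docker CLI."""
    subprocess.run(["docker", "restart", name], check=True)


def predict_with_fallback(
    primary: Callable[[], T],
    fallback: Callable[[], T],
    on_primary_failure: Callable[[], None],
) -> T:
    """Try the accelerated backend first; on any error, recover and use the CPU one."""
    try:
        return primary()
    except Exception:
        on_primary_failure()
        return fallback()


# Stand-ins for the two ML containers, for illustration only.
def broken_hwa() -> str:
    raise RuntimeError("out of resources")


restarts: List[str] = []
result = predict_with_fallback(
    broken_hwa,
    lambda: "cpu result",
    lambda: restarts.append("hwa"),
)
print(result, restarts)
```

In a real deployment the failure hook would call `restart_container` with whatever name the hwa container has; that name is not fixed by this sketch.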

copper sable
#

following up on this, I disabled full container restarts for a while, and so far the main container is managing to recover when it doesn't have the request waiting for it while it's starting up (nginx marks the container down for a short period upon error).

Turns out it keeps eating more memory every time it reloads, so the script to restart the container is required

copper sable
#

@topaz basin does this look like a memory leak to you? after running my script to keep rebooting the hwa container on failures, I left the cpu container untouched. The cpu container is using significantly more ram now than it did 5 hours ago (over 1gb at idle, after the container is sent SIGINT and reloads to a "fresh" state).

copper sable
#

and the last image is after startup was complete after running docker restart immich_machine_learning_cpu

Both containers are using the openvino image, but the cpu container is not permitted access to the drivers, so it falls back to cpu

#

immich_machine_learning_cpu grows after one round of usage

topaz basin
#

You should use the cpu image instead of the openvino image for cpu. It uses a more advanced memory allocator that’s more effective at avoiding fragmentation

#

It isn’t installed in the openvino image because it was causing issues for some users

copper sable
#

oh interesting. I guess I can look at the differences between the containers and manually build to see if it solves my issue, or do you think it'll create more issues?

So a memory leak in the openvino image is a known thing?

topaz basin
#

It isn't a memory leak per se, just memory fragmentation. But yes, I imagine the fact that the openvino image uses the default glibc allocator probably contributes to the wonky RAM usage

#

Feel free to extend the openvino image to install a different allocator like mimalloc, snmalloc or jemalloc. Just be sure to set LD_PRELOAD to make sure it's used. mimalloc is set like this:

lib_path="/usr/lib/$(arch)-linux-gnu/libmimalloc.so.2"
export LD_PRELOAD="$lib_path"
#

mimalloc and snmalloc are newer allocators that generally perform the best. jemalloc is older but tried-and-true.

#

There's also tcmalloc, which I have no experience with
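
To confirm that a preloaded allocator is actually in use, one option on Linux is to look for its shared library among the files mapped into the process. The helper below is a sketch with hypothetical names; it inspects the process that runs it, so checking an ML worker would mean reading that worker's own `/proc/<pid>/maps` instead.

```python
from pathlib import Path


def allocator_mapped(maps_text: str, library_name: str = "libmimalloc") -> bool:
    """True if any line of the maps text mentions the library name."""
    return any(library_name in line for line in maps_text.splitlines())


def current_process_uses(library_name: str = "libmimalloc") -> bool:
    # /proc/self/maps lists the files mapped into this process (Linux only).
    return allocator_mapped(Path("/proc/self/maps").read_text(), library_name)


# Hand-written example of a maps line, for illustration only.
sample = (
    "7f0000000000-7f0000001000 r-xp 00000000 08:01 1234 "
    "/usr/lib/x86_64-linux-gnu/libmimalloc.so.2\n"
)
print(allocator_mapped(sample))
```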

copper sable
#

next to no difference with jemalloc