openvino immich_machine_learning | Immich | Page 1

copper sable Sep 25, 2024, 1:41 AM

#

It seems that sometimes my system temporarily runs out of GPU resources. When that happens, the machine learning container crashes but keeps running.

dusky emberBOT Sep 25, 2024, 1:41 AM

#

:wave: Hey @copper sable,

Thanks for reaching out to us. Please follow the recommended actions below; this will help us be more effective in our support effort and leave more time for building Immich.

References

Container Logs: docker compose logs (docs)
Container Status: docker compose ps (docs)
Reverse Proxy: https://immich.app/docs/administration/reverse-proxy

Checklist

:ballot_box_with_check: I have verified I'm on the latest release(note that mobile app releases may take some time).
:ballot_box_with_check: I have read applicable release notes.
:ballot_box_with_check: I have reviewed the FAQs for known issues.
:ballot_box_with_check: I have reviewed Github for known issues.
:ballot_box_with_check: I have tried accessing Immich via local ip (without a custom reverse proxy).
:ballot_box_with_check: I have uploaded the relevant logs, docker compose, and .env files, making sure to use code formatting.
:ballot_box_with_check: I have tried an incognito window, disabled extensions, cleared mobile app cache, logged out and back in, different browsers, etc. as applicable

(an item can be marked as "complete" by reacting with the appropriate number)

If this ticket can be closed you can use the /close command, and re-open it later if needed.

copper sable Sep 25, 2024, 1:41 AM

#

immich_server

📎 message.txt

#

immich_machine_learning

📎 message.txt

#

the immich_machine_learning container does not recover unless I manually restart it, and it causes all queued jobs to fail, at a rate of one failure every 5 min.

dusky emberBOT Sep 25, 2024, 1:43 AM

#

dusky ember :wave: Hey <@1142920273851592814>, Thanks for reaching out to us. Please follow...

Successfully submitted, a tag has been added to inform contributors. :white_check_mark:

copper sable Sep 25, 2024, 1:46 AM

#

This first crash happened after 32 minutes of running. Will monitor the next round as well

#

using image ghcr.io/immich-app/immich-machine-learning:pr-12455-openvino

#

version 1.114

#

it crashes again after 7 min. I'll try running just facial detection to see what happens

#

after this "crash" the container uses an additional 1gb of ram but is still unable to process facial jobs.

#

and crashed after 3-4 assets.

#

it seems to be loading buffalo_l twice

[09/24/24 21:00:06] INFO     Application startup complete.
[09/24/24 21:00:22] INFO     Loading detection model 'buffalo_l' to memory
[09/24/24 21:00:22] INFO     Setting execution providers to
                             ['OpenVINOExecutionProvider',
                             'CPUExecutionProvider'], in descending order of
                             preference
[09/24/24 21:00:23] INFO     Loading recognition model 'buffalo_l' to memory
[09/24/24 21:00:23] INFO     Setting execution providers to
                             ['OpenVINOExecutionProvider',
                             'CPUExecutionProvider'], in descending order of
                             preference

when loading the SigLIP model, it only loads once. Job concurrency on both is 1

obsidian trench Sep 25, 2024, 3:05 AM

#

i have same issue

copper sable Sep 25, 2024, 3:07 AM

#

woohoo!! im not the only one!!

#

do you have this too?
I'm getting a fair bit of these warnings in my immich_machine_learning container. Because it's a warning, is it something I should give attention?

[W:onnxruntime:, execution_frame.cc:879 VerifyOutputSizes] Expected shape from model of {1,512} does not match actual shape of {2,512} for output 683

copper sable Sep 25, 2024, 3:10 AM

#

obsidian trench i have same issue

what are your system specs, out of curiosity?

topaz basin Sep 25, 2024, 4:30 AM

#

copper sable it seems to be loading `buffalo_l` twice ```[09/24/24 21:00:06] INFO Initia...

Those are separate models

#

I'm guessing you're right at the edge of your iGPU's RAM limit, so if it uses a bit extra at some point it all falls apart

cerulean surge Sep 25, 2024, 4:44 AM

#

If you're sure you have enough RAM for your iGPU, this might be of interest. I haven't seen out of resources errors after I followed these instructions.
https://github.com/immich-app/immich/discussions/11422

topaz basin Sep 25, 2024, 5:13 AM

#

copper sable do you have this too? `I'm getting a fair bit of these warnings in my immich_mac...

This particular warning should be fixed as of the latest release. Maybe running docker exec immich_machine_learning rm -r /cache/facial-recognition will help?

copper sable Sep 25, 2024, 5:13 AM

#

topaz basin This particular warning should be fixed as of the latest release. Maybe running ...

as of 1.115, or the ML image?

topaz basin Sep 25, 2024, 5:16 AM

#

As of the 1.115 release

copper sable Sep 25, 2024, 5:16 AM

#

topaz basin Those are separate models

sorry I didn't see this earlier, are you suggesting the buffalo_l uses more resources than the ViT-L-16-SigLIP-384__webli model?

#

my bad, I didn't see any of that in the release notes. Guess I'll update the stack tomorrow

topaz basin Sep 25, 2024, 5:19 AM

#

It shouldn't use more resources than the ViT model. I just meant that buffalo_l is really two models: a detection model and a recognition model

copper sable Sep 25, 2024, 5:20 AM

#

oh gotcha

topaz basin Sep 25, 2024, 5:20 AM

#

Technically the recognition model has unbounded RAM usage because it will process all faces in an image in one batch, so if it encounters an image with a lot of faces it'll use a bit more RAM than usual

#

But it should always use a lot less than the ViT model you have

copper sable Sep 25, 2024, 5:21 AM

#

even with multiple faces?

#

I'm still using ghcr.io/immich-app/immich-machine-learning:pr-12455-openvino, should I be using release or continue with this one?

#

Is there any suggested method to automatically reboot the container to handle these crashes?

topaz basin Sep 25, 2024, 5:27 AM

#

copper sable even with multiple faces?

It's like 25MiB/face

copper sable Sep 25, 2024, 5:27 AM

#

oh yes that is tiny. It really shouldn't get anywhere close then
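
As a back-of-the-envelope check of the figure quoted above, here is a minimal sketch. It assumes the roughly 25 MiB per face scales linearly with the number of faces in one recognition batch; the function name is made up for illustration, and real usage may differ.

```python
MIB_PER_FACE = 25  # approximate figure quoted in the thread


def recognition_batch_mib(face_count: int, mib_per_face: int = MIB_PER_FACE) -> int:
    """Estimate extra RAM (MiB) for one recognition batch, assuming linear scaling."""
    if face_count < 0:
        raise ValueError("face_count must be non-negative")
    return face_count * mib_per_face


for faces in (1, 20, 84):
    print(f"{faces:>3} faces -> ~{recognition_batch_mib(faces)} MiB")
```

On this assumption an image with 20 faces would add about 500 MiB for the batch, which is small next to the multi-GiB totals discussed here.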

topaz basin Sep 25, 2024, 5:28 AM

#

copper sable I'm still using `ghcr.io/immich-app/immich-machine-learning:pr-12455-openvino`, ...

You can just use release at this point

#

I wonder if this is related to the ram not being fully freed when the worker exits

topaz basin Sep 25, 2024, 5:33 AM

#

cerulean surge If you're sure you have enough RAM for your iGPU, this might be of interest. I h...

It could also be related to this. The GuC and HuC firmware isn't enabled on this processor by default. There might be some hidden interaction here even for non Jasper or Elkhart processors

#

Could also be related to the kernels on Synologys being super outdated

#

It's a wonder any of this stuff even works haha

copper sable Sep 25, 2024, 5:34 AM

#

hahaha lovely. Just what I wanted to hear...

#

you think that guide would work with an outdated kernel?

topaz basin Sep 25, 2024, 5:38 AM

#

I'm not sure about step 1 since Synology has its own setup for kernels, but the remaining steps should be fine

copper sable Sep 25, 2024, 5:43 AM

#

ok thanks, guess I'll make sure backups are up to snuff and give it a go tomorrow. My current kernel is 4.4.302+

copper sable Sep 25, 2024, 1:57 PM

#

topaz basin It could also be related to this. The GuC and HuC firmware isn't enabled on this...

I'm assuming I'd need to follow those steps on the host os, within the docker container would be insufficient?

copper sable Sep 25, 2024, 3:39 PM

#

looks like on the ds423+ GuC and HuC are not available. https://patchwork.kernel.org/project/intel-gfx/patch/[email protected]/

#

is there a way to do facial recognition via cpu but smartsearch via hardware acceleration?

topaz basin Sep 25, 2024, 3:49 PM

#

There’s no easy way. I guess you could run two ML instances with a load balancer or something

copper sable Sep 25, 2024, 4:26 PM

#

topaz basin There’s no easy way. I guess you could run two ML instances with a load balancer...

Ok thanks

topaz basin Sep 25, 2024, 4:42 PM

#

There is a possible solution for you in running the models in fp16 instead of fp32. It should halve the models’ memory usage with a tiny difference in quality

copper sable Sep 25, 2024, 4:46 PM

#

thing is, it's just buffalo_l that's the problem. I could handle it crashing and restarting correctly as well, however when it crashes it takes down the entire model (this does not happen with ViT-L-16-SigLIP-384__webli). I've been running ViT-L-16-SigLIP-384__webli with no problems for about 12 hours now

topaz basin Sep 25, 2024, 5:07 PM

#

Does this only happen when both tasks are being processed, or also when face detection alone is?

#

i.e. is it just the combined total ram that’s the problem, or is there something specific about the facial recognition model outside of ram issues?

copper sable Sep 25, 2024, 5:56 PM

#

topaz basin Does this only happen when both tasks are being processed, or also when face det...

Something specific about facial recognition. No other tasks are running.

It ran fine for about half an hour with both ml tasks running simultaneously, but I cannot get it to run alone anymore for any extended period

topaz basin Sep 25, 2024, 5:57 PM

#

Does the container use more than 8GiB when that happens?

copper sable Sep 25, 2024, 6:00 PM

#

I don't think so

#

Let me run a quick comparison

#

This is running SigLIP

#

and buffalo_l

#

the ram seems much more volatile than sigLIP, as sigLIP ran steady

#

just crashed again

#

#

topaz basin Sep 25, 2024, 6:09 PM

#

Could you restart the container and run docker exec immich_machine_learning rm -r /cache/facial-recognition/buffalo_l/recognition/openvino?

copper sable Sep 25, 2024, 6:09 PM

#

Will do. I just upgraded to 1.115 so I assumed I had a fresh palette

topaz basin Sep 25, 2024, 6:10 PM

#

copper sable just crashed again

Did the crash correspond with that spike in ram at the end?

copper sable Sep 25, 2024, 6:10 PM

#

it did

#

ash-4.4# docker attach immich_machine_learning
[09/25/24 13:11:34] INFO     Loading detection model 'buffalo_l' to memory
[09/25/24 13:11:34] INFO     Setting execution providers to
                             ['OpenVINOExecutionProvider',
                             'CPUExecutionProvider'], in descending order of
                             preference
[09/25/24 13:11:35] INFO     Loading recognition model 'buffalo_l' to memory
[09/25/24 13:11:35] INFO     Setting execution providers to
                             ['OpenVINOExecutionProvider',
                             'CPUExecutionProvider'], in descending order of
                             preference

topaz basin Sep 25, 2024, 6:13 PM

#

Kind of, but openvino uses cached blobs

topaz basin Sep 25, 2024, 6:13 PM

#

copper sable Will do. I just upgraded to 1.115 so I assumed I had a fresh palette

Referring to this

copper sable Sep 25, 2024, 6:14 PM

#

should it be using cpu for buffalo_l? Single task being processed

topaz basin Sep 25, 2024, 6:15 PM

#

It’s compiling the models right now

copper sable Sep 25, 2024, 6:16 PM

#

oh alright, so just wait for cpu to drop before looking at failed logs again (if it takes longer than 5min I assume I'll have fails on the immich_server)

topaz basin Sep 25, 2024, 6:16 PM

#

Yup

copper sable Sep 25, 2024, 6:20 PM

#

there was one minor cpu/ram spike since the cpu dropped, but running fairly steady otherwise so far

#

actually it appears cpu and ram aren't always in sync

#

and crash.

topaz basin Sep 25, 2024, 6:25 PM

#

It seems like there’s some allocation limit this is hitting besides the 8GiB limit you mentioned

#

Can you check the server logs for the asset id of the asset that caused it to crash?

#

In the “job failed” text, there should be something like “id”: “<random-string>”

copper sable Sep 25, 2024, 6:28 PM

#

yep

#

got it, c1a83766-d232-450d-be6c-3c47cd89aefe

topaz basin Sep 25, 2024, 6:29 PM

#

Can you open a random image in immich and change the id in the url to this id?

copper sable Sep 25, 2024, 6:29 PM

#


Stacktrace
Error: Error: 400
    at Object.he [as ok] (http://192.168.3.92:2283/_app/immutable/chunks/fetch-client.CFZ4JfrO.js:1:2872)
    at async Yt (http://192.168.3.92:2283/_app/immutable/nodes/17.LM1klr7C.js:1:2358)
    at async Pe (http://192.168.3.92:2283/_app/immutable/chunks/entry.CYYJZztW.js:1:14634)

#

perhaps that's another user?

#

lemme login as another

topaz basin Sep 25, 2024, 6:30 PM

#

Might be

copper sable Sep 25, 2024, 6:31 PM

#

20 faces in that one

topaz basin Sep 25, 2024, 6:33 PM

#

Hmm, well that at least confirms that it’s because of the batch size

#

Maybe the way openvino handles batching is more ram-intensive

copper sable Sep 25, 2024, 6:37 PM

#

perhaps, it was using much more than cuda to start with anyways

After unloading the model, the ram usage is still up there, will this present a memory leak moving forwards?

                            to memory
[09/25/24 13:28:58] INFO     Setting execution providers to
                            ['OpenVINOExecutionProvider',
                            'CPUExecutionProvider'], in descending order of
                            preference
[09/25/24 13:34:00] INFO     Shutting down due to inactivity.
[09/25/24 13:34:00] INFO     Shutting down
[09/25/24 13:34:00] INFO     Waiting for application shutdown.
[09/25/24 13:34:01] INFO     Application shutdown complete.
[09/25/24 13:34:01] INFO     Finished server process [10]
[09/25/24 13:34:01] ERROR    Worker (pid:10) was sent SIGINT!
[09/25/24 13:34:01] INFO     Booting worker with pid: 120
[09/25/24 13:34:09] INFO     Started server process [120]
[09/25/24 13:34:09] INFO     Waiting for application startup.
[09/25/24 13:34:09] INFO     Created in-memory cache with unloading after 300s
                            of inactivity.
[09/25/24 13:34:09] INFO     Initialized request thread pool with 4 threads.
[09/25/24 13:34:09] INFO     Application startup complete.

I'm running a search and will wait for it to unload again just to see if I get approx the same resting ram usage

topaz basin Sep 25, 2024, 6:38 PM

#

It’s a memory leak if it keeps building up each time, but if it reuses the ram from earlier then it isn’t a leak

copper sable Sep 25, 2024, 6:40 PM

#

Ok so I'll know in 5 min then. Regarding this for me, I guess I need to run buffalo_l in cpu only? It would be really nice if after crashing it would fallback on cpu, then fire up hardware acceleration for the next

#

we dropped to 2.75gb, not significantly more than the 2.21 previously

#

about 500mb more

topaz basin Sep 25, 2024, 6:44 PM

#

The question is if loading the models reuses that or if it’ll spike much higher now, and what it drops to after unloading a second time

copper sable Sep 25, 2024, 6:45 PM

#

topaz basin The question is if loading the models reuses that or if it’ll spike much higher ...

right, about 500 more mb at idle, spikes are similar

topaz basin Sep 25, 2024, 6:45 PM

#

That’s good at least

copper sable Sep 25, 2024, 6:46 PM

#

running a 3rd time for consistency

#

idles about the same on the 3rd round as the 2nd
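
The leak-versus-reuse test discussed above can be sketched as a small check on the idle RAM reading after each unload. This is only an illustration: the function name and tolerance are hypothetical, and the third reading is assumed equal to the second because the thread only says it was "about the same".

```python
def classify_idle_ram(readings_gb, tolerance_gb=0.1):
    """Return 'leak' if idle RAM keeps climbing, 'plateau' if it levels off."""
    if len(readings_gb) < 3:
        return "unknown"  # need at least two deltas to see a trend
    deltas = [b - a for a, b in zip(readings_gb, readings_gb[1:])]
    # A leak keeps growing after every load/unload cycle.
    if all(d > tolerance_gb for d in deltas):
        return "leak"
    # Growth that has stopped suggests the earlier RAM is being reused.
    if abs(deltas[-1]) <= tolerance_gb:
        return "plateau"
    return "unknown"


# 2.21 and 2.75 are from the thread; the third value is an assumption.
print(classify_idle_ram([2.21, 2.75, 2.75]))
```

Three readings are a very small sample, so a result of "plateau" here is a hint rather than proof that nothing is leaking.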

#

would error handling be possible for something like this, or is it more effort than it's worth? As of right now in this condition the buffalo_l model just hangs

📎 message.txt

topaz basin Sep 25, 2024, 6:57 PM

#

It’s one thing to handle this with a cpu fallback when loading the model, but it’s tricky on-the-fly

#

The solution is generally to avoid this happening in the first place

copper sable Sep 25, 2024, 6:58 PM

#

so it's not so easy to even just reload the gpu model?

copper sable Sep 25, 2024, 6:59 PM

#

topaz basin The solution is generally to avoid this happening in the first place

that I understand. But that would entail a rewrite of the processing, or require specific hardware (which will still fail at some count of faces)

topaz basin Sep 25, 2024, 7:03 PM

#

We’re just using openvino through a relatively high-level API, so the specifics of how allocations and deallocations happen aren’t up to us. #11981 is related

dusky emberBOT Sep 25, 2024, 7:04 PM

#

topaz basin We’re just using openvino through a relatively high-level API, so the specifics ...

[Discussion] (immich-app/immich#11981)

topaz basin Sep 25, 2024, 7:06 PM

#

The issue in this case is that once it gets an allocation error with that one asset with 20 faces, it still doesn’t work for other assets that it could otherwise process. I think the fact that it can’t recover from this is an upstream issue

copper sable Sep 25, 2024, 7:08 PM

#

                             status code returned while running
                             OpenVINO-EP-subgraph_2 node.

So you're thinking there's no easy way to restart it once a non-zero status code is returned?

#

also, after reading that discussion, are failed assets excluded from being run, even if you'd manually click missing, discover, or refresh in the admin->jobs pane?

topaz basin Sep 25, 2024, 7:15 PM

#

One issue is that if it hits this situation once, there’s no reason to assume it won’t again. When there are many jobs, reloading the model from disk for each job because of a failure that happens each time is silly. Another issue is that the error isn’t necessarily fatal, so falling back to cpu or reloading from disk is problematic if it can still process other assets fine. And specifically for the CPU fallback, it could actually increase ram pressure if the model was previously on the GPU and make things worse.

copper sable Sep 25, 2024, 7:17 PM

#

what would you consider fatal? This job now hangs until someone manually restarts the container, or the queue is emptied by a bunch of header timeouts

topaz basin Sep 25, 2024, 7:18 PM

#

It’s essentially fatal in this case, but for CUDA it would actually work fine for other inputs

copper sable Sep 25, 2024, 7:18 PM

#

ah ok.

#

I guess I'll spin up 2 ML instances then, one on cpu and one on gpu, and push the respective jobs to each via a nginx loadbalancer or so

topaz basin Sep 25, 2024, 7:23 PM

#

Out of curiosity, could you download that asset with 20 faces, delete it from immich (including from the trash), disable smart search in the settings, restart the ml container and upload that asset?

#

I’m interested in whether it can process those faces in isolation without other jobs running

copper sable Sep 25, 2024, 7:25 PM

#

topaz basin I’m interested in whether it can process those faces in isolation without other ...

the test I was doing was without any other jobs running

topaz basin Sep 25, 2024, 7:25 PM

#

But there were other face detection jobs that ran before it

copper sable Sep 25, 2024, 7:26 PM

#

oh you mean as the only job that runs through the ml container

#

sure

#

would I need to delete it, or could I upload it as a different user?

topaz basin Sep 25, 2024, 7:26 PM

#

Uploading as a different user is fine too

copper sable Sep 25, 2024, 7:26 PM

#

1 sec clearing queues

#

just to be safe I'll restart the entire immich stack

#

I think it worked

topaz basin Sep 25, 2024, 7:31 PM

#

Ooh interesting. What does the ram usage look like? How many faces did it detect?

copper sable Sep 25, 2024, 7:32 PM

#

wait, I can't find the image in the library

topaz basin Sep 25, 2024, 7:33 PM

#

Also enable debug logs since it’ll give a bit more info

copper sable Sep 25, 2024, 7:33 PM

#

I was wrong as it found all 20 people

topaz basin Sep 25, 2024, 7:35 PM

#

Did you change the min face detection setting, or is it default?

copper sable Sep 25, 2024, 7:35 PM

#

let me set debugging. just trying to figure out how to delete it from trash atm

#

I believe I changed it

topaz basin Sep 25, 2024, 7:36 PM

#

Try uploading the asset, deleting it and then uploading it again

#

Before the models get unloaded

copper sable Sep 25, 2024, 7:36 PM

#

if I delete it, it's gone? No trash?

topaz basin Sep 25, 2024, 7:37 PM

#

It should be deleted from the trash as well

copper sable Sep 25, 2024, 7:37 PM

#

models have been unloaded. I'll try this a few times within shorter window

#

#

it doesn't flinch

topaz basin Sep 25, 2024, 7:40 PM

#

Now lower the face detection threshold and try again until it detects more than 20 faces in that image

copper sable Sep 25, 2024, 7:40 PM

#

[Nest] 7 - 09/25/2024, 2:38:59 PM DEBUG [Microservices:PersonService] 20 faces detected in upload/thumbs/c89ae764-04ef-4f4e-8f36-8d0247f0a12b/65/08/6508584b-68b8-445f-b07b-c7e787bb37ca-preview.webp

#

there are only 20 faces in that image

topaz basin Sep 25, 2024, 7:40 PM

#

The model will disagree at some point 🙂

copper sable Sep 25, 2024, 7:40 PM

#

hahaha ok

#

24 faces at 0.1

topaz basin Sep 25, 2024, 7:43 PM

#

What does the ram usage look like?

copper sable Sep 25, 2024, 7:44 PM

#

crashed when I set it to 0.0.
3.53 gb at 0.1

#

topaz basin Sep 25, 2024, 7:47 PM

#

Oh lol, 0.0 is brutal

#

It would detect thousands of faces in that image

copper sable Sep 25, 2024, 7:49 PM

#

haha ok. It didn't let me do 0.05

topaz basin Sep 25, 2024, 7:50 PM

#

But this is good. So the issue isn’t so much the batch size but when it receives batches of different sizes

#

I’m guessing it’s doing some caching here, so it has one version of the model that it uses for a batch of 20, another version for 24, etc.

#

And presumably the effect of this scales higher when the batch size grows and is less noticeable at smaller batch sizes

copper sable Sep 25, 2024, 7:52 PM

#

oh interesting, it also fully unloaded the model.

#

after the crash I was successfully able to reupload it again after changing the detection settings. I'm guessing that after 5 min without a request it managed to unload

copper sable Sep 25, 2024, 7:53 PM

#

topaz basin I’m guessing it’s doing some caching here, so it has one version of the model th...

oh interesting. so it just unloads them slowly then? because I did see ram dip before

topaz basin Sep 25, 2024, 7:57 PM

#

It’s probably holding onto the kernels and the ram it allocated for processing that particular batch size. This lets it reuse all of this the next time it gets an input with that batch size
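
To make the guess above concrete, here is a toy model of a runtime that keeps one buffer per distinct batch size. It is purely illustrative and not how OpenVINO is known to behave; the class name and the 25 MiB per item figure are assumptions carried over from earlier in the thread.

```python
class PerShapeCache:
    """Toy model: one retained buffer per distinct batch size."""

    def __init__(self, mib_per_item: int = 25):
        self.mib_per_item = mib_per_item
        self.buffers = {}  # batch size -> retained MiB

    def run(self, batch_size: int) -> bool:
        """Return True on a cache hit, False when a new buffer was allocated."""
        if batch_size in self.buffers:
            return True
        self.buffers[batch_size] = batch_size * self.mib_per_item
        return False

    @property
    def retained_mib(self) -> int:
        return sum(self.buffers.values())


cache = PerShapeCache()
for faces in (20, 20, 24, 3, 24):
    cache.run(faces)
print(cache.retained_mib)  # 20*25 + 24*25 + 3*25 = 1175
```

In this model repeated batch sizes cost nothing extra, while every new size adds to what is retained, with large batches adding the most. That matches the pattern described in the thread, but it is only a sketch of the hypothesis.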

copper sable Sep 25, 2024, 7:58 PM

#

copper sable after the crash I was successfully able to reupload it again after changing the ...

would changing the unload time to something smaller, say 30 seconds, perhaps allow it to recover from immich_server requests?

topaz basin Sep 25, 2024, 7:59 PM

#

Each request it gets from the server would refresh that counter, so it’d only recover sooner after the server goes through all the jobs
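
A minimal sketch of why a shorter unload time does not help while jobs are queued, assuming the inactivity counter is reset by every incoming request as described above. The class is a toy model with hypothetical names, not the actual Immich code.

```python
class IdleUnloadTimer:
    """Toy model of an 'unload after N seconds of inactivity' counter."""

    def __init__(self, ttl_seconds: float, now: float = 0.0):
        self.ttl_seconds = ttl_seconds
        self.last_request = now

    def on_request(self, now: float) -> None:
        # Every incoming request resets the countdown.
        self.last_request = now

    def should_unload(self, now: float) -> bool:
        return now - self.last_request >= self.ttl_seconds


timer = IdleUnloadTimer(ttl_seconds=30)
# Requests arriving every 10 s keep the model loaded, even with a 30 s limit.
for t in (10, 20, 30, 40):
    timer.on_request(now=t)
    assert not timer.should_unload(now=t + 9)
# Only once the queue is empty does the timer get a chance to expire.
print(timer.should_unload(now=40 + 30))
```

As long as the server keeps sending jobs more often than the limit, the unload never fires, so the model only resets after the queue has drained.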

copper sable Sep 25, 2024, 8:00 PM

#

ah so not helpful then

topaz basin Sep 25, 2024, 8:00 PM

#

Yeah not really

#

Based on what I’m seeing, using fp16 would probably avoid the error you’re seeing because it halves the working memory that needs to be allocated for each batch size

#

With the bonus of being much faster

copper sable Sep 25, 2024, 8:04 PM

#

topaz basin With the bonus of being much faster

speed really doesn't matter to me. 50 photos could take a day on this hardware for all I care

copper sable Sep 25, 2024, 8:05 PM

#

topaz basin Based on what I’m seeing, using fp16 would probably avoid the error you’re seein...

this would affect facial detection accuracy however?

#

the real advantage of the hardware acceleration is searching images

topaz basin Sep 25, 2024, 8:12 PM

#

copper sable this would affect facial detection accuracy however?

The effect is generally pretty minor. OpenVINO (without ONNX) actually defaults to float16 when using a GPU

#

But how much of an effect it has depends on the specific model

#

Using cpu for facial recognition is fine too, but this might be a simpler alternative. It can also mean using a better model that normally doesn’t fit in memory

#

It’s typical to use float16 for inference partly because of this: the accuracy gain of using a better model outweighs the loss of using float16

copper sable Sep 25, 2024, 8:17 PM

#

topaz basin It’s typical to use float16 for inference partly because of this: the accuracy g...

we're talking between buffalo_m vs buffalo_l ?

topaz basin Sep 25, 2024, 8:19 PM

#

For search mainly. But there’s technically a better detection model than the one in buffalo_l as well

#

#1272383382487040020 message

#

But it’s not so big that you can’t run it at full precision either

copper sable Sep 25, 2024, 8:23 PM

#

Interesting. so you think that model at float16 would do better than buffalo_l at 32?

topaz basin Sep 25, 2024, 8:24 PM

#

Yup, at least for detection

#

Since the recognition model would still be the same for both

copper sable Sep 25, 2024, 8:25 PM

#

hah, what do you mean by that? would it generally mark the same people as different people?

#

oh I get what you're saying, it would find more people, not necessarily group them better

topaz basin Sep 25, 2024, 8:26 PM

#

Detection = is this a face, recognition = whose face is this

copper sable Sep 25, 2024, 8:26 PM

#

right. Any idea when that might be included in an official release?

topaz basin Sep 25, 2024, 8:28 PM

#

Maybe in a few weeks

copper sable Sep 25, 2024, 8:29 PM

#

so by about the time immich would become stable?

#

(if that's still on the roadmap for this year)

topaz basin Sep 25, 2024, 8:30 PM

#

It’d be before that. The stable release is at least two months away, if not a bit more

copper sable Sep 25, 2024, 8:31 PM

#

my bad, I generally double any timeline that anyone gives me - force of habit

topaz basin Sep 25, 2024, 8:31 PM

#

lol

#

Not a bad habit to have

copper sable Sep 25, 2024, 8:31 PM

#

until then, any immich update would require a manual cache replacement?

copper sable Sep 25, 2024, 8:32 PM

#

topaz basin Not a bad habit to have

haha yep, keeps me happy things are done "ahead of schedule", instead of behind

topaz basin Sep 25, 2024, 8:32 PM

#

copper sable right. Any idea when that might be included in an official release?

Sorry, by this do you mean when the model will be added, or when float16 will?

copper sable Sep 25, 2024, 8:33 PM

#

Sorry. Let me rephrase.

If I use the model that you linked in that discussion, and replace the onnx file, would an immich update clean that and require me to replace it again?

topaz basin Sep 25, 2024, 8:34 PM

#

No, immich won’t do anything with it across releases

copper sable Sep 25, 2024, 8:35 PM

#

ok awesome. then it's a set-it and forget-it type thing

topaz basin Sep 25, 2024, 8:35 PM

#

The only exception is that if there’s some IO error when it’s loading the model, it’ll clear the cache and download the model again

copper sable Sep 25, 2024, 8:36 PM

#

but it's backwards compatible with the other one, so it shouldn't cause a fatal error, right?

topaz basin Sep 25, 2024, 8:36 PM

#

Nope, it would just mean it goes back to the normal detection model

#

And this doesn’t include runtime errors from OpenVINO or anything

copper sable Sep 25, 2024, 8:37 PM

#

yeah that sounds fine. If there's an IO issue I've got much larger problems

#

ok, so then go full circle. My apologies.

This new facial detection model, running at float16 in openvino versus float32 in cpu, would that be a big difference?

topaz basin Sep 25, 2024, 9:02 PM

#

The difference in outputs should be small. Idk, maybe out of every 100 faces it’d detect at float32 it’d miss 1 or 2 at float16

#

I’d need to test it to give a more precise figure

copper sable Sep 25, 2024, 9:07 PM

#

oh that's fairly minor. how do I go about changing the float? MACHINE_LEARNING_ANN_FP16_TURBO?

topaz basin Sep 25, 2024, 9:28 PM

#

It isn’t supported at the moment, but it’s relatively easy to add

copper sable Sep 25, 2024, 9:30 PM

#

oh alright, then I may just wait a moment. I tried creating a cpu fallback mode - testing it atm

topaz basin Sep 25, 2024, 9:32 PM

#

You could hack it by going into the container and changing the line "precision": "FP32" in the file /usr/src/app/sessions/ort.py to "precision": "FP16"
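
The edit described above amounts to a one-line text substitution. A rough sketch follows, assuming the file contains the literal `"precision": "FP32"` exactly as quoted; the helper name and the sample string are made up, and the real contents of `ort.py` are not shown in this thread.

```python
def patch_precision(source: str, old: str = "FP32", new: str = "FP16") -> str:
    """Swap the precision literal; raise if the expected text is not present."""
    needle = f'"precision": "{old}"'
    if needle not in source:
        raise ValueError(f"{needle!r} not found; the file layout may differ")
    return source.replace(needle, f'"precision": "{new}"')


# Stand-in text, not the real file contents.
sample = '{"precision": "FP32"}\n'
print(patch_precision(sample), end="")
```

To apply it for real, the text would be read from and written back to `/usr/src/app/sessions/ort.py` inside the container. Changes made inside a container are lost when the container is recreated, for example after pulling a new image, so the patch would need to be reapplied.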

copper sable Sep 25, 2024, 9:32 PM

#

topaz basin You could hack it by going into the container and changing the line `"precision"...

oh that does sound super easy

copper sable Sep 26, 2024, 1:39 AM

#

topaz basin https://discord.com/channels/979116623879368755/1272383382487040020/127239758815...

do you remember how many faces you detected in this image? I'm counting 82, at float 16 or 32. Just want to make sure my numbers aren't skewed

topaz basin Sep 26, 2024, 2:38 AM

#

Detection threshold | Face count
0.7 | 84
0.6 | 135
0.5 | 200

copper sable Sep 26, 2024, 2:40 AM

#

interesting. For some reason I sit at 82 solid.

#

I ended up writing an nginx script that reroutes to the cpu container if the request to the hwa container fails. It handled quite a few timeouts, but ultimately still crashed

topaz basin Sep 26, 2024, 2:42 AM

#

It could be a difference in rounding. These results are also from a while ago so there might be a change in code behavior since then

copper sable Sep 26, 2024, 2:43 AM

#

Running the float 16 works to an extent. I don't think it has any impact on my smart search model though. Running both tasks simultaneously has some issues

#

and just like that

#

the float16 model crashed

#

guess I spoke too soon

topaz basin Sep 26, 2024, 2:44 AM

#

Out of resources?

copper sable Sep 26, 2024, 2:44 AM

#

yea

#

was running at around 1.6-1.7gb then suddenly spiked to 4.33

#

immich_server

#

📎 message.txt

#

immich_machine_learning

📎 message.txt

topaz basin Sep 26, 2024, 2:46 AM

#

Could you post the ram usage too?

copper sable Sep 26, 2024, 2:46 AM

#

topaz basin Sep 26, 2024, 2:49 AM

#

Hmm, I guess it ultimately can't handle a string of images with high and different numbers of faces

#

float16 just delays the inevitable there

copper sable Sep 26, 2024, 2:50 AM

#

topaz basin float16 just delays the inevitable there

I don't know if it made a difference tbh. crashed 8min after running resources though

#

I wonder what made the CPU spike, as that's what took it all down

topaz basin Sep 26, 2024, 2:54 AM

#

What's the timestamp of the spike?

#

The CPU spike

copper sable Sep 26, 2024, 2:55 AM

#

sorry it's gone. Let me re-run this to regenerate it

#

📎 message.txt 📎 message.txt

#

I'm wondering if my cpu is having issues using more ram than the 8gb official. I've seen it use 8.x ram before, but never 10+. Yes the remainder is full of cache, but ...

#

I needed to get dual rank (2Rx8) memory to recognize a 16gb module in the system

#

copper sable Sep 26, 2024, 7:55 PM

#

so after a little playing around I made a super hacky solution to this.

I have 2 instances of immich_machine_learning running. One is hwa (and preloads clip), the other is cpu and resting idle.

I also added an nginx container, which is a load balancer of sorts. All ML tasks are sent to the load balancer, which forwards them to the hwa container. If the request comes back as a failure, it gets forwarded to the cpu container. I have an external bash script monitoring the load balancer logs, and if an error is logged, it restarts the hwa container.

I'm now using approx 12gb ram, machine learning is running ok with hwa, and so far it hasn't transitioned to the cpu container (since I ended up preloading the clip module about 5 min ago)

Edit: I also unintentionally updated to 1.116, if that makes any difference
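
The flow described above (try the hwa container, fall back to the cpu container, and trigger a restart when a failure is seen) can be sketched roughly as follows. This is not the nginx config or bash script from the thread, which were not posted; all function names are hypothetical and the backends are stand-ins for HTTP calls to the two containers.

```python
from typing import Callable, Sequence


def infer_with_failover(
    payload: bytes,
    backends: Sequence[Callable[[bytes], bytes]],
    on_failure: Callable[[int], None],
) -> bytes:
    """Try each backend in order; report every failure; raise if all fail."""
    last_error = None
    for index, backend in enumerate(backends):
        try:
            return backend(payload)
        except Exception as exc:  # broad on purpose: any failure triggers failover
            last_error = exc
            on_failure(index)
    raise RuntimeError("all ML backends failed") from last_error


# Stand-ins for the two containers (hypothetical).
def hwa_backend(payload: bytes) -> bytes:
    raise RuntimeError("out of resources")


def cpu_backend(payload: bytes) -> bytes:
    return b"cpu:" + payload


restarted = []  # indexes of backends that failed and would be restarted
result = infer_with_failover(b"img", [hwa_backend, cpu_backend], restarted.append)
print(result, restarted)
```

In a real setup the `on_failure` callback could shell out to `docker restart` for the failed container, which is the job the external bash script does in the setup described here.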

copper sable Sep 26, 2024, 8:39 PM

#

following up on this, I disabled full container restarts for a while, and so far the main container is managing to recover when it doesn't have a request waiting for it while it's starting up (nginx marks the container down for a short period upon error).

Turns out it keeps eating more memory every time it reloads, so the script to restart the container is required

copper sable Sep 27, 2024, 1:02 AM

#

@topaz basin does this look like a memory leak to you? After running my script to keep rebooting the hwa container on failures, I left the cpu container untouched. The cpu container is using significantly more ram now than it did 5 hours ago (over 1gb at idle, after the container is sent SIGINT and reloads to a "fresh" state).

copper sable Sep 27, 2024, 1:23 AM

#

and the last image is after startup was complete after running docker restart immich_machine_learning_cpu

Both containers are using the openvino image, but the cpu container is not permitted access to the drivers, so it falls back to cpu

#

immich_machine_learning_cpu grows after one round of usage

topaz basin Sep 27, 2024, 2:15 AM

#

You should use the cpu image instead of the openvino image for cpu. It uses a more advanced memory allocator that’s more effective at avoiding fragmentation

#

It isn’t installed in the openvino image because it was causing issues for some users

copper sable Sep 27, 2024, 2:31 AM

#

oh interesting. I guess I can look at the differences between the containers and manually build to see if it solves my issue, or do you think it'll create more issues?

So a memory leak in the openvino image is a known thing?

topaz basin Sep 27, 2024, 3:28 AM

#

It isn't a memory leak per se, just memory fragmentation. But yes, I imagine the fact that the openvino image uses the default glibc allocator probably contributes to the wonky RAM usage

#

Feel free to extend the openvino image to install a different allocator like mimalloc, snmalloc or jemalloc. Just be sure to set LD_PRELOAD to make sure it's used. mimalloc is set like this:

lib_path="/usr/lib/$(arch)-linux-gnu/libmimalloc.so.2"
export LD_PRELOAD="$lib_path"

#

mimalloc and snmalloc are newer allocators that generally perform the best. jemalloc is older but tried-and-true.

#

There's also tcmalloc, which I have no experience with
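
If the process is launched from Python rather than a shell script, the same `LD_PRELOAD` setup can be sketched like this. It assumes the library sits at the Debian-style multiarch path used in the shell snippet above and that `platform.machine()` matches what `arch` prints; the helper name is hypothetical.

```python
import os
import platform


def preload_env(lib_name: str = "libmimalloc.so.2", base_env=None) -> dict:
    """Return a copy of the environment with LD_PRELOAD pointing at the allocator."""
    env = dict(os.environ if base_env is None else base_env)
    # Mirrors lib_path="/usr/lib/$(arch)-linux-gnu/libmimalloc.so.2"
    lib_path = f"/usr/lib/{platform.machine()}-linux-gnu/{lib_name}"
    env["LD_PRELOAD"] = lib_path
    return env


env = preload_env(base_env={})
print(env["LD_PRELOAD"])
```

The returned mapping would be passed as `env=` when starting the child process. `LD_PRELOAD` has to be set before the process starts, since it has no effect on a process that is already running.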

copper sable Sep 27, 2024, 5:35 AM

#

next to no difference with jemalloc
