#openvino immich_machine_learning
:wave: Hey @copper sable,
Thanks for reaching out to us. Please follow the recommended actions below; this will help us be more effective in our support effort and leave more time for building Immich.
References
- Container Logs: docker compose logs (docs)
- Container Status: docker compose ps (docs)
- Reverse Proxy: https://immich.app/docs/administration/reverse-proxy
Checklist
- :ballot_box_with_check: I have verified I'm on the latest release (note that mobile app releases may take some time).
- :ballot_box_with_check: I have read applicable release notes.
- :ballot_box_with_check: I have reviewed the FAQs for known issues.
- :ballot_box_with_check: I have reviewed Github for known issues.
- :ballot_box_with_check: I have tried accessing Immich via local ip (without a custom reverse proxy).
- :ballot_box_with_check: I have uploaded the relevant logs, docker compose, and .env files, making sure to use code formatting.
- :ballot_box_with_check: I have tried an incognito window, disabled extensions, cleared mobile app cache, logged out and back in, different browsers, etc. as applicable
(an item can be marked as "complete" by reacting with the appropriate number)
If this ticket can be closed you can use the /close command, and re-open it later if needed.
immich_server
immich_machine_learning
The immich_machine_learning container does not recover unless I manually restart it, which causes all queued jobs to fail. It processes one failure every 5 minutes.
Successfully submitted, a tag has been added to inform contributors. :white_check_mark:
This first crash happened after 32 minutes of running. I'll monitor the next round as well.
Using image ghcr.io/immich-app/immich-machine-learning:pr-12455-openvino
Version 1.114
It crashed again after 7 minutes. I'll try running just facial detection to see what happens.
After this "crash", the container uses an additional 1 GB of RAM but is still unable to process facial jobs.
And it crashed after 3-4 assets.
It seems to be loading buffalo_l twice:
[09/24/24 21:00:06] INFO Application startup complete.
[09/24/24 21:00:22] INFO Loading detection model 'buffalo_l' to memory
[09/24/24 21:00:22] INFO Setting execution providers to
['OpenVINOExecutionProvider',
'CPUExecutionProvider'], in descending order of
preference
[09/24/24 21:00:23] INFO Loading recognition model 'buffalo_l' to memory
[09/24/24 21:00:23] INFO Setting execution providers to
['OpenVINOExecutionProvider',
'CPUExecutionProvider'], in descending order of
preference
When loading the SigLIP model, it only loads once. Job concurrency is 1 for both.
I have the same issue.
Woohoo!! I'm not the only one!!
Do you have this too?
I'm getting a fair number of these warnings in my immich_machine_learning container. Since it's only a warning, should I pay attention to it?
[W:onnxruntime:, execution_frame.cc:879 VerifyOutputSizes] Expected shape from model of {1,512} does not match actual shape of {2,512} for output 683
What are your system specs, out of curiosity?
Those are separate models
I'm guessing you're right at the edge of your iGPU's RAM limit, so if it uses a bit extra at some point, it all falls apart.
If you're sure you have enough RAM for your iGPU, this might be of interest. I haven't seen out-of-resources errors since I followed these instructions.
https://github.com/immich-app/immich/discussions/11422
This particular warning should be fixed as of the latest release. Maybe running docker exec immich_machine_learning rm -r /cache/facial-recognition will help?
As of 1.115, or in the ML image?
As of the 1.115 release.
Sorry, I didn't see this earlier. Are you suggesting buffalo_l uses more resources than the ViT-L-16-SigLIP-384__webli model?
My bad, I didn't see any of that in the release notes. I guess I'll update the stack tomorrow.
It shouldn't use more resources than the ViT model. I just meant that buffalo_l is really two models: a detection model and a recognition model
Oh, gotcha.
Technically, the recognition model has unbounded RAM usage because it processes all faces in an image in one batch, so if it encounters an image with a lot of faces, it'll use a bit more RAM than usual.
But it should always use a lot less than the ViT model you have.
Even with multiple faces?
I'm still using ghcr.io/immich-app/immich-machine-learning:pr-12455-openvino. Should I switch to the release image or continue with this one?
Is there any suggested method to automatically restart the container to handle these crashes?
It’s about 25 MiB per face.
Oh yes, that is tiny. It really shouldn't get anywhere close, then.
You can just use the release image at this point.
I wonder if this is related to the RAM not being fully freed when the worker exits.
It could also be related to this. The GuC and HuC firmware isn't enabled on this processor by default. There might be some hidden interaction here, even for non-Jasper/Elkhart processors.
It could also be related to the kernels on Synology devices being super outdated.
It's a wonder any of this stuff even works, haha.
Hahaha, lovely. Just what I wanted to hear...
Do you think that guide would work with an outdated kernel?
I'm not sure about step 1, since Synology has its own setup for kernels, but the remaining steps should be fine.
OK, thanks. I guess I'll make sure my backups are up to snuff and give it a go tomorrow. My current kernel is 4.4.302+.
I'm assuming I'd need to follow those steps on the host OS, and that doing it within the Docker container would be insufficient?
It looks like GuC and HuC are not available on the DS423+. https://patchwork.kernel.org/project/intel-gfx/patch/[email protected]/
Is there a way to do facial recognition on the CPU but smart search with hardware acceleration?
There’s no easy way. I guess you could run two ML instances with a load balancer or something.
OK, thanks.
There is a possible solution for you: running the models in fp16 instead of fp32. It should halve the models’ memory usage with a tiny difference in quality.
The thing is, it's only buffalo_l that's the problem. I could live with it crashing if it restarted correctly, but when it crashes it takes down the entire model (this doesn't happen with ViT-L-16-SigLIP-384__webli). I've been running ViT-L-16-SigLIP-384__webli with no problems for about 12 hours now.
Does this only happen when both tasks are being processed, or also when face detection alone is?
I.e., is it just the combined total RAM that’s the problem, or is there something specific about the facial recognition model beyond RAM issues?
Something specific about facial recognition. No other tasks are running.
It ran fine for about half an hour with both ML tasks running simultaneously, but I can't get it to run on its own anymore for any extended period.
Does the container use more than 8GiB when that happens?
I don't think so
Let me run a quick comparison
This is running SigLIP
and buffalo_l
The RAM usage seems much more volatile than with SigLIP, which ran steadily.
Just crashed again.
Could you restart the container and run docker exec immich_machine_learning rm -r /cache/facial-recognition/buffalo_l/recognition/openvino?
Will do. I just upgraded to 1.115, so I assumed I had a clean slate.
Did the crash correspond with that spike in RAM at the end?
It did.
ash-4.4# docker attach immich_machine_learning
[09/25/24 13:11:34] INFO Loading detection model 'buffalo_l' to memory
[09/25/24 13:11:34] INFO Setting execution providers to
['OpenVINOExecutionProvider',
'CPUExecutionProvider'], in descending order of
preference
[09/25/24 13:11:35] INFO Loading recognition model 'buffalo_l' to memory
[09/25/24 13:11:35] INFO Setting execution providers to
['OpenVINOExecutionProvider',
'CPUExecutionProvider'], in descending order of
preference
Kind of, but OpenVINO uses cached blobs.
Referring to this
Should it be using the CPU for buffalo_l? Only a single task is being processed.
It’s compiling the models right now.
Oh alright, so I should just wait for the CPU usage to drop before looking at the failure logs again (if it takes longer than 5 minutes, I assume I'll see failures on immich_server).
Yup
There was one minor CPU/RAM spike after the CPU usage dropped, but it's been running fairly steadily otherwise so far.
Actually, it appears CPU and RAM usage aren't always in sync.
And it crashed.
It seems like there’s some allocation limit this is hitting besides the 8GiB limit you mentioned
Can you check the server logs for the asset id of the asset that caused it to crash?
In the “job failed” text, there should be something like “id”: “<random-string>”
Can you open a random image in immich and change the id in the url to this id?
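Pulling that ID out of the server logs can be scripted. A minimal sketch, assuming the failed-job output contains `"id": "<uuid>"` on some line near the error (as described above) and that asset IDs are standard UUIDs; the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Print each distinct UUID that follows "id": in log text read from stdin.
# Assumption: the failed job's payload is logged as "id": "<uuid>".
extract_asset_ids() {
  grep -oE '"id": *"[0-9a-fA-F-]{36}"' \
    | grep -oE '[0-9a-fA-F-]{36}' \
    | sort -u
}

# Usage (container name as used in this thread; -A 5 keeps a few lines after
# each failure so a payload printed on the following lines is included):
#   docker logs immich_server 2>&1 | grep -i -A 5 'failed' | extract_asset_ids
```

Each printed ID can then be pasted into the asset URL in place of an existing one.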
Stacktrace
```
Error: Error: 400
    at Object.he [as ok] (http://192.168.3.92:2283/_app/immutable/chunks/fetch-client.CFZ4JfrO.js:1:2872)
    at async Yt (http://192.168.3.92:2283/_app/immutable/nodes/17.LM1klr7C.js:1:2358)
    at async Pe (http://192.168.3.92:2283/_app/immutable/chunks/entry.CYYJZztW.js:1:14634)
```
Perhaps that's another user?
Let me log in as another.
Might be
20 faces in that one
Hmm, well that at least confirms that it’s because of the batch size
Maybe the way OpenVINO handles batching is more RAM-intensive.
Perhaps; it was using much more than CUDA to start with anyway.
After unloading the model, the RAM usage is still high. Will this become a memory leak going forward?
to memory
[09/25/24 13:28:58] INFO Setting execution providers to
['OpenVINOExecutionProvider',
'CPUExecutionProvider'], in descending order of
preference
[09/25/24 13:34:00] INFO Shutting down due to inactivity.
[09/25/24 13:34:00] INFO Shutting down
[09/25/24 13:34:00] INFO Waiting for application shutdown.
[09/25/24 13:34:01] INFO Application shutdown complete.
[09/25/24 13:34:01] INFO Finished server process [10]
[09/25/24 13:34:01] ERROR Worker (pid:10) was sent SIGINT!
[09/25/24 13:34:01] INFO Booting worker with pid: 120
[09/25/24 13:34:09] INFO Started server process [120]
[09/25/24 13:34:09] INFO Waiting for application startup.
[09/25/24 13:34:09] INFO Created in-memory cache with unloading after 300s
of inactivity.
[09/25/24 13:34:09] INFO Initialized request thread pool with 4 threads.
[09/25/24 13:34:09] INFO Application startup complete.
I'm running a search and will wait for it to unload again just to see if I get approx the same resting ram usage
It’s a memory leak if it keeps building up each time, but if it reuses the ram from earlier then it isn’t a leak
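That distinction can be checked by taking one idle reading after each load/unload cycle and seeing whether the readings keep climbing. A minimal sketch, assuming the container name used in this thread, readings converted to MiB by hand, and an arbitrary 100 MiB tolerance; the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Current memory usage of a container, e.g. "2.75GiB / 15.5GiB".
sample_mem() {
  docker stats --no-stream --format '{{.MemUsage}}' "${1:-immich_machine_learning}"
}

# Exit 0 if every idle reading (MiB) exceeds the previous one by more than the
# tolerance, i.e. usage keeps building up each round (leak-like). Exit 1 if it
# levels off at some point, i.e. the earlier allocation is being reused.
keeps_growing() {
  local tolerance=${TOLERANCE_MIB:-100} prev="" cur
  [ "$#" -ge 2 ] || return 1
  for cur in "$@"; do
    if [ -n "$prev" ] && [ $(( cur - prev )) -le "$tolerance" ]; then
      return 1
    fi
    prev=$cur
  done
  return 0
}
```

For example, hypothetical readings of 2000, 2500 and 2510 MiB would count as levelling off, while 2000, 2500 and 3000 would look like a leak.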
OK, so I'll know in 5 minutes then. Regarding this, I guess I need to run buffalo_l on the CPU only? It would be really nice if, after crashing, it fell back to the CPU and then fired up hardware acceleration again for the next job.
We dropped to 2.75 GB, not significantly more than the 2.21 GB previously.
About 500 MB more.
The question is if loading the models reuses that or if it’ll spike much higher now, and what it drops to after unloading a second time
Right, about 500 MB more at idle; the spikes are similar.
That’s good at least
running a 3rd time for consistency
It idles at about the same level on the 3rd round as on the 2nd.
Would error handling be possible for something like this, or is it more effort than it's worth? As it stands, the buffalo_l model just hangs in this condition.
It’s one thing to handle this with a cpu fallback when loading the model, but it’s tricky on-the-fly
The solution is generally to avoid this happening in the first place
So it's not easy to even just reload the GPU model?
That I understand. But that would entail a rewrite of the processing, or require specific hardware (which would still fail at some face count).
We’re just using OpenVINO through a relatively high-level API, so the specifics of how allocations and deallocations happen aren’t up to us. #11981 is related.
[Discussion] (immich-app/immich#11981)
The issue in this case is that once it gets an allocation error with that one asset with 20 faces, it still doesn’t work for other assets that it could otherwise process. I think the fact that it can’t recover from this is an upstream issue
```
status code returned while running
OpenVINO-EP-subgraph_2 node.
```
So you're thinking there's no easy way to restart it once a non-zero status code is returned?
Also, after reading that discussion: are failed assets excluded from being run again, even if you manually click missing, discover, or refresh in the Admin > Jobs pane?
One issue is that if it hits this situation once, there’s no reason to assume it won’t again. When there are many jobs, reloading the model from disk for each job because of a failure that happens each time is silly. Another issue is that the error isn’t necessarily fatal, so falling back to the CPU or reloading from disk is problematic if it can still process other assets fine. And specifically for the CPU fallback, it could actually increase RAM pressure and make things worse if the model was previously on the GPU.
What would you consider fatal? This job now hangs until someone manually restarts the container, or the queue is emptied by a bunch of header timeouts.
It’s essentially fatal in this case, but for CUDA it would actually work fine for other inputs
Ah, OK.
I guess I'll spin up 2 ML instances then, one on the CPU and one on the GPU, and push the respective jobs to each via an nginx load balancer or similar.
Out of curiosity, could you download that asset with 20 faces, delete it from immich (including from the trash), disable smart search in the settings, restart the ml container and upload that asset?
I’m interested in whether it can process those faces in isolation without other jobs running
The test I was doing was without any other jobs running.
But there were other face detection jobs that ran before it
Oh, you mean as the only job that runs through the ML container.
Sure.
Would I need to delete it, or could I upload it as a different user?
Uploading as a different user is fine too
One sec, clearing queues.
Just to be safe, I'll restart the entire Immich stack.
I think it worked
Ooh interesting. What does the ram usage look like? How many faces did it detect?
Wait, I can't find the image in the library.
Also enable debug logs since it’ll give a bit more info
I was wrong; it found all 20 people.
Did you change the min face detection setting, or is it default?
Let me set debugging. I'm just trying to figure out how to delete it from the trash at the moment.
I believe I changed it
Try uploading the asset, deleting it and then uploading it again
Before the models get unloaded
If I delete it, it's gone? No trash?
It should be deleted from the trash as well
The models have been unloaded. I'll try this a few times within a shorter window.
It doesn't flinch.
Now lower the face detection threshold and try again until it detects more than 20 faces in that image
[Nest] 7 - 09/25/2024, 2:38:59 PM DEBUG [Microservices:PersonService] 20 faces detected in upload/thumbs/c89ae764-04ef-4f4e-8f36-8d0247f0a12b/65/08/6508584b-68b8-445f-b07b-c7e787bb37ca-preview.webp
There are only 20 faces in that image.
The model will disagree at some point 🙂
What does the ram usage look like?
Haha, OK. It didn't let me set 0.05.
But this is good. So the issue isn’t so much the batch size but when it receives batches of different sizes
I’m guessing it’s doing some caching here, so it has one version of the model that it uses for a batch of 20, another version for 24, etc.
And presumably the effect of this scales higher when the batch size grows and is less noticeable at smaller batch sizes
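The hypothesis above can be illustrated with a toy model. This is not how OpenVINO actually manages memory; it only shows why, if one set of buffers were kept per distinct batch size, memory would grow with each new face count rather than with the number of images. It assumes the ~25 MiB/face figure from this thread and bash 4+ for associative arrays; the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Toy model (hypothesis only): one cached set of buffers per distinct batch size.
MIB_PER_FACE=25

cached_mib_after() {
  # Args: face counts of the processed images, in order.
  local -A seen=()
  local total=0 n
  for n in "$@"; do
    if [ -z "${seen[$n]:-}" ]; then
      seen[$n]=1
      total=$(( total + n * MIB_PER_FACE ))
    fi
  done
  echo "$total"
}

echo "20,20,20,20 -> $(cached_mib_after 20 20 20 20) MiB held"
echo "20,24,31,40 -> $(cached_mib_after 20 24 31 40) MiB held"
```

Under this model, repeating the same 20-face batch holds about 500 MiB, while four different face counts hold about 2875 MiB.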
Oh interesting, it also fully unloaded the model.
After the crash, I was able to re-upload it successfully after changing the detection settings. I'm guessing that after 5 minutes without a request, it managed to unload.
Oh interesting. So it just unloads them slowly then? Because I did see the RAM dip before.
It’s probably holding onto the kernels and the ram it allocated for processing that particular batch size. This lets it reuse all of this the next time it gets an input with that batch size
Would reducing the unload time, say to 30 seconds, allow it to recover between immich_server requests?
Each request it gets from the server would refresh that counter, so it’d only recover sooner after the server goes through all the jobs
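A small simulation makes this concrete. The inactivity window is the one in the startup log ("unloading after 300s of inactivity"), which Immich exposes as `MACHINE_LEARNING_MODEL_TTL`. Because every request resets the countdown, an unload only happens in a gap longer than the TTL, or TTL seconds after the last request. The request timestamps below are hypothetical, and the function name is too.

```bash
#!/usr/bin/env bash
# Print the times (s) at which an idle model would be unloaded, given a TTL
# and ascending request timestamps. Each request resets the countdown.
unload_events() {
  local ttl=$1; shift
  local prev="" t
  for t in "$@"; do
    if [ -n "$prev" ] && [ $(( t - prev )) -gt "$ttl" ]; then
      echo $(( prev + ttl ))   # gap longer than TTL: model unloads mid-stream
    fi
    prev=$t
  done
  [ -n "$prev" ] && echo $(( prev + ttl ))   # final unload after the last request
}

# Queue of jobs every 10 s with a 30 s TTL: no unload until the queue drains.
unload_events 30 0 10 20 30 40
```

With a busy queue, a shorter TTL only moves the final unload earlier; it never triggers a reset while jobs keep arriving.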
Ah, so not helpful then.
Yeah not really
Based on what I’m seeing, using fp16 would probably avoid the error you’re seeing because it halves the working memory that needs to be allocated for each batch size
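The halving is plain arithmetic: a float32 element takes 4 bytes and a float16 element takes 2, so every buffer sized by elements halves. A minimal sketch; the activation shape used below is hypothetical, not buffalo_l's actual layer sizes.

```bash
#!/usr/bin/env bash
# Bytes for one dense tensor buffer: product of dimensions * bytes per element.
tensor_bytes() {
  local dtype=$1; shift
  local elems=1 d
  for d in "$@"; do elems=$(( elems * d )); done
  case $dtype in
    fp32) echo $(( elems * 4 )) ;;
    fp16) echo $(( elems * 2 )) ;;
    *) echo "unknown dtype: $dtype" >&2; return 1 ;;
  esac
}

# Hypothetical activation: a batch of 20 faces, 64 channels, 56x56 spatial.
echo "fp32: $(tensor_bytes fp32 20 64 56 56) bytes"
echo "fp16: $(tensor_bytes fp16 20 64 56 56) bytes"
```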
With the bonus of being much faster
Speed really doesn't matter to me. 50 photos could take a day on this hardware for all I care.
Would this affect facial detection accuracy, however?
The real advantage of hardware acceleration is searching images.
The effect is generally pretty minor. OpenVINO (without ONNX) actually defaults to float16 when using a GPU
But how much of an effect it has depends on the specific model
Using cpu for facial recognition is fine too, but this might be a simpler alternative. It can also mean using a better model that normally doesn’t fit in memory
It’s typical to use float16 for inference partly because of this: the accuracy gain of using a better model outweighs the loss of using float16
Are we talking buffalo_m vs. buffalo_l?
For search mainly. But there’s technically a better detection model than the one in buffalo_l as well
#1272383382487040020 message
But it’s not so big that you can’t run it at full precision either
Interesting. So you think that model at float16 would do better than buffalo_l at float32?
Yup, at least for detection
Since the recognition model would still be the same for both
Hah, what do you mean by that? Would it generally mark the same people as different people?
Oh, I get what you're saying: it would find more people, not necessarily group them better.
Detection = is this a face, recognition = whose face is this
Right. Any idea when that might be included in an official release?
Maybe in a few weeks
So by about the time Immich becomes stable?
(if that's still on the roadmap for this year)
It’d be before that. The stable release is at least two months away, if not a bit more
My bad, I generally double any timeline anyone gives me; force of habit.
Until then, would any Immich update require a manual cache replacement?
Haha yep, it keeps me happy when things are done "ahead of schedule" instead of behind.
Sorry, by this do you mean when the model will be added, or when float16 will?
Sorry. Let me rephrase.
If I use the model that you linked in that discussion, and replace the onnx file, would an immich update clean that and require me to replace it again?
No, immich won’t do anything with it across releases
OK, awesome. Then it's a set-it-and-forget-it type of thing.
The only exception is that if there’s some IO error when it’s loading the model, it’ll clear the cache and download the model again
But it's backwards compatible with the other one, so it shouldn't cause a fatal error, right?
Nope, it would just mean it goes back to the normal detection model
And this doesn’t include runtime errors from OpenVINO or anything
Yeah, that sounds fine. If there's an IO issue, I've got much larger problems.
OK, so to come full circle (my apologies):
This new facial detection model, running at float16 in openvino versus float32 in cpu, would that be a big difference?
The difference in outputs should be small. Idk, maybe out of every 100 faces it’d detect at float32, it’d miss 1 or 2 at float16.
I’d need to test it to give a more precise figure
Oh, that's fairly minor. How do I go about changing the precision? MACHINE_LEARNING_ANN_FP16_TURBO?
It isn’t supported at the moment, but it’s relatively easy to add
Oh alright, then I may just wait a bit. I tried creating a CPU fallback mode and am testing it at the moment.
You could hack it by going into the container and changing the line "precision": "FP32" in the file /usr/src/app/sessions/ort.py to "precision": "FP16"
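The hack above can be done with a single `sed` substitution. A minimal sketch, assuming the line appears in `/usr/src/app/sessions/ort.py` exactly as quoted; the edit lives in the container's writable layer, so it survives `docker restart` but is lost when the container is recreated (for example on an update). The helper name is hypothetical.

```bash
#!/usr/bin/env bash
# Swap the FP32 precision hint for FP16 in the given file (GNU sed, in place).
set_fp16_in_file() {
  sed -i 's/"precision": "FP32"/"precision": "FP16"/' "$1"
}

# Usage against the running ML container, then restart so sessions are rebuilt:
#   docker exec immich_machine_learning \
#     sed -i 's/"precision": "FP32"/"precision": "FP16"/' /usr/src/app/sessions/ort.py
#   docker restart immich_machine_learning
```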
Oh, that does sound super easy.
Do you remember how many faces you detected in this image? I'm counting 82 at both float16 and float32. I just want to make sure my numbers aren't skewed.
Detection threshold | Face count
0.7 | 84
0.6 | 135
0.5 | 200
Interesting. For some reason I'm solidly at 82.
I ended up writing an nginx script so that if a request to the HWA container fails, it's rerouted to the CPU container. It handled quite a few timeouts, but ultimately still crashed.
It could be a difference in rounding. These results are also from a while ago so there might be a change in code behavior since then
Running at float16 works to an extent. I don't think it has any impact on my smart search model, though. Running both tasks simultaneously has some issues.
And just like that,
the float16 model crashed.
Guess I spoke too soon.
Out of resources?
Yeah.
It was running at around 1.6-1.7 GB, then suddenly spiked to 4.33 GB.
immich_server
immich_machine_learning
Could you post the ram usage too?
Hmm, I guess it ultimately can't handle a string of images with high and different numbers of faces
float16 just delays the inevitable there.
I don't know if it made a difference, to be honest. It still crashed 8 minutes after starting, though.
I wonder what made the CPU spike, as that's what took it all down
Sorry, it's gone. Let me re-run this to regenerate it.
I'm wondering if my CPU has issues using more RAM than the official 8 GB. I've seen it use 8.x GB before, but never 10+. Yes, the remainder is full of cache, but...
I needed to get dual rank (2Rx8) memory to recognize a 16gb module in the system
So after a little playing around, I made a super hacky solution to this.
I have 2 instances of immich_machine_learning running. One uses hardware acceleration (HWA) and preloads CLIP; the other is CPU-only and sits idle.
I also added an nginx container, which acts as a load balancer of sorts. All ML tasks are sent to the load balancer, which forwards them to the HWA container. If a request comes back as a failure, it gets forwarded to the CPU container. I have an external bash script monitoring the load balancer logs, and if an error is logged, it restarts the HWA container.
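The external watchdog described above might look roughly like this. It is a sketch, not the poster's actual script: the container names and the log patterns treated as "HWA failed" are assumptions, so adjust them to whatever your nginx log format records for a failed upstream attempt. A cooldown keeps it from restarting the container again while it is still warming up.

```bash
#!/usr/bin/env bash
# Hypothetical names; override via environment variables.
LB_CONTAINER=${LB_CONTAINER:-ml_lb}
HWA_CONTAINER=${HWA_CONTAINER:-immich_machine_learning_hwa}
COOLDOWN_S=${COOLDOWN_S:-60}

# Exit 0 if a log line looks like a failed request to the HWA backend.
# Assumed patterns: nginx upstream errors, or a 5xx status in the access log.
is_failure_line() {
  case $1 in
    *"upstream timed out"*|*"upstream prematurely closed"*|*"connect() failed"*) return 0 ;;
    *'" 500 '*|*'" 502 '*|*'" 503 '*|*'" 504 '*) return 0 ;;
    *) return 1 ;;
  esac
}

watch_and_restart() {
  local last_restart=0 now line
  # --tail 0 with -f: follow only lines logged from now on.
  docker logs -f --tail 0 "$LB_CONTAINER" 2>&1 | while IFS= read -r line; do
    is_failure_line "$line" || continue
    now=$(date +%s)
    if (( now - last_restart >= COOLDOWN_S )); then
      echo "failure seen, restarting $HWA_CONTAINER" >&2
      docker restart "$HWA_CONTAINER" >/dev/null
      last_restart=$now
    fi
  done
}

# Only start watching when invoked as: ./watchdog.sh watch
if [ "${1:-}" = "watch" ]; then
  watch_and_restart
fi
```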
I'm now using approx. 12 GB of RAM. Machine learning is running OK with HWA, and so far it hasn't failed over to the CPU container (since I preloaded the CLIP model about 5 minutes ago).
Edit: I also unintentionally updated to 1.116, if that makes any difference.
Following up on this: I disabled full container restarts for a while, and so far the main container manages to recover as long as no request is waiting for it while it's starting up (nginx marks the container as down for a short period after an error).
It turns out it eats more memory every time it reloads, so the script to restart the container is required.
@topaz basin does this look like a memory leak to you? After running my script to keep restarting the HWA container on failures, I left the CPU container untouched. The CPU container is using significantly more RAM now than it did 5 hours ago (over 1 GB at idle, after the container is sent SIGINT and reloads to a "fresh" state).
And the last image is from after startup completed, after running docker restart immich_machine_learning_cpu.
Both containers are using the openvino image, but the cpu container is not permitted access to the drivers, so it falls back to cpu
immich_machine_learning_cpu grows after one round of usage
You should use the cpu image instead of the openvino image for cpu. It uses a more advanced memory allocator that’s more effective at avoiding fragmentation
It isn’t installed in the openvino image because it was causing issues for some users
Oh interesting. I guess I can look at the differences between the containers and build one manually to see if it solves my issue, or do you think that'll create more issues?
So a memory leak in the openvino image is a known thing?
It isn't a memory leak per se, just memory fragmentation. But yes, I imagine the fact that the openvino image uses the default glibc allocator probably contributes to the wonky RAM usage
Feel free to extend the openvino image to install a different allocator like mimalloc, snmalloc or jemalloc. Just be sure to set LD_PRELOAD to make sure it's used. mimalloc is set like this:
lib_path="/usr/lib/$(arch)-linux-gnu/libmimalloc.so.2"
export LD_PRELOAD="$lib_path"
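A slightly more defensive variant of that snippet only sets `LD_PRELOAD` if the library file actually exists, so a missing package falls back to glibc malloc instead of producing loader errors. A sketch, assuming Debian's multiarch layout (as in the snippet above) and Debian's `libjemalloc2` package for the jemalloc path; `uname -m` prints the same machine name as `arch`. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Map an allocator name to its expected shared-library path.
allocator_lib() {
  local arch_dir="/usr/lib/$(uname -m)-linux-gnu"
  case $1 in
    mimalloc) echo "$arch_dir/libmimalloc.so.2" ;;
    jemalloc) echo "$arch_dir/libjemalloc.so.2" ;;
    *) return 1 ;;
  esac
}

# Export LD_PRELOAD for the chosen allocator if its library is present.
preload_allocator() {
  local lib
  lib=$(allocator_lib "$1") || { echo "unknown allocator: $1" >&2; return 1; }
  if [ -f "$lib" ]; then
    export LD_PRELOAD="$lib"
  else
    echo "not found: $lib (keeping default glibc malloc)" >&2
    return 1
  fi
}

# To confirm a running process mapped it (PID 1 is just an example):
#   docker exec immich_machine_learning grep -m1 -E 'mimalloc|jemalloc' /proc/1/maps
```

Call `preload_allocator mimalloc` (or `jemalloc`) in the entrypoint before the server starts so child processes inherit it.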
mimalloc and snmalloc are newer allocators that generally perform the best. jemalloc is older but tried-and-true.
There's also tcmalloc, which I have no experience with
Next to no difference with jemalloc.