#Immich ML not working with rocm on 860M, 880M, 890M (gfx1150, gfx1151).

lost glenBOT
#

:wave: Hey @sand gazelle,

Thanks for reaching out to us. Please carefully read this message and follow the recommended actions. This will help us be more effective in our support effort and leave more time for building Immich.

Checklist

I have...

  1. :ballot_box_with_check: verified I'm on the latest release (note that mobile app releases may take some time).
  2. :ballot_box_with_check: read applicable release notes.
  3. :ballot_box_with_check: reviewed the FAQs for known issues.
  4. :ballot_box_with_check: reviewed Github for known issues.
  5. :ballot_box_with_check: tried accessing Immich via local ip (without a custom reverse proxy).
  6. :ballot_box_with_check: uploaded the relevant information (see below).
  7. :ballot_box_with_check: tried an incognito window, disabled extensions, cleared mobile app cache, logged out and back in, different browsers, etc. as applicable

(an item can be marked as "complete" by reacting with the appropriate number)

Information

In order to be able to effectively help you, we need you to provide clear information to show what the problem is. The exact details needed vary per case, but here is a list of things to consider:

  • Your docker-compose.yml and .env files.
  • Logs from all the containers and their status (see above).
  • All the troubleshooting steps you've tried so far.
  • Any recent changes you've made to Immich or your system.
  • Details about your system (both software/OS and hardware).
  • Details about your storage (filesystems, type of disks, output of commands like fdisk -l and df -h).
  • The version of the Immich server, mobile app, and other relevant pieces.
  • Any other information that you think might be relevant.

Please paste files and logs with proper code formatting, and especially avoid blurry screenshots.
Without the right information we can't work out what the problem is. Help us help you ;)

If this ticket can be closed you can use the /close command, and re-open it later if needed.

lost glenBOT
sand gazelle
#

ahh it could be the fact that rocm 7 uses a newer kernel version of linux than the docker image?

#

im on 6.14.0-37-generic

sand gazelle
#

ah no apparently docker uses the hosts kernel

sand gazelle
#

help!

void scarab
#

seems like rocm 7.2 fixes this issue

sand gazelle
#

I think I have rocm 7.2

#

@void scarab any tips?

#

oh wait

#

maybe not on my server let me check

void scarab
#

I mean we'll need to compile with 7.2 first

sand gazelle
#

oh yeah that would be awesome

#

am I the only one feeling like AMD did huge steps forward lately?

#

So is there something I can do in the meantime?

void scarab
#

No, it seems like they're getting their shizzle together these days

#

Not use ROCm for now? 👀 It's being kind of a butt lately

sand gazelle
#

I'd like my OCR to keep working?

#

yeah....

void scarab
#

switching over to non-HWA doesn't affect what you can or can't process

sand gazelle
#

they released so mucchhh stufff lately devs cant follow up

#

they're going faster than freaking formula 1

#

hummm

sand gazelle
#

@void scarab what version of rocm would work? I'd rather stay with 7.1 or 7.2 but if I have to go back to 6.4.... I don't think this stuff is in the docs either? Maybe it should be mentioned?

void scarab
#

I mean that once the immich container upgrades to rocm 7.2 it might go well

sand gazelle
#

do you think it's planned?

void scarab
#

Currently ROCm is building super slow too so I'm not even sure 2.5.3-rocm exists

sand gazelle
#

should I disable ml in the meantime to save resources? I guess whenever it tries to run it uses cpu resources for nothing and fails

void scarab
#

Whats wrong with using cpu?

sand gazelle
#

@void scarab cpu works fine for face and ocr ML?

void scarab
#

Why wouldn't it?

#

GPU is just to get it to run faster

sand gazelle
#

true

#

I'll just save some watts tho

#

cause my cpu has a 65w TDP, I'd rather have it run idle and wait for the rocm container update

#

I could lower the tdp in the bios...

void scarab
#

Does your GPU not have 200-300W TDP? 👀

sand gazelle
#

oh I'm running immich on my server with a Ryzen AI 7 PRO 360, it has an iGPU, an AMD Radeon 880M

#

it's supported by rocm

#

I'm downgrading to rocm 7.1 see if it helps

#

the gpu access fault seems to come from my end

#

clinfo has the same problem so

#

its not the immich image, I'll close this

sand gazelle
#

I re-opened the post to clarify that rocm has limitations on ryzen APUs that are causing this issue. According to bing:

  • HIP pageable memory hangs
  • clinfo crashes
  • Missing HMM support
  • ROCm APU limitations

Those are architectural limitations of ROCm on Ryzen AI today.

#

According to copilot, even with perfect repos, Ryzen AI APUs (gfx1150/gfx1151) still have:

  • No OpenCL support
  • No HMM / unified memory
  • No pageable memory copies
  • Partial HIP support only

#

closing again

sand gazelle
#

sooooo

#

I re-opened this just in case anyone has a ryzen apu and got the ML container to work

#

I swear it used to... just doesn't anymore

#

I have a Ryzen AI 7 PRO 360 with an 880M. I'm running the correct kernel but I think it's just not implemented yet for APUs

#

I think the real issue is it has something to do with mesa drivers and dkms conflicts
it's more complex than it seems

#

solved

#

7.2 doesn't work with apu's right now

#

re-opened because they're blocking gfx1103 at the moment looking for a work-around

#

rocBLAS error: Cannot read /opt/rocm/lib/rocblas/library/TensileLibrary.dat: No such file or directory for GPU arch : gfx1103
List of available TensileLibrary Files :
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1100.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1201.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx90a.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1151.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1101.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1150.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx942.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1200.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1030.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx908.dat"
"/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1102.dat"
[02/06/26 20:07:45] ERROR Worker (pid:110) was sent code 134!
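
The rocBLAS error above lists the GPU targets this build ships kernels for, and gfx1103 is not one of them. A minimal sketch for checking a target against that list, assuming the library directory and the TensileLibrary_lazy_<arch>.dat naming shown in the log; the function names are hypothetical:

```shell
#!/usr/bin/env bash
# List GPU targets that have a rocBLAS Tensile library file in a directory.
# The default path and the file naming are taken from the error log above.
list_rocblas_archs() {
  local dir="${1:-/opt/rocm/lib/rocblas/library}"
  local f name
  for f in "$dir"/TensileLibrary_lazy_gfx*.dat; do
    [ -e "$f" ] || continue              # the glob matched nothing
    name="${f##*/}"                      # strip the directory part
    name="${name#TensileLibrary_lazy_}"  # strip the common prefix
    printf '%s\n' "${name%.dat}"         # strip the extension
  done | sort
}

# Succeed if the given target (for example gfx1150) is in that list.
# Usage: rocblas_supports <target> [library-dir]
rocblas_supports() {
  list_rocblas_archs "${2:-}" | grep -qx "$1"
}
```

Run inside the ML container, something like `rocblas_supports gfx1103 || echo "gfx1103 not shipped"` would show whether a spoofed target is needed.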

#

I need to fake gfx1150

#

almost there

#

:0:/longer_pathname_so_that_rpms_can_support_packaging_the_debug_info_for_all_os_profiles/src/clr/hipamd/src/hip_global.cpp:158 : 0966326528 us: Module not initialized

#

trying to find a good spoof that works... tried 11.0 with a fake gfx1150 and got this

#

im playing around with a docker compose override:
services:
  immich-machine-learning:
    devices:
      - /dev/kfd
      - /dev/dri
    environment:
      # Enable ROCm/HIP for PyTorch
      HSA_OVERRIDE_GFX_VERSION: "11.0.0"
      #HIP_VISIBLE_DEVICES: "0"
      #ROCM_VISIBLE_DEVICES: "0"
      # Optional: force PyTorch to use ROCm
      PYTORCH_ROCM_ARCH: "gfx1150"
      # Immich ML tuning
      IMMICH_MACHINE_LEARNING_GPU: "true"
    group_add:
      - "render"
      - "video"

#

trying to find a good match
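
The override values tried in this thread (11.0.0, and later 11.5.1) follow the digits of the gfx target name. A small sketch of that mapping, assuming the last two digits are minor and stepping and everything before them is the major version; `gfx_to_override` is a hypothetical helper, and it deliberately rejects targets with a hex stepping such as gfx90a:

```shell
#!/usr/bin/env bash
# Convert a gfx target name into the dotted form used by HSA_OVERRIDE_GFX_VERSION.
# Assumption: last two digits = minor and stepping, the rest = major.
gfx_to_override() {
  local digits="${1#gfx}"
  case "$digits" in
    [0-9][0-9][0-9]|[0-9][0-9][0-9][0-9]) ;;   # only all-decimal names
    *) echo "unsupported target: $1" >&2; return 1 ;;
  esac
  local major="${digits%??}"        # everything except the last two digits
  local rest="${digits#"$major"}"   # the last two digits
  printf '%s.%s.%s\n' "$major" "${rest%?}" "${rest#?}"
}
```

For example, `gfx_to_override gfx1100` gives the 11.0.0 used in the override above.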

sand gazelle
#

mostly getting /src/clr/hipamd/src/hip_global.cpp:158 : 7365836103 us: Module not initialized

#

and amdgpu dmesg says: [ 7404.417640] amdgpu: Freeing queue vital buffer 0x76b7c0c00000, queue evicted

sand gazelle
#

probable cause is that amd-ttm doesn't see it has available memory

#

and I'm unable to change the value

#

so indeed the dkms drivers do prevent rocm from seeing the memory somehow

#

I've set the minimum to 2gb vram in the bios and I'm updating the pages limit manually - you have to have secure boot off tho
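
For the pages limit, the value is a page count rather than a size in bytes. A rough sketch of the conversion, assuming 4 KiB pages and that the limit in question is the kernel's ttm `pages_limit` module parameter; `gib_to_pages` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Convert a size in GiB into a number of memory pages.
# Assumption: 4096-byte pages unless a second argument says otherwise.
gib_to_pages() {
  local gib="$1" page_size="${2:-4096}"
  echo $(( gib * 1024 * 1024 * 1024 / page_size ))
}
```

On a typical setup the current value should be readable from /sys/module/ttm/parameters/pages_limit, so it can be compared with, say, `gib_to_pages 16`.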

sand gazelle
#

meh doesn't solve nothing

sand gazelle
#

I tried quite a few things, nothing seems to work with gfx1150 / gfx1151....

sand gazelle
#

sooo

#

had a lot of issues

#

ended up just fixing stuff, disabling secure boot and lowering my max concurrent tasks and using smaller models

#

is immich-app/ViT-SO400M-16-SigLIP2-384__webli too big for 16gb vram or is it ok (clip)?

#

i went down from antelopev2 to buffalo and from the multilingual server ocr to en only mobile ocr

#

after all it's basically memory issues and amd-ttm is useless, you really need to set a big buffer in bios on an apu despite what their doc says

#

i also installed rocm 7.2 in the container hopefully i didnt fk stuff up

#

and using the gfx1100 override (HSA_OVERRIDE_GFX_VERSION=11.0.0) is still the thing to do for now

sand gazelle
#

Immich ML not working with rocm? (APU radeon 880m)

sand gazelle
#

the sweet spot according to the research I've done is 32GB of RAM or more. If you have 32GB, set 8GB aside in the BIOS for VRAM and the OS or ROCm will allocate the rest as needed. However, the current rapidocr onnx models do not seem to have support for rocm at all, so it's a waste of time to try to get those working. The solution will be to wait for another compatible onnx model or to create your own image of immich ML....

sand gazelle
#

So, in the end, just use cpu for ML it's pointless to try and fix rocm at the moment as it doesn't fully support igpu's very well (and it's not their focus anyway).

#

The cpu ML fixed all my issues and ran much faster than the buggy rocm libraries.

sand gazelle
#

TheRock 7.12 supports the latest gfx1150/gfx1151 iGPUs at last, making it possible to run compute workloads like OCR on the GPU.

Even if you have a gfx1150 gpu you need to follow these instructions - the 1150 pipeline is fked up according to amd devs

You need a minimum of 8gb vram set in the bios and 32gb total ram...
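
To confirm the BIOS carve-out is what the driver actually sees, here is a minimal sketch that reads a VRAM size in bytes from a file and compares it with a minimum in GiB. It assumes the amdgpu sysfs file mem_info_vram_total under /sys/class/drm/card*/device/ (the card index varies per machine); `vram_at_least` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Succeed if the byte count stored in a file is at least <min> GiB.
# Usage: vram_at_least <file> [min-gib]   (min defaults to 8)
vram_at_least() {
  local file="$1" min_gib="${2:-8}" bytes
  [ -r "$file" ] || return 2          # file missing or unreadable
  bytes=$(cat "$file")
  [ "$(( bytes / 1073741824 ))" -ge "$min_gib" ]
}
```

For example: `vram_at_least /sys/class/drm/card0/device/mem_info_vram_total 8 || echo "less than 8 GiB VRAM reported"`, where card0 is only a guess at the right index.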

Make sure you're running ubuntu 24.04 and update to the latest HWE kernel (6.17.0-14-generic):
sudo apt update
sudo apt dist-upgrade
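
After the upgrade and a reboot, the running kernel can be compared against the version mentioned above. A small sketch using `sort -V` from GNU coreutils; `kernel_at_least` is a hypothetical helper and ignores everything after the first dash (the "-14-generic" part):

```shell
#!/usr/bin/env bash
# Succeed if kernel version <have> is the same as or newer than <want>.
# Only the part before the first "-" is compared.
kernel_at_least() {
  local have="${1%%-*}" want="${2%%-*}"
  # sort -V orders version strings; if <want> sorts first, <have> is >= <want>.
  [ "$(printf '%s\n%s\n' "$have" "$want" | sort -V | head -n 1)" = "$want" ]
}
```

Usage: `kernel_at_least "$(uname -r)" 6.17.0 || echo "kernel is older than 6.17"`.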

You must create an env var in your .bashrc file:
export HSA_OVERRIDE_GFX_VERSION=11.5.1

You need to install these first:
sudo apt install wget python3 libatomic1

Then run:
sudo usermod -a -G render,video $LOGNAME

reboot
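
Once you are logged back in, it is worth confirming the group change took effect. A minimal sketch that reports which of the wanted groups are absent from a space-separated group list such as the output of `id -nG`; `missing_groups` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Print the wanted groups that are absent from a space-separated group list.
# Usage: missing_groups "<group list>" <wanted>...
missing_groups() {
  local have=" $1 " g missing=""
  shift
  for g in "$@"; do
    case "$have" in
      *" $g "*) ;;                    # group is present
      *) missing="$missing $g" ;;     # group is absent
    esac
  done
  printf '%s\n' "${missing# }"
}
```

Usage: `missing_groups "$(id -nG)" render video` prints nothing when both groups are present.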

Still from your home dir:
mkdir therock-tarball && cd therock-tarball
wget https://therock-nightly-tarball.s3.amazonaws.com/therock-dist-linux-gfx1151-7.12.0a20260218.tar.gz
mkdir install
tar -xf *.tar.gz -C install

Run this whole paragraph as a copy/paste in your terminal if you can it's faster:

# Configure ROCm PATH. Make sure you're in the therock-tarball directory before proceeding.
ROCM_INSTALL_PATH=$(pwd)/install
sudo tee /etc/profile.d/set-rocm-env.sh << EOF
export ROCM_PATH=$ROCM_INSTALL_PATH
export PATH=\$PATH:\$ROCM_PATH/bin
export LD_LIBRARY_PATH=\$ROCM_PATH/lib
EOF
sudo chmod +x /etc/profile.d/set-rocm-env.sh
source /etc/profile.d/set-rocm-env.sh

If you run rocminfo and amd-smi version afterwards, both should work, and the ultimate test is to run test_hip_api. If it succeeds, you're good!
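
Before running those checks, a quick sketch for confirming the tools are on PATH at all. `missing_tools` is a hypothetical helper; the tool names come from the message above, and where test_hip_api is installed is not stated there:

```shell
#!/usr/bin/env bash
# Print the names of the given commands that cannot be found on PATH.
missing_tools() {
  local t missing=""
  for t in "$@"; do
    command -v "$t" >/dev/null 2>&1 || missing="$missing $t"
  done
  printf '%s\n' "${missing# }"
}
```

Usage: `missing_tools rocminfo amd-smi test_hip_api` prints nothing when all three resolve; if something is listed, re-check that /etc/profile.d/set-rocm-env.sh was sourced in the current shell.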

Add these to your docker compose .env file if issues arise while running the ML container:
MIGRAPHX_DISABLE_MLIR=1
MIGRAPHX_ENABLE_REPRODUCIBLE_COMPILE=1

Also, use the OCR server model. It actually uses the GPU correctly, while the mobile models apparently produce image sizes that ROCm/MIGraphX/MIOpen have a hard time handling.

wispy kindleBOT
sand gazelle
#

oh well, seems like the ocr is happening but there's no data when searching... weird. no errors.