Hello! I'm trying to install and run Applio with a Radeon RX 7800 XT. I followed the official documentation for AMD, but when I launch Applio, I get this error: UserWarning: CUDA initialization: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero. (Triggered internally at ..\c10\cuda\CUDAFunctions.cpp:108.)
#Applio AMD
what's your adrenalin version?
well, you probably messed up something in the applio guide
anyway, that one does not work with the latest adrenalin
grab the source code and unzip it into C:\Applio
if C:\Users\username\Miniconda3 exists, delete it
drop this into C:\Applio and run to install
run env\python to verify version 3.11 was installed
download these files into C:\Applio
open command prompt cmd.exe in C:\Applio
run env\python -m pip install torch-2.7.0a0+rocm_git3f903c3-cp311-cp311-win_amd64.whl
and env\python -m pip install torchaudio-2.7.0a0+52638ef-cp311-cp311-win_amd64.whl
open rvc\train\train.py and add a line torch.backends.cudnn.enabled = False like this
benchmark should be set to False as well
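For reference, the two cuDNN switches mentioned above are plain attributes on `torch.backends.cudnn`. This is a minimal sketch: the thread does not show where they go in rvc\train\train.py, so placing them right after the imports is an assumption. The `find_spec` guard is not part of the fix; it only lets the snippet run on a machine without torch.

```python
import importlib.util

cudnn_disabled = False

# Guard is only for this sketch; in train.py torch is already imported.
if importlib.util.find_spec("torch") is not None:
    import torch

    # Skip cuDNN kernels entirely.
    torch.backends.cudnn.enabled = False
    # Do not auto-tune convolution algorithms either.
    torch.backends.cudnn.benchmark = False
    cudnn_disabled = True

print("cuDNN disabled:", cudnn_disabled)
```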
after that, use run-applio.bat
it does not create a run-applio.bat
lmk if anything explodes
xd I'll, thanks a lot !
@magic shadow I started training and the gpu uses 100% idk if its normal
yes, it is using gpu
ok thanks 🥲
as long as the shared memory is not used you should get decent performance
dont use default high batch size
I'm using a 16 batch size for like a 17min database
😕
XD
4-8
I looked on the discord and ppl said more shouldnt be an issue
so idk
should i restart at 8?
17 min at ~20 slices/min = 340 slices... ideally you want no less than 50 steps/epoch
try 4, or 6
see which results in a better sounding model
there's no rule set in stone what you should use
every dataset is different
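The batch-size arithmetic above can be sketched like this. It assumes one training step per batch with the last partial batch dropped (Applio's exact behaviour is not confirmed here) and the rough 20 slices/min figure quoted in the thread; both function names are made up for illustration.

```python
def steps_per_epoch(num_slices: int, batch_size: int) -> int:
    """Full batches per epoch (assumption: the partial last batch is dropped)."""
    return num_slices // batch_size


def max_batch_size(num_slices: int, min_steps: int = 50) -> int:
    """Largest batch size that still gives at least min_steps steps per epoch."""
    return max(1, num_slices // min_steps)


slices = 17 * 20  # 17 minutes at ~20 slices per minute = 340 slices

for bs in (4, 6, 8, 16):
    print(f"batch {bs}: {steps_per_epoch(slices, bs)} steps/epoch")
# batch 4: 85, batch 6: 56, batch 8: 42, batch 16: 21

print("largest batch with >= 50 steps/epoch:", max_batch_size(slices))  # 6
```

Under these assumptions a batch size of 16 gives only 21 steps per epoch, and 4 or 6 are the suggested sizes that keep it above 50.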
ok I'll try 4, thanks !
I don't really know what I did wrong but my graphs don't look like the ones in the documentation 🥲
uncheck both
dont bother with old loss charts, use new avg_50
show all the charts
not like this
collapse grads, expand loss_avg_50
like this
your kl chart is unusually high
idk what I've done
click this under each chart
anyway, I'd not expect anything good from this.. give a model from ~4.5k steps a test
it seems that you've resumed it with a different batch size
now ?
bcs I did not
what kind of voice is it in the dataset?
kl is supposed to go down under 1 pretty quickly
goku kid voice in french dub
may wanna figure what's wrong with the dataset
perhaps the dataset wasn't cleaned properly
Hi
Can I have an example of what the graphs should look like?
I followed everything from the documentation so idk
maybe I'll try with another dataset
ideally these 3 going down
and converging at some value
fm chart going down or staying more or less flat
but usually it goes up... as long as it does not rise by more than 1 per 10k steps it is usually fine
example
as for norm g, as long as it does not shoot over 1k it is fine
or batch size
I've tried w 6 today and the same dataset and my graphs look like this 🤣
smoothing is at 0 btw
something went wrong
oh :/
unfortunately the working method with Zluda requires a driver rollback to 25.4.1 or 25.3.1
I can try that
ok so I download applio again ?
hip sdk 6.1 or 5.7
no, you can keep current
you'll uninstall torch and other stuff if you follow it
?
that exploded with NaNs
make sure the driver did not automatically update to the latest
it did not
🤷♂️
latest version of applio doesnt have that issue unless you messed up the code
^
wow idk I'm on 3.2.9
I'll download it again and retry
well
is it all NaN loss values in the terminal window?
that issue looks like the one in the old mangio RVC
test | epoch=2 | step=206 | time=13:29:10 | training_speed=0:03:25 | lowest_value=34.37 (epoch 1 and step 7) | Number of epochs remaining for overtraining: g/total: 50 d/total: 100 | smoothed_loss_gen=34.370 | smoothed_loss_disc=nan
Saved model 'C:\Applio-3.2.9\logs\test\test_2e_206s_best_epoch.pth' (epoch 2 and step 206)
New best epoch 3 with smoothed loss_g 34.370 and loss_d nan
test | epoch=3 | step=309 | time=13:32:34 | training_speed=0:03:23 | lowest_value=34.37 (epoch 1 and step 7) | Number of epochs remaining for overtraining: g/total: 50 d/total: 100 | smoothed_loss_gen=34.370 | smoothed_loss_disc=nan
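A quick way to spot this kind of blow-up without reading every line is to scan the console output for NaN loss fields. This is a rough sketch that assumes the `smoothed_loss_gen=` / `smoothed_loss_disc=` field names seen in the log above; `find_nan_losses` is a made-up helper, not part of Applio.

```python
import math
import re

# Matches e.g. "smoothed_loss_gen=34.370" or "smoothed_loss_disc=nan".
LOSS_PATTERN = re.compile(r"(smoothed_loss_(?:gen|disc))=(nan|[-+]?\d+(?:\.\d+)?)")


def find_nan_losses(log_text: str) -> list:
    """Return the names of loss fields that are NaN anywhere in log_text."""
    bad = []
    for name, value in LOSS_PATTERN.findall(log_text):
        if math.isnan(float(value)) and name not in bad:
            bad.append(name)
    return bad


sample = "test | epoch=2 | step=206 | smoothed_loss_gen=34.370 | smoothed_loss_disc=nan"
print(find_nan_losses(sample))  # ['smoothed_loss_disc']
```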
I have no clue as I see you're using AMD gpu 
thanks for your help tho !
I've tested 6.2.4 install method on my spare pc
I know it works fine
I have 6.1.2
I’m down to try if I can
So I should go back to latest drivers ?
okay thanks !
in the patch file :
C:\Applio-3.2.9>rmdir /S /q zluda
The system cannot find the file specified.
idk if it's fine
maybe rm -rf zluda ?
okok perfect
it seems to work
thank you a lot !!
Should I use another Embedder Model because I want to train a French model ?
- default contentvec is fine
- only as an experiment to see whether it helps or not
where can I find a good French embedder model ?
there's new experimental spin that may improve some things
but you need a spin-trained pretrain as well
I don't know what it is
then use default one 🙂
I trained a first model on it. But when I try it, it's incomprehensible and there is some random breathing sounds
so I guessed it was because of that
any idea on how to fix this ?
show the model_info.json
did you use a pretrain?
{ "total_dataset_duration": "00:33:56", "total_seconds": 2036.0395624999999, "embedder_model": "contentvec", "speakers_id": 1 }
I don't think
if you did uncheck the [x] Pretrained setting for some reason, that may explain your results. Or if you're trying to run inference using a custom embedder.
hi ! sorry I was not at home so I didn’t try anything but what should I do ? Train again w the same settings ?
can you run tensorboard and show what the charts look like for the model you trained?
your model exploded again
I saw that but I tried the model before the explosion and it was incomprehensible
@obtuse gazelle lmk when you're online, I got a test to run
sorry I was on vacation
I’m here all day 🫡
remind me, what did we manage to install last time? was it 7800xt and zluda?
latest drivers and 6.2.4
and yes 7800xt
save these to Applio's folder, then run bench.bat
ok, I did it !
what should I do now?
copy the output
this ?
C:\Applio-3.2.9>zluda\zluda.exe -- env\python.exe bench.py
Using cuda
torch.float32
Compilation is in progress. Please wait...
linear : 0.0587s conv1d 192x192x1 : 0.2302s
Compilation is in progress. Please wait...
conv1d 192x768x3 : 0.3445s conv1d 768x768x1 : 1.1695s
Compilation is in progress. Please wait...
Compilation is in progress. Please wait...
up_0 : 0.4844s up_1 : 0.6375s up_2 : 0.4737s up_3 : 0.4587s
dn_0 : 0.2608s dn_1 : 0.2592s dn_2 : 0.2634s dn_3 : 0.2252s
res1a : 0.4191s res1b : 17.1694s res1c : 17.3982s
res2a : 0.9267s res2b : 17.1960s res2c : 17.7140s
res3a : 1.4904s res3b : 17.5465s res3c : 17.7366s
conv_post : 0.3962s
torch.float16
linear : 0.0472s conv1d 192x192x1 : 0.1530s conv1d 192x768x3 : 0.2621s conv1d 768x768x1 : 0.2638s
Compilation is in progress. Please wait...
Compilation is in progress. Please wait...
Compilation is in progress. Please wait...
up_0 : 0.4987s up_1 : 0.5005s up_2 : 0.4999s up_3 : 0.4968s
dn_0 : 0.2553s dn_1 : 0.2583s dn_2 : 0.2593s dn_3 : 0.2271s
res1a : 0.3422s res1b : 17.2472s res1c : 17.3245s
res2a : 0.3494s res2b : 17.6717s res2c : 17.2763s
res3a : 0.3481s res3b : 17.7725s res3c : 17.9646s
conv_post : 0.1600s
torch.bfloat16
linear : 0.0598s conv1d 192x192x1 : 0.1472s conv1d 192x768x3 : 0.2555s conv1d 768x768x1 : 0.2573s
up_0 : 0.4465s up_1 : 0.4557s up_2 : 0.4575s up_3 : 0.4576s
dn_0 : 0.2578s dn_1 : 0.2560s dn_2 : 0.2563s dn_3 : 0.2247s
res1a : 0.2942s res1b : 17.7791s res1c : 17.8637s
res2a : 0.3042s res2b : 17.7352s res2c : 16.6792s
res3a : 0.3150s res3b : 16.5893s res3c : 17.7460s
conv_post : 0.1592s
ah, run one more time so compilation times are not included
okok
torch.bfloat16
linear : 0.0232s conv1d 192x192x1 : 0.1586s conv1d 192x768x3 : 0.2567s conv1d 768x768x1 : 0.2564s
up_0 : 0.4270s up_1 : 0.4556s up_2 : 0.4543s up_3 : 0.4557s
dn_0 : 0.2546s dn_1 : 0.2567s dn_2 : 0.2561s dn_3 : 0.2279s
res1a : 0.3000s res1b : 17.4343s res1c : 17.4720s
res2a : 0.3019s res2b : 17.6095s res2c : 17.9372s
res3a : 0.3191s res3b : 17.7246s res3c : 17.8517s
conv_post : 0.1583s
yeah same issue :/
can I fix it ?
download it
unzip
then move all the folders into C:\program files\amd\rocm\6.2
overwriting when asked
ok done
then in rvc\lib\zluda.py add 1==2 and
if 1==2 and torch.cuda.is_available()
err
hold on
oh
okay
run the bench
i replace it here ? if torch.cuda.is_available() and torch.cuda.get_device_name().endswith("[ZLUDA]"):
no, just add 1==2 and after if
okay
to disable this check and run with cudnn
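Why prefixing the condition with `1==2 and` switches the check off: Python evaluates `and` left to right and stops at the first false operand, so nothing after `1==2` is ever evaluated. A tiny sketch of that behaviour; `zluda_detected` is a hypothetical stand-in for the real `torch.cuda...` test in rvc\lib\zluda.py.

```python
def zluda_detected() -> bool:
    # Stand-in for: torch.cuda.is_available() and
    # torch.cuda.get_device_name().endswith("[ZLUDA]")
    raise RuntimeError("never reached while the check is disabled")


applied = False

# 1 == 2 is False, so `and` short-circuits and zluda_detected() is not called.
if 1 == 2 and zluda_detected():
    applied = True  # the ZLUDA-specific setup guarded by the check would run here

print(applied)  # False
```

Removing `1==2 and` again restores the original behaviour, which makes it an easy toggle for comparing bench runs.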
it's compiling again
Compilation is in progress. Please wait...
Traceback (most recent call last):
  File "C:\Applio-3.2.9\bench.py", line 75, in <module>
    t = benchmark_op(layer.to(dtype), x.to(dtype))
  File "C:\Applio-3.2.9\bench.py", line 23, in benchmark_op
    _ = op(x)
  File "C:\Applio-3.2.9\env\lib\site-packages\torch\nn\modules\module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Applio-3.2.9\env\lib\site-packages\torch\nn\modules\module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Applio-3.2.9\env\lib\site-packages\torch\nn\modules\conv.py", line 974, in forward
    return F.conv_transpose1d(
RuntimeError: GET was unable to find an engine to execute this computation
goddamit
🤣
I've seen this before
I trust you 🫡
oh
download the HIP SDK develop
it contains 6.5 folder
that goes to program files/amd/rocm
next to 6.2
change environment variables Path entry to point to 6.5/bin
unzip to C:\
it makes Applio-main folder, run run-install.bat
while it is installing download torch and torchaudio wheels from the link above into Applio's folder
into the main folder ?
new applio-main
while run-install.bat is running and doing its stuff
so you dont sit and stare at it for 5 minutes
okok
done 🫡
so run-install is done and you've downloaded the .whl files?
yes
and unzipped hip sdk develop into rocm folder
and changed the path
okay, open cmd in Applio-main
env\python -m pip install torch-, press tab to auto-complete, enter
env\python -m pip install torchaudio-, press tab to auto-complete, enter
env\python bench.py
yes, copy it over
if it runs and does not blow up and numbers are not in 5s+ it should be good
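To check the "not in 5s+" rule without eyeballing the output, the bench text can be parsed and filtered. This sketch assumes the `name : 0.1234s` format seen in the pasted results; `slow_ops` is a made-up helper and the 5-second threshold is just the rule of thumb from the message above.

```python
import re

# Matches "res1b : 17.4343s" as well as "conv1d 192x768x3 : 0.2567s".
BENCH_ENTRY = re.compile(r"(conv1d \S+|\w+)\s*:\s*(\d+\.\d+)s")


def slow_ops(bench_output: str, threshold_s: float = 5.0) -> dict:
    """Map op name -> seconds for every entry at or above threshold_s.

    If the same op appears for several dtypes, the last value wins.
    """
    return {
        name: float(seconds)
        for name, seconds in BENCH_ENTRY.findall(bench_output)
        if float(seconds) >= threshold_s
    }


sample = "res1a : 0.3000s res1b : 17.4343s res1c : 17.4720s conv_post : 0.1583s"
print(slow_ops(sample))  # {'res1b': 17.4343, 'res1c': 17.472}
```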
xd
C:\Applio-main>env\python bench.py
Traceback (most recent call last):
  File "C:\Applio-main\bench.py", line 1, in <module>
    import torch
  File "C:\Applio-main\env\Lib\site-packages\torch\__init__.py", line 274, in <module>
    _load_dll_libraries()
  File "C:\Applio-main\env\Lib\site-packages\torch\__init__.py", line 257, in _load_dll_libraries
    raise err
OSError: [WinError 127] The specified procedure could not be found. Error loading "C:\Applio-main\env\Lib\site-packages\torch\lib\torch_hip.dll" or one of its dependencies.
these red flagged files should be in 6.5/bin
that's all I got
they are mb
could you run where hipconfig
in the bin?
no, from a cmd window
okok
a new cmd window in applio-main
C:\Program Files\AMD\ROCm\6.5\bin\hipconfig.bat
C:\Program Files\AMD\ROCm\6.5\bin\hipconfig.exe
xd no I did it in a new cmd
C:\Applio-main>env\python bench.py
Traceback (most recent call last):
  File "C:\Applio-main\bench.py", line 1, in <module>
    import torch
  File "C:\Applio-main\env\Lib\site-packages\torch\__init__.py", line 274, in <module>
    _load_dll_libraries()
  File "C:\Applio-main\env\Lib\site-packages\torch\__init__.py", line 257, in _load_dll_libraries
    raise err
OSError: [WinError 127] The specified procedure could not be found. Error loading "C:\Applio-main\env\Lib\site-packages\torch\lib\torch_hip.dll" or one of its dependencies.
same error
well, open a new cmd in applio-main
🥺
done
file, open, C:\Applio-main\env\Lib\site-packages\torch\lib\torch_hip.dll
screenshot
no, the top left corner
oh
weirder and weirder
this is crazy 🤣
try expanding those files at the top
click >
see if anything red shows under them
you can also try opening C:\Applio-main\env\Lib\site-packages\torch\__init__.py in notepad, find 0x0000 and change it to 0x0001
prev_error_mode = kernel32.SetErrorMode(0x0001)
oops, change it to 0x0000
save, then run the bench again
it should pop up a window to say what it could not find
too many files to expand
i did not say expand all files, just 1st level
okay, lets try the init edit
change prev_error_mode = kernel32.SetErrorMode(0x0001) to prev_error_mode = kernel32.SetErrorMode(0x0000)
okay
could you try setting HIP_PATH env variable
and point it to program files/amd/rocm/6.5
(not bin)
open new cmd after that and run the bench again
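A small sanity check for the two environment settings discussed here, HIP_PATH pointing at the ROCm version folder (not bin) and PATH containing that folder's bin. This is only a sketch: `check_hip_env` is a made-up helper, and it uses `PureWindowsPath` so Windows paths compare case-insensitively on any machine.

```python
import os
from pathlib import PureWindowsPath


def check_hip_env(env: dict) -> list:
    """Return a list of human-readable problems; an empty list means it looks OK."""
    hip_path = env.get("HIP_PATH", "")
    if not hip_path:
        return ["HIP_PATH is not set"]

    problems = []
    root = PureWindowsPath(hip_path)
    if root.name.lower() == "bin":
        problems.append("HIP_PATH should point to the ROCm version folder, not bin")

    # Windows separates PATH entries with ';'.
    entries = [PureWindowsPath(p) for p in env.get("PATH", "").split(";") if p]
    if root / "bin" not in entries:
        problems.append(f"PATH has no entry for {root / 'bin'}")
    return problems


# On the actual machine, run from a freshly opened cmd window:
print(check_hip_env(dict(os.environ)))
```

Environment variable changes only reach newly started processes, which is why a new cmd window is needed after each change.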
how are the numbers?
xd
res1a : 0.2208s
MIOpen(HIP): Warning [IsEnoughWorkspace] [GetSolutionsFallback WTI] Solver <GemmFwdRest>, workspace required: 1228800, provided ptr: 0000000000000000 size: 0
MIOpen(HIP): Warning [IsEnoughWorkspace] [EvaluateInvokers] Solver <GemmFwdRest>, workspace required: 1228800, provided ptr: 0000000000000000 size: 0
res1b : 9.5653s
MIOpen(HIP): Warning [IsEnoughWorkspace] [GetSolutionsFallback WTI] Solver <GemmFwdRest>, workspace required: 1228800, provided ptr: 0000000000000000 size: 0
MIOpen(HIP): Warning [IsEnoughWorkspace] [EvaluateInvokers] Solver <GemmFwdRest>, workspace required: 1228800, provided ptr: 0000000000000000 size: 0
res1c : 9.5043s
res2a : 0.5534s
MIOpen(HIP): Warning [IsEnoughWorkspace] [GetSolutionsFallback WTI] Solver <GemmFwdRest>, workspace required: 2867200, provided ptr: 0000000000000000 size: 0
MIOpen(HIP): Warning [IsEnoughWorkspace] [EvaluateInvokers] Solver <GemmFwdRest>, workspace required: 2867200, provided ptr: 0000000000000000 size: 0
res2b : 21.4636s
MIOpen(HIP): Warning [IsEnoughWorkspace] [GetSolutionsFallback WTI] Solver <GemmFwdRest>, workspace required: 2867200, provided ptr: 0000000000000000 size: 0
MIOpen(HIP): Warning [IsEnoughWorkspace] [EvaluateInvokers] Solver <GemmFwdRest>, workspace required: 2867200, provided ptr: 0000000000000000 size: 0
xddd
go back to zluda lol
change the env variables back to 6.2
and delete 6.5, that's 15GB of junk
so I can't do anything? :(
other option is to try the nightly zluda build
but again it is another experimental thing
HIP SDK extension: https://drive.google.com/file/d/1Gvg3hxNEj2Vsd2nQgwadrUEY6dYXy0H9/view?usp=sharing
this unzips into rocm/6.2 folder
it should overwrite some stuff
you know what
lets not do that
oh
i wanna do something better with my friday afternoon
you'll need to re-do the 6.2.4 install, but use this file instead
edit patch zluda and put this file instead of the regular zluda rocm6
if you dont want to ruin your current install, unzip the compiled version of applio somewhere else again
I didn't really understand what I have to do w that
but answer when u can dw !! I don't want to ruin ur afternoon
patch zluda 62.bat has curl -s -L https://github.com/lshqqytiger/ZLUDA/releases/download/rel.5e717459179dc272b7d7d23391f0fad66c7459cf/ZLUDA-windows-rocm6-amd64.zip > zluda.zip
replace it with the nightly
did you uninstall/reinstall torch cu118?
I did
then patched it using updated nightly zluda?
yes
and hip extensions unzipped to the rocm/6.2 folder with overwriting some of the stuff there?
yes
okay, you can try running the bench
one with 1==2 and one without
to see if that makes a difference
zluda was extracted in main and not in a folder
C:\Applio-3.2.9>zluda\zluda.exe -- env\python.exe bench.py
The system cannot find the path specified.
so it says that
defender I think
yes, for some reason it does not have zluda folder in it
if you ran the patch already, you need to re-do torch again
since an important dll was moved
okok
okay, so make sure the content of nightly is manually unzipped into zluda folder
in patch zluda delete lines
curl -s -L https://github.com/lshqqytiger/ZLUDA/releases/download/rel.5e717459179dc272b7d7d23391f0fad66c7459cf/ZLUDA-windows-rocm6-amd64.zip > zluda.zip
tar -xf zluda.zip
del zluda.zip
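Since the nightly archive apparently has no top-level zluda folder, the manual unzip has to target that folder explicitly. A hedged Python sketch of the step, equivalent to unzipping by hand; `extract_into` and the archive name `zluda-nightly.zip` are hypothetical, and the demo uses a throwaway archive rather than the real download.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_into(zip_path, target_dir) -> list:
    """Extract zip_path into target_dir (created if missing); return top-level names."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return sorted(p.name for p in target.iterdir())


# Demo with a throwaway archive standing in for the nightly zip.
with tempfile.TemporaryDirectory() as tmp:
    archive_path = Path(tmp) / "zluda-nightly.zip"  # hypothetical file name
    with zipfile.ZipFile(archive_path, "w") as archive:
        archive.writestr("zluda.exe", b"placeholder")
    demo_names = extract_into(archive_path, Path(tmp) / "zluda")

print(demo_names)  # ['zluda.exe']
```

On the real install the call would look something like `extract_into("zluda-nightly.zip", r"C:\Applio-3.2.9\zluda")`, with the file name adjusted to whatever the download is called.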
well well well
torch.float16
linear : 0.0290s conv1d 192x192x1 : 0.2268s conv1d 192x768x3 : 0.2659s conv1d 768x768x1 : 0.2734s
up_0 : 0.5102s up_1 : 0.5176s up_2 : 0.5192s up_3 : 0.5293s
dn_0 : 0.2644s dn_1 : 0.2683s dn_2 : 0.2643s dn_3 : 0.2320s
res1a : 0.3522s res1b : 17.8821s res1c : 17.8233s
res2a : 0.3527s res2b : 18.0610s res2c : 18.2808s
res3a : 0.3554s res3b : 18.1599s res3c : 18.2609s
conv_post : 0.1569s
this is with 1==2?
without
so?
torch.float16
linear : 0.0693s conv1d 192x192x1 : 0.1666s conv1d 192x768x3 : 0.2663s conv1d 768x768x1 : 0.2680s
up_0 : 0.5272s up_1 : 0.5275s up_2 : 0.5016s up_3 : 0.5246s
dn_0 : 0.2676s dn_1 : 0.2680s dn_2 : 0.2690s dn_3 : 0.2374s
res1a : 0.3558s res1b : 17.8415s res1c : 18.1317s
res2a : 0.3602s res2b : 18.1202s res2c : 18.1842s
res3a : 0.3532s res3b : 18.2191s res3c : 18.3551s
conv_post : 0.1692s
ffs
@magic shadow nothing new?
nope, also they have some issue with nightly wheels
latest wheels are torch-2.9.0a0+rocm7.0.0rc20250826-cp311-cp311-win_amd64.whl torchaudio-2.8.0a0+rocm7.0.0rc20250826-cp311-cp311-win_amd64.whl
from a week ago
there are newer wheels posted to https://rocm.nightlies.amd.com/v2/gfx110X-dgpu/
env\python -m pip install torch torchaudio --index-url https://rocm.nightlies.amd.com/v2/gfx110X-dgpu --upgrade
from applio folder
from a fresh applio ?
--index-url option requires 1 argument
works for me
if you get 'requirement already satisfied', try env\python -m pip uninstall torch torchaudio torchvision first
you can give it a try
but i would not expect much.. they have not updated my ticket with any resolution
Applio should work without (or with very minimal changes), they are still fixing other things
Failed to load amdhip64.dll: amdhip64.dll: Can't open: The specified module could not be found. (0x7E) [ERROR] amdgpu-arch failed with return code 1 [stderr]
bigger error
in the 3.4.0 file, there is no more amd installation?
??
you download the compiled v3.4.0, unzip, uninstall torch torchaudio torchvision
install torch from nightly
that should be it
after that you can run the bench
env\python bench.py
show the full error
you may be missing this file in C:\windows\system32
have you used dependency checker before?
I think so
so I add it ?
you can check this file
but most likely it will be the .dll I gave you
just save it either in the same folder as shm.dll or into windows/system32
also
check windows/system32 folder
there should be a file named amdhip64_number.dll
6 or 7
make a copy of it without _number
that should fix
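The copy step can be sketched like this. `copy_versioned_hip_dll` is a made-up helper; it assumes the versioned file matches `amdhip64_<digits>.dll` as described above, and the demo runs in a temporary folder. Doing this in C:\Windows\System32 for real would need an elevated (administrator) prompt.

```python
import re
import shutil
import tempfile
from pathlib import Path
from typing import Optional


def copy_versioned_hip_dll(folder) -> Optional[Path]:
    """Copy amdhip64_<N>.dll to amdhip64.dll inside folder, if not already present."""
    folder = Path(folder)
    target = folder / "amdhip64.dll"
    if target.exists():
        return target
    candidates = sorted(
        p for p in folder.glob("amdhip64_*.dll")
        if re.fullmatch(r"amdhip64_\d+\.dll", p.name, re.IGNORECASE)
    )
    if not candidates:
        return None
    shutil.copy2(candidates[-1], target)  # highest-sorted name, e.g. _7 over _6
    return target


# Demo in a temporary folder with a placeholder file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "amdhip64_6.dll").write_bytes(b"placeholder")
    result = copy_versioned_hip_dll(tmp)
    demo_ok = result is not None and result.name == "amdhip64.dll" and result.exists()

print(demo_ok)  # True
```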
as I see from the bench screenshot they have not fixed the res b/c yet
9 seconds is 10x slower than expected
but anyway, give training a try
may need to fix a few things manually
in the same folder as shm?
c:\windows\system32
so I copy it in system 32?
if training explodes, delete
```
dist.init_process_group(
    backend="gloo" if sys.platform == "win32" or device.type != "cuda" else "nccl",
    init_method="env://",
    world_size=n_gpus if device.type == "cuda" else 1,
    rank=rank if device.type == "cuda" else 0,
)
```
yes
and the other possible fix needed is removing @torch.jit.script from rvc\lib\algorithm\commons.py
(they should fix this soon enough)
okok I'll try a training later and I'll tell u
@magic shadow When I start training :
AttributeError: module 'torch.distributed' has no attribute 'init_process_group'
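This AttributeError is what a torch build without distributed support looks like: the `torch.distributed` module imports, but `init_process_group` is missing and `torch.distributed.is_available()` returns False. Instead of deleting the call, it could be guarded. A sketch with a made-up helper, exercised here with stub objects rather than torch itself:

```python
from types import SimpleNamespace


def can_init_process_group(dist) -> bool:
    """True only if the module exposes init_process_group and reports availability."""
    if not hasattr(dist, "init_process_group"):
        return False
    is_available = getattr(dist, "is_available", None)
    return bool(is_available()) if callable(is_available) else False


# Stubs standing in for torch.distributed on two kinds of builds.
stripped = SimpleNamespace(is_available=lambda: False)
full = SimpleNamespace(is_available=lambda: True, init_process_group=lambda **kw: None)

print(can_init_process_group(stripped), can_init_process_group(full))  # False True
```

In the training script this would be used as `import torch.distributed as dist` followed by `if can_init_process_group(dist): dist.init_process_group(...)`. Whether Applio's single-GPU training then runs cleanly with the call skipped is not confirmed in this thread.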
devs have reproduced the slowness with res b/c tests, so hopefully going to make some workaround
the dist.init_process_group should've been fixed partially(?) with latest
I hope so
could you try downloading the latest applio, then installing the latest nightly wheels again and try inference?
or training with bf16 precision
@obtuse gazelle
where do I find the latest nightly wheels ?
res1b : 9.5445s
MIOpen(HIP): Warning [IsEnoughWorkspace] [GetSolutionsFallback WTI] Solver <GemmFwdRest>, workspace required: 1228800, provided ptr: 0000000000000000 size: 0
MIOpen(HIP): Warning [IsEnoughWorkspace] [EvaluateInvokers] Solver <GemmFwdRest>, workspace required: 1228800, provided ptr: 0000000000000000 size: 0
res1c : 9.4794s
res2a : 0.6199s
MIOpen(HIP): Warning [IsEnoughWorkspace] [GetSolutionsFallback WTI] Solver <GemmFwdRest>, workspace required: 2867200, provided ptr: 0000000000000000 size: 0
MIOpen(HIP): Warning [IsEnoughWorkspace] [EvaluateInvokers] Solver <GemmFwdRest>, workspace required: 2867200, provided ptr: 0000000000000000 size: 0
res2b : 21.3799s
MIOpen(HIP): Warning [IsEnoughWorkspace] [GetSolutionsFallback WTI] Solver <GemmFwdRest>, workspace required: 2867200, provided ptr: 0000000000000000 size: 0
MIOpen(HIP): Warning [IsEnoughWorkspace] [EvaluateInvokers] Solver <GemmFwdRest>, workspace required: 2867200, provided ptr: 0000000000000000 size: 0
res2c : 21.3607s
@magic shadow still no fix ?
there are newer wheels
should I try?
you can give it a try and report back
ok thanks, I'll try tomorrow
pip install --index-url https://rocm.nightlies.amd.com/v2/gfx120X-all/ --pre torch==2.9.1+rocm7.11.0a20251223
you can test it in a separate virtual environment