#SDNext WebUI on Intel ARC
yup
But it could be the commit though
they changed some stuff about memory etc.
likely still working on it so we are getting the pure upstream when compiling, some stuff might be worse atm
ill patiently wait them to fix it then
though im sure a lot of ppl out there are keen to try SD in native windows
if I am up to it....I may try and compile another day...nah, this is fine. Diffusers seems to work well, just the vae's don't show up
I wonder if they have a diffuser version for some of these vae's?
the models are the same
just the pipelines are written differently
afaik
so some samplers and plugins you cannot use when choosing diffusers backend
I think it may be safetensors, some are still ckpt and pt files
makes sense.
Hm so linux still quite a bit faster?
Diffuser backend is just as fast, and if you use the prebuilt wheels it's fast but you have to wait 10-15 minutes to start.
I'm still stuck with the same error I had before, as I genuinely have no clue how to fix it.
Vipitis seemed to get it to work with just having the conda installed and not deleting python 3. See if he can help you out. A theory I had for why it didn't work for me is i couldnt update python to the latest version and could only use 3.10.6 while conda is newer. 🤷♂️
When in WSL, Error 139. When I tried windows, it was a dll error.
no clue about WSL
I got ipex to work for CausalLM inference today but the JIT delay is horrible
I added a wheel up a few posts, requires one file edit in the instructions.
I might try that on Friday. Got a busy day tomorrow. And really need to get to sleep this time
Was up past 5 am the last few days
alarm at 7.30 which is in almost 6 hours
They are compiling from a branch and not xpu master, so xpu master adds a file we dont need that causes an error since it doesn't exist in torch
it's all about activating the oneAPI environment with the setvars script.
No doubt, I have another wheel that doesn't need an edit, I may upload it
I did the ipex webinar today and it was completely useless. They just talked about CPU stuff and a hyperparameter searching script they implemented.
If you want to compile yourself, change xpu master to the xpu 2.0 branch in the bat file
no useful information for GPU/xpu and my questions didn't get real answers either.
You can't save the JIT kernels to use them again or in other processes is what they confirmed to me.
Which I already did before starting.
That, and MKL + DPC++
@grave condor#0 what version of python do you have?
3.9.4 I believe
don't use --use-ipex if you don't want to troubleshoot ipexrun
That may be the reason it wouldn't run right with python 3.10.6, I got the same error dan does. I had to delete it and add conda to path (doesn't matter for me since I don't do any real programming)
does ipexrun also work for xpu? I might need it for accelerate launch
xpu mode is available but it's slower than no ipexrun at all
cpu mode is the fastest
eh, I will try accelerate launch for the eval script, I believe my accelerate config got the xpu registered.
accelerate has --use-xpu cmd arg
I am using accelerate.Accelerator().device right now for a simple device-agnostic implementation
haven't tried it on all three options tho. But wanted to before I push it
Got it working again.
On WSL*
My only issue now
Is that line that keeps showing up
That line is a weird issue that happens only at 1024x1024
Diffusers with attention slicing turned on doesn't have that issue
I have attention slicing on and that is happening.
Attention slicing off = Scaled Dot Product
Attention slicing on = Diffusers
Try turning it off and reload the model
It's a weird issue and it doesn't go away without a complete restart
And it only happens at 1024x1024
768x1024 or 1024x1536 don't have this issue
Try 1080x1080
This happens in all models on IPEX with 1024x1024
I couldn't find why this happens exactly at 1024x1024
I cannot do 1920x1080
This needs 12 GB
No offload, all move options are on, VAE slicing and VAE tiling is on
And not attention slicing?
Attention slicing off = Scaled Dot Product
And this is without LORA right?
Without
Added a patch to dynamically slice it to keep under 4GB
https://github.com/vladmandic/automatic/commit/9d17cf4c122b98b25b8cb9e3388c1a75df68cdb2
Do you use these?
Like this
NO
Alright.
They are FP32
Well I don't use them.
I was just wondering is all.
Didn't know they were FP32.
I get an out of resources error when trying to generate with those diffuser settings.
No refiner, just a pixel art LORA.
Your RAM usage is 15 GB but it still runs out of RAM?
WSL is set to a ram limit of 24GB with a swap of 40GB.
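For anyone wanting to reproduce that limit: a minimal sketch of the matching `.wslconfig`, assuming the standard `[wsl2]` section with its `memory` and `swap` keys. The scratch directory is only there so the example doesn't overwrite a real config; `out_dir` is a made-up name.

```shell
# Write a sample .wslconfig to a scratch directory so nothing real is overwritten.
# The real file lives at %UserProfile%\.wslconfig on the Windows side.
out_dir="$(mktemp -d)"
cat > "$out_dir/.wslconfig" <<'EOF'
[wsl2]
memory=24GB
swap=40GB
EOF
# Show what was written.
cat "$out_dir/.wslconfig"
```

After editing the real file, WSL has to be stopped with `wsl --shutdown` from Windows before the new limits apply.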
When fully loaded it looks like this.
Wait, your GPU dies before it runs out of resources
Device Not Found error when trying to load a Lora
Let me try running a 1920x1080 image without a LORA.
Generation was a success.
Something's wrong with LORAs.
Lora support is still experimental in diffusers
That's very disappointing.
Yep, I can do 1920x1080 with the refiner included.
I really wanted to use loras, though.
Try lower res
1024x1536 is pretty stable
Didn't work. Out of resources.
I don't even think it will generate a 1024x1024 image with a LORA.
Nope.
Only way to do it is through model CPU offload.
Yep. With model CPU offload I can do 1080x1080 images with loras enabled.
my linux setup is so borked I get "out of resources" at anything above 1024, following SDXL optimization guide. 64gb sys ram, a770 LE
+nothing looks good, so i'm gonna have to review all of my setup 
fixed my linux setup
except for hires fix is not working at all, but hopefully more stuff will get updated to diffusers backend and then i won't have to
if I could do training on my card, I could just make my own loras...

thas prolly not good right
You renamed it according to the guide, I assume.
Rename it back to that, and set the VAE loading precision to be FP32.
so the instructions have changed slightly since then?
ahhh shoot, getting black images with SDXL again
or maybe its only CounterfeitXL getting black images
refiner isn't working at all though...
refiner: disabled
tried rebooting a few times
huh
you need to use fp16 fixed vae for counterfeitxl
the baked in vae never worked for me
also set precision to fp16
there is no need to upcast
yeah i'm using the fixed vae and precision type BF16
i tried fp16 and it didn't work any differently
then i have no idea why
specifically i didnt set anything about precision
so bf16 may or may not work, probably disty knows better
I need to also ask disty about xpu-smi
I can't get it to output gpu stats
the whole output is blank except for frequency and power
You have to run it with sudo or root.
Not at computer, but are you running it with the watch command that pings it every few seconds? I found that it would kinda glitch out a bit at first before it started working.
I am not, how do I use that?
It's something like this, I think: $ watch -n <interval> <command>
Replace <interval> with the time interval, in seconds, at which you want the command to repeat. Replace <command> with the command you want to repeat.
For example, to run the top command every 5 seconds, type the following command: $ watch -n 5 top
$ watch -n 1 <command> (or however many seconds you want between runs). I don't know it by heart, I will look it up real quick
Dunno how all that extra stuff added to my post, but I think thats it
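A rough sketch of what `watch -n` is doing, in case it helps. `repeat_cmd` is a made-up helper, not part of any tool, and the `xpu-smi stats -d 0` line in the comment assumes the card is device 0.

```shell
# Repeat a command a fixed number of times, pausing between runs.
# This mimics `watch -n <interval> <command>` without taking over the terminal.
# repeat_cmd is a hypothetical helper name used only for this example.
repeat_cmd() {
    interval="$1"
    count="$2"
    shift 2
    i=0
    while [ "$i" -lt "$count" ]; do
        "$@"
        i=$((i + 1))
        # Skip the pause after the final run.
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}

# Interactive equivalent, refreshing every 2 seconds (needs root, as noted above):
#   sudo watch -n 2 xpu-smi stats -d 0
```

Something like `repeat_cmd 2 5 date` would print the date five times, two seconds apart.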
oh that's really neat, thank!
No problem
I still don't get utilization, but at least its eventually giving me memory use
Yeah, it seems a bit glitchy. Sometimes making the prompt window bigger fixes it lol
oh yay, i'm starting to get stuff 
oh, I wasn't expecting to see the line on Original backend with an older model
and now I'm back where I was before, can't render using the old backend

and i'm going blind from a migraine, so I guess i'll try more another day
I thought it started working on its own but something is very wrong 
Use FP16 precision.
Use 1080x1080 resolution. Do not use 1024.
Set channelslast as well
This genuinely seems like the best overall setup on WSL
Isn't the current sd xl implementation flawed in the web ui? At least I saw an open issue working on something to do with it.
Some things just don't work.
LORAs for example do not work with sequential CPU offload, and will not run without model CPU offload.
Well I'm using vladmandic's automatic webUI
Not that.
A different fork entirely
It's quite different, though. Vladmandic's fork has 1500 commits ahead, 1200 commits behind.
Yeah but did he make the sd xl support or move it from main
I don't think it came from the original branch
All I see is comfy ui is better for some reason
Voxel and pixel art loras are great, man.
Well I cant mess with this for a month or so so hopefully everything is sorted by then
In native windows I have noticed that some samplers glitch in diffusers (so far with sd1.5). UniPC doesn't work at all. Euler doesn't work with standard and Euler a with diffusers. Maybe try messing with samplers, might not be the same issue though. Haven't gotten any lines yet with sdxl, but I am using cpu model offload and have all the ipex optimizations enabled as they seem to help in windows.
SDNext is a complete rewrite at this point
No
A1111 was 3 weeks late and A1111's SDXL implementation is terrible
Sdnext was vlad diffusion which was originally a fork but changed the name because it became much different. Sdnext is better updated and keeps GPUs other than Nvidia in mind. Most extensions will work with both though, and there is a refiner extension but I have never tried it.
Works on the dev branch with Attention Slicing turned on and no offload.
How do I git checkout the dev branch?
What's the URL?
Akane lora, 2048x3072
git checkout dev, but it will be merged soon anyway
Very good.
One thing I dont like, is sdnext disables control net when starting with sdxl, which sucks when switching back and forth to sd1.5.
This is not something that happened to me.
It stayed enabled even when I swapped to diffusers.
It disables when starting the ui
It will get disabled if you restart
Waiting for them to add the controlnet sdxl models but seems they are too big
would I just do ./webui.sh --use-ipex --upgrade --reinstall
do a git pull and this should be fine
An amazing model.
So far I've been exceedingly impressed with it.
just seems to be an ass to run
Does the dev branch resolve the 1024x1024 resolution issue btw?
hopefully that doesn't dampen community engagement
That was an issue for a loong time
I couldn't find a fix for that
Why does 1024x1024 specifically cause artifacting, though?
That's what I don't understand.
Same thing happens on original backend too
Weird.
Swapped to the dev branch, enabled sequential CPU offload
disabled model CPU offload
put on a pixel art LORA.
Black images.
🤷♂️
I have fix for that
Don't use any offloading in the meantime
Alright.
Cannot copy out of meta tensor; no data!
Why am I getting meta tensor errors now?
I had this before, too. Not sure what caused it.
Can you not use sequential CPU offload combined with the sequential apply for LORAs?
@proper cradle Even with sequential CPU offload off, and LORA set to diffusers default
I'm getting meta tensor errors.
```
11:10:51-849166 ERROR Arguments: args=('task(zf6p5v3burvufxj)', '', '', [], 20, 3, 0, True, False, False, 1, 1, 6, 6,
0.7, 1, -1.0, -1.0, 0, 0, 0, 1080, 1080, False, 0.3, 2, 'Latent', 20, 0, 0, 0.8, '', '', [], 0,
False, False, 'positive', 'comma', 0, False, False, '', 0, '', [], 0, '', [], 0, '', [], True,
False, False, False, 0, False) kwargs={}
11:10:51-850403 ERROR gradio call: NotImplementedError
```
Had to fully shutdown and restart WSL in order for it to generate an image with all offload types disabled.
Well model CPU offload + sequential apply LORA works.
Yeah, meta tensor errors for sequential only.
Did a git pull, states I'm already up to date
So no clue 🤷♂️
```
dan9070@dbs580:~/automatic$ git pull
Already up to date.
dan9070@dbs580:~/automatic$ git branch
* dev
dan9070@dbs580:~/automatic$
```
git checkout origin/master
git branch -d dev
git checkout origin/dev
git pull
HEAD is now at 417ef540 Merge pull request #1971 from Aptronymist/master
dan9070@dbs580:~/automatic$ git branch -d dev
warning: deleting branch 'dev' that has been merged to
'refs/remotes/origin/dev', but not yet merged to HEAD.
Deleted branch dev (was 0a7105d5).
dan9070@dbs580:~/automatic$ git checkout origin/dev
Previous HEAD position was 417ef540 Merge pull request #1971 from Aptronymist/master
HEAD is now at 0a7105d5 Fix SDXL LoRa offloading and SD 1.5 parsing
dan9070@dbs580:~/automatic$ git pull
You are not currently on a branch.
Please specify which branch you want to merge with.
See git-pull(1) for details.
git pull <remote> <branch>
dan9070@dbs580:~/automatic$ git branch
* (HEAD detached at origin/dev)
dan9070@dbs580:~/automatic$ git pull
That good?
That just means you're not on the main branch iirc. You will need to switch back to update later though, I think.
I'm using the dev branch to utilize Sequential CPU offload with Sequential LORA.
Sequential stopped working for me in windows native version, haven't tried latest commit though.
But model offload runs well. Sequential is slower now anyway
Sequential is meant to be slower.
Lol
Apologies if I misunderstand git a bit, as I'm not usually one to mess heavily into repositories.
@proper cradle Is * (HEAD detached at origin/dev) correct for git branch?
```
git checkout dev
Switched to a new branch 'dev'
```
```
git pull
Already up to date.
```
```
git status
On branch dev
Your branch is up to date with 'origin/dev'.
nothing to commit, working tree clean
```
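To make the detached HEAD part less mysterious, here is a throwaway sketch that rebuilds the situation in a scratch directory. The repo layout and the `demo` identity are invented for the example; only the `git checkout` behaviour is the point.

```shell
set -e
work="$(mktemp -d)"

# Build a tiny stand-in "remote" with master and dev branches.
git init -q "$work/remote"
cd "$work/remote"
git config user.email demo@example.com
git config user.name demo
git commit -q --allow-empty -m "initial"
git branch -M master
git branch dev

# Clone it; only master exists as a local branch at first.
git clone -q "$work/remote" "$work/clone"
cd "$work/clone"

# Checking out the remote ref directly leaves HEAD detached,
# so a plain `git pull` has no branch to merge into.
git checkout -q origin/dev
detached_state="$(git rev-parse --abbrev-ref HEAD)"

# Checking out the bare branch name creates a local branch tracking origin/dev.
git checkout -q dev
current_branch="$(git rev-parse --abbrev-ref HEAD)"
upstream="$(git rev-parse --abbrev-ref --symbolic-full-name '@{u}')"
echo "$detached_state -> $current_branch (tracks $upstream)"
```

So `git checkout origin/dev` is fine for looking at the code, but `git checkout dev` is what lets a later `git pull` work without extra arguments.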
Identical settings to mine
```
updated: 2023-08-10
hash: 0a7105d5
url: https://github.com/vladmandic/automatic/tree/dev
```
What lora are you using?
Kurokawa Akane Lora:
Prompt: (masterpiece, best quality, highres, anime, pixiv), (1girl, kurokawa akane, blue hair, green eyes, medium hair, gradient hair, solo, full body, standing on an abstract water), (bloom, swirling lights, light particles, detailed, 8k), <lora:training_model:1.0>
Negative prompt: (worst quality, low quality:1.4, lowres, blurry), (3d, interlocked fingers, loli, 2girls),
Steps: 40 | Seed: 4107994374 | Sampler: Euler a | CFG scale: 10 | Size: 1024x1536 | Parser: Full parser | Model: SDXL_astreapixieXLAnime_v16 | Model hash: 432e15eb | VAE: sdxl-vae-fp16-fix | Version: 0a7105d | Pipeline: Diffusers | Operations: txt2img | Lora hashes: "training_model: efe6c5dadf89"
Time taken: 1m 46.25s |
GPU active 1016 MB reserved 1358 MB | System peak 341 MB total 16288 MB
Just a question.
The moment I swapped to this branch, it stopped detecting all of my SDXL safetensors.
The only one is 1.5.
Refreshing does nothing.
Does it have the correct perms?
I don't know what perms would've changed from the previous repo.
It detects only SD 1.5.
Run ls -lh in that folder (Inside WSL)
-rw-r--r-- 1 dan9070 dan9070 6.5G Aug 9 20:58 dreamshaperXL10_alpha2Xl10.safetensors
-rw-r--r-- 1 dan9070 dan9070 6.5G Aug 9 20:52 sd_xl_base_1.0_0.9vae.safetensors
-rw-r--r-- 1 dan9070 dan9070 5.7G Aug 9 20:52 sd_xl_refiner_1.0_0.9vae.safetensors
-rw------- 1 dan9070 dan9070 4.0G Aug 9 20:44 v1-5-pruned-emaonly.safetensors
I think I might know why
Set them all to rwxr
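If anyone wants to normalise permissions in bulk, something like this should do it. It is only a sketch run against a scratch folder: `models_dir` and `example.safetensors` are placeholders, and plain read/write (`rw-r--r--`) is assumed to be enough for model files since they never need the execute bit.

```shell
# Hypothetical stand-in for the models folder; point models_dir at your real one.
models_dir="$(mktemp -d)"
touch "$models_dir/example.safetensors"
# Start owner-only, like the 1.5 checkpoint in the listing above.
chmod 600 "$models_dir/example.safetensors"

# Give every checkpoint owner read/write and group/other read (rw-r--r--).
find "$models_dir" -type f -name '*.safetensors' -exec chmod 644 {} +
ls -l "$models_dir"
```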
Nothing changed. They're still not detected.
1.5 is, though.
Remove the dots from file?
I've never seen this before
remove cache.json
Voxel Lora + Pixel Lora with Sequential CPU Offload:
Time taken: 3m 6.21s | GPU active 1915 MB reserved 2242 MB | System peak 1526 MB total 16288 MB
Fixed.
Json deleted, reinstalled the models in case of some weird corruption
All models once again are detected.
NotImplementedError: Cannot copy out of meta tensor; no data!
```
updated: 2023-08-10
hash: 0a7105d5
url: https://github.com/vladmandic/automatic/tree/dev
```
Yeah, I'm stumped now.
OH wait
upcasting is still on
Nope.
Still same error, sadly.
Even without a lora selected, I get a meta out of tensor error.
Guess I'll stick with Model CPU offload for now
It seems that with CPU model offload and the refiner model loaded I now get this error.
Removing the negative prompt, I now get "IndexError: string index out of range" with the same traceback.
I can't even seem to load the model refiner without CPU offload now either
Or even use it regardless of what offload method I use.
I just get that error.
Dev branch moment
Went back to main branch. Refiner works.
🤷♂️
this ones 1016 x 1016, FP16
original backend, sd1.5 base model
```
Negative prompt: Mutated. Disfigured. Multiple Limbs. Disfigured weapon/sword. (((More than one)))
Steps: 20 | Seed: 3147534735 | Sampler: DDIM | CFG scale: 6 | Size: 1080x1080 | Parser: Full parser | Model: sd_xl_base_1.0_0.9vae | Model hash: be9edd61 | Refiner: sd_xl_refiner_1.0_0.9vae | Latent sampler: DDIM | Image CFG scale: 6 | Denoising strength: 0.3 | Refiner start: 0.8 | Secondary steps: 20 | Version: 417ef54 | Pipeline: Diffusers | Operations: "refine | txt2img"
```
Is the same problem at 1080x1080?
not sure, I went to bed after
They were about the same for me for a while, but I mean slower than it was before.
@coral mulch everything got merged to master and refiner issue is fixed too
Start the webui with --reinstall if you want to use Sequential offload
Alright, thank you.
I actually had to git pull from the master branch, with git pull origin master
Then it updated.
Sequential LORAs work now.
Thank you, again.
@proper cradle With sequential on, I can use 9GB of my VRAM with the LORA to generate 4096x4096 images.
sequential slows performance down considerably though correct?
It does, at a huge VRAM cost decrease.
have you found a good sweet spot between vram usage and performance? I have 48gb RAM and a A770 16gb VRAM card
I was about to check out that pixel art lora too. Pretty neat
how much can you get out of just using the vram?
I've done up to 512x1024, but as soon as I hit 1024x1024 it starts throwing errors... But I haven't been using the low vram flags. I tried once and saw an 11x increase in render time for the same res
seems a bit low tbh unless sd xl uses that much vram
comfyui is supposed to be alot more efficient than auto1111
idk sd.next seems to have diverged a lot from it
sd.next is very different
4096x4096 with model shuffling, attention slicing, vae tiling, fp16 vae and vae upcasting false
VAE tiling is a must unless you have an Nvidia A100
VAE upcasting false = FP16
No one should use FP32
Attention slicing fixes NaNs above 2032x2032
Model shuffling sends unused models to RAM so it won't sit in the VRAM, doing nothing, No performance hit.
I will state however that without model CPU offload or sequential offload it doesn't really work with Loras yet
At least on my side.
Turn on Attention Slicing
it is on.
is there a trick to getting sdxl refiner working? I get this error when starting up with refiner.py ModuleNotFoundError: No module named 'sgm'
base sdxl model is working fine
Don't use random extensions
Disty:thanks for the link. Will check it out later
Any of you tested this?
https://youtu.be/GZLjbTPLCVk
The full list of commands and links can be found on my GitHub: https://github.com/ospangler/intel-arc-stable-diffusion-tutorial
Be sure to check out @Archive-pg2zn 's tutorial at https://www.youtube.com/watch?v=ub9150aOMMc on how to setup the wslconfig file, additional tips, error troubleshooting during Vladmandic installation, and improvements...
looks like they're just doing a video tutorial for WSL2 setup?
Seems there's some commands he uses that I do not have.
im tempted to try it out since if it doesnt work out in the end, i could just unregister my wsl but im not really familiar with the commands he used
and regarding the OneApi toolkit, im curious if it will appear under programs in control panel even though he is installing it via wsl cuz the installation appeared on windows (15:08)
I've gotten SDXL already working through Disty's method.
No Aivan, I don't think so.
It's still within WSL
good cuz i dont want to keep tab of things i have to uninstall if this goes wrong in wsl hahaha
I assume you meant the oneapi basekit GUI right?
The reason why that shows up is because he's running the GUI installer for the base kit.
It's the same on Windows and Linux
yea
WSL2 supports graphical interfaces (WSLg)
Disty's method skips that entirely by directly installing what is needed through CLI.
interesting cuz i tried ssh-ing to my university’s lab computer using wsl to open a program but no graphical interface appeared. I could use X2Go, but less software, the better. No worries tho!
Its a little wonky to get going, but you can run it in native windows now.
i just tried the new openvino version, downloading sdxl now to try it but for sd1.5 it is blazing fast
they got 11it/s on A770 https://youtu.be/a28Le2l4MA4 see around 12 minutes.
yeah, what i achieved was 11.12 it/s
with 1 images or 4?
1
single batch
sdxl does not appear to work, although i set gpu in the openvino script settings it infers on the cpu with that model selected
does it move any of the models onto GPU?
11 it/s on arc 
yes it works great with any sd1.5 based models
1st run is slow because it compiles the model
subsequent runs run at just over 11it/s
oh it just handles the compiling for me? even more of an improvement over previous openvino xD
yes it has that baked in
works great on windows
yeah i mean idk if sdxl will be that fast but if they get sdxl working i would imagine 3-4
Well no of course not
I had 11it/s or so working on arch, sd1.5, before a performance regression with pytorch 2 that brought me to 3 it/s at best
i mean, you won't be doing 1024^2 at that speed of course
I think I've underestimated sequential CPU offload lmao
openvino is fast and this is pretty easy to configure. The guide in the wiki for a1111 is extremely straightforward, nothing convoluted to do
It's slow for single generations
But amazing for large batch sizes
With model CPU offload, I can do 12 images per batch in 2 minutes
sheeesh
with sdxl?
Yes.
thats pretty good ngl
I'm going to test Sequential now
assuming 20 iterations per image ig
to see how high I can get on batch size
I keep getting really weird artifacting past batch size 2
i would imagine single image is slow for sequential because of how it processes from cpu to gpu, but all the following images would be fast
supersaturated colors and noise
that's with sequential LORA on
in wsl i was getting like 2.3 it/s in sdxl so this seems about right
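The arithmetic behind that comparison, as a quick sketch. It takes the 12 images in 2 minutes figure from above and the guessed 20 iterations per image; `effective_its` is a made-up helper and it only does integer math.

```shell
# Effective iterations per second for a batch run.
# Arguments: images, steps per image, total seconds. Integer math only.
effective_its() {
    echo $(( ($1 * $2) / $3 ))
}

# 12 images x 20 steps in 120 seconds.
effective_its 12 20 120
```

That lands close to the 2.3 it/s seen in WSL, so the batch number is plausible if the 20 step guess is right.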
does ipexrun work for you guys?
i have an identical setup to disty, pretty sure, and it's barely working on my machine
kinda stinks
I have disty's working on my side.
idk if the bifrost card is any different or if I need to update some microcode or something
Indeed
what's the typical performance penalty with that suite?
also: anyone notice any significant difference between bf16/fp16?
for me, it can be the difference between getting an image and not getting one
but I don't know when exactly is right for which one. My setup seems cursed though, I can't get good faces on any model
The higher the image batch size you can get, it seems you get closer to actual image gen performance
However it IS slower than Model CPU offload
Ope, it's lower than that.
16 seconds.
haha same here, on occasion
this makes sense, all the cpu/gpu shuffling costs are amortized over the batch
I haven't been able to get anything on my current linux setup yet. SDXL has broken faces and sd1.5 models no longer work at all
anyway I'd imagine you'd get discontinuities whenever you need to kick on another vram saver
what's your distro?
It did it.
1280x720 images.
Zero prompt with just negatives generates some interesting outcomes.
why so gray?
oh zero prompt
I would occasionally get really gray results with second pass
kinda had a faded look. pretty cool
annoying though
Ubuntu 23 this time. I killed a couple of others before it
hm
try disabling ipexrun if you haven't already
what flavor errors are you getting
im still on 22.04LTS
anyway i gotta go to sleep
hopefully i can get my litany of errors sorted out in the coming days
i was nagging disty on github since they were the only other person I knew about running sd on arc but now I've found this server so things should be smoother
I love how despite not having any prompts
Somehow it still puts together a coherent image on its own
This model blows 1.5 out of the water
okay correction, only the 1.5 base model seems to work properly with the openvino implementation, whilst other models will work and generate an image, im assuming the openvino compilation pipeline messes the models up as other models just output complete garbage
A1111 works with open vino?
They have official support I believe, https://github.com/comfyanonymous/ComfyUI/discussions/476 I ran it in native windows, it's slower than sdnext
Also, sdxl didn't work for me, but I didn't really know what i was doing. 1.5 worked fine though
I'm trying to get the SD.Next ComfyUI Extension to work lmao since it's literally just ComfyUI
I wonder if this could work for sdnext as well. Not a fan of automatic since it's never native support for their platform, it's always a fork that may never get maintained etc.
the openvino apparently doesn't work with other scripts and such, so i don't think it would work with sd.next as it's heavily modified
this will likely change in the future as development continues but its nice to have an easy to use webui version using openvino which is the fastest on arc by far
We have a thread A1111 for Arc on here
#1141164275990278206 message
nice
How is the speed for you?
Also, man a few months ago i dont think i imagined so many options for arc gpus. Coming along fast imo
@keen marsh Literally the same it seems. At least in terms of normal non-offload running, 1 IT a second basically (with LORA)
Then again this IS just the extension
It's using all the same packages my main venv is using
It uses A LOT of VRAM though it seems
It's nowhere near as optimized
Yeah nvm I can barely run it lol
It runs for the first two images then explodes
Well at least I got a lil' taste of ComfyUI, and I don't really like it tbh.
🤷♂️
Yeah not a big fan either, it was slower for me with sd1.5 too. Probably not bad when you get over the learning curve though, seen people make very fine-tuned iterations pretty quickly with it.
SDNext has openvino_fx compiler as an option in the compute settings
But It's slower than no compile at all on my end
And it uses more VRAM
Yes, this started to happen after PyTorch 2.
BF16 is faster on original backend.
FP16 is faster on diffusers backend.
i think its simply because of precision conversion somewhere in the pipeline, bf16/fp16 shdnt affect speed per se
just guessing though
Thanks
interesting, I wonder if it will fare better in windows? Or if an environment for openvino needs to be created? I don't know enough and just hack my way through things, but I will start looking into these things soon.
if 11it/s is possible, maaaan lol
I noticed compile was slower for one off generations but would speed up with larger batches and consecutive runs
Definitely possible
I will check out this a1111 fork and look into sdnext's openvino backend on my system maybe today.
so that's with openvino backend on sdnext?
no
or does that benchmark work on a1111?
i've only ever gotten 6 it/s, but I do have an a750, maybe it's not possible on it
ahhh, okay. I never tried that in linux
my native windows ipex is a percentage slower than linux right now, the self compiled one with AOT anyway
You can't use the original backend in OpenVINO SD WebUI
whatever sdnext defaults to
Original backend
that was with ipex like a month ago
I get 8.3 it/s at 512x512 on original backend with FP16
Native windows with openvino a1111 fork sd1.5 512x512 is 11.2 it/s
was that with or without batching
Without
I was using compilation warmup and batching
That's why
Anyway diffusers/vino is faster than original/ipex nowadays?
I was reading that models don't work right though? Do you have to convert them? or have you tried?
yeah they dont seem to work right, the fork automatically converts them when it does its model compile, you get generations but the outputs dont match what the model is for
OpenVINO SD WebUI is entirely different thing
OpenVINO actually slows things down in SDNext
seems to be
For example an anime model will still output real life images more like what the 1.5 base model would produce
8.5 down to 8
Albeit it seems like if a model is say trained more for nsfw content that seems to stick just not the models desired style
I wonder if there is an issue with the conversion process? Doesn't seem like it should change anything
sounds like the wrong VAE is being used
Not sure if that fork is maintained , which is why i like SDNEXt tbh
maybe it's not loading an embedded vae
Vae shouldn't affect the style that much? mostly color output
ehh
Also, make sure you use clip skip
i also notice far better results using dpm++ 2m karras vs euler a using the openvino fork. This vae thing is possible. I haven't done a lot of extensive testing yet but it is nice to have a solution that works natively on windows
fine details are all vae
Is clip skip an extension?
it's a setting
Okay
No, you can set it in the options.
you discard the last n layers of CLIP
Most anime models need clip skip 2, most realistic models need 1.
essentially it weakens guidance
Do note that openvino fork disables all other scripts other than the openvino acceleration script
sort of like "blurring" the meanings of the words
I find that it doesn't make THAT big a difference though, just changes the image the style is usually the same. Sometimes stuff like "masterpiece" can make frames around the image in some models lol
Could be different in openvino though
it should have a big impact on the image generated, not just colors (though that's probably what you see in practice). The VAE decoder is just a neural network that converts the latent image (which humans cannot comprehend) back to pixel space
It's worth a shot. Vae might need to be converted to openvino as well right? Not sure if the fork does that
It does
It will run on the CPU otherwise
thats the case a few months back (and honestly i think still is), you cannot just use custom models without conversion
So you think i should set clip skip to 2? It defaults to 1
typically the model page will tell you which to use
You can also add it to the main page in the options so you don't have to go to settings all the time.
sdnext already does that for you btw
yeah setting clip skip and such seems to not change anything, definitely should not be getting like real life photorealistic images from Counterfeit but here we are
Does it have a baked vae?
I'm not sure but using separate vaes has no effect on the image so they're being ignored
I had an issue where certain vaes didn't seem to work in diffusers, also some samplers made garbled output
So I finally decided to try https://www.technopat.net/sosyal/konu/using-stable-diffusion-webui-with-intel-arc-gpus.2593077/
on a clean Ubuntu wsl but it appears that it doesn't let me **load my weights**. Is there a way to resolve this?
In this guide, we will install and use Stable Diffusion WebUI SD.Next with Intel ARC GPU's.
Intel PyTorch Library doesn't have native support for Windows so we have to use Native Linux or Linux via WSL.
Setup WSL on Windows:
Follow these instructions to setup Linux environment in Windows, then...
A1111 OpenVino solution already has a fix for "Restore Faces" update soon
I wonder if its because I have both my igpu and dgpu
#1084296011675082843 message
Disable iGPU
But does it support SDXL.
Or you can try xpu_VISIBLE_DEVICES env variable
🤔
once I do that, do I just run ./webui.sh --use-ipex?
FYI I fixed my issue with the A1111 OpenVINO solution by reinstalling my driver, disabling my RTX and reinstalling
Try xpu_VISIBLE_DEVICES=1 ./webui.sh --use-ipex
Try 1 or 0
This should hide the iGPU from IPEX
task manager says Arc is my gpu 1 so i should put 0 instead right?
iGPU is 2 or 0?
0
The number is the GPU ID.
use 1
Use 1.
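If it is unclear which number is the Arc card, one option is to ask PyTorch directly instead of going by Task Manager. This is only a sketch: `list_xpu_devices` is a made-up helper, it assumes an IPEX build that exposes `torch.xpu`, and there is no guarantee that these indices match the Task Manager numbering.

```python
def list_xpu_devices():
    """Return [(index, name), ...] for visible XPU devices, or [] if unavailable."""
    try:
        import torch
        # Importing IPEX is what registers torch.xpu on the builds discussed here.
        import intel_extension_for_pytorch  # noqa: F401
    except (ImportError, OSError):
        # OSError covers the "module could not be found" DLL failures on Windows.
        return []
    xpu = getattr(torch, "xpu", None)
    if xpu is None or not xpu.is_available():
        return []
    return [(i, xpu.get_device_name(i)) for i in range(xpu.device_count())]


for index, name in list_xpu_devices():
    print(index, name)
```

Running it with and without the environment variable set should show whether the iGPU is actually being hidden.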
wdym? second pass?
ipexrun things
don't use --use-ipex to disable ipexrun
Old SD (2.1 and prior) had a fix for higher resolutions to maintain coherency.
It was broken in SDXL.
Use Img2Img
"hires fix"?
It's the exact same thing
I thought hires fix upscaled in latent space
thus saving a round trip through the VAE
If you select any upscaler with hires, latent upscaling goes out the window
And latent upscaling was generally bad in my experience
without ipex
Try disabling iGPU from the BIOS
Okay, and what command should i run after?
oh I would always use latent upscaling with 512x512 base images; it would work decently well, but unreliably with frequent NaNs and unstable outputs
same ./webui.sh
Would it even be remotely possible to generate a 4096x4096 image on SDXL without artifacting or duplicating?
with lots of manual intervention absolutely
Nope. 2048x2048 maybe but 4096x4096 is too much for SDXL:
my trick for super duper resolution stuff is to generate at multiple "scales"
and stick things together
This is a direct 4096x4096
granted it's inconsistent and only works well for niche things
unless you mean directly
if you manage to get a decent 2048x2048 image, you can upscale it
Generating at 1920x1080 and upscaling to 3840x2160 works well:
Img2Img
Bruh.
its normal to have that warning since this is my first time setting it up right?
Selected model not found?
This is normal for the first setup since it will look for a model.ckpt
also as a tip when upscaling via img to img it's often beneficial to include more close-up related things in your prompt, since you're essentially running the model on small areas at a time
the extreme case of this is to upscale first via simple interpolation, then inpaint areas one by one to add more detail
this process could be carried out forever, in theory, especially with ControlNet to keep the model in line
but it's hilariously labor intensive
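The "small areas at a time" idea can be sketched as cutting the upscaled image into overlapping tiles that would each get their own img2img or inpaint pass. This is a rough illustration only: `tile_boxes` is a hypothetical helper, and the 1024 tile size and 128 overlap are arbitrary assumptions, not values from SD.Next.

```python
def tile_boxes(width, height, tile=1024, overlap=128):
    """Return (left, top, right, bottom) boxes that cover the image with overlap."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be smaller than the tile size")
    step = tile - overlap

    def starts(size):
        # One tile is enough if the image is no larger than the tile.
        if size <= tile:
            return [0]
        positions = list(range(0, size - tile, step))
        positions.append(size - tile)  # last tile sits flush with the edge
        return positions

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]


for box in tile_boxes(3840, 2160):
    print(box)
```

Each box would be cropped, processed with a prompt describing that close-up region, and pasted back; the overlap is there so seams can be blended.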
This is without Tiled Upscale
Yep Img2Img it in one go
what's the vram limited resolution on that one?
I'm guessing that's with all the vram savers on?
With Attention Slicing and VAE Tiling and Model Shuffling, A770 16GB is VRAM limited to 4096x4096
nice
Not at this time. IPEX WSL/Linux is the path for SDXL on Arc
Which is what I have currently set up.
👍
Nvm I answered it myself.
I'm an idiot lmao
@proper cradle When resizing, do you just use Resize Fixed
Yeah, img2img makes a HUGE difference in quality.
I am very pleased with the outputs.
Pushed the Windows fix by @paper horizon, can someone test it?
Also is it detecting OneAPI if you don't use --use-ipex?
I remember that when ipex for windows was first released and I tried it, it was detecting oneapi without --use-ipex
OSError: [WinError 126] The specified module could not be found. Error loading
"C:\Users\KingOfMemes\automatic\venv\lib\site-packages\torch\lib\backend_with_compiler.dll" or one of its dependencies.
it detects oneapi without --use-ipex though, so the autodetection works fine. I have tried launching with and without --use-ipex and done --reinstall just to make sure, but still no luck on windows at the moment
03:48:15-367904 INFO Installing package: torch==2.0.0a0 torchvision==0.15.1 intel_extension_for_pytorch==2.0.110+gitba7f6c1 openvino==2023.1.0.dev20230728 -f
https://developer.intel.com/ipex-whl-stable-xpu
03:48:19-672083 ERROR Error running pip: install --upgrade torch==2.0.0a0 torchvision==0.15.1 intel_extension_for_pytorch==2.0.110+gitba7f6c1
openvino==2023.1.0.dev20230728 -f https://developer.intel.com/ipex-whl-stable-xpu
this happens during the install as well
Do you see this warning?
Incompatible torch version {installed_torch_ver} for ipex windows, reinstalling to {ipex_torch_ver}
Removed Torchvision from this
try conda install libuv
you have to do that in a conda environment as ipex tutorial says
I managed to get it working without a conda env and just installed it into my system pip
but I am using the VSCode oneAPI env setup, which might rely on conda under the hood
I would consider conda a dependency
Yes i saw the incompatible torch version and then it reinstalled
it's okay to not use conda as long as uv.dll is in your library path. conda install libuv just simplifies things
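A quick way to check whether `uv.dll` is reachable is to walk the directories on `PATH`. This is a sketch under the assumption that `PATH` is what "library path" means here; `find_on_path` is a made-up helper, and a hit is only a hint, since how torch actually resolves its DLL dependencies may differ.

```python
import os


def find_on_path(filename, search_path=None):
    """Return every directory on the search path that contains `filename`."""
    if search_path is None:
        search_path = os.environ.get("PATH", "")
    hits = []
    for directory in search_path.split(os.pathsep):
        if directory and os.path.isfile(os.path.join(directory, filename)):
            hits.append(directory)
    return hits


print(find_on_path("uv.dll") or "uv.dll not found on PATH")
```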
Anybody gotten sequential offload to work on native windows in sd.next?
I do have torch, ipex and torchvision compiled from source. I guess I could upload the wheels.
I want to try compiling the specific git# that intel used to see if speed in native increases or if AOT just makes it slower, though it takes hours with AOT.
you can just git checkout at specific commit
i doubt its working well currently.. theres no reason to not upload a functioning prebuilt wheel file otherwise
you edit the compile.bat with the git# where it has "xpu-master". Also, xpu-master adds an import of a file that doesn't exist in pytorch, but that is easily fixed by commenting out one line. I use Vipitis's .bat file still
I also edited a compile file for just Ipex
I have the wheels uploaded in this thread somewhere, but with those you have to edit one file. If you compile from the xpu-2.X branch it works without the need to edit, and that's where the git# they use is from. It's from way back on July 25th (my birthday btw lol)
for sd.next, this installed for me without any error in windows if you want to use the prebuilt wheels: " torch==2.0.0a0 torchvision==0.15.2a0 intel_extension_for_pytorch==2.0.110+gitba7f6c1 -f https://developer.intel.com/ipex-whl-stable-xpu "
yes it all installs fine, always did tbh, I still haven't gotten it to actually launch within windows
OSError: [WinError 126] The specified module could not be found. Error loading
"C:\Users\KingOfMemes\automatic\venv\lib\site-packages\torch\lib\backend_with_compiler.dll" or one of its dependencies.
this is the error i now receive
sorry edited
hold on that may be where you need to edit a file.
I just woke up so i am slow right now give me a min to check
yeah, that is the file that doesn't exist in pytorch. Do this:
2.) Locate the __init__.py file in your intel extension for pytorch pip folder:
"your_python_directory\Lib\site-packages\intel_extension_for_pytorch\__init__.py"
3.) Comment out line 100:
#from . import _inductor
should work after that
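Since the line number can differ between builds, matching on the text of the import may be safer than counting to line 100. The following is a dry-run sketch, not an official fix: `comment_out` and `find_ipex_init` are hypothetical helpers, and the script only reports whether the line is present instead of writing to the file.

```python
import importlib.util
from pathlib import Path

TARGET = "from . import _inductor"  # the import quoted in the steps above


def comment_out(source: str, target: str = TARGET) -> str:
    """Prefix '# ' to any uncommented line whose stripped text equals `target`."""
    out = []
    for line in source.splitlines(keepends=True):
        if line.strip() == target:
            indent = line[: len(line) - len(line.lstrip())]
            line = f"{indent}# {line.lstrip()}"
        out.append(line)
    return "".join(out)


def find_ipex_init():
    """Locate intel_extension_for_pytorch/__init__.py without importing it."""
    spec = importlib.util.find_spec("intel_extension_for_pytorch")
    return Path(spec.origin) if spec and spec.origin else None


if __name__ == "__main__":
    init_file = find_ipex_init()
    if init_file is None:
        print("intel_extension_for_pytorch is not installed in this environment")
    else:
        original = init_file.read_text(encoding="utf-8")
        found = comment_out(original) != original
        print(init_file, "-> line found" if found else "-> line not found")
```

To apply the change for real, the patched text would be written back with `init_file.write_text(...)` after making a backup copy.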
It is in xpu-master for some reason, but it is not in the xpu 2.X branch
If you compile from the git hash or the specific xpu2.x branch it doesn't exist
I'm not seeing that in my file
A Python package for extending the official PyTorch that can easily obtain performance on Intel platform - intel/intel-extension-for-pytorch
hmm...everything running from oneapi environment? Call all variables etc "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
"C:\Program Files (x86)\Intel\oneAPI\mkl\2023.2.0\env\vars.bat"
"C:\Program Files (x86)\Intel\oneAPI\compiler\2023.2.0\env\vars.bat"
correct
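To confirm from inside Python that the oneAPI scripts really ran in the current shell, checking a couple of environment variables may help. This sketch assumes setvars defines `ONEAPI_ROOT` and `SETVARS_COMPLETED`; `oneapi_env_active` is a made-up helper name.

```python
import os


def oneapi_env_active(env=None):
    """Best-effort guess at whether setvars.bat has run in this shell."""
    env = os.environ if env is None else env
    # Assumption: setvars defines ONEAPI_ROOT and sets SETVARS_COMPLETED to 1.
    return bool(env.get("ONEAPI_ROOT")) and env.get("SETVARS_COMPLETED") == "1"


print("oneAPI environment active:", oneapi_env_active())
```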
Running in Conda?
run in conda env and do conda install libuv
launch SD.Next by python launch.py --use-ipex
Forgot to mention, conda is necessary for me
I actually had to replace my python with it, as RC posted in this thread
I've checked the dependencies of torch and it requires libuv
you can also try this wheel, but you do need to edit that line out https://drive.google.com/file/d/1UnlMNvzqiqHW9aAiXv56u2M-EDn1JJwr/view?usp=sharing
miniconda3\envs\{env_name}\Library\bin\uv.dll
then copy the folder to your VENV in automatic
that wheel also does not need 15 minutes to start; it is compiled with AOT so it starts right away. But it is 2it/s slower than normal in the original backend for some reason
Which is why I may try and compile from that git# but I don't really feel like spending another day on it lol
wow, you compiled the AOT wheels yourself! How long does it take on your platform?
lol, took 4-6 hours
does 90% then when it gets to cmake cpu goes to 15% and it takes HOURS
may I know your CPU and RAM?
5600, 32gb of 3200
What is the GPU usage when that happens? Probably compiling on the GPU.
I have another wheel you don't need to edit a file, but too lazy to upload tbh lol
even with libuv and launching from conda I still get the same error
I didn't really check, but it definitely wasn't high.
Yeah, it's a pain tbh. I deleted python 3 and added miniconda to path and that's when it worked. Obviously this shouldn't be the case. Vipitis got it running though
I don't use Python for anything else so It didn't matter to me.
what about python -c "import torch"
I think the conda python and system python need to be the exact same or something
there is also some reference to conda in one of the compile.bat files I think
I could only get python 3.10.6 when conda is like 3.10.12 so that may be why
What does this return when you run it in the webui env?
pip show torch
Name: torch
Version: 2.0.0a0+gitc6a572f
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: [email protected]
License: BSD-3
Location: c:\users\kingofmemes\automatic\venv\lib\site-packages
Requires: filelock, jinja2, networkx, sympy, typing-extensions
Required-by: accelerate, basicsr, clean-fid, clip, clip-interrogator, compel, facexlib, gfpgan, invisible-watermark, kornia, lpips, open-clip-torch, pytorch-lightning, realesrgan, timm, tomesd, torchdiffeq, torchmetrics, torchsde, torchvision
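The same version information can be read without importing torch at all, which is handy when the import itself is what fails with the DLL error. A minimal sketch using only the standard library; `installed_versions` is a made-up helper name.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(
    ["torch", "torchvision", "intel_extension_for_pytorch"]
).items():
    print(f"{name}: {version or 'not installed'}")
```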
did you launch sd.next with webui.bat or python launch.py? Try the latter and use python from your conda env with libuv
i have tried that same dll error
(maybe delete the venv directory too)
and verified libuv is installed with conda list
just recorded my steps
I will try fully reinstalling through conda and seeing what happens
is your official python 3.10.12?
then maybe try https://github.com/lucasg/Dependencies to analyze what dependencies are actually missing for miniconda3\envs\sdnext\Lib\site-packages\torch\lib\torch_cpu.dll
A rewrite of the old legacy software "depends.exe" in C# for Windows devs to troubleshoot dll load dependencies issues. - GitHub - lucasg/Dependencies: A rewrite of the old legacy...
Python 3.10.12
Also, how do you update python 3 to the latest version in windows without installing 3.11
I could not for the life of me figure out how to update from 3.10.6, they stopped uploading install files
it gave me 3.10.12
I mean your python outside of the venv
My theory is that they need to be the same to work, as I couldn't get conda to work with ipex while python was installed in my path
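To test that theory it helps to see which interpreter is actually running SD.Next. This sketch just reports what the current process sees; `describe_interpreter` is a made-up helper, and `CONDA_PREFIX` is assumed to be set only when a conda env is activated.

```python
import os
import sys


def describe_interpreter():
    """Summarize which Python is running and whether a venv or conda env is active."""
    return {
        "executable": sys.executable,
        "version": ".".join(map(str, sys.version_info[:3])),
        "in_venv": sys.prefix != sys.base_prefix,
        "conda_prefix": os.environ.get("CONDA_PREFIX"),
    }


for key, value in describe_interpreter().items():
    print(f"{key}: {value}")
```

Running it once from the plain command prompt and once from the activated conda env should show whether two different Pythons are in play.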
I didn't install another python other than from miniconda
yes
Okay, yeah that's the same for me then. Only Vipitis seems to have gotten it to work with python 3 installed
although I never tried 3.11
Upscalers and rembg won't work, everything else should work fine.
These are the errors you will get:
Also, have you gotten sequential cpu offload to work in windows? @paper horizon
It did work with native wheel IIRC, but it doesn't with my prebuilt one for some reason. It may have never worked though and I am misremembering
haven't tried it yet
ok installing and configuring thru conda has been successful thus far
out of curiosity, has anyone gotten any kind of training working on 1.5 or XL?
recently
I think disty had a patch for inversion for 1.x torch, but it would crash after a few iterations for me
it is not inferencing on windows
WARNING Torch FP16 test failed: Forcing FP32 operations: Tensor on device meta is not on the expected
device xpu:0!
got this
hitting generate doesnt throw any errors but literally nothing happens
no activity on cpu or gpu
and my igpu is disabled so thats not the issue
C:\Users\KingOfMemes\anaconda3\envs\sdnext\lib\site-packages\numba\np\ufunc\parallel.py:371: NumbaWarning: The TBB threading layer requires TBB version 2021 update 6 or later i.e., TBB_INTERFACE_VERSION >= 12060. Found TBB_INTERFACE_VERSION = 12020. The TBB threading layer is disabled.
this also happens when launching
in task manager, the GPU activity is hidden under the "compute" graph
oh wait i lied this must be long aot thing people have talked about i see some progress happening now
The prebuilt wheels will take about 10-15 minutes on the first inference. You have to restart the ui each time you change diffusers i think.
original backend does not work on windows for me, I get API errors. diffusers seems to be working
i see okay
guess ill just let it warm up
You have to completely restart to use it I think
The wheel I posted doesn't have that problem btw (long first generation), just edit that file and drop it into the venv after install.
I kinda want to try compiling with AOT for python 3.9 but don't really want to spend 6 hours... and my CPU is even older
did you modify the script to just build ipex and use the torch prebuilt wheel instead?
What does this output?
import torch
import intel_extension_for_pytorch as ipex  # registers the "xpu" device
def test_fp16():
    # Run a small half-precision LayerNorm on the XPU; an exception here means FP16 is not usable
    x = torch.tensor([[1.5,.0,.0,.0]]).to("xpu").half()
    layerNorm = torch.nn.LayerNorm(4, eps=0.00001, elementwise_affine=True, dtype=torch.float16, device="xpu")
    _y = layerNorm(x)
    return True
if test_fp16():
    print("Pass")
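A variant that reports the failure instead of crashing may be easier to share results from. This is a sketch in the spirit of the "Torch FP16 test failed: Forcing FP32 operations" warning seen earlier, not SD.Next's actual code; `fp16_works` is a made-up helper name.

```python
def fp16_works(device="xpu"):
    """Return True if a small half-precision LayerNorm runs on `device`."""
    try:
        import torch
        if device == "xpu":
            import intel_extension_for_pytorch  # noqa: F401
        x = torch.tensor([[1.5, 0.0, 0.0, 0.0]], device=device, dtype=torch.float16)
        layer_norm = torch.nn.LayerNorm(4, eps=1e-5, dtype=torch.float16, device=device)
        layer_norm(x)
        return True
    except Exception as exc:  # missing packages, DLL errors, unsupported dtype, ...
        print(f"FP16 test failed on {device}: {exc}")
        return False


dtype_name = "float16" if fp16_works("xpu") else "float32"
print("would use", dtype_name)
```

The printed exception text is usually the useful part when asking for help.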
you can use the torch prebuilt, first time I compiled all but subsequent times I compiled just ipex. Torch and torchvision don't take that long to compile, maybe an hour all together if that.
I suggest modifying to compile ipex from here https://github.com/intel/intel-extension-for-pytorch/tree/release/xpu/2.0.110 then no need to edit files. The git# they compiled in the prebuilt is there as well.
Also, you exchange slow startup for slower inference speed, but it's still fast enough IMO. Especially with diffusers
Well, i got an image to generate after the long startup
Was close to 10 it/s
Then it crashed and locked my whole pc up
I believe you can call torch.compile or torch.jit.trace to get better inference performance down the road
but I need it to just quickly work for developing this stuff
well, google restricted my wheel file for whatever reason
Just going to delete it I guess, not worth the review. Not sure many used it anyway.
I will just give it a try and let it run for a while
I sent it for review, just in case google thinks I'm hacking people or something lol.
is that on line 63 of the script?
i think he means use this branch instead of xpu master
I got a CMake Warning: "Manually-specified variables were not used by the project:" and then it lists a few things as well as the USE_AOT_DEVICES which I added. I hope that's just a warning left over from the defaults they used. Don't want to end up with the same JIT variant in two hours
i..dk. the only one who got it to compile is aaron
set "VER_IPEX=v2.0.110+xpu" or the specific git#
at the top?
yes
how i got it to compile aot was like this "compile_bundle2.bat 1 2 ats-m150 " (bundle 2 is my edited bat for just ipex)
this is using the bat you edited earlier btw
also, I did this in conda in the oneapi environment etc
outside of conda it failed
changed it to .txt just in case there is some sort of issue uploading .bat files
I recommend trying the specific git# they used tbh, I'm hoping it is faster as I'm not sure if AOT makes it a bit slower or they changed something in the code since then
Also it pulls a lot more warnings than when using xpu-master, but it works the same in the end
I now hope my compilation finishes successfully but I will look at the changes.
No doubt, if you used xpu-master then you will need to edit the init.py file and comment out that line that pulls the error
30 minutes in and it's 488/1049 so an hour seems reasonable
Really not sure what it's trying to pull from pytorch, but it doesn't exist
Oh, if you are using AOT, it will hit 1047 and take about 4 hours from there lol
They acknowledged this on the github as well, and say they are trying to fix it.
If the cmake exe is running it is still compiling, cpu was around 15% at that point
that's the case with the script on the release branch already.
so I am hopeful
oh nice, I haven't checked since I compiled
they fixed some stuff in the file inside the release branch
I grabbed the script from GitHub today so I should have all the fixes
I didn't throw out my old ipex version. But there is a force_reinstall. As long as I get the wheel files it should be good
You can install the prebuilt wheel over it fine, the wheel you compile will be inside the Dist folder
I got so many warnings by now haha
yeah, lol
arrow lake next year
built on 20a node
would be a shame if you decided to go 14900k
yeah, it doesn't sound like the smartest decision
but I have waited long enough and there will always be a next gen.
got to 1047 in around 2 hours
oof, I would plan to add a couple hours from my estimate. Took maybe an hour or less for me to get there, don't think it took that long.
Successfully installed intel-extension-for-pytorch-2.0.110+git509a378
it took in total around 5 hours 30 minutes, and step 1047 was reached after 2
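The numbers reported in this thread show why the step counter is a poor progress bar for the AOT build. A small sketch with a naive linear estimate; `linear_eta_minutes` is a made-up helper, and the inputs are the figures quoted above (488/1049 after 30 minutes, about 5 hours 30 minutes in total).

```python
def linear_eta_minutes(elapsed_minutes, done, total):
    """Naive total-time estimate: assumes every build step costs the same."""
    if done <= 0:
        raise ValueError("need at least one finished step")
    return elapsed_minutes * total / done


estimate = linear_eta_minutes(30, 488, 1049)
actual = 5 * 60 + 30
print(f"linear estimate: {estimate:.0f} min, reported total: {actual} min")
```

The estimate comes out at roughly an hour because it treats all steps as equal, while the last couple of steps (the AOT device compile) took most of the reported time.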
let's hope this works
it does work, still had a short wait on first inference but it is fast enough to be useful without any specific tweaking. Well worth the 6 hours.
Probably don't want to compile for 6 hours, it's slower too so they may be trying to figure that out as well as decrease the compile time.
The wheel I have is for python39
seems like the last two hours is just ocloc.exe running
apparently there are controlnet models that work with sdxl but only in comfyui right now
I can't seem to get img2img upscale on diffusers working
It just makes the images noisier (?)
hmm, might only be occurring at higher resolutions
are these related to VRAM usage? is there some sort of soft cap I'm hitting that drops quality?
FWIW
hmm, got it working at 1.9 scale...bet this is just a 1024 issue again
wonder why 1024 res is so cursed
especially since that's the exact resolution its meant to work best on
What is your denoise strength?
Too low and it will be noisy
Too high and it will change the image too much
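One way to see why strength matters: in img2img only part of the step schedule is actually run. The sketch below mirrors how, as far as I know, the diffusers img2img pipelines derive the step count from strength; `img2img_steps` is a made-up helper, and the original backend may count steps differently.

```python
def img2img_steps(num_inference_steps: int, strength: float) -> int:
    """Number of denoising steps that actually run for a given strength."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    return min(int(num_inference_steps * strength), num_inference_steps)


for strength in (0.2, 0.5, 0.8):
    print(strength, img2img_steps(30, strength))
```

With a low strength very few steps run, so there is little chance to clean up the noise that was added to the upscaled image.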
no, it gets more noisy
if you watch the interim images it goes from noisy to noisier
Base res?
no sorry base was 512x 640
Yep, probably 1024 curse
that's so odd
try 1080
Also enabling both move base options will save 6 GB VRAM without any performance loss
If you ran out of memory when VAE decoding*
Or using refiner
Also hires is working on the dev branch
sweet
there's no ControlNet on diffusers, right? What would it take to get working?
there is StableDiffusionControlNetPipeline directly in diffusers. I used it today for a project
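For reference, using that pipeline directly looks roughly like the sketch below. The two Hugging Face repo ids are just common examples picked for illustration (an assumption, not what was used in the project mentioned), `build_controlnet_pipeline` is a made-up wrapper, and nothing here is wired into SD.Next.

```python
def build_controlnet_pipeline(device="cpu"):
    """Build an SD 1.5 + canny ControlNet pipeline. Downloads weights on first use."""
    # Imports live inside the function so this file loads without diffusers installed.
    import torch
    from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

    # Assumption: example repo ids, swap in whichever models you actually use.
    controlnet = ControlNetModel.from_pretrained(
        "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float32
    )
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",
        controlnet=controlnet,
        torch_dtype=torch.float32,
    )
    return pipe.to(device)


# Usage (needs a PIL edge map as the control image):
# pipe = build_controlnet_pipeline("xpu")
# image = pipe("a house", image=canny_image, num_inference_steps=20).images[0]
```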
There is I believe, just not for sdxl or at least not in the webui version yet.
nice
there was a ControlNet for XL chapter in the docs https://huggingface.co/docs/diffusers/api/pipelines/controlnet_sdxl
you can use controlnet in ComfyUI btw. Both controlnet and controlnet loras
Needs a new UI work
Is ComfyUI as performant as vladmandic yet?
I don't think so, I haven't used it but the one time. Don't think they have any of the ipex optimizations it's just barebones support
only tried in native windows though
No idea how much performance you can get out of Vlad, but I'm generating 4k resolution images in about 180 seconds using an RTX 3080 10GB on ComfyUI
And if I wanted to make it 8K resolution, it would take ~800 seconds
Does Vlad produce 4-8k imagery without VRAM errors?
on 10GB VRAM?
Also Enjoy my new LORA for horror style 😄
Thats Nvidia, Arc needs optimization
A770 16 GB can generate direct 4096x4096 without --medvram or --lowvram
Nice. But it should be able to with 16GB of VRAM.
Would be nice to see how it runs on the lower VRAM cards
SDXL 1024x1024 can run on 2GB GPUs with --lowvram
Fair. That's pretty decent. I'm on a 10GB VRAM + 16GB of system ram and producing 4k images using only Tiled VAE, A larger than average Page/Swap file and fp16 precision.
I've produced 8k and larger, but it just take 10 minutes plus
1 GB with FP16 and 2 GB with FP32. 2 GB GPUs generally don't support FP16.
That is unfortunate considering that FP16 is only useful for low end GPU's really
I mean, if it weren't for the lag that it causes my system, I'd still be using the fp32 vae myself
BF16 runs fine
Intel ARC defaults to BF16 on SDNext
Awesome, because BF16 is the most optimum for SDXL
BF16 generally runs faster on ARC than FP16
Interesting.
I released a BF16 LORA yesterday, then found that all GTX users need fp16 lol
so I had to put out 2 versions lol
Has anyone done any Arc A380 testing then?
With StableSwarm now a thing, the opportunity to generate multiple batches of images, per graphics card, simultaneously is now a thing.
Does anyone have any experience using StableSwarm with multiple GPU brands?
Splitting batches to multiple GPUs was already a thing?
Probably possible with a proper server to connect to separate APIs.
Of course it was already possible, but with the new StableSwarm API, all of the features of ComfyUI can be utilized across multiple GPUs remotely across a network or over the internet.
And it's just more accessible in general than older methods
I have an A380 as a secondary and I'm just checking what options I have for potential workflow improvements
hello, I tried OpenVINO for a while and it's not quite there yet, comfy looks like it still has some issues with it too. can someone direct me to a solid guide for sd.next? I want to run 1.5 and eventually sdxl. thanks!
also if there's some easy to follow documentation on what it can and can't do I would love to see it. Intel Arc A770 16GB
would I just follow the instruction on the pinned post in this thread? edit: it looks like this is the way
If you mean openvino support, I think you just git clone sd.next, then run webui.bat --use-openvino and I believe it will set everything up for you, just don't change it from fp32.
I keep hearing ipex works better so I'm trying to go with that
Should be pinned.
If you mean native windows, then it's way more complicated as you have to also compile it yourself. If you want to use IPEX, go for WSL2, though the install is a lot more involved than openvino
If you don't compile, you have to wait 10-15 minutes before your first generation, but after that it is pretty fast
I just went through this install: https://www.technopat.net/sosyal/konu/using-stable-diffusion-webui-with-intel-arc-gpus.2593077/
Its the same right?
Same but forgot to update one thing
Run:
sudo apt install libjemalloc-dev
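To check afterwards whether the library is visible to the system, the standard library can ask the linker. A minimal sketch; `jemalloc_library` is a made-up helper, and whether SD.Next looks the library up the same way is not something confirmed here.

```python
from ctypes.util import find_library


def jemalloc_library():
    """Return the library name the linker reports for jemalloc, or None."""
    return find_library("jemalloc")


print(jemalloc_library() or "jemalloc not found")
```

Because apt installs system-wide, the folder the command is run from should not matter.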
This is where I'm at can I run that after it's done
Thanks man.
I ran libjemalloc, did I miss something?
I am writing a small controlnet app and trying to run it on my A750. But it is really slow. What is the trick to speed up inference with the different models. or which file do I need to look at the find the solution.
As this will be hosted it needs to be device agnostic.
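For the device-agnostic part, one common pattern is to pick the device once at startup and pass it everywhere. This is only a sketch: `pick_device` is a made-up helper, it assumes IPEX exposes `torch.xpu` when installed, and it does nothing for speed by itself.

```python
def pick_device():
    """Return 'xpu', 'cuda' or 'cpu', in that order of preference."""
    try:
        import torch
    except (ImportError, OSError):
        return "cpu"
    try:
        # Optional: only present on Intel setups, registers torch.xpu there.
        import intel_extension_for_pytorch  # noqa: F401
    except (ImportError, OSError):
        pass
    xpu = getattr(torch, "xpu", None)
    if xpu is not None and xpu.is_available():
        return "xpu"
    if torch.cuda.is_available():
        return "cuda"
    return "cpu"


device = pick_device()
print("using device:", device)
```

Models and tensors would then be moved with `.to(device)`, so the same app runs on Arc, Nvidia or CPU hosts.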
was I supposed to install libjemalloc in automatic folder? I did it after going cd
it's best to use webui.py to run in the venv, not sure if that's causing your error though
I got this error before.
./shared/source/os_interface/os_interface.h
Do you happen to have your igpu enabled? Check to see if you have 2 gpus in task manager. I managed to fix mine after disabling igpu multimonitor on my asus motherboard.
After that, try running it again. If it doesn't work, reinstall.
#1084296011675082843 message
oh, this? gpu 0
yea

