#SDNext WebUI on Intel ARC


chrome bone
#

oh wait, i remember having higher it/s when using ipex thats not compiled with AOT

#

just slow start but speed is better

keen marsh
#

yup

#

But it could be the commit though

#

they changed some stuff about memory etc.

#

likely still working on it so we are getting the pure upstream when compiling, some stuff might be worse atm

chrome bone
#

ill patiently wait them to fix it then

#

though im sure a lot of ppl out there are keen to try SD in native windows

keen marsh
#

if I am up to it....I may try and compile another day...nah, this is fine. Diffusers seems to work well, just the vae's don't show up

chrome bone
#

yeah

#

pain peko

keen marsh
#

I wonder if they have a diffuser version for some of these vae's?

chrome bone
#

the models are the same

#

just the pipelines are written differently

#

afaik

#

so some samplers and plugins you cannot use when choosing diffusers backend

keen marsh
#

I think it may be safetensors, some are still ckpt and pt files

tall grove
#

Hm so linux still quite a bit faster?

keen marsh
#

Diffusers backend is just as fast, and if you use the prebuilt wheels it's fast but you have to wait 10-15 minutes for it to start.

coral mulch
#

I'm still stuck with the same error I had before, as I genuinely have no clue how to fix it.

keen marsh
#

Vipitis seemed to get it to work with just having the conda installed and not deleting python 3. See if he can help you out. A theory I had for why it didn't work for me is i couldnt update python to the latest version and could only use 3.10.6 while conda is newer. 🤷‍♂️

grave condor
#

126 error?

#

I got it to switch to 127 for a while.

coral mulch
#

When in WSL, Error 139. When I tried windows, it was a dll error.

grave condor
#

no clue about WSL

coral mulch
#
```
or one of its dependencies.
```
#

For windows.

grave condor
#

I got ipex to work for CausalLM inference today but the JIT delay is horrible

keen marsh
grave condor
#

I might try that on Friday. Got a busy day tomorrow. And really need to get to sleep this time

#

Was up past 5 am the last few days

#

alarm at 7.30 which is in almost 6 hours

keen marsh
#

They are compiling from a branch and not xpu master, so xpu master adds a file we dont need that causes an error since it doesn't exist in torch

grave condor
keen marsh
grave condor
#

I did the ipex webinar today and it was completely useless. They just talked about CPU stuff and a hyperparameter searching script they implemented.

keen marsh
#

If you want to compile yourself, change xpu master to the xpu 2.0 branch in the bat file

grave condor
#

no useful information for GPU/xpu and my questions didn't get real answers either.

#

You can't save the JIT kernels to use them again or in other processes is what they confirmed to me.

coral mulch
#

That, and MKL + dcp

keen marsh
#

@grave condor#0 what version of python do you have?

grave condor
#

3.9.4 I believe

proper cradle
keen marsh
#

That may be the reason it wouldn't run right with python 3.10.6, I got the same error dan does. I had to delete it and add conda to path (doesn't matter for me since I don't do any real programming)

grave condor
proper cradle
#

xpu mode is available but it's slower than no ipexrun at all

#

cpu mode is the fastest

grave condor
#

eh, I will try accelerate launch for the eval script, I believe my accelerate config got the xpu registered.

proper cradle
#

accelerate has --use-xpu cmd arg

grave condor
#

I am using accelerate.Accelerator.device() right now for a simple device agnostic Implementation

#

haven't tried it on all three options tho. But wanted to before I push it

coral mulch
#

Got it working again.

#

On WSL*

#

My only issue now

#

Is that line that keeps showing up

proper cradle
#

That line is a weird issue that happens only at 1024x1024

#

Diffusers with attention slicing turned on doesn't have that issue

coral mulch
#

I have attention slicing on and that is happening.

proper cradle
#

Attention slicing off = Scaled Dot Product
Attention slicing on = Diffusers

#

Try turning it off and reload the model

#

It's a weird issue and it doesn't go away without a complete restart

#

And it only happens at 1024x1024

#

768x1024 or 1024x1536 don't have this issue

coral mulch
#

768x768 works rather well.

proper cradle
#

Try 1080x1080

coral mulch
#

Yeah, 1080x1080 is fine

proper cradle
#

This happens in all models on IPEX with 1024x1024

coral mulch
proper cradle
#

I couldn't find why this happens exactly at 1024x1024

coral mulch
#

I'll probably do 1280x720 for nice 16:9 images

#

Or 1024x768

proper cradle
#

I go straight for 1920x1080 on SDXL

#

Then regenerate it with Img2Img at 3840x2160

coral mulch
#

I cannot do 1920x1080

proper cradle
#

Here is an example image i generated on my A770

#

8GB?

coral mulch
proper cradle
coral mulch
#

Model CPU offload is used

#

Mine still seems to over-use sysram

#

Vram isnt an issue

proper cradle
#

No offload, all move options are on, VAE slicing and VAE tiling is on

coral mulch
#

And not attention slicing?

proper cradle
#

Attention slicing off = Scaled Dot Product

coral mulch
#

And this is without LORA right?

proper cradle
#

Without

coral mulch
#

Hm.

coral mulch
#

Do you use these?

proper cradle
proper cradle
coral mulch
#

Alright.

proper cradle
#

They are FP32

coral mulch
#

Well I don't use them.

#

I was just wondering is all.

#

Didn't know they were FP32.

#

I get an out of resources error when trying to generate with those diffuser settings.

#

No refiner, just a pixel art LORA.

proper cradle
#

Your RAM usage is 15 GB but it still runs out of RAM?

coral mulch
#

WSL is set to a ram limit of 24GB with a swap of 40GB.
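
For reference, limits like these are normally set in a `.wslconfig` file in the Windows user profile folder. A minimal sketch, assuming the 24GB and 40GB values quoted above; `memory` and `swap` under `[wsl2]` are the documented setting names, while `render_wslconfig` is only an illustrative helper:

```python
import configparser
import io


def render_wslconfig(memory: str = "24GB", swap: str = "40GB") -> str:
    """Return the text of a .wslconfig with a RAM cap and swap size for WSL2."""
    config = configparser.ConfigParser()
    config.optionxform = str  # keep key names exactly as written
    config["wsl2"] = {"memory": memory, "swap": swap}
    buffer = io.StringIO()
    config.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()


# The text below is what would go into the .wslconfig file.
wslconfig_text = render_wslconfig()
print(wslconfig_text)
```

WSL only picks up the change after it has been fully shut down and started again.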

#

When fully loaded it looks like this.

proper cradle
#

Wait, your GPU dies before it runs out of resources

#

Device Not Found error when trying to load a Lora

coral mulch
#

Let me try running a 1920x1080 image without a LORA.

#

Generation was a success.

#

Something's wrong with LORAs.

proper cradle
#

Lora support is still experimental in diffusers

coral mulch
#

That's very disappointing.

#

Yep, I can do 1920x1080 with the refiner included.

#

I really wanted to use loras, though.

proper cradle
#

Try lower res

coral mulch
proper cradle
#

1024x1536 is pretty stable

coral mulch
#

Didn't work. Out of resources.

#

I don't even think it will generate a 1024x1024 image with a LORA.

#

Nope.

#

Only way to do it is through model CPU offload.

#

Yep. With model CPU offload I can do 1080x1080 images with loras enabled.

restive parcel
#

my linux setup is so borked I get "out of resources" at anything above 1024, following SDXL optimization guide. 64gb sys ram, a770 LE

#

+nothing looks good, so i'm gonna have to review all of my setup thonk

restive parcel
#

fixed my linux setup Honma_Yay except for hires fix is not working at all, but hopefully more stuff will get updated to diffusers backend and then i won't have to

#

if I could do training on my card, I could just make my own loras...

#

thas prolly not good right

coral mulch
#

Rename it back to that, and set the VAE loading precision to be FP32.

restive parcel
#

so the instructions have changed slightly since then?

#

ahhh shoot, getting black images with SDXL again

#

or maybe its only CounterfeitXL getting black images

#

refiner isn't working at all though...

#

refiner: disabled

#

tried rebooting a few times

chrome bone
#

huh

#

you need to use fp16 fixed vae for counterfeitxl

#

the baked in vae never worked for me

#

also set precision to fp16

#

there is no need to upcast

restive parcel
#

yeah i'm using the fixed vae and precision type BF16

#

i tried fp16 and it didn't work any differently

chrome bone
#

then i have no idea why

#

specifically i didnt set anything about precision

#

so bf16 may or may not work, probably disty knows better

restive parcel
#

I need to also ask disty about xpu-smi

#

I can't get it to output gpu stats

#

the whole output is blank except for frequency and power

keen marsh
restive parcel
#

I did

#

same output

keen marsh
#

Not at computer, but are you running it with the watch command that pings it every few seconds? I found that it would kinda glitch out a bit at first before it started working.

restive parcel
#

I am not, how do I use that?

keen marsh
#

It's something like this, I think: $ watch -n <interval> <command>
Replace <interval> with the number of seconds between runs and <command> with the command you want to repeat.

For example, to run the top command every 5 seconds:

$ watch -n 5 top

(Use whatever number of seconds you want. I don't know it by heart, I will look it up real quick.)
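
The same polling idea can be sketched in Python for setups where `watch` is not available. This is only an illustration: `poll_command` is a made-up helper, and the command passed to it is a placeholder for whatever xpu-smi invocation you normally run.

```python
import subprocess
import sys
import time


def poll_command(command: list[str], interval: float, repeats: int) -> list[str]:
    """Run `command` `repeats` times, sleeping `interval` seconds between runs."""
    outputs = []
    for attempt in range(repeats):
        result = subprocess.run(command, capture_output=True, text=True)
        outputs.append(result.stdout)
        print(result.stdout, end="")
        if attempt < repeats - 1:
            time.sleep(interval)  # no sleep after the final run
    return outputs


# Placeholder command that just prints a marker; swap in your own monitoring call.
demo_output = poll_command([sys.executable, "-c", "print('tick')"], interval=0.1, repeats=2)
```

Since the first few reads were reported as glitchy, repeating the call a few times before trusting the numbers is the point of the loop.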

#

Dunno how all that extra stuff added to my post, but I think thats it

restive parcel
#

oh that's really neat, thank!

keen marsh
#

No problem

restive parcel
#

I still don't get utilization, but at least its eventually giving me memory use

keen marsh
#

Yeah, it seems a bit glitchy. Sometimes making the prompt window bigger fixes it lol

restive parcel
#

oh yay, i'm starting to get stuff DinaKEK

restive parcel
#

oh, I wasn't expecting to see the line on Original backend with an older model

#

and now I'm back where I was before, can't render using the old backend

#

and i'm going blind from a migraine, so I guess i'll try more another day

#

I thought it started working on its own but something is very wrong DinaKEK

coral mulch
#

Use 1080x1080 resolution. Do not use 1024.

#

Set channelslast as well

#

This genuinely seems like the best overall setup on WSL

tall grove
#

Isn't the current sd xl implementation flawed in the web ui? At least I saw an open issue working on something to do with it.

coral mulch
coral mulch
#

LORAs for example do not work with sequential CPU offload, and will not run without model CPU offload.

tall grove
coral mulch
#

Well I'm using vladmandic's automatic webUI

#

Not that.

#

A different fork entirely

tall grove
#

Yeah but its a fork from it

#

Did the sd xl implementation not come from the base?

coral mulch
tall grove
#

Yeah but did he make the sd xl support or move it from main

coral mulch
#

I don't think it came from the original branch

tall grove
#

At least the main branch seems to be working on a better solution

#

Actually not sure

coral mulch
tall grove
#

All I see is comfy ui is better for some reason

coral mulch
#

Voxel and pixel art loras are great, man.

tall grove
#

Well I cant mess with this for a month or so so hopefully everything is sorted by then

keen marsh
proper cradle
proper cradle
#

A1111 was 3 weeks late and A1111's SDXL implementation is terrible

keen marsh
# tall grove Yeah but its a fork from it

Sdnext was vlad diffusion which was originally a fork but changed the name because it became much different. Sdnext is better updated and keeps GPUs other than Nvidia in mind. Most extensions will work with both though, and there is a refiner extension but I have never tried it.

proper cradle
coral mulch
#

What's the URL?

proper cradle
#

Akane lora, 2048x3072

coral mulch
proper cradle
#

git checkout dev but it will be merged soon anyway

coral mulch
#

Very good.

keen marsh
#

One thing I dont like, is sdnext disables control net when starting with sdxl, which sucks when switching back and forth to sd1.5.

coral mulch
#

It stayed enabled even when I swapped to diffusers.

keen marsh
#

It disables when starting the ui

proper cradle
#

It will get disabled if you restart

coral mulch
#

Btw disty

#

once I checkout to that branch

keen marsh
#

Waiting for them to add the controlnet sdxl models but seems they are too big

coral mulch
#

would I just do ./webui.sh --use-ipex --upgrade --reinstall

proper cradle
#

do a git pull and this should be fine

tall grove
#

oh right

#

man sd xl has so much potential

coral mulch
#

So far I've been exceedingly impressed with it.

tall grove
#

just seems to be an ass to run

coral mulch
#

Does the dev branch resolve the 1024x1024 resolution issue btw?

tall grove
#

hopefully that doesn't dampen community engagement

proper cradle
#

I couldn't find a fix for that

coral mulch
#

Why does 1024x1024 specifically cause artifacting, though?

#

That's what I don't understand.

proper cradle
#

Same thing happens on original backend too

coral mulch
#

Weird.

#

Swapped to the dev branch, enabled sequential CPU offload

#

disabled model CPU offload

#

put on a pixel art LORA.

#

Black images.

#

🤷‍♂️

proper cradle
#

Don't use any offloading in the meantime

coral mulch
#

Alright.

proper cradle
#

git pull

coral mulch
#

Cannot copy out of meta tensor; no data!

#

Why am I getting meta tensor errors now?

#

I had this before, too. Not sure what caused it.

#

Can you not use sequential CPU offload combined with the sequential apply for LORAs?

#

@proper cradle Even with sequential CPU offload off, and LORA set to diffusers default

#

I'm getting meta tensor errors.

#
```
11:10:51-849166 ERROR    Arguments: args=('task(zf6p5v3burvufxj)', '', '', [], 20, 3, 0, True, False, False, 1, 1, 6, 6,
                         0.7, 1, -1.0, -1.0, 0, 0, 0, 1080, 1080, False, 0.3, 2, 'Latent', 20, 0, 0, 0.8, '', '', [], 0,
                         False, False, 'positive', 'comma', 0, False, False, '', 0, '', [], 0, '', [], 0, '', [], True,
                         False, False, False, 0, False) kwargs={}
11:10:51-850403 ERROR    gradio call: NotImplementedError
```
#

Had to fully shutdown and restart WSL in order for it to generate an image with all offload types disabled.

#

Well model CPU offload + sequential apply LORA works.

#

Yeah, meta tensor errors for sequential only.

#

Did a git pull, states I'm already up to date

#

So no clue 🤷‍♂️

#
```
dan9070@dbs580:~/automatic$ git pull
Already up to date.
dan9070@dbs580:~/automatic$ git branch
* dev
dan9070@dbs580:~/automatic$
```
proper cradle
coral mulch
#

```
HEAD is now at 417ef540 Merge pull request #1971 from Aptronymist/master
dan9070@dbs580:~/automatic$ git branch -d dev
warning: deleting branch 'dev' that has been merged to
'refs/remotes/origin/dev', but not yet merged to HEAD.
Deleted branch dev (was 0a7105d5).
dan9070@dbs580:~/automatic$ git checkout origin/dev
Previous HEAD position was 417ef540 Merge pull request #1971 from Aptronymist/master
HEAD is now at 0a7105d5 Fix SDXL LoRa offloading and SD 1.5 parsing
dan9070@dbs580:~/automatic$ git pull
You are not currently on a branch.
Please specify which branch you want to merge with.
See git-pull(1) for details.

    git pull <remote> <branch>

dan9070@dbs580:~/automatic$ git branch
* (HEAD detached at origin/dev)
dan9070@dbs580:~/automatic$ git pull
```
#

That good?

keen marsh
#

That just means you're not on a named branch iirc. You will need to switch back to a branch to update later though, I think

coral mulch
#

I'm using the dev branch to utilize Sequential CPU offload with Sequential LORA.

keen marsh
#

Sequential stopped working for me in windows native version, haven't tried latest commit though.

#

But model offload runs well. Sequential is slower now anyway

coral mulch
#

Sequential is meant to be slower.

#

Lol

#

Apologies if I misunderstand git a bit, as I'm not usually one to mess heavily into repositories.

#

@proper cradle Is * (HEAD detached at origin/dev) correct for git branch?

proper cradle
#

git checkout dev

coral mulch
#
```
Switched to a new branch 'dev'
```
proper cradle
#

git pull

coral mulch
#
```
Already up to date.
```
proper cradle
#

git status

coral mulch
#
```
On branch dev
Your branch is up to date with 'origin/dev'.

nothing to commit, working tree clean
```
proper cradle
#

This should work
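
To tell a detached HEAD from a normal branch checkout without reading the output by eye, the `git branch` text can be parsed. A rough sketch, assuming the output format shown in this thread; `current_branch` is a hypothetical helper, not part of git or SD.Next:

```python
from typing import Optional


def current_branch(git_branch_output: str) -> Optional[str]:
    """Return the checked-out branch name, or None when HEAD is detached."""
    for line in git_branch_output.splitlines():
        line = line.strip()
        if not line.startswith("*"):
            continue  # only the starred entry is the current checkout
        name = line[1:].strip()
        if name.startswith("(HEAD detached"):
            return None  # e.g. "* (HEAD detached at origin/dev)"
        return name
    return None


print(current_branch("* dev"))
print(current_branch("* (HEAD detached at origin/dev)"))
```

A `None` result matches the state in which `git pull` refused to run above, because there was no branch to merge into.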

coral mulch
#

Where is this located?

proper cradle
coral mulch
#

Identical settings to mine

#
```
updated: 2023-08-10
hash: 0a7105d5
url: https://github.com/vladmandic/automatic/tree/dev
```
proper cradle
#

What lora are you using?

#

Kurokawa Akane Lora:



Prompt: (masterpiece, best quality, highres, anime, pixiv), (1girl, kurokawa akane, blue hair, green eyes, medium hair, gradient hair, solo, full body, standing on an abstract water), (bloom, swirling lights, light particles, detailed, 8k), <lora:training_model:1.0>
Negative prompt: (worst quality, low quality:1.4, lowres, blurry), (3d, interlocked fingers, loli, 2girls),
Steps: 40 | Seed: 4107994374 | Sampler: Euler a | CFG scale: 10 | Size: 1024x1536 | Parser: Full parser | Model: SDXL_astreapixieXLAnime_v16 | Model hash: 432e15eb | VAE: sdxl-vae-fp16-fix | Version: 0a7105d | Pipeline: Diffusers | Operations: txt2img | Lora hashes: "training_model: efe6c5dadf89"

Time taken: 1m 46.25s |

GPU active 1016 MB reserved 1358 MB | System peak 341 MB total 16288 MB
coral mulch
#

The moment I swapped to this branch, it stopped detecting all of my SDXL safetensors.

#

The only one is 1.5.

#

Refreshing does nothing.

proper cradle
#

Try setting pipeline to autodetect

#

And refresh

coral mulch
#

Available models: /home/dan9070/automatic/models/Stable-diffusion 1

proper cradle
#

Does it have the correct perms?

coral mulch
#

I don't know what perms would've changed from the previous repo.

#

It detects only SD 1.5.

proper cradle
#

Run ls -lh in that folder (Inside WSL)

coral mulch
#

-rw-r--r-- 1 dan9070 dan9070 6.5G Aug 9 20:58 dreamshaperXL10_alpha2Xl10.safetensors
-rw-r--r-- 1 dan9070 dan9070 6.5G Aug 9 20:52 sd_xl_base_1.0_0.9vae.safetensors
-rw-r--r-- 1 dan9070 dan9070 5.7G Aug 9 20:52 sd_xl_refiner_1.0_0.9vae.safetensors
-rw------- 1 dan9070 dan9070 4.0G Aug 9 20:44 v1-5-pruned-emaonly.safetensors
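
The same check can be scripted. A small sketch, assuming the models sit in a single folder; `describe_models` is an illustrative name, and it only reports the mode bits and whether the current user can read each file:

```python
import os
import stat
from pathlib import Path


def describe_models(folder: str) -> list[tuple[str, str, bool]]:
    """List (name, mode string, readable?) for each .safetensors file in `folder`."""
    rows = []
    for path in sorted(Path(folder).glob("*.safetensors")):
        mode = stat.filemode(path.stat().st_mode)  # same style as the first `ls -l` column
        rows.append((path.name, mode, os.access(path, os.R_OK)))
    return rows
```

Calling `describe_models` on the models folder gives one row per checkpoint, which makes a file that is unreadable for the current user easy to spot.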

#

I think I might know why

#

Set them all to rwxr

#

Nothing changed. They're still not detected.

#

1.5 is, though.

proper cradle
#

Remove the dots from file?

#

I've never seen this before

#

remove cache.json

#

Voxel Lora + Pixel Lora with Sequential CPU Offload:
Time taken: 3m 6.21s | GPU active 1915 MB reserved 2242 MB | System peak 1526 MB total 16288 MB

coral mulch
#

Fixed.

#

Json deleted, reinstalled the models in case of some weird corruption

#

All models once again are detected.

#

NotImplementedError: Cannot copy out of meta tensor; no data!

#
```
updated: 2023-08-10
hash: 0a7105d5
url: https://github.com/vladmandic/automatic/tree/dev
```
#

Yeah, I'm stumped now.

#

OH wait

#

upcasting is still on

#

Nope.

#

Still same error, sadly.

#

Even without a lora selected, I get the meta tensor error.

#

Guess I'll stick with Model CPU offload for now

coral mulch
#

It seems that with CPU model offload and the refiner model loaded I now get this error.

#

Removing the negative prompt, I now get "IndexError: string index out of range" with the same traceback.

#

I can't even seem to load the model refiner without CPU offload now either

#

Or even use it regardless of what offload method I use.

#

I just get that error.

#

Dev branch moment

#

Went back to main branch. Refiner works.

#

🤷‍♂️

restive parcel
#

original backend, sd1.5 base model

coral mulch
#
```
Negative prompt: Mutated. Disfigured. Multiple Limbs. Disfigured weapon/sword. (((More than one)))
Steps: 20 | Seed: 3147534735 | Sampler: DDIM | CFG scale: 6 | Size: 1080x1080 | Parser: Full parser | Model: sd_xl_base_1.0_0.9vae | Model hash: be9edd61 | Refiner: sd_xl_refiner_1.0_0.9vae | Latent sampler: DDIM | Image CFG scale: 6 | Denoising strength: 0.3 | Refiner start: 0.8 | Secondary steps: 20 | Version: 417ef54 | Pipeline: Diffusers | Operations: "refine | txt2img"
```
coral mulch
restive parcel
#

not sure, I went to bed after

keen marsh
proper cradle
#

@coral mulch everything got merged to master and refiner issue is fixed too

#

Start the webui with --reinstall if you want to use Sequential offload

coral mulch
#

Alright, thank you.

coral mulch
#

Then it updated.

#

Sequential LORAs work now.

#

Thank you, again.

#

@proper cradle With sequential on, I can use 9GB of my VRAM with the LORA to generate 4096x4096 images.

mellow sparrow
#

sequential slows performance down considerably though correct?

coral mulch
mellow sparrow
#

have you found a good sweet spot between vram usage and performance? I have 48gb RAM and a A770 16gb VRAM card

#

I was about to check out that pixel art lora too. Pretty neat

tall grove
#

how much can you get out of just using the vram?

mellow sparrow
#

I've done up to 512x1024, but as soon as I hit 1024x1024 it starts throwing errors... But I haven't been using the low vram flags. I tried once and saw an 11x increase in render time for the same res

tall grove
#

seems a bit low tbh unless sd xl uses that much vram

mellow sparrow
#

comfyui is supposed to be a lot more efficient than auto1111

tall grove
#

idk sd.next seems to have diverged a lot from it

restive parcel
#

sd.next is very different

proper cradle
#

VAE tiling is a must unless you have an Nvidia A100

#

VAE upcasting false = FP16

#

No one should use FP32

#

Attention slicing fixes NaNs above 2032x2032

#

Model shuffling sends unused models to RAM so they won't sit in VRAM doing nothing. No performance hit.
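
As a rough sketch of how settings like these map onto a diffusers pipeline object: the method names below (`enable_vae_tiling`, `enable_vae_slicing`, `enable_attention_slicing`, `enable_model_cpu_offload`, `enable_sequential_cpu_offload`) are ones diffusers pipelines expose, but the helper itself and its argument names are assumptions made for illustration, and SD.Next may wire its own options up differently.

```python
def apply_memory_savers(pipe, offload="none", vae_tiling=True,
                        vae_slicing=True, attention_slicing=False):
    """Call the diffusers memory-saving toggles on `pipe`; return what was enabled."""
    enabled = []
    if vae_tiling:
        pipe.enable_vae_tiling()  # decode the image in tiles
        enabled.append("vae_tiling")
    if vae_slicing:
        pipe.enable_vae_slicing()  # decode a batch one image at a time
        enabled.append("vae_slicing")
    if attention_slicing:
        pipe.enable_attention_slicing()  # compute attention in chunks
        enabled.append("attention_slicing")
    if offload == "model":
        pipe.enable_model_cpu_offload()  # whole sub-models move to the GPU on demand
        enabled.append("model_cpu_offload")
    elif offload == "sequential":
        pipe.enable_sequential_cpu_offload()  # finer-grained offload: lowest VRAM, slowest
        enabled.append("sequential_cpu_offload")
    elif offload != "none":
        raise ValueError(f"unknown offload mode: {offload!r}")
    return enabled
```

The helper takes an already-loaded pipeline, so it does not import diffusers itself; the two offload modes are kept mutually exclusive because the thread treats them as alternatives.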

tall grove
#

Oh that seems more normal.

#

Was beginning to think 16gb was small 😞

#

*too

coral mulch
#

I will state however that without model CPU offload or sequential offload it doesn't really work with Loras yet

#

At least on my side.

coral mulch
mellow sparrow
#

is there a trick to getting sdxl refiner working? I get this error when starting up with refiner.py ModuleNotFoundError: No module named 'sgm'

#

base sdxl model is working fine

proper cradle
#

Don't use random extensions

coral mulch
#

Looks like it decided to work now.

#

No offload, it works.

mellow sparrow
#

Disty: thanks for the link. Will check it out later

pastel geode
#

Any of you tested this?
https://youtu.be/GZLjbTPLCVk

The full list of commands and links can be found on my GitHub: https://github.com/ospangler/intel-arc-stable-diffusion-tutorial

Be sure to check out @Archive-pg2zn 's tutorial at https://www.youtube.com/watch?v=ub9150aOMMc on how to setup the wslconfig file, additional tips, error troubleshooting during Vladmandic installation, and improvements...

restive parcel
#

looks like they're just doing a video tutorial for WSL2 setup?

coral mulch
pastel geode
#

im tempted to try it out since if it doesnt work out in the end, i could just unregister my wsl but im not really familiar with the commands he used

#

and regarding the OneApi toolkit, im curious if it will appear under programs in control panel even though he is installing it via wsl cuz the installation appeared on windows (15:08)

coral mulch
#

I've gotten SDXL already working through Disty's method.

#

No Aivan, I don't think so.

#

It's still within WSL

pastel geode
coral mulch
#

I assume you meant the oneapi basekit GUI right?

#

The reason why that shows up is because he's running the GUI installer for the base kit.

#

It's the same on Windows and Linux

coral mulch
#

WSL2 supports graphical interfaces (WSLg)

#

Disty's method skips that entirely by directly installing what is needed through CLI.

pastel geode
#

interesting cuz i tried ssh-ing to my university’s lab computer using wsl to open a program but no graphical interface appeared. I could use X2Go, but less software, the better. No worries tho!

keen marsh
#

Its a little wonky to get going, but you can run it in native windows now.

novel sphinx
#

i just tried the new openvino version, downloading sdxl now to try it, but for sd1.5 it is blazing fast

grave condor
#

they got 11it/s on A770 https://youtu.be/a28Le2l4MA4 see around 12 minutes.

Generative AI is exploding, bringing potential AI applications that could change everything we do. One example of this recent progress is the release of text processing models, which possess the capability to solve complex problems like passing medical and law exams, akin to human abilities. However, one critical question remains: can we run the...

novel sphinx
#

yeah that's what i achieved, 11.12 it/s

grave condor
#

with 1 image or 4?

novel sphinx
#

1

#

single batch

#

sdxl does not appear to work, although i set gpu in the openvino script settings it infers on the cpu with that model selected

grave condor
#

does it move any of the models onto GPU?

restive parcel
#

11 it/s on arc Inani

novel sphinx
#

yes it works great with any sd1.5 based models

#

1st run is slow because it compiles the model

#

subsequent runs run at just over 11it/s

restive parcel
#

oh it just handles the compiling for me? even more of an improvement over previous openvino xD

novel sphinx
#

yes it has that baked in

coral mulch
#

Imagine SDXL at that speed

novel sphinx
#

works great on windows

#

yeah i mean idk if sdxl will be that fast but if they get sdxl working i would imagine 3-4

coral mulch
#

Well no of course not

broken grail
#

I had 11it/s or so working on arch, sd1.5, before a performance regression with pytorch 2 that brought me to 3 it/s at best

restive parcel
#

i mean, you won't be doing 1024^2 at that speed of course

coral mulch
#

I think I've underestimated sequential CPU offload lmao

novel sphinx
#

openvino is fast and this is pretty easy to configure. the guide in the wiki for a1111 is extremely straightforward, nothing convoluted to do

coral mulch
#

It's slow for single generations

#

But amazing for large batch sizes

#

With model CPU offload, I can do 12 images per batch in 2 minutes

restive parcel
#

sheeesh

novel sphinx
#

with sdxl?

coral mulch
#

Yes.

novel sphinx
#

thats pretty good ngl

broken grail
#

wow

#

~2it/s thereabouts?

coral mulch
#

I'm going to test Sequential now

broken grail
#

assuming 20 iterations per image ig
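
The estimate works out as follows. A quick sketch of the arithmetic, assuming 12 images, 20 steps each, and 120 seconds in total as stated above (the function name is just for illustration):

```python
def iterations_per_second(images: int, steps_per_image: int, seconds: float) -> float:
    """Aggregate sampler steps per second across a whole batch."""
    return images * steps_per_image / seconds


rate = iterations_per_second(images=12, steps_per_image=20, seconds=120)
print(f"{rate:.1f} it/s")  # prints "2.0 it/s"
```

This counts one step for every image in the batch, so it is an aggregate figure rather than the per-step rate a progress bar would show for a batched run.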

coral mulch
#

to see how high I can get on batch size

broken grail
#

I keep getting really weird artifacting past batch size 2

novel sphinx
#

i would imagine a single image is slow for sequential because of how it processes from cpu to gpu, but all the following images would be fast

coral mulch
#

oh yeah uh

#

I forgot to mention

broken grail
#

supersaturated colors and noise

coral mulch
#

that's with sequential LORA on

novel sphinx
#

in wsl i was getting like 2.3 it/s in sdxl so this seems about right

broken grail
#

does ipexrun work for you guys?

#

i have an identical setup to disty, pretty sure, and it's barely working on my machine

#

kinda stinks

coral mulch
#

I have disty's working on my side.

broken grail
#

idk if the bifrost card is any different or if I need to update some microcode or something

coral mulch
broken grail
#

wow you've got all the vram savers on

#

I haven't tried much of sdxl yet

coral mulch
#

Indeed

broken grail
#

what's the typical performance penalty with that suite?

coral mulch
#

I didn't show that either huh

#

I'm sacrificing speed for sheer image generation

broken grail
#

also: anyone notice any significant difference between bf16/fp16?

coral mulch
#

So I just tested

#

I can do Batch size 24 on Sequential

restive parcel
coral mulch
#

The higher the image batch size you can get, it seems you get closer to actual image gen performance

#

However it IS slower than Model CPU offload

#

Ope, it's lower than that.

#

16 seconds.

broken grail
coral mulch
#

Then the real question is

#

What's the best batch size to generation speed?

broken grail
#

hm

#

I would test it but keep getting numerical instability past batch size 2

restive parcel
broken grail
#

anyway I'd imagine you'd get discontinuities whenever you need to kick on another vram saver

coral mulch
#

It did it.

#

1280x720 images.

#

Zero prompt with just negatives generates some interesting outcomes.

broken grail
#

why so gray?

#

oh zero prompt

#

I would occasionally get really gray results with second pass

#

kinda had a faded look. pretty cool

#

annoying though

restive parcel
broken grail
#

hm

#

try disabling ipexrun if you haven't already

#

what flavor errors are you getting

coral mulch
#

im still on 22.04LTS

broken grail
#

anyway i gotta go to sleep

#

hopefully i can get my litany of errors sorted out in the coming days

#

i was nagging disty on github since they were the only other person I knew about running sd on arc but now I've found this server so things should be smoother

coral mulch
#

I love how despite not having any prompts

#

Somehow it still puts together a coherent image on its own

#

This model blows 1.5 out of the water

novel sphinx
#

okay correction, only the 1.5 base model seems to work properly with the openvino implementation, whilst other models will work and generate an image, im assuming the openvino compilation pipeline messes the models up as other models just output complete garbage

coral mulch
#

Just a lil' question

#

What do you have to do to get ComfyUI operating on Arc?

keen marsh
#

Also, sdxl didn't work for me, but I didn't really know what i was doing. 1.5 worked fine though

coral mulch
#

I'm trying to get the SD.Next ComfyUI Extension to work lmao since it's literally just ComfyUI

keen marsh
#

I wonder if this could work for sdnext as well. Not a fan of automatic since there's never native support for these platforms, it's always a fork that may never get maintained etc.

novel sphinx
#

the openvino fork apparently doesn't work with other scripts and such so i don't think it would work with sd.next as it's heavily modified

#

this will likely change in the future as development continues but its nice to have an easy to use webui version using openvino which is the fastest on arc by far

ember orchid
#

We have a thread A1111 for Arc on here
#1141164275990278206 message

coral mulch
#

oh hey it works

#

SDXL ComfyUI extension does indeed work

restive parcel
#

nice

keen marsh
#

Also, man a few months ago i dont think i imagined so many options for arc gpus. Coming along fast imo

coral mulch
#

@keen marsh Literally the same it seems. At least in terms of normal non-offload running, 1 IT a second basically (with LORA)

#

Then again this IS just the extension

#

It's using all the same packages my main venv is using

coral mulch
#

It uses A LOT of VRAM though it seems

#

It's nowhere near as optimized

#

Yeah nvm I can barely run it lol

#

It runs for the first two images then explodes

#

Well at least I got a lil' taste of ComfyUI, and I don't really like it tbh.

#

🤷‍♂️

keen marsh
#

Yeah not a big fan either, it was slower for me with sd1.5 too. Probably not bad when you get over the learning curve though, seen people make very fine-tuned iterations pretty quickly with it.

proper cradle
#

But It's slower than no compile at all on my end

#

And it uses more VRAM

proper cradle
chrome bone
#

i think its simply because of precision conversion somewhere in the pipeline, bf16/fp16 shdnt affect speed per se

#

just guessing though

keen marsh
#

if 11it/s is possible, maaaan lol

broken grail
#

I noticed compile was slower for one off generations but would speed up with larger batches and consecutive runs

broken grail
keen marsh
#

I will check out this a1111 fork and look into sdnext's openvino backend on my system maybe today.

broken grail
#

search a770

keen marsh
broken grail
#

no

keen marsh
#

or does that benchmark work on a1111?

broken grail
#

that was ipex

#

in sdnext

#

invokeai optimizations

keen marsh
#

i've only ever gotten 6 it/s, but I do have an a750, maybe it's not possible on it

proper cradle
#

Diffusers is way faster than original backend

#

OpenVINO WebUI uses diffusers too

keen marsh
#

ahhh, okay. I never tried that in linux

broken grail
#

my 11it/s was without diffusers iirc

#

or maybe not

#

idk

#

don't remember

keen marsh
#

my native windows ipex is a percentage slower than linux right now, the self compiled one with AOT anyway

proper cradle
broken grail
#

whatever sdnext defaults to

proper cradle
broken grail
proper cradle
#

I get 8.3 it/s at 512x512 on original backend with FP16

novel sphinx
#

Native windows with openvino a1111 fork sd1.5 512x512 is 11.2 it/s

broken grail
proper cradle
#

Without

broken grail
#

I was using compilation warmup and batching

#

That's why

#

Anyway diffusers/vino is faster than original/ipex nowadays?

keen marsh
broken grail
#

I wonder if zen kernel is causing me trouble

#

gotta get ipexrun fixed

novel sphinx
#

yeah they dont seem to work right, the fork automatically converts them when it does its model compile, you get generations but the outputs dont match what the model is for

proper cradle
#

OpenVINO actually slows things down in SDNext

novel sphinx
#

For example an anime model will still output real life images more like what the 1.5 base model would produce

proper cradle
novel sphinx
#

Albeit it seems like if a model is say trained more for nsfw content that seems to stick just not the models desired style

keen marsh
broken grail
keen marsh
#

Not sure if that fork is maintained , which is why i like SDNEXt tbh

broken grail
#

maybe it's not loading an embedded vae

keen marsh
#

Vae shouldn't affect the style that much? mostly color output

broken grail
#

ehh

keen marsh
#

Also, make sure you use clip skip

novel sphinx
#

i also notice far better results using dpm++ 2m karras vs euler a using the openvino fork. This vae thing is possible. I haven't done a lot of extensive testing yet but it is nice to have a solution that works natively on windows

broken grail
#

fine details are all vae

novel sphinx
#

Is clip skip an extension?

broken grail
#

it's a setting

novel sphinx
#

Okay

keen marsh
#

No, you can set it in the options.

broken grail
#

you discard the last n layers of CLIP

keen marsh
#

Most anime models need clip skip 2, most realistic models need 1.

broken grail
#

essentially it weakens guidance
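
A minimal sketch of what the clip skip setting selects, assuming the A1111-style convention where clip skip 1 uses the final CLIP text-encoder layer and clip skip 2 uses the one before it. `pick_clip_layer` is a hypothetical helper, and the list of strings only stands in for real hidden-state tensors.

```python
def pick_clip_layer(hidden_states, clip_skip=1):
    """Return the text-encoder output that a given clip skip value would use.

    hidden_states: per-layer outputs ordered from first layer to last layer.
    clip_skip=1 keeps the last layer, clip_skip=2 the second to last, and so on.
    """
    if not 1 <= clip_skip <= len(hidden_states):
        raise ValueError("clip_skip must be between 1 and the number of layers")
    return hidden_states[-clip_skip]


# Stand-in for the 12 transformer layers of the SD 1.5 CLIP text encoder.
layers = [f"layer_{i}" for i in range(1, 13)]
print(pick_clip_layer(layers, clip_skip=1))  # layer_12
print(pick_clip_layer(layers, clip_skip=2))  # layer_11
```

Under this convention, clip skip 2 drops only the final layer, which is why the prompt conditioning changes without the image falling apart.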

novel sphinx
#

Do note that openvino fork disables all other scripts other than the openvino acceleration script

broken grail
keen marsh
#

I find that it doesn't make THAT big a difference though, it just changes the image, the style is usually the same. Sometimes stuff like "masterpiece" can make frames around the image in some models lol

#

Could be different in openvino though

chrome bone
#

it should have a big impact on the image generated, not just colors (though thats probably what you see in practice). A VAE decoder is just a neural network that converts a latent image (that humans cannot comprehend) back to pixel space

keen marsh
#

It's worth a shot. Vae might need to be converted to openvino as well right? Not sure if the fork does that

proper cradle
#

It will run on the CPU otherwise

chrome bone
#

that was the case a few months back (and honestly i think it still is), you cannot just use custom models without conversion

novel sphinx
#

So you think i should set clip skip to 2? It defaults to 1

keen marsh
#

You can also add it to the main page in the options so you don't have to go to settings all the time.

#

sdnext already does that for you btw

novel sphinx
#

yeah setting clip skip and such seems to not change anything, definitely should not be getting like real life photorealistic images from Counterfeit but here we are

proper cradle
proper cradle
#

Beats RTX 3070 at batch size 16

#

Pretty close to a RTX 3070 Ti

novel sphinx
#

Im not sure but using separate vaes has no effect on the image so they're being ignored

keen marsh
#

I had an issue where certain vaes didnt seem to work in diffusers, also some samplers made garbled output

coral mulch
#

Any news on High-res fixes yet for Vladmandic?

#

I'm kinda itching for it.

pastel geode
#

So I finally decided to try https://www.technopat.net/sosyal/konu/using-stable-diffusion-webui-with-intel-arc-gpus.2593077/
on a clean Ubuntu wsl but it appears that it doesn't let me **load my weights**. Is there a way to resolve this?

ember orchid
#

A1111 OpenVino solution already has a fix for "Restore Faces" update soon

pastel geode
coral mulch
proper cradle
#

Or you can try xpu_VISIBLE_DEVICES env variable

coral mulch
#

🤔

pastel geode
#

once I do that, do I just run ./webui.sh --use-ipex?

ember orchid
#

FYI I fixed my issue with the A1111 OpenVINO solution by reinstalling my driver, disabling my RTX and reinstalling

proper cradle
#

Try 1 or 0

proper cradle
pastel geode
pastel geode
#

0

coral mulch
#

The number is the GPU ID.

proper cradle
#

use 1

coral mulch
#

Use 1.

broken grail
pastel geode
proper cradle
#

ipexrun things

broken grail
#

hey that's my error

#

I think

proper cradle
#

don't use --use-ipex to disable ipexrun

coral mulch
#

It was broken in SDXL.

broken grail
#

oh you mean SDXL second pass

#

ok

coral mulch
#

...

#

No, I really don't.

proper cradle
broken grail
#

"hires fix"?

proper cradle
#

It's the exact same thing

broken grail
#

thus saving a round trip through the VAE

proper cradle
#

And latent upscaling was generally bad in my experience

pastel geode
#

without ipex

proper cradle
#

Try disabling iGPU from the BIOS

pastel geode
#

Okay, and what command should i run after?

broken grail
proper cradle
coral mulch
#

Would it even be remotely possible to generate a 4096x4096 image on SDXL without artifacting or duplicating?

broken grail
#

with lots of manual intervention absolutely

proper cradle
broken grail
#

my trick for super duper resolution stuff is to generate at multiple "scales"

#

and stick things together

proper cradle
broken grail
#

granted it's inconsistent and only works well for niche things

#

unless you mean directly

proper cradle
#

if you manage to get a decent 2048x2048 image, you can upscale it

#

Generating at 1920x1080 and upscaling to 3840x2160 works well:

coral mulch
#

Would you second-pass upscale it?

#

Or just go into extras

proper cradle
#

Img2Img

coral mulch
#

Bruh.

proper cradle
#

Re-generate

#

Upscaling from extras isn't good

#

2048x3072 with Lora:

pastel geode
proper cradle
#

Selected model not found?

pastel geode
#

yep

#

checkpoint*

proper cradle
broken grail
#

also as a tip when upscaling via img to img it's often beneficial to include more close-up related things in your prompt, since you're essentially running the model on small areas at a time

#

the extreme case of this is to upscale first via simple interpolation, then inpaint areas one by one to add more detail

#

this process could be carried out forever, in theory, especially with ControlNet to keep the model in line

#

but it's hilariously labor intensive
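
A rough sketch of how the "one area at a time" approach could be planned, assuming fixed-size square tiles with some overlap so seams can be blended. `tile_boxes`, the 1024 tile size, and the 128 pixel overlap are illustrative choices, not anything the thread or SD.Next specifies.

```python
def tile_boxes(width, height, tile=1024, overlap=128):
    """Return (left, top, right, bottom) boxes that cover the whole image.

    Neighbouring boxes share at least `overlap` pixels so seams can be blended.
    """
    if overlap >= tile:
        raise ValueError("overlap must be smaller than tile")
    step = tile - overlap

    def starts(size):
        if size <= tile:
            return [0]
        positions = list(range(0, size - tile, step))
        positions.append(size - tile)  # last tile sits flush with the edge
        return positions

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]


boxes = tile_boxes(3840, 2160)
print(len(boxes), boxes[0], boxes[-1])
```

Each box would then be inpainted on its own with a close-up oriented prompt, which is where the labor comes from.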

proper cradle
broken grail
#

oh you're just directly tossing it in?

#

hot damn

proper cradle
#

Yep Img2Img it in one go

broken grail
#

what's the vram limited resolution on that one?

#

I'm guessing that's with all the vram savers on?

proper cradle
#

With Attention Slicing and VAE Tiling and Model Shuffling, A770 16GB is VRAM limited to 4096x4096

broken grail
#

nice

proper cradle
#

Only VAE decode is Tiled

#

--lowvram and --medvram (aka cpu offloading) is disabled
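
For anyone scripting diffusers directly rather than going through the UI, two of the savers named above map onto pipeline methods that diffusers provides on its Stable Diffusion pipelines: `enable_attention_slicing()` and `enable_vae_tiling()`. The sketch below is a hedged illustration only: `apply_vram_savers` and `FakePipe` are hypothetical names, "Model Shuffling" is not modeled, and how SD.Next wires these options internally is not shown in this thread.

```python
def apply_vram_savers(pipe):
    """Turn on the memory savers a pipeline object exposes; report which ones ran."""
    enabled = []
    for method in ("enable_attention_slicing", "enable_vae_tiling"):
        fn = getattr(pipe, method, None)
        if callable(fn):
            fn()
            enabled.append(method)
    return enabled


class FakePipe:
    """Stand-in so the sketch runs without downloading a model."""

    def __init__(self):
        self.calls = []

    def enable_attention_slicing(self):
        self.calls.append("attention_slicing")

    def enable_vae_tiling(self):
        self.calls.append("vae_tiling")


print(apply_vram_savers(FakePipe()))
```

With a real pipeline you would pass the loaded pipeline object instead of `FakePipe()`.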

ember orchid
coral mulch
#

Which is what I have currently set up.

#

👍

#

Nvm I answered it myself.

#

I'm an idiot lmao

coral mulch
#

@proper cradle When resizing, do you just use Resize Fixed

proper cradle
coral mulch
#

Thank you.

broken grail
#

well I got ipexrun working

#

it was model compile

coral mulch
#

Yeah, img2img makes a HUGE difference in quality.

#

I am very pleased with the outputs.

proper cradle
#

Pushed the Windows fix by @paper horizon, can someone test it?

#

Also is it detecting OneAPI if you don't use --use-ipex?

novel sphinx
#

I remember previously, when ipex for windows was first released and i tried it, it was detecting oneapi without the --use-ipex

novel sphinx
#

OSError: [WinError 126] The specified module could not be found. Error loading
"C:\Users\KingOfMemes\automatic\venv\lib\site-packages\torch\lib\backend_with_compiler.dll" or one of its dependencies.

#

detects oneapi without --use-ipex tho so the autodetection works fine, i have tried launching with and without --use-ipex and done --reinstall just to make sure but still no luck on windows at the current moment

#

this happens during the install as well

proper cradle
proper cradle
paper horizon
#

you have to do that in a conda environment as ipex tutorial says

grave condor
#

I managed to get it working without a conda env and just installed it into my system pip

#

but I am using the VSCode oneAPI env setup, which might rely on conda under the hood

#

I would consider conda a dependency

novel sphinx
#

Yes i saw the incompatible torch version and then it reinstalled

paper horizon
#

it's okay to not use conda as long as uv.dll is in your library path. conda install libuv just simplifies things
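
A quick way to check the "uv.dll is in your library path" condition is to scan the directories on PATH for the file. This is only a first diagnostic sketch: `find_on_path` is a hypothetical helper, and a DLL being on PATH does not by itself guarantee that Python's loader will pick it up.

```python
import os


def find_on_path(filename, search_path=None):
    """Return every directory in a PATH-style string that contains `filename`."""
    if search_path is None:
        search_path = os.environ.get("PATH", "")
    hits = []
    for directory in search_path.split(os.pathsep):
        if directory and os.path.isfile(os.path.join(directory, filename)):
            hits.append(directory)
    return hits


if __name__ == "__main__":
    found = find_on_path("uv.dll")
    print(found if found else "uv.dll not found on PATH")
```

Run it from the same prompt you launch SD.Next from, so it sees the same environment.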

keen marsh
#

Anybody gotten sequential offload to work on native windows in sd.next?

#

I do have torch, ipex and torchvision compiled from source. I guess i could upload the wheels.

#

I want to try and compile the specific git# that intel used to see if speed in native increases or if aot just makes it slower, though it takes hours with aot.

chrome bone
#

you can just git checkout at specific commit

#

i doubt its working well currently.. theres no reason to not upload a functioning prebuilt wheel file otherwise

keen marsh
#

you edit the compile.bat with the git# where it has "xpu-master". Also, xpu-master adds a file call that doesn't exist in pytorch, but that is easily fixed by commenting out one line. I use Vipitis's .bat file still

#

I also edited a compile file for just Ipex

#

I have the wheels uploaded in this thread somewhere, but those you have to edit one file. If you compile from the xpu-2.X it works without the need to edit, and that's where the git# they use is from. It's from way back on july 25th (my birthday btw lol)

#

for sd.next this installed for me without any error in windows if you want to use the prebuilt wheels " torch==2.0.0a0 torchvision==0.15.2a0 intel_extension_for_pytorch==2.0.110+gitba7f6c1 -f https://developer.intel.com/ipex-whl-stable-xpu "

novel sphinx
#

yes it all installs fine, always did tbh, still havent gotten it to actually launch within windows

#

OSError: [WinError 126] The specified module could not be found. Error loading
"C:\Users\KingOfMemes\automatic\venv\lib\site-packages\torch\lib\backend_with_compiler.dll" or one of its dependencies.

#

this is the error i now receive

keen marsh
#

sorry edited

keen marsh
#

I just woke up so i am slow right now give me a min to check

#

yeah, that is the file that doesn't exist in pytorch do this

#

2.) Locate the __init__.py file in your pip-installed intel extension for pytorch folder:

"your_python_directory\Lib\site-packages\intel_extension_for_pytorch\__init__.py"

3.) Comment out line 100:

#from . import _inductor

#

should work after that
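
The manual edit above can also be scripted. The sketch below matches the import by its text instead of relying on it being line 100, since the line number may differ between builds; `comment_out_import` is a hypothetical helper, and you should back up the file before running it.

```python
from pathlib import Path

TARGET = "from . import _inductor"


def comment_out_import(init_file, target=TARGET):
    """Prefix the first uncommented line equal to `target` with '#'.

    Returns True if the file was changed, False if no such line was found.
    """
    path = Path(init_file)
    lines = path.read_text(encoding="utf-8").splitlines(keepends=True)
    for i, line in enumerate(lines):
        if line.strip() == target:
            indent = line[: len(line) - len(line.lstrip())]
            lines[i] = f"{indent}#{line.lstrip()}"
            path.write_text("".join(lines), encoding="utf-8")
            return True
    return False
```

A `False` return means the import is absent or already commented out, in which case this fix does not apply to that install.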

#

It is in xpu-master for some reason, but it is not in the xpu 2.X branch

#

If you compile from the git hash or the specific xpu2.x branch it doesn't exist

novel sphinx
#

im not seeing that in my file

keen marsh
novel sphinx
#

yeah my version already doesn't have that line

#

from the prebuilt wheel

keen marsh
#

hmm...everything running from oneapi environment? Call all variables etc "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
"C:\Program Files (x86)\Intel\oneAPI\mkl\2023.2.0\env\vars.bat"
"C:\Program Files (x86)\Intel\oneAPI\compiler\2023.2.0\env\vars.bat"

novel sphinx
#

correct

keen marsh
#

Running in Conda?

paper horizon
#

launch SD.Next by python launch.py --use-ipex

keen marsh
#

Forgot to mention, conda is necessary for me

#

I actually had to replace my python with it, as RC posted in this thread

paper horizon
#

I've checked the dependencies of torch and it requires libuv

keen marsh
paper horizon
#

miniconda3\envs\{env_name}\Library\bin\uv.dll

keen marsh
#

then copy the folder to your VENV in automatic

#

that wheel also does not need 15minutes to start, it is compiled with AOT so starts right away. But it is 2it/s slower than normal in original backend for some reason

#

Which is why I may try and compile from that git# but I don't really feel like spending another day on it lol

paper horizon
keen marsh
#

lol, took 4-6 hours

#

does 90% then when it gets to cmake cpu goes to 15% and it takes HOURS

paper horizon
#

may I know your CPU and RAM?

keen marsh
#

5600, 32gb of 3200

proper cradle
keen marsh
#

I have another wheel you don't need to edit a file, but too lazy to upload tbh lol

novel sphinx
#

even with libuv and launching from conda still get same error

keen marsh
keen marsh
#

I don't use Python for anything else so It didn't matter to me.

paper horizon
keen marsh
#

I think the conda python and system python need to be the exact same or something

#

there is also some reference to conda in one of the compile.bat files I think

#

I could only get python 3.10.6 when conda is like 3.10.12 so that may be why

proper cradle
#

What does this return when you run it in the webui env?
pip show torch

novel sphinx
#

Name: torch
Version: 2.0.0a0+gitc6a572f
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: [email protected]
License: BSD-3
Location: c:\users\kingofmemes\automatic\venv\lib\site-packages
Requires: filelock, jinja2, networkx, sympy, typing-extensions
Required-by: accelerate, basicsr, clean-fid, clip, clip-interrogator, compel, facexlib, gfpgan, invisible-watermark, kornia, lpips, open-clip-torch, pytorch-lightning, realesrgan, timm, tomesd, torchdiffeq, torchmetrics, torchsde, torchvision
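
The same check can be done for all three packages at once from Python. This is a sketch under assumptions: the pins are the versions quoted earlier in this thread with their local build tags (the `+git...` part) stripped, and `check_pins` and `public_version` are hypothetical helpers built on the standard `importlib.metadata` module.

```python
from importlib import metadata

# Pins quoted earlier in the thread for the prebuilt Windows wheels,
# with local build tags such as '+gitba7f6c1' removed.
PINS = {
    "torch": "2.0.0a0",
    "torchvision": "0.15.2a0",
    "intel_extension_for_pytorch": "2.0.110",
}


def public_version(version):
    """Drop a local build tag such as '+gitc6a572f'."""
    return version.split("+", 1)[0]


def check_pins(pins=PINS, lookup=metadata.version):
    """Map each package to 'ok', 'missing', or the mismatching installed version."""
    report = {}
    for name, wanted in pins.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if public_version(installed) == wanted else installed
    return report
```

Calling `check_pins()` inside the webui venv reports on whatever is installed there; the `lookup` parameter exists only so the function can be exercised without those packages.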

paper horizon
#

did you launch sd.next with webui.bat or python launch.py? Try the latter and use python from your conda env with libuv

novel sphinx
#

i have tried that same dll error

paper horizon
#

(maybe delete the venv directory too

novel sphinx
#

and verified libuv is installed with conda list

paper horizon
#

just recorded my steps

novel sphinx
#

i will try full reinstalling through conda and seeing what happens

keen marsh
paper horizon
#
Python 3.10.12
keen marsh
#

Also, how do you update python 3 to the latest version in windows without installing 3.11

paper horizon
#

yes

#

conda create -n {env_name} python=3.10

keen marsh
#

I could not for the life of me figure out how to update from 3.10.6, they stopped uploading install files

paper horizon
#

it gave me 3.10.12

keen marsh
#

I mean your python outside of the venv

#

My theory is that they need to be the same to work, as I couldn't get conda to work with ipex while python was installed in my path

paper horizon
#

I didn't install another python other than from miniconda

keen marsh
#

Now i can run it without conda as well btw.

#

so no python3 on your system already?

paper horizon
#

yes

keen marsh
#

Okay, yeah that's the same for me then. Only Vipitis seems to have gotten it to work with python 3 installed

#

although I never tried 3.11

paper horizon
#

someone tried 3.11, and IIRC, SD.Next doesn't support 3.11

#

lol

proper cradle
keen marsh
#

Also, have you gotten sequential cpu offload to work in windows? @paper horizon

#

It did work with native wheel IIRC, but it doesn't with my prebuilt one for some reason. It may have never worked though and I am misremembering

novel sphinx
#

ok installing and configuring thru conda has been successful thus far

broken grail
#

out of curiosity, has anyone gotten any kind of training working on 1.5 or XL?

#

recently

#

I think disty had a patch for inversion for 1.x torch, but it would crash after a few iterations for me

novel sphinx
#

it is not inferencing on windows

#

WARNING Torch FP16 test failed: Forcing FP32 operations: Tensor on device meta is not on the expected
device xpu:0!

#

got this

#

hitting generate doesnt throw any errors but literally nothing happens

#

no activity on cpu or gpu

#

and my igpu is disabled so thats not the issue

#

C:\Users\KingOfMemes\anaconda3\envs\sdnext\lib\site-packages\numba\np\ufunc\parallel.py:371: NumbaWarning: The TBB threading layer requires TBB version 2021 update 6 or later i.e., TBB_INTERFACE_VERSION >= 12060. Found TBB_INTERFACE_VERSION = 12020. The TBB threading layer is disabled. this also happens when launching

grave condor
#

in Task Manager, the GPU activity is hidden under the "compute" graph

novel sphinx
#

oh wait i lied this must be long aot thing people have talked about i see some progress happening now

keen marsh
#

The prebuilt wheels will take about 10-15 minutes on the first inference. You have to restart the ui each time you change diffusers i think.

novel sphinx
#

original backend does not work on windows for me i get api errors diffusers seems to be working

#

i see okay

#

guess ill just let it warm up

keen marsh
keen marsh
grave condor
#

I kinda want to try compiling with AOT for python 3.9 but don't really want to spend 6 hours... and my CPU is even older

#

did you modify the script to just build ipex and use the torch prebuilt wheel instead?

proper cradle
keen marsh
#

Also, you exchange slow startup for slower inference speed, but it's still fast enough IMO. Especially with diffusers

novel sphinx
#

Well, i got an image to generate after the long startup

#

Was close to 10 it/s

#

Then it crashed and locked my whole pc up

grave condor
#

I believe you can call torch.compile or torch.jit.trace to get better inference performance down the road

#

but I need it to just quickly work for developing this stuff

keen marsh
#

well, google restricted my wheel file for whatever reason

#

Just going to delete it I guess, not worth the review. Not sure many used it anyway.

grave condor
#

I will just give it a try and let it run for a while

keen marsh
#

I sent it for review, just in case google thinks I'm hacking people or something lol.

grave condor
chrome bone
#

i think he means use this branch instead of xpu master

grave condor
#

I got a CMake Warning: Manually-specified variables were not used by the project: and then it lists a few things as well as the USE_AOT_DEVICES which I added. I hope that's just a warning that remains from defaults they used. Don't want to end up with the same JIT variant in two hours

chrome bone
#

it is bound to happen

grave condor
#

do we know if there is a trick to check?

#

is there like ipex.aot_enabled()?

chrome bone
#

i..dk. the only one who got it to compile is aaron

keen marsh
grave condor
#

at the top?

keen marsh
#

yes

#

how i got it to compile aot was like this "compile_bundle2.bat 1 2 ats-m150 " (bundle 2 is my edited bat for just ipex)

#

this is using the bat you edited earlier btw

#

also, I did this in conda in the oneapi environment etc

#

outside of conda it failed

#

I recommend trying the specific git# they used tbh, I'm hoping it is faster as I'm not sure if AOT makes it a bit slower or they changed something in the code since then

#

Also it pulls a lot more warnings than when using xpu-master, but it works the same in the end

grave condor
#

I now hope my compilation finishes successfully but I will look at the changes.

keen marsh
#

No doubt, if you used xpu-master then you will need to edit the __init__.py file and comment out that line that pulls the error

grave condor
#

30 minutes in and it's 488/1049 so an hour seems reasonable

keen marsh
#

Really not sure what it's trying to pull from pytorch, but it doesn't exist

#

Oh, if you are using AOT, it will hit 1047 and take about 4 hours from there lol

#

They acknowledged this on the github as well, and say they are trying to fix it.

#

If the cmake exe is running it is still compiling, cpu was around 15% at that point
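
The "an hour seems reasonable" guess comes from a straight-line extrapolation of the step counter, sketched below. `linear_eta_minutes` is a hypothetical helper; the point is that it assumes every step costs the same, which the AOT stage described above breaks, because the last couple of steps alone can take hours.

```python
def linear_eta_minutes(done, total, elapsed_minutes):
    """Estimate total build time assuming every step costs the same."""
    if done <= 0:
        raise ValueError("need at least one finished step")
    return elapsed_minutes * total / done


# Progress reported in the thread: step 488 of 1049 after 30 minutes.
estimate = linear_eta_minutes(488, 1049, 30)
print(f"linear estimate: {estimate:.0f} minutes")
```

So treat the step counter as a progress indicator for the early part of the build only, and budget separately for the AOT tail.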

grave condor
#

so I am hopeful

keen marsh
#

oh nice, I haven't checked since I compiled

grave condor
#

they fixed some stuff in the file inside the release branch

#

I grabbed the script from GitHub today so I should have all the fixes

keen marsh
#

yeah I see it now, this should work

#

No need to edit anything afterward

grave condor
#

I didn't throw out my old ipex version. But there is a force_reinstall. As long as I get the wheel files it should be good

keen marsh
#

You can install the prebuilt wheel over it fine, the wheel you compile will be inside the Dist folder

grave condor
#

I got so many warnings by now haha

keen marsh
#

yeah, lol

grave condor
#

90 minutes and I am on 775

#

I need that i9 in my next build

chrome bone
#

arrow lake next year

#

built on 20a node

#

would be a shame if you decided to go 14900k

grave condor
#

yeah, it doesn't sound like the smartest decision

#

but I have waited long enough and there will always be a next gen.

grave condor
#

got to 1047 in around 2 hours

keen marsh
#

oof, I would plan to add a couple hours from my estimate. Took maybe an hour or less for me to get there, don't think it took that long.

grave condor
#

Successfully installed intel-extension-for-pytorch-2.0.110+git509a378
it took in total around 5 hours 30 minutes, and step 1047 was reached after 2

#

let's hope this works

grave condor
#

it does work, still had a short wait on first inference but it is fast enough to be useful without any specific tweaking. Well worth the 6 hours.

tall grove
#

This wheel file would work for anyone?

#

If so weird they haven't done aot version yet

keen marsh
#

Probably don't want to compile for 6 hours, it's slower too so they may be trying to figure that out as well as decrease the compile time.

grave condor
#

The wheel I have is for python39

#

seems like the last two hours is just ocloc.exe running

keen marsh
#

apparently there are controlnet models that work with sdxl but only in comfyui right now

broken grail
#

I can't seem to get img2img upscale on diffusers working

#

It just makes the images noisier (?)

#

hmm, might only be occurring at higher resolutions

#

are these related to VRAM usage? is there some sort of soft cap I'm hitting that drops quality?

#

hmm, got it working at 1.9 scale...bet this is just a 1024 issue again

#

wonder why 1024 res is so cursed

restive parcel
#

especially since that's the exact resolution its meant to work best on

proper cradle
#

Too low and it will be noisy

#

Too high and it will change the image too much

broken grail
#

no, it gets more noisy

#

if you watch the interim images it goes from noisy to noisier

proper cradle
#

Base res?

broken grail
#

uh

#

I think it was like 1024x1280

#

probably 1024 issue

#

related

proper cradle
#

So hires is 2048x2560?

#

1024 curse shouldn't happen at 2048

broken grail
#

no sorry base was 512x 640

proper cradle
#

Yep, probably 1024 curse

broken grail
#

that's so odd

proper cradle
#

try 1080

broken grail
#

I did 1.9x scaling and that fixed it

#

I wonder what the cause of that 1024 bug is
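
To see why 1.9x sidesteps the problem for this base size, here is the arithmetic as a small sketch. It assumes target sides are rounded down to a multiple of 8, which is a common constraint for Stable Diffusion resolutions but not something confirmed in this thread; `hires_size` and `touches_1024` are hypothetical helpers.

```python
def hires_size(base_width, base_height, scale, multiple=8):
    """Scale a base resolution and round each side down to a multiple of `multiple`."""

    def snap(value):
        return int(round(value * scale)) // multiple * multiple

    return snap(base_width), snap(base_height)


def touches_1024(size):
    """True if either side of the target lands exactly on 1024 pixels."""
    return 1024 in size


for scale in (2.0, 1.9):
    size = hires_size(512, 640, scale)
    print(scale, size, "hits 1024" if touches_1024(size) else "avoids 1024")
```

With a 512x640 base, 2x lands on 1024x1280 while 1.9x lands just under 1024 on the short side, which matches what was observed above. It says nothing about the underlying cause.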

proper cradle
# broken grail FWIW

Also enabling both move base options will save 6 GB VRAM without any performance loss

proper cradle
#

Or using refiner

broken grail
#

Sure

#

I take it latent upscale is also broken on diffusers right

proper cradle
#

Also hires is working on the dev branch

broken grail
#

sweet

broken grail
#

there's no ControlNet on diffusers, right? What would it take to get working?

grave condor
#

there is StableDiffusionControlNetPipeline directly in diffusers. I used it today for a project

keen marsh
grave condor
keen marsh
#

you can use controlnet in ComfyUI btw. Both controlnet and controlnet loras

coral mulch
keen marsh
#

only tried in native windows though

open sundial
#

And if I wanted to make it 8K resolution, it would take ~800 seconds

#

Does Vlad produce 4-8k imagery without VRAM errors?

#

on 10GB VRAM?

#

Also Enjoy my new LORA for horror style 😄

keen marsh
proper cradle
open sundial
#

Would be nice to see how it runs on the lower VRAM cards

proper cradle
#

SDXL 1024x1024 can run on 2GB GPUs with --lowvram

open sundial
#

Fair. That's pretty decent. I'm on a 10GB VRAM + 16GB of system ram and producing 4k images using only Tiled VAE, A larger than average Page/Swap file and fp16 precision.

#

I've produced 8k and larger, but it just take 10 minutes plus

proper cradle
open sundial
#

That is unfortunate considering that FP16 is only useful for low end GPU's really

#

I mean, if it weren't for the lag that it causes my system, I'd still be using the fp32 vae myself

open sundial
#

Not if you have a card pre-RTX

#

Bf16 is RTX only

#

For nvidia anyway

proper cradle
#

Intel ARC defaults to BF16 on SDNext

open sundial
#

Awesome, because BF16 is the most optimum for SDXL

proper cradle
#

BF16 generally runs faster on ARC than FP16

open sundial
#

Interesting.

#

I released a BF16 LORA yesterday, then found that all GTX users need fp16 lol

#

so I had to put out 2 versions lol

#

Has anyone done any Arc A380 testing then?

With StableSwarm now a thing, the opportunity to generate multiple batches of images, per graphics card, simultaneously is now a thing.

#

Does anyone have any experience using StableSwarm with multiple GPU brands?

proper cradle
#

Splitting batches across multiple GPUs was already a thing?

proper cradle
open sundial
#

And it's just more accessible in general than older methods

#

I have an A380 as a secondary and I'm just checking what options I have for potential workflow improvements

onyx moth
#

hello, I tried OpenVINO for a while and it's not quite there yet, comfy looks like it still has some issues with it too. can someone direct me to a solid guide for sd.next? I want to run 1.5 and eventually sdxl. thanks!

#

also if theres some easy to follow documentation on what it can and cant do I would love to see it. Intel Arc A770 16GB

#

would I just follow the instruction on the pinned post in this thread? edit: it looks like this is the way

keen marsh
onyx moth
proper cradle
coral mulch
keen marsh
#

If you mean native windows, then it's way more complicated as you have to also compile it yourself. If you want to use IPEX go for wsl2, but the install is a lot more involved than openvino

#

If you don't compile, you have to wait 10-15 minutes before your first generation, but after that it is pretty fast

onyx moth
#
#

Its the same right?

proper cradle
#

Run;

sudo apt install libjemalloc-dev
onyx moth
#

This is where I'm at can I run that after it's done

#

Thanks man.

#

I ran libjemalloc, did I miss something?

grave condor
#

I am writing a small controlnet app and trying to run it on my A750, but it is really slow. What is the trick to speed up inference with the different models? Or which file do I need to look at to find the solution?
As this will be hosted, it needs to be device agnostic.

onyx moth
#

was I supposed to install libjemalloc in automatic folder? I did it after going cd

keen marsh
# onyx moth

it's best to use webui.py to run in the venv, not sure if that's causing your error though

pastel geode
# onyx moth

I got this error before.
./shared/source/os_interface/os_interface.h
Do you happen to have your igpu enabled? Check to see if you have 2 gpus in task manager. I managed to fix mine after disabling igpu multimonitor on my asus motherboard.
After that, try running it again. If it doesnt work, Reinstall.

onyx moth
#

oh, this? gpu 0

pastel geode