#ComfyUI for Intel Arc using IPEX

earnest grotto
#

Wonder if it's swapping more blocks or if wan just scales like that, or both

reef ivy
#

Have you tried kijai's nodes? You can specify the block swap amounts there

earnest grotto
#

I can specify when launching comfy too

earnest grotto
#

also qwen image edit model released

reef ivy
#

How do you do that? Although with the nodes you don't have to restart the webui

earnest grotto
#

huh, I swear there was a launch argument to do that but I guess not. I must've mistaken it with kohya as that does have block swapping

#

I guess I might try kijai's nodes then

reef ivy
#

Heads up: you will need to edit a line of code in the sampler, basically changing fp64 to fp32 and the like. For some reason he does it differently than base comfyui. On battlemage it might just work though.

earnest grotto
#

apparently qwen image's hf space takes more than 300s by default to edit an image, and so won't let me. weird that they made it take that long

somber trellis
#

problem is the text encode takes a lot longer because it's utilizing an mmproj

#

20-22s/it

somber trellis
#

I realized I could just use the multigpu cliploader on lowvram and force it on xpu without issues.

somber trellis
#

ngl tho it works very well

earnest grotto
#

I'm going to try qwen image edit, and I wonder, I have a feeling it will be much better than kontext at everything except text 🤔

earnest grotto
#

Man. 48gb (on windows?) is not enough to load the fp8 model. Oh well, q6 loads, and after the loading spike I have 16gb free, which ticks me off

#

Swap was set to 32gb which should've been enough. I'll retry with 50gb of swap

earnest grotto
#

Replacement for CFG through dropping blocks randomly

#

If only SAI had gone a little farther with skip layer guidance. I wonder how good this will be in practice, it looks very promising, though they all do, this one feels like it's confirming biases I have and that makes it gooder in my eyes

somber trellis
#

using --reserve-vram 8.0

#

I'm using a q8 mmproj

#

and multigpu loaders alongside --cache-none and --disable-smart-memory

#

and --lowvram
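
For reference, the flags listed above can be put together into one launch command. This is only a sketch: it builds and prints the command rather than running it, it assumes ComfyUI is started through `main.py` from the checkout's root, and `build_comfy_command` is a made-up helper name.

```python
import shlex
import sys


def build_comfy_command(reserve_vram_gb=8.0, entry_point="main.py"):
    """Assemble a ComfyUI launch command from the flags quoted in this thread.

    entry_point is an assumption (a stock checkout launched from its root).
    """
    return [
        sys.executable, entry_point,
        "--lowvram",
        "--reserve-vram", str(reserve_vram_gb),
        "--cache-none",
        "--disable-smart-memory",
    ]


command = build_comfy_command()
print(shlex.join(command))
# To actually launch it: subprocess.run(command, check=True)
```

The multigpu loaders and the q8 mmproj are chosen inside the workflow, so they do not show up in the command line.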

earnest grotto
somber trellis
#

I have pagefile set to automatic on my end

#

currently taking 18gb

#

and normal qwen image too

#

both operate at the same speed in terms of inference

#

1024x1024 euler simple is 11s/it and 1536x1536 is around 14-15s/it

#

that's with the lightning lora tho

#

and cfg set to 1

earnest grotto
somber trellis
earnest grotto
#

Mm, qwen inference speed seems to scale very well with higher resolutions indeed

#

1536x1536 has 2.25x as many pixels as 1024x1024 (roughly 2.25mp vs 1mp), but for you, and for me, the slowdown is more like the sqrt of that
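
Working through the numbers quoted in the thread (11s/it at 1024x1024, 14-15s/it at 1536x1536) as a quick sanity check. Nothing here is measured anew; it only redoes the arithmetic.

```python
import math


def pixel_ratio(side_small, side_large):
    """How many times more pixels a square image of side_large has."""
    return (side_large * side_large) / (side_small * side_small)


ratio = pixel_ratio(1024, 1536)      # pixel count ratio
sqrt_ratio = math.sqrt(ratio)        # what "sqrt scaling" would predict
observed_low = 14 / 11               # slowdown from the quoted s/it figures
observed_high = 15 / 11

print(f"pixels: {ratio:.2f}x, sqrt: {sqrt_ratio:.2f}x, "
      f"observed: {observed_low:.2f}x to {observed_high:.2f}x")
```

With those figures the observed slowdown comes out slightly below even the square-root estimate.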

reef ivy
#

Is qwen worth it? Seems really slow.

earnest grotto
#

Yeah, it's unusably slow without the 4 step lora

#

But with it, it's better than kontext

#

I haven't tried text yet, I guess. Still poking things. Only did some tests with higher resolution now, it seems to follow the prompt less with higher resolution but it still does things, whereas kontext would either do nothing or do nothing and blur the image

earnest grotto
#

better and faster

craggy hinge
#

A slightly silly question: what argument do I need to set to run ComfyUI from the second ARC GPU instead of the first?

earnest grotto
#

--default-device 1

#

Say what the actual GPUs you have are. If you're referring to an iGPU and a dGPU, you should probably disable the iGPU instead

craggy hinge
#

I9-13900K, 2 ARC A770. And I see no integrated GPU in Device Manager, I don't remember whether I disabled it in BIOS or not.

#

And the -default-device 1 argument does not seem to work.

earnest grotto
#

Open comfy/model_management.py, go to the lines where ipex_to_cuda is imported and run, and add torch.xpu.set_device(1) right after them, keeping the indentation intact. If you did not install comfy using my script, the equivalent spot is where comfy tries to import ipex; add the line after that try/except block
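
A standalone sketch of the same idea outside of ComfyUI, assuming a PyTorch build with XPU support. The helper names are made up for illustration and this is not the code in comfy/model_management.py; it only shows the `torch.xpu.set_device` call with a bounds check around it.

```python
try:
    import torch  # optional, so the sketch still imports on machines without PyTorch
except ImportError:
    torch = None


def choose_xpu_index(requested, device_count):
    """Return requested if it is a valid device index, otherwise fall back to 0."""
    if 0 <= requested < device_count:
        return requested
    return 0


def select_xpu(requested=1):
    """Make `requested` the current XPU when PyTorch reports enough devices.

    Returns the selected index, or None when no XPU is usable.
    """
    if torch is None or not hasattr(torch, "xpu") or not torch.xpu.is_available():
        return None
    index = choose_xpu_index(requested, torch.xpu.device_count())
    torch.xpu.set_device(index)
    return index


print(select_xpu(1))
```

Once the current device is set this way, code that asks for the current XPU should end up on the second card.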

craggy hinge
#

Oh, Thanks. I deeply apologise, I didn't understand everything in the script, but I did it this rough way and it worked. If you could provide slightly more detailed instructions, I would be very grateful.
"def get_torch_device(): global directml_enabled .... if is_intel_xpu(): return torch.device("xpu", torch.xpu.set_device(1))"

earnest grotto
#

that can work yes

#

it's messier

earnest grotto
#

Apparently early qwen image edit had a bug in diffusers' inference code. And apparently that was in comfy too, as updating comfy and rerunning the same workflow changed the output

reef ivy
#

you could also try those multigpu nodes, but last I checked they didn't support intel and had to edit some code to add xpu in.

gleaming nebula
#

I have an ASUS Vivobook S 14 with an Intel Core Ultra 7 (Evo Edition) and Intel Arc Graphics. Is ComfyUI using the NPU or the Arc graphics?

keen adder
#

Hello colleagues, I have a question regarding creating an image (text-to-image). I have an Etsy shop that sells digital files prepared for wood engraving.
My problem is that I can’t find a suitable checkpoint + LoRA (if necessary) to get close to the style I used to create with Microsoft Designer. I will attach an image created with Microsoft – this is the look I want to get as close as possible to in my setup.
I have a 12GB graphics card, so full Flux models are not an option. 🙂

earnest grotto
#

funny how kontext's prompting guide better applies to qwen

keen adder
earnest grotto
#

I think using regular flux/qwen might just be better

#

how much ram do you have and how long are you willing to wait

lunar thicket
earnest grotto
#

I wanted them to be ratmen but decided to settle for this

#

vermintide

earnest grotto
#

Huh, the 4 steps lora seems to produce better results than the 8 steps one

earnest grotto
earnest grotto
#

Oh and, I guess I might as well post this here too

keen adder
# earnest grotto 🤔

This is a top result. I have other programs to edit and prepare it for engraving. I have 32gb of RAM and a B580 with 12gb of VRAM.

lament shale
earnest grotto
#

yes

lament shale
#

oh and do we even need to update

#

like is there any significant difference now

#

last I downloaded was back in April and haven't updated comfy since

lament shale
earnest grotto
#

3rd post in the pins

earnest grotto
lament shale
#

flux models are tedious on my a750

earnest grotto
#

it also has a minimap now though imo that's a bit of a redundant feature if zooming were better

lament shale
#

wanted to try kontext

earnest grotto
#

and group nodes have been improved

earnest grotto
#

kontext is just plain worse

lament shale
#

i see

earnest grotto
#

how much ram do you have

lament shale
#

32

earnest grotto
#

you two need to buy a bit more

lament shale
#

lol

#

since I don't really edit much, I don't think it's worth the hassle for me, since the nodes work nicely in their older versions

#

just wanted to see if the new comfy update made any significant changes

earnest grotto
# lament shale i see

I would've posted 8 images (2x4) comparing kontext and qwen anime-ifying but i forgot the bot times out if you post a lot

#

apparently they will be making a new version as well that does multi image editing better

lunar thicket
earnest grotto
#

It produces better results. It has far fewer cases of refusing to do anything, and it's much better at following the prompt. You can compare the 4 images yourself. I've gotten similarly better results with tasks other than animification

#

And that's with the 4 step lora made for the regular qwen. An 8 step lora for this edit qwen was just released and a 4 step one will likely come later, and those should offer even better results

#

Kontext's prompt guide never made much sense to me as prompting differently did not make too much of a difference. but for qwen specifying "while keeping X the same" actually works and not specifying it actually sometimes gives a not-the-same result

#

The only thing which IMO kontext really did well, was removing text and watermarks. However in a quick test qwen seems just as good

#

One downside is the image resizing, from what I can see you might need to resize to multiples of 112? I'll test
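
If the multiple-of-112 guess turns out to be right (it is untested at this point in the thread), snapping a size to the nearest multiple could look like the sketch below. `snap_to_multiple` is a made-up helper, not something from ComfyUI.

```python
def snap_to_multiple(value, multiple=112):
    """Round value to the nearest multiple (ties round up), never below one multiple."""
    snapped = (value + multiple // 2) // multiple * multiple
    return max(multiple, snapped)


for side in (1024, 1328, 1536):
    print(side, "->", snap_to_multiple(side))
```

Applying it to width and height separately changes the aspect ratio slightly, so cropping instead of stretching may be the better choice for edits.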

lament shale
#

ahh, thanks for the info Vik

earnest grotto
#

I'm redownloading kontext to do a few more comparisons

reef ivy
lament shale
#

yeah, but I'm fine with what I have for now

reef ivy
#

Yeah, just find the commit for when they added whatever model you want and you should be fine, sometimes other nodes get updated to improve performance though.

earnest grotto
#

Some colorization attempts. Original, kontext, qwen. I cherrypicked the best results i could scrounge up. The rightmost kontext result is... Very blurry despite being higher resolution than the others

#

In this specific case, the person with the white hair is also an albino which I didn't specify in the prompt, so kontext's fried colors end up being a bit better but still

#

Girl on left should have brown hair, brown eyes. Boy on right white/silver hair, red eyes.

#

For some reason qwen is prone to outpainting when the resolution is higher. But it also interprets the prompt differently, sometimes better, which can be handy

somber trellis
#

Qwen Image Edit Lightning loras released 10 hours ago.

earnest grotto
#

about 10 hours ago

#

alright then

somber trellis
#

goin around on gmod messing with people using qwen image edit

#

is kinda funny

#

he didnt know

#

I noticed that text works a lot better at 1328x1328 or higher resolutions.

#

That's a vae approx of a 1536 image output im doing rn

#

this one is a 1024 output i got

#

misspelling of apocalypse, fixable

#

or maybe im getting bad seeds

earnest grotto
#

Regular lora vs edit

#

Original, "Make the ground in the image full of grass"

#

2mp, regular lora vs edit lora

#

Edit lora less prone to random outpainting 🤔

somber trellis
somber trellis
somber trellis
earnest grotto
keen adder
#

@earnest grotto thanks.

earnest grotto
keen adder
#

@earnest grotto do you remember the prompt for this image? My render comes out in a more realistic style (image 2, the eagle). This is with Qwen-4_K_S and the 4 step lora.

earnest grotto
#

"Keeping the black and white sketchy style of the image, create a completely new image of greyscale hordes of rat men are sieging a burning medieval city, removing the pumpkin"
"Remove the bats"

#

An edit model is unnecessary for a realistic-looking black and white image

earnest grotto
#

arguably it's unnecessary for this kind of image too as the style is very generic, but i wanted to see if it can match it more closely

keen adder
#

@vik Prompt: Keeping the black and white sketchy style of the image, create a completely new image of greyscale hordes of rat men are sieging a burning medieval city, removing the pumpkin. Where did I make a mistake? My result is very different.

earnest grotto
#

Drag and drop the image I uploaded into comfyui to see the workflow

#

you may need to open it in browser and trim the end of the link that converts to webp

keen adder
#

Alert
Unable to process dropped item: TypeError: Failed to fetch

#

This content is no longer available. @earnest grotto

#

after cut to see workflow

somber trellis
zealous atlas
#

On Windows 11 the script was giving me permission errors about Comfy_Intel after the third question (Continue?). Double clicking or right click -> open with Python didn't work. The way to solve it was to open PowerShell and run "python .\Setup_ComfyUI_Intel.py"

earnest grotto
earnest grotto
#

apparently updating various things on linux, probably oneapi, besides breaking steam and all steam games, also nuked lora training performance for sdxl

#

at least windows works 😐

civic charm
#

do you know what got updated?

#

My horrible performance with anything after IPEX 2.3 on Arch linux is probably related as Arch uses the latest packages long before they hit on any other distro

earnest grotto
#

I will check later, but I would definitely pin it on latest things

#

I was sticking with older oneapi or level zero or both, not sure, enough that Blender's Cycles wasn't working

#

After updating it now works, but, well...

civic charm
#

oneapi is irrelevant now

#

kohya installs its own mkl and dpcpp

#

pytorch ships its own mkl and dpcpp

#

level zero is the main driver

#

also did the kernel update from 6.8 to 6.12?

#

aka old lts vs new lts

earnest grotto
#

I installed the kernel myself

#

With 6.16.3-061603-generic, after 87 steps I got 11.5s/it

#

Will test with 6.14.8-061408-generic now, then 6.12.3 then 6.11 then 6.8, if none break

#

I believe I was using 6.14 fine

earnest grotto
#

190 steps, 11.4s/it with 6.14

#

10.99s/it with 6.12.3-061203-generic after 40 steps

#

10.97s/it with 6.11.11-061111-generic after 40 steps

earnest grotto
#

11.13s/it with 6.8.0-78-generic after 121 steps

#

i'll try reinstalling things

earnest grotto
#

tried many older versions of libze1, intel-opencl-icd, libze-dev and whatever else with not much luck

#

I'll be trying more some other time, I guess

keen adder
#

@earnest grotto I understand now where the big difference is: you use an img2img workflow that copies the style of my reference image and creates a new one. I was trying text-to-image, and that is where the difference in style comes from.

reef ivy
#

Last time I updated linux, it broke everything, haven't used it since lol

earnest grotto
#

Tried a lot of sycl's environment variables, more versions of level_zero and other packages, no results
what would be causing urEnqueueKernelLaunch to take 10x as long as it should, jesus

earnest grotto
#

Damn. Even with onednn profiling enabled, windows is ~4x faster, ~7s/it vs 26s/it

deft grove
#

Is there anyone who could explain how to use the B580 in Korean..? I've been trying to set up Stable Diffusion for months, but it keeps failing and only ever runs on the CPU..

deft grove
earnest grotto
#

Oh yeah, sorry, forgot about that
Install with my script or if you want a simpler installation and a simpler UI, SDNext or AI Playground

deft grove
lament shale
#

can I perform img2vids using wan2.2 on my arc a750?

#

if so, I assume I need to update my comfyui

#

and updating from the manager will probably break some stuff(?)

earnest grotto
lament shale
#

32

earnest grotto
#

You need more for wan 2.2

#

Well, you could probably run the q3s but quality will degrade a fair bit

earnest grotto
lament shale
lament shale
earnest grotto
#

if you installed with my script, run it again to update

#

or:

```
git stash
git pull
git stash pop
```
lament shale
#

Okay, thanks

reef ivy
#

it should work I think, if comfy unloads the previous model first. I still haven't gotten around to using it

#

GGUF will be the best bet, or scaled fp8, I'd give those a try.

earnest grotto
reef ivy
#

probably end up finding some workaround, maybe if one model is a little less important use the lower quant or something. I really need to get around to trying this stuff out. Want to upgrade my ram next anyway

earnest grotto
#

Using only one of the 2 models defeats the purpose
Might as well just use wan 2.1 then

reef ivy
#

I was thinking use a lower quant for the less important model and higher for the most important. Not really sure how it works but I was thinking its like sdxl

earnest grotto
#

I tried that, having a higher quant for the high noise and lower for low noise is fine
problem is that with less ram you will need lower quants, imo q3 degrades too much, and I expect you'll run out of ram while loading the models with a higher quant, and windows refuses to use swap for that

#

2x8gb ddr4 costs ~40 euros and 2x16 80 so i think the prices are just low enough that you should get more

reef ivy
#

Yeah plan is to buy another 32gb kit unless prices drop again

earnest grotto
#

Personally, my crystal ball tells me that nothing ever happens, new CPUs will not be massively better for a few more years, so if you're not planning to upgrade your CPU in the coming years, might as well lock in on ram

#

block swapping and similar are turning out to just be too good and things are more compute-constrained now

#

for image/video gen, that is 😛

earnest grotto
#

@lament shale If you really want, you can try both with Q4 actually

#

These might be able to load with 32gb, not sure, if you close everything else

lament shale
#

I'll try. I have so many things running on the bg

earnest grotto
#

You will want to close pretty much everything that eats ram, at least on the first run when the 2 models load

#

The Q4_K_S quants are 8.5gb each

#

fp8 umt5 is 6.5gb

#

23.5gb just for the models; with 32gb that leaves about 8gb, which is not much

#

Hmm, gguf umt5 could help i guess

#

q6_k is 4.76gb and should be near identical quality to fp8

#

if that's not enough, get q3 for the low noise model first and use that, keep high noise at q4

#

then if that's still not enough, get a smaller umt5, and if even then you don't have enough I think just call it quits and get more ram. q3/4 text encoder, q3 low and q3 high will just be too degraded imo
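
The budget above as a quick calculation, using only the sizes quoted in these messages and assuming 32gb of total RAM. It ignores whatever the OS and other programs keep for themselves, so the real headroom is smaller.

```python
def ram_left(total_gb, *model_sizes_gb):
    """RAM remaining after holding every listed model at once."""
    return total_gb - sum(model_sizes_gb)


TOTAL_RAM_GB = 32
Q4_K_S_GB = 8.5       # each of the two Wan 2.2 14B quants
UMT5_FP8_GB = 6.5
UMT5_Q6_K_GB = 4.76

with_fp8 = ram_left(TOTAL_RAM_GB, Q4_K_S_GB, Q4_K_S_GB, UMT5_FP8_GB)
with_q6_k = ram_left(TOTAL_RAM_GB, Q4_K_S_GB, Q4_K_S_GB, UMT5_Q6_K_GB)

print(f"fp8 umt5:  {with_fp8:.2f} GB left")
print(f"q6_k umt5: {with_q6_k:.2f} GB left")
```

Swapping the text encoder to q6_k buys a bit under 2gb, which is why the next steps suggested are lower quants for the low noise model and then for umt5.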

earnest grotto
#

I'll make a comparison in a bit

earnest grotto
#

I take some things back. Bloating my pagefile to 50GB may have helped

#

though subsequent generations look to be ~40 seconds slower per step, ~80s->~120s

#

Not too bad I guess?

reef ivy
#

Have you tried kijai's nodes? They usually have more control over RAM/VRAM allocation

earnest grotto
#

🤔

#

Feels like imgur has killed a lot of the quality but oh well

#

Yeah I'm compressing and uploading the videos here directly, this is too bad

earnest grotto
# earnest grotto

Q3 basically kills the motion in the grass and the silhouette of the moving branch, and introduces some artifacts
High noise Q4 breaks the eyes

#

But using q4 as the high noise is at least better than Q3 with grass movement

#

I think the lower quants' issues, particularly Q3's, will be more noticeable with plain text to video rather than image to video 🤔

earnest grotto
lament shale
earnest grotto
#

Previously I was using ~10-20gb of page file, at 50gb I could do both Q8s at once

#

Though, this is with 48gb of ram

reef ivy
#

I would say while worse q3 seems usable at least. That amount of ram seems crazy for 16gb card though

#

128gb is probably the real play for ai, dunno if ddr4 goes higher?

civic charm
#

Switched to the "Xe" driver on Linux and now she is fast with PyTorch 2.7.1

civic charm
#

Horrible non-blocking performance issue also doesn't exist with the Xe driver

#

unfortunately int8 matmul is still slower than bf16 and still has 2x more memory usage than bf16

#

qwen-image also got a nice boost

earnest grotto
#

though not so much for qwen

muted fiber
#

hey i got intel arc a770 16G anyone know can i run wan 2.2 5b

earnest grotto
# earnest grotto

I'm pretty sure it was slower on windows too before some recent update

muted fiber
# earnest grotto

i am facing reconnecting problem when i run wan 2.2 5b on arc a770 what is the issue

reef ivy
#

5b should run no-issue, should even run on a750 without much tweaking but can't say for sure yet

earnest grotto
muted fiber
muted fiber
earnest grotto
#

Video quality is bad because wan 2.2 5B is only 5B and they have not trained it more (I assume, it shouldn't be their priority like the 2x14B models are)
you're running out of RAM with 5B when it's not quantized
You barely have enough RAM to run the 5B model, you don't have enough to run anything better, so yes, you do need more

#

even 32 is kinda on the low side for video. you might be able to get by with just 32

#

to run the 2 14B models

earnest grotto
#

you've changed some settings incorrectly

muted fiber
#

720p takes a long time

muted fiber
earnest grotto
muted fiber
somber trellis
earnest grotto
# muted fiber

Write a longer prompt and increase the resolution
I guess this is another reason why you should want to get more RAM, I can't find any few-step ("lightning") lora for the 5B model, so you would literally generate faster with the 2 14B ones anyways, if you had the RAM to load them into

muted fiber
earnest grotto
#

q4 wan 2.1, q6 umt5

#

potentially q3 wan 2.1, dunno how much ram windows will keep for itself

#

16gb of ram is below the recommended specs of a good chunk of modern games anyways, you should really consider getting more

reef ivy
#

If quality is bad on 5b use the ones from kijai, he supposedly fixed it at some point.

earnest grotto
# reef ivy I would say while worse q3 seems usable at least. That amount of ram seems crazy...

I am compute limited, and for the things i want to do 48gb also starts to seem not enough or going above it would be a good speed boost
block swapping is too much of a good thing
think about it this way: a video model takes let's say 60s/it, with an iteration basically going through all the model's weights
even if the model was 40gb big, 60 seconds is just so, so, so much longer than it would take for all that data to move from ram to vram
even with pcie 4 x8, that's 16GB/s, never mind x16 or pcie 5 x16 or having a part of the model already in vram at the start or whatever else
like how it's faster to use a q8 model with block swapping than the other not-neatly-8-bit quants without
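
The same back-of-envelope argument in numbers. It takes the 40gb model, the 60s/it step and the 16GB/s link from the message above at face value, and assumes the whole model crosses the bus once per step at the full theoretical rate with no overlap, which is a simplification.

```python
def transfer_seconds(model_gb, link_gb_per_s):
    """Time to move the whole model across the link once."""
    return model_gb / link_gb_per_s


MODEL_GB = 40
PCIE4_X8_GB_PER_S = 16   # figure used in the thread
STEP_SECONDS = 60

swap_time = transfer_seconds(MODEL_GB, PCIE4_X8_GB_PER_S)
share_of_step = swap_time / STEP_SECONDS

print(f"{swap_time:.1f}s of transfer per {STEP_SECONDS}s step "
      f"({share_of_step:.1%} of the step)")
```

Even if real transfers only reach a fraction of that bandwidth, the copy stays small next to the compute time, which is the point being made.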

#

musubi tuner apparently got qwen edit support. but evidently that needs at least 64gb (for block swapping) and bnb is still kinda up in the air for intel

#

luckily it has kontext support too but I kinda feel bad about even considering training a kontext lora due to how much massively better qwen is

#

if models weren't trending towards being bigger and slower...

reef ivy
#

I think the monetization goal for these companies is to have a free really big model and really small model then have a fast premium model that is in between.

#

Basically sell it to companies to use for apps and get paid fees etc.

civic charm
#

Tho Xe driver is a bit unstable for daily desktop use

wicked fulcrum
#

Anyone noticing or seeing reports of people having issues installing/running ComfyUI using PyTorch 2.8?

quartz kelp
civic charm
wicked fulcrum
#

The issue I'm seeing is that the environment fails. It doesn't happen on all systems, but it's fixed by changing torch from 2.8.0+xpu to 2.7.0+xpu

Wondering if something with the upstream wheel has an erroneous system level dependency or path

earnest grotto
#

I tried reproducing it but it just works

wicked fulcrum
#

It's very hard to reproduce. It's happening on some but not all systems that had previously worked and then, for some reason, can't run on PT 2.8. I saw it on a lab system; reinstalling Windows fixed that PC. On the others that have it, going back to PT 2.7 gets past the error and the install works.
Tag me if you guys see this issue.

somber trellis
#

Trying to get vibevoice to work.

#

Currently can't get it to load into XPU.

#

Works on CPU, both the 1.5b and 7b models.

#

tbh tho im still not impressed with the voice cloning quality

earnest grotto
somber trellis
#

This sounds as if

#

markiplier

#

tried to do the imperial watchguard voice

somber trellis
#

imperial watchguard tries the internet and complains

#

ok i kinda take it back it's pretty good at voices

somber trellis
#

It's properly loading and running with ipex_to_cuda and ipex 2.8.10 and inferencing at 1.36s/it with vibevoice 7b

#

^ on CPU

#

This is just to show the quality on xpu vs cpu, which with just ipex + ipex_to_cuda seems much worse rn

somber trellis
somber trellis
reef ivy
#

is that all running on diffusers? Or are you using a quant of some sort?

somber trellis
#

Takes up like 44gb of sysram

#

Not comfyui, sadly. I've tried to get the two current nodes working, but both of them have issues properly allocating to xpu

reef ivy
#

My next upgrade will be RAM. I was going to just add another 32gb kit but maybe I will try a 64gb kit, I'd have to find similar timings. Models seem to be getting bigger and bigger

earnest grotto
earnest grotto
#

I guess I will need to see what happened to bitsandbytes support

#

8 bit adamw might be pretty necessary for kontext and jesus, qwen...

#

Hopefully they're good enough

#

(same for windows)

earnest grotto
#

NotImplementedError: The operator 'bitsandbytes::optimizer_update_8bit_blockwise' is not currently implemented for the XPU device. with the latest build
rest in pizzeria. peppino. pizza tower.

reef ivy
earnest grotto
#

yeah i think windows likes to eat ~7gb for itself without debloating(?)

#

though since the amount it uses like that might depend on how much ram you already have, i wasn't certain

civic charm
earnest grotto
#

That works on intel?

civic charm
#

It should

earnest grotto
#

I will try

civic charm
#

Also you can use Adafactor too, it will use close to no memory

earnest grotto
#

Well, guess I might as well try CAME too if musubi has it

civic charm
#

Original came has no optimizations implemented

#

It will use as much memory as AdamW and will run very slow

#

They forgot to disable gradients on the optimizer. This is the most basic thing

earnest grotto
#

ah, i guess part of my issue is training with 1mp images, but then kontext didn't seem very intended for lower resolutions... oh well, i'll scale them down anyways

earnest grotto
#

🤔 going down from 1mp to 0.6mp was a 3x boost in speed but vram usage is still massive

earnest grotto
#

Another thing I forgot... torch.xpu.empty_cache() right before and after backward() drastically reduces vram usage and improves performance. as always. ~15.3s/it -> 13.3s/it with nothing else changed, and VRAM usage went from.. I think ~14->9.3GB? (For the number of blocks I set it to swap, which was the highest, 34) Shared is still rather high however
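
A minimal sketch of that pattern, not the actual musubi tuner change. `empty_xpu_cache` and `backward_with_cache_flush` are made-up names, and the guard makes the flush a no-op on machines without an XPU build of PyTorch.

```python
try:
    import torch  # optional, so the sketch still imports without PyTorch
except ImportError:
    torch = None


def empty_xpu_cache():
    """Release cached XPU memory if an XPU is usable; return whether it ran."""
    if torch is not None and hasattr(torch, "xpu") and torch.xpu.is_available():
        torch.xpu.empty_cache()
        return True
    return False


def backward_with_cache_flush(loss, flush=empty_xpu_cache):
    """Flush the allocator cache right before and right after loss.backward()."""
    flush()
    loss.backward()
    flush()
```

In a training loop this replaces the plain `loss.backward()` call; the optimizer step and `zero_grad` stay where they are. How much it helps will depend on the model and the block swap settings.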

somber trellis
#

@earnest grotto I think vibevoice currently trumps any competition open-source-wise.

#

It's not even comparable.

earnest grotto
#

I can tell that's supposed to be kleiner but it just doesn't sound like kleiner

#

Also it gets randomly quiet

#

Being able to do long speech without changing the voice is good I guess... Unless the random quietness is an artifact of not truly long speech

somber trellis
#

but then there are other parts of the voice that it adds in that doesn't.

#

It's a hit and miss.

#

I also noticed the volume decreased halfway-through.

#

Not sure why it does this.

#

Also has that voice artifact that you'd hear from older tts models

#

A hiss.

reef ivy
#

Seems models with better emotion and natural speaking have harder times with capturing the likeness 100%. Might just need more training data though as most can make the voice off small snippets it seems.

earnest grotto
#

Yea that's better

somber trellis
#

I'm currently waiting for the experimental gguf models to get somewhere on vibevoice

#

I really want a way to run this model faster.

earnest grotto
#

yeesh musubi tuner's inference script has a lot of bugs

#

the trained lora wasn't doing anything in comfy so i wanted to see if it's a comfy issue or it just needs that much more training or what, and man...

earnest grotto
#

New qwen lightning lora, and one for edit soon™

civic charm
#

PyTorch 2.10 with Xe driver:

#

As fast as / slightly faster than OpenVINO now

earnest grotto
#

damn, pretty nice

swift aurora
#

Does anyone else have problems with either the opencv-python module trashing the bed or the WAS node itself mishandling that particular module to a grievous extent? (as in, you can reliably nuke your comfyUI installation by just installing the WAS node, which then does something stupid with OpenCV Python, which then breaks anything relying on that)

reef ivy
#

Pytorch fixed some regression or is this just the xe driver improvement for linux?

rustic sonnet
rustic sonnet
civic charm
rustic sonnet
#

Fair

earnest grotto
#

well, I haven't tried 2.10 I guess

#

i am scared that musubi tuner's resuming doesn't work so I kinda don't wanna stop training now

earnest grotto
#

I'm sure a random morning/night 3 second blackout or brownout will eventually get me though

earnest grotto
earnest grotto
#

New Lumina dropped. Man, at this rate anime models are really getting left in the dust

#

8B and some of their examples really give me confidence. And the last Lumina was alright

somber trellis
#

getting 3.7it/s

#

Sadly it won't actually produce any audio on my end. It's a bnb 4-bit model.

#

It loaded onto my gpu. I have bitsandbytes 0.48.0.dev0 installed.

#

Clearly not properly though, since it didn't produce anything.

lunar thicket
#

I am curious

somber trellis
somber trellis
lunar thicket
#

ah gotcha 👍 thanks

#

Very much a novice just tinkering in my spare time with comfyui

#

AIPlayground made the install so easy

earnest grotto
#

😂
since 2.10 is 2.1, a lot of the ipex_to_cuda version checks now don't make sense
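
To illustrate the general pitfall (this is not the actual ipex_to_cuda code, which isn't shown here): a version compared as a float turns 2.10 into 2.1, while comparing integer tuples keeps the ordering. `version_tuple` is a made-up helper.

```python
def version_tuple(version):
    """Parse a string like '2.10.0+xpu' into (major, minor) integers."""
    core = version.split("+", 1)[0]
    major, minor = core.split(".")[:2]
    return int(major), int(minor)


# As floats, 2.10 collapses to 2.1 and sorts before 2.8.
print(float("2.10") < float("2.8"))

# As integer tuples, (2, 10) correctly sorts after (2, 8).
print(version_tuple("2.10.0+xpu") > version_tuple("2.8.0+xpu"))
```

Both lines print True, which is exactly the problem with the float form: it claims 2.10 is older than 2.8.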

somber trellis
#

bruh

#

I just want a way to run the 7b vibevoice model on my xpu without ooming

reef ivy
earnest grotto
#

bnb claims it has qlora support for intel, which needs 4 bit

somber trellis
#

a person named calcuis is working on gguf-connector

#

he already has a vibevoice 1.5b model quantized and it works on xpu fine with some changes to the code

#

i had just sent him the link to aoi-ot's 7b model backup

#

since he didnt even have 7b supposedly (asked him in hf discussions)

marble sigil
#

Hi there!
Anyone know how to send PM to Bob Duffy? I registered yesterday on Discord and sent Bob a friend request.
But there is no guarantee that he will accept the request. I am new to Discord and don't understand anything.
My PM is very long and I also don't want anyone else to read it. Please give him a sign, someone!

marble sigil
#

what why?

earnest grotto
#

Why do you want to DM him

somber trellis
#

Suspicious.

earnest grotto
#

Spent a while bashing my head against the lora I trained, now am confused if kontext is just THAT stubborn or what, as I still see literally no change, even after using musubi tuner to merge into kontext and then using that in comfy

marble sigil
earnest grotto
#

Yes, and why do you want to do it

marble sigil
earnest grotto
#

state your questions

marble sigil
marble sigil
lunar thicket
#

You should simply make a post in #1243956384052285560 or #1088926345138012160 with your questions. Or perhaps start with one question at a time

#

An individual like Bob (intel employee) is in a one-to-many situation with the many, many general users.

Imagine if every person with a question for him just DMed him all the time, and no systems were in place to prevent it. He would be buried under an avalanche of DMs.

marble sigil
# lunar thicket If you are new to not only Discord but all types of online chat, then it is reas...

Thanks for your comments.
I'm 51 and have been using the Internet for almost 30 years. No place except Discord has ever made me uncomfortable with its interface.
In addition, my Discord is buggy and hangs.
The issue of "one for all" is a personnel problem of Intel. There must be assistants for this. But I hope that Bob will be very interested in receiving feedback from me later.

I'm so fed up with the "professional" advisors I met on Reddit, that I don't want to risk it anymore and put my questions out in public.
Today, you can count on the fingers of one hand the number of people who are planning a similar PC build to mine.
In three months, there will be more of these people. And these new people will turn into, as you said, "many general users".

somber trellis
#

This discord exists as an insiders community for the people that mess with and have questions about intel products

next niche
#

I found this thread linked in a YT video. Is the CMD method for installation still working? When I try to launch it after installing requirements-ipex.txt, I get a bunch of errors: numpy incompatibility, pytorch incompatibility, torchaudio missing, av missing

earnest grotto
#

(See also, 3rd pinned message)

somber trellis
#

@earnest grotto

#

I got an LLM to create a bnb test script for me

#

It can do 4 and 8 bit inference supposedly

#

Only thing we cant do

earnest grotto
#

well... without seeing the code there isn't much to say

somber trellis
earnest grotto
#

furthermore, not everything is linear layer

#

adamw 8 bit definitely doesn't work, though that's training not inference

earnest grotto
#

it's not great as an actual test

#

just in this case, I'm willing to assume, if there's no exception, bnb have it fully implemented

#

and, again, linear only

somber trellis
#

I could just ask it for a more complex test.

earnest grotto
#

but they do list qlora support so i'm sure they should have inference for 4 bit at least

somber trellis
#

take what i say with a grain of salt lmao i cant code in the first place

civic charm
#

Works with any GPU or device

earnest grotto
#

Ah, standalone repo

#

I will check it out

civic charm
#

for AdamW 8bit
sdnq.optim.AdamW
optimizer_args: use_quantized_buffers=True

#

use_quantized_matmul feature for inference "works" on A770 but slows down the model instead of making it faster, so set that to False

#

Nvidia runs almost 2x faster with it

#

RX 7900 XTX without a proper INT8 hardware manages to run 10% faster than FP16 with it

somber trellis
#

im kinda frustrated rn

#

for the last week ive been trying with my non-coding brain to modify the gradio_demo.py in vibevoice's repository

#

I don't know what I should use if I want to load a large model with offload capabilities

#

I wondered if I could use sdnq to quantize it to int8, allowing the model to fully fit into my gpu memory

#

but I don't really know how to do that either

reef ivy
somber trellis
#

I've tried that one and two others

civic charm
#

set use_quantized_matmul to False on Alchemist tho

#

also sdnq supports anything between 1 to 8 bits and also has int and uint quants

#

you can try uint3, uint4, int5, int6 for low bit

somber trellis
#

I'm getting speeds around 1.7s/it and it takes 12gb of vram on the vibevoice 7b model

#

ok guess im not posting any snippets the bot's onto me again

#

it has flash attention 2 still there but it auto-fallbacks to sdpa since it isnt supported

#

i forgot to get it to change that

civic charm
somber trellis
#

around the same speed, but more vram is used

civic charm
#

Also looked at the code a little bit

somber trellis
#

14.5 instead of 12

civic charm
#

Why are you not using quant config on VibeVoiceForConditionalGenerationInference.from_pretrained?

#

And loading the unquantized model to memory instead?

somber trellis
#

its a script spaghetti coded by gemini

#

i also didnt know there was a quant config

#

lmao

civic charm
#

I didn't even list the other manual method

somber trellis
#

i didnt know thats what that was

#

lmao

#

then again

#

its SDNQConfig

civic charm
#

Also to disable compile: set SDNQ_USE_TORCH_COMPILE=0
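
The `set` form above is Windows cmd syntax. From inside a Python launcher the same switch could look like the sketch below. The variable name is taken from the message; that it has to be set before sdnq is imported is an assumption, not something confirmed here.

```python
import os

# Assumption: SDNQ reads this variable at import time, so set it first.
os.environ["SDNQ_USE_TORCH_COMPILE"] = "0"

# import sdnq  # only after the variable is set (commented out: sdnq may not be installed)
print(os.environ["SDNQ_USE_TORCH_COMPILE"])
```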

civic charm
#

It has quite a bit of speedup

somber trellis
#

it won't load onto xpu anymore

#

oh i think i know why lmao

#

it removed quantization_device and return_device lmao

#

or not

#

🤷‍♂️

civic charm
#

Default is don't touch the device

#

Aka cpu

#

unless device map is set

civic charm
#

What is the actual issue?

somber trellis
civic charm
#

device_map = self.device is the culprit

#

transformers will try to allocate everything at once

somber trellis
#

so manually set it to device_map = "xpu"?

civic charm
#

And you will hit the 4gb alloc limit

civic charm
somber trellis
#

👍

somber trellis
#

around the same speed as the previous scripts, 1.79s/it

#

its spiking to 93-98% xpu usage and its at 12.8gb of vram

civic charm
#

This shouldn't have a difference on running speed, only on loading

somber trellis
#

Well it works.

#

Lol

civic charm
civic charm
#

Also has optional 8bit support on top

#

sdnq.optim.CAME

#

To enable 8bit, pass this to optimizer args: use_quantized_buffers=True

somber trellis
#

reverted to current stable torch+xpu build (2.8.0)

#

AttributeError: 'TritonLauncher' object has no attribute 'shared_library'

#

i think im missing something

civic charm
#

Did you manually install triton?

#

If so, don't

somber trellis
#

im assuming im not supposed to use it

civic charm
#

That is the correct way

somber trellis
#

I'm on pytorch-triton-xpu 3.4.0

civic charm
somber trellis
#

C:\Users\dbs_5\Comfy_Intel\cenv\python.exe
C:\Users\dbs_5\AppData\Local\Microsoft\WindowsApps\python.exe

civic charm
#

Seems like it is a bug in older versions of intel triton

#

nightly should work

somber trellis
#

pytorch-triton-xpu==3.5.0+git1b0418a9

civic charm
#

I have no idea what this is:

OSError: [WinError -529697949] Windows Error 0xe06d7363

#

I guess triton support on Windows isn't ready yet

somber trellis
#

llm added funny if statement to the device_map and removed quantization_device and return_device

#

caused it to not work

#

that latest script doesnt work for some reason, outputs garbled even with --torch_no_compile

earnest grotto
# civic charm btw, sdnq also has my came implementation that doesn't run like potato

I think I kinda gave up after seeing no change in 4000 steps with 0.0002 lr and my experience with how plain stubborn kontext can be with prompting in some cases, I'm assuming the model itself is just that fried, I'd probably have to train qwen and I'm not sure if I have the RAM for that. Maybe if I can hack musubi tuner to load in fp8 right off the bat, since it has a tendency to want to load in bf16 and then convert to fp8

somber trellis
#

went back to the non-compile script and im happy with it working at 1.7 it/s on pytorch 2.10

#

nearly double the speed than on cpu, so that's cool

#

or not, because the first generation has artifacts, then the second generation is gibberish

#

The CPU loading script doesn't degrade, but the xpu only ones do.

#

It stays and remains at a stable 1.71-1.8IT/s, while the ones utilizing xpu-only loading alongside SDNQConfig are either artifacted, garbled or the model doesn't output any audio data at all. It just goes straight to 3IT/S and generates nothing sometimes.

#

I'm very, very confused as to why.

#

random copypasta

somber trellis
somber trellis
somber trellis
earnest grotto
#

After training the neta lumina lora for even longer, i feel it's actually coming along better (12.5k steps)

#

Really makes me wonder what did neta screw up. Perhaps they overfed it poopoo autogenerated slop captions but otherwise the large amount of images they trained on was still good 🤔

#

pretty sure that model doesn't even understand basic quality tags like "best quality", "low quality", "worst quality" despite them insisting on those tags in the prompt guide

neon tapir
#

is anyone else experiencing or has experienced a weird issue where half way through generation the image will black out entirely?

neon tapir
#

im using 2.9.0+xpu, i also use this persons installer script but tbh i havent tried installing comfy without it in awhile https://github.com/a-One-Fan/ComfyUI-Intel-Installer-Script EDIT: just realized this is your script, i have in the past always used the 2.5+xpu option but ive recently swapped to the stable pytorch version to see if i can get improvements

earnest grotto
neon tapir
#

I tried a variety of things and I’m not exactly sure what fixed the issue. I last restarted my PC yesterday and I’ve included a picture with my workflow. My main goal was to see how the new stable option would affect performance, stability, and compatibility. I also selected the option to install the recommended nodes including KJ, RGThree, and others. I think that’s when the problems started possibly because I was trying to use my old workflow on a new version (I hadn’t updated ComfyUI in a while) or due to some other factors. But recently I uninstalled the new version and went back to the older 2.5+xpu version and things have been running smoothly since. I plan to try updating to the stable version of pytorch again to see if that was the issue. For now I’m unable to pinpoint exactly what caused the problem but thank you for the responses.

lunar thicket
earnest grotto
lunar thicket
#

exactly 1mp or "1mp or less"?

earnest grotto
#

and vice-versa

#

some newer anime finetunes are trained to be able to work at higher resolutions without breaking. but generally, you can assume sdxl performance is 1024*1024 or might as well be 1024*1024

#

lower -> image gets fried
higher -> repeating patterns
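
For non-square images, a rough way to stay near the 1024*1024 pixel budget is sketched below. The helper name and the snap-to-64 rounding are assumptions for illustration, not something stated in the chat.

```python
import math

def sdxl_resolution(aspect_w, aspect_h, target_pixels=1024 * 1024, multiple=64):
    """Return (width, height) close to target_pixels for the given aspect ratio."""
    ratio = aspect_w / aspect_h
    height = math.sqrt(target_pixels / ratio)
    width = height * ratio

    def snap(value):
        # Round to the nearest multiple (assumed 64), never below one multiple.
        return max(multiple, round(value / multiple) * multiple)

    return snap(width), snap(height)

print(sdxl_resolution(1, 1))   # (1024, 1024)
print(sdxl_resolution(16, 9))  # (1344, 768)
```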

lunar thicket
#

thanks for info

wet frost
#

Could someone help me get it working on a Core Ultra iGPU in Linux?

This is what I did:

git clone https://github.com/comfyanonymous/ComfyUI.git ~/ComfyUI
cd ~/ComfyUI
python3 -m venv venv
. ./venv/bin/activate
python3 -m pip install --upgrade pip
pip install -r requirements.txt
pip uninstall -y torch torchvision torchaudio
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/xpu
pip install --upgrade comfyui-frontend-package

Check:
python3 -c "import torch; print('Torch version:', torch.__version__)"
Torch version: 2.10.0.dev20250914+xpu

However I get this error:
/home/floki/ComfyUI/venv/lib/python3.12/site-packages/torch/xpu/__init__.py:61: UserWarning: XPU device count is zero! (Triggered internally at /pytorch/c10/xpu/XPUFunctions.cpp:115.) return torch._C._xpu_getDeviceCount()
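
A slightly fuller check than printing the version is sketched below. It only reports what PyTorch sees and does not diagnose why the count is zero; `xpu_report` is a made-up helper name.

```python
def xpu_report():
    """Collect basic facts about PyTorch's view of Intel XPU devices."""
    try:
        import torch
    except ImportError:
        return {"torch": None, "xpu_available": False, "device_count": 0, "devices": []}
    # Older torch builds may not have the xpu module at all.
    has_xpu = hasattr(torch, "xpu") and torch.xpu.is_available()
    count = torch.xpu.device_count() if has_xpu else 0
    names = [torch.xpu.get_device_name(i) for i in range(count)]
    return {
        "torch": torch.__version__,
        "xpu_available": has_xpu,
        "device_count": count,
        "devices": names,
    }

print(xpu_report())
```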

lunar thicket
#

I was using Flux models and thought this was just normal B580 generation speeds, but then I saw all the posts in here from folks talking about sdxl speeds so I tried that instead. Holy smokes lol. Sdxl models are just way, wayyy faster.

earnest grotto
lunar thicket
#

I am trying to learn comfy and prompting first.

earnest grotto
#

There are no good anime finetunes on any model after sdxl

lunar thicket
#

I havent figured out how to properly do inpainting or img 2 img or upscaling stuff. The last sounds like it should be dead simple but when i tried it it just didnt do anything lol

lunar thicket
earnest grotto
#

Get animagine 4.0 opt or illustrious 2.0, read each's prompting guide that explains the quality and artist tags, and open up gelbooru and look at what tags images have

civic charm
#

btw, how are the speeds on b580 with sdxl 1024x1024 and with what pytorch version?

lunar thicket
civic charm
lunar thicket
civic charm
#

i am trying to figure out if battlemage is able to deliver its full potential. If it can, it will be faster than an RTX 3090 Ti.

earnest grotto
# lunar thicket What does this mean?

Base models can't do anything anime art. They might have some offputting corpo art-looking style (see: ghibli trend) and know who hatsune miku is and that's about it. To be honest, even SDXL and 1.5 did actually know the artstyles of some classical art so that was at least interesting. So you want someone to go and actually train them on a bunch of art
This has not happened to a useful degree for any model after SDXL
Lumina has an anime finetune, but it's kinda broken. For Flux there was Chroma but even though it's trained on anime, it's still not quite there

civic charm
civic charm
#

there are some slight differences in tagging

earnest grotto
#

Yes, and danbooru limits searching to 2 tags

#

The general concept is enough, it's not like the models follow the tags infinitely well anyways. For example "wing collar" is confusing enough (to CLIP?) to make wings despite having that many instances

lunar thicket
#

but happy to help your testing with my card

#

here is hoping that by Celestial dGPU release they have unleashed full potential

civic charm
#

it won't be anywhere near a rtx 3090 on gaming but matrix / ai compute on intel gpus is kinda insane

lunar thicket
#

yeah im pretty ignorant on the AI performance. I had read somewhere that it was just ok, but that was ages ago. Maybe I have an outdated perspective here

#

are you currently running an A770?
or RTX 3090/3090 Ti?

civic charm
#

a770 and rx 7900 xtx

earnest grotto
# lunar thicket I havent figured out how to properly do inpainting or img 2 img or upscaling stu...

For inpainting, you want a dedicated inpainting model
There's basically only 3-4 or so. Base SD 1.5, Dreamshaper 8 inpainting (an SD 1.5 finetune), SDXL and Flux. Faster, lower quality <-> Slower, higher quality. Anime isn't super big of an issue here if you just intend to use inpainting to fix up minor issues or remove things while keeping the bg consistent, since due to actually seeing the image inpainting models can usually match the style to an extent
Also,

#

there's some external ones you need custom nodes for, brushnet/powerpaint is good but you can try those later once you familiarize yourself with comfy

#

The default inpainting workflow has an issue if you intend to inpaint multiple times but let's keep it simple for now i guess

#

You don't have any prior experience with any other nodal UIs, right? Blender, unreal, unity, houdini, etc.

lunar thicket
#

I do actually. I used Unity for years and used their state machine animation system, before node based / visual scripting got introduced more widely later on, around the time I abandoned gamedev. Dabbled in Unreal 4 some briefly too.

#

Most of my experience is older and without those systems though

#

This is good info re: the faster to slower model options

earnest grotto
#

Well Comfy is that but worse; if you have Blender experience in particular, some cool people have made a Blender addon to integrate a ComfyUI node editor into Blender, and Blender has an actually pretty decent nodal editor so that's a huge jump in usability. I might want to poke it a bit more again though, my old Intel code there might be too old now

#

I'll show you a fixed inpaint workflow in a bit then

lunar thicket
#

So far I am doing alright on workflows and nodes, primarily using built in comfy ones

#

I did install a third party workflow that was missing a node last night, so i went to github and grabbed the python script for that node and dumped it in the indicated folder. But it didn't work, couldnt get that node to work, even after restarting everything. So I just disabled it because it was just a resizing node anyways

earnest grotto
#

VAEs are lossy
Even with just 1 encode->decode, the image will get slightly blurry and very fine details (grass, chainlink) will be gone. After probably 5, your image will be fried with the SDXL VAE
The bottom nodes, you can adjust how much the mask gets blurred

#

There's some other issues with inpainting as well i feel, and not comfy-specific... but not much we can do

lunar thicket
#

(I am not on desktop, I haven't had a chance to go tinker again today)

earnest grotto
#

If a node has text above its corner there, it's a custom node, EXCEPT for nodes that say [BETA], those are built-in

#

The text is determined by what the addon registered its custom nodes as. usually most people will just name them after their repo

#

No text means it's standard

#

Though if you zoom out enough so that other text disappears, this does too 🤷

somber trellis
earnest grotto
#

i don't recognize the character

somber trellis
#

mordhau

somber trellis
earnest grotto
#

Apparently Index TTS 2 released 9 days ago

#

Were MS stirring up drama to bury it?

#

And apparently it has proper emotion control, judging by what a random custom node for comfyui has for it

somber trellis
#

And I'm not impressed so far lmao

#

It's a lot faster than Vibevoice, I would say.

#

It's also quite a lot smaller.

reef ivy
#

Which is closer to the voice you cloned? The second has more inflection but does it match the voice better?

somber trellis
#

The second one matches the input audio better.

#

Both in terms of level of artifacts, and in terms of voice similarity.

somber trellis
lunar thicket
#

using the default prompt it had (evening sunset scenery blue sky nature, glass bottle with a galaxy in it) and set to 1024x1024

#

the image lol

#

pytorch version: 2.8.0+xpu

#

idk what kind of performance is "good" or not. considering this has the refiner model to load also it is slower than it could be

#

oh, and this is in Windows

civic charm
#

can you do a second run? first runs have the jit overhead

#

also load a normal sdxl model

#

no one uses the refiner and it is a different arch

lunar thicket
civic charm
#

sd_xl_base_1.0.safetensors is the one i am interested in

lunar thicket
#

I ran it 3 times

#

my workflow

#

same prompts as before, and same model, and 1024x1024 again

#

@civic charm much faster this time

civic charm
#

4 it/s is pretty close to its full potential

#

RTX 3090 gets 4.0 it/s

#

RX 7900 XTX gets 4.8 it/s

lunar thicket
#

holy smokes

#

that is impressive in context

civic charm
#

RTX 4090 gets 8 it/s

lunar thicket
#

i mean just thinking about MSRP of 3090 vs B580. or die size

civic charm
#

4.12 it/s for $250 is insane

lunar thicket
#

if only they had a 24GB B580 😄

civic charm
#

B60 : p

lunar thicket
#

yeah

#

i wonder if performance will be noticeably worse for B50 since it doesnt have the full G21 like B580

civic charm
#

Intel lists it as 170 int8 tops
b580 has 233 int8 tops
interpolating this info should give around 3 it/s but not sure how much memory bandwidth bottleneck it will have from a 128 bit bus
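
The estimate worked out, as a sketch. It assumes speed scales linearly with INT8 TOPS and ignores the 128 bit bus; the 4.12 it/s figure is the B580 result quoted earlier in the chat.

```python
b580_tops = 233   # INT8 TOPS for B580, from the message above
b50_tops = 170    # INT8 TOPS Intel lists for B50, from the message above
b580_its = 4.12   # B580 SDXL 1024x1024 speed quoted in the chat, it/s

# Assumption: it/s scales linearly with INT8 TOPS (no bandwidth bottleneck).
b50_its = b580_its * b50_tops / b580_tops
print(f"{b50_its:.2f} it/s")  # 3.01 it/s
```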

lunar thicket
#

B60 is gonna be the move

earnest grotto
lunar thicket
earnest grotto
#

The command prompt

lunar thicket
#

its the default comfyui one. "SDXL Basic"

#

oh. idk if i have that anymore

earnest grotto
#

Run comfy again and show what it says at the start

earnest grotto
#

do you wanna run the script again, pick nightly, and tell us what the performance is then

lunar thicket
#

how do i pick nightly build?

#

(and can i switch back easily?)

earnest grotto
#

oh, this is the AIPG comfy

#

don't poke that then, I guess

lunar thicket
#

yeah 😆 im a scrub who got into it that way

lunar thicket
#

@earnest grotto have you done nightly pytorch builds and seen a noticeable uplift on Arc?

civic charm
#

A770 went from 2.2 it/s to 2.4 it/s

lunar thicket
#

that's not bad

formal tusk
#

Some interesting news today, apparently China is banning the sale of Nvidia chips. With any luck, there will be a move away from CUDA in open source offerings lol

earnest grotto
#

They only banned 2 nvidia gpus, not all
And... Things are already not too cuda-dependent

earnest grotto
lunar thicket
#

Yeah. Maybe i will try it anyways. Worst case scenario Ill just reinstall lol

#

If 5-10% gain is even theoretically on the table thats pretty fire

earnest grotto
#

pip3 install --pre --upgrade torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/xpu

lunar thicket
earnest grotto
#

For inference it's pretty much the same

lunar thicket
#

👍 cool

earnest grotto
#

Only difference is if you run out of vram on linux the driver crashes (it kinda does on windows too sometimes but generally not)

lunar thicket
#

I have a kubuntu install on the same machine but i only use it for playing games occasionally

#

thats a big difference lol

earnest grotto
#

Training performance was better on linux but recently it improved on windows, and there's some odd issue on linux i haven't nailed down yet that kills training performance

civic charm
earnest grotto
#

Ah, windows has pseudo-leaks in inference that don't happen on linux (to a noticeable extent?). I get them after doing ~30 images with heavier models like kontext or qwen, or hundreds of images with sdxl
VRAM is clearly free but at some point the runtime gives up and says it ran out of vram anyways and will refuse to let you gen until you restart your PC
However on both OSes those do seem to happen in training, again moreso for heavier models
And spamming empty_cache() seems to resolve both of those and sometimes improve training performance and definitely improve vram usage especially during training, which is odd since it used to break WSL before

#

Wacky bug, really contradicts the pytorch doc, at least for the Nvidia empty_cache

#

I could get 3 batch size training an sdxl lora with it

#

Couldn't without

civic charm
#

Xe driver with PT 2.10: 3.6 s/it with 11.8 gb vram usage

earnest grotto
#

2 batch size?

civic charm
#

yes

earnest grotto
#

damn, that's pretty good, so it's fixed then

civic charm
#

i915 driver is still horrible tho

#

ipex 2.3 and openvino works fine with i915 but pytorch / modern ipex doesn't

#

they only work properly with Xe

#

also i915 is the default driver for A770

#

Also using Xe will lose video encoding support with alchemist

earnest grotto
civic charm
#

I am using Linux 6.17.0-rc6-1-mainline

Added this to GRUB_CMDLINE_LINUX_DEFAULT

i915.force_probe=!56a0 xe.force_probe=56a0

civic charm
#

I don't really use that feature and i do have another GPU to use if i need to so it doesn't really affect my use case

#

sudo lspci -vvv | grep xe
Kernel driver in use: xe
Kernel modules: i915, xe

#

Also 56a0 is the device id of A770 16GB
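
For a different card, the same pair of kernel parameters can be built from its PCI device id. A small sketch; `force_probe_args` is a hypothetical helper, and only the 56a0 id comes from this chat.

```python
def force_probe_args(device_id: str) -> str:
    """Build kernel parameters that hand one PCI device id from i915 to xe."""
    dev = device_id.lower().removeprefix("0x")  # accept "0x56A0" as well as "56a0"
    return f"i915.force_probe=!{dev} xe.force_probe={dev}"

print(force_probe_args("56a0"))  # i915.force_probe=!56a0 xe.force_probe=56a0
```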

earnest grotto
#

ye

lunar thicket
earnest grotto
#

eventually restarting it won't work

lunar thicket
#

Hmm interesting

#

So far i haven't hit a hard wall

#

Maybe its an AIPlayground quirk that is actually an advantage

earnest grotto
#

well, most people are unlikely to generate 700 sdxl images without restarting their pc

#

it's a bit more concerning with the heavier models though

lunar thicket
#

700 😵

earnest grotto
#

qwen image edit's results are good but only good enough that i really want a few extra variations

earnest grotto
#

In hindsight... IMO better to just lock a seed and tweak the prompt and weights

#

Oh, and grids for testing out loras

#

let's say 8 loras, 4 seeds, 8 prompts. that's 256 images, though it is on the high side
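
The grid size as a quick sketch. The lora names, seeds and prompts are placeholders, only the counts come from the message.

```python
from itertools import product

loras = [f"lora_{i}" for i in range(8)]      # placeholder names
seeds = [1, 2, 3, 4]                         # placeholder seeds
prompts = [f"prompt {i}" for i in range(8)]  # placeholder prompts

# One job per (lora, seed, prompt) combination.
jobs = list(product(loras, seeds, prompts))
print(len(jobs))  # 256
```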

lunar thicket
#

This is advanced comfy'ing

#

never considered locking a seed

earnest grotto
#

you should, and weight your prompts

#

particularly with sdxl models

#

less so with the new dit ones, newer text encoders don't seem to play well with that?

lunar thicket
#

weighting prompts i have done

earnest grotto
#

prompt weighting is massive

lunar thicket
#

Whats a "new dit one"?

earnest grotto
#

SD3, SD3.5, Flux, Lumina, Qwen Image, etc.

#

Well, it's most likely not the DiT architecture that makes that not work but the not-CLIP text encoders

#

It's just that both of those came hand in hand

lunar thicket
#

Ah ok

#

I think QWEN is too heavy for my system, using the default model and workflow that ComfyUI provides

#

1) its slow as balls, 2) it consumes all my system RAM to spool up

earnest grotto
#

I don't think the image gen model is worth it, though i mostly do anime

#

but even for anime the edit model is good

civic charm
#

edit model is slow but the image model is actually a little bit faster than flux

lunar thicket
#

is that right? Good to know

civic charm
#

2.4 s/it on flux vs 2.25 s/it on qwen

#

on a770 with sdnext

earnest grotto
#

Original, edit model, +my own edits

#

We've moved on from trying to make image gen models make coherent text, to trying to get them to understand that the text is actually written on the pages of otherwise blank open books

lunar thicket
#

Lol

civic charm
#

coherent text issue was mostly from the very compressed 4ch vae not being able to preserve the texts

earnest grotto
#

it was bad even when big

#

but that was part of it for sure

#

and the popped egg yolk anime eyes

lunar thicket
#

@earnest grotto what GPU are you using for image gen ?

earnest grotto
#

a770 16gb

lunar thicket
#

think you'll upgrade or does it suit your purposes well?

earnest grotto
#

I don't upgrade often. I won't be upgrading anytime soon

lunar thicket
#

ah ok, same

#

Some people change GPUs like theyre out of style and a new season is upon them

earnest grotto
#

Most new GPUs just don't seem worth it to me

#

And there isn't much of a 2nd hand market here I think

lunar thicket
#

when I built this PC i had not upgraded anything (except the motherboard downsized to itx) since my last build in 2016

reef ivy
earnest grotto
#

no

reef ivy
#

I am thinking a used 3090 tbh, wanted to stick with intel but not sure now

earnest grotto
#

there is no such used market here

#

if you can find one for the mythical $700, power to you 🤷

#

I don't intend to get a new GPU simply because I'm fine with my current one. my last one was an rx 480. I want an upgrade i make to be substantial, but in this age that seems like an increasingly slim chance, i could instead spend more but for now I don't wanna

#

if I started monetizing what I do with AI I might reconsider but... I don't, for now

lunar thicket
reef ivy
#

Yeah, which makes the b770 kinda interesting if it releases anyway lol

earnest grotto
#

if they do intend to release a gaming gpu that's extra fat, they better have something planned for that CPU bottlenecking

lunar thicket
#

For AI it could be pretty cool but I don't expect huge gaming gains

#

B580 for me until Celestial, probably. Id have gone B770 if it came out at the same time

reef ivy
#

I think at higher resolution the bottleneck will be less noticeable which is more likely for people to use with higher performance cards imo anyway

#

If I can game on that b60 and the price is reasonable then that could also be an option

earnest grotto
#

my crystal ball says you probably will be able to game but the price will not be reasonable. but then, I think $600-700 would not be a reasonable price for it

#

and $700 is the mystical magical second hand 3090 price

somber trellis
earnest grotto
#

i don't recognize the first person
I'm afraid to say... I haven't played oblivion
But man, AI TTS is such a boon for memes

somber trellis
#

The first person is just the base male breton race voice

#

AI TTS is a boon for mod creation, too.

#

Morrowind has a mod catered to voicing the entire rest of the game. Parts that aren't voiced, just text redone.

#

Using elevenlabs.

#

A large percentage of morrowind's conversations aren't voiced.

earnest grotto
#

I still chuckle sometimes when I remember that some AI voice mods for skyrim I saw didn't fit in, because the base game's voices are emotionless and repetitive and the AI ones had too much actual emotion

somber trellis
#

it can do skyrim nord

#

Vibevoice is very hit-and-miss. I assume this is because we have little to no control on what is generated.

#

Sadly though it's very slow.

earnest grotto
somber trellis
#

That minute and 28 i did took 21 minutes to generate on int8

#

1.79-1.8IT/s
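
Put as a real-time factor, using only the two durations from the message above:

```python
audio_seconds = 1 * 60 + 28   # "minute and 28" of generated audio
wall_seconds = 21 * 60        # 21 minutes of generation time

# How many seconds of compute each second of audio cost.
rtf = wall_seconds / audio_seconds
print(f"{rtf:.1f}x slower than real time")  # 14.3x slower than real time
```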

somber trellis
earnest grotto
#

I'd suggest trying neco arc but that'd be a big gamble

somber trellis
#

You got a minute of neco arc voicelines that I can just put in?

#

or would I just use this

#

LMAO

earnest grotto
#

Well... They're voice alright, just not sure if they're lines

#

yeah that's what i used

#

and uh, index tts i think, could produce something usable

somber trellis
#

index tts 2 sounds like a tube to me

#

index tts 1 has no emotions but some of the best cloning quality

#

and it was a 1.5b parameter model too

earnest grotto
somber trellis
#

Do you have the text for this?

#

I'll try it.

earnest grotto
#

I was also frying it with the temperature i think

#

This is from just the noises

somber trellis
#

NYAH! Why do you avoid the ai-generated spam channels so much? Trust me, there's nothing scary there. And corey doesn't post there that often.

#

I did it myself anyways lol

earnest grotto
#

What I wanted it to say was along the lines of "Mira... Why do you avoid the AI-generated spam channel so much? Trust me, there's nothing scary there, and corry does not post there that often..."

#

It had a freakout at the start due to me cranking up the temperature, and fumbled "ai-generated" a bit

somber trellis
#

A I generated

#

these ttses look at combined acronyms as words

#

99% of the time
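
A small preprocessing sketch for that, assuming the TTS takes plain text: it spaces out every all-caps word, so emphasized words like "NOT" get split as well. `space_acronyms` is a made-up helper name.

```python
import re

def space_acronyms(text: str) -> str:
    """Insert spaces between the letters of all-caps words, e.g. 'AI' -> 'A I'."""
    return re.sub(r"\b[A-Z]{2,}\b", lambda m: " ".join(m.group()), text)

print(space_acronyms("Why do you avoid the AI generated spam channel?"))
# Why do you avoid the A I generated spam channel?
```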

earnest grotto
#

maybe

#

Mira, why do you avoid the A I generated spam channel so much? Trust me, there's nothing scary there and corry does NOT post there that often! from the metadata

somber trellis
#

only 2 seconds of preview

#

Lol

earnest grotto
#

You should probably cut out the initial freakout part 😂

somber trellis
#

yeah probably

#

sounds like its saying chud

#

The further I go above 1 minute, the more it takes per it

#

2.24 s/it on 1 minute 33 second audio input

#

nvm it went back down

#

phew

#

I should also be using always on top

#

I bet I can cherry pick outta dis

#

windows really doesnt like it when you dont have cmd windows ontop

#

kinda annoying cuz it dips to 2.2-2.3s/it until I focus a window

#

or maybe im suffering what you said earlier, where over time generations just end up imploding the PC

#

cuz now im getting 1.63s/it (I just blackscreened)

#

its like she implodes at the end every time

#

lmao

earnest grotto
#

If you want my workaround, add torch.xpu.empty_cache() somewhere in the inference loop 🤷
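
Roughly what that could look like, as a sketch. `generate` and `step_fn` are stand-ins for whatever the script's real loop and model call are; only `torch.xpu.empty_cache()` comes from the message.

```python
def free_xpu_cache() -> bool:
    """Release cached XPU memory if PyTorch with a usable XPU is present."""
    try:
        import torch
    except ImportError:
        return False
    if hasattr(torch, "xpu") and torch.xpu.is_available():
        torch.xpu.empty_cache()
        return True
    return False

def generate(steps, step_fn, every=1):
    """Run step_fn for each step, clearing the cache every `every` steps."""
    outputs = []
    for i in range(steps):
        outputs.append(step_fn(i))  # stand-in for the real model call
        if (i + 1) % every == 0:
            free_xpu_cache()
    return outputs

print(generate(3, lambda i: i * 2))  # [0, 2, 4]
```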

somber trellis
#

@earnest grotto there we go

#

gotta remove the beginning again

earnest grotto
#

ah, i should really try indextts 2 sometime. iirc doing chinese (hopefully japanese?) to english was one of the big features

somber trellis
#

I instead tried it on a huggingface space to get a taste of how it sounds

earnest grotto
#

30 images of lace. autotagger's opinions:
8 are 1girl, 1 is 1boy, the remaining 21 are 1other
🤔 found this a bit funny and peculiar

lunar thicket
#

autotagger said PrideFlag

somber trellis
earnest grotto
#

also damn, this must be the first time i've actually managed to overtrain a lora. fascinating.

#

(overtrained loras produce spooky results)

somber trellis
#

It requires torchcodec, which requires pytorch 2.8 at the latest.

somber trellis
#

It kept insisting it wasn't compatible with 2.8.0+xpu, so I just didn't go further.

reef ivy
somber trellis
#

Funds? What are those?

reef ivy
somber trellis
earnest grotto
#

god, dolphin is so much better than file explorer. what are microsoft doing...

#

file explorer loads so crazy slow, it's a pain to actually use it to look at my images

earnest grotto
#

yeah, with the xe driver linux training performance is definitely back to reasonable speeds

earnest grotto
#

new qwen image edit dropped

lunar thicket
earnest grotto
#

so far, i know only what qwen claim about it: Multi-image Editing Support, Enhanced Single-image Consistency, Native Support for ControlNet

#

they have a lot of examples there

#

they do have a hf demo

#

multi image makes me hope for style transfer but that's probably still not a thing

#

time to find out

#

the hf space doesn't do multi image

#

wait, nevermind, they link the wrong space

#

double nevermind then

lunar thicket
#

lol

earnest grotto
#

it could just be hf overestimating the time needed? welp, i'll wait i guess

lunar thicket
#

Thanks, I won't be messing with it for now. QWEN is too heavy for my rig

#

I still havent booted up and tried setting up an Inpainting workflow based on your screenshot

earnest grotto
#

4.82s/it training sdxl with 3 batch size and 12 rank, albeit with only 1024^2 images 🤔

#

windows didn't like that as much and was stuck around 10+s/it

lunar thicket
#

training? Neat

#

Like for creating a Lora or something?

#

and thats on your A770?

earnest grotto
#

yes

earnest grotto
#

@dark pasture Use this script
If you want more performance, install the nightly pytorch

dark pasture
#

thanks, wondering why it installs kohya_ss from bmaltais

earnest grotto
#

so that you can also train, if you want

#

make a lora of your favorite anime or video game character, make them do things

#

hopefully one day i will figure out a way to or the models will get generalizable enough to make a spritesheet

#

if the new edit models could properly do both pose and style transfer...

#

How is it that style transfer, one of the really early things from like GAN times if not before, seems to be turning into a lost capability now?

dark pasture
#

well, maybe with help of llm it would be possible to do things 😛

#

btw, what is your maximum resolution for picture generation?

lunar thicket
#

isn't style transfer like explicitly claimed as one of the things that flux kontext, and qwen edit, can do?

earnest grotto
earnest grotto
dark pasture
#

i wonder if there is working IPAdapter for flux? besides XLabs-AI

dark pasture
earnest grotto
#

you can kinda get outside that but you will have issues

earnest grotto
#

I think it's a bit sad they based it on dev and not chroma. though perhaps they had already trained too much on dev before chroma finished

dark pasture
#

ha! Comfy is generating 2560x1920... 17.39 s/it

earnest grotto
#

i should try to compare it with the noobai ipadapter

dark pasture
#

yeah, with sdxl i was getting cool stuff with ipadapter 😄

earnest grotto
#

it also might be a better idea to do the equivalent of hires fix, not because flux will be broken with high resolutions, but because it'll be faster and the early half of the timesteps is not responsible for the fine details in high resolutions anyways
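
A rough way to see the speed argument, as a sketch only. It assumes Flux's 8x VAE downscale plus 2x2 patching (one token per 16x16 pixel block) and that per-step cost grows at least linearly with token count. The function names and the 50/50 step split are illustrative and not part of ComfyUI.

```python
def tokens(width, height, pixels_per_token=16):
    """Number of transformer tokens for an image (assumed 16x16 pixels per token)."""
    return (width // pixels_per_token) * (height // pixels_per_token)


def two_stage_relative_cost(full_res, low_res, low_fraction=0.5, exponent=1.0):
    """Cost of running `low_fraction` of the steps at low_res and the rest at
    full_res, relative to running every step at full_res.

    exponent=1.0 models cost linear in tokens; exponent=2.0 models the
    attention-dominated case.
    """
    full_cost = tokens(*full_res) ** exponent
    low_cost = tokens(*low_res) ** exponent
    mixed = low_fraction * low_cost + (1.0 - low_fraction) * full_cost
    return mixed / full_cost


linear = two_stage_relative_cost((2560, 1920), (1280, 960))
quadratic = two_stage_relative_cost((2560, 1920), (1280, 960), exponent=2.0)
print(f"linear model: {linear:.3f}, quadratic model: {quadratic:.3f}")
```

Under either assumption the two-stage run is noticeably cheaper than sampling every step at 2560x1920, which is the point being made above.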

dark pasture
#

yeah, of course, but it's more interesting to test out stuff like this and try to push the limits 😄

i remember i was doing a x6 latent upscale

dark pasture
#

Vik, thanks a lot for that script. Teacache is doing stuff 😄

lunar thicket
dark pasture
lunar thicket
#

😆

#

I was like what's this guy got setup over there...

dark pasture
#

hmmm, now after every generation i have RuntimeError: UR error: 38 (UR_RESULT_ERROR_OUT_OF_HOST_MEMORY) 😮

need to explore a bit

earnest grotto
#

windows issue

dark pasture
#

well, i am on linux 🙂

earnest grotto
#

never had that on linux, what did you do

dark pasture
#

i do a 3x1280x864 generation and when the next generation starts i get out of host memory

earnest grotto
#

also, you can go edit comfy/samplers.py, after line 990 in outer_sample, add a new line with just torch.xpu.empty_cache() (and the proper indentation)

#

it might help memory usage a bit, though it's more pronounced in training

dark pasture
#

yeah, it feels like it doesn't empty cache well. Also empty cache node doesn't help either

earnest grotto
#

you want it done on every step, not after it's finished sampling or only before it started

#

but, it's also not a magic fix and might not help too much

#

it's only a magic fix for training where it's absolutely necessary sometimes (i distinctly remember i needed it for training ace-step. a shame the trained lora didn't work out though)
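
A minimal sketch of the "every step" idea, outside of ComfyUI. `free_xpu_cache` and `run_steps` are hypothetical helper names; the only real API used is `torch.xpu.is_available()` and `torch.xpu.empty_cache()`, and the helper quietly does nothing when torch or an XPU device is missing.

```python
def free_xpu_cache():
    """Release cached XPU memory if torch with XPU support is present.

    Returns True when the cache was actually emptied, False otherwise.
    """
    try:
        import torch
    except ImportError:
        return False
    xpu = getattr(torch, "xpu", None)
    if xpu is None or not xpu.is_available():
        return False
    xpu.empty_cache()
    return True


def run_steps(step_fn, num_steps, after_step=free_xpu_cache):
    """Call step_fn for each step and run the cleanup hook after EVERY step,
    not just once before or after the whole loop."""
    results = []
    for step in range(num_steps):
        results.append(step_fn(step))
        after_step()
    return results
```

In ComfyUI itself the equivalent is the one-line edit described above. This only shows where the call sits relative to the sampling loop.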

dark pasture
#

torch.xpu.empty_cache() didn't do the trick, but it seems that i need to lower the quant of the gguf models, since a 9 gb model allows further generations, but an 11gb model gives out of host memory after the first generation

earnest grotto
#

compiling the q4 qwen image 2509 took 38 minutes

#

~18.5s/it -> ~15s/it (1 cfg)

earnest grotto
#

i'm gonna assume the lightning lora is just bad for it and wait for a new one 😐

earnest grotto
#

Nonetheless it seems generally... Better?

#

hopefully lora soon

lunar thicket
#

that's a huge gain in speed

earnest grotto
#

I would need to edit 162 images without shutting comfy down for the compile time to have been worth it
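
The break-even arithmetic, as a quick sketch. The 38 minute compile and the ~18.5 -> ~15 s/it numbers are from the messages above; 4 steps per image is an assumption (it matches a 4-step lightning lora), and `break_even_images` is just an illustrative name.

```python
def break_even_images(compile_minutes, s_per_it_before, s_per_it_after, steps_per_image):
    """Number of images needed before the time saved per image pays back
    the one-off compile time."""
    compile_seconds = compile_minutes * 60
    saved_per_image = (s_per_it_before - s_per_it_after) * steps_per_image
    return compile_seconds / saved_per_image


# 38 min compile, 18.5 s/it before, 15 s/it after, assumed 4 steps per image
images = break_even_images(38, 18.5, 15.0, 4)
print(f"break-even after about {images:.1f} images")
```

With those inputs the result lands between 162 and 163 images, which lines up with the figure quoted.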

lunar thicket
vocal swan
#

Hey guys I need some help - I reinstalled comfyui with the script again because it was getting out of date, but now all my generations are black images. The sampler preview shows for a few steps but then goes black.

  • Tried different models (illustriousXL, cyberrealisticXL)
  • Separately loaded sdxl fp16 fix vae
  • Used --force-fp16 argument
  • Used --force-upcast-attention
  • Tried torch nightly and stable
  • Updated comfyui with git pull
  • Used custom and default nodes

should i just switch back to 2.5+ipex and see if that works?

civic charm
#

--force-fp16 is sure to cause black images

#

use --bf16-unet --bf16-vae

#

use torch 2.7.1 or later

vocal swan
#

same issue, using all default settings
--bf16-unet --disable-ipex-optimize --lowvram

i have a b580. at one point some generations got through but it was really inconsistent and most of the time it just goes black

this is the error I get

C:\Comfy_Intel\ComfyUI\nodes.py:1594: RuntimeWarning: invalid value encountered in cast
  img = Image.fromarray(np.clip(i, 0, 255).astype(np.uint8))
lunar thicket
#

No idea here

reef ivy
#

Is that fp16 fix vae still necessary? Make sure all your custom nodes are also updated

civic charm
#

sdxl doesn't work with fp16, if you are using fp16, you have to use fp16 fix vae

#

but intel defaults to bf16, bf16 doesn't have this issue
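
A small stdlib-only illustration of why the two formats behave differently, as a sketch. fp16 tops out at 65504, so a large activation overflows to inf (and inf arithmetic then tends to produce NaN, which decodes as black), while bf16 keeps float32's exponent range at lower precision. The value 70000.0 is an arbitrary example rather than a measured SDXL activation, and `to_bf16` emulates bfloat16 by truncating a float32, which is an approximation of what real hardware does.

```python
import math
import struct


def to_fp16(x):
    """Round-trip x through IEEE half precision; out-of-range values become inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except (OverflowError, struct.error):
        return math.copysign(math.inf, x)


def to_bf16(x):
    """Emulate bfloat16 by keeping only the top 16 bits of the float32 encoding
    (truncation; real hardware usually rounds)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFF0000))[0]


activation = 70000.0  # arbitrary value just above the fp16 maximum of 65504
fp16_value = to_fp16(activation)
bf16_value = to_bf16(activation)
print("fp16:", fp16_value, "bf16:", bf16_value)
```

The fp16 result is no longer finite, while the bf16 result stays finite and only loses precision.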

dark pasture
# dark pasture hmmm, now after every generation i have RuntimeError: UR error: 38 (UR_RESULT...

Hopefully this https://github.com/comfyanonymous/ComfyUI/pull/9979 fixed my OUT_OF_HOST_MEMORY error,

third generation ran smoothly 😛

GitHub

What:
Explicitly call detach() on unloaded model's model_finalizer to avoid memory leak.
Why:
When unloading models in load_models_gpu(), the model finalizer was not being explicitly detach...

earnest grotto
somber trellis
#

for some reason qwen image edit 2509 has issues doing style transfers

#

an entire reddit post on it

#

original

#

"Change his armor color to gold, and make it clean-looking." Qwen-Image-Edit-2509-Q8_0.gguf with the 4-step qwen-image-lightning 2.0 lora

#

Qwen image edit 2509 fails to do pixel art transfers at all.

#

Something I messed around a ton with on the previous version of the model.

earnest grotto
#

if a lora is not trained for the specific model you are using it for, it will be worse

#

that it's usable, yes, it's usable, but it is worse than a lora specifically trained for that model will be

somber trellis
#

I know that.

#

Of course I'm also going to download the 2509 version of the lora when it releases

#

but what I said regarding prompt adherence in certain scenarios doesn't even involve the loras