#ComfyUI for Intel Arc using IPEX
Have you tried kijai's nodes? You can specify the block swap amounts
I can specify when launching comfy too
also, the qwen image edit model was released
How do you do that? Although with the nodes you don't have to restart the webui
huh, I swear there was a launch argument to do that, but I guess not. I must've mistaken it for kohya, as that does have block swapping
I guess I might try kijai's nodes then
heads up: you will need to edit a line of code in the sampler, basically changing fp64 to fp32 etc. For some reason he does it differently than base comfyui. If you're on battlemage it might just work, though.
apparently qwen image's hf space takes more than 300s by default to edit an image, so it won't let me. weird that they made it take that long
with the lightning lora in comfyui inference takes around 3 minutes on qwen-image q8 gguf
problem is the text encode takes a lot longer because it's utilizing an mmproj
20-22s/it
https://huggingface.co/QuantStack/Qwen-Image-Edit-GGUF/
Rename it so it matches the main file. For example, if you use:
```
Main model: Qwen2.5-VL-7B-Instruct-Q4_K_M.gguf
mmproj: Qwen2.5-VL-7B-Instruct-Q4_K_M-mmproj-BF16.gguf
```
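A minimal sketch of that naming rule. The pattern (main file stem + `-mmproj-BF16.gguf`) is inferred only from the single example above, and the helper names are hypothetical, so check what your loader actually expects before relying on it.

```python
from pathlib import Path


def mmproj_name_for(main_model: str, precision: str = "BF16") -> str:
    """Build the mmproj filename from the main GGUF filename.

    Pattern inferred from the example in this thread:
    <main stem>-mmproj-<precision>.gguf
    """
    stem = Path(main_model).stem  # drops only the trailing ".gguf"
    return f"{stem}-mmproj-{precision}.gguf"


def rename_mmproj(mmproj_path: str, main_model: str, precision: str = "BF16") -> Path:
    """Rename an existing mmproj file in place so it matches the main model."""
    src = Path(mmproj_path)
    dst = src.with_name(mmproj_name_for(main_model, precision))
    return src.rename(dst)  # returns the new path (Python 3.8+)
```

For example, `rename_mmproj("models/text_encoders/mmproj-F16.gguf", "Qwen2.5-VL-7B-Instruct-Q4_K_M.gguf", "F16")` would rename a downloaded projector next to its model; the path there is only illustrative.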
I realized I could just use the multigpu cliploader on lowvram and force it on xpu without issues.
I'm going to try qwen image edit. I have a feeling it will be much better than kontext at everything except text 🤔
Man. 48gb (on windows?) is not enough to load the fp8 model. oh well, q6 loads, and after the spike it leaves me 16gb free, which ticks me off
Swap was set to 32gb which should've been enough. I'll retry with 50gb of swap
Replacement for CFG through dropping blocks randomly
If only SAI had gone a little farther with skip layer guidance. I wonder how good this will be in practice, it looks very promising, though they all do, this one feels like it's confirming biases I have and that makes it gooder in my eyes
gguf q8 model works just fine on my end but i have 64gb
using --reserve-vram 8.0
I'm using a q8 mmproj
and multigpu loaders alongside --cache-none and --disable-smart-memory
and --lowvram
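Collected into one launch line, the flags from the last few messages look like this. It assumes you start ComfyUI's `main.py` from the ComfyUI folder with the venv's Python; `comfy_command` is a hypothetical helper, and the multigpu loader nodes are set up in the workflow, not on the command line.

```python
import sys


def comfy_command(reserve_vram_gb: float = 8.0, low_memory: bool = True) -> list[str]:
    """Hypothetical helper: build the ComfyUI launch command described above."""
    cmd = [sys.executable, "main.py", "--reserve-vram", str(reserve_vram_gb)]
    if low_memory:
        # Flags quoted in the thread for squeezing big models into limited memory.
        cmd += ["--lowvram", "--cache-none", "--disable-smart-memory"]
    return cmd


if __name__ == "__main__":
    print(" ".join(comfy_command()))
```

Pass the list to `subprocess.run(comfy_command(), cwd=...)`, or just copy the printed line into your start script.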
I run out while loading it, since RAM usage spikes while loading. If I had more it would load. Windows seems to not care that I have lots of swap. Linux might be better
I have pagefile set to automatic on my end
currently taking 18gb
and normal qwen image too
both operate at the same speed in terms of inference
1024x1024 euler simple is 11s/it and 1536x1536 is around 14-15s/it
that's with the lightning lora tho
and cfg set to 1
I have manually set it to 50gb.
Mm, qwen inference speed seems to scale very well with higher resolutions indeed
1536x1536 has 2.25x as many pixels as 1024x1024 (about 2.36mp vs 1.05mp), but for you and for me the slowdown is closer to the square root of that
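The arithmetic behind that, using the s/it numbers posted just above. It is a rough check on two data points, not a scaling law.

```python
import math


def megapixels(width: int, height: int) -> float:
    return width * height / 1_000_000


base = megapixels(1024, 1024)   # ~1.05 MP
large = megapixels(1536, 1536)  # ~2.36 MP
pixel_ratio = large / base      # 2.25

# Timings reported in this thread (euler/simple, lightning lora, cfg 1):
observed_slowdown = 14.5 / 11   # midpoint of 14-15 s/it vs 11 s/it, ~1.32x

print(f"pixels: {pixel_ratio:.2f}x, sqrt: {math.sqrt(pixel_ratio):.2f}x, "
      f"observed: {observed_slowdown:.2f}x")
```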
Is qwen worth it? Seems really slow.
Yeah, it's unusably slow without the 4 step lora
But with it, it's better than kontext
I haven't tried text yet, I guess. Still poking at things. I only did some tests at higher resolution so far; it seems to follow the prompt less at higher resolution but it still does things, whereas kontext would either do nothing, or do nothing and blur the image
it also might be a bit faster, comparing kontext without any speed loras vs qwen with one
better and faster
A slightly silly question: what argument do I need to set to run ComfyUI from the second ARC GPU instead of the first?
--default-device 1
Say what the actual GPUs you have are. If you're referring to an iGPU and a dGPU, you should probably disable the iGPU instead
i9-13900K, 2x Arc A770. And I see no integrated GPU in Device Manager; I don't remember whether I disabled it in the BIOS or not.
And the --default-device 1 argument does not seem to work.
Open comfy/model_management.py, go to the lines where ipex_to_cuda is imported and run, and after that add torch.xpu.set_device(1), keeping the indentation intact. If you did not install comfy using my script, this would be where comfy tries to import ipex; you can add it after the try-except.
Oh, Thanks. I deeply apologise, I didn't understand everything in the script, but I did it this rough way and it worked. If you could provide slightly more detailed instructions, I would be very grateful.
"def get_torch_device(): global directml_enabled .... if is_intel_xpu(): return torch.device("xpu", torch.xpu.set_device(1))"
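In the patch above, `torch.xpu.set_device()` returns `None`, so it works through its side effect rather than through the value passed to `torch.device`. A different approach, not tested in this thread, is to hide the first card from the runtime: Intel's Level Zero driver honours the `ZE_AFFINITY_MASK` environment variable, so with `ZE_AFFINITY_MASK=1` only GPU 1 should be visible and PyTorch would see it as `xpu:0`. That swaps the `model_management.py` edit for a launch-time setting, assuming your driver stack respects the variable; `env_for_gpu` is a hypothetical helper.

```python
import os
from typing import Optional


def env_for_gpu(gpu_index: int, base: Optional[dict] = None) -> dict:
    """Return a copy of the environment that exposes only one Level Zero GPU."""
    env = dict(os.environ if base is None else base)  # never mutate the caller's dict
    env["ZE_AFFINITY_MASK"] = str(gpu_index)
    return env
```

Then start ComfyUI with something like `subprocess.run([sys.executable, "main.py"], env=env_for_gpu(1))`, or set the variable in the shell before launching.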
Apparently early qwen image edit had a bug in diffusers' inference code. And apparently that was in comfy too, as updating comfy and rerunning the same workflow changed the output
you could also try those multigpu nodes, but last I checked they didn't support intel and I had to edit some code to add xpu in.
I have an ASUS Vivobook S 14 with an Intel Core Ultra 7 (Evo Edition) and Intel Arc Graphics. Is ComfyUI using the NPU or the Arc graphics?
Hello colleagues, I have a question regarding creating an image (text-to-image). I have an Etsy shop that sells digital files prepared for wood engraving.
My problem is that I can’t find a suitable checkpoint + LoRA (if necessary) to get close to the style I used to create with Microsoft Designer. I will attach an image created with Microsoft – this is the look I want to get as close as possible to in my setup.
I have a 12GB graphics card, so full Flux models are not an option. 🙂
Comfy can't use the NPU
Flux is an option, your vram pretty much doesn't matter anymore as long as you at least have a good amount of ram and are willing to wait a slight bit longer
I doubt flux would be able to do this however
I will try with Qwen, I hope it might be able to replicate the style
funny how kontext's prompting guide better applies to qwen
Ok, if you achieve some good results please tell me. Atm I'm on a 12-hour shift and can't try anything
🤔
I think using regular flux/qwen might just be better
how much ram do you have and how long are you willing to wait
- Godricks Castle, Elden Ring
Huh, the 4 steps lora seems to produce better results than the 8 steps one
🤔
Oh and, I guess I might as well post this here too
This is the top result. I have other programs to edit and prepare it for engraving. I have 32gb RAM and a B580 with 12gb VRAM.
oh, Vik. Can I use your ComfyUI setup script to update it too? The recent few updates have been breaking a bunch of stuff, and I fear it will be even worse on Arc
yes
oh and do we even need to update
like is there any significant difference now
last I downloaded was back in April and haven't updated comfy since
can you link it again? the script
3rd post in the pins
if you want to use the qwen edit model, yes
i'm with the idea "If it's not broken, keep it that way"
flux models are tedious on my a750
it also has a minimap now though imo that's a bit of a redundant feature if zooming were better
wanted to try kontext
and group nodes have been improved
no point, go straight to qwen
kontext is just plain worse
how much ram do you have
32
you two need to buy a bit more
lol
since I don't really edit much, I don't think it's worth the hassle for me, since the nodes work nicely in their older versions
just wanted to see if the new comfy updates made any significant changes
I would've posted 8 images (2x4) comparing kontext and qwen anime-ifying but i forgot the bot times out if you post a lot
apparently they will be making a new version as well that does multi image editing better
I don't know anything about qwen, but have goofed around with kontext a bit. Can you elaborate on why it's just plain better?
It produces better results. It has far fewer cases of refusing to do anything, and it's much better at following the prompt. You can compare the 4 images yourself; I've gotten similarly better results with tasks other than animification
And that's with the 4 step lora for the regular qwen, an 8 step lora for this edit qwen just released and a 4 step one will likely come later and those should offer even better results
Kontext's prompt guide never made much sense to me, as prompting differently did not make much of a difference. But for qwen, specifying "while keeping X the same" actually works, and not specifying it sometimes gives a not-the-same result
The only thing which IMO kontext really did well, was removing text and watermarks. However in a quick test qwen seems just as good
One downside is the image resizing, from what I can see you might need to resize to multiples of 112? I'll test
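If it does turn out to need multiples of 112 (still unconfirmed at this point in the thread), snapping a size before the resize node is simple arithmetic. The 112 default is only the guess from this message; `snap` and `snap_size` are hypothetical helpers.

```python
def snap(value: int, multiple: int = 112) -> int:
    """Round a dimension to the nearest multiple, never below one multiple."""
    return max(multiple, round(value / multiple) * multiple)


def snap_size(width: int, height: int, multiple: int = 112) -> tuple[int, int]:
    return snap(width, multiple), snap(height, multiple)
```

With the guessed value, 1024x768 becomes 1008x784; pass a different `multiple` if testing shows another constraint.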
ahh, thanks for the info Vik
I'm redownloading kontext to do a few more comparisons
only if you plan to use new models; comfy has to update to support new stuff.
yeah, but I'm fine with what I have for now
Yeah, just find the commit for when they added whatever model you want and you should be fine, sometimes other nodes get updated to improve performance though.
Some colorization attempts. Original, kontext, qwen. I cherrypicked the best results i could scrounge up. The rightmost kontext result is... Very blurry despite being higher resolution than the others
In this specific case, the person with the white hair is also an albino which I didn't specify in the prompt, so kontext's fried colors end up being a bit better but still
Girl on left should have brown hair, brown eyes. Boy on right white/silver hair, red eyes.
For some reason qwen is prone to outpainting when the resolution is higher. But it also interprets the prompt differently, sometimes better, which can be handy
@earnest grotto Qwen Image
Qwen Image Edit Lightning loras released 10 hours ago.
goin around on gmod messing with people using qwen image edit
is kinda funny
he didnt know
I noticed that text works a lot better at 1328x1328 or higher resolutions.
That's a vae approx of a 1536 image output im doing rn
this one is a 1024 output i got
misspelling of apocalypse, fixable
or maybe im getting bad seeds
Regular lora vs edit
Original, "Make the ground in the image full of grass"
2mp, regular lora vs edit lora
Edit lora less prone to random outpainting 🤔
@keen adder ^
@earnest grotto thanks.
Oops, specifically I used the image edit model with one of the images you sent as a reference
https://huggingface.co/QuantStack/Qwen-Image-Edit-GGUF/tree/main
Since you have 32gb, you should get Q5 again. However, I expect that if you prompt the regular model similarly to Microsoft Designer, you should get a similar result.
Also, launch comfy with --reserve-vram 7
@earnest grotto do you remember the prompt for this image? Because I rendered a more realistic-style image of 2 eagles. This is with Qwen Q4_K_S and the 4 step lora.
"Keeping the black and white sketchy style of the image, create a completely new image of greyscale hordes of rat men are sieging a burning medieval city, removing the pumpkin"
"Remove the bats"
An edit model is unnecessary for a realistic-looking black and white image
arguably it's unnecessary for this kind of image too as the style is very generic, but i wanted to see if it can match it more closely
@vik Prompt: Keeping the black and white sketchy style of the image, create a completely new image of greyscale hordes of rat men are sieging a burning medieval city, removing the pumpkin. Where did I make a mistake? It's very different.
Drag and drop the image I uploaded into ComfyUI to see the workflow
you may need to open it in browser and trim the end of the link that converts to webp
https://cdn.discordapp.com/attachments/1193952640225267802/1409939327349166160/ComfyUI_25321_.png?ex=68af33d3&is=68ade253&hm=b9836bb8c731859ec56b39ff17519c1b4899bd3d757b9aebdefe87ab673df166 This is the link. If I try to drag and drop it, comfy gives me an error:
Alert
Unable to process dropped item: TypeError: Failed to fetch
@earnest grotto
after cutting the link to see the workflow
On Windows 11 the script was giving me permission errors about Comfy_Intel after the third question (Continue?). Double-clicking or right click -> Open with Python didn't work. The way to solve it was to open PowerShell and run "python .\Setup_ComfyUI_Intel.py"
Run it in a place where you don't need admin permissions, like your documents folder
apparently updating various things on linux, probably oneapi, besides breaking steam and all steam games, also nuked lora training performance for sdxl
at least windows works 😐
do you know what got updated?
My horrible performance with anything after IPEX 2.3 on Arch Linux is probably related, as Arch gets the latest packages long before they hit any other distro
I will check later, but I would definitely pin it on latest things
I was sticking with older oneapi or level zero or both, not sure, enough that Blender's Cycles wasn't working
After updating it now works, but, well...
oneapi is irrelevant now
kohya installs its own mkl and dpcpp
pytorch ships its own mkl and dpcpp
level zero is the main driver
also did the kernel update from 6.8 to 6.12?
aka old lts vs new lts
I installed the kernel myself
With 6.16.3-061603-generic, after 87 steps I got 11.5s/it
Will test with 6.14.8-061408-generic now, then 6.12.3 then 6.11 then 6.8, if none break
I believe I was using 6.14 fine
190 steps, 11.4s/it with 6.14
10.99s/it with 6.12.3-061203-generic after 40 steps
10.97s/it with 6.11.11-061111-generic after 40 steps
tried many older versions of libze1, intel-opencl-icd, libze-dev and whatever else with not much luck
I'll be trying more some other time, I guess
@earnest grotto I understand now where the big difference is: you used an img2img workflow, copied the style of my reference image and created a new one. I tried txt2img, and that's where the difference in style comes from.
Last time I updated linux, it broke everything, haven't used it since lol
It was in my user folder.
Tried a lot of sycl's environment variables, more versions of level_zero and other packages, no results
what could be causing urEnqueueKernelLaunch to take 10x as long as it should, jesus
Damn. Even with onednn profiling enabled, windows is ~4x faster, ~7s/it vs 26s/it
Could anyone explain how to use the B580 in Korean..? I've been trying to set up Stable Diffusion for months, but it keeps failing and only runs on the CPU..
You can install ComfyUI using my script ^
You can also use SDNext https://github.com/vladmandic/sdnext or Intel AI Playground https://game.intel.com/us/stories/introducing-ai-playground/
What GPU do you have
I have an Arc B580
Oh yeah, sorry, forgot about that
Install with my script or if you want a simpler installation and a simpler UI, SDNext or AI Playground
Thank you. I'm going to try ComfyUI. Thank you for the good information. I'll try to copy it
can I do img2vid using wan2.2 on my arc a750?
if so, I assume I need to update my comfyui
and updating from the manager will probably break some stuff(?)
How much ram
32
You need more for wan 2.2
Well, you could probably run the q3s but quality will degrade a fair bit
From my own experience, you can get by with 48gb however if you have 32 already, you have 2 free slots and you're better off getting at least another 32
Did it work?
I used all my slots, i can get 16gb sticks I suppose
will this be the case?
if you installed with my script, run it again to update
or:
```
git stash
git pull
git stash pop
```
Okay, thanks
it should work I think, if comfy unloads the previous model first. I still haven't gotten around to using it
GGUF will be the best bet, or scaled fp8; I'd give those a try.
It doesn't unload the previous model
probably end up finding some workaround, maybe if one model is a little less important use the lower quant or something. I really need to get around to trying this stuff out. Want to upgrade my ram next anyway
Using only one of the 2 models defeats the purpose
Might as well just use wan 2.1 then
I was thinking: use a lower quant for the less important model and a higher one for the more important. Not really sure how it works, but I was thinking it's like sdxl
I tried that, having a higher quant for the high noise and lower for low noise is fine
problem is that with less ram you will need lower quants, imo q3 degrades too much, and I expect you'll run out of ram while loading the models with a higher quant, and windows refuses to use swap for that
2x8gb ddr4 costs ~40 euros and 2x16 80 so i think the prices are just low enough that you should get more
Yeah plan is to buy another 32gb kit unless prices drop again
Personally, my crystal ball tells me that nothing ever happens, new CPUs will not be massively better for a few more years, so if you're not planning to upgrade your CPU in the coming years, might as well lock in on ram
block swapping and similar are turning out to just be too good and things are more compute-constrained now
for image/video gen, that is 😛
@lament shale If you really want, you can try both with Q4 actually
These might be able to load with 32gb, not sure, if you close everything else
I'll try. I have so many things running on the bg
You will want to close pretty much everything that eats ram, at least on the first run when the 2 models load
The Q4_K_S quants are 8.5gb each
fp8 umt5 is 6.5gb
23.5gb just for the models, and ~8.5gb left over is not much
Hmm, gguf umt5 could help i guess
q6_k is 4.76gb and should be near identical quality to fp8
if that's not enough, get q3 for the low noise model first and use that, keep high noise at q4
then if that's still not enough, get a smaller umt5, and if even then you don't have enough, I think just call it quits and get more ram. q3/4 text encoder, q3 low and q3 high will just be too degraded imo
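The budget from the last few messages, worked through. It counts only the model files and ignores what Windows and other programs keep for themselves (estimated at several GB elsewhere in this thread), so real headroom is smaller. The dictionary labels are just descriptive names.

```python
import math

RAM_GB = 32

# File sizes quoted in the thread, in GB.
plan = {
    "wan2.2 high noise Q4_K_S": 8.5,
    "wan2.2 low noise Q4_K_S": 8.5,
    "umt5 fp8": 6.5,
}


def budget(files: dict, ram_gb: float) -> tuple[float, float]:
    """Return (total model size, RAM left over) in GB."""
    total = sum(files.values())
    return total, ram_gb - total


total, left = budget(plan, RAM_GB)  # 23.5 GB of models, 8.5 GB left

# Swapping the text encoder for the q6_k GGUF (4.76 GB) frees ~1.7 GB.
smaller = {name: size for name, size in plan.items() if name != "umt5 fp8"}
smaller["umt5 q6_k"] = 4.76
total_q6, left_q6 = budget(smaller, RAM_GB)  # ~21.76 GB, ~10.24 GB left
```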
I'll make a comparison in a bit
I take some things back. Bloating my pagefile to 50GB may have helped
though subsequent generations look to be ~40 seconds slower per step, ~80s->~120s
Not too bad I guess?
Have you tried kijais nodes? They usually have more control over RAM/VRAM allocation
🤔
Feels like imgur has killed a lot of the quality but oh well
Yeah I'm compressing and uploading the videos here directly, this is too bad
Q3 basically kills the motion in the grass and the silhouette of the moving branch, and introduces some artifacts
High noise Q4 breaks the eyes
But using q4 as the high noise is at least better than Q3 with grass movement
I think the lower quants' issues, particularly Q3's, will be more noticeable with plain text to video rather than image to video 🤔
@lament shale If you have space on an ssd, you can try massively increasing your pagefile

Previously I was using ~10-20gb of page file, at 50gb I could do both Q8s at once
Though, this is with 48gb of ram
I would say that while worse, q3 seems usable at least. That amount of RAM seems crazy for a 16gb card, though
128gb is probably the real play for ai, dunno if ddr4 goes higher?
Switched to the "Xe" driver on Linux and now she is fast with PyTorch 2.7.1
The horrible non-blocking performance issue also doesn't exist with the Xe driver
unfortunately int8 matmul is still slower than bf16 and still uses 2x more memory than bf16
qwen-image also got a nice boost
if this is sdxl with euler ancestral, i get about this speed on windows as well
though not so much for qwen
hey, I have an Intel Arc A770 16GB, does anyone know if I can run wan 2.2 5b?
I'm pretty sure it was slower on windows too before some recent update
Yes
I'm facing a reconnecting problem when I run wan 2.2 5b on the Arc A770, what is the issue?
5b should run no-issue, should even run on a750 without much tweaking but can't say for sure yet
Is this with the workflow I showed
What is shown in comfy's command prompt
Do you have anything else running
Is that an 8gb a770 or 16gb
how are you running comfy and with what arguments
give more info
Yeah, this workflow: wan 2.2 5b on an Arc A770 16gb, using ComfyUI in AI Playground. I'm facing the reconnecting problem, but when I run it with a quantized version it works, though the video quality is very bad
How much RAM do you have
16gb. Is more required?
Video quality is bad because wan 2.2 5B is only 5B and they have not trained it more (I assume, it shouldn't be their priority like the 2x14B models are)
you're running out of RAM with 5B when it's not quantized
You barely have enough RAM to run the 5B model, you don't have enough to run anything better, so yes, you do need more
even 32 is kinda on the low side for video. you might be able to get by with just 32
to run the 2 14B models
This is the bad quality
you've changed some settings incorrectly
I just changed the resolution to 480p
720p takes a long time
what do you think is causing this quality?
Show the workflow
Write a longer prompt and increase the resolution
I guess this is another reason why you should want to get more RAM, I can't find any few-step ("lightning") lora for the 5B model, so you would literally generate faster with the 2 14B ones anyways, if you had the RAM to load them into
can you suggest any model that I can run on my system?
q4 wan 2.1, q6 umt5
potentially q3 wan 2.1, dunno how much ram windows will keep for itself
16gb of ram is below the recommended specs of a good chunk of modern games anyways, you should really consider getting more
If quality is bad on 5b use the ones from kijai, he supposedly fixed it at some point.
I am compute limited, and for the things I want to do, 48gb is also starting to seem not enough, or going above it would give a good speed boost
block swapping is too much of a good thing
think about it this way: a video model takes let's say 60s/it, with an iteration basically going through all the model's weights
even if the model was 40gb big, 60 seconds is just so, so, so much longer than it would take for all that data to move from ram to vram
even with PCIe 4.0 x8, that's 16GB/s, never mind x16 or PCIe 5.0 x16 or having part of the model already in VRAM at the start or whatever else
like how it's faster to use a q8 model with block swapping than the other not-neatly-8-bit quants without
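The same back-of-the-envelope argument as code. Bandwidths are approximate theoretical per-direction peaks (real transfers are slower), and the 40 GB / 60 s figures are the deliberately pessimistic example from the message above.

```python
def transfer_seconds(size_gb: float, link_gb_per_s: float) -> float:
    """Time to stream `size_gb` of weights across a link, ignoring overheads."""
    return size_gb / link_gb_per_s


ITER_SECONDS = 60  # example iteration time from the message above
MODEL_GB = 40      # deliberately oversized model from the same example

for name, bandwidth in [("PCIe 4.0 x8", 16), ("PCIe 4.0 x16", 32), ("PCIe 5.0 x16", 64)]:
    t = transfer_seconds(MODEL_GB, bandwidth)
    print(f"{name}: {t:.2f} s per full pass, {t / ITER_SECONDS:.1%} of an iteration")
```

Even at x8 the whole model moves in about 2.5 s, a few percent of a 60 s step, which is why block swapping costs so little when compute is the bottleneck.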
musubi tuner apparently got qwen edit support. but evidently that needs at least 64gb (for block swapping) and bnb is still kinda up in the air for intel
luckily it has kontext support too but I kinda feel bad about even considering training a kontext lora due to how much massively better qwen is
if models weren't trending towards being bigger and slower...
I think the monetization goal for these companies is to have a free really big model and really small model then have a fast premium model that is in between.
Basically sell it to companies to use for apps and get paid fees, etc.
OpenVINO has the same 2.35 it/s speed regardless of the driver
Tho Xe driver is a bit unstable for daily desktop use
Anyone noticing or seeing reports of people having issues installing/running ComfyUI using PyTorch 2.8?
I had an issue but i just had to install the latest version of Intel deep learning essentials. Now it runs perfectly with SDXL with a nice performance uplift.
There is this issue on Linux with up to date distros: https://github.com/pytorch/pytorch/issues/159974
The issue we're seeing is that the environment fails. It doesn't happen on all systems, but it's fixed by going from torch 2.8.0+xpu back to 2.7.0+xpu
Wondering if something with the upstream wheel has an erroneous system level dependency or path
I tried reproducing it but it just works
It's very hard to reproduce. It's happening on some, but not all, systems that had previously worked and then for some reason can't on PT 2.8. Saw it on a lab system. Reinstalling Windows fixed that PC. On others that have it, going back to PT 2.7 gets past the error and the install works.
Tag me if you guys see this issue.
Trying to get vibevoice to work.
Currently can't get it to load into XPU.
Works on CPU, both the 1.5b and 7b models.
tbh tho im still not impressed with the voice cloning quality
He inhaled only a moderate amount of helium
it was sheogorath in disguise
This sounds as if
markiplier
tried to do the imperial watchguard voice
(this is a copypaste)
imperial watchguard tries the internet and complains
ok i kinda take it back it's pretty good at voices
It's properly loading and running with ipex_to_cuda and ipex 2.8.10 and inferencing at 1.36s/it with vibevoice 7b
xpu outputs seem dodgy
^ on CPU
This is just to show the quality on xpu vs cpu, which with just ipex + ipex_to_cuda seems much worse for rn
is that all running on diffusers? Or are you using a quant of some sort?
Nope. These last few are all CPU.
Takes up like 44gb of sysram
Not comfyui, sadly. The two current nodes I've tried to get working but both of them have issues properly allocating to xpu
I instead just ran the official repository, backed up.
https://github.com/vibevoice-community/VibeVoice
https://huggingface.co/vibevoice/VibeVoice-7B
My next upgrade will be RAM. I was going to just add another 32gb kit, but maybe I will try a 64gb kit; I have to find similar timings. Models seem to be getting bigger and bigger
Are you running out of ram 🤔
yeesh
I guess I will need to see what happened to bitsandbytes support
8 bit adamw might be pretty necessary for kontext and jesus, qwen...
😐
Hopefully they're good enough
(same for windows)
NotImplementedError: The operator 'bitsandbytes::optimizer_update_8bit_blockwise' is not currently implemented for the XPU device. with the latest build
rest in pizzeria. peppino. pizza tower.
With wan2.1 14b I couldn't use any of the loras because I would OOM, or it would be pushing the limit. I think with 32gb RAM I have 24gb of usable "vram"
yeah i think windows likes to eat ~7gb for itself without debloating(?)
though since the amount it uses like that might depend on how much ram you already have, i wasn't certain
TorchAo also has AdamW 8bit
That works on intel?
It should
I will try
Also you can use Adafactor too, it will use close to no memory
Well, guess I might as well try CAME too if musubi has it
Original came has no optimizations implemented
It will use as much memory as AdamW and will run very slow
They forgot to disable gradients on the optimizer. This is the most basic thing
ah, i guess part of my issue is training with 1mp images, but then kontext didn't seem very intended for lower resolutions... oh well, i'll scale them down anyways
🤔 going down from 1mp to 0.6mp was a 3x boost in speed but vram usage is still massive
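One possible reading of that 3x, assuming the attention layers dominate the step: the number of image tokens scales with pixel count, and self-attention cost grows with the square of the token count, so going from 1 MP to 0.6 MP predicts roughly (1/0.6)² ≈ 2.8x for attention versus about 1.67x for the per-token layers. Reduced memory pressure could also contribute; this only checks that the number is plausible.

```python
def attention_speedup(old_mp: float, new_mp: float) -> float:
    """Speedup of an O(n^2) attention step when pixels (hence tokens) shrink."""
    return (old_mp / new_mp) ** 2


def linear_speedup(old_mp: float, new_mp: float) -> float:
    """Speedup of per-token (O(n)) layers such as the MLPs."""
    return old_mp / new_mp


print(f"attention: {attention_speedup(1.0, 0.6):.2f}x, "
      f"per-token layers: {linear_speedup(1.0, 0.6):.2f}x")
```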
Another thing I forgot... torch.xpu.empty_cache() right before and after backward() drastically reduces vram usage and improves performance. as always. ~15.3s/it -> 13.3s/it with nothing else changed, and VRAM usage went from.. I think ~14->9.3GB? (For the number of blocks I set it to swap, which was the highest, 34) Shared is still rather high however
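The `empty_cache()` trick as a small helper, assuming a plain PyTorch training loop. musubi tuner's real loop looks different, and where to hook this in there is up to you. It falls back to a normal `backward()` when no XPU is present, so it can be tried on CPU.

```python
import torch


def backward_with_cache_flush(loss: torch.Tensor) -> None:
    """Run loss.backward(), emptying the XPU allocator cache before and after.

    Tip from this thread: on Arc this lowered VRAM use and sped up training.
    """
    xpu = getattr(torch, "xpu", None)  # absent on older torch builds without IPEX
    flush = xpu is not None and xpu.is_available()
    if flush:
        xpu.empty_cache()
    loss.backward()
    if flush:
        xpu.empty_cache()
```

In a loop this replaces the usual `loss.backward()` call; `optimizer.step()` stays where it was.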
@earnest grotto I think vibevoice currently trumps any competition open-source-wise.
It's not even comparable.
I can tell that's supposed to be kleiner but it just doesn't sound like kleiner
Also it gets randomly quiet
Being able to do long speech without changing the voice is good I guess... Unless the random quietness is an artifact of not truly long speech
There are parts where it nails the "kleiner" feel
but then there are other parts of the voice that it adds in that doesn't.
It's a hit and miss.
I also noticed the volume decreased halfway-through.
Not sure why it does this.
Also has that voice artifact that you'd hear from older tts models
A hiss.
Seems models with better emotion and natural speaking have harder times with capturing the likeness 100%. Might just need more training data though as most can make the voice off small snippets it seems.
Yea that's better
It definitely trumps the others in terms of voice similarity
I'm currently waiting for the experimental gguf models to get somewhere on vibevoice
I really want a way to run this model faster.
yeesh musubi tuner's inference script has a lot of bugs
the trained lora wasn't doing anything in comfy so i wanted to see if it's a comfy issue or it just needs that much more training or what, and man...
New qwen lightning lora, and one for edit soon™
damn, pretty nice
Does anyone else have problems with either the opencv-python module trashing the bed or the WAS node itself mishandling that particular module to a grievous extent? (as in, you can reliably nuke your ComfyUI installation just by installing the WAS node, which then does something stupid with opencv-python, which then breaks anything relying on it)
Did PyTorch fix some regression, or is this just the Xe driver improvement for Linux?
How is it with windows?
That's what I'm wondering too
i don't use windows
Fair
2.25it/s
well, I haven't tried 2.10 I guess
i am scared that musubi tuner's resuming doesn't work so I kinda don't wanna stop training now
I'm sure a random morning/night 3 second blackout or brownout will eventually get me though
Ah, I spoke too soon. The slight speed boost is also on windows
New Lumina dropped. Man, at this rate anime models are really getting left in the dust
Lumina-DiMOO: An Omni Diffusion Large Language Model for Multi-Modal Generation and Understanding
8B and some of their examples really give me confidence. And the last Lumina was alright
Got my hopes up with loading https://huggingface.co/DevParker/VibeVoice7b-low-vram/
getting 3.7it/s
Sadly it won't actually produce any audio on my end. It's a bnb 4-bit model.
It loaded onto my gpu. I have bitsandbytes 0.48.0.dev0 installed.
Clearly not properly though, since it didn't produce anything.
Dumb question but where is this rate displayed? Is it in the console when running a job?
I am curious
in cmd, as I'm unable to run vibevoice through comfyui
its displayed in cmd as well for comfyui btw
ah gotcha 👍 thanks
Very much a novice just tinkering in my spare time with comfyui
AIPlayground made the install so easy
😂
since 2.10 is 2.1, a lot of the ipex_to_cuda version checks now don't make sense
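We haven't looked at exactly how ipex_to_cuda compares versions, but "2.10 is 2.1" is the classic failure of treating a version string as a decimal number. Comparing integer tuples avoids it; `packaging.version.Version` does the same job more thoroughly if that package is installed. `parse_version` below is our own sketch.

```python
import re


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '2.8.0+xpu' or '2.10.0.dev20250101' into a comparable tuple of ints.

    Keeps only the leading numeric release part. Note (2, 8) < (2, 8, 0),
    so compare versions with the same number of components.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"no numeric version in {version!r}")
    return tuple(int(part) for part in match.group().split("."))


print(float("2.10") > float("2.8"))                # False: 2.10 reads as 2.1
print(parse_version("2.10") > parse_version("2.8"))  # True
```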
Did they add 4bit support for intel recently? Last I checked it wasn't there.
bnb claims it has qlora support for intel, which needs 4 bit
I got some high hopes
a person named calcuis is working on gguf-connector
he already has a vibevoice 1.5b model quantized and it works on xpu fine with some changes to the code
i had just sent him the link to aoi-ot's 7b model backup
since he didnt even have 7b supposedly (asked him in hf discussions)
Hi there!
Anyone know how to send PM to Bob Duffy? I registered yesterday on Discord and sent Bob a friend request.
But there is no guarantee that he will accept the request. I am new to Discord and don't understand anything.
My PM is very long and I also don't want anyone else to read it. Please give him a sign, someone!
Why
what why?
Why do you want to DM him
^ Ngl he posted the world's most LLM generated greeting.
Suspicious.
Spent a while bashing my head against the lora I trained, now am confused if kontext is just THAT stubborn or what, as I still see literally no change, even after using musubi tuner to merge into kontext and then using that in comfy
This is not possible for new Discord users who are not friends
Yes, and why do you want to do it
I have a lot of different questions. You might find this boring.
state your questions
I will do so as soon as I can get Bob's contact information. As soon as Mr. Duffy directs me to you, I will immediately express my questions to you.
Hi Bob. I have many different questions about CPU, B60 dual, MB, AI Playground...
I'm not looking to be your friend on Discord, but can I send you a PM?
If you are new to not only Discord but all types of online chat, then it is reasonable you wouldn't know that what you are doing is frowned upon and actively discouraged
(that is the reason Discord is preventing you from doing exactly this)
You should simply make a post in #1243956384052285560 or #1088926345138012160 with your questions. Or perhaps start with one question at a time
An individual like Bob (intel employee) is in a one-to-many situation with the many, many general users.
Imagine if every person with a question for him just DMed him all the time, and no systems were in place to prevent it. He would be buried under an avalanche of DMs.
Thanks for your comments.
I'm 51, and I've been using the Internet for almost 30 years. No place except Discord has ever made me uncomfortable with its interface.
In addition, my Discord is buggy and hangs.
The issue of "one for all" is a personnel problem of Intel. There must be assistants for this. But I hope that Bob will be very interested in receiving feedback from me later.
I'm so fed up with the "professional" advisors I met on Reddit, that I don't want to risk it anymore and put my questions out in public.
Today, you can count on the fingers of one hand the number of people who are planning a similar PC build to mine.
In three months, there will be more of these people. And these new people will turn into, as you said, "many general users".
The first part of that problem is thinking intel's employees roam reddit
This discord exists as an insiders community for the people that mess with and have questions about intel products
I found this thread linked in a YT video. Is the CMD method for installation still working? When I try to launch it after installing requirements-ipex.txt, I get a bunch of errors: numpy incompatibility, pytorch incompatibility, torchaudio missing, av missing
I've made a script that installs comfyui (or kohya_ss) for you #1193952640225267802 message
(See also, 3rd pinned message)
@earnest grotto
I got an LLM to create a bnb test script for me
It can do 4 and 8 bit inference supposedly
Only thing we cant do
well... without seeing the code there isn't much to say
furthermore, not everything is linear layer
adamw 8 bit definitely doesn't work, though that's training not inference
this basically only tests if bnb will throw an exception or not
it's not great as an actual test
just in this case, I'm willing to assume that if there's no exception, bnb has it fully implemented
and, again, linear only
I could just ask it for a more complex test.
but they do list qlora support so i'm sure they should have inference for 4 bit at least
take what i say with a grain of salt lmao i cant code in the first place
https://github.com/Disty0/sdnq
SDNQ is faster than BNB on Nvidia
Works with any GPU or device
for AdamW 8bit
sdnq.optim.AdamW
optimizer_args: use_quantized_buffers=True
use_quantized_matmul feature for inference "works" on A770 but slows down the model instead of making it faster, so set that to False
Nvidia runs almost 2x faster with it
RX 7900 XTX, without proper INT8 hardware, manages to run 10% faster than FP16 with it
im kinda frustrated rn
for the last week ive been trying with my non-coding brain to modify the gradio_demo.py in vibevoice's repository
I don't know what I should use if I want to load a large model with offload capabilities
I wondered if I could use sdnq to quantize it to int8, allowing the model to fully fit into my gpu memory
but I don't really know how to do that either
https://github.com/Enemyx-net/VibeVoice-ComfyUI/releases maybe this has something?
I don't think I can get Vibevoice-7b working on that.
I've tried that one and two others
It is used the same way as any other quant option like bnb, torchao and quanto
set use_quantized_matmul to False on Alchemist tho
also sdnq supports anything between 1 to 8 bits and also has int and uint quants
you can try uint3, uint4, int5, int6 for low bit
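Since SDNQ supports anything from 1 to 8 bits, a quick back-of-the-envelope estimate helps when choosing a bit width. This is a minimal sketch that counts the weights alone (params × bits / 8); it assumes a nominal 7B-parameter model and ignores quantization scales, layers left in higher precision, activations and runtime overhead, so real VRAM usage will be noticeably higher.

```python
def weight_memory_gib(num_params: float, bits: int) -> float:
    """Rough size of the weights alone at a given bit width, in GiB.

    Ignores quantization scales/zero-points, layers kept in higher precision,
    activations and framework overhead, so real usage will be higher.
    """
    if not 1 <= bits <= 32:
        raise ValueError("bits must be between 1 and 32")
    return num_params * bits / 8 / 2**30


if __name__ == "__main__":
    # Nominal 7B-parameter model (an assumption, for illustration only)
    for bits in (16, 8, 6, 5, 4, 3):
        print(f"{bits:>2}-bit weights: ~{weight_memory_gib(7e9, bits):.1f} GiB")
```

Halving the bit width halves the weight footprint, which is why int8 can be enough to fit a model that does not fit at bf16 on a 16 GB card.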
i fed 2.5 pro the repos for vibevoice and sdnq and got it to make a working script that loads the model into cpu, quantizes to int8, and moves it over to xpu.
I'm getting speeds around 1.7s/it and it takes 12gb of vram on the vibevoice 7b model
ok guess im not posting any snippets the bot's onto me again
it has flash attention 2 still there but it auto-fallbacks to sdpa since it isnt supported
i forgot to get it to change that
You can set quant device and return device to xpu
1.67s/it
around the same speed, but more vram is used
Also looked at the code a little bit
14.5 instead of 12
Why are you not using quant config on VibeVoiceForConditionalGenerationInference.from_pretrained?
And loading the unquantized model to memory instead?
its a script spaghetti coded by gemini
i also didnt know there was a quant config
lmao
This is the quant config
I didn't even list the other manual method
Also to disable compile: set SDNQ_USE_TORCH_COMPILE=0
Tho i would much rather figure out why triton is failing than disabling it
It has quite a bit of speedup
@civic charm I don't think gemini succeeded in using sdnqconfig lmao
it won't load onto xpu anymore
oh i think i know why lmao
it removed quantization_device and return_device lmao
or not
🤷♂️
Loading part seems right
What is the actual issue?
It states it's trying to allocate 8gb on 16gb but it's failing to do so.
device_map = self.device is the culprit
transformers will try to allocate everything at once
so manually set it to device_map = "xpu"?
And you will hit the 4gb alloc limit
No, set it to cpu, or use set UR_L0_ENABLE_RELAXED_ALLOCATION_LIMITS=1 to emulate allocations above 4 GB
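If you launch from a Python script rather than a cmd window, the same variable can be set from inside the script. This is a sketch under two assumptions: that the Level Zero runtime reads the variable when it initialises (so it has to be set before torch is imported), and that the single-allocation ceiling is about 4 GiB as discussed here. The helper name is hypothetical.

```python
import os

# Assumption: read when the Level Zero runtime initialises, so set it before
# importing torch (or set it in the shell before launching, as in the chat).
os.environ.setdefault("UR_L0_ENABLE_RELAXED_ALLOCATION_LIMITS", "1")

# Assumed per-allocation ceiling without the relaxed limits.
SINGLE_ALLOC_LIMIT = 4 * 2**30


def needs_relaxed_limits(num_elements: int, bytes_per_element: int) -> bool:
    """True if a single tensor of this size would exceed the assumed 4 GiB limit."""
    return num_elements * bytes_per_element > SINGLE_ALLOC_LIMIT


if __name__ == "__main__":
    # A hypothetical 8 GiB bf16 buffer (2 bytes per element)
    print(needs_relaxed_limits(4 * 2**30, 2))
```

`setdefault` is used so an explicit value already set in the shell is not overridden.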
👍
added the latter command, works now
around the same speed as the previous scripts, 1.79s/it
its spiking to 93-98% xpu usage and its at 12.8gb of vram
This shouldn't have a difference on running speed, only on loading
For running speed, look into this
btw, sdnq also has my CAME implementation that doesn't run like a potato
Also has optional 8bit support on top
sdnq.optim.CAME
To enable 8bit, pass this to optimizer args: use_quantized_buffers=True
reverted to current stable torch+xpu build (2.8.0)
AttributeError: 'TritonLauncher' object has no attribute 'shared_library'
i think im missing something
Triton's automatically installed when you install from pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/xpu
im assuming im not supposed to use it
That is the correct way
I'm on pytorch-triton-xpu 3.4.0
C:\Users\dbs_5\Comfy_Intel\cenv\python.exe
C:\Users\dbs_5\AppData\Local\Microsoft\WindowsApps\python.exe
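The two paths above suggest two interpreters are on PATH: the venv one and the Microsoft Store alias under WindowsApps. A quick way to confirm which one is actually running is sketched below (the helper name is hypothetical). Whether this caused the Triton error is not established here; the general point is that `python -m pip ...` run with the venv's own python.exe installs packages into the environment ComfyUI uses.

```python
import sys


def interpreter_info() -> dict:
    """Report which Python is running and whether it is inside a virtual environment."""
    return {
        "executable": sys.executable,
        # Inside a venv, sys.prefix points at the venv while sys.base_prefix
        # still points at the base installation.
        "in_venv": sys.prefix != sys.base_prefix,
        "version": sys.version.split()[0],
    }


if __name__ == "__main__":
    for key, value in interpreter_info().items():
        print(f"{key}: {value}")
```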
Alright, time to run pip3 install --pre --upgrade --force-reinstall torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/xpu
pytorch-triton-xpu==3.5.0+git1b0418a9
current script
current error
I have no idea what this is:
OSError: [WinError -529697949] Windows Error 0xe06d7363
I guess triton support on Windows isn't ready yet
llm added funny if statement to the device_map and removed quantization_device and return_device
caused it to not work
that latest script doesnt work for some reason, outputs garbled even with --torch_no_compile
I think I kinda gave up after seeing no change in 4000 steps at 0.0002 lr, plus my experience with how plain stubborn kontext can be with prompting in some cases. I'm assuming the model itself is just that fried. I'd probably have to train qwen, and I'm not sure I have the RAM for that. Maybe if I can hack musubi tuner to load in fp8 right off the bat, since it tends to load in bf16 and then convert to fp8
went back to the non-compile script and im happy with it working at 1.7 it/s on pytorch 2.10
nearly double the speed of running on cpu, so that's cool
or not, because the first generation has artifacts, then the second generation is gibberish
The CPU loading script doesn't degrade, but the xpu only ones do.
It stays at a stable 1.71-1.8 it/s, while the ones using xpu-only loading alongside SDNQConfig are either artifacted or garbled, or the model doesn't output any audio data at all. Sometimes it just goes straight to 3 it/s and generates nothing.
I'm very, very confused as to why.
just to make sure, did a second run in the same
random copypasta
After training the neta lumina lora for even longer, i feel it's actually coming along better (12.5k steps)
Really makes me wonder what did neta screw up. Perhaps they overfed it poopoo autogenerated slop captions but otherwise the large amount of images they trained on was still good 🤔
pretty sure that model doesn't even understand basic quality tags like "best quality", "low quality", "worst quality" despite them insisting on those tags in the prompt guide
is anyone else experiencing or has experienced a weird issue where half way through generation the image will black out entirely?
What pytorch version
im using 2.9.0+xpu, i also use this persons installer script but tbh i havent tried installing comfy without it in awhile https://github.com/a-One-Fan/ComfyUI-Intel-Installer-Script EDIT: just realized this is your script, i have in the past always used the 2.5+xpu option but ive recently swapped to the stable pytorch version to see if i can get improvements
That's me yes
Anything after 2.5 is faster than 2.5 yes, especially the recent nightlies get up to 2.45it/s with sdxl (vs ~1.85it/s)
Show your workflow, when did you last restart your pc
I tried a variety of things and I’m not exactly sure what fixed the issue. I last restarted my PC yesterday and I’ve included a picture with my workflow. My main goal was to see how the new stable option would affect performance, stability, and compatibility. I also selected the option to install the recommended nodes including KJ, RGThree, and others. I think that’s when the problems started possibly because I was trying to use my old workflow on a new version (I hadn’t updated ComfyUI in a while) or due to some other factors. But recently I uninstalled the new version and went back to the older 2.5+xpu version and things have been running smoothly since. I plan to try updating to the stable version of pytorch again to see if that was the issue. For now I’m unable to pinpoint exactly what caused the problem but thank you for the responses.
does resolution affect the it/s rate? i assume yes.
but if thats the case, is there a standard for benchmarking like this?
Yes but you should not generate anything that is not 1mp from scratch with most sdxl models
exactly 1mp or "1mp or less"?
640x1536, 768x1344, 832x1216, 896x1152, 1024x1024, ±64 pixels
and vice-versa
some newer anime finetunes are trained to be able to work at higher resolutions without breaking. but generally, you can assume sdxl performance is 1024*1024 or might as well be 1024*1024
lower -> image gets fried
higher -> repeating patterns
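To pick one of those ~1 MP SDXL resolutions for a desired aspect ratio, a small helper can choose the closest bucket. This is a sketch: it only uses the five sizes listed above plus their flips, not the ±64 variants, and it does not apply to the newer finetunes trained for higher resolutions. Ratios are compared in log space so that 2:1 and 1:2 are treated as equally far from 1:1.

```python
import math

# ~1 MP SDXL resolutions from the chat, as width x height; flips are added below.
BASE_BUCKETS = [(640, 1536), (768, 1344), (832, 1216), (896, 1152), (1024, 1024)]


def all_buckets() -> list[tuple[int, int]]:
    """Portrait buckets plus their landscape flips, deduplicated."""
    buckets = set()
    for w, h in BASE_BUCKETS:
        buckets.add((w, h))
        buckets.add((h, w))
    return sorted(buckets)


def nearest_bucket(width: int, height: int) -> tuple[int, int]:
    """Return the bucket whose aspect ratio is closest to width/height (log-space distance)."""
    target = math.log(width / height)
    return min(all_buckets(), key=lambda wh: abs(math.log(wh[0] / wh[1]) - target))


if __name__ == "__main__":
    print(nearest_bucket(1920, 1080))  # a 16:9 request
```

For a 16:9 request this picks 1344x768, which keeps the pixel count near 1 MP instead of rendering at 1920x1080 and risking repeating patterns.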
thanks for info
Could someone help me get it working on a Core Ultra iGPU in Linux?
This is what I did:
git clone https://github.com/comfyanonymous/ComfyUI.git ~/ComfyUI
cd ~/ComfyUI
python3 -m venv venv
. ./venv/bin/activate
python3 -m pip install --upgrade pip
pip install -r requirements.txt
pip uninstall -y torch torchvision torchaudio
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/xpu
pip install --upgrade comfyui-frontend-package
Check:
python3 -c "import torch; print('Torch version:', torch.__version__)"
Torch version: 2.10.0.dev20250914+xpu
However I get this error:
/home/floki/ComfyUI/venv/lib/python3.12/site-packages/torch/xpu/__init__.py:61: UserWarning: XPU device count is zero! (Triggered internally at /pytorch/c10/xpu/XPUFunctions.cpp:115.) return torch._C._xpu_getDeviceCount()
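"XPU device count is zero" with a +xpu build usually means the wheel itself is fine but the runtime cannot see a GPU. Common causes on Linux (guesses, not diagnosed in this thread) are the Intel GPU compute runtime / Level Zero driver packages not being installed, or the user lacking access to /dev/dri/renderD* (typically granted via the `render` group). The hypothetical helper below prints what PyTorch sees, so you can compare before and after changing those; `torch.xpu.is_available`, `device_count` and `get_device_name` are standard PyTorch XPU calls.

```python
def xpu_report() -> dict:
    """Summarise what PyTorch can see on the XPU backend; does not fail if torch is missing."""
    report = {"torch": None, "xpu_available": False, "device_count": 0, "devices": []}
    try:
        import torch
    except ImportError:
        return report
    report["torch"] = torch.__version__
    xpu = getattr(torch, "xpu", None)  # absent on very old builds
    if xpu is not None and xpu.is_available():
        report["xpu_available"] = True
        report["device_count"] = xpu.device_count()
        report["devices"] = [xpu.get_device_name(i) for i in range(report["device_count"])]
    return report


if __name__ == "__main__":
    for key, value in xpu_report().items():
        print(f"{key}: {value}")
```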
I was using Flux models and thought this was just normal B580 generation speeds, but then I saw all the posts in here from folks talking about sdxl speeds so I tried that instead. Holy smokes lol. Sdxl models are just way, wayyy faster.
What are you doing
Anime? Realistic stuff? Inpainting? etc.
Right now I am doing anime, but just for learning prompting. With flux I was trying to create more realistic scenes
I am trying to learn comfy and prompting first.
There are no good anime finetunes on any model after sdxl
I havent figured out how to properly do inpainting or img 2 img or upscaling stuff. The last sounds like it should be dead simple but when i tried it it just didnt do anything lol
What does this mean?
Get animagine 4.0 opt or illustrious 2.0, read each's prompting guide that explains the quality and artist tags, and open up gelbooru and look at what tags images have
btw, how are the speeds on b580 with sdxl 1024x1024 and with what pytorch version?
I got illustrious to try out but idk what version lol
this one is nice: https://civitai.com/models/1299502
I am away from my desk or I'd tell you precisely, sorry. I'll follow up next time I get on my desktop.
but i was doing 896x1152, batches of 4-6 and it was very fast. Pytorch version is 2.8 I believe
i am trying to figure out if battlemage is able to deliver its full potential. If it can, it will be faster than an RTX 3090 Ti.
Base models can't really do anime art. They might have some off-putting corpo art-looking style (see: ghibli trend) and know who hatsune miku is and that's about it. To be honest, even SDXL and 1.5 did actually know the artstyles of some classical art, so that was at least interesting. So you need someone to go and actually train them on a bunch of art
This has not happened to a useful degree for any model after SDXL
Lumina has an anime finetune, but it's kinda broken. For Flux there was Chroma but even though it's trained on anime, it's still not quite there
Alchemist / A770 can only deliver half of its full potential
don't open gelbooru at work
they are trained on danbooru, not gelbooru
there are some slight differences in tagging
Yes, and danbooru limits searching to 2 tags
The general concept is enough, it's not like the models follow the tags infinitely well anyways. For example "wing collar" is confusing enough (to CLIP?) to make wings despite having that many instances
my instinct with Arc is to say "it won't deliver"

but happy to help your testing with my card
here is hoping that by Celestial dGPU release they have unleashed full potential
it won't be anywhere near a rtx 3090 on gaming but matrix / ai compute on intel gpus are kinda insane
yeah im pretty ignorant on the AI performance. I had read somewhere that it was just ok, but that was ages ago. Maybe I have an outdated perspective here
are you currently running an A770?
or RTX 3090/3090 Ti?
a770 and rx 7900 xtx
For inpainting, you want a dedicated inpainting model
There's basically only 3-4 or so. Base SD 1.5, Dreamshaper 8 inpainting (an SD 1.5 finetune), SDXL and Flux. Faster, lower quality <-> Slower, higher quality. Anime isn't super big of an issue here if you just intend to use inpainting to fix up minor issues or remove things while keeping the bg consistent, since due to actually seeing the image inpainting models can usually match the style to an extent
Also,
there are some external ones you need custom nodes for; brushnet/powerpaint is good but you can try those later once you familiarize yourself with comfy
The default inpainting workflow has an issue if you intend to inpaint multiple times but let's keep it simple for now i guess
You don't have any prior experience with any other nodal UIs, right? Blender, unreal, unity, houdini, etc.
I do actually. I used Unity for years and used their state machine animation system, before node based / visual scripting got introduced more widely later on, around the time I abandoned gamedev. Dabbled in Unreal 4 some briefly too.
Most of my experience is older and without those systems though
This is good info re: the faster to slower model options
Well Comfy is that but worse; if you have Blender experience in particular, some cool people have made a Blender addon to integrate a ComfyUI node editor into Blender, and Blender has an actually pretty decent nodal editor so that's a huge jump in usability. I might want to poke it a bit more again though, my old Intel code there might be too old now
I'll show you a fixed inpaint workflow in a bit then
So far I am doing alright on workflows and nodes, primarily using built in comfy ones
I did install a third party workflow that was missing a node last night, so i went to github and grabbed the python script for that node and dumped it in the indicated folder. But it didn't work, couldnt get that node to work, even after restarting everything. So I just disabled it because it was just a resizing node anyways
VAEs are lossy
Even with just 1 encode->decode, the image will get slightly blurry and very fine details (grass, chainlink) will be gone. After probably 5, your image will be fried with the SDXL VAE
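A toy 1-D analogy for why repeated encode->decode passes fry fine detail. This is not how a VAE works internally; it just models each round trip as a mild blur (an assumption for illustration) and tracks the contrast of the finest possible pattern, which shrinks geometrically while the average brightness stays the same.

```python
def lossy_round_trip(signal: list[float], keep: float = 0.8) -> list[float]:
    """Toy stand-in for one encode->decode: a circular 3-tap blur.

    Each sample keeps `keep` of itself and takes the rest from its two
    neighbours, so average brightness is preserved but fine detail is smoothed.
    """
    n = len(signal)
    side = (1.0 - keep) / 2.0
    return [side * signal[i - 1] + keep * signal[i] + side * signal[(i + 1) % n] for i in range(n)]


def contrast(signal: list[float]) -> float:
    return max(signal) - min(signal)


if __name__ == "__main__":
    # Alternating 0/1: the finest detail a 1-D signal can hold (think chainlink or grass).
    x = [float(i % 2) for i in range(16)]
    for trip in range(1, 6):
        x = lossy_round_trip(x)
        print(f"round trip {trip}: contrast {contrast(x):.3f}")
```

With `keep=0.8` the alternating pattern loses 40% of its contrast per pass, so after five passes only about 8% is left.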
The bottom nodes, you can adjust how much the mask gets blurred
There's some other issues with inpainting as well i feel, and not comfy-specific... but not much we can do
Are these standard nodes?
(I am not on desktop, I haven't had a chance to go tinker again today)
If a node has text above its corner there, it's a custom node, EXCEPT for nodes that say [BETA], those are built-in
The text is determined by what the addon registered its custom nodes as. usually most people will just name them after their repo
No text means it's standard
Though if you zoom out enough so that other text disappears, this does too 🤷
i don't recognize the character
Apparently Index TTS 2 released 9 days ago
Were MS stirring up drama to bury it?
And apparently it has proper emotion control, judging by what a random custom node for comfyui has for it
Testing it on huggingface
And I'm not impressed so far lmao
index tts 2
It's a lot faster than Vibevoice, I would say.
It's also quite a lot smaller.
Vibevoice7b
Which is closer to the voice you cloned? The second has more inflection but does it match the voice better?
The second one matches the input audio better.
Both in terms of level of artifacts, and in terms of voice similarity.
So this was with the default SDXL Simple workflow in comfy:
using the default prompt it had (evening sunset scenery blue sky nature, glass bottle with a galaxy in it) and set to 1024x1024
the image lol
pytorch version: 2.8.0+xpu
idk what kind of performance is "good" or not. considering this has the refiner model to load also it is slower than it could be
oh, and this is in Windows
can you do a second run? first runs have the jit overhead
also load a normal sdxl model
no one uses the refiner and it is a different arch
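Following the advice above (SDXL base at 1024x1024, ignore the first run because of JIT overhead), a benchmarking helper can discard warm-up runs and report the median it/s of the rest. This is a hypothetical sketch, not a ComfyUI feature; for GPU work, `step` must block until the device has finished, otherwise it only times kernel launches.

```python
import statistics
import time
from typing import Callable


def benchmark_its(step: Callable[[], None], steps: int = 20,
                  warmup_runs: int = 1, timed_runs: int = 3) -> float:
    """Median iterations/second over `timed_runs`, after `warmup_runs` untimed runs.

    Warm-up runs absorb one-off costs such as JIT/kernel compilation and loading.
    """
    for _ in range(warmup_runs):
        for _ in range(steps):
            step()
    rates = []
    for _ in range(timed_runs):
        start = time.perf_counter()
        for _ in range(steps):
            step()
        elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero interval
        rates.append(steps / elapsed)
    return statistics.median(rates)


if __name__ == "__main__":
    # Stand-in workload: ~10 ms per "step", so roughly 100 it/s at most.
    print(f"{benchmark_its(lambda: time.sleep(0.01), steps=10):.1f} it/s")
```

The median is used so a single slow run (for example one interrupted by a background task) does not skew the figure.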
sure i can do that. what is a 'normal sdxl' model? this one came with "sd_xl_base_1.0.safetensors" as the base model before it does the refiner step
sd_xl_base_1.0.safetensors is the one i am interested in
I ran it 3 times
my workflow
same prompts as before, and same model, and 1024x1024 again
@civic charm much faster this time
4 it/s is pretty close to its full potential
RTX 3090 gets 4.0 it/s
RX 7900 XTX gets 4.8 it/s
RTX 4090 gets 8 it/s
i mean just thinking about MSRP of 3090 vs B580. or die size
4.12 it/s for $250 is insane
if only they had a 24GB B580 😄
B60 : p
yeah
i wonder if performance will be noticeably worse for B50 since it doesn't have the full G21 die like the B580
Intel lists it as 170 int8 tops
b580 has 233 int8 tops
interpolating this info should give around 3 it/s but not sure how much memory bandwidth bottleneck it will have from a 128 bit bus
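The interpolation works out as follows, under the assumption that SDXL throughput scales linearly with INT8 TOPS. The 128-bit bus concern is not modelled at all, so the estimate is likely optimistic.

```python
def scale_by_tops(measured_its: float, measured_tops: float, target_tops: float) -> float:
    """Naive estimate: assume it/s scales linearly with INT8 TOPS (ignores memory bandwidth)."""
    return measured_its * target_tops / measured_tops


# B580 SDXL result (4.12 it/s at 233 TOPS) scaled to the B50's listed 170 TOPS
b50_estimate = scale_by_tops(4.12, 233, 170)
print(f"B50 estimate: ~{b50_estimate:.2f} it/s")
```

That gives about 3.0 it/s, matching the rough figure above.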
B60 is gonna be the move
Show the upper part of this
the workflow?
The command prompt
Run comfy again and show what it says at the start
do you wanna run the script again, pick nightly, and tell us what the performance is then
yeah 😆 im a scrub who got into it that way
@earnest grotto have you done nightly pytorch builds and seen a noticeable uplift on Arc?
A770 went from 2.2 it/s to 2.4 it/s
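For comparing numbers like these, the uplift is just the relative change in it/s. A minimal sketch using the figures from this thread; if a console shows s/it instead, convert to it/s first, since lower s/it is better.

```python
def uplift_pct(old: float, new: float) -> float:
    """Percentage change in throughput; both inputs must be it/s (higher is better)."""
    return (new - old) / old * 100.0


print(f"A770, before -> nightly: {uplift_pct(2.2, 2.4):.1f}%")
print(f"SDXL, 2.5 -> recent nightlies: {uplift_pct(1.85, 2.45):.1f}%")
```

2.2 to 2.4 it/s is about a 9% gain, while the 1.85 to 2.45 it/s jump quoted earlier is about 32%.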
that's not bad
Some interesting news today, apparently China is banning the sale of Nvidia chips. With any luck, there will be a move away from CUDA in open source offerings lol
They only banned 2 nvidia gpus, not all
And... Things are already not too cuda-dependent
You also don't need to build it yourself. (presumably) The pytorch foundation builds them. You can just download them. Thing is I don't want to mess with AIPG's environment
Yeah. Maybe i will try it anyways. Worst case scenario Ill just reinstall lol
If 5-10% gain is even theoretically on the table thats pretty fire
pip3 install --pre --upgrade torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/xpu
Oh i meant to ask, have you tested linux vs windows performance differences?
For inference it's pretty much the same
👍 cool
Only difference is if you run out of vram on linux the driver crashes (it kinda does on windows too sometimes but generally not)
I have a kubuntu install on the same machine but i only use it for playing games occasionally
thats a big difference lol
Training performance was better on linux but recently it improved on windows, and there's some odd issue on linux i haven't nailed down yet that kills training performance
Xe driver is a little bit better on this end, it doesn't crash the gpu with it anymore
Ah, windows has pseudo-leaks in inference that don't happen on linux (to a noticeable extent?). I get them after doing ~30 images with heavier models like kontext or qwen, or hundreds of images with sdxl
VRAM is clearly free but at some point the runtime gives up and says it ran out of vram anyways and will refuse to let you gen until you restart your PC
However on both OSes those do seem to happen in training, again moreso for heavier models
And spamming empty_cache() seems to resolve both of those and sometimes improve training performance and definitely improve vram usage especially during training, which is odd since it used to break WSL before
Wacky bug, really contradicts the pytorch doc, at least for the Nvidia empty_cache
I could get 3 batch size training an sdxl lora with it
Couldn't without
Xe driver with PT 2.10: 3.6 s/it with 11.8 gb vram usage
2 batch size?
yes
damn, that's pretty good, so it's fixed then
i915 driver is still horrible tho
ipex 2.3 and openvino works fine with i915 but pytorch / modern ipex doesn't
they only work properly with Xe
also i915 is the default driver for A770
Also using Xe will lose video encoding support with alchemist
Ah... I wonder if I was using the xe driver, forgot that I was, and accidentally uninstalled it and went back to i915. I was having this issue with OBS
I am using Linux 6.17.0-rc6-1-mainline
Added this to GRUB_CMDLINE_LINUX_DEFAULT
i915.force_probe=!56a0 xe.force_probe=56a0
video encoding on alchemist is not supported on Xe, that's why it errors out
I don't really use that feature and i do have another GPU to use if i need to so it doesn't really affect my use case
sudo lspci -vvv | grep xe
Kernel driver in use: xe
Kernel modules: i915, xe
Also 56a0 is the device id of A770 16GB
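`lspci -vvv | grep xe` matches any line containing "xe"; a narrower check is to filter to the A770's vendor:device ID (8086:56a0, per the above) and read only the "Kernel driver in use" line. `-k` and `-d` are standard lspci options; the helper names below are hypothetical and the subprocess part is Linux only.

```python
import subprocess


def kernel_drivers(lspci_output: str) -> dict[str, str]:
    """Map each PCI address in `lspci -k` output to its "Kernel driver in use"."""
    drivers: dict[str, str] = {}
    current = None
    for line in lspci_output.splitlines():
        if line and not line[0].isspace():
            # Device header line, e.g. "03:00.0 VGA compatible controller: ..."
            current = line.split()[0]
        elif current is not None and "Kernel driver in use:" in line:
            drivers[current] = line.split(":", 1)[1].strip()
    return drivers


def a770_drivers() -> dict[str, str]:
    """Run lspci filtered to vendor 8086, device 56a0 and return address -> driver."""
    result = subprocess.run(["lspci", "-k", "-d", "8086:56a0"],
                            capture_output=True, text=True, check=True)
    return kernel_drivers(result.stdout)
```

On a system set up as above, `a770_drivers()` should report `xe` for the card's PCI address.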
ye
This happened to me too but I just restart the ComfyUI service and it fixes
eventually restarting it won't work
Hmm interesting
So far i haven't hit a hard wall
Maybe its an AIPlayground quirk that is actually an advantage
well, most people are unlikely to generate 700 sdxl images without restarting their pc
it's a bit more concerning with the heavier models though
700 😵
qwen image edit's results are good but only good enough that i really want a few extra variations
come up with a script that makes the prompts you want. generate lots. sift through
In hindsight... IMO better to just lock a seed and tweak the prompt and weights
Oh, and grids for testing out loras
let's say 8 loras, 4 seeds, 8 prompts. that's 256 images, though it is on the high side
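Locking the seeds makes such a grid useful: the same seed list is reused for every LoRA and prompt, so within a column only one variable changes. A minimal sketch of generating the job list (the file names and prompts are hypothetical, and actually queuing the jobs into ComfyUI is not shown):

```python
from itertools import product


def make_grid(loras, seeds, prompts):
    """Every (lora, seed, prompt) combination, one job per image."""
    return [{"lora": lora, "seed": seed, "prompt": prompt}
            for lora, seed, prompt in product(loras, seeds, prompts)]


if __name__ == "__main__":
    loras = [f"lora_{i}.safetensors" for i in range(8)]  # hypothetical file names
    seeds = [1, 2, 3, 4]
    prompts = [f"test prompt {i}" for i in range(8)]
    jobs = make_grid(loras, seeds, prompts)
    print(len(jobs), "images")
```

8 × 4 × 8 gives the 256 images mentioned above.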
you should, and weight your prompts
particularly with sdxl models
less so with the new DiT ones; newer text encoders don't seem to play well with that?
weighting prompts i have done
prompt weighting is massive
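ComfyUI's weighting syntax is `(text:weight)`, and literal parentheses inside a tag need escaping as `\(` and `\)` so they are not read as weighting. A small sketch for building weighted SDXL prompts from a tag list; the helper names are hypothetical, and the weights are illustrative, not recommendations.

```python
def escape_parens(tag: str) -> str:
    """Escape literal parentheses so ComfyUI does not treat them as weighting."""
    return tag.replace("(", r"\(").replace(")", r"\)")


def weighted(tag: str, weight: float = 1.0) -> str:
    """Format one tag in ComfyUI's (text:weight) syntax; weight 1.0 is left bare."""
    tag = escape_parens(tag)
    return tag if weight == 1.0 else f"({tag}:{weight:g})"


def build_prompt(parts: list[tuple[str, float]]) -> str:
    return ", ".join(weighted(tag, w) for tag, w in parts)


if __name__ == "__main__":
    print(build_prompt([("best quality", 1.2), ("1girl", 1.0),
                        ("wing collar", 1.1), ("simple background", 0.8)]))
```

As noted above, this matters most for CLIP-based SDXL models; newer text encoders seem to respond less to it.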
Whats a "new dit one"?
SD3, SD3.5, Flux, Lumina, Qwen Image, etc.
Well, it's most likely not the DiT architecture that makes that not work but the not-CLIP text encoders
It's just that both of those came hand in hand
Ah ok
I think QWEN is too heavy for my system, using the default model and workflow that ComfyUI provides
1) it's slow as balls, 2) it consumes all my system RAM to spool up
I don't think the image gen model is worth it, though i mostly do anime
but even for anime the edit model is good
edit model is slow but the image model is actually a little bit faster than flux
is that right? Good to know
Original, edit model, +my own edits
We've moved on from trying to make image gen models make coherent text, to trying to get them to understand that the text is actually written on the pages of otherwise blank open books
Lol
coherent text issue was mostly from the very compressed 4ch vae not being able to preserve the texts
it was bad even when big
but that was part of it for sure
and the popped egg yolk anime eyes
@earnest grotto what GPU are you using for image gen ?
a770 16gb
think you'll upgrade or does it suit your purposes well?
I don't upgrade often. I won't be upgrading anytime soon
ah ok, same
Some people change GPUs like theyre out of style and a new season is upon them
Most new GPUs just don't seem worth it to me
And there isn't much of a 2nd hand market here I think
when I built this PC i had not upgraded anything (except the motherboard downsized to itx) since my last build in 2016
Will you get the b770 if it releases?
no
I am thinking a used 3090 tbh, wanted to stick with intel but not sure now
there is no such used market here
if you can find one for the mythical $700, power to you 🤷
I don't intend to get a new GPU simply because I'm fine with my current one. my last one was an rx 480. I want an upgrade i make to be substantial, but in this age that seems like an increasingly slim chance, i could instead spend more but for now I don't wanna
if I started monetizing what I do with AI I might reconsider but... I don't, for now
Did you check out the SDXL testing above? Was surprised to see my B580 performing at or above 3090 in that task
Yeah, which makes the b770 kinda interesting if it releases anyway lol
if they do intend to release a gaming gpu that's extra fat, they better have something planned for that CPU bottlenecking
For AI it could be pretty cool but I don't expect huge gaming gains
B580 for me until Celestial, probably. Id have gone B770 if it came out at the same time
I think at higher resolution the bottleneck will be less noticeable which is more likely for people to use with higher performance cards imo anyway
If I can game on that b60 and the price is reasonable then that could also be an option
my crystal ball says you probably will be able to game but the price will not be reasonable. but then, I think $600-700 would not be a reasonable price for it
and $700 is the mystical magical second hand 3090 price
i don't recognize the first person
I'm afraid to say... I haven't played oblivion
But man, AI TTS is such a boon for memes
The first person is just the base male breton race voice
AI TTS is a boon for mod creation, too.
Morrowind has a mod catered to voicing the entire rest of the game. Parts that aren't voiced, just text redone.
Using elevenlabs.
A large percentage of morrowind's conversations aren't voiced.
I still chuckle sometimes when I remember that some AI voice mods for skyrim I saw didn't fit in, because the base game's voices are emotionless and repetitive and the AI ones had too much actual emotion
Oh yeah uh
it can do skyrim nord
Vibevoice is very hit-and-miss. I assume this is because we have little to no control on what is generated.
Sadly though it's very slow.
I know WoW has a mod that voices a lot of the unvoiced NPC interactions as well. Not that I play that, I just know about it
I should've tried TF2 voices, actually now that I think about it.
I'd suggest trying neco arc but that'd be a big gamble
You got a minute of neco arc voicelines that I can just put in?
or would I just use this
LMAO
Well... They're voice alright, just not sure if they're lines
yeah that's what i used
and uh, index tts i think, could produce something usable
index tts 2 sounds like a tube to me
index tts 1 has no emotions but some of the best cloning quality
and it was a 1.5b parameter model too
NYAH! Why do you avoid the ai-generated spam channels so much? Trust me, there's nothing scary there. And corey doesn't post there that often.
I did it myself anyways lol
What I wanted it to say was along the lines of "Mira... Why do you avoid the AI-generated spam channel so much? Trust me, there's nothing scary there, and corry does not post there that often..."
It had a freakout at the start due to me cranking up the temperature, and fumbled "ai-generated" a bit
maybe
Mira, why do you avoid the A I generated spam channel so much? Trust me, there's nothing scary there and corry does NOT post there that often! from the metadata
You should probably cut out the initial freakout part 😂
yeah probably
sounds like its saying chud
The further I go above 1 minute, the more it takes per it
2.24 s/it on 1 minute 33 second audio input
nvm it went back down
phew
I should also be using always on top
I bet I can cherry pick outta dis
windows really doesnt like it when you dont have cmd windows ontop
kinda annoying cuz it dips to 2.2-2.3s/it until I focus a window
or maybe im suffering what you said earlier, where over time generations just end up imploding the PC
cuz now im getting 1.63s/it (I just blackscreened)
its like she implodes at the end every time
lmao
that did not lower performance for inference, it just made it so eventually an error gets thrown
If you want my workaround, add torch.xpu.empty_cache() somewhere in the inference loop 🤷
ah, i should really try indextts 2 sometime. iirc doing chinese (hopefully japanese?) to english was one of the big features
I would if the current repositories weren't so cuda-focused.
I instead tried it on a huggingface space to get a taste of how it sounds
30 images of lace. autotagger's opinions:
8 are 1girl, 1 is 1boy, the remaining 21 are 1other
🤔 found this a bit funny and peculiar
autotagger said 
@somber trellis Presumably you've been trying to use this? https://github.com/diodiogod/TTS-Audio-Suite
also damn, this must be the first time i've actually managed to overtrain a lora. fascinating.
(overtrained loras produce spooky results)
Yes.
It requires torchcodec, which requires pytorch 2.8 at the latest.
It kept insisting it wasn't compatible with 2.8.0+xpu, so I just didn't go further.
I've seen a few go for 600 over the last few months, just haven't had the funds
https://www.reddit.com/r/StableDiffusion/comments/1nmnq6i/raylight_tensor_split_distributed_gpu_now_can_do/ here is a thing that looks cool
nice
god, dolphin is so much better than file explorer. what are microsoft doing...
file explorer loads so crazy slow, it's a pain to actually use it to look at my images
yeah, with the xe driver linux training performance is definitely back to reasonable speeds
windows 11 is a mess
new qwen image edit dropped
Anything noteworthy about it?
so far, i know only what qwen claim about it: Multi-image Editing Support, Enhanced Single-image Consistency, Native Support for ControlNet
they have a lot of examples there
they do have a hf demo
multi image makes me hope for style transfer but that's probably still not a thing
time to find out
the hf space doesn't do multi image
wait, nevermind, they link the wrong space
welp
double nevermind then
lol
it could just be hf overestimating the time needed? welp, i'll wait i guess
If you want the correct space, https://huggingface.co/spaces/Qwen/Qwen-Image-Edit-2509
Thanks, I won't be messing with it for now. QWEN is too heavy for my rig
I still havent booted up and tried setting up an Inpainting workflow based on your screenshot
4.82s/it training sdxl with 3 batch size and 12 rank, albeit with only 1024^2 images 🤔
windows didn't like that as much and was stuck around 10+s/it
yes
@dark pasture Use this script
If you want more performance, install the nightly pytorch
thanks, wondering why it installs kohya_ss from bmaltais
so that you can also train, if you want
make a lora of your favorite anime or video game character, make them do things
hopefully one day i will figure out a way to or the models will get generalizable enough to make a spritesheet
if the new edit models could properly do both pose and style transfer...
How is it that style transfer, one of the earliest things from like GAN times if not before, seems to be turning into a lost capability now?
well, maybe with help of llm it would be possible to do things 😛
btw, what is your maximum resolution for picture generation?
isnt style transfer like explicitly claimed as one of the things that flux kontext, and qwen edit, can do?
resolution depends on model
these models know about 4-5 styles. you can transform an image into a style they know, but those styles are not very good and of course very limited, and that's if we count "real life" as a style, and one of the other artstyles is yellowed chatgpt "anime" image. you can't give them a random style as a reference. this was doable with style IPAdapters before this, I guess
i wonder if there is working IPAdapter for flux? besides XLabs-AI
well, i was generating on flux, so atm my model is flux 🙂
flux works between 4 and 0.25 megapixels
you can kinda get outside that but you will have issues
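A quick way to check a resolution against the 0.25 to 4 megapixel range mentioned above. This is a sketch that assumes "megapixel" means 1024×1024 pixels (with 1,000,000 the numbers shift slightly); the helper names are hypothetical.

```python
# Hypothetical helpers; the 0.25 to 4 MP bounds come from the chat above.
MEGAPIXEL = 1024 * 1024  # assumption: 1 MP = 1024 x 1024 pixels


def megapixels(width: int, height: int) -> float:
    """Return the image area in megapixels."""
    return width * height / MEGAPIXEL


def in_flux_range(width: int, height: int, lo: float = 0.25, hi: float = 4.0) -> bool:
    """True if the resolution falls inside the range flux is said to handle well."""
    return lo <= megapixels(width, height) <= hi


for w, h in [(1024, 1024), (512, 512), (2560, 1920)]:
    print(f"{w}x{h}: {megapixels(w, h):.2f} MP, in range: {in_flux_range(w, h)}")
```

Under this convention 2560x1920 works out to about 4.69 MP, just past the upper bound.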
edit models should be replacing ipadapters. since the edited image works as a prompt, after all
i don't expect the xlabs ipadapter to be able to actually do style transfer.
there has been USO however, which I haven't tried much https://huggingface.co/bytedance-research/USO
I think it's a bit sad they based it on dev and not chroma. though perhaps they had already trained too much on dev before chroma finished
ha! Comfy is generating 2560x1920... 17.39 ~~it/s~~ s/it
i should try to compare it with the noobai ipadapter
yeah, with sdxl was getting cool stuff with ipadapter 😄
it also might be a better idea to do the equivalent of hires fix, not because flux will be broken with high resolutions, but because it'll be faster and the early half of the timesteps is not responsible for the fine details in high resolutions anyways
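One way to pick the first-pass size for a hires-fix style workflow: scale the target down to a fixed pixel budget while keeping the aspect ratio. The 1 MP budget, the 1024×1024 megapixel convention, and snapping sides to multiples of 16 are all assumptions, not values from the chat; `first_pass_size` is a hypothetical helper.

```python
import math

MEGAPIXEL = 1024 * 1024  # assumption: 1 MP = 1024 x 1024 pixels


def first_pass_size(target_w: int, target_h: int,
                    budget_mp: float = 1.0, multiple: int = 16) -> tuple[int, int]:
    """Scale (target_w, target_h) down to roughly budget_mp megapixels,
    keeping the aspect ratio and snapping each side to `multiple`."""
    scale = math.sqrt(budget_mp * MEGAPIXEL / (target_w * target_h))
    scale = min(scale, 1.0)  # never upscale in the first pass
    w = max(multiple, round(target_w * scale / multiple) * multiple)
    h = max(multiple, round(target_h * scale / multiple) * multiple)
    return w, h


# First pass at ~1 MP, then upscale to the target and re-sample only the later steps.
print(first_pass_size(2560, 1920))
```

For the 2560x1920 target this returns (1184, 880); the second pass then only has to refine detail at full resolution.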
yeah, of course, but it's more interesting to test out stuff like this and try to push the limits 😄
i remember i was doing a 6x latent upscale
Vik, thanks a lot for that script. TeaCache is doing stuff 😄
It/s or s/it?
ooopss... sorry, s/it
hmmm, now after every generation i get `RuntimeError: UR error: 38 (UR_RESULT_ERROR_OUT_OF_HOST_MEMORY)` 😮
need to explore a bit
restart comfy
windows issue
well, i am on linux 🙂
never had that on linux, what did you do
i do a 3x1280x864 generation, and when the next generation starts i get out of host memory
also, you can go edit `comfy/samplers.py`: after line 990 in `outer_sample`, add a new line with just `torch.xpu.empty_cache()` (with the proper indentation)
it might help memory usage a bit, though it's more pronounced in training
yeah, it feels like it doesn't empty the cache well. Also, the empty cache node doesn't help either
this is done on every sampling step. a node can't do that unless it adds pointless model patching logic
you want it done on every step, not after it's finished sampling or only before it started
but, it's also not a magic fix and might not help too much
it's only a magic fix for training where it's absolutely necessary sometimes (i distinctly remember i needed it for training ace-step. a shame the trained lora didn't work out though)
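The "every step, not once" point can be illustrated with a toy loop. This is not ComfyUI's sampler; `get_empty_cache` and `sample` are hypothetical names, while `torch.xpu.is_available` and `torch.xpu.empty_cache` are real PyTorch calls. If torch or an XPU is missing, the cache clearer falls back to a no-op.

```python
from typing import Callable


def get_empty_cache() -> Callable[[], None]:
    """Return torch.xpu.empty_cache if an XPU is available, else a no-op."""
    try:
        import torch
    except ImportError:
        return lambda: None
    xpu = getattr(torch, "xpu", None)
    if xpu is not None and xpu.is_available():
        return xpu.empty_cache
    return lambda: None


def sample(x: float, steps: int,
           step_fn: Callable[[float, int], float],
           empty_cache: Callable[[], None]) -> float:
    """Toy sampling loop: release cached allocator blocks after every step,
    instead of only once before or after the whole loop."""
    for i in range(steps):
        x = step_fn(x, i)
        empty_cache()  # per-step release, as described above
    return x


# Stand-in "model step" that just halves the value.
result = sample(1.0, steps=4, step_fn=lambda x, i: x * 0.5, empty_cache=get_empty_cache())
print(result)
```

As noted above, this only returns cached blocks to the driver; it won't help if the model itself doesn't fit.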
`torch.xpu.empty_cache()` didn't do the trick, but it seems that i need a lower quant of the gguf models: a 9 GB model lets me keep generating, but an 11 GB model gives out of host memory after the first generation
i'm gonna assume the lightning lora is just bad for it and wait for a new one 😐
that's a huge gain in speed
I would need to edit 162 images without shutting comfy down for the compile time to have been worth it
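The break-even arithmetic behind an estimate like that is the one-off compile time divided by the time saved per image. The chat doesn't give the actual timings, so the numbers below are made up, and `break_even_images` is a hypothetical helper.

```python
import math


def break_even_images(compile_s: float, eager_s: float, compiled_s: float) -> int:
    """Number of images after which a one-off compile has paid for itself."""
    saved = eager_s - compiled_s
    if saved <= 0:
        raise ValueError("compiled run is not faster; compiling never pays off")
    return math.ceil(compile_s / saved)


# Hypothetical timings: 300 s to compile, 30 s/image eager, 28 s/image compiled.
print(break_even_images(300, 30, 28))
```

A small per-image saving against a long compile is what pushes the break-even point into the hundreds of images.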

Hey guys I need some help - I reinstalled comfyui with the script again because it was getting out of date, but now all my generations are black images. The sampler preview shows for a few steps but then goes black.
- Tried different models (illustriousXL, cyberrealisticXL)
- Separately loaded sdxl fp16 fix vae
- Used --force-fp16 argument
- Used --force-upcast-attention
- Tried torch nightly and stable
- Updated comfyui with git pull
- Used custom and default nodes
should i just switch back to 2.5+ipex and see if that works?
--force-fp16 is sure to cause black images
use --bf16-unet --bf16-vae
use torch 2.7.1 or later
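A small check for the "2.7.1 or later" advice. Comparing version strings directly breaks on cases like "2.10.0" vs "2.7.1", so this parses the leading numbers into a tuple. It is a sketch: local and dev suffixes such as "+xpu" or ".dev2025..." are ignored, so a 2.7.1 dev build counts as 2.7.1. `parse_version` and `meets_minimum` are hypothetical names.

```python
import re


def parse_version(v: str) -> tuple[int, int, int]:
    """Extract (major, minor, patch) from strings like '2.8.0+xpu'
    or '2.9.0.dev20250901+xpu'; anything after the numbers is ignored."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", v)
    if m is None:
        raise ValueError(f"unrecognized version: {v!r}")
    major, minor, patch = m.groups()
    return int(major), int(minor), int(patch or 0)


def meets_minimum(v: str, minimum: tuple[int, int, int] = (2, 7, 1)) -> bool:
    """True if version string v is at least `minimum`."""
    return parse_version(v) >= minimum


# Usage inside the ComfyUI environment:
#   import torch
#   print(torch.__version__, meets_minimum(torch.__version__))
print(meets_minimum("2.8.0+xpu"), meets_minimum("2.5.1+cxx11.abi"))
```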
same issue, using all default settings
--bf16-unet --disable-ipex-optimize --lowvram
i have a b580. at one point some generations got through, but it was really inconsistent and most of the time it just goes black
this is the error I get
`C:\Comfy_Intel\ComfyUI\nodes.py:1594: RuntimeWarning: invalid value encountered in cast`
`img = Image.fromarray(np.clip(i, 0, 255).astype(np.uint8))`
No idea here
Is that fp16 vae fix still necessary? Make sure all your custom nodes are also updated
sdxl doesn't work with fp16; if you are using fp16, you have to use the fp16 fix vae
but intel defaults to bf16, and bf16 doesn't have this issue
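Why bf16 avoids the problem: it keeps fp32's 8 exponent bits, so its largest finite value is about 3.39e38, while fp16 tops out at 65504. The sketch below computes both limits and uses a crude overflow model (rounding and precision loss ignored) to show how a large activation turns into inf and then NaN in fp16. NaNs like this are a common cause of black images and are consistent with the `invalid value encountered in cast` warning above. The 1e5 activation is a made-up value; the helper names are hypothetical.

```python
import math


def max_finite(exp_bits: int, mant_bits: int) -> float:
    """Largest finite value of a binary float with IEEE-style layout
    (the all-ones exponent is reserved for inf/NaN)."""
    bias = 2 ** (exp_bits - 1) - 1
    max_exp = (2 ** exp_bits - 2) - bias
    return (2 - 2.0 ** -mant_bits) * 2.0 ** max_exp


FP16_MAX = max_finite(exp_bits=5, mant_bits=10)  # 65504.0
BF16_MAX = max_finite(exp_bits=8, mant_bits=7)   # about 3.39e38, close to fp32's range


def saturate(x: float, fmt_max: float) -> float:
    """Crude model of storing x in a narrower format: values past the
    largest finite number become +/-inf (rounding and precision loss ignored)."""
    return math.copysign(math.inf, x) if abs(x) > fmt_max else x


activation = 1.0e5  # hypothetical large VAE activation
a16 = saturate(activation, FP16_MAX)
abf = saturate(activation, BF16_MAX)
print(a16, a16 - a16)  # inf nan
print(abf, abf - abf)  # 100000.0 0.0
```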
Hopefully this https://github.com/comfyanonymous/ComfyUI/pull/9979 fixed my OUT_OF_HOST_MEMORY error;
the third generation ran smoothly 😛
What driver version?
Run furmark or some other graphics stress testing tool and say what happens
qwen-image-lightning-2.0 works fine on 4-step
for some reason qwen image edit 2509 has issues doing style transfers
an entire reddit post on it
original
"Change his armor color to gold, and make it clean-looking." Qwen-Image-Edit-2509-Q8_0.gguf with the 4-step qwen-image-lightning 2.0 lora
Qwen image edit 2509 fails to do pixel art transfers at all.
Something I messed around a ton with on the previous version of the model.
if a lora is not trained for the specific model you are using it on, it will be worse
is it usable? yes, but it will be worse than a lora specifically trained for that model