#ComfyUI for Intel Arc using IPEX
make sure to actually select CLIP, it defaults to GLM
maybe some webui had a convenience feature that by default strips out underscore tokens. but I doubt such a feature would've made sense once pony became popular
or convenience that strips out underscores for everything but score_9, ...
comfy doesn't have this and I don't think other web uis like sdnext or forge do either
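For reference, a minimal sketch of what such a convenience feature could look like. This is purely hypothetical: the function name and the score-tag pattern are assumptions made for illustration, not code from any web UI.

```python
import re

# Assumed pattern for Pony-style quality tags such as score_9 or score_8_up.
SCORE_TAG = re.compile(r"^score_\d+(_up)?$")


def strip_underscores(prompt: str) -> str:
    """Replace underscores with spaces in each comma-separated tag,
    leaving score_N style tags untouched."""
    cleaned = []
    for tag in prompt.split(","):
        tag = tag.strip()
        if not tag:
            continue  # skip empty entries left by trailing commas
        cleaned.append(tag if SCORE_TAG.match(tag) else tag.replace("_", " "))
    return ", ".join(cleaned)


print(strip_underscores("score_9, score_8_up, long_hair, blue_eyes"))
```

Since comfy passes the prompt through as typed, something like this would have to run on the prompt text before it reaches the text encoder.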
I don't understand what this is illustrating tbh
Oh playing with it myself now i see the underscore is its own token
you can try it yourself in comfy, prompt with and without underscores and see that the image will be slightly different. like with using an abliterated text encoder, it will produce slightly worse results, but to definitively say it's worse you need to gen a few hundred images and compare. I did this for some garbage qwen 0.6b ablit, it made anima produce artifacts far more often, but I'm not doing a test like that again, took too long.
for anima the worseness of bad prompts is noticeable enough that you don't need to gen hundreds. even typos or swapping around an artist/character's first and last names are enough to make anima see it as something completely different.
alternatively, you could probably also calculate loss for a few hundred images, it'd likely be higher. though I'm not big on trusting loss, this should be valid enough
comfy is a mess now with some of these older nodes, conflicts everywhere lol.
First ltx 2.3 video, Dev with distill lora and some sorta upscaler. Default prompt and needed an image but didn't have one so used the default lol. whole thing done in 21 minutes. 10 seconds with audio, that's kinda nuts coming from when I last did video with wan 2.1. Also, literal first attempt after taking hours to set stuff up again. Really impressed with this model.
also on 8gb a750 with 32gb of system ram.
another default prompt from the workflow i downloaded, i2v dev with distill lora. also forgot, gguf models Q4_K_S. 14 min but was only 5 secs this time.
need to setup an llm prompt enhancer but all my llm nodes broke in comfy completely.
I see bitsandbytes added official intel support? Even for 4bit quants? Seems to not work for me in windows though
There's something with the way ComfyUI LTX2.3 handles clip. It very easily ignores the prompt especially if long. But long prompts are where the model shines. It's more consistent outside ComfyUI. I've seen posts where people have a workaround.
Revisiting my old Wan2.1 img2video test, the llm keeps adding dialog but it's old qwen2.5 14b. Also messed up and forgot to change the aspect ratio from default.
wild this is what I used to get a year ago with wan2.1, slower and lower res.
and this nightmare fuel with LTX last year lol. They really improved.
takes about 18mins right now, I guess that includes the 14b LLM as well(not sure). Output is close to 720p and it does some sorta upscale pass as well at lower steps.
10 minutes
However, this is 640x360. But it's 20 seconds.
gif artifacting
don't use gifs
if you want an autoplaying looping video on discord, one way I know of is imgur's embeds
it was accidental as i forgot to change to h264/mp4 in videocombine
bad voice
oh god the ending
It probably would be better if I inserted my own audio clip via vibevoice, and combined them.
the speed seems tied to vram allocation being random. I got up to 12minutes with the upscale to 720p yesterday(at 10sec output), when the model was mostly on ram(strangely). Basically the same GGUF issues I had since the beginning with Flux
I really need a native block swap node so i can force vram and ram to be the same, but I guess since nvidia users don't need it anymore kijai didn't make any nodes for ltx 2
and also, --lowvram makes everything super slow as opposed to --reserve-vram
lowvram makes it so text encoders are run on the cpu by default. if you run out of vram, you run out, it doesn't try to keep some amount of vram free like reserve-vram does
I don't know why, i've run text encoders manually on cpu and it didn't slow anything down, but lowvram kills speed all the time, makes inference twice as slow
Looks like I can actually do 1280x720 20 seconds with LTX 2.3. Takes 12-15 minutes on 640x480 vs like 33 minutes on 1280x720
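A quick back-of-the-envelope check of those numbers, as a sketch only. It uses just the resolutions and timings quoted above; the helper name is made up for illustration.

```python
def pixel_ratio(low_res, high_res):
    """How many times more pixels per frame high_res has than low_res."""
    return (high_res[0] * high_res[1]) / (low_res[0] * low_res[1])


ratio = pixel_ratio((640, 480), (1280, 720))
print(f"pixels per frame: {ratio:.1f}x")

# Reported timings: 12-15 minutes at 640x480, about 33 minutes at 1280x720.
for low_minutes in (12, 15):
    print(f"time: {33 / low_minutes:.2f}x slower (vs {low_minutes} min)")
```

So the 720p run pushes 3x the pixels per frame while the reported time goes up by less than that, at least for this one pair of runs.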
Wonder if hidream o1 image is better than flux klein 9b
good on them for making an 8b model and not a bloated 17b one again
kinda wish more newer models would be closer to 4b but oh well
What size model and quant? Fp8/Q8? Are you running reserve-vram? Feels like if it starts loading too much on vram it slows down at random times, for me the lowres and upscale take the same amount of time (the upscale has 4 steps). I also use the q4 model for my system, and I am on windows so have native ram offload.
Q8. Reserve vram at 8.0.
I'm using reserve 6 with an a750, i know it's different with an a770, but maybe try higher and see if it speeds up or slows down. Although upscaling could just take longer when set to 20 seconds, i've only done 10
I found the model is loading mostly if not 100% on ram, at least according to the cmd panel. I think i use maybe 5gb vram total, i'd have to check again. When i set it lower or did lowvram, it used like 7 to 7.5 gb and ran almost twice as slow.
fp8 would likely be different, i feel like mostly gguf's have the issue
reserve-vram is somewhat broken i think, and reserves 5gb less than you tell it to.
6 is basically the lowest you should go, and you should probably stick with it only if you have nothing else using up vram. you are leaving 1gb free for anything else. that might sound like enough if you don't want to play any low vram game, but IIRC, discord+vivaldi(chromium browser)+steam+vscode all open were enough to use up 1gb vram for me.
lower reserve vram is faster
so, generally, I don't go below 7
but if you don't have all those open you can probably live with 6
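The arithmetic behind that rule of thumb, sketched in Python. The 5 GB offset is only the claim made above (reserve-vram reserving 5 GB less than requested), not verified ComfyUI behavior, and the function names are made up for illustration.

```python
# Assumption taken from the chat above, not from ComfyUI's source:
# the flag ends up reserving about 5 GB less than the value passed.
CLAIMED_OFFSET_GB = 5.0


def effective_reserve_gb(requested_gb: float) -> float:
    """VRAM actually left free for other apps, under the claimed offset."""
    return max(requested_gb - CLAIMED_OFFSET_GB, 0.0)


def comfy_budget_gb(total_vram_gb: float, requested_gb: float) -> float:
    """VRAM left for ComfyUI itself, under the same assumption."""
    return total_vram_gb - effective_reserve_gb(requested_gb)


# 8 GB card (A750) with --reserve-vram set to 6 and 7.
for requested in (6.0, 7.0):
    free = effective_reserve_gb(requested)
    budget = comfy_budget_gb(8.0, requested)
    print(f"reserve {requested}: {free} GB free for other apps, {budget} GB for comfy")
```

Under that assumption, 6 leaves 1 GB for the browser, discord and so on, and 7 leaves 2 GB, which matches the numbers above.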
juicy stuff. good to see some performance improvements hitting LLMs. hopefully some good diffusion LLMs next
Vik.
The MTP checkpoint for Qwen 3.6 27b released today.
It seems I'm capable of getting 3-5T/s from it
at Q4_K_XL
i've built the mtp llama cpp. haven't tested it yet. gpu busy mass re-generating danbooru images so i can try training controlnet
well, i did already try but the result was unsatisfactory so I'm making a better dataset
amazing that mtp alone made qwen3.6 27b slow but useable though
makes me very excited for the other implementations
i mean it is a 27b dense
it should inherently be better at coding
but moe is never far behind it
i have both downloaded though
🤷‍♂️
Haven't set up llama.cpp in years, might need to anyway seems ollama has a bad vulnerability and all intel arc stuff is on older versions.
why would a 27b model be inherently better at coding than a 35b model?
Maybe I am misreading convo above on this
the 35b model is a mixture of experts, 3b activated
hence why it's also realistic to run it at good speeds even when you don't have enough vram
Yeah, basically it's like one big model split into multiple small models so it is faster and runs on weaker hardware.
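A rough sketch of the dense vs MoE comparison, using only the parameter counts mentioned above (27b dense, 35b total with 3b activated). The class and field names are made up, and this ignores everything except how many weights get touched per token.

```python
from dataclasses import dataclass


@dataclass
class Model:
    name: str
    total_params_b: float   # parameters that must be stored (billions)
    active_params_b: float  # parameters used for each token (billions)


dense = Model("27b dense", total_params_b=27.0, active_params_b=27.0)
moe = Model("35b MoE", total_params_b=35.0, active_params_b=3.0)

# Storage scales with total parameters; per-token work scales with active ones.
storage_ratio = moe.total_params_b / dense.total_params_b
active_ratio = dense.active_params_b / moe.active_params_b

print(f"MoE needs about {storage_ratio:.2f}x the storage of the dense model")
print(f"dense model touches about {active_ratio:.0f}x more weights per token")
```

That is why the MoE can stay usable when part of it sits in system ram: it is bigger to store, but each token only runs through a small slice of it.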
Aren't these arguments in favor of the 35b? But Dan said the 27b is inherently better at coding
That is what i was asking about
having more parameters doesn't automatically make it better. or worse
the 27b might indeed be better at coding
but that's not gonna help you at 5t/s
as an example, see also, flux 1 dev vs flux 2 klein 4b. or even just flux 1 dev vs any chroma variant
Doesn't seem like they are that far off from each other but I am just watching videos reviewing them atm.
I see, so the argument here is that the 27b could be 'technically' better but would run like dog doodoo, making it not better in a practical sense.
Qwen Image 2 might release after all
Hopefully it does
For context, it's 7B unlike the original obese 20B Qwen Image. And of course, more parameters doesn't mean better. They're claiming it beats 2512 at most benchmarks, though I am personally more interested in image editing
It was supposedly going to release around february-march...
For something a bit more comfy related,
https://github.com/BobJohnson24/ComfyUI-INT8-Fast
I had tried this but didn't see a performance increase materialize. Person updated it, gonna try again later
After some tweaking, insane... Man I'm so happy
Original | Different style
The one time Flux annihilating artstyles actually came somewhat in handy
They've also released one for the VAE now https://huggingface.co/papers/2605.13565
Paper page - Qwen-Image-VAE-2.0 Technical Report
how did you install comfy?
That's very outdated
Comfy desktop has an Intel version now. You can also use AI Playground if you want something visually simpler than comfy, though comfy itself has been worryingly simplifying a lot recently. I also have a script that installs just regular comfy and some optional custom nodes for you #1193952640225267802 message
I am an acolyte of the Vik's Script Way
@slender nymph @lunar thicket Anima's base model is released https://huggingface.co/circlestone-labs/Anima/blob/main/split_files/diffusion_models/anima-base-v1.0.safetensors
Wondering if I need to play with this one or not. Maybe I dust off comfyUI and give it a whirl
well hello there, been having some fun with some portraits with the preview model, we'll see how the full one compares
it will look mildly better, difference won't be super big
seems to work well with both loras for preview and the turbo lora, i am happy so far xD
Anima is trained from the ground up, right? It's not a SD derivative?
Finetuned cosmos predict 2B with minor architectural changes
T5 removed and replaced with a small adapter + qwen 3 0.6b base
I guess my question was a little vague heh. Thanks
It's better than lumina or the various sdxl finetunes, trained on both gelbooru and natural language, this version can do 1536x1536 plus uses the qwen vae (the old one), it's slower than sdxl but faster than lumina and modern distillation can make for a far better distilled version than sdxl ever had, though imo the preview3 distill lora was kinda meh
Was just reading about it, according to Nvidia (lol) Cosmos Predict2 should be much more capable of complex prompts than SDXL
It normally uses T5
Idk what T5 means tbh
so they removed&replaced it to make it more performant?
with similar results I assume?
Not that they are super slow when used for imagegen but still
The results are most likely similar if t5 was used directly, but it wasn't trained with that so can't know for certain
It works fine, and besides, the captions themselves or other things are likely bigger bottlenecks
Like the VAE, though it might've been late to swap it
I am also sus of how well claude can caption art. Local VLMs like gemma 4 or qwen 3.5 and 3.6 produced really bad results for some very simple images
Anima can put things on the left/right of the image perfectly fine and understands those for things when I train a lora with it, but can't do someone looking left/right by default, likely because claude couldn't caption it
Nah, this isn't any steps backwards
The mask editor crashes my browser in comfyui. So i have been too lazy to do inpainting
SDXL models do not understand left/right period
Use the krita addon. Comfy's frontend is too garbage either way
true i was about to mention how it is frustrating with SDXL that it doesn't understand orientation/position whatsoever
Not suited at all for inpainting a lot
is the krita add on specifically something for masking or is it like a full comfy
Oh
It's an addon for Krita https://krita.org/en/
Krita is a professional FREE and open source painting program. It is made by artists that want to see affordable art tools for everyone.
It's a bit annoying that it's mainly made for using their own premade workflows and model choice, but it supports custom workflows and that's great
Yeah with custom workflow support it seems very powerful
I will give this a try next time
it was working alright 3 months ago decided to open today but it stays like this nothing happening
Open the start_lowvram.bat file in some text editor, open a command prompt, and start pasting lines one by one and then pressing enter.
Say at which line it gets stuck
you can't paste into the command prompt with ctrl+v directly, either press ctrl+shift+v, or right click the text box
So apparently torch compile just works on windows with MSVC and bothering with the oneapi toolkit is unnecessary, but still need to activate MSVC's environment stuff
I'll do a speed comparison. On Windows, 1.36it/s with compile with anima @ 1152x896, 0.877it/s without
On Linux, 1.11it/s with compile and 0.72it/s without. Both Windows and Linux are on 2.9. Impressive
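Working out the speedups from those it/s figures, as a small sketch. The numbers are the ones quoted above; the helper name is made up.

```python
def speedup(without_compile: float, with_compile: float) -> float:
    """Ratio of iterations per second with compile to without."""
    return with_compile / without_compile


# it/s with anima @ 1152x896, as reported above: (without, with)
results = {
    "Windows": (0.877, 1.36),
    "Linux": (0.72, 1.11),
}

for system, (base, compiled) in results.items():
    print(f"{system}: {speedup(base, compiled):.2f}x faster with compile")
```

Both come out at roughly a 1.5x speedup, so the gain from compile looks about the same on either OS even though the absolute speeds differ.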
Also seems that 2.13 still uses more VRAM and the current nightly breaks compile. sad
Any reason to upgrade to 2.12 for joe rando who just uses it for old SDXL and such?
I don't think so
Well, i'm not sure if compiling sdxl works with 2.9 but you should probably be switching off sdxl either way
compiling flux klein should work and i had good speedups with 4b/9b
dependency issues with 2.11 too. yeesh
sigh
kinda getting tired of jumping through so many hoops with intel, was okay with it when I knew they were working on hardware, drivers and software support
it might be temporary issues like there were with ipex's american domain
i'm pretty sure 2.12 was fine a few days ago (before I started testing a bunch, at least)
Hopefully, i had a bunch of dependency issues when updating comfy. Had to just delete all the llm nodes
2.11's issues are at least resolvable pretty easily. i don't wanna bother building torchaudio for 2.12
also, not sure it was ever mentioned but bitsandbytes supposedly has intel support now for fp4 etc. But it completely breaks now on windows at least.
comfy seems to not have much interest in stuff like that
comfy-kitchen got support for things like mxfp4 but int8 is still nowhere to be seen (outside that custom node I linked a while ago)
besides intel doing int8 and not fp8, int8 also has better performance on AFAIK all Nvidia GPUs, and AMD is the same as intel in that regard
but nope
also, if you're referring to 4-bit q-lora, that was supported for a while now
it's for LLMs.
sub-8bit quantization in comfy, with something that isn't mxfp4, is also just not there and also an issue for every gpu, except for the 5000s specifically which do have hardware for mxfp4
well, it does work with every gpu, but it's not very exposed and I don't think it's better than a (proper?) gguf quantization