#ComfyUI for Intel Arc using IPEX

14167 messages · Page 15 of 15 (latest)

earnest grotto
#

make sure to actually select CLIP, it defaults to GLM

#

maybe some webui had a convenience feature that strips out underscore tokens by default, but I doubt such a feature would've made sense once pony became popular

#

or a convenience feature that strips out underscores for everything but score_9, ...

#

comfy doesn't have this, and I don't think other web UIs like SD.Next or Forge do either
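A minimal sketch of the hypothetical convenience feature described above: turn booru-style underscores into spaces, but leave Pony-style `score_*` tags alone. The comma-separated tag format and the `score_` prefix check are assumptions for illustration; as noted, comfy does not do this for you.

```python
def strip_underscores(prompt: str, keep_prefixes: tuple[str, ...] = ("score_",)) -> str:
    """Replace underscores with spaces in each comma-separated tag,
    except tags starting with one of keep_prefixes (hypothetical helper)."""
    out = []
    for tag in (t.strip() for t in prompt.split(",")):
        if not tag:
            continue  # drop empty entries left by doubled commas
        if tag.startswith(keep_prefixes):
            out.append(tag)  # e.g. score_9 keeps its underscore
        else:
            out.append(tag.replace("_", " "))
    return ", ".join(out)


print(strip_underscores("score_9, score_8_up, long_hair, looking_at_viewer"))
# score_9, score_8_up, long hair, looking at viewer
```

Whether the stripped or unstripped form works better depends on how the model was captioned, which is exactly what the with/without comparison below tests.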

lunar thicket
#

Oh, playing with it myself now, I see the underscore is its own token

earnest grotto
#

you can try it yourself in comfy: prompt with and without underscores and you'll see the image come out slightly different. like with using an abliterated text encoder, it will produce slightly worse results, but to definitively say it's worse you need to gen a few hundred images and compare. I did this for some garbage abliterated qwen 0.6b; it made anima produce artifacts far more often, but I'm not doing a test like that again, it took too long.
for anima, the effect of bad prompts is noticeable enough that you don't need to gen hundreds. even typos or swapping around an artist's/character's first and last names are enough to make anima see it as something completely different.

#

alternatively, you could probably also calculate the loss over a few hundred images; it'd likely be higher. though I'm not big on trusting loss, this should be valid enough
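One way to make "gen a few hundred and compare" concrete is a paired comparison: score the same seeds under both conditions (e.g. prompt with vs. without underscores, using loss or any other per-image metric) and look at the mean per-seed difference and its standard error. This is a generic statistics sketch; the per-image scores are assumed inputs and the values in the usage line are placeholders, not measurements.

```python
import math
import statistics


def paired_difference(a: list[float], b: list[float]) -> tuple[float, float]:
    """Mean of (b - a) over paired samples, plus its standard error.

    a and b must be scores for the same seeds/images under two conditions.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    diffs = [y - x for x, y in zip(a, b)]
    mean = statistics.fmean(diffs)
    stderr = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean, stderr


# Placeholder values only; in practice use a few hundred seeds per condition.
mean, se = paired_difference([0.10, 0.12, 0.11], [0.13, 0.12, 0.15])
print(f"mean diff {mean:+.4f} ± {se:.4f}")
```

Pairing by seed removes most of the seed-to-seed noise, which is why far fewer samples are needed than when comparing two unrelated batches. As a rough rule, a mean difference only a standard error or two from zero is not yet convincing.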

reef ivy
#

comfy is a mess now with some of these older nodes, conflicts everywhere lol.

reef ivy
#

First LTX 2.3 video, Dev with the distill lora and some sort of upscaler. Default prompt, and it needed an image but I didn't have one so I used the default lol. Whole thing done in 21 minutes. 10 secs with audio, that's kinda nuts coming from when I last did video with wan 2.1. Also, literal first attempt after taking hours to set stuff up again. Really impressed with this model.

reef ivy
#

also on an 8gb a750 with 32gb of system ram.

reef ivy
#

another default prompt from the workflow I downloaded, i2v dev with distill lora. also forgot to mention: gguf models, Q4_K_S. 14 min, but it was only 5 secs this time.

#

need to set up an llm prompt enhancer, but all my llm nodes broke in comfy completely.

reef ivy
#

I see bitsandbytes added official intel support? Even for 4-bit quants? Seems to not work for me on Windows though

wicked fulcrum
reef ivy
#

Revisiting my old Wan2.1 img2video test; the llm keeps adding dialog, but it's old qwen2.5 14b. Also messed up and forgot to change the aspect ratio from the default.

#

wild, this is what I used to get a year ago with wan2.1, slower and lower res.

#

and this nightmare fuel with LTX last year lol. They really improved.

reef ivy
#

takes about 18 mins right now; I guess that includes the 14b LLM as well (not sure). Output is close to 720p, and it does some sort of upscale pass as well at lower steps.

somber trellis
#

10 minutes

#

However, this is 640x360. But it's 20 seconds.

#

gif artifacting

earnest grotto
#

don't use gifs

#

if you want an autoplaying looping video on discord, one way I know of is imgur's embeds

somber trellis
#

bad voice

#

oh god the ending

#

It probably would be better if I inserted my own audio clip via vibevoice, and combined them.

reef ivy
# somber trellis

the speed seems tied to the vram allocation being random. I got it to 12 minutes with the upscale to 720p yesterday (at 10 sec output), when the model was mostly in ram (strangely). Basically the same GGUF issues I've had since the beginning with Flux

#

I really need a native block swap node so I can force the vram/ram split to be the same every run, but I guess since nvidia users don't need it anymore, kijai didn't make any nodes for ltx 2
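Block swapping keeps only some of a model's transformer blocks in VRAM and streams the rest in from system RAM, which makes the split deterministic instead of whatever the allocator decides. The planning half of that idea can be sketched like this; the helper name, the per-block sizes and the fixed headroom are all hypothetical, and this is not an existing ComfyUI node.

```python
def plan_block_swap(block_sizes_gb: list[float], vram_budget_gb: float,
                    headroom_gb: float = 1.0) -> tuple[int, int]:
    """Return (blocks_on_gpu, blocks_to_swap) for a fixed VRAM/RAM split.

    Blocks are kept on the GPU in order until the budget minus headroom
    (for activations and other apps) is used up; the remaining blocks are
    marked to be swapped in from system RAM. Same inputs, same split.
    """
    usable = max(0.0, vram_budget_gb - headroom_gb)
    used = 0.0
    on_gpu = 0
    for size in block_sizes_gb:
        if used + size > usable:
            break
        used += size
        on_gpu += 1
    return on_gpu, len(block_sizes_gb) - on_gpu


# Hypothetical model: 48 blocks of 0.25 GB each, on an 8 GB card.
print(plan_block_swap([0.25] * 48, vram_budget_gb=8.0))  # (28, 20)
```

The point of a fixed plan is repeatability: every run lands on the same split, so speed no longer depends on where the weights happened to be allocated.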

#

and also, --lowvram makes everything super slow compared to --reserve-vram

earnest grotto
#

lowvram makes it so text encoders are run on the cpu by default. if you run out of vram, you run out; it doesn't try to keep some amount of vram free like reserve-vram does

reef ivy
#

I don't know why; I've run text encoders manually on cpu and it didn't slow anything down, but lowvram kills speed all the time, it makes inference twice as slow

somber trellis
#

Looks like I can actually do 1280x720 20 seconds with LTX 2.3. Takes 12-15 minutes on 640x480 vs like 33 minutes on 1280x720
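A quick sanity check on those numbers: 1280x720 has three times the pixels of 640x480, but the reported time grows by a smaller factor. The snippet only redoes that arithmetic with the timings quoted above; it is not a model of how LTX actually scales.

```python
def scaling(w1: int, h1: int, t1: float,
            w2: int, h2: int, t2: float) -> tuple[float, float]:
    """Return (pixel ratio, time ratio) going from resolution 1 to resolution 2."""
    return (w2 * h2) / (w1 * h1), t2 / t1


for minutes in (12, 15):  # the reported 640x480 range
    px, tm = scaling(640, 480, minutes, 1280, 720, 33)
    print(f"{minutes} min baseline: {px:.1f}x pixels, {tm:.2f}x time")
```

So 3x the pixels costs roughly 2.2x to 2.75x the time here, i.e. the higher resolution is somewhat cheaper per pixel in this particular report.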

somber trellis
earnest grotto
#

Wonder if hidream o1 image is better than flux klein 9b 🤔

#

good on them for making an 8b model and not a bloated 17b one again

#

kinda wish more of the newer models were closer to 4b, but oh well

reef ivy
somber trellis
reef ivy
#

I'm using reserve 6 with an a750. I know it's different with an a770, but maybe try higher and see if it speeds up or slows down. Although upscaling could just take longer when set to 20 secs; I've only done 10

#

I found the model is loading mostly, if not 100%, into ram, at least according to the cmd panel. I think I use maybe 5gb vram total, I'd have to check again. When I set it lower or used lowvram, it used like 7 to 7.5gb and ran almost twice as slow.

#

fp8 would likely be different; I feel like it's mostly ggufs that have the issue

earnest grotto
#

reserve-vram is somewhat broken I think, and reserves 5gb less than you tell it to.
6 is basically the lowest you should go, and you should probably only stick with it if you have nothing else using up vram, since you are leaving 1gb free for anything else. that might sound like enough if you don't want to play any low-vram game, but IIRC, discord + vivaldi (chromium browser) + steam + vscode all open were enough to use up 1gb of vram for me.
lower reserve-vram is faster

#

so, generally, I don't go below 7

#

but if you don't have all those open you can probably live with 6
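The rule of thumb above, written out as arithmetic. The 5 GB offset is earnest grotto's observation of how `--reserve-vram` behaves, not documented ComfyUI behavior, so `OBSERVED_OFFSET_GB` is an assumption to re-measure on your own card; the ~1 GB for other apps is the figure from the message above.

```python
OBSERVED_OFFSET_GB = 5.0  # assumption: reserve-vram seems to reserve ~5 GB less than requested


def actually_free_gb(requested_reserve_gb: float,
                     offset_gb: float = OBSERVED_OFFSET_GB) -> float:
    """VRAM actually left free for other apps, given the observed offset."""
    return max(0.0, requested_reserve_gb - offset_gb)


def enough_headroom(requested_reserve_gb: float, other_apps_gb: float) -> bool:
    """True only if the effective reservation strictly exceeds what other apps use."""
    return actually_free_gb(requested_reserve_gb) > other_apps_gb


# ~1 GB for discord + browser + steam + vscode, per the message above
for reserve in (6, 7):
    print(reserve, actually_free_gb(reserve), enough_headroom(reserve, 1.0))
```

The strict `>` is a deliberate choice: an exact match (reserve 6, 1 GB of other apps) counts as having no margin, which matches the advice to stay at 7 unless everything else is closed. The chosen value is then passed at launch, e.g. `python main.py --reserve-vram 7`.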

upbeat crow
earnest grotto
somber trellis
#

The MTP checkpoint for Qwen 3.6 27b released today.

#

It seems I'm capable of getting 3-5T/s from it

#

at Q4_K_XL

earnest grotto
#

I've built the mtp llama.cpp. haven't tested it yet; the gpu is busy mass re-generating danbooru images so I can try training a controlnet

#

well, I did already try, but the result was unsatisfactory, so I'm making a better dataset

somber trellis
#

amazing that mtp alone made qwen3.6 27b slow but usable though

#

makes me very excited for the other implementations

earnest grotto
#

i don't think there is much point in using 27b

#

just use the 35b one

somber trellis
#

i mean it is a 27b dense

#

it should inherently be better at coding

#

but moe is never far behind it

#

i have both downloaded though

#

🤷‍♂️

reef ivy
#

Haven't set up llama.cpp in years; might need to anyway, since it seems ollama has a bad vulnerability and all the intel arc stuff is on older versions.

lunar thicket
#

why would a 27b model be inherently better at coding than a 35b model?

#

Maybe I am misreading convo above on this

earnest grotto
#

the 35b model is a mixture of experts, 3b activated

#

hence why it's also realistic to run it at good speeds even when you don't have enough vram
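Why the MoE runs so much faster can be shown with a back-of-envelope estimate: single-user token generation is usually limited by memory bandwidth, and each token has to read roughly the active weights once. The bandwidth figure and the ~4.5 bits per weight for a Q4-style quant are assumptions for illustration, and the result is an upper bound that ignores KV cache reads, PCIe offloading and compute.

```python
def max_tokens_per_s(active_params_b: float, bits_per_weight: float,
                     bandwidth_gb_s: float) -> float:
    """Upper bound on decode speed if every token streams the active weights once."""
    bytes_per_token = active_params_b * 1e9 * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token


BANDWIDTH_GB_S = 500.0  # assumed GPU memory bandwidth; adjust for your card
BITS = 4.5              # assumed average bits/weight for a Q4_K-style quant

dense = max_tokens_per_s(27, BITS, BANDWIDTH_GB_S)  # 27b dense: all weights active
moe = max_tokens_per_s(3, BITS, BANDWIDTH_GB_S)     # 35b MoE with ~3b active
print(f"dense <= {dense:.0f} t/s, moe <= {moe:.0f} t/s, ratio {moe / dense:.0f}x")
```

Whatever bandwidth you plug in, the ratio stays 27 / 3 = 9x. On top of that, a 27b quant at this size is roughly 15 GB, so on smaller cards much of it has to live in system RAM with far lower bandwidth, which helps explain the 3-5 t/s reported above.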

reef ivy
#

Yeah, basically it's like one big model split into multiple small models so it is faster and runs on weaker hardware.

lunar thicket
#

Aren't these arguments in favor of the 35b? But Dan said the 27b is inherently better at coding

#

That is what i was asking about

earnest grotto
#

having more parameters doesn't automatically make it better. or worse
the 27b might indeed be better at coding
but that's not gonna help you at 5t/s
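To put 5 t/s in perspective, here is the wait for a few response lengths. The token counts are arbitrary illustrations and the 30 t/s rate is a hypothetical stand-in for a faster MoE run, not a measurement.

```python
def wait_seconds(tokens: int, tokens_per_s: float) -> float:
    """Time to generate `tokens` tokens at a steady rate."""
    return tokens / tokens_per_s


for tokens in (500, 2000, 8000):  # illustrative response lengths
    slow = wait_seconds(tokens, 5.0)   # the 27b rate reported above
    fast = wait_seconds(tokens, 30.0)  # hypothetical faster MoE rate
    print(f"{tokens} tokens: {slow / 60:.1f} min at 5 t/s vs {fast / 60:.1f} min at 30 t/s")
```

With a reasoning model that thinks for thousands of tokens before answering, the slow case quickly turns into tens of minutes per reply.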

#

as an example, see also flux 1 dev vs flux 2 klein 4b, or even just flux 1 dev vs any chroma variant

reef ivy
#

Doesn't seem like they are that far off from each other, but I am just watching videos reviewing them atm.

lunar thicket
#

I see, so the argument here is that the 27b could be 'technically' better but would run like dog doodoo, making it not better in a practical sense.

#

👍

earnest grotto
#

Qwen Image 2 might release after all

#

Hopefully it does

#

For context, it's 7B, unlike the original obese 20B Qwen Image. And of course, more parameters doesn't mean better. They're claiming it beats 2512 on most benchmarks, though I am personally more interested in image editing

#

It was supposedly going to release around February-March...

earnest grotto
#

I had tried this but didn't see a performance increase materialize. Person updated it, gonna try again later

earnest grotto
#

The one time Flux annihilating artstyles actually came somewhat in handy

earnest grotto
lunar thicket
#

how did you install comfy?

earnest grotto
#

That's very outdated

#

Comfy desktop has an Intel version now. You can also use AI Playground if you want something visually simpler than comfy, though comfy itself has been worryingly simplifying a lot recently. I also have a script that installs just regular comfy and some optional custom nodes for you #1193952640225267802 message

lunar thicket
#

I am an acolyte of the Vik's Script Way

earnest grotto
lunar thicket
#

Wondering if I need to play with this one or not. Maybe I'll dust off comfyUI and give it a whirl

slender nymph
earnest grotto
#

it will look mildly better, difference won't be super big

slender nymph
lunar thicket
#

Anima is trained from the ground up, right? It's not a SD derivative?

earnest grotto
#

T5 removed and replaced with a small adapter + qwen 3 0.6b base

lunar thicket
#

I guess my question was a little vague heh. Thanks

earnest grotto
#

It's better than lumina or the various sdxl finetunes. It's trained on both gelbooru tags and natural language, this version can do 1536x1536, and it uses the qwen vae (the old one). It's slower than sdxl but faster than lumina, and modern distillation can make for a far better distilled version than sdxl ever had, though imo the preview3 distill lora was kinda meh

lunar thicket
#

Was just reading about it; according to Nvidia (lol), Cosmos Predict2 should be much more capable with complex prompts than SDXL

earnest grotto
#

It normally uses T5

lunar thicket
#

Idk what T5 means tbh

earnest grotto
#

It's big and fat

#

SD3 and Flux 1 also used it. It's not used much anymore

lunar thicket
#

so they removed and replaced it to make it more performant?

earnest grotto
#

Yes

#

Qwen 3 0.6b is like 1/20th the size and way faster

lunar thicket
#

with similar results I assume?

earnest grotto
#

Not that they are super slow when used for imagegen but still

earnest grotto
#

It works fine, and besides, the captions themselves or other things are likely bigger bottlenecks

#

Like the VAE, though it might've been too late to swap it

#

I am also sus of how well claude can caption art. Local VLMs like gemma 4 or qwen 3.5 and 3.6 produced really bad results for some very simple images

#

Anima can put things on the left/right of the image perfectly fine and understands those directions when I train a lora with it, but can't do someone looking left/right by default, likely because claude couldn't caption it

lunar thicket
#

that's disappointing

#

one step forward one step back with these things

earnest grotto
#

Nah, this isn't any steps backwards

lunar thicket
#

The mask editor crashes my browser in comfyui, so I have been too lazy to do inpainting

earnest grotto
#

SDXL models do not understand left/right period

earnest grotto
lunar thicket
#

true, I was about to mention how frustrating it is with SDXL that it doesn't understand orientation/position whatsoever

earnest grotto
#

Not suited at all for inpainting a lot

lunar thicket
#

Oh

earnest grotto
lunar thicket
#

Yeah with custom workflow support it seems very powerful

#

I will give this a try next time

quasi cypress
#

it was working alright 3 months ago; I decided to open it today, but it stays like this with nothing happening

earnest grotto
#

So apparently torch compile just works on Windows with MSVC, and bothering with the oneAPI toolkit is unnecessary, but you still need to activate MSVC's environment stuff

#

I'll do a speed comparison. On Windows, 1.36it/s with compile with anima @ 1152x896, 0.877it/s without

#

On Linux, 1.11it/s with compile and 0.72it/s without. Both Windows and Linux are on torch 2.9. impressive
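The relative speedups, computed only from the it/s figures quoted above (anima at 1152x896, torch 2.9):

```python
def speedup(compiled_its: float, eager_its: float) -> float:
    """How many times faster the compiled run is (ratio of iterations per second)."""
    return compiled_its / eager_its


results = {"Windows": (1.36, 0.877), "Linux": (1.11, 0.72)}
for os_name, (compiled, eager) in results.items():
    s = speedup(compiled, eager)
    print(f"{os_name}: {s:.2f}x ({(s - 1) * 100:.0f}% faster), "
          f"{1 / compiled:.2f} vs {1 / eager:.2f} s/it")
```

Both platforms gain roughly 55% from compile, and in this comparison Windows is about 22% faster than Linux in both modes, so the compile gain itself looks platform-independent.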

#

Also seems that 2.13 still uses more VRAM and the current nightly breaks compile. sad

lunar thicket
earnest grotto
#

I don't think so

#

Well, I'm not sure if compiling sdxl works with 2.9, but you should probably be switching off sdxl either way

#

compiling flux klein should work and i had good speedups with 4b/9b

earnest grotto
#

So apparently there is no 2.12 xpu torchaudio and this breaks things

#

annoying

earnest grotto
#

dependency issues with 2.11 too. yeesh
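When a torch upgrade leaves torchaudio behind (like the missing 2.12 xpu build above), checking the installed versions catches the mismatch before comfy fails on import. This sketch assumes the usual 2.x pairing where torchaudio's major.minor matches torch's (e.g. torch 2.11.x with torchaudio 2.11.x) and ignores local tags like `+xpu`; check the pairing against the PyTorch install instructions for your index.

```python
from importlib.metadata import PackageNotFoundError, version


def major_minor(ver: str) -> tuple[int, int]:
    """'2.11.0+xpu' -> (2, 11): drops the local '+...' tag and the patch level."""
    public = ver.split("+", 1)[0]
    major, minor = public.split(".")[:2]
    return int(major), int(minor)


def torchaudio_matches(torch_ver: str, torchaudio_ver: str) -> bool:
    """Assumed rule: torch and torchaudio 2.x must share major.minor."""
    return major_minor(torch_ver) == major_minor(torchaudio_ver)


def check_installed() -> str:
    """Report installed torch/torchaudio versions and whether they pair up."""
    try:
        t = version("torch")
    except PackageNotFoundError:
        return "torch is not installed"
    try:
        ta = version("torchaudio")
    except PackageNotFoundError:
        return f"torch {t} installed, torchaudio missing"
    status = "ok" if torchaudio_matches(t, ta) else "MISMATCH"
    return f"torch {t}, torchaudio {ta}: {status}"


print(check_installed())
```

Run it inside comfy's venv; a `MISMATCH` after upgrading torch usually means torchaudio needs reinstalling from the same index (or, as with 2.12 xpu, that no matching build exists yet).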

reef ivy
#

sigh

#

kinda getting tired of jumping through so many hoops with intel; I was okay with it when I knew they were working on hardware, drivers and software support

earnest grotto
#

it might be temporary issues like there were with ipex's american domain

#

i'm pretty sure 2.12 was fine a few days ago (before I started testing a bunch, at least)

reef ivy
#

Hopefully. I had a bunch of dependency issues when updating comfy. Had to just delete all the llm nodes

earnest grotto
#

2.11's issues are at least resolvable pretty easily. i don't wanna bother building torchaudio for 2.12

reef ivy
#

also, not sure it was ever mentioned, but bitsandbytes supposedly has intel support now for fp4 etc. But it completely breaks now, on Windows at least.

earnest grotto
#

comfy seems to not have much interest in stuff like that

#

comfy-kitchen got support for things like mxfp4 but int8 is still nowhere to be seen (outside that custom node I linked a while ago)

#

besides intel doing int8 and not fp8, int8 also has better performance on, AFAIK, all Nvidia GPUs, and AMD is the same as intel in that regard

#

but nope

#

also, if you're referring to 4-bit qlora, that's been supported for a while now

#

it's for LLMs.

#

sub-8-bit quantization in comfy, with something that isn't mxfp4, is also just not there, and that's an issue for every gpu except the 5000 series specifically, which does have hardware for mxfp4

#

well, it does work with every gpu, but it's not very exposed and I don't think it's better than a (proper?) gguf quantization
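For reference, this is what a simple 8-bit weight quantization looks like: split the weights into blocks, store one scale per block (the block's largest absolute value divided by 127), and round each weight to an int8. It is a plain-Python sketch in the spirit of GGUF's Q8_0 (blocks of 32 sharing one scale), not comfy-kitchen's or llama.cpp's actual code; real implementations pack the values into arrays and store the scales in fp16.

```python
def quantize_int8_blocks(weights: list[float], block: int = 32) -> tuple[list[int], list[float]]:
    """Symmetric absmax int8 quantization with one scale per block."""
    q: list[int] = []
    scales: list[float] = []
    for start in range(0, len(weights), block):
        chunk = weights[start:start + block]
        amax = max(abs(w) for w in chunk)
        scale = amax / 127 if amax > 0 else 1.0  # avoid dividing by zero on all-zero blocks
        scales.append(scale)
        # round to nearest, then clamp into the symmetric int8 range
        q.extend(max(-127, min(127, round(w / scale))) for w in chunk)
    return q, scales


def dequantize_int8_blocks(q: list[int], scales: list[float], block: int = 32) -> list[float]:
    """Multiply each int8 value back by its block's scale."""
    return [v * scales[i // block] for i, v in enumerate(q)]


w = [0.5, -1.0, 0.25, 0.0, 0.9, -0.3]
q, s = quantize_int8_blocks(w, block=3)
w_hat = dequantize_int8_blocks(q, s, block=3)
print(q)
print(max(abs(a - b) for a, b in zip(w, w_hat)))  # at most half a step (scale / 2) per block
```

Per-block scales are what keep one large outlier from wrecking the precision of every other weight in the tensor; the same idea with fewer bits per value is where the sub-8-bit formats come in.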