#ComfyUI for Intel Arc using IPEX
run my script again to update the hijacks
i need to run new version of the script?
correct?
also is it possible to run wan 2.1 on arc a770 16gb?
it's possible
still no torch.compile in windows. Do you guys think it will be required to use the oneAPI basekit again whenever it becomes functional? I'm not 100% sure, but it seemed that when calling setvars the speed went down. speed seems the same now
added skip layer guidance and enhance a video to workflow, runs even faster with better quality.
for some reason wan loves slow motion for this image
i can use same guides and workflows as nvidia owners?
Well, no sageattention or torch.compile (if on windows). You can use the workflow in the png I just posted if you want to
I am on an a750 so you probably could use a larger model if you want to.
Torch 2.8.0.dev has a good performance improvement in SDXL speed on my A580 with comfyui. I can now generate a 1024x1024 image in 17 seconds instead of 35+ seconds.
still very slow for i2v
Linux? Need to test latest build with sdxl, but that is almost 100% faster than my a750 in windows.
It will always start slow, teacache and adaptive guider don't kick in till later
Although that is way slower than me
Are you using my workflow?
yeah
Q8, surely it won't slow it from 10 minutes per vid to 10 minutes per it, right?
Also what length?
Depends on how much is fitting into vram. Also length makes a big difference, i only do 41 frames for 10-12 minutes. If i move to 61 it's closer to 30min.
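The two data points above suggest that render time grows faster than the frame count. A quick check of the arithmetic, taking 11 minutes as the midpoint of the 10-12 minute range (that midpoint is an assumption made only for this calculation):

```python
def seconds_per_frame(minutes: float, frames: int) -> float:
    """Average render time per frame, in seconds."""
    return minutes * 60.0 / frames


short_clip = seconds_per_frame(11, 41)  # 41 frames in roughly 11 minutes
long_clip = seconds_per_frame(30, 61)   # 61 frames in roughly 30 minutes

print(f"41 frames: {short_clip:.1f} s per frame")
print(f"61 frames: {long_clip:.1f} s per frame")
print(f"per-frame cost ratio: {long_clip / short_clip:.2f}x")
```

If the per-frame cost jumps like this, one plausible reading is that the longer clip no longer fits in VRAM, which matches the "depends on how much is fitting into vram" comment.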
Try different models, maybe q6 or q5? Also fp8 is the same size as q8 and could be faster maybe
I don't change anything except the base images
also, does everything have to be in sync with each other? I think my text encoder is something like fp16
Try a smaller model maybe, try each step lower till you get decent speed. If all are slow then maybe something is up with drivers or b580.
Let me check in a minute the size I am using
yeah that may be it, I am using the fp8 encoder, it's probably half the size
i think scaled version is better in native than kijai https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/tree/main/split_files/text_encoders
don't forget that 12gb vram is still small for Ai, it's better than 8 but you will still need quants for most stuff
go for best quality that can fit on your system
in my workflow the clip is put straight onto the cpu so the model itself uses the vram instead. If you get a model that can fit into your 12gb it should be faster.
I am using windows
what sampler are you using? Last I checked fastest I got was 1.3 s/it with sdxl
You are on a580 right not b580?
I also get 1.6-1.7it/s with SDXL on euler or dpmpp 2m sde
yep an A580 8GB, here's the workflow I used to test.
I will try this in a little bit, I expected the a770 to be faster but not the a580.
nope, with that workflow it's even slower for me. I wonder if it's the model you all are using, my sdxl models are old as dirt.
also, could be drivers. What are you all on? The latest?
with wavespeed added I can get 1.3it/s, still slower than you all get lol
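The speeds quoted above mix two units, seconds per iteration (s/it) and iterations per second (it/s), which are reciprocals of each other. A minimal sketch for putting them on the same scale, assuming a hypothetical 20-step run (the step count is not from this thread):

```python
def to_it_per_s(s_per_it: float) -> float:
    """Convert seconds-per-iteration into iterations-per-second."""
    return 1.0 / s_per_it


def sampling_seconds(it_per_s: float, steps: int) -> float:
    """Rough sampling time for one image, ignoring model load and VAE decode."""
    return steps / it_per_s


slow = to_it_per_s(1.3)  # the 1.3 s/it figure, expressed as it/s
fast = 1.65              # midpoint of the 1.6-1.7 it/s range quoted above

print(f"{slow:.2f} it/s vs {fast:.2f} it/s")
print(f"20 steps: {sampling_seconds(slow, 20):.1f} s vs {sampling_seconds(fast, 20):.1f} s")
```

Checking which unit the progress bar is showing before comparing numbers avoids a lot of confusion, since ComfyUI's console switches between the two depending on speed.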
can't see any .png posted by u
@earnest grotto plz help, im scared if i do something wrong the whole comfy will stop workin lol
just get the newest version of the script and run it from where you ran it last time (such that next to it there's a Comfy_Intel folder, with ComfyUI and cenv in it)
its done?
Please install inside your user folder, not in C:
the first time i ran it, it was in C:
its trying to install new copy
in my user profile folder
@earnest grotto still having this error
Show the whole error and link the custom node
go into the comfy folder in ComfyUI, delete ipex_to_cuda, run the installer script again.
or git clone https://github.com/Disty0/ipex_to_cuda into that folder.
appreciate your help, boss! everything working
is there any way to speed up?
???#1193952640225267802 message
Right above your first post
@somber trellis @quartz kelp what driver are you guys on?
i dont understand how to get this image in .png because discord saving it in another format
should just be able to click the image, right click and choose save as.
you can also try and download the mp4 video
I think it should save the workflow as well
yea, i tried to open image in browser first and then save - this worked
Hey Vik, after that speed became 3 times slower, how to fix that?
I am using driver 32.0.101.6647, the model is an illustrious finetune and I am using python 3.12.9. Here is the pip list of my env
with the latest drivers I am even slower now lol.
im also currently on 6647
just tried latest and 6647, both are slower. Latest driver had my clocks going all over the place also
guess this memory issue is only on the a750 and not the a580...:(
what command line args are you using? I am only using lowvram.
dpmpp_sde karras is almost 200% slower lol
euler is about .20s/it slower
--bf16-unet --use-pytorch-cross-attention --disable-ipex-optimize --reserve-vram 7.0
maybe reserve vram needs to change for the latest drivers
since I'm swapping drivers, I may try a real old one from pre-battlemage era
although i may have to reinstall arc control dunno
seems reserve-vram 6 takes me back to 1.3s/it. Probably would slow down wan, but speed up smaller models maybe?
wonder if I should upgrade to 6651
Use 6314.
even for windows?
What do you mean "even for windows"
Nah I just wonder if the version works differently between OS due to the difference between setup
There are no such driver versions on linux. If you are using WSL, the underlying windows driver version will have an effect.
On Linux, the relevant versions of things are the kernel, the intel compute runtime, and probably others. WSL has a small set of specific kernels it works with
You really don't sound like you're on Linux, so just use 6314.
to install, I just download the particular driver and execute it right? I don't have to DDU or something like that
What issues are you having and have you tried the latest that just dropped yesterday?
If it affects speed I may roll back myself tbh
It's better if you use DDU but you don't really need to
I haven't compared speed. AI-related, things just break in general on some newer drivers. I haven't tried 6651 yet but 6647, outside of AI, still has the half-of-discord-doesn't-redraw bug and I've had vivaldi (chromium browser) go black a couple of times, neither of which used to happen before
But uh... On Linux, the driver has started to completely crash if I'm doing anything in Blender, so for now I'm sitting on windows till I finish the thing I wanna model
can't downgrade. ugh, I guess I'll just stick on 6647. I don't really want the DDU hassle just to gamble for speed
What GPU
looking through this, i wonder if it can fix torch.compile in windows error with level zero? https://github.com/intel/intel-xpu-backend-for-triton/blob/880021875f71a881b31c36b06413b44889f522c9/.github/WINDOWS.md
Issue I am seeing atm, is my version of level zero is 1.20.1 and the closest on there is 1.20.2
b580
6314 is for alchemist only, not battlemage. If you're on battlemage, the oldest driver with support is the one after 6319. Driver quality dropped since battlemage released, I don't think there's much you can do for now
you may be better with latest, and if you have any issues then roll back
if you want more speed, use AIPG's comfy, it has a special battlemage build of ipex for 2.3, as battlemage hadn't released back then.
Although, not sure of compatibility with newer models and nodes
I appear to have gotten torch.compile to work in windows, although not sure if there is a benefit.
Inference time before torch.compile for iteration 9: 216.99881553649902 ms Inference time after torch.compile for iteration 9: 203.9179801940918 ms
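For scale, the two timings above work out to a modest gain. A quick calculation using only the numbers quoted in that message:

```python
def speedup_percent(before_ms: float, after_ms: float) -> float:
    """Percentage of per-iteration time saved after a change."""
    return (before_ms - after_ms) / before_ms * 100.0


before = 216.99881553649902  # ms, before torch.compile
after = 203.9179801940918    # ms, after torch.compile

print(f"{speedup_percent(before, after):.1f}% less time per iteration")
```

A single iteration pair like this is noisy, so it is only a rough indication.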
so basically you need to follow the steps in here, you will need oneAPI installed. Download the Level Zero 1.20.2 release (not sure if the newer ones will work) and then copy the lib and include folders into C:\Program Files (x86)\Intel\oneAPI\compiler\latest
you can test by opening your environment, calling the setvars for oneAPI and testing the code here https://pytorch.org/docs/stable/notes/get_start_xpu.html#inference-with-torch-compile
this will only work with 2.8 xpu nightly with the triton xpu
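A rough way to check whether compile helps on a given setup is to time the same model eagerly and compiled. This is a sketch under assumptions, not the code from the linked PyTorch page: `mean_ms` and the toy model are made up for illustration, and a small MLP says little about how a diffusion model will behave.

```python
import time


def mean_ms(fn, sync=None, warmup=3, iters=10):
    """Average wall-clock time of fn() in milliseconds, after warmup calls."""
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()  # drain queued GPU work before starting the clock
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    if sync is not None:
        sync()  # wait for the GPU to finish before stopping the clock
    return (time.perf_counter() - start) * 1000.0 / iters


def main():
    import torch  # imported here so the timing helper works without PyTorch

    use_xpu = hasattr(torch, "xpu") and torch.xpu.is_available()
    device = "xpu" if use_xpu else "cpu"
    sync = torch.xpu.synchronize if use_xpu else None

    # Toy stand-in model (an assumption for illustration only).
    model = torch.nn.Sequential(
        torch.nn.Linear(1024, 4096),
        torch.nn.GELU(),
        torch.nn.Linear(4096, 1024),
    ).to(device).eval()
    x = torch.randn(64, 1024, device=device)
    compiled = torch.compile(model)  # default backend is inductor

    with torch.no_grad():
        eager = mean_ms(lambda: model(x), sync)
        fast = mean_ms(lambda: compiled(x), sync)  # warmup absorbs compile time
    print(f"{device}: eager {eager:.2f} ms, compiled {fast:.2f} ms")
```

Call `main()` from a command prompt where the environment is activated and setvars has been run. The warmup calls matter because the first compiled call includes the compilation itself.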
If there is a way to download the level zero stuff with the windows installer of oneapi, I couldn't find it.
Seem to get between 15% and 25% speed increase. But lots of trial and error with all the comfy nodes and settings getting it to work
well i did that, updated to the latest 2.8.0 nightly...
My speeds seem to be worse with compile than without.
doesn't seem to work great with sdxl, it is working with Wan though. A little speed boost in Flux as well
also, different compile nodes seem to work better than others, teacache compile node with inductor and default seemed to work well for flux
Although I am probably not on the latest nightly, i am still on the one from yesterday
I think the bigger the model the more difference it makes
for wan I use compile on transformer blocks only, which helped speed it up more than normal I think.
Wan has gone from about 12min with my workflow to as low as 8min with an average of around 9 with torch compile
It seems the best one is wavespeed's compile node.
I will try that one again, I had trouble with it and trying to change settings
specifically "Compile Model+"
The one with the default of inductor, not velocator for the backend
I think wavespeed as a whole is better than teacache, but dev doesn't update often so it's old
Honestly I'm not even sure if the wavespeed compile node is even working...
if it's faster (or slower) it is.
If using sdxl, honestly I didn't find anything that helped the speed other than like wavespeed or teacache, i think the model is too small for it to help and might hurt it instead.
Flux got a small gain, and Wan can get a big gain
wan2.1 1.3b upscaling with the Control Tile Lora.
Using a couple techniques for speed, and only did 20 steps so could improve with more. Haven't tried a 50 step output yet
Unrelated to Comfy but since Pytorch nightly now installs Triton for windows, did anyone try writing any kernels with it?
That's way above my knowledge, but I did manage to get compile to work
It's complaining about Not being able to find C compiler when running the basic add kernel example
I think @somber trellis ran into that somewhere up in the chat?
Was it activating the basekit or what
Check here #1193952640225267802 message and here #1193952640225267802 message
Download the level zero files and add them to the oneapi compiler folder, then call oneapi setvars like normal.
Yeah you need one api and level zero for the compiler. Level zero isn't installed automatically for some reason on windows.
As far as speed, it only seems to increase with larger models. But it does work
you need the latest oneapi base toolkit with level zero 1.20.2
just drag and drop the lib and include folders into C:\Program Files (x86)\Intel\oneAPI\compiler\latest as aaron said before
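The drag-and-drop step can also be scripted. A minimal sketch, assuming the Level Zero download has been extracted somewhere with `lib` and `include` folders at its top level; the function name and the example source path are hypothetical, only the destination path comes from this thread:

```python
import shutil
from pathlib import Path

# Destination quoted in the messages above.
ONEAPI_COMPILER = Path(r"C:\Program Files (x86)\Intel\oneAPI\compiler\latest")


def copy_level_zero(sdk_dir, compiler_dir=ONEAPI_COMPILER):
    """Merge the SDK's lib and include folders into the compiler folder."""
    sdk_dir, compiler_dir = Path(sdk_dir), Path(compiler_dir)
    copied = []
    for name in ("lib", "include"):
        src = sdk_dir / name
        if not src.is_dir():
            raise FileNotFoundError(f"expected folder missing: {src}")
        # dirs_exist_ok merges into an existing folder instead of failing
        shutil.copytree(src, compiler_dir / name, dirs_exist_ok=True)
        copied.append(name)
    return copied


# Hypothetical location of the extracted Level Zero 1.20.2 download:
# copy_level_zero(r"C:\Users\you\Downloads\level-zero-sdk")
```

Writing into Program Files normally needs an elevated (administrator) prompt, and `dirs_exist_ok` requires Python 3.8 or newer.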
torch compile doesn't seem worth messing with just due to the compilation time it takes, and the errors it needs to error out
what would've been better is getting torch 2.6.10+IPEX compilation to work with compile nodes
Thanks for the help, i guess my dream of writing Triton kernels in windows has to wait
I tried adding the visual studio c++ compiler to path like nvidia users do but it only worked with oneapi
I did the same too before asking here
I think at the moment it is still in infancy, I think full support will come. Tbh, I think Intel is the only one with official triton wheels for windows
Yea at least officially it seems to be only intel. There are forks that work on Windows for Nvidia tho
Yeah, they also get sage attention
I am hoping for flash attention soon (i hope it's faster than the pytorch attention anyway). I found some posts on github of it being worked on
It's available for Xe in Intel's Cutlass fork
Not yet in Pytorch tho
if sd.next's openvino implementation was in comfy
i think we'd all be using it
OpenVINO has issues though. Last I checked it used much more resources than ipex.
Would love to try it in comfy, i believe there was a fork years ago
Well they're definitely trying.
i dont think torch compile likes ipex_to_cuda hijacks
Some settings don't work in the custom nodes in comfy
inductor and default or reduce overhead seem to work.
max-autotune-no-cudagraphs can pull errors
inductor works seemingly
teacache also has eager and aot_eager
Eager will work, but I think it is slower than inductor. Also if you have the option to compile on transformer blocks only, i think that helps with speed. I only see it on the wan nodes though.
If you have wan installed I think you will see the difference (on second gen as the compile adds to the time). Flux as well should get a speed bump
if using a lora, you will need the model patch order node from kj nodes
For flux, you don't need the model patcher node for lora if using the teacache compiler I don't think. Although with a detailer lora the output is different, but i think it's still working. The patcher node just causes massive slowdown for me.
Bad news, seems like flash attention will only be supported on PVC, BMG and LNL
https://github.com/Dao-AILab/flash-attention/pull/1528
Is there a hardware limitation with alchemist?
looks like no speed increase for us, wish we could get the ipex 2.1 speed back 😦
Yeah, looks like it's a hardware limitation with lack of flash attention support altogether.
i dont think its a hardware limitation
thats why first movers can hardly be surpassed. researchers do it for free for you
only hardware dependency of flash atten is wmma / tensor cores
a770 has tensor cores
i don't see any reason why it is a hardware limit
Welp, being a 32 bit gpu hits here too
Could this help speed up video and image gguf models in comfy by any chance? https://www.intel.com/content/www/us/en/developer/articles/technical/accelerate-gguf-models-with-transformers.html
after many hours of reinstalling windows 11 and all of my packages
upscale model no longer crashes me anymore (again)
I wish I knew why it happened. What I find odd is this "Display Port Lost" blackscreen happened during upscaling and randomly while running certain games.
Crashing might be memory related, the display port I dunno.
The displayport lost is the crash i mean tho
Its a full on blackscreen, pc becomes entirely unresponsive until restarted
Do you see any ram spikes before it happens? I've had the system freeze when ooming before, usually it's going into page file. (not sure why it would do that on an a770 though).
Thing is, I shouldn't be OOMing anyways. I'm using slicing from IPEX_TO_CUDA to prevent OOMs...
triton was updated in nightly, dunno what has been changed though.
oh new driver actually improved the it/s, nice I guess
What driver are you updating from? I am on the last with the same number so not sure it would change anything
6647 to 6651, I'm just waiting for whql really
anybody know if this torch.compile error is fixable? torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised: RuntimeError: self and mat2 must have the same dtype, but got Half and Float. It happens when using the fp8 weight type with some wan2.1 models when using the load diffusion model node.
That error is because torch.compile is bypassing my ipex hijacks
And ComfyUI errors out without my hijack
Would that just be torch.compile altogether or maybe because it's a custom node?
it is from the node, it shouldn't need my hijacks in the first place
thanks, I found another compiler node that works, my guess it may be one of the settings I have set probably bypasses comfy settings
hey guys, could you tell me how the B580 performs on AI? I made a topic here #1354844590519353384, but some people told me to ask here
if you could answer me there it would be nice
Would you like to run just a basic sdxl clip->ksampler->vae decode thing, 1024*1024, etc., and tell us your performance and pytorch version?
anyone on the 2.8 nightly release for pytorch?
where do I grab the basic sdxl one? Then again it's kinda hard to measure the difference since I never used it before on an older driver version
Here's a workflow
If you don't know how that works - open the image in browser, download it, and drag and drop the file over comfyui in your browser
then run
With 6647 on windows, with my A770 16GB, the slower pytorch 2.6, I get 1.22it/s with this workflow
well, that doesn't answer my first question. where do you download the base sdxl?
Sorry, I forgot
If you installed with my script, you can run it again and download a bunch of models I liked, as well as the base sdxl
Here's the direct link as well
https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/sd_xl_base_1.0.safetensors
Should also be able to download it with comfyui manager
I'll be off
I should switch back to 2.3, forgot i was using 2.6 on windows
yucky
it's 1.44 it/s on first load, changing seed a bit and got 2.47 it/s on the next one and it's pretty consistent around it. I think I'm on 2.7.0 + xpu
which model did you use? (im just curious)
that's the base sdxl as @earnest grotto requested
interesting
the latest driver really improves the gen speed
im looking to get a b580 and i want to know how fast it is on ai, that is why i asked
that is 1024x1024?
yes
honestly if you're not vram limited on the kind of AI that you want to do, nvidia is probably easier. I've kind of given up wrangling mine for i2v and its variants. haven't fooled around with local deepseek either since I don't really see the advantage of it compared to cloud ones.
but hey, b580 is amazing when it works. you just have to expect it to not just works™
based on this picture and mine was finished in 11s instead of 16s. b580 surpasses it easily with flying colors
omg finally some numbers 😭 🙏
i was going to get a 3060, but looks like my mind is locked on the b580 100% now
well keep in mind that the webUI that they're using is probably different. but even with that 27 secs instead of 12 sec probably is better, strictly for image genning tho.
yea
anything is easily better than my current gpu
i could get a 3060 for around the same price of the b580 here. the 3060 is nvidia, so it would be easier to use ai stuff, but the power of the b580 over it makes me choose the b580
I mean sure, but it won't be convenient using comfyUI if you're used to regular A1111 and its derivatives. and not to mention it's kinda sus if you're gaming with intel on older games (I have to use vulkan for dota2 because it dips to 10 fps or so on dx11 lol)
i dont mind having some extra steps running stuff and i dont play older games, so i guess im good
b580 will be a major upgrade over that lol. Only thing is the setup will be a little more difficult if you're new, but Vik's script and intel ai playground are really good to get started.
12gb will be really good for entry level too, i mean I am on 8 and getting by (barely lol)
Went to 1.9it/s with 2.3
1.85it/s with 2.8
you should get a 24gb b770 if such a gpu releases, but if it doesn't you shouldn't
a 16gb one would be good too for your case
where did u get this image from? is it flux or sdxl?
Eh, they probably ran the test in some way that bloated it so that it needed 12 gb vram minimum
i think so
because i have 2080ti and 3080, and my friend had a 3060 and its not even close in flux
it takes 45 seconds with 25 steps with 2080 ti and around 1 and half minutes in 3060
I wish Gpu comparison channels also did tests with comfyui
It's a pretty niche interest. You're probably better off finding the benchmark comparisons in east asian videos, ironically (provided that you can type the moonrune keywords) lmao
i guess they using sdxl base with a refiner or somthn
guys whats the best pytorch for windows? my setup is soooooo slow
sometimes its 4s/it but 90% of the time its 9-24
but im using same workflow every day
show workflow, what gpu
A770 16gb
Ultimate SD Upscale with flux gguf
Flux is slow
but how its possible? yesterday it was 3.5s/it now its 32
I have a question: is it still correct as of the end of March 2025 to follow the installation procedure in this post on a windows 11 24h2 + intel arc a770 16gb pc configuration? Or has something changed?
even flux 4_0 is slow
Just run the script vik has in the pinned comments of this thread.
It does it for you.
@summer oxide ^
This content is no longer available. I can't download the requirements.txt file now. Could someone kindly help me?
Thanks, I will try this!
Thanks for the helpful advice!
But when I run this script, it returns this: Script version: 0.1.6p
Unknown or potentially incorrectly detected GPU:
Please report this issue.
I guess this script is for a discrete graphics card, but I have an integrated Intel chip (Thinkbook Pro 14 Ultra 225H), and I just want to use this card. How can I do it?
I check this: https://github.com/YanWenKun/ComfyUI-WinPortable-XPU
Unfortunately, there is no Ultra 5 225H, which still gives me some hope.
Drivers up to date?
The built-in windows driver updater does NOT work, if you were thinking of using that
Also, you can try AI Playground
Say your driver version
And open powershell, type in Get-WmiObject Win32_VideoController, press enter, and show what is shown
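If the PowerShell output is long, the adapter name and driver version can be pulled out with a few lines of Python. This is a sketch that assumes the usual `Key : Value` list layout with a blank line between adapters; the sample text below is made up for illustration and is not real output from anyone's machine:

```python
def parse_video_controllers(text):
    """Split 'Key : Value' list output into one dict per adapter."""
    adapters, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current adapter's block.
            if current:
                adapters.append(current)
                current = {}
            continue
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            current[key.strip()] = value.strip()
    if current:
        adapters.append(current)
    return adapters


# Illustrative sample only (hypothetical values).
SAMPLE = (
    "Name          : Intel(R) Arc(TM) A770 Graphics\n"
    "DriverVersion : 32.0.101.6647\n"
)

for gpu in parse_video_controllers(SAMPLE):
    print(gpu.get("Name"), gpu.get("DriverVersion"))
```

Pasting the full output here is still the most useful thing for debugging a detection problem; the parser only helps when you want to compare versions quickly.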
Here
Ok, I see the issue, I'll fix that, hold on
Thanks a lot!
hmm
this is an arrow lake igpu, i wonder if it's supported at all
Get AI playground and tell me if it works
@buoyant gulch
wait i'm getting confused, it should be supported
Ok, playing this ground for the first time.
I updated the script, should work now for arrow lake
I installed it on D drive, is there a problem?
@buoyant gulch Arrow lake is not supported yet for AIPG, sorry
I updated my script. Download it again, try again with it
Anyway, try your script, it's only a problem of time……thanks!
Everything is OK except this file, tried three times, still can't download this file……
Show the whole list of files
everything is probably fine and you can probably run it without issue, but show the whole list
also consider trying the experimental pytorch which will be much faster, but might not work
hmm
try the experimental pytorch, or choose 2.6 instead of 2.5.1
Nightly is installed successfully.
It will install, the question is will it work afterwards
download some model and test
Entered 8188, and downloading sd1.5 model.
GPU usage is 98%
Thanks man!
you are the goat!
One can only dream 
Get the turbo lora https://civitai.com/models/876388/flux1-turbo-alpha and use the fp8 model if you can instead of gguf as they are slower and you should have enough vram.
not enough vram to upscale with the full model
You can also add teacache or wavespeed, or the distilled flux
wdym by distilled flux, it's already distilled
woops i am thinking of hunyuan, i think flux has a de-distilled model which is maybe slower lol
There are hyper flux and turbo flux loras, you'd have to experiment to see which is better
@signal patio you can try this workflow, added teacache and torch.compile (if on windows it requires extra steps to set compile up, but it only helps a little so can be ignored). I have 2 sd upscalers as well if needed. Detail Daemon could be bypassed as well for a little more speed but less detail. lora loader for multiple loras, 8 steps is for the turbo lora etc. Also launch comfy with --reserve-vram 7 (or try different numbers, they help with speed in windows and gguf models)
lol, support JUST got added to AIPG #1245461432141873245 message
@buoyant gulch
updated to pytorch 2.6 as well
I can run Flux_pruned_fp32 (16.6G) on my ThinkBook Pro Ultra 255H, checkpoint version workflow in 3-4 minutes to generate this 896 x 1152 picture!
I’ve got image generation working. Curious if there is a consensus on the most capable video workflow for a750
But after a while, I got an error and it exited. Restarting comfyui using python main.py, I got this: No module named 'yaml'. I created a new conda environment and reinstalled, still not working, same error.
I don't think that's very usable waiting for 4 minutes
Also, if you want to do anime, there's good sdxl anime finetunes that will be way faster and know way more anime-related concepts than flux
Use the shortcut.
Can I install comfyui manager? I think there is no problem with it , because I run flux after installing comfyui manager and using it install a missing node.
Sure. My script also gives you the option to install comfyui manager, and a few others, if you don't know how
I'm testing it, this laptop has USB4 and TGX ports, and I can use a GPU docking station. I will try to solve these minor things by myself. Thanks for your contribution to the community!
SDXL only needs 10 seconds, so it's a good choice for this laptop.
Wan is the best local gen for img2video, it may be a toss up with hunyuan for text to video. I am using the fp8 quant for wan now as the gguf are too inconsistent with speed.
Have you tried doing still images with wan? I heard it's surprisingly good for that but I kinda doubt it
Not personally but dan did a couple with the 1.3b model and they looked pretty good. Should have decent prompt adherence too
Is LTX just not as good or not compatable with Intel cards?
Ltx is okay for closeup videos without too much movement. Its the fastest. The new 1.3b wan fun model may be better, it just came out yesterday but haven't tried it yet.
Wan2.1 14b i2v videos with my current setup take about 10-12 minutes for 480p at 49 frames(3secs)
I don't really have a frame of reference, so I guess that's good...?
I think I'm getting 720p still images with 20 steps at around 30 seconds. Not quite sure how that translates, but sounds about right. I assumed video would be a little easier and just the first image would be the most computationally intensive, but that was an assumption.
Nope, video is harder. 720p with an a750 if possible will probably take over an hour maybe much more.
and that's probably for like 2 seconds maybe, 33 frames. (with wan 14b).
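The frame counts and durations mentioned above line up if the output plays at 16 frames per second. That frame rate is an assumption here (it matches Wan 2.1's usual output and the "49 frames (3secs)" figure), so adjust it if your workflow saves at a different rate:

```python
WAN_FPS = 16  # assumption: default playback rate of the saved video


def clip_seconds(frames: int, fps: int = WAN_FPS) -> float:
    """Length of a clip in seconds for a given frame count."""
    return frames / fps


for frames in (33, 41, 49, 61):
    print(f"{frames} frames -> {clip_seconds(frames):.2f} s")
```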
The smaller models like 1.3 and LTX may be able to do 720p at a reasonable speed but lower quality
Yikes. I'm sure I'll start with 480. I've noticed some prompts give really bad ugly results and others come out almost perfect for images. I used to think that was just the capability of the software, but finally learning how to do prompts a little bit. Haven't even stumbled through Workflows yet. LoL
Prompting is very important, especially for older models like sd1.5 and SDXL
flux does pretty good with natural language, LTX requires an LLM tbh
Well...I'm downloading LTX now just to test it out and start with lower expectations. WDYM it needs a LLM? ...like to help write effective prompts? How so?
Yeah, they have one built into their workflow now called prompt enhancer. 0.95 is the latest model
I have my own custom one for my old ltx workflow that used ollama
Ltx can get decent results for simple up-close animations and non-complex movements.
Well...that's if I could get it to install correctly after downloading 40+ GB. 😵‍💫
At least it appears every other node and feature is fine except all the ones that start with LTXV
When I try ComfyUI Manager, I get "Failed to Clone Repo" and when I try manually it's missing several nodes. I wonder if something happened to the Git or if he's trying to monetize it now.
40gb for ltxv? That doesn't sound right.
Oh, no, that part is only a few MB. I was basing it off all the dependencies from this page:
https://comfyui-wiki.com/en/tutorial/advanced/ltx-video-workflow-step-by-step-guide
you made sure to load into your environment before installing the requirements if doing it manually right? I notice those instructions don't mention it. I do believe I've heard some people complaining bout comfy manager in recent updates, but it's been okay for me.
and missing nodes should be able to be installed with the manager, unless the workflow you're using is old as some nodes may have been removed for ltx.
which one is better at quality wavespeed or teacache? i really cant see major difference between them
I think wavespeed but its not compatible with wan yet and has way less frequent updates. If there is no major difference for you use the one thats fastest
Teacache is generally more stable with outputs for flux than wavespeed is. I've been getting blurry outputs using default 0.120 recommended threshold on the first block cache node for wavespeed.
Though I think both caching methods seem to have a chance to straight up fail on my end.
TBH, IDK what you mean "load into your environment". If it wasn't mentioned there I probably didn't do it. LoL
Well when you install comfyui you create a virtual environment in python where you install pytorch/ipex and all the dependencies for comfyui etc. So in order to manually install nodes you will need to manually load into this environment first then navigate to the custom nodes folder through cmd, and then install them.
I guess I have some digging to do because it's been several weeks since I went through all that and I don't remember what I did. LoL! I think I used miniconda3, but I don't remember anything about how I created the environment. 😅
python -m venv "name of your environment"
"name of your environment"\Scripts\activate
Did you use Viks script?
If so i think you can run it again, and control+c to cancel and it will exit into your environment
Also did you try install missing custom nodes in comfyui manager? It usually works
I tried the manager. It doesn't find any missing nodes. When I load the I2V workspace I get the missing nodes error notice. The manager has been useless for me so far. 🤷‍♂️
I did end up using Vik's script during my original install.
I generally run the instance from windows with the start_lowvram.bat from either you or him or someone around here. LoL
I right click and run with Python, but Ctrl+C just closes Python entirely and I'm back in windows.
If you installed with my script, run with the shortcut, then press ctrl+c to close comfyui. you will be in the conda environment
This is why you use the shortcut, not the batch file
Running batch files directly will cause that to happen
The shortcut my script makes launches it with extra arguments for the command prompt to not close (also helps in case comfyui crashes or something)
The alternative is to launch a command prompt yourself and run the batch file from that, rather than double clicking
So from there I try .\scripts\activate ...I'm not a python dev. LoL
Do I run
cd custom_nodes/ComfyUI-LTXVideo && pip install -r requirements.txt
or
.\python_embeded\python.exe -m pip install -r .\ComfyUI\custom_nodes\ComfyUI-LTXVideo\requirements.txt
No
pip install -r ./custom_nodes/ComfyUI-LTXVideo/requirements.txt
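The reason the environment matters is that pip installs into whichever Python it belongs to. A small sketch for checking that before installing; the helper names are made up for illustration, and the requirements path is the one from the message above:

```python
import os
import sys


def in_virtual_env() -> bool:
    """True when running inside a venv or an activated conda environment."""
    return sys.prefix != sys.base_prefix or "CONDA_PREFIX" in os.environ


def pip_install_command(requirements_path: str) -> list:
    """Build a pip command bound to the interpreter running this script."""
    return [sys.executable, "-m", "pip", "install", "-r", requirements_path]


cmd = pip_install_command("./custom_nodes/ComfyUI-LTXVideo/requirements.txt")
print("environment active:", in_virtual_env())
print("interpreter:", sys.executable)
print("command:", " ".join(cmd))
# To actually install, pass cmd to subprocess.run(cmd, check=True).
```

If the interpreter path printed here is not inside the cenv folder, the packages would land somewhere ComfyUI never looks, which is one way to end up with "requirement already satisfied" and a node that still fails to import.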
Hmm. I just get requirement already satisfied since I already did this from a GitCMD instead of this environment before. Ugh.
What nodes are missing, what workflow
Are there any custom nodes saying they failed to load
All the ltx-*2v.json workflows from https://comfyui-wiki.com/en/tutorial/advanced/ltx-video-workflow-step-by-step-guide .
This
Show what comfyui prints in the console when you launch it
The whole thing or just this part?
0.0 seconds (IMPORT FAILED): D:\Comfy_Intel\ComfyUI\custom_nodes\ComfyUI-LTXVideo
close comfyui
git pull
in the command prompt, in comfyui's folder
run again, say if it's still broken
(D:\Comfy_Intel\cenv) D:\Comfy_Intel\ComfyUI>git pull
Updating 96d891cb..2d17d891
error: Your local changes to the following files would be overwritten by merge:
comfy/model_management.py
Please commit your changes or stash them before you merge.
Aborting
sorry,
git restore .
then
git pull
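`git restore .` discards every uncommitted change to tracked files, so it can be worth listing what will be lost first. A small sketch, assuming you paste in the text printed by `git status --porcelain`; the helper name and the sample text are made up for illustration:

```python
def modified_files(porcelain_output: str) -> list:
    """Paths with tracked changes, from `git status --porcelain` text."""
    paths = []
    for line in porcelain_output.splitlines():
        # Each line is two status characters, a space, then the path.
        status, path = line[:2], line[3:]
        if status.strip() and status != "??":  # "??" marks untracked files
            paths.append(path)
    return paths


# Illustrative sample: one modified tracked file, one untracked file.
sample = " M comfy/model_management.py\n?? notes.txt"
for path in modified_files(sample):
    print("would be reverted:", path)
```

Untracked files are left alone by `git restore .`, which is why they are skipped here.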
0.1 seconds: D:\Comfy_Intel\ComfyUI\custom_nodes\ComfyUI-LTXVideo
... but still says missing nodes in the GUI. Let me check something...
Manager still doesn't detect missing nodes from GUI. Bummer.
Let me see if the Lightricks Git workflows work instead of the ones from the comfyui-wiki
I think Ltx nodes got completely reworked so if the workflow is old those nodes may not exist
It defaulted to a checkpoint I didn't even have instead of just being blank. v0.9.5 instead of my v0.9.0 took me a minute to spot. LoL
yeah, most workflows do that
I assume workflows go in the user/default/workflows directory. That's where I put the ones from the wiki. I just left the ones from Lightricks in the custom nodes where it was installed and searched to open from there.
...maybe at this point I should move my conversation over to a ComfyUI channel or server. Thanks @earnest grotto for helping me get this resolved.
Just noticed my GPU is at 0%. When generating images it's usually 50-100%. Hmm Seems it probably crashed and froze.
Seems there is an issue with manager now where it can't auto detect some nodes, I think the "workaround" is to manually search for them and install them.
@earnest grotto if the first run after boot is 3 s/it, the 2nd is 9, and the 3rd is 20+, is that because VRAM is full? Tired of rebooting to run at normal speed. Do I have any alternative? thx
probably
don't have any suggestion other than use linux
--reserve-vram 7 or test other numbers
Probably need to set clear vram nodes once or twice on the prompt
whats the best option for my intel arc a770 rn? arch?
after clearing vram its still 15s/it
but if i run this after boot its 3.5s/it
Either you made a typo or it is faster, which is normal since runs right after boot are slower due to loading the models etc
3.5s/it is faster than 15s/it right?
Yeah
i dont see any typo then 😄
after fresh boot generation is fast
but when i try to run it again it becomes slower and slower
until 20s + /it
rebooting makes it fast again
clearing vram didnt help, unfortunately
are you running with the --reserve-vram argument? If on windows it helps with inconsistent speed
idk im using vik script
I don't have his version installed at the moment, you can probably add it to the start up shortcut he has if you right click and edit. It might be in there already, if so you can play with the numbers. I think --reserve-vram 7 was good for a770
Or you can start it up manually in a command prompt from the comfy folder, with conda activate .\cenv\Scripts\activate followed by python main.py --bf16-unet --use-pytorch-cross-attention --disable-ipex-optimize --reserve-vram 7.0. I think that's the name of the environment
Edit the start_lowvram batch file to also have --reserve-vram 6.0 or whatever other number
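A rough sketch of how the launch line could be assembled when testing different `--reserve-vram` numbers. It only uses flags that appear in this chat; the helper name `build_launch_args` is hypothetical, and the batch file the script generates may contain more than this.

```python
def build_launch_args(reserve_vram_gb=None, lowvram=False):
    """Assemble a ComfyUI launch command from flags mentioned in this chat."""
    args = ["python", "main.py", "--bf16-unet", "--disable-ipex-optimize"]
    if lowvram:
        args.append("--lowvram")
    if reserve_vram_gb is not None:
        # The value is in GB; 6.0 to 7.0 were the numbers suggested for an A770.
        args += ["--reserve-vram", str(float(reserve_vram_gb))]
    return args

# Print a few candidate command lines to paste into the batch file.
for gb in (4, 6, 7):
    print(" ".join(build_launch_args(reserve_vram_gb=gb, lowvram=True)))
```

Change one number at a time and compare s/it over a few consecutive runs, since the slowdown described above only shows up after the first generation.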
which option is better now? native pytorch 2.6 with xpu support or IPEX? I'm looking through the git repo and wondering if the native support is decent enough? (P.S. I'm using arc a750)
Use 2.8
So its better than ipex after all? Alrighty
what's this error with ComfyUI manager? I cloned the repo in the custom_nodes section and am using a venv now.
`(comfyui_env) D:\Extra\ComfyUI>python main.py --bf16-unet
Failed to execute startup-script: D:\Extra\ComfyUI\custom_nodes\ComfyUI-Manager\prestartup_script.py / No module named 'async_timeout'
Prestartup times for custom nodes:
0.1 seconds (PRESTARTUP FAILED): D:\Extra\ComfyUI\custom_nodes\ComfyUI-Manager
Traceback (most recent call last):
File "D:\Extra\ComfyUI\main.py", line 134, in <module>
import comfy.utils
File "D:\Extra\ComfyUI\comfy\utils.py", line 20, in <module>
import torch
File "D:\Extra\ComfyUI\comfyui_env\lib\site-packages\torch\__init__.py", line 1002, in <module>
raise ImportError(
ImportError: Failed to load PyTorch C extensions:
It appears that PyTorch has loaded the torch/_C folder
of the PyTorch repository rather than the C extensions which
are expected in the torch._C namespace. This can occur when
using the install workflow. e.g.
$ python setup.py install && python -c "import torch"
This error can generally be solved using the `develop` workflow
$ python setup.py develop && python -c "import torch" # This should succeed
or by running Python from a different directory.`
where do I run $ python setup.py develop && python -c "import torch"
So, I run the script in the location where I want to install Comfy?
Yes, preferably somewhere where you don't need admin permissions to touch things, e.g. your documents folder
okay, thanks bro. I guess I have to choose Nightly for torch 2.8
Yes
If I want to update, I can just rerun the latest script right? I don't have to find all of additional nodes that I installed before
yes
if you installed the nodes the script lets you install, running it and choosing them again updates them as well
the command line inside the script doesn't include the --pre arg when choosing Nightly so it installs 2.6 instead of the pre-release 2.8
oh wait, nvm. It does install 2.8, but first downloads the 2.6 for some reason?
weird
It first downloads Comfy's requirements
Then installs on top of that
oh, I was confused for a second
can I change the torch version by rerunning the script after install?
yes
I don't think there's much reason to switch off 2.8 unless you found some bug
With 2.3, I got 1.9it/s. 2.6, 1.22it/s. 2.8, 1.85it/s
2.5 is about the same as 2.6
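To put those numbers side by side, a small Python sketch that expresses each torch version's speed relative to 2.3 and converts it/s to s/it. The figures are just the ones quoted above from one machine; `relative_speeds` is an illustrative helper name.

```python
def relative_speeds(its_per_second, baseline):
    """Map each version to its speed as a fraction of the baseline version's speed."""
    base = its_per_second[baseline]
    return {version: speed / base for version, speed in its_per_second.items()}

measured = {"2.3": 1.9, "2.6": 1.22, "2.8": 1.85}  # it/s, as reported in the chat
ratios = relative_speeds(measured, "2.3")

for version, speed in measured.items():
    print(f"torch {version}: {speed:.2f} it/s, {1 / speed:.2f} s/it, "
          f"{ratios[version]:.0%} of 2.3")
```

On these figures 2.6 lands at roughly 64% of the 2.3 speed and 2.8 at roughly 97%, which matches the "pretty much the same speed" impression for the nightly.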
You can use 2.3 if you want to test out my unfinished stuff for getting the 3dpack to work
hm, it works fine and very fast. Around what Vik said. But i'm also getting this error:
from accelerate.utils.memory import clear_device_cache ImportError: cannot import name 'clear_device_cache' from 'accelerate.utils.memory' (D:\Extra\Comfy_Intel\cenv\lib\site-packages\accelerate\utils\memory.py)
ComfyUI works and runs just fine even with this
when i click run i hit this problem .... other models work fine, but there's a problem with the flux models
can anyone help me to solve this ?
show command prompt that comfy is running in and say gpu
Checkpoint files will always be loaded safely.
F:\COOMFY UI\Comfy_Intel\cenv\lib\site-packages\torchvision\io\image.py:14: UserWarning: Failed to load image Python extension: 'Could not find module 'F:\COOMFY UI\Comfy_Intel\cenv\Lib\site-packages\torchvision\image.pyd' (or one of its dependencies). Try using the full path with constructor syntax.'If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
warn(
[W331 23:02:25.000000000 OperatorEntry.cpp:162] Warning: Warning only once for all operators, other operators may also be overridden.
Overriding a previously registered kernel for the same operator and the same dispatch key
operator: aten::_cummax_helper(Tensor self, Tensor(a!) values, Tensor(b!) indices, int dim) -> ()
registered at C:\Jenkins\workspace\IPEX-WW-BUILDS\private-gpu\build\aten\src\ATen\RegisterSchema.cpp:6
dispatch key: XPU
previous kernel: registered at C:\Jenkins\workspace\IPEX-WW-BUILDS\private-gpu\build\aten\src\ATen\RegisterCPU.cpp:30476
new kernel: registered at C:\Jenkins\workspace\IPEX-WW-BUILDS\ipex-gpu\build\Release\csrc\gpu\csrc\aten\generated\ATen\RegisterXPU.cpp:2971 (function operator ())
ipex_init: (True, None)
Total VRAM 15931 MB, total RAM 32694 MB
pytorch version: 2.5.1+cxx11.abi
Set vram state to: LOW_VRAM
Device: xpu
Using pytorch attention
ComfyUI version: 0.3.27
ComfyUI frontend version: 1.14.6
[Prompt Server] web root: F:\COOMFY UI\Comfy_Intel\cenv\lib\site-packages\comfyui_frontend_package\static
Import times for custom nodes:
0.0 seconds: F:\COOMFY UI\Comfy_Intel\ComfyUI\custom_nodes\websocket_image_save.py
Starting server
To see the GUI go to: http://127.0.0.1:8188
got prompt
(F:\COOMFY UI\Comfy_Intel\cenv) F:\COOMFY UI\Comfy_Intel\ComfyUI>C:\Users\Admin\AppData\Local\Packages\MicrosoftWindows.Client.Core_cw5n1h2txyewy\TempState\ScreenClip\{65D38567-438B-42EC-AD4E-7CE461080E66}.png
A770
16gb?
Get an older driver and try again
Also install comfyui using my script
^
already installed using your script
ok, get an older driver and try again
gpu driver?
okay let me check !
and say if anything changes
Download the Q4 flux models (you can use my script again), use the workflow below the script, say if it still crashes
You can try and find older drivers on techpowerup. I hoard them because of this lol
Reinstall and use the newer pytorch, you are on 2.5 from that error message.
which one 3 or 4?
@reef ivy
4 is the 2.8 nightly
3 is 2.6, stable but slower
2 is 2.5 ipex, even slower
and 1 being 2.3.110 is still the "fastest"
but supports the least things
While you should probably use either 2.3 or the nightlies, that's probably not gonna fix the issue. Do what I told you
No, it installs 2.5 so it's 2.5
Why 2.5?
It's not the latest IPEX
Because that's the one I had made the script install
I'm pretty sure my internet works fine
same problem
Show the whole pip list
how to do that?
It crashes without an error, with the Q4 models?
do I just extend the ss?
scroll up, copypaste the thing or just send a screenshot of the upper part
Same as previous
restart pc, launch comfyui, try again. if it still crashes, get the oldest driver you can, install that, try again
it installed 2.7.0 🤔
--pre it is, then
download script again, run again
0.1.8
Download script again, run again
how to do it for intel arc 370M ?
Can a770 run flux.1 model? I tried the nf4 one, vram usage roofed to 22gb then system hung, but seems nvidia can run it with 4gb vram..
yes, you can use my script, or if you want to do it manually install the gguf custom nodes and use the q4 version
you don't need to, but things will probably be faster with --lowvram and potentially --reserve-vram 7.0 or so
you can use either
also not worked
turn off comfyui. open a command prompt in comfyui's folder
type
git checkout e1da98a
press enter. restart comfyui, say if it still crashes
nf4 is not compatible with Arc GPUs since it requires bitsandbytes support. GGUF and FP8 models work though.
There is actually bitsandbytes arc support now, but just not for nf4. I don't know if they ever plan to add it or not.
Heads up, the latest Comfy updates removed reroute nodes, so a lot of workflows will be weird looking and probably won't work on older versions etc.
hwut
what are they replaced with
surely comfyanonymous/litegraph didn't just... literally remove reroutes???
maybe just a bug?
Thanks. What kind of bitsandbytes features are supported? Maybe a link?
Just use the q4_0 models, or q5_0 if you have >8gb of vram
though i'm not sure, it's possible q5 or higher is better than nf4 anyways
Don't think there is any functionality that would be useful in ComfyUI. I have it installed just to stop the errors I was getting: https://github.com/bitsandbytes-foundation/bitsandbytes/tree/multi-backend-refactor?tab=readme-ov-file#bitsandbytes-multi-backend-alpha-release-is-out
q5 should be higher quality but much slower from my understanding
I find that for wan2.1 models GGUF is just too inconsistent with speed. It is slower in general, but sometimes it is much slower, like half the speed. I've switched to fp8: even though it hits the RAM harder, it is still faster and more consistent in speed.
nf4 is supposed to run faster than fp16 / bf16
hmm, need to find where i put that panhard crab pic i had masked out, new 3d model generator showed up
If anyone wants it
(Color if alpha is naively removed is black)
trellis's hf space | triposg's hf space
last one flipped
gonna try without simplifying
simplified vs not
Really not sure what to think. Need good retopo. Nvidia's thing looked great.
ran out of free. people are saying it's better with anime than trellis 🤔
might poke to see if it works locally later i guess
cmd
is that why some packages don't import on comfy? I've been getting this one since I installed 3 days ago
Cannot import D:\Extra\Comfy_Intel\ComfyUI\custom_nodes\comfyui-brushnet module for custom nodes: Failed to import diffusers.loaders.peft because of the following error (look up to see its traceback): cannot import name 'clear_device_cache' from 'accelerate.utils.memory' (D:\Extra\Comfy_Intel\cenv\lib\site-packages\accelerate\utils\memory.py)
And another package didn't work with the same memory.py error
works without any additional hassle
first run took 90 seconds, different seed 72
peak vram usage with the example workflow was 12.3gb
No, that error has been showing up for a while, the node probably needs to be updated. Could also be because they completely changed the front end and made it an entirely separate git repository now.
Doesn't seem like any of these models would be good for complex geometry with moving parts like vehicles etc.
It's just a test for hard surface
It's complex and has all sorts of parts - antennas, cannons, whatever
Without significant work, probably not for anything up close, but for static environmental models or for an application like an RTS game, I think that would be very passable without much modification.
running the newest script
Flux working on your coomfy ui?
haven't tried flux
Huh
#. Both eager mode and ``torch.compile`` is supported. The feature ``torch.compile`` is also supported on Windows from PyTorch* 2.7 with Intel GPU, refer to `How to Use Inductor on Windows with CPU/XPU <https://pytorch.org/tutorials/prototype/inductor_windows_cpu.html>`_.
update get start xpu document for v2.7
nice
Working on q4 models but not working on original flux models
The bf16 flux models need 22gb for Nvidia, probably even more for Intel
comfy_extras.chainner_models is deprecated and has been replaced by the spandrel library. I got UR error with this
but i already used flux dev some months ago....
You did not use the "original flux models"
the fp8 ones are not that
fp8 is not the best quantization there is
it might be mildly faster (?) than the gguf ones
Q4 is around 6.3gb but i used a flux dev model which was around 22gb
q6 probably gets close if not better. but really not sure about that
My guess is 2.7 is linux only as triton on windows only comes with nightly 2.8? Also, I have gotten both eager and inductor to work in comfy i believe with it.
you quantized it to fp8 in comfy
you were just spending twice as long loading the model, to then convert it to fp8
yes, that's just converted ahead of time so you don't spend twice as long loading the model
As for quality I felt they were all basically the same down to q4 gguf, fp8 is fastest but --reserve-vram used to get back some speed for gguf
which even with an ssd, that takes some time
Which one do you think has better image quality?
probably q6, definitely q8
not like you are forced to use q4, especially if you have more vram
but also the differences get extremely minor
I feel like the loss is minimal down to q4. It's 95% the same quality as q8 and probably close to fp8
But use the biggest that can fit on your GPU with decent speed. Q8 or q6 probably for 16gb
Also try the reserve vram command for speed
what is that ? Every time I open comfy it shows that step by step 5 to 80
Q3 quants are where quality gets burnt, haven't seen a good q3 of any model ever.
Q2 should just not exist lol
I believe this is the new security feature
Not 100% sure though
how to use that?
reserve vram
It's comfy manager
ohh
Add --reserve-vram 6 to the startup parameters. Play with the number from 4 up depending on your vram amount. Check the pinned comments here
thank you vik. finally have flux gguf working. used about 9gb vram on Q5_0
fyi, eager mode = no compile
Well that explains why it didn't make it faster lol.
Can someone explain why i can't find the missing nodes of ACE++ ?
tried manual install but no luck
2.3+ipex or nightly? which is better and faster ?
pretty much same speed. nightly has better compatibility. it's also nightly, experimental, things might be broken one night then not the other
or idk, maybe they build their torches during the day
but i doubt it / or it's day in your timezone
I got this error on using hiresfix, does my ipex_to_cuda have a problem or what?
can someone help me please?
I don't know man, it seems like you're better off asking directly on the GitHub of whoever made the nodes instead
You can try and manually search for it in the manager maybe, also those nodes could be deprecated.
I check the git repository for commit updates and to see if anyone else has the issue for problem nodes
I found what the problem was: requirements.txt is blank, it contains nothing. But in the same folder I found repo_requirements.txt and used it. My comfy setup broke after that, so I decided to run @earnest grotto's script again. It worked, Comfy is back to life, but the ACE++ nodes are still not working...
The first error was No module named 'scepter', so I put only this module name inside requirements.txt and it installed everything, but with 1 error again:
transparent-background 1.3.3 requires albucore==0.0.16, but you have albucore 0.0.23 which is incompatible.
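When pip reports a conflict like this, it can help to check what is actually installed in the environment ComfyUI runs in. A minimal sketch using the standard library's `importlib.metadata`; the package list is only an example taken from errors in this chat, and `installed_version` is an illustrative helper name.

```python
from importlib import metadata

def installed_version(package):
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

# Run this with the same python that launches ComfyUI (the cenv / venv one).
for name in ["albucore", "transparent-background", "accelerate", "torch"]:
    print(f"{name}: {installed_version(name)}")
```

Running it from a different Python than the one ComfyUI uses gives misleading results, which is the same trap as installing requirements from the wrong environment.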
@earnest grotto need your help boss
repo_requirements.txt:
huggingface_hub
diffusers
transformers
torch>=2.4.1
xformers>=0.0.27.post2
gradio>=4.44.1
scepter
ms_swift
curious i'm running flux1 gguf q5 on a770, speed's at 8.2s/it, that normal?
Show your pytorch version, in comfyui's command prompt
Show your workflow
Launch arguments
using ur workflow,python main.py --disable-ipex-optimize ^
--lowvram --bf16-unet
torch is on 20250327 nightly
Seems about right, you can try the turbo lora and teacache/wavespeed
Also add --reserve-vram 6
No, doesn't seem right, I'm getting ~2x faster speed
WITH things going on in the background
4.6s for q4, 5.6 for q5_1
Show screenshots of the things I asked for
This is not using my script
Or you are not launching using the shortcut it makes, or something else
workflow looks exactly the same? what shortcut?
my script makes a shortcut, that I expect you to launch comfyui using
a shortcut that you can move around anywhere, put on your desktop, in your start menu, whatever
which you can not do with a batch script
or a python script
i run your blue balloon and got 7.3s
i see the script, it's using the same python ./main.py --bf16-unet --disable-ipex-optimize --lowvram
so no need to reserve vram?
There's more written in the batch file the script makes.
so i need to use ipex_to_cuda?
I git cloned it. Do I need to run any .py to enable it?
is it normal for a flux Dev Q8 GGUF image to take ages to generate on Ultra 9 285K with Arc A770?
You ran out of vram
#1193952640225267802 message
the gguf model is only 12gb
That's not the only thing that uses vram
Generate at a lower resolution or use one of the smaller quants
is this how i edit the text? could you please confirm this?
@shrewd plaza Install ComfyUI using my script instead ^.
Ipex isn't needed with torch nightly Anyway right?
On windows? I get about 6-7s/it with the q4 of flux, with teacache enabled
I got that speed on Windows too.
yes
i didn't have teacache in my workflow, now with it I have 3.5s with q5
https://www.reddit.com/r/StableDiffusion/comments/1jtvgyy/hidreami1_new_opensource_base_model/ new image model is supposedly good, not sure if it can run locally yet though.
nevermind, old
Looks like you need north of 24GB to run
Don't be too pessimistic
17B, at q4, 8.5gb, that should fit on an a770
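The arithmetic behind that estimate, as a small Python sketch: parameters times bits per weight, divided by 8 bits per byte. This only covers the weights themselves; real quant formats also store scales (so files come out somewhat larger), and the text encoders, VAE, and activations need memory on top. `approx_weight_size_gb` is an illustrative name.

```python
def approx_weight_size_gb(params_billion, bits_per_weight):
    """Rough size of the model weights alone, in GB (10^9 bytes)."""
    return params_billion * bits_per_weight / 8

# 17B parameters at different precisions.
for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"17B at {label}: ~{approx_weight_size_gb(17, bits):.1f} GB")
```

At 4 bits this gives the 8.5 GB figure above, and at 16 bits about 34 GB, which is where the "north of 24GB" impression for the unquantized model comes from.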
And even, it makes me wonder
@reef ivy Have you had success running the less quantized wan versions on your a750 with kijai's block offloading?
Also wow, how is sd3 medium that high in those benchmarks https://github.com/HiDream-ai/HiDream-I1
I run the fp8 scaled version in native. It offloads like it should and is fairly fast for 8gb imo. The non scaled ran as well but was much slower and used more than my ram and hit page file
If I get 64gb of ram it should probably be fine
the 14b wan?
Ipex_to_cuda doesn't work with sam2, just curious what custom nodes does this hijack work with?
show the stack trace
many of the upscalers don't seem to work with the hijack
As I understand it, there are some changes to IPEX in the latest pytorch XPU nightly 2.8 - is the hijack appropriate still?
I can do 480x832 for 49 frames 30 steps in like 8-10 minutes. With Teacache, compile etc. for a video model this powerful and 8gb non nvidia gpu I think it's pretty impressive.
might be faster on a770 with more vram
The hijacks are not ipex specific
there is no ipex for 2.8
Use the hijacks always because there are always people who hardcode .to('cuda') in their nodes
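To illustrate the idea only: a hijack of this kind intercepts device arguments and rewrites `cuda` to `xpu` before they reach PyTorch. The sketch below is not the code from ipex_to_cuda; it is a hypothetical, torch-free helper showing just the string remapping step that such a wrapper would apply.

```python
def remap_device(device, target="xpu"):
    """Rewrite 'cuda' or 'cuda:N' device strings to the target backend.

    Anything else (e.g. 'cpu', or a non-string device object) is passed through unchanged.
    """
    if isinstance(device, str) and (device == "cuda" or device.startswith("cuda:")):
        return target + device[len("cuda"):]
    return device

# What a node author hardcoded vs. what would reach the backend after remapping.
for requested in ["cuda", "cuda:0", "cpu"]:
    print(f"{requested!r} -> {remap_device(requested)!r}")
```

A real hijack would wrap functions such as `Tensor.to` and call a remapping step like this on their device argument, which is why it helps regardless of whether IPEX itself is installed.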
Do you have a workflow that you could share?
#1193952640225267802 message You need to add the model loader for fp8; these were with gguf. Also not sure if torch compile is added to these, as these are older runs. You can bypass enhance a video also.
Gguf is slower and inconsistent with speed but uses less resources
shocking, upscale image using model node simply sends my computer to poweroff crash...
odd, that specific model works perfectly fine for me
i've had some issues with some denoising models i think, but nothing like that
just came back from the 10th time power-off crash XD
ah, hm
"4x-UltraSharp" and "ESRGAN-UltraSharp-4x" should be the same thing
i tried 4xFaceupdat, 4x_nmkd-siax-200k ... all crash
Do you wanna try a different driver
i'm on 6632
resizable bar was the only other thing that would cause a poweroff crash, so I had it disabled...
what's your psu wattage and what specs
I think dan was having this exact issue
So there's a model named Hidream now
17b parameters, gonna wait for a gguf version
this your local gen or just an example pic
this is a huggingface gen
using a nf4 quantized version of the model
specifically the fast version
Evga 550ga
If I recall correctly, you had a b580
https://www.intel.com/content/www/us/en/products/sku/241598/intel-arc-b580-graphics/specifications.html
Minimum Power Supply Unit 600 W
Now thinking more on it, this is probably not the issue but it is something you should keep in mind
A770
And I ran 4x esrgan that's default in sdnext without issue
First try newer drivers, then try 6314 from here https://www.techpowerup.com/download/intel-graphics-drivers/
In general, keep your drivers up to date. I've often had issues that only happen on one specific driver then are fixed in the next one, which with intel's usual release schedule is after 1.5 weeks i think
6647 is old
6734 same crash. Now getting antique 6314
You probably want a downscaler. I think if you didn't put in a downscaler, it'll upscale to like 3000 x 3000 or more.
I think the b580 is only capable of 1.5x on 1024x1024 comfortably
1.7 pushes it to a UR error more often than not, and 2.0 gives me the power-off crash
The image will upscale to 15xx * 1000. These GAN upscalers are also extremely lightweight, can upscale a 2000*2000 image 4x without much hassle and in under 10 seconds
This is not upscaling using some diffusion model
It's not running out of vram
I haven't stared at gpu usage but I wonder if it could cause power draw to peak and shut down, especially if the cpu also gets involved
Given the official minimum recommendation for a b580 is 600w, let alone an a770 that needs more...
Guess it would depend on cpu but i doubt it's some ultra low power one
I just tested on ultrasharp 4x, input of 230x154 works without crash, but 269x179 would crash the system... this is much less than what you guys have.
when the 230x154's upscaled, GPU registered a 2 sec spike of 90W (2 sec is the resolution of hwinfo), not sure if this justifies the psu theory?
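For scale, a quick Python sketch of the output sizes involved, assuming a plain 4x model with no downscale step afterwards. `upscaled_size` is an illustrative helper, and the two inputs are the ones reported above.

```python
def upscaled_size(width, height, scale=4):
    """Output dimensions of an image after an integer-factor upscale."""
    return width * scale, height * scale

for w, h in [(230, 154), (269, 179)]:
    out_w, out_h = upscaled_size(w, h)
    print(f"{w}x{h} -> {out_w}x{out_h} ({out_w * out_h / 1e6:.2f} MP)")
```

Both outputs are well under one megapixel (920x616 and 1076x716), so the crashing case is tiny compared with the 2000x2000 inputs others upscale without trouble, which fits the point that this is not about running out of VRAM.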
what node do you use with 4xUltrasharp?
update: I hacked the upscale node so that it uses CPU and it can upscale to 3xxx*2xxx image no problem, although takes a whopping whole minute
You were using it correctly
What pytorch version
Nightly. Was using dev0330, just tried 0409 same thing
run furmark https://geeks3d.com/furmark/ and say if anything notable happens
ran furmark for an hour. no crash.
welp, guess it's (probably?) not a psu issue then
guys, why does any controlnet workflow end up with this error?
So the latest drivers add oneAPI to the install? I guess this makes it unnecessary to manually install the level zero stuff for torch.compile now?
99% sure that's not new, the new thing is the option to not install it. The base toolkit is separate and way bigger than the driver.
You don't want to untick it, but if someone really wants to not be able to use Blender's Cycles, do anything AI, probably do anything OpenCL-related or whatever else, they can untick it and save like a gigabyte or who knows
curious, anyone's using A770 and have no problem with ESRGAN upscale? may I ask what version your spandrel is ?
I have no problems yes
https://www.reddit.com/r/StableDiffusion/comments/1jx0xly/use_nightly_torchcompile_for_more_speedup_on_gguf/ I wonder does this benefit us any?
File "E:\Comfy_Intel\ComfyUI\execution.py", line 327, in execute
output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
File "E:\Comfy_Intel\ComfyUI\execution.py", line 202, in get_output_data
return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
File "E:\Comfy_Intel\ComfyUI\execution.py", line 174, in _map_node_over_list
process_inputs(input_dict, i)
File "E:\Comfy_Intel\ComfyUI\execution.py", line 163, in process_inputs
results.append(getattr(obj, func)(**inputs))
File "E:\Comfy_Intel\ComfyUI\custom_nodes\ComfyUI-Impact-Pack\modules\impact\impact_pack.py", line 1314, in doit
current_latent = upscaler.upscale_shape(step_info, current_latent, new_w, new_h, temp_prefix)
File "E:\Comfy_Intel\ComfyUI\custom_nodes\ComfyUI-Impact-Pack\modules\impact\core.py", line 1744, in upscale_shape
latent_upscale_on_pixel_space_with_model_shape2(samples, scale_method, self.upscale_model,
File "E:\Comfy_Intel\ComfyUI\custom_nodes\ComfyUI-Impact-Pack\modules\impact\core.py", line 1456, in latent_upscale_on_pixel_space_with_model_shape2
pixels = vae_decode(vae, samples, use_tile, hook, tile_size=tile_size, overlap=overlap)
File "E:\Comfy_Intel\ComfyUI\custom_nodes\ComfyUI-Impact-Pack\modules\impact\core.py", line 1386, in vae_decode
pixels = nodes.VAEDecode().decode(vae, samples)[0]
File "E:\Comfy_Intel\ComfyUI\nodes.py", line 287, in decode
images = vae.decode(samples["samples"])
File "E:\Comfy_Intel\ComfyUI\comfy\sd.py", line 524, in decode
samples = samples_in[x:x+batch_number].to(self.vae_dtype).to(self.device)
File "E:\Comfy_Intel\ComfyUI\comfy\ipex_to_cuda\hijacks.py", line 223, in Tensor_to
return original_Tensor_to(self, device, *args, **kwargs)
RuntimeError: Native API failed. (UR_RESULT_ERROR_UNKNOWN)
it's so annoying that the upscaler works one day and is broken the next
Did you update something?
No
Need help with an error
Intel ARC 530M
ComfyUI error
@azure lily Install comfyui using my script #1193952640225267802 message
@earnest grotto can u help me, how to fix it?
I simply changed it to fp32
Thank you so much vik, I finally got it up and running, but now I have a problem running comfyUI-florence2 node
What pytorch version
And show the stack trace
Likely you installed ipex 2.3? If so, update to the latest pytorch 2.8. If you need 2.3, you have to add some parameters to the comfyui management file for florence to work, iirc.
#1193952640225267802 message
My script does that, but only for some ipex/torch versions, so I wanna know which is the broken one
https://github.com/1038lab/ComfyUI-SparkTTS isn't working for me anymore.
I seem to be hitting a 6gb vram limit with this node
erroring out with dynamic_scaled_dot_product
The repo works, it's just not letting me generate more than a few words then OOMing at 6gb VRAM. It doesn't seem to be allocating any further than that
return original_scaled_dot_product_attention(query, key, value, attn_mask=attn_mask, dropout_p=dropout_p, is_causal=is_causal, **kwargs)
File "C:\Users\dbs_5\Comfy_Intel\ComfyUI\comfy\ipex_to_cuda\attention.py", line 116, in dynamic_scaled_dot_product_attention
hidden_states = original_scaled_dot_product_attention(query, key, value, attn_mask=attn_mask, dropout_p=dropout_p, is_causal=is_causal, **kwargs)
RuntimeError: Unknown exception
Any help would be appreciated, as I really do like messing with this model.
It's very effective at cloning for what it is.
I don't know what changed because I was able to use all 16gb of my vram no problem beforehand
in case I did anything wrong I had re-updated my install with the latest 0.1.7p script
same issue
probably one of the scripts within this node repo is the problem
Iirc there were two sets of comfy nodes for that and one I couldn't get to work for the life of me and the other worked but not great.
restart pc
same error
onednn_verbose,v1,common,error,ocl,Error during the build of OpenCL program. Build log:
,src\gpu\intel\ocl\ocl_gpu_engine.cpp:169
onednn_verbose,v1,primitive,error,ocl,errcode -42,CL_INVALID_BINARY,src\gpu\intel\ocl\ocl_gpu_engine.cpp:264,src\gpu\intel\ocl\ocl_gpu_engine.cpp:264
see if there's anything in %LocalAppData%\NEO\neo_compiler_cache and delete everything if there's anything
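If doing this by hand gets tedious, here is a cautious Python sketch that empties a cache folder while leaving the folder itself in place. The path in the commented usage is the one named above; `clear_directory` is a made-up helper, and ComfyUI should be closed before running anything like this.

```python
import shutil
from pathlib import Path

def clear_directory(cache_dir):
    """Delete everything inside cache_dir but keep the folder itself.

    Returns the number of top-level entries removed (0 if the folder does not exist).
    """
    cache_dir = Path(cache_dir)
    if not cache_dir.is_dir():
        return 0
    removed = 0
    for entry in cache_dir.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # remove a sub-folder and its contents
        else:
            entry.unlink()        # remove a single file or symlink
        removed += 1
    return removed

# Example usage on Windows (uncomment to run):
# import os
# cache = Path(os.environ["LOCALAPPDATA"]) / "NEO" / "neo_compiler_cache"
# print(f"removed {clear_directory(cache)} entries from {cache}")
```

The cache gets rebuilt on the next run, so the first generation afterwards will likely be slower while kernels recompile.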
if it still breaks, try a different driver
What was it, deleting the cache?
ah, listened to the thing
it seems that voicemeeter still crashes and blackscreens on 24h2, and thats whats been causing my blackscreens (i think)
That sucks though, because voicemeeter is what I constantly use.
I seem to be right, too. The moment I swapped to my hyperx as the default I instantly stopped blackscreening.
And right as I say that, it restarts itself...
guess it isnt that
God I hope windows doesn't end up forcing 24h2 on me.
It's really annoying to be honest...
I don't get what causes the blackscreen either. I stress-tested my PC for an hour and it was fine.
It's something to do with how ComfyUI and the games I play allocate memory I think.
I don't think it's only driver related either, as it's happened on earlier drivers too.
*With a fresh safe-mode DDU uninstall.
24h2 has many issues for some reason
Make the issue easily reproducible and report it over at IGCIT
did a windows reset, reinstalled my stuff
hasn't crashed since
probably something i did
welp that didnt even fix it either
Bot got angry that I posted too many fumo images at once and timed me out
So, single image
I'm not very happy with the results honestly
This is with TripoSG
Anyone try this on Arc yet?
I just downloaded the full gguf, will try in a bit... If I oom with that, I'll be trying dev tomorrow
Have you been monitoring anything from the gpu, temps etc. Did any mem check stuff? Might be something with your system, psu,gpu memory etc. Also might just be windows 24h2 tbh
I didn't oom but turns out the gguf versions of the text encoders don't want to load, so I guess I'll pick up tomorrow either way, with normal versions of those
It doesn't seem to be temps. I'm capable of running Mordhau at max temps no issue at all...
Even my CPU is only spiking at like 60C
Also Hidream Full Q8_0 runs fine with reserve-vram 8.0 at 9s/it
Is this HiDream?
Nice. Anything special or tricky about the workflow?
~38gb RAM peak, 14GB VRAM peak
very scientifically and objectively measured through staring at task manager with fingers on ctrl and c
3.5 minutes (~11.5s/it)
Wonder if it's jpeg artifacts it's trying to emulate 🤔
identical speeds when used with --reserve-vram 8.0 and 16.0
set to 16 instead
I'm not using any reserve-vram
i oomed without it
I didn't 🤷♂️
I'm starting to think my slicing mustn't be working right or something
then again im also getting 9s/it and not 11
🤷♂️
those probably would've gone down after a few runs since my 8.3 with dev also went down to 7.1
Not in ComfyUI quite yet but, this made me happy (best text I could squeeze out of flux)
https://github.com/Tencent/InstantCharacter
Any of you guys poked lanpaint? I guess I'll have to try it more myself, since I haven't had much luck but their results look good
You can use the following extension if you want to force block swapping for Hidream
B580? 14GB vram peak sounds dangerous
I'm using an A770 16GB. You can probably use --reserve-vram to cut down on vram usage, or there is the custom node njb posted right above you
that makes a bit more sense, guess i'll need to upgrade from 32GB ram before i can get started on f.lux. thank you for the advice
you probably will. but RAM is cheap. a shame VRAM isn't so easy to get more of
speak of the devil https://huggingface.co/OnomaAIResearch/Illustrious-Lumina-v0.03
The safetensors file is meant to only "contain the weights" - for comfyui-compatible format, we will try to prepare it as soon as possible.
🤔
Anybody looked at framepack? Not sure if it's open source yet to make work on xpu. I am a bit out of the loop as I haven't done much in the last week, which is like a decade in AI time lol. Also, LTXV 0.9.6 distilled was released and looks better and faster than the older models from what I've seen (still no hunyuan or wan though).
Anyone using LTXV 0.9.6? Compared to 0.9.5 it seems to produce artifacts, but maybe 0.9.6 requires something special I'm missing
LTX 0.9.6 model requires a long prompt to work properly.
I tried sample prompts. Just get a lot of artifacts compared to 0.9.5. But I only changed the checkpoint to 0.9.6 distilled in my 0.9.5 workflow.
So this is normal, version 0.9.6 does not have the same number of steps and the settings differ in the workflow
Ok I'll look for a 0.9.6 workflow to see what the settings should be.
I just moved from 0.9.1 to 0.9.5 and that's a huge difference.
All workflows have been updated to version 0.9.6 in the official repository. I think it's best to completely abandon previous versions and adopt version 0.9.6.
https://github.com/Lightricks/ComfyUI-LTXVideo/tree/master/assets
It would also be better to combine it with a VLM.
FramePack works fine in SDNext with SDNext FramePack extension: #1127742927347666964 message
i haven't tried it yet, but maybe it could be the vae you're using? Also probably make sure the cfg is at 1 since it's distilled. It also may require a new workflow, I haven't messed with it yet.
I've messed around with it a bit, the results are pretty decent. I'm running an A770 on torch nightly 2.8, and I have to use a node that clears the cache after every run for it to not fail. Nature related imagery has a lot of issues with watermarks being included. It is a promising model, with any luck we will see some LORAs.
after 32min... already with teacache
what model, quant, resolution, and frame count?
default wan2.1 template 512x512 33length q6
You have reserve-vram 7(or another number) in your launch parameters? For gguf it's necessary for decent speed.
I use wan2.1 fp8 scaled now, and it's much faster (quality is a bit lower than gguf though from my experience)
is that reserve-vram 7 to make sure not to overspill into shared gpu memory?
using q5 to keep it within vram, now within 10 min, but seems worse too
reserve-vram makes comfy tell pytorch to try to leave that much vram for the rest of the system. in practice it lowers vram usage and can help with performance when you're almost running out
Well technically it tries to reserve vram for the rest of the system, but for some reason it tends to speed up gguf models for arc users.
also you don't have to use 7 you can mess around with it, I found that 7 for my a750 is the most stable speed wise now for gguf models in recent drivers/pytorch.
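If you start comfy from a script, the flag can be added like this. A minimal sketch: `--reserve-vram` is the launch argument discussed above, while the `main.py` path and the default of 7 are assumptions to adjust for your own install.

```python
import sys

def comfy_launch_command(reserve_vram_gb=7.0, main_script="main.py"):
    """Build the argv list for starting ComfyUI with --reserve-vram.

    main_script is a placeholder path (assumption): point it at your ComfyUI checkout.
    """
    if reserve_vram_gb < 0:
        raise ValueError("reserve_vram_gb must be non-negative")
    return [sys.executable, main_script, "--reserve-vram", str(reserve_vram_gb)]

if __name__ == "__main__":
    # Only prints the command; hand the list to subprocess.run to actually launch.
    print(" ".join(comfy_launch_command(7.0)))
```

Try a few values and compare s/it, since the best number seems to differ per card and driver.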
try lowering your teacache values for quality, you can also use torch.compile now in nightly pytorch (you may still need some extra steps on windows)
can someone help with why flux controlnet is not working? i also can't use flux inpaint :((( really need those features
Show what's not working with inpainting
#1193952640225267802 message
That's not the official flux inpainting.
where i can get official?
line 17 of math.py
thanks man! you were right, after i used another workflow canny is working, trying to set up depth but can't find a depth workflow 😦 it's strange that comfy has a canny node by default but no depth one
also about inpainting, your workflow is working, i'm waiting till it finishes. is it ok that it's extremely slow? 15s/it
it didn't work, i received the same image in the end. i guess i'm missing something, and it should work faster than that
will i see the end of haruhi before i die 
canny is a very simple mathematical operation
depth needs a model to estimate it, and needs support for that model and so on
i haven't watched haruhi yet but it's the next thing on my watchlist after gun gale online
how long did it take?
my prompt is cat eyes, but it's not working
and it takes 771 seconds on my a770 16gb pytorch nightly, installed using your script
120 seconds with 30 steps
probably would've worked fine with 20 steps, and generally wavespeed's first block cache also works decently, so it can drop down to ~30-60 seconds
Do u have any ideas why it took 771 sec on my machine and why my prompt isn’t applied?
don't set the guidance crazy high
what gpu, what launch arguments for comfy, are the models stored on an ssd or hdd
ah, and your image is 2000*2000
the image I used is 1mp
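To put numbers on that: 2000*2000 is 4 megapixels, four times the 1mp image. Here is a rough sketch of picking a downscaled size before sampling. The helper name and the rounding to a multiple of 16 are assumptions for illustration, not something taken from the workflow posted here.

```python
import math

def fit_to_megapixels(width, height, target_mp=1.0, multiple=16):
    """Return (new_width, new_height) scaled down to roughly target_mp megapixels.

    Images already at or below the target are only snapped to the alignment.
    `multiple` is an assumed alignment; adjust it to whatever your model expects.
    """
    pixels = width * height
    # Never upscale: cap the scale factor at 1.0.
    scale = min(1.0, math.sqrt(target_mp * 1_000_000 / pixels))
    new_w = max(multiple, int(width * scale) // multiple * multiple)
    new_h = max(multiple, int(height * scale) // multiple * multiple)
    return new_w, new_h

if __name__ == "__main__":
    w, h = 2000, 2000
    print(f"{w}x{h} is {w * h / 1_000_000:.1f} MP")
    print("suggested size:", fit_to_megapixels(w, h))
```

Fewer pixels means a smaller latent, which is where both the time and the vram go.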
Here is a workflow that uses brushnet's cut for inpaint node and optionally kjnodes' color match
My script installs those if you chose to install the extra nodes
if flux is struggling to follow your prompt, set the cfg to 3 or so. keep in mind this will slow things down and if you're running out of vram, which you might also have been with a 4mp image, it will be even slower and you might crash
hence why, the cut for inpaint node
If flux doesn't make what you want even with cfg, then it might just be that flux can't do what you want it to
a770 16gb, ssd
yea, my image is 2000x2000
which node do u use for flux depth?
thank u!
i thought inpainting is faster and depends on the area that i'm trying to inpaint
inpainting happens in the latent you pass to the ksampler. if you pass the whole image then that's what will be used.
Sorry im dumb 😂, is it possible to optimize it to generate only the masked part and then paste it onto the existing image?
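The general idea behind cropping for inpaint can be sketched in a few lines: find the bounding box of the mask, sample only that crop, then paste the result back. This is a pure-Python illustration on nested lists with made-up helper names. It is not how the brushnet or any other ComfyUI node is actually implemented.

```python
def mask_bbox(mask, padding=0):
    """Return (left, top, right, bottom) around nonzero mask pixels.

    right/bottom are exclusive. Returns None when the mask is empty.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for x in range(width) if any(row[x] for row in mask)]
    if not rows or not cols:
        return None
    left = max(0, cols[0] - padding)
    top = max(0, rows[0] - padding)
    right = min(width, cols[-1] + 1 + padding)
    bottom = min(height, rows[-1] + 1 + padding)
    return left, top, right, bottom

def crop(image, box):
    """Cut the boxed region out of an image stored as a list of rows."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]

def paste(image, patch, box):
    """Return a copy of image with patch written at the box's top-left corner."""
    left, top, _, _ = box
    out = [list(row) for row in image]
    for dy, patch_row in enumerate(patch):
        out[top + dy][left:left + len(patch_row)] = patch_row
    return out
```

The sampler would run on `crop(image, box)` only, so the cost depends on the masked area plus padding rather than on the full image.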
Pytorch 2.7 + xpu is now supporting sdpa
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/xpu
getting this error with florence in comfy now. The operator 'aten::_conv_depthwise2d' is not currently implemented for the XPU device. Please open a feature on https://github.com/intel/torch-xpu-ops/issues. You can set the environment variable `PYTORCH_ENABLE_XPU_FALLBACK=1` to use the CPU implementation as a fallback for XPU unimplemented operators. WARNING: this will bring unexpected performance compared with running natively on XPU.
looks like kijai made it so florence loads to cpu first for some reason, gonna try and revert the florence nodes
nope, pytorch-xpu issue https://github.com/intel/torch-xpu-ops/issues/1576
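Until that issue is resolved, the fallback named in the error message can be switched on from Python. A small sketch: the variable name comes straight from the error text, but setting it before `import torch` is an assumption about when it gets read, and the helper name is made up.

```python
import os

def enable_xpu_cpu_fallback(env=None):
    """Set PYTORCH_ENABLE_XPU_FALLBACK=1 in the given mapping (os.environ by default)."""
    env = os.environ if env is None else env
    env["PYTORCH_ENABLE_XPU_FALLBACK"] = "1"
    return env

# Call this at the very top of the launcher, before anything imports torch.
# As the warning says, operators that fall back to CPU will run slower.
```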
Has anyone tested ReActor/InstantID or anything else that needs torchvision on an Intel Core Ultra 7 258V iGPU? I can't install those things!
in your comfyui env, do this pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/xpu --force-reinstall
It worked! Thank you!
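A quick way to confirm the reinstall took, run inside the comfyui env. This is only a sketch: the function name is made up, and it assumes a torch build recent enough to expose `torch.xpu`, which is why the lookup is guarded.

```python
def xpu_status():
    """Report torch/torchvision versions and whether an XPU device is visible."""
    status = {"torch": None, "torchvision": None, "xpu_available": False, "errors": []}
    try:
        import torch
    except Exception as exc:  # missing or broken install
        status["errors"].append(f"torch: {type(exc).__name__}: {exc}")
        return status
    status["torch"] = torch.__version__
    # torch.xpu only exists in builds with XPU support compiled in.
    xpu = getattr(torch, "xpu", None)
    if xpu is not None and hasattr(xpu, "is_available"):
        status["xpu_available"] = bool(xpu.is_available())
    try:
        import torchvision
        status["torchvision"] = torchvision.__version__
    except Exception as exc:
        status["errors"].append(f"torchvision: {type(exc).__name__}: {exc}")
    return status

if __name__ == "__main__":
    print(xpu_status())
```

If the torch version does not end in `+xpu`, pip most likely pulled a wheel from the default index instead of the xpu one.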
Better than torch 2.6 but still 30% slower than ipex 2.1.4 on Linux
I guess we should see what ipex 2.7 speed will be like
Also the bf16/fp16 reduction flag with math sdpa doesn't seem to do anything with intel
For context, sdxl at 1024x1024:
ipex 2.1.4: 2.0 it/s
torch 2.7 / 2.8: 1.4 it/s
openvino: 2.4 it/s
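The percentages fall out of those numbers directly. A tiny sketch with ipex 2.1.4 as the baseline, where the function name is just for illustration:

```python
def relative_speed(it_per_s, baseline_it_per_s):
    """Return the speed change versus the baseline in percent (negative means slower)."""
    return (it_per_s / baseline_it_per_s - 1.0) * 100.0

# The sdxl 1024x1024 figures quoted above, in it/s.
results = {"ipex 2.1.4": 2.0, "torch 2.7 / 2.8": 1.4, "openvino": 2.4}

if __name__ == "__main__":
    baseline = results["ipex 2.1.4"]
    for name, speed in results.items():
        print(f"{name}: {speed} it/s, {relative_speed(speed, baseline):+.0f}% vs ipex 2.1.4")
```

So torch 2.7 / 2.8 lands at about 30% slower than ipex 2.1.4, and openvino at about 20% faster.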
What sampler? Anything extra? Is this including text encoding every time? Because on Linux with 2.8, I'm getting 2.2it/s with euler and 2.15it/s with euler ancestral
Are you using dpm++ sde or any of those other such samplers?
also wow, hidream has really soaked in a bunch of very random anime characters, I wish it knew artstyles better instead
It knows saber, and it has very faint knowledge of megumin, nothing that looks similar at all, but mentioning her makes it generate a big pointy witch hat, long boots and a dark cloak
Euler
I'll try with sdnext later, but even on windows I was getting ~1.85it/s, but on comfy
I've had cases where some kernels inexplicably tank performance
I got up to 1.8 it/s by moving the model from GPU to CPU and then back to GPU at the start of every generation
This doesn't make sense
Also using non_blocking=True when moving any model kills the performance
It drops to 2 s/it (or 0.5 it/s)
It's supposed to make things faster when moving the text encoders from gpu to cpu, not slower
Any clue as to what caused the slowdown? just pytorch regression or something intel specific?
If things were compatible i'd still be on 2.1.4
this one is intel specific
Specific pytorch 2.7 fixes in latest windows driver for b series and core ultra related to torch.compile. no mention of alchemist
Make Dia available in ComfyUI: Yuan-ManX/ComfyUI-Dia on GitHub.
Will test later, but if someone else wants to beat me to it
it's a comfyui version of https://github.com/nari-labs/dia
claims to beat 11labs
we love claims
Nice, hope its not just more smoke
So, is the use-bf16 command arg still necessary? I notice that without it the quantization nodes actually work (although output is distorted), and normal gens seem fine so far (you can also select bf16 now in most model load nodes)
bf16 was needed because of a bug where iirc generation flat out didn't work due to a data type mismatch. you should prefer to use bf16 because it's ever so slightly faster. but otherwise it's whatever
Fp16 seems to give a black image (at least with wan and kijai nodes). bf16 and converting to fp8 work, but the conversion is distorted
so far looks like the black output issue is only with Kijai nodes, I have been able to quantize a few models so far with no issue.
IPEX 2.7
Still can't be imported with glibc 2.41
I was able to get it to work by using patchelf --clear-execstack $lib but pytorch 2.7 + ipex 2.7 is slower than just pytorch 2.7?
It drops the performance to 1 it/s
cant go past this point
One or more packages have failed to download: intel_extension_for_pytorch\s+2.5.10+bmg
Please run the script again and ensure your internet connection is working.
It installed correctly, everything is fine
I've updated the script to fix this, you can download the updated version
You should prefer to use the nightlies rather than 2.5, they have better performance, using them more now I haven't seen any notable issues
1 more frenda
Oops, I should've posted the frenda elsewhere. Oh well, here's fine too, since it doesn't have that creepy smile. I wonder how much synthetic data they used and if it might actually be better to use higher quality ai gens 🤔
If the model they release is actually this good, this will be pretty big
Since uh, it's only on their service for now 😛
Hi thanks for answer. I'm able to install 2.6 and nightly but not 2.5. No idea why.
In both higher versions I'm unable to use the impact nodes. Does anyone have a solution for this issue? Is it possible to use these nodes on Intel Arc? To be honest, without these nodes I'd rather return my B580 and buy something that works.
This error my script was giving you is superfluous. Everything installed fine, if you think 2.5 didn't install properly because it told you that, that's (most likely) wrong
I updated the script to fix that.
You can download the script again and replace the old one. You don't need to because it should work anyways.
Show what the specific errors are with the impact pack
Lol I can't get that to work. At all.
The load and run dia nodes don't even have matching widgets. The run dia model node is looking for a purple model widget but the load dia node only has a grey model widget
I went looking for other repos, the customdia repo doesn't work whatsoever and straight up just throws a bunch of tensor shape errors
Oh boy
I wanted to try it too. It had lower VRAM requirements than spark while promising better performance
Something tells me though by online reviews that it really isnt that good
they have a hf space if you want to assess quality without comfy
hf says i reached my usage limit
also the intro of this is just funny
"Ugh this coffee tastes like cardboard (moans)"
But the moan ends up sounding more like a birdcall
that was some pained screeching
well the repo does say it doesn't handle those well
That coffee musta had the worst aftertaste in the last millennium
ok dude put (off putting laughter) as one of the prompts
And the AI is like in the corner of the room in the clip yelling "OFF PUTTING LAUGHTER"
in the most sarcastic, cliche voice
oh i just realized it sounds like JERMA
LMAO
Ok im not gonna lie its hilarious
i kinda wanna mess with it
FantasyTalking: Realistic Talking Portrait Generation via Coherent Motion Synthesis
I think it's in kijai's nodes, but you have to edit his files to get wan to run on arc at all (change float64 to float32)
I'm getting the above error.
Prestartup times for custom nodes:
2.5 seconds: C:\Users\COMFY\anaconda3\Comfy_Intel\ComfyUI\custom_nodes\comfyui-manager
Checkpoint files will always be loaded safely.
ipex_init: (True, None)
Total VRAM 11874 MB, total RAM 32607 MB
pytorch version: 2.8.0.dev20250427+xpu
Set vram state to: LOW_VRAM
Device: xpu
Using pytorch attention
Python version: 3.10.16 | packaged by Anaconda, Inc. | (main, Dec 11 2024, 16:19:12) [MSC v.1929 64 bit (AMD64)]
ComfyUI version: 0.3.30
ComfyUI frontend version: 1.17.11
[Prompt Server] web root: C:\Users\COMFY\anaconda3\Comfy_Intel\cenv\lib\site-packages\comfyui_frontend_package\static
Loading: ComfyUI-Impact-Pack (V8.14.2)
[Impact Pack] Failed to import due to several dependencies are missing!!!!
Traceback (most recent call last):
File "C:\Users\COMFY\anaconda3\Comfy_Intel\ComfyUI\nodes.py", line 2128, in load_custom_node
module_spec.loader.exec_module(module)
File "<frozen importlib._bootstrap_external>", line 883, in exec_module
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "C:\Users\COMFY\anaconda3\Comfy_Intel\ComfyUI\custom_nodes\ComfyUI-Impact-Pack\__init__.py", line 46, in <module>
raise e
File "C:\Users\COMFY\anaconda3\Comfy_Intel\ComfyUI\custom_nodes\ComfyUI-Impact-Pack\__init__.py", line 28, in <module>
import cv2
File "C:\Users\COMFY\anaconda3\Comfy_Intel\cenv\lib\site-packages\cv2\__init__.py", line 181, in <module>
bootstrap()
File "C:\Users\COMFY\anaconda3\Comfy_Intel\cenv\lib\site-packages\cv2\__init__.py", line 153, in bootstrap
native_module = importlib.import_module("cv2")
File "C:\Users\COMFY\anaconda3\Comfy_Intel\cenv\lib\importlib\__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
ImportError: DLL load failed while importing cv2: The specified module could not be found.
What is the "above error"
probably auto deleted
Cannot import C:\Users\COMFY\anaconda3\Comfy_Intel\ComfyUI\custom_nodes\ComfyUI-Impact-Pack module for custom nodes: DLL load failed while importing cv2: The specified module could not be found.
Loading: ComfyUI-Impact-Subpack (V1.3.1)
Traceback (most recent call last):
File "C:\Users\COMFY\anaconda3\Comfy_Intel\ComfyUI\nodes.py", line 2128, in load_custom_node
module_spec.loader.exec_module(module)
File "<frozen importlib._bootstrap_external>", line 883, in exec_module
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "C:\Users\COMFY\anaconda3\Comfy_Intel\ComfyUI\custom_nodes\ComfyUI-Impact-Subpack\__init__.py", line 23, in <module>
imported_module = importlib.import_module(".modules.{}".format(module_name), name)
File "C:\Users\COMFY\anaconda3\Comfy_Intel\cenv\lib\importlib\__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 883, in exec_module
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "C:\Users\COMFY\anaconda3\Comfy_Intel\ComfyUI\custom_nodes\ComfyUI-Impact-Subpack\modules\subpack_nodes.py", line 3, in <module>
from . import subcore
File "C:\Users\COMFY\anaconda3\Comfy_Intel\ComfyUI\custom_nodes\ComfyUI-Impact-Subpack\modules\subcore.py", line 3, in <module>
import cv2
ImportError: DLL load failed while importing cv2: The specified module could not be found.
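Both tracebacks end at the same `import cv2`, so the problem can be reproduced outside ComfyUI. A small diagnostic sketch to run with the env's own python. The helper name is made up, and it only confirms whether the import fails; it does not say which DLL is missing.

```python
import importlib

def try_import(module_name):
    """Try to import a module; return (True, version_or_None) or (False, error_text)."""
    try:
        module = importlib.import_module(module_name)
    except Exception as exc:  # ImportError also covers "DLL load failed" on Windows
        return False, f"{type(exc).__name__}: {exc}"
    return True, getattr(module, "__version__", None)

if __name__ == "__main__":
    ok, info = try_import("cv2")
    print(f"cv2: {'OK' if ok else 'FAILED'} ({info})")
```

If this fails the same way, the issue is with the opencv install in that env rather than with the Impact Pack or with Arc.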