#ComfyUI for Intel Arc using IPEX
tried another one, that also brings nothing into comfyui, i don't get it
yeah, none of the json files bring anything in. not sure whats wrong
still getting this
try zooming out and scrolling. The workflows seem to be in a different place or hit reset view
yeah its not that
it's nowhere
do you have all 3 encoder files?
Also make sure you select everything and make sure it loads, sometimes the workflow has the wrong name for files or you have them in a different place or different name etc
I have no idea, you can try downloading the cogvideox nodes in manager first maybe. Everything loads fine for me. Also what browser are you using? Sometimes that can have issues with webui's. I am on firefox, and have used chrome/edge(though not with cogvideo yet)
Oh, and make sure comfy is up to date
I'm also on FF
It should be.
Curious. How are you running this? native linux install? Windows? WSL?
I am in native windows, but the 4gb issue is with arc in general.
The hijacks bypass it somehow (have to ask disty about that)
It's a hardware limitation
It (shouldn't be) the end of the world, it should always be possible to work around in software, but it's still not great
Split things up into smaller chunks
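A rough sketch of the "smaller chunks" idea, assuming the limit is 4 GiB per single allocation and that the work splits cleanly along a batch dimension. The names `MAX_ALLOC_BYTES` and `batch_slices` are made up for illustration and are not part of ComfyUI or IPEX:

```python
# Assumed per-allocation ceiling, taken from the "4gb issue" mentioned above.
MAX_ALLOC_BYTES = 4 * 1024**3


def batch_slices(num_items: int, bytes_per_item: int,
                 limit: int = MAX_ALLOC_BYTES) -> list[tuple[int, int]]:
    """Return (start, stop) index pairs so each slice stays at or below `limit` bytes.

    If a single item is already larger than the limit, it still gets its own
    slice; splitting along the batch dimension alone cannot fix that case.
    """
    if bytes_per_item <= 0:
        raise ValueError("bytes_per_item must be positive")
    per_chunk = max(1, limit // bytes_per_item)
    return [(start, min(start + per_chunk, num_items))
            for start in range(0, num_items, per_chunk)]
```

Each pair can then be used to slice the batch and process one piece at a time, e.g. `batch[start:stop]`.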
Do other workflow jsons work proper? How are you downloading and loading into the UI?
ah ok my bad
Yes, the flux one I saved works fine
did you install the cogvideoxwrapper nodes? you can do it in the manager, or git pull. I honestly have absolutely no clue why these would work different for you. Even if you don't have them installed they should show up red.
i'll try to install them
the json files still don't work
I have no clue, only thing with me is they aren't centered so you have to zoom out compared to other workflows. If comfy is updated I have no idea.
yeah, i zoomed out and everything
lets see if i can update comfy
nope, its latest version
Try and git pull in the comfy folder
try and drag and drop this image, it should give you the fun workflow for img2vid
will do, running an llm atm.
yup, this one worked. Told me im missing two nodes
Gotta generate an image to test it with
no doubt, glad it worked.
however im getting gifs... and if i pick a video format i get an error.
something about format
You need to install the requirements for the video nodes, i had the same issue and it wouldn't install automatically. You have to activate env and go to the folder and pip install -r requirements. I will give you the name in a minute
This affects all the video output, not just cogvideo
Start your venv manually, navigate to ...ComfyUI\custom_nodes\ComfyUI-VideoHelperSuite, and then run pip install -r requirements.txt. Make sure you're in that folder, you don't want to install the default comfy requirements
This should install the needed encoder in the venv, I tried updating the node through comfy but this was the only way it worked for me
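The same step can be sketched in Python, assuming it is run with the interpreter of the already-activated venv. The helper names are hypothetical, and the node folder path has to be adjusted to wherever your ComfyUI lives:

```python
import subprocess
import sys
from pathlib import Path


def pip_install_command(node_dir: Path) -> list[str]:
    """Build the pip command for a custom node's own requirements.txt."""
    requirements = node_dir / "requirements.txt"
    # sys.executable is the interpreter of the currently active venv, so the
    # packages land in the same environment ComfyUI runs from.
    return [sys.executable, "-m", "pip", "install", "-r", str(requirements)]


def install_node_requirements(node_dir: Path) -> None:
    """Run the install; raises if requirements.txt is missing or pip fails."""
    if not (node_dir / "requirements.txt").is_file():
        raise FileNotFoundError(f"no requirements.txt in {node_dir}")
    subprocess.run(pip_install_command(node_dir), check=True)
```

Pointing it at the node folder (not the ComfyUI root) is what keeps it from reinstalling the default comfy requirements.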
Hi guys, I just bought a laptop with an Intel Core Ultra 7 155H CPU and 32gb RAM. I just generated an image on the SD3.5 medium model with the example workflow provided by Stability AI and it took 1 hour and 12 minutes to generate! Is there a way to accelerate this to more reasonable waiting times (like 10 minutes)? I appreciate the help!
What you really need for any AI application nowadays is dedicated graphics cards. So I don't think there is much you can do
I was afraid this would be the answer 😂 Anyway thank you for quick reply 👍
I don't think 1 hour is quite right regardless 🤔
did you specifically install ipex or did you just clone comfyui and run it?
I installed ipex and then followed the instructions on the first post of this channel but got this error when trying to pip install -r requirements-ipex-ultra.txt. Any help is highly appreciated 🙏
What version of python are you using
3.11.0
you need 3.10
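A quick way to check this before installing anything, as a minimal sketch. The function name is made up, and the 3.10 requirement is just what the reply above states:

```python
import sys


def is_supported_python(version=sys.version_info) -> bool:
    """True only for Python 3.10.x, the version named in the reply above."""
    return tuple(version[:2]) == (3, 10)


if not is_supported_python():
    print(f"Python {sys.version_info.major}.{sys.version_info.minor} found, "
          "but the IPEX requirements discussed here expect 3.10")
```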
Hello, has anyone managed to get ComfyUI's Flux Train to work with a 770? I get the following error at the start of training: "RuntimeError: FP64 data type is unsupported on current platform."
Loading: ComfyUI-Inspire-Pack (V1.6)
Total VRAM 15931 MB, total RAM 32127 MB
pytorch version: 2.3.1+cxx11.abi
Set vram state to: NORMAL_VRAM
Device: xpu
Loading: ComfyUI-Manager (V2.51.9)
ComfyUI Revision: 2804 [cc9cf6d1] | Released on '2024-10-31'
Python main.py --auto-launch --bf16-unet --disable-ipex-optimize --fast with hijack
Only pytorch version that supports arc is 2.5 or ipex afaik
And fp64 isn't supported natively, have the same issue trying to use frame interpolation ATM.
I guess it's not a good idea to install pytorch 2.5 because it's much too recent?
Only issue I have heard is that it is slower
But it supports intel natively
I am still on ipex though, as speed is important to me lol
i am using comfyUi with stability matrix
and currently using ultra 7 258v with intel arc
Can anyone suggest ways to make better use of resources?
Found a workflow for extended videos in cogvideox-fun models, used the 5b-fun since 2b is kinda nonsense so far. Real low res but I may try a video upscale node. With fun models you can have a start and end image, not sure how to do it with the regular ones, and this workflow is crazy looking lol.
Here is workflow, you will get some errors but the answer is in one of the comments https://civitai.com/models/792021/cogvideox-image-to-video-with-video-extension-x7-low-vram
Honestly, pretty cool for a video on an a750 intel arc gpu that couldn't even generate images over a year ago. Progress is crazy.
Yes (not this specific trainer), however I couldn't get any loras to work, ones I trained or others
Has this changed with 2.5?
Cuz when I tested with earlier 2.5, it was still borked
You don't have the t5 model loaded, you have the vit L loaded twice
The cpu offload option gave me errors like it wasn't supported. You got the 5b gguf to work? They also gave me errors, I did just update maybe I will try again. Also haven't actually tried the gguf clip models are they faster with same precision?
So, no one else either can get flux loras working?
They work for me? I use gguf models though.
Regular (fp8) loras work on GGUF models?
Do you wanna test out a lora for me? Are you using 2.5?
No, I am on Ipex.
Here is a workflow I am building, it has a lora loader. I think it should show the lora I am using. https://civitai.com/models/876388/flux1-turbo-alpha
I can try the lora, but I am using Ipex
Also, is there a way to update comfyui without having to re-add the ipex hijacks each time?
I keep forgetting then get the 4gb error lol
My script can do that, but it will also re-create its conda environment every time too
otherwise, no, you gotta keep re-adding it
I tried a bunch of git stash commands and it "worked" but completely broke the file lol.
https://mega.nz/file/HMo0WIjB#RfUIoBHrv7YCCaWjHlNsQEeJnO0M2L_ERuJ-BnU2Wyo
https://mega.nz/file/eMBlGIIZ#sXLgBLG9RvFJ2I_tzhoU71Y5YSa10udOIrIJxfWJFvs
It's a character lora. Shouldn't matter which one. Example prompt, "Anime style drawing of Wann kneeling on a reflective floor"
just plain git stash is enough, no extra stuff, in case you want to actually save the changes you did (which you probably don't)
otherwise git restore ., then pull, then re-edit
Should lora be full strength?
Maybe issue with pytorch 2.5?
I had tried with both 2.5 and 2.3, and it was pure black
May be something fixed with windows drivers now, idk
actually, training might still not work on windows, I'll have to try and see
Oh it could be I am still on older drivers because of the vram issue, I am on 5971. But have you tried linux?
This was both on linux and windows. I only train on linux, I don't use wsl and training on windows natively hasn't worked for a while
actually i may have only tried linux, i don't remember anymore, i'll try again
Strange, it could be the driver I guess.
Nice! How many steps did you use? Just a heads up you can get decent quality with like 10 steps.
bf16 is the only one that works, and the fp8 converter thing doesn't work either so no speed boosts for us at all(at least none that worked for me). I need to try the cpu offload again, it also gave errors last I tried.
All, we're looking at ComfyUI as an optional backend to AI Playground.
With this we'd commit workflows that integrate well with AI Playground and provide value added features.
I'm checking if this community would be interested in submitting workflows for us to test and review for this.
If there's interest, I'll create a ComfyUI workflow thread here for shared workflows.
What about workflows that need specific custom nodes?
And/or potentially even models
Specifically:
Upscaling with upscale models (tons of those, different results), like realesrgan-x4plus, the nodes for this are in by default but no model; good for textures and anime
Upscaling with SUPIR, needs nodes and model, good for normal realistic images
Inpainting with powerpaint, needs nodes and model, uses a non-inpainting SD1.5 model, does object removal much better than regular inpainting models, can be better than regular models besides that but that stuck with me
Custom Nodes and models are Ok.
We'll probably look at creating a manifest for a workflow, everything needed for it, with a user controlled option to download and install.
The harder part is input types: images, masks, etc. If AI Playground already has the input then we can map it. If it doesn't, then that would be harder to implement
What would be cool is to have a toggle to view the nodes and edit etc. If you ever did music there is a program called reason where you can toggle and view everything like a hardware setup and mess with the connections then flip back to ui.
@reef ivy Do you use Linux/WSL and if yes, would you like to test out training a flux lora through comfyui on your 8gb gpu?
xpu-smi is telling me my vram usage is consistently below 8gb. It is only getting polled something like every 20 seconds but it certainly got me thinking
i kinda don't think there's a vram equivalent/argument for ulimit to test myself 🤔
I have wsl but haven't used it in almost a year, also no clue how to train
https://blog.comfy.org/mochi-1/ Mochi in comfy, there is a 9gb model under low ram solutions and a fp8 clip, someone ran it on a 3060 12gb.
We are excited to announce that ComfyUI now has optimized support for Genmo’s latest model, Mochi! This integration brings state-of-the-art video generation capabilities to the ComfyUI community, even if you're working with consumer-grade GPUs.
The weights and architecture for Mochi 1 (480P) are open and available, and Mochi 1
Seems like it will take at least an hour to generate at 30 steps, but seems to be working so far. Not sure I want to wait that long
it works
Takes like an hour on a750 with default settings. I ran it with fewer steps and lower res and it sped up a little, took like 17 minutes, but at lower res the output was bad.
make sure you update comfy
and use the workflow they provide, should be able to click and drag that photo
I'm not sure, I update manually and have an old script that doesn't have that option.
Just have to re-edit the file for hijacks each time
It does, just re-run it
Also updates the custom nodes it installs, if they're already installed
It doesn't update edited files, i should go fix that
If you get it working let me know how fast it goes, it's real slow on a750. And i think I read it's also slow on amd, but not sure what they were using.
so far fewer frames doesn't seem to speed up generation much if at all, and lower res only does to a point. So it's much different from cogvideo
might let it sit for an hour and see how good it is, but don't feel like it now
Are you running their workflow? or the one from mochiwrapper? I am using the one they posted
I am using the scaled version
if bias is not None: bias = bias.to('cpu')
o = torch._scaled_mm(inn.to('cpu'), w.to('cpu'), out_dtype=input.dtype, bias=bias, scale_a=scale_input, scale_b=scale_weight).to(inn.device)
there, quick dirty hack
might not work since with fp8 i know things might also not be implemented for cpu
I just changed this to 5 steps because it's so slow but here is the workflow, it is default 30.
you should also be able to drag that video I posted earlier after downloading it #1193952640225267802 message
that resolution is super low though lol
also, it's much faster now. Maybe I was having some issues when i tried, sometimes comfy goes slow until I restart it.
oom'd at vae, need to use tiled probably
If a piece of code is too long it auto deletes, probably for security reasons.
rip dan
it's short
banned
It deletes with like more than a couple lines
wtf, wow
I've posted longer that didn't get deleted
It's been deleting almost everything for me, if it's longer than like 2 lines maybe even one line sometimes
What he was posting was fairly short, and I think sometimes he posted my thing copypasted without the code block?
nah, i've had it happen for quite a while
Who should we contact to get him unbanned?
no idea who specifically
204342691964780546
IDK how i'd turn that into a mention, dammit discord
Wow.
Well that was resolved fast
They deleted all your posts too it seems?
normal for a ban
either way, replace the 2nd line you posted which does have bias, with
o = torch._scaled_mm(inn.to('cpu'), w.to('cpu'), out_dtype=input.dtype, bias=bias.to('cpu'), scale_a=scale_input, scale_b=scale_weight).to(inn.device)
and 4th one which doesn't have bias with
o = torch._scaled_mm(inn.to('cpu'), w.to('cpu'), out_dtype=input.dtype, scale_a=scale_input, scale_b=scale_weight).to(inn.device)
try again
might still not work, stuff is unimplemented for fp8
i wonder, do spambots post code that often? because I have not seen any in a different server I'm in that gets spambots fairly often
17.80it/s
on bf16
set to fp8 weights via the 'load diffusion model' node.
Because I remembered
If anyone gets banned in the future please just contact a mod, we can fix the issue immediately. sorry for auto mod.
wow, that is pretty fast tbh. I can't get the vae to work, tiled vae isn't working. Gonna try the mochi decode node
Do you know what's up with the code strings? Seems like it auto deletes and bans people now, should we just not post it like that anymore?
to my knowledge nothing was changed, let me reach out to the admins and double check things. Just keep posting now, if anything happens just ping any mod. I wanna make sure you guys have freedom to post in here
okay, thanks a lot. Appreciate it
can't get the vae to work, tiled vae gives errors and the mochiwrapper nodes refuse to install. Guess low res is all i can do
the vae decodes with the mochi node is causing me problems using it with the gguf variant
I cant get it to even install at all, will try another day.
If anyone else can get cogvideo 5b working that'd be cool.
I wish we could run mochi but that's 24gb vram
🤷♂️
Can get the fun models to work since I can lower the resolution and frames etc
5b regular can only run at a set res and frames
SAM loader (FaceDetailer) doesn't work on ipex 2.3.110 but works fine with 2.1.40, any fix?
For me, it was happening mostly when posting blocks of code
I haven't done that recently and I don't have any code to post right now, so, 🤷
If it happens again, I'll DM you the offending code
We changed limits in this channel only, it should help but still curious
@earnest grotto The latest IPEX version has issues with Florence2 and reading filepaths
the florence2node by kijai
Works fine on the previous version.
I'll show you the error.
It's probably some dependency ipex uses, but I don't know why.
How do i install comfy and llms using pytorch 2.5.1 instead of ipex?
Is it better than ipex?
IPEX hijacks transformers and that hijack fails
Replace import ipex with this:
try:
    import transformers # ipex hijacks transformers and makes it unable to load a model
    backup_get_class_from_dynamic_module = transformers.dynamic_module_utils.get_class_from_dynamic_module
    import intel_extension_for_pytorch as ipex
    ipex.llm.utils._get_class_from_dynamic_module = backup_get_class_from_dynamic_module
    transformers.dynamic_module_utils.get_class_from_dynamic_module = backup_get_class_from_dynamic_module
except Exception:
    pass
Where do you want me to put this, just in case I'm an idiot.
find the import intel_extension_for_pytorch in model management
What about ipex_to_cuda? That's the same location where that's imported too is it not?
after this: transformers.dynamic_module_utils.get_class_from_dynamic_module = backup_get_class_from_dynamic_module
try:
    import transformers # ipex hijacks transformers and makes it unable to load a model
    backup_get_class_from_dynamic_module = transformers.dynamic_module_utils.get_class_from_dynamic_module
    import intel_extension_for_pytorch as ipex
    ipex.llm.utils._get_class_from_dynamic_module = backup_get_class_from_dynamic_module
    transformers.dynamic_module_utils.get_class_from_dynamic_module = backup_get_class_from_dynamic_module
    from ipex_to_cuda import ipex_init
    ipex_init()
    xpu_available = True
except Exception:
    pass
yep
It doesn't have random corruptions like ipex but it is significantly slower
do any of these local video models produce decent results
the online stuff has been pretty disappointing to look at
show us your results
Mochi is the best but if you expect them to compete with the paid models then no. Mochi seems pretty close though, and runs on local affordable gpus so that is something.
are the hijacks still needed with it?
Yes
4 GB issue is a hardware issue
Alchemist is a 32 bit architecture
Is it still recommended to use IPEX as explained in the original post, or is there a better method to get ComfyUI to run by now?
@uncut bronze ^
I've made a python script, which you can just run and it will install ComfyUI with IPEX for you, apply Disty's hijacks, and optionally download some custom nodes or some models
Seems pretty good?
It actually does. I have Comfy running with Ipex already though. But I would love it for a second install to try the hijacks
I had some issues with bf16 and the fooocus nodes and with torch audio
Okay, I have no clue how to use Disty's hijacks. Where do I have to put the hijack? And do I need to run different requirements for them, or update pytorch or anything?
You git clone the hijacks repo in comfyui's comfy folder, find where intel_extension_for_pytorch is imported in model_management.py and edit that so it also does from ipex_to_cuda import ipex_init and ipex_init() right afterwards
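That edit can be sketched as a small text patch, which also helps when an update wipes the change. This assumes the import in model_management.py sits on its own line starting with `import intel_extension_for_pytorch`. The function name `add_hijack_lines` is made up, and the two inserted lines are the ones quoted in the message above:

```python
def add_hijack_lines(source: str) -> str:
    """Insert the ipex_to_cuda hijack call right after the IPEX import.

    Returns the text unchanged if it already contains ipex_init(), so it is
    safe to run again after every update.
    """
    if "ipex_init()" in source:
        return source
    patched = []
    for line in source.split("\n"):
        patched.append(line)
        stripped = line.lstrip()
        if stripped.startswith("import intel_extension_for_pytorch"):
            # Reuse the import line's indentation so the new lines stay
            # inside the same try block.
            indent = line[: len(line) - len(stripped)]
            patched.append(f"{indent}from ipex_to_cuda import ipex_init")
            patched.append(f"{indent}ipex_init()")
    return "\n".join(patched)
```

To apply it, read model_management.py with `Path.read_text()`, pass the text through the function, and write it back. Keep a backup copy first, since this is only a sketch.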
Thanks, Uff, with oneAPI installation and all thats a lot more complicated 😄
windows or linux
did you see this #1193952640225267802 message not sure if this should be the new edit or if it just works for that llm node.
Cool, seems like this will work with pytorch as well? If not using ipex
I've updated it. 2.5 might explicitly need some basekit component or whatever on windows, I'll see if there's some stuff floating around so it won't be needed like with 2.3
For now, the script will still only install 2.3
Where can i find the script
Do you have a script for linux
@devout tangle ^
Windows only for now.
Ok, i got it to work, thx. I have another question though: after like 4-5 images my 50gb of ram fills up with cache and the pc starts to hang and lag or even becomes completely unresponsive. How do I solve this problem?
restart pc
I think this is just a windows issue, though i thought this was fixed already
Its on linux
I didnt install out of tree gpu driver, just regular that came with distro
Kernel 6.11.7
anyone know a relatively painless fix for the numpy-problem?
post the error
probably you want numpy==1.26.4
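The error itself isn't shown here, so this is only a guess that "the numpy-problem" means NumPy 2.x being installed where 1.x is expected. Under that assumption, a minimal check (helper names are made up) could look like:

```python
from importlib import metadata


def is_numpy_2_or_newer(version: str) -> bool:
    """True when the major version number is 2 or higher."""
    major = version.split(".", 1)[0]
    return major.isdigit() and int(major) >= 2


def installed_numpy_version():
    """Installed NumPy version string, or None if NumPy is not installed."""
    try:
        return metadata.version("numpy")
    except metadata.PackageNotFoundError:
        return None


found = installed_numpy_version()
if found is not None and is_numpy_2_or_newer(found):
    print(f"numpy {found} is installed; the suggestion above is numpy==1.26.4")
```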
try 6.5, idk
I have to restart comfyui then? It takes all the ram
yes
It's not convenient, it must be a bug of some sort
Install tcmalloc and start the webui like this:
LD_PRELOAD=/usr/lib/libtcmalloc.so.4 rest_of_the_command
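If you start ComfyUI from a Python launcher instead of a shell, the same thing can be sketched like this. The library path is the one from the message above and varies by distro, and `env_with_tcmalloc` is a made-up helper name:

```python
import os
from pathlib import Path

# Path from the message above; check where your distro installs tcmalloc.
TCMALLOC_PATH = "/usr/lib/libtcmalloc.so.4"


def env_with_tcmalloc(base_env, lib_path: str = TCMALLOC_PATH) -> dict:
    """Copy of base_env with LD_PRELOAD pointing at tcmalloc, if the file exists."""
    env = dict(base_env)
    if Path(lib_path).is_file():
        existing = env.get("LD_PRELOAD")
        # LD_PRELOAD accepts a colon-separated list, so keep anything already set.
        env["LD_PRELOAD"] = f"{lib_path}:{existing}" if existing else lib_path
    return env


launch_env = env_with_tcmalloc(os.environ)
```

`launch_env` would then be passed as `env=` to `subprocess.run` together with the rest of the usual launch command.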
Suggestion; If possible, add pre-requisite part to the pinned script-post mentioning that you need to have anaconda/forge/miniconda installed. Just for clarity's sake
you have iGPU enabled?
Disabled.
Also, I'd like to mention that the flux toolkit loras (depth and canny) work on arc, but will not load if the main model is loaded in fp8 dtype with a separate lora. It results in a black screen.
tried it with --bf16-unet ?
...You are asking common-sense questions, Li.
Yes.
But it's always good to make sure.
🤷♂️
haha my bad.. been trying to help troubleshooting non-stop these days..
The flux fill fp8 model works great
so inpainting and outpainting on arc is no problem
I assume the main flux canny and depth models (non-lora) would work but nobody has converted them to FP8 yet
and I don't want to install two 23 gigabyte files lmao
GGUF works fine
It's just slower
But more accurate
higher precision than base fp8
So I made my own fp8 versions and they work
This script I found off of Kijai's stuff is nice to have.
for gguf models.. you can improve the speed by adding --reserve-vram 5.0 to comfyUI launch arg
Why would I do that on an a770?
That'd literally cripple the bandwidth I have.
🤷♂️
try and see
i know lowvram improved my speed, and running the t5 off the cpu could be even faster than shuffling it around
It will still use more than 5gb vram if it needs to (At least it seems to sometimes), it just makes the gguf models go faster. (on a750 anyway)
it attempts to leave that much vram to the rest of the system, not to reserve it for comfyui itself
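As plain arithmetic, and not ComfyUI's actual memory manager, the explanation above works out roughly like this. The function name is made up, and the 15931 MB figure is the "Total VRAM" from the A770 log earlier in the channel:

```python
def vram_left_for_comfy(total_mb: float, reserve_gb: float) -> float:
    """VRAM in MB that remains once reserve_gb is left to the rest of the system."""
    return max(0.0, total_mb - reserve_gb * 1024)


# A770 from the log above with --reserve-vram 5.0
print(vram_left_for_comfy(15931, 5.0))  # 10811.0
```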
yeah i can even get fp16 gguf model running with decent speed when launching with reserve vram 4.0
without it, it keeps swapping with DRAM for smaller chunks and taking much longer time
for NV users they seem to have similar memory management techniques. for example on a 4060 8GB you would see only 7GB out of 8GB is being used for Flux.1 Q8 running, and can notice the increase in system RAM usage while inferencing.
What's up with the larger than usual amount of mysteriously vanishing messages now
Nvidia added a cpu offload option to the drivers a while back. It can be toggled on/off in the drivers, and there's also a comfyui command for it, I think.
Larger than 4GB memory allocation error? that was a while back in ipex 2.0 era
That was me asking you about InstantIR
But I removed my messages since they were at like 2 AM for me
i mean messages in this chat, and others
saw something in digital-art, went to see, nothing
the rvc thread now had a notification, but nothing
This node errors out 🤷♂️ I want to use this over SUPIR since it's better overall at image restoration
use the lowest possible resolution
I have no idea
0.1 megapixel scale causes a tensor a and b mismatch due to it literally being too small
rip
Show the whole error, show a pip list
that's a pretty old ipex
intel_extension_for_pytorch @ https://github.com/Nuullll/intel-extension-for-pytorch/releases/download/v2.1.10%2Bxpu/intel_extension_for_pytorch-2.1.10+xpu-cp310-cp310-win_amd64.whl
that is what came from the requirements
should I remove all previous modules?
Put the script in some random folder. Run it. It will create everything for you.
Whatever you had previously won't matter.
that's the point yes
So I tried out GGUF models with reservevram enabled
They equal the speed of FP8 models
but are actually more accurate to fp16
🤷♂️
ltxvideo works with comfycore base nodes, but not nodes from ltxvideo-comfyui
Faster than flux too btw, at 2.5S/IT.
I got this error when running ComfyUI. I used a-One-fan's setup file. Can anyone give me some advice? Thanks so much!
!!! Exception during processing !!! The program was built for 1 devices
Build program log for 'Intel(R) Arc(TM) 140V GPU (16GB)':
-11 (PI_ERROR_BUILD_PROGRAM_FAILURE)
When installing, did it say it installed for an integrated GPU?
You can run it again, show what it says
You may need to disable your IGPU when using arc for AI, not 100% sure though there might be workarounds now.
The issue here is pretty likely that I didn't expect Intel to have a dedicated GPU called "140V"
And it probably installs the igpu version of ipex
which is also partially why i made it spit the name of the GPU and if it decided it's dedicated or integrated back to the user when installing
oh that's battlmag-- i mean xe2 mobile. Not sure what that runs on as far as ipex
wait, so it is integrated
That GPU does not have VRAM.
140V is the Core Ultra series 2 integrated GPU
need to replace the ipex wheels with
conda install libuv
python -m pip install numpy==1.26.4 torch==2.3.1.post0+cxx11.abi torchvision==0.18.1.post0+cxx11.abi torchaudio==2.3.1.post0+cxx11.abi intel-extension-for-pytorch==2.3.110.post0+xpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/lnl/us/
Do they have something special in the name to identify them, or is there a list of lnl igpus vs mtl ones, vs desktop arc
I get the GPU by basically looking at powershell's Get-WmiObject Win32_VideoController, which is fairly descriptive but I don't think gives info on generation and such
maybe it does but i'm on windows rn
You’re absolutely right. I’m running this on a laptop with Core Ultra 7 258V (32GB RAM) and no dGPU.
I installed IPEX 2.3.110 following Intel’s instructions. Then I ran pip install -r requirements.txt in ComfyUI directory.
Server initially reported missing modules like opencv-python, which I installed individually. After resolving those, everything worked perfectly without any errors.
I haven’t tried hijacks yet. I’m newbie and not sure what it can do.
a-One-fan installer detects my device as “possibly integrated GPU.”
I am the One fan in question
I can just hardcode a check for "140V" right now but i want to see if there's a better way to do it
Thank you. Looking forward to an update for a-One-fan to support the Core Ultra Series 2 from you.
there is nothing special in the naming.. the extra index url downloads the wheels compiled with lnl as the AOT target.. which is for the Core Ultra series 2 iGPUs
--extra-index-url https://pytorch-extension.intel.com/release-whl/stable/lnl/us <--- lnl/us at the end instead of xpu/us or mtl/us
multi-AOT for more devices should be WIP. once that's done then we can use the same wheel for different devices
technically it should still work if you use other AOT wheels, it would take a long time for the first image generation to compile kernels for that device.. tho
I mean in the name string of the GPU, at least the one that powershell command spits out, so i can identify which to download for
as i'm pretty sure that's not the only lunar lake igpu
oh
AFAIK,
- MTL = "Intel(R) Arc(TM) Graphics"
- LNL = "Intel(R) Arc(TM) 1**V GPU"
- ACM Arc = "Intel(R) Arc(TM) A*** Graphics"
maybe using regex to filter "Intel(R) Arc(TM) A" and " Intel(R) Arc(TM) 1** "
A60 uses the same wheel as A770
so if name starts with Intel(R) Arc(TM) A then download the A770 wheels???
Hmm, I guess I'll do that, thanks
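The name matching discussed above could be sketched like this. The three patterns come straight from the "AFAIK" list and the index paths from the lnl/us, xpu/us, mtl/us comment. The function name is made up and this is not the installer script's real code; names outside the three patterns simply return None:

```python
import re

INDEX_BASE = "https://pytorch-extension.intel.com/release-whl/stable"

# (wheel target, pattern for the GPU name string), checked in order.
_PATTERNS = [
    ("lnl", re.compile(r"^Intel\(R\) Arc\(TM\) 1\d{2}V GPU")),   # Lunar Lake iGPU
    ("xpu", re.compile(r"^Intel\(R\) Arc\(TM\) A\d")),           # Alchemist dGPU
    ("mtl", re.compile(r"^Intel\(R\) Arc\(TM\) Graphics$")),     # Meteor Lake iGPU
]


def wheel_index_for(gpu_name: str):
    """Return the extra index URL for a GPU name, or None if it is not recognised."""
    name = gpu_name.strip()
    for target, pattern in _PATTERNS:
        if pattern.search(name):
            return f"{INDEX_BASE}/{target}/us/"
    return None
```

With this, the "Intel(R) Arc(TM) 140V GPU (16GB)" string from the error log maps to the lnl index rather than the dedicated-GPU one.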
welp, that was odd, my SSD decided it should load a model for ~700 seconds
Oh well
@sly trench Download the script again. Should say 0.0.7p now. Run again. Should work now.
damn, from 8s/it to 14s/it, 2.5 man
damn... something in 2.3 causes the flux trainer i'm using to save zeroed out loras, but it works with 2.5
i'm turning into shadow the hedgehog here
Probably because of BF16's rounding to zero issue thanks to its lack of precision
And random IPEX corruptions don't help either
Cache the CLIP embeds on 2.5
Do not run CLIP on IPEX
Thank you. I will try the new script on LNL laptop tomorrow, please wait for me to report the result.
But now I tried with MTL laptop (Core Ultra 7 155H) and got this error. I also tried version 0.0.6 but still got the same error.
The script identified the device as a Meteor Lake iGPU.
And I have no dGPU
How big of an image did you try to make
512x512
Show the nodes
You mean this?
C:\Users\Lenovo\Comfy_Intel\ComfyUI\custom_nodes\websocket_image_save.py
C:\Users\Lenovo\Comfy_Intel\ComfyUI\custom_nodes\ComfyUI_TiledKSampler
C:\Users\Lenovo\Comfy_Intel\ComfyUI\custom_nodes\ComfyUI_IPAdapter_plus
C:\Users\Lenovo\Comfy_Intel\ComfyUI\custom_nodes\ComfyUI-GGUF
C:\Users\Lenovo\Comfy_Intel\ComfyUI\custom_nodes\ComfyUI_ExtraModels
C:\Users\Lenovo\Comfy_Intel\ComfyUI\custom_nodes\rgthree-comfy
C:\Users\Lenovo\Comfy_Intel\ComfyUI\custom_nodes\ComfyUI-KJNodes
C:\Users\Lenovo\Comfy_Intel\ComfyUI\custom_nodes\comfyui_controlnet_aux
C:\Users\Lenovo\Comfy_Intel\ComfyUI\custom_nodes\comfyui-inpaint-nodes
C:\Users\Lenovo\Comfy_Intel\ComfyUI\custom_nodes\comfyui-tooling-nodes
C:\Users\Lenovo\Comfy_Intel\ComfyUI\custom_nodes\ComfyUI-BrushNet
C:\Users\Lenovo\Comfy_Intel\ComfyUI\custom_nodes\ComfyUI-SUPIR
I'm using it as Local Server for Krita AI Diffusion.
This
I'm not using ComfyUI on webUI.
I'm using it as Local Server for Krita AI Diffusion plugin.
Yesterday when I ran it on my LNL laptop everything worked perfectly fine.
But I can't get it to run on the MTL laptop in any way.
Have you ever encountered this error? What can I do to fix it?
Thank you very much.
Yes. I will look into it in a bit, I'm on linux rn
I'll be waiting for good news from you. Thank you again for your help 😊
on some more testing, the lora is full of zeroes during training and remains so after backward-ing, and after whatever else
same embeds for 2.3 and 2.5, works with bf16 on 2.5
I remember now, the 2.5 build I had done didn't have whatever change they did with the attention to slow everything down, it was running at the same speed as 2.3
gonna try 2.6
welp, from 8 @ 2.3, to 14 @ 2.5, to 11.5 @ 2.6
I guess they improved a bit with 2.6
not quite 8 but oh well
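Putting those three numbers side by side, as a tiny sketch (the s/it values are the ones quoted above, nothing else is measured here):

```python
def slowdown(baseline_s_per_it: float, other_s_per_it: float) -> float:
    """How many times longer one iteration takes compared to the baseline."""
    return other_s_per_it / baseline_s_per_it


timings = {"2.3": 8.0, "2.5": 14.0, "2.6": 11.5}
for version, s_per_it in timings.items():
    ratio = slowdown(timings["2.3"], s_per_it)
    print(f"{version}: {s_per_it} s/it, {ratio:.2f}x the 2.3 time")
```

So 2.5 takes 1.75x as long per iteration as 2.3, and 2.6 brings that down to about 1.44x.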
Hi, who managed to get bitsandbytes working on the arc gpu?
what do you need bitsandbytes for
To improve training speed
in kohya
their installation instruction is not very clear https://huggingface.co/docs/bitsandbytes/main/en/installation?platform=Intel+CPU%2BGPU#multi-backend
They ask for v2.4.0+ (ipex) but xpu version is not available
I think it's safe to say no one has bitsandbytes working.
they've probably just copypasted the cpu ipex because they plan to support gpus as well, but haven't gotten that to work yet?
2.5 ipex looks to be in the works. if we assume the guide is actually true anyways you will likely need to build it yourself, which is gonna take a while.
You can then get the multi-backend bnb and test, but it says some things are not supported so I'd really expect whatever is needed for training to be among them
There are other ways to get faster training. What python version are you using? What model are you training? Windows or Linux?
I am using version 2.6.1dev+XPU
what model are you training?
windows? linux?
windows with https://github.com/cocktailpeanut/fluxgym (fork of kohya)
linux will be faster
I tried without much success on Linux ubuntu 24
how many s/it are you getting on windows and with what gpu
A770, 7 s/it
hmm, that's pretty fast
In fact I am also looking to train with a higher resolution.
I am limited to 512x512
why?
I think bitsandbytes can save me some memory
I need to enter images of width 512x1024
train*
I tried the --split option but it requires more than 64 gb ram
Probably a fluxgym issue
I think the best solution is to run simpletuner, another script with more optimizations
They use Quanto
But I couldn't get it to work.
If you build a version of pytorch 2.5 with xpu support from before they did whatever they did with slowing down attention, it'd probably be faster
i don't have such a build, and no guarantees, it's a gamble
I train the model split with a comfyUI wrapper for kohya. the newer version of it does eat ram like crazy but the older one doesn't
splitting the model will slow things down, so, if you want higher resolution you'd end up trading off speed even with 512x512
Ok, I will continue to experiment and give feedback.
Why does this happen? Conda is whitelisted in firewall, port is unused, it has admin perms, thx for the help in advance and sorry if this isn't the correct channel
You already have comfyui running
What changed with Pytorch 2.5 having XPU support? Is IPEX optional? ( I see pytorch detects an xpu device without it installed). It is unclear whether some ipex versions are tied to certain torch versions.
IPEX is not necessary
IPEX is always tied to a specific torch version
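To see which of these you actually have in an environment, something like this can help. It is only a sketch: it reports the torch version, whether torch sees an XPU device, and whether an IPEX package is installed, without importing IPEX itself. The function name and the dict keys are made up.

```python
import importlib.util
from importlib.metadata import PackageNotFoundError, version


def describe_xpu_stack() -> dict:
    """Report torch version, XPU visibility and IPEX presence (None if missing)."""
    info = {"torch": None, "ipex": None, "xpu_available": False}
    if importlib.util.find_spec("torch") is None:
        return info  # torch is not installed in this environment

    import torch

    info["torch"] = torch.__version__
    # torch.xpu is present on builds with native XPU support (2.5 and later).
    xpu = getattr(torch, "xpu", None)
    info["xpu_available"] = bool(xpu is not None and xpu.is_available())

    if importlib.util.find_spec("intel_extension_for_pytorch") is not None:
        try:
            info["ipex"] = version("intel_extension_for_pytorch")
        except PackageNotFoundError:
            info["ipex"] = "unknown"
    return info


print(describe_xpu_stack())
```

If `ipex` is set, its version should line up with the torch version, since each IPEX release is tied to a specific torch.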
Performance is much worse (Pytorch foundation issue)
Some bugs fixed?
2.6 looks to improve on that performance regression
thanks. For now I am just testing on a Meteor Lake iGPU, I am new to Arc. good to know IPEX is just for perf optimization.
No. The performance regression with 2.5 is for every GPU.
Intel or otherwise
Except supposedly, H100s?
Where is the iGPU max memory set? Is it BIOS? I see it OOMs with >4G requests even if it says total capacity is 28G
torch.OutOfMemoryError: XPU out of memory. Tried to allocate 5.59 GiB. GPU 0 has a total capacity of 28.66 GiB.
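The usual software workaround is to keep every single allocation under the limit by splitting the work. A minimal sketch of the arithmetic, assuming the limit is 4 GiB per allocation (the exact boundary is an assumption) and using made-up helper names:

```python
import math

GIB = 1024 ** 3


def chunks_needed(request_bytes: int, limit_bytes: int = 4 * GIB) -> int:
    """How many pieces a request must be split into to stay under the limit."""
    return max(1, math.ceil(request_bytes / limit_bytes))


def split_rows(n_rows: int, bytes_per_row: int, limit_bytes: int = 4 * GIB):
    """Return (start, end) row ranges whose size each fits under the limit."""
    rows_per_chunk = max(1, limit_bytes // bytes_per_row)
    return [
        (start, min(start + rows_per_chunk, n_rows))
        for start in range(0, n_rows, rows_per_chunk)
    ]


# The failed request from the error above: 5.59 GiB in one allocation.
print(chunks_needed(int(5.59 * GIB)))
```

The total capacity (28.66 GiB here) does not matter for this; only the size of each individual allocation does.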
Did you install ComfyUI with my script, which will apply disty's hijacks?
If yes, I have no idea why currently
no, I am just trying plain pytorch, just setting up to see the basics work at all
do you know if there are other discords/channels where such xpu stack related discussions are held (not necessarily in gen-ai context)?
I heard there is one, this was the first hit for intel-discord
no idea
Nope, when I change from wifi to SIM data it works, I don't know what could be causing it to not work on wifi
You seem like the more active mtl user here
Open ComfyUI/comfy/ipex_to_cuda/hijacks.py
Go to line 7, device_supports_fp64 = torch.xpu.has_fp64_dtype() if hasattr(torch.xpu, "has_fp64_dtype") else torch.xpu.get_device_properties("xpu").has_fp64
below it, add
print(f"\n\n\nfp64: {device_supports_fp64}\n\n\n")
run comfyui, you will see a lot of black space and in the middle of it "fp64: True" or "fp64: False", show which it is
OR
alternatively you can activate the conda environment, go into python and do what that does
preferably you'd do the latter as I'd probably want to try out a few other commands
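For the second option, this is roughly what to paste into python inside the activated conda environment. It runs the same check as line 7 of hijacks.py, with guards added so it returns None instead of crashing when torch or the XPU device is missing. The function name is made up.

```python
import importlib.util


def xpu_supports_fp64():
    """Return True/False for fp64 support, or None if no XPU device is usable."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    # On IPEX builds, torch.xpu is only fully set up after importing IPEX.
    if importlib.util.find_spec("intel_extension_for_pytorch") is not None:
        try:
            import intel_extension_for_pytorch  # noqa: F401
        except Exception:
            pass

    xpu = getattr(torch, "xpu", None)
    if xpu is None or not xpu.is_available():
        return None
    if hasattr(xpu, "has_fp64_dtype"):
        return bool(xpu.has_fp64_dtype())
    return bool(xpu.get_device_properties("xpu").has_fp64)


print(f"fp64: {xpu_supports_fp64()}")
```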
There are these too:
If you still want to use attention slicing, use the IPEX_FORCE_ATTENTION_SLICE=1 env var
Xe2 should support 64 bit
So it shouldn't have 4GB issues and FP64 issues anymore
ah, this is making me realize
with the fp64 emulation that's shaping up, we're probably gonna reach a point where alchemist can do the fp64 data type but can't allocate more than 4gb?
Probably
Pytorch 2.5's FP64 emulation causes exactly this (manually enabled)
So have to use force attention slicing env var as well
I'll probably make it set that
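Until then it can be set by hand. The variable has to be in the environment before ComfyUI starts, so one option is a small launcher. This is a sketch: the variable name is the one mentioned above, while the helper name and the `ComfyUI` working directory in the commented-out launch line are assumptions.

```python
import os


def env_with_forced_slicing(base_env=None) -> dict:
    """Copy an environment and force attention slicing on in the copy."""
    env = dict(os.environ if base_env is None else base_env)
    env["IPEX_FORCE_ATTENTION_SLICE"] = "1"
    return env


env = env_with_forced_slicing()
print(env["IPEX_FORCE_ATTENTION_SLICE"])

# To actually launch ComfyUI with it (path is hypothetical, adjust to your install):
# import subprocess, sys
# subprocess.run([sys.executable, "main.py"], env=env, cwd="ComfyUI")
```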
I was using ComfyUI without any issues and generated several images. After stopping the server, I tried to restart it, but this happened.😭
Something with your environment is broken
@sly trench
Open ComfyUI/comfy/ipex_to_cuda/hijacks.py
Go to line 7, which is device_supports_fp64 = torch.xpu.has_fp64_dtype() if hasattr(torch.xpu, "has_fp64_dtype") else torch.xpu.get_device_properties("xpu").has_fp64
make a new line below it, add
print(f"\n\n\nfp64: {device_supports_fp64}\n\n\n")
run comfyui, you will see a lot of blank space and in the middle of it "fp64: True" or "fp64: False", show which it is
then undo the new line and text you added
The issue has been resolved by uninstalling and reinstalling PyTorch 😄 👍
just realized the non-comfyui vanilla nodes for ltxvideo work in arc, just not with --lowvram enabled.
having --reserve-vram 4.0 does just entirely mitigate the issue
and using https://github.com/SeanScripts/ComfyUI-Unload-Model in case certain things don't unload is good too
Comfy should go to lowvram automatically if needed, I think it's just slower than enabling it by default. Maybe --reserve-vram speeds up the process so you don't need to enable it manually.
I want to update my drivers but I don't think the vram issue is fixed yet. The --reserve-vram option helps in comfy but I'm not sure there are alternatives for other applications.
LTX in comfyui, 50 steps with a default prompt/workflow. There are some tricks to getting more movement and better video. It's super fast, 2.77s/it or thereabouts. There are some quants but they ran much slower for me (probably due to --reserve-vram)
This was img2video, generated the image with flux with the default prompt.
might try my darth vader and compare to cogvideox lol
https://github.com/sandner-art/ai-research/tree/main/LTXV-Video used motion fix workflow, lowered the resolution to 768x512(recommended by devs).
Tips for better output https://www.reddit.com/r/StableDiffusion/comments/1h26okm/ltxvideo_tips_for_optimal_outputs_summary/
Hi @earnest grotto . It showed fp64:True
Additionally, your 0.0.7 version working fine on LNL laptop
This is with meteor lake, right?
Yes. it showed fp64:True on MTL laptop
I have both MTL and LNL
MTL has native FP64 support
datatype sure, but apparently it can't allocate more than 4gb
That's because it doesn't support int64
ah
seems like an odd choice of what to support
man, hopefully i don't get some really weird 700 second long load time again
nice, I didn't
wonder why my ssd decided to do that that one time
@sly trench Ok, I've updated the script, download it again, put it where you put it last time and run it again, it updates when you do that, faster than reinstalling. Regardless of that, it should work now
I wonder if it would be possible to get the 4gb fix committed to comfy? Its annoying to redo it each update tbh
It's probably best for a workaround like that to be included on intel's side, as a part of pytorch now
Especially now that some fp64 emulation has made it to 2.5
Could having multiple conda environments in windows slow stuff down even if they aren't both running? Edit: seems I had to reinstall libuv/ipex files into the conda env, even the quant models are faster now.
Introducing the Intel Arc B-Series GPU family, offering high performance for modern gaming at 1080p and 1440p resolutions, complete with the latest AI upscaling and ray tracing capabilities.
Increase responsiveness and more with Intel Arc gaming technologies, with Intel Xe Super Sampling (XeSS) technology boosting visual quality and performan...
Ever wanted to explore the latest generative AI tech but were intimidated by how to get started? Intel makes it easy no matter what level of expertise you’re at. From beginner to enthusiast, learn how Intel enables anyone to use generative AI through Intel’s own AI Playground, from text-to-image creation to customizable chatbots.
Join Bob Duff...
honestly, very cool. I hope we will be able to import our own workflows and edit them in comfy. (seems like it from the demo).
latest cogvideox updates broke all my workflows for it. No clue how to fix it
finally figured out what was wrong, but now all outputs are garbage. If anybody tries out cogvideox again let me know if you can get decent output.
that’s the goal
There is something majorly wrong with the latest ipex in windows in terms of speed, its often 5x slower but sometimes its the same speed. I realized I had installed an older ipex which was why it was fast, but its not compatible with florence 2 so I upgraded and now speed is ridiculously slow half the time. Is this issue known? I know i reported it many months ago, i could update drivers but haven't since there is a memory issue in the latest ones.
for instance, same prompt and settings for ltx video. last ipex version it's 2.55s/it all the time
ipex, or pytorch 2.5
ipex, latest version 2.3xx
I am on older drivers, but it was a problem months ago as well. There is a memory issue for a750 now in latest drivers so haven't updated.
also, ipex hasn't updated since then either
frustrating since some new stuff like the florence VLM won't work on the older version
there is a 2.5 ipex in progress
if you're using my script, I poked into adding 2.5 and with it 2.1.40 but gave up somewhat halfway through when I couldn't get 2.5 working on windows without the basekit, I can fully add 2.1.40 as an option
I hear pytorch is still slower, probably about what I am getting in 2.3
I could try it and see I guess
yeah, now it's back to 11s/it lol it's weird
2.1.4 is fast, but it isn't compatible with all the newer stuff
Now it's back to 2.44s/it during the same generation? it's like it's changing randomly.
2.5 is slower and also has bugfixes
e.g. if you want stable cascade, that works on 2.5
doesn't on 2.3
2.6 is somewhat faster than 2.5, haven't tested too much
i think it's still slower than 2.3, just not almost-2x-slower
I may try 2.6 out, I think ipex 2.3 is just bugged for a750 in windows. I may also try new drivers, but last time it wouldn't even run flux without oom.
I can get the whls off github? Don't think i've ever installed straight pytorch tbh lol
of course, since it's nightlies, it's probably possible that you get one where someone did a whoopsie and everything is completely broken
lol, yeah. Appreciate it I will give it a shot. What version is the preview build on? 2.5?
yeah 2.5 is the preview
for the time being though, you will need the basekit (or rather, a part of it?) to run them, as per the link above in that article explaining requirements
https://www.intel.com/content/www/us/en/developer/articles/tool/pytorch-prerequisites-for-intel-gpu/2-5.html
(Scroll down for windows)
These prerequisites let you compile and build PyTorch 2.5 on Linux systems with optimizations for Intel® GPUs.
maybe that's this
most likely
it could also be driver/windows issue.. I noticed that when shared GPU memory is being used by > 200MB.. it drops perf by like 4x.. but on comfy you could try adding --reserve-vram 6.0
if you open task manager, look at bottom of the GPU tab where it says “Shared GPU Memory”, whenever that is being used >0.2-0.3GB, perf drops..
comfyui and disty's hijacks are made to work with xpu support in 2.5
you should move init_ipex above any ipex importing, still in the try-catch block
I'd assume the transformers load issue isn't present in 2.5 since there's no ipex
For Florence 2
I have it set to 4.0 but I could try 6.0 and see if it helps.
couldn't get either version of pytorch to work at all, installed one api, called environment kept getting different errors. The procedure entry point is the main one.
So far 6.0 seems to be more stable, it is a little slower but consistent now. Thank you
But ipex2.3 is a regression from 2.1.4, in windows with comfyui on a750 anyway
Im seeing something similar. When running loading and unloading models that take you to the edge of memory.
It might be a memory leak and that driver fix. Remember that driver fix that fixed SDXL in A750. It looks like that driver starts to swap memory when under max memory. This shared memory mode interrupts and slows compute.
Getting to max memory seems to happen as you load and unload models. 10-20% of memory never clears and causes a memory bottleneck
So far, I've only experienced it with 2.3 ipex, 2.1.4 doesn't seem to suffer from it. At least for ComfyUI, I've found that the sweet spot seems to be --reserve-vram 5.0, this keeps most of the speed and stays consistent. At 4.0 it goes randomly slow and faster etc. At 6.0 the speed decreases more with no other benefit. I have only tested LTX video and Flux so far though. Maybe the memory fix they are working on will fix it
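One way to compare settings like these less by feel is to note the s/it of a few runs per setting and look at the worst case and the spread, not just the average. A small sketch; the 2.44, 2.55 and 11 s/it figures are ones mentioned in this chat, but grouping them into "consistent" and "unstable" runs is purely illustrative, and the function name is made up.

```python
from statistics import mean, pstdev


def summarize(samples):
    """Mean, worst case and relative spread of seconds-per-iteration samples."""
    avg = mean(samples)
    return {
        "mean": avg,
        "worst": max(samples),
        "spread": pstdev(samples) / avg,  # 0.0 means perfectly consistent
    }


consistent = [2.44, 2.55]
unstable = [2.44, 11.0, 2.55]

print(summarize(consistent))
print(summarize(unstable))
```

A setting with a slightly higher mean but a small spread is usually the nicer one to live with.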
Can Ipex and IpexLLM be run from the same Env? or would they conflict with each other? going to install Ollama to run LLMs in comfy but not sure if I should run it in the same Env or not.
Also is there any issue with running multiple conda environments? Would resources get stuck or slow down etc.
Currently AI Playground is running IPEX and IPEX-LLM in the same environment. We are finding that adding in additional Frameworks and APIs (ie Llama.cpp) may require we have separate environment or at least server instance as oneAPI DLLs may be inconsistent across these projects
Okay thanks. I was thinking that is what ai playground was doing. But wasn't sure. Also seems ipexllm wants 3.11 while ipex wants 3.10
Gonna try and just install ollama in its own environment and see how it goes with comfy, probably be easier when ai playground integrates comfy
I believe AI Playground is on 3.10.11 @honest hull can check me on that
we are on 3.11
Hello everyone, I recently purchased an A770 graphics card, and I want to run comfyui on it. I followed the pinned instructions to install comfyui and run it, but many newer features are not available. I found that Intel® Extension for PyTorch* v2.3.110+xpu was released a few months ago, and the requirements.txt in the pinned tutorial is somewhat outdated. Is there an updated version of the requirements?
@lucid lily Use this script ^
There are multiple pins for different ways to install, Vik's script is the ideal way, especially for an a770. The other way is still there because of issues with speed on a750 and under cards in windows. Only thing that won't work with the older ipex that I found is Florence2 from my (limited) testing.
@earnest grotto Thank you. I used the script you provided for the installation, and everything went relatively smoothly. However, after completing the installation, when I launch ComfyUI using the shortcut, I receive the following error:
Error loading "J:\Comfy_Intel\cenv\lib\site-packages\torch\lib\torch_xpu_ops_aten.dll" or one of its dependencies.
I checked and found that this file is not problematic, but I'm not sure why it failed to load successfully.
@earnest grotto Um... I tried reinstalling from scratch, and this time it started without any errors. It's working now, very strange.😅
I installed AI playground through official website. Everything's work as expected
Can triton work on intel, and on intel in windows? https://purz.notion.site/Get-Windows-Triton-working-for-Mochi-6a0c055e21c84cfba7f1dd628e624e97 Also, the Tencent video model apparently works on 8gb now and runs faster with sageattention, but I think you need triton for it.
Triton works on Intel with PyTorch 2.6
Triton itself doesn't work on Windows
Seems they got it working for windows in that post? But it needs cuda stuff
Anybody on a770 tried the hunyuan model?
Me. I can't get it to work effectively at all, and at lower resolutions gives me tensor errors 🤷
LTXVideo works with PAG.
They don't want people in europe using their models so I'm not trying out another okish image generator 😔
hunyuan isn't an ok-ish model tho
it's an actually competitive one
and it can run on a 3090
that's ok-ish to me
i think the images the current best models like flux generate, are ok-ish
nevermind that each has its own drawbacks
do you not tinker with realism or guidance models
there are ways to pull out details
like uh...
this looks to be doing in effect what can be done with flux.1 tools
well fluxtapoz has PAG for flux
and SEG (Smoothed Energy Guidance)
they can help improve prompt adherence, combined with stuff like perpnegguider
which is designed to help further follow prompting
This isn't magic that will make things way better
rescaling cfg is already in comfy by default, think others might be too
They are just nodes that make it easier to manipulate. More UI friendly.
After all, ComfyUI is down to the bones a visual scripting language of sorts.
Using that with flux is going to kill generation times, unless you're using schnell but then you're kinda defeating your own point, schnell is worse
I'm kinda getting tired of bad fingers at this point honestly
Yes, it's slower.
I kinda wish we just had better models.
LTXVideo 0.9 is great for how small it is 🤷♂️
furthermore, in stacking a lot of the almost-placebo improvements like PAG, SAG, perpneg, AYS and so on, I've sometimes had cases where the model starts failing to denoise
I've not gotten blackscreened images from that, but for some reason I do from the lora nodes when used with flux.
Including LoraLoaderModelOnly.
Not black. Just images with some leftover noise in them
Oh.
in spots
🤷♂️
I don't seem to have that issue
This was one I did yesterday with all of the nodes
this is pixelwave flux btw
that was with SDXL, maybe it's less likely with flux but I'm hesitant to use flux much due to how slow it is and how biased it is towards real life photos with strong depth of field
I'd say you don't need flux for this
but, i don't want abstract shapes or eldritch non-people
SDXL and most of its finetunes still suffer from non-zero snr-ness, and that doesn't seem to be the case for anything after it, at least not to that extent, disty said SD3.5 (don't remember which?) wasn't actually trained with zero snr
it was one of the big things SAI were touting with the original SD3
I like the SDXL finetunes. But then, there's no SD3.5 finetunes, and I don't think flux finetuning has progressed much
nvidia sana looks nice for how fast it is
if only it worked properly on arc
because it kept giving me terrible outputs with the wip node
It is not supposed to work properly on anything other than nvidia because nvidia explicitly want to forbid that
in their license
Well someone is doing it
So I gotta hope they get it fully operational
thats the node i tried
Nvidia users are getting it down to 10, supposedly even 8
Ideally, I'd want a good 3d model generator and then I can do as I please with the models, but that's still far off
For image generators, I want
- Coherent, symmetric, 5-fingered 5-toed humans (none do this)
And by extension, other coherent things too. Guns? Tanks? Most of my attempts with controlnet, like putting in the effort to pose an openpose skeleton, just made me facepalm, finetune couldn't do crossed legs properly and even with controlnet just fused them together
- Action (can any do 2 people fighting? I think no)
- Styles and some reference sheet for them
most anime finetunes do this indirectly. PonyXL author wanted to be malicious about this. Base Flux struggles. SD3.5 looks to do styles, but I don't want to be finding out what it can or can't do
In general, I'd like to see the actual captions used for 100-1000 random varied images in these models' datasets. Surely that's not too much to ask for?
- Zero snr or something close enough (this means the model can generate very dark or bright images)
- A fast inpainting model that's better than SD1.5
among other things
SD1.5 still feels like the best for inpainting
I haven't tried the new Flux controlnets
So, it might tick that box of not failing to follow them
I tried it
As a reference, the 2nd image is from a playstation 2 game.
I'd say you'll get what you want within the next year or so. Although if it stays open by then who knows
3d modelling will obsolete itself if we can use an inpainter to Draw a base model image t-pose
it looks better than previous 3d model generators, i'll give it that, it is good to see those are actually improving
But yeah, it's ps2-level... ish. Can be worse.
A lot of these will revolutionize in 2025.
Kijai updated cog video nodes and now almost nothing works right anymore. Been messing with ltx and can get decent results sometimes.
I'm not gonna dunk on it too much since the hf space errored out when trying to do textures over 1024^2 or simplify less than 0.95, but...
LTX 0.9 btw, theyre gonna release the 1.0
which hopefully fixes the long-prompting issue
which they know
Yeah, hopefully they train human movement into that one. Also the need to add noise to make the image worse so it animates
but ltxvideo is already OP for video-to-video
I haven't tried video to video yet
darn
The video models look decent but I'm still not a big fan of the fact that they're all fundamentally limited to short clips
With that in mind, those recent ai generated ads from coca cola and such get even more obvious
why hand not green >:(
i dunno but its pretty close
probably the denoise value
the other two nodes in ltxtricks
are cool
one of them essentially does what an application you might know in the past did
image+video to video looks really good from the example. Although it was pretty much a perfect one to one drawing
a youtuber got famous for it
lol i cant remember that software
Ah, I remember.
Ebsynth.
from 2019 lol
ah, i don't know that
or is that the thing that joel haver uses https://www.youtube.com/watch?v=c6MW-qdNoYA
it is the thing joel haver uses
its rotoscoping + ebsynth
he says so in the comments of the vid i posted
That was an amazing video tbh, I only know of ebsynth from a1111 extension, never tried it though. I think animatediff replaced it
if he posted that today, he'd get flamed to all hell lol.
aaron
use the pag node from ltxtricks
it slows it down a bit but greatly helps consistency
Todays world sucks
The stg nodes? Yeah, I use them. I was able to get them to work with img2video as well. They help consistency but also slow down movement (at least with img2video). Longer steps also slow down movement, not sure if that is only with stg nodes though.
People use the Detail Daemon nodes as well, but for img2video I could not get them to work properly.
I need to try some more txt2video stuff. Need to setup my llm to prompt for it though.
Running florence2 to ollama to ltx feels really cool tbh. The vram issue pops up sometimes with ipex 2.3 though
I use wd-14 tagger's largest model
with underscores removed
fed that into api llm
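The tag cleanup step is simple enough to sketch. This assumes the tagger hands back a list of underscore-style tags; the function name and the example tags are made up.

```python
def tags_to_prompt(tags):
    """Turn tagger output like 'blue_sky' into a comma-separated prompt string."""
    cleaned = [tag.strip().replace("_", " ") for tag in tags if tag.strip()]
    return ", ".join(cleaned)


print(tags_to_prompt(["outdoors", "blue_sky", " mountain_range "]))
```

The resulting string is what would get passed on to the LLM as the image description.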
thats what I use to img2img usually, that and combine it with flux depth/redux
I need to check that out.
On another note cogvideox fun models just don't seem to work anymore with the new workflow for img2img. Either just black output or misty noise that vaguely looks like the image. Gguf still doesn't work for cogvideo either
https://www.reddit.com/r/StableDiffusion/comments/1h9d9xy/svdquant_now_has_comfyui_support/ new 4bit quant support in comfy apparently, not sure if it runs on intel or not
has anyone tried to install sd3.5 large and succeeded, if so i need help
Hi Bob and Community. 1st: Thanks to Intel for all the effort they put into establishing themselves as an additional GPU developer and vendor; I also own an Arc A770. So my question, directly to Bob:
Why does Intel spend so little effort on getting all those nice tools running smoothly on Linux too, since all those big models are trained on it on the big five's side?
It takes a huge amount of effort to get this GPU running ComfyUI, since there are far fewer instructions and hints for setting it up on Debian than there are for, e.g., Windows.
And please, no offense, but the Python version in those Intel tools is still 3.10, which is somewhat outdated relative to the latest Debian releases.
So you always have to look for older pip wheels and such to download just to get it more or less working.
Will there be somewhat better instructions for getting this running?
Regards.
Depends on what you count as success. I can get the turbo model to work right, but the dev model outputs trash half the time. Apparently something to do with Ipex, should work if you use pytorch (but obviously that is still slower for now).
You can use pytorch 2.5 or 2.6 natively now, but there is a speed issue that hasn't been fixed yet. (not just with intel). They probably haven't updated ipex since it is included natively now.
Many things do not work with the latest python version, including some webuis themselves, this is normal
It's assumed that if you're on linux, you can figure things out yourself, otherwise you wouldn't be using linux
You can get older python from the deadsnakes ppa, or use conda and make a conda environment with an older python
https://intel.github.io/intel-extension-for-pytorch/index.html#installation?platform=gpu&version=v2.3.110%2Bxpu&os=linux%2Fwsl2&package=conda
You'll likely want 2.3 for the best performance ^
Note that the pip instructions here are broken, at the very least there is no conda prefix with pip
OR, you can get 2.6 for more compatibility, less bugs https://pytorch.org/docs/stable/notes/get_start_xpu.html (nightly)
Everything works fine with Python 3.11, it is just that the UIs are very stubborn in staying with Python 3.10
The only issue is Python 3.12
Not many things support 3.12 yet
many projects still use numpy < 2.0 too.. above pip installs would install latest numpy and it might also break projects
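A quick pre-flight check for the two gotchas above (Python 3.12 and numpy 2.x) could look like this. It is a sketch with made-up names, and the thresholds only encode what was said in this chat, not any official support matrix.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def compatibility_warnings(python_version, numpy_version=None):
    """Return human-readable warnings for a (major, minor) Python and numpy version."""
    warnings = []
    if tuple(python_version[:2]) >= (3, 12):
        warnings.append("Python 3.12+: not many things support it yet")
    if numpy_version is not None and int(numpy_version.split(".")[0]) >= 2:
        warnings.append("numpy >= 2.0: many projects still need numpy < 2.0")
    return warnings


def installed_numpy_version():
    """Version string of the installed numpy, or None if it is not installed."""
    try:
        return version("numpy")
    except PackageNotFoundError:
        return None


print(compatibility_warnings(sys.version_info, installed_numpy_version()))
```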
@formal tusk If you're still struggling with getting ComfyUI running, here's a linux version of the pinned script https://raw.githubusercontent.com/a-One-Fan/ComfyUI-Intel-Installer-Script/refs/heads/other_one/Setup_ComfyUI_Intel.py
Though I can't guarantee that it will just work
Hi @earnest grotto . I've reinstalled your 0.0.8 version on my MTL laptop.
But this error occurs. Any ideas?
Now that the B580 is released I have a question about it, does it also have a 4GB memory block allocation limit?
I don't think so, but someone will have to test. Xe2 shouldn't need it
IPEX 2.5 is listed on the intel repo
Seems like the US mirror is giving access denied error on every ipex version
CN mirror works but very slow
nice, will try it out and see if it works. Maybe coincide with b580 launch
This is the speed i am getting from the CN mirror rn (I have 1000 Mbps download)
Haven't seen DSL in a long time lol
IPEX 2.5 is a little bit slower but completely acceptable for the accuracy improvements
How much slower? on windows 2.3 is already slower than 2.1.4 for me, so it might be faster lol
I am using a custom model arch
Went from 2.6 s/it to 2.8 s/it
But now i am able to run CLIP on the GPU without corruptions
Yeah, that's not bad at all. I am downloading now. I guess there is no need for a oneapi update?
Nice, so stable 3.5 should work now
I couldn't get it to pip install, so made a requirements file. Might not work then. Windows typically have to install oneapi afaik.
so far seems way slower in windows, but I do have a new driver update waiting which could improve things. Also issues with memory in latest drivers so could be faster on older one lol.
flux went from about 5 or 6 s/it at 1024 to 12.9s/it, and ltx from 2s/it to 82s/it lol.
gonna update the drivers and try again, then if that is worse still might try older driver, then go back to 2.3 or 2.1 if it's still terrible
requirements files are just stuff to install with pip, laid out in a file so you don't have to type them out
source control is more convenient, whatever else
I tried to copy the pip install from the other one and input the new links but it wouldn't work for me, might have needed the entire url for each file like I did in the requirements file though.
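For reference, a requirements file can carry the same options as the pip command line (`--pre`, `--index-url`) followed by one requirement or full wheel URL per line. A sketch that writes one out; the function name is made up and the wheel URLs are placeholders, not real download links.

```python
def render_requirements(packages, index_url=None, pre=False) -> str:
    """Build the text of a requirements file: options first, then one entry per line."""
    lines = []
    if pre:
        lines.append("--pre")
    if index_url:
        lines.append(f"--index-url {index_url}")
    lines.extend(packages)
    return "\n".join(lines) + "\n"


# Placeholder URLs: replace with the full URL of each wheel you want.
text = render_requirements(
    [
        "https://example.invalid/wheels/torch-placeholder.whl",
        "https://example.invalid/wheels/torchvision-placeholder.whl",
    ]
)
print(text)

# Write it out, then install with: pip install -r requirements.txt
# with open("requirements.txt", "w", encoding="utf-8") as handle:
#     handle.write(text)
```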
ipex 2.5 is unusably slow in windows, at least with comfy ui
are you using ipex 2.5 or pytorch 2.5?
ipex
My guess it's a compounding issue from 2.3 and current drivers with a750 memory allocation.
2.3 is slower than 2.1.4 but it can be mitigated with reserve-vram 6.0, nothing seems to help with ipex 2.5
can't view vram usage anymore with the new arc control thing, so maybe it's running on cpu or something? seemed too fast for that with flux though.
use xpu-smi to view vram usages
I am on windows😭
works on windows 😉
Ohh didn't know that, thanks!!
xpu-smi.exe dump to view the metrics available.. most of them are available but some might report N/A as the tool is developed for Data Center GPUs
xpu-smi.exe dump -m 0,18 -d 0 should show you the GPU utilization as well as memory used in MB unit
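If you want those numbers in a script, the dump output can be parsed as comma-separated text. This is a sketch under assumptions: the command flags are the ones from the message above, but the column names and the sample line are made up to show the shape, so check them against what your xpu-smi actually prints.

```python
import csv
import io


def xpu_smi_command(device=0, metrics=(0, 18)):
    """Build the argument list for the dump command shown above."""
    return ["xpu-smi", "dump", "-m", ",".join(str(m) for m in metrics), "-d", str(device)]


def parse_dump(text):
    """Parse comma-separated dump output into a list of dicts keyed by header."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    rows = [[cell.strip() for cell in row] for row in reader if row]
    if not rows:
        return []
    header, data = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in data]


# Made-up sample in the assumed format (header line, then one line per sample).
sample = (
    "Timestamp, DeviceId, GPU Utilization (%), GPU Memory Used (MiB)\n"
    "12:00:00.000,    0, 97.50, 6144.00\n"
)

for row in parse_dump(sample):
    print(row)
```

Since the dump keeps printing until it is stopped, capture a few lines of its output to a file or pipe and feed that text to `parse_dump`.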
same on windows.. starting with ipex 2.3, installation of the oneAPI base toolkit is no longer needed
when you pip install ipex it also installs the oneAPI dependencies. for ipex 2.5.110 it should be installing dpcpp-cpp-rt==2025.0.4 mkl-dpcpp==2025.0.4 etc
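To confirm pip really pulled those runtime packages into the environment, you can ask the package metadata directly. A sketch with a made-up function name; the package names are the ones listed above, plus the IPEX package itself.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


print(installed_versions(["intel-extension-for-pytorch", "dpcpp-cpp-rt", "mkl-dpcpp"]))
```

Anything that comes back as None was not installed into this environment.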
It worked it was just like 100% slower in everything for me in windows. Probably a750 related
2.1.4 is the most stable and fastest but it is no longer compatible with a lot of new stuff
still on 2.3 myself
trying to get comfy_extramodels sana to work
I don't seem to be getting the greatest results.
Maybe there's a problem.
I've tried it at multiple CFGs, on euler simple
Haven't tried that, but It's geared specifically for nvidia all together so maybe that's why. could also be the clip issue that 2.5 might fix
Someone please give me the bat file for installing comfy ui
@primal hatch
the script doesn't work with the b580
The official guide for 2.3 also seems to have switched to china? https://pytorch-extension.intel.com/installation?platform=gpu&version=v2.3.110%2Bxpu&os=linux%2Fwsl2&package=pip
I don't know if installing ipex for 2.3/pytorch 2.6 manually would work on battlemage at all at the moment
it might indeed be too new
ahhh so wait?
I'll edit the script to install the same thing for battlemage as it does for alchemist, but you'll have to find out if that works or not, and I lean towards no
yeah
Also I guess I'll have to check if it can be downloaded from the us at all anymore or they've just moved to china since my script downloads from the US
I'll be uh, waiting on a windows update in the meantime though
So don't expect that in 10 minutes
okie
Saw that too
I hope US will be back up soon
CN connections are slow even if it works properly
@earnest grotto wait the ps1 file works
That's from when I decided to be very generous with detecting the GPU
That will install alchemist stuff
ohhh.....
is there even new stuff for battlemage?
I don't know
Discord crashed due to windows updating
???
I assume that battlemage support will need to be added to pytorch, and IPEX, and I assume that it will not be retroactively added for 2.3 which is not as laggy as 2.5 or 2.6
okie
@silk umbra @sly trench Updated
nice
You can just download the new version and replace the old one with it. Run it again, it will not have the same error again but I'd still expect it to not work
ye did it
If it doesn't work, you wanna follow some of my directions instead after that?
It's just loading
IT WORKS
Did you generate an image?
ok downloading flux rn
Where do you get the whl for ipex 2.5.10?
Actually, should I even bother going from pytorch 2.3
well, since the us one is down, it's probably worth taking a look into the cn one to see if there's anything new
and it has been down for a while now
i can't get it to gen
What does the command prompt say
Failed to validate prompt for output 9:
- CheckpointLoaderSimple 4:
- Value not in list: ckpt_name: 'Flux\flux1-schnell-fp8.safetensors' not in []
Output will be ignored
invalid prompt: {'type': 'prompt_outputs_failed_validation', 'message': 'Prompt outputs failed validation', 'details': '', 'extra_info': {}}
- Value not in list: ckpt_name: 'Flux\flux1-schnell-fp8.safetensors' not in []
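The `not in []` part means the loader's list of checkpoints is empty, i.e. it found no model files where it looked. A quick way to see what is actually on disk is to list the folder yourself. This is a sketch: the function name is made up, and the path in the example call is the usual ComfyUI location, which may differ for your install or for SwarmUI.

```python
from pathlib import Path


def list_checkpoints(models_dir, extensions=(".safetensors", ".ckpt")):
    """Return checkpoint files under models_dir as sorted relative paths."""
    root = Path(models_dir)
    if not root.is_dir():
        return []
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in extensions
    )


# Typical ComfyUI location (adjust to your install):
print(list_checkpoints("ComfyUI/models/checkpoints"))
```

If this prints an empty list, the workflow's `Flux\flux1-schnell-fp8.safetensors` has nothing to match against, whatever the UI.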
im using swarmui
Do it in comfyui then
Bottom of this post
Download the image. Drag-and-drop it onto comfyui in the browser
oh ok
Did you download flux with my script or one of the various all-in-one variants
off the web
Yeah that's what was expected
Your swarmui issue is unrelated
I'd bet in a week's time support will be here
okie
you download all the packages? Torch, Torchaudio/vision etc? It 100% worked for me in windows, however it was so slow it was pretty much useless.
Feel like speed and stability has regressed from each version after 2.1.4 tbh
could also be a skill issue on my part lol
It isn't just the torch+xpu packages I tried
I wanted to use it in conjunction with ipex 2.5.10
but idek if that works
also uh
phi-4 released
Yeah, I did it with ipex 2.5. I just found the links for each and copied them to a requirements.txt file. Ipex likely still needs the special version of each
I wonder how true those benchmark results are tbh.
from user feedback seems on par with qwen2.5 14b
when you realize you just had to restart your pc when properly re-installing ipex-llm ollama
now it workin like a charm with phi-4
Will they release a 7b model?
🤷♂️ No clue yet. This model's brand new.
There is a new video2audio model, but apparently only works on 2.5.1 pytorch
What version of torch+xpu is compatible with ipex 2.5.10?
If there was any real reason to swap to it, that is...
I used the one for windows here https://pytorch-extension.intel.com/release-whl/stable/xpu/cn/torch/ For me no reason to swap because it was tooooo slow. Although you may need it for new stuff like that video2audio model I posted. Not sure if the ipex 2.5 is better than the regular torch2.5 that already has xpu natively though. I could never get the regular torch to work for me
Yep, far slower indeed.
Ipex didn't seem to help much either, I guess everything's still in its testing phase
imma keep with 2.3
nothing seems better atm
Sneak peek coming to AI Playground 2.0 with ComfyUI workflows and support for Flux.1
Use the torch from ipex
waiting for this 😍😍
Intel should make some comfy nodes for llms, then I won't have to run it through ollama if using it in a workflow with ai playground
could just use ipex llm ollama to run it with sycl
ok well 2.5.1 works on the a770 le no problem it's just half speed like aaron said
rip
pytorch 2.6 is faster while still having fixes that came with 2.5
if that is what you're looking for
otherwise can just stick with 2.3 for now
I do but i'd like to run it all in the same env without needing to load up a server. It also causes some issues with vram for me as ipex2.3 has some speed and memory issues with a750 sometimes.
I could not for the life of me get it to work with comfyui in windows. I called the oneapi files and it loaded but then errored out. Only ipex seems to work for me.
I may start up another wsl2 environment and see if I can get more speed in windows.
you shouldn't need to call oneapi vars for upstream Pytorch 2.6
just pip install torch torchvision with the extra url that points to pytorch.org
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/xpu
I will give it one more try with this install command, thanks.
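Before pointing comfy at a new install, it may help to confirm that the env's torch can see the GPU at all. A small sketch, assuming upstream torch exposes `torch.xpu` as it does in recent builds; `describe_xpu` is a hypothetical helper name, not part of torch or ComfyUI:

```python
import importlib


def describe_xpu():
    """Report whether this Python env has a torch build that can see an XPU."""
    info = {
        "torch_installed": False,
        "torch_version": None,
        "xpu_available": False,
        "device_count": 0,
    }
    try:
        torch = importlib.import_module("torch")
    except Exception:
        # ImportError, or a library load failure from a broken install.
        return info
    info["torch_installed"] = True
    info["torch_version"] = str(torch.__version__)
    xpu = getattr(torch, "xpu", None)  # older builds may not ship torch.xpu
    if xpu is not None and xpu.is_available():
        info["xpu_available"] = True
        info["device_count"] = xpu.device_count()
    return info


if __name__ == "__main__":
    print(describe_xpu())
```

If `xpu_available` is False here, comfy will not find the device either, so the install is the thing to fix first.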
would recommend testing it with a simple standalone script first and a fresh python env..
for example
- create env and pip install torch torchvision using above link
- pip install diffusers transformers accelerate
- run below code snippet
from diffusers import AutoPipelineForText2Image, DEISMultistepScheduler
import torch
pipe = AutoPipelineForText2Image.from_pretrained('lykon/dreamshaper-8', torch_dtype=torch.float16, variant="fp16")
pipe.scheduler = DEISMultistepScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("xpu")
prompt = "portrait photo of muscular bearded guy in a worn mech suit, light bokeh, intricate, steel metal, elegant, sharp focus, soft lighting, vibrant colors"
generator = torch.manual_seed(33)
image = pipe(prompt, generator=generator, num_inference_steps=25).images[0]
image.save("./image.png")
also add this to ComfyUI/comfy/model_management.py if you are using pytorch 2.6 nightly. it improves performance by about 1.5x .. The perf issue with upstream Pytorch seemed to be not yet fixed in 2.6 nightly either (happens on other vendors too)
do the ipex hijacks still work in windows or am i doing something wrong? I get this error with latent upscale instead of an oom:
Current platform can NOT allocate memory block with size larger than 4GB! Tried to allocate 4.25 GiB (GPU 0; 7.75 GiB total capacity; 4.19 GiB already allocated; 4.35 GiB reserved in total by PyTorch)
this is what I have now with the florence fix as well
try:
    import transformers  # ipex hijacks transformers and makes it unable to load a model
    backup_get_class_from_dynamic_module = transformers.dynamic_module_utils.get_class_from_dynamic_module
    import intel_extension_for_pytorch as ipex
    ipex.llm.utils._get_class_from_dynamic_module = backup_get_class_from_dynamic_module
    transformers.dynamic_module_utils.get_class_from_dynamic_module = backup_get_class_from_dynamic_module
    from ipex_to_cuda import ipex_init
    ipex_init()
    xpu_available = True
except Exception:
    pass
If you are using ipex 2.5, update the hijacks too
Some nodes of ComfyUI need fp64, but my arc a770 doesn't have it. Any solutions?
@lucid lily ^
Or you edit model_management.py as shown right above you
and you clone disty's hijacks into the comfy folder
add here?
You find the try-catch block that does import intel_extension_for_pytorch as ipex and you replace it with what you see above
I think the except might've had something else that isn't pass but not sure
@lucid lily Please link the exact custom node and copypaste the full stack trace, and link any models if they need to be separately downloaded
I assume we just add the command disty mentions there to model_management.py?
probably after ipex_init()?
I want to know too, seems odd if that's for a different backend
wouldnt you just import ipex_init() and thats it
then that command ig
```
try:
    from ipex_to_cuda import ipex_init
    ipex_init()
    torch.backends.cuda.allow_fp16_bf16_reduction_math_sdp(True)
    _ = torch.xpu.device_count()
    xpu_available = torch.xpu.is_available()
except Exception:
    pass
```
@earnest grotto any instructions?🥲
These are the instructions
Do this
well im on 2.6.0
i wanted to see if 2.6.0 would fix my issue with sana in comfyui outputting improper outputs
Same workflow on arc
on both 2.3.0 and 2.6.0
also torch 2.6 with that command makes the performance equal to 2.3
@somber trellis can you run any custom_node like semantic segmentation which uses fp64 when using cuda with ARC?
@somber trellis you have some spare nvidia gpu? you wanna run some stuff for me with nvdiffrast, after some time?
No he can't.
Support for fp64 will come. It's already in an experimental state in 2.5/2.6
If you want things fixed now, do what I said
And I'll look into the custom node and I can put patching it into my installer script or tell you what to edit or whatever else
wait did you guys get stable diffusion working on arc b580
@earnest grotto right here
Yes I already saw it
.
Link the custom node
Link any models if they need to be separately downloaded
Did I misunderstand?😅
I found that any node which uses the Semantic Segmentation preprocessor in comfyui reports an fp64 error when using Arc
no im on arc
i have a 1060 but its not installed lmao
Are there any other commands I should add to model_management other than torch.backends.cuda.allow_fp16_bf16_reduction_math_sdp(True)
@earnest grotto I replaced every float64/double datatype in these nodes with float/float32, but still got the same error report😅
instantir works on torch 2.6+xpu at 4s/it
I can run flux which uses fp64 datatypes in its mmdit py
Ipex-to-cuda fixes that issue for you in most cases but certain custom nodes you might need to modify it manually
the hijacks don't cover every scenario
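When a node still hits fp64 after the obvious dtypes were edited, the 64-bit data may be coming from somewhere less visible, for example a numpy array (numpy defaults to float64) passed through `torch.from_numpy`, which keeps the dtype. A rough grep-style sketch for finding candidate lines in a custom node folder. The pattern list is an assumption and not exhaustive, and `find_fp64_candidates` is a hypothetical helper name:

```python
import re
from pathlib import Path

# Patterns that commonly produce 64-bit floats; not an exhaustive list.
FP64_PATTERNS = re.compile(
    r"float64|\.double\(\)|torch\.double|dtype\s*=\s*float\b|from_numpy"
)


def find_fp64_candidates(node_dir):
    """Return (file, line_number, text) for lines that may create fp64 data."""
    hits = []
    for path in sorted(Path(node_dir).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        for number, line in enumerate(text.splitlines(), start=1):
            if FP64_PATTERNS.search(line):
                hits.append((path.as_posix(), number, line.strip()))
    return hits


if __name__ == "__main__":
    # Hypothetical path; point this at the custom node you are patching.
    for file, number, text in find_fp64_candidates("custom_nodes/some_node"):
        print(f"{file}:{number}: {text}")
```

A hit is only a place to look, not proof that the line is the culprit. The full stack trace is still the better guide.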
This was with 2.3, not sure what's going on with it. I am going to try 2.6 sometime today. I only get this error with latent upscale so far, but my code string is correct?
yeah, hijacks updated same error. Might just be something with the latent upscale nodes that don't get hijacked? Gonna try 2.6 later on fingers crossed it works this time
It was going from 768*512 to 2x that, I would have expected an oom rather than the 4gb message. Its an ltx vid though. I have been trying to find an upscale method that didn't take forever
But oom usually has another message not the 4gb, when I got that before I didn't have the hijacks working. Its strange
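The error message quoted earlier is about a single allocation being larger than 4GB, not about total VRAM running out, which may be why it shows up instead of an OOM. A rough sketch of the arithmetic, assuming the limit applies per allocation as the message says. The helper names are made up for illustration:

```python
import math

GIB = 1024 ** 3
MAX_SINGLE_ALLOC = 4 * GIB  # per-allocation limit quoted in the error message


def tensor_bytes(shape, bytes_per_element):
    """Size in bytes of a dense tensor with the given shape and element size."""
    return math.prod(shape) * bytes_per_element


def chunks_needed(total_bytes, limit=MAX_SINGLE_ALLOC):
    """Smallest number of equal pieces so that each piece fits under the limit."""
    return max(1, (total_bytes + limit - 1) // limit)


if __name__ == "__main__":
    # The failed request from the error message: 4.25 GiB in one block.
    requested = int(4.25 * GIB)
    print(chunks_needed(requested))
```

For the 4.25 GiB request this gives 2, so splitting the work in half (tiles, frames, or batch) would keep each block under the limit, assuming the op can be split that way.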
not sure if i wanna use 2.3.0 or 2.6.0
or ipex 2.5.1
2.6.0/2.5.1 is not far from 2.3.0 with the cuda mem eff sdpa fix
is there a way to get it to run with comfy? Getting not compiled with cuda errors
I got it to work in a fresh env by itself with the test script, but comfy won't recognize xpu or something
with 2.6.0?
Yeah
In model_management.py you need to remove all ipex-related code for xpu
okay thanks
```
try:
    from ipex_to_cuda import ipex_init
    ipex_init()
    torch.backends.cuda.allow_fp16_bf16_reduction_math_sdp(True)
    _ = torch.xpu.device_count()
    xpu_available = torch.xpu.is_available()
except Exception:
    pass
```
for me i just use this
that third line helps with performance
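A hedged variant of the flag and detection lines from the snippet above, leaving out the ipex_to_cuda hijacks: it checks that the flag and `torch.xpu` exist before touching them, so an older torch build falls through instead of raising. `setup_xpu` is a hypothetical helper name, not something in ComfyUI:

```python
def setup_xpu(torch_module):
    """Enable reduced-precision math SDP if present, then report XPU availability."""
    cuda_backend = getattr(getattr(torch_module, "backends", None), "cuda", None)
    toggle = getattr(cuda_backend, "allow_fp16_bf16_reduction_math_sdp", None)
    if callable(toggle):
        # The flag only exists on newer torch builds; skip it silently otherwise.
        toggle(True)
    xpu = getattr(torch_module, "xpu", None)
    if xpu is None:
        return False
    return bool(xpu.is_available())


if __name__ == "__main__":
    try:
        import torch
    except Exception:
        print("torch is not importable in this env")
    else:
        print("xpu_available =", setup_xpu(torch))
```

Taking the torch module as a parameter is only there so the logic can be tried without a GPU. Inside model_management.py you would use the already imported `torch`.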
also, did you get a lot dependency errors?
I got some 2025 oneapi dependency errors that I fixed by installing either their 2025.0.2 or 2025.0.1 counterparts
otherwise I have torch, torchaudio and torchvision for torch 2.6.0
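For chasing dependency mismatches like these, it can help to print what the env actually has installed before swapping versions. A minimal sketch using the standard library. The package list is only an example, so add whichever runtime packages pip complained about:

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Example list; extend it with the packages named in your pip errors.
    wanted = ["torch", "torchvision", "torchaudio"]
    for name, version in installed_versions(wanted).items():
        print(f"{name}: {version or 'not installed'}")
```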
this is what I am getting.
