#SDNext WebUI on Intel ARC
It wasn't working in docker so i tried just running in windows and its still not working, so idk what's happened
I took a half year hiatus from my computer so idk what's changed in that time
When you ran it in windows, did you run with --use-ipex
aye
fresh git clone, webui.bat --use-ipex, no errors, open up webui, select an old model I had > "Queue cannot be constructed..."
ok, odd. lets try with a fresh civitai model from the sidebar, can't possibly mess that up. Downloads fine, select model, loads up, cool. Hit generate > "Engine did not start..." error and the webui crashes
tried it a few times and didn't get any farther
@restive parcel what gpu, what driver version
a770, current driver is 32.0.101.6458 due to instabilities present in latest
had to revert
wanna try 6314?
32.0.101.6557_101.6262 from January works alright on an a750, but the most recent driver certainly not
6647 works alright
6632 works fine
but I'm having difficulty using sd.next, as most youtube sd tutorials are on a1111, which has different settings and layout from sd.next, and sdnext wiki has only the most basic doc
you can try the sdnext discord if you need help learning the program.
can someone help?
this happened the first time i tried to open it
i downloaded a different driver then it fixed it
but now its happening again after not using it for a while
and also a slight memory upgrade if that matters
if alchemist gpu, get driver 6314
and keep it. it will probably vanish from intel's website after the next driver release
sry forgot to mention it is an arc a750
why
that's an alchemist gpu yes, so get 6314
because not all the old drivers are available (in an easy to find way?), something like only the last 10 or so are
When running for the first time/reinstalling
ok
after installing the new drivers, would it be necessary to delete the venv folder to make a new one?
no need to reinstall anything in sdnext
just try with this older driver, should work fine
it worked!! thank u
how do u figure out which drivers work and which dont? just by trying out each of them one at a time?
pretty much
Hey all, i finally got SDNext to install on my pc, but im having an issue where when i launch it, the only "error" it gives, is that torch is running in cpu only mode, i am unsure how to fix this. I have tried reinstalling torch, using the command argument --use-ipex, also have tried to force the id to my gpu.
When i use the ipex command, it just won't start properly and gives me the error "torch not compiled with XPU enabled"
Any help would be appreciated.
which version of python are you on?
if you are on a really old one, it might not find pre built wheels for torch+xpu
Torch is not changed after the first install unless you add --reinstall too
You should have been using --use-ipex from the start
Im pretty certain it is the newest since i just installed the latest one on their website
What exactly do you mean from the start? Sorry I'm a little new to this!. I added ipex to the command line arguments after it was installed and built
When you first ran it
Since you didn't do that, run it once with both --reinstall and --use-ipex
Gotcha, ill try that when i get home.!
That fixed it completely, thank you so much!
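A quick way to check whether the installed torch can see the GPU, as a minimal sketch. It assumes a torch build recent enough to expose `torch.xpu`, and it should be run with the Python from SDNext's venv. The helper name `xpu_status` is made up for this example.

```python
def xpu_status():
    """Return a short description of XPU support in the installed torch."""
    try:
        import torch
    except (ImportError, OSError):
        return "torch could not be imported in this environment"
    # Older or CPU-only builds may not expose torch.xpu at all.
    xpu = getattr(torch, "xpu", None)
    if xpu is None or not xpu.is_available():
        return f"torch {torch.__version__}: no usable XPU device"
    names = [xpu.get_device_name(i) for i in range(xpu.device_count())]
    return f"torch {torch.__version__}: XPU devices {names}"


print(xpu_status())
```

If this reports no usable XPU device after a CPU-only install, running once with both --reinstall and --use-ipex, as described above, is the fix that worked here.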
I'm getting the same error now. I've tried plenty of different drivers, and made clean SDNext install.
Using A770. SDNext used to work fine a week ago, unsure why it's acting up.
The full error being: "Queue cannot be constructed with the given context and device since the device is neither a member of the context nor a descendant of its member"
i just ended up using Comfy off the backend of AI Playground, since it seemed to work for some reason?
never got SDNext to run
I managed to fix it! The issue seemed to be the iGPU being enabled and making some conflicts. I had it disabled before but something must have happened to enabled it again, so I didn't consider checking it.
@fickle plume I will continue talking here.
There is no error, I think the Graphics Drivers (or something related to it), crashes.
I am currently trying to do a fresh re-install. Let me try that first and I will let you know what occurs.
I don't have access to the stack trace rn either, I believe.
so it just exits when loading diffusers?
how do you know that happens specifically when loading diffusers
and what driver version
and what GPU
It'll get to the point where it says "6/7: Loading Diffusers", or something along these lines, then crash. It will then say "Press any key to continue", on the same line.
32.0.101.6647, Intel Arc A770 16GB
Install 6314, try again
Sounds good.
Also, do you have any resources for how to make Flux1 work on this GPU?
dunno about sdnext, I use comfyui mostly now
it should be possible in sdnext
kk
Oh boy, I am not allowed to post my error.
"XPU out of memory. Tried to allocate 2.67 GiB. GPU 0 has a total capacity of 15.56 GiB. Of the allocated memory 6.66 GiB is allocated by PyTorch, and 1.36 GiB is reserved by PyTorch but unallocated. Please use empty_cache to release all unoccupied cached memory."
This was specifically when I was trying to upscale.
Seems to work now, I changed the version and then I changed the virtual memory on my PC.
Thank you for the assistance, I will let you know if I encounter anything else.
Just made the switch to SdNext from A1111 after switching to intel GPU. It's been anything but pleasant.
Can anyone guide me on how to optimise it better?
- I can't use img2img from images as sdnext will just crash. I have yet to try lowering the res for the image to see if it works.
- Sometimes generating an image will just take a really long time only for Sdnext to crash. Sometimes when I see that the progress is really slow, I closed sdnext and reopen the webui and it's fast again.
I'm on windows 10, 11th gen intel processor, b580 gpu and 32 gb ram.
I got medvram in the argument as without it, sdnext will push my vram and ram usage to the max.
are you running with --use-ipex?
Yes I am
It's been awhile since I used sdnext, are you using any of the attention optimization settings? Also what size images are you using?
I can generate 512x512 and 1024x1024 just fine, but when I want to go higher than 1024 most of the time it'll take very long and oftentimes it's just crashed after awhile
I tried to install forge or fooocus but both seems to have issue installing haha
How much higher are you trying to go? Its probably spilling over into system ram which will cause slowdown.
Larger res will always be slower
I usually do 768x768 in A1111 before I double it... No reason specifically. So I can just work with a lower res. The thing that bugged me is that img2img doesn't seems to work without it crashing haha. I'm gonna try to lower the res of the image first before I load it up to see if it'll work
Run with lowvram, and disable live previews
Even though now I got a higher vram? Was using a 3050 with 8gb ram, and med vram was working just fine... Then again that's an nvidia card. Will give it a try later
Nvidia offloads vram to ram by default in the drivers, so basically runs lowvram unless turned off
I used to run 768*768 no problem in sd next though, are you using any of the attention optimizations? I forget which was the best in sd next, but they should help with vram optimization and speed. You have to choose them in the options in sdnext
I enabled tiled VAE to 96 I think and it helped out a lot. Even able to go upwards to 1920x1080. Not sure what attention optimization is
I haven't used it in over a year so maybe it's changed, I think the cuda hijacks maybe enable it by default for intel now not sure. You can also try ultimate-sd-upscale or another tiled upscaler to get higher resolution images.
is there any way to fix the "No XPU deviced found" thing without rolling back the drivers?
Why do you not want to go back to older drivers
You should probably be more specific with whatever error you're getting. Drivers shouldn't cause that, so it's likely something wrong with your environment.
you were right im sorry i just needed to disable igpu in device manager
If you manually install torch 2.8 you don't have to disable the igpu. But you must use FP16 or you will get NaN output when decoding VAE.
--use-nightly arg will install 2.8 for you
NaN with BF16 is very strange tho, usually it is the reverse
Also don't use Clip Skip with SDXL
A1111 doesn't support Clip Skip for SDXL, so any change you make is ignored on A1111.
SDNext supports setting Clip Skip for SDXL and you will have a bad time when you actually use Clip Skip with SDXL.
Pony will simply break and output static colors with non-default Clip Skip for example.
Other SDXL models still work to a degree.
HiDream on A770 with qint4:
Full model with CFG gets 10 s/it
VRAM usage is 12 GB with model offload
Flux Dev with qint4 + model offload gets 3.75 s/it and uses 8GB VRAM for comparison
nice
Not so nice for those fingers :(
Show the error
The whole stack trace
FramePack on Intel ARC A770 16GB
Using NNCF INT8 with balanced offload high threshold set to 0.6 and low to 0.
how's the VRAM usage on this?
Depends on the offload threshold
0.6 threshold with 16 GB VRAM GPU means use 10 GB of VRAM for the model weights
It uses 15-16 GB total with these settings
ouch, so difficult to run on B580
You can reduce the threshold but 6 GB will probably be the minimum
As the compute needs 5-6GB
It should work fine
Just reduce the threshold
B580 has 12 GB
yes
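The threshold arithmetic above can be sketched like this. It is a rough estimate only: it assumes weights plus compute have to fit in VRAM together, and the function names are made up for this example.

```python
def weight_budget_gb(total_vram_gb, high_threshold):
    """VRAM reserved for model weights by the balanced offload high threshold."""
    return total_vram_gb * high_threshold


def max_high_threshold(total_vram_gb, compute_gb):
    """Largest threshold that still leaves compute_gb of VRAM free for compute."""
    return max(0.0, (total_vram_gb - compute_gb) / total_vram_gb)


# A770 with 16 GB at a 0.6 threshold: roughly 10 GB for the weights.
print(round(weight_budget_gb(16, 0.6), 1))

# B580 with 12 GB, assuming the upper 6 GB estimate for compute.
print(round(max_high_threshold(12, 6), 2))
```

Under these assumptions a 12 GB card would land at a threshold of about 0.5 or lower.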
Also this 24 fps 4 second video took 15 minutes to generate on A770
thats actually not that much of time
had to generate 120 images and prediction algos etc
24 fps * 4 = 96 frames
ah my apologies i'm blind
😭
then approx 10s per image, seems normal according to benchmarks i saw online
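The frame math from this exchange, spelled out as a small sketch (helper names are made up):

```python
def frame_count(fps, seconds):
    """Number of frames in a clip."""
    return fps * seconds


def seconds_per_frame(total_minutes, frames):
    """Average generation time per frame."""
    return total_minutes * 60 / frames


frames = frame_count(24, 4)            # 24 fps for 4 seconds
print(frames, seconds_per_frame(15, frames))
```

That gives 96 frames and a bit under 10 seconds per frame for the 15 minute run.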
im only wondering about vram usage for things like flux and sdxl
Balanced offload is pretty fast
VRAM doesn't matter that much unless you have less than 8 GB
System RAM is more limiting with these models
ordered a 64gb kit to avoid system ram limitation
i hope its enough.... ddr5 is expensive
64 GB is enough for most
Only issue will be if you try to run Hidream at full BF16 as the model weights alone are 64 GB at BF16
Tho you really should use int8 quantization with HiDream anyway
int8 will fit in 64 GB
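A back-of-the-envelope sketch of why int8 fits. It takes the 64 GB BF16 figure from above and scales it by bit width only, ignoring quantization scales and any layers left unquantized, so real numbers will be somewhat higher.

```python
def quantized_size_gb(size_gb, source_bits, target_bits):
    """Scale a weight size by the ratio of bit widths."""
    return size_gb * target_bits / source_bits


# HiDream weights, starting from 64 GB at 16 bits.
for bits in (16, 8, 4):
    print(f"{bits}-bit: {quantized_size_gb(64, 16, bits):.0f} GB")
```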
Some quant tests for HiDream:
HiDream Full
All BF16 / BF16 Transformer + NNCF INT8 TE / All NNCF INT8 / All Quanto INT4
hidream full itself is quite impressive, but you do have 4gb more VRAM
New NNCF Quant modes:
Flux Dev, transformer only quant
BF16 / INT8_SYM
INT8 (the original method with old nncf) / INT4 / INT4_SYM
With SDXL:
BF16 / INT8_SYM without Conv quant / INT8_SYM with Conv quant / INT8 (the original method with old nncf)
INT4 / INT4_SYM without Conv quant / INT4_SYM with Conv quant
On the fly Lora usage is also supported.
with SDXL + LoRa:
BF16 / INT8_SYM without Conv quant / INT8_SYM with Conv quant / INT8 (the original method with old nncf)
LoRa starts to lose its effect with INT4 quants
INT4 / INT4_SYM without Conv quant / INT4_SYM with Conv quant / BF16 without LoRa
NNCF changelog:
- NNCF update to 2.16.0
  - major refactoring of NNCF quantization code
  - new quant types: INT8_SYM (new default), INT4 and INT4_SYM
  - quantization support for the convolutional layers on unet models with sym methods
  - pre-load quantization support
  - LoRA support
the new INT8_SYM method has basically the same quality as 16 bit
pre-load quantization quantizes the model while the original 16 bit weights are still being read from disk, so it won't run out of RAM
framepack works on A770? torch xpu nightly is enough?
is there a tutorial page on using it in sdnext?
Error snippets aren't helpful, post the full log
also were you on dev branch?
anyway, dev branch merged to main now, it should be able to run on the main branch if you update
i updated to latest dev, so it's working now. can I use only init image to generate?
i've updated to latest dev so it runs now. but it's been 50 min and i'm still only at 36% doing 4 sec at 20 fps...
GPU's being used, vram's filled, but speed's a joke been 1.5hr, now at 50%
what is your offload and quant settings?
you won't be able to run the full model at 16 bit with the default offload settings without going oom
set balanced offload high threshold to 0.5 or 0.6
Enable transformer, video, llm and te from nncf quantization settings
I explained the settings here
With the new NNCF update you might need to reduce the high threshold to 0.5
video did come out after 2hrs...
wonderful, how to have your nice UI?
UI settings, UI type -> Modern
Then do a full restart and also clear the caches from your web browser
with the NNCF.. I ran a 8sec video before heading to bed, which took ...3.5hrs
Video 20250429-075742-libx264-f145.mp4 | Codec libx264 | Size 512x768x145 | FPS 20
sample 12303.43 vae 147.78 offload 56.95 move 23.26 vision 19.69 encode 12.46 prompt 11.81 save 1.68 gc 0.87 preview 0.55 | GPU 14852 MB 93% | RAM 31.25 GB 49%
What is your PyTorch version?
New update uses 2.7
I was using 2.8 nightly
2.6 is slow
my modern looks different
Clear caches
you're right i'm still on 2.6 nightly, i'll upgrade it right now.
been using comfy since 2.6, cuz it's tough to learn sdnext without someone like you, there's very little tutorial out there, most of them on comfy
thank you so much
I did do clear all cache and then restart server. i'll try again after download 2.8
i went into browser settings to clear images and files, but i suppose also cookies and other site data ?
Also just checking, did you set modern as the UI type or did you change the theme?
Because they are different things.
Cookies and site data
i've been meaning to ask, did you make ipex_to_cuda?
Yes
how come it's being used in comfy, but i don't see it mentioned in sdnext?
so it's still integrated in sdnext under the hood
Yes
remove anything that starts with ~ in the site packages folder
You probably changed the old ui's theme instead of the ui type
Show the user interface settings
Try setting the theme back to default
Then restart and do CTRL + F5
CTRL+ F5 should force actual refresh
This sycl kernel dll sounds serious, it's been there since a month ago, why is it causing problems now
I restart and do ctrl+f5 still
not sure what to do here
much better with 2.8nightly, 12sec video in 1.5hr !!
but how come you can get it done with your A770 much faster?
turned out that the modern extension was not enabled, one down!
do i need to check convolutional layers on nncf?
i'm confused with modernUI tho, with old UI, I can do faceswap by start with Image to Image, script->face->faceswap->input image->generate, now with modernUI, after doing the same in scripts tab, hitting generate at the bottom doesn't do anything
Does webui --lowvram invalidate the balanced offloading settings? Do you run webui without --lowvram? I thought with A770 --lowvram's a must...
--lowvram will override it with sequential offload
Which will probably make it run on a potato but very slowly
Modern UI uses the Control tab by default, most scripts don't work in there
You can re-enable text2img tabs from settings
It was called something like Hide txt2img tabs in User interface settings
So just webui --use-ipex for A770 ?
Settings -> Variational Auto Encoder -> VAE Tiling
Goodness... but comfyUI guys says to use --lowvram with a770 on comfy
ComfyUI doesn't support balanced offload
Gotcha
I tried to find flux1 qint and saw ur hf, but didn't see any checkpoint qint file?
It is a diffusers model
Also available in the reference models in SDNext
You can just click on it and it will download & load it for you
Enlightening
Tho i recommend using the original model and quantizing it on the fly instead
We support quantizing any model on the fly
Yep
Just set a quant mode
We also support quantizing while the model loads with transformers models (like flux) so you won't run out of system RAM
That's called "pre" load mode
post load mode will load everything into RAM first, then quantize
SDXL and UNet models are only supported with post load mode
You guys are so good, really wish you can take over comfys backend
Yes I read thru the wiki that's how I realized about lowvram
sigh, I remembered wrong, haven't been using --lowvram with sdnext
what else could be causing my super slow framepack?...
what is your resolution and frame rate?
12 seconds is very long
15 mins was for 4 seconds, 24 fps, 480p
12 seconds, 30 fps, 512x768 will take 1.2 hours just from multiplying 15 minutes with frame and resolution increase
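That estimate can be reproduced with a small sketch. It assumes generation time scales linearly with frame count and pixel count, and that "480p" means 640x480 here. Both are assumptions, not something stated above.

```python
def scaled_minutes(base_minutes, base, target):
    """Scale a known run time to another clip; base/target are (seconds, fps, width, height)."""
    def work(seconds, fps, width, height):
        # Total pixels generated across all frames.
        return seconds * fps * width * height

    return base_minutes * work(*target) / work(*base)


# Known run: 4 s at 24 fps, assumed 640x480, took 15 minutes.
# Target run: 12 s at 30 fps at 512x768.
estimate = scaled_minutes(15, (4, 24, 640, 480), (12, 30, 512, 768))
print(round(estimate), "minutes, or", round(estimate / 60, 1), "hours")
```

With those assumptions it comes out to 72 minutes, which matches the 1.2 hours quoted above.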
I see. what does lowering balanced low watermark from 25 to 0 do exactly? and how about high watermark from 75 to 60?
Video 20250429-230714-libx264-f73.mp4 | Codec libx264 | Size 384x576x73 | FPS 24
sample 568.83 vae 50.10 offload 37.98 move 13.00 vision 10.95 prompt 10.78 encode 9.78 save 0.65 gc 0.64 preview 0.40 | GPU 12868 MB 81% | RAM 30.97 GB 48%
high watermark: i'm guessing from the doc that if vram is being utilized more than the high watermark, the rest of the model gets sent to ram? but the doc says nothing about the low watermark
high watermark is used to reserve x amount of the vram for the model weights
aka model weights will use 75% of your vram with 0.75
low is used for when to offload models
if vram usage is smaller than low watermark, it won't offload
clip models stays in vram with 0.2
llama text encoder uses a lot of vram for compute so setting 0.7 for the high will likely oom
tho it will be fine if windows can use the shared memory
so changing low from 0.2 to 0 makes sure clip gets offloaded to ram?
Yes
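The low watermark rule described above, as a simplified sketch. This is not SDNext's actual code; the function name and the 1.5 GB CLIP figure are made up for illustration.

```python
def should_offload(used_vram_gb, total_vram_gb, low_watermark):
    """True when VRAM usage has reached the low watermark, so idle models get offloaded."""
    return used_vram_gb / total_vram_gb >= low_watermark


# Hypothetical numbers: a CLIP text encoder using 1.5 GB on a 16 GB card.
print(should_offload(1.5, 16, 0.2))  # usage is below 20%, so it stays in VRAM
print(should_offload(1.5, 16, 0.0))  # a low watermark of 0 always offloads
```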
Curious about your ipex_to_cuda: if it redirects all torch.cuda calls to torch.xpu, can all projects that use torch with cuda-only code work with xpu?
it redirects most of them
tho anything that isn't pytorch won't be redirected
possible to do the same with onnx?
We don't use ComfyUI in SDNext
FramePack was working awesomely yesterday, then I updated the extension and now I get this when starting the server
'ImportError: cannot import name 'ui_video_vlm' from 'modules' (unknown location)'
That's a new feature in dev branch
Either downgrade the extension or switch to dev branch of SDNext
Ah thanks, I don't have much experience with github but I managed to downgrade after some reading. Framepack is seriously impressive!
SDNext updated now
Also here is the new FramePack wiki: https://vladmandic.github.io/sdnext-docs/FramePack/
I tried to do faceswap on video using control tab, sdnext did see 577 frames, but all it does is faceswap on first frame for 577 times
are there examples of Loras that work with framepack? I've tried 3 different hunyuan loras from civitai and they all give load errors
You might want to create an issue on github for these.
Added multiple performance optimizations to NNCF in dev branch:
https://github.com/vladmandic/sdnext/commit/a4d4462e2a117946925cc81e57c4b984b947de02
https://github.com/vladmandic/sdnext/commit/a57c7087b83fc7ad69e697378e2ee9b59f240574
INT4 quants now run 75% faster out of the box or 3.5 times faster with torch.compile compared to before.
INT8 quants now run 30% faster out of the box or 2 times faster with torch.compile compared to before.
torch.compile is only used for the decompression, rest of the model is untouched
Flux.Dev now runs at 2.4 s/it with INT8_SYM quant or 2.5 s/it with INT4 quant
NNCF in SDNext beats Bitsandbytes now : )
when trying it on Juggernaut v9 I have
no NNCF 1.65 it/s
NNCF 1.72 it/s
NNCF decompress 1.79 it/s
NNCF decompress+matmul 1.7 it/s
above numbers are on 2nd generation after changing/apply setting, not sure why the first gen right after changing setting has much better speed
Those settings require full restart
btw, with this dev version, if I toggle on/off one NNCF setting, screen automatically scroll to top that I have to scroll down manually to NNCF setting area, and repeat this for every setting toggle
can I load gguf models in sdnext? I saved them in Unet folder, but they don't show up in base model dropdowns
NNCF settings are on top of the settings menu, just below bitsandbytes
Are you up to date?
i was using show all pages, won't have this problem if going directly to quant settings
UNets are in the UNet dropdown
Renamed NNCF to SDNQ in dev branch.
I have re-implemented and optimized enough code to not use any imports from NNCF and the modified code is not really NNCF anymore.
HiDream now runs at 6 s/it with uint4 quant on A770
that's better than before but to be quite honest imo that's still far from reasonably usable. something like teacache/first block cache would likely help, and would probably be even better than comfyui given the speedup without it
generally a ~2-3x speedup for flux, i'd expect the same for hidream
4 s/it with teacache threshold at 0.17 or 3 s/it with teacache threshold at 0.3
that's better
I install torch xpu nightly first, and then requirements.txt and then webui.bat
isn't diffusers latest 0.33.1 on pypi
installing requirements.txt is not supported
Let webui.sh / webui.bat handle the venv
video on sdnq quant should be pre or post?
Managed to achieve 2.3 it/s with INT8 matmul and 2.1 it/s with FP8 matmul on FLUX with an RTX 4090 (on Runpod) without using any custom kernels or CUDA
INT8 has basically the same quality as the full 16 bit model
It would be nice to see int8 or fp8 mm support in PyTorch for Intel
IPEX 2.7 sort of has int8 mm support but not really, IPEX 2.7 is just too slow with everything
SDXL with SDNQ, everything except the VAE is quantized
BF16 / INT8 weights + INT8 MatMul / INT6 weights + INT8 MatMul / UINT4 weights
@rustic saffron Explain your sdnext issues. What model, did you run with --use-ipex, and so on, show a screenshot of the bad image with the generation settings visible
KIOKIS V20 hyper model. Once i opened sdnext again, it just gave an error on the webpage and cmd just died without an error, so it's like the background process timed out trying to load it. As for settings, i just ran the default ones it gave me. No idea what any of them even do, so i didn't change them in case i break something. I'd expect the defaults to work out of the gate, with options for skilled people to micromanage and tweak, at least that's how i see it.
Run sdnext with --reinstall --use-ipex
right so, got it installed, and it does work, with the juggernautXL_v8Rundiffusion, but kIOKISSDXLHyper_v20 just errors out, no error info, just webui says it, and cmd is press enter to continue, after the loading 1 model line. no other info there. so suppose it doesnt support kIOKISSDXLHyper_v20 for some reason maybe, which would suck, its a real top tier 13gb one.
how much ram do you have?
SDXL models are 6.5 GB
That 13 GB model is wasting 6.5 GB of space for no reason, find a FP16 version
32gb ddr4, vram is 12gb b580
and i ran that model for ages on my laptop, 16gb ddr4 8gb 3070 laptop gpu. it gets really amazing results. also i dont know what fp16 is.
that model is stored in 32 bits
but your GPU converts it to 16 bits to run it
It is 2x bigger for no reason
Find a 6.5 GB version of it, or you can use the models page to convert it yourself
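To check how a checkpoint is stored before converting it, the .safetensors header can be read directly. A minimal sketch: it relies on the safetensors layout (an 8 byte little-endian header length followed by a JSON header), and the function name and example path are made up.

```python
import json
import struct
from collections import Counter


def safetensors_dtype_bytes(path):
    """Return {dtype: total bytes} from a .safetensors header without loading tensors."""
    with open(path, "rb") as f:
        # First 8 bytes: little-endian unsigned header length.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    totals = Counter()
    for name, info in header.items():
        if name == "__metadata__":
            continue  # optional free-form metadata, not a tensor
        begin, end = info["data_offsets"]
        totals[info["dtype"]] += end - begin
    return dict(totals)


# Example usage with a hypothetical file name:
# print(safetensors_dtype_bytes("model.safetensors"))
```

A file that is mostly F32 is the 2x bigger kind described above, while mostly F16 means it is already the smaller version.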
im not smart enough to convert it, ill just use a different model. the 6.5gb ones seem to work, and honestly been good 20 hours trying to get any sdxl to run, so i'll take the w here.
do you want to do anime art or realistic things
realistic, dont do anime stuff, did test kiokis 3d model works, a little like 2.something gb one. so can use that for 3d stylized stuff.
Try juggernaut xl v9 + rundiffusionphoto 2, it should be better than v8 https://civitai.com/models/133005?modelVersionId=348913
SDXL
4 bits / 2bits / 1 bit
damn, impressively good at just 4 bits
oddly coherent at 1 xD
4bit always seems like the cutoff point for coherence
Merged to main branch
Benchmarks:
I keep getting this runtime warning when using ipex on my A580 when using txt2img, especially when faces are mentioned in the prompt. it outputs a black image
i've tried:
- disabling ipex optimizations
- ipex force attention slice 1
- fp16fix VAE model
- lower resolutions
- full precision
- reinstalling
- changing safetensor models
- VAE tiling
it doesn't seem to be a memory issue, i also got 32gb of ram and it doesn't fill up
i've been searching this issue for some time and trying all the fixes i could find but nothing really helped
the only 'stable' way i found was to use openvino but that doesn't really help when most of the things except the steps fall back to the cpu and slows the generation a lot
i'm very new to this so idk what to do anymore
Wrong resolution. 1.5 and SDXL are trained with specific resolutions and due to their architecture do not work well outside of them. For SDXL, those are 1536x640, 1344x768, 1216x832, 1152x896, 1024x1024, 896x1152, 832x1216, 768x1344 and 640x1536
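Picking the closest of those SDXL resolutions for a given image can be sketched as follows. Matching by aspect ratio is a choice made for this example, not an SDNext feature, and the function name is made up.

```python
import math

# SDXL training resolutions listed above, as (width, height).
SDXL_RESOLUTIONS = [
    (1536, 640), (1344, 768), (1216, 832), (1152, 896), (1024, 1024),
    (896, 1152), (832, 1216), (768, 1344), (640, 1536),
]


def nearest_sdxl_resolution(width, height):
    """Return the listed resolution whose aspect ratio is closest to width/height."""
    target = math.log(width / height)
    # Compare in log space so wide and tall ratios are treated symmetrically.
    return min(SDXL_RESOLUTIONS, key=lambda wh: abs(math.log(wh[0] / wh[1]) - target))


print(nearest_sdxl_resolution(1920, 1080))
print(nearest_sdxl_resolution(732, 906))
```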
the first 1-2 generations on the default 1024x1024 work but the ones after trigger the same error again. I also tried 1344x768 and 896x1152 but those either trigger the same error or just output a light purple-ish image
What model, what GPU, what driver version
Model: Ikastrious V10 (from civitai)
GPU: Intel Arc A580 8GB
Driver Ver: 32.0.101.6874
This is the second time i am seeing this issue and that GPU was also an A580
Normally this happens on unsupported iGPUs with BF16
Can you try manually setting the dtype to FP16?
A580 might be unsupported by ipex / pytorch too
aaron has an a580 and seems to mostly not have much issues (with comfy)
I have an a750^^
that fixed it, thank you :D
gpu usage is also way more consistent, so yeah I guess bf16 doesnt work
hmmm after quite a lot of attempts, it's a bit inconsistent/odd. some prompts work well, some output the same error, some output a garbled image. i'm rather confused
i guess i'll give comfyui a shot
i doubt it'll be much different
I remember people using a580's fine, so it must be some update to either pytorch/ipex or the windows driver.
IT IS
the comfyui install with your script works flawlessly, and i mean completely flawlessly
thank you for adding whatever magic to it
But are the images sane though
Can you try setting IPEX_FORCE_ATTENTION_SLICE to 1
set IPEX_FORCE_ATTENTION_SLICE=1
.\webui.bat --use-ipex
This is the only difference between comfy and sdnext
I disabled dyn atten in sdnext in favor of flash atten with pytorch 2.7
SDNQ Quantization matrix with SDXL
(image is downscaled and webp compressed)
if you mean in uhhh sdnext, i did try that, didn't do anything
update: I enabled that thing again and --lowvram and it's working, somehow, idk how, but it is
to be fair, I did make a bug report about something similar for blender, where the renderer would just crash when the vram was almost full, so it might be related, funnily enough
Balanced offload might be broken on a580
Model offload might work but will have higher vram usage
yeah, im also getting worse iterations/s compared to using --lowvram
Flash attention works on intel now? Or is it just a flag for compatibility
PyTorch has built-in flash attention with all 3 GPU vendors
Intel got support with PyTorch 2.7
AMD got support with PyTorch 2.5
Nvidia got support with PyTorch 2.0
Torch sdpa uses flash, memory efficient, and math attention
Highest priority is flash atten, then memory efficient atten, if nothing works, then math atten
So it's still best to use sdpa?
yes
model offload seems to be working just fine for the generation part but Adetailer breaks, was working under --lowvram
edit: i got adetailer to "work" by using --lowvram and model offload but it's either replacing the face with garble or a mix of garble and the actual character's body on top of its face
- i'm still getting some occasional garbled generations so ig sequential might be the only one that works stable
I hope it's fixable somehow, i was getting near 1 it/s on model offload but 3.2s/it will have to do for now
I would but it's worse quality-wise
example: eyes
detailer left
adetailer right
same yolo model, same settings
yolo model has no effect on the image
both adetailer and built-in detailer are just inpainting
did you set the denoise / detailer strength and the detailer steps to be the same on both?
yes!
i feel like the built-in detailer doesn't follow the prompt at all, even when i tried to manually give it the prompt
since the eyes dont match
Question, has the "Unable to allocate more than 4gb" thing fixed for the A series cards?
It still exists but there is nothing that needs a single 4 gb allocation anymore
Only the attention needed +4 gb but we now have flash attention, memory efficient attention and my dynamic attention that all use much less memory and don't need +4 gb at all anymore
Also unable to allocate +4gb thing is a 32 bit hardware issue, it cannot be fixed, only workarounds exist
Flash attention and memory efficient attention are much faster than old attention so you don't really need any workaround.
But if you still need a workaround so that you can allocate a single 4gb block for whatever reason, the workaround is this:
export UR_L0_ENABLE_RELAXED_ALLOCATION_LIMITS=1
Right-o thanks
Can somebody help me setup the sdnext for my A750? It's just crashing with no output
I'm new to this
Give more info. What is "just crashing with no output", when is it
I got it running with a different model, but it's like creating weird stuff when I copied the exact settings of a ytber
Is there any video or anything which tells me how this works
IMO the vast majority of youtube tutorials on AI are best case, bad.
Comfy has a handy wiki, but it's for comfy. At a glance, I don't like what explanations sdnext has on its wiki
Show what "weird stuff" and your settings
Similar to Krea and Magnific but offline using Stable Diffusion. Just follow these steps and enhance a low resolution image better than you ever thought possible.
If you need to install Controlnet and Automatic1111, please check my video and written descriptions here:
https://www.hallett-ai.com/getting-started-free
Links from the Video
...
I followed this
The weird stuff was just the top left of the picture being generated and it took like 1hr
rest was all grey
I did not take any screenshot of the result and can't wait another hour
You waited 1 hour for an image?
I will just try to show you the settings
yeah it took a long time
I knew my settings were wrong but there's just no tutorial anywhere which was useful
they used controlnet in the vid ill try to install that
Every image you generate is saved.
There's a gallery, and you can also just open sdnext's folder and go into the outputs folder and find them there as well.
Don't crop
And, you ran out of VRAM.
no it just took like 40 steps or something
40 times
Now if I generate it just takes like a few secs
that image is 732x906
Forget about that image help me setting up this
I check enable here and apply changes, it turns off itself
it shuts down and restarts the webui, but the extension doesnt load
.
sdnext should have controlnet built in, and most likely in the composite tab
This is why I keep asking you to not crop.
If you crop every time, I can't help you
oh you mean the screenshot
I thought in the resize settings because you've asked for the deformed result
You're using an sd 1.5 model. get the tile controlnet for sd 1.5, use the original unscaled image as input and upscale the image.
I found the controlnet in the control tab under control elements
but doesnt have all the stuff in the video
Finally managed to achieve performance increase using INT8 matmul on Flux with Intel ARC A770:
2.5 s/it -> 2.0 s/it
new pytorch version fixed it? or something else?
intel wants the exact opposite strided weights compared to amd and nvidia
intel wants contiguous meanwhile amd and nvidia want non-contiguous
amd is fine with either but performs better with non-contiguous
nvidia dies with contiguous
intel dies with non-contiguous
Also Flux is exactly at the breaking point of INT8 on A770
Anything smaller than Flux will run slower
flux has mnk dim of 3072
sdxl is 1280