#✨|sdxl
ive seen nvidia users report that a lot. try restarting the server and re-running the same seed see if it changes or is coincidence
man why are light paths so easy but basic anatomy is usually so hard
the specular on surfaces is always pristine lmao
there was an image of an x-ray hand which had a bone where the 6th finger is appearing. I think it will get better with anatomy.
M.I.B.
Does anyone know what the commands are to enable animations in python?
math based with https://3b1b.github.io/manim/getting_started/example_scenes.html
If loras are already included in a sdxl model, do i still need to write "Lora example" in the prompt?
how does it even work?
With noise injection between the samplers.
what are some of your guys' favorite models for SDXL?
Juggernaut v3 or 5
what does juggernaut do over the base?
Not exactly sure, but i like it in general, for general purpose. I think base isnt bad at all, im just using juggernaut. For hiding pictures in pictures or text in pictures, because that only works for sd, i am using i think realistic vision SD with CN in A1111.
i might give it a try and test out some stuff with it
Curious if this is related to Soul's avatar or not 🙂
SDXL standard.
a surreal airbrush illustration of vibrant flowers coming to life in a dreamlike world. Let your creativity bloom with vivid colors and imaginative forms.
@versed merlin #1100170312106127410 - #1101178553900478464
Scary should i censor it? I think not. The first one got his left side issue 🙂
Nah, my 2yo kid says, "elk man"
Green face guy he said, "green elk man, silly red eyes"
Jackson the Michael
comfy will be great for creating game assets, pulling directly from python to call an image directly into the game
just use variables and arg values to supply the text to the nodes
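A rough sketch of that idea, assuming a ComfyUI server on its default local address and a workflow exported in API format. The node id "6" and the workflow contents are made up for illustration, and `with_prompt_text` / `queue_prompt` are hypothetical helper names, not part of ComfyUI.

```python
import copy
import json
import urllib.request

# Hypothetical workflow fragment in ComfyUI's API (JSON) format.
# Node ids and links are invented; export your own workflow to get real ones.
WORKFLOW = {
    "6": {
        "class_type": "CLIPTextEncode",
        "inputs": {"text": "placeholder", "clip": ["4", 1]},
    },
}


def with_prompt_text(workflow, node_id, text):
    """Return a copy of the workflow with one text node's prompt replaced."""
    patched = copy.deepcopy(workflow)
    patched[node_id]["inputs"]["text"] = text
    return patched


def queue_prompt(workflow, server="http://127.0.0.1:8188"):
    """POST the workflow to a running ComfyUI server (address is an assumption)."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    request = urllib.request.Request(
        f"{server}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


# Game code would supply the text as a variable or argument:
asset_request = with_prompt_text(WORKFLOW, "6", "rusty sword icon, game asset")
```

Calling `queue_prompt(asset_request)` only makes sense with the server running, so it is left out of the sketch.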
last one
Can someone explain the differences:
Hey guys, does anyone have a good prompt to share? I want to try creating some simple vintage black and white illustrations like the example. If you can help me out, I'd be happy.
Anyone got a prompt that creates a clean visual representation of a neural network? Or else a Lora or some embeddings? I'm lost and can't create one in SD (Automatic1111 webUI),.. any help appreciated, thanks. I tried Img2img and it just copies the existing one I got from the interwebz,.. hardly original or accurate.
https://archive.ph/JJiih use python
sad xl died, but rumor is 3.0 might hit shortly. I hope only a rumor.
huh?
anyone played around with the new SamplerCustom node and know how to set a denoise value with it?
died?
Yep
users went back to 1.5 and just look at the state of all discords people gave SAI the middle fing, mostly, to SDXL.
Olvio had a vid about this recently and people just aren't using SDXL.
idk 90% of SD social media posts that gain traction are just those animated things which I dont think has tooling for XL yet
so i wouldnt write off XL just because some tools arent updated yet
Oh, I have written it off and it died faster than 2.1 did and I stayed with 2.1 despite that.
1.5 can do most things 2.1 can but XL is different enough there's things only it can do
like 4k denoising
idk i havent had any reason to use my 1.5 tunes from XL unless there's some extremely specific lora or conditioning needed
Not gonna go into it but the people aren't having what XL gives. They are firmly happy with 1.5 and as one person wrote about 3.0 "If SAI doesn't rein in the resource requirements of 3.0 it will be still born".
most video games dont even run on 8 gigs of vram anymore nevermind SD lol
if people wanna use their 6 gig cards they'll always have 1.5 but idk why other people with 8-12+ cards would ever use it over XL outside tooling
Just telling you the way the public feels and thinks. When there is one that does 1536x1536 natively and only requires 4-8 gigs yeah, sai needs to get it together.
The unhappy are always the loudest
yea i only post SD stuff if i make new nodes or something I dont post random renders
tools suck for it ngl. As a trainer none of us are happy with the BS SAI does. pump and dump is for ex hedge fund managers not for stuff like this.
last actual picture I posted was just demoing the colored latents
was 1.5 any different? it was just a plain model uploaded and everyone built their own tools around it.
I expect no less from people here as it is akin to walking into a dnc office and saying you are voting Trump.
fact is I have a 4090 so all moot to me but it sure sucks when we lack the tools to train XL because SAI just dumped the model and ran off to work on 3.0
not even a 4090 just like an RX 6800 will get you 90% of the way there with most inference tools today.
1.5 tools for XL is not the way, and this being so different, with two text encoders (TEs) and noise offset being used on base (a total BS move), it really hurts. eventually the tools will catch up but which comes first, new tools or 3.0 before they can even be made for XL?
i wonder if they might rush 3.0 specifically because the tools arent catching up
drop the refiner improve the offset noise etc
make it easier to work with
but i mean idk. in the end we're not their customer base, their base is private tunes for enterprise iirc. so totally different world
Yup, end goal is some hefty contracts from corporations
wonder if you could 8bit quantize an image model and not have it get super fucked up
like how the llama people do it
the quantizing game is pretty advanced now with exllama2 and stuff. quants usually outperform everything else gig-per-gig of vram
so 8gig people could run a quanted version
i mean for 3.0. XL already runs on 8 gigs iirc
if you use xformers
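For anyone wondering what 8-bit quantizing means mechanically, here is a toy absmax sketch in plain Python. It only illustrates the basic idea (one shared scale, values rounded to integers in [-127, 127]); it is not how exllama2 or any real quant format works, and the weight values are invented.

```python
def quantize_int8(weights):
    """Absmax quantization: floats -> integers in [-127, 127] plus one scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    return [round(w / scale) for w in weights], scale


def dequantize(quantized, scale):
    """Map the stored integers back to approximate float values."""
    return [q * scale for q in quantized]


weights = [0.12, -0.5, 0.33, 0.004, -0.27]  # made-up example weights
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)

# The rounding error per weight is at most half a quantization step.
worst_error = max(abs(a - b) for a, b in zip(weights, restored))
```

Whether an image model survives this without visible damage is exactly the open question in the chat; the sketch only shows where the error comes from.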
I dunno but the german one has potential and a lot of trainers are watching it closely. Onetrainer is making a training tool for it so that will help immensely.
i have no idea what you're talking about
that is the one that is faster than SD and does larger gens
Würstchen 3
when the trainers get to it they will ravage it and make it better as they did with 1.5
I have a dataset that is pissing me off
refuses to train right and just blew up on me in 300 steps using adafactor
that's already in diffusers it seems. could try it out yourself and report back
btw, my testing in comfyui shows the node leaks 😦
no, 3 is still being tweaked
2 is
I am on their discord
Ah
disable the smart mem
try it out you will see the noise and stuff but they are working on it just a couple of guys
smart mem?
--disable-smart-memory
let me try that
fixed the vram slowly crawling up until HIP died for me
yep, not in there
?
.\python_embeded\python.exe -s ComfyUI\main.py --windows-standalone-build --use-pytorch-cross-attention --gpu-only --dont-upcast-attention --disable-xformers
what about it
you ? there is what I was talking about
yea try adding --disable-smart-memory to your command
adding it now, thanks
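If the launch command lives in a script, appending the flag can be sketched like this. The argument list is the command quoted above; `with_flag` is a hypothetical helper, and nothing is actually launched here.

```python
# Launch command from the chat, split into an argument list.
BASE_ARGS = [
    r".\python_embeded\python.exe", "-s", r"ComfyUI\main.py",
    "--windows-standalone-build", "--use-pytorch-cross-attention",
    "--gpu-only", "--dont-upcast-attention", "--disable-xformers",
]


def with_flag(args, flag):
    """Return a new argument list that contains the flag exactly once."""
    return list(args) if flag in args else [*args, flag]


launch_args = with_flag(BASE_ARGS, "--disable-smart-memory")
# subprocess.run(launch_args, check=True) would start ComfyUI with the flag.
```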
not sure why it was leaking because it should be plain on left and lora on the right but the lora was affecting the plain
they aren't even connected
ohhhhh I thought you meant memory leak like number go up
no
that still might help tbh since it'll unload more aggressively
the node leaks the lora
try the new comfy with pytorch 2.1 it doesnt use xformers https://github.com/comfyanonymous/ComfyUI/releases
or just enable the pytorch attention via cli since that's what it's doing anyways I assume
12.1
my comfy still on 11.8
I think.
lol, I forgot this is windows and he doesn't use a venv in windows
ay you have a 4090 you should try the new brainfloat unet
get that infinity+1 batch size
when I train I use bf16
what does that really do for 3k and 4k cards doing unet in bf16?
how is it leaking lora? like you ctrl-b bypass it and it's still there?
I use the comfyroll node that has the on and off switch. I turn it off, get pic X for left side. Turn it on, get a different pic for left side.
left side doesn't even see my lora it isn't connected
idk tbh I suppose the unet autocasts to fp16 by default so it'd maybe give slightly better precision?
y tho
just ctrl-b the builtin node
if you have to click the switch anyways...
I don't, that is the point
im not understanding I guess. when you say switch i'm thinking bool toggle
left side is always without lora and right is always with (or, to make sure, turn it off and both should be the same). only, with the lora loaded anywhere, the base left side is not the same
is it running in the same batch?
nope
you're gonna have to screenshot it cause i dont know what the fuck is going on
where even is the lora then
From the workflow I posted the other day showing two image results from the same prompt box, one with Lora and one without lora, I do not get any of that happening in my flow.
Something is messed up in yours if that's happening
okies, cool beans and good to know. Thanks
try without the custom node
im 90% sure it's that switch you're talking about. report it as a bug
I use that node without issues
if it's an auto switch or something comfy caches the model value at the tail end of the node so it'll re-use the lora'd version unless something inputting into the lora node itself changes
it also doesnt clone the cache for children nodes, it just passes a pointer
so the children have to make sure to clone and whatnot if they need to
or else you can pollute the upstream cache iirc
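The pointer-vs-clone point can be shown with a toy example. This is not ComfyUI's actual cache code; `ToyModel` and both node functions are invented stand-ins that only demonstrate how mutating a shared object pollutes whatever is cached upstream.

```python
import copy


class ToyModel:
    """Stand-in for a cached model; 'patches' plays the role of applied loras."""

    def __init__(self):
        self.patches = []


# Upstream cache holding the output of a (hypothetical) checkpoint loader node.
cache = {"checkpoint_loader": ToyModel()}


def lora_node_in_place(model, lora_name):
    # Bug pattern: mutates the object it was handed, which is the cached one.
    model.patches.append(lora_name)
    return model


def lora_node_cloning(model, lora_name):
    # Safer pattern: patch a private copy and leave the upstream object alone.
    patched = copy.deepcopy(model)
    patched.patches.append(lora_name)
    return patched


right_side = lora_node_cloning(cache["checkpoint_loader"], "my_lora")
clean_after_clone = list(cache["checkpoint_loader"].patches)  # still empty

lora_node_in_place(cache["checkpoint_loader"], "my_lora")
polluted = list(cache["checkpoint_loader"].patches)  # the lora leaked upstream
```

With the in-place version, any other branch reading the cached model (the "plain" left side) now sees the lora too.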
I used that --bf switch
not sure what facerestore is doing in comfy as I never downloaded it but I guess it doesn't like it
I disabled it
do you just have the comfyui equivalent of Skyrim with 200 sex mods
now suddenly Serana is a floating head and arms and you have to run those BSA compatibility checker tools to find every mod that touches her character cell
no, I try to keep it as clean as I can until some goddamn image I get has unknown nodes then I have to go grab them for its workflow
when I open someone's canvas and it's 90% red I just give up
not worth
i dont even have the manager installed I download my nodes by hand like the good ole days of 9/11
I just noticed something
when I ran the workflow it started the lora then the left side then the right side. It should have done the left side, then load the lora and start the right side
post the full flowchart
and I can look at it like a guy that opens the hood of his car and stares at it before calling a mechanic
It'll process nodes in order
left side should not have changed
if I tell it off or none for lora it goes back to the right pic which is the first of those two.
it does not do it with all loras only some loras
making me have to go back to automatic1111
here is a prime example
no idea why that is happening but not on all loras
wth?
ctrl-b made even a different image
anyway, now you see what I am talking about
I wonder if because I only load the one model node and split off model to lora and to ksampler?
nope
not a big deal I always have automatic1111 to test my trainings with. has scripting too
all I can say is i've never had the prompt/lora bleeding ever in auto or comfy so I cant relate
the main problems are always just memory use which comfy without smart mem does best at from my findings
are you on Xformers or SDP? Could try SDP see if its still bad. I read someone saying xformers can make the problem worse but idk he might just be on dope.
I gave up on comfyui for this and realize it can't do what I need it to do. Might just be on my side but, in the end, that is all I care about is on my side so if I can't get it to work with what I am doing I just go back to automatic1111. As I said it isn't a big deal.
Is there a limit I should look out for when training a lora for sdxl
like a max amount of images
no hard limits but it depends on what you are training
I just trained on 75 images for a lora. Most times I like 20.
so 200 images would be too much or redundant?
I just trained one with 1.750. But i have no idea 😅
Next round i try 4.000 and see if there is a difference
I use Google Colab to train my Loras, locally I use an 8gb 3070
o.k. i have the same gpu and no chance of training loras. So if Google Colab is working for you then o.k. 👍
Nah in my experience you need a minimum of 15gb of vram, I bought my gpu right before sdxl released too 
12gb is fine for training. I am using a 4070
it isnt so rare
nice color like real life one
Thank you, i am currently working on a lora for that.
wth
havent seen a polar bear-tiger? 😆
oh how it makes sense now 
random prompt generator.
I accidentally created an American Eagle magazine ad generator.... lol
what is the best auto captioning for loras right now? is it still blip?
GT-V
GPT-V*
It's how I'm doing all my captioning now. Gonna be even better when the api is out.
through bing or are you paying the 20 bucks?
I added a tutorial for it over on SCG-Playground
I pay, but I use ChatGPT for a lot more than SD, though it saves me soooo much time I would consider paying for it all the same. It literally saves me hours per training
yeah I should really pay for it too
do you have a link for this, it sounds like it might be useful
Have you been on my discord yet? If not, I'll shoot you an invite for my server, instructions are a community post
nah shoot man
properly selected vae?
150 steps too high, Euler a Sampler, pick another, wrong vae, 1.5 weight on ctrl net too high, cfg scale 2.5 too low
my usual settings do the same, resorting to old faithfuls
K so must be the VAE thing cause i've never touched it
THX 
I don't use Auto anymore and I know these options slightly changed so I'm not 100%, but check here
I think that checkbox is new, if it's selected, looks like that's when it uses the manually selected vae
thank you ser, will look into it
That's a VAE issue. Likely using xl VAE on 1.5 or vice versa.
Hail to the king baby!
Willie still looking good as always
worst on spiderman are fingers
@uncut fiber i trained an llm lora to generate sd prompts, wanna try it?
Thank you but i dont know how to install llm at all 🙂
Hello all...
guys i never used sd xl; which ui should i use? the node one is good?
the node one (comfyui) is more friendly with resources
Crazy quality... must be a massive workflow and GPU?
just install oobabooga
@upbeat summit @crisp owl do you guys also want it?
Thank you 🙂
I'm using ProtoVision by @delicate kelp as a model here which is a big part of the fidelity and coherence.
base resolution: 1344x768 - so SDXL in-spec
base sampler -> latent upscale (hires fix) -> pixel upscale -> post-processing (film grain)
rest is prompting. so nothing really crazy.
realistic cinematic film still from behind, perched high on a city skyscraper, Miles Morales wearing a hoodie and his mask and is illuminated by the twinkling city lights below. his figure is highlighted by the cool, diffuse light of the moon, close up action hero pose shallow depth of field
here are all 4 stages of the image gen (base, latent upscale, pixel upscale, post)
yeah - sounds exciting! would love to try it out. but have to do some work first - but I can check it out later!
I can take a peep yeah, would likely be tonight I fiddle with
kinda a pia to get, but finally got a good decent depiction of Odin's sacrifice of his eye
@ionic dragon is oobabooga an extension, or where do i find it? Going to check github 🙂
o.k. i found it, will install tomorrow. I hope i will manage it. Thank you!
yea, please dm so i can send the link
That looks fun
What Loras and models can I use to make my pictures as good as those on Pixiv? Or at least close to it?
you can use base model with no lora and get good images
but like this one?
or this?
@ionic dragon i have only 8GB VRAM, will it work?
i knew it, but will install oobabooga anyway I mean i expected it
For example but also the lewder ones
I think I'm getting closer. Not bad for a rookie... trying to follow what you said...
reactor (result, source, face)
@upbeat summit Have you shared your workflow somewhere?
@vital ermine
AMD 7900 cards are officially supported for pytorch and other now.
https://community.amd.com/t5/ai/amd-extends-support-for-pytorch-machine-learning-development-on/ba-p/637756
their "official pytorch" support is rocm 5.7+ so still nightly
opinions about the sapphire 7900 xt that's on sale for $400 on amazon?
keepa history for context
userbenchmark says it's "hugely better" than my 2060S.
That's all my input lol
I have not found that site to be useful so I stick with passmark
tbh I haven't really checked much on those sites in general, but that's good to know 👍🏻
I'm still waiting on any official info regarding the 5k series cards before i determine what I'm gonna do to upgrade from mine
passmark's score is pretty linear and they have some nice shopping tools. it's always my first stop on a build
can SDXL combine images? For example: I generate a piece of clothing with a certain drawing and I want a character wearing that clothing
Explore a unique art gallery featuring AI-generated images inspired by the themes of Andreas Achenbach. From serene landscapes like 'Snow-covered Village' and 'Reading at the Park' to intriguing concepts like 'Space Pirate' and 'The New Tower', this collection showcases the fusion of classic art inspiration with cutting-edge AI technology from S...
my last video. Next should be Van Gogh
if it's a sapphire OEM those are usually good. if it's just the reference model with a sapphire sticker I hear they arent the best
but I mean $400
I paid $1000 for my XTX
check the seller cause 400 seems way too low. the 7800 XT is more than that
I guess you have choices depending on what you need:
(1) Use img2img where you mask out the person but not the clothes and generate a new person but leave the clothes alone.
(2) Train a lora for your clothes and then you can easily make different people wear it by using the keyword.
(3) Use ipadapter with the source image being your clothes and sdxl will often not modify them too much.
Here's a super quick example of (3). I just downloaded an image from the internet (left image) and then used the ipadapter+ node in comfyui with a different character name. It messed up the face in the sdxl image but you can see how similar it kept the dress. There is also a weight for ipadapter that controls how strong the source image will be.
seller with a single review and highly abnormal prices
also the AI generated chinese name
used to get around blacklist
dall-e is making some great stuff, but it always goes wrong with poses, colors, details and/or background
Hi, which prompt did you use for this? I've been trying something similar but without success. Could you help me? Thank you.
I cheated, it's dalle3. 😐
this looks great! nice work
@nimble heart yeah, I told him and he said fantastic as it may now solve the issues he has getting it to work right in Linux. Thanks for the info for him.
I gained about 0.25 it/s on sd 1.5 with the latest nightly over stable torch
smol steps
comfyui no longer works right in Linux on his amd as it will say it received the prompt and wait for 20minutes but no issues in windows.
I share my workflow with almost all my images that I made for other fine-tuners on Civitai: https://civitai.com/user/masslevel. It's a very simplified one, but it does the job. It doesn't always have to be complicated or complex to make interesting images.
I got a couple of other workflows like the one I used for the Spider-Man images, but they use a lot of nodes that I customized to my needs. But if you have any questions, let me know anytime!
apparently someone hacked flash attention to barely work for inference and got 30 it/s
comfy works great for me.
Don't know, don't care just saying he isn't having it work for him. Not sure why but oh, well.
I wish I got that but 7it/s is my max for XL
make the venv manually instead of using the script
I always do for cloned torch projects and I never have any issues
His problem, not mine as I have my own set of problems with this to try and figure out.
other than that idk, pytorch statically links the rocm libs and compiler so it should be distro package agnostic. skill issue I guess
I suspect my 6ish it/s is due to being gen 3 x16 so half the bandwidth but gen 4 is not going to make me get x2 it/s
im still at like 3.14 or something for XL lol
was that 30it/s for 1.5 or something?
yea
ahhh. well I get fast on that but nothing like that
if you take 5 hours to hack flash attention to work apparently the XTX hits 30 it/s on 1.5 512
if I were on gen 4 I suspect 25-30
havent personally verified cause im not C++ fluent
yea I peak at 3.17 it/s after warmup
for XL
I peak from 6-7
using sub quad attention
SDP I only get 3.13 and the VAE bombs vram even harder
I love it when people are saying their 4090 gets 20+ it/s on XL
comfyui literally spits warnings when using SDP without flash attention
pretty cool used to not do that
YET, sdp was supposed to be brand agnostic/neutral
it is
in that it works
but yea it uses I think the doggettx or something attention as a fallback if flash isnt available
yeah, the old original one
amd already has flash for all the MI gpus so i guess we just wait for rdna 3 support....
slow now but was a godsend when it hit
i remember when I first got xformers working in auto for 1.4 and it was like brain explosion
like 50% speed and half memory
well, I am waiting for Xformers new version to arrive it is faster and less mem again for amp/ada cards.
Uses TC I guess
yea uses flash attention 2.0
meanwhile I just want flash 1...
afaik there's nothing stopping the rdna 3 cards from running it besides software
i forgot if it was gemm/wmma or what but whatever hwaccel flash uses has rdna 3 kernels in rocm
wonder if fused attention would work since its just implemented in triton afaik
does anyone have success training in XL?
yes, but not exactly easy
the tools are being updated because research is showing you should turn off one of the TEs, not both, but for now all we can do is turn both off.
I want to train a model so it paints in my art style and I can work faster on my projects, is it possible?
how do you turn them off? 
for lora?
--network_train_unet_only for lora
For both Textual inversion and lora
not sure how to input that code
is it on a notebook?
I use it locally and haven't done a TI since 2.0 so can't help there. I use Kohya_ss.
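For the "how to input that" question, the flag goes on the training command line. A hedged sketch of building such a command: only `--network_train_unet_only` comes from the chat. The script name `sdxl_train_network.py`, the `accelerate launch` wrapper, and the other options are my recollection of kohya's sd-scripts and should be checked against your install; the paths are placeholders.

```python
def build_lora_command(model_path, data_dir, output_dir, unet_only=True):
    """Assemble a kohya-style SDXL LoRA training command as an argument list."""
    command = [
        "accelerate", "launch", "sdxl_train_network.py",  # assumed script name
        f"--pretrained_model_name_or_path={model_path}",  # assumed option
        f"--train_data_dir={data_dir}",                   # assumed option
        f"--output_dir={output_dir}",                     # assumed option
        "--network_module=networks.lora",                 # assumed option
    ]
    if unet_only:
        # The flag from the chat: skip training the text encoders.
        command.append("--network_train_unet_only")
    return command


# Placeholder paths, purely for illustration.
command = build_lora_command("sd_xl_base_1.0.safetensors", "train_images", "output")
```

In the Kohya_ss GUI the same setting is normally a checkbox rather than something typed by hand; on a notebook it would be added to the cell that launches training.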
no prob
is it possible to have a separate bat file that automatically loads sdXL_v10 as the model, sdxl refiner as the refiner tab, and the sdxl VAE as VAE and sets image parameters at 1344*768?
I'm about to try some outdoor and architectural stuff with Juggernaut XL, which I guess it's supposed to be good for
Found a creature, gill man creature, horror
I'm trying to create something of a demon throne room
Hi guys, I'm new here, and I use DiffusionBee & InvokeAI that can't/doesn't allow SDXL. Where are you guys using these SDXL models? what interface, or program please?
is there a node or something so we can colour code the noodles to better trace them out?
Love it
https://civitai.com/models/136070?modelVersionId=155532 where do these go in comfy? (what dir)
it seems it is running on my cpu, will it make some difference?
Cpu is very slow, like really slow
so probably on gpu, but according to msi afterburner it is running in normal memory, at a speed of about 6 tokens/sec
i cant read english faster 😄
o.k. it uses gpu but very low, and VRAM usage is say 2,5GB
Will delete, does this look right XD Or does it have to be SDXL VAE?
sdxl vae
argh XD
i get error on comfy ui when i use this node, it said theres something wrong in the face detailer part
I just wanna know if its only me or its actually the workflow so if anybody have time, plis try it for me T-T
So heres the thing, when i set the VAE to SDXL VAE, then press "apply settings", it says "0 settings changed", and if i reload UI, the vae is set to the old one i had o0
any ideas would be appreciated much
this is for Auto
or must one change it to XL VAE every time he/she/they/them loads the UI
I love the look of XL images, but how do you quickly iterate on prompts when they take like 4x as long per image?
Does anyone have fast settings to suggest that don't degrade the image too much? e.g. 512x512 is faster than 1280x768... but the images look nothing alike and are often half baked
Unsure exactly on the workings of A1111 anymore, but I know I made my VAE options quick options so they appeared at my top bar. (user interface menu I believe is where you set those)
watch the live preview and cancel if its gonna be bad
I always build torch and xformers from source because I'm actually crazy, but if little speed ups like this is what you want I'd say that's the right thing to do
yeah, that's why I mentioned I'm insane
argh just wont switch to SDXL VAE XD
keep getting "AttributeError: 'NoneType' object has no attribute 'pop'"
I deleted repositories and VENV to no avail 
gonna spend more time recompiling than you'll gain in speed
only reason I'd ever do it is if I need different build flags from nightly like for a new ROCm version
also idk how the pre-built stable torch packages are built with flash attention on windows
¯_(ツ)_/¯
I was just looking at that seeing if I can bullshit my way into making it work on AMD lol
flash attention isn't rawly compatible with windows, they must have some kind of workaround when they're building the stable releases
given this
https://github.com/ROCmSoftwarePlatform/composable_kernel/pull/943
it might be possible now?
but im not smart enough to implement it
exLLaMaV1 is faster than V2 if you don't have flash attention, which is really dumb
2 is faster for me still
maybe linux or amd thing
main benefit of v2 is the new quant format anyways which isnt compatible on v1
maybe, my environment is fully optimized and exLLaMaV1 is about 60% faster than V2
I did something I guess
v2 has native rocm v1 doesnt
I'm talking 40T/s vs 14T/s
i get 40 t/s on v2 lol
though every time I turn it on it yells at me for not having flash
idk, V2 isn't as refined or flexible as V1 for NVIDIA users due to flash attention being the main speed up for NVIDIA
does v1 use xformers for you or something?
yeah, V1 uses Xformers instead of flash attention, and it's way faster than V2 for me
cause xformers actually works on windows lol
so if i get 40 T/S without flash wtf will i get with
I mean, Xformers IS flash attention, just a refined version of it..
No, the point is to generate a batch of like ~50 pictures trying different things, go do some chores and come back to figure out what worked before doing a higher resolution version of that image
"the problem you haven't isn't a problem I have so just stop having it" - thanks
yeah, I get half of that with V2
if you're going to do chores how is it too slow?
that's identical to what I get on V1 though
Because obviously I don't use your computer
you haven't given any context besides "slower"
are we talking like 10 minutes per image or something?
Generating ~78 images took about ~40 minutes for me
not too bad for 1024 tbh what sample count is that
This is my setting
if you're just iterating prompt I'd drop it to like 30 or less
You could do some tricky stuff, but it would require doing something like using the advanced ksampler, setting it to your preferred amount of steps you usually do, but then setting it to end at say 10 steps, instead of the full say 40 steps it's ultimately aiming to achieve.
Then just decode that latent and save.
You won't have a full complete image, but you can set 50 of those to complete much faster than waiting for the full amount of steps
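The early-stop trick above could look roughly like this as a node in ComfyUI's API format. The input names are how I remember the advanced ksampler node and may differ between versions; the cfg, sampler, and scheduler values are placeholders, and the model/conditioning/latent links are left out.

```python
def preview_sampler_node(total_steps=40, stop_at=10, seed=0):
    """Build a KSamplerAdvanced-style node that plans total_steps but stops early."""
    if not 0 < stop_at <= total_steps:
        raise ValueError("stop_at must be between 1 and total_steps")
    return {
        "class_type": "KSamplerAdvanced",
        "inputs": {
            "add_noise": "enable",
            "noise_seed": seed,
            "steps": total_steps,        # the schedule is still planned for 40
            "cfg": 7.0,                  # placeholder value
            "sampler_name": "dpmpp_2m",  # placeholder value
            "scheduler": "karras",       # placeholder value
            "start_at_step": 0,
            "end_at_step": stop_at,      # but sampling ends after 10
            "return_with_leftover_noise": "disable",
            # model / positive / negative / latent_image links omitted
        },
    }


node = preview_sampler_node()
```

Decoding the latent after step 10 gives an unfinished but readable preview at about a quarter of the sampling cost.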
is the refiner on?
It is off
I'd try DPM++ 2M Karras @ 25 samples. should be faster and probably converge a bit better
DPM2a is one of those double samplers so it renders at half speed, which means you're at equivalent 32 samples
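The "half speed" point is just a count of model evaluations. A small sketch, assuming second-order samplers like DPM2 a call the model roughly twice per step while DPM++ 2M and Euler a call it once; the dictionary keys are illustrative labels, not an official list.

```python
# Approximate model evaluations per sampling step (assumed values).
EVALS_PER_STEP = {
    "dpmpp_2m": 1,  # multistep: reuses the previous step's result
    "euler_a": 1,
    "dpm2_a": 2,    # second-order: an extra evaluation per step
}


def model_evals(sampler, steps):
    """Rough number of UNet calls for one image."""
    return EVALS_PER_STEP[sampler] * steps


same_steps_cost = model_evals("dpm2_a", 25) / model_evals("dpmpp_2m", 25)
```

So at the same step count the second-order sampler costs about twice as much per image.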
yeah, that's also what I personally use on ComfyUI
if you have VRAM to spare bumping up the batch size to 4 or so will give you like 10% more speed too
Ah, maybe I selected the wrong one earlier or forgot what setting I was using
70 images is also super overkill for just seeing how a prompt does IMO. I usually do like 10 ish for each variant.
10 at the most
I like to make grids
so I can see how different words affect each other in combination
you're on what was it a 3060 ti?
4070ti
example of my workflow (SD 1.0)
damn that seems low. thought those had more tensor cores
40T/s is low on 13b with exLLaMa?
well, idk then
i thought the 4070 ti was supposed to be comparable to a 3090 but with half vram
is 14 with context 100% loaded?
it is, when it's entirely allocated
so like 3800 context and 200 new tokens or something
with DeepSpeed I get closer to exLLaMa 1
but not as good as if I would use flash attention
flash helps most with full context
no, with just 1 message
for the price of a 4070 ti you can buy a 7900 XT
if the 5000 series will be as cucked as the 4000 series I WILL find them
meanwhile the XTX is still cheaper than the 4080
4070ti got small bus, or is it different card?
does it make that big of a difference with LLMs?
I thought the cache helped out enough
yea 4070 ti is the 192 bus boy
AMD is miles better than NVIDIA with hardware, I'd love to see the day NVIDIA gets completely destroyed by AMD, we've had enough
are the vram components expensive ?
i think ppl using llms are using image generation as well
no, NVIDIA saves nothing by cucking them, it's just a method of forcing you to buy a 4090 or a 4080TI
the rest of the 4000 series has 128bit VRAM
i show off my XTX in simracing stuff sometimes. Playing VR with 100% maxed out graphics only using 40% of my GPU lol.
all the kids in the discord server think I have an overclocked 4090 or something
for games the new amd cards definitely fuck
but the ML software is still in omega catchup mode
I'd argue if AMD would have software at the same level as CUDA the fastest AI inference is likely AMD
I hope that'll be the case one day
4080 has 256
yea its a super weird config. AMD cut costs on VRAM by using gddr6 non-X and still giving it a fat bus, cache, and clocks.
@nimble heart simracers? Great! What sim you prefer?
ah Im not into iracing or anything. I have assetto corsa but I mostly play beamng.
im omega casual. have a $250 wheel and no shifter.
with a homemade desk mount
assetto corsa isnt bad, only that the manager is a must.
I fucking hate the interface on that game. paying for the manager just to make the game usable is insane.
I dont play multiplayer really anyways so beamng has lots more to do
only thing that hurts with beam is the AI racers are garbage.
@nimble heart any idea?
no it is possible with free version @nimble heart
I was testing stuff again and now it didn't even load
yea it goes from miserable to just unfun lol
no, graphics is phenomenal 🙂
if you have exllama2 installed manually to repositories/, move it and update text gen
exl2 is bundled with the deps now
I'll just reinstall the UI
i am mostly playing AMS2 because it's the easiest for setting up multiplayer with friends, no rocket science 🙂
what model would you suggest average user with 3070 8GB?
Using llama-2-7b-chat.Q4_K_M.gguf
and the graphics mods are paid for the good versions too lol
it is not. It is all possible with free version
idk beam looks fine, has good VR/wheel support, and has lots to do so im fine with it. Not interested in buying a $1k rig and getting competitive
yes i know it
@nimble heart which torch version does oobabooga work best with?
you all are using 13b?
yea
my model is based on llama2 so i also have 4k context, but I dont measure with it filled.
i have only 8GB but can probably try it, but i think i am lost
I just wanna hook up my wheel, toss on my rift, and drive around for a while. Setting up SRP with the AI cars and visual mods and content manager and all that shit is way too much effort. spent enough time modding Skyrim I've had my fill.
is it recommended to train a realistic lora on a realistic model like realvis or just use the sdxl base model?
yes just that free thing is easy to setup, and it is like night and day. But one has to trust developers. I understand you! @nimble heart
- beam aint look half bad anymore if you havent seen it in a few years.
13B needs to be quanted down to 4bit to work on 8 gigs
so if you're benchmarking speed that'll perform way different than 8bit
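The 8 gig claim follows from simple arithmetic on the weights alone. A quick sketch (it ignores context cache, activations, and per-format overhead, so real usage is higher):

```python
def weight_gigabytes(params_billion, bits_per_weight):
    """Approximate size of the weights alone, in GB (10^9 bytes)."""
    return params_billion * bits_per_weight / 8


size_4bit = weight_gigabytes(13, 4)    # 6.5 GB, fits under 8 GB
size_8bit = weight_gigabytes(13, 8)    # 13.0 GB, does not fit
size_fp16 = weight_gigabytes(13, 16)   # 26.0 GB
```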
i made what?
I just figured out how PyTorch is building the packages with flash attention
It's possible to allow Conda to build flash attention from source
i mean you won the lap in video
wonder if you could do that for exllama
ah yea. I gave them a gnarly head start since the car is OP.
yes
Because exLLaMa V2 isn't conjoined with flash attention, if I build flash attention from source it should work with exLLaMa 2
hypothetically the GT4 looking car should be faster but the AI corners like grandma
yea what I was thinking
it checks for it at runtime
After I finish doing that I can truly compare exLLaMa 2 and 1
I think Oobabooga should be shipped with a pre-built venv so everyone could use everything right off the bat
if you add another lap even a 4cyl sedan with no aero handily beats their AI. biggest peeve tbh but not enough to redownload all 80 gigs of assetto corsa lol
https://youtu.be/WE2Y5gXYBhA
ugh.
maybe flash attention expects CUDA 12.1?
11.6 and above
Then wtf is that error?
I'll try a different torch version
I remember TRT also having issues with stuff like this
12.1 might literally be too new
I have to install the other torch version with Conda, right? It will make sure it's all compatible
Why not? That's how Oobabooga says it should be done
tf you mean why not? virtualenv is a lot easier to set up manually
nope, same stupidity
just virtualenv virtualenv; overlay use virtualenv/bin/activate.nu; pip install -Ur requirements_amd.txt
boom ooba works
repeat for comfyui, auto1111, anything
what'd it say while compiling the flash extension?
ninja spit out anything funky?
https://github.com/Dao-AILab/flash-attention/issues/604#issuecomment-1763226621
guy seems like he knows what he's doing. got it working on mixed 3060 + 4070
I'll try to uninstall flash attention, it's really funky rn
I'll test if even normal EXL2 works
anyone good at controlnet here? I have used it a lot for my datasets but never with body positioning. can someone say how I do that?
with ControlNet OpenPose
oh General a sorta tracker PR for some major rocm flash attention stuff
https://github.com/ROCmSoftwarePlatform/composable_kernel/pull/810
navi 21 and 31 pass whatever they're testing it seems. They dont even test 5000 series cards lol.
495 commits ahead of develop oofie
sheesh
Well, I hope they can pull it off and my friend told me last night he finally got Linux to run right with comfy.
nice. 19.5 it/s on 1.5 and 3.17 it/s on XL is what I get for reference. I have the latent2rgb live previews enabled so that might reduce it a few %
yeah, it will, it will
okay 495 commits sounds scary but 65,000 added lines sounds scarier
I wouldn't want to track them
is anyone using sdxl base on automatic1111 ?
using it right now for the aboves
how are the results ?
comfy is better
im a bit out of my depth here but it almost seems like rocm straight didnt have the architecture in place that newer flash attentions needed so they have to build up the entire stack from scratch with all the shit they were missing until they can finally compile the flash attention tests
I just can only take comfy for so long
I did above
what resolution @simple agate
this one is mines 😄
bru
oh, damn I know that issue
512x512? SDXL needs 1024x1024
yep
you cannot, not even in comfy, use 512
XL Can do 512 it just looks soupy, not outright fried
13b is llama 🙂
with refiner right ?
most times no but above yes
XL @ 512 for "photo of a locomotive". bad yes but not even close to that XBOX red ring of death looking picture
size of model is about 7GB refiner 6 so yes
XL 512 + the 1.5 vae looks closer
so double check that ig
actually I can probably just look at the meta
damn its a copy paste
the untrimmed XL 0.9 download was indeed 13 gigabytes iirc
i think sdxl+refiner is still 13GB
ah maybe
now im getting an error about vae 32 bit something ..
yes, --no-half-vae
thanks gonna try
k
does auto still default to 16 vae?
yup
@nimble heart He told me it was his linux install as he installed fresh but something didn't work then installed it again and all good now.
@nimble heart yo?
that with flash or just exl2
flash
it's so bad
bru
at 1024x1024 it's 2.34 it/s consistently
what quant is that? you out of vram?
what are your settings because that is what he is getting?
I have one that should work on 12 gig cards
comfyui or auto
auto I cant get above like 2.5
comfyui
VRAM is over 90% full, it's 4bit. when a 4000 NVIDIA card that isn't 4080 or higher gets above that percentage it ceases
--output-directory /tmp/comfyui
--port 7776
--dont-upcast-attention
--preview-method latent2rgb
--disable-smart-memory
ah
the not upcasting gives you like +50%
that seems crazy. from my measurements a 5bit model should fit...
another instance of NVIDIA being soooooo good
I have some quants if you wanna try them
https://huggingface.co/Beinsezii/MythoMax-L2-13b-EXL2
those measurements are without flash attention so you should hypothetically use even less VRAM
Output generated in 54.21 seconds (0.02 tokens/s, 1 tokens, context 65, seed 792262846)
nah.
yea something aint right
I would actually die of old age before it generates a sentence
I'm not even kidding
is it falling to software
no errors whatsoever, it's just NVIDIA's card is too smart for this
@nimble heart Thanks he put those in and is now at 3.19 it/s
basically exactly what I get with the default settings too
yeah, he said they were rock steady
nice. .02 higher than me lol. I have my card in silent mode instead of OC mode though
if he wants to try getting flash attention working he could possibly push that up to like 5+ it/s if the other numbers were to be believed
you saw what that did to my speeds..
not sure. those guys on the flash attn github got it working on a 4070 so start copy pasting I guess
he's on linux with ROCm totally different
- yours isnt compiled correctly clearly
okay, it's now 35T/s, way better. it just throws random words though
great random word generator imo
nvidia doesnt like you lmao
i havent had that happen since i tried running an opt model with BnB
that's mythomax-l2
that's super fucked up my quant looks nothing like that
if I can recall; that model isn't a random word generator, is it? is it because I used a GPTQ model?
try one of my EXL2 quants I know they dont die lol
hypothetically with FA2 the 5bit would probably work with 4k context or close to it
exllama2 can do 4bit gptq or mixed bit exl2
out of curiosity try reducing the exllama2 shape from 4096 to 2048
it's already on 2048
should I do 4096?
damn wtf
i mean if its a 4bit you should be able to
but it definitely wont run better I wouldnt think lol
try it
leeroy jenkins
also maybe try the non-HF version
I was already on that one
the _HF version uses different samplers
should I try the HF one?
does it work well for sdxl yet?
is this fine?
alright, now down to 9t/s, it does this with those settings
I give up, man
exLLaMa 2 is a failure, V1 is faster and it doesn't turn 13B llms into weird gpt-2 offsprings
skill issue
IDK how you managed to break it
I've converted like 3 other windows users to Exllama 2 and you're the first with it being slower
Nvidia heard you talking shit and blacklisted your card
I'll just stick to exLLaMa 1 then, it was fine
good, because I meant 100% of it
they suck
if you're using gptq there's really no difference anyways. they basically use the same kernels
To whomever helped me recently, THANK YOU'S!!!!! We did it, all sdxl babbbyyyy, Base & Ref, CN, VAE SSSSSOOOOOOOOO HAPPY its fixed WWWWWOOOOOOHHHOOOO
you have to use the exl2 specific quant to see the huge benefits
I did, it turned into a slower version of GPT-2
@nimble heart what is best preset? Simple?
my 13b llama2 based EXL2 quants outperform 30b gptq llama 1
that summin a saffa would say
on ooba? simple is the best overall, but others will be better at chat specifically
o.k. thank you
o.k. going to check it
I lied it's not on the main repo it's here
https://github.com/oobabooga/oobabooga.github.io/blob/main/arena/results.md
tldr depending on the model Midnight Enigma, Yara, or Shortwave are theoretically the best at chat
for mythomax I think midnight is clearly the best. it hates Yara for sure.
Thank you @nimble heart
don't know who that is
I probably picked it up from my best friend's wife
You
which star is closest to the earth
AI
The star that is closest to Earth is Proxima Centauri, a small red dwarf located about 4.24 light-years away (approximately 26.8 trillion miles or 43.4 trillion kilometers). Proxima Centauri is part of the Alpha Centauri star system, which also includes the brighter and more massive stars Alpha Centauri A and B 🌠
LOL!!!
did it add the star emoji itself
yes
that's funny. what model
what's the quant level
4bit?
no, one moment, i think llama.cpp
ah. for llama.cpp you could easily do 8bit and just offload the last like 6 layers onto your cpu
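The offload suggestion above can be sketched as a small helper that works out how many layers stay on the GPU. Assumptions, loudly: the helper name and the model filename are hypothetical, the 40-layer figure is for a Llama-2 13B, and the output is just the flag list for the llama.cpp command line (`-m`, `-c`, `--n-gpu-layers`), not a full invocation.

```python
def llama_cpp_args(model_path: str, total_layers: int, cpu_layers: int, ctx: int = 4096) -> list[str]:
    """Build llama.cpp flags that keep `cpu_layers` layers on the CPU and the rest on the GPU."""
    gpu_layers = max(total_layers - cpu_layers, 0)
    return ["-m", model_path, "-c", str(ctx), "--n-gpu-layers", str(gpu_layers)]


# Hypothetical 8-bit 13B GGUF file; Llama-2 13B has 40 transformer layers.
args = llama_cpp_args("llama-2-13b-chat.Q8_0.gguf", total_layers=40, cpu_layers=6)
print(" ".join(args))
```

How many layers actually fit depends on the card and the context size, so the 6 here is only the number mentioned in the chat, not a tuned value.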
you could probably do 13B 4bit entirely on GPU if you set up flash.
i am happy with what i got. That question is bit tricky
SDXL from Distillery @upper fractal
My IPAdapter setup keeps crashing when I go to 1024x1024 from 512x512!!!
@crisp owl @delicate kelp
have fun 🙂
#✨|sdxl message right here 🤭
in Auto I would recommend selecting the VAE selector (I think it's called sd_vae or something) as one of your quick settings. Then you can monitor it easily as it should appear at the top of your screen in the generating screens
And @half cedar 🥰
Yea got that alpha and did it, SD VAE 👊
what is the latest sdxl model and refiner ?
No other official SAI models, just base and refiner.
And as far as I've seen, no community based refiners.
But there's always new community models trained and released on places like Civitai
SDXL has almost everything inside. Working with prompt and just a lora like xl_more_Art you can get endless styles
True!! I like so much that style. Btw this is SDXL
yeah 🙂
Yesterday i have trained a lora but accidentally used the wrong image folder. With that lora i can generate images with a resolution as low as 392x392 without getting complete garbage. 11 it/s for the win. lol
whoahhh official nvda extension on a1111 on tensorRT
https://github.com/NVIDIA/Stable-Diffusion-WebUI-TensorRT
very juicy
Now we just need a port for comfy
Is SDXL ever going to be supported on AMD cards?
i think its pretty neat that a large company actually devoted dev hours in creating the extension lmao
did you try it?
not yet
isnt it a response to amd buying something related?
ok I'm going to install it now
damn they even got someone to do technical writing on customer support in how to install a1111 lmao https://nvidia.custhelp.com/app/answers/detail/a_id/5487
Error caught was: No module named 'triton'
RuntimeError: Couldn't install protobuf.
*** Command: "D:\SDXL\stable-diffusion-webui\venv\Scripts\python.exe" -m pip install protobuf==3.20.2 --prefer-binary
*** Error code: 1
go over it and it will work, just not sure it is faster, having 7it/s at 768x768
I get error
entry point for ?destroyTensorDescriptorEx@ops@cudnn
and also
ERROR:root:Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat1 in method wrapper_CUDA_addmm)
opened an issue
me as well 🙂
it is weird they are talking about generating button and button is exporting... i hope everything is o.k.
Odin (a bit pudgy) riding Sleipnir
what sampling method, sampling steps, cfg scale is used here?
This was using a preconditioning method.
3 steps to start, latent pushed to the main doing 20 steps, latent pushed to refiner for 5 steps, latent pushed to upscale for another 10 steps.
cfg 7.
dpmpp_3m_sde_gpu/exponential
Oh, for the love of Sleipnir, how did you prompt the 8-legged horse? That has been my quest since midjourney v4
final outcome was actually several back and forth processes between my main workflow and my img2img workflow.
So not necessarily purely prompted, though I did manage to get several 6/7 legged horses just with prompting, I just couldn't get the complete 8 without a bit of additional work
that's the one Loki gave birth to right?
Question on Refiner.
When I use it, it has to unload the XL model and load the Refiner model. This creates a lot of churn... Is there some way I can keep both loaded? Or maybe batch all of the XL work, then refine later?
Yup he is said to have given birth to Sleipnir
A1111 or Comfy?
There should be an option in A1111 which lets you choose how many models to keep loaded, I believe it uses RAM to boot it quicker
Oh I see, I can look for the setting
wonder what drives a man to shapeshift into a mare then bear the child of some demon stallion
gotta be the wildest midlife crisis
Loki has an interesting life 😆
Base model only: 19 seconds
Add Refiner: 34 seconds
Untick the option setting: 31 seconds
Hmm it's still really slow to use the refiner. I guess I'll go without the refiner until I'm ready to make a final image
Is there a way to create animations with SDXL?
lora:xl_more_art-full_v1:1 what does this lora usage mean ? use this file with 1 weight parameter ?
There was, not sure if it’s still there. Seems like it’s not worth it. https://www.reddit.com/r/StableDiffusion/comments/15bbq1p/a1111_performance_tip_for_sdxl_disable/
I use it with 0.8 and it works great. It enhances images in some way. It isn't a style (because you can specify styles in the prompt), it just enhances image composition
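For reference, the A1111 prompt tag has the form `<lora:filename:weight>`, so the question above reads as "load xl_more_art-full_v1 at weight 1". A minimal sketch of pulling those tags out of a prompt; the regex and the function name are illustrative, not A1111's own parser.

```python
import re

# Matches tags like <lora:xl_more_art-full_v1:0.8>
LORA_TAG = re.compile(r"<lora:([^:>]+):([0-9]*\.?[0-9]+)>")


def extract_loras(prompt: str):
    """Return the prompt without LoRA tags, plus a list of (name, weight) pairs."""
    loras = [(name, float(weight)) for name, weight in LORA_TAG.findall(prompt)]
    cleaned = LORA_TAG.sub("", prompt).strip()
    return cleaned, loras


text, loras = extract_loras("airbrush flowers <lora:xl_more_art-full_v1:0.8>")
print(text)   # airbrush flowers
print(loras)  # [('xl_more_art-full_v1', 0.8)]
```

The weight scales how strongly the LoRA is applied, which is why 0.8 and 1 give slightly different results with the same file.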
I haven't used A1111 since SDXL came out, so not sure what's really changed there, but at least with my card it did help when I would switch between a couple checkpoints often to reduce the loading time. Not sure what's changed since though, may work differently as there's been a lot of changes since
Containing the tales of Fenrir
Is SDXL the best model for fine-tuning? Are there any other SD versions that are better for this? Or maybe a better model? I am looking to tune it for pixel art style, so I was wondering if SDXL was the best choice to do a LoRA for
yes it is
Nice
This is NOT Tyr losing his right hand to Fenrir, but man they sure do look like good buds
I’m fairly new to the ComfyUI scene but out of curiosity, is it the SDXL models that causes the Mac silicon chip to run so slowly or is it purely down to Comfy’s poor optimisation for MacOS?
just to make sure, did you follow the section in the github where he describes the install process for mac users?
Hello guys... and good night... Hope all is well with everyone!
how do I make it more visible?
inverting the post processed input somehow improved it
https://huggingface.co/hotshotco/Hotshot-XL there's this, I haven't tried it because in the demo I saw it was maxing out 24gb of gpu vram
here the stream of demo I saw https://www.youtube.com/watch?v=vqxQY6Cu5X8
Let's try out this amazing meme/gif workflow!
If you want to set up ComfyUI before the stream, check the video below. Full installation tutorial at the beginning!
Set up ComfyUI + AnimateDiff-Evolved
https://www.youtube.com/watch?v=GV_syPyGSDY
ComfyUI
https://github.com/comfyanonymous/ComfyUI
AnimateDiff-Evolved
https://github.com/Kosinkadin...
whats the difference between text l and text g in comfy ui?
two different CLIP models in SDXL, valid to mirror your prompt to both. not much guidance, except g might be considered overall image and L is about the subject, YMMV
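A sketch of the "mirror your prompt to both" approach, written as an API-format node entry for ComfyUI's CLIPTextEncodeSDXL node. Treat the details as assumptions: the input names follow that node as I understand it, the node id "4" for the checkpoint loader is made up, and the helper name is hypothetical.

```python
def sdxl_text_node(prompt: str, clip_source=("4", 1), size: int = 1024, prompt_l=None) -> dict:
    """Build a CLIPTextEncodeSDXL node dict; text_l mirrors text_g unless prompt_l is given."""
    return {
        "class_type": "CLIPTextEncodeSDXL",
        "inputs": {
            "text_g": prompt,                                      # overall image description
            "text_l": prompt if prompt_l is None else prompt_l,    # subject-focused text
            "width": size, "height": size,
            "crop_w": 0, "crop_h": 0,
            "target_width": size, "target_height": size,
            "clip": list(clip_source),  # [node id, output index] of the CLIP provider (hypothetical id)
        },
    }


node = sdxl_text_node("photo of a locomotive")
print(node["inputs"]["text_g"] == node["inputs"]["text_l"])  # True
```

Passing a separate `prompt_l` is the way to experiment with the "g is the overall image, l is the subject" idea, YMMV as noted above.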
just how did you do this
RealVisXL is good at capturing psychedelic realms!
actual params are closeup of ancient entity of the universe, octane render, raytracing, psychedelic visuals, (non-euclidean geometry:1.2), iridescence, DMT visuals, LSD, ayahuasca visuals, vibrant metallics, reflections, fluorescent highlights, 4D, psilocybin, DNA helix, magic mushrooms, depth, crisp details, 8k, <lora:sdxl_offset_example_v10:0.8>
Im really getting into creating psychedelic visuals with stable diffusion. Would you send me the link for RealVisXL?
I appreciate it! I’ll let you know if I create anything worthwhile
Fantastic! I’ll mess around with it tonight
I have seen the universe. I AM THE UNIVERSE.
big boi boa
Trying to get a good Jormungandr, not a whole lot of norse stuff in models
Yeah this wasn't really close at all to what I want, but thought it was a nice pic regardless haha
anyone know why my eyes keep coming out looking like this?
its not the size that matters, its how you use it
wrong with XL
as to your eye problem welcome to XL base where it loves snake eyes, wonky eyes, and blind eyes.
Has anybody managed to get the new comfy ait repo working? For a supposedly better foundation the patch fails and there are missing deps 
Hi I'm still pretty new to SD with A1111, can anyone give some tips for making photorealistic images less noisy/blurry? I'm trying to create 3440x1440 wallpapers and am currently using just the txt2img tab with a base res of 1376x576 with R-ESRGAN4x 2.5x upscale. How do I know if the issue is noise in the base image from prompt, or suboptimal upscaler/config?
sorry if asking in wrong place.
"a beautiful picture of the ocean, the sunset of the sea, the sparkling light blue sea water, the stars shining, the beach, many golden twinkling crystal covering the beach, the sunset, the beautiful coloruful clouds
Steps: 40, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 335770400, Size: 1376x576, Model hash: 31e35c80fc, Model: sd_xl_base_1.0, Denoising strength: 0.7, Hires upscale: 2.5, Hires upscaler: R-ESRGAN 4x+, Version: v1.6.0"
i think the blurriness is from upscaling by far too big of a scale, and i think you should instead use img to img, just denoise it about 0.4 then use the model you used on the base image, just go up about 1.5-2 times, rinse and repeat for a couple then it should be better
isn't 4x the default upscale amount, so not 100% sure what you mean there. will try that also tho thanks
for model-based upscalers if you use any value below 4 it just scales to 4 then downscales back to what you originally picked
so if you do 2x it scales to 4x then shrinks by half
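To put numbers on that for the 1376x576 base and 2.5x hires setting from the question, here is a small sketch of the upscale-then-shrink behaviour as described above. The function name is made up, and the "always runs at native 4x first" behaviour is taken from the chat, not checked against the A1111 source.

```python
def model_upscale_path(width: int, height: int, requested: float, native: int = 4):
    """Return (intermediate size at the model's native scale, final size at the requested scale)."""
    intermediate = (width * native, height * native)
    final = (round(width * requested), round(height * requested))
    return intermediate, final


intermediate, final = model_upscale_path(1376, 576, requested=2.5)
print(intermediate)  # (5504, 2304)
print(final)         # (3440, 1440)
```

So the upscaler would be producing a 5504x2304 image only to shrink it to the 3440x1440 wallpaper size, which is the redundancy being complained about.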
which is so redundant, but I am unsure why there are not more 2x models (there are some).
probably cause esrgan was 4x and 99% of them are just tunes of that
easier to just downscale a 4x'd model than retrain the whole thing around 2x
esrgan's how old now we need a new arch
we're still in the OPT/GPT2 days of upscalers we need the Llama equivalent.
Yep, they are all ancient now.
DAT (dual aggregation Transformer) Upscalers are pretty good in my opinion.
https://openmodeldb.info/
there are some
You can as well render at much higher resolution like 1720x720 like it was said, and then just resize 2x with some native 2x resizers.
🙂
Why does my IPAdapter crash at 1024x1024? It works fine at 512x512!
What 2x upscalers does the community recommend? Cuz I've been using x4 for x2 images without understanding the implications lel 😛
Invasion of the... lips with eyes?
SwinIR? @glass notch Or you can test what you like from
https://openmodeldb.info/
yeah higher resolutions or highres fix fixes that
you could also just use face enhancement (like gfpgan or codeformer)
For up close shots where the image is stable and sharp, don't use it, it might even make it blurrier (at least the last time I used it for close ups)
Anyone know if there's a different way to pass timestamps with prompts instead of using a batch scheduler for animation frames? The webclient method for the API refuses to pass it due to the formatting of the scheduler with the extra " and : separating frame numbers and prompt text
Knock, knock... who's there?!!
A modern good looking portfolio website design ui ux
Nothing special, just thought it was kinda pretty
Hi everyone! I'm looking for information on the system requirements for using AUTOMATIC1111 with SDXL. I'm currently using ComfyUI, but I'm finding the node-based system a bit confusing. That's why I'm considering trying AUTOMATIC1111.
My system is as follows: RTX2070 Super FTW3 8GB + 64GB of RAM + M.2 7300MB + 5700X.
Does this system support AUTOMATIC1111 + SDXL?
when your lora works, but the sticker prompt fails you XD
I think thats good enough
I read RTX2060 and above is already good
what do you need 64gb ram for btw its overkill xD
Are you doing coding calculations ?
64 is amazing to have
I have created a Lora that can only produce elevation heightmaps....of terrains. Into the archive.
Sure, but why is it much better than 32gb ddr5 for normal usage
I can use SDXL (painfully but it works) with 6GB VRAM 16 GB RAM so you'll be fine. I am currently switching to comfy as per another user's advice as there is a performance cost with automatic1111 apparently.
SDXL with ComfyUI can consume more than 32GB of RAM during Vae pass.
I had only 16GB of RAM, I was getting blue screens during Vae pass, so I upgraded to 64, 2x32.
Of course, I think you were the one who helped me make up my mind. I was in doubt between upgrading from 16GB to 32 or 64. Upgrading to 64GB was the best thing I did. Now, even during the Vae pass, I can work with other apps at the same time without suffering stuttering due to lack of RAM.
Sounds good
What does Vae pass mean btw?
And does dreamshaper with comfy UI also need 64gb?