#✨|sdxl
it also adds like, 15 years to female faces
you say that like it's a bad thing
well... Civitai would love it, it makes NSFW legal no matter the prompt ;D
Better not tell the Ministry of Magic.
i have a cursed seed for sale
Tried using dpm_adaptive as a third pass, and this happened.
Which is a total fluke of course.
Thanks. XD
Pasting those links used to work!
they disabled embeds to catch you up like that
they were like "tell him we thought it'd be funny"
sometimes the artifacts really work for it
misaligned timesteps in DDIM between base/refiner introduces way more detail than the refiner knows what to do with given its timestep spacing
those people in the background should not be there lmao
they disappear when you realign them
That seems exploitable.
I guess you should make it into a feature then.
pseudo do you have tips for good negative prompts?
holy shit cursed image
A bit derpy, but Winston Churchill giving a speech at Hogwarts
honestly... it's much different from 1.5, I'd only use it for specific images and certain styles. the old mandatory negative presets aren't necessary anymore
Jesus is back... and he means business
other than maybe long neck at portrait orientation
i don't even think i use any.
I did not come to bring peace but a sword.
long negative prompts tend to lock the thing into a certain style and i like to screw around with a lot of ideas so i just tend to throw more into the positive prompt to get what i want
i think even 1.5 and 2.x are better with negative embeds trained on the model that you're using
Someone has to stop Churchill from whatever he's doing
omg jesus vs churchy
@urban fjord Jesus is ready and waiting in London. Where is Winston?
why do you want to see img2img?
it can't work with 0.9 and excessively smooths the image out, for the base model that is
makes it look kind of like airbrushed vector art
it really works well for some art styles but not for photoreal eg. Hires. fix
I mean, it works for photorealism, but limits it to shallow depth of field, only that way it looks accurate
and not all photography is like that
the obsession with bokeh is a fairly modern phenomenon
i think his dad is stepping in
This is img2img with 0.9 base and no refiner and it looks like a pretty normal base output. Maybe better result with better prompt.
I feel something is just wrong with your img2img code pseudo
0.9 loves watermarks eh
mine uses the refiner on the img2img output
I'll try hook up the refiner too.
We want i2i as a means to upscale. Take say that last gen upscale and try to i2i that while not degrading but enhancing it.
i'm also not the one that claimed it didn't work. i saw it work before, i was tryin to reproduce my good results, and now it doesn't
just reporting what i observe
Like as a whole those are two completely different images
well yeah he used .9
oh wait
0.9 nvm
i'm dumb. i don't know what strength that is
You know at first I thought it worked too. But then when actually researching and testing it, it just doesn't
It will be high denoise
i'm confused because the refiner does that just fine. what it doesn't do for me, is like, add sunglasses to a subject if i ask it to
Comfy said they were both fine when I asked @brave halo
it just changes it too minimally
If you want an upscaler you should say that and not just img2img.
so my LoRA works now, but it looks like crap without the refiner
Both.
The issue is in the i2i
Not the upscale part
Apparently GPU is quicker on some systems, but less deterministic
However on my machine they are the exact same speed
I guess if you have a weak CPU but a good GPU then use GPU, otherwise use the other one
lightning serves as a reminder: god of thunder wields the banhammer
have you tried for analog/film photos yet?
Have there been any announcements regarding the time SDXL will be released? Any launch events planned?
July 18th, AFAIK
I mostly use this prompt that I've built for SD 2.1 for film / analog portraits. It also works great for SDXL. you just need to populate the [PLACEHOLDERS] with your own values.
cinematic movie extreme close-up still of an epic scene of a [ETHNICITY] [OCCUPATION] in the [SEASON] at [DAYTIME], centered, looking into the camera, fog atmosphere, volumetrics, photorealistic, from a western movie, analog, very grainy, film still, kodak ektar, fujifilm fuji, kodak gold, cinestill 800t, kodak portra, photo taken by thomas hoepker
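A minimal sketch of filling the [PLACEHOLDERS] programmatically, assuming plain Python and no particular UI. The template is shortened here for readability, and the `fill_prompt` helper and the example values are hypothetical, not part of the original prompt.

```python
import re

# Shortened copy of the template above; the full prompt works the same way.
TEMPLATE = (
    "cinematic movie extreme close-up still of an epic scene of a "
    "[ETHNICITY] [OCCUPATION] in the [SEASON] at [DAYTIME], centered, "
    "looking into the camera, analog, very grainy, film still"
)


def fill_prompt(template: str, values: dict[str, str]) -> str:
    """Replace each [NAME] placeholder with values['NAME'].

    Raises KeyError if a placeholder has no value, so a half-filled
    prompt never reaches the sampler by accident.
    """
    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in values:
            raise KeyError(f"no value supplied for placeholder [{key}]")
        return values[key]

    return re.sub(r"\[([A-Z]+)\]", substitute, template)


# Example values are illustrative assumptions only.
prompt = fill_prompt(
    TEMPLATE,
    {
        "ETHNICITY": "Mongolian",
        "OCCUPATION": "shepherd",
        "SEASON": "winter",
        "DAYTIME": "dawn",
    },
)
print(prompt)
```

Looping over a list of value dictionaries with the same helper gives a quick batch of test prompts.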
this is your grandmother's bake sale
fractal bake
nice!
But wow, this does Escher much better than 1.5 ever did.
i love the deep fried art lmfao
yeah I had some really interesting mash-ups
oh, fractal lawn gnomes was terrifying
Churchill is really not playing around anymore.
Or rather
Abstract renderings of fractal-infused lawn gnomes --style abstract, fractal, surreal, M.C. Escher -
Because if I'm throwing two-word prompts at it, I'll get an AI to fix them for me.
Oohkay
That's terrifying!
anyone use fitCorders comfy configs?
fractal crochet - the best
needs a LoRA for people falling down stairs
just tell it to do a cartwheel down the stairs
tldr on "A score" effect on prompt?
That is not good for people with trypophobia.
I'm okay with it. More of an issue if they look like lots of round holes
bird law documents
what's the negative prompt?
Is there a Discord silver? Like on Reddit
crocheted danny devito from @urban fjord
"Why is the refiner just refining?"
well i was confused at what the person was requesting 😛
so it can do pretty good style conversion but you have to use pretty high denoising and it's kinda limited in which styles it'd do
like it can make people cel-shaded versions of themselves
it's super fast, too. it'd be interesting to optimise it for certain styles
is there any specific prompts we can learn and optimize about SDXL 0.9 for consideration?
it's going to take a little bit of work on your part, are you cool with learning a little bit of python?
so i just setup sdxlmixsampler custom node and a regular setup of base and refiner, same seed, same sampler, same steps base+refiner. but they somehow end up different and im trying to see if theres a benefit from one to the other. Left is custom node. right is base+refiner
yes, i'm familiar with python
In regards to that comment: the concern is that the base or the refiner are unable to image2image and add not subtract details. Both collapse details and ‘un-refine’ the image when it is run through again through samplers. They are unable to add the necessary noise I would assume.
For ex try genning 1024 upscale it in any way and then running it through a sampler and adding details and sharpening it up. U can play with denoise, cfg, steps, et-al and nothing really adds. Setting denoise high does give similar results but far from what the initial base+refiner does.
If you just want to change a small portion you can try inpainting.
Maybe we are just asking too much.

If you want to upscale then img2img isn't the right tool, at least not without controlnet.
ok you can use GPT 3.5-Turbo or local LLM to generate test prompts based on certain concepts. you can also have it generate test concepts, which you ask it to generate test prompts for
Again it’s not about the upscale part. It’s about further enhancing images. Even feeding the gen 1:1 to a sampler takes away from it. But unsure if control-net will be the tool to help here as we don’t have that yet and that makes an entirely different latent image to build up from.
Controlnet will come, either from SAI or from someone else.
so what's Python usage?
what's the meaning of SAI,stability AI?
Yes
Denoise?
But yeah, I still recommend using inpainting on the parts you want to enhance that way it uses the remaining bits as reference.
automating that
Where can I learn exactly how Stable Diffusion works? An overview of the different pieces of it and how they fit together. For example, SDXL has something called CLIP Text Encode which take in something called a Clip and two strings and then outputs something called conditioning. I want to know what that CLIPTextEncode does and why it has two string inputs instead of just one.
https://cdn.discordapp.com/attachments/803727923336445963/1129236647163207710/image.png
you can try the workflow @ https://github.com/SytanSD
photo of the emoji 😆 shot on 70mm film
Wait is this actually working or a coincidence...
i want to enhance all of it 
really sdxl 1.0?
Try doing it in two passes where you inpaint part of it each time.
everyone in the chat when @hard fractal shows up
Alright, it looks like there is a weak connection between emoji and the concept. That above was me testing with 🚒
photo of ❤️ shot on 70mm film
Definitely, I just need to remove emoji from the prompt.
What's up goobers
whats up !
My tiredness after taking a nap
🖼️ 🎄 🤶 📸
It forgot Santa, but yeah that is a photo of christmas shot on some kind of Camera.

If you are using comfyui, you could check nodes.py to see the code of each node.
https://stable-diffusion-art.com/how-stable-diffusion-work/ This seems like a decent starting point too. Thanks.
you, feeling refreshed
gpt asked for that
i have no involvement with this prompt =_=
dont lie thats exactly what you wanted
OpenAI: deploy carefully, don't do X or Y
SAI: Don't do this, that, or plug it into other AIs
me:
yes
i think my dream pipeline would have like 20 different models in it
do you do any finetuning?
like, on SDXL?
yes
civit?
i put in hyper realism as a prompt and sdxl i feel like just straight ripped this off someone., that signature is tooo good lol
No
I do no
Not even a bit
I feel 10x worse lmao
Gotta love having a ton of sleep disorders
break on through to the other side and trigger an integer overflow
Prompt: 🛢️ 🖌️ 🎨 🎄 🤶 📸
Do we even need to prompt with words anymore?
This is perfectly clear. Oil painting of Christmas shot on some kind of camera. Though I don't think the painting looks too oily, more like a drawing
nice, mine's in my discord bio. One of the first to do style models and then got busy lol
Oh hey, would you look at that, my messages made a logarithmic curve lol
check out my training code on github, https://github.com/bghira/SimpleTuner
the trajectory of your sleep debt if you follow my 3 step simple plan
Look at me, a statistical prodigy
Side note, cause I am infatuated
I kinda think I need these in my life
Pixie-esque time travel++;
hmm
🕒⚡ Pixie-esque time travel++;
that's the full prompt for that img and it seems to make one single character
ah, a fellow 2.1 expert. I've trained something like 450 models on 2.1 and built my app on it (~2000 trained at scale). Capitalized on the underappreciation of 2.1
everyone slept on it
looks good?
BAM!

lol i did just 200 models and those were all burnt, figuring out how to do a general fine-tune to resolve the noise
i tried a couple large scale analog photography portrait tunes but no dice and it wasnt worth my time to fuck with anymore.
what grinds my gears is that a bunch of awesome 2.1 tunes and merges just started coming out on civit now that I'm gonna phase it out
her face is awesome
shes just an amazing person
i love sdxl. look at that pic. look at it!
the velvet!
is that some Lora?
no, just a prompt
ahhh
Winston Churchill as a manic pixie girl.
the training data had an aesthetic score for each image
yes, aesthetic score, it biases images towards a certain quality-level - it's only in the refiner (not the base) and broadly you can just ignore it (ie leave it at the default value)
The new drivers from NVIDIA are interesting
I generate a decent bit faster now, but I also am proportionally slower at VAE decode as well
so I went from like 15 seconds to 13 seconds for gen, but now it takes like 4 seconds instead of 2 for decode
maybe manually forcing tiled will help
it seems like tiled might even be slower
hmmm
@high skiff are you familiar with fitCorder's configs?
I am not
yeah, ok
there is something up with the VAE for sure
base diffusion took 19 seconds for 4 images, refiner diffusion took 11, and VAE decode took 21 seconds
ok, so tiled VAE is even slower here
by nearly 2x
with non tiled its 19.5 seconds/11.5 seconds/12.5 seconds
With tiled its 19.2/11.3/21.1 seconds
meaning a solid chunk of the image gen time is just VAE
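A quick back-of-the-envelope check of the timings quoted above (base / refiner / VAE decode, in seconds). This is plain arithmetic on the numbers from the chat, not a benchmark; the `vae_share` helper is a hypothetical name.

```python
def vae_share(base_s: float, refiner_s: float, vae_s: float) -> float:
    """Fraction of total generation time spent in VAE decode."""
    total = base_s + refiner_s + vae_s
    return vae_s / total


# Timings reported in the chat for a batch of 4 images.
non_tiled = vae_share(19.5, 11.5, 12.5)
tiled = vae_share(19.2, 11.3, 21.1)

print(f"non-tiled: {non_tiled:.1%}, tiled: {tiled:.1%}")
# prints "non-tiled: 28.7%, tiled: 40.9%"
```

So on that machine VAE decode is between roughly a quarter and two fifths of the whole run, depending on whether tiling is forced.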
I believe tiled has always been slower
you mean 536.40 for new nvidia drivers?
@high skiff that is
or is there a beta driver im not seeing
that is the driver I am on
I know it has issues
its weird
made my gen speed faster, but slowed down my VAE decode speed considerably
gonna need to be a lot of optimizations in the coming weeks/months from everyone involved I think, including A1111/comfy
comfy is pretty optimized for all of this, thats the nature of how it runs, but A1111 is a dumpster fire right now for SDXL
when it isn't crashing, or you don't wanna use half of the samplers, or high res fix, or img2img, or controlnet, or... anything lol
not to mention people getting blue screens
no idea why, but it eats up every last shred of my 32 gb of ram in the process
causing even my razer keyboard lights to start lagging lmao
ah, thats inevitable unfortunately
its the size of the models
I am upgrading to 64GB system RAM because of it
0.9 pruned is actually bigger, its a very weird phenomenon
they didn't prune it right
training LoRA's for it uses way more VRAM too
like 30% more from what friends have shared
Yeah, likely people not knowing how to prune SDXL and assuming its the same as 1.5
aand just deleted it
yeah, I recommend using the full model
but yes, both will hit your system RAM hard
seems like SDXL has a 32GB system RAM minimum
a very serious minimum, considering the pc's general unusability in the last steps before image gen
which I mean, I feel like 32 GB should be the standard now
for sure
luckily RAM is very affordable now
hell, even my fast 64GB DDR5 for my new PC was about $140
its a very reasonable price
what price?
$600 for a Zotac 3090 with shipping, box, and tax included
at a price like that, how could I say no lol
that's fantastic. nice find.
i also wish sdxl could gen 512x512 images like 1.5, to then be upscaled, but it produces gibberish
would help
yeah, agreed
that might be an improvement that comes later, dunno
I still much prefer my way of doing things with 1.5, where I gen a grid of like 16 or 32 images, pick the one I like and high res fix it
So much faster and more efficient
yeah hi res fixing is superior
I prefer native 1024 so far
but at the same time, once we have semi decent finetunes of SDXL, high res fixing to 2048 should be just as good if not better
yeah, lots of improvements came out post 1.5, hopefully similar happens here
Did you try messing around with the resolution parameters
I wonder how good they work
I have seen enough behind the scenes to know that training SDXL for LoRA's and hopefully TI's will be super viable and powerful
yeah if you drop it to 512x512 it outputs crazy images
They are pretty useless
Thats a shame
the latent res sliders do very very little
whereas 1.5 you could drop to virtually anything
about as much as Xformers noise
just slight changes
tho, I have set mine to 4096x4096 which is what SAI recommended
I have found better alternatives to a lot of their recommendations, but not that one
Which parameter, arent there two
latent size and latent target, both seem to be pretty negligible
Also, new release of my UI is inbound
this time with ultimate upscale
I would be excited but Im a no refiner guy sadly lol
really?
why is that?
it is destructive for non realism, i will give that
or well, traditional art, the refiner does hurt there
So I can train loras on the base model and also its unnecessary for animation to be a perfectionist
Dont want to have to train both models even if I could
ah, make sense
Since Im gonna be making tons of loras
fair enough
yeah, it is a shame
tho from what Pseudo said, SAI seem to be killing the refiner anyways
which is a monumental misstep IMO
No refiner for 1.0? Or killing it after that
seems like no refiner for 1.0
I bet the 0.9 refiner will work fine tho
which means they better get hard to work on 1.0 base, cause it needs to improve a lotttt for realism lmao
messing a bit with fantasy realism
Love the nipple vents lol
lmao
OMG, didn't even notice that lmao
yeah, 1.0 looks bad for realism
real shame
now, do you inpaint it away or leave it cause it's hilarious lmao
Whaat are they making it all midjourney looking
Hopefully if we lower aesthetic score it wont be so bad
I don't mess with inpainting at all with SDXL
very randomly good/bad with the same prompt
low A scores are ideal for realism, thats part of why my images look so much better
5 is about the highest I typically recommend for realism
I really hope SAI is just gonna drop an anime and photorealism finetune alongside the base 1.0 model
maybe 6 if you want some artsy realism
i do like how sdxl does expressions better, had to use a lora before
that would be fucking insane, and would cement SDXL as a success on day one
but yeah, my realism quality is from a mixture of proper prompting, split text encoders, split diffusion, and precise use of the A scores and such
I will be attaching a more advanced prompt for my v0.6 release of my workflow
time to retire my corgi prompt haha
Nice thumb 🍆
aw liked the corgi one
as long as we get a way to high res/img2img ill be happy refiner or no refiner
He will live on forever lol
There isn't one?
I bet I could come up with an upscaling solution using XL tbh
yours was the first comfyui workflow i found when I started monkeying around with it yesterday
nice
my split diffusion seems to help results from SDXL considerably, which is why the dev for diffusers implemented it as their new default workflow
having a large amount of confusion at the moment
hmmm
Please do, then get back to me, but doubt u will get acceptable results
@visual glade Could I get some quick help on something? I am quite confused at the moment
nevermind!
sorry
I wrote down a value incorrectly lol
😅
what's the token limit on the positive and negative prompts? 77 tokens maximum each?
I have seen some people say that, but I have used a 156 token negative several times
156 is more than previous sd models right?
@upbeat summit your film/analog portrait prompts work well, so real
I should give some analog realism a try
I think we need a poll on whether or not to save the corgi ;o)
sharp texture and the skin is smooth
I'm trying to train face LoRAs and still not getting the right parameters for SDXL smh D:
@eternal fog hey man, are you here by chance?
i2i is not possible yet, least w good results
@eternal fog if you happen to come on, I would greatly appreciate some info for how you suggest training SDXL LoRA's using kohya, pretty please!
no idea at this moment honestly
Also, I get my 3090 on the release day lol
@visual glade is there a way to keep the path of load LoRA node, but mute its activity? like it passes through it, but i just want to disable actually using the lora
rather than disconnecting and reconnecting
you can set the strength to zero
yeah - personally hate that MJ orange-green-rainbowy look
how many imgs
sleep well!
Ps refiner does good. Still if SAI can gen that quality built in great if not it’s a neat thing.
💤
15, diverse set too.
really cool knolling images
does anyone know why, when i add negative prompts, the man changes to a woman even with the same image seed
prompt:A man walking around her neighborhood, highlight hair, detailed eyes, sharp focus, young face, perfect symmetric face, pupil reflecting surroundings, realistic skin, soft healthy skin
negative prompt:ugly, tiling, poorly drawn hands, poorly drawn feet, poorly drawn face, out of frame, extra limbs, disfigured, deformed, body out of frame, bad anatomy, watermark, signature, cut off, low contrast, underexposed, overexposed, bad art, beginner, amateur, distorted face
yeah I noticed this too. but I have not yet identified the corresponding token
remove ugly from negative 😉
oh, "walking around her neighborhood": in embedding space the nearest match for "her" is probably a female subject
try emphasizing man in your prompt with (man:1.2)
For comfy is there a node that will randomize text as well? With any type of parameters?
there is the Text Random Line node in WAS but I haven't included it yet in a workflow
Sorry I'm not too familiar, WAS?
WAS is a custom node package for ComfyUI: https://github.com/WASasquatch/was-node-suite-comfyui
Nice, python?
yeah
i tested it again, negative ugly token is okay, but the positive prompt should not use the subject her
So I could just as easily drop in a string concatenation node myself?
for word in words: {s} something
I haven't created my own node yet, but it is pretty straight forward.
it's just python
also this package has Eval nodes where you can just use python expressions (a + b + c) or simpleeval to process your data: https://github.com/LucianoCirino/efficiency-nodes-comfyui I do some manipulations with it
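A minimal sketch of what such a string concatenation node could look like, assuming the class layout that ComfyUI's built-in nodes in nodes.py follow (an `INPUT_TYPES` classmethod plus `RETURN_TYPES`, `FUNCTION` and `CATEGORY` attributes, registered through `NODE_CLASS_MAPPINGS`). The class name, input names and category are hypothetical, and this has not been checked against any specific ComfyUI release.

```python
class ConcatStrings:
    """Hypothetical ComfyUI-style node: joins two strings with a separator."""

    @classmethod
    def INPUT_TYPES(cls):
        # Each input is declared as (type, options); names are illustrative.
        return {
            "required": {
                "text_a": ("STRING", {"multiline": True, "default": ""}),
                "text_b": ("STRING", {"multiline": True, "default": ""}),
                "separator": ("STRING", {"default": ", "}),
            }
        }

    RETURN_TYPES = ("STRING",)   # one string output
    FUNCTION = "concat"          # name of the method to call
    CATEGORY = "utils/text"      # assumed menu location

    def concat(self, text_a, text_b, separator):
        # Skip empty inputs so we never emit a dangling separator.
        parts = [p for p in (text_a, text_b) if p]
        # Node outputs are returned as a tuple matching RETURN_TYPES.
        return (separator.join(parts),)


# Mapping that a custom node package would expose for registration.
NODE_CLASS_MAPPINGS = {"ConcatStrings": ConcatStrings}

# The class is plain Python, so it can be exercised outside the UI too.
print(ConcatStrings().concat("photo of a corgi", "shot on 70mm film", ", ")[0])
```

Under those assumptions, dropping a file like this into the custom nodes folder should be enough for the node to show up, which matches the "it's just python" point above.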
the system RAM usage bottleneck
I had a question about comfyui. Could I have a node to wait for user input while running the whole process? For example, I generated two images by base model and I want to select one of them for the refiner steps. Could comfyui handle this case?
SD: XL vs 2.1 vs 1.5. optimized prompts, normalized resolution
/prompt RAW photo, B&W photo, (detailed face)+, portrait of a beautiful woman posing for a picture, canon 85mm F1.2, (soft fill light), f22, dramatic lighting, trending on ArtStation Pixiv, high detail, sharp focus, aesthetic, 8k uhd, DSLR, intricate details, soft lighting, high quality,
Negative prompt: blurry, out of focus, (deformed iris, deformed pupils, semi-realistic, cgi, 3d, render, sketch, cartoon, drawing, anime)++, text, close up, cropped, out of frame, worst quality, low quality, jpeg artifacts, ugly, duplicate, morbid, mutilated, extra fingers, mutated hands, poorly drawn hands, poorly drawn face, mutation, deformed, blurry, dehydrated, bad anatomy, bad proportions, extra limbs, cloned face, disfigured, gross proportions, malformed limbs, missing arms, missing legs, extra arms, extra legs, fused fingers, too many fingers, long neck
yeah, my new PC has 64 GB system RAM just cause SDXL is so RAM hungry lol
PHROUGHE
The last generation I did looks so real
Literally a macro shot of a frog at a reptile store being misted with the light above the enclosure 😛
looks awesome
one of my favorite new realism gens
i've been trying to get a good 2k workflow
also impressed with how well SDXL can do specific cars, like this 918 Spyder
it can do a lot of cars better than a lot of specific car LoRA's out there
that is an impressive spyder
@west breach That third image is super simple but extremely expressive that's cool
nice fidelity, @west breach
thanks. just tweaking it still trying to remove the cloudy effect i'm seeing in some images
WOAAAAAAAAAAAAAH
ruined my Oobabooga, both of my Kohya's, my old Comfy UI, and my A1111 install
all of my shit is gone
sighhhh
\
@high skiff Your workflow + textual guide is amazing
I am glad you like it
so i heard refiner will be gone? so will it be baked in for 1.0?
Not sure if it will be gone yet, if it is, then it will not be in 1.0
1.0 would just be the base
i think there are some devs who want the refiner gone
I understand why they want it gone, and it makes total sense
But it will also lower SDXL's performance considerably, unless they get a huge improvement to 1.0 between now and Tuesday
Why remove the refiner? To make SDXL images less detailed so that we can put more focus into detailers or upscalers ?
biggest reason seems to be cause you would have to train LoRA's and stuff for both the base and the refiner
super stupid question: if the refiner improves quality overall, why isn't it baked into the base model? are there technical reasons?
Two years forward, all of this will have seemed like nonsense and a waste of time. The tech will probably have advanced so much and it probably will all be because of some simple little new invention made by nvidia
there are
Polytethering Neurodivergent Pipelines Through Infinite Symmetry VR Universes
from my research, it seems that the base model is made to understand composition and framing, while the refiner is made to understand fine details
the base does the image shapes and composition, the refiner cleans up the base's shortcomings
How much vram is needed for sdxl 0.9?
to run it, 8GB
some people have gotten it to work on 6GB
At 768x768 i assume?
yep works on my 3060 6gb
nope, 1024 works just fine on 8GB
Ah great
I thought at 8gb you run it with the practically slightly worse VAE for 1024 though
You do 1024?
I genuinely have no idea how the hell people are saying they are training LoRA's for SDXL on 8GB GPU's
I am using BS 1, and its using 18GB VRAM
yep
Jeez, even 2.1 can't do 1024 on 6GB VRAM
Then the model is extremely optimized
he said 768 not 1024 for the 6gb
@boreal bough @golden quarry @eternal fog Sorry to ping you all, just wanted to ask if you guys have any advice, cause the fact that BS 1 is taking 17-18GB VRAM in both Kohya and Derrian's UI is insane, and makes 0 sense
6GB can likely run 1024x
it will just pool and be slow
and again, I think they use the tiled VAE under 11gb, so the results aren't the same
are you running --medram --lowram ?
I run 8x 1024x without having tiled VAE on a 10 GB 3080
Imma be entirely honest... I have no clue. It could be a lot of things, drivers, windows version, so on
i'm using comfy and am not using any options
this is the lowest I have seen at the moment
Wtf is a tiled vram? Is it new?
and at this speed, training LoRA's is impossible
Mind linking me what is comfy?
I can be waiting 9.5 hours for a 4 epoch LoRA
Even if it takes days, that's not really 'impossible'
Yeah that looks like nvidias good old fuck you that happened when they released drivers for the 4060
it is when settings have yet to be found, and 4 epochs is about 5% of a real LoRA, which means a single one would take weeks
comfy UI
👆 this
Sweet thanks
@golden quarry I think something is actually wrong with my drivers, so I will be doing a full rip out and reinstall with DDU
because I noticed today that my GPU idles at max clocks as well
110 watts idle
where can you see the gpu watts?
Go to the command center and see the gpu
I use HW info 64
no, thats not accurate at all, and there is no power draw in there
It should show average GPU power draw in watts, per ~5 sec by default
Mine has
And yeah it isnt accurate
I had the same problems when I implemented my own workflow. But when I was using kohya_ss I could run with batch size 12 on less than 24GB
so kohya_ss seemed to be extremely memory efficient for me
yeah, and now I am trying to figure out how to also run a LoRA on 10GB VRAM
cause people here have said they have done it
but here Kohya is, using 17 GB VRAM for BS 1
I would have to check how much vram it needed for me
not 17GB for BS one, I would assume lol
but I cannot imagine its 17GB for BS1 if its still < 24GB for BS 12
You need ~8 for lora
~12 for 1024 training on 2.1, not sure on 0.9
I have talked to multiple people on here saying they have done SDXL LoRA's on less than 10GB VRAM
yeah, I totally believe that
but nobody seems to know why thats not the case for me
let me check in a few minutes how much vram it needs for me
My new GPU will be here on Tuesday, then I won't have to worry
Maybe you have some other stuff that take vram, lower your resolution and close everything else
but I would still like to play around now
Also use xformers
I only had 0.2 GB VRAM used
I'd be worried if my setup uses 2x the Vram it does for others even with a bigger gpu
nah, I found results with batch size 12 much better than with batch size 1, so its still worth getting vram down
nothing I would be running would be using like 9 extra GB VRAM
Should be good i think
What are your launch settings, and are you using a1111?
Aaa that might be it
no, kohya is super efficient
Mmm maybe not in loras idk
it is
Try to run in a1111 too
its what everybody has done who said they got it for less than 10GB VRAM
If it works then cheers, if not then that just sux
no, i won't touch A1111 with a stick lol
Well i cant force you
alright, just nuked my kohya training
do you have the same torch etc. version as them? iirc there is a difference in efficiency with current implementation
is that it? @high skiff
yeah!
yup, all the same
Ah wait, are you perhaps using 4xxx gpu? I remember they had a problem with extreme vram usage relatively to 3xxx and lower
really? X_x
Then im out of ideas, sry
if you are on Windows I'd try setting it up in WSL2 and checking if the ram usage is the same
There's something extremely odd with SDXL training at the moment.
I've had it work perfectly fine, even with batch 2. Then I'll come back the next day with 0 changes and it will OOM. Makes no sense.
WSL2 is hell, and I will not be using it again lol
last time I did that, I lost over 80GB of files in a failed training when it nuked a file structure
Didnt see you for a while, how are you?
how is it hell? It's easier and cleaner to setup than the mess of .bat files for Windows
is there any diff in ram/speed when using linux vs windows for sdxl
yeah, this all makes 0 sense
thirsty when it's running 😄
nevermind then I guess, I'll just go back to 1.5
I get bored for a bit and then come back :). I'm good how are you.
seems about right for a 4090
small due to different drivers (you are using the linux driver) and APIs but no significant penalty for using WSL
and some stuff are just optimized better in Linux which you get some of in WSL
at least for 1.5/2.1 though, I only assume it's the same for sdxl but there's no reason it shouldnt be
Im good, at the end of exam season
Im going to try to make a memory implementation soon for chatbots
Happy to see you back
@high skiff Okay, you are right. It takes 16GB for batch size 1 and 22 GB for batch size 12
wild ^^
yeah, I can only assume thats a bug
dunno. When I implemented training myself I always got OOM
you have to do a lot of tricks like gradient checkpointing, mixed precision training and so on to get it low enough
like even if you just train a lora it has to store all the intermediate activations for backprop, or do heavy gradient checkpointing. So memory usage is not automatically much smaller than when you do full finetune
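A rough way to read the two measurements reported above (16 GB at batch size 1, 22 GB at batch size 12) is to fit a straight line through them. This assumes VRAM grows linearly with batch size, which is a simplification; the `fit_linear_memory` helper is a hypothetical name.

```python
def fit_linear_memory(bs_a: int, mem_a: float, bs_b: int, mem_b: float):
    """Fit mem = fixed + per_sample * batch_size through two measurements."""
    per_sample = (mem_b - mem_a) / (bs_b - bs_a)
    fixed = mem_a - per_sample * bs_a
    return fixed, per_sample


# Numbers from the chat: 16 GB at batch size 1, 22 GB at batch size 12.
fixed_gb, per_sample_gb = fit_linear_memory(1, 16.0, 12, 22.0)

print(f"fixed: {fixed_gb:.2f} GB, per sample: {per_sample_gb:.2f} GB")
# prints "fixed: 15.45 GB, per sample: 0.55 GB"
```

Under that assumption almost all of the usage is a fixed cost that does not depend on batch size, which would explain why dropping to batch size 1 barely helps on a 10 GB card.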
I'll be trying again tonight. I just don't understand how it runs 1 day at batch 2 1024*1024. And then the next day it OOM.
hell even Kohya says SDXL should LoRA on 12GB VRAM just fine
Maybe picture complexity?
If you made a model twice the size of SDXL, I'm sure you'd get the same picture quality without the refiner.
hm, maybe its because they add text encoder lora training?
But then it wouldn't fit in 8GB.
same issue even if you disable it
Or picture weight
Nah I did a proper control. It's the exact same dataset and the exact same training settings.
Mmm
yes, but maybe the bug is in there. Like they always create the loras, even if you disable it
they just remove the loras later. It's implemented a bit weirdly
I have a theory that the trainer is fucking up the settings input into the script and it's not configuring properly.
thats my assumption as well
I noticed the other day I asked for 100 epochs and it gave me 80
It had randomly added a max steps entry in I didn't ask for
do you cache text encoder outputs?
Yeah
maybe they forgot to move the TE back to cpu
ok, and now training 1.5 is telling me I have no dataset
pretty sure that kohya is just in flames and all fucked up right now
Well I don't train the 1st text encoder and I did notice the logs were presenting like it was trying to. So I think that's what's being buggy.
what the advantages of that model?
I need to look at it after work and see if I can figure it out. I'm using the GUI so I might just manually use the scripts instead.
windows or linux? if windows, there tends to be random background app usage that gets in the way if you're right at the edge (on Linux you can track & control that better, and tends to be less of an issue in the first place)
my own windows install idles between 0.5 and 1 GiB idle VRAM usage depending on the day, lot higher as soon as i open anything
It's on windows, but the idle memory usage is around the same. And I've also tried using the newer Nvidia drivers that don't OOM but instead use insanely slow system ram and the usage in total balloons up to something like 15GB, so there is something odd going on.
I'll try it again tonight, but I'll close everything else down and see if it makes any difference.
my idle usage on my GPU is like 0.2-0.4GB
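For anyone who wants to compare idle VRAM numbers like the ones above, here is a minimal sketch that reads them from `nvidia-smi`. It assumes an NVIDIA card with `nvidia-smi` on the PATH; the helper names `parse_nvidia_smi` and `query_vram` are made up for this example.

```python
import shutil
import subprocess


def parse_nvidia_smi(csv_text):
    """Parse 'used, total' rows in MiB (one row per GPU) into (used, total) int tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        used, total = (int(part.strip()) for part in line.split(","))
        rows.append((used, total))
    return rows


def query_vram():
    """Return [(used_mib, total_mib), ...], or [] when nvidia-smi is not installed."""
    if shutil.which("nvidia-smi") is None:
        return []
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_nvidia_smi(out)


if __name__ == "__main__":
    for idx, (used, total) in enumerate(query_vram()):
        print(f"GPU {idx}: {used} MiB used of {total} MiB")
```

Running it once before starting the trainer gives a baseline, so you can tell whether background apps are eating the headroom.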
@high skiff i have a workflow, can you please tell if the refiner is being used as a base?
it's using the base as the base
Mine's similar to mcmonkey's where it will show about 0.5 to 1. But I always have discord, steam and many Firefox windows open.
That looks like you're using the wrong noise scheduler for the chosen sampler.
@ionic dragon share a screenshot of your sampling settings.
is there something wrong with the workflow?
as some images are very noisy
you can just drag this into comfy
I'm not on my pc so I can't lol
oh wait
dpmpp_2m_sde_gpu, normal(sampler)
There's your problem
i am doing a comparison
That sampler needs karras
so just trying out all the combinations
If you use normal it does what yours is doing.
anything but normal for the first stage
so there's nothing wrong with my workflow?
Some samplers are fine with normal. The sde ones don't like it though.
ok
No you've just picked a noise schedule that sampler doesn't like.
i am just doing a comparison
so i wanted to verify if everything was right
before i can share all the outputs
Change it back to karras and it should be fine.
this is the same one but with karras
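For context on what "karras" means here: it refers to the noise schedule from Karras et al. (2022), which spaces the sigmas so that more steps land at low noise levels. Below is a minimal sketch of that formula. The `sigma_min` and `sigma_max` values in the demo are illustrative assumptions, and this is not claimed to be ComfyUI's exact code.

```python
def karras_sigmas(n, sigma_min, sigma_max, rho=7.0):
    """Return n noise levels from sigma_max down to sigma_min (Karras et al. 2022 spacing)."""
    if n < 2:
        raise ValueError("need at least two steps")
    min_inv = sigma_min ** (1.0 / rho)
    max_inv = sigma_max ** (1.0 / rho)
    # Interpolate linearly in sigma^(1/rho) space, then raise back to the rho-th power.
    return [(max_inv + i / (n - 1) * (min_inv - max_inv)) ** rho for i in range(n)]


if __name__ == "__main__":
    # Illustrative range only; the real values depend on the model.
    for sigma in karras_sigmas(10, sigma_min=0.03, sigma_max=14.6):
        print(round(sigma, 4))
```

The printed list drops quickly at first and then crowds together near the bottom, which is the spacing that distinguishes it from an evenly spaced schedule.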
@eternal fog do you know any tool which can stitch images?
so i can make a grid like the one in a1111?
but manually?
There's custom nodes that can do x/y grids
This is the first result on Google
https://github.com/hnmr293/ComfyUI-nodes-hnmr
here is my 'economy' workflow
@high skiff economy dog
lololol
alright
so it looks like my new dataset for my Avatar LoRA is a huge success
results from just a 15 minute training on 1.5
so when I can get SDXL into action, this should be amazing
can SDXL run on 8gb in comfyui?
it can run under 6gb
can you let me know, did you need to increase your page file size? or are you using a special configuration?
it doesn't need to offload for me
no special config
if you are using comfy, make sure to update it
there was a fix that was implemented a few days ago to reduce vram usage
training an avatar lora?
yeah, testing my dataset for SDXL
i am not looking for comfy nodes, i have bunch of images, i want to stitch them manually
thanks, somehow it works now! thank you
glad to hear
beautiful!
the 24k gold workflow 
thx 🙂
.
is this the stock 0.9?
yep
@molten gull you're bringing bangers again
i m wildly experimenting again 🙂 back at stable diffusion, my midjourney experiment is over (for now)
it's not really doing what i want it to do yet though 🙂 stubborn little program 🙂
really great images anyway 🙂
I don't think I've found higher batch sizes to be better when it comes to 1.5 lora. Dunno if it's different for sdxl because I haven't done a lot of training. But I haven't seen much of a reason to go above batch 2 usually
SDXL trains extremely fast. Like a few epochs and it's done. So increasing batch size doesn't cost much but hopefully makes training more stable
i m not sure it trains fast
but I haven't found time to experiment a lot. My current workflow works perfectly.
1.) Clip Interrogator to find best fitting token
2.) Finetuning of OpenCLIP embedding with Textual Inversion
3.) LORA for OpenCLIP text encoder
4.) LORA for unet, everything else frozen
Do you use kohya?
the only step that takes a long time is the textual inversion. But you can probably skip that step. I just like to have a simple single trigger word as embedding
Lora
I'm curious, does the voting we do also improve prompt understanding?
not sure at all even ... it's all very frustrating still, i did a LORA with 400epochs and 100 pictures (ran the whole night) and it seems to be better than a small trained one
and yes, the implementation by kohya
i'm looking at some images at #1100484581037195384 and they seem to follow the prompt far far better than they did in 0.9
I thought kohyas implementation of training the te was a bit broken
might be, dunno. I changed the code a lot
cause it didn't support many things I wanted to have, like training TI + TE, training TE and Unet separately, using different ranks for TE and Unet and so on
I mean, I usually train 1600 steps at batch 2 when I train 1.5 lora, which isn't really that many steps in the grand scheme of things
I see
Pretty sure kohya does allow you to set different ranks for unet though
is it possible to mix sdxl with normal models?
like, start an image with a 1.5 model and finish it with sdxl refiner
maybe if I set them via the blocks, I don't know
yes
All in all, not something I can really implement in my ui though. This is why I kind of want to just build my own trainer to put my ui to
F, i like this style 🙂
Also yeah, the blocks is how you do it
I just used the "set dim from weight file" option and initialized an empty lora myself
the whole workflow was very hacky ^^° Maybe I find time and motivation to make everything clean and reusable and then make a pull request for kohya
Fair
My goal with the ui was to make it easy for the end user as much as possible
So I don't usually implement things that require editing sd-scripts
yeah, I think it's always cool having both. An intuitive UI and a flexible customizable API
what did you make @golden quarry ?
dude, this LoRA worked so good on 1.5, I can only imagine how good it will be when I can play with SDXL
I can also get way more images in my dataset
The lora easy training scripts.
A ui made with pyside6 to be as easy as possible to use
https://github.com/derrian-distro/LoRA_Easy_Training_Scripts
Yeah. I'd love to be able to do that, but my knowledge of SD in the technical aspect is unfortunately not the best. I understand it on a decent level, but I haven't really looked into implementation details
Thanks! A lot of effort goes into it
say, what do you suggest for learning rate, unet learning rate and TE learning rate ? and why ? (and how do you know?)
I thought I would understand it lol. But the fact is: if I implement training, inference workflows and so on myself, then it needs 10x more memory than when I use kohya_ss. There are so many optimization steps necessary and pytorch is not that easy to debug. So I just use kohya_ss now and change the code to get my own customizations in there
Fair enough, I'd like to be able to understand how to do it from scratch though
super low. 5e-5 for both and I train them separately. However, I'm pretty sure you can do larger learning rates. It's just that training the loras went so fast that I had the time to do low LR and watch the progress
can i stop and continue a training somehow ?
For sdxl, not sure. It seems that the way I was training is impossible for others as I was able to do stuff seemingly nobody else was able to
yeah, but I never did that. Actually, I just train a bit longer until it clearly overfits, then I choose the best checkpoint
In my tests for sdxl, I used 1e-3 lel
loras are so small you can just save every epoch
i m using 2e-4 at the moment, doesnt seem to be too bad
oops, sorry, I meant 5e-4
can i train something to 200 epochs today, and continue to 400 epochs some other day ? can those be continued ?
5e-4 is what I use on 1.5
If you set the save state yeah
Just keep in mind the folders that get created are not small
save state is something different than save freq and save every 10 epochs ?
so i need both, yes ?
I think just save_state
i want a .safetensors every 10 epochs
you can also continue training given an existing model. Thats totally possible. It's just that the optimizer has to adapt its learning rates and momentum and stuff
so training might get worse for the first few steps
do you set "save last state", too ? i m not really getting what that one does different
save_state is storing the "training state" such that you can continue training another day as if nothing happened
and how would i continue on an existing model ?
Save last state is so that you don't have 400 save state folders lying around
It will save the last x states
thats good, so i do save state, save last state and there epochs: 1
i only want the last of whatever is trained so far
the other is storing the model itself. It does not contain the training state. So if you continue training from a model then the optimizer will have a noisy gradient for the first few steps and training gets worse for a moment. But it will stabilize after an epoch
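The difference between a saved training state and a saved model can be sketched with a toy optimizer. This is not kohya's code: `MomentumSGD`, the quadratic loss and all numbers are invented for illustration. It only shows that resuming with the optimizer state reproduces the uninterrupted run exactly, while resuming from weights alone follows a different trajectory until the optimizer rebuilds its statistics.

```python
import copy


class MomentumSGD:
    """Tiny stand-in for a stateful optimizer: the velocity is its training state."""

    def __init__(self, lr=0.1, beta=0.9):
        self.lr = lr
        self.beta = beta
        self.velocity = 0.0

    def step(self, weight, grad):
        self.velocity = self.beta * self.velocity + grad
        return weight - self.lr * self.velocity


def grad(weight):
    """Gradient of the toy loss (weight - 3)^2."""
    return 2.0 * (weight - 3.0)


def train(weight, optimizer, steps):
    for _ in range(steps):
        weight = optimizer.step(weight, grad(weight))
    return weight


# Reference: 20 uninterrupted steps.
w_full = train(0.0, MomentumSGD(), 20)

# Stop after 10 steps and write two kinds of checkpoint.
opt_first_half = MomentumSGD()
w_half = train(0.0, opt_first_half, 10)
full_state = {"weight": w_half, "optimizer": copy.deepcopy(opt_first_half)}  # like a saved state
weights_only = {"weight": w_half}                                            # like a saved model file

# Resume both ways for the remaining 10 steps.
w_resumed = train(full_state["weight"], full_state["optimizer"], 10)
w_restarted = train(weights_only["weight"], MomentumSGD(), 10)

if __name__ == "__main__":
    print("uninterrupted:", w_full)
    print("resumed from full state:", w_resumed)
    print("restarted from weights only:", w_restarted)
```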
how exactly do i load it again?
The resume variable
Also within the save args
Assuming you are using save states
i will use save states, yes
i cant see a resume though
ah, it's in "saving args" 🙂
shouldnt it rather be in "Loading Args" ?
say, when i do the resume thing, can i change the learning rates before i "start training" again ? or does it have to be the same as before ?
Well, resume is the only loading arg
in theory its totally possible, but I do not know if the kohya script supports this
It used to at least, recently it seems like they changed it
can you explain how the "sample args" section works please?
It's annoying to say the least
Sd-scripts should have a section on how to format the txt file that gets passed in
will it allow me to make some sample images of what the state of the lora is, given some prompts to it ? do you have an example text file by chance ?
https://github.com/darkstorm2150/sd-scripts/blob/main/docs/gen_img_README-en.md you can find it here under prompt options
so i do a file like "lora_training.txt" and put something in there like:
"portrait of an old woman in style of malicor_old_woman --n boring image --w 1024 --h 1024 --d 3245098"
and load this to my training of my lora "malicor_old_woman" ? and get some examples that way ?
Pretty much
could i put in multiple lines in that text file ? and have more than 1 sample per epoch ?
yes
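A small sketch of generating such a prompt file with one sample per line. It only uses the options that appear in the example above (`--n`, `--w`, `--h`, `--d`); the helper names and the second prompt are made up, and the linked README is the place to check the full option list.

```python
from pathlib import Path


def sample_line(prompt, negative=None, width=1024, height=1024, seed=None):
    """Build one line of the sample prompt file from its parts."""
    parts = [prompt]
    if negative:
        parts.append(f"--n {negative}")
    parts.append(f"--w {width}")
    parts.append(f"--h {height}")
    if seed is not None:
        parts.append(f"--d {seed}")
    return " ".join(parts)


def write_prompt_file(path, lines):
    """Write one sample prompt per line."""
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


if __name__ == "__main__":
    lines = [
        sample_line("portrait of an old woman in style of malicor_old_woman",
                    negative="boring image", seed=3245098),
        # Hypothetical second prompt, so each sampling pass produces two images.
        sample_line("malicor_old_woman reading a book in a garden", seed=3245098),
    ]
    write_prompt_file("lora_training.txt", lines)
```

Keeping the seed fixed across lines and across epochs makes it easier to see what the training changed rather than what the noise changed.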
if you use the kohya_ss gui, though, there is a bug in the sampling. It's fixed in the kohya script
Note that if you edit the file during training it will register changes, at least with Kohya
It is useful if you notice you forgot things you wanted sampled during training, and I guess you can also remove prompts that are just taking time and don't seem useful anymore.
I guess its using https://github.com/kohya-ss/sd-scripts/tree/sdxl as base, too?
amazing piece of software, actually
@golden quarry where does it save the samples to ?
just make sure that your kohya sd-scripts is up to date. The SDXL branch is still being worked on and they do fixes every day
I guess I should move from the GUI version to the standard version then. Hopefully I can just use the commands from one in the other without any issues.
Yes, but that's a gimmick, because 1.5 and XL latent images aren't compatible and the Refiner has similar dataset limitations as the SDXL Base.
In order to do that, you need to generate an image with a 1.5 model, upscale it to the SDXL resolution with an upscaler or hires fix, send the output to the VAE Encoder, send the latent to the SDXL Refiner, then use the VAE Decoder to get your output.
That extra encoding and decoding takes a lot of time, as well as SDXL refiner loading time, and the SDXL's Base does generate a 1024x1024 image faster than 1.5 would.
Also, the Refiner can easily ruin a moderately NSFW generation, if that's what you want to use your Refiner on, because it doesn't have anything too spicy in the training dataset. Only the mild stuff. You can use masks and whatnot, but that overcomplicates your workflow, tremendously and unnecessarily. Even if you don't care about that, if your 1.5 model is heavily biased towards a certain style, chances are it will generate the details of this style better than the Refiner would, because it was fine-tuned to do just that. Unlike SDXL, which is a general-purpose model. Undoubtedly superior to 1.5 overall, but not specialized enough to compete with some 1.5 fine-tunes yet.
So any solid reason to use 1.5 as your base kinda disqualifies the Refiner, and any good reason to use the Refiner strongly suggests using it in combination with SDXL Base, which is a very good model, after all.
the thing crashed :/
ah okay, thanks!
I'm using sd-scripts as a base, not kohya-ss
I think the GUIs are not interfering with the original cli version. You can just pull the up-to-date fixes from the sd-scripts
yes, I mean sd-scripts
And I do try and keep it up to date
Pretty much every day after I get back from work I've checked to make sure there were no updates on the SDXL branch, mainly because I want things to be easy to do!
can you have a look at that error message i got by chance ? @golden quarry ?
So just clone and pull from sd-scripts and copy all files over?
If you are using my scripts, you can go into the sd_scripts folder, open up a cmd, then type git checkout sdxl then git pull
I'm not using yours but I guess I should move to yours.
should work for the other, too
I'm dying to see deliberate get upgraded to sdxl
i really wish you would be less prescriptive, it isnt a 'gimmick'
tbh you're just doing it wrong, the refiner pass should happen before 1.5 touches it
i also pass the latents directly from 2.1 to sdxl without the vae encode decode steps you described.
basically mostly incorrect except about the nsfw part
more 0 dropout training? =p but why
Deliberate gave a lot more precision to 1.5, so I'm hoping it makes SDXL even more flexible
base models are by definition as flexible as it gets, making a model more precise is by definition, reducing flexibility
Have you used deliberate before?
you have to prompt the base model very precisely to get full access to the parts of its data distribution that are "less obvious"
uhm yeah 🙂 i even ported it to the Diffusers style a while back
Deliberate is a very old model, uses very old training techniques
try epicRealism if you want something better, as it doesn't need negative prompts
I mostly used deliberate with artists to get art styles
It worked well with producing very good results for 1.5
i see
christ, why does sdxl take any word like chubby or thicc and make them 400lbs
helluva bias
the restore doesn't reaaaaaaaally work yet @golden quarry , it says it continues at state50, but it starts at step1 again, not sure if it just names it wrong, could be the case
it may just not progress the update bar to the right pct
Something in SDXL seems to pick up on things and exaggerates it, like my misaligned eyes issue.
mine does and it is a pain in the ass. i dont blame him for avoiding progress bar shenanigans
its already in the base lurking just beneath the surface most likely in the text encoder
when you provide those features it suddenly knows how to express them in the u-net
Maybe try plus sized
quality of data has always mattered
idk why people assume it wont with sdxl
probably because of mcmonkey...
Definitely, I just found it strange that it took it a lot further than both my training images and the features already present in the base-model
I feel like prompt weights and negative prompts are what will fix these shortcomings
after all my 2.1 experiments, i'm not surprised one bit by that
that's a weaker effect in SDXL now
Well, use bigger numbers
Empty negative prompt means, during sampling, on each sampling step, create two denoised pics - one using the prompt, one using the empty prompt, and combine them into one, adding first and subtracting the second.
When the Negative Guidance setting kicks in during sampling, you stop producing two denoised images (from prompt and from neg prompt), you stop combining them into one, and instead you produce just one denoised image, using the prompt.
That is the reason why results are different.
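The two cases described above can be sketched as follows. This is a toy illustration, not AUTOMATIC1111's actual code: `predict` stands in for one U-Net call, latents are plain lists of floats, and `skip_negative_below` is an invented name for the sigma threshold below which the negative pass is dropped.

```python
def cfg_combine(cond, uncond, scale):
    """Classifier-free guidance: start at the negative result and push towards the positive one."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


def guided_prediction(predict, latent, sigma, prompt, negative, scale,
                      skip_negative_below=0.0):
    """One sampling step's denoised estimate, with the optional 'skip the negative pass' shortcut."""
    cond = predict(latent, sigma, prompt)
    if sigma < skip_negative_below:
        # Late in sampling: a single model call, using the prompt only.
        return cond
    uncond = predict(latent, sigma, negative)  # second model call
    return cfg_combine(cond, uncond, scale)


calls = []


def toy_predict(latent, sigma, prompt):
    """Fake denoiser that records its calls; an empty prompt gives a slightly different output."""
    calls.append(prompt)
    offset = 1.0 if prompt else 0.0
    return [x * 0.5 + offset for x in latent]


if __name__ == "__main__":
    print(guided_prediction(toy_predict, [2.0], 5.0, "a cat", "", 7.0, skip_negative_below=1.0))
    print(guided_prediction(toy_predict, [2.0], 0.5, "a cat", "", 7.0, skip_negative_below=1.0))
    print("model calls:", len(calls))
```

The first call runs the model twice and combines the results; the second runs it once, which is where the speedup and the changed output both come from.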
@visual glade what the heck, is that how negative guidance works in ComfyUI too?
when you go large enough it simply breaks the results
can we use prompt as a filename?
This is where negative prompting smaller can work around this
On Linux? Mostly yes, the only illegal character is /. Doesn't mean you'll enjoy it.
On windows? Mostly no, the list of disallowed characters is huge.
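If you do want the prompt in the filename, a minimal sketch of cleaning it up first, so the same name is safe on both systems. The function name, the 100-character cap and the `.png` default are assumptions for this example, and it does not handle Windows reserved names such as `CON` or `NUL`.

```python
import re

# Characters Windows rejects in file names, plus ASCII control characters.
_WINDOWS_FORBIDDEN = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def prompt_to_filename(prompt, max_len=100, ext=".png"):
    """Turn a prompt into a conservative file name."""
    name = _WINDOWS_FORBIDDEN.sub("_", prompt)
    name = re.sub(r"\s+", " ", name).strip(" .")
    # Windows also dislikes names that end in a dot or a space.
    name = name[:max_len].rstrip(" .")
    return (name or "image") + ext


if __name__ == "__main__":
    print(prompt_to_filename('astronaut in a teacup: "wide shot" 4k/8k?'))
```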
that's how my trainer works
in comfy?
seems that it goes from zero to 60 in nothing at all. 'thicc' is a normal looking person, 'plus-sized' is unhealthy
but i dont know how to use, prompt as image filename
Negative prompt out thicc with a small weight to balance it out
of course not
"thicc"
or whatever you use
yeah it seems incredibly inefficient and i'd never ever heard of it being done that way. i had to look at the code to verify. he is correct. but i have, no idea why he does that.
it just needs to be pushed back a bit
ok, i think you simply need to run a local copy of SDXL with prompt weighting so you can be assuaged that it will not help
I'm just using stable horde via artbot
when you put a weighted term into the positive and negative
Is there a repo with weighting implemented?
AI: "you're on your damn own"
Compel has an open pull request.
Or is it your own impl wip?
ah
(side note, did anyone ever add it to IF? I tried to do it in Compel but got too busy)
DeepFloyd tokenizes by letter, i have no idea how that impacts Compel.
most of the decisions in that UI seem to be "it makes better anime images" when that's a very bad metric
this is actually super trippy lmao
man you just described the most popular SD 2.x trainer
i wouldve assumed this was your highest priority with your fox girls
here are more for you @azure oxide
oh my god i am grooving now
prompted for an astronaut in a teacup
you can kind of see where it was... hoping to go with that
i always wondered how people could make great models with base 1.5. Like if base 1.5 can make such great models, sdxl will be insane
by overfitting it
Do you know why weighted positive and negative do this?
i think it's a logical fallacy to assume that just because an older model had good fine-tunes that SDXL ones will be better
people didn't pick up fine-tuning 2.x because it was very hard, and everyone blamed OpenCLIP. and guess what? SDXL has an even bigger OpenCLIP, plus the original CLIP-L from 1.5
so far, a lot of the successful trainers on 1.5 can't even load into SDXL training without first upgrading their equipment
it just seems to be the text encoder fighting itself in stupid ways, but there's two text encoders to fight each other and the u-net now
These are pretty and trippy
what's funny is, if you say nothing at all to the prompts, it behaves nicer
side thing, in comfy --gpu-only is a good bit faster than the high vram line
that's system-dependent
ah
which CPU do you have?
i wonder if it's the lack of AVX512
I thought gpu-only was memory related only
I don't really understand what you mean. Negative prompts work like classifier free guidance, just that you replace the empty prompt by a negative one
as a line
does avx512 help in diffusion?
but you call the unet twice, yes
--gpu-only forces everything on the GPU and disables shifting of stuff between cpu and gpu
@rustic garnet i'm repeating AUTOMATIC1111's words about how his negative guidance implementation works, which is that
normally if you have a decent CPU the text encoders are run on the CPU because that's actually faster than shifting it to the GPU, running it and shifting it back
Sharing this interesting transformation after looping the 20-step base+refiner process multiple times. (20,40,60,80,200)
interesting, so it still has to shift the encodings back to CPU even in gpu-only?
no, gpu-only only uses the gpu
were you saying 'that's actually faster DESPITE shifting it to the GPU and shifting it back'?
the shifting back only happens in the other vram modes
You sent the buildings forward in time lol
dunno, in AUTOMATIC1111 own wiki it is described differently
ok I think I get what you're saying. Makes sense. 5 year old CPU...
i see he edited it now
-_-
LOL
where has he been? doesn't know anything about the code he copy-pasted from SGM @visual glade
I just wish people would stop using that ui as a reference, there's issues in the sampling code and everywhere else
but thats about negative guidance minimum sigma
it seems to be a trick to improve performance by disabling cfg for certain timesteps
Issues in auto1111?
negative prompts still work the same way as before
Is there a superior local UI to use?
agree with derrian. there are A LOT of things that influence vram on pc - too many to diagnose directly, it would have to be painful and slow process of elimination
That's not what the person asked, though. They asked, "can I start an image with a 1.5 model and finish it with a refiner". I answered to that. Technically, we can. But I don't see a reason why we should. Other than maaaaybe using a LoRA, but chances are that would most likely get trashed by the refiner. Even you admit that we shouldn't start with 1.5 here.
Using the 1.5 derivatives for "refining" the SDXL's output in a certain way is an entirely different process, which might indeed have use cases before the SDXL ControlNet fine-tunes become available. But that would be complicated, because it's hard for 1.5 to operate at such a high resolution. And any useful scenario I can think of implies downscaling the image, using 1.5 for Ultimate upscale, or some use of ControlNet. That's either rather dumb, or very advanced, and there should be a very strong reason to necessitate that kind of workflow. It's very heavy...
anything that uses diffusers is superior and also ComfyUI
But that would be complicated, because it's hard for 1.5 to operate at such a high resolution.
ControlNet Tile 1.5 works just fine on SDXL outputs..
I figured diffusers were model dependent ? I'll have to check out comfyui, I'm assuming there's a native Linux version
i'm surprised that you always de-rate your own software when encouraging others to try different options. most developers have a bigger ego than that
I also don't get this linguistic and style different prompts 🤷
it was confirmed by StabilityAI's internal testing, so
but how should it work technically?
it Just Does because the way SDXL was trained favoured OpenCLIP a bit
if you make both prompts different then the tokens don't align
BTW, I saw a discussion here one time about comfy UI gpu-only requiring a large amount of vram, but I can do batchsize 8 with --gpu-only and it stays under 20GB VRAM
@rustic garnet not everything has to be ideal and perfect. we're denoising imaginary data samples and making something from nothing. if you misalign stuff, that's often where new details can be introduced
it just doesn't make sense
if you have two different prompts in the two CLIPs, you are inherently accessing a different part of the data distribution than you would if you used the same prompt in both
I tested it once and it made things worse as expected. Haven't looked into it since then
https://github.com/huggingface/diffusers/issues/4004#issuecomment-1627764201
see the examples here
they're not "better" results, they are "different", which in some cases does overlap with "better"
I would switch to yours completely, but it doesn't support mobile browsers and has very inconsistent support when it comes to access from another PC in your local network.
Like, workflows are stored on the client, but outputs are stored on the host... Templates do help, but don't solve all QoL issues.
the point is that your tokens are misaligned afterwards. If you use a prompt like
"a dog" for CLIP-G and "national geographic" for CLIP_L then you merge the tokens "a" with "national" and "dog" with "geographic"
P.S. I just realized how deranged it seems to say '20GB of VRAM is not a lot' but I was thinking 40 or 60GB as 'a lot'
or are the prompts not concatenated afterwards? I could swear they are
*the embeddings
the cross-attention layers of the text encoder can have vectors of arbitrary length, it's the pooled embeds that can't
I was gonna say like.. what kinda card do u have access to LOL
the pooled embed is only using CLIP-G anyways
yep that's how SDXL favours it, aiui
Joining this server and observing the discussion on some of these channels has me realizing I'm fairly lacking in knowledge of generative ai
it's more like I know it's not for everyone so I don't want to push people too hard towards it
that's a pretty mature take though
if people gravitate toward something, that's their own fault.
i think they end up appreciating the software and its capabilities more
i used to try and convince Windows users to switch to Linux 😁
learnt me lesson
nope, I'm right, they are concatenated
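A toy sketch of what that concatenation means for two different prompts. In SDXL the per-token hidden states of CLIP-L (768 wide) and OpenCLIP bigG (1280 wide) are joined along the feature axis into 2048-wide vectors, position by position. The code below uses labelled one-element "vectors" instead of real embeddings, and the function name is invented.

```python
CLIP_L_DIM = 768    # hidden size of the CLIP-L text encoder
CLIP_G_DIM = 1280   # hidden size of the OpenCLIP bigG text encoder


def concat_token_embeddings(clip_l_tokens, clip_g_tokens):
    """Join the two encoders' per-token vectors along the feature axis, position by position."""
    if len(clip_l_tokens) != len(clip_g_tokens):
        raise ValueError("both encoders must produce the same number of token positions")
    return [l_vec + g_vec for l_vec, g_vec in zip(clip_l_tokens, clip_g_tokens)]


if __name__ == "__main__":
    # Stand-in vectors: one labelled element per token instead of 768 / 1280 floats.
    clip_l = [["L:national"], ["L:geographic"]]
    clip_g = [["G:a"], ["G:dog"]]
    for position, vec in enumerate(concat_token_embeddings(clip_l, clip_g)):
        print(position, vec)
    print("real width per token:", CLIP_L_DIM + CLIP_G_DIM)
```

So position 0 carries "national" from one encoder next to "a" from the other, which is the misalignment being discussed. Whether that hurts or just gives different results is the open question in the thread.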
You've fallen, soldier. Keep fighting
so it's weird that it works to use two different prompts
instead of convincing users, i just open bug reports and fix their issues now.
most of us here don't, its just that we also have some actual machine learning engineers here as well as, similar to any other tech fields, people who can catch on to what's happening behind the scene without prior knowledge
Speaking of your software.
It also works fine with any camera output. After all, it's an img2img process.
you could fill up the first prompt with blank tokens and then use different prompts - would make more sense
like most of this stuff, it's not intuitive when something new works.
anyways, for me this "linguistic" and "style" prompt thing is still super experimental and a weird hack. I wouldn't claim its the way to do things. Maybe in 2 weeks there is a totally different way of doing it
I don't know for sure - but I think once you have embeddings the order doesn't even matter at that point. The order and connection between tokens has already been taken into account.
i don't think anyone has claimed it's THE way to do things? it's just Sytan's workflow
sytan's our jesus now, he can do no wrong