#✨|sdxl
Yes absolutely
ok, I am not gonna argue this with you, that is not at all how that works lol
the only way to pull less power with more CUDA cores is for them to be literally closer together
8gb is struggling to run SDXL 12gb is not
i wonder why tf nvidia made the ti with 8gb instead of the opposite
Fuck you, that's why
which resolution usually is 12gb able to generate? i've seen people making up to 1024 with 8gb
so then you think 1k cuda cores at 1GHz and 1k cuda cores at 2GHz use the same power?
I mean I can do 4k with 10gb.
If you are wanting to do any training like making LoRAs you'll struggle on 8gb.
you force in more power into the GPU in order to overclock it and get more performance out of it
My 3080 is severely overclocked at start, meaning it draws a lot more power for a little more perf over a stock variant
straight up 4k or using hirefix?
Doesn't actually matter.
also, the 3090 chips are better manufactured, the 3080 gets the cut down failed chips of the 3090
Which means the 3090 has better silicon, and in general needs less core voltage to sustain clock speeds
If you can high Res fix you can do it normally as well. Just takes longer.
I think the formula for power vs clock is a cube or something close
You're joking, right?
yes
you need way more power to get a little more clock speed
The 3080 is pretty much maxed out
exactly
I'm limited by the power limit on my FE card
It's cube if you assume you need higher voltage to get higher frequencies. Which has been true historically, but..
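To put rough numbers on the cube rule: dynamic power is usually modeled as P ≈ C·V²·f, so if voltage has to rise in step with clock, power goes with the cube of the clock. A minimal Python sketch under that assumption (`relative_power` is a made-up helper, and leakage and temperature effects are ignored):

```python
def relative_power(clock_ratio: float, voltage_tracks_clock: bool = True) -> float:
    """Relative dynamic power from P ~ C * V^2 * f, with C held constant.

    Assumption: when voltage_tracks_clock is True, voltage scales linearly
    with clock. Real voltage/frequency curves are not this simple.
    """
    voltage_ratio = clock_ratio if voltage_tracks_clock else 1.0
    return voltage_ratio ** 2 * clock_ratio


print(relative_power(2.0))         # voltage rises with clock: 8.0
print(relative_power(2.0, False))  # voltage held constant: 2.0
```

So doubling the clock costs somewhere between 2x and 8x the dynamic power in this model, depending on how much extra voltage the chip needs to hold the higher clock.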
I think pseudo is confused here, is all
higher frequency needs higher voltage but higher temperature also needs higher voltage
yeah
that's because the silicon needs more "push" to update itself faster
and the better the integrity of the silicon chip, the less voltage it needs to sustain a specific clock
Drop down enough levels, it's due to the capacitance of the transistors.
which is known as its efficiency
you can undervolt a chip to save power while not losing performance
Switching them faster means you need to move electrons faster, so you need higher current, so you need higher voltage to push the higher current.
But higher voltage means the capacitance will increase...
that's only because they leave a margin of error
for example, this GPU uses 1.2v to run at 1850mhz stock
but I use just 0.89v to run it at 1800mhz, and it uses way less power
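Plugging those two operating points into the usual dynamic power model (P ≈ C·V²·f) gives a feel for the saving. This is a sketch only: `dynamic_power_ratio` is a hypothetical helper, and it ignores leakage, memory, and board power, so real savings will differ.

```python
def dynamic_power_ratio(v_new: float, f_new: float, v_old: float, f_old: float) -> float:
    """Ratio of dynamic power between two voltage/clock points (P ~ V^2 * f)."""
    return (v_new / v_old) ** 2 * (f_new / f_old)


# Stock: 1.2 V at 1850 MHz. Undervolted: 0.89 V at 1800 MHz.
ratio = dynamic_power_ratio(0.89, 1800, 1.2, 1850)
print(f"{ratio:.2f}")  # 0.54
```

In this model the undervolted card burns a bit over half the dynamic power while giving up under 3% of its clock.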
because as chips age they also need higher voltage
...so you need even higher voltage to compensate for the effect of higher voltage.
yeah, higher impedance starts to happen
You can even gain performance doing that, because it also lowers power consumption which gives the boost more power headroom to clock higher.
more heat = more resistance = more power needed = more heat
#🌶|off-topic or #💬|general-chat before a mod shows up again ^^'
yup!
I never managed to wrap my head around impedance. Thinking in terms of subatomic particles works fine for me. 😛
this is very much on topic, its about GPU clocks for SD lol
'sdXL'
that's why i said undervolting is an overclock, and he said, "are you joking?" but i guess when you say it 
but yeah, anyways, my current 3080 uses more power than the 3090 I am looking to get, and the 3090 is faster
that's workload-dependent, anyway
undervolting is not an overclock, an overclock is an overclock
undervolting can help you overclock lol
Undervolting isn't overclocking. It just has most of the same implications.
undervolting is an overclock, ask AMD or NVIDIA support
its not ._.
it helps you attain higher clocks than stock
You're literally not changing the clock. 😄
especially on something like the Vega56
they are just different
and it should be leveraged by a lot of people running SD as well
you can save a lot of power and lose no perf by undervolting
it's hard to know whether you'll freeze your whole system by undervolting, or else the mfg would do it for you
but anyways, this all got derailed
the 3080 I have currently is both bigger and uses more power than the 3090 I am going to buy, and its slower
which makes sense why this 3080 LOVES to strangle my pathetic little 650 watt PSU lol
Is there anything other than ComfyUI that supports SDXL LoRA atm and runs well? The VAE decoding is much slower for me with LoRA which is kind of a problem when testing out LoRA training.
i keep meaning to write a proper bug report, but with a primitive wired to a CLIPTextEncodeSDXL node, even if they're not attached to anything else- deleting either or removing the connection breaks the graph; it won't run at all
Idly notes that XL is ~4x faster than 1.5 at 2048x2048
...though the MPS library still crashes in VAE decode.
I noticed that as well
I don't know if I've tried 2048x2048 in SDXL, but I cannot even run it in 1.5
I imagine it's because of tiling. It's just doing 4 tiles instead of 16.
Or however that works.
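If the tiling guess is right, the 4-vs-16 figure falls out of simple arithmetic. A minimal sketch, assuming square non-overlapping tiles at each model's native size (1024 for SDXL, 512 for 1.5); `tile_count` is an illustrative helper, not ComfyUI or PyTorch code:

```python
import math


def tile_count(image_side: int, tile_side: int) -> int:
    """Number of square, non-overlapping tiles needed to cover a square image."""
    per_side = math.ceil(image_side / tile_side)
    return per_side ** 2


print(tile_count(2048, 1024))  # SDXL-sized tiles: 4
print(tile_count(2048, 512))   # 1.5-sized tiles: 16
```

Real tiled implementations usually overlap tiles to hide seams, so actual counts can be higher.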
SDXL slows down less and uses less VRAM than 1.5 and 2.1 when increasing resolutions
meaning, it requires less additional
Makes it more practical for me to work on my bot on my laptop.
Although honestly I should just stub out ComfyUI.
Though personally I don't care too much about 2048x2048, but I'm sure there are use-cases where it matters.
Also, I just had ComfyUI suck down 60GB of VRAM. XD
Mac doesn't have certain precision levels:
The neural engine can't be used for training anyway. It only supports Float16, Int8, and UInt8, and is only accessible through CoreML and MLCompute. PyTorch uses neither of these APIs for training.
bf16 is a no-go
"Error: NDArray dimension length > INT_MAX"
wonder if you're hitting that, lmao
I hate pytorch (amd user)
So 4GiB I guess
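The 4GiB guess checks out if the limit is on element count and the tensor is fp16. Both are assumptions; with fp32 elements the same limit would be about 8 GiB:

```python
INT_MAX = 2**31 - 1   # largest signed 32-bit integer
BYTES_PER_FP16 = 2    # assumption: half-precision elements

# Largest single array that stays under the element-count limit.
limit_bytes = INT_MAX * BYTES_PER_FP16
print(f"{limit_bytes / 2**30:.2f} GiB")  # 4.00 GiB
```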
Me and my 2070 Super, after being told this sad news
Oh, I'm just trying inference.
@fresh path sorry mate. what platform did better? JAX? Flax? ONNX? DirectML? TensorFlow? Some other garbage no one uses?
hey, this looks like the first image I made in SD 1.0 back in the yonder years!
But honestly I don't need full-res pictures from this.
my first image in SD 1.0 lol
✨ WETNESS ✨
I think most including comfyui uses directml
im not sure what the perf is meant to be
but it was alright
DirectML occurs via PyTorch.
I'm disappointed, I did a big sorting of my 16k images, but I still have the 11th.
If it’s the opposite how did I undervolt and overclock my cpu at the same time😂
It isn't. We've been over that. 😛
people are opinionated on this stuff and most don't have a background in electrical engineering 😄
that's a software-specific feature
the core concept of undervolting, will inherently overclock a modern GPU that uses thermal headroom to decide its max clocks
1.0 isn't scheduled until next week right?
the same reason cooling off a 7800X3D makes it go way faster..
oh well yeah, if your GPU was thermally throttling, then it will
thermal throttle by design at stock settings, yes
yeah, most well designed products don't do that 😅
whut? of course they do
no 
maximum strength tylenol was designed so that it killed half of the people that took it. and then, they backed the dose off a tad
GPU's sit wellll below their throttle temps
Throttling isn't a firewall anymore
it's just a modern approach, man. if the GPU can go faster, it will
It's more of a gentle curve..
what melts at 100 degrees thats so bad anyway?
The solder.
yeah lol
100 degrees C is going to increase the risk of electromigration as well
true
electrons move along the junctions and it literally degrades them
the transistors themselves probably go way over that
this is the way I have to run my GPU right now for it to mostly not turn my PC off
And modern GPUs/CPUs are so fast, they need to account for the speed of electricity. They can't instantly throttle up, because then they'll brown out; it's like suddenly removing an obstacle to water, the water doesn't instantly fill the new channel.
yeah it always freaked me out how the temp we see of a device is an average and not real
hi is this the sdxl channel?
this is #gpu-go-brrr because they deleted that channel
NOW WE HAVE NOWHERE
thanks for rubbing it in
yes
oh, i added png info to all my bot's generated images
having extremely adjacent to SDXL conversations seems pretty reasonable to me lol
Its not like we are talking about cars, its a convo on how the hardware required to run what we are all doing works lol
most people here would benefit a lot from undervolting their GPU's
same perf, much less power
No you do not. 😛
I think the A100 has, like, ten.
(Or maybe that was the W6000)
you don't need them because you'll be too busy trying to make ends meet by working multiple dead-end jobs after splurging on the H100, there won't be enough time to play games
No no, just sell your house
i'm already homeless so i guess this plan isn't for me 
is unipc the worst sampler?
some people say it's the best
those people are "inpatient" in a "psychiatric intensive care unit" and we wish them the best
I tried it but never saw any real benefits to it. Though I never saw any real benefit of all the different samplers using 1.5 either. Some of them worked some of them didn't and I just disabled those who didn't work.
ControlNet works really well with UniPC for some reason, so, i just use it for that. literally, only for that.
a zebra made of crochet
the sarengeti spaghetti valley
i generated images using unipc, i just got blank images, i don't know why
wait
i used unipc with exponential scheduler
i just got all blanks for every generation using that combo?
has anyone run Zeroscope V2 XL? and if so have u ran it with under 15.3GB VRAM?
comfy do you support anything like it? (txt2video)
once a decent one comes out I might add support
do you know Zeroscope V2 576w ?
and the XL ?
looks way better than is it Gen2 the runway one?
im gonna try it with auto1111 but if you had a model/support for this i would rather stick in comfy
i am doing a comparison so i am using all combos, so is it with my pc or is it that unipc doesnt work at all with exponential scheduler?
yeah that's very possible
I made it so that "normal" matches close enough the default uni_pc scheduler
the others might not work very well
whats the VRAM on that?
24 GB
nice one
yeah, need it for playing with SDXL tuning
run this for me https://civitai.com/models/96563 lol
I don't have it yet
currently trying to secure the payment for it
sending $700 to pay pal is apparently "sketchy" lmao
my 2070 only has like 9gb VRAM 24GB total it says
even renewed/refurbished 3090s are still going for $1000, so as always be careful of scams
its a verified seller on hardwareswap
4070 was going to be my choice
and we are doing it through paypal good and services as well, so I have buyers protection
@visual glade do you know any tool to attach images, so i can replicate a1111 grid?
i am doing comparison
but i discovered its cheaper to get a prebuilt on Scan than it is get the parts
What are the possible resolutions?
he actually uses SD, and when I told him I am using it to research SDXL more he was like "Dope!"
he was also able to run some SD on it to give me some speed numbers, said he also runs vicuna 30B on it
good, my laptops doing me good so far, but next big job i'll get the 4070 machine
for text models you should only expect 20b
for 1000 token memory
hopefully the war hasnt started by then
30b would be a lot of CPU ram usage i think
u heard the whole 'we will blow up the chip factory' and bring the people/tech to US if you come to 'the island'
could be smaller bit quantized
Zeroscope V2 XL (txt2video)
The XL models use 15.3GB of VRAM when rendering 30 fps at 1024x576
back when i was running text models the above kinda thing was what i was using with like
gpt-neox 20b
took a lotttta vram and had to offload some to cpu
maybe stuff's better now
i dunno, is it worth me trying this model
it looks so good
hey i can always upscale, but only so far you can get with that
uhhh idk, never tried txt2video stuff except deforum way back
Anyone talking sdxl? What are the lowest possible resolutions?
4bit can cut memory requirements in half, pretty impressive
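Back-of-envelope for the weights alone, as a hedged sketch: `weight_gib` is a hypothetical helper, and it ignores activations, the KV cache, and quantization overhead such as scales and zero points. The halving here is 4-bit relative to 8-bit.

```python
def weight_gib(params_billion: float, bits_per_weight: int) -> float:
    """Approximate size of the model weights in GiB at a given precision."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


for bits in (16, 8, 4):
    print(f"20B at {bits}-bit: {weight_gib(20, bits):.1f} GiB")
```

By this estimate a 20B model at 4-bit lands under 10 GiB of weights, which is why it starts to fit on consumer cards where 8-bit does not.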
the image data is trained on 1024x1024
SDXL is running super fast for me
Ok. Thanks
any resolution that adds up to a megapixel is recommended
thank Comfy it seems to be his UI helping it
don't get too cozy, 1.0's coming in 5 days 😄
yeah
768x768 is barely ok. 640x640 can be made to work in a bad way
but by design 1024x1024 is how it should be run
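One way to list candidate sizes near a megapixel. A minimal sketch: the 64-pixel step, the 5% tolerance, and the side limits are assumptions chosen for illustration, not official SDXL buckets.

```python
def megapixel_resolutions(target=1024 * 1024, step=64, tolerance=0.05,
                          min_side=512, max_side=2048):
    """Return (width, height) pairs whose pixel count is within tolerance of target."""
    sizes = []
    for width in range(min_side, max_side + 1, step):
        for height in range(min_side, max_side + 1, step):
            if abs(width * height - target) <= tolerance * target:
                sizes.append((width, height))
    return sizes


for width, height in megapixel_resolutions():
    print(f"{width}x{height}")
```

With these settings 1024x1024 and 1152x896 are in the list, while 768x768 and 640x640 are not, which matches the "barely ok" verdict above.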
Is that style stuff used by the bot anything we're going to have access to as like a separate kind of thing we load in?
aye, but might not respond to prompts all the same though
or is the style just additional pre/post prompt?
i think its just going to get more pathways
final date for sure?
Whats the resource load on SDXL? Like is a 3090 able to handle it?
3090 handles it fine
im using a 2070
hasn't been any sign of it getting pushed back
i can get an img in 15 seconds
yea
oh it's going to get released as 1.0 on that date most likely even if it's a terrible checkpoint
a 3090 is strong enough to run 2 instances of it simultaneously. you're good to go XD
it's insane how far we've come
these were the speeds i used to get on like 512x512
i could cook eggs on my laptop but it runs good
i dont think its gonna be terrible checkpoint
SAI has already stated in advance that they're "looking to provide regular updates to the SDXL weights" which to me indicates 1.0 is gonna be kinda rough
Considering 90% of you guys don't remember 1.1 it would be good to revisit it to see what that was like.
@warm hazel i love the patronism, keep it up 👍🏽
hello, at inference time, what process is done? it just takes the text and a random noisy image, and then runs the denoising unet?
Not you specifically.
and SDXL appears to be faster than 1.5 at good resolutions, because it doesn't have to use highresfix
i think the checkpoint is fine; when i roll in bot it really feels like 1 image = 9 gens in 0.9 worth of quality most of the time lol
idk what changed but the release candidates seem real good
but there weren't any mentions of license changes for 0.9, right?
that's still just guessing on our part, not any official guidance
really? they don't follow my prompts.
to be seen in a few days anyway
i first roll my prompts in 0.9 to get to a point where it's pretty consistent, then i roll in 1.0
and the 1.0 gens come out a lot better
i don't do any kind of weird cherry-picking like that. i go straight to 1.0 and try to gen what i want. it does not work as well as 0.9 for me
... i know how to prompt sdxl
it's OpenCLIP style prompting, i helped Sytan figure it out.
i'm not saying that, more if you can get a good prompt in 0.9
why do i get some black rectangles when using hirefix?
"trashpanda is amazed by the quality of the release candidate and holding his head in glee"
oh yeah and cuz of the bots restrictions
namely the above
you have to be aware of the filtered words
at all time
if they're any part of the tokens they're cut out
@quasi remnant my point is that you shouldn't have to focus on where the prompt is made, first. that's a weird comparison method that the model authors designed because it favours their newer work. i'm running the weights locally. i have contributed a new workflow to the img2img pipeline for Diffusers, i'm pretty familiar with the internals.
anyone know why this happens?
Usually it's because the VAE Fails to run through properly
when i try to upscale it again it just ignores
What Interface are you using? Comfy, Auto1111, something else?
with 1.0 you only get 1 pic at a time; in the time that 1 pic is made in 1.0 i have 6-12+ pics made in 0.9
auto1111
once 1.0 is out you'll be able to test it a lot more
you get two per gen in the bot channels and over 6-8 gens you can be pretty certain you've seen their 4 test models' outputs.
Try --no-half-vae as an argument
so i prefer to run prompt engineering tests first then pass it through there, it does come with quality results
you get the same seed, sometimes it's the same pic just micro-changed
with more details
so i consider it 1 pic
sometimes the 2 are completely diff
that's all weird test methodology you seem to be doing specifically to give 1.0 an advantage
sometimes one is just more detailed ver of the other
already using
this one
mine is like this
```
@echo off
rem allocator tuning, left disabled
rem set PYTORCH_CUDA_ALLOC_CONF=garbage_collection_threshold:0.9,max_split_size_mb:512
set PYTHON=
set GIT=
set VENV_DIR=
rem launch flags passed through to webui.bat
set COMMANDLINE_ARGS=--xformers --disable-safe-unpickle --allow-code --skip-torch-cuda-test --no-half-vae --api --listen --update-check --medvram
call webui.bat
@echo on
```
if it works it works /shrug i'm impressed with the results of 1.0 and that's all i personally care about at the end of the day
i care about a lot more than that, lol
once i finish a prompt, it works a hell of a lot better in 1.0 most of the time
Are these models just going to get more and more VRAM hungry or will we get lower as we develop?
considering they're nuking half of the model i'd say more efficient
both, which means they'll try to cancel them out around consumer hardware
I tried using hunblemikey's workflow, upon using it says empty latent ratio custom sdxl, I tried using comfy manager, but couldn't find it, can anyone help?
wdym?
fair, im looking for the open source render farm comfy mentioned yesterday
Does anyone have performance results for Intel ARC cards and SDXL, or even AMD? How do they compare to a 3060 or a 3090?
a spaghetti fall
Does anyone know of a good tool that can use BLIP-2 to auto tag images?
I've got taggers for Booru style tags, but nothing for plain english
i have a bghira/SimpleTuner project with interrogate.py that does this
Is that going to be any good for tagging stuff for SDXL
I tested using a dataset I already had that used Booru style tags and it wasn't brilliant
Welp
did it have a trigger word? did you make sure to keep 1 token, shuffle rest?
reply meant for this*
I just realised all the space pics are really good because the training data is made by billion dollar telescopes
try to get a phoenix nebula
it perfectly combines the concept so that a phoenix shape arises out of the space dust
This is how many images the applied ML team at SAI has made in Comfy while testing SDXL.
This doesn't include the massive grids we make. Or other teams at SAI testing it out.
This is just the applied finetuning team.
Can't in SDXL, as I can't train the text encoder, so it disables that option.
Cache the outputs of the text encoders. This option is useful to reduce the GPU memory usage. This option cannot be used with options for shuffling or dropping the captions.
@hard fractal do I get an achievement if I can get (1.5 equivalent) finetune levels of improvement with a 43mb lora (trained on 5k images) on 0.9base? xD
dang that's only like 7x more images than @waxen berry made
Nuking half the model? What are you referring to?
the refiner going away
I see, so what takes its place?
nothing
1.0?
They need to fix it blurring stuff in img2img if they are removing the refiner I think
Otherwise any sort of high res fix will just break stuff
I mean wasn't comfy working with them to implement it?
i've put work into Diffusers to fix that issue, so, it's already happening
do you mean the latent upscaler that makes things look blurry?
No, someone was talking about this earlier as well.
when you do img2img, it washes out loads of details on the original image
So if you are trying to do a pixel upscale highres fix type thing, you lose loads of detail and it smooths it all out
yeah, that's because it adds noise to the completed image
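A rough sketch of why the denoise setting matters so much here. As far as I know, Diffusers' img2img pipelines noise the input image to a point set by `strength` and then run only the remaining part of the schedule; other UIs may count steps differently. `img2img_steps` below is an illustrative helper mirroring that rule, not a library function:

```python
def img2img_steps(num_inference_steps: int, strength: float) -> int:
    """Denoising steps actually run for a given img2img strength (denoise)."""
    return min(int(num_inference_steps * strength), num_inference_steps)


for strength in (0.25, 0.5, 1.0):
    print(strength, img2img_steps(40, strength))
```

At low strength only the last few low-noise steps run, so the model can touch up fine texture but cannot restructure anything. At 1.0 the input is fully replaced by noise and the original detail is gone.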
It wasn't an issue in 1.5 or 2.1 though. Or at least I never had that issue.
it depends on your settings and your prompts. the hires. fix in SDXL base needs a lower CFG value
it's just a problem for A1111 or your ComfyUI settings, and not an issue with the model itself.
that'll lock it in too much to a single outcome
I've tried multiple settings and it always washes out detail on the 2nd pass
lower CFG on the refiner gives it more freedom to fix the image
No. SDXL won’t do i2i
We have all tried
then you haven't tried hard enough lol my implementation overrides the model for img2img with the SDXL base model
Do you have a Comfy workflow that can consistently use Img2img with low denoise without washing out details?
no he does not
Tried cfg from 4-12 doesn’t help. Tried the manual denoise from all ranges
Doesn’t work
base + overfitted lora + refiner at 0.2~0.3 denoise
@eternal fog idk how many times i have to tell people i don't use ComfyUI
That's using something extra with the LoRA. I mean with the base stuff
lacks face details
Tried the one step less on refiner don’t work. Tried base and refiner no go
refiner was trained on leftover noise from image generation. it doesn't work on added noise
you could try to prompt fix that - but at cost of introducing huge bias
ptx0/s1 is my fork of the SDXL base model on HF hub, and ptx0/s2 is the SDXL refiner. they're forked so I can change their scheduler_config.json to use DDIMScheduler by default for convenience.
that prompt_variation runs when a img is pinged to the bot with a prompt to guide diffusion with
it just uses ControlNet if i don't provide text with the img
in fact the base model is so good at img2img, especially compared to the refiner
You have ControlNet that works with SDXL?
the refiner doesn't seem to add "enough" changes to really impact it much
yeah bro we know it's better without the refiner. this shit is the easiest shit in the world in comfy, but it still doesn't preserve detail
@eternal fog i experimented with making a Controlnet Tile model and can't figure out how it's trained. i use 2.x and 1.5 controlnet models for SDXL outputs instead.
that's why you cut the img2img output from the base, halfway, and feed into refiner.
Wouldn't it makes the images off?
oh it certainly has side effects
yeah tried that. even worse
not enough that i notice
@shy kelp idk what to tell you lol it works in Diffusers
i'm using DDIM / ddim_uniform equivalent
diffusers?
This is what I have at the moment. The left is just the base with a 2x pixel upscale. The right is after the 2nd pass.
This one looks ok, the refiner has added detail to the eyes, but if you look elsewhere, it's pulled some detail off the skin, although this image isn't that bad.
- no refiner, or refiner at 0.15~0.25 denoise
very realistic looking details, high risk of deformities. roll the dice more
You got a comparison? Let’s see it.
the code is open source 😄
ur telling that if i just use ddim+ ddim_uniform sampler it will work?
has the requirement of no short prompts to work properly - so you need a few supporting style tags (anything as long as it's not nothing)
Dude, for real?
1.5 with something around 5 gives shit images mostly pixelated and some random images which have no context
you can just use a clip query to get a string, then use that string - that makes it work
yep
i think hes talking about upscaling

is there any workaround to use RAM as complement for VRAM?
looks a bit iffy for me with CFG 2
cool
Left is CFG 2, Right is CFG 10
@uneven dove what do you think of training a model with 20million images?
Not gonna train for sure, asking out of curiosity
depends on the images i would say
i'm using DDIM uniform, like i said
lower CFG has higher realism from base with this scheduler
idk what to tell you, sorry if yours doesn't work that way
i do 20/20 between base and refiner, and refiner adds no extra noise. i use a higher CFG on the refiner to bring the washed out look to life a bit more from the base.
This will give an entire different image
One of the devs put on reddit I think that dpmpp_2m_sde was the "best" in their testing, so I've been trying that.
@eternal fog that testing data had people upvoting deformed images. i know, because i upvoted deformed images
Yeah it does, but even 20million doesn't cover many subjects.
Just wondering how many did MJ use?
DDIM actually makes the washout even worse for me
i think that's a waste of time? why would you want to train on 20 million images? what are they of?
would be a waste tho
Left is Pixel Upscale, Right is after an img2img pass over it.
its a matter of prompting. if you use any clip model to extract a prompt from a picture, then throw that prompt back into that node, it will work pretty well
depends on the model. for fixing 2.x i needed about 200k images from real cameras
Is this i2i?
also this for negative "Deformed, unrealistic, bad quality, grainy, noisy, plastic, hazy, low contrast"
It's not though, because on the same prompt some schedulers are fine and others aren't.
I think the issue is that you can get very different results for very very small changes.
that is the base model generating and feeding its 50% denoised output to the refiner, not a completed image as input, no
where'd they say they were getting rid of the refiner? i thought just the bot genning images doesn't use refiner
I ain't training, just some random thought.
Training on multiple subjects
Just had a quick check too, DDIM at high CFG deep fries the image. on the SDE ones it makes it look better.
Probably why low CFG works on DDIM
Like an all round model.
Like making SDXL double capable
But it looks shit on SDE
i can run up to CFG 20 on the Diffusers implementation of ddim_uniform and it doesn't burn.
I often have troubles focusing with xl
i'd try higher, but my bot has a hard limit that i'd have to remove first
anyone trained LoRAs yet?
yep, they work on the base model
I have, but I'm getting weirdness with the training
which script/params did you use?
Like it worked last night and did training
Then I put the exact same parameters in today with no changes and it OOM
kohya-ss is the one people are having success with but darien-distro has Easy Tuning Scripts for SDXL repo that optimize things further
Do you have a link for this? googling darien-distro doesn't bring anything up.
nice thanks! did you discover good face parameters yet?
it is incredibly biased on picking facial features it seems... if you have abnormalities in any faces, suddenly, they are everywhere
while there are methods - I'd recommend against them, as they will automatically be invalidated by 1.0 release - so further research/implementations of them are kinda moot
everything else goes though
literally train anything except
• face
• body parts
• eye accessories
I thought 1.0 was pretty similar to 0.9? Like similar enough that the scripts should be easily adaptable?
yep, for literally everything except the 3 things I posted above
faces will be much easier mind you, not the other way around
why is that?
this was my best 1.0 result so far and it didn't do my prompt 😦
hopefully they get rid of the refiner step
that was a mistake
ufff! And that's 1.0?
it's a technological requirement to do certain levels of image detail @sharp robin because they stuck with the ancient VAE architecture
try it yourself in #1100170312106127410 normal prompts work well
I'm assuming @uneven dove was displaying a 1.0 LoRA
nope. that's the bot here
Any tips prompt on framing a photo of a human to not show the fingers/hands? Like the one above.
they have talked about removing it already, so I assume it is not that big of a requirement
I've tried training Loras using the diffusers sdxl dreambooth script, and they all come out with either weird lines or without the subject at all. Has anyone had luck with that script at all? I suspect it's borked.
they're able to do that because they can continually just point to previous base models that were worse, and say, "but 1.0 is so much better than 1.4, 1.5, 2.0, 2.1"
it's like telling someone you doubled your sales, but it's just from 1 to 2.
if they can do 1.0(no refiner) better than .9+refiner its a W
technical details indicate that is extremely unlikely
you can keep hoping for it but i and many others are likely to use the refiner forever, even if SAI abandons it
I haven't tried it, did you try kohya?
Not yet - diffusers was easier to setup for my env
yawn
Ohh interesting - I'm currently mostly running stuff on modal without a ui but might be worth setting up a notebook to check this out!
DDIM, PLMS, UniPC samplers do not work for SD XL
lol
fail
once again he can't fix his samplers
i wish he'd just throw them all away and use Diffusers
honestly just use Vlad's fork
1.0 weights aren't out yet right?
correct
last time I checked a1111 didn't even have img2img working on unipc
now I understand why vlad switched to diffusers, that was the right choice
Open up.
I definitively need a better gpu 💀
is that batch size like 150?
mine took 300 seconds per it but i was on an A100-80G doing a full general fine-tune on 2.1-v 
his SDXL PR says DDIM, UniPC, and PLMS are not working with SDXL 
that's not either yours or my experience
that's cause he didn't bother abstracting things out
you don't have to bother with code abstractions when you have genius on your side
in comfyui I wrote the code in a way that I could just swap out things and it still work
yeah, Diffusers did that too, which was an interesting thing to discover for me, how much is being abstracted away and "normalised" under the hood
hadn't looked at that much til this week
kudos to Auto for reproducing the SGM results though, guess that's what happens when you lift code letter-for-letter
that's easy when you just use the code straight from it
Why?
less technical debt and easier to extend, a more quickly growing ecosystem full of various disciplines
Better architecture, far more advancements and features integrated
ask Vladmandic why he switched to it
gave him native Kandinsky support in his fork of sd-webui without any extra labour on his part
oh I checked the reference images on his pr, there's a lot more differences than there should be
i was going to say that 
it's probably the optimizations he's doing
Euler is so simple there's no way he could have screwed that up?
it's not that, I do more optimizations in comfyui and my images match extremely closely
help the poor model a little by adding some fluff 😉
I think someone already done this (use one node to run both base+refiner). But I added a little more to loop the base+refiner process multiple times. https://civitai.com/models/108594?modelVersionId=116911
OHHHHHH
mine sucked because of the "coon" again omfg
you dropped an o
this bot is the damn worst
yeah, saw that in other chat
"let's party together" I made this 2 days ago 😄
hi all!
nice hope it works and everything
lol that raccoon was really a lucky parameter and seed roll, the eyes through those sunglasses amazing
oh nice, thats my workflow 😅
omg i didn't even think of it being DOA
that's so unsettling, thanks comfy
very very interesting to see the difference from the loops
if it is, I am covered :p
worst case is I get my money back
Here's how closely sgm and comfyui match with the right settings
that's the kind of difference there should be when using euler even with max optimizations or whatever
i'm so disappointed in you, comfy /s
is this the 450 dollar one? if so you're lucky... don't know wtf happened with used prices in europe, few weeks ago i was looking and they went for 650euro on ebay, it's bloody 800+ now
yeah, Euler is truly wild.
Yeah, thanks for sharing the nice workflow.
no problem! Gonna be updating it soon
yeah euler is easy, at least the default sampler was not SDE cause that might have been a problem
Is there a big difference between looping the diffuser, and just running SDE+Karras for a bunch of extra steps?
Vs my method with my prompt split and the same seed
oh my pic is without the refiner
what do you mean?
It was my understanding that that combination effectively adds a little bit of noise at every step
are you being stochastic 
you mean the advanced sampler in comfyui?
are the two images from the bot 1.0 vs 0.9? or different versions of 1.0?
it's a waves hands around mysterryyyy
it could be a lot of different things
i see 😅
nope I'm still conflicted.
On the left is a raw SDXL output, on the right is that image run through Proton. Personally I lean towards favouring the RH image
While that is what I'm using, I think I just mean dpm-2m-sde-karras. Never mind.
What's proton?
And Proton?
I think the left one is better, other than the awful shit SD does for earrings
if you check my workflow you will see I'm outputting the SDXL image to an IMG2IMG step
1

gtx 1650 moment
wow ok that's a first lmao. SDXL just made NSFW without me telling it to
Oh I see what it's done lol. I was testing img2img and the top is almost skin coloured. I think it decided that the top wasn't there
sdxl’s imminent release makes it look like a good time for me to get into this. Is vlad’s fork the one I should be using or is there another app?
A lot of us here are using ComfyUI, but if you just want a simple generation tool it seems like Vlad is decent
This was done with img2img, seems like it works decent enough. There does seem to be a point where it just switches though. From being similar to the input, to not looking like it at all.
if you want the full experience get ComfyUI and this workflow: https://github.com/SytanSD/Sytan-SDXL-ComfyUI
Yeah ^, I'm using that workflow too, just modified slightly for upscaling
ahm... nope
I think my attempt to force a style with prompt weighting has kind of locked in this face for some reason
dont threaten me with a good time
I've been playing around with Img2Img a bit, it can be very particular on what prompt you use and the exact step settings. Even just 1 thing being slightly off you get a mess. But when it works it works reasonably well.
@boreal bough how am i supposed to see training progress/errors with this Kohya distro ui?
I mean it's normal attire for the beach so...
That's a completely different step though lol
Anyone else find the text rendering in ComfyUI a bit weird?
thanks, I'll try that first
It's blurry... and disappears entirely well before it'd be too small to read, if it wasn't blurry.
update it, some dpi issues were fixed recently
Turns out the 3080 I have now is bigger in almost every way than the 3090 I just bought
insanity lol
3090:
12.5 inches long
4.8 inches tall
2.3 inches thick
3080:
12.7 inches long
5.5 inches tall
2.3 inches thick
whats proton?
this distro Kohya ui that caith sent aint workin:(
3080 is 1565 grams
3090 is 1702 grams tho
So the 3090 is a lot more dense
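A quick check of the "more dense" claim, as a rough sketch: it treats each card as a solid box using the dimensions and weights posted above, which overstates the real volume but is fine for a comparison. The names `CARDS` and `box_density` are made up for this example.

```python
# Dimensions in inches (length, height, thickness) and weight in grams,
# copied from the messages above.
CARDS = {
    "3080": ((12.7, 5.5, 2.3), 1565),
    "3090": ((12.5, 4.8, 2.3), 1702),
}


def box_density(dims, grams):
    """Grams per cubic inch of the card's bounding box (not its true volume)."""
    length, height, thickness = dims
    return grams / (length * height * thickness)


densities = {name: box_density(dims, grams) for name, (dims, grams) in CARDS.items()}
ratio = densities["3090"] / densities["3080"]

for name, value in densities.items():
    print(f"{name}: {value:.2f} g/in^3")
print(f"3090 is about {ratio:.2f}x as dense by bounding box")
```

By this estimate the 3090 packs roughly a quarter more mass into the same bounding-box volume.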
will definitely be getting a bracket
oh damn. right one passes the vibe check. 👍
i have the biggest 3090 ever made and no bracket lmao
ketchup incident. heh.
yeah i just meant it as a general heads-up 😛 i liked the pics
you dont yet. unless derrian implemented it already - cause that's from yesterdays patch on kohya
documentation is still lacking though
will prob work in a day or two
okay but when I start training the button greys out and nothing is happening, no resources being used
oh that
um, you should be able to see in the cmd thats open in the background
kohya should be running in that
tis completely blank
did you add it to the queue, before pressing start?
sdxl could probably do a better job if it could do inpainting
got it, thanks. ngl i dont see how this is easier than kohya ui but queue system is neat
nvm CPUAllocator error as usual with any batch over 1
@visual glade even though im still learning ur ui i would like to give you a high five for the memory optimization🙌
anyone know if this node works like multidiffusion tiled vae from a1111?
I get this in any lora config on any install of kohya. 5800x 32GB 3090 isnt good enough?....
Is there a way to choose the tile size for vae encode/decode
thanks I did a bit of memory optimizations because I wanted SDXL to be well received
it's easy if you change the code but not from the UI
You think its good for it to match the grid when you are doing grid method animations?
Is it something that would be open to creating a custom node for?
I dunno how much is exposed to the nodes
would it be possible to add a preset workflow drop down menu, something with recommeded settings from you or stability, and like community list?
it's super easy if you create a custom node, just look at the node for the tiled VAE
i really like the drop img workflow to it and it just works
but sometimes it messy to look through that
alright so people who claim that img2img works, what settings?
i wonder if they'll do the inpainting model, and if they do, if it will be done the same way as before, with 5 new extra channels
I can show you mine but the nodes are a mess.
And depending on the image input you have to change the steps around or it doesn't work
this has been working for me https://github.com/SeargeDP/SeargeSDXL
yeah well it works, it just does a shit job. so how do you get to preserve detail
hah as soon as I said it was working my results look like this
Like the source image and your idea
like img2img upscaling. some people say its not shit.
with the right settings
so whats the right settings
img2img upscaling, I still can't get that 100% right, it still does stuff like remove skin texture.
I think I'm close, but something always goes a bit off.
im using DDIM and DDIM uniform and still looks like shit
oh, i use the refiner model for upscaling. not the base. the base does this
broken as all hell
you'll never get it to work 😂
Yeah I get that with the base, but then with the refiner I find if you do too much denoise it goes mental, and too little nothing happens
And there is a very small window between the two
And sometimes there is none
alright so how do you do the refiner, what settings
i was using the base as img2img for artsy stuff like, comic book versions of people. that, it does really well.
i did a PIL upscale to 2048x2048 and then put it in the refiner with a lower guidance than i use for typical denoising and a high strength like 0.6
I posted an actual image 2 image above, not an upscale though
How many steps on the refiner? Because when I do that it melts.
I thought you said you prefered the base model
this?
yeah results not even close
It never really was with img2img.
I think people have been spoiled by controlnet.
Because I kept thinking it was nowhere near, but then I went back and that's what it does.
If you make the denoise too low, to try keep it the same, it just blobs
I'll try a pass now with low denoise, I still have the same image loaded.
i had to put the refiner strength pretty high for it to destroy the image
like .9?
it doesn't do very good image transformations at all
and by that i mean, CHANGE the image
it just adds more detail 
i will try and find those settings. sec
ptx0/s2 is my refiner model fork
.3 strength, 7.5 guidance 10.0 aesthetic score 1.0 nascore
in soviet russia. bear trains you
whats guidance? you are using controlnet?
no controlnet, just SDXL refiner there as an image input from the discord attachment which is Image.resize'd to the 2048x2048
CFG
@shy kelp Automatic1111 image2image works, tho does not support refiner yet and takes ages but it is seeming like it is working
30 step .40 denoise
yeah that's using the traditional img2img SDEdit method that adds new noise.
looks like it didnt upscale 😂
Oo you have a bot to run multiple models through discord? That's really neat dude
also make sure you're using a different seed
I just use a 3rd SD 1.5 model (like Juggernaut, epicRealism or Photon) for upscaling, it has better skin textures and gets rid of some of the blurry bokeh background that is overdone by SDXL.
also whats the point if its not using the refiner at all 😂
@gentle mirage any model on huggingface Hub 
Is the code for the bot open source
Id like to try if i can
this is probably the way to go rn
it does language models and so on, too. there's also Bark TTS. yes, it's bghira/discord-tron-master on GH
🤝
Bot 1 isnt using refiner rn
this was after the image was already upscaled, i just want i2i working well first upscale comes second, tho will test it out, just takes long to do this,
it's not got great support for low VRAM systems since i run it on an 80G but i've done some work for that lately and it runs basic stuff alright on my 8G laptop GPU
likely to come soon
i hope not
just one.
if you read the paper it was only meant for the last 20% of the gen or .20 denoise
Cool
200 steps
you should go check how much the last 20% can actually change the image
you guys know anything about single-order vs multi-order samplers etc?
i thought people were advising to use it at 0.35 tho?
i actually like it better at 0.20
most people aren't using SDEdit, they're using the new partial-diffusion workflow
SDEdit expressly does better with lower noise levels whereas the refiner seems fully capable of continuing the base model's denoising at higher percentages
at the top middle is the base model's output when it's fully denoised at 15 steps
the left column is SDEdit and the right column is Partial diffusion
the rows go from .1 to .9 strength for the refiner, so in the last row it does 90% of the denoising work
you can see somewhere around 60% strength the deformities start kicking in like an acid trip but it's still quite valuable results for certain styles
the columns btw do 20 steps total, i did 15 at the top because it seemed fairly representative of the base model's contribution
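To make the two columns concrete, here is a minimal sketch of how the step budget splits in each workflow. It assumes the SDEdit-style (img2img) path runs roughly `int(steps * strength)` refiner steps after a full base pass, which is how diffusers-style img2img is usually understood to behave, and that partial diffusion hands the last `strength` fraction of one shared schedule to the refiner. The function names are hypothetical.

```python
def sdedit_split(total_steps: int, strength: float) -> tuple[int, int]:
    """SDEdit / img2img: base fully denoises, then the refiner re-noises the
    finished image and runs about total_steps * strength steps on top."""
    refiner_steps = min(int(total_steps * strength), total_steps)
    return total_steps, refiner_steps


def partial_diffusion_split(total_steps: int, strength: float) -> tuple[int, int]:
    """Partial diffusion: one shared schedule. The base stops early with noise
    still in the latent and the refiner finishes the remaining fraction."""
    refiner_steps = round(total_steps * strength)
    return total_steps - refiner_steps, refiner_steps


# 20 total steps, refiner strength from .1 to .9 as in the comparison grid
for tenth in range(1, 10):
    strength = tenth / 10
    print(strength, sdedit_split(20, strength), partial_diffusion_split(20, strength))
```

The practical difference is that SDEdit always pays for the full base pass and adds refiner steps on top, while partial diffusion keeps the total fixed and only moves the handover point.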
This is about as close as I can get it where it changes enough to follow the style, but doesn't change the image completely.
Any less denoise it barely looks different, any more it starts changing stuff.
and then the rest of the world will start realizing that img2img dosent work with the refiner
@boreal bough m8 your lora config is trying to allocate 54 gigs
This is just slightly less denoise and it barely changes from the source
is this channel just going to be scantily clad women for a few hours now?
Nah I'm going to change the image
thx
we dont know if the implementation of auto can do it or not yet, but at least the base model does some, im trying to be positive, best case for me is they just figure out a way to do away with it like they said they may be able to
We now return to our regularly scheduled episode of "Raving Racoon Rampage"
fuck yea bud can you get a rakun attackin' old David Attenborough?
when they said that?
still havent seen one source for that claim
Not even Sytan knew about it and he's obviously very active. so no
I tried to make the attack and instead they became friends because he's David Effin Attenborough 
XD you sure you're not running with all optimizations off?
my full setup, with batch 8 should run you around 18gb vram + overhead
if you do batch 1, it should fit under 16gb vram
batch size 1 allocates the exact same bytes. heres the config @boreal bough
sick em!
what the hell 
like 2-3 days ago
the refiner is going away? why? for 1.0 release or in the future?
i really hope it doesn't because i like the refiner, so, the news that they're trying to compress the quality of the model into the base, was pretty frustrating to hear
I didnt have tensorboard on, but other than that, identical
it's on Twitter and i can't seemingly look at stuff on there anymore
but if you have a twitter account still, go to EMostaque's page
Grrrr I asked Emad personally for the ability to fine tune the refiner 😄
well it would mean we can do img2img upscaling so im all for it 😂
i guess you pissed him off and he just chose to delete it 
@shy kelp why would it mean that though
great. just great
when you configured it, did you follow this for accelerate?
- This machine
- No distributed training
- NO
- NO
- NO
- all
- bf16
they didn't test any of that to begin with for SDXL, why would it change now 
I heard they are eliminating the refiner because they want to go green and dont want to use oil in their pipeline. Very noble.
because the refiner is the reason why you cant img2img upscale and get the same quality as the original pass
it's because of the noise schedule
U already have refiner. So if model can do away w out it. Please do
read the original paper they used for their training framework, the part about the 'ideal denoiser' and how it works by making everything look smooth like soft serve ice cream
it never asked actually. regular kohya did but not this ui
when you click install.bat, it should have gone through that :/
you can probably fine-tune a new noise schedule into sdxl but it'll be hard
random thought:
Could you take the output of a first base+refiner pass and feed that into a second IMG2IMG Base+refiner pass???
even if the refiner "goes away" people are still going to use it
i ran install from terminal per the guide. I'll try it again
well thats because the refiner works on the detail right
if we say "you don't need to do this" people are going to do it anyways
but the refiner dosent do a good job with detail unless its the first pass, so there it is. not good at upscaling.
@golden quarry
you still have an error here on the sdxl branch readme.md
fp16 leads to NAN, and based on people in the last two days, enough people followed that part XD
the base model is trained on the full 1000 timesteps
This is what I've been doing, but I've just found something I think is an odd behaviour I'm investigating
tbh thats sort of what a research community does
yeah that's expected
person1 "you can't do that"
person2 "hold my beer!!"
torch 2? yes
triton? no
cudnn? yes
thats all install.bat asks me
like someone said earlier, "Google innovates, StabilityAI releases, and the community fixes the mess"
stability does innovate though
lol by copy-pasting all of the research that came before them
don't stray too far from the Imagen paper! you'll get hurt!
like the technical paper was a long winded way of saying "Karras scheduling is pretty cool, and Hires. Fix works well, and ensemble pipelines are great!"
didn't even put it up against any serious community fine-tunes like epicRealism or any ControlNet pipelines or even a basic finetune from earlier gen with hires. fix
based on tests i've seen, it looks like img2img on sdxl might not be that great...unless maybe 1.0 has like the proper things to make it work that 0.9 doesnt but as of right now txt2img always gives better results than img2img which is opposite of how sd1.5 operates.
I don't see much point in judging img2img before the control-tile model is out as that changes everything.
oh, and reference_only.
negative. just ran accelerate with regular kohya, set it up exactly as you said, used same parameters, and same error
my pc is broken i guess
Can you see the accelerate settings without editing it? Because I'm pretty sure I did not run with bf16 and my stuff trained. But maybe I would need less VRAM with it.
bf16 is needed
idk but you can run install.bat and choose option 4 "run accelerate manually"
uff. rip ❤️
NaNs mean something like 0/0 or inf - inf happened, which means some number overflowed or underflowed because fp16 doesn't have enough range to represent the value
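A small stdlib-only sketch of why bf16 avoids the NaNs that fp16 produces. fp16 has 5 exponent bits (largest finite value 65504), while bf16 keeps float32's 8 exponent bits and gives up mantissa precision instead. The helpers `to_fp16` and `to_bf16` are illustrative names, and the bf16 conversion here simply truncates the low 16 bits of a float32, which is a simplification of real hardware rounding.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round-trip through IEEE half precision; out-of-range values become inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except (OverflowError, struct.error):
        return math.copysign(math.inf, x)


def to_bf16(x: float) -> float:
    """Approximate bf16 by keeping only the top 16 bits of the float32 encoding."""
    raw = struct.pack(">f", x)
    return struct.unpack(">f", raw[:2] + b"\x00\x00")[0]


big = 70000.0    # larger than fp16's maximum of 65504
tiny = 1e-8      # smaller than fp16's smallest subnormal

print(to_fp16(big), to_bf16(big))     # fp16 overflows, bf16 stays finite
print(to_fp16(tiny), to_bf16(tiny))   # fp16 flushes to zero, bf16 does not
print(to_fp16(big) - to_fp16(big))    # inf - inf gives nan
print(to_fp16(tiny) * to_fp16(big))   # 0 * inf gives nan
```

Once a single inf or 0 like this appears in a loss or gradient, the NaN spreads through every later operation, which matches the "fp16 leads to NaN" reports above.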
Was epic realism an actual big finetune?
@sharp robin it works without negative prompts and is generally the winner in that category in any comparison spaces on HF Hub
Yeah I know how to change them but I want to see what I was actually running. Though in the training config you can also set precision, what's the difference?
Amazing
Thanks for that info
you gonna loop SDXL through it? 😄
Whoops! Yeah I forgot to update that part, I'll do it soon!
I'll also update the install scripts to set bf16 default
dang it Derrian 😂
jk jk
Oh cmon I was trying to just rush it out! I didn't have the weights for a long time, so I couldn't test it until I got them
yo let me tell you that watermark ain't invisible
@golden quarry recognize this at all? im using same system and config as caith
Well I certainly don't get that issue, but it's a system ram oom
If you aren't caching latents to disk, you might want to
I have 32gb ram, so I can just cache to ram
guess i'll have to close my 26 browser tabs🤔
But it uses about 28gb total
i have 32g as well
Well that definitely could be it lel
So caching latents to disc is necessary to save normal RAM but has nothing to do with VRAM?
Yeah
thanks for the help, i couldnt tell what memory it was talking about
Caching latents does conserve some vram, but it uses more system RAM instead
but sometimes it tries to allocate 54 gigs
I don't have a big enough dataset where RAM is an issue, but I had some issues regarding cached latents on disc as I was using the same dataset several time with different settings. Though I can sadly not remember the exact issue I had.
5k images only goes up by like 2gb vram... so... yeah
last time I realized that dataset size even has a real impact, was on my 30k dataset attempt
had to tone down batch size for that
with caches its getting better
Where do you get your datasets?
having a few images at 4000x6000 or bigger will cause issues though. make sure to keep your images at or below 2048x2048.
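A minimal sketch of the size check behind that advice: compute the dimensions an image would need to fit inside 2048x2048 while keeping its aspect ratio. `fit_within` is a hypothetical helper, not part of any trainer. With Pillow, passing the result to `Image.resize` (or calling `Image.thumbnail((2048, 2048))`) would do the actual downscale.

```python
def fit_within(width: int, height: int, max_side: int = 2048) -> tuple[int, int]:
    """Return (width, height) scaled down to fit in max_side x max_side.

    Images that already fit are returned unchanged; nothing is upscaled.
    """
    scale = min(1.0, max_side / width, max_side / height)
    return max(1, round(width * scale)), max(1, round(height * scale))


for size in [(4000, 6000), (6000, 4000), (1024, 1024)]:
    print(size, "->", fit_within(*size))
```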
yeah it sucks but violating the license is worse imo
an issue called data hoarding XD
To get that many it sounds like you need some decent source.
"i bought my dataset from Shutterstock" - first day on the internet kid
I thought the training script had a max size for images so high-res images didn't matter, they'd just be resized down.
hydrus network -> all booru sites can be auto scraped
jdownloader 2 -> any image board, or collection on flickr
pixabay/CC sites -> good 'normal' images, to balance whatever dataset you have
google/torrent -> backup solution if its a rare source (overwatch artbook or similar)
@boreal bough updated the install guide and installer to set bf16 instead of fp16
just download LAION data with "NSFW" set to "Definitely"
guaranteed it's not in there yet
its finally working goddamn
yeah... everything is in the laion database, to the point where its legit terrifying
@boreal bough r u sure you meant 1e-3? thats like 2 minutes training time for 10 images
I'm aware of how insane it sounds... but it still hasn't failed me once 🙈
didnt you say thats 30 minutes for 45 imgs??
only things that ruined things so far, was messing with dim/alpha, or captioning without trigger words (or having words like "feet" or "foot" in the captions - any body parts are taboo and will break all anatomy knowledge of the original base)
10 min for 30~50 images
im trying without captions rn. some ppl on reddit had success
it can definitely work... but it's bad practice with sdxl, as you're guaranteed to get significantly worse results. hell, even booru tagging real life images has better results than no captions
i use blip for my app
only problem is when it specifies someone as bald or black and removes that info
XD
by 'no captions' you mean 1.0 dropout?
i mean no caption files
does it just use a trigger word then
folder name, yeah
ooh ok
my trainer prepends the folder name an image is in, to its caption, which is optionally its filename
i do that so that when i'm captioning data, i do not end up with too many similarly captioned images
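Roughly what that could look like, as a sketch only: the real trainer's code isn't shown here, so `build_caption`, the comma separator, and the underscore-to-space cleanup are all assumptions made for illustration.

```python
from pathlib import Path
from typing import Optional


def build_caption(image_path: str, caption: Optional[str] = None) -> str:
    """Prepend the image's folder name to its caption.

    If no caption is given, fall back to the filename (without extension),
    with underscores turned into spaces.
    """
    path = Path(image_path)
    folder = path.parent.name
    text = caption if caption else path.stem.replace("_", " ")
    return f"{folder}, {text}"


print(build_caption("data/ohwx woman/red_dress_beach.png"))
print(build_caption("data/ohwx woman/img_0042.png", caption="standing in a kitchen"))
```

With this scheme the folder name acts as the shared trigger word, so per-image captions only need to describe what differs between images.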
i think captioning tools will have the biggest effect on future training. being able to tell the tagger which types of descriptors to include or omit
oh, that's easy
I found I need fewer images to get a decent LoRA with SDXL. Did y'all experience that too?
i've seen the tool thats supposed to do that but for scaling on any type of human, it gets messy
great news for me:)
I might be hallucinating fwiw
yes - but backgrounds need to be watched out for
unless you're talking styles, those are super easy to get now
super bad news for me. I wonder if background removal for subject training even helps anymore
how do you mean?
nah I'm still doing faces, they work fine for the validation images in wandb so far, but once I use it for inference it kind of doesn't truly capture
is there a way to make the background not blurry
you can simply caption the background correctly, then either not prompt for it, or even negative prompt it
but if you ignore tagging it, the other words absorb it
even if I use negative in any of the bots it still shows up blurry
many prompts cause blur. what are your positive prompts?
ok lemme try less prompts
anything that usually has blur causes it, like "cinematic" <- obviously cinematic shots have insane bokeh
does it help with eyes?
look
prompt:beautiful ginger woman style:Photographic negative_prompt:blurry, blurry background
blur
I imagine the photographic style includes blur
Anything like photographic, movie still all that stuff have some depth of field effect
ohhh
Try not using a particular style in the bot and stuff like sharp maybe
Apparently for some reason "taken by iPhone" gets rid of blur
Or so I've heard
might also get rid of sharpness ;D
i'd put bokeh in the negative, but chances are the style overrules it
its looking like I'll need 100+ epochs for faces
what is bokeh
oh
also, a lot of relatively bad retouching techniques involve blurring the skin, because they rely on dumb frequency separation
its still blurry ugh
it's an effect created by shallow depth of field when you open the aperture in a photographic lens.
well... keep in mind the bot doesn't use the refiner yet
DeepFloyd ginger 
and the base model might not be trained to output a lot of fine facial details because of the refiner's existence
pseudo says refiner is going away
its xl 1.0
i hope it ain't, but the one we have will likely exist for a long time
@visual glade are you allowed to demonstrate img2img with 1.0 for us yet?
even clipdrop is giving blur
what makes you think it does? The original announcement regarding the bot here? Or I missed something?
discussions from this channel with the devs, and Emad's twitter
the base model wasn't like, trained to incompletely denoise images, if that's what you meant earlier
cause it does help at times
that's not what I meant
I meant it could not be sufficiently trained to output extremely fine details, like skin textures and whatnot, because the refiner can actually handle that.
but it can do them
base is a little derpy but i have no idea where you guys are getting this "can't do details" from
and partially noisy output merely is an option that helps to save a little bit of time with the refiner. neither the refiner nor the leftover noise is mandatory
Base on its own can do loads of detail, it just fucks up small things like eyes
And then the refiner usually sorts those out
Model: SDXL Base
SDXL Refiner: Off
Resolution: 1152x768
maybe you're running it like guidance 7.5?
i use no negative prompt, btw
Talking about eyes, this is what the refiner can do to help
which one is the fixed one? the glowy on the left or the triangular iris on the right? lmao
I mean the ones on the left are clearly broken
On the right they aren't perfect, but they look like actual eyes
funny that when I add a LoRA the image quality takes a big dip – everything looks a bit shittier
not sure why. Still messing with parameters though but default diffusers looks bad
here's a set of funky eyes i got but they're not that bad as the ones you had on the left
Well, are you sure the LoRA's good?
haha no, but if you have parameter suggestions I'd appreciate it
default diffusers set up uses Euler 😦
the colors are really nice
ahhh should I use something else?
i use DDIM
prompt= a stunning portrait of a 1985 movie elder-teen standing in line at a grocery store
seed= 1505469711
guidance= 8.2
guidance_rescale= 0.0
steps= 20
resolution: 1184x664
SDXL Refiner:
strength 0.65
num_inference_steps 30
guidance_scale 7.5
aesthetic_score 10.0
negative_aesthetic_score 2.8
^ for the SDEdit / img2img style diffusers, these are fine parameters
What does setting Aesthetic score to 10 actually help with
I noticed setting it lower on horror type pictures helped a bit with the feel
this is what i got for those
guidance - you mean Aesthetic score guidance on the refiner? positive or negative? or an actual CFG scale on the base being so low?
Not very much right now, but AIUI it's basically what the RLHF was trained into.
we're talkin about the base, Erilaz
So you get pictures more like the average of what people liked. ...in 1.0.
naw the RLHF was for their internal data and charts
the LAION aesthetics score is what went into the conditioning input value.
man, pseudo. you are showing screencaps of movies from my favorite era 😉
Elder Teens R Us?
Thanks
hehe
i love it when they get a 40 year old to play a high school student. it makes it so much more relatable and immersive
Morning 
oof. never even though of unironically setting the CFG scale on the base that low. thanks, gotta try that
yeah when you do CFG 1.7 through the base and then 7.5 through refiner you can add prompt keywords like 'colourful' and 'balanced contrast' to help bring it back to life
I've found cfg low on the base, high on the 2nd pass / refiner seems to work well
6.0 isn't low enough on the base though imo
you can't really straddle that uncanny valley there. it's either 1.7-3.0 or 7.5-8.0
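For context on why the CFG number matters so much, this is the standard classifier-free guidance combination applied at every sampling step, shown on plain lists instead of tensors. `cfg_combine` is an illustrative name. The scale multiplies the difference between the prompted and unprompted predictions, so 1.7 nudges the result toward the prompt while 7.5 pushes it much further.

```python
def cfg_combine(uncond, cond, scale):
    """Classifier-free guidance: uncond + scale * (cond - uncond), elementwise."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


# Toy noise predictions with two values each
uncond = [0.0, 1.0]
cond = [1.0, 1.0]

for scale in (1.0, 1.7, 7.5):
    print(scale, cfg_combine(uncond, cond, scale))
```

At scale 1.0 the output is just the prompted prediction; anything above that extrapolates past it, which is where the over-saturated, deep-fried look at high CFG comes from.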
you freaks and your middle CFGs can keep your horror shows 😄
I've been running 12 on the refiner and it seems to make the details pop a bit more
But I'm using dpm++ 2m SDE at the moment, which might be different to what DDIM does
Nah it doesn't, not at all
stochastic samplers are a bit different, more forgiving but less accurate
I can try it on DDIM and see if it deep fries
I'm back to dpmpp_sde_gpu. still my favorite
It's slower though, I want SPEEEEEED
yeah, that's true. but the image just converges nicely for me
Sell your house, buy an H100
implying I own a house
dpmpp_2s_ancestral is too slow. High quality, but very slow. With V1.5 I could create an image every few seconds
I'm back to using dpm_adaptive a lot, actually. It seems to do a better job getting rid of leftover noise.
I should try using that as a third pass maybe
But then again, V 1.5 was low quality at times
Yeah, on DDIM it deep fries
What would be a good choice? I want something that is fast, but not compromising too much on quality. It doesn't have to be super high quality.
I want to do more AI animations
dpmpp_2m?
you have to use the same sampler on both models
just in case that's what that is
I was, but the CFG was 12
this is what the refiner is getting at 13/20. It's got a lot of work to do
DPM doesn't care about it, DDIM seems to
@west breach like hell. if you squint, they mint
What ur trying to do?
partial-diffusion
I'm just showing what is going on. I don't think this is leaving the refiner to help sharpen up the end result, it's left with way more work than that


