#✨|sdxl
yeah, many words arent weighted equally. brown hair also dominates most prompts
but I'm getting there
the refiner does eliminate many of the texture issues, but i still don't consider it an amazing result. tbh i'm a little let down by the quality so far. I hope it will improve in the future with community efforts
also the double step process is annoying but 🤷♂️
oh better i guess
are you using a1111?
yes
ya i dont think it's using the refiner in the same way that it does in comfy
k. fixed the blur, now I just need to set the style
how did you fix?
selfie of a cute young woman with pastel hair with freckles
(blur, bokeh:1.4), + whatever other negatives you use
i was actually trying to make sci-fi movie scenes, but was having issues so i just wanted to try something simple with XL first to figure it out 😂
didnt expect everyone to start making pastel ladies
ah! I got you
Positive: cat summoning a gigantic tomato using the power of but a single infinity stone, in the style of marvel movie
Style: (horror:1.2) still, grunge, vignette, chromatic aberration, dark, lovecraftian
Negative: deformed, bad anatomy, ugly
this is nice, prompt?
does anyone have an idea how to use the refiner the right way in auto?
yeah. it can't. quite literally
using the tiled vae got me down to a min thirty, incredible. Thank you 😄
sai even confirmed that a1111 does not have full support yet, and will prob take a while
perfect! have fun making images c:
In Automatic1111 I run this prompt "lora:offset_0.2:0.2 plastic doll Dora the Explorer wearing He-Man outfit" with Seed: 169085828 and Sampling method: DPM++ 2M SDE.
It looks okay until the end, when it glitches out and gives me a grainy image. Can someone test whether this is the same for others or if it only happens for me?
post the grainy image?
maybe you still have an old VAE set in the settings?
for an older version of SD, make sure you are using the correct VAE
or automatic
I run it at 20 steps and it looks good until the last frame, and it still does great with the old Euler.
sounds like a VAE issue i had the same thing, but if not that, not sure
sounds a little like the preview is just taking an estimate
I run with the VAE on Automatic and I get the same when loading the sdxl_vae.safetensors
It only happen with DPM++ 2M SDE, the DPM++ SDE look okay
i just read some of the comments about the 0.9 vae, does anyone know the story on why, the benefits?
They left it in the oven too long
i dont think the devs had said anything about it
yeah i got this when i tried different samplers, can't remember exactly which ones at present
yeah interesting to see it just appear in huggingface
there's no "why" for it from what I understand
is it like a night and day improvement, or more like some prefer a and some prefer b?
anyone know a way to change the default workflow
or better yet load an arbitrary one on launch?
that one suffers from not enough steps on base model, too much noise AND not enough steps on refiner to fix it
There's token limit on comfyui or works like a1111?
The bot here, what does it use to generate images, what Sampling Method?
it changes
I don't think it's night/day, but the examples are hard to refute imo lol
I'm training a lora for SDXL. Getting this error trying to generate with it
17:58:40-904771 ERROR Diffusers LoRA loading failed: crown_vic-000006 'UNet2DConditionModel' object has no attribute
''
(am using kohya)
Well, for the most part SDXL gives me great images so I am still glad.
i know i tried 30 steps on the _2m_sde_gpu and got the artifacts that kennylex posted, and thought, if i have to go more than 30 steps, so far... not the sampler for me
finally a result i like 😄
but it begs the question if i do go higher yet on steps will i like the images enough to warrant the time
for science. it has a hardcap of 150 steps - at which point the changes stop existing
What samplers do folks like to use for SDXL? Myself, I just try to avoid Euler and the ancestral samplers.
Euler.
just used highresfix with the base, seems to work out pretty well for me
base only - as refiner doesn't need to be the same
i coulda sworn the cap was removed in comfy
i think i did a 300 step generation once just for funsies
was a few months ago tho so foggy on the details
nah, there is no cap, but its about when steps stop becoming relevant
ah yeah convergence and all that
are there supposed to be watermarks in SDXL base model?
This belongs in #1087493722645725184 ❤️
yes and no. please post a photo XD maybe you managed to set ascore to 0
This one has some small scribbles in the bottom.
That's base model, txt2img.
Did you ever get an answer for this question? I'd like to know too.
@boreal bough - With refiner in img2img, I get another one in the bottom left, + one diagonal across the background, like stock photo watermark google result
Mr Bean Ascending (missing the spotlight though)
oh. nah. that one is probably just baked into the artist or style you're going for.
I'm assuming you already have "signature" in your negative and it wasn't enough?
(signature:1.4) in the negative in the refiner process should fix it - though you should only use this strength when the default strength isnt strong enough
Slightly better results in comfy vs auto 1111
Ah, actually yeah, the txt2img WAS with a style LORA, first test XD
Must've come through there.
Both were without any "signature" or "watermark" in the negative. Only "noise, grain, filmgrain" in negative.
Good shit
I see your mech, and I raise you star destroyer
I see your Gundam style centaur mech and lower you a rusty beaten up mech that creates paths through swamps
Copax Realistic XL https://civitai.com/models/118111/copax-realistic-xl-sdxl10
srsly though good gen, i've got some rats in power armor that have a bit of the same vibe
I have so many questions 🤣
could anyone explain what It/s means? It says like 17 it per second but still takes a few minutes to generate
Look more closely at the unit. That is not it/s. It is s/it.
As in seconds per iteration.
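To make the unit difference concrete, here is a rough sketch. The function name and the 20-step count are assumptions for illustration; only the "17" comes from the question above.

```python
def seconds_per_image(steps: int, rate: float, unit: str) -> float:
    """Estimate sampling time for one image from the progress-bar rate.

    unit is "it/s" (iterations per second) or "s/it" (seconds per iteration).
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    if unit == "it/s":
        # Faster is a bigger number: divide the step count by the rate.
        return steps / rate
    if unit == "s/it":
        # Faster is a smaller number: multiply the step count by the rate.
        return steps * rate
    raise ValueError(f"unknown unit: {unit!r}")


# Assumed 20 sampling steps, purely for illustration.
fast = seconds_per_image(20, 17.0, "it/s")   # what "17 it/s" would mean
slow = seconds_per_image(20, 17.0, "s/it")   # what "17 s/it" actually means
print(f"17 it/s: {fast:.1f} s per image")
print(f"17 s/it: {slow / 60:.1f} min per image")
```

Same number on screen, but the two readings differ by a factor of 17 squared, which is why a "fast looking" rate can still take minutes.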
What GPU? What model? What resolution?
So AMD card on Linux? Are you sure it's using the GPU? I don't know much about AMD.
I can not recommend an AMD card for AI related tasks. All the software prioritizes NVIDIA because of its huge market share and CUDA, so you may constantly run into problems while setting up libraries etc.
Everything I've heard is that DirectML is slow and trash. Not many people use it so you will need to ask around.
I get the same performance, I have the same card. It's because of the resolution and overall size of the model.
Automatic1111 isn't really in my favor atm lol
thanks too, I dont have other options though :p
Did you get that from inpainting or prompting?
Use the base model, not the refiner.
txt to image with base
What was the prompt and CFG?
this was an earlier run with basic prompts testing refiner in img2img
7 and "beautiful young woman, short black messy hair, white, beautiful eyes, sitting in the park watching birds with a beautiful garden behind her
Negative prompt: fat, , , bad hands, mutated heads, multiple heads"
It's almost like your negative prompt got in the positive somehow.
did you find anyway to speed it up?
Indeed, automatic has done that lately lol. The more i imply negative, the more it adds it 
@midnight shuttle Me and my non existing short term memory 
Yep. Wrong model.
bro refining noise
Yeah, fuck that noise 
No. If xformers supported AMD we could have gotten better performance but it doesn't. Someone created an alternative to xformers called "flash-attention" and folks at AMD are trying to modify it to work on AMD. Fingers crossed.
The last update was yesterday.
https://github.com/ROCmSoftwarePlatform/flash-attention
Oh boi.. This will kill my gpu lol
Also, noticing the generation of automatic is quite slower than on comfy
thanks, it looks like there is a beta release available, I wonder if comfy cares about implementing it in ui
How much VRAM and what GPU?
Just about made it 
Filling up 24 GB? Wow.
Nibbled at my ram, then it went down again :P
it burped
VAE seems to use a lot of VRAM in SDXL/A1111. 9.3 GB used then up to 13 or 14 when VAE runs.
a 16k i generated with a 8k upscaler in comfy ate all my video memory, 30GB ram and then some lol (as in pagefile on my nvme)
No. It's a fork of the original flash-attention repo, which is why the documentation belongs to the original project. There's no beta release.
I even use --medvram; comfy barely uses any by comparison. It is at half usage when i generate basic res. Can even make 2048 generations there
There is definitely something strange with A1111 and SDXL memory usage. It should not be so inefficient.
Nope!
In comfyUI, is there any way to pick a single image from a batch and only regenerate that one? When I drag the image back in, I just get the same batch that I originally generated.
here is the prompt: With Cat-Like Tread
Upon our prey we steal!
In silence dread,
Our cautious way we feel!
No sound at all
We never speak a word,
A fly's foot-fall
Would be distinctly heard!
Now to refine this one that already looks great at 80 steps :P
Okay, sad :/ amd is in a really bad place right now
artroomAI with sdxl 1.0
things are actually improving now 😄
pytorch and rocm's last update boosted my iterations by 3-4 on SD 1.5 512px
I see your swamp mech, and send you a bird that really wants to sit on its shoulder
whats the number so I can compare?
I am gettin around 9.5
You are doing something wrong
It doesnt seem to be very good at faces that are not close up, and even close up ones seem iffy
been stuck on mechs atm ... but i've got one under fire
Not too bad, teeth is a bit off though
From the model documentation: Faces and people in general may not be generated properly.
some ok results though
Is there a "highres fix" / "upscale by" feature in comfy?
Ok, i love it lol
I'd have one of those on my desk
I need this
these are fire!
Sometimes it is good. And with training it can be better at people. But it is not designed to generate images of people. For that a GAN would be better.
I raise you a mech girl
I hate it, if I would see that in real life I'll punch the little shit
but its so cute
X to doubt on GANs. midjourney does a good job. I guess ill hope for community models to improve it
just gonna ignore the fact the tree/rock/explosion thing in the background is a low poly mesh but
Midjourney has the advantage of being a closed model so they can train it on more things and then filter the results. Open source models need to take into account possible misuse when choosing what to train. Have to be careful when training images of people.
confused robot is confused
"how does one... love?"
im not gonna lie im dying with that thing on the bottom right. look of just pure disgust/surprise
us frfr
think im getting more into it now
Going to ask again because I think it was missed. Is there a way in ComfyUI to pull a single image from a batch and just generate that one image again.
When I drag the image back in, it just gives me the workflow for the whole batch. I have tried incrementing and decrementing the seed, but I can't find it.
just need an upscaler
despite half a bottle of rum, the man still couldn't handle the endless ruminations of the giant robot. If he could, he would kick his ass and make him walk to the girl.
is this not the most basic workflow and I still get 1 per second
I do not believe so
is that actually the prompt
Is there any way to interrogate the image to get its generation parameters? Or are the later outputs in a batch dependent on the earlier outputs.
Where is the best place to find well made comfy UI work flows? The ones I find are very basic I want one that can do high-res fix and face restore etc
What I meant was that your system env is not set up properly. Did you install torch for rocm?
use vit-H, it works really well for sdxl
I really like this style
My eyes hurt just a little bit
so, the comfyui github states pip install torch-directml, thats what I did
Pixel art at the start, i think i had Pixel art game art of -
oh, if you mean like get the exact params, then drag the image into comfy, to get back the workflow, or use automatic png info, if it was made in automatic
interrogator + Vit-H, if its just an image
idk man, I use the rocm version, works for me. I am on linux btw, rocm isn't on windows
oh then thats why. I'm on windows, if I remember correctly rocm didnt work for me
its crazy linux has 10x performance for you lol
1 it/s on a $1000 card is amazing
Basically. I am wondering if there is an equivalent to PNG info in comfy. I have always used batches when testing basic things, but I like to then settle on a single image to upscale, inpaint, etc.
If I can't regenerate that single image, I am going to need to do things differently.
I don't want to do the batch each time because that is just wasting GPU time.
if you're talking AMD windows isn't even supported yet. They only just released the basic HIP SDK today, doesn't even have any ML libraries yet
yes
fun
idk what card you have but if it's an AMD one ROCm is their version of CUDA and it only works on Linux right now. It's coming to windows Soon™️
rocm started as a server thing. didn't even work on consumer GPUs for a while.
ah I was just talking to s0md3v because they have the same card as me and is getting 10x the speed
yea
I have a 7900 XTX and it runs pretty fast on linux.
whats your iterations/second?
sdxl 1024 is like 3.25 it/s or something
huh neat. I get 1.5 on 6900xt
wait I can just install linux
yea
yeah 7900xtx is pretty good
I have both windows and linux
same
idk if windows shitty partitions will allow it
windows for my Rift linux for everything else
the linux installers can forcibly shrink windows partitions lol
Oculus Rift?
yea
which one you got? OG CV1?
that's not a good idea. Just create a new partition for linux in windows before you even do anything else. This will make things easier if they are a beginner
worked fine for me
back up your stuff first though if you've never messed with that stuff before
damn, I'm a huge fan of the original CV1, I had S, CV1 is soooo good
yep, is what I said. just drag the generated image back into the comfyui
then it loads all the params
Should I buy a100
I only got the S cause it's the cheapest headset until valve finally makes a new Index
yes if you can afford without hurting your finances. no in all other situations
how is the a100 compared to a 3090/4090?
if you're 100% certain you really wanna do literally all the AI stuff sure. If AI is an occasional hobby I'd snag a 3090 or something instead
4090 seems better for generation and cheap
A100 is like 2.3 times faster than 4090
my 4090 can essentially do everything except pure finetuning - although even that works with gradient accumulation + batch size 1. (would just take too long)
(10x the price)
3090 is the best bang-for-buck option you have right now
So like 10x 4090 can be better than a a100 maybe
Not a professional tho
Yes, but that gives me the batch parameters. I only want one image out of the batch, not the whole batch.
I've seen 7900 XT's new for $750, wonder how they'd do in comparison
long as you don't mind the ROCm pains
lol just get an a6000 if you really really need the vram on 1gpu
Isnt there something about AMD and double the vram requirements?
when you load comfy with direct ML does it display your VRAM properly?
¯_(ツ)_/¯
It's running well on my 7900XT, getting ~2.9it/sec for 1024x1024 SDXL
InvokeAI 3.0.1 just dropped with that sweet SDXL support. (okay, partial support. I think inpainting is still missing?)
I don't have a 24 gig nvidia card to compare against
in that case no - it only provides the batch.
you have to manually increase the seed number to generate just the one image
I use an RTX 3060, should I upgrade to a 4080
I do know that the pytorch 2.0 attention is broken on AMD and it causes these VRAM spikes like 2-3x what you'd normally see
so you have to use sub quad or doggettx attention
ah, yeah XD 4080 is ok
12g-16gb memory? not really an upgrade
Is it 16Gb vram only
not necessarily what I'd recommend for ai, but its good for gaming
Which one is the 24gb one
Do you know what the default increment is for Comfy? In A1111 it is 1, but I have tried 5 seeds in each direction without finding the 3rd image in my batch.
3090 (around 700$ used)
4090 (around 1800$)
have 24gb vram - and are the best for AI
Ohhhh thats nice
4090 is not worth it for speed. while it is a bit faster, not nearly enough to more than double the price
3060 still works but I want to learn training fine tuning
one thing I'll say is on KoboldAI, my 7900 XTX can load 13B param models with minimal offloading to the CPU, while my 1070 struggled with 6B. The maths math out so it doesn't seem like the VRAM is crazy or anything. Think the high VRAM is just a Stable Diffusion thing.
I only got the 4090 since I do gaming and other things, and had the extra money
What you use
you shouldn't need any offloading for 13b 🤔
16 bit depth
Win 11 has faster performance too
back when I still had my 1070 4/8 bit didn't even exist so I'd say it's a fair comparison
So this xl can't work well with 512x512, can it
Hmm
man that must suck
So what is this xl for
Finally got the right to use A100 gpu on google cloud 
Xl sized images
1024x1024
1.5 can do 1024 too 😭
I mean there's a ROCm version someone made that seems to work well for 6900 XT's and similar. I ported some parts of the code to target my 7900 XTX but it miscompiles with some clang problem and I'm too lazy to fix it
But not well. If you want 512 images use 1.5 or 2.1. If you want larger images such as 1024 or other supported resolution use SDXL. Very simple.
How much are you paying 😭😭😭
But 1024 is way too slow
Should be less than 1$/hour on spot vm
Buy or rent better GPU.
Still it will be way slower even on a better gpu 😭
no
:(
SDXL is optimized for 1024, its a bit slower. But not by much
How to enable embeddings in comfyui? Like it's no half vae for a1111
its just a better model
Yes it's like the same speed but 1024 makes it slow
it makes 1.5 slow, SDXL is made to run at 1024
Yea 😭😭
especially when you account for success rate
SDXL is faster on 768+
Bonfire of the Vanities (that's the prompt) ArtroomAI sdxl 1.0 dpm++ SDE
How to use textual inversions in comfyui?
I should probably move to linux whenever my upgrades arrive, dealing with all this shit on windows and amd is just pain
its all hacky kinda works *** bs
type embedding:tinversionname
Yeah linux works so much better for AMD cards, it's well worth it
What causes this unrealistic look in stable diffusion
Use both. Whichever you install first, there's options to leave some of the drive unpartitioned then you can just run the other installer and boom.
Set the Linux install as your primary boot device and it'll add Windows as a boot option usually
No need to add any arguments like no half vae in the batfile?
ik, was planning to do it on sd release but decided to wait since i'm swapping out ram and adding a bunch of extra storage space in a few days anyways
comfy uses full precision VAE by default
😢
price/performance not worth that I think
XTX or 4090
3090 will always be the better deal, unless you're determined to play cyberpunk without dlss enabled
man once you get a card that has ai accelerators it's a whole new world. Instead of very carefully generating images hoping they turn out OK you just generate like 10 of them at once and pick the best one
in 1.5 I had my default batch size to 16
is what I had to get to make full use of games along with rtx4090
you end up with weird bottlenecks if you go highest end
definitely wouldn't recommend XD
lol how long before it throttles
2 TB gen 4 + 1 TB gen 3 == happy
idk anything that can use gen 5 speeds yet
most high end gaming boards from the last two years. funnily enough we just didn't have nvme's that did pcie 5 until like this year
I meant like software wise
Whats the downside of a 12gb 4060?
It's really really hard to be bottlenecked on a gen 4
What model is best for photorealistic
unless you literally just copy around 100 gig models
ah, yeah. windows boot drive.
games - ratchet and clank: rift apart!
no loading while jumping through areas XD
nah I put windows on my gen 3 drive fuck windows
linux gets my gen 4
I just want my adobe stuff to load faster tbh
windows is just an additional bonus
I swear the adobe stuff mines bitcoin or something when turning on
lol intel optane as a boot drive+ pagefile since they have the most absurd endurance ratings of any drive, 1.2pb for a 128gb drive
I think optane is still better than any gen5 for a boot drive even with 2 gens of pcie bandwidth for the gen5
optane on nvme drives is a scam. They literally use the same chips 99% of the time
? its 100% optane not the hybrid drives
Just try 3090. 4090 is overpriced
That's so expensive lol
what's even the point then? Just a marketing name for their spendier drives or something
it's not optimizing anything if the whole drive is made of the same chiplets
optimizing? it's a completely different tech to nand flash, I use it because it has a hilariously high write endurance/sustained write speed vs most junk nand drives
https://www.intel.com/content/www/us/en/products/sku/211867/intel-optane-ssd-p1600x-series-118gb-m-2-80mm-pcie-3-0-x4-3d-xpoint/specifications.html
PCWorld did a video on optane drives. For most tasks regular nvme is better. For some tasks optane flies past everything
ROCM is now out for windows. Any ideas how to use it with comfy ui?
was PC world speaking on endurance or just performance?
the HIP SDK is out but the ML Libraries aren't and there's no torch builds for it yet
performance only
yea
I think Robot is primarily speaking about endurance
Anyone tried latent upscale? I'm getting artifacts when I try.
think endurance is oversold nowadays. had this drive for like 4 years across two computers having done a lot wipes + clean installs and in/uninstalling dozens of fat bloated modern games.
total expenditure, 8% lol.
lol 20tb written, i've done like 2tbw in 1 day just tinkering with comfyui
it loads ~25gb of write to the drive every gen
Fucking how. I've been using SD since 1.4 came out
mostly on that drive until recently
What's writing 25 gigs?
Lmao. I really want the ram to arrive so this sort of nonsense can stop occurring (this is from just this session)
Ah
I've never used pagefiles
ever
swap on windows and page on linux are both 0
wait
swap on linux and page on windows
I tried to get away with not getting more ram but llms are 100% bandwidth starved so upgrade it is lol
ram's so cheap
also, update on the whole TensorRT situation, NVIDIA stated that on the next TensorRT release they will release a script that converts almost any safetensors(including SDXL) into TRT
last thing you should cut corners on
this was a while ago
the difference between 16 and 32 gigs on my 2017 machine was like $30
and until very recently 32 was enough to do everything under the sun without swapping
hey does someone know how to use comfy ui? because i have some problems with loras and embeddings. i don't know how to activate them. i added them to the node tree but idk what to type in the clip text box to control them
they're just active
it's not like auto where you need to type in a specific token to activate them
just go base model node -> lora node -> everything else
set the strength to 0 to turn it off temporarily
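The "base model node -> lora node -> everything else" wiring can be sketched as a partial ComfyUI API-format graph. This is only an illustration: the file names are hypothetical, the node class and input names are the stock ones as I understand them, and the sampler, latent and VAE nodes are left out.

```python
import json

# Hypothetical file names; use ones that exist in your own models folders.
CHECKPOINT = "sd_xl_base_1.0.safetensors"
LORA = "my_style_lora.safetensors"


def build_lora_chain(lora_strength: float = 1.0) -> dict:
    """Return a partial graph: checkpoint loader -> LoRA loader -> prompt."""
    return {
        "1": {
            "class_type": "CheckpointLoaderSimple",
            "inputs": {"ckpt_name": CHECKPOINT},
        },
        "2": {
            "class_type": "LoraLoader",
            "inputs": {
                # ["1", 0] means output 0 (MODEL) of node "1"; output 1 is CLIP.
                "model": ["1", 0],
                "clip": ["1", 1],
                "lora_name": LORA,
                "strength_model": lora_strength,
                "strength_clip": lora_strength,
            },
        },
        "3": {
            "class_type": "CLIPTextEncode",
            # No trigger syntax in the text: the LoRA is active because the
            # CLIP (and later the model) come from the LoRA node, not node "1".
            "inputs": {"text": "a photo of a cat", "clip": ["2", 1]},
        },
    }


# Strength 0 leaves the node in the chain but turns its effect off.
graph = build_lora_chain(lora_strength=0.0)
print(json.dumps(graph, indent=2))
```

Everything downstream should take its model and clip from node "2", otherwise the LoRA is bypassed.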
Does anyone of you have a good Img2Img workflow for SDXL 1?
Trying myself with 1.5 ControlNet
ok thanks
and with the embeddings?
1.5 stuff does not work
I don't have an actual img2img workflow .json but if you just add an image input into my principled node it behaves like img2img with all the settings you're used to https://github.com/Beinsezii/bsz-cui-extras
at all?
img2img works
if XL works 1.5 should 100% work
Thx
controlnet with xl works ??
No
no
the tooling does but it needs to wait for new models to be trained iirc
1.5 models, loras or whatever do not work with xl.
right, saw the github discussion, waiting til its out then ill switch over
Im getting error "shape mismatch when trying to apply embedding, embedding will be ignored 768 1280" when I try to use embeddings in comfy
Oh no.
I thought you meant 1.5 on its own
I'm trying to build automatic face-inpainting in comfy. But with CLIPseg it makes a solid mask that cant be denoised from the original image ("original fill" mode in auto1111)
Ideas?
no
Im getting error "shape mismatch when trying to apply embedding, embedding will be ignored 768 1280" when I try to use embeddings in comfy. Anyone help?
are you using 1.5 embeddings for xl?
Yes
Using for DreamShaper
Sdxl has different negative embeddings like BadDream or UnrealisticDream
?
fucking resetting GPU motherfucker fix your shit AMD fuck
yeah the setup is not recommended now obv, Its super jank but works much better than it should tbh for being at like 99.999% memory usage for hours at a time
i am testing embeddings for a 1.5 model
speaking of SSD wear I also set my default firefox download folder to my ramdisk and it's really nice 👌
unzip files and dick around with everything, then copy the final folder structure at lightspeed to your disk if you wanna save it. Don't even bother cleaning up cause it all vanishes when you power off at night.
also have my SD output folder on it
if u use comfy ui 32 is still enough
recommend 64 just for that
SDXL + refiner I don't go over like 23 gigs
on automatic 1111 i go over 32 but on comfy ui only like a bit more than 20
sounds pretty nice
So, I left a SDXL lora on back on a 1.5 generation. It is not terrible..
Actually seems to handle upscaling pretty well
huh? How did you load an SDXL lora inside a 1.5 model?
very carefully
comfyui can do that
comfy
actually it should be pretty easy to (accidentally) do on a1111 too lol
lets just answer anyone's questions they'll ever have from now on with comfy
so, have you tried the other way around? 1.5 lora into SDXL model?
spooky eyes
I've found it likes making anime eyes kinda dead inside
XL Loras are like insane size. Some on civit AI are the size of pruned FP16 models
I would not expect any 1.5 lora to produce anything other than random garbage on xl, they aren't even the same architecture
Is that just a storage constraint? I've never worried about model size before
I think it has to do with using sd1.5 lora settings on sdxl, but I am also clueless at sdxl training
I just run them through a 1.5 model at like ~.4 denoise and it seems to work pretty well
nah fuck 1.5 it runs like ass at high res
How do i get the denoising option for img2img?
2 megapixels it's like half speed or something
start at steps > 0
lmao (agony)
I mean the fact that 1.5 loras are like 15-20X smaller. Also you need to load them. So also performance
if total is 20 and it starts at 10 that's 50% denoise
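The start-step arithmetic can be written out as a small sketch. The function names are made up for illustration; the relation is just the one stated above (steps actually run divided by total steps).

```python
def denoise_fraction(total_steps: int, start_at_step: int) -> float:
    """Fraction of the schedule that actually runs when sampling starts late."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    if not 0 <= start_at_step <= total_steps:
        raise ValueError("start_at_step must be between 0 and total_steps")
    return (total_steps - start_at_step) / total_steps


def start_step_for(total_steps: int, denoise: float) -> int:
    """Inverse: which step to start at for a wanted denoise strength (0..1)."""
    if not 0.0 <= denoise <= 1.0:
        raise ValueError("denoise must be between 0 and 1")
    return round(total_steps * (1.0 - denoise))


print(denoise_fraction(20, 10))  # half the steps run -> 0.5
print(denoise_fraction(20, 0))   # every step runs -> 1.0
print(start_step_for(20, 0.4))   # start step for ~40% denoise
```

By this arithmetic, starting at step 0 of 20 is full (100%) denoise, and starting at step 20 of 20 changes nothing.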
Thxxxx, i needed that <3
the official sdxl lora example is only 50mb. the folks on civitai posting 500mb loras are doing something wrong lol
but does 0 of 20 mean 100% or 0%?
I mean clearly. Wednesday one is like 1.7GB 😄
I think you need to use lower network settings as you're replacing less of the model
and in sd1.5 you used higher ones
base 1.5 model= 2gb lol
There are a bunch of different lora types and the appropriate size for each type varies. Then for each type, there are size settings that change the size. You can't just say "lora is big therefore settings are wrong".
I would imagine a contrast adjustment being a lot smaller than subject loras
surely you dont need 1.5gb of weights for your like 100 image lora
That's true for sure.
the concept is the same, you're just training on a (relatively) few images. the size should still end up the same. i think someone spoke about this above but theres been a lot of responses since then, there's settings that you can tweak when finetuning and the ones with huge filesizes are likely setting them incorrectly
I believe the setting that influences the file size is relative to the base model. So if you did a 50mb training on SD1.5, then it would be much larger on SDXL without needing to be
since resolution is 4x, so I would expect 4x size
assuming theyre using a few images
Network Weights, etc are more important than the amount of images when it comes to size
did someone say street photography?
76 2k images done in 30 minutes. Woohoo Lora training
Waiting for controlnet.
A400?
I have a 3060 12gb :((
can you train Lora with 12 gb vram?
For SDXL that's freakishly fast
Yeah 8 rank at 768 at least
Easily with 1.5. Extremely slowly, with a batch size of one and gradient diffusers enabled, on sdxl
I need to learn how to do it then. I have 12 GB VRAM. Do you have a tutorial for sdxl training lora
The best one was the SECourses guy
I don't understand what you are saying
do you have the link?
He's a YouTuber
@fast vector 90 images in 30 minutes for a 3060 12 GB? I can't believe how much power sdxl demands :((
except that I asked one of SDXL Lora's creators and for them it took much longer.
"Oh it did. I think it took about 3.5 hours to train this for 2 epochs on my 3090. I've trained my last few 1.5 LoRAs at 1024 this took substantially longer."
I feel like I might give up on sdxl Lora training and maybe only train a model when I go to sleep or go to work. This is wild
Yeah 3.5 hours is too long to be honest
im still experimenting, but it looks like you can get a satisfactory result in less samples with SDXL.
My exact SD1.5 lora settings on SDXL took 3.5 hours or so on a 3090. Results were bad. I, then, did a different training with new settings (same image count) and less samples and it took 30 min with much better results.
wait you're mooncryptowow? Or is it just coincidence?
Stack Overflow are releasing their own Overflow AI, specific for AI/ML. interesting
Which settings did you change?
pretty much everything tbh, I still havent found good settings... just better-than-before settings
man, I actually wish if comfy had something like autocompletion. It would be great for embeddings, etc
does old textual inversion work on sdxl ?
Old nothing works with sdxl. You have to train new textual inversions and loras afaik.
i see
I think textual inversion will not be able to work on sdxl. At least that's what I remember reading
i get a error everytime
I mean there are already loras and even checkpoints
I haven't tried to train the actual model. Loras and so on work fine. I use a branch of the kohya scripts.
theres a bunch of new and different settings you need
and only one video tutorial that is super bloated
such as?
this has a summary of some
I just grabbed updated scripts, and trained my loras like normal. I only needed to set the res to 1024 and check the box for an sdxl model.
ya i dont want lora
do you have an example of your settings / output?
i wanted to train a checkpoint
ah, sorry I never tried checkpoints
Yeah I have not done checkpoints either for sdxl.
I don't have optimal settings. I don't want to post samples trained from real people here.
does comfyui not install on python 310
No I mean
i got comfyui on 3.10.9
I tried installing pip dependencies and some packages aren’t available
yeah it should work
which is the ideal AI python version
specifically torchdiffeq
i use torch 2
I use python 3.11 and the standalone ships with python 3.10
torchdiffeq is pure python so it should have no trouble installing
One hour and a half in and I just hit my first epoch out of 20 for SDXL Lora training

5k dataset? xD
well you made me notice torchdiffeq wasn't actually used for anything important
so I'll remove that dependency
I have a total step count of 1800 😦
even then O_o
you going over your vram by any chance?
hey how do embeddings in comfy ui work. when i am using them do i need to put the file name or the embedding name into the clip text box?

@deft coral what graphics card do you have
RTX 4090
@boreal bough I have 12 and I believe it was asking for 17 I got it to work by enabling the gradient something or other
@deft coral 3060 12 gb poverty gang over here
oh damn. you're running on standard ram now - which explains the speed
it isn't really explained there
Here's the results of a "spider-gwen" lora for sdxl 0.9. I haven't re-trained it for 1.0 yet because I need to change the dataset some and I haven't gotten around to it yet. In this attempt, the fact that I mixed costume, no costume, mask, no mask, made the lora somewhat ineffective.
in their case the file has the same name as the embedding
@boreal bough I don't know what that means 🙂
embedding:filename
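A minimal sketch of that syntax, assuming the ComfyUI convention of writing `embedding:` followed by the file name in the CLIP text box (the extension can be left off, and a weight uses the usual `(text:weight)` form). The helper name `embedding_token` and the file name `my_embedding.pt` are made up for illustration.

```python
def embedding_token(filename: str, weight: float = 1.0) -> str:
    """Build a ComfyUI-style prompt token for a textual inversion file."""
    # Strip a known extension; the bare file name is what goes in the prompt.
    name = filename
    for ext in (".safetensors", ".pt", ".bin"):
        if name.lower().endswith(ext):
            name = name[: -len(ext)]
            break
    token = f"embedding:{name}"
    # Weighted form reuses the normal (text:weight) prompt syntax.
    return token if weight == 1.0 else f"({token}:{weight})"


# "my_embedding.pt" is a hypothetical file in the embeddings folder.
negative_prompt = ", ".join(["blur", "bokeh", embedding_token("my_embedding.pt")])
print(negative_prompt)
```

The token is keyed on the file name, so an embedding name stored inside the file does not matter for the prompt.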
@queen ether
basically, once you run out of vram, it doesn't crash nor stop working, but it gets slow as hell
that's what you're experiencing
but 1 embedding file can have multiple different embedding names in it
can you show me one?
@deft coral my GPU is held up by a paint brush
you're missing some optimization settings, since multiple here run it on 12gb vram
Is there a guide you're following for your settings? They look great
@boreal bough is that a command argument ?
ok wait a sec
are you running kohya-ss directly, or via the ui, or via derrian easy lora trainer?
@boreal bough UI :0
this one has 2 and there are others with even more https://civitai.com/models/89484/epicrealism-embeddings?modelVersionId=95263
This is just a resource Upload for Sample-Images i created with these Embeddings. They are not very versatile and good Positive Prompt: epiCRealism...
then xformers, gradient checkpointing, 1e-3 unet & learning speed. dim/alpha = 8/1 ,and the command argument for unet only
--network_train_unet_only
except that there are two different ones. One in positive and other in negative prompt
I only see one
that one should work fine on comfyui
I hope it does
it has 2 : 1. epiCRealism 2. epiCNegative and the file name is epiCNegative.pt
no, there are two files
@boreal bough I was reading this on the GitHub but I don't know where to put it. I know where to put it in stable diffusion but I don't know where to put it in kohya SS. I have something filled in in the optimizer extra arguments that starts with scale parameter, is that it?
No but if the next one I train works ok, I'll post some details for you here. I think I copied settings from here in my more recent lycoris attempts. https://youtu.be/dUlki1IAB0w?t=460. But I used the prodigy optimizer with learning rate = 1.0 and the loss doesn't drop much after around 1000 steps or so using around 40 images.
id imagine it would be hard to make any images without it having one haha. Its def in there
no it just downloads 1
under the scheduler in the ui, there should be an empty text box for additional arguments, put it in there
just click on other one
its just 1 file
hi friends, whats the recommended vram for running SDXL?
{'string_to_token': {'': 265}, 'string_to_param': {'': tensor([[-0.0034, 0.0213, 0.0007, ..., 0.0280, 0.0048, 0.0047],
[ 0.0179, 0.0198, 0.0057, ..., 0.0105, 0.0117, -0.0043],
[ 0.0095, 0.0053, -0.0120, ..., -0.0096, 0.0066, -0.0025],
...,
[ 0.0037, -0.0079, -0.0430, ..., -0.0173, 0.0184, -0.0008],
[-0.0271, -0.0072, 0.0129, ..., -0.0072, -0.0059, 0.0009],
[-0.0165, -0.0159, 0.0144, ..., 0.0192, 0.0016, 0.0220]])}, 'name': '_EmbeddingMerge_temp', 'step': 0, 'sd_checkpoint': None, 'sd_checkpoint_name': None}
there's only one
ohhhhhh
i have 10gb VRAM but when i select the SDXL model, my GPU is stuck at 9.7/10gb 😭
sorry this is the first time i am doing embedding stuff so i am a bit of a noob sorry
@boreal bough I could keep asking questions forever 😓
dont stop ^^ I'll gladly help when it comes to LoRA questions
i didnt see that i have to download 2 files, i thought its both in 1 file sorry
are you on the latest? try update/update_comfyui.bat
at first i thought comfy ui looks harder than a1111 but its the opposite.
which lora and can you give me a screenshot?
is there any sheet for all the styles for sdxl? so we know how we can get a specific style?
yeah, there is.
assuming you're doing normal training, with 1 trigger word at the start
then these are your advanced settings - where xformers and such is included
only the network train unet needs to be in that box (if you're using adamw8bit), if you're using an dadapt or other cool scheduler like that, then yeah, the remaining special settings also go into that box
#✨|sdxl message also just pinned right here haha
that's not the proper lora format though
@visual glade where do I voice my complaint about the civitai lora promotion? :/
terrible loras are mass produced by the minute right now
"woops, we forget the whole unet, guess the vae's just that good"
only if its clip guided 
on the civitai discord probably
T.T
I'm pretty sure a1111 doesn't support that format either
is this the diffusers format lora format?
the bad ones will get downvoted
its just a matter of considering 2 basic things
A.) this is not 1.5 - stop reusing the exact same captions - especially when they don't work in standard generation to begin with
B.) use dim/alpha of 8/1 unless you know what the difference is of not using this preset
i was told to use that argument in there because SDXL doesnt work with adamw8bit, so im using adafactor. i also have almost all the same settings enabled as you other than memory efficient attention. so youre saying i should switch to adamw8bit and use the --network_train_unet_only argument there instead?
? fake news XD I've trained 20+ loras on sdxl with adamw8bit
all very successful
Unet Training: 1e-3 <- ideal setting for 95% of situations (From dataset size of 10~1000)
Dimension/Alpha = 8/1
Unet training only! (--network_train_unet_only)
Resolution = 1024,1024
Bucket size = 512,2048
repeats = 1
epochs = around 20 should be where the model is 'perfect'. Train for 40 to be sure. (scales very slowly with dataset size - but not nearly as much as in 1.5)
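The settings above can be sketched as a list of command-line arguments for the kohya sd-scripts trainer. This is only a sketch: the flag names are the ones I understand kohya's `sdxl_train_network.py` to accept, so check them against `--help` on your install. The `build_args` helper is made up for illustration, and repeats are left out because they are normally set through the dataset folder or dataset config rather than a flag.

```python
# Values taken from the settings listed above.
settings = {
    "unet_lr": "1e-3",
    "network_dim": 8,
    "network_alpha": 1,
    "resolution": "1024,1024",
    "min_bucket_reso": 512,
    "max_bucket_reso": 2048,
    "max_train_epochs": 20,
}

# Flags without a value. The first one is the "unet only" switch from the chat;
# xformers and gradient checkpointing were mentioned earlier for 12 GB cards.
switches = [
    "network_train_unet_only",
    "xformers",
    "gradient_checkpointing",
    "enable_bucket",
]


def build_args(settings, switches):
    """Turn the settings into a flat argument list (hypothetical helper)."""
    args = []
    for key, value in settings.items():
        args += [f"--{key}", str(value)]
    args += [f"--{name}" for name in switches]
    return args


args = build_args(settings, switches)
print(" ".join(args))
```

In the kohya GUI only the valueless `--network_train_unet_only` would go into the additional-arguments text box; the rest have their own fields.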
is there an example diffusers format lora somewhere?
it's like 2 lines of code to add support so I can add it as long as I can actually make sure it's an actual format and not an output of buggy software
ive been reading about the --network_train_unet_only thing for the past two days now an i just have no idea just ZERO idea how to enable it
"easiest way to train stable diffusion xl"
couldnt be more wrong, dear god
almost everything said in this video is not true T.T
so there's no diffusers lora files anywhere?
🤷♂️ looks like kohya it is.
Perfect prompt following... as long as your prompt is "random noise".
Im getting error" shape mismatch when trying to apply embedding, embedding will be ignored 768 1280" when I try to use embeddings in comfy. I saw drramshapersdxl alpha 2 model using BadDream negative embedding, can you help? @visual glade
Ok, where do I get the support then? I really don't know, thanks
here may be good 🙂
just don't ping the devs pls
Okay.
your embedding is prob-- yeah that
so it will actually work a bit on SDXL but will only apply to the clip-l text encoder
hence the warning
That I know but saw the sdxl model dreamshaper using it, saw on meta data in civit
hmm
Okay, so it works to a point?
it'll half-work.
and by half, less-than-half... since L is weaker than G
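A rough sketch of why the warning shows 768 and 1280, assuming the usual widths: SD 1.x embeddings are 768 wide, which matches SDXL's CLIP-L text encoder but not the 1280-wide CLIP-G one. The function name is made up for illustration.

```python
CLIP_L_WIDTH = 768   # text encoder shared with SD 1.x
CLIP_G_WIDTH = 1280  # second, larger SDXL text encoder


def sdxl_encoders_for(vector_width: int) -> list:
    """Return which SDXL text encoders a vector of this width could apply to."""
    matches = []
    if vector_width == CLIP_L_WIDTH:
        matches.append("clip_l")
    if vector_width == CLIP_G_WIDTH:
        matches.append("clip_g")
    return matches


# An old 1.5 embedding: only CLIP-L fits, so the CLIP-G half is skipped.
print(sdxl_encoders_for(768))
# An SD 2.x embedding (1024 wide) fits neither encoder.
print(sdxl_encoders_for(1024))
```

A textual inversion trained for SDXL carries one set of vectors per encoder, which is why old files only half-work.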
Joe
it will do something but it might not be the thing it's supposed to do
@visual glade can i ask you something
should use kohya gui
sure
So better not use it?
I made a node to apply a HaldCLUT to an image. Would it be possible to apply it to a latent? Not sure how the numbers represent RGB values.
does SDXL have unet?
Oh ok
yeah it has a unet
comfy, can you check to see if we included the unet with SDXL
that's probably a problem with your software then or your settings
I don't think people would be complaining about memory usage if it was missing
lmfao
just stare at the noise like clouds and draw what you see, no need for unet
Boring old SDXL still requires a unet???
guys, theyre NEW lmao
we might've replaced the unet with a tiny guy that sits inside your pc and draws the pictures for you, not sure
😕
"beat unet guy into submission only"
poor unet man :c always drawing. never rewarded
Try a LoRA first
are you sure you have the correct SD model file? The official standard one not a diffusers file
i can't even get the lora to work since it needs the main model
maybe it's actually a wnet from sticking two of them together
Freon needs to get Everydream working on 1.0 so we can see how screwed we are trying to do full finetunes on 24 gigs of ram
oooo
wnet.
I'm in
joe
ryzen
using what checkpoint? xD we only have fp16, or the research licenced full 0.9 checkpoint
okay your issue is trying to do this in automatic1111
i think that's kohya gui, which is just a gradio shell around kohya-ss scripts
Ah, got it.
Updated for SDXL 1.0. How to install #Kohya SS GUI trainer and do #LoRA training with Stable Diffusion XL (#SDXL) this is the video you are looking for. I have shown how to install Kohya from scratch. The best parameters to do LoRA training with SDXL. How to use Kohya SDXL LoRAs with ComfyUI. How to do checkpoint comparison with SDXL LoRAs and m...
Try that tutorial step by step
why are you trying to create a model? lmao
Then, you can ask beginner questions here: https://discord.gg/wFYWAykv
heh I guess he's got it working and testing it so we'll find out.. and what happened to the 1.0 with .9 vae sai released today? did that get sorted out?
Any insight on why previous models' VAE worked with fp16 but SDXL VAE requires fp32?
Now I'm concerned we didn't ship the right file tho.
@sour obsidian - can you check?
And make sure we included the unet?
jonatas
alexander
yea, I sometimes forget that, might have slipped my mind
the previous models VAE also has issues in fp16
at least one of them did

https://huggingface.co/madebyollin/sdxl-vae-fp16-fix this should work for fp16?
fp8 wen
I hadn't seen that before. 1.5 and 2.1 both worked without --no-half-vae. But SDXL I had to add it. Strange.
there's enough issues that I made comfyui always run it in fp32
I've been using that one. I just prefer to use the baked-in VAE so it's one less setting to change when I change models. (Can't figure out how to have A1111 pick up a .safetensors VAE automatically)
when emad gives us even more h100s to play with
i use no-half-vae
we need at least 50 dedicated to 24/7 cat pic generation before we can get around to productive usages like fp8 testing
to... train?
Is there any equivalent of no half vae for comfy?
no
Playgroundai.com just launched the first batch of SDXL 1.0 filters. They're super cool.
in the command line
i think stability should run a contest where i win and get a h100
I don't understand enough about how that would affect the unet. How is it related?
well, we can neither confirm nor deny the inclusion of the unet.
might not be very beneficial with sdxl. the base is so good that none of the refined models that have come out have really impressed me much. they all seem to ruin the general use capabilities
i know what you're saying
I'm hoping for SDXL hypernet training someday and also inpainting model.
but i have my own dataset
Can't wait for inpainting model
joe. I just checked that video and it has a few critical things wrong.
it's missing the unet only, its dimension is set to 256... ... ... and a lot more
essentially any LoRA trained using this video will
• significantly harm clip model. longer prompts wont work correctly anymore - which is what more and more people that rely on this tutorial keep mentioning in this very chat
• create huge filesize, when 43mb is all that is needed. especially for sub 500image datasets
• not mention how incredibly important captioning is, even more now on sdxl
• using ohwx token, which was a 1.5 thing, as we have new token weights now
and more but I dont wanna spam you.
Playgroundai has inpainting already
most refiners do too. lots have come out and none have really improved things yet
Nothing local tho
Inpainting model (works well with inpaint only masked)? Or just inpaint feature in general?
I guess sdxl has little to no knowledge of slime rancher...
It's all there, including ControlNet
So they have access to an unreleased inpainting model? That's good news because it means it will be released eventually.
Can it do latent nothing inpainting that can add totally new objects? That's mainly what I did in 1.5 inpainting
@boreal bough was i right then>
It's identical to 1.5
loras can train on large datasets too. i think the community is just going to move towards loras. they're so much more efficient and when they're performing better than all the base models that are available, people will start making lora merges and all the scene will happen there
Sweet
Same here. That's why I am waiting for it and why I am still on A1111. Can't do that with Comfy as far as I know.
And free. They give you 1k images a day for free. Their sub model is like $15 bucks a month.
I'm guessing your info came from that video, since it explains the settings
inpainting with comfy is possible it's just a pita
I did some inpainting in comfy with a 1.5 model but I wasn't very good at it
Inpainting whole image is possible. I have not found how to replicate "inpaint only masked" but I don't need it yet since there is no inpainting model for SDXL yet.
Playground is building what they call "Canvas". It's amazing. It does have a couple glitches. But new model coming soon.
you'd set up a crop node and inpaint that whole cropped picture. then paste it back onto the coords it came from. that's what auto is doing behind the scenes for inpainting only masked. they just crop to the mask
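The crop, inpaint, paste-back idea can be sketched in plain Python on nested lists. This is an illustration of the technique, not what any UI actually runs: `inpaint_fn` stands in for the real sampler, and the padding and resizing that real UIs typically add around the crop are left out.

```python
def mask_bbox(mask):
    """Bounding box (top, left, bottom, right) of nonzero mask pixels, end-exclusive."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    return rows[0], cols[0], rows[-1] + 1, cols[-1] + 1


def inpaint_only_masked(image, mask, inpaint_fn):
    """Crop to the mask, inpaint just the crop, paste it back at the same coords."""
    result = [row[:] for row in image]
    box = mask_bbox(mask)
    if box is None:
        return result  # nothing masked, nothing to do
    top, left, bottom, right = box
    crop = [row[left:right] for row in image[top:bottom]]
    crop_mask = [row[left:right] for row in mask[top:bottom]]
    new_crop = inpaint_fn(crop, crop_mask)
    for dy, row in enumerate(new_crop):
        for dx, value in enumerate(row):
            if crop_mask[dy][dx]:  # only masked pixels are replaced
                result[top + dy][left + dx] = value
    return result


image = [[0] * 4 for _ in range(4)]
mask = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 0, 1, 0], [0, 0, 0, 0]]
# Stand-in "model" that paints every pixel of the crop with 9.
out = inpaint_only_masked(image, mask, lambda crop, m: [[9] * len(r) for r in crop])
print(out)
```

For iterative inpainting each cycle repeats the same three steps with a new mask.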
ah yeah, but the training settings in it basically mirrored the ones you had
guessing it's a game of broken telephone in regards to lora settings.
Um the settings I linked you I can guarantee will work out in a good lora
I'll def check it out this weekend
But it sounds like some site has access to the pre-release inpainting SDXL model.
Hmm...that makes sense. But sounds like a lot more work in Comfy especially for iterative generations (multiple cycles of inpainting).
yeah. like i said. total pita
@deft coral
if you use these settings,
then I guarantee you a good lora
@boreal bough
I hope A1111 can get SDXL working fully and with better VRAM usage (needs 16 GB right now - 14.5 used max for 1024). Or else maybe SD.next will be an alternative. I still need to look at that one.
But is it an inpainting model? There is a big difference for certain workflows.
A1111 works with SDXL for me. have you tried nuking your venv folder?
kohya?
If you use https://github.com/bmaltais/kohya_ss
it should be an easy and automatic install
kohya doesn't work for me either
i get the same unet issue
use that. its foolproof, and uses kohya in background
Inpainting with prompt + controlnet
I don't understand. Is that a yes?
after like 2 hours of training only 3/20 epochs completed kohya finally crashed ❤️ i love god
Yes
OK that's good news. Thank you. It means the unreleased inpainting model will hopefully be available someday.
It doesn't work on 4k - 32k images. You have to shrink them, then it works.
Their max upscale is 4k
Davinci Resolve will upscale to 32k
@boreal bough how can i nuke my ven
@golden quarry would it be possible to get an update for kohya?
gonna try and get the lora training tutorial out on sunday, and wanna link your distro for it.
just delete the folder, then run setup again.
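A small sketch of "delete the folder", assuming the install keeps its virtual environment in a subfolder called `venv` (the default name here is an assumption, so check your install). It refuses to delete anything that does not contain a `pyvenv.cfg`, which `python -m venv` writes into every environment it creates.

```python
import shutil
from pathlib import Path


def nuke_venv(install_dir, venv_name="venv"):
    """Delete the virtual environment folder; return True if something was removed."""
    venv = Path(install_dir) / venv_name
    # Safety check: only remove folders that look like a real venv.
    if not (venv / "pyvenv.cfg").is_file():
        return False
    shutil.rmtree(venv)
    return True
```

After this, running the project's own setup or launch script again rebuilds the environment from scratch.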
I've seen your name before. Are you a game dev?
venv folder*
so fresh install
yeah I can
thank you ❤️
I was pretty busy overall
Hello guys, is there any lora training guides for sdxl1.0?
so how do you train a lora without trigger words?
depends? what are you training?
both kohya and automatic1111 with the db extension load up with xl for me. i don't think the fault is with their commits
i think its a local issue
--network_train_unet_only is not an option in my kohya and i have no idea where to put ❤️
I was thinking of training an art style
kill my self
in the gui? extra commands are in advanced
Is there a good way to get A1111 to detect a .safetensors VAE that is named the same as the model? It only seems to want to use .pt VAE files that way.
So i want to train an anime character on sdxl, can anyone teach/tell me how to do so? 
I kind of dont know what settings to sue
use*
I would still recommend a trigger word, as it will make your life easier.
but if you wanna do it without anyway:
-> Use Interrogator with Vit-H to auto caption everything. Accuracy doesn't matter too much. its only important you have enough tags.
shuffle on
have a dataset that is big & varied enough that enough concepts show up (car, house, person, toaster, whatever - just make sure its not all of a person, otherwise ALL captions will learn to create people xD)
I'd recommend 100~500image dataset if you wanna do without trigger word
50~100 to get a good varied style with trigger word
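What "shuffle on" does to a caption can be sketched like this, assuming comma-separated tags and an optional trigger word kept in first position. It mirrors the idea behind kohya's shuffle-caption and keep-tokens options but is not their code, and the trigger word `mystyle` is made up.

```python
import random


def shuffle_caption(caption, keep_tokens=0, rng=None):
    """Shuffle comma-separated tags, leaving the first keep_tokens tags in place."""
    rng = rng or random
    tags = [t.strip() for t in caption.split(",") if t.strip()]
    fixed, rest = tags[:keep_tokens], tags[keep_tokens:]
    rng.shuffle(rest)  # in-place shuffle of everything after the kept tags
    return ", ".join(fixed + rest)


caption = "mystyle, car, house, person, toaster"
# With a trigger word: keep it first, shuffle the rest.
print(shuffle_caption(caption, keep_tokens=1))
# Without a trigger word: shuffle everything.
print(shuffle_caption("car, house, person, toaster"))
```

Shuffling each time a caption is read keeps the model from tying a concept to a fixed position in the tag list.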
the hardest part is building the dataset
thanks for the mini guide 🙂 I will give it a go later
with dadaptation trainers, settings are 1 and 1
huh
This guide is a repository for testing and tweaking DAdaptation V3 LoRAs, introduced by Kohya on 05/25/2023 .
For reference to my guide on collating a dataset, and the old method of utilizing the AdamW8Bit optimizer, see - https://rentry.org/lazytrainingguide
Useful links:
Kohya - https://github....
I trained a 'grumpy style' lora which makes any face grumpy looking. But you need to add 'grump style' to the prompt
does it still work well for sdxl? haven't gotten around to trying the dynamic ones yet
i've been using the newest form, prodigy, and getting great results. might try more with dadapt. but yeah it works. i have a 4080 with 16gb and do batches of 2. some models i've even bucketed at 768 pixels
noice. will have to give it a go before I post my guide
some funky eye lashes going on here
Looks like neurons.
you've saved my life
I tried one of the other adaptive ones, (maybe adafactor?), and the loss curve looked identical to prodigy but prodigy was just a little lower across the whole training session. So I just went back to prodigy.
They look like l-systems
fractal lashes
i believe in you
bro
this is so good
imagine if i can make my own checkpoints
@boreal bough
this has so much potential
👋
my biggest achievement on sdxl so far, is getting finetune level of improvement, by training a 5k dataset, resulting in a mere 43mb LoRA file 😄
got over 100 learned concepts working
hmm. i don't know how to make comfy ui stop fudging my symlinks and rewriting new folders over them
one second, going through all my loras, next second, "undefined" cause it decided to make a folder there
you gunna enter the competition? lol
all the checkpoint refinements i've seen so far have underwhelmed. while loras push the base in new ways.
true finetuning gonna be expensive as hell if you're in a rush
also we only got the pruned model, so you're either stuck using the 0.9 research license full model
or essentially just merging a lora with the base and calling it a "finetune"
Hm. Civit is throwing a contest. But it's also kind of "A model that is in dire need of some tweaking." ish. Which means the contest is for a porn model. If i enter the judges won't pick mine.
non nsfw only
sfw only
rules! XD always read them
no no what they mean is that the submitted example pics are sfw
oh i forgot he has me blocked lmao
they're very skirty like that. when they blocked NSFW submissions, all they did was make pornographic merges show up in general results, but with their porn images not showing.
the 3 winners will be boobs, bazongas, and butts
when their call to training says that sdxl is in dire need of training, we know what that means
How do you caption with this size dataset?
either way, lykon gonna get one of those 4090s for the dreamshaperXL base - since by contest rules its gonna beat everything by virtue of being used as a base by many others
and for the other two :/
character is hard as hell to get right, as you'll get to experience all the downsides of the refiner - meaning that lora will kinda cheat by being an existing concept, that is only reinforced well
style is the only true competition, as its also the only one that can be trained on all machines, and is by far the easiest to train
automate it with blip and run some filters on it to remove bad captions. you set up processes and work it until you're happy
he's also got eyes to start. i'll just buy a 4090 instead of trying to beat a popularity contest trying to boost civit's traffic
I used blip but I am not very satisfied with its result
yep. what flowwolf said is what I did.
Interrogator running Vit-H
but I really wouldn't recommend it as a starting thing for sdxl, as big datasets are the hardest to get right - and each failure punishes you with 20~40hours of training time (on an rtx4090)
Curating and captioning data sets is the hardest part of any training. The actual training process is easy.
my last lora with just 200 images cooked for 19 hours over 2 runs
im still salty
I've noticed these weird pixel patterns showing up in my SDXL images. Is it to do with the scheduler? Attached image is via Diffusers with SDXL + refiner
please use 1e-3 8/1. 14 min are eeeeeennnoooouuugh XD
I tested a training with 15 images with selfmade caption. It already able to produce decent features of original images.
is this a joke
apparently thats the digital watermark making it obvious it's ai generated. we have to keep that in according to the license
24GB for 1 image
i think
yeah. higher learn rates seem fine i've noticed
I run a 4090 with 24gbvram and can do batch 8
how
The watermark is invisible. There is some problem with the VAE. Try this model: https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/sd_xl_base_1.0_0.9vae.safetensors
is that at 1024x1024? i've only tried batches of 2
i think i need to tell windows to put my page file on the new gen 4 ssd
10~12 is the theoretical maximum on 24gb vram with adamw8bit
I use 8 cause I still like using my pc
its nice to not have 5fps during training
ooohhhhhh. yeah I was talking about training a LoRA
thanks for the advice. i'll try it on my next training run on that same dataset
batch 3 is maximum on 24gb vram
Thanks! How can I pull only that file from the hub and replace my existing one? Is there an argument for the hf_hub utility that will let me pick a particular safetensors file from a repo?
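One way to grab a single file, sketched under two assumptions: that the `huggingface_hub` package is installed (its `hf_hub_download` takes a `repo_id` and a `filename`), and that the Hub serves raw files under a `/resolve/<revision>/` URL, as opposed to the `/blob/` web page linked above. The two helper names are made up.

```python
from urllib.parse import quote


def hub_file_url(repo_id, filename, revision="main"):
    """Direct-download URL for one file in a Hub repo (note /resolve/, not /blob/)."""
    return (
        f"https://huggingface.co/{repo_id}/resolve/"
        f"{quote(revision, safe='')}/{quote(filename)}"
    )


def fetch_single_file(repo_id, filename):
    """Download one file into the local Hub cache and return its path."""
    # Third-party dependency, imported lazily so the URL helper works without it.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=repo_id, filename=filename)


print(hub_file_url("stabilityai/stable-diffusion-xl-base-1.0",
                   "sd_xl_base_1.0_0.9vae.safetensors"))
```

Where the downloaded file then has to go depends on how the pipeline is loaded, which this sketch does not cover.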
and that's using comfy, as A1111 has a lot of unnecessary overhead
no this new pixel pattern is different from the vae issue. it shows on comfy gens in the new version
encoding the watermark as pixels prevents the embedded data from being lost in transit
automatic probably won't have it enabled by default
Ah, I was wondering if that was going to be the case. If it was just kept in the headers then I imagine it might get lost when being saved/exported into different formats
are you denoising in the refiner stage or adding noise in between the two stages? The refiner usually goes over those pixels but if it is being stopped from working on a noised object it might only gussy up the pixelation
yeah like there was a quick minute where discord was stripping metadata
How do we know the pixel pattern is a watermark? Where did this info come from?
Emad on Twitter.
I generate an image with the base model first, save the latent data and then pass it onto the refiner. I'm using the denoising_start and denoising_end arguments to only refine in the last 30% of the total steps specified
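The handoff described here can be sketched as follows, assuming `base` and `refiner` are already-loaded diffusers SDXL pipelines (text-to-image and image-to-image respectively). The wrapper name and defaults are made up; the keyword arguments are the ones the message mentions plus `output_type="latent"` to keep the intermediate result in latent space.

```python
def run_base_then_refiner(base, refiner, prompt, steps=40, handoff=0.7):
    """Run the base for the first `handoff` share of denoising, then the refiner."""
    # The base stops early and returns latents instead of a decoded image.
    latents = base(
        prompt=prompt,
        num_inference_steps=steps,
        denoising_end=handoff,
        output_type="latent",
    ).images
    # The refiner picks up at the same point and finishes the remaining share.
    return refiner(
        prompt=prompt,
        num_inference_steps=steps,
        denoising_start=handoff,
        image=latents,
    ).images[0]
```

With `handoff=0.7` the refiner handles the last 30% of the schedule; moving it toward 0.95 gives the refiner less to do.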
Thanks
I have never heard of that utility. I just download the file and put it in the right place.
It's the library diffusers uses to pull models from huggingface
because if you remove the invisible watermark code, that pixel pattern which is only there if you look for it, disappears
I haven't gotten as far as writing my own code to use diffusers. I just use the pre-built UIs.
Gotcha. I'm using diffusers in Unreal Engine
the vae issue produces interlacing lines, it might even be intentional to give a chromatic aberration effect
it doesn't happen on my hardware though so i don't really know
Welp, time to look through the code I guess. Oh well.
i was trying to use stable studio this way, but comfy ui hates symlinks and keeps erasing mine
i've got stable studio loaded but don't want to use a separate install. so i'm trying to point it at my existing comfy and models but it all hates symlinks so much
That is strange behaviour with the refiner, I would try adjusting the latter percentage and seeing if additional or less refinement is appropriate, my setting of 1/3rd is approximately free of pixellation but if I do a run with no refiner to mass produce compositions then it will have those issues, but I can just run them through again with noise
so i was right with the unet missing
ill just wait for it to get added to the main checkpoint
@Caith do you have a favourite tool for tagging by hand?
unet is not missing. it's like 95% of what makes the model work.
there are currently issues with the vae (color encoder)
so if you use the 1.0 vae, then you may end up with funny colored lines near mouths - but nothing critical
absolutely! Hydrus Network
if thats the case
best for manual tagging. also scales well for bigger projects
why the hell doesn't it work for me
takes a while to learn it once - but absolutely worth it
I'll try with 0.95 base and 0.05 refiner and see what happens
Thanks I will check it out. I was using FastCaption before
Some reason other than a missing unet. Without unet you cannot generate any images from the model. The model is useless.
and that's what im trying to pin point
for automated tagging, use A1111 -> extension "Interrogator", load the Vit-H model, and have that batch generate your captions. best (quick and easy) automated workflow for now
@west breach ^
I don't think anyone knows the answer. But the unet is not missing. That is not the real problem. It can't be.
if thats the case then why is kohya ss telling me different
The error message is not correct. You need to investigate other options.
I used BLIP originally. But the captions were very short and non descriptive
looks cool. so you can setup all the nodes and then use the other ui to just generate
I don't know. Start by reading all available documentation about how the system actually works. Not about how to use it but how it works. Then use those learnings to identify the problem. No one seems to know the answer. So you must do this.

It is new technology. If you wish to use it you must be willing to learn.
oooo my terran lora works nicely when i don't prompt for what i was captioning and just do natural prompts like i would on the base model
has anyone tried KosMos2 from Microsoft? It would do a visual scan and return tags
is one of the many existing models.
the reason I recommend Vit-H, is because it's close enough to sdxl, that it can be used fully automatically




