#✨|sdxl
this was my understanding as well
Keep in mind that memory use scales differently with SDXL than 1.5, so reducing res won't help as much now as before.
wait does this also mean higher max res with same vram or nah?
McMonkey mentions in the gist that you will exceed 8 GB VRAM at certain points which is why an updated Nvidia driver is important.
Look at Joe's images. You get a much higher resolution than possible on 1.5 without tricks but at a certain point it becomes pointless though
So SDXL isnt good for the grid method like 1.5 is?
High resolution won't add details SDXL hasn't learnt.
Caq: If you're talking about what I think you're talking about it is just less needed.
wait is this thing based on an actual prompt?
so what you're saying is SDXXL needs to be trained on macro photography
Why would it be less needed?
im guessing cuz it would be better to do the whole image in one go or something
Doesnt seem like it finished denoising anyway.. I dont get it
No I mean like grids for animation frames
oooh yeah idk then i thought u meant tiled upscaling
yeah idk either
Yeah @wicked frigate No idea what's going on with the trainer. Regardless of what size I set my images to, it always uses 12GB+ of VRAM and starts to use system memory and goes insanely slow, or with the old Nvidia Drivers it OOMs
Then we're thinking about different things.
wow, check out the results in #1100484581037195384, the refiner is amazing!
In 1.5 you needed different tricks to achieve higher resolutions due to VRAM usage but with SDXL scaling more gently that would of course not be needed.
im getting almost 2048x2048 with 6gb on sdxl comfy, easily 2048x2048 with other models
Yeah the refiner is pretty good. People have been using it since the 0.9 'release' and even on 1.5 based images it creates noticeable improvements. If you look for some images that combine the refiner with the 1.5 Juggernaut model they're outstanding.
comfyui is so clever wtf it caches the parts if they dont change
Training not inference
ah
Something has broken with the training scripts
And it seems like you can no longer train LoRA on less than 12GB VRAMish
Or at least that's what happens to me
I am fully convinced ComfyUI is the best UI and it just needs a little more user friendliness for it to be the most popular.
its so good wtf
automatic1111 feels clunky in comparison
it might just be that i dont like gradio
When I was training last week on a 12GB 3060, I had to turn the text encoder training off (and cache the text encoder inputs) otherwise it'd try to allocate like 60GB
but wouldn't SDXL 1.0 be better than 1.5? I'm not sure what the juggernaut model you are referring to is
juggernaut is a trained 1.5 model
I tried it again just now with the text encoder on and it works, but it's painfully slow
Yeah, I trained a LoRA last week on 1024x1024 at batch 2 and it worked. Now I can't even train 512x512. It's clearly hanging onto something that it shouldn't be.
I can definitely feel your pain
Juggernaut is a 1.5 model. To clarify SDXL is actually two models: the base model and a refiner. Some people have been using 1.5 models as the base and then putting the SDXL refiner after, basically like how SDXL works but with a different base model.
okay, thanks, I believe sdxl will replace 1.5 entirely
previously you had to jump to an old commit, right? if you remember when the newest commits were when you went down that route, could maybe pin the code down
voting question: are people voting on preferred generation between two images or how well it followed the prompt?
I went back to the old commit again. It worked once and then never worked again. So I have no idea.
Yeah it's a significant improvement. I think the thing that might hold it back is if people can't do fine tuning and LORAs with just 10GB VRAM but it sounds like there's a solution for that.
i thought that sdxl was sd3, but apparently the versioning is weird and it's its own thing. not sd3. 3 will probably replace 1.5 fully. xl has a lot of overhead being 1024 and a lot of people may stick to 1.5 due to it
kinda depends on the prompt, ideally it should be followed, but consider things like overall quality, light/shadow/texture, etc. The machine can't always hit every prompt the way we think it might, so there's always that
1.5 has so much momentum to it
idk, the speed is pretty good
Mcmonkey says all you need is 8GB
yeah. but not as
I'm using his scripts and it doesn't work on 10GB
Been there
Yeah I heard that I'm just not clear on the details so I didn't want to claim it was solved.
Yeah it's not
I imagine gradient checkpointing is a big reason
12 and 16gb gpu sales will skyrocket
I can tell it to train 1x1 images and it still OOM, there's something broken
trying to force full fp16 training and see what it does
It's great to build complicated pipelines but a wysiwyg editor is 100x more streamlined to take what u get and start inpainting or completely changing and adding many control nets etc. Seems like a big trade off to make. Seems like it would be helped by having a gui as well that you can flick to at a moment's notice and use all/most the same tools and models
honestly, have a look over on openai. a solid 9/10 people who use dall-e/gpt on the reg havent a notion about alternatives
nope 12GB VRAM needed again
when i first bought an 8gb gpu, people thought i was nutter cause it'd never be needed for years. then gtav pc version came out.
soon as theres a need, people upgrade in huge droves
Apparently there is a solution to that problem releasing today or very soon
seems like some of the A/Bs by the bot are just using the same model with refiner at different strengths
seems like a very biased data source
I think the ability to load layouts helps with that a lot. Honestly I think the last missing piece is a node that is basically a black box containing a group of other nodes so the user doesn't have to worry about the details. Basically one node that's your "this does all the lora stuff" node that you plug in, but you can open it up if you want.
Maybe they will kill the refiner if it doesnt perform well
Wait what? I haven't heard that. Got any details?
suppose. but what i mean is, i cant see quite as big an arms race as all the buttcoin mining stuff from back when
No details just Mcmonkey has been hyping it up
nah the model is way better with refiner
but i think 35% is generally too high
so they might be testing what exactly is the best %
Ah ok. I hope it works out but I'm a bit burnt out on 'hype'
Yeah some kind of piece of the pipe that can have a full ui and access all the fiddly shit like loras and controlnets would be nice.
Maybe they can confirm or refute it, but it might just be picking a random model twice with different settings without checking that it is using two different models.
bitcoin never really made a hardware arms race. there were some software competitions. Ethereum mining was more about deploying as many gpu's of any performance as possible
have you tried backdating to the commit ID in the gist?
Oh and have you made sure you have up to date pytorch and xformers installed?
bitcoin sure accelerated ASIC development for a quick bit
Stability ai's own suite?
yeah that could be it
Machine Learning has significantly upgraded Nvidia's hardware design
A comfyui feature
I installed the scripts and the venv entirely fresh. I've used old commits before, but I'll try that older one just to confirm.
well that'd be a double blind so a decent idea. dont over think it though, screws with results if you have that in the back of your head 🙂
unfortunately it's also inflated their prices
I dont see self hosted AI making much of a dent in the GPU market. But if running AI on gpus starts becoming as profitable as crypto was (which is definitely something that could happen) then yeah GPUs could totally become scarce
everything has inflated prices these days. the dollar is being weird. also, they're selling more gpus now than ever before
market is so fucky
but maybe consumer gpus are too weak/low vram for this to matter who knows
people mining bought them in multiples though. although, might make a decent use for kazakhstans hilarious concentration of gpu
selling cycles as service could bump it up but i might as well be writing the worlds most boring scifi now
nvidia stops giving a shit about consumers as AI explodes and increases prices, AMD increases prices because nvidia does
also inflation lol
its off sdxl chat but just to bookend my thought, i can imagine web servers having AI and accessing sites being generated on the fly by specifically tuned AI.
I wouldn't be so sure. Between local Stable Diffusion, a ton of Local LLMs, some of the code copilot type stuff... AI is taking off more and more and a lot of people want to be able to run it locally. Although what most consumers really need for that isn't so much a faster GPU as it is a cheap GPU with lots of VRAM
but again, boring fanfic.
Node based UIs are very common when you get into the more technical side of creative software. Look at Nuke, Unreal, Resolve, Blender, and many many more. More technically oriented artists will likely feel very at home in ComfyUI. Less technically oriented artists probably aren't going to be using Stable Diffusion directly in the first place and will more likely use text to image through higher level tools like Photoshop.
I think one of the biggest weaknesses Comfy has right now is using it for highly interactive workflows. This is not an inherent issue though. It can be improved. Even then, artists that do anything involving non-realtime rendering may find that iterating on comfyui flows is not that foreign. Comfy will need the ability to run workflows programmatically so that it can be integrated with render farm management software though.
training a 3rd friend on sdxl. can't share images cause my friends are private types and don't want me doing that. good learning experience though.
Using kohya to train loras with the new prodigy optimizer. it's a dadaptation based one. works nice. 15 epochs with 20 images repeated 10 each. takes about 2 hours each time.
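For a rough sense of scale, the numbers in that message work out as below. This is back-of-the-envelope arithmetic only: batch size 1 is an assumption (it isn't stated above), and `total_steps` / `seconds_per_step` are illustrative helper names, not trainer options.

```python
import math

def total_steps(num_images: int, repeats: int, epochs: int, batch_size: int = 1) -> int:
    """Optimizer steps for a run where each image is seen `repeats` times per epoch."""
    steps_per_epoch = math.ceil(num_images * repeats / batch_size)
    return steps_per_epoch * epochs

def seconds_per_step(wall_clock_hours: float, steps: int) -> float:
    """Average time per step given the total wall-clock time of the run."""
    return wall_clock_hours * 3600 / steps

# 20 images, 10 repeats each, 15 epochs, about 2 hours (batch size 1 assumed)
steps = total_steps(num_images=20, repeats=10, epochs=15)
print(steps, round(seconds_per_step(2.0, steps), 2))
```

If the real batch size was larger, the step count drops and the per-step time rises accordingly.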
its pretty remarkable how easily good it is
It needs more nodes built in too tbh, the experience of developing custom nodes is extremely buggy right now
what machine do you have @trim orbit
i know someone training rn too and they're getting glacial numbers
i7 12700 with a 4080
Node UIs are also very common in DAWs (used for interfacing with/generating music). They're a great bridge between non-programmers and programmers when you need it (and it's done right).
2 hours for a face lora is ridiculously long
Ok that old commit works.
Well it does this time, I've had it work once and then break again for no reason the next day.
yeah if i launch it wrong, it slags and estimates 30-40 hours. gotta have bf16 on, caching the text on, --network_train_unet_only on
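A sketch of what that launch might look like when assembled as a command line. Only `--network_train_unet_only` is quoted in the message above; the script name and the other flag spellings are my reading of the kohya sd-scripts SDXL branch and should be checked against `--help`. The paths are placeholders, and nothing is executed here.

```python
import shlex

def build_lora_command(model_path: str, data_dir: str, output_dir: str) -> list:
    """Assemble an argv list for an SDXL LoRA run; this only builds the list."""
    return [
        "accelerate", "launch", "sdxl_train_network.py",
        "--pretrained_model_name_or_path", model_path,
        "--train_data_dir", data_dir,
        "--output_dir", output_dir,
        "--network_module", "networks.lora",
        "--mixed_precision", "bf16",        # "bf16 on"
        "--cache_text_encoder_outputs",     # "caching the text on"
        "--network_train_unet_only",        # named explicitly in the message
    ]

# Placeholder paths, purely for illustration
cmd = build_lora_command("sd_xl_base.safetensors", "./train_images", "./lora_out")
print(shlex.join(cmd))
```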
cheers! just needed a ballpark
did one with 10 images at 40min'ish. keep in mind that these are 1024 bucketed images.
to be honest im really not too sure, this is the kind of thing that depends on a ton of stuff, but on the supply side of things you can bet that there will be a ton of competition popping up when it comes to AI accelerators
it'll be cool to see what Jim Keller's Tenstorrent pushes out next. Their original hardware line was a good start.
maybe he'll end up back at AMD again
AI is definitely going to be a bigger and bigger part of building a computer over the next decade. I'm curious if we're going to see a dedicated 'AI Processing Unit' or expandable VRAM first
In 7 years we're going to have biotech GPUs 😄
life long system builder, ML was the primary focus of building this new machine. Gaming a close secondary
Expandable VRAM is all I want, also ironic that Nvidia has tensor cores but we don't use them
Lots of consumer CPUs already have the AI processing unit components
And we won't be using vectors for graphics processing.
Same. AI and VR are the only reasons for my last upgrade
yeah deffo but its the scale that matters, will tons of people need massive GPU sized accelerators or will most people be fine with an integrated graphics sized accelerator
What GPU did you use to render that? A Quadro?
Probably, but there are some really good custom node libraries available now. I've been dipping my toes into developing custom nodes, and the biggest issue I see is lack of documentation, but that will come in time. I find custom node development pretty trivial in most cases. I've worked on plugins for various DCCs though, and am very familiar with node based UIs, so this is pretty familiar territory for me.
https://www.pcworld.com/article/1076150/intel-confirms-ai-improvements-will-come-in-meteor-lake.html and Apple has had AI cores for quite a while
between models getting easier to run, more specific accelerators, and the more demanding stuff being able to be run remotely its deffo a toss up as to how important buying expensive AI accelerators for local use will be
I imagine it's going to be exactly like GPUs and gaming. Most people will be fine with a 'good enough' solution, then you'll have your hardcore bleeding edge people paying big bucks, and there will be some type of middle ground for people in the... well middle
i was a good enuffer for years and years
I mean specifically it's buggy to change nodes inputs without breaking existing workflows that use them
thats why i always bought amd
And primitives in general seem to have bugs
That I cant quite figure out
Like weird incompatibilities
can't we still render below 1024 tho?
Honestly if Nvidia wasn't so absolutely dominant in the ML and VR space I would still be with AMD. I started out literally getting scrap computers from a local college my uncle would give me and just upgrading the GPU and ram.
yeah this is what i kinda expect too and why im not too worried about the availability of this stuff for now, who the fuck knows what happens when we get AGI
my frankenstein machines ran pretty well
sdxl? not very well. i'd say 1.5 works so much better there
Yes you just need to connect your output to a scale image node and set your desired resolution :p
i've got an old alienware area 51 rebuilt with all different parts. the case controller isn't wired up because its proprietary, so its very much frankensteined rn
i love frankenpc's
isnt the newest AMD gpu generation competitive in AI performance?
The good news is when we hit AGI it will instantly be super intelligent and it will just tell us what we should be doing 😆
hmm. If we found a way to, We would save a lot of Vram for people. The bots have sometimes rendered images in 768. I swear they have.
for a while i was turning it on by leaning way over with a paper clip and getting into the mobo just right to touch it to two specific pins and to close the circuit
I upgraded about a year ago and haven't been able to justify another one yet so I admit I'm not up to date on the latest GPUs
Honestly with the way its designed I dont think lowering the resolution saves much vram
just use ziplm 🤣
"competitive" hmmm. no. i'd say no. it's a huge compromise for ML still
Nice. Luckily the worst I ever got was needing to reroute the reboot button to power on because the power switch was dead.
A100
random benchmark, disclaimer i have no clue what im talking about but the 7000 series is somewhat comparable in performance while those before were vastly worse
Tom's is usually a good source so I'd trust it
Can't wait to see this with 1024
thanks for that
Ahh..... There are a few people who use that GPU on here. Yh. It's around 48Gb Vram isn't it?
a100 has 40 and 80gb models
@wicked frigate Ok yeah confirmed, there's something that's been added to the code between that commit in your gist and now that has caused the training to use more VRAM.
I've noticed with this - ImportError: cannot import name 'StableDiffusionXLPipeline' from 'diffusers'
The newer script uses that StableDiffusionXLPipeline, but the older script doesn't
cant it also do shared memory?
nvlink capable yeah
80Gb. blimey... That is nuts
yeah and you can get even more by linking them lol
I'm 90% sure they ran their 6000 series and Intels on fp32. my RX 6600 was 2x that it/s on fp16
Wdym by this btw, just pure performance or cuda and shit
all the cutting edge stuff comes out supporting nvidia because AMD is always playing catchup on the software side
Yeah kk that's what I expected
Nvidia also has the money to bribe support developers so they often work hand in hand with software and game devs to optimize things.
that benchmark is a bit outdated and also wrong
6000 series from AMD get more performance than that with ROCm
Ok so I do absolutely have no clue what I'm talking about good to know
6800XT is about 7it/s last time I tested and it might have increased a bit with latest comfyui changes
https://youtu.be/It9D08W8Z7o here's a cool new direction nvidia is doing too. entire platforms for AI. mental arm cpus driving them.
Yeah GH100 is fucking sick
that's what AMD's strategy seems to be for gaming features going forward too. so it looks that way for starfield from bethesda
and 3090 TI is 24it/s or slightly more depending on how many windows I have open, at least that's what I get on ComfyUI
Could it be the UI they were using? They weren't using your UI. Also they ended up using two different UIs for AMD and Nvidia which granted isn't a great practice.
Is comfyui considered the fastest?
Yeah I hope AMD does more of that, Nvidia has been doing it for literally decades while AMD has been happy to half ass it. I hope that's changing
the 4080 numbers are about what I see still currently. i've been getting those since i got it and optimized that hog as much as i could. think it was about february i hit 23it/s with the right config. ever since pytorch 2 i've had it steady 23-24it/s
Me with my 4.5 it/s (sd1.5) rtx 2060
meh. i'd rather dlss was offered. fsr is nice to have but it sucks.
relied on it with my vega64 for 1440p gaming
Yeah dlss is awesome and deffo better than fsr
yes it's one of the fastest with a pure pytorch implementation
Can u use it with an auto1111 like ui?
i've got sdxl working on the /dev branch of automatic and i think vladmandic's sd.next has support in the main branch
Cuz the huge customizability of node based isn't always necessary
Yeah which is why I'm curious if it can work with a simpler ui
But keep the comfy backend
yeah you can easily make a simple ui on top of it and there are a few of them
the dream ™️
Backwards compatibility is always difficult with these sorts of things. For production ready custom nodes you probably need to release an entirely new node and deprecate the previous one if you are changing the interface in a breaking way.
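A minimal sketch of that "release a new node, deprecate the old one" approach, written in the usual ComfyUI custom node shape (`INPUT_TYPES`, `RETURN_TYPES`, `FUNCTION`, `NODE_CLASS_MAPPINGS`). The node names `ScaleValue` / `ScaleValueV2` and what they compute are made up for illustration.

```python
class ScaleValue:
    """Original node with one float input. Left untouched so saved workflows still load."""

    @classmethod
    def INPUT_TYPES(cls):
        return {"required": {"value": ("FLOAT", {"default": 1.0})}}

    RETURN_TYPES = ("FLOAT",)
    FUNCTION = "run"
    CATEGORY = "example/deprecated"

    def run(self, value):
        return (value * 2.0,)


class ScaleValueV2:
    """New node carrying the breaking interface change (an extra `factor` input)."""

    @classmethod
    def INPUT_TYPES(cls):
        return {"required": {
            "value": ("FLOAT", {"default": 1.0}),
            "factor": ("FLOAT", {"default": 2.0}),
        }}

    RETURN_TYPES = ("FLOAT",)
    FUNCTION = "run"
    CATEGORY = "example"

    def run(self, value, factor):
        return (value * factor,)


# Both versions stay registered; only the display name marks the old one as deprecated.
NODE_CLASS_MAPPINGS = {"ScaleValue": ScaleValue, "ScaleValueV2": ScaleValueV2}
NODE_DISPLAY_NAME_MAPPINGS = {
    "ScaleValue": "Scale Value (deprecated)",
    "ScaleValueV2": "Scale Value v2",
}
```

Old workflows keep resolving `ScaleValue`, while new ones can pick up `ScaleValueV2`, so nobody's saved graph breaks on update.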
This is also why VFX studios lock down their software during a project and for the most part only upgrade between projects.
In terms of issues with primitives, I'm curious what issues you're running into.
don't f with the tool chain
link/name of them? 👉 👈🥺
there's always that one technical engineer guy in a design studio who is panicking over anyone suggesting new changes
Yeah I would like to know too lol, if I make a new int input and try to connect an int output to it it will fail with no good reason
"CHANGE?!"
And changing the inputs in general often breaks my nodes on existing workflow, it works for new nodes tho
thx
I would recommend deleting the old node and replacing it with a new instance of that node when the node changes.
I do that but when you have a workflow like this it gets tedious
And last time I did it it still broke for unknown reasons
boo
show me your war face!
Since workflows are just Json blobs, a good solution would be to write a script that automates the replacement process outside of comfyui. You could even release said script to users to help them update their workflows.
I might try that, not sure how to do it exactly but being able to switch the output links of two nodes in the json file would be a start
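A rough sketch of what switching the output links of two nodes could look like outside ComfyUI. It assumes the saved workflow has a top-level `links` list whose entries look like `[link_id, origin_node, origin_slot, target_node, target_slot, type]`, and that each node keeps its outgoing link ids under `outputs[i]["links"]`. Check a real export before relying on this, and note it only makes sense when the two nodes have matching output slots. `swap_output_links` and the toy workflow are hypothetical.

```python
import json

def swap_output_links(workflow: dict, node_a: int, node_b: int) -> dict:
    """Return a copy of `workflow` with the outgoing links of two nodes exchanged."""
    wf = json.loads(json.dumps(workflow))  # deep copy of JSON-compatible data
    swap = {node_a: node_b, node_b: node_a}

    # Re-point every link that starts at one node so it starts at the other.
    for link in wf.get("links", []):
        if link[1] in swap:
            link[1] = swap[link[1]]

    # Keep each node's own list of outgoing link ids in sync, slot by slot.
    nodes = {n["id"]: n for n in wf.get("nodes", [])}
    a, b = nodes[node_a], nodes[node_b]
    for out_a, out_b in zip(a.get("outputs", []), b.get("outputs", [])):
        out_a["links"], out_b["links"] = out_b.get("links"), out_a.get("links")
    return wf

# Toy workflow: two loaders feeding two samplers
workflow = {
    "nodes": [
        {"id": 1, "type": "LoaderA", "outputs": [{"name": "MODEL", "links": [10]}]},
        {"id": 2, "type": "LoaderB", "outputs": [{"name": "MODEL", "links": [11]}]},
        {"id": 3, "type": "Sampler", "inputs": [{"name": "model", "link": 10}]},
        {"id": 4, "type": "Sampler", "inputs": [{"name": "model", "link": 11}]},
    ],
    "links": [[10, 1, 0, 3, 0, "MODEL"], [11, 2, 0, 4, 0, "MODEL"]],
}
patched = swap_output_links(workflow, 1, 2)
print(patched["links"])
```

On a real file you would `json.load` the workflow, patch it, and `json.dump` it to a new filename so the original stays around as a backup.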
when I update a node in a way that it breaks something in the frontend I add code to fix it
I'm mainly concerned about primitives not being compatible but its whatever, my workflow is almost complete
For reference, in academia at least I have never met anyone using AMD/Intel GPUs for research ever
are we doing monsters?
It is either Nvidia GPUs (consumer or commercial) or Google TPUs, that's it
Monsters?
This is the ideal way to handle it where possible. I haven't poked around much beyond writing custom nodes. Does ComfyUI expose enough that a custom node library would be able to do this?
Somebody say monsters?
no a custom node can't do this unless it also adds something to the frontend
by that I mean a file in the web/extensions folder
and even within that subset, I'd say the users of Nvidia GPUs to Google TPU ratio is higher than 10:1
I think so
I have never met anyone using AMD or Intel to train large scale networks
a1111 has support already if you install /dev branch
just make a new folder then
Who is it that made that 2b Lora for SDXL. I've forgotten who it was.
and his master friend
Basically it seems that the prompts that are in showdown/pantheon are more tightly clustered than the prompts in Bot-X
It does a lot of styles but I think people like the Midjourney 'look' so the model has a bit of a bias in that direction
Hey what's the best current workflow to use in comfyui ?
I'm not sure there's a 'best' but I like Sytan's
they put a lot of work into it and it shows
Yeah! That's the one I was using last week, has it been updated ?
I pruned the model from 13 GB to 6 GB. This is the output; it seems to look pretty similar and took a lot less time. Not sure if there are other pruned models out there or not
@west breach I've sent you a message
What tool did you use for pruning?
Just so you know they have a github, but I don't think it's been updated recently. https://github.com/SytanSD/Sytan-SDXL-ComfyUI
you can use comfyui to "prune" models
Thanks!
just a model convert down to fp16 and no ema
do this but without merging
It's not really pruning is it for SDXL, it's just converting it to FP16 instead of FP32
Ah so just casting, not real pruning
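To make the "just casting" point concrete, here is a toy sketch using only the standard library. The "state dict" is a plain dict of Python float lists standing in for real tensors, and the `model_ema.` key prefix for EMA copies is an assumption based on how SD-style checkpoints are commonly laid out. A real conversion would go through a tensor library, which this deliberately avoids.

```python
import struct

def cast_to_fp16(state_dict: dict) -> dict:
    """Drop EMA entries and repack every remaining 'tensor' as little-endian fp16 bytes."""
    out = {}
    for name, values in state_dict.items():
        if name.startswith("model_ema."):  # assumed prefix for EMA weight copies
            continue
        out[name] = struct.pack(f"<{len(values)}e", *values)
    return out

def fp32_size(state_dict: dict) -> int:
    """Bytes needed to store every value as a 4-byte float."""
    return sum(4 * len(v) for v in state_dict.values())

toy = {
    "model.diffusion_model.w": [0.1, -1.5, 3.25, 0.0],
    "model_ema.diffusion_model.w": [0.1, -1.5, 3.25, 0.0],
}
small = cast_to_fp16(toy)
print(fp32_size(toy), sum(len(b) for b in small.values()))
```

No weights are removed from the model itself, which is why it isn't pruning in the strict sense. Each kept value just takes 2 bytes instead of 4, so a checkpoint with no EMA copy roughly halves, in line with the 13 to 6 figure above.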
christ it hasnt even been a year since SD 1.5 what the fuck
For the love of god, people need to stop upvoting midjourney style pics in the bot-X channels, that's how we end up with a base model that can't do photo realism
Just to be clear, these were all generated by the bot but a lot of them were while the refiner was on. Just because 'base model' has two different meanings at this point.
I would recommend you check out these custom nodes and the example workflows: https://github.com/SeargeDP/SeargeSDXL
They make SDXL workflows a lot simpler.
I'll check it out !!
ok well I've found a specific version of the training scripts thanks to mcmonkey that lets me train 1024x1024 at batch 2
icu
Nice. How much VRAM is it using?
I think it might be using a tiny bit over 10 and going into system RAM. I'm just checking at batch 1 to see if it changes the speed. As it was still running 3s/it even with all the VRAM being used.
reminds me of the photo used for joji - glimpse of us
First Ever SDXL Training With Kohya LoRA - Stable Diffusion XL Training Will Replace Older Models : https://youtu.be/AY6DMBCIZ3A
How to install #Kohya SS GUI trainer and do #LoRA training with Stable Diffusion XL (#SDXL) this is the video you are looking for. I have shown how to install Kohya from scratch. The best parameters to do LoRA training with SDXL. How to use Kohya SDXL LoRAs with ComfyUI. How to do checkpoint comparison with SDXL LoRAs and many more cool stuff.
...
ok batch 1, it's using 9.5GB VRAM
thats p tight
ugh, I'm getting NaNs during training now I'm a bit in though
dang
I have had this working before, so I'll try with some different settings
Are you using no half vae ? or the updated vae for half precision ?
Was using the updated vae
oh okay! I don't know then 🙂
It might be AdamW, it was doing this before, but prodigy worked, so I'll try that again
no
Have any of you been able to produce sampling images during the training ? no matter what parameters I'm using, it's always... weird
SDXL lora training yes, on kohya
Sampling during training isn't working for me so I just hope for the best and test afterwards.
Is there any tool for training refiner yet?
yeah... that's what I'm doing but... yeah...
hate this midjourney palette overtrained style - I am tired of it tbh
i've made a few loras. they're good. might run a large set i have over night
eyes :/
What's wrong with her eyes
deformed pupils
they dont pass for real
Oh wow, you really gotta squint to see that. I glanced at it and thought it was fine
I even zoomed into the image and took a look at the eyes, but forgot to look at the pupils
we're a bit used to seeing them with reflections, so maybe that's why it's not immediately obvious
Really speaks to how good these models are, even after you told me the eyes were messed up it wasn't obvious to me
dof is a little funky. shoulders in focus but the chest look like it's miles away
lol so @wicked frigate it will start training on that commit. But it NaNs almost instantly regardless of settings.
this style does that a lot. Sometimes even the nose is out of focus, eyes in focus, and shoulders out again
if you want I have a hack to produce sampling images during training of LORA but it requires messing inside 2 python files
But it's really slow and needs more VRAM
adetailer in auto1111 is actually helping my lora gens. not to its best effect though. it's just straight img2img ing through the base model again, without any controlnet
if i use https://comfyanonymous.github.io/ComfyUI_examples/model_merging/ to try to prune, the output should automatically be fp16? should i run comfy with --force-fp16 just in case?
Fucking weeb 🥲
depends on your hardware, unless you have 16xx or lower it will be fp16
colab T4
faces?
yeah lol
training time?
Not sure if it's the best workflow for SDXL, but it works pretty well. I use it all the time 😉
about an hour. I'm optimizing to reduce time right now
(on an A100 80gb lol)
good luck. I'll be happy to try out the script on 10 images when its stable
Sure!
did you get faces?
indeed
nice! How long? With Kohya as well?
A very fast lr I think 5e-4 so only ~20 minutes on 3090, and yeah kohya
Its not very good lol
ah cool! I got it with a super high rank of 256 and it was amazing but the lora is like 2gb
which defeats the purpose (+I used reg images)
now I'm trying to bring it down to rank 16
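For a ballpark on what dropping from rank 256 to rank 16 buys: a LoRA adds two small matrices per adapted linear layer, so its size grows linearly with rank. The sketch below assumes the same set of layers and the same dtype at both ranks. The 1280-wide layer is purely illustrative, and `lora_params` / `scaled_size_gb` are made-up helper names.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Extra weights for one linear layer: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)

def scaled_size_gb(size_gb: float, old_rank: int, new_rank: int) -> float:
    """Approximate file size after changing rank, same layers and dtype assumed."""
    return size_gb * new_rank / old_rank

# One illustrative 1280 -> 1280 layer at both ranks
print(lora_params(1280, 1280, 256), lora_params(1280, 1280, 16))
# The ~2 GB rank-256 file, rescaled to rank 16
print(scaled_size_gb(2.0, 256, 16))
```

So a file of about 2 GB at rank 256 would land somewhere around an eighth of a gigabyte at rank 16, before any per-module extras.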
LR was 4e-4, trying 1e-4 as well
storage isnt an issue for my purposes. I wonder if thats the only big downside of high rank
well it takes a while to load
so it slows down inference. It's an annoyance mostly, totally fine
btw I love your Van Gogh fine-tune!
thank you! I can't wait to try it on XL. I've been doing faces since then and its just not as fun
faces are great when they work haha
or when i squint
Sorry I had to do a thing. That's awesome, thanks for the info.
I thought someone would have nailed face params by now but it's a bit hard
my very first attempt
I got a NVIDIA GPU laptop at last what should I run SDXL on frens
It doesn't seem to work unfortunately, it gives NaN errors
did someone at Stability figure out how to do face LoRAs well yet? Asking for a friend 😄
we'll all be hyperdreamboothing in no time tho I'm sure
(would sincerely love some tips)
haha yeah I'm writing a script for it right now
comfy!
Well this has certainly been a rollercoaster. At least it proves it's possible so I'm sure the community will get it sorted sometime after release if the devs don't.
in the spirit of the new For Honor hero
I'm very glad the style prompts got released, to me that's just as important as having the SDXL 1.0 weights
No joke
It makes reproducing the bot-X/showdown/pantheon stuff easier
Though with the random models going on we can't exactly reproduce much.
Fair, I don't need to reproduce the images exactly, just get close enough
For research
I hope things get better when I can run 1.0 locally as most of the prompts I've tried seem to be worse in 1.0 than 0.9.
Bro I warned you I wasted hours on that lol, I personally recommend waiting until 1.0 unless there's a pressing need
Tbh most of the benchmarks for vram use are done on linux with gradient checkpointing too
Just use Roop, faster and easier for single face
Face LoRA's are boring
mcmonkey said they did theirs on windows with 8GB VRAM, so not sure what's going on
it's funny to see that "deformed" made it to the list of the negatives in the style prompts 😄
"ugly,malformed,deformed,strange,2345_fingers,two_heads"
Roop feels tacked on because it doesn't capture the "head" just the face
I swear I remember his settings mentioning gradient checkpointing, not sure if there's a way to do that on windows
lensa's revenue begs to differ
I'm sure they will. The fine tune models are a thousand times better than the base models for 1.5, you gain so much with LORAs, and then when we get controlnets you get a level of fine grain control you just can't get without them.
bf16 works best for no nan's
Still easier than training a LoRa 🙂
everytime i use fp16 i get nans 10min in
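One plausible reason for that difference (an assumption, nobody above diagnosed it) is numeric range: fp16 tops out at 65504, while bf16 keeps float32's exponent range and gives up mantissa bits instead. The sketch below shows the range gap with the standard library only. `to_bf16` emulates bfloat16 by truncating a float32 to its top 16 bits, which is a simplification of how hardware rounds.

```python
import struct

def fits_in_fp16(x: float) -> bool:
    """True if x can be stored as a finite IEEE half-precision value."""
    try:
        struct.pack("<e", x)
        return True
    except OverflowError:
        return False

def to_bf16(x: float) -> float:
    """Emulate bfloat16 by keeping only the top 16 bits of the float32 encoding."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

print(fits_in_fp16(65504.0), fits_in_fp16(100000.0))
print(to_bf16(100000.0))
```

A value that overflows in fp16 becomes inf in a real training run and can turn into NaN a few operations later, whereas the same value in bf16 stays finite but coarse.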
But I was able to get up to 768x 1 batch size on my 3060 12gb (8 rank lora)
On windows
There is, but his settings don't have them enabled. Also if you can't do without you aren't going to be able to do with
Stop voting for midjourney style pictures and we all end up with a better base SDXL 1.0
Especially if all you have is a M1 lol
Hmm ok
Yeah I still dont get how mcmonkey got his results
Most if not all of your post there is irrelevant to my post. Most of my prompting attempts have led to worse results with 1.0 so I hope this is a fluke with the settings and not representative of the models.
difference with MJ is we can train our own embeddings
or loras or whatever
base model looking like MJ for short prompts? oh well
if you say so. Finetuning absolutely improves prompt results, LORAs increase prompt accuracy/options, but whatever you do you
Hmm Kohya just updated his scripts after I logged an issue and it's training with the newest version now
1.0 is awesome on the #bots so much easier!
I was trying some styles in 0.9 and one of these hit the mark and the other seems quite off. If all that differs here is the model then it seems like they differ a lot when it comes to understanding prompts/styles.
Let us pray to the SD gods, if the base model ends up like midjourney it is going to be fun in a not so good way
Should I pull the main repo?
SDXL has produced the funniest image i could never get in any flavour of 1.5
@civic sigil You need the SDXL Branch for now - https://github.com/kohya-ss/sd-scripts/tree/sdxl
Hmm that does look promising
Oh right thats what I meant
I forgot ab that
I'm a bit worried to see that the 3 models were that close in the piechart.
Thanks for reporting the issue
I'm using Prodigy as that's worked for me before, so I'll let this run and see what it does and then try with AdamW because that's never worked for me on SDXL
Same
Prodigy also heats my VRAM up less
no Lion ?
I'm just trying to get this damn thing to work
anyone know if SDXL will work for us mere mortals running a M1 pro lol ?
I know SDXL runs on a 3080 and people have run it on 20xx series but I don't know how that compares
you could try running 0.9 and see?
prodigy has been working good for me
spooderman
@mortal wing Thanks mate
18 seconds per iteration for this round of LoRA training. Everything is fine...
If only I had 13 GB VRAM instead of 12 GB
What are you training on?
An Anime Artist I have some images from, that I'd used to test on SD 1.5
Sorry, I mean what hardware
3080 10GB
Ah oof, that 10GB
It's finally worked though and it's worked pretty well for a no effort attempt
The art is slightly horny though, so looking for something suitable to post
It's not perfect by any means, as I just used some random settings. But you can tell it understands the style.
ComfyUI
I've been using loras with comfy and training with kohya. Works great.
Something I have noticed though is it breaks out of the style really easy if I change the aspect ratio to one I didn't train on
But that might be the settings I used
Plus the fact that I only ran it for 15 minutes
sdxl 0.9 is pretty good at styles. Out of the box, it's better at "by Frazetta" style than when I fine-tuned 1.5 for it.
Mine? They are supposed to be that way.
Nice
i was born in the wrong generation 🥴
the mha girl?
a bummer
The dataset had art of her in yes
I want to know if lora on specific characters works well
It should do
I've got some old datasets I can try, but I dont have enough time today
I trained a lora on a person and it was as good as when I do the same for 1.5.
They all have Booru tagging and I'm not sure SDXL likes that
1girl as a single word skews the image towards nsfw
Just from my analysis of the images in showdown
Now I don't know if this is just the low training time, the dataset, or the lack of training the text encoder. But if I prompt anything that isn't in the training data, it loses the LoRA style completely
In addition to painters, 0.9 also understands the styles of a lot of famous photographers. So if you have not tried that, then you should. It's fun.
I have to crank the weight a lot
I wonder if it has anything to do with dual text encoders
It could just be crap training settings
For the failure to adapt to unseen words
hasn’t it always been like that
Nah, well trained style LoRAs work with pretty much anything
It does sort of work with the weight cranked though
she’s wiggly
Yeah I think that is training time + Small dataset
and .9
do we know what artists we can use ?
I like "painting by rembrandt" for portraits.
I like "Photo by Herman Leonard" for photographic portraits
I might try a run with Booru tags tomorrow and compare and see what the difference is.
Now I know it works I can try stuff
booru tags worked well with the lora i made, but they weren't overly specific- mostly describing clothing
but I had to run simple prompts in G, and most of the tags in L
Did you train the text encoder?
Because I wonder if that would also make a difference
i did
I don't think I have enough VRAM to do that
been meaning to try one without, but been too busy
when you guys are making loras, are you doing --network_train_unet_only ? does that mean captions don't work?
Captions do work, because I used my captions to test the LoRA and it did what I expected
I think you just aren't training the bigger text encoder
I use the UI and I just set the learning rate for the text encoder to 0. No idea if that's correct, but it has worked and I only use captions in txt files.
WaifuXL is out 😄
link ? 🙂
It's the same beta version I had and I couldn't get anything good out of it
yeah i'm training with captions too and they seem to work. i was just curious if it's confirmation bias or what
I trained a person that the model didn't know. So the fact that it works is a lot more obvious than with a style.
I've been picking specific styles so I can tell
i like this SDXL style guide!
will be cool to see how an early finetune looks
No refiner ftw
gives me confidence for the next training run i'll do. thanks
Unrefined 
any of you tried refiner on anime and had any luck?
i tried it on some akira-ish images and it was no bueno
Yeah I'll probably try a few things tomorrow.
I need to test normal tagging, booru tagging, small datasets vs large datasets. Short training time and long training time.
So far I've only done 15 image datasets for 15 minutes as a quick test.
yeah, tried a bit but seems to make it worse
What is SDXL exactly?
very cool - really like the styles you are exploring. I also like making analog photography images in general
When exactly is the release date?
1 week
What happened exactly?
doing some extra fine-tuning i guess
you're the analog wizard
have you used the analog film style in pinned messages? I've been using it for all of these, its amazing
Based on local generations vs 1.0 on bot then 1.0 is a step down. I hope this is just due to settings being wrong and that 1.0 will actually be better than 0.9.
ok the training with AdamW came out a lot better than with Prodigy
I created this analog photography portrait prompt build for SD 2.0 - it also works great with SDXL. Just fill in the placeholders:
cinematic movie extreme close-up still of an epic scene of a [ETHNICITY] [OCCUPATION] in the [SEASON] at [DAYTIME], centered, looking into the camera, fog atmosphere, volumetrics, photorealistic, from a western movie, analog, very grainy, film still, kodak ektar, fujifilm fuji, kodak gold, cinestill 800t, kodak portra, photo taken by thomas hoepker
but there might be better ways with SDXL. I just refactored a couple of my old prompts
i was using this before today, its very nice. I like to have a lot of flexibility in style so I shortened it a bit. I'm so happy with XLs photography so far. its only the eyes holding it back now.
Hmm ok, I think when you don't train the text encoder the prompts work better in the text_L
yeah, I agree
Putting them in Text_G seems to then pull the style away
I probably sort out most images because of one facial feature. it's mostly the eyes.
Comfys official basic txt2img combines them. have you tried that?
Putting them in both works too
or you get creative 😄
Sdxl 1.0 looks much better for me imo
MC face reveal
Besides Big boobs, what is impossible for SDXL to do?
Wat
It can do big boobs fine
"impossible" is a give up attitude
youtuber told me it was unpossible so i'm sticking to it. i love youtuber!
Can you show your boobs? 
Is the sdxl release delayed by a week ?
I will try then
What does style do technically? Is it added to the prompt or something else entirely?
yes they are prompt tokens and will be applied to your prompt
SDXL can't make images of Emad.
had to run my own test. big enough?
Deviantart: Larger
prove it to the man
Here you can have a look how it works. Stability AI posted the official style tokens that are used by the bot earlier:
#✨|sdxl message
Noice
Thx
with the help of Roop you can 😄
i tried roop. annnd cant get it to do well with sdxl
I am sure it's a matter of time
i had it working fine on comfy. now i'm using dev branch of auto and its working more fine cause it has codeformer
oh it worked but i guess roop works best with low resolution becuase my faces were coming out very low res
roop faces are only about 128x128
rare photo of me
roop is very low res yeah. that's even more apparent on sdxl. codeformer helps a lot
The developers made a ranking of the best and worst samplers, did they release it?
for sdxl?
Yes
oh wow hello! 
@solemn bear loving your work on PAI man
thank you Sir! Also good question for artists, I'm also interested
yeah artists names would be good
So, from what I can tell, one of the final candidate SDXL models has zero terminal SNR in it?
@solemn bear i know of this one https://creator.nightcafe.studio/collection/Cv8Qc6vtaks3IZHyQ9MT
I will pay for someone to make an eye-fixer node
thank you🙏
like a one-shot method tho. inpaint takes cherrypicking, codeformer ruins likeness once I'm generating on trained faces @latent zodiac
it would be nice
I really do hope the zero terminal SNR model is selected as the final one... because if it isn't I'm going to have to figure out how to shard SDXL on a TPU to add it back in...
had this one a bit ago, it's great except....
personally i enjoy having my generations' eyes look like they're tripping on acid
Why does it look like the only difference between the two variants is the clip skip value?
@soft bone how you upscaling these?
siax
ok, so I think character LoRAs are going to be easy
I don't know which you're talking about but one is so bad at following prompts that it should definitely not be chosen.
I just did a test with a dataset of 120 images, ran it for 15 minutes, so about 600 steps, my tagging is dogshit, but the likeness is pretty much spot on
i dont think loras for the base work for the refiner. so i wonder if thats what interoperability means
That shouldn't really matter
The refiner doesn't need to know your subject to refine it.
I have been finetuning a zero terminal SNR model off of SD1.5 and I can say that prompting it is absolutely different, most likely because it doesn't have the learned average brightness from the broken noise scheduler holding it back from doing what it wants to do. The upside is that you have control over the brightness. The downside is that you have A LOT of control over the brightness.
That is true, yeah best the base model take care of it otherwise we would have to be more inclusive with people with lazy eyes 🤣
I'm happy with 0.9 abilities when it comes to light
for the small amount of training this LoRA is extremely good for a character
doesnt take much with phoenix optimizer. so simple
yeah but zero snr models can do this
I used AdamW Optimiser
So can 0.9, all you need to do is to set aesthetic score to 1 billion in ComfyUI drhead.
Another 0.9 image. Manages bright and dark portions of the image well.
0.9 is great at dark
this is what i managed to get out of the discord bot for the same prompt
Solid black background is a gimmick, while it says something about the model what matters is the images you can create with it.
I don't actually want images where I can't see anything.
i've also noticed that models with zero terminal snr perform better at inpainting
you know how often times even with the inpainting model you'll get a sort of "halo" cast around an area where it's obvious where you inpainted? even without a proper inpainting model zero terminal snr models don't do that
and they can handle higher denoising strength without hallucinating
where are you finding zero terminal snr models
is there an sdxl fork ?
i'm training one myself, and i know a few people who are prototyping their own
very cool, this was something I was hoping would be an architectural choice but I'm glad to see ppl adding it in
note to self. read about snr
i could be wrong but i remember reading either joe penna or mcmonkey saying it looked nice at first but in the end wasn't that great
i do plan to release a training base SD1.5 model trained on laion aesthetics 5plus with duplicates and low quality matches removed. probably SD2.1 too, and SDXL if I have to and I can figure out how to cram it on a TPU...
https://old.reddit.com/r/StableDiffusion/comments/13joe98/sds_noise_schedule_is_flawed_this_new_paper/ good place to start before you start
it is not going to be easy to do full scale finetunes of SDXL. the best plan I have so far is to store the text encoder and VAE on one core and to split the U-net into thirds for TPUv3 training. If that works decently fast it should allow some data parallelism as well.
hot damn I missed a lot while sleeping
That's why sleeping is bad.
emad showing up was not what I was expecting x_x
well - while the delay was expected, the general communication we got instead was pretty nice
after style prompt release - did they mention/promise anything else?
Did you catch the LoRA stuff?
only that it will be released at a later date
or did that get sped up to today as well O:
Yeah that was what I meant, that there is something that will be released later.
Did you catch the free 4090 to everyone Oprah style?
wait what XD
I have no idea what this means - but I'm intrigued
Just joking
But no, I don't think you missed anything.
I just hope WaifuXL is good, it's just taking me forever to download as the download stops and I forget to start it again.
I actually don't agree - there never was an official statement that was like oh this is delayed. It all was/is pretty buried and implicit. They should have at least written something about it in the announcement with the new release date
Like how many ppl are actually in this discord and then read and scroll through the sdxl channel
It should have been obvious that there was a misunderstanding as there was nothing really planned for today like a stage event.
While the delay is unfortunate, in a month's time no one will care about the delay
What does this mean? which model is which?
We are on a bit of a time crunch because we have to demo our product soon - which is a bit of a pain when the model is non-commercial. I'm sure other ppl banked on it too
keeping it blind to not affect biases
four images i was able to get last night before i went to bed 1000 steps each upscaled by 4x
So I just tested making a LoRA with 4 total pictures. It actually worked pretty well.
script? i'll test for faces
hinging your product's success on another emergent product sounds kinda risky dont you think lol
using the most recent Kohya_ss sd-scripts on the SDXL branch
config i meant. if you dont mind
It's done pretty well on the face
It's the one MCMonkey posted - https://gist.github.com/mcmonkey4eva/0f0bd074c17802213817a9a5a50098df
Looks like my latest LoRA just failed, I see no difference when applying it or not and I don't know if I messed up training or if ComfyUI is bugged.
hey guys. as I understand it we are down to just a few release candidates as the devs figure out what will ultimately be the 1.0 release?
LoRAs for 0.9 may not work with 1.0 right?
I mean yeah but it's better sai than oai behind a paid api, We have it fully working on SDXL 0.9 (and had it in closed beta on 2.1), but again - that's non-commercial
Just a bummer
You shouldn't depend on an unreleased open-source project for your commercial product.
yeah wait until 1.0
Keep on thinking that
Lets just wait till 1.0
release has been pushed back I here. Which is fine by me. I want the best release candidate.
Cherry pick a few seeds
hear not here
@soft bone Yeah you should give that a go, this is 4 images that aren't even tagged properly that I ran for 600 steps and whilst it's not perfect it's very very close
i've got 0.9. i'll do fine till release
An amazing thing I've noticed with SDXL, is that it seems to be much better at generating deep blacks and bright whites than SD 1.5 and SD 2.1
that's because it's trained on offset noise
but sai has proper communication channels. they are a company
did you reach out to them? are you in any way partnered?
Do they even know you exist?
Even projects like waifuXL got greenlit internally ahead of time
I don't think offset noise is a theoretically sound approach
Would certainly screw up inference sampling
it isn't. but it's easier to apply
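For anyone following along, a rough sketch of what "offset noise" usually means in training code: ordinary Gaussian noise plus one extra random shift per channel, so the model sees noise whose mean is not always zero. This is plain Python on nested lists rather than real trainer code; the function name and the list layout are made up for illustration, and the default of 0.0357 is just the value that comes up later in this channel for kohya.

```python
import random

def offset_noise(channels, height, width, offset=0.0357, rng=None):
    """Gaussian noise with a per-channel constant shift (illustrative sketch)."""
    rng = rng or random.Random()
    noise = []
    for _ in range(channels):
        # One random shift per channel, shared by every pixel in that channel.
        shift = offset * rng.gauss(0.0, 1.0)
        plane = [[rng.gauss(0.0, 1.0) + shift for _ in range(width)]
                 for _ in range(height)]
        noise.append(plane)
    return noise

if __name__ == "__main__":
    sample = offset_noise(4, 8, 8, rng=random.Random(0))
    print(len(sample), len(sample[0]), len(sample[0][0]))
```

With `offset=0` this collapses back to plain Gaussian noise, which is why it is so easy to bolt onto an existing training loop.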
ya whatever emad was doing here was pretty good for a pure black background
@boreal bough Maybe we should partner indeed 🤔 Just too short term
What I care about is hands and feet. Kinda tired of the monster 6 fingered mutants.
Hoping 1.0 addresses that.
I've thought that maybe they are using a different noise distribution, but a simple offset should not be able to explain it
Or at least, they can't change the inference noise distribution
In my mind, the only possibility is if they use a different noise distribution during training
And it can't be offset
the real solution is rescaling the betas to sample zero terminal snr but offset noise just happens to break the noise scheduler just enough to make it forget the mean brightness it learns
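A minimal sketch of the beta rescaling idea, as I understand it from the noise schedule paper discussed in this channel: shift sqrt(alpha_bar) so the last step lands exactly on zero, rescale so the first step is unchanged, then convert back to betas. Plain Python only. The "scaled linear" schedule and its start/end values are assumptions about a typical SD 1.x setup, not something stated here, and the function names are made up.

```python
import math

def scaled_linear_betas(n=1000, beta_start=0.00085, beta_end=0.012):
    """Assumed SD-style schedule: linear in sqrt(beta), then squared."""
    lo, hi = math.sqrt(beta_start), math.sqrt(beta_end)
    return [(lo + (hi - lo) * i / (n - 1)) ** 2 for i in range(n)]

def rescale_zero_terminal_snr(betas):
    # Cumulative product of alphas, kept as sqrt(alpha_bar).
    sqrt_abar, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        sqrt_abar.append(math.sqrt(prod))
    first, last = sqrt_abar[0], sqrt_abar[-1]
    # Shift so the last value is zero, scale so the first value is kept.
    scale = first / (first - last)
    sqrt_abar = [(s - last) * scale for s in sqrt_abar]
    abar = [s * s for s in sqrt_abar]
    # Recover per-step alphas from consecutive alpha_bar ratios.
    alphas = [abar[0]] + [abar[i] / abar[i - 1] for i in range(1, len(abar))]
    return [1.0 - a for a in alphas]

def terminal_snr(betas):
    """SNR at the final timestep: alpha_bar / (1 - alpha_bar)."""
    prod = 1.0
    for b in betas:
        prod *= 1.0 - b
    return prod / (1.0 - prod)

if __name__ == "__main__":
    betas = scaled_linear_betas()
    print(terminal_snr(betas), terminal_snr(rescale_zero_terminal_snr(betas)))
```

The last beta comes out as exactly 1, meaning the final training step is pure noise with no leftover signal.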
Wait, were the snrs during training not 0 before?
That's very alarming
also the noise distribution during training and inference is already different, it's not pure gaussian during inference after a few steps
It would mean information leakage in the training
so whats the difference between true black and true white?
no, that can't be done on epsilon loss, but on v-parameterization it can be done
Wait, why not epsilon? It seems to me it would still work
Assuming a sufficiently powerful network that is
going off memory, it's because the SNR is zero at the last step, so the noise-to-signal ratio goes to infinity
ill dig out the paper hold on a sec
We discover that common diffusion noise schedules do not enforce the last
timestep to have zero signal-to-noise ratio (SNR), and some implementations of
diffusion samplers do not start from the last timestep. Such designs are flawed
and do not reflect the fact that the model is given pure Gaussian noise at
inference, creating a discrepancy betwe...
while training loras - true white always turned greyish white
I bet this was a huge thing internally XD
I'm genuinely surprised they solved it
this paper is unfortunately somewhat sparse on details which is part of why people have done very little with it
now I just need to know how - so i can replicate it for some of my loras
Ok, I've read it, and it isn't clear why epsilon-prediction is not possible with 0 snr at last step
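One common way to see the epsilon problem, sketched with scalars under the standard forward process x_t = sqrt(abar) * x0 + sqrt(1 - abar) * eps. This is an illustration of the usual argument, not a claim about what the paper or SDXL actually does; the helper names are made up. At abar = 0 the input is just eps, so predicting eps tells you nothing about x0 and converting back divides by zero, while the v target (v = sqrt(abar) * eps - sqrt(1 - abar) * x0) becomes -x0 and stays informative.

```python
import math

def make_targets(x0, eps, abar):
    """Return the noised sample x_t and the v-prediction target."""
    a, s = math.sqrt(abar), math.sqrt(1.0 - abar)
    x_t = a * x0 + s * eps
    v = a * eps - s * x0
    return x_t, v

def x0_from_eps(x_t, eps, abar):
    # Breaks when abar == 0: division by sqrt(abar).
    return (x_t - math.sqrt(1.0 - abar) * eps) / math.sqrt(abar)

def x0_from_v(x_t, v, abar):
    # Well defined for every abar in [0, 1].
    return math.sqrt(abar) * x_t - math.sqrt(1.0 - abar) * v

if __name__ == "__main__":
    x_t, v = make_targets(0.7, -1.3, 0.0)
    print(x0_from_v(x_t, v, 0.0))
    try:
        x0_from_eps(x_t, -1.3, 0.0)
    except ZeroDivisionError:
        print("epsilon prediction cannot recover x0 at zero SNR")
```

That would fit with the reports here that epsilon "works to a degree" but v-loss behaves much better once the terminal SNR is really zero.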
why does it say repeats:1 epoch:5? 5 steps per image?
I changed that for my run
It was 4 images no repeats and 200 epochs
https://old.reddit.com/r/StableDiffusion/comments/13joe98/sds_noise_schedule_is_flawed_this_new_paper/jkhezbi/
Here SD staff says they implemented 0 snr noise into SDXL @undone bloom
That sounds a bit too much Arron.
well in practice i know people who have trained on both, and epsilon does work to a degree, but as soon as you switch to v-loss things improve drastically and very quickly
For 4 images, it's not because it worked
Maybe it worked but it's a waste of time most likely.
its only in one of the candidates i believe
so this would classify as true white?
It took 15 minutes. The smaller ones have a worse likeness
Yeah that's pretty white
@undone bloom Fair, but can you reference any paper where they claim offset noise
For SDXL?
yeah that was a test. 0.9 has offset noise instead. and strangely, they cited the paper I linked in the SDXL report about their use of offset noise even though the paper specifically says to not use offset noise
and this is true black correct?
i do believe the offset noise comes from the SDXL report
No, i can see the difference in my AMOLED screen
"finally, we implemented offset noise from the famous paper 'Don't Implement Offset Noise'"
Does anyone know if SDXL is only trained on 1024x1024 images?
Yeah I see that, what a weird choice
the advantage of true white/black is, that you can automate the next steps to turn it transparent, or use it as a background in a word document
I've never seen a good justification for why offset-noise should work
There's hand wavy justification
But nothing theoretical
SDXL is trained on different aspects ratios.
And intuitively, offset noise would break the training/inference distributions quite a bit
That's good.
sdxl offset noise needs to be 0, when training in kohya
I wish the models had an alpha channel to do transparency. Would be really cool for some things.
doesn't kohya specifically warn you if you don't have it set to 0.0357?
(then it uses the hardcoded values in kohya - which are the original sdxl noise values) 0.5~1.5 tooltip is no longer correct
what about
pretty white to me
tigigeririget
nope - and that is a problem. people often get bad loras because of old habits/settings
lol that image was 97mb for some reason
holy shit
dont do drugs, kids
oh wait - yeah in the cmd. but based on people reporting their lora findings, 90% dont read those x_x
His parents are siblings
At least from my experiences testing you really need to do more of a stress test for pure black/white backgrounds. Try to get a white subject on a white background or a black subject on a black background. That'll make any flaws apparent.
I don't understand anything, why is this LoRA trained on photos turning images more oil painty...
Polar bear in snow
I also turned the brightness up and saw some kind of weird grid on the image
i mean ive been training 1.5 on v-loss for a while now including some lora training -- i've gotten good at ignoring kohya's warnings because I am smarter than it
if you use the sdxl refiner it does seem to superimpose patterns into the image
i think it's the upscaler
Strange
it does warn you that v-parameterization is for SD 2.0. I simply don't agree with it.
Wait, what does v-parameterization do for 1.5 and SDXL?
only real reason is to do zero terminal snr training
I think it makes your prompts really precise?
for 1.5 it does nothing
even less white
...I didnt actually test emads true white either XD
lemme check that real quick if they even truly solved it
u should
If you look at the paper where v-prediction was proposed, they claimed that compared to epsilon-prediction, the images are less noisy
Fewer artifacts
I did check Emad's black and it is not perfectly black but blacker than my 0.9 results by a tiny amount.
.
I've tried a v prediction model and from my experience it only generates what you prompt so you need to be very specific on what you want in the image.
found it, here's the response on snr + sdxl
@boreal bough can you check
gg, you did 
off by less than 1%. that's true black
now true white
its not perfect tbh tho in the bottom right its messed up and on the top of the tigers head you can see the color shade into the black a little
Closest I’ve gotten so far but I haven’t really been testing
Can you just prompt for “white ffffff”
At least with my experiences... like, most of this is devoid of objective metrics, but my experiences tell me that there are a lot of ways that multiple parts of the SD workflow are actively fighting against the broken noise scheduler and it shows in so many places and usually manifests as hallucinations. Full-resolution inpainting is the most obvious one I know of and I really hope I do get my inpainting training code working to demonstrate it more properly.
man that is a pretty birb
I'm honestly surprised to see schedulers stick around in diffusion for so long
seems to be the source of alot of heartache
white always goes into greys or sepia T.T
Yeah
even on custom trained loras, where the bg is 100% white, and prompt is "white background"
after training its polluted into greyish again
this isnt true white but its pretty good.
its definitely cool!
Well, hopefully their engineers will make the right choice
Use 0 snr noise instead of offset noise
But given that SDXL 0.9 used offset noise, it is a minor disappointment
I'm wanting SAI to put more energy into Harmonai but I might be biased...
i still get OOM with this. it takes 29gb ram and then instantly maxes my 3090
Do you have the absolute newest sdxl branch of the sd-scripts?
afaik lemme check
Are bots 1 thru 10 using sdxl 1.0 to render or is it still .9?
Awesome. Thanks
So I just did a stupid test that actually worked reasonably well
I took some random photo of someone online, it's a crap quality photo too 338x238
And created a LORA off the 1 image
This is the Original
votes from the bot gens will determine which of the three 1.0 models is chosen for release
That's not bad for a single image
generating animals on black background has been a thing for me for a few days now
wow thats surprisingly good
I didn't caption the facial expression, so she's sort of stuck with that expression, but it's worked really well
It's also captured the poor quality of the image though
one of those images you randomly get. you're not sure what it is, but it looks really good 😄
did you manage to get true white to work in any way though?
since that is one aspect I genuinely gave up on.
Perceptible white is easy - but the moment you paste it into a word document, it instantly shows.
While a post processing workflow can automate this, it essentially invalidates my dnd portrait loras, since they rely on true white backgrounds, for seamless integration into character sheets for printing
even emads true white post wasnt true white - still sepia polluted
looks a little odd from other angles too
But I think you could actually make a reasonable one from 3 or 4 pictures
not sure if it's true white tho
i mean looks fine to me lol
under 1% off - I love it ❤️
'stunning photograph of an orange landscape with a white tiger, 35mm photograph, professional, 4k, highly detailed'
I need to try it with a picture that isn't only 300x200
how do I replicate it? XD
Will we ever get SD using the alpha channel too for transparent backgrounds?
Anyone have a good img2img pipeline that works? I'd love to try!
while fun, that would explode the latent space x_x
you can however two step automate it?
either via true white bg, or greenscreen lora
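A rough sketch of the second step of that two-step idea: once the background is (nearly) pure white, mark those pixels as transparent. Plain Python on rows of RGB tuples so it runs anywhere; the function name, the tolerance value and the data layout are all made up for illustration, and a real pipeline would use an image library or a proper background remover.

```python
def white_to_alpha(pixels, tolerance=8):
    """pixels: rows of (r, g, b) tuples. Returns rows of (r, g, b, a).

    A pixel counts as background when every channel is within
    `tolerance` of 255. The tolerance is an arbitrary example value.
    """
    out = []
    for row in pixels:
        new_row = []
        for r, g, b in row:
            near_white = min(r, g, b) >= 255 - tolerance
            new_row.append((r, g, b, 0 if near_white else 255))
        out.append(new_row)
    return out

if __name__ == "__main__":
    row = [(255, 255, 255), (247, 245, 248), (10, 20, 30)]
    print(white_to_alpha([row]))
```

This is also why slightly grey or sepia "white" is a problem: a tight tolerance leaves off-white pixels opaque, and a loose one starts eating into light parts of the subject.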
Kinda mad how good this can be from a single image
why are you mad? xD
ohhhh 
I tried WaifuXL and I'm glad to see that 0.9 LoRA works on that one, that means there might be a benefit of training LoRA on WaifuXL and run them on 0.9
Honestly not sure then, it took me ages to get it working, Would OOM every time until that update.
that would require training a VAE specifically for that (and consequently retraining the model or at minimum realigning it) and would also come at the cost of mostly undesired alpha channel in all images. i wouldn't hold your breath, good background removal tools are likely as good as it will get outside of niche specialized models
👀
its infuriating to get the highest vram consumer gpu available and still not be able to train a little lora
4090?
3090, just the 24gb i mean
? you should be able to batch 8 casually
okay.....I just spent a week's pay to get a used....USED 3090. Because I don't have a million dollars to get a 4090
and I won't be able to train with it?
ikr🤷♂️
you will my shits just broken
24GB gpus have just barely enough to train LORAs on SDXL
lots of settings - where getting even one wrong results in nothing working, so yeah. I dont blame you ^^'
SDXL development on civitai is gonna be painfully slow over a number of months.
everyone will just rent
?
16gb vram to train properly. 24gb to train a full (good) lora in 14 minutes, with a dataset of around 60 images
Well i invested in a 3090...at least I will be able to generate images even if I can't train.....
general models are going to be a huge pain in the ass to train -- you'll either have to rent a cloud GPU or figure out how to shard the unet
okay so a 3090 can be used to train and a 16GB can also train...albeit slower
hm, situation has improved then? last i heard was that 16 was mathematically impossible, 24 is barely enough
send me the json for this 14 minute training
Well can a 16GB card train or no?
also i'm quite a bit more concerned with FFT
+1 to that
Honestly seems like you can train properly on 10GB with an even smaller dataset
8 was impossible until today.
12 worked - but not for all models, so gpu type was actually relevant
16 worked from day 1
24 runs efficient as hell - heck I even trained a 5k dataset lora, with 100+ concepts, fully working, in 18 hours
json for proof
With 24GB VRAM you've got a lot of flexibility and LoRA training should be easy.
any training on sdxl is referring to 0.9 correct? Y'all realize that's just for testing? 0.9 models will not be compatible with 1.0.
@boreal bough how close is this?
What about full finetunes, though? Any news about that?
Yes we love training so it’s ok who cares
I'll wait to see how incompatible they will be.
It’s not like the architecture is different
ah well okay
40gb was what i was told
They’re just RLHFing the model

great so you're implying a 5090 will be used? Assuming Jensen doubles the vram
After some test on WDXL, I think it might be good at 768,768 rather than 1024,1024
theoretically?
You can achieve what the word "finetune" means with a 3090/4090
realistically?
You'll want 8 or more A100s, to achieve a good batch size, so you can actually improve the model, rather than make it worse
768 then upscaling after?
Guess I better figure out how to shard the damn unet then
JSON? 😭
I get 7 s/it but it doesn't matter when my LoRA dataset is small.
95% white
A100s? So basically you mean money to pay for renting resources then
yep
Cool stuff! What params did you use?
WEll....at least my 3090 will be able to generate SDXL images in somewhat of a timely manner even if it's not the best for training.
did the math last time, about 360$ to produce a finetuned version of 0.9 trained on 6.5k raccoon images.
didn't go through with it, since 1.5 equivalent finetune levels can now be achieved with lora.
Jensen needs to make a consumer cut down version of a H100.....which I guess is what the 5090 will be.
proper finetunes of sdxl should aim high, since a lot is now possible that wasnt possible before. just gonna be expensive T.T
With an A100 you can get a lot done in terms of LoRA for the price you pay.
What about the H100?
Or better yet......two 3090s using nvlink
but damn my 1200w psu won't be enough.
A6000 Ada is cheaper and 48GB
what do you mean by "cheaper"?
40k$ to buy one XD
Like $7000, very reasonable
not sure if we can realistically talk about it
A6000 Ada is well within the price range of many here I bet
oh hell well $7000 is a bargain then in comparison 😄
you should in theory be able to train it on two 3090s with NVLink
The ones mcmonkey posted. I'm in bed now so can't grab them, but I posted them a little bit ago.
Someone out there will have access to an A6000 I'm sure
sharding a unet is an absolutely annoying pain in the ass, it isn't really efficient either, but it will work
It was a lot of effort to get the one 3090....if I could get two though I would.
depends on your goal - but you can set up a runpod with a single A100 if you wanna be serious about training. Its the least expensive option - unless you already have access to high end equipment
Same seed, same workflow. WDXL0.9 1024 vs 768
get that, an nvlink bridge, and extensive knowledge of pytorch or jax, and you'll have everything you need.
what's a runpod?
And a second 3090 of course though right?
Just FYI, the A6000 and the A6000 Ada are two different GPUs
Runpod Instance pricing for H100, A100, RTX A6000, RTX A5000, RTX 3090, RTX 4090, and more.
yeah
and extensive knowledge of pytorch 🤣 the pain is real
Used they are going for around $700. It will take months cause I'm working paying other bills before I can swing a second card.
lemme check those runpod pricing
It makes sense to rent for training unless you have powerful gpus.
Also I can't really offer any guarantees on how much moving data across devices is going to slow you down with that. But you'd be doing all-reduce gradient updates if you were renting a multi A100 pod anyways, and with model parallel you only have to move the intermediate tensors I think.
how often do you plan on training?
training doesnt take long, you're essentially just paying to fail a few times until it works properly, then its like 14~21min per lora
I'm mostly a user so I guess I will wait to see what civitai community comes up with. But I want to do a personal project which may require training models.
then you'll see if runpod is cheaper or not
but most people dont make enough loras where its worth getting a dedicated card for it
I just want to finish a project a concept artist promised me he would.....but never did.
Figured maybe I could use AI to copy his style
Aaron Beck is a concept artist known for his robots.
He did a piece called: "Robot with Laser" had a very unique style
Hey are you guys voting at all? A/B? Dem/Pub? Red/Blue?
I asked him to flesh it out with pics with multiple angles. But he never got around to it.
Hoping SD could help me finish his work.
I never know what to do when the prompt isn't followed as much, but picture is better. I mean do you want better pictures that don't follow your prompt?
i think one of the help texts does say to pick ones that follow the prompt better
A.) offer him more money
B.) learn stable diffusion, learn training, buy expensive equipment, spend 30+ hours learning the tools, then create a model based on the artist - to essentially commit art theft since its specialized to impersonate them, have your image
C.) pay someone on fiverr 15$ to make it for you in the style of the other person
XD
all are viable options - just saying that the alternatives are less effort
Won't be theft if I keep the pics to myself. But he works for hollywood project so I can't afford him. lol
Thank you Dr. Head I can see why they gave you the degree
I want to learn SD but I can't afford expensive equipment except maybe for renting runpod services I guess
Why don't you pay someone like 20 bucks to do your whole project
we are all art thieves! 😄
I just do SD as a hobby just throw me a few bucks and I'll work a dollar an hour. I do it anyway
4freee
Fiverr you mean those virtual beggars? "will make ai art for tips"
Maybe.
Lemme post a pic of what I mean...... one sec.......
I mean I'm not crucifying you or anything. But even if I ignore the AI aspect - if I draw a bunch of bunnies by directly copying the style of will quinn, using pens only, that is the definition of art theft.
I'm not advocating or arguing against it - but denying the definition of the word feels a bit wrong.
D) realise the Fiverr guy used AI anyway 😆
There's a big enough gap between people that know AI art and people that don't, that you can still get paid
For it
we have gone full circle 🤣
oh god fiverr gonna be a bad place once sdxl is full release + loras
is this true white?
Probably will last for another year... Yearish? And then I imagine there will be a very small demo of extremely tech-unsavvy elderly people but at that point... There's usually not a ton of that medicare/medicaid money being spent on small business/enterprise
yeah i feel like sdxl will start the existence of paid loras or something dumb lol
I want multiple pics from different angles of this robot.
rgb(247, 245, 248), f7f5f8
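The two values quoted above are the same color in two notations, and neither is pure white (which would be rgb(255, 255, 255), ffffff). A minimal sketch of the conversion, with hypothetical helper names:

```python
def rgb_to_hex(r, g, b):
    """Convert 8-bit RGB channels to a lowercase 6-digit hex string."""
    for channel in (r, g, b):
        if not 0 <= channel <= 255:
            raise ValueError(f"channel out of range: {channel}")
    return f"{r:02x}{g:02x}{b:02x}"

def is_pure_white(r, g, b):
    """True only when every channel is at its maximum value."""
    return (r, g, b) == (255, 255, 255)

print(rgb_to_hex(247, 245, 248))     # f7f5f8
print(is_pure_white(247, 245, 248))  # False
```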
so I want SD to learn how to make robots in Aaron Beck's style
of this specific robot in fact
rip
There's not really a lot of learning. It's not like chess, or a programming language. It's more like learning, idk, ProCreate.
There's some little bits here and there that aren't exactly handed on a silver platter but they're really nothing
If SD was a human engineer I would ask him to backwards engineer the robot and create detailed schematics from multiple angles.
That's what I want SD to do.
You could do GAN inversion then use DragGAN
I'm dead serious
Yeah I heard of DragGAN. That would be interesting to see how SD might interpret the legs.
the arms imply the style of the legs
There's also DragDiffusion, but as I understand, DragDiffusion doesn't really work
Just inpaint/extend the legs
In my interpretation of that specific robot I picture him around 7 and a half feet tall like a Predator in size.
Inpaint can take care of 90% of what these specialty jawns can do, just without the snazzy effect
SDXL with human bodies/robot bodies loves making them super tall and long
Right! Which is why I'm excited for SDXL and why I spent $700 on a used 3090 from a gamer who didn't mine with the card and treated it like a baby.
What've you got RAM wise?
Most of the cost of training a single LoRA on an A100 in the cloud is just setting up the environment. Training itself is super fast.
64GB ddr4 and a 5800x3d
I am going to slap the SDXL VAE on SD 1.5 and train it until it is aligned.
i could max out the board to 128GB ram if i have to
SDXL long people bias be like:
lol
Wow I thought 16GB RAM and 3070 already spoiled me
Your raccoon LoRA is looking good Caith
Someone said SDXL 1.0 was better at not making super tall people, what do yall think?
because I still see tons of Suuuuper tall people/robots
What will that do?
Not that RAM does a whole lot besides caching models... I hear SDXL 0.9 requires a station wagon's worth though (of RAM)
I'm sure it did.....but now it's over. lol
SDXL currently specializes in slenderman proportions
Eh, I run it on 16 gigs
64GB of ram should hold me for now I hope.
You can get short robots too if you want.
SDXL's VAE is better at reproducing fine details. So if I train the model with it I'll get an SD 1.5 which can produce better fine details.
This comes at the cost of making it incompatible with everything though, I think.
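One way to picture the VAE swap described above is to compare latent shapes and scaling. This is a sketch under stated assumptions: the spec values (4 latent channels, 8x downscale, scaling factors 0.18215 and 0.13025) are taken from the commonly published VAE configs and should be checked against the actual checkpoints, and `VaeSpec` is a hypothetical helper, not a library class.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VaeSpec:
    name: str
    latent_channels: int
    downscale: int         # spatial reduction from pixels to latents
    scaling_factor: float  # multiplier applied to latents before the UNet

# Assumed values, see the note above.
SD15_VAE = VaeSpec("sd-1.5", 4, 8, 0.18215)
SDXL_VAE = VaeSpec("sdxl", 4, 8, 0.13025)

def latent_shape(spec, height, width):
    """Latent tensor shape (channels, h, w) for an image of the given pixel size."""
    if height % spec.downscale or width % spec.downscale:
        raise ValueError("image size must be a multiple of the downscale factor")
    return (spec.latent_channels, height // spec.downscale, width // spec.downscale)

def shapes_compatible(a, b, height, width):
    """True when both VAEs produce latents of the same shape for this image size."""
    return latent_shape(a, height, width) == latent_shape(b, height, width)

print(latent_shape(SD15_VAE, 512, 512))                 # (4, 64, 64)
print(shapes_compatible(SD15_VAE, SDXL_VAE, 512, 512))  # True
```

If those assumptions hold, the shapes line up, so the 1.5 UNet can accept the new latents, but the values it sees are distributed differently. That is why the UNet has to be trained until it is aligned, and probably why existing 1.5 checkpoints and LoRAs would not carry over cleanly.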
Maybe that's what the people wanted. There's a lot of demand for long-legged people in the art world.
A super version of 1.5?