#🆕|sd3
Bit unfair to compare this effort to SAI, they used a tried architecture from SAI, then threw some data at it. It's fun, would be nice to see it grow, but it is not even in the same league of work needed
Simo is pretty much an exception (like why he wanted to train the model in the first place) sooooo...
hell yeah I will compare smaller models with less training data (0.6B vs 8B), its absolutely fair
One party researches and writes a recipe, the other party uses the recipe, yet it's all the same work 👀
I know it's been sdxl refined, but I keep being impressed at the composition that hunyuan puts out. The blue ball vs red hat stuff is not the best, but 1 to 1.5 subjects on the screen has way better composition than sdxl and different than ella and pixart, so it complements them well for running prompts across multiple models.
that's very nice
I hope we'll get safetensor weights so I can try it as well
there's a comfyui plugin so you can run it there with no complex setup
its the same plugin that lets you use pixart-sigma and other stuff
How many custom nodes are in your comfyui? Checked them all? That's a much larger attack vector than this set of pickled weights that's been out for some time and no red flags have been raised
Just use it if you want to try 🙂
whats that guy responding to ?
I can't see cause twitter requires you to have an account to even see anything, just garbage

sd def feels more accomplished than sdxl
not sd3
banned 
wonder if this is really what was said
does this mean it's just going to take a while and they are going to release them? or the plan may have shifted away from an open release
bruh
How many times have they said they will release it open source and all

dies

its just the guy himself replying to his own tweet

someone should build a 32 channel vae
another day, another 2 weeks
at that point its overkill tbh
16 channel imo is a sweet spot for finer things like text
Midjourney's vae is around that size
damn
hang in there Emad just fix the hands first then drop the Weights.
🤣
SD3 confirmed?
You can buy bootleg Russian SD3 weight copies.
Axel Rose leaked them on the net too.
Someone make "In 2 weeks" t shirts.
that moment when half life 3 releases before sd3
Another day, another two weeks.
Get an account. All the important business news stuff happens there. It's more reliable than any newspaper at least.
Which absolutely sucks. That site is a cesspit.
no it's not. you're just looking at the wrong stuff. i have my account rather well setup with ai related stuff and announcements and it's great.
took a little while to tell it what I didn't want and it stopped showing me that stuff
Then you're following the wrong people.
the answer is, everything is on there, good, bad, everything. you just have to curate your feed.
its okay to have an account to keep up with news and crap
And that shouldn't be necessary tbh. Musk ran it into the ground - and it was not great even before he took over.
eh? literally the whole world is on there but they should know what you want?
once I followed all the right academic and ai people, and said no to the political stuff, I don't get anything stupid in my "for you" feed anymore.
Stop whining. It was privately owned before, and it still is. Musk doesn't control what anyone posts there, and it's where news happens.
I don't really care though. Not like anyone's forcing you.
Stop protecting that shithole. shrugs
Not like anyone is forcing you to try and sugarcoat what it is.
2B and an 8B "Beta" with v0.9 weights. 👼
Just a typo. xD
2B and 9S models (no one gonna get this reference)
guys what to do while waiting for sd3?
keep generating waifus and husbandos
Jokes on you I played that for the first time ever a few days ago.
People already use SD as a “2B model”. No need to create another one! /s
haha
but man.. every time i get a ping from this server, i kinda hope it's from the announcements channel, but it turns out it's some random people... smh :3
but i still love ya
use all the new open source t5 based models
I don’t really understand the logic of choosing T5 for the text encoder. Wouldn’t a newer llm (e.g. llama 3) or even a reduced parameter count T5 using new distillation methods be better?
hopefully the t5 module is somehow plug and play and we eventually can replace it
from what I know you can use it or not, and you can put it in your ram so it doesn't take vram from the gpu
I wonder if I'll switch over to invokeAI if that get SD3 support, just so img2img and regional stuff will be easier to do
true
yeah idk about replacing it with other weights that do not match the size
as they literally just replace the bits of the weights of T5 with zeros in the MMDiT or whatever to make it not use T5
we are still in the dark when it comes to how this is actually all structured code wise :3
depending how their pipeline actually works
the community will find ways to fix stuff anyway
they replace the T5 EMBEDDINGS with zeros, to make the conditioning
still, the T5 model they are using is going to perform the best
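A minimal sketch of what "replace the T5 embeddings with zeros" could look like, assuming the text conditioning is a sequence of CLIP token vectors followed by T5 token vectors of the same width. The function name, the toy shapes, and the layout are illustrative assumptions, not the actual SD3 pipeline code, which isn't public at this point.

```python
from typing import List

Vector = List[float]


def build_text_context(clip_tokens: List[Vector],
                       t5_tokens: List[Vector],
                       use_t5: bool = True) -> List[Vector]:
    """Concatenate CLIP and T5 token embeddings along the sequence axis.

    When use_t5 is False, the T5 slots are kept but filled with zeros,
    so the context has the same shape either way (hypothetical layout).
    """
    if not use_t5:
        t5_tokens = [[0.0] * len(tok) for tok in t5_tokens]
    return clip_tokens + t5_tokens


# Toy example: 2 CLIP tokens and 3 T5 tokens, each of width 4.
clip = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]]
t5 = [[1.0, 1.0, 1.0, 1.0]] * 3

with_t5 = build_text_context(clip, t5, use_t5=True)
without_t5 = build_text_context(clip, t5, use_t5=False)
```

The point of the sketch is that the model weights are untouched; only the conditioning input changes, which is why the T5 encoder itself would not need to be loaded in that mode.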
im also curious about the edit model, i hope it fixes the artifacts and deformations from cosxl
I'm glad that inpainting will be a thing out of the box
god SD3 with highresfix would be a treat 🙏
OOM 🙂
yea i cant wait to upgrade my pc
i mean im just waiting for 5090
this is you
and if they dont give us at least 32GB vram, im gonna kill the nvidia dude... jk of course...
(in minecraft)
kek
i really dont like to have a beard personally, just always itchy, so i try to shave as much as possible, but sometimes lazy as hell
and technically my eyesight is not great, but i just dont want to wear glasses
Hunyuan uses a 15 gig version of llama2. Would be neat if they swap that out for 3
quantization is better than distillation
Interesting. Do you have a source for that? Is the rationale something like distillation (reducing number of weights so you can keep higher weight precision for a given model file size) effectively reducing the model's breadth of knowledge while retaining the accuracy of the knowledge it has, while quantization makes the knowledge more approximate but maintains its breadth?
I might have misremembered the paper, but quantization is still one of the best ways to reduce vram requirements
I guess you could distill and quantize
Yes in almost every case with llms, this is the answer. More parameters will almost always be better, assuming it's at least Q3 or higher. Like a 13B at Q4 will outperform a 7B at Q8. The 7B Q8 will be close to 7GB in size and have a perplexity like 5.9, the 13B Q4 will be around 7gb in size as well and have a perplexity of like 5.3 (lower is better)
But you can also distill the model as well to make room for more relevant data. If you're using a model for writing English novels, you probably don't need the model to contain a shitload of data about math and science
So in the same file size, you can stuff more relevant data in the model if you need to(or just shrink the overall filesize to play nicer with multimodal setups like when doing stable diffusion so you don't need 256 terabytes of ram for swapping 3000 models in and out of the GPU lol)
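A rough sketch of the size arithmetic behind the "13B at Q4 vs 7B at Q8" comparison, assuming the file size is dominated by the weights and using nominal bit widths. The function name is made up for illustration, and real quant formats store extra per-block data, so actual files come out somewhat larger.

```python
def weights_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# Nominal bit widths only; quantization overhead is ignored here.
size_7b_q8 = weights_size_gb(7, 8)    # 7B model at 8 bits per weight
size_13b_q4 = weights_size_gb(13, 4)  # 13B model at 4 bits per weight
```

Both land in the same ballpark, which is why the choice comes down to which one scores better at that size rather than which one fits.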
Wombo Dream i2i into SDXL+LoRA+PAG Advanced NOT SD3
do you even know how's the performance of LLaMa 3 8B Q4 + SDXL
seriously even some of the SSD can't handle it
Im playing with it right now, seems like a lot of fun so far
What do you mean? Running them in parallel or is there an SDXL version of Ella now?
What's wombo dream? Edit: oh ok it's another service
When I finally got it to work.
I run them as a single workflow in comfy on my personal computer, so i'm not sure what the problem would be running that combo
however, you can't use the LLM directly as a tokenisation step 😦
So you are using it as a prompt enhancer? Can you show me the workflow?
yes, that's correct
but depending on model, you might have to change output, i'm using ollama as backend
this is the summary xD
switched from llama3 to phi medium, so gonna have to find my bearings again a bit
Thanks.
Contribute to stavsap/comfyui-ollama development by creating an account on GitHub.
this is the node i'm using to use ollama
important is that you get the model instruct template right!!!!
for this workflow, the system prompt MUST be used, or you'll be getting gibberish that you cannot use
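For anyone who wants to see the system prompt part outside of comfy, here is a minimal sketch that talks to a local Ollama server over its REST API. The system prompt text and the helper names are hypothetical, not the ones from the workflow discussed here, and the URL assumes Ollama is running on its default port.

```python
import json
import urllib.request

# Ollama's default local endpoint; adjust if your server runs elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"

# Hypothetical system prompt, purely for illustration.
SYSTEM_PROMPT = (
    "You expand short image ideas into one detailed image prompt. "
    "Reply with the prompt only, no commentary."
)


def build_payload(model: str, idea: str, system: str = SYSTEM_PROMPT) -> dict:
    """Build the JSON body for a non-streaming /api/generate call."""
    return {"model": model, "system": system, "prompt": idea, "stream": False}


def enhance_prompt(model: str, idea: str) -> str:
    """Send the request to a running Ollama server and return its text."""
    body = json.dumps(build_payload(model, idea)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


payload = build_payload("llama3", "a cat sitting on a sundae")
```

Calling `enhance_prompt("llama3", "a cat sitting on a sundae")` only works with the server running and the model already pulled. Leaving out the `system` field is roughly the situation where the model just answers like a chatbot instead of emitting a usable prompt.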
right now i'm trying to find a better system prompt
and then stuff like this happens
@teal fossil workflow is inside the image
still no news? goodness
Two weeks.
where you hear that?
Two weeks
wrong server bub
This honestly pisses me off, and it's only the people who don't know how AI works or they've just never used AI before, AI has been with us for a whole while it only got popular with t2i and llms
decades ago: There is no such things as a digital artist 
Lol even VFX, a vfx artist uses AI tech to track footage so they can add CGI stuff later on, but they wouldn't know that bcoz they're npcs
Luddites, man, get off your PC.
Use analogue only. Kodak Agfa family moments. Go develop that shit!
Better yet only cave art is true art, only uses natural raw materials, paint me some bisons.
Two weeks is common knowledge by now, just scroll back up to see how common. Sources do not need to be cited for common knowledge.
Humans are obviously unnatural, just like AI. Only stuff made by non-humans is really art.
Hmm... But life seems pretty unnatural compared to the other stuff in the solar system. Maybe only stuff made by non-living things is really art?
ah, I see :D
you never know eh?
kids cartoon
How heavy are the weights? 50kg? 500? 5000?
Can we even lift the weights after Emad drops them? Are we worthy and capable?
2 weeks.
OK. So I'll sleep for two weeks. See you then.
The oil flow stop and turn into savages.
Has sd3 been released?
Feu
2 weeks.
2 weeks plus tax
I'd be soooo all over it dropping. IF I could even think about running it locally without my gpu sprouting legs and getting the hell out of my country.
even if it drops, I'll still have to use the api.
2 epochs and an era™
System Prompt? Who or what generates that? 🙂
it's a way to initiate your LLM
telling it what it should do
else it would just respond with a default answer
OK, I have a ChatGPT4 account - will that link to this node?
no, this is ollama specific
i dunno if you can do a system prompt on gpt4
I'll try 🙂
In general no. ChatGPT does not offer api access. You need a playground account for that. A separate thing entirely.
But you can use openrouter. It's what I do, and the charges are exactly the same as in playground. But offers many, many more models.
I do have a PG a/c - so I have plenty of options...
But chatgpt seems to be entirely non-functional for me for the past 5 minutes or so. API calls work though.
I have d/loaded Llama 2 Chat 7B Q4 into Jan.ai - can I link ComfyUI-Ollama locally to this?
Really?
https://huggingface.co/NikolayKozloff/Meta-Llama-3-8B-Instruct-bf16-correct-pre-tokenizer-and-EOS-token-Q8_0-Q6_k-Q4_K_M-GGUF this or https://huggingface.co/bartowski/Phi-3-medium-4k-instruct-GGUF are pretty much the best light models imho
llama2 is pretty much outdated at this point 😦
Every time someone asks for the release it's "2 weeks".
It's a Meme at this point.
Yeah llama3 instruct or other dpo fine tunes of llama3 are super powerful for their 8b size. You can easily fit them in 8gb of vram with 8k context using Q4 or Q5 quants (within 1% of Q8's perplexity). They are on par with llama2 70b models.
phi3-medium (14b) is even more amazing in smarts -> it's super neutered tho
Ha lol
Usually I'm a patient person but the SD3 demos impressed me so much that I can't wait to have the open source weights to test, but I guess releasing a model takes time
sd3 open means midjourney in danger, i strongly believe
2 weeks.
2 solar weeks 🌞
What happens is that Stability AI GPUs are so strong that time bends around them.
We are in a time loop of two weeks created by this phenomena.
but the CTO said May, he doesn't lie
Which will be released first, sd3 or gta6?
manhunt 3
Half life 3
yeah it'll come out at the end of this month, probably on may 32nd

hope 2B comes at the end of the month fr
8B and others need cooking though
2B is a perfect candidate for accessibility and training
Whats the diff between 2B and 8B?
and Size
a little bit, sure, but I don't know if it's gonna be massively worse than 8B in prompt adherence
if it's gonna be on the level of the others such as pixart-sigma I might as well wait for 8B or something
all I could do is generate paintings as they are in the style I like
finetunes of 2B would be fine though
they are the same,both unreleased
kek
2B is closer to a full train as its a smaller parameter model, so we might even get better quality with 2B than what we see on the API
I just wonder how GOOD 8B would be at its fullest potential
like how llama3 8B was trained for 15T tokens and its wonderful
we won't get anything close to that, but I'm still willing to wait if stability still has the opportunity to train further as much as they can
I feel like the time complexity of SD3 releasing is [ O(n^∞)+ (2 weeks) ]
it releases in two weeks, at any given time
sd3 not supporting unet is a problem with no support for controlnet ootb
No the sd3 release is O(TREE(exp(n)) + 2weeks)) lol
got it, see you then
"i will catch the turtle maybe in june "
they will launch with controlnets
Releasing the sd3 2B now is a good idea
they said that they will release the smaller models first
Do I install it inside ComfyUI/Custom_Nodes? Or as a standalone?
ollama is a standalone program
Stuck at "[WinError 10061] No connection could be made because the target machine actively refused it!"
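WinError 10061 is the Windows "connection refused" error, which usually means nothing is listening on the address the node is trying to reach. A small sketch for checking that from Python, assuming the node points at Ollama's default local port 11434 (the helper name is made up for illustration):

```python
import socket


def server_is_listening(host: str = "127.0.0.1", port: int = 11434,
                        timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False
```

If `server_is_listening()` comes back False, the Ollama program most likely isn't running yet, or it is bound to a different host or port than the node expects.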
I have a fresh API Key and a positive balance $£
ollama is local?
I have d/loaded it into Custom_nodes and selected the Ollama_Vision Node in comfyUI
OK, I got the LLama Server running in the background - says Payload Too Large!!! 😄
you got this one?
funk twitter 😉
YESSSSS
this would be amazing
bunch of variants to choose from
the community will find which one's the best for each model size
We might release some variants.
We might release some
We might release
We might

sd3 2030 confirmed
2030 AD?
trying to use the stability api and I can't even gen a picture of a woman in a shirt. Just comes back blurred. You can clearly see that she's NOT naked
whats the prompt though
enhance.... enhance....
that second pic tho.. there is something sus going on on the bottom part of the pic LOL
||not sd3||
idk but it looks like a snail to me lmao
kek
the hell is that face tho creepy
You know, you might be right on the second one. I'm trying to get a pic of a woman sitting on a sundae wearing a shirt that says 'cat'. Don't ask me why, I just wanted to test the api for prompt adherence.
I also have 'nude' as a negative
that straight up looks like one of those videos when you start MGS4, its like some weird tv channels @low stone
y'all think we will ever get a truly photorealistic model?
I don't know what you're talking about, I was able to generate nudes all over the place: #🆕|sd3 message
i mean some sdxl models are very good with photorealistic, assuming you prompt correctly
that's not how the filter works, lots of false positives. my pet theory is anatomy is so outrageously bad, they decided just to blur most gens that are somewhat horrific :p
maybe they've cranked the settings down?
so you telling me we cant generate horror?
actually, it works better than females 🙂
No, in all seriousness, that filter is bizarre, but it is what it is. The good thing is that there's a lot of prompts to try for the first time once you can use sd3 local 😂
sure but they always lack in skin texture, always looking glossy and the dreaded "ai face"
i hope so, looking at ai photos all day makes it so easy to tell. Kinda ruins it
but wait, is that filter because of sd3 by itself or is it more because the pics are shown on discord here and its blocked on discord? i have no idea cause im not using the api lol
it's at the api level
yup
well that sucks
i think the api is the only source, so it's the same everywhere
yea just have to wait for the weights :3
i mean can't they just release at least one? like heck even give us the small 2B model to play with
sigh 
If the model isn't good yet, the reactions will be "is this it???" and no one will remember it's the limited 2b model. Don't think SAI can win whatever they do at this point
Though some openness/updates would be much appreciated
sdxl wasn't really all that great when it released either. The community made it good
i would be happy even with a text update that says, it will release in June or something
just give us some info
I don't get poor skin texture or ai face with sdxl
send some samples
one moment
take your time
no problem, for me the best i've got out of sdxl is
I mean, that's pretty good?
maybe, it's hard to tell after a while if it's real enough
aside from the iris, I wouldn't know it was ai, and only because I'm looking for it
1st one is not bad, second looks like ai to me
yeah just the first two generated...
the skin is a little too glossy and the depth of field looks unnatural
that being said, I mean, I put mine through a few different detailing steps, and I render them with two different checkpoints
Yeah but the skin is very very smooth without any little imperfection
it's meant to look like a model shoot, touched up
So I've been watching Lykon's tweets as they respond to the onslaught of SD3 when posts. One just now said "Sooner than you expect" which was then deleted. 🙂
interesting....
I can get more textured and blemishy too
huh... interesting...
is there any possibility before June? :3
not this weekend because it's a holiday... so that's the only sad part i'm focusing on. 🙂
wtf I remember that one too
lmao
guess he got contacted that the model is not 2 weeks away, but like 2 months
😔
Everything is soon if you personally have access to it
man, the morale is pretty low around stability, which sucks. I really hope we continue to see open models because giving everything to openai/msft/etc sucks
that's a lot better than the other recent one where he said something like "as far as i know, the plan is still to release the weights"
chinese companies seem pretty intent on continuing to work on such things
those chinese companies censor the shit out of the model
they do
i tried generating flags with hunyuan
won't post em here, but it was a test for how carefully they've checked their surely massive data set
they have def gone through every image one by one
flags that are easily confused: holland, russia, france. nailed em all consistently
britain, USA, australia: nailed all those consistently
pixart sigma might as well be sd2 for generating anything more than fully clothed people
zero trouble with their own of course, couldn't do taiwan, and had absolutely no concept of what a nazi flag was or the confederate flag. none.
'flag of taiwan' probably gets you extradited
both of those latter symbols are fn everywhere in all kinds of random images
historic stuff, garbage online, photos of random political events
the fact those are not in their model at all is honestly kinda shocking/impressive
... expected you mean? You gotta remember that the ccp oversees all of that stuff
SD3 8B (or 2B if the prompt adherence is at least better than pixart and other open ones) is something that would be amazing
I do believe it coming out
just not soon
pixart-sigma isn't even close to what SD3 8B can do
nah, i'm not surprised they tried
what's impressive is that it is completely not in the data set at all
that's a ton of labor
SD3 8B is like 70% of the way there to ideogram level of adherence, and pixart is like 40% or worse, idk how to say it
it also all boils down to motion/graphic elements and text working in SD3, whilst not in Pixart-Sigma
but pixart-sigma, for a 0.6B model is still extremely impressive
for complex compositions its waaaaay better than something like SDXL
but it doesn't meet what I want to do, which SD3 8B gets really really close
try harder
oh yeahhh forgot about ideogram. That model is great
ideogram is amazing, I just wish there was something like it offline
SD3 8B gets quite close, and I suppose finetunes would be 99% of the way there tbh
pixart sigma does some nudity by accident quite often. maybe the prompts were filtered, but the images it was trained on less so it seems
2B or 4B being able to be finetuned offline, I would make a bunch of models that would boost the motion graphic(?) element capabilities
so did SD3 to one person here
it got through the censorship very rarely
I don't know how it happened
oh i remember that... it feels as if the filter has some sort of cumulative scoring system. woman: 1pt, photorealistic: 1pt, upper body skin: 2pts, breast structure: 1pt. 3 is out. That image evaded it as well, totally covered upper body 😂
doubtful it's that fine grained. It's just a classifier model thats been trained to recognize the nsfw'nish of an image and give it a score
same guess
its not even well trained
I wonder if they are just using the laion nsfw detector or whatever
Playground is about to drop their v3 model, which looks like it might be up there with sd3. they didn't specifically say this but they sound pretty confident about its ability to render better faces and prompt adherence.
That's exciting. Any idea about clip/language model side of things?
||A surreal, dreamlike portrait of a brunette, with a mesmerizing, infinite zoom effect, where a circular section of her face is magnified, revealing the intricate texture of her skin, with tiny, industrious construction workers, no larger than a grain of rice, busily at work, filling the pores,||
That's a neat prompt
The effect is 10+, but for prompt understanding, well, i had something else in mind 😂
Supposedly the term is "inset map" but neither sd3 nor the local stuff seems to understand that
inset map, makes sense , obviously still too much off the beaten path
In a dreamlike, 8K photo, a whimsical, furry, futuristic white cyber owl with purple streaks in its fur, sits in a mystical Valdivian forest, surrounded by bioluminescent foliage, with a tiny garden gnome stepping into a mushroom house. The aurora borealis swirls above, casting an ethereal glow on a secret arctic vault, where a cute, velvet-skinned goblin with whiskers poses.
A stunning, ultra-realistic portrait of a black Barbie doll stands in a secret arctic vault, surrounded by towering ice mountains, and a kaleidoscope of colors. In the background, a cute, velvet-skinned goblin with whiskers poses, surrounded by clockwork machinery and glowing orbs.
In a surreal, 8K photo, a futuristic, cyberpunk cityscape unfolds, with a vibrant, candy village, where a furry, futuristic white cyber dog rides a dragon-zebra chimera. A woman barbarian rides a majestic, agitated dragon-zebra chimera, through a dense, mystical forest of Schwarzwald.
A mesmerizing, 8K portrait of a brunette, with a mesmerizing, infinite zoom effect, reveals the intricate, labyrinthine texture of her skin, where tiny, industrious construction workers, no larger than a grain of rice, busily toil within the pores, building minute skyscrapers and suspension bridges. In the background, a surreal, fantasy cityscape unfolds, with towering ice mountains, and a vibrant, candy village, where a furry, futuristic white cyber dog with purple streaks in its fur, wearing a dogtag saying "soon", rides a dragon-zebra chimera through the streets.
A stunning, ultra-realistic portrait of a black Barbie doll, dressed in intricately detailed clothes and jewelry, stands in a secret arctic vault, surrounded by towering ice mountains, and a kaleidoscope of colors, swirling with abstract, Dalí-esque patterns. In the background, a cute, velvet-skinned goblin with whiskers poses, surrounded by clockwork machinery and glowing, iridescent orbs, as the aurora borealis swirls above.
In a surreal, 8K photo, a vibrant, fantasy confectionery wonderland unfolds, with a whimsical, furry, futuristic white cyber owl perched on a mushroom house, surrounded by biscuits, and a flowing chocolate river. A woman barbarian rides a majestic, agitated dragon-zebra chimera, through a dense, mystical forest.
chat?
@low stone
@faint breach
4.2B model
97s inference time tho
but still, 3GB vram requirement for 4.2B
pretty neat. always great to see new models that can run on cheap hardware
especially since stability is going to release sd3 and be done with image models
we need to see how this affects bigger resolutions and accurately captioned datasets (prompt adherence)
currently its just 256x256
nope, they haven't revealed any technical things, not even images. which honestly i kinda like...they said it's about 30% done, so i'm guessing 2 more weeks or months who knows lool . i just wish these companies would just keep things quiet and drop things when it's actually finished. this whole waiting thing is beginning to get tedious and a bit annoying. you're not selling a movie or something geez.
ehh what? they packing up after sd3?
before emad quit he said sd3 would be the last image model they ever made
That's entirely fine though. It's sad to see Stability stop in this field, but it's fine. There is enough research and resources available that others can more easily keep going.
but why would they stop
i think their strength is in image models
theres more money in LLMs
sure but the open models are so good and you compete with the gpus and engineers and money of meta
but for open image models there is almost no competition
how? lol. Meta's business is advertising. They give away all their models for free. How do you compete with that as an exclusively model building company?
it's not a popularity contest. Stability is an actual business that needs revenue
Oh I want that kind of gen
is Meta gonna start releasing their own image gen models?
that's what i am saying, you can't compete with them. that's why doing image models is smarter, because meta does not release them
*Congratulations you've been added to the Stable Diffusion 3 early preview waitlist!
You'll be notified by email with an invite to our Discord server when you've been granted access to the preview. *
nothing since that 😕
Jeez that is the most ridiculous prompt
My checkpoints folder just exploded.
is there a way to try that in comfy?
they just got hunyuan going in comfy the other day, so this would also probably take a bit.
nice, i like to try new toys 🙂
me who only have 2GB VRAM :
Maybe the cat with 4GB of Vram can help you instead.
Once again I have to remind that there is no Pony team, it's just me.
I would be happy to train on whatever (which has the right license) but it's a tricky question. Good data can make meh architecture shine but will still be inferior to good model with good data, hence I am still optimistic about SD3 option. If it does not happen for whatever reason and I need to find back up - honestly another XL version with improved data is probably fine.
Is that loaded in int4?
1.1gb is quite small for 4.2b in fp8
Hey astralite, I've got a question:
When you train (or fine-tune) pony, what kind of hardware/gpus do you train on?
Like what's the vram requirements
v6 was trained on 3x a100 with 80GB VRAM for 3 months
Does the vram usage scale up as more images are dumped into the dataset?
Not the memory, you need model weights + (batch size * per image vram) so more VRAM means more images per iteration (but it also does not scale linearly)
Generally you just need enough to fit weights and 8+ batch
So 40GB is most likely enough for XL finetuning
I remember 3090/4090s not having enough to use Adam (which you want to use)
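The rule of thumb above (model weights plus batch size times per-image VRAM) can be sketched as a quick calculation. This treats the relationship as linear, which the message above says is only approximately true, and every number below is a hypothetical placeholder rather than a measured figure.

```python
import math


def max_batch_size(gpu_vram_gb: float, fixed_gb: float,
                   per_image_gb: float) -> int:
    """Largest batch that fits, using vram ~= fixed + batch * per_image."""
    free_gb = gpu_vram_gb - fixed_gb
    if free_gb <= 0:
        return 0
    return math.floor(free_gb / per_image_gb)


# Hypothetical figures purely for illustration: fixed_gb stands in for
# weights, gradients and optimizer state; per_image_gb for activations.
batch_on_40gb = max_batch_size(gpu_vram_gb=40, fixed_gb=24, per_image_gb=2)
batch_on_24gb = max_batch_size(gpu_vram_gb=24, fixed_gb=24, per_image_gb=2)
```

With these made-up inputs the 40GB card fits a batch of 8 and the 24GB card fits none, which matches the general shape of the argument: the fixed cost decides whether you can train at all, and the leftover decides the batch size.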
Would you not consider using Cascade?
All of the researchers are gone. There is nobody left to develop a new model.
and Stability really needs money
impressive
It's non commercial. So, no.
gm
Pony,or Cascade?
You're still allowed to fine tune it 😃
Please 🙏
I can fine tune it but can't use it commercially, making Pony is super expensive and running inference for it also costs money, I am already operating at a loss so making it even worse is not a great option.
The reality of our enjoyment, a shame 😞
Worth trying to raise funds from the community? I suppose it all hinges on the SD3 release anyway.
I would prefer to figure out functional economy rather than using some one time large events like kickstarter
iirc redmondai sponsored a cascade finetune once upon a time, maybe they'd be up for another
oh boy
that's like $15k
would it be more expensive if you rented 9xA100 for a month?
or 270xA100 for a day🤔
we actually bought the hardware so a bit more expensive
Oh I misread this, I see that they said they have 3 a100s and it took 3 months.~~ Without accounting for power supply efficiency, A100s pull around 300w at max load plus some minor CPU and other peripheral usage. So let's round up and say 3000w for the 9, you now have to offset two space heater's worth of heat with HVAC for a whole month(~10k btu which is a large 120v window unit that would pretty much have to run nonstop).~~ Meh it's too early in the morning to pencil all this shit out, but these are some of the things to consider lol. Basically, ponyboi would have a massive power bill, but you'd have to do the math to see which would be cheaper. Him buying all the equipment+his power bill vs renting a server to do it in a much shorter amount of time. Plus, he makes multiple models with the hardware he invested in.
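For anyone who does want to pencil it out, here is a small sketch of the arithmetic, using the figures from the messages above (about 300 W per A100, rounded up to 3000 W for nine cards). It assumes a 30-day month and perfect scaling when trading GPU count for time, and the helper names are made up.

```python
BTU_PER_HOUR_PER_WATT = 3.412142  # 1 W is about 3.412 BTU/h


def heat_btu_per_hour(watts: float) -> float:
    """Convert electrical draw to heat output in BTU/h."""
    return watts * BTU_PER_HOUR_PER_WATT


def energy_kwh(watts: float, days: float) -> float:
    """Energy used when drawing `watts` continuously for `days`."""
    return watts / 1000 * 24 * days


def gpu_days(num_gpus: int, days: float) -> float:
    """Total GPU time, useful for comparing rental setups."""
    return num_gpus * days


heat = heat_btu_per_hour(3000)    # 9 GPUs rounded up to 3000 W
month_kwh = energy_kwh(3000, 30)  # one 30-day month at full load
# 3 GPUs for ~90 days is the same GPU time as 9 for 30 or 270 for 1.
same_work = gpu_days(3, 90) == gpu_days(9, 30) == gpu_days(270, 1)
```

The heat figure comes out at roughly 10k BTU/h, in line with the estimate above. Multiplying `month_kwh` by a local electricity rate gives the power side of the buy vs rent comparison.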
Oh and also, a lot of data centers have rules against using them for nsfw stuff
sigma 2k - superhands....
Just use a hand detailer 🤣
I think it wouldn't work, it will detect potatoes instead of hands
@cunning lintel since you were posting complicated prompts yesterday. (Some sd3 pics, some not) / people traveling through a tube network over a futuristic city. Robot Santa. Man is friend with a robot. One eyed women. Doctors with Cthulhu heads. Scientists with space ship shipping businesses.
Another: large throne room where a scientist with Cthulhu head sits on his throne made of lightsabers. Frantic and panicking crowds beneath him as he decrees his next royal orders to begin the end.
Hi, a quick question. When you say 3 months of training, is that with the final dataset, or iterating by refining the dataset between training runs? Sorry if that's a stupid question.
yeah, mixed some prompts together, poor sd3 trying to make sense of it :p
@cunning lintel hunyuan at 2.40:1 aspect ratio
wide
i looked, looked again, but well, of course cthulhu has a third leg with all those tentacles!
even gods need sensible footwear
I assume there is no infrastructure made for this yet? So we may need to wait for an extension/nodes and such in order to make it usable?
no hand detailer can fix that 😄
one could argue that if finetuning cascade sends a ton of donations to their patreon or other service, they've commercialized it and are liable af. Anyone trying to earn from their AI work won't touch it
non commercial research only license really kills a model. also shutting down the official channel for it does too
Yep...Gits! 🤣
How about CosXL as a backup plan? It's supposed to be an update for base SDXL.
Non commercial
... Why the heck are they doing that... sighs
same with SD3, but with SD3 you can buy a license
I forgot if you can also do that to cosxl
Apparently not.
It's a research experiment, much like cascade. They may eventually decide to do some kind of spin-offs with them down the line or sell them
i think it's bizarre that a company with a cash crisis has so many models without a commercial license
you'd think you'd at least leave the door open to a conversation
huh, 2 weeks left?
ah, reminds me of Nintendo's announcement announcement announcement.
so, which dataset is SD3 based on?
my friend's friend who was a stability employee 6.53 years ago heard from my uncle that he heard from emad that it will come out tomorrow (this is legit)
no idea, but I bet its partially laion, trained on 512px first, then on 1024px for 8B and later on the smaller models
the dataset was captioned 50/50 by CogVLM (detailed accurate prompts) and the raw captions
they truncated the prompt length to like 72 or whatever because of clip and I don't know if this is real (I heard from a random discord user who heard from a random discord user who heard from a twitter user who was claiming to be a stability employee but it was actually my uncle all along), but they might ditch clip and use T5 only and continue training with non-truncated prompts or whatever idk what was really told, I bet I'm wrong, it could just be clip being ditched and that already heavily improved the text adherence and the prompts were never truncated or something I don't know, it doesn't even matter how hard you try you will never know the truth cause we are never given it.
sounds like a lot of room for error 😛
a lot of room for misinformation spread around as fact because stability tries their best not to inform us, so we make up random shit constantly and get proven wrong
I need to read the paper again
random screenshot go!!!! (this must mean something idk)
i'm not an astrophysicist
its simple rocket science, what do you not understand 
T5 has 512 context length for sure
but I don't know if the cogvlm prompts were actually shortened to 77 tokens or not
and if they were, does it sabotage the prompt adherence and make the T5 context length less important as it was never trained on prompts longer than 77

that snippet shows they only used 75 tokens for the "c" vector (t5 embedding)
like I'm not gonna use up 512 anytime soon, but like idk ~200 would have been a little more useful or something (longclip has 248)
the issue is the vram required for cross attention goes up substantially as you increase resolution, embedding size, or both
there are several hacks out there that try to deal with it, like localized or sparse attention, or the chunking of the token blocks, they have drawbacks though
chunking seems to be the most popular, which is likely what that is
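To put rough numbers on the VRAM point above, here is a minimal sketch that counts the attention scores between image tokens and text tokens. The 8x VAE downsampling and 2x2 latent patches are assumptions made for illustration, and `imageTokens` and `crossAttentionScores` are hypothetical helper names, not anything from a real library.

```javascript
// Rough count of attention scores between image tokens and text tokens.
// Assumptions (not from the chat): 8x VAE downsampling and 2x2 latent patches.
function imageTokens(width, height, vaeFactor = 8, patch = 2) {
  const latentW = Math.floor(width / vaeFactor);
  const latentH = Math.floor(height / vaeFactor);
  return Math.floor(latentW / patch) * Math.floor(latentH / patch);
}

// One score per (image token, text token) pair, per attention head.
function crossAttentionScores(width, height, textTokens) {
  return imageTokens(width, height) * textTokens;
}

const short512 = crossAttentionScores(512, 512, 77);
const long1024 = crossAttentionScores(1024, 1024, 512);
console.log(short512, long1024, (long1024 / short512).toFixed(1));
```

Under these assumptions, going from 512px with 77 text tokens to 1024px with 512 text tokens multiplies the score count by roughly 27 for a naive implementation, which is the kind of growth the chunking and sparse attention hacks try to avoid.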
Afaik T5 gets the short end of the stick bc they are also still using Clip G & L.
nothing changes the context length of T5 obviously, but idk how the shortened prompts in the dataset affect it
And they are testing (apparently) if they can use T5 only instead.
I would destroy clip-L and never let it touch SD3 again and then decrease the strength of clip-G so that its used for styling, just in case T5 makes everything too photoreal
Btw T5XXL is not CogVLM, but good for captioning.
Well, tags still have their purpose. I like to combine tags and semi-natural language.
I would use tags for styling if it helps
otherwise I would just use T5 because of natural prompting
splits the prompt into 75/77 long segments
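A minimal sketch of that chunking idea, assuming the prompt is already tokenized: split into 75-token bodies and wrap each in start/end markers so every segment is 77 long. The ids 49406 and 49407 are assumed marker values used for illustration, and `chunkPrompt` is a hypothetical helper, not any UI's actual implementation.

```javascript
const BOS = 49406; // assumed start-of-text id
const EOS = 49407; // assumed end-of-text id, also used as padding here

// Split token ids into 75-token bodies, then wrap each as [BOS, ...body, ...pad, EOS].
function chunkPrompt(tokenIds, chunkSize = 75) {
  const chunks = [];
  for (let i = 0; i < tokenIds.length; i += chunkSize) {
    const body = tokenIds.slice(i, i + chunkSize);
    const padding = new Array(chunkSize - body.length).fill(EOS);
    chunks.push([BOS, ...body, ...padding, EOS]);
  }
  // An empty prompt still produces one fully padded segment.
  if (chunks.length === 0) {
    chunks.push([BOS, ...new Array(chunkSize).fill(EOS), EOS]);
  }
  return chunks;
}

// 160 fake token ids stand in for a long tokenized prompt.
const fakeTokens = Array.from({ length: 160 }, (_, i) => i + 1);
const segments = chunkPrompt(fakeTokens);
console.log(segments.length, segments.map((s) => s.length));
```

Each segment would then be encoded separately and the embeddings concatenated, which is why a concept split across a segment boundary can lose context.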
T5 has a very long embedding dimension, I imagine that's why it was used, there's more data there
4096 vs 768 or 1024 or 1280 or whatever of common clip models
but its not a VLM and wasn't trained on any cross entropy loss with a VIT or anything, its just an encoder/decoder model, like something you'd use for language translation
could've just as easily used Llama3 or something else
isn't that decoder only? or is that not an issue?
I saw lavi-bridge, which could use decoder only models such as llama2
I guess i'd have to noodle on the impact of using a decoder only network, but you can get the features from whatever model I suppose and use that
I think a lot of the vlms are just ViT tacked onto (often decoder only) llms with adapters
decoder-only has the disadvantage that the information flow moves to the last tokens
in encoder architectures every token gets context information from any other token
in decoder only models you have a causal mask, so every token only gets information from the past tokens
so "a cat with blue fur" in t5 would have the information about blue fur in the cat token
yeah makes sense
in llama3 in contrast the token "cat" has no further information while the token "fur" contains this information
right due to causal mask
I would imagine that this makes the cross attention more difficult because the last token contains all the information instead of having all tokens equally
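The masking difference described above can be sketched with plain boolean masks. This is a toy illustration that treats each word as one token, with hypothetical helper names; it shows only which tokens may attend to which, not a real model.

```javascript
// mask[i][j] === true means token i may attend to token j.
function attentionMask(n, causal) {
  return Array.from({ length: n }, (_, i) =>
    Array.from({ length: n }, (_, j) => (causal ? j <= i : true))
  );
}

// List the tokens that `word` is allowed to see under a given mask.
function visibleTokens(tokens, mask, word) {
  const i = tokens.indexOf(word);
  return tokens.filter((_, j) => mask[i][j]);
}

const tokens = ["a", "cat", "with", "blue", "fur"];
const encoderMask = attentionMask(tokens.length, false); // bidirectional
const decoderMask = attentionMask(tokens.length, true); // causal

console.log(visibleTokens(tokens, encoderMask, "cat")); // sees the whole prompt
console.log(visibleTokens(tokens, decoderMask, "cat")); // sees only "a" and "cat"
console.log(visibleTokens(tokens, decoderMask, "fur")); // sees the whole prompt
```

With the causal mask, "cat" never sees "blue fur", while the last token sees everything, which matches the point about information piling up at the end of the sequence.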
zoom in for more detail
doesn't mean it wouldn't be possible with llama3, but I guess that's why they prefer encoder architectures like t5
yeah just as is would probably not be as efficient
I think the vlms are using full self attention on the image tokens prior to the attachment to the llm part
I wonder if in the future we'll get large parameter (lets say, 12B or larger) ternary diffusion transformer models
idk which companies would be willing to test it further
at smaller parameter sizes, it has a massive FID/quality penalty, but it starts to climb back up the larger the parameter size is, whilst retaining low VRAM requirements
only problem, the inference time is terrible at larger parameter sizes
but damn, the small checkpoint size and only 3GB vram required for a 4.2B model
How is that possible? Without Clip?
just uses the embeddings from the text model, its a different embedding space but in theory still has contextual meaning
is there any current way to try TerDit in comfy?
What's best way to use SD3 on a phone? Can use via api with Comfy etc but when I'm on the go what's the best solution currently?
Ideally outside of discord and not the stability assistant because it's trash
I'd love to try out that large-dit that they mention in that article. 20 gig image model... we just need to find a hugging face demo of it.
SD3 Open Source Weights, when ?
2 weeks
Guys SD3 release in May?
yep, only a year to go
🙂 we had this with 2.0 already 🙂 month was correct, just not the year 😉
tbh would just be nice to see some communication eugh
you can visit Civitai Discord server for communication though
here is mostly for art sharing or super-technical discussion
hyper detailed, photorealistic, myriad witnesses, frantic crowds panicking, surreal aerated landscape, inter dimensional planetary robotic networking
Hi, what's the difference in quality between SD3 and SD3 Turbo?
Is that new? what could this mean?
https://www.linkedin.com/posts/stability-ai_build-microsoftstabilityai-aipartnerships-activity-7199080680226516995-6zPM
Any tests we'd run now aren't on the final model versions so we'd have to wait for release to know that
Azure has these kinds of models like mistral and OpenAI available in azure as a resource you can provision in your resource group. Looks like stabilities stuff will be available too.
oh thanks for explanation
Hmm and based on what you have seen until now? I'm wondering if it's worth the effort to set up the API to test SD3 Turbo (as it is now).
The last I heard, the Turbo version is worse and not worth wasting credits on.
thanks for this
Yeah I haven't tested it. I've only used the regular sd3 on the api. The main sd3 model is so fast via that I never think about speed and want the turbo. Obviously that could change locally.
Emad is the hero we didn't know we needed and we didn't ask for.
hope this will generate a good amount of income for stability
if SD3 comes out, they'll make a super finetuned version of SD3 or SD3 Turbo, like with Core (sdxl turbo) and it will make it a competitive choice
especially if they optimize it for stuff like tensorRT, it will decrease the price of the credits
nice jokes you have there
its really funny
Agreed. They were mentioning overfitting which I think I'm seeing in the results of the current api. Ultra stylized output from a base model is less than ideal.
why dont they just put a commercial license on all of this -IF you make money . and for personal use free... why not?
sdxl sd15
all of it
ask musk for money
hmm
a few billion is nothing for him and this is up his alley
if you say so
he supports this sort of thing
that would be good
well musk invested in openai or whatever
SD3 being commercial now is the most logical, as they have opted out a lot of artists
so to me it feels less morally incorrect
but I would still not sell ai art tbh
but he already has an ai company,why would he buy a company thats full of debt
They do actually give a seed as an option. I should try that and see how it goes.
Hello papa musk, it's me Emad from stability AI. We hear open AI joined the evil empire and backstabbed you but if you are still into freeing AI for the masses we are doing the same and we won't backstab you because we are righteous dudes. So we need money coz we're nice. We have a long track record of putting out free stuff and we are committed to the cause. Drop me ugh us a call and we talk.
Thats all it takes a tweet
whatshisface sad billionaire didn't think they'd buy minecraft when he tweeted about being fed up with this world
but they did
Coz he is an eccentric billionaire.
he doesnt have to buy it, just help out or buy a share in the company... whatever rich people do
SAI gonna be like that hobo on San Francisco asking for spare change,he swears on his mama life he gonna find a job in 2 weeks
Sd3 turbo is cheaper, according to the pricing page, that's why I'm considering it
no because SAI has really a good track record.
they did amazing things
Emad is no hobo!
true,he's been clean for years i swear!
CIVITAI, SAI, ComfyUI etc, incredible what this community did and it's for free. It's better than paid products! Could you imagine what people could do if money wasn't a hindrance. If everyone could pour ALL their time and effort into what matters to them and their calling.
mind boggling potential
Maybe these help...
#🆕|sd3 message
#🆕|sd3 message
#🆕|sd3 message
Earlier posts said how bad it was and not to touch it, so maybe it was improved after then 🤷🏻♂️
Whatever is going on with AI it's better than getting involved with anything crypto related... 🤮
But it is true based on my experience - NOTHING beats a good model.
No inpainting, no face detailing, no loras, no perturbed attention guidance.
well one thing is certain,bitcoin cash lasted more than SAI
SD3@ClipDrop - prompt = Vibrant colours, Bold Brush Strokes, Strong Symbolic Imagery.
Deeply Personal, Reflective of Emotional and Physical Struggles.
Mexican Culture, Folklore, Surrealism.
Highly Emotional Depictions of Pain, Suffering, and the Human condition.
Symbolism of The Monkey and the Humming Bird, Symbols of Hope and Duality
This came from a question to ChatGPT4: extemporise the qualities of the art of Frida Kahlo.
Well buttcoin is the only one that lasted out of what thousands of meme and shitcoins... not exactly stellar number.
SD3@ClipDrop - prompt = photorealistic assassin’s creed cybernetic male assassin in an ivory
electrical-rococo elaborate robe by nexro xiii, light and mysterious, in
superhero pose, light and bright, mysterious, magnificent and cybernetic royal, warrior like, light and mysterious immense details, HD, cinematic lighting, cinematic, epic, photoreal by Riccardo Federici, Frank Frazettaby Bill Sienkiewicz and donato giancola and anders zorn, cinematic, dramatic lighting, rembrandt light
and it doesn't work. sure, you get paid in crypto, ok cool. well, as soon as you need to buy stuff in YOUR location, anything like a house, a car, whatever, your government and bank will find out, because you have to convert your crypto to real currency, and the tax man comes
SD3@ClipDrop prompt = Art Nouveau style, face by Anna Dittmann, snake eyes, snake young, large illuminati symbol in the boarder, Celtic knot with pine tree and pine cones, perfect eyes, A painting of a norwegian woman with flowers on her head, botanical art by Pierre-Joseph Redouté, vivid, blond hair, 1920s short dress, trending on deviantart, pop surrealism, detail
those images are so cool i feel like burning my money rn
SD3@ClipDrop prompt = antique damaged portrait war poster, devil,portrait, by Albert Bierstadt, by Andy Warhol, by Annibale Carracci, by Caravaggio Michelangelo Merisi, by Takashi Murakami, Spray Paint, Halfrear Lighting, Soft Lighting, Linen, Posterization
SD3@ClipDrop prompt = a realistic beautiful autumn queen, headshot, close up, night time, autumnal mood, venice carnival, grand guignol, wavy hairstyle, white hair, character concept art, created by victo ngai henri rousseau vladimir kush coles philips elizabeth catlett arief putra john currin alenka sottler itzchak tarkay anita inverarity maxfield parrish peregrine heathcoate tamara de lempicka mads berg isaac maimon iwona lifsches non binary heart connection/detailed modern art style 8k
They have excellent, excellent eyes and faces ... so far so good!
best eyes i have seen,these folks i tell you,HUGE and BEST hands,they are very very great like our country
hello
so I just tried their sd3-turbo model on their api. doesn't work. returns a 404 not found. sd3 works fine.
oh... nevermind, their url is the same i guess, i just have to pass that model in json
for communication from stability ai concerning their model?
SD3@ClipDrop prompt = masterpiece,best quality,fine_art_parody,realistic,real,solo,multiple_girls,alternate hair length,wet hair,tears,tsurime,white colored eyelashes,looking at viewer,red eyes,narrowed eyes,large breasts,crop top,gothic_lolita,tabi,cross-laced_footwear,demon horns,half middle_finger,smoking,
here are 6 images from sd3-turbo instead of sd3. i know this is still the old version of the model, but the quality is WAY muddier. I'd never use this unless I was generating icons or thumbnails or some kind of very clean render artwork. anything stylized just is way too messy.
@honest cedar sd3 turbo is in another league, but not in a good way
each time same prompt 2x sd3, 2x sd3 turbo
maybe those were a bit unfair, one more this time more suited for sd3 turbo's looks, it's usable for this kinda prompt (cartoon illustration of a woman in a hat holding a gun, digital art, fantasy art, steampunk, redhead, weird west, portrait of lady mechanika, cowgirl )
A powerful agent, her eyes aglow with an unholy power, stands atop a ruined, gothic spire, as a stormy, apocalyptic landscape unfolds behind her, in the styles of Michael Garmash, Guy Denning, and Olive Cotton
Neg: boring, tranquil, wrong, low quality, photo
A resourceful operative, her eyes in a determined gaze, infiltrates a secret society's masquerade ball, surrounded by masked figures and candelabras, in the styles of Michael Garmash, Guy Denning, and Olive Cotton.
Neg: boring, tranquil, wrong, low quality, photo
A haunting portrait of a weary agent, her face deathly pale, surrounded by ritualistic symbols and forbidden knowledge, as candles flicker with an otherworldly energy, in the styles of Michael Garmash, Guy Denning, and Olive Cotton.
Neg: boring, tranquil, wrong, low quality, photo
A photorealistic portrait of a 20-year-old South Korean girl radiates beauty with her long, flowing black hair, mesmerizing brown eyes, and captivating smile. She stands at 166 cm tall, with fair skin and a slim, D-cup figure reminiscent of Blackpink's Lisa. Dressed in a white shirt and deep blue jeans, she exudes elegance and charm. The portrait should be a full-body shot, 8k HDR, with high detailed features and a natural, approachable expression, illuminated by soft, golden-hour sunlight.
Yeah, I made the content simpler and sd3 turbo definitely did better with it. Prompt: in the style of anime, chibi cute samurai playing video games at a Tokyo game shop, minimalistic
I expect SD3 Turbo to look more stylistic and have less variety
sheep standing on hind legs whereing a gas mask looks up in the sky away from a cell phone
A sheep standing on its hind legs, wearing a gas mask, looking up in the sky, and away from a cell phone (no cell phone, sorry!)
Any update news for sd3 open ?
A striking, azure Lamborghini, sleek and aerodynamic, thunders down a sun-kissed coastal road, its engine roar blending with the crashing waves and salty sea breeze. Majestic seagulls soar overhead, adding a dynamic element to the scene's exhilarating motion.
Sure. 2 weeks 😂
another 2 weeks from today
?

sd3 2weeks edition
A dimly lit alleyway in Mumbai, with shadows looming ominously in the background, setting the tone for the dark and gritty atmosphere of the film --ar 16:9
futuristic headphone advertising
where can i generate images
Two weeks? Maybe one?
who cares, I don't even know if I can run this on my machine. I am scared. What if it's 25 gigs for the model and it takes 8 minutes to generate one pic
they explicitly said that it can run on a 24 gig rtx 4090 without memory overflow and a 4090 can generate one 1024×1024 image in 30 secs
I dont remember step count but it was about 30-45 iirc
And that's (probably) with T5
well 24 gigs vram is a lot

theres a cat here with 4 (send help) what is he gonna do?
5090 with 32 gig soon
800 million model Can it produce good images?
Do you know how to prompt image here
Btw guys with TagGui you can already pretty easily play around with captioning images with T5-Xxl and it's not bad.
uhmmm does anyone know when with SD3 model be available on hugging face
2 more weeks (until it's released)
When SD4 API will be released.
Only then we will get SD3 checkpoints.
Until then it will be "two more weeks" gaslighting like Emad did a month ago.
Impossible. I'm sure their API's are used by the corporations that guarantee majority of the costs.
They have limited time until Google's imagen 3 and GPT-4o's image generation roll out
Those two have all the things SD3 promises
There are no planned updates of Dall-E 3 in the works that I have heard. And frankly I find Copilot's implementation less of a pain than ChatGPT's, since the latter has all manner of censorship that Copilot does not.
If you look at the paper on SD3, they consider the biggest rival to be Ideogram
Plot twist: It'll be unstable
They show some examples under "explorations of capabilities" title
New model has near-perfect text generation
those images in these examples are definitely cherrypicked
but it is still impressive
I don't think they call it Dall-E 4
only problem is that it hasn't started to roll out yet
not bad
how did you generate with it
they've definitely rolled it out
Well, SORA can also generate images. Paradoxically, to advance DALL-E, they just need to remove the filters 😂
Maybe it is for a future rollout. I am a subscriber, have GPT4o, and can tell you it is no different than GPT4 for image generation that I can tell. Not even for text.
if we take the fact that they generated an enormous number of images just to cherrypick the best one to show it as an example on the website into account, they probably have rolled it out
For text accuracy it is miles behind Ideogram
I've tried it few times

I dont know if it's just me but it looked like too cartoon-ish
I am not here to wax poetic on ideogram on all things, since they all have their weaknesses and strengths. Ideogram as well, but if you want text, Ideogram is king.
As to looking cartoonish, the text, it is a matter of knowing how to engineer the prompt. This is still a factor today.
thankfully SD3 is good enough for Movie Titles
lol
true
it's missing the credits at the bottom
There are things for example where Midjourney can do things none of them can yet. But SD traditionally can compete soon enough once you get specialized Loras, so my comment is on the vanilla experience
SD3 overall is super exciting, don't get me wrong. I'm just underwhelmed for now at the cost. $19 for 300 images in a month? RLY? I can imagine spending the 9 bucks for a test run of 130 odd images, but never for a regular experience.
You can get 60 fast for free with Copilot (Dall-E 3) per day, and more if you can wait a bit, and same for Ideogram
they need to optimize it first for tensorRT and stuff, and you have to consider that they need the money
but yeah its very expensive for what they offer
They can need the money and will have me nodding my head in sympathy, but that doesn't mean the end user/consumer is going to opt for paying more for less
I guess SAI can still lean on the stable diffusion brand and its promise of (local) generation with superior tooling. The main reason I'm interested in SD3 is the promise of much better tooling combined with state of the art generation. It's why i'm now paying a little for playing with SD3, curiosity to see how well it performs. But if the tooling (control nets, inpainting (read on twitter there was no such thing as SD3 inpainting yet, ouch), style transfer, regional prompting, customizable guidance, weighted/mixed prompts) turns out to not be there, it'll be more and more waning interest in SD3 for me. If it's just texttoimage, might as well use something that seems more capable.
But I'm no pro or heavy user, whether I pay or not, is of no consequence to SAI, they should cater to heavy professional use, but I'm afraid for those the tooling story is much the same if all you need is texttoimage stock-footage en masse, there's plenty other options and at the current price point SD3 is not competitive at all.
can you generate other stuff, maybe you really have a new model
I only know of controlnet and fine tuning that were promised
and even then, we don't know which
openpose, canny and depth would be more than enough
"A first person view of a robot typewriting the following journal entries:
- yo, so like, i can see now?? caught the sunrise and it was insane, colors everywhere. kinda makes you wonder, like, what even is reality?
the text is large, legible and clear. the robot's hands type on the typewriter."
ideogram seems worse for this example
Two weeks! 
SD3@Clipdrop.co - $10/month for 10 prompts/24hrs (4 pictures each prompt) = 1,200 pictures/month for $10
I mean you can technically open up a new account with a temporary email and generate 3 sd3 images for free (dont do that pls sai needs money)
1200 images/month much much better than ArtySan's 150!
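For a quick per-image comparison of the plans quoted in this thread, here is a small arithmetic sketch. It assumes a 30-day month for the ClipDrop figure and uses the "$19 for 300 images" number mentioned earlier; `costPerImage` is just an illustrative helper.

```javascript
// Price per image for a flat monthly plan.
function costPerImage(pricePerMonth, imagesPerMonth) {
  return pricePerMonth / imagesPerMonth;
}

// ClipDrop figures quoted above: 10 prompts/day, 4 images each, assuming 30 days.
const clipdropImages = 10 * 4 * 30;
const clipdrop = costPerImage(10, clipdropImages);

// The other plan quoted earlier in the thread: $19 for 300 images.
const otherPlan = costPerImage(19, 300);

console.log(clipdropImages); // 1200
console.log(clipdrop.toFixed(4), otherPlan.toFixed(4)); // 0.0083 0.0633
```

On these numbers the ClipDrop plan works out to under a cent per image versus roughly six cents, assuming you actually use every daily prompt.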
I dreamt I was in the place of Fry from futurama, and when I asked Bender if SD3 was finally out, he replied “two more weeks”.
"If you know how long is a piece of string?!" then that will = the weight for the waits.
Remember February when everyone was like, "Don't need Stable Cascade since we're getting SD3 in two weeks"... and then proceeded to waste half a year completely ignoring the model that can literally generate 4K images in 30 seconds because SD3 would be better.
My desktop background is still made by SC, because I only update it when we get the weights for a new model. (And no, CosXL and Hunyuan DiT don't count at all for subtle and ethereal reasons.)
Indeed, Stable Cascade has great potential, but has been overlooked despite its strength
Maybe once SD3 releases and people realize how much RAM it costs to get large renders out of it, SC will suddenly look more interesting. I could just be wishing though...
I doubt that. Because the 2 billion model is as powerful as the SDXL, and there is also the 800 million model, which is smaller in size than the SD1.5 and is most likely much better than it.
Maybe if sd3 isn't released they might look around for sc
Hmm. Well, on the other hand, if it turns out to be better than Stable Cascade, I will have zero complaints. Just at the moment, I feel very frustrated that open-source AI has basically been dragging its feet because no one wants to work on anything that will be obsolete when SD3 releases.
I agree with you . But perhaps it was planned to delay SD3 from the beginning, as it does not make sense that they released SC first, and most likely it was for a promotional purpose.
Looks like we'll be stuck with the huge SDXL model
I'm not sure what that means, but I get the sense SAI has lost control of a lot of things since about the end of last year.
What's that? Just normal SDXL?
?
What's the "huge SDXL model" we're stuck with? Do you just mean to say that SDXL is huge compared to 1.5? Or did someone do a mergekit / mixture-of-experts or something that I don't know about?
SDXL has 6 billion parameters so it consumes a lot of resources
Oh you just meant compared to 1.5.
yes
But imagine if SD3 was released in the form of 2 billion and it is as powerful as SDXL. Life will be easier
What size is 1.5?
There is an 8 billion model, but I think 90 percent or more will not be able to operate it or will find it not worth the effort.
I think 1 billion or 1.5 billion. I forgot, but in this range
But the parameters are not everything, but also the structure of the model and the text encoder play the most important role
VAE I don't know but it is only a decoder and plays a very minor role
VAE is where the VRAM crashes always happen. 4K unet is just fine.
Yes, I agree, but there is a tiling method, but it takes longer
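A rough sketch of how tiling can be laid out: cut the image into overlapping boxes and decode each box separately, so peak memory depends on the tile size instead of the full image. The 512px tile and 64px overlap are illustrative values, and `tileStarts` and `tileBoxes` are hypothetical helpers, not the code of any particular UI.

```javascript
// Start offsets along one axis so tiles overlap and the last one ends at the edge.
// Assumes overlap < tile.
function tileStarts(length, tile, overlap) {
  if (length <= tile) return [0];
  const stride = tile - overlap;
  const starts = [];
  for (let pos = 0; pos + tile < length; pos += stride) starts.push(pos);
  starts.push(length - tile); // last tile sits flush with the edge
  return starts;
}

// All tile boxes for a width x height image.
function tileBoxes(width, height, tile = 512, overlap = 64) {
  const boxes = [];
  for (const y of tileStarts(height, tile, overlap)) {
    for (const x of tileStarts(width, tile, overlap)) {
      boxes.push({ x, y, w: Math.min(tile, width), h: Math.min(tile, height) });
    }
  }
  return boxes;
}

console.log(tileBoxes(1024, 1024).length); // 9 tiles
console.log(tileBoxes(4096, 4096).length); // 81 tiles
```

The tile count is also why it takes longer: a 4K image becomes dozens of separate decode passes, and the overlapping strips get decoded more than once before being blended.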
SDXL is supposed to be 3.6B, excluding the text encoders obviously
My guess is that a lot of people will use the smallest SD3, just as most people are still using SD1.5 instead of XL
A model the size of SD1.5 with the power of SDXL would be a huge step-up. Wouldn't be really useful until all the controlnets etc were trained though.
I bet that will only take 2-3 months.
After SD3 releases in 2 weeks.
Devs said that 800M SD3 will be more powerful than base 1.5 (despite 1.5 having slightly more than 800M parameters), that's good enough already to play around with
I'm working on a painting app / trying to use AI to do useful work, so I'm really not interested unless it can do useful stuff. For now, that 100% requires multiple controlnets.
https://github.com/QuintessentialForms/ParrotLUX
Either way you'd need to wait for the controlnets, but having the smallest SD3 to work with should anyway cost you less electricity and power... which sounds convenient
so any idea about sd3 release eta or something? I just joined the channel and am looking for some good news.
Currently, everything is unknown to my knowledge
Two weeks from [insert current day here].
? Did they announce anything?
I think so
same with 2B beating SDXL, despite it being smaller than SDXL (3.5B)
and 8B is just undertrained 😔
You can actually just calculate the release date using this handy javascript function.
// joke estimate: the release is always 14 days from today
const SD3 = new Date();
SD3.setDate( SD3.getDate() + 14 );
console.log( SD3.toISOString() );
How much B is in the API right now sd3 ?
is it the 8b or the 3.5b ?
I think the API uses 8B
I think I remember hearing it was an early train of the 8b model at one point. Without T5.
Well we can't train T5 though can we
because i'm wondering if there is any chance that the one at the api is gonna get any better ?
isn't it frozen
or that's it's limits ?
because it can't render hands or legs correctly most of the time
SD only gets amazing once the community fine-tunes it.
true
And then you need specialized models and controlnets to get production-quality usable stuff.
if they train it further
I know that finetunes will help
It might be hard to notice if it had gotten better. Quality always depends on your prompts.
I wish 8B Loras will be possible with 24GB, but I'm doubtful
Isn't that just a fine-tune?
yes Loras are finetunes
like modular finetunes, you can use it with models
and its less VRAM intensive, etc
I thought loras were lower-rank. Fewer parameters than a full model fine-tune.
not just my prompts it depends on the encoder
because 1.5 1.6 and 2 is different encoder not a lot of people using sd 2
it's sd 1.5 and sdxl
They are, with most in the 8-128 dim range. Still takes a shitload of vram to train them. Doras look promising. Saw that even with 16 8 dim on them, they are on par or better than loras with way higher dimensional sizes. Like as good as a 128. If that's the case, then it lessens the vram training requirements
Even still, if you're working with images in the 1024² range, then for an 8B model, it's probably going to take 32-48gb vram to train them with even just the clip encoders and no t5
correction, even as low as 8 dimensions. here's a decent writeup about it all that i saw the other day: https://sebastianraschka.com/blog/2024/lora-dora.html
just because the model is 8b doesn't mean you have to train all 8b parameters
in sdxl you often achieve the same results when training only the cross attention layers as when training both. Similarly, you don't have to train the down-layers of the unet in sdxl
usually our loras are several times too large for no reason
similarly, you can train only a part of the 8b model and you will be fine
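To make the rank discussion concrete, here is a small sketch comparing the trainable parameters of one dense weight matrix with its LoRA update, which adds two low-rank factors of shape (dOut x r) and (r x dIn). The 4096 x 4096 layer is an illustrative size, not SD3's actual layout, and the helper names are made up for this example.

```javascript
// Trainable parameters when fine-tuning one dense weight matrix directly.
function fullParams(dIn, dOut) {
  return dIn * dOut;
}

// Trainable parameters for a rank-r LoRA update of the same matrix.
function loraParams(dIn, dOut, rank) {
  return rank * (dIn + dOut);
}

const dIn = 4096;
const dOut = 4096; // illustrative layer size, not SD3's real one
for (const rank of [8, 16, 128]) {
  const ratio = loraParams(dIn, dOut, rank) / fullParams(dIn, dOut);
  console.log(rank, loraParams(dIn, dOut, rank), (100 * ratio).toFixed(2) + "%");
}
```

At rank 8 this layer trains under half a percent of its weights, and restricting training to a subset of layers shrinks that further. Trainable parameters are only part of the VRAM bill though, since activations and the frozen base weights still have to fit.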
well I hope that will work for 24GB and less
I would say that's just a question of proper gradient checkpointing. We just have to wait for someone implementing efficient training
do you guys use stable cascade at all ?
used it for a little while
its good, but doesn't match what I want
and the results are smooth
as long as you aren't looking for super photorealistic images, the model makes nice and clean images
can you show some of the nicer pictures you made with it?
These were all made using Cascade
yeah these look SUPER clean
It takes ~12gb vram to train an sdxl lora with 128 ranks with both clip encoders+unet in kohya or OneTrainer. I'm not even talking about dreambooth training, that takes far more vram. So to train a rank 128 lora on 8b model, it's going to be faaaaaar more.
Also Cascade
Well that's why I brought up the Dora thing. Also since sd3 doesn't use a unet, it could potentially take far more or far less vram to lora train per "billion model parameters" if that makes sense. It's likely also going to take some time for people to get the tooling up and running for it as well
I haven't looked into that aspect much yet, so I can't give you an educated guess on how demanding the training will or won't be. Just using sdxl training as a reference since I've trained dozens of loras for it
I wonder if we can use Qlora, since its more transformer based
then again, how will the quantization go then
Apparently sd3 does a great job with these miniature scenes as well. 🙂
A good WS there too!
you don't need dim 128
neither for sdxl
even less for sd3
think of it as wanting to teach the model a new concept. The size of the concept doesn't necessarily scale with model size
in particular, if you only have a few megabytes of training images, training a gigabyte lora is a rather dumb idea anyways
I was thinking it's because it was sdxl refined, but this is sd3 raw and it indeed has him.
Oh I don't train that high ever, it's just some superstitious thing a lot of "guides" lead people into thinking they need to use, so most people use it anyways. I normally just do 16 or 32, but it depends on what you're training and how you're training.
We don't actually know yet, but if it's like training llms that the architecture is based around(dit), then it's still going to take some hefty resources to train correctly
Llms are a lot more forgiving than image based generation
still 2 weeks ay?
2 more weeks
aight.
So did anyone test the latest API (or whatever)? How does it compare?

i know of at least 2 promising stable cascade projects
no, it does not have 6b parameters
Please share. I know it's been mostly ignored, but I managed to find at least one youtuber training loras.
there is also a stable cascade model that has 1b parameters and it will also be more powerful than 1.5
Link? Also what SC projects?
one group is making a furry model and the other group/person is making an anime model and both have very big datasets. like 6m+ images
these two are from the anime finetune but its still not done.
Oh finetunes. I could use an anime finetune if there was a lineart controlnet.
it has something similar to lineart. where it sees the edges of the image
You can't draw a canny map.
she is sitting in the middle of the car with a seat belt on, where is that seat belt connected to from the middle? LOL
its one of the new mercedes 🙂
kek
anime looks amazing with the wider color gamut
Make it 12 gigs or less 😄
I am learning the word "soon" in any language since sd3 release day 1 :
in catalan : aviat
in two weeks is what in catalan?
What a weird situation we have here. Cascade has been out for 6 months. We know it's a huge improvement on SDXL. It works on current hardware... It makes huge images. It's also VERY fast... Yet here we are, in two weeks... Waiting. Can you imagine what cascade would be in 6 months if it was embraced at least half as much as sdxl...
What are waiting for anywya more than half of us wotn even be able to run this thing locally.
So sad.
People always want what they can't have and ignore what they have.
So of course they even closed the channel, nobody cared. This is all our fault
So let's wait forever then
for nothing
very fast? really?
cascade has one limitation similar to SDXL: prompt understanding. I think it's mostly the promise of a model that will soon be available and that improves on that aspect that invalidates further work on cascade (and sdxl), the difference being that lots of efforts/research on sdxl started earlier and are only recently published.
I remember cascade having very average speed
and same or better results
speed is questionable
But just as we still see sd1.5 research published now, i'm sure sdxl will be here to stay for a long while
if 2B fails to replace
cascade is great for what it is
if T5 seems to be too good to not use, and it will be a hassle to load/use, SDXL is staying
Yes it is great, hence it's amazing how underrated it is
sdxl is good too yes
but cascade has greater potential
I didn’t mess with it much, but got frustrated with cascade. It LOVES flat backgrounds. Like, it lacks creativity. If you ask for a subject, it won’t build a scene around it, just slap it on a blue or gray background.
I haven't spoken Catalan in a long time, I'm not sure but I think it's "en dues setmanes"
Cascade deserves more attention and fine-tunes because it's hard to find a good prompt on the base model
cascade understands prompts differently. it's trained more for natural language and longer prompts
ah, you teach me something, I didn't know this subtlety
stable cascade can do some good stuff.
I love the second
Ultra-realistic 8K image of will smith in an exploded view. The components should be meticulously detailed and appear to float against a black background, highlighting their complexity and precision craftsmanship, hyper-realistic detail
oh i like, let's put will smith in an earlier prompt i borrowed
ultra-detailed photo of a shattered sculpture made of rose quartz depicting will smith, full body enlarged, ((pink glitter explosion)), side view, motion effects, ((shattering sculpture)), colored crystal particles floating as the sculpture breaks into many tiny pieces, studio lights, ultra sharp focus, high speed photo, Mschiffer art, soft colors,
giraffe confident expression, pixar style, expression