#｜sd3
Might be subjective. In my opinion Cascade makes worse images than SDXL, particularly when compared against custom models. Sure, people will now say that you can't compare base models against custom models and that Cascade could be much better once trained. BUT there is no good custom model for Cascade. Maybe because nobody is interested, or maybe because it just doesn't work.
like, my feeling is that Cascade is already heavily overfitted on Midjourney images and won't improve much with fine-tuning
also I found that training Cascade doesn't work as well as training SDXL. But I know other people have different experiences in this regard 🤷‍♂️
i just hope i won't have to make another upgrade in 2 weeks (eheh) when this thing drops and it can't run on anything less than 24 gigs
i got the tail end of the SD1.5 era and SDXL was too much for my old comp
but since it was many years old i said ok, let's go
but 6 months later i really don't feel like upgrading again
Do you think I can use it with my 8 GB RX 6600 XT on Windows, and do you think it's worth learning?
Great, I'm in the wrong channel
For this question
You can use the SD3 API in Comfy and then pipe the result into any other Comfy node to use detailers or hand fixers, or just SDXL-refine it with some denoising.
if after all this waiting they release a model that still spits out images that look like these API images, without any distinct improvement, especially in human anatomy, i'm going to blow a gasket.
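An SDXL refine pass like this is just img2img at partial denoise, so only part of the sampling schedule is re-run. A minimal sketch of that step arithmetic, assuming the diffusers-style convention where `strength` scales how many scheduler steps actually run (other UIs, ComfyUI included, may count steps differently); `img2img_steps` is a hypothetical helper, not a library function:

```python
def img2img_steps(num_inference_steps: int, strength: float) -> int:
    """Return how many denoising steps a diffusers-style img2img pass runs.

    Mirrors the arithmetic of diffusers' img2img get_timesteps(): the first
    (steps - int(steps * strength)) steps are skipped, the rest are run.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    t_start = max(num_inference_steps - init_timestep, 0)  # steps skipped at the start
    return num_inference_steps - t_start


# A light refine: 40 scheduled steps at strength 0.25 only runs 10 of them,
# so most of the original composition survives.
print(img2img_steps(40, 0.25))  # 10
```

Lower strength keeps more of the SD3/Cascade image and only cleans up detail; higher strength lets SDXL repaint more of it.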
giraffe confident expression, pixar style, expression
#｜show-and-tell giraffe confident expression, pixar style, expression
/txt2img A white man, dressed in an OpenAI logo black T-shirt, writing on the blackboard with white chalk, the blackboard has two pieces and can be moved up and down. The content he wrote is "Transfer between Modalities. Suppose we directly model P (text, pixels, sound) with one big autoregressive transformer. What are the pros and cons?" Shot from the back.
wow, pretty impressive, I thought Cascade wasn't used anymore.
What workflow do you use?
nothing special. just good prompts
you have to talk to it in natural language
I tried it, but it took a long time to generate and I couldn't test enough prompts; results were all over the place
Cascade is just next level. the base is good, but it has much more potential. it learns 16x faster / 16x more than other diffusion models, it can output very high res, and it has a very big CLIP model, almost 2B.
i think there is still a free demo on Hugging Face Spaces
i like that it understands more complex stuff like "a person wearing clothing made out of trash bags", but it does not like misspellings lol
I used it in Comfy
I got some nice generations
But there was something too "perfect" and cartoony about it that made it unrealistic, among other kinds of results
yes i agree. you have to include stuff in your prompts to make it less perfect.
like noisy phone image, an older date and stuff like this
but the model is fine-tuned on "aesthetically pleasing" images, so a finetune on worse stuff would help. the alternative is trying some prompt stuff
try to use stuff like this: "grainy iphone photo of a black backpack with the word LUMA embroidered, new york city metro, posted on Reddit in 2013"
ok, I'm trying to get it running again, ty
then you can get stuff like this
it looks more real and less cartoony, but the model sometimes still wants to make it look perfect. that's why a lora is probably needed
what resolutions do you use? that's why I asked about the workflow. I use this one, it's the only one that worked, but it took like 10 min to generate with its default settings https://civitai.com/models/119257/gtm-comfyui-workflows-including-cascade-sdxl-and-sd15
just use the basic workflow for now https://comfyanonymous.github.io/ComfyUI_examples/stable_cascade/
i just made this one with this prompt "grainy iphone photo of a bag , new york city metro, posted on Reddit in 2013"
it still looks a bit like a studio photo but it's getting there
what gpu do you have?
yummy xD (cascade)
RX 6700, 12 GB VRAM
I'll try to run it when I set it up, ty
12 GB should be enough, so it shouldn't take 10 min for you. i think you can add stuff like a low-VRAM option to ComfyUI
No idea, I never found any "flags" or optimizations for Comfy, I just run "HSA_OVERRIDE_GFX_VERSION=10.3.0 python main.py"
i found this: "python main.py --lowvram --preview-method auto --use-split-cross-attention". i think you only need the first one
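For reference, the two launch lines above can be combined. A minimal sketch, assuming Linux with an AMD card on ROCm and that you run it from inside the ComfyUI checkout (where `main.py` lives); `build_comfy_command` and `launch` are hypothetical helper names, and only the flags quoted in the chat are used:

```python
import os
import subprocess
import sys
from typing import Dict, List, Optional, Tuple


def build_comfy_command(
    low_vram: bool = True,
    split_attention: bool = False,
    gfx_override: Optional[str] = "10.3.0",
) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, env) for starting ComfyUI's main.py.

    low_vram adds --lowvram, split_attention adds --use-split-cross-attention,
    and gfx_override sets HSA_OVERRIDE_GFX_VERSION for ROCm (None skips it).
    """
    argv = [sys.executable, "main.py"]
    if low_vram:
        argv.append("--lowvram")
    if split_attention:
        argv.append("--use-split-cross-attention")
    env = dict(os.environ)  # copy, so the current process environment is untouched
    if gfx_override:
        env["HSA_OVERRIDE_GFX_VERSION"] = gfx_override
    return argv, env


def launch(**kwargs) -> subprocess.CompletedProcess:
    """Start ComfyUI; run this from inside the ComfyUI checkout."""
    argv, env = build_comfy_command(**kwargs)
    return subprocess.run(argv, env=env, check=True)
```

Calling `launch()` from the ComfyUI folder is then roughly the same as `HSA_OVERRIDE_GFX_VERSION=10.3.0 python main.py --lowvram`.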
(cascade)
now this one looks good/real (cascade)
It looks really good
$prompt full body,
learning the word "soon" until SD3 release #2
French 🇨🇵 : Bientôt ("Soon") 👨‍🎨
boy
My prompt = peruvian arpillera streamline moderne molly-mae hague natural nature majesty victo ngai henri rousseau vladimir kush tamara lempicka andrea kowch
Llama 2B prompt = Create a vibrant and eclectic pop art collage inspired by the fusion of Peruvian arpillera with Streamline Moderne, featuring Molly-Mae Hague as the central figure. Incorporate elements of natural nature, majesty, and fantasy to create a dynamic and imaginative piece. Collaborate with artists such as Victo Ngai, Henri Rousseau, Vladimir Kush, Tamara Lempicka, and Andrea Kowch to bring their unique styles and perspectives to the piece. The resulting work should be a bold and colorful celebration of artistic collaboration and creativity
Oh I got it running, it generated really fast
you can either buy Artisan and generate it in that channel, or here for free: https://glif.app/glifs/clv4ca3aq0004xklr050jusec
we just post generated images here
I'm trying the Galaxy Time Machine Workflow again, still takes long
yes indeed, pretty neat and realistic
i think it also uses SDXL and stuff
you dont need to use it
only if you use the face/hand detailer I think
It spends most of the time on models B and C
It generates images at 1920 x 1280, I guess that's why it takes longer
After model c it shows a preview
Also, what was messing up the generations was the SD XL Styles; for some reason they're broken, so I bypassed them. I guess it doesn't need them, and they add a lot of nonsense.
well, this is the same prompt with the GMT workflow
I guess I'll try prompts with the basic one and maybe then try with this
And for some reason the output is stored in the Temp folder
cool
G T M 🤷🏻‍♂️ 🤣
If you're using v4 of the Cascade workflow, it does a double pass over the C stage, so it will take longer. You can either bypass that, or use v3, but that uses the unet models. Should be easy enough to change around anyway.
oh ok sorry
RX 6700 12 GB VRAM
That will work, but I wouldn't expect it to be fast. Cascade was a struggle for people with smaller GPUs
it takes like 7 minutes I think; that's ok for a large, almost finished image
I have a 4090
dude is this cascade too???
This image Prompt executed in 123.69 seconds, including model load times.
The workflow may be embedded, but it's a different one to the ones I've posted on Civit.
is this Cascade? goddamn it's good, but why did it take so long? is Cascade that much slower than SDXL?
It is Cascade, but that is doing 2 passes in Cascade, 1 in SDXL and then Ultimate SD Upscale.
...and a little post-processing in the flow too
Also Cascade
To save memory, it clears the model after it has run, so has to load it each time as well.
I changed the SDXL model to "Boltning Hyper"... Prompt executed in 66.48 seconds
Not a bad time for a 2K image
Stable cascade ?
so how long does normal cascade setup take to generate?
Can Stable Cascade run on an RTX 3060 12 GB?
Yes
Try it. I don't have a basic setup, and it'll vary between GPUs
ok but is it usually longer than sdxl or just about the same?
yes
Ah thanks, do you need any adjustments like --lowvram?
I'd guess it's slightly longer, because it's a larger combined model size and has several stages to run. The result is more important to me than the time to run it.
@severe phoenix
cascade
Cascade
Prompt: manic firestarter
@remote holly I'm running this workflow with the ComfyUI Cascade models, 12 GB VRAM on Linux, no flags: 68.40 seconds at 1024x1024
ok thanks
the GTM workflow takes, oh wow, 12 minutes! but it generates at 1920 x 1280
If you don't need the size, you can disable the upscaling to speed it up.
....or is that without it already? I suppose it must be.
That's cool, i will try it when i receive my pc
Can you blend images with Cascade?
Yes
Also Cascade
where is that in the workflow? And I wanted to ask you: why doesn't it put the generations in "Output"? I found them in "temp"
Probably because of a config issue in the save image node. Have you checked the path?
Very easy with Cascade
Does the save image node have a coloured box around it after you run the flow?
I put in manic firestarter and just get this.
does it need the full path?
I need to check that
Change "Quality" to 100
Should never have NaN in there
Actually, right click and rebuild node. Reconnect it if the noodle vanishes.
You probably don't want overwrite enabled either.
I've actually replaced that save node with
I find Cascade does nice composition/images, but nearly always needs refining in some way.
yes in general lighting is really good and things are in place
Nice img2img too
like this, everything is in place, the isometry (!?) is perfect, but it lacks stuff
What's the prompt for that?
Isometric Cutaway - An Image illustrating a diorama of an Alchemist in a simple Alchemy Lab, Bauhaus, Scott Uminga, elegant, abrupt, (in the style of Codex SEraphianus:1.4)
Clownshark attack (cascade) aaaaaah
Not what I was expecting
I had cinematic prompt style selected
What's it lacking?! 🤷🏻‍♂️
it is a good drawing, I don't know
Some details here and there, but not much more
Did you expect something more like this?
He's trying to invent a fire extinguisher to put out that fire
does that have SDXL upscaling or something?
Yes, they all do
yes maybe. the drawing style, I don't know, it needs some finetuning or something, like it's never a clear style, if that makes sense
Perhaps because you've prompted a style it doesn't know?
I changed it to
Isometric Cutaway - A photo realistic diorama of an Alchemist in a simple Alchemy Lab, Bauhaus, Scott Uminga, elegant, abrupt, (in the style of Codex SEraphianus:1.4)
i prompted many drawings, animation and illustration styles and it always was a little bit meh. But maybe it is like you say.
Isometric Cutaway - A black and white diorama of an Alchemist in a simple Alchemy Lab, Bauhaus, Scott Uminga, elegant, abrupt, (in the style of pen and ink line drawing:1.2)
but it also made some styles that were really neat
yes that looks more like it, a clear style of drawing
I think you just need to be more specific with how you want it to look.
3D Model style
this is a really good clown shart BTW
@dusky thistle
Isometric Cutaway - A photo realistic cinematic diorama of an Alchemist in a simple Alchemy Lab, elegant, abrupt, (in the style of Game of Thrones:1.2),
yes, I'm finding that some "crazy" prompts that work in SD XL kind of glitch in Cascade
SD XL - Cascade
Ewww! Maybe my prompts are just not crazy enough
Oh I'm loving cascade now
A crocodile submarine
Cascade will never forgive us.
refine that s$it!
make loras and controlnets
actually u can refine it with sdxl in comfy
lol
yes
cascade is better
non commercial
The majority of us aren't interested in the commercial requirement.
what does it imply?
nothing that affects us personally, if you make images for yourself, it doesn't matter
No large companies/groups will be interested in improving it, because there's no financial gain.
and that
Same with Stable Diffusion 3
more cascade
Nothing suspicious here, move along.
(Cascade)
Makes you wonder why there's no dedicated Cascade channel here, doesn't it? 🤷🏻‍♂️
Exactly...WHY?!?!
#1207078178510872636 this will show as locked or unknown or whatever if you don't have access to archived channels
what I am sad about is that bokeh is getting out of hand
Is it?
@real terrace Didn't you want unclean images of this?
Did you want it so the bag wasn't so much the central focus in the image?
yes, I don't know why they replaced it with SD3, which isn't even out yet
impressive
There was one!
Looks fine, you just need to select what you want.
output_path and filename_prefix seem to be mixed up
That's the default, so it creates a dated directory each day and puts images in it.
Change them around if you prefer.
Cascade does a good job of JW
Amazing, can you try with these images?
I want to see what's done with liminal spaces
They're way too small for Cascade
I upscaled them 3x, but the pinhole camera image quality is too bad
You need decent images to do a merge, Cascade resolution is higher than SDXL.
I did another 2x upscale on the images, but the merge results are all similar...
when is SD3 downloadable?
Ah, i see. do you think 1024x1024 images are good for a merge?
Soon
Should be
Cascade can natively generate at that res
yes it is. that's what happens when i have to sit in the car waiting for someone.
seats are attached to the outer walls of the train, that's an interesting design, would work well in India xD
also, why are they teasing us with updates in #📣｜announcements when they know everyone's waiting for the weights xD
ngl this is starting to make more sense
in no way we are getting SD3 in May
and considering that they will release all models at once, 8B still has to cook
for a very long time
why don't they just effin say that, instead of doing the annoying "2 more weeks" hype thing
I was going to disagree, but 1.4 did release around October 2022 and staff were teasing it in August. Might have a point?
The API argument does fall apart though.
idk why they keep saying soon
they want to stretch the hype
🤷‍♂️
Soon is just the term they use to placate people asking for a date. Always has been.
"guys it'll be here just wait a few more days" so that we'll be kept interested
Unless staff has been giving actual timelines like "we are planning on a release near end of May", then assume soon tm doesn't really mean much.
civitai blog had a "we've heard that it will come at the end of may"
but I dont remember the exact wording
More often than not recently, they just drop models with no warning.
why not just come out and say something like "look guys, it's actually not going great, it might take more time to release the weights" instead of nothing
alex mcmonkey already addressed that 8B is very undertrained
It's all conjecture; the utterly bizarre part is that SAI just refuses to make any announcement. I fondly remember the SDXL trajectory: there was never this promise of "soon", the worst was a week?? delay after the release was announced. And we got the 0.9 leak. And the model in the bot. And more talking devs.
but nothing similar has been announced collectively by Stability
sdxl 0.9 leak was kinda cool :3
I love how stability made a post about it
like bruh
i mean im still patient and technically have lots of stuff to play with already, but just wish they gave us some form of information, even if it's bad news
yeah, this silence combined with occasional hype posts on Twitter is more annoying than it is exciting
Leaks to force a model release do seem to happen way too often...
I hope a leak wouldn't force them
8B has a long way to go still
I dont want them to stop training prematurely
they don't even announce "oopsie that testing where you sign up for a bot, not gonna happen" while it obviously isn't gonna happen anymore... be open and transparent, but all these empty/false promises only lead to more skepticism
but do they have anything ready, like for example is the 2B model trained but they are waiting for all the versions to release all at once?
They might/might not, we can speculate 🤡
I was hoping they'd release 2B first, therefore I was confident in the may release, but they probably want to release it all at once
and yes, its all just speculation
yea i wish they released the smaller one so we can play with something while waiting
hopefully it doesn't go past June
does he know?
as long as you know more than Jon Snow it's good
worst case scenario we just pay alex mcmonkey to leak something :3
Fairly confident SD3 is not the next model that SAI is going to release.
skipping straight to SD3.1
We aren't entitled to anything from them. Updates and two-way communication are nice, sure, but don't expect them.
Yeah I sit on the fence about this as well, or if they do release sd3, it will be far different from what they originally planned or intended. I know they brought up issues about overtraining recently, so that's what kind of made me start to think they're going to redo a bunch of it but maybe with a different spin
overtraining?
Yeah they overfit the current models
Which is what I just said...
do you mean they lack diverse data?
oh
I thought they meant different things for all this time
an option is to just release whatever they have currently (bad checkpoints) and let the community fix it with finetunes anyway :3
this was 27 days ago
so I suppose a lot has changed since then
I really hope that a bunch of well captioned datasets come for finetuning
like juggernaut X or whatever
(even if that specifically ended up as a disaster)
I disagree. If they were a company that just drops models, sure. But SAI announces SD3, lets people sign up for bot testing, mentions the model coming soon, mentions the model in 2 weeks... And NONE of the things they announced come to pass... Meanwhile they do post teasers on Twitter, meaningless images... If a company on the one hand says things will happen, these things don't come to pass, and it then responds with nothing but silence while still managing to keep posting teasers on Twitter, i strongly feel that company should also keep the users it made those announcements to up to date when the announced things don't come to pass
Valve:
half life 3 confirmed
valve does next to no communication with the fanbase lmao
But honestly, actions say more than words. It's clear SAI (not the employees, the company as such; it's clear people like mcmonkey try) doesn't take its user base seriously in any way except as hype-cattle: the closing of the SDXL bot that would be "back soon", with no word about a definitive closure whatsoever; closing the Cascade channel while it was still in good use; opening up a waiting list for something not coming to pass. All of it shows nothing but disdain.
if sd3 doesn't happen, i guess people will focus on Cascade now perhaps? hmm, but i still want sd3
I think something like pixart, the thing most lacking in SDXL is prompt-following. And maybe a surprise ella release will go a long way for that as well
But SD3 will happen
I'm speculating that the freesound stable audio model is going to beat SD3 out the door.
oh crap yes if they release the free version
I'm waiting for that
hopefully finetuning will be easy
wait what happened to the reverse engineered ella sdxl thingy? they said they will release very soon last time i checked and i still dont see it...
I wonder how hard it will turn people away from the non-commercial license lol
SD3 too, but for now, stable audio
It was by the same guy that released Mobius, and he got a very, very cold shower
i would love to try stable audio or some version locally :3
apparently its very vram efficient
like 4-6GB iirc
Finetuning will be the deciding factor if I'll care about it
oh.... mobius... so those guys just say stuff but never release? sounds like the Alibaba group LOL
'cause if I could fine-tune it on songs I like and it sounds decent, then I'd be amazed
its all instrumental though
i mean cant you fine tune it to include vocals? :3
ehh
we'll need a massive community finetune then
even instrumentals would be cool tbh
yea
especially if its offline, free, forever
i think there will be weights, just not "free to do whatever with" and lots of hype. They haven't said they aren't releasing ELLA, just silent about it
meh
people gonna just use suno for serious use though, because it's only $10 with commercial use, and stability is $20
and it has vocals
but can suno do instrumentals only?
suno and udio are sooooo goood
suno can do instrumentals yes
kk
both instrumentals and instrumental+vocals too
would be amazing to have an open model that's competitive, but somehow i'm doubting that
i wish we had more projects out there to try and mimic some of the audio tech, i mean we have a lot for text generation and image generation, but not a lot for audio stuff
well im not counting tts stuff, we have a lot of those
meta's audiobox (just fancy text to voice) seemed soo amazing.
I can convert an album into instrumentals using UVR5 with decent quality and just make a dataset using that
In 2 weeks.
The team was aiming for end of May. As far as I know that hasn't changed.
so Emad knew all this stuff and yet he said SD3 was coming "soon". Bruhh wth, why hype this stuff up and do the "let's pick people for access to showcase images" thing when they knew it had all these problems and wouldn't actually be ready anytime soon?? this whole thing is just annoying. Cascade is a pretty good model, why didn't they just drop it like Cascade? what's the need for all this nonsense?
No one is going to use Cascade because of its licensing
did SD3 leak?
nope
no commercial use but otherwise idk
it can make very clean images
I dont make images for commercial use
Money.
yeah cascade looked a little too smooth on skin in my testing
but with some SDXL refining in ComfyUI it's epic
but again, no controlnets, nothing
so it's even more of a crapshoot
u never know what u gonna get
really? like we give a damn about the commercial license, it's pretty much unenforceable. that license stuff is for big companies with high revenues who can afford to be sued for millions in damages. Stability don't care about suing our raggedy asses lool
yeah it's mainly for big companies, it's just that many people still get turned off
honestly i used to think this but some examples i've seen on here have really blown me away
people are afraid no matter how unenforceable this all is
also this CCTV lora looks really good
rivals SD3 CCTV images
imagine sd3 loras
oh yeah sd3 will be amazing
people who? nobody gives a shit. if we did, everyone in here would be anti-AI
and idk about model trainers
if they are just in limbo because "sd3 could come any minute, why would I waste training on Cascade" or whatever
yes
Cascade was dealt a bad hand
i don't get it
perfectly usable better than sdxl
no one cared
the wait better be worth it lmao
haha
epic
Cascade was made by a team entirely separate from Stability that got funding from them, IIRC
thats another thing
the hype is bad
the longer people wait the higher the expectations
eventually it's impossible to satisfy the expectations
honestly I would have a lot of fun even with the base model
so that's why I'm still eagerly waiting
happens with anything big
otherwise I'd be waiting for finetunes and not get excited at all basically
Lykon is probably getting a finetune out very soon after the launch probably
im basically waiting for the finetunes and more optimized stuff
since he's a stability dev
turbo version whatever
Let's be real here. Most ppl are waiting for the Pony SD3 finetune lol
i hope i can at least run those locally
yes
if not the base
i don't get Pony
lol
exact same clip models as SDXL
like seriously all it makes is a room with a woman in it
every single time
regardless of prompt
lol
You can say that about most SD finetunes tbh
yeah this is why I'm scared of SD3 finetunes a little
i think the 3b will be more interesting than people think
some amazing SDXL checkpoints out there
we will have 800M, 2B, 4B, 8B
im not technical so i don't really know what those Bs mean
more params
2B will replace SDXL
smaller parameter size, better quality, T5 running on CPU
essentially yes
but that is only using clip models
800m might have similar results but wayyy faster
800M will replace SD1.5 (or not idk)
2B or 4B will replace SDXL
8B will only be used by 16-24GB users
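The 800M/2B/4B/8B figures are parameter counts, so a rough VRAM floor is just parameters times bytes per parameter. A back-of-the-envelope sketch, assuming 2 bytes per parameter (fp16/bf16) and counting only the diffusion model's own weights; text encoders such as T5, the VAE, activations and framework overhead all come on top, and `weights_gb` is a hypothetical helper:

```python
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "fp8": 1}


def weights_gb(num_params: int, dtype: str = "fp16") -> float:
    """Memory for the model weights alone, in decimal GB (1 GB = 1e9 bytes).

    Ignores text encoders, the VAE, activations and overhead, so real VRAM
    use during generation is higher than this.
    """
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for name, params in [("800M", 800_000_000), ("2B", 2_000_000_000), ("8B", 8_000_000_000)]:
    print(f"{name}: {weights_gb(params):.1f} GB in fp16")
# 800M: 1.6 GB in fp16
# 2B: 4.0 GB in fp16
# 8B: 16.0 GB in fp16
```

16 GB of weights alone for 8B in fp16 is consistent with the "16-24GB users" guess; the 4B size works out the same way (8 GB).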
when it comes out ill train a finetune on cpu for the memes
12GB with T5 with only CPU
whats the base size of sd3? still 1024x1024?
who knows
1024x1024 only for 8B
smaller models MIGHT get 1024px versions
but honestly
the 16 channel VAE probably carries it
u know with the excitement and readiness to pounce there better be every single sd3 version of every plugin available 30 minutes after the weights drop
controlnets, SUPIR, ComfyUI
everything
The one about overtraining was a discord post from one of the devs or team members.
my idea exactly
a discord post
sd3 comes out im speedrun making something
I need to find it then
You know it won't. SDXL doesn't even have full CNet support
SDXL got a new openpose controlnet, and its finally good
or rather, it actually works kek
Does that work with Pony?
no idea, it's SDXL based so possibly 🤷‍♂️
Well the neat thing is that objectivity doesn't care if you disagree. They are under zero legal obligation to even release the model. At least there's the API to play around with.
whats it called?
in the openpose one, there's a "twins" version, and the creator said this:
It is a model with similar performance and different style. The pose will be more precise but aesthetic score will be lower.
yeah I tried this openpose one and its good
lol pokemon model
yeah, looks better
god i wish video would catch up
haiper does some ok things
but again
crapshoot
I want boring reality for SD3
as good as "low quality" works sometimes with SD3, something a bit more consistent would be good
consistency is what we're all after
Selecting different areas in ComfyUI is very good
i want to be able to select a character or object or backdrop and somehow tell the AI to not change it anymore
only change angles and perspective
but leave textures and shape alone
wtfbbq, what does legal have anything to do with it?!
https://www.reddit.com/r/StableDiffusion/comments/1bv83qt/update_on_the_boring_reality_approach_for/
:))
yeah, these can look crazy deceptive
not AI looking
exactly
u reeeeaaallly have to look
you really have to look to figure it out
closest I have ever gotten without boring reality using sdxl
goddamn this aggressive depth of field
haven't used this for a very long time
joe biden lean
wth lmao
did you train it or just use good prompts?
I did make a lora but it was a failure
so the Biden images were made with prompts?
and a special finetune of SDXL
using an NSFW image is okay, but it gets really annoying when it sometimes makes nipples or whatever for no goddamn reason
even if you spam it in the negative prompts
cosxl before cosxl
even though this was just primitive offset noise
i know about that, but there is still no sdxl segmentation model (i mean technically there is one i think, but the filesize is ridiculous for a controlnet)
we dont talk about 2.1 :3
so I'm "losing" generation power (!?) if I generate at 1024x1024?
What's the square native resolution?
not sure about native, but for me, cascade makes better squares i think around 1536x1536 from what i tested
Basically, you're complaining out of entitlement, which I had already brought up. What I said didn't change your entitled stance, so I was letting you know that they have zero need to keep you in the loop or even release the models at all, from an objective standpoint because nowhere does it legally state they have to. You just want them to and are mad that they aren't giving you updates about what they are doing or going to do. If they do, great. If they don't, then oh well.
maybe I'm biased because I saw it here, but I instantly recognized the people as "SD 1.5 people" xD if that makes sense
like a common denominator of people that the models end up doing
all with the same smile and expression at once
hmm, is the new CEO gonna do a Sam Altman ?
no, he's gonna pull a Sam Bankman
Maybe the real sd3 was the friends we made along the way
It's just a cleverly designed wall, made to look like a train.
@muted dove 1024, but it can also do much higher res because it was trained on multiple resolutions
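Part of why Cascade stays usable at higher resolutions is how small its stage C latent is. A minimal sketch of the shape arithmetic, assuming the default compression factor of 42 used, as far as I know, by ComfyUI's StableCascade_EmptyLatentImage node (a 16-channel stage C latent, plus a 4-channel stage B latent at 1/4 of the pixel size); `cascade_latent_shapes` is a hypothetical helper, not part of ComfyUI:

```python
def cascade_latent_shapes(width: int, height: int, compression: int = 42):
    """Return (stage_c, stage_b) latent shapes as (channels, height, width).

    Uses integer division, so sizes that are not a multiple of the
    compression factor are rounded down.
    """
    stage_c = (16, height // compression, width // compression)
    stage_b = (4, height // 4, width // 4)
    return stage_c, stage_b


for w, h in [(1024, 1024), (1536, 1536), (1920, 1280)]:
    c, b = cascade_latent_shapes(w, h)
    print(f"{w}x{h}: stage C {c}, stage B {b}")
# 1024x1024: stage C (16, 24, 24), stage B (4, 256, 256)
# 1536x1536: stage C (16, 36, 36), stage B (4, 384, 384)
# 1920x1280: stage C (16, 30, 45), stage B (4, 320, 480)
```

Even at 1536x1536 the stage C latent is only 36x36, which is why going above 1024 costs less than it would with SDXL's 8x-downsampled latent.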
Yes
Heh, it's already out. At least one version xD
Cascade is 3.5 and 1
the end of may interesting xD
tomorrow is left, I wonder if they won't release it, or they won't release it
if it's because of 8B still being undertrained, then no problem
They could still release the smaller one for now.
same x.X
just need to give the SAI CEO a few millions and he will release it, i swear it's 4 real this time
*billions
*trillions
gallons
Afaik the only mention of 'end of may' for the SD3 release date was in a civitai newsletter, nothing official.
only official news we got was that they are broke and looking to sell
what do you mean? "we've heard" is a really credible source 
The last two posts from Lykon that I can see (through Google) were on May 24th. The second to last post said "still cooking". So I don't think it is likely that we will get SD3 for some weeks yet.
yup
8B def June or July (July more likely, idk how long they want to train)
2B could come in June though
Wouldn't it be more likely that the smaller models are distilled from the largest model, rather than having completely separate training?
distilled hurt too much
on quality
it is much much better to use cut-off dataset against limited steps training.
sure things like Lightning or Turbo could help
( SD3 Turbo's on the API btw )
What do either of those things have to do with a smaller model? The number of training samples or epochs seems independent to me (and that more would always be better).
the relationship between parameter and dataset-training is... uhhh a lot of math. so it is a little bit tricky to explain.
If I understand correctly, parameter is a calculation inside neural network to give a data using their learned perspective
or gonna ask GPT

basically "i dont know"
Parameters are just the number of weights and biases in the network. Fewer means fewer nodes and connections.
it also has to do with computer programming
A smaller network will train quicker (fewer steps and less time per step), but I've seen scaling laws that indicate things keep improving with more training and more data.
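To make "weights and biases" concrete: in a fully connected layer, every output unit has one weight per input plus one bias. A small sketch counting parameters for a plain multilayer perceptron; the layer sizes are arbitrary examples, and real diffusion models mix convolution, attention and other layer types, so this only illustrates what the "B" counts, not how SD3's numbers were computed:

```python
def dense_param_count(layer_sizes):
    """Count weights and biases in a fully connected network.

    layer_sizes lists the width of each layer, e.g. [inputs, hidden, outputs].
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix + bias vector
    return total


# 784*128 + 128 = 100480, then 128*10 + 10 = 1290
print(dense_param_count([784, 128, 10]))  # 101770
```

A "2B" model has roughly two billion of these numbers; each one has to be stored, which is where the VRAM cost comes from.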
When was this posted?
idk early may or before
Hmmm.. We might not get it by tomorrow then. My guess was mid/late June.
Worst case scenario would be july, i thought.
We already know that the current version of SD3 is already looking very consistent now. Lykon showed us an anime image the other day that looked like it came straight out of a finetune. So i think it's almost ready. At least one of the models is.
Imagine if it is tomorrow tho? Imagine. Lol! A day away.
@dull star Yeah. 2B would still be amazing tho. I think it will be way better than SDXL (If you use T5 with it.)
Don't you think so?
now imagine if SAI goes bankrupt before july
leaks:
Idk.. I personally am comparing it to what we already have. And i believe it will be better than that.
I think 2B will know less "things" objects, names for things etc. Though i think 2B will still be competitive to a degree with 8B, because of the T5 text encoder will be able to merge a lot of concepts together to make new ones without the need for LoRas. (We will still need them tho.)
I just think we would need Loras less than before.
But yh. That 8B is gonna be a monster.
if its better than pixart then fine
It appears that SAI have still set a goal to reach with SD3. If it wasn't at all possible, i doubt they would have tried to continue with it. So in my personal opinion, despite everything going on, i think the full training of SD3 is still relatively safe.
I believe it would be, yes. Not sure by how much tho.
The custom finetunes will be pretty insane too, i think.
Yeah. I've seen Ideogram too. Strange it came out at a similar time to SD3's announcement. I've also noticed that Ideogram makes similar images to SD3, in terms of composition? It almost looks as if it's a finetune of SD3's architecture. I don't think it is, because it was available way too early for that. But it seems to work in a suspiciously similar way.
If DALL-E 3 & SD3 are compared, for example, they look nothing alike with the same prompt.
Be calm and wait two weeks...
I'm sure they will recover like FTX
I don't see any similarity at all (3x SD3, 3x Ideogram, same prompt, no Magic Prompt)
it's partially made by the DeepFloyd devs
it's a pixel-based diffusion model or whatever
Ideogram is better than SD3, but finetunes will probably get us closer
lacks diversity, funny that
it's weird how if you give it a prompt and you don't use Magic Prompt, you get very, very similar images
Lykon's Twitter post on the left, my own raw SD3 on the right.
Anime-style dramatic scene featuring a determined white-haired, red-horned woman in dark, glossy armor with red lines, brandishing a fiery red blade, intense battle unfolding with robotic soldiers in matching armor, background ablaze with orange flames and flying rubble, towering gray cliffs frame the action, sunlight piercing through clouds, crisp details, and contrasting fiery and cool color palette.
For that prompt, it's a lot of improvement
but questions questions questions
depends what you look at: one is holding a sword, the other has a sword fused into its arms; one has robot soldiers running around, the other has robot soldiers fused and broken in the ground; in one the girl has arms, torso, and legs, in the other she lost her arm, I think 😢
Still same diff, fused sword, two left arms 🤷
cherrypicking will still be required
New one is so much better, but.... questions questions questions :p
are these beta sd3 people?
who got access to do finetuning or what?
2B finetuned will be fricking sick if true
🤔
All the hopes up yes
8B 😬 ehhhh
But SDXL has 3.5B...
I was disappointed by the news about the 5090, only 28 GB of VRAM...
LMAO
yes lol
That's why it's called SDXL and not called SD3
I want SDXXXL
I just went through 18 generations on sd3 with the following prompt and it could never do all of the letters. "8b is ice too" in letters made out of ice. Golden hour, in the arctic.
The closest it got
Okay. Here you go: ||https://civitai.com/models/167764/sdxxxl|| (NSFW model)
8B is ice too

that's why the tweet has 4 images :p
It will also be interesting to see what the new model from Ideogram will be like
Ideogram seems to have stepped on the brakes. First I noticed a message "25 free prompts become 20", and now when I try to generate an image, it says "our slow queue is currently full, please upgrade or try later"
Totally and I realized that after I saw how hard it was having it. Kind of weird when it has done longer stuff before
Yeah I saw that. In their favor, the image quality has been slowly increasing since they launched. I just wish they had an upscale option. I've been a paid user since the beginning since I wanted the private generations.
I understand fully, it's just so noticeable at the same time: I never got that slow queue before, and now it's reduced free gens and a slow queue.
Ideogram is really curious, the composition of the gen is always the same, something odd must be going on there. Seems like it does your regional prompting workflow
That's a good call out. I generated a lot of images to use for training and even though the details of the subjects changed, they were always the same pose. Some people have referred to that as overtraining, where there's little change from seed to seed
maybe, another explanation could be ideogram is stacked models, a lowres one for composition, which then works much like a controlnet to guide the high res one. Either way, I'd be surprised if it turns out to be just a single pass
It's so crazy literal with prompts. I had some really weird ones, and it just bends the image until all the objects are in it. Unlike DALL-E 3 (and SD3, much more so), which puts aesthetics first and seemingly ignores things it can't fit in.
All of which makes it a great additional model; it behaves so differently.
They really do have the secret sauce on actions. They have actions that no one else has, or will do because of censorship. Concerning stacked models, that sounds like most of what I'm doing these days: PixArt/ELLA/Hunyuan refined with SDXL.
maybe the model behind the SD3 API was never the 8B model 🤔
and they generated all these pics with 8b
Yeah, based on what's been said before, it's definitely not as good as the current 8b
Plus they have upscaling etc internally. The api is raw 1300 res
1300?
yeah it's 8B, just not the current version
it cannot be
oh i did not know that who of them worked on it?
no idea
did it get announced? I thought 32?
they cant burn infinite investor money lol
they plan on making a new one? ๐ฎ
that's how Cascade works lol
only 4 GB more, that is nothing 😭
NVIDIA GeForce RTX 5090 rumored to feature 448-bit memory bus. The upcoming flagship Blackwell graphics card is said to feature a 448-bit memory bus, out of the 512 bits available on the GB202 graphics processor. This is a new rumor suggesting a different memory configuration than previously discussed. The GB202 Blackwell GPU is the flagship […]
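The 28 GB vs 32 GB numbers follow from the bus width. Each GDDR memory chip uses a 32-bit interface, so a 448-bit bus implies 14 chips and the full 512-bit bus implies 16. A rough sketch, assuming 2 GB modules and no clamshell (two chips per channel) configuration, neither of which the rumor confirms:

```python
def vram_gb(bus_width_bits: int, chip_density_gb: int = 2, clamshell: bool = False) -> int:
    """Estimate total VRAM from the memory bus width.

    Assumes each GDDR chip has a 32-bit interface, so the chip count is
    bus_width / 32; clamshell mode puts two chips on each 32-bit channel.
    """
    if bus_width_bits % 32:
        raise ValueError("bus width must be a multiple of 32 bits")
    chips = bus_width_bits // 32
    if clamshell:
        chips *= 2
    return chips * chip_density_gb

for bus in (384, 448, 512):
    print(bus, "bit ->", vram_gb(bus), "GB")
```

The 384-bit case reproduces the 4090's 24 GB, which is why a cut-down 448-bit bus lands on 28 GB rather than 32 GB.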
😭 man, they didn't upgrade VRAM for 4 years and then they only upgrade it by 4 GB.
wtf man
I'll be amazed if they truly are only bringing vram up that much.
Leaves dangerous room for competitors to outshine them.
I mean tbf, what competitors does Nvidia have?
Not really anybody
Currently not a whole big push to outcompete, but that always only lasts so long with anything
28GB is more than enough for gaming, while not enough to compete with nvidia's server GPUs, that have a much higher profit margin
Nvidia's main competitor is itself...
cannot let plebs get high vram easily
Just get a random AI GPU that was used for crypto before
Hopefully AMD or Intel or some chinese outfit will eventually release a reasonably priced GPU aimed at AI hobbyists, but it looks like it will be years before anyone catches up with nvidia
AMD seems to be leaning away from that and focusing on mid-range gaming cards
For Intel, I don't know much, but I think they have been making a few
I might still get a 5090 if the performance increase is good. Even if it can't run bigger models than the 4090, if it can generate images substantially faster, it would be nice. AI art is all about iterations
A closeup of hyper detailed Cookie Monster face, with fiery yellow eyes and an angry expression. The background is a dark gray with sparks flying around him. illustrations, comic art, and cinematic light effects. Dark fantasy setting with smoke. a dark tone with focus stacking
this is 2B with highresfix I assume
and if this is really just 2B with highresfix I'm excited
this is good enough for a base model AS LONG AS WE GET VARIETY
@low stone
if it weren't highresfix this would be a distorted blurry face (I zoomed into this image)
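Roughly, hires fix is a two-pass workflow: generate at the base resolution, upscale, then run img2img at the larger size with a partial denoising strength so only the later steps are re-run. A minimal sketch of the planning arithmetic, assuming sizes must be multiples of 8 (the VAE downsampling factor) and mirroring, as we understand it, how diffusers' img2img pipeline turns `strength` into a starting step. `plan_hires_fix` is a hypothetical helper, not a real API:

```python
def plan_hires_fix(width: int, height: int, scale: float, steps: int, strength: float):
    """Return the second-pass size and the denoising step indices it will run."""
    # Round the upscaled size down to a multiple of 8 so it maps onto the latent grid.
    hi_w = int(width * scale) // 8 * 8
    hi_h = int(height * scale) // 8 * 8
    # img2img skips the first (1 - strength) fraction of the schedule.
    init_steps = min(int(steps * strength), steps)
    start = max(steps - init_steps, 0)
    return (hi_w, hi_h), list(range(start, steps))

size, run = plan_hires_fix(1024, 1024, 1.5, steps=30, strength=0.4)
print(size, f"runs steps {run[0]}..{run[-1]} ({len(run)} of 30)")
```

A low strength keeps the composition from the first pass and only re-renders fine detail, which is why faces come out sharp instead of blurry.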
Yep, like I was talking about the other day when people were thinking they'd magically be able to train their waifu generators with the 8b model lol
I wonder if lykon is teasing 2B because it'll come tomorrow (May 31st), or if they are just trying to say that "look how good 2B looks despite being smaller"
probably the latter lol
but yeah, 2B will be trainable offline, so my interest will rise if that's really the case
You can't train 8B.
I know...
But they can..
8B won't make LoRA training viable even on 24 GB, sadly
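Why 8B is hard to train at home mostly comes down to bytes per parameter. A back-of-the-envelope sketch, assuming bf16 weights (2 bytes/param) and, for full fine-tuning, a common mixed-precision Adam setup of about 16 bytes/param (bf16 weights and gradients, fp32 master weights, two fp32 Adam moments). Activations, text encoders, and the VAE are ignored, so real usage is higher:

```python
GIB = 1024 ** 3

def full_finetune_gib(params: float, bytes_per_param: int = 16) -> float:
    """Weights + gradients + Adam states under mixed precision (~16 B/param)."""
    return params * bytes_per_param / GIB

def lora_base_gib(params: float, bytes_per_param: int = 2) -> float:
    """LoRA still has to hold the frozen base weights (bf16 = 2 B/param)."""
    return params * bytes_per_param / GIB

for name, n in (("2B", 2e9), ("8B", 8e9)):
    print(f"{name}: full fine-tune ~{full_finetune_gib(n):.0f} GiB, "
          f"LoRA base weights alone ~{lora_base_gib(n):.1f} GiB")
```

For 8B, the frozen weights alone take about 15 GiB before any activations, which leaves little headroom on a 24 GB card.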
I'd love to train concepts and stuff so badly for SD3
I hope the prompt adherence is at least better than PixArt Sigma
wait wtf?
so these were just upscaled with like ESRGAN or SwinIR or whatever?
From what I've seen on artisan, they're using sdxl turbo to do a lot of the upscaling heavy lifting. One hopes that switches to sd3 when it's ready. I have faith that a 2b sd3 would still be great because I've been using Ella/pixart/hunyuan with tiny models and the upscaler output is fantastic. You add vastly more training even onto those little models which sd3 has, and it'll be great.
One of the biggest issues with upscaling those other SD3 competitors with SDXL is that SDXL doesn't understand multi-subject prompts, so you often lose character-specific traits as the upscale stages progress. When you can upscale with SD3, which does, great things are gonna happen.
I'm still concerned about 2B, it's not enough for creating complex concepts
really?
Everything Lykon shows is beautiful, but I don't see any complex scenes in his examples. Detailed faces aren't impressive anymore.
Right, but someone was trying to argue it the other day. I don't think it was you.
So all of these were created with Ella which is a tiny model which uses another tiny model. So for me, size isn't as important as concept training.
I don't think we're gonna get DALL-E-level concepts, even with finetunes. After doing some LoRA training and seeing how easy it is, I realized that Stability doesn't have the money to train all the concepts, and the community doesn't have the interest in training anything other than mostly porn.
I was kinda put off when I saw that lately the majority of Reddit models and Civitai stuff has moved to Pony derivatives: something that goes back even further than SDXL, making it into a better-looking SD 1.5 with single-word tags again.
I'd be very happy to be proven wrong
DALL-E can do insane concepts... Ideogram beats it for actions, but DALL-E can do transformations better than anyone
And I don't see sd3 ever catching up to even current day dall-e.
And throwing away all the fine details the new VAE allows
I have to agree... And it saddens me greatly
The ELLA paper said the limitation was the captions, which just weren't good enough. SD3 used the same VLM for its captions, so I'm not expecting much improvement either
But hopefully it will at least understand longer prompts compared to the API
Pony (the most popular SDXL finetune) already stated they are intending to train on 8B so already he's wrong. So yeah, another way to try and dodge releasing the 8B because "heh its not like you could use it anyway!!!"
Are they ever just going to admit that they're keeping it API only because it's the good one?
Absolutely. It saddens me that my cosxl raw or cosxl merges always look way worse than just regular sdxl models. It means I can't upscale with the better color range.
And hyper sdxl doesn't work with cosxl models.
That said, I just ran a bunch of complicated prompts against SD3 and it beat PixArt, Hunyuan, and ELLA for prompt adherence every time. So even with a 2B SD3, it'll be better than what we have now.
hopefully yes
Cookie Monster with robot arms, puppet hands, a stone head with ruby eyes, dancing on the moon over a massive pit of garbage. Sd3 pic / pixart / Ella
Ideogram ^^
A detailed anime scene featuring a young woman with cropped white hair and luminous yellow eyes; she has prominent black horns extending from her head. Dressed in a tight, advanced black suit with glowing orange patterns, she stands amidst large-leafed tropical plants that shine under the bluish hue of bioluminescent bubbles and moonlight.
I don't know if you used turbo, but I feel like I was able to get sharper lines out of sd3
boo hiss!
Do they have at least a dozen or so A100s and probably three or four months' worth of time? Since SD3 is a DiT-based model, you can easily look over at the LLM communities to see what kind of resources it takes to train a small LLM in the 8B range.
Current dalle is shit
You don't need a big model for complex concepts. The TE (text encoder) decides whether you can do that
They're doing the best they can given the circumstances. 3/4 of his comments on twitter at this point are just fending off attacks, stemming from comments by people who aren't with the company anymore.
yeah, and gpt4o dalle is going to be nuts in comparison. i'm almost a little scared at what it'll be capable of compared to everything else we've seen so far.
meta has realtime diffusion and then 3d modeling of that scene and animation, on demand. there's no way the new dalle won't have all that and more.
It feels like OpenAI makes their models look less real on purpose
Real time is just turbo
has the model been released?
look how long it has taken in the past from API release to weight release... there is a pattern. You will not see any "model" before July.
as you can see, the copium has moved to July; after July it will move to August or September
"SD3 is like a greasy piglet: it defies anyone to grab a hold of it!!!"
Schrödinger's SD3, it's here but also it isn't
I'm not among those worried that SD plans to pull a bait-and-switch and keep SD3 closed, everything indicates that they are actively working on the model and the weights will be released when ready. But I am concerned that the longer it takes, the higher the risk that SD runs out of money before we get our hands on SD3.
movie poster, Triangle-shaped food wrapped in green bamboo leaves, known as Zongzi, glutinous rice dumpling, exquisite plates, beautiful cutlery, Pop mart style, Pixar style, Depth of Field, Ray Tracing, Front View, Intricate Details, Unreal Engine, Octane Rendering, Best Quality
People casually ignoring Lykon confirming that everything will be released open source, as has been confirmed many times. 🤦‍♂️
yeah lmaooo
OMG SD3 CEO COMING
my CEO is not a liar, he may be a cheater, tax evader, gambler, deceptive, but never a communist
😭
Wasn't it supposed to be released in May? Why don't they give a date instead of saying "soon" :/
it will come out one day
Yep, hoping "soon" :/
What's the best interrogator for SDXL, in your opinion?
They probably don't know and don't want to guess after their initial predictions didn't pan out. I'm guessing that either the training loss has a different trajectory than they were expecting due to the new architecture, or the training methods didn't work well enough and they had to restart a few times to make tweaks. They could even be making tweaks to the architecture itself. I believe all this means that, on release, the performance should be much better than it was for the version analyzed in the paper.
Ok :/ so we can wait a bit longer, I was thinking the 2B was almost ready
It could be as little as a few weeks, I don't know. But I imagine that they will need to do things like safety and performance testing after the training is done, so it won't be as simple as train and ship.
it's coming, just give more money to SAI
😱
fk 2B i want 8B
Stop hinting!
Keep your hints in your pocket Emad.
I want 16B too and 32B
I want flying cars and a private resort on Mars.
8B will require huge VRAM... most of which this community will never own, let alone have access to!! 2B/Twobee doobee doo will suffice!!!!
And a diamond pony!
2B = 2 Weeks, it all makes sense now, the numbers don't lie
2 weeks from now, or in 1 month?
Macs will be able to handle it with no problem. I just want to know what the resolution scaling will be like, since I like to generate at 4K sizes. I can do that with SDXL and hires fix, but I'd like to be able to do it in SD3 8B.
yes what about the rest of us numerous windows peasants?
2 weeks in venus time

You can use 2B, buy a new 5090, use 8B quantized and accept the quality loss, use the API, or buy a Mac. What are you expecting, magic?
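The "use 8B quantized and accept the quality loss" option is mostly a matter of bytes per weight. A rough sketch of weight memory alone, assuming weights dominate and ignoring activations, the text encoders, and the small per-block overhead (scales, zero-points) that real quantization formats add:

```python
BITS = {"fp16/bf16": 16, "int8/fp8": 8, "4-bit": 4}

def weight_gb(params: float, bits: int) -> float:
    """Memory for the weights alone, in decimal GB."""
    return params * bits / 8 / 1e9

for fmt, bits in BITS.items():
    print(f"8B @ {fmt}: ~{weight_gb(8e9, bits):.0f} GB | "
          f"2B @ {fmt}: ~{weight_gb(2e9, bits):.0f} GB")
```

Halving the bits halves the footprint, which is how an 8B model could fit on cards that cannot hold it at 16-bit, at some cost in quality.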

I expect consistency.
If 2B can do consistency, I am happy.
to be fair all this shit is magic really
I think the recent posts were to reinforce that 2B still has impressive quality, even if it's not as capable as 8B. I think that is good news. Not sure what you mean by consistency.
5 years ago, would you have thought you could type in whatever and get an image?
consistency means i can recreate any part of the image i desire with 100% consistency between generations
To me it kinda did the opposite. Compare the wizard to the earlier wizard announcing SD3 and the API, and the ice-block words to what the API can do; it's a bit of a step backwards
Are you talking about inpainting? What "part of the image"?
If you're expecting no quality difference, then… 🤷‍♂️. But I haven't seen a direct comparison between the models on the same images. I thought the ones Lykon posted looked great.
I want to be able to select a certain part of the image (like FaceDetailer can select faces), and there are ways to select a character or object easily in ComfyUI
and then I want to be able to "lock" that part somehow in the prompt and recreate it in subsequent generations
maybe even select it with a selection tool like in Photoshop
It sounds like you're asking a lot. I don't know how that would be possible.
and tell the AI to only change the angle, position, and lighting for that part
and leave the shape and texture alone
hence you can composite any scene, and storytelling becomes possible
If you need part of one image inside another, it would be better to mask off and do inpainting or outpainting.
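The "lock this part" idea is close to what masked inpainting already does: at each denoising step, the region to keep is overwritten with the (appropriately noised) original, and only the masked region is free to change. A toy sketch on plain 2D lists, assuming mask value 1 means "regenerate" and 0 means "keep"; a real pipeline applies this to noised latents at every step, not to finished pixels, and `blend_keep_region` is a hypothetical helper:

```python
from typing import List

Grid = List[List[float]]

def blend_keep_region(generated: Grid, original: Grid, mask: Grid) -> Grid:
    """Per-element blend: mask=1 takes the new generation, mask=0 keeps the original."""
    return [
        [m * g + (1.0 - m) * o for g, o, m in zip(g_row, o_row, m_row)]
        for g_row, o_row, m_row in zip(generated, original, mask)
    ]

original = [[0.0, 0.0], [0.0, 0.0]]
generated = [[1.0, 1.0], [1.0, 1.0]]
mask = [[1.0, 0.0], [0.0, 1.0]]  # only the diagonal is regenerated
print(blend_keep_region(generated, original, mask))
```

This keeps the masked-out region pixel-identical, but it does not re-pose or relight it; changing angle or lighting while preserving identity is the harder part being asked for.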
well relighting a scene without changing the shape and texture has been achieved
also AI is starting to understand 3D there is text to 3d so lets go
so I hope soon you can select an orange, somehow lock it down, and reuse it in other generations.
until this is achieved all this is just fun one off images
no stories no comics no movies nothing
until this is done
Character/object/scene consistency, I think, would be better achieved by LoRAs and stuff like that. It's too much to expect from t2i, because a picture is literally worth a thousand (or more) words.
sadly LoRAs don't achieve it either
This was with Cascade....
Prompt was: blocks of ice forming the words "Put it on ice" in a frozen arctic region on a sunny day
🤣
How would you describe the subtle differences in facial features that differentiate one blonde babe from another? If you don't know how to put it into words, you can't expect the model to read your mind. Unless you overfit during training so every "blonde girl" becomes the same blonde girl.
well, ReActor can give pretty similar faces
or you can use IPAdapter to kinda give the same outfit
not a stretch to think that when you select an orange and ask the AI to "lock it", it would create some kind of template like that in its mind and recall it in the next generation
like now you can have real-time LoRAs with a few images in a folder, or a ReActor face with just an image to guide the AI
why can't the AI make a mental note and keep it in its mind instead of us making a folder and putting the images there
Still trying...
I think I'm at the point where I'm going to stop using the SD3 API until after public release. Maybe it's because I've just gotten my workflows so good or something, but now the local models PixArt/ELLA/Hunyuan are all putting out pics that are better looking and more prompt-adhering than the SD3 API. The stuff Lykon is posting on Twitter, even from the 2B, looks massively better than what I've been getting out of the SD3 API lately.
SD3 vs. Hunyuan. On more and more stuff, SD3 is barely prompt-adhering at all. SDXL base is giving me better images at this point. Half of me wonders if they're testing the 0.5B on their API right now. It feels like it got worse recently.
We need refiners and LoRAs and DPO and perturbed attention guidance and turbo and ControlNet and IPAdapter and LCM support
after that SD3 will be better than those other things
I think it's just that at first SD3 was new, wow, much amazing, try all the nice prompts on it
Now SD3 has been around, we've tried all the nice prompts, let's try something new
maybe I've just been spending too much time with it... but even just the concepts have recently fallen off a cliff.
oops, SD3 doesn't do so well with something new 😢
Maybe prompts are getting too complex
it needs to be refined with sdxl
countless school buses for countless children in a world where only school buses and children exist
look what I got for this prompt.
that's just embarrassing
reminds me of early sdxl
yeah exactly
My main issue is, i'm not sure what's teased on twitter is better, yes, it's more trained, less broken gens
but an image with that many small people isn't showcased
people literally call out lykon for only posting simple images, and he responds back with 3 people holding little signs. He literally created dreamshaper which can do amazing stuff. Makes me feel like he's purposely avoiding.
then again he's posting on twitter for a high profile company, so he's probably limited in what he wants to post.
Yeah, in one reply he said marketing would post prompts, obviously that means he isn't allowed to
@low stone it might be more than just SD3 getting "old". Have you tried old prompts? I noticed two I tried are getting different results now #sd3 message
could be just unlucky rerolls 🤷‍♂️
Or not
Needs boring school bus lora.
ideogram+sdxl
@low stone https://github.com/lllyasviel/Omost
automatic regional prompting basically
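In the MultiDiffusion / "latent couple" style, regional prompting runs the denoiser once per regional prompt and merges the predictions using per-region masks. A minimal sketch on flat lists, assuming non-negative mask weights and a global prompt with a uniform low-weight mask to cover the whole canvas. It illustrates the general technique only and is not a description of Omost's internals:

```python
from typing import List, Sequence

def merge_regional(preds: Sequence[List[float]], masks: Sequence[List[float]]) -> List[float]:
    """Mask-weighted average of per-region noise predictions, element by element."""
    merged = []
    for i in range(len(preds[0])):
        total_w = sum(mask[i] for mask in masks)
        if total_w == 0:
            raise ValueError(f"position {i} is not covered by any region")
        merged.append(sum(p[i] * m[i] for p, m in zip(preds, masks)) / total_w)
    return merged

# Hypothetical 4-"pixel" strip: left half "lion", right half "elephant", global prompt everywhere.
global_pred, lion_pred, elephant_pred = [0.0] * 4, [1.0] * 4, [-1.0] * 4
masks = [[0.2] * 4, [1, 1, 0, 0], [0, 0, 1, 1]]
print(merge_regional([global_pred, lion_pred, elephant_pred], masks))
```

Each region's prompt dominates where its mask is strong, while the global prompt keeps the seams coherent; this per-step merge is repeated across the whole sampling loop.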
hah i just saw that. first couple of examples, ok, that's kind of neat. by the last example where they've got seriously complicated images going on... really impressive stuff.
this isn't as simple as ELLA-SDXL, but it probably gets to the same place in the end.
but it looks kind of complicated to get going.
ooooo they have a demo space
Try the new tool here:
https://app.pixverse.ai/create/video
see this is what im talking about but for images
select something and lock it, keep it as is in subsequent prompts
It starts to compartmentalise the photo, like a grid!!!
first result from it is meh
no book, no monks, no shark in a robe.
hunyuan / Omost / sd3
it shows promise.
the only downside, is that the llm piece where it's writing out the "code" for the renderer, takes a LONG time. it's a very large amount of tokens to generate.
I guess the upside is that it probably works with any sdxl model instead of something like pixart or hunyuan where you have to train something new.
How can it accommodate so many tokens ... way more than the usual 77?!
I'm assuming that it's working on the image in stages instead of trying to throw all that at once.
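On the 77-token question: CLIP's text encoder has a fixed 77-token context (75 content tokens plus start and end markers). A common UI workaround for longer prompts is to split the token IDs into 75-token chunks, wrap and pad each one, encode them separately, and concatenate the embeddings. A sketch of just the chunking step, assuming the standard CLIP start/end IDs 49406/49407 and end-token padding as in the SD 1.x tokenizer. Whether Omost does anything like this is not established here; it may instead work in stages as suggested above:

```python
from typing import List

BOS, EOS = 49406, 49407   # CLIP's <|startoftext|> / <|endoftext|> token IDs
CONTEXT = 77              # fixed CLIP text context length
CHUNK = CONTEXT - 2       # room left for content tokens after BOS/EOS

def chunk_prompt_ids(ids: List[int]) -> List[List[int]]:
    """Split content token IDs into 77-token windows, each padded with EOS."""
    chunks = []
    for start in range(0, max(len(ids), 1), CHUNK):
        body = ids[start:start + CHUNK]
        window = [BOS] + body + [EOS]
        window += [EOS] * (CONTEXT - len(window))
        chunks.append(window)
    return chunks

chunks = chunk_prompt_ids(list(range(1, 161)))   # 160 hypothetical content tokens
print(len(chunks), [len(c) for c in chunks])     # 3 windows, each 77 long
```

Each window is encoded on its own, so tokens in different chunks never attend to each other; that is one reason very long prompts can lose coherence even with this trick.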
Most likely...
Very cool idea
https://huggingface.co/spaces/lllyasviel/Omost you can also just try it out here if you want to see how it does with your prompt.
I have tried the online space; it eventually says "cannot find a free GPU!"
i.e. you've had your turn, now over to somebody else!
Here is what the online space generated #sd3 message
"a young lion sleeping on top of an elephant with a giraffe in the background in thelush berlin zoo" is too much to ask for I guess
And which model does it use for image generation?
it's so weird, Omost has a DALL-E 3 feel to it
I wonder if DALLE3 is just a pipeline that uses an SD3-like model and does some weak regional prompting
wait wtf the local version only takes 8GB of VRAM
Change the lion into a tiger cub, add a giraffe
Oops I didn't mean to ping you @sullen moss
No problem
