#🆕|sd3
I mean, keep in mind it's going to be one of the smaller models as per another developer
Hopefully the 2b one, but I feel some ppl are going to be disappointed when it isn't the 8b one
I will be. After working with pixart though, I understand that people don't want near 1 minute render times for first ksampler on a 4090.
Rendering with an llm text encoder is slllooowww
That said I think that was the power of cascade, first stage really low res and then just upscale that
I'm doing that now with PixArt and ELLA. 512 first stage, then upscale
I've got a 4090 and I'm still doing that
2B is 512 anyway
yeah... all depends on how big the diff really is
if it's like pixart 512 vs 1024, no biggie
if it's like sd15 vs sdxl, or sdxl base vs a sdxl finetune for concepts, then well, shit
Not gonna lie, the phrasing of that tweet is a lil weird
"as far as I know"
tbf the base images shared by lykon on his twitter all look like an sdxl finetune
or better
i'm not talking about the quality of the image
They don't look any noticeably better than an sdxl finetune but I'm still keeping my hopium up
if you're patient and have a good finetune, you can make some really good looking shit even in sd15
true
it's about the underlying architecture and understanding of concepts, especially ones it wasn't trained on
Some 1.5 finetunes look better than xl finetunes
idk about that, but point is, image quality is a poor assessment of the capabilities of a model
ability to follow a prompt is important, but yeah, even more important, concepts
can you use it to make a shark mouth so huge that a city has been constructed inside of it?
shit like that
can you use it to make a spider with bat wings holding onto a witch (like someone on a motorcycle) flying on her broom?
etc etc
it does look like dreamshaperxl
At the very least they look better than the api
it definitely will
he is using a local version
which is more up to date, uses more steps and uses highresfix
its gonna look better
api is watered down
and just raw result
unfortunately the 8b, 4b, 2b and 800m versions aren't cross compatible
So more fragmentation yippee
I mean, the dev team seems hopeful someone will make an x-adapter thing so that they are cross compatible
I'm gonna be so disappointed if 4b or 2b catches on and people finetune it to make it look like 8b while 8b is left to the dust
@dull star so that made me look at that series of tweets. I've been pretty optimistic about this whole thing, but working on the open release version, implies there's a closed version. Yay shitty low end model for the people 😥 https://twitter.com/Lykon4072/status/1791839648987156525
@cibernicola_es Since my team is working on the open release version, it would be odd if we end up not doing it.
Well yeah. They said they were going to release one of the smaller models first, and then maybe one day release the bigger models
big model never was what we were told
But this makes it seem more like only smaller models, not just first. Time will tell.
tbh, seeing what pixart does with .6b params, comparing that to what sd3 produces, i wonder why 8 are needed.
Maybe someone can direct him to this channel if he wants "MORE sd3 images". wth is even this statement lool. seems like needless fanboying. please when sd3? emad said it was "coming shortly" 3 months ago. i aint mad they got him outta there tbh. the fact that stability doesnt have a leading website in ai gen like leonardo nd co is just insane. wth is clipdrop and that stabledream thingy. Everything runpod is offering, stability should've had a better version of that by now. its crazy how far behind they are in that space.
yes, the plan comfy shared was that 8b would not be released to the public
nah they said all models will drop
not 8b
I do remember him saying that 8B will come later, but with a stupid ass reason
Source?
do you have a link?
Lykon said all models will drop at one point
it was a long time ago in ai years somewhere on here
honestly
i don't think any of it tells us much of anything
i'm not that worried about this. the difference between 8b nd 4-6b ones isnt that great from what i've heard. besides most ppl dnt have the resources to run the 8b one
i'm inclined to believe even top SAI employees don't know where this will all end up
idk about the fact that it won't come out at all, but I definitely know that 8B comes later
unless they're ready to release before any negotiations for a sale begin
true, most ppl don't... i do though as do a lot of ppl on this discord
so we're going to care
probably not.
you'd need at least a 16gb card to run 8b anyways
won't be able to train 8b on 24gb vram, sure, but it's not like ppl aren't renting time on a100s anyway
Maybe 12 if you do some offloading
fair enough, but i dont though i'm not a high roller like y'all lool
I'm probably getting a 4090 whenever the 5000 series drops
either way, sdxl is what... 3.5B?
yes
not rich but don't spend on much of anything these days
Though that sale "news" seems nothing but rumours from an over eager reporter as well, no mention of sources anywhere.. if you keep repeating that every month, it'll be true once 🙂
SDXL can run in less than 4gb using fp8
got bored with eating out, got bored with drinking, and i have a big oled in the basement so i don't go to theaters or anything like that anymore
all my sports teams suck and i lost interest years ago so i don't spend on that either
adds up to a crazy amount of money every year
say on a weekly basis, you bought a coffee for $5, ordered a single shitty pizza for $10, and had a beer and a burger on saturdays at a bar conservatively for $20
that's a 4090 every year with some change left over
sdxl finetunes are still somewhat undertrained for its parameters
unlike the thousands of 2gb 1.5 anime merges
FP8 scales linearly with n-b of parameters
SD3 8b would be about 8.5 or so gb in size
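The size estimates above follow from bytes per parameter. A minimal sketch, assuming weights only (text encoders, VAE and runtime activations come on top, which may be why the figure quoted here is a bit above 8) and using the parameter counts mentioned in this chat rather than official numbers:

```python
# Rough weight-file size: parameters x bytes per parameter.
# Parameter counts are the figures quoted in the chat, not official specs.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "bf16": 2.0, "fp8": 1.0}


def weights_size_gb(num_params: float, precision: str) -> float:
    """Return the approximate size of the weights alone, in decimal GB."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


models = {
    "SD3 800M": 0.8e9,
    "SD3 2B": 2e9,
    "SD3 4B": 4e9,
    "SD3 8B": 8e9,
    "SDXL (3.5B as quoted)": 3.5e9,
}

for name, params in models.items():
    fp8 = weights_size_gb(params, "fp8")
    fp16 = weights_size_gb(params, "fp16")
    print(f"{name}: ~{fp8:.1f} GB at fp8, ~{fp16:.1f} GB at fp16")
```

Size scales linearly with parameter count at a fixed precision, so halving the precision halves the file.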
did you see the emad tweet?
i'd think he'd know what was going on, being the primary stakeholder... but who knows
the sarcastic one?
You'd need at least 12gb for the 8b parameters
That seemed more like Emad being Emad and tweeting without thinking. He's posted denials like that last year as well, and here we are
could be^
I still hope for 8b releasing and people making LoRA's/finetunes compatible with 8b
From Lumina paper, it claims bigger model would converge faster.
this is a good point too
That's going above my head. Yes, i understand the meaning of "converge faster" but totally opposed to that pixart is in a similar league as SD3 for many prompts, and it is trained way less than SDXL. Either SAI skimped on training SD3 (why would they) or there's more to it
interesting
it's also entirely T5
don't know how much of a diff that makes
Not that pixart is the be all end all, but if i wasn't told how small it is, i'd never have guessed. It might be at the limit of what's possible for that architecture size, but it's impressive
agreed
Kind of a lot in some cases. Since you're mapping better language semantics with t5, they can map to clusters of concepts in the model faster, which would mean faster training of the models. Versus it kind of having to wiggle its way through in a more brute force approach due to having a less natural comprehension of the prompting
I need to say something dumb. How good the output quality is depends on the aesthetics of the training dataset. How diverse the image output is depends on how diverse the training dataset is. The Pixart training dataset is focused on high quality and is so limited that it can't produce output as diverse as sd3's. For example, accurate text.
it does also behave bizarrely with samplers and schedulers we're used to... hands in particular
festivalman noticed recently using the supreme sampler helped a lot with that
My experience is kinda different. Except for text, SD3 and Pixart seem to struggle just as much. Maybe what I think of weird concepts is either too weird or rather normal (as sometimes it works, sometimes not). Plenty times pixart doesn't grasp a prompt, rather frequently neither does SD3 (I expected much more from SD3, to me it feels like slightly better SDXL, prompts where I thought SDXL should be able to do it, but needed much tweaking to get it, SD3 does at once, beyond that, it's often underwhelming to me)
This sounds like limited T5 on sd3
Probably is, but that is what it is now, hopefully it'll be better in the final version. (I just can't wrap my head around the fact that none of that has become obvious in the development phase, like did they really only test prompts like "gorgeous girl in a cafe" and was "three bottles, first red, second blue, third green" the farthest their imagination went? Didn't they try to break it, find the failure modes?)
its weird how even with the clip models alongside T5, SD3 still has the best adherence in my experience
Doesn't hand quality really depend on the training data and resolution in particular
I also thought it was primarily the vae that fucked up the hands
it does, but it does respond differently to samplers
for example... res_momentumized results in something resembling cfg burn at high step counts, which i've never seen anywhere else (sd15, cascade, sdxl)
Errrm Guys i have a question 
why doesn't sd3 have a cfg scale? and a refiner?
also why sd3 renders hands pretty bad like most of the times the hands are poorly drawn
i thought sd3 won't struggle with something like this
i love it! despite that criticism, i love it still can behave like clip, and happily eat wordsoup, then spits out amazing pictures. My fondest prompts are like that. I'd actually prefer it if it optionally can be even more like clip at times, better cloning styles from artists and such.
I wonder how much smarter it would be with T5 only or T5+ClipG
or if it makes no difference lmao
still api only, so not many settings, might well be the rectified flow part plays bad with cfg scale, or not, i know nothing about it. the hands, no one knows, it's all speculation here, the model in api is presumably an undertrained one, but surely a work in progress, it might get better at it
hunyuandit paper tested and the result is clip + t5
but that is a special english-chinese model and a smaller T5 iirc
is sd3 t5 XXL?
It's more of a parameter issue in models. Think of all the wrist angles the hand can be turned in, now think of how many angles each finger can be in, along with how many angles each knuckle can be in. Now picture all the non-hand things that can be in proximity to them, like holding an apple or grasping a lever, etc. It goes really exponential really quickly. Now imagine all of the training data you'd need to cover the majority of the combinations they can be in. This is why hands are always a pain.
They could get around that with artificial data using 3d modeling though
So then is midjourney and dalle just massive parameter models?
Correct
Probably model sizes 10x what you can run locally, if not larger
Dalle is probably way up there internally like 80b or something, but I'm just speculating
And probably broken up into different swappable chunks
For different tasks
Ik for a fact MJ has a massive vae alone
i really hope it gets better soon
i have been working on a system that can generate similar images with AI technology anyone wanna try ?
Controlnet can solve the hands problem at the very least
or if you have a cool prompt and want to test it in sd3 pls give me the prompt
like anyone here
but then you also have naiv3 which also somehow generates proper hands
Yeah, it's what I use when I need hands. I just model out in zbrush or pose a mannequin in ue5, then render a perfect depth map. Works pretty effortlessly usually
But I've been doing 3d modeling for over 20 years
Not a chance in hell I'd sit there trying to fiddle around with seeds and inpainting, praying to the rng gods
Total waste of time lol
I don't really have that much 3d experience, the only time I've worked in 3d was with AutoCAD
Which isn't technically 3d modeling but you get the point
but it will be limited each time u want to generate something :[
like you have to do it manually
pixart sigma
Wow
Hauntingly beautiful mixed media collage, influenced by the dark, mystical realms of Camilla D'Errico and Gustav Doré, with the surreal, dreamlike atmosphere of Masayoshi Matsumoto. A daring Russian explorer, fiery red hair ablaze, clad in a worn leather jacket, ventures into the mystical underground cave, clutching a radiant crystal, as shadows dance, concealing ancient secrets, waiting to be unearthed
(SD3 and pixart)
A richly textured, dark fantasy tempera painting, inspired by Craig Mullins and Jose Royo, showcases a dynamic trickster djinn, adorned with ornate, golden chain jewelry that seems to shimmer in the dim light
(SD3 and pixart)
I can live with that. I was scared SD3 models will be 12 gigs or more...
Why shouldn’t they be
but that's fp8
Larger
We should expect fp16/bf16 model sizes
these are SD3 results
it may seem shoddy, but pixart does not even come close to this
with pixart I keep getting weird stuff
and I won't even attempt ELLA as that also uses a smaller T5 and it doesn't work with text either
I just cannot live with pixart only. As good as it is, as grateful as I am for it (like it's also commercial openrail++ too), it just doesn't have the prompt adherence and text capabilities of SD3.
For a 0.6B model though, its still very impressive nonetheless.
can't wait to see how 0.8B SD3 compares to this!
Hey
How do you know which version is available ?
also do you know a good way to fix hand issues ?
because i keep encountering so many issues with bad hand rendering even when using negative prompts
daymm that will take a while
yup
also this is an older version with probably lower step count and no highresfix
so we'll have to see how it performs locally with any step count we give it
realvision or juggernaut
and Lykon (a dev at stability) will probably immediately start making DreamShaper finetune for SD3
yes i was wondering why i can't adjust the step count and cfg scale
it's so weird i didn't find params for it
API probably cheaps out on the step count, but I could be wrong
highresfix also fixes small faces and such
I use it all the time with SDXL
can't wait to see what difference it makes for SD3
yes i know highres
i also know about adetailer
i tried to use it to fix hands but it didn't work as expected
i mean it works with faces
but with hands nope :[
i mean they should make the price flexible based on step count idk
but if a higher step count is gonna give better results i will definitely make it higher
can you give me a prompt to try it out please
hopefully they release 2B by the end of the month
what is the prompt for this
hol on
Photo of Criminal in a ski mask making a phone call in front of a store. There is caption on the bottom of the image: "It's time to Counter the Strike...". There is a red arrow pointing towards the caption. The red arrow is from a Red circle which has an image of Halo Master Chief in it.
How do you know the size of it ? and what size is currently being used because i searched but i didn't find this info
i want to follow these updates

there will be 4 model sizes
800M, 2B, 4B and 8B
2B in my opinion might become the most popular one
andd .... which one is being used in the api right now ? 
8B
i will try it out wait
this is dall-e 3 results :
And this is sd3 results :
that's very bad


that's weird?
Dalle3 looks like the typical results I get with current SD3
and the SD3 images you generated look like the results I'd get with like the earliest available version of SD3
all you need is "ugly" in the negative prompt, or not even that
okay wait i will try
ugly, ugly eyes, ugly face, deformed eyes
Photo of Criminal in a ski mask making a phone call in front of a store. There is caption on the bottom of the image: "It's time to Counter the Strike...". There is a red arrow pointing towards the caption. The red arrow is from a Red circle which has an image of Halo Master Chief in it.
that's still bad 

honestly i thought sd3 base model would be better than this
do you think that the issue might be with the prompt itself ?

no this looks okay to me for a base model result
we always overestimate how good base models are supposed to look
But how is dall-e 3 this good ?
and it's like the base model only
DALLE-3 is a service
(so is SD3 right now, but its cause they need the money)
Stability knows how to make a sexy model, they have Core
Core is super good quality and finetuned and its on the API
SD3 is currently raw and possibly underfed with POSSIBLY low step count...
((definitely not because the model is still in training))
i see and i wish that it develops and competes with midjourney and dall-e3 it would be great for the open source community
meow
question what does sd3 think "iris" is
1.5 read it as an eye, sdxl read it as a flower xD
🤔
Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering
hopefully we can plug and play some of these t5 models
really love the bubbly mold surface on the water
it does reflections well too here
I wish it can be used for serious 100% lifelike images already and maybe movies later on as in real serious good expression
not just gimmicks and mEmEZ
I hope refined SD3 is good enough
as for the tools
yeah...
not yet
i wish upscalers werent so slow and resource hogs
supir devours everything
usually i disable it :/
i only upscale if i come up with something reaaaally cool
i know other upscalers are faster but nothing comes close to supir
In a desolate, post-apocalyptic wasteland, the milk packaging's face sets into a determined scowl as it discovers a hidden, abandoned laboratory, its bespectacled eyes narrowing with determination, its entire body tense with curiosity.
In a hidden, underground garden, the milk packaging's face relaxes into a serene smile as it discovers a tranquil, crystal-filled oasis, its bespectacled eyes softening with contentment, its entire boxed being radiating a sense of peaceful calm.
You guys use prompt assists or just type this all in?
local install or can it be done online somewhere? :0
https://groq.com/ (free but effectively a capped context length at 6k) using https://github.com/lobehub/lobe-chat, can even use things like the new gpt4o and such, plenty of inference services to plug into it
(in this case i just asked "create some scenes of a humanoid milk carton discovering the world. Create 5 variations, make it discover an underground lake, and describe the (human) characteristics of the milk carton" then picked the good ones and iterate on those. turned out carton made it often like cardboard, so changed that)
but what was it supposed to nail. the tire i guess 😂
Just typed it in. Prompt: rabbid anthro Cactus in a straight jacket, foaming at the mouth. In a padded cell. Mentally insane. Surrounded by anthro bubble guards in uniform.
That image would be epic if you managed to get the bubbles 🙂
Free DiT Generation https://huggingface.co/spaces/Tencent-Hunyuan/HunyuanDiT
hunyuan vs ella/sdxl
although I guess I should put that through the refiner as well to see what it does with it.
ok fine that's pretty cool. 🙂
where comfy nodes for hunyuan? 🙂
please to be making comfy nodes kthxbai
@noble coyote https://github.com/city96/ComfyUI_ExtraModels/pull/37
it's happening!
You have ella sdxl??? The author said training ella-sdxl was "prematurely promised to something beyond my control" in his latest blog update about emma. https://wrong.wang/blog/20240512-what-is-emma/
actually hunyuan makes some pretty good stuff too... original on left, sdxl refined on right
nah, just refinement stage.
But it's fully censored and has terrible prompt adherence. E.g. if you want a redhead woman with a blue hoodie and white-haired man with a green tuxedo, it will just throw all the colors in there however it wants, ignoring your instructions. Kind of defeats the purpose of all that extra processing power.
pixart-Sigma with SDXL finish works. Will try it with Ella
this is in the negative prompt on hunyuan.... i'll try it without it.
ELLA seems to work fine too
4 out of 5 images had a blue red girl and a green gray/white man
yeah hunyuan definitely mixes subject details.
At least they seem happy 🙂
it's also worth mentioning that their settings are really for low quality. low cfg, etc. might be made better if we had this local to fool with sampler settings.
now all we need to do is refine it with something that's even less prompt following than any of that.
I don't think my 8Gb RTX 2070 has the right power for DiT?
certainly works well as a composition stage for sdxl refinement
instant present, glass box, tiny universe of bright colorful pixar monsters living and playing inside, christmas wrapping paper bow, cinematic holiday setting, outdoors holiday wooded scenery, cute animated, 8K, hyperrealistic 3D art, ambient dynamic soft lighting, intricate highly detailed, bright pastel colors, extremely high-resolution details, photographic, realism pushed to extreme, fine texture, incredibly lifelike
so hunyuan on right was actually more prompt following than ella here.
so like everything else, good for using it alongside other stuff.
sd3 beat both of them.
really good output here.
(yes sdxl refined, but sd3's composition was excellent)
another sd3 refined output. seriously good
like the prompt 🙂
For now only pixart sigma with sdxl can reach it (local)
nice
really good
it's kind of what I mean, because all of these have different training levels, it's best to throw prompts at all of them simultaneously. 🙂 oh the workflow.
With some more steps it comes close.
And it is a great prompt @low stone, funny little creatures
there's a couple of prompt twitter feeds that are good for this stuff and then I change details (usually includes furry little animals). #PromptShare and i subscribe to a group called PromptAlchemists
sdxl ft only
draw a logo which named River Tree,it is a company name which operating nonwoven machines, use green and black color
hi Rachel? what's a nonwoven machine?
This at least from the alibaba sales site of river tree 🙂
I really wonder what's up with these bots. they spammed all of the channels with the same completely random request. I haven't figured out the point of it all.
what do they gain by endlessly doing it?
Easy explanation: on another server I'm on, a visitor told us he watched a video (chinese) that explained how to generate images with discord. As sometimes they do not know english they just either repeat a request from another person (guy in black robe,....) or just try a prompt. As soon as they see it does not work they leave again. Just simply too many 🙂
No script, real people who follow a youtube (tiktok,...) tutorial. Guess it's from a time the bots were on. On the other discord server they searched for the invitation to the blue willow discord as it was shared there....
Not the smartest way but sometimes reproduction is easier than thinking...
bots are well designed. I don't think anyone able to develop a bot couldn't find a proper way to generate an image. Humans are dumb.
im bot
you are well designed
Is this you 🙀
wtf
Maybe this pixel cat with low vram setting 🙂
They don't act like real people when spamming the same message across multiple channels
Usernames are also autogenned
No they are generated as if you just want to create "any account". And most know that a james123 works better than chinese or other characters
A photorealistic advertising photo of the world's best food delivery man, holding an lunchbox with a dark gray body and green accents as requested. The lunchbox has a compact, rectangular shape with rounded edges and a green handle for easy carrying. The lid features green latches and a circular section in the middle, suitable for utensils. The design is practical and modern, The background is plain white to keep the focus on the lunchbox, white background, front view, wearing overalls and an orange cap smiling at the camera, bright studio lighting, in the raw style, on a white background, isolated from edges, the shot taken with a Canon EOS, using a wideangle lens, with natural light.
how about no
At large .....WANTED
Loll
Mmmmmm
im hungry
chicken hamburger is better
Morpheus King of Dreams - via SD3 api
yo bro how can you use SD3?
oh the API ... damn, i want it on Fooocus already
yeah, big one is seemingly done, they're working on the smaller versions now
so the big one is like llama70b and the small version is like llama8b?
like... they quantized the vectors and such?
like, instead of 1.23234234 , it's just 1.2 or 1
something like that? have they agreed on their differences
or the ex-CEO and the current bosses are still fighting?
"settled their differences" that's the word i was looking for
huh
daily two more weeks post
Diffusion models are different and don't really quantize well. Going from 32bit to 16bit produces almost identical results from a data perspective (maybe if the SD models were like 80B sized, you might start to see a tiny difference between fp32 and fp16). Going from 16 to 8bit will absolutely destroy diffuser accuracy though. Instead, SAI is training different versions for the different model sizes like 2b, 4b and 8b and each will likely be used at the standard 16bit precision. Though their file sizes will be bigger or smaller
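A rough way to see the rounding involved at each precision, as a minimal sketch: it uses only Python's standard library, random Gaussian values as stand-in "weights" (an assumption, not real SD weights), and plain symmetric int8 rounding as a stand-in for 8-bit storage (not the fp8 format mentioned earlier in the chat). It measures numeric rounding error only and says nothing by itself about final image quality.

```python
import random
import struct


def roundtrip_fp16(x: float) -> float:
    """Store x as an IEEE 754 half-precision float and read it back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def roundtrip_int8(values):
    """Symmetric uniform 8-bit quantization: one shared scale, 255 levels."""
    scale = max(abs(v) for v in values) / 127.0
    return [round(v / scale) * scale for v in values]


def mean_abs_error(original, restored):
    """Average absolute difference between two equally long lists."""
    return sum(abs(a - b) for a, b in zip(original, restored)) / len(original)


random.seed(0)
# Hypothetical weight distribution: small values centred on zero.
weights = [random.gauss(0.0, 0.02) for _ in range(10_000)]

err_fp16 = mean_abs_error(weights, [roundtrip_fp16(w) for w in weights])
err_int8 = mean_abs_error(weights, roundtrip_int8(weights))

print(f"mean abs error at fp16: {err_fp16:.2e}")
print(f"mean abs error at int8: {err_int8:.2e}")
```

On this toy data the 16-bit round trip loses far less than the 8-bit one, which is the direction of the claim above; how much that matters for a real diffusion model depends on the quantization scheme used.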
I see. Very interesting.
But each version is using more or less of the dataset they're using to train the models. So the 2b model will not have the entirely of the 8b training set in it, but a fraction of it
Is there a way to optimize all of this?
Is there a way of instead of letting the GAN train itself on only images, could it train itself on concepts?
Like, we can label some things as "this is artistic" , "this is not artistic" something like that
like RLHF
u know
That's what they train already though
Ah I see
Essentially the models store things in extremely high dimensional formats and the training set is what they use to map token inputs to "concepts" in the N-dimensional tensor space
Interesting
It's an extremely complicated topic though, but if you want to learn more from a math standpoint, numberphile and 3blue1brown have some good videos that cover the basics and other math related things they all share together
I'm sure there are other good breakdowns, those two just pop out immediately in my head
Oooh you're right . 3blue1brown that guy is great
i need to learn those maths
i'm just so bad at maths man
I'm good at reasoning and connecting the dots, linguistically and visually
but symbollically and maths and stuff , oof
I can never quite get it... But they always appear when I need something like, if i need to program something that's fast or efficient , i take a look at maths and physics, and then focus on one specific equation or set or related equations
you there still?
sent you an FR 🆗
Well that part is more important if you want to understand the gross overview of how things work. The actual math isn't as important, unless you plan on directly doing stuff with it.
I see.. 🤔 nice
And sorry, but I don't do messages or friend requests on public servers. But I tend to randomly answer questions on here and show experiments with workflows
Oh and sometimes I make anti-waifu memes as well if people keep spamming cliché waifus lol
I see brother, no worries
behold
10 years of AI for this 
x3
this gen by AI?
love
No way. Prompt???
SD3@ClipDrop
SD3@ClipDrop
Prompt = Don Quixote and Sancho Panza as Warrior Astronauts, Geisha, Samurai, Shogun, Teddy Bear Astronaut Warriors, style of eric ravilious, josef vachal, oxana dobrovolska, alenka sottler, catherine abel, victo ngai, vladimir kush, henri rousseau, fauvism, cthulu, peppa pig, spongebob
#1237459938901491852 A Gorillaz stylized 2D advertising photo of the world's best director of photography, holding a camera and smiling.
Hallo
Dragon Ball-D
An ancient, hyper-detailed photo depicting a caveman tinkering with a smartphone inside a damp cave. The caveman, with rugged hair and primitive clothing, is engrossed in the smartphone, which emits a soft glow. The caves walls are covered in ancient carvings and lit by a dim, natural light filtering in. Water droplets glisten on the rough stone surfaces, highlighting the caves dampness. The overall color palette is muted, with earthy tones and slight sepia tint to evoke an ancient feel. The scene is captured from a low angle, emphasizing the contrast between the primitive settings and the modern device.
He has been waiting two more weeks for SD3
He's been waiting since the dawn of time!
For comparison...Pixart-Sigma with SDXL, using the same prompt
Hi, what's the cheapest way to generate with SD3 currently? Or the price is always the same, no matter the service?
Price is the same. Some optimize their workflows by starting with 2 images in lower resolution and as soon as the prompt works as designed they increase resolution etc.
API or Discord should not matter.
Thanks for the info
sdxl (finetunes) cannot bear to make someone not look like a handsome model with lush hair
lmfao

He's becoming a rock
dayum
Legend has it that the Empress Wu Zetian's two most precious treasures, the nine phoenix hairpins emit mysterious light, and Wu Zetian's image is majestic.
All false, sorry.
Not if you add some "dirt" to the prompt 🙂
that looks better
An ancient, hyper-detailed photo depicting a caveman tinkering with a smartphone inside a damp cave. The caveman, with rugged hair and primitive clothing, is engrossed in the smartphone, which emits a soft glow. The caves walls are covered in ancient carvings and lit by a dim, natural light filtering in. Water droplets glisten on the rough stone surfaces, highlighting the caves dampness. The overall color palette is muted, with earthy tones and slight sepia tint to evoke an ancient feel. The scene is captured from a low angle, emphasizing the contrast between the primitive settings and the modern device.
Prompt = Don Quixote and Sancho Panza as Warrior Astronauts, Geisha, Samurai, Shogun, Teddy Bear Astronaut Warriors, style of eric ravilious, josef vachal, oxana dobrovolska, alenka sottler, catherine abel, victo ngai, vladimir kush, henri rousseau, fauvism, cthulu, peppa pig, spongebob
I appreciate their honesty
Makes it easier to report
fr 🙏
crayon drawing of a pretty girl, clouds on the background, very low quality kids drawing
I'm not really sure why you wanted the prompt but there you go
only remembered now cx
Clipdrop.co, $10/month for 40 images a day, which amounts to 1,200/month
that man is a walking ad for that site

I thought there wouldn't be ads in discord
guess I need nitro or something
Wow that's a 7x cheaper price than from stability directly. Wow. Are they hosting sd3 themselves perhaps?
Do they have the ability to upscale it at all?
isn't it 4 images per gen, so effectively you get 10 prompts to play with per day, as they don't accumulate either.
more like he has access to std
I hope that's somehow faster than "dog" slow sdxl on Mac currently
- pixel art, by ashley wood, yoji Shinkawa, Tsutomu Nihei, ink, ink splashes, dynamic pose, demonic toad, red straw hat, holding an instrument, teal skin, cyberpunk aesthetic, white gradient background, prosthetic arm,
Three hyper detailed, desperate people dressed as jesters, standing in a thunderstorm with heavy rain and lightning. One man holds a sign with the text We, another man holds a sign with want, and the third man holds a sign with weights. The scene is drenched, with water dripping from their jester outfits, illuminated by flashes of lightning, capturing the intense, somber mood.
Yes 10 prompts x 4 images = 40 images x 30 days in a month = 1,200 images/month
You get a choice of widescreen, portrait, square or landscape; choice of style and negative prompt ... but no upscaling
Ok so same as api. Turns out the upscaling on artisan sd3 upscales with sdxl turbo, not sd3. So meh. :/
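For anyone who wants to sanity-check the ClipDrop math above, here's a quick sketch. The $10/month price, the 10 prompts a day and the 4 images per prompt all come from the messages above; the 30-day month and the variable names are just assumptions for illustration.

```python
# Rough quota math for the ClipDrop plan discussed above.
MONTHLY_PRICE_USD = 10   # from the chat
PROMPTS_PER_DAY = 10     # from the chat (unused quota does not accumulate)
IMAGES_PER_PROMPT = 4    # from the chat
DAYS_PER_MONTH = 30      # assumption: a 30-day month

images_per_day = PROMPTS_PER_DAY * IMAGES_PER_PROMPT
images_per_month = images_per_day * DAYS_PER_MONTH
price_per_image = MONTHLY_PRICE_USD / images_per_month
price_per_prompt = MONTHLY_PRICE_USD / (PROMPTS_PER_DAY * DAYS_PER_MONTH)

print(images_per_day, images_per_month)
print(round(price_per_image, 4), round(price_per_prompt, 4))
```

So the "1,200 images" figure is really about 300 prompts a month, which is the number that matters if you rarely keep more than one image per generation.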
*50827381923th mention of open model weights*
everyone in the community: "man this is just evidence that the models will never come out"
What, you mean trusting a company's word for things?!
Healthy skepticism is a good thing as long as you don't go full doomer. Never go full doomer.
When the weights drop they'll create a mile deep crater.
Rumor has it SAI struck a big deal with nvidia:
but only with purchase of a new 5090
Soon to have 64GB of ram my cat.
Realtime HD Waifu.
Dalle and Mj can't keep up. The uncaged cleavage erodes them.
Can we get free SD3 weights retroactively? 😄
yay for sd3's mangled hands
this would be so epic if it wasn't for the octopus hand
sigh.
hires will do a lot
well, the stuff that fixes all that locally is better training + all of the AYG/PAG/autocfg stuff
and the full model
with ella. hands are fine
but of course the "selfie" part isn't really right.
which sd3 does get correct
Why am I getting the error "Your organization does not have enough balance to request this action (need $0.016, have $0 in active grants, $-0.3535949 in balance)" even though I have enough credits? It only happens sometimes. Is anyone else getting this issue?
that's certainly a way to fix it, by just not having hands at all. 🙂
they look really good though. 🙂
@dull star @bitter hearth that said, I got hunyuan going in comfy.
that's not a problem technically cause only one person has to buy the 5090 and then share the sd3 weights with everyone 🙂
is the hunyuan dit thingy not available in comfy still? 
the comfy extra models nodes added a 1024x1024 version.
ah cool, thx 🙂
Most of the time when I want selfie style pictures I want the picture itself and not the person taking a picture xD
i know i'm just being silly. it's actually more "selfie"-ish the way you have it.
when you get a chance, can you paste that prompt? I'd like to try that with my local stuff.
girl selfie with a giant robot behind her destroying the city and explosion and chaos happening, she is smiling, in anime style

extremely sophisticated prompt
yeah that works
hunyuan might not have the best prompt following, but I'm digging the composition it does which is different than the other 2 major ones out there. yet another thing to throw things at to see how it varies.
SD3@ClipDrop - prompt = dinosaurs ridden by batman and robin and the penguin and the joker and the riddler, at the kentucky derby in the style of dinosaurs, eric ravilious, josef vachal, tamara lempicka, victo ngai, henri rousseau, vladimir kush, oxana dobrovolska, alenka sottler, catherine abel
Weighting for The Waits anybody?!?!? 🥳
Line drawing of the Shanghai Bund, ratio 25:9, warm colors
SD3@ClipDrop into ComfyUI for SDXL+LoRA+PAG Advanced+Face Detailer finishing
don't be sleeping on pixart sigma!
Couple of refined sd3 and hunyuan(square ar ones)
Drop the weights.
gimme this prompt cx
@bitter hearth Body cam footage of a cave exploration
Lol
shrek horror game xD
SD3 made at ClipDrop; then into SDXL, LoRA, PAG Advanced, Face Detailer
damn man. someone on Twitter (the person who adapted LoRA to diffusion models) has announced a soon-to-be-released 5B DiT
If that’s all it can do with 5B nearly at release, then it unfortunately has no practical use. It’s an interesting proof of concept, but severely undercooked.
retrofuture laboratory scientist combining liquid squid and eyeballs and glowing marbled chemicals from chemistry glass into flower of life bowl , surreal 8-bit grunge ukiyo-e --no DOF, vignette --chaos 33 --ar 1:1 --sref 918740544 --stylize 333
cat image | sd3
Hey calm down bro. It is just a poor man's SD3 with super little cost
It's possible it's very undertrained tbf
Hm
It was only trained for 3 weeks
Interesting proof of concept
If the tweet is anything to go by
He only started the training 4 days ago
when he posted this
https://twitter.com/cloneofsimo/status/1791135683362631698?t=v6eAbH6gTdNeJcHrT3lK3w&s=19
This is why we still don’t have SD3, and it already blows these results out of the water.
So you simply ignored all those images shared by people here?
Sure, it has flaws
No, I saw them. They’re great.
But would you be okay asking this guy’s work for a dog and receiving THAT dog?
Hmmm... maybe the architecture could be good?
Like, if the Pony guys trained on it
I don't know if you can compare a model trained by an individual with one trained by a corporation with thousands of H100s
Simply you just get an early 2023 AI-generated dog
The majority of the work is training. The architecture is just the base.
So then maybe the Pony guys can do smth with it for V7
Architecture defines the upper limits of capability. But getting there is all about the training.
It uses the SD3 architecture, aka MMDiT. Like, he trained the model straight from the paper
( not sure whether there is T5 though )
I feel like this guy has an unrealistic definition of success if he thinks the model is almost ready for release. If it can’t meet the capabilities of SD1.2 base from way back when, why would you use it? It’s more computationally expensive and you don’t get any benefit.
For the architecture?
Thus, it’s not meant to be practical, but just to demonstrate what can be done by one guy in four weeks.
Except that is literally why they trained the model
And what does that give you? You can’t see the architecture in the output image. You just see the image. And the image isn’t that great.
Question, who uses base 1.5 or base XL? Everyone uses either a NovelAI or PonyDiff based model
And those aren't finetunes of their base model. It just uses the same architecture
And thus you shouldn’t get excited about it. It simply proves that you need much more resources to get a usable result.
I use base SDXL. It’s great. Fine tunes are just biased. They look too same-y for me.
Not to mention the finetunes of less popular T2I models
The guy literally wrote in his tweet that's his hobby, dude...
People always have a reason to use finetunes. Like, seriously, PonyXL
Sure, it is great. As long as you keep inpainting stuff or just want a general look.
And that’s fine. He can have his hobby and demonstrate what he can do. Great! But don’t be under some illusion that it will be a substitute for SD3.
Wonder if RLHF is doing any good to Stable Diffusion models though...
He's just some random freelancer with proper knowledge

prompt issue
If it's actually the architecture of SD3, pretty much any person that wants to spend some money for the good of the community might make up a decent model, dunno if it's gonna be as good as SD3 on the paper, but more is better than less righto?
If he can take his hobby, gather a group with the proper knowledge, and acquire some hefty hardware, then he might start giving Stability a run for their money. But he’s not there yet.
That person literally went to someone to lend him 8 H100s
If I'm not wrong
you can have the whole nvidia company, making a model in 3 weeks is pretty out there still :p
pretty cool for a hobby, looks fun
I hope the guy releases some tutorials on youtube, you never know, some other fellow might learn from that and make it even better
Bro, those tutorials would be for someone who is a professional at machine learning
I just want SD3 2B to be released in a week or two and then finetuned and we can live happily ever after
Instead of a common peasant
And that same person could literally do the same without the tutorials too, if not for the cost factor
Well, of course 5b model as starting point is not a good idea for a final product...
I hope 2B will pick up complicated scenes way better with natural prompts
So we can finetune on stuff like holding objects, fight scenes, meme templates
It's gonna be next level if it's not just gonna be crappy portraits 24/7
Quality doesn't matter if the prompt adherence doesn't suffer that much compared to 8B
Obv I can run 8B but what about 98% of the community, and just inference speed in general no matter the VRAM requirements
the new 16 channel VAE will shine new light on 512px and will revitalize it
I know SD3 is extremely prudish and full of false NSFW credit-wasting positives, but it won't generate something with "stinky" in the prompt? Come on.
How do you know that?
please search two more weeks in the discord
"sd3 in two weeks" tattooed on the chest of a behemoth of a bouncer at a fancy nightclub. Background is modern city.
Kek
Thank god that the model isn't blur censored offline 🙏
If it had the deepfloyd license with the forced censorship, it's so over
Pretty much
Something went wrong with deepfloyd's development
They even promised one with a commercial license or whatever
And they gave up
And the third stage of the model is still missing
I still respect their research though
Making an open source version of Imagen and all
People gonna massively overfit the model and the anatomy will be fixed you guys ✊
Saw them, first thought was why should I care, second was those devs are either totally out of touch with the perceived limitations and what users want to create, or this improvement still has the same limitations, gimme complex prompts, gimme people lying flat out on their back and stomach, show a robot shooting lasers through its eyes at a giant petrified rabbit, not portraits, portraits, portraits all over
calm down
first we get the model
THEN do that
lmao
You have to admit, all those "teasers" are all the same kind of image that say next to nothing about the capabilities of the model. Meanwhile you have to dig through the depths of reddit to be able to piece together at least a little bit of what is going on (kudos to the employee passing some actual info there) :/ SAI sadly has become so much fluff and very little information
meanwhile (this took a good chunk of time to get this perfect)
As i said before, so ironic the main use of SD3's text abilities is memes about SAI :p
yeah, i know there's been some big controversy going on, especially when they started charging for the bots and we shall never forget the shaky business situation for them rn
dont be like that
people started thinking that sd is no longer gonna be free as a result of that
don't people think that since 1.5 or something
Its just people and the internet
Still a believer in that, though less so that it will be all versions of SD3, but only time will tell. I mostly hope it will be a big step forward over the API version, that is a bit underwhelming
my internet dying
yeah, but i haven't seen a lot of people react like this until now. things became loud once the sd3 weights became overdue to many (i suspect it was supposed to be released just some weeks ago)
speaking of the api, i hope that there will be an upgrade to it sometime soon
People seem totally out of touch with the API version, it really isn't very good. Then the idea is "community will fix it", like, how? It's supposedly not even trainable yet on consumer hardware, and the final version might involve a slight architecture change to fix the problems. It needs time, so be it (but at the same time, tired of the meaningless teasers, get real SAI, give some actual info)
@silver adder this is my prompt for you
that looks nice
i reserve my right to complain until/if they say no open release
let me try getting some of the images i generated over the past few weeks, didn't decide on sharing em until now
not the smith
xD
Just look at those many paid models they have
right
They're trying to earn some money before they eventually release it for free, somewhere between the point where the benefits from the model going open weights outweigh the benefits of keeping it API-only and the point where they're done training the model
And how many times they said they will make it paid only
damn how could we be so dumb
:P
Those are nice 🙂
honestly the images we make here are a better demonstration
but sadly I want to see ours with highresfix
and not through some sdxl model
Did you send thumbnails...
like SD3 highres-fix
There is so much competition now and new Models from elsewhere like every second week... they should just release all versions and label the big one v0.9 if they actually want to Finetune it more.
I have automation that does sd3, pixart, Ella, and hunyuan of the prompt I ask for. All are great. The majority of the time, the sd3 one is still the most prompt following. Even in its unfinished state. Because of that i still want it and am optimistic for its release.
For example, these are just thumbnails, but upper left is sd3, being the most prompt following of someone holding a phone taking a selfie. You shouldn't see the phone.
don't forget to subscribe to artisan! Show em some love or whatever xD
xD
I still hope 8b gets some love
Cascade was completely left out with almost no finetunes or tools
if it gets released I swear I'd use it to make paintings with JUST the base model
its that good
but most likely, 2B will be the most popular
the most accessible one yet for the majority of the community
I'm sure 8B will be the most popular
unless it somehow has worse quality than the smaller models
cherry, cherry, cherry pick, but i liked this one, never mind the prompt, sd3 loves cats today, all the creatures were feline
||In a soft, creamy haze, a mystical creature stands, its velvety blue fur glistening in the gentle light. Oversized butterfly wings, delicate and intricately veined, sprout from its head, functioning as ears. Large, bright blue eyes shine like sapphires, filling the space with wonder and enchantment.||
pixart version
8B + T5 will be very heavy
yes, you can use CPU for T5, but I still have doubts
on GPU T5 is instant, but on CPU its like 5-10 secs just to generate conditioning
most enthusiasts have the top end cards or rent them i'd think, can't imagine going for the lower model unless the difference is really minimal, at best the models are largely interchangeable so you can prototype on a lighter model, but not expecting that.
it could also be 4B that gets picked up, since its close to 3.5B
and how much VRAM does T5 on GPU need?
haven't measured it tbh
I'll check it with T5 bf16 weights running at 8-bit
cause nobody will run it at fp16 or higher
uh oh.... didn't prompt for beeple on this one
not even large VRAM card enthusiasts
I also think those people who really use a lot of local stable diffusion have made sure to have a 16 GB or 24 GB GPU by now. and SDXL runs fine even on 4 GB VRAM by now. I'm sure 8B SD3 will run fine on 16 GB and probably even 12 GB
16GB yes absolutely
12GB is questionable if both T5 and the model are loaded
even if T5 runs at 4-bit
12GB with CPU might work
but why load both at the same time?
It'll probably be a split between 8b and 4b imo
4b is around the size of sdxl
even with dynamic offloading such as comfyui
I'm talking about that
It's an interesting experiment releasing multiple versions, still a bit of a weird choice to me, fragmentation as a feature. If anything, my choice would be to release two at most: one light enough to train on top-end consumer hardware, and one so heavy it can only do inference there, needing cloud for training/finetunes
when compressed into int8 it's around, well, 4gb
I don't exactly remember how it works
I wonder how I can force it cause I don't think it does for me cause I have plenty of vram
cause I just don't notice a massive VRAM difference when going between the models, node-by-node
I wonder if things like --lowvram in a111 would work for 8b
the training is an interesting point, yeah. I think that's really the most important aspect for a model succeeding in the community: if people with 24 GB cannot train a model, there will basically not be any finetunes at all
pixart-sigma (0.6B DiT) with T5 bf16 weights running at 8bit -> ~10GB of VRAM
wait a sec Thanks to BitsAndBytes, T5 becomes an immovable rock.
it also depends on how much VRAM Nvidia will give the 5090 😄
does this mean comfyui can't offload it?
will pixart release any larger models?
I assume the 5090 will have 32 GB VRAM
oh I can test hunyuan-DiT in comfyui using the extra models plugin
I might do that later
ok I'll try fp16 to see how it goes with comfyui offloading
crap, its around 13,3 GB when generating
hmm it never seems to offload it for me
but running the bf16 weights at 4-bit runs at ~8,5GB of VRAM
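A back-of-the-envelope sketch for the precision talk above: the size of the weights alone is roughly parameter count times bits per parameter, divided by 8. The function name is made up for illustration, and the 2B/4B/8B sizes are just the ones mentioned in this chat. This counts weights only, so it won't match the measured totals quoted here (~13,3 GB, ~10 GB, ~8,5 GB), which also include the DiT, the VAE, activations and framework overhead.

```python
def weight_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate size of the weights alone, in decimal GB."""
    return params_billion * bits_per_param / 8

# Model sizes mentioned in the chat; precisions are fp16/bf16, int8 and 4-bit.
for name, params in [("2B", 2.0), ("4B", 4.0), ("8B", 8.0)]:
    sizes = {bits: weight_gb(params, bits) for bits in (16, 8, 4)}
    print(name, sizes)
```

By this estimate a 4B model at int8 is about 4 GB, which would line up with the "int8 ... 4gb" remark above if that was about the 4B model, and an 8B model at fp16 is about 16 GB before anything else is loaded.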
how much VRAM do you have?
24GB
so I dont know if I have to force offloading
cause it might not want to do it cause I have plenty of vram
I assume the software might be intelligent enough to not do offloading if you still have VRAM free
hmm
also I might not actually test hunyuanDiT
it's still just pickle tensors
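Context for the "pickle tensors" worry: a checkpoint saved with Python's pickle can make the loader call arbitrary functions, which is why people tend to wait for a safetensors release. Here's a harmless, stdlib-only sketch of the mechanism. The class name is made up for illustration, and this says nothing about what is actually inside the hunyuan files.

```python
import pickle

class NotReallyWeights:
    """Hypothetical stand-in for an object hidden inside a checkpoint."""

    def __reduce__(self):
        # Tells pickle: "to rebuild me, call int('42')".
        # A malicious file could name any importable callable here instead.
        return (int, ("42",))

payload = pickle.dumps(NotReallyWeights())
loaded = pickle.loads(payload)  # the loader calls int("42") for us

print(type(loaded).__name__, loaded)
```

Loading the payload doesn't give back a `NotReallyWeights` object at all; it gives back whatever the named callable returned, which is the whole problem with loading untrusted pickle files.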
This is Pixart
this is SD3
SD3 is still the best so far unfortunately
I hope SD3 2B will perform similarly
Photo of Criminal in a ski mask making a phone call in front of a store. There is caption on the bottom of the image: "It's time to Counter the Strike...". There is a red arrow pointing towards the caption. The red arrow is from a Red circle which has an image of Halo Master Chief in it.
hehe
In a crowded, neon-drenched cyberpunk alley in London, a futuristic, anthropomorphic white cyber cat stands, pointing a paw at the camera. Its slender, athletic body is adorned with a sleek black leather jacket, complete with metallic shoulder pads, and a pair of high-tech, glowing blue sunglasses perch on its forehead. Its fur features vibrant purple streaks, and its ears are tipped with metallic implants. A silver choker with a tiny, pulsing LED encircles its neck. Dramatic night lighting casts intricate shadows on the wet pavement, illuminating the cat's angular, human-like features and intricate, metallic whiskers. Lightrays illuminate the foggy atmosphere, capturing the cat's futuristic attire in exquisite 8K detail, as if shot on 35mm Cinestill 800T film with a Leica M6. (yes yes, llm enhanced, llm really went overboard, somehow it kinda worked)
Charcoal etching on fine linen paper of a flustered mouse-dragon chimera with textured fur coat, overlooking the Lauwersmeer. Extreme detail, 8k-like resolution, with intricate mouse and dragon features. Dramatic studio lighting, deep contrasts, emphasizing soft fur and scaled skin. Etching mimics ray tracing, evoking a sense of depth, like a 35mm film still.
home
I feel like this much prompt is very bad for it
It is, surprised it came out this well, so much of it ignored (maybe for the better...) then i tried a short prompt, but that one was just ugly :p
yeah but you can do half that and probably get more of what you prompted
llms seem to add too many fluff words
Yo festivalman
Can one of you make an image of zero two wearing a lakers basketball shirt
these models don't do real world stuff very well
midjourney is really good at generating copyrighted stuff like that though. 🙂 I don't mean that insultingly, but sdxl etc isn't accurate enough for it.
So midjourney does that
yeeeahhh
kek
such a nice effect, but so hard to get for anything but anime girls
I think the "zero two" was basically just generic anime girl with pink hair.
that's as close as it's trained on it.
To me, it looked great, but i had to google zero-two 😉
:p
that face is funny
I almost prompted for Bus Light Speed
Lmao
I couldn't remember this guy's name

lmao
Weights will drop like happycore bass.
This is not what I asked for
Hi, guys, I want to get a detailed and accurate caption for an image (better than the BLIP series). Which model should I use?
Will sd3 ever be released or will it be API forever?
Almost definitely.
Yes, they will most likely release some versions of the model. Which versions? We don't know. Given their financial situation, they will likely keep the largest model or models to themselves for API revenue and/or attracting potential buyers that would want to make money after acquiring them.
But for all we know, they might be cooking up something like a 20B model for their API or to attract a company into acquiring them
For all we know, they may have started on SD4 🤷🏻‍♂️
How do you format text to look like this?!
like this?
Put this around it `text here`
Use three of them, before and after, to get the block like the one above
True that. But realistically, they are probably adding a ton more safety precautions and tuning these sd3 models even more, before releasing them.
If they're going bankrupt, ditch the safety precautions and live a little 🤣
'like this'
Waste time with safety precautions that the community will waste time on trying to remove.
Not the same 😉
'like this not' 🙂
Different "ticks"
The ASCII code for the backtick character (`) is 96
...if that helps 😄
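If it helps to see the backtick trick spelled out, here's a small sketch. The helper names are made up for illustration; the only facts used are the ones above (ASCII 96, one backtick on each side for inline, three on each side for a block).

```python
BACKTICK = chr(96)  # ASCII 96 is the backtick character

def inline_code(text: str) -> str:
    """Wrap text in single backticks (inline code)."""
    return f"{BACKTICK}{text}{BACKTICK}"

def code_block(text: str) -> str:
    """Wrap text in three backticks on their own lines (code block)."""
    fence = BACKTICK * 3
    return f"{fence}\n{text}\n{fence}"

print(ord(BACKTICK))
print(inline_code("text here"))
print(code_block("if you have\na block of text"))
```

Note that the straight apostrophe (') is a different character, which is why 'like this' doesn't get formatted.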
Now use three at both ends
Today's SD3@ClipDrop harvest ...
it's good
if you have
a block of text
FREE THE WEIGHTS
Free the weighting time! We are weighsting aweigh weighting two longg!!!
Even Taylor has weighed-in!!!
And good ol' Liberty Herself!!!
Joe and Donny are there too!!!
And some people who didn't want to be recognised!!!
If I have a Block Text, then it'll organise itself?
If I have a Block of Text, it'll Organise itself downwards?
There were 40 pictures created this a.m. over at SD3@ClipDrop - and only two out of the forty had bad spelling!!!
No, but if you put 3 backticks around all of that, it'll look neater
If I have a Block Text, then it'll organise itself?
If I have a Block
of Text, it'll
Organise itself
downwards?
After the first 3 are entered, you can press enter without sending the message, it'll keep adding lines until you add the last 3
Is there a Wiki with all this formatting info available?
It's not unique to Discord, so probably searching for "markdown" format would find something
guys how do i make this in stable diffusion
Use Harrlogos LoRA FOR THE TEXT
so you know how to make this
I know how to make text via Harrlogos ... 🙂
what the hell is harrlogos
You're being very aggressive 😦
🚀HarrlogosXL - Bringing Custom Text Generation to SDXL!🚀 Teaching Stable Diffusion to spell, one LoRA at a time! Harrlogos is an SDXL LoRA trained ...
how would that help me generate the image i want
It adds the text portion
Randomly Blowing Up!
Awww shucks!!! come on!!! 😄
Sailor Twift!
Your image reminded me of her
Yes, it was a generation swiftly tailored in SD3@ClipDrop 🙂
Bad Photoshopping - but fun!!!
The backgrounds are from Huggingface DiT AI Generator
Is the API down?
If yes, I hope it's because they are updating the model
To a newer version or something
does anyone know when the API will be back?
Hey, everybody!
who can help me please? i need to generate pictures of cats with the name of one brand, i've been sitting for an hour and i can't do anything. smart people please help me out 
Weighting for the waits.
Anyone able to use their APIs?
why you block me? I try to help you 😭
Two papers down the line the weights will weigh twice as much.
Yes. No problems. (I haven't tried in a few days though.)
Nevermind I just tried and the API is not responding.

I swear 8B Base is amazing for paintings
I will use it if 2B is insufficient
this is fucking art for once
holy shit
gotta love the anatomy
SDXL on top of an SD3 i2i image
Hi, this is Lavenderflow-5.6B-v0.0
✅MMDiT, muP, CFM, FSDP, recaped, 768x768, T5
✅No strings attached, completely-open-every-step-of-the-way
✅Not SoTA😅(hey it was trained by one grad-student under total 3 weeks of development.) Severely undertrained!
https://t.co/PgevrzpoxA
mind y'all, just run this for fun
and around 20GB base model

the real news is https://x.com/cloneofsimo/status/1793155762820428031
@Birchlabs @StefanABaumann @SeunghyunSEO7 @imbue_ai Ok, this is ONLY the beginning. While I was broadcasting these progress on twitter @FAL guys reached me out to plan on making this more powerful, and go on and build > 8B models from scratch, using better methods, better captioned datasets, everything! All open-sourced!
awwww sdxl vae
the more parties trying to get models out the better. Though a little sad they go for "let's train a big model" and not for something like "let's train ELLA for SDXL"
no sexy 16 channel vae
but still very impressive so far
I'm grateful for all these other attempts
such as pixart, Hunyuan and now this too
LMAO
The real limit is data





