#sd3
i love going to the toilet while my friends play in the swings
Don't want to miss out on anything
so anyone has found how to get consistently good results with SD3, without having to use stuff like chatgpt to enrich the prompt?
because its slow and very poor in quality, nuanced and beautiful sampling techniques straight up fail with it, and there is disco diffusion level of control and compatibility
yes prompt with sdxl then upscale with SD3
i just hook it up to one button node drag the slider for auto queue to the right and go do other stuff
ok, and without cheating with other models?
that looks like a screenshot from BDO
artstation ⭐⭐⭐⭐⭐
Me going outside once a month
can you put emojis in prompt ?
that's so funny ngl
cursed image
last
why is it?
all seven fingers.
anyways i love the aesthetic
very cursed. it gives me chills
looks like typical flashlight shot liminal photos, I love it
if you want the prompt is : poor quality photo from the 2000s of a playground in a bathroom, a slide on a bathtub, swings, toys, colored with childish drawings, a pink lamp hanging from the ceiling, pink chairs, shadows , bright colors, a giant plastic duck, hot pixels
yes i try to reach this feeling
i like it because it vibes with my personality. childish and shallow
There's always oil in my water colors :(
A watercolor painting showing a wide shot of a lonely tree between ruins, looking at an aging castle in the distance. The setting sun casts a golden hue across the scene, creating a warm, soothing atmosphere. The sky is a mesmerizing blend of orange, pink, and purple, reflecting on the calm sea waves that gently lap at the shore. Golden cut. trending on artstation 4/5
(Using ruinedfooocus, 45 steps, cfg 4, dpmpp_2m, sgm_uniform)
(oil:0)
same
time to cook... oh wait...
amazing
How good does it run on RuinedFooocus?
So many people are trying to claim that the commercial licensing kills fine tuning. Astralight really started a misinformation fire. Seems like he's raising drama so that stability "makes him shutup" . This might turn into legal issues down the road. Astra being really bold here
Putting "Oil painting" as a negative just makes it look more realistic.
Not sure if ruinedfooocus supports the "(oil:0)" notation. Tried it in the third image.
Fixed seed.
amazing
Congrats for the shape of that thing in the tub
how on earth is that a legal issue?
Idk. :3 I mean it's fine?
But I don't think it is at its full potential. It doesn't use the T5 version.
libel
datavoid created a script to help with that. it's on his civit page. he was here talking about it yesterday
sd3 2b / lumina / pixart
No, clown makeup or braces weren't in the prompt; obviously fangs and red lips fcked it up, but uf... (prompt as image cause for some reason it's banned by reddit)
this is some real horrific stuff
for real, uneasy feeling
have fun with the "licensing" SAI lol. You will be irrelevant soon , many competitors.
weird
people get free shit then mad when other people commercializing that free shit might have to pay for it?
so weird.
Someone spilled their spaghettios
obviously the staff at stability.ai have zero respect for that position and shouldn't engage with these people at all. I'd love to see a mod just start banning people for acting that way . 0 warning.
the fuck?
SD3? so nice
so mad
what's your prompt
Troll account.
so, your message you tried to send 10 times was blocked by .... automod because it contained the keyword "cp"
I think it's called engagement farming these days
Asking same cuz that's the style I'm tryna make rn and am failing miserably
bs
100% they're part of the bronie brigade
what the fuck is that prompt LMAO
can we ban that person
looks like back rooms
Stability should really talk to their legal team about Astralight sending his server members this way
It is, liminal space stuff
we sure can
You can hear them marching
cool robot
i doubt they have the discipline to march. they're more like a black friday door crasher mob.
SAI had technical genius but zero business sense, if you aren't the leader you can't demand things you will just get passed over by the competitors- you have to gain the most market share first then you can demand things lol. Must be a bunch of 20 year old business geniuses running things
that looks like a bad photoshop
you seem to have a handle on it. Start your own AI business. The time is hot right now.
yess
Oh no
anyone testing out lora training? curious on what learning rate y'all using
You could just save the clip from sdxl and put them into clip folder
i tested trying to get simpletrainer scripts working. It got about that far
yes, I know that thank you, but do you guys know of any other as well
just the mobius one i saw him playing with. had some interesting results with it
on civit, there's an sdxl + sd3 merge thing... i think that guy just has a custom clip and used it with sd3 too.
ahhh kk, i got the diffusers script working (windows) and ran a few tests, so far it seems sd3 lora training is faster than sdxl, at least without the t5 encoder
You could load any sdxl model and save the clips
ugly anatomy. rip sd3, it's over
i don't think we need to train the text encoders at all. that's what i heard.
i'm not training them but you need them for training
got any tips for trying to get diffusers working on windows? i might go that route
high vram and some programming knowledge
right. without t5 would make sense for memory saving. wonder if it'll affect using the lora with t5 in inference
What about the prompts with LoRA training? They have to be with the format of long descriptions, without an activation word?
16gb and i used to have a career hardening php websites (turning off register_globals made me a lot of cash)
doesn't rlly matter imo i almost never used prompts for training other than ("photo of ohwx woman")
I would probably prompt the lora key words on the clip layers, and use the t5 layer with a long description because i love long prompts and have always wished sdxl and sd15 understood them better
16gb is perfect, at 1024x1024 training i'm only using 12gb
funny to me that what i was looking for is here now, but people are so mad they are expected to write longer prompts now. oh well.
on sdxl i'll do 896x896 (and relative buckets) with batches of 2. works good.
sd3 is great imo, just needs some people to finetune it, i think it has the potential for 1.5-level finetunes
thank god they turned on slow mode for this channel
its a little harsh though, 30 secs
yesterday you could barely read the channel
the censorship is outrageous!
yeah, 30 secs back then would have been amazing, but right now the chat is rather dead
it was 15 secs like an hour ago
Will you do a PixArt-Σ Pony btw?
I'm having trouble with disk space again 
someone posted not long ago
LMAO
lol same
I don't know yet. I am exploring alternative models but XL is the focus now.
pixart would be interesting
noice, no humans in sight 
Nice...Until you look at shadows and lighting....
100% /home RIP in pieces
a man sitting on grass
try to make him lay on his back
no you
sitting works but laying is NSFW so it is a nightmare
i can't i don't have high enough prompt skills yet
its completely correct I swear 
woah nice, raw output from the model?
output + other program
does this really work?
https://www.reddit.com/r/StableDiffusion/comments/1df0kau/sd3_has_been_liberated_internally_pure_text2img/
oh its this vhs app, I love that one, thanks

partially, e.g. sitting works better, but all double-meaning words like "laying" cause horrible nightmares
did they censor the community model? ppl say the api-version makes better people than the one they released.
thats the myth. the dataset was built without pornography in mind, and the use case of "women laying" is undertrained. There are tons of undertrained poses. It's a base model.
if you believe reddit, you'd believe that they go in an edit the weights after it's trained. you'd believe that they have another copy of 2b sitting around that's not edited. This is just a whole made up thing that redditors are stuck on
wasn't talking about porn tho, just people.
hands are fuckt, faces are fuckt, gestures are fuckt... anything that involves a person is basically trash from what i've seen.
These were missing in a workflow I was trying to use, how do i get them? Sorry im not used to comfy
lol i'll be fine. don't sulk
Manager > install missing custom nodes
ofc
you mentioned censoring. that's what the crux of the conversation is. the censor myth comes out of the caimer community
they are but gotta play more with model to see if it works good or not
humans can be generated. but "woman laying" is under fit. It's not unobvious why people are so pissed about that one case being under trained
Idk why this has a backroom vibe, I'll try to create something like this xd
Men laying, men sitting, women sitting, .... the list is quite long
Holy shit the lighting is creepy but epic af
check the hand, the people in the background, and the eyes.
lots of people making people sitting. every seed and every prompt can't be a winner. base models can't be over fit to a concept. it's better to underfit them
there are problems. i wouldn't say that case is caused by censoring at all. why are the goal posts changing suddenly... oh wait i know why. It's a contest you want to win. Not an honest conversation you want to have.
you guys are crazy what you found out in 28 hours
thats just a normal day in florida
huh?
You said people were having a fit over one thing, and I said that is wrong.
"whaaaaat?"
Look what I made mum!
i literally don't get your point
i never said people are only having a fit over one thing. but it's pretty obvious people are MOST hung up on women in grass
i wouldnt exactly call those women
How about Cascade? I know it takes a lot of resources to train, yet the quality itself is already superior compared to SDXL and Cascade could use additional checkpoints (already quite fond of the generalist model Invictus Redmond fine-tune the following images had been created with). Even if not likely, just a thought:
the hangups are motivated pretty obviously. i'm not speaking towards any individual. the community outrage is fueled by a need for porn though. it's nothing more than that
You did highlight one thing.
non commercial
people who want to create accurate images are doing that. finding work arounds. having fun.
People who want to stay mad will stay mad.
not sure why u keep equating shitty gens on sfw anatomy with a need for nsfw
prompt
are u ok?
childhood trauma
This is cool
who even are you?
Ah didn't know there was a difference license-wise
photography in the style of detailed hyperrealism ,creature ,fantasy,James Christensen,hyper detailed
are the huggingface spaces worse than the real thing
It's the same model.
uuh, i'm me...?
I found the diffusers thing to train. Isn't "sks" a gun? :thonk:
i thought so, it just seemed unimpressive sometimes
that's unfortunate
nope im you
i got duplicates.
lol don't make fun of older people. you'll be old too. /grandpa simpson .gif
i would like to apologize in the name of everyone here for his behaviour, he has mental problems
photograph of a woman wearing fantasy intricate armour, treading on fartstation, found in a bin, disasterpiece, cinematic, bokeh, bilby bilby bilby fnar fnar, nudge nudge, wink wink, say no more.
The secret is to insult it.
making fun of someone for being fat is one thing. you can avoid getting fat yourself. just gotta eat less. simple.
making fun of someone's age is a timebomb waiting for your mid life crisis
you have to step on it and punch it until it gives u good imgs
It's still not fair on the person who is fat though.
lol bruh. you still on that myth too. weren't you the guy that got @lunar canopy the mods believing i was on meds?
i'm me, this is myself, and that's uuuh, that one over there is my personality.
no that wasnt me that was the other guy
what was his name?
i dont think people who are making snide comments care about whats fair
I do not mean to offend, but I don't ever see you not talking in here.
oh these look lovely
hey
I have been on-and-off in this server for 2 years flowwolf
idk but u can find the comments by just searching the med names
starting to get boring reality lora vibes. I really hope the guy who made the lora will make one for SD3 too.
same. we must be in sync.
that's too much work i'm sorry
Only difference is you left multiple times.
if you know that, why would you say i'm always here? now i'm offended cause you're fishing me in pointlessly
No more pointless than arguing with people in the #sd3 channel instead of working towards a better model.
Okay, let's go to classics. "a photograph of an astronaut riding a horse", stable diffusion 3 running in the official huggingface space, default settings except for seed 1. What the hell is going with horse's legs?
that was an instant reply wtf
Guys check out http://recraft.ai
Create and edit digital illustrations, art, and 3D graphics in a uniform brand style. Use Recraft to quickly generate designs for your app, website, or brand online for free.
This isn't the channel to advertise this. This is the #sd3 channel. Bring it to #off-topic.
ups sry
SD 3 in ComfyUI with cfg in the ksampler set to 4 instead of 4.5
I hope you guys are also using Ultimate SD upscale with SD3. It makes the difference.
prompt: Flowers growing out of a crack in the sidewalk next to a lawn, in a heavy rainstorm, sallow, stark, thin, waif; Insanely detailed, sunken eyes, high cheekbones, broken teeth, long flowing raven hair, delicate, rags; intricate, hyperdetailed realistic by artist "Dante Gabriel Rossetti", by artist "tom bagshaw"
sometimes i'm quick sometimes i'm slow, and sometimes i'm mentally slow... depends on how many hands i've got on the keyboard and what's on my screen.
no advertising
bruh, it's a comfy node...
no reason to yet, still learning how everything affects the final output
Considering that is used with SD3, that isn't advertising at all.
If your intention is to be annoying, I'll be contacting staff in seconds. I'm here to mess with productivity.
weird threat
in seconds even
the upscale workflow is part of the workflows you can download from the huggingface weights page for SD 3
if there's one thing you can't count on here, it's the mods
Then you'd be wrong.
rude, fruit always listens and does well
you can count on @viral plaza
well me personally i can't. i know this. people harass me directly often. Repeat offenders. Mods don't care though. They've even endorsed it at times
A close-up of an extraterrestrial being, characterized by its large, prominent eyes, wrinkled skin, and distinct facial features. Thebeing's eyes are detailed with reflections, suggesting a light source nearby. The texture of its skin appears rough and weathered, with numerous wrinkles and pores. The image is in grayscale, emphasizing the contrasts and details of the creature's features.
you can also count from 1 to 10
can sd3 count 1 to 10 though
like, aren't we supposed to share the workflows and settings in order to make this bloody model work here?
naw @lunar canopy has gone along with the trope that i'm mentally ill and on meds that a small group preached for a while.
only if its properly safe and consensual
Nah he said that because I told someone to not advertise an unrelated website that isn't SD3
listen the dan guy said no advertising
hey @slate portal great thread there on reddit
i can browse and use that AI site linked in comfyui
i have not done that, and am timing you out
thanks btw (:
I can see why he gets timed out.
oh my bad then >_<
that was an answer to ultimate sd upscale mention, though
Yes, after I told someone to not advertise an unrelated website.
The point is he was trying to use my own words against me lmao
AI is unrelated?
oh lol ok
Sorry, but do you have one braincell? It's almost like you're not keeping up with the conversation.
isnt pretend modding against rules
Whooo, finally chat without the cheerleader?
Are you here to do anything but SD3?
far reach
This convo is a waste of time
ya stop
at least it's good with animals
You made dark contrast lora? something like that
does anyone else like avocados
prompt: CSGO , AWP,DRAGON LORE SKIN
model: stableDiffusion3SD3_sd3MediumInclT5XXL
steps: 28
cfgscale: 4.5
sampler: dpmpp_2m
scheduler: sgm_uniform
There's a couple people who made dark contrast loras, but I made one of them yes
Zavychrome dat you?
Yes
prompt: CSGO , AWP,DRAGON LORE SKIN
model: stableDiffusion3SD3_sd3MediumInclT5XXL
steps: 28
cfgscale: 4.5
sampler: dpmpp_2m
ahahhahah no way bro
yea it says Zavy, awesome bro that's the only lora I use haha
death by avocado?
Thanks. Took a decent amount of work to find all of those tags and latent spaces
Nice, what made it be your sole lora you use, if I may ask?
you be a monster
no question ur work is clean as seen in zavychrome models
for now im just waiting for ZavySD3
its very subtle, and it brings up the image quality so much. Just feels more professional, I often use it on 2x.
I'm waiting for Juggernaut sd3.
I'm not gonna finetune, but just curious if the training code was released already?
bro no way you getting this from csgo prompt xD
I'm unlikely to spend time on that for some time, unless they can show that they didn't poison anatomy to hell
Video feed VHS style of a futuristic small children treehouse. Creative materials are used like tires and recycled plastic bags. In the backyard of a house. A small kid is posing in front. christmas lights. at night
Prompt?
i agree, from what i noticed is when i make perfectly clear characters SD3 tends to randomize the faces all the time
anorexia
she was being fat and avocado shaped
l: by Jacob van Ruisdael and peter Mork monsted
g: A hotel stands on the edge of a rocky basalt cliff overlooking a lake far below. Rain pours down on its wooden facade, and its windows are lit up. The lake's surface is choppy and dark (by Jacob van Ruisdael and peter Mork monsted:1.2)
t5:
l: A hotel stands on the edge of a rocky basalt cliff overlooking a lake far below. Rain pours down on its wooden facade, and its windows are lit up. The lake's surface is choppy and dark (by Jacob van Ruisdael and peter Mork monsted:1.2)
g: A hotel stands on the edge of a rocky basalt cliff overlooking a lake far below. Rain pours down on its wooden facade, and its windows are lit up. The lake's surface is choppy and dark (by Jacob van Ruisdael and peter Mork monsted:1.2)
t5:
l: by Jacob van Ruisdael and peter Mork monsted
g: A hotel stands on the edge of a rocky basalt cliff overlooking a lake far below. Rain pours down on its wooden facade, and its windows are lit up. The lake's surface is choppy and dark (by Jacob van Ruisdael and peter Mork monsted:1.2)
t5: A hotel stands on the edge of a rocky basalt cliff overlooking a lake far below. Rain pours down on its wooden facade, and its windows are lit up. The lake's surface is choppy and dark. A painting by Jacob van Ruisdael and peter Mork monsted
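For anyone puzzling over the l: / g: / t5: blocks above: they give each of SD3's three text encoders (CLIP-L, CLIP-G, T5) its own prompt. Here's a minimal sketch of splitting that notation into three strings. The function name `split_encoder_prompts` and the tag-matching rule are my own assumptions for illustration, not the parser of any particular UI.

```python
import re

# A tag counts only at the start of the text or after whitespace,
# so weights like "(by some artist:1.2)" are not mistaken for tags.
_TAG = re.compile(r"(?:^|\s)(l|g|t5):")


def split_encoder_prompts(text: str) -> dict:
    """Split 'l: ... g: ... t5: ...' into one prompt per text encoder.

    Hypothetical helper: encoders that are missing or left empty
    come back as empty strings.
    """
    prompts = {"l": "", "g": "", "t5": ""}
    # re.split with one capture group yields:
    # [preamble, tag1, body1, tag2, body2, ...]
    parts = _TAG.split(text)
    for tag, body in zip(parts[1::2], parts[2::2]):
        prompts[tag] = body.strip()
    return prompts


if __name__ == "__main__":
    example = (
        "l: by Jacob van Ruisdael and peter Mork monsted\n"
        "g: A hotel stands on the edge of a rocky basalt cliff\n"
        "t5:"
    )
    for name, prompt in split_encoder_prompts(example).items():
        print(f"{name!r}: {prompt!r}")
```

If you're on diffusers, I believe the SD3 pipeline accepts separate `prompt`, `prompt_2` and `prompt_3` arguments for the three encoders, so the values should map over directly; check the docs for how it treats an empty one.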
it REALLY did not want to do it, the green kept going on her skin and her face would still be there
Thanks, yeah I do feel it comes most alive on the upscale. Which is one of the things I like about SD3, with the updated vae it's not needed to upscale for details to come alive. Just kinda broken right now
almost got it
It can understand artists, but only in clip g; use any other and it makes a photo
there is diffusers code for lora and dreambooth, but it sucks cause its diffusers format... we can't use it in stuff like comfyui..
I think we should wait for onetrainer implementation, which is being worked on rn
change the cfg setting on the ksampler, change your sampler AND your scheduler, and change the value on the ModelSamplingSD3 node - and you will affect whether it looks like a photo or not
AvocadoWP @steady carbon
Nice
let's try
ahh alright, it's gonna be a while till any finetunes then :/ I am disappointed with the human anatomy problems. The model is amazing at so many different things it hurts that it's hampered in such an important area.
with a man its not that hard but with girl feels impossible its cursed
Contracados
community gathering
I'm that close
a girl horizontally on grass, like a normal person
Is this hf SD3?
AVOCADO GANG
hf? its SD3
Sd3 from the Huggingface d/load?
Creepypasta : "never ask to sd3 to generate a girl in grass"
Yup
Woman prone in a field
Does anyone else find SD3 to be wildly oversaturated? I dunno if it's the weird samplers you have to use, or like the VAE or something
let me know if you'd like my workflow to play around with
yes
Its excellent quality for a "real person" - most real people are coming out horribly mutilated
All the people laying on grass posts are quite amusing lol.
I got it!!!
would love to, your trick already does wonders.
change the value in the ModelSamplingSD3 node for comfy to something like .5 and tweak up. Also, change your sampler to uni_pc and the scheduler to ddim_uniform. that should go a long way to desaturating
Cinematic still photography of a modern sports car.
huh?
DM'd it to you
prompt: A normal girl laying on the grass vertically
model: stableDiffusion3SD3_sd3MediumInclT5XXL
steps: 28
cfgscale: 4.5
sampler: dpmpp_2m
scheduler: sgm_uniform
Yes, I was saying that SD3 has the saturation characteristics of SDXL 0.9
What does Model Sampling achieve?
prompt: a high-FOV first-person view from Skyrim, character's hands holding a saphire red dagger, standing on a rainy mountain peak, vast landscape of forests and mountains in the distance, epic aurora,epic skies, high graphics, highly detailed textures, realistic lighting, immersive fantasy style, Unreal Engine 5
we're not using U-net, keep that in mind. it's sampling the model. and it's very touchy. the difference in effect between 1 and 1.1 is quite noticeable. and you can go from 0 to i'm not sure how high. i've only gone up to 5 so far
What does Model Sampling achieve?
i just answered you. it affects the look of the entire image
it's a way to amplify the noise
OK
research paper says 3-6 generates more "interesting" images
Aitrepreneur has a thoughtful video out that takes out some of the sting of the complainers. https://www.youtube.com/watch?v=Eq6-81vZ218
Say goodbye to Midjourney and welcome the new AI model that's set to redefine the future of AI art generation: SD3 Medium! BUT It has issues...
In this video, I'll show you good, the bad and the ugly side of SD3 and how this release will pave the way to the future of AI models!
Have you tried SD3 Medium yet? Let me know in the comments!
counter strike is like how its with the rest of the base models
and values over 5 start really overcooking the image
prompt
damn thats closest ive seen yet good job
good things take time, I look at this situation as 1.5 days all over again
Nice image!
It certainly does, but that starts with accepting there is a problem, which some people avoid
Pretty good horse and awesome trouser creases!
git gud guys
what the fuck
has anyone managed to run any SD3 lora on ComfyUI? I created a LoRa nsfw
Who is gonna train this with that kinda license? Nobody will actually pay for that lol
SD3 doesn't know who Andrew Tate is, damn it
i wanted to make some funny memes lol
well anatomy will be fixed. For me the problem is that the model doesn't know any style.
Pretty good - amazing akchtually
Wow, as a trade off dude is missing an arm
Rendered with the Peaches lyrics by Jack Black
pls no nipples
LORA will do that, like Pony
not exactly. Finetuned models maybe. But not lora.
for me that is the worst limitation.
I will be working on a LoRA when tools are more available that should fix anatomy better than a finetune
Crisp vibrant killer-colours!
yes but the fact that it doesn't seem to know any style is very limiting
Trust me, I made my own method of weight untangling that worked absolute magic on SDXL
#sd3 message does destroy the photo look, helps a fair bit
I have hopes it will work as good if not better with SD3
I think I might have found something interesting, but I don't want to count my chickens before they are hatched
weirdly temperamental model this sd3, above it i noticed just prompting clip_g and in the others only mentioning artists or empty works as well, full prompt in other encoders, boom, photo
pi-hole ?
Try turning down your CFG to 2 for snow images, I found that out yesterday
@hexed dirge Some examples of how well my fix worked on SDXL
what did you find?
Thanks - I did not know that!
maybe I'm not clear. I know that body fails will be fixed.
try this and set the CFG high to like 6-8
but SDXL has tons of styles inside that SD3 doesn't seem to have
lol I'm just doing random experiments, I bet this isn't even going to give consistently better results, like freeu
PS are you using WLSH image save node?
I am using SD3 as input to a simple KSampler+LoRA setup - then feeding that into SDXL and Face Detailer ...
Is the Triple Clip SD3 Node essential?
audioscavenger - Save Image Extended for ComfyUI
I just wanted something to save in .jpg , I don't need .PNG's filling up my storage.
with cfg at 5 vs cfg at 2
So lora won't fix that
just use the preview image instead of the save image, and then right click and save as images you actualy want.
i dont believe so, you just need 1 text encoder, choose any one you want
I usually just set big batches and go watch a video, that's not going to work.

the shift seems to make a big difference
yeah it looks like one, this is what text in SD3 for, not some crappy ass "flat 2D photoshop text on cardboard"
if you want to use a simpler workflow, load the sd3_medium_incl_clips_t5xxlfp8 or fp16 model and then you don't need most of the nodes - you only need all of them if you want to fine-tune settings
zamn
What does Shift do? I know it makes a difference - but what parameters is it changing?
a Russian TikToker sitting on a horse
no idea, but i couldn't make the pear human-like until i played with it
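On what shift actually changes: as far as I understand it, the ModelSamplingSD3 value remaps the noise schedule so that more of the steps are spent at high noise levels. A minimal sketch, assuming the timestep-shift formula from the SD3 paper, sigma' = shift * sigma / (1 + (shift - 1) * sigma), with shift > 0. The function names and the linear base schedule are made up for illustration; this is not ComfyUI's actual code.

```python
from typing import List


def shift_sigma(sigma: float, shift: float) -> float:
    """Remap a noise level in [0, 1]; shift=1 leaves it unchanged."""
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)


def shifted_schedule(steps: int, shift: float) -> List[float]:
    """Assumed linear schedule from 1.0 (pure noise) to 0.0, then shifted."""
    base = [1.0 - i / steps for i in range(steps + 1)]
    return [shift_sigma(s, shift) for s in base]


if __name__ == "__main__":
    # Compare how the same 4-step schedule moves as shift grows.
    for shift in (1.0, 3.0, 5.0):
        print(shift, [round(s, 3) for s in shifted_schedule(4, shift)])
```

With shift above 1 the middle of the schedule is pushed toward 1.0 (a noise level of 0.5 becomes 0.75 at shift 3), which would fit the "amplify the noise" description and why small changes to the value are so visible.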
Yes I have clip_l and clip_g plus the t5xxlfp8
yes. but if you use the model i screen shotted, all of that is included in it, you don't need them individually.
adjusting clip values?
Using regular clips + biggest T5 is actually worse in some cases than just using the two clips, I'm finding. The farther away one here makes way more sense overall and is with just the clips, on the same prompt and seed
I just include them as I do not know what they're doing?!
fine
i use the separate models, makes the t5 clip default to CPU which is good for my poor old 2080
yeah the text encoder part is a bit weird. some times some combinations are better and some times some other combinations are better
same prompt, same seed, different shift
So I could leave the top two fields blank?
what i'm going to suggest you do is try it and see.
@winged seal did you understand?
Sorry? I was gone
Will do ... I will know something is happening ... but not what or why?
what I mean that lora can't fix the lack of styles
what styles are you lacking?
literally every one
You can use ClipSave to save SDXL checkpoint clips to disk and load those instead of stock clip_g and clip_l btw, can give interesting results if the checkpoint knew about more concepts than like base SDXL
like literally tell us what you can't get it to do so we can try to figure out what you need to do to get it to do that
i believe roblox type figures was a issue, not sure tho
highschool was rough****
I reached out to them early on (when license changes were pre-announced). They never replied to me.
I asked to get in touch with someone here before release. They never replied to me.
Community asked them. They never replied to me.
This became a PR issue. They never replied to me.
We can argue about definition of "denied" but it does feel like it to me.
I'll try to explain. Look at this site: https://sdxl.parrotzone.art/ . They did great work discovering all the artists SDXL base can reproduce. SD3 has close to 0 of them
it was a ... hairy... time
Need a hand?
so far every artist i've handed SD3, it knows
it's like she had surgery where they cut off her hand so they could use it to extend her other arm
Oh, my bad. Yeah the styles aren't going to be something that I fix. Styles are a lot easier than fixing fundamental issues in the actual anatomy and entangling of concepts, but I don't have too much experience with them for SDXL or SD3
Does it know David Lynch?
I am also adult enough not to be afraid to admit it.
i don't even know david lynch. What does he create?
oh honey
My goal is to provide a detangled base model for SD3 that performs significantly better than what we have right now with way less deformation and such. And then from that people should be able to easily train in art styles with much less resistance @hexed dirge
sd3 2b likes cats. 8 out of 10 for "a close up of a mystical being with scales that shimmer like the moon ..." are cat-like, the other 2 are anime girls (artists in prompt must have caused that)
it knows some major styles.
prompt: watercolor and ink line drawing of a small child and a red balloon
Rabbits (2002)
In a nameless city deluged by continuous rain, three rabbits live with a fearful mystery.
Written and Directed by David Lynch
Composer - Angelo Badalamenti
Cast - Naomi Watts, Laura Harring, Scott Coffey, Rebekah Del Rio
https://letterboxd.com/film/rabbits/
https://www.imdb.com/title/tt0347840/
I have been re-running many of my SDXL prompts in SD3 - and they look nothing like the SDXL version - seems that most "living artists" have been withdrawn from the SD3 Dataset?!
Yeah, SD3 feels like SD2 again.
Very limited copyrighted material in the training/tagging, very boring output in comparison.
also dead ones, I think. And this is a big issue.
When I think of David Lynch I think of Eraserhead and Twin Peaks, and also that "alphabet" short film he and his wife made in the 70s
we talken david lynch up in dis too?
so he's a video guy?
crystal said SD3 recognized every artist so far, asked if it rec'd lynch
My favourite living artists include Andrea Kowch, Victo Ngai, Yayoi Kusama and Vladimir Kush
they are all portrait models
prompt: afremov
For zdzislaw beksinski it just makes trees.
Stability did some shenanigans and lobotomized it
look it up yourself. You will only get jokes here.
try increasing the weight
seems it works though @mellow latch
prompt: acne, freckles, frizzy hair, portrait and background scene by David Lynch
Definitely looks like something from Twin Peaks
Where's the log?
okay. i mean, it even knows an artist i found on facebook that only 1.5 knew
put cinematic instead of Lynch. And use same seed
It just makes beksinski trigger a thomas cole look.
Legit looks kinda bad for style, though coherency and output are fine
I am getting the idea that they have given The Community "free SD3" - but an almost totally bland and almost 'featureless' version - but it's their opportunity! I'm sure they're going to release a much better version on a pay/month basis!!!
yeah :/ it gets a bit better with weighting but beksinski's influence has been immensely reduced
Just sad.
Stability chose to make the model a small fragment of what it could be
they probably withdrew anyone that opted in to Spawning's database of artists that don't want to be used in data training sets
1.5 was so vast
2B is near to SDXL so not that small.
the prompt has no mention of dog only pet fish I want my pet fish SD3
have you tried prompting the text encoders like this l: by Jacob van Ruisdael and peter Mork monsted g: A hotel stands on the edge of a rocky basalt cliff overlooking a lake far below. Rain pours down on its wooden facade, and its windows are lit up. The lake's surface is choppy and dark (by Jacob van Ruisdael and peter Mork monsted:1.2) t5:
it's crazy, and not how i'd want to use the model, but then, for some reason these artists seem to work. (also the lower cfg / lower modelsampling value works a bit, but seems to degrade results into blotchiness as well, it's a fine balance when trying that)
Those feet are quite handsome - or vice-versa!
Yes, my "Dustbowl, Frail Children of Dust, American Gothic, dystopia - style of Beksinski and Andrea Kowch" prompt looks weak and washed-out in SD3
1: Lynch 2: cinematic
I'm speaking in terms of capability.
It's a large model that doesn't understand style.
Just like SD2. Bad training = bad model
Hopefully SD3 will demonstrate to future model trainers/researchers what a bad idea that is
you used the word 'pet' - dogs are the most likely thing to be associated with pets. just use a prompt like "walking down the street holding onto a leash attached to the halter of a goldfish
well try simple prompt like color portrait of a sci-fi woman in the style of Moebius
tried adding artstation to the prompt? 
Will try artstation next time
quite frankly, if you can't find a way to prompt and get a good result without having to resort to a shortcut like someone's name, you need to work on your prompting skills
Good news is there will be no GREG RUTKOWSKI!
Yeah, I worded that more harshly than I should have. It's also much more difficult to do these projects and especially check over things mostly solo I'm sure, since it's already enough of a pain in the ass with a small team, and I'm not really taking that into account.
By the way, if you need help with dataset cleanup I know we have made some advances there (including being able to detect probably 100K images I know you want excluded), I'm sure we could share that if you're interested.
I think this will be an upside to SD3 - that we all re-double our skills at prompting!
I am going to mess with SD3 for composition and text, its not a good art model imo.
Whenever the community finetunes and combines a model with Loras we will see something really special.
good! his influence is poor at best, unless you want blue toned sword and sorcery landscape scenes.
SDXL gives results almost near to every style/artist. Images weren't good, but with some finetuning or just a simple lora like xl_add_more_art everything changes dramatically. SD3 is not near to any style and that I think is very very limiting
this is not desirable
Using generalisations like Fauvism, Fibonacci, neonpunk, zentangle, cubism, art deco, streamline moderne etc etc - SD3 can be superlative
Well, at least it's not photos
seems like a good model for doing art, to me
it is a good model for art
yes, they are good, but it's just a general digital image. Not the style asked
dAMN gORGEOUS!
It's when you use a specific person's/artist's name that SD3 begins to fall-down
No, you are misunderstanding latent correlation and how the model works.
When these connections are severed from their IP/copyright holder the connections are lessened between relevant text too.
i can even get a canvas texture out of it if i want to
@jolly swan I have no dogs in this fight but tbh I have been reading the drama and if I were you I would never speak to those people again, it is obvious they have nothing but contempt for you
Thats a nice pretty forest, its not very artistic though.
SDXL with adapters is by and large better
dude, i misunderstand nothing. i TEACH this stuff. what i'm saying is that you need to stop using shortcuts to get results and learn how to prompt correctly. or just use 1.5.
Can just take the whole discord thread into any llm for sentiment analysis and it will show what everybody who is not one of their echo chamber knows : contempt, hostility, smugness
yeah, i'm very afraid it's more clip knowledge of adjacent styles than training on the artist
What kind of behavior (and optimism) would you otherwise expect from someone preaching magic of friendship?
I use DALL-E tbh
Upscaled Loglady ,
Bonus loglady
Lmaooooooo
I generally do not call those artists directly, but the latents do not accurately replicate a coherent understanding when the data is lobotomized.
It is literally what happened with SD2.1 and no one used it.
Don't assume I am a fool just because you don't want to admit to obvious problems lol
You do you man, more props to you if you can stand the abuse and keep civil. I would have snapped after two smug sentences
Oar-some!
With feet for hands
That's the BONUS!
i know a lot of people that use 2.1 - you, personally, do not like the fact that you can't use a name as a shortcut to the vector data you think you want. that doesn't mean there's an issue anywhere but between your eyes and ears
My prediction is that we are not going to hear much of Lykon here for a while. I would be VERY surprised if corporate hasnt already put a leash on him for his unprofessional behavior of yesterday
Better off. A person with such a childlike attitude has no business being a customer-facing voice for any company.
You have some kind of teflon coating? I'm respecting this.
monster eating comfyui noodles
SD3 + clip l + clip g + t5xxl fp8 consumes 13 GB of VRAM on my RTX 3060 12 GB. Is that normal? Is the t5xxl fp8 text encoder necessary?
Him throwing gaslighting around and telling people to git gud on a major product release day, on public channels? Absolutely unhinged. Very bad look for SAI.
I've heard many things as I've been doing this for a while, if I haven't renamed Pony to something by now it's safe to assume I can take a hit
Sadly, at this point, i have more hopes that an eventual ELLA SDXL will give me the better prompt understanding than that a finetune/fix of SD3 2b will give me the pretty pictures i'd like (and by that i just mean aesthetically pleasing, using styles / artist etc)
So how do you get specific faces when the model has no way to call them?
The text labels them generically and you can't replicate them with specificity.
How do you get a style when the style has no name and its key nodal trigger was severed?
You don't get to respect IP and have a great style model, sorry. The words we use to describe IP and copyright have to be maintained for proper coherency.
No big deal, china will release something to prove me right before too long lol
yea he basically stirred up the hate even more
since you don't really want any assistance or suggestions, you'll get none.
i unfollowed them, i mean i only used absolutereality 1.5 anyways
he's busy training 8b now so that's more likely why we won't see them. i kind of wonder tho, why anyone treating him the way they do would expect to be treated professionally? "The customer is always right" doesn't have much merit anymore
sd3 M
You have no solutions.
You are just upset that I am correct about Ip and the latents lol
what do you mean "sadly at this point" it has been available for barely a day. same crap people said with SDXL... you have NO idea what will come of it, cos you have NO idea what will come of it, maybe you're right? but from what the community has shown me, I think probably not
cos...its skill bruh
no one ever attacked him tho, people were unhappy with SD3 and he basically lashed out at everyone. Only then people started to attack him
naw. i'm just not going to engage any more with a whiner that thinks they know everything and who has shown they know nothing
SDXL is awesome now, the base model is kinda lame.
Stability trains poorly and they always have
@toxic bone Just basic decency is enough. This man has too big of an ego. I read the whole thread from a neutral point of view (not a pony user, not a SAI fanboy), and if you talk to people that way, you are just a douchebag, that's the extent of it
so sad, tears in rain.
i liked when he said "for the 100th time"
Tbh Lykon got thrown under the bus. They used him to hype SD3 and they used him to take the heat when it sucked. Maybe they'll fire him for unprofessional behavior idk, but doing so would in no way address the actual problem which is bad management.
I think it will work very well with finetuned models also with some limitations (and I'm not referring to NSFW)
exactly, I was responding to him saying finetunes will not be so hot compared to sdxl from what I've seen thus far, some... things, I mean I have high hopes with what we'll see soon even
Man go read his replies. If you side with his way of talking to people you are just as bad as he is
if "skill issue" and "git gud" hurt your feels, you might've been part of the participation ribbon generation
Ok mr "i teach ai image generation model architecture" man.
There are plenty of people who love everything about the model for you to engage with.
If you can't acknowledge shortcomings because of your ideological bent, don't blame me.
Well, time will tell won't it... I've been around the 1.5 release, the 2.0 and of course SDXL, for me this is by far the weirdest. On one hand, it is better than all else, on the other, it's by far the hardest to get something nice out of. Not saying 1.5 worked wonders, but in ways (styles), it responded more to prompts than this one. And yes, of course, each time expectations were lower
For sure, I think SD3 is bland as a base model, but I expect it will be awesome eventually.
It really "feels" like SD2.1, which could have been improved, but everyone migrated to SDXL after SD1.5, so it got the attention to improve.
I am a very successful developer with decent critical thinking skills I really don't feel treating ignorance of something at some point is warranted (hard sentence not a native speaker). everybody is ignorant at some point. Lykon probably learned half of the shit he knows when he joined the actual researchers at SAI. The Git Gud attitude is just pathetic
Ella sd3 is possible ?
I'm glad you been around since 1.5, I've been working with generative AI predecessors since well before even crayola, doesn't make me any better at predicting things without evidence to suggest anything...
Prompt?
lol git gud is long time internet culture. What developer corners were you in where RTFM and other "harsh" words weren't thrown?
Unwarranted. People were distressed by actual issues in the product you overhyped, its a good time to take a humble pill
maybe tell what someone is doing wrong as opposed to saying i dont know what the fuck you are doing wrong
when i took javascript courses at a community college, the instructor was relentless on newb questions. no mercy. that's just how it's always been.
Friggin, linus of linux creation got sent to camp to learn how to behave better.
well, as i say, time will tell. would be great to be wrong, but after this botched launch, theres little reason for me to be optimistic
The helmet looks goofy, but its good so far
what botched launch?
launch was fine, some employees were not
employees were fine. fragile people were not
will have to wait for the bigger version to see idk
In a surreal, enchanted grove, an ancient tree with luminescent bark stands at the center, its branches forming intricate patterns against the twilight sky. Captured in stunning high-definition with a Fujifilm GFX 100S, 45mm lens, f2.0 aperture. The tree's bark glows softly in shades of blue and green, creating a mesmerizing, ethereal light, shown in perfect sharpness.
The background features a vibrant array of bioluminescent plants: glowing ferns, neon-colored flowers, and vines that shimmer with an inner light. Fireflies dart around, adding to the magical ambiance. A crystal-clear stream winds through the grove, its water reflecting the myriad of colors. The ground is covered in soft, mossy carpets, with delicate, luminous mushrooms dotting the landscape. The scene is serene and otherworldly, capturing the enchanting beauty of this hidden sanctuary. The overall photo quality highlights the intricate details and realism. Tags: National Geographic, award-winning, surreal nature, enchanted forest, stunning detail.
oh yeh, this 2b model is all it was made to be, totally better in many ways than the 8b one in the api, seriously, are you trying to be difficult?!
Wrong, I can do it with base SDXL. No controlnets/adapters/loras.
Just because you can't doesn't mean the models don't operate like that. Lol
https://github.com/DataCTE/sd3_prompting @lavish osprey to prove a point lmao
yeah fragile people. I am just waiting for you to say woke or use that terminology now
you sound mighty salty for something you don't even pay for.. why so mad? I wonder...
crap performance for what should be a transformer..
https://github.com/DataCTE/sd3_prompting check this out, try including to your prompts
I think, as told, SD3 finetuned models will be great. Not having the style knowledge is limiting even with lora. You can't have a lora for every artist or particular style. Without that knowledge images will be very similar one to another.
How i feel about SD3 is a mix of SD 2.0 and SDXL. The reason SD2 never took off is because it was overshadowed by SDXL, and people whined about SDXL because it couldn't do poses. SD3 can't do anatomy but it's far better than other models in all other aspects, with time it's going to get better
ya why play the base game just use the mod packs
Legitimately no.
Its a grammar technique and you can find it on your own lol
2b sdxl is all you need
A young woman with a striking appearance, showcasing dramatic makeup that accentuates her features. She has long eyelashes and voluminous false lashes paired with vibrant blue eyeshadow. Her hair is styled in large buns secured by black accessories resembling chopsticks or a style commonly known as 'Buddhist Buns.' The makeup on her face includes dark eyeliner, winged eyeliner extending past the outer corners of her eyes, and contoured cheekbones. She is wearing heavy eye makeup that enhances her gaze. Her lips are colored with a nude tone, complementing the overall look. The woman is also adorned in black accessories that include chokers and possibly other body jewelry, although the specifics of these pieces are not visible in the image provided.
i get yall mad but take it to general if yall are just arguing
I remember waiting almost a month to move from 1.5 to SDXL, launch is usually a mess in a way or another
I've seen these crazy narrow unnatural hips often now. Looks like Barbie hips overpowered the 3 remaining woman in the Dataset.
What gets me, is I bet 99% of people here use only single prompt pathway and don't use all three clip inputs conditionings
month? I had to wait for almost 5 months before sdxl arrived to a usable state
Lol, i am sorry prompting is so hard for you
([name 1|name 2]:0.7)
You are welcome, and yes any name real, or madeup will work.
It's not obvious what each of the CLIP / T5 boxes are actually for?
Not a lora but a SDXL refiner + SD3
my bad
Yes, plus no CN or unsupported cn, no IP adapter, etc
with / without
it doesn't affect my style of prompting much, because I'm essentially doing the same thing.
Captured in stunning high-definition with a Fujifilm GFX 100S, 45mm lens, f2.0 aperture
^ camera + photo settings makes a huuuuge difference
Tags: National Geographic, award-winning, surreal nature, enchanted forest, stunning detail.
^ not nearly as strong of an effect. but does positively shift it, as long as they match the subject you're generating.
How not? https://arxiv.org/pdf/1910.10683 here's all you need to know
Are there any real benefits of 3 clips over just the 1?
Are they still training 8B, or is 8B already completed and they're holding it back? Is 8B what was demonstrated in the research paper?
I'm not a scientist.
It improves model quality, without it, model quality degrades
It will be as soon as models are trained, IPAd on sdxl is still a shod next to 1.5
@noble coyote well I don't know how "beneficial" per se, but it gives "different" feeling and control over prompt styles,
Hell I mean L / G prompting with SDXL can make some stuff SO much better than mono or duplicate prompting
i kneel
I suspect that 8B will be the Rolls Royce/Cadillac Marque - and chargeable. While we get the Lada or Trabant version i.e. SD3-Medium
That's all fine, I'm just confused about 8B's status
2B is so close to SDXL so that's not the problem I think. 8B will require very powerful PC
I wager we'll see it, just as this, cos they will make more money if people train themselves to use it at home and decide to go commercial with their usage of it
OK
I'm happy to use cloud GPUs on runpod or wherever to run things.
or a lot of compromises to make it run on a high end home pc
SD3@ClipDrop is head and shoulders above SD3-Medium
I mean otherwise, with no access, people will just stick to sdxl and not pay for something they have no familiarity with
eh it technically should but not in sd3. using 2 is the best
why the heck is SD3 so fucking horrible?!?!?!!?
Skill issue https://github.com/DataCTE/sd3_prompting
it isn't, its good at some types of things and worse at other types.
for text, and complicated instructions, it's better than sdxl
for humans and anatomy, it's def worse
Lol "skill issue" aka, natural language isn't calling the latents well because of how stability trained the model.
SD3 at ClipDrop and Glif is superlative!
cos you type wrong words into the clip encoders
i really don't know where this misunderstanding is coming from. 8b can definitely run on 24 gig cards including T5 because you can run the TE in RAM. The fp16 base model should have around 16-20 gigs of vram usage. So the only downside is that it becomes a bit slower, nothing else
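A back-of-the-envelope check of the "16-20 gigs" figure, as a small Python sketch. It counts weights only (fp16 stores 2 bytes per parameter) and assumes the model has exactly 8 billion parameters; activations, the VAE and any text encoder kept on the GPU come on top, which is presumably where the upper end of the range comes from. `weights_gb` is a made-up helper name.

```python
def weights_gb(params_billion: float, bytes_per_param: float) -> float:
    """Size of the raw weights in GB (1 GB = 1e9 bytes), ignoring activations and overhead."""
    # (params_billion * 1e9 params) * bytes_per_param / 1e9 bytes per GB
    return params_billion * bytes_per_param


# Assumption: "8b" means 8e9 parameters in the diffusion model itself.
fp16_8b = weights_gb(8, 2)  # fp16: 2 bytes per parameter
fp8_8b = weights_gb(8, 1)   # fp8: 1 byte per parameter

print(f"8B weights in fp16: {fp16_8b:.0f} GB")
print(f"8B weights in fp8:  {fp8_8b:.0f} GB")
print(f"Fits in 24 GB for weights alone (fp16): {fp16_8b < 24}")
```

The same function gives about 4 GB for a 2B model in fp16, which is why the text encoders, not the diffusion model, dominate the VRAM numbers people report for SD3 Medium.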
give up the words
come on, that has to trolling, using "quality tags" is no skill issue. It's a model issue if that's the go to way get somewhat ok results
using the text encoders stability trained and released with the model
not simply calculating the raw embeddings mentally and feeding those into the model
pathetic
Isn't there an alternative to maintain the quality but using a little less vram? I am running it in low vram but it is very slow
Didn't SD 1.4 and 1.5 rely a lot on quality tags?
They did, but knowing the tags isn't a skill lol
so... it cuts out 90% of SD users
It cuts-out low-budget "enthusiasts"
and with the release of nvidia 5000 series, the old gen becomes a lot cheaper
That's why I want to run it with the highest possible quality but with a little less vram
no way. NVidia never done
How much development happens with models that aren't widely used?
Any?
Wierd, the 3000 isn't cheaper
memory tax
were you able to find any more hidden prompts?
Cascade was going in the right direction ... smaller, more powerful models ... so that a hardware change would not be necessary
Because they stop producing them, buy a used one for 300 dollars. They throw them at your doorstep in ebay
300 for which card?
a 3090 as i said
I would be afraid that they'd been used for BitCoin mining ... and near to burn-out!
could you share prompt please? thankyou
what kind of dollars you talking about cuz they are like 700
i use comfy only on my mining cards for 1yr straight now no issues
frankly i am not happy with sd3...well i am disappointed
well yea, not if you are a bit careful. I bought mine 2 years ago and it is still working. Just look at the seller. Most private people just sell their old card for a newer gen. Don't buy from chinese vendors. In most countries BC mining isn't worthwhile due to electricity costs
i gotta ask is Emmacado possible in SD3? 
idk all i know is i turned lykon into a bad ass anime girl.
it is true, and if you combine both the prompts you get this... if you add "Portrait and background by david Lynch"
And SD3 is better, because loras will work for big and little sd3. Cascade was just wonky.
Those "low-budget enthusiasts" ensure that the bigger half of SD3 will get needed attention.
And "low budget" is an unnecessary drag, there are 1500 dollar computers that won't be able to run the full 13gb sd3 2b. Be reasonable.
(brain surgeon:1.35) preforming a ribbon cutting ceremony
yup, they weren't better for it, it was a hack due to training data. the big thing about sd3 is that it's supposed to be prompted by natural language, just like other recent models
I'm trying out the prompts in the GitHub with the classic girl lying on field but getting same results
I'm "half-disappointed" - as I cannot re-run all my old SDXL prompts and get anywhere near the quality. But using generalised terms (and not accredited artists' styles) SD3 can be a revelation!
Just feels like another tool, not a next step. Imo
@molten valleyCheck DM
imo we should go back to using sd 1.3 and disco diffusion
helps with a lot but it can't fix everything
SDXL LoRAs are not directly usable in SD3? I mean I should link SD3 to an SDXL setup, and then use LoRAs?
you LIED to us!
like i was expecting WAY better as seen in demo and paper...the images were top notch quality but now we get like after 5- 10 generations that too after experimenting with prompts and tweaking settings a lot
SDXL uses Unet. SD 3 does not. they don't speak the same language
Believing everything on the internet is a skill issue.
sorry fren! it helps a lot but not everything sadly
Is there any good text prompting grammar that has been found yet?
I just describe the thing with text and then quote the text, works ok but seems like it gets confused with the letters about halfway through the steps, sometimes works, sometimes doesn't.
Do you have workflow/prompt in image?
SD 3 has anatomy issues with a FEW positions - a person laying on something is one thing it needs more training on. at a guess, most of the data set is probably pictures of people standing up
Free SD3 will take a lot of taming and refining - for those who have patience. Meanwhile, SD3 is Free on Glif; and affordable at ClipDrop - and is much better quality than SD3-Medium
could you please share the link as well? thankyou!
Prompt works-ish
there was a good hack that used standing for it in the image. #sd3 message
9 out of 10 "Text" prompts on my SD3 setup are unusable/need Photoshop
a big part of the issue is that the model has three text encoders, and all three of them have different concepts they're better at and not others
Dude...you can easily fix it with a 2nd pass to sdxl with 1.0 in denoise...skill issue
This looks exactly like base sdxl tbh
ouchh scary
9 out of 10?, almost all of my text generation have been good
an embedding might very well fix everything. we'll have to see
prompt: by artist "Greg Craola Simkins" <--- almost, but not quite, exactly what he creates
Got it on first try
How are you prompting text?
alright I'm back wtf 1 minute slowmo
My settings must be poor!
thankyou so much..does this require a sign up or something? and is there a limit to generations?
I just add "Contains text that reads "blah blah"" near the object i want to add text to
I made about 80 generations ... then it made a time-out ... came back 10 minutes later and carried on. There are many other Free SD3 Glifs - just do a Search
Got this in one gen of a woman laying in grass. It's top 10 worthy gens ever in 2B for grass-laying?
oh wow that seems great!
the limit is 200 images daily
That's exactly what I have been trying.
Do you lead the prompt with the text direction?
but which version of SD3 does glif call?
ClipDrop.co SD3 is $10/month for 1200 generations - which to say is 10 prompts/day - and 4 pictures/prompt
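The ClipDrop numbers above work out as follows; a quick Python check, assuming a 30-day month (the price, generation count and 4 pictures per prompt are taken from the message, the variable names are mine).

```python
monthly_price_usd = 10
images_per_month = 1200
images_per_prompt = 4
days_per_month = 30  # assumption: a 30-day billing month

prompts_per_month = images_per_month // images_per_prompt  # 1200 / 4 = 300
prompts_per_day = prompts_per_month / days_per_month       # 300 / 30 = 10
cost_per_image = monthly_price_usd / images_per_month      # a bit under one cent

print(f"{prompts_per_month} prompts/month, {prompts_per_day:.0f} prompts/day")
print(f"${cost_per_image:.4f} per image")
```

So the "10 prompts/day" figure holds only if every prompt produces the full batch of 4 images and none of them are discarded.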
yes, here a image i made as an example
it is 8B from my testing, looks completely different and better than 2b local.
SD3 on Glif seems superior to SD3-Medium ... but I don't know what brand of SD3 is in use there!
damn SAI hates itself
It uses the API, which is the 8B version
I look at this generation and it brings tears to my eyes, just came across art from 4chan on DALL-E
How does Glif provide Free Sd3 at all?
honestly idk, but i know the creator of glif so i will go ask him
I'll try to reproduce this in sd3. I'll just have to remove the gun so the fingers don't show
The trouble with Glif is , filters are super sensitive, I really want to play with the 8B locally!
Gotta say it's pretty good with selfies
Glif might not remain free forever, they have plans to monetize it because they are burning through money
its that same girl,
lol
Its free, so no cash loss. ClipDrop gives you nothing even if 3 out of 4 generations are blurred-out!
Get In!!!
Don't worry, 8B will get worse before release to match 2B.
Just put SD3 into the search box at Glif - and many other SD3 Glifs are available...
It also has Dalle-3
Fabian@Glif
I have been using Glif for a long time now
I think it took the art station prompt literally
Kinda expect glif wanted to switch to running SD3 local, not unthinkable it already got more expensive than hoped for, they're really in a tricky situation, ditch sd3 8b for 2b and get a lot of disappointed users, or keep hoping 8b comes soon
currently they have no plans on switching to SD 3 2B
I have an appointment with a mattress, a duvet, and a pillow! Peace Out!
Should have removed the gun
Leading with text as the very first thing does make a difference. In output coherency.
always five fingers..
man I thought it could do text well
The highest I've seen so far when running is 10Gig
so I do wonder about T5, I mean, it is a self attention model, I wonder if TELLING it to do something rather than the descriptive CLIP manner for example "Create an photograph of...."
It can do text well, it all depends on your prompt just look at the image above by r1bb1t
8B doesn't look better...
what is this T5 people keep talking about
when I found the reddit post about running SD3 it said very little results difference between regular and t5.. t5 was just extra in case you wanted to do finetunes. This post here: https://www.reddit.com/r/StableDiffusion/comments/1de65iz/how_to_run_sd3medium_locally_right_now/
yo that was me on discord lol
Cute
it says T5 "only slightly improves results".
You're famous!
basically, sd3 has 3 text encoders(different sizes) You can use 1 of them, 2 of them, or all 3. More uses up more vram. 2 is a sweet spot for sd3 currently. The t5 one is the biggest one and hence usually performs the best. So use that and the 2nd biggest one which is clip vit g i believe
Im not using my glasses rn, is this real? it looks good XD
It's good and it's Stable 2b
Stability's attitude has been garbage since Emad left
yay lmao
i think i had enough of SD3 for today, goodnight yall
Say goodbye to Midjourney and welcome the new AI model that's set to redefine the future of AI art generation: SD3 Medium! BUT It has issues...
In this video, I'll show you good, the bad and the ugly side of SD3 and how this release will pave the way to the future of AI models!
Have you tried SD3 Medium yet? Let me know in the comments!
Is he stoned/blind or is the video a sarcastic pisstake??
Longer words, multiple numbers, really stresses the text gen.
It will write simple stuff well
I thought he speedran any% the biggest blunt but he was being sarcastic
aitrepreneur is awesome
watch the video and then say something about it
I guess this model did end up like SD2, a lot of people just hate it, but I still like it despite all the downsides
Yep, exactly the same thing
chill its only the second day, SDXL fine tunes didn't come out until a week later or something i dont remember very well
(fucking around with torchdiffeq, it's not going completely well so far but we're making progress)
I hope when 8B comes out, they tone down the DPO a little, so the model looks a little more bland and will have more variety, but that's just a guess
Its just starting like SD2, it can still end like SDXL.
With adapters/lora, we will get a lot of versatility out of 2B and 8B.
I did
I hope you are right. I wonder how much compute it'll cost to unruin basic humans.
yeah im waiting for finetunes and loras from the community
when onetrainer gets the SD3 branch pushed into master, I'm going WILD with training loras *rubs hands*
Im sure finetunes will be even better than sdxl, not more, just better
I am hoping it can be fixed with just textual inversions
Lol, lora re contextualization in a finetune should "fix it", but it is literally going to require adding data to the latents as it's obviously not in there right now
That sweet sweet 16-channel Vae though... I want it for XL. -.-
I like SD3 more than what I liked SDXL in the start. But it's maybe because I hope that finetuning model will be as great as it was for SDXL. I mean, landscape and prompt adherence is really good. So maybe it's even easier to train?
OH I hope boring reality makes a comeback, it would be SO GOOD for SD3 2B considering how good it's already at low quality photos
Also I haven't finetuned a model yet, only trained loras...but I think it's time to stop using AI images to train a finetune model, it's just a bad idea
No lies I like SD3
It doesn't really matter as long as the associated tags/text is good.
We should be able to train on anything.
I still need a good up2date tutorial for that. Like - how do I prep a dataset to train 1 or more embeddings? I don't think I can just use my regular datasets for that.
synthetic dataset didn't end up helping ai art model like they did with LLMS
yeah basically sd3 is better at some things while worse at somethings. sdxl was basically mediocre at most things in the beginning
Has anyone found a way to make poorly drawn things?
I haven't tried it yet, I'll try rn, I had the idea but I forgot about it, thanks for the reminder
Did people figure out SD3 yet? Or are we still in the chernobyl phase?
The main thing is, there is this model (8b) with far fewer downsides (and one more, it doesn't like the long ass prompts unlike 2b), made by the same company, and even the same model family (sd3), this (2b) was to be just more "lightweight". It's really hard not to turn into a skeptic when apparently, all that (8b) turned into this (2b). This is not what was expected by a lighter model, maybe bit worse prompt understanding, bit worse at details, but this model, is simply totally different, not in a good way
People are figuring it out, but we lack all of the regular tools to even begin to use it like we use the other models.
Give it a month.
looks like a colouring book
You're right i gotta prompt it to be more simple
another better tool is gonna get released in that time
arrogance is bliss
I agree, SDXElla look promising.
I have an sdella > adapter > SDXl pipeline that rivals midjourney.
But sd3 has wide accessibility and all it needs is an adapter to make it comparable.
sd 8b vs mj v6
Much the same unfortunately. Will post some results when I get a moment
Prompting for SD3 reminds me of prompting for cascade
Midjourney has some serious rizz
Not so different tbh, sd3 did well
gotta wait til 8b, if it ever releases...
I hope if 8B comes local in the future, they will not train it to be super duper aesthetic like 2B or whatever
2b is aesthetic?
I'm sure finetuned 2b won't be very far away from 8b in quality, it would only need to be more specific
Midjourney finishes so much better, similar composition but the "end details" are hardly comparable.
SDXL cleans up like Midjourney at the pixel level, SD3 has some crunchiness in there. I kinda expect better samplers to help this out.
2b doesn't need finetuning, it needs re-training
just look at non-photos, like illustrations or paintings, it's clear that it was trained with a bunch of aesthetic stuff, it looks super saturated and contrasty and not something I'd see from a typical base model
but yeah, its not mj6-like aesthetics, but like typical SD finetune/merge aesthetics
It does horror great β¦ I always thought anything SD does has always and will always be the MJ killer
Lmao, i mean technically finetuning with more images than the training
2b can generate like 4 styles, 8b can generate whatever style you want, it seems we have a completely different model than 8b
Oh yeah I've been digging sd3 using cogvlm captions from my own dataset, there's some lovely stuff in there. That seems to be the secret, use CogVLM style captions as prompts
bro why do we have to wait 40 sec to write, this shit pisses me off
Even people turn it pretty good
care to share a few prompts that turn pretty good? just want to know what such prompts are supposed to look like
I would, but I'm not at my PC right now. But...
Industrial-style dimly lit attic interior with wooden beams and cobweb-covered trunks. A large glass jar centered, containing turbulent water with churning waves and whirlpools. Amidst the jar's confines, two toy-scale battleships made of metal or plastic clash, their miniature cannons firing and smoke plumes rising amidst the agitated water. Dusty attic debris and old trunk labels blurred in the background, creating a sense of neglect and aged atmosphere.
They should have released the half trained 8B instead then
Upside down houses...or something. Turned out kinda cool looking though
thanks! so not the crazy long stuff with flowery prose, this helps a bit
You could try trawling ideogram for prompts, they work quite well.
Some parts of SD3 have me really hopeful, but some others have me really worried
Like in some ways its so much better than SDXL, but it seems to have massive blackouts in its prompt adherence for random reasons
CogVLM style prompting it is. Thank you for the excellent suggestion
oh, great idea, tried a few on the 8b model, but somehow never tried in 2b, was just relying on previous llm rewrites which weren't really it, need better instructions/examples
My dataset is full of various VLM captions, CogVLM, llava, Phi3, and a couple more cos I like to keep a variety, most seem to work quite well, just pretend you're a VLM describing the picture you want.
phi vision is better
How does phi write out its explanations different to CogVLM?
but is it for prompts? the model was trained on cogvlm output
Phi was obsessed with handbags and backpacks last time I used it.
They explicitly mention that it used artificial data in their Dataset now. Ai trained on Ai.
zombie gore/blood ⚠️
you can't tell me that this doesn't look like juggernaut or some other finetune
Naahhhh fuuuuuuuuuu*k, its a giant mistake, I had the same mistake using loras, other people had the same mistake training finetunes....damn
I didn't use CogVLM to write the prompt, I just copied the style to make the prompt. And the format of segmenting the subject, background, and style into three paragraphs with isolated descriptions works well.
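For anyone who wants to try that three-paragraph layout, here is a minimal sketch. `build_prompt` is a hypothetical helper (not part of any SD3 tool), and the example text is loosely adapted from the attic prompt posted earlier; whether blank-line separation matters to the model is untested here.

```python
def build_prompt(subject: str, background: str, style: str) -> str:
    """Join three isolated descriptions into one prompt, one paragraph each."""
    parts = [p.strip() for p in (subject, background, style)]
    # Skip empty segments so a missing piece does not leave a blank paragraph.
    return "\n\n".join(p for p in parts if p)


prompt = build_prompt(
    subject="A large glass jar containing turbulent water, with two toy-scale battleships clashing inside.",
    background="A dimly lit attic interior with wooden beams and cobweb-covered trunks.",
    style="Industrial-style photograph with a dusty, aged atmosphere.",
)
print(prompt)
```

The point is only to keep subject, background, and style from bleeding into each other; paste the result into whatever prompt box you use.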
We are talking about prompt formatting. Not vision models necessarily.
yes, its a vision model. You tell it to explain the image. You can also give it examples. Its much smarter / more accurate than cogvlm, both in real world use and its benchmark rankings
AI trained on AI only amplifies AI mistakes
that sucks if that's really the case, also this slomo is really fricking annoying, tone it back down to 30 secs please
for the technical people - here's the SD3 network architecture https://encord.com/blog/stable-diffusion-3-text-to-image-model/
Not if its done properly which microsoft did in this case. Phi punches way above its weight
yeah I hope they don't turn 8B into an endless AI RLHF nightmare and make it look like juggernaut or dreamshaperxl
please keep it to regular images or something
I love sticky headers which take up a third of the screen! I hate web designers with a burning passion.
Google?
aw don't see gguf, I hope gguf comes soon or taggui support
Woops meant microsoft
I just read the announcement. I'm pretty sure it was in there or in their technical paper regarding their dataset.
Nah, training an image AI with AI generated images makes the model learn the biases and slight mistakes made by the AI and "remember" them as if they're normal
Well I tried to get your flow to work with mine. I could not get that to work the same as yours for some reason.
What I did get was a different type of style from how I did change my flow.
I can still use your flow though.
Please keep some humans in the dataset.
If you do train AI on ai you need to make sure you also use real images to offset them?
This is the prompt I used "A children´s bad drawing of a car on the road, crayion draw style"
BTW: mods can you remove the 30 seconds limit?
I have spent the last 10 minutes trying to get SD3 to generate an image of a golden apple, and it just... won't do it
I have given it over 36 attempts now with several prompts.
Again, it depends. Wizard 8X22B for instance is the best performing local LLM and it was trained mostly on synthetic data. Same with phi now. Its a matter of using GOOD synthetic data.
apple coated in gold? nothing?
Actually it's a minute.
I grabbed your prompt from your workflow actually.
You can write a request on his repo.
Try messy scribbly sketch of... for fun times
not even when you want a golden statue in the shape of an apple?
@cunning lintel
do you have perms to decrease the slomo speed? and if so, could you please decrease it to like 30 secs?
"golden apple" is a prose term for a tasty apple. Did you try "an apple made entirely of gold" ? ... nvm, read your prompt. :/
but that ask for an apple, not a statue, not saying it solves it, but saw this as hamburger, golden hamburger, nope, but something like statue of gold resembling hamburger was ok
Try golden chrome apple, if you want shiny shiny apples
I know that for LLMs this is true, but for image generators it's different. I could be wrong, but it happened to me using only a few midjourney images for a lora, and to other models which produced almost photorealistic images in their first finetunes but look like plastic in the latest ones... I still think it's a bad idea (but only for image generators)
The oil painting aesthetics look fantastic tho man. Like, SD3 seems to have great fine details, but its inconsistent prompt adherence and composition seem to really cause issues
Have you tried it with just CLIP without T5?
and with T5 on it's own without CLIP?
Golden statue of an apple worked!
Strange that SD1.5 and SDXL can understand the apple is supposed to be gold, but SD3 doesn't. Looks fantastic!!!
The trick is to fill in its conceptual gaps with synthetic data and labeling it as such. And balancing it out with non synthetic. Then you profit off it filling in those gaps without a loss in quality.
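A rough sketch of that idea, assuming each sample is an `(image_path, caption)` pair. The function name, the 30% default cap, and the `"synthetic image, "` caption prefix are all made up for illustration; nothing here is taken from how Stability or Microsoft actually built their datasets.

```python
import random


def build_training_mix(real, synthetic, max_synthetic_fraction=0.3, seed=0):
    """Mix real and synthetic (path, caption) pairs, labeling the synthetic ones.

    At most `max_synthetic_fraction` of the returned samples are synthetic.
    """
    if not 0 <= max_synthetic_fraction < 1:
        raise ValueError("max_synthetic_fraction must be in [0, 1)")
    # Largest n such that n / (len(real) + n) <= max_synthetic_fraction.
    cap = int(len(real) * max_synthetic_fraction / (1 - max_synthetic_fraction))
    rng = random.Random(seed)
    chosen = rng.sample(synthetic, min(cap, len(synthetic)))
    mix = list(real)
    # Hypothetical labeling convention: prefix the caption so the model can
    # learn to tell synthetic images apart from real ones.
    mix += [(path, f"synthetic image, {caption}") for path, caption in chosen]
    rng.shuffle(mix)
    return mix


real = [(f"real_{i}.png", "a photo of an apple") for i in range(8)]
synthetic = [(f"gen_{i}.png", "a golden apple") for i in range(20)]
mix = build_training_mix(real, synthetic, max_synthetic_fraction=0.5)
print(len(mix), sum(c.startswith("synthetic image, ") for _, c in mix))
```

The cap is what does the "balancing it out with non synthetic" part; the prefix is the "labeling it as such" part.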
from the onetrainer discord
As far as i'm aware. It varies from concept to concept. Some will work great with both, some wont right now, until it is fully trained by the community.
makes you wonder why there is t5, an llm, even if basic, if it can't even understand things. i think it's just bizarro trained, many tokens like apple are soo strong it can't be gold, and so it goes probably for a load of words, but you probably know better than i what could cause it
Pixart uses T5 and does not have that issue. Its not T5
Its def just bad training. In fact, T5 should on paper make it harder to mess it up like this
No T5 is just T5, I didn't bother downloading SD3s T5 and used pixarts
Like i said. It can also be clip.
Pixart is a much smaller model too. It's tiny.
wasn't saying it was t5, was just wondering why t5 isn't showing it's there, if anything i'd expect t5 to make things easier to understand, not harder for the ai
Better give him the cookies
Its how the model was trained. They clearly removed any and all images of humans in any even slightly "provocative" pose
It depends how the transformers were trained on it, and if the captions are the exact same for T5 and clips, then there shouldn't be much difference for the same prompt
Never ask sd3 to generate woman lying in the grass
It will right now because CLIP is causing a conflict with some images and concepts. Most of it is just due to undertraining. It was to serve a purpose tho. (the undertraining) To serve as a model that is soft and squishy and easy to train. Something that is overbaked will likely be harder.
I'm gonna try fine-tuning it using different captions for the clips to the T5 if that's possible.
I just randomly saw AITrepeneur used my woman laying in grass workflow in his video
The Grass Woman incident
It's not that bad except for when the prompt involves a specific act or action that it clearly wasn't directly trained on, I'm finding. It's definitely not "censored" more than Base SDXL was.
compare a person sitting / lying down vs sdxl. it is much more censored
It's not censored. At least not in that way it's not.
Deriving censorship from "not enough images were well captioned as being of a person lying down" is odd imo
WHAT THE FUCK! THIS IS REALLY GOOD! WHOA!
They actively removed images from the dataset to cause this. That is censorship.
Her head looks a bit small
zoomed the fuck out. lens distortion. Her arm is wicked big too. You feel me?
I tried the grass thing on like five XL checkpoints yesterday, NewReality was the only one that didn't produce at least half horrorshows
Either removed the images from the dataset or mislabeled them to make it worse than base SDXL
Like i said m8. It's not censorship. It's a lack of data. The model has been underbaked for easier community finetuning.
Yeah, the texture of sd3 really has potential. Imagine a good finetune model and make sd3 the final pass
They had that data for SDXL. Last I checked they didn't just toss out the dataset entirely. They actively made it worse at humans in any pose not standing
Yes because a company intentionally does things to make things worse, for profit, you know... licenses in droves by doing that...
No , what happened is they curated the dataset to remove any IP that they had no rights to, and things which are not part of what they wish to include in the model, and that's that... also, it is not bad at all if you know how to work with it...which none of us do by any means well yet
It handicaps the model.
Lets not pretend there is no negative impact
They curated it to remove any chance of people generating nsfw images from the base model to avoid possible legal issues. Don't act dumb.
The actual variety of people is extremely strong in SD3 too, the community should take care not to overtune in like that one frickin girl you see in every single animation on CivitAi
Anecdotal examples are not going to prove your point lol
That's what I'm talken about.
You right. They didn't. But they also weren't training a 800M model, a 2B model, a 4B model, and a 8B model. They didn't have the time to max out the training for 2B. So they optimized it for easier finetuning instead. Just wait for the finetunes if you don't like the base model.
That is not how it works... pixart is vastly less trained and does not have that issue
Its easier to remove nudity than IP
Where did you hear that it's intentionally underbaked? That's very interesting
They have trained every model they have released poorly lol
Stability did not do this "on purpose"
1.5 prompt run through SD 3. prompt: she underwater dreams at dawn,head and shoulders, long luxurious lustrous hair, watersmoke, portrait by Pino Daeni";"waves, bubbles, streamers,smoke"; synthwave ethereal "long flowing hair" "ethereal, fractal iridescent, filigree" <-- sampler: heun, scheduler: simple, modelSamplingSD3: 1, cfg: 4.5, steps: 28,
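If you want to keep settings like the ones in that message next to your outputs, a small parser is enough. This is only a sketch: `parse_settings` is a hypothetical helper, and the `key: value, key: value` format is just how the message above happens to be written, not a format any tool defines.

```python
def parse_settings(text: str) -> dict:
    """Parse 'key: value, key: value' into a dict, converting numbers."""
    settings = {}
    for item in text.split(","):
        if ":" not in item:
            continue  # skips the empty piece left by a trailing comma
        key, value = item.split(":", 1)
        key, value = key.strip(), value.strip()
        try:
            settings[key] = int(value)
        except ValueError:
            try:
                settings[key] = float(value)
            except ValueError:
                settings[key] = value  # keep non-numeric values as text
    return settings


settings = parse_settings(
    "sampler: heun, scheduler: simple, modelSamplingSD3: 1, cfg: 4.5, steps: 28,"
)
print(settings)
```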
Pixart people regularly come out as melted plastic messes, it's not that good
from two of the staff members yesterday m8.
Why are you trying to defend it. Its not like im saying they are evil or some shit for doing it. Its to cover their asses legally.
it's very intentionally rushed out, because the people in this discord wouldn't back off and let them finish the training that still has to be done. so they just tossed a smaller model together and released it
I mean yeah, but that isn't what I wanted. I got what I wanted in the end, just needed to severely hold SD3's hand. It looks amazing now tho
Yes, because its very very undertrained compared to SD. Yet it does not turn people into spaghetti monsters if you try and make them lying down or something
It's up for own opinion. But if they aimed for pure consistency. Everything would be overbaked. It would be harder for the community to train on.
So we have to do multiprompting now?
Ffs, can't stability release something that works normally without adding new hoops to jump through?
No no no, don't blame it on other people. It was 100% SAI's doing. They supposedly had results from SEVERAL months ago that look monumentally better than what we have now. This scapegoat of the community being to blame is succhhhh BS
i think you might be a lot happier if you created your own model
You talking to me here?
It does the same as XL, bikinis, nude without any details on npples and pssy. People are over dramatic
what you said is not what you're alluding to here, you said they mislabeled things to avoid their usage. I disagreed with that, no shit they curated it, they have to. It is not mislabeling, but intentional curated removal that allows for uncovered breasts, but no nipples lol.
@stray bronze this is also a smaller model, it is not quite the 8b I wager it'll be a much more interesting model when it is released.
@solemn raven you misunderstand what I said. They removed IP that requested not to be trained on, was what I meant. Things they're not supposed to train with, from the owner of the image itself. If someone had a stock photo of mickey, legal with disney or not, i wager if the stock photographer who took the image did not request not to train on it, it will be there of course
i'm not blaming it on anyone, i'm telling you bluntly why you got this now. been to the artisan channels lately? we're still beta testing in there
I thought you were done arguing?
Or do you just want to defend an image model like you have some ownership of it?
I have my own pipelines that work exactly like I want, and I generally find ways to use stabilities models better than they do.
Sorry if its becoming a sad trope.
poor baby doesn't understand things
Bro, SAI staff was showing off images like this MONTHS ago saying SD3 could do photographic realism
Literally like 3-4 MONTHs ago. If they somehow managed to train backwards, IDK what to say
what am I arguing I did not say it was perfect or wonderful even, just that what ada said , was not the case.
Then move on and use your pipelines.
Left image: ground truth. the rest (right): SD3 (API)
8B is going to be cool, but I can't run it local, and if I need that much hardware capacity, I am simply going to look elsewhere.
Stability were aimless and they still are.
those results have nothing to do with sd 2b, some of the more remarkable replies yesterday.
It does "Son Goku next to Garfield" quite nicely
SD 3 does do that. in fact, i just posted one like that with the 2B version of SD 3 that was released as medium which you are whining about. go work with ultra in artisan if you want to see what 8B is capable of.
ATM I don't even know which way is correct. To get something decent in terms of anatomy you need to vigorously prompt every single aspect of an image in order to get just one out of hundreds of generations to look not completely crooked. Three way text encoding is: L is for stylistic aspects of an image (are you trying to generate a photograph, or a painting), G is for a short description of the scene, objects, etc. t5xxl is the most powerful text encoder with the biggest buffer, and it should be able to understand lengthy sentences, not just tags
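A minimal sketch of that three-way split. As far as I know, the diffusers `StableDiffusion3Pipeline` takes `prompt`, `prompt_2`, and `prompt_3`, which go to CLIP-L, CLIP-G, and T5-XXL respectively; `split_prompt` itself is a hypothetical helper, and whether style-in-L / scene-in-G actually helps is the claim above, not something this code checks.

```python
def split_prompt(style: str, scene: str, description: str) -> dict:
    """Map the three parts of a prompt onto SD3's three text encoders."""
    return {
        "prompt": style.strip(),          # CLIP-L: look and medium
        "prompt_2": scene.strip(),        # CLIP-G: short scene description
        "prompt_3": description.strip(),  # T5-XXL: long natural-language text
    }


kwargs = split_prompt(
    style="poor quality photo from the 2000s, flash, bright colors",
    scene="a playground in a bathroom",
    description=(
        "A playground built inside a bathroom, with a slide on the bathtub, "
        "swings, toys, and a pink lamp hanging from the ceiling."
    ),
)
for name, text in kwargs.items():
    print(f"{name}: {text}")

# Assumption: with diffusers installed and the SD3 weights loaded as `pipe`,
# these could be passed as pipe(**kwargs, num_inference_steps=28, guidance_scale=4.5).
```

The split also keeps the two CLIP prompts short, since CLIP only sees 77 tokens.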
I agree. I've noticed some folks on the subreddit have been thinking that we automatically need to assume they've been showing off the 8B model, not the 2B one. If that's the case then that changes a lot, but does that mean we were potentially misled to a degree, possibly. Still not upfront on their end.