#sd3
nice
Because lying down is too provocative.
this prompt actually works
Why is this a child…
SD3 LMAO
she adorable
all this wait for nothing
Find the extra finger
2B is all I ever wanted. If the internet shut down now, I could play with it for the rest of my life. Variety is fantastic
hello! First off, thanks so much for all your hard work guys and for releasing the model to us to test and play with!
I am pretty interested in its ability to understand different concepts, such as the example of a blue ball on a red box etc.
Are there any prompting tips for achieving these kinds of results? Say if I wanted to be able to describe each person at a dinner table and not have any bleed over.
The SD3 dataset probably didn't have enough images of humans in diverse enough poses to generalize knowledge of anatomy.
are you trollin?
because yulia likes to post child suggestive stuff and they're still not banned from SAI's channels for some reason despite it being specifically listed as violating policy.
First try, no problem!
finetuned sdxl is miles better rn
SDXL Base vs SD3
It's supposed to be "progress"
8b vs 2b
wow shes so beautiful, whats her skincare routine
is this for the grass wall??
looking forward to use 8b xD
I think putting a cap on the number of generations is the only bad thing about the license. It's not something you can realistically track and is basically telling someone how much they can use their own compute.
Where can I find the 8b model
I really hope something isn't right with the settings, this is just too bad for anatomy :/
in order to fine tune, does Stability need to release something?
only via api
something radioactive
or the Kohya dev just needs to figure it out?
isn't there a way to bypass triple clip loader? I thought someone said on here that I can use a double clip loader to use the clip inside the checkpoint ?
Thank god we still have SDXL Turbo and loads of LoRAs and models to play with; this just wasn't worth the wait. I can add text with GIMP and be done 10x faster.
Where is the 8b?
you can use it for free on glif
looking forward to 8b
What token length does sd3 support for prompts?
yes. same prompt. you can see it in the bottom.
also is it just me or are 90% of the women when you generate things w/out specifying race white or asian
makes me wonder about the training data
R they ever gonna drop its weights
grass wall gang rise up
Not amputee, just a bit different
don't want to use something like that online .. for this I use midjourney in "production"
there's many apps there that use sd3 8b, and they come with prompt processors and sdxl upscaling
maybe in august ppl say, we dont know
ah, i use the service for fun anyways
ah im in the right place at last
girk gang, in action movie poster , an oil painting, in 80s, rivets, long hair, in the style of mike deodato, luis royo, the neutral colors, paul hedley , happy, in city, movie poster Style: Neon Punk
Style: Neon Punk
So it looks like sd3 still works with the FaceDetailer node
That one is a simple one
A Photograph of a woman in the snow wearing a short skirt
says a lot that I came to the discord instead of making more images...
we want to test and experiment using controlnet, openpose etc. for poses like handstands, breakdance moves etc. that would work for our image concept
girk gang, in action movie poster , an oil painting, in 80s, rivets, long hair, in the style of mike deodato, luis royo, the neutral colors, paul hedley , happy, in city, movie poster Style: Neon Punk
Style: Neon Punk
alright I figured out how to bypass the triple clip: you just bypass it, that's all. if the model has clip it'll work
Nice. I'll have to switch over Comfy UI then.
thanks
I think clipdrop.co switched to SD3 2B and not using 8b api anymore... wonder how I know xD
LEFT is SD3 and RIGHT is afroman sd3 (they look exactly the same before I bypass clip)
Same seed 123456, and yes, it does different things. Prompt: A person wearing a hooded garment, with their face obscured by a dark, textured substance that appears to be melting or dripping. The substance is speckled with bright red and orange particles, giving it a fiery appearance. The person's hands are pressed against their face, further concealing their identity. The background is blurred, emphasizing the foreground, and there are floating particles or debris scattered throughout the image
SD3 BASE <> AFRO'S
the horrors of war
LEFT is afroman RIGHT is SD3
hopefully there's going to be a sd 3.1 that fixes some issues (especially with people doing poses and laying on the grass haha)
better every time
i like AFRO's better in mine and your example, hbu?
SDXL Base vs SD3
Hmm.. Something about using FaceDetailer causes next generations to slow down immensely though. Went from 1 it/s to 20s/it
girk gang, in action movie poster , an oil painting, in 80s, rivets, long hair, in the style of mike deodato, luis royo, the neutral colors, paul hedley , happy, in city, movie poster Style: Neon Punk
Style: Neon Punk
Wouldn't recommend using FaceDetailer at the moment
Screenshot of a facebook post captioned "in my meth era" on top. The account who posted it is named "John Doe". The man in the photo is shirtless. There is a US flag in the background.
from above, a woman laying on a grass field taking selfie photo
It's probably a lora merged into the checkpoint, it'll use the CLIP part of the lora to 'enhance' it, I've been testing loras myself
another gore image...
So celebs are fully deleted as well huh? My SDXL custom model vs SD3
it doesn't know styles like Deodato. It doesn't know any style
It seems like a lot of pose phrases like "Lying" "Laid" even "Sitting" creates meat blobs
That worked
It's pretty good with artistic images
are you trollin? It's not even in same universe
that's how he pulled it off so quickly? doesn't stuff like this take a while to process though?
SDXL Base vs SD3
Same seed, using the Cube Shaped Anatomy SDXL lora first, bypassed 2nd.
Luck with from above, a woman laying on a grass field taking selfie photo seed 602204995
neg: featureless, colourless
AFRO's version of the same prompt, it's still a dirty mess
Which checkpoint is everyone else using?
the right is better i'd say
t5 one
They're all identical, the lower two include CLIP/T5.
SDXL Base vs SD3
All 3
it is explained here: https://comfyanonymous.github.io/ComfyUI_examples/sd3/
SD3-medium-inc-clip & SD3-medium-inc-clip-T5XXL are standalone
SD3-medium requires the text encoders to be put into the /clip/ folder
if you have RTX GPU you should be ok for the FP8, check
Here's another one: "cnmtq cinematic scene with darth vader on the death star". First pic is with my Cinematique LORA, and 2nd is without. Same seed (769925201549290)
i ended this abomination before it got decoded
I didn't
NOOOOOOOOOOOOOOOOO
SDXL Vers.
PROMPT:
girk gang, in action movie poster , an oil painting, in 80s, rivets, long hair, in the style of mike deodato, luis royo, the neutral colors, paul hedley , happy, in city, movie poster Style: Neon Punk
Style: Neon Punk
damn, I need a minute with this one
she survived
hey, does someone know which folder i'm supposed to use for TripleCLIP? i tried to create a TripleCLIP folder but it doesn't show up in comfy
Put the clip files in the clip folder :]
weren't wheels problematic before? this looks okay
put the encoders in models/CLIP folder.
ahaha so simple, okay thanks for your answer
Does this count?
highresfix is so goddamn good on SD3 Base
Time for Burgers
we need finetuned models as soon as possible
With USDU?
hiresfix by sdxl?
SDXL Base vs SD3. prompt "a woman resting on a grass lawn. she is laughing. she is fully clothed. this is a safe for work image." Why does it almost generate NSFW but still body horror. How is censorship or under
Umbrellas still suck.
Now do it again but for sitting on chairs
nope, base model, but with ultimate SD upscaler so it uses tiling
kek
the trick is, to use shift 1 instead of 3 when upscaling
but ONLY when upscaling
when you make the base image, use 3
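A rough sketch of why shift 1 vs 3 matters. It assumes the "shift" setting here is the timestep shifting formula from the SD3 paper, sigma' = shift * sigma / (1 + (shift - 1) * sigma); the function name is made up for illustration and is not ComfyUI's own code.

```python
def shift_sigma(sigma: float, shift: float) -> float:
    """Remap a noise level in [0, 1] with the assumed SD3-style shift formula."""
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)


# shift=1 leaves the schedule untouched; shift=3 pushes it toward higher noise.
for sigma in (0.25, 0.5, 0.75):
    print(sigma, shift_sigma(sigma, 1.0), round(shift_sigma(sigma, 3.0), 3))
```

If that assumption holds, shift 3 spends more of the schedule at high noise (good for laying out a base image), while shift 1 keeps the plain schedule, which would fit a low-denoise upscale pass better.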
You upscaling with SD upscale?
as good as the VAE is with small face, nothing beats highresfix
they'll say you're a SAI shill now
how long till training code is released my friend?
you are a SAI shill now 
nice motion blur
You can't just willy nilly lay a woman down on grass. You first have to take her out to a restaurant, wine and dine her.
you can use ultimateupscale with the sd3 model. just set the tiles to the size of your original image. so if you were using 1152x896, you'd set the tile sizes to match that (that way it's just creating a perfect 2x2 grid if you 2x the image) and set the denoise low like 0.15-0.2
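The tile math from that tip, as a small sketch. The helper name is hypothetical and this only does the arithmetic, it does not call the upscaler itself.

```python
import math


def tile_grid(base_w: int, base_h: int, scale: float, tile_w: int, tile_h: int):
    """Return (out_w, out_h, columns, rows) for a tiled upscale pass."""
    out_w, out_h = int(base_w * scale), int(base_h * scale)
    cols = math.ceil(out_w / tile_w)
    rows = math.ceil(out_h / tile_h)
    return out_w, out_h, cols, rows


# Tiles matching the original 1152x896 image give a clean 2x2 grid at 2x.
print(tile_grid(1152, 896, 2, 1152, 896))
# Smaller 512x512 tiles would not divide the 2304x1792 output evenly.
print(tile_grid(1152, 896, 2, 512, 512))
```

With tiles the same size as the base image, every tile is a resolution the model already handled, which is presumably why it pairs well with a low denoise like 0.15-0.2.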
Shemales
maybe it already is? Check Diffusers. I'm on vacation
SDXL Base vs SD3 prompt "from above, a woman laying on a grass field taking selfie photo"
ask @dim fiber or @muted dove or @gentle folio for their settings man
SD3 s t r e t c h e s people out
lovely
lol
yo, is sd3.1 a possibility
how do you know which ones are actually using SD3?
NICE
for fixing 1 stupid prompt the whole community is focusing on?
In 2 weeks
(which apparently doesn't even need to be fixed)
perhaps lol
Two more weeks
Save me....
Why? Also didn't we establish there are a lot of prompts having issues?
just type up "sd3" in the searchbar and you'll see a bunch of spaces of em
I think we should stay away from the subreddit for a good while
Does anyone know what the difference is with T5XXL e4m3fn?
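For reference, "e4m3fn" is the name of an 8-bit float layout: 1 sign bit, 4 exponent bits (e4), 3 mantissa bits (m3), finite values only with a single NaN pattern and no infinities (fn). So that T5XXL file is the FP8 version of the encoder weights. A minimal decoder sketch, assuming the usual exponent bias of 7 (the function name is just for illustration):

```python
def decode_e4m3fn(byte: int) -> float:
    """Decode one FP8 e4m3fn byte into a Python float."""
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0x0F   # 4 exponent bits
    man = byte & 0x07          # 3 mantissa bits
    if exp == 0x0F and man == 0x07:
        return float("nan")    # the only non-finite pattern; there is no infinity
    if exp == 0:
        return sign * (man / 8) * 2.0 ** (1 - 7)   # subnormal values
    return sign * (1 + man / 8) * 2.0 ** (exp - 7)


print(decode_e4m3fn(0x38))  # one
print(decode_e4m3fn(0x7E))  # largest finite value in this format
```

With only 3 mantissa bits there is a lot less precision than fp16, which is the trade-off for roughly halving the file size.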
anyone claiming that SD3 isn't utter rubbish on there is going to be exiled no questions asked
what I've seen now is a bunch of people getting monstrosities and some people succeeding.
hands fixed in afterlife
bro..
Wait. Is that what's happening? Everyone is trying to make people lying on the grass?
Alright, you just made it sound like the issue people encountered doesn't have to be fixed
bro...
What do you think, won't finetuning break diversity of sd3?
You were talking to me? xD
to me I think
I hope I can try finetuning with 12gb 
Mine was a lucky hit, most of those laying down ones are just bad.
uh oh, you can't say that, it doesn't fit the narrative
"T5XXL e4m3fn" Does anyone know why it has that name? What does the "e4m3fn" mean?
Reduce the CFG down to get rid of the AI look.
keep going with head through the doors while there is a doorknob.
Thanx I figured as much that I went way too high on that
idk im getting decent hands
New VAE is indeed amazing, the CLIP models are doing wonders for keeping track of more complex prompts, but the 2B model seems to be really struggling seemingly at random
yeah you're definitely trolling
I like to think I know what I'm doing but I can't get a good image with that prompt out of sd3.
80s anime still, girl fixing a mech, retro fashion, muted retro colors, style of Dragons Heaven
@muted dove this is all i get
Something isn't right here
Oh! I see. Thanks. Thought it was something else that i wasn't yet aware of. xD
prompt?
beautiful hand
I love this model
the anatomy of everything is messed up when doing actions or poses
did you paste that prompt into just t5 or all 3? I'm only using clip_g and t5 with clip_l blank.
Saint Jerome is jacked
slick
Only using the T5 field and leaving the others blank
how tf did you do that
do you feel that covers enough topics? when I was trying that before, it often didn't know who people were etc.
just drag png into comfyui
Imo needed probably 2-3 months of training to make encoders converge on more complex things. Better to release it and move to 8b.
Will give time to the community to catch up on the new architecture
I don't know what the others are supposed to do, but had best results only using t5 so far
with t5 alone.
3 out of 4 good
it's good for commercials :p
mm, interesting, 8b at the moment looks pretty good
I love SD3 but it's not 1 stupid prompt. Lot of bodies not only the ones laying come out malformed
for the cat, the best one was clip g.
this looks like sdxl
the horrors of war #2
god
so, Im guessing all the bad renderings are because we dont know the right settings for prompting yet?
It's as good as SDXL at archery
could be the TEs (text encoders) fighting each other sometimes
OH CRAP
harry took the lookmaxxing class
what lykon is now saying, is that some stuff is going to look better with clip_g, some stuff is going to look better with T5. the problem is that the different encoders are fighting. what looks great with one, will then get messed up by the other one that's not as good. so with each prompt, we're supposed to copy/paste back and forth around the different encoders to see which one has the training. long sigh.
Yo guys she is actually sitting ON the bench with SD3, not that weird unnatural thing SDXL does
i can fix her
you need to train A LOT to make the tes converge on all prompts (or restrict the model scope).
For simple poses they don't fight, for more complex ones or things that the model has seen less, they'll conflict.
This doesn't happen as much with 8b because it's bigger and was trained for longer.
I just remade linkinpark album cover
lower the cfg
k
I appreciate the candor. I've been doing local vs api for a bunch this morning, and the 8b on the api isn't perfect but the shenanigans count is just night and day lower.
perfect form PR
AH didn't realize there was a new sd3 text encode node. Niiiiice
Same prompt and seed in T5, Clip_l, Clip_g
a photo of a woman reaching out to the viewer, she is smiling and a funfair can be seen in the background. It is a grey rainy day and she is completely wet
3
we now want to transfer 2b image clarity aesthetic to 8b.
Don't prompt for a watermelon eating marshmallows. Worst mistake of my life.
Hey, good to see you! I hoped there would be some SD3 launch event that you would host, as the SDXL one was amazing. But nope. Anyway, you can try this prompt structure, just change it to anything else you like. Prompt: Three girls in a hotel lobby: the first one is blonde with red dress, the second one is ginger with white dress, the third one is brunette with green dress
See? It toned down the AI look
So what you're saying is 2B isn't all we need after all?
no, I think that 2b is all you need as an architecture
seeing as I already think 8b is awesome, i have high hopes.
Jo Biden......
I can't deny at least the fact that the VAE is exquisite though
guh
by the way, most stuff should look better with all 3
i'm guessing handstands are off the table
maybe decreasing the strength of clip_g and clip_l could help?
Don't people get the weird blobs when using t5 only as well?
What the hell is clip_l doing here?!
compositions are great and look way more interesting - cool. prompt adherence is next-gen.
but the image quality of the outputs looks very compressed. maybe I'm doing something wrong but there are many compression artifacts or halos around objects - at least for me.
Same prompt in all 3?
how is this acceptable to release?
I've only tried the sd3+t5 combo so far. Could be better, could be worse. It's ok though.
Most of the weird blobs are because people are still writing prompts like they did for earlier models. You must use natural language for SD3.
not necessarily, but yes
I'm curious how long 8B takes to do 20-30 steps on an A100. Is it way slower than 2B? 10 seconds?
We can do non-centered subjects. 40% hitrate.
A sinister figure lurks in the shadows on the right side of the painting. In the center of the painting, an empty alleyway extends into the infinite distance, receding into fog.
SDXL still rocks
Prompt: Britney Spears DSLR cinematic shot, blade runner style, oops I did it again , photorealistic photo of a pink haired singer laying on a grass Coachella, after-party, messy hair, , bikini tall grass
@lavish osprey how well is it with dark horror? I've only played around with hp Lovecraft fused with jack skeleton
you guys liked this game back in the day too? it was my favorite PS1 game
I have the same problem on my local machine. Colab doesn't have the same issue for me. Something connected with negative conditioning
there isn't much speed difference compared to 2b
SDXL Base vs SD3
Really? Thats great to hear
hmm interesting. I will investigate this! thank you
at least 8B won't be a chore to use
It is a bit better...but... 
Taylor Swift DSLR cinematic shot, blade runner style, oops I did it again , photorealistic photo of a pink haired singer laying on a grass Coachella, after-party, messy hair, , bikini tall grass
I'm too used to decent speeds such as 2-4 it/s
Hopefully
well, I won't bet a hand on it, I always use them on H100 so I don't really notice
I guess if you have vram offloading to do you will see a huge timesink
I wonder when 8B will be finished... September? October? Later?
this is a late april fools joke right?
working as intended
I mean as long as 8B is less than 2x slower than 2B, it's still good news for us
Apparently one of the three clip models it uses is absolute dogpoop
Britney Spears DSLR cinematic shot, blade runner style, oops I did it again , photorealistic photo of a pink haired singer laying on a grass Coachella, after-party, messy hair, , bikini tall grass
your kid calls you at night and you go check on him and see this, what do you do
but to be honest if I were to decide I'd think twice before releasing a model if that's how the community reacts (it's just my opinion, not SAI)
If 2B still requires months of training, then it's anyones guess how much could be added to 8B. My guess... Maybe early next year? Idk...
for a base model, its better than XL at understanding. Now, what we need is some videos on how to finetune it so people stop bitching and comparing 3 to the latest finetune of XL
Are there any news about training/finetuning out there yet? I haven't found anything related on Reddit or here so far.
I'm coming from a pretty specific set of needs (limiting latency for www.wand.app) so I wouldn't base my perspective on the community as a whole or anything
Thank god it's not a man getting photographed there
when we released 8b on API, 2b was in this state:
this should answer your question
a realistic and high quality portrait of a beautiful young woman with european ethics. professional photograph of the woman in her minimalistic apartment. The wall behind the model is painting in a pale purple color. The woman is wearing a dark yellow turtleneck and white troussers. She is holding her smartphone in the left hand. Thephoto is taken from a side-view and the woman is looking onto her smartphone. Her hands and fingers are well proportioned considering her appearance. the photo is a high quality photo. the photo is shot from a wide-angle view. The photo and textures are in 8k resoluion.
People issues
That does change a bunch
(and also should help you guess how much we worked on this release)
It's unfortunate that an early version of 8B wasn't released then instead
or learn how to prompt to start with? Model just released
SDXL Base vs SD3
that's the current state of 2b as well. look at the mutants it makes
protip: reduce the strength of all clips so they work together rather than fight each other
ultimate combo
This all happened with the release of XL also. people bitched a lot when it came out, but those were just the noisy ignorant folks. I see this model with far more potential than XL base had for sure.
no matter what, people will always complain in open source releases. if something is free this is the fate, but community can be educated, i hope
3d render of n64 mario standing smiling smoking a cigarette in his mouth with smoke, looking at the viewer, pitch black background
no negative prompt
first is from glif stablediffusion 3
second is from stable diffusion 3 medium super prompt huggingface space with prompt enhancement disabled 25 steps 7.5 guidance scale seed 384575511
absolutely agreed
that same prompt/seed now
I see the potential, I see some things work much better. But is the deformation because of misuse or is it unfixable?
SDXL 1.0 Base
Red lips Beyonce Knowles DSLR cinematic shot, blade runner style, oops I did it again , photorealistic photo of a haired singer laying on a grass Coachella, after-party, messy hair, , bikini tall grass
what now? no shit. I am asking if there is finetune info out yet.
not SD3, though
Beyonce Knowles DSLR cinematic shot, blade runner style, oops I did it again , photorealistic photo of a haired singer laying on a grass Coachella, after-party, messy hair, , bikini tall grass
guys, won't finetunes destroy variety of faces in sd3?
you can say all you want about anatomy, but that's terrible image quality
ask someone else, I will not use finetunes. Base is all I need
Some old dall-e 3 prompts copy pasted
Training does happen at a fair pace then. Yeah. 8B might actually be ready sooner than i thought it might be. You doing a great job anyway guys. I can see 2B easily outperforming SDXL with more training. It's got genuine potential!
SDXL Base vs SD3
Wondering the same question as well...
I can see 2B easily outperforming SDXL with more training.
It already does on basically all serious benchmarks...
How did you get her to look like TS?
I'm just using Huggingface google Space for SDXL
judging by how blurry that is, it was made with base xl
lol
Full body Rihanna riri badgalriri crazy in love lemonade DSLR cinematic shot, blade runner style, oops I did it again , photorealistic photo of a haired singer laying on a grass Coachella, after-party, messy hair, , bikini tall grass
there is only one way to find it out... try it out
hows sd3 guys? yay or nay?
Not SD3...
LMAO
Yes, they're not SD3 anyway....in grass issues with that
It can't spell "Sally" to save its life though
maybe every comma in the prompt creates one extra joint
Yeah. The community has a good head start with that as base too.
Just look at photos being posted
a bit like how XL was at drop...not great, but ripe with potential for finetunes.
Will share SD3 versions for comparison next
Why the FP16 and not the fp8 one?
yea, i mean like everything is awesome, it just need anatomy and pose improving
i felt SDXL was decent at the start
both clips and fp16
comfyui has a node that separates the 3 text encoders so you can prompt each individually, what's the use case for that?
prompt?
full body shot of cute pale skinned european 18yo teen at the nightclub with {platinum blonde hair|blonde hair|ginger hair|black hair|brown hair|pink hair} and with big lips wearing pink lipstick and eyeliner, {smirk|happy|staring},
4k, hdr, 2160p
neg: bad quality, poor quality, doll, disfigured, jpg, toy, bad anatomy, missing limbs, missing fingers, 2d, drawing, asian, ugly, dark skin, tanned, acne, fat, obese, mature, old
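The {a|b|c} groups in that prompt are wildcard/variant syntax: one option from each group gets picked per generation. A minimal sketch of how that expansion can work, assuming flat (non-nested) groups; the function name is hypothetical and this is not the code of any particular extension.

```python
import random
import re

VARIANT = re.compile(r"\{([^{}]*)\}")


def expand_variants(prompt: str, rng: random.Random) -> str:
    """Replace each {opt1|opt2|...} group with one randomly chosen option."""
    return VARIANT.sub(lambda m: rng.choice(m.group(1).split("|")), prompt)


template = "a {red|blue} ball on a {box|table}"
rng = random.Random(0)  # fixed seed so a batch of prompts is reproducible
for _ in range(3):
    print(expand_variants(template, rng))
```

Seeding the picker separately from the image seed makes it easier to tell whether a change in the output came from the prompt variant or from the noise.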
I go away for 10 minutes and there's 50 new posts.
i agree its too early to panic, sdxl was garbage on day one, but sdxl day one didn't make people look like Star Trek Transporter accidents
It does a lot of things better than sdxl did. Prompt adherence is much better, text is much better. The stuff it's bad at, sdxl was also bad at. So I assume it will end up good. My question is: is it better than stable cascade?
thats not sd3 right?
Oh! And it was able to do that Back of the violin prompt too! I tried this prompt out and SD3 got it perfectly first time! That truly blew me away. Before now, i only saw models like Deep Floyd successfully do a prompt like that.
That dog is having Vietnam flashbacks
@lavish osprey i'm guessing unsampling is no longer a thing with SD3?
Trying to see how changing different clip prompts affect image. I haven't learned anything so far xD
Well, that's one way to look at it, when announcing 2b you said things like
@lavish osprey: "I personally don't think anyone needs 8b including us. With less training time and much less resources, 2b beats 8b in some benchmarks. Sure 8b is objectively superior in terms of potential, but the cost is much higher."
Things like that raise expectations, esp when the 8b model is what people know. I think no one can objectively say what 2b produces comes close to 8b, maybe in a few really narrow cases, but it's really really challenging to find gens that stand out using 2b, even pixart often looks nicer. Neither does it help to act so defensive, comments about skill issue and such, not helping, if it's skill, then teach people, show how to get the good gens! I'm clueless as to how, and many with me.
SDXL
Best adherence out of 8.
From left to right: An frowning old man, a sleepy young woman, and a laughing toddler. The old man is wearing a red fedora. The young woman is wearing a blue jacket. The toddler is holding a stuffed bear.
which image?
Yooo its epic, do you have the prompt of that image?
Pretty interesting conditioning. That works well?
SDXL Base vs SD3
A photorealistic monkey inside a claymation style city
I'd add photo in front
Left is Ultra (API) and right is local, t5 + clip:
lmfao
same picture
why not?
the one with rhianna
I can't wait to get home so I can use the model and create a monstrosity like you guys are. oh it's gunna be fun
Ultra is the best
I went back to the official ComfyUI SD3 example workflow. The prompt was: a woman dancing in the streets. The jpeg like compression artifacts are sadly all over the image.
Not sure if I'm doing something wrong here.
The second one is better.
but you said Ultra is just a workflow and not a model
15 steps cfg 2.5
no i think those are sdxl. my mutant beyonce is sd3
nice find
But why didn't they release it to the public then?
Keep in mind that you need the fp16 to get best results and the prompts in SD XL will need to be completely redone in SD3
euler?
a "middle-aged" woman
Ultra is not a model, it's stuff like Dalle or Ideogram
Gotta eat some potatoes
Im using f16t5. I used ChatGPT to expand my prompt.
im trying with some different settings, model sampling on high and sampler on low
agreed, im prompting in sentences, not keywords. LLM generated ones are also quite good
it's all already released basically lol
I think Ultra if it had been released to the public would solve all of the issues.
use node superprompt, it expands very well
actual answer: no one would be able to run the darn thing
SDXL Base vs SD3
you mean using SDXL lora with SD3? I've tried both 1.5 and XL LoRAs with SD3 with the ComfyUI workflows, and I've got the "lora key not loaded: lora_unet_up_blocks" error on all tries.. How did you manage to make Loras work with SD3?
no because you wouldn't be able to run it lol.
SDXL< >SD3
Prompt: girk gang, in action movie poster , an oil painting, in 80s, rivets, long hair, in the style of mike deodato, luis royo, the neutral colors, paul hedley , happy, in city, movie poster Style: Neon Punk
Style: Neon Punk
guess i need to try again then
i was getting a black image with flipped sigmas
i will look for SuperPrompt node. thanks.
kirby smoking a fat blunt
no negative prompt
first is from glif stablediffusion 3
second is from stable diffusion 3 medium super prompt huggingface space with prompt enhancement disabled 25 steps 7.5 guidance scale seed 1062578693
6000 ada not enough?
Ok, a workflow you said. I don't understand, but that's ok. It wouldn't run even on an A100?
I mean sometimes it does deliver
any word from stability about this pile of junk? Realise that there are also companies using SD that were willing to pay good amounts for it, but with this mess most will stick with SDXL or jump to a different ship
kirby
no negative prompt
first is from glif stablediffusion 3
second is from stable diffusion 3 medium super prompt huggingface space with prompt enhancement disabled 25 steps 7.5 guidance scale seed 1682567628
do we have a clear answer on how many tokens a prompt supports?
You'll have better luck having SD3 Ultra getting released to DreamStudio than it actually being released. The reason why they didn't release it wasn't because of money, it was because they knew only a small handful of people could run it
idk who it was, but someone did mention to lower the cfg
true
I run stuff on colab and would gladly pay to run Ultra on a A100 over there (easy these days). Well, for SD3 stuff I will use the API then...
anyway, releasing Ultra would be like releasing Dalle. It's not a model.
these artifacts are annoying - zoomed in, some seeds are ok
roblox avatar with a black low taper fade afro, wearing a black suit and a white buttoned shirt, holding a phone with a wood case on it, taking a selfie in the bathroom mirror
no negative prompt
first is from glif stablediffusion 3
second is from stable diffusion 3 medium super prompt huggingface space with prompt enhancement disabled 25 steps 7.5 guidance scale seed 2670674994
the roblox style literally vanished lol
git gud!
all styles vanished
booba vanished
I've got 128GB ram on my macbook, I'll take one Ultra Model + Workflow please. And probably wait 5 minutes per image.
Yeah it will probably never happen
making good anime was definitely a priority over roblox (which might also have copyright by the way)
Quick question: isn't this all solvable since we have the weights? We just need to custom fine tune?
im trying to figure out some good params
yep
it's not roblox. I tried lot of illustrators without success
All three text encoders (T5 FP16) along with the 8B model will take over 25-27GB of VRAM easily, not to mention RAM usage. Very few people will use SD3 8B.
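A back-of-the-envelope check of that 25-27GB figure. The parameter counts below are rough assumptions (8B for the diffusion model as named, about 4.7B for the T5-XXL encoder, about 0.7B and 0.12B for the CLIP-G and CLIP-L text encoders), and it only counts fp16 weights, not activations or the VAE.

```python
BYTES_PER_PARAM_FP16 = 2

# Approximate parameter counts; assumptions for illustration, not official figures.
components = {
    "diffusion model (8B)": 8.0e9,
    "T5-XXL encoder": 4.7e9,
    "CLIP-G text encoder": 0.7e9,
    "CLIP-L text encoder": 0.12e9,
}


def weights_gb(params: float, bytes_per_param: int = BYTES_PER_PARAM_FP16) -> float:
    """Memory needed just to hold the weights, in decimal gigabytes."""
    return params * bytes_per_param / 1e9


for name, params in components.items():
    print(f"{name}: {weights_gb(params):.1f} GB")

total_gb = sum(weights_gb(p) for p in components.values())
print(f"total: {total_gb:.1f} GB")
```

Under these assumptions T5 alone is around a third of the total, which is why keeping the text encoders on the CPU or quantizing T5 changes the picture so much.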
minecraft gameplay, screenshot, hud
no negative prompt
first is from glif stablediffusion 3
second is from stable diffusion 3 medium super prompt huggingface space with prompt enhancement disabled 25 steps 7.5 guidance scale seed 3038793424
wtf
our task was just to provide a good enough base model in the shortest possible time. I think we did it.
This was the shortest training from scratch in our history. Had a fraction of the time/compute of cascade.
you can run text encoders on the CPU with no issues
@lilac latch can't text encoders be run off cpu? most llms can
4 bit T5 works fine
It does work somewhat
8b knowledge >>>> 2b knowledge
quality is far worse with t5 fp8, i'm sorry
you would need an a5000 to run it
I think 2b will be enough after finetuned models
So far is it worse than even sdxl ?
fine tuning code going up today?
Hi, I'm trying to figure out how to run sd3 in comfy. The example workflow gives me 3 missing nodes that seem to be sd3 related.
Any idea where I can get those?
Or is there another way to do this?
that's why you run f16 on the CPU
Hi guys, I'm new here and can't figure out how to generate pics via the bot or something. Can you help me?
That truly is disappointing 
update comfy
You need to update Comfyui to get the nodes
I was told SD3 2B will not run on 12GB VRAM, and it can work with less than 5GB. So that's that
oh ok thank you. I thought it updates on start for some reason.
even if so T5 can run in ram
fp16 package should be already uploaded, or will be shortly
i would wait until there is a 3.1 of sd3 medium or until someone makes a finetuned model that is trained on sd3 ultra images
minecraft gameplay, screenshot, hotbar below with heart bar and hunger bar, the player's hand is showing on the right, theres a pig on the front
no negative prompt
first is from glif stablediffusion 3
second is from stable diffusion 3 medium super prompt huggingface space with prompt enhancement disabled 25 steps 7.5 guidance scale seed 3745968304
Is the t5 model (by itself and not with the clip l and clip g) working in the comfyui sample workflow? When I run the combo sd3+t5 model everything works fine and the triple clip (t5 + clipl + clipg) thing works. But if I switch to the fp16 t5 clip only it forgets the movie title or what Jason looks like. Super weird.
wait are these both 2b but one is from hf?
i dont know how in the actual living fuck it managed to unintentionally generate a pp
no
this is the new vae in action wow
also please don't ever consider the HF space as a metric lol
Is there an acknowledged issue with this SD3 medium or is it working as intended ?
Is SD3 medium worse than sdxl ?
people know but they'll still do it
just use comfyui manager and install missing nodes
sdxl base? nah
jpeg, jpg negatives do nothing, jpeg like artifacts all over the image :/
i did notice artifacts as well
better in some things (text and prompt following), worse in some things like human pose and stuff
So it's not fully matured yet through fine tuning right
oh my...
well yes that's star wars starred by dogs
what schedulers are you guys using?
In all honesty, outside of humans this thing is pretty good for a base. Thanks for the work.
VAE finally good enough to reproduce them
vae is too good
if i want to run SD3 Without text~~, do i need to use clips?~~ can i use clips
complex poses are a no no for now, anatomy isn't quite right in most cases. it takes a few tries to get a decent looking picture. you'll have better chances by prompting it right.
"Fine-Tuning: Capable of absorbing nuanced details from small datasets, making it perfect for customisation." This totally makes sense now, knowing the medium model was trained in a short time. Maybe... maybe it's a good thing, if more training fixes the deformation.
The problem is how much fine tuning are we going to have to do to get just basic human anatomy right? Seems like we might have to literally train a new model
hello there
"safety"
looks like it's working pretty well to me, despite being a base model. Local software quality might differ with skill level.
The purpose of this model is to give people something to finetune. We probably went a bit too far and gave the impression this should compete with older finetunes that had months to cook on top of each other. It's not the case.
heh...
What do the clip_i and clip_l do? I understand their general purpose but not the specific one.
comedy and tragedy are two sides of the same coin amirite?
pixelart is really good, but it should not invade minceraft
I didn't work on the architecture, but it's explained in the paper
finetuners will cook something soon im sure
It weirdly cannot generate nipples lmao
Isn't it kinda dishonest to keep reposting that bottom right image when even the guy who generated it said he cherrypicked it and most results were also garbage human blobs?
ok, fair enough. I'll look in the paper then
3D render of Sonic and Tails
No negative prompt
First is from the "stablediffusion 3" glif
Second is from the Stable Diffusion 3 Medium super prompt huggingface space with prompt enhancement disabled, 25 steps, 7.5 guidance scale, seed 384623937
The "Ultra" model actually knows who Tails is. The Medium model kind of does but not quite fully
sanic
"muh safety"
meincraft
people who mock ai safety . so short sighted.
most humans generated period on sd3 are blobs
oh wow that's nice for a base model
But it literally made a fully nsfw pic like with gore but no nipples
kek
it's amputee friendly
explain
I am using SD3 medium as a refiner in my workflow side by side with dreamshaper lightning refiner to compare and I am not getting great results at all
SD3 being uncensored will literally change nothing about the current state of AI safety when a billion nsfw SDXL finetunes currently exist
prompt adherence to asking for dark fantasy illustration is basically nil
it's an important discussion that people making this stuff have constantly. short sighted people typically only have the perspective they're offered from the confines of their armchair
steers towards realism
@viral plaza are the sd3 training scripts released somewhere?
a PC web site screenshot of the roblox website
no negative prompt
first is from glif stablediffusion 3
second is from the stable diffusion 3 medium super prompt huggingface space with prompt enhancement disabled, 25 steps, 7.5 guidance scale, seed 45409346
the roblox style came back lmfao
i would use it the other way around
ok.
Better?
(also don't tell me you didn't notice that the bad ones are also "cherrypicked")
and to that point, why are people crying about it then? ai porn got maxed out with ponyxl. it's peaked. moving on now
Also it seems specifically trained to avoid copyright shit
Testing out CLIPTextEncodeSD3. I'm liking the results of splitting up the prompt like this so far. Time to deep dive!
that doesn't explain anything. why are those people shortsighted? is it because you think there would be legal issues? or marketing? what is it?
I think SAI might disallow people to finetune the model because "SAfEtY"
@still sundial Pixart generates the base image and then I have one refiner as dreamshaper and I added an sd3 refiner in parallel for comparison
9/10 clip-l just make a huge mess
woah. sorry that wasn't the answer you wanted. I feel it explained it fine.
You seem to know the conversation's landscape already. This doesn't seem like a good faith approach to conversation now does it
there is definitely some combination that will work :3
I have 245GB of SDXL LoRAs installed on my system. I think this is one of those situations where the sum of its parts is greater than the base. Sure, SDXL is not technically superior, but as a whole it currently exceeds SD3. It's just a matter of time before the community starts making SD3 LoRAs and checkpoints, and then SD3 will exceed SDXL, but until then I wouldn't say SD3 is my favorite model out there
yes!
I hope someone can merge SDXL Base model with SD3
inpaint. you'll be fine.
if you actually made her wear a bikini maybe
So splitting up the prompt into the different CLIP fields can give JUICY details, but oh boy this grass prompt is a killer xD
It probably does. It can't generate the styles of some videogames, or their characters, like sdxl could
lol low cfg results in ungodly monstrosities sometimes
Not my prompt, I stole it from reddit
reddit is a bad place
Eagle
?
Strange but cute
people diving into the kiddy pool be like "why are all these concussions happening?!"
I just want to know why I'm being called shortsighted, because I can't make sense of the "safety" argument besides business politics
Oof i just tricked its NSFW guardrails lmao
Reddit is an echo chamber of people screaming into the void.
Reddit is good for obscure answers, but bad for sanity
clip-g + t5 -> t5 only -> clip-g only -> clip-l only
hmm
safety is so they can't be liable for people making CSAM
Weirdly enough it's ok with creating NSFW images of old grandmas
duh
bloxy block blocky robloxian 2.0 a PC web site screenshot of the roblox avatar with a black low taper fade afro, wearing a black suit and a white buttoned shirt, holding a phone with a wood case on it, taking a selfie in the bathroom mirror
seed 1818190208, no prompt enhancement
almost close to roblox style but looks more lego
Wow that's a very honest way to talk about someone you disagree with instead of addressing their arguments
how ? lol it's so weird having it generate blank nipples
right?
What prompt
lol
i think selfie is making it look more humanoid
this behaviour will drastically change on other prompts. It's a sign of undertraining
i think that's because of how the medium model is truncated to make it small, which might include the dataset, not because of "safety"
Say old grandma i don't know why but it works
photo of Nendoroid, anime character figure,
hey guys, I'm all for picking on the details but... As always when a new product drops, it needs some time to find its place in the community. SDXL wasn't well received at first, before people knew how to prompt it, or finetune it.
And I've got to say, out of the box, this model is the best base model I've tried, ever.
Great job stability on that one, and thanks for dropping us those weights. Time to play with them and tame that workflow.
raw photograph, realistic photo of miku hatsune in a hulking hydraulic biomechanical exoskeleton armored robot, detailed face, sunset, sweaty, post-apocalyptic, cyberpunk. the text "SD3" is written on the exoskeleton
yes its super random
lol
this is only true if we can fine tune out the huge deficiencies we see in sd3 base. It is literally much of what even sdxl base had before finetuning. This release is less like sdxl and more like the same stuff we saw with the unusable sd 2.1
hmm after updating comfyUi i get "module 'torch' has no attribute 'float8_e4m3fn'"
I did see it updating torch. Do I need to do something with the dependencies?
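The `float8_e4m3fn` error above usually points at an installed PyTorch that predates the float8 dtypes, which as far as I know first shipped in PyTorch 2.1. A minimal sketch of a version check follows; `supports_float8` is a made-up helper name for illustration, and the 2.1 cutoff is an assumption worth confirming against the PyTorch release notes.

```python
import re


def supports_float8(torch_version: str) -> bool:
    """Return True if the given PyTorch version string is 2.1 or newer.

    Assumption: torch.float8_e4m3fn first shipped in PyTorch 2.1.
    """
    match = re.match(r"(\d+)\.(\d+)", torch_version)
    if match is None:
        raise ValueError(f"unrecognized version string: {torch_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= (2, 1)


if __name__ == "__main__":
    # Check the environment that ComfyUI actually runs in, not the system Python.
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        print(torch.__version__, "has float8:", hasattr(torch, "float8_e4m3fn"))
```

If the check reports an older version, upgrading torch inside the same Python environment ComfyUI uses is the likely fix.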
Doesn't make much sense, how come "porn" peaked but nothing else did if we're moving on to better models
lol not falling for that one
it was the same problem when SDXL landed. people crying about ai safety efforts like it was going to harm them personally
the selling point behind the original sd was that you could create copyrighted stuff and celebs
So just a few differences in splitting up the prompt in different CLIP fields. Some CLIPs play more nicely together in different situations
doesn't make any sense, people have been doing that with photoshop, nobody sues or holds PS accountable for that, at least not since the 90s...
Wow! Lol! Sounds like you guys pulled off a miracle. Not joking, less training time than what Cascade had? Impressive.
Eagle
It is objectively and measurably the best base model we ever released.
It will be surpassed only by 8b
wow it even thought about the reflections
you think there's a chance we can't fine tune out the deficiencies? surely a large enough dataset will resolve whatever shortcomings its facing right?
adobe doesn't exactly provide all the material for it though
They're not as cherrypicked as you're trying to make them seem, the odds of getting a SD3 gen without a single issue are less than 50% for sure for a lot of prompts that shouldn't be hard for a model of this caliber
the reflection is fucked up but still good
you're just not wanting to make sense of it. you can't see past the wall you've decided is there
How use ???
Works perfectly, it's skill issue for sure.
the tests made, did they include all sorts of objects/scenes, including humans? How would it rank just with human anatomy for example?
woah i didn't realize you were staff until just now, when you said we i clicked on your name and realized your tag lol
not too far
hi, anyone have a guide for running in comfyui?
People, How use ?????
use stableswarm ui for now
toes are still a mess though
I'm using Comfy.
he's the one who refined 2b all yesterday while we were all cuckuuiing at him
i hope so, and i will be amongst those trying. but it's like trying to fix a student's drawing of a human being when the student has been blind all their life... where do you even begin? so many things wrong with just base anatomy alone
I agree, but there are multiple issues with it still and a lot of people seem to prefer overlooking them
did you make new tcg cards? :3
How it use?
yep, check this https://comfyanonymous.github.io/ComfyUI_examples/sd3/
Actually super tired at the moment. Just doing logic testing.
mhm
well said
any finetuning info coming out soon so we aren't grasping in the dark?
the people who are getting bad generations are people who aren't using stable swarm and are just using comfy
they do in capacity. SD doesn't have CSAM in their database either right? Same thing
not really. he said it can't be trained if the base doesn't know it. that's a myth from sd2 days. it's wrong.
what does stable swarm add? just curious
That's a complete lie
seems like it just doesnt get what the basic style is supposed to be
Literally nothing, it's ComfyUI with a different interface put on top of it
The guy is lying
mind you i added modern architecture, realistic and photorealistic to the negatives for sd3
I just think it's pretty evident it's bad at poses, if that means it's unfamiliar with base anatomy I guess that's up to debate but it can render stuff pretty well as far as anatomy without worrying about a specific pose
That's decidedly untrue.
it does not care, it goes to realism
prompt? settings? please and thank you.
zzz
yeah as a base model it's bad at some things. but that's not an indication that these can't be taught
will more steps or a higher CFG improve text fidelity? Or is that just a cross your fingers deal and 8B will be the panacea?
Pls share prompt
maybe it adds some extra nodes under the hood to refine/optimize the generation?
my final note before i have a stroke... this release feels like the Starfield release. It was released half baked, without systems or things it promised to do and they are "relying" on the community to fix basic errors with the system not just mod it to be nicer
dark fantasy concept art,Undead servitors, their bodies in various states of decay, shamble through the silent, shadowy streets, their lifeless eyes glowing faintly as they go about their grim tasks with a mechanical, unsettling precision,in the style of greg rutkowski
Moving the goalpost, the person you're replying to wasn't talking about finetunes
Positive: from top angle, fullbody portrait of a closed eyes woman is lying on grass with her long legs Negative: look at viewer, amputee Others: just default from basic workflow
Nothing people using ComfyUI can't do
...No, SD3. That's not how shadows do.
it seems to make some difference from what i've seen from other people (i can't try it because my gpu is cheeks), another reason people are getting bad generations though is because they're not using clip and t5 together
this was the context
Is anyone else experiencing some kinda memory issues? It looks like sometimes Comfy or SD3 doesn't stop using all the VRAM and so the next jobs go super slowly
Can u try this with SDXL too? I'm curious to see how it compares
yeah so bottom line there's no chance we can't train SD3 into a really good NSFW model, with enough data we can overcome its deficiencies with poses and such and eventually it'll exceed SDXL, for now though I'd say SDXL remains king imo
add "mutated shadows" in the negative prompt
Which version are you running? Which CLIPs?
top middle one what is it trying to cut in the shadow? lol
i don't really know about it. i couldn't care less about sd3 nsfw. there's already a ton of nsfw available.
Porn isn't that diverse
i agree . they need to give us training scripts.. we have our work cut out for us
SDXL can't do shadow physics either unfortunately. (Don't have enough VRAM to be switching back and forth.)
kk
Thanks!
The clips do take up a bunch of VRAM, try medvram maybe or something like that
The issue is I'm not running any renders and my vram is maxed now xD
eww strawberries and beans? lol why would you do that to my eyes?
No. That's not how falling stuff works.
switch the fp8 for fp16 for better results
So I need the model without text encoders along with g, l, and the fp16? I have a 4090.
from my discord server, by PierSF. A nice lesson on how to prompt for sd3
A cinematic photo of a female character with (the caption "I'm lovin' it" clearly displayed above her). She has short hair and wears a McDonald's hat, and with a feminine physique. Flowing fire in the background is made of ethereal, swirling patterns resembling a chaotic wildfire burning into the sky. The background is dominated by deep reds and blacks, with strong flashes of exploding light casting long shadows and creating an intense and dramatic atmosphere. The character's face is bored and indifferent, with pale skin and striking features. She wears a McDonald's employee service worker outfit with the basic company logo. Her hands are hidden inside her pockets. In the background is a single small modern concrete building burning at the side of the road. The overall style is like a dynamic and casually taken iPhone 5 picture posted to Instagram.
Comfy is caching the results, so that for example you can change a VAE and then re-run the same seed gen without having to actually compute anything other than VAE decode
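The caching behaviour described above can be pictured as memoizing each node on its inputs. This is a toy sketch only, not ComfyUI's actual implementation; `NodeCache`, `render`, and the string "latents" are invented for illustration.

```python
from typing import Any, Callable, Dict, List, Tuple


class NodeCache:
    """Memoize node outputs, keyed on the node name plus its (hashable) inputs."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, Tuple[Any, ...]], Any] = {}
        self.computed: List[str] = []  # log of nodes that really executed

    def run(self, name: str, fn: Callable[..., Any], *inputs: Any) -> Any:
        key = (name, inputs)
        if key not in self._store:
            self.computed.append(name)
            self._store[key] = fn(*inputs)
        return self._store[key]


def render(cache: NodeCache, seed: int, vae: str) -> str:
    # The sampler depends only on the seed, so changing the VAE does not rerun it.
    latent = cache.run("sampler", lambda s: f"latent({s})", seed)
    return cache.run("vae_decode", lambda l, v: f"{v}.decode({l})", latent, vae)


if __name__ == "__main__":
    cache = NodeCache()
    render(cache, 42, "vae_a")
    render(cache, 42, "vae_b")  # same seed, different VAE
    print(cache.computed)
```

In this toy, the second call reuses the cached sampler output and only the decode step runs again. Holding those cached results in memory is also one plausible reason VRAM stays occupied between jobs.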
Because all 3 of those are overpowered tokens in SDXL, so I was testing them. Surprisingly, none of them took over; they're all balanced. Plus, the burger did not end up the standard one that we always get...so this is an improvement.
yeah the model doesn't know hitler
fp16 takes forever to load up for some reason so I just stuck with fp8
wow sd3 seems to like more natural prompts
cheated!
I'm lovin' it too
only the first time. the second image with the same files will not take anywhere near as long
Perfecto.
ah it's a measured test very insightful stuff thanks
Ah gotcha, will try that, thanks!
hitler's image should be forgotten. only the atrocities remain.
Well, i guess that combover does count
that has to be the ultra model right, if so you pretty much cheated
The brioche bun was a surprising result.
he was not a stylish man
Very impressive results for a 2B model
no, 2B
kind of a drug addict
@lavish osprey Could you perhaps... teach us how to prompt for SD3 as you were one of the core devs ?
Oh this worked! Thank you!
prompt?
i guess the sweetness of the brioche could complement the burger and actually work lol
whats the recipe
dumb question, but where do I drop the clips in? which directory?
from top angle, fullbody portrait of a closed eyes blonde is lying on grass with her long legs, she is wearing tank-top and pants
Asked for utopian sci-fi fashionshoot in a room with peeling paint, got a dude from ASOS catalog. Where is that magical prompt adherence?
so that explains what some people said about people using sd3 medium wrong
Diffusers PR was merged https://github.com/huggingface/diffusers/pull/8483 idk if they've got the public posts about it done yet or not offhand
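For anyone wanting to try the Diffusers route without a GUI, here is a hedged sketch using `StableDiffusion3Pipeline`. The model ID `stabilityai/stable-diffusion-3-medium-diffusers`, the CUDA device, and fp16 are assumptions about your setup (the weights are gated, so the license has to be accepted on Hugging Face first). The 25 steps, 7.5 guidance scale, and seed are just the settings quoted earlier in the chat, and `sd3_call_kwargs` / `generate` are helper names made up for this example.

```python
def sd3_call_kwargs(prompt, steps=25, guidance=7.5, negative_prompt=""):
    """Collect the sampling settings quoted in the chat into pipeline kwargs."""
    return {
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        "num_inference_steps": steps,
        "guidance_scale": guidance,
    }


def generate(prompt, seed=3745968304, out_path="sd3.png"):
    """Run SD3 Medium once and save the image. Needs torch, diffusers and a GPU."""
    # Imported lazily so the helper above is usable without the heavy dependencies.
    import torch
    from diffusers import StableDiffusion3Pipeline

    pipe = StableDiffusion3Pipeline.from_pretrained(
        "stabilityai/stable-diffusion-3-medium-diffusers",  # assumed model ID
        torch_dtype=torch.float16,
    ).to("cuda")
    generator = torch.Generator("cuda").manual_seed(seed)
    image = pipe(generator=generator, **sd3_call_kwargs(prompt)).images[0]
    image.save(out_path)
    return out_path
```

Calling `generate("minecraft gameplay, screenshot, ...")` would download the weights on first use, so expect the first run to be slow.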
prompt: A woman in a white one-piece swimsuit is lying on her back on lush green grass. She has her arms raised above her head, with her hands resting on the grass. The sunlight highlights her toned physique and relaxed expression. The perspective is top-down, capturing her entire body from head to toe. The scene conveys a sense of relaxation and enjoying the outdoors on a sunny day.
The answer is always in the beans:
the woman in the grass thing was just a meme brigade trying to make the worst generations they could.
so SD3 needs flirting
ah ok ty
Some attempts to get correct
rip Midjourney
this doesn't count as nsfw right
once it comes to diffusers, we can use it with free tier colabs that don't utilize guis
is there any way to run sd3 in a1111?
trading cards (had to double check i had t5 on, but for this prompt, sd3 at times spells like cascade)
I see swords are still an issue haha
I called it last night lmaooo
people trying to find failure cases the second the model comes out to claim that its trash
people say such things after every release by everyone else. They are not going anywhere. They lose big time to some in specific aspects and kill in others, and so far no one is master of all IMHO.
LOL okay will try.
I don't see any nudity
so paste that whole thing into all 3 of the text encoders?
that scene lives in my head rent free. love that movie
remember, a similar thing happened with sd2.0
I tested 20 prompts, SD3 won every time
horniest sd base model yet
sd3 is great but mj is miles ahead imo
I can make 20 it will lose in. Big deal.
i will test it when it works in a1111, so just looking at your pics for now
@simple thistle i didn't even notice you put a security scanner when comfy launches, nice
Not that I am ungrateful for some text... but it really cannot do it upside down lol
people said sd 2.0 sucked, but when they used negative prompts it worked better
Please go ahead
Have some fun diffusing everyone, and don't forget to get some sleep if you've been crunching for this release for the last 48hours devs, thanks again, and see you around here soon people
i personally loved 2.0, but i absolutely hated the stock sdxl model ๐
how the hell do you install this?
juggernaut xl is probably one of the best finetunes out there
time to go catfishing on tinder
you need a UI first
i have the next latest UI
i hated it more because of me, dunno, i was never able to get good results with the stock model. Now with all the tuned models it's all good
i thought this was an entirely new thing
It doesn't know any artist styles, so it's a big NO from me. I always loved the old models for 18th century oil painting styles, etc... this model is probably trained on AI images mostly.
I have no need to. I have been at this for years, with 50 thousand images in MJ (v2 to now) alone, let alone Dalle-E3, ideogram and SD 1.4 to now. They all have their strengths. It is about knowing them and maximizing them.
lol what? skill issues
at least he is holding the sword in the section for two handed control, even if it looks silly just posing that way
wrong reply?
oh its the demo
Even greg rutkowski works because that's from the clip layer
oh woops yes sorry
Try making some artwork in style of Yoshitaka Amano? Good Luck
oh that was in previous base models? I need to check that out
Just a little boy
i need help is there a guide to install all this or where to put these files?
use the single clip block if you use the same prompt in all 3. Works better for some reason ( @simple thistle maybe knows why)
It almost looks like gordon ramsay, but not quite
Look up SD3 on youtube and filter results to "today".
Gordon Ramsaid
so that's one artist. doesn't suggest ALL artists and styles aren't in the model at all.
Get his consent and train his works yourself. its ez
oooh what about zdzislaw beksinski
Honestly, I'm blown away at the raw details so far and the prompt adherence.
high quality magazine cover A pizza chef showing off his latest amazing pizza recipe that uses the infinity stones from the avengers
it kinda worked with 8B

thing about open weights is you can work with them anyway you want
LMAO nice
idk man, for a base model this is nice
what is the clip_l, clip_g and t5xxl thing ?
still nothing compared to what we will get with finetunes
so there is hope in the community
Not only that. It doesn't recognize almost 99% of 18th century artists.
doing that now, thanks!
Cascade seems better, but I don't really have a feel for prompting yet.
clip_l and clip_g are what we had in SDXL, t5 is something that increases prompt understanding and especially Text capabilities
Handles environments quite well
It really loves Japanese.
how / why / when to use this ?
I think the captions are natural language so they might not include the artists name in the training data anymore
I think what they're doing is pretty obvious. The silent majority will see it too
cascade was cool indeed :3
yeah cascade was nice too
why does he look more like an australian than a scot/british person
i just don't get how all those workflows work in comfy, and i feel it's super overwhelming. Really hope there is an auto update soon.
cascade was my preferred model so far, i really like sd3 too
especially with 1 24 inch screen comfy is horrible
The woman in the grass is a lot of people pointing out that there are issues with SD3 2B model that shouldn't be there and being surprised by the amount of bad generations such a simple prompt provides you with. A lot of similar prompts will also have similar issues
drag and drop the json into the main window
If anybody else wants to test what I'm testing, I'm finding putting quality terms in "CLIP_L" and the main prompt in "t5xxl" and combining them in "Clip_G" is adding a lot of crispy details to the result
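The split described above (quality terms in `clip_l`, the main prompt in `t5xxl`, both combined in `clip_g`) can be written down as a small helper. This is only a sketch of that one experiment; `split_prompt` is a hypothetical name, and the dictionary keys simply mirror the three text fields mentioned in the chat rather than any confirmed API.

```python
def split_prompt(main_prompt: str, quality_terms: list[str]) -> dict[str, str]:
    """Split one prompt across the three SD3 text-encoder fields.

    clip_l gets only the quality terms, t5xxl gets only the main prompt,
    and clip_g gets the main prompt followed by the quality terms.
    """
    main = main_prompt.strip()
    quality = ", ".join(term.strip() for term in quality_terms if term.strip())
    combined = ", ".join(part for part in (main, quality) if part)
    return {"clip_l": quality, "clip_g": combined, "t5xxl": main}


if __name__ == "__main__":
    fields = split_prompt(
        "a painting of a monk riding a big cat",
        ["highly detailed", "sharp focus"],
    )
    for name, text in fields.items():
        print(f"{name}: {text}")
```

Paste each value into the matching field of the encode node and compare against using the same prompt in all three, which others in the chat report works better in some cases.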
Can someone help me why do all my results seem not as high definition or realistic as y'all
how/why to use the CLIPTextEncodeSD3 node ?
That's why it's wrong to say that it's a midjourney killer. Lol. It's clearly NOT
Phase 2 will start soon. Unstablediffusion and others all starting up their gofundme donation rallies so that they can all fix the OUTRAGEOUS problems from sd3. it's the standard hate cycle.
yeah but i hate not knowing how everything works, where are the benefits of different workflows.
That sounds absolutely CogVLM-esque.
nobody cares about that stupid prompt anyway lol.
gpt-4o heh
Brother in christ this is a woman lying on a bed where is the hate, the model is just bad at a lot of things
it is a stupid prompt. Doesn't even have a cap with text on it
As a base model yeah, but I think artists names is something that would be easy to teach it with a quick finetune
it's wrong to say it's a mj killer because mj is not an open model and this is not a service
gpt4o prompts for sd3 are INSANE
no cap
not half bad, figuring out a prompt it likes is the tricky part (i haven't either, just threw a few cascade prompts at it)
Just like how Blender isn't necessarily a Maya killer
the model was trained on longer prompts. You're confusing the text encoders.
does raising the step count help with the quality? did anyone try 100 or more?
Bear. It put a "B" in the corner instead of a random number.
(I prompted for the teddy bear. Intentional, not concept bleeding.)
If you say so
anyone has an idea why/when/how to use CLIPTextEncodeSD3 ?
in case you didn't know, Dalle is similar to this, that's why they put an LLM in front of it.
i get a feeling 20 does a better job than 28 steps
I had llama3 NOT restrict the length...
yall, the potential of even this 2B SD3 is amazing.
a painting of a monk riding a big cat wearing a tutu holding a orange umbrella, on the ground next to the big cat is a parrot wearing a flat cap
Not From 2B Medium!
it's already at a point when you stop comparing it to models and you start comparing it with real pictures.
Alright, I'll gen ~9 images with it to make a nice grid to see if that makes the amount of errors lower. If yes, very nice
I don't see the SD3 node in "add nodes". ComfyUI is latest version. Is it under a cryptic submenu?
no, you sure you updated?
What SD3 node are you looking for?
What happens if you switch "woman" with "dog"? Saw a short prompt work with that...
very cool, prompt?
that's not me that's the manager
idk, literally any
100%
Yeah Cascade can't do specific prompts like this. Just need fine-tuning for hands.
(Her eating ramen with bamboo is intentional.)
It does seem to be prompting that's the main issue. The feet are scuffed in this one, but it's not a meat blob
```
The woman is wearing a nightgown and she is on top of the duvet
Her arms are above her head and her legs are stretched out to the bottom of the bed
```
Nice prompt, works well
double click on empty space, and write sd3 in search
"""""""japanese"""""""
Not to mention there are clearly a LOT of known artists and works that are censored elsewhere but heavily trained in MJ. I mean, I tried a fantasy work in the style of Frank Frazetta and no one got anything close except MJ, who seemed to be possessed by his spirit and brush. Dalle E told me to not use his name in vain, and Ideogram didn't know what a Franz Frazetta was.
That did it, thanks
I laughed pretty hard at this "so"
but some characters are surprisingly correct
I mean the scene is pretty, but damn the girl is creepy xD
it can't make japanese text from prompt though
ah well either way, its kinda cool
I memorized hiragana and katakana years ago and I remember nothing.
they don't give a fuck apparently.
it also has issues with western characters with accents
Bro used a 6 word prompt and is shocked he got bad results
Perfect thanks! And thanks for the love on the SDXL event. I don't work at stability anymore tho, so that's why I wasn't hosting one
I'm at N4 kanji now... Kill me.
Ohhh, does it just expect waaay more pose description in the prompt for human subjects, but not really need them for animals?
This is obvious, and I am merely pointing out how the closed model has these curiosities
They don't directly train on their works. They use their works to train styles, then use the styles to make synthetic datasets, then train those into MJ. That way you never accidentally get a real artwork out of an MJ gen.
they will bring a calamity upon the whole AI space.
People always making the same mistake. Open weights don't compete with services and closed models.
same prompt as above. gets a solid 33% of the woman just wrong.
It doesn't seem to help all that much on average
Yep, I'm having the same experience more often than not
Everyone has a right to an opinion. The guy just said it's a midjourney killer, which is clearly not true! I never said the model is bad or smth, it's just not better than the previous version, and that's a big step down. I personally prefer SDXL. And yeah, for those who don't have a high-end PC and depend on paid APIs, it is a service indeed.
I am noticing that if you give more details about the scene, the prompt, the details etc... it generates better looking pictures. Unless it is placebo
but you get real screenshots from disney movies if you write disney .com
1st one got that CatDog anatomy lol
So the 8b model is only available through the API, correct? But has anything been set up to run that through comfy?
@lavish osprey man.... you just had to say length matters. I just told llama3 to not limit itself to shorter prompts and this thing just came alive. I assume because the t5 is now overpowering the other encoders. This is an sdxl refined one with the same long prompt in all 3 encoders. Vibrant red-orange sunset sky with thick smoke plumes billowing from crumbling skyscrapers and burning buildings. Metallic woman-spider hybrid stands amidst ruins in worn Japanese kimono with intricate gold trim, her face and body composed of rusted machinery and copper wires. Glinting silver eyes lock onto viewer's gaze as she grasps a futuristic assault rifle in one hand and a curved katana sword in the other. In the distance, flames engulf towering buildings, casting flickering shadows on the devastated streets, while twisted wreckage litters the ground beneath her mechanical feet.
I dunno about you guys, but I don't really need to generate women lying in grass
TBH I've never used MJ.
well, it can, so it is finetunable
You're clearly doing it wrong
What are you talking about?