#🏞|general-with-images
👍
Found this image, wanted to test it with SV3D.
I like how every time he's half-turned you can slightly see a nose and mouth
I saw that they upgraded the capabilities for free ChatGPT users with search and GPT-4.
So I wanted to test if it had DALL-E 3. I asked if it could generate images, and it said yes, so I asked it for a green field with a giraffe.
the free version only knows how to draw in python
I noticed 🤣
Rembrandt move on over... 😉
it can most likely run on 8GB, but the question is how fast. For SD3 I'd recommend a 4080 (16GB) or 4090 (24GB) (even for SDXL)
generate an image of smeagol
cough
heheh
Dollar Tree Todd Howard after Xbox's Starfield DLC showcase.
Might as well go for something really simple tonight. lol
Right in time for the release of SD3! A musical video made with SD and SD3 about the alleged superiority of Comfy! https://youtu.be/O3NzGSHjj4s
#aiart #stablediffusion #comfy #comfyui #stablevideodiffusion #stablevideo #imageai #videoai #aimusic #mistralai #udioai #blufftitler #parody #sarcastic
The Ballad of Comfy UI is a funny little video clip about the Comfy UI webui interface for Stable Diffusion, which is allegedly the superior interface!
All images and videos are AI made, gene...
All SD and SVD made, and the song "The Ballad of Comfy UI" is Udio & Mistral (with my assistance).
nice stuff
The Traveling Lansburys
I did these last year, I don't think I need to say who these people are. 🙂
https://suno.com/song/ec724e51-94de-4af1-b276-0afdb5878053 No video yet 😄
Gangster Rap, uptempo, deep drums, deep bass, male voice, chart song, bass, drum, drum and bass, electro, electronic song. Listen and make your own with Suno.
A fresh light blue background with hand-painted white clouds and water ripples highlights the refreshing feeling. Slightly below the center is a blue-and-white product image, surrounded by hand-painted pimples with a fading-away effect and some hand-painted plants. In the upper right corner is a red hand-painted "618 Special" label, and a bubble pattern adds vitality. At the top of the product image, write "Use salicylic acid to get rid of acne without trouble!" The overall style of the promotional copy is refreshing, natural and vivid.
hi
Salaam, everyone! Is there any way to add the bot to my own server?
@deep gust
Thanks. The prompt comes from the SD3 research paper... it really cannot reproduce similar quality and atmosphere
Keep in mind that this is "only" the 2B model
And I did not cherry pick, this was the first and only one
@shut sinew create an image with cat
My first SD3 picture 😄
Not really used to swarm ...
Ahhh... you have to disable the regular sampler 😄
Good evening, can you please tell me why some checkpoints generate images of this type? Is something missing from the settings?
There should also be ComfyUI Workflows. Didn't try yet
cool
Bad luck, most of my songs are in German ...
the problem is it's not fully free
Same with Suno ...
does suno only do sound, without voice?
Including voice .... otherwise German wouldn't be a problem 🙂
I wonder what video card is needed to run this locally
I didn't find an online version
Haven't searched for that yet ...
If you use the fp8 version it takes around 8gb of vram
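A back-of-the-envelope sketch of why the precision matters for VRAM: weight memory is roughly parameter count times bytes per parameter. It assumes about 2 billion parameters for the SD3 Medium diffusion model (the "2B" mentioned in this thread) and counts only the weights, so the text encoders (T5 in particular), the VAE, activations and framework overhead all come on top. At fp16 this lands in the same ballpark as the ~4GB model file; fp8 halves it.

```python
# Bytes used to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "fp8": 1}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Memory needed just to hold the weights, in GiB (2**30 bytes).

    Activations, the VAE, the text encoders and framework overhead are not
    included, so real VRAM use will be noticeably higher.
    """
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    # Assumption: ~2e9 parameters for the SD3 Medium diffusion model ("2B").
    for dtype in ("fp16", "fp8"):
        print(f"{dtype}: {weight_memory_gib(2e9, dtype):.2f} GiB")
```

So on a 4GB card even the fp8 weights alone leave little headroom once the encoders and activations are loaded.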
i have 4
LeoB
HELLO
Hello! 🤗
😄
Sure, bigger ones will get cheaper used ...
And even cheaper if nvidia rushed to launch its 5000 series..... I'm still waiting 😡 😭
5000 could be a bit over 🙂
Using my 4090 in energy saver mode most of the time ....
I have very expensive electricity
yoga
There must be a guy behind her 😄
Same prompt in SDXL vs SD3, tell me again that SD3 has better prompt understanding...
We should know the prompt 🙂
(and which one is which... lol)
🤤
Sadly the good one is SDXL, the one on the right is SD3... now sure it might be a prompt building thing, the prompt was - "Full body shot of a werewolf queen ((powerful dominant female werewolf)) with her pack of werewolves ((partially transformed werewolf queen, chimera woman half human half wolf)), dense realistic fur, body hair, strong woman, macabre, odd, weird, amateur photography, vox pop, photojournalism, real image, highly detailed, (volumetric lighting) (photo:1.15) skin blemishes, (soft glow, hazy real camera image, real texture, grainy, gritty:1.1) blemishes, imperfections, interesting contrasting color palette, F/2.8, film grain"
But out of the gate SD3 is not great; it's 'slightly' better than SDXL was at photorealism, but it completely lacks all the hyped ability to make artistic shots imo
Maybe put the chimera thing in front ....
Thing is, even if you could wrangle a good werewolf woman, the composition is just garbage, the same crap base SDXL got stuck with: centered subjects, no life or dynamism
is this the base SDXL model or a finetuned one? I would compare the two base models.
Base models have never been really satisfying ...
It's realistic vis4 vs SD3, which I think is a fair comparison. As I said elsewhere, I think maybe SD3 is going to be better eventually, but for now no, it's just not. Added to which it has some weird limitations, like not working with 60% of the samplers
And the way it handles latent noise seems strange; a lot of the tricks i use with that for SDXL simply don't work for SD3, as it sort of ignores base noise differences
thinking is way way far from knowing. I would compare the base two models. I am not an expert at all. thank you for replying and testing it. 👍
We will have to relearn prompting and tricks for best results ... it's always been that way and with SD2 in a pretty bad way 🙂
I have no doubt it's more capable as a base model than SDXL base, no doubt at all. But that means nothing really, as nobody uses the SDXL base model anymore. It needs to compete with finetunes or it's pointless. It will live and die on how easy it is to fine-tune. And the fact it only works with 40% of samplers is bad right out of the gate
It's day 1 ... I'm surprised we can already use it locally ...
But some things are a bit strange ... the fingers for example
I didn't have access to the uncensored closed-weight model, but from what people have said it does/did not have this issue, so it seems clear they have deliberately mangled the open weights, which, you know, is fair enough, but it does sort of kill it as a base model
Fine tunes will be the thing, but nobody knows how easy or hard they are to do yet
I'm also surprised it's only 4GB 🙂
Would anyone mind passing me a working SD3 workflow for comfyUI? Just a base one.
I need to see if Intel Arc is operational with the current model.
Guess I will.
🤷
We'll see.
Hm. It worked.
VRAM usage is great, too.
Cool!
Which one?
SD3 Medium.
AFAIK there's no other at the moment ... ?
Yeah, all three are the same model with their CLIP encoders embedded. You just download the base and use the "TripleCLIPLoader" node in ComfyUI instead
Takes me a minute and a half to get an image.
Space for improvement ... 🙂
no, there's a 2-text-encoder one and a 3-text-encoder one
you simply aren't prompting it right. it's way, way different from sdxl prompting; you need to use sentences. for example this (it's LLM-enhanced):
A stunning photorealistic depiction of a dominant female werewolf, half-human and half-chimeric, commands attention. Her chiseled physique, replete with textured skin and flowing locks of golden hair, radiates power and authority. Surrounded by a pack of strong, masculine werewolves, each one imbued with realistic vitality and movement, the scene pulses with life and energy.
Yes, no offense, but that isn't right either; it's gone anime for no reason at all. It's not photorealistic at all
Having experimented more, the problem seems to be that the base workflow is just not very good in comfyui
If you throw the steps up to 80+ and remove the weird negative conditioning, drop the model sampling it starts to look much better
80+ yikes, that will take forever no?
i just did it with 25 and changed absolutely nothing. and yeah, it is still not as good as a realism-trained model. This is 100% expected. Even with LLMs, a small fine-tuned LLM beats much larger LLMs at a specific task.
SDXL and SD3 are somewhat similar in size, and still SD3 puts up a pretty good performance
It takes me a minute and a half to generate a single SD3 image.
I wonder how fast a 3060/4060 generates an SD3 image.
So this is putting your prompt into mine, keeping the bits that make it look like a photo... 80+ steps, no negative conditioning weighting (not even sure what that does) and removed the 'model sampling'. Pretty decent, right?
oh yeah thats pretty impressive
And removing your prompt just using my original word salad sdxl prompt, gets this
Which honestly is damn impressive, SDXL can't even get close to that out of the box
(base model of course)
I am 100% convinced most of the issues are sampler/steps/scheduler based
yeah i think my prompt made the werewolf a bit too human
side landscape view of white castle countryside mansion with long futuristic glass car garage on the left, top pointed towers on the main house
Here is the same prompt using the base workflow the huggingface repo suggests (base on left). I think the base workflows are removing a lot of the artistic elements and making it look a bit fake/digital; anatomy seems better using the higher settings as well
side landscape view of white castle countryside mansion with long futuristic glass car garage on the left, top pointed towers on the main house
My settings on the left, base workflow on the right... the settings are the problem imho
Keeping the Clip_L and Clip_G text empty gives a character image on a solid colour-fill background
which is nice
SDXL on the right. Too bad SD3 can't combine stuff like this.....
but stuff like this definitely works way better in SD3 ! SDXL just mixes up everything
using DPMPP SDE? 😄
I managed to get one by going about the prompt from a different front
closeup photo of a art piece made of watches, in a cat shape, art sculpture composed of watches
ahh good idea! I'm also testing different ways to structure my prompts
The new model may be "smarter", and finally getting words right (most of the time) is OK, but anime XL + alchemy is still the killer for me
got this one from phoenix
luckily was able to fix it with killer combo
sorry, didn't mention it's leonardoAI models
can't wait for some SD3 finetunes to come.
Out of the Box can work for some things though
So I have SD3 fed into SV3D for model gen
What would be the best node to use to convert the multi-image output SV3D utilizes into a usable model?
can sd3 do "a cow sitting on a frog", by default? Can someone test it?
😁
@pallid ruin very close. but in most images i've seen, it looks more like the cow is behind the frog doing something questionable. pretty good though.
absolutely flawless if you ignore everything you just asked for lol
Maybe you didnt use enough steps. They are still in the stage of trying to figure it out. The frog looks ready though.
anyone need a job?
I need someone to do my images for my history channel on YouTube
Feel free to dm
ive been messing around with the pixart huggingface space just a tad. how do these 2 images look?
Love them !
You are the new Disneyland Monet
🌝👍
I don't understand why all new models still are trained in 1024x1024. 16:9 is more appealing and video AI is coming soon.
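For reference, picking a widescreen size with about the same pixel budget as 1024x1024 is simple arithmetic. This sketch assumes a ~1 megapixel target and sides that are multiples of 64 (a convention carried over from SDXL practice, not something this thread confirms for SD3), then searches for the pair that best balances pixel count against aspect ratio.

```python
def closest_resolution(aspect: float, target_pixels: int = 1024 * 1024,
                       step: int = 64) -> tuple[int, int]:
    """Return (width, height), multiples of `step`, near `target_pixels` at `aspect`."""
    best = (step, step)
    best_score = float("inf")
    for h in range(step, 4096 + 1, step):
        # Width that best matches the aspect ratio for this height.
        w = max(step, round(h * aspect / step) * step)
        area_err = abs(w * h - target_pixels) / target_pixels
        ratio_err = abs(w / h - aspect) / aspect
        score = area_err + ratio_err  # equal weight on pixel count and shape
        if score < best_score:
            best, best_score = (w, h), score
    return best


if __name__ == "__main__":
    for name, ar in (("1:1", 1.0), ("16:9", 16 / 9), ("9:16", 9 / 16)):
        print(name, closest_resolution(ar))
```

Equal weighting of the two errors is an arbitrary design choice; leaning harder on the aspect term would trade pixel count for a more exact 16:9 frame.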
a woman lying in the grass: sd15 base, native 1024x1024 generation
I very much like Cascade, even if it is less versatile in terms of composition (it often repeats similar settings/perspectives, for example), though input images can help with that as well (not used for the recent ones here)
In Stable Swarm you have to turn off all other samplers ...
Climate-Change had melted the Polar-Icecaps; so water was more abundant everywhere - leading to people having to "live-out-of-a-bathtub" forever more ... SD3
my prompts for 1.5, and for sdxl, are working just fine
Yes, they work. But you can get better results.
and?
Looking at the pictures from the SD3 Creators, I would've never prompted like that and don't get results as good with my old prompting style ...
i'm sitting here, with SD 3 running in comfy, taking the settings apart bit by bit and learning how this new architecture works. what are you running SD 3 in?
I have used API via Glif and run it now in Stable Swarm ... but for real I'm working on a song 🙂
ah. okay. i have no idea what settings you can adjust in that. can you set the samplers or schedulers?
I think Comfy is a bit different than Stable Swarm.... Can take a picture for you ...
this is SD 3 in comfy, sampler set to DDIM, Scheduler set to DDIM_Uniform, ModelSamplingSD3 set to 1, all the rest of the workflow settings at default. Prompt is a 1.5 prompt and is: Fluffy snooty elegant Muskrat with lush tightly curled hair, wrinkled nose, crinkled eyes, brown eyes, Octane Render, by Jasmine Becket-Griffith, by Daniel Merriam, by artist "Patrick Woodroffe"
the workflow i'm using is the comfy_example_workflows_sd3_medium_example_workflow_basic workflow available on the huggingface weights page
You are one step ahead of me in this case. I'm just surprised that I had to disable all the usual samplers and use this special SD3 one with CLIP + T5
i've been sitting here for hours doing what I did when stable diffusion first came out: change a setting, render, look at the result. change it again by 1 number or a couple numbers, render, look at the result. SD 3 doesn't use a U-Net. everything is different. you have to learn it from the bottom up
this workflow uses
A single colored flowerpot decorated with similar lines
i can't use t5XXL_fp8 - my machine throws errors, i have to use 16
Yes ... I am pretty sure you are right. Without learning new stuff we won't get best results. A lot of work for me waiting 🙂
yup. one .00001 step at a time 🙂
I never thought we would be able to create pictures from day 1 anyway. Now it needs the swarm intelligence.
same settings. another SD 1.5 prompt: wet shimmery hair, hyper-detailed, galaxy airbrush portrait; bee magic; mist, volumetric lighting
change the sampler to uni_pc and the scheduler to sgm_uniform
this model is a beast
For sure ... that you can have 3 different objects in a picture for example ...
Wouldn't have tried to prompt that in SDXL ... so need to relearn prompting for SD3
That's been my idea: SD1.5 or XL prompts work, but won't take you to best results ...
prompt: green cat on the left, red dog on the right, in the middle is a blue cube with a triangle on it
the only thing i did differently between the images was change the value in the ModelSamplingSD3 node
Chance to get that with SDXL or 1.5? 🙂
you can't get that. you're welcome to try though 🙂
prompt: breakfast, photo by Tjalf Sparnaay
My friends don't understand that I am a 1-Man-Army and want to get a MTV-Ready video ^^
Leonardo and Mona Lisa on a date
He looks like a seaman 🙂
prompt: morning light, photo by Tjalf Sparnaay
prompt: still life, photo by Tjalf Sparnaay
Can you try one by William Eggleston
just: by artist "william eggleston" ?
yeah a still life or food photography
k. just a second
still life, by photographer "william eggleston"
prompt: candy stripe, bobbles, bubblegum, bubbles, butterfly, Mark ryden, Jordan Grimmer <--- ModelSamplingSD3 node set to 1.2
same prompt, but ModelSamplingSD3 node set to 1
prompt: Illustration of an old village, winding cobblestone streets,gloomy, castle-like buildings, intricate carvings, gothic architecture, otherworldliness, enchantment, midnight, full moon, mist, volumetric lighting by illustrator "Joe Sinnott", by artist "Patrick Brown"
texture looks pretty damn good, is that sd3?
yes
running it inside ComfyUI
im running a few basic samples too using comfyui, wish i had a workflow for facedetailer
that i don't have, but you're welcome to a copy of the workflow i have setup for sd3 if you want
thanks i actually need something basic that can incorporate face detailer for human faces
I think there's one on civitai.com including FaceDetailer ...
grab a copy of ReActor perhaps?
Yes 🙂 Haven't tried yet
i tried loading up that workflow btw, but shows some glitches in red
did node updates too but fails at face detailer
Did you get the latest version? They said something about bugfixes ...
i might have to check that
think i tried the one that says official
the base model is pretty good for images other than human figures
honestly this is a better result than trained sdxl for this kind of images
😄
28/5 recommended here ... but can be different in Comfy ... https://www.reddit.com/r/StableDiffusion/comments/1de65iz/how_to_run_sd3medium_locally_right_now/
the SwarmUI interface didn't feel aesthetic, so i settled on comfyui for sd3
shockingly weird anomalies with limbs
surprisingly faces look fairly good for a base model, but hands and legs are fu*d up
Yes ... it's more a mess ... that's why I don't like it either.
well yes, very unexpected
Need to catch some food. Have a wonderful day!
i'm gonna make a wild guess and say that sai wanted to censor nudity, which is why faces/portraits look good, but when it comes to figures they messed it up big time
later
Freepik's new AI tool is really consistent with the character's likeness. And it's blazing fast too, considering the initial image quality. I wonder how they're achieving it.
this is so cool
prompt: still life, photo by Tjalf Sparnaay
This is the land of confusion ... sing
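Hayao Miyazaki's animated works, with their unique art style and healing stories. The camera slowly pushes in to reveal a tranquil country courtyard. Sunlight filters through the gaps in the leaves onto an old wooden table. On the table sits a large slice of cut watermelon, its red flesh and black seeds especially tempting in the sunlight. An orange tabby cat lies lazily beside it, occasionally looking up at the camera, its eyes showing contentment and languor.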
Showing off SD3 capabilities to a friend. Who can guess his name ?
Almost. You got 0 letters right. Nice try though x) Well, I know not everybody seems to like it, but I'm having quite a lot of fun with that model right now
Definitely John
Roast'n?
Naruto?
THIS IS IMPOSSIBLE
XD this is too good to be true
thanks for this laugh
This is impossible ... BMW! sing
Not really, from what I can tell so far. What I feel is missing by comparison are interesting compositions/characters/poses/exaggerations/etc. They're not entirely absent, but present only to a certain extent compared with prior versions (2.0/2.1 excluded).
It also seems to struggle with artistic styles, even when using general terms, so it's not only artist names that don't work as well (if they work at all).
I've only done a bit of testing, but it looks like SD3 tends to coat everything in a glossy/hyperreal finish combined with a tendency toward realistic depiction, lacking a more natural appearance.
8B could potentially be different, because the images I saw looked more interesting (composition-wise as well), for example with a pleasant, slightly painterly cinematic touch here and there. I also saw a more illustrative, painterly example in a comparison of the same prompt in MJ and 8B. Then again, I've only been able to look at a few images so far, let alone play with it here, so I can't tell exactly how it behaves and what might be possible in that regard, including natural appearances.
Here's a demonstration of what I'm talking about:
abstract surreal expressionistic,expressionism,painted
Stable Cascade:
SD3 2B:
the pre-release images that were showcased have no relation to the images i'm generating with this 2B sd3 model; it's possible that this particular 2B variant is extremely poor at rendering images. they have other variants, but so far this release isn't what most people were expecting
Yes, I've heard and seen that as well, regarding the difference between what we got for download and the API.
in that case that's a pretty bad move from sai, but i dunno, i haven't used the API
me neither, I've just heard about it and seen images coming from the API that looked better, at least
Leonardo has launched a pretty cool new model ... no idea on what it's based ...
you remember the images a couple of months ago that looked amazing? don't think we have been given that model yet; the 2B model isn't it
and i can see why a lot of ppl are pissed
It's been almost impossible to redo those pictures they showed us ...
exactly
even then I didn't see anything that really grabbed me, because they were almost all photorealistic/cinematic images, and the ones that weren't didn't seem that interesting either. Still crossing my fingers that 8B will be better suited here
8b is only for enterprise
Is that for sure?
😄
Likely. I mean they'll respond to market I guess. They are figuring out profitability
So it's just guessing?
No. They clearly said it's for enterprise. And even most companies can't afford it.
I have a friend working at an org that spoke to the Stability team, and this was their stance
well, what to say 😄 Hoping for a proper 6B model then 😄
Besides, what kind of philosophy is it to offer the "best" model only to enterprises? Seems like a corporate-capitalist approach, at least
Money - the root of all evil ... ^^
No, it's not money, it's the people wanting it
It's a pretty simple phrase ... sure ...
Emil? 😄
Not really ... did 20 of my old prompts, 4 pictures each ... maybe 2 I want to keep ...
Looks like even Glif generated better pictures. But I am not really used to StableSwarm ...
And as I said ... old prompts aren't the best idea ...
And btw, at least in my eyes, there are better-suited ways of generating income, like, let's say, selling a model for commercial use on a fair basis. When I suggested this elsewhere, the reply was roughly: "but then it will be pirated", which doesn't really matter when it's available for free for non-commercial use anyway. And even if it were behind a paywall, that's what every software company has to deal with, including possible protection (if required). Then again, in the latter case the open source background would be gone anyway.
Some nice details but what's the thing on the bottom right for example?
if that's your only complaint, I don't know what you're talking about 😄
then what is there, besides having an LLM translate the prompt? 😄 Natural language doesn't really seem to be sufficient, or only to some extent
Well it just doesn't make any sense ... so I don't want it ... SD3 can do inpainting?
I'm still relaxed but wondering about some things ...
What do you mean it doesn't make sense? The only thing not right is his hand behind the sword, not that much of an issue I feel
well, not taking minor flaws like the slightly too long arm into account
if that is a sword after all 😄
yet can go for one at least in terms of AI 🙂
Could be a kendo stick as well 😄
I have no idea ... but looks like it's flying ^^
yeah, then, like I said, it's AI; it's not expected to be perfect with every image (far from it). The rest of the image looks fine if you're into that style 🙂
Cascade
😀
Cascade with input image:
Input Image (SDXL):
We should know the Input ...
🌟 Find Your Voice, Find Your Freedom 🌟
In the darkest times, when the world feels too small and the shadows too long, remember this: You are not alone. I am here, standing with you, a fellow traveler on the path less taken. As a survivor of gang stalking, I know the depths of isolation and the relentless pursuit of the unseen. But today, I rise...
many many images
that image is the input for cascade 🙂
Got it! Sorry for misunderstanding you ...
nw 🙂
Okay that's weird
not an apple, but very cool
go run that through lumalabs
i mean you said "apply"
oh. i meant to type "an apple spinning in space"
but that should animate really well in LumaLabs
it's going to take pretty long
that's what i got from method
i'm still walking through the settings
never let your watercolor paper get wet
also, paint dries out, keep your brush wet
hmmm, that doesn't feel really convincing with regard to the reference, to be honest. And btw, I'm not saying it can't produce any sort of flat appearance, but it simply seems to add its own style on top, basically either a photorealistic or the mentioned 3D-finish look. Like I said, when using input images in Cascade or SDXL (or even SD1.4 for that matter), they carry over the original neutrally in that sense, while SD3 seems to add a finish, if you will
i have no idea what the 'style' of your reference image is. so here's the prompt: childish scribble drawing, tempura paint on cardboard, smudged, finger painted green eyed monsters
i'm not trying to make something that looks identical to your image; to me it's a watercolor painting with tempera paint on cardboard
there are a lot of images that i've posted with paper surfaces and paint on them
hmm, pretty close.
maybe less gruesome. that ladybug doesn't look too good. 6/10 for prompt adherence.
yes, like I said, I'm not saying it can't produce anything flat-looking, but so far it usually doesn't here, unlike prior models. I'm currently testing some, and I think it's simply a lack of training material. A lot of stuff at least seems to be merely a photo of the original (even though it could have other reasons), and I'm seeing a lot of those paintings on canvas with a relief structure. Btw, here you can see this added effect as well:
SDXL
SD3
btw I'm currently testing da Vinci/Bosch and the like, where it does seem pretty flat, which is why I think it might be the training material/captions
i guess i don't understand what you mean by flat looking. because that doesn't look flat to me
yeah, we definitely don't know how to use SD3 perfectly
looks tasty
like, not a relief or 3D-ish appearance, like the SDXL one. The SD3 one, even if not by much, looks more toy-like/plastic
@royal monolith you mean flat color and flat shading?
okay well, to my eyes, every single one of the images i posted for you look flat in that case.
like I said, it's not about it being unable to produce anything flat at all, but at least from my experience so far, SD3 seems to tend to add some sort of finish to the images, which usually leans toward photorealism or 3D.
start playing around with the comfy workflow node called ModelSamplingSD3 - set it to .01 and go up from there
yes, I'll check, thank you so far
wow
ROFL! well now, that could turn into anything. yeah. that function is critical to the look of the final result. don't change anything, lock the seed, and just start changing that one node one decimal point at a time.
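To get a feel for what that node's value changes, here is a sketch of the timestep-shift formula the SD3 paper describes for rectified-flow sampling, sigma = s*t / (1 + (s - 1)*t). The assumption, to the best of my understanding, is that ModelSamplingSD3's `shift` plays the role of s; the evenly spaced schedule below is a simplification of what a real scheduler does. Note that the schedule does not depend on the seed at all: the seed only fixes the starting noise, which is why locking it and changing only the shift isolates the shift's effect.

```python
def shift_sigma(t: float, shift: float) -> float:
    """Map a time t in [0, 1] (1 = pure noise) to a shifted noise level.

    shift > 1 keeps sigma higher for longer, so more of the steps are spent
    at high noise; shift < 1 moves steps toward the low-noise end.
    Both ends stay fixed: t=0 -> 0 and t=1 -> 1.
    """
    if shift == 1.0:
        return t
    return shift * t / (1 + (shift - 1) * t)


def schedule(steps: int, shift: float) -> list[float]:
    """Evenly spaced times from 1 down to 0, passed through the shift."""
    times = [1 - i / steps for i in range(steps + 1)]
    return [shift_sigma(t, shift) for t in times]


if __name__ == "__main__":
    # Same seed, only the shift changes: compare how the noise levels move.
    for shift in (0.5, 1.0, 1.2, 3.0):
        sigmas = schedule(8, shift)
        print(f"shift={shift}: " + ", ".join(f"{s:.3f}" for s in sigmas))
```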
You're the first foreign person I've seen say "rofl"
🙂 i actually say it out loud at times
I've mostly heard that from Russians
sd3 ... human figure is out of the question.
@proud dagger i tried that prompt you suggested - it doesn't work very well
it's not great no but it's not a demon flesh pile was the point
it's mostly a valid image for the prompt just not great anatomy
a regular quality issue vs the horrid demons being shared around on reddit
true 🙂 now i want to know what it is that adding :0.5 to that prompt is doing.
and what happens if i take it out of the ( ) and put it in [ ] instead
that downweights the prompt. My working theory is that something's broken in the guidance causing it to explode - on the older unet models we had explicitly separated cross-attention (text guided) and self-attention (unguided), often the cross attention made mistakes and self attention fixed them. SD3's arch is different, it doesn't have that separation, so I think downweighting the prompt is letting the model do the equivalent of empowering its self-attention to outrank the prompt guidance
it's an explicit syntax supported in Comfy and most other UIs, not just magic text
not quite what i'm asking. under the covers, what is that making the bits and pieces of the network do that's giving the effect
what is it specifically affecting, why, and how
It's a scaling factor on the calculated CLIP embeddings for the prompt input.
Here's HF docs that discuss the topic in more detail: https://huggingface.co/docs/diffusers/v0.22.1/en/using-diffusers/weighted_prompts
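A rough sketch of the two pieces involved in `(text:0.5)`, assuming only the flat, non-nested form (square brackets and nesting are not handled). `parse_weights` splits a prompt into (text, weight) chunks; the weight is then applied to the embeddings of the tokens in that chunk. Literal multiplication (`scale_plain`) is the simplest reading of "a scaling factor on the embeddings"; to the best of my knowledge ComfyUI instead interpolates each weighted token's embedding relative to the embedding of an empty prompt, which is what `apply_weight` does here. The vectors are toy lists, not real CLIP outputs, and UIs differ in the details.

```python
import re

# Matches a flat "(some text:0.5)" group; nesting and [ ] are not handled.
WEIGHTED = re.compile(r"\(([^():]+):([0-9]*\.?[0-9]+)\)")


def parse_weights(prompt: str) -> list[tuple[str, float]]:
    """Split a prompt into (text, weight) chunks; unweighted text gets 1.0."""
    chunks: list[tuple[str, float]] = []
    pos = 0
    for m in WEIGHTED.finditer(prompt):
        if m.start() > pos:
            chunks.append((prompt[pos:m.start()], 1.0))
        chunks.append((m.group(1), float(m.group(2))))
        pos = m.end()
    if pos < len(prompt):
        chunks.append((prompt[pos:], 1.0))
    return chunks


def scale_plain(token_vec: list[float], weight: float) -> list[float]:
    """Simplest reading: multiply the token embedding by the weight."""
    return [x * weight for x in token_vec]


def apply_weight(token_vec: list[float], empty_vec: list[float],
                 weight: float) -> list[float]:
    """Interpolate relative to the empty-prompt embedding.

    weight < 1 pulls the token toward what an empty prompt would produce
    (less influence from that text); weight > 1 pushes it further away.
    """
    return [e + (t - e) * weight for t, e in zip(token_vec, empty_vec)]


if __name__ == "__main__":
    print(parse_weights("a woman (lying in the grass:0.5), photo"))
    print(apply_weight([2.0, 4.0], [0.0, 2.0], 0.5))
```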
okay, thanks.
what i'm trying to solve for is why this is an issue to begin with - and now i'm going to go play with other numbers in that prompt. but what is it in the core model that's causing this issue?
@proud dagger ummm
also, why is it an issue with laying on flat surfaces but not standing up or sitting?
are you sure about that? i keep getting weird anomalies in all positions
can't say for sure right now. A bug in the safety filters is one of the possibilities (it might have eg mistakenly associated women lying down with nsfw and thus excluded from training, so it's only seen vertical women). It's been replicated as well with eg "a cat lying in the grass" by a few people so it might be something more specific and less obvious than that, it's not clear at all yet.
In general SD3-Medium seems to be subpar at human anatomy, but "a woman lying in grass" specifically creates horribly twisted demented things, vs in other positions you just eg get a 3rd leg or missing a finger or etc.
The broader imperfections are probably just something that'll go away with further training
i find it very strange that the devs had several months ahead of release to plan the launch, but we just saw how awful the results are when it comes to the human form
This is just the first model, released a bit quicker than planned because the community was really impatient to get access. This is SD3-Medium (technically a Beta of it). The bigbig stronk model is still coming
This SD3-Medium model is awesome at a lot of things but yeah has some severe shortcomings we didn't realize in advance
some of the active members of SAI were posting amazing pics on Twitter btw. What happened?
The model is amazing lol, it just has some shortcomings you have to avoid
(note that SAI staff posting might sometimes be the upcoming big 8B)
i hope you are right when you say the big one is still coming, cause this 2B medium variant looks like it still needs rigorous training
I thought it might be how the word laying was being tokenized, but changing that out didn't have any real effect. it also has a real problem with hands and feet
you'd be surprised if you compared sdxl base with sd3 base
it still can't do elephants correctly either, and tails are still a huge issue: cat and dog tails wind up in strange positions or disconnected
I'd listen to him
Laying on a car bonnet looks like the result of a car crash too
(they didn't like it when i pointed that out earlier)
there is a funny plot twist ... i never thought i'd be using comfyui, but now that i've put some hours behind this, im starting to like it with reasonably decent control over it, this wouldn't have happened if it wasn't for sd3 lol
btw this is sdxl base 1.0 that i just rendered now; sd3 doesn't stand a chance next to it
what prompt?
the data is in the img, made using comfy
i don't want to open your workflow. what prompt did you use?
prompt doesn't matter, im showing the differences in sdxl and sd3
What I don't get is that for there to have been images in the initial research paper, there must have been a working model that had been trained. Why did it then take months of “training”, and why not publish the paper that started the hype train only once you had a trained product??
i chose both base models
I say you as SAI not you personally*
you guys can have a healthy discussion about it and probably persuade a better decision making protocol, so far SAI has done some nice work w/ txt2img, i dont think people will stop using sd15 or sdxl
but this sd3 2b version needs to go back to the drawing board
there are people out there still using 1.4
it really doesn't. people just really need to learn how to use it
let me have your prompt then
the images in the paper were from early versions of the 8B model. The 8B model is really smart but bad at fine details and generally undertrained.
this sd3 version is incapable of prompt coherence and would forcefully give you glitchy outputs
2B is great, it has the best small details of any model ever released to my knowledge, it mainly just needs more time to cook, i.e. more training, esp. on humans apparently
Then why wouldn't one train it properly and then smash MJ and DALL-E in the research paper, rather than limping just ahead of them??
i know how this can be great, but when it involves human anatomy its terrible
we do not have infinite compute, we have to work in priority orders and take time. To train 8B to perfection we need to dedicate a massive amount of resources for a very long time to it
this version is extremely capable of very good prompt coherence
2B is much faster to train so we wanted to get that pushed ahead first
some of the cooler images with sd3 but not human figure
give SDXL this prompt: a red cat on the left, a green dog on the right, in the middle is a blue cube with a pyramid on it
i'd like to see what it creates
that's easily doable with even sd15 using regional prompting
i hope you have eyes open to see the weirdness
prompt She waits in the waves, hair flowing in the sun, radiant and smiling softly
still waiting to see how SDXL handles that prompt i gave you
take a good look at the first one
if you don't see anything wrong i dunno what to tell you
so you're not going to run that prompt through sdxl?
Fair call, thanks for at least fronting up 🙂 Cant be easy at this time!
i have views to share in defense of sd3 and from an ethical pov ... but sd3 2b hasn't been perfected as it is, also falling short in comparison with its predecessors
so you're not going to run that through sdxl and see if it has prompt coherence then?
that's easy to do with regional prompting, but i'm too lazy to do that now
you know it would fail, too
correct it is not perfected yet
no it wont, if you know what regional prompting can do
but it's awesome in a lot of ways that prior models were not as awesome at
um, one shot, one run. no inpainting or anything else
you need to leverage the available tools
nope. no extra tools. just the prompt, and the basic sdxl workflow
that's odd, SD 3 has no issues doing it without fancy tools
that is prompt coherence
that's good for what it can do, but why are you evading the fact that it generates the human form terribly?
To be fair, even with a zero-shot prompt there are 10 billion seeds out there, so sdxl will be capable of it; it just may need longer seed hunting. Doesn't mean it can't do it though.
Also its kind of moot as the main argument is not about prompt adherence and more about the fact that the woman on the grass has a bad case of leprosy
because it's very good at human form. all you need to do is scroll back through this channel and look at all the excellent images of humans that have been posted for hours
just a lazy prompt for a portrait of a blonde woman
that's odd
of all the human images i've generated with it, almost 90% of them were trash
you must not have been watching this channel then
yeah, nothing to write home about
do you mean there are some fundamentals to prompting on sd3?
i mean that last image is flat. SDXL can do much better than that with the correct settings
bring on the hands, i'm not talking about portraits
i get good faces with sd3 but not hands and feet
haven't seen you post any SDXL hands?
i still don't understand how you miss the point
i'm not missing the point. if you want to use SDXL, cool. use it. but stop dissing SD 3 just because you haven't worked with it enough to get it to work yet
i have generated tons of nsfw with hands and legs on sdxl and they came out great
you are defending a flaw as i can see
all im saying sd3 is terrible at human form
if you have no desire to correct that you are pushing it into a corner
what I've watched you do is look at images and say how bad they look simply because you assume they are SD 3. but just like the anti-ai art haters, you really don't know what you're looking at or what created it.
you are making arguments in defense of something that needs rectification
are you fine with how sd3 is now?
that was my whole point, either you see it or you don't
My first SD3 image!!!!
looks cool, try this: a blonde woman sitting on a couch
low quality
quality in terms of image tone is fine but the render itself is botched up
I will try with updated settings
sure
idk why SD3 always wants to show breasts
but it's not related to your workflow in most situations, it's a glitch in the training data
is SD3 meant to allow you to prompt in sentences? or do I still have to prompt with (photorealistic), bla bla bla
it listens to both
oh, so better than SDXL
i've used sdxl with tags and sentences too
I like how it actually does the text of what you type now
I was pretty hyped about that one
go on
and the models that are fine tuned listen well
wow, ppl already made checkpoints
My argument is that you are biased, and that no matter how good an image you assume was created by sd3 actually is, you will speak of it in negative terms, and see issues that are only there for you
To the point that a photo could be posted, and if you assumed it was sd3, you'd find issues
that's a risky practice, you will not help the company realize their short comings
and if you are seriously arguing that sd3 is fine as it is .. your arguments are logically invalid
sd3
Good morning!
Tag 🙂
Thanks 🙂 Trying to start every day with a good morning coffee picture 😄
It's more of a twitter thingy ... and kind of a competition. When will I run out of ideas? I've created more than 150 and am trying to create a big mosaic ...
I think we are getting closer to what we wanted to reach 🙂
Hey guys, does SD 3 work with A1111?
didnt use face detailer on these two images
i feel inclined to think once the model is fine tuned there won't be a need for adetailer / face detailer
not yet
I think my tests with StableSwarm weren't really good ... my fault? Looks like the new ComfyUI does a better job ...
i didn't like swarmui... but i'm really loving comfy
alex is gonna cry now
I think it's a bit of both ... Comfy and A1111 but without the comfortable interface from A1111
yeah there are some conveniences in auto1111 that comfy doesn't have but even then comfyui has its own charm
a1111 is for kids and comfy is for grown men 🙂
I like to stay a kid 😛
hehe
they are both good, but the reason i'm enjoying comfy a lot now is 'cause i finally figured out how to manage all that wiring
it's really not that hard
A1111 is good for quick results and an easy manual workflow ...
when i first tried comfyui the app didn't have a default template lol, but now that it has a load button for the default workflow it's a lot easier to get started
nice
whoa! epic
Learning Comfy is always a benefit cause you learn to understand the A.I. more ...
Thanks 🙂
comfy is tingling my geeky side, i used to write a lot of html/css
It's really a genius tool
well yeah, its fun 🙂
I have only learned Turbo Pascal 😄
midjourney-like effect with watercolor
BTW I have the impression that SD3 likes to add signatures ...
yeah,
although i have "text, watermark" in the negative, it still shows up
I have a slight suspicion it might be parts of the prompt that the A.I. no longer understands the way SDXL did
i was able to negate watermarks in sdxl, but when it comes to anime it's hit and miss sometimes; sd3 is a bit more persistent with watermarks
You could add "Signature" no idea whether it will help 🙂
HAHA xD
just a man x)
but "mangecouilles" mean something like "balls eater"
Kim could be more a sausage eater nowadays ... not sure ^^
😄
These two are bewitchingly beautiful (SD3 into low-noise SDXL+LoRAs)
Some call it: Evolution ... ^^
^6
This didn't turn out as expected ...
1
dayum, sd3 looking real good
If you are on the correct amount of drugs, that looks normal.
i told you what I was saying. you keep either ignoring it, or twisting it. Alex told you that the version of SD 3 you are using is unfinished, the company is well aware there are issues and are working on them. you ignored that, too. However i see that you've continued to try to learn the product and i'm hopeful that after working with it for a while longer you'll start to find the method that works for you to achieve the sort of results you wish to have
i'm going to have to mute you, you are looking for an excuse to argue. i made my point clear, and if your assumption is that i hold a biased view of sd3, you are just plain ignorant
even as it is, with the bias and other failings like anatomy, I think the prompt adherence and the text capabilities make up for it, and are worth investing time in understanding the model prompting system, while waiting for a 3.1 or for finetunes that do tackle those failings.
But having been around for quite some time, I've seen every new model get this exact community response when it came out. Time helps, people getting better with the model helps, and finetuning and other methods bringing the quality back up help too. But it's hard to keep the community happy in those times, or to just keep a level head about what a base model is and isn't, right after it comes out.
Arguing isn't the best solution either, just keep on bringing some good pics, training on the new model and having fun as a community, this is the best any of us can do around
i clearly pointed at its stronger traits and weaker traits.
and i have no interest to converse for the sake of drama
I don't argue on the weak and strong points, I'm aware of those failings too. like I said, I just feel the strong points do win over the weak points, and are worth investing time into understanding the model more while it keeps on getting better.
I didn't answer you directly, so no interest in drama either don't worry 🙂 I even said to the person you were talking to that arguing was useless
What was the prompt for this one ? I love it
im optimistic about training it, im also curious about 8b variant
the few things I tried, it seemed to me that training wasn't working correctly yet, or that I didn't understand something. I'll try again later on that front : this is the big thing that will bring 3.0 to the top. Right now, it fails on some key concepts for me too.
8b variant seems fun, but I won't be able to train it locally so I'm more interested in that 2b we currently have. I need to tinker more in my diffusers script :p
yeah i agree on those points you just mentioned, i think we should at least give this 2b model a shot
the thing I'm spending time on today, while the diffusers training gets sorted out, is the prompting. We have 3 text encoders now, which interpret the prompt differently, and this is quite new. Also, the model shift parameter. Understanding those new parameters, how to use them for real, and what each of them does exactly can already be quite useful to get better 3.0 results, as well as to train it better once we can do that properly
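For illustration, a minimal Python sketch of what a timestep "shift" does, assuming it follows the resolution-dependent shift described for SD3, t' = s*t / (1 + (s - 1)*t), where t = 1 means pure noise. The function names and the example shift value of 3.0 are illustrative assumptions, not taken from any specific UI or library.

```python
def shift_timestep(t: float, shift: float = 3.0) -> float:
    """Map a timestep t in [0, 1] (1 = pure noise) to its shifted value.

    Assumed formula: t' = shift * t / (1 + (shift - 1) * t).
    shift = 1.0 leaves t unchanged; shift > 1 pushes intermediate
    timesteps toward the noisier end of the schedule.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must be in [0, 1]")
    return shift * t / (1.0 + (shift - 1.0) * t)


def shifted_schedule(steps: int, shift: float = 3.0) -> list[float]:
    """Evenly spaced timesteps from 1 down to 0, each passed through the shift."""
    return [shift_timestep(1.0 - i / steps, shift) for i in range(steps + 1)]


if __name__ == "__main__":
    # Compare an unshifted schedule with two shifted ones.
    for s in (1.0, 3.0, 6.0):
        print(s, [round(t, 3) for t in shifted_schedule(4, s)])
```

With shift = 1.0 the schedule is unchanged; larger values spend more of the sampling steps at high noise, which is generally where overall composition gets decided.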
in worst case scenario and what im hearing from others, best bet might be to train 8b model if resources allow
In the medium/long term, 8b training will be better for sure. but even at first, I do feel that training 2b will be faster, and teach us more about how the model reacts, how the prompts should be built for optimal result, and then use those things we learned to train better on 8b
just from what I learned when I trained a lot on 1.5 and 2.1, each model reacts differently to the prompt format
yeah agree on your approach
the reason i mention 8b over 2b model is that there could be technical bumps/limits with 2b model
true. but are we sure the 8b weights will be public at some point ?
no idea honestly
I mean, I hope they will, but while I don't know, I intend to learn the most I can, and have fun with it. I do admit, I'm still a child having fun when talking about training, this is still so cool and addictive to me, even after so many models x)
this 2b base model is quite capable with various things, if human figure gets sorted then community will love using it
human anatomy needs to get sorted out, yep. this is the main criticism I hear and see about the model, and it's a problem
especially when we see it being so good at other points
yep that's where i have issues with it too
asian bias is quite high too in my experience
well, it may just be me, I didn't make enough pictures to really 100% confirm it, but if I prompt "a woman" and don't specify origin, I get an asian woman most of the time in the samples I have
you mean with this sd3?
i got all these while playing with it... i didn't specify any ethnicity, no asian bias with these
I didn't use a "save" node, and mostly previewed my pictures, plus I didn't play a lot with anatomy except for that "girl laying on grass challenge" people seemed to run yesterday ^^ it was mostly my impression from my small sample of results. It's the problem with small samples though, not representative. I'm happy yours doesn't seem to present that bias though
oh well i have had terrible anomaly with human figures too, but no asian bias
this is the very first asian render with sd3 but on purpose.
i wonder if you have noticed this, we don't need much facial correction, if any, with this model
and its a base model
anyone know what the normal speed for generating images with sdxl is on a 4080? it takes around 45 seconds to generate 1 image and i want to know if that's normal
it would depend on a number of factors: steps, hires fix, any controlnet in use. but in general, 45s for txt2img is way too long for sdxl on your card
oh
what does end at step mean
why are there 2 values
not sure, i dont use those in my workflow
can i copy your workflow thing?
for sdxl?
yeah
200 steps seem quite high !
what should i set it to? like 100?
try torqx template, but I usually make pictures in 20 to 40 steps, depending on the model and resolution
yeah 20-40 is ideal
oh ic
yeah imma try this out
check for missing nodes in case you don't have them installed already via comfy manager, but those are essentials
sd3
what are these? i'm not able to generate with your template
for example, this is a comparison done here for step numbers : #🍥|anime message
The more steps, the more refined an image gets, but it has diminishing returns
those are face detailer nodes to correct facial defects, you should have them for general purpose, you can update them via comfy manager
is it a plugin or something?
yeah, and just about every other change that happens anywhere gets this same sort of response. it really is sad in a way, and unhelpful in others.
I gave a lot of other responses in a more helpful format; this one wasn't helpful, I'll give you that. I got tired after a while of the same complaints when there are lots of other positives to use.
Yeah, always the same when anything releases, people for it, people against it, life
oh nice it works ty
nice
why does this one have a red highlight
check with the ultralytics models
they are under Install Models once you click manager
get all the face models
i'm not in a good position to troubleshoot comfy plugins, i only picked up comfy yesterday when sd3 came out
but they should be easy to fix
i got all the ultralytics models but for some reason it still has the red highlight
what does it mean tho
huh weird now it doesnt have a red highlight
you would have had to restart after updates
oh aight imma do that
hm i try to generate landscapes but they look low quality, is the model not great at landscapes? or is my prompting just bad
looks washed out
yeah
which checkpoint are you using?
ahh ok, base model is kind of less vibrant
wth is pony
this was the result with sdxl base i was testing out to compare anomaly with sd3
yeah i got something similar, imma try out the sd3
you should get one of the fine-tuned sdxl models
I got some good landscape results with base SDXL, maybe play with settings n prompts more?
can you show some?
should i use the refiner?
I only have 1 on me, could find more when I am home. It is very stylized
ah i see, definitely send me some when you get home
hm i see, i'll try messing with the prompts for a bit then
at least I thought it was good lol, but that's subjective
i don't expect a shit prompt to be good, but i do expect the new SD3 to not do Cthulhu. And i expect the difference between two boys on the beach and two girls on the beach to not be so stark.
boys have meh hands. but girls have A FUCKING LEGFOOT
Can you PLEASE censor nsfw without making women look like aliens ?
this is accidental misogyny on a German level.
nah thats sdxl
looks good!
usually when i get many colors in wild flowers like that, they all smash together and gradient into one another
well i used the stunning landscape prompt
i remember i did stunning landscape prompt before on the sdxl prerelease i think it was called clip drop and the output was completely different from what im getting now
hm weird, they look painterly; when i did it before it was more photographic
maybe its cuz of the face models
the terrible cataclysms of the last days of humanity on Earth...
@worthy sonnet haha
Aaa 1.0 is not that good, I recommend trying leosamsHelloworldXL_helloworldXL7.0, it's the best one I've tried
we agree that SD 3 is anti-celebrity, don't we?
It's absolutely stupid, pointless and very frustrating. Yet another reason to stay at SDXL
beautiful blonde in slim bath suit with rainbow socks just off the beach
Yeah it can't do most of the celebrities, the community will have a hard time finetuning it, but I think it could achieve way more than sdxl (in some months, obviously)
No Justin Bieber? 
nice. now, how about animating that in LumaLabs?
SDXL, even 1.0, would manage the body very well, but this is a nightmare. That, plus the fact that we can't do celebrities... It might as well be their death warrant.
Hadn´t heard of that before. Gonna check, btw it´s Cascade with the Invictus Redmond v1.1 Checkpoint
it just opened and it's really really cool. https://lumalabs.ai/dream-machine
ty
I've heard disturbing rumors re. trainability on Reddit
Is this just speculation or does anyone have more info?
your first problem is reading anything on reddit, the second problem is listening to it.
I don't know.
I'm sure that with enough data it can be finetuned to "fix" everything it can't do, or at least many of the things it can't do
if you mean can people train SD 3 models and LoRAs, i know of two that are doing that right now
Soon we'll have tools like Kohya_SS compatible with SD3, and that should be even better. But if checkpoints are so bad with anatomy, more than a few people will be disappointed 😏
the entire reason for the fine tunes is to fix the issues the core model has, yes?
You can finetune models with...good anatomy as the input data 😁
I'm more of the opinion that more LoRA+embedding and the like = more bullshit.
I concur, but mostly because I'm speaking on behalf of my beleaguered HDD
just with A1111, I have more than 16k picts
Looks good from what I´ve seen so far, then what is that login with google on stuff?
that's google authentication. if you have a google account, and most do, you can just use that instead of having to have a separate login for each site.
i did these with luma yesterday. the girl picking flowers was text to video. the other is image to video
🤣
@meager geyser
yes, I know what it's for 😉 yet I'm wondering why it's the only option. Not really doing google here, except where necessary on the phone
Ty, interesting, do you have the input Image at hand for comparison?
i dunno. just what they decided on i guess.
i don't have it, i can't remember where i put the image, so i created this in SD 3 right now and ran it through Luma for you
yes, that´s pretty cool already, thank you 🙂
If you like, you could do it with this one:
okay, give me a bit. the queue is slow
or this, I don´t know:
yeah well 😄
🙂 i didn't prompt for anything, just used the image. it lets you either just do image to video, or text to video, or put a prompt in with an image to video
This looks funny glitchy like if it was inflated 😄 Like it 🙂
Thank you very much for testing 🙂
🙂 free accounts are limited to 10 a day, but you could work up a really fun short over a few days if you wanted to
you're welcome 🙂
democracies and other political regimes in the contemporary world
Hello
those are awesome @crisp stream !
Thank you 🙂
😀
you just got very high praise
fits in a way 😉🤭
the kelp that ate New York?
i wanna see the movie you are creating
are you talking About the Video stuff?
i'm talking about the movie you're gonna make after you animate all those very cool images you're posting and edit the clips together 🙂
no, I don't think I'll do it via that service at least. Possibly AnimateDiff locally. Currently busy fooling around with input images in Cascade
that'd be cool too
gonna see, when I find the Moment 🙂
someone's not going to sleep tonight
how come? 😀
cause you're buried in stable diffusion 😉
Thank you, it's Cascade 🙂
ah, nono, daily routine 😀
couldn't get anything alike from SD3 so far and haven't seen anything like that from SD3, reason I stay on Cascade for now in combination with the Invictus Redmond v1.1 checkpoint, in case you are interested
very interesting to hear that, i have no experience with cascade but im actually looking into it now
yes, it's improved quality compared to SDXL, even though it's not as versatile because of less training. I especially like its color range/dynamics, it can have Old Master contrast effects, I guess because of a better VAE. Very satisfying to work with here 🙂
i noticed in some of the youtube guides im checking out now, cascade has been well trained to do proper hands and fingers...
not shabby at least, on a regular basis
well, the sdxl vs cascade comparison graph shows cascade runs faster and gives better outputs, something i'm tempted to set up in comfyui
btw if you go for it, the mentioned checkpoint is basically simply a quality upgrade for the base model; it's a generalist model as well, yet lets artistic styles come through much better. The checkpoint needs a unet loader though, not the checkpoint loader the base model uses for stage C. Here is the workflow for the Invictus Redmond checkpoint with input image:
thanks for the workflow, im looking into github section for necessary files
are you able to create humans in photorealistic theme?
say something basic
gonna send the base model one as well, and for the standard way with the Redmond you can simply substitute the "Load Image" node with an empty latent
sec
I've been staring at faces too long, does this look real?
t2i base model (workflow included)
im guessing you are not using face detailer in this one?
nope
downloading files .. im going for the lite 16 version since i just wanna run cascade for orientation
was the first Image btw
we'll see 😀
@wispy nest t2i workflow for the Redmond checkpoint
it won't let me load the workflow, probably 'cause of copying the image from the comfyui interface, might have to drag the actual file for it to work
i2i workflow for the base model (from the github examples)
yep
ok no worries
im checking out the comfyuiworkflows
going to do something very basic... just for a feel of it
yet you won't find the Redmond Workflow there, that workflow is at Civit.ai or simply take the ones I posted 🙂
@wispy nest
nope, lots of defects - right pupil, front teeth jumped out at me immediately
crossed-eyed bunny 😀
okay
Looking forward to how you like it 🙂
cascade is indeed great
used oil painting for it
looks good for starters
im also impressed how little effort it took me to put this into work with comfyui
btw if anything is black and white and you don't want it, you can n-prompt colourless. Works pretty well, at least to some extent
btw i see stage b and c go into the unet folder, not the checkpoint folder
I got them in the checkpoint folder, only the Redmond one is in the unet folder
hmm i see
it didn't let me load c and b when i had them in checkpoint
i moved them into unet and it worked
stage a went into vae
i actually renamed the files appropriately where they go
cascade_textencoder.bf16
stage_a_vae
stage_b_lite_checkpoint_bf16
stage_c_lite_checkpoint_bf16
but checkpoint should be unet
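A small Python sketch of the file placement described above (stage A into models/vae, stages B and C into models/unet), assuming a standard ComfyUI models/ layout. Putting the text encoder into models/clip is an assumption not stated in the chat, and target_path / sort_cascade_files are hypothetical helpers.

```python
import shutil
from pathlib import Path
from typing import Optional

# Substring in the file name -> subfolder of ComfyUI's models/ directory.
# stage_a/b/c follow the placement described above; "clip" for the
# text encoder is an assumption.
RULES = [
    ("textencoder", "clip"),
    ("stage_a", "vae"),
    ("stage_b", "unet"),
    ("stage_c", "unet"),
]


def target_path(filename: str, models_root: Path) -> Optional[Path]:
    """Return where a Stable Cascade file should go, or None if unrecognised."""
    lowered = filename.lower()
    for key, folder in RULES:
        if key in lowered:
            return models_root / folder / filename
    return None


def sort_cascade_files(src_dir: Path, models_root: Path, dry_run: bool = True) -> None:
    """Hypothetical helper: move recognised files into their model folders."""
    for f in sorted(src_dir.iterdir()):
        if not f.is_file():
            continue
        dest = target_path(f.name, models_root)
        if dest is None:
            print(f"skip {f.name}")
            continue
        print(f"{f.name} -> {dest}")
        if not dry_run:
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(f), str(dest))


if __name__ == "__main__":
    root = Path("ComfyUI/models")
    for name in ["cascade_textencoder.bf16", "stage_a_vae",
                 "stage_b_lite_checkpoint_bf16", "stage_c_lite_checkpoint_bf16"]:
        print(name, "->", target_path(name, root))
```

As the chat shows, which folder works can depend on the loader node the workflow uses (another setup kept them under models/stable diffusion), so treat the rules table as adjustable.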
yeah well, don't know. I didn't put them in the unet folder because so far I couldn't customize the unet folder path (I've got basically all checkpoints except the Redmond one in checkpoints on an SSD, while Comfy is installed on an HDD. Would have to move to the SSD at some point)
I double checked and they are in the models/stable diffusion folder
anyway, mainthing is you got them going
all functional
btw this is the link for the checkpoint, the photorealism of the cover looks pretty good I feel, though I´m no expert on photorealism:
https://civitai.com/models/316681/invictusredmond-stable-cascade-stage-c-finetune-generalist-model
ahh ok thanks, so that's the fine tuned version of cascade
yep 🙂
do you have that up on huggingface btw? 'cause civitai download speed sucks for me
no, I haven´t, don´t know if it is on HF
ok, so i'm wondering: right now i have 2 files going into the unet folder, those are the models, but with redmond as a single checkpoint file, how does that work?
i'm only finding it on civitAI
ok ty
my speed with civitai tends to fluctuate
they might be rate limiting
it's 6, almost 7 gig
usual sdxl file size
but with civitai throttling speed it's inconvenient, huggingface on the other hand is very consistent
any idea how this preview section has 4 images cause when i create a preview image node i get one panel
there should be a batch number somewhere
🙂
cool stuff
when you use the redmond checkpoint you only have a checkpoint loader in the workflow, instead of stage b and c?
no, the Redmond substitutes the Stage C checkpoint loader with the unet loader, like in this workflow (even though the disconnected stage C base model is still present): #🏞|general-with-images message
ok i think i get it, the stage b stays the same and i replace stage c with your model? btw that image you shared wont let me load the workflow
and here the i2i workflow: #🏞|general-with-images message
sec
t2i workflow for Redmond
if you need the model still, i downloaded it and can stick it on my google drive
and yes, you are correct about the models
set your batch to 4 or whatever
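If you drive ComfyUI from a script instead of the UI, the batch count lives on the empty-latent node. A minimal Python sketch, assuming a workflow exported in ComfyUI's API format (node id mapped to a dict with "class_type" and "inputs") and that the latent node is EmptyLatentImage or StableCascade_EmptyLatentImage, both assumed here to take a batch_size input; set_batch_size is a hypothetical helper.

```python
import json

# Latent-creation nodes assumed to carry a batch_size input.
LATENT_NODES = {"EmptyLatentImage", "StableCascade_EmptyLatentImage"}


def set_batch_size(workflow: dict, batch_size: int) -> int:
    """Set batch_size on every empty-latent node; return how many were changed."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    changed = 0
    for node in workflow.values():
        if node.get("class_type") in LATENT_NODES:
            node.setdefault("inputs", {})["batch_size"] = batch_size
            changed += 1
    return changed


if __name__ == "__main__":
    wf = json.loads(
        '{"5": {"class_type": "EmptyLatentImage",'
        ' "inputs": {"width": 1024, "height": 1024, "batch_size": 1}},'
        ' "9": {"class_type": "SaveImage", "inputs": {}}}'
    )
    print(set_batch_size(wf, 4), wf["5"]["inputs"]["batch_size"])
```

Changing only the latent node is enough because the rest of the graph processes whatever batch that node produces, which is why the preview shows 4 images.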
Good morning 🙂
Good morning 🙂
Was your cascade workflow included somewhere?
yes, here for the Invictus Redmond fine-tune checkpoint: #🏞|general-with-images message
and here with i2i: #🏞|general-with-images message
Here is the IR model: #🏞|general-with-images message
The base model workflows are available at huggingface (too lazy to write more :-D)
or you can scroll up for the base model ones, they are labeled as such
Thx ,,,, something to play 🙂
Hmmm... are you sure they are still included? Comfy doesn't wanna find a workflow ...
