#🆕|sd3
could someone put it on Civit pls, my shell scripts are written for that
waiting for the first woman lying on the floor
Made with SD3.5M
detail levels look good
Text ain't half bad
yea that is definitely an improvement
This image shows a cozy, sunlit room with a relaxed, homey atmosphere, featuring Furina from Genshin Impact. She stands barefoot near a window, dressed casually in a tank top and shorts, holding a plate of food and a glass of water, looking slightly surprised or concerned. The warm sunlight streaming through the curtains softens the scene, creating a peaceful, domestic vibe. On the left side, a pair of playful, water-like creatures, one wearing a whimsical top hat, are causing mischief in the kitchen, splashing water all over the sink while bubbles float in the air. The room is filled with small, charming details, like shelves full of books, potted plants, and scattered objects such as cans and a tipped-over cup. The balance between Furina’s calm stance and the chaos in the background gives the scene a playful, slice-of-life feel, capturing a moment of quiet absurdity in a seemingly ordinary day.
not a great prompt but hey.
textures are really good
Flux/Florence2 + LoRAs img2img
smh, base model doesn't recognize Furina
doesn't recognize Loona from Helluva Boss either
I mean the basic feature is there
am I crazy or is it not worse than Large?
"sd3 medium is all you need"
no, 8B is better but that one is good too
waiting for gguf
it's here already
https://huggingface.co/ND911/stable-diffusion-3.5-medium-GGUF/tree/main
oh okay
why, it is small
bro was camping
didn't even remember WHY I was using gguf
someone like me with 2GB VRAM 
yeah it's good for that
wtf
oh, curious how q4 is gonna run
will try that tomorrow
like with my 2GB VRAM setup
I did have my 6GB VRAM laptop tho
q4 is good on like 7B and up models (talking about LLMs)
but this may apply to DiTs since these are transformer based
but the point is, a large quantized model might outperform a small unquantized model
but it will be way slower for sure
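A back-of-the-envelope sketch of the memory side of that trade-off. It assumes round parameter counts (about 2.5B for Medium, 8B for Large) and the block layouts of GGUF Q8_0 and Q4_0 (32 weights plus one fp16 scale per block). It only counts the diffusion model weights: real GGUF files keep some tensors at higher precision, and text encoders, VAE and activations are not included.

```python
# Bits per weight for a few storage formats.
# GGUF Q8_0: 32 int8 weights + one fp16 scale = 34 bytes per 32 weights.
# GGUF Q4_0: 32 4-bit weights + one fp16 scale = 18 bytes per 32 weights.
BITS_PER_WEIGHT = {
    "fp32": 32.0,
    "fp16": 16.0,
    "q8_0": 34 * 8 / 32,  # 8.5 bits
    "q4_0": 18 * 8 / 32,  # 4.5 bits
}


def weight_gb(params: float, fmt: str) -> float:
    """Approximate size of the weights alone, in GB (1e9 bytes)."""
    return params * BITS_PER_WEIGHT[fmt] / 8 / 1e9


if __name__ == "__main__":
    # Rounded, assumed parameter counts.
    medium, large = 2.5e9, 8e9
    for name, params, fmt in [
        ("Medium", medium, "fp16"),
        ("Large", large, "q4_0"),
        ("Large", large, "fp16"),
    ]:
        print(f"{name:6} {fmt:5} ~{weight_gb(params, fmt):.2f} GB")
```

Under these assumptions a Q4_0 Large lands in the same memory ballpark as an fp16 Medium, which is why "big quantized vs small unquantized" is a real choice; the speed cost of dequantizing is not modeled here.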
hmm waiting for all the reddit posts that trash on SD3.5 Medium
its to be expected
The comments saying "why use this when you have Flux" already appeared 
That guy probably has a dual 4090 setup
yeah, he can fit Flux and T5 at FP32 on his GPUs
imagine the opposite happens
LOL i saw that
q4 is smaller than 1.5, but speed drops from 1.16s/it to 1.41s/it
quality does not drop heavily
q4 vs fp16
"My Triple 5090 512Gb RAM, 3 x 10Tb SSD is gonna ... "
... need a pocket nuclear-reactor to power
wanted to try style blending between photo and anime, unfortunately it didn't do well in Medium, maybe I suck at prompting 😂
"a photo of a cafe at night. there is an anime girl sitting on a chair"
I couldn't either, but I don't know the correct prompting for that
I've only seen Lykon try 3 subjects in separate styles
raw photo of a cafe at night. an anime girl is sitting at a table
Super nice. Curious though, how was a K quant made? When I look into the options for quantizing models (e.g. leejet's stable-diffusion-cpp), they only have non-K-quant methods.
(also I'm dying to see Forge get K-quant support, from my understanding it's still not in yet, though I could be relying on outdated info)
this is kind of impressive, ngl
It seems like ComfyUI is the only source (that I know of) that supports K quants but...I really prefer other tools. :x
yea that is very good at various styles
about to upload my upscaling workflow
ok nice, tiled upscale is what I think I might use it for
what is this node doing in the "SLG" workflow?
Where will it be uploaded to?
Thank you..
this is SGL sampling
I don't use small GPUs but the smallest quant being just 1.79 GB is kinda cool
runs on a mobile phone lol
oh yeah I forgot phones
that's a legit use case for smol model
hmmm that is nearly perfect
Is that just by adding the SGL node?
This one I assume?
if it's anywhere near as good as PAG was for SDXL, then this is a big deal
been missing PAG so badly
there is a workflow on the HF page with "SGL"
seems like it does help
not always of course
SLG SLG SkipLayerGuidance
Yes it should help
The underlying mechanism is very similar to PAG
@lunar canopy how come Dango has the yellow name and not me
Oh, that's my old dev title, still there? Lol
chat, I need dog plushie vs dangos fight images now
oh awesome
in my opinion PAG is the biggest quality boost we got in the last year or two
in terms of just a single node giving a boost
one thing you might have missed with Medium, is that it works at 512x512 too
unlike Large that ONLY works at 1mp
at the end of the day, they complement each other very well
I generated over 2k images with Medium as a refiner yesterday and I love it
SLG is not as direct as PAG in terms of layer selection given how DiT model works - but play with it - it is fun
Flux/Florence2 + LoRAs img2img
yeah that's okay, even a little bit of PAG effect would be good
Relevant layers:
2,4,7,8,9
Divisive Norm and Spectral Modulation from here gives a bit of PAG effect also and works on all DiTs https://github.com/Clybius/ComfyUI-Latent-Modifiers
should stack with SLG
6s per image on rtx3060 with 512 and looks fine!
I'm also curious to test this stuff here ^^^
it could go down to 3s with an 8-step LoRA and maybe to 2s with TensorRT
Wouldn't it be wise to release your base models together with an example fine tune, one maybe done in partnership with a prominent fine tuner, so the people that are less tech savvy and more focused on quality rather than trainability and versatility can have a glimpse of what the model could offer down the line?
hey that's naked
oops
I'm gonna release some finetunes soon
the model might be stronger structurally at 512, cos in my opinion flux is strongest structurally at 512 or sometimes even 384
less long distance attention issues with a smaller image
cause... you don't have long distances
you can also attempt a "cascade-like" workflow
where you generate low res, then upscale and refine
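A minimal sketch of planning the two passes of that "cascade-like" workflow. The 512x512 base pixel budget, rounding to multiples of 64, and the 0.35 refine denoise are assumptions for illustration, not settings anyone in the chat specified; the upscale step itself (latent or pixel) is not shown.

```python
def snap(value: float, multiple: int = 64) -> int:
    """Round to the nearest multiple (64 is an assumed, conservative granularity)."""
    return max(multiple, int(round(value / multiple)) * multiple)


def cascade_plan(target_w: int, target_h: int,
                 base_pixels: int = 512 * 512,
                 refine_denoise: float = 0.35):
    """Return [(width, height, denoise), ...]: a full low-res pass, then a partial refine at target size."""
    # Scale the target down so the first pass has roughly `base_pixels` pixels,
    # keeping the aspect ratio.
    scale = (base_pixels / (target_w * target_h)) ** 0.5
    low = (snap(target_w * scale), snap(target_h * scale), 1.0)
    high = (snap(target_w), snap(target_h), refine_denoise)
    return [low, high]


if __name__ == "__main__":
    for w, h, denoise in cascade_plan(1920, 1024):
        print(f"{w}x{h} at denoise {denoise}")
```

The idea it encodes: the low-res pass sets the composition where long-distance attention is less of a problem, and the refine pass only partially re-noises the upscaled image, so it adds detail without redoing the layout.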
yeah "cascade-like" is how I do Flux, it's good for these

Nice, because people love focusing on defects any base model has that are fixable relatively easily through training, and maybe releasing them together would prevent that somehow
how do I use the skip layer guidance
the simpletuner dev showed some screenshots of sd3.5m fine tune testing and it already looks better
or do I even need it
well, my finetune currently focuses more on artistic view than on fixing issues. But we're also working on those.
where can I see them?
idk if euler ancestral is helping but I'm using it for sure
im gonna do 40 steps with it
since this is almost 2 it/s
medium still has that superior VAE compared to sd3.5 large
no speckled "dust" artifact
terminus research group discord
lol dem fingers
the styles are amazing
thanks
I prompted it





euler ancestral, cfg 4, ddim_uniform, 40 steps
sadly it doesn't like ugliness, almost like Flux
thanks
Prompt: The man is shirtless and is barefeet and covered in mud. His long pants are old and torn. He looks thin and frail.
Reality: The man is genetically perfect, has abs and a nice jawline, he is NOT covered in mud cause that would be unappetizing... Rather, we just brushed him with a tiny bit of dirt. He looks like someone who is well fed and his pants cannot be torn as that doesn't look aesthetically pleasing...
I know a lot about Stable Diffusion, but one thing I never looked into and would like someone to explain: what does the TX5 (or whatever it's called) do for SD3.5? What improvement does it bring? I'm using it without it and it's working fine, but I want to know what I'm missing
T5_XXL is a large language model, but unlike most of the ones that we use today, this one's an ENCODER too.
Some models can use it as an encoder (like CLIP I think??) to improve prompt adherence and text capabilities.
If you are mostly making images without text or rather simplistic scenes then yeah... you won't ever need it.
ok putting "ugly" before "man" gives him a worse jawline and a receding hairline. Perfection
he still has abs though and his pants are NOT torn
thank you very much, I think this sd3.5 + the announced controlnet support will do wonders
@cunning lintel
is there info about controlnets?
BEFORE and AFTER | SkipLayerGuidanceSD3 (default settings)
is this sd35L?
medium
because
coming
SD3.5M ... random seeds, all other settings the same.
prompt: what?
Sorry, looks squished
Prompt: zombies running screaming with giant billboard in the background that reads "WHOA"
Not sure if my output is indicative or I am doing something wrong? Using the workflow example from the HF Repo for SD3.5M
Definitely not happy with these results
not necessarily doing something wrong, the model isn't as strong as the big boys
Uh, where is the skip layer guidance node from?
it's a core Comfy node
mmm I thought I updated, I'll do it again
This is 0.5 more B than I need though
might have to do it manually from github
ya I haven't actually found it yet lol, just started to look
OK it is indeed part of Comfy, had to update
a more complex prompt using the Triple CLIP Text Encoder
it will work on flux too but it needs adapting
obviously SAI won't make the flux node so we have to
Will there be a turbo version of Medium?
not sure Clip L and Clip G would like the big paragraphs
I don't know much about prompting so I'm not sure, but it might be doing harm giving big paragraph to weaker text encoders
I see only downvotes under the 3.5 medium announcement, is it that bad?
Think it's just nightmares from the 3.0 model
go by Arxiv reactions rather than Reddit reactions
I mean thumbs down under announcement here but guessing reddit isn't doing much better
oh, reddit and discord are same anyway
it takes a while but in a few months there will be Arxiv papers covering these models, there's already a few papers that talk about Flux
Full HD
Medium test
I wonder how fast SD3.5M going to be on my dual 3090 setup 🤔, shame I can't test it out right now
for the most part you can just take the parameter count to be the speed multiplier
not always true but it's not far off
however Flux goes double speed per parameter cos it doesn't need a negative
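That rule of thumb as a tiny sketch: cost per step is roughly parameters times model evaluations per step, where classic CFG with a negative prompt costs two passes and a guidance-distilled model like Flux dev costs one. Parameter counts are rounded assumptions, and the estimate ignores attention cost growing with resolution, quantization overhead, and offloading, so measured s/it can differ a lot.

```python
def relative_step_cost(params_billion: float, passes_per_step: int) -> float:
    """Rule of thumb: cost per step ~ parameter count x model evaluations per step."""
    return params_billion * passes_per_step


# name: (approx. parameters in billions, model passes per sampling step)
MODELS = {
    "SD3.5 Medium + CFG": (2.5, 2),        # positive + negative pass
    "SD3.5 Large + CFG": (8.0, 2),         # positive + negative pass
    "Flux dev (distilled guidance)": (12.0, 1),  # no negative pass
}

if __name__ == "__main__":
    base = relative_step_cost(*MODELS["SD3.5 Medium + CFG"])
    for name, (params, passes) in MODELS.items():
        ratio = relative_step_cost(params, passes) / base
        print(f"{name:30} ~{ratio:.1f}x the per-step cost of Medium")
```

This is why Flux, despite being larger, is not proportionally slower per step than an 8B model running CFG.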
Didn't do much better with the smaller prompts either 🤭 which is why I have both short and long examples. The prompt was expanded by GPT-4o and tailored to the strengths of each encoder.
What do I know 🤷♂️
seems like the image quality you got in the end was similar to the others
it's a limited model compared to Flux and SD3.5L
OK, now Large vs Medium using a smaller prompt for both L/G
yeah large is just drastically better
might be the case that medium is nice for tiled upscale, not sure yet
I think Lumina is worth looking at again also, there was a fine tune of it to 2k resolution in the I-max paper, sadly they did not release it but it would be possible to replicate it
Large vs Medium different prompt:
Using @lavish osprey 's upscaling workflow which uses both Large and Medium models...
showing some potential
on the upscale
needs a bit more aesthetic finetuning or preference optimisation
That's a mighty hitchhiking thumb
SD3.5L
In general, how does this perform in comparison to 3.0?
SD3.5L original vs. Upscaled result through Medium (Lykon's upscale workflow)
Apparently SD3.5M loras can be trained in mere minutes https://x.com/peacej/status/1851288045712191572
Mostly good, but what is the versatility in unique applications such as stylized artwork?
Not sure... I would have to put them side by side... let me see...
It’s definitely better at human anatomy by a mile, a bit worse than Large, not sure about other things.
oh a random Jerry Chi
3.5m hands are a lot better than 3.0m hands
its a big improvement over 3.0m
it's easy to forget what previous models were like, did SD 1.5 for the last week and didn't see 1 correct hand in like 1000 generations lol
I really love sd 1.5's lighting and general "vibe" though sometimes, no other model like it
Yeah I think flux spoiled us, basically perfect anatomy each gen.
I still prefer flux but sd3.5’s is much more creative than it.
And more trainable too
In regard to full fine-tunes
huh 3.5 medium isn't that bad after all, resolutions above 1MP don't break like in 3.5 large and the quality isn't terrible for the size and speed
flux spoiled us yeah exactly
cos what comes out now will always have flux as context
SD3.5M vs SD3.5L vs SD3M
same settings for all three generations.
Based on what I've tested so far, even though medium outputs feel messy, it creates quite nice looking skin tones.
SD3.5L vs SD3.5M vs SD3M ... same settings for all three

hands are the final boss of models
my last experience with diffusion models is SD2 but by god it's progressed a lot
Because SD2 sucked
SD3M vs. SD3.5M vs SD3.5L
Wait I just realized SD3's "woman laying in grass" was just a copy of SDXL's "woman doing yoga" prompt
p.s. misspelling was on-purpose.
you could do some interesting stuff on SD2 but compared to now especially it's pretty mid (and believe me, I loved that model for some reason)
I liked 1.5 a lot more tbh
I was mostly doing embedding training for 2 which was pretty neat but unfortunately the architecture for embeds is no more
oh well, this isn't the channel for nostalgia lmao
you can still do embeddings 🤔
SD3.5L vs SD3.5M vs SD3M
SD3.5L did it 🙂
Actually? That's cool- thought SDXL kinda killed it with the multiple text encoders, worth looking into I suppose
Well, hands are not going to happen, I tried to generate a woman showing her hands, did 100 images, only 3-5 ok.
I think multiple text encoders is gonna go away also
they are mostly a temporary anomaly cos it's expensive to train a model to fully replace CLIP
Flux for comparison...
although the only one that got the BACK of the hand was SD3.5L 🤦♂️
With more "encouragement" it finally got the back of the hand LOL
Bro is holding a gun to the model's head with that "(back of a hand:2.5)"
Multiple text encoders with sdxl?
SDXL uses clip_l and clip_g if am not mistaken
Okay. I haven't seen any people experimenting with dual clip loader workflows for sdxl, so I thought it was a single.
I was like " IMEAN IT, DO NOT TEST ME! HAVE :5 READY!"
I take a Mythbusters attitude. If it doesn't burn on its own, I will MAKE IT burn.
is it tuneable tho? like what are the requirements to tune 3.5L? in terms of memory only lol
not mentioning datasets because somehow people find/make those which is lowkey crazy to think about
If lora training works this time I'm sure that it will be a really good refiner model to use...at least for my graphic card. 0.25 denoise for the image to the right. Just a quick test
It smoothed the skin detail away.
Also lost a thumb.
across the board, it looks smoother overall.
Depends on how good we can train loras for anatomy of course. If it's even trainable...I'll believe that when I see it though
SD3.5M is probably a good test bed for training if nothing else.
did you guys have any luck inpainting with SD3.5 Medium? Results get completely squashed. It worked well as a refiner but not at all for inpainting. I'm wondering if I'm doing something wrong
are you using a stochastic sampler or a deterministic one?
for inpainting stochasticity is the most important thing
using the suggested sampler dpmpp_2m with sgm uniform scheduler
would suggest trying the node version of dpmpp_2s, the one that lets you adjust eta and s_noise
keep s_noise at 1 and set eta as high as you can without the image breaking
that can help a lot
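A sketch of what eta and s_noise control in an ancestral ("2S a") sampler, using the classic k-diffusion formulation: each step is split into a deterministic move down to sigma_down plus fresh noise of size sigma_up. This is the original variance-exploding form; ComfyUI's rectified-flow adaptation of DPM++ 2S ancestral uses a different noise formula, so treat this as an illustration of the knobs, not Comfy's exact math.

```python
import random


def get_ancestral_step(sigma_from: float, sigma_to: float, eta: float = 1.0):
    """Split a step into (sigma_down, sigma_up), k-diffusion style.

    eta=0 -> no fresh noise (deterministic step); larger eta -> more noise,
    capped so sigma_up never exceeds sigma_to.
    """
    if not eta:
        return sigma_to, 0.0
    sigma_up = min(
        sigma_to,
        eta * (sigma_to**2 * (sigma_from**2 - sigma_to**2) / sigma_from**2) ** 0.5,
    )
    sigma_down = (sigma_to**2 - sigma_up**2) ** 0.5
    return sigma_down, sigma_up


def renoise(x, sigma_up: float, s_noise: float = 1.0, rng=random):
    """Add the ancestral noise back; s_noise=1.0 adds exactly the amount the step asks for."""
    return [v + rng.gauss(0.0, 1.0) * s_noise * sigma_up for v in x]
```

Raising eta moves more of each step into fresh noise (which is what helps inpainting blend), and the cap explains why pushing it too far eventually breaks the image: sigma_down hits zero and the step becomes pure re-noising. Keeping s_noise at 1 keeps the injected noise at the level the schedule expects.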
it was supposed to be a face
will test the behaviour with large to see if it's the same
it should give you higher image quality as well as helping inpainting
the main downside is stochastic samplers need more steps
SD 3.5M testing after work, pretty much random stuff I wanted to try. I guess it could have many uses despite several things that don't seem to be working with it.
SD 3.5 Medium, M2 Mac Pro
by far the best set of images I have seen from SD 3.5M
really good job
I think with a bit more fine tune it has potential
SD3 Medium had inpainting problems as well, but the 3.5M version seems way worse. SD3.5 Large seems to be working a bit better, but still not good. 30 steps here, 5 CFG, 0.4 denoise
can you share some settings ?
not sure if I could convince you to try more steps
inpainting is a hard task, a lot of steps can help a lot
how many steps for inpainting?
I would have done 100-150, but maybe 40-60 would be ok?
I don't think I had any special settings. Tried different samplers and schedulers, steps from 20 to 40. CFG from 3 to 6.
if you want to do like 30 steps then the stochastic sampler I recommended might be worse
for low steps, Deis and UniPC can be good
well with flux, sdxl and sd 1.5 you don't need more steps. But i hope it's a settings problem. Generating images seems fine, step swap as well (refiner), but inpainting i'm getting horrible results and low quality. Wondering what might be the culprit. I will try more steps to see
ah you are not even using the new slg node
very nice results however
I agree flux was able to inpaint with less steps
even Schnell could inpaint lol
my macbook pro did fine with SD3.5L but it really isn't liking SD3.5M. all the results are coming out messed up.
I did generate 1000+ images at one go, and I may be picky about what is OK... so there is that too 😄
but in terms of image quality is very nice
Yes, I did tweak the prompts for several hours, generating images doesn't take long with decent GPU.
With SD 3.5 medium.
ah I didn't realise these were with euler
so with a stronger sampler you could get some more quality also
what stronger sampler ?
euler is the weakest out of any of the default comfy samplers
for deterministic, DPM++ 2M, Uni PC and Deis are particularly good
for stochastic, DPM++ 2M SDE, 3M SDE and 2SA are good, although only 2SA works with Flux and SD 3.5
you can get better samplers than those default ones but they require custom nodes
what are those better samplers, i have custom nodes for samplers
Clownshark's node pack gives lots of stochastic samplers that work with Flux and SD 3.5
https://github.com/ClownsharkBatwing/RES4LYF
Sampler RK is the latest node as far as I know, and its got a few to choose from
if you put Eta to 0 then they run in deterministic mode
What has happened to her thumb?
Yeah, bad hands in Medium. Like I said, if lora training works it might have a brighter future than 3.0 model.
I gotta ask: I just hit the wrong setting in my BIOS, so my PC won't POST atm, it just power cycles.
So I need to rely on a cloud provider. Are there any good big names as of recently for 3.5 and Flux?
I was looking at Modal... but is RunPod still an actively maintained cloud GPU provider?
I have a feeling it's going to be painful to finetune sd3.5 medium and people are just going to go back to large. I think the architecture makes it so you need a high parameter count to train efficiently. Flux is a good example. It's as big as it is because that's the cheapest to train, even if it sounds counter-intuitive.
yeah anatomy isn't that great but nice for its size, great for people with weak gpus.

2:54 generation time here for 823x1152
In original workflow clip3 was fp16
Really like the details, they need upscaling but pretty interesting to work on
SD's introduction post mentions the Stability AI API, Replicate, & DeepInfra
the post-falling down the stairs
What's your performance difference between SD 3.5M and L? I use SD 3.5M (default) and SD 3.5L (Q8) with the following speeds: 3.96s/it (M), 7.18s/it (L). Both use the same settings: shift 3, 40 steps, 4.5 CFG, dpmpp_2m, sgm_uniform. I expected to get better performance from the M version.
My env: Linux, AMD Radeon RX 6700 XT (12GB), pytorch 2.5.0+rocm6.1
Hmm
Red Panda?

So basically we have entered the singularity, where a new AI model comes out every day.
vaporware
I have done a few hundred images of ranking. Red panda seems to have a little more variation in composition, mixed with really bad Dall E 3/MJV6 "Aesthetislop" which people look at and go "oo pretty" cause there is an unnecessary amount of contrast/noise/"detail"
Examples of aesthetislop, where there is just noise and "detail" everywhere to try and get the same reaction as jingling keys in front of a baby's eyes lol
I get why people like it, but it's just all noise/nonsensical detail over anything logical or visually pleasing in a toned down way
fast food for the eyes
pretty much haha
just overwhelm the senses so people don't take a closer look and realize it's all meaningless noise and nonsensical details/tones
i like the new word aesthetislop
A lot of people who do the head to head comparisons on that space go for "oo pretty" within 2.5 seconds of looking instead of taking in the composition, stylistic variation, or interpretation of the prompt
my friends in my research group have been using it for nearly 2 years now haha. It really does best explain the look haha
Zombiecore LoRA - Match made in heaven 🤭
goodness, more baked than snoopdog
it's also vaporware right now
we shall see, we don't even know what model it is
Blueberry was around for a bit, and now you can use it
I am just curious who its made by
we also don't know if the images you're voting for were created by something new, or something old with a new name. we know nothing about it at all
you mean for redpanda?
yeah, we know nothing about redpanda as of now
yeah. it's vaporware. the images could have been created with anything. once we get a demo, and know who's behind it, then it's time to worry about it
I mean, you are saying pretty obvious stuff lol
but yeah, curious to see who made it
last time it was a new Black Forest Labs model, so we will see
We have been due for a new Dalle4 for a long time, honestly. Dalle 3 was a prompt adherence champion, but my god did it look abysmal
We'll agree to disagree.
In any case, for those awaiting it. The GGUF build of SD3.5 Medium is out
I can probably make one of those charts in Photoshop in :30 seconds... I'll call mine Hawk Tuah and give it a score of 2001, a rate of 95% and a selections # of 76871 and profit 😎
What is Red Panda?
Hi excuse me, how do you load those models in Comfy?
A mysterious AI model, no one has a single clue where it came from
It just appeared in Artificial Analysis
GGUF?
yes
I think it's probably a codename
Workflow is there. You need to install the GGUF extensions and make sure Comfy is updated
like Flux's codename "Blueberry"
great thanks, I found the missing nodes there
Also I see now that the text encoder can be quantized too
what is the SD3-5 VAE? I didn't use one before, I think
@dusky thistle comfyUI has a new scheduler in it
all models can be quantized. that doesn't mean they should be
great, yes I was kind of dazzled about so much quantization
at this point I load some model and it worked somehow
I don't even remember where did I get it
The only thing about these models is that they take so long that I don't fiddle or play too much with the parameters, I just set prompts and image size
what scheduler
lmao
linear_quadratic
Afaik, it's pretty much only used with Mochi or w/e that model is called. I haven't tested to see if it works at all with other models
Off the top of my head, it should likely look like something between an sgm_uniform and simple sigma curve
ahh nvm, looks like they use some wonky exponent for it. looks like it spends a lot of time shifting things around a bunch, then rapidly drops into fine details
it's definitely meant for flow models only. sdxl models usually have concave curves and it didn't respond to that when I tried to run this with an sdxl model.
either way, it's likely not all that useful for non-mochi models.
Sd3 is a flow matching model
It is, but that kind of curve is not what the model was trained on. So sure, it will work, but it's not optimal. It will spend far less time resolving medium and fine details using that linear quadratic curve
Think of it like model shifting to an extreme value away from the norm
Shrug
papa toilet
please can anyone explain to me what these two clips do. is one more powerful than the other?
Yeah beta has been my go to for a while now
T5 and clip L: T5 will be for sentences, L for Danbooru-style tags. T5 is way, way more influential
ahh thanks, what kinda tags are these?
here's the architecture
SD3 uses all three encoders, flux only uses two
clip_g is the workhorse - it drives the entire thing
clip_l and t5-xxl work along with it
t5xxl gets your detail rich, narrative, natural language prompt
clip_l gets your artsy, ambient, background, fine details prompt
clip_g gets the no-nonsense, just-the-facts-mam information about what the image is
IF you do not use a node that allows you to put in a separate prompt for each encoder, then the text you put into the positive prompt will be given to each encoder anyway, and they will fight with each other - they aren't in sync
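A minimal sketch of that routing as a data structure: each encoder gets its own prompt if you supply one, otherwise it falls back to the shared positive prompt (the single-prompt behaviour described above). The class and field names are hypothetical; in ComfyUI the split is done by a text-encode node that exposes separate clip_l / clip_g / t5xxl fields.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TripleClipPrompt:
    """Hypothetical container for SD3-style per-encoder prompts."""

    shared: str = ""
    clip_l: Optional[str] = None
    clip_g: Optional[str] = None
    t5xxl: Optional[str] = None

    def resolve(self) -> Dict[str, str]:
        """Text each encoder actually receives; unset encoders get the shared prompt."""

        def pick(own: Optional[str]) -> str:
            return own if own is not None else self.shared

        return {
            "clip_l": pick(self.clip_l),
            "clip_g": pick(self.clip_g),
            "t5xxl": pick(self.t5xxl),
        }


if __name__ == "__main__":
    split = TripleClipPrompt(
        clip_l="thick impasto painting, heavy canvas texture, noticeable brush strokes",
        clip_g="apocalyptic ruins; pink tree growing flames;",
        t5xxl="the scene is set in apocalyptic ruins. in the center, we see pink tree growing flames rather than leaves",
    )
    print(split.resolve())
```

With only `shared` set, all three encoders get identical text, which is the situation where their different strengths can pull against each other.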
nice explanation, I would like to experiment with that concept, it seems to be really important if they work like that
for example in this img the text that appeared was just some text to reinforce the idea, not some main title to write
here's a workflow to start with. this has the SD3-2b-medium model loaded, but you can just change that to SD3.5 if you wish
it would be in the clip_l I guess
the encoders alone do lousy text. you need at least two of them for it to be readable
I wonder how much difference you'd find between a single prompt and these 3 CLIP prompts, has anyone tested it?
damn thanks for t5xxl level explanation 😃🙏
the idea is that I didn't want that text to be readable, it was just some text to reinforce the idea of the image
this is an SD3.5 Large workflow i'm working with right now, it also has the node for all three encoders. to get the workflow: 1. click the image to open in viewer. 2. click the Open in browser words. 3. right click, save as. then drag and drop into comfyUI
so type random letters: here we see a sign with the word "swrenwtw rerwun ewwn"
the AI will try to guess what it might be
there wasn't a sign mentioned, it was "advertisement", so it made up some text and put something from the prompt in it.
just an example of what you'd do for random text
so clip_g would be the classic SD prompt, t5xxl would be the "ChatGPT"-type prompt, and clip_l would be booru tags, like "trending on artstation"?
i would advise you to never use trending on artstation unless you just want random dice rolls for noise. and most of the other tags like it. they are just noise.
here
clip_l: trees, warped, twisted, weathered, ruined, cracked, flames, fire, smoky; atmospheric lighting; peter mohrbacher, james jean, william morris, ernst haeckel, zaha hadid
clip_g: apocalyptic ruins; pink tree growing flames;
t5xxl: the scene is set in apocalyptic ruins. in the center, we see pink tree growing flames rather than leaves; All that is left of the buildings are parts of walls with crumbling bricks
=======
clip_l: thick impasto painting, heavy canvas texture, noticeable brush strokes
clip_g: Stratocumulus; Renaissance Island beach sunrise, maya beach by artist "Paul Dougherty", by artist "Nicolette Ceccoli"
t5xxl: the scene is a stunning sunrise at the beach. cumulostratus clouds cover the sky with red and gold, while transparent green water and white foam breaks on the light brown sand,
=====
there are 2 sets of prompts to play with
@dusky thistle comparisons. model: sd3.5 large sampler: euler_ancestral schedulers: on the left, linear_quadratic. on the right, beta workflow is in the images
that looks real good on the right
what's interesting I've noticed with these flow models is it's like it just needs to set the trajectory real carefully
almost like aiming a rifle
it does. i'm working on that sampler/scheduler sheet - should be done tomorrow
and then pow, it can take massive steps
did you grab the sd3.5 medium workflow with SLG?
no idea, been coding all night
skipping layers, like matteo did with flux back there. skipping the blocks
just got a brutally, just, oh-my-gawd-don't-want-to-admit-it bug fixed that's been hanging around for weeks
cool yea sounds like PAG with zeroing out the V
files are there when you're ready
do you remember that really long node matteo did for flux, where he set up every block so you could adjust them?
and then he talked about how certain blocks didn't do anything and we played around with not using them?
ohhh so clipl recognizes artists and styles and t5 is just straight up natural language. woow i'll try this thanks!
it's great having comprehensive nodes like that
i make shit like this all the time
okay, that's what SLG is. only not nearly as long and annoying a node as those
there are two workflows on that repo, one with, one without, and you should probably DM Lykon about that setup you have and talk to him
he's got teeny wings coming out of his neck
hahah yeah
the one that screen shot was from
I'll bet those nodes aren't available, are they?
add the term symmetrical to the prompt as the first term
they should be
i think they're in there, if not i can get em to ya
I'll go poke around and let you know if I can find them or not
DM lykon, and go crash
@lavish osprey @graceful osprey SD 3.5 Medium modelspec resolution is 1440x1440, however the model struggles to run this resolution natively: (it's fine at 1024x1024, and I bet it can upscale to 1440 happily, but native from the ground at 1440 doesn't seem to work too well. Maybe with the skip layer guidance, longer prompt, luckier seed, and all you can push it into working? idk even then I feel like recommending 768 or 1024 with an upscale is smarter than recommend native start at 1440)
that's a photo of a cat, and the staircase it's inside of is what you get when the early steps are completely corrupted and the later detail steps manage to refine it into something that almost looks like a real structure
Flux doesn't do well at the advertised top resolution of 2048 either
unless you are using I-max
i was able to generate FullHD
but not first try
To be clear, I'm not discussing the advertised top resolution, I'm discussing the defined default resolution, which inside of the released SD3.5 Model is set at its top, but should not be
that's defined in the metadata header of the model if you're unfamiliar with modelspec
which (https://github.com/Stability-AI/ModelSpec?tab=readme-ov-file#inferencing-tools-and-uis) says: "Apply relevant keys where logical (eg resolution in image models should be applied as the default resolution for images made with that model)". So that key is intended to be the default usage resolution
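For anyone who has never looked inside a .safetensors header: a minimal sketch that reads only the header (not the tensors) and pulls out the default resolution. The layout used here is the documented safetensors format (an 8-byte little-endian length, then that many bytes of JSON with an optional `__metadata__` string map); the `modelspec.resolution` key name comes from ModelSpec, and the "WxH" value format is an assumption.

```python
import json
import struct


def parse_safetensors_header(data: bytes) -> dict:
    """Parse the JSON header at the start of a .safetensors blob.

    Layout: 8-byte little-endian unsigned length N, then N bytes of UTF-8 JSON.
    """
    (header_len,) = struct.unpack("<Q", data[:8])
    return json.loads(data[8 : 8 + header_len].decode("utf-8"))


def read_safetensors_header(path: str) -> dict:
    """Read just the header from a file on disk; tensor data is never loaded."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len).decode("utf-8"))


def default_resolution(header: dict):
    """Return (width, height) from modelspec.resolution (assumed "WxH"), or None if absent."""
    value = header.get("__metadata__", {}).get("modelspec.resolution")
    if not value:
        return None
    width, height = (int(part) for part in value.lower().split("x"))
    return width, height


if __name__ == "__main__":
    import sys

    # Usage: python read_modelspec.py path/to/model.safetensors
    print(default_resolution(read_safetensors_header(sys.argv[1])))
```

A UI that honours this key would start SD3.5 Medium at whatever this returns, which is why the value in the header matters more than the "max" resolution in an announcement.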
ah thanks
I knew there was metadata inside .safetensors files but I have never looked at them
okay yeah you have a good point then
Clip_l, Clip_g, Clip_round_the_ear - what does this one do?! 🥳
With questions like that, you'll soon find out my lad! 😄
Even the strongest have their weak days.
on Civit there is someone adding Flan-t5-xxl to Flux and SD3.5
a cat
I doubt it's a cat that's adding it.
so what are the best clips (!?)
I'm trying this
ty, I have to try that
Hello everyone, I really need help. I struggled all night installing Stable Diffusion, literally for one job, and now when I launch it for the first time it throws the error in the picture about old drivers. I can't update them because this is the last driver my video card supports.
I want to ask someone to do literally one job. I'll be extremely grateful if someone responds.
the big speed up is only gonna come once you have a setup that fits within your VRAM
regarding clips, some people like this for Clip-L https://huggingface.co/zer0int/CLIP-GmP-ViT-L-14
although I am not sure about it
Another big key difference about sd3.5 medium is that the t5 should contain no more than 256 tokens
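A minimal sketch of enforcing that cap before encoding, assuming you already have T5 token ids without the end-of-sequence token. EOS id 1 and pad id 0 are the standard T5 vocabulary values; the 256 limit is the one stated in the chat, and the helper name is hypothetical.

```python
T5_EOS_ID = 1        # end-of-sequence id in the standard T5 vocabulary (pad is 0)
MAX_T5_TOKENS = 256  # cap stated in the chat for SD3.5 Medium


def fit_t5_tokens(token_ids, limit: int = MAX_T5_TOKENS):
    """Truncate EOS-free T5 token ids so that ids + EOS is at most `limit` long.

    Returns (ids_with_eos, number_of_tokens_dropped) so you can see how much
    of a long prompt never reaches the model.
    """
    ids = list(token_ids)
    keep = ids[: limit - 1]  # leave room for EOS
    return keep + [T5_EOS_ID], len(ids) - len(keep)
```

Reporting the dropped count is the useful part: a long GPT-expanded paragraph can silently lose its tail, so the end of the prompt never influences the image.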
Flan-t5-xxl for T5 I am more sure about
what setup might that be?
make sure the text encoders get put onto the CPU before the Ksampler starts
and try Q4_0
or just offload the text encoders entirely
after text encode
And SLG works well with 3.5 medium. Don't know about large though since medium has the slightly different architecture with extra self attention layers or w/e it was
SLG will work with Large and Flux I think
just needs porting and recalibrating
it's a clever idea: it drops layers for the negative so the structure gets messed up
👁️
since models try to do the opposite of the negative, you end up with better structure
for Flux this will require generating a negative, with the 100% speed penalty
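To make the "models do the opposite of the negative" point concrete, here is a toy Python sketch of skip-layer guidance. It assumes the combination used in Stability's SD3.5 reference sampler as I understand it: a third pass runs the positive prompt with some transformer blocks skipped, and the final prediction is pushed away from that degraded output. The blocks are stand-in functions on plain float lists, not real MMDiT blocks, and `run_blocks`/`slg_combine` are hypothetical names.

```python
from typing import Callable, Iterable, List

Block = Callable[[List[float]], List[float]]


def run_blocks(x: List[float], blocks: List[Block],
               skip: Iterable[int] = ()) -> List[float]:
    """Toy forward pass: apply blocks in order, skipping the given indices."""
    skip = set(skip)
    for i, block in enumerate(blocks):
        if i not in skip:
            x = block(x)
    return x


def slg_combine(cond: List[float], uncond: List[float],
                cond_skipped: List[float], cfg_scale: float,
                slg_scale: float) -> List[float]:
    """CFG plus a skip-layer term:
    uncond + cfg * (cond - uncond) + slg * (cond - cond_skipped)
    The last term moves the result away from the structurally broken
    prediction, the same way CFG moves it away from the negative.
    """
    return [u + cfg_scale * (c - u) + slg_scale * (c - s)
            for c, u, s in zip(cond, uncond, cond_skipped)]


if __name__ == "__main__":
    blocks: List[Block] = [
        lambda x: [v + 1.0 for v in x],
        lambda x: [v * 2.0 for v in x],
    ]
    cond = run_blocks([1.0], blocks)                    # [4.0]
    cond_skipped = run_blocks([1.0], blocks, skip=[1])  # [2.0]
    uncond = [1.0]                                      # made-up negative output
    print(slg_combine(cond, uncond, cond_skipped, cfg_scale=2.0, slg_scale=1.0))
```

In a real model every block also sees the conditioning, and the skipped-layer pass is one extra model evaluation on each step where SLG is enabled.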
would be good to try this method with Perp-Neg also
what are the resolutions supported by sd3.5?
I just use the same as SDXL, which are in the "SD3 Select Latent Resolution" node
Hmmm weird. 1920x1024 should just work out of the box
ah but you might need a higher shift
like 4~5
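What "shift" does, roughly: SD3-family models sample on rectified-flow sigmas in [0, 1], and the shift remaps each sigma toward 1, so more of the step budget is spent at high noise, where the composition of a large canvas gets decided. A minimal Python sketch of the remapping, assuming the `shift * s / (1 + (shift - 1) * s)` form that I believe ComfyUI's ModelSamplingSD3 node uses. `shifted_linear_sigmas` is a hypothetical helper applying it to evenly spaced sigmas, not ComfyUI's actual scheduler code.

```python
from typing import List


def shift_sigma(sigma: float, shift: float) -> float:
    """Remap a flow-matching sigma in [0, 1]; shift > 1 pushes it toward 1."""
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)


def shifted_linear_sigmas(steps: int, shift: float) -> List[float]:
    """Evenly spaced sigmas from 1 down to 0, each passed through shift_sigma."""
    return [shift_sigma(1.0 - i / steps, shift) for i in range(steps + 1)]


if __name__ == "__main__":
    for shift in (3.0, 5.0):
        sigmas = shifted_linear_sigmas(4, shift)
        print(shift, [round(s, 3) for s in sigmas])
```

With 4 steps, shift 3 turns the linear 1, 0.75, 0.5, 0.25, 0 into 1, 0.9, 0.75, 0.5, 0; raising shift to 5 pushes the middle values higher still, while the endpoints stay at 1 and 0.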
1920x1024, Seed 1, a photo of a cat, CFG 7, Sigma Shift 3 on left and 5 on right, Sampler Euler scheduler Normal, Steps 20
in both cases while watching live previews it's clear the early steps are failing and the latter steps are trying to recover the corruption into something coherent (in this case a TV screen type of effect on the background, and a cat with a patchy outline lol)
here's that at the end. The two other images are intermediate previews just to show what it's doing in the early steps
for comparison at 1024 the early step previews have a relatively clean coherent image, just without details built yet
In Flux I got better results with stochastic samplers
the built-in DPM++ 2SA in Comfy has been adapted to work with rectified flow
Noisy beginning is kinda expected but should not be this bad
and also those early previews resemble the final image more so than the high-res ones
I've been trying to port the Restart sampler to rectified flow; it might help with these models
it's given me the best results in SD 1.5 and SDXL
it adds the noise in a special way, its a bit tricky
what are the schedulers and samplers?
does anyone know a prompt system for prompt enhancing?
omg i never knew stuff like this existed! is there one for img2img?
hi, I found the same problem when not using T5.
1,2 - clips only
3,4 - clips + T5
why is the workflow i used for sd3.5 L not working for sd3.5 M?
oh shit, i'm so used to ignoring T5 with SD3 since it did almost nothing with the SD3.0 models, is it actually needed for SD3.5 Medium now?
you can run with 1 text encoder, 2 or all 3.
i know that lol, refer to the context of the conversation above, it's a much more specific thing
i have been using it with no clips too and saw no errors until i increased the resolution above 1024px
you asked if t5 is needed. The answer is no. lol
By any chance, do you know why my workflow is not working?
again, within a specific context of the conversation above
this is an open chat sry dawg. Maybe reply to the message
seed=1, steps=20, cfg=7, CLIPs + T5. Seems still wonked out. left is 1440x1440, right is 1920x1024
oh i left dpm++ on instead of euler, slightly different details with euler but still wonked
"a photo of a cat"
seed 1
steps 20
1440x1440
dpmpp_2m
sgm_uniform
cfg 4
fp16 everything
maybe too high cfg?
hmm, there are stripes in the background, but they seem to be gone if I specify one
that looks very similar to my results on euler at cfg 7. It's clearly corrupted on the background, and the cat itself doesn't look great (those eyes are barely existent)
the specified background reduces the visibility of damage, but that image still looks pretty off
i think this is just a case of the model not being trained enough at or above 1440x1440 res. The more input you give, the more it can compensate and make it work, but with minimal input the difference shines through aggressively
this is a general case with image diffusion models - they are largely self-correcting, so giving it more of anything helps cover issues. More prompt tokens = wider attention = more self-correction. More steps = more actual entire runs of the model = more self-correction
(relatedly in the other direction, generating with empty prompts is a great way to sus out implicit biases in a model, whatever it outputs without conditioning will be approximately representative of the types of content it was trained the most on and may have developed a general bias towards)
(in that test, SD3.5Medium has a very nice broad range, and seems to favor 768x768 for the most stable generations, even at 1024x1024 with empty input it displays some striping patterns)
yea there is definitely undertraining in some cases but I do not judge it much as it is base model and will be improved if adopted by community
yee
does that mean the model is unbiased, or is that a bad thing?
that means the model lacks any visible bias, which is good
I've seen some models eg continually generate humanoid outputs on empty prompt, indicating that the model was violently overtrained on humans to win on "omg it makes pretty girl" aesthetic evals but otherwise useless as a foundation for anything
SD3.5Medium has a very broad range of random outputs from that, indicating it likely hasn't been overly tuned on anything in particular, making it very optimal for a base model for community tuning adoption
(noting however that's just a quick n dirty test, not a guarantee, so grain of salt and all that)
sd35m promptless outputs
maybe a slight over-tendency towards artsy styling but eh
and for comparison, Flux Dev
burnt hard on high quality photoreal art
that woman included is The Flux Woman who everyone sees constantly, and the community have taken to identifying by way of her specific "butt chin" shape
majority of finetunes 
but it is probably the inevitable future of sd35 to become coherent. I compared sd35 with flux on "lying on the grass", and sd35 often tried to do the hardest thing, like upside down, while flux often chose the easiest route by generating a non-upside-down and non-horizontal pose (idk how to name it)
yea I had similar result
so, you either have a diverse model lacking coherence or a coherent one lacking diversity
yeah, i went on whole rants about that during the sd3 launch. no model is good at upside down woman, sd3 was just the one model that tried to do it without asking
i think the ideal option is the diverse capabilities, but with something to avoid it defaulting to things it's bad at. eg an LLM prompt augmenter that takes short prompts and writes long prompts for it that don't suck and give good defaults for things the image model might make silly choices for
it's funny how often people have said my discord bot running flux schnell looks better than their flux dev results (or similar comparisons)... the model's worse, but my discord bot has an LLM that extends your prompt, and that does magic
not sure if it's just the distillation, but those all look like MJ images. Synthetic data, not just any "photoreal art". Especially the woman "sameface" with the butt chin, that's 100% MJ
yeah, another artifact on training on real world data. If you go on photo datasets they're usually using various different angles. Models trained on synth data tend to overfit on upright position or, in general, simple poses.
After playing a lot with flux and SD 3.5 Large, I began to forget how fast 1 s/it was. Man, SD 3.5 Medium is so fast
if anyone wonders: got it fixed by updating ComfyUI via Manager.
<---------------- needs OmniGen on ComfyUI
I probably should add some tiny llama to extend prompts
I often see people saying sd35 is not cinematic/realistic, but that is sooo wrong; they're just used to models that default to a certain style. In that case you have to prompt for it, and you will get it
agree, even if I have to run it for 5 min, it would be interesting, and I saw it is actually great, but it suspiciously reminds me of flux's aesthetics, ||but I might be hallucinating||
#1243166025000943746 message relevant omnigen testing in the swarm discord earlier
Its the power to meld disparate elements of different photos - seamlessly - which excites me
seems like it has cool capabilities but its general image gen quality is very "meh" tier, making it hard to justify the disgustingly long gen time
The more you drink, the more you save! 🙂
My 8Gb VRAM 'excluded' me from using Omnigen on pinokio
There is an Omnigen Huggingface https://huggingface.co/spaces/Shitao/OmniGen
(SD3.5L + SD3.5M)
Maybe I am just choosing the wrong subjects, but so far I have been underwhelmed by SD 3.5 Medium. Compared to Large of course. (workflows included, but the prompt is basically: "a comic with strong outlines of a tree that stands tall in the center, with cats of various sizes, colors, and expressions scattered around its trunk and branches. Some cats are nestled among the leaves, while others are perched on the tree's main branches. The tree's trunk is brown, and the leaves are green. The cats exhibit a range of facial expressions, from curious to content.") First is SD 3.5M at 768x768, then 1024x1024, and finally SD 3.5 L.
same seed
Anyone have a quick TLDR of the state of things? How does 3.5 compare to the last released model or Flux? It seems there are striping patterns; is that the worst of it?
I have not seen striping issues myself, but it seems to completely wig out at sizes over 1024 x 1024. Large, that is.
the output is quite different and they have different strengths and weaknesses. If you want hands, well, Flux is your choice, but there is more to imagery than that. Comics are a clear win (no LoRAs) for SD 3.5 L
other than that all good?
what about text?
faces/body how maleable is it?
can it do the dreaded woman laying on grass
astronaut on the moon
The thing is I don't really do stock photography with people. But that much is fine. Just try it IMHO. Nothing endemic
Hello Chess man
comfy still the best way? forge? anything developed better?
Hey @rapid pivot
Large can do woman in grass too, but obviously worse than flux in anatomy. Flux is also better at text.
It would imply I am trying a bunch of other GUIs. I tend to stick with what I have if I am satisfied and it does what I want and well
and Comfy fits that description
The trick is to explore them with the themes and types you want, and learn which are best at what
so essentially Flux is currently outperforming, and we will need to wait for 3.5 models to be retrained by enthusiasts? Were the legalities and licensing changed to encourage more of this, or are we still on the old restrictive ones?
I can tell you my findings, but they only cover what interests me. So anime? No clue. Chicks with boobs? Check Civit. Etc. Artwork? Stock imagery. Creative content? Text? I'm your guy
I mean, in terms of text, the clear no.1 of any AI is Ideogram 1.0 and 2.0. It isn't even close. Flux is a decent second, but distant. Logos? Flux is the king today. All the others, MJ included, are just behind
MJ6 that is. MJ 5 was strong
Satirical comics? Imagen 3 is the best now. Also not close.
afaik Flux is hard to train, and that's why we were waiting on 3.5 or a model like it. And it appears 3.5 needs more training tossed at it.
I don't fanboy any model. I just try to learn what the best tools are for each case
looking for local
sd3.5 M works pretty good for me. However, CLIP Text encode node (positive as well as negative) takes extremely long to load, any ideas why?
Have you tried the GGUF models?
Sorry, forgot to mention, thats what im trying rn
I use the T5 Q8_0. Works fine for me. Though YMMV
ah damn, do we also now have different T5 models we need to deal with?
Flux has a big edge RN in that it has been out for 3 full months now. So TONS of strong LoRas. SD 3.5 hasn't had this level of development yet, so give it time
GGUF means it is quantized. Designed to load and be processed faster and with less memory needs
as it is loaded in blocks
and not as a whole
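For context on "blocks": in GGUF quant types like Q8_0, the weights are split into blocks of 32, and each block stores one scale plus 32 signed 8-bit integers. A minimal pure-Python sketch of that idea, assuming Q8_0's scheme of scale = max(|w|) / 127 per block. The real format stores the scale as fp16 and packs raw bytes, which this skips.

```python
from typing import List, Tuple

BLOCK_SIZE = 32  # weights per Q8_0 block


def quantize_q8_0(weights: List[float]) -> List[Tuple[float, List[int]]]:
    """Split weights into blocks of 32; each block gets one scale and int8 values."""
    blocks = []
    for start in range(0, len(weights), BLOCK_SIZE):
        chunk = weights[start:start + BLOCK_SIZE]
        amax = max(abs(w) for w in chunk)
        scale = amax / 127.0
        # An all-zero block has scale 0; store zeros instead of dividing by 0.
        q = [round(w / scale) if scale else 0 for w in chunk]
        blocks.append((scale, q))
    return blocks


def dequantize_q8_0(blocks: List[Tuple[float, List[int]]]) -> List[float]:
    """Rebuild approximate weights: each int8 value times its block's scale."""
    return [scale * v for scale, q in blocks for v in q]


if __name__ == "__main__":
    weights = [((i * 37) % 101 - 50) / 10.0 for i in range(100)]
    restored = dequantize_q8_0(quantize_q8_0(weights))
    print(max(abs(a - b) for a, b in zip(weights, restored)))
```

Because each block has its own scale, one large outlier only costs precision inside its own 32 weights rather than across the whole tensor.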
Changing to 3.5 brings a tokenizer fault at the negative prompt
models need to be trained, not just LoRAs. That's the downside of the Flux model. However, we go back to how easy the licensing for 3.5 is, otherwise we won't get models created
i'm just wondering because i'm even trying with the dual clip loader (T5 & CLIP-L) + GGUF variants of SD3.5.
Shouldnt it be way faster than flux 1 q8 with both of them?
i mean the ksampling is pretty fast. no problems at all, just with text encode
I use three clips, but have not bothered to time the entire lot. I should and will, but overall I have found them to be pretty close. At least not different enough to warrant Ohs and Ahs
I am talking output results. For me the talk about what is the driver for an improved image, be it LoRA or checkpoint, is semantics. I don't actually care so long as I get what I want from the combination.
kk thanks!
you can theoretically have an endless stream of LoRAs, but I don't like to stay with a single acute style. It's easier when a model inherently knows what it needs to, and that makes the creative image-generation workflow more malleable. Essentially LoRAs are an overfit run on top: sure, an easy, fast way of getting what you want if you want specifics, but as a whole, a model that understands you by itself is better.
Reminds me of an argument I had with a strong chess master who offered to mentor me to mastery myself. He set out this study plan that involved self-studying matches as far back as the late 1800s so I could 'absorb' the evolution of the game and blablabla. I told him I would prefer a more focused and pragmatic approach that maximized results (so long as they did not impair evolution). We soon parted ways. I don't have infinite time, nor patience.
How is citing endless culture and names going to matter if I lose because I simply played worse? Same for AIs. Talking about why one way sounds nicer on paper is unimportant to me. If I get the results I seek, all the technical perfections of another setup or method won't matter one whit to my eyes.
i am asking the wrong person then. was looking for more of a power user, jack of all trades approach.
I wasn't aware you were asking anything at all.
You asked what is best. I told you in results. You then railed about why models and Loras should be splitting the output development differently, which is semantics. But there was no question involved.
Is it the purity of some technical aspect that matters to you? Or better images that fulfill your requests?
I made clear where I position myself
Devolving into personal attacks? Is it that hard to just have a civil discussion?
If you were expecting 3.5 Medium to be equivalent or better to 3.5 Large... well your expectations are backwards. Medium is smaller than Large. It's expected to be a bit weaker on quality, but with the perk of running fast and using less vram
SwarmUI is the best UI
many thanks, will give it a try. Took a hiatus from SD and I'm trying to find where things are at as a whole. Swarm is a GUI with a Comfy backend, right?
if you use dualclip (CLIP G + CLIP L) in non-gguf, it should be blazing fast and near-equivalent results to triple (the T5 on SD3 is relatively weak, but the most intensive to run)
yes
I quite understand, but the images I showed emphasize issues in the output that make it unusable. For those topics as I explained
I used SD 3.5 L to show that it wasn't a general failing by SD 3.5.
That said, here is the type of image that Flux can only dream about doing, but that SD 3.5 can do:
Depends on what you mean by outperform, but yes imo. SD3.5 is more creative, but flux is obviously better at text, anatomy, and sometimes prompt following.
So sd3.5 is a great tool too, and it also has more knowledge but from what flux knows, it performs better.
Prompt?
thank you, appreciated
in the image
all settings and meta are always in images I post here
an Impressionist Cartoon of a tree covered in whimsical cats on the base and branches all drawn in a variety of colors and facial expressions in the style of Andy Kehoe and Skottie Young. The outlines in the layered 2d art are strong and reminiscent of Keith Haring.
from what I heard flux is also easier to train, despite being a distilled model
guess the way to go for now is using sd3 for creative arts and flux for more realistic/clean images
Maybe for LoRAs, but not for full finetuning I believe, and yeah, I think flux and sd3 for different tasks is a good idea.
How about the 32-bit T5?
who needs full fine-tuning
huge file even GGUFed. (18+ GB)
I meant F32, sorry
what about it
It’s better than LoRAs, especially if you want to make the model learn more. LoRAs are only good for small-scale learning (one or a few styles)
You said the T5 for SD3 is weak. I asked whether you expected tangible benefits using the F32 version
GGUF makes things smaller by reducing the bitwidth, so if you explicitly use a high bitwidth yeah it's big
no, the problem is that the model's backbone (the MMDiT part, that does actual image generation during sampling) doesn't particularly care about what the T5 has to say, because CLIP G provides much more useful information during training
the bitwidth doesn't enhance anything, it's just narrow refinements to precision - fp32 requires twice the memory of fp16 and microscopically more precise data, it's incredibly pointless to use fp32 for anything outside of training
even fp16 barely provides more precision data than fp8 does
(this btw is the reason why Flux doesn't have CLIP G: by removing the clear G signal, they force the model to learn T5, and once it's willing to learn T5 it can achieve better results in the long run)
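The "twice the memory" point is just parameter count times bits per weight. A quick Python sketch, using a hypothetical 5-billion-parameter encoder purely as a round number (not the exact size of any particular T5 checkpoint). The Q8_0 figure assumes 32 int8 values plus one fp16 scale per block, i.e. 34 bytes per 32 weights.

```python
BITS_PER_WEIGHT = {
    "fp32": 32.0,
    "fp16": 16.0,
    "bf16": 16.0,
    "fp8": 8.0,
    "Q8_0": 34 * 8 / 32,  # 34 bytes per block of 32 weights = 8.5 bits
}


def weights_gb(n_params: float, dtype: str) -> float:
    """Approximate weight storage in GB (1e9 bytes), ignoring file overhead."""
    return n_params * BITS_PER_WEIGHT[dtype] / 8 / 1e9


if __name__ == "__main__":
    n_params = 5e9  # hypothetical round number, not a measured model size
    for dtype in BITS_PER_WEIGHT:
        print(f"{dtype:>5}: {weights_gb(n_params, dtype):5.1f} GB")
```

Whatever the exact parameter count, the ratios hold: fp32 is 2x fp16, and Q8_0 lands just above fp8.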
That's quite interesting. I assume this was understood before SD 3.5 went into training. Any reason why this path was not chosen?
(SD3.5L + SD3.5M)
Hmmmm! Playing around with the samplers with SD 3.5L dpmpp_2s_ancestral just gave me my best result by far in my Keith Haring cats in trees prompt. So far my request for his style and strong simplistic art and outlines was ignored, though it did adhere to my desire for rich multicolored whimsical cats. To be explored:
you're dissing someone that is what you're looking for. this makes little sense to me
use the workflow in this then, this is 3.5 large
we will do this another time. In the meantime, have you found the SLG workflow useful, or are you running without it?
i'm currently finishing the sampler/scheduler compare sheet. however i did work with it some yesterday, and i played around with the same idea when matteo released his block skip node for flux a couple months back. it's very useful, but you need to be extremely careful with the values. one interesting thing is that by enabling skip, you may get better text - but you will also lose fine details. image with skip turned on, on the left. image with it off, on the right
it's implemented in 3.5 medium as a way to tweak things like hands and feet if needed, or other things that are warping that you don't want to warp.
it's not intended to be a 'turn it on and use it on everything'
they are completely different pictures altogether, guess there is no free lunch. The results for skip on text, based on that sample, are crazy though.
skipping layer 2 only - skip on the left, without skip on the right
you can see that the fine details are deleted
wow yes, right picture looks really nice with the splattering of paint
i think it's barely been touched yet. People plug it in and use the defaults; it can have lots of effects, but i sure as hell ain't gonna figure it out
if you're referring to the block skip for UNet called free lunch, this works in the same way, but the blocks we use for SD3 are NOT a UNet structure, so you will have to carefully play with the values and layers to get an idea of what you want to turn off or on and when
i am. that's the project i start after i finish this sampler/scheduler compare sheet
there is a lot of possibilities with this, maybe there is a magic combo somewhere, will need to go see the paper to see if they talk about what layers represent what aspects
actually, you really want to go talk to matteo. or talk to @dusky thistle - they're the ones that have done the most digging into this sort of thing
thank you for the resources, let me dig into and familiarize myself with it first.
Nihilism LoRA
@craggy crest top is SLG. Lord have mercy, it's really hard to find a sweet spot. With the start percentage, the higher it goes the more distorted it gets; it needs a low initial value.
heh. yeah, it's not an easy, point and shoot, idea. the people that have dug the most into what each of the blocks are actually doing are clownshark and matteo. and yes, you want very small values. not sure if your node will allow decimals but if so, try them. 0.001 vs 0.00001
i assume just keeping matched to SD3
ie avoid too much arch change until there's enough to justify an "SD4"
https://replicate.com/stability-ai/stable-diffusion-3.5-medium SD 3.5 medium is up on replicate if you don't have a machine that can run it
Cats with 4GB VRAM (send help)
@bitter hearth
just type @cat and then pick from the list to tag
@dusky thistle one of these is dpm_2/linear_quadratic, the other is dpm_2/beta - 3 guesses which is which
I will guess Quadratic is the more detailed one (second one)
Beta/Linear_quadratic, worked very nice on this one, my opinion
ddim_uniform tends to warm images up
linear_quadratic is a bit of a spicy choice
sprints through low sigmas at lightspeed
don't trip at that speed...
@bitter hearth
i don't really know what i'm doing but i got Detail Daemon in there too, boosting early, if that even makes sense
detail daemon will help with linear_quadratic
because it will offset some of the detail loss
so yeah that makes sense 🙂
these are all dpm_fast. one is exponential, one is karras, one is normal
can you guess which is which?
never actually checked out what exponential graph looks like TBH
I think the third one is normal
hmm not sure
;) now you have to go play
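For anyone who has never looked at the exponential curve either: here are karras and exponential as pure-Python re-sketches, assuming the definitions from k-diffusion's `get_sigmas_karras` and `get_sigmas_exponential` (which also append a final 0.0, omitted here). The sigma_min/sigma_max values in the demo are arbitrary; "normal", "beta" and "linear_quadratic" are ComfyUI schedulers and are not reproduced here.

```python
import math
from typing import List


def karras_sigmas(n: int, sigma_min: float, sigma_max: float,
                  rho: float = 7.0) -> List[float]:
    """Interpolate linearly in sigma**(1/rho) space, then raise back to rho.
    Compared with linear spacing, rho = 7 clusters steps toward low sigma."""
    hi = sigma_max ** (1.0 / rho)
    lo = sigma_min ** (1.0 / rho)
    return [(hi + i / (n - 1) * (lo - hi)) ** rho for i in range(n)]


def exponential_sigmas(n: int, sigma_min: float, sigma_max: float) -> List[float]:
    """Evenly spaced in log(sigma): each step divides sigma by the same factor."""
    lmax, lmin = math.log(sigma_max), math.log(sigma_min)
    return [math.exp(lmax + i / (n - 1) * (lmin - lmax)) for i in range(n)]


if __name__ == "__main__":
    for name, fn in (("karras", karras_sigmas), ("exponential", exponential_sigmas)):
        print(name, [round(s, 3) for s in fn(8, 0.03, 14.6)])
```

For the same endpoints, karras stays above exponential at every step in between, so exponential spends relatively more of its steps at low sigma.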
👍
I still can't fully understand what to put in each clip; sometimes I feel like I have it clear, but then I don't.
Do I have to repeat or reinforce the idea in all of them?
I'm using this GGUF model, but it takes 4:40 minutes, while
with this Large model I think it took only 4 minutes
GGUF models trade speed for a lower VRAM requirement, so a GGUF model will be slower than an FP8 one while also being much lighter on VRAM
now that you say it, with this model my PC hasn't gone OOM or stuttered, that's true; iirc with the Large one there were some steps where videos would stop playing for a bit
The red panda model, which was #1 on the text-to-image leaderboard (beat flux.1.1 pro, flux.1 pro, dev, schnell, sd3.5 large and turbo): <https://x.com/recraftai/status/1851706399631224939>
It has very bad prompt following(maybe only in this prompt idk), but great realism.
prompt: a high quality photograph of a white cat sitting on top of a blue dog on a brown couch in a living room. Behind them, is a window, with 4 cow pictures, one in each corner. Outside the window is outer space and a ufo.
Flux.1 schnell 8-step on left, Recraft v3(red panda) on the right
indeed, it's finally revealed on the image leaderboard too
idk man it looks average
especially if its closed source
bs its very good
you will see realism and text is pretty great but prompt following is very disappointing. No reason to use it over flux or sd3.5 large imo.
recraft
I guess it does do "crappy style" photos out of the box, but that's required for it to be used as a service with no lora options
A close-up, realistic portrait of an elderly man dressed as a military soldier. He has deep wrinkles, white stubble, and a stern, weathered expression, symbolizing years of service. The uniform is slightly worn, with medals and insignia visible, and a green camo pattern typical of an army soldier. His eyes reflect resilience and experience, capturing the weight of his journey. The background is blurred, focusing solely on his face and upper uniform, creating a dramatic, respectful portrayal.
the texture is nice
use the hard flash style option it looks much better
Yeah its nice at realism, but as I said above, prompt following isn't great. Open source models can do similar gen's too in terms of realism and have much better prompt following.
and its gonna be paid sooner or later
it is paid
how much per image/credit
but u get like 50 free images
and then 5 free every day
but after that it's 10 or 20 dollars a month
if you want more
you can also upload images and make your own style
kinda like those ai video websites
yes
Flux schnell (considerably worse than flux dev and sd3.5 large) on the left vs Recraft v3 on the right,
prompt: A blue block on top of a red block. Next to the red block is a green block with a candle on top of the green block.
it seems like it's worse at prompt understanding
i kinda like this image
reminds me of something idk
but it less often has this fake-real look with that smooth skin
aesthetics are good though
oh like ipadapter for SDXL
like there you could load an image and it would make an image in that style
granted, it wasn't perfect, but it was good enough
Yeah, it's considerably worse than the new open source models at prompt following (flux dev, schnell, sd3.5 large, turbo) imo.
But yes, it's for sure more realistic than out-of-the-box models, but there are many ways to make models more "realistic".
t5xxl: give it natural language that is rich in details and adjectives. clip_g: just give it the basics of the image. clip_l: give it all the fine details and artsy stuff. Example: t5xxl: "closeup on a vase with roses in it, dew sparkles on the petals. the light shines in from the side at a slant casting shadows across the scene." clip_g: "roses in a vase near a window. closeup. dynamic light and shadows." clip_l: "sparkling bits of light bouncing off dew on the rose petals. soft, bokeh background."
the model does look good here
during the testing period on Artificial Analysis I mostly voted against it though
my preferences seemed to be for Flux Pro 1.1, Ideogram V2 and Midjourney V6
however these ELO tests are unfair to the open source models
because they get fine tuning and Comfy workflows
but ELO test doesn't reflect that
when i tested, it was always very close; most of the time prompt understanding was not the problem
its close yes, at the top of the leaderboard
I did come away with the impression that the gaps between models are very small
at some point you cant make it more realistic
there are upscaled SD 1.5 images that look like photos even
the difference between workflows is like 100x larger than the difference between models
but these are very limited finetunes
yes but I don't see the downside in checkpoints specialising
you could train a small router model to route prompts to appropriate checkpoints for example
I guess storage, and loading/unloading to VRAM is the disadvantage there
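A trained router is more than a chat message can show, but the routing idea itself can be sketched. This Python example swaps in plain keyword scoring as a stand-in for a learned classifier; the checkpoint file names and keyword lists are hypothetical.

```python
from typing import Dict, List

# Hypothetical checkpoint names and keyword lists, for illustration only.
ROUTES: Dict[str, List[str]] = {
    "anime_checkpoint.safetensors": ["anime", "manga", "cel shaded"],
    "photo_checkpoint.safetensors": ["photo", "photograph", "dslr", "portrait"],
    "comic_checkpoint.safetensors": ["comic", "cartoon", "outlines"],
}
DEFAULT = "base_checkpoint.safetensors"


def route_prompt(prompt: str, routes: Dict[str, List[str]] = ROUTES,
                 default: str = DEFAULT) -> str:
    """Pick the checkpoint whose keywords appear most often in the prompt.

    Falls back to the default checkpoint when nothing matches; ties go to
    whichever route is listed first.
    """
    text = prompt.lower()
    scores = {name: sum(kw in text for kw in kws) for name, kws in routes.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default


if __name__ == "__main__":
    print(route_prompt("a high quality photograph of a white cat"))
```

Every extra route is another checkpoint on disk and a potential VRAM swap when consecutive prompts land on different models, which is the storage/loading cost raised above.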
SD3.5 medium. Prompt: fantasy painting of a handsome lion Knight,long wavy hair, slight smile, piercing green eyes; emerald, symmetrical,intricate engraved armor; hyperdetailed. the words "Kings wear Crowns"
too many checkpoints after a while and you get lost in the clutter
Civit does feel like that
interesting that even Midjourney chose to have their anime checkpoint separate though
yikes that recraft is rough. I just tried about 10 prompts that look amazing on flux and was really unimpressed by recraft. I'd probably pick sd 3.5m over it for looks (just don't prompt for hands).
Recraft vs Ideogram
To be honest, I still don't understand why this model is in first place...
Yeah I mean it’s great at realism, but it seems even more constrained than flux and has much worse prompt following.
that Recraft would be decent on its own, but that Ideogram pic is killer and makes Recraft look like SDXL
at a guess, it's someone's attempt at a flux finetune.
its a big one, going by API pricing and timings
@craggy crest Wanted to share that I have taken a bit of a "break" after all from bigger overarching training, in favor of my first concept training on Flux Lite, which seems to be working extremely well for a first attempt, so I am happy about that haha
the aesthetic fine tune is a bit off, especially small details
its got signs of a strong model though in composition and blur
cool - pictures or it didn't happen
Training a dappled sunlight LoRA. Cooking up some more examples, just real slow cause I am on my 3060ti lol
my dataset is pretty small and less than ideal, so I will be trying to get good enough results to supplementally train off of for better feature reach
prompt?
striped dappled light on the face of a young black boy. The light on his face is in striped and he is wearing a gray tank top with very short black hair, dark background
right hand side definitely better in both cases, nice job
thanks <3
sd 3.5, no lora
mixed in a smidge of my realism LoRA's to make it look a touch better as well haha
just your prompt
honestly, not bad
modified your prompt to: HDR photography: striped dappled light on the face of a young black boy. The light on his face is in striped and he is wearing a gray tank top with very short black hair, dark background
sd3.5 large, no lora, just the prompt
Comic Book LoRA
its a decent beginning for that as a concept, nice
you really like the word 'whoa!' don't you?
and no need to sit there and battle for hours and hours and hours with a base model that's frozen and doesn't want to do what you want
ok, this looks ASTRONOMICALLY better haha
??? what... does?
Oh yeah, I talked to two people with doctorates in this scene, and they both agreed that flux is not "frozen" and that it was a completely false claim made by some dude on reddit with no real truth behind it
They spoke way higher level about it with me, much higher level than I understood, that's for sure, but yeah, two different people were 100% sure that "frozen" is a BS claim from people who don't understand flux or distillation. One of those friends was the creator of Libre Flux and the writer of the paper used to prune flux down from 12B to 8B, so I would wager he knows what he's talking about 😅
this
i couldn't care less what they say. i know what was done, they don't
casually doesn't read the part where they wrote an entire paper about flux and pruning/de-compressing it
casually reminds you that i'm a programmer and said i 'know' not 'im assuming'
ok man, I will continue to believe people who are much more qualified 😅
it doesn't matter anyways, flux trains great regardless of what some people try to say about it 😅
i highly doubt they are much more qualified. just that they support your assumptions.
do you have a doctorate in machine learning?
i don't need one. if that's a requirement, then robin rombach, the creator of stable diffusion, is unqualified
actually no, I am not gonna get into this, it literally does not matter in the grand scheme of things
anyways
of course it doesn't. you can sit there and pull your hair out, melt your gpu, and pay outrageous electric bills to train unnecessary loras. i don't care.
how are things going with medium? I heard its pretty good
better or worse than large?
you haven't played with it yet?
ah. well - get a few moments to go play with it and see what you think - try using it as a refiner for your images generated with flux
Well, considerably worse in anatomy at least, but ok in art styles. Decent as an upscaler/refiner.
jesus, worse in anatomy 😭
I guess it is more of a refiner after all, so oh well there
it is a smaller model - that's why you can also use SLG with it if you need to tweak the anatomy
how would you do that
a lot of people have been playing with it as a refiner
Yeah, that seemed to be more of the direction people were leaning in for a while
grab the example workflow with the SLG node for ComfyUI, update your ComfyUI, and then play around with the values
oh you don't have any input
I'd be down to use it on my flux lite gens
you asked how you would do that. not what tests i'd done?
Yeah, that can help a bit. It’s far better than sd3 medium in anatomy but roughly around base sdxl in anatomy (worse without slg).
there are 3 workflows in the huggingface repo for medium: SD3.5L_plus_SD3.5M_upscaling_example_workflow
SD3.5M_SLG_example_workflow
SD3.5M_example_workflow
but update your comfyUI before loading any of them so you get the new node and scheduler
I should know... I have trained over 1000 LoRAs on base and was part of the beta program for it, where I gave feedback on it before public release
I would kinda expect it since it’s smaller than sdxl. But it’s workable at least, not as bad as sd3 medium.
I mean, its basically the same size as SDXL, and its over a year newer with new tech, much more training, and a WAY more mature industry
it IS sd3 medium, finished
was wondering how you would tweak SLG for anatomy as you said, but no info
just looking for info
I still think that medium will be far more viable than large, so I will have my eyes on that if anything
SLG is skip layer guidance, so you are turning various blocks on and off and changing values. as an example, this is using it to tweak text. for this i had 3 layers turned off and only two of the three encoders working on the prompt. with SLG is on the left, without is on the right
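A minimal sketch of the idea, assuming SLG works roughly the way ComfyUI's SLG node is commonly described: the conditional prediction is computed a second time with some transformer blocks skipped, and the CFG result is then pushed away from that degraded prediction. The toy blocks, the function names (`run_blocks`, `cfg`, `apply_slg`) and all numbers are hypothetical; real blocks operate on latent tensors, not short lists of floats.

```python
from typing import AbstractSet, Callable, List, Sequence

# Toy stand-in for one transformer block (hypothetical): list of floats -> new list.
Block = Callable[[List[float]], List[float]]


def run_blocks(x: List[float], blocks: Sequence[Block],
               skip: AbstractSet[int] = frozenset()) -> List[float]:
    """Run the blocks in order; a skipped block passes its input through unchanged."""
    for i, block in enumerate(blocks):
        if i in skip:
            continue  # "turning the block off"
        x = block(x)
    return x


def cfg(cond: List[float], uncond: List[float], cfg_scale: float) -> List[float]:
    """Classifier-free guidance: uncond + cfg_scale * (cond - uncond)."""
    return [u + cfg_scale * (c - u) for c, u in zip(cond, uncond)]


def apply_slg(cfg_out: List[float], cond: List[float],
              cond_skipped: List[float], slg_scale: float) -> List[float]:
    """Push the CFG result away from the layer-skipped conditional prediction."""
    return [g + slg_scale * (c - s) for g, c, s in zip(cfg_out, cond, cond_skipped)]


if __name__ == "__main__":
    blocks: List[Block] = [
        lambda v: [a + 1 for a in v],
        lambda v: [a * 2 for a in v],
        lambda v: [a - 0.5 for a in v],
    ]
    x = [1.0, 2.0]
    full = run_blocks(x, blocks)               # [3.5, 5.5]
    skipped = run_blocks(x, blocks, skip={1})  # [1.5, 2.5]
    guided = cfg(full, [0.0, 0.0], 1.0)        # [3.5, 5.5]
    print(apply_slg(guided, full, skipped, 2.0))  # [7.5, 11.5]
```

The SLG scale plays the same role as the "values" mentioned above: at 0 the output is plain CFG, and larger values push harder away from what the model produces with those blocks switched off.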
@craggy crest better? 😛 😛 😛
those are really cool :) very DC comics style
It's a nice LoRA indeed. Available for SD3.5 as well 😉
zombie ironman?
Very cool!
Sd3.5 2.6b vs Allegro 2.8b(text to video model)
work out which step the detail you want to work on appears, then apply SLG only around that step, trying each scale from 0-10 in intervals of 1
you can start with the default blocks
it's important to have it off for the final smaller sigmas
cos it will reduce fine detail, just like PAG does
you also probably want it off for the first 30% of sigmas or so, because it could reduce image diversity
i think you just made kagi's eyes crossed. can you boil that down?
i think he really would like to be able to use it
I don’t think this is Albert Einstein lol, allegro 2.8b
prompt: Albert Einstein walking around in a futuristic world, far away
Thanks <3 Training flux 8b haha
maybe this: leave layers and scale at default, start 0.3, end 0.7
keeps it away from the spicy areas
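A sketch of the schedule advice above, assuming the SLG window is expressed as a fraction of the sampling run (start 0.3, end 0.7) and checked once per step. The helper names are hypothetical, and measuring progress as the fraction of steps completed is a simplification: ComfyUI's node, as commonly implemented, maps its start/end percentages to sigma values, so the exact steps it covers may differ. The sweep mirrors the "each scale from 0-10 in intervals of 1" suggestion.

```python
from typing import List


def slg_active(step: int, total_steps: int,
               start: float = 0.3, end: float = 0.7) -> bool:
    """Hypothetical helper: is SLG on at this (0-based) sampling step?

    Progress is the fraction of steps already taken, so step 0 -> 0.0.
    SLG stays off for the early steps (to avoid reducing diversity)
    and for the late, small-sigma steps (to avoid reducing fine detail).
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    progress = step / total_steps
    return start <= progress < end


def slg_steps(total_steps: int, start: float = 0.3, end: float = 0.7) -> List[int]:
    """All step indices where SLG would be applied."""
    return [s for s in range(total_steps) if slg_active(s, total_steps, start, end)]


def scale_sweep(lo: int = 0, hi: int = 10, step: int = 1) -> List[int]:
    """SLG scales to try one at a time, inclusive of both ends."""
    return list(range(lo, hi + 1, step))


if __name__ == "__main__":
    print(slg_steps(20))  # [6, 7, 8, 9, 10, 11, 12, 13]
    print(scale_sweep())  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

With 20 steps this keeps SLG on for steps 6 through 13. Narrowing `start`/`end` around the step where the detail you care about first appears follows the "apply SLG only around that step" advice.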
I like minimax the most out of these proprietary video ones
Flux
Allegro is open source with an Apache 2.0 license. It's a 2.8B-parameter DiT.
There's also Mochi-1, which is considerably better, also Apache 2.0, and a 10B DiT.
In the text-to-video leaderboard, Mochi-1 is #2 behind MiniMax, and beats Kling, Luma, and Gen-3.
Mochi is amazing, especially for an open-source model. I prefer it to MiniMax honestly. These were generated by the official website, which uses the open-source model and an upscaler.
sound on
@craggy crest Is this correct "frozen just means no gradients apply to it"?
not what I mean, at least. flux is basically a giant lora
I was asking in general from a convo I just had. I have had so many failures on Flux; then I take the same dataset and throw it at SD3.5 and BANG, success after success. I mentioned that SAI said theirs were open (unfrozen) weights and was told they will say anything to stay relevant. I don't know, I just know I have had no real success with Flux locally while I have with 3.5. If the malformed appendages could be fixed, then it will be a dream to work in for me.
there has been a couple of pieces of news about Flux that are important regarding training
the simpletuner dev managed to train flux for over 2500 H100 GPU hours without it collapsing; this is $7,500 worth of training, more than needed for the vast majority of checkpoints
secondly the realvis dev managed to train a new checkpoint on flux de-distilled and boost the aesthetics a lot
so progress can be made, it's just tricky
My hope is that Juggernaut, since they are working with BFL directly, gives us a Flux that is simpler to train.
not sure who told you that but - flux is not only distilled, it's also DPO'd. it's frozen. 3.5 has had none of that done to it.
to do anything at all to flux, you basically have to break it and then assemble the pieces into something that's not flux
I agree, and thank you for the confirmation.
there's a reason that when asked about it, black forest said they were not interested in making it trainable. also a reason they never released any information on how to train it.
BINGO
and a very specific reason why you can train SD3.5 almost instantly, and it is such a massive fight to do anything to flux
Locally it sure is, for everyone I know who trains
we're being 'watched'
wristwatched to be exact 😛
there's also a reason that all the devs who have been fighting to do anything with flux since it released dropped it and grabbed 3.5 the second they could
rock-n-rolex?
Beautiful news
trainers came out, there was a lora the DAY it released and the guy that made it didn't have early access, etc
it's almost effortless to work with
Flux is sledgehammer training then you finally manage it and concept bleed ruins it. SD3.5 is a sponge.
flux is an uphill battle unless you stick to its very tight range.
it's a tool. a very good tool, with a very specific purpose. use it for that and don't try to make it be something it's not
I wish we had a save to lora node, or save to checkpoint then extract a lora from it, for 3.5.
not how it works. you have to train a lora, just like you do any other model. you can't just 'save' a file
I'm not stupid, I meant: you have 3.5, you have your lora, you adjust the lora, perfect, time to save it out. YES, there is a save lora node in comfy, just not for 3.5
the old way was save it to checkpoint and extract
well - if you save the workflow, you'll always have your settings. and you can save templates
just set the lora up, and save it as a template
of course, but this is for release. once we get the tools we had in XL then I can rock the training for 3.5
they should come fast if the community can leave flux for 3.5
lakers win the world series!!
there's a basketball world series?