#🆕|sd3


low stone
brittle nexus
sterile pendant
#

What tolerance ranges are you using? That's what really determines the quality with that ODE sampler node. There's a big difference between 3/4 and 4/5, or 4/5 and 5/6: literally 10x the precision, since it's a whole extra decimal place. But it will make compute times go up by a ton.

torn wharf
#

yeah i determined the problem is that the sampler is only doing 1 step and calling it done. i don't have the workflow set up correctly somehow and just gave up. i was trying all sorts of ranges while playing

sterile pendant
#

3/4 or 4/5 work well; any higher and it's usually a waste of electricity. For the adaptive ones, just set the step count to something like 200 so it doesn't abort early. It acts kind of like a Jeopardy music timer

#

Oh and bosh3 is the only one that's really worth it for SD3
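
A minimal sketch of what the tolerance and step-count advice above does inside an adaptive solver. It assumes "3/4" means rtol=1e-3, atol=1e-4 (a reading of the shorthand, not confirmed against the node), it integrates a toy scalar ODE rather than a diffusion model, and `bosh3_adaptive` is a hypothetical helper, not the node's code. The `max_steps` budget plays the role of the step count: set it too low and the solve stops before reaching the end.

```python
import math

def bosh3_adaptive(f, t0, y0, t1, rtol, atol, max_steps=1000, h0=0.1):
    """Integrate the scalar ODE dy/dt = f(t, y) from t0 to t1 with the
    Bogacki-Shampine 3(2) embedded pair ("bosh3").

    Returns (y_end, accepted_steps, finished). finished is False when the
    max_steps budget ran out before t1 was reached, i.e. the solve aborted early.
    """
    t, y, h = t0, y0, h0
    accepted = attempts = 0
    while t < t1:
        if attempts >= max_steps:
            return y, accepted, False
        attempts += 1
        h = min(h, t1 - t)
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h / 2 * k1)
        k3 = f(t + 3 * h / 4, y + 3 * h / 4 * k2)
        y3 = y + h * (2 / 9 * k1 + 1 / 3 * k2 + 4 / 9 * k3)               # 3rd-order result
        k4 = f(t + h, y3)
        y2 = y + h * (7 / 24 * k1 + 1 / 4 * k2 + 1 / 3 * k3 + 1 / 8 * k4)  # 2nd-order result
        # Mixed tolerance: atol dominates near zero, rtol dominates for large |y|.
        scale = atol + rtol * max(abs(y), abs(y3))
        err = abs(y3 - y2) / scale
        if err <= 1.0:  # accept the step
            t, y = t + h, y3
            accepted += 1
        # Grow or shrink h; exponent 1/3 because the error estimate scales like h^3.
        factor = 5.0 if err == 0 else 0.9 * err ** (-1 / 3)
        h *= min(5.0, max(0.2, factor))
    return y, accepted, True

f = lambda t, y: -2.0 * t * y          # toy ODE with exact solution y = exp(-t^2)
for r, a in [(3, 4), (4, 5), (5, 6)]:  # read "3/4" as rtol=1e-3, atol=1e-4
    y, n, done = bosh3_adaptive(f, 0.0, 1.0, 3.0, 10.0**-r, 10.0**-a)
    print(f"{r}/{a}: {n} steps, error {abs(y - math.exp(-9.0)):.1e}, finished={done}")
```

For a third-order method, each extra decimal of tolerance costs roughly 10^(1/3), about 2.2x, more steps, which is where the extra compute comes from.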

craggy crest
bitter hearth
low stone
#

I'm finding AuraFlow to be better at prompt following than SD3 2B at this point. Sometimes I get more elements on SD3, but the way it combines subject details feels SDXL-ish. The prompt: A chubby, bearded man with a shocked expression looks up at the ceiling, his mouth wide open. Colorful, fluffy creatures with angry faces pour out of a hole in the ceiling above him. The man's face is wrinkled and red, with sweat beading on his forehead. Bright, twinkling Christmas lights swirl around the room, casting a magical glow. A huge robot reindeer towers over a tiny Santa Claus, who gazes up at it in wonder. Snowflakes dance in the air, creating a whimsical atmosphere. The image is incredibly detailed, like a high-quality photograph taken with an expensive camera.

bitter hearth
mortal mesa
#

like 20th gen, i do 4 at a time. it's Bill Cosby if it's not clear enough

torn wharf
#

yay christmas morning! i'm prompting for nintendo 64s and i'm getting all these jank looking super nintendos though.

fleet meteor
brittle nexus
torn wharf
#

you know, sometimes i think sd3 has no idea what an n64 is

#

me thinks lykon was a sega kid and is keeping the ancient console wars alive. kind of sus

#

yeah, it has a pretty good idea of what a sega dreamcast is but no idea what the n64 is. right okay. that's not sus at all.

junior dune
#

Is there a way to train a lora with sd3 as the base model?

torn wharf
#

onetrainer sd3 dev branch, or kohya scripts sd3 dev branch

#

THIS IS PRETTY SUS THO. N64 DESERVES GREATER REPRESENTATION

#

i didn't prompt no gamecube but that's what it's tryin to do clearly. that ain't no n64. this safety training is bohshit

#

closest i could get but they erased the nintendonger

deft whale
#

/help

torn wharf
#

/balls

craggy crest
bitter hearth
bitter hearth
torn wharf
#

heh

craggy crest
mortal mesa
craggy crest
bitter hearth
craggy crest
mortal mesa
#

unsettling

torn wharf
craggy crest
torn wharf
craggy crest
torn wharf
edgy kelp
#

I really wonder why SAI didn't even try to blur the Midjourney-based dataset images just a little; the artifacts in SD3 Medium are so annoying...

tidal spire
#

Dress

#

Change dress

bitter hearth
#

Changing dress, please wait 2 weeks for it to be done.

tidal spire
cold osprey
upper gust
upper gust
bitter hearth
#

the image I made before was heavily stylized so it's personal taste really

here is an image that is just photographic with no stylization:

#

that is at 300 steps of order 4 implicit adams

#

for comparison here is the same seed with 20 steps of order 1 euler

#

but there is much more potential in samplers
Dr Head's node is awesome but
the implicit adams is only order 4
and it is fixed step not variable

#

higher order variable step implicit solvers would be a big quality increase over this

#

explicit solvers (euler, heun, bosh, RK4, fehlberg, dopri etc) won't work well with stable diffusion at any order higher than 2, or maybe 3 at best

#

due to stiffness

#

but implicit solvers allow us to push the order higher

#

🙂
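
To illustrate the stiffness point above, a toy sketch on a linear test equation (not a diffusion model) comparing forward Euler, an explicit method, with backward Euler, the simplest implicit Adams (Adams-Moulton) method. With lam * h = 2.5, beyond forward Euler's stability limit of 2, the explicit solution blows up while the implicit one decays like the true solution. Function names and numbers are illustrative choices.

```python
import math

def explicit_euler(lam, y0, h, n):
    """Forward (explicit) Euler on y' = -lam * y: y_next = y + h * f(y)."""
    y = y0
    for _ in range(n):
        y = y + h * (-lam * y)
    return y

def implicit_euler(lam, y0, h, n):
    """Backward (implicit) Euler: y_next = y + h * f(y_next), solved for y_next.

    The test equation is linear, so the implicit solve is one division; for a
    neural network you would need Newton or fixed-point iterations instead.
    """
    y = y0
    for _ in range(n):
        y = y / (1 + lam * h)
    return y

lam, h, n = 50.0, 0.05, 20  # lam * h = 2.5 > 2, forward Euler's stability limit
print("explicit:", explicit_euler(lam, 1.0, h, n))  # grows like 1.5**20
print("implicit:", implicit_euler(lam, 1.0, h, n))  # decays toward 0
print("exact:   ", math.exp(-lam * h * n))
```

Shrinking h until lam * h < 2 makes forward Euler stable again, which is the "just take smaller steps" cost that implicit methods avoid.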

cold osprey
# cold osprey

Please, can someone help me put this on a space background and put a space helmet on the monkey?

hollow apex
#

Any good SD3 fine-tunes already out?

bitter hearth
#

yeah there is one that makes balls

#

it's awesome

#

it's only a LoRA though

#

no big checkpoints yet

icy drift
#

Any news on when Civitai will allow SD3 again? Since we got the updated licence.

lusty oyster
#

how do I reboot ollama?

#

After closing

bitter hearth
lusty oyster
lusty oyster
bitter hearth
#

IDK what civit are doing

#

it never made any sense to me

#

I think I misunderstood what civit actually is

#

I thought it was just a model hosting site but maybe they are trying to be something else

#

like an open source software foundation or something

heady wolf
#

I read that they've been in communication with SAI about a commercial license. My understanding is that once you make over $1mil, then you work with SAI to make a new bespoke license agreement. That's probably the "new contract" civit staff is reading, not the one that was released to the public weeks ago 😛

silk solstice
sterile pendant
silk solstice
bitter hearth
#

SD3 colours are so good

bitter hearth
torn wharf
#

They're not a non profit like the FSF

lusty oyster
orchid crypt
#

so, was sd3 made from screenshots? this is ONE image generated with plain sd3 medium...

lusty oyster
craggy crest
bitter hearth
#

Balls

#

Good morning

marble plinth
#

hey guys, any hints or tips if i would like to do img2img in the style of marco grassi? I've trained a model with several pics of his work, and i would like to transfer that style to some pics of my family. Tried ip adapter and t2i style, but not successful... any tip is highly appreciated. I'm using A1111 btw!

torn wharf
craggy crest
tough viper
#

how come civitai isn't hosting sd3 models?

torn wharf
desert garnet
torn wharf
#

They said it was because they didn't want creators to suffer and stability could require all models be destroyed. So civit deleted everything that had been uploaded and indefinitely banned sd3 content to protect creators.

|| it was really about protecting their business interests and they were only spinning it as protecting the community ||

craggy crest
sage burrow
#

So I've been merging models for fun and experimentation. If you add SD3 into the mix at all, there is always missing || nipples || and 🍆 lolol
In some cases the images are turning out better, and in other cases they are worse. I guess some older checkpoints merge better with SD3 than others.

foggy cloak
#

Civitai has made some terrible decisions lately but that one seems like a win-win tbh

craggy crest
craggy crest
desert garnet
#

everyone lies even the napster ceo lied

bitter hearth
#

Balls

torn wharf
# foggy cloak Civitai has made some terrible decisions lately but that one seems like a win-wi...

How are people who aren't sharing for commercial reasons helped out?

Civit have never demonstrated a capacity for dealing with issues maturely. This is more of the same. Nobody was at risk of needing to destroy models, and a judge would never have upheld that even if someone had attempted to enforce it. The whole reasoning is a joke. They're likely using it more as a negotiation chip than any kind of goodwill.

Ulterior motives always

#

Telling the community they're banning all models until stability gives them favorable commercial terms doesn't communicate that well though. Trust that businesses are never honest. Especially ones that are still in the red and spending money to establish themselves.

craggy crest
desert garnet
#

wasn't that what happened with facebook/napster

torn wharf
#

napster they got hit with the riaa. after they were acquired by microsoft, they actually did a cool spotify style service which was way ahead of the times

#

my bad wasn't microsoft that got them. roxio. but they did a streaming subscription thing it was handy dandy

#

at facebook i think when they restructured the company they fucked over joseph gordon levitt and he didn't play his cards right. cut throat but yeah thats how its played. schmiddy from Google fame, isn't exactly a 'nice' guy either.

torn wharf
#

wait mb. it was another guy who played that character in the social network. the amazing spider guy

raven fern
sterile pendant
craggy crest
sacred jewel
bitter hearth
#

thomas there's too much water in this planet

torn wharf
#

death to the water! its time humanity take back the earth

craggy crest
#

@naive sparrow

bitter hearth
#

might go back to SDXL for a bit to get the stronger version of CFG++

#

not sure if it will be possible to port it fully to SD3

#

at the moment only "alternate" mode works in the comfy node for SD3, which lets you halve your CFG

#

but the original full version lets you set your CFG to like 0.6
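
For reference, a short NumPy sketch of the standard classifier-free guidance (CFG) combination that the CFG scale controls. When a negative prompt is used, its prediction takes the place of the unconditional one. CFG++ changes how the guided prediction is used inside the sampler step; that part is not shown here, and `cfg_combine` is a hypothetical helper name.

```python
import numpy as np

def cfg_combine(pred_uncond, pred_cond, cfg_scale):
    """Classifier-free guidance: start from the unconditional (or negative-prompt)
    prediction and move cfg_scale times along the direction toward the
    conditional one. cfg_scale = 1 returns the conditional prediction as-is."""
    return pred_uncond + cfg_scale * (pred_cond - pred_uncond)

# Made-up predictions for three latent values, just to show the arithmetic.
uncond = np.array([0.0, 1.0, -2.0])
cond = np.array([1.0, 1.0, 0.0])
for scale in (1.0, 3.0, 7.0):
    print(scale, cfg_combine(uncond, cond, scale))
```

The scale is a plain multiplier on (cond - uncond), so larger values extrapolate further away from the unconditional prediction, and each guided step needs two model evaluations.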

hollow apex
#

lets hope its slightly less censored to shit

bitter hearth
#

hmm never mind the full CFG++ at 0.6 CFG wasn't that much better lol

lusty oyster
rain current
bitter hearth
alpine summit
verbal epoch
#

Where do I find trained sd3 models

vast condor
#

The base model is on Hugging Face of course. If you mean fine-tuned by the community, civit is still not hosting them to my knowledge, and the early ones weren't really that mind-blowing anyway. Gotta be patient

craggy crest
bitter hearth
mortal mesa
#

Failed to fetch response from Ollama.

supple gulch
#

How about SD3.1? What's the release schedule?

mortal mesa
low stone
mild bramble
mortal mesa
edgy kelp
#

Man, I lost so many Balls (TM) while I was offline

edgy kelp
mortal mesa
#

uh oh, SD3 just popped out topless with appropriate bits unprompted, what do i do, email it to trust and safety

edgy kelp
#

SAI will persecute you

#

Or prosecute, whatever

mortal mesa
#

maybe both

bitter hearth
edgy kelp
sage burrow
#

What is something that is SFW that SD3 does NOT know, that Pony and/or SDXL knows? I'm testing my model merges.

tough oriole
#

Y'all having luck with aesthetics on SD3? It's learning my subjects but it doesn't look good.

supple gulch
bitter hearth
#

the 16-channel VAE is currently unmatched

#

I think midjourney is still the best model in the world overall because it has good structure and aesthetics and also good subject knowledge (often training in a legal grey area on hollywood movies)

#

but SD3 can produce better quality because of VAE
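
Some back-of-the-envelope arithmetic on why the 16-channel VAE matters: the SD1.5/SDXL VAE and the SD3 VAE both downsample by 8x spatially, but SD3's latent has 16 channels instead of 4, so it keeps four times as many numbers per image. `latent_stats` is a hypothetical helper and 1024x1024 is just an example size.

```python
def latent_stats(width, height, channels, downsample=8):
    """Latent shape, number of latent values, and pixel-to-latent compression ratio."""
    lh, lw = height // downsample, width // downsample
    n_latent = lh * lw * channels
    n_pixel = width * height * 3  # RGB values
    return (lh, lw, channels), n_latent, n_pixel / n_latent

for name, ch in [("SD1.5/SDXL VAE", 4), ("SD3 VAE", 16)]:
    shape, n, ratio = latent_stats(1024, 1024, ch)
    print(f"{name}: latent {shape}, {n} values, {ratio:.0f}x compression")
```

This prints 48x compression for the 4-channel latent and 12x for the 16-channel one, which leaves the SD3 latent much more room for fine detail.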

torn wharf
mortal mesa
#

i don't use them, but from what i've seen on a different server Ideogram looks better

bitter hearth
#

Ideogram is also very strong

torn wharf
#

lots seem to think that if we just get a 16 channel vae into sdxl or sd15, then it'll all be good. but those are still unet models with poor text comprehension

bitter hearth
#

I actually think Kolors looks very good a lot of the time, but it's not a generalist model so it's not actually that usable

torn wharf
#

does ideogram run locally yet? i've all but forgotten about it

bitter hearth
#

no, it's like midjourney sadly

torn wharf
#

yeah it's entirely irrelevant to me if i cant use the weights. the architecture is meaningless

bitter hearth
#

yeah it's sad

mortal mesa
#

just another diffusion model you cant really play with

bitter hearth
#

Lykon said that Unet won't even scale to 8B
so the big model wouldn't have been possible without the DiTs

mortal mesa
#

looks nice though

torn wharf
#

kolors is a unet. i like what they're doing there, but they're really late to the game to try to get a new unet model started 2 years after stable diffusion 1.

bitter hearth
#

the sad part is Auraflow didn't get the 16 channel VAE

torn wharf
#

auraflow is v0.1 and the current release is just to kickstart community engagement. i doubt they'll keep it to a 16gb requirement too. likely will release smaller versions and then versions with different finalization steps. vae or otherwise

#

i haven't even bothered to load auraflow because it's a 16gb file and i've only got 16gb vram

mortal mesa
sage burrow
#

Comfyui model merging with SD3 plus anything else basically just leaves you with SD3 only. There was a very, very slight colour vibrancy increase when I merged it with Juggernaut, but that's about it.

#

Now I wonder if my SDXL plus SD1.5 merges actually worked or not 😦

mortal mesa
junior peak
#

I'm getting somewhat nice results using big tiger gemma 27B as a prompt enhancer

#

somehow it seems to make these pics a bit better

coral sable
#

any ETA for SD3 2B refreshed version?

edgy kelp
#

Reportedly: "When donkeys will fly"

#

If it was actually meant to be a beta version, they would have kept on training that model even before the release

mortal mesa
#

i await the surprise 4B release

edgy kelp
#

I await Stable Cascade 2.0 LOL

dry wave
#

I have the feeling I had this discussion here so many times and it's useless to repeat it over and over

#

but the unet architecture is not in any way worse than a dit architecture

#

just because it uses a unet does not mean it is not a transformer

#

same way all dit architectures use some convolutional operations under the hood, too

dull star
dry wave
#

it's not a question about unet or dit or mdit

#

it's often rather the data and data annotation that makes a difference. And also the stronger vae that allows for more details

bitter hearth
torn wharf
#

not quite true. a DiT architecture is just better suited to the task. it was engineered for the purpose. Unet was just what was available back then and they prototyped on it.

dry wave
#

and a better text encoder for sure

torn wharf
#

if you're going to open with "this is nonsense" then we'll never have an honest discussion on the matter. i give up immediately

cunning lintel
torn wharf
#

before you reply, just know that you win. pat yourself on the back

dry wave
#

dit is just a transformer architecture on image patches, what's the difference from the unet?

torn wharf
edgy kelp
dry wave
#

the only significant difference is that unets are not using positional embeddings. You could add them, though
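
A sketch of the "transformer on image patches" idea and the positional-embedding point: the latent is cut into p x p patches that become tokens, and because attention by itself ignores token order, each token gets a 2D position embedding added. Patch size 2 on a 128x128x16 latent matches what SD3 is described as using for a 1024px image; the sin/cos embedding is the generic DiT-style one and is an assumption here, as is adding it directly at the patch dimension (real models first project tokens to the hidden width).

```python
import numpy as np

def patchify(latent, p=2):
    """(H, W, C) latent -> (H/p * W/p, p*p*C) tokens, row-major over patches."""
    h, w, c = latent.shape
    x = latent.reshape(h // p, p, w // p, p, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (H/p, W/p, p, p, C)
    return x.reshape((h // p) * (w // p), p * p * c)

def sincos_1d(pos, dim):
    """Sinusoidal embedding of integer positions: (N,) -> (N, dim)."""
    omega = 1.0 / 10000 ** (np.arange(dim // 2) / (dim // 2))
    angles = np.outer(pos, omega)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)

def sincos_2d(grid_h, grid_w, dim):
    """Half the channels encode the patch row, half the patch column."""
    rows, cols = np.meshgrid(np.arange(grid_h), np.arange(grid_w), indexing="ij")
    return np.concatenate(
        [sincos_1d(rows.ravel(), dim // 2), sincos_1d(cols.ravel(), dim // 2)], axis=1
    )

latent = np.random.default_rng(0).standard_normal((128, 128, 16))
tokens = patchify(latent)                        # 64 x 64 patches -> 4096 tokens
pos = sincos_2d(64, 64, tokens.shape[1])
x = tokens + pos                                 # what the transformer blocks would see
print(tokens.shape, pos.shape)
```

A convolutional UNet gets spatial layout from its convolutions instead, which is why it can work without explicit position embeddings.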

edgy kelp
dry wave
#

there was even a paper comparing different architectures that found the sdxl unet architecture outperforming the pixart dit architecture when trained on the same data

torn wharf
#

tbf pixart isn't that great and doesn't have the same structure as an mmdit

dry wave
#

which also doesn't necessarily mean unet is better.
unet is the more complicated architecture. PixArt with its dit architecture showed you can reach the same results with a simpler architecture

sterile pendant
dry wave
#

peer review? in ml ? 😅

#

mmdit is something else, yes

sterile pendant
torn wharf
#

unet has been worked with longer. more complicated means harder to improve upon. simpler is usually the better design, and it shows in colors with sd3

dry wave
#

I don't really believe that mmdit is parameter efficient. But that's my subjective opinion. There is no good evaluation yet

edgy kelp
#

In no way did a pear review this paper

torn wharf
#

there's lots of results coming out. auraflow for instance. there might not be any great finalized review yet, but there are MANY indications

desert garnet
#

we need a pear to review this math function to see if it works

dry wave
torn wharf
#

why am i bothering though. we've already agreed at the beginning that you won

edgy kelp
#

Pear reviewed fruit juice (made from pears, for pears, reviewed by pears)

desert garnet
#

but is the math real ?

dry wave
#

yeah, the future will show. The architecture of mmdit is unnecessarily complicated in my opinion. You are right that simple is usually better. So I would go for the simple dit architecture of PixArt and improve on that

torn wharf
edgy kelp
#

Math was done by pears, so hell yea brother

#

Reportedly 1 pear + 1 pear equals 2 pears (but this is not 100% pear reviewed yet)

torn wharf
mortal mesa
#

you can destroy numbers by taking your numbers individually and multiplying (1 x 1 = 1) till you are out of numbers

torn wharf
#

further. the spelling of pear always fucks with me. peer? pear? pare?

edgy kelp
torn wharf
#

need to pull out the calculus to get statistical probabilities of that apple possibly being a pear

edgy kelp
#

I'd call it Pearbability

desert garnet
#

schrodinger pear

torn wharf
#

sounds naughty

edgy kelp
#

Sorry brothers, I'm impeared

#

Also, let's talk about the fact that you can impeach a president and not impear them

torn wharf
#

i saw that debate and there was some impearment

edgy kelp
#

I can't stand this, it's unpearable

torn wharf
#

well i saw the highlights. i'm canadian so i don't really pear

edgy kelp
#

I'm italian, I pear even less

torn wharf
#

yeah pearing near the border of the pearnited states has me pearing a little bit

edgy kelp
#

appearently

torn wharf
#

i pear for the economy sometimes. it's already all peared up. now our largest pearing partners are peared

#

imagine if over on your side of the pear, pearis started the revolution up again. one might pear that could mess your pearconomy up too

edgy kelp
#

Y'all gotta fuel them cars with spear pear juice bought from Pearmany in the Pearope

#

I reckon Pearmany is a leading expearter of pears

torn wharf
#

there's always pearaguay but that might be less available in the eu

edgy kelp
#

I think we should stop if we don't want to be peared... uh banned

torn wharf
#

i'm running out of bad pear puns anyways

edgy kelp
#

Hahahah

torn wharf
#

winners imo are impeared, appearently, pearis and pearaguay. that's my pear review

dry wave
#

new models always look better. I doubt, though, that it has much to do with the model architecture itself. SD 1.5 was trained on LAION, and when you look at the data it's no surprise the model looks so bad. The most efficient way to train models nowadays is on high-quality synthetic Midjourney, Dall-E or Ideogram data. AuraFlow, for example, was trained on Ideogram data like crazy. The availability of more and more high-quality synthetic data is the driver for better open models

#

I would wish there are more evaluations where people really test different architectures on exactly the same data to see which method works best

#

but such evaluations are rare. No surprise, they are basically burnt money

#

I think we can agree that you need something more powerful than clip to get better prompt understanding. But do you really need mmdit? Has anyone ever compared mmdit against cross attention and found it superior?

bitter hearth
pseudo owl
#

dit models are pretty nice but whats more important is probably better training data, better text encoders, and better vaes

torn wharf
# dry wave I would wish there are more evaluations where people really test different archi...

i'm not sure why that would be a more objective measure. different architectures might prefer different data. captioning may work differently on models with t5 instead of clip vit. there are plenty of considerations. ultimately i think testing the best product of one against the best product of another is how to compare them. and even then, objectivity will be hard to maintain

torn wharf
#

Kolors is a good showcase of SDXL with better data. there's an improvement but nothing that breaks it out of its mold

#

soon we'll get better vaes adapted to work with sd15 and sdxl. i don't think we'll break the mold there either. it'll still mostly be the same

pseudo owl
#

mmdit seems better for prompt following but worse for image quality i guess?
next-dit (from lumina) seems to be a middle ground
plain dit seems to be somewhat of a middle ground too but with slightly worse img quality
unet seems to also be a middle ground but with slightly worse prompt following

torn wharf
#

Ella adapters certainly haven't taken off because they actually don't offer much improvement

dry wave
#

I don't think it has much to do with the architecture. prompt following comes from a good text encoder and good captioning of your data

torn wharf
#

maybe we'll need to wait for some models to show up refined with ella adapters

pseudo owl
#

only sd15 has 1 ella adapter and that one was kinda ok but nothing too good

#

it definitely was a lot better than sd15's original prompt following but not much better than sdxl's

dry wave
#

there isn't much technical reason why a dit should be better at prompt following. Where would that come from? I mean, nobody really understands at all how these transformers work, but I wouldn't overinterpret anything here

#

ella adapters are just adapters after all. They fix the text encoder, but if your model itself does not have great understanding it's hard to fix that afterwards

#

newer models are all trained on synthetic captions. That together with better text encoder gives them the better prompt understanding

pseudo owl
dry wave
#

train sdxl on synthetic captions and t5 and you will probably get a model as good

#

you only need one text encoder

#

I don't know if mmdit has better text understanding. Could be.

pseudo owl
#

Oh yeah, an image i got from a nice lora called 'better lora' for sdxl, it improves text rendering a lot

#

so good text rendering might not be architecture specific

dry wave
#

but shrek is a known word

#

if you use words that are not known you will get in trouble with clip

#

I mean it works to a certain extent

#

but t5 seems to be really the better choice when you want text and prompt understanding

torn wharf
#

you're not considering the MM in MMDiT. multi modal. it has blocks exclusively for text comprehension

dry wave
#

anyways, I just think we shouldn't overemphasize some of the architectural differences. dit or unet - the difference between both is not that huge

#

the multimodal in mmdit is a joke

#

there is not much multimodal in there

torn wharf
#

That's a big technical reason why the architecture is more suited towards prompting images

pseudo owl
#

not too bad
prompt: a image of a man holding a sign saying 'xfnk',

torn wharf
dry wave
#

they project text and image patches into the same latent space and apply self attention on them.
Before that, they projected them into different latent spaces and connected them via cross attention
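
A single-head NumPy sketch of the difference described above. Cross-attention: image tokens query the text tokens, and the text stream itself is never updated. Joint attention (the MMDiT approach): text and image tokens get their own projection weights, are concatenated into one sequence, and run self-attention together, so the text tokens come out updated as well. Weights are random and the sizes (256 image tokens, 77 text tokens, width 32) are arbitrary; multi-head splits, MLPs and timestep modulation are left out.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Scaled dot-product attention, single head."""
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v

rng = np.random.default_rng(0)
d = 32
img = rng.standard_normal((256, d))  # image patch tokens
txt = rng.standard_normal((77, d))   # text encoder tokens

def proj():
    return rng.standard_normal((d, d)) / np.sqrt(d)

# Cross-attention (UNet / PixArt style): image queries, text keys and values.
Wq, Wk, Wv = proj(), proj(), proj()
img_out = attention(img @ Wq, txt @ Wk, txt @ Wv)  # text tokens are not updated

# Joint attention (MMDiT style): separate weights per modality, one shared attention.
Wq_i, Wk_i, Wv_i = proj(), proj(), proj()
Wq_t, Wk_t, Wv_t = proj(), proj(), proj()
q = np.concatenate([img @ Wq_i, txt @ Wq_t])
k = np.concatenate([img @ Wk_i, txt @ Wk_t])
v = np.concatenate([img @ Wv_i, txt @ Wv_t])
joint = attention(q, k, v)
img_joint, txt_joint = joint[:256], joint[256:]  # both streams are updated
print(img_out.shape, img_joint.shape, txt_joint.shape)
```

The joint version builds a (256 + 77) x (256 + 77) attention matrix instead of 256 x 77, which is part of why it costs more compute.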

dry wave
#

I would say the biggest difference between the dit and mmdit architecture is that the text is processed and interleaved with the timesteps, too, while in the dit and unet architecture the text was frozen

torn wharf
#

seems like ad hominem and not a real reason. You said there's nothing different that would imply better text comprehension. now when i point out direct evidence for it, you insist that they just made that up in an academic research paper, for marketing purposes

dry wave
#

this makes the model much more computationally expensive, but it could allow the model to adapt/connect the text understanding to the image it processes

torn wharf
dry wave
#

🤦‍♂️

they call the model multimodal but it's just a model with two domains: text and image

#

SD1.5 is also using two domains, text and image

#

so why is sd 1.5 not a multimodal model?

torn wharf
#

no one said it isn't...

dry wave
#

because they named it so. It's a marketing thing. You need to name your new architecture somehow

flint sapphire
#

Someone speaks Spanish

torn wharf
#

it's named such because the transformer blocks themselves include two networks for different modalities.

#

it doesn't mean no other text to image model is multimodal

dry wave
#

calling it multimodal dit sounds for sure better than calling it Würstchen

torn wharf
#

wurstchen is a unet

dry wave
#

😬

devout schooner
dry wave
#

yeah, and it's super fast

torn wharf
# dry wave 😬

you've had a track record of calling factual information "a joke" or "wrong" when it doesn't suit your understanding, so i'm going to name this face Mr Kreuger.

dry wave
#

oh yeah, I'm so sry that I called it a joke when I should rather have written "it's a name they used for marketing reasons, because obviously all other models are also multimodal"

#

good that I added an explanation afterwards

torn wharf
#

i don't see it on marketing material though. it just seems like what they named the architecture. it's a bad hot take. you also shouldn't apologize where you don't actually mean anything by it. it dilutes the potential for sincerity

#

https://stability.ai/news/stability-ai-secures-significant-new-investment their latest marketing video for sd3 is here. nothing about mmdit or multi modal. i can't find it anywhere in their marketing material. it's only regarding the individual transformer blocks of the architecture. Sort of like Unet is named that way because it's sort of shaped like a U.

the marketing reasoning just seems made up and fraught with bias

dry wave
#

right, I'm not sorry. I think I just cannot discuss with you because you seem to just react to single trigger words without trying to understand what I wrote or even put them into context. Same way you seem to just post trigger words yourself. Like you write mmdit is multimodal, so that is a proof that it's better in text understanding. No, this is not a proof. It's just a word (and you seem to put way too much meaning into single words). I talked about what in my opinion is the difference between mmdit and dit and why that could mean it has better text understanding, but there is a lack of any evaluation on it.

As said, I try to add context and explain my opinion as well as possible, but it seems that we both just communicate in very different ways

#

so let's just stop that, it leads nowhere

torn wharf
#

" Like you write mmdit is multimodal, so that is a proof that it's better in text understanding." I believe the reasoning was elaborated much differently than that

#

hyperbole is better suited to poetry and literature. i never said anything of proof. i said words like "evidence", "seems to imply", and "indicate". Where i questioned you was your insistence that there is zero technical reasoning for thinking it could be better.

#

#💬|general-chat message here's the last time i used the word proof on this server and i was wrong then. so i am reserved with such language now. being wrong is a great opportunity to learn.

sterile pendant
#

Damn, you guys are still going hard at it

torn wharf
#

i wouldn't say going hard. i'm not sure what his goal with all the misinformation is though. legitimate research and a legitimate architecture being boiled down to a marketing stunt honestly has me confounded. I'm not sure how anyone seeking the truth could arrive at that conclusion without having ulterior motives.

there's a reason why the researchers behind auraflow are using the mmdit approach as well. It's a strong architecture. Calling it a marketing stunt is like calling model-view-controller programming a marketing stunt.

bitter hearth
#

gemma 27B is probably amazing for this yeah

#

it gets tiring writing image prompts, I quite like LLMs for this task

#

ironically you could get a second T5 to write the prompt for the SD3 T5

craggy crest
dry wave
#

actually that's a big disadvantage of many models:/ they are so trained on synthetic captions that they cannot really deal anymore with short human captions. It's a bit weird that you need to feed your prompt into an llm to get better outcome 😬

bitter hearth
#

something that didn't get mentioned that I am pretty excited about is the rectified flow part of SD3

#

it makes the paths more straight

#

when the paths get wiggly we need to either use the slow solvers or lose accuracy

#

and the SD3 paper said that the bigger SD3 models have even straighter paths

#

if you look at how bad euler images are compared to DPM++ 2m images
that difference is entirely down to wiggly paths

#

if the paths were perfectly straight then an euler image and a DPM++ 2m image would look the same

#

it would also only take one step!

#

we will never get to that level, but rectified flow is a huge step in that direction, because it directly trains on a straight-path objective
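
A toy sketch of the straight-paths argument, assuming the rectified-flow convention x_t = (1 - t) * x0 + t * noise used by SD3 (t = 1 is pure noise). If the velocity along the path is constant, a single Euler step from t = 1 to t = 0 lands exactly on x0; on a curved path Euler needs many steps. Both velocity fields are hand-written 1D examples, not a trained model, whose paths would only be approximately straight.

```python
import math

def euler_to_zero(velocity, x1, steps):
    """Integrate dx/dt = velocity(x, t) from t = 1 (noise) down to t = 0 (data)."""
    x, t = x1, 1.0
    h = 1.0 / steps
    for _ in range(steps):
        x = x - h * velocity(x, t)
        t -= h
    return x

# Straight path: x_t = (1 - t) * x0 + t * noise, so dx/dt = noise - x0 is constant.
x0, noise = 0.7, -1.3
straight = lambda x, t: noise - x0
print(euler_to_zero(straight, noise, 1))  # one step recovers x0 (up to float rounding)

# Curved path: dx/dt = x, i.e. x_t = e^t * x0 with x0 = 1. Euler needs many steps.
curved = lambda x, t: x
for n in (1, 4, 16, 64):
    print(n, abs(euler_to_zero(curved, math.e, n) - 1.0))
```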

dry wave
#

I heard from other people that rectified flow is more vulnerable to overfitting, though. It memorizes images faster. Dunno how true this is, though

#

sd3, despite all its failures, sometimes produces extremely sharp and detailed images. I would give the better vae the credit, but it could also be the diffusion method used, who knows

cunning lintel
#

i thought rectified flow lowers cfg effect and thus negative prompting hardly has effect

bitter hearth
#

I would also give the vae more credit than rectified flow, but I think it also helps

mortal mesa
#

i dunno what all this mumbo jumbo is but i generated a cat with neg prompt only

torn wharf
#

I've heard this a few times but I get good results when using negative prompts. Tested with colors.

I think the meta negative habits like "bad hands" don't work as they do in refined models, so people are reverse reasoning their way around that

pseudo owl
#

unless you train it a LOT

bitter hearth
#

yeah BERT-likes and T5 have to be fine-tuned for each specific task, I agree

#

they are very parameter efficient once you have a good fine tune locked in, that is not too underfit or overfit

#

but I wouldn't want to do zero shot or few shot on them

torn wharf
#

There's long clip that I want to see some research done with. Same old clip models with larger context iirc

bitter hearth
#

would be cool to see more done about improving clip models yeah

#

if I remember rightly long clip only benchmarked slightly better on most tasks

#

but on one task it was much better

torn wharf
bitter hearth
#

yeah I think I saw it on reddit

#

I might be getting it confused with a different clip

#

I had a go with SAG, PAG and Free-u today in SDXL

#

but it doesn't help that much for my type of image I think

#

because I do very high steps but very low CFG

#

Align Your Steps or Karras scheduler also doesn't help as much if your steps are very high

#

its a bit personal taste though, I could definitely see PAG/SAG making changes I just wasn't sure if they were an improvement or just different

#

SAG did definitely reduce concept bleeding

torn wharf
#

Yeah the concept bleeding. So much of it on a unet since the cross attention is so limited

bitter hearth
#

its so bad yeah

torn wharf
#

I'll have to play with sag

bitter hearth
#

going back to SDXL after SD3

#

concept bleeding everywhere

torn wharf
#

Even cascade handles it better than sdxl but it's there

bitter hearth
#

I like to prompt colours a lot and it keeps adding the wall colour to their skin

torn wharf
#

Using an actual color word is very heavy. I almost rely on regional prompting when I want to direct color

bitter hearth
#

I never liked cascade
my opinion is a bit unpopular there
but I don't like the side effects of the compressed latent

#

it affects the structure

#

regional prompting sounds good I haven't tried that yet

torn wharf
#

Yeah. I mean ... I don't use it often.. but its there to consider. There's a few problems with it but I cheer anyone finding use from it on

bitter hearth
#

clownshark gets the best images out of anyone I have seen and he uses it
so I gotta respect that

torn wharf
bitter hearth
#

not yet, aside from the demo, but that looks amazing yeah

torn wharf
#

Thought it used an LLM but from what I learned I guess phi3 isn't an llm

bitter hearth
#

I was saying on the Rundiffusion server that attention maps is an area that hasn't been explored enough

#

there was a paper about attention map injections where they found a lot of potential improvements to explore

#

it's hard because you lose image quality from doing this

#

a bit like how we lose image quality with CFG

mortal mesa
bitter hearth
#

wow really good text

#

it got the fancy font

torn wharf
#

Worrcerestffs sauce. Spelling checks out

mortal mesa
#

whats this here sauce!

bitter hearth
#

my new project is to forget about fancy solvers and just work on clownshark-style noise injection

bitter hearth
#

and just solve with DPM++ 2M because it's good enough mostly
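
To make the Euler vs DPM++ 2M comparison concrete, a toy sketch on 1D Gaussian data, where the ideal denoiser and the exact probability-flow ODE solution are known in closed form. The DPM++ 2M update is written after k-diffusion's `sample_dpmpp_2m` (minus its special final step to sigma = 0), and Euler is the usual step along (x - denoised) / sigma. The distribution, sigma schedule and step count are arbitrary choices for illustration.

```python
import math

# Toy data distribution N(MU, SD^2): the ideal denoiser has a closed form.
MU, SD = 2.0, 0.5

def denoise(x, sigma):
    """Optimal denoiser E[x0 | x] for Gaussian data at noise level sigma."""
    return MU + SD**2 / (SD**2 + sigma**2) * (x - MU)

def exact(x_start, sigma_start, sigma_end):
    """Exact probability-flow ODE solution for this Gaussian toy problem."""
    return MU + (x_start - MU) * math.sqrt(SD**2 + sigma_end**2) / math.sqrt(SD**2 + sigma_start**2)

def euler(x, sigmas):
    for s, s_next in zip(sigmas[:-1], sigmas[1:]):
        d = (x - denoise(x, s)) / s  # dx/dsigma
        x = x + (s_next - s) * d
    return x

def dpmpp_2m(x, sigmas):
    """DPM-Solver++(2M). The schedule must end at sigma > 0 so log() is defined."""
    old_denoised = None
    for i in range(len(sigmas) - 1):
        s, s_next = sigmas[i], sigmas[i + 1]
        denoised = denoise(x, s)
        h = math.log(s) - math.log(s_next)  # step size in t = -log(sigma)
        if old_denoised is None:
            d = denoised  # first step is first order
        else:
            r = (math.log(sigmas[i - 1]) - math.log(s)) / h
            d = (1 + 1 / (2 * r)) * denoised - (1 / (2 * r)) * old_denoised
        x = (s_next / s) * x - math.expm1(-h) * d
        old_denoised = denoised
    return x

sigmas = [10.0 * 0.001 ** (i / 10) for i in range(11)]  # 10 -> 0.01, geometric, 10 steps
x_T = MU + sigmas[0] * 1.0                              # one fixed "noise" draw
target = exact(x_T, sigmas[0], sigmas[-1])
err_euler = abs(euler(x_T, sigmas) - target)
err_dpm = abs(dpmpp_2m(x_T, sigmas) - target)
print("euler error:", err_euler, "dpm++ 2m error:", err_dpm)
```

Here the second-order multistep correction lands noticeably closer to the exact answer than Euler with the same 10 steps; on real images, how much that matters depends on how curved the paths are.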

#

this noise injection video is amazing, it's a huge detail boost
https://www.youtube.com/watch?v=59-3RZknRgk

#

you do lose some convergence speed though
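
One well-documented form of noise injection is the "churn" step from Karras et al.'s EDM sampler. Before a step, the current sample is pushed back up to a slightly higher noise level by adding fresh Gaussian noise, and the step then starts from there. The sketch below shows only that noise-raising part, on plain lists. It is an assumption that this resembles what clownshark does; it is not a description of his method. Because each step partly undoes the previous one, it trades some convergence speed for detail, which matches the observation above:

```python
import math
import random
from typing import List, Tuple

def churn(x: List[float], sigma: float, gamma: float,
          rng: random.Random) -> Tuple[List[float], float]:
    """Raise the noise level from sigma to sigma_hat = sigma * (1 + gamma).

    Adds Gaussian noise with std sqrt(sigma_hat**2 - sigma**2). Independent
    variances add, so the total noise std goes from sigma to sigma_hat.
    """
    sigma_hat = sigma * (1.0 + gamma)
    extra_std = math.sqrt(sigma_hat ** 2 - sigma ** 2)
    return [v + extra_std * rng.gauss(0.0, 1.0) for v in x], sigma_hat
```

The sampler would then take its usual step from `sigma_hat` instead of `sigma`. With `gamma = 0`, no noise is added and the sampler stays deterministic.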

torn wharf
# pseudo owl it is

everything i've read about it says it's just a regular language model and isn't trained on a large language set

bitter hearth
#

some people call Phi 3 a SLM
for small language model

#

but I think it's fine to call it an LLM

#

I call BERT an LLM sometimes

pseudo owl
bitter hearth
#

I think people just wanted to differentiate it from like 70B+ models

#

but it gets quite unclear

torn wharf
#

yeah, LLM seems to be the generic term now. the vernacular is still developing. Microsoft calls it an SLM

sterile pendant
# torn wharf i wouldn't say going hard. i'm not sure what his goal with all the misinformati...

Oh I agree, MMDiT is the better way forward. It will scale better as models get larger. UNet is still good, but it's a very old architecture by comparison. I wanna say it was originally created for biomedical image segmentation, like CT or MRI scans. In another year or two, it wouldn't surprise me to see Mamba-based diffusers or something along those lines. The field changes a lot, but new tech doesn't immediately make prior tech obsolete; the older approaches just typically end up being less efficient. MMDiT technically still has a lot of wiggle room for efficiency as well, as the AuraFlow creator brings up with his model

bitter hearth
#

there's a prototype mamba diffusion model paper already out, its really cool 🙂

#

they didn't replace all the attention layers, just some of them though

edgy kelp
#

When pear-based diffusion models?

torn wharf
#

sold it to a guy who used it in a total destruction back yard derby for $100

#

had its uses, but old and dusty and not modern

sterile pendant
torn wharf
#

old cars are fun af. you get a model that doesn't have anti-lock brakes? sooo much fun in snow

bitter hearth
torn wharf
#

or just the rain or whatever. hard to get all sideways with ABS

edgy kelp
#

I hope at least for a balls-based diffusion model

craggy crest
sterile pendant
# torn wharf old cars are fun af. you get a model that doesn't have anti lock breaks ? sooo ...

i used to be a dumbass who was into drifting... yeah, i know it all too well lol. but yeah, the whole field is a giant min/max game. it reminds me of a rover i had to design ages ago, where i ran into the engineering hell loop of min/maxing. it started with a goal of a rough frame size, which had to be made out of welded steel. based on that, it has a weight, so you need motors that can move the weight, which means you need a battery of a certain size to operate for X amount of time. then you find that the frame isn't sturdy enough for the 25 lb battery, so you need a sturdier frame, which is heavier, which means you need bigger motors, which means you need a bigger battery, and so on lol. models are a lot like that as well: you're juggling VRAM size, parameters, training time, inference time, perplexity and so on
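
The "engineering hell loop" described above is a fixed-point iteration. Each component's mass depends on the total mass, and the total depends on every component. Here is a toy sketch with made-up numbers; the coefficients are hypothetical and not from the actual rover. If the motors, battery and frame reinforcement together add a fraction `k` of the total mass, the loop settles at `fixed / (1 - k)` when `k < 1` and never settles when `k >= 1`:

```python
from typing import Optional

def settle_mass(fixed_kg: float, k: float, tol: float = 1e-9,
                max_iter: int = 1000) -> Optional[float]:
    """Iterate total = fixed + k * total until it stops changing.

    fixed_kg: mass that does not scale with the design (payload, base frame).
    k: hypothetical fraction of total mass added by parts sized to carry that mass.
    Returns the converged total, or None if the loop diverges.
    """
    total = fixed_kg
    for _ in range(max_iter):
        new_total = fixed_kg + k * total
        if abs(new_total - total) < tol:
            return new_total
        total = new_total
    return None
```

For example, `settle_mass(10.0, 0.5)` converges to about 20 kg, while `settle_mass(10.0, 1.1)` returns `None` because every redesign round adds more mass than the last.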

edgy kelp
#

The question is: when will Stable Diffusion models learn to output facial expressions more complex than 'smiling' or 'sad'? Is it too advanced maths right now? Do we have the technology?

sterile pendant
#

they only comprehend what they are captioned with

edgy kelp
#

I think SAI developers were themselves not trained with a dataset that lets them understand this

pseudo owl
edgy kelp
#

Current SAI datasets 'the picture shows a person, but now let's talk about how many leaves are there in that particular tree over here'

bitter hearth
#

vision models are not that great yet

sterile pendant
#

high-quality captioners are slow, so they rely on things like CogVLM that are fast as hell, but definitely not perfect

#

florence2 is a reaaaaaaaaaaaaaaaaaly good one though

bitter hearth
#

florence2 prompts better than I do LOL

craggy crest
bitter hearth
#

ok but at lower CFG

craggy crest
sterile pendant
#

florence2 large chugs out "more detailed captions" in like 1 second per image on my pc with an old 2080 in it and it's accurate a good 95% of the time

edgy kelp
craggy crest
pseudo owl
edgy kelp
bitter hearth
#

there are some concepts where the model requires more CFG than others

#

expressions tend to be further along the scale where they need a decent CFG injection

sterile pendant
#

and 17B parameters is nothing, they caption hundreds of thousands of images per day on their servers with thousands of A100s

pseudo owl
craggy crest
#

shocked

sterile pendant
edgy kelp
craggy crest
#

laughing hysterically

sterile pendant
pseudo owl
edgy kelp
craggy crest
bitter hearth
#

yeah, only the bigger LLaVA models like the Yi 34B one can match CogVLM 1

craggy crest
bitter hearth
#

it looks like a big laugh to me

craggy crest
bitter hearth
#

I think crystalwizard is right with this one

#

that's much more than a chuckle

pseudo owl
bitter hearth
#

yeah, the LLaVA method is a bit old now; they just keep strapping bigger LLMs on

#

which will probably work for a bit longer

sterile pendant
craggy crest
# edgy kelp https://www.google.com/search?sca_esv=1a233ca9bb5af21c&sxsrf=ADLYWIIZ5jybpgjk0DH...

i notice on that page that there are a whole lot of different ways people laugh hysterically. one person has their head thrown back with their mouth open, another has their face buried in their arm, etc. the problem isn't that the model doesn't know how to create the expression, it's that you're not telling it the specifics of what you want the expression to look like. you're expecting it to read your mind

pseudo owl
craggy crest
#

if you want him laughing with his mouth wide open, say so

bitter hearth
#

ah yeah I haven't tried internvl2 yet but I saw it on open_vlm_leaderboard

sterile pendant
#

of course big-ass 2000000B models are going to be really good, but let's talk consumer-level GPU captioners

pseudo owl
#

ok, Florence2 and moondream2 are the best then (the good part about moondream is that you can do VQA as well). both are roughly the same speed, but moondream2 is slightly bigger, with similar VRAM usage

edgy kelp
craggy crest
#

painful grimace

edgy kelp
sterile pendant
#

so yeah, i like to talk more about the models we can actually run at home without $10k pcs

pseudo owl
#

i just use Kaggle since it gives 2 T4s (combined 28 GB VRAM) for free lol, my home GPU has 4 GB VRAM

willow void
#

Just put in my 3090 sadcat

bitter hearth
#

has less CFG burn also

craggy crest
#

a man laughing hard, head thrown back, mouth open, hands in the air, excited (using SD3 2B for all of these)

pseudo owl
willow void
sterile pendant
#

he was trained on the aliens from total recall

craggy crest
#

(we were discussing expressions)

willow void
#

its better this way

sterile pendant
#

QUAAAAAID START THE REACTOR

#

(screenshot from the actual movie, not a generation)

craggy crest
#

intense concentration

torn wharf
#

Open your miiiind

bitter hearth
craggy crest
bitter hearth
craggy crest
bitter hearth
craggy crest
#

there's always the online comfyUI websites

bitter hearth
tough oriole
#

Working on an SD3 cartoon concept (picked a random one from my SDXL dataset). It learned the style well enough, but it still can't do good poses and I'm not sure how to fix it.

craggy crest
grim pivot
#

Heii

tough oriole
bitter hearth
craggy crest
#

wouldn't you need an openBalls controlnet?

bitter hearth
#

Open balls lmao

low stone
sullen moss
#

Are there any updates on the 8B model?

mild bramble
craggy crest
edgy kelp
#

Hilarious: I'm trying SD3 on Hugging Face Spaces and I wrote long prompts about some random celebrities. The best results in resemblance came when I included biographical data like zodiac sign and similar BS. That must be the T5, I guess

bitter hearth
#

feels like it was just a case of a small sample size

#

don't think zodiac sign would be in the captions much

#

maybe T5 is interpreting it in a funny way yeah

#

and zodiac sign is close to other terms in the internal embedding inside the T5

radiant ledge
#

or maybe that celebrity just happened to look like a bull or something

edgy kelp
#

An aries celebrity would look like a goat demon, big time

upper gust
shell plaza
hazy hatch
#

Skibidi toilet

shell plaza
lucid swift
edgy kelp
#

Balls(TM) but also Pears(TM)

#

Notice the insane difference between the two images, neither have the best resemblance but some supposedly filler words did... something.

image on the left features the extended prompt with "useless" details:
"a highly detailed close-up of the famous taurus actress christina hendricks, born may the third of 1975"

image on the right features the shorter prompt, more concise:
"a highly detailed close-up of the famous actress christina hendricks"

Both images used the same seed and same everything, except for the slightly different prompt... I subjectively think that the extended prompt has somewhat better resemblance
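
Here is why a same-seed comparison isolates the prompt: the seed fixes the starting noise, so with every other setting held constant, only the text conditioning differs between the two runs. Below is a minimal sketch that uses Python's `random` as a stand-in for the framework's generator. Real UIs typically seed a PyTorch generator, and the exact noise can also depend on device and noise-generation settings:

```python
import random
from typing import List

def initial_noise(seed: int, n: int) -> List[float]:
    """Standard-normal starting noise of length n, fully determined by the seed."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]
```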

sterile pendant
# edgy kelp An aries celebrity would look like a goat demon, big time

No it wouldn't, if anything, it would just look like a ram. You gotta remember that the concept-space mappings for it would mostly map to constellations and the sky. That shit's light-years away from a human celebrity. You'd likely need to prompt better to get a human-ram hybrid.

edgy kelp
#

Did the same with Ryan Gosling, it improved the face symmetry but ruined the picture, I wonder what happens if I remove the negative prompt

#

Without negative prompt he looks like a middle-eastern

sacred jewel
strange grotto
#

ran into this error

limpid drum
#

Hey folks, I've been away for a bit (almost had to go on bereavement leave). Now that I'm back, I'm reading about Florence2 and how it's great for tagging and performs well despite the smaller file size. Question: can it be used for prompt generation, or is it strictly for image description?

bitter hearth
#

it is very good for it

#

it's better than I am (as a human)

craggy crest
bitter hearth
#

I found 0.5 shift could get a bit rough in terms of structure

#

when it works, it does look nice though

craggy crest
bitter hearth
#

yeah my experience was similar

#

it can be a tricky trade-off at low steps

#

at high steps you can get both but it takes long

#

I didn't expect it to be like this because the older models do better with karras schedule
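
For comparison, the Karras et al. (EDM) schedule that older models tend to like spaces noise levels evenly in sigma^(1/rho), which concentrates steps at low noise. Here is a sketch of that formula with rho = 7, the paper's default, written to match how k-diffusion-style samplers build it. The sigma_min/sigma_max values in the test are illustrative, roughly SD1.x-like, and not SD3's:

```python
from typing import List

def karras_sigmas(n: int, sigma_min: float, sigma_max: float, rho: float = 7.0) -> List[float]:
    """n noise levels from sigma_max down to sigma_min, evenly spaced in sigma**(1/rho).

    A trailing 0.0 is appended as the final, fully denoised level.
    """
    if n < 2:
        raise ValueError("need at least 2 steps")
    lo, hi = sigma_min ** (1.0 / rho), sigma_max ** (1.0 / rho)
    sigmas = [(hi + i / (n - 1) * (lo - hi)) ** rho for i in range(n)]
    return sigmas + [0.0]
```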

craggy crest
bitter hearth
#

these look same to me

craggy crest
#

here's with shift at 8

mortal mesa
#

with and without glasses

bitter hearth
#

Shift 0.1 is better

#

thanks wow the difference is huge

craggy crest
bitter hearth
#

that's a good range yes I did a lot at 1.5

#

I use lower CFG than you do, in general, so it gets a bit more squiffy sometimes

#

but running a batch of 20-30 will mostly still yield one good one

mortal mesa
#

shift 3 detail

#

there would have been bugs at lower shift

bitter hearth
#

that looks nice

#

I think shift is similar to CFG
lower is higher quality but you lose the control and the structure

mortal mesa
#

can't really put my finger on it, it's not quite that. it seems more about fine details vs coherence: you can still get detail, just not the pimples on her face or the bug in the jungle

bitter hearth
#

its hard to explain yeah

#

I like this look that super low CFG gives, I would describe it as "wispy"

#

Would be easier to test if you guys prompt for shining balls

#

Much much better than humans

craggy crest
#

shift specifically changes the time_step value

bitter hearth
#

most people don't like low CFG that much, it's just a personal thing

#

it kinda goes low contrast and hazy, with pastel colours

craggy crest
#

other places call it CLIP guidance

#

shift is dealing with the actual time_step value
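
The shift being discussed is the SD3-style timestep shift (the resolution-dependent shifting described in the SD3 paper). To my understanding, ComfyUI's ModelSamplingSD3 node uses the same formula. A timestep t in [0, 1], where 1 is pure noise, is remapped to shift*t / (1 + (shift - 1)*t):

- shift = 1 leaves the schedule unchanged.
- shift > 1 keeps more of the steps at high noise, which favors structure.
- shift < 1 moves steps toward low noise, which favors fine detail.

This lines up with the trade-off described above. A sketch:

```python
def shift_timestep(t: float, shift: float) -> float:
    """Remap t in [0, 1] (1 = pure noise) by shift * t / (1 + (shift - 1) * t)."""
    if shift == 1.0:
        return t
    return shift * t / (1.0 + (shift - 1.0) * t)
```

For example, halfway through sampling (t = 0.5), shift 3 puts the model at 0.75, while shift 0.5 puts it at about 0.33. The endpoints 0 and 1 never move.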

bitter hearth
#

yeah the Ksampler I use doesn't actually take in prompts

#

instead it has an input called guidance

#

and then I use separate nodes to add the positive and negative

craggy crest
bitter hearth
#

oh yeah you are right

#

I just meant they are similar in some ways

#

in the sense that higher values can lower quality

mortal mesa
#

oh hi you were using the adams family stuff, ya? i tried and i get either one step or 4 steps, anything easy i was missing ya think?

bitter hearth
#

4th order implicit adams yeah

limpid drum
# bitter hearth yes I use it for prompt generation

Thanks, and apologies for the late reply. I'm working on some major enterprise software agreements here, and I get to fall on the sword in two weeks if they're not done and we lose $2 billion in tax benefits. Which Comfy node do you use for prompt generation with Florence?

bitter hearth
#

I'm afraid I always forget the names of things
but I just searched "florence2" in comfy manager

#

I didn't need to get a custom node from github there was one built in

bitter hearth
#

but I am not sure why

mortal mesa
#

k, ill play again sometimes, thanks

bitter hearth
#

if I remember rightly this one

#

was right

#

you can take workflow from there

#

also this one, but the image quality is degraded by noise injection as an "experiment"

#

can't remember if the cat was SD3 or SDXL though

mortal mesa
#

right off the bat, Value 300 bigger than max of 150: max_steps

bitter hearth
#

oh no is the max 150 steps?
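
The max_steps error above is the same safety valve that adaptive solvers use. Below is a sketch of an adaptive Bogacki-Shampine 3(2) ("bosh3") integrator on a scalar toy ODE. It is written from the standard tableau, not taken from the sampler node used in that workflow. It works as follows:

- A step is accepted only when the embedded error estimate is within atol + rtol*|y|.
- The step size h shrinks or grows based on that estimate.
- The solver gives up once max_steps attempts are used.

Tighter tolerances mean smaller steps, so more steps (and model calls) are needed to finish. That is why a step cap that is too low can cut a run short:

```python
import math
from typing import Callable, Tuple

def bosh3(f: Callable[[float, float], float], y0: float, t0: float, t1: float,
          rtol: float = 1e-3, atol: float = 1e-4,
          max_steps: int = 1000) -> Tuple[float, int, int]:
    """Adaptive Bogacki-Shampine 3(2) integration of dy/dt = f(t, y) from t0 to t1.

    Returns (y at t1, accepted steps, attempted steps). Raises RuntimeError if more
    than max_steps attempts (accepted + rejected) would be needed.
    """
    if atol <= 0:
        raise ValueError("atol must be positive")
    t, y = t0, y0
    h = (t1 - t0) / 10.0                  # arbitrary first guess
    k1 = f(t, y)
    accepted = attempts = 0
    while t < t1:
        if attempts >= max_steps:
            raise RuntimeError(f"gave up after max_steps={max_steps} attempts at t={t:.4f}")
        attempts += 1
        last = t + h >= t1
        if last:
            h = t1 - t                    # land exactly on t1
        k2 = f(t + h / 2, y + h / 2 * k1)
        k3 = f(t + 3 * h / 4, y + 3 * h / 4 * k2)
        y3 = y + h * (2 / 9 * k1 + 1 / 3 * k2 + 4 / 9 * k3)               # 3rd-order result
        k4 = f(t + h, y3)
        y2 = y + h * (7 / 24 * k1 + 1 / 4 * k2 + 1 / 3 * k3 + 1 / 8 * k4)  # embedded 2nd order
        ratio = abs(y3 - y2) / (atol + rtol * max(abs(y), abs(y3)))
        if ratio <= 1.0:                  # error within tolerance: accept the step
            t = t1 if last else t + h
            y, k1 = y3, k4                # FSAL: last stage reused as next first stage
            accepted += 1
        # grow or shrink h for the next attempt (3rd-order controller, clamped to 0.2x..5x)
        h *= 5.0 if ratio == 0 else min(5.0, max(0.2, 0.9 * ratio ** (-1 / 3)))
    return y, accepted, attempts


def decay(t: float, y: float) -> float:
    return -y


exact = math.exp(-1.0)
for rtol, atol in [(1e-3, 1e-4), (1e-4, 1e-5)]:
    y, acc, att = bosh3(decay, 1.0, 0.0, 1.0, rtol=rtol, atol=atol)
    print(f"rtol={rtol:g} atol={atol:g}: {acc} accepted / {att} attempted, error {abs(y - exact):.1e}")
```

The tighter tolerance pair needs more accepted steps to reach the end, and a small enough max_steps makes the solver raise instead of finishing.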

craggy crest
#

his long captioner does a better job in my opinion

bitter hearth
#

ah thanks I love captioners as I don't like writing prompts

craggy crest
#

he's got 6 extremely nice spaces to play around with

bitter hearth
bitter hearth
#

I mostly do stable diffusion to do weird experiments rather than actually make images to use

mortal mesa
craggy crest
#

(watches you make the AI retrace the same lines over and over, putting smaller and smaller dots in exactly the same spot)

bitter hearth
#

I didn't have errors

#

it might be a VRAM issue

#

as I rent datacenter GPUs so VRAM is high

craggy crest
#

there's a point of diminishing returns. stick with around 32 steps

#

at a certain point, you're just racking up GPU hours that even a microscope couldn't tell the difference in

bitter hearth
#

yeah, in the DPM++ paper he compares against a powerful 4th-order Runge-Kutta solver and says it's not worth the time

craggy crest
#

it's not. pull an image up into photoshop, zoom to the pixel level and - that's as finely detailed as you get

#

the AI is going to do one full redraw of the entire image for each step. at a certain point, you've got what you're going to be able to detect and you're just wasting compute time

bitter hearth
#

I'm not sure yet what my conclusion is

#

sometimes I quite liked the changes I got from the expensive sampling

craggy crest
#

if you're doing something like medical imaging, or some sort of scientific application, that's different but for stable diffusion? stick with around 32 steps

bitter hearth
#

yeah I'm just wasting my own money so its ok

limpid drum
bitter hearth
#

no problem

#

I didn't realise the default Ksampler tops out at 150 steps though

#

that's good to know

mortal mesa
#

non-converging samplers don't necessarily add detail at high steps, they just keep changing it
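
Ancestral ("non-converging") samplers keep changing the image as steps increase because of how each step moves to the next noise level. The move is split into two parts: a deterministic part down to sigma_down, and fresh random noise of size sigma_up. New noise therefore enters at every step. Here is a sketch of that split, reimplemented to match k-diffusion's get_ancestral_step (not imported from it):

```python
import math
from typing import Tuple

def ancestral_split(sigma_from: float, sigma_to: float, eta: float = 1.0) -> Tuple[float, float]:
    """Return (sigma_down, sigma_up) with sigma_down**2 + sigma_up**2 == sigma_to**2."""
    if eta == 0.0:
        return sigma_to, 0.0   # deterministic step: no fresh noise
    sigma_up = min(sigma_to, eta * math.sqrt(
        sigma_to ** 2 * (sigma_from ** 2 - sigma_to ** 2) / sigma_from ** 2))
    sigma_down = math.sqrt(sigma_to ** 2 - sigma_up ** 2)
    return sigma_down, sigma_up
```

Going from sigma 2 to sigma 1 with eta = 1, the deterministic part only reaches 0.5, and the remaining noise (about 0.87) is freshly sampled. Setting eta = 0 removes the random part, which is what makes a sampler converge as steps increase.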

#

i think my issue with adams is i have a crap card and have issue with flash attention

bitter hearth
#

are you sure the max is 150 steps?

#

maybe it got changed?

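#
```python
import comfy.samplers  # only importable inside a ComfyUI install


class BasicScheduler:  # assumed class name; the original message showed only the class body
    @classmethod
    def INPUT_TYPES(cls):
        return {"required":
                    {"model": ("MODEL",),
                     "scheduler": (comfy.samplers.SCHEDULER_NAMES, ),
                     # this node accepts up to 10000 steps, not 150
                     "steps": ("INT", {"default": 20, "min": 1, "max": 10000}),
                     "denoise": ("FLOAT", {"default": 1.0, "min": 0.0, "max": 1.0, "step": 0.01}),
                     }
                }

    RETURN_TYPES = ("SIGMAS",)
    CATEGORY = "sampling/custom_sampling/schedulers"
    FUNCTION = "get_sigmas"  # the get_sigmas method itself was not part of the excerpt
```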
mortal mesa
#

not sure at all, console spit it out

bitter hearth
#

which sampling node do you use?

craggy crest
mortal mesa
#

your workflow from the cat

craggy crest
#

how much vram do you have?

bitter hearth
#

flash attention might have been an issue

#

yeah

#

comfy is complex and sparsely documented so there may well be a 150 step limit somewhere

#

gonna try to take a look through

mortal mesa
#

2080 ti 11gb, ya i think i have to downgrade pytorch to get flash attention on my card :/

craggy crest
bitter hearth
#

join cloud gang yes

mortal mesa
craggy crest
#

i'm actually really surprised you're generating on that, at all.

mortal mesa
#

now you're being silly

craggy crest
#

i'm not. you've got a very small system. and a lot of this stuff takes a lot larger system than you've got.

bitter hearth
#

I don't actually know what the vram minimums are for stuff

mortal mesa
#

4gb

bitter hearth
#

ah ok

mortal mesa
bitter hearth
#

sadly machine learning suddenly started using massive amounts of VRAM

#

it didn't used to

#

AlexNet was trained on GTX 580s lol

craggy crest
mortal mesa
#

im actually waiting for 50 series pricing, when i get up off the floor ill probably buy a used 3090

craggy crest
#

are you generating videos some other way than editing video clips together then?

bitter hearth
#

generally there are tricks to reduce vram for most of this stuff

#

it mostly involves a lot of tiling

#

that's why people disagree a lot about how fast upscaling is

mortal mesa
#

ya sure it takes forever for length but its possible

#

see the little card that could

bitter hearth
#

those are really cool yeah

mortal mesa
#

this small computer is winning ya see

#

its not the size of the card its how you use it (this also isn't really true)

edgy kelp
#

Prompt used: "typography text saying: "i smell earwigs in here", flowery background, motivational, detailed exact text rendering"
Negative prompt: "missing negative prompt"

bitter hearth
sage burrow
#

Has anyone ever been able to get a drow elf with black skin, using SD3? I keep getting pale human ladies. SDXL on the otherhand makes amazing drow!

mortal mesa
torn wharf
#

but we've got the biggest balls of them all!!

edgy kelp
bitter hearth
#

I don't think sd3 is very good at pointy ears stuff

sage burrow
sage burrow
bitter hearth
#

The anatomy of the ears is ugly sadcat

sage burrow
#

I made an SDXL drow lora, wonder if I can trick SD3 into using that 😉

bitter hearth
edgy kelp
#

I swear I used to make the very same bad quality memes with paint a few years ago using impact font

mortal mesa
#

its the AI way

edgy kelp
mortal mesa
#

must be true the AI said it, thats commng

edgy kelp
#

Sad matters

mortal mesa
#

i live under a rock, i dont know what Drow is and neither does the AI i guess

edgy kelp
#

That looks like a Night Elf from Warcraft franchise

torn wharf
#

dark elf in DnD. While DnD is popular, the art is quite niche and not found too far outside of DnD contexts.

mortal mesa
#

i play some ancient game, they got dark elf and high elf

edgy kelp
#

I think drows exist in some european folklore/mythology though

torn wharf
#

Thats right. DnD draws largely from folklore and mythologies. But you'll find that art for the traditional myth Drows are not the same as the DnD stylizations.

#

WoW is largely a DnD rip off the same way that Overwatch is largely a TF2 ripoff. i use that phrase loosely here. "Heavily inspired by" might be more fitting

sterile pendant
mortal mesa
#

dark elf popped out some horror type with black skin for me, i didnt save em

sterile pendant
#

You might also try just using a controlnet to get the form and use terms for dark skinned

sage burrow
sterile pendant
#

Without mentioning elves

#

Always ask yourself "what would images like this be captioned as in a dataset or by a non-knowledgeable person looking at the image"

sage burrow
#

PSA gliff sucks, bikini armour isn't allowed grumble

torn wharf
#

ideally, the base model captioning should be so robust that the knowledge is generalized and it can zero-shot the image. but if you prompt a hulk elf, it's going to be either mostly the hulk or mostly an elf.

#

some concepts seem so over fit that you can't blend them into other concepts so well. like doing car models as a ball is hard to prompt for, since cars are so over fit and it can generate specific models so well

mortal mesa
sage burrow
#

My furry glif got the skin colour correct ROFL

bitter hearth
#

Always furries

sage burrow
#

Apparently SD3 wasn't the problem, it was just my bad prompting! Claude helped!

#

@bitter hearth apparently SD3 can do pointy ears just fine... with the help of Claude 😄

sterile pendant
sage burrow
sterile pendant
#

yeah just throwing the first one it made out here

sterile pendant
#

that ear is a little wonky though

sage burrow
mortal mesa
torn wharf
#

finally finished my coffee and got around to it

sage burrow
#

Brown skin though 😦 Awesome ears :>

sterile pendant
#

like this

#

see the huge forearm parts lol shit had me rolling

torn wharf
sage burrow
#

What happens several years after a furry and a drow get frisky....

sterile pendant
#

lol

sage burrow
torn wharf
sage burrow
#

it even got the white/silver hair without me asking!!

torn wharf
#

only one prompt to all 3 text encoders, bosh3 solver. Prompt: A jet black dark skinned man, jet black skin like obsidian, he has elvish ears and is wearing elvish style robes. his eyes glow with the magic of valinor

sterile pendant
#

going to try running a few with kolors to see how it handles it

#

yeah it's being a pain in the ass about making them pale skinned elves

#

oh nvm, i got one

#

guess it didn't like the concept bleed test of "alabaster colored clothing"

torn wharf
#

it's a UNet, so any "color" is going to take over the entire image

#

adjectives go hard

sterile pendant
#

yeah for sure, just wanted to test it out

craggy crest
sterile pendant
#

usually just by rolling the dice with seeds, honestly

torn wharf
#

i've never found attention span problems to be solved by freeu. that one has always just felt like it does nothing imo

craggy crest
torn wharf
#

Yeah i remember all the hype. No one could ever explain it to me or what i was doing wrong. I just don't see it and then since it can't be explained, it feels like people are just propping it up

sterile pendant
#

PAG + AutomaticCFG and/or dynamic CFG always got me far better results than FreeU did

#

Freeu just always gave things that fake aesthetic plastic look, even if you used the correct settings for sdxl

#

But iirc, they touched on some of the issues like that in the paper

#

The reality is that there's no magic wand that magically makes generations better. They all have pros/cons. So it's better to kind of grasp what all the tools can do and know when to use them

craggy crest
# torn wharf Yeah i remember all the hype. No one could ever explain it to me or what i was ...

okay, well - it's used on mage.space so i did a tutorial video on it, which is here https://youtu.be/1FMIZNR25jA?si=eGM7-vQPK70w4hEk and created documentation if you want that - i got tired of people asking how it worked

bitter hearth
#

out of FreeU, SAG and PAG

#

I got my favourite results from SAG

sterile pendant
#

mine were with PAG advanced where i left scale at 3 and set rescale to 0.5-0.7. the ksampler's cfg needs to be N-3, so if you want ~5cfg, you'd set the ksampler to 2
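
The rule of thumb above treats the KSampler CFG and the PAG scale as adding up. That follows from the combined guidance in the PAG paper, where the PAG term (conditional minus attention-perturbed prediction) is added on top of the usual CFG term. When the unconditional and perturbed predictions happen to coincide, the two scales add exactly. Below is a sketch with plain lists. The names are hypothetical, "perturbed" stands for the prediction with self-attention perturbed, and the rescale option mentioned above is not modeled:

```python
from typing import List, Sequence

def cfg_plus_pag(uncond: Sequence[float], cond: Sequence[float], perturbed: Sequence[float],
                 cfg_scale: float, pag_scale: float) -> List[float]:
    """uncond + cfg_scale * (cond - uncond) + pag_scale * (cond - perturbed), elementwise."""
    return [u + cfg_scale * (c - u) + pag_scale * (c - p)
            for u, c, p in zip(uncond, cond, perturbed)]

def ksampler_cfg_for(target_total: float, pag_scale: float = 3.0) -> float:
    """Heuristic from the chat: pick the KSampler CFG so that cfg + PAG scale ~ target."""
    return target_total - pag_scale
```

For a target of about 5 with the PAG scale left at 3, `ksampler_cfg_for(5.0)` gives 2, matching the advice above.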

bitter hearth
#

will definitely use this

craggy crest
bitter hearth
craggy crest
hallow lion
craggy crest
hallow lion
#

ye thats the guy he did it, all of it its his fault

#

So you feel less bad about sd3's ladies in the grass, here is auraflows "photo of a fat dog eating bananas"

sacred jewel
craggy crest
chilly fog
#

Draw an evil communist character in this style, on a transparent background, as a PNG file

sacred jewel
low stone
torn wharf
#

the first minute opens with the usual same hype. It'll take your images to that next level and is a free lunch. its so ez. Not much hope for this video. i've seen a lot of them.

craggy crest
#

how about you watch the entire thing before you get negative about it

torn wharf
#

i got to the end, and with each example, as you're showing and explaining it, it doesn't make sense to me, since the visuals are just slightly different noise solutions and nothing about finer details. this is my usual experience with FreeU. you gave me the tutorial as if i've never tried to understand its use on the UNet, but you're mistaken. i've just never understood the point of FreeU, and it's basically just waving a magic wand and calling it good.

While you address that there are no "best" settings, there's still a narrow recommended range for every parameter, because outside of that tight clamp the denoising solution just goes absolutely bonkers

craggy crest
torn wharf
#

i've never made any image better with FreeU. one thing your tutorial avoided mentioning is all the hype about speeding up generations too, how you could get better generations in fewer steps. no one mentions that anymore these days

craggy crest
#

and i'm not mistaken, it affects each of the layers, and the skip connections that some of the data routes from one side of the network to the other

#

my original question was i wondered if this could be a way to adjust how much the network grabs adjectives and prevent it going overboard with them.

#

matt3o goes more indepth with this https://youtu.be/0ChoeLHZ48M?si=Z9iRoqE9gZ92n0e5

Extension repository: https://github.com/cubiq/prompt_injection

torn wharf
#

prompting the UNet block by block seems like overkill. there's a CLIP cutoff extension on A1111 that's useful. regional prompting is the best solution to the UNet's short attention span

craggy crest
torn wharf
#

yeah. the attention problem. i know.

craggy crest
sage burrow
#

I may or may not have previously prompted a red tshirt on my horror zombies to trick some places into producing better results 😉
Not just SD apparently

shut sable
torn wharf
#

red tshirt made of ketchup

bitter hearth
#

Deep Shrink, ScaleCrafter and HiDiffusion all have a little bit of block-by-block action

sage burrow
#

It's odd, I've tried FreeU a few times, sometimes the results are truly amazing, the other times, the results are extremely horrible. Works with some workflows and not others for me it seems.

Re the vid, I liked it, far less sensationalism than say, 99% of youtube vids lol

bitter hearth
#

yeah I had that from PAG, SAG and FreeU
they are great tools but not for every image

sage burrow
bitter hearth
#

sometimes these things sharpen too much and I like soft images

torn wharf
#

viscous. good one

sage burrow
sage burrow
bitter hearth
#

Self Attention Guidance and Perturbed Attention Guidance

#

they are both trying to improve CFG

sage burrow
bitter hearth
#

not sure
I really struggle with things being named different in different places

torn wharf
#

i wonder what kind of cfg improvements we'll see created for sd3. those work on the unet style network don't they? i know people started perturbing sd3 in the first week. i'm just saying, there'll probably be some interesting attention solvers to come

sage burrow
#

It was out day 2 😄

torn wharf
#

oh i looked at SAG an i'm wrong. it can work on DiT too. Does it already?

bitter hearth
#

not sure

#

I read like 30 papers on samplers in the last few days
turns out I was wrong, 4th order implicit adams isn't going to be the best

#

DPM++ 2M still seems to be very competitive

#

also UniPC seems to be underrated

#

at low steps its one of the best
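
Part of why DPM++ 2M holds up is that it is a multistep method. It reuses the previous step's model output, so it gets second-order accuracy for one model call per step. Single-step higher-order solvers such as RK4 spend several calls per step instead. The toy comparison below illustrates that general idea using a two-step Adams-Bashforth method on dy/dt = -y. It is not DPM++ 2M itself. Both solvers get the same budget of derivative evaluations:

```python
import math
from typing import Callable, Tuple

Deriv = Callable[[float, float], float]

def euler(f: Deriv, y0: float, t1: float, n: int) -> Tuple[float, int]:
    """Plain Euler from t=0 to t1 in n steps; returns (y, derivative calls)."""
    h, t, y, calls = t1 / n, 0.0, y0, 0
    for _ in range(n):
        y += h * f(t, y)
        t += h
        calls += 1
    return y, calls

def adams_bashforth2(f: Deriv, y0: float, t1: float, n: int) -> Tuple[float, int]:
    """Two-step Adams-Bashforth: reuses the previous derivative, 1 call per step."""
    h, t, y = t1 / n, 0.0, y0
    f_prev = f(t, y)
    calls = 1
    y += h * f_prev               # bootstrap the first step with Euler
    t += h
    for _ in range(n - 1):
        f_cur = f(t, y)
        calls += 1
        y += h * (1.5 * f_cur - 0.5 * f_prev)
        f_prev = f_cur
        t += h
    return y, calls


def decay(t: float, y: float) -> float:
    return -y


exact = math.exp(-1.0)
for name, solver in [("euler", euler), ("ab2", adams_bashforth2)]:
    y, calls = solver(decay, 1.0, 1.0, 20)
    print(f"{name}: {calls} calls, error {abs(y - exact):.1e}")
```

With 20 calls each, the multistep method lands much closer to the exact answer than Euler, without spending any extra model evaluations.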

craggy crest
low stone
bitter hearth
#

sadcat wtf

#

What did the cake do

craggy crest
sage burrow
craggy crest
sage burrow
rapid moon
#

any idea when stable-diffusion-3-large be released to huggingface?

bitter hearth
#

no news yet

rapid moon
#

😥

bitter hearth
#

its probably gonna be a while

#

because the effect of people attacking SAI over the issues with SD3 2b is that they will delay the release heavily

#

to get it as good as they can

rapid moon
#

large is 8b right?

bitter hearth
#

yeah

rapid moon
#

any idea how much vram it will take?

bitter hearth
#

they said on reddit it will fit in 24gb

rapid moon
#

THATS F***in AWESOME!!!!

#

i can use 4090 instead of A100
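
Here is the rough arithmetic behind the 24 GB claim: weight memory is parameter count times bytes per parameter. For an 8B-parameter model, that is about 16 GB in fp16/bf16 and about 8 GB in an 8-bit format. Those figures exclude the text encoders, VAE and activations, so whether the model fits on a 24 GB card depends on those extras and on offloading. A sketch:

```python
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "fp8": 1}

def weight_gb(n_params: float, dtype: str) -> float:
    """Memory for the weights alone, in GB (10**9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9
```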

edgy kelp
bitter hearth
#

not necessarily

#

there's two issues with the model and both are fixable with fine tunes

#
  1. not enough subject knowledge
  2. issues with structure
#

and really they are both just the same thing

#

enough subject knowledge will teach it structure

#

I suspect they may well just train from scratch though yes

cobalt moon
#

eh, I'm pretty sure SD3 won't meet the same fate as SD2 and SD2.1

#

SD3 is literally their deadly hit or miss

bitter hearth
#

it will still be SD3

#

its already WIP

#

the new SD3 medium

#

but its a new training run

#

was my understanding

#

having said that IDK if it was 100% confirmed that it isn't a finetune rather than a new fresh run

#

but the point is it will keep the SD3 datasets probably

#

what I would say as well is that the 8B can do structure fine

#

so its just a case of scaling it down to 2B

#

I suspect its the self attention that is the issue

cinder junco
#

Just throwing out a quick question in case anyone has tried it. I'm having difficulty using SD3 with a custom node for MultiDiffusion/Mixture of Diffusers tiled upscaling alongside SD3 ControlNet tile. When set up a similar way as I had it using the SD Ultimate Upscaler node, which seemed to work, I get tiled ghosting of the original image (first image below). If I use an empty latent for the second stage and completely rely on the controlnet to guide the final result, you can see that is the cause of the tiling (second image) -- it applies the full input image to the controlnet, instead of breaking it up into tiles. There doesn't seem to be a way of connecting the controlnet side of things (which changes the text conditioning) to the MultiDiffusion side (which affects the model) to let MultiDiffusion directly affect the controlnet. I saw someone else having what looks like the same problem using SD1.5, but there doesn't seem to be a definitive solution. https://www.reddit.com/r/comfyui/comments/19amano/help_cant_get_tiled_diffusion_controlnet_tile/
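
The ghosting described above is what you would expect if every tile's controlnet sees the whole control image instead of its own region. Below is a sketch of a MultiDiffusion-style step, in which each tile is processed separately and overlapping outputs are averaged. It crops the control image with the same window as the latent tile. Some parts are simplified or hypothetical:

- `denoise_tile` is a hypothetical stand-in for the model plus controlnet.
- The control image is assumed to be at latent resolution. With a pixel-space control image, the coordinates would need scaling by the VAE factor (8 for SD-family VAEs).
- This is not the code of any ComfyUI node, and whether a given node can wire this up is not known here.

```python
from typing import Callable, List

Grid = List[List[float]]
TileFn = Callable[[Grid, Grid], Grid]

def tile_starts(size: int, tile: int, overlap: int) -> List[int]:
    """Start offsets for tiles of length `tile` covering [0, size) with >= `overlap` overlap."""
    if tile >= size:
        return [0]
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be in [0, tile)")
    starts = list(range(0, size - tile, tile - overlap))
    starts.append(size - tile)  # last tile sits flush with the edge
    return starts

def crop(grid: Grid, y: int, x: int, h: int, w: int) -> Grid:
    return [row[x:x + w] for row in grid[y:y + h]]

def tiled_step(latent: Grid, control: Grid, denoise_tile: TileFn,
               tile: int, overlap: int) -> Grid:
    """Run each tile, then average overlapping outputs.

    The control image is cropped with the SAME window as the latent tile, so each
    tile's controlnet only sees its own region.
    """
    h, w = len(latent), len(latent[0])
    th, tw = min(tile, h), min(tile, w)
    acc = [[0.0] * w for _ in range(h)]
    count = [[0] * w for _ in range(h)]
    for y in tile_starts(h, tile, overlap):
        for x in tile_starts(w, tile, overlap):
            out = denoise_tile(crop(latent, y, x, th, tw), crop(control, y, x, th, tw))
            for i in range(th):
                for j in range(tw):
                    acc[y + i][x + j] += out[i][j]
                    count[y + i][x + j] += 1
    return [[acc[i][j] / count[i][j] for j in range(w)] for i in range(h)]
```

A quick check of the alignment: with a dummy `denoise_tile` that returns its control crop unchanged, the stitched result reproduces the control image exactly. Passing the full control image to every tile would instead paste a shrunken copy into each tile, which is the kind of tiled ghosting described above.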

woven bay
#

Is Stable Diffusion 3 uncensored or not?
And will it run on a 6 GB VRAM GPU like SD 1.5 does?

bitter hearth
#

its censored

#

by the normal definitions of that

#

for legal reasons probably all base models from everyone will be mostly censored going forward

#

and then fine tuners will do whatever they want

#

there was a crowdfunded program to fund an uncensored model and the creators just ran away with the money

#

lol

#

it was crazy

#

I still don't understand why no one took them to court

sage burrow
edgy kelp
#

It's not censored, it just does not like people lying on grass

sage burrow
edgy kelp
bitter hearth
#

I don't want to say its a skill issue
but I have managed to do the woman lying on grass test

edgy kelp
# edgy kelp

Notice that this prompt worked ONLY at 768x768, SD3 is not censored, so far it seems only VERY undertrained

bitter hearth
#

if your prompt has a difficult structure (such as the grass one)
you can do a lot to help
dynamic thresholding, CFG scheduling, block level IP adapter, attention map injection, control net scheduling
and most importantly
one of the stupidly long solvers
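
Of the techniques listed, dynamic thresholding is the easiest to show in isolation. Below is a sketch of the original Imagen-style idea, operating on a flattened predicted clean sample x0 that is assumed to be in the [-1, 1] data range. It takes a high percentile of |x0|; if that value exceeds 1, it clips to that value and divides by it, which pulls saturated values back into range. The default p = 99.5 is an arbitrary choice. This is not the algorithm of the ComfyUI dynamic-thresholding node, which works on latents and exposes its own parameters:

```python
import math
from typing import List

def percentile(values: List[float], p: float) -> float:
    """Linearly interpolated percentile, p in [0, 100]."""
    s = sorted(values)
    k = (len(s) - 1) * p / 100.0
    lo, hi = math.floor(k), math.ceil(k)
    return s[lo] + (s[hi] - s[lo]) * (k - lo)

def dynamic_threshold(x0: List[float], p: float = 99.5) -> List[float]:
    """Clip x0 to [-s, s] and divide by s, where s = max(1, p-th percentile of |x0|)."""
    s = max(1.0, percentile([abs(v) for v in x0], p))
    return [max(-s, min(s, v)) / s for v in x0]
```

Values already inside [-1, 1] pass through unchanged. Only predictions that overshoot get squeezed, which is why this helps at high CFG.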

edgy kelp
#

I think there is severe inconsistency between seeds... and I don't mean that every seed generates totally different images (for some prompts it generates nearly identical images), I mean that some prompts generate good images in only a few seeds. But I don't have the optimizations, since I used SD3 only on Stability AI's Hugging Face space

#

Well, that's true for the "beta" SD3 Medium, dunno about what comes next

bitter hearth
#

I agree about seeds yeah

#

I get some seeds that are so bad

#

and then some that are amazing

edgy kelp
#

Of course I can't write good prompts, but if seed 1 generates an almost perfect image with my bad prompt while any random seed generates garbage, there were evident issues with the training

#

Almost every prompt works at least 99% on the first seed. I usually try prompts on the first seed in any model to test prompt adherence

noble coyote
bitter hearth
#

trying lots of seeds is a good idea

edgy kelp
#

Yesterday the dudes were trying to generate drows or dark elves or whatever. I made these two images with the same prompt, but the image on the left was seed 1, while the other was the first random seed that came out.

Prompt: "a high quality hyperdetailed close-up of a dark elf drow from DnD"

I used the dumbest, easiest prompt. The elf on the left isn't 100% lore-accurate, but it has pretty good composition and anatomy IMHO

bitter hearth
#

the first elf looks like a royal

#

and the second like a rogue

edgy kelp
#

But the ears are objectively better in the first one

#

Look at this, used the same random seed as the one on the right but lowered the resolution to 768x768

bitter hearth
#

IDK if the second ears are bad

#

just different

edgy kelp
#

The second ears are doubled

#

Or... I guess those are horns?

bitter hearth
#

might be a popped collar

edgy kelp
#

Either way, it's a confusing composition, which generally makes for a bad image

#

I know the rest is pretty good though

bitter hearth
#

yeah I know what you mean

edgy kelp
#

Is the saxophone a no-no for SD3, or is there a more "natural language", CogVLM-friendly way of saying "playing a brass saxophone"/"lips on a saxophone"?

cinder junco
#

Musical instruments with complicated features are difficult for AI. Tubas are a good example for SDXL.

bitter hearth
#

SDXL also can't do sax

#

mostly

edgy kelp
#

Nice, looks like SD3 does decent saxophones if the prompt is simple and short. That's a funny one

#

Prompt: "a man is playing a brass saxophone". Yesterday I tried a long, convoluted prompt rewritten with LLMs and it generated only body horror

#

768x768, one-handed sax playing (also the best saxophone I've generated with any open-source model; too bad about the... weird technique)

stark ridge
#

!generate A serene mountain landscape at sunrise, with snow-capped peaks and a clear blue sky, painted in a realistic style.

urban arch
edgy kelp
#

Did you use that exact prompt?

#

Because that exact prompt, using SD3 medium without any fancy optimization, gives me this

bitter hearth
#

gonna try it

edgy kelp
#

I'll have to fix the resolution though

#

I guess he has some other token in the prompt; this way it just outputs images that look like they came from bad SD1.5 fine-tunes

craggy crest
urban arch
#

Yeah, I realized after I posted that I was using SDXL in Fooocus, not SD3. I also have some D&D specific loras I downloaded from CivitAI, but that is the exact prompt.

sage burrow
bitter hearth
#

this is my attempt with that prompt

#

or this variation

edgy kelp
urban arch
#

I also got some mixed results

sage burrow
edgy kelp
#

Let me try with my previous prompt, the one that actually got me decent outputs

edgy kelp
#

Same seed by the way

#

Prompt was "a high quality hyperdetailed close-up of a dark elf drow from DnD"

sage burrow
#

TAESD knows how to do dark skin, apparently

#

Here's sdxl

edgy kelp
sage burrow
#

My sd3 from yesterday

#

For fun try dalle!!!! Best and most detailed drow ever!!!

edgy kelp
#

I use DALL-E 3 very often... I reckon on Glif you have the 8B, right?

sage burrow
edgy kelp
#

All the prompts I ran with SD3 Medium on that Hugging Face space improve more or less when I lower the resolution to 768x768

edgy kelp
sage burrow
#

If that's the case, I impatiently await SD3 Large 😄
Though on Glif, Claude helps also

edgy kelp
#

I guess I'm not natural enough...

#

🤖

young blade
#

random thought of the day
cascade is still the best model (just sayin')

craggy crest
young blade
edgy kelp
#

1.5 had the best pretraining (if you know what I mean), but being the first one, it was "technically" meh. SD2.0 could have been way bigger if trained properly

sage burrow
#

I use 1.5 daily! 🙂

sage burrow
edgy kelp
#

I tried making this image with DALL-E 3 and it got the text wrong every single time; it only worked with SD3 at 768x768

edgy kelp
#

SD2.0 was insanely undertrained, though probably in a more linear way than SD3, since it had somewhat better anatomy

young blade
sage burrow
edgy kelp
young blade
sage burrow
young blade
#

i just use a styler prompt mostly with cascade, seems to do really well for what i need

edgy kelp
#

There are little to no LoRAs out there for Cascade because, being a surprise release, it wasn't hyped at all

sage burrow
#

The file size of SC! 😭

#

Prompt some up! 😄

edgy kelp
sage burrow
#

I'll try some on my furry glif....

edgy kelp
#

Some furry balls?

#

Me when there are no BALLS in here

sage burrow
edgy kelp
#

Are those the BALLS of Ultron?

sage burrow
sage burrow
edgy kelp
# sage burrow What's that?

Some character from the Marvel movies. I have no clue which pop culture references work at all these days

sage burrow
edgy kelp
#

Where's the cat with 0.5 GB vram?

sage burrow
#

Ball shaped furries! 😄

spare bearBOT
#
sage burrow
#

I think I want my glif credits back rofl

#

Not round darnit

torn wharf
sage burrow
torn wharf
#

reminds me of critters

sage burrow
torn wharf
#

has some "boglins" vibes

craggy crest
torn wharf
#

they were little hand puppets where you could control the eyes

sage burrow
edgy kelp
craggy crest
edgy kelp
#

2 more and 2 more!

sage burrow
#

SD3 has Pieter Bruegel the Elder in it already; it's fine and done! ❤️

torn wharf
sage burrow