#🆕｜sd3

dull star
#

honestly I'll just use the base model and make stuff I personally like

turbid grotto
#

it is definitely not 1024px

dull star
#

back in March, look at the eyes

#

god, img2img will be so goddamn good

turbid grotto
#

Lykon said on twitter that 2b is better than current 8b in some categories

dull star
#

considering that 2B was their focus, it makes sense

#

like a well trained small model will outperform an undertrained large model

#

llama3 8B vs like the largest Bloom model

turbid grotto
#

yea and it shows that 8b now has even more potential

low stone
#

Sd3/pixart/hunyuan - pixels being sampled by an unruly bunch on a untimely schedule, they're analyzing a model but it's too small.

patent acorn
#

hunyuan one is pretty weird

#

i thought its gonna become a skibidi

#

SD3 one is sick

low stone
#

It does really well with a lot of stuff so I keep using it.

noble coyote
#

SD3 really does have excellent visual acuity!

bitter hearth
#

with people like the creator of HelloWorld, sd3 will be awesome

bitter hearth
restive halo
#

I think the one that they'll release first will be the most popular since tooling, finetunes etc. will be built on top of it

#

part of why I'm sad that we dont get all of them is that instead of the community finding out which one works best for most people, everyone is funneled into the same model

torpid forge
low stone
#

Sd3/pixart/hunyuan - a collection of cells in a fierce battle with a virus, spears, guns, shields, cannons

dull star
#

pixart did nice

#

I wish the api had the 2B model

#

it would possibly increase user count as its so much better

dusky thistle
#

would make a lot more sense imo if they just swapped em

dull star
#

exactly

#

also it seems they'll give us fine tuning code or something

#
  • Fine-Tuning: Capable of absorbing nuanced details from small datasets, making it perfect for customization and creativity.
#

I wonder if this means that they have made some really good finetuning implementations themselves

#

"absorbing nuanced details from small datasets" sounds really promising

dusky thistle
dull star
#

also cant wait for Stable Audio 2

noble coyote
#

"When's SD4 coming out?!" 😄

dull star
#

yeah ideogram is still the goat

rain current
#

It is uglier, but more faithful to the prompt

dull star
#

but damn, SD3 2B finetuned might take the throne, even if its still not the best

dull star
#

still prefer it over DALLE3's super smooth style

rain current
#

I am eagerly looking forward to the 12th to see how 2B works (I have bad feelings, I hope I'm wrong).... but the one I would like to have is 4B

dull star
#

I'm just looking forward to more knowledge in the model

#

like it knowing the look of video games, video game characters, etc

bitter hearth
dull star
#

I want it to know a lot, like how going from Llama 3-8B to like Llama 3-70B, 8B is coherent and all, but 70B just KNOWS more

rain current
#

Yes, with 2B we will be able to play, understand it, until the others arrive

dull star
dull star
#

and when 2B comes out, and its actually good and has diversity/variety and etc

#

I'll buy $10 credits

bitter hearth
mortal mesa
#

with shutterstock pics

dull star
viral plaza
#

from the SD3 research paper https://arxiv.org/pdf/2403.03206 this is how CLIP and T5 come together in the model:
On the whole, you can just stack more stuff horizontally at will and it works. Something similar works on SDXL/SD1, and I think comfy does it by default for >77tok prompts, but SD3 is basically designed to be happy with stacking like that
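
A rough shape-only sketch of that stacking, going by my reading of the SD3 paper: the two CLIP outputs are joined along the channel axis, zero-padded to the T5 width, then joined with T5 along the token axis. The helper names are made up for illustration, and the 77-token T5 length is an assumption (implementations may use a longer T5 sequence).

```python
# Shape bookkeeping only; no real encoders are run here.
CLIP_L_DIM = 768    # CLIP ViT-L/14 hidden size
CLIP_G_DIM = 1280   # OpenCLIP bigG/14 hidden size
T5_DIM = 4096       # T5-XXL hidden size
CLIP_TOKENS = 77    # CLIP context length per chunk
T5_TOKENS = 77      # assumed T5 sequence length

def concat_channels(a, b):
    """Join two (tokens, dim) shapes along the channel axis."""
    assert a[0] == b[0], "token counts must match"
    return (a[0], a[1] + b[1])

def pad_channels(shape, target_dim):
    """Zero-pad the channel axis up to target_dim."""
    assert shape[1] <= target_dim
    return (shape[0], target_dim)

def concat_tokens(*shapes):
    """Join (tokens, dim) shapes along the sequence axis."""
    dims = {s[1] for s in shapes}
    assert len(dims) == 1, "channel widths must match"
    return (sum(s[0] for s in shapes), dims.pop())

def sd3_context_shape(num_clip_chunks=1, t5_tokens=T5_TOKENS):
    clip = concat_channels((CLIP_TOKENS, CLIP_L_DIM), (CLIP_TOKENS, CLIP_G_DIM))
    clip = pad_channels(clip, T5_DIM)
    # "Stacking horizontally" = appending more chunks along the token axis.
    return concat_tokens(*([clip] * num_clip_chunks), (t5_tokens, T5_DIM))

print(sd3_context_shape())                   # (154, 4096)
print(sd3_context_shape(num_clip_chunks=2))  # (231, 4096)
```

Extra chunks only lengthen the token axis, which is why longer prompts can be appended without changing the model's input width.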

dull star
#

woah

bitter hearth
#

is controlnet ready for sd3 medium?

viral plaza
#

you can pick and choose which tencs to use yes. It's only trained for G+L, and T5, and will need training to recognize other formats like longclip

dull star
restive halo
viral plaza
#

clipskip setup is exactly the same as SDXL

dull star
restive halo
dull star
#

ahhhh thanks

#

oh interesting

viral plaza
#

hopefully a fixerupper won't be needed for this one but yeah you can easily extract the VAE separately and tune it same as always

restive halo
#

I also thought Emad said we'll get their own controlnets with release but I think they've gone back on that too, or I've misremembered

dull star
#

yeah Emad said it on twitter

bitter hearth
#

cause sd3 is multimodal can you prompt using only images?

dull star
#

you mean like clip vision or something?

wide pagoda
#

I'd be surprised if he said it would exist at release, since it wouldn't make sense to delay release for that

mortal mesa
#

you can look

#

and be surprised

restive halo
#

I tried to find the reference but the thread and tweet I found were leading to [deleted]

#

so not sure what the wording was

mortal mesa
#

oh, ya i dont remember, lol did it get deleted, i surely dont know hahaha

restive halo
#

ah no, there's still a bunch of replies where he says it (note: this wasn't for sd3 my bad)

dull star
#

yup

restive halo
mortal mesa
#

welp about that, it didnt happen, ran out of compute

restive halo
#

but it was also supposed to be up to 8b and estimated to 2 months ago, so clearly a lot was said out of hype, or at best blind optimism

dull star
#

emad is known for hyping stuff up

#

lol

#

8B is still far away

mortal mesa
#

uh they work there

teal fossil
#

Let's call it optimism and stop listening to Emad at all.

dull star
#

thank god they kept training the models

#

it's so much better now

upper snow
viral plaza
# bitter hearth is controlnet ready for sd3 medium?

i know it's been looked into but idk if that'll be ready on release or not. Probably not.
The way to make controlnets is really clean+clear in SD3 though (there's a direct place to add a new stream, vs on old unets it was pretty hacky) so I'd expect controlnets on SD3 to be pretty cool once they're actually out

restive halo
#

if it really does work better and more easily that'd be awesome since tooling is the main advantage of SD over everyone else

viral plaza
viral plaza
viral plaza
# restive halo

lol yeah he had a habit of saying "yes thing will be ready" far before that was guaranteed

restive halo
dull star
#

oh that is feb 13th

restive halo
#

the 3rd one is definitely sd3

dull star
#

wait

dull star
#

yeah the second and third one yes

restive halo
#

but my bad, I shared too quick, I didnt even find the one I remembered anyway

viral plaza
upper snow
# viral plaza huh?

I remember that there was some mention about the research team wanting to test if using only T5 performs better, don't remember if it was you or someone else.

#

as in just training the model on t5 only, leaving clip out completely

viral plaza
upper snow
dull star
ionic geyser
upper snow
#

tbh I mainly just hate the token limit of CLIP. And you can't really blame them for doing it that way because sequence length is SOOOOOO EXPENSIVE

#

Plus I think over 99% of the data used to train CLIP was less than 20 tokens long, you can see the consequences of this if you look at the positional embedding and also read the Long-CLIP paper to see experiments on it. CLIP can barely function past 20 tokens on its own.

#

I did try aligning a long clip model to a finetuned SD1.5 model and it takes like, over twice as long to train like that with the 258 token long context window. That's with the text encoder unfrozen. But it did seem to work as advertised, it pays much better attention to the whole prompt.

fleet falcon
upper snow
#

anyways I do hope that T5 does enough to at least hold different parts of the prompt together when we are doing unholy things with clip embeddings and torch.cat

#

concatting can get everything in different chunks to at least be present but it won't allow things to properly combine with each other (except for whatever the denoiser can accomplish on its own. Honestly MMDIT might just inherently be able to handle this a lot better anyways). t5 would be the only thing on the input side able to combine distant concepts
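
A minimal sketch of the chunk-and-concat idea, on token ids only. The BOS/EOS ids are the OpenAI CLIP vocabulary ones; padding with EOS and the function names are assumptions for illustration. Each chunk would then be encoded on its own and the results joined with `torch.cat` along the token axis.

```python
BOS, EOS = 49406, 49407  # start/end token ids in the OpenAI CLIP vocabulary
WINDOW = 77              # CLIP context length
BODY = WINDOW - 2        # room left after BOS and one EOS

def chunk_token_ids(token_ids):
    """Split prompt token ids into 77-token windows, padding each with EOS (an assumption)."""
    chunks = []
    for start in range(0, max(len(token_ids), 1), BODY):
        body = token_ids[start:start + BODY]
        chunks.append([BOS] + body + [EOS] * (WINDOW - 1 - len(body)))
    return chunks

def window_of(position):
    """Index of the chunk that a given prompt token position lands in."""
    return position // BODY

prompt_ids = list(range(1000, 1100))  # stand-in ids for a 100-token prompt
chunks = chunk_token_ids(prompt_ids)
print(len(chunks), [len(c) for c in chunks])
print(window_of(10) == window_of(90))  # False
```

Tokens 10 and 90 land in different windows, so CLIP never attends across them; only something that sees the whole prompt at once (T5, or the denoiser) can relate the two.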

dull star
#

Does anyone know how PonySD3 would be trained

#

would they continue with training clip with like tags, or did they switch to some vlm to caption the dataset

violet escarp
#

my assumption is that they'll use both. They already used some vlm captioning for XL using their own trained captioner

dull star
#

thanks

jolly swan
dull star
#

woah

jolly swan
#

I.e. OCR, character name recognition, support for nsfw, image grounding (wip but you should be able to describe object positions better, in a DALL-E 3-esque way)

dull star
#

nice

bitter hearth
#

@dull star what you think of the helloworldsdxl models?

dull star
#

Leosam's? haven't tried I think

#

hmm gpt4v tagging

bitter hearth
fair spruce
neon wagon
crude yarrow
#

We just barely got the sdxl controlnets so I can't imagine we will be getting sd3 controlnets anytime soon.

bitter hearth
#

also how good would loras be in a DiT vs Unet?

lapis bay
#

via sd3 api. I hope that sd3 medium will be able to do these kind of images too

lucid swift
dull star
#

then again, 2B was trained for a more correct amount of time than 8B

#

@sterile pendant ahhh

#

sorry if you have already seen

#

YESSS A PROPER 2B IMAGE WITH NO UPSCALING

#

the 16 channel VAE is doing its job quite well

#

a 4 channel vae would probably fail in this case

#

so with highresfix, we'll get image quality that Lykon's been posting

#

but even for a native image, this is quite clean!

leaden kindle
#

Anyone know if you train a Lora on the 2B SD3 model, will it work on the 8B SD3 model?

dull star
#

probably not 🤷‍♂️

but a textual inversion probably will

muted dove
#

If you look closely...

dull star
low stone
#

As you can see here, the 4 channel sdxl vae has a detrimental effect on facial features.

lucid swift
dull star
#

wouldn't it require a complete retrain?

lucid swift
#

Idk

dull star
# low stone

yeah, highresfix makes facial features worse the more you increase the resolution

torpid forge
muted dove
upper snow
dull star
#

what does the f8 mean?

teal fossil
#

Can someone explain in easy terms how the 16-channel vae is so much better than the 4-channel one and why?

dull star
merry hawk
#

How to do it

dry wave
#

but yes, you have to fully retrain the vae as well as sd for it

dry wave
#

same happens with vae. You get artifacts and lose small details in the image

#

furthermore the vae is extremely sensitive to small changes because it's so strongly compressed - so it's hard for the diffusion process to get small details right

#

that's why all small things like heads far away get lost in diffusion

teal fossil
vapid radish
hollow epoch
silver sluice
#

What are the vram requirements for sd3 per model size? Will an 8gb rtx3090 be able to run the 8b model?

silver sluice
#

If anyone deserves first access to the model weights it's the pony dev

dusky thistle
#

3090 has 24gb vram and yes it would
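
Back-of-the-envelope check, weights only. This ignores the text encoders, the VAE and activations, so real usage will be higher; the function name and dtype table are just for illustration.

```python
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}

def weight_gib(params_billion, dtype="fp16"):
    """Memory needed just to hold the weights, in GiB."""
    return params_billion * 1e9 * BYTES_PER_PARAM[dtype] / 2**30

for size in (2, 8):
    print(f"{size}B in fp16: {weight_gib(size):.1f} GiB")
# 2B in fp16: 3.7 GiB
# 8B in fp16: 14.9 GiB
```

So the 8B weights alone would fit in 24 GB at fp16, with the remaining headroom going to T5, CLIP, the VAE and activations.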

hallow lion
#

i like how right out the gate the pony devs are the most uhm... productive.

#

😄

#

I do have a tiny suggestion for the ponies tho, please make your models more varied. All I get is a woman in an empty room, no matter the prompt. 😦

#

In the same clothes too

#

If I am lucky enough to get clothes

#

your anatomy is pretty good tho, if anyone can finally get good hands across the board it's the p0ny sd3 i think

#

no more diffusionhand

dull star
#

apparently hands are gonna be better according to the email we have been sent about 2B

#

I have a massive doubt about it, but I definitely expect hands to be at least a little better or on the level of SDXL

#
  • Photorealism: Overcomes common artifacts in hands and faces, delivering high-quality images without the need for complex workflows.
hallow lion
#

Yes I read it when I got it.

#

bold claims.

#

i hope tho because facedetailer is terrible at detecting and fixing hands

#

in comfyui anyway

dull star
#

smaller faces in the image, I expect to be much better thanks to the improved VAE

#

so I believe that

#

but I will still do a workflow (a very complex one) to make images better

#

you won't expect this at all....

#

Highres-fix 🤯

#
  • Typography: Achieves robust results in typography, outperforming larger state-of-the-art models.

I wonder how much this has improved over the 8B Beta days

dusky thistle
#

tiled upscales are the only way around the untrained resolution problem of latent upscales

#

but then you can have issues with compositional drift

#

but best i've found has always been a tiled approach
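
For anyone wondering what "tiled" means in practice, here is a minimal sketch of laying out overlapping tiles so each one stays at a resolution the model was trained on. The tile size, overlap and function name are illustrative assumptions, not what any particular node or extension does.

```python
def tile_boxes(width, height, tile=1024, overlap=128):
    """Return (left, top, right, bottom) boxes that cover the image with overlapping tiles."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be smaller than the tile size")
    stride = tile - overlap

    def starts(size):
        # One tile is enough when the image is no larger than the tile.
        if size <= tile:
            return [0]
        positions = list(range(0, size - tile, stride))
        positions.append(size - tile)  # final tile sits flush with the far edge
        return positions

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]

boxes = tile_boxes(2048, 2048)
print(len(boxes), boxes[0], boxes[-1])
```

Each box is denoised on its own and blended in the overlap region. Because no tile sees the whole picture, this is also where the compositional drift comes from.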

dull star
#

That's why I just use highresfix (t2i -> denoise at like 50% and t2i)

#

cause I have enough vram to waste

#

but controlnet tiles would be good

hallow lion
#

hehe everyone talks smack about hi res fix

hallow lion
#

ugh

#

also the backdrops and architecture will make more sense

#

like in this photo the closest column is not right

#

also it doesn't make sense, what is it, a bunker?

#

lines in buildings are always not straight

#

sizes of floors are always messed up

#

and in general they feel funny and unreal all over, in sdxl too, sd15 even worse

dull star
hallow lion
#

hmmm

mortal mesa
#

i use nvidia cards

dull star
#

same

hallow lion
#

can u train on anything else even?

#

AMD

#

XD

dull star
#

if only rocm caught up

mortal mesa
#

i thought you were training on a building, ill show myself out now

dull star
#

oh I just realised the joke sorry

hallow lion
#

😄

torn wharf
#

just as ai figures out hands, humans go and make them more complicated

hallow lion
#

thats an act of war

torn wharf
#

when judgement day isn't from fear of being shut off, but a temper tantrum about drawing hands

dull star
#

I still don't get this membership thing

#

can somebody explain if this means that you need a professional membership ($20/month) to make images for commercial use, or this is only for hosting the model on your own service for example?

#

cause "utilize within that member's own product" sounds vague to me

#

like I "utilize" the model OFFLINE (so I basically host to myself, and not to paying customers), therefore I use it for personal use, so its okay, but then what's with the generated image, if it's owned by me?

#

can I use that generated image, which is owned by me, for commercial use?

restive halo
#

the answer I got (but for youtube) didn't clear that much except basically saying dont worry unless you are making a lot

dull star
#

if it doesn't require a membership I'll still donate to stability

dull star
#

don't know about game assets yet

#

and for just making images, I'd just share them for free on social media for people to see anyway

restive halo
#

yeah, I want to create some animated videos, and it's kind of unclear at what stage you need to start paying

#

but I guess I'll worry about it if I ever make enough stuff to be making >$20/mo

dull star
#

its like the membership is a suggestion, not a rule thomas

#

but yeah I wouldn't be making much cash from it

restive halo
#

it's just a bit annoying that if you for example put a lot of effort and make something and it happens to go super viral, you might be in a weird spot

#

the $20/month is at least fairly clear but if you needed to have contacted them to have made an agreement beforehand it's a bit iffy

rain current
#

I hope SD3 is as precise with the prompt as ideogram...
"A composite of three distinct scenes. In the top scene, there's a spacious room with a table set with elegant decor, including a vase with pink flowers, a black teapot, and white spherical objects. A woman in a black outfit sits on the floor, engrossed in her thoughts. The middle scene showcases a woman with a unique hairstyle, wearing a black outfit, sitting at a table with a wine glass in front of her. The background is a dilapidated building with a reflective body of water in front. The bottom scene depicts a serene scene of a man sitting alone on a boat, surrounded by a calm body of water with a dilapidated building in the background."

dull star
#

yikes, probably not

#

there's nothing stopping us from just simply using regional prompting though, this could be easily set up and accomplished

rain current
#

Taking it to the limit... 😨
"A collage of various intricate and artistic photographs. Starting from the top left, there's a close-up of a person's eye with a detailed pattern on the iris. Next to it, there's an image of a hand resting on sandy terrain with a ring on one of the fingers. Moving right, there's a person wearing a black and white striped outfit with a reflective face mask, revealing a cityscape behind it. Below, there's a detailed close-up of a moth's wings with ornate patterns. Next to it, there's a photograph of a white horse in a snowy landscape. On the bottom left, there's a close-up of a person's face with a detailed sketch of a city skyline on it. Adjacent to it, there's a photograph of a castle-like structure in a snowy environment"

dull star
#

honestly if we finetune SD3 on this then it might get really good at splitscreen stuff

#

I also love making movie posters on ideogram

#

SD3 can make nice paintings man

#

I hope 2B will excel at these too

dull star
#

this is just so good man

#

SD3 did perfectly

lucid swift
lucid swift
#

Also DeepFloyd IF is trained in pixel space and still had artifacts

dull star
#

odd

#

but yeah its pixel space, yet it has the same small face issue as models that use VAEs

lucid swift
#

Small face?

dull star
#

small faces are distorted

#

in the distance

lucid swift
#

Maybe the artifacts in IF are created by the upscaler

dull star
#

well if it is like deepfloyd, then yes

#

cause its in multiple stages

dry wave
#

of course, channel count does also have an impact on performance, however, the internal channel count is independent from the vae channel count

dry wave
#

the first thing that happens in sd is that the 4 channels are mapped to ~1000 channels

lucid swift
#

How do u know this stuff btw?

dry wave
#

so it doesn't matter what's the channel count in input and output, when most of the time the model is using a much larger channel count anyways

#

it's open source. Anyone can lookup the code

lucid swift
dry wave
#

I'm a scientist in a field related to machine learning 🤷‍♂️

lucid swift
#

Do u think that increasing channel count would also help the cascade model without removing the 16x training speedup?

#

@dry wave

jolly swan
hallow lion
#

i like how dalle straight up set him on fire

hallow lion
dry wave
#

I think cascade was some kind of proof of concept. Showing that you can achieve incredible compression

#

but this amount of compression doesn't make sense if you want high quality output

lucid swift
#

Yes, I wonder if an increased channel count could let it seem like a normal model with more details but still high compression for faster learning

dry wave
#

I think the main advantage here is the multiple stages

#

doing composition first and then fine details

#

in sd you always use the same fat unet in every time step

#

but we know that most of the unet is not even used most of the time

lucid swift
#

Stage b does not really add details currently. It's only like an insane vae

dry wave
#

like the output of the down layers of the unet stays the same most of the timesteps but they are still computed all the time

#

doing staging just makes sense as composition on early timesteps is just a very different task from the later timesteps

dry wave
#

like I think dallE is using a "vae" made out of a diffusion process

lucid swift
fleet meteor
lucid swift
dry wave
#

in principle diffusion is a method to go from a random normal distribution to a complicated distribution. A vae is doing something very similar.

lucid swift
#

I am just so impressed with cascade because it learns 16 times faster. This can make finetuning or training in general more possible on consumer hardware

dry wave
#

anyways, I go to bed. Good night

lucid swift
dry wave
#

I hadn't much luck with fine-tuning Cascade 🤷‍♂️

#

fine-tuning results in sdxl were always much better

lucid swift
#

But we can write tomorrow. I am following 2 cascade finetuning projects and both seem promising

sterile pendant
#

Oh and also, 16 vae channels means that you have a lot better control at decoding an image vs the old 4 channel method

dull star
#

wdym better control?

sterile pendant
#

A vae is what resolves an image from high dimensional latent space. It takes some N-dimensional data and collapses it down to three channels: RGB. The more channels the vae has, the more accurately it can do the job

#

It would be like comparing mono audio to stereo audio

hallow lion
#

did you know VAE is a lossy process? XD

#

i didnt know

#

every time you decode and encode you lose quality =0

sterile pendant
#

Yeah it's a form of compression and decompression

#

Oh and the extra channels thing also applies to the encoding part as well
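
To put rough numbers on it, assuming an "f8" VAE means each latent cell covers an 8x8 pixel block. This only counts values, not bits, and the function names are made up for illustration.

```python
def latent_shape(width, height, channels, factor=8):
    """Shape of the latent (channels, rows, cols) for an image at the given size."""
    return (channels, height // factor, width // factor)

def compression_ratio(width, height, channels, factor=8):
    """How many RGB values each latent value has to account for."""
    c, h, w = latent_shape(width, height, channels, factor)
    return (3 * width * height) / (c * h * w)

for channels in (4, 16):
    shape = latent_shape(1024, 1024, channels)
    ratio = compression_ratio(1024, 1024, channels)
    print(f"{channels}-channel f8 VAE: latent {shape}, {ratio:.0f}x fewer values")
# 4-channel f8 VAE: latent (4, 128, 128), 48x fewer values
# 16-channel f8 VAE: latent (16, 128, 128), 12x fewer values
```

Same spatial grid, but four times as many numbers per cell, so small details like distant faces have more room to survive the round trip.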

hallow lion
rugged nova
split ledge
#

Hey there 🙂
Can I generate images with a lower resolution than 1024x1024 with sd3 ?

hallow lion
#

Does it like the same prompting style as sdxl? Or natural language? do we still need loads of negative prompts?

#

do we still do parentheses for emphasis?

#

(((((((great hands:1.5))))))

turbid grotto
#

guys

#

I am so happy about sd3

crude yarrow
sour harbor
#

When do you think there'll be SD3 LoRA training? 🤔 So interesting

patent acorn
sterile pendant
#

But I'd imagine SD3 can probably handle something like 768² without imploding

neon wagon
#

we will see 3-6 months after the weights the first loras and finetuned models

sterile pendant
#

Honestly, I'm waiting more for the controlnets than anything. From what they've said, the controlnets will be far better and easier to train than the hacky stuff that was needed to use them with unets.

deft wren
#

We are so happy...

sterile pendant
#

Loras/doras should be neat as well, but if controlnets can wrangle tough scenes, you won't need so many models of people trying to fix things like hands and whatnot(if you're working on people)

#

Loras will likely be a much bigger deal since the base model is already really decent. Good controlnets and maybe ipa(or some kind of dit compatible version that does the same kinds of things) will make things far easier than relying on overtrained models

dusky thistle
#

yeah, i see the loras/doras as a way to add concepts

#

not fix stuff

#

(hopefully)

rain current
brave bloom
sterile pendant
#

If it's just 2b and the two clips, should likely be doable with even 12gb vram, maybe even 8 depending on the dim size

dry wave
sterile pendant
#

So again, apples and oranges.

radiant ledge
#

sdxl unet has some attention, but wouldn't call it a transformer

sterile pendant
#

a unet is a cnn and the attention happens across the shape of a U, hence Unet

#

but vision transformers and convolutional neural networks are very different in how they work

#

"Vision Transformers and CNNs (Convolutional Neural Networks) are two different types of neural network architectures used to solve computer vision tasks. Vision Transformers are based on the Transformer architecture, originally designed for natural language processing, but adapted for image analysis. CNNs, on the other hand, are a type of deep learning network specifically designed for image recognition and classification."

#

"The main difference lies in their architectural design and the way they process visual information. While CNNs rely on the use of convolutional layers to extract features hierarchically, Vision Transformers utilize self-attention mechanisms to capture global dependencies and relations between image patches directly. This allows Vision Transformers to model long-range interactions within images more effectively than CNNs."

radiant ledge
#

unet is not a pure CNN

sterile pendant
#

but anyways, the moral of the story is that a cnn != dit. so stop sweating the parameter size differences because they function completely differently under the hood

#

it doesn't have to be a pure cnn, it's still a cnn

#

like with unets, you can still do things like self attention and whatnot, but at the core, it's still convolving

dry wave
dry wave
#

the sd unet is a transformer at its core

sterile pendant
#

alright, so all these dozens of articles are just talking out their ass then ✅

dry wave
#

the convolutions are necessary for some things like composition, downscaling, adding positional information.
In the ViT architecture you have also downscaling operations, called patching, but they don't use convs
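
A quick sketch of what patching does to the sequence length, assuming an f8 VAE and 2x2 patches on the latent (which is how I read the SD3 paper; treat both numbers as assumptions).

```python
def image_token_count(width, height, vae_factor=8, patch=2):
    """Number of image tokens the transformer sees for a given pixel resolution."""
    latent_w, latent_h = width // vae_factor, height // vae_factor
    return (latent_w // patch) * (latent_h // patch)

for side in (512, 768, 1024):
    print(f"{side}x{side}px -> {image_token_count(side, side)} tokens")
# 512x512px -> 1024 tokens
# 768x768px -> 2304 tokens
# 1024x1024px -> 4096 tokens
```

Attention cost grows with the square of that token count, which is why resolution gets expensive quickly in a pure transformer.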

dry wave
sterile pendant
#

U-Net is a convolutional neural network that was developed for biomedical image segmentation at the Computer Science Department of the University of Freiburg. The network is based on a fully convolutional neural network whose architecture was modified and extended to work with fewer training images and to yield more precise segmentation. Segment...

dry wave
#

boah dude, I know what a unet is.

radiant ledge
#

unet is a general term, you have to look at a specific implementation

dry wave
#

exactly!!

radiant ledge
#

unet just means scaling stuff down and then up again

dry wave
#

the sd unet is a transformer architecture

sterile pendant
#

it doesn't matter what flavor you're trying to talk, sd unet is still a unet, they just hacked in some self attention. again, the way cnns and dits "see" are completely different and sd still "sees" like a cnn

dry wave
#

there is also the hourglass transformer architecture which is... just a unet with another name. They don't use convolution so they gave it a new name;)

sterile pendant
#

but i'm not going to argue it any further, keep thinking what you want

dry wave
#

it's not "hacked in" some self attention

#

the transformers are the core component of the unet

#

the main differences in sd3:

  • you have positional embeddings
  • text and image embeddings share a common latent space and are transformed together
sterile pendant
#

you're still missing the point of how the models "see" the data. i can run a c++ program that then runs a python program in between steps, but then jumps back into the c++ program. doesn't make it a python program.

#

that's what's happening in the sd unet essentially with self attention. all the actual real operations are still happening in the cnn

radiant ledge
#

there's a grand total of two convolutional layers in the sd unet

dry wave
#

the model "sees" its data in some latent space. It's totally unimportant if convolutions are involved here. What's important is that in sd3 text and image share the same latent space

desert garnet
radiant ledge
desert garnet
#

tbf it takes a high iq to understand what that bot is saying

cobalt moon
#

If I'm not wrong, U-Net is just a base?

desert garnet
radiant ledge
desert garnet
#

acid = 🍋

cobalt moon
#

I actually have no idea too lol. Didn't learn much about architecture or machine learning networks

desert garnet
#

i think we need to eat more lemons

sterile pendant
#

again, i guess all these dozens of resources are all just wrong about it then... sd's unet has some elements of transformers in it, yes(some attention), but it is still a cnn and still revolves around the unet. sd3 uses an actual transformer network that is completely centered around it. it's what llms have been using for ages and is very different under the hood. until recently, it was a pain in the ass to make work with things like image generation while keeping the hardware(vram/performance) and training costs from ballooning out.

left parrot
#

Has there been any news about SD3 Turbo lately?

dry wave
#

the transformers in SDXL are much bigger than the transformers in SD3

#

and you tell me "it has some attention"

#

yes, the model architectures differ. And yes, this can have some implications. For example, you probably won't see these weird duplications in SD3 in superhigh resolutions, as these are artifacts from the convolution. So, using convolutions instead of positional embeddings definitely has some effect

pseudo stone
#

like will it be good enough for people to retrain everything i dont know

lucid swift
pseudo stone
#

Sote DiFusion <3

dry wave
#

although SD3 has some cool technical features I like and which might indeed work better than SDXL

pseudo stone
# dry wave I think CLIP is on it's limit. For real prompt understanding you need T5. But ye...
low stone
pseudo stone
#

would it cost much to train though?

#

what about that adapter that used pre trained llms as an encoder

low stone
#

The guy already trained it. Ella for sdxl is finished, but they couldn't release it because sdxl has a different commercial license than sd 1.5

dull star
#

man if comfyui_tensorrt had pixart support too

#

that would be so awesome

#

but if SD3 comes out and they optimize that too, it would be great as well

low stone
#

I'm sure they will. Hunyuan released tensorRT libraries the other day. It'll be neat if those are comfy integrated. I need to message the author of the comfy extra models nodes to see.

dull star
#

wow

#

honestly, SD3 2B is so close

#

I just cannot wait to finetune lora models or textual inversions

lucid swift
cunning lintel
lucid swift
cunning lintel
#

But i'm really curious about an sdxl ella as well, initially i thought sd3 would be miles better than the ella approach, but sd3 obviously has limitations as well, it'd be interesting if ella and sd3 prompt understanding turned out to be in the same ballpark

low stone
cunning lintel
# lucid swift but t5 is good

but it's already part of sd3 (what i understand the whole thing about sd3 is that it can deal with various inputs/outputs, so a solution like ella might be obsolete for the new architecture)

dry wave
low stone
dry wave
#

and the SDXL licence totally allows you to release something like ELLA oO

#

I think he doesn't want to release it for other reasons...

low stone
dry wave
#

the commercial licence is just sdxlturbo

lucid swift
dry wave
#

I rather think that this big company he works for doesn't want him to release the weights

cunning lintel
#

ella guys also said the sdxl version was finetuned, might be those images used for the finetune are the licensing problem. or it's simply that they want to use the sdxl version for their own image gen

lucid swift
dry wave
#

I assume the latter

lucid swift
low stone
lucid swift
#

why are so many Chinese AI companies like that

dry wave
lucid swift
dry wave
#

yes

#

similar to ipadapter

lucid swift
dry wave
#

maybe I just understood you wrong. You mean the training code for ELLA is not open

lucid swift
#

and he is trying to reverse engineer the training code

dry wave
#

but even adapter training costs a lot of compute, usually more than you can afford with consumer hardware

lucid swift
dry wave
#

I mean there are several ipadapters out there

lucid swift
dry wave
#

it works very differently

#

because it injects the conditioning via cross attention

dry wave
#

while the controlnet is "mimicking" the unet

#

so controlnets have the disadvantage that they take a lot of resources/performance. Basically they are as huge as the base model itself

lucid swift
#

but isn't a controlnet also injecting it like that? and they just finetune a copy of the unet for faster training

dry wave
#

the advantage of controlnets is that they are initialized by the base model, so they already "know a lot about images" and can be trained faster

dry wave
#

no, controlnets just use "addition"

#

basically they compute some delta they add ontop of the original unet

#

which doesn't mean it's less powerful. But because they have to be as big as the original unet they are very resource-ineffective

#

also you can only use controlnets for images

#

while the ipadapter idea can be used on any kind of input data (including T5 text prompts like in ELLA)

lucid swift
#

but the disadvantage is a high training cost? how high do you think?

dry wave
#

training code for ipadapter is also on github

#

if you really want to train something like ELLA the costs will be massive

#

the problem is that if you use CLIP+T5 then SD will probably just ignore the T5 as the information from CLIP is much more easily accessible

#

so you probably have to train it like in SD3 that it gets T5 information only sometimes to really encourage it to learn something

#

but as T5 embeddings do not align at all with images... it will be much harder to learn from T5 than learning from CLIP

#

basically, T5 is totally alien for SD. It knows nothing about this latent space and it has to learn everything from scratch

lucid swift
#

interesting! i just think its so sad that everything is so expensive.

dry wave
#

dunno. I mean you can rent gpus for relatively low amount of money

lucid swift
#

i wonder if neural networks that create neural networks will reduce the cost in future

dry wave
#

but then it's an expensive hobby ^^

#

I think most people, if they spend a lot of money on that, want some money back

lucid swift
dry wave
#

just not something AI is good at

lucid swift
dull star
#

would we have to truncate our prompts when training loras and stuff?

dry wave
#

no, because if you want it to find something new and better than humans could do, it's outside the data distribution

#

all cases where "AI found some cool new algorithm no human ever found" so far were exaggerated. Like what they usually did was just try billions of different algorithms and use the best one. You didn't even need a neural network for that, you could just generate code combinatorially

#

what LLMs can do today is write you python code that makes a DNN like the ones we already have

#

you could just write it yourself

lucid swift
dry wave
#

so in best case you don't have to learn python

#

because it's an ongoing discussion how much "AI" is currently in our "AI". Some people might say all ChatGPT is doing is just autocompletion via statistical inference. No real thinking.

#

it's hard to say, though, where is the border between statistical inference and real thinking

lucid swift
dry wave
#

I would say because we have MUCH LESS training data

#

like I learned programming from a few examples

#

ChatGPT needs millions of programming examples to learn something

#

from few examples you cannot do statistical inference

lucid swift
#

maybe the brain is just better at autocompletion with less data than current models

dry wave
#

as said, it's an ongoing discussion. Nobody knows the truth yet

lucid swift
#

i agree. i just want to know your opinion

dry wave
#

but while I think that large llms do have some kind of understanding about the data they process

#

I still doubt that they are able to reason at the level of a human (or even above)

lucid swift
#

i 100% agree llms are not even close

dry wave
#

and I don't think they are able to generate a new scientific discovery or algorithm or something like that

#

so far they can only assist humans in doing so

#

(which, to be honest, is totally fine for me xD)

lucid swift
#

yess xD

#

not being replaced for now

cunning lintel
#

Biggest thing with llms is they're so confidently wrong, and using llms for a field you're not familiar with, you won't know it's wrong

lucid swift
# dry wave (which, to be honest, is totally fine for me xD)

i wonder if we will learn good algorithms by reverse engineering nature. this is very interesting. https://www.youtube.com/watch?v=8Ukin_-5aLQ

Alexander Borst, Max-Planck-Institute for Biological Intelligence, Martinsried, Germany

Abstract: Detecting the direction of image motion is important for visual navigation, predator avoidance and prey capture, and thus essential for the survival of all animals that have eyes. However, the direction of motion is not explicitly represented at th...

#

they did reverse engineer part of the fly brain

cunning lintel
#

(but that leaves the question of what is wrong: llms learn from lots of data, they don't understand the difference between high quality data and low quality data, they just "remember")

lucid swift
#

and that they can generate seemingly right stuff

cunning lintel
#

humans aren't that great at it either, they use lots of heuristics to validate their data, authority figures, popular opinion, personal experience, etc.

dry wave
lucid swift
#

and that they sometimes can't do stuff if they are overtrained. like some models do everything in lists even if you say they should stop, or some add emojis into everything even if you say they should stop

lucid swift
dry wave
#

like yes, biological brains are far superior, but we don't really know how they work and how/if we can simulate that

lucid swift
#

i am sure they can be simulated. but yes we mostly dont know how

dry wave
#

my problem is just that neural network research for centuries was full of this biological bullshit

#

like people came up with a mathematical/statistical solution and then they added some biological bullshit to sell it/get more funding/make it more interesting

lucid swift
#

yes thats stupid. but if you look at the fly example it is very cool

dry wave
#
  • convolutional neural networks work like the human visual cortex. WTF. How often people repeat this bullshit. You know what? they also work like EVERY stupid filter in ANY graphic program. Convolution is a totally normal mathematical operation and it is used since centuries for image processing
lucid swift
#

reminds me of the universe is a neural network paper xD

dry wave
#
  • the sigmoid function simulates a biological neuron. Nah, it doesn't do that. A sigmoid function foremost is a logistic regression which is used in statistics since centuries
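
To make the logistic regression point concrete, here is a small sketch (the weights and inputs are arbitrary values picked for the example). A logistic regression prediction and a single "neuron" with a sigmoid activation are the same computation: a weighted sum pushed through the sigmoid.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logistic_regression(x, weights, bias):
    """Predicted probability of the positive class."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(z)

# Structurally the same as one "neuron" with a sigmoid activation.
neuron = logistic_regression

# Weighted sum is 0.5 - 0.5 = 0, so the predicted probability is 0.5.
p = neuron([1.0, 2.0], weights=[0.5, -0.25], bias=0.0)
print(p)
```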
lucid swift
dry wave
#
  • my highlight: a few months ago Nature published a paper about "neural networks that dream". They claimed that they were inspired by "sleep research in humans" and came up with "improving neural networks by letting them sleep and dream, too". You know what they did? They "reinvented" the "regularization images" idea from the Dreambooth paper. Yes, adding regularization data improves learning. But that's not how you make it into Nature. You have to come up with a fancy but totally unscientific idea of letting networks dream
dry wave
lucid swift
#

it's more like hallucinating in most cases

dry wave
#

but even if its like an activation threshold. So what? Its like saying "A unet resembles a human ass because it is also shaped like that" 😬

lucid swift
dry wave
#

yeah xD Kolmogorov-Arnold Networks

#

which is neat, but honestly, we had this centuries ago and called it "general linear models"

lucid swift
#

for a good comparison you probably have to train both

#

but idk if that would be good or not xD

dry wave
#

me neither. I just found it funny, because the idea is so old and now they treat it as something totally new

#

but sure, if it works better, then it would be cool. I'm sceptical, though

lucid swift
#

but i have seen the training of activation functions a long time ago. i think it was controlling a walking spider or something

#

i also thought it's strange that many said it's new

dry wave
#

anyways, I just hate this kind of "we have to find analogies from biology to sell people AI". One of the really nice things about transformers is: people haven't found any biological analogy for them so far xD Like for the first time nobody could say "yeah chatgpt is a transformer which is like XYZ in the human brain". Really nice

lucid swift
dry wave
#

KAN networks use linear combinations of 1D splines. That's what is usually called a "generalized linear model" (GLM). The only difference in KAN is that they use more than one layer. GLMs usually only use 1 layer, because you use them when you want a linear, interpretable model.

radiant ledge
dry wave
#

not as far as I know

warped ivy
dull star
#

They have a Large and X-Large that are not being released
Look, misinformation is very funny, but its getting old

rose gate
#

yeah the ragebait is getting boring, the sd reddit is also filled with it

mortal mesa
#

when i was a kid we didnt include companies into our core beliefs to be defended or whatnot

cunning lintel
# dull star > They have a Large and X-Large that are not being released Look, misinformation...

🤷‍♂️ I don't disagree, but SAI is just as much to blame for the shit poured over them, they should have invested in proper communications a LONG time ago. Even this "outrage" could have been prevented by simply better wording. Something like "Stable Diffusion 3, our most advanced text-to-image is on its way! You will be able to download the weights for the Medium model on Hugging Face from Wednesday 12th June, while we continue to prepare the other 3 versions for later public release"

#

But i agree, it sucks, internet sucks, just don't give sooooo much room for all this misinfo and misrepresentation 😢

dull star
mortal mesa
#

ya sure they could have added an "a"

dull star
#

In this server they got it right though:

The “weight” is nearly over! Today, at Computex Taipei, our Co-CEO, Christian Laforte, officially announced the open release date of Stable Diffusion 3 Medium for June 12th.

#

its the email that's stupid

Have you heard that the SD3 weights are dropping soon?
it's saying weights, like multiple

#

ok wait its saying Stable Diffusion 3 Medium, our most advanced text-to-image is on its way! right after it...

desert garnet
#

yea still remember when ppl were posting stuff emad said about 2weeks soon like 2 months ago

dull star
#

people don't want to read past the first few sentences, so it kinda makes sense

#

but yeah, saying "the weights" is misleading, not a good headline

dull star
#

Emad: I'd expect proper release next month (weights)

mortal mesa
#

with Cnets

dull star
#

kek

#

we, including Emad, heavily underestimated how much time these models needed to train

#

and currently we're getting 2B right now, 8B has a long way to go

#

we won't see 8B until like august or september, maybe even october

#

but when it comes, it could be a DALLE3 killer

#

if a fully trained 2B gets on the level (if not above) of an undertrained 8B, then I can't imagine 8B trained to its fullest potential

desert garnet
#

lets see if sai makes it to october

dull star
#

or even 4B

#

yeah... 😬

#

they need 2B released to get money from the subscription

#

they should replace the API model from SD3 8B to SD3 2B with highresfix

#

and it actually becomes a competitive product, like Core (which is just a heavily finetuned SDXL Turbo with a workflow)

mortal mesa
#

soo good ive never heard of it teehee

silver sluice
# dull star lol

to be clear "large" and "xlarge" are not being released ever or right away? have they said whether they plan to release the larger versions at all in the future?

silver sluice
dull star
#

Alex (mcmonkey):

We're on track to release the SD3 models* (note the 's', there's multiple - small/1b, medium/2b, large/4b, huge/8b) for free as they get finished.

silver sluice
# dull star Does anyone know how PonySD3 would be trained

oh and to answer your question from earlier here's the link to the article i was talking about:

https://civitai.com/articles/5069/towards-pony-diffusion-v7

and the key quote:

I am keen on training V7 using SD3, although it's currently uncertain whether we will have access to the model weights. I remain hopeful and would be delighted if someone from SAI could discuss this possibility with me. Despite my efforts to reach out, there has been no response yet - perhaps there's a bit of apprehension about being outshined by PD (just a light-hearted thought).

Hello everyone, I'm excited to share updates on the progress of our upcoming V7, along with a retrospective analysis of V6. The recognition V6 has ...

dull star
#

also we might not see pony on SD3, JUST because of the license

#

but we'll see how it goes

silver sluice
#

oh that's sad to think about, so SD3 vs SDXL licenses are different?

dull star
#

yes, SDXL is openrail++, like pixart sigma and sd1.5

#

commercial use with no licensing required

#

SD3 is non-commercial, you need a paid membership for commercial use

#

but I am not so sure about all of this because

silver sluice
#

but pony isn't commercial use is it?

dull star
#

the images generated are owned by you

#

and since you are generating offline, for yourself, you are using the model itself for personal use

#

and since the image is owned by you, you can use it for whatever you'd like

silver sluice
#

yeah i agree, lol ill hold hope pony dev integrates SD3 despite any potential licensing issues

dull star
#

but I'd still recommend you to pay the membership fee if you start making more than $20 a month

#

I'd do that for sure, but I only make images for fun 9 times out of 10

faint breach
dull star
#

that makes sense

faint breach
#

adobe lawyers get pretty aggressive about damages, but i dont' think theres many cases where they try to claim that they own the ip made with unlicensed photoshop

dull star
#

but that's an illegal copy though like you are saying, but what about SD3, which is inherently free, and the gray legality (or whatever) about AI generated images

#

or do you mean like, in some countries, using pirated software for personal use is not illegal for example, and therefore Adobe can sue people?

faint breach
#

yeah iamal. copyright law is complex. i certainly wouldn't test it. i'd license it. the cost doesn't seem to be a lot

#

lol ianal i mean

dull star
faint breach
#

yeah he communicates the same intentions. the licensing is broad enough to cover many more cases than they intend it to. it's not intended for youtubers unless they're raking in 5 figures a month

dull star
#

yeah at that point I'd feel guilty for not buying a membership from stability, even if the model wasn't non-commercial to begin with

faint breach
#

another consideration. maybe you're using sd3 for free through another service that does pay the license

dull star
#

I totally get why Pony v7 might not be finetuned on SD3, this sounds weird and intrusive

dull star
#

yeah, then can I use it for commercial use?

faint breach
#

i dont think most pony users are trying to commercialize their creations. thats one of the funniest user example galleries on civit. dozens of new entries every hour. a constant deluge

dull star
#

yeah they just want to make cartoon corn for themselves or make images to impress or arouse others

#

it's the model creator who might want to commercialize it in some way maybe, idk

faint breach
#

or a service deploying it

dull star
#

yuh

dull star
# dull star

this is like how Microsoft doesn't own the created images from Copilot image generator (DALLE3), (but in Microsoft's case they can use the images if they want to)

faint breach
#

if they're taking donations because they made a model, that's a legal grey area that i don't think has been tested much

dull star
#

is paying for credits on the api going to stability, or is it split between them and fireworks or whatever

#

cause idk how to donate once besides that or just cancelling the membership after a month

faint breach
#

copyright shouldn't be a tidy discussion anyways. human creativity is a messy field. the rules governing it can't be orderly. that's how disney swoops in and owns everything

noble coyote
#

MJ's Copyright scheme seems totally contradictory (and I paraphrase): "MJ owns the outright copyright to any image produced; yet extends an unlimited and inalienable right of use to the producers of such images!!!"

#

Take that as you will...

dull star
#

lol

faint breach
#

many software as a service companies will do this. especially ones that are planning on an acquisition exit. they can claim more value

sullen moss
#

Just need to bite text filter

faint breach
#

coze.com looks like a spam hub. affiliate links galore. nothing about dalle.

sullen moss
#

What do you mean 'nothing ' ?

faint breach
#

it's a spam link farm. you're a spammer. think that clears it up

sullen moss
#

If it were spam, I wouldn't have shared this link here. I started using this resource myself, so I decided to share it

faint breach
#

"this resource" it's an affiliate link farm. spam.

sullen moss
#

Hm

#

Ah, now I understand what you meant, sorry. 🤝

#

In general, if you're interested, look for the thread about Dalle-3 on 4chan, everything will be clear there

gusty trail
dull star
#

the free one

gusty trail
#

I mean if someone pay for a fine tuned version. The author and the customer both need the membership

faint breach
#

people paying for finetuned models? that sounds like bullshit

#

i really hope that stability's new license doesn't unleash a wave of enshitification like that

teal fossil
faint breach
#

8 days

teal fossil
teal fossil
# faint breach 8 days

Pfffff - I'm so tired from too much dataset shenanigans that it's almost wednesday for me. 😛

#

That being said - are we looking at a midnight release? Which timezone? 👼

dull star
#

😈

teal fossil
#

Seriously... the wait on wednesday will be the worst. 🤣

faint breach
#

i'm in the westernmost canadian timezone. it's 10:25 here

dreamy sundial
faint breach
#

USA has Hawaii too so thats further west

teal fossil
faint breach
#

break those chains that bind you

gusty trail
teal fossil
faint breach
#

Going to be interesting to see how finetuned models proliferate. If SD3 refiners start charging for their versions, i'll move over to pixart sigma or stick with sdxl instead

#

imagine needing to subscribe to someone's patreon to use their loras

dull star
#

I'll just keep using the base model thomas

#

yeah bruh

#

lykon will probably keep making free models

silver sluice
#

so just to make sure I'm clear

  • if i wanted to download SD3 and run locally that's free and doesn't change from SDXL
  • if Pony dev wanted to download SD3 and fine tune it for his purposes and provide it to users he would have to pay SAI a membership fee and he would have to offset those costs by charging users to download his model?
faint breach
silver sluice
#

so does that sum it up correctly? are you affirming that's right?

#

pony dev could offer it as a paid download but once it leaks anyone else can just download it and then it's a 'pirated' copy at that point right?

faint breach
#

model authors don't have to charge for their models. they might though.

silver sluice
#

well is there a membership fee? and if so how much? I'm sure a trivial $100 fee wouldn't cause anyone to offset the cost to users but if it's like a monthly $10/K fee then that's a different story lol

dull star
silver sluice
dull star
#

free membership is a membership

#

they didn't specify which one

#

if they make it so that only paid members can finetune, then stability have dug their own graves

#

so I'm pretty sure that's not the case

silver sluice
#

oh good point, i didn't know there was a free membership. yeah my understanding was that if the only change from SDXL to SD3 is that only paid members can finetune, then that would suck for guys like pony dev

gusty trail
#

You could fine tune model for non-commercial use

dull star
#

well I suppose all finetunes follow the non-commercial license, no?

silver sluice
dull star
#

finetunes will require a paid membership to STABILITY to use the finetuned model for commercial use

dull star
storm saffron
gusty trail
#

But if someone uses the non-commercial fine tune for commercial usage, let's say hosting free models and making a profit, how would it count?

silver sluice
#

ah i understand, so for example if i decide to use PonyV7's SD3 finetune model in a commercial application, then I'll be required to sign up with SAI as a paid member. right?

dull star
storm saffron
faint breach
storm saffron
#

You can fine tune it, and you can use that fine tune to make pictures to sell, but you can't put it on a hosting service and ask people to pay for use.

#

Unless you pay

gusty trail
faint breach
#

author can't distribute fine tunes without a license. all derived versions of the model are subject to stability's commercial license

silver sluice
silver sluice
gusty trail
faint breach
#

end users can download and use models locally for free. they can do that with finetunes too. but authors may want to charge for those. we dont know yet

storm saffron
faint breach
dull star
#

random guessing logic, ianal, don't quote me on this:

  • if you think about it, you host the model offline, to yourself (comfyui, a1111, etc), therefore its personal use, which means non-commercial
  • and since the image outputs are owned by you, so theoretically, you could do anything with it
    I want to hear from stability how this all really works.
faint breach
dull star
#

but a paid one though? I want to know that

storm saffron
#

You pay for commercial usage of it.

faint breach
dull star
storm saffron
#

Once it's on YOUR computer you can do what you like with it until you make it public in exchange for payment. That's how I read it.

dull star
#

whatever is the case, if I ever use it for commercial use, I'd buy the membership if I actually go past $20 a month

storm saffron
#

From the Turbo license, which is the current Non Commercial license.

Merely distributing the Software Products or Derivative Works for download online without offering any related service (ex. by distributing the Models on HuggingFace) is not a violation of this subsection.

The subsection being "Non-Commercial Use"

dull star
#

or Derivative Works
ah yeah

storm saffron
#

Whole section:

b. You may not use the Software Products or Derivative Works to enable third parties to use the Software Products or Derivative Works as part of your hosted service or via your APIs, whether you are adding substantial additional functionality thereto or not. Merely distributing the Software Products or Derivative Works for download online without offering any related service (ex. by distributing the Models on HuggingFace) is not a violation of this subsection. If you wish to use the Software Products or any Derivative Works for commercial or production use or you wish to make the Software Products or any Derivative Works available to third parties via your hosted service or your APIs, contact Stability AI at https://stability.ai/contact.

As it says there, finetuning it and giving it away is fine.

dull star
#

yeah it seems so

#

I suppose the same non-commercial license will apply to SD3

storm saffron
#

It should do, this is the updated one they're using on all the 'core' models now.

dull star
#

they are really just targeting companies using their models for free

long palm
#

#🆕

storm saffron
dull star
#

from what I've heard though, is that the license for companies (enterprise membership or whatever?), is suuuper expensive, and some of them just thought of training a model themselves

long palm
#

#🆕 | sd3

dull star
#

wow I was kind of right lmao

#

except 8B is Huge

storm saffron
#

Any thoughts on how it'll split the community though? I think M and L will be most popular. I guess S is for phones?

dull star
#

yeah idk how much difference there will be between 4B and 8B

#

cause if a fully trained 2B is already catching up to an undertrained 8B, I'm not so sure if we'll need 8B

#

unless 8B has INCREDIBLE amounts of knowledge and prompt adherence

#

then it would be worth accepting slower generations in exchange for superb prompt adherence and stuff

#

I suppose M will be the most popular

cunning lintel
#

Otoh maybe it's for the better, the fact that the small models are cheaper to train might result in things getting developed that otherwise wouldn't even be tried at all 😉

viral plaza
#

2B==Medium is locked in

#

1B is very unlikely to be named anything other than Small

#

4B/8B will probably be Large and Huge/Giant or something, or it might be we skip 4B and say 8B is Large, or idk

dull star
#

skip 4B?

storm saffron
#

@viral plaza what quantization will we be getting bf16/fp16?

viral plaza
#

I think fp16

dull star
#

(with cascade we've got bf16 iirc, why did we though?)

spare orchid
#

hey, anyone here have experience in creating anime waifu type images out of inanimate objects ,cars etc. need help with something

storm saffron
#

bf16 would be better on 3000 series nvidia and up.

viral plaza
#

running the model weights (not calc) in fp8 even is near-identical

#

so exact format in storage doesn't overly matter

#

only matters what you calculate in and what you train in

dull star
#

isn't running it in fp8 slow on non 40xx though?

viral plaza
#

running yes not storing no

dull star
#

thanks

viral plaza
#

again, weights in fp8, calc in fp16 or bf16 to preference

dull star
#

interesting

viral plaza
#

basically half the VRAM cost and maybe a tiny bit timecost from the conversion (not much I think) and identical results
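
A toy sketch of "weights in fp8, calc in higher precision", assuming the e4m3 flavour of fp8 (3 mantissa bits, largest finite value 448). The helper `to_fp8_e4m3` is hypothetical and written for this example; it only rounds Python floats to the nearest e4m3-representable value and is not how any real runtime stores or converts tensors. The compute here runs in ordinary Python floats, standing in for fp16/bf16.

```python
import math

FP8_MAX = 448.0             # largest finite e4m3 value
MIN_NORMAL_EXP = -6         # smallest normal exponent in e4m3
SUBNORMAL_STEP = 2.0 ** -9  # spacing of subnormal values

def to_fp8_e4m3(x):
    """Round x to the nearest value representable in fp8 e4m3 (toy version)."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), FP8_MAX)      # saturate instead of overflowing
    m, e = math.frexp(mag)          # mag == m * 2**e with 0.5 <= m < 1
    if e - 1 < MIN_NORMAL_EXP:      # subnormal range: fixed step size
        return sign * round(mag / SUBNORMAL_STEP) * SUBNORMAL_STEP
    # 1 implicit + 3 stored mantissa bits -> 16 steps per power of two
    return sign * math.ldexp(round(m * 16) / 16, e)

weights = [0.3, -1.7, 0.05, 2.4]
stored = [to_fp8_e4m3(w) for w in weights]  # what would sit in the file / VRAM
activations = [1.0, 0.5, -2.0, 0.25]

# The multiply-accumulate itself runs in higher precision on the stored values.
exact = sum(w * a for w, a in zip(weights, activations))
approx = sum(w * a for w, a in zip(stored, activations))
print(stored)
print(exact, approx)
```

Each stored weight is off by at most about 6% in this toy, and only the storage is 8-bit. Whether the end result looks identical on a real model is an empirical question that this sketch does not answer.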

storm saffron
#

It wasn't quite the same in SDXL with FP8, you could tell something was off.

dull star
#

I expected something like this from 8B, cause its such a large model

#

but from 2B...?

viral plaza
dull star
#

could it be because its transformer-like, therefore it handles quantization better? (theory)

viral plaza
low stone
#

SD3 Big McLargeHuge

storm saffron
dull star
#

imatrix 2-bit ggml quantization

#

lmao idk what I'm talking about at this point

#

but this is good news

storm saffron
#

You could possibly quantize it down to 6ish without too much loss

dull star
#

and about T5

#

weights at bf16/fp16 (compared to fp32) already decrease load times and ram usage if being run on CPUs

#

what about storing them in fp8 too?

storm saffron
#

I assume the T5 we're getting is in FP16 as well, but that does quantize pretty well using bitsandbytes.

dull star
#

yeah bnb4bit is perfectly fine with T5 when I tried it with pixart, heavily decreases vram requirements compared to raw weights

dusky thistle
#

regarding training with 2b... i think the biggest question of all is what it takes to train controlnets

viral plaza
#

I hope we can release an SD3-Medium-fp8 safetensors

storm saffron
#

I just run T5 on the CPU cos it's not actually that slow

viral plaza
#

it'd be a literally 2GiB model, same size as SD1 model files, but better-than-XL quality
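
The size figure is simple arithmetic: parameter count times bytes per parameter. A quick sketch, assuming exactly 2 billion parameters for the diffusion model alone (no text encoders or VAE, and "2B" may well be a rounded number):

```python
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "fp8": 1}

def weight_size_gib(num_params, dtype):
    """Size of the raw weights in GiB for a given storage dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30

for dtype in ("fp32", "fp16", "fp8"):
    print(dtype, round(weight_size_gib(2_000_000_000, dtype), 2))
```

Under that assumption fp8 comes out a little under 2 GiB and fp16 at about twice that, which is consistent with the "half the VRAM cost" remark earlier.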

dusky thistle
#

sdxl wasn't left wanting for long for loras and finetunes, but controlnets? that's been the real problem all along

storm saffron
dusky thistle
#

they're finally rolling in but it took almost a year to get good ones

dull star
#

thankfully someone trained a good openpose model for SDXL after all this time

#

(it like... actually works this time)

viral plaza
#

controlnets have a clear logical place to go in mmdit - it's built around multiple streams as a concept, so just tack on another stream (vs SD1/SDXL, controlnets are kinda hacked in)

dull star
#

I wonder how much better controlnets will get because of this then

dusky thistle
#

if it can be squeezed into 24gb of vram, it will be amazing

#

or whatever vram the 5090 ends up having

storm saffron
dusky thistle
#

yeah, or 32gb, or who knows

#

28gb would be stupid

dull star
#

I count on 28GB thomas

low stone
#

What I want to know if I have a 4090, would I be able to just swap in a 5090. Is it the same form factor. If not, that's gonna blow

dusky thistle
low stone
#

I have a perfectly good Alienware box with a 3080 that has the power and cpu/ram for a 4090, but it won't fit in the case. Would blow chunks if they do the same thing again with the 5090s.

viral plaza
#

so if you can train XL you can train SD3-Medium

dull star
#

didn't he mean training controlnets?

viral plaza
#

oh controlnet training idk

dusky thistle
#

yeah i think controlnets are the biggie

dull star
#

also, can you tell me if lora-like training code will be provided out of the box?

#

or will it be more like dreambooth

viral plaza
#

the weight size would be ~half the weight of SD3-Medium, so roughly 1B-ish to add a stream

#

so should be trainable

dusky thistle
#

i'm guessing the fact that controlnets were considered when designing mmdit means that they will be much more effective than the sdxl ones, which are often really weak or hit/miss

storm saffron
dusky thistle
#

cuz if so, wow

viral plaza
dull star
#

maybe diffusers has that, I forgot

viral plaza
#

HF will have code published so presumably they'll cover all the usual training

dusky thistle
#

cnets trainable on a consumer card would be very cool

storm saffron
#

We won't need loras, everything's in the model right?

dull star
#

man I wish

dusky thistle
#

https://github.com/huggingface/diffusers/issues/4925

i've never tried training a sdxl controlnet, but i recall reading it required more than 24gb... no idea where aside from what i just found here, so take it with a grain of salt

"You can add the --use_8bit_adam and --enable_xformers_memory_efficient_attention flags, it works for me. The VRAM usage for each card is about 35GB when setting --train_batch_size=1 and --resolution=1024."

GitHub

Describe the bug Hi. I am running the Controlnet SDXL example as it is shown in the examples section [example-link]. I am unable to reproduce the results in a SLURM managed environment, where I hav...

cunning lintel
#

reading the announcements of the new sdxl controlnets, training them doesn't seem to be a thing for mere mortals :p

viral plaza
#

SDXL controlnet requirements for training are higher than SD3-Medium by a fair bit

#

also, for SDXL we had control-LoRA but idk if HF training code supports it

#

the whole point of Control-LoRA naturally being to reduce the resource cost

dusky thistle
#

one of the biggest things that sets the potential with SD so much higher than with anything else imo

#

so if it's 35gb at a min for sdxl and if the vram needs are 30-35% lower for sd3-medium, it's doable on 24gb
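
Running those numbers as a quick check. Both inputs are guesses from this chat: the 35GB figure comes from a single GitHub comment, and the 30-35% reduction is an estimate.

```python
def projected_vram_gb(baseline_gb, reduction):
    """VRAM needed if usage drops by the given fraction of the baseline."""
    return baseline_gb * (1.0 - reduction)

for reduction in (0.30, 0.35):
    need = projected_vram_gb(35.0, reduction)
    print(f"{reduction:.0%} lower -> {need:.2f} GB, fits in 24 GB: {need <= 24.0}")
```

At 30% the estimate lands at about 24.5GB, just over a 24GB card, and at 35% it is about 22.75GB. So on these assumptions it is borderline rather than comfortable.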

woeful spindle
#

what does T5 mean?

#

is it something that helps text generation?

twin tulip
#

t5 is a different type of text encoder, not a clip text encoder

woeful spindle
#

hmm

#

is it built-in or do we need to do something to activate it

twin tulip
#

I think we're waiting to see what pipelines work or are delivered. the paper said T5 can be optionally dropped. T5 is huge, much bigger than either clip model. maybe mcmonkey can chime in or we'll know later

viral plaza
#

dropping T5 works fine if the size is an issue for you

#

CLIP G+L without the T5 is very close to having all 3 on most prompts

twin tulip
#

I imagine pipelines can be set up to load T5, run it once for the embedding, then move the weights to cpu while the DiT runs

#

or maybe T5 can be quantized heavily?

viral plaza
#

you can even just run it entirely on CPU

#

Also yes T5 happily quantizes to 4bit, idk if there will be code for that on launch day but HF Candle runs T5-4bit on CPU well

dull star
#

T5 4-bit on GPU fits well with pixart sigma 0.6B

#

around like 8GB of VRAM the last time I tried, don't remember

#

but it's not so bad on CPU only

lucid swift
dull star
#

especially with the bf16 weights

dull star
#

but I'll try again

#

on gpu it was instant

lucid swift
dull star
#

absolutely

#

like its not suuuper slow either

#

its good enough and accessible

lucid swift
#

yes

dull star
#

it takes about 10-20 secs on CPU for T5

#

then again, after the conditioning has been done, you can generate on other seeds instantly

#

so its just generating the conditioning once, then you can change cfg, seed, and other stuff and don't have to use T5 again

#

that's actually pretty nice
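
A sketch of that "encode once, reuse the conditioning" pattern using a plain cache. `encode_prompt` and `generate` are hypothetical stand-ins written for this example, not functions from ComfyUI, diffusers or any other tool; the fake "embedding" only exists so the code runs.

```python
from functools import lru_cache

calls = {"encoder": 0}

@lru_cache(maxsize=32)
def encode_prompt(prompt):
    """Stand-in for the slow text-encoder pass (hypothetical, not a real API)."""
    calls["encoder"] += 1
    return tuple(float(ord(c)) for c in prompt)  # fake "embedding"

def generate(prompt, seed, cfg):
    cond = encode_prompt(prompt)  # cached after the first call per prompt
    # A real sampler would denoise here; we just return a summary.
    return {"seed": seed, "cfg": cfg, "cond_len": len(cond)}

for seed in range(4):
    generate("a cat in a hat", seed=seed, cfg=7.0)
generate("a cat in a hat", seed=0, cfg=4.5)

print(calls["encoder"])  # prints 1: the encoder ran once for five generations
```

Only a change to the prompt text triggers the slow encoder again, so the 10-20 seconds on CPU is paid once per prompt and not once per image.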

low stone
#

@viral plaza do you think we'll see the 2b on the api or artisan before the 12th?

viral plaza
#

API team is talking about it but idk the timeline

#

if it gets on API it'll be added to Artisan immediately

low stone
#

Ok great thanks

dull star
#

hell yeah

low stone
raven fern
#

2B or not 2B :3

#

man can't wait to try it out

#

and of course see what the community has in store

#

I'm also curious about the smol model: how well will it generate stuff, and are most people gonna train loras or finetunes on 2B?

remote holly
#

is 12 GB enough for 2B SD3?

low stone
#

yes

#

you can choose to offload various bits to main system ram as well, so no matter what it'll render with that.

bitter hearth
#

been a while since I posted anything here

low stone
#

will i be able to make images like this with sd3?

jolly swan
# silver sluice but pony isn't commercial use is it?

It is (although all versions are available for free for local use). Training pony is very expensive, so I have to recoup the costs somehow - I run a Discord service for about 20k users and have partnerships with SaaS services. I also (obviously) have the SAI Membership, but the problem is that SD3 seems to be non-commercial even for members, and you may have to make some extra deal? But this is not communicated at all right now.

low stone
#

If you could just go ahead and fill out this form in triplicate, we'll get back to you around the time we release the 8b.

silver sluice
jolly swan
low stone
#

It was, sarcastically.

#

I feel for you.

hallow lion
#

What's with the drama, can't we all just be happy we're getting the weights

jolly swan
#

Ah, that felt too real so I was not sure cadancewheeze

hallow lion
#

It's happening for real! who cares it's medium

#

it's still gonna mop the floor with DALL-E, Midjourney and SDXL

low stone
#

I think pony represents all that is wrong with society and shows off who we really are in our dark hearts. And we salute you.

#

🙂

jolly swan
hallow lion
#

p0ny is great even if i dont use it for uhm anatomical studies

jolly swan
#

I am sorry y'all decided to use it for something else.

#

That's on you, not me.

low stone
#

And pixart is a model for making this.

#

somewhere it went horribly wrong.

hallow lion
#

AI always sounds like it's speaking in tongues and summoning demons when trying to make text

low stone
jolly swan
cunning lintel
#

assuming sd3 is released as a core model, i don't see an issue as long as you stay below the enterprise reqs and get the pro membership thingy; it doesn't seem you'd get there with your 20k discord users. But yeah, it would be good to get that as a response from sai itself

silver sluice
# jolly swan Worst case scenario we will get a v6.9 based on XL

it would be interesting to see how your new training translates for better quality images using the sdxl model and then see the results translated to the SD3 model, I think a 6.9 version would also appease the community who have set up their workflow and system around sdxl. so to be clear you're going to wait until the 12th at which point there will be a clear answer on licensing terms and then you'll decide which model to train next?

jolly swan
jolly swan
cunning lintel
#

i'd think that if you make more than what pro allows, you can afford the enterprise license 😉 If the worry is that you get a small fee for making the model available to those saas providers, and that those providers need the enterprise license, that's not your problem: they need to get the enterprise license to use the finetune (because it still has the default license attached), not you

silver sluice
#

i just think overall SAI should have a special room for VIP fine tuners where they can get dedicated support and service and answers to their questions, just a curated list of top tier devs who make the models better so they can be taken care of first and foremost

cunning lintel
#

But that's just my interpretation, that whole membership thing is clear as mud, all it really says is that it grants you commercial use (where the license that you get with the weights does not)

teal fossil
viral plaza
#

yee

jolly swan
prisma rampart
#

if time/compute is an issue, it would probably be better to skip 4B and train 8B properly vs having both 4 and 8 but both under-trained.

sick cedar
#

2B looks highly capable.

#

And accessible.

jolly swan
silver sluice
# jolly swan I am in the data dungeon fixing image captions 😦

hey I'm excellent in dealing with data processing and automation, i have free time, let me know if you need a hand or some scripting and I could lend a hand, feel free to DM me whenever and we could discuss any solutions I could develop for you to expedite your process in any aspect, it's the least I could do for using your models so much 🙂

sick cedar
# jolly swan Discord is for pony lovers, it's SaaS that makes more reasonable money, but agai...

@viral plaza This is a similar issue to the one I was referring to earlier. I stress that SD3 may not reach its full potential if it doesn't have the full support of major finetuners, but no one seems to be able to contact anyone official for crucial info on the final conditions of the SD3 License.
@viral plaza I know that you are extremely busy, and only one person, but if there is anyone you can put forward this issue to, we all would very much appreciate it.
(Thank you btw.)

viral plaza
#

we still have the one that was made for SDXL launch

#

haven't expanded it since and the relevant team has changed around

#

that was a Joe Penna initiative. With The Joe gone, gotta get the higher ups on board with Joe ™️ methodology

viral plaza
sick cedar
cinder junco
#

@viral plaza Do you have any info about how SD3 memory use scales with resolution relative to SDXL? I like to use hiresfix to generate at 3840x2400 resolution with SDXL, but don't have a whole lot of memory spare above that. Just wondering what sort of resolution I'll be able to achieve with SD3. (Mac with 64GB unified memory running Invoke.)

viral plaza
cinder junco
#

Thanks. Too bad! I hope some geniuses can work on that. Has anyone done any experiments with native generation above 1 MP? Does it still go crazy or generate artifacts? Would a higher-res initial generation be useful to lessen the number of stages or tiles in a tiled upscaling workflow?

low stone
viral plaza
viral plaza
#

tiled works well on SD3

cinder junco
#

So A) would be sufficient to allow the same resolution flexibility as SDXL (assuming the fix is possible)?

viral plaza
#

yes

#

somebody just has to figure out how to do that

sterile pendant
#

Basically, how far from a non-square aspect ratio can it handle?

viral plaza
#

basically the same as SDXL

prisma rampart
viral plaza
#

nobody had a reason to with sdxl

#

cause sdxl you can just do hires fix and you're done

#

sd3 will get distorty if you try to run it straight like that

#

so there's a reason to bother making a hires tune

#

also yeah the training team said that sd3 moved resolution objectives very easily

sterile pendant
dusky thistle
#

stuff like... a sandy beach with patches of wet sand underneath dry sand kicked up with the color and texture clearly visible, pebbles and stones scattered around... that kinda stuff disappears during those latent upscales

#

it's not a huge degradation... in a way i'm glad it's a big one for sd3 so we can actually get a proper tune on higher resolutions

viral plaza
#

tru

turbid grotto
#

If they decided to switch to the 2b version, that means 8b wasn't close to being ready. So could the API 8b be really far from its final quality, and could we see big improvements? Or is it already at a level where the difference won't really be noticeable?

hallow lion
radiant ledge
viral plaza
flint minnow
#

How much VRAM do you need for sd3?

late compass
twilit hamlet
#

How much VRAM do you need for sd3?

late compass
#

@viral plaza

sterile heath
#

2b model is about 2.5x larger than SD2 in terms of params

#

But it's smaller than SDXL so if you can run that you'll be fine

late compass
#

How much power does the 2b version need... does it need more than SDXL?

sterile heath
#

8b will be better out of the box but it's likely 2b will have more fine tuned variations via the community

late compass
late compass
sterile heath
gusty gale
#

This could also be made irrelevant by moving T5 into VRAM and swapping it with the diffusion transformer model when each is being used

storm saffron
gusty gale
late compass
agile hornet
#

Is Stable Assistant using the 2B version? Because I liked some of the stuff I was getting when I used the trial

#

I still had some problems with hands but all in all I got some good output from it

hallow lion
#

8B will serve as the template for the matrix.

#

We can't have that

#

we're not ready

sterile pendant
noble coyote
#

Using SD3@ClipDrop - hands, limbs and faces are execrable!!!

#

Mostly ...

dull star
noble coyote
#

Try the free SD3 @Glif - by a user named FABLAN

hallow lion
#

just be careful

#

glif will deem everything nsfw

#

tread lightly

dull star
#

not in my experience lol

#

idk what you are prompting

hallow lion
#

well i wasn't prompting nudity

dull star
#

lol

noble coyote
hallow lion
#

still half came out blurred

#

and was told to chill or I'll get banned

dull star
#

glif only uses like a word list

noble coyote
#

ClipDrop can often do the same - but they do not recompense you when you lose 28 of 40 pictures like that - and all from the tamest of tame prompts!!!

hallow lion
#

omg

#

ripoff

#

tsk tsk

noble coyote
#

You quickly learn which words/themes/topics will send ClipDrop into a headspin!!

#

Slender is a non-ClipDrop word ....

#

Sensual too ...

#

So I'm doing beach-crazy lighthouses for safety's sake!!! 🙂

#

Pixart-Sigma into SDXL

sterile pendant
#

Then again, it could also be one of the other nodes people commonly use, like rgthree, that does the caching I'm talking about. Haven't used vanilla comfy in ages

dull star
#

ah yeah rgthree

storm saffron
sterile pendant
#

Either way, it's an option and even in vanilla comfy, it would be like two lines of code for the node

storm saffron
noble coyote
#

In this room at least, more faith seems to be placed in epochs, samplers, sigmas, noise-schedules etc etc etc 😄

noble coyote