dull star Jun 3, 2024, 2:07 PM

#

honestly I'll just use the base model and make stuff I personally like

turbid grotto Jun 3, 2024, 2:08 PM

#

it is definitely not 1024px)

dull star Jun 3, 2024, 2:08 PM

#

back in March, look at the eyes

#

#

god, img2img will be so goddamn good

turbid grotto Jun 3, 2024, 2:11 PM

#

Lykon said on twitter that 2b is better than current 8b in some categories

dull star Jun 3, 2024, 2:14 PM

#

considering that 2B was their focus, it makes sense

#

like a well trained small model will outperform an undertrained large model

#

llama3 8B vs like the largest Bloom model

turbid grotto Jun 3, 2024, 2:15 PM

#

yea and it shows that 8b now has even more potential

low stone Jun 3, 2024, 2:29 PM

#

Sd3/pixart/hunyuan - pixels being sampled by an unruly bunch on a untimely schedule, they're analyzing a model but it's too small.

patent acorn Jun 3, 2024, 2:32 PM

#

hunyuan one is pretty weird

#

i thought its gonna become a skibidi

#

SD3 one is sick

low stone Jun 3, 2024, 3:05 PM

#

It does really well with a lot of stuff so I keep using it.

noble coyote Jun 3, 2024, 3:12 PM

#

SD3 really does have excellent visual acuity!

bitter hearth Jun 3, 2024, 3:13 PM

#

with people like the creator of HelloWorld, sd3 will be awesome

bitter hearth Jun 3, 2024, 3:15 PM

#

dull star considering that 2B was their focus, it makes sense

you think that 2b will be most popular or will people support the 4b and 8b? I feel like the community will just use the 4b and barely tune the 8b

restive halo Jun 3, 2024, 3:19 PM

#

I think the one that they'll release first will be the most popular since tooling, finetunes etc. will be built on top of it

#

part of why I'm sad that we don't get all of them is that instead of the community finding out which one works best for most people, everyone is funneled into the same model

torpid forge Jun 3, 2024, 3:23 PM

#

#

low stone Jun 3, 2024, 3:32 PM

#

Sd3/pixart/hunyuan - a collection of cells in a fierce battle with a virus, spears, guns, shields, cannons

dull star Jun 3, 2024, 3:47 PM

#

pixart did nice

#

I wish the api had the 2B model

#

it would possibly increase user count as it's so much better

dusky thistle Jun 3, 2024, 4:14 PM

#

dull star it would possibly increase user count as its so much better

two good questions:

if 2b is better than 8b, why not put the 2b on the api?
if 8b is not ready for release, why charge ppl for it?

#

would make a lot more sense imo if they just swapped em

dull star Jun 3, 2024, 4:16 PM

#

exactly

#

also it seems they'll give us fine tuning code or something

#

Fine-Tuning: Capable of absorbing nuanced details from small datasets, making it perfect for customization and creativity.

#

I wonder if this means that they have made some really good finetuning implementations themselves

#

"absorbing nuanced details from small datasets" sounds really promising

dusky thistle Jun 3, 2024, 4:29 PM

#

dull star also it seems they'll give us fine tuning code or something

alex said that HF has had code etc for months so i'm guessing a diffusers implementation will be available day 1 or at least quickly

dull star Jun 3, 2024, 4:32 PM

#

also cant wait for Stable Audio 2

noble coyote Jun 3, 2024, 4:34 PM

#

"When's SD4 coming out?!" 😄

rain current Jun 3, 2024, 4:36 PM

#

low stone Sd3/pixart/hunyuan - a collection of cells in a fierce battle with a virus, spea...

ideogram

dull star Jun 3, 2024, 4:37 PM

#

yeah ideogram is still the goat

rain current Jun 3, 2024, 4:37 PM

#

It is uglier, but more faithful to the prompt

dull star Jun 3, 2024, 4:37 PM

#

but damn, SD3 2B finetuned might take the throne, even if it's still not the best

dull star Jun 3, 2024, 4:37 PM

#

rain current It is uglier, but more faithful to the prompt

yeah ideogram has a... style...

#

still prefer it over DALLE3's super smooth style

#

rain current Jun 3, 2024, 4:39 PM

#

I am eagerly looking forward to the 12th to see how 2B works (I have bad feelings, I hope I'm wrong).... but the one I would like to have is 4B

dull star Jun 3, 2024, 4:40 PM

#

I'm just looking forward to more knowledge in the model

#

like it knowing the look of video games, video game characters, etc

bitter hearth Jun 3, 2024, 4:41 PM

#

dull star I'm just looking forward to more knowledge in the model

looking forward to people who spam "1girl" images in the subreddit

dull star Jun 3, 2024, 4:41 PM

#

I want it to know a lot, like how going from Llama 3-8B to like Llama 3-70B, 8B is coherent and all, but 70B just KNOWS more

rain current Jun 3, 2024, 4:41 PM

#

Yes, with 2B we will be able to play, understand it, until the others arrive

dull star Jun 3, 2024, 4:41 PM

#

bitter hearth looking forward for people who spam "1girl" images in the subreddit

oh my god I forgot about that agony

dull star Jun 3, 2024, 4:41 PM

#

rain current Yes, with 2B we will be able to play, understand it, until the others arrive

yes it will keep us busy until the other models come out

#

and when 2B comes out, and it's actually good and has diversity/variety etc

#

I'll buy $10 credits

bitter hearth Jun 3, 2024, 4:42 PM

#

dull star and when 2B comes out, and its actually good and has diversity/variety and etc

then it will get overtrained

mortal mesa Jun 3, 2024, 4:42 PM

#

with shutterstock pics

dull star Jun 3, 2024, 4:43 PM

#

bitter hearth then it will get overtrained

community finetunes on big boob girl dataset???? nooo this is not true at all thomas

viral plaza Jun 3, 2024, 4:44 PM

#

from the SD3 research paper https://arxiv.org/pdf/2403.03206 this is how CLIP and T5 come together in the model:
On the whole you can just stack more stuff horizontally at will and it works. Something similar works on SDXL/SD1, and I think comfy does it by default for >77tok prompts, but SD3 is basically designed to be happy with stacking like that
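
For anyone following along, here is a rough shape-only sketch of that stacking, based on the layout described in the linked SD3 paper: the two CLIP outputs are joined channel-wise, zero-padded up to the T5 width, and T5 is then appended along the token axis. The default sizes (77 tokens, 768/1280/4096 channels) are the paper's values as I read them; the function name and the `extra_chunks` parameter are made up for illustration and are not part of any real API.

```python
def sd3_context_shape(clip_l=(77, 768), clip_g=(77, 1280), t5=(77, 4096), extra_chunks=0):
    """Return (tokens, channels) of the joint text context. Shapes only, no tensors."""
    assert clip_l[0] == clip_g[0], "CLIP-L and CLIP-G must cover the same tokens"
    clip_tokens = clip_l[0]
    clip_channels = clip_l[1] + clip_g[1]   # channel-wise concat of the two CLIPs
    assert clip_channels <= t5[1], "CLIP context is zero-padded up to the T5 width"
    channels = t5[1]                        # width after zero-padding
    tokens = clip_tokens + t5[0]            # sequence-wise concat of CLIP and T5
    tokens += extra_chunks * clip_tokens    # hypothetical: more chunks stacked on the token axis
    return tokens, channels


print(sd3_context_shape())                # (154, 4096)
print(sd3_context_shape(extra_chunks=1))  # (231, 4096)
```

The point of the sketch is only that stacking grows the token axis while the channel width stays fixed, which is why the model can accept more chunks without any change to its input width.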

dull star Jun 3, 2024, 4:44 PM

#

woah

bitter hearth Jun 3, 2024, 4:45 PM

#

is controlnet ready for sd3 medium?

viral plaza Jun 3, 2024, 4:45 PM

#

you can pick and choose which tencs to use yes. It's only trained for G+L, and T5, and will need training to recognize other formats like longclip

dull star Jun 3, 2024, 4:45 PM

#

bitter hearth is controlnet ready for sd3 medium?

oh yeah that's a good question, we'd like to know that

restive halo Jun 3, 2024, 4:45 PM

#

dull star oh yeah that's a good question, we'd like to know that

he already heavily implied no earlier

viral plaza Jun 3, 2024, 4:46 PM

#

clipskip setup is exactly the same as SDXL

dull star Jun 3, 2024, 4:46 PM

#

restive halo he already heavily implied no earlier

oh I didn't see, sorry lol

restive halo Jun 3, 2024, 4:46 PM

#

dull star Jun 3, 2024, 4:46 PM

#

ahhhh thanks

#

oh interesting

viral plaza Jun 3, 2024, 4:46 PM

#

hopefully a fixerupper won't be needed for this one but yeah you can easily extract the VAE separately and tune it same as always

restive halo Jun 3, 2024, 4:46 PM

#

I also thought Emad said we'll get their own controlnets with release but I think they've gone back on that too, or I've misremembered

dull star Jun 3, 2024, 4:47 PM

#

yeah Emad said it on twitter

bitter hearth Jun 3, 2024, 4:47 PM

#

cause sd3 is multimodal can you prompt using only images?

mortal mesa Jun 3, 2024, 4:47 PM

#

restive halo I also thought Emad said we'll get their own controlnets with release but I thin...

it is what was said

dull star Jun 3, 2024, 4:47 PM

#

you mean like clip vision or something?

wide pagoda Jun 3, 2024, 4:48 PM

#

I'd be surprised if he said it would exist at release, since it wouldn't make sense to delay release for that

mortal mesa Jun 3, 2024, 4:48 PM

#

you can look

#

and be surprised

restive halo Jun 3, 2024, 4:49 PM

#

I tried to find the reference but the thread and tweet I found were leading to [deleted]

#

so not sure what the wording was

mortal mesa Jun 3, 2024, 4:50 PM

#

oh, ya i dont remember, lol did it get deleted, i surely dont know hahaha

restive halo Jun 3, 2024, 4:50 PM

#

ah no, there's still a bunch of replies where he says it (note: this wasn't for sd3 my bad)

dull star Jun 3, 2024, 4:50 PM

#

yup

restive halo Jun 3, 2024, 4:50 PM

#

mortal mesa Jun 3, 2024, 4:51 PM

#

welp about that, it didnt happen, ran out of compute

restive halo Jun 3, 2024, 4:51 PM

#

#

but it was also supposed to be up to 8b and estimated for 2 months ago, so clearly a lot was said out of hype, or at best blind optimism

dull star Jun 3, 2024, 4:52 PM

#

emad is known for hyping stuff up

#

lol

#

8B is still far away

mortal mesa Jun 3, 2024, 4:52 PM

#

uh they work there

teal fossil Jun 3, 2024, 4:52 PM

#

Let's call it optimism and stop listening to Emad at all.

dull star Jun 3, 2024, 4:52 PM

#

thank god they kept training the models

#

it's so much better now

upper snow Jun 3, 2024, 4:54 PM

#

viral plaza you can pick and choose which tencs to use yes. It's only trained for G+L, and T...

So I guess T5 only lost at the ablation tests?

viral plaza Jun 3, 2024, 4:55 PM

#

bitter hearth is controlnet ready for sd3 medium?

i know it's been looked into but idk if that'll be ready on release or not. Probably not.
The way to make controlnets is really clean+clear in SD3 though (there's a direct place to add a new stream, vs on old unets it was pretty hacky) so I'd expect controlnets on SD3 to be pretty cool once they're actually out

restive halo Jun 3, 2024, 4:56 PM

#

if it really does work better and more easily that'd be awesome since tooling is the main advantage of SD over everyone else

viral plaza Jun 3, 2024, 4:56 PM

#

bitter hearth cause sd3 is multimodal can you prompt using only images?

technically yes but idk how intelligent it will be with only an image input

viral plaza Jun 3, 2024, 4:56 PM

#

restive halo I also thought Emad said we'll get their own controlnets with release but I thin...

note that Emad doesn't work here anymore

viral plaza Jun 3, 2024, 4:56 PM

#

restive halo

lol yeah he had a habit of saying "yes thing will be ready" far before that was guaranteed

restive halo Jun 3, 2024, 4:57 PM

#

viral plaza note that Emad doesn't work here anymore

he did then, and to be fair the few things the next CEO said before he went silent also didn't come to be (mainly timelines) but let's not dwell on that

viral plaza Jun 3, 2024, 4:57 PM

#

restive halo ah no, there's still a bunch of replies where he says it (note: this wasn't for ...

that tweet is about cascade

dull star Jun 3, 2024, 4:57 PM

#

oh that is feb 13th

restive halo Jun 3, 2024, 4:57 PM

#

the 3rd one is definitely sd3

dull star Jun 3, 2024, 4:57 PM

#

wait

viral plaza Jun 3, 2024, 4:57 PM

#

upper snow So I guess T5 only lost at the ablation tests?

huh?

dull star Jun 3, 2024, 4:57 PM

#

yeah the second and third one yes

restive halo Jun 3, 2024, 4:58 PM

#

but my bad, I shared too quick, I didnt even find the one I remembered anyway

viral plaza Jun 3, 2024, 4:58 PM

#

dull star yeah the second and third one yes

ye

upper snow Jun 3, 2024, 4:59 PM

#

viral plaza huh?

I remember that there was some mention about the research team wanting to test if using only T5 performs better, don't remember if it was you or someone else.

#

as in just training the model on t5 only, leaving clip out completely

viral plaza Jun 3, 2024, 5:16 PM

#

upper snow I remember that there was some mention about the research team wanting to test i...

oh, yeah, that experiment got deprioritized in favor of releasing the known-working arch for now

upper snow Jun 3, 2024, 5:17 PM

#

viral plaza oh, yeah, that experiment got deprioritized in favor of releasing the known-work...

Fair. Probably wouldn't be too difficult to tune it on T5 only on my own anyways.

dull star Jun 3, 2024, 5:18 PM

#

upper snow Fair. Probably wouldn't be *too* difficult to tune it on T5 only on my own anyw...

can't wait for community efforts similar to this

ionic geyser Jun 3, 2024, 5:20 PM

#

viral plaza oh, yeah, that experiment got deprioritized in favor of releasing the known-work...

Thanks Alex for answering all our questions! Very much appreciated 🤘

upper snow Jun 3, 2024, 5:21 PM

#

tbh I mainly just hate the token limit of CLIP. And you can't really blame them for doing it that way because sequence length is SOOOOOO EXPENSIVE

#

Plus I think over 99% of the data used to train CLIP was less than 20 tokens long, you can see the consequences of this if you look at the positional embedding and also read the Long-CLIP paper to see experiments on it. CLIP can barely function past 20 tokens on its own.

#

I did try aligning a Long-CLIP model to a finetuned SD1.5 model and it takes like, over twice as long to train like that with the 248 token long context window. That's with the text encoder unfrozen. But it did seem to work as advertised, it pays much better attention to the whole prompt.

fleet falcon Jun 3, 2024, 5:25 PM

#

upper snow Fair. Probably wouldn't be *too* difficult to tune it on T5 only on my own anyw...

gonna need to handle that pooled embedding omission for finetuning tho
this is gonna take a while to tune clip out of sd3

upper snow Jun 3, 2024, 5:25 PM

#

anyways I do hope that T5 does enough to at least hold different parts of the prompt together when we are doing unholy things with clip embeddings and torch.cat

#

concatting can get everything in different chunks to at least be present but it won't allow things to properly combine with each other (except for whatever the denoiser can accomplish on its own. Honestly MMDIT might just inherently be able to handle this a lot better anyways). t5 would be the only thing on the input side able to combine distant concepts
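
A minimal sketch of the chunk-and-concat trick being described, on plain token id lists rather than real embeddings. It assumes the usual CLIP setup of a 77-token window with start/end ids 49406/49407 and padding with the end id; the helper name is hypothetical, and real UIs add their own details (weighting, smarter split points) that are not shown here.

```python
BOS, EOS = 49406, 49407  # CLIP start/end token ids (assumed; padding with EOS is also an assumption)
WINDOW = 77              # CLIP context length, including BOS and EOS


def chunk_tokens(token_ids, window=WINDOW, bos=BOS, eos=EOS):
    """Split a long prompt into CLIP-sized windows, each wrapped in BOS ... EOS."""
    body = window - 2  # room left for prompt tokens in each window
    chunks = []
    for start in range(0, max(len(token_ids), 1), body):
        piece = token_ids[start:start + body]
        # BOS + tokens + EOS, then EOS padding up to the full window
        chunks.append([bos] + piece + [eos] * (window - 1 - len(piece)))
    return chunks


prompt = list(range(100))  # stand-in for 100 real token ids
chunks = chunk_tokens(prompt)
print(len(chunks), [len(c) for c in chunks])  # 2 [77, 77]
```

Each window would be encoded by CLIP on its own and the resulting embeddings joined along the token axis (the torch.cat step), so CLIP itself never attends across windows. That is the limitation mentioned above: only T5 or the denoiser can relate concepts that land in different chunks.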

dull star Jun 3, 2024, 5:29 PM

#

Does anyone know how PonySD3 would be trained

#

would they continue with training clip with like tags, or did they switch to some vlm to caption the dataset

violet escarp Jun 3, 2024, 5:30 PM

#

my assumption is that they'll use both. They already used some vlm captioning for XL using their own trained captioner

dull star Jun 3, 2024, 5:30 PM

#

thanks

jolly swan Jun 3, 2024, 5:57 PM

#

dull star would they continue with training clip with like tags, or did they switch to som...

V6 was vlm for half captions, v7 is full captions run and the quality of captions is much higher

dull star Jun 3, 2024, 5:57 PM

#

woah

jolly swan Jun 3, 2024, 5:59 PM

#

I.e. OCR, character name recognition, support for nsfw, image grounding (wip but you should be able to describe object positions better, in a DALL-E 3-esque way)

dull star Jun 3, 2024, 5:59 PM

#

nice

bitter hearth Jun 3, 2024, 6:05 PM

#

@dull star what you think of the helloworldsdxl models?

dull star Jun 3, 2024, 6:07 PM

#

Leosam's? haven't tried I think

#

hmm gpt4v tagging

bitter hearth Jun 3, 2024, 6:15 PM

#

dull star hmm gpt4v tagging

yeah, imo best model for photos of people

fair spruce Jun 3, 2024, 6:23 PM

#

neon wagon Jun 3, 2024, 6:26 PM

#

bitter hearth is controlnet ready for sd3 medium?

all controlnets for sdxl came very late and the people behind control net are very arrogant so i will assume maybe 8-10 months after the weights

crude yarrow Jun 3, 2024, 6:27 PM

#

We just barely got the sdxl controlnets so I can't imagine we will be getting sd3 controlnets anytime soon.

bitter hearth Jun 3, 2024, 6:33 PM

#

also how good would loras be in a DiT vs Unet?

lapis bay Jun 3, 2024, 7:27 PM

#

via sd3 api. I hope that sd3 medium will be able to do these kind of images too

lucid swift Jun 3, 2024, 7:28 PM

#

viral plaza oh, yeah, that experiment got deprioritized in favor of releasing the known-work...

Because people asked about the weights so often?

dull star Jun 3, 2024, 7:39 PM

#

lapis bay via sd3 api. I hope that sd3 medium will be able to do these kind of images too

honestly doesn't seem like a lot of words for 2B to fail

#

then again, 2B was trained for a more correct amount of time than 8B

#

@sterile pendant ahhh

#

sorry if you have already seen

#

YESSS A PROPER 2B IMAGE WITH NO UPSCALING

#

the 16 channel VAE is doing its job quite well

#

a 4 channel vae would probably fail in this case

#

so with highresfix, we'll get image quality that Lykon's been posting

#

but even for a native image, this is quite clean!

leaden kindle Jun 3, 2024, 7:43 PM

#

Anyone know if you train a Lora on the 2B SD3 model, will it work on the 8B SD3 model?

dull star Jun 3, 2024, 7:44 PM

#

probably not 🤷‍♂️

but a textual inversion probably will

muted dove Jun 3, 2024, 7:51 PM

#

dull star YESSS A PROPER 2B IMAGE WITH NO UPSCALING

She has long arms

#

If you look closely...

dull star Jun 3, 2024, 7:52 PM

#

muted dove She has long arms

LMAO

low stone Jun 3, 2024, 7:53 PM

#

muted dove If you look closely...

#

As you can see here, the 4 channel sdxl vae has a detrimental effect on facial features.

lucid swift Jun 3, 2024, 7:53 PM

#

dull star the 16 channel VAE is doing its job quite well

I wonder why nobody increased the number of channels before, it seems to work even in small models

dull star Jun 3, 2024, 7:54 PM

#

wouldn't it require a complete retrain?

lucid swift Jun 3, 2024, 7:54 PM

#

Idk

dull star Jun 3, 2024, 7:55 PM

#

low stone

yeah, highresfix makes facial features worse the more you increase the resolution

#

thomas

torpid forge Jun 3, 2024, 7:58 PM

#

fair spruce

muted dove Jun 3, 2024, 8:00 PM

#

upper snow Jun 3, 2024, 8:12 PM

#

dull star the 16 channel VAE is doing its job quite well

it's a bit noisy if you look too closely, but still far better than the c4f8 vaes would do.

dull star Jun 3, 2024, 8:13 PM

#

what does the f8 mean?

teal fossil Jun 3, 2024, 8:31 PM

#

Can someone explain in easy terms how the 16-channel vae is so much better than the 4-channel one and why?

dull star Jun 3, 2024, 8:41 PM

#

merry hawk Jun 3, 2024, 8:43 PM

#

How to do it

dry wave Jun 3, 2024, 8:44 PM

#

lucid swift I wonder why nobody increased the amount of channels before it seems to work eve...

I suggested this since I do SD 1.5 🤷‍♂️

#

but yes, you have to fully retrain the vae as well as sd for it

dry wave Jun 3, 2024, 8:46 PM

#

teal fossil Can someone explain in easy terms how the 16-channel vae is so much better than ...

less compression. The vae is like an image compression algorithm. Think of transforming your image to a jpeg image, you will get a lot of artifacts

#

same happens with vae. You get artifacts and lose small details in the image

#

furthermore the vae is extremely sensitive to small changes because it's so strongly compressed - so it's hard for the diffusion process to get small details right

#

that's why all small things like heads far away get lost in diffusion
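
To put rough numbers on the compression analogy, and on the "c4f8" shorthand from earlier (4 latent channels, 8x spatial downscale), here is a small arithmetic sketch. It assumes the 16-channel VAE keeps the same 8x downscale, which matches the point made elsewhere in this chat that latent resolution stays the same; the function name is made up.

```python
def vae_compression(height, width, latent_channels, downscale=8, image_channels=3):
    """Return the latent shape and how many pixel values map onto one latent value."""
    lat_h, lat_w = height // downscale, width // downscale
    pixel_values = height * width * image_channels
    latent_values = lat_h * lat_w * latent_channels
    return (lat_h, lat_w, latent_channels), pixel_values / latent_values


for channels in (4, 16):
    shape, ratio = vae_compression(1024, 1024, channels)
    print(f"c{channels}f8: latent {shape}, compression {ratio:.0f}x")
# c4f8: latent (128, 128, 4), compression 48x
# c16f8: latent (128, 128, 16), compression 12x
```

So under these assumptions the 16-channel latent stores four times as many values per image, which is where the extra room for small details like distant faces comes from.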

teal fossil Jun 3, 2024, 8:49 PM

#

dry wave that's why all small things like heads far away get lost in diffusion

Ah! Or prints / patterns and the like. That's great news since it was one of those SD bottlenecks that's really starting to bug me.

vapid radish Jun 3, 2024, 9:25 PM

#

hollow epoch Jun 3, 2024, 9:27 PM

#

#📝｜prompting-help

silver sluice Jun 3, 2024, 9:41 PM

#

What are the vram requirements for sd3 per model size? Will an 8gb rtx3090 be able to run the 8b model?

silver sluice Jun 3, 2024, 9:43 PM

#

dull star Does anyone know how PonySD3 would be trained

There’s an article on civit by the author of pony who described how he’s gonna handle sd3

#

If anyone deserves first access to the model weights it's the pony dev

dusky thistle Jun 3, 2024, 9:49 PM

#

3090 has 24gb vram and yes it would
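
A back-of-the-envelope sketch for the weights alone, assuming 2 bytes per parameter (fp16/bf16). This deliberately ignores the text encoders, the VAE and activation memory, so real usage will be higher; the function name is made up.

```python
def weight_gib(params_billion, bytes_per_param=2):
    """GiB needed just to hold the diffusion model weights at the given precision."""
    return params_billion * 1e9 * bytes_per_param / 2**30


for size in (2, 8):
    print(f"{size}B at fp16: about {weight_gib(size):.1f} GiB of weights")
# 2B at fp16: about 3.7 GiB of weights
# 8B at fp16: about 14.9 GiB of weights
```

By this estimate the 8B weights alone would not fit in 8 GB without quantization or offloading, but they do fit in a 24 GB card with some room left for everything else.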

hallow lion Jun 3, 2024, 9:53 PM

#

i like how right out the gate the pony devs are the most uhm... productive.

#

😄

#

I do have a tiny suggestion for the ponies tho, please make your models more varied. All I get is a woman in an empty room, no matter the prompt. 😦

#

In the same clothes too

#

If I am lucky enough to get clothes

#

your anatomy is pretty good tho, if anyone can finally get good hands across the board it's the p0ny sd3 i think

#

no more diffusionhand

dull star Jun 3, 2024, 10:06 PM

#

apparently hands are gonna be better according to the email we have been sent about 2B

#

I have a massive doubt about it, but I definitely expect hands to be at least a little better or on the level of SDXL

#

Photorealism: Overcomes common artifacts in hands and faces, delivering high-quality images without the need for complex workflows.

hallow lion Jun 3, 2024, 10:07 PM

#

Yes I read it when I got it.

#

bold claims.

#

i hope tho because facedetailer is terrible at detecting and fixing hands

#

in comfyui anyway

dull star Jun 3, 2024, 10:09 PM

#

smaller faces in the image, I expect to be much better thanks to the improved VAE

#

so I believe that

#

but I will still do a workflow (a very complex one) to make images better

#

you won't expect this at all....

#

Highres-fix 🤯

#

Typography: Achieves robust results in typography, outperforming larger state-of-the-art models.

I wonder how much this has improved over the 8B Beta days

dusky thistle Jun 3, 2024, 10:10 PM

#

tiled upscales are the only way around the untrained resolution problem of latent upscales

#

but then you can have issues with compositional drift

#

but best i've found has always been a tiled approach
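
A minimal sketch of the bookkeeping behind a tiled upscale: cut the upscaled image into overlapping boxes at a resolution the model was trained on, process each one, then blend the overlaps. Only the box computation is shown; the tile and overlap sizes are placeholder values and the function name is hypothetical.

```python
def tile_boxes(width, height, tile=1024, overlap=128):
    """Return (left, top, right, bottom) boxes that cover the image with overlapping tiles."""
    if overlap >= tile:
        raise ValueError("overlap must be smaller than the tile size")
    stride = tile - overlap

    def starts(size):
        if size <= tile:
            return [0]
        positions = list(range(0, size - tile, stride))
        positions.append(size - tile)  # last tile sits flush with the far edge
        return positions

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]


boxes = tile_boxes(2048, 2048)
print(len(boxes), boxes[0], boxes[-1])  # 9 (0, 0, 1024, 1024) (1024, 1024, 2048, 2048)
```

Because each tile is denoised without seeing the rest of the image, a high denoise strength lets tiles disagree with each other, which is the compositional drift mentioned above.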

dull star Jun 3, 2024, 10:11 PM

#

That's why I just use highresfix (t2i -> i2i at like 50% denoise)

#

cause I have enough vram to waste

#

but controlnet tiles would be good

hallow lion Jun 3, 2024, 10:11 PM

#

hehe everyone talks smack about hi res fix

dull star Jun 3, 2024, 10:11 PM

#

https://media.discordapp.net/attachments/1089974139927920741/1226678423016181810/image.png?ex=665efd8e&is=665dac0e&hm=04972742c10deb15f940202f6733ea808061877e310ae7c773081b07945cb59d&=&format=webp&quality=lossless&width=869&height=676

#

SDXL with highresfix ^

#

6 fingers lol

hallow lion Jun 3, 2024, 10:12 PM

#

ugh

#

also the backdrops and architecture will make more sense

#

like in this photo, the closest column is not right

#

also it doesn't make sense, what is it, a bunker?

#

lines in buildings are always not straight

#

sizes of floors are always messed up

#

and in general they feel funny and unreal all over, in sdxl too, sd15 even worse

dull star Jun 3, 2024, 10:13 PM

#

hallow lion lines in buildings are always not straight

I just trained a lora on a building

hallow lion Jun 3, 2024, 10:14 PM

#

hmmm

mortal mesa Jun 3, 2024, 10:14 PM

#

i use nvidia cards

dull star Jun 3, 2024, 10:14 PM

#

same

hallow lion Jun 3, 2024, 10:16 PM

#

can u train on anything else even?

#

AMD

#

XD

dull star Jun 3, 2024, 10:17 PM

#

if only rocm caught up

mortal mesa Jun 3, 2024, 10:18 PM

#

i thought you were training on a building, ill show myself out now

dull star Jun 3, 2024, 10:18 PM

#

oh I just realised the joke, sorry

hallow lion Jun 3, 2024, 10:19 PM

#

😄

torn wharf Jun 3, 2024, 10:36 PM

#

just as ai figures out hands, humans go and make them more complicated

hallow lion Jun 3, 2024, 10:39 PM

#

thats an act of war

torn wharf Jun 3, 2024, 10:40 PM

#

when judgement day isn't from fear of being shut off, but a temper tantrum about drawing hands

dull star Jun 3, 2024, 10:49 PM

#

#

I still don't get this membership thing

#

can somebody explain if this means that you need a professional ($20/month) to make images for commercial use, or this is only for hosting the model on your own service for example?

#

cause "utilize within that member's own product" sounds vague to me

#

like I "utilize" the model OFFLINE (so I basically host to myself, and not to paying customers), therefore I use it for personal use, so it's okay, but then what's with the generated image, if it's owned by me?

#

can I use that generated image, which is owned by me, for commercial use?

restive halo Jun 3, 2024, 10:56 PM

#

the answer I got (but for youtube) didn't clear up that much except basically saying don't worry unless you are making a lot

dull star Jun 3, 2024, 10:58 PM

#

if it doesn't require a membership I'll still donate to stability

dull star Jun 3, 2024, 10:58 PM

#

restive halo the answer I got (but for youtube) didn't clear that much except basically sayin...

that's mainly where I'd probably use it yeah, youtube

#

don't know about game assets yet

#

and for just making images, I'd just share them for free on social media for people to see anyway

restive halo Jun 3, 2024, 10:59 PM

#

yeah, I want to create some animated videos, and it's kind of unclear at what stage you need to start paying

#

but I guess I'll worry about it if I ever make enough stuff to be making >$20/mo

dull star Jun 3, 2024, 11:00 PM

#

it's like the membership is a suggestion, not a rule thomas

#

but yeah I wouldn't be making much cash from it

restive halo Jun 3, 2024, 11:01 PM

#

it's just a bit annoying that if you for example put a lot of effort and make something and it happens to go super viral, you might be in a weird spot

#

the $20/month is at least fairly clear but if you needed to have contacted them to have made an agreement beforehand it's a bit iffy

rain current Jun 3, 2024, 11:06 PM

#

I hope SD3 is as precise with the prompt as ideogram...
"A composite of three distinct scenes. In the top scene, there's a spacious room with a table set with elegant decor, including a vase with pink flowers, a black teapot, and white spherical objects. A woman in a black outfit sits on the floor, engrossed in her thoughts. The middle scene showcases a woman with a unique hairstyle, wearing a black outfit, sitting at a table with a wine glass in front of her. The background is a dilapidated building with a reflective body of water in front. The bottom scene depicts a serene scene of a man sitting alone on a boat, surrounded by a calm body of water with a dilapidated building in the background."

dull star Jun 3, 2024, 11:09 PM

#

yikes, probably not

#

there's nothing stopping us from just simply using regional prompting though, this could be easily set up and accomplished

rain current Jun 3, 2024, 11:11 PM

#

Taking it to the limit... 😨
"A collage of various intricate and artistic photographs. Starting from the top left, there's a close-up of a person's eye with a detailed pattern on the iris. Next to it, there's an image of a hand resting on sandy terrain with a ring on one of the fingers. Moving right, there's a person wearing a black and white striped outfit with a reflective face mask, revealing a cityscape behind it. Below, there's a detailed close-up of a moth's wings with ornate patterns. Next to it, there's a photograph of a white horse in a snowy landscape. On the bottom left, there's a close-up of a person's face with a detailed sketch of a city skyline on it. Adjacent to it, there's a photograph of a castle-like structure in a snowy environment"

dull star Jun 3, 2024, 11:14 PM

#

honestly if we finetune SD3 on this then it might get really good at splitscreen stuff

#

I also love making movie posters on ideogram

#

SD3 can make nice paintings man

#

#

I hope 2B will excel at these too

#

dull star Jun 3, 2024, 11:26 PM

#

dull star I also love making movie posters on ideogram

at least SD3 could make this

#

this is just so good man

#

SD3 did perfectly

lucid swift Jun 3, 2024, 11:32 PM

#

dry wave but yes, you have to fully retrain the vae as well as sd for it

Is it not possible to finetune it?

lucid swift Jun 3, 2024, 11:34 PM

#

dry wave that's why all small things like heads far away get lost in diffusion

But isn't that like training on higher res, and takes longer?

#

Also DeepFloyd IF is trained in pixel space and still had artifacts

dull star Jun 3, 2024, 11:36 PM

#

odd

#

but yeah it's pixel space, yet it has the same small face issue as models that use VAEs

lucid swift Jun 3, 2024, 11:37 PM

#

Small face?

dull star Jun 3, 2024, 11:37 PM

#

small faces are distorted

#

in the distance

lucid swift Jun 3, 2024, 11:38 PM

#

Maybe the artifacts in IF are created by the upscaler

dull star Jun 3, 2024, 11:38 PM

#

well if it is like deepfloyd, then yes

#

cause it's in multiple stages

dry wave Jun 3, 2024, 11:39 PM

#

lucid swift But isnt that like training on higher ress and takes longer?

no, the resolution determines the performance and memory consumption of the transformer layers and the resolution stays the same between 4 channel or 16 channel vae

#

of course, channel count does also have an impact on performance, however, the internal channel count is independent from the vae channel count

lucid swift Jun 3, 2024, 11:40 PM

#

dry wave no, the resolution determines the performance and memory consumption of the tran...

Sounds too good to be true xD

dry wave Jun 3, 2024, 11:40 PM

#

the first thing that happens in sd is that the 4 channels are mapped to ~1000 channels

lucid swift Jun 3, 2024, 11:41 PM

#

How do u know this stuff btw?

dry wave Jun 3, 2024, 11:41 PM

#

so it doesn't matter what's the channel count in input and output, when most of the time the model is using a much larger channel count anyways
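
A small sketch of that argument in transformer terms (SD3-style patch embedding rather than the unet's first conv, so a swapped-in illustration, not the code being referred to). It assumes an 8x VAE downscale, 2x2 patches and a placeholder hidden width of 1536; the function name is made up and the exact width of any particular model may differ.

```python
def dit_input_cost(image_px, latent_channels, hidden=1536, downscale=8, patch=2):
    """Return (token count, input projection parameter count) for a square image."""
    latent = image_px // downscale
    tokens = (latent // patch) ** 2               # depends on resolution only
    patch_dim = patch * patch * latent_channels   # values per patch before projection
    proj_params = patch_dim * hidden + hidden     # linear weights plus bias
    return tokens, proj_params


for channels in (4, 16):
    tokens, params = dit_input_cost(1024, channels)
    print(f"{channels} channels: {tokens} tokens, {params} input projection params")
# 4 channels: 4096 tokens, 26112 input projection params
# 16 channels: 4096 tokens, 99840 input projection params
```

Under these assumptions the token count, which drives attention cost and memory, is identical for both VAEs; only the tiny input (and matching output) projection grows, and that is negligible next to billions of parameters in the rest of the model.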

#

it's open source. Anyone can lookup the code

lucid swift Jun 3, 2024, 11:42 PM

#

dry wave it's open source. Anyone can lookup the code

Yes but u have to have some skills to do that

dry wave Jun 3, 2024, 11:42 PM

#

I'm a scientist in a field related to machine learning 🤷‍♂️

lucid swift Jun 3, 2024, 11:43 PM

#

Do u think that increasing channel count would also help the cascade model without removing the 16x training speedup?

#

@dry wave

jolly swan Jun 3, 2024, 11:46 PM

#

hallow lion I do have a tiny suggestion for the ponies tho, please make your models more var...

Hey, please take into account there is no team and it's just me in my garage, but you are right - the model has strong bias for female characters (and bad backgrounds), V7 should have much more diversity but I don't know if that would be fixed completely.

hallow lion Jun 3, 2024, 11:47 PM

#

i like how dalle straight up set him on fire

hallow lion Jun 3, 2024, 11:52 PM

#

jolly swan Hey, please take into account there is no team and it's just me in my garage, bu...

sure sure just a suggestion, i really appreciate your work and it's amazing what we do for free basically.

dry wave Jun 3, 2024, 11:52 PM

#

lucid swift Do u think that increasing channel count would also help the cascade model withou...

I'm very sure. It's the same issue. They have a huge 1000-dim latent space that they then compress into 16 dims

#

I think cascade was some kind of proof of concept. Showing that you can achieve incredible compression

#

but this amount of compression doesn't make sense if you want high quality output

lucid swift Jun 3, 2024, 11:54 PM

#

Yes, I wonder if an increased channel count could make it seem like a normal model with more details but still high compression for faster learning

dry wave Jun 3, 2024, 11:55 PM

#

I think the main advantage here is the multiple stages

#

doing composition first and then fine details

#

in sd you always use the same fat unet in every time step

#

but we know that most of the unet is not even used most of the time

lucid swift Jun 3, 2024, 11:56 PM

#

Stage b does not really add details currently. It's only like an insane vae

dry wave Jun 3, 2024, 11:57 PM

#

like the output of the down layers of the unet stays the same most of the timesteps but they are still computed all the time

#

doing staging just makes sense as composition on early timesteps is just a very different task from the later timesteps

dry wave Jun 3, 2024, 11:58 PM

#

lucid swift Stage b does not really add details currently. It's only like an insane vae

there is not such a big difference

#

like I think dallE is using a "vae" made out of a diffusion process

lucid swift Jun 3, 2024, 11:59 PM

#

dry wave doing staging just makes sense as composition on early timesteps is just a very ...

I have seen a paper where they remove attention on later steps of generating an image

fleet meteor Jun 3, 2024, 11:59 PM

#

rain current Taking it to the limit... 😨 "A collage of various intricate and artistic photo...

Wait is that ideogram or sd3? So impressive

lucid swift Jun 4, 2024, 12:00 AM

#

dry wave like I think dallE is using a "vae" made out of a diffusion process

If you are generous with the definition then cascade too

dry wave Jun 4, 2024, 12:00 AM

#

in principle, diffusion is a method to go from a random normal distribution to a complicated distribution. A VAE is doing something very similar.

lucid swift Jun 4, 2024, 12:01 AM

#

I am just so impressed with cascade because it learns 16 times faster. This can make finetuning or training in general more possible on consumer hardware

dry wave Jun 4, 2024, 12:01 AM

#

anyways, I go to bed. Good night

lucid swift Jun 4, 2024, 12:01 AM

#

dry wave anyways, I go to bed. Good night

Hehe same gn

dry wave Jun 4, 2024, 12:01 AM

#

I hadn't much luck with fine-tuning Cascade 🤷‍♂️

#

fine-tuning results in sdxl were always much better

lucid swift Jun 4, 2024, 12:02 AM

#

dry wave fine-tuning results in sdxl were always much better

Another advantage of cascade is that you only have one big CLIP model, I think it's 2B

#

But we can write tomorrow. I am following 2 cascade fine-tuning projects and both seem promising

sterile pendant Jun 4, 2024, 12:07 AM

#

dry wave but this amount of compression doesn't make sense if you want high quality outpu...

Stop comparing apples to oranges, sd3 doesn't use a unet, it uses a transformer based architecture now

#

Oh and also, 16 vae channels means that you have a lot better control at decoding an image vs the old 4 channel method

dull star Jun 4, 2024, 12:13 AM

#

wdym better control?

sterile pendant Jun 4, 2024, 12:20 AM

#

A VAE is what resolves an image from high-dimensional latent space. It takes some N-dimensional data and collapses it down to three dimensions: RGB. The more channels the VAE has, the more accurately it can do the job

#

It would be like comparing mono audio to stereo audio
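
To put rough numbers on the channel argument, here is a minimal sketch. It assumes the usual 8x spatial downsampling of the SD VAEs, with 4 latent channels for the older models and 16 for SD3; the function name `latent_stats` is made up for the example.

```python
def latent_stats(height, width, channels, downsample=8):
    """Return (latent_shape, compression_ratio) for an RGB image."""
    lat_h, lat_w = height // downsample, width // downsample
    pixel_values = height * width * 3          # RGB input
    latent_values = lat_h * lat_w * channels   # numbers the decoder gets to work with
    return (channels, lat_h, lat_w), pixel_values / latent_values

for name, ch in [("4-channel VAE", 4), ("16-channel VAE", 16)]:
    shape, ratio = latent_stats(1024, 1024, ch)
    print(f"{name}: latent {shape}, {ratio:.0f}x fewer values than the image")
```

Same spatial grid in both cases, but the 16-channel latent carries four times as many values per position, so the decoder has less to guess.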

hallow lion Jun 4, 2024, 12:25 AM

#

did you know VAE is a lossy process? XD

#

i didnt know

#

every time you decode and encode you lose quality =0

sterile pendant Jun 4, 2024, 12:28 AM

#

Yeah it's a form of compression and decompression

#

Oh and the extra channels thing also applies to the encoding part as well

hallow lion Jun 4, 2024, 12:35 AM

#

lucid swift But we can write tomorrow. I am following 2 cascade fine-tuning projects and both...

Wow Cascade gets some love after all? 😄

rugged nova Jun 4, 2024, 12:50 AM

#

rain current Taking it to the limit... 😨 "A collage of various intricate and artistic photo...

Pretty sure that just uses something like this: https://github.com/YangLing0818/RPG-DiffusionMaster

GitHub

GitHub - YangLing0818/RPG-DiffusionMaster: [ICML 2024] Mastering Te...

[ICML 2024] Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (PRG) - YangLing0818/RPG-DiffusionMaster

split ledge Jun 4, 2024, 1:06 AM

#

Hey there 🙂
Can I generate images with a lower resolution than 1024x1024 with sd3 ?

hallow lion Jun 4, 2024, 1:59 AM

#

Does it like the same prompting style as SDXL? Or natural language? Do we still need loads of negative prompts?

#

do we still do parentheses? for emphasis?

#

(((((((great hands:1.5))))))

turbid grotto Jun 4, 2024, 2:34 AM

#

guys

#

I am so happy about sd3

crude yarrow Jun 4, 2024, 2:37 AM

#

split ledge Hey there 🙂 Can I generate images with a lower resolution than 1024x1024 with s...

You can always downscale them if you need smaller resolution. Are you trying to save on credits or something?

sour harbor Jun 4, 2024, 3:31 AM

#

When do you think there'll be SD3 LoRA training? 🤔 So interesting

patent acorn Jun 4, 2024, 3:38 AM

#

turbid grotto I am so happy about sd3

most here are and not the doomers

sterile pendant Jun 4, 2024, 3:40 AM

#

crude yarrow You can always downscale them if you need smaller resolution. Are you trying to ...

That or they might be worried about inferencing locally when the model comes out

#

But I'd imagine SD3 can probably handle something like 768² without imploding

neon wagon Jun 4, 2024, 5:14 AM

#

we will see the first loras and finetuned models 3-6 months after the weights

sterile pendant Jun 4, 2024, 5:29 AM

#

Honestly, I'm waiting more for the controlnets than anything. From what they've said, the controlnets will be far better and easier to train than the hacky stuff that was needed to use them with unets.

deft wren Jun 4, 2024, 5:34 AM

#

We are so happy...

dusky thistle Jun 4, 2024, 5:53 AM

#

sterile pendant Honestly, I'm waiting more for the controlnets than anything. From what they've ...

very exciting

sterile pendant Jun 4, 2024, 6:16 AM

#

Loras/doras should be neat as well, but if controlnets can wrangle tough scenes, you won't need so many models of people trying to fix things like hands and whatnot (if you're working on people)

#

Loras will likely be a much bigger deal since the base model is already really decent. Good controlnets and maybe ipa (or some kind of DiT-compatible version that does the same kinds of things) will make things far easier than relying on overtrained models

dusky thistle Jun 4, 2024, 6:29 AM

#

yeah, i see the loras/doras as a way to add concepts

#

not fix stuff

#

(hopefully)

rain current Jun 4, 2024, 6:31 AM

#

fleet meteor Wait is that ideogram or sd3? So impressive

Ideogram. With SD3 these are the results

brave bloom Jun 4, 2024, 6:38 AM

#

#1237459938901491852 what's your model?

sterile pendant Jun 4, 2024, 7:05 AM

#

dusky thistle yeah, i see the loras/doras as a way to add concepts

Yep, exactly. But again, we'll see how it all plays out. Training loras for the 2b version should be pretty easy on resources though, well maybe as long as it's not being trained with both clips and t5.

#

If it's just 2b and the two clips, should likely be doable with even 12gb vram, maybe even 8 depending on the dim size
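
A back-of-the-envelope sketch of why the dim size matters, assuming "dim size" here means the LoRA rank. The hidden size 1536 is a hypothetical value picked only for the arithmetic, not a confirmed SD3 number.

```python
def lora_params(d_in, d_out, rank):
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix."""
    # LoRA learns W + B @ A, with A of shape (rank, d_in) and B of shape (d_out, rank)
    return rank * d_in + d_out * rank

def full_params(d_in, d_out):
    """Parameters in the frozen weight matrix itself."""
    return d_in * d_out

d = 1536  # hypothetical hidden size, for illustration only
for rank in (4, 16, 64):
    extra = lora_params(d, d, rank)
    share = 100 * extra / full_params(d, d)
    print(f"rank {rank}: {extra:,} trainable vs {full_params(d, d):,} frozen ({share:.2f}%)")
```

The trainable part grows linearly with the rank, which is why gradient and optimizer memory stay small at low dims; the frozen base weights still have to fit in VRAM either way.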

dry wave Jun 4, 2024, 7:56 AM

#

sterile pendant Stop comparing apples to oranges, sd3 doesn't use a unet, it uses a transformer ...

lol, the unet is a transformer based architecture 😂

sterile pendant Jun 4, 2024, 8:06 AM

#

dry wave lol, the unet is a transformer based architecture 😂

No, it's a convolutional neural network. The new MMDiT style is different and is very similar to what LLMs use.

#

So again, apples and oranges.

radiant ledge Jun 4, 2024, 8:10 AM

#

sdxl unet has some attention, but wouldn't call it a transformer

sterile pendant Jun 4, 2024, 8:12 AM

#

a unet is a cnn and the attention happens across the shape of a U, hence Unet

#

but vision transformers and convolutional neural networks are very different in how they work

#

"Vision Transformers and CNNs (Convolutional Neural Networks) are two different types of neural network architectures used to solve computer vision tasks. Vision Transformers are based on the Transformer architecture, originally designed for natural language processing, but adapted for image analysis. CNNs, on the other hand, are a type of deep learning network specifically designed for image recognition and classification."

#

https://www.edge-ai-vision.com/2024/03/vision-transformers-vs-cnns-at-the-edge/ from a quick google link to save time trying to explain it all

Edge AI and Vision Alliance

Vision Transformers vs CNNs at the Edge

This blog post was originally published at Embedl’s website. It is reprinted here with the permission of Embedl. “The Transformer has taken over AI”, says

#

"The main difference lies in their architectural design and the way they process visual information. While CNNs rely on the use of convolutional layers to extract features hierarchically, Vision Transformers utilize self-attention mechanisms to capture global dependencies and relations between image patches directly. This allows Vision Transformers to model long-range interactions within images more effectively than CNNs."

radiant ledge Jun 4, 2024, 8:15 AM

#

unet is not a pure CNN

sterile pendant Jun 4, 2024, 8:16 AM

#

but anyways, the moral of the story is that a cnn != dit. so stop sweating the parameter size differences because they function completely differently under the hood

#

it doesn't have to be a pure cnn, it's still a cnn

#

like with unets, you can still do things like self attention and whatnot, but at the core, it's still convolving

dry wave Jun 4, 2024, 8:22 AM

#

sterile pendant No, it's a convolutional neural network. The new MMDiT style is different and is...

it's a convolutional dnn with transformers. Yes, it's different architecture, but that doesn't invalidate any arguments.

dry wave Jun 4, 2024, 8:23 AM

#

sterile pendant like with unets, you can still do things like self attention and whatnot, but at...

this is just wrong

#

the sd unet is a transformer at its core

sterile pendant Jun 4, 2024, 8:24 AM

#

alright, so all these dozens of articles are just talking out their ass then ✅

dry wave Jun 4, 2024, 8:25 AM

#

the convolutions are necessary for some things like composition, downscaling, and adding positional information.
In the ViT architecture you also have downscaling operations, called patching, but they don't use convs
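
A toy sketch of what patching does, in plain Python with a made-up helper `patchify`. Real ViT code then applies a learned linear projection to each flattened patch; that step is often written as a strided convolution, which is mathematically the same thing.

```python
def patchify(image, patch):
    """Split an H x W grid (list of rows) into flattened patch tokens.

    Each patch x patch block becomes one token, so the token grid is
    (H / patch) x (W / patch): a downscale done purely by regrouping values.
    """
    h, w = len(image), len(image[0])
    assert h % patch == 0 and w % patch == 0
    tokens = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            token = [image[top + dy][left + dx]
                     for dy in range(patch) for dx in range(patch)]
            tokens.append(token)
    return tokens

image = [[r * 4 + c for c in range(4)] for r in range(4)]  # 4x4 toy "latent"
tokens = patchify(image, patch=2)
print(len(tokens), "tokens of length", len(tokens[0]))
print(tokens[0])  # the top-left 2x2 block, flattened
```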

dry wave Jun 4, 2024, 8:25 AM

#

sterile pendant alright, so all these dozens of articles are just talking out their ass then ✅

the articles don't talk about sd unets

sterile pendant Jun 4, 2024, 8:25 AM

#

https://en.wikipedia.org/wiki/U-Net

U-Net

U-Net is a convolutional neural network that was developed for biomedical image segmentation at the Computer Science Department of the University of Freiburg. The network is based on a fully convolutional neural network whose architecture was modified and extended to work with fewer training images and to yield more precise segmentation. Segment...

dry wave Jun 4, 2024, 8:26 AM

#

boah dude, I know what a unet is.

radiant ledge Jun 4, 2024, 8:26 AM

#

unet is a general term, you have to look at a specific implementation

dry wave Jun 4, 2024, 8:26 AM

#

exactly!!

radiant ledge Jun 4, 2024, 8:26 AM

#

unet just means scaling stuff down and then up again

dry wave Jun 4, 2024, 8:26 AM

#

the sd unet is a transformer architecture

sterile pendant Jun 4, 2024, 8:27 AM

#

it doesn't matter what flavor you're trying to talk about, sd unet is still a unet, they just hacked in some self attention. again, the way cnns and dits "see" are completely different and sd still "sees" like a cnn

dry wave Jun 4, 2024, 8:27 AM

#

there is also the hourglass transformer architecture which is... just a unet with another name. They don't use convolution so they gave it a new name;)

sterile pendant Jun 4, 2024, 8:27 AM

#

but i'm not going to argue it any further, keep thinking what you want

dry wave Jun 4, 2024, 8:28 AM

#

it's not "hacked in" some self attention

#

the transformers are the core component of the unet

#

the main differences in sd3:

you have positional embeddings
text and image embeddings share a common latent space and are transformed together
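
A minimal sketch of the "transformed together" point: text tokens and image patch tokens are concatenated into one sequence and run through the same attention. This is a toy with identity projections and made-up values. As far as I understand the SD3 paper, the real MMDiT blocks keep separate projection weights per modality and only share the attention operation itself.

```python
import math

def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(tokens):
    """Single-head self-attention with identity projections (toy version)."""
    dim = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(dim)])
    return out

text_tokens = [[1.0, 0.0], [0.0, 1.0]]               # 2 text tokens
image_tokens = [[0.5, 0.5], [1.0, 1.0], [0.0, 0.2]]  # 3 image patch tokens

# Joint attention: one sequence, so every token can attend to every other token.
joint = attention(text_tokens + image_tokens)
text_out, image_out = joint[:len(text_tokens)], joint[len(text_tokens):]
print(len(text_out), "text outputs,", len(image_out), "image outputs")
```

The contrast with the older UNets is that there the text only enters through cross-attention: image features query the text, but the text representation itself is never updated by the image.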

sterile pendant Jun 4, 2024, 8:29 AM

#

you're still missing the point of how the models "see" the data. i can run a c++ program that then runs a python program in between steps, but then jumps back into the c++ program. doesn't make it a python program.

#

that's what's happening in the sd unet essentially with self attention. all the actual real operations are still happening in the cnn

dry wave Jun 4, 2024, 8:30 AM

#

sterile pendant that's what's happening in the sd unet essentially with self attention. all the ...

that's just not true

radiant ledge Jun 4, 2024, 8:31 AM

#

there's a grand total of two convolutional layers in the sd unet

dry wave Jun 4, 2024, 8:31 AM

#

the model "sees" its data in some latent space. It's totally unimportant if convolutions are involved here. What's important is that in sd3 text and image share the same latent space

desert garnet Jun 4, 2024, 8:33 AM

#

dry wave the model "sees" its data in some latent space. It's totally unimportant if conv...

i see you are arguing again with the clown-bot

radiant ledge Jun 4, 2024, 8:34 AM

#

desert garnet i see you are arguing again with the clown-bot

you don't understand, someone is wrong on the internet

desert garnet Jun 4, 2024, 8:34 AM

#

tbf it takes a high iq to understand what that bot is saying

cobalt moon Jun 4, 2024, 8:35 AM

#

If I'm not wrong, U-Net is just a base?

desert garnet Jun 4, 2024, 8:36 AM

#

cobalt moon If I'm not wrong, U-Net is just a base?

no its an acid not a base

radiant ledge Jun 4, 2024, 8:36 AM

#

cobalt moon If I'm not wrong, U-Net is just a base?

not sure what you mean with that

desert garnet Jun 4, 2024, 8:36 AM

#

acid = 🍋

cobalt moon Jun 4, 2024, 8:36 AM

#

I actually have no idea too lol. Didn't learn much about architecture or machine learning networks

desert garnet Jun 4, 2024, 8:36 AM

#

i think we need to eat more lemons

sterile pendant Jun 4, 2024, 8:39 AM

#

again, i guess all these dozens of resources are all just wrong about it then... sd's unet has some elements of transformers in it, yes (some attention), but it is still a cnn and still revolves around the unet. sd3 uses an actual transformer network that is completely centered around it. it's what llms have been using for ages and is very different under the hood. until recently, it was a pain in the ass to make work with things like image generation while keeping the hardware (vram/performance) and training costs from ballooning out.

desert garnet Jun 4, 2024, 8:40 AM

#

https://tenor.com/view/id-like-to-speak-with-the-manager-karen-angry-customer-whos-your-supervisor-refund-gif-16365889

Tenor

left parrot Jun 4, 2024, 9:45 AM

#

Has there been any news about SD3 Turbo lately?

dry wave Jun 4, 2024, 10:51 AM

#

sterile pendant again, i guess all these dozens of resources are all just wrong about it then......

dude, 85% of SDXL is transformers. Only a tiny 0.7% of the model is convolutions

#

the transformers in SDXL are much bigger than the transformers in SD3

#

and you tell me "it has some attention"

#

yes, the model architectures differ. And yes, this can have some implications. For example, you probably won't see these weird duplications in SD3 at super-high resolutions, as these are artifacts from the convolution. So, using convolutions instead of positional embeddings definitely has some effect

pseudo stone Jun 4, 2024, 11:04 AM

#

dry wave yes, the model architectures differ. And yes, this can have some implications. F...

personally i'm doubtful sd3 will even be a better "model" considering all the improvements and ecosystem the community has built around sdxl

#

like will it be good enough for people to retrain everything i dont know

lucid swift Jun 4, 2024, 11:04 AM

#

hallow lion Wow Cascade gets some love after all? 😄

yess!! this is from one anime finetune

pseudo stone Jun 4, 2024, 11:05 AM

#

Sote Diffusion <3

dry wave Jun 4, 2024, 11:06 AM

#

pseudo stone personally i'm doubtful sd3 will even be a better "model" considering all the imp...

I think CLIP is at its limit. For real prompt understanding you need T5. But yeah, maybe we get something like ELLA for SDXL and it will then be better than SD3

#

although SD3 has some cool technical features I like and which might indeed work better than SDXL

pseudo stone Jun 4, 2024, 11:07 AM

#

dry wave I think CLIP is at its limit. For real prompt understanding you need T5. But ye...

what about emma https://wrong.wang/blog/20240512-what-is-emma/

wrong.wang

What is EMMA

After completing the work on ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment, my objective shifted towards the lightweight and cost-effective transformation of the Stable Diffusion series models into image generation models that are conditioned on cross-modal sequences of text and images. I explored various approaches for i...

low stone Jun 4, 2024, 11:18 AM

#

pseudo stone what about emma https://wrong.wang/blog/20240512-what-is-emma/

Ella/emma is all great and wonderful and I use Ella a lot, but because it's all restricted to sd 1.5, it's going to fall by the wayside once sd3 is out. If they had released it for sdxl, that would be a different story, but that's never going to happen.

pseudo stone Jun 4, 2024, 11:19 AM

#

would it cost much to train though?

#

what about that adapter that used pre trained llms as an encoder

low stone Jun 4, 2024, 11:20 AM

#

The guy already trained it. Ella for sdxl is finished, but they couldn't release it because sdxl has a different commercial license than sd 1.5

dull star Jun 4, 2024, 11:20 AM

#

man if comfyui_tensorrt had pixart support too

#

that would be so awesome

#

but if SD3 comes out and they optimize that too, it would be great as well

low stone Jun 4, 2024, 11:22 AM

#

I'm sure they will. Hunyuan released tensorRT libraries the other day. It'll be neat if those are comfy integrated. I need to message the author of the comfy extra models nodes to see.

dull star Jun 4, 2024, 11:22 AM

#

wow

#

honestly, SD3 2B is so close

#

I just cannot wait to finetune lora models or textual inversions

lucid swift Jun 4, 2024, 11:24 AM

#

dry wave I think CLIP is at its limit. For real prompt understanding you need T5. But ye...

and why not make ella for sd3 xD

cunning lintel Jun 4, 2024, 11:25 AM

#

lucid swift and why not make ella for sd3 xD

One might hope it makes more sense to train/finetune the model directly, as ella is just bolting on t5

lucid swift Jun 4, 2024, 11:26 AM

#

low stone The guy already trained it. Ella for sdxl is finished, but they couldn't release...

do you mean the guy who is reverse engineering it? and what about the license doesn't allow training on sdxl?

lucid swift Jun 4, 2024, 11:26 AM

#

cunning lintel One might hope it makes more sense to train/finetune the model directly, as ella...

but t5 is good

cunning lintel Jun 4, 2024, 11:27 AM

#

But i'm really curious about an sdxl ella as well, initially i thought sd3 would be miles better than the ella approach, but sd3 obviously has limitations as well, it'd be interesting if ella and sd3 prompt understanding turned out to be in the same ballpark

low stone Jun 4, 2024, 11:29 AM

#

lucid swift do you mean the guy who is reverse engineering it? and what about the license is...

I mean the guy who put out Ella for sd 1.5. He also did sdxl but won't release it.

cunning lintel Jun 4, 2024, 11:30 AM

#

lucid swift but t5 is good

but it's already part of sd3 (as i understand it, the whole thing about sd3 is that it can deal with various inputs/outputs, so a solution like ella might be obsolete for the new architecture)

dry wave Jun 4, 2024, 11:30 AM

#

lucid swift do you mean the guy who is reverse engineering it? and what about the license is...

reverse engineering what? it's open source

low stone Jun 4, 2024, 11:30 AM

#

dry wave Jun 4, 2024, 11:30 AM

#

and the SDXL licence totally allows you to release something like ELLA oO

#

I think he doesn't want to release it for other reasons...

low stone Jun 4, 2024, 11:31 AM

#

dry wave and the SDXL licence totally allows you to release something like ELLA oO

The Ella guy works for tencent or one of these other big Chinese companies. They'd probably be subject to more than the $20 a month commercial license.

dry wave Jun 4, 2024, 11:32 AM

#

the commercial licence is just for sdxl turbo

lucid swift Jun 4, 2024, 11:32 AM

#

dry wave reverse engineering what? it's open source

the training code not

dry wave Jun 4, 2024, 11:32 AM

#

I rather think that this big company he works for doesn't want him to release the weights

cunning lintel Jun 4, 2024, 11:32 AM

#

ella guys also said the sdxl version was finetuned, might be that the images used for the finetune are the licensing problem. or it's simply that they want to use the sdxl version for their own image gen

lucid swift Jun 4, 2024, 11:32 AM

#

low stone I mean the guy who put out Ella for sd 1.5. He also did sdxl but won't release i...

isn't that alibaba or another Chinese company

dry wave Jun 4, 2024, 11:33 AM

#

I assume the latter

lucid swift Jun 4, 2024, 11:34 AM

#

dry wave I think he doesn't want to release it for other reasons...

they said on github that it would take too much work to make sure it's "secure". and it's by a big Chinese company and not just a guy

low stone Jun 4, 2024, 11:34 AM

#

lucid swift isn't that alibaba or another Chinese company

https://github.com/TencentQQGYLab/ELLA yeah it's tencent

GitHub

GitHub - TencentQQGYLab/ELLA: ELLA: Equip Diffusion Models with LLM...

ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment - TencentQQGYLab/ELLA

lucid swift Jun 4, 2024, 11:34 AM

#

low stone https://github.com/TencentQQGYLab/ELLA yeah it's tencent

it's so annoying, these companies always say we will release this cool shit and they never do it. remember that dance ai

#

why are so many Chinese ai companies like that

dry wave Jun 4, 2024, 11:35 AM

#

lucid swift the training code not

there is also training code. But there are also many community written training codes for SDXL. This is no secret stuff.

lucid swift Jun 4, 2024, 11:35 AM

#

dry wave there is also training code. But there are also many community written training ...

training ella seems different than just training sdxl. i think you freeze sdxl and t5 and only train a small adapter model

dry wave Jun 4, 2024, 11:35 AM

#

yes

#

similar to ipadapter

lucid swift Jun 4, 2024, 11:36 AM

#

dry wave similar to ipadapter

and i have not seen anyone finetune that

dry wave Jun 4, 2024, 11:36 AM

#

maybe I just understood you wrong. You mean the training code for ELLA is not open

lucid swift Jun 4, 2024, 11:36 AM

#

dry wave maybe I just understood you wrong. You mean the training code for ELLA is not open

yes

#

and he is trying to reverse engineer the training code

dry wave Jun 4, 2024, 11:37 AM

#

lucid swift and i have not seen anyone finetune that

oh, I think people did. To be honest: It makes much more sense to train stuff like ipadapter than using controlnets for everything...

#

but even adapter training costs a lot of compute, usually more than you can afford with consumer hardware

lucid swift Jun 4, 2024, 11:38 AM

#

dry wave oh, I think people did. To be honest: It makes much more sense to train stuff li...

can you show me who finetuned that? it's cool

dry wave Jun 4, 2024, 11:38 AM

#

I mean there are several ipadapters out there

lucid swift Jun 4, 2024, 11:38 AM

#

dry wave oh, I think people did. To be honest: It makes much more sense to train stuff li...

isn't ip adapter just a very advanced and a bit different controlnet?

dry wave Jun 4, 2024, 11:39 AM

#

it works very differently

#

because it injects the conditioning via cross attention

lucid swift Jun 4, 2024, 11:39 AM

#

dry wave but even adapter training costs a lot of compute, usually more than you can effo...

thats probably it

dry wave Jun 4, 2024, 11:39 AM

#

while the controlnet is "mimicking" the unet

#

so controlnets have the disadvantage that they take a lot of resources/performance. Basically they are as huge as the base model itself

lucid swift Jun 4, 2024, 11:40 AM

#

but isn't a controlnet also injecting it like that? and they just finetune a copy of the unet for faster training

dry wave Jun 4, 2024, 11:40 AM

#

the advantage of controlnets is that they are initialized by the base model, so they already "know a lot about images" and can be trained faster

lucid swift Jun 4, 2024, 11:40 AM

#

dry wave the advantage of controlnets is that they are initialized by the base model, so ...

yes

dry wave Jun 4, 2024, 11:41 AM

#

no, controlnets just use "addition"

#

basically they compute some delta they add ontop of the original unet

#

which doesn't mean it's less powerful. But because they have to be as big as the original unet they are very resource-ineffective

#

also you can only use controlnets for images

#

while the ipadapter idea can be used on any kind of input data (including T5 text prompts like in ELLA)
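
The two injection styles side by side, as a toy sketch in plain Python. The projections are left out, all values are made up, and the function names are only for illustration: the ControlNet side is reduced to "add a residual", and the IP-Adapter side to "run a second cross-attention over the extra tokens and sum it with the text one".

```python
import math

def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, context):
    """Each query attends over the context tokens (identity projections, toy only)."""
    dim = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in context]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, context)) for i in range(dim)])
    return out

def controlnet_style(unet_features, control_residual):
    """ControlNet idea: a trainable copy of UNet blocks produces a delta that is added."""
    return [[f + r for f, r in zip(f_row, r_row)]
            for f_row, r_row in zip(unet_features, control_residual)]

def ip_adapter_style(queries, text_ctx, extra_ctx, scale=1.0):
    """IP-Adapter idea: an extra cross-attention branch, summed with the text branch."""
    text_out = cross_attention(queries, text_ctx)
    extra_out = cross_attention(queries, extra_ctx)
    return [[t + scale * e for t, e in zip(t_row, e_row)]
            for t_row, e_row in zip(text_out, extra_out)]

features = [[1.0, 0.0], [0.0, 1.0]]
text_ctx = [[0.2, 0.8], [0.9, 0.1]]
extra_ctx = [[1.0, 1.0]]  # could be image tokens, or projected T5 tokens
print(controlnet_style(features, [[0.1, 0.1], [-0.1, 0.0]]))
print(ip_adapter_style(features, text_ctx, extra_ctx, scale=0.5))
```

In the additive case the residual has to match the UNet feature shape, which is why the control branch ends up roughly as big as the blocks it copies. The attention branch only needs a sequence of tokens, so any encoder can feed it.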

lucid swift Jun 4, 2024, 11:43 AM

#

dry wave while the ipadapter idea can be used on any kind of input data (including T5 tex...

interesting, i didn't know it was this flexible

#

but the disadvantage is a high training cost? how high do you think?

dry wave Jun 4, 2024, 11:44 AM

#

training code for ipadapter is also on github

#

if you really want to train something like ELLA the costs will be massive

#

the problem is that if you use CLIP+T5 then SD will probably just ignore the T5, as the information from CLIP is much more easily accessible

#

so you probably have to train it like in SD3 that it gets T5 information only sometimes to really encourage it to learn something

#

but as T5 embeddings do not align at all with images... it will be much harder to learn from T5 than learning from CLIP

#

basically, T5 is totally alien for SD. It knows nothing about this latent space and it has to learn everything from scratch
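
A rough sketch of the "gets T5 information only sometimes" idea: drop each text encoder's embedding independently during training, so that on some steps T5 is the only signal left. The 0.5 rate, the zeroing, and the tiny embeddings are placeholders for illustration, not the values SD3 actually uses.

```python
import random

def drop_encoders(embeddings, drop_rate, rng):
    """Independently zero out each text encoder's embedding for one training sample."""
    kept = {}
    for name, emb in embeddings.items():
        if rng.random() < drop_rate:
            kept[name] = [0.0] * len(emb)  # dropped: the model sees zeros (an assumption)
        else:
            kept[name] = emb
    return kept

rng = random.Random(0)
sample = {"clip_l": [0.1, 0.2], "clip_g": [0.3, 0.4], "t5": [0.5, 0.6]}
steps = 10_000
t5_only = 0
for _ in range(steps):
    out = drop_encoders(sample, drop_rate=0.5, rng=rng)  # 0.5 is a placeholder rate
    clips_dropped = not any(out["clip_l"]) and not any(out["clip_g"])
    if clips_dropped and any(out["t5"]):
        t5_only += 1
print(f"T5 was the only signal in {t5_only / steps:.1%} of steps")
```

On those steps the model cannot lean on CLIP, so it is forced to get something out of the T5 embedding.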

lucid swift Jun 4, 2024, 11:47 AM

#

interesting! i just think its so sad that everything is so expensive.

dry wave Jun 4, 2024, 11:48 AM

#

dunno. I mean you can rent gpus for relatively low amount of money

lucid swift Jun 4, 2024, 11:48 AM

#

i wonder if neural networks that create neural networks will reduce the cost in the future

dry wave Jun 4, 2024, 11:48 AM

#

but then it's an expensive hobby ^^

#

I think most people, if they spent a lot of money into that, want some money back

dry wave Jun 4, 2024, 11:48 AM

#

lucid swift i wonder if neural networks that create neural networks will reduce the cost in f...

I don't believe in that stuff xD

lucid swift Jun 4, 2024, 11:48 AM

#

dry wave I don't believe in that stuff xD

why? too complex?

dry wave Jun 4, 2024, 11:49 AM

#

just not something AI is good at

lucid swift Jun 4, 2024, 11:49 AM

#

dry wave just not something AI is good at

how do you have a feeling for what it's good at? isn't that also just data distribution

dull star Jun 4, 2024, 11:49 AM

#

would we have to truncate our prompts when training loras and stuff?

dry wave Jun 4, 2024, 11:50 AM

#

no, because if you want it to find something new and better than humans could do, it's outside the data distribution

#

all cases where "AI found some cool new algorithm no human ever found" so far were exaggerated. Like what they usually did was just trying billions of different algorithms and using the best one. You didn't even need a neural network for that, you could just combinatorially generate code
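
What "combinatorially generate code" can look like at toy scale: enumerate every small expression tree and keep the first one that matches the target on some test inputs. The operator set, leaves, and target function are made up for the example, and real search systems are far more elaborate than this.

```python
from itertools import product

OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}
LEAVES = ["x", "1", "2"]

def evaluate(expr, x):
    """expr is either a leaf string or a tuple (op, left, right)."""
    if expr == "x":
        return x
    if isinstance(expr, str):
        return int(expr)
    op, left, right = expr
    return OPS[op](evaluate(left, x), evaluate(right, x))

def candidates(depth):
    """Enumerate every expression tree up to the given depth."""
    if depth == 0:
        yield from LEAVES
        return
    subtrees = list(candidates(depth - 1))
    yield from subtrees
    for op, left, right in product(OPS, subtrees, subtrees):
        yield (op, left, right)

def search(target, inputs, depth=2):
    """Return the first enumerated expression that agrees with target on all inputs."""
    for expr in candidates(depth):
        if all(evaluate(expr, x) == target(x) for x in inputs):
            return expr
    return None

found = search(lambda x: x * x + 1, inputs=range(-3, 4))
print(found)
```

No learning involved, just enumeration plus a check. The search space explodes with depth, which is where the "billions of candidates" come from.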

#

what LLMs can do today is writing you python code that makes a DNN as we have it already

#

you could just write it yourself

lucid swift Jun 4, 2024, 11:51 AM

#

dry wave no, because if you want it to find something new and better than humans could do...

but humans are also able to find new stuff why cant ai do it?

dry wave Jun 4, 2024, 11:51 AM

#

so in best case you don't have to learn python

#

because it's an ongoing discussion how much "AI" is currently in our "AI". Some people might say all ChatGPT is doing is just autocompletion via statistical inference. No real thinking.

#

it's hard to say, though, where is the border between statistical inference and real thinking

lucid swift Jun 4, 2024, 11:53 AM

#

dry wave because it's an ongoing discussion how much "AI" is currently in our "AI". Some ...

my answer to that is always: how do you know your brain is not an advanced autocompletion

dry wave Jun 4, 2024, 11:53 AM

#

I would say because we have MUCH LESS training data

#

like I learned programming from a few examples

#

ChatGPT needs millions of programming examples to learn something

#

from few examples you cannot do statistical inference

lucid swift Jun 4, 2024, 11:54 AM

#

maybe the brain is just better at autocompletion with less data than current models

dry wave Jun 4, 2024, 11:54 AM

#

as said, it's an ongoing discussion. Nobody knows the truth yet

lucid swift Jun 4, 2024, 11:55 AM

#

i agree. i just want to know your opinion

dry wave Jun 4, 2024, 11:55 AM

#

but while I think that large llms do have some kind of understanding about the data they process

#

I still doubt that they are able to reason at the level of a human (or even above)

lucid swift Jun 4, 2024, 11:55 AM

#

i 100% agree llms are not even close

dry wave Jun 4, 2024, 11:55 AM

#

and I don't think they are able to generate a new scientific discovery or algorithm or something like that

#

so far they can only assist humans in doing so

#

(which, to be honest, is totally fine for me xD)

lucid swift Jun 4, 2024, 11:56 AM

#

yess xD

#

not being replaced for now

cunning lintel Jun 4, 2024, 11:59 AM

#

Biggest thing with llms is they're so confidently wrong, and using llms for a field you're not familiar with, you won't know it's wrong

lucid swift Jun 4, 2024, 12:00 PM

#

dry wave (which, to be honest, is totally fine for me xD)

i wonder if we will learn good algorithms by reverse engineering nature. this is very interesting. https://www.youtube.com/watch?v=8Ukin_-5aLQ

YouTube

MITCBMM

How fly neurons compute the direction of visual motion

Alexander Borst, Max-Planck-Institute for Biological Intelligence, Martinsried, Germany

Abstract: Detecting the direction of image motion is important for visual navigation, predator avoidance and prey capture, and thus essential for the survival of all animals that have eyes. However, the direction of motion is not explicitly represented at th...

#

they did reverse engineer part of the fly brain

cunning lintel Jun 4, 2024, 12:00 PM

#

(but that leaves the question, what is wrong? llms learn from lots of data, they don't understand the difference between high quality data/low quality data, they just "remember")

lucid swift Jun 4, 2024, 12:01 PM

#

cunning lintel Biggest thing with llms is they're so confidently wrong, and using llms for a fi...

yes thats a big problem

#

and that they can generate seemingly right stuff

cunning lintel Jun 4, 2024, 12:01 PM

#

humans aren't that great at it either, they use lots of heuristics to validate their data, authority figures, popular opinion, personal experience, etc.

dry wave Jun 4, 2024, 12:01 PM

#

lucid swift i wonder if we will learn good algorithms by reverse engineering nature. this is v...

I don't believe in that, either xD

lucid swift Jun 4, 2024, 12:02 PM

#

and that they sometimes can't do stuff if they are overtrained. like some models do everything in lists even if you say they should stop, or some add emojis into everything even if you say they should stop

lucid swift Jun 4, 2024, 12:02 PM

#

dry wave I don't believe in that, either xD

i mean it worked for that tiny part

dry wave Jun 4, 2024, 12:02 PM

#

like yes, biological brains are far superior, but we don't really know how they work and how/if we can simulate that

lucid swift Jun 4, 2024, 12:03 PM

#

i am sure they can be simulated. but yes we mostly dont know how

dry wave Jun 4, 2024, 12:03 PM

#

my problem is just that neural network research has for decades been full of this biological bullshit

#

like people came up with a mathematical/statistical solution and then they added some biological bullshit to sell it/get more funding/make it more interesting

lucid swift Jun 4, 2024, 12:04 PM

#

yes thats stupid. but if you look at the fly example it is very cool

dry wave Jun 4, 2024, 12:05 PM

#

convolutional neural networks work like the human visual cortex. WTF. How often people repeat this bullshit. You know what? they also work like EVERY stupid filter in ANY graphic program. Convolution is a totally normal mathematical operation and it has been used for decades for image processing
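
For reference, the plain image-filter version of convolution, as a small sketch in pure Python. The box blur and Sobel kernels are the standard textbook ones; the 4x4 image is made up.

```python
def convolve2d(image, kernel):
    """'Valid' 2D convolution of a list-of-rows image with a small kernel."""
    kh, kw = len(kernel), len(kernel[0])
    flipped = [row[::-1] for row in kernel[::-1]]  # true convolution flips the kernel
    out = []
    for y in range(len(image) - kh + 1):
        row = []
        for x in range(len(image[0]) - kw + 1):
            row.append(sum(image[y + j][x + i] * flipped[j][i]
                           for j in range(kh) for i in range(kw)))
        out.append(row)
    return out

box_blur = [[1 / 9] * 3 for _ in range(3)]             # averages a 3x3 neighbourhood
sobel_x = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]         # responds to vertical edges

image = [[0, 0, 9, 9] for _ in range(4)]               # dark left half, bright right half
print(convolve2d(image, box_blur))
print(convolve2d(image, sobel_x))
```

A conv layer in a network is the same operation; the only difference is that the kernel values are learned instead of hand-picked.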

lucid swift Jun 4, 2024, 12:05 PM

#

reminds me of the universe is a neural network paper xD

dry wave Jun 4, 2024, 12:06 PM

#

the sigmoid function simulates a biological neuron. Nah, it doesn't do that. A sigmoid function is first and foremost logistic regression, which has been used in statistics for decades

lucid swift Jun 4, 2024, 12:07 PM

#

dry wave - the sigmoid function simulates a biological neuron. Nah, it doesn't do that. A...

but neurons have an activation threshold and that's just a simple way of making a model of it. or am i wrong?

dry wave Jun 4, 2024, 12:07 PM

#

my highlight: a few months ago Nature published a paper about "neural networks that dream". They claimed that they were inspired by "sleep research in humans" and came up with "improving neural networks by letting them sleep and dream, too". You know what they did? They "reinvented" the "regularization images" idea from the DreamBooth paper. Yes, adding regularization data improves learning. But that's not how you make it into Nature. You have to come up with a fancy but totally unscientific idea of letting networks dream
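
For reference, the regularization-images idea is usually written as a prior-preservation term: the loss on the new subject's images plus a weighted loss on generic class images. A rough sketch with plain lists standing in for model outputs; the function names and `prior_weight` are illustrative and not taken from any particular trainer.

```python
def mse(pred, target):
    """Mean squared error between two equal-length lists of numbers."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def prior_preservation_loss(inst_pred, inst_target, reg_pred, reg_target, prior_weight=1.0):
    """Loss on the new subject plus a weighted loss on regularization (class) samples."""
    instance_loss = mse(inst_pred, inst_target)  # learn the new concept
    prior_loss = mse(reg_pred, reg_target)       # keep what the model already knew
    return instance_loss + prior_weight * prior_loss


# Perfect fit on the subject, but drift on the class images still gets penalized.
loss = prior_preservation_loss([1.0, 2.0], [1.0, 2.0], [0.0, 0.0], [1.0, 1.0], prior_weight=0.5)
```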

dry wave Jun 4, 2024, 12:08 PM

#

lucid swift but neurons have an activation threshold and that's just a simple way of making a ...

a sigmoid function is a basis function and that's why it works in deep neural networks. It only works, though, if you make it not too steep. So it only works if it doesn't look like an activation threshold in a neuron
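
A small sketch of the steepness argument, assuming a logistic function with a steepness factor `k` (a parameter added here for illustration): as `k` grows the curve approaches a hard threshold, and the gradient away from zero collapses, which leaves gradient descent nothing to work with.

```python
import math


def sigmoid(x, k=1.0):
    """Logistic function with steepness k; large k approaches a hard threshold."""
    z = k * x
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # numerically stable branch for negative inputs
    return e / (1.0 + e)


def sigmoid_grad(x, k=1.0):
    """Derivative with respect to x: k * s * (1 - s)."""
    s = sigmoid(x, k)
    return k * s * (1.0 - s)


for k in (1.0, 10.0, 100.0):
    print(f"k={k:>5}: sigmoid(0.5)={sigmoid(0.5, k):.4f}  grad(0.5)={sigmoid_grad(0.5, k):.2e}")
```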

lucid swift Jun 4, 2024, 12:09 PM

#

dry wave - my highlight: a few months ago nature published a paper about "neural networks...

i agree i never understood that dreaming comparison they often do with ai

#

it's more like hallucinating in most cases

dry wave Jun 4, 2024, 12:09 PM

#

but even if its like an activation threshold. So what? Its like saying "A unet resembles a human ass because it is also shaped like that" 😬

lucid swift Jun 4, 2024, 12:10 PM

#

dry wave a sigmoid function is a basis function and that's why it works in deep neural ne...

i mean you can also train the activation function instead of the other weights

dry wave Jun 4, 2024, 12:10 PM

#

yeah xD Kolmogorov-Arnold Networks

#

which is neat, but honestly, we had this decades ago and called it "generalized additive models"

lucid swift Jun 4, 2024, 12:10 PM

#

for a good comparison you probably have to train both

#

but idk if that would be good or not xD

dry wave Jun 4, 2024, 12:12 PM

#

me neither. I just found it funny, because the idea is so old and now they treat it as something totally new

#

but sure, if it works better, then it would be cool. I'm sceptical, though

lucid swift Jun 4, 2024, 12:13 PM

#

but i have seen the training of activation functions a long time ago. i think it was controlling a walking spider or something

#

i also thought it's strange that many said it's new

dry wave Jun 4, 2024, 12:13 PM

#

anyways, I just hate this kind of "we have to find analogies from biology to sell people AI". One of the really nice things about transformers is: people haven't found any biological analogy for it so far xD Like for the first time nobody could say "yeah chatgpt is a transformer which is like XYZ in the human brain". Really nice

lucid swift Jun 4, 2024, 12:14 PM

#

dry wave anyways, I just hate this kind of "we have to find analogies from biology to sel...

i mean i don't know much about the details but you could make a comparison of a neuron connecting different deeper layers together but idk

dry wave Jun 4, 2024, 12:14 PM

#

KAN networks use linear combinations of 1D splines. That's what is usually called a "generalized additive model" (GAM). The only difference in KAN is that they use more than one layer. GAMs usually only use 1 layer, because you use them when you want an additive, interpretable model.
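
A toy sketch of that comparison, under loud assumptions: piecewise-linear functions stand in for the splines (the KAN paper uses B-splines plus a base activation), the knot values are set by hand instead of trained, and all names are made up. One layer is a plain sum of univariate functions; stacking a second layer is the KAN-style step.

```python
def pw_linear(x, knots, values):
    """1D piecewise-linear spline: interpolate `values` over sorted `knots`, clamped at the ends."""
    if x <= knots[0]:
        return values[0]
    if x >= knots[-1]:
        return values[-1]
    for k in range(len(knots) - 1):
        if knots[k] <= x <= knots[k + 1]:
            t = (x - knots[k]) / (knots[k + 1] - knots[k])
            return (1 - t) * values[k] + t * values[k + 1]


def additive_layer(inputs, splines, knots):
    """splines[j][i] holds the knot values of the 1D function from input i to output j.
    Each output is the plain sum of its univariate functions."""
    return [
        sum(pw_linear(x, knots, vals) for x, vals in zip(inputs, row))
        for row in splines
    ]


KNOTS = [-1.0, 0.0, 1.0]

# One layer, 2 inputs -> 1 output: f(x1, x2) = |x1| + x2 (additive, easy to read off).
single_layer = [[[1.0, 0.0, 1.0], [-1.0, 0.0, 1.0]]]

# Two layers: h = x1 + x2, then y = |h|. The result |x1 + x2| mixes the inputs,
# which no single additive layer can express.
layer1 = [[[-1.0, 0.0, 1.0], [-1.0, 0.0, 1.0]]]
layer2 = [[[1.0, 0.0, 1.0]]]


def two_layer(inputs):
    return additive_layer(additive_layer(inputs, layer1, KNOTS), layer2, KNOTS)
```

In a real model the knot values would be the trainable parameters; here they are fixed so the behaviour can be checked by hand.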

radiant ledge Jun 4, 2024, 1:25 PM

#

dry wave KAN networks use linear combinations of 1D splines. That's what is usually calle...

has anyone managed to scale KANs beyond toy problems yet anyway?

dry wave Jun 4, 2024, 1:51 PM

#

not as far as I know

warped ivy Jun 4, 2024, 2:42 PM

#

glif-stablediffusion-3-mitu77-tafywggle74vwzitnp81p086.jpg

dull star Jun 4, 2024, 2:57 PM

#

lol

#

> They have a Large and X-Large that are not being released

Look, misinformation is very funny, but it's getting old

rose gate Jun 4, 2024, 3:08 PM

#

yeah the ragebait is getting boring, the sd reddit is also filled with it

mortal mesa Jun 4, 2024, 3:23 PM

#

when i was a kid we didnt include companies into our core beliefs to be defended or whatnot

cunning lintel Jun 4, 2024, 3:25 PM

#

dull star > They have a Large and X-Large that are not being released Look, misinformation...

🤷‍♂️ I don't disagree, but SAI is just as much to blame for the shit poured over them, they should have invested in proper communications a LONG time ago. Even this "outrage" could have been prevented by simply better wording. Something like "Stable Diffusion 3, our most advanced text-to-image is on its way! You will be able to download the weights for the Medium model on Hugging Face from Wednesday 12th June, while we continue to prepare the other 3 versions for later public release"

#

But i agree, it sucks, internet sucks, just don't give sooooo much room for all this misinfo and misrepresentation 😢

woeful spindle Jun 4, 2024, 3:26 PM

#

cunning lintel 🤷‍♂️ I don't disagree, but SAI is just as much to blame for the shit poured ov...

nah. you gotta create some drama

dull star Jun 4, 2024, 3:27 PM

#

cunning lintel 🤷‍♂️ I don't disagree, but SAI is just as much to blame for the shit poured ov...

yeah the way they worded "SD3 is coming on june 12th" is quite vague and misleading though I have to agree

mortal mesa Jun 4, 2024, 3:27 PM

#

ya sure they could have added an "a"

dull star Jun 4, 2024, 3:27 PM

#

In this server they got it right though:

The “weight” is nearly over! Today, at Computex Taipei, our Co-CEO, Christian Laforte, officially announced the open release date of Stable Diffusion 3 Medium for June 12th.

#

#📣｜announcements

#

its the email that's stupid

> Have you heard that the SD3 weights are dropping soon?

it's saying weights, like multiple

#

ok wait it's saying "Stable Diffusion 3 Medium, our most advanced text-to-image is on its way!" right after it...

desert garnet Jun 4, 2024, 3:28 PM

#

yea still remember when ppl were posting stuff emad said about "2 weeks, soon" like 2 months ago

dull star Jun 4, 2024, 3:29 PM

#

people don't want to read past the first few sentences, so it kinda makes sense

#

both in the email and in the #📣｜announcements its saying that Medium is coming

#

but yeah, saying "the weights" is misleading, not a good headline

desert garnet Jun 4, 2024, 3:32 PM

#

this is one of the juicy ones https://www.reddit.com/r/StableDiffusion/comments/1be1g74/per_emad_on_twitter_sd3_weights_expected_to/

dull star Jun 4, 2024, 3:33 PM

#

Emad: I'd expect proper release next month (weights)

mortal mesa Jun 4, 2024, 3:34 PM

#

with Cnets

dull star Jun 4, 2024, 3:34 PM

#

kek

#

we, including Emad, heavily underestimated how much time these models needed to train

#

and we're getting 2B right now, 8B has a long way to go

#

we won't see 8B until like august or september, maybe even october

#

but when it comes, it could be a DALLE3 killer

#

if a fully trained 2B gets on the level (if not above) of an undertrained 8B, then I can't imagine 8B trained to its fullest potential

desert garnet Jun 4, 2024, 3:36 PM

#

lets see if sai makes it to october

dull star Jun 4, 2024, 3:36 PM

#

or even 4B

#

yeah... 😬

#

they need 2B released to get money from the subscription

#

they should switch the API model from SD3 8B to SD3 2B with highresfix

#

and it actually becomes a competitive product, like Core (which is just a heavily finetuned SDXL Turbo with a workflow)

mortal mesa Jun 4, 2024, 3:39 PM

#

soo good ive never heard of it teehee

silver sluice Jun 4, 2024, 4:03 PM

#

dull star lol

to be clear "large" and "xlarge" are not being released ever or right away? have they said whether they plan to release the larger versions at all in the future?

dull star Jun 4, 2024, 4:03 PM

#

https://tenor.com/view/xenoverse-goku-super-saiyan-angry-dbz-gif-1416275111944307575

silver sluice Jun 4, 2024, 4:03 PM

#

dull star we won't see 8B until like august or september, maybe even october

oh that answers it thanks for answering it before i asked just caught up

dull star Jun 4, 2024, 4:04 PM

#

Alex (mcmonkey):

We're on track to release the SD3 models* (note the 's', there's multiple - small/1b, medium/2b, large/4b, huge/8b) for free as they get finished.

silver sluice Jun 4, 2024, 4:04 PM

#

dull star Does anyone know how PonySD3 would be trained

oh and to answer your question from earlier here's the link to the article i was talking about:

https://civitai.com/articles/5069/towards-pony-diffusion-v7

and the key quote:

I am keen on training V7 using SD3, although it's currently uncertain whether we will have access to the model weights. I remain hopeful and would be delighted if someone from SAI could discuss this possibility with me. Despite my efforts to reach out, there has been no response yet—perhaps there's a bit of apprehension about being outshined by PD (just a light-hearted thought).

Towards Pony Diffusion V7 | Civitai

Hello everyone, I'm excited to share updates on the progress of our upcoming V7, along with a retrospective analysis of V6. The recognition V6 has ...

dull star Jun 4, 2024, 4:05 PM

#

silver sluice oh and to answer your question from earlier here's the link to the article i was...

oh thanks

#

also we might not see pony on SD3, JUST because of the license

#

but we'll see how it goes

silver sluice Jun 4, 2024, 4:06 PM

#

oh that's sad to think about, so SD3 vs SDXL licenses are different?

dull star Jun 4, 2024, 4:06 PM

#

yes, SDXL is openrail++, like pixart sigma and sd1.5

#

commercial use with no licensing required

#

SD3 is non-commercial, you need a paid membership for commercial use

#

but I am not so sure about all of this because

silver sluice Jun 4, 2024, 4:06 PM

#

but pony isn't commercial use is it?

dull star Jun 4, 2024, 4:06 PM

#

the images generated are owned by you

#

and since you are generating offline, for yourself, you are using the model itself for personal use

#

and since the image is owned by you, you can use it for whatever you'd like

silver sluice Jun 4, 2024, 4:07 PM

#

yeah i agree, lol ill hold hope pony dev integrates SD3 despite any potential licensing issues

dull star Jun 4, 2024, 4:07 PM

#

but I'd still recommend you to pay the membership fee if you start making more than $20 a month

#

I'd do that for sure, but I only make images for fun 9 times out of 10

faint breach Jun 4, 2024, 4:09 PM

#

dull star the images generated are owned by **you**

its the same way that photoshop is able to go after artists who use it without license. and they do.

dull star Jun 4, 2024, 4:10 PM

#

that makes sense

faint breach Jun 4, 2024, 4:10 PM

#

adobe lawyers get pretty aggressive about damages, but i don't think there are many cases where they try to claim that they own the ip made with unlicensed photoshop

dull star Jun 4, 2024, 4:12 PM

#

but that's an illegal copy though like you are saying, but what about SD3, which is inherently free, and the gray legality (or whatever) about AI generated images

#

or do you mean like, in some countries, using pirated software for personal use is not illegal for example, and therefore Adobe can sue people?

faint breach Jun 4, 2024, 4:13 PM

#

yeah iamal. copyright law is complex. i certainly wouldn't test it. i'd license it. the cost doesn't seem to be a lot

#

lol ianal i mean

dull star Jun 4, 2024, 4:15 PM

#

faint breach lol ianal i mean

alex (stability dev) isn't a lawyer himself, but this is what he had to say about using images for like monetized youtube vids

faint breach Jun 4, 2024, 4:16 PM

#

yeah he communicates the same intentions. the licensing is broad enough to cover many more cases than they intend it to. it's not intended for youtubers unless they're raking in 5 figures a month

dull star Jun 4, 2024, 4:17 PM

#

yeah at that point I'd feel guilty for not buying a membership from stability, even if the model wasn't non-commercial to begin with

faint breach Jun 4, 2024, 4:19 PM

#

another consideration. maybe you're using sd3 for free through another service that does pay the license

dull star Jun 4, 2024, 4:19 PM

#

I totally get why Pony v7 might not be finetuned on SD3, this sounds weird and intrusive

dull star Jun 4, 2024, 4:19 PM

#

faint breach another consideration. maybe you're using sd3 for free through another service ...

hmm

#

yeah, then can I use it for commercial use?

#

faint breach Jun 4, 2024, 4:21 PM

#

i dont think most pony users are trying to commercialize their creations. thats one of the funniest user example galleries on civit. dozens of new entries every hour. a constant deluge

dull star Jun 4, 2024, 4:22 PM

#

yeah they just want to make cartoon corn for themselves or make images to impress ~~or arouse~~ others

#

it's the model creator who might want to commercialize it in some way maybe, idk

faint breach Jun 4, 2024, 4:23 PM

#

or a service deploying it

dull star Jun 4, 2024, 4:23 PM

#

yuh

dull star Jun 4, 2024, 4:23 PM

#

dull star

this is like how Microsoft doesn't own the created images from Copilot image generator (DALLE3), (but in Microsoft's case they can use the images if they want to)

faint breach Jun 4, 2024, 4:23 PM

#

if they're taking donations because they made a model, that's a legal grey area that i don't think has been tested much

dull star Jun 4, 2024, 4:25 PM

#

is paying for credits on the api going to stability, or is it split between them and fireworks or whatever

#

cause idk how to donate once besides that or just cancelling the membership after a month

faint breach Jun 4, 2024, 4:31 PM

#

copyright shouldn't be a tidy discussion anyways. human creativity is a messy field. the rules governing it can't be orderly. that's how disney swoops in and owns everything

noble coyote Jun 4, 2024, 4:37 PM

#

MJ's Copyright scheme seems totally contradictory (and I paraphrase): "MJ owns the outright copyright to any image produced; yet extends an unlimited and inalienable right of use to the producers of such images!!!"

#

Take that as you will...

dull star Jun 4, 2024, 4:38 PM

#

lol

faint breach Jun 4, 2024, 4:41 PM

#

many software as a service companies will do this. especially ones that are planning on an acquisition exit. they can claim more value

sullen moss Jun 4, 2024, 4:51 PM

#

coze.com unfiltered Dalle 3 😁

#

Just need to bite text filter

faint breach Jun 4, 2024, 4:57 PM

#

coze.com looks like a spam hub. affiliate links galore. nothing about dalle.

sullen moss Jun 4, 2024, 4:59 PM

#

What do you mean 'nothing ' ?

faint breach Jun 4, 2024, 5:01 PM

#

it's a spam link farm. you're a spammer. think that clears it up

sullen moss Jun 4, 2024, 5:06 PM

#

If it were spam, I wouldn't have shared this link here. I started using this resource myself, so I decided to share it

faint breach Jun 4, 2024, 5:07 PM

#

"this resource" it's an affiliate link farm. spam.

sullen moss Jun 4, 2024, 5:09 PM

#

Hm

#

Ah, now I understand what you meant, sorry. 🤝

#

In general, if you're interested, look for the thread about Dalle-3 on 4chan, everything will be clear there

gusty trail Jun 4, 2024, 5:17 PM

#

dull star I totally get why Pony v7 might not be finetuned on SD3, this sounds weird and i...

Does the one who gets the fine-tuned version also need a membership to use it?

dull star Jun 4, 2024, 5:19 PM

#

the free one

gusty trail Jun 4, 2024, 5:20 PM

#

I mean if someone pays for a fine-tuned version, the author and the customer both need the membership?

faint breach Jun 4, 2024, 5:20 PM

#

people paying for finetuned models? that sounds like bullshit

#

i really hope that stability's new license doesn't unleash a wave of enshitification like that

teal fossil Jun 4, 2024, 5:22 PM

#

dull star > They have a Large and X-Large that are not being released Look, misinformation...

Seriously what is wrong with all those clueless people spouting nonsense?

Alex and others were very open about the limitations (and the advantages) of Medium... and I gotta say I'm sold. I can't wait to get my hands on it. Crazy that it's "just" another week.

faint breach Jun 4, 2024, 5:23 PM

#

8 days

teal fossil Jun 4, 2024, 5:24 PM

#

gusty trail I mean if someone pays for a fine-tuned version, the author and the customer both...

The Author needs the membership to profit from their own Model. Of course that doesn't mean everyone who uses their free release can then piggyback off of their membership and also use it commercially. That would make the whole idea void.

teal fossil Jun 4, 2024, 5:24 PM

#

faint breach 8 days

Pfffff - I'm so tired from too much dataset shenanigans that it's almost wednesday for me. 😛

#

That being said - are we looking at a midnight release? Which timezone? 👼

faint breach Jun 4, 2024, 5:24 PM

#

https://tenor.com/view/wednesday-gif-17256865298827295829

dull star Jun 4, 2024, 5:24 PM

#

teal fossil That being said - are we looking at a midnight release? Which timezone? 👼

most west american timezone at 11:59 PM

#

😈

teal fossil Jun 4, 2024, 5:25 PM

#

Seriously... the wait on wednesday will be the worst. 🤣

faint breach Jun 4, 2024, 5:25 PM

#

i'm most west canadur timeszone. its 10:25 here

dreamy sundial Jun 4, 2024, 5:25 PM

#

faint breach Jun 4, 2024, 5:25 PM

#

USA has Hawaii too so thats further west

teal fossil Jun 4, 2024, 5:25 PM

#

faint breach i'm most west canadur timeszone. its 10:25 here

haha - we are living worlds apart. 😉

faint breach Jun 4, 2024, 5:26 PM

#

break those chains that bind you

dull star Jun 4, 2024, 5:26 PM

#

teal fossil The Author needs the membership to profit from *their own* Model. Of course that...

yessir

gusty trail Jun 4, 2024, 5:27 PM

#

teal fossil The Author needs the membership to profit from *their own* Model. Of course that...

I have an idea. The fine-tuned version is hosted on a platform that provides an API to the customers. Then the membership is only paid once.

teal fossil Jun 4, 2024, 5:29 PM

#

gusty trail I have an idea. The fine tuned version host on a platform and provide api to the...

Well the customers are paying the Finetuner in that case (it will happen). Seriously - almost all AI-Generator websites and apps are based on SDXL. It was a missed opportunity for SAI to get their cut of those profits. They would deserve them. (of course while SDXL is still free for non-commercial / hobby use locally)

faint breach Jun 4, 2024, 5:31 PM

#

Going to be interesting to see how finetuned models proliferate. If SD3 refiners start charging for their versions, i'll move over to pixart sigma or stick with sdxl instead

#

imagine needing to subscribe to someone's patreon to use their loras

dull star Jun 4, 2024, 5:32 PM

#

I'll just keep using the base model thomas

#

yeah bruh

#

lykon will probably keep making free models

silver sluice Jun 4, 2024, 5:33 PM

#

so just to make sure I'm clear

if i wanted to download SD3 and run locally that's free and doesn't change from SDXL
if Pony dev wanted to download SD3 and fine tune it for his purposes and provide it to users he would have to pay SAI a membership fee and he would have to offset those costs by charging users to download his model?

faint breach Jun 4, 2024, 5:34 PM

#

silver sluice so just to make sure I'm clear - if i wanted to download SD3 and run locally tha...

it is changed from sdxl. there's a non commercial limitation now

silver sluice Jun 4, 2024, 5:34 PM

#

so does that sum it up correctly? are you affirming that's right?

#

pony dev could offer it as a paid download but once it leaks anyone else can just download it and then it's a 'pirated' copy at that point right?

faint breach Jun 4, 2024, 5:36 PM

#

model authors don't have to charge for their models. they might though.

silver sluice Jun 4, 2024, 5:36 PM

#

well is there a membership fee? and if so how much? I'm sure a trivial $100 fee wouldn't cause anyone to offset the cost to users but if it's like a monthly $10/K fee then that's a different story lol

dull star Jun 4, 2024, 5:37 PM

#

silver sluice so just to make sure I'm clear - if i wanted to download SD3 and run locally tha...

> if i wanted to download SD3 and run locally that's free and doesn't change from SDXL

yup, you can download SD3 2B when it comes out and keep making images for free, just like with SDXL, but commercial use (selling images or using your images in paid products such as games or youtube videos) is a different story, it needs to be figured out

silver sluice Jun 4, 2024, 5:40 PM

#

dull star I totally get why Pony v7 might not be finetuned on SD3, this sounds weird and i...

but we're not talking about selling images or using it for advertising we're just talking about using fine-tuned models the text here says:

"as a member you may build products.... including fine-tunes from SAI core models"

so does that mean that only members can create fine-tuned models?

dull star Jun 4, 2024, 5:40 PM

#

free membership is a membership

#

they didn't specify which one

#

if they make it so that only paid members can finetune, then stability have dug their own graves

#

so I'm pretty sure that's not the case

silver sluice Jun 4, 2024, 5:41 PM

#

oh good point, i didn't know there was a free membership. yeah my understanding was that if the only change from SDXL to SD3 is that only paid members can finetune, then that would suck for guys like pony dev

gusty trail Jun 4, 2024, 5:41 PM

#

You could fine tune model for non-commercial use

dull star Jun 4, 2024, 5:41 PM

#

well I suppose all finetunes follow the non-commercial license, no?

silver sluice Jun 4, 2024, 5:41 PM

#

dull star so I'm pretty sure that's not the case

okay good so i guess i have a hard time understanding how SD3 license is different from SDXL

dull star Jun 4, 2024, 5:42 PM

#

finetunes will require a paid membership to STABILITY to use the finetuned model for commercial use

dull star Jun 4, 2024, 5:43 PM

#

silver sluice okay good so i guess i have a hard time understanding how SD3 license is differe...

basically like openrail, but you cannot make money from hosting the model (or using it to make images that you might include in paid products???? have to figure that out), but if you just make images and finetuned models offline then its not different from SDXL

storm saffron Jun 4, 2024, 5:43 PM

#

dull star finetunes will require a paid membership to STABILITY to use the finetuned model...

Commercial use seems to be if you are using it as a paid service. The output from it is subject to local laws.

gusty trail Jun 4, 2024, 5:43 PM

#

But if someone uses the non-commercial fine-tune for commercial usage, let's say hosting free models and making a profit, how would it count?

silver sluice Jun 4, 2024, 5:43 PM

#

ah i understand so for example if i decide to use PonyV7's SD3 finetune model in a commercial application, then I'll be required to sign up with SAI as a paid member. right?

dull star Jun 4, 2024, 5:43 PM

#

storm saffron Commercial use seems to be if you are using it as a paid service. The output fro...

the output is not owned by Stability apparently, this is why I'm confused

storm saffron Jun 4, 2024, 5:43 PM

#

dull star the output is not owned by Stability apparently, this is why I'm confused

It's not, no, it specifically says in all the licenses that output is not a derivative of the model.

faint breach Jun 4, 2024, 5:44 PM

#

gusty trail You could fine tune model for non-commercial use

License says you can't finetune if you're not licensed

storm saffron Jun 4, 2024, 5:44 PM

#

You can fine tune it, and you can use that fine tune to make pictures to sell, but you can't put it on a hosting service and ask people to pay for use.

#

Unless you pay

gusty trail Jun 4, 2024, 5:44 PM

#

faint breach License says you can't finetune if you're not licensed

How? The author fine tuned for non-commercial use and someone hosted his fine tune

silver sluice Jun 4, 2024, 5:45 PM

#

storm saffron You can fine tune it, and you can use that fine tune to make pictures to sell, ...

this seems reasonable

faint breach Jun 4, 2024, 5:45 PM

#

author can't distribute fine tunes without a license. all derived versions of the model are subject to stability's commercial license

silver sluice Jun 4, 2024, 5:45 PM

#

faint breach License says you can't finetune if you're not licensed

this wouldn't seem reasonable, like you can't finetune just for free for non-commercial use? i doubt it

silver sluice Jun 4, 2024, 5:45 PM

#

faint breach author can't distribute fine tunes without a license. all derived versions of t...

yeah that would make sense that seems reasonable

gusty trail Jun 4, 2024, 5:45 PM

#

faint breach author can't distribute fine tunes without a license. all derived versions of t...

Really? That means no free fine-tunes can exist

faint breach Jun 4, 2024, 5:46 PM

#

end users can download and use models locally for free. they can do that with finetunes too. but authors may want to charge for those. we dont know yet

storm saffron Jun 4, 2024, 5:46 PM

#

faint breach author can't distribute fine tunes without a license. all derived versions of t...

If you fine tuned it, and you then uploaded it to a hosting service for free downloads, and then someone downloaded it, and use it on THEIR hosted provider that people had to pay for, then the onus would be on the person hosting it to pay for membership NOT the finetuner

faint breach Jun 4, 2024, 5:47 PM

#

storm saffron If you fine tuned it, and you then uploaded it to a hosting service for free dow...

that's true. the finetuner has a responsibility to pay for membership first though too, before any of that.

dull star Jun 4, 2024, 5:47 PM

#

random guessing logic, ianal, don't quote me on this:

if you think about it, you host the model offline, to yourself (comfyui, a1111, etc), therefore its personal use, which means non-commercial
and since the image outputs are owned by you, so theoretically, you could do anything with it
I want to hear from stability how this all really works.

storm saffron Jun 4, 2024, 5:47 PM

#

faint breach thats true. the finetuner has a responsibilty to pay for membership first thoug...

Why would they?

faint breach Jun 4, 2024, 5:48 PM

#

storm saffron Why would they?

creating and distributing fine tunes requires membership

dull star Jun 4, 2024, 5:48 PM

#

but a paid one though? I want to know that

storm saffron Jun 4, 2024, 5:48 PM

#

faint breach creating and distributing fine tunes requires membership

The current licenses for all current models (including the core models) say you just have to keep the same license.

#

You pay for commercial usage of it.

faint breach Jun 4, 2024, 5:49 PM

#

dull star but a paid one though? I want to know that

seems that way at a glance. ianal

dull star Jun 4, 2024, 5:49 PM

#

storm saffron Jun 4, 2024, 5:51 PM

#

Once it's on YOUR computer you can do what you like with it until you make it public in exchange for payment. That's how I read it.

dull star Jun 4, 2024, 5:51 PM

#

whatever is the case, if I ever use it for commercial use, I'd buy the membership if I actually go past $20 a month

storm saffron Jun 4, 2024, 5:55 PM

#

From the Turbo license, which is the current Non Commercial license.

Merely distributing the Software Products or Derivative Works for download online without offering any related service (ex. by distributing the Models on HuggingFace) is not a violation of this subsection.

The subsection being "Non-Commercial Use"

dull star Jun 4, 2024, 5:56 PM

#

> or Derivative Works

ah yeah

storm saffron Jun 4, 2024, 5:56 PM

#

Whole section:

b. You may not use the Software Products or Derivative Works to enable third parties to use the Software Products or Derivative Works as part of your hosted service or via your APIs, whether you are adding substantial additional functionality thereto or not. Merely distributing the Software Products or Derivative Works for download online without offering any related service (ex. by distributing the Models on HuggingFace) is not a violation of this subsection. If you wish to use the Software Products or any Derivative Works for commercial or production use or you wish to make the Software Products or any Derivative Works available to third parties via your hosted service or your APIs, contact Stability AI at https://stability.ai/contact.

As it says there, finetuning it and giving it away is fine.

dull star Jun 4, 2024, 5:57 PM

#

yeah it seems so

#

I suppose the same non-commercial license will apply to SD3

storm saffron Jun 4, 2024, 5:57 PM

#

It should do, this is the updated one they're using on all the 'core' models now.

dull star Jun 4, 2024, 5:57 PM

#

they are really just targeting companies using their models for free

long palm Jun 4, 2024, 5:57 PM

#

#🆕

storm saffron Jun 4, 2024, 5:58 PM

#

dull star they are really just targeting companies using their models for free

That's exactly it. Yes.

dull star Jun 4, 2024, 5:58 PM

#

from what I've heard though, is that the license for companies (enterprise membership or whatever?), is suuuper expensive, and some of them just thought of training a model themselves

long palm Jun 4, 2024, 6:00 PM

#

#🆕 | sd3

dull star Jun 4, 2024, 6:00 PM

#

#🆕｜sd3

#

wow I was kind of right lmao

#

except 8B is Huge

storm saffron Jun 4, 2024, 6:06 PM

#

Any thoughts on how it'll split the community though? I think M and L will be most popular. I guess S is for phones?

dull star Jun 4, 2024, 6:06 PM

#

yeah idk how much difference there will be between 4B and 8B

#

cause if a fully trained 2B is already catching up to an undertrained 8B, I'm not so sure if we'll need 8B

#

unless 8B has INCREDIBLE amounts of knowledge and prompt adherence

#

then it would be worth accepting slower generations in exchange for superb prompt adherence and stuff

#

I suppose M will be the most popular

cunning lintel Jun 4, 2024, 6:11 PM

#

storm saffron Any thoughts on how it'll split the community though? I think M and L will be mo...

My fear is that where companies now create things like controlnets/ipadapters for SDXL, they'll now create PoCs for something like the 2b model and keep the one for 8b in-house, a bit like we see with some 1.5-only releases. And for lots of research, showing it works on 2b will be enough to prove their work works, no need to even create it for larger models.

#

Otoh maybe it's for the better, the fact that the small models are cheaper to train might result in things get developed that otherwise wouldn't even be tried at all 😉

viral plaza Jun 4, 2024, 6:12 PM

#

dull star except 8B is `Huge`

naming on 4B/8B isn't locked in

#

2B==Medium is locked in

#

1B is very unlikely to be named anything other than Small

#

4B/8B will probably be Large and Huge/Giant or something, or it might be we skip 4B and say 8B is Large, or idk

dull star Jun 4, 2024, 6:13 PM

#

skip 4B?

storm saffron Jun 4, 2024, 6:13 PM

#

@viral plaza what quantization will we be getting bf16/fp16?

viral plaza Jun 4, 2024, 6:14 PM

#

I think fp16

dull star Jun 4, 2024, 6:14 PM

#

(with cascade we've got bf16 iirc, why did we though?)

spare orchid Jun 4, 2024, 6:14 PM

#

hey, anyone here have experience in creating anime waifu type images out of inanimate objects ,cars etc. need help with something

storm saffron Jun 4, 2024, 6:14 PM

#

bf16 would be better on 3000 series nvidia and up.

viral plaza Jun 4, 2024, 6:14 PM

#

running the model weights (not calc) in fp8 even is near-identical

#

so exact format in storage doesn't overly matter

#

only matters what you calculate in and what you train in

dull star Jun 4, 2024, 6:15 PM

#

isn't running it in fp8 slow on non 40xx though?

viral plaza Jun 4, 2024, 6:15 PM

#

running yes not storing no

dull star Jun 4, 2024, 6:15 PM

#

thanks

viral plaza Jun 4, 2024, 6:15 PM

#

again, weights in fp8, calc in fp16 or bf16 to preference

dull star Jun 4, 2024, 6:16 PM

#

interesting

viral plaza Jun 4, 2024, 6:16 PM

#

basically half the VRAM cost and maybe a tiny bit timecost from the conversion (not much I think) and identical results
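
A rough sketch of the "weights in fp8, calc in higher precision" idea, in plain Python. The rounding helper is a simplified stand-in for an fp8 (e4m3-style) cast, not a real fp8 implementation: it only models the 3-bit mantissa and ignores exponent limits, subnormals and NaN/inf. All names and numbers are made up for illustration.

```python
import math

def round_to_fp8_like(x: float, mantissa_bits: int = 3) -> float:
    """Round x to a float with `mantissa_bits` explicit mantissa bits.

    Simplified stand-in for an fp8 (e4m3-style) cast: it ignores the
    limited exponent range, subnormals, and NaN/inf handling.
    """
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)              # x == m * 2**e with 0.5 <= |m| < 1
    steps = 2 ** (mantissa_bits + 1)  # implicit leading bit + explicit bits
    return math.ldexp(round(m * steps) / steps, e)

weights = [0.731, -0.052, 1.914, -0.333]          # made-up "trained" weights
stored = [round_to_fp8_like(w) for w in weights]  # what would sit in memory
activations = [0.25, -1.5, 0.8, 2.0]              # made-up inputs

# Compute step: the stored weights are upcast and the math itself runs at
# higher precision (Python floats here, fp16/bf16 on a GPU).
full = sum(w * a for w, a in zip(weights, activations))
mixed = sum(w * a for w, a in zip(stored, activations))

# Worst per-weight relative error introduced by the low-precision storage.
max_rel_err = max(abs(s - w) / abs(w) for s, w in zip(stored, weights))
print(stored, full, mixed, max_rel_err)
```

With 3 mantissa bits the per-weight relative rounding error is bounded by 1/16 in this model. Whether that is visually negligible for a given network is an empirical question the sketch does not answer.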

storm saffron Jun 4, 2024, 6:16 PM

#

It wasn't quite the same in SDXL with FP8, you could tell something was off.

dull star Jun 4, 2024, 6:16 PM

#

viral plaza basically half the VRAM cost and maybe a tiny bit timecost from the conversion (...

ayo what????

#

I expected something like this from 8B, cause its such a large model

#

but from 2B...?

viral plaza Jun 4, 2024, 6:17 PM

#

storm saffron It wasn't quite the same in SDXL with FP8, you could tell something was off.

ye iirc XL is very close but a lil off, and SD3 is closerer

dull star Jun 4, 2024, 6:17 PM

#

could it be because its transformer-like, therefore it handles quantization better? (theory)

viral plaza Jun 4, 2024, 6:17 PM

#

dull star but from 2B...?

not actually model size dependent, more step count dependent: fp8 on turbo models is harder to do

low stone Jun 4, 2024, 6:17 PM

#

SD3 Big McLargeHuge

storm saffron Jun 4, 2024, 6:17 PM

#

dull star could it be because its transformer-like, therefore it handles quantization bett...

Possibly, but let's not go 4bit... 😄

dull star Jun 4, 2024, 6:18 PM

#

imatrix 2-bit ggml quantization

#

:thomas:

#

lmao idk what I'm talking about at this point

#

but this is good news

storm saffron Jun 4, 2024, 6:18 PM

#

You could possibly quantize it down to 6ish without too much loss

dull star Jun 4, 2024, 6:18 PM

#

and about T5

#

weights at bf16/fp16 (compared to fp32) already decrease load times and ram usage if being run on CPUs

#

what about storing them in fp8 too?

storm saffron Jun 4, 2024, 6:19 PM

#

I assume the T5 we're getting is in FP16 as well, but that does quantize pretty well using bitsandbytes.

dull star Jun 4, 2024, 6:19 PM

#

yeah bnb4bit is perfectly fine with T5 when I tried it with pixart, heavily decreases vram requirements compared to raw weights

dusky thistle Jun 4, 2024, 6:19 PM

#

regarding training with 2b... i think the biggest question of all is what it takes to train controlnets

viral plaza Jun 4, 2024, 6:20 PM

#

I hope we can release an SD3-Medium-fp8 safetensors

storm saffron Jun 4, 2024, 6:20 PM

#

I just run T5 on the CPU cos it's not actually that slow

viral plaza Jun 4, 2024, 6:20 PM

#

it'd be a literally 2GiB model, same size as SD1 model files, but better-than-XL quality
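
The "2GiB" figure can be sanity-checked with back-of-the-envelope arithmetic. This sketch assumes "2B" means exactly 2 billion parameters and counts only the raw weights (no file metadata, text encoders or VAE); the function name is just for illustration.

```python
GIB = 1024 ** 3

def weight_size_gib(num_params: float, bytes_per_param: float) -> float:
    """Approximate size of the raw weights, ignoring file metadata."""
    return num_params * bytes_per_param / GIB

params = 2e9  # "2B" taken as exactly 2 billion parameters (assumption)
for name, nbytes in [("fp32", 4), ("fp16/bf16", 2), ("fp8", 1)]:
    print(f"{name:>9}: {weight_size_gib(params, nbytes):.2f} GiB")
```

At one byte per parameter this comes out a little under 2 GiB, and each halving of precision halves the size.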

dusky thistle Jun 4, 2024, 6:20 PM

#

sdxl wasn't left wanting for long for loras and finetunes, but controlnets? that's been the real problem all along

storm saffron Jun 4, 2024, 6:20 PM

#

viral plaza I hope we can release an SD3-Medium-fp8 safetensors

If you don't, someone will anyway. 😄

dusky thistle Jun 4, 2024, 6:20 PM

#

they're finally rolling in but it took almost a year to get good ones

dull star Jun 4, 2024, 6:20 PM

#

thankfully someone trained a good openpose model for SDXL after all this time

#

(it like... actually works this time)

viral plaza Jun 4, 2024, 6:21 PM

#

controlnets have a clear logical place to go in mmdit - it's built around multiple streams as a concept, so just tack on another stream (vs SD1/SDXL, controlnets are kinda hacked in)

dull star Jun 4, 2024, 6:21 PM

#

I wonder how much better controlnets will get because of this then

dusky thistle Jun 4, 2024, 6:23 PM

#

viral plaza controlnets have a clear logical place to go in mmdit - it's built around multip...

yeah what i'm really wondering about is the training cost

#

if it can be squeezed into 24gb of vram, it will be amazing

#

or whatever vram the 5090 ends up having

storm saffron Jun 4, 2024, 6:24 PM

#

dusky thistle or whatever vram the 5090 ends up having

Rumour has it having 28GB.

dusky thistle Jun 4, 2024, 6:25 PM

#

yeah, or 32gb, or who knows

#

28gb would be stupid

dull star Jun 4, 2024, 6:25 PM

#

I count on 28GB :thomas:

low stone Jun 4, 2024, 6:26 PM

#

What I want to know is, if I have a 4090, would I be able to just swap in a 5090? Is it the same form factor? If not, that's gonna blow

dusky thistle Jun 4, 2024, 6:26 PM

#

low stone What I want to know is, if I have a 4090, would I be able to just swap in a 5090? Is...

yeah... if they made it even bigger... 🤣

low stone Jun 4, 2024, 6:27 PM

#

I have a perfectly good Alienware box with a 3080 that has the power and cpu/ram for a 4090, but it won't fit in the case. Would blow chunks if they do the same thing again with the 4090s.

viral plaza Jun 4, 2024, 6:28 PM

#

dusky thistle if it can be squeezed into 24gb of vram, it will be amazing

I'd expect training reqs to be similar to SDXL but slightly lower

#

so if you can train XL you can train SD3-Medium

dull star Jun 4, 2024, 6:29 PM

#

didn't he mean training controlnets?

viral plaza Jun 4, 2024, 6:29 PM

#

oh controlnet training idk

dusky thistle Jun 4, 2024, 6:29 PM

#

yeah i think controlnets are the biggie

dull star Jun 4, 2024, 6:29 PM

#

also, can you tell me if lora-like training code will be provided out of the box?

#

or will it be more like dreambooth

viral plaza Jun 4, 2024, 6:29 PM

#

the weight size would be ~half the weight of SD3-Medium, so roughly 1B-ish to add a stream

#

so should be trainable

dusky thistle Jun 4, 2024, 6:30 PM

#

i'm guessing the fact that controlnets were considered when designing mmdit means that they will be much more effective than the sdxl ones, which are often really weak or hit/miss

storm saffron Jun 4, 2024, 6:30 PM

#

dusky thistle yeah... if they made it even bigger... 🤣

Also rumour has it they're going back to a 2 slot card

dusky thistle Jun 4, 2024, 6:30 PM

#

viral plaza so should be trainable

on 24gb?

#

cuz if so, wow

viral plaza Jun 4, 2024, 6:30 PM

#

dull star also, can you tell me if lora-like training code will be provided out of the box...

lora is perfectly doable in concept it's just a matter of what code gets published where, which rn idunno specifics of

dull star Jun 4, 2024, 6:30 PM

#

maybe diffusers has that, I forgot

viral plaza Jun 4, 2024, 6:30 PM

#

HF will have code published so presumably they'll cover all the usual training

dusky thistle Jun 4, 2024, 6:30 PM

#

cnets trainable on a consumer card would be very cool

storm saffron Jun 4, 2024, 6:31 PM

#

We won't need loras, everything's in the model right?

dull star Jun 4, 2024, 6:34 PM

#

man I wish

dusky thistle Jun 4, 2024, 6:34 PM

#

https://github.com/huggingface/diffusers/issues/4925

i've never tried training a sdxl controlnet, but i recall reading it required more than 24gb... no idea where aside from what i just found here, so take it with a train of salt

"You can add the --use_8bit_adam and --enable_xformers_memory_efficient_attention flags, it works for me. The VRAM usage for each card is about 35GB when setting --train_batch_size=1 and --resolution=1024."

GitHub

Training Controlnet SDXL distributed gives out-of-memory errors · I...

Describe the bug Hi. I am running the Controlnet SDXL example as it is shown in the examples section [example-link]. I am unable to reproduce the results in a SLURM managed environment, where I hav...

cunning lintel Jun 4, 2024, 6:37 PM

#

reading the announcements of the new sdxl controlnets, training them doesn't seem to be a thing for mere mortals :p

viral plaza Jun 4, 2024, 6:40 PM

#

SDXL controlnet requirements for training are higher than SD3-Medium by a fair bit

#

also, for SDXL we had control-LoRA but idk if HF training code supports it

#

the whole point of Control-LoRA naturally being to reduce the resource cost

dusky thistle Jun 4, 2024, 6:41 PM

#

viral plaza SDXL controlnet requirements for training are higher than SD3-Medium by a fair b...

yeah that's one of the most exciting things here imo

#

one of the biggest things that sets the potential with SD so much higher than with anything else imo

#

yeah on here they talk about running it on an A100 as well https://github.com/huggingface/diffusers/blob/main/examples/controlnet/README_sdxl.md

#

so if it's 35gb at a min for sdxl and if the vram needs are 30-35% lower for sd3-medium, it's doable on 24gb
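
The estimate above can be checked directly. This sketch takes the 35 GB per-card figure from the quoted GitHub comment and the guessed 30-35% reduction at face value; both are inputs from the chat, not measurements, and the helper name is made up.

```python
def scaled_vram_gb(baseline_gb: float, reduction: float) -> float:
    """VRAM estimate after cutting `reduction` (0..1) off a baseline."""
    return baseline_gb * (1.0 - reduction)

BASELINE_GB = 35.0  # per-card figure quoted from the GitHub issue
CARD_GB = 24.0      # a 24 GB consumer card

for reduction in (0.30, 0.35):
    need = scaled_vram_gb(BASELINE_GB, reduction)
    verdict = "fits" if need <= CARD_GB else "does not fit"
    print(f"{reduction:.0%} lower -> {need:.2f} GB, {verdict} in {CARD_GB:.0f} GB")

# Smallest reduction that gets under the card's capacity.
break_even = 1.0 - CARD_GB / BASELINE_GB
print(f"break-even reduction: {break_even:.1%}")
```

Under these assumptions a 35% reduction fits in 24 GB with a little headroom, while 30% lands just above it, so the conclusion depends on which end of the guessed range turns out to be right.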

woeful spindle Jun 4, 2024, 7:15 PM

#

what does T5 mean?

#

is it something that helps text generation?

twin tulip Jun 4, 2024, 7:17 PM

#

t5 is a different type of text encoder, not a clip text encoder

woeful spindle Jun 4, 2024, 7:18 PM

#

hmm

#

is it built-in or do we need to do something to activate it

twin tulip Jun 4, 2024, 7:20 PM

#

viral plaza 1B is very unlikely to be named anything other than Small

SD3 Smol? 😆

#

I think we're waiting to see what pipelines work or are delivered. The paper said T5 can be optionally dropped; T5 is huge, much bigger than either clip model. Maybe mcmonkey can chime in or we'll know later

viral plaza Jun 4, 2024, 7:22 PM

#

dropping T5 works fine if the size is an issue for you

#

CLIP G+L without the T5 is very close to having all 3 on most prompts

twin tulip Jun 4, 2024, 7:23 PM

#

I imagine pipelines can be set up to load T5, run it once for embedding, then move the weights to cpu while the DiT runs

#

or maybe T5 can be quantized heavily?

viral plaza Jun 4, 2024, 7:24 PM

#

you can even just run it entirely on CPU

#

Also yes T5 happily quantizes to 4bit, idk if there will be code for that on launch day but HF Candle runs T5-4bit on CPU well

dull star Jun 4, 2024, 7:35 PM

#

T5 4-bit on GPU fits well with pixart sigma 0.6B

#

around like 8GB of VRAM the last time I tried, don't remember

#

but it's not so bad on CPU only

lucid swift Jun 4, 2024, 7:36 PM

#

dull star T5 4-bit on GPU fits well with pixart sigma 0.6B

but you can also run it on the cpu very fast

dull star Jun 4, 2024, 7:36 PM

#

especially with the bf16 weights

dull star Jun 4, 2024, 7:36 PM

#

lucid swift but you can also run it on the cpu very fast

it wasn't as fast on the cpu for me

#

but I'll try again

#

on gpu it was instant

lucid swift Jun 4, 2024, 7:36 PM

#

dull star it wasn't as fast on the cpu for me

i mean for people with less than 8gb

dull star Jun 4, 2024, 7:36 PM

#

absolutely

#

like its not suuuper slow either

#

its good enough and accessible

lucid swift Jun 4, 2024, 7:37 PM

#

yes

dull star Jun 4, 2024, 7:41 PM

#

it takes about 10-20 secs on CPU for T5

#

then again, after the conditioning has been done, you can generate on other seeds instantly

#

so its just generating the conditioning once, then you can change cfg, seed, and other stuff and don't have to use T5 again
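
A minimal sketch of that "encode once, reuse the conditioning" pattern. `encode_prompt` and `generate` are hypothetical stand-ins, not any real UI's or library's API; the point is only that the slow encoder runs again when the prompt text changes, not when the seed or cfg does.

```python
from functools import lru_cache

encoder_calls = 0  # counts how often the (slow) encoder actually runs

@lru_cache(maxsize=32)
def encode_prompt(prompt: str) -> tuple:
    """Hypothetical stand-in for a slow T5 pass; returns fake conditioning."""
    global encoder_calls
    encoder_calls += 1
    return tuple(float(ord(ch)) for ch in prompt)

def generate(prompt: str, seed: int, cfg: float) -> dict:
    """Hypothetical sampler stub: only the conditioning lookup matters here."""
    cond = encode_prompt(prompt)  # cache hit unless the prompt text changed
    return {"cond_len": len(cond), "seed": seed, "cfg": cfg}

for seed in (1, 2, 3):
    generate("a lighthouse on a beach", seed=seed, cfg=4.5)
generate("a lighthouse on a beach", seed=1, cfg=7.0)
print(encoder_calls)  # prints 1: the prompt never changed

generate("a lighthouse at night", seed=1, cfg=7.0)
print(encoder_calls)  # prints 2: new prompt, one more encoder pass
```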

#

that's actually pretty nice

low stone Jun 4, 2024, 7:57 PM

#

@viral plaza do you think we'll see the 2b on the api or artisan before the 12th?

viral plaza Jun 4, 2024, 7:57 PM

#

low stone @viral plaza do you think we'll see the 2b on the api or artisan before...

hopefully yes

#

API team is talking about it but idk the timeline

#

if it gets on API it'll be added to Artisan immediately

low stone Jun 4, 2024, 7:58 PM

#

Ok great thanks

dull star Jun 4, 2024, 8:00 PM

#

hell yeah

low stone Jun 4, 2024, 8:04 PM

#

raven fern Jun 4, 2024, 8:48 PM

#

2B or not 2B :3

#

man can't wait to try it out

#

and of course see what the community has in store

#

im also curious about the smol model, how well it will generate stuff, and also are most people gonna train loras or finetunes on 2B?

remote holly Jun 4, 2024, 9:22 PM

#

is 12 gb enough for 2B sd3?

low stone Jun 4, 2024, 9:34 PM

#

yes

#

you can choose to offload various bits to main system ram as well, so no matter what it'll render with that.

bitter hearth Jun 4, 2024, 9:43 PM

#

low stone

https://tenor.com/view/2b-nier-shakes-head-no-gif-16807925012473172948

#

been a while since I posted anything here

#

waow

low stone Jun 4, 2024, 9:53 PM

#

#

will i be able to make images like this with sd3?

jolly swan Jun 4, 2024, 9:53 PM

#

silver sluice but pony isn't commercial use is it?

It is (although all versions are available for free for local use). Training pony is very expensive, so I have to recoup the costs somehow - I run Discord service for about 20k users and have partnership with SaaS services. I also (obviously) have the SAI Membership, but the problem is that SD3 seems to be non-commercial even for members and you will have to maybe make some extra deal? But this is not communicated at all right now.

low stone Jun 4, 2024, 9:55 PM

#

If you could just go ahead and fill out this form in triplicate, we'll get back to you around the time we release the 8b.

silver sluice Jun 4, 2024, 9:55 PM

#

jolly swan It is (although all versions are available for free for local use). Training pon...

Interesting insight thanks for the feedback, excited to see what you come up with next week thanks for the update 👍

jolly swan Jun 4, 2024, 9:56 PM

#

low stone If you could just go ahead and fill out this form in triplicate, we'll get back ...

Sorry, was that directed at me?

low stone Jun 4, 2024, 9:56 PM

#

It was, sarcastically.

#

I feel for you.

hallow lion Jun 4, 2024, 9:56 PM

#

What's with the drama, can't we all just be happy we're getting the weights?

jolly swan Jun 4, 2024, 9:56 PM

#

Ah, that felt too real so I was not sure :cadancewheeze:

hallow lion Jun 4, 2024, 9:57 PM

#

It's happening for real! who cares that it's medium

#

it's still gonna mop the floor with dalle, midjourney and sdxl

low stone Jun 4, 2024, 9:57 PM

#

I think pony represents all that is wrong with society and shows off who we really are in our dark hearts. And we salute you.

#

🙂

jolly swan Jun 4, 2024, 9:57 PM

#

low stone I think pony represents all that is wrong with society and shows off who we real...

It's a model to make pictures of cool ponies.

hallow lion Jun 4, 2024, 9:57 PM

#

p0ny is great even if i dont use it for uhm anatomical studies

jolly swan Jun 4, 2024, 9:57 PM

#

I am sorry y'all decided to use it for something else.

#

That's on you, not me.

low stone Jun 4, 2024, 9:58 PM

#

#

And pixart is a model for making this.

#

somewhere it went horribly wrong.

hallow lion Jun 4, 2024, 9:58 PM

#

AI always sounds like speaking in tongues and summoning demons when trying to make text

low stone Jun 4, 2024, 9:59 PM

#

hallow lion AI always sounds like speaking in tongues and summoning demons when trying to ma...

and then you just pipe it all into the image generator.

jolly swan Jun 4, 2024, 9:59 PM

#

silver sluice Interesting insight thanks for the feedback, excited to see what you come up wit...

Worst case scenario we will get a v6.9 based on XL

cunning lintel Jun 4, 2024, 10:01 PM

#

assuming sd3 is released as core model, i don't see an issue as long as you stay below the enterprise reqs and get the pro membership thingy, doesn't seem you get there with your 20k discord users. But yeah, would be good to get that as a response from sai itself

silver sluice Jun 4, 2024, 10:02 PM

#

jolly swan Worst case scenario we will get a v6.9 based on XL

it would be interesting to see how your new training translates for better quality images using the sdxl model and then see the results translated to the SD3 model, I think a 6.9 version would also appease the community who have set up their workflow and system around sdxl. so to be clear you're going to wait until the 12th at which point there will be a clear answer on licensing terms and then you'll decide which model to train next?

jolly swan Jun 4, 2024, 10:08 PM

#

cunning lintel assuming sd3 is released as core model, i don't see an issue as long as you stay...

Discord is for pony lovers, it's SaaS that makes more reasonable money, but again, the whole issue is that membership may not be sufficient and so far I can't get any specific comms.

jolly swan Jun 4, 2024, 10:09 PM

#

silver sluice it would be interesting to see how your new training translates for better quali...

It's going to be a (better, I hope) different model anyway, there has been so many changes to tech and data that I expect it to diverge a lot (but be closer to XL)

cunning lintel Jun 4, 2024, 10:13 PM

#

i'd think that if you make more than what pro allows, you can afford the enterprise license 😉 If the worry is that you get a small fee for making the model available to those saas providers, and that those providers need the enterprise license, that's not your problem: they need the enterprise license to use the finetune (cause it still has the default license attached), not you

silver sluice Jun 4, 2024, 10:14 PM

#

jolly swan It's going to be a (better, I hope) different model anyway, there has been so ma...

do you have anything you can show that you've generated lately? 🙂 any sneak previews? lol

#

i just think overall SAI should have a special room for VIP fine tuners where they can get dedicated support and service and answers to their questions, just a curated list of top tier devs who make the models better so they can be taken care of first and foremost

cunning lintel Jun 4, 2024, 10:15 PM

#

But that's just my interpretation, that whole membership thing is clear as mud, all it really says is that it grants you commercial use (where the license that you get with the weights does not)

teal fossil Jun 4, 2024, 10:31 PM

#

viral plaza 4B/8B will probably be Large and Huge/Giant or something, or it might be we skip...

That sounds like a logical approach. Training is needed and I don't think there will be a huge benefit from splitting the (SD3) community in 4 Model Groups. 3 is already a lot.

viral plaza Jun 4, 2024, 11:17 PM

#

yee

jolly swan Jun 4, 2024, 11:26 PM

#

silver sluice i just think overall SAI should have a special room for VIP fine tuners where th...

should - definitely. But if there is one I am not cool enough to be in it.

prisma rampart Jun 4, 2024, 11:27 PM

#

if time/compute is an issue, it would probably be better to skip 4B and train 8B properly vs having both 4 and 8 but both under-trained.

sick cedar Jun 4, 2024, 11:28 PM

#

prisma rampart if time/compute is an issue, it would probably be better to skip 4B and train 8B...

Or train a very good 2B?

#

2B looks highly capable.

#

And accessible.

jolly swan Jun 4, 2024, 11:28 PM

#

silver sluice do you have anything you can show that you've generated lately? 🙂 any sneak pre...

I am in the data dungeon fixing image captions 😦

silver sluice Jun 4, 2024, 11:29 PM

#

jolly swan I am in the data dungeon fixing image captions 😦

hey I'm excellent in dealing with data processing and automation, i have free time, let me know if you need a hand or some scripting and I could lend a hand, feel free to DM me whenever and we could discuss any solutions I could develop for you to expedite your process in any aspect, it's the least I could do for using your models so much 🙂

sick cedar Jun 4, 2024, 11:41 PM

#

jolly swan Discord is for pony lovers, it's SaaS that makes more reasonable money, but agai...

@viral plaza This is a similar issue to the one i was referring to earlier. I stress that SD3 may not reach its full potential if it doesn't have the full support of major finetuners, but no one seems to be able to contact anyone official for crucial info on the final conditions of the SD3 License.
@viral plaza I know that you are extremely busy, and only one person, but if there is anyone you can put forward this issue to, we all would very much appreciate it.
(Thank you btw.)

viral plaza Jun 4, 2024, 11:46 PM

#

silver sluice i just think overall SAI should have a special room for VIP fine tuners where th...

👀

#

we still have the one that was made for SDXL launch

#

haven't expanded it since and the relevant team has changed around

#

that was a Joe Penna initiative. With The Joe gone, gotta get the higher ups on board with Joe ™️ methodology

viral plaza Jun 4, 2024, 11:49 PM

#

sick cedar @viral plaza This is a similar issue to the one i was referring to ear...

Already relayed it internally to the relevant people, they said it'll be clarified before the actual launch

sick cedar Jun 4, 2024, 11:49 PM

#

viral plaza Already relayed it internally to the relevant people, they said it'll be clarifi...

Thanks btw. We're all on the edge of our seats. Haha! xD

cinder junco Jun 5, 2024, 1:35 AM

#

@viral plaza Do you have any info about how SD3 memory use scales with resolution relative to SDXL? I like to use hiresfix to generate at 3840x2400 resolution with SDXL, but don’t have a whole lot of memory spare above that. Just wondering what sort of resolution I’ll be able to achieve with SD3. (Mac with 64GB unified memory running Invoke.)

viral plaza Jun 5, 2024, 1:38 AM

#

cinder junco @viral plaza Do you have any info about how SD3 memory use scales with ...

initially SD3 is not gonna agree with hires fix directly due to oddities of the mmdit arch (pending some clever fixes to positional embedding code), so rather tiling based upsampling is a better strategy, which doesn't use more VRAM (but does use more time)

cinder junco Jun 5, 2024, 1:41 AM

#

Thanks. Too bad! I hope some geniuses can work on that. Has anyone done any experiments with native generation above 1 MP? Does it still go crazy or generate artifacts? Would a higher-res initial generation be useful to lessen the number of stages or tiles in a tiled upscaling workflow?

low stone Jun 5, 2024, 2:23 AM

#

viral plaza initially SD3 is not gonna agree with hires fix directly due to oddities of the ...

Does it do image to image at anything above 1024x1024? So if I have a 1536 squared image from something else and want to do image to image on it with sd3, I can't without tiled ksampling?

viral plaza Jun 5, 2024, 2:46 AM

#

cinder junco Thanks. Too bad! I hope some geniuses can work on that. Has anyone done any expe...

if you go out of resolution range without fixing the positional embedding handling or using tiling, it does this (clear image in center, distortion on the outer edges)

viral plaza Jun 5, 2024, 2:47 AM

#

low stone Does it do image to image at anything above 1024x1024? So if I have a 1536 squar...

(A) fix the pos embed code (B) train the target resolution (I'm sure somebody will do a 2048x2048 tune right away probably), or (C) use tiled

#

tiled works well on SD3
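
A sketch of how option (C) might split an oversized image into tiles that each stay within the model's native size. This is a generic tiling helper, not the code any particular UI uses; the 1024-pixel tile and 128-pixel overlap are assumptions for illustration, and blending the seams is left out.

```python
def tile_boxes(width: int, height: int, tile: int = 1024, overlap: int = 128):
    """Return (left, top, right, bottom) boxes that cover the whole image.

    Each box is at most `tile` pixels per side; neighbours overlap by at
    least `overlap` pixels so seams can be blended afterwards.
    Assumes 0 <= overlap < tile.
    """
    def starts(size: int) -> list:
        if size <= tile:
            return [0]
        stride = tile - overlap
        pos = list(range(0, size - tile, stride))
        pos.append(size - tile)  # last tile sits flush with the far edge
        return pos

    return [(x, y, min(x + tile, width), min(y + tile, height))
            for y in starts(height) for x in starts(width)]

# The 1536x1536 img2img case asked about earlier in the chat.
boxes = tile_boxes(1536, 1536)
print(len(boxes), boxes)
```

Each tile is then processed at a resolution the model was trained on, which is why memory use stays flat while total time grows with the number of tiles.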

cinder junco Jun 5, 2024, 2:49 AM

#

So A) would be sufficient to allow the same resolution flexibility as SDXL (assuming the fix is possible)?

viral plaza Jun 5, 2024, 2:53 AM

#

yes

#

somebody just has to figure out how to do that

sterile pendant Jun 5, 2024, 3:01 AM

#

viral plaza somebody just has to figure out how to do that

What is the current resolution range before it starts artifacting, assuming ~1 megapixel. Like can it do 1344x768 without bugging out? Or does it have to stay around 1024²?

#

Basically, how far from a square aspect ratio can it handle?

low stone Jun 5, 2024, 3:04 AM

#

viral plaza (A) fix the pos embed code (B) train the target resolution (I'm sure somebody wi...

roger, thanks

viral plaza Jun 5, 2024, 3:07 AM

#

sterile pendant What is the current resolution range before it starts artifacting, assuming ~1 m...

aspect ratios are trained in and work fine ye

#

basically the same as SDXL

prisma rampart Jun 5, 2024, 3:22 AM

#

viral plaza (A) fix the pos embed code (B) train the target resolution (I'm sure somebody wi...

since you expect people to do 2048 tunes, something no one really attempted with sdxl, and pixart has 2k/4k variants of their DiT model, are DiT models generally easier to finetune to higher base res vs unet based ones?

viral plaza Jun 5, 2024, 3:24 AM

#

nobody had a reason to with sdxl

#

cause sdxl you can just do hires fix and you're done

#

sd3 will get distorty if you try to run it straight like that

#

so there's a reason to bother making a hires tune

#

also yeah the training team said that sd3 moved resolution objectives very easily

sterile pendant Jun 5, 2024, 3:56 AM

#

viral plaza basically the same as SDXL

Awesome, thanks! I kind of assumed so, but didn't know for sure.

dusky thistle Jun 5, 2024, 4:23 AM

#

viral plaza sd3 will get distorty if you try to run it straight like that

honestly, i don't think it works that great with sdxl either - a lot of compositions are degraded a bit with those latent upscales

#

stuff like... a sandy beach with patches of wet sand underneath dry sand kicked up with the color and texture clearly visible, pebbles and stones scattered around... that kinda stuff disappears during those latent upscales

#

it's not a huge degradation... in a way i'm glad it's a big one for sd3 so we can actually get a proper tune on higher resolutions

viral plaza Jun 5, 2024, 4:26 AM

#

tru

turbid grotto Jun 5, 2024, 5:09 AM

#

If they decided to switch to the 2b version, that means 8b wasn't close to being ready, so could API 8b be really far from its final quality and we can see big improvements? Or is it already at the level where the difference won't be really noticeable?

hallow lion Jun 5, 2024, 6:05 AM

#

:diffusionhand:

radiant ledge Jun 5, 2024, 6:13 AM

#

turbid grotto If they decided to switch to the 2b version, that means 8b wasn't close to being...

someone from stability said 8B is very undertrained

viral plaza Jun 5, 2024, 6:48 AM

#

turbid grotto If they decided to switch to the 2b version, that means 8b wasn't close to being...

yes 8b needs a lot more training

flint minnow Jun 5, 2024, 7:26 AM

#

How much vram do you need for sd3?

late compass Jun 5, 2024, 8:24 AM

#

viral plaza yes 8b needs a lot more training

Which one will be best for anime, 8b or 2b?

twilit hamlet Jun 5, 2024, 8:32 AM

#

How much vram do you need for sd3?

late compass Jun 5, 2024, 8:35 AM

#

@viral plaza

sterile heath Jun 5, 2024, 8:35 AM

#

twilit hamlet How much vram do you need for sd3?

2b model is smaller than SDXL while 8b is larger than SDXL + the refiner

#

2b model is about 2.5x larger than SD2 in terms of params

#

But it’s smaller than SDXL so if you can run that you’ll be fine

late compass Jun 5, 2024, 8:36 AM

#

How much power does the 2b version need... Does it need more than SDXL?

sterile heath Jun 5, 2024, 8:37 AM

#

late compass Which one will be best for anime, 8b or 2b?

It probably depends what the community trains the most.

#

8b will be better out of the box but it’s likely 2b will have more fine tuned variations via the community

late compass Jun 5, 2024, 8:37 AM

#

sterile heath It probably depends what the community trains the most.

Yes but better knowledge before experiment

late compass Jun 5, 2024, 8:38 AM

#

sterile heath 8b will be better out of the box but it’s likely 2b will have more fine tuned va...

8b will be way more powerful and detailed than 2b...

sterile heath Jun 5, 2024, 8:39 AM

#

late compass 8b will be way more powerful and detailed than 2b...

Only kind of. It has more potential but is harder to train. People still use SD1.5 fine tunes over SDXL even though SDXL is way larger and more “powerful”; SD1.5 is simply easier for the community to train on their limited resources.

gusty gale Jun 5, 2024, 8:47 AM

#

sterile heath But it’s smaller than SDXL so if you can run that you’ll be fine

You're not considering that SD3 uses T5 XXL as one of the text encoders, with T5 it could use more memory than SDXL actually

#

This could also be irrelevant by moving T5 to VRAM and switch with the transformer diffusion model when being used

storm saffron Jun 5, 2024, 8:51 AM

#

gusty gale This could also be irrelevant by moving T5 to VRAM and switch with the transform...

T5 can be either quantised or loaded in system RAM and offloaded to CPU

gusty gale Jun 5, 2024, 8:53 AM

#

storm saffron T5 can be either quantised or loaded in system RAM and offloaded to CPU

True that, that's pretty much what I said. The size of T5 XXL might be irrelevant if the inference switches the transformer model with T5 when being used.
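
A toy accounting of why swapping or offloading changes the picture. The sizes are placeholders, not measured figures for SD3 or T5 XXL, the helper is hypothetical, and activations and framework overhead are ignored.

```python
def peak_vram_gb(stages: dict, swap: bool = False, on_cpu: tuple = ()) -> float:
    """Peak weight memory on the GPU for a pipeline of stages.

    swap=False: every GPU stage stays resident, so sizes add up.
    swap=True: only one stage is resident at a time, so the largest wins.
    Stages named in `on_cpu` never touch the GPU at all.
    """
    sizes = [gb for name, gb in stages.items() if name not in on_cpu]
    if not sizes:
        return 0.0
    return max(sizes) if swap else sum(sizes)

# Placeholder sizes in GB, NOT measured figures for any real model.
stages = {"text_encoder": 9.0, "diffusion_transformer": 4.0, "vae": 0.2}

print(peak_vram_gb(stages))                                       # all resident
print(peak_vram_gb(stages, swap=True))                            # one at a time
print(peak_vram_gb(stages, swap=True, on_cpu=("text_encoder",)))  # encoder on CPU
```

With swapping alone the largest single stage still sets the peak, so a big text encoder only stops mattering for VRAM if it is also quantised down or kept on the CPU.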

late compass Jun 5, 2024, 8:55 AM

#

UWU

agile hornet Jun 5, 2024, 9:11 AM

#

Is stable assistant using the 2B version because I liked some of the stuff I was getting when I used the trial

#

#

I still had some problems with hands but all in all I got some good output from it

hallow lion Jun 5, 2024, 9:13 AM

#

8B will serve as the template for the matrix.

#

We can't have that

#

we're not ready

sterile pendant Jun 5, 2024, 9:20 AM

#

gusty gale True that, that's pretty much what I said. The size of T5 XXL might be irrelevan...

You can also do T5 inference on CPU as well, granted it's slower. You can try out what I mean right now by using t5 with pixart sigma. Even on cpu, it's not all that bad. Also, the encoded prompt (the conditioning) can be cached, so as long as you don't change the prompt, you can just go straight into ksampling.

noble coyote Jun 5, 2024, 10:26 AM

#

Using SD3@ClipDrop - hands, limbs and faces are execrable!!!

#

Mostly ...

dull star Jun 5, 2024, 10:27 AM

#

sterile pendant You can also do T5 inference on CPU as well, granted it's slower. You can try ou...

yeah I didn't even know that you can just have it cached and reuse it, it actually makes tuning CFG, changing samplers not a chore

noble coyote Jun 5, 2024, 10:29 AM

#

Try the free SD3 @Glif - by a user named FABLAN

hallow lion Jun 5, 2024, 10:29 AM

#

just be careful

#

glif will deem everything nsfw

#

tread lightly

dull star Jun 5, 2024, 10:29 AM

#

not in my experience lol

#

idk what you are prompting

hallow lion Jun 5, 2024, 10:29 AM

#

well i wasn't prompting nudity

dull star Jun 5, 2024, 10:30 AM

#

lol

noble coyote Jun 5, 2024, 10:30 AM

#

hallow lion glif will deem everything nsfw

Yes, even the most tame of female pictures were blurred-out!!!

hallow lion Jun 5, 2024, 10:30 AM

#

still half came out blurred

#

and was told to chill or I'll get banned

dull star Jun 5, 2024, 10:30 AM

#

noble coyote Yes, even the most tame of female pictures were blurred-out!!!

yeah that's the API itself

#

glif only uses like a word list

noble coyote Jun 5, 2024, 10:31 AM

#

ClipDrop can often do the same - but they do not recompense you when you lose 28 of 40 pictures like that - and all from the tamest of tame prompts!!!

hallow lion Jun 5, 2024, 10:31 AM

#

omg

#

ripoff

#

tsk tsk

noble coyote Jun 5, 2024, 10:32 AM

#

You quickly learn which words/themes/topics will send ClipDrop into a headspin!!

#

Slender is a non-ClipDrop word ....

#

Sensual too ...

#

So I'm doing beach-crazy lighthouses for safety's sake!!! 🙂

#

Pixart-Sigma into SDXL

sterile pendant Jun 5, 2024, 10:35 AM

#

dull star yeah I didn't even know that you can just have it cached and reuse it, it actual...

Yeah it will hopefully be that way day 1 in comfy. Pretty sure it's just kind of a comfy thing for all nodes already. The only thing that might make it have to do inference on the t5 prompts again would be if it had its own seed set to random or something.

#

Then again, it could also be one of the other nodes people commonly use like rgthree that does the caching I'm talking about. Haven't used vanilla comfy in ages

dull star Jun 5, 2024, 10:37 AM

#

ah yeah rgthree

storm saffron Jun 5, 2024, 10:37 AM

#

noble coyote Pixart-Sigma into SDXL

That's quite a good workflow for prompt adherence + details. 🙂

sterile pendant Jun 5, 2024, 10:37 AM

#

Either way, it's an option and even in vanilla comfy, it would be like two lines of code for the node

storm saffron Jun 5, 2024, 10:38 AM

#