#sd3
i think more people who want to diffuse images as a hobby should start studying art history. i could certainly do with more myself
yeah history of photography would be good to learn
you could just prompt for retro, but if the model is over fit on a digital aesthetic, then well, you get a modern look to your retro style image.
ye
balls

can you try a low CFG image
and see if you like it
oh I don't know about cartoon style
could you do a realism one
look at the meta data for this guy's images. he's posting sdxl sydney sweeney obsession love notes
marketing works too well on teens
I don't rly judge how people use the models
I personally don't do any NSFW or celebrity stuff
but open source is open source I guess LOL
i do. also, this is #sd3 . what is the context for these images in this discussion? none of his posts are sd3 examples.
yeh i believe in freedom, but i'm gonna judgem still
lol fair enough
who's posting sydney sweeney gens?
the way I see it is
what % of the photography + photoshop industry is either celebrity or NSFW?
probably quite a high percentage
I'm in the UK its actually unlawful here, surprisingly enough
they're generated with newrealityxl and not #sd3. also, you really gonna ask what the problem with attractive celebrity generations is in a post swiftgate environment? k. just know that many like me exist. we'll see that and immediately know what you're doing
you do. same old bag of tricks
use google if you truly don't know. it was a big deal. you were maybe there even.
taylor swift and deepfakes - and the lawsuits
for months, yes
in fact, laws are being passed now that the mess caused
this image uses a higher cfg than your earlier image. meta data is all readable. you're clearly not asking for help here.
no. he's here to strut and cause arguments. how's your next lora doing?
been busy and haven't done datasets. haven't been in the right gear either. working slowly when i can
i know the feeling. looking forward to seeing it when you get it done
this is the SD3 channel. since you're not using SD3 you should be posting your images in some other channel
that's another with cfg of 7. join the 5 gang
- my cfg is never higher than 4
you're using an ensd too. people don't understand that setting at all
when i see people using ensd, i judge so hard
@torn wharf have you thought about putting your lora here https://comfyworkflows.com/
as a workflow
popular models authors on civit would be like "I recommend using this exact value for the ensd!" but like, why?
neat idea to host it. give people something they can use
you can also lower the CFG feel
with FreeU and CADS
FreeU normally makes the image look higher CFG
but you can reverse it
its good both ways
freeu doesn't work on sd3 since we use transformer blocks over here. keep up.
lol
not sure about A1111
yeh it does but it's inefficient
comfyUI
I can't go without PAG any more
PAG is absurdly good
sometimes you can add as little as 0.3 PAG
and it will fix things up as if you had raised CFG by 3
please go away
well diffusers is the best long term one in my view
so you can jump on diffusers now if you want
but it has maybe 1% of the features of comfy ecosystem
diffusers is made in the style of huggingface transformers
which is a top LLM library
its a much slower and more deliberate project
(but it will take years to get good for that reason)
you are no longer welcome, all you've done here is strut around, try to prove you're someone special, insult all of us, and we're tired of you.
its not rly fair to criticise comfy
because comfy developed along with the tech
newer libraries are coming into it with the full knowledge of the tech from the start
i'm not sure how you can live in a world where people accomplish so much with comfyui, and still believe that they're masters and it's the software that is the problem. it's a bad act. i miss legends like blood wizard. these shenanigans here are low effort and just taking advantage of a cut back mod team. i gotta go. things to do. this isn't impressive trolling one bit. didn't even strip meta data or try to pretend that was the sd3 vae.
you can't call other people kiddo when you have a crush on sydney sweeney. that's just expected.
💣
boom?
explodyballs
boom!
I didn't realise they were actually increasing the CFG when they were saying they were decreasing it
that is definitely a novel form of trolling
yeah.
Hi, is there inpaint for SD3?
no
maybe powerpaint 2 node with brushnet?
I don't do inpainting so I am not sure
I was under the impression there were some model agnostic ones
Yes, use differential diffusion and blur the mask. Play with the denoise value, it will take far less than you're normally used to with sdxl, to make big scene changes(the sigma curve for sd3 stays high for a lot longer than sdxl sigma curves). You can use 1.0 denoise to completely change things as well
With diff-diff, you don't really need an actual inpaint model
The mask blurring is where it soft blends. I recommend 35-50 steps for best results
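A rough illustration of why the blurred mask gives a soft blend, assuming diff-diff thresholds the mask against the remaining share of sampling steps so stronger mask values get unlocked earlier. The helper names (`box_blur`, `unlocked`) and the 1D toy mask are made up for the sketch and are not the actual node code.

```python
# Toy 1D version of a blurred inpaint mask used as a per-pixel "change strength" map.
# Assumption: a pixel is allowed to change once its mask value >= the current threshold,
# and the threshold falls from 1 toward 0 as sampling progresses.

def box_blur(mask, radius):
    """Average each value with its neighbours to soften the mask edge."""
    out = []
    for i in range(len(mask)):
        window = mask[max(0, i - radius): i + radius + 1]
        out.append(sum(window) / len(window))
    return out

def unlocked(mask, step, total_steps):
    """Which pixels may change at this step (step counts from 0)."""
    threshold = 1.0 - step / total_steps
    return [m >= threshold for m in mask]

hard = [0, 0, 0, 1, 1, 1, 0, 0, 0]
soft = box_blur(hard, 1)  # edge pixels now sit between 0 and 1

for step in range(4):
    row = "".join("#" if u else "." for u in unlocked(soft, step, 4))
    print(step, row)
```

The centre of the mask is edited for the whole run, while the blurred edge only joins in during the later steps, which is where the blending comes from.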
Nice image 🥺
just a guy pretending to post sd3 pics and looking for advice.. weird troll
wasn't worth much popcorn
ahh, I missed the trolling
yeah - like more than 12 hours ago

Skill issue
Anyone figured it out how to finetune Sd3 for anatomy?
it does anatomy just fine as long as you prompt it correctly
how to prompt correctly for a woman sitting in the grass, one leg flat on the ground, one knee up, hand on her knees, head leaning on her arm ... ?
Perhaps SD3 needs more of a story?
something like "Imagine a serene scene in a lush, green meadow on a warm afternoon. A woman finds a comfortable spot in the grass, feeling the soft blades tickle her skin. She decides to sit down, letting one leg rest flat on the ground, while she casually bends the other knee up towards her chest. With a gentle grace, she places her hand on her knee, feeling the warmth of the sun on her skin. Feeling relaxed, she leans her head against her arm, enjoying the tranquility of the moment as she gazes out at the peaceful landscape around her."
nope. there is a core issue with all subjects, not just a human female or humans in general. the farther you get away from standing straight in front of the camera, the more the subject is affected. it begins to warp, things shorten or elongate, and the AI starts trying to draw the subject from multiple points of view. it affects everything - trucks, cars, birds, paint brushes - everything. and you can't prompt it out. it's in the core training.
so, with "prompt correctly" you mean only prompt for things in a simple pose, standing up, front view, etc ... ?
i mean prompt correctly - you have 3 encoders with SD 3, each has their own strengths and weaknesses. use different prompts with each. the issue has nothing to do with anatomy, either.
Don't waste time with prompts, anatomy errors belong to the model. If you generate an image without anatomy errors, it's just luck
go to sleep
until they fix the bewbs and so on I am more than happy to use it for backdrops.
it makes phenomenal backdrops
if they do bring the anatomy up to the level of the backgrounds and landscapes i will be impressed. that indeed would be almost like the last model we ever need
XD

i'm trying to create a logo, why does the background looks like that ? Also it seems that this is not the 16ch vae quality
ah it was the perturbed version of sd3
What did you prompt?
the smaller model, schnell is 12B, its apache 2
but holy crap we might not be able to run this
yes and it's 1-4 steps. But yeah 12B is for the gpu rich lol
ComfyUI compatible?
so stability lost a lot of the people that worked on sdxl, svd and sd3 etc and they went and created their company ?
it literally just got out
But can you run it on ComfyUI? LOL
the company just announced itself today and released the models the same time. So how can you run it on comfy. maybe you can't without support. But you have the demo on fal ai
great then, i missed that part
where did you find that ?
i just checked in comfyui, it is indeed supported 1h ago
running from comfyui ?
Nah that's SD3
I can't find the .safetensors formatted checkpoints... only the sft from Huggingface
nevermind. Someone said .sft [is] .safetensors
and she runs...
Almost 24GB for a model file? 
people are using also 8 steps
Flux it seems
I thought it was an llm that described the image, like llava
so we can finally get woman on grass ?
Gib workflow
Yeah I can't figure out how to build the workflow. I dl'd the main .sft model and the ae.sft (maybe ae is vae?) but I keep getting this error:
Updated comfy and all
Rename checkpoints to .safetensors
Put the model under UNET and the VAE under VAE of course.
Use the SD3 ClipText encoders
wait why sd3 clips ? Don't they have their clips ?
by the way you don't use clip g in sd3 ? Not worth it ?
I use all three clips in SD3
Apparently not
what if you use only T5 in schnell, what does it change ?
is the model good ? prompt adherence ?
yes created a logo with text and it was excellent
Ehh
text encoder is included in the file maybe ? and with sd3 clips you are just adding some random noise ?
Idk how you run this 
Im still at clip encoding with my 4090
https://replicate.com/black-forest-labs/flux-schnell I test it here
I meant local 
OMG
Epic anime artwork of a wizard atop a mountain at night casting a cosmic spell into the dark sky that says "Stable Diffusion 3" made out of colorful energy
Works great, tysm!
its very good in producing text
`Write this poem with cursive text on a background that fits the words:
Sunshine is bright,
The sky is blue,
Birds sing sweet songs,
And so do you.`
I copied that already but getting stuck at clip encode unfortunately
Wow
Amazing!
Use the SD3 clip encoders... are you?
crying gpu tears
Yep
Photo of bionic robot man with glowing circuitry and complex gears. He has a grim expression. Background is futuristic cyberpunk world. His helmet has "62717" printed on it in blue letters. Highly detailed.
this is FLUX.1 [pro] btw, the other two model not as great
Cartoon, french frie fighting with a potato using forks
almost nailed the text
wait, what gpus are you guys getting this to run on?
3090ti
Im using https://replicate.com/black-forest-labs/flux-schnell, my 3060 won't run it xd
whats SD3 turbo eh hehe
I got it running but- at 768x1024 max size and it was at like 23.8 GB
That's a lot of parameters that don't seem necessary. Smaller models will dominate the future the same way that massive double Bay 5.25 inch drives weren't the form factor that hard drives used for very long. Smaller is how tech goes.
I'm not even going to bother since I only have 16gb
Creepy but good prompt adherence. Changed "grim expression" to "happy expression"
cctv footage, outside a shop a magician is throwing a lot of pizzas,
bad quality Image chaotic
This will be huge if we get it to run on lower hardware
holy sheet
Photo of Criminal in a ski mask making a phone call in front of a store. There is caption on the bottom of the image: "It's time to Counter the Strike...". There is a red arrow pointing towards the caption. The red arrow is from a Red circle which has an image of Halo Master Chief in it.
this make memes so much easier
that looks familiar
if I get fp8 weights and a workflow im using it offline fr
especially since its a turbo model, it can be dog slow for all I care, its all 8 steps
got it running on 3060, looks good 
Annuvin desu
possible on 3060 ? wow
apparently this model is good with cctv footage LOL
YOU WHAT? THANK YOU I'LL TRY IT
make sure you have a lot of swap if you don't have a lot of ram
I think the fp8 t5 might do better too
WHAAT HOOOOOOW?
I'll try
it's just the workflow from above
I was trying to figure out how to connect the nodes and renamed them
ohh
how long does it take ?
that's difficult to say tbh, sampling isn't that bad, about 13 seconds for 4 steps, but the constant offloading adds a lot of time
How much time does it take to generate an image? A few minutes I guess, right? (i also have a 3060)
PD: i just read the other comment
the image above took 130 seconds total
Are you on windows?
if you're just doing sampling, as in already have the prompt encoded I think it would be pretty fast, going to try with fp8 t5 now
yeah, I'm on windows
are there any fp8 weights for the transformer itself?
ah no wait, I'm blind, that was 1 minute 13 seconds for sampling, not 13 seconds
makes sense with all that offloading
oh god my ram is running out
its reaching the 32GB
f****ckkk
there's just that one checkpoint in the flux repo right now, give it some time ig
its going into swap
possible we can get smaller version of the model ?
yeah fp8 would be the solution
or a further distillation, idk
huh, its unloading some of the memory
not bad at all
damn this is going to be so slow
lowvram mode
huh? now its using the vram
its piling up
holy crap
we'll need fp8 weights immediately
ok 2s/it isn't that bad
godly model
send help
fp8 t5 + fp8 weights might be viable
cause holy crap my computer went super slow mode because of it
discord was dying
Anyone got a t5 fp8 link? I can't find it on huggingface
it's in the sd3 repo
probably they have the text encoder baked into the model, otherwise they would put it up for download. So you are already loading the TE and loading it again with sd3 clips
Just need to figure how to load it then
true that, it shouldn't be too hard to extract the transformer part tho
actually, there's a commit in their repo for diffusers support
they have the model split up into parts there
not sure if comfy can load text encoder that's in like 4 pieces
so yeah the DiT weights are 24GB itself, which isn't so great for offline use
but we'll see how fp8 will perform
maybe they are not that big, probably that's including the text encoder.
And if they use T5 as TE then that's pretty big already
sd3 2B it's 4 gb right ?
oh shit
I am using the Unet loader (heh its a DiT) so its just the model itself
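A quick back-of-the-envelope check on those file sizes, assuming size is roughly parameter count times bytes per parameter and counting only the 12B transformer (no text encoders, no VAE). The helper name `weight_size_gb` is just for illustration.

```python
# Rough weight-file size for a 12B-parameter model at different precisions.
# Assumption: size ~= number of parameters x bytes per parameter, nothing else in the file.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp8": 1}

def weight_size_gb(num_params, dtype):
    """Approximate size in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_size_gb(12e9, dtype):.0f} GB")
```

On that estimate a 16-bit checkpoint lands right around the 24GB mentioned above, and fp8 would be about half of it.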
what text encoder are they using?
all 3 ?
do you have a link where you can see what TE flux is using ?
okay so they just use Clip L and T5
yes
that clip L might have artist names which is nice
comfy added a flux example
https://comfyanonymous.github.io/ComfyUI_examples/flux/
and did some fixes looking at the repo
4090
waiting for the quant myself.
Give it a day or two for people to test stuff and let the dust settle I guess
But this really came out of nowhere
with fp8 T5, I can actually comfortably use this now
this is amazing
fully offline, apache 2, fast
dream come true
Where do you get the fp8 version?
Btw this is the SD3 channel, I think we should move to General 
yeah sorry I'll stop boasting about off topic models kek
oh duh, I can't read. Ty!
All good, im in the same boat lol
yuh
it's using sd3 encoders, it's like half sd3 
so it be 1/4 OpenAI, 1/4 Google
oai finally contributing something
clip L and T5 ya
is this also a 16ch vae ?
yes
Oof
update comfy
sorry one last question, is flux schnell or dev good at cctv? cause schnell doesn't seem to get it right
its sad that dev is the non-distilled model and its not apache 2 as well
such a bummer
You have a prompt or two you want me to try? I have Dev and Schnell... although Dev is calling for 20 steps and it takes forever
I'm downloading Dev right now
but its just simply something like low quality, blurry, birds eye CCTV security camera photo of, a muscular homeless man with torn clothes and long beard is rampaging in a store. There are police men are around him trying to stop the homeless man
dev or schnell
dev
Schnell
How many steps you using with Dev?
20 simple same as the workflow
This is straight from the model, no upscaling
1792x2304
sd3 encoders are off the shelf components that aren't specific to sd3
i keep thinking you guys are talking about sd3 and i'm like wow native sd3 hires. very cool!
But you're still on about flux
so guys- you can actually just int8 quantize with torchao

(for those of us who dont want to use comfy)
~1.4 it/s on 4090
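For anyone wondering what int8 quantizing actually does to the weights, here is a minimal pure-Python sketch of symmetric per-tensor quantization. It is only an illustration of the idea and not torchao's implementation; the function names and the sample values are made up.

```python
# Symmetric per-tensor int8 quantization: store small integers plus one float scale.
# Illustration only; real libraries work on tensors and usually use finer-grained scales.

def quantize_int8(values):
    """Map floats to integers in [-127, 127] using a single absmax scale."""
    scale = max(abs(v) for v in values) / 127 or 1.0  # fall back to 1.0 for all-zero input
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate floats from the stored integers."""
    return [x * scale for x in q]

weights = [0.5, -1.27, 0.031, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
print(q, scale)
print(restored)
```

Each restored value is off by at most half a quantization step, which is the price for storing one byte per weight.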
heretics... (need to make a warhammer40k chaos marine meme)
spingebill in gta san andreas??? no way
I'm a developer so - I dont like the codebase, like, at all 
Its just so worth to wait for images, especially when they're this intelligent and good
god dammit
its a dream come true
i'm a developer and i absolutely hate python... but with comfyui, it's not like you need to actually code for the base engine. making addons is super easy and you can do whatever you want there, so long as you use the template correctly
for the ins and outs
I stand defiant, with a piece of corn balanced horizontally sideways on my head.
I used to hate python until I realized how much you can do and how easy it is, also ML stuff is python so- I don't really get much say in the matter, and regardless I would rather have the ability to code stuff fast with very little effort required.
Also some things aren't really possible
i'm well aware... but realistically, what coding are you actually trying to do with already made models/samplers/schedulers/etc?
like? give a practical example
extremely fast 4step 1024x1024 hyper autocomplete generation
- over network.
100% doable with comfyui
I have doubts it would be as fast as mine.
there are vlm nodes for prompt expansion, there is an api flag and dev workflow mode to create the workflow for simple api usage
99.99% sure it would be faster, i doubt you have hundreds to thousands of lines of code for things like smart memory management. diffusers would be the only other close alternative
considering that for mine I literally had to create a custom react app with multiple websocket webworkers so it could process data fast enough without lagging the UI.
Also it has a custom backend which uses torch.multiprocessing workers which it can round robin submit jobs to async over two gpus, and even the image encoding is done via nvjpeg on device so that the only thing that comes off the device is jpeg bytes.
I made a neat torch extension for that
swarm works great for multigpu setups
yes
swarm based on comfy
no, im talking about https://github.com/mcmonkeyprojects/SwarmUI
we butter our bread with butter
Oh- ok, mine is docker swarm
stop arguing and look at this image
wow nice image
I'm not arguing, or at least not angy-arguing
I understand
it is silly-mode argument. I know I might be wrong, but I want to be right, 
this a gen?
Flux vs sd3 using a short no effort prompt. Though this us very case specific since sd3 doesn't really know the word anthropomorphic.
yessir, flux-dev gen
i wanna say i love how sd3 can do sd television screens with higher detail in the rest of the image. so smarticles.
But i'm afraid it's flux again
kek
I stole the tv prompt from dark and added a black and white comedy show
my paranoia has been validated. fk.
i was gonna say that looks like i ❤️ lucy which tracks
but can it ball down?
can it ball?
It can't do smut
does it get up and get down with the ballness?
man, my poor 2080 is getting 25 s/it lol... schnell is the only way i can really play with it
there's nothing perverse about balls they're perfectly natural
Atleast they dont have 3 legs and 6 arms like asura unlike sd3
1 side, no edges. no corners. can't hate
oh nice, looks like comfy added in https://github.com/comfyanonymous/ComfyUI/commit/d7430a1651a300e8230867ce3e6d86cc0101facc to load it in 8bit
dude works so damn fast
i could live with that just fine, i was getting 25 second per iteration
WHA
bout to try it again
I haven't had much trouble with sd3
I barely managed to load the models and comfy already pushed commits
I recommend fp8 t5
the license says you can't use anything to circumvent limitations designed into flux
im using it
deepfloyd-aah shit
it's a non commercial license but i think they might offer commercial licensing unlike df
2D pixelart Gameplay screenshot of Terraria. The character is in the middle of the screen where it is holding a pickaxe. There is a wooden house next to the character.
It's not terraria, but it got it right?
they do
you have to email them and everything
you cant just buy it off of a website I think 🤷‍♂️
stability is setting a trend here for commercial licensing weights and i dont think it's so bad. money in helps quality out. in some cases.
yeh not self serve.
a price would be good to know though
WTF its so good
it doesn't know what diamond is so its not blue
not even dalle 3 could get this right!
hard to do when there's so many different use cases and the market is so new and fresh. like you'd license it to dreamworks for general production purposes a lot less than you'd license it to an independent creator for a specific use case. these kind of licenses usually require signing an NDA and sharing business plan data
i bet a self serve one would come. like the membership option for $9.99/m or whatever.
for not understanding diamonds in minecraft, it still is pretty friggin good at minecraft
you have the schnell model with apache 2 licence
that probably people are going to finetune and make it better
thats from the license on the hugging face page for dev
confirmed ball capability
idk how they are gonna finetune a distilled model, but if its made possible, then we have hope
There is a ball with a wacky looking face on it, floating in space. A wooden sign hangs from 2 small chains connected to the underside of the ball. It says "Get Yer Balls Out" painted onto it like a moonshiner sign. There are colliding stars in the very distant background.
switched the unet loader between e4m3fn and e5m2. e5m2 cut the time in almost half (15 sec/it down from 25 sec/it). image is identical with the same seed
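For context on the two names: e4m3fn and e5m2 are both 8-bit float layouts, with 4 exponent bits and 3 mantissa bits versus 5 exponent bits and 2 mantissa bits. The sketch below works out the largest normal value of each, assuming the usual conventions (e5m2 reserves its top exponent for inf/NaN, e4m3fn only reserves the single all-ones pattern for NaN). It says nothing about why one loaded faster here; the function name is made up.

```python
# Largest finite value of the two fp8 layouts, derived from their bit layouts.
# Assumption: e5m2 is IEEE-like (top exponent reserved), e4m3fn reserves only
# the all-ones exponent + all-ones mantissa pattern for NaN.

def fp8_max_normal(exp_bits, man_bits, ieee_like):
    bias = 2 ** (exp_bits - 1) - 1
    if ieee_like:
        max_exp_field = 2 ** exp_bits - 2   # top exponent is inf/NaN
        max_man_field = 2 ** man_bits - 1
    else:
        max_exp_field = 2 ** exp_bits - 1   # top exponent still holds numbers
        max_man_field = 2 ** man_bits - 2   # all-ones mantissa there is NaN
    return (1 + max_man_field / 2 ** man_bits) * 2.0 ** (max_exp_field - bias)

print("e4m3fn max:", fp8_max_normal(4, 3, ieee_like=False))
print("e5m2   max:", fp8_max_normal(5, 2, ieee_like=True))
```

So e5m2 trades one bit of precision for a much wider range, and e4m3fn does the opposite.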
yeh its not a bad model its pre good
half life 1 danger ball zone
I did it and still get the same error, Maybe I gotta reinstall
it kind of vibes source engine sometimes and goldsrc other times. but it's very effective.
awesome 3D render man! what pbr textures did you use?
people are not ready for the coming wave of fake
yeah i see why it's taking so long, i just simply don't have enough vram, so it's using sysmem. noticed my cpu was sitting around 50% the whole time i diffused and my vram usage was only like 6-7gb
this aint half life 1 but holy crap
i'm reluctant to go to general with images, because other people will start using the replicate link to experiment then. it'll become super busy and queue times will get longer. SD3 channel is safe because all the haters ignore it
i really need to upgrade to something with 16+ gb vram
longer this stays under the radar longer i can crank out gens with it
ihave 16 and won't even bother trying to load this locally yet
I thought I didn't make the right choice with 24GB because of 2B fitting well anyway in 12GB
but auraflow and this model proven me otherwise
I am finding from gen to gen, some are super fast and then the very next one is super slow :/
well if you do, here's what to expect with something around the 2080 level lol
only messing around with the distilled model
4 steps lol
actually thats not so bad
yeah the distilled model is surprisingly intelligent for a turbo-ish model
I recommend it
it knows teslas but not cybertruck
but now I got super spoiled by dev so I'll keep using that
yeah dev at like 30 steps would take an eon per image. assuming 15 seconds per it, that's like 7-8 minutes per image lol
oh I'm using 15-25 steps
at 15 I already got amazing images
25 might help a bit
also im using 1024x1024 type resolutions
well even still, they'd be like 5+ minute generations on my pc
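The estimates above are just steps times seconds per iteration. A tiny sketch, assuming a flat 15 s/it and ignoring text encoding, model loading and VAE decode (the function name is made up):

```python
# Sampling-only time estimate: steps x seconds per iteration.
# Assumption: constant speed per step; loading, text encoding and decode are not counted.

def sampling_minutes(steps, seconds_per_it):
    return steps * seconds_per_it / 60

for steps in (4, 15, 20, 25, 30):
    print(f"{steps:>2} steps at 15 s/it: {sampling_minutes(steps, 15):.2f} min")
```

That puts 30 steps at 7.5 minutes and 20 steps at 5 minutes, in line with the numbers quoted in the chat.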
lower res images work too
oh thats right, it can go WAY down if you want it to
like i think it can go down to like 0.1 MP even
it has to be democratized
these new transformer models hitting the streets lately have reinvigorated my prompt curiosity
auraflow was made working on like 12GB or lower right?
lemme test 512x512 real quick
8 seconds per it at 512
soon as i hit a few limits with clip style prompting, my idea of a prompt became pretty formulaic. i used to experiment with wacky stuff more often
schnell?
dev with woman lying in grass test though....
Dev
kek
Does Dev use more vram than Schnell?
not sure, think they're the same size
but they are definitely meant for 24gb cards
Time to destroy my PC with the 3060, i'll test it
well like i just said, expect something around the 15 seconds per it range
15-25 s/it *
Lmao
my cpu is a 13600kf and was pegged around 50% the entire time, ddr4 3200 ram
No but it takes more steps
Mine is not being used at all. Even when it goes over VRAM, it just eats RAM but no CPU hits
its like generating pictures but with alot of lag
This is perfect. You just have to know that the woman on the left is perfect and that she is just laying on top of the butt of another woman that she buried head first in to the ground
break dancer doing a hand stand. wearing an adidas tracksuit with neon tie dye colors. on a cardboard mat outside of a retro style diner. The Diner sign reads "BALLS DINER"
doesn't entirely pass the handstand test but it fails a lot lot less
is there any chance to run on 12gb with offloading or something like that?
crazy to call that an improvement, but it is haha
Yeah someone ran it on a 3060 earlier
first try
Where can I find that cafe 
i am with a 2080
considering i was pushing the prompt out pretty hard with a lot of extras, it did well
11gb?
they actually took the anatomy problem seriously i think instead of just chucking porn at the dataset
Would it work offloading it if i have 16gb of ram?
ya it is for me
right next to hotel california
flux can make female nipples. yes. I said it.
this is peak balls. this prompt was fun. it takes randomness very well and tunes details extraordinarily.
all the more reason to contain it here.
goons will over use the free sites available. we can stave off september
so far i cannot make it create nipples that look like asparagus, but i've got nipples that look like fingers
can't do carrot nipples either. unfortunate
close enough?
lol it does have fails with body positioning but is undoubtedly improved
Anyone else getting an error "conv_in.weight" when trying out the flux example workflow?
no
What version of comfyui are you guys using? I just get errors with the portable one
just confirmed they used the multi modal DiT structure too. 31m in funding makes them more of a powerhouse than stability if i'm not mistaken. uhoh stability. how soon before flux discussion gets seriously enforced upon ?
you need to use the Load Unet and not the load checkpoint
hmm sadly flux cant do impressionist paintings
it makes SUPER photorealistic paintings
to the point they look like photos
Im doing that :C
Yes, using Unet loader and the flux models. I updated comfyui. Weird. Still get the conv_in.weight error.
Okey gotta reinstall from 0
bruh
my comfy is ded
just distilled things
dev is distilled pro
well not like turbo-type distilled, but yeah
its like distilling from a large model into a smaller one
and not making it perform faster at lower steps, no?
I imagine the pro model being massive
https://blackforestlabs.ai/announcing-black-forest-labs/ heres where i'm looking. guidance distillation it says
thank you
FLUX.1 [dev] obtains similar quality and prompt adherence capabilities, while being more efficient than a standard model of the same size
yeah it just wont make paintings at all sadly
they look good but they are too photoreal
yeah like it has one idea for "painting"
I even asked for large brush strokes
sometimes distillations can collapse a class like that i've noticed. saw it on vodka sdxl models
different kind of distillation too
My 3060 is still worthy! 
It runs on 3060, 170s for 20steps on dev version, including text encoding on weak cpu. I am using fp8 T5 and lowram in comfy
been using fp16 T5 cuz both default to low vram regardless for me, seems the same time either way
how did you get that camera style?
Honestly, this is insane. Truly. I think SD4 is the only move they can do at this point. Or do a nullsoft style thing and jump straight to SD5
(stolen image from reddit) indeed, its an amazing model
Yeah SAI really need to up their game and fast since now they actually have proper opensource competition
broken base model. garbage
yeah. This is Proper
those 2 pixels are incorrect, unacceptable
The more competition the better it is for us
Is it normal that the gpu usage is 3%? (and only about 33°C)
I love auraflow team. Underdogs for sure. Woo woo for the underdog.
Blackforest have 33m funding
and expertise
Low quality disposable camera photo from 2007, there is Shrek and Joe Biden brawling and fighting in a hotel lobby, CCTV footage, motion blur, overexposed, discoloration, long shot
nvidia took on pixart team. who knows what happens there .. but this is truly proper. I hope lora training for these don't need 80gb
Auraflow definitely proved that SAI isn't the only company that can produce a somewhat good opensource model, then Flux came in and punched SD3 out of the water
i would say that they just detonated a mother of a depth charge and we will see the surface results of it quite soon
has this thread been hijacked for flux? cuz wow it's good! sword test actually... works?
Black forest labs, is actually now working on a SOTA opensource text to video model
i saw that
pixart might be some 900m(?) model, and sd3.1 is gonna be 2b further pretrained
cant expect them to compete with a TWELVE BILLION parameter model
but we'll see
not just a model but a state of the art model. state of the art being kling and sora
yeah thats a different tier. unless their training methods are far superior
Someone actually quantized Flux and made it run with only 16Gbs of vram
pixart got nvidia backing now. i think they're gonna be good. probably a ton of experience there they can access
yeah, the current scene for opensource Text2image is looking more and more like the opensource LLM scene, which is good
that might make the vae fit on the gpu as well so I dont have to wait for it
and maybe no more lowvram mode
yeah i was bummed about not being able to run it for all of an hour before that came to light
they didn't put out a technical paper weeks in advanced notice that? now that the model is released then it'll come out soon
stability should take some notes
most projects should take notes maybe. "heres what you need to run it. more later"
The most surprising thing about Flux's release is that it came out of nowhere, they didn't tease the model for like months or release the paper way ahead of the actual model release, they just released and were like "Ok here's good model, now we work on Text2vid bye"
My kind of company.
auraflow was funded, too, by fal
stealth mode switch over yeh
in direct comparison to Flux I think it's quite clear which model is better
I mean, Flux are same authors as SD3 more or less. It also mostly the same architecture as SD3 and Auraflow
i like to think of auraflow as the bobross team of ML. like, since they've started doing their thing, they've really sparked engagement i've noticed. more people taking on tasks that seem like its something only corporations could do. then they do it in some form all alone. it's sorta like the college bowl of ML
auraflow i think will do more good there. spreading knowledge of how these things work. i don't think they stand a chance of competing with expertise like flux
Flux is open source. They will also release a technical paper about it+
just the one model is open sauce
this is really weird. There has to be the right prompt for it
yeah i know. we've talked about the architecture before. you said it was pointlessly multimodal
then told me i was triggered
I said it was weird to call it multimodal, as on the one hand all text2image models are by definition multimodal and on the other hand it is not multimodal in the sense of,. say, the Meta Chameleon model. But anyways, I don't see why you want to start that discussion again.
you're here explaining the benefits of mmdit to me now is why. anyways. just reminding you of our existing rapport, especially in this context. you need not tell me of the architecture of sd3. we've been here before.
I don't explain the benefits of mmdit to anyone here
I said Flux is working great
Auraflow is not really good
both mmdit models
its not all about the architecture
it's really weird. Maybe we still don't have the right prompt... The model is great in certain styles, but seems to totally not react to other styles like paintings
After generating the image the model unloads from the vram, is there any way for it to not unload? it takes like 5 minutes to load on a 3060
I was thinking embeddings should do fine. If the knowledge is in the latent structure, embeddings will pull it out. Seems like a good base model to pound out embeddings for. i haven't seen that kind of action since sd2
Oh, FLUX really surprised me in terms of quality; I didn't expect it. I'll be doing more tests.
I use diffusers for using it. There you can generate as many images you want
dark new long room, high water, room out of new white tiles, modern, with round lights on the walls, clean, the seling also has tiles, high water, high water, water is fooding the room
flux-schnell
Flux is also really great with hands and hands holding stuff.
So cute!
How? I got the same gpu
idk, it doesn't just count the sampling but also stuff like moving to vram
Bro flux is so good i can't
dayum, that's some solid text
It's the best model I've tried but it's so slow on my PC, about 10 minutes per image
did you use the quantized version?
I think you need to download more ram, that's a bit excessive
even worst case from cold load I get only like 3 minutes, ok maybe 4
That's what it means to surprise the public. Without any unnecessary noise and advertising, the model is just amazing.
yeah t5 fp8
I'll try again using the ssd as swap instead of an hdd
but it seems like it does not know what pixel art is "a colorful pixel art rendered artwork with the words "I am a cat!!" in bold, white letters. The letters appear to be made of a glossy, liquid material that is dripping and splashing in various directions"
"a shrunken very small adult person sitting on the neck of a flying goose nils holgerson. the duck is flying in the sky"
maybe
"This image showcases a vibrant and colorful furry character in the foreground. The character is anthropomorphic, with a fox-like appearance, and is adorned with a mix of green and purple fur. The character has large, expressive eyes and a playful, open-mouthed smile, suggesting a friendly demeanor. The character is wearing a badge or name tag that reads 'flux def', which could indicate their name or a specific role within the event they are attending. The background of the image is a dimly lit room with rows of empty chairs, suggesting an auditorium or a conference hall setting. There are a few people visible in the background, possibly attendees or participants of the event. The lighting in the room is focused on the character, creating a spotlight effect that draws attention to them. The style of the image is a photograph. The clarity and detail of the character and the environment, along with the shadows and lighting, are indicative of a real-life photograph taken indoors."
"pixel art of a vampire castle under moonlit sky"
interesting 3d pixel art
oh, everyone's seen flux, i'm late to the party. now this is what SAI teased all the time, who could have known they meant it would be released by some of their old scientists but at another company. just weird, but i'll take it
"A vintage-style black and white photograph, captured from a top-down view, placed on a worn wooden surface. The photograph features a mysterious figure standing with poise, dressed in an elegant long gown and a high collar. Their hands are clasped in front of them, and they wear a striking goat skull mask with curved horns. The background is a deep, enigmatic darkness, contrasting the subject's prominence, illuminated by the flash. Beneath the photograph, delicate handwritten text reads, "Fear is weakness," leaving a thought-provoking statement"
What if flux is...sd3 8b + sd3 3b
okno
actually it works quite nice. You can even tell it to make resolution low or high
but it does not have very good style control
I have the feeling you have to describe the style in the prompt. It does not react well to the typical CLIP keywords, which is a bit strange, as it is using CLIP
maybe. i am not able to make game screenshots with it
The image shows a cozy, eclectic room with a vibrant, colorful ambiance. The ceiling is draped with multiple tapestries featuring intricate designs, including mandala patterns and depictions of plants and celestial motifs. The lighting is soft and atmospheric, with various sources contributing to the overall mood: Ceiling Lighting: There are red, pink, and purple lights that illuminate the tapestries, highlighting their patterns and adding a warm glow. String Lights: Multi-colored string lights are draped around the room, adding to the festive and relaxed atmosphere. Television: A flat-screen TV on the wall displays a scene from the animated show "The Simpsons"
DEV
impressive, what prompt?
A large painting in the style of Rembrandt
The image depicts an alien-like creature with a large, elongated head and dark, almond-shaped eyes. The skin appears textured and rough, reminiscent of reptilian or amphibian skin. The creature is sitting in a body of water, partially submerged, with its legs and lower torso hidden below the surface. The background is foggy, adding a sense of mystery, and features tall, thin reeds and barren trees, creating a marsh-like or swamp environment. The overall atmosphere is eerie and otherworldly, with muted colors and low light enhancing the creature's unsettling appearance
i don't see the full image
Idk but the image didn't upload so here it is
I think it can do pixel arts and games very well xD
A large painting in the style of Rembrandt depicts an alien-like creature with a large, elongated head and dark, almond-shaped eyes. The skin appears textured and rough, reminiscent of reptilian or amphibian skin. The creature is sitting in a body of water, partially submerged, with its legs and lower torso hidden below the surface. The background is foggy, adding a sense of mystery, and features tall, thin reeds and barren trees, creating a marsh-like or swamp environment. The overall atmosphere is eerie and otherworldly, with muted colors and low light enhancing the creature's unsettling appearance.
Fresh comfy install and it's even worse. I give up
A large king painting in the style of Rembrandt of a alien
you can try --lowvram
the text ability is so good
actually it was "this star trek episodes features Micky Mouse on the bridge of the Enterprise, standing next to Jean-Luc Picard, both in their starfleet uniform."
But it seems to not know Picard.
interesting, so it might connect star trek with lower camera quality
I mean, if you ask for star trek discovery it might not, dunno
A large, mystical snail-zebra chimera with an agitated demeanor crawls through a dense, shadowy forest, its plush, velvety fur coat transitioning from deep ebony to bright alabaster, creating a mesmerizing zebra-like pattern. The chimera's spiraled shell, reminiscent of a giant snail's, glistens with iridescent colors that contrast with the muted greens and browns of the overgrown foliage. Its wide mouth reveals sharp, pearlescent teeth, while large, expressive eyes convey curiosity and agitation. Antennae resembling delicate tendrils protrude from its head, enhancing its otherworldly charm. Low trees and delicate branches frame the scene, creating depth and enclosure. Dramatic lighting filters through the canopy, casting sharp contrasts and highlighting the chimera's intricate textures. The ground is littered with rich, earthy tones, and vibrant greens of the low vegetation evoke life and mystery. Captured as a movie still on a 35mm lens, the image showcases extraordinary detail, making the chimera appear lifelike amid the enchanting yet eerie atmosphere of the Schwarzwald.
oldschool movie of a vampire holding a bottle of tomato juice. He smiles, showing his fangs. The film quality is Grainy, because it is filmed with Kodachrome camera.
it does not work reliably, but you can prompt it to do old camera
It's also good at plane anatomy
it's good at anatomy in general
competition leads to progress, maybe irrationally so, i still think SAI can create a killer model, their SDXL and cascade were soo good and the halfway trained 8b is not bad at all either
sometimes you just see a camera in the image
I would say:
- anatomy is awesome
~ prompt understanding is middle
- style is not flexible enough yet
A delightful close-up photo of a charming, plush Pikachu. It has a blue and black body, with thick thighs that accentuate its cuteness. Its eyes are a vibrant red, adding to its adorable appearance. The overall atmosphere of the image is playful and sweet, making it perfect for a cuddly plush toy.
the Flux guys are the same who made SDXL and SD3. Cascade was not made by SAI themselves
Prompt understanding is up to par with SD3 medium in my experience
but the consistency of a good image is also very high. you don't have to try 5 times
but i agree with everything else
yeah, that's painful, but SAI has the data, they just need to put it all through training, something lacking in the sd3 models
oh, yeah, definitely. It's not on DALL-E level and far away from Ideogram level, though. That was rather what I meant with "medium".
FLUX
modern anime style, woman standing on the grass, fur coat, thick winter coat, long bare legs, thick hips, dark skies, sunny, bare feet, gorgeous scene, anime screencap, stone temple altar, skulls
ideogram is still sometimes better or wait a sec this is the schnell model
flux seems to do hands quite well, they definitely paid attention to anatomy
Ideogram is way better in terms of prompt understanding. It looks horrible most of the time, though
but Ideogram -> controlnet -> SDXL was always a nice pipeline
but ideogram style prompt understanding is not so good
definitely beyond imo
ok, flux dev is more comparable to ideogram. ideogram vs flux dev
indeed !
impressive
seems they have closed some of the gaps that sd3 left
I don't even know if I want SD3 8B anymore
Flux is all you need
but i can't run it locally!!
if you have 16 GB of vram then you can run it locally
this version has better images than fal playground
yeah this understands what painting is and does more dynamic images. Just a little worse hands
yeah schnell makes paintings whilst dev somehow doesn't
a stunning painting depicting of an alluring sexy sith russian woman sitting in a futuristic black obsidian throne, with pale skin and glowing red eyes casting lightning from her hands towards the camera. The scene is dimly illuminated by the blue white lightning that blinds the camera. In the bottom of the image there is a text saying "I am Flux !". She is wearing a hood and is wearing a crop top black shirt with the empire logo from star wars movies on it. Ultra detailed, masterpiece, best quality
this is in this HF space
and these are in the fal playground. Do you see that the text was forgotten here and the images are somewhat more realistic?
yeah, oh in the HF space there is no cfg
I dont remember cfg on fal either though
and in comfyui
ohhhh
cause it was schnell
I want it because it will be easier to run
How does CFG cause an OOM error?
can't use anything except basic guider afaik
it actually makes it slower
neg conditioning causes oom on my system
and you can't set cfg with only positive conditioning
you need something else
also im no longer using lowvram due to fp8, but its not even faster lol
in fact, it might be ruining my image quality since its fp8
but idk how much
Well, what can I say? The guys did a great job, giving us hope. Now, for me, this is the top model among open-source ones. And just imagine what will happen when various ControlNets, Loras, and fine-tuned models start to appear for it. I'm eagerly waiting
but at least the VAE is loaded in at all times, so the overall generation is faster!
unipc seems to work
I suppose ODE samplers must work out of the box
there is cfg
I tried the CFGGuider
i changed the guider to cfgguider, yup about tripled generation time with bad results at 3 cfg
yup
I use diffusers. cfg 4-5 works great on the dev model
yeah must be some shit with my setup here
unipc_bh2
bosh3 is good
yeah there's lots of extra model calls
holy moly it's so slow
I remember it being slow but its way worse with this
my gpu is melting rn
bosh3
not bad, but not worth frying my gpu for it and waiting 2-3x longer
the calculations are likely still upcast to 16 or 32 bit, so the speed will stay the same
you just lose some precision. like we engineers get memed on for "pi=3"
PI equals THREE
(realistically, all of us know and use 3.14159)
but i still love the meme
or physicists using 10 for gravity lol
lmao
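A rough way to see the fp8 point above (weights stored at low precision, math still done at higher precision) is the minimal sketch below. It assumes IEEE half precision as a stand-in, since Python's standard library has no fp8 type, and `store_low_precision` is a made-up helper name, not anything from ComfyUI or diffusers.

```python
import struct

def store_low_precision(x: float) -> float:
    """Round-trip x through IEEE half precision (a stand-in for fp8 storage)."""
    # pack into 2 bytes, then unpack back to a normal Python float (the "upcast")
    return struct.unpack("<e", struct.pack("<e", x))[0]

weights = [3.14159, 0.1, 1000.123]
stored = [store_low_precision(w) for w in weights]

# The multiply runs at full Python float precision in both cases;
# the only difference is that the stored values were already rounded.
x = 2.0
full = [w * x for w in weights]
low = [s * x for s in stored]

for w, s in zip(weights, stored):
    print(f"{w} stored as {s} (error {abs(w - s):.6f})")
```

The compute cost of the multiply is the same either way, which fits the observation that fp8 saves memory without making generation faster.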
i got shit like this when i tried cfgguider
same
first was with zeroed out negative conditioning, second with a traditional negative prompt
k
just blurry mess
it may be designed to just be all in on positive conditioning
it does work well for fast approximations though. like if you tell me some circle has a radius of 3 and you want me to tell you the area, 3 * 3 * 3=27 is pretty fast vs 28.274
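The circle example, worked out as a tiny sketch so the size of the "pi = 3" error is visible. Nothing here is model specific; it is only the arithmetic from the message above.

```python
import math

radius = 3
approx_area = 3 * radius * radius    # pi rounded down to 3
exact_area = math.pi * radius ** 2   # pi at full float precision

# relative error of the shortcut; this equals (pi - 3) / pi for any radius
rel_error = (exact_area - approx_area) / exact_area

print(approx_area, round(exact_area, 3), f"{rel_error:.1%}")
```

The shortcut lands a few percent low regardless of the radius, which is the same flavour of trade-off as storing weights at reduced precision.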
and the very concept of cfg might not make sense with it
Flux is actually pretty damn impressive, dogs on SD3 for sure
Text is pretty good
Yep it's good.
oh I need to set that up, I usually inpaint in Automatic1111
We claim no ownership rights in and to the Outputs. You are solely responsible for the Outputs you generate and their subsequent uses in accordance with this License. You may use Output for any purpose (including for commercial purposes), except as expressly prohibited herein. You may not use the Output to train, fine-tune or distill a model that is competitive with the FLUX.1 [dev] Model.
I was wrong about Dev
dev is strict about it and schnell is apache 2.0 iirc
good luck enforcing that lol
its like openai restriction
"cant use it to train competitor grrr!!! π "
but basically, we can use the generated images for commercial purposes IF we want to!
so I'm happy about that
it's like saying that students can't become teachers cuz they're competing with their old instructors lol
probably to prevent big players, not us small ball lora makers. court room discovery is a real thing and can be compelled
whats this magic?
"show us the dataset" and then if they are caught hiding it , that's pretty bad
so then it becomes ok so we'll only use some but at what scale can they hide it and still be useful
tbh, i don't see that happening anytime soon
does flux share their full dataset?
if you demand to have a dataset opened up for copyright reasons, well... you better hope your own doesn't have a single copyrighted image
its a small clause that we scoff at cause like we care, but for midjourney or stability maybe, they'll calculate that risk very carefully
what is flux and can i run it in comfyui locally
and i sincerely doubt any makers of any major models are that clean
this protects them too in the event that others try to start suing. MAD
exactly
i don't see anyone trying to enforce it
it's got potential to be suicidal
oh yeah it's a big ol mexican standoff
Flux is a new open source text to image model developed by Black Forest Labs, it's 12B and it requires at least 16 GB of vram to run
can comfyui lower those requirements hehe
https://fal.ai/models/fal-ai/flux can play here
there is also a hugging face space for the schnell version
there are 8bit models which bring it down to 16 gb cards. some people sort of fit it into 12
within a day of release
someone ran it on 8 GB of vram but it's super slow
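A back-of-the-envelope sketch of where those VRAM numbers come from, assuming the 12B parameter count quoted above and counting only the raw transformer weights. Text encoders, the VAE and activations come on top, so treat the results as lower bounds; `weight_gib` is a hypothetical helper name.

```python
PARAMS = 12e9  # parameter count quoted in the chat

# bytes needed to store one parameter at each precision
BYTES_PER_PARAM = {"16-bit (bf16/fp16)": 2.0, "8-bit (fp8/int8)": 1.0}

def weight_gib(params: float, bytes_per_param: float) -> float:
    """Size of the raw weights in GiB (no activations, text encoders or VAE)."""
    return params * bytes_per_param / 2**30

for name, nbytes in BYTES_PER_PARAM.items():
    print(f"{name}: {weight_gib(PARAMS, nbytes):.1f} GiB")
```

At 16 bits the weights alone nearly fill a 24 GB card, while the 8-bit variants leave some room on a 16 GB card, which lines up with what people report here.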
I might have been wrong
I looked into the source code of diffusers and it seems that "guidance_scale" is not CFG
dev has cfg i think
its more an additional parameter you can condition the model to
its a different guidance? waiting on that technical paper. its gonna be illuminating
So it's different from CFG yet it behaves like CFG?
it looks so in the code, but I might be wrong
this is not what i had in mind when asking for an impasto oil painting, sd3 can do this better
it's weak on paintings tbh
pro isn't though
yeah this is schnell
dev gets a very generic painted look with all the different painting styles i've tried so far
if company A sues company B for training on company A's data
company B can't in court demand that company A reveals their dataset
I don't rly get that idea
knife palette painting?
in court you can't just flip who is defence and who is prosecution
if company A convinces a judge that their case has some merit, they can compel discovery of B's dataset
but i don't see it working 2 ways
yeah
company A can compel the dataset of company B to be revealed
but not the other way around
that would have to be a separate lawsuit filed by company B against A
or if potentially the models came out at the same time, and b convinces the judge that they both stole their data, it might become suddenly relevant to discovery
judge/arbitrator/whoever
not a lot of precedent here but i could see that
discovery is a 2 way street, but you gotta have good arguments to see
yeah I don't think the argument is strong
to see company A's data
but judges have so much discretion that it is not impossible
i'm not a lawyer. i just watch fictional ones on tv
still think output is as generic as possible in the definition to prevent a distilled model or things like that, not as a fraction of training data, that's unenforceable, esp when all these ais exist solely because training on anything is fair game
There is a difference in steps though 4 steps vs 8 steps
after 4 steps it be like "i'll give it more realism i guess"
weird that the pro model has another knob to turn: Interval is a setting that increases the variance in possible outputs letting the model be a tad more dynamic in what outputs it may produce in terms of composition, color, detail, and prompt interpretation. Setting this value low will ensure strong prompt following with more consistent outputs, setting it higher will produce more dynamic or varied outputs.
this is fair yeah
they are mostly trying to stop distilBERT style projects
done on their model
building blocks of their soon to be SOTA text to video model
by the way the CADS node does this
(gives more variety for lower quality)
its not gonna be quite the same
but I have loved using CADS this week
yeah, wonder how much of pro is pipeline vs better because original weights
yeah 12 steps (different seeds but the effect is clear)
usually these closed ones are pipelines now
like stability AI ultra
or midjourney
both very much seem to either have noise injection or an upscale followed by a downscale
or some other methods of getting the complexity up
What kind of performance are you getting? On my 4090, dev has been dog slow ... got it to go fast maybe once or twice but now it is slow.
And it fluctuates a lot. Sometimes 3 sec/it and then some steps up to 20sec/it
Everyday you come here you never know what will blow your mind...
This one took 300secs
DEV, 20 steps.
what's the prompt ?
Alcohol ink portrait of a flowerlady with a cat in front of a filmstudio in L.A. Fluid and vibrant colors, unpredictable patterns, organic textures, translucent layers, abstract compositions, ethereal and dreamy effects, free-flowing movement, expressive brushstrokes, contemporary aesthetic, wet textured paper.
gonna try it in SD3 and compare
I tried 24 steps to see if it would give me the film studio, but no
in the last week I started getting better outputs out of SD3
by combining PAG, CADS and vector sculpt
I'm not saying its skill issue but SD3 without added nodes is nowhere near as good
its like a completely different model
Anyone know if there is an int4 t5?
pag should not work in sd3 since pag is for unet
however idk what cads and vector sculpt is, do you have a workflow ?
sd3
tried it in pixart sigma, it gives a nice style there
I thought PAG worked with SD3 but maybe I got confused
I use like 50 different workflows each day so I get confused all the time
shit, almost good hands
I will send you a CADS and vector sculpt workflow
wow Sigma smashed it! really nice for sigma
Yeah I know, pixart is great with this (like a lot of other models btw)
why can pixart do styles while other new models cant, bit of a shame
looked into the source code a bit deeper
of the new dit ones pixart is most reliable for me (so, despite flux being great and all, looking forward to pixart next too :p)
so the DEV and SCHNELL models are both non-CFG models
only the PRO model is
the other models are distilled variants of the pro model
they have a "cfg" parameter, but this parameter is only "simulating" cfg
so they do not really do classifier free guidance, but they try to generate images that look like what the PRO model would give with this cfg value
which also means: we cannot use negative prompts with Flux
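For reference, a minimal sketch of the difference being described. Real classifier-free guidance makes two model calls per step (prompt and negative/empty prompt) and mixes them, whereas a guidance-distilled model takes the scale as an extra input and makes one call. `CountingModel`, `cfg_step` and `distilled_step` are toy names invented for illustration; this is not the Flux or diffusers code.

```python
from typing import List, Optional

Vector = List[float]

class CountingModel:
    """Toy stand-in for a denoiser that records how often it is called."""
    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, x: Vector, prompt: str, guidance: Optional[float]) -> Vector:
        self.calls += 1
        bias = float(len(prompt))  # fake dependence on the prompt
        return [xi + bias for xi in x]

def cfg_combine(cond: Vector, uncond: Vector, scale: float) -> Vector:
    """Standard CFG mix: uncond + scale * (cond - uncond)."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]

def cfg_step(model, x: Vector, prompt: str, negative: str, scale: float) -> Vector:
    cond = model(x, prompt, None)      # first call: with the prompt
    uncond = model(x, negative, None)  # second call: negative / empty prompt
    return cfg_combine(cond, uncond, scale)

def distilled_step(model, x: Vector, prompt: str, scale: float) -> Vector:
    # one call only; the scale is just another conditioning input,
    # so there is no slot where a negative prompt could go
    return model(x, prompt, scale)

model = CountingModel()
cfg_step(model, [0.0, 1.0], "a cat", "", 4.0)
calls_cfg = model.calls
distilled_step(model, [0.0, 1.0], "a cat", 4.0)
calls_distilled = model.calls - calls_cfg
print(calls_cfg, calls_distilled)
```

The extra call per step is also why forcing a CFG guider onto these models makes generation noticeably slower.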
uh I would be sceptical about that
is it in comfy yet?
you're running outta vram
when you get real close to running out it slows down like fuckin crazy
honestly negative is not so necessary with such quality and adherence
did you know with PAG
you can use SDXL fairly well with zero CFG
its interesting style
not cfg negatives, but wasn't there other negative guidance (remember a bit way way back about it)
Flux seems to be offloading to system ram but it is not using shared memory at all. what's the point of that, and can it bring any benefits?
I mean, even in SDXL I rarely have to use negatives. The problem is rather that Flux is not good in many styles yet
there are other things than CFG out there
like Perp-neg
if you use the Sampler Custom Advanced node
you can use any guidance you want
you could write a python script to do some weird custom sampling method
its not common but its perfectly possible
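As an illustration of what such a custom guidance could look like, here is a sketch of the Perp-Neg idea as I understand it: only the part of the negative direction that is perpendicular to the positive direction gets subtracted, so a negative prompt that overlaps with the positive one does not cancel it. The function name and the plain-list vectors are assumptions for readability; a real sampler would work on tensors and plug into the guider interface.

```python
from typing import List

Vector = List[float]

def perp_neg_combine(cond: Vector, neg: Vector, uncond: Vector,
                     scale: float, neg_scale: float) -> Vector:
    """Combine three noise predictions in the style of Perp-Neg."""
    # directions relative to the unconditional prediction
    e_pos = [c - u for c, u in zip(cond, uncond)]
    e_neg = [n - u for n, u in zip(neg, uncond)]

    # project e_neg onto e_pos and keep only the perpendicular remainder
    norm_sq = sum(p * p for p in e_pos)
    dot = sum(p * n for p, n in zip(e_pos, e_neg))
    coeff = dot / norm_sq if norm_sq > 0 else 0.0
    perp = [n - coeff * p for n, p in zip(e_neg, e_pos)]

    # usual CFG push along e_pos, minus the perpendicular negative part
    return [u + scale * (p - neg_scale * q)
            for u, p, q in zip(uncond, e_pos, perp)]

print(perp_neg_combine([1.0, 0.0], [1.0, 1.0], [0.0, 0.0], 2.0, 0.5))
```

Like plain CFG this still needs an unconditional and a negative prediction, so it costs extra model calls per step.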
hmmm but my RAM isn't being hit... and the VRAM is staying around 23GB
so flux is 23 GB... remember the times when 2GB for SD15 seemed steep -_-
that's really, really close to the quicksand
23.3 is where on windows i find everything stops
total vram in use, not just comfyui etc
SHUGAR
if you see that shit happening, minimize to your desktop
do shit like close extra browsers, windows with your output folder open, etc
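To check total VRAM in use (everything on the card, not just ComfyUI) without eyeballing Task Manager, something like the sketch below could work. It assumes an NVIDIA card with `nvidia-smi` on the PATH; `parse_vram`, `read_vram` and `headroom_mib` are hypothetical helper names.

```python
import subprocess
from typing import List, Tuple

# query used and total memory per GPU, as bare numbers in MiB
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]

def parse_vram(csv_text: str) -> List[Tuple[int, int]]:
    """Turn lines like '23010, 24564' into (used_mib, total_mib) pairs."""
    pairs = []
    for line in csv_text.strip().splitlines():
        used, total = (int(part.strip()) for part in line.split(","))
        pairs.append((used, total))
    return pairs

def read_vram() -> List[Tuple[int, int]]:
    """Ask the driver for per-GPU memory use (needs nvidia-smi installed)."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_vram(out.stdout)

def headroom_mib(used: int, total: int) -> int:
    """Free VRAM left on the card, in MiB."""
    return total - used
```

Calling `read_vram()` before a run and checking `headroom_mib(*pair)` for each GPU shows how close the card already is to the point where things start crawling.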
faux cfg 3 vs cfg 5
maybe less realistic styles just need low cfg to get better (like low steps as Eloyse showed)
but min to your desktop will usually bail you out
faux cfg? what's that mean
I mean, perp-neg is a cfg variant, isn't it?
i meant imitated cfg, as kaibioinfo said there's no real cfg in the dev model
Damn... have to use fp8 mode
but yes, you can probably just do classifier free guidance, it's a hack anyways. No diffusion model was built with cfg support
don't know yet I haven't read about it
I saw it on the comfy discord
shouldn't be needed
you just need to have less shit open lol
just wondering what your form of faux cfg is lol
it would have to have quite a funny architecture
for CFG to not work
give us your nectar
Thanks... yeah, down to 20 seconds for 20 steps on DEV default. Had to close Chrome and Photoshop... now it runs ... I was wondering why I was getting FAST, then slow, then FAST randomly... ooof
Thanks for the tip.
the architecture looks so goddamn clean
yeah
the model seems to "hallucinate" not so much
like in SD you usually get weird stuff everywhere
too many chimneys and windows on a house and so on
in Flux you can even look into the background and most of the time it looks logical and clean
I don't know. We do not really have a smaller parameter model to compare with


