#sd3
got it
nice
not perfect, though
but you just have to give it a good prompt
did you try i2i with tcg? maybe it will capture it better?
you just have to set the aspect ratio correctly and describe that you want a card
no need for i2i
prompt is
"A card from the Magic the Gathering trading card game. The top of card shows a Manticore, a chimeric monster with a lion body, bat wings and a scorpion tail. The title of the card is "Manticore of the burned land". The bottom of the card show his stats and skills."
try describing the stats and skills, it can probably do it correctly then. It seems to even do a paragraph of text correctly if you tell it what to write.
is this channel now gonna be all flux basically? lol
they should drop 4b
Thx but I can't find 4bit T5 
we can talk about sd3 when they drop a decent model 🤷‍♂️ whenever and if ever that will happen

In two weeks™ will be about 3.1.
Been a year since I tried my tc workflow, might have to dabble again
#✨｜sdxl message
SD3 medium is still really good for photorealism
its just not very flexible and it breaks super easily
with SDXL you can do noise injection into cross-attention or key matrix and it won't break
i mean the new joke is 2.1 weeks
pretty amazing, if the text was correct, its basically impossible to tell it from a real card
nah, it only got the first part of the text correct
maybe you can fix that with inpainting
but I assume its just too small
you probably need some kind of "detailer" workflow for that
time to print some cards and make money from home 
"A boy holds a card from the Pokemon trading card game in his hand, while smiling joyful. The top of card shows a Pikachu with a santa hat. The title of the card is "Santa Pikachu". The bottom of the card show his stats and skills. High resolution, 8k, best quality. Photography."
this model is just crazily good
yeah 100%, basically ideogram/dalle3 but open source
magic happened, why my text encoding are instant now
maybe you moved the text encoder to your gpu?
oh, its prompt understanding is far away from ideogram
I would say its also not Dalle3 level
but its photorealism is really good and overall I think its a competitive model
probably yes but it is not anymore 😦
again, it's first try
I also got sometimes bad images, but in general flux produces good quality on first shot
yea flux is first tries champion
damn, knows gta 5 nicely
lol
dont ask me what i generated, just take it :3
good boi
the loading screen of Civ VI?
Flux and Auraflow are next level! Will stability be able to compete with them?
i didnt try auraflow 0.2, is it improved?
they fixed the way they hold swords? 😮 finally
Crispy details.
It's very good doing hand and swords
In only 4 steps very impressive
im just generating with the online version, im not on my laptop
What's happening in two weeks? 😭
2.1 weeks
wait, make a similar image, but with the old bike that has like a passenger seat, and then put Robin as well lol
nice, prompt?
i think i just wrote: a scene from Grand Theft Auto 5
it sort of worked, cant really see the sidecar tho
Compared to what? I am thinking of getting a used 3090 at some point....is it a better pick than a 16gb 4070 ti (S) or amd 7900 xtx - 24gb?
swanks!
I'd say the used 3090 is the best option, some models like flux work better with 24gb of vram, and im sure newer models will require even more
I don't know why robin is a kid tho
hmmm... I thought it might - I mean, I thought that might be the reply. What if I use Linux? I guess Windows is better to use, perhaps?
Nvidia in Linux - mixed reviews?
I haven't tried AI image generation models on linux, only LLMs on my 1060 and 3060 so Im not sure exactly how they perform 🥺 (for LLMs it worked good)
Also Nvidia is faster than AMD on AI as far as I remember
What about Grass laying on a woman?
DAMN YOU for stealing my joke first
t5xxl + clip_l
t5xxl ONLY... same results almost. Looks like Clip_l may not be doing much?
clip_l is your ambient, artsy, and background stuff, or should be used for that. clip_g is the workhorse, it drives the entire process, and should only be given the black and white, concise, description of the image. and t5xxl is your encoder with the high comprehension. give it text, and all the fancy details
the clip L one looks better, the reflections, the little lights, even the head poses
and that's what it is for. those ambient, artsy, background details
Flux... did not think there was a CLIP_G for Flux...
CLIP-L is only used as a pooled embedding for extra conditioning. It doesn't have as much effect as T5 has
there isn't. Only CLIP-L and T5
clip_G wasn't created for stable diffusion, they just use it for stable - you can probably make it work for flux, it's using the same architecture as sd3
Its easy to get spoiled by Flux's consistency with hands. There is still the cherry picking, but it nails most prompts I'm using the local install.
just because you're not using it, and robin didn't implement it, doesn't mean it can't be used with it
yeah, have fun retraining flux from scratch lol
i can type you win this time keyboard
good try flux, other than the backwards nose, but anything upside down is model kryptonite.
Flux
oh well, i went all in and asked it to add watermelons after this
yeah
Flux
whattt it's too good at logos
Just d/loading Flux.dev - is it worth all the "better than MJ" hype?!
yes
Is that true text quality of Flux? Or is it added later?
Eau keigh! Is it also good at TEXT?
it's the best at text actually, better than DALLE
Cool. Cannot wait ...
True text quality
generated now
I'm impressed
local
We need an extra room for Flux discussions/imagery
agree with that
probably they have their own discord server for that
flux seems to nail every style and almost spot on with prompt adherence
delicious
the only thing about flux that atm from my perspective looks a bit worrying is the requirements to finetune or to make LORAs
i've a 4090 and i'm wondering if i'll be able to make LORAs of it
You need FluxDev btw all, not Schnell.
model is quite heavy
yeah, it hurts the GPU pretty hard
i like schnell tho too, a few wonky arms in it tho compared to dev
Orc Trump is amazing
using local ollama to create prompts from image inputs
schnell > sdxl, but no controlnet. Not worth it imo but I like it for sure.
I don't even use control net lol
It's all I use. 🤣 I made an art app just for it and use that. (I'm doing art pre-ai tho.)
haha, controlnet is my crack
trying out the flux dev version at 25 steps = 420 seconds per image. i cant even be mad.
I haven't tried over 20 steps, and 10 was okayish.
i just have to temper my patience like im doing blender work waiting on renders
yeah, it's a bit on even a 4090, feelin it
im using it in fp8 mode, i only have 8gb vram, so shits slow as balls
2 more hours till I can get home and run my massive list of all-new tests.
i wonder if this is going to test my patience enough to do this shit downstairs on the 7900xt
7 min too long to experiment imo... Not enough chances.
dont break anything or kick the pet, breathe
anyone find a flux discord yet? nothing on their X/website/gitpage
yeah that's my usual stance. im only testing the dev model because cfg works with it. the schnell model doesn't use cfg(even if you try, you'll get the same image)
schnell 4 step is like 60 seconds per image, which i can cope with
i'd never, i just like the quiet upstairs office. the kids are noisy during the summer
can we theoretically distill a 12b flux model into a smaller one like 4 or 6B. 3b even better ?
yes, in theory
but you need tons of money that's it ?
i still don't understand why they didn't release schnell with a lower parameter count than the teacher model
not terrible. i'd have to throw some other hybrids at it since they are a fun way to test how well it understands things
and it's just male nipples and ken doll under the mask
flux guidance 3 ? it's like cfg ?
dev or schnell
?
dev
yes but it's so slow
im using kijai's fp8 model(the file size is massively reduced and since i'll never be able to run it above fp8, that's fine)
you could use onediff if on linux, they have already supported sd3 so maybe it works with flux as well
something like 1.7x speeds, the thing fal uses to be fast
gpu?
nope, wont touch linux. f'ing hate linux and i've used it plenty in the past, dozens of distros
2080 FE
I'm with 4070 but get 24s/it
i get 15s/it
ram?
if you care about speeds. The thing is i use adobe a lot and can't have linux, unless i virtualize windows lol, or dual boot
ddr4 3200, but i have a 13600kf
3090 => 1s/it
moist
fp8 or fp16?
it's only simulating cfg. Its not real cfg
aww
fp8. But it seems to not make a huge difference
yeah it doesn't really make a difference, the VAE will still do its shit in fp32
we're going to see more and more models go this route and likely do away with negative prompts anyways. cfg was always a hack
with fp8 I get 3s/it
not sure about that. I think the teacher model has cfg
otherwise: why would they even have cfg at all?
hahaha, sure
Requested to load Flux
Loading 1 new model
loading in lowvram mode 9811.074999809265
100%|██████████| 20/20 [00:53<00:00, 2.65s/it]
Using pytorch attention in VAE
Using pytorch attention in VAE
Requested to load AutoencodingEngine
Loading 1 new model
Prompt executed in 97.64 seconds
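For reference, the numbers in the log above can be cross-checked with a little arithmetic. This is only a sketch: the helper names are made up for illustration, and the split into "sampling" and "everything else" assumes the progress bar covers just the 20 denoising steps.

```python
def seconds_per_iteration(total_seconds: float, steps: int) -> float:
    """Average time per sampling step."""
    return total_seconds / steps


def sampling_seconds(steps: int, sec_per_it: float) -> float:
    """Total time spent in the sampling loop."""
    return steps * sec_per_it


# 20 steps at 2.65 s/it, as shown in the progress bar (about 53 s)
sampling = sampling_seconds(20, 2.65)

# Whatever is left of the 97.64 s "Prompt executed" time
# (model loading, text encoding, VAE decode, ...), about 44.64 s
overhead = 97.64 - sampling

# The "25 steps = 420 seconds per image" report from earlier in the chat
slow_rate = seconds_per_iteration(420, 25)  # about 16.8 s/it

print(f"sampling: {sampling:.2f} s, overhead: {overhead:.2f} s")
print(f"8 GB fp8 setup: {slow_rate:.1f} s/it")
```

So roughly half of that run was not sampling at all, which is worth keeping in mind when comparing s/it figures between cards.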
it might, but the whole goal is to make models act more like turbo/lightning/hyper models where you use cfg 1-2, at cfg 1, you have no negative prompt
but what is the main differences between fp8 and fp16? Prompt understanding?
and with these being LLM based, you can put negative shit into the prompt. like saying "a human that isn't wearing a jacket"
I'm not sure if that is the goal. I think it just what happens during the distillation
here is a provoking thought, can we have a better flux model by merging schnell with dev?
you cannot use cfg when you have low amount of steps anyways
right
I think there is no big difference. Might be just fluctuations. Like running a model with 20 steps or 30 steps. The latter might look better in some cases but overall its not a huge difference
thanks
anyways, my point is, the goal will likely be to eventually do away with negative prompting and if you don't want certain things in your image, you'll just say that in the prompt
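Why the negative prompt stops mattering at cfg 1 can be seen from the usual classifier-free guidance formula, where the negative prompt feeds the unconditional branch. A toy sketch with plain lists standing in for model predictions (the values and the function name are illustrative, not taken from any Flux or ComfyUI code):

```python
def cfg_combine(cond, uncond, scale):
    """Classifier-free guidance: uncond + scale * (cond - uncond), elementwise."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


cond = [0.5, -1.0, 2.0]    # toy prediction for the positive prompt
neg_a = [0.0, 0.0, 0.0]    # toy prediction for one negative prompt
neg_b = [3.0, -2.0, 1.0]   # toy prediction for a very different negative prompt

# At scale 1 the unconditional term cancels, so both results equal `cond`
at_one_a = cfg_combine(cond, neg_a, 1.0)
at_one_b = cfg_combine(cond, neg_b, 1.0)

# At a higher scale the negative prompt changes the result
at_high_a = cfg_combine(cond, neg_a, 3.5)
at_high_b = cfg_combine(cond, neg_b, 3.5)

print(at_one_a == at_one_b)    # the negative prompt has no influence
print(at_high_a == at_high_b)  # the negative prompt matters again
```

This says nothing about how dev's distilled guidance value is implemented internally; it only shows the standard two-pass formula.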
now it works in less than 1 min for 20 steps
Thanks for the feedback/reply! bows
I hope 5070 will get at least 24GB
I think we need a Flux channel, this poor sd3 section!
what is flux?
it won't. it will probably be 16gb though, if not, it will be 12gb again
I wasn't here for like a week, and now it's all you see
Flux is a new opensource SOTA text to image model that is 12B parameters made by black forest labs, it beats SD3 by long shot in my experience
Do we know the technical requirements to run it local?
Wondering if my 16g vram will cut it
That should be fine for the Dev and schnell versions using fp8
schnell sounds like an expletive
I have a 4060ti 16GB, I am using Flux Dev, upscaler, ... and it works (fp8)
Dev is the best model and schnell is the distilled more smaller model that only uses 4 to 8 steps
why not merge them together for a better model with train difference ?
Schnell is German for fast
what would be the point of that?, Schnell is supposed to be the fast "turbo" model
last "flux" question. Do you have a good workflow with upscale?
schnell is better at styles and faster while dev is better at quality and anatomy but slower, maybe we can get a middle ground model that has the best of both. There is only a matter of supporting it in some merge node and try it
that would result in overfitting and make the images look horrible, and it isn't as simple as merging two models to get a middle ground model
there are merges of distilled sdxl models with normal ones. Are you sure this would not work ?
and i'm talking about train difference which would train the difference between schnell and dev
Basically putting the data from the two models into one model?
yes, you take model A (Schnell) and apply the difference (model B (Dev) - model C (Schnell)).
so you train the difference of schnell and dev into schnell
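For anyone unfamiliar with difference merges, here is a minimal sketch of the plain "add difference" arithmetic, A + alpha * (B - C), applied per weight. It is an assumption that this is close to what is meant here: the "train difference" mode in merge tools works differently and is not reproduced, and the tiny dicts are stand-ins for real 12B checkpoints.

```python
def add_difference(a, b, c, alpha=1.0):
    """Per-tensor add-difference merge: a + alpha * (b - c)."""
    return {
        key: [x + alpha * (y - z) for x, y, z in zip(a[key], b[key], c[key])]
        for key in a
    }


# Hypothetical toy "checkpoints" with two tensors each
schnell = {"w": [1.0, 2.0], "b": [0.5]}
dev = {"w": [1.5, 1.0], "b": [0.25]}

# With A and C both set to schnell, the plain formula is just an
# interpolation between schnell and dev:
merged_full = add_difference(schnell, dev, schnell, alpha=1.0)  # equals dev
merged_half = add_difference(schnell, dev, schnell, alpha=0.5)  # halfway point

print(merged_full)
print(merged_half)
```

Whether a blend like this gives a usable image model is exactly what is being argued about; the sketch only shows the arithmetic.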
But would that work on DiT architecture?
can you share workflow for upscaling?
it does work with other dit models like sigma, so it should work
Tbh i am just waiting for Flux's research paper to release, so i can understand how the model works then i can make an informed opinion
dev is better at quality because they fine tuned it from schnell. if you merge them, you might make a mess
both models are derived from pro, where did you see that ?
very simple
No they didn't?, they are both from Pro
oh you use a model for upscaling?
FLUX.1 [pro] the base model, available via API
FLUX.1 [dev] guidance-distilled variant
FLUX.1 [schnell] guidance and step-distilled variant
you go right ahead and try to merge a couple of 12 billion parameter models on your machine. tell me how it works
they are both derived from pro, look closer. what you just showed it does not state what you said.
i'm so very sick and tired of you. ignored now
go to sleep
can you share ?
Flux models kinder to your VRAM https://huggingface.co/Kijai/flux-fp8/tree/main
I have d/loaded 22Gb FluxDev.sft AND 22Gb FluxSchnell.sft
kinder to your disk space too
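The download sizes line up with simple arithmetic on the parameter count. A rough sketch, assuming the "12B" figure quoted in the chat and counting weights only (no metadata, no text encoders or VAE):

```python
def checkpoint_gib(params: float, bytes_per_param: int) -> float:
    """Approximate checkpoint size in GiB for a given weight precision."""
    return params * bytes_per_param / 2**30


PARAMS = 12e9  # rounded parameter count, as quoted in the chat

fp16_size = checkpoint_gib(PARAMS, 2)  # 2 bytes per weight
fp8_size = checkpoint_gib(PARAMS, 1)   # 1 byte per weight

print(f"fp16: about {fp16_size:.1f} GiB")
print(f"fp8:  about {fp8_size:.1f} GiB")
```

The fp16 estimate comes out a little over 22 GiB, in line with the 22 GB files mentioned above, and fp8 halves it.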
In the image, you can see it, but here you have...
oh thanks
dude, basically everything you write here is bullshit lol Instead of blocking people you might just read their posts properly
dev and schnell are both distilled variants from pro as Water lava but watery said
exactly
thanks for the insult, reported for harassment
yeah, whatever. Troll
if merging works... who knows. I was surprised that merging TurboSDXL in SDXLBase works. But if it makes things better...
regarding styles
I made a small experiment with one of my favourite styles
old dungeon and dragons book covers xD
So I used the prompt
"illustration by Jeff Easley"
to find out if the model knows who he is
the image that came out doesn't look like images by Jeff Easley. But it looks rpg-like, so it seems to know the connection between "Jeff Easley" and "Dungeon and Dragons"
I'm home and back to testing at last!
so I tried the same again, but this time used different prompts for CLIP-L and T5
T5 gets "an illustration by Jeff Easley"
but CLIP-L only gets "an illustration"
It cannot do negation. (No other model has ever done negation, but since it shattered all my previous tests, I just had to try.)
a landscape without any color red, there is no color red anywhere at all
what comes out is an arbitrary illustration that has nothing to do with RPG
this is CLIP-L and T5 both get "Jeff Easley" prompt
There's a lot missing from its dataset. Also can't do TCGs. Information can be added in Loras though. Model IQ cannot.
This is only T-5 gets Jeff Easley Prompt
it can lol I showed it above
A trading card?
Sorry, looked and didn't find it. Can you repost?
i can't seem to do vampire teeth with flux, any typical prompt ?
anyways, what I wanted to find out with my experiment was: The T5 has no idea about styles at all
maybe they did not train on artists stuff at all
the only knowledge the model has about that comes from the CLIP-L encoder
Not as cute as Stable Cascade, but yep. Works now. Also not what I came here to do!
same with "an illustration by Jinjo Ito"
Only T5
Only CLIP-L
so only CLIP-L has any knowledge about artists :/
this is not the case for "common knowledge artists" like Van Gogh
clip-l is the one from sd2 right?
i think even to this day, all the prompt meta is rooted in clip-g sd15 style prompting
New, never-passed-before physics test! Can it conform a draped sheet to an arbitrary 3d form??? 😮 Yes. Yes it can. I could say it's not perfect, but in all honesty I think it might just be "not the way I would have drawn it". It understands 3d forms and fabric physics.
A draped sheet completely covering a sportscar.
A draped sheet completely covering a pineapple.
A draped sheet completely covering a pyramid.
A draped sheet completely covering a minecraft creeper.
even when i use prompt expansion models from like fooocus or all the ones that dynamic prompts accesses, they all produce clip-g style prompting
love the creeper
flux or sd3? I'm not getting any response out of that prompt with jinjo, in either clip-L or t5. it's just a generic illustration. running fp16 flux dev and t5 fp16
there's really no reason to use sd3 2b anymore imo. it needs juice
(cause its always super fucking annoying to find out how to change text encoders in comfy and most time you need stupid plugins for that)
flux has stolen 2b's mojo
triple text encoder node that's in the sd3 example workflows has always worked well for me
the only confusing part is the folder name . t5 weights go in /models/clip/
they've actually made it real easy with this
It can do accurate fluid form conformation, but the prompt makes a huge difference.
a ghostly mist eagle made out of smoke against a black background
misty smoke in the shape of an eagle, against a black background
misty curling wisps of smoke forming the shape of an eagle, against a black background
guess that's new. I don't find it in this confusing overloaded menu X_x
I cannot get it to do general form destruction. 😦 Can anyone find a magic prompt for this?
an exploded view of a guitar exploding into pieces, there are pieces of the guitar flying everywhere, it is shattered and cracked and destroyed and has fallen apart
Yes, it can do a bat swinging a bat. Homonyms FTW.
An anthropomorphic bat swinging a red bat in a batting cage
a guitar broken into 5 pieces
It was able to cut sections out of it. That makes sense. I doubt it can independently manipulate those sections as 3d forms. It doesn't understand the component structures, only the entire structure.
dev and sch use the same vae ?
uh oh, what have you started.
LOL can you make the pieces fly? Explode? That would really amaze me. (I have this test marked as fail for now though.)
i get the feeling that it's applying the segmentation later in the generation. his face forms, THEN the breaking lines.
My thinking exactly.
yes
yeah and it won't "explode" them.
remember the movie "The Cell" ?
Nope, can't say that I do.
flux doesn't seem to
The Euler sampler cannot do pie charts, but the LCM sampler can. (LCM is great at very straight lines and geometric shapes.) However, Flux doesn't understand the ratio proportions. (I had high hopes for a second on my first render, it was showing a 70/30 split.)
The pixelation here is what the model outputted.
A pie chart showing a 70% / 30% split.
A labeled pie chart. 70% (seventy percent) of the pie chart is blue. 30% (thirty percent) of the pie chart is red.
creepier than kroenenberg imo https://youtu.be/RNP4caHnknA
This model will fine-tune for animation excellently. (Let's not think about how much VRAM fine-tuning needs though...)
sixteen images in a 4x4 four-by-four grid, animation frames, the animation shows a cat running
testing local flux on my 4080. dev is 100s gens and uses a little over 12gb
now i can run all the prompts. ran out of free on the public labs. and those are overflowing now that everyone knows about flux
bbl i gotta go out for a bit. got it running for when i get back at least
Ummmm... I'm gonna say no. It cannot understand SVG. Probably reacting to the word "rect" a little.
<svg><rect style="fill:#FF0000; stroke:#00FF00; stroke-width:5;" width="100" height="100" x="0" y="0" ry="10"></svg>
(SVG is an image format usually used for clipart, and it is made entirely out of text that looks like XML or HTML.)
oh nice going to check it ty
Does someone know why aligned scheduler works for flux but aligned your steps does not ? They are basically doing the same thing.
Without nitpicking, these were done with the laziest prompting possible on one-shot. Normally you'd use controlnet. This is hands-down the most consistent anatomy I've seen out of a model. With controlnet, it would probably be perfect.
superhero doing a cartwheel
superhero doing a handstand
superhero doing a backflip
superhero doing a high-kick
superhero doing a t-pose
probably the same reason it doesn't work for sd3
But aligned scheduler does the same thing right ?
Well it does the same thing and you can select sigmas from sdxl, SVD and sd1. All of these work for flux, however the ays node returns a black image. I'm thinking it's black because the aligned scheduler has the option to set force sigma min true.
Lmao, figured
its dog slow and I don't know if it has an improvement
Ode samplers ?
Btw does anyone know a sampler that does a good job but requires less steps ?
It can do solid particle cloud fluid-like conformation. 😱 To say nothing of the profound understanding of these 3d forms necessary for it to render them transparent so intelligently. This is just a phenomenal physics test pass. Somebody get this model a trophy.
a transparent glass vase in the shape of a shark, filled with toy blocks
a transparent glass vase in the shape of a sportscar, filled with toy blocks
a transparent glass vase in the shape of a microphone, filled with toy blocks
@fossil pagoda you had a favorite sampler or scheduler I can't remember that did great pics with pixart sigma but it required less steps to work. Too much it would overbake. Do you remember?
Curious what y'all are using for finetunes of SD3? Is SD3 medium small enough that dreambooth becomes more comparable to a lora in terms of training time + ease of use + performance?
It can do book covers but that's no surprise. I saw people doing movie posters on a stream.
a book cover, the title says "FLUX", the cover art shows a mad scientist with wild white hair in a caricature style
Ummm... I'm pretty sure this isn't 180 degrees though. More like 120 or something. Still all the curves look very accurate. Maybe better prompting would do it. I'll probably come back to this and see if I can get something consistent that I can run through depth stereo conversion and test in my headset.
a 180VR fisheye lens photo of a mountain landscape taken from the porch of a log cabin
1920x1080 landscape in 464 seconds. VAE is running on the CPU. No RAM problems. This is definitely not Stable Cascade. 😮‍💨
Also gave me a painterly style? I wonder if that's an artifact of the resolution or a problem with my prompt.
I'm switching to a prompt with HD photography tags, and starting a 4K render.
Okay. 4K is rendering. I'm going to go watch That Time I Got Reincarnated In A Conference Room. Check back in a while.
did you add lens settings? like 50mm etc.
someone also mentioned turning down the denoising to 0.9 to get more photorealistic images
I went with this. If it doesn't work, I'll go back to 1024x1024 and practice prompting for landscapes and mess with other settings. I still haven't experimented with any samplers except Euler and LCM.
raw photo, a beautiful Alaska landscape with snowy mountains in the background, with a lake in the center reflecting the mountains, surrounded by a field of wildflowers, in the sunlight, with a blue sky and white clouds, dynamic lighting and shadows, HDR, 4K, cinematic masterpiece, extremely highly detailed, white balance
humans are very good as well, better than sdxl base 100%
now we need the big finetuner haha
is this nsfw or sfw?
How?! it only gave me this with a very NSFW prompt!
I also much prefer the SD3 art for SFW. I must be missing something Flux.
SFW
pretty thick lego lady so i wasn't sure
Are you using the "schnell" model or the dev model? Dev is where you want to test that
The cost of legos has gone up, that would cost thousands for a physical version these days ROFL
I downloaded Dev. Perhaps it can't do furries?
the 4 step turbo one is more censored (from what I've heard) - dev does NSFW for me
hmmm no sure on that tbh
sets are expensive but bulk bricks not so much. still higher than usual. The secret is to find garage and estate sales. "$20 for that whole bucket"
will be trained a lot coming up anyway I'm sure
Or like most models, only does nude ladies 😦 (and not futa ever either)
yea, pulled your image/wf and prompt may just be a bit TOO much.....looks like you are using dev
I always start with the most obvious when testing lololol
suuuuuuure
i'm annoyed that it's more like a lego board texture than it is a lego construction. but yeh, it knows things
But if it only does SFW, then glif is faster than my poor computer
someone's gonna report that image and a mod's gonna come look at it and shake their head. lil time bomb of laughs for them to find
stability's got bigger concerns right now
Usually unless it has || nipples || it's sfw
you should have seen all the ladies laying in grass photos that used to get posted here lol
Now I want to do lego versions of famous statues
i'm not even naughty prompting. there's one word "bsuty" that's doing it pretty sure. word that's been blocked from this server. 19 foot tall toy block statue of b**sty woman. The statue is built with colorfully miss matched Lego bricks. positioned outside of a building on the sidewalk. her hair is constructed out of lego bricks as well. the brick layers are clearly defined through the entire construction of the statue
the entire prompt is about lego and all the model is putting out is THICCCK ladies
smart to use miss-matched and 19 foot tall! I wouldn't have thought of those two things
its def more clinical in a way, i prefer sd3-8bs response to styles as well, buuut.... it's early days, as dark has shown clip still has some influence, and who knows what will be developed for this model
using the same seed and a different prompt leads to really high consistency in the style, idk who needs to hear this
atleast it's useful for making game assets
Do you mean SD3, or?
sd3 still might benefit from all the style knowledge in clip-g yet. "Greg Rutkowski" meta came from how the clip tokens would direct the unet. not the unet itself. he actually came up not often in laion dataset
sorry i mean flux
that meta meant a lot less on refined models, but on the base model the knowledge the clip tenc is trained with is a big player
same seed, different prompt :)
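A likely reason this works: the seed fixes the starting noise, and the prompt only changes the conditioning applied to it. A small stand-in sketch using Python's `random` module (real UIs draw the latent noise from a seeded tensor generator instead; the function name and sizes here are made up):

```python
import random


def initial_noise(seed: int, n: int = 8):
    """Draw n Gaussian samples from a generator seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


# Same seed: the starting noise is identical, whatever prompt is used later
noise_for_prompt_a = initial_noise(42)
noise_for_prompt_b = initial_noise(42)

# Different seed: a different starting point
noise_other_seed = initial_noise(43)

print(noise_for_prompt_a == noise_for_prompt_b)
print(noise_for_prompt_a == noise_other_seed)
```

Since both images start from the same noise, broad layout and palette tend to carry over even when the prompt changes.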
In that case, I'm guessing you mean, even more so than the other models?
t5 isn't an image pair trained model so i don't think it matters. it has self attention and works on the parallel network. but the openclip L layer, thats SD2 right? that still has some tricks. Ornate was a gooder
you see you see? same seed. same prompt. i just threw ornate on the back of the first image
haven't gotten around to remembering all my old sd2 keywords yet.
When should we be expecting Flux ballz?
its very ball capable already. i might try that lora when i see lora training code. or an embedding perhaps
yes, unfortunately the Flux model itself has barely any style knowledge
#sd3 message first thing i tried actually
you just need to describe styles in an sd2 vernacular and throw some lore at t5 to chew on
I played around with style prompts separately on T5 and CLIP-L
"illustration by Jeff Easley" CLIP-L only
same prompt, T5 only
"Illustration by Jinjo Ito" CLIP-L only
same prompt with T5 only
so what little style information Flux can deliver comes only from CLIP-L
at least for copyrighted artists
common artists like Vincent Van Gogh can be prompted by both
"painting by Van Gogh" CLIP-L only
same prompt T5 only
my biggest disappointment is that no matter how i approach it i can't get a flux capacitor
unfortunately, CLIP-L can only shift the style in the right direction, but it cannot save the thing. Like the image above looks nothing like "Jeff Easley".
Either Flux does not have much art in its training data (for copyright reasons) or they used some automated captioner method that removes all style information altogether, leaving only a content description
a warrior riding on the roof of a DMC delorean illustration by Jeff Easley
Flux can't count arms! It does great at fingers though, on all THREEE arms lol
i think its the classic distillation problem i've faced a few times. classes get compressed down to a style that overtakes the whole thing. every painting looks the same. every illustration looks the same. but if you tinker, you can evoke the depth of knowledge better. the latents are just a little bit galvanized for the time being.
less steps helps. that was 10 steps euler.
With schnell? That kind of behavior is normal of turbo/lightning style models. You lose a lot of variety
flux pro is their money model. that's what they're intending to use to build all the public weights they release. distillations
i'm using dev. it's distilled too but i'm not sure what way. differently
just use 6 steps with schnell, its a lot better than 4 steps (it can even do nsfw to some degree then)
Yeah probably
It's a big enough model though that it still has plenty of variety
i wanna get embeddings going on it. i wonder if sd2 embeddings would work
SD 1.5 only
they should work, but probably won't help much.
CLIP-L is not part of the attention/transformer
sd3 feels like fusion in dragonball to me. like, all the parts were there ready to fuse together. but 2b became veku. Still pretty awesome and strong, but not cell killing strong you know? so blackforest tried again with more zen and formed gogeta, the true SD3
i'm not sure there's any reason to load sd3 anymore
which leaves golf balls in the corner
doesn't seem to know vegeta. it just keeps drawing him like goku. BALLS
yay, flux can do text on the back of a t-shirt \o/
now that i've done a bunch of images and dialed in a few settings. i'm getting 20second generations on average. faster with less steps of course. but yeah. slow for you maybe.
lol, that was just a prompt i tried with sd3 8b and was the first disappointment, this time with flux, it's better, it's much like ideogram (ideogram still a bit better)
Huh. Can it just not do 4K? 3612 seconds for 3840x2160.
Gonna try hires fix 1920x1080 scaled up to 3840x2160. That might work? Hmm.
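The two render times reported so far (464 seconds at 1920x1080, 3612 seconds at 3840x2160) can be put side by side. A quick sketch; only the two timings come from the chat, the rest is plain arithmetic:

```python
def pixel_count(width: int, height: int) -> int:
    """Number of pixels in an image of the given size."""
    return width * height


full_hd = pixel_count(1920, 1080)
uhd_4k = pixel_count(3840, 2160)

pixel_ratio = uhd_4k / full_hd  # 4x the pixels
time_ratio = 3612 / 464         # but close to 8x the render time

print(f"pixels: x{pixel_ratio:.1f}, time: x{time_ratio:.1f}")
```

Render time grew about twice as fast as pixel count here. That would be consistent with attention cost rising faster than linearly with image size, though on this setup memory offloading could just as well be the cause.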
but it's def a slow model, and there'll be a lot of room for lighter ones
Uhm, NO! As someone who has never used ideogram, I can definitely say with 100% confidence that flux is way better! I don't need evidence to support my opinion!
it has best prompt understanding over all models
The new one at least
aesthetics is mediocre
guess the prompt
Dragon
close
||Trogdor was a man. He was a Dragon Man. No wait, he was just a DRAGON. TROGDOR! BURNINATING THE COUNTRY SIDE! BURNINATING THE VILLAGE! An S shaped dragon with a human arm with huge bicep and human colored skin. Fire breathing.||
Got a hires-fix workflow. This was 512x512 -> 1024x1024
Gonna try now for a 1920x1080 -> 3840x2160 and see what happens. It'll be a while.
The encode files for flux are the same from Sd3?
Not exactly the same but yes similar. Flux has 2 text encoders, sd3 has 3 text encoders.
A realistic picture of Queen Sheeba of Egypt?
I want to know if i need to download them again since they look the same files
no, clipl and t5 are the same
You don't need to download them again
tks
An evil bat swinging a bat in a batting cage
vampire utopia with schnell
Nice. That's a good depiction of one
I can't believe this silence from stability. I don't even fathom what they are thinking.
They released a 3d thingy yesterday and showcased sd3.1 2.xb on twitter (sadly only with basic prompts)
The only good response is to just deliver a good model, which is still WIP, 2 weeks, and such
yeah but i would have thought they would be a little more active on discord, at least when SOTA drops on their discord
guys
i'm not gonna call it flux anymore. The new name is SOTA lol
how do we get stable diffusion 3 ive been
trying to download it for four hours
I am
very dumb and desperate
https://huggingface.co/stabilityai/stable-diffusion-3-medium/tree/main
you have three choices :
- pure SD3 model (4GB one) with no text encoders included. This will require you to download the text encoders separately, which you can find in the text encoders folder
- SD3 with CLIP embed (6GB)
- SD3 with CLIP and T5 embed, one is fp16 (16GB) and another one is fp8 (10GB)
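For anyone weighing those options, here is a rough sketch of the choice as a lookup. The sizes are the approximate ones quoted in the list above; the variant labels and the `pick_variant` helper are made up for illustration and are not the actual file names in the repo.

```python
# Approximate download sizes as quoted in the chat. The keys are
# illustrative labels (assumptions), not official file names.
SD3_MEDIUM_VARIANTS = {
    "model_only": {"size_gb": 4, "clip": False, "t5": False},
    "with_clip": {"size_gb": 6, "clip": True, "t5": False},
    "with_clip_t5_fp8": {"size_gb": 10, "clip": True, "t5": True},
    "with_clip_t5_fp16": {"size_gb": 16, "clip": True, "t5": True},
}


def pick_variant(free_disk_gb, want_t5=True, have_separate_encoders=False):
    """Return the label of the largest suitable variant that fits, or None."""
    if have_separate_encoders:
        # Text encoders are downloaded separately, so the bare model is enough.
        wanted = ["model_only"]
    elif want_t5:
        # Prefer fp16, then fp8, then fall back to the CLIP-only bundle.
        wanted = ["with_clip_t5_fp16", "with_clip_t5_fp8", "with_clip"]
    else:
        wanted = ["with_clip"]

    for name in wanted:
        if SD3_MEDIUM_VARIANTS[name]["size_gb"] <= free_disk_gb:
            return name
    return None


if __name__ == "__main__":
    print(pick_variant(free_disk_gb=12))
```

The fallback order is a design choice, not anything Stability recommends: it just assumes you would rather lose T5 than not be able to download at all.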
THANK YOU
i'll do that rn
well it's not hard to think that SAI is undergoing mass internal structural reform.
and that takes them so long
again I wonder if Emad ( biggest shareholder for SAI ) and other shareholders will allow any "meaningful" changes or just "eh, just do it idc"
just downloaded the third one, what should I do next?
I mean, of the 17 authors of the SD 3 technical paper, 14 authors are now working on Flux
including Robin who was I think the leading ML scientist at SAI
i dont got automatic1111 or any of that stuff the only thing I've successfully downloaded is this thing
yeah seem like internal drama is cooking real hard
Nope, not a single thing. I'm new to this
Sorry
but i'm a fast learner just lmk what to do ill do it in 5s
and to be honest. Flux was the nicest release for long time
oh yeah you should have downloaded the webui first, anyway you have two major choices, A1111 and ComfyUI.
For your case A1111 would be a suitable choice.
not the typical SAI drama
with stupid hypes, delays, drama
Bet! I'll download A1111, so like.. uhm sorry but where is it exactly?
I mean a lot of people praise it a ton in r/StableDiffusion
Will be good to go watch some youtube videos on how to install from scratch.
I don't know how much Black Forest Labs are interested in continuing open source and open weights
bet i'm gonna download it rn
but as SAI seems to be not interested in open source anymore it cannot get worse
well, that comparison doesn't benefit SD3.1. The biggest advantage is the size, which can be used with less VRAM, but in terms of composition and quality, I think Flux is better. I would also like a real comparison with poses, people, anatomy... I've read that those who developed Flux are former SD developers?
Part 2: How to Use Stable Diffusion https://youtu.be/nJlHJZo66UA
Automatic1111 https://github.com/AUTOMATIC1111/stable-diffusion-webui
Install Python https://www.python.org/downloads/release/python-3106/
Install Git https://git-scm.com/download/win
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git
Download a model https://civ...
basically all people (14 of 17) who developed SD3 are now working on flux
including both leading researchers
if so, wasn't the timeline messed up
I mean the resignation happened in March right?
and SD3 release in June or July
and yes, the SAI twitter post is ridiculous. Even with SD 1.5 and a good finetune you can make stunning photos of dogs. That's not what anybody is interested in
just downloaded it
put that SD3 model into {AUTOMATIC1111}/models/Stable-diffusion
the "SD3" release never happened
yeah sure whatever
fake SD3 whatever they called
I guess they just started training a new model, which they then delivered as SD3 medium
they even said that the model was not properly trained. So I guess it was more or less some exit-plan to keep people calm
bet rn in my desktop i got these two ima do this
So now I understand why Flux is so good
honestly... auto1111 is not really good anymore
A 2.5b can't compete with a 12b. It will be faster and more accessible but that's all. Maybe 8b well trained could
i get better trogdors with this approach. The shape of an S with green scales. At the top of the S shape is a dragon head with a black V between the eyes representing angry eyebrows. Attached to the top of the S curve are a pair of small wings. Attached to the bottom are two capital black L letters representing his legs. A large human arm is at the middle of the shape, attached to the S shape at the shoulder. Large human bicep arm. in the background is a country side, burning village in the distance
cant get that man arm though
I would recommend InvokeAI if you want an easy UI, but that one won't support SD3. For SD3 you can use Comfy or SwarmUI
Get comfyui
oh wrong method of downloading A1111, you can watch the video sent by YouFunnyGuys above
reforge is pretty good too
honorable mention but was a different sampler
oh ok
the difference between A1111 and Reforge is quite small isn't it?
No. It's huge
after A1111 somehow pulled a ton of feature updates and performance fixes in 2 days
Did everyone give up on sd3 already? lol
if something new happens for sd3 i'll move over
Just more toys to play with
give it some space
until then, trogdor was a man.
he was a dragon-man
(really not sure how jared leto and willem dafoe ended up in this)
little bit peter stormare too
all the peter stormare prompts. is kinda like him consistently through the fashion show, but only kinda
arnold on the catwalk. or at least what arnold prompt does. No time for balls we got a lot of latent ground to cover
you people using flux?
the new release takes a lot of resources but quality is high, and ppl are referring to this as the sd3 we should have had
im not sure if you are already using flux
im patiently downloading the 23gb model file
Nice prompts! I'll steal them
name that movie scene
WHAT
their next goal is to release a SOTA text to video model
who?
black forest
black forest labs?
ahh nice nice
black forest labs kinda has a vibe of a dark symphonic metal group
sweets, im almost done downloading the necessary files, hope my system can handle image gen
So does this already work in comfy without requiring 24 GB of vram?
should
people are already using it on comfy
i have 12gb vram
and 32gb ram
keeping my fingers crossed
Epic
Same but 16gb of ram and it works (I have to use around 40-60gb of my ssd as swap )
Please precede these images with the letters NSFK (not safe for keyboard). My drool seeing them is endangering my hardware
40-60gb of your ssd? whats that?
Woah
i have 2 ssd drives, both at 512gb limit
im not hoping to mess with page files
im on windows 11 btw
I only use it because it's too slow with 16gb of ram but you probably don't have to do that if you have 32
ah ok, im a bit apprehensive
some people were saying flux locks up their system briefly when rendering with vram lower than 16gb
well I have 8, so..... guess we'll see
Scene 22, "The Great BacKon Race"
It lags a little and maybe they ran out of ram, but with 32 you're good
I think you technically can run it if you have enough ram (or a big ahh swap but it will be extremely slow)
about to test it out .. less than a min left and im done with all the file downloads
I have 8gb vram, takes 10 mins per image lol
BTW glif is useful
4-6 minutes with 12gb vram
this is important for optimization https://comfyanonymous.github.io/ComfyUI_examples/flux/
the majestic salmon returning home to find their mate
some suggestions offered in that page
lol. it depicts an actual movie scene
Im hungry now lmao, time to cook
if only fishing made it that ez
really? nobody knows the movie. bacon in a warehouse leaping around
||https://youtu.be/j8XGmZ8HDIU?t=29 || answer key
Stalker

I'll try generating a more stalker version, with a swamp or more cloudy :3
btw you'll have to update comfyui for flux type support
Prompt?
oh shit its slowing my pc as i try to render
retro-futuristic painting by vincent di fete (giant evil grinning balls:1.2) magazine cover by beksinski
This is beautiful, good chuckle
not too soon?
Never too soon
trying a big chunk of prompt
all fun and good but eventually i'll have to play with rendering girls
damn...
Legend!!
remember this tv show?
ba ba baaaa bababa baaa. babababa baba baabababa baah
ok bye gang. have fun with the tru sd3
love this ball
you mean true flux
Sd3 is in a state of flux
why is it so hard for all ai models to make a pickaxe, seems like a trivial tool to recreate
a dozen generations with flux and still not quite
Sd3 or flux?
friday flux hour. welcome to the twilight zone. its time to get smooth
Flux pro
late night in the flux zone
Flux.Schnell, t5xxl_fp8_e4m3fn Clip, 1024x1024, 8Gb VRAM RTX2070, 64Gb RAM, prompt = A giraffe in a red hat, reading a blue book. A green frog is on the hat, and a yellow snail is on the frog. 130 seconds to produce!
Flux schnell is the distilled model?
come enjoy the sultry styles of mrs rabbit, down at the flux lounge. late nite at the flux
I believe so. 4 steps only, and good for smaller VRAM
Flux.Schnell - prompt = Trees in the dark is a hand-painted painting by the artist Karen Kaspar in landscape format. Watercolors, ink and acrylic paints were used.
Under a dark sky stands a row of trees with dark trunks and dense dark foliage. The silhouettes of the trees are subtly illuminated from behind by the moonlight at night. The picture is painted in a mixture of blue and black hues and radiates tranquillity and a mystical atmosphere.
Is it good in prompt following ?
s generating 2 images with Ultra, using prompt a poorly photographed image of a foolish looking person drawing a really crummy attempt at a bad horse
See my giraffe picture above - nearly 100% correct!
'cept the snail was not on the frog as described
Looks good , and you can combine with t5
Prompt = A cartoonish dog with intricate, colorful designs all over its body stands against a backdrop of lush green leaves. The stylized patterns on the dog include various shapes, symbols, and colors, creating a whimsical appearance.
Excellent clarity - and very good prompt coherence
Prompt = Steampunk Agostino Arrivabene Rob Gonsalves Remedios Varo Leonora Carrington Dorothea Tanning Wolfgang Lettl Hieronymous Bosch two guitars playing by a chicken and fox. So cool!!!
Text is much better than 2B SD3 - but not quite at Ideogram standard! The sign was prompted to say "Chicken & Fox Rock!"
love the fox tail on the chicken
Here is one with the text as asked for ...
... nearly!
New York City peruvian arpillera style of victo ngai, henry rousseau, vladimir kush, canaletto
According to the CEO of Black Forest, the released versions of flux will not be fine tunable. Then again, they are already distilled models anyways
This is so cool - and waaaay better than 2B!!!
Even w/out finetuning - flux is knocking 2B out-of-the park!!!
Just throwing it out there for the degenerates expecting some pony version of it
2B is a small model. flux is a 12 billion parameter model. of course it's going to work better
Yippee!!!
Yeah I'm starting to develop Stockholm syndrome for the 5min dev model times lol
Well it's even stomping 8b as well from tests I've seen people run. Hopefully stability has a contender in the oven like a 3.1 or something
i'm sure they've got stuff they're working on
Sd3 still kicks ass
The one thing I don't like about flux is it's so aesthetic that it's really hard to get it to do ugly stuff or poor quality stuff
Like I was trying to make sketchy looking trailcam shots of cryptids with poor photo quality and it just won't do it without them looking high quality
lower steps
Start prompting "anti-aesthetic" words like grunge, distressed, dystopia, rough texture, gritty, dark etc
Werf!
Oh I know what I'm doing with wording, just saying, they still come out too good looking. I'll pop out some examples in a bit, working on something else right now
does found footage
Yeah it does, but that's still too high quality
I'm talking pics with terrible exposure, bad flash, etc etc
Still too normal looking
what, exactly, are you looking for then
Have you tried "disposable camera" or "disposable camera from the 80s" in your prompt?
to me, that IS unnormal. i think i need a visual example of what you're after
There needs to be modifiers as in MJ to tone down saturation/brio/smoothness. Some of the material produced by Flux is way antiseptic and lifeless
Not sd3, but you get the idea
reminds me of junji for some reason
Junji and Dali lol
the spirals and the trypophobia stuff
ahh damn, knew it lol
oh duh, says it in the image
We just need a SD3/Flux comfy workflow now
I have my workflow on civitai, just dont really wanna plug it. You can search for my username, same as my actual discord tag
Careful, he'll report you.
What's the point of flux having a non commercial license?
What are people gonna do with it then?
schnell is commercial, u can do whatever the f u want with it
apacheeeeee
train it
Fluxpony
I recommend training Dev if you're not planning on running a business though. It's clearly superior, and its output can be used commercially (including for training other models). You just can't host it for image gen services.
I'm getting this weird banding effect at 4K. There are vertical stripes with a canvas-like texture on the mountains.
Maybe it's the settings I'm using.
i only seen similar with upscaling
im never sure if the images posted here are flux or kolors or any other better model at this moment
XD
It's amazing isn't it...
Without controlnets, lora, refining, and ipadapters and even negative prompts it beats the competition.

All my fancy workflows trying to make sdxl and sd15 great feel obsolete
for a base model it's definitely very good, but having controlnet integration will make it more powerful
its better to come out of nowhere and surprise everyone rather than create hype and then not live up to it
lol
sure
a user already uploaded a pruned version of the model .. narrowing a 23gb file to 11gb
and people are already planning on fine tuning it
yeah the community moves fast when there are no licensing issues hindering
There's nothing comparable to flux right now as far as I know.
Flux Dev is already on Civitai, and no ban hammer dropped yet.
https://civitai.com/models/618692/flux
ah yes it was only a matter of time
Has anyone experimented with samplers / schedulers yet? I'm wondering how they affect detail / text / anatomy / realism / lineart / architecture.
Euler has never been great at details. Looking for an efficient way to experiment now.
Fine tuning Flux is very impractical since it will require a ton of GPUs to train, which would cost a lot of money, so there is a possibility that this model ends up not having any finetunes
So, if I finetune it you're saying I'll be the only one? An absolute legend?
if you've got tens of h100s then go right ahead
definitely video is next
but first: consistency please
We cant tell stories without consistency
I don't see a problem with renting servers. I think there will be quite a few people who have both the money and the means.
I don't have to own them, right? How much is it to rent one, $5/hr? What did you mean by "tens"? Twenty? Fifty? Where did you get that number? How much VRAM does an H100 have?
80GB Vram total
80GB VRAM and $3/hr. The model is 12*4 GB, so 48GB. You need the weights and the error accumulation in memory, so 96GB, and you need the latents and VAE and CLIP, I think you need at least 2 H100s, but let's say 3 to be conservative. That's $9/hr.
If it takes a week to finetune (I have no idea), that's 24hr x 7 days x $9, or $1500.
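The back-of-envelope numbers above can be written out as a small calculator. This is only a sketch under the same assumptions made in the chat (12B parameters at 4 bytes each, weights plus one same-sized buffer for gradients, 80GB per H100, $3 per GPU-hour, one extra GPU as headroom, one week of training). `finetune_estimate` and its parameter names are hypothetical. It ignores optimizer state and activations, which can push memory higher, so treat the result as a floor rather than a quote.

```python
import math


def finetune_estimate(params_billion=12, bytes_per_param=4, gpu_vram_gb=80,
                      usd_per_gpu_hour=3.0, days=7, headroom_gpus=1):
    """Rough memory and rental-cost estimate using the chat's assumptions."""
    # 1B parameters at N bytes each is roughly N GB.
    weights_gb = params_billion * bytes_per_param
    # Weights plus a same-sized gradient buffer (the "96GB" in the chat).
    weights_plus_grads_gb = 2 * weights_gb
    # Fewest GPUs whose combined VRAM holds that, plus some headroom.
    min_gpus = math.ceil(weights_plus_grads_gb / gpu_vram_gb)
    gpus = min_gpus + headroom_gpus
    hourly_usd = gpus * usd_per_gpu_hour
    total_usd = hourly_usd * 24 * days
    return {
        "weights_gb": weights_gb,
        "weights_plus_grads_gb": weights_plus_grads_gb,
        "min_gpus": min_gpus,
        "gpus": gpus,
        "hourly_usd": hourly_usd,
        "total_usd": total_usd,
    }


if __name__ == "__main__":
    for key, value in finetune_estimate().items():
        print(f"{key}: {value}")
```

With the defaults this comes out to 3 GPUs at $9/hr and $1512 for the week, which is the "$1500" figure rounded. Changing `days` or `usd_per_gpu_hour` shows how quickly the total moves if the week-long guess is off.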
Hmm.
It's not like I couldn't just splurge if no one else is going to. It's not ridiculously expensive. But I don't know how.
See, that's the issue: who would spend over 1K on a finetune and then release it for free?
Some billionaires maybe lol
I'd probably end up spending $1500 and get some kind of error when the week was over.
In spite of almost everyone being broke there are always some ultra loaded folks
Unless we manage to optimize the model training for flux it's simply too expensive for the normal person
Call Musk.
If someone knows how to run the training code and has a really good dataset, I'm willing to donate... let's say a few hundred dollars? Well, I guess the amount depends on this person's track record and what exactly is in the dataset.
Image/prompt:photo realistic style anamorphosis 3d, by Charles Renee Mackintosh, Ernst Haeckel, Art Nouveau book illustration, black colour glossy woman from ballet lake, dancing with handsome man
walking a huge man in a loincloth, hue dog
by Jean-Baptiste Monge, Caravaggio, Michelangelo, Beeple, Beksinski with beautiful woman in soft lush curly warm brown long hair,
perfect symmetric eyes, dim smile, wearing decorated aesthetic
rich colored patterns of huge deformed scarecrow hat and
Pony man? anyone who made the big names, realvis, juggernaut, epicphotogasm
The creator of pony said in his discord that he is looking into auraflow rather than flux
It doesnt need fine tunes
DucHaiten
I am not saying it needs finetunes i am simply stating that if someone wants to finetune it, it would be impractical
It would be. They'll do it though, because those will sell, because everyone is so used to assuming they need them. When in reality, the only base model that needs them is sd1.5
Hmm. I wonder if it really doesn't need finetuning. Have to seriously think about this.
I think in general I just like the idea that there are many people helping make open source models better, not just one team.
If NVidia would just release a 128GB card this wouldn't be a problem.
According to the license of flux dev, selling derivatives of Flux[Dev] is not allowed, which only leaves schnell
But kickstarter or patreon funding followed by free release should be totally doable. You just need someone with the proven skills and enough people spreading the word.
Don't play lawyer please
Well
yes
a tiny bit
but more importantly it needs controlnets and ip adapters
it's wonderful but we don't even have negative prompt options now
very little control
No you don't. There's no reason for the 10,000 anime girls with overly large chest fine tunes. People don't need a reason, just a want
"Restrictions. You will not, and will not permit, assist or cause any third party to
use, modify, copy, reproduce, create Derivatives of, or Distribute the FLUX.1 [dev] Model (or any Derivative thereof, or any data produced by the FLUX.1 [dev] Model), in whole or in part, for (i) any commercial or production purposes" straight from the license on hugging face
yeah we can only play with schnell
You are 1. NOT a lawyer and 2. NOT black forest labs. Drop it
Therefore, assisting someone to release it for non-commercial purposes is 100% allowed, which is exactly what I said. But seriously, don't worry about lawyer stuff. Lawyers are just money vampires. Truth justice and common sense have nothing to do with lawsuits.
You have no way of knowing what contract someone making fine tunes has with BFL
"FLUX.1 [dev] Non-Commercial License
Black Forest Labs, Inc. (“we” or “our” or “Company”) is pleased to make available the weights, parameters and inference code for the FLUX.1 [dev] Model (as defined below) freely available for your non-commercial and non-production use as set forth in this FLUX.1 [dev] Non-Commercial License (“License”). The “FLUX.1 [dev] Model” means the FLUX.1 [dev]”)." read it, what does it say? which company the license belongs to?
Either drop it or I'll report you
So true lol
report me for what?, i am simply telling that the license belongs to Black forest labs
dont mind him
Don't threaten to report people or we'll all report you. Blackmail and intimidation have no place on a friendly discord server.
That is strictly between the creator and black forest, and no one else
yeah i can see that
it could use some fine tuning for aesthetic reasons also there is room for improving prompt coherence, as for cost you overlook the strength in numbers, there are people willing to chip in for donation for a good project
i reported and blocked him. We should all do the same and not let one person ruin a nice community.
i already blocked him once, then unblocked, i see crystalwizard goes around bullying everyone
but there's the issue of trust, what if they take the money and run?, for such thing to happen it would require a trustworthy and known person within the community to undergo such a task
Yeah he is a piece of trash
It was so nice here before he showed up
everyone hates him
you're pushing the argument on a different trajectory
That one idiot bully every group must have huh
I understand that but it's a legitimate concern, isn't it?
sure it is, but that's not the point im making
im talking about feasible options for fine tuning
Yeah there are lots of people who I would trust with at least some money. You're always taking a risk of course, but the amount I donate would depend on how risky / certain it seemed.
I understand your point, and sure if enough people come together and donate money then sure, we probably could fine tune it, but you need to factor in all the risks involved in such a project
the donation part comes from merits and community records
Well why don't we just wait for the cost to come down lol
ok dude im not interested in pointless theoretical talks
no need to be on the bleeding edge.....
Well, I'll move on after this: If any of the Civitai creators whose models I regularly use ask for funds, I will donate somewhere between $100 to $300 probably. (And, to be perfectly honest, I would be fine with paying $1500 for the GPU time if I was 100% sure I could do the code myself without failing.)
yeah with merits to show and their records its easy to contribute to a project
and it's not like every individual is going to donate a large sum, that's the thing about strength in numbers, with people donating different sums and not breaking anyone's bank
if you put that way then i guess your point make sense, say if 300 people donated 10 dollars then it would amount to 3000 dollars in donations, and it wouldn't require large donation from people
yeah some might chip in for $10 some $20 and so on... it accumulates
im not cherry picking these images
Bro how does flux do maps and borders so well???
each render is pretty sweet
try that with sd3 medium you'll get botched up crap every after one render
Guys this is way faster!
The only thing wrong about that map is that Kuwait is a part of Basra
17 seconds.
are you aware that its bad for it to be consistent?
more image diversity is better, not worse
this is why there are nodes like CADS which try to raise diversity
Hmm. In my opinion you want both: Consistency when you want to make a small modification to the image, and diversity when you're exploring for different images.
I think a model should be consistent with a given seed, and diverse across different seeds.
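A toy illustration of that "same seed, same result / different seed, different result" idea, using only Python's standard library. This is not how Flux or ComfyUI generate their noise (real pipelines seed a tensor RNG and the sampler does the rest); `initial_noise` is a hypothetical stand-in for the starting latent, just to show why a fixed seed gives a repeatable starting point.

```python
import random


def initial_noise(seed, n=8):
    """Toy stand-in for starting latent noise: fully determined by the seed."""
    rng = random.Random(seed)  # private generator, so global state is untouched
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


same_a = initial_noise(42)
same_b = initial_noise(42)
other = initial_noise(43)

if __name__ == "__main__":
    print("seed 42 twice identical:", same_a == same_b)
    print("seed 42 vs 43 identical:", same_a == other)
```

Since the starting noise is identical for a given seed, a small prompt change only has to move the result a little, while a new seed starts from a completely different point.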
i would say using the same seed and getting the same style and your desired prompt is pretty great. if it was doing this with a different seed it would be problematic...
The more i look into the image the more errors i see, like jordan owning Palestine and Israel and a new country that exists between Syria and turkey, but even with those errors it's still very impressive and better than SD3 at doing maps
so much this. if i enter a specific seed it's usually because i don't want things to change haha
ah if its same seed that's different yeah
Creativity in AI is wonderful but once you have something you want to reuse like a character object or scene it should be able to stay consistent with that module
what I mean was more that
different seeds should be diverse
Yeah there's still no costume consistency anywhere to be found. A shirt will have long sleeves in one image and short sleeves in the next.
but yeah within one seed, if you make a small change its nice if it can change only a small amount
Flux
I'm getting a huge range of speeds for render times and I don't know why.
render time varying is strange
makes for something great like game asset sprites that stay consistent or even food photography for menus in the future if we could grab a reference image and it would correctly display the different parts of the real life dish
I find really weird seeds sometimes
This is the most detailed and rich setting I've found. (And it's the best by a pretty decent margin.)
beta is great, i haven't used anything else ever since
I use beta too
a dog driving a bike
Yes, but I tested all the samplers and the best (for intricate detail) were: Heun and DDIM. But Heun has the edge.
Flux dev is really awesome
But I haven't tested for anatomy or text or anything. Just for intricate details.
oh you can get better detail from some very slow samplers
that require custom nodes
like RK4, implicit adams or implicit midpoint
but it takes a long time
can't test it on this image lol
its nearly a zero detail image
blurry focus of squishies
let me try something else haha
disappointed