#πο½sd3
1 messages Β· Page 85 of 1
i know, i just didn't switch back earlier after making examples for here
Flux.Dev is good at 20+ steps - but takes a month of Sundays!
Is Schnell at 35 steps really sharp?
At 4 steps, Schnell is good, but a tad soft!
NF4
yeah a little bit of haloing was happening as well, almost like having too high of a cfg, but they aren't terrible
like here was one of the accidents with schnell at 35 steps
When I use Schnell, I add an SDXL-out for the sharpness
is flux NF4 20 steps?
Good realism
I'm using NF4 at 30 steps - which is good quality. You can try NF4 at 20 and see what gives ...
NF4@30steps
i recommend 25+ since it's working with less precision
personally, i use 35 steps with dev nf4
I will up my steps to 35 and see what happens
probably not a whole lot, it will shift things around on an image if you're trying to compare it to a current seed or something
so i wouldnt use a 30 vs 35 with the same seed as some kind of litmus test
Just to see if it gets that extra sharp edge
maybe try a prompt like "sword against a plain gray background"
something that you can guarantee a straight edge on to look for sharpness
if you're interested in edge sharpness
There are some ControlNets for Edge Sharpness ...
meh, i do all that shit in post if i really care
NF4@30steps
99% of what i do with diffusion is simply brainstorming for game assets, so i'm not too picky about having absolute 100% quality
As an artist, not having 99% quality is an artistic asset!
i'm an actual artist and sculpter as well, both digitally and with physical media
i don't view ai generated content as important to me, but it makes for awesome brainstorming
30 vs 35 steps btw
In 16 years of 'selling' my art, I think I have collected Β£300 π
So let me get this right, i can pay Google collab for faster gpus so I can make loras, BUT, I'll probably never get an a100, without trying everyday for weeks?
How is this even something anyone even uses?
the reflections near the tip of the sword, on the spine, are more accurate on 35 steps as well
but you could probably split hairs between a 1000 images and would likely see a marginal preferential difference of maybe 2% leaning toward 35 steps
It's the difference between fine-art and photographic realism ... and always quite subjective. Both versions have their uses.
If you're going to expand and blow images up, then 35 steps is best
35 Steps - there is better detail
Well in theory, more steps is almost always going to be better. If you look at the curve fitting gif I showed earlier, it lets the sampler wiggle tighter and tighter around the data, using smaller steps rather than chunkier shifts. However, if there are flaws in the quality of the data in the actual model, it can make them more obvious. Kind of like if you overfit a lora when training. Things like artifacts in the actual training set can become more visible. Like bad jpg compression and stuff
Glad to have you onboard to explain - I'm "colour-blind" when it comes to maths!!!
35
Thanks, well my actual formal education was in engineering, so I breathe math
My father graduated 2 maths degrees, and went into civil-engineering to build nuclear power.
I can do long-division! 'nuff said π
All settings constant, just using different steps:
25, 30, 35, 50, and 100.
There are minor changes, even from 50 to 100, but not worth the time. You can get better results by just using that time for rerolling.
The more steps, the deeper the d-o-f
if you use something like adaptive Bosh3 then you will see benefits from 100 steps
high steps requires a different type of sampler to see gains
Oh! I've turned Flux nf4 to humour ...
35 steps
Prompt = astronauts playing bagpipes, bath tub, light house, jonas peterson, andrea kowch, umbrella, andrew wyeth, harbour, ship, sea shells, crab, lobster, wheels, hat, rain clouds, watercolor, victo ngai, matisse, monet, catrin welz-stein, vladimir kush, henri rousseau
You might be right about that.
generally at very high steps its stuff like architecture, furniture, doors and windows that get fixed
at least in my generations
I've seen stuff get fixed beyond 300 steps
Now have them breakdancing on a bathtub
X3
Pin-tack sharpness, what every pro-photographer raves about!
Great idea - just adding ... π
But the output was usable?
if you use SDE samplers with low they can need a really high number of steps to converge
with euler A at CFG 0.1 (using CFG++) I found that the image wasn't useable at all until after 150 steps
but that's a quirk of SDE
I've never really swapped in/swapped out samplers and schedulers - unless I was really convinced.
Ancestral Samplers respond to steps/iterations. Others to CFG ...
35 Steps NF4 16:9
double majored in electrical and mechanical engineering. was going to go for a phd in mechatronics or some other robotics field, but life shit happened. thankfully, there's a ton of overlap with game development. if i can code the logic into a robot arm for something like inverse kinematics, i can do it for a game lol
I did a part-time degree in English Literature - figured if I was lying on the beach reading all the greats ...
He's not playing the bagpipes; nor grating cheese!!! π₯³
"Can you play the bagpipes in space?!"
I actually started doing quite a low step method these days
12-15 steps of UniPC
with beta sched
Flux.Schnell at 4 steps is cool. But it is also soft
I don't really want to run super long generations all the time
So I use and sdxl-out to increase sharpness
yeah refine is good
just expect from and treat schnell like you would a lightning model
My cousin was a mech-eng - he made pacemakers π
there's a Lora made by the Leosam HelloWorld XL creator
that combines Turbo with LCM but not quite as strong
I like that one for 8 steps SDXL
NF4 - does it work with LoRAs?
hell yeah
not by defualt
there might be a way to convert loras though
not sure
doing this to flux was kind experimental even according to the guys who did it
(its in diffusers as well)
I think that Flux.Dev works with LoRAs - but not dev.nf4
if you want to ever try a really fancy sampler combo
clownshark's stuff is really good
clownsampler and sharksampler
they work together
ClownShark's stuff is memorably unique - it takes 11 minutes/image on my 8Gb RTX2070 - but every image is worth it!
yeah i talked about it over in the comfyui section. my guess is due to the weights of the nf4 models being variable like U8/bfp16/f32/etc and the lora needs to be up or down casted every block to modify the model weights
its very slow yeah
Supreme Sampler by Clybius is good too
but this won't work on flow model
like flux/SD3/auraflow
I added 'breakdancing' - nf4 seems tame on that!!! π₯³
yeah clown makes some really cool shit
The lobster seems to want to though!!!
30 gens and not a single one with correct anatomy. π
Oh well. Time to go to work.
IDK if they will actually pull this off
but according to some of the flow papers
the goal of flow architecture is to make it solvable by a single step of euler
that might require a truly massive model though
the SD3 paper did note that the flow paths got more straight as the model got bigger
We've come to a tipping-point: bigger, better, larger, quicker hardware?
Or smaller, more efficient software?
we still need more hardware rly
distillation has all sorts of side effects
turbo / lightning stuff too
I haven't seen a "free lunch"
apart from the node that is literally called free lunch (FreeU)
35 iterations@nf4
sometimes it helps if you use the fp16 t5. just add a dual loader, set it to flux and load the fp16 t5 and clipL, connect that to the prompt instead of the one from the checkpoint loader
doesn't always fix issues, but sometimes it helps
I'm using @errant dust 's w/f
i experiment too much to bother with using actual consistent workflows, i just keep my shit like a messy desk with a ton of muted node groups everywhere lol. thats why i rarely share images with the workflows attached
Flux Guidance: what does it do?
ClownSharks stuff is like that - littered with unused nodes π
basically, it just adjusts some of the "signal to noise" ratios of the prompt in the model vs the noise. higher guidance tends to look more burned, similar to CFG, but also adheres better to the prompt usually
Maybe that's why text is so coherent?
the flux distills that we can use locally do not have cfgs though, so no real negative prompting. people do some hacky stuff to kind of make faux negative prompts, but it's not functionally the same and has a lot of drawbacks
higher guidance probably makes it do a better job of adhering, but i dunno, probably has more to do with the training and the architecture
we still dont have a paper, so it's hard to say what they did to pull it off
Vast.Ai FTW
vast ai is good yeah
or their 20 similar competitors, offering pretty much the same thing
better value than colab pro
it's now mainstream news in India
in general yes, but not as Comfy implemented it
Ugh, this all could have been SAI with their 8b model, if they only bothered to release it and not quickly make a crippled 2b one for release :/
Comfyui injects the Lora weights directly into the quantised model, instead of applying the lora on the fly during inference
the latter (the way diffusers is doing it by default) would allow to use the lora at any quantisation without problems
the way comfyui is doing it basically ends up in a lot of information loss within the lora. A workaround would be to first apply the Lora, THEN apply quantisation, but that would be slow as hell
I'm trying to learn diffusers cos
some stuff is easier in comfy and some stuff is easier in diffusers
yeah, it's a shame
diffusers fails as an API in my opinion
they abstract the low level components like unets, transformers and so on, but not the high level components like inpainting, cfg, diffusion
so implementing anything in diffusers mean starting from 0 almost
on the other hand many complicated things like lora or quantisation are just one-liners in diffusers which is really neat. Basically, you can use the whole huggingface ecosystem easily within diffusers
currently my workaround is to run Flux in bf16 in Comfyui, so that I can still use loras. But it's not viable option, as it crashs as soon as any other process (e.g. browser) takes more vram
Sd3
Diffusers doesn't really do API, they re-implement everything even if it's the same functionality, make it more fragmented than it needs to. But for an easy backend, i prefer diffusers, comfy is pretty quirky too (and i really don't like the extension/custom nodes ecosystem with all their own pythdon deps that break things all the time)
I really love the syntax of crowsonkb/k-diffusion
which is basically what comfy is based on
its so much nicer than the diffusers sampling code
k-diffusion is basically someone taking the Karras et al. (2022) paper
and writing a pytorch implementation
this is the same paper that Karras schedule and Heunpp came from
I have two big struggles with comfy
- the way it is structured makes it hard to predict clashes if you don't know the library well
- custom nodes are poorly documented and sometimes get stuff slightly wrong
Because those vary on a per pipeline basis. Not every architecture is going to be compatible with every feature. But they usually eventually get adapted and added to major models. Like someone worked in PAG to sd3 recently, even though sd3 doesn't use a unet
most stuff works 1:1 in every model
the funny thing is: this is not the case for the low level components
like unets, transformers and so on can be implemented in very different ways
that's why Flux transformer implementation is a few lines of clean and easy to read code
while diffusers transformer implementation is a mess with thousands of options and configs
because it has to match any implementation from flux, pixart, sdxl, kadinsky and so on
but the high level components like inpainting, img2img and so on, which are almost identical in each framework, they are reimplemented for every single model
Right, they get the architecture creators to help and make major decisions. Most of the time they want 1:1 parity with the creator's code
35 iterations@nf4
also that is not a valid argument. In a good API you would make everything modular and easy to adapt. If a component is different between two models, it would have to override this tiny part of the code
It's not a true API though in the sense of everything being modular and interchangeable
But I use diffusers a lot, I wrote two apps standalone apps for hunyuan and pixart and there are definitely major differences in how they handle things like embeds
do you think it is viable to do diffusion projects in pure pytorch?
or is that gonna be too much work
my portable ratchet set has both metric and imperial sockets in it. i wouldn't call it a poor set just because i don't have a metric size that fit's a common imperial size, i'd use the imperial size for it. but the ratchet, extender and case are compatible with all of them.
the problem with the whole AI world in general is that there are no actual standards or unified ways of doing things. you have a million different models doing different and similar things, but there's no standard for how they are supposed to go about doing them. most of them do try to use some of the similar backends like pytorch, python, some similar libraries, etc etc, but there's no real consistency in it.
Since Flux - the PiXart room has gone 'horribly quiet'!
again, I don't say diffusers is bad in itself. It's just bad as an replacement for ComfyUI. Not because an UI is missing, but because a proper API is missing
yeah its a big shame because the pixart guys gave true open licenses
of course it should be possible to just create a proper API library on top of diffusers. Maybe that would be a cool project
diffusers aren't meant to replace modular apps like comfyui. diffusers are mostly meant for just using the vanilla workflow with a few features. like a basic checkpoint-encoder-ksampler-decoder workflow in comfy
That amateur, volunteering, public-spirited world of open licence software - and it is incredible - will necessarily invite allcomers, and so diverge and splinter as all and sundry contribute
And the best open licence software will rise to the top ... where unified standards can start?
yeah I was surprised by this
diffusers is only really about replicating the basic workflow
not experimental stuff
before I looked into it, I had assumed that diffusers was lower level than it is
I thought it was a very thin pytorch wrapper but its not
it's meant for people to easily create a diffuser that can easily be interfaced with things like gradio, if you want and it literally takes like 60 seconds to set up a basic app
AllYourTechAI room says that there are some Flux Pro LoRAs ... ?
`import torch
from diffusers import FluxPipeline
pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()
prompt = "a tiny astronaut hatching from an egg on the moon"
out = pipe(
prompt=prompt,
guidance_scale=3.5,
height=768,
width=1360,
num_inference_steps=50,
).images[0]
out.save("image.png")`
nice
from their example on HF docs lol
Very good - Little Moon Ice-cream?!
but pretty much every diffuser will look like this
ye this isn't what I am looking for
its a nice library though
I'm just gonna use raw pytorch
even if it takes me a while
I thought diffusers was just some small utilities and abstractions
to aide doing diffusion in pytorch but that's not what diffusers is
Is Pro being considered for release or always gonna be API/Paywall?
they might release it in like a year
and then this is what a simple GUI can look like using diffusers
that's a really nice UI
(i wrote it a while back)
I like how the silders are audio sliders
AllYourTechAI is minting it via PhotoDojo
So it remains to be seen who will slay the goose which lays the golden eggs?
damn the app was over 600 lines of code, didn't realize it got that fat
I have to admit though
integration / similarity with huggingface transformers is a big plus
I will probably still try to learn it because of that
oh right right, it was all the tk labels and fields and grid placement shit
ah yeah tk
and a bunch of incremental steps to save memory and vram
I always forget the name but I used pyside 2 or pyside 6
like i precalculate the prompt embeds, then unload the tencs
then load the tformer and diffuse
"Max memory allocated: 7.484710693359375 GB"
and that's with the fp16 t5 for this
Is the T5 encoder loaded in VRAM or RAM? Or both?
ah yeah loading and unloading can help
I love the wait in Flux as t5xxl_fp16 analyses the prompt - it's like it's saying "I'm really paying attention here!"
don't underestimate all the annoying things like cpu offloading, memory handling, proper model loading and casting
diffusers does simplify all this stuff
if they have it set up correctly in the library for the model, yeah
but a lot dont
yeah I am predicting its gonna take a lot of work
(Is it just me, or am I alone in being the one not going back to SD3?)
all your tech is full of π© , their "pro"is dev, the sad thing is they don't need these lies, they're one of the few sites actually offering unlimited dev, not just schnell.
it will look something like this "model_cpu_offload_seq = "text_encoder->text_encoder_2->text_encoder_3->transformer->vae"" but doesn't always work, especially if something is screwy with the dtypes where they won't allow calculations with cpu and cuda:0 at the same time. i'm sure you've seen this error before in comfy working with stuff from time to time
once something like the sd3 8b api is released, i'll be back in no time, i like its ability for different styles and even its photo's look more real
Mebbe, but I regularly use 8b @ClipDrop - and even then I'm not impressed
I guess that reality in SD is not my fortΓ© - art survives its own realitΓ©
for different style it's good, but i also tried some api prompts from artisan chan, and it was a bit painfull to compare (flux was way ahead)
I'm trying to see if I can figure out how to convert a Google workbook to vast. Vast doesn't have any (useful) just click buttons and it works already made workflows π¦
It took me a few days to install everydream2 trainer on my computer, with help from their discord, so I worry that trying to create a Flux lora on vast might take as long and cost a small fortune π¦
apparently only 2B had had full aesthetic fine tuning
Jellyfishalicious
phenomenal model
im my image generation jeourney flux was the first model that rly blows me away
and we can run it locally damn, what a time to be alive
flux is amazing yeah
two papers down the line well all be uploading our brains to the matrix
Flux
how long it takes you to render nf4?
You'd need some kind of controlnet to handle the wrap points. I know there are cnets to do stuff like making tiling textures, so it's definitely possible.
Though in games, we don't use that style of cube mapping anymore really and if there's one in that format, we convert it to a normal panorama style
For skybox, thats the only template
Where can i find such controlnets
there is some company out there that makes vr skyboxes with diffusion
closed source though
but it shows its possible
1024x1024 about 2 minutes and 10 seconds
nf4@35 steps
So the realism lora for flux is 20 megs...
They don't seem to work that well - unless I'm using them wrong?
nf4@35 steps
They don;t at all in forge. I think they work in Comfyu. Will try in a bit, it seemed to do soemthign yesterday.
Loras don't work with the NF4 version on Forge
So Flux seems to have strong guardrails against art styles of any kind, whether comic art or classic art such as impressionist, surrealist, and so on. At a sufficiently low CFG it breaks free of them very slightly, at a real cost in adherence too, but not enough to actually show it can distinguish between styles. That said, if the ability to truly train a checkpoint or fully supported LoRAs come out, people will be able to build their own preferences. Since that was a defining feature of SD in the past, one can only presume and hope this will be true of Flux moving forward too
I'm totally aware, but sadly unless I accept 45 minute renders, I need to wait for some kind of support with NF4
I find it hard to make Flux LoRAs work at all - do they need Dev or Schnell? NF4?
Plus as I understand, the LoRAs don't really have full effect like classic SD LoRAs.
NF4 35 iterations
I suspect the Flux team will eventually open the doors to more understanding of what is needed at which point we will witness a tsunami of such
Dev only. NF4 and Schnell won't work, though for different reasons
It'll take me forever using a LoRA in Dev π¦
LoRAs require guidance control which Schnell does not have, and NF4 due to the way it is made, is a small nightmare to make LoRAs
So it is Dev alone
Ok, I went back to fp8, because in my case it takes the same time as nf4
I assume you have a lot more VRAM than I do though
I am the pauper of paupers for this use case. Meaning 8GB of VRAM
and even then I have to regularly restart Comfy to flush out the memory
I too am an 8Gb VRAMmer
In my case, it consumes the same. I have 16GB
right, a different use case
with fp8 you can no doubt fit it all onto the GPU memory
Where does the Flux LoRA go? After model ...?
Load the checkpoint, connect the Model node output to the LoRA and then on to the rest
Unet works as well as Checkpoint version?
Yes
Are you using Comfy? What workflow?
Just grab the image, no?
And which model specifically? if anybody knows, I wanted to try it
I'm using one supplied by @errant dust
This is the first Art Nouveau Flux.Dev/LoRA image I have made - quite good
20 minutes in the making
I haven't been able to make my flux images look like the style of any particular painter yet
I strongly suspect you won't be able to at all. Let me show you what "impressionist oil painting of...." yields:
Apparently you need trigger words with that LoRA ...
What about glif with their comfy feature? Or civitai?
Flux.Dev image using Art Nouveau Flux LoRA - quite impressive!
You should see my by Pieter Bruegel the Elder images ππ±
??
which one is it? On Comfy?
its absolutely not perfect for humans, but it does work
you need to use the trigger words and take down the strength to about 0.6 or lower
these are all with the lora
Flux has never heard of Salvador Dali π¦
Flux.Dev using Flux Art Nouveau LoRA
Use any png I have posted it contains the workflow
wow art nouveau era
Have any of yall trained on tags only? So far my NLP loras have been great but i wanna know if i can use my huge tag dataset. Edit: for flux loras
do you know about pony
sorry i meant for flux
should have added that
I think that's kinda an unknown question at the moment
what is style prompt?
ok ty, and is this one model?
ty
not as good as a lora for sure tho
I need to learn the good prompting nodes
Install directly from comfy. Got pull didn't work for me. Also do that python stiff the git explains
what do you mean install directly
like re-install Comfy?
most of the time its a bitsandbytes not installed or old version issue
I think I didnt install that
I re-installed Comfy and it was the same
I mean from within confy," install from git url"
You definitely need the bits and bytes
what is that exactly?
what is the url for that loader, or what?
I'm on Ubuntu
Here's a guide for nf4 installation if you read from here down. The patient folks helped me get it installed #πο½sd3 message
That's probably a bit different, but I figure install via git url from within comfy should still work.
Also the whole add the python stiff should be similar
Just ignore the windows directory structure in the above convo
I don't understand how to install this https://github.com/comfyanonymous/ComfyUI_bitsandbytes_NF4
Contribute to comfyanonymous/ComfyUI_bitsandbytes_NF4 development by creating an account on GitHub.
oh I think I found it
oh boi
I'll try to git pull manually
Just go to the folder of Custom_nodes in CMD, and git clone it
make sure you have bitsandbytes already installed first
the 1024 version nice. hmmm. do i want nf4 comfyui or 1024 controlnet comfyui? is it update day today?
https://github.com/comfyanonymous/ComfyUI/pull/4260 woohoo its mainbranch now
I'm trying to install that
git clone what?
Flux.Dev and Flux Impressionist Landscape LoRA
just some advice. you might want to do some basic youtube courses on how to manage your comfyui. it's quite intuitive what he means. git cloning the custom node project into the custom node folder
Both! π
i've stopped helping people install stuff like that since more cereal experts get mad at me when i try to clarify. so i'll just zip it again. i'm getting into deep water
The Comfy NF4 node you were trying to install via the Manager
I did pull, not clone
Cezanne inspired impressionist landscape Flux.Dev and Impressionist LoRA
how to install bitsandbytes?
I need to install it on the venv?
mm I tried but it is installing torch, I don't want to touch that
Also I see it is installing much cuda-nvidia things, I'm on AMD
and it crashed
Final Flux.Dev Impressionist LoRA image for the day
It seems it doesn't work with AMD
i think you are right, i didnt think of that
the install via git url button
What are your system specs?
wait. wait wait wait. that's the bitsandbytes project. not the python compiled module
https://pypi.org/project/bitsandbytes/ this is the more appropriate version to install as a python project dependency
ya needs cuda
INstall via git url
portable comfy needs a bat file. run it, and it asks you "what pip packages would you like installed" then you type bitsandbytes or whatever and bblamo. ez pip install.
I've figured out how to run the embedded python's pip by now, but i see this advice being repeated and distorted so often that i think a simplified tool like a bat file being available in the root project folder would benefit 10000 newbs
theres the install pip command in comfy manager but it never works i've noticed
security clearance or some
where should i pip this
in venv
which in comfy
you have portable comfyui? open the python_embedded folder in powershell and type ./python.exe -m pip install -U bitsandbytes
to which i say, we need a better way of advising the newbs. some people have a venv. some people have an embedded python environment
i have portable one, ah thanks for the explanation!
dont think 'm using newb condescendingly btw. it's from the heart. we're all newbs
im just a dabbler, i try to help on basic things ive seen myself
but when problem 2 comes along im out π
?
do you notice any different nf4 on forge and comfy?
type on address bar 'powershell'
brows to the folder in windows and copy it from the address bar and paste it in CMD window
It was working perfectly yestrday grumble
did, but then this
so found powershell, but I put my comfy in way too deep of a directory lol
i dunno i just use the plain CMD window, not powershell
nm not important enough, I played with it yesterday, so got my fun
not sure what happened to it
take the slash out?
or .\python.exe
thats the python folder yup.
just one cd lol. though typing absolute file paths is something i happen to be great at. as you're typing it, tab will auto complete the folder names
also, windows 11 hacks. right click in hte folder in explorer and "open terminal here"
I tried it in the correct path, seems you can put cd and pase and entire path these days! (last I used DOS was decades ago lol)
anyways, I got the terminal for it, then got the fancy powershell version, either worked
yeah cmd.exe works different. you have to go to the drive letter first
the next step is to add a custom transparent background to your powershell and make it look like matrix terminal letters
so I'll just wait until Comfy upgrades to add it. I already installed it yesterday, and it worked fine, but not today
also in cmd.exe you don't need to do ./ before executable commands
I had it installing but...
how many characters was it than the ~
nf4 so not worth it!
i refute this for me
it still takes 3-4 mins per image tho! π¦
whereas I can just use glif or HF and only wait 1 min with flux dev or pro
and Flux sucks at nsfw, so I don't even get that advantage on my own system
LOL (from yesterday)
Btw, is making flux loras worth it? I'm tempted to try out that Flux google collab lora making. $18 isn't too bad for a whole bunch of loras...
(if I ever luck out and get an A100 that is)
i haven't been that impressed yet, my images are all no lora
All the loras I see on civitae all have that slightly faded, slightly blurry look π¦ I don't know if that's a flux thing, or a model maker bad images thing
python dependency issues aren't inherent to nf4 node. while i'ts unfortunate that the rollout of this comfyui node has taken the wind out of people's sails, its still really good. I haven't found any situation where i can't get what i want out of nf4. and its faster and lighter weight
Can you manage a decent one with male anatomy? If so I'll try harder (no pun intended lol).
i think the style loras aren't worth it. i'm not going to bother making a ball lora for flux. i think a prompt template is so much better. 1KB vs GB for the same effect.
the loras that add people though work the best i've ever seen
hmm, so character consisency you mean? Or specific actresses, or?
lol well, i should qualify that things i might try may be different from things others might try. can you get "male anatomy" out of stock dev?
the actors people have bene posting loras for. arnold and the other girl, queens gambit
No lol. But also not with stock SDXL, so you never know π
gotta head out for a couple. be back later. i think today is more importantly 1024 controlnet day instead of play wiht nf4 more day
can't wait to get my hands on ip adapters
Is it just me, or does it not know any Renaissance or Medieval painters? <pout>
I'm defi intely going to try that one first π
i think it does but the distillation has collapsed the "painting" class down to a couple styles. prompt aggressively, fidget with guidance and step count. try ddim or something. fiddle. it's in there. just , how to get it out?
remember. 12B params
bye for now
I'm suprirsed though, since SD3 has so many painers!!!!
one more thought before i'm out the door. does pro have the painters?
I crashed in the installation, and maybe I should try to install it in the venv. But I guess it would be same. It seems it won't work on AMD
I don't want to mess my Automatic1111 venv
any particular, ive done some
Not last I checked. But I'll try with some more mainstream ones
Oooh I want to see pics! π
Pietre Bruegel the Elder is one of my faves (SD3 def has him)
looks around
No balls 
bad channel
Ball
girl 
im using dev nf4
making good timing on my gpu
and image qualities are apparently better than schnell
aww shit i clicked one of the spoiler images
thought you posted something sexy but it's all gore
I tested upscaling and inpainting with FLUX in Forge today. I don't feel like going back to the SDXL model anymore.
im fast losing interest in pony / sdxl too
It's time to free up some space on my SSD. π
Going to a better place.
Make them play ping pong
They are playing so fast there are multiple balls being captured in a frame
to make an image you must first train the AI to play ping pong
Cool
Damn Depth and HED Controlnets Xlabs is cooking
except cool art

Flux can actually put drivers in the vehicle
nah it's awesome. It does pixel art really well..
and gets hands right almost always
and is amazing with text..
it's literally what SD3 was supposed to be
I mean this is cool isn't it?
I haven't seen any pixel art from flux yet I think
but most stuff looks very generic
like dalle kinda
its consistent at things thats good
Pixel art and jpeg artifacts are the one thing I find sd3 slays flux at
it works really well for it, I have a couple examples :
I mean..it seems to look decent IMO. I don't know if it's "true" pixel art
jpeg artifacts ? lmao
I struggle to get flux to make anything align to a grid like pixel art
searches google for the pixel art foundation rules
Yeh like, security cam footage. Some of the results had me checking my export settings on my UI
jesus comes to tell you

Even pixel artists are telling each other they're not making real pixel art
It's a very opinionated art form
Read a twitter thread the other day a guy was saying pixel art was invented after LCD flat panels because we weren't making pixel art on CRT screens. I honestly just think he was being a dolt though
well he's technically right. CRT was phosphors not pixels
Monitors had clear pixels
Tvs didn't but my PC 1280x1024 screen had ultra sharp pixels
I don't even want to think of 640x480 monitors or 1024x768. How the fuck did we even manage
Not only that, but video game designers back then made their art with graphing paper and used expensive workstations that were essentially pixel editors.
I remember when pixel art models for sd15 first started coming out. There was an extension to cook the image out in a way that made it pixelated. Pixel artists came out of their caves then too to tell us all that those weren't actually pixelart
really incredible
sometimes I feel like there is little point in using anything other than flux
when I see ones like this
i think about refining them with other stuff than dont do it and make more pictures
when full SD3 comes out it will be more viable
Yeah 8b one will be nice to use, flux will still be better but 8b should be close.
nah flux in the trash bro fr fr
I don't think flux will be better
flux is distilled so much less tricks work with it
and its overtrained on a certain style
even just the 2B is better for realism
sometimes its better every time
Sounds like the honeymoon is over
whats that prompt ? Theres some meme potential there
I'm really sensitive to looks
like midjourney/instagram look
realistic stock photo is my preferred style
or analogue film
Look, my honest take? Flux is fantastic. Saying otherwise would be ridiculous. It is simply not the 'second coming' as many wanted. Like all the top image AIs of the moment, the very best ones, it has its strengths and weaknesses
Personally I learn what those are, and use the best tool for the job, and Flux won't always be it. Tis life.
(DARPA,Lockheed Martin,Northrop Grumman,Reaction Engines Limited,DSTL,BAE Systems,Airbus Defence and Space,DGA,Arianespace,OHB SE,BAAINBw,MTU Aero Engines,Energia,Russian Space Forces,Sukhoi,Antrix Corporation,DRDO,HAL,UAESA,CASC,CASIC,PLA Strategic Support Force,Mitsubishi Heavy Industries,IHI Corporation,ATLA:1.5), (NASA,CSA,AEB,UKSA,ESA,CNES,DLR,ISRA,Roscosmos,UAESA,ISRO,ASRI,KARI,JAXA,CNSA:0.5)
yeah this take is fine
a what now
Hey, I don't make the rules π
I like how trash prompts still make great images
hey I might... just might have as weird prompts don't call them trash
Scientific laboratory setup, high-tech equipment, advanced technology, precision instruments, futuristic design, scientists working together, highly focused attention, neon green GenStone placed at center stage, surrounded by complex arrangement of laser beams, splitting mirrors, lenses, spectrometers, fluorescent tubes, data collection devices, digital displays, lab coats, goggles, gloves, safety protocols in place, cutting-edge innovation, groundbreaking discovery on the horizon.
i just wanted to shoot lasers at something
"giant robot" definitely not a transformer
lol
you could use perpneg to get rid of transformer
might not work without perpneg
I don't blame the model
Giant robots are transformers in my mind anyway cx
Also I'm just using api, none of the noodles for me
Wild. This will give stable diffusion a run for its money now
Yup... I have been exclusively using Dev since it was released...
flux is way better at death stars than sd3. sd3 keeps giving them big glass balls where the delfector goes. neither are quite great at it, but they're a lot more creative on flux and can be made all sorts of ways
sd3 can do a lego death star sorta but wants to pixel art it and always that black lens
foiled by diffusers again. the new instantx 1024 resolution controlnet is diffusers file. might not even load in comfy if it were a regular safetensor file
What are you concerns or pros about SD3? Asking for a friend
No pro?
very nice
flux or sd3?
that's flux
When you prompt - be like a Hollywood Casting Director - prompt against type!
If Flux is too refined, use prompt words like distressed, grunge, dystopia, chaos, entropy, depressed etc etc which counter its rather glamorous and glossy look
Flux.dev.bnb.nf4
Just dev.
He put out a V2 version of the flux dev nf4 model:
"Update:
Always use V2 by default.V2 is quantized in a better way to turn off the second stage of double quant.
V2 is 0.5 GB larger than the previous version, since the chunk 64 norm is now stored in full precision float32, making it much more precise than the previous version. Also, since V2 does not have second compression stage, it now has less computation overhead for on-the-fly decompression, making the inference a bit faster.
The only drawback of V2 is being 0.5 GB larger."
https://huggingface.co/lllyasviel/flux1-dev-bnb-nf4/tree/main
From my understanding, what he did is kind of similar to llm quants where you'll have something like a q4_0 vs a q4_k_m or k_s where parts of it are quantized differently
Flux is just from another planet. Incredible model with that "extra" that we all were waiting for. Getting interesting, how SD will handle with that.
I think SAI needs to start considering a fundamentally new model because by the time the 8B model is released, there will already be a huge ecosystem built around FLUX, with LoRAs, ControlNets, fine-tuned models, and so on. And with new compression formats, FLUX is now accessible to a larger number of users. In general, SD is in an unenviable position.
testing the V2 nf4 dev model, yeah the precision is a hair higher. the overall quality is still very similar, but random artifacts in things like backgrounds seem much less common. the speed is a hair faster as well, but not but a massive amount or anything. maybe 5% faster in my case
nf4v2
Prompt?
A cow like horror creature with extremely long legs, white background, game art , digital drawing, black and white cow print
I couldn't remember "concept art"
It's what I wanted to use lmao
which flux version do you use?
My guess is this is actually SD3
No surprise. Flux cannot produce that kind of comic art natively.
Honestly, I don't think the new compression formats will have much of an impact on the overall access by users. The number of users of open source models like Flux and SD who actually have the technical wherewithal and hardware to install and run it locally is microscopic compared to the overall userbase. The overwhelming number of users who do use the open source models will be doing so from service providers who have Pro or Dev and offer it for free or fee. And those service providers won't be concerned with Nf4 or the like. Don't get me wrong, I am incredibly grateful for the NF4 builds, as I am a direct and ideal beneficiary, but I also know what is what in the larger scheme of things.
They are going to give you a Nobel Prize
flux, the original image is Cascade
Frankly, SD3 8GB made no sense financially when it first came out, nevermind with competition by Flux now. Via API (and I think their subscription Assistant with credits yields the same thing today) it amounted to 190 images total for $20. For that amount a month at Ideogram I get that many renders (with 4 images each) per day, same for Dall-E 3, and MJ gives you literally an infinite number of renders per month for $30/month ($24 if you buy the yearly plan). Is SD3 so great and overwhelmingly superior to warrant their asking price? Never was.
what do you mean
For trying to keep SD3 alive.
its fun to use, flux doesnt do the stuff I like well most of the time so its not worth using >.>
The biggest irony about Stability's Assistant subscription service is that all the other services, or rather techs, embedded are actually free to download and install. The image generator is the one they keep under wraps as if this were the real magic sauce and must not be 'given away'. The truth is I also use online sites for.... Flux! Despite having it running and working here. Why? Because they have hardware I don't and can render in seconds and without hogging my machine. Free renders in Flux Pro.... what's not to like? I give up some of the control I get locally, so there is a price, but one that I am sometimes willing to trade off.
Amazing
the ultra model pipeline has something special
as far as I can tell
significantly better than vanilla SD3 8B
there is only one single 8GB offering, not two afaik
nomenclature won't change a thing. It is one model. The Ultra being peddled as a name is to differentiate from the 'Medium'
And yes, it has zero relation to Medium. They are trained on entirely different data sets
what I mean is
what's the difference between SD3 8B and Ultra
not the medium, which is 2B
And I am saying 8GB = Ultra
but it has a separate endpoint and the results are better
Ah. And you have evidence of this?
THat there are two different 8GBs available and one is better than the other?
its not that its a different model its that its a pipeline apparently
one of the comfy people said that its a comfy workflow
I think we are not communicating properly. What are the two different SD3 8GB available?
there is only one SD3 8GB model
but there are two endpoints that use the same SD3 8GB model
the regular one just serves the model
the second, called Ultra, does something in addition to that
Link?
its just on the main stability site
So what are the benefits of SD3, and the benefits of Flux, over Midjourney? I mean aside from price point π
I found it. And looked at the API details too. Can I be brutally honest? I think the only difference between the 'Ultra' at 8 credits per image' and the vanilla at 6.5 credits per image is.... more steps.
Midjourney is an always learning, always evolving platform vs. a point in time checkpoint. IMO MJ and the like will always be more adaptive, dynamic and able to produce whatever images the developers train it to produce at any time...
We could have the same from SD3 and Flux is there were continuous releases with different training focus ... but with the limitations of consumer hardware... OR they could do the same as MJ and have an API platform that does the same thign that MJ does.
I may be wrong but SD3 API is likely able to do this or is doing this already?
Maybe the checkpoints and loras do that extra stuff for SD models
If you are talking about SD3 8GB (any flavor), the benefit is simply.... variety. It is not better, just different, and that means it might hit that perfect render none of the others gets, which is likewise true of them all
Absolutely... but, again, not very dynamically. Always a point in time locked in set of weights. MJ can technically do this on the fly... but what do I know LOL
how about the benefits of flux vs MJ?
MJ tends to update about once every 6 months from what I remember.
I think flux brings a new level of quality over previous models. And a seemingly much better text encoder implementation.
let me give you an example of an output by Dall-E 3 that NONE of them can do even close. Does it make it the king? No, not at all. I can show outputs by all of them, that none of the others can do also. It simply illustrates why there is no universal do it all. Prompt: mug of cappuccino with a world map made from the milk foam and coffee swirls.
Ecosystem of custom LoRAs, controlNet's and other extensions. Open weights are just like Open Source. Almost
Dalle definitely is the best image wise imo. THough I've heard that they have downgraded lately π¦
The others all produce something along the lines of: (here is Flux)
There are still some things SD3 can do well (Iβm DEFINITELY not talking about 2B), but in terms of human anatomy, Flux is almost flawless. Also, RIP Midjourney. That was the first thing it killed anyway π
I'm hoping the next SD model will also have many flux qualities
I still have some sympathy for StabilityAI, but they haven't been doing very good work lately
Yeah MJ should lower their prices, and offer free tiers now π
Is this flux schnell?
But don't think this is me say DE3 is pure best. I can show sample output by MJ none of them can match, for a variety of reasons (probably legal ones... lol): I asked MJ (and others) for an image of a large fantasy warrior with a scantily clad princess painted in the style of Frank Frazetta (this was with MJ 6.0, which is now on 6.1): This too none of the others can come close to. It literally looks like a painting made by the former great artist
And then you have Ideogram which can produce insane things with text and art, hitting it out of the ballpark over and over again in ways none can match. Its typography and variety is unmatched, even by Flux Pro. But that's cool. It is not perfect either, just has different strengths.
No luck with SD3 8b on that one? I love what SD3 does with mimicking artist's works!
my output
I honestly never tried it on Frazetta. The cost barrier made it prohibitive to really explore much
That's nice, but as you can see, the map sort of looks like a map texturized with a coffee color. Dall-E's actually looks like it was made from milk foam and coffee
No diss to Flux, since the rivals are no better
DALL-E is not a standalone model, but a service where many different processes run in the background. Therefore, comparing DALL-E with a single model file might be a bit unfair. However, based on my tests yesterday, flux dev seems to do a better job at understanding prompts than DALL-E
It is irrelevant. And I am interested in results, not rationalizations or justifications. If one alone gives me the result, that is that.
Yeahh youre right
Dalle's output looks more artistic
but let me give it a try with realism lora π
I use this as a bit of a litmus test, but let me be super clear: I have PLENTY of horrible fails by Dall-E 3 too. This is simply one that hit all its strengths.
Flux pro (via glif) version
And I believe that FLUX will become much better once fine-tunes are available for it; we must remember that this is a base model. When I tried SDXL last year, I experienced a slight disappointment until the fine-tunes came out
And since no one else has come close, it has become a useful measuring point
Yes, i can get great images of the concept, no arguments, but nothing that says: Frank Frazetta
flux dev (via glif) version
FLUX DEV first seed
SD3 via my glif π a large fantasy warrior with a scantily clad princess painted in the style of Frank Frazetta
my coffee be like
I got tons of those. Really really tried. π
Now this is beauty
Interestingly, the closest I got was SD XL with a Frank Frazetta LoRA (yes, there is one):
It actually has the slight painterly style to lend it authenticity

you been snooping my civitae images again? π
I haven't been to Civitai since then

I don't use local so there's nothing there for me

how come you don't use SD locally?
Look, be careful as there are two completely different sets of Dall-E 3. One that is completely censored and crippled by OpenAI, tied to ChatGPT 4, and the other that is offered by MS and has far fewer guardrails. If I ask for OpenAI's version make an illustration of a tree filled with whimsical cats in the style of Keith Haring it will flat out refuse. MS will do it in an instant, and perfectly too.
4gb vram, old amd card, my PC goes boom

You can try MS's version of Dall-E 3 for free 15-20 times a day (4 images per render) here: https://www.bing.com/images/create/
I do kow someoe who does all their images using 6gb vram, but 4 sounds nearly impossible π¦
Why on earth would it refuse that prompt? I'm used to more adult ones being refused!
Because you are asking it to do it in the style of someone's name
LOL oh LOL
And the second you add that name it will tell you forget it
but not MS, which is why I brought up the difference. It has an impact on all the outputs though, regardless of the anme used or not
I tested them both and compared to death
in the end, I concluded MS's Dall-E 3 had a clear edge over OpenAI's output
Damn
Try telling OpenAI's version to make an image from: "an Impressionist Cartoon of a tree covered in whimsical cats on the base and branches all drawn in a variety of colors and facial expressions in the style of Andy Kehoe and Skottie Young." It won't. It will choke all the way. MS's version does this:
In this particular aspect, DE3 has no peers IMHO. So this is definitely one of its great strengths
It can be crap for some other stuff though
Original link BTW (and you can see the three other renders from the same prompt): https://www.bing.com/images/create/an-impressionist-cartoon-of-a-tree-covered-in-whim/1-66b5746985304d2e92633339eda9287d?id=1VAi4%2FW3pnBPFKIongJkSQ%3D%3D&view=detailv2&idpp=genimg&thId=OIG3.wh5X.pzQKdk8R4kYXWgc&skey=mUoZ5_u07GI2VYaMOZRhB4Qz3Adqgmo9xkzpORdU9ag&FORM=GCRIDP&mode=overlay
lol

Cthulhu would be proud
I really want to see mom
Well, since it was a bird's nest, I had in mind some large cat with flapping wings to feed its brood. π
Have this delicious pizza with broccoli soup and metal bearings
I'm proud of myself 
I love glif for everything, however, it bens most of my skulls! π¦

Hmmm. That gave me an idea. First tried on DE3, but will test on others. Might be less difficult for other AIs:
a large pizza in which a world map can be seen made from the various toppings
SD3 via glif (it finally let a skull through)
Interesting. I tried glif using Flux and any time I used the word "Scantily", it had a fit. Bikini, it would do.

What is the sentiment on SD3 and commercialized models versus open source?
Hi everyone! New to SD. What are you guys seeing as the pros and cons of SD3?
Whatever happened to Sytan at all?
SD3 Medium is bad with human anatomy. It is good at photorealism, but distinctly poor with fingers and limbs
SD3 Medium dataset only 2 billion items ... Flux is 12 billion.
SD3 should release their 8 billion dataset into the community ...
But SD3 has been crushed by in-fighting and politics - and dare I say it? - money!
fell into a trust and safety hole
Hate to say it, but the v2 version of the NFT is a fail for me. While I can squeeze v1 into my laptop 4060's 8GB of VRAM for relatively quick renders, v2 will not fit in no matter how much I try.
Keep in mind that 2 billion is the number of parameters in the NN, not the number of items in the dataset
Dog World π
the version that was open sourced is a small, unfinished model - so this question really doesn't have an answer
you know this how? other than rumors?
me 99% of the time... (flux dev nf4 v2, 30 steps, guidance 3.5)
you can revert it back to the legacy view
and there's a setting for the search box
and it will pretty much be identical to how it was before. been using the new frontend for a while now
the difference is more than just steps, although Ultra very likely does have more total steps
its got a higher amount of high frequency detail and its compositions are often a lot better
it implies some sort of noise injection or multiple passes
I guess twitter/X is using groq with flux now, this could either be good or bad depending on things
the second spaceman one is really good
It's amazing at backgrounds, alsl the styles of ancient painters π
Which model?
I was actually surprised when it kept doing this prompt: a large fantasy warrior with a scantily clad princess painted in the style of Frank Frazetta. Perhaps it's based on the image output? I did have one go blurry.
SD3 via glif
the lion looks worse in the discord preview if you download and zoom in then it looks okay
Ok, so like the coffee example shared earlier, the concept of a pizza in which the world map could be seen made from the toppings was won by Dall-E 3. It clearly works with its strenghs so not a big suprise. Here are in order: SD3 Medium, SD3 Large (through Glif), Flux Pro, Ideogram, and Dall-E 3:
SD3 and Flux are fails in my book since I see nothing in the map to suggest toppings
flux can actually handle some abstract shit pretty well
been hammering it with vague prompts
So how is Wednesday treating everyone?
Dalle is so incredible sometimes
Wednesday is a no man's land. Not as cool as Friday and not as soul crushing as Monday. Wednesday just is. It's kinda OK.
That is indeed colorful.
lunch anyone? Flux Dev
overall a pretty bad model, but why not it's wednesday
wow nice its transparent
seems to do ok for the most part, if the image includes a shadow it doesn't do so well
can't wait for Flux to have a negate
is this rembg
β the product operates necessarily invokes copies or protected elements of those works.β
You have to wonder just how much the judge understands
That's a quote from the judge not from one of the lawyers
ah its just advancing to discovery though
Of course, and it's interesting to see all of the tech sites claiming that stability is screwed in view of this.
I kinda feel like
with this Supreme Court
there is very little chance of them siding with artists over capital
Well this is not Supreme Court so let's not exaggerate.
yeah true but this stuff will get there
It might or it might not. It would take a long time though to get that far. The Supreme Court is not a litigation court at all. It is a purely Appeals court
And the Supreme Court isn't obliged to accept a case at all. If it judges that a case was properly evaluated and has no real appeals value it can reject to weigh in on it
it can reject to weigh in yeah I just don't think it will
I think there will at some point be a strict ruling against AI by one of the lower courts
and then it will bubble up to supreme court and get ruled unconstitutional
they'll just claim Originalism as the reason
The funny thing about the articles coming out about the refusal to dismiss is that the news sites act as if this were final judgement, and not simply: we will weigh in on it and not dismiss it out of hand.
yeah definitely
they write the article with the title that makes it sounds like a ruling has happened
I was kinda worried at first
and then its just nothing
exactly, and it is not a little odd. I can uderstand a site such as ... Hollywood Reporter, or even Fox News getting this wrong, but actual Tech sites who understand tech better or should?
they are not all equal
True, and some are more than little biased on the topic of AI in general
sometimes they get it wrong deliberately
I haven't checked what New York Times view on this is
but I suspect they are anti AI
or the enlish language?
I like how these models get R2 right but magle C3PO.
and The Verge once had an editor report about Nvidia's keynote and their ambition to help power AI development to the hilt, and his reply was... not a little hilarious. I kid you not, he wrote he said he had cold sweats as he thought of the end of the world, and all our energy consumption to do this wiping out the ecosystem and how could the Nvidia CEO not see this?
the flux/grok partnership has X saturated with images today, jeesh
Man good manners get you nowhere you need to assert yourself like R2. He so hardcore all his lines were beeped out.
i think artists shouldn't show their work in a public setting, someone may remember what it looks like and reproduce it or capture the essence of it
its relying on the word "invokes" a lot
but its not very clear what invokes mean
because famously these models cannot actually regurgitate their training data without an ablation
aside from a few small examples, such as a certain Sony image that got into SD 1.5 over 2,000 times
that sells more clicks/advertising
Maybe, but it was so absurd and over the top it literally left me gobsmacked for a good many seconds
This doesnt look very good though, from the article: "In a thread on Discord, the platform where Midjourney operates, chief executive David Holz posted the names of roughly 4,700 artists he said that its AI tool can replicate. This followed Stability chief executive Prem Akkaraju saying that the company downloaded from the internet troves of images and compressed it in a way that βrecreateβ any of those images."
well susceptible people are trained like this
every now and then I take a look at discussions by the other side of the AI art debate
and I see such crazy stuff
based on this logic, they should kill pop music and pop culture, at least 95% of it
The funny thing is this: let us suppose indeed it can reproduce ... Woody from Toy Story to perfection, ok? It is STILL not illegal! I cannot use that image for commerical reasons or othger, which is wher3e the copyright kicks in, but it is no more illegal than my hiring an actual flesh and blood artist to make a perfect picture using Woody
tresspass the eyes and ears
if it's all free and no money involved, you can do a lot of stuff
once money is involved, freedom looses much of its meaning
Money is dead.
I cannot use that artiss's picture of Woody to sell things, or promote anything, and so on, but I can absolutely hire an artist to make pics of Woody, Spiderman, and more all I want, and neither of us is breaking any laws
They're just keeping it alive coz everyone is used to it. But it died long ago.
and it won't matter if he had pictures of Woody and Spiderman, the real pics, right next to him to model
by the time youll make your first million itll be worth 300k
Darth maul 
If Darth Maul were the love child of Jessica Rabbit
Darth Maul should not have been on Solo
Maul ruined Solo
its the scene that labeled the movie as a dumbass action comic book movie
The day star wars was reduced the marvel level garbage
Jar Jar was the canary in the coal mine
i think that girl had a horrible part also i understand she is a good actress, just not in that movie, maybe they wrote the part crap
Portman
She can be great even, and has proven it more than once
Also the kid who playe dthe little boy
he got so fked he stopped acting
bullied to pieces
thats how much people hated the prequals kids
you think they wer elove dno
before disney the rpequels were torn to bits and pieces by everyone
now folks have their nostalgia googles on