#💬|general-chat
does anyone know how i can make realistic selfies?
guys see this
ConsiStory: Training-Free Consistent Text-to-Image Generation | NVIDIA Research
gotta love nvidia with all these research
No
do you mean to put your face in another body ?
sure i made my brother look like a prince with long hair.
i used this site, click on selfie
hey all. does anybody know if forgeui has their own discord?
Anyone know how to train 2 Loras of tv characters then have both of them in the same image
you're going to want to use SD3 for something like this, other models lack the conceptual understanding to even create that sort of image correctly.
But this isn't a new concept right? Or is there a better way to do this?
SD3 is the first of the stable diffusion models that has the ability to actually understand something like 2 different characters in a prompt. the other models will make multiple people but usually it'll just be the same person repeated.
Even more so when the characters aren't well known. Do u have any idea where I should start?
i would personally just start with prompting - see what the core model will do without stuff like loras
SD3 Medium from huggingface right?
sd3 2b medium from huggingface, yes
I'm probably confusing something, but I get this error:
changing setting sd_model_checkpoint to sd3_medium_incl_clips.safetensors [3bb7f21bc5]: AttributeError
Traceback (most recent call last):
File "C:\Users\aaaa\AppData\Roaming\StabilityMatrix\Packages\Stable Diffusion WebUI\modules\options.py", line 165, in set
option.onchange()
File "C:\Users\aaaa\AppData\Roaming\StabilityMatrix\Packages\Stable Diffusion WebUI\modules\call_queue.py", line 13, in f
res = func(*args, **kwargs)
File "C:\Users\aaaa\AppData\Roaming\StabilityMatrix\Packages\Stable Diffusion WebUI\modules\initialize_util.py", line 181, in <lambda>
shared.opts.onchange("sd_model_checkpoint", wrap_queued_call(lambda: sd_models.reload_model_weights()), call=False)
File "C:\Users\aaaa\AppData\Roaming\StabilityMatrix\Packages\Stable Diffusion WebUI\modules\sd_models.py", line 860, in reload_model_weights
sd_model = reuse_model_from_already_loaded(sd_model, checkpoint_info, timer)
File "C:\Users\aaaa\AppData\Roaming\StabilityMatrix\Packages\Stable Diffusion WebUI\modules\sd_models.py", line 793, in reuse_model_from_already_loaded
send_model_to_cpu(sd_model)
File "C:\Users\aaaa\AppData\Roaming\StabilityMatrix\Packages\Stable Diffusion WebUI\modules\sd_models.py", line 662, in send_model_to_cpu
if m.lowvram:
AttributeError: 'NoneType' object has no attribute 'lowvram'
you can't just load the model. sd3 doesn't use unet. you need to download not just the models but also everything else, and then open the SD3 comfy workflow
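A note on reading the traceback above: the last frame fails because the object passed to `send_model_to_cpu` is `None`, which suggests the checkpoint never loaded into a usable model. Below is a minimal sketch of that failure mode. The function is a hypothetical stand-in that mirrors only the failing line `if m.lowvram:`; it is not the WebUI's actual code, and `DummyModel` and `describe_failure` are names invented for illustration.

```python
def send_model_to_cpu(m):
    """Hypothetical stand-in mirroring the failing line `if m.lowvram:`."""
    if m.lowvram:
        return "lowvram path"
    return "moved to cpu"


def describe_failure(model):
    """Return the result, or the error text if the model object is unusable."""
    try:
        return send_model_to_cpu(model)
    except AttributeError as exc:
        return str(exc)


class DummyModel:
    """Invented placeholder for a successfully loaded model."""
    lowvram = False


print(describe_failure(DummyModel()))  # a loaded model works
print(describe_failure(None))          # a failed load leaves None behind
```

So the `AttributeError` is a symptom of the failed load, not the root cause.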
Is it Comfyui only?
auto1111 has a little support for it, mostly it's comfy
but you still can't just use it as an interchangable model with say, sdxl setup
it doesn't use U-net
I'll look up a tutorial
Basically it doesn't understand who the character is and justs generates a random person
okay. then i would probably make a lora that was half images of the first person, half images of the second person, with very specific labels
It wouldn't just combine the two?
regional prompter extension in a1111 i know for sure allows loading different loras into different regions. The regional prompting nodes in comfy surely will allow this too. swarm has regional prompting built in and could potentially allow loras per region. On the extension in a1, you have to set it to "latent" mode which is slower and more memory, but works
Thank you!, I'll look into this tomorrow
newest version has full support now. but it's grossly inefficient and comfy is still better for running it with t5
auto's is fine if you're just using the clip layers though
bro is trying to get custom char loras and you guide him to an untrainable base model
@rain crest your best bet is one of the realistic xl finetunes
if it's untrainable, how are there so many trained objects on civit for it, and how did @pale latch make a lora?
they're all garbage
everyone has an opinion, you're welcome to yours, however i wouldn't say that to @pale latch face
also, you might learn to read so you know what the guy was actually asking about to start with, unless that's too much to ask of your pea sized brain
i don't want to particularly insult anyone's training ability, it's not their fault the model is bad
no, but you did insult everyone - everyone that's been training with it and everyone that uses it. good job. now, how about you go find something productive to do with your time?
this phrases belies the level you are at, you are literally not qualified to type to me
with all due respect
reported for harassment
what more worthwhile use of time than to teach others
hello everyone I'm trying to know which model and setting are the most optimized for snfw or carefully generating porn on stable diffusion
sd3 works really well in many ways still. While there's not much for the gen pop yet, enthusiasts are making good fun out of it.
The tools to use it easier will iterate. The skill floor is slowly lowering.
Theres even naked girl models out for it now
oh so which cards need no half vae exactly?
GTX cards for example. But it also depends on the VAE file and model used.
Some 1.5 anime VAE and the sdxl VAE needs it too
When using kl-f8-anime-v2 as 1.5 VAE and sdxl fp16 VAE for sdxl, --no-half-vae isnt needed
Is the creative upscaler available yet? I know it’s in nightcafe but I don’t know how to use/access it in automatic1111, I am new to this though so I may be missing something
not sure how what angus has created for nightcafe has anything to do with auto1111
who is angus?
nightcafe says they're using "the new creative upscaler from stability Ai" im just wondering how i can access it running stable diffusion on my pc?
the guy that owns Nightcafe and the lead programmer - and you'll need to ask him what he's specifically calling that
hey guys, kinda new to this and was hoping someone could tell me what an "activation token" is
ah ok. i mean on the website they say they're literally using "the new creative upscaler from stability Ai"
...is that not whats happening maybe?
and - again - he's calling something that, no idea what, there's a number of things he 'might' have decided to label as that. there's nothing from stability specifically labeled that. you'll have to ask him specifically what he's calling that.
right, thats what i was clarifying, i'll see if i can find a way to contact him
thanks for the info
log into the nightcafe discord and ping him
the terminology gets confusing cos not everyone follows this rule
but generally "creative upscale" means diffusion like Magnific
and "upscale" or "conservative upscale" means something like ESRGAN or the transformers (DAT/HAT etc) which are almost all based at least somewhat on the ideas of the SwinIR paper
and none of that is something that stability has recently released, but angus is really good at making stuff up
no telling what he's doing. he might even be calling SD3 the 'creative upscaler'
there was a random reddit comment or two
where Emad mentioned an internal upscaling GAN at Stability
but very sadly it has not been released
ok that mostly makes sense, do you have any idea how i could emulate any of it in stable?
dang haha, is it possible its just stability stuff that hasnt been released to the public?
my main recommendation is to do 3 stages of resolution boost
1. generate with deepshrink/hidiffusion/resadapter
2. small upscale with ESRGAN or transformer
3. diffusion model like supir to finish
if you want to swap out any stage that's fine
e.g. magnific or others
nope. angus uses the same APIs everyone else does. though he is part of the commercial beta testers
but the key point is
don't try to get like 16x out of a single upscaler
that's when things start to go a bit funny
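A rough way to think about "don't get 16x out of a single upscaler" is to split the total factor into several smaller stages. The sketch below is only arithmetic: `plan_stages` is a hypothetical helper, and the 4x per-stage ceiling is an assumption picked for illustration, not a limit stated by anyone here.

```python
import math


def plan_stages(total_scale, max_per_stage=4.0):
    """Split a total upscale factor into equal stages, none above max_per_stage.

    max_per_stage=4.0 is an illustrative assumption, not a hard rule.
    """
    if total_scale <= 1:
        return []
    # Smallest number of stages so that each stage stays at or under the cap.
    raw = math.log(total_scale) / math.log(max_per_stage)
    n = max(1, math.ceil(raw - 1e-9))
    per_stage = total_scale ** (1.0 / n)
    return [per_stage] * n


stages = plan_stages(16, max_per_stage=4)
print(stages, "product =", math.prod(stages))
```

Each entry could then map to a different tool (for example an ESRGAN or transformer pass followed by a diffusion pass), as in the three-stage recommendation.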
are you using comfy or auto1111?
thank you!
i see haha.
i am using auto1111!
I personally like the transformers best
but 99% of the community seems to prefer Magnific or tiled diffusion upscale
so that is definitely also an option
okay, first thing to do is get some examples of how it works together and then post in #🤝|tech-support and see if anyone that uses auto1111 can tell you how to get the effect
okay, ill give this a shot, thank you!
luckily auto1111 accepts transformers
you can put this in the upscaling folder:
https://openmodeldb.info/models/4x-RealWebPhoto-v3-atd
if I remember rightly auto1111 also comes with Remacri and Siax
which are some nice ESRGAN models, those are still ok
you the goat 🙏 thanks a bunch m8
no problem
I actually got into upscaling before stable diffusion lol
weird order
in case you come back and find that transformer way too slow, here is one of the fastest ones:
https://openmodeldb.info/models/2x-NomosUni-span-multijpg-ldl
in my experience people find my default methods too slow
the annoyed feeling you get when you realize you just saved frames for your animation instead of saving an mp4
so the previously mentioned stuff is slower but maybe more effective?
oh yeah drastically slower
ultracompact: 153.56 fps (0.0065 seconds)
compact: 82.92 fps (0.0121 seconds)
span: 60.46 fps (0.0165 seconds)
realcugan: 34.58 fps (0.0289 seconds)
esrgan_lite: 6.54 fps (0.1530 seconds)
omnisr: 4.89 fps (0.2043 seconds)
plksr: 2.65 fps (0.3770 seconds)
realplksr: 2.21 fps (0.4522 seconds)
esrgan: 1.97 fps (0.5083 seconds)
swinir_s: 1.07 fps (0.9375 seconds)
atd_light: 1.05 fps (0.9536 seconds)
srformer_light: 1.05 fps (0.9545 seconds)
swinir_m: 0.69 fps (1.4509 seconds)
hat_s: 0.44 fps (2.2763 seconds)
swinir_l: 0.39 fps (2.5610 seconds)
srformer: 0.27 fps (3.6405 seconds)
atd: 0.27 fps (3.7223 seconds)
dat_2: 0.27 fps (3.7284 seconds)
hat_m: 0.23 fps (4.3972 seconds)
hat_l: 0.23 fps (4.4004 seconds)
look at that speed difference lol
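To put a number on that gap, here is a small sketch using a handful of the per-frame timings copied from the list above. `fps` and `slowdown` are helper names made up for this example; fps values recomputed from the rounded seconds can differ slightly from the listed ones.

```python
# Seconds per frame, copied from a few rows of the benchmark list.
timings = {
    "ultracompact": 0.0065,
    "compact": 0.0121,
    "span": 0.0165,
    "esrgan": 0.5083,
    "atd": 3.7223,
    "hat_l": 4.4004,
}


def fps(seconds_per_frame):
    """Frames per second is the reciprocal of seconds per frame."""
    return 1.0 / seconds_per_frame


def slowdown(name, baseline="ultracompact"):
    """How many times slower `name` is than the baseline architecture."""
    return timings[name] / timings[baseline]


for name, seconds in timings.items():
    print(f"{name:>12}: {fps(seconds):7.2f} fps, {slowdown(name):6.1f}x vs ultracompact")
```

By these numbers the largest transformer in the list is several hundred times slower per frame than the fastest compact model.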
oh damn lol, ok i feel you
but this is nothing compared to diffusion
you could set up a diffusion upscaler with a slow sampler e.g. implicit midpoint or 4th order implicit adams
with tight tolerances and small tiles
and it could take days
oh im using a pony with a built in vae, I wonder if it's that one. I've had it off for the past week and have been going fine i believe
hI! Who knows why every time i Render anything on any model - on the last step at 95% The rendered image goes Red Af?
VAE
hey! i thought i could google the stuff you mentioned but it turns out i have no idea what im doing lol, where do i put the transformer? i cant find the "upscale" folder anywhere
any good adetailers models to use?
Holy F*nk Comfy barely uses any recources at all
That thing is a NIGHT and day difference
yup 🙂 oh, also https://comfyworkflows.com/ something for you to explore
Everyone is happy when they give into the noodles.
The only question you will ask after you adopt ComfyUI is "why didn't I do this earlier?"
OOOOH THIS LOOKS SUPER FUNNNN! Thank you, will 100% take a look!
Oh gosh for reaaaalll!
This is honestly the good stuff. Wish I started here! \XD
welcome home?
guys whats the prompt to generate images here?? and from which channel?
read the information here #artisan-faq
Thanks dude!!
😆
Hi, trying to download some stuff from stable diffusion off huggingface. I made an account but it asks me to create an organization for certain repos.
So has everyone who has downloaded the latest models needed to be part of an organization?
Shouldnt be a requirement afaik
Ok. I'm trying to use GIT after downloading lfs. Since I'm new I'm not sure if I have to use a certain version of GIT or something. When I type in the clone command there, it gives me this:
Cloning into 'stable-diffusion-3-medium'...
remote: Access to model stabilityai/stable-diffusion-3-medium is restricted and you are not in the authorized list. Visit https://huggingface.co/stabilityai/stable-diffusion-3-medium to ask for access.
fatal: unable to access 'https://huggingface.co/stabilityai/stable-diffusion-3-medium/': The requested URL returned error: 403
Open the page in the browser and ask for access 
It says that having an organization or affiliation is required.
Just type in "none"?
Hi guys. If I change the name of a saved lora in Auto1111 will it break anything?
It shouldnt, but old prompts that use it might not work if you plan to reuse them
Ah, makes sense. Perfect, ty
That fixed it. Thank you.
I've forgotten how to prompt conditions or write
(blonde or brunette or redhead)
there was some syntax
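On the "or" syntax: as far as I know, A1111's built-in `[blonde|brunette|redhead]` alternates between the options on each sampling step, while the Dynamic Prompts extension picks one option per image from `{blonde|brunette|redhead}`. Treat both as things to check against their docs. The sketch below only emulates the pick-one behaviour outside any UI; `expand_variants` is a hypothetical helper, not part of either tool.

```python
import random
import re

# Matches one innermost {a|b|c} group.
VARIANT = re.compile(r"\{([^{}]+)\}")


def expand_variants(prompt, rng=None):
    """Replace every {a|b|c} group with one randomly chosen option."""
    rng = rng or random.Random()

    def pick(match):
        options = [opt.strip() for opt in match.group(1).split("|")]
        return rng.choice(options)

    return VARIANT.sub(pick, prompt)


template = "portrait of a {blonde|brunette|redhead} woman"
print(expand_variants(template, random.Random(0)))
```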
Ok, sorry, that helped me with the first repository. I want to download two more and when I type the clone command for them into GIT I don't get anything. It just doesn't say anything.
Why are you cloning them in the first place btw?
Ah I thought that's how you download lol. Is there another way to do it?
Lmao I legit thought that was the only way
https://huggingface.co/stabilityai/stable-diffusion-3-medium/tree/main
There is a files and versions tab
Then you just download the model you need
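For anyone who would rather script the single-file download than click through the page, here is a hedged sketch. It assumes the `huggingface_hub` package is installed, that access to the gated repo has already been approved, and that `sd3_medium_incl_clips.safetensors` (the name from the error log earlier) is the file wanted from that repo. `resolve_url` and `download_file` are names made up for this example.

```python
def resolve_url(repo_id, filename, revision="main"):
    """Build the direct download link that the Files tab points at."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"


def download_file(repo_id, filename, token=None):
    """Fetch one file into the local Hugging Face cache and return its path.

    Imported lazily so the rest of the sketch runs without the package.
    Gated repos need a token from an account that has been granted access.
    """
    from huggingface_hub import hf_hub_download  # pip install huggingface_hub

    return hf_hub_download(repo_id=repo_id, filename=filename, token=token)


REPO = "stabilityai/stable-diffusion-3-medium"
FILENAME = "sd3_medium_incl_clips.safetensors"  # assumed to live in this repo

print(resolve_url(REPO, FILENAME))
# download_file(REPO, FILENAME, token="hf_...")  # placeholder token, fill in your own
```

Either route grabs one checkpoint instead of cloning the whole repository.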
Thanks again
I just trolled myself for more than an hour over that lmao
Was telling myself "There's gotta be a way to just download all the necessary files at once" didn't even bother trying to read the readme file until I could figure that out smh
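Draw a photo of the interior of the Qujialing archaeological site; the artifacts inside the site should reflect the Lichun (Beginning of Spring) solar term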
Hello! What do I need to do in order to use Artisan?
Do you need the source repository to use a model? I am trying to download the 4d video one and the source for that is a StabilityAI>generative models github, link here: https://github.com/Stability-AI/generative-models
On that page, at the bottom, it gives installation instructions. Long story short, it wouldn't download because I need to integrate python into git. Do I need to learn python just to download the source repository so that I can load the model?
if you wanna play around in this field, you should probably have some cursory knowledge of python. Not enough to code in it, but you should have the basic familiarity with python environments and execution
you can avoid learning it in some cases, like many UIs hide the python and you dont have to learn. But at the bleeding edge, like fresh models that haven't been integrated into any UI's yet, you'll need the tools
Like to use stable diffusion at all or to download the source repository?
I know it's probably pretty basic to create a directory to have a virtual server, I just would rather get into the video creation. I'm barely downloading my first stable diffusion models rn.
Hmm. I was planning on using comfyui. I don't know if I can use that for the 4d model though?
not a virtual server. virtual environment. hmm, the distinction seems obvious to me, but i am thinking you don't know what a system environment refers to.
Ah. Yeah I have no clue, this would be my first experience with coding. I tried an html course once in middle school iirc through khan academy, but I don't think either of us would count that LOL.
rough explanation and don't treat this like gospel. ok so your system has a global environment. that's your OS where all the programs run. your browser is a virtual environment. Javascript and junk can run in your browser, and it won't do anything to your system. other programs won't conflict with every other webpage you visit. Lots of javascript code going on on every website, but they're all contained to a virtual environment.
Python is similar. You create a virtual environment for a python project so that it doesn't affect the global system or any other python projects you have. contained in its own environment
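To make the virtual environment idea concrete, here is a small sketch using only Python's standard library. The folder name `sd-env` and the helper names are arbitrary choices for illustration; most projects create the environment from the command line instead, but the result is the same kind of self-contained folder.

```python
import sys
import tempfile
import venv
from pathlib import Path


def in_virtualenv():
    """True when the running interpreter belongs to a virtual environment."""
    return sys.prefix != sys.base_prefix


def create_env(target):
    """Create a bare virtual environment and return the path to its config file."""
    target = Path(target)
    venv.create(target, with_pip=False)  # with_pip=False keeps the demo fast
    return target / "pyvenv.cfg"


with tempfile.TemporaryDirectory() as tmp:
    cfg = create_env(Path(tmp) / "sd-env")
    print("environment created:", cfg.exists())

print("running inside a venv right now:", in_virtualenv())
```

Packages installed into that folder stay there, which is why deleting the folder is enough to undo everything.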
i count it. it's experience. don't throw it out.
virtual in IT usually means "exists only in software" . virtual vs physical.
Hmm. Ok. I'll try to learn how to do it eventually if I need to I guess. It's not like it's actually developing something with it, I'm hoping it won't get too complicated? My main question is just if I need to to use the model I'm looking at. Do I need the source repository to use the model? Because up until a few hours ago I'd never even heard the word repository. If I need to, I can try to learn what I need tomorrow or something, I still need to learn the image modeling so I can throw that into the 3d render that I'll have to learn to eventually put that in the one we're talking about.
All this just to make a custom dynamic wallpaper lol.
At first it'll seem complicated , but then over time, itll feel like it's a part of you
I'm not sure there are many tools for SV4D yet. All of this stuff is pretty bleeding edge still.
When new tech hits the market and you see the Apple Vision in the stores or a new PS5, that's the cutting edge. It's a term that migrated over from manufacturing factories. Imagine you print a big sign. So you leave whats called a "bleed". Your sign's image file should be bigger than the final planned printed sign. So when you cut it, the print goes right out to the cutting edge and looks super duper clean. The bleed is the colors that print onto the paper outside of the cutting edge. Those are often messy and will have a less crisp edge for many technical reasons.
That's where this stuff is right now. It's out on the bleeding edge. Hasn't been tidied up to be a final consumer product yet. This is the domain of tooling and experience. Welcome to the club.
also i love saying bleeding edge because it sounds badass and like, i'm all bloody from battle
so create an organization
I wasn't sure if putting down a false or unregistered logo would have gotten me in trouble. Someone figured it out for me though, even though it says required you can put none.
Hi all. Am I able to create "groups" when prompting? Say a car, a tree, and a person - can I organize the prompt to include all things about the car, then the tree, etc? For visual organization I guess.
guys
will the rtx 4070 12gb run sdxl faster than 4060 ti 16gb ?
More cuda VS 4gb vram more 
my guess probably faster at very basic levels till you add like a Cnet or IPAdapter then the ram would be more useful
4070 would probably be barely faster. not enough to justify trading 4gb.
Good afternoon! How are we all doing?
when 800m sd3??
if i upload a image can someone tell me what is the model or style?
We can try 😁
i cant upload image
is sd3 based on sdxl or its new version of sd 1.5 2.0 ?
You can on #🏞|general-with-images
this is insane
looks like i need to save for 24gb gpu
i was going for 12-16gb XD
5090 waiting room
you really want 32 if you can swing it
Hopefully nvidia will release a 5060 ti sd edition with 32gb of vram
I am really surprised they haven't done it. but we all know why.
Hey there, is there any channel specific for stability API usage?
you know everyone who plays vrchat, cvr, neosvr, second life, former playstation home etc is thinking: Stable Fast 3D Launch?! I'm gonna generate blender-compatible models of my waifu from 2d images and upload them!
sand
flux can run on 12gb of vram
they modified the "load diffusion" node in comfyui to allow it to run as fp8 weight type
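Some back-of-the-envelope arithmetic on why the fp8 option matters. The sketch assumes Flux has roughly 12 billion parameters (a figure not given in this chat) and counts only the diffusion model's weights; text encoders, the VAE, and activations add more on top. `weight_gib` is a helper name invented here.

```python
GIB = 1024 ** 3
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "fp8": 1}


def weight_gib(num_params, dtype):
    """Approximate memory for the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / GIB


FLUX_PARAMS = 12e9  # assumption: about 12B parameters

for dtype in ("fp16", "fp8"):
    print(f"{dtype}: about {weight_gib(FLUX_PARAMS, dtype):.2f} GiB of weights")
```

Halving the bytes per weight halves the footprint, which is roughly the difference between not fitting on a 12GB card and just about fitting with offloading.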
vae and checkpoints, you can rename them to .safetensor (only the extension)
flux really worth checking out?
ty btw
It depends on what gpu you have, In my opinion is one of the best models
but its extremely slow on my PC
got a 4090
Damn then yeah you gotta download it
I´ve seen examples of it on par with dalle3 or even better
yes, and they also provide a 4 step version that is faster than sd3 or sdxl. I would recommend using 8-10 step on the 4 step one since that provides similar quality to the flux-dev. Even with 10 step, I believe it should be faster than sdxl or sd3 (i think).
idk i'm kind of burned out of all hype
too many crazy releases in AI lately, it turned into a constant flux of releases
one thing i love about flux are the fingers and hands, they are usually very good. humans overall are good too.
thats an understatement. 😄
the license allows it. and distribution is allowed
this was no hype. they never said anything before today. black forest just showed up and gave us new base weights like how beyonce puts out new albums
yeah thats amazing about these models they come out of nowhere
yeah but lots of DiT models have been coming out
flux is like edogram at home
whos wutang
just some future hw for you. no big deal
of course the side effect of this progress is that the pc i got 6 months ago now is barely keeping up
:))))
truth
if anything much beyond flux comes i won't be able to run it
how long do gens take?
around 90 secs rtx 3060 12 gb vram 32 ram
hi
I'm new to this AI thing with stable diffusion, my PC isn't one that has the most advanced resources in the world but I'd like to know if there's a way to run stable diffusion on google colab without requiring an nvidia card, since I was only able to use the stable diffusion interface in the public URL once, but when I want to start it again it asks me for an nvidia video card, which I don't understand because the first time I used it it didn't ask me for anything. Could you help me, if it's not too much trouble of course?
Just seeing the flux news. This is great! Does anyone know if it can be run through automatic1111?
Not yet
I downloaded the models but I'm not familiar with the format which is .sft
Only on comfy. noodle flex moment.
Ah ok thanks. I don't like comfy. Using comfy is like eating a bowl of spaghetti. It's just strings everywhere. I'll wait until it works in automatic.
with a1111's issues with vram efficiency you'll prolly need an a100 when it does lol
just hide the strings
Isn't there also a node that can make them straight lines too?
there's a setting for that, sure
click the little gear icon on the Queue Prompt toolbar, then scroll down and find where it says "Link Render Mode". by default it's spline. change it to what you want those links to look like. pick hidden if you don't think you need to see how things connect
Hi, I'm wondering what AMD/Intel GPU users use for cloud GPU/SD solutions. Rundiffusion, Thinkdiffusion, Paperspace, RunPod, Vast, Salad...?
There are so many platforms and so many GPUs offered that it makes my head spin.
have used Rundiffusion RunPod Vast
How did you like/dislike each?
runpod and vast are same
rundiffusion is doing something a bit different, you pay a lot more but the server is set up for you and is more reliable
and now stability comes out with the 3D stuff.
so this 3D stuff is interesting as it shows models are starting to understand things in 3D so thats the key to consistency later on
This depends
Schnell, yes. Commercial or non-commercial. Schnell's got the Apache license.
Dev? No. Non-commercial distribution of fine-tuned or trained model derivatives based on the dev model is only allowed.
anything happening with SD3?
Black Forest just destroyed Stability, Flux is incredible
I found a platform, 1 image generation costs 1 buzz, and you get 300 buzzes per day. This is great, and it seems cheaper than renting a server
hi
300 images per day is actually an incredibly small amount
it might sound like 300 is a lot
but $0.20 per hour on Vast.ai can get you 24GB, or sometimes even 40GB servers
which can put out hundreds of images in that hour
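The comparison can be made concrete with a little arithmetic. The $0.20 per hour and the 300 free images per day come from the messages above; the 300 images per hour throughput is an assumption standing in for "hundreds", and real throughput depends on the GPU, model, and settings.

```python
def cost_per_image(hourly_rate_usd, images_per_hour):
    """Rental cost per generated image at a given throughput."""
    if images_per_hour <= 0:
        raise ValueError("images_per_hour must be positive")
    return hourly_rate_usd / images_per_hour


def hours_to_match(daily_free_images, images_per_hour):
    """Rented hours needed to produce the same count as the daily free quota."""
    return daily_free_images / images_per_hour


RATE = 0.20        # USD per hour, from the message above
THROUGHPUT = 300   # images per hour, assumed for illustration

print(f"cost per image: ${cost_per_image(RATE, THROUGHPUT):.5f}")
print(f"hours to match 300 free images: {hours_to_match(300, THROUGHPUT):.1f}")
```

Under that assumption, the whole daily free quota is about one rented hour.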
hello
Currently testing it and it responds nicely to prompts. Quality is also great. I just have to figure out how to add negative prompts.
oh, ok. Pity though, I could have used some negatives in my prompting
I got something similar, static noise haha
"Our model is so good you don't need negative prompts."
Yeah, I'm sure it's very very capable with all sorts of images. With negative prompt I could exclude something, negatives are not just for fixing things.
yes
I think the same that's why I wanted some negative prompt ability...
I am 97% sure it will come tho...
Agreed, most likely will come soon
flux run on SD?
Have you been comparing it side by side?
Most distribution of models is non commercial. Everything on hugging face and civit.
What did you even correct?
Guys, I need some help with the Text Encoder learning rate... If I have my "Learning Rate" and "Unet Learning Rate" set to 0.00005, what should I set the Text Encoder to? I don't fully understand the relationship between these settings, what the Text Encoder does and how they affect each other, so I just need some general guideline
did u see sd3.1?
also flux is huge so of course it would beat sd3 (which im not claiming is any good in the first place), and people saying its prompt following is not as good as something like aura flow
Flux still has errors, for example missing limbs or body horror when women try to lay on the grass
What I was going to add to that before I imploded
Was if you actually look at the licensing for the flux dev model
it very clearly states the outputs of the model are usable for commercial purposes
1/10 times instead of 9/10 times
It's better in most cases
a buzz? does your platform have a bee theme?
is there a colab notebook for a finetune of sdxl working? kohya crashes when starting the epochs on the t4 gpu free. loras work but I think the sdxl model finetune is not working
also does anyone know if its possible to "--medvram" for kohyaa notebooks in colab, or configure the gpu so it uses system ram fall back instead of oom?
where is it ?
ill get it one sec
i saw this, but where is the anatomy displayed ? Nowhere
beats me
it was the reason why they are doing a 3.1
which tells me it is not ready yet
they probably didnt fix it yet.. if they did fix anatomy they would be showing it off
it is the biggest issue because it doesnt matter so much if the image quality isnt perfect that can be fixed with some finetunes
but the base model has to be good at generalising things like anatomy
datavoid fixed a lot of it in one epoch. Why don't they get him on the team to help
lol no clue probably something to do with their safety team
Yeah they said two weeks, when was that...?
more than 2 weeks ago

look at the data of the new licence, then you would know
i'm guessing datavoid will release his sd3 finetune before them and it would be bonkers if it's better
Wut? Is another party now trying to finetune sd3, not waiting X amount of weeks??
Seems impatient
Are the usual request finetuned, like boobs and anatomy?
a single person that got good results overnight
makes me wonder wtf is stability doing
Lol
Im yet to find an easy way to make sd3 loras, not even thinking about finetuning the model itself.
Civitai seems to accept sd3, but no online lora training
Just achieved full rank finetuning of Stable Diffusion 3 Medium 2B on a SINGLE consumer GPU!
Only months ago, this was thought possible only on 80GB VRAM cards.
He said so in his tweet. He is not using a lot of resources for this since he trains a lot of models
isnt that the same group that said they will release ella for sdxl a while ago, have they?
yeah but it was hard to do, they trained other things since then.
From a week ago. He should give up and fine-tune the true mmdit model
I tried Flux and indeed it is awesome!
he is just one person who is an electrician. Not even a ML engineer
If so you will never know the potential of SD3 lol
2b is garbage next to flux and not just due to parameter counts
... And not just its hardware requirement 
i still hope sd3 gets fixed though. Flux is way too large
Flux is the potential. Original team that did the original research
It is like, super great. But remember that Flux-dev is distilled version
The real Flux is even larger from what I can see
So is sd3. Dpo safety training was a distillation process
same number of parameters, same vram size, just faster generation
pro version prob needs around 32gb vram or something lol
no pro is the same size
ah
just longer inference time
Are you talking about the schnell version
It got Pro, dev and schnell
Dev isn't a turbo distillation
both are distilled. Schnell is more distilled
They're different distillation process
dev is guidance distillation, schnell is guidance and step distillation
Not more than one another
i tried both schnell and dev, and i personally like dev more, schnell seems to produce a certain aesthetic or kinda bit burned images for my taste, idk.. maybe il try it again
If size does not matter, just throw away ML. Just put all existing images into 1 database, then do some interpolation 😅
Only need 4 steps for Schnell or it looks burned
tried schnell on hugging face space but it's very different from my comfyui one. Maybe because i'm using the fp8 version and not the fp16
That will just caused a catastrophic amount of storage taken lol
I didn't say size didn't matter. I said it's not the single reason why flux is the true implementation of mmdit
Plus it is better to just improve its training method
Rather than revamping the whole dataset
flux finetunes 😮
I just wish they would make miniFlux for 12 GB vram
Sd3 is what happens when inexperienced executives make decisions that force the researchers to handicap their work
i tried flux on my 12gb vram laptop, it works
Huh, how?
default comfy workflow, make sure you run in fp8 for both unet and t5xxl and it should work
mine works on 6gb vram hurray
Say what you want, you can simply use your other 100% open source model in your disk happily
Better situation compared to rocket launch and ignoring engineering advise
14 researchers left stability to build a proper mmdit model and effectively utilize their research. There is a ton of subtext in this event.

sd3 8b is good no ?
but that minute is worth it 🙂
im personally waiting for 5090 to drop so i can finally upgrade and be free lol
i mean upgrade my whole pc
Can you tell where those researchers went to
8b is good. 2b is good too. just not compared to flux.
Black Forest team built the foundation of sd3 too. They would've pretrained 8b
Founded black Forest labs
imagine flux 2 😮
@pale latch do you know of, or know any documentation of, what sort of videos specifically that SVD was trained on?
have you compared 8b with flux with the same prompts ?
They're being sued by shutter stock because of the watermarks so probably scrapped data
woman lying on grass? lol
yes
more specific - like how much of the dataset was vehicles, how much was animals moving, etc
Yeah I will just take it with a grain of salt
does that work well on sd3 8b ?
Or 2B API?
I did hand stand and composition tests on both on different days.
i hope they release svd2 or something
agreed, and sometime soon would be nice
Haven't directly tested but 8b isn't better. There are a few fundamentals that are extraordinary
No more pixelation blowouts past 1024 resolutions is big
Confirmed by Emad plus all these guy's names on published papers
8b should be enough for anatomy right ? And i don't think size has to do with anatomy since pixart sigma does better anatomy than sd3 and it's only 0.6B
flux is very good at generating what you want first try, compared to so many other models, but of course it's not perfect, but that is ok, cause as a base model, that is crazy good as is
than the 2b model anyways
it is extremely good, way better than midjourney
Flux is all the research that built sd3, and more, and no corporate meddling in the training process
Generic NovelAI art, yeah
To nowadays AI porn infestation
Black Forest labs also has 33m in funding while stability has 3
wait i didnt even check, does flux have a research paper or technical paper? i want to read
It's coming. They showed up and gave us weights with no hype
The blog post announcing it has some info
i wonder how long until controlnets, loras, finetunes 😮
Pretty sure it is something coming soon
But I dont think it is any different from other MMDiT
They're the guys who invented mmdit
Also pretty certain that they gonna training a what, v0.2? If they have the power to do it
that was bonkers. I like that style very much. Like "here friend, take SOTA model, bye". Thank you ?
Or just went on doing optimization work
Beyonce style
Auraflow 0.2 is out already. They is flux.1
i tried also i2i with flux, does it very nice too
I mean Flux ofc
auraflow is not that good. The dev said he had made so many mistakes. And it does not have 16ch vae, which makes it unhypable
Auraflow's aesthetics are shit as fuck as of now
They're making a sota video model next
Currently it is still about prompt coherency or adherence
wait, what is the flux vae? is it 16ch?
Wait
Must be
yep
😮
There's been no attention on Flux's VAE at all, I realized
i mean the results do look good
it's gonna be the thing to beat
100% is 16 ch, it renders far away faces and text with high accuracy. THat's not sdxl vae. idk if it's 16ch but it's for sure a better vae than sdxl
I'm gonna coom again brb
i hope pixart devs don't disappoint, they said they are aiming for something this year not the next
but come on Nvidia... release 5090 already... i cri
yea i heard the next pixart model is coming
it's always nice to have competition
and SAI has yet to fix 2B version and release 8B
i like what they have done with sigma. Such a small model, but it is so good. They aim for efficiency so their next model should aim for 8gb vram cards
yea
Guess it takes time for them to reorganize the entire management and training
or amd, be cuda compatible with more vram. If only they were smart
You know, after all the mess that was left
Stability is still radio silent as far as I'm concerned. Not as cool of a stealth mode from what bfl did
they only need one competent guy on it.
Obviously it's for good or bad we dont know a thing for sure
yeah Lykon has not come here at all lol
We know they've been upstaged and have still said nada
Probably got hit by new management after his controversial guidance lol
maybe, what controversial guidance ?
When someone complained about SD3, he simply said "it's your skill issue, you should have learned how to prompt properly first before anything"
btw pixart discord was very active. It died as soon as flux came out
And it got blown off
Still can't help but laugh at how a community of PC users with gaming hardware acted so fragile about "skill issue" comments
yeah wtf. Now we can say, look at flux. Even shitty prompts work very well with 1 less clip lol
i mean it's like a ferrari being released next to a toyota 
He also has multiple... opinions with AstraliteHeart and his training method
Ferraris can only be bought by a private club of people
But anyway he just got muted from ever talking about it.
trying to shut up the mob with lies. When has that worked lol
i personally dont have anything against lykon, to me, i still look at him as one of the top dudes for finetunes, so im happy with that, i can forgive some drama or whatever
Astralight is a bit of a sociopath imo
yeah i think he was just playing the role of the employee and could not talk bad about his company
RunDiffusion (creator of Juggernaut) unsurprisingly stands for neutrality
Juggernaut Flux 😮
I'm gonna head off and go back to sleep
flux does not seem to need a finetune, only maybe one for further aesthetics to completely shit on midjourney's neck
and for some boobas
Ponyxl has shota and loli content dpo trained into it. So Yeh. He pretends to be about unity but also hard refined CSAM into his "base" model. It's unfortunate to see such success in his efforts
it's kinda funny really, sd 1.5, you generally need like tons of negatives for it to give you something nice, sdxl way less so, like very small amount (at least personally), and then flux you really dont need negatives at all LOL
then again, it still would be nice to have negatives i guess, just to have it, idk...
not necessary when the model is good at prompt following
i dont really follow the pony stuff (not my thing), but what is the current status, is the creator working on the next version?
but sure it would have been nice if you only gen nude but don't want any. has happened sometimes with schnell. That model needs negatives
so he says but not an actual update with which base model
isnt schnell the german word for fast?
6.9 of course it's that version... haha
yes
He's a business operator after all. Makes money by selling sex to children and grooming them. So disappointing
i had no idea about that part
Pony is embarrassing to acknowledge is such a large part of the community
It's a cartoon porn model. Who do you think it's primary audience is
Rule 34 joke grew out of trends like furniture porn . Then it just became groomer art
is there evidence of him grooming?, or are you talking without anything to backup your claims
Releasing a model that is primarily purposed to depict child pop culture in hardcore penetrative porn qualifies as grooming in my book
"child pop culture" define that, what counts as child pop culture, if you say anime, then your plain ignorant
You may disagree but I view you in a whole new light now
what did i say?, i asked you to define child pop culture, i am from iraq i dont understand western culture much
Sure
can you define it then?
if you mean by child pop culture, children cartoons like my little pony, if so i guess i understand your argument there
Because lolicon anime didn't qualify. /s
jesus flux is so slow even on a 4090
Hi all
ok, im gonna try flux schnell on comfyui using 12gb vram and 16 gb ram ... first need to make some room on my harddisk
what speed you getting?
dev model with fp8 clip takes like 8-14s/it
using --highvram loading the full thing
in lowvram mode on the fast model it was like 1-1.3 it/s
only got 32 gig ram so i dont think i can go full model
1.4 it/s on dev model with lowvram option and fp8 clip but its laggy asf
on my laptop, 12gb vram 32gb ram, im not using any of the vram arguments, just default, and fp8 unet + t5xxl,
i didnt check the it/s, but for schnell 4 steps, it took like 20 seconds i think, and dev 20 steps was around minute
or something like that, forgot exact numbers
i'll do more tests tonight
Prompt executed in 82.07 seconds on 4090 dev model in lowvram mode fp8 clip
took forever for the vae to actually do something
14 seconds for the actual rendering of the image
loading the model and the vae took so much time
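A rough sketch of the arithmetic behind those numbers, using the 82.07 s total and 14 s render time quoted above. The step count of 20 is an assumption (it was not stated for this run), so the it/s figure is only indicative.

```python
def its_per_second(sampling_seconds: float, steps: int) -> float:
    """Convert a sampling time and step count into iterations per second."""
    return steps / sampling_seconds


total_seconds = 82.07     # "Prompt executed in 82.07 seconds"
sampling_seconds = 14.0   # "14 seconds for the actual rendering"
steps = 20                # assumption: step count was not stated for this run

# Everything that is not sampling: model load, VAE load/decode, offloading.
overhead_seconds = total_seconds - sampling_seconds
overhead_share = overhead_seconds / total_seconds
speed = its_per_second(sampling_seconds, steps)

print(f"overhead: {overhead_seconds:.2f}s ({overhead_share:.0%} of the run)")
print(f"sampling speed: {speed:.2f} it/s")
```

Under those assumptions most of the wall-clock time goes to loading rather than sampling, which is the complaint being made here.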
How long is it taking per image? I'm also using a 4090.
14-15 seconds on lowvram mode with fp8 clip and 24 gig dev model
if i load the fp16 clip the low ram i have is not holding and itll lag
oh you using the full model? not in fp8?
yeah full dev model
not quantized
https://comfyanonymous.github.io/ComfyUI_examples/flux/
the ones linked here
Hm, just compared our times, and yeah you're right compared to XL it is long, but after coming from a 3080 on my previous rig with SDXL taking 30 seconds to gen in the past I'm not complaining TOO MUCH. Haha
I'm sure it will get faster as we optimize.
just the model is large, i hope it can be optimized and compressed more
also, given how good the results are, it's really hard to complain 🙂
but yea we will get optimizations soon
yeah size is a factor. it currently is too large.
I'm sure the Pro model would make the 4090 look like an infant child.
yeah, just the vram tho
pro model secretly uses a 5090 card 
nvidia doesnt wanna give out more vram even tho they could
they could, but they have larger markets that need supply too. supply and demand
the silicon forges are only cranking out so many chips at a maximum density
i mean i dont expect that we will get 48gb vram, but i hope we get 32gb... i mean come on man... 😦
i expect 48gb minimum for the 5090
if they dont its not worth getting
could get a second 4090 and get 48gb
but why? 5090 cards are still technically consumer cards and what consumer commercial software uses even 24gb?
bro have you ever played with pathtracing
flux
pathtracing maxes out any gpu
when the market is ready, cards with higher densities will come
the next Cyberpunk sequel will use unreal engine 5 with realistic graphics they said, that means we need at least 32gb vram 
enterprise datacenters will probably drive purchasing in this arena more than home users
delayed for 3 more years yet
also for gpu market, and manufacturing supply, covid fucked up a lot of long term logistical plans. like building new factories.
we're still in the recovery period
nobody talks about covid these days, because now they talk about the wars going on :3
i'd recommend saving for pro end hardware. you can get used a100s fairly easily on ebay
yo wtf flux does nudity
dont tell anyone
lol. nobody had any time to come up with any conspiracies about flux and it just showed up. it's such a shock bomb to the field
woman lying on grass works too 😮 shocking the audience
first woman in grass test i did failed. i like it a lot tho
15 seconds per full image is good, the images are high quality. but loading the model in and out of vram each generation to be able to load the vae
eeeh
anyways aint no way anyones gonna fine-tune flux if its this large of a model
people will
the top 0.1% will
with what money
I tried woman laying on grass yesterday with the flux-schnell model, didnt really work
all the good refines of sdxl are made by 0.1% anyways
you only need the best people to refine it
yeah, but some of the best people might not have the best hardware ....
so, size matters
i personally like dev more than schnell, just gives me better looking pics, idk
the money they have and you dont'? lots of people have enough money that they don't have to care about making sense of it
for a good ecosystem to exist you gotta have users and with a 24gb model you only get users with the hardware
this is a new field. i don't think there are any legacy approaches that MUST be done for a "good ecosystem"
good ecosystem is relative, either you are a company or person
tools get developed by people and if the people cant use a thing they certainly wont make tools for it
memory is tight i admit. i just dont see it as a deal breaker like you are portraying it
you're projecting your own inability onto everyone.
idk i just compare it with SD 1.5 that was easily available and because of that had people making stuff for it.
okay relax mr hobby psychologist this is not what its about
i don't see telling someone they're projecting like pretending to be a day one psych. i see it more like "dude there's a boogie hanging from your nose"
you're toxic and i dont even wanna engage with you
we all got boogers but when they hanging out, i'd like someone to let me know
you just declared that tooling will never come for flux authoritatively, and i'm the toxic one.
fact is if the model remains large and slow for most users, people will stick to what they know. if there are fewer users, fewer tools get made. it's not such a big deal
i think you're taking shit out of context and making a hyperbole out of it just so you have an argument and an excuse for personal attacks
sdxl did fine when it had far less users and people were saying it was useless cause they couldn't run it
this model will just encourage lots of users to buy new hardware. i'll probably get a 24gb card in the coming months now
finally seems like i've reached the limits of 16
even with my 4090 i wouldnt use this model much because of the time. maybe making one high quality image is something others do but i like to just explore concepts fast and iterate quickly. with flux i can't do that rn because its considerably slower than XL and 3
you're projecting your own inability onto everyone.
useful to know. now i can practice that inability and get gud
ty for pointing that out
lol, no skill issue there
i guess some patience and understanding is not on any skill lists
... in general, not aiming for anyone
patience is a skill that's often tested
hm. undesired instant testing, yes.
my downloads are faster than I can clean my disk, arg. impatient.
i'm a big believer of the gauntlet style of test. no training. just go out there and do it. run the gauntlet
yes, im also for confrontational therapy and methods. The real thing is the only thing
learn to fail. its good for you
up to a certain level, yes. people need to be confronted, but not burned to the ground with traumas for the rest of their lives
but she turned me into a newt!
we give some training on the job, and my colleague has the style of throwing them into the deep (with stones) and see how they swim without help. Just to let them know they know nothing, and should listen to him. That is a great way to shut them down for the rest of the whole training.
I mean, yes! They need to be pushed into the water and get wet, but not drown
ghahahahaha i love that. allow them to experience how bad they are
its a good motivation to start any training! 😄
flux uses a 16 channel vae
i'm sure its really fun to be around and work with you two and these highly transparent and professional work ethics
oh wow a second 16 channel model
we're not working at the beach or at a pool, it's just a visual story
effective training is actually way better for an employee than it is bad. it empowers them
I guess everyone who had some training in business context, has a certain feeling about semi-commercial trainings
right, that was just the teaser. then you help 'm out, explain some stuff, learn to swim together, etc
semper fi!
Open Sora 1.2 is 67GB vram
so open source diffusion models with big VRAM use has already been a thing
i really want to read the technical paper
man the last time i heard that was in metal gear solid 2 LOL
Hi, can someone help with generating text in stable diffusion?
wasn't deepfloyd huge too? not 70gb but it was significant
ah I don't know about deepfloyd
i only know pink floyd
the us marines use it and they train in water hard.
yea i think they mentioned that in the cutscene lol
Yeah it does, wish I had more vram to run it at normal speeds though. 15 sec/it on this old 2080 :/
it was good with short text and prompt following but image quality was ehhh. it was incredibly slow compared to sdxl as well. not worth it anymore now but was pretty impressive when it came out.
because it's got the same core issue that SD3 does, and obviously as it's not censored, it wasn't because it's lobotomized
Uh, I think flux is censored? or trained not to show much nudity and other stuff
another thing is that
the new model is rectified flow again so no SDE samplers
sigh. no
woman laying on grass is 50/50 on schnell.
so half of the time is perfect
yup. Because the issue is in the core of the model, not something that happened after it trained
just do 6 steps instead of 4 and voila(no prompt enhancing), its pretty nice
did any of you try if flux works with just clip and not t5?
but otherwise with other shitty prompts like this anatomy is good. However this is the turbo model. Dev model is better
T5 is his power
i was just gonna try that right now
im not on my laptop now, will try tonight, cause i just want to see results and differences
quality will most likely drop a lot, the other text encoder is just 246mb i believe
but of course t5 gives it the real power
it's based on all the research that sd3 was built on, so i suspect it could potentially load with t5. i dont know if that's part of the distilled dev version though.
if it can't be done, but then someone makes the model run without t5 encoder anyways, i have to wonder if that violates the terms of use. restrictions include not circumventing limitations that are designed into the model
why would switching out the encoder violate anything?
if the model doesn't run without the t5 encoder, but someone makes it do that, that could be viewed as circumventing a limitation. it's just an ambiguous case that the licence leaves open
isn't it the case that loras often train the text encoder anyway?
which is why they have conditioning noodles in comfy
you're not allowed to circumvent limits designed into the model it says
could be seen as jailbreak i guess
if it allows it, it's not limited
mmdit loras shouldn't need to train the text network anymore if i understand things right
t5 has self attention, and there is a text network running parallel to diffusion with cross attention
is this the license for dev?
and schnell is a nicer license?
schnell is apache license. truly libre. free beer
I could still see some use for fine tuning T5 to add some vocabulary
a lot of these models are just missing terms
yoloworld thinks stormtrooper is hairdryer
ideally, the model just has to learn the term and the token can associate to it. t5 might not know 1girl, but that'll get a token still
also, in an ecosystem where every model has a carbon copy tenc, situations like pony breaking compatibility and calling itself a new base model won't happen. academically embarrassing imo.
switching out the encoder isn't circumventing anything. they don't want you trying to jail break it
ianal but the legalese is pretty ambiguous there
there are restrictions that exist
the legalese is written the way the lawyers require. it's not ambiguous to a court. however i doubt they're going to be going into people's homes or checking their posts online to see if someone jail broke it. it just covers THEM if you do something that lands them in court
but its unrestricted, there needs to be an attempt at locking it down to be a limit
sd3 is designed to allow t5 to be removed. if flux doesn't allow it to be removed, they specifically changed that flexibility into a limitation. if someone else comes along later and publishes a way to remove the t5 and run it without, that is circumventing a limitation.
stability probably wasn't ever going to enforce 2000 image limits but they put it in there anyways and it was a huge concern. restrictive licenses are concerning for multiple reasons.
I gotta admit, the irony of Flux's release post SD3 is strange, considering the people who made it. Almost poetic. lol
just pull it into comfy and don't give t5 any tokens. boom. you've removed t5. now what?
go to jail
not even
it is poetic for sure. life is beautiful.
I believe thats how I use sd3 anyway
it's still in memory what do now?
let's back up and look at the timeline here - SD3 is in process of training. Robin, dev lead for SD3, gets mad, and quits. emad quits. now robin releases flux. do you think this is an accident? what tech do you think he took with him and used for flux?
his brain
he took, and is using, the SD3 tech that he developed
All of those are really really good questions.
might be in memory, but it's not doing anything, so it has no effect
well then surely we will see a lawsuit, till then it didnt happen
we won't. cause he owns it
so he didnt take anything from SAI
the parting most likely went like this robin and sai agreed that sai could continue to develop with his tech, however they aren't entitled to anything new he creates and he doesn't have to give them anything else.
the format all along has been that the devs working at stability are really the owners of what they create, it might change now, who knows, but it's been more of an incubator than a corporation
no one will work there
you have schnell which is apache 2.0 and truly open. It's fast as well. Faster than sd3/sdxl I believe.
dev doesnt have such a free license but its just too good so most people don't really care. hands/humans are great and prompt following is amazing with dev. Finetuners might care later on but most people who just want to use it dont.
if sd3 was like that, most people would not care too much about the license.
Yeah, the psychology of it at the end of the day is if SD3 had a better launch, the licensing issues wouldn't be overlooked but would be somewhat, "overshadowed", to say the least from the amount of interest it had overall, per se.
most papers credit Peebles and Xie anyway
for DiT
a lot of this stuff was made in parallel by lots of different people though
whatever model makes nicer overall stuff will always overshadow others, in 6 months again probably wont be using any of these
Considering how quickly AI progress is moving, whether it's LLMS or AI Art, stuff moves VERY fast. The whole scene can change within 6 months. lol
SD 1.5 with ella, hi-diffusion, PAG, SAG, CADS, FreeU, Perpneg etc
is still really good
im in a discord with a lot of web generator people (midjourney, imagen etc). They pretty much didnt use SD1.5/XL/3, but now i see many using Flux
the draw of instant memes was too strong
If the community hadn't been toxic, an unfinished sd3 model wouldn't have been launched. They got what they demanded
to be fair stability did what open ai does and announced like 9 months in advance
which is naturally a bit annoying
its ok for open ai to do that because their actual goal is AGI and the products are just a means to an end though
but yeah waiting 9 months for Sora, for example, is also annoying
At least Zuckerberg is on the right track with Llama being open source.
Points for that one.
ye
will Meta start releasing image generators too?
yea i was talking about open source stuff 😦
The holy grail is image editing through LLM, of course. Chameleon would be a step towards that.
Do I need something specific to run XL models or just accept that it will take longer time?
there are quite a lot of ways to speed it up
Can anyone ELI5 how it is that SD3 and Flux use I guess the exact same stock T5 model? Is all the actual training in the transformer models then? I think I must not understand the relationship between them and the text encoders as I always thought the text encoders were the basis of knowledge of the original captions and whatnot 
in the feed forward layers
Just d/loading Flux.dev - is it worth all the "better than MJ" hype?!
yes because bigger VAE
better is specific to the user. use it and see what you think
How many language models specialize in prompt generation?
none of them
they should hook up a language model in the feedback loop, automatic variations on prompt, guided by a feedback loop of some image quality - or rather prompt-adherence - grading model
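A minimal sketch of that feedback loop idea, as a simple hill climb over prompt variations. `propose_variations` and `score` are hypothetical callables: in a real setup the first would be a language model and the second would generate an image and run a prompt-adherence grader on it. The toy versions below just do string matching so the loop is runnable.

```python
from typing import Callable, List, Tuple


def refine_prompt(
    prompt: str,
    propose_variations: Callable[[str], List[str]],
    score: Callable[[str], float],
    rounds: int = 3,
) -> Tuple[str, float]:
    """Keep the best-scoring prompt seen so far and vary it again each round."""
    best, best_score = prompt, score(prompt)
    for _ in range(rounds):
        for candidate in propose_variations(best):
            candidate_score = score(candidate)
            if candidate_score > best_score:
                best, best_score = candidate, candidate_score
    return best, best_score


# Toy stand-ins (assumptions): a real system would call an LLM and a grader.
def toy_variations(prompt: str) -> List[str]:
    return [f"{prompt}, {extra}" for extra in ("watercolor", "detailed", "soft light")]


def toy_score(prompt: str) -> float:
    wanted = ("watercolor", "soft light")
    return float(sum(1 for term in wanted if term in prompt))


best, best_score = refine_prompt("a fox in snow", toy_variations, toy_score, rounds=2)
print(best, best_score)
```

The loop itself does not care how the score is produced, so the grader can be swapped without touching the search.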
FluxDev.sft and FluxSchnell.sft are both 22GB. Are they both the same, or are both downloads required?
no just one
Schnell is a 4 step version
schnell means fast in german, it's the turbo version
from their webpage: "Trained using guidance distillation, making FLUX.1 [dev] more efficient"
i'm sure it did
yes, and I just ran a 12 billion model mostly using harddisk swap I guess ... 13 minutes for a 1024px img is not worth it. I need a bigger gpu
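For a sense of why a 12 billion parameter model spills out of a 12gb card, here is a back-of-the-envelope estimate of the weight size alone. It assumes 2 bytes per parameter for fp16 and 1 byte for fp8, and it ignores the text encoders, the VAE and activations, so real usage is higher.

```python
def weight_size_gib(num_params: float, bytes_per_param: int) -> float:
    """Size of the raw weights in GiB (weights only, nothing else)."""
    return num_params * bytes_per_param / 2**30


params = 12e9  # "a 12 billion model"

fp16_gib = weight_size_gib(params, 2)  # 16-bit weights
fp8_gib = weight_size_gib(params, 1)   # 8-bit weights

print(f"fp16: {fp16_gib:.1f} GiB, fp8: {fp8_gib:.1f} GiB")
```

The fp16 figure lands close to the roughly 22GB download size mentioned for the dev and schnell files.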
ill just go back and generate some ponies using sd3
you just need more vram, really
ill download some free vram
have fun with that
is there a way i can get stable diffusion to play a sound every time it finishes a generation
Is it me, or is Flux unable to do "abstract" art styles? 🤔
Even SD3 can do watercolor art, but Flux will always default to a detailed style. Especially for humans. (I'm using flux-dev with comfy on a 3090.)
no. that would be a system thing anyway
well i figured it out
u can 🤦♂️
you really dont want to
why is that
well maybe you like it, just try it. I would get mad with all the sounds.
I rather have some background music or youtube video
ah yea ik ima get mad at the ding sound but i need it cuz every time i gen a image i forgot to check back
sounds that interrupt are never good, either you are focused or not
if you are focused on other thing, then that is ok too. they might need your attention
multitasking is a fairy tale, both for people and cpu(*cores). that's why we work in groups, or have multiple cores
im focused on a few things but i just need to go back to a1111 change 5 words from a bot i made and back to what i was doing
once you generate a few images at a certain resolution and amount of steps, you'll know the typical duration. you can adjust to that.
it doesnt hurt if you're a few seconds late
im usually half an hour late
and my runpod is just eating up funds
your what is doing what??
what
Just kick the dog back into its cage or something. no just kidding. What is runpod?
so, you pay for being logged in, instead of paying per gpu/cpu cycle or what?!
yea per hr im using the gpu
I usually get somewhere between .22 to .24 on vast for a 3090
im using a 4090
but also i tried vast shit wouldnt work well so i just used runpod and it works good enough
yah, for me the tradeoff for a 4090 isnt worth it, it's double the cost or more but not double the performance
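That tradeoff can be put in numbers as cost per image. The hourly prices below follow the chat (about 0.23 for a 3090, and double that for a 4090, both assumed to be dollars per hour); the seconds-per-image values are made-up placeholders, so plug in your own timings.

```python
def cost_per_image(price_per_hour: float, seconds_per_image: float) -> float:
    """Rental cost of one image given an hourly price and a generation time."""
    return price_per_hour * seconds_per_image / 3600.0


# Prices from the chat; generation times are hypothetical placeholders.
cost_3090 = cost_per_image(price_per_hour=0.23, seconds_per_image=30.0)
cost_4090 = cost_per_image(price_per_hour=0.46, seconds_per_image=20.0)

print(f"3090: ${cost_3090:.4f}/image, 4090: ${cost_4090:.4f}/image")
```

With double the price, the faster card only wins per image if it is more than twice as fast, which is the point being made above. It does not account for idle time while the pod is running.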
not wrong but now i cant even choose a worse gpu
the worst one i can get is a 4090
keeping the same storage solution
runpod is a bit more polished experience, but I tend to use vast, just because it's cheaper by at least half
yea ik i started valuing my time and effort over that small money increase so i just took runpod
couldnt be bothered wit that shit
can anyone help me out and list the top models for photorealism
why do you want to creak it? what does 'creak' even mean? what are you trying to achieve? please provide more context before posting links
Guys, don't click random links...
or post them
fr
plus bro spelled it all wrong
thank you for this puzzle!
it was great to see chatgpt try to crack it a few times
lol
chat relax, it's fine. it was a fine puzzle
ill give the encoded text here straight away, if not too much text
but wait, first some text above it. it might give some hints on it.
Listen closely, and you may hear the faint echoes of those who came before you—seekers of truth, adventurers of the mind, perhaps those who ventured too far and found more than they bargained for. Their stories are woven into the fabric of Vault Delta, a testament to the peril and promise that lies within.
"Dawg uq fql jnjq vr aaa Aaaaaa? Pufe xgnzmebk fhe sbjtbdck cokkhdt atn tcrp, bzjulsroqb mwk ouoaskqm bj pqayqlf. Geq addl jnjq vr aaef qlpsvhmep Fgtq Shnq dctmrul bkq vr aaa rzgcqazx’f splmcllp jkqaqapxo, m zqlyxp hzmdz vghl fm m zxhrzf mqf. Dvapcu klaeaa roq abpyb “Kalahn,” roub lkyffyyk akwibxcy’b gwzb gz wloae envwnu, napgus ah geq wdxmhqaa kfeclku qtya bbknbrzbz coxi."
that seems like a subjective question. what is best to me may not be best to you. what qualifications are needed to make it the best? edit, best/top
for sdxl you can try realvisionxl, crystalclearxl, nightvisionxl, and polyhedronxl...that's by no means an exhaustive list
for me it means similar to Jugg V9 and Realistic Vision v5.1 and v6.0. something that looks more like an iphone shot and less cinematic, more amateur, capturing skin texture
@trail lion I appreciate the clarification on my question and I agree that it does mean a great difference to each individual
cinematicredmond is also pretty good, if you like skin details, etc (and btw there's also a lora for skin details)
whats the lora called
search "realistic skin texture" on civit
thanks
also consider using the style selector extension in auto1111 as the cinematic and hyperrealism styles can often help achieve some of that photorealism you're looking for
can the roop extension no longer be used?
I never used it but my understanding is it was no longer supported by the dev, and was succeeded by reActor, which I've also never used. I prefer ipadapter personally
I shouldnt say prefer, I'll say instead that I've been happy enough with ipadapter that I've never tried to seek out an alternative
I see, if there is an alternative then is ok, thanks
ipadapter with the insightface model is really pretty amazing if you've never tried it
no. use ReActor instead
ok, thanks
How can I upscale flux generations? Does supir work?
Is there a test/dataset like the MMLU (yes, I know it is flawed) of prompts for text-to-image models?
That or just a list of prompts ranging from simple to complex to test an image model?
I'm trying to download requirements for stability AI's generative models rn. Currently trying to download invisible watermark. Is that a super long process to run? I found a github or something for using gpu vs cpu, but would it matter too much? Idk how long that specific process is.
Are there other requirements for the generative models that would be better to download for gpu instead of cpu? Just trynna see if I can speed up the process.
I posted a longer text asking for someone specific to help in the stable-video-diffusion channel if ya'll determine gpu use is better here and are willing to help with that.
Kinda have the same question for fairscale setup. Found installation instructions here: https://github.com/facebookresearch/fairscale/blob/main/docs/source/installation_instructions.rst
It didn't make much sense to me though cause it was kinda written for people who have more experience than a first-timer like me.
Heya friendos
Guys, when integrating SDXL into an iOS/Android app, is there any policy on what a user can create? Limitations?
You would need a certain model that has those limitations I think?
If that's your idea, not letting them create something fucked.
you're going to try to run SDXL on a mobile device?
guys I am using xformers flash attention, generating 1024x1200 images on pony model, 12 steps DPM++ 2M SDE, it takes 3s to generate 1 image, is this the maximum speed I can get? 4090
faster than it runs for me
im trying to use SD3 on StableDiffusionWebUI and the quality is awful, im not sure why, ive given the exact same model in comfyui the same prompt the quality of the image is exponentially better
its the same SD3 model for both of the interfaces
it is awful, i tried yesterday, can't finetune to generate any good images
webui?
yeah
like bro
wtf is this
wait i cant post pictures
ill put it on imgur
https://imgur.com/a/yuhw1d4 look at this
exact same model, exact same prompt
but webui just cant generate a good looking image
maybe just because SD3 sucks, and webui is now not that compatible....
tbf
ill try it with 1.5
also, what do i do if i want to use custom models with comfyUI?
Yes
i tried downloading a custom model from civitai but it doesn't come with text encoders so i cant use it with comfy
not sure if you have the same settings as i do on comfy, but if you do, set cfg at 4, and use the heun sampler and do NOT use karras - it doesn't work with sd3
i wish you lots of success trying
Is it hard?
no comfy looks great
its webui that looks shit
Why what’s wrong
the settings are wrong. what sampler and scheduler are you using?
just let us know when you get it to run if you would
not sure, ive just started it lmao
Is the API bad?
no, the APIs fine. but if you're going to use the API, then get sd3 not sdxl
It’s the cheapest tho for users, at $10 for 1000 credits I believe
okay, SD3 doesn't use the same neural network as the other versions of stable. you can't use most of the samplers with it, and you can NOT use karras as the scheduler
so check that first
alright i set the scheduler to the same as comfy and the quality instantly improved
how do i use custom models with comfyui?
ive tried downloading one from civitai but it doesn't come with text encoders so it doesn't work
in fact, when i tried it it crashed my GPU 💀
fine tuned models are for various base models. so you can't just switch them around. you look on the page on civit and see what model they are for.
same with loras. you have to use them with the base model they were trained for
the model i downloaded was for SD1.5
then you load an SD1.5 workflow, and find the model, and use it instead of base 1.5
right
alright, thanks!
welcome
take a look at this
exact same model and same prompt but comfy looks a lot more realistic
idk why
ive set the settings to the same as comfy
hyper realistic is an art style. youve also got high cfg and steps on auto. your settings aren't at parity
i just noticed
im setting them to the same now
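One quick way to check parity is to write both UIs' settings down and diff them. This is only a sketch: the dictionaries are typed in by hand and the values are illustrative, not read from either UI.

```python
def settings_diff(a: dict, b: dict) -> dict:
    """Return {key: (value_in_a, value_in_b)} for every setting that differs."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Illustrative values only (assumptions), copied by hand from each UI.
comfy = {"sampler": "euler", "scheduler": "normal", "cfg": 4.0, "steps": 28}
webui = {"sampler": "euler", "scheduler": "karras", "cfg": 7.0, "steps": 40}

diff = settings_diff(comfy, webui)
for name, (comfy_value, webui_value) in diff.items():
    print(f"{name}: comfy={comfy_value} webui={webui_value}")
```

An empty result means the listed settings match; any remaining difference in output then comes from something not in the dictionaries, such as seed handling or the prompt itself.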
i kind of think both aren't very photorealistic, but the auto one feels more hyper realistic style
your cfg is way too high
ye i just brought it down
they are starting to look similar but comfy still looks more realistic
yes, well - comfy is a much better interface - why are you using the other one?
because i was told to use it with custom models
guess not
ill load a SD1.5 workflow and use comfy
just put your models in your models folder and load them into comfy like you would any other model
who told you to do that?
not someone on here
good
yeah comfy seems a lot better
i guess its not as user friendly but i like it a lot more
it looks more polished
reminds me of that tab in blender that you do all the shaders with lmao
For what kinds of prompts do you decide to use different schedulers or samplers? Do you use the same one for everything or vary it depending on what you're trying to go for?
where would i get a SD1.5 workflow from
Has anyone actually tried the video and audio feature
i don't use the same samplers or schedulers for everything.
to add onto this, i cant find one on the SD1.5 repo like there was for SD3
i have workflows i can DM you if you want
if you could that would be great, thanks
Ok. I'm still trynna download the models I'll be using but could I dm you with details about what I'm trying to do to see which samplers or schedulers you use? Workflows could be useful too, just downloaded the tripleprompt json you sent in another chat.
sure
Ok
High-end British restaurant logo
I'm trying to download ultrapixel, have some questions and wondering if anyone can help. I don’t have some of the libraries listed under requirements for stable cascade, I’m assuming I should download them?
Ok. How did you install it? Did you download the models listed in the readme file for cascade diffusion? And did you download everything under requirements? I'm trying to figure out if I need to or not.
i did everything as i was supposed to
but when it came time to generate it just hung...
comfyui just gave up
Dang
hello
am available
do you have a graphics card
When the diffusion is stable
Assuming you wanna make pron like 90% including myself: get web ui, pony xl checkpoint then a bunch of loras
we're on discord you can just link to the discord comment instead of linking to a reddit post with a screenshot of a discord comment
I think that screenshot is from a different discord server
yeah they don't
ah ok
it worked in DM but maybe that's different
anyway I don't want to start any drama but the pineapple is wrong it is not impossible to fine tune
it's a large rectified flow model distilled from a second, larger, rectified flow model
that should not be impossible to train
distilbert is distilled and it is one of the most fine tuned models out there, in LLM land
so is flux worth downloading ?
it's great
yes
it's not as flexible in styles as SD 1.5 and XL but besides that it's awesome
can someone pls show me a controlnet inpaint for sdxl? I can't find one
powerpaint v2 with brushnet
It's kind of amazing that such models can be run locally, even if it is quite slow. Makes me optimistic that consumer GPUs can eventually produce crazy good stuff, don't need Dall-E for that level of prompt adherence (it often follows the prompt better than Dall-E).
thanks!
@fervent thunder man there are a lot of controlnet models that I don't know, I only know the basic ones like controlnet, T2A and CoAdapters
yeah same I am not a big controlnet user
my favourite nodes are FreeU, PAG, SAG, CADS, vectorscope, rescaleCFG, dynamicthresholding
and I just play around with those
They will show up as unknown for anyone that hasn't joined the server
ah ok thanks
Can someone help me?
I have a running instance but I want to use it in my python program
huggingface diffusers library
explain, please
a webui server? you'd just socket into it. it probably has an api mode that your program can interface with.
actually, it's not up yet
I'm in an ldm environment and I don't know what to do from here
I just followed the requirements on the github
is there a llama AI that can generate images
ok I would not use this repo
Should I use Automatic1111 instead?
I would recommend huggingface diffusers library
but it depends on what you want to do
Anyone in here have ultrapixel up and running and familiar with this error?
"When loading the graph, the following node types were not found:
UltraPixelProcess
UltraPixelLoad
..."
it might be the case that using a pre-made UI in API mode is what you are looking for
rather than a library like diffusers
I just want to host an AI art generator for my python script
hey everyone, I finally have SD installed and have generated some stuff, mostly the "bad" stuff so far lol... I know this group is not for that, but how do I find groups where I can share and discuss that type of content?
Then use Auto1111
It also has an api
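A minimal sketch of calling that API from a Python script, assuming the web UI was started with the `--api` flag and is listening on the default local address. The `/sdapi/v1/txt2img` endpoint and the payload fields are part of the Automatic1111 API; the helper names, default values, and prompt are made up for this example.

```python
import base64
import json
import urllib.request

# Assumed default address of a locally running web UI started with --api
API_URL = "http://127.0.0.1:7860/sdapi/v1/txt2img"


def build_payload(prompt, negative_prompt="", steps=20, cfg_scale=7.0,
                  width=512, height=512):
    """Assemble the JSON body for a txt2img request."""
    return {
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        "steps": steps,
        "cfg_scale": cfg_scale,
        "width": width,
        "height": height,
    }


def decode_images(response_json):
    """The API returns images as base64 strings; turn them into raw bytes."""
    images = []
    for encoded in response_json.get("images", []):
        # Drop a "data:image/png;base64," style prefix if one is present
        encoded = encoded.split(",", 1)[-1]
        images.append(base64.b64decode(encoded))
    return images


def txt2img(prompt, url=API_URL, **options):
    """Send one generation request and return the images as bytes."""
    body = json.dumps(build_payload(prompt, **options)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return decode_images(json.load(response))
```

With the server running, something like `images = txt2img("a lighthouse at dusk", steps=25)` followed by `open("out.png", "wb").write(images[0])` should save the first result. Using only the standard library keeps the script free of extra installs; `requests` would work just as well.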
hello is everybody using the inswapper_128.onnx?
may need some negative prompts and embeddings
Hi, please help.
#🤝|tech-support message
Has anyone tried running Flux on 16GB VRAM (4070 TI Super)? I'm in my holidays so I can't test it, but I want to know how long a image takes.
i run it on a 2060 6gb + 16gb ram, with cuda set to system fallback + virtual memory at 40gb
took 50 sec+ to run an image, rip ssd.
Thank you so much!
Do you use the Schnell or Dev model?
Schnell
they can't make a channel for their main competitor lol
"why no midjourney channel"
Ok Thank You!
I dont think thats true lol
No it was run by a community member before SAI took it over, and it caused a lot of drama
The server was being run by Kaarssteun
interesting news.. how much did it sell for?
They didn't sell it, SAI contacted discord and discord handed them the server
by force no consent ? really ?
XD
When was that
Just before the server launched to the public
During the beta stage of 1.4 if I remember correctly
this is funny and weird, what would be interesting is if they offered free lifetime ai generation with a super fast workstation
to the original discord owner.

Kaarssteun did see it coming and didn't mind it, what caused the drama is that SAI didn't contact him and straight up just went to discord
Because SAI wanted the invite link
this is crazy 😦
This seems to be solved and there is no reason to bring that back really
do not tell me they even blocked/kicked him from server without even making him a vip member ?
Yeah it has been resolved for a long time now, but its just nice piece of history of the server
No hes still around and has a special role
Reminds me of the time when we would go to a special channel to request that our prompt be run by someone who had access to an early version of 1.4
The images weren't good or coherent for that matter, but we were super excited for it since the only other T2I ai at the time was DALL-E 2
Besides hires fixing, anything you could do to like boost up 1.5 to sdxl+ standards. Sometimes im dissuaded from using 1.5s because of the low detail levels but they still have great creative ability or theyre just awesome checkpoints.
Ella, Hi-diffusion, Deep shrink, FreeU, PAG, SAG, CADS, vectorscope, rescaleCFG, dynamicthresholding
you should quite easily surpass the details of the average SDXL image using SD 1.5 if you use these
hi chat
i have stable diffusion set up on my pc
if i want to delete it do i just delete the stable-diffusion-webui folder?
those things are above my brain-grade. extensions?
What is fp16 and fp32 model difference? Which one should I prefer?
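As a rough illustration of the difference: fp32 stores each weight in 4 bytes and fp16 in 2, so an fp16 file is about half the size and needs about half the memory, at the cost of some numeric precision. The parameter count below is a hypothetical round number, not the size of any particular model.

```python
import struct

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}


def weights_size_gb(num_params: int, precision: str) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


def round_to_fp16(x: float) -> float:
    """Round-trip a number through half precision to show what gets lost."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


params = 1_000_000_000  # hypothetical model with one billion parameters

fp32_gb = weights_size_gb(params, "fp32")
fp16_gb = weights_size_gb(params, "fp16")
approx = round_to_fp16(0.1)

print(f"fp32: {fp32_gb} GB, fp16: {fp16_gb} GB")
print(f"0.1 stored as fp16 becomes {approx}")
```

The halved memory footprint is why the answer tends to depend on how much VRAM the card has.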
yep,
thats alright
how much vram do you have?