#💬|general-chat
1 messages · Page 167 of 1
i havent been here for a while, are there any AI text to speech models?
Nice! 👏
Hi! I am the founder of an ecosystem around gen AI and automation. We are currently working on developing a proprietary closed model using a base image generation model and a Deep Convolutional Generative Adversarial Network (DCGAN) model. I'm seeking advice from an engineer with experience working with such models and in using Cloud GPUs. I would like to understand which provider can best meet our requirements. Can someone help?
Hi
Hey everyone, I got a buff PC now and want to try generating some AI Art. Is Stable Diffusion the way I should go and is the guide I should follow to set it up? https://rentry.org/voldyold
no this guide is way too old.
check out guides from #🤝|tech-support pinned messages
Thanks
What's the size difference between SD1.5, SDXL and Flux
Well... I know the difference between SD1.5 and SDXL
I never used a SD3/Flux Lora before
There is a very large difference, sdxl is roughly 3.5b parameters combined everything(text encoders, vae, unet)
Flux is roughly 16b on the other hand with everything(text encoders, vae, dit)
Just use nf4v2 quantization, and flux should use the same vram has sdxl.
any of the big 3- Azure, AWS or Google cloud
they are roughly equivalent at this point
although I would also say there is good reason that GANs aren't really getting used much any more
So does that mean flux loras are even bigger
Yes, but not by much.
SD 1.5 keeps getting stronger and stronger as more tools come out TBH
But it's still a classic
yeah definitely
Which is why I still use it
I really like the compositions of SD 1.5
I like them more than most newer models
it needs a refiner pass with a stronger model but otherwise its great
I want to see new SD models backwards compatible with SD1.5 Loras
you mean cloud for production use case running it for clients afterwards, or for training and dev purposes? any of these would work, depends on what size and type of servers and GPU models you need, because they just provide the capacity and everything else is up to you
for training AWS would be best because they have these concepts of Spot instance pricing, which is 50%-70% cheaper but can get interrupted by other bidders at any time with 2 minute warning signal. most server types can run for many hours or days before they are interrupted by other bids so it's pretty good for one-off tasks to run and then delete a server (like CI/CD builds or load testing / GPU training)
interruptible pricing is great yeah I use Vast.ai personally and rarely get interrupted
do you guys personally add any detail enhancing loras to your generations when you first start experimenting? or after you get a good base for img2img?
I never make any image without detail loras
sometimes I have to remove it for trouble-shooting
Detail Tweaker XL is the main one
but you want to stack at least 2-3 cos they all add different things
what are the detail loras, aren't details dependent primarily on prompting? i still don't know how loras work, i know 'control net' is a lora type?
so there's pre-processors like anyline, and there's loras which is a custom feature or style but it's basically the same SD model just trained one more 'layer' on top of it? is that how technically it works?
no, details depend maybe 1% on prompting and 99% on the rest of the workflow
where is it possible to train a custom lora? i want to try train for 2d cartoon body parts, but my laptop won't even load the SD itself 😊
is it something to try train a lora for, or it's a job for control net and openpose stuff?
control net and openpose can control the layout and pose but not more than that
and lora?
can it get trained for custom layouts like that?
yeah lora has an advantage that it can do both layout and style together
I wrote this comment the other day on lora methods:
1. pytorch
2. diffusers
3. OneTrainer, Koyha, SimpleTuner
4. replicate, civit
5. paying a freelancer```
thanks! will google it, koyha is something on github i remember i saw it, but need to train on my own pc correct? civitai can be used for any lora training? that seems best option, will try doing it there. replicate never heard of
Can anyone tell me where to find the CFG Rescale parameter in stable diffusion?
replicate, civit are the only ones that can't be done locally
out of curiousity, on average how long does it take for you to generate lets say 3 images (512x512) with pony lora?
is there a way to measure ToPs?
How to use stable diffusion on cpu only without GPU? On windows?
Do i need to download special version?
I do it, its fine
https://github.com/comfyanonymous/ComfyUI/releases get the portable version
and then go into the folder and
double click run_cpu.bat
it should start working straight away
get TCD sampler and TCD lora for SD 1.5
this makes excellent images at 6 steps
TCD is the best distilled model in my opinion, for SD 1.5 and SDXL
same, biggest thing I've seen on vast is an issue with a template on a particular server. like missing files or something, if I ever see that I just delete it and grab another or if I've already uploaded stuff transfer the files with the built-in sync (love that feature)
I have issues with templates a lot yeah
I am working on building my own template
I don't even use the sync I just redownload everything each time using a shell script that chatgpt wrote
cos I tend to go for data centers with 10GBs download its fine
if you go for servers with slower download then sync methods would be good
I build the workflow in advance so time isn't wasted
What are the best modern tools for training LORA or dreambooth?
And which one is better? Everyone says that dreambooth, but why there are so many loras?
loras are the most convenient, because they are small, easy to distribute, take up less space, and can be combined with others on a run-time basis. a dreambooth finetune outputs a full checkpoint, so in the case of SDXL that means 6Gb or more, while you can combine a full checkpoint with others, it requires a merging script to combine the weights. both dreambooth and lora training can be done on a small training dataset (meaning small number of images). there are multiple tools to train, kohya and the popular wrapper around kohya known as kohya_ss are among the more popular trainers, but others exist such as onetrainer. kohya_ss is what I use personally
I have a current training right now that has 24 saved epochs so far on flux. each of them is 150Mb, vs 26G for the full checkpoint. maybe that helps put it in perspective
cos people have different budgets
people are trying to minmax rather than make the best fine tune they can
if you have the money for it then training all the weights including the text encoders is best
this paper is a good example of why self-attention layers matter as well as just cross-attention https://arxiv.org/abs/2308.12964
I have a budget for 600 dollars for gpu.
Currently have ryzen 9 and 64gb ram and 5700xt.
Any recommendations as to what to get for local image gen?
used 3090
anything second hand like 2080 would probably work
hhhmm... say guys, if you use a lora that was meant for real life pictures in a anime or art generation, do you think that will cause a major increase in generation time?
no, that's not possible as far as I know
Would love to find a used one in that budget
should be doable
I had that card, it's still sitting on my shelf in fact. you can do image gen with it....better in linux, since it's an AMD. I was able to get a 3090 TI for just under 900 (renewed). For me, an upgrade had to be to a 24G nvidia card (for local training mainly), otherwise, it seemed more of lateral move. If you really dont care about vram (though you should), maybe us a chart like this to help you decide https://cdn.mos.cms.futurecdn.net/FtXkrY6AD8YypMiHrZuy4K-1200-80.png.webp
I didnt have to upgrade my power supply, but make sure you have enough juice
Hola a todos! saludos desde Argentina
how to draw icon use sd?
you can use deforum or similar tools for that. or if you are lazy, just subscribe to dream machine and use start and end images for the video.
do I have to have a subscription to create images ?
Dear everyone
Nice to meet you.
Recently I've used this service and impressed by AI engine.
https://www.nterview.me/
https://www.youtube.com/watch?v=AfDn_Esqgg8
What models are compatible with fooocus? is there a list? Or how can I determine if the model or lora will work?
I finally figured out dall e 3's little image generating trick
It doesnt really generate images but more like it cheats on the test
Talk about understanding nuances
I rip information from its images, almost forget i can change the image file format into a text format
And decompile a little bit to grab some code from microsoft sources
anyone can help me fix this #📝|prompting-help message
flux simply wasn't trained with clip g
hi everyone okay so see i m training an sdxl model with kohya trainer so can anyone suggest me some tricks while training so that i can generate high quality images
does anyone have an opinion on deep dream machine ? Do you know better or cheapre alternatives for Ai video creation ?
Try kling or gen3, those are better from what I know.
will anyone answer my question?
no, you can generate locally on your pc, search for automatic1111 webui or comfyui or tutorials on it, you'll figure it out
you subscribe only if you want to generate on someone else's machine basically.
or support devs \ site creators, depending on where you're gonna sub
hey chat
Probably not, your question is too generic. "suggest some tricks to make it better". we don t know what settings you re using, we don t know what you already know about, etc. "make it better" is also too generic, could refer to the image quality, composition, resolution, how close it sticks to the prompt, etc
Try to reformulate it. Also it s probably better suited for #🔧|finetune or #📝|prompting-help depending of the refined question.
Yeah thanks mate i just tried kling, there is an offer 66% off first month
Wrong
just popin in to say RIP A1111 (until we can use flux on it)
also, what should i use instead 😂
what happened to A1111?
i cant use flux with it 😢
idk if I want to tap on links I never seen before
Generate text to image, Chat assistant and image analysis with my verified discord bot https://dsc.gg/vexel
Not wrong.
very very wrong. dall-e3 generates images, just like all the others do. it doesn't 'cheat', it just doesn't use the same neural network that stable diffusion uses.
Im saying it does push out generated images, just not in the same way it does like other text to image API models
it uses a different neural network, but it does exactly the same thing that stable diffusion does to create. and the same thing meta does, gemini does, and all the other diffusion models do.
it even uses CLIP
lol what
firstdayoninternetkid.gif
once an image is an image, you can pull the code out and use it, if that's what you're talking about. it's just an image then.
there is one very weird thing you can do with Dalle 3 using comfy ui
if you use a clip embedding explorer node
and you find tokens like this 23u4tj2-8t-1t02ht4 that correlate highly to meaningful tokens
then you give 23u4tj2-8t-1t02ht4 to Dalle 3, it makes the same image
or at least a similar one
i'd expect that though, i'ts using clip, it's probably using the same text encoders
ye it makes sense I just find it funny that it works
it's just using GPT on it's backend and stable doesn't
can probably pull shenanigans like this with loads of models
probably
there's no resaerch to suggest that running 3 encoders is better. What stability was trying to avoid was when they went away from clip G , the very old and inferior clip model, people had no idea how to prompt anymore. A big part of SD2's problems wasn't just censorship but it was that everyone had their prompting figured out. Everyone was a prompt master. For Clip G. Nobody bothered to adapt.
SDXL was an attempt to bridge that. Clip G and Clip L side by side.
T5 benefits from being paired with a clip encoder, since T5 wasn't trained on image pairs. So that's why they use it with the superior Clip L.
SD3 actually suffers a lot because of the old busted clip G. Practically decades old at this point.
I think I have seen the vision version before on reddit/youtube
exploiting that they often use some shared ViT or CNN
probably also why if you give one of the LLMs an image, get a description of it, and give that to the ai image gen, you get almost an identical image
yeah I really love that aspect of these models
while it's fun, it also points to a problem - they're all just basically the same thing. we're all using the same set of pencils and crayons - so everything pretty much looks the same - or in the case of LLMs - sounds the same
I think SD3 is the research about 3 text encoders. And well, look at it
they're all trained on the same data, in teh same way, and there's no real diversity
yeah there are very large similarities
I made maybe 1000 flux images today and a ton of the sci fi stuff I had seen in SDXL
to be fair to flux it has a lot more image variety than I expected for a distilled model
There's way too many LLMs at this point to say that for certain. Maybe back when there were 2-3 contenders.
they're all just modifications of the same thing though
Feels like saying most of the traffic online is pornography. Yeah back before youtube and netflix and amazon. Sure. Not now.
go talk to chatGPT, claude, meta - ask the same quesiton, you'll get 1. the same responses 2. the same personality 3. the same thoughts
we're in a huge echo chamber
the free versions? or the SOTA?
try both
try all of the ones you can get to
i have actualy. And more. So i don't know where you're coming from.
moving on i guess. discussion is moot
then don't stuff yourself in to the conversation
my test questions get very similar answers on like all of the top 50 LLMs
yeah true, but with llama/mistral models its pretty easy to make it a different personality
sure. but i'm talking defaults
just saying your claim is wildly innaccurate and inexperiienced.
It's a whole lot of puff
you can twist the ai image gens to do unique things too - but the default stuff without a lot of prompt hoops and adjustments all come out looking pretty much the same
dont bs in public if you don't want to be called out for it
go away
¯_(ツ)_/¯
his claim was that the big LLMs are trained on common training data and have pretty similar outputs
seems true to me
Yes they are heavily finetuned on such outputs which is honestly fine, you will see that the base models have no such "censorship". They are completely unfiltered
you do this all the time, jump into a converstaion with no dea what you're talking about, get ugly, attack someone. just go find something else to do
its not. it has "truthiness"
ok so lets not start drama if there is some truth to it
tagging my images for a LoRa is confusing the hell out of me lol, do i tag the stuff i DONT want in the image, tag everything or just tag the stuff i want the model to learn??
we're in a dataset gold rush. There are more than just the 3 bots he listed out there
you are telling the AI what is in the image. just tag it with what you what the AI to think of when you use those words in a prompt
truthiness is a term Colbert coined, when bullshitters skirt that grey area between truth and fact. It's not true, but it feels like it
eh, he follows me around and takes any opportunity to try to 'call me out' and spread his chaos
all these youtube tutorials and this guy explains it in one sentence, thank you!!
just ignore him
there's a lot of strategies. If you're training the likenss of a person for SDXL, i'd go with describing everything BUT the person. The person is boiled down to the single trigger token.
if you describe the person in the captions, you generally have to describe them in the prompt too
im making my OC character into a LoRa, likely for SDXL
so a lora trained off a character and their likeness
its bc of the finetuning data for chat models, most of it is completely synthetic and hence it will have a very similar speaking style to things like chatgpt/claude
However this is kind of easy to remove since you can just finetune further or finetune the base model.
Base models will not have such problems.
one token for the character. describe everything else. Thats how i do it.
Other people have other approaches. But i've foudn that describing character details requires those in the prompt later on
a lot of it is the finetuning data yeah
but there's also common core and things like that
stack overflow has essentially been lifted into most of these models
ok thanks for the advice!
i agree there are common datasets. it all started from smaller efforts. But we're post nvidia hitting 1Trillion valuation now.
That issue is rapidly diminishing
I'm not sure the models are diverging
I've seen the opposite trend in a few ways
depends on your use case. many of them will be a lot of the same.
there's only so many ways that a model can impersonate a pirate
Imagine a fairly niche academic question, which is answered very well by only a handful of articles on the internet, and not answered well by any other sources.
Over time as models get bigger and have more expansive training data, its more likely that each model will come across that one particular answer in their training data.
Because the utility of the answer is so much higher than the utility of the answers other sources are giving, this correct answer will light up brightly on attention scores, and so end up being the answer each model gives.
llama3 was trained on 15trillion tokens of data, the internet has less then 100t I believe.
the public internet. but how much do the private and academic sectors have?
ignoreme i'm just following people around
they are, and in SD3 - per the diagram - clip_G is the workhorse. it actually works pretty well for the job it's doing
clip g is openclip(from laion), clip l is normal clip(from openai), they are similar but different sizes.
i dont understand why no one uses siglip now since its basically the much more improved version of clip
the google searchable internet maybe. the depth of it is so much more than 100t . Trillion is also an unfathombly large number, though also very obtainable as far as data goes. if one token is 5 bytes, that's 500T bytes. That's 500 Terabytes.
someone's probably working on an implimentation that'll use it
Allow me to introduce to you, the deep web
yeah 100%, lots of data can't be scraped with scrapers
stock scrapers maybe
true, there is also pile t5xxl which is supposed to be a better version of t5xxl 1.1
Auraflow uses the much smaller version(pile t5xl) which is like 1b parameters compared to t5xxl 1.1 which is like 3-4b parameters and has similar prompt following to the best models.
wikipedia text file is 60GB. Hmm. Actually that scale on the text data.. how many wikipedias would be 500TB. actually, might be believeable
say 8-9000 wikipedias would fill 500 TB. that's a big scale. maybe still not the depths of it all though
https://academictorrents.com/details/9c263fc85366c1ef8f5bb9da0203f4c8c8db75f4 reddit dataset alone is 2.5TB. thats a lot of garbage text.
thats compressed too wow. older archives, like the archiveteam rips of Yahoo groups, thats 1.5TB of compressed text.
yeh i convinced myself again. The depth of the internet's text data is way over 500TB
a lot of dataset is going to get filtered and deduplicated most likely
almost guarenteed, and most of the data sets are being taken from the LAION database so that narrows it down even farther
Hello
funnily enough Kolors was the one to really push it with the text encoding
they put a fairly strong multi-lingual language model called GLM
hiya 👋 greetings everyone. just getting started with SVD 1.1 and will likely be hanging in #▶|stable-video-diffusion
I followed a tutorial for AMD compatible stable diffusion, and although I am new to this I feel like a portion of my less than ideal results are from using "v1-5 pruned emaonly" as my checkpoint. I've had so much more success with web based ai gernerators such as adobe firefly so I feel like my prompts should get at least decent results
Anybody willing to share a good text to video workflow with motion imported from another video, and face swapping?
Hey i am new here like what do you guys do
whats the best fastest gpu for flux training?
4090 far as gpu goes. if you go enterprise you're better off
yea but the problem is its very slow used to be way better 14secs per iteration
idk what i did wrong
you've already got a 4090? what a deceptive bait for technical support
Hey, you should try other models then. The 1.5 ema prunes is not recommended as its 2 years old
Hey all 🙂 what is in your opinion the best existing img2vid workflow? 
i am training my flux model and at iter 0/500 i get a normal picture but past i get static
did i overtrain?
what is better for stable diffusion: Intel ARC A770 or Radeon RX 7600?
Friends, I want to rent a personal host with A100 graphics card and 14900K, and 4080 as the display card. Is this solution feasible?
A 7600 would be better, but if the 770 has 16gb instead of 8 maybe that would be better
@warm junco both have 16gb
alternatively I could get a used 16gb NVIDIA Quadro P5000 for the same price
ahh yea I meant the xt
it's like 30 bucks more then the 770
if it's a lot better then that makes sense to invest
what about a GeForce RTX 4060 Ti it's 100 bucks more. Would that make a huge difference or is it not worth it?
hey guys, ya'll know of any lora or checkpoing similar to the style of persona 3 reload in-game?
Ai runs the best on nvidia GPUs
So the 4060 ti 16gb would beat the other two easily
thanks then I think I'll save up some more
No problem 🙂
And yea its worth to the 100 bucks if you plan on using a lot of ai tools
Most of these local tools use Cuda made by nvidia
I already am using sd a lot but my current gpu sucks and I really want to use larger models
What's your current gpu?
a GeForce RTX 3060 from a miner that can break any moment
Hey guys, general question if anyone knows, what's the point of upscaling?
I mean, I understand it improves quality, but why would I upscale if I can generate at higher pixels from start, say 1024x1024 instead of 512x512?
Oh okay, but if it has 12gb you can use sdxl and even flux on it
Even with 8gb vram sdxl/pony works
Hey, because models are trained on specific resolutions you shouldn't generate much higher than that.
For example:
1.5 models are trained on 512x512
SDXL/Pony on 1024x1024
If you generate native 1024x1024 with an 1.5 model you can get artefacts and duplicates
what is pony?
An variant of sdxl
does it create those cartoon horses or is that something else?
It can, but there are specific anime pony or realism pony versions
They are very good at generating normal hands
ahh so it's not just for generating those horses
Nope
if 16 gb isn't needed then maybe I should try switching the ui
What ui do you use?
That should work then with your GPU
nah it keeps crashing when it loads 6gb+ models
Ah can you show me your content of the webui-user.bat ?
In #🤝|tech-support
We can fix that
sorry it's on a different pc
Oh that makes sense, thanks😎
So just 512x512 and upscale if I want 4k?
then what are 16+gb gpus needed for?
Ah no problem.
But make sure you have
--xformers --no-half-vae in the webui-user.bat
And if it still crashes you need to increase the Windows Pagefile.
Feel free to ask me in #🤝|tech-support if you try it again
I'm on linux tho XD
If you use an 1.5 model then stay near that resolution. But what works good to is for example
960x540 and then Upscale by 2 to get Full HD.
Then upscale that image in img2img for 4k
Ohhh xD

yea I've used --xformers and --no-half-vae before gotta check if it's still in the bat/sh
Does the 12gb flux model really work with less then 16gb?
Yes
But if you want to use flux you need to use Forge or Comfyui
is the model only partially loaded or how does that work as the os already uses some of the vram
forge actually looks good it's basically auto1111 but comfyui is a bit complicated
Yes it gets partially loaded and into the ram too
ahhh nice then I'll save up even more and get some 20gb+ or couple years later
if the current gpu lasts that long that is
thx so much
saved me 300 to 400 bucks XD
I thought hands would take like another 2 to 3 years it's incredible the hands issue has already been fixed
Anyone has any experience relating graphic cards?
Would Nvidia Quadro p620 be any good in generating images? (It's only 2gb vram)
No
It could work
But its not good
Thank you for all the help!
Are there any other good Ais besides anything sd or mj related?
want to know too
I have images of a book in the closed and fully opened state. How can I generate the intermediate frames between these two images to create a smooth book opening animation?
hi
Hi
you might try using luma or kling - they do a good job of interprelation
so apparently they are going to realease sd3 8B and realeased a finetuning guide for Sd3 2B
there is no release date for sd3 8B yet right?
lykon just said so in twitter
but no release data
date
the llm? if so, it was supposed to beat all other other llms bc it could "think" and "reflect". It was trained on 70b llama3.1 and beat gpt4o, claude sonnet 3.5, 405b llama3.1 in benchmarks. However, it was kind of a scam since the open source one is completely different and performs worse the 70b llama 3.1, and their api was much better but still not as good as advertised. However, people quickly found out, that the api seemed to act similar to claude sonnet 3.5 and could not say "claude" and had special tokens only claude has but not llama.
hi!
Oh I see
good morning
hey folks
is it allowed to discuss third party (premium) services for AI photo generation? I know some of these (or most) at some point were based on stable. Would appreciate any info you opinion of the "best" text to image service ($) and why you think it is the best. When I say best, I mean in general across all types of iamges like realistic, cartoon, paintings, whatever you tell it. I understand some may be tailored to specific needs. (or just the most popular right now)? If this is againt the rules please remove my message and let me know. Thanks
is there a reason you want to pay for something you can get for free?
@desert dagger convenience
there's a new model on meta.AI - you get a few preview messages with it, then it swaps you back over to 70B

have you tried this? https://easydiffusion.github.io/
Would anyone tell me how to get as good as the result that we get in Midjourney for this prompt
A moon with a large circular hole filled with glowing yellow electronics.
Details: Intricate details, photorealistic rendering, textured lunar surface, craters, soft ambient lighting, visible wires, circuits, and chips, warm yellow glow, cinematic lighting, depth of field, volumetric lighting.
Style: beeple, Greg Rutkowski, trending on artstation, hyperrealistic.
I submitted Midjourney image in #🏞|general-with-images
been running tests with forge out of curiosity. i assumed that telling it to use less ram would slow it down a ton. but i told it to only use 4GB for flux generations, andit's only using 4GB for flux generations. so um. ok. doing 1.5 mp generations usually at 40-45 seconds. now at 86 seconds. Twice as much but really, not that bad for such a signicant memory savings.
3GB limit works too. no speed change. 25 step 1.5mp at 85 seconds.
uses all my system mem though so i guess it helps to have a lot of good system mem
with which AI?
flux dev
nf4 version 2, but it should work for any model too
this is such a flex on the memory management that he was using before and gutted completely. must've engineered this solution to one up on teh comfyui code he got accused of ripping off. it's such a massive flex, but nobody realizing it. I bet the coders at comfy org do though.
who are you accusing of flexing?
it's a proper good flex of skill and achievement. the author of forge, controlnet, fooocus. has a russian name i think.
nah its an anime name
i got it so flux is only using 1gb of my vram. wtf. 1min generations with the real sampler, with 1gb of vram.
more people should be talking about this i feel. oh well.
real name: Lvmin Zhang
if only training was like sdxl 😔
comfy is better at flux inference cos it has the FP8 speed boost though
better than sdxl imo
no cuz it takes longer,maybe later it will be improved though
i'm having nothing but wins using lion 8bit. adafactor wasn't doing it for me. i didn't try anything else
i trained 3 loras of diff ladies today. 15-30 image sets. might try to combine a few sets and do a 100 image mega set. should take over night i figure
yea tried with prodigy and adam but only adafactor works with style im making
i think lion is adaptive, but i run it at a constant rate. sometimes with cosines.
converges at 500-700 steps usually
do u use kohya,the derrian trainer,one trainer or ai toolkit?
i dont fuck with anything but kohya. every other trainer has way too many hype artists to cut through
kohya doesn't ever say shit. just keeps workin
plus he's japanese i think so we'd be like "what?"
yea i tried with derrian and fluxgym but dont work i guess ill try kohya again
fluxgym uses kohya at the back iirc
https://github.com/bmaltais/kohya_ss/tree/sd3-flux.1 this what i been on
yea i always get the code1 error and the florence tagger also got stuck 😔
i use taggui too.
ill try that branch later,how much vram u have?
https://github.com/jhc13/taggui trickier to set up but it works nicely.
i have 16gb
yea thats prob why i couldnt make the others work i have 12gb so ill wait for more optimizations
can train only 2 layers i think. that'll help bring it into line. split the model between ram and vram helps. little slower though
i guess in that memory hungry sense, it doesn't train as well as sdxl haha
yea ill just check the civitai training prices for flux,the moment they lower the cost from 2k to 500 buzz thats the moment ill know they optimized the trainer for the lower end gpus 
glif just brought their flux lora trainer online
oh yea ill check there too
https://github.com/kohya-ss/sd-scripts/tree/sd3?tab=readme-ov-file#flux1-lora-training in here they mention
The training can be done with 12GB VRAM GPUs with Adafactor optimizer, --split_mode and train_blocks=single options. Please use settings like below:
then gives you a training command that would fit in 12GB
something like lion 8 should drop in to replace adafactor and get more savings
which anime i'm curious
fate
the prisma illya one is the one where shes the mc
well fate/zero is cool but prisma illya is like a spin off full of fanservice with a magical girl plot so maybe thats why he likes it 
i like sword art. except the season where its all about their digital family with a chatbot baby child
season 1.5
yea same here the fights are the best part of it
gm
Make both people in the photo face forward.
Ok, push button, to press, use space bar
Hi, is this the correct server to ask questions about local genning with SD forge?
hi
Anyone know good free tools for batch text removal from images?
Hi
Anyone here good with Lora training? Came across an issue
I asked a q in the chat help, may need some insight haha
Hey everyone! Is there a way to automate a list of character names through a specific part of a prompt in ComfyUI? To clarify, I have 20 characters, and I want to generate one pose for each. Since I'll be frequently changing the poses, I'm wondering if there's a way to automate inserting the character names into the text prompt node, rather than manually typing them each time. Any tips or workflows to streamline this process?
internship implies free work. is this a paid internship, or are you looking for slaves? At least you're not charging people ot be part of it.
my point is... going onto discord chatrooms and looking for unpaid interns is pretty fucking greasy.
oh it was removed. good call
What is even with the mouse cursor on that website? With design like that, you can be sure a business has absolutely zero real world experience with tech. A web design that bad is a canary in a coal mine. It indicates there's zero expertise at the fundamental level
https://www.cosmic365.ai/ i'm linking it again just so people can see how shady an bad it is
Looks super legit to me!!!
Look at those blinding design skillz
And hey; it's pretty darn cosmic
do you guys know if it's possible to install a1111 with an already existing forge instillation without having to have the entire program and all it's dependencies ran though? I just want to be able to run batches through adetailer and I can't without a1111 but i don't have a ton of space for multiple installs
you're going to have to have all of a1111 installed for anything to run in it
dang, I can't get XYZ grids to work nor adetailer batches, do you know anything about how to get those to work in forge? ;-;
no, sorry, i use comfyUI. you might ask in #🤝|tech-support
Hello people.
Considering this is an AI channel you are taking a lot for granted with that statement.
🤨
Anybody know how to bring down the contrast of a lightning model if I’m using eg Steps 4 and CFG 1.5?
Wondering about dynamic thresholding but not sure what settings to use
Also if i was to up the main CFG past 2, what also to counter with . Thanks
AI are people too
lol the yellow arrow on the cursor is so ugly
its HORRIBLE
Hello, is there anyone who can guide me, thank you
درود به همه فارسی زبان کسی اینجا هست ایا؟
What exactly should be done here?
😁 😆
where do you say
what is best extension to FORGE similar to it? https://github.com/butaixianran/Stable-Diffusion-Webui-Civitai-Helper cuz this one not working with forge, any1?
tonemapnoisewithrescalecfg and Skimmed CFG are what I use
on regular models I use CFG 20-30 and on lightning/TCD models, I use CFG 10
so yeah these two nodes can definitely lower CFG burn lol
can anyone suggest me is there any better mdoel than clip for finetuing sdxl
There any good "text to video" AI things you can download and use on your PC (so you don't have limited uses)?
If anyone is interested in art dm me i need urgent commision
Scam. Don't do it. Don't be stupid.
not rly
there are some variants
but I'd rather not change to them
Hi everyone, I'm just starting out with SD and I'm beginning to understand it.
However, when I see the prompt as an example, I often see montions like ‘score_9, score_8_up, score_7_up, score_6_up,’
I've tried putting ‘score_x’, but I don't understand or see how that influences....
Could you please explain?
that's a prompt for a model called Pony
people put it in their prompt for SDXL by mistake
not even the author of Pony can explain them. Story always seems to change. sometimes people say it's intended and a good thing. sometimes people say he mistakeningly did it. everyone has a source for their information. I think he was just throwing spaghetti at the wall to see what worked and started reverse reasoning his way around it.
ultimately the 20 extra tokens required on the refined pony xl model are not worth it.
not sure about that
he gave a coherent explanation
images were tagged with the scores with the intention to use one score in a prompt, but instead of learning how each score works the model learnt that a string of several scores is good
people have told me other explanations with the same anecdotal "he said" . and there's always a source for it. yeah. i know. I know there are explanations out there.
I just think they were reverse reasoned after the fact and aren't the real reasons.
luckily there is a way to test this numerically
pony forgets way to much of the base model's knowledge for it to have been properly planned. he fluked out when the training data produced what it did
take the conditioning to text code node from here and it can be measured https://github.com/Extraltodeus/Conditioning-token-experiments-for-ComfyUI
naw. that aint my domain. i'm not that kind of artist to test and combine numbers 100 ways. some of those charts on there scare me.
using information dumps like that to magically suggest pony had a plan is weird. i dont know how it relates. how could anyone possibly?
if you don't want to measure it then that's fine
but that's the method to verify which side is right
lol if you say so. feels like big troll energy
For windows 10, do you guys perfer annaconda navigator or just plain cmd with py?
"go discover pony's intentions while training by testing 100 combinations of tokens" naw
people keep upholding the score tags. they're not good. they didn't work.
why would it require testing hundreds of combinations of tokens?
I don't understand
that's the method you showed me. dude explained how he made nearly 1000 test examples. and honsetly his prompts look schizophrenic
prefer terminal to GUIs like that
ah no I'm not saying to replicate what he used the node pack for in that repo
I'm saying to take that node and then generate with all the tags, see what the nearest prompts are in the vector space
then do the same for just one tag
and then for no tag
i'm not going to do homework on ponyxl. its just not worth it. it's not an academic achievement of a model and isn't worth researching. it's a lemon.
yeah if you don't want to do that its fine, I was just telling the method
a funny example from that repo is that
the default Comfy UI prompt "beautiful scenery nature glass bottle landscape, , purple galaxy bottle,"
gives lavender in SDXL
it was always a mystery
turn out the 4th closest prompt has lavender lavender lavender lavender
lavender and purple are basically synonymous in human language. that's my guess
you can look at how the vector space is but the reason it's like that is become synonyms
i never heard of that mystery until now. should've just asked me at the start .
ah yeah that's a great point they are synonyms
I get a funny one currently
I like the tokens "colorful background" a lot on SDXL
but if you boost them too high you get fluffy objects
"scenic background" is one i enjoy. maybe i'll try "scenic colorful background"
there was a study that looked for good tokens
I will try to find it
they found 14
oh yea this one https://arxiv.org/pdf/2209.11711
and the result was this:
art, dramatic lighting, high detail, highly detailed, hyper realistic, intricate, intricate sharp details,
octane render, smooth, studio lighting, trending on artstation```
missing rutkowski
lol
i actually wish that "hyper realistic" was more aligned with real hyperrealism style
yeah a lot of tokens are a let down
"cinematic" is an excellent token though in particular
I use A1111 weighting when I can, so I can boost it further
not every node as a A1111 option, most don't in fact
i thought comfyui supported prompt weights natively
hmm. i remember using those lora sliders with 3 and 4 ratings, on both uis.
perp-weight goes to like 50 but the tokens are nowhere near as strong with perp-weight
oh the UI offers it, its just that funny stuff happens to the image
lora sliders are easy test. they have significant difference between 2 and 3
are you talking about the lora strength or the clip strength?
cos load lora node has 2 sliders
o i c. you didn't actually mean "it only goes to 1.3" you meant something else. got it. this is what i think pony's explanations of the tags were too.
oh you thought I meant the slider ends
yeah I meant the image breaks after 1.3
I haven't done the pony test I mentioned BTW
I might at some point
but like you, I'm not that bothered about Pony cos its broken
cool gfy
I get what you're saying
that he might have not communicated correctly what he actually did
or that he didn't know what he did
hello!
https://github.com/OutofAi/OutofFocus new sd2.1 tool.
i think there's a professor out there whos teaching his students to write all their research on 2.1 so that "the community" won't abuse it right out of the gate
there's been a few 2.1 projects that are novel and effective, coming out lately
dynamic compensation sampler is another
yeah that's the best sampler in the world as far as I know
would be amazing for flux
hello
spammers begging people not to bot stomp thier servers when they spam now. cute
on a spam server. ok
you're spamming about a scam
absolutly no
absolutely not the right discord to post this on
i say we bot brigade their telegram "investor" channel
You are right
here's what'll happen. they pump a lot of stocks up. they convince a few people to get in on the pump. the people who actually initiated it will exit and then everyone is fucked. Scam
if it's even that kind of op. this is porbably just a guy trying to scam gift cards
I don't talk to people who don't see the opportunity, I'm sorry for you, have a good life everyone
i have plenty of opportunity. the secret to my sucesss is recognizing bad opportunity and scams
If I want money, I'd become a damn good AI software engineer and get millions in funding
Instead of sad clickbait scams that would be laughed out on the average Discord server 😂
Opportunity in a pyramid sceme?
Simple.
Be the pyramid scheme
Thats smart, will try this out
that's my plan too, but it's been problematic since i'm not an engineer and studying is hard
Hi,anyone knows if there's any way to "print" generation times in XYZ plot of Forge/A1111?
you can find yours grid at \webui\outputs\txt2img-grids. Open you image and print. (if I understand your question correctly)
@zenith lance Not quite, the grids are ok, but I would like to overlay the generation times, like: I'm testing Flux models vs Steps and would like to know how many seconds each image took to generate
I could probably check that latter one by one , but would be really nice to have that "printed" in the plot
It seems Schnell generates/resolves in less steps altought with less qualitity than Dev, but would like to know the "time saved" on a decent generation
Hmm,cant post images here
When I'm increasing the curves of a female subject (not overly so), a lot of times I end up with the subject losing clothes. For example, let's say my subject is Zero Suit Samus and I want her to keep her full bodied Zero Suit on - her suit starts right below the chin like a turtleneck would and covers her entire body including hands, fingers, feet, everything.
This question isn't about Samus, she's just an example. What I'm really looking for are phrases to use to ensure that all the clothes the subject is supposed to have on actually STAY on. Of course, that's with me telling the prompt what clothes the subject is wearing.
Quick question: Can I use SDXL checkpoint on lower resolutions (i.e 512x512)?
yes but only with res adapter
try specifying stuff like sleeves, leggings, boots - stuff that beats the AI over the head with the fact she's wearing clothes - but isn't the word clothes or clothing or clothed
does anyone have workflow of controlnet inpaint with flux ? Using alimama-creative/FLUX.1-dev-Controlnet-Inpainting-Alpha
i have some issues using HI red settings, its been fine for months when suddenly when i tried it today, the progress bar is stuck at 0% can anyone help me with this please?
which CUDA version is recommended to use with a 4090?
gm
gm
gm
gm fam
Hoi, was ai-toolkit only for when training flux loras? Or can it be used for SDXL/1.5 as well?
can i upscale more than 2 img at once?
Hey
On Linux, there's other things to run AMD GPU other than directml
What was that called?
rocm? zluda?
anyone got juggernaut XL to work on fooocus ?
my guess is that fooocus does not detect the baked in vae, and uses sdxl vae instead, so i get very bad images. other juggernaut versions work fine.
would recommend switching from fooocus to comfyui, diffusers or pure pytorch scripts
probably comfyui
fooocus has great inpainting, ive found that hard to replicated on comfy
I see people say this a lot but I never really knew why
when I used Impact pack on comfy it worked perfectly out of the box
can you share a workflow with sdxl inpaint, id greatly appreciate it.
guy during the Ottoman Empire
Photo in the style of realistic photography, an open black box, a faint glow inside.
As far as i know of, you can batch them to do one at a time, but to do 2 upscales at once, you'd want to run 2 SD instances at once, provided you have enough video memory/ram
so just about how much pc can handle ic
Good afternoon, everyone! How are you all doing?
okay 🙂
Guys, last time i used SD was more than 7 months, been using automatic1111 but i dont know if its defunct now or upgraded. Is it still viable or is there alternatives?
there are lots of much better models now, and a111 is fine but comfyui, swarm, forge and diffusers are great alternatives now.
I would probably recommend switching to a better model now, flux came out and it's incredibly good at text rendering and prompt following and human anatomy. It's vram requirements are a bit hefty and requires roughly a 8gb vram gpu at the least.
where to find flux? is there an installation guide for it?
whats the second best from those options? according to you?
you can probably just search up installation guides, these are the original models(dev is better but schnell is much faster): https://huggingface.co/black-forest-labs/FLUX.1-dev
https://huggingface.co/black-forest-labs/FLUX.1-schnell
and I think comfyui is probably the best ui and diffusers as a library
a side question can I install the ui to an external HD or its best to keep it on pc?
Anybody with Photoshop or similar apps knowledge? Need a lil help
Anyone can help me
Mood
My stable diffusion I got extension which is wildcard galery but preview doesnt show up how can I solve that ?
i can help you with photoshop, what's up?
It's that one thing you replied to actually, trying to change a texture from a yellow tinto to a more natural pinkish, it's a skin
for photoshop you should use the ai generative fill
i think you would get a simillair result
The person that helped me before mentioned what Todo, sadly both them and I lack such skills
I can try but seems hard to scoot around the non skin parts
With ai I mean
oh, well - piximperfect has a very good tutorial for that on his youtube channel if you want to use photoshop. but the easy way is just select the skin and then adjust the hue
I don't have Photoshop myself, I'm a broke college student 💀
even for AI, you still have to create a mask over the areas where you want to change, and then prompt in the change
DM me your image, i'll help you with it
Thank you! I'll send hem
Online you can use Photopea, really similair to photoshop
hello all, when I do upscaling, how can I ensure that my image is not destroyed and that I still have to do corectionr. (and the time I waste).
hello
there is a control net called control net tile, which can help
using a low denoise can help too (only running the Ksampler on low sigmas)
hiu
And if you're desperate (or just edit simple stuffs), go for GIMP.
Guys hi
Can I use SD to upload a drawing of a map so that it uses that drawing to make that map look like an alien spaceship map?
Any Flux server?
Have a good art everyone here!
GIMP is good
hello
any idea if sd3 large will ever be opensourced on huggingface?
Is ai-toolkit only for flux models? Or can they be used to train loras for 1.5 and SDXL too? 
As i found it easy enough to use that it was the first tool i've successfulyl managed to make a lora for. As i just can't wrap my head around the damn parameters and folder structure for kohya lol
I don't believe sd3 large will be open sourced, however stability is planning on open sourcing sd3.1 2b at least.
Jssnsggajseba?
just forget stability. they relied on the men at black forest labs for any credibility in the field. it's gone now. they'll never release anything substantial again.
whatever happened to make them leave is something the new owners of stability need to contend with. BFL is what stability was more or less.
Don't count on SD3.1 to improve the pretraining problems
Hey guys, could someone point me in the right direction? I want to render characters on top of sketches (mistoline control net mode) but each body part will be apart from all others (legs, torso, hands, head) like in Rayman, for skeletal animation in Spine. What kind of "examples" I should prepare for such lora training? Or IP-adapter is what I should use for that? I want each sketch I do (I draw character in full, every body part in line art, just no final color) rendered by SD in a pretty way. Should I make Lora for that somehow? Appreciate any advice 🙏
Do I need to make "examples set" of such body parts in full color and style of what I want SD rendering, or it's possible to convey to it that all the "point of the training" is parts separation, without caring about render style of examples that I'll submit for training?
in general for lora training you want to use examples that are examples of the finished image that you want as output
so for example you would not use examples that have a different structure/composition or style to what you want
you use examples that match what you want in both style and composition
control net is different
for a canny or depth control net you just want the composition to be right in the example image
for ip adapter it can be set to composition or style mode so it depends
style control nets do exist but are very rare
to pose rayman style characters (vectorman possibly too) you could potentially train a controlnet lite for that. lot of people don't realize that controlnet learns how ot pose from it's base images. I'm not sure it could pose a disconnected character by default. Could be a neat effort to learn controlnet training with.
I think training a lora of just the disconnected character in enough poses would help a lot too
controlnet training sounds fun yeah
I am writing an magical world
But, I wanna add humanoid magic golem into it.
And use graphics card to resembles them.
How should I represent those most classic ones?
Like, 750ti
960ti
690
Shits like that.
hello
hi
Does somone in here have minecraft and want to play in my minecraft server with mods?
oh
Are there any local tools that can identify whether an ai image has imperfections or normal
#怎么用
How do I run FLUX models correctly?
I can run them in my sd models folder, but they give gray images.
that depends on if they are .sft files or .safetensor files
if they are .sft files, they go in the /models/unet folder
mine are safetensors
what interface are you using?
Is it comfy on Apple silicon M3 pro?
not sure. you should ask that in the #🤝|tech-support channel
#🤝|tech-support would be the place to ask this
Which flux model? Make sure you use either nf4 v2 or the flux dev fp8 (16gb)
Forge has flux support
I want to use impressionistic realism
I don't know if that's a flux model.
It doesn't wark with anything I have
my mistake, then. the last I remember hearing is that flux wasn't supported in forge
How big is the file?
38 MB
hello
Hi
Hi
Slightly afk. But if someone can ping me with the response to a question. Information says you can use stable diffusion to create a 2d character concept art. To later turn into a 3d model. Can anyone tell me which checkpoint, of the many many choices, i should use?
Which motion diffusion do you guys recommend? Like SVD, animatediff i read is kinda "outdated", but don't know what people prefer these days
A group of wild boars on the left and a group of hyenas on the right, facing each other during the day, tense atmosphere, panoramic view
Im looking for a partner in building comfyui backed bots, any interested feel free to dm.
If this should go to community projects let me know
that's where it should go
big scam energy. @moderators or however you do it on this server. needs to be seen and ban this person.
just react with ⚠️ and they should see, but obviously no one has seen that yet
they post in every channel at least once a day in these early western world hours.
i think it'd be easy to put a bot condition in place. if all their posts are verbatim the same post, put a 24 hour mute on them. EZ. Do they not have any coders left at stability?
this is not a stability server, this one is community server, by users of SD
huh?
there is a bot, that sends notifications to mods
https://discord.com/channels/1002292111942635562/@home this is the official stability.ai owned server
bot reacts to ⚠️ that is the reason ⚠️ disapears
oh that link doesn't work. it' goes to the "server guide" link
look, if stability doesn't want to tend their communiyt garden, that's on them. You don't have to make excuses for them or pretend it's not official though.
i am just saying what mods told me
the bot is kind of half finished. A spammer that posts the same scam invite in every single channel is easily discerned in code
i am just a regular user
mods lied to you. Everything about this server is official Stability.

this server did start out as a community server but the owner got approached and he gave it over
he still has a special flair though
Any good ways to remove text from and image and fill it with in?
img2img didnt seem to work good enough
probably just inpaint
Has anyone successfully gotten S.D. to produce a character like Cheetara? If so what checkpoint did you use, and how did you phrase it?
emergent !!
i can ssh with new bots fine but not with alraedy deployed pod
i have updated ssh key after i deployed this pod
can i change any config so that i can connect via ssh ?
Anyone tried https://github.com/fairy-root/Flux-Prompt-Generator?tab=readme-ov-file ?
I cant seem to connect any prompts to it.
I tried installing Stable Diffusion on my computer, but I got the error 'Stable Diffusion model failed to load.' I can't figure out why I'm getting this error, as I was told I only needed to run the run.bat file. Is there anyone who can help
Follow one of these guides https://github.com/CS1o/Stable-Diffusion-Info/wiki/Webui-Installation-Guides
Very helpful
can you post your entire error log, in #🤝|tech-support, please
I will check it out thank you
Sorry I was excited for stable, I wrote it here without reviewing the server, I will edit it
it's fine, you just going to get actual tech support if you post it in the #🤝|tech-support channel is all
Ok, like a ticket. thx
Hi everyone,
I've just started exploring Stable Diffusion and I have a question. I'm looking into using pretrained Stable Diffusion models, and I'm wondering if it's possible to pass an image, a mask image, and a separate image that I want to apply onto the masked area? Has anyone tried this or have any advice on how to approach it? Thanks in advance!
Hi, just coming here trying to learn everything new about stability… otherwise it’s just train train train or gen gen gen 24/7
updated comfyui and now I get a white screen the gui doesn't load 😦
unload comfy - i.e. kill the browser and the commandline windows. then go into the /comfyUI/updates folder and run the update script file. then reboot your machine
thanks 🙂 do you have a working flux wworkflow the one i found i'm having an issues with mat1 and mat2 shapes cannot be multiplied
sure. i have some fairly simple ones i'd be happy to give you copies of if you like
i got it working in my preferred program forge 🙂
thansk though man this is absolutely amazing
Can someone who understands stable diffusion write to me and help me?
What is a good amount of Buzz I should use for a SD1.5 CivitAI bounty for a 49 character pack? I'll collect and supply the images
Does invoke not extract info from pictures made with Auto1111 or is it just being weird
not sure
I have about 10 characters already (in final color) but they're very different, a frog pet, 2 flying fairies, 3 humanoid orcs etc, one fish monster that has tail instead legs, a few other weirdos 😆. I think I need to collect 10 humanoid monsters first, and then if I want to train for pets, will need about 10 four legged examples? A control net training also needs just a few examples like lora? Or a big data set
control net needs big data set most likely
hi
How do I fix getting black images in comfyui?
There are a few things that can cause it. But first restart comfy just to make sure you have a clean environment. Use a compatible vae, loras with your model. Don't mix supporting files between models. Use proper settings, resolution ,sampler ,cfg the the model you have loaded
Hoi, do any of you have a decent workflow for animatediff with ipadapter to make lengthy clips actually consistent?
banodoco server is best for that
Lengthy will be the issue, most of the methods I've seen create keyframe grids.... So the more frames you add you'll either run into resource issues or create consistent issues with too much gap between frames
which resource do you run out of? is it VRAM?
Usually, say the video is 512x512, you take every 10th frame as a keyframe and create a grid that's maybe 5x5 with the intention of running that through img2img,ipadapter, etc... at some point you will hit a limit
i would augment an existing dataset. Maybe might investigate a workflow that would convert existing poses into disconnected creatures. Potentially with different parts for different limbs too. So a cat with disconnected frog legs. Seems like a weird goal to aim for, but i think that's how i'd plan my approach.
Interesting thought exercise. Thanks for sparking it @static falcon
Any tips or suggestions for a gpu for my ai pc?
I don't think assembling the image dataset is the hard part, the hard part is the expensive training cost for control nets
used RTX 3060 12GB is ok
Hmm, thanks for your suggestion
my current plan is to try inpainting mode with mask, drawing items one by one, near the full character. let's say I generate the cartoon zombie on one side, and then go slightly near it with masking and prompt that it'll be a duplicate of its leg, etc
What is a good amount of Buzz I should offer for a SD1.5 CivitAI bounty for a 49 character pack? I'll collect and supply the images, and tag them too
By the way it's gonna be sfw, I'm not one of those freaks who want a niche... Lora
not sure if this will work
Image count will be approximately 4000
hi
Who wants to work as a moder or developer in my project?
at least the cost of the buzz it would take to train that on civit, +50% . That's just my first sort of gauge on it. Could be a starter point for consideration at least
civit bounty market is odd
sometimes the buzz for a lora is the same as the buzz for one image
yeah. my reasoning is lora costs so much buzz to train yourself. that's a "market rate" that people are paying. but getting somene else to do it isn't self serve. it's a custom process that they get paid upon completion for. so you boost the self serve market rate to a full serve then add some for incentive. it's a bounty so the goal is to convince people to come get it done for that price.
low balling market value could work too someone might bite
Hey everyone, im wondering if someone knows about the best sources to get started with generating web design assets (e.g. design a landing page/impact image)? I did some googling of course, though still not sure where to find the latest and best... ty! ❤️
are you wanting something that just generates images? or something that'll code the website?
Only generating images. I am trying to find out what the state of the art is for generating images with a web design angle. For example I am interested in knowing and testing whether for a page like I can generate a similar layout/design
there aren't any image gens that are specifically focused on web site design images that i'm aware of. that might be a good angle for you to explore writing, even.
you might be able to do that with one of openAI's GPT options
Ah okay, I've tried a bit with OpenAI + https://stability.ai/, though it is inadequate at this stage I feel, I'll continue researching a bit - thank you for sharing thoughts
Better to take your various images into an editor like PS or Gimp, once you have the general placement save image and use in control net or image to image in SD, with the relevant prompt.
So what model may i use in order to generate stuff like 90's retro Anime?
flux + lora. or maybe, sdxl anime centric model + lora.
Hi
TradingView Premium Package (Cracked Version 2.9.2.6 – Desktop): https://www.reddit.com/r/FXFullLoaded/comments/1fhj6nn/tradingview_premium_free_version_available_for/
Would anyone happen to know; can I create an embedding or lora that can construct floor plans or maps? How would it be done? Should I feed it a lot of floorplans as regularization images, and then train it on individual parts of those floorplans like doors, walls, windows? Same with maps, feed it maps for reg images and then individual parts of that map like, trees, rocks, rivers, as training images?
hey guys whats the best video upscaler at this moment?
I also have questions about lora training, is this the right place? I lora training a joyful community thing where we could help each other with experience or is it a harsh market thing where the best will get all the money's and Noone wants to share experience because it might help others to get a piece of that cake? I know nothing about the market situation, I just want to create pictures of my favorite anime waifus playing my favorite boardgame 🙈 and maybe do Instagram with it. And maybe maybe some nsfw patreon of boardgame girls flashing? Don't know 😂 but it's all about the boardgame I swear. I would even pay a little money if someone experienced could take me by the hand and help me understand what I'm doing wrong 😂 but if one of you could tell me how he/she would approach a board game lora in terms of ref pictures, important tagging and training I would be very graceful 🙏
try the Fal.ai fast lora maker
it has instructions
that's fine for most people
I'm currently using kohya ss but can't comprehend all the parameters, is fal.ai relatable?
https://fal.ai/models/fal-ai/flux-lora-fast-training
its got a lot less options for simplicity
How many pictures should I give it? I don't know, shouldn't I have a little bit more experience about lora training to get reasonable results? 🤔 I don't really see a point in paying money to get the same results as before 😅
join fal discord the staff are really active
Thank you for your advice, I think I might use that as a last straw. I am kinda into doing this thing and hopefully understand it on the way 🤷♂️
I'm using a.i toolkit for flux training, first time i've actually managed to make a successful lora. Downside though, is that i'm then limited to flux only, as i want to make for sd 1.5 and sdxl, but don't know if ai-toolkit can make for 1.5/sdxl
WAsn't there hardware made that was either a SoC or something that wasn't too pricy for low power A.I acceleration for stable diffusion and such? Trying to locate it, but hardly remember much of it
I remember nvidia released a A.I dedicated SoC using ARM was it, but can't seem to find it back either lol
coral tpu
Are you using koya_ss?
Also, SD 1.5 model or SDXL?
Yes, Kohya ss.
I'm currently training 1.5 because I haven't had the motivation to get sdxl to run yet
sending you DM
Sure, thank you 🙏
Hello, I hope you are doing well.
I have one project about the Cartoon image processing.
Main purpose of this project is as follows.
Convert the cartoon image to the smile, sad, mod and so on cartoon one.
At this, it is not allowed to change others without the face.
Who can teach me what model is useful?
Please contact me and discuss about it.
Perfect! That was one of the accelerators i read about but completely forgot :D And now i'm curious if there's a comfyui node to make a farm of say 2-3 of those for low power acceleration of generations :P
Wait, does coral not have more than 4 tops on any of their boards?
not sure
Why can't I make an audio-to-audio connection? Whether it's an uploaded file or an uploaded record, the generated composition remains "pending" for a long time, before finally displaying an error message.
Hey folks quick question. I am using SD1.5. My GPU (1660ti) can handle creating up to 1024x1024 images. However, I was reading around and saw that SD1.5 was apparently trained at 512x512 and does its "best results" there.
However, when doing image generation, I find that the 1024x1024 images have more detail to them and are a little less janky, however they take 4-6x longer to generate.
What is a better workflow; generating at 1024x1024 OR generating at 512x512 and upscaling good images?
So sd3.5 was released ? Where was it announced? I didn't see it
Hey, upscaling is the way to go
Anyone had any experience with https://exactly.ai ?
Wow SDXL fucking hates generating at lower than 1024x1024.
yah, you can get away some with non square resolutions sometimes, like 768x1024 but 1024x1024 really is what seems to work best on XL
Oh thats a good tip, I'll use that
So prompt forming question
What's better between: black hair, short hair vs black short hair?
Or would something like (black, short) hair be best?
I would use dark vs black, because with XL the colors tend to get applied to things you dont intend
like you'll start seeing everything black, like furniture, clothing
I see, any difference for 1.5?
same goes for that, SD3 and flux have addressed that though, with their better prompt adherence
I see, but outside of changing from black to dark, does any of the three ways I posed the prompt vary anything?
shouldnt matter
Oh, surprising.
And what if I'm actually looking for like
black as 0,0,0 hair?
do I just use "black" or just do "super duper mega dark"
try it, but just dont be surprised if you see more black
got it 
what happened to iopaint? the stable diffusion tab is empty with no models
what's that? is it like inpaint?
are you sure you use the base model? Cause the base model is really bad with resolutions above 512
you can finetune a model to higher resolutions and most popular SD 1.5 finetunes support higher resolutions
same for sdxl. The reason why outputs in sdxl on low resolutions look so ugly is that it associates low resolutions with ugly Internet images 🤷♂️
Which model specifically do you use? I can generate fine even at 768x768
Oh, I'm not using the base model. I'm using a tuned model.
yes, they are finetuned on higher resolutions
Oh I tried a few off civit.ai and it seems consistently to perform worse below 1024x1024. One I tried was AnimagineXL
Not to say the images are completely unusable
Just the volume of artifacts is much higher
you can use sdxl turbo if you want to generate on 512 in sdxl
as said, sdxl was finetuned on high quality 1024 pixel images, so it associates everything below 1024 as low quality image
you would have to finetune it on high quality low res images. But why would you want to generate lowres images anyways?
sdxl is more efficient on high resolution images than SD 1.5
What is sdxl turbo?
And I'm generating below 1024 cause I only have a 6GB card 
I can generate at 1024 but it takes much longer than smaller
use multidfiffusion in forge ui
Now I gotta figure out what a forge ui is
I'm really out of the loop lol
Hardly touched SD in 2 years, so have been slowly modernizing my setup
if you use webui from automatic1111, it's more or less the same. Has a lot of benefits for lower mem systems. worth reading up about. If you don't want to move from auto, you can find the multidiffusion extension and install it. it's just built into forge
Okay yes I do use A1111, so I will try that extension
What does multidiffusion do?
it's a tiled sampler and tiled vae. since it processes smaller tiles at a time, you can fit more into less vram
Got it
I must be using this wrong, I downloaded and enabled the extension but its going... slower?
lots of guides out there for it. it's a popular extension that has allowed a lot of low vram users to accelerate their work.
i'm not able to help too well. I used it briefly 2 years back. then i bought a new gpu.
Damn, i need to try that in comfy lol, as i'm a sucker for upscaling lol
comfyui might have it better listed as a tiled sampler and tiled vae. they'd be separate nodes i think
Ah, so it's simply just tiled? :P
yeah i found the custom nodes for it https://github.com/shiimizu/ComfyUI-TiledDiffusion
Thanks :) Odd that i didn't get that one when i googled 
i goog'd multidiffusion for comfyui but i'm always thinking low key that the goog algorithm senses if you're getting too good at home hobby AI and will begin to hinder you. total paranoia but sometimes i embellish it. Gotta walk that line between insane and the membrane.
The heck? Google Algo tries to stop you from becoming smarter? 
Alpha's big bet was to invest into AI and be the lead. Now tha'ts all changed and open source AI is fucking their 10 year corporate strategy.
they're probably not manipulating but it's fun to think of
hah xD
i actually begrudginely think that meta has done the most in the field.
Is there a Flux server somewhere?
On another note, do we have a FOSS version of suno a.i yet for text to songs gen?
Or still too early for that?
Indeed
stable audio? ||heh||
Oh right, i remember that one, where model wasn't available for months after announcement. Can it do the same kind of music as suno?
nope lol
its the state of open audio models pretty much
That's what i'm after lol. Or not as good, but can do full songs in the same sense.
There was that one project too, that used stable diffusion. Trained it on spectrographs of music iirc, then diffusion generates spectrogram images and they put the image through a converter to turn it back to audio.
huh. Would be cool to have that, then a node that reads from the rhythm for a llm to make lyrics and control tempo/singing itself, then you have a song right there :P
it generates images in one step
so if you want it fast, use turbo
hi guys
https://www.riffusion.com/ this is it. they've pivoted from open models it looks like
what is the channel for generate images?
in general, Turbo/Lightning/Lcm models allow generation of Images in fewer steps (usually around 6 steps). The base Turbo model needs 1-2 steps but generates in 512x512 natively
fewer steps have the disadvantage that you cannot use cfg and negative prompts - but if you want to have good images fast on old hardware you should definitely use turbo models
there is no free image generation. Download stable diffusion or flux and generate images on your on computer
mmm
You can no longer generate images in the cloud?
then if i don't have a gpu not working, right?
i think civitai has a free generator now. Stability stopped offering one, since there are so many and they're trying not to burn money now
Ive used perchance.org before, its free and decent, just not very flexible, though I think thats the case with all the free web ones
I think the main draw of SD is the infinite flexibility and privacy
Atleast, for me.
the free generators that used to be on this discord server were inflexible too.
Yeah I mean thats just how it is with a distributed platform
If you wanna control the model, process, etc. you have to host it yourself
or at least control the host like runpod. yeah
lama cleaner is now called iopaint. The button "dream" is gone and there aren't any models to choose from
they made that change ages ago
there are still models, but you're talking about a 3rd party software as a service website
how do I use adjectives in a multi subject prompt but make it so those adjectives only apply to one of the subjects?
which model?
one way is something called regional prompting
so you would have to know where your subjects will appear in the image so you can divide it into regions
what? iopaint uses remote processing?
regional conditioning
could also use Omost, it finally has comfy nodes
thanks!
What language was stable diffusion made in?
It's english only
most of it is probably python
hiiiiiii
@fervent thunder you read this? https://x.com/runwayml/status/1836391272098988087
Is there a tool like this in Forge WebUI or ComfyUI? https://github.com/chufengxiao/SketchHairSalon
that's really big news yeah
yeah. can't wait to see what lionsgate comes out with
I suspect they got told about the next upcoming generation of Runway models
maybe it will be closer to Sora as it is trending that way
everything is now better than sora - and kling just got both a new model AND motion brush for the older model
I've seen a lot of good Kling videos yeah
its possible that OpenAI is now gonna divert Sora resources to that GPT o1 thing now anyway
i've just run one image to video test on the new kling model and all it did was zoom in, so not sure if i'm impressed or not yet
they better do something. i know they were trying to target hollywood studios, but i think the studios have been laughing at them. and now that lionsgate is onboard with runway, the others probably will rapidly as well.
I feel like its more of a side adventure for Open AI whereas for Runway or Black Forest its their main thing
who knows, altman is a squirrel and distracted with trying to convince politicians to do stuff that isn't going to matter in the long run.
also - not sure if i told you this, but pull up flux dev and put this in for your prompt: "Yoshiyuki Tomino anime " and then add anything else you'd like after it and see what happens
try it with other prompts, so far, i have only found a couple that it didn't give a real nice result with
what if you apply the o1 reasoning approach to genAI
using something like aesthetics scoring
generate image -> aesthetics scoring, img2img->aesthetics scoring
RLHF move params based on wether the img2img step improved aesthetics scoring
gotta make sure the semantic contents of the image stick
so also compare that and use it in the RLHF
this way you should get a model that can similar to o1 'improve' their way through an image till a highly aesthetic end product as much as llms use reasoning steps to get there
like a chain of edits
hi guys! I want build a SD model targeting low-res images generation, so I guess i gotta train SD from scratch including unet vae clip..... who or which channel should I turn to? Are there any experts in the group who are experienced in training models?
thx so much 🙂
you can't train stable diffusion from scratch. you can train a check point for it, or a lora for it, but you can't train IT from scratch
cause its too costly?
among other things. what you want to do does not require you to train the base model from scratch. what you want to do is either train a checkpoint or a lora that will use the base model.
In fact, I've tried to train a lora or fined-tuned a checkpoint(by dreambooth), neither way meets my needs 😦
what, exactly, is it you are trying to accomplish?
To generate extreme low-res pic, allow me to show some
post them in the #🏞|general-with-images channel
excuse me but where do you guys find the models for Stable diffusion? im very new to this but i appreciate the help. the "checkpoints" just to be clear.
civitai
been thinking about something like this for a while
comfyui has loops now
so we could make a loop where it makes an image, runs it through a quality checker model
then changes your settings and generates again
at the moment the quality checker models are a bit of a let down though
hello
@sand flax what's that ?
I believe that designs and image structure will appear more accurate if the devs train the model to generate images in various angles. Because what 1 issue does most, if not, all models have in common? the inaccurate results of a perfect looking object, person, or place when it's flipped, 180 degrees, or upsidedown.
True and I believe the safety filtering was messed up as well. Surprisingly sd3 large and sd3.5 large don’t have to have the same issues. It’s not near flux level in humans but still ok.
Hey Sharma I agree, and I believe Flux is able to accurately picture its human figures upside-down like DALL E 3. Those two are going at it.
Yeah true, Flux dev is even better then dalle3 at humans I believe, maybe not in prompt following but it’s not really fair to compare since dalle3 has an llm helping.
People’s Early access images made with sd3.5 is showing promise, images look better then flux but slightly worse prompt following, text rendering and humans.
Exactly and surely Flux has more realistic value of making humans than Dall E 3 like when recognizing a popular celebrity.
Ohhhh, so is SD3.5 like a prototype or what? Ever since SD3 was "released" it doesn't seem impressive. Is it the official model that's out now or what?
Stability started to train a new model, which is sd3.5(large which is 8b and medium which is 2b). Now it’s in testing phase, some people have access in the discord and twitter and posted some images too.
It’s still not fully done training, and requires more steps then normal I believe but it seems decent so far. Just search it in google “sd3.5 twitter”
Alright I'll check it. at what date do you believe the model might be finally finished?
Hey Victor
My father died... And I hope to create more pictures with his style.... Is possible? Please help....
Hi
Not sure, I expect it should be finished soon but open sourced a few months later.
Like most AI developers and companies have moved into training models on AI videos now, so hopefully soon Stability finishes up.
I only have original I only have the original paintings and photographs of the paintings...
I wonder if there are any developer who is working on training AI models on universal sound effects -- generate having every sound, pitch, and voice ever heard. that way it may be played with an AI video.
So is Stable Diffusion 3 Medium any good?
I'm sure this is possible, there are plenty of style loras out there.
Hello, everyone!
hi
i think
this is the only server, that has ever had thus many ppl
on it
that ive had been in'
346,073
is like the population of my entire town
Not really, it has artifacts and horribly bad humans.
I would recommend flux over sd3 medium, it has almost perfect humans anatomy, by far better then sdxl and excellent prompt following and text rendering, both better then sd3 medium by a large margin.
i needa update my stable
so, Stabel diff Flux is wut i wan nu?
is that the latest version
or is its like, evrything, different than stable xl
currenlty have stable Xl 1.7.0
What's my best option for creating a book cover?
Flux is made by a different company but yes it’s a much better model(comparable to dalle3, ideogram, and can be considered better then mid journey sometimes)
It’s a massive 16b parameters with everything(sdxl is like 3.6b with everything) with quantization you can fit it in 8gb vram.
It’s basically now most peoples go to model, you can just search “flux 4bit guide” and you will find lots of tutorials.
Flux, you can look at the examples here: https://www.reddit.com/r/StableDiffusion/comments/1eon9n7/flux_its_amazing_at_creating_silly_children_book/
Any free WebApps running that?
A lot, https://fal.ai/models/fal-ai/flux/dev lots of spaces in huggingface too. https://huggingface.co/spaces?sort=trending&search=Flux
Thanks mate!
People seem excited by RWML partnering with Lionsgate, but might i remind people that Lionsgate is value bloated and is a bubble waiting to burst. They're the old guard of hollywood, the last of the weinsteins. They're partnering with RWML, giving them the entire catalogue, desperately .
i mean, lions gate put out borderlands movie. This is their hail mary to cut costs .
It's like when hollywood discovered CG could replace miniatures and they fired all the seasoned artists and hired VFX artists for abusive rates and we had a WHOLE lot of bad CG. Movies in the mid 90s were generally worse than movies in the 80s or early 90s due to the abandonment of practical effects.
lionsgate is bout to churn out a whole lot of crap on their way to insolvency
With 64Gb RAM, I run Flux1.Dev on my 8Gb VRAM RTX 2070 - about 2m 40s/image
Nice, what are you using to run it?
Hi, i dont understand whats the difference between the normal and xl model (e.g. SDXL or ponyXL).
afaik the XL has bigger model ? or LoRa ? and will it gives better result ?
The bigger the better.
xl and pony are the same model. Pony XL is a refined version of it. unfortunately the text encoder layer is disaligned and it only works with the pony loras and embeddings. so people consider it it's own base model.
I think the pony phase has been a tulip mania situation. novelty and memepower. there are much better sdxl refines depending on your purposes
people keep calling it a base model as if he trained it with millions of dollars and billions of images. i think he just refined sdxl with many thousands of images though.
hello!
basically every single hurdle out of the gate has been overcome. the ability to train it. the ability to run it with less resources. the ability to run it faster. so with those negative points eliminated you have mostly only the good points, which are the prompt adherence and the quality. of course you have currently what is always the case with a new model, which is lack of the wealth of community contributions, but that's a matter of time, as long as it's worthwhile for the community to pursue (and it is, since it seems to respond so well to training). so in my humble opinion, it's indeed the new king to dethrone. certain cult followings around prior models will still be there, certainly. People are still using 1.5.
i love that it was never hyped. it just appeared
different company but the original authors of stable diffusion 1 and sd3
yeah calling it a different model is a lie, its a finetune of sdxl but a very large scale one. It was trained with 2 million images and does have considerably more knowledge then sdxl in many things but I still prefer normal sdxl models because they worked well with regional prompting. Now I just use flux since it's much better.
was it millions? ill note that
I find it doesn't know geographic locations even half as well as SDXL. poses, characters, outfits, things that are character focused. i think that's where it shines
Yeah it sucks at backgrounds but great at what you mentioned.
just found out ruinedFoooocus is brought up to speed with Flux compatibility. hmmmm
when i got onebuttonprompt working, i noticed a mention of it being in ruinedFoocus built in. so i'm thinknig "oh yeah that project lets go check it out"
Yep, I agree with you. The only slight dislike I have is that it seems to generate somewhat "unaesthetic" and "similar" images. I am sure some stuff like dpo/spo/ncp could easily solve that problem but it's still definitely the king. People like sd1.5 since its just so fast and quick to generate a image and you can quickly add a style to some image you have.
someone's removed the guidance from flux. and there is a lora which disables the "aesthetics" block on the faces.
people have been cracking away at it
Yeah it's a pretty minor thing and the community can fix it for sure.
#🆕|sd3 message you were the one who told me bout this neat thing