#💬|general-chat
if you had a game that nvidia didn't get round to making work, it'd never work
you also had to have 2 cards that were identical i believe.
i guess multiple gpus have become feasible again because the tensor cores are parallel in nature and it doesn't take much extra work to make things work across multiple cards
yeah will probably always have multi gpu now I don't see it going away again
Hi, does anyone know how to disable saving the prompt to the image file name in automatic1111? I can't seem to work out how to do it or find any info on it.
i think i saw an option for that in the settings somewhere
sent you pm with screenshot, no images allowed on here
that's why there's a #🏞|general-with-images channel
Hi guys, i am new here, please tell me what's going on here?
Are you a bot? Why would you join a server not knowing what it's about?
no, i am human
i am looking for online job, like freelance please help
I wish I was asked that question, I would have responded differently!
I don't think you'll necessarily find a job in this discord
this is my email address black@darrielsdigitalhub.online for any available online jobs please dont hesitate to inbox me, i am available immediately.
I meant, asked whether I'm a bot!
You could perhaps post your work on artstation, or similar sites and ask for commission or something, I don't know the routes for artists, I'm a software programmer.
You should definitely make a portfolio of work or something though, that's all I know about art.
put your work on artstation, make videos and put them up on youtube, patreon etc.
You need to use chinese models for that, they dont care about copyright

hunyuan, auraflow?
go look on civitai
also there are loras for stable diffusion
maybe you get lucky and find loras for the specific characters you want
or if you are feeling frisky you can make your own loras
anybody can hop on voice chat and help little bit with face swap
if someone is down im sitting in a channel with my friend, all help will be appreciated gang
anyone know what a good price is to pay to have someone get stable diffusion running on my pc? i keep running into a bunch of errors. thanks.
IDK if there is really a service where you pay someone and they help you install something
Just follow a tutorial or ask ChatGPT. You should only pay someone if you need a highly specialized solution like setting up a whole render farm or something.
because discord will popup 'suggested' servers for you to check out and not tell you what they are
i know something better. go to youtube, search for scott detweiler comfyui and watch scott's comfyUI tutorials
The guy was fishy. Probably not a bot but most likely a scammer. Either way there have been a lot of scam bots recently, it's a real plague.
he asked what this server was. that's standard for discord users that got a suggested discord server from discord.
if he was a scammer, he would do something else.
No. He was offering "work". His website is an empty shell. A complete facade. This is exactly how scammers operate.
i have automatic1111 installed and working
what is the base model called? i’m having trouble adding models into it
is there a guide that shows example pictures for all Forge styles? There is a lot of them and a reference would be helpful
Thank you so much to @flint harbor for the permission!
I'm researching how we can make LoRA work better for designers while keeping human creativity instrumental. If you've played around with LoRA or are interested in AI + design, I'd love to hear your thoughts!
This survey will help shape my thesis on making AI a better creative partner. Takes just 5 mins to complete:
https://docs.google.com/forms/d/e/1FAIpQLSd9i7BRn1rEXYHeK2Zz2TXyk62Xw6l8P5YRVwI5uCImFdjniw/viewform
Thanks in advance! 
What are "Enhanced Design Matrix Systems "?
Basically a new method and model we're trying to develop, but we haven't fully realized it yet. 
No you are using a specific jargon. I wanna know. What's a design matrix? What's a design matrix system? And what's an enhanced design matrix system?
hello
you should learn to use google https://en.wikipedia.org/wiki/Design_matrix start there
sent you in your survey - i think you and i need to talk.
hey
you should use the infinite image browsing extension
itll keep all of your images where you can look at them in your a1111 plus the prompts and settings for the images
thats what i do because idfk where its stored otherwise 
i have no clue
hey so do we get lower quality pics if I do higher batch size and lower batch count? or is the quality of the generated images same irrespective of if I do higher batch count or size?
Batch size and count doesn't affect quality
alr. thanks
hey so i've been out of the loop since before stable diffusion 3 came out.
anything significant that I should know about? xd
thats worth downloading and playing with.
If your GPU has 12gb or more vram you can try SD 3.5 or flux
Both are heavier than sdxl but give good quality
Nope they are gone
Hello! Does anyone here speak Spanish?
ive got a 3080ti. is the difference in quality big enough?
you can try, i would still go for sdxl/pony etc
but SD 3.5 medium and flux fp8 / q8 should work okay too (1-2 min per image)
alr.
thanks.
personally I go with SD 1.5 or SDXL if its less than a 4090
I rent a pretty wide range of stuff and older cards are fine but not so much for flux
there are nice acceleration loras for SD 1.5 and SDXL now too can make image in 4-8 steps https://huggingface.co/RED-AIGC/TDD/tree/main https://huggingface.co/h1t/TCD-SDXL-LoRA https://huggingface.co/tianweiy/DMD2 https://huggingface.co/wangfuyun/PCM_Weights
yeah flux dev only works on my 3060 12gb with med vram enabled, other than that using the fp8 and or schnell takes about 15 second per step
8 steps on schnell versions or 20 on the fp8s
hi
hello
anyone here have experience with controlnet for sdxl?
i want to train my own controlnet to generate a spritesheet from a single sprite. i think controlnet is my best bet. my dataset is around 1200 spritesheets ripped from a game. input is the spritesheet with only the large front sprite, output is a bunch of other poses, including the large front sprite (in the same position)
is that dataset large enough to train that kind of transformation?
they're pokemon sprites from gen4 if you're curious
does this sound like the kind of image generation task controlnet is suited for?
no I think this needs checkpoint or lora
i tried a lora approach before. provide a bunch of complete spritesheets and then do inpainting to try to complete and the results were comically bad
though i was using flux and flux gym, which restricted me to 150 training examples
sometimes Flux doesn't want to do something
should i try that same approach with sdxl then?
SDXL won't manage it I think
my concern is i'm not sure if standard diffusion models have the "cross attentional" properties i'd need for "translating" a sprite to different poses
are there other models that would be better suited to this task?
not at the moment no
wasn't there a lora for rotating something in an image recently?
my electricity freaked out like crazy and now my Loras folder is empty Lol
turn off your computer and buy a new hard drive immediately
if your drive has filesystem corruption you could start overwriting other files
Chkdsk
ohhhh I dont really have the budget
"buy a new one" might not really be important
but i'd turn it off and look up troubleshooting steps on your phone
I could run a full system check
cause there are circumstances where a corrupted filesystem can lose files.
i'd do the research on your phone, and then come back to the computer once you know the steps you are going to take
okay thank you
best of luck
Hopefully you can recover them.
If the drive is an SSD/M.2 NVMe and it didn't fail, you can use Recuva to try to restore the files. In most cases you should be able to get some if not all files back.
A 3.5" mechanical SATA drive should be (somewhat) recoverable using chkdsk & defrag. With mechanical drives there's more chance of losing files or pieces of them.
SSD & M.2 media don't ever erase all data, unless they actually fail.
Same for flash media & SD cards. All content is recoverable.
right, the main concern for data loss is a corrupted filesystem writing over data, which is why i suggested turning the computer off until you know what you plan on doing. can't erase files without electricity
anyone know how long it takes to generate a 1024x1024 image at 30 steps on an Arc B580?
using forge, it takes about 2 minutes on my GTX 1070 and took only 15 seconds on my now broken RTX 3070
Do the different WebUIs sample differently? Using the same Model, LORAs, prompt and settings give me different results in ComfyUI vs ReForge.
That's normal, as they don't work 100% the same
Ahh, too bad. Was hoping it was a settings thing or maybe something I could configure in ComfyUI, since, for anime, I prefer ReForge. ReForge runs worse on my PC, though 😉
What's your GPU?
2070 Super
By "runs worse" I don't necessarily mean slower. ReForge just slows down my entire system while ComfyUI allows me to continue doing other, non-resource heavy things on the side.
do you have to use a GUI?
something like Diffusers would slow down your system minimally cos its just some python
Never not used GUI, seemed daunting when I started image generation.
if you can learn Comfy noodles, Diffusers isn't any harder than a big Comfy workflow
How difficult is it to do stuff like Hi-Res Fix and Adetailer with Diffusers?
Or LORAs for that matter? Is there a beginners guide I could follow?
for Adetailer you could use the Flux fill pipeline https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/flux/pipeline_flux_fill.py
and use a separate VLM library to generate the mask
for hi-res fix the Flux img-to-img pipeline https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/flux/pipeline_flux_img2img.py
and use a separate upscaling library to do the upscale
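A rough sketch of what that hi-res fix flow could look like in Diffusers, for anyone following along. Assumptions, none of which come from the chat: the helper names `hires_size` and `hires_fix` are made up for illustration, a plain Lanczos resize stands in for the separate upscaling library, the `black-forest-labs/FLUX.1-dev` weights are used (they are gated and heavy), and the `scale`, `strength` and `steps` defaults are only starting points. The same upscale-then-img2img pattern should carry over to an SDXL img2img pipeline on smaller cards.

```python
def _snap(value, multiple):
    """Round down to a multiple of `multiple`, never below one multiple."""
    return max(multiple, int(value) // multiple * multiple)


def hires_size(width, height, scale=1.5, multiple=16):
    """Target size for the second pass, snapped to a multiple of `multiple`."""
    return _snap(width * scale, multiple), _snap(height * scale, multiple)


def hires_fix(base_image, prompt, scale=1.5, strength=0.35, steps=20):
    """Upscale `base_image` (a PIL image), then lightly re-denoise it with img2img."""
    # Imported lazily so the size helper above works without the GPU stack installed.
    import torch
    from PIL import Image
    from diffusers import FluxImg2ImgPipeline

    width, height = hires_size(*base_image.size, scale=scale)
    # Assumption: a plain Lanczos resize stands in for a dedicated upscaler.
    upscaled = base_image.resize((width, height), Image.LANCZOS)

    pipe = FluxImg2ImgPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
    )
    pipe.enable_model_cpu_offload()  # trades speed for lower VRAM use
    result = pipe(
        prompt=prompt,
        image=upscaled,
        height=height,
        width=width,
        strength=strength,  # low strength keeps the composition, adds detail
        num_inference_steps=steps,
    )
    return result.images[0]
```

Low `strength` values keep the original composition and mostly add detail; higher values let the model repaint more of the image.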
Thanks for the pointers!
I'm gonna first see if the slow-down/freezes are my PC or some setting, though. After learning comfyUI and ReForge, I'm not exactly thrilled at yet another new system 😄
A controlnet can be a good choice for this, depending on what you're trying to control. But you also need a model that will give you the output you want. You will need a lora that has been trained on spritesheets. Or a customized checkpoint would be even better. The purpose of the controlnet would be to guide the output. It's basically an abstract prompt that can represent any concept. It really depends on what you need, what you can achieve with available tools and if training a controlnet will improve your results. I'd say you should first train a lora and then see if an IP adapter plus that lora can already give you satisfying results. Have you tried using a spritesheet as an IP adapter input and defining your character with either a second image input or a text prompt?
Ok so how about this:
1: produce descriptions of my sprites
2: train Lora on all my complete sprites sheet training set images and descriptions
3: merge into SDXL as a checkpoint
4: train a controlnet with one input as the front sprite, one input as the description from (1) with the Lora keyword appended, and the output being the completed sprite sheet.
I have never trained a controlnet, so I don't know what the right approach would be. I also don't know what the requirements are. Are you trying to do something like openpose but for pokemon characters? Or are all the poses predefined? If the poses in each spritesheet are different, because the characters have different body shapes and limbs etc, then you have to plan your controlnet input around this limitation. But it's indeed possible that it could learn to shape a single frontal view into a wide variety of poses.
I’m trying to generate the 8 walking sprites from a single sprite. They are specific poses
Oh you mean 8 different viewing angles? And each angle will have multiple frames? I don't think you can fit that into a single output.
For a task like this your best chance is to find a model that has been specifically trained to do just this. Animation is difficult and you won't get good results from a regular image generator.
It’s less “animation frame” and more “left version and right version”.
anyone still using sd1.5 here ?
yeah its my main model
But as I said, if it's 16 individual sprites you will have no luck. There will be too many artifacts and inconsistent details. I don't think these models can create sprite detail down to the pixel level (at least not out of box). and if you upscale the spritesheet it will be too large. You might have to split it into front view and back view or something, generate twice and hope it's consistent. Or you could research if there's already a model that can create your sprites. Seems like a no brainer to train a model that takes a front view (or description) of a sprite and create different isometric views. I know there is one that can rotate an image, but not at an angle afaik.
My sprite sheet is 256x256 so it might not be impossible. I could also maybe figure out some other post processing pipeline after generation
Been a moment. So is everyone using sd3 now? Or did that end up falling short?
I was last living in comfyUI / anime diff
I'm using SDXL, it has a lot of models and support and runs pretty fast. It has issues understanding prompts at times, but overall pretty solid for my purposes at least.
sd 3.5 large and sd 3.5 medium
medium is more artsy, it's also smaller
How can I create floor plans that follow instructions with Flux? Do I need a Lora? I found one on civitai, but haven't had a chance to test it
I can guide you on that
@dusty trellis
Medium has a lot more smooth details
the amount of parameters it has, larger will have more knowledge, medium is a trimmed version so it's not as hardware demanding, same model being distilled
emmm
What are the "Override Settings" in ReForge? I just copied an image into my ReForge and suddenly this setting popped up.
so, decided to put my SD stuff onto a dedicated nvme ssd today. most of my models were already stored on my main nvme but all my ui's stored on a sata 1 hd.
Here's a comparison
300gb nvme -> nvme = < 1 minute
200gb hd -> nvme = 28% complete after 20 minutes, still copying....
11 mins remaining
The same as always. Append to your image and only infill the appended parts.
with 1.5 powerpaint v2 is good
yeah there are some comfy nodes https://github.com/nullquant/ComfyUI-BrushNet
next youre going to tell me we have AGI.lol ironic timing
o3 is kinda close to agi (still worse than humans) in just text according to benchmarks, except a bit expensive 😆 ($5000 for a single eval, 25,000x more expensive than llama 3.3 70b, and way more expensive than humans)
it's much cheaper to pay some chad £10/hr to sit in a box and pretend to be AI.
We don't have AGI. We don't even have AI. We have good software that can do crazy stuff when used by skilled people. Even ChatGPT, the overhyped search engine, still talks nonsense 90% of the time.
AI is just a lot of complex graphs referencing other graphs
IMO I think calling it artificial intelligence is a stretch, seems like we are artificially hyping it up by adding intelligence on the end of it
Yes. The great benefit of neural networks is that you can just feed them a bunch of stuff we do and they'll figure out the rules, so anyone can do the same without having to learn all the skills themselves.
it's that 10% where the magic happens
At least people can be self aware when they say things they don't really understand or when they outright lie. With ChatGPT you never know if it has seen the information it gives you or if it just made it up.
it extrapolates the answer from its NN graphs, so who knows
Hello
One thing ChatGPT can do is upset me just as much as a real person.
P: How many R's are in strawberry
GPT: There are 2 R's in strawberry.
P: Incorrect, there are 3 R's in strawberry.
GPT: You are correct, there are 2 R's in strawberry.
Get a pillow. At least it'll hug you if you cry.
and a box of tissues
its okay
Feminism preys on the old and the weak.
Can this discord channel just ban anyone who is into Blockchain & NFTs, by default?
I haven't used discord for quite some time, only recently for stable diffusion. Is this flood of scammers a discord wide phenomenon or is it just this server? It didn't use to be this way. Everyday I see people with weird requests, as if everyone is testing their newest scheme.
Just report the message as spam and block the user
there isn't much anyone can do against that kind of crap in general; polluting a channel for conversation with self promotional advertisement spam
Hello
You can also save up, track them scammers down one by one and beat them like Jay and Silent Bob.
just this server allows that out of the bunch im in
its literally only this server yeah
it's actually all servers - what a lot of them do is try to hack inactive accounts, and then use the ones they can hack to spam with. but most servers pay closer attention with various bots of their own and catch them fairly quickly
Hey I am thinking about renting a GPU, is Salad Cloud legit? $0.18/hr for an RTX 4090 sounds a bit too good to be true
lesser hardware on other clouds is more expensive
that's cents per hour, essentially they are paying to heat your home
They look for servers with large numbers of users to try n scam. Buzzwords are pretty good giveaway.
Anything popular too. Especially at ATH prices. Ppl desperate 😭🤣
is that a scam then?
essentially phishing for users to sign up and gain access to their pc via malware you install
Unrelated to your comments, sorry. In regard to scammers spamming in servers.
Renting gpu might be legit. They do it in mining & cloud computing so. Better than the ppl tellin ppl the secret to immortality in 6 weeks for doge 🤣😭
Guaranteed 🤣🤣
Sounds like a great idea but I want to hear about some experience from you guys😂
Hi does anyone know what would be the best way to try get Ai to mix different elements from different animals together ? I wanted to try it if it’s possible? With a dog and a crocodile for example so you have some idea of what I mean,
i have a 4070 ti super, that's $0.13/hr. that's more profit than can be had mining coins ($0.028/hr)
You could try using several loras and have them activate at different stages and different strengths perhaps
How would you do that ?
i've not tried it, but saw a video describing it. you just chain the loras, then use the keywords in the prompt I think.
Have you tried it? That should not be a problem at all. You might have to be specific, like which body parts are from a dog and which are from a crocodile.
I’ve tried stuff before like (gharial teeth) and it didn’t do anything.
You need a model that has been trained on the animals you mention. You might have to increase the weight of words that don't show up.
Or work out how to train some loras?
That's overkill. High quality models are absolutely capable of mixing different animals.
Do you know any ?
Flux and SD 3.5 are both good.
sometimes you can train a lora faster than you can find the prompt tokens
do u know how to create pictures using a lora but with my own image as the background? i mean for example: a woman (my lora) sitting on a bed, where the bed isn't generated by the model but comes from an uploaded image
yeah this is called inpainting
it works well
there is a more advanced way called noise inversion but inpainting is still fine
I've actually done barely any inpainting TBH
but maybe someone else will have a workflow
ahh ok
im ngl, i dont like comfyUI, anytime i see something that requires it, even fun stuff, then i dont wanna try it because i know something is gonna break or not work even when i follow instructions perfectly
yea u are partially right but i'm trying to improve my skills, i believe in this software
What's the best img2video generator for stable video diffusion? And should I use a1111 or comfy or something else?
Stable video diffusion itself is a model and kinda outdated now. Comfyui is definitely the ui to pick if you are doing video gen. Not sure if other uis support even one video model.
Hunyuan is the best text to video model for sure and comparable to top closed models but will take a very long time. Only text to video and video to video support.
Ltxv is another choice, it’s incredibly fast and decent quality with i2v. Supports t2v, i2v, and v2v.
if only it wasnt a pain in the butt to use...
@karmic brook
mid and stable ,which is best ,iwanna ask
anyone have recommended settings for 3.5 large?
Hi
Hello
hello
im new here, how do i run stable diffusion in offline mode ??
^ v^)//
Hey, if you have a good GPU you can follow an Install Guide for a local Webui:
https://github.com/CS1o/Stable-Diffusion-Info/wiki/Webui-Installation-Guides
im using a ROG G531GL, is that good enough to install that software ??? and is it free to use ??
Which GPU does it have?
And yep local webuis are free and uncensored depending on the models you use
hey i just started trying sd today, what's a good inpainting model for forge ui?
base Flux on its own
inpaints surprisingly well
if you can get alimama-creative/FLUX-Controlnet-Inpainting working on forge then that is good
black-forest-labs/FLUX.1-Fill-dev has both pros and cons
I would be surprised if forge had Brushnet and Powerpaint v2, but if it does, I think those are great, for SD 1.5 and SDXL
its a good question TBH
they have a channel with a bot you could use #artisan-faq
otherwise most people use a piece of software like this:
https://github.com/comfyanonymous/ComfyUI
https://github.com/lllyasviel/stable-diffusion-webui-forge
or a library like this:
https://huggingface.co/docs/diffusers/en/index
you also need the actual models, which you can get from places like Huggingface or Civit AI, for example here:
https://huggingface.co/SG161222/RealVisXL_V5.0/tree/main
I gave a range of options cos different people like different software. Fairly sure RealvisXL is not hated by anyone so that's a safe choice.
Thank you! Looking for image to video specifically, so you would recommend Ltxv?
LTXV or Cogvideo yeah
Merry Christmas
hello hello hello, hi everyone. what's up, how do I generate naked chicks with AI? I hope you can help
i finally got a single comfyUI generation to work...
with a video gen
i cant touch anything ever again or it breaks
thats so real
its how everything is for me 
"its working?" dont touch anything, Mania, it may never work again
i spent like 4 hours yesterday on LTX only for it to break and not work and i still don't know why, then some guy told me about Ruyi which is good (or decent enough) for the content i wanted to make and now it magically works
not everything but its how ComfyUI is for me
i run forge
i was running a1111 but that was breaking on me daily in the WEIRDEST ways

dont get me wrong, forge is too. but its doing it less so ill take it 
ive heard comfy was good though but it always looked so confusing i cant trust myself not to break it
i use a1111 for SDXL and forge for FLUX, and now comfyUI for video gen stuff
it is confusing lol, the only way i got this one to work was detailed instructions for every step and comfyUI manager

my laptop couldnt handle videos or flux ngl
idk how but it runs sdxl and pony and illustrious
its DYING but it runs it LOL
not even detailed instructions could save me i feel
try Swarmui, its comfy but visualized like a1111

whats the difference though
like is it better?
i personally think so
what does it do better 
its less resource heavy for one
i heard complaints about that one tbh
something about it not being updated often enough so it keeps breaking? iirc
i use swarm daily, and it gets updates almost daily.
if it works better and doesnt kill my poor laptop as much
um referring to a1111 maybe, not swarm
Use stability matrix to manage various UIs
i can just use two generators
just what i heard 🤷♂️
i use stability matrix!
its new for me
but i like it
hearing something vs actual experience and using it are two different things
i like it cos i can use the different uis and have everything symlinked
so SM has swarm on it?
yea
tbh i just downloaded the package with what i was familiar with, i didnt even bother to look at the others
You can run hunyuan t2v on 8gb vram now but you have to wait for 30+ mins😄
i have 6gb!

my laptop runs on pure fear, pretty sure. cant see any other way it manages to run stuff
it has its own method of upscaling and tiling that i think is better
Forge will work best with 6gb vram
ye its what i use rn
but my laptop is still dying

we will soon have a rock solid version.
it will never break and it will fulfill all your desires.
I'll wait until they have image conditioning in hunyuan
text2video is too annoying for me
you are just wasting your time trying to upgrade every 2 months
why 3 years? It's probably a matter of weeks
nah
before video/audio/image gen is perfect its at least about 3 years
by then it will be so good you can literally do anything with some prompts
dont give me hope like that

its very good even now
but it takes a lot of love and skill
in a few years it wont take too much effort
Merry Christmas kids
what did santa bring you?
Well apparently SD3.5 large and Hunyuan
I was up waaaaaaay too late last night playing with video gen
Well… im still working technically soooo Santa has not stopped by the house yet
OR
He’s there and im here
Heck, I know darn straight im on the naughty list… I made that list for lifeeeeee
does putting data directory of stability matrix for my ui and models into a hdd rather than ssd make a difference to generation speeds?
i will have a look at those in a bit
it didn't have negative prompts, and so i couldn't remove parts of an image
I am trying to train an SDXL LoRA but I only have an 8gb gpu (1070), and it seems like I need 12gb or 16 at least. I have googled but I cannot find a website where I can train a LoRA online and pay. Is it possible to rent a GPU and train it that way?
hi, need some help, is there a way, to write a line of prompt with several options/things, and let the SD choose one of these at random every time a picture is generated?
lets say i want to make a very long prompt, and generate 100 images, how do I make it choose these things without breaking the result completely?
It depends on the UI you use but at least comfy and auto know this syntax. (Comfyui 100% sure):
a {green | blue | yellow} bottle
forge/reforge
dynamic prompt with wildcards
It is completely random whether you get a green, blue or yellow bottle. Do not use regular brackets as they won't work
in swarm it has it default built in
where u can do "1girl <random:blue,red,purple> hair"
and itll pick one of them during gen
will it still work if i dont want, like u said 1 girl? just lets say 5 words in 5 groups, and choose 1 thing at random from each group
no bottle (blue red green) kind of thing
but more (apple, orange, banana) choose one
Yes you could create more complex expressions { a man dressed like a cop chasing a criminal wearing an orange overall | a woman selling corndogs... }. You could also mix these expressions like a {fireman | cop | native american | steelworker} is {running | dancing | crawling} wearing a { beautiful bandana | unicorn t-shirt}, ......and so on
and this is hard coded into SD? if so then wonderful, watched most of the video sent by ellie who pitched in too, and i saw i can just do {thing|thing2|thing3} and it will choose one, or do {2$$thing|thing2|thing3|thing4} and it will choose 2 out of 4
can u just confirm if what im saying here is right mate?
I do not know about the 2$$ or Random Additions Ellie mentioned. But if you need two out of four you could just use "{ 1 | 2 | 3 | 4} and { 1 | 2 | 3 | 4}". Yes it could lead to 1 and 1 selected but who cares if you generate 100 pictures or 130 and delete the 30 where this happened?
that'd make the prompt several times longer xD
shame u dont know about the 2$$ thing, was gonna ask if 2 is 2$$ then is 4 4$$$$
I would just start easy and simple, overthinking is the enemy of starting 🙂
Maybe the image in your head is even too complex for some models and you need other ones that do not understand this logic expression within the prompt. Or you simply use a workflow as API and write some python code to call it.... Tons of possibilities
thing is, i dont want to create 1 image, change prompt, and do another 1 image, but leave it for the night, and have like 50 pics with randomized results of chosen words
yeah and therefore you got a solution. 🙂
btw, is there supposed to be a space in between | and a word? or just word|word
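To make the `{a|b|c}` and `{2$$a|b|c|d}` forms from the messages above concrete, here is a minimal Python sketch of how such an expander could behave. It is an illustration only, not the code any UI or extension actually runs: the function name `expand`, the ", " joiner for multi-picks, and the whitespace stripping are all assumptions. In this sketch `word|word` and `word | word` behave the same because options are stripped, but whether a given UI does that is worth checking in its own docs.

```python
import random
import re

# Matches an innermost {...} group (no nested braces inside it).
_GROUP = re.compile(r"\{([^{}]*)\}")


def expand(prompt, rng=None):
    """Replace every {a|b|c} group in `prompt` with randomly chosen option(s)."""
    rng = rng or random.Random()

    def pick(match):
        body = match.group(1)
        count = 1
        # Optional "N$$" prefix: pick N distinct options instead of one.
        if "$$" in body:
            head, _, rest = body.partition("$$")
            if head.strip().isdigit():
                count, body = int(head), rest
        options = [o.strip() for o in body.split("|") if o.strip()]
        count = min(count, len(options))
        return ", ".join(rng.sample(options, count))

    # Repeat until no groups remain, so nested groups resolve inside-out.
    while _GROUP.search(prompt):
        prompt = _GROUP.sub(pick, prompt)
    return prompt
```

Calling `expand("a {fireman | cop} is {running | dancing}")` in a loop gives one randomized prompt per image. The `2$$` form here samples without repeats, which avoids the "1 and 1" case of two independent four-option groups (that happens about 1 time in 4).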
There are many websites that let you pay to train, CivitAI, Krea, and so on
@median jewel I personally prefer replicate
my files are large tho so civitai wont work :C
Your image files? Have you tried compressing / converting them to reduce size?
mhm i have around 10000 pictures trying to create a style and the pictures are 1500x1500
so they take up a lot of space :D
no need to be 1500x1500, reduce to 1024x1024
well nvm they are at least around 3190x4250 but i need them in the same format they are in. how much would i need to scale them down to keep the information
yeah that's way too big tbh lol 1024 is best for sdxl
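For the "how much would i need to scale them down" part, a small worked example in Python. It assumes the goal is simply to fit the longest side into 1024 while keeping the aspect ratio; the function name `fit_within` is made up here, and trainers that bucket images by resolution may pick slightly different sizes.

```python
def fit_within(width, height, max_side=1024):
    """Scale (width, height) down so the longest side is at most `max_side`.

    Keeps the aspect ratio and never upscales smaller images.
    """
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within(3190, 4250))  # (769, 1024)
print(fit_within(1500, 1500))  # (1024, 1024)
```

So a 3190x4250 image shrinks by roughly 4.15x per side, which also cuts the file size of a 10000-image dataset considerably.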
when i try sd1.5 LoRA training in kohya_ss it works with no problem but when i do it for SDXL it doesnt work at all and crashes, it might be because of my GPU's 8 GB of VRAM but could the picture size potentially be the problem as well?
you might want to just switch to using luca taco's trainers on replicate
I'm new to AI image generation and want to use something open sourced like stable diffusion. I am confused just jumping into this chat. Is there a starter guide? I went to the website and do I need to pay $10/month? The Spawning Inc. one
I am a painter and want to create digital works that are in my head but are so complex that it takes a really long time to draw them on paper or craft digitally
u can get it on ur pc for free with a little effort
and is it a finished product or is it in Beta testing? That's what I have gathered. Is this the right chat to be asking these questions?
dunno, joined it today myself
and its a big data space man, there's a ton of things to explore
which UI to use, which checkpoint merge works for u
learns prompts and other stuff
find and download lora files
and if ur pc is not great, it will take a while to generate
lots of models have had their full release now and not just their beta testing release
id recommend just typing "how to install local stable diffusion" on YT, same thing i did
I would recommend checking the install guides from the pinned messages in #🤝|tech-support . Because most yt tutorials are outdated or just wrong.
or stability matrix if not amd :p
i always suggest it just cos how easy it makes things
@warm junco have u tried any video generation? with your amd card
i used AnimateDiff in the past
didnt try any of the new stuff since im not a huge comfy user and dont like to download 60gb for video stuff xD
i was wondering if there is any video stuff i can gen with only 16GB vram
LTX Video at Q8 on WSL2 or Linux should work
but not an easy setup
q8 is fp8?
nope its Q8
🤔
it wont work with Zluda
yeah just realized lol
i have a GT710...
its linux too
but i guess not fast enough
will just ignore the video generation scene
yea for now its still very early for local setups
give it some time
true
Nothing else comes close to hunyuan video if you have a good enough gpu to run it.
yeah it even rivals several top closed source models, now it should even work on an 8gb vram nvidia gpu at least but will take a very very long time.
Is anyone training an ai to be able to generate images using another AI where they can detect if the eyes are crooked or there is more than or less than five fingers or toes and check for other issues and make the proper correction to them. In the case of comfy ui it would be able to use any node and settings in order to accomplish this task . The actual image generator ai doesn't know about any of these things so having an ai that is trained on knowing these things would perfect/finalize the image generation process. As far as I know all of this technology or features already exists but there has been no ai trained to utilize it yet.
I decided to make a reddit post.
https://www.reddit.com/r/comfyui/comments/1hm9qhu/another_ai_in_the_loop/
does anyone know how to remove nsfw protection from lora?
lock the door and put on a dont disturb sign
guys stable diffusion runs very slow on my PC and sometimes got errors. What should I buy into my PC to make it run fast?
What's your GPU?
Come to #🤝|tech-support
I don't know how can I check it?
Thanks for the help by the way
In the taskmanager under GPU 0 or 1
No matter what some people here may say, AI image generation is 100% beta testing. If you wanna "quickly" get your ideas on paper, it's a steep learning curve. You will have to use a similar workflow to painting, so you will absolutely need your painter skills. If the things you're trying to do are visually well defined, there are lots of techniques with which you can control the image generation and get the result you imagine. If however your ideas are more abstract and difficult to sketch or put in words, consider yourself lucky if you get something that even remotely resembles the image in your head. It really is an art on its own and not a magic wand.
um...
no
those of us that have actually taken the time to learn how to prompt and learn how the various AIs we're using think can get what we're after with no real issues, usually the first generation of a prompt
With prompting alone you never get a finished piece (unless jank and AI artifacts is what you're aiming for) and you absolutely don't have full control over the composition or the poses, expressions, details etc. He said he's a painter, so I'm assuming he wants to have more control over the art he's making. I took a look at your YT channel and that doesn't impress me much. The stills that is, not the animations.
oh but i do. and no, there's no jank or AI artifacts in it. and i have 100% control over the entire thing. see, that's the point. you just haven't taken the time to actually learn how to use the technology - or you are using older models
and i don't honestly care if you're impressed. you're not my target audience.
As a non-creative who came into this space because of AI, I have had to take multiple crash courses on certain subjects like aperture sight and focal length. I learned to prompt, but at the same time, to learn to prompt I had to learn about those subjects themselves
Your stuff is full of jank like deformed faces and limbs, asymmetrical details, prompt bleeding, inconsistent lighting, oversaturation, out-of-place detail and so on and so forth. I guess your standards are just lower than mine and that's okay. But don't say you got everything under control just by prompting, because you evidently don't.
no it isn't. I knew you were only here to troll
I'm pretty sure you're the one trolling, acting all oblivious. But whatever man. Live your dreams.
if it wasn't for them i'd be going on thinking you can generate video on a 3090, what a crazy belief i had before i read their correction
@slender vault i cant dm you and i need help with swarm 
small curiosity about the illustrious models.
i noticed some stuff on civitai that utilizes both illustrious and pony loras together
are pony loras compatible with illustrious checkpoints?
are there any img2video models that run locally and dont eat up a couple thousand dollars worth of vram?
For text prompts is it better to keep them short or do them with extra detail?
shaking in my boots rn, gonna tweak if someone says shorter cuz that is NOT what im doing 
Ur in the discord of one of those models
Ltxv is your best bet, should fit in 8gb vram and be very fast.
depends on the model, flux and sd3.5 can usually handle any size prompts but more detail might make it better quality. Sd1.5/sdxl usually prefer tags and you want to stay under 77tokens.
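The 77-token figure comes from the context window of the CLIP text encoder: 75 prompt tokens plus a start and an end token. A minimal sketch of splitting a longer prompt into windows of that size, assuming the prompt is already tokenized (the integer ids are placeholders, not real CLIP ids, and `chunk_tokens` is a hypothetical helper, not any UI's actual function):

```python
CLIP_CONTEXT = 77          # context length of the CLIP text encoder used by SD 1.5 / SDXL
USABLE = CLIP_CONTEXT - 2  # the start and end tokens take two slots


def chunk_tokens(token_ids, size=USABLE):
    """Split a list of prompt token ids into chunks that each fit one CLIP window."""
    return [token_ids[i:i + size] for i in range(0, len(token_ids), size)]


# Placeholder ids standing in for a 100-token prompt.
prompt_ids = list(range(100))
chunks = chunk_tokens(prompt_ids)
print([len(c) for c in chunks])  # [75, 25]
```

Real token counts need the actual tokenizer; word counts are only a rough guide because one word can map to several tokens.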
lol I mean it might have some effect but not much
so nothing long form just single words
If there is anyone in here that could help me with SD, please DM me
Hey, if its technical feel free to ask in #🤝|tech-support
Short prompts mean that you'll only see the model's biases (basically the average training data). It might give you good results if you have other ways of exploring the inference domain. But adding tokens is the simplest form of navigation. What you write does not have to make sense. Don't hesitate to add descriptions like "the cat sounds like motor oil" or even use fantasy words.
If you have to ask, chances are you're not gonna train a Flux killer.
are img2video models in ComfyUI at this current stage actually better or as good as Veo 2, KLING, Flux etc?
In Jan…. Yes
Current stage is questionable that anything is really king ship of fudge mountain
Hunyuan from all my research is current champ
But has no I2V until Jan
even Veo and kling are not adequate now. Come back in 2027.
Or in many cases… wait a week
nah
Seems Hunyuan is outperforming Sora
This 2-5 seconds clip stuff sucks
But… so is waiting 20 minutes for a 2-5 seconds clip of trash
i wouldnt mind short times if the image was really good
but its not quite there yet
and yes there is also the waiting time for now
next phase GPUs will be better
I have been able to gain about 8 min render times
8min - 5 seconds
Well… it’s acceptable… but continuity doesn’t exist it seems
of course thats one of the things were waiting for, consistency and continuity
with context windows, you can actually get 20sec vids with hunyuan and there was a new technique which should be even better but no support for hunyuan yet(it should work though)
Like a batch render?
If you could render a 5 seconds clip, steal the last image and create a new render off that image and stitch all in a final…
But…. It would be a big ole mess
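The "steal the last image and render again" idea can be sketched as a loop. This assumes an image-to-video model is available; `generate_clip` below is a hypothetical stand-in that only returns frame labels so the chaining logic can run without any model:

```python
def generate_clip(start_frame, num_frames):
    """Stand-in for an image-to-video model call (hypothetical).

    A frame is just a text label here, so no model is needed to run this.
    """
    return [f"{start_frame}>{i}" for i in range(1, num_frames + 1)]


def chain_clips(first_frame, num_clips, frames_per_clip):
    """Generate clips back to back, seeding each one with the previous last frame."""
    video = []
    start = first_frame
    for _ in range(num_clips):
        clip = generate_clip(start, frames_per_clip)
        video.extend(clip)
        start = clip[-1]  # the last frame becomes the next clip's input image
    return video


frames = chain_clips("img0", num_clips=3, frames_per_clip=4)
print(len(frames))  # 12
```

Each hand-off carries only a single image, so motion and small details are not passed along, which is probably why the result tends to drift.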
yeah
Its in kijai's node: https://github.com/kijai/ComfyUI-HunyuanVideoWrapper, and its not the last-image part, hunyuan doesnt even have i2v support yet.
example
#🆕|sd3 message
with each new model and technique it's a bit better, but then 3 months later something better comes and all the workflows have to be updated
we're not there yet, that's why
This is the new technique and very amazing but only cogvideox2b support right now, hunyuan support should come soon: https://github.com/TencentARC/DiTCtrl
will work for video-video too,
I was crazy deep in AnimateDiff
I would like to shoot my own video of whatever… and use that video to generate a new video to replace the model with a creature of sorts
Or have interactions within a scene with a new generation thats “aware”
I have generated some wicked stuff… just not in the ballpark of what I need.
AI video is either full on psychedelic nightmare or uncanny valley now
I kinda like the nightmare fuel
SVD is a joke compared to runway or Kling at this point for me and I tried hard to get it to work well, always produced trash
sure I didn't like SVD either but there is HunyuanVideo in Comfy now which is at least Kling/runway level
Anyone with a solid 3D background here? Like Maya, Houdini and so on.
Sorry only wireframed background 🙂 3DS Max and then moved to blender....
Haha, that's good too. How long have you been using 3DS Max or Blender?
Started 15 years ago, after 3 years made a pause and started again 4 years ago but only as an activity in my free time. The last 2 years the time reduced more and more due to the fascinating world of AI 🙂
Can use it barely enough to create objects to print with the 3d printer. More a technical guy than an artist. Actually i am trying to get the following scenario to work (citygml > obj > blender > stl > 3d printer)... Headache as usual if you need to work with government created "standards"....
so i might definitely not be helpful if you need someone with deep knowledge.
Hey, that's cool! I just really enjoy connecting with people, and a family member bought a Bambu Lab in the last few days, so I'm also a bit familiar with 3D printing now, haha.
Have you printed anything yet?
All kind of (unnecessary) Stuff: from little Giftpackages, Stamps, Vector Low Poly Images, Tools for pottery, Christmas Decoration and my personal favorite of useless things: a sheep head and legs as spare Toiletpaper-Roll Holder... 🐑
I did a lot of images with these controlnets but I wanna know your opinion. How do you understand these controlnets? I use them specifically for architecture, if it helps you to give me examples:
- Lineart
- Softedge
- Scribble
- Canny
anyone know how i update python packages in comfy?
You and your family member should definitely take a look at HueForge. It's the missing link between 3D printing and AI art.
Hey hey
I am running an older model of a checkpoint and I want to keep using it. But I would love to update it myself so it works much better with other Loras.
I have another model that is very different; most Loras work with it, and most characters do not need any changes etc.
But with the older model .. it's not working as well
So how can I update that model, or should I make my own checkpoint?
If you mean that not all loras work well with the checkpoint you're using, that's totally normal. You can't train a checkpoint to make it work better with an existing lora, but I think you could just train a lora to work well with the checkpoint you're using. But you would have to do that for every lora that doesn't work well. You can't just "fix" the checkpoint.
Oh okay, is it easy to make a lora? Never done one
I haven't trained one either, but it's easy. The most difficult part is to get the training data.
What is that?
are there any resources on fine tuning img2 3d model ?
I’m looking to buy a new computer (to me) would an m2 max MacBook Pro, with 32gb ram and 38 core GPU suffice for fairly quick image and video rendering in comfyui?
I had an intel chip Mac before so haven’t had much experience with what technical specs are needed, I’ve mostly been using cloud services but wanna run it locally
How much ram does the GPU have?
I think the computer has 32 gb of unified memory
If you need speed, don't buy a laptop. You need an Nvidia GPU with at least 16GB VRAM
Well I need it for more than just stable diffusion, gonna use it for work as well and unfortunately can’t afford both atm… I know an Nvidia GPU is ideal tho
It will work, but won't be a satisfying experience.
Damn, ok. Just will be pretty slow?
What GPU does it have? You'll need a dedicated GPU.
I’ve seen ppl say they run it with much lower specs on MacBook pros so was hoping it would be able to run it fairly well… hold on just a sec I’ll drop the specs
Apple M2 Max chip
12-core CPU with 8 performance cores and 4 efficiency cores
38-core GPU
16-core Neural Engine
400GB/s memory bandwidth
Media engine
Hardware-accelerated H.264, HEVC, ProRes and ProRes RAW
Video decode engine
Two Video encode engines
Two ProRes encode and decode engines
And then it says it has 32gb of unified memory
I figured the 38 core GPU would be helpful… it’s like a 3500 dollar computer 😭
But like i said I know there are better/cheaper ways to get better performance for stable diffusion specifically I was just hoping it could run it fairly well
So it's a custom GPU and not Nvidia? This sounds like it's on-chip, which isn't great. Are there benchmarks for stable diffusion? I think stable diffusion 1.5 and XL will be fine, but better models and especially video will be a pain. For 3500 dollars you should have 64GB RAM.
I’m trying to find this information
I was trying to use flux mostly
What's the name of the MacBook model?
MacBook Pro (2023) 16-inch - Apple M2 Max 12-core and 38-core GPU - 32GB RAM - SSD 1000GB
Just a moment
I think it’s A2779
Man actually I’m not sure, I don’t have it yet and it’s kind of hard to figure out
Bc I’m not buying it directly from Apple
If it has to be a MacBook, make sure the GPU is a M3 Max. If you can, try to get 64 GB RAM or at least 48 GB. I think this is the best you can do with a MacBook.
Alright, well, thank you for all the help and information.
You're welcome
Flux on Mac is very very slow
Damn
What if I were to eventually purchase an external nvidia GPU for it? Or is it an OS issue as well?
Should work with an egpu I guess.
ok
hi
im using this exact computer
doesnt seem all that slow but im still a bit new to stable diffusion
let me know if youd like me to try something
Oh nice! Do you use comfyui with it?
i do not
currently using automatic1111
I have a question
Has anyone ever tried to download an entire image board and make a model out of it?
I wonder how that would turn out
Hi there,
I used comfyUI on my MacBook M1 Max with 32gb of unified memory. It worked and speed for the SD 1.5 models was acceptable. Sadly even old 8gb nvidia cards were faster…
But the 32gb helped a lot for some large language model tests, but again the time to generate tokens was awful even with the support of the MPS backend. So I would suggest using flux schnell or another lightning-style model with fewer steps, or using a 2-pass process where you can see right away if the result will match your expectations and abort if not. That way the higher resolution, step count, upscale, … will only be applied to those images that match the prompt.
And an eGPU won’t work, as for now Apple only supports AMD eGPUs, so it won’t help you much.
hello, how can i try stable diffusion ? new here
Well you can install on your pc and try it out xD
try webui forge, just search on github.
Everyone who's using a MBP should be using the app DrawThings imo. How fast are your Flux generations, lets compare times.
I'm on an M3 Pro, 18 GB Ram (MRX43B/A).
Thank you for that information
Yeah I discovered that after I researched some more
Yeah I was reading drawthings is the way to go. I just like the interface and workflow of comfyui which it doesn’t support
Did you try Diffusion Bee? It uses a quantized flux schnell version, so it needs less ram and only 4-6 steps.
Hello!
Not sure if this is the right place, but I'm looking at training a model and/or Loras. Would anyone be able to give me some input and advice?
what are you using to train loras 👀
Are you asking about my Computer or if I'm using a program? I'm not making them yet, that's why I'm asking for advice
ah, just wondering what you were using, computer or program or site, i make loras on civitai sometimes because my laptop cant handle making loras on it
didn't know you hadn't made it to the uh making them part, my bad 
It's ok! Sorry if I wasn't clear!!!
Wanted a lora for my OC to be consistent
do you have 2-3 really good images of them
I have about 2500
DM'd
I would like to make some anime characters move or dance around? Best img2video for this?
@warm junco spammers are back
Thx
Probably minimax if you are fine with closed source.
For open source, something like controlnext svd could work.
Scammer alert! don't click the link
Hello!
Just landed here in SD Discord
Happy to have arrived~
Would there be a guide somewhere to install and properly setup SD? I was previously using ComfyUI but something broke so I am starting over and could use some assistance~
Thank you!
Welcome, the best way to start a fresh install can be found in the github of the technical miracle @warm junco, who is really helpful in #🤝|tech-support . This would be the link:
https://github.com/CS1o/Stable-Diffusion-Info/wiki/Webui-Installation-Guides
You can also find this in the sticky messages within the tech-support channel.
When the diffusion is not stable 😔
when I can't find someone taking requests
Hi everyone! I'm working on a children's book and need help creating consistent illustrations. I’m aiming for a watercolor, hand-drawn style—for example, a family happily having breakfast. The characters need to be based on real people from reference photos and look consistent across all the illustrations in the book.
I’ve tried using MidJourney, but I’m struggling with consistency in both the characters and the art style, even when using the same references. Does anyone know how I can achieve this with Stable Diffusion? Are there specific workflows, tools, or models you'd recommend for maintaining character likeness and style consistency? Thanks so much!
You need to train a lora for this purpose.
Thank you so much. Training Lora's seems possible with Stable diffusion. I will dive deep in learning this
Hi guys, I would like some clarification on masking image, because I am getting conflicting information.
I have read that a mask is a greyscale image where black = remove and white = keep and different shades of greys are for in between
But it seems that practically we are painting black (or red??) on the original image as mask??
A mask is a separate channel or layer and not painting onto the original image.
That was my understanding too, but various sources seem to indicate that the original image with black areas are used as both the input and the mask?
Such as this one
https://www.youtube.com/watch?v=vY4QwnR4R2M
Do you have a ComfyUI inpainting example?
I don't know what you're using. I'm just saying a mask is a different channel. It can be the case that they use a workflow that adds black to the original image. But I wouldn't call that a mask because it will treat everything black the same, whether you want to inpaint it or not.
Usually this is referred to as treating a specific color as transparent.
I think you mean the black spots and maybe even the grey spots in the adjusted original image would make it a bad mask?
The workflow in that video creates a mask. It does not paint onto the original image.
It is just, the sources are conflicting, and the MaskEditor of ComfyUI seems to encourage us to just brush out the areas in black and use the image as a mask
You can see that the node has two outputs. Image and mask.
If I understand correctly, you meant the "Load Image" node would output two image files, one is the full image and the other is a black and white image as mask?
If you load an image with alpha channel then the alpha channel will be available through the mask output. If the image does not have an alpha channel you have to paint your own mask.
Oh, I think I get it, the mask "black" is not the RGB = (0, 0, 0), but rather alpha channel = 255?
Yes, although it does not have a color. You can look at it as background and foreground and you could assign any color to both.
Thanks heaps! It was confusing to me as I was thinking masking as a RGB thing
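The point that the mask lives in the alpha channel rather than in the RGB values can be sketched without any image library. `mask_from_alpha` is a hypothetical helper; `invert=True` assumes the tool treats transparent pixels as the area to inpaint, which is how ComfyUI's Load Image node appears to behave, but that convention is an assumption to check against your own workflow:

```python
def mask_from_alpha(rgba_pixels, invert=True):
    """Build a mask in [0.0, 1.0] from the alpha channel of RGBA pixels.

    rgba_pixels: rows of (r, g, b, a) tuples with 8-bit values.
    invert=True treats transparent pixels (alpha 0) as the masked area,
    which is an assumption about the tool's convention.
    """
    mask = []
    for row in rgba_pixels:
        out = []
        for (_r, _g, _b, a) in row:
            value = a / 255.0
            out.append(1.0 - value if invert else value)
        mask.append(out)
    return mask


# A 1x3 image: opaque red, half-transparent red, fully transparent black.
image = [[(255, 0, 0, 255), (255, 0, 0, 128), (0, 0, 0, 0)]]
mask = mask_from_alpha(image)
print(mask)
```

The RGB values are never read, so a pixel that is painted black but fully opaque ends up outside the mask.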
Hi! I always had a problem with using depth maps extracted from any 3d modelling software in controlnets. Even with 16 bit depth, the sudden jumps in grey tone (banding) is interpreted by the model as an edge, creating some kind of seam. Does anybody know how to fix this?
What i do for now is applying Depth Anything to my depth map, which feels very stupid. Also, how is that the depth maps generated by depth anything dont have this problem?
Only use the same depth as the model uses. There should be no banding, not even with 8 bit.
I think the issue is the maximum depth. You should set it to your furthest object you want in your scene.
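A small sketch of why the maximum depth matters, under the assumption that the 3D tool maps distance linearly to grey levels between the camera and a far plane. The scene distances and the `quantize_depth` helper are made up for illustration:

```python
def quantize_depth(depths, far, bits=8):
    """Map metric depths to integer grey levels, near = bright, far = dark.

    depths: distances from the camera; far: distance mapped to black.
    Values beyond `far` are clamped.
    """
    levels = (1 << bits) - 1
    out = []
    for d in depths:
        t = min(max(d / far, 0.0), 1.0)  # 0.0 at the camera, 1.0 at the far plane
        out.append(round((1.0 - t) * levels))
    return out


# Hypothetical scene: 200 samples between 2 m and 12 m from the camera.
scene = [2.0 + 10.0 * i / 199 for i in range(200)]

loose = quantize_depth(scene, far=1000.0)  # far plane way behind the scene
tight = quantize_depth(scene, far=12.0)    # far plane at the furthest object

print(len(set(loose)), len(set(tight)))
```

With the far plane set much too far, the whole scene collapses into a handful of grey levels, and each step between them can read as an edge. Fitting the far plane to the furthest object spreads the scene over most of the available range.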
Thanks! I see that the images coming out of Depth Anything are 24bit pngs. Therefore, i guess thats the standard
Since any depth model works good with depth anything
CFG 
In theory this is already what rhino is doing. The darkest part of the image is the furthest that is in view.
How do I get into generating AI videos?
locally? take the plunge and get ComfyUI installed and working and be lost for a couple weeks, then generate video
Hmmm... What are some good non-local options? I want to dip my toes in the water before diving into the deep end.
For cloud based Video Generation take a look at these,
Luma Dream Machine
Kling
Minimax
Are any of these paid? And where can I find them?
Also, anyone have any good example prompts for videos? Not sure if the prompting is structured any differently.
hunyuanvideo
hey peeps, i followed a tutorial on youtube how to set up flux dev, I did everything exactly as the video showed however im getting errors and Im assuming its because I don't have "flux" type in the dualcliploader, anyone knows what im missing? when i search for this on google i cant find anything https://cdn.discordapp.com/attachments/1002602742667280404/1323333004860461128/image.png?ex=67742166&is=6772cfe6&hm=9ac0418507062081e9bc594d56179110007d61a8228c6cbcff1cd6d51500c420&
everything is up to date, comfy and the nodes as well
Remove this scammer please
It's like bot whack-a-mole in here
ikr. The fix would be easy: Force people to pass a captcha before joining a server.
its crazy they direct people to a place with active scamming and think its ok to ignore it all
says much
the trust and safety that gave SD3 issues should have been applied to the discord not the model
I don't think SD3 was overly censored
I just think it was a 2B DiT
to this day I have never seen a good 2B DiT
I think DiTs need at least 6B
The whole "safety" rhetoric is nothing but fascism in disguise anyway. I'm not in danger because of mean comments, false information or obscene imagery. The real threat in the internet is fake products, scams, data harvesting, social engineering, shills and fake reviews, government propaganda etc.
ya that and what seemed like direct removal of concepts
oh actually direct removal of concepts might be a valid point yeah
what I was trying to say was that I think the anatomy problems were because the DiT was too small, not because it has censored training, which is a popular theory
DM sent
Hi, can I ask if the Stability.ai API has faceswap functionality? Couldn't find it in the API docs. Thanks!
Yeah I think that's probably a big issue, there are several 2b dits now and none of them are very good.
Phone verify
Too costly for most attackers
ah tragic, phone verify is gonna lock me out real shit
mmdit isn't good, regardless of the model size
Hey pal.
not with temporal consistency afaik
I see people doing checkpoint and Lora left and right.. But I cannot find a video or anything that really explains in it a good way. I would love to make my own checkpoint model and update a checkpoint I am currently using but make it into my own and update it so it works better. Can anyone help me with this?
@boto can you message @frosty turret please, he's autistic and needs to know why he was timed out.
prolly cos he posts nothing but lolis in the anime channel when told not to 💅
Hi everyone, new here, looking forward to sharing prompt best practices
a lora is a very small model that just updates a few specific weights of the base checkpoint. A fine-tuned checkpoint is much larger because it stores a full set of updated weights, but it does the same kind of thing. both are less expensive than retraining the entire model would be.
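Why a lora file is so small can be shown with a toy example. This is a sketch of the usual low-rank formulation (new weight = base weight + scale * up * down), not the code of any particular trainer; the matrices and the `apply_lora` helper are made up for illustration:

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def apply_lora(weight, down, up, scale=1.0):
    """Return weight + scale * (up @ down), the usual low-rank LoRA update."""
    delta = matmul(up, down)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]


# Toy 4x4 base weight (identity) and a rank-1 update.
base = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
up = [[1.0], [0.0], [0.0], [2.0]]  # shape 4x1
down = [[0.5, 0.0, 0.0, 0.0]]      # shape 1x4

patched = apply_lora(base, down, up, scale=0.8)

full_params = 4 * 4
lora_params = 4 * 1 + 1 * 4
print(full_params, lora_params)  # 16 8
```

The saving is modest at this size, but for a 4096x4096 layer at rank 16 the two small matrices hold 131,072 values against 16,777,216 in the full weight.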
whats the difference between regional prompter and composable Lora ?
Regional prompter works and composable lora has been outdated for 2 years
probably because he was explicitly sharing prompts for little girls with "big breasts" and following up with "hehe"
in thc we trust
Happy New Year to everyone! 💫🤠 May you stay healthy and be happy afresh... 🎐
I am trying to fine-tune an SD inpainting model to outpaint backgrounds from product images conditional on a text prompt. Standard SD inpainting models tend to extend the product frontier, likely because they are pre-trained on small input masks. See the paper: https://arxiv.org/pdf/2309.11507.
I have a dataset with product images, masks, and text prompts. However, wherever I search, it doesn't seem possible, and there are many different discussion threads on Hugging Face and Reddit where people ask the same thing without any satisfactory responses. Is there any way to do this?
powerpaint v2 is good
my stable diffusion generates complete shit and i dont know why, can someone help me
Hello everyone, glad to be here
what ui, model and vae are you using?
Whose stable diffusion doesn't. That's why I only use Flux.
Kick these guys out pls.
what version of stable diffusion are you using?
is there a way to make seeds different when generating a lot of images?
like seed 100 to 105 will look very similar
but i want to have a very different image every time but not having to do it one by one
is it possible?
If different seeds do not give you enough variance you also have to change other parameters for each generation. You could mix and match different prompt strings for example.
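Mixing and matching prompt strings can be automated before queueing a batch. A minimal sketch with a hypothetical `build_prompts` helper and made-up option lists; it only builds the strings and leaves the actual generation to whatever UI is in use:

```python
import random


def build_prompts(template, options, count, seed=0):
    """Fill a prompt template with random picks from each option list."""
    rng = random.Random(seed)  # seeded so the same batch can be rebuilt later
    prompts = []
    for _ in range(count):
        picks = {name: rng.choice(values) for name, values in options.items()}
        prompts.append(template.format(**picks))
    return prompts


options = {
    "color": ["blue", "green", "pink"],
    "setting": ["a forest", "a city street", "a beach at dusk"],
    "style": ["watercolor", "photo", "ink sketch"],
}
batch = build_prompts("a {color} car in {setting}, {style}", options, count=5)
for p in batch:
    print(p)
```

Wildcard features in the UIs do roughly the same thing from inside the prompt box.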
yeah but like when i do one generation, the next one will be very different, but if i do 2 batch count, the 2 will look really close cause the seeds follow each other
dunno if i'm making sense
you wanting slight variations?
Swarmui has Variation Seed natively implemented that could achieve this
Seeds following each other does not make images similar. The seeds are used for RNG. Two consecutive random numbers are, well, random.
why do they look so close to each other when i do a lot of images through batch count, like 30, but not at all if i just do like 1 batch count each time
Make sure the seed actually changes. Also it's to be expected that images will look similar for any given prompt. I don't know if you mean almost identical or just similar in style.
Specifically, seed determines the initial noise that is denoised from. Beyond that i'm fairly sure the seed won't determine how things are denoised. Unless you have nodes set up to affect changes based on the seed value
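The claim that neighbouring seeds give unrelated starting noise is easy to check. This sketch uses Python's built-in generator as a stand-in for the one an image UI would use, so it illustrates the principle rather than reproducing any UI's exact noise:

```python
import random


def initial_noise(seed, n=4096):
    """Draw n Gaussian samples, standing in for the starting latent noise."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def correlation(a, b):
    """Pearson correlation between two equally long lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


noise_100 = initial_noise(100)
noise_101 = initial_noise(101)
print(correlation(noise_100, noise_100))  # same seed: fully correlated
print(correlation(noise_100, noise_101))  # neighbouring seed: close to zero
```

If batch images still look alike, the prompt and model are the more likely cause than the seed numbers being close together.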
https://chenglin-yang.github.io/1.58bit.flux.github.io/ This is interesting. hope we see weights that are usable come of this project.
Why not just use the Q2 gguf model?
it is a useful one but its results ring like an sd15 generation. there is severe degradation at work when models are just straight quantized to lower bits
Yeah it’s from ByteDance so probably trustworthy but kinda suspicious that they don’t show much info.
The technique is supposed to be similar to bitnet for llms according to them but bitnet required pretraining from scratch on hundreds of billions of tokens. Somehow they achieve that without pre training and it’s still very good quality.
Likely are just refining it more before deploying. Or hoarding it to themselves forever. one or the other .
https://github.com/Chenglin-Yang/1.58bit.flux?tab=readme-ov-file#158-bit-flux this says they're going to release, but we shall see
There was an error in my SD and it showed a running error
Do you have a good brother to help me take a look?
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat1 in method wrapper_CUDA_addmm)
Which LLM is probably the best for IT? I'm too stupid to configure port forwarding on a pfsense router
out of the low bit stuff I like SVDQuant the most
SVDQuant is 4 bit rather than 1.58 bit but the quality is almost the same as FP16
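For a sense of scale, the storage needed for the weights alone follows directly from the bit width. A rough sketch, assuming a 12-billion-parameter model (roughly Flux-sized) and ignoring text encoders, VAE, activations and any per-block overhead that real quantization formats add:

```python
def weight_size_gb(num_params, bits_per_weight):
    """Approximate storage for the weights alone, in gigabytes (10^9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


PARAMS = 12e9  # assumed parameter count, roughly Flux-sized

for label, bits in [("FP16", 16), ("FP8", 8), ("4-bit", 4), ("1.58-bit", 1.58)]:
    print(f"{label:>9}: {weight_size_gb(PARAMS, bits):.2f} GB")
```

Under these assumptions FP16 comes to about 24 GB, 4-bit to about 6 GB and 1.58-bit to under 2.5 GB.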
best to ask in #🤝|tech-support
Hi everyone, are there some SD experts here who could help replace our current version of SD with SD 3.5 inside our chatbot app (we're using comfy deploy)? Not sure what we're using at the moment, but it's an older version of SD and we need more hyperrealistic images of humans in our very human-like chatbot app. Please helppp
Our app allows users to gift clothes and accessories to their AI chatbot, they usually reply with images. Would be great to have someone work on this with us ❤️
What control panel similar to ComfyUI would you recommend for professional use in generating videos? The idea would be to create dozens or hundreds of thumbnails with specific seeds using a video model, select several of those thumbnails, use them to create miniature videos with another prompt for what will happen in the video, choose one of those videos, create a video at the highest possible resolution with that seed, and then pass the result through another AI upscaler to reach 1080p (or 4K if possible).
I still haven't started learning ComfyUI, and I don't know if there's an easy way to generate many thumbnails, or to use fixed seeds, or if having fixed seeds guarantees consistency when expanding an image.
Can it be used via the command line? All of this is for making FMV games; maybe I could program something in the game engine.
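On the scripting question: ComfyUI exposes an HTTP endpoint that accepts a workflow as JSON, so a seed sweep can be driven from a script or a game engine. A minimal sketch, assuming a local server on the default port and a workflow exported in API format; the one-node `base_workflow` and the node id `"3"` are placeholders, and `queue` is defined but not called here:

```python
import copy
import json
import urllib.request

COMFY_URL = "http://127.0.0.1:8188/prompt"  # assumed default local address


def with_seed(workflow, sampler_id, seed):
    """Return a copy of an API-format workflow with the sampler seed replaced."""
    wf = copy.deepcopy(workflow)
    wf[sampler_id]["inputs"]["seed"] = seed
    return wf


def queue(workflow, url=COMFY_URL):
    """POST one workflow to a running ComfyUI server (not called in this sketch)."""
    data = json.dumps({"prompt": workflow}).encode("utf-8")
    req = urllib.request.Request(url, data=data, headers={"Content-Type": "application/json"})
    return urllib.request.urlopen(req).read()


# Placeholder fragment; a real workflow would be exported from ComfyUI in API format.
base_workflow = {"3": {"class_type": "KSampler", "inputs": {"seed": 0, "steps": 20}}}

jobs = [with_seed(base_workflow, "3", seed) for seed in range(100, 105)]
print([job["3"]["inputs"]["seed"] for job in jobs])  # [100, 101, 102, 103, 104]
```

A fixed seed reproduces an image only at the same settings. Changing the resolution changes the shape of the starting noise, so a thumbnail seed will generally not give the same composition at full size.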
Anyone here using ai for book covers?
?
thats an idea
i dont think so but possible
use flux for that
comfyUI is pretty straight forward
just download nodes and see what they do inside openart.ai
ComfyUI seems the most professional, but having a web server running in the background on my machine all the time to then access it from the browser seems like an excessive RAM consumption. If there’s something more command-line-based, it would be more convenient, especially when integrating it into a game engine.
whats the setting for using tdd 4 step lora
comfyui is designed to be easy but it isnt efficient when it comes to integrating it to video games
python would be a great solution to this
especially with pytorch
From what I see, Hunyuan can be used via the command line mmmm
Is there a lora that includes a lot of celebrities, or do I need one for each single one
Which channel should I use to debug problems installing controlnet on ComfyUI on RunPod server?
It sounds like you're trying to create a virtual character. For character consistency you need to train a lora. For what you're trying to do, having a 3D model for the character as well as 3D models for clothes and passing the 3D output through stable diffusion for photorealism, changing colors and adding backgrounds, seems to be the most viable approach.
HELLO
Hi there! Is there a discord server specifically for StableStudio (open source), or is this the place?
guys, what is best way to use flux, but not comfyui?
Hi everyone, I hope this is the right channel for my question. I’m looking to run a LLM with image generation capabilities using Stable Diffusion. I’m unsure which GPU to choose and would appreciate some guidance.
Should I go with AMD (using the ROCm platform), or is it not well-supported for this purpose? Alternatively, would Intel’s Arc B-series GPUs be a better choice, given their good value? Or should I prioritize NVIDIA due to CUDA and its broad support for AI workloads?
My budget is around €300–€400, so I understand that the performance won’t be lightning-fast. However, I’d like to achieve decent speeds for both tasks.
Thank you for your help
Probably Swarmui or forge.
you want an nvidia gpu - but if you have a low budget, maybe try one of the online solutions - such as runcomfy online, or rent a gpu from vast.ai or amazon webservices.
Dont go with Intel for ai stuff for sure.
AMD is nice if you're on Linux with ROCM or know how to use Docker with WSL2 on Windows.
Nvidia has the best ai support but lower vram in their midrange.
Best card would be a RTX4060 Ti with 16gb vram.
is swarmui better than forge with flux? cuz i have fluxdev+2/3 lora and i have out of memory (not vram, ram) in forge
yes
swarmui uses comfy as the backend
without having the node base visualization and more of the a1111 type of gui
ok thank you guys
So in essence it is comfyui with the graphic form?
yeah lol its nice
get the comfyui performance while the simplicity of the gui
has a lot of native implementations that are nice as well
and are there things like wildcards/adetailer in a simple way like in forge or is it rather hard?
uhhhh wildcards like random:blue,green,pink ?
and i dont find adetailer to be needed with comfy
i use hypertiling which helps
and swarm has a refine upscale mid gen
thing
which what i use
I don't know if there are any local LLMs that can directly call functions, but you don't need it anyway. You can just let the LLM use keywords and then look for those keywords in the output and control the image generation that way. Buy Nvidia because it's well supported. What you really need is RAM. If you want to run a large LLM at least 64GB. You also want 16GB VRAM.
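The keyword approach can be sketched in a few lines. The `[IMAGE: ...]` tag format is an assumption, something the LLM would have to be told to use in its system prompt, and `split_reply` is a hypothetical helper:

```python
import re

# Hypothetical tag format the LLM is instructed to use in its system prompt.
IMAGE_TAG = re.compile(r"\[IMAGE:\s*(.+?)\]", re.IGNORECASE)


def split_reply(llm_output):
    """Separate chat text from image prompts embedded as [IMAGE: ...] tags."""
    prompts = [m.strip() for m in IMAGE_TAG.findall(llm_output)]
    text = IMAGE_TAG.sub("", llm_output)
    text = re.sub(r"\s{2,}", " ", text).strip()  # tidy the gap left by the tag
    return text, prompts


reply = "Thanks for the scarf! [IMAGE: woman wearing a red scarf, smiling] I love it."
text, prompts = split_reply(reply)
print(text)     # Thanks for the scarf! I love it.
print(prompts)  # ['woman wearing a red scarf, smiling']
```

The extracted prompt strings can then be handed to the image generation backend while the cleaned text goes to the user.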
Is SANA and Omnigen worth the HDD space? are they substantially faster/better than flux or sdxl?
And whats this Illustrious model? Is it like a more realistic Pony?
in youtube land every new ai thing is a flux or sora killer i know but whats the reality on the ground?
HDD isnt worth using in ai
it will load extremely slow
invest in a SSD asap
especially when you get into the large models like flux and whatnot
what about their question though
SANA isnt it, its just fast, dont know enough about the other stuff
So the LLM is more resource intensive than the SD model?
So would it be possible to build a PC just for image gen for me and my friends, without the LLM? The PC would have 16 gigs of RAM, an SSD or NVMe SSD, a 7th-generation i5 and a GPU for around 300 or 350 euros. Does anybody know if this PC would be enough to generate 512x512 or higher in about 15 or 20 seconds?
sana is rly fast
OmniGen was more of a research model than something to use. Its training data coverage was not broad.
16GB RAM is not enough. You should at least get 32GB. RAM is MUCH cheaper than VRAM and you can get away with 8GB VRAM if you have enough RAM. The GPU is mostly for speed. I personally still use a RTX 2080 with 8GB VRAM. I can even run a 12GB Flux model with no problem, but only after I added more RAM.
So I invest in 32 gigs. Do you have a GPU recommendation for around 300 euros, or should I wait for the RTX 5000 series? Or maybe go with AMD if I use Linux, or is Nvidia the way to go with all the support?
Stick to Nvidia. I recommend you wait for the 5000 series because it might lower the prices of the older series. Prefer more VRAM over more performance. Generating SD 1.5 images within 20 seconds will not be a problem.
Ok thank you really much
you're welcome
Alrighty then, thanks for the infos
Yeah sana is fast like neon said but I don't think it's worth it really when flux schnell at 1 step clearly outperforms it by a very large amount and should be similar speed, maybe even faster.
Omnigen is pretty slow sadly and although the identity capability is very impressive, image quality is worse than even sdxl imo.
I didn't play with Illustrious much, but from other people's experience it is very creative, unique and more artsy. It's not for nsfw like Pony I believe, but it can probably do it.
I don't think sana or omnigen are worth it yet, but illustrious might be a decent pick.
current hype is like LTXVideo and HunyuanVideo, both deserved
If you're using the basic webUI Automatic1111, is there a way to get the tab for SD back open if you've accidentally closed it but it's still drawing?
Can reopen a closed tab ofc but no prompt or drawing
yeah ltxvideo is crazy fast, there was a new repo that increased the speed of ltxv on new gpus without any quality loss and lowered memory usage too. On a 4090 it went from 16sec-->6sec for transformer for a 5sec video.
Hunyuanvideo with dpm2ppm is even more improved, it can work even with 8 steps now and 16 steps is way better than higher step euler/ddim.
Ah video is stepping up I see.
Whats the latest/best model for text output? Anything new?
Do you mean llm or like text on an image?
Text output/text in image, like a poster, for example
For open source, it’s still probably flux dev which is still comparable to several top closed models like ideogramv2, dalle3, red panda model.
The best in closed is probably flux 1.1 ultra
Yeah 👍 thanks bud
i wouldn't call flux dev open source. They are weights that you can obtain for free, but they aren't open source, nor do they have a libre level license.
in fact, most models i wouldn't call open source. That would imply the source code that trained them, as well as the datasets, were made public
best open license model is Schnell yeah
its a pretty rough situation
what's the website to generate on the web? i can't remember it and i can't find it in the channels anymore
Have an excellent weekend everyone.
hello
Hi everyone! I'm looking for models and/or loras to do realistic inpainting with SDXL and Flux in Forge. Any help?
Does anyone know of any summarization ai tools that would do a good job aggregating and summarizing a bunch of text responses to survey questions? Thanks
hey guys i was wondering if there was a way with the batch size to make it so each pic in the batch is at different denoising scale? say you put batch of 4, itll gen 0.5, 0.6, 0.7, and 0.8 denoising pics with the rest of the settings the same. is there anything like this or am i delulu?
hello guys, I'm looking for a mid end laptop to run SD, something like an RTX 4070. I'm kinda out of the loop on what's happening rn with CPU/GPU technologies. I'd like to ask if you have something to recommend, if there is a laptop worth getting now or if I should wait for some new model coming up soon. If you can, send me a DM or @ me, thanks.
Why would you even need it to be in one batch? Just make an XY plot.
cos batches are faster
Yes but not that much.
am i better off using a colab version to train a model?
because I want to make a model of particular art style so, im not sure what parameters are best for it
I agree for the most part
but there are weird edge cases where it speeds things up a lot
particularly with clusters
with the auto1111 extension you can do both. configure the x/y/z grid and each batch will use the current cell's parameters. I'm not sure about comfyui. you'd probably need to make a custom made noodle plate for it
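If you'd rather script the sweep than use the grid, here's a rough sketch that only builds the request payloads, one per denoising value, with everything else (including the seed) kept identical. I'm assuming the A1111 web API started with `--api` and its img2img field names (`denoising_strength`, `init_images`, `prompt`, `seed`, `steps`) as I understand them; check your own webui's `/docs` page before relying on them. `denoise_sweep` is just a helper name I made up:

```python
def denoise_sweep(base_payload: dict, start: float, stop: float, step: float) -> list[dict]:
    """Copy base_payload once per denoising value from start to stop inclusive."""
    count = int(round((stop - start) / step)) + 1
    payloads = []
    for i in range(count):
        payload = dict(base_payload)  # shallow copy so the base stays untouched
        payload["denoising_strength"] = round(start + i * step, 2)
        payloads.append(payload)
    return payloads

base = {
    "prompt": "a portrait photo",               # placeholder prompt
    "seed": 1234,                               # fixed seed so only denoise changes
    "steps": 20,
    "init_images": ["<base64 image goes here>"],  # placeholder, not real data
}

for p in denoise_sweep(base, 0.5, 0.8, 0.1):
    print(p["denoising_strength"])
```

You would then send each payload to the img2img endpoint yourself. It runs as separate generations rather than one true batch, so it won't get the batch speedup.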
So, i installed this, and the scanning worked properly, downloaded all the images for my lora and models :
but i don't see the 3 additional buttons
are you trying to use it on forge? forge updated to gradio4 so i don't think it'll work.
The reforged project still uses gradio3.
nah just auto1111
ah. well then i dont know then. another extension is conflicting with it maybe. Check the console log when you start the webui. it may have some note about a problem loading an extension
maybe the author updated the buttons out but the readme never got the info
i'm using animatediff (is there anything better to make video on webui111 stable diffusion?) and for some reason the video is fast/accelerated, and i don't know why, the first time i used it, it didn't do that
i'm following this guide :
So i've been using pony but apparently illustrious is better and doesn't need as much LORA... But the thing is, how do i know which style/characters does it have by default?
cause i certainly won't retain them all by memory, lol
yo
I would like to make something like https://interiorai.com/, with my own model. I played around with Automatic1111 and added controlnets (depth) and all to make my output, but that's just the UI of it. How do I make it into its own "model" so I can use it via API? Is there an output file? Is there good online training software, or do I do it raw in python (torch)?
Im confused how it works, also for each interior AI "style" do I have to train the model on each style as well?
hello
Hello all!! I got a new computer a few weeks ago and I've wanted to try stable diffusion for a long time now! I installed it locally and now I want to learn more about it, especially how to make cool pictures using the same face, as I know it is possible to train the AI to do it. What should I look for on youtube, is there a good video about that which is clear and simple (english isn't my first language)? Thank you!
hey guys i just wanted to check since this industry is moving so fast if anyone is using updated models? i am currently using epicRealism and Dreamchaser XL. it seems to be working out for me. am i out of date?
ban this guy immediately
likely was a stolen account in the first place
Possibly. This scammer infestation is so annoying. I hate the "I'm a good guy, can you help me out" shtick more than the "give me your credit card number" shtick. What's worse is that it works, or else they would just give up.
its literally only this discord
out of the roughly 200 discords I have been in
Yes it's really bad. There's not much going on in the chat so it sticks out like a turd in a punch bowl
Are there any models that take multiple images to gather a style to generate an image from? The img2img one I'm using currently only allows for one image at a time.
Afraid you're gonna need to make a lora or two for this one
Lora 1 at 0.5 strength and lora 2 at 0.5
Im new to AI stuff, ready to learn tho, got any tutorials you could shoot my way? Thanks.
Hmm currently still in bed lmao, what ui do you use?
Thanks, Im down to try anything. Not really tied to a specific one, just started yesterday
Webui forge, a1111 etc are excellent starter ui's
Seem pretty limited so far
Worst advice lmao
loooooool
Yeah thats why i migrated over to swarm ui
ComfyUI uses nodes right?
Yeah
yes nodes n' noodles
Like blender? I might be able to use that if its the best
Swarm ui has a comfyui backend so you can use that and use a simple ui
its by far the best but takes some time to get used to
And it has a export to comfy so you can see how the noodles interact
Ill check out Swarm if it uses both simpleUI and ComfyUI
Its some getting used to but its being very actively maintained
Was the lora you discussed a node?
Not too complex
No/yes, a lora is basically a mini model you load in next to a main model(checkpoint)
Im trying to give the AI a source of a bunch of game characters in order to make concept characters using the same style
So anything that helps would be appreciated
Is Flux the best model so far?
Local?
Lets say a checkpoint knows a whole bunch but sometimes it doesnt know the newest characters from league
A lora trained for that can help with it
Ah okay that makes sense
Flux is definitely the most modern high quality but it requires a pretty beefy setup for reasonable times
I personally still use SDXL because i use a 3070ti for now
I'm using a 3080 Ti right now. I know it only has 12GB, but I'm usually able to generate pretty decent images in under a minute. I'm hard pressed to make photorealistic stuff like it advertised though, so I probably have to mess with settings
Yeah depending on the flux version you use cfg 1 with 4-20 steps
My plan was to generate some decent AI art, convert it from 2d to 3d and then throw it in blender to fix topology
But with SDXL it's also possible but takes some extra checking
And faster
Though a lot of realistic ai images suffer from the same-face syndrome and still have that uncanny feel imo
I have CFG 1 and 20 steps. Ive tried up to 150 steps but didnt get that much more detail
150 is a ridiculous amount
I generally use 20-24 if i want more detail i crank to 40 in sdxl
Any more and it doesn't do anything lol
Does SV3D just generate images/videos of a 3d object or does it actually make an obj file?
Because I was looking at other 2D-to-3D software that actually converts it, however the front always looks pretty good but the back suffers, so I was wondering if there was a way to combine them to get a full decent 3d object estimation
Ah I'm clueless about that one.
I saw this fly by though this month
https://www.reddit.com/r/StableDiffusion/comments/1hb1tj4/i_created_a_blender_addon_that_uses_stable/
Looks pretty solid
Would be crazy to see both of those combined. Generate an object, clean up topology, then generate textures
heyy, can someone with stable diffusion dm me? i have some requests
Is there a way to tell what model a lora is for?
Most of the loras in my collection are organized by model (1.5, SDXL, Pony, Flux, 3.5 Large, Illustrious), but there are some loras that have vague names that I can't seem to find the original page for
As I was writing that, I found it, it's in the internal metadata listed as "ss_sd_model_name"
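For anyone else hunting for this: a minimal sketch that reads that metadata straight out of a `.safetensors` file with only the standard library. A safetensors file starts with an 8-byte little-endian header length followed by a JSON header, and the optional `__metadata__` entry holds the training info. The `ss_sd_model_name` key is only there if the trainer wrote it, and `read_safetensors_metadata` is just a name I picked:

```python
import json
import struct

def read_safetensors_metadata(path: str) -> dict:
    """Return the __metadata__ dict from a .safetensors file, or {} if absent."""
    with open(path, "rb") as f:
        # First 8 bytes: unsigned little-endian 64-bit length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    return header.get("__metadata__", {})

# Example usage (the file name is a placeholder):
# meta = read_safetensors_metadata("mystery_lora.safetensors")
# print(meta.get("ss_sd_model_name", "not recorded"))
```

It only reads the header, so it is quick even on large files.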
They were Pony
"ss_sd_model_name: ponyDiffusionV6XL_v6StartWithThisOne"
I kinda disagree with the whole concept of downloading and storing lots of loras to make a big collection
it feels like its doing things in reverse
would be better to either train or download a lora when you need
and then you will know what it is for
I use folders and add metadata to the lora so i don't have to remember
Sometimes loras get taken down so when i see one i want to use in the future its getting downloaded
This is a vague question, but is it easy for you guys to generate images that actually look good? It's just so damn hard for me to make anything that actually looks decent. If I try to recreate some image on Civitai and download every resource they use, use the exact same prompts and weights and all, it usually looks worse every time. And especially if I then try and change any detail, every change I make or LORA I add, it either breaks completely or looks slightly worse with every iteration.
I don't mind a bit of tweaking to get things right, but it's like I'm fumbling in the dark and it's all random whether it will turn out okay or not, even after having generated images for over 2 years.
hmm any specific example? if its nsfw you can shoot me a dm
you can shoot me a dm if its easier to describe on call or if its nsfw
but generally, without much prompting, it shouldn't be too hard
I don't have a specific example really, although I have a lot of generated broken messes
ah i mean what model, what specs, what UI etc
I've tried everything from 1.5 to SDXL, Pony, FLUX and Sana. 3080 Ti 12GB, ComfyUI, SwarmUI, SD.Next and Auto1111. It's all just as fickle.
Sometimes I get decent images, but it's very rarely at the quality of what I consistently see some people on civitai make
And if I copy their settings, it never looks as good.
I guess what I want is just a consistent baseline workflow that can generate decent images and that I can build on later and add LORAs to etc without everything randomly breaking or starting to look silly and bad
As you say though, the more you try to prompt something specific, the worse it tends to get.
hmm got time to hop on a quick call?
i can show you how i do it and maybe identify whats going wrong at your side
How do you copy the settings?
Because you most likely dont do it the right way.
Click on the blue copy button below an image on civitai to copy all Settings, even the hidden ones.
Then go into auto1111 and paste the whole thing into the positive prompt box.
Then press the blue/white arrow on the right to auto apply every setting.
could be that he takes the upscaled resolution as the initial one and the ai can't generate a proper image at that size
You could provide an example in #📝|prompting-help
waiting for him to do that lol.
oh i mentioned the wrong username xD sry
did u guys make a coin
no that coin never happened lol
any good way to make a 45° angle, a quarter angle? Like view on a bed, but not in front or on the side, but in the angle, i can't find any prompt that works
Best chance would be using a corresponding image as source for a depth map and use the controlnet
i'll try to figure it out, thanks
now last question
i dunno why, my generations keep having little characters in the background, trying to do more or less the same thing as the characters in front, and i dunno what prompt would trigger that
hello 
yeah trellis is for sure the best for image-to-3d/multi-image-to-3d, and even similar quality to closed source options. It's also very fast which is nice.
is there a prompt to have different hair colors and have it choose randomly for each generation? With the current prompts i have, it either goes blonde or white haired, but nothing else, even though i don't have any character lora or anything like that that could influence it
can anyone point me in the direction of how to generate audio reactive keyframe values for deforum? All I can find is either missing links, lacks settings I need (or don't understand how to get desired results) or not compatible with installed python
the one thing i hate about AI, you don't see the time pass, and you spend the whole day trying to tweak some prompts, only to realise that what you did at the beginning was better, lol
i should have just played my games instead :p
You can do this in Forge
I had exactly this problem with Forge/Webui. My generations would look like garbage compared to Civitai with the exact same settings. I then switched to Comfyui and finally got identical results.
This prompt in comfy would lead to different colored cats wearing either a robe or a hat.
"a {red | green | blue | purple} cat wearing a {hat | robe}"
just use a { } - Bracket and the pipe | Symbol for each option. You can also use longer options if needed
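If you're curious what that expansion boils down to, here's a toy Python version. It is only a sketch of the idea, not how comfy or any extension actually implements it, and it doesn't handle nested brackets. `expand_variants` is a made-up name:

```python
import random
import re

# Matches one {a | b | c} group with no nested braces inside it.
VARIANT = re.compile(r"\{([^{}]*)\}")

def expand_variants(template: str, rng: random.Random) -> str:
    """Replace each {a | b | c} group with one randomly chosen option."""
    def pick(match: re.Match) -> str:
        options = [opt.strip() for opt in match.group(1).split("|")]
        return rng.choice(options)
    return VARIANT.sub(pick, template)

rng = random.Random()  # pass a seed here if you want repeatable picks
for _ in range(3):
    print(expand_variants("a {red | green | blue | purple} cat wearing a {hat | robe}", rng))
```

Every call makes an independent pick per bracket group, which is why each generation can come out with a different color and outfit.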
in swarm you can just use random:blue,green,red
or even store it as a variable so if the boots are red so must the gloves instead of having blue gloves and red boots
damn that's cool, i wish they could add that in the normal auto1111 webui :p
you could start a feature request but i think theres a A1111 extension that does that
though i left a1111 and never went back after swarm
https://github.com/adieyal/sd-dynamic-prompts is the extension
Has the same syntax with brackets and additional option like to choose at least two etc.
oh nice thanks!
Can i use the syntax immediately, no need to configure anything? i don't have time to follow the whole tutorial right now
You would need to install the extension. Then it should work.
It’s not only about settings, it could be due to the workflow, the sampling method, the nodes they use… There are many different samplers, e.g. I often use RES4LYF and get different results from the original one.
It's much deeper than that. There is something severely broken with the way the Flux model support was implemented in Forge UI. The results I got were really bad, but not in an obvious way. I only realized once I saw on Civitai how much better Flux images are supposed to look. It's very difficult to explain, but it's like the difference between Flux dev and Flux schnell. You wouldn't realize you get bad quality until you saw what good quality looks like.
Never use ForgeUI, only ComfyUI
Yep. But I wish Comfyui would let us easily create a user interface for our workflows, so we don't have to pan around the whole work space. I also wish we had a better editor where we could also paint onto images. The custom nodes I've tried aren't great. A combination of Invoke and Comfyui in one package would be awesome.
Anyone know if it's possible to run OneTrainer SDXL Lora with 8gb 3070ti? I searched Google but none of the solutions worked for me. Always out of memory error
Multiple of 64 and square resolutions are better than very widescreen.
Is 864x864 good?
including which one do you use?
There is no clear rule. It depends on the model and on the type of image you're trying to make. Usually just for testing different prompts people use 512x512 because it's fast.
Oh and by the way 864 is not a multiple of 64
I know, I asked because that was the one I was using before
now I'm using 960x960
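Quick way to check a size and see the nearest multiples, as a tiny Python sketch (the helper names are made up):

```python
def is_multiple_of(value: int, base: int = 64) -> bool:
    """True if value divides evenly by base."""
    return value % base == 0

def nearest_multiples(value: int, base: int = 64) -> tuple[int, int]:
    """Return the closest multiples of base at or below and at or above value."""
    lower = (value // base) * base
    upper = lower if lower == value else lower + base
    return lower, upper

for side in (864, 960):
    print(side, is_multiple_of(side), nearest_multiples(side))
# 864 False (832, 896)
# 960 True (960, 960)
```

So 864 sits between 832 and 896, while 960 is exactly 15 x 64.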
Some user interfaces have good resolution presets that you can use.
Sometimes you get better results with lower resolutions and upscaling afterwards. Because the higher the resolution, the more likely it is that the image can feel empty.
that's kind of legacy advice. it meant more in the early days around sd 1.4. generation code and models have since been updated to behave better across more resolutions. NovelAI introduced resolution buckets in training code, and base models since then have all been trained with bucketed resolutions, which lets the models handle non-square resolutions a lot better.
All of the SD1.5 refines use bucketed training sets afaik
There's no one resolution that's good, IMO. I prefer 4:3 ratios, or 3:4 if you want the other orientation. I often do a 2-stage generation: first pass at the lower resolution with 20 steps, then a second img2img pass at a higher resolution with a 0.6-0.7 denoise value and fewer steps
I mean you can never go wrong with 9:16, that’s the standard aspect ratio for portrait videos on mobile
yeah. sticking to square resolutions like it's a technical limitation is kind of bad advice. you're welcome to stick to them for preference reasons but it's not necessary
Hi guys. I'm just now getting into AI, I have an M3 MacBook Air. Is using AI models possible on MacBooks, or are the majority limited to Windows? I'm a total newbie, thanks.
https://github.com/LykosAI/StabilityMatrix/releases/tag/v2.8.0 try this. it'll offer you lots of different packages to install and try. i think it configures the UIs to use Apple silicon
Thank you. Would examples such as photo to video be possible in given package?
possibly. It'll install comfyui for macos. beyond that you'll need to set up comfyui for video and download the appropriate models
https://civitai.com/models/968568?modelVersionId=1097378 here's an example workflow. i'm not well versed in the video realm, but i believe cogvideox is the leading local solution, or possibly hunyuan from tencent. You can at least get comfyui going, but you'll have to study to get video set up, if it'll work at all




