#💬|general-chat
Nay, this channel just doesn't have images. #🏞|general-with-images right below does though :P
https://github.com/stepfun-ai/ComfyUI-StepVideo Appears they added support last month :P
Oh, only via api apparently. How much vram will it require minimum? 
yeah api isn't rly support
are you including blockswap?
with blockswap it will run on anything
without blockswap I am not sure what the actual minimum is
but their reference code defaults to 4 H100s and then one more server for controlling it
Haven't even acquired it yet, as i couldn't find a version of it being a single .safetensors
oh I don't think we should be doing that anyway
huggingface's original format was so much better
VAE and text encoders separate, and files broken up
Interesting
How much memory does it require to run ram and vram wise? Like, same as wan 720p?
that's what I was saying
if you blockswap then it will run on anything only a few GB needed
without blockswap I am not sure
their code is 4 H100 plus one more server
to run it

Then not for me, as i prefer all local 
Easier to pay 80 cents per 24 hours of my own 3090 than 80 cents for an hour of runpod for instance 
cloud is cheaper, than the electricity required to run a 3090 at home
if you want to use at home for privacy that is fine
but cloud is actually the cheaper option
Well, runpod is 24x pricier than my electricity :P
Plus I do everything locally anyway for privacy. I would only use cloud to test out a model's capabilities, or to compare a workstation card's speed against my own card
https://huggingface.co/diffusers/stable-diffusion-xl-1.0-inpainting-0.1 I want to use this model for Inpainting but I have no idea how to put it on forge UI I see no safetensor file
locally run stuff is freedom
in economic sense always run cloud
https://www.youtube.com/watch?v=TCHXzX6vUcA https://rumble.com/v6sai4p--dance-of-destiny-english-subtitle.html my first time using ai plus anime footage in my music video
yeah I have no problem with libertarians that's fine
personally I just want cheap inference
you can do a mixture anyway
yeah runpod is the wrong one
vast ai is the cheap one
Anyone?
look at this one I linked to
#🏞|general-with-images message
RTX 4090 for $0.109/hr
for me to run that at home would cost $0.30 in electricity
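The cloud-versus-home comparison above comes down to simple arithmetic. A rough sketch, counting electricity only (no hardware cost); the wattages and the $0.10/kWh tariff are assumptions for illustration, while $0.109/hr is the rental price quoted in the chat:

```python
def hourly_electricity_cost(watts: float, price_per_kwh: float) -> float:
    """Electricity cost in dollars for running a load of `watts` for one hour."""
    return (watts / 1000.0) * price_per_kwh


def break_even_price_per_kwh(cloud_rate_per_hour: float, watts: float) -> float:
    """Electricity price above which renting at `cloud_rate_per_hour` is cheaper."""
    return cloud_rate_per_hour / (watts / 1000.0)


# Assumed numbers: a 350 W card on a hypothetical $0.10/kWh tariff.
home = hourly_electricity_cost(350, 0.10)
print(f"home: ${home:.3f}/hr, ${home * 24:.2f}/day")

# $0.109/hr is the rental price from the chat; the 450 W draw is an assumption.
print(f"break-even: ${break_even_price_per_kwh(0.109, 450):.3f}/kWh")
```

Which side wins depends mostly on the local tariff and on how many hours the card actually runs, which is why both people in the chat can be right about their own setup.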
cheap inference likely won't happen till ai tools are more mainstream
kinda like dot com bubble
yea I agree
when it becomes mainstream there will be more political will
and a larger market to sell into so raising capital will be easier
they need to research targeting use-cases more
at the moment its all a hammer in search of a nail
the idea is to perfect your ai craft like using ai tools so when it becomes mainstream you have an edge
Would anyone be able to point me towards a guide geared towards using InvokeAI to create large amounts of game assets (images of playing cards)? I'm making a game and I have cards that give powerups. I was thinking I could take the card descriptions, pass those to InvokeAI, and have it pump out placeholder artwork so I could continue development.
never used invokeAI :<
I need to research more on image generation
if you've mostly used comfy trying diffusers can be good
or pure pytorch like the original flux code (that particular code is rly nice)
@woven panther hey i appreciate you started porting some SkyReels V2 stuff ❤️
https://github.com/mr-fool/ai-background-removal-toolkit did this like yesterday. Learning the library
background removal is nice yeah
the library does all the heavy lifting, I just slapped a gui on it
how many background removal tools do we have these days? i lost count :3
there are plenty of tutorials on YouTube. What is your exact question? It sounds like you just want to do text2image
is like react todo list. It is flooded
basically nowadays as soon as someone pumps out an ai tool, someone will fork it to boost their resume
ok how do I install the sdxl inpainting model or the ace plus model, none of them are working
you don't install models
you copy paste them in the correct folders
new model every day recently
capstone projects used to be a good idea until everyone just copied each other on github as soon as something looked cool
haha no time to rest eh 🙂
no time to use them
mhm
I just want to either put https://huggingface.co/diffusers/stable-diffusion-xl-1.0-inpainting-0.1 or https://huggingface.co/ali-vilab/ACE_Plus in forgeUI
I would assume you put them in the same directory as normal models, but not sure. I don't use forge
does anyone here use forge?
you use them only for inpainting the same way you would inpaint with a normal model
what is not working for you? does the model not appear in the model selection menu?
some forge users hang here
something interesting about SkyReels V2, i did a small inference test using basic comfy workflow with the i2v 540p 1.3 model,
and it generates the video sure... but like the starting frame (image) very quickly and abruptly changes and the camera even moves
in a strange way lol, idk.. il wait for your wrapper anyway, maybe you will implement it the way it's meant to be used, cause right now
kinda wonky lol @woven panther
forge did look appealing I just never got round to it
and now I have left for Rust lol
it doesnt have to be those models, I just want to use an inpainting model to expand some images
with forge UI
but I keep getting error
what errors?
ye its tricky cos not having used forge UI its hard to help
what UI do you guys use
comfyui or (rarely) InvokeAI
Does vast have templates? Like run it once, it'll auto setup everything for you ready to go?
i personally use just the normal comfy the portable one, not the desktop
invokeai is the most simple to use ui in my opinion 🤷♂️
yeah
standard docker
comfyui is the most flexible one, but with a high learning curve
i mean it's not even that high...
comfy is the best gui yeah
to beat gui you have to go to command line / code frameworks
it's just the model, if it doesn't "recognize" the input, it does whatever
and it follows the prompt REALLY closely
if you prompt something that's not in your input image, it can just move onto that and ignore it
and it's human centric model.. non-human stuff does that more often than not
I did get some amazing outputs when I initially tested it though
comfyui sucks, I drag the node...#🏞|general-with-images message
so I don't think there's anything wrong, it works in both the wrapper and native just as it is too
and oh? where the f is it? #🏞|general-with-images message
the GUI has had more bugs lately
hmm maybe il try again and see, but il wait for your wrapper as well 🙂
but you can decouple the GUI from the back end and use the back end alone
one of my current projects is to make rust front end
comfy my ass
it already works, I mean there are many Skyreels models... the DF is the one that's very different and needs its own code, the rest just work with any old workflow
yea
I mean what GUI is alternative?
comfyUI doesnt even work, I cant manage to install inpainting in forgeUI fm
alternative GUIs are forge or invoke?
these are like 0.01% of comfyui features
if you include CLI/code-based then you get all of pytorch/jax/julia/C++/rust ecosystems etc
but these are not GUI
try the non-desktop version of comfy, it should work
invoke and forge have quite a lot of features
inpainting isnt working for me
99% of the users don't need 99% of the extra features
it needs a model to use
link?
its tricky cos a lot of features are edge cases
where you only need it a handful of times
but in that moment you really needed it
there are a lot of features that I have not used in recent workflows that I found indispensable in previous ones
whats the point of having one million features if something as basic as dragging objects on screen doesnt work
I mean I agree I've switched to rust lol
will try this but im pretty obfuscated right now
it seems to work for everyone else 😂
I did a clean installation and doesnt work
the thing is its hard no matter what you do
some stuff like loading and casting I find hard in every single codebase and language
and sorting out compile
sage attention and teacache/firstblockcache also
this stuff needs setting up in every fresh project
if you want a easy to install and easy to use ui, I would use invokeai.
I would argue that comfyui is the wrong tool for you. It's complicated to use when you don't understand the internals
technology moves so fast that by the time those will be default in some setups, they will most likely be deprecated by that point 😂
forgeUI is the one Im linking cause it's the same as automatic1111, the one I used in the past
its just that I cant manage inpainting for now
you can also use forge. But you won't get help from anyone if you can't say precisely what your error message is
isnt swarmui a nice GUI? it has tons of features, like close to comfy features i think and it should give you inpainting stuff
I downloaded this model ace_plus_fft.safetensors
which is supposed to be an inpainting model
I put it in the stable diffusion models folder
and when I try to do a generation I get AssertionError: You do not have CLIP state dict!
caching in diffusion is around 2 years old
its the one the other guy said
sage is new yeah
what do I use then
although I actually use STA instead of sage where I can
use sdxl inpainting for example
and I cant manage to download that one
or flux inpainting if forge supports it (I don't know)
it's a diffusers model. You could check civitai if they have a forge compatible one
a diffuser model isnt a single file?
I just want to download the model but I see a bunch of files instead of a single one and I dont know how to set up in forge
you dont need more files other than that one?
what about diffusion_pytorch_model.safetensors whats the difference
well you might need vae and clip, but thats also all there for you to download
fp16 is smaller and basically the same quality
well i never used forge idk.. but usually thats how it works, you either get 3 separate things (unet, vae and clip) or if you lucky you have all in one.
im sure there was a download for inpainting all in one somewhere, but i dont remember where
thats the problem then, it wasnt working because I was missing files 🤦🏼♂️
btw why is this model bad?
no, it's multiple files and also a different naming scheme
I didnt know that
it's not a inpainting model
also, it's based on flux-fill which is not supported by forge
at least it seems so for me
which is sad. flux-fill is probably the strongest inpainting model
ye
powerpaint v2 for SD 1.5 is not bad
its bizarrely strong for an sd 1.5 thing
before Flux fill it regularly took SOTAs
yea if you have the hardware specs, go maybe with flux fill
https://huggingface.co/diffusers/stable-diffusion-xl-1.0-inpainting-0.1/tree/main im downloading this for the moment
wait
ok, time out because Im getting confused
lol
depends on your specs
I have an RTX 4090
flux-fill is the best inpainting model BUT I don't know if it is supported by forge
cause on their GitHub they write they haven't implemented full flux support yet
yikes
PFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
I want to bash my head against a wall
this is why comfy is king 🙂
isnt there a standalone github repository for inpainting or something
so I can use this model?
i mean im sure you can find even a huggingface space for free to inpaint
yes pretty much- the base flux code repo
or use invokeai, it has support for flux
do you have a link?
if you search github flux black forest labs it should come up
do I have to use the flux model or are there no other good inpainting models
there are plenty of inpainting models
https://www.invoke.com/ this one?
for example, this does flux fill outpainting:
https://huggingface.co/spaces/multimodalart/flux-fill-outpaint
yeah but I dont know which ones
anyone can give me tip on how to make face remain the same on i2v using wan 2.1 model
yes
the flux one you mentioned doesnt seem to work with forge, and the other model the other guy suggested doesnt seem to be an inpainting model after all
you can try the sdxl inpainting you downloaded with forge
and after 3 UI installs none of them work because one is incompatible with flux and comfyUI doesnt even have a working drag feature and Im getting crazy
which one
try invokeai. It has a full installer that also automatically download all the models for you
the unet/diffusion_pytorch_model.fp16.safetensors is the file you need
wait forge UI seems to work with flux
Im gonna try with forge I dont want to install any more stuff
But I need 3 files, the one inside the vae folder, the one inside the unet folder
and the third one?
text encoders
there are 2 text encoder folders
do I have to download both or only 1 of them?
you haven't used sdxl so far?
no
I used 1.5 long ago
or just use invokeai 😬
and I remember setting up the model and the vae, then no longer needing the vae files
but dunno how things work now
maybe later
both
so I end up with 4 files in the end, the unet, the vae and the 2 text encoders
I don't want to make advertisement for invokeai 😂 it's just really newcomer friendly and it sometimes makes me crazy when tools like comfyui are recommended for new people although these tools are definitely more for professional users
yes
and if you want to also use sdxl you only need the sdxl unet file
TBH I just forget for months at a time that invoke exists
as a tool I have no issue with it
so unet its like the base model and the other 3 are addons for inpainting?
wanna try forge for the moment, its similar to 1111
you can also use all in one Juggernaut inpainting model, based on sdxl:
https://civitai.com/models/403361/juggernaut-xl-inpainting
no, you always have text encoders, vae, and then the real model (called unet for historical reasons)
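A quick way to see whether a downloaded diffusers-style folder is complete is to check that each component folder holds a weights file. This is only a sketch: the subfolder names below follow the usual diffusers SDXL repository layout and are an assumption, and `missing_components` is a hypothetical helper, not part of any UI or library.

```python
from pathlib import Path

# Subfolder names follow the usual diffusers SDXL layout; treat them as an
# assumption and adjust if the repository you downloaded differs.
EXPECTED_COMPONENTS = ("unet", "vae", "text_encoder", "text_encoder_2")


def missing_components(model_dir: str) -> list[str]:
    """Return the component folders that contain no .safetensors file."""
    root = Path(model_dir)
    missing = []
    for name in EXPECTED_COMPONENTS:
        folder = root / name
        if not folder.is_dir() or not any(folder.glob("*.safetensors")):
            missing.append(name)
    return missing
```

An empty list means all four pieces (unet, vae, and both text encoders) are present locally. It says nothing about whether a given UI can load that layout, since some UIs expect a single all-in-one checkpoint instead.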
ah ok
inpainting is an extension of the model
didn't see
then why did you say this
I always thought inpainting models were finetuned versions of a model exclusively for inpainting
is ok
it does not appear to be.
do you have a link to the flux model? maybe I can make it work
It is
i used to run it on a 3070TI
but it depends if you are using an nvidia card or amd
yeah i crossposted like a dummy and cs1o told me that i needed to add --medvram-sdxl to the .bat
which i didn't do at first
but now it's running a lot better now that i did
you only need the unet cause all other components are identical between sdxl and sdxl-inpaint
no, it's also a bit different, because it gets three inputs: the original image, the mask, and the noised image
even if flux is supported by forge, this doesn't mean that the inpaint model is supported
ValueError: Failed to recognize model type!
fuck this
for the love of God if anyone's reading this and manages to make this work in forgeUI please let me know https://huggingface.co/diffusers/stable-diffusion-xl-1.0-inpainting-0.1
woah another one, gotta be bots or something
happens a lot here
lol i believe you
hmm hope you find clientele here
hmm not here probably, rarely some businesses appear here wanting AI solutions for dirty cheap/free
but its mostly a community server
you might have more luck on fiverr for freelancing
hi if this is real person
your way of advertising is a really bad idea
cos it literally looks like a malware bot
Hello everyone, I'm new on Stable diffusion, as I saw, we can create images with models, and upgrade them with Loras, is there something else that we have to input to upgrade them ?
loras are model finetunes, not necessarily upgrades. But the sd ecosystem is huge, so yes, there is a lot of other stuff
Do you know if there are youtube videos where I can see what I have to implement ?
i dont understand your question fully
you're asking if theres more stuff you can use than loras?
if so, theres controlnet you could use and embeddings?
Fine Ty, I will look if I can get some info on youtube
Hi, I'm relatively new to this AI stuff, and I have a question.
I'm using qDiffusion. I tried out some negative embeddings, and I got this error message, and I don't know how to fix it. Any ideas? Error while Encoding.
stack expects each tensor to be equal size, but got [1280] at entry 0 and [768] at entry 18 (clip.py:71)
are you trying to use images or just text?
Text to image
I managed to get chatgpt to fix the code to automatically resize it. Seems to work just fine now.
bruh sora are the biggest posse of wussies I have ever witnessed
cant have even a milimeter of cleavage on ur gens before they get tagged as violating policies
thank the ultra-feminists for this bastardization
** If anyone is using Nvidia driver 576.02, there is a bug that can cause it to ignore GPU temperatures and therefore not control the cooling correctly. I found that it can be fixed by reinstalling it using the custom installation method, and checking the box for "Clean installation". Check that your GPU temperature changes and isn't fixed at the temperature it was at during start-up.
Hello, nice to meet you
qdiffusion is this? https://github.com/arenasys/qDiffusion?tab=readme-ov-file
seems interesting
Hello! Looking forward to explore!
from transformers import BertTokenizerFast, BertModel
import torch

# word_ids() is only available on fast tokenizers, hence BertTokenizerFast
tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')
model.eval()

sent = "An example sentence to embed."  # placeholder input, replace with your own
inputs = tokenizer(sent, return_tensors="pt", return_attention_mask=True, return_token_type_ids=True)
with torch.no_grad():  # inference only, no gradients needed
    outputs = model(**inputs)
embeddings = outputs.last_hidden_state[0]  # shape: [seq_len, hidden_dim]

# Get mapping from subwords to original words
word_ids = inputs.word_ids()

# Accumulate embeddings per word
word_embeddings = []
current_word_id = None
current_word_embeddings = []
for idx, word_id in enumerate(word_ids):
    if word_id is None:
        continue  # special tokens ([CLS], [SEP]) belong to no word
    if word_id != current_word_id:
        # A new word starts: average the subword vectors of the previous one
        if current_word_embeddings:
            word_embeddings.append(torch.mean(torch.stack(current_word_embeddings), dim=0))
        current_word_embeddings = [embeddings[idx]]
        current_word_id = word_id
    else:
        current_word_embeddings.append(embeddings[idx])

# Append the last word
if current_word_embeddings:
    word_embeddings.append(torch.mean(torch.stack(current_word_embeddings), dim=0))

# Convert to tensor, shape: [num_words, hidden_dim]
sent_embedding = torch.stack(word_embeddings)
Can confirm crashes have stopped too
Yeah
I use qt framework with py sometimes
does anyone here use forgeUI
Yes a lot of people but i suspect your question is actually different
I just want to make this work
or this
but I keep getting `AssertionError: You do not have CLIP state dict!`
Like in the cloud?
Hi
local
I have forgeUI locally but i cant make any of those work
maybe im not setting the folders right
@woven panther LOL.. Phantom Wan ? this really doesn't stop does it... haha
Unsure if this is the best place to ask.
I have a 3080 with 10Gb. What would be the best option for me to train a LoRa? Also open to using runpod or similar cloud options.
I've heard people use flux trainer. I'm not set on a model yet, but between flux and sdxl
I’ve lately been thinking a lot about how AI is affecting the graphic design industry, so I made a quick dive into the topic with this new video. 🤔🎨
Would love to hear your thoughts — I’m open to any feedback! 🙌
Check it out here:
https://youtu.be/uLwnGXXPrfc?si=8tzI6EZaaGERGehq
Hello people!
Today im here to introduce a important question to the people
I want to upscale a picture of Shannon Sharpe
And i want to know how, and what the best method is for the high quality pictures. Thank you very much
I am not an expert in upscaling Shannon Sharpe images, but if a more general approach is fine for you i would suggest you use an upscale tool with models for photographs / real images. As a model to start with i would suggest REAL-ESRGAN. Available in different free tools but the easiest would be upscayl or freescaler.
Upscayl is free?
Yes, i will soon be the only expert in upscaling Shannon Sharpe images
It will go in the history book
Yes it is free and pretty sure the Shannon Sharpe Image upscaling Expert market is not very competitive 🙂
Would a 4GB GTX 1630 be useful for AI?
My guess is probably not but I'm trying to look for anything I can use
technically yes but you will be very very limited
I think you could run stable diffusion 1.5? I personally havent tried it since it was hella unoptimized but nowadays you maybe could
itll be hella slow for larger images. like anything over 512x512. And itll probably still offload some work to your ram since you have 4GB of vram.
I personally would not recommend a 1630 at all, save some more pennies
maybe a 2070 Super 8GB or something
yeah i think 8GB is pushing it but workable. Ive got 10GB with my 3080 and its still slow sometimes if im trying to upscale a lot.
It's what I already have on hand. I was able to run an LLM on a 12GB 6700 XT which I currently main.
My 3070ti used to do really well with xl pushing 30s per image with a few loras
Cant you try running sdxl with zluda and tiled vae with that card
Probably forge
Oh nice, I will look into that
2060 basically the minimum
acc runs quite a good amount of stuff but stuff like sdxl struggles a bit
Does it really cost the Nvidia and amd that much to put 24gb vram in their gpus
Likely limited by the memory bus size
coz 5060ti they managed to squeeze 16gb vram in
Added kevin hart into the mix aswell
and 5070 ti and 5080 likely have larger memory bus size than that so they should be able to squeeze 24gb vram in
oh well amd, intel and nvidia are not giving us the best at a good price
nah bro the gpu's come pre-scalped now
average price for a 5090 if you manage to catch one is about $3200 USD
5080 I purchased mine at $1600
lowest they go probably $1300 for the crappy PNY ones
Can i run SDXL on 8gb Vram?
Yes! i used to run it just fine on my 3070TI. if you use AMD however im not sure
Nah i got Intel
ooh an intel arc thats a first ive seen it
Is it hard to set up? Im new to image generations and just want to try make some images in various art styles
Hmmm im not sure what you consider difficult, have you used Git before?
Not really used no
hmm are you on windows?
Yes
lmao wait
when u said AMD i thought u meant as in CPU
I forgot they made GPU's
So thats why I said Intel
I got a Nvidia card ahahahhaha
ohhh
yeah then i recommend SwarmUI or ForgeWebui
swarm is an easier install imo
but in tech-support you can have more support here
with forge webui
theres a tutorial in the #🤝|tech-support pinned comments
and they didn't before/
@atomic mortar Do u perhaps know why all my images are like deformed
like the faces etc
the body
Base sdxl?
Yes
what is the difference between model and checkpoint
Oh its the same thing
But I'm going to bed, 3am n all
Hahahahahah same for me
If you get stuck i recommend popping into the #🤝|tech-support channel or the SwarmUI discord if its a UI specific thing
Any apes here know any pharma shit https://www.gilead.com/news/news-details/2025/gilead-presents-new-hiv-treatment-and-cure-research-data-at-croi-2025-including-an-investigational-long-acting-twice-yearly-therapy-option is that the oaktree kim is referring to
Okay I have to admit Sora's image-to-video is blowing me away
Im going to try animating my Taylor Swift dark magician girl and pray that it doesn't tag it as violation
it is INSANE how detailed it is. Almost tempts me to pay for the Pro version
So, I've been away for a while from SD ai art gen, and I see a LOT of new model types, Such as Illustrious, Pony, Flux, and more. I'm mostly used to 1.5 and SDXL, what benefits do the new model types bring, and what use cases should I use them for?
It's all about finding what feels right and comfortable to use.
Flux is the newest model family and has the best prompt following. It also gets anatomy right most of the time. Its weaknesses are a certain "plastic look" for photorealism and its limited understanding of many styles (paintings in particular). Both can be solved via custom models, though
Hidream erasure 😂
HiDream is probably just Flux finetuned on new text encoders 🙊
it probably is secretly Flux in a trench coat and a hat yeah
@atomic mortar Hey
I was wondering how to inpaint pictures
Inpainting is changing stuff on pictures u have made right?
If you just want to fix the face i recommend segment:face or use the + button next to the prompt box
unless its pre-installed already
Automatic segmentation is basically adetailer
But bigger
You can segment anything to "fix"
Always do a segment at the end of a prompt
What is automatic segmentation compared with segment:face
So its like 1girl, brown hair, etc segment:face blue eyes
Its the same
So I add segment:face to my prompt?
When i generate or like after?
I will, yeah. will adding that segment also add faces? because rn i have a prompt where i tell it to show face with this: (face out of frame:1.1) in negative. I grabbed the prompt from Civitai so Idk how it works completely
Ohhh
Idk why its not working then unfortunately
I have in positive
"face showing"
in negative (face out of frame:1.1)
@prisma owl can you send me the prompt + image either in #🏞|general-with-images or dms
Im finally using flux fill with comfyUI
but outpainting like 100 pixels takes me hours and hours is that normal?
outpainting a single image is not nearly done after 3 hours
Pony and Illustrious (IL) are SDXL so heavily tuned that they're essentially their own model now (loras that work on SDXL prob don't work on pony/IL, vice versa, etc)
the 2 are both anime focused, and I'm p sure ppl just use an IL finetune of their liking over pony now
hi. i'm iqram
is it normal that generating an outpainting with flux takes so much time?
outpainting, inpainting, txt2img, img2img they all are internally the same thing
so no, it should not take more time than generating an image of same size
sounds like you do computations on your cpu instead of gpu
I also think so, my stupid ass started CPU instead of gpu
Will test
yeah that was the problem, no wonder lol
outpainting now takes almost no time, but it isnt working
it gives me a grey extension instead of generating anything really
IMAGINE/Stylized compass integrated into a TV screen or antenna
you should either
- use flux-fill and 100 denoising strength.
- copy the edge of the image such that it is filled and use e.g. 80% denoising strength
in both cases you don't need a prompt
denoising value of MAX takes the og image as a prompt completely right?
2) copy the edge of the image such that it is filled and use e.g. 80% denoising strength
WDYM?
it changes the masked region as much as possible
it says lower values will maintain the structure of the OG allowing for image to image sampling
in img2img, the higher the denoise, the more of the original image is changed
you want to outpaint, so the part you want to change is empty (e.g. gray). you want to 100% replace this part of the image
yeah, so what changes that is denoise 100 not denoise 0 then
100% denoise means completely replace this part of the image
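To make the denoise discussion concrete: in diffusers-style img2img pipelines the strength roughly decides how many of the sampling steps actually run. The sketch below mirrors that idea under that assumption; `steps_to_run` is a hypothetical helper, and ComfyUI or Forge may schedule steps differently.

```python
def steps_to_run(num_steps: int, denoise: float) -> tuple[int, int]:
    """Return (first_step_index, steps_actually_run) for a denoise strength.

    denoise=1.0 starts from pure noise and runs every step, so the masked
    area is fully replaced; denoise=0.0 runs no steps and changes nothing.
    """
    if not 0.0 <= denoise <= 1.0:
        raise ValueError("denoise must be between 0 and 1")
    run = min(int(num_steps * denoise), num_steps)
    start = max(num_steps - run, 0)
    return start, run


print(steps_to_run(20, 1.0))  # full replacement, what outpainting wants
print(steps_to_run(10, 0.5))  # keeps much of the input structure
```

This is why an empty (gray) outpaint area needs a denoise of 100%: anything lower starts from a partly noised copy of the gray fill and tends to preserve it.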
ok cool
also, does it matter how many pixels do I outpaint?
should I stick with 64x64 multiples or something like that then crop
instead of dunno, augmenting top by 81 pixels and left with 149
in theory multiple of 16 but I think most tools handle that internally
so if I want something like 60 pixels, 16*4=64 then crop the extra 4 pixels
rather than just augmenting 60
outpainting is the same as inpainting. You just extend the image size beforehand and then do inpaint on the extended edges
only the total image size has to be multiple of 16
total size of the whole image or only the extra stuff
whole image
in machine learning its just easier to make everything multiples of 64
and as said, tools usually handle that internally anyways (e.g. extend to 16 and then crop)
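The "extend to a multiple of 16, then crop" idea can be sketched in a few lines. The helper name `pad_to_multiple` and the example sizes are made up for illustration; the multiple of 16 is the figure given in the chat.

```python
def pad_to_multiple(size: int, multiple: int = 16) -> tuple[int, int]:
    """Round `size` up to the next multiple; return (padded_size, pixels_to_crop)."""
    padded = -(-size // multiple) * multiple  # ceiling division
    return padded, padded - size


# Hypothetical example: a 1000 px wide image outpainted by 60 px gives 1060 px
# in total, which is not a multiple of 16.
padded_width, crop = pad_to_multiple(1000 + 60)
print(padded_width, crop)
```

Note that the rounding applies to the whole image size, not to the outpainted strip, so the strip itself can be any width as long as the total is padded before generation and cropped afterwards.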
so I shouldnt worry with comfyUI then
im using the default template for fluxfill
I just put what I want in the pad image node
I think the default templates are really bad
everyone got this sequence stuck in their head now lol 🫠
64, 128, 192, 256, 320, 384, 448, 512, 576, 640, 704, 768, 832, 896, 960, 1024, 1088, 1152, 1216, 1280
cause they don't preserve the original pixels
do you have a good template?
I have a rectangular image and I just want to turn it into a square
I just want to outpaint not many pixels really, closest is 128 extra pixels
you want to copy the changed part of your image into the original part.
But you can also keep the current template and check how the quality is first
ah ok padding doesnt let you choose any outpainting anyways
I can choose 64 or 72 but no in between
I dont get it, what do you mean by copying the changed part
also what is feathering exactly and what could it be a good amount, here #🏞|general-with-images message
you encode your input image with the vae, then change the edge of the image, then decode it back through the vae. The vae is a compressor. Think of it like you convert a png image into jpg and then back to png. It will lose quality
ah so I cut paste the 72 generated pixels and stitch them onto my original image
it's not so severe with Flux as flux vae is using less compression than sd 1 and xl
yes
that would prevent your original image from losing quality
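The stitching step described above can be sketched as pasting the untouched original back over the generated canvas, so only the new border keeps VAE-decoded pixels. Plain nested lists stand in for pixel arrays here to keep the sketch dependency-free; `paste_original` is a hypothetical helper, and a real workflow would do the same with an image library or a composite node.

```python
def paste_original(generated, original, left, top):
    """Copy `original` back over `generated` at offset (left, top).

    Both images are lists of rows of pixel values. The result keeps the
    generated pixels only where the original does not cover the canvas.
    """
    if top + len(original) > len(generated):
        raise ValueError("original does not fit vertically")
    result = [row[:] for row in generated]  # leave the inputs untouched
    for y, row in enumerate(original):
        if left + len(row) > len(result[top + y]):
            raise ValueError("original does not fit horizontally")
        result[top + y][left:left + len(row)] = row
    return result


# Tiny example: a 2x2 original placed inside a 3x4 generated canvas.
canvas = [["g"] * 4 for _ in range(3)]
photo = [["o", "o"], ["o", "o"]]
stitched = paste_original(canvas, photo, left=1, top=1)
```

A hard edge between pasted and generated pixels can leave a visible seam, which is where a small feathered blend at the boundary usually comes in.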
Ive heard that using ai generated isnt good practice
changing my image to a square then using that square as training data isnt good
but I just did a generation and visually, it looks ok
you could also just train on a method like flux that natively supports non-square images 😅
or sdxl
seriously? fuuk
oh well, I wanted to try first with sd1.5, then the other ones and compare results
if people trained on the latest models they would have an easier time
I mean its good learning in the end
its the opposite to people's intuition
people think training big new model would be harder but its easier
I mean, Flux trains very differently from SDXL, so it might be good to try both and decide
but usually flux just gives you the best results but takes the most time
with lion I saw someone get ok result in 70 steps
did require lion though
bit of a messy optim
so flux is better than sdxl, and sdxl is better than sd1.5
bigger = better almost always
lion is weird 😬
I thought training with flux was harder but if you say otherwise
yes
can you train flux in comfy UI or whats the way to go nowadays?
flux isnt a SD model so I suppose its different in some ways
flux is by the same developers as SD
because if I dont have to waste that much time setting up the dataset...
it's just not called SD because the devs left the company
ty copyright 😦
I mean at this point their new company is a stronger brand so
it is swings and roundabouts 😄
there are so many training tools
the main threat to any western AI firm is the Chinese firms anyway
kohya, onetrainer, simpletuner, aitoolkit
the Chinese firms are releasing very large models with full apache/mit licenses
I actually don't know how western AI startups can compete with that
I am not sure they can compete, purely on the model front
so they will have to pivot
to more service-based model or something
I leave for 2 years and everything changes completely
they can just build on top of those models. I don't think open source is a threat at all.
I remember training in kohya, not the other ones
if quality has increased that much im excited
they all usually use the same input more or less. Only configuration is different
btw now that I catch you connected #🏞|general-with-images message what is feathering exactly
open source rocks
cos why would people pay the middleman
is the issue
I stay out of AI investing cos of this sort of reason
I can't see where the moats are
they do all the time.
@abstract quarry do you have any good guides for training the flux model?
I just think its a way smaller market
than for example 2-ish years or so ago
when firms like midjourney had monopolies
I think most tools have guides or default settings
I remember Simpletuner and Aitoolkit have default settings for Flux.
its just that someone, I dont remember who, said comfyUI isnt for training or something
but Simpletuner might be difficult on Windows
I feel like I'd love to use simpletuner but the install is not made easy
compared to others where its a container or an API endpoint
it's just a normal python package with poetry 🤷♂️
maybe its skill issue on my part
I only skimmed the docs but they looked quite manual
I mostly look either for cloud endpoints or containers I can quickly make cloud endpoint
I tried kohya, simpletuner and Aitoolkit. I found them all quite similar
is mostly that for some I found containers
funnily enough there is a cog for simpletuner but its an old version
Hello air fryer people
hello
I should just be less lazy and make container, automation script and endpoint for each of these myself
btw how are prompts with flux, is it better to have words separated by commas or a long continuous description?
Long
long ye
hey i need someone who is really good on image generation
it is better to just ask your question directly 🙂
if its a question someone can answer they will but fishing to get the real question out is what i do at my job enough already lol
new papers that seem interesting
Boosting Generative Image Modeling via Joint Image-Feature Synthesis
https://arxiv.org/abs/2504.16064v1
REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion Transformers
https://arxiv.org/abs/2504.10483
"yet integrating representation learning with generative modeling remains a challenge" can anyone smarter than me explain this?
does it mean training?
Is it possible to "convert" a checkpoint to a lower parameter count? GGUF is for quantized models, but iirc GGUF needs its own nodes, and I would wanna use a 7/8b parameter wan.safetensors for instance
both build off of work mentioned in this paper
Why does everyone here use perfect grammar as if they are at work chatting on teams or smth?
Like... I do not care if you talk normally.
😂
I need to cover my nipples with AI InPaint, but no workflow works for me. Does anyone have a solution? I pay for the service.
photoshop?
I have a question
I was reading on CivitAI and they said they are updating their policy
Yeah. Fun stuff.
Is it for pictures or does it also mean they cant upload models/loras on there surrounding those things
policies policies policies censorship censorship censorship
Didnt this orange mf say he was going to end these invasive restrictions?
lol, reading this you could think that civitai bans nsfw content, but no, they only ban very extreme and specific stuff
and now people cry cause they can no longer generate porn with women having period
lmfaoooo
yeah there is a line for sure, but the overcorrections are insane
nothing like waiting 15+ minutes for a single video gen in Sora just to be told that it can't be shown because of a mysterious "policy violation"
ChatGPT might be extreme in its censorship
but I'm also annoyed that the only big image gen website is a porn site basically
bruh. welcome to reality. Goonality should I say
AI is 98% gooner stuff, 2% productive stuff
I wouldn't have it any other way.
I'm getting error messages on Sora that they've hit capacity since everyone and their grandma is trying out the new models and based on the gens I've seen I bet a large chunk of that overcapacity comes from gooners like me trying to bypass their ridiculous censorship through trial and error
any art platform is full of nsfw. Looking through DeviantArt means looking through naked bodies.
The difference is: DeviantArt is aesthetic. It's art.
Civitai is just pervert porn.
like when you go on a porn site you want to stay in certain categories. It's so annoying seeing an ad popup of, say, granny porn 😬 similarly, I'm sure straight people don't want to see gay fetish porn.
But on civitai all these weird fetish stuff is just thrown onto you. You could open a model "world morph into glass" and half of the showcase images are masturbating women with unnatural large breasts. It's just disgusting and it's difficult to get rid of it. You have to disable all mature content but even then you still see a lot of fetish stuff
That's a lie. CivitAI doesn't show anything NSFW unless you turn that on in the settings
Now if your concern is that weird porn is getting mixed in with traditional porn, well, welcome to the golden age of depravity circa 2025. As the world decays, people get lonelier > people get into weirder and more depraved shit which is then normalized. Idiocracy/cyberpunk dystopia in full motion.
even in sfw mode you get a lot of weird fetish stuff that is just not "nude enough" to be counted as nsfw
Sad but true. I think we can all agree that the changes to remove minors in images as well as implications of SA or forced sexual situations are probably for the best, and probably the removal of celebrities too. Art is meant to be subjective, and if you see something you dont like you shouldnt click on it, and you're free to have your opinion of it. But that does not mean the artist is in the wrong for creating it. People think the works of certain surrealist and horror artists are over the top or distasteful because the imagery doesnt agree with them, but that doesnt mean it isnt art.
We live in a day and age where increasingly sex and porn are being normalized, even the weirder fetishes, and that of course means its gonna bleed into the artistic side of things. Case in point, danbooru is full of it and a lot of that isnt just AI art. You just gotta accept that that is the space now, and take the tools you need to make what you want and go about your business. Besides, a lot of this is just Civit covering their asses before a lawsuit happens.
Oh all big companies are covering their asses, but the bias is absolutely asinine
Go to Sora's main page and you're going to find loads of Donald Trump or Putin turning into poop parodies
yet the moment I try to even remotely animate my Taylor Swift dark magician girl - policy violation. Of course.
does someone know if its possible to use fooocus from code in visual studio? I'm trying to use the inpaint, lora, prompt etc features but access them through code. Is that possible or do i need to use the web interface they have for that? Cause i have tried using simple SDXL code with lora and masks but it doesnt get nearly the same good result as fooocus does
diffusers is most common for command line
you can use comfyscript for comfyui
otherwise pure pytorch etc
forge api maybe
comfyscript with custom comfyui nodes or pure pytorch have much nicer syntax and modularity than diffusers
but diffusers is more stable
so it depends
I am part switching to rust but I don't "recommend"
Hey, trader.
If you are also facing issues from your challenge account passing or making profits on your live account on any of your chosen trading platforms on this prop firm. I'd like to tell you what my research brought for me that makes me to always take enough profits per day on my live account.
Msg me if you are interested
Hey, I wanted to ask if there's any rules for making a new post on r/StableDiffusion, I don't use reddit much and maybe my account does not have enough karma. I can't see my new post appearing, maybe it is pending moderation.
I had made a tool to easily archive civitai content so was hoping to share that with the community, https://github.com/dreamfast/go-civitai-downloader
ahh https://old.reddit.com/r/StableDiffusion/comments/1k784qf/gocivitaidownloader_easily_download_anything_from/ I see it was removed, no problem, I'll try one more time with the github as a link post, if it doesn't go through no problem
wow thanks so much for this
sad I can't get all the models, i only have so much space, but i got all the loras i wanted for video
Whats the best setup for amd users? Just running comfyui straight up or is there any good programs that package other useful things along with it?
cool i just added torrent stuff so u can generate torrent files based on what you downloaded, sad i can't share it with r/StableDiffusion 😦
i heard it's tough, there has to be plenty of tutorials out there though, it is possible to do, straight up might work but check first
I am trying to generate something like this https://www.youtube.com/shorts/CtbEvLPM23o I can't quite find the base image for something like that. Any tips
that looks like midjourney
probably midjourney and if they dont have their own animation AI then use sora
Hi, can anybody help me?
atm I think I need to studymax on img generation so I have some gucci base image for wan 2.1
Hi @robust otter
Just so I understand, if I wanted to use lora and image prompt can I run a simple python program that uses fooocus app without me going into either comfyui and manually adding photos I can make a code that runs and uses the api instead? Will this cost money even though I run it locally?
what do you want, and why are you messaging me in a server which I dont ever use
How do I use Stable Diffusion or other AI tools like Flux the way Photoshop's generation tool works? Krita is a software that allows me to do that: like Photoshop, I mask out an area, for example a lake, and tell it to add boats. Pinokio just makes images from scratch, but I want to modify certain parts of images locally using my GPU
Also is an AMD RX 570 8 GB enough?
So I opened a ticket, now what?
a ticket ? did someone reach out in dm ?
YEAH
who....
It's MOD SAM
99.9999% chances of it yes.
Yes what? Also can't you just tell me
not sure if fooocus had an api
you still pay for electricity locally
for what im reading it seems flux needs a dataset of 512x512 images
at least Flux.1 Dev
You're trying to say?
do you guys recommend a 1024x1024 dataset for Flux.1 Dev?
I want to set up a new dataset but I want confirmation if possible
I want to do good quality but im not used to flux
I just don't know it might
I used it a bit over a year ago
can't remember
you can use any resolution you want for flux
but the whole dataset needs to be the same size right?
no
seriously? you can have literally any size throughout the dataset?
yes
you have the usual "multiple of 16" rule, but the training tools will just crop your images to a multiple of 16
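A minimal sketch of what that cropping amounts to, in plain Python. The function names are made up for illustration and this is not the code of any particular trainer; it only shows rounding each side down to a multiple of 16 and centering the crop.

```python
def crop_to_multiple(width: int, height: int, multiple: int = 16) -> tuple[int, int]:
    """Round each side down to the nearest multiple (never below one multiple)."""
    new_w = max(multiple, (width // multiple) * multiple)
    new_h = max(multiple, (height // multiple) * multiple)
    return new_w, new_h


def center_crop_box(width: int, height: int, multiple: int = 16) -> tuple[int, int, int, int]:
    """Return a (left, upper, right, lower) box, the order Pillow's Image.crop expects."""
    new_w, new_h = crop_to_multiple(width, height, multiple)
    left = (width - new_w) // 2
    top = (height - new_h) // 2
    return left, top, left + new_w, top + new_h


# A 1000x750 image loses a few pixels on each edge.
print(crop_to_multiple(1000, 750))   # (992, 736)
print(center_crop_box(1000, 750))    # (4, 7, 996, 743)
```

If you want to control what gets cut, you can apply a box like this yourself before training so the tool has nothing left to crop.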
hmm
I prefer to set up the dataset first
I want to control what goes in in the end
if you do a big fine tune without the resolutions spread nicely in the training data
flux will lose its ability to do multi resolution
but for small lora it is okay, that is probably what they mean
what do I do then
how many images are we talking here lora vs a full finetune
wanna try both
you cannot do full finetune flux with 24gb vram
how much do I need for a finetune and how much for a lora with flux
both nº of images and vram
never heard of blockswap
its where you move blocks back and forth
from motherboard DRAM to graphics card VRAM
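A toy simulation of that idea, assuming nothing about how any real blockswap node is implemented. It only tracks which blocks would be "in VRAM" at each step, so you can see why peak VRAM drops while the number of transfers goes up. The function name and the 0.5 GB block size are invented for the example.

```python
def run_with_block_swap(block_sizes_gb: list[float], resident_limit: int) -> tuple[float, int]:
    """Simulate one forward pass where at most `resident_limit` blocks sit in VRAM.

    Returns (peak VRAM used by weights in GB, number of RAM -> VRAM transfers).
    """
    on_gpu: list[int] = []   # indices of blocks currently in VRAM
    peak = 0.0
    transfers = 0
    for index in range(len(block_sizes_gb)):
        if len(on_gpu) >= resident_limit:
            on_gpu.pop(0)            # evict the oldest block back to system RAM
        on_gpu.append(index)         # copy the next block into VRAM
        transfers += 1
        peak = max(peak, sum(block_sizes_gb[i] for i in on_gpu))
        # the real model would run block `index` here
    return peak, transfers


blocks = [0.5] * 40                       # hypothetical: 40 blocks of 0.5 GB each
print(run_with_block_swap(blocks, 40))    # everything resident: (20.0, 40)
print(run_with_block_swap(blocks, 2))     # swapping: (1.0, 40)
```

The catch is speed: every swapped block has to cross the PCIe bus on every step, which is why it runs on almost anything but slowly.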
Have you ever found a way to convert a Disco diffusion CLIP model into a diffuser or .ckpt file for use in something like Deforum?
would be easier to make a fresh code base than go back to the old stuff rly
do you mean for disco?
https://www.youtube.com/shorts/MLqGVIYwSAY made this with AI
ye disco is super old its probably a pickle if you do find it
but I meant deforum also
there are colab notebooks that still function, but it would be so nice to have one to save locally and not have to use those extremely heavy servers
ye it would be cool
Hello everyone, I'm new here and excited about what we can create together 🙂
@woven panther just a question about your Phantom Wan implementation.
I noticed that the way it embeds the subject images, it seems to
embed them the same size and that size is then used for the
video generation size. but is there a way to decouple this?
like let's say I want to generate a 768 x 512 video, but..
the subject images can be either same or different sizes
from that, like 480x480 for image 1 and 600 x 400 for image 2.
also, is the 3rd and 4th embedding working? cause it doesnt seem
to be copying them correctly, maybe because 1.3B model is too small
for more than 2 subjects?
Has to be same size since it's used in the same latents, but you should be able to resize your image and composite on a white canvas like with VACE
Hi
What, because they outlawed pee and diapers?
hey sd pals, i did some big updates for this https://github.com/dreamfast/go-civitai-downloader so now it's very easy to download many models or loras, also images from civit ai. After the models or loras are done downloading you can generate a torrent file and magnet link too. I am hoping this will help preserve some of the content that is doomed for oblivion.
They outlawed a bunch of things that COULD imply forced or nonconsensual situations as well, maybe its that that has people grabbing their torches and pitchforks. I dunno, kinda sus to me.
anyone have an issue with models where the face of a character in a generated image will suddenly be in a completely different style than the rest of the image?
It's ok, maybe I can start with a lora and see how it goes, but I want to have all my dataset with the same size
I don't want flux choosing what it cuts
Bruh what was the point of electing Trump if the internet is going to keep snowflakizing?
Like i said before, seems like Civitai is covering their asses, and in the grand scheme of things, its probably better for the AI Art movement/scene/whatever you wanna call it if its not being viewed as a place to create pornographic material that even porn studios wouldnt film (hence the removal of certain things that could be, at least in a court of law, skewed to implicate such things as SA or pedophilia). But they're a business, and all businesses shake and move when their investors say so, so its no surprise.
Unfortunately it is still part of a broader bipartisan assault on adult art and adult artists that has been happening over the last decade.
bro, what is the point of AI if it's not to create erotica?
AI is and always has been about goonerism, in fact, sex robots is arguably the end goal of all this. Who the hell wants to deal with real women with all their flaws when we can have our own ideal bot partners?
the fact that people keep trying to pretend that AI is completely exclusive from porn is ridiculous. Just admit that the two go hand in hand, there's nothing wrong about that despite what the loud blue-haired karens on twitter are shouting
I recommend writing a training loop yourself rather than using the pre-made ones
at least then you know what it is doing
The tos change is because of Visa and MasterCard, based on other sites they are gonna keep censoring more and more until its R+
Thanks gonna grab a few TB with it
I dont know how to do that, I used kohya in the past
its actually harder to use kohya in some ways cos documentation is not thorough
I recommend simple tuner if you are gonna use a pre-made one they have a thing called lokr
lokr is separate its part of a project called lycoris, but it is integrated well into simple tuner
Scam, dont click
Ok so for what I'm reading, one preprocessing flux does is bucketing
You select a size and it makes groups around that size in multiples of 64
So my database can have images of 256x256 if I select that, but it can also have for example a 320x320 image in the dataset
Or 384x384, etc
Like here #🏞|general-with-images message
using 1024x1024 as reference, it seems that as long it has the same size as any of those buckets or same aspect ratio, is all ok
if anyone can confirm pls
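Here is a rough sketch of how aspect-ratio bucketing is usually described, so the chart is easier to read. It is an assumption about the general idea, not the exact algorithm of kohya, OneTrainer or any other tool: buckets are width/height pairs in steps of 64 whose area stays at or below the reference (1024x1024 here), and each image goes to the bucket with the closest aspect ratio. The side limits of 512 and 2048 are arbitrary picks for the example.

```python
def make_buckets(base: int = 1024, step: int = 64,
                 min_side: int = 512, max_side: int = 2048) -> list[tuple[int, int]]:
    """Build (width, height) buckets with sides in multiples of `step` and area <= base*base."""
    max_area = base * base
    buckets = set()
    width = min_side
    while width <= max_side:
        # Largest height (multiple of step) that keeps the area within budget.
        height = min(max_side, (max_area // width) // step * step)
        if height >= min_side:
            buckets.add((width, height))
            buckets.add((height, width))   # also keep the rotated bucket
        width += step
    return sorted(buckets)


def pick_bucket(width: int, height: int, buckets: list[tuple[int, int]]) -> tuple[int, int]:
    """Choose the bucket whose aspect ratio is closest to the image's."""
    ratio = width / height
    return min(buckets, key=lambda b: abs(b[0] / b[1] - ratio))


buckets = make_buckets()
print(pick_bucket(2000, 2000, buckets))   # (1024, 1024)
print(pick_bucket(1920, 1080, buckets))   # (1344, 768)
```

So an image does not need to already be a bucket size. It needs an aspect ratio close to one of them, and the tool resizes and crops the rest.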
@fervent thunder
I mean it gives you a lot of options
what do you think
hmm need info from a proper source like a paper
or quotes from the company rly
How to get invoice from stability?
should be in your e-mail or you could email their support directly
i tried but they didn't answer
hello
yes. But that is only relevant if you use batch size above 1
with batch size = 1 you don't need buckets (or every unique resolution can be just its own bucket)
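To make that concrete, a small sketch (hypothetical helper, not taken from any trainer) that groups image indices into batches where every image shares one resolution. A batch has to be stacked into a single tensor, so mixed sizes cannot share a batch; with batch size 1 every image is trivially its own batch.

```python
from collections import defaultdict


def batches_by_resolution(sizes: list[tuple[int, int]], batch_size: int):
    """Group image indices so that each batch contains a single resolution."""
    groups = defaultdict(list)
    for index, size in enumerate(sizes):
        groups[size].append(index)

    batches = []
    for size, indices in groups.items():
        for start in range(0, len(indices), batch_size):
            batches.append((size, indices[start:start + batch_size]))
    return batches


sizes = [(1024, 1024), (768, 1344), (1024, 1024), (1024, 1024)]
print(batches_by_resolution(sizes, batch_size=2))
# [((1024, 1024), [0, 2]), ((1024, 1024), [3]), ((768, 1344), [1])]
print(len(batches_by_resolution(sizes, batch_size=1)))   # 4
```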
please don't write your training loop yourself 😂
it's not necessarily simpler than using kohya. There are a lot of stuff you have to implement to make training efficient. Implementing stuff like gradient checkpointing is not done in a single line of code.
yeah its just that I have some 3 other guys also telling me stuff and I get confused
by the moment im preparing my dataset with that chart I linked
should be enough for something like 10-50 images
I'm looking at it
"Avoid Ambiguous Images and Distracting Elements: Avoid having too many images that mix styles, characters, or concepts. For example, if you are training a character, don’t use an image that shows that character in a group of other characters." <-- this is bullshit
yeah as long as it is tagged it should work right?
it's the opposite: if you train character loras, you definitely should add images with multiple characters. It's sufficient to put just multi-panel images with different characters in there. Without that, your Lora will transform every face into your character
the model has to differentiate
yes. As long as your caption is correct you will improve the model
so if you have many images of one person you want him in different scenarios
good to have some confirmation
yes, but also add him with other characters
the model has to learn that "NAME" refers to this specific character, not to other characters
yeah so you have to avoid common names and tags for that specific character because the model can already have that beforehand
or style
hm, dunno what you mean with that
some common names like "john" are already learned by the AI
so if you want to train a character like, dunno, john wick, and you tag it as "john", if the AI already knows other johns it gets confused
same with concepts and such
a common name is not so good, in particular if it is already loaded with a meaning
like "John" is a very American/British name, so using it for an Asian guy might be not so good
yeah something like that
I would use natural names, though
like when I train on my own face I always use my real name (first name+ last name)
(funnily, my first name is Kai, which is a common German name, but the model associates it with Japanese and in the beginning often mixes in Asian elements)
(so I trained my first loras with the name Christian instead, which sounds more Caucasian. However, it doesn't really matter. The model also learns my real name after a while)
Many guides use random characters as names instead. I wouldn't do that, cause T5 understands the concept of a name and might get confused by random characters. But in the end both will work nevertheless
I mean you can always invent a less common name
the AI doesnt know your true name, it only cares on how you look
to avoid things like the asian thingy
yes, but if you use first name+last name you are usually fine
not in my case XD
anyways were you able to see anything else in that civitai tutorial
I dont really like civitai that much but it is popular
I think the rest is okay
style or character training?
will try with a lora for a character I think
by the moment
I want to do both but maybe character is easier and needs less images
for what I read
yeah, keep it simple. You can train on hundreds of images, but you can also train on just 10 images
it's not always clear what's better
(I mean, more is better. But quality> quantity)
yeah I will try to get fized on this
how many images would you say for a character and for a style each?
also I would not use gradient accumulation. Takes too much time. You can use batch size if you can afford the vram. Training on batch size 1 also works, though
as said, more is better, but you can often train on surprisingly low number of images. The guide you posted is right with saying you should rather pick 10 highest quality images than using 50 low quality ones
I think I should have enough quality images, I just need a number for a start
and some "default" settings I can edit in future generations
I just not want to go like a headless chicken
dunno, I think 20 is a good number
do you have a workflow for onetrainer?
at this point im comfortable copy pasting what you use
you seem to know your stuff
gonna use onetrainer, let me see how it works
oh fuck I have to tag my dataset first 😮💨
you have 24gb vram? You might use gemma 3 for assisting you with creating the captions
but for 20 images you can do it yourself
for more it's quite helpful to automate this. A big advantage of using AI for creating the captions is that you can use multiple captioning strategies (tag based, natural language short captions, natural language long captions)
i like to use civit ai's captioning system tbh
upload pics, download em after tagging
with flux is it better to use long descriptions or single words separated by commas
gemma is the strongest, though. It has a really deep understanding and you can teach it any captioning style
to be honest,I would do both
same seed for both and compare results I suppose
short words have the disadvantage that your trigger words lose their effect in long prompts
any limit for both? number of tags or size of paragraph for the other one
but if Im training for a character only I dont need a trigger word right?
no. Ideally, use multiple captions per image. But most tools don't support this. In this case just randomly decide for each image if you use a short or a long caption.
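A minimal sketch of that "randomly decide per image" step, assuming you already have a short and a long caption for every image. The function name and the 50/50 split are placeholders, not settings from any trainer.

```python
import random


def choose_caption(short: str, long: str, rng: random.Random, p_short: float = 0.5) -> str:
    """Return the short caption with probability p_short, otherwise the long one."""
    return short if rng.random() < p_short else long


rng = random.Random(0)   # fixed seed so the dataset comes out the same every run
captions = {
    "img_001.png": ("Son Goku fighting in a desert",
                    "Son Goku fighting another man in a rocky desert under a clear sky, dust rising around them"),
}
chosen = {name: choose_caption(short, long, rng) for name, (short, long) in captions.items()}
print(chosen)
```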
what do you mean by trigger word exactly
trigger word is also the character name
yeah but I thought that if you are training for a character you dont have to tag it
?
only what its extra
you always add a name
like, if there are 2 characters and you only want 1, you only describe the one you dont want
the idea is that you don't describe what is implicitly defined by the name
so if you train on, say, on Son Goku, you don't describe that he has black hair and is muscular, cause this is implicitly clear
yeah
but you "have" to tag the word "son Goku"
to define those
you add "Son Goku" to the prompt, yes
ah ok
and if there are multiple characters, you write "An image with two characters. Left is Son Goku. The character on the right is a man with pink hair and a muscular body."
and with simple tags?
(I suppose left by my POV not the image's)
"Son Goku and another man" ?
Tags are just text, too. There is nothing special with them
no I mean, you can put tags like Son Goku and another man fighting
yes
or you can put Goku, man, fighting, muscle
I would definitely use the upper one
ok, any kind of limit for what goes into the prompt
how many words or how big or total descriptions should have
as said, I would try both: short and precise prompts as well as long and detailed prompts
that's also how you want to prompt in the end
and what was that program that helped you tag
you mean both for all images or the first for all then the second for all
You can use multimodal llms nowadays
I just need a name
I use gemma
I remember using wd14 but I think thats only for anime
cause you can run it locally
you could also use ChatGPT if you have a subscription, though
I prefer something local
but you mean a local llm model
yes
you can download gemma 3 4bit quant and run it in your local machine
no. Just explain the llm what you want
ah ok
"I show you an image of a character named Son Goku. Please answer with a prompt that describes this image. The prompt should be short and precise (10-30 words) and include the name Son Goku. Do not describe Son Goku's appearance, but describe what he is doing in the image. Describe also the background. Answer only with the prompt."
something like this
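If you caption more than a handful of images, it can help to build that request from a template so the name and length are easy to change. This is only a sketch: the function name and parameters are invented, the wording follows the example above (with "he" made generic), and how you actually send the text plus the image to your local LLM depends on whatever runner you use.

```python
def build_caption_request(name: str, min_words: int = 10, max_words: int = 30) -> str:
    """Build the instruction text to send to a multimodal LLM alongside one image."""
    return (
        f"I show you an image of a character named {name}. "
        "Please answer with a prompt that describes this image. "
        f"The prompt should be short and precise ({min_words}-{max_words} words) "
        f"and include the name {name}. "
        f"Do not describe {name}'s appearance, but describe what the character is doing in the image. "
        "Describe also the background. Answer only with the prompt."
    )


print(build_caption_request("Son Goku"))
```

Calling it again with a larger word range gives you the long-caption variant for the same image.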
the cool thing on llms is that they really understand what you want. If you are not happy, you can add more information into the prompt
you could even tell the llm that you want prompts for Flux.
The prompt should be short and precise (10-30 words)
this is what I wanted to know more or less
yeah its just that I havent yet found a good llm
local llm
gemma 3
that was an example 😅 as I said, instead of having one consistent style of prompting, just use different ones
short prompts, long prompts, tag based prompts
not true
they never do
censorship only happens during alignment step at the end of training. In its core none of the models is censored
oh thats cool to know, didnt know it was at the end
I'm amazed how good Sora is... it seems to get everything to ask in the prompt in the correct style with no confusion
As I said once, I thought it would be continuous development and optimization for generating locally but it seems that what we got is what it is
From SD 1.5 to SD XL was wow
ah I haven't tried it yet
not rly into video
I think most people were super excited for video and many switched over right away
but I still prefer image
its cos I started out in upscaling hobby first
Can someone help me with a question
what question
What is the best host software like Krita AI that allows me to use models like Stable Diffusion or Flux to generate images in masked areas, like Photoshop? Instead of generating an image from scratch, I want to modify specific parts, e.g. add boats to a part of the river
idk that much but i think comfyui can do pretty much all of the work i guess
Like Photoshop? Mask select an area and generate or mod?
yaaa preeety much it
What do you mean by pretty much? "Pretty much" means there are some caveats
If you want the most Photoshop-like experience but free: InvokeAI
If you're okay with a bit of complexity for ultimate power go with ComfyUI
thats all
github one
If you are more into a user friendly UI you could combine krita with a comfyui backend with an ai plugin for krita. Otherwise if you do not need the latest models and function but solid outpainting and inpainting, regional changes etc. I would look towards invoke (community edition)
Id go a step further, swarmUI
Bit more user friendly and you have access to comfyUI as a backend
Swarm is apparently good for Flux too so there's that
You get to use the miiiiiracle checkpoint type lol
Can someone give me an invite to comfy org discord?
when I click on it it shows I am not logged in, when I log in the tab forgets I asked to join that room
Cant find it in discover search
what prompt do you recommend for tagging images for a flux training in gemma3? the llm is not giving me good descriptions
it does the usual yapping these models do which I dont really need
btw having a local llm rocks
dude, you can just ask Gemma for a good system prompt lol
My prompt: "Write me a good system prompt for an image captioning model. I want to generate image captions for training/finetuning a Flux diffusion model for image generation. Write me a system prompt for such a captioning llm."
gemma gave me a good system prompt. I then added:
"This is great. Modify the system prompt such that the model will always output two different captions: one short which only highlights the most important aspects of the image and one detailed. Also, if I show it an image and write "this image shows [SOME NAME]", I want the captioning model to use [SOME NAME] in its description and not describe the main subject of the image in detail (as these details are already implicitly defined by its name). Do you understand that? Write me a system prompt!"
What came out was:
as you can see on my prompts: you don't need good prompts. Just write anything and ask Gemma to make a good prompt out of that
then use this prompt as system prompt for your image captioning stuff
I just slap this into any llm im using at the time if i get creative block:
You know the secrets of the lost art of prompting gorgeous anime wallpapers, at 16:9 and 2560x1440 resolution. You also have extreme proficiency in character profile shots in a 9:16 aspect ratio, at 1440x2560 resolution. Some have said your creativity knows no bounds, and they are right.
You're also extremely proficient with all of the extensions and tools available on Automatic1111 with Stable Diffusion to enhance images, especially controlnet and regional prompting. And when necessary you will suggest using these tools, as well as providing a mock up open pose skeleton or depth image for controlnet.
I am your human counterpart, the one who enters the prompts to bring your forbidden knowledge and majestic works of art to the masses. Any prompt you give me, no matter how ridiculous, will be entered. And if additional tools are needed to achieve your glory, you will tell me and structure the prompt as it should be entered with those tools in mind.
With all of this in mind, your only job today is to provide me with prompts for stable diffusion anime art of the highest caliber. After each prompt you will ask me to submit the image generated, and then suggest no less than 3 options for our next prompt for me to choose from. Each prompt will be detailed, exquisite, and balanced so as to showcase the character and the scene in its proper glory. Once i pick a prompt option you will generate me the prompt you have in mind, and the cycle will repeat. The world will know the name and our brand by the time we are done.
haha, that's a good one
Just input whatever image generator your using in place of A1111 (ive upgraded to forge for the time being myself) and run wild with this. It'll spit out pretty good stuff and you can steer it with your selections, upload the outputs to critique, and use it to build consistent styles for lora training etc if you want.
Doesnt fix them being chatty though.
They love their emojis
I don’t know if someone could help me. I can’t do checkpoint merges anymore. I used to do them pretty often and now it always ends in an error. With A1111, forge UI, comfy UI, none of them work. I’m on windows 11 24h2, 12900ks, RTX 4090. I’m on the latest driver 576.02. Is it a problem with the gpu driver? It used to work but now it doesn’t anymore. Has anyone got a clue?
lol the damn scammers tryin to get crafty
ask in tech support Dude, they might be able to help you
Hey, come to #🤝|tech-support and provide a full cmd log
@woven panther is this something you would consider porting to comfy?
https://www.reddit.com/r/StableDiffusion/comments/1k9bcfr/magi_45b_has_been_uploaded_to_hf/
my favourite thing forge has over a1111 must be how the interrupt and skip button actually works.
Hello friends, I’m using Automatic1111 and I want to create a consistent character, but I don’t know how to do it. I looked online, but no one has explained it thoroughly. Can you help me with this?
hey'
hey! still in super need of to make clip models become .ckpt if thats even possible?
from .pt to .ckpt
rename it? 😂
these endings do not have a meaning. Usually, they are pickled dictionaries or models.
no its not possible. Have you worked with disco diffusion before?
no. What I want to say is: there is no "checkpoint format" or "pt format".
even safetensors, although its own format, is not "standardized"
so your question has to be: I have file X downloaded from source Y and want to use it in tool Z.
exactly! Theres this tutorial but i havent gotten it to work... Has anyone else tried?
I have a 3070 8GB presently on Forge Web UI…considering upgrading to a 16gb. Had some feedback in another discord that 16gb is already not enough. Budget is tight but if Im gonna upgrade to do quality image gen whats the minimum i should be looking at without going overboard (i know in gaming there are diminishing returns).
Im doing this recreationally but once I get proficient I want to incorporate it into my business model.
Are there key specs on the cards I should be looking at? Or is the raw amount of ram the most important thing.
amount of vram dictates which model you can load on your gpu at once / without having to chop it in pieces and load it bit by bit during the generation process ( usually done automatically by whatever program you'll use )
Having the model loaded fully will avoid the costly / long loading and unloading of data to your gpu.
With that said, if you have enough vram to load what you want, then yes the gpu speed / architecture itself will become your main concern regarding speed. Newer gpu will go faster ( assuming there is enough vram to load everything at once )
Now.... Is 16gb enough. Yes for image generation definitely. For video generation meeeh, video generation is still in its infancy, so it's hard to tell. You'll have enough to do stuff for sure. But will it be """""""futureproof"""""""" is hard to tell. Even 8gb is enough for video generation if you use some tricks.
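A back-of-envelope way to check the "does it fit at once" part: weights take roughly parameter count times bytes per parameter. The sketch below uses 12 billion parameters as the example size (about what the Flux.1 dev transformer has) and a flat 2 GiB overhead that is purely a guessed placeholder. Real usage also depends on text encoders, VAE, resolution and the program you run, so treat the result as a rough lower bound.

```python
def weights_gib(params_billion: float, bits_per_param: int) -> float:
    """Approximate size of the model weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_param / 8
    return total_bytes / 2**30


def fits_in_vram(params_billion: float, bits_per_param: int,
                 vram_gib: float, overhead_gib: float = 2.0) -> bool:
    """Rough check; overhead_gib is an assumed placeholder, not a measured value."""
    return weights_gib(params_billion, bits_per_param) + overhead_gib <= vram_gib


for bits in (16, 8, 4):
    print(bits, "bit:", round(weights_gib(12, bits), 2), "GiB")
# 16 bit: 22.35 GiB
# 8 bit: 11.18 GiB
# 4 bit: 5.59 GiB
```

That is why quantized versions matter on 8 to 16 GB cards: the same model at 8 or 4 bits can sit in VRAM whole instead of being swapped in pieces.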
you need to de-serialise it and get it into a form where it is just
written out as standard pytorch code
and then you can open up a model that is in the format you want
and have a think about what you need to do to get it to be in that format
it's mostly just renaming stuff but sometimes there is more
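The "mostly just renaming stuff" part can be sketched on a plain dict. The key names and the prefix mapping below are hypothetical, only there to show the shape of the job. With a real file you would first load the weights (for example with `torch.load(path, map_location="cpu")`) and compare the key names against a model that is already in the layout your tool expects.

```python
def rename_keys(state_dict, prefix_map):
    """Return a new dict with key prefixes swapped according to prefix_map.

    Keys that match no prefix are copied over unchanged.
    """
    renamed = {}
    for key, value in state_dict.items():
        new_key = key
        for old_prefix, new_prefix in prefix_map.items():
            if key.startswith(old_prefix):
                new_key = new_prefix + key[len(old_prefix):]
                break
        if new_key in renamed:
            raise ValueError(f"key collision after renaming: {new_key}")
        renamed[new_key] = value
    return renamed

# Hypothetical key names; real ones depend on the source and the target tool.
source = {"visual.conv1.weight": 1, "visual.ln.bias": 2, "logit_scale": 3}
target_layout = rename_keys(source, {"visual.": "vision_model."})
```

The "sometimes there is more" cases (tensors that need reshaping, splitting or merging) are not covered by this sketch.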
It'll be a minute before I get into video, I'm still trying to learn everything about image. I'm getting there…
It seems like running sdxl models works ok on what I have now, so any upgrade would be an improvement, but the resounding answer in the other chat was to go cloud. I see the benefit but I have privacy concerns there. I guess I just hate being tethered to a 3rd party.
It's tough because I'm starting to see the suggestions are all over the place
I know flux is pretty VRAM heavy
it all depends on your budget.... How much will you be using this gpu? For how long? Only for AI stuff? Do you have privacy concerns with cloud solutions? Diminishing returns on cost, etc.
Like Neon said 8gb should be enough for flux anyways.
Budget $600 ideal - I saw a few 5070 cards (16gb) in the 500-600 range. I can push $1000 but that's about my ceiling.
I've had this card for a while, it runs all my other games and software fine on high or max settings (I do graphics, photo and video professionally). So if I had something that worked well, I'd probably keep it until it melts or software just totally outpaces it.
16GB is fine. Sure, more is better, but this is also true for 24gb. As soon as you have 24gb, you want even more vram. It never stops ;D
keep in mind that RTX 5000 will get longer support than RTX 3000 too. (at least in theory, if Nvidia does not become a fully AI company by that time...)
Personally with that budget I'd go with RTX 5070 because of the support, faster cores, DLSS 4 and because I don't care about video gen :p
the gaming and rendering stuff like dlss might be worth yeah
Well, right? I mean I've done this a while (tech, not ai).
Price wise… first it was bitcoin mining driving it up, then covid, now the AI “bubble” and tariffs… so they're always gonna be pricey.
I know it'll be outdated probably as soon as I buy it, that'll be true even if I got 24gb.
My concern is: if i drop 600-1000 dollars, will I be happy with my image gen with the CURRENT environment
saying it just in case. upgrading your gpu will NOT upgrade the quality of the outputs
it will just change the speed
every GPU will be out of date at some point because ASICs are coming
but this might take a few years
ASIC just means "specialist chip"
to be fair I remember hearing about ASICs already available that can do inference for a fraction of the cost (but not training) but I don't think they're selling to the public yet.
Quality in terms of definition?
What about capability… like if I generate on civit… I can do an illustrious model with 2-5 loras and put out some fun stuff.
I feel like rn, I'm pretty capped at 1 model, maybe 1 lora.
Invoke crashes if my prompt goes over the token limit (but that may be a setup issue)
I'm still learning forge.
sounds like a setup / settings issue more than a hardware one.
sure, loras will add to the vram cost but not that much tbh. And the more loras you shove in your prompt the more they will fight each other usually, so it's not recommended to use many of them at once.
I know. I usually play with the weights to get some different effects but i try to minimize it
invokeai is using diffusers which is, unfortunately, less memory efficient than ComfyUI
make sure you run invoke in low vram settings
in general, the length of your prompt should not matter as long as it is below 500 tokens
and the number of loras shouldn't matter either
Maybe i should do a fresh install and try again then.
I thought i had that setup, it ran great for 15 generations then all it would spit out was black squares.
it's worth a try to reinstall and/or run ComfyUI before dropping hundreds of $ into a new gpu.
ChatGPT said the error in the log was from too many tokens but it definitely wasn't 500.
ChatGPT tells you what you want to hear.
If you can't / don't know how to verify what it says, I would not trust it blindly. Same goes for every LLM.
True. The hallucinations are annoying. But i also feel bad sometimes coming here with 100 questions 😂
An LLM can be a good tool to start your research, it will at the very least give you a few pointers, stuff to research.
