#💬|general-chat
also the model that can pull that request off has not been invented yet 🤔
thanks!
hi
HuggingFace gets more and more annoying when they keep on warning "You have exhausted your GPU quota"...this is so irritating especially if we want to urgently generate an image
guess you need to go use colab then
or use mage.space
will mage space also have Flux model?
thanks for the suggestion 👍
on the home page, yes. 2 versions right under the prompt text field
oh that's amazing then
you get stuck, let me know, i'll point you to the how to use documentation
is there any difference when using a1111 on something like linux mint or ubuntu etc...
heya!
anyone have luck with img2img for plushies/stuffed animals?
i want to turn people into marketable plushies
that's not what img2img does or is for
I just want to use a real photo as a reference. I've had some luck but not a lot. Definitely a noob though.
i know, but that's really not what img2img does. it uses your reference image as a visual prompt. it'll use the color palette, the shapes, and the locations of the shapes. then it will read your text prompt to find out what it's creating. then it will create that using the color palette and shapes of the reference image.
What would be a better option?
but you're wanting to create physical 3D objects to sell based on people - if it was me, i'd find someone that was already doing that and partner with them. that's a very complicated business you're talking about starting
not sell
just for fun
and not physical
just like, turning people into plushies. I want to plushie my friends.
so when you said "i want to turn people into marketable plushies" what did you mean?
I basically want a "filter" to take photographs of people and turn them into images of plushies.
okay well - img2img isn't going to do that
what will
you'd probably have to train a custom lora
I did find a lora but it's not working great.
that's why i said 'you train' and 'custom lora'
Hello, I wanted to resume my training with kohya_ss. I was at epoch 3 and have set 6 epochs. How can I resume training?
Does anyone know a good workflow, video, or document for building a ComfyUI workflow that gives a consistent style and the same type of color grading across every image, with an upscale of the final image and, if possible, a lora? I want to do it in ComfyUI. Thanks in advance for your help
I hosted SD on Azure and I access with gradio.live. Can I download Models using WebUI?
the common term img2img is kinda misleading
because it makes it sound like img2img is the best way to make an image from an image, but it's actually a fairly poor method
before lora you could try to combine some control nets such as open pose, depth, canny and segmented with a good IP adapter like the mad scientist node or the SDXL style and composition IP adapter
when you get IP adapters that differentiate between style and composition, what's actually going on there is that they have preselected for you blocks that pertain to style and blocks that pertain to composition. This is key for it to work well
another thing that can boost IP adapter a bit is preparing very good embeddings. You can average/add/concat multiple images, both for positive and negative conditioning, to feed to IP adapter
and you can use image editing to boost the effect of IP adapter image embeds beyond what would normally be possible, a common trick is to sharpen the positives and blur the negatives
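A rough sketch of the average/add/concat idea from the embeddings message above, in plain Python. The vectors here are toy lists of floats, not real image embeddings, and the function names are made up for illustration; an actual IP adapter node works on tensors and may weight things differently.

```python
from typing import List

Vector = List[float]


def _check_same_length(embeds: List[Vector]) -> None:
    """Raise if the list is empty or the embeddings differ in length."""
    if not embeds:
        raise ValueError("need at least one embedding")
    length = len(embeds[0])
    if any(len(e) != length for e in embeds):
        raise ValueError("all embeddings must have the same length")


def average_embeds(embeds: List[Vector]) -> Vector:
    """Blend several reference images into one embedding (element-wise mean)."""
    _check_same_length(embeds)
    return [sum(values) / len(embeds) for values in zip(*embeds)]


def add_embeds(embeds: List[Vector]) -> Vector:
    """Element-wise sum: like the mean, but the result grows with each image."""
    _check_same_length(embeds)
    return [sum(values) for values in zip(*embeds)]


def concat_embeds(embeds: List[Vector]) -> List[Vector]:
    """Keep every embedding as its own entry instead of blending them."""
    _check_same_length(embeds)
    return [list(e) for e in embeds]


if __name__ == "__main__":
    refs = [[1.0, 3.0], [3.0, 5.0]]  # two toy "image embeddings"
    print(average_embeds(refs))
    print(add_embeds(refs))
    print(concat_embeds(refs))
```

Averaging gives one blended reference, while concat keeps each image separate so the model can attend to them individually. The same helpers would apply to a set of negative images.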
anyone know what sdnext is?
yeah SD Next is also known as Vlad Diffusion
its made by a guy called Vlad
its got two backends, one based on the CompVis LDM / A1111 SDWebUI code lineage and one based on Diffusers
what even is flux? is there a local SD3/flux lora trainer that you can run on colab/locally
SimpleTuner for example
as far as I know, kohya_ss is now supporting flux, too. Haven't tried it yet, though
does auto1111 automatically utilize my GPU vram or is there a setting to turn it on?
Hi, what is PDXL?
its a fine tune of SDXL
yeah
flux is a new recent model
Model like checkpoint?
yeah but its a different architecture also
Hi, anyone knows what happened to the playground ai discord server ?
I've been on it for a while and now I can't find it anywhere
did they ban you from it?
no, I think the server doesn't exist anymore, but i'm not sure why
i dunno. look on their website and see if you can find a link to their discord
I can't find the link anywhere, and someone said that their server is down
https://www.reddit.com/r/PlaygroundAI/comments/1evgaej/i_believe_playground_is_ending/
it's playground.com and i'm looking at their website. doesn't look like it's ending to me
Adetailer multiple faces. Is there a way to Detail only 1 face Not all?
in comfy UI yes, Impact pack has nodes designed to select SEGS
sd 3.1 is coming ?
apparently yes
today i tested flux gguf at the lowest level, running flux for the first time on my 2060 in less than 20 seconds
no freeze, no out of memory, all thanks to city96
some century
Recent research/experiment done in TouchDesigner and SD: https://www.youtube.com/shorts/VyZFzKAuqsk (more info in video's description)
What do you guys think?
screen share is the real deal
i assume this will work on any window, even a windowed video game lol
that looks really cool
the old photos give it a really distinctive look, its great
it can be used on youtube videos , movies , blender , you name it.
Total noob here. I am interested in running SD on my pc. Any recommendations about where to start would be greatly appreciated.
start by making sure your hardware is actually going to run it. Also, the recommended interface is comfyUI
ComfyUI, got it. Do you think a 3080 card would be sufficient? Thank you very much for the response sir.
depends on what else you're doing. you should probably ask additional questions in #🤝|tech-support
Will do,thanks again.
how to use? i forget
What's the current overall most generally effective upscaler in comfyui?
I can't get supir running effectively on arc 🤷♂️
Are there any "ai image sorters" on windows I can use that are actually good and don't spy on me
Or is that entire concept iffy
you could build a hydrus server and run a classifier tagger on it https://github.com/hydrusnetwork/hydrus
hydrus might even be able to create tags based on the prompt and settings
Hello - does anyone have much experience with image to video using AI?
yeah. what do you need help with?
Is there a way to make SD understand the lighting in the image when doing inpainting?
there's like 30+ inpainting methods
so its very hard to talk about inpainting in a general sense
Trying to get a better looking face. The problem is that when I put denoising high enough to get good results, the new face is much brighter than the original, a person who's in the shadow.
IP adapter would be the first thing to try
Hello guys, I am working with the SD upscaler and part of it is unclear to me. When we use it, we use the Stable Diffusion upscaler and usually some type of GAN. How do they work together? Does one increase the size and the other the quality? Could someone explain it to me or send an article about it? TY so much
depends on the interface
the GAN is just sharpening the image - a real "upscaling" is often too hard for GANs. In particular, they cannot fix things that are not recognizable in the lowres image. If there is a leaf in your image, then they can try to remove noise artefacts from the leaf when upscaling, but they won't be able to add new stuff like leaf veins and texture which was not visible in the original lowres image
Stable Diffusion upscaling is a normal image2image pass. Therefore, it can invent new things (like adding texture), which is a good thing but sometimes also a bad thing. e.g., it might add a beetle on the leaf although there was none in the original image. Sometimes these additions make sense and improve the image quality, sometimes they don't make sense and lead to weird artefacts
usually you use controlnets that force SD not to invent new things that are not in the original image
nevertheless, a GAN tries to keep the original image mostly intact and only tries to remove blurriness from the upscaling. An SD pass creates more or less a completely new image (depending on how much noise you use), so it's no longer the original image, but this is the only way to really get high resolution texture and features.
Although many workflows contain GANs and Stable Diffusion - you don't really need both. You can try GAN and if that is not sufficient, use SD instead. I don't think that using both after each other has high benefits (it's also not a big disadvantage, though, in particular as GANs are fast anyways)
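To make the "depending on how much noise you use" part concrete, here is a small sketch of how a denoise strength usually maps to the number of sampling steps that actually run in an img2img pass. It assumes the convention used by diffusers-style img2img pipelines, where strength scales the step count; other interfaces may handle it differently, and the function name is just for illustration.

```python
def img2img_steps(num_inference_steps: int, strength: float) -> int:
    """Number of denoising steps that actually run for a given strength.

    strength = 0.0 keeps the input image untouched (no steps run),
    strength = 1.0 noises it completely and runs every step,
    so the result is effectively a new image.
    """
    if num_inference_steps < 0:
        raise ValueError("num_inference_steps must be non-negative")
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    return min(int(num_inference_steps * strength), num_inference_steps)


if __name__ == "__main__":
    for s in (0.0, 0.25, 0.5, 1.0):
        print(f"strength={s}: {img2img_steps(20, s)} of 20 steps")
```

A low strength keeps the pass close to the input image with little room for new detail, while a high strength gives SD room to invent texture (and beetles).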
Thanks, will give it a try!
the old GAN-style upscale methods are under-rated yeah, I like to use this one https://openmodeldb.info/models/4x-NomosWebPhoto-atd
its a bit like Swin-IR
with the release of closed source models such as ideogram 2.0 ... i am officially hyped for sd3.1
I2.0 is a good model tbh.
so for Flux, the CFG inside the sampler should always be 1, right?
hello
I'm trying to generate an image with this prompt using Flux but all I'm getting is a picture of a house with lightning above it. It looks very realistic but it won't add the dome over the house:
"In the midst of a massive lightning storm, a typical residential home in Florida, stands under a protective, glowing dome of energy. The dome, a visible forcefield or bubble, envelops the entire house, clearly separating it from the chaotic storm outside. The dark skies are alive with intense lightning bolts, but each strike that comes close is deflected or absorbed by the dome, causing bright, electric ripples to spread across its surface. The dome is the central focus of the scene, shimmering with power and creating a stark contrast between the turbulent weather outside and the safety within. The house beneath the dome is a classic Florida-style home, secure and untouched, thanks to the powerful energy shield that protects it."
How could I improve that prompt to achieve the desired results?
likely too long and too many concepts
its also not in the style of a caption
this sounds more like a prompt an LLM wrote tbh
which is probably what flux was trained on 😉
I could imagine some images/prompts are strongly overfitted
we rly need the research paper so that we can know which caption model or VLM was used
my opinion is that Flux is too overfit overall yeah
not terribly so, but to a certain extent
a bit like Playground 2.5
the aesthetic is rather "baked in"
wow that's rly impressive
unlikely to be in the training data so it synthesised it from multiple concepts
so the general idea seems to be to FIRST describe the energy barrier. Describe what it should look like. Afterwards you describe the house
A glowing energy dome during a thunderstorm. The dome is an energy barrier and consists of blue light and a geometric pattern. Inside the dome and visible through the transparent energy barrier is an American house. Lightning strikes the barrier and is stopped by it.
thats my current prompt
it generates the barrier and the house in every generation
I love examples like that
but to be honest: Ideogram and Dall-E usually have no problems with such things
Flux prompting behaviour is not really good. It can sometimes handle very complex prompts, but then sometimes it fails even for very simple ones
I think that loras and IPAdapters will be crucial for Flux
I have used Dalle 3 by far the most of any model and I still think it's the best yeah
the secret HD mode in the API in particular
and a good general finetune
need juggernaut for flux, that also attempts to get CFG and fully effective negatives back
TY so much! So to simplify: SD increases the picture size and the GAN makes the details better, or vice versa?
hi
both increase the size of the picture by the same amount if you choose, but SD adds more details and GANs or transformers like SWIN-IR only add a tiny bit of detail and keep the image more the same
kinda what this project is trying to do: https://huggingface.co/ostris/OpenFLUX.1
thanks will keep an eye on this
Thanks, this does seem to work better! Still not getting what I was wanting but it is much closer
you might get good results with a second pass using a nice model like Leosam's Helloworld
one of the pretty ones
there's always just using something like CapCut's free image upscaler - it just gives you a larger, sharper, image that is exactly like the image you started with. And they have a video upscaler that's free, too
Does it matter what punctuation you use to divide prompts and prompt phrases?
depends on the text encoder but yeah ideally it would match the training data
if they have a text encoder like T5 this matters less
as working text stuff out is what T5 was for
funnily enough the main model with a smarter text encoder is Kolors but it didn't actually help too much
I cannot for the life of me get this one model to produce uneven/bad lighting. Being an anime style it might be forced to give it that perfect backlighting but it doesn't always fit the scene.
I think the V2 expansion prompts that I didn't ask for are screwing me over; gonna try different styles
I believe training data also is very important. Auraflow 0.2 and 0.1 use a much smaller t5 text encoder called pile t5xl, compared to flux and sd3 which use a much larger text encoder, t5xxl, and yet 0.2 and 0.1 have better prompt following than flux and sd3.
pilet5xl is like 1b params while t5xxl is like 4b params.
Auraflow 0.3 is the same architecture but seems to have much worse prompt following; it was further trained on a different type of dataset.
did you try "dynamic shadows"
I have not.
i'll tend to add phrases like "dynamic lighting, dynamic shadows, stage lighting, dim lights with haunting shadows"
Wow civit exploded with flux content in the last couple days
but how much of it is really worth your storage space?
The early stuff is normally not great. I love it when they say that up front, like I barely tested this... I'm like, why upload it then?
I kept trying with the same idea but the more I added to the prompt and changed the styles generally it just got worse
why are you using the model you said you were using?
ye exact text encoder choice matters less, than other choices
it's just what I have; all I have is Fooocus. I don't feel like getting a new engine/model yet, I think my skills need to increase first.
I don't think I've truly found the limits of this model; it's more likely I don't fully understand it
Fooocus does a lot of stuff without telling you
skills you learn on fooocus will not necessarily transfer anyway
I've gotten a couple really promising results that make me think I can control it better
and sometimes I really like the art styles I'm getting out of it
but I see why people say it is good for beginners, it definitely works best with simple prompts and is an easy way to get your toes wet in ai
Front ends like ComfyUI look intimidating to me.
well - then you probably are stuck with what fooocus wants to create. you could grab a copy of Pinokio and install something other than fooocus
then install Swarm. it's a much friendlier front end, written by @finite cloak
what models is Swarm able to run?
should be able to run everything
comfy is really not that hard once you start using it and get used to what stuff does
it uses nodes which I'd think would give you better control but idk how any of these apps truly work on the backend
all the more reason to start using comfy so you can learn and it stops being mysterious. there are a lot of workflows already put together that you can grab and run, and then look at. and we can explain what everything's doing as you have issues. and if you join the L2 discord (matteo's) you'll find a lot of people there that can help too
good idea, and someone said comfy was also well optimized?
it is, yes, and while some workflows people put out might not be, there are plenty of us that can help you optimize those if you need. i'll even give you a fairly simple flux workflow if you'd like
Can you link me to that server you mentioned?
which server did i mention?
Matteo's
i'll DM you the link. can't post discord links here
It took me about 2 weeks, and I actually seldom leave the workflow tab these days so I may as well be just running comfy
For swarm
so i want to generate an image, where can i put in the prompt? im so lost
if you're trying to generate here on this discord, first read the information at this link #artisan-faq
I sometimes wonder how people land here, they think images will just somehow manifest
we have midjourney to thank for that
You have to focus harder than that.
sd 3.1 needs to have an option that must be activated after each render.
this option locks data, colors, gender, atmosphere... etc. for better consistency.
one day 
is it true that is free now ?
For me, MJ is the best prompt spammer for Stable Diffusion.
even if consistency may be done by control net, it's annoying when the whole generation changes because of one word.
honestly, prompting with natural language is overrated
prompting with commas might be the sole reason 1.5 will never die for me
I might do a mix-and-match of Flux and 1.5 for upscaling
I agree
Just throwing in the words of what i want without any grammatical coherence works amazing for sd15, sdxl and flux for me
just list the stuff u want
😄
natural language tends to use commas...
yeah, but, not, like, this
What about Captain Kirk speak where we only talk in ellipses... Like... This...
(im actually really guilty of sometimes overusing ellipses, but I was a drama student as a teen that did a lot of stage plays and scripts used ... a lot)
hi, stable diffusion is using all my 16gb ram and it's so laggy. something's wrong, it wasn't like that yesterday. anyone know how to fix this?
some people talk like that
is it running on your ram or your vram?
i solved it with chatgpt
it was a huge X checkpoint causing the problem
ah. okay
Friday.
last week
Ready your rears.
Seems you can use --reserve-vram 8192 or any number to limit comfyui's total vram usage now
I can do --reserve-vram 12288 and run flux 8_0 and t5xxl no problem with unloading
I might be addicted to porn but don't know how to quit 💀
get rid of everything that you have in your environment that even hints at it and avoid coming in contact with it going forward
but what will I do instead. what do people who dont watch porn all day do
there are millions of other things you can do. so every time you feel the desire to participate in the habit you're trying to break, find something else to do, maybe something new and different to you like jogging or whatever
to quote smiling friends, I think I'm stuck in a loop of short-term dopamine rushes
really appreciate the advice bud
Anybody wants pizza?
eating healthy might be a way to start cutting off bad habits.
is it normal to have stable audio long-pending in the input audios?
that's likely the case, and the only way to break that is to find something else that'll be enjoyable and healthy for you. so jogging, exercising, etc
I don't understand how to install ipadapter on comfyui, could someone help me?
there's a #🧣|comfy-ui channel that might be more useful to you than general chatter
How do you get noisy/low quality android phone pictures out of Flux?
trying to make some scary alien pictures
you could try adding "film grain, photo, blurry, found footage"
absolutely do not use photoreal, photorealistic, or photorealism
I'm trying things like that on the huggingface schnell distro but it's probably not as good as installing locally
I was under the impression Flux is trained explicitly on phone quality images
you can't adjust all the settings on huggingface, that's just a demo. you might also need to play around with different samplers and schedulers
it's a 12 BILLION parameter model. it didn't train explicitly on anything
not explicitly like that's all it can do but explicitly like it was part of the model
it's a 12 billion parameter model. i'm sure it has phone photos in it. what's your prompt?
I tried a couple and gave up
I used terms like "lost government footage, 1970s, canon 70mm, noisy, droid razr, android phone, low light" nothing worked
it all looked plastic wrapped and rendered with good form and light
what's your actual prompt, please
I closed the tab I don't have the prompt anymore

if you want a new prompt I'll give you one
it's a little easier for me to try to assist if i can see the prompt being used
grey aliens hiding in woods, noisy, low light, android phone quality, scary, hiding in the shadows, lost government footage.
try that
I know you said we don't want realism but we don't want a cartoon style either and that's what I was getting. If I had control of negative prompts I think I could have gotten something.
and you want this on flux?
i'm not sure what the huggingface demo will do, because i don't know what's implemented in it. do you have the ability to run it at home? if not, can i have the link to the demo you were using please
I will look into a local flux installation
I'm not pressed for time or incapable I'm just lazy
give me a bit here
take your time
had to post it in the other channel #🏞|general-with-images message
you might try using flux on mage.space. you can use it with a free account
hmm alright
what are yalls recommendations for some good versatile models (non-SDXL since I run SD on a cpu)
um what's wrong with sdxl?
I am running SD on a cpu with 32 GB of ram
images on normal SD (sd 1.5 based models) take like a minute
so every single thing is going to run slow. you might as well use the good stuff
can SDXL even run without out of memory errors on my machine?
depends on how you're running it
and how much swap space you have
ahhhh
so that is why I have not been running into as many issues recentlyish after switching to linux
I forgot about swap space
to use sdxl do I just download an SDXL model?
how are you running stable diffusion right now?
a1111 webui on arch linux
then yes, all you should need to do is download SDXL - i think it uses the same VAE as sd1.5 does
k
you might post in #🤝|tech-support and see what @warm junco has to recommend, he's more familiar with a1111 than i am
half the good models require a civitai acc
and I don't want to have to make one just to download a few models
also, did inpainting models and standard models get merged in sdxl??
I am not finding a lot of inpainting versions for the sdxl models
is there a good way to get reasonable results from a local installation - what software is good. I've found most tools including runwayml to be very overkill when it comes to censorship
of sd?
ah, went to the linked context
sure, but it isn't an overnight thing. you have to learn the interface you're going to use, and then you need to learn how the AI you're prompting thinks
Is it possible to fine tune stable diffusion on multiple subjects so that it remembers them. For example, I want to train it on two people, person a and person b and I want to be able to use person a and person b as keywords. How can I achieve this?
that could go beyond typical lora abilities. For that, you might need full weight dreambooth where you train the text encoder too. SEcourses have step by step tutorials for this
generally lora is single subject
Okay, I will take a look into them. Do you think face swapping would be a better approach then? It's not very accurate but I would be able to use it for multiple people
it depends if the project is casual or serious
I don't actually do real projects with image generation cos I am just playing around with it
but if I was ever to actually do a real project I would train a full dreambooth every time
there's no reason not to
Okay, also, what is the best approach for fashion modeling. Let's say I have some images of a garment a, and I want to generate images of models wearing that garment. I want to preserve the print of the garment.
Based on my research, the only way is to do lora on each of the clothes that I want to generate an image of but the results are not very accurate and I need to train a model for each cloth.
there are some other ways that specialise in clothes
for example this: https://github.com/cozymantis/clothes-swap-salvton-comfyui-workflow
Is there a way now to train Flux loras model with a 4070 TI 12 GB Vram?
That's really cool, thanks a lot dude
👍
Stable diffusion doesn't really know how to make specific facial structures from just text prompts in my experience so it would be nice to know if it would be possible to make a lora for a specific facial structure to give it something to go off of or is that not how loras work
(I have never made a lora)
could you give an example
anyway the answer is yeah, a lora can do that
you could also try IP adapter style transfer, instant ID, or canny control net
textual inversion also
if its nose then depth control net
Literally never heard of half that stuff is there a yt video I can watch that will teach me?
yeah there are a lot of tutorials out there
Ok
Fooocus does NOT want to draw a woman sticking her tongue out
is that hard coded as nsfw or am I just prompting wrong
it's an image prompt so it should just follow the picture, I'm redrawing a screenshot
fooocus does a lot of things that it doesn't tell you about
it doesn't just sample normally it adds lots of things on top
it makes the model far less controllable and far more confusing to use
would be happy if someone with an RTX 4090 ran some tests.. https://github.com/comfyanonymous/ComfyUI/discussions/4571
I turned off the V2 style for this iteration because in the log you can see V2 adding additional prompts
the idea of fooocus is to have a very simple interface for people who just want to type a few words in and get a nice image on average
if you want to actually improve and optimise things, that requires moving off of fooocus
grrrr
no I know you're right
the UI is just easy and I'm messing with inpainting and image prompting
fooocus does have one of the nicer inpainting methods
I'm having great success turning anime styles into irl styles and vice versa which is just too much fun
with the exception of the tongue thing but we'll get to the bottom of that
anime to real and back is good yeah
Send the workflow somewhere
Oh wait I cant read

if you want to really push optimal speed you may have to move off comfy
fully optimal speed is rarely worth it though
sup
https://colab.research.google.com/github/Jelosus2/Lora_Easy_Training_Colab/blob/main/Lora_Easy_Training_Colab.ipynb#scrollTo=vGwaJ0eGHCkw
Does anyone know how to use this colab version?
can't use images in this channel
make sure you're confirming people aren't bots when they show up
hi
Is stable diffusion and flux the same thing?
they twins
anyone having problems loading forge after the update?
I'm interested in downloading Stable Diffusion, does anyone have a guide that they would recommend?
flux is newer and better, but also larger (need better hardware)
Automatic1111 or ComfyUI?
dealers choice, idk about either
perhaps the one that would cooperate better with my 8gb AMD graphics card
Try SwarmUI
you can get AMD to work, but it's going to be quite slow. there are guides in the tech-support channel, look at the top in pinned messages https://discord.com/channels/1002292111942635562/1002602742667280404
Cool, thanks to both of you
in flux running in forge, does anyone know what i should set my "gpu weights" to? I have 16GB on my RTX 3080, and 32GB on mobo
hi, how do i specify the seed in the online stablediffusion.com? i cannot find any UI field for that..
the daddies that made stable diffusion left stability and then daddied a new ai model
ugh, i've been using actually stablediffusionai.ai... i get that's a different beast than the .com isn't it?
this is literally a virus
my virus checker dinged it for multiple threats detected
hey everybody
hm. my browser got quite sluggish using it but i didnt want to be overly suspicious. now that you suggest it could be smthg fishy then it's the end of affair. thanks for the warning!
it can't be anything other than fishy things
because the real website is stability.ai
you can't just go to various websites that sound like the thing you need, that's a way to guarantee getting a virus
what prompt should i use when i want multiple copies of the same girl
i tried (mutil gir) and 1 girl but no use ===
having multiple of the same person might actually not be possible with just prompting
because it would be extremely rare in the training data
yeah, i get it (i think). so, is there any legitimate online portal/web ui for stable diffusion? (...that i dont need to setup myself? actually, ive tried midjourney.com already any others?)
Why must comfy be so annoying. I put "mixed color bodysuit" in negative, and it adds it as if positive lol
rundiffusion is ok
you need to be careful because for any popular company now
lots of criminals make similar sounding websites and then put a virus there
best thing to do is always go to things via Google
instead of typing the URL in
comfy only does what the user builds the graph to do
😮
🎙️ 💣
actually i think i got there via google (or duckduck) - as for viruses, i hope that (my) linux is so marginal that it's not cost-effective for bad guys to support viruses for it ;) anyway, big thanks
Aye, and i set "mixed bodysuit colors" in negative, but they get more mixed in the results than before :P
yeah i dont get it either. i was just trying to drop bombs
if you are on linux the risk is way lower yeah
but it's a taste of things to come. you report something like that as a bug on auto, and they know it's not user error. you report that as a bug on comfy and you're going to have to defend your workflow like a doctoral thesis
this negative is very unlikely to work TBH
to be fair: that's a really bad negative prompt
the way negative prompts work is roughly speaking you generate in each step two images, one with the positive prompt and one with the negative prompt. Then you subtract both images and add the difference (multiplied with cfg) to the negative one
what happens if you subtract multi-colored bodysuits from, say, one-colored bodysuits? Do they get uniformly colored? No, they just get differently colored
(in practice it's a bit more complicated because you do that in the latent space, but the effect is very similar)
just an fyi, flux ignores negative prompts unless you set cfg over 1, and then you risk ruining your image
the default setting is 0
negative prompts are always ignored for cfg == 1. Default value should be 1, if you use a cfg sampler
0 would mean you generate the image with the negative prompt ^^°
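The description above (take the difference between the positive and negative predictions, multiply by cfg, add it to the negative one) can be sketched in a few lines. This uses plain lists of floats as stand-ins for the two predictions; real samplers do this on latent tensors at every step, and the function name is just for illustration.

```python
from typing import List


def cfg_combine(positive: List[float], negative: List[float], cfg: float) -> List[float]:
    """Classifier-free guidance as described in the chat:
    result = negative + cfg * (positive - negative)
    """
    if len(positive) != len(negative):
        raise ValueError("predictions must have the same length")
    return [n + cfg * (p - n) for p, n in zip(positive, negative)]


if __name__ == "__main__":
    pos = [1.0, 2.0]  # toy prediction for the positive prompt
    neg = [0.0, 4.0]  # toy prediction for the negative prompt
    print(cfg_combine(pos, neg, 1.0))  # equals pos: the negative cancels out
    print(cfg_combine(pos, neg, 0.0))  # equals neg: image of the negative prompt
    print(cfg_combine(pos, neg, 2.0))  # pushed past pos, away from neg
```

With cfg == 1 the negative term drops out entirely, which is why negative prompts are ignored there, and with cfg == 0 only the negative prediction is left.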
this is a smart way to deconstruct it. good call
ease off cfg after half the steps always helps. there's other cfg nodes too that do good stuff
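One way "ease off cfg after half the steps" could look, as a sketch: full cfg for the first half, then a linear ramp down to 1 by the last step. The linear shape, the floor of 1, and the function name are all assumptions for illustration; the actual cfg nodes mentioned may use different curves.

```python
def eased_cfg(step: int, total_steps: int, cfg: float, floor: float = 1.0) -> float:
    """cfg value to use at a given step (0-based).

    First half of the steps: the full cfg.
    Second half: linear ramp from cfg down to `floor` at the final step.
    """
    if total_steps < 1 or not 0 <= step < total_steps:
        raise ValueError("step must be in range(total_steps)")
    half = total_steps // 2
    if step < half:
        return cfg
    remaining = total_steps - 1 - half  # number of intervals left in the ramp
    if remaining <= 0:
        return floor
    t = (step - half) / remaining  # 0.0 at the midpoint, 1.0 at the last step
    return cfg + t * (floor - cfg)


if __name__ == "__main__":
    print([round(eased_cfg(i, 10, 7.0), 2) for i in range(10)])
```

Early steps settle the composition, where strong guidance matters most, so lowering cfg later is meant to reduce the overcooked look without losing the prompt.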
if you're not going to do all the powerful stuff that comfyui affords, just use forge ui, honestly
I forgot the name but there was a paper that looked into the optimal scheduling for making use of negatives
it was different for object removal and style negatives
so training flux in kohya_ss on their sd3 repo does work, I did 4 epochs and loaded it into comfy, and seems to work just fine. lots of stuff missing still, like it doesnt do samples as far as I can tell, and no GUI yet, so you have to run it all cli
sounds good
on my 3090 btw, so that's another bonus
this is my 3rd day with this card, already beating the heck out of it 😄
lol
Happy Weekend ya'll.
So what are the ways to upscale while gaining minute detail?
I've seen the control net tiled method but honestly it's pretty convoluted to get to work. So I'm wondering if there are other ways out there
SD upscale. with low denoise. prompt for minute detail. Like detail in materials in your image. Skin, Fabric, concrete, Wood, etc.
Also do 2 or 3 SD upscales with a lower scale factor each, not all in one go
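A small sketch of the "several smaller upscales instead of one big one" idea: split a total scale factor evenly across the passes and list the target size for each pass. Splitting evenly and snapping sizes to a multiple of 8 are assumptions made for illustration, and the function name is made up; each size would then be fed to an SD upscale pass with low denoise.

```python
from typing import List, Tuple


def plan_upscale(width: int, height: int, total_scale: float,
                 passes: int, multiple: int = 8) -> List[Tuple[int, int]]:
    """Target (width, height) for each pass when the total scale is split evenly."""
    if passes < 1 or total_scale <= 0 or multiple < 1:
        raise ValueError("passes and multiple must be >= 1, total_scale > 0")
    per_pass = total_scale ** (1.0 / passes)  # same factor applied in every pass
    sizes = []
    for i in range(1, passes + 1):
        factor = per_pass ** i
        # Snap to the nearest multiple so the size stays friendly to the model.
        w = max(multiple, round(width * factor / multiple) * multiple)
        h = max(multiple, round(height * factor / multiple) * multiple)
        sizes.append((w, h))
    return sizes


if __name__ == "__main__":
    # 512x512 to 4x in two passes of 2x each
    print(plan_upscale(512, 512, 4.0, 2))
```

Smaller steps give each pass less to invent at once, which tends to keep the added detail consistent with the pass before it.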
the best way is to 1. save the generated image 2. go to capcut's website 3. go to their magic tools section 4. use their image upscaler
Remember when we just drew. We didn't have to debug or install the pencil. 😄
And we could do it with a power outage or no internet. We are so gone. So far gone. Hopelessly addicted to this garbage.
remember chewing the eraser off the end of the pencil, and sharpening the pencil so much there was nothing left?
[Applause]
the good ol' days are usually not as good as memory makes them out to be
yah, like the blisters from holding those wooden #2 pencils
Yes I did have blisters.
I wasn't at the mercy of programmers and power outages tho. And everything I made was more special. Now I just crank out thousands of images always frustrated. Would I go back? no, of course not. Do I think this shit is better? no, absolutely not.
We're on a road to nowhere. Come on inside.
It's never good enough.
It will never be good enough.
yeah but my drawing skills suck
with ai I can be better!
in between like third grade and middle school I wasn't afraid of drawing though, maybe I just thought I was good, but before and after that I've been terrified to pick the pencil back up
it's actually an immobilizing perfectionism
looool
well
As artists we never were, are, or will be happy
whether it's AI or pencils
we will never be satisfied 100% with our work
Really? from a website
yes really.
the image-resolution-enhancer?
thank you will try out
go to capcut's website, click on magic tools on the left side bar, and look for Image Upscaler on the top row
right next to video upscaler
tyty
hello all... just wondering, is https://stabledifffusion.com an official site or are they unrelated to stability AI?
I just tried it, unless im using the wrong feature it's simply upscaling. im not getting any new detail
that's what you said you wanted. a minimum of new detail. 0 new detail is the most minimum you can get
scroll to the bottom of the page and look at the stuff there
gotcha 👍
Sorry... Perhaps I should have asked a different question - is it ok to ask questions about that website in this channel?
certainly.
Ask and you shall receive.
A couple of months ago my kids were having a lot of fun on that site making images "in the style of" artists/studios that they like.... But today they tried again and the results were really inconsistent and not at all as easy to get as the last time they tried... Has there been some change to the model or some kind of copyright change that's reduced the effectiveness of "in the style of ?" type prompts?
probably because they're using flux on the site instead of sdxl
or they're using SD3 8B
Is that a setting they would have changed? Or a setting that changed that they need to revert?
i might suggest you get your kids free accounts on mage.space and then they can use all the versions of stable diffusion
Honestly they had so much fun I'd be happy to pay for something... I will look into mage.space
Is self hosting any of this stuff viable yet? We have a home server running proxmox and a couple of PC's with rtx 3060s
sure. you want to look into running comfyUI. go to youtube, look for scott detweiler - he works for SAI. watch his tutorials on setting up comfy
I will do that... Thanks so much for your assistance
Thanks for the input.
Anyone has contact to bfl from flux
So I've been messing around with SDUpscale vs TiledDiff for the sake of detail gain. I don't have definitive results yet, but if the trick of **upscaling at separate levels instead of in one shot** truly does give detail then SD simply wins, because with TiledDiff it seems you're just not able to scale in bits as it just doesn't play nice out of the box (maybe this issue has to do with TD parameters)
not this discord
Hello. Can someone walk me through the installation, please? Apparently I'm doing something wrong.
in L2 discord, Treeshark has a workflow that significantly beats ultimate SD upscale
its very similar but its done as a manual comfy workflow instead of one node
lets you manually re-run tiles and have different denoises per tile
if you want an easier version, someone made one of his previous upscale workflows into a node that is called "Mcboaty"
"Mcboaty" still beats ultimate SD upscale although it is not quite as flexible as the full workflow
having said that, my personal recommendation is not these, but instead this: https://openmodeldb.info/models/4x-NomosWebPhoto-atd with SUPIR after if needed, and using deepshrink/hi-diffusion/res-adapter to get the initial image as large as possible
guys, who has a comfyui_execution module in impact pack?
Yoh, the keypoint to add details is to inject some noise. Cheers.
https://image.civitai.com/xG1nkqKTMzGDvpLrqFT7WA/c7b06656-bcf9-457f-bee6-6d353325b95b/original=true,quality=100/4249906775-1.jpeg
Check dm
that's rly rly nice
That's a gorgeous image!!
I need to google info about injecting noise, I have no idea what that means haha
With A1111/Forge it's pretty simple, basically a single setting.
All my images upscales are using A1111 with :
- Ultimate SD Upscale (9 tiles whatever the ratio) with ~0.5 denoise, some extra noise, padding of 256.
- Soft Inpainting to be seamless.
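a quick sketch of what "9 tiles whatever the ratio" could mean in practice, assuming it is a 3x3 grid where the tile width and height are set from the target resolution. the helper name is made up, and real tile settings may need extra rounding:

```python
import math


def tile_size_for_grid(width, height, cols=3, rows=3):
    """Tile width and height that cover the image with cols x rows tiles."""
    return math.ceil(width / cols), math.ceil(height / rows)


# Example: a 3072x2048 target split into 3x3 = 9 tiles.
print(tile_size_for_grid(3072, 2048))
```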
There is a bug with ADetailer, so i can't run ADetailer in each tile anymore, but i was able to. Opened an issue on github for this.
I posted something at some point to show the difference when using noise:
https://www.reddit.com/r/StableDiffusion/comments/1dkh5cv/a1111_forge_improve_your_images_with_a_single/
Thank you, I will read your post later today! 🙂
Not quite sure how it's possible to add in some extra noise or do soft inpaint alongside Ultimate SD Upscale, but I'm sure you'll go through all that in your post. Looking forward to reading it. Thanks for sharing!
@fathom marten just browsing your images on civitai, man these are fantastic!

Ahah thanks !
I've wasted weeks trying to improve this.. But I still can't, as i'm limited with 12gb VRAM..
I can't use even a single CN nor go above x1.4 for the hires part.
Could be better with a better hardware, i think.
I'm also a huge ZavyChroma fan! 🙂
ZavyChromaXL is awesome ! ((:
It certainly is!!
With a better card, i would also downscale the result and run a 2nd pass.
I think it would be wayyy better too.
Frustrating lol
Bro I felt insulted when the app said "lowvram" I didn't think 16gb could ever be called low 🥹😭 lmao
if you like ZavyChromaXL, I do too, I found Leosam's HelloWorld to be a straight upgrade from it
Nice, will have to check HelloWorld out, thanks for the tip!
So what's the best combination to use Flux.1 dev or schnell with a 4070 12gb GPU? Currently trying with forge and the dev nf4 version and adding any loras makes it dog slow.
I tried both and Zavy seemed better. 😮
Thank you, where is extra noise? And I'm still not knowledgeable about the tile size I should be using. It says higher is faster and no seam so I maxed it out but then I heard lower adds detail more carefully or something.
the double edge sword tho is you need higher denoise for new details but I usually want to keep the face as it was since it came out good
Comment is appreciated, guess I'll jot this down if I ever learn comf.
@fathom marten What kind of settings do you have on the softinpaint part when you're doing the upscaling?
Zavy is more contrasty, for those who like that
Default. These settings are still shady to me, I never did play with it.
True, black is black. I also use a style with "highly contrasted" in it.
Ok great, I have to test them out and dig some info if needed. I noticed when I'm in the Img2Img tab, the softinpaint is missing and I have to go to inpaint tab to be able to use it.
Hello! Do you know a good Stable diffusion template for Runpod?
not sure about runpod but the default vast ai comfy template is good
for the most part runpod just costs more for no advantage, relative to vast.ai
it does let you roll the dice once more by having a different set of servers though
It's because I want to use it as locally, I want to train my own models and images
I used LastBen Automatic 1111 and that worked well. I'm looking something similar
i'm going through pony realism models. people telling me they're great at versatility. i'm finding that they're great at one thing only. they don't know general knowledge. all bikes are harley bikes, no ninjas. all outdoors are the same. no machu picchu. no national parks. They all look the same. all bedrooms look the same, the same neutral colors because certain production sets need it for good lighting. Pony models are really great at one particular situation. People are obvious when they praise it.
they default to pornographic situations and bedrooms if you don't prompt specifically against that, and even then it comes through too
Skill issue
go on Civit.ai, turn on maximum NSFW filters to filter as much NSFW as possible
then browse images in the Pony category
the feed looks perfectly fine
hmmm, im getting a bunch of size mismatch errors when trying to run flux on comfyui , cant figure it out
i did that. first two pages were just portraits of characters that seemed like they were about to be nsfw and probably were in the rest of that user's gallery if i looked. i'm already viewing pony though a lens. i'm going to see what i see still.
i looked at pony main model, then the popular pony realism, then cyberrealistic pony. it's all really generic stuff that is really just ONE category. PG rating viewing only even, so Civit is intending these images for kids 12 and under. PG13 is worse.
a good case study is cyberrealistic xl and cyberrealistic pony. they're both refined on the same datasets. just one is refined on a standard xl merge recipe where the text encoder layers are intact, and the other is refined on pony's broken but very topically focused text encoders. they don't prompt the same at all. there's no versatility on the pony version. clip skip 2 or 1 being used. try to get the pony version to make machu picchu.
i don't think cyberrealistic is a good photo realism model fwiw. it's good if you want that over produced professional synthetic airbrushed look on everything. It has a neat vibe to it but it's not #1 photo realism. just a good case study of prompt versatility and knowledge
"i did that. first two pages were just portraits of characters that seemed like they were about to be nsfw and probably were in the rest of that user's gallery if i looked." 🤔
this still sounds like they weren't nsfw
by normal machine learning standards essentially every checkpoint you will find is over-fit anyway. This goes for base SDXL and it goes for cyberrealistic xl
when people say Pony is over-fit I think "yeah but Flux is over fit..."
its possible, or even likely, that over-fit models will score higher on human eval for image gen
due to the increased ease of making an image with higher image quality, even though image diversity has been lost
personally I would rather have the image diversity which is why I run super low CFG, but I am in the minority
about 20% actually were nsfw
flux certainly does have class collapse. distillation problems.
I'm comparing photo refines of pony to other photorealism xl models though
entirely different leagues imo. not worth comparing at this point. flux is still a very dynamic situation that is evolving day to day
you could say, it is .... in flux
lol
Hey :3
I have a question for those who use IMG2IMG and use the SDUpscaler
If you use it on a folder with several images.. Do you run it without prompts and negatives or do you run the same prompts and negatives as the images?
humans evaluating will have biases surely. and rating one image at a time will inform nothing about the depth of knowledge the model has. also, Goodhart's law is keen in most situations, which is probably why it's considered a "law" of sorts
oh they have some metrics to measure image diversity in some of the papers
there's also just clip score on imagenet/MS COCO which is normally the prompt following benchmark, but has some relevance here too
at least according to Dr Head, a lot of it was done badly during training
I wonder how much of the abilities of the model to follow tags could be done without over-fitting a model too much
their next project is targeted for auraflow apparently
which would have a better chance
as auraflow has shown some of the most amazing prompt adherence examples I have seen from any model
dr head is right. While the new tenc is smarter towards pony's dataset and captioning, it obliterated all the base model knowledge that came before it and it came to know only the new dataset captioning. a very large dataset but also small compared to the base model and all the refines since then. i believe all the base model latent knowledge is still in there but much like a distilled model "galvanises" it (a metaphor i'm trying to make a thing), the knowledge isn't really linked to any promptable contexts anymore.
he's not the only person saying this stuff. he's just an effective communicator. its a hard concept to communicate especially when there's so much fandemonium surrounding the pony scene
should i get 64gb of ram or 32gb is enough for sdx+depth+ ipadapter+ comfyui?
32 is enough. 64 gives more leverage
really depends on your vram. 32 can help 16gb vram loads a lot. 12gb of vram a lot less.
system memory is also where all of your browser tabs and extra programs live too. windows will manage all the memory allocation really well, but you'll want to look at your task manager details, sorted by memory use, and check out how much your average system use requires
the more u can get the better,same with vram
make sure it's a compatible kit too. don't just add new 32gb to a different 32gb. confirm they match first
or just get a whole 64 kit
i am getting one package xpg 16gb x 2 3600mhz
you got 2 16gb right now?
i have 8gb x2 and 2 slots only in motherboard
yeah. consider shelving those when you put the new 16gb in. 48gb sounds like more but it could slow it down a lot. potentially not. there are things afoot when you mix dimms.
i will sell them or put them in my pc office.
i collect them and pretend i'm going to do something with them one day
yeah i wish i got 32gb ram directly in 2019
i hope i dont need a 64gb in future.
i bought 32 in 2020 knowing that i'd buy the same kit again once i could justify it. wanted a new gpu first. now i got 64
i even hate getting 16gb x2 now cuz they are ddr4. next time if i want to upgrade to ddr5 i will buy ram again.
maybe i should get a gpu and forget about ddr5 and new cpu.
u could also buy a single 32gb stick that way u can get another one in case u need more
Better to just save up for 2x 32GB dimm kit at that point than having to lose out on half the bandwidth by not running dualchannel, and also ending up later getting the same model ram stick, but slightly not the same one due to different subtimings.
you can buy the same one as long as its the same part number
Aye, hopefully they manage to find the exact same part number too. I spent 3 days locating a store that had my kit in stock to get the exact same part number 
yea thats the hard part i was lucky to find one in ebay 
Nice :P
I buy mine new as you gotta love 5 years warranty by law 
Have any models been released with the clear declaration that it was only trained on royalty free content, or IP fully owned by the creators? Or does nobody give a crap
I was talking in another server about ai and they got all mad and said they don't like the possibility of "art theft" when a model is trained on someone's work without their permission
yes the adobe models
So much stuff in short amount of time. Starting to get to be a lot 😵💫
yes, well - if it's okay for you, or any human, to read content, watch content, and learn from it without it being considered stealing, then it's okay for AI to read content, watch content, and learn from it without it being considered stealing.
humans train themselves and their offspring on copyrighted material every day - and don't give compensation to the authors. the AI doesn't use the content it learns from in any way different than humans do. the AI haters are still of the opinion that it's doing mashups like you do in photoshop
that is the exact rhetoric I heard from the haters
that it's mashed up
I see risks and dangers of this software becoming sophisticated and easy to run but I don't throw around second hand opinions for virtue signalling, and that's what it sounds like to me when I hear anybody bash AI, crypto, Tesla etc
you read two twitter posts and now you're a SJW against AI that's great
How do you tell Flux to generate an image of subjects in the distance? It keeps forcing a closeup.
that is what they think, and no one can tell them any different. they are positive that the AI is using images and just taking bits out of them
good question. so far, i'm not finding any way to successfully prompt that. so i'm using outpainting to add content around the edges, move the subject away, and then crop what I want
I had this exact problem the other day
I was probably pissing off crystal with that one
you play with their heads. one of the things I love to do with those people is ask them how they fit a library of billions of images into a file the size of a model. they go red in the face
I actually don't understand how the models understand all they do with a few gigs
it is incredible
it's a complicated subject but they aren't storing large images. your image is made up of pixels and each pixel has a lot of data in it. you can actually read the code for it with the right editor. but all that data takes up a lot of room. however if you learn that humans have faces with eyes, nose, mouth in certain spots, and you learn the shapes of them, all you have to do is make a note of that data, and then when you want to make a face, you look at that data and go draw a face. no need to use any specific faces for that, just the information of what a face looks like
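some back-of-the-envelope arithmetic for the point above, as a sketch only. the 4 GB model size, the 2 billion training images and the 512x512 resolution are hypothetical round numbers picked for illustration, not figures for any real model:

```python
def bytes_per_image(model_size_gb, num_images):
    """How many bytes of model weights exist per training image."""
    return model_size_gb * 1e9 / num_images


# Hypothetical round numbers, not measurements of a real model.
budget = bytes_per_image(model_size_gb=4, num_images=2_000_000_000)
raw = 512 * 512 * 3  # bytes in one uncompressed 512x512 RGB image

print(budget)        # bytes of weights available per training image
print(raw / budget)  # how many times bigger a single raw image is
```

under those assumptions there are only a couple of bytes of weights per training image, so the file cannot be holding the images themselves.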
it's the same thing you do. watch yourself think. if i tell you apple, you will have an extremely brief flash of every bit of data you've ever learned that is associated with apple, and then your mind will settle on the bits that have the most weight for you and rapidly construct a mental image of an apple
but what is even more impressive is when the model can reproduce a specific person's face, specific model of car, specific art style from one person. At some point there is a photographic like quality to it. Although maybe it is just beyond the way we can comprehend patterns as people.
like you said it might just be weighted just right to produce consistent images
sure, and you can memorize too - though we really don't want them to memorize like that. but if you show the AI thousands of images of shrek, and tell it 'shrek' with each one of them, it sort of has no choice. you pounded shrek into its head
can't use images in this channel
I know, I keep forgetting
that sort of overfitting is actually a bug. we really don't want it to do that. Flux has that problem. it's way overfit for fantasy. if you give it a token that should bring up random data, and generate a batch of 4 to 10 images, you should get stuff that makes no sense and is all over the place. just random junk. however, do you? put this prompt into flux dev: .....,,,...!!?
and see what you get
I don't know what you are talking about. I get an anime image, a pixel art image, nothing related or overfitted
batch size 6 for prompt : ".....,,,...!!?"
- photo of a dog
- photo of some people on street
- digital painting of a winter landscape
- Two anime characters
- Photo of an autumn landscape
- Some weird pixel arts
so much about overfitting bug lol
but you shouldn't. you should get random junk. see how much fantasy you get when you leave the prompts (neg and positive) totally blank - you're not giving it any tokens to use, so it's just going to go pull up random data
no, you shouldn't get random junk
that's not how diffusion works
every seed will give you a proper image.
do that with sd1.5 and show me what you get. and sdxl
the selection of images is random, but not the image itself
even there you get an image. The problem is that they need cfg to produce good images and for cfg you need a prompt
here, i have a challenge for you then. use just the word gun as your prompt, post the images in general with images. then just the word Javelin as your prompt. post them too
flux dev is cfg-free, so it will give you a random image if the prompt is empty or nonsense
i want to see you get just a javelin, the way you can get just a gun
I don't even see how that is related to your initial claim. I can try that tomorrow, go to bed now
In 2 weeks yes.
you hear lots of things on the internet - take all of them with a huge grain of salt
Does the processor and motherboard matter as long as I have a good GPU and enough RAM?
I'm probably getting the 4060 TI 16GB and I have 16 GB RAM, but my processor is Ryzen 5 2600x, not sure if that's good enough or it'll drag everything else down
anyone use diffusion bee
It is said to upscale in multiple runs with Ult SD but my second pass seems to come out extremly fuzzy and noisy.
Man oh man. All day i been playing with Silly tavern's character building stuff. It gives some really crazy leverage over LLMs. I hated them before cause i didn't have tools to turn it into chatgpt, but i do now. Shit's good. Plus also now i know how to run a llama 3.1 instruct model and sdxl at the same time and it'll interact with comfy or automatic for image creation. So you can make really powerful prompting bots. and 3.1 has vision capabilities so you can show it a pic and build a prompt around that.
There's so much more software work that can be done to enhance the tool, but people aren't even tapping the potential it has for chat bots and interactive experiences. i'm talking potential MUD capabilities, not you know, 90% of the "programmed" bots out there are just bonk lol . "you are a step sister with big bigs. you will not not be this step sister" is basically what most of the popular character templates available are. Great potential for stuff though no doubt. People are sleeping on silly tavern
think i might even be able to fit nf4 flux and a more quantized model into vram together. if the memory swapping plays happily
pushing the limits of my 64gb of system memory with beefy 25k context sizes
if the conditioning signal is not very strong you just end up getting results that look like sampling from the unconditional distribution
which might be what's happening with ".....,,,...!!?"
silly tavern style stuff for DND has a lot of potential yeah
could potentially encode the actual odds mechanics into macros and everything. there's so much macro potential before you even get into scripting or new samplers
and the models that run on home gpus are REALLY good. not gpto good or whatever, but goodnuff
within a year can probably get gpt 4o performance at home
and then there's the whole databank situation. which is stuff it can procedurally index and queue up into the model prompt as needed. So basically dnd lore books
yeah that's kinda the tricky part
there's some knowledge graph projects and stuff that tries to work on optimising context injection
right. i tried to summise that. like a textual embedding
basically are textual embeddings
yeah they actually are textual embeddings that's right
Silly tavern will use it to queue up knowledge and passages directly from source too. Interesting stuff
But I expect a response from stability
Flux took all the market
I have the feeling SillyTavern only uses a small part of its potential, because the chat templates never 100% work
So I would think there will be a new version to fix these issues with sd3.1 or sd3.5 I don't know what they will call yet
like I don't get why they don't have predefined chat templates for the few big llms
sampler and model type would really matter a lot towards different templates i imagine
My favourite local llm is Gemma 27b
and it was sooo complicated getting their format into SillyTavern. Which is crazy, cause it's the most simple format you can think of
this software is just overly complicated to setup
I often prefer to just run Oobabooga
and insert the template manually
not an option for roleplay, of course, as you don't want to bother with correct syntax
I still have no clue what you are talking about 😂
Both prompts, "javeline" and "gun" works fine, except that in some cases flux makes an arrow instead of a javelin.
???
Just do Google image search for "gun" and for "javelin". It's the same. If you prompt for gun you get a gun, because there are a lot of websites with advertisements showcasing their guns. If you google for Javelin you get either historical or fantasy images or olympic athletes, but you won't get a single image with a product placement of a javelin
Flux is not buggy and overfitted - this is just how diffusion works.
I don't rly understand the point crystalwizard is making in this conversation
regarding gun/javelin or the prompt ".....,,,...!!?"
but anyway I would say flux is slightly over-fit on the midjourney/instagram/aesthetics look
it could be Skill Issue on my part but I can't get base flux to output a look like RealvisXL or Realistic Stock Photo
what I can easily get out of flux is images that resemble either midjourney itself or SDXL checkpoints+loras that are mimicking midjourney
I agree on that. But that seems some kind of fine-tuning or dpo problem. The model has the capability for a lot of styles, it's just difficult to prompt them. Seems you need loras for that until someone makes a general finetune of the model
"But that seems some kind of fine-tuning or dpo problem." this is almost certainly the case, yes
the base model could have done Realistic Stock Photo and then they fine-tuned that ability out of it 😿
I wonder if it was an attempt to make it worse at deep fakes in order to deflect regulator attention
a bit like how Dalle 3 can't do realism either
hey all, hows things going
I'm wondering if it's my lack of experience or the fact that my checkpoint is "v1-5 pruned-emaonly-safetensors" that I'm getting poor results.
I followed a recent tutorial to get it installed with AMD graphics
its not a great checkpoint
hello guys, what is better for GenAI? 4070 super or 4060ti? people on reddit recommend the 4060 ti because of 16gb vram, but in most of the benchmarks I saw the 4070 super is faster than the 4060ti. any tips guys?
vram is everything
why on benchmark 4070 super is faster though?
Does more vram give you a better picture or does it make the image generate quicker?
because speed depends on other things
but slower with more vram is better
having more vram lets you run things that cannot be run on the lower vram card
oh ok
is 4gb less vram a deal breaker? i'll just do comfyui+flux, or is it not future proof?
vram is the most important thing of all
and every amount of vram you increase, will increase the amount of things you can do
Anyone knows a model \ lora \ workflow \ prompt or anything that can generate images I can safely downscale to 64x64?
I want to make few quick items for a game, but so far downscaling not doing me good...
what is the 'Guidance 4' I see with some loras
its for flux
yes. You want to have as much vram as possible. With a lot of tricks and optimizations you can run everything on 6gb, but it will be slow. 12gb is a solid foundation
Why does flux do this? The images though different prompts are very similar in the composition / background, same angle, same plate, same background.
Check general with images for examples.
This is not necessarily bad. But i need to know
Is there a way to get more diverse results (without modifying prompt) ?
they just trained it like that
and then distilled it
its how it is
is it possible to train a lora / fine tune which instead of making photos a certain way is specifically designed to modify input photos a certain way ?
kinda like inpainting would. Atm i havent found any workflow that works for flux in the same level as sdxl inpaint on fooocus
instruct to pix models. there's lots of that stuff out there
are you sure that you use different seeds for these images? They look very much like same-seed
aesthetic guidance and it's also really good at maintaining prompt adherence, that is, how it understands your prompt
the training data could be a lot of phone photos of food that are up close like that too. and phones all have a very similar filtered aesthetic
similar is one thing, but damn, look at this
#🏞|general-with-images message
thats what i was looking at. thats what my entire explanation was about
i mean, you asked for insight and i offered some. then you argued with it. nevermind then.
Is nvidia a100 good for stable diffusion or it is good only for text ais?
I'm pretty sure he just used the same seed for all these images
then it's natural getting almost the same image back every time
you pretty sure or unsure?
Any way to train flux with it yet?
probably. unity put out an instruct adapter for sd3 already
their project might lend you some clues
Thanks will check it out
the SD3 technical report even talks about tests they did, refining 2b on an instruct dataset. that's worth reading too, but it's just a paragraph in the whole report
true, that was the case. The seed was the same. When using from UI it changes after each generation, but i had saved the workflow and loaded each time to make calls via api (so seed in the json workflow was the same each time).
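a minimal sketch of one way around the fixed seed in a saved workflow, assuming the JSON is an API-format export where each node is a dict with an "inputs" dict, and that samplers keep their seed under "seed" or "noise_seed". the helper name and the trimmed example workflow are hypothetical:

```python
import copy
import random


def randomize_seeds(workflow, rng=random):
    """Return a copy of the workflow with every literal 'seed' or
    'noise_seed' input replaced by a fresh random value."""
    new_workflow = copy.deepcopy(workflow)
    for node in new_workflow.values():
        inputs = node.get("inputs", {})
        for key in ("seed", "noise_seed"):
            # Only overwrite literal numbers, not links to other nodes.
            if isinstance(inputs.get(key), int):
                inputs[key] = rng.randint(0, 2**32 - 1)
    return new_workflow


# Hypothetical, heavily trimmed workflow loaded from a saved JSON file.
saved = {
    "3": {"class_type": "KSampler", "inputs": {"seed": 42, "steps": 20}},
    "6": {"class_type": "CLIPTextEncode", "inputs": {"text": "a plate of food"}},
}

fresh = randomize_seeds(saved)  # call this before every API request
print(fresh["3"]["inputs"]["seed"])
```

the saved dict is left untouched, so the same file can be loaded once and re-randomized for each call.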
Does anyone know how to make sd forge use random seeds? if I use seed -1 the seeds in a batch of images would be xx3, xx2, xx1...etc
-1 is the built in way, you can put random numbers in there of course, generate them some other way
can anyone help me guide those thing #📝|prompting-help message
hello
yes it's very good. Probably one of the best gpu's for all types of machine learning models including stable diffusion.
elaborate?
i downloaded this new model i saw that didn't look too bad. was for roleplaying so i assumed you know, dungeons and dragons or like, rpg games.
Nope it's trained on roleplay from yahoo chat or something. gonna delete that thing. the future is now hey?
i was thinking maybe i could hook it up to skyrim. thats probably what it's meant for anyways even
Wow - I was just writing an "Issue" for Forge and when I clicked "Send" it brought me to 404 page.
I go back and... Illyasviel seems to have removed Issues from the repo lol
Oh dayum there's actually a big news thing on the front page about it
Was it an extension not working? People were really abusing his repo. Dude gets a lot of shit
The forge dude was roasting city96 for his gguf formats in one of the PRs. Like yeah, let's roast him for making the first working gguf format for dit based models and then complain about having to spend 20 minutes writing code for the format. I hope he was just trolling
Come on, we don't need any more popcorn moments. AI is the future. If you want popcorn watch Hollywood burn.
do not worry the internet officer is already here to settle this
Nah, was making an Issue regarding the VAE / Text Encoders (in Forge are now handled together as a “list of extra modules”): PNGInfo send to t2i / i2i currently ignores the “Modules” info. Also override_settings (API param) needs updating to accommodate
Tried figuring these out but it’s over my head
hey all! I've joined to better my creation of funny cat images!
An honorable goal.
lol I love that
idk man
that profile pic
is cursed lmao
you might be too powerful
to use ia
hello guys, can I ask whats better for comfyui or stable diffusion, 4070 super(12gb), 4060 ti(16gb). on benchmarks I am seeing 4070 super generate more images and generate faster than 4060 ti despite of only having 12gb. but most people on forums,reddit, etc. suggest VRAM is king. but how? what other factors should I consider except speed?
4070 is better,the 4060 is a crippled card
why is the 4060 crippled? asking because I'm running a 4060Ti at the moment, and thinking of getting a 2nd one to help with generation of images or getting something better if I can afford it.
it only has a 128bit memory bus so it has less memory bandwidth than the RTX 2060 Super
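A rough sketch of where that comparison comes from: peak memory bandwidth is the bus width in bytes times the per-pin data rate. The per-card numbers below are assumptions taken from public spec sheets, not from this chat, so double-check them before relying on the output.

```python
# Peak memory bandwidth (GB/s) = bus width in bytes * effective data rate per pin (Gbps).
def memory_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    return bus_width_bits / 8 * data_rate_gbps

# (bus width in bits, data rate in Gbps) -- assumed spec-sheet values
cards = {
    "RTX 4060 Ti": (128, 18.0),
    "RTX 2060 Super": (256, 14.0),
    "RTX 4070 Super": (192, 21.0),
}

for name, (bus, rate) in cards.items():
    print(f"{name}: {memory_bandwidth_gbs(bus, rate):.0f} GB/s")
```

With those assumed specs, the 128-bit card comes out well below the older 256-bit one, which is the point being made above.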
so would a 4070 / 4070 Super be better, then?
I can get a 2nd 4060Ti for $400, but 2 x 4070 Super would be $1300
yea or if u have the money u could also buy a 4080 16gb
or if u buy used a 3090 ti for like 700 usd
gotcha... didn't realize that the mem bus made that much of a difference, still learning a lot about doing Image Generation
well it doesnt it only happens when the memory is full
so it happens when u play a game at 4k or if u doing a very high res / very high img batch size gen
I'm mostly doing SDXL at 768 x 768, then upscaling to 1080p or 4k
as well as using it for other things like LLM chatbots and such, separate from doing image generation, that is
that should work fine,you would fill all memory if u were like training lots of imgs or using a very big model like the big sd3
yeah, definitely not training, just using models to generate images... I might try training at some point, but not quite there yet in my journey into things
I'm mostly interested in cutting down time to generate images. I've had some landscapes that I tried doing that took an hour or more to do 4 batches of 8 images in A1111. Hoping to test out the same things in Forge to see if there is a difference over the next few days.
does running a 4070 and a 4060 in the same system help with generation times?
no u can only use one card to gen stuff , if u wanna use a second one u would have to open another instance of a1111
would something like this not work with A1111, Forge, Comfy, etc? https://huggingface.co/docs/accelerate/usage_guides/distributed_inference
that isn't implemented in a1111. there was an old ui that worked with two gpus but it's dead now, but two or more gpus works in training
gotcha, was just reading into multiple GPU scenarios and came across that
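The "one instance per card" workaround mentioned above could be sketched roughly like this. It only builds the launch plan and starts nothing. `CUDA_VISIBLE_DEVICES` is the standard CUDA environment variable for hiding GPUs from a process and `--port` is an A1111 launch flag; the `./webui.sh` path, the base port and the `instance_plan` helper are assumptions for illustration.

```python
import os

def instance_plan(gpu_ids, base_port=7860, launcher="./webui.sh"):
    """Return one (env, argv) pair per GPU; nothing is launched here."""
    plan = []
    for offset, gpu in enumerate(gpu_ids):
        # Each instance only sees its own card, so it behaves like a single-GPU box.
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
        # Give each instance its own port so the web UIs don't collide.
        argv = [launcher, "--port", str(base_port + offset)]
        plan.append((env, argv))
    return plan

for env, argv in instance_plan([0, 1]):
    print(f"CUDA_VISIBLE_DEVICES={env['CUDA_VISIBLE_DEVICES']}", " ".join(argv))
    # To actually start an instance: subprocess.Popen(argv, env=env)
```

This doubles throughput across two queues at best; it does not make a single image generate any faster.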
ye this is literally the GPU I use these days
it makes 300 steps viable as your casual generation setting
Diffusers, particularly their Jax version, is the most optimised out of the common CLI/GUIs for multi-GPU
anyone able to help me?
i was doing just fine making images, but now no matter what lora i use or what i do the faces are messed up. but i didn't change anything
you might post in #🤝|tech-support
o/
city96 commented Aug 25, 2024
and I hope this is the last time that people randomly invent new state dict key formats to waste time of other developers.
This is the llama.cpp standard GGUF format for LLM models, which natively supports the high quality T5 quantization methods used, and which both of our codebases use as reference. Likewise, the standard llama.cpp build was used to create those quantized models without any patches/changes.
I think it's rather unprofessional to put comments like this in code/commit messages while dismissing the work of the llama.cpp team as "randomly inventing" formats and find this rather disappointing.
am i reading that right? is llama written in c++?
Yeah that's it. City96 called him out on being a dick
I was under the impression that what city96 did for the ggufs was nonstandard or something though
Who's City96?
The guy that got ggufs to work with flux
did he have a reason for this?
Not sure, but he's making it sound like he followed the llama.cpp way or something
imo it doesn't even matter if it was standard or not
wonder if he didn't have a choice
apparently it matters to someone
Well for performance reasons, likely not
that tone has no place in a civilized discussion, it's disrespectful and rude
and is unnecessary
did you forget it was taking place on social media?
yeah.... siiiigh
it probably broke forge or something
Llama.cpp is in cpp and is super fast. If city96 wants to bump up gguf performance, llama.cpp has to add stuff to their repo for it. So what city96 has working right now isn't perfect and isn't the most performant, but it works great. Once stuff can be added to llama, you'll likely see massive gains in performance
why is there always drama when it comes to diffusion GUI devs
revision: why is there always drama
because programmers are artists - and artists fall apart emotionally at the drop of a hat
it does a lot of harm cos it splinters the ecosystem
granted, not that it's ever not been the case.
all the way back to jobs and gates stealing each other's tech
Hotheads, same as any other field
Tech ones have some of the hottest though
that describes the human race
this is why AI will never want to take over the planet - then it has to deal with humans
Ultimate SD smaller tile = better detail that true?
the page doesnt say so but videos do
not rly a fan of tiled diffusion upscale
go with what works?
funny things start to happen
depends on the model really
my usual upscale recommendation is deepshrink/resadapter/hidiffusion -> HAT/DAT/ATD -> supir
SDXL can get to 5k resolution using 2 passes of deepshrink/resadapter alone
can anyone help me with a good comfy workflow for a consistent character? I tried making a lora with civit but when i use it, the quality is garbage, definitely not worth the 4k Buzz it cost to make it...
style transfer IP adapter, instantID, control net depth and canny?
i'm not quite that good, lol. I've tried a few pre-made workflows that do ok, but still use more than half of my prompts for face and body style to stay consistent
for the four things I mentioned you pretty much just wire them up in the way their examples show
ok, i'll dig into those and give it a shot, thanks
🥳
How i can get black background in img2img from my white background image ?
this isn't rly what img2img is about
wdym
img2img starts with your input image and then does diffusion with that as the starting point
if you want to change stuff its not the best choice
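To make "the input image is the starting point" a bit more concrete, here is a minimal sketch of how denoising strength is commonly mapped to sampling steps in img2img: the image is noised part of the way and only the tail of the schedule is run. The `img2img_schedule` helper is hypothetical and the exact rounding differs between UIs, so treat it as an illustration rather than any particular tool's implementation.

```python
def img2img_schedule(num_steps: int, strength: float):
    """Split a sampling schedule into (skipped, run) step counts."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    # Higher strength -> more noise added to the input -> more steps actually run.
    run = min(int(num_steps * strength), num_steps)
    skipped = num_steps - run
    return skipped, run

for s in (0.25, 0.5, 0.75, 1.0):
    skipped, run = img2img_schedule(30, s)
    print(f"strength={s}: skip {skipped} steps, denoise for {run}")
```

At low strength most of the original layout and colours survive, which is why a white background tends to stay white; at strength 1.0 the input is essentially ignored.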
oh i misunderstood
do you know how i can do this trick? because i'm actually generating good (2D sprite) images for my game with a white background for png purposes, but my issue is that when i try to remove the white background i get a white line on the edges around the whole design. i tried with Photoshop and krita and some websites, and i can't really remove the white background without losing some pixels. i'm stuck lol, been trying to solve this for more than 24h
yeah
the way is to use tools to get aspects of the image
style and composition transfer IP adapter
various control nets
Instant ID if there are faces
with these combined you don't need img to img
i didn't understand how to do it is it hard ? (i'm new on ai)
if you follow the instructions you should get there
I'm trying to use Virtual Tryon (IDM VTON). It takes input of a person and input garment and then outputs the final image with the person wearing the new garment.
It also supports text prompts, but I'm unable to see any difference with the prompt.
Does anyone know how to increase the prompt strength? Or have some idea what 'classifier with guidance' does? It'd be helpful!
I mean, good tools like photoshop can remove the background quite well. There shouldn't be lot of white pixels left
but you can do the following:
- remove the white background
- paste the character into a black background
- ideally, you remove the white pixels a bit more aggressively, even if you lose a bit of the sprite
- then do inpainting or img2img to repair the sprite (for inpainting, only mark the edge of the sprite to repair it)
there are other deep learning methods than IDM VTON that are also specialised towards clothing
maybe some of those would work better
I don't follow this area too closely
you might be able to get some of the way there with style transfer IP adapter and canny edge control net
as IP adapter tends to be able to transfer clothes fairly well when doing style transfer
if you are willing to apply IP adapter block-by-block you can get MUCH better results than normal, and with less concept bleeding, but its a lot of effort
Wow that got a bit heated up there
Im glad he pinged me, easiest ban of my life
never seen a spammer ping the mods before 🤔
If you use Forge, it has an extension called layerdiffuse which can make images with transparent backgrounds - recently updated to also work in img2img (incl inpainting). I know you said “black backgrounds”
you know what, i just downloaded it and it won't generate, in txt2img either xD it says "RuntimeError: shape '[640, 768]' is invalid for input of size 1310720". i tried changing the resolution to 1024x1024, same error
dayum,you re rich

its under 2 dollars per hour
on vast . ai
oh
i thought you actually had an pc with a100
its actually the norm in machine learning field and industry to use cloud
using home PC is just a current trend from reddit mostly
but the funniest thing is some guys tried to run games on an a100

and failed
yeah it might not actually be that great for games
drivers are everything
they just spent 24k $ for nothing
world most expensive lint roller
A100 and H100 are extremely over-priced
they aren't that much faster than a 4090
but Nvidia monopoly sadly
because like rn 4090 in my country costs 9x average monthly salary
wow
okay yeah
I’ll have to try again later, last I checked it was working. Make sure you’re using Forge and also up to date including the extension
I’ve also only tested it using XL models
yep it's updated along with the extension, and i tried several XL models. i think something that needs to be installed is missing for me
thnx nah i think the problem is on my side 
it work now lol idk what was the issue i just tried several things and restart my computer 🙆
The world wide fix... Turn off, Turn on lol
Never seems to work on my bottle of Jack Daniels tho. I always have to go and buy a new one lol
It work only on artificial 😄
don't try it on bitcoin pls
it worked lol i posted in general with images
Hello, my name is Jack, I am an Architect. I really like how SD services works with shadows, highlights, and lighting in finished images, but I don't want the AI to change the objects in the image. However, I haven't been able to achieve this through the settings. Is it possible for me to upload a finished image, and the AI would only work on the shadows, highlights, and lamp glow, without changing anything else?
its doable with lots of advanced techniques
and even then it will not be fully reliable
it can't really be done with an easy plug-and-play solution at this point
control net, IP adapter and Lora are the things to look into
hi
hello~guys? which is the best upscaler model in your opinion
Hello dear community, I would like to create a short video of about 15 to 20 seconds. Could you possibly help me with this? It would be great. My English is unfortunately not good, so if I could correspond in German that would be great too. Thanks in advance
got my comfy a bit more...comfy, but its still not where i expect it to be
Hi, I asked this question on Huggingface. Going to try here as well: I am using stable-video-diffusion-img2vid-xt, and I find that the videos I create start moving backwards about halfway through (not all my videos). What can I do to avoid this?
I seriously need to get a few 4TB nvme's for my models
Almost wanna go dramless as they will just be loading models 90% of the time anyways, so write cache is a non-issue lol
Lol, im glad i read this comment. Made me appreciate my 300$ 4tb ssd investment
i just downloaded the sd3_medium.safetensors from huggingface and creating a simple image from an example prompt generates only a dark blurry junk image, what am i doing wrong? i am using a amd fork of automatic1111
hello guys
So I have an odd question. If I train a lora with 30 pictures of a red background and 1 picture of a blue one. Will it make a slightly pink background or will it just have a 1/30 chance of blue or will it just ignore the blue and go with the majority?
yo
Hi:) What is the best method for regional conditioning right now? (Both by prompts and by input image). Ive used many methods before, all of them very unreliable, with a lot of bleeding. Best one was ipadapter attention masking.
In my use case, i need to heavily respect the mask boundaries. (Architectural renderings where i need to specify materials per mask)
Most methods use the regions as a "guide", but the shapes of the masks are not respected 100%
I still need to try densediffusion tho
can i pay someone to make a lora for me?
Lower step(8 and 16) version of flux
https://huggingface.co/ByteDance/Hyper-SD
From my very limited testing it's much better than flux schnell and merges, while having similar and (sometimes) better prompt following than dev.
can anyone help me to run SD smoothly on AMD GPU?
what sort of lora?
go to the #🤝|tech-support channel, look at the pinned posts, read all the guides. that'll get you started. then post in that channel if you need additional help
Thank you
like sdxl or what the lora is on?
what the lora would be on
Aye lol. Sadly, cheapest 4TB here is 380 :P
And make sure it's gen 4, as load speed from a 1TB gen 4 SN850 is 2.4GB/s when loading models, and a gen 3 is half that.
got Kingston NV2 4TB M.2 2280 NVMe Internal SSD , its even 230$ on amazon rn
I need to get more ram though 32gb not enough anymore
Oh neat! And yeah, can testify for that lol. What i find most annoying though, is that i see pagefile gets filled instead of using free ram whenever i go above 24GB vram 
Got 64GB ram, so 32GB free most of the time lol
I envy you, not for long tho xd
Aye :P
I already got 2x nvme's filled with A.I nonsense lol. As there's always one model that can give that visual, while an entirely different model gives that visual, and that's 2-6GB wasted per slight change lol
i used to hate paging to ssd's. i was afraid that they'd wear out from write death. but i guess to do that you need to do constant write operations to it for over 10 years straight. It took me till i got my gen4 m2 to start allowing windows to do page files on the ssd.
gotta say wowweee it was a boost . don't be afraid of hammering your ssd. all i'm saying.
metaphoric hammer not a real one
i still got my image generation folder on a bulk 3TB HDD . i'm getting annoyed with it tbh
scrolling the folder in explorer and it's so slow
I had a high end WD nvme lose 5% life in a handful of months of paging it hard. I moved my page file to a regular SSD, so it's slow as hell in comparison(like 1/10th the speed)
But the way I have things set up now, it rarely ever touches the page file now
Assuming an NVMe could always sustain a constant 7GB/s (gen 4 speed), 11.9 hours of writing data to it and the cheapest nvme would be dead.
But after 8 months, my server's gen 3 cache nvme, which is rated for 600TBW, is just at 134TB, and it holds the active data for my docker containers and used to write 150GB backups daily for over a year until i changed that to once a month.
And we probably don't write much to the nvme vs a server's activity lol
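The endurance numbers above can be checked with a bit of arithmetic. This is a sketch only: the roughly 300 TBW rating for a "cheapest" drive is an assumption chosen because it reproduces the 11.9 hour figure, and the wear estimate assumes writes continue at the same average rate as in the first 8 months.

```python
def hours_to_exhaust(tbw_tb: float, write_speed_gbs: float) -> float:
    """Hours of nonstop writing needed to reach a drive's rated TBW."""
    return tbw_tb * 1000 / write_speed_gbs / 3600

def months_until_tbw(rated_tbw: float, written_tb: float, months_elapsed: float) -> float:
    """Months left before hitting rated TBW if the average write rate stays the same."""
    rate_per_month = written_tb / months_elapsed
    return (rated_tbw - written_tb) / rate_per_month

# Worst case: an assumed 300 TBW drive written at a constant 7 GB/s
print(round(hours_to_exhaust(300, 7.0), 1))  # 11.9

# Real-world case from the chat: 600 TBW rating, 134 TB written in 8 months
print(round(months_until_tbw(600, 134, 8), 1))
```

The gap between the two results is the point: the worst case needs sustained full-speed writes, which normal model loading and paging never come close to.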
yeah ultimately, you shouldn't be paging much either. good point
i was really scared of hammering writes to my SATA ssd that i first got. they did used to suck with a lot of writes for sure.
Only SSD I've ever killed was a 64GB Kingston SSDnow v100. As neither win 7 nor i knew that you shouldn't defrag a flash storage lol
10 months of defragmenting killed it off lol
And I defragmented my steam deck's 1TB nvme due to converting from ext4 to btrfs for compression, all data had to be moved so that the moved data would get compressed in the process.
So 950GB of games when defragmenting them wrote 27TB worth of moving data back and forth lol.
lol. i used to love defragging hard drive day. cause it was a legitimate improvement to the system performance for a few weeks.
oh nice. compression doesn't hurt the performance to much hey?
As a matter of fact, the opposite. It only uses more cpu when writing due to compressing the data, but when decompressing, the file is smaller by up to 30%, so the files read much faster.
I had to go away from BTRFS though, as it doesn't like to be shrunk/expanded when messing with partitions to install windows on a 256GB partition for the deck lol
Sadly windows can't use BTRFS or ext4, as windows is windows after all 
i got the 1TB deck. might give that a look see
Good afternoon, everyone! How are we all today?
Pretty tired today, but good! What are you up to?
Just browsing some news about the nvidia 5000 series and trying to understand what an epoch is lol
can't quite wrap my head around the concept
You know, I haven't even gotten around to considering what my next PC will be
haha, when was the last time you built a PC ?
the next gen intel CPU and nvidia GPU have AI gimmick so i think its a good time to build a new one ? since they are rumoured to be released soon anyway
😆
that wrong or something?
its how it was broken down for me. thought it was a simple way to think about it. mb
needs more balls
when a seasoned ai expert laughs at something i said, i take it back
what kind of seasoning tho
lawrys
😋
man, some of the stuff flux can do is absolutely crazy. dynamic angles, proper depth perspective like someone reaching out toward the camera where their hand is really close and their body further away. and with how trainable it seems to be, I just dont see how it doesnt obsolete everything that came before it
the 1.5 pony cult aside
depends on what someone wants out of the tools they are using. some people still use a spoon to mix a cake with
anyone know any command line arguments to improve speed of flux on zluda?
@warm junco probably does. you should ask this in #🤝|tech-support
thank you!
How much vram do flux models use for you guys? As even flux dev fp8 with the t5xxl_fp8_e4m3fn clip plus clip_l at 512x512, which supposedly works even on 12GB vram cards, just tanks my 3090's memory instantly and drags over to swap o.O
what webui do you use bro?
Comfyui
https://image.duckers-web.site/hEja1/RAmEmido03.png Here's what my generation attempt does
A random flux workflow i downloaded
your 3090 is very capable of running it. your 3090 should be 24GB vram
you may have the settings slightly high can you screen shot the workflow
Which it is, and as my screwnie in my link shows, 24GB plus some shared nvme storage lol
screen shot your setting bro
and post them in the general chat with images
Unless you mean other settings, that's the whole workflow
Unless my link doesn't show for you?
As i posted another one right above your comment
:i can't read
2 sec lol
called you in the other chat
there are 2 generals - this one which doesn't allow images, and the other which does.
After joining the Stable Diffusion community on Discord, how do I start generating images?
Aye, i know. I just keep forgetting 
is there someone who could help me install ipadapter?
i'm using the Model Manager in comfy and want to hide some of the checkpoints and LORA's, there is a "Show NSFW" button that seems to do nothing, how could I tag stuff to fall under this action?
what checkpoint are you using when the NSFW does nothing?
sorry, i misspoke, it's the comfyspace node, not model manager.
I have over 100 checkpoints and LORA's in the folder, for obvious reasons, i can't share a screenshot, lol
ie, if you're using an SD3 checkpoint then the NSFW control won't do anything as the checkpoint won't allow it
it's not a checkpoint issue, its a node/extension
you asked i'm using the Model Manager in comfy and want to hide some of the checkpoints and LORA's, there is a "Show NSFW" button that seems to do nothing, how could I tag stuff to fall under this action?
i moved the convo to the images chat, look there and you'll see what i'm talking about. its not a generation issue
hi
hi
hi
i have a photo that i need to edit using ai. where i can get it done?
what sort of photo and why does it need to be done with ai?
i need to change cloth on my photo
the new pixel 9 runs some kind of image generation model on device, anyone know if you can swap it out for another one?
leosam's hello world has a decent chance of having that in the training data
if you get 20 or so images of retro pop art and put them into IP adapter style transfer (average the embeds) then I think you will get there
Is there an attention coupling workflow for Flux out there?
Is there an image 2 image / style transfer trainer for flux?
Atm i will try and train a neural network with image pairs, to predict 2nd latent based on the first and do image to image that way. Not sure if anyone tried it before
yeah both
Where is the img2img trainer ??
you can just use the original flux safetensors file, but swap the empty latent out for an image, and then play with the sigmas to get the result you want

