#💬|general-chat
ty for the advice
does anybody have tutorial how to install stable diffusion on amd? (i have rx5700xt)
lots of discussion on AMD in the #🤝|tech-support channel. and you want to read the AMD GPU guides that are pinned in that channel
that card can use sdxl
i used sdxl on a gtx1080
with auto1111, but forge and reforge are faster
hello
RTX2080 and especially the ti version aren't even that bad
I think you could run SD 1.5 on like 50MB of VRAM only if you used BK SD Tiny and GGUF Q4
Wan is pretty cool
its Wantastic
Hello
ᴀʟʟ ᴛʜᴀᴛ'ʟʟ ʀᴇᴍᴀɪɴ, ᴡɪʟʟ ʙᴇ ᴀᴜᴛᴏɴᴏᴍᴏᴜꜱ
ʙᴇɪɴɢꜱ ᴅᴇᴄɪᴅᴇᴅ ɪɴ ᴏᴜʀ ɪᴍᴀɢᴇ
Make sure you have xformers enabled
I appreciate the fast response! ❤️
Is there a youtube video that can perhaps elaborate more?
Don't want to annoy you with tons of questions!
General question: How do Stable Diffusion whizzes feel about the new OpenAI image generation in terms of quality? I never got too deep into customizing SD or using the likes of LoRAs. Is this just a case of commercial products catching up to what can be accomplished locally with a bit of know-how?
what's the best adetailer I should use in Comfy for Illustrious (including realistic style)?
how to upscale shit like hires fix but in comfyui?
i use https://ibb.co/hrVWNhJ
I haven't tried the new openai imagegen myself but from what I have seen so far it's clearly superior in prompt following. Image quality is probably on same or lower level than sd/flux though
Opinions are subjective 🤔
As usual there is no paper about it so we can just guess. It's very likely something different from sd and midjourney
they claim that their images are natively generated by chatgpt
something similar was already done by Meta, but results were not that good
also DeepSeek did something similar already
but so far the conclusion was that diffusion works better for images than auto regression
so either they heavily improved on that, or they used some kind of mixed architecture where auto regression builds up visual tokens that are decoded via diffusion
I could also imagine it's something similar to Omnigen but built on top of ChatGPT
that's two different things
yes, you can do similar stuff with control nets and loras and so on
actually it's very likely they trained their model on such tasks
the other question is if the technique behind it is completely new
if it's just an auto regression on visual tokens then no, such methods already exist for some time
the reason why Stable diffusion, Flux, PixArt and so on are not doing that is because the results are usually much worse than diffusion based methods
however the big advantage of generating images natively in an llm such as ChatGPT is that you have probably the best possible amount of prompt understanding
and you can discuss with the ai and improve an image step by step
it's very likely not something completely new, but it's probably very different from what Midjourney, Flux and Stable Diffusion are doing
it's very likely also a MUCH larger model
like half the imagenet leaderboard is autoregressive now I think its proven architecture for img
are 10 images enough for training a realistic character LoRa?
yeah or even one
Looking for some advice, feel like I'm hitting a wall in my learning process as I try to advance my skills with stable diffusion.
The problem - I want to generate high quality, fantasy, full body character art. However, when I render a full body image with a detailed outfit, I get a low quality face, eyes, mouth, teeth, etc.
Things I've tried, but don't seem to work (maybe user error)
-Inpaint to improve face - sometimes works, hit or miss, usually causes problems.
-Increase image resolution when generating for higher detail - Causes body deformation. (probably because Illustrious doesn't like going above 1024x)?
-Codeformer - makes things look uglier no matter what settings I seem to use
-GFPGAN - makes things look uglier no matter what settings I use.
-Adetailer - usually makes things look uglier no matter what settings i use.
-Mosaic - Tried to create a half body image for higher detail and then expand the image with mosaic - doesn't stay true to the original when expanding at all
-outpainting - Same as Mosaic, produces bad results that don't match the original image
-Controlnet - can't figure out how to install the models to make it work.
Getting frustrated, keep watching videos of people claiming such and such extension or script or tool is amazing, but when I try to learn to use the said thing, I get bad results.
I'm considering using a different model other than Illustrious but I'm already fairly invested in Illustrious, I like the 3d semi-realistic anime look, and I already learned how to prompt and use loras with illustrious and don't feel like trying another model atm.
Any advice, tips, words of encouragement would be appreciated XD.
these days we have such a good trick where you spin some video models round and round to get lots of extra views of your lora subject
adetailer worked very good for me (on default settings) as its basically just inpainting but automated
It tends to make my character's face look worse sadly.
inpainting works too and is very easy,
img2img inpaint.
set inpaint area to "masked only"
mask the face, set denoise to 0.5
set the resolution to 1024x1024
hit generate
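a rough sketch of why the "masked only" steps above help with small faces: as far as I understand it, the UI crops a padded box around the mask, repaints that crop at the chosen resolution, then scales it back and pastes it in. The helper names, the padding value and the list-of-rows mask format are assumptions for illustration, not the webui's actual code.

```python
def masked_only_crop(mask, padding=32):
    """Return (left, top, right, bottom) around the masked pixels.

    mask is a list of rows; truthy entries mark pixels to repaint.
    The box is grown by `padding` pixels and clamped to the image.
    """
    height, width = len(mask), len(mask[0])
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for x in range(width) if any(row[x] for row in mask)]
    if not ys:
        return None  # nothing is masked
    left = max(min(xs) - padding, 0)
    top = max(min(ys) - padding, 0)
    right = min(max(xs) + 1 + padding, width)
    bottom = min(max(ys) + 1 + padding, height)
    return left, top, right, bottom


def upscale_factor(crop, target=1024):
    """How much the crop is enlarged when repainted at `target` pixels."""
    left, top, right, bottom = crop
    return target / max(right - left, bottom - top)
```

so a face that covers only a couple of hundred pixels in the full-body render gets repainted with the whole 1024x1024 budget, which is where the extra detail comes from.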
Yeah, I've tried all of these before 😭
can you show an example in #🏞|general-with-images ?
sure, give me a sec
When captioning images for LoRa training, should I use natural language, for example:
a blonde-haired girl, wearing a black dress and jewelry. She has brown eyes, makeup...
or this:
1girl, looking at viewer, blonde hair, brown eyes, jewelry, black dress...
is there a specific channel to ask about lora training?
I disagree ;D
1.) imagenet are super small resolutions. Yes, for very small resolutions autoregressive might work, but it gets worse and worse the more you increase the resolution. Autoregressive needs discrete visual tokens to work and this will just degrade the performance
2.) maybe we look at different rankings, but from what I see all leading methods are hybrid methods and use diffusion AND autoregression
Hello
Anyone got \ seen workflow to generate 2d spritesheets? Ideally based on already existing image\s
Flux is quite smart for such kind of tasks. You probably can train a lora on existing spritesheets or do img2img with a detailed caption
Hello. New here. I've recently installed Stability Matrix. Messed around a little bit, tbh though, I'm quite lost. Any good channel to get some help? Are the VC's active here?
models can gen sprite sheets, but there are issues, main one - I can't really get as high res as I would like, ideally I would want like 512x512 per image, for each animation that would be like what, at least 12 images per animation, there isn't a model which can do that, it also needs to be proper consistent sequence, which again, basic tools cannot achieve, img2img trying for luck ain't ideal
I remember there was a tool which allowed to "move" some parts of the image and AI figures out other stuff, how to make it look decent, that would help, but I don't remember which tool was it lol
512x512 might be overkill, but even 256x256 is still alot + again, inconsistency
they have their own discord it seems, there is a link on their github
smooth animations are difficult... I wonder if the inpainting model could be used for that.
Is there an up to date tutorial on setting this up?
I dont remember how and i got new laptop
I think video models might have better chance, then just split the thing by frames, remove background 🤔 but idk
which app?
there are probably thousands of them at this point
if not these you need to be more specific 🤔
Some Arab Metal fans are arrogant, despicable and provocative, laughing at me just because I use AI and still they didn't manage to make one song like my songs. To think that using AI is trivial while there are some songs that I spend 6 days to make them. Strangely enough, I sent my songs to several Western metal groups and they didn't underestimate my songs. As AI professionals we need to spread awareness about AI to fight conspiracy and bullying against human development.
in case for stable diffusion webui by automatic, where do i config (and ig how) for 4080 laptop version
also what sampling methods are good
what is the best strategy for buying small caps on the ibov?
No data source is currently selected. Please choose a data source from the dashboard and try again.
Whats currently the best way to train flux Lora’s
@woven panther hey I see you converted the AccVideo to fp8, but my question is, can that still somehow work with 12GB vram? or
would I need more? i would really love to test it. I hope someone makes some quants of it too
it doesn't change the memory use, it would be same as the standard model in fp8
if you mean you need to use GGUF.. then someone would need to make those, I mean I can do that too but it's somewhat laborious to do all the GGUF variants
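back-of-the-envelope version of the point above: weight memory is just parameter count times bits per weight, so an fp8 conversion is the same size as any other fp8 copy of the model and only a lower-bit quant shrinks it. The 13B parameter count and the 4.5 bits per weight for a Q4-style quant are assumptions for illustration, and this ignores activations, the text encoder and the VAE.

```python
def weight_gib(num_params, bits_per_weight):
    """Approximate size of the model weights alone, in GiB."""
    return num_params * bits_per_weight / 8 / 2**30


PARAMS = 13e9  # assumption: roughly 13B parameters for the video model

for label, bits in [("bf16", 16), ("fp8", 8), ("Q4 GGUF (approx.)", 4.5)]:
    print(f"{label:>18}: {weight_gib(PARAMS, bits):5.1f} GiB")
```

under these assumptions the fp8 weights alone already sit at about the size of a 12GB card, which is why offloading or a smaller quant comes up.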
i can do it if you tell me how to do it, like is there a script/tutorial i could follow, i just want maybe q4 km for now to test
also, is the workflow literally the same as a normal Hunyuan t2v, but just with like low steps? if you can share the workflow would be nice 🙂
would be awesome if their technique is adapted to Wan2.1 also and the i2v variants
i found this, but not sure if that is how to do it for GGUF: https://github.com/city96/ComfyUI-GGUF/tree/main/tools
What about upscaling?
I'd try adetailer / inpainting at diff denoise levels and with diff prompts, then upscale the result
And then maybe run it over another low denoise pass
Then you can downscale from that
Essentially if you want higher resolution, you prolly should start out with an intended resolution, upscale, then you can tile the resulting image and run a diffusion thing over that at low denoise
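a minimal sketch of the tiling step mentioned above, assuming 1024px tiles with a 128px overlap (both values are placeholders, not a recommendation). It only computes the tile boxes; each box would then go through a low-denoise img2img pass and be blended back.

```python
def tile_boxes(width, height, tile=1024, overlap=128):
    """Return overlapping (left, top, right, bottom) boxes covering the image."""
    if overlap >= tile:
        raise ValueError("overlap must be smaller than the tile size")
    step = tile - overlap

    def starts(length):
        # Start offsets along one axis; the last tile sits flush with the edge.
        if length <= tile:
            return [0]
        positions = list(range(0, length - tile, step))
        positions.append(length - tile)
        return positions

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]
```

the overlap is there so seams between tiles can be feathered instead of showing up as hard edges.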
Can't post my previous pics for that but ye it's what I sometimes do for better background
I sometimes make the model at each stage different to try and capitalize on different strengths
But it's also very finicky to then preserve the strength of the first model
I need a workflow in comfyUI that will take father and mother photo as input and generate image of their future child (child should have facial feature of father and mother) if anyone could help it would be great.
@woven panther sorry to bother you, just wanted to say I got AccVideo working (your fp8 version) 😮
it does it nicely in 5 steps indeed, crazy. i think i saw on author's github, they are working on the image 2 video version too (in coming weeks) , that would be very nice as well.
good to hear, I converted it to some GGUF's too now:
https://huggingface.co/Kijai/HunyuanVideo_comfy/blob/main/hunyuan_video_accvid_t2v-5-steps_Q3_K_S.gguf
https://huggingface.co/Kijai/HunyuanVideo_comfy/blob/main/hunyuan_video_accvid_t2v-5-steps_Q4_K_S.gguf
https://huggingface.co/Kijai/HunyuanVideo_comfy/blob/main/hunyuan_video_accvid_t2v-5-steps_Q6_K.gguf
https://huggingface.co/Kijai/HunyuanVideo_comfy/blob/main/hunyuan_video_accvid_t2v-5-steps_Q8_0.gguf
as well as a LoRA version, which doesn't work that good and can probably be done better, but for testing:
https://huggingface.co/Kijai/HunyuanVideo_comfy/blob/main/hunyuan_video_accvid_5_steps_lora_rank16_fp8_e4m3fn.safetensors
wow awesome stuff! il test it later 😮
ok gguf q4 ks works nicely! and quality is still very good! i didnt test Lora cause i personally don't care too much about it as an "adapter", but it's cool to have as an option i guess.
i can't wait for them to release the i2v version
btw, is it possible to quantize the Nvidia Cosmos models? like just the 7B maybe?
will quant fine yeah
That’s really amazing!
Looking for an OCR developers for a big project
if you are interested in the small test project with the budget of $50 plz ping me with your previous relevant experiences
Anyone here didn't fall in the trap of OpenAI Ghibli style? That's not even Ghibli style despite the adherence to face features.
There are so many Ghibli loras out there 🤷‍♂️
Although the new openai imagegen might be impressive on the technical side, all the showcases can also easily be done with existing open source methods. It's just people who never heard about anything other than midjourney and dall-e hyping this stuff
except for info graphics, they are extremely good
when I read about style stuff the conclusion was that style is not really a clear thing you can separately benchmark cos, while most styles only change stuff like colour, texture and lighting, some styles also change objects or layout
like if you imagined an art deco building the shape changed
reminds me a bit of the "Greg Rutkowski" style xD
like it was for some time the most often used style prompt
and he was pissed off that so many people "copy his style" and many news media wrote about it
but what they didn't get was that almost none of these AI images really had any greg Rutkowski style even if it was part of the prompt
"in the style of Greg Rutkowski" was basically a way to tell the text encoder: "give me a high quality digital painting with fantasy elements"
it was some basic prompt element like "masterpiece, award-winning, highres"... boah, I get nostalgic thinking back to the time where we had to put all this bullshit into the prompt xD
yeah they never actually looked like Greg Rutkowski
a bit like how they had Octane render or 8k
I still use SD 1.5 lol I have to do searches for good tokens regularly
hey
Hi people 🙂
hi cute patootie
can somebody tell me how to convert a pic to ghibli, everybody is on this trend, only i am not able to generate it for my self
they're mostly using gpt 4o. They might have added a filter due to copyright which will make it harder to do.
hi
hello all
hey guys, any way i can a make a wan character twerk ? any loras for this? thanks
What are you bringing to the table other than an idea? Just curious
Hello everyone. I used to use an APP called Mon_AI on my phone, it was "powered" by stable diffusion, and it let me create an unlimited number of images per day. It was really great, and probably the best free AI art generator I've used. What is the best way to use stable diffusion for free on my laptop? The APP worked great, but I couldn't help but feel like it would be better to use stable diffusion on my PC. Any suggestions?
nope, and it's less of an issue and more of a curiosity
hi :) I am making an AI-based pixel art game with a custom model related to stable diffusion - it runs in-game so you can create new creatures yourself! it is on steam (a 'coming soon' page) over here if you are interested in wishlisting https://store.steampowered.com/app/3614730/Monster_Pod_Quest/
sorry for self promo! couldnt find it mentioned in the rules or a dedicated place, so thought i would message but keep it short 
@woven panther just saw on GitHub that AccVideo team is also working on Wan too 😮
We really need some of these accelerators/faster versions so this is gonna be awesome!
still waiting for forgeui to work on 5000 series cards
Or you can switch to swarm and edit a torch file and start generating again
5000 series owner here, with forge you're gonna be waiting a while
Hello! I have created a cartoonish character that resembles an animal, and I want to dress it up. I want to make it anthropomorphic with clothes, but I don't know how. I tried SDXL + IPAdapter, and FLUX + PULID, but I didn't get a satisfactory result. I want the face to be the same, but it keeps changing. Is there another way to try? Or is there no other way than to train Lora for my character?
I'll look into it....I'm getting very tired of switching and setting everything up again. First it was automatic1111, then it was comfy (which I don't like), then it was forgeui. Now it's swarm.
Swarm is comfyUI with a easier UI
And well you can keep forge around since you can set the model paths in the swarm settings
So when it updates you can switch back but maybe after trying you dont want to👀
Thanks. Is this the mcmonkey github for swarmui? Can it do flux quickly? Are some of the same extensions available?
Hmm idk what extensions you mean but
Flux, wan, etc is possible
Also yes but depends on the card you have lol
But my 5080 does flux schnell in 4s per image
The ones that I use the most are an aspect ratio picker to lock aspect ratios so you can increase your resolution and it auto updates the other dimension.
Oh thats built in
And the other is a faceswapper. I forget the name of it
Reactor, yes.
Also one click install
What do you like about it compared to forge or other generators?
Hmm i do recommend checking the swarm discord for the 5000 fix if its not native already. I cant send it rn bc im leaving for work
no worries, I'll do some research
Flexibility, lots of explanations and frequent updates
Like
If its a feature that makes sense? Mcmonkey fixes/adds it in no time
And you can comfortably use the latest models
Like Wan/hunyuan without a big hassle
wow even video models. that's interesting
Is it generally considered to be the best generator if you don't want to mess with comfy? I like the flexibility of comfy, but I prefer to just use an interface that just works and not have to hunt for workflows and all of that. It gets really messy and confusing.
I only used comfy like twice because i wanted my own custom workflow, otherwise you don't need to use it at all, swarm takes care of it
alright.....it's sounding pretty good. I am using a auto1111 fork that works with 5000 series, but it doesn't support flux, which is what I used forgeui for. I don't know if it will replace auto1111 for me, but I'm not married to forgeui. I only used it because it supports flux and looks similar to auto1111
Ah check your dm's, theres a theme that allows for a1111 colors
But i prefer darkmode
Is this stable ronaldo discord or some random other person idk
Hi, I have been trying for several weekends to give a GTA style to my uploaded photos in Automatic1111. Unfortunately, without the expected results. I think I am making some serious mistake during the SD setting process. I tried using ControlNet, checkpoints from gtavstyle however without satisfactory results. For the sake of completeness of information: I want a gta effect that preserves the resemblance of the uploaded photo (object or person). Maybe for this I need some midjourney? Or do I need to install an IPAdapter for SD?
Please at least a little help to direct me to solve the problem and learn the methodology for creating such img2img styles. Thank you very much and best regards
you have to give more information WHAT is not working
controlnets + style lora should work to keep the overall scene the same. Alternatively, you can use IPAdapter instead (or additionally) to controlnets
if the faces are not accurately transferred, you need something like FaceID or IPAdapter
Alternatively: you can also try my Reflux plugin for ComfyUI that works really well for changing the style of an image without changing its content
see examples in https://github.com/kaibioinfo/ComfyUI_AdvancedRefluxControl
set the downsampling_factor to 2 or 3 and then use "GTA-style artwork, gta 5 style . Satirical, exaggerated, pop art style, vibrant colors, iconic characters, action-packed" as prompt. At least for me that was enough to get any image transferred into GTA style without controlnets
lol yeah your reflux node is still one of the best things out there
there are a handful of other reflux nodes that do some interesting things as well
https://github.com/yichengup/Comfyui_Flux_Style_Adjust https://github.com/yichengup/Comfyui_Redux_Advanced
similarity_threshold and noise_level in these two nodes are cool
yeah, I need a free weekend to make a new update ;_;
there's probably more tricks with redux that haven't been discovered, its a weird tool
saw someone interpolating the encoder embeddings and it looked cool
I really hoped I could find a "style-extract" but I totally failed with that xD
interpolating between two images or what do you mean?
https://github.com/yichengup/Comfyui_Flux_Style_Adjust
<--- this one I looked up once and it seems fishy
the code makes zero sense and I think his examples are all cherry picked
however, that was months ago. Maybe newer versions of his plugin work better but I somehow doubt it
ah... he now also writes this in his github page: "Note: Sorry, there is no basis for this weight separation, I just want to try it out for learning. [...] This is an inaccurate node."
so at least he is transparent about it now
they called it "redux travel" and it was using redux embeddings apparently
it was on Banodoco server
LOL ah okay
noise_level in the other one was good when I tried it
it looked a bit more natural
yeah, you can do that. It works also with normal prompts. I once made a video of myself transforming into a werewolf by just interpolating between a prompt describing myself and a prompt describing a werewolf. Worked great.
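a minimal sketch of the prompt interpolation described above, assuming the two prompts have already been encoded to embeddings of the same length (here just flat lists of floats; the function names are made up for illustration). Each video frame gets its own blended conditioning.

```python
def lerp(a, b, t):
    """Linear blend of two equal-length embeddings; t=0 gives a, t=1 gives b."""
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def interpolation_schedule(start, end, frames):
    """One blended embedding per frame, moving from start to end."""
    if frames < 2:
        raise ValueError("need at least two frames to interpolate")
    return [lerp(start, end, i / (frames - 1)) for i in range(frames)]
```

the same blend should work on image (redux) embeddings as long as both have the same shape.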
hmm okay yeah will try that one
really? That's funny. it's just adding noise on top of the conditioning
oh is that what it does?
yeah it looked okay lol
maybe I find some time this weekend to add some stuff and go over the issues and merge requests xD
CADS does something similar and I like CADS so I guess that makes sense
this thing https://github.com/asagi4/ComfyUI-CADS
it gives more varied/interesting images for SD 1.5 or SDXL
sadly no flux support
hm, with Flux its difficult
nah, if I think about it.... it shouldn't matter
you can just add the noise on the T5 embeddings
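a simplified sketch of "adding noise on the embeddings", assuming the conditioning is a list of per-token vectors. This is plain additive Gaussian noise with a made-up `noise_level` scale, not the actual code of any of the nodes mentioned here, and CADS itself uses a more involved schedule.

```python
import random


def add_conditioning_noise(embedding, noise_level, seed=0):
    """Return a copy of the embedding with Gaussian noise added to every value."""
    rng = random.Random(seed)  # seeded so the same settings reproduce the same noise
    return [
        [value + noise_level * rng.gauss(0.0, 1.0) for value in token]
        for token in embedding
    ]
```

with noise_level at 0 the conditioning comes back unchanged, and raising it trades prompt adherence for variety.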
T5 and Clip_L work differently, right? in Flux
yeah maybe this will work 🤔
I'll try it today, on the stock flux repo
yes, but CLIPL is.... I don't know how much it does in the end
not a lot lol
CLIPL is only used for the AdaNorm Layers
the noise idea is nice... also for the Reflux
I often run flux using a node that disables clip-l anyway
like I could imagine you could add noise on different areas of your image to "lower their importance" and use that also to add some kind of focus "which part of the image should be preserved mostly"
yeah in my experience injecting noise in random places of models can be good lol
with SD 1.5 or SDXL noising the attention maps can get you really high detail sometimes
it has side effects
hm... I miss the UNET a bit...
the layers in the unet had relatively clear roles you could influence
I've only used SD 1.5 in 2025 pretty much
I prefer it
in mmdit it seems unclear which layer is doing what
yeah I did a long sweep of loads of Flux block combinations on a big server once
wasted a fair amount of money on it
there is no really clear trend
the double blocks do go from coarse to fine, or the other way around, can't remember which
but individual blocks are not salient
I once wrote an attentionmap visualizer for Flux
https://github.com/kaibioinfo/FluxAttentionMap/blob/main/attentionmap.ipynb
such that I can at least understand how the prompt tokens behave in different layers
and if flux even understands certain tokens or not
ah yeah I came across that on github searches 😂
I need to try it, this could be useful yeah
I wish both comfy and diffusers made it easier to get attention maps
It was a pain to get them from diffusers xDDD
yeah I've avoided diffusers all this time and it seems to have paid off
cos after 18 months I can finally just write pytorch
😥
I liked the idea of having a unified framework. But the API is really not well designed
in particular for Flux I have to say: the original code by BFL is just extremely clean and simple
probably because they had to write from scratch instead of just adding another layer of code on already existing codebases
yeah that codebase is so nice
I don't know where autoregressive models are going to go
they seem to tend to use HF transformers
or just pure pytorch
do they? I thought its often the other way around and transformers is adapting to their stuff. But I have no clue
not sure, I went through a bunch of github repos of stuff like llamagen or infinity VAR
it seems to vary though
they're all under-trained but we have a bunch of them
I don't believe in pure AR methods for image generation xD
I bet even OpenAI is using diffusion under the hood
the pure ones don't look too good yeah
hybrid probably best
yes, hybrid is the future
like either a diffusion head, or the ones where the AR process mimics a diffusion process
something like StableCascade with an AR method as first stage
that sounds good yeah
that one is also great. I love the idea. It really shows why AR might work better
because what they found out in the end is that diffusion works great, but it can get better if you do not try to diffuse an image in one generation but instead split it up in smaller chunks and generate them sequentially
we have an absolutely massive one, Lumina mGPT 34B
if you are willing to wait hours for your 4k image 😂
however, I think AR makes only sense if you integrate the image generation into an existing llm
because only then you get the full power of "conversational image generation" we see in ChatGPT
yeah I talked to Janus Pro and other AR image models and their text conversations are not so good
@fervent thunder What are attention maps? Where would I inject noise in a SD1.5 workflow.
Hey, does anybody have experience in creating hyper realistic pictures when a lora is already trained? I tried with flux and stable diffusion but have some problems some times
Hey there
Hi
@woven panther oh they released some initial VACE stuff 😮
https://github.com/ali-vilab/VACE
the attention maps show cross-attention and self-attention scores for each set of tokens
with the noise you can inject it into the latent, the conditioning or into the activations of any of the blocks of text encoder, unet or VAE
into control net or IP adapter works as well
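to make the attention scores above concrete, a minimal sketch of how one cross-attention map is computed: standard scaled dot-product attention, with image-patch queries attending over text-token keys. Plain Python lists are used for readability, real models do this per head on tensors, and the function names are just for illustration.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of scores."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attention_map(queries, keys):
    """Row i holds how strongly query i attends to each key (rows sum to 1)."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    return [
        softmax([scale * sum(q_i * k_i for q_i, k_i in zip(q, k)) for k in keys])
        for q in queries
    ]
```

reshaping the column for one text token back to the image grid gives the heatmap people usually call the attention map for that token.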
I've never seen those maps. Is there a preview node for attention maps? What would I plug into the preview?
no this is what we were saying, that its annoyingly hard to make them in comfyui and diffusers
just look at the examples here: https://github.com/kaibioinfo/FluxAttentionMap/blob/main/attentionmap.ipynb
There are also tools for SD1.5 and SDXL I think. But I don't think there is a comfyui plugin
it's not that these maps are important for daily use anyways. You can use them to interpret models and try to understand which layers might be doing what
for example it was easy to see with attention maps that certain padding tokens are used as registers and are important, while most padding tokens are probably useless
some tools manipulate attention maps for various reasons
@woven panther i see you slowly implementing some VACE support ❤️
il wait for a workflow
Yeah I have to sleep but it does work
Tested control with depth and pose, with or without reference image, video extension and outpainting so far
Pretty impressive first impression
The base model is in fp32 and the vace model (in the same file) in bf16
It does use almost twice the VRAM of the normal 1.3B
they have a huge guide on how to use it lol
with the preprocessing
and this is just the preview version, they will release the actual 1.3B and 14B versions soon
if 1.3B is twice VRAM, 14B gonna be huge for most people i guess idk
man April just started, we get VACE, i also found out about TripoSG for 3D asset stuff 😮 (il try it tonight)
plus we have AccVideo stuff, etc,
AccVideo is a bit rough but the others are good
its hard to make video distills
there was a closed source paper that did a good one but they used something like 256 H100 to smooth out the gradient updates
which is too excessive for open source
Hi guys yall might have seen but my friend and I created Rem—a dream journaling app where you can easily record, analyze, and share your dreams on https://lucidrem.com
Hi all, just wondering if anyone has a recommendation for an Illustrious model for general use (toony not realistic). I'm using Illustrious Toon Mix but sometimes its just too cute, so i need something more general purpose
Spam
have you tried using a specific style lora
Mistoon_anime is good
original NoobAI
@woven panther I see you are still implementing VACE stuff 🙂
just wondering in the current state of your implementation, is anything ready to test
with a workflow you could provide or you are still getting it ready for a workflow?
simplest thing you can do is just plugin the new VACE node and any control signal to the input_images of it, that would do text2video with "controlnet"
2x as much VRAM sounds like it's not realistic to use the larger wan model... :/
then you can use a single reference image, with or without the control signal, but main thing is you need to either put your image on a white canvas or remove its background, so that it's white, or it won't work
yeah but there's only 1.3B available right now anyway, and it's still pretty good
you can always blockswap
if you accept the blockswapping slowdown then suddenly you can run way larger things
how much slowing down are we talking about?
Is SD working on an answer to OpenAI's new Image Gen capabilities? Would like to see the enhanced image gen on SD models
Hey guys is there any tool / library from which we can call any phone number .
Note:- this is needed for voice bot. ( Once call is connected to number we can enable the voice bot)
Sounds like you're trying to automate scamming or telemarketing?
Guess you're in the wrong place
Nope ..
See I want to implement voice bot but how to implement it until you call a mobile number
So you want to make something that could do that but stop before you get to that point?
Didn't get your point 😅
Anyways, you're probably in the wrong place as this is for stable-diffusion enthusiasts
Does anyone here use audio narration software on webpages? I'm fixing up a bit of css/html and if anyone is active and does, it'd be great to know what works and what doesnt
hmm i got acces to it but not on my current device. is the page live already?
no its a hypothetical one for a report. i was just wondering if anyone had familiarity with skipnav and whether there were sites that let you traverse back from skipnav back to the nav rather than having to go backwards.
thanks though!
my current system lets a person using a narrator skip the tedious taskbar, but if they go back. wait, nevermind, you can go back using the browser tools instead of narration. Thanks for letting me sound the thought!
Alright, after constantly having issues before, now that I have a new PC, I want to get a fresh install going for Stable diffusion. What is the best install guide to follow for Win 11 using a web UI? (used automatic1111 before)
What is chatGPT using for this Ghibli style image? I briefly entered the generation page and saw that it uses SDXL Juggernaut and Flux, or at least you can choose (correct me if I'm wrong). So it uses that, but I guess the prompt is enhanced by chatGPT, also I guess the img2img is enhanced as it keeps just the right amount of detail mostly, what is it?
I was wondering if they had some kind of ControlNet or it is just a img2img, as detail and sometimes text stays
ChatGPT is using its own model, probably an AR or (more likely) AR-diffusion hybrid. I also think it's much larger than Flux and Co
if the model is large enough you don't need control nets for such tasks
Flux for example is extremely good in such kind of things if you let it generate multi-panel images
it's just that Flux cannot use previous images as conditioning for new images. Otherwise you could probably do similar things with Flux
Could someone help me import a diffuser into Inpaint Anything on A1111? I downloaded an inpainting model from Civit but I don't know where to put it or how to get Inpaint Anything to recognize it
The directions on the inpaint anything github seem to give directions to download one from a repo you know the name of, but not a local file you have on your machine.
Hi
i recommend Forge or Swarm honestly
A1111 while nice is outdated
Guys, whats the best way to put my face on another photo? to make it realistic like faceswap?
hmm i think it's the same as a normal checkpoint (like juggernaut has an inpainting model and a normal one)
so when inpainting you select that one but A1111 is a distant memory for me so i cant help you much there
thanks, I'll check them out
in the pinned messages on #🤝|tech-support theres a few guides
personally i like swarm a lot
is the main way i see people do it. takes some getting used to
https://github.com/Gourieff/ComfyUI-ReActor
though swarm also has a extension for this
thanks, will look into this
there is ipadapter for SDXL. For Flux you can use FaceID
It's not on the list. I downloaded the epic realism inpainting safetensors and put the file with all my other models, restarted my computer, started A1111, but it's not on the dropdown menu. actually several models I don't have ARE on the menu, which means Inpaint Anything obvi keeps its models somewhere else
hmm must be the extension specifically that's acting up because i used to be able to select them just fine as a main checkpoint
though if you dont mind me asking
why inpaint anything?
segment anything for masking
for example, I don't have dreamshaper, but I seem to have a dreamshaper model available in inpaint anything.
hmm i dont see any documentation being written on the page other than "its compatible"
so one would assume it looks in the stable-diffusion folder. cant help ya here man sorry
from inpaint anything sd webui repo on github: The inpainting model, which is saved in HuggingFace's cache and includes inpaint (case-insensitive) in its repo_id, will also be added to the Inpainting Model ID dropdown list.
If there's a specific model you'd like to use, you can cache it in advance using the following Python commands (venv/bin/python for Linux and MacOS):
The model that diffusers downloaded is typically stored in your home directory. You can find it at /home/username/.cache/huggingface/hub for Linux and MacOS users, or at C:\Users\username\.cache\huggingface\hub for Windows users.
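A minimal sketch of how one could check which cached repos would qualify for that dropdown, assuming the hub cache stores each model repo as a folder named `models--<org>--<name>` and that the only filter is "inpaint" (case-insensitive) in the repo id. The function name is made up for illustration; this is not the extension's own code.

```python
from pathlib import Path


def cached_inpaint_repos(hub_dir: Path) -> list[str]:
    """List repo ids in a Hugging Face hub cache whose id contains 'inpaint'."""
    repo_ids = []
    if not hub_dir.is_dir():
        return repo_ids
    for entry in sorted(hub_dir.iterdir()):
        # Cached model repos live in folders named models--<org>--<name>.
        if entry.is_dir() and entry.name.startswith("models--"):
            rest = entry.name[len("models--"):]
            repo_id = "/".join(rest.split("--", 1))
            if "inpaint" in repo_id.lower():
                repo_ids.append(repo_id)
    return repo_ids


if __name__ == "__main__":
    # Default cache location on Linux/macOS; on Windows it sits under C:\Users\<name>\.cache
    default_hub = Path.home() / ".cache" / "huggingface" / "hub"
    for repo_id in cached_inpaint_repos(default_hub):
        print(repo_id)
```

A safetensors file downloaded from CivitAI into the normal models folder will not show up in this listing, since it is not a cached Hugging Face repo.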
but I would like to not limit the model for use only in inpaint anything (sometimes I send the mask to img2img) but I also don't want 2 copies of it taking up twice as much space
also I didn't get the model from HuggingFace I got it from CivitAI, so.. I guess it's possible it has a repo there.. hmm
Nevermind. You can configure the extension to look in another directory.
ah i was in the main github
hmm, I pointed it at the models folder where I keep all of them, but it bugged out.. I had to redownload the sam model segment anything uses, which created a file in the folder, but the segmented image is like a thumbnail and doesn't show the full image..
... I didn't read it correctly. you can only change the segment anything model folder.
oh. omg I'm dumb. there's another tab in inpaint anything next to cleaner and controlnet inpaint for webui inpaint, where all the models are listed. forget I mentioned it
Oh then the page I entered I guess it wasn't exactly the right one. Still don't understand where it is the image generator
how to make image?
anyone manage to scoop up a 9070 or 9070xt and get it to work on windows?
Yes we had 2 users here that got Stable diffusion working locally with these cards.
They followed the Webui Zluda install Guide from the pinned messages in #🤝|tech-support
hello
hi
depends but a large multiple
Can anyone help me with getting flux gym installed locally on Mac?
Hello everyone. I am new here and new to ai in general, but I have set myself a challenge. I have been watching on as my friends have started side hustles like drop shipping and automated news programs or social media channels. I wanted to start a monetized project of my own, and I decided that I would finally publish the comic book stories that I had been dreaming up for years, and I would use ai to do it. I am planning to have my story analyzed by one bot that will turn the descriptions into prompts for stable diffusion to make the art for, compile/format it into a comic and sell it on Amazon KDP. I have Stable diffusion with the AUTOMATIC1111 UI, but I don't know how to get the art to turn out right (additionally I will need consistency of character design). I tried it out myself after watching some how to videos and got some wonky results, and when I asked the ai I plan on using to convert my stories to prompts (Claude AI), the results its prompts and settings got were disturbing to say the least (I asked it to try and reproduce Garfield the cat. The images came back with two torsos attached like parts of an ant's body, three eyes and five or more legs). For consistent comic/manga art does anyone have advice on what I need to learn, add or download/incorporate?
im praying there will be an sd3.5 detailer
I wasn't expecting one cos there has not been many detailers since SDXL refiner, but sure we can hope for one
there are some occasionally, cogview 3 unet version had one
and I saw some normalising flow model refiner
Hello all
hello
only in the comfyui impact pack. it has a flux refiner
ah ok I don't know impact pack very well
it looks like a good node pack its just that I don't want the SEGS thing
but impact pack is mostly about SEGS really lol
I just sort of blend a bunch of gradients together with photoshop blends instead, and then turn to masks
Hi guys
**I wish to hire** someone that can upgrade my Real Estate Renders for my client, I have a 3D model of the project, and real life photographs from onsite, client wants to showcase the house as finished to be able to list
Hi!
hi all
turned out the Grok3 has more adherent filter face ID than Sora
what sampler/scheduler do you guys use for wan 2.1 img2vid generation
I'm having trouble getting good results with image-to-image in Stable Diffusion. My outputs are very low quality: the eyes, facial details, and overall sharpness are really lacking. But I see other users getting amazing results. What could I be doing wrong or missing?
Do you have after detailer?
They also do a "hiresfix" on images they post a lot
Thanks! I’ve heard of Hires.fix but I haven’t used After Detailer yet Is it a separate extension or part of a specific workflow? And how exactly do you use it for img2img?
Hmm it depends on the ui you are using. For a1111 its called adetailer but I'm not familiar with that web-ui
I think it's related to ControlNet using a reference image and transforming it into a different style with AI without changing the main structure. I’ve installed the extension, but it doesn’t work well for me. The results are very poor: the face and eyes often come out distorted or missing. Any idea what I might be doing wrong?
what UI do you recommend for me? I have Automatic1111
Like specifically
Sure is it allowed to send image here?
Sfw ones in #🏞|general-with-images
Okay I am gonna send with my details
Hello everyone, great to be here to learn first to contribute later.
I have started to create images with comfyUI, can anyone tell me where to find a model to generate 3D comics?
Hello
@woven panther how did you get the original VACE wan model from 7.15gb to like 1.47gb 😮
is it really equivalent and i can use your smaller version? cause that would be really nice if i can avoid the extra space
il check your workflow tonight
HEEEEEEEEEEEEEEEEEEEEEEELP
I got a 5080 and now stable diffusion doesnt work
someone said I need to update to a different version of pytorch
does someone know the easiest way to get this fixed
Bro these GPUs are so overpriced and they don't even have backwards support out of the box
Nvidia can eet a phatt dikk
well yeah, you make a new architecture, charge up the arse for it, and don't even plan on helping out the folks who were using your previous architecture
I HATE this monopoly these mfs run
It's a separate module, the original model simply includes the original Wan 1.3B in fp32. You still need some 1.3B model to use with it though.
It's pretty much like a controlnet in that sense.
Hi quick question does Stable diffusion still offer video generation API’s, image-to-video, and image generations? Also is there any restrictions or can you simply follow the API docs provided to get access to the API’s?
yeah their API is good
https://platform.stability.ai/pricing
its got an interesting aspect that they have 3D models on there as well as video and image with controlnets
so you could make a 3D model, turn to image with controlnet then turn to video, with just one API
Am I weird for preferring both Flux and SD3.5 together?
Both have their flaws for sure, but using both of them really provides me interesting images to generate.
its not weird its good to combine models
I end with SD 1.5 or SD 2.1 for every image I make, for example
That’s greattt, thank u for the info
no problem
I recommend looking at 3D stuff, the latest Gemini Pro can write working blender scripts to make use of 3D models
you can do a quick render and then go back to diffusion model to finish it
Well... I ended up getting banned from the Midjourney discord.
I probably shouldn't have said "4o Image is so good that I make multiple accounts to generate more image."
And look where it ended up.
Oof.
yea they have to ban for that, its probably in discord terms
discussing unlawful activity etc
Well I learned my lesson, I'll keep my mouth shut next time.
not rly a fan of piracy cos
I have to pay more for things due to the pirates
I am mixed toward piracy, piracy should always be a last resort. Otherwise, I pay for my stuff. But I can see where people are coming from when they pirate.
to me last resort things are like food, water, housing, medicine etc
media and software not so much
sometimes piracy is the only way when companies decide to abandon a product (old game) but refuse to re release it
not advocating for it but i get why
or when you bought software and the newer version just sucks while the old one worked better. so you download the old one etc
that's a description of the motivation "why do they want to do the thing"
rather than an ethical judgement "is it morally right for them to do the thing"
Hello
Anyone know of any good tool to use for branding / infographs / asset creation?
I got logo, colors and stuff ready for a project. Tryna find a tool that i can use as a template to create banners, infographs etc.
I am your partner in AI-powered business transformation. My mission is to bring innovative, AI led solutions, to your business problems, through a personalised human led approach. Delivering excellence for clients and customers with demonstrable results and measurable return on investment.
If you are looking for AI engineer, I 'd like to discuss with you.
Thanks
Supposedly, ChatGPT 4o can do that.
HI,
for your use i recommend Canva
for banners infographs, flyers etc
it has AI integrated but its tool is really nice regardless
Is Kohya_ss still the standard for making lora? I've made a couple character lora successfully but I've noticed it hasn't been updated significantly in a while. Is it because people use something else now or it just doesn't need updates?
Oh, looks like after a while there was one last week and before that September. I could have sworn I didn't see any before 😅
Still curious is it still the standard to use?
Hmm i don't see why not. For What model you wanna make a lora?
All sorts of models really. I'm still new to training so I'm pretty ignorant on what other people use
Heard kohya was good and after fiddling with the sliders after failed attempts I get pretty solid results now
Just weren't sure if it was the best trainer or not
I haven't watched any videos just been YOLOing AI stuff so it's been a journey 😅
where can i post the stuff i create and ask how to make them better and stuff
@woven panther it just keeps happening LOL https://github.com/SkyworkAI/SkyReels-A2
based on Wan 😮
yeah, but doesn't seem to have control support so not too interesting for me personally
yea i understand, will see what happens
when new models arrive, there might be differences and bugs between different training methods
but if you train models that are out since many months then it probably doesn't matter
just use the training tool you are familiar with
Almost tempted to use my spare 1050TI as 4GB extra vram for comfy to offload to lol. As i hit 24GB vram and 64GB ram cap way too damn often lol
Or, hell.. just checked, and i can just get a used 3060 12GB for 220, so with the offloading, i will essentially have 36GB vram 
well to make a lora you first need a data set
you want to make a style or a character lora?
character. and, like i said, i can only really start with a single picture as my data set. i heard some time ago that it's possible.
hmm maybe with IP adapter? but for a lora you generally need 50+ images minimum
maybe i can slowly get there... what's an IP adapter?
hmm it has many uses such as style transfer, etc but im not familiar with it since i like to make a style from the get go
well, i am using A1111, and, even though i always wanted, i never really got around to experimenting with lora training
so, i am mostly wondering if i should use some extension
Well heres the thing, theres one extension but its not recommended as its outdated and even broken for the newer models iirc?
what GPU do you have before i recommend some stuff
nvidia 16 gb, rather solid i think
hmm what generation? 30, 40 or 50 series?
rtx 4070 ti super
hmm you could train locally but it would use your pc for a solid while
but since you wanna try 1 image lora's its better to run locally imo
oh Cs1o is typing
👀
OneTrainer or Kohya_ss is recommended.
Both are standalone tools, no extensions.
Never use Dreambooth its broken.
id say avoid at all costs yeah sillius
i seem to remember there is a kohya extension?
its a stand alone tool
any extension is probably a "fan made" extension and probably not as up to date
hmm okay... is the inbuilt A1111 training implementation lacking somehow? or should i just start there, then?
☝️☝️☝️
kohya_ss
or onetrainer
okay, i will look into it, ty
theres also guides available for that iirc
but for the other stuff:
Its either outdated, broken or complex to figure out for beginners
It can't be used to train loras
oh, that makes the decision easier
are people even still using A1111 or has everyone switched to comfy?
Mostly forge, Swarm or comfy yeah, some people use Fooooocus (more or less o's idk)
people who use A1111 mostly gotten a guide on youtube from a few years ago
yeah well, i'm not so up to date...
i mean it still works
I'm still using Auto1111, performs good as I dont do much with flux
Forge is good too but some extensions won't work there
can't say i miss anything in A1111 either tbh...
anyway, i will look at kohya, thanks
training from one image works
but I would recommend to try ipadapter first
training from one image often means: you overfit like crazy, use this model to generate multiple new images, start again with a larger training set
with ipadapter you can directly generate more training data
5 training images are a much better start
holy guacamole
I just got A1111 running with my 5080
worth every cent of this pre-scalped card
generation speed is INSANE compared to 3090
Also thumbs up CS1o for still using A1111
I'm never switching to anything else if I can help it. This interface to me is golden
Flux is good for images with text and quick concept arts
1.5 is still king of details IMO. Also the easiest to train
:cowdance:
GM everyone
How to generate images here?
One more thing: are there (up to date) integrations of kohya and ip adapter into comfyUI?
Hey guys,
I need to run DiffusionPen and it requires Stable Diffusion v1.5 from runwayml.
Can anyone help me with this.
Yeah, are you working on projects related to it?
hi
hi all
Hello everybody
@woven panther hey thx for supporting SkyReels-A2 and providing the converted weights too,
will check it tonight. Is there a workflow available?
not yet, I'm not sure how to use it yet, 2 images works pretty nicely but there are some weirdness I don't know about
no problem ❤️ take your time, amazing work as always!
im assuming infer.py is the actual workflow pipeline, so maybe the answer is there somewhere :3
hello, while generating images i got a problem and i was hoping i could get some help! i try to generate 2 characters, capella and crusch from rezero. In the anime Capella is riding Crusch. I managed to get the positions right and to keep the characters from merging with each other, but every single time, Crusch is on top while capella is below. I want the opposite! Can somebody please help me fix this problem and make capella the one above?
Has anyone here used swarmui and sd.next? Wondering which one I would prefer more. I really like automatic1111 and forge.
Hello
anyone know how to get stab matrix working with 12.8
i keep running into issues
manage to fix some but not all
I think it's working okay now, got bit complicated in how to construct the input, but I added example workflow that uses their example images
cool, will try all that tonight, thx 🙂
rly not sure whats up with Stability Matrix or the similar Pinokio project
they tend to get recommended as easy install methods
but if they are for install why do they not resemble install scripts 🤔
they are just as hard as doing it manually
they are just painted differently
thats what makes it easier in a sense
more organized
1. download Docker
2. docker pull aidockorg/comfyui-cuda
3. docker run -d -p 5000:5000 --name comfyui aihub/comfyui
4. http://localhost:5000
after that you have docker setup with comfyui and automatic model and node installs
its isolated from your system so it is safer for security as well
and if the comfy install gets messed up you can remake fresh docker container with one command
pinokio is a launcher for all sorts of AI tools and applications https://pinokio.computer/
IDK cos they have this other computer aspect going on ```Architecture
Pinokio takes inspiration from how traditional computers work.
Just like how a computer can do all kinds of things thanks to its comprehensive architecture, Pinokio as a virtual computer is a comprehensive platform for running and automating anything you can imagine with AI.
File System: Where and how Pinokio stores files.
Processor: How pinokio runs tasks.
Memory: How pinokio implements a state machine using its built-in native memory.
Script: The programming language that operates pinokio.
UI: The UI (user interface) through which users access apps.```
you need to make a juggernaut flux
can m4 pro run stable diffusion also new amd 9070 smoothly without lag hang stutter melting?or ngreedia only option for SD
even xt will do
or not?
9070 xt?
I wonder how OpenAI's newest image generation works. It's clearly more complicated than just processing a prompt through some weights.
Does anyone here have experience with Lora Training in Flux Gym? I've been working at it and keep running into an issue where it says my training is complete after like a minute. No sample images or anything.
yah yah ... wassup ?
i have huge prompt list in pos. prompt textfield and want to add one after another to see how it takes effect.
how do i leave the other prompts in the textfield without them influencing anything...how to exclude or ignore these prompts without copy pasting and such workarounds?
Wait it seems i can use # per line ?!
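A rough sketch of what "# per line" amounts to, assuming the UI simply drops every prompt line that starts with `#` before generating. The function name is hypothetical and this is not the UI's own code, just the idea.

```python
def strip_commented_lines(prompt: str) -> str:
    """Drop every line whose first non-space character is '#'."""
    kept = [
        line
        for line in prompt.splitlines()
        if not line.lstrip().startswith("#")
    ]
    return "\n".join(kept)


if __name__ == "__main__":
    prompt = "masterpiece, portrait\n# cinematic lighting\nsoft focus"
    # Only the uncommented lines would reach the model.
    print(strip_commented_lines(prompt))
```

With that behaviour you can keep the whole prompt list in the field and enable one line at a time by removing its `#`.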
How good are Flux loras, any examples?
I want someone to show me a Flux taylor swift to see if it beats SD 1.5
hello! it will beat SD1.5
1.5 still better
?
For details
This gonna make a lot of people mad, but image generation peaked at 1.5
At least for NSFW
I just tried Deep Cache for SDXL (ComfyUI) and it makes generation 3x faster and I like the image even more.. Are there any other speedup methods?
well ill just assume you're wrong ngl especially since you havent used it in a year+
if i were to create the stuff i do on 1.5 its so much more messing around
I've used flux for some personal projects, but I'm not convinced it's good at porn
extra upscaling, adetail, refiner etc
Exactly you have to use a lot of extra tools but overall I'm still blown away with how much more flexible and uncensored it is
I just don't see that with the newer models and I don't see that with trainability
But if someone can prove me wrong, I wanna see what celebrity trained Lora on flux looks like. Doesn't even have to be NSFW.
Civitai is just full of anime
Hi guys, what is actually the best efficient AI for generative pics in open source ? And/or model ? Thanks a lot
Will there be a Stable Diffusion 4?
probably but I suspect their main product is gonna be something different
they keep hiring VFX people
and if you look at current VFX software its very much doing things that could be replaced by transformers
Does anyone know which types of SD models do a good job with text? I've never really tried it.
Bonus if it can do non-Latin alphabet characters!
If the answer is "Stable Diffusion 3", then I don't think I have access to that, unless I'm wrong
I would say none of the diffusion models is particularly good at text, but Flux is probably best
there is also a clip checkpoint for flux that surprisingly improves text a bit
oh ok, thank you
something launched recently with glyph by t5 v3
Hey guys what hosted sites (free or paid) do the best image generation with a character reference? I'm trying to get my face, but in an action scene, that sort of thing. I tried with ChatGPT 4o but it's not getting the face right. Is the only real way to do it using SD or flux locally?
@woven panther thx for providing the Wan2.1 fun reward loras! Just a question, are these only for the InP model
or can also work with the Control model? Cause I mainly use the Control model.
it does load on both, but the effect didn't seem huge with control
ah ok ty
also, from personal testing, which is better, MPS or HPS?
i mean i guess il try both :3
Hi everyone I am currently in the research part of a video I am making "AI: Creating or Killing Art?". It would be a big help if I could get some of your answers on these three questions : 1. Can AI be a tool for artists? 2. What does it mean for something to be art? 3. Is AI "art" theft? Thank you in advance!
for fun, il answer those questions:
-
I would say it is a tool, just like a calculator is a tool, or a music sampler is a tool to make music with libraries (instead of hiring a real orchestra for example, etc)
-
what is art? I have no idea, but whatever it is, it is completely subjective from person to person. You know the saying "one man's crap is another man's gold" (or something like that). But the point here is that because it is in fact subjective, then what does it matter how it was created? AI or not. I can dislike or like an AI-only art piece just the same way as I can dislike or like a non-AI art piece. so what does it matter?
for example: what do i care if you composed a piece of music with only tools and AI, im interested in the "actual thing im listening to", so what do I
care how you did it, i just like the music. and the same logic applies to art pieces.
- i mean theft in what way tho? copying a style? i didnt know there was a copyright attached to styles. you can be inspired by a style, idk...
if that is theft, then so many other things are theft too, which would be ridiculous to go into in detail.
But to answer the main question of the title: creating or killing art, i would say creating for sure. again, the judge is the end user that looks at it and if they like it or not, aka, subjective.
Thank you for your input!
Create a smiley for me
🙂
Hello everyone! Is this a good course for prompting? https://grow.google/prompting-essentials/
Any good upscalers? Please help
hey yall whats the best starter tutorial for learning how to edit images of myself with stable diffusion? i know nothing and theres 100+ youtube tutorials vids. im wondering which ones explained it best for you
any specific youtuber or guide online?
where can i find discord of unstable diff?
Oh boi, gonna train my first video lora, tested with one few seconds clip, was a success, and now i'm preparing a few thousand frames
So far 2 clips, 1700 frames total, and it still remains to have 6-8 clips. So might take a good few consecutive days to train lol. Sadly i can't train with runpod and the like because i still can't figure out how to use it 
@vapid dove Scammer above.
hello
hello
Hey, welcome
Hello
Hello 🚘
hey everyone can anyone guide me how to use the image generation
First off, what's your specs? As that way we can determine which image gen model best suited.
o/
hi!
If someone has ever used the image-to-video endpoint and knows how to get a proper video URL please let me know!!
man, i wish making loras wasn't the equivalent of stick n ball torture without the stick n balls
soul sucking process
kinda want to find a model that provides a straight front view toward the character
with illus checkpoints "front view" tends to do the trick, add "high angle" too, helps sometimes
So how's SDXL consistency nowadays ? Can you generate the same character on and on ?
i think so but honestly for ease of use and better reliability i'd just move on to illus
illustrious ?
yep
Mhmm I'll have to try then
pony just kinda feels outdated atp, only upside is due to being older has more loras
Btw, I've never tried: but can most models (or illustrious) render without a background ?
Like a character render
no clue, never tried, give it a try
plus you can remove the background in seconds anyway
by just using photopea
is forge a continuation of A1111? it doesn't just look virtually the same, it gets its extension list from A1111, too... are they actually compatible?
i'm trying to figure out whether i should finally leave A1111 behind and switch to comfyUI, swarmUI or forge.
forge is the better version
and what about swarmUI? is it built on top of comfyUI and contains all its features? no need for standalone comfy?
Went on the image arena to try the Dev HiDream. Did about 80 runs and i will say some of it is inflated, because it was put up against terrible models like 60%-70% of the time. But i do think its pretty good. Probably better than Flux pro if i had to rate it.
Do someone know how to create nsfw images with ai?like the works on pixiv
Is there some websites can do this?how do the nsfw creators on pixiv do?
you'll need a machine on your desktop and to generate there
is there a channel where i can post pictures of sand
hello
I don't know.... it's a weird model. It uses 4 text encoders, the flux architecture combined with mixture of experts, a fat llama3 text encoder where all kind of hidden state layers are used....
it feels like they tried to do "more is better" instead of "less is smarter"
actually it uses 6
nah, the architecture looks insanely inefficient
I don't have the feeling they really thought about how to make a good image model, but instead just threw everything into it they found and added a lot of money they had lying around anyways 🤷‍♂️
i have 2 cats
using other hidden states than the pooled layer helps for Flux as well
someone on comfy discord tried it
using unpooled clip l
After having seen the post from ostris on twitter (https://x.com/ostrisai/status/1909415316171477110) showing that apart from llama none of the text encoders even come close to understanding the prompt and their influence is erratic at best, this model makes no sense at all. It wouldn't surprise me if the main reason for better prompt following is that it just has seen better captioned images during training. SD3 attempted to balance the text encoders, flux chose to favor t5 and only use the low on information pooled clip embedding, but this one seems to have just gone full steam ahead, destroying the effects of the other text encoders in the process. So weird, makes no sense at all. Can't help but wonder what their data + an architecture like lumina2.0 would have resulted in. MOE is nice though, but seeing this, i wonder whether it even works well in this model.
Never tried swarm, didn't like comfy
how do i prompt here?
Here? Ya don't lol, you do it in whatever app ya using or on Civitai website
not sure about the methodology Ostris used there
if one text encoder contributes most of the magnitude then removing it will result in a very low magnitude vector even in cases where it is working well
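A toy illustration of that magnitude point, with made-up numbers: two encoders' embeddings are concatenated, one with norm 10 and one with norm 1. The names and values are assumptions for illustration only, not measurements from any real model.

```python
import math


def l2_norm(vec):
    """Euclidean length of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in vec))


# Hypothetical embeddings: the "big" encoder dominates the magnitude.
big_encoder = [6.0, 8.0]      # norm 10
small_encoder = [0.6, 0.8]    # norm 1

full = big_encoder + small_encoder
without_big = [0.0, 0.0] + small_encoder
without_small = big_encoder + [0.0, 0.0]

if __name__ == "__main__":
    print("full:", round(l2_norm(full), 3))
    print("big encoder removed:", round(l2_norm(without_big), 3))
    print("small encoder removed:", round(l2_norm(without_small), 3))
```

Zeroing the big encoder collapses the vector's magnitude no matter how well the small one is doing, so a magnitude-based ablation on its own says little about whether the smaller encoders carry useful information.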
Hi quick question, regardless of price does work with stable diffusion profit more from a 285K or a 9950X/X3D?
hey, neither would make a profit as stable diffusion uses the gpu to generate the images
yeah
but what difference does it make
jealous god damn
1.4
that does not surprise me. Actually, it could have been worse. A common problem when training on T5 and CLIP together is that CLIP gets all the weight (because its easier to learn from). This is the reason why SD3 trains with a CLIP dropout and why Flux is only using the CLIP pooled embedding
there is just no reason to use so many text encoders together. llama should be enough. Maybe also CLIP-L for some styling improvements
also they not only used lama, but they also added each individual llama layer
so they used 4 text encoders + 30 variants of the llama text encoder
this is just an insane waste of parameters X_x
this is fucking awesome out of context
"also they not only used lama, but they also added each individual llama layer" 😭 😭 😭
gimme a car with a llama layer
3 levels of protection
Decent images come out with good understanding, so it does seem to work well, does make it seem redundant though
hi
what do you guys use for image captioning? Joycaption? I am trying to generate a batch of prompts from images
ms paint
Hi, do you know someone that could help me to fix comfyui, even paid?
before trying that option, what is wrong and why dont you try posting in #🤝|tech-support
pls explain there whatsp
Yes, ComfyUI Manager won't install
but no one help me
if you have a good gpu you can use gemma3, its super powerful
@abstract quarry what about on runpod? is a 4080 SUPER fast enough
i just use paint too lol, quickly add a speech bubble or just type wtv
yeah, think so. You have to quantize it a lot, though
llms are extremely fast as long as they run in gpu
it only gets a bit slow if you have to run it on cpu (or partly on cpu)
so you might use the 12b version. That should fit into your vram easily
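Back-of-the-envelope arithmetic for why quantization matters here, assuming memory is dominated by the weights (activations, KV cache and overhead are ignored) and that a 4080 SUPER has 16 GB of VRAM. The helper name is made up for illustration.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


VRAM_GB = 16  # assumed VRAM of the card in question

if __name__ == "__main__":
    for bits in (16, 8, 4):
        size = weight_memory_gb(12, bits)
        verdict = "fits" if size < VRAM_GB else "does not fit"
        print(f"12B at {bits}-bit: ~{size:.0f} GB of weights, {verdict} in {VRAM_GB} GB")
```

On these assumptions a 12B model needs roughly 24 GB at 16-bit, 12 GB at 8-bit and 6 GB at 4-bit, so some quantization is needed before it sits fully on the GPU.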
SEAGULLS
Does (NVIDIA ONLY) on pinokio mean I can only use a nvidia gpu on that AI?
I build with make and stability.
Offer it as a service.
I build automations.
And open for jobs
Something like that mate
Hello
Hey mate
how is stable difussion on AMD gpu with windows 11?
you'll want to visit the #🤝|tech-support channel and read through the AMD gpu guides pinned in that channel
Is the latest version of Flux_dev model possible in forge?
I am getting t5 state errors, and my PC crashes.
Hello! I used Stable Diffusion to create all the assets for my game (except for the player character).
What do you think? https://www.youtube.com/watch?v=xfDJggThbmA
I've sent you a guideline, add me and follow the process step by step for a better understanding
DAmnit, can't find much info at all on how lora training works at all for wan 
As i see people feed it videos, but it uses more vram, and don't know if i can just feed it videos as frames, or if it needs to be video files for it to work lol
Hey y'all! Joined a few hours ago
I've been scouring through the internet for a more thorough understanding of the various samplers, which led me here, one of the biggest servers discussing diffusion image gen
For now, I've this question:
From what I searched, diffusion is basically solving ODEs, and there are many ways to do that. "Higher order" solvers achieve a higher accuracy at the cost of more computation per step, but should be able to converge to the answer in fewer steps total.
Then why is it that, say on SDXL, euler (first order) or dpm++ 2m (2nd order) can generate nice looking images in as few as ~20 steps, but stuff like dpm++ 3m (3rd order) or ipndm (4th order) needs >30 steps to get images without large chunks of artifacts?
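To make the "higher order, fewer steps" premise concrete, here is a small sketch on a smooth toy ODE, dy/dt = -y, comparing Euler (1st order) with Heun (2nd order). This is not a diffusion sampler and says nothing about why the 3rd and 4th order samplers behave differently on SDXL; it only shows what "order" means for the error.

```python
import math


def euler(f, y0, t0, t1, steps):
    """First-order solver: one function evaluation per step."""
    h = (t1 - t0) / steps
    t, y = t0, y0
    for _ in range(steps):
        y += h * f(t, y)
        t += h
    return y


def heun(f, y0, t0, t1, steps):
    """Second-order solver: two function evaluations per step."""
    h = (t1 - t0) / steps
    t, y = t0, y0
    for _ in range(steps):
        k1 = f(t, y)
        k2 = f(t + h, y + h * k1)
        y += h * (k1 + k2) / 2
        t += h
    return y


def decay(t, y):
    """Toy ODE dy/dt = -y, exact solution y(t) = exp(-t)."""
    return -y


def error(solver, steps):
    """Absolute error at t = 1 against the exact value exp(-1)."""
    return abs(solver(decay, 1.0, 0.0, 1.0, steps) - math.exp(-1.0))


if __name__ == "__main__":
    for steps in (5, 10, 20):
        print(steps, error(euler, steps), error(heun, steps))
```

Doubling the step count roughly halves the Euler error and roughly quarters the Heun error, which is the textbook behaviour on a smooth problem like this one.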
with CTRL+M you can mute them, with CTRL+B you can bypass them
mute = the node is inactive
bypass = the node is inactive but will output its inputs without doing something
bypassing makes sense for stuff like lora loading or applying control nets where your input is a model and the output is a modified model. With bypassing the node still works but it just doesn't do anything
for disabling entire paths, it is sufficient to mute the output node at the end of a path. Comfyui will only run nodes whose outputs are used for something
so if a certain path ends in a "save image" or "image preview" node you can simply mute this image saving node and the whole path will not be run
if you put things in groups you can right click the title bar and mute/bypass the whole group, much handy
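A rough sketch of the idea described above, in case it helps: walk backwards from the un-muted output nodes and only run what they depend on. This is a toy model, not ComfyUI's actual code. The graph layout, node names and both helper functions are made up for illustration.

```python
def nodes_to_run(graph, output_nodes, muted=frozenset()):
    """Collect every node that an un-muted output node depends on.

    graph maps a node name to the list of upstream nodes it reads from.
    """
    needed = set()
    stack = [node for node in output_nodes if node not in muted]
    while stack:
        node = stack.pop()
        if node in needed:
            continue
        needed.add(node)
        stack.extend(graph[node])   # follow the inputs backwards
    return needed

def run_node(fn, value, bypassed=False):
    """A bypassed node hands its input straight through instead of calling fn."""
    return value if bypassed else fn(value)

# Hypothetical workflow with two paths that share the sampler.
graph = {
    "load_checkpoint": [],
    "load_lora": ["load_checkpoint"],
    "sampler": ["load_lora"],
    "save_image": ["sampler"],
    "upscale": ["sampler"],
    "save_upscaled": ["upscale"],
}
outputs = ["save_image", "save_upscaled"]

print(sorted(nodes_to_run(graph, outputs)))
print(sorted(nodes_to_run(graph, outputs, muted={"save_upscaled"})))
print(run_node(lambda model: model + "+lora", "model", bypassed=True))
```

With `save_upscaled` muted, `upscale` drops out of the run set because nothing uses its output anymore, while the shared nodes still run for `save_image`.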
hey
I couldn't run it, but I didn't try much, as I use Ubuntu and there most things work and with good performance.
Sup
I stopped using mutes or bypasses because sometimes nodes would have an effect anyway
they really weren't supposed to but its not rare to get nodes coded outside of the intended Comfy way
most common issue when this happens is what they call Monkey Patching
thought about making a node that only shifts the noodle traintracks but otherwise keeps it the same 🤔
it would be resilient to those bugs
hey, anybody know where I need to go for help with hunyuanvideo?
Runninghub open source UNO ComfyUI plugin
https://github.com/HM-RunningHub/ComfyUI_RH_UNO
Functions and features
Support flux-dev-fp8 and flux-schnell-fp8
Support flux-dev and flux-schnell running bf16 on 24g gpu, everyone can use it on local 4090
Detailed bar
Real-time display to query and track
Local model loading. Does not force downloading the model from huggingface, more friendly to the CN environment
thanks
thanks
Is there a location for feature suggestions/requests? 🤔
How do I use this without Discord? I'm very confused. I'm paying for a membership but can't figure out how to use it.
How do you guys organize your controlnet models? Make preprocessor folders and put them in there? so a Controlnet/Canny/ for the canny models? and so on?
Hi! Are you paying for a #artisan-faq membership or one from stable diffusion itself?
what are you trying to do? not much info in the message. are you trying to use SD ?
still no 9070xt support in rocm 6.4
is there a way to always auto-detect size in automatic1111?
yeah, but it's at least working seemingly well for me.
ZLUDA + experimental ROCm support
Has anyone got Hidream working in comfyUI? I tried running it but it just wont work correctly. I left it on overnight only to see the generation was only 17% done in the morning which is insane. I thought maybe it had to finish downloading something but I get no errors when running it and task manager does show full utilization of my 4090.
Hello all
I am looking to learn some basics of stable diffusion, can anyone recommend some courses? I found a few on Udemy. At this time I am mostly interested in generating text to images for anime and fantasy style artworks for making a TCG
Hi, I am creating an original PUZZLE GAME for PC, Mobile and as a Board game as well. I don't have an Nvidia GPU and found it difficult to run SD on my PC. Will you generate a few pictures for me?? Result should be buildings in simple cartoon style for the game menu. Game itself is in 3D...
I recommend using a cloud service such as civitAI
To get a style and concept you're happy with, because if i were to hop on it would be longer than just a few
you can always use stable diffusion on mage.space for free
Thank you for the advice. I will try that. I already tried some cloud ai before, like leonardo.ai, but it always generated the isometric view even when I prompted front view, not isometric view. Bing/Create worked much better for me but it lacks more controls.
how about using a real photo? I would never invite somebody to an interview if I find out his photo is AI generated
putting an anime AI avatar on your CV is not recommended
what would be the best ui for me to use aside for comfy? it looks too complicated for me
Anyone know if making stereoscopic 180 deg vr videos with Wan2.1 is possible?
ReForge or Forge
Why tf is Flux so blurry 
for you 2?
yeah
you can get a bit less blur if you put wide image
and low guidance
when I say low guidance I mean like 1.4
I'll try ...
hi
gn
forge and swarmui are the most popular ones. I would also recommend InvokeAI. It's by far the easiest to use and very intuitive, but it seems to be a bit of its own community
it's an opinion from someone who used all four tools 🤷♂️
some body is smoking something and they aint sharing it
gonna respect ban speedrunners 🫡
are you in need of a mental hospital
although it seems like we can't report anymore, hm
nvm, we can, I'm blind
The application did not respond...interesting, interesting
needs a new AI by Tesla
@vapid dove your bot is out of control
well hello there kind stranger, does thou want to striketh a deal
drugs
50cent per kilogram
croatia
tomorrow
ill pay travel costs
well then thats a deal!
great doing business
no prob dude
you gotta leave now tho or youll miss the flight
😉
dont check your walls
Is it just me or can't I work in and queue multiple tabs in Forge? And do I really need to zoom in all the way every time I need to place 1 pixel to extend the inpaint mask region for necessary context? Those are actually deal breakers for me. Can't really tell what advantages Forge has over A1111. Integrated extensions (controlnet etc) aren't really required. Went back to A1111...
guys did anyone get the "Torch is not able to use GPU" error when installing sd?
zero coding experience btw
No coding experience needed. Just knowledge how to find incompatible modules.
Look in the cmd window.
Most likely you installed just "torch" and not torch with cuda compiled.
Do pip uninstall torch torchaudio torchvision
Then go here, https://pytorch.org/, scroll down, select cuda 12.6, copy the pip command, and paste it into your cmd
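If you want to check which kind of torch you actually ended up with before and after reinstalling, a small script like this can tell you. It's only a sketch: `torch_gpu_report` is a made-up helper name, and it assumes you run it with the same Python (e.g. the webui's venv) that SD uses.

```python
import importlib.util

def torch_gpu_report():
    """Describe the installed torch build and whether it can see a GPU."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed in this environment"
    import torch
    # torch.version.cuda is None on builds without CUDA; ROCm builds set torch.version.hip.
    backend = torch.version.cuda or getattr(torch.version, "hip", None)
    if backend is None:
        return f"torch {torch.__version__} is a CPU-only build, reinstall a GPU build"
    if not torch.cuda.is_available():
        return (f"torch {torch.__version__} has GPU support compiled in ({backend}), "
                "but no usable GPU or driver was found")
    return f"torch {torch.__version__} ({backend}) sees {torch.cuda.get_device_name(0)}"

print(torch_gpu_report())
```

A "CPU-only build" message is the case described above, where plain "torch" got installed instead of the cuda one.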
~~ i have to uninstall mine constantly because it overwrites the torch dev i use~~ lol
cant post images in here
rocBLAS error: Cannot read C:\Program Files\AMD\ROCm\6.2\bin/rocblas/library/TensileLibrary.dat: No such file or directory for GPU arch : gfx1031
rocBLAS error: Could not initialize Tensile host:
regex_error(error_backref): The expression contained an invalid back reference.
that's the error i get
yeah im already in tech support
hey
Hi
Hi
hello
ahh.. heck.. 128 rank lora training, 25 epochs, one save per epoch, 1.2GB per lora, and output folder for this round is now at 86GB 
wtf
more intensive
it rly needs SVDQuant or some other method
I mean you could stick it in Quanto and ask for Int4 and see what comes out the other side 🤔
duckers to me it got so bad I bought a whole SSD dedicated to AI stuff
These models and checkpoints and pytorches and all this crap takes up massive amounts of storage space
HOY
hey my friends
yup, got a whole 2tb for different AI stuff
but ngl i could do with less if i cleaned up once in a while
same im just way too lazy
but hey if my motherboard came with 4 M.2 slots I might as well use them and get my money's worth.
hell yeah, im still waiting for video ai opensource to be a bit more stable before i tackle on another nvme
Hmm wan is pretty decent if you do small stuff
no way you can run that without tens of thousands of dollars worth of GPU power
true
And ngl, simple animation is really doable in wan already
Yeah lol
can you run that shit on a 5080?
Yes! I run the 480p model no problem
whoa. Im gonna try it
I could try the 720p but eh the 480p takes 8-9min
what GPU u got
ooh
I have to say i run it in swarm & not the nightly comfyui version
I messed in some torch files to get it to work
did you have to juryrig your swarm to work with 50 series?
I see
Yeah dude I called it quits on #🤝|tech-support because this is just flat out ridiculous
Hmm but theres a guide in the swarm discord on how to get it to work
textbook case of dependency hell with this whole compatibility stuff
They usually do work but theres always this one mf component in my case flash-attention that isnt yet supported or requires some ridiculous workaround
Oh if you can wait till tomorrow after my work (it's 00:03 rn, i'm home at 6pm ish) i can send you my comfy portable
Oh i dont use teacache or flash attention
But my comfy works straight out of the box, gives you an error you gotta click ok on though because torch audio is broken due to my messing around
I use flash-attention for LoRA and Dreambooth
I dont use image gen without being able to train my own subjects
I hate genning randoms
I mostly make known Characters
Dreambooth for lora training eh?, that works on 5080's?
Or not yet
This is what bothers me, because some websites say yes, some websites say no
There are people on github claiming to get it to work
Hmm i got it mostly working on kohya_ss
and then you go into the flashattention/xformer dev forums and they say there is not
I havent used kohya in a while but last time I checked dreambooth sucked ass for it, I only used it for loras
I had to get a nightly version of torch, torch attention etc
And I understand not everyone likes dreambooth, but once you do a Dreambooth+Lora of a character you never go back. It's literally perfection
And messing in the configs was annoying because it kept replacing files here n there
For now ill stick with civitAI trainer lmao
for 1.5 it was heaven
I dont recommend dreambooths for anything beyond 1.5 bc as far as I know they never worked well
Im old-school. 1.5 still holds up for photorealism all these years later.
For anything other than photoreal you got Pony, XL etc
Less than a second for 1.5 gens with no loras
When it comes to waiting 1.5secs vs. 5 secs it makes no difference to me
Flux also gets me 6s gens
Im more about quality over quantity
If I can't train it, it's garbage - thats my motto
Same ngl but running refiners etc at a decent speed is nice
Went from 27 steps to 40+100 refinement steps and still come out on top in speed
it is nice if im making a logo or concept art
is there a fix for lora and checkpoint previews not showing?
can someone help me fix my reforge? in #🤝|tech-support please?
Done
Hi! I'm looking for a community where people actively train their own models — like LoRA, DreamBooth, or full fine-tunes using kohya_ss or SDXL.
I want to learn from others who are doing real training, sharing configs, logs, tips, and maybe even datasets.
Is this the right place for that, or can you point me somewhere more focused on workflows?
the discord servers of the training libraries
but uh
I would recommend reading arxiv instead if you want to keep up on methods
there is like an entire line of research that goes through stuff like AlignProp and Adjoint Matching that the online training community didn't really adapt to
#🤝|tech-support check this (help) ><
could you switch to comfyui?
I don't know forge but there are forge people here who could help if you switched to forge also
Hi. can you share a link pls or other links that can help me out? Chatgpt helps me a lot but damn he knows how to waste time and go in circles
AI Toolkit discord and Onetrainer discord are good
simple tuner discord is the best but might have stopped taking new members
huggingface discord does cover these topics but tends to be low on activity for these topics
I'm not actually aware of a kohya discord, this feels odd cos its the most popular trainer
but those are preference optimization methods. I think he is interested in finetuning and the current tools are quite good for that
I only used kohya and simple tuner so far. For simple tuner I'm not sure if it is supported for windows, though
they get categorised as preference optimisation methods, and you are right that that is what they are used for most of the time
but the reward model can be anything, it doesn't have to be trained on human preference
as a dumb but funny example they used a classifier that counts strawberries as the reward model
and the resulting fine tune would fill every image with strawberries
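A toy version of that strawberry story, just to show why "the reward model can be anything" also means the optimizer will exploit whatever it measures. This is a plain hill-climbing stand-in, not AlignProp, Adjoint Matching or any real diffusion fine-tuning. The "image" is a list of labels and every name in it is made up.

```python
import random

OBJECTS = ["strawberry", "tree", "cat", "cloud"]

def strawberry_reward(image):
    """Hypothetical reward model: just counts strawberry cells."""
    return sum(cell == "strawberry" for cell in image)

def hill_climb(image, reward, steps=2000, seed=0):
    """Keep random single-cell edits that do not lower the reward."""
    rng = random.Random(seed)
    image = list(image)
    for _ in range(steps):
        candidate = list(image)
        candidate[rng.randrange(len(candidate))] = rng.choice(OBJECTS)
        if reward(candidate) >= reward(image):
            image = candidate
    return image

start = ["tree", "cat", "cloud", "tree"] * 4   # a 4x4 "image" with no strawberries
final = hill_climb(start, strawberry_reward)
print(strawberry_reward(start), "->", strawberry_reward(final))
print(final)
```

Nothing in the reward says the picture should still look like anything, so the search happily overwrites every cell with a strawberry.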
I can only tell you: don't listen too much to youtube videos that tell you "the perfect parameters" or "the only way to finetune". There is one particular guy, for example, who spammed a lot of videos and he doesn't really have any clue what he is talking about ^^°
in the end you have to experiment a lot
cause the result very much depends on your training data
ye I haven't watched AI youtube for about 6 months
I got tired of the mind blown emoji
and parameters might change with every training data 🤷♂️ Although, I would argue that the data itself is much more important than your settings anyways
I HATE youtube videos that explain something. I mean, its fine when people like to watch videos instead of reading a document file. But I'm really pissed off that for many topics nowadays there is no document file or tutorial text anymore, but just a stupid youtube video
apparently Meta just grid searches params with brute force
there's not enough reason to avoid grid search yet
Blender youtube is also bad as I found out this month
yes, I'm aware of that. Let's formulate it a bit differently: it's reinforcement learning. It's a very powerful but also very time-consuming way of finetuning a model. It's not good for introducing new concepts or subjects.
ah yea it can't add a new subject matter that is true
I read that at the moment the LLMs or diffusion models outsmart the reward models eventually every time as well
"reward hacking"
like the famous "aesthetic" reward model gets hacked by making the images brown with wavy lines
gm
hi
maybe I am overselling reward training TBH
it works really well sometimes like the SPO loras for SD 1.5, they really transform that model
Hello !
hi which robot is available to create image now
When you download all these things for SDXL get it setup then you find out SORA...
generations are way better and it understands what I want
highly recommend
sdxl is an old and much smaller model 🤷♂️
you have the weights of SDXL and you don't have the weights of SORA
given enough compute you can make a better image with SDXL
Prompt understanding is way better and it saves me hours of compute time. SDXL struggles to follow specific prompts for me
I'm not rly a fan of prompting anyway
I think its better to split an image up
into tiles or segments
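For the tiling part, the bookkeeping is mostly working out overlapping crop boxes. A minimal sketch, assuming square tiles with a fixed overlap. `tile_boxes` and its default sizes are made up for illustration and don't come from any particular UI or extension.

```python
def tile_boxes(width, height, tile=1024, overlap=128):
    """Return (left, top, right, bottom) boxes that cover the whole image.

    Neighbouring tiles share at least `overlap` pixels so seams can be blended.
    """
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be >= 0 and smaller than the tile size")
    stride = tile - overlap

    def starts(size):
        if size <= tile:
            return [0]                      # one tile already covers this axis
        positions = list(range(0, size - tile, stride))
        positions.append(size - tile)       # last tile sits flush with the edge
        return positions

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]

for box in tile_boxes(2048, 1024):
    print(box)
```

Each box can then be cropped, generated or refined on its own, and pasted back with some blending in the overlap area.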
Hello party people
hello
Hi
huh
hello
not sure if this is the right channel to ask, but does openpose work with forgeUI? I've been trying to get it to do something but the generation tends to completely ignore the openpose controlmap
Hey just checking out SD.
Hello everyone
@woven panther from the same guys that did ReCamMaster, now we have SynCamMaster:
https://jianhongbai.github.io/SynCamMaster/ (also based on Wan2.1).
Will you support this too?
hey guys, what is the best chat for me to go in to ask for help?
Hi, I'm new here in Discord. I create visual art and lo-fi hiphop without using AI as well. I've been using Stable Diffusion since the beginning of this year and I am still a newcomer in this field. Nice to meet you all! 🙂
hey guys, can someone suggest some easy youtube videos to help me learn stable diffusion please
@vapid dove Spammer above
Probably #🤝|tech-support If you've hit a snag
Also, when one reinstalls windows, i guess the best course of action is to wipe venv to avoid hitting previous windows install bugs and incompatibility
I asked one question yesterday and got 4 people dming me a link to a server telling me to create a ticket for my query, is that normal
well I just found out the name is blocked by automod, so that might be somewhat telling
guys, do you have another site like Vast.AI for me to run SD? I'm having problems adding extensions on Vast, started last week
You can add extensions if you don't have --listen or --share in the webui-user.bat
where do i download
Hi there, what gpu do you have?
It's handy to know before you download
To see if you can run it or not
I don't have those but my extensions still don't install 😭 Is webui-user.bat the correct file?
I will show you
just got a 4090 laptop so thats what im working with
not much but my bday is a month away i told my mom so its a start
4090 is pretty good
oh ok
I'd check the #🤝|tech-support channel and follow one of the guides in the pinned messages. Personally i use swarmUI but if it looks confusing there are more tutorials on forge
swarmUI got it thanks