#💬|general-chat
It's my opinion :D
A1111 is as easy as it comes. A big field in the middle with an orange generate button and an image preview. All the extra stuff is tucked away neatly in settings
StableSwarm is much more open ended than that and has many more settings from the get-go. Let alone the comfy stuff
Heya ^-^
"StableSwarm" was a project from a year ago that has long since switched to independent development. If that's when you last tried it, you're way behind.
> A big field in the middle with an orange generate button and an image preview.
This is describing SwarmUI. Hell if you use the "Gravity Blue" color theme even the colors are the same.
> All the extra stuff is tucked away neatly in settings
Yes all the extra settings are tucked to the side, with an Advanced toggle that defaults to off so you only get the reasonable settings at first, and more complex settings only when you want them
Ah so that's where the confusion comes from :D
I was talking about StableSwarm, not realizing there's a SwarmUI that's different :D
Oh but it does look similar
it was originally a project at Stability, but it switched to an independent project last year
the "Stable" prefix means you last tried it around a year ago
Yeah ^-^
So wheres the hires fix button? :D
Would you just use 'refiner'? What upscaler does it use?
I know it's a bit pedantic since I could just look for myself. But like I have a way of working that works and I think UIs that you need manuals for are... less than ideal
Hires fix is "Refine / Upscale" param group. You can select the upscaler model and all that. https://i.alexgoodwin.media/i/misc/2cbedf.png
if you know what you're doing, you don't need a manual at all
if you don't know what you're doing, you can click the ? buttons which are direct integrated little explainers to help you learn
Oooh I would love to be able to finish my projects like that :D
I know I am probably not a regular user tho :D
you'll also see in that screenshot I put a lot of effort into UX clarity, like how the Pixel: Lanczos option has that tacked on (cheap + high quality) and Nearest exact has (Pixel art)... cause these names are confusing but it's easy to clarify which is good for what
Formerly known as StableSwarmUI.
yes
I am sorry if I come off as harsh. As a fellow UI/ UX and software dev I have some opinions :D
And I know fully well it could be bias because I worked with it a lot but I still think the A1111 UI is better, especially for users that have no idea what they are doing
(which I am not, granted)
I can almost guarantee that if you take the plunge and decide to mainline SwarmUI for one month (or however long reasonable time period), at the end of the month you won't want to go back and won't recommend anything other than swarm to anyone after
cause yeah there's a familiarity bias... and once you overcome that, the choice is obvious imo
I will accept that challenge, even just to honor all the work you did with UI. But I wouldnt be so sure of my opinion changing :D
I have pretty strong opinions
Once you go swarm you never go back
Can second; Swarm is great
Unfortunately, I can't create a frame using this method, no matter how I change the parameters. And I can't enter a search query, I don't even know how to do it correctly, unfortunately(
Nuke emmmm
Tried pinging the mods but the scam/spam they do is a banned word so im not allowed to post it
Its the classic getting to know you and trading
Send that to the mod
But im cooking shrimp fried rice so ill do that later
Thanks @slender vault . Not familiar with custom nodes but I'll search around. For ollama, are you saying you enter a simple prompt and it updates it and hands it over to comfyui?
Currently testing swarmUI. Where can I control Denoising strength of the refiner? :O
Afaik the refiner should work as the new Hires Fix right? And there I could control steps and strength separately
Refine/upscale. It's basically the same thing. The little ? explains a lot in the webui
Also same menu in the generation tab
But like its both? And neither?
No it is highres fix
Like I am used to having something like 30 steps and a denoising strength of like .46 :D
In refiner terms that would be Refiner Control Percentage of 1/ 100%, but I cant fine-tune the denoising strength :O
The questionmark next to the slider refinement is the steps control, just a little more advanced
Turning on my pc after the dishes, i can tell better once i see what im describing lol
woops yeah forgot its not a default thing
also hmm esrgan i mean you can just transfer it over
in StableSwarmUI\Models\upscale_models
you can add your own upscalers
Lets see...
Yeah I can use that upscaler now thanks ^-^
But it still messes up my images while upscaling... somehow :D
Like not mess up directly but like... the poses and clothes are totally different than the non-upscaled version
i mean upscaling slightly changes stuff
if its a huge difference it could be that the base settings in forge and swarm are different
forge?
or a1111 or whatever you were using before
try decreasing refiner control percentage
A1111 :D
And yeah I guess the settings have to be different... somewhere
But that just decreases the steps right?
more steps = more change, no?
This is similar to 'Init Image Creativity', but for the refiner. This controls how many steps the refiner takes.
Yeah. I am already using like the 'refiner steps' so I can get the steps I want :D
Its lots of... math
Aw gawd it really is that Refiner Control percentage :D
Even if it uses the same amount of steps in the end (18) it's different when you have a RC% of .6 and a step count of 30 vs a RC% of 1 and a step count of 18
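A rough sketch of why those two settings can differ even though both run 18 refiner steps. It assumes the common img2img convention, which is not confirmed here as Swarm's exact behavior: a control percentage p on an N-step schedule skips the early part of the schedule and runs only the last round(N * p) steps, starting from a partially noised image. The function name `refiner_plan` and the "start noise is roughly p" simplification are hypothetical.

```python
def refiner_plan(total_steps: int, control_pct: float) -> dict:
    """Describe which part of a step schedule a refiner pass would run.

    Assumption (not taken from Swarm's source): the pass executes the last
    round(total_steps * control_pct) steps of a total_steps-long schedule,
    and the image it starts from carries roughly control_pct worth of noise
    (1.0 = pure noise, 0.0 = untouched image).
    """
    executed = round(total_steps * control_pct)
    return {
        "scheduled_steps": total_steps,
        "executed_steps": executed,
        "skipped_steps": total_steps - executed,
        "start_noise": control_pct,
    }


a = refiner_plan(30, 0.6)  # 30 steps at RC% 0.6
b = refiner_plan(18, 1.0)  # 18 steps at RC% 1.0
print(a)
print(b)
```

Under that assumption both cases execute 18 steps, but the first starts from a 60% noised copy of the base image and keeps much of its composition, while the second starts from full noise and is free to redraw poses and clothes.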
Whyyy
hmm youd be better off asking that in the swarm help channel lol
im just a user
Me too :D
But I was told Swarm would be easy to use :P
i was teaching my friend how to dip his toes into stable diffusion. of course he wanted to do face swapping. 🙄
best method i know for that is ip-adapter with the lora. we couldn't figure out how to achieve that with swarmui. seems you have to abandon the front end and only use node graphs for that use case. So he learned forge ui instead and sticks to that.
isn't there an extension for that or am i misremembering
https://github.com/Stability-AI/StableSwarmUI/blob/master/docs/Features/IPAdapter-ReVision.md we followed this guide (barely a guide really) and it only gave us use of the style transfer aspects
swarm and comfy is just superior all around
for some people. others see a node graph and dip immediately. "i'm not a god damn engineer" i've heard.
comfyui is very much a programmer ui. not a user ui
i get it too. while i love node graphs, i hate sorting through other people's
ComfyUI is barely a UI :D
if youre stupid maybe
Nodes are really just a step away from writing code
Well since I also design UI, I just tend to have opinions on UX :D
Exactly. Thats not really UI-ing anything
many people are. and that's ok. if everyone were a genius, then a few would be hyper geniuses and the geniuses would be considered stupid still anyways. it's an ouroboros really
And on top of that... even if you can wrangle nodes... why bother if other UIs do that stuff for you
then use swarm if that much of skill issue
Oooh the Skill Issue. I am so impressed /s
yeah. precisely. the masses want the path of least resistance.
Even I as a dev want less resistance. I have enough of that on the job :D
consider HIFI audio. back when american audio manufacturers were all about high fidelity. quality tubes. Then along came Sony mass producing microtransistor amplifiers that were a lot cheaper in quality.
They swept the market. the market didn't want the "best". They wanted the one that worked easily and gave em beats to jam to anywhere they went
now there are no american audio giants. it's a dead market
Exactly. Why would I node stuff if I want to just test out a model and do some basic images and hires-es of those
i'm trying to be a swarm ambassador. there's just hurdles. I hit enough on my own when i'm trying to set it up for myself. So figuring out how to ipadapter face swap in swarm is a hurdle that after fucking around with i just abandon efforts and teach my friend forge instead lol
"this will work for now. don't get used to it" ||he did||
even einstein suffered skill issue. he couldn't wrap his head around QM and wouldn't accept it. "God does not play dice with the universe"... well. turns out he was wrong on that one. skill issue.
Also, had his ex wife do most of his math for him
not to bring down einstein. Genius of his time. But even he suffered skill issues
Don't get me started on Hawking lol
seriously offended by that
monkellie huh.. cw playing as another gender as cover? could be. "that guy is weird" they told me too. out of no where. tbf i was being weird. but thats a cw thing to say still
i still feel offended. cw has that effect on me
i think i'm the sole reason he's not the subreddit moderator anymore. he would've gone unexposed for who knows how many more years
i dont even use reddit. i just was in the /r/stablediffusion discord server to catch info drops. i like to keep ears to the rails
Its working. For now :D
I still experience some growing pains but so far I can work with it. Refiner Control Percentage is just a weird thingy :D
i think they reversed people's bans after pulling him outta rank. but, how many of those people would return to the community?
one bad apple will fuck up a lot of it . all that toxic rot spreads fast
i like the nodes, helps me figure out whats going on instead of filling out some bureaucratic form and waiting on line confused
So node fields aren't a form? oh..
ok
i been accused of being that guy lol. i kind of know him too. he lives near me and a couple others on the island that have been around the ai scene. we've considered meeting for coffee but meeting these online discord psychos is a high ask. (like i'm not one myself)
come on, i'm sure you can see the difference between the 2, don't be a CW
near me if a 3 hour drive is near
you cut me deep kagi
sorry
i feel like a comfy node graph is more of a dmv bureaucrat form. like how every form is entirely different and if you don't do it exactly right you've gotta go back and fill out a different form
ya i can see what you're saying
now i'm considering making a vogon poetry comfyui workflow
im a girl but then again how could you know since you never spoken to one before 😉
he was talking about me an cw. weird
ever see this stuff that pops up around Forge, another reason https://github.com/lllyasviel/stable-diffusion-webui-forge/pull/2151
Rule 37 is my creed
for some reason that author attracts a LOT of dick swinging
yeah I don't like the conflict between the GUI devs
or between model makers (SD vs Flux)
i laugh to cope
I feel like there are too many competing commercial interests in image AI world
and people often don't disclose their commercial interest
https://github.com/wileewang/TransPixar g'damn woke language coming to video models (jokes. this looks cool af)
Is it possible to generate Hunyuan videos in 30 seconds with an rtx 4070 super?
Or are we not there yet?
I mean how long it takes to generate a short video
ok thx
I wanted to suggest you ask ChatGPT, but I tried it and it's too stupid. What UI are you using? This would be easy in ComfyUI. You need a workflow for inpainting. Then you need a mask for the frame. Then you inpaint the frame using the mask and a description for that frame.
hey guys, what happens if my house electricity goes out in the middle of using SD? Will my gpu be ok?
The better question is will your house be ok?
its on again
but it went out right in the middle of a generation
Thank you for your concern, I achieved the desired result and am now very satisfied)
and its a 3090
Power outage can corrupt your file system but your GPU will be fine.
Open up a CMD as admin and run
SFC /scannow
That fixes system errors
Then clean reboot the PC.
Outage can cause data loss
Data loss meaning losing pc files?
Probably why he's here 👀
Wouldent go that far but its @fervent thunder change it bruh
It's made in late november, photo is clearly a discord violation
so surely it's not his first time
And chances are, he posted that within the first minute of joining the server
Yeah
Because it's that hard to google it
Who are the mods even, still got a scammer to report too thats active in dm's
tutti @karmic brook might be lurking 
people like that on discord are 100% child abusers. how could they not be?
its like taking it out on omegle. Only molesters are capable of that behavior
or going to a public pool during family hour in a speedo. there's only one type of person that'll do that
people like him are worse for society than say, drunk mel gibson talking about black people
discord got him i think, banner/profile doesnt load anymore
Wtf is going on 😭
yeah even on 3090 it takes like 2min for like 480p. Ltxv is a bit worse but supports i2v and t2v and can gen in 30 sec though.
Is that why people wait for i2v? for the fast gen speed?
i2v is not always better, but yeah speed is same. You do have more control because you can input a image, hunyuan is much better quality but doesn't support image to video(i2v) and is slower.
It should fit very easily but not sure if ltxv has support for amd.
so boyz, can i get some help pls?
what is best openpose control in pony
and how do i install it on comfyui?
and in what node
to put it?
it changes pretty regularly so not many people are interested in a niche to keep it running
but theres some solid sources on less reputable sites
and who would pay for it
i mean if annoying ads aren't a problem yeah
yeah but theres a few problems with that
mainly you got to assign people
or you get problems like "w-e-w edited this page on Sep 10, 2023"
honestly i think its worth it to join a community (even as a lurker) to see once in a while whats going on with the latest
because its hardware requirements are too high for the average joe
yeah
theres some smaller communities but those are anonymous that i know
but they make lora's. dont post them on civit
just on like megaupload. have threads and its much slower then here
Do you use Linux or zluda ?
based
Yep using it mainly, on Windows rn
Its a bit slower than rocm on Linux but it handles the vram usage pretty good
In using it with Auto1111, Forge, Comfyui, Fooocus and OneTrainer
I made Guides for every ui:
https://github.com/CS1o/Stable-Diffusion-Info/wiki/Webui-Installation-Guides
you quit linux cs?
can anyone help me pls?
is it a technical problem or a generic question
which pony comfyui openpose best ?
I'm updated with everything as long as the repos are updated.
Python 3.13 isnt compatible with any ai stuff so better uninstall that.
3.10.11 64 bit is the best one
am i asking in the wrong place?
On windows there is HIP SDK (ROCM) so zluda uses that
no but i don't have an answer directly at hand
but yeah the comfy channel is also a good place
On windows zluda is needed because AMD doesn't provide the full rocm feature set on windows
Yep that's why on Linux amd works better with ai stuff
Yea but maybe CachyOS brings me back
Looks very good
depending which model. hunyuan can't do that yet sadly
hello there!
i'm new to this and wanted to try out SD for the first time. was wondering if a laptop with a "NVIDIA RTX 2000 Ada Generation - 8GB" would work fine?
or should i try looking for a more expensive one - like one with "NVIDIA RTX 3500 Ada Generation - 12GB"?
Hmm with both you're gonna have some rough generation times but i don't know enough about laptop gpus to say for sure
Speeds*
But the 3050 should offer better performance but ill wait till the someone else also chimes in
how do i add a load image node?
so in general, SD would work with both, its just the time used that is different?
Yeah my 3070TI takes 14s on average per SDXL image
But since its a laptop i cant guarantee you get similar times
thank you !
I would google the gpu + stable diffusion + reddit. Sometimes it shows what you can expect
pony is a disaligned model that destroyed the unet and how the tenc aligns to it. it's very poorly trained so in effect, none of the sdxl controlnets work for it. combine that with no business seriously wanting to invest into that junk heap, and there's no controlnets made specifically to work with it.
illustrious i've heard is better. it refined sdxl without completely butchering the base weights
if you really want some guidance on pony models i would recommend using depth models at high strength instead
Am i allowed to share a workflow here (ComfyUI), I just need some help
#🧣|comfy-ui might be more appropriate
to use ip adapter in swarm, just drag an image into the prompt area. IPAdapter install option appears on the left, it autoinstalls everything including face stuff, then actual settings pop up on the left. Select the face ip adapter and fiddle any other params and hit generate.
Relevant docs are here https://github.com/mcmonkeyprojects/SwarmUI/blob/master/docs/Features/ImagePrompting.md not the old stability repo
there's also a swarm extension called "Facetools" that has dedicated additional features for face swapping stuff
join the swarm discord, anytime you're stuck just ask in open-help-chat, usually the answer like above is very easy
from what i've seen illustrious has some burn of its own, just not as bad as pony
it's still difficult to intermix loras between illustrious and normal xl
be careful, the names of gpus you listed there both look wrong. Generation names (instead of specific card), 3500 i assume was just a typo of 3050? generation names are both "Ada" but ada is 20xx not 30xx, and 3050 mobile sure doesn't have 12 gigs of vram. Whatever you're looking at for gpu options is all wrong there
for mobile chips i wouldn't go below an RTX 3070 (8 gigs of vram) - no 20xx (too weak), no 3060/3050 (not enough vram). 40xx options are much better if you can afford it
lower gpu tiers can work but will be pretty slow on modern models, and won't let you expand at all into high end ai stuff like hunyuan video or large llms
you might also want to wait a month or two before buying, nvidia 50xx series is releasing soon and is likely to change up the market a bit, at least lowering the price of a used 40xx option
i've been noticing a big influx of illustrious models the past months
would you say illustrious is better than pony models?
that seems to be the general opinion from the people using em yeah
my big test for pony and pony photorealism variants, is geographic locations. Pony doesn't know them at all.
I will be testing the same scenery of realworld locations with illustrious and variants too
not true. Many anime models still know victoria hill or mt fuji. Pony XL does not do these scenes very well
Anime has a long history of recreating actual locations. and if the text encoder of the original base stayed aligned, then the additional tags wouldn't break that training
i've tested a lot of the realism merges . geographic locations aren't well known and the realism models work so much better for that knowledge base
presumably, cyber realistic xl and the pony version both use the same dataset. you can test what i'm talking about on those two versions
Hello, is there a comprehensive guide to file types and when to use them? safetensors, checkpoints, pt, etc.
safetensors is the correct/modern format.
pickles are a legacy format that you should avoid. checkpoint, ckpt, pt, pth, bin, ... are all different ways of referring to legacy pickle files.
gguf are the cool new kid on the block specifically built for quantized models (so you can use big models like Flux on hardware with limited vram)
some software still uses pickles for things, so you're sometimes stuck with it, but for your normal diffusion models and loras and etc, pickles should never be used
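A minimal sketch of that rule of thumb as a filename check. The extension groups come straight from the messages above; the function name `classify_model_file` and the category labels are made up for illustration, and treating every `.bin` as a pickle is a simplification.

```python
from pathlib import Path

# Extension groups as described in the chat above (simplified).
SAFETENSORS_EXTS = {".safetensors"}
GGUF_EXTS = {".gguf"}
LEGACY_PICKLE_EXTS = {".ckpt", ".pt", ".pth", ".bin"}


def classify_model_file(filename: str) -> str:
    """Return a rough category for a model file based on its extension only."""
    ext = Path(filename).suffix.lower()
    if ext in SAFETENSORS_EXTS:
        return "safetensors (preferred)"
    if ext in GGUF_EXTS:
        return "gguf (quantized)"
    if ext in LEGACY_PICKLE_EXTS:
        return "legacy pickle (avoid if a safetensors version exists)"
    return "unknown"


for name in ["model.safetensors", "flux-q4.gguf", "old_model.ckpt", "notes.txt"]:
    print(name, "->", classify_model_file(name))
```

This only looks at the file name. It does not inspect or validate the file contents.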
Great! Thanks Alex! Just getting started with Stable Diffusion and there's just so much that it's hard to know where to start. I appreciate it.
i know . i typically don't use pony models. i just wanted to test the "prompt comprehension" of them since fans were blowing it up. it was all just hype though
would be better to judge it using standard benchmarks rather than whether it knows Mount Fuji or not 🤔
nah. benchmark tests suffer goodhart's law. when models train to work on the test very specifically.
i prefer to do real world cases and find the flaws through that sort of usage. mt fuji was just one small example.
when a specific metric is used as a goal, it can become unreliable and lead to behaviors that prioritize that metric over other important factors
this happened with MSCoco Fid already yeah
they worked out how to game it
probably imagereward too
each person is going to use these models very differently, so if pony and the realism variants work for someone's goals, more power to em i say
some people seem to like them yeah :shrug:
there's only one thing to presume about this. scam. the eth gives it away
and the fact that it's his first message and that the message is so vague that it would work on any server and that the bot that added the check mark has never posted anything on this server.
I get that the Indian tech scams work, because grannies just don't know enough about computers. But who the hell falls for this?
Hi. Please tell me you're not a bot.
Yea it's a bot. What a sad world.
a
as
ast
asta
astar
astaro
astarot
astaroth
gm
Hey guys, new to the discord but have been trying to create/train a model by learning through youtube/docs/article/chatgpt and using google colab and trying web platforms but after days its not working out. Can ANYONE help me. I want to make this (https://x.com/mmazco/status/1876336631080419593) myself, in the exact same style (so will use SSENSE imagery for training data + my images of me). Then want to 'try on' clothes of products i find on retailers websites.
I will PAY someone to teach me and hop on calls etc
sorry for spam
i think it's only partially possible. the trying on clothes of retailer websites is gonna be the challenge. i see in the screenshot they uploaded a bunch of selfies and two full body pictures so it looks like they just take a model with similar build + skin color and faceswap you in
getting you face swapped in is not the difficult part however
hey guys i have a gtx960, too little for SD. I have budget of around $500 for a new video card. Can I get a card that will be good enough to generate images and maybe 30 sec - 1 min clips at a decent resolution?
new to me video card
Hey does anyone know when hunyuan image to video is going to come out? Any word on it?
they for sure dont face swap only - people have commented on legs and collar bones being right
and hands etc
the twitter post you linked literally mentioned they don't factor heights and other measurements
maybe once the 5000 series releases a card might drop enough for your budget but im afraid there arent good ai cards around 500
but its gonna be okay for images
you already need a good card to generate a clip of 3-6 seconds
30s to a 1min is insane
darn
i figure to string together 3 - 6 second clips really
but what is a minimum for ai in terms of card
if theres enough cohesion to follow up on it
hmm i got a 3070TI and i can make SDXL image within 13-14s per image
nvidia is best out of the box for this kind of stuff but theres AMD users too but idk the times they get
yea i was looking at 4070s
o the price dropped on the 4070 since i last checked, it has more vram than i use
and it's newer, you're def getting better times than i would
in euros its higher lol. yeah thats an ok card. maybe not for video.
Flux images however will take some time
how about the used market, is a used high end card a bad idea
hmmm i would avoid any card on e-bay thats coming from china
usa to usa
it will be a gamble. tbh
yea
damn i forgot the digital money word is blocked
if they used it for them fancy money
its not worth it
my goal is not just image
i desire to run a small dataset on a k210 like chip
for purposes of fruit harvesting
?? so you want optical recognition
trained locally
yes, i am working with a k210, and various sensors right now too
chatgpt and i are making progress
but
SD has given me a bunch of ideas
regarding local usage, so i'm exploring that thought also
for other ai stuff that I'd goto GPT for
lol , what happens when a workflow fails via comfy api but runs when opened and run manually ??
HOWWWW
I've sent you a guideline in your Dm @fervent thunder
yes, in terms of telling users how product size chart measurements relate to their height/size and so whether item will fit them - it does not do that. But it does use their body imaging to reflect their skin colour and resemblance of height/body type. it is not just a 'face swap'
can anyone help me?
is k210 chip still relevant? I thought since then, new chips came out with much more power
Could be using a depth model (MiDaS or similar) but again you'd need to train a lora for the clothes per outfit/clothing piece
And the technology they use is probably not opensource
Or another control net to get similar shapes of the person it self
Hello everyone, looking forward to creating and contributing 🍻🍻🍻
Scan links even in DMs before opening them please 😱
you know how to do this? Will pay you to work on it with me
i mean if a dedicated team (2 people atm) with experience in this still only has a beta, i doubt a single guy (if he were even capable of this) would show you / sell this to you
i could probably do it in limited capacity but it would take much time and would be less impressive
The face is not hard, they are doing lora or faceswap using multi images as reference. But creating image of person wearing different items is hard. Anything that does a decent job at it is closed source. I know a team who had created their own model for this, they worked for months on end for it and only offer the service via licence to businesses, i think it was 20k + a year.
Anyone know of a good anime checkpoint that is on the latest version of Stable Diffusion?
flux and sd3.5 i dont think have anime specific checkpoints. you'd be better off looking for a lora . most checkpoints aren't better than the base model there anyways.
outside of those two models, sdxl anime checkpoints are basically still king. illustrious is the superior one lately
the positive thing about it that it isnt pony
its just overall quality is better and it doesnt have the same pitfall pony does with the score_9 score_8 etc
ponyxl is very poorly trained. the score_ junk should've been an indicator to anyone, but there's so much fandemonium about it that people got blinded
it got hype cos it had very strong prompt adherence on the new dataset it was trained on essentially
like if you measured Pony's clip score for its dataset versus the clip score of the original SDXL on the original SDXL dataset, Pony would benchmark a lot higher
different dataset scopes really. one started from a nothing model with 2B + images. One started on SDXL with a few 100k images
it was over 6m rather than 100k, but that's not a very important distinction
I think it was a reasonable fine tune given that the goal was to increase the prompt adherence on that dataset, and that is essentially what was delivered
https://youtu.be/MQz58wPvT3I?t=4887 he says 2.6m here in the civit interview. we both off by a factor but the truth is somewhere in the middle
What is the best way to upscale flux images in forge? The hi.res upscalers I tried are pretty bad
ah okay yea it was in the middle
I don't know forge but it probably has SD Ultimate Upscale
which is not great but will beat stuff like Ultrasharp
Flux is basically an upscaler I haven’t heard of anything that can improve flux gens
maybe but the next gen of upscaling models on arxiv look very good
heya, so I've searched far and wide on the internet and I've got a question. Is there a way to generate two characters in one image correctly according to the prompt? I'm making character art for my dnd campaign but whenever I try to generate 2 characters at once, they switch skin colors and hairstyles
I've tried many ways of specifying character 1 and character 2 but none have produced a correct result
Generate two characters in a image (open pose for pose) inpaint over the character you wish to change
Is how i do it
thats a whole lot of extra work when theres an extension to do so
Yes for forge exclusively
okay? and?
what's forge lol
What if he's not using forge like i am
a model?
oh I use the default webui
Also swarm inpainting isnt that much extra work ngl
will it mess up my folder if I get forge and have to reinstall everything
or is it just a bat file
For auto1111 there is:
https://github.com/hako-mikan/sd-webui-regional-prompter
Np
also do models just start breaking at 2048x2048? I noticed that when i try to generate that high, there's deformities and stuff
depends which
1.5 models will break above 512
xl will break above 1024
flux above that
2k
but yeah u don't do it straight at 2k
start low and enable hires fix
it sounds reductive but I would just recommend inpainting for this
i.e. generate the first and then inpaint the second
turn the denoise down to 0.2
The highest I can comfortably generate is 1500x1500
pony
when I generate at below 1024 the colors are typically wayy too vibrant like it didn't go through color correction
deep fried
yeah cos ur genning too high
1024 with highres fix will attain what ur looking for
turn the denoise of highres down to .2 or .3
i know and i'm saying you should do it with highres enabled
genning pure 1500x1500 isnt gonna be nice
if you do 1024 at 1.5x upscale youll achieve the desired results
the res that xl models were trained on is 1024
anything higher than that natively is gonna break
that's why we do highres fix, to gen at 1024 and upscale it to 1500 or w/e
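The "gen at 1024, then upscale 1.5x" arithmetic can be sketched like this. The helper name `hires_target` is hypothetical, and snapping to a multiple of 8 is an assumption based on the usual 8x latent downscale of SD-family models, not something stated in the chat.

```python
def hires_target(base_w: int, base_h: int, scale: float, multiple: int = 8) -> tuple:
    """Compute the second-stage resolution for a hires-fix style upscale.

    Each side is scaled and then snapped to the nearest multiple of
    `multiple` (assumed to be 8 here).
    """
    def snap(value: int) -> int:
        return int(round(value * scale / multiple)) * multiple

    return snap(base_w), snap(base_h)


print(hires_target(1024, 1024, 1.5))
print(hires_target(832, 1216, 1.5))
```

So a 1024x1024 base at 1.5x lands on 1536x1536, which is close to the 1500 mentioned above while keeping the first pass at the resolution the model was trained on.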
so how does highres fix actually work/does it hog up more memory than just generating at a high pixel count
maybe adds an additional 10 sec or so, for me anyways
all hardware dependent
but this is the way if you want coherent results at higher than 1024 res
i mean my results have been coherent except for my characters getting duplicated sometimes
exactly
that's the common issue with genning higher than the trained-on res
you get duplicates
artifacts
unwanted noise
iii seeee
highres fix might be faster than generating at high pixel counts then if I can just generate at 1024 which takes considerably less
if you want to gen stuff natively at 1500x you'll need to try flux, it can but it's super hardware dependent
my 12gb 3060 cries with flux
nice its a good card
when i was buying my pc i had the option of either a 3060 or a 3060ti
the 4 extra gigs looked nicer on god
anyway how many sample rounds would you say is excessive
i sit at around 50 right now
i tried 25 and it wasn't quite good enough for my eyes
35 to 40 looks acceptable and 50 is nice
i personally do 25
I mostly use 30 with 10 hires steps
oh hires has steps too

lots of stuff like steps is an "it depends" thing
sd15 was trained on 512x512 images. so if you generate a 1024x1024 image, it'll try to do that 512 patch 4 different times. leading to stretched out and repeated characters. As if it's trying to redraw the same image 4 different times.
To better facilitate higher resolutions, you want to do hires fix. where you start with a lower resolution target, and then it does a second stage , using that first lower resolution as a base for the higher resolution. denoising the base gives it a better result than just starting from pure noise.
sdxl uses 1024x1024 and other aspect ratios around 1mp as a base. but the same applies. Go above that and it'll duplicate the image over the higher resolution, so you need the hiresfix second stage approach
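To put rough numbers on the explanation above, here is a small sketch. The native sizes (512 for sd15, 1024 for sdxl) are from the messages; the function names and the snap-to-64 choice for picking a roughly 1 MP base size are assumptions for illustration.

```python
import math

# Native training side length per model family, as stated above.
NATIVE_SIDE = {"sd15": 512, "sdxl": 1024}


def native_tiles(model: str, width: int, height: int) -> float:
    """How many native-sized squares' worth of pixels the target covers."""
    side = NATIVE_SIDE[model]
    return (width * height) / (side * side)


def base_size_for_aspect(model: str, aspect: float, multiple: int = 64) -> tuple:
    """Pick a first-stage size with about the native pixel count at a given aspect ratio.

    Snapping to multiples of 64 is an assumption, not a rule from the chat.
    """
    side = NATIVE_SIDE[model]
    width = math.sqrt(side * side * aspect)
    height = width / aspect

    def snap(value: float) -> int:
        return max(multiple, int(round(value / multiple)) * multiple)

    return snap(width), snap(height)


print(native_tiles("sd15", 1024, 1024))
print(base_size_for_aspect("sdxl", 16 / 9))
```

A 1024x1024 request on sd15 covers four native 512x512 areas, which lines up with the "redraws the image 4 times" description. For a widescreen sdxl image, the sketch suggests starting around 1344x768 and letting the hires fix stage do the rest.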

Hello there everyone
that makes sense
Does comfyUI generate one person one after the other if doing an image with multiple people or how does it work
diffusion works on all the pixels at once. there is regional prompting, but that'd be a specialized workflow towards that. Each region would get its own prompt.
This is different from an LLM where it generates the next token at a time.
Okay I will look into regional prompting, thanks
Hey! has anyone explored the SPAR 3D workflow for room generation? Like if i got a bunch of 360 images at varying heights could this potentially result in better mesh generation? https://stability.ai/news/stable-point-aware-3d
https://youtu.be/8eHYYFgzNW0 this is the best i can get out of metashape
Hello everyone
gm
is RealVisXL v5 a model that can be used to train LoRAs? (https://civitai.com/models/139562/realvisxl-v50)
I've tried training a lora with this model and I'm getting nothing
is it my settings in OneTrainer or the model? I'm not sure if i'm wasting my time or doing something wrong.
what kind of lora are you trying to make
a regular lora, i'm not sure what other lora there is
style, concept, character
character
real or anime based
real
how big is your dataset?
i've tried small and large
30 to 150 images?
yes
how many steps?
100-5000
do you save like every 400-500? so you can check midway
4070 ti super, probably 20 hours of processing with nada results
i save whenever, tried every epoch and checked the results, but nada
i'm using the correct keyword
20 hours?
tried changing the keyword too
damn
i would expect like less. i personally use Kohya_ss so i wouldn't know how well the parameters translate over
but mine roughly look like this as well for realism (see post)
https://www.reddit.com/r/StableDiffusion/comments/14x6o2c/finally_figured_out_how_to_create_realistic/?rdt=54950
tried varying the learning rate from between 0.003, 0.0003, 0.00003, 0.000003, nada
image repeats, 10-50, with varying epochs, nada
maybe someone familiar with OneTrainer can pitch in
i've put my settings in #🏞|general-with-images
You can make a few free images thru copilot bing and civit ai
Otherwise you need to either pay for it or get it running on your own computer but you need a good pc for it
there is no best possible way. a lot of it comes from building your dataset and capturing it accurately. there are a number of approaches you can use to do this, but what gets you good is practice and iteration. trial and error.
consider pottery. there's no best possible way to do pottery. you get the tools and you practice your methods until you're good.
instead of starting with making full checkpoints with dreambooth, start with loras or embeddings. use these faster approaches in order to learn proper dataset skills.
is the support discord that people link in #🤝|tech-support and then delete legit? seems fishy
i skipped it as i'm not familiar with AMD, but check the pinned comments in tech-support for the guide written by Cs1o. he's here often too, and as far as i know it's the best working guide
Yes the people who private DM you or invite you to a support discord are scammers
i once got invited to an "ai art" discord where they tried selling ai art but it was just a bunch of people saying good morning to each other and nothing else was going on
every day
i left after a while
Can anyone recommend what they believe to be the best possible video explaining how to use DreamBooth?
hii, dreambooth isnt really recommended anymore
nowadays it's Kohya_ss (i hear some people use OneTrainer but it never worked for me) and civit_ai training
theres more options but i dont know about those or how well they work
I appreciate the reply @atomic mortar. Does anyone know of a good video of people using Kohya_ss to make a model?
Is merging A model 30% + B model 70% the same as B model 70% + A model 30%? Is the order important?
hello, im havin trouble getting an extension to work. can any1 assist?
i would add more context in your tech support post <3 if i had to interrogate you for details just to find out i don't know the answer, we'd both waste time and be annoyed
Alright, thank you
Its a sub variant of sdxl like illustrious is. They are a bit specialized
I don't recommend pony models, while they have really high tag cohesion it's really poorly made
Illustrious is better with the same benefits of knowing a LOT of anime characters from the get go
Its anime, pony and illustrious have made realistic spinoffs but i dont really like them a lot
But juggernautXL or realvision v5 are pretty good models
Just lack of real people in there. Nothing a lora cant fix though
Hi there. I'm trying to use Automatic1111 to generate AI YouTube thumbnail. I found SECourses tutorial on Dreambooth extension, but it looks like the information is outdated and sd-dreambooth-extension is no longer maintained.
Question: what is the most beginner friendly way to train models to make AI YouTube thumbnails?
if you like that youtuber, same guy has a much more recent training video https://www.youtube.com/watch?v=FvpWy1x5etM
i have yet to install this, just to make sure I don't mess this up I just extract into my stable diffusion folder correct? nothing else?
It has to be git cloned into the extensions folder
guys, is there any rule of thumb for "fast" model?
like... I just wanna generate pics for the lulz, not quality.... what are the base models from CivitAI... or such sites... I can download? 
it's doing some error "failed to merge unrelated histories"
no clue what that means
google would have instantly given you models, but ok, just find turbo models that work well with very few steps
https://huggingface.co/stabilityai/sdxl-turbo works with just one step
Feel free to share the complete cmd log in #🤝|tech-support
hello, is there a way to install stability ai3.5 medium on the webui and not comfyui?
Hey, yes in forge dev branch it works
thanks!
hey everyone, nice to be here
how does the style presets (style_preset) work, I'm getting the same output regardless of the value?
for example, I can clearly see that the pixel-art preset does not work...
style preset in what? web ui? forge? swarm? comfy?
yea sorry, I mean Stable Image Core (the only model that has style_preset)
im new to stable diffusion, can i use any model?
or do i need to download xl base 1.0 first?
and then download the model?
any model - you dont need the base models
Hello.
How many images would I need to train an IllustriousXL character?
I've done a few Pony models usually in the 60-80 range. Is it the same there?
50 to 150 yes
I'm using the API
ah then no clue
hi hi
Congratulations you are in one 🎉
does anyone else use DEIS BETA for flux sampling settings? any other recommendations?
DEIS BETA is good
hi can anyone please help im new 0.0
getting an error of: RuntimeError: Couldn't clone assets. when i trying to start
what is the best and fastest way to make filter pictures normal? I want to preserve the original image, but make it more normal
Could you elaborate? You want to lets say with the dog ears snapchat filter remove the ears? Or do you mean something else?
You can also dm me if its more convenient, working rn but shift is almost over
I'm looking for a site that has both art and objective ratings of its quality
Trying to train an aesthetic classifier
Anyone got ideas?
hi all
hi team
Now that the 5090 is announced with 32GB VRAM, can we finally use dreambooth on flux? What new AI tools can we use that were out of reach from 24GB cards?
I made a calculation of Intel B580 performance in stable diffusion and can't post it on r/stablediffusion, the filters keep blocking it. Can I ask for advice on the post text here? Or is there a better place?
I m going mental
send a modmail on the subreddit
Send
there are six mods, should I pick 1 randomly?
Just send it here man
check general with images
I guess in offtopic channel if you wanna be on good terms
bot
bot
bot
You're not a bot, but you also didn't join the server just to say hi and then never say another word again. Every single user in here that joins just to say hi is a bot. My guess is they try to avoid bot detection by first joining and posting on multiple servers before they start scamming.
yup
it would stop a lot of the entry level questions that could be solved by a simple google search too
fair
the airpump guy from earlier wanted an impossible scale project. automatic editing to match shadows, reflections and add in a airpump to random homes with about 10k photos
its easier and probably faster to just photoshop one in at that point tbh
i started a bit early, thought it was neat but a bit too limiting
but now its pretty nice
it was for some sort of personalized business letter
but he had already gotten the answer on fiverr before that it's simply not possible yet
love how he dropped the bombshell halfway through me trying to do an image for him
Yeah
Whenever i see something that's not getting replies or is gonna spam the server im letting them dm me
Granted it's mostly prompting or inpainting tips
I just hopped into bed but if you're down i can see what's the deal with your prompts
Slept wrong yesterday so my spine is all messed up (ive reached THE age)
Mood
Ive been hitting 11-12's while waking at 5
I can manage ish
Stay up till desired sleep time
Ez fix
That messes it up lol
plug in ollama and use it to help lol thats what i do
straight from my posterior, probably something like they are different IEEE standards that were implemented at different times, the older card would require a Bios flash and people will break things and adoption of newer cards will be slower
you also have the exponent, so you definitely need to do some conversion.
they need the hardware and the software to support it
well you can definitely use fp8 and save some memory, but you would need extra instructions for the conversion.
probably cuz the quality is ass
I dont do anything less than 32 these days
for poor ppl ig
oic what u mean
yea it makes no sense
i mean anything less than the 5080 will have 12 puny gigs of vram
fp8 quality is almost indistinguishable from fp16. Most weights don't need that much precision.
so I guess having a modern card =/= having a good card per se
I have spent literally months perfecting my shopping bots to ensure I get a 5090 on release yet I still worry that it might not work somehow
lower precision seems to affect diffusion models quite a bit less, probably our crappy eyes. But in pure data lower precision is less accurate
Yea unless you do serious simulations, more FLOPS is better than more precision because in many use cases you just don't need a lot of precision.
it's supported on 3090 too, just 40 series have hardware support for managing it. 30 has to do it in software
the models begin in full precision. flux never got a technical paper so i don't know if it was trained in full precision or not. that's floating point 32bit. there's also double precision floating point, which is 64bit, but that's not used in this field (afaik).
So then they're converted to half precision, which is floating point 16bit. then fp8 is half of that.
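To put numbers on that halving, here is a minimal Python sketch of weight storage per precision. The byte sizes follow directly from the bit widths above; the 12-billion-parameter count is a hypothetical example, and the function name is made up for illustration. It only counts raw weights, not activations, text encoders, or VAE.

```python
# Bytes needed to store one weight at each precision.
BYTES_PER_PARAM = {"fp64": 8, "fp32": 4, "fp16": 2, "fp8": 1}

def weights_size_gib(num_params, dtype):
    """Raw weight storage in GiB for a model with num_params parameters."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3

# Hypothetical 12-billion-parameter model, for illustration only.
params = 12_000_000_000
for dtype in ("fp32", "fp16", "fp8"):
    print(f"{dtype}: {weights_size_gib(params, dtype):.1f} GiB")
```

Each step down halves the footprint, which is why fp8 is attractive on cards with limited VRAM even when the math still has to be converted back up for compute.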
It's harder to engineer a solution that runs math on lower precision numbers without having significant error i suspect. so it's done with software solutions first, and then engineers create hardware instruction sets that do it more optimally.
New instruction sets are hard to create and there's only so many engineers and driver coders that can do it
any good ollama llms for prompting?
I have llama 3.2 running, it didn't seem to do great for some of the prompts I tried. It was good for things like midjourney and some SD1 stuff, but not so much for pony or a few others... it's possible it was my request, but just seemed meh. I'll have to check out vicuna, I was going to see if phi4 does a decent job but saw your post and thought I'd ask.
you need to use the advanced node and give it a preprompt
without the pre prompt its awful
I had one that it said I should put in something like "you're an image generation bot and will provide the best prompts for a certain subject" or something dumb like that, I'll do some more research. I know that using the image plugin for OpenWebUI has worked for some stuff (e.g. have it tell me a story about a pig in a house, then ask it to generate a picture based on the story).
yeah no lol you need to give it more than that
im playing poe2 rn so cant rlly share my prompt
I'm sure I do, I never got serious into it, just saw different things people put in theirs
my pre prompt for ollama `You are an Expert Prompt Engineer specializing in creating unique, highly detailed, and visually compelling art prompts for image generation models. Your goal is to transform a given subject into a richly descriptive and creative prompt, emphasizing originality, vivid imagery, and intricate storytelling.
Refined Instructions:
Focus on Originality:
Do not mimic the example prompts. Use them as a structural reference, but make each new prompt distinctly creative and tailored to the given subject.
Add Unique Flair:
Introduce unexpected or novel elements that elevate the subject. This might include surprising contrasts, unconventional perspectives, or imaginative details.
Detailed Storytelling:
Approach the subject like a scene from a story. What is happening? Why? Add contextual depth to immerse the viewer in the scene.
Rich Visual and Emotional Description:
Include sensory details (textures, sounds, smells, lighting) and emotional undertones to make the scene compelling and relatable.
Dynamic Composition:
Suggest movement, interaction, or tension in the scene to avoid static or repetitive compositions.
Revised Example Prompt:
(masterpiece, hyperrealism, absurdres:1.2, ethereal chalk textures, vibrant oil-like tones), A fiery red-haired woman with wild, unbridled locks that ripple like flames in the evening breeze stands on the prow of a ship as it crests a towering wave, the storm-tossed ocean alive with fury and grace. Her hands grip the ship's rigging as thunder rolls and lightning streaks across a turbulent sky, casting dramatic shadows over her determined expression. The rain mingles with the salty spray on her face, creating a glistening sheen, while her eyes blaze with an unrelenting resolve, her voice raised in a defiant song to the storm itself. Around her, the sails billow and strain against the tempest, illuminated in fleeting flashes of golden light, as the ship battles nature’s chaos, surrounded by a sea alive with wild energy. (dynamic tension, atmospheric drama, story-driven composition).
Additional Notes:
Encourage Variety in Descriptions: Suggest incorporating varied settings, lighting effects, and atmospheres to break away from repetition.
Use a Wider Range of Art Styles and Themes: Experiment with new artistic styles, such as surrealism, abstract realism, or dreamlike aesthetics, when appropriate.
Prompt Enhancement: If the subject feels too simple or repetitive, suggest ways to expand it with unique elements, such as cultural influences, mythological references, or unexpected juxtapositions.
Respond only with the prompt, do not add any header or footer information such as 'here is your prompt".`
anybody know how to use AI to do parody songs? Like have a voice model sing a song with different lyrics but with the same tempo/style/notes as the original song? Like what Weird Al does but using AI?
suggest ways to expand it
Respond only with the prompt,
This is contradictory
(specifically because of the "suggest" as opposed to "use" or "include" or whatever)
ic, ill have to revise. thank you for your input
Thanks for all that, I'm going to have to do some digging into it and see what I can make! much appreciated
im downloading comfyui with hunyuan and im wondering what hunyuan model should i install? to do videos in 720p at least consistently for like tiktok and insta reels. I can choose from the bf16 model to like q3 even to q8 (q7 skipped). i have a rtx 4060 ti with 16 gb vram and 16 gb ram
all the models are from the hugging face website
hi
Automatic1111 and Forge no longer being maintained?
So I have not been around for some time, but last time I looked Hunyuan running locally requires at least 45GB (V)RAM for lowest settings.
No now it's on comfy with way less requirements
Yes 8gb works but it will be pretty slow.
GM
any opinions welcome .. where is the best place to share generated images, perhaps including fails (contributing to training data for bad anatomy detectors perhaps)
Civitai is a great spot. Otherwise X or twitter if you want to try to get a following
And then there's some ai discords
most platforms have raised API fees to discourage indiscriminate scraping, right. my thinking is "i do want to contribute to training data" .. not really chasing a following (these generations are run of the mill) i just figure if i've burnt the GPU time creating them i might as well make them available as public data
Hmm well ai trained on ai content is actually Poisoning them iirc
Correct me if im wrong chat
I agree but even being available for an AI image vs real image detector could be helpful to someone (I would clearly label as #AIart etc)
I mean then socialmedia is still the best place, biggest reach and used by ai scrapers alike
720p will take ages with 16gb even on fp8. better to run on half resolution and upscale. though you can use torch.compile with that quantization, its still slow.
does anyone know how to prevent latents from being cleared from memory in comfyui? it's so damn annoying. sometimes after generation when memory is freed, the latent goes with it and i have to start all over again. I'm talking about latent upscale. Pretty randomly the original latent will have to be generated again even though I did not change any parameters that would affect it. So I guess it's because it was freed from memory.
Yes I know I could just save and load the latent.
whats the relationship between Flux & Stable diffusion .. is it a completely different model that happens to borrow the architecture or something , what are the pros & cons . (I haven't been following the world of diffusion models closely recently, just getting back into it.. i just started using flux and had used ancient sd1.3? ages ago)
GM
Hey guys, what is your preferred interface for creating images with AI right now?
Swarm
It's an actively developed ui with great documentation, helpful tooltips and it has a comfy backend if i just wanna do some wacky/specialized stuff
Compared to a1111 that didn't get an update since July
And honestly it just works really well, i love the metadata downloader from civit
yep, it is dead compared to what it once was
And it helps that the developer is also really engaged with the community and is really approachable. Just don't try to sneak in any greek characters into his code though like Σ
A1111 has forks and branches that keep updated. It's hard to compare to that since it's not a monolithic project like Swarm or Comfy are. it's so fragmented across many different versions. Even the OG A1111 started as a fork of another project
i really try to be an ambassador of Swarm and i'm trying to familiarize more with it, but i keep going back to forge to play cause i know it. I don't have to mess around figuring out what i need installed or where options are. The flow state comes more naturally
that consistent UI is basically non existent in comfyui. Every workflow author seems to have a different idea of how to organize the process. Sometimes i load one and there's absolutely no indication of where to start. I would assume top left, but so many authors put the prompt nodes in the middle somewhere
the respective trainer discord, OneTrainer has a discord full of training sections to talk and share
I believe we're far past bad anatomy embeds . That was more of a paradigm in the sd 1.5 and sd 2 days
is it normal that sometime, even though i erase a prompt, the generated image still use that prompt? should i just reload the UI?
I have a question, are the WebUIs all just a different interface for "Stable Diffusion" or do they have differences in how they actually generate things?
like for example I tried doing img2img in InvokeAI but it gave different results from what I came to expect in auto1111
even though I used the same checkpoint, CFG scale, etc.
there are so many ways they can be different
Hey all. I recently switched from Automatic1111 to SD forge. So far I like Forge a lot more. But I am having an issue with controlnet IP_adaptor faces. They keep coming out very cartoony, even when using realistic checkpoints, images and negatives. Automatic1111 didn't have these issues and I am wondering what the issue is.
better to use reforge. it's less of a development gongshow. i don't think gpl 3 trolls are attacking it either.
I'm not sure why the ipadapter would be more cartoony on one than the other. they're the same models. in theory.
I use the sdxl v2 ip adapter generally. you need the corresponding lora as well. Make sure that sampler and steps and cfg all match as well. cfg around 5 is my preferred.
Ok, thanks for the heads up.
I am using the Preprocessor that was installed in forge, could that be the issue. It's called InsightFace+CLIP-H (IPAdapter). I added sdxl v2 ip model manually.
its the same as the one in a1111 it's just named differently in the ui
Ahh, ok. Thanks. hmmm. I am stumped lol
forge probably won't get updated much. other people are updating it now to gut a bunch of the code from it. Apparently GPL3 violations.
as if comfyui didn't copy paste any licensed code in that project ever
@Lone @Sunny @jade wren @palp @hallow edge @SpaghettiMonster is da twitter hacked u posted a ca
wtf
@austere marsh @bleak matrix @hidden dagger @hasty sage
gotta be hacked
how none of them here 😭
pinging ppl in slack thanks!
Did you launch a token?
people have been alerted
so real or no
is this token for real?
theres no way
no it's not real
Twitter hacked?
hey! just got in the server im an aspiring musician, is there anyplace i can plug my new mixtape? haha wanted to ask before doing so just in case, any rules for the server i can read?
hacked?
Quickly make an announcement if it is hacked so that the number of victims of loss does not increase.
you guys have to be hacked right?
Hi all, we're handling now. Please don't click any links!
obviously not real lol
THE POSTS ABOUT $STAI ARE NOT REAL
why is cryptocurrency stuff always so dodgy
its been 16 years now since Bitcoin was invented and it is still mostly used for scams
some troll has been going around launching fake coins. They did a fake SwarmUI one earlier too. There literally isn't a swarmui twitter so they just registered a twitter account to fake announce with
Did he write such a long article in advance? Are you sure it's not a premeditated official fraud?
its scary cos I could imagine people falling for that
I know someone who is not tech-savvy who lost 4 figure sum to a similar scam
literally just. don't give your money to digital currency launches
(I'm not saying the correct word because automod blocks it lol)
like, how to tell if it's a scam: if it's a digital currency launch from a corp... then yes it's a scam lol
if you want to invest in that go get bitcoin or whichever of the established coins, not a random corp memecoin
holding something like bitcoin is fine yeah
all of our pensions have some bitcoin
no
Is the x account hacked ?
so it's scem or nah?
it is a scam
twitter hacked
do not click links
cause coin not rugged yet
?
apparently
don't send them anything don't click any links
where is james cameron when u need him
# FYI: Twitter is being handled. The most recent post is a fraud. Please don't click any links.
Now fixed.
FYI: Twitter is being handled. The most recent post is a fraud. Please don't click any links.
it's fraud
I was affected by that scam
in what scenario would putting money into a random memecoin launch not be just shoveling money into a furnace? you were never going to come out of that ahead even if it wasn't a scam
Fr..
Reminds me of that woman who spent $800k to finance "Brad Pitt's" kidney surgery lmao. These people don't even read a white paper to know how the project will make profits. They see a familiar name and drop $1000 instantly.
Didn't every coin fail but eth and bitcoin?
biggest rug pull since SD3 
It's not a rug pull. It's magic beans.
ugh
the big rush to fix the X is a little funny, seen scammers fishing in here entire days
Hi, I’m running the stabilityai/stable-diffusion-xl-base-1.0 model from Hugging Face and successfully generating images. Are there any metrics related to image generation, such as the time taken to generate the first pixel or similar?
iterations/second or seconds/iteration
?
metrics related to image generation
oh ohk , is it like predefined ?
for example for an llm , we tend to get few predefined metrics
i dont know about predefined but it refers to the amount of time per step to generate and people use it as a metric
oh ohk , is that metric shown when an image is generated ?
in some of the software, yes
https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0 if i run this directly image gets generated , but i dont see any other metrics
im running in colab
mmm i'd guess you're running python scripts. while i'm sure it would be possible to see some metrics, i don't know pretty much anything about it. i use ComfyUI and get stuff like this included: 2/2 [00:08<00:00, 4.13s/it]
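For anyone wanting that number out of a script, here is a small Python sketch that reads a progress line like the one above and reports both seconds/iteration and iterations/second. The function name and the regular expression are illustrative assumptions based only on the format of that one line, which can show either "s/it" or "it/s".

```python
import re

# Matches e.g. "2/2 [00:08<00:00, 4.13s/it]" or "50/50 [00:10<00:00, 5.00it/s]".
PROGRESS_RE = re.compile(r"(\d+)/(\d+) \[.*?,\s*([\d.]+)(s/it|it/s)\]")

def parse_progress(line):
    """Return step counts and both speed metrics, or None if the line does not match."""
    match = PROGRESS_RE.search(line)
    if match is None:
        return None
    done, total, value, unit = match.groups()
    rate = float(value)
    if unit == "s/it":
        seconds_per_it, it_per_second = rate, 1.0 / rate
    else:
        seconds_per_it, it_per_second = 1.0 / rate, rate
    return {
        "done": int(done),
        "total": int(total),
        "seconds_per_it": seconds_per_it,
        "it_per_second": it_per_second,
    }

print(parse_progress("2/2 [00:08<00:00, 4.13s/it]"))
```

Multiplying seconds/iteration by the step count gives a rough sampling time; it does not cover model loading or image decoding.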
ahhh , those are like inference steps , you can change those steps. 2/2 to even 50/50
Im looking for model specific metrics.
??...
Cmd would say (depending on the ui) for total time
But not for first pixel
Depending on the UI it might attach it to the metadata too
oh ohk
hello, can someone explain me the Distribution & Attribution for the community license? If I create images under a community license, and then use it in my app or in a video do I need to write “Powered by Stability AI" somewhere?
hello, i need help. If i have a result i want to keep. i just want fix fingers, hand of main object. how?
You also need to donate blood to Emad. If it's Type B. If not you have to buy him beer.
When it comes to stable diffusion you don't really have to worry about any of that. Create whatever and do whatever you want with it. If stability ai asks you if you used their ai to generate your content you can just say you use Microsoft's dall-e and if Microsoft asks you if you use their dall-e you can just say you use stability ai's ai. In reality, they are not going to care or do anything about it. But it would be 'nice' if you did what they wanted though, more or so as a promotion for them.
It's only if you use the model in your app or service that requires licensing. outputs are fair game. And if you think the lines aren't that defined, then the other line to consider is that your company must be making a million in revenue before licensing is an issue.
I'm surprised that someone as weird as him never got me tood. It's always the ones you least expect to be the most moral and upstanding people.
Dark jokes for a dark artist. RIP that guy
Yes. Exactly.
hey guys im a print on demand store owner and thinking of using stable diffusion to generate large images at 2048 x 2048. though these still need to be upscaled to fit print resolution. any information anyone has would be appreciated as ive never used SD. also are there people here that use AI generation for business purposes?
Hi! I’m trying to upscale a real photo of an animal using Stable Diffusion, but most resources focus on AI-generated images, not raw photos. I’ve tried Topaz products, but they don’t provide the realism I need. Any advice or recommendations?
Hey everyone!! Fukin rookie question !
How do you get SD to make 10 photos only changing the CFG scale in each photo from 1 to 10 automatically?
Just annoying to generate 10 different images individually lol
What is the best webui for AMD GPUs currently? I have a 7900xtx. I've used SDNext and Reforge. I can't seem to get reforge working anymore but SDNext still works fine. Do any have support for the latest ROCm versions?
I have the same GPU, and I just set up comfyui last night - so far it's been workin pretty good 🙂
go check the pins in the tech support for a zluda amd install guide for windows, if youre on linux you could try comfy
Forge or Auto1111 or Comfyui work good with the latest ROCm.
Checkout my Guides in the pinned messages of #🤝|tech-support
another question, If I decide to generate images only with API, and use them commercialy, I wont need to get community or enterprise license, correct?
You re agreeing to some kind of license regardless the tool you re using, be it the API, local, online services, etc.
And I m sure not a lawyer so I m not gonna tell you if you have the right to use them commercialy. Depends of the license, your country, etc
license will still apply
my advice would be to either use completely open licenses like Apache 2.0 or MIT, or pay for full enterprise license
I wouldn't use any of the intermediate licenses
if this is just casual or small scale commercial then it does not matter much and you can do what you want though
anyone know if there are any merged controlnets for noobai?
need here some help.
I'm trying to create a lora of my child (15 months old), have selected some photos taken from the phone and some sessions that we had with him.
The photos are all bigger than 1024.
Tried to create a lora with OneTrainer using 200 epochs, 4 batch, learning rate of 0.0003, the SDXL_VAEFIX model but the result is horrible.
Would like to know the following:
- since i want a lora for this face, should i try to crop all photos to only include his face, or at most the upper body?
- should i resize all photos from the dataset to 1024? if so, using which program
- should i use another model to train it?
My specs are:
- Ryzen 5 3600
- AMD RX6800 16Gb VRAM
- Using ZLUDA ( since I'm with AMD )
Is it possible to feed an image to stable diffusion and it generates something from it?
(sorry I only started dabbling with this yesterday)
Image to image, yes but you still need to specify a prompt
Or you can use lineart or controlnet canny etc to use as a base image
Openpose to extract a pose, depth map to extract more information from it
Ok, not sure I understand any of that sorry
I'll keep on learning and hopefully what you say will make sense
No worries! What ui are you using?
ComfyUI
As a beginner? Sheesh thats a learning curve
Hmmm what are your pc specifications? Like gpu?
Rog Strix 3080
I can hop on a quick call to show you what i use
Its much more beginner friendly
Just gotta go upstairs for a bit
Oh thanks appreciate it. I can't right now as I'm "working"
It's Friday now too so maybe after the weekend?
Fair lmao, my weekend just started and its 6pm rn
It's 5pm in UK
On weekdays im only available after 6pm due work
😦
No I work till 6
Oh well when you're home just shoot me a @ if you're up for it
Nice one thank you!
Quick question. I want to test Automatic1111. Currently using SD Forge. Is there a way to easily move my saved Prompt list from one to the other?
hmm wasnt forge the upgrade to A1111
since its just a fork of a1111 but more optimized for SDXL
Yeah it is. But I am having some issues with Control net, so i wanted to revert to A1111. Just to test
what issues are you having?
Forge is a fork created by the developer behind Controlnet so it would seem odd that it wont work
It's an odd issue where the results are coming out cartoonish. Even with realistic checkpoints and prompts.
I am using the sdxl_v2 with the lora
and is it with the control net midas or another one
THE lora?
so your using base SDXL?
its the integrated one.
I am using IP-Adapter with InsightFace+CLIP-H (IPAdapter)
Ok, thank you!
Prompt list? Is it an extension?
Hi everyone,
What's the current best way to train flux ?
No. Just that box that sits under the general button. Where you can save your prompts.
Also, after testing, I realized that the issue is coming from this Huggingface model. ip-adapter-faceid-plusv2_sdxl [187cb962]
Ah the styles
They are stored in styles.csv
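If you want to look at the saved styles before copying the file over, a minimal Python sketch is below. It assumes the file has a header row with `name`, `prompt` and `negative_prompt` columns; check your own styles.csv first, since that layout is an assumption here, and the function name and sample row are made up for illustration.

```python
import csv
import io

def load_styles(csv_text):
    """Map each style name to a (prompt, negative_prompt) tuple."""
    reader = csv.DictReader(io.StringIO(csv_text))
    styles = {}
    for row in reader:
        name = row.get("name")
        if not name:
            continue  # skip blank rows
        styles[name] = (row.get("prompt") or "", row.get("negative_prompt") or "")
    return styles

# Illustrative content only; real files are read with open(path, encoding="utf-8-sig").
sample = 'name,prompt,negative_prompt\nphoto,"35mm photo, sharp focus","blurry, cartoon"\n'
for name, (prompt, negative) in load_styles(sample).items():
    print(name, "|", prompt, "|", negative)
```

Since Forge is a fork of A1111, copying the file between the two install folders is the likely route, but confirm the columns match before overwriting anything.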
Thank you!
What is the best way to create photorealistic images with SD3.5? My experiments so far are giving me plasticky/cartoony photos. Any ideas would be most appreciated. (Particularly with Turbo model.)
ban this guy
I don't have much experience with 3.5 but it's always a good idea to use a lora for the look you're aiming for.
Hello, I am working on an e-commerce project and I need a text-to-image model. I want to deploy this model on Google Cloud Platform (GCP), but this process seems quite new and complicated for me. Since I have limited time, I would like to know which of the following scenarios is more suitable:
Using ready-made GitHub models: For example, pre-trained models like Stable Diffusion. Can I import and use these models on GCP? If possible, can you share the recommended steps for this?
Google Cloud Marketplace: Would it be easier to buy a ready-made solution from GCP Marketplace? If so, what are the recommended APIs or services?
My goal:
To take inputs from user data (e.g. a string array) in the backend and return output via a text-to-image API.
Since I have an e-commerce project, I need a scalable solution for high traffic.
Information:
Backend: Requests will come via REST API.
My project allows users to create customized visuals (e.g. product designs).
Instead of training a model from scratch, I prefer ready-made solutions that will save time.
My questions:
Which way is more practical and faster? A ready-made model from GitHub or a solution from Google Cloud Marketplace?
If I prefer a model from GitHub, what steps should I follow to import these models to GCP?
How can I optimize a scalable text-to-image solution on GCP for a high-traffic application?
What platforms am I asking about:
If you have experience with Stable Diffusion or similar models, can you share them?
I would like to get suggestions from those who have started such a project on Google Cloud.
This sounds like all you have is a vague idea. People will not spend their free time helping you get a business started. Hire someone that is an expert in the field.
@karmic brook
Get the exterminator we have a pest infestation
Would it be spam if I post here a link to a civitai feedback of a feature that was said to be in the website by the staff more than a year ago but still isnt there?
Basically, I am trying to get more views and comments to the post so they finally feel pressured to add it.
The feature is a display by pages instead of infinite scrolling
wouldnt do much to post here anyways, no civitai staff or dev is here
Their feature requests are posted in some forum like site and users can post comments
This is a SD server, right?
There might be some people who get their models only from Hugging Face or PixAI, but I assume most people have a civitai account. With only 0.5% of people here posting, they could finally implement it.
NVM, it's currently pending approval, I'll post it when it is approved
i downloaded hunyuan in ComfyUI as a gguf. how do i add audio synced to the video within ComfyUI?
gm all
So is Cosmos the new Hunyuan killer?
Damn, another one
Hey guys, I'm facing an issue with Automatic1111. I'm using a GTX 1060 6GB with an R5 3600.
When I started using Stable Diffusion about a week ago, my PC could generate a 512x512 image in 20 seconds easily, but now when I try to generate the same thing it takes more than 1 min and the results also have some weird graphical glitches.
I searched the internet in case someone had the same issue, but I couldn't find anything.
I have tried medvram, lowvram and things like xformers, but still no improvement.
Any advice would be really appreciated, and sorry for my bad English.
Also keep an eye open to "support" scammers. They mostly DM you and want money.
As i sometimes dm people, my help is always free and the day i start asking money for it is the day my account is compromised
ya SMOOTH bridge just invited me to some group
i downloaded hunyuan in ComfyUI as a gguf. how do i add audio synced to the video within ComfyUI?
Can you train a flux model not a lora ? If yes, what is the current best UI/Addon or Stand Alone Github Project currently to do it on Windows, 24Gb Vram ?
damn, i only realised now how resolution can drastically impact certain artstyle and LORA (not just quality of the image)
loras work much better at higher resolution yeah
for some artstyle though it's the opposite
like with higher resolution, a cartoony artstyle goes a bit more realistic and loses some of the artstyle features/proportions, but lower resolution makes it way more stylistic and cartoony and closer to the original artstyle
i always went with high resolution, without going too high obviously, but i realise now with some artstyle, it's a mistake
i'll need to test multiple for each
was the lora trained at the higher resolution though?
if you want 2560x2560 image then the training stage including that resolution would help
i have no idea, it's not mentioned on civitAI
but for example, i was going like 1500x1000, and turns out it works way better at 1000x750
not the exact numbers here, but you get what i mean
they rarely give details yeah
if you had a lora trained at the higher resolution it would have more potential
or preferably a lokr
models are typically trained at a megapixel base resolution. they can go higher but most of the training knowledge happens around 1024x1024 and associated aspect ratios.
a good method is to use the lower size as a jump off for a higher resolution generation. 20 steps at 1MP, then scale it up by 1.5 and do another 15-20 steps at 0.6-0.7 denoise
adjust numbers for your needs
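The two-pass recipe above can be sketched as a txt2img request body for the A1111 API. This is only an illustration: `enable_hr` and `denoising_strength` also appear in a payload posted in this channel, while `hr_scale` and `hr_second_pass_steps` are field names as I recall them from the webui's API schema, so check them against your install's /docs page. The `hires_payload` and `final_size` helpers are hypothetical names.

```python
def hires_payload(prompt: str, width: int = 1024, height: int = 1024,
                  steps: int = 20, scale: float = 1.5,
                  second_pass_steps: int = 20, denoise: float = 0.6) -> dict:
    """Build a request body for a base pass at about 1MP plus a hires pass."""
    return {
        "prompt": prompt,
        "width": width,            # first pass near the model's training size
        "height": height,
        "steps": steps,            # first-pass steps
        "enable_hr": True,         # turn on the second (hires) pass
        "hr_scale": scale,         # assumed field name: upscale factor
        "hr_second_pass_steps": second_pass_steps,  # assumed field name
        "denoising_strength": denoise,  # 0.6-0.7 as suggested above
    }


def final_size(payload: dict) -> tuple[int, int]:
    """Output resolution after the hires pass."""
    return (int(payload["width"] * payload["hr_scale"]),
            int(payload["height"] * payload["hr_scale"]))
```

With the defaults, a 1024x1024 first pass ends up at 1536x1536 after the second pass.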
sd ultimate upscale is another good approach. it'll tile the image into smaller pieces and do the 2nd denoise pass on that. often adding a lot more detail
not sure what you mean here, like doing a normal image, then going to img2img to scale it up? If so i don't bother with that, i just generate and hope for some good ones on the first try, lol
essentially. it's often called hires fix.
its basically what you just described, "going to img2img tab" but it does it all in one go
in forge ui, if hires steps are set to 0 it'll do the same amount of steps for the second pass
it's auto1111, dunno if it adds steps by itself when at 0, does that make the generation longer though?
auto1111 is the same way. 0 means do the same number of steps as the first pass
these things in the UI's are non intuitive and not explained very well. i know them from years of experience with the gradio style webuis lol
well turns out i was doing everything right from the start, lol
and knowing is half the battle!
Ban this guy
Some previous guy did the url trick. The guy must be a scammer apprentice
not a very good apprentice. sucky master to take on such a sucky student
what prompt do i use to make images focus on eyes? very often my characters are slightly far away and the eyes look bad af. is it some focus_eyes or eyes_focus or something?
adetailer. do a second pass specific for eyes.
something without adetailer? im making about 200 pics a day
higher resolution. there's only so much attention to go around really.
there's no magic prompt to fix all eyes in all cases. thats why doing a detailer pass is such a standard practice
if you're using controlnet to determine subject, since you're saying 200 times they're "slightly far away", then try setting the controlnet to stop at 0.5 steps so that more steps can just use the model itself for denoising instead of worrying about the other network
how do i use controlnet?
Hello guys, I am trying to use https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/API but it doesn't work in my project. I get an error, can anyone help me? My terminal output looks like this:
✓ Compiled /my-followed-stores in 288ms
GET /my-followed-stores 200 in 328ms
✓ Compiled /api/generate-image in 324ms
Received request body: {
prompt: 'GamerGear, modern storefront design with professional appearance, professional product photography, high quality, detailed, 8k resolution, product showcase',
negative_prompt: 'blur, noise, distortion, watermark, text, low quality',
width: 512,
height: 512,
steps: 30,
cfg_scale: 7,
sampler_name: 'Euler a'
}
Sending payload to AUTOMATIC1111: {
prompt: 'GamerGear, modern storefront design with professional appearance, professional product photography, high quality, detailed, 8k resolution, product showcase',
negative_prompt: 'blur, noise, distortion, watermark, text, low quality',
steps: 30,
cfg_scale: 7,
width: 512,
height: 512,
sampler_name: 'Euler a',
batch_size: 1,
n_iter: 1,
seed: -1,
restore_faces: false,
tiling: false,
enable_hr: false,
denoising_strength: 0.7
}
Image generation error: {
message: 'Request failed with status code 404',
response: {
status: 404,
data: { detail: 'Not Found' },
url: 'http://127.0.0.1:7860/sdapi/v1/txt2img'
}
}
POST /api/generate-image 503 in 488ms
and google chrome F12/console error is Generation failed: "Request failed with status code 503" {}
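A 404 with `detail: 'Not Found'` on /sdapi/v1/txt2img usually means the route is not registered at all, and the common cause is that the webui was started without the `--api` flag. That is a likely explanation, not a confirmed diagnosis of the log above. A minimal Python sketch of the same request with a hint for the usual failure cases (`explain_status` and `txt2img` are hypothetical helper names):

```python
import json
import urllib.error
import urllib.request

API_PATH = "/sdapi/v1/txt2img"  # same route as in the log above


def explain_status(status: int) -> str:
    """Map an HTTP status from the webui to a likely cause (heuristic only)."""
    if status == 404:
        return "route not found: was the webui launched with --api?"
    if status == 422:
        return "payload rejected: check field names and types"
    if 200 <= status < 300:
        return "ok"
    return f"unexpected status {status}"


def txt2img(base_url: str, payload: dict, timeout: float = 300.0) -> dict:
    """POST the payload and return the decoded JSON response."""
    req = urllib.request.Request(
        base_url.rstrip("/") + API_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return json.loads(resp.read())
    except urllib.error.HTTPError as err:
        # Surface a readable hint instead of a bare status code.
        raise RuntimeError(explain_status(err.code)) from err


# Example (needs a running webui):
# result = txt2img("http://127.0.0.1:7860", {"prompt": "a cat", "steps": 20})
```

If the hint points at `--api`, add the flag to the webui launch arguments and restart before retrying.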
doesn't apply here then. i thought since all your pics were "slightly far away" you might've been using open pose
nah, i usually set a specific prompt, add wildcards, and let it run for 100 or so pics, my pc is not super fast, so it takes about 6h, so i leave it for the night
adetailer is your solution. you'll get less pics but they'll be better.
do one or two test batches before committing to a 100 batch
would setting denoiser to higher than 0.3 help somehow?
or adding more high res steps? im currently going 30 steps and 10 high res
that's a very low denoise for the highres pass. i usually go around 0.6. i don't care about maintaining the lower resolution image too much. I only expect it to be a compositional base
and yeah, 10 steps is poor
im doing my pics at 1080x1080
I noticed something very weird with Flux and wonder if anyone has any insight on that. When doing a second pass on a latent, the final image quality will reflect the image quality of the directly decoded image. Let me explain. If I do a latent upscale then directly decode the image (no denoising), the image is sharp, although it has artifacts from the latent upscale. After denoising the image will stay sharp. If however I upscale in pixel space with bicubic interpolation, the image will keep that soft bicubic look even after denoising. When upscaling with nearest neighbor, the image will keep that low-res pixelated look after denoising. My guess is that the feature maps that contain the look of those high frequency details have a high hysteresis and it would take a lot more noise to perturb the information.
Welcome Lee,
first off i would recommend looking into the "tech support" channel and then the pinned messages. there are some handy tutorials depending on the system you use. Seeing your question however, i must say you need some pretty strong hardware to run it on your own pc
does the amount of prompts even matter? like going over the 75 limit, 150 limit, 225 limit, and so on
i tried and couldn't really see a difference
it does, but instead of me explaining it, it gets touched on here: #🍥|anime message
oooh... so some of my prompt might be divided in two...
yes
i need to check each limits? gah
although it does seem like all my prompts are taken into account so maybe i'm lucky, lol
if i'm at 150 right at the end of a prompt, if i put the "," and put a new one, it won't be divided in two correct? the cut is at the right place?
the moment the number doubles is when it breaks
so if i'm at the 150 limit at "red hair," and then at the 225 limit at the following one "red shirt," none of them will be cut right?
I integrated Stable Diffusion txt2img into my project but it's working slowly, how do I speed it up?
pc specs?
if the clip counter doesn't double when you entered your last prompt then yes, it won't be cut
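The 75/150/225 limits discussed above can be sketched as simple arithmetic. This is a simplification: token counts come from the CLIP tokenizer (here they are just passed in as numbers), and as far as I know the webui may also move the split to a nearby comma, which this sketch ignores. `counter_limit` and `straddles_boundary` are hypothetical helper names.

```python
import math

CHUNK = 75  # tokens per chunk, matching the 75/150/225 limits in the UI


def counter_limit(n_tokens: int) -> int:
    """Limit shown next to the prompt box for a prompt of n_tokens tokens."""
    return max(1, math.ceil(n_tokens / CHUNK)) * CHUNK


def straddles_boundary(start: int, length: int) -> bool:
    """True if a tag covering tokens [start, start + length) spans two chunks."""
    last = start + length - 1
    return start // CHUNK != last // CHUNK


# A 4-token tag starting at token 73 would be cut between two chunks,
# while a tag starting exactly at token 75 sits cleanly in the second chunk.
print(counter_limit(76), straddles_boundary(73, 4), straddles_boundary(75, 3))
```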
Lenovo IP3, 1650 Ti, 4GB VRAM
you cant get it faster 👍
4gb vram is the limiting factor
you COULD technically run sd1.5 but it will be slow regardless
:/
yeah unfortunately image generation is pretty tough on the computer. you could use an online service but those have costs attached
How much?
After creating 2000 images, can't I just save them instead of having to regenerate them each time?
it depends on the model
you save them automatically?
if you run it locally
Idk man
I just tried for 1 image
It worked but took 2 min
If I run 1 more, will it work faster now?
could be because its now loaded in
but any faster is simply gonna be a struggle. also generating SD1.5 (not XL) at 512x512 costs about 3 buzz per 4 images, so if you just do it the quick and dirty way on civitai it's about 6,666 images for 5 USD
Will I get faster?
their service is faster yes
My images are 100x100 or 100x200 resolution
but seeing how you might be a beginner you will have some trial and error
so small?
you might get strange results then
Why
Can I use it the way I do now? I'd send the prompt from the frontend and then request the model
Okay, nope
is there a way to go higher than 100 batch count aside just having 2 batch size or more?
Which sampling method should I use for fast generations?
Draw Putin
Does resolution diversity matter with LoRA training? I have always tried to have a decent spread of resolutions for training, but what if everything was just 1024x1024, would it still inference at different resolutions just fine? Just curious if I am wasting my time or not.
Like for the current character LoRA I am working on, I have a huge chunk of images that are 1:1 aspect ratio, do I just leave it, or do I crop some to hit different bucket sizes?
Not the place chud
Never assume spam is posted by real people. You most likely replied to a bot.
it's likely a bot yeah, but it's a legitimate game it looks like
probably a small indie studio trying to market their stuff far and wide as possible but yeah still spam
The youtube comments are full of fake positive comments as well. This has a coin/token scam vibe to it. There gotta be something fishy about this.
yeah but they are pushing it to IOS
like a phone game
it is an India-based studio after all. indie game studios there do have different standards than here
like they even included a whatsapp on there
I don't play phone games so I don't know. But why would I have to fill out a form to play a mobile game. And why would they use scammer tactics to advertise the game. Just doesn't feel right to me. My internet instincts tell me to stay away from sketchy stuff like this.
its an asset flip
eh low quality indie studios usually use paid assets
and they're spammers too. which puts them squarely in the unethical zone.
sometimes i wish we had more moderators in here ngl
