Hello all you gentlemen and ladies, it's evident everyone here is so so talented with stable diffusion, but would anybody be willing to BLESS me with the most advanced to date text to speech ai software, both normal voices and then celebrity/character deepfakes. Thanks in advance, this community is filled with helpful people
#general-chat
And I don't want to believe the first ad that pops up on Google, so if anyone is aware, I would love to talk.
.............................
I apologise for even starting a 'help me help me' message. Figured it out - cheers for the pointers
bruh there are many videos, like..... loadssssssss of tutorials.
bless the ai path you walk with some time spent in the tube. Stable Diffusion is for pictures though so maybe shuffle this question to the right place and you'll get what you are asking for. Literally loads of guides and videos.
You fuggin G - thank you
anyone know why sd mov2mov does nothing in the command prompt when i start a process clicking generate? get no result, the web ui just hangs with the interrupt button and skip button grey with 0/75 stuck on it
No announcement here for StableCascade?
as i understand no good solutions to use automatic on windows with amd now?
ive read some ppl happy with results on 7900 being near high end nvidia but i suppose thats linux users?
It's worth it to learn to install Arch Linux imo
its just really makes mad, to have expensive card that performs badly -_-
Yeah AMD's software team seems lacking
maybe i didn't tweak it to the top on windows, any attempts ending up in memory errors mostly.
i do add same commands as ppl having also 7900 xt
but for me they dont seem to work for some reason
neither --medvram nor --no-half --precision full
seems to cause errors
i have a 7600 on my desk for validating comfyui on windows. the problem is i don't have time
only optimization engine that seem able to make highest resolution images is sub quadratic, although lil bit slower than others
i wanna use automatic
is 25 steps 768x768 in 26 sec is a bad speed or no?
no turbo it can make somewhere around 1024x1024 in 12 sec but turbo so rare for now -_- , and that often memory errors
it all comes down to people who can program also having the hardware you want to test, and you will discover that choosing bad hardware is misaligned with having the knowledge to program
the most annoying problem is that 20 gig of vram seems to clog up very fast , and sometimes it tells me that even 1280 on 1536 is too much of a resolution
not sure if CPU matters much , but its somewhat weak for now -_-
the most weird part is that this --no-half --precision full --medvram that work for most 7900 xt users doesnt work for me. Ending up with a lot of errors =\
Im not sure about the goals... to have fun. Have at least medium speed generation - not to have this often memory errors
to have fun.
is IT fun?
there are UIs that make more IT decisions for you and should make things a little easier
otherwise bing image creator is a lot more capable for most people and use cases
not meaning to sap your excitement or anything
but when i help people get acquainted with this stuff i get them on bing image creator first because it actually works
on the top of all it feels like randomish or stars position ones. maybe depends on model & prompt. One day it can generate in img2img 1536x1536 with no problem, other it cannot generate higher than 1024. It can go with hires fix upscaler as high as 2500x3000. But often generating very high resolution image once or two times ends up in vid memory full error
ppl seem to enjoy amd on linux, mostly) . But ive got a feeling setting it up there is frustrating -_-
there isn't really an audience for super high resolution imagery so nothing is optimized for it
Heard there's rumors ROCm's coming to windows soon, and that might end suffering of amd users)
it sounds like you are mostly having trouble upscaling?
y generating in high resolutions
and some random memory full errors
upscaling via res fix-scaler or extra tab seems to be more less fine-stable
having different prompts with additional loras to generate in txt2img in high res (one is generating even 1280x1536, other cant go higher than 1024x768 without errors). Can 1-2 additional lora & styles affect vram clog up that much?
& also why something that work for others might not work for me? In webui commands. I might have missed something to install ? It can depend that much on how different every people setup & brand of a hardware?
why other ppl 7900 work with medvram and mine doesnt o.0
well if remember right it gives error somewhere near at 93%. or after generated once. More likely 93%
Zluda is coming…
Is there some magic way to not have these vram errors & problems with clogging up on AMD now?
https://openai.com/sora what is this ???
Check this out https://openai.com/sora
those gens looks super real, i never thought that level of quality video generation would be achieved that soon
same
you and me both....I am absolutely gob smacked it's here. I knew this would be possible someday....I was thinking years though
I was just fucking around with creating motion thru the caveman methods available right now.......
and then this drops
out of nowhere
imagine this realtime ?!?!
tech paper out later today
wondering of you could use motion LoRAs to have ID specific microexpressions, animations
Hello everyone. Give me some advice, please. I want to train LoRA on a large number of images (for example, 10k), I know that such a number is acceptable in the case of styles and complex objects/items. How many steps should there be for each image (i.e. what number should I put at the beginning of the folder name)? I realise that a lot depends on the train dataset and other settings, but I would like to know at least an approximate range.
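A rough way to reason about the folder-name number: assuming a kohya-style dataset layout, where the folder is named `<repeats>_<name>` and that leading number is how many times each image is repeated per epoch, the total step count follows from simple arithmetic. The sketch below only does that arithmetic; the step targets in the usage lines are made-up illustrations, not recommendations.

```python
import math


def total_training_steps(num_images: int, repeats: int, epochs: int, batch_size: int = 1) -> int:
    """Total optimizer steps, assuming kohya-style repeats:
    steps per epoch = ceil(images * repeats / batch_size)."""
    steps_per_epoch = math.ceil(num_images * repeats / batch_size)
    return steps_per_epoch * epochs


def repeats_for_budget(num_images: int, target_steps: int, epochs: int, batch_size: int = 1) -> int:
    """Smallest whole repeat count (at least 1) that reaches target_steps."""
    steps_per_repeat = num_images * epochs / batch_size
    return max(1, math.ceil(target_steps / steps_per_repeat))


# Hypothetical numbers: 10k images, folder named "1_mystyle", 2 epochs, batch size 4.
print(total_training_steps(10_000, repeats=1, epochs=2, batch_size=4))

# Hypothetical small set: 20 images, aiming for 2000 steps over 10 epochs, batch size 2.
print(repeats_for_budget(20, target_steps=2000, epochs=10, batch_size=2))
```

With a dataset this large, even a repeat count of 1 already yields thousands of steps per epoch, so the folder number mostly matters for small datasets.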
For any amd question message me xD
bigger model = more vram usage, what was your gpu again? it shouldnt crash with 5gb
@oblique ivy whats up? You sent me a friend request. You a bot?
yes was spam
Hello everyone, I installed automatic1111 via run.bat and finished it with update.bat, but I don't even know how to open it. Can anyone help?
yea, I could really use some help as well, I followed the guide in the support channel and its still breaking during install
hi guys i downloaded the ip-adapter-faceid-plusv2_sdxl_lora, for lora, but when i place it in the lora model file, and restart stable diffusion, nothing pops up
does anyone know a fix?
Anyone here ever mess around with Silly Tavern?
have you got a sdxl model loaded at the top?
yea
I don't think anyone is familiar with the one click install
damnm
where's the AI art bot
All is lost 
well no - heaps of free generators around - civitai have one, happyaccidents.ai, aiscribbles.com
hope they are good
need a book cover
I mean they will be as good as the bot here was - if you are looking for something with more control, you could try https://tinybots.net/artbot/create
thank u
Is there a guide or tutorial someone can point me to in order to install Beluga? I have no background in coding and a little lost once I am redirected to Hugging Face
Got a link to beluga?
Im sorry I am not sure what to do with this. I only have experience with Midjourney and ChatGPT. I am a bit over my head trying to figure out how to install this. Not sure where to start
That shouldnt be an install, thats a use it right there job
I see, ty
Error: [Errno 2] No such file or directory: 'C:\Users\richardkim\anaconda3\lib\venv\scripts\common\Q33RPXYW.DOCX, when install on window how can i solve?
Youve gone astray somewhere if the venv is in the anaconda3 folder - what are you trying to install?
Guys where can I generate text to image here, I can't find a way!
which chat room should I go to if I'm looking for LoRa or checkpoint suggestions for a project I'm trying to get?
Sora is amazing
How even is that possible?
Shame ClosedAI won't share any technical details
I have a video that I think was made with stable diffusion, I have $20 for anybody who can recreate it for me and teach me or just tell me the softwares needed to recreate, it's pretty simple. DMs open
The compute they need for that behemoth can probably not even run like people are running Dalle3 at the moment. I bet it needs huge amounts of Vram AND since it's from ClosedAI it'll never be available Opensource / without a subscription and behind closed doors.
Our only hope is that they a) Share at least some technical details so it can help the OpenSource development of alternative Models or b) that sometime in the future something leaks (similar to how the NAI leak skyrocketed SD v1-5).
Definitely will not run locally - not a chance in hell considering the compute it's doing when generating these. The research paper is interesting (even if very light on detail) with the concept of diffusion 'Latent Patches' being equivalent of LLM text tokens. I really, really, really want SD to give us a video equivalent we can run via services like Mage that have a lighter touch policy on censorship as compared to OpenAI. Heck, I would pay for a service providing this quality from SD themselves (up to say £35 a month), but they are under heavy scrutiny as it stands with how people are misusing the tech.
Hey everyone!
I'm working on a photorealistic character who's pregnant using Automatic 1111 with Realistic Vision V-5.0, but the stomach keeps coming out horrendously disfigured and honestly terrifying. Does anyone know what the best method would be to get a natural looking pregnant belly? Maybe using inpainting, or any +/- prompt suggestions?
Any help is greatly appreciated!
maybe use open pose if there is one
ipadapter
then add some embedding if someone has made
or give more power to the negative prompt
there is a lora for this
@agile tiger Sweet I'll give this a try. Thanks!
Why would anybody use stable diffusion
Is it out now?
Does OpenAI have a text to speech software?
they have it all but no enterprize would want to get into a closed system
Wdym?
@warm junco Do you happen to know where i would find it? SD is pretty new to me, so I'm still learnin.
on Civitai.com you find everything, its the largest model database.
the lora you need is flagged NSFW, so you need an account to see it
nope
SD is OpenSource and Free, also no censoring and it runs locally on our machines instead of a cloud from a corp
@warm junco are you aware of the most modern text to speech software? With ai voices and deepfakes and all
@warm junco Awesome. I appreciate the help
eleven labs is good
idk any free ones so far
I hope stable audio become overpowered like diffusion
Quite a few, all of which are of lower quality however.
At this point everything is copyrighted. And audio is just a pattern of sounds mostly repeated. Ai should replace it asap.
TortoiseTTS, OpenVoice, and Bark. (Whisper too, though that one is speech to text, the other direction.)
Some others that I don't know on the top of my head.
nice thanks! didnt looked much into tts
I don't remember the name but there is ai which lets u generate audio in popular people's voices.
Hm. I guess VTV would be more realistic than TTS
Ye if they can speak..
A TTS can be fed into RVC however
Is it totally free though? It needs credit igm
RVC is open-source.
I dont remember much it needed something like credits which you can get by letting it use your gpu for few hour.
I am not sure if that is diffrent ai
A ai which converts your voice into voice of famous people for example donald trump.
Very realistically
I think its not rvc then. I dont remember name.
Being honest, what I really want is 2D to 3D conversion.
The only thing I know of that does this well is MiDaS 3.1.
@drowsy robin you seem to know a lot, where do you get your info
By researching.
Lmao
Its Voicemod net.
And money isn't an obstacle, what is the best tts out rn
Elevenlabs.
Right lmao but all due respect the top 20 searches are advertisements
Nothing beats Elevenlabs.
rvc is the best for transfering voice to voice, weights(dot)gg is the biggest rvc model catalog, its the civitai of rvc models
Glad u saved me
My guy, I have been a part of the AI space since SD1.0. I just keep up with Reddit/AI stuff.
DallE 1.0
I was talking about Voicemod . Net
What's your time zone
Why do you want to know my timezone?
Because I'm abt to dm you for contact info, id like to speak to you more
If you'd have the time
Let's just say I'm close to EST. Also, I'd like to not do that.
Fair enough
I have this discord's DMs disabled for good reason.
You know you know your stuff and everyone wants to ask you
Just wanted to dm you so I could ask you more technical questions as I get more involved
No big deal though I understand need for privacy
I dont think anyone in 2024 really uses dms.
Ye
I sure don't accept them
But like we all agree once sora is accessible to the public it'll be #1
I picked up a Prime B550M Mainboard, 16GB RAM, Ryzen 5 5500 and have an RX 6800. Going to be installing Arch Manjaro on a 128GB NVME.
What do I need to know to install stable diffusion, and/or is there a recommended guide to follow?
SORA is good, but closed source.
I will wait until Emad and stability one-up SVD.
Text to voice is dull though. The moment people hear ai voice. They close the video..
Svd?
Stable Video Diffusion.
What
I never watch any ai voice video now matter how good it is..
Where in the world did you make that assumption
Thatās you man
Rule #1 never assume you are the average person
All due respect
Text to voice (elevenlabs) is still the highest in emotional, quality output.
It is good enough for audiobooks. And speaking.
Okay lets be honest you tryna create youtube shorts/long with an Ai voice over it. I would say average people would not like it unless its a 3 minute tutorial. I am only saying it because, in case you expecting it will go crazy and work as good as real human. It would not. Everyone can recognise Text to speech. If u are putting effort in it. Then remember audio is 60% of the video. People can watch video with audio only. But best video with dull audio is garbage.
Ofcourse no one means to demotivate you or anything
Just a friendly advice before u waste too much time
If you trying to do this*
Well I'm not but I'm sure any1 who would wanna do that appreciated your advice
Lmao I'm not gonna take your advice on that and I apologize.
^^
Advices depend on people and situations. No one is forced to follow/ignore.
Thank you, Captain Obvious.
I simply do not agree with the statement you gave based on the multitude of repositories and projects that focus on that very thing.
Its okay. I am not suprised that you do not understand.
You aren't creator or someone who does this.
You assume wrong.
Catcher cmon man you injected yourself in our convo nobody asked for your negativity
I'm not wasting any more time on this conversation. You clearly seem to think you're in the right.
@drowsy robin SD can add motion to photos? Do you have a link to a good tutorial on that
SVD can handle video motion.
It's a different model is it not?
But animdiffusion does exist.
Again. This is a public chat. And i can say whatever i like. I am neither toxic nor I am saying anything which you need to care about. All I did.. writing something I know and have experienced so that "someone"'s time or efforts can get new direction if they are reading. Again, nothing is focussed on only you.
Yeah, no. No thanks.
I am a fan of open-source.
But in any other fields are you an advocate of ai softwares for?
Like for example anything to do with organization etc, it's a shot in the dark
Basically asking if you use any ai for anything else in your life that you'd like to share
Shot in the dark
ChatGPT
GPT4All
And what are their purposes
LLMs are language models. They simply process a token context input and then output a response suitable to said input.
A guessing of tokens that seemingly is rather coherent.
OS-copilot, along with other such softwares (Open-Interpreter, Autogen Studio and Self-Operating Computer) take this to their advantage.
I have a lot of sauce on GitHub but my skill set isn't aligned with code too well
Anything with a UI?
Not sure if right channel to ask this - but does anyone know the process (using A1111) to create the individual images at the various stages, from non-villain to full villain? How do i create the separate stages/steps - https://civitai.com/images/4936746. (once i have 5 images from beginning to final, I can then use a video editor to create the animation - but how do i get each image needed)?
^ anim diffusion does this.
is that an extension?
It should already be part of A1111, I thought.
oh ok - i'll google it, to see if any tutorials online
Well I mainly just clone and test out repositories based on the instructions given for said repository.
Which are usually just in the readme.md
And once you successfully do that what is the end result
That... depends on the repo. Lol
Like it still doesn't have a friendly ui
If it isn't designed for one, no.
Honestly I gave up after spending hours trying to effectively get GitHub settled on my Mac
Huh? Can't you just install git on Mac?
I dmed, any chance you'd accept
nope its an extension
In SD.Next it is native.
That'd probably be why I thought that.
So - I am relatively new to SD (although not to AI art generation). And I sometimes get a bit thrown by some of the inclusions I see in prompts that I experiment with from civit.ai.
For example - I see this in prompts. ~~aesthetic~~. and seems to be random, without meaning? But it must mean or do something.
And then also I see prompts that have certain words of phrases in single, double and triple brackets (like this) ((or this)) (((or even this))). what do they mean / do?
Is there anywhere that has a good guide that breaks down, in simple terms, all of these little nuances and features that make up a prompt?
Thanks
Basically (Something) does the same thing as (Something:1.1)
likewise ((Something)) = (Something:1.21), since each extra pair of brackets multiplies by 1.1
it's for emphasis, the picture you get is a bit more Something
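As a small sketch of the bracket arithmetic, assuming the A1111-style convention where each pair of round brackets multiplies attention by 1.1 and each pair of square brackets divides it by 1.1 (the function names here are made up for illustration):

```python
def emphasis_weight(paren_depth: int = 0, bracket_depth: int = 0) -> float:
    """Effective weight of a term wrapped in nested ( ) and [ ],
    assuming a factor of 1.1 per bracket pair."""
    return round(1.1 ** paren_depth / 1.1 ** bracket_depth, 4)


def explicit_form(term: str, paren_depth: int) -> str:
    """Rewrite nested brackets as a single explicit (term:weight)."""
    return f"({term}:{emphasis_weight(paren_depth):.2f})"


for depth in range(1, 4):
    print("(" * depth + "Something" + ")" * depth, "->", explicit_form("Something", depth))
```

Writing the explicit `(term:weight)` form is usually easier to read than counting brackets, and it lets you pick values in between.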
the extension is stupid too. it might as well be standalone. it doesn't tie into auto1111's systems at all and is just its own tab with its own underlying code. like running comfyui in an auto1111 tab
Guys someone know how improve eyes quality? I use ADetailer in comfy but that helps with face and is good but eyes still almost bad, flat
Was wondering if anybody has been able to achieve a good workflow for coloring manga with control net ?
yea i think so too
i'm happy to be playing with cascade, but theres no reason it needs to be an extension with the capability it has. could've easily been standalone.
i guess it helps get it out there for the masses
hello
whats the best place to download the insightface\models ?
hello
can we use stability ai API img2img inpainting with Controlnet?
insight face is a tricky one to install as it requires building. but you can find prebuilt whl files for it. let me dig a thread thats helpful up
Does anyone have a colab for stable diffusion video?
Looking to something like the fooocus colab
Emad already replied to a thread somewhere they have something similar already
I barely lurk this server so I can't say I know about that. I'm just a bit miffed about how most people seem displeased with the OpenAI footage released recently. I just can't help but think about the endless possibilities really
If you're a qualified expert in the field, they may provide access. I'm not sure how they're rolling this one out but they've done similar things in the past. They do share their research. Just not with the wide open internet.
The problems are two fold - one is the fact that it democratises movie making and will put a lot of creatives out of a job; and another that it completely screws with the "seeing is believing" truth that we have currently, and that the world is definitely not ready for a world where you cant trust anything you arent seeing in real time…
would someone be able to link me an install guide for SD on manjaro linux, I can't seem to locate it in pinned messages anywhere.
for AN amd CARD
say im curious, do you guys generate images in a small format until you get something close to what you look for, then copy that ones seed and refine it for better results?
Yeah, doing seed hunting without hires fix on, then fixing seed and then doing it with the hires fix
Yeah high-res fix is awesome.
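The seed-hunting workflow above can be sketched as two request payloads for the A1111 web UI API (available when the web UI is started with `--api`). The field names below are the ones I'd expect on the `/sdapi/v1/txt2img` endpoint; treat them as assumptions and check the `/docs` page of your own install. The helper names and the seed value are made up, and nothing here actually sends a request.

```python
def draft_payload(prompt: str, width: int = 512, height: int = 512, steps: int = 25) -> dict:
    """Cheap exploration pass: random seed (-1), hires fix off."""
    return {
        "prompt": prompt,
        "seed": -1,          # -1 asks the web UI to pick a random seed
        "steps": steps,
        "width": width,
        "height": height,
        "enable_hr": False,  # no hires fix while hunting for a good seed
    }


def final_payload(draft: dict, seed: int, hr_scale: float = 2.0,
                  denoising_strength: float = 0.5) -> dict:
    """Same settings as the draft, but with the chosen seed pinned and hires fix on."""
    payload = dict(draft)  # copy so the draft stays reusable
    payload.update({
        "seed": seed,
        "enable_hr": True,
        "hr_scale": hr_scale,
        "denoising_strength": denoising_strength,
    })
    return payload


draft = draft_payload("a photo of a dog")
# 1234567890 stands in for whichever seed produced the draft you liked.
final = final_payload(draft, seed=1234567890)
print(final)
```

Keeping prompt, steps and base resolution identical between the two passes is the point: only the seed gets pinned and the hires pass gets switched on.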
it's as simple as asking someone.
Excuse me, I'm new in here, and I still don't know how can I generate images. I'm sorry for my ignorance. Anyone could help me, please? Thank you in advance.
Easiest way is go to aiscribbles.com or use the generator on civitai.com
ngl cascade isn't even that great
If compared to base sdxl or base sd1.5 its pretty good - give it a month or two for finetunes to come out then youll see its true potential
I dont know if it will end up better than existing models but we will soon find out
is easier to train too so the potential may catch up and surpass xl even sooner
Where is stable diffusion is it a website oorrrr
Stable diffusion is a technique for making images from text prompts… If you wish to make images, start at aiscribbles.com
Can SD now generate multiple but different people?
Yep - sdxl is better at it than 1.5 out of the box, but there is a tool called regional prompting which is good for splitting up concepts in an image
gm
who can help me with stable diffusion
Basically currently I have stable diffusion web UI
that is different than sdxl?
Also what is the difference between sdxl and sd web ui
no one can
Stable diffusion webui is a user interface - sdxl is a model for using in that interface (as is sd1.5, sd2.1, and all the checkpoints on civitai)
how do I check which one I currently have? @nova zodiac
If I understand correctly sd1.5, sd2.1, and sdxl are all models I can use on stable diffusion webui?
How do u recommend I use sdxl
Hello everyone. New to this server
Download sdxl and put it in the stable-diffusion-webui/models/Stable-diffusion folder
Then fire up the webui and generate away
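A minimal sketch of that step, assuming a stock A1111 install where checkpoints live under `models/Stable-diffusion` inside the web UI folder. The function name and both paths in the usage comment are hypothetical; adjust them to wherever your download and install actually are.

```python
from pathlib import Path
import shutil


def install_checkpoint(checkpoint: Path, webui_root: Path) -> Path:
    """Move a downloaded checkpoint into the web UI's model folder and return its new path."""
    if checkpoint.suffix.lower() not in {".safetensors", ".ckpt"}:
        raise ValueError(f"{checkpoint.name} does not look like a checkpoint file")
    target_dir = webui_root / "models" / "Stable-diffusion"
    target_dir.mkdir(parents=True, exist_ok=True)
    destination = target_dir / checkpoint.name
    shutil.move(str(checkpoint), str(destination))
    return destination


# Hypothetical usage:
# install_checkpoint(Path.home() / "Downloads" / "sd_xl_base_1.0.safetensors",
#                    Path.home() / "stable-diffusion-webui")
```

After the move, the model should show up in the checkpoint dropdown at the top of the web UI once you refresh the list or restart.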
SD 2.1 is not for using. It's for taking up space on your hard-drive because you keep forgetting to delete it.
It has a niche with the StableSR upscaler
I am using PINOKIO. It installs very comfortably. A1111 installation and InstantID installation are 6 x slower than my already installed A1111?!?!?!?
Only speedy installation is IPAdapter+FaceID
Now installing Kandinsky-3 (not via PINOKIO)
I want to make my own LoRA for decapitated heads, i'm using Kohya_ss to do so but i have a question.
my instance prompt is "Decapitated head"
my class prompt is "Human"
what should my regularization images be??
They are optional so i guess i could try without any but idk if it'll turn out well
Hi everyone. (if I'm writing in the wrong place, please correct me). I want to train my model, but not on people, but on furry art of one artist. Tell me, maybe someone was engaged in training a model not only on portraits of people? I have a couple questions to ask.
Ok
where i write command? help me!
If you are trying to make an image here, you need to read #1047610792226340935 then head to one of the sites mentioned at the bottom
Question, anyone have any good NSFW discords? Got questions that probably aren't applicable or appropriate for the general
And I mean stable diffusion discords
I mean you can talk about creation of a single concept without actually describing the concept…
Seriously? Troll?
Yeah, second one in two days…
I would but downside is I also want a place to look for ideas and references. Kind of good ideas of what's actually capable.
Just some kids sitting in his parents' basement wanting attention.
Ignore them and they go away.
@sudden ruin @bleak matrix you about??

Thanks chief
Outstanding response time there mods
Yeah they had one going for over 90 mins a few days ago - that one was rough
I self pruned me trolling the troll.
No need to leave it hanging around now that the neck beard has been booted
Im trying my best, but I also got a life and gotta sleep
Dude I got three kids, run a business and run a gaming server in my off time. I feel that 100%. IRL comes first but we do what we can.
Feel free to ping and/or use the report function anytime
What really bugs me is these war thunder and Arma experts commenting on the conflict and whatnot in Ukraine.
I did 10 years active duty army, 36 months combat time in Iraq. Al-Asad, tal-afar and Baghdad. I was actually in Baghdad for the 2010 Iraq election. That shit was rough.
These little "kids" are lucky they don't have the slightest fucking idea what they're talking about when they try to troll or make any sort of commentary on combat.
Please move that to #off-topic if you wanna keep discussing
Just wanted to comment on the combat thing. I actually take a solid stance on refusing to ever discuss politics lol.
Politics, religion in music always galvanize and end badly.
Me - im just excited x-adapter code has been released
I personally don't think it would be that bad. Fake news are easy to fight really
It is now, but given how far we have come from Will Smith spaghetti abominations less than 12 months ago, it will quickly get to the point where the average human has to question everything they see online… That sure as shit aint gonna happen
how is this supposed to put a lot of creatives out of job?
Just look at how many people thought the pope was wearing balenciaga
thats when people are not looking
but i dont wonder with the Tiktok generation anymore
Thats my point, people arent looking and wont look, especially if it reinforces their world view
most people in social media can be divided into two really, those who are and those who aren't dumb.
most people would figure out when something is AI sooner or later
But this is wandering into #off-topic territory
Itll get to the point where you need forensic level interrogation to figure it out
I keep asking in tech support but there doesn't seem to be much activity there, is there a guide on how to install SD on linux with an AMD card? I'm seriously lost and have never used linux before
Hello
What happened to the bots?
Perhaps you can search for 秋叶 aaaki on Bilibili (a Chinese video website), who has published a document that solves your problem and also provides an integration package.
Quark: https://pan.quark.cn/s/19a36cab36ac
Baidu: https://pan.baidu.com/s/1QDqo2uEoUS_NY1olb4vmVQ?pwd=aaki
SHA1 checksum: c7c5d497360c7ec3fe9af5ada1624842341d8275
Author: 秋葉aaaki https://www.bilibili.com/read/cv26557731/ Source: bilibili.
This is the download link.
bots still down...
Good morning! How is everyone today?
hey I've questions and need some suggestions. . am i able to design shirt using this. already got the logo and i need to insert my logo in the generated photo . like a human wearing my shirt
I need some help.
I'm getting ready to train my first model/LoRA, but I'm a little confused.
I'm wanting to generate very specific realistic physical features on random characters. For example say I want "large noses" when tagging my dataset images should I be tagging things that I do not want to see in my desired output, or tagging the things that I would like to see?
Also would a model, or LoRA be better for specific physical traits?
I desperately need help. I've spent many hours researching these questions, but I can't seem to find any straightforward answers.
Thanks in advance!
I need some help please
does anyone else find mild entertainment adding contradictory prompts and watching them fight it out?
I generally make sure that I caption as much as I can. That said there is a thing called masked training in the OneTrainer software that allows you to mask the input image so it knows what to train specifically on @jolly karma
It is occasionally entertaining - lighting prompts are good for that
Photoshop/Krita still best bet for that stuff atm
I have a problem
"RuntimeError: Not enough memory, use lower resolution (max approx. 896x896). Need: 0.5GB free, Have:0.4GB free"
what do I do?
Hey what's your GPU? And what's inside your webui-user.bat?
I didn't run that
I just ran webui
GPU 0
NVIDIA GeForce RTX 2070 SUPER
Driver version: 31.0.15.3623
Driver date: 6/8/2023
DirectX version: 12 (FL 12.1)
Physical location: PCI bus 1, device 0, function 0
Utilization 1%
Dedicated GPU memory 7.7/8.0 GB
Shared GPU memory 1.2/15.9 GB
GPU Memory 9.0/23.9 GB
Looks like we're getting an update, looking promising https://github.com/AUTOMATIC1111/stable-diffusion-webui/releases/tag/v1.8.0-RC
curious about something
when you use a real picture in openpose
will the overall body of the original image affect the end result?
like if you use a large person, whoever you use the pose wtih will get affected and have a similar frame?
Nope, OpenPose should get the pose only. With other ControlNet models you can copy appearance too.
i never heard of it but i just started using photostructure as an image gallery server and its just chugging thru my 10tb no problem
supposedly it has facial recognition but i havent seen a single thing about that in the UI so.. hopefully it does?
@nova zodiac I installed OneTrainer and ran it a bit. It's incredible!
Thanks for the suggestion.
However i can't seem to figure out how to even use masked training. Any tips there?
No sorry, Ive not tried it myself - maybe someone in #finetune can help?
Im using stability.ai API and getting a lot of "CONTENT_FILTERED" responses. Even though there is nothing wrong with the image. For example the prompt is "An image of a person standing still, their face a mixture of deep reds and blues, capturing a tumultuous blend of emotions. The person's fists are clenched, and their eyes are wide with a sparkle that suggests tears, lit by the harsh, contrasting light surrounding them." It blurs the result and gives CONTENT_FILTERED
Has anyone experienced this and know of any tips or solutions?
Does anyone know if stable cascade is trained on LAION
They havent announced what it was trained on, but if it wasnt a subset of Laion the gasps will be audible
I apologize if I'm barking up the wrong tree, but do you know if they typically use the original meta data on LAION images or do they use taggers or some sort of context model to improve image descriptions?
I ask because cascade is a clear improvement over sdxl but the prompt understanding doesn't seem much better.
can anyone recommend a model, lora, etc to generate 16x16 pixel art for inventory icons? something very similar to minecraft icons
No idea sorry
Your best bet is to use a pixel art lora at 512x512 then downscale the result (search civitai for pixel art)
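The downscale half of that suggestion can be sketched without any image library: treat the 512x512 render as a grid of RGB tuples, cut it into 16x16 equal blocks, and keep the most common colour in each block so the icon stays crisp instead of blurry. Picking the most common colour per block is my own assumption about what looks good for icons, not something the pixel art LoRAs require.

```python
from collections import Counter


def downscale_by_mode(pixels, out_size=16):
    """Downscale a square-tileable image (list of rows of RGB tuples) to
    out_size x out_size, using the most common colour of each block."""
    height, width = len(pixels), len(pixels[0])
    if height % out_size or width % out_size:
        raise ValueError("image dimensions must be multiples of out_size")
    block_h, block_w = height // out_size, width // out_size
    result = []
    for by in range(out_size):
        row = []
        for bx in range(out_size):
            block = [
                pixels[y][x]
                for y in range(by * block_h, (by + 1) * block_h)
                for x in range(bx * block_w, (bx + 1) * block_w)
            ]
            row.append(Counter(block).most_common(1)[0][0])
        result.append(row)
    return result


# Tiny stand-in for a real render: a 4x4 image reduced to 2x2.
RED, GREEN, BLUE = (255, 0, 0), (0, 255, 0), (0, 0, 255)
tiny = [[RED, RED, BLUE, BLUE]] * 2 + [[GREEN] * 4] * 2
print(downscale_by_mode(tiny, out_size=2))
```

If you have Pillow installed, `Image.resize((16, 16), Image.NEAREST)` is the usual shortcut for a hard-edged downscale, though it samples single pixels rather than taking the most common colour per block.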
what model do I use?
and which lora
theres a lot
SDXL Controlnet vs SD ControlNet, is there a major difference within actual control of the images?
Is there a resource to have the text/tags used to generate the image be simplified? I am not good with stable diffusion yet so I don't know how to organize the details
The prompt can be as simple as you want, "a photo of a dog", the complexity is just trying to nudge the interpretation in a different direction
I don't really mean simplified, I mean optimized. Can I send you a DM of what I am trying to do?
No need to DM, just state the actual question more plainly
I am trying to make a bat character for a campaign I am working on, but no matter what I do to push the image in the direction I want, stable diffusion pushes back and screws something up, like not including the wings, making her hair short, etc. This is my prompt:
1girl:1.4, bat girl, (pitch black skin color:2), (very thick thighs:1.4, small breasts:2, long legs:1.4), (adult:1.4, mature body:1.5), (large black bat wings on back):1.5, (large pointy elf ears1.5), (long dark gray hair, messy hair, hair covering both eyes:1.5), (light freckles covering whole body), (golden eye color:1.4, dark circles under eyes), wearing a black leotard, black thigh high leggings, full body, simple background, SFW
The pictures are close but it isn't what I am looking for. Also when I put something in the negative prompt it seems to specifically add what I don't want
well, this is more of a generic statement, but the very nature of text to image generation is getting something unique every time. prompt engineering is what you're asking about, but it can only go so far. sometimes you need more tools in your belt, like controlnet, or using other images to create depth maps, inpainting, outpainting, sketch models, etc
so if you have an image that's most of the way there, you can use that image to maintain the parts you want to keep, using for example a depth map or a canny in controlnet
from there you can change the prompt to add your subtle variations
also this prompt sucks: too many words, too many weights on words and a bunch of redundant sentences
I have no idea what any of that means
good, then you know what to research next
That is what I mean. I have no idea how to simplify it to make it more concise to what I want
I will do that next, thank you!
I have had images, for example that were like 90% of what I wanted, but I wanted the arm in a different place. These things are possible, but not just with prompting
I haven't played with the models/tooling for a few months. Where are we in terms of reusing specific elements (key character, object) in different scenes?
Much better place - ipadapter and instantID for sdxl are really good
instantid has a premade runpod dockerfile, nice
I'm trying to get a different head using masking in inpainting. I get OK results, but most of the time the new head and face are slightly off, being too big or too small. Which ControlNet model could I use to help with that? I want a different head and face, but one that sits well and looks natural.
So far Canny has given me the best results. The problem with Canny is that it makes the faces look too similar to each other.
Depth is probably better to maintain the position and just change the look
Midas or zoe
Midas seems to do pretty good job, will test Zoe too next
Hello, what are the best arguments for a 6GB VRAM ? I use --medvram --no-half-vae --no-half
What's your GPU?
Nvidia GTX 1660 SUPER
Then you need:
--xformers --medvram --no-half
Xformers will give you a performance boost
But I heard it decreases quality
It is worth waiting a bit longer if the quality is higher
Thank you, I will try
Np
hey, why does my code not work? How can I load my checkpoint (https://civitai.com/models/271592/big-head-3dxl?modelVersionId=306137)?
Code:
```python
from diffusers import StableDiffusionPipeline
import torch


def generate_images(prompt, num_images=5, local_model_dir="/home/yago/bigheadsdxl/bigHead3DXL_v10.safetensors", output_dir="./generated_images"):
    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    torch.cuda.set_device(device)
    try:
        pipe = StableDiffusionPipeline.from_pretrained(local_model_dir, local_files_only=True)
        pipe = pipe.to(device)
    except Exception as e:
        print(f"Failed to load model from local path: {e}")
        return
    for i in range(num_images):
        generated_image = pipe(prompt, guidance_scale=7.5)["sample"][0]
        image_path = f"{output_dir}/image_{i+1}.png"
        generated_image.save(image_path)
        print(f"Image {i+1} saved to {image_path}")


if __name__ == "__main__":
    generate_images("a sweet cat")
```
Error:
Failed to load model from local path: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/home/yago/bigheadsdxl/bigHead3DXL_v10.safetensors'. Use repo_type argument if needed.
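The error above most likely comes from passing a single .safetensors file to `from_pretrained`, which expects a folder in diffusers layout or a Hub repo id, so the file path gets read as a repo id. A minimal sketch of one way around it, assuming a recent diffusers version that provides `from_single_file` and assuming the checkpoint is SDXL-based (the "3DXL" in the file name suggests so, but that is a guess). `pick_loader` and `load_pipeline` are hypothetical helper names, not part of diffusers.

```python
from pathlib import Path


def pick_loader(model_path):
    """Name the diffusers loader that fits: single checkpoint file vs. model folder/repo id."""
    suffix = Path(model_path).suffix.lower()
    if suffix in {".safetensors", ".ckpt"}:
        return "from_single_file"
    return "from_pretrained"


def load_pipeline(model_path):
    # Imported lazily so pick_loader works without torch/diffusers installed.
    import torch
    from diffusers import StableDiffusionXLPipeline  # assumption: checkpoint is SDXL-based

    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32
    loader = getattr(StableDiffusionXLPipeline, pick_loader(model_path))
    return loader(model_path, torch_dtype=dtype).to(device)


def generate_images(prompt, model_path, num_images=5, output_dir="./generated_images"):
    Path(output_dir).mkdir(parents=True, exist_ok=True)  # make sure the folder exists
    pipe = load_pipeline(model_path)
    for i in range(num_images):
        # Current diffusers pipelines return an output object with an .images list.
        image = pipe(prompt, guidance_scale=7.5).images[0]
        image_path = Path(output_dir) / f"image_{i + 1}.png"
        image.save(image_path)
        print(f"Image {i + 1} saved to {image_path}")
```

Calling `generate_images("a sweet cat", "/home/yago/bigheadsdxl/bigHead3DXL_v10.safetensors")` would then go through `from_single_file`. If the checkpoint turns out to be SD 1.5-based, the pipeline class would need to be `StableDiffusionPipeline` instead.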
I haven't been here for a while. Is it no longer possible to use stable diffusion via the channels in this server?
ah crap, I hate hands and arms. Been trying to get a decent looking hand for an hour or more now, still getting all sorts of weird stuff. ControlNet tested etc.
Is DreamStudio down? Every time I try to create images I keep getting "something went wrong on our end, please try again later"
got a question. Do you think using koikatsu pictures in image2image will get you more accurate designs of the characters in it? (under the condition that you use a koikatsu image and the lora of the same character you want)
other than Arch Manjaro, what's the best linux distro for running SD with an RX6800?
manjaro is frustrating me that I can't use pip to install stuff
yes that could help, but mostly a lora is enough
im thinking of consistency issues
like if you have a character with a more detailed design
hey guys, if i have an image of a shirt, and i want to create a similar art style shirt just with a different material, what kind of prompt i should use?
or settings that im missing
you might be able to do that with IP adapter, but essentially you are telling it to use an image as the basis, so it's a controlnet thing vs a txt2img thing
try loading it into img2img, and into controlnet with ip adapter, and use a prompt like "make it cotton" or "make it silk". I've never tried it, so not sure if it'll work
start with a lower denoise and increase if nothing happens
bad luck?
the bottom row of options are all greyed out
in kohya_ss?
whats ip adapter?
yeah, I'm looking right now at how to even install it
any chance you have a few mins to guide me through it on Discord in the voice channel here? I'll share my screen
do you have controlnet installed? or know what that is?
nope, i have stable diffusion and able to generate some images with prompts atm
ok, so then I would recommend watching some basic videos on what controlnet does. working without it is kind of like going into battle without a gun, frankly
alright, will do
where do i start generating images ?
depends, you can install something local on your computer, or use an online service
how do i start generation here
there are bot channels
I dont use them, but check them out, you'll see others generating stuff
actually I am new here, so can you let me know where specifically?
hold on, I have all those channels hidden, let me undo that
what should i use till then
Hello to everyone!!
I need to be useful to my wife. She is an aesthetic doctor and needs an image (before and after) for a gummy smile treatment. You know. A smile with a very thin upper lip and when smiling, the gum and frenulum are prominently visible. This is a very simple and beautiful medical treatment.
I'm trying to make the first image, the before. I'm using "Fooocus", because I saw that it is an AI specialized in faces.
It should be noted that I don't have much of an idea and I have managed to run it because there is a .bat on github. Otherwise I wouldn't know how to do it.
The fact is that I can't get it to display the image correctly. No matter how detailed the prompt is regarding the topic of gums, it only makes normal smiles. Normal lips + pretty teeth. And it is precisely the opposite what I am looking for. A variant of that. But I don't know how to do it anymore. Can anyone help or guide me?
infinite thanks in advance community!
I used this: https://github.com/lllyasviel/Fooocus?tab=readme-ov-file
veeeeeeeery grateful in advanceeee
how do i run or use this
I used this prompt: "
A close-up view of a woman in her 40s, capturing her wide smile with her thin upper lip, clearly exposing the gum tissue and the frenulum that connects the lip to the gum. The texture of the gum tissue is pronounced due to her wide smile and the tension of her upper lip, which draws the viewer's attention directly to the space between her lip and teeth. "
What have I said wrong?
you're right, I wasn't aware because I never used it, but the post about it is here #1047610792226340935 message
got the control net installed and running, which ip adapter model should I get?
so according to that link, you'll have to install SD or find another way to generate images with an online service
There are a few options. The first is to download stable diffusion, but you need a powerful computer. Otherwise you can use midjourney, craiyon, runwayml, canva, ...
Adobe firefly is pretty good too
sorry, that link I posted above was only for 1.5. Are you using sdxl? If so, go here: lllyasviel/sd_control_collection
on Hugging Face
just grab the ip-adapter_xl.pth, and here https://huggingface.co/h94/IP-Adapter/tree/main/sdxl_models
And I put it under the same folder models?
I have A1111, so I installed ControlNet through the Extensions tab
stable-diffusion-webui/extensions/sd-webui-controlnet/models that's on my pc, but I'm linux, same concept though
Thanks a lot buddy, I'll try it!
good luck
Looks like I'll need it
Hey all
any good models that capture the feel of early AI image generation? (dall-e mini etc)
are you also interested in changing shirt to underwear too?
Can anyone give me the link through which I can generate images with sdxl in huggingface?
and also runway ml
guys for deepfakes, what are the state of the art methods?
(I don't want to run them locally)
Not really no, I'm developing a game and I bought some art asset like a shirt, pants shoes.. I need more assets for different class and level
off topic, but i generated an image on a core 2 duo with 2gb ram in only 22 minutes
this is actually crazy tbh
off topic, but does anybody know how to faceswap in a video?
What would be more interesting: buying credits on DreamStudio or getting a membership on the stability.ai website to create realistic photographic images?
Think of membership as acquiring a licence to use it (and that's free if not selling anything); credits on DreamStudio will allow you to generate
Impressive patience!!
and pain
Search the huggingface spaces for sdxl
Runway ml has to be done on the runway website
Just find the earlier model (ie stable diffusion 1.4)
Let's make a RIP on Dream's generations, presenting our favorite best works in a video slide to remember. And let's share in this chat. This will of course take time... but looking at the work that came out in the dream, I can't compare it to anything else. Whatever engine, etc. standing in this chat it was impossible to repeat! or let's share the works that impressed you and became irreplaceable forever... I'm completely inspired and I'm so glad that I was part of this project!
I just now saw that I can't attach an image or video, maybe I should create a separate chat?
Hi! Has anyone here experimented with cascade? I'm just getting into it, and I'm trying to figure out if there's a specific prompt format that works best.
Does anyone have an idea how to put real fashion items into an AI picture, like a sweater from Nike? Is there an add-on for automatic1111?
Thanks!
general with images
there are krita and photoshop plugins for both a1111 and krita
nope, video requires way more vram - I'm pretty sure theres a space on huggingface that has the demo
cascade and sdxl share clip models, so full sentences in the prompts should go better
yeah that's what I thought...
Can you elaborate about the photoshop plugin? I have A1111
Thanks!!
can i use comfyui with the new stableCascade yet ?
it always tells me missing modules, but i cant seem to find them
yeah, it is built into comfy now, will need to update the whole install
just a git pull ?
hmm, i didnt even install with git, but the portable version
how do i best update it ?
find new version of portable installer I think
hmm, i got a thousand addons and stuff installed, you gotta start again from scratch, yes ?
i think i just lack the new nodes for cascade, any idea where to get those ?
Thanks a lot! Do you have some links for more information?
Nope, should be able to install over the top I would have thought...
can you give me a link of what to get exactly by chance?
thx
where do i get that clip_g_sdxl.fp16.safetensors from and where do i have to put it to?
huggingface (or it should download automatically?) you might be best asking in that reddit thread sorry
hmm, it works with model.safetensors instead of the clip_g_sdxl ... but doesnt seem to make too good results. but i cant find the other one anywhere
Hello all. In OpenAI Whisper, I am trying to change this Python script so that instead of printing to the console it saves the segmented results into a .srt file. Currently, when I print the result, it's just pure text in a non-segmented format
import whisper
# Specify the full path of your audio file using raw string notation
audio_file_path = r"C:\Users\USER\Desktop\MikeQuinnPostVoiceRemoval.wav"
# Load the model
model = whisper.load_model("base")
# Transcribe the audio
result = model.transcribe(audio_file_path)
# Print the transcribed text
print(result["text"])
Which I call using this python interpreter
& C:/Users/USER/anaconda3/envs/whisper/python.exe c:/Users/USER/Programming/AI/Speech-to-Text/Transcription/transcribe.py
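A minimal sketch of the .srt conversion asked about above. Whisper's `transcribe` result carries a `"segments"` list next to `"text"`, where each segment has `"start"`, `"end"` (in seconds) and `"text"`. `format_timestamp`, `segments_to_srt` and the output name `transcript.srt` are hypothetical names chosen for illustration; the sample segments are made up so the snippet runs without Whisper installed.

```python
def format_timestamp(seconds):
    """Convert seconds (float) to the SRT form HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments):
    """Build SRT text from dicts with 'start', 'end' and 'text' keys."""
    blocks = []
    for index, segment in enumerate(segments, start=1):
        start = format_timestamp(segment["start"])
        end = format_timestamp(segment["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{segment['text'].strip()}\n")
    return "\n".join(blocks)  # blank line between subtitle blocks


# With the script above this would be:
#   srt_text = segments_to_srt(result["segments"])
#   with open("transcript.srt", "w", encoding="utf-8") as f:
#       f.write(srt_text)
example = [
    {"start": 0.0, "end": 2.5, "text": " Hello there."},
    {"start": 2.5, "end": 3661.042, "text": " Second line."},
]
srt_text = segments_to_srt(example)
```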
This is Stable Diffusion discord
ask me if i care? @slate dust
why dont u mind ur own business
with ur shitty ai dinosaur pfp xD
lol ill stop
Ok you rude pos
u started it
@frigid sinew - whisper is not a Stability AI tool. You might luck out and find someone in here who is able to assist (possibly in #stable-audio), but there are probably better discords/subreddits to be asking that question in, or you could even ask chatgpt by putting in the code and the result of the print statement, then asking it to edit the code to get the output into the format you want
nice
how do I add it to my stable-diffusion-webui
I dont even know what version I have 1.5? 2.2?
when I look in ./models it is just controlnet and other stuff
so you have stable diffusion webui (automatic 1111)
ok cool
yeah
if you open it up, you should see a version number at the bottom
very bottom of the screen.. should be a 1.x number, likely to be 1.6 or 1.7
webui-user.bat
1.7
if it's above 1.6 then to run sd-xl all you need to do is download an sd-xl model and put it in the ./models/Stable-diffusion folder
ignore the web interface for now, you've got a system that will work
ok. currently my models are
v1-5-pruned-emaonly
realisticVisionV60B1_v51VAE
what models do u recommend?
you can download a model from civitai that is using the SD-XL base (I recommend https://civitai.com/models/229002), shove that in the models/Stable-diffusion folder and away you go
(assuming you have 8+gb of vram)
yep I have 12gb
stable diffusion xl is just inpainting?
do u have any tutorials that would resemble my interface
when it finishes downloading
stable diffusion is the whole method of making art from text prompts
they said sdxl combines inpainting and image generation
sd-xl is a full model, so can do text2image, image2image and inpainting
whats so special about it
it's trained on a much higher base resolution than 1.5 (1024x1024 rather than the old 512x512)
also behaves much nicer on full sentences in the prompt rather than the word salads that 1.5 prefers
oh cool
I thought it also combined image2image and inpainting into a single step
rather than you having to do it in two
or am I thinking about a different webUI
different..
inpainting is just image2image for a small part of the image, as opposed to doing the whole thing
you make the image, then send it to the inpainting tab using the button below the image preview, and then you draw a mask over the area you wish to inpaint, choose settings (selecting "only masked") and then hit generate
thanks
make sure you update the prompt describing what you specifically want in the inpainted area, otherwise it will try to redraw a whole image in the area you just wanted a new hand
What are some good tips for getting good realistic images? My faces aren't turning out very good... I'm using a realistic model, but still kinda blurry and fuzzy with a deformed face. Thanks so much
which model, what resolution and sampler, what user interface?
Model : epicrealism_naturalSinRC1VAE.safetensors (something I found on civitai) sampler :DPM++ SDE Karras UI: SD1.5
im using 30 steps and 512 x 512
epic realism is a good start.. try using DPM++ 2M Karras at 768x768 - are you using comfyUI, automatic1111 or an online generator?
Okay cool thank you, automatic 1111
ok.. in that case, your best bet is to do 768x768 and then use hires fix with upscaler real-esrgan 4x+ upscale by 1.5, 10 steps at 0.4 denoise as well
also reduce the cfg
cfg scale, below height and width
oh okay thank you. What does cfg do?
tells the AI how strictly to follow the prompt, with 1 being just make me whatever and higher numbers being more true to the prompt (but at the risk of the image going weird)
Most models will be happy in the 6-8 range, but the realism ones can go from 2-6 with 5 being a fairly good middle ground
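A rough sketch of what the CFG scale does under the hood, as classifier-free guidance is commonly described: at each step the model makes one prediction without the prompt and one with it, and the scale decides how far to push along the difference between the two. The lists below are toy numbers standing in for a model prediction, and `apply_cfg` is a hypothetical name; real UIs do this on latent tensors.

```python
def apply_cfg(uncond, cond, cfg_scale):
    """Classifier-free guidance: move from the unconditional prediction
    toward (and past) the prompt-conditioned one by a factor of cfg_scale."""
    return [u + cfg_scale * (c - u) for u, c in zip(uncond, cond)]


# Toy numbers standing in for one model prediction (not real latents).
uncond = [0.0, 1.0, 2.0]   # prediction with an empty prompt
cond = [1.0, 1.0, 0.0]     # prediction with the user's prompt

low = apply_cfg(uncond, cond, 1.0)    # equals the prompt-conditioned prediction
high = apply_cfg(uncond, cond, 7.0)   # prompt difference amplified 7x
```

The amplified values at high scale are one way to picture why images start "going weird" when CFG is pushed too far.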
see the message in #general-with-images for what you should be able to get
Guys, where are the bots? Were they removed? I wasn't online much
using the a1111 webui, how do I create a matrix where the same lora is used at different weights, without the lora at a given weight being inserted into the prompt next to itself at another weight? I don't want "<(MY PROMPT)>, |<LORA_A:0.25>, |<LORA_A:0.6>" to happen;
worst case scenario, I have to make a matrix for each and every weighting I'm looking to use, which would be 0.0, 0.25, 0.6, and 0.9; with five different loras
use the x/y/z plot script at the bottom, and use Prompt S/R (which is search and replace) - format is the text you want first, then a comma, then the text you want to replace it with - you can keep adding commas and more replacements
Thank you! Can you give me a brief example prompt with explanation of how to modify it?
hm, I see the settings that option reveals. huh.
will look for a tutorial
think I know how it works, or at least how to test it!
only supports 3 loras at a time, huh.
you can do the same lora many times though
no, it supports 3 axes. i'd do prompt s/r on lora names on one axis, and strengths on the 2nd axis. like the X s/r field is <lora:AA,<lora:BB, <lora:CC and then on the y axis do s/r like :1>,:0.8>,:0.7>,:0.6>
then you get a grid of loras and their strengths nicely correlated
you can even do 1 grid for each lora. like x s/r is the strength, y set to cfg, and z s/r the lora name
what I figured, thank you!
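To make the two-axis Prompt S/R idea above concrete, here is a small plain-Python imitation of what the grid ends up containing. It assumes the documented Prompt S/R behaviour: the first entry on an axis is the text searched for in the prompt, and each entry (including the first) is substituted in turn. `prompt_sr_grid` and the base prompt are made up for illustration; the actual script lives inside the web UI.

```python
from itertools import product


def prompt_sr_grid(prompt, x_values, y_values):
    """Imitate Prompt S/R on two axes and return {(x, y): resulting prompt}."""
    x_search, y_search = x_values[0], y_values[0]
    grid = {}
    for x, y in product(x_values, y_values):
        # Each cell swaps the search text for that cell's replacement.
        grid[(x, y)] = prompt.replace(x_search, x).replace(y_search, y)
    return grid


base = "a castle on a hill <lora:AA:1>"
names = ["<lora:AA", "<lora:BB", "<lora:CC"]
strengths = [":1>", ":0.8>", ":0.6>"]
grid = prompt_sr_grid(base, names, strengths)
```

Every cell holds exactly one lora tag at one weight, which is the behaviour asked for: the lora never appears next to a copy of itself at another weight.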
So the only way to get consistent characters is thru loras? or are there better methods, also i use comfy-ui because its what my hardware will run.
only way to get consistent characters is to use the same seed every time
consistency is, well, you'll get a feel for it; you can use very specific prompts get the same pose, expression, angle, background, shading, and so on
After no crypts Collab got banned are there any free alternatives?
I'm gonna buy a pro membership but I run a side hustle using stable diffusion and I really need some alternatives
so seeds are a better/easier method to get similar looking characters? i was wondering about that.
once you get something you like, keep that seed, adjust lora and prompt weights to change the 'size', 'figure', and 'style' of it
afaik
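Why keeping the seed helps, as a toy sketch: the seed fixes the random starting noise, so the same seed with the same settings starts from the same place, while a different seed starts somewhere else. This uses Python's `random` module as a stand-in for the real latent noise generator, and `initial_noise` is a hypothetical name.

```python
import random


def initial_noise(seed, size=4):
    """Toy stand-in for the starting latent noise: a seeded list of Gaussians."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


same_a = initial_noise(1234)
same_b = initial_noise(1234)   # same seed: identical starting noise
other = initial_noise(1235)    # neighbouring seed: unrelated noise
```

Changing the prompt or lora weights while holding the seed still changes the result, so this gives similar compositions rather than guaranteed identical characters.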
instantID, ipadapter, roop, reactor - all tools in comfy that can do faceswap for consistency
loras are pretty effective. well trained checkpoints can be too. specialized to the characters you want
do they do outfits as well or only faces?
both
i just dont know if i have the hardware to create a lora and what if i want two different characters in 1 scene? or something.
face swaps are great for swapping faces. ipadapters can often do style transfer, but won't be consistent with clothing. sometimes pants, sometimes shorts
if you are running a side hustle, you should probably pony up some $$. cheap gpus (and some slow free-tier ones) are available at kaggle, paperspace, runpod, vast.ai, aws sagemaker
regional prompting is what you need for that
loras are pretty lightweight to manage and create. i believe in you!! you can train 2 characters to one lora if you build the data set right. or you can train separate loras and do regional prompting
thank you all for the help and for believing in me.
we're all some random guy
i asked these questions because I've been interested in creative media such as games, comics and whatnot.
Probably so.
it'll be similar, but the quality will be mindblowing in 2 years time
i'm trying to figure out what to do for loras on cascade. the popular projects have half built branches. there are lora scripts in the cascade repo. i just.. i'm not that good
may i ask what you all use ai such as stable diffusion for anyways?
I got the cash I just don't have a credit card or PayPal to pay for a service
My sister's getting her card made but until then
I need to use something
i float around different uis. right now been using forge. got it on the 1.8rc branch now
sd-webui-forge
interesting, I've tried many UIs and i find comfy works the best on my hardware.
civitai has a generator thats totally free, but can only generate in limited resolutions and no upscaling/inpainting
tinybots.net/artbot is free as well, but as you would be an anonymous user it'll be slow (unless you join the horde at aihorde.net)
happyaccidents.ai has a free tier
forge uses a few memory patches from comfy code and now all the major uis are using the same pytorch library. really depends on your hardware and how it gets set up too. i have a 4080 so i'm sailing
comfy and forge are the fastest ones by far
i wish i had a 4080
stable swarmui puts a great interface on top of comfyui node graph. it's nice.
i bet so but for me and my hardware comfy works out.
i should look into that then.
https://github.com/Stability-AI/StableSwarmUI i mean to play with it more often. its very customizable like comfy
cool and you said this is all comfy based...
by default. can use others as a back end
that is interesting
also since we were talking about ai side hustles, i wish i could start one but idk where to even begin. i mainly just use ai for fun and random stuff/testing.
You can make ai art for stock images, do commissions, use it to make dropshipped items; or train AI models for others
ooh, stock images and commissions seem like a good start. are there any good videos/resources on how to start with them?
search youtube - every man and his dog has a video on how to make an AI influencer! but there will be stuff there for stock images as well.
cool thanks guys
i just want to do something different from other people, you know?
hey guys, hope you are all doing well. I am new to stable diffusion, although my purpose is super clear: using text to image generation in real life, i.e. for UI elements. I have installed and tried the vectorstudio extension of automatic1111 for logos, illustrations, art, etc.
I will break down my requirements in bullet points here. So I am looking for any suggestions related to
- better arrangement, if any, for the kind of outputs I am looking for
- Is there a place to look for finetuned models and trying, comparing their outcomes? I mean like a community where people post their work, I know looking for similar tags in hugging face search results could be a way
- Any documented guideline for positive/negative prompts for best results
Note: I am looking to use TensorRT versions of any apt model, I am assuming that any model I pick will be a derivative of stable diffusion XL. You can comment on this choice too
tensorRT is best self compiled and doesn't give massive speed boosts from what i've seen, and for everything else, look on Civitai.com
yeah joined their server too
But I think if one makes sure what model one wants to use, it's still better to have boosted generation
it is boosted to a fixed resolution though innit
I think there's a way to set dynamic resolution while generating the tensorRT version
right..
but yeah.. civitai is absolutely the site for looking through models, they have a generator for trying most of the models (still in progress/setup mode), and most of the images there have prompts on them so you can see what works for each model and what doesn't
and you can filter the model by type so you only get sdxl models
eh.. haters gonna hate...
there is some validity to the "artists weren't asked before their data was included in LAION" argument, but the way AI works is no different to you looking at all of the images by the masters and then figuring out how to recreate it yourself; AI is just really good at that
power usage on GPU, yeah thats a thing, but it's tiny compared to a lot of things, and as for crypto, proof of work coins are really bad, but proof of stake coins are fine for the environment
haters gonna hate...
will look through
there's a few times things are over fit in sd 1.5 base model. and then other times where people were purposely training models on specific artists to piss them off (see a civitai contest)
probably Stable Cascade
Depends on what you want. There's general models available but in reality, every model has a default style that sort of can be seen throughout the model.
What the best site to host Stable Diffusion / pinokio.
not only in terms of price but also bc of use as i dont feel like / cant go through setting up EC2 again
I'm sorry but why is anything 5 a safetensor?
Are Lora files also safetensors? I don't get it
Safetensors just refers to the fact it is a set of static numbers that can't be executed
find myself hammering the interrupt button and it does nothing lately.
I don't want to close the app, so just wait for a1111 to finish processing bad images.
Prefix with eicar string and it'll be executable
Also, will get deleted by antivirus
Inpainting and a whole lot of luck
inpaint sketch works amazing for that
but eye positions in general are hard for AI. A lora also helps
hi
Think about what it costs in terms of energy to grow and transport and refrigerate the food required to keep you fueled up while tediously grinding enough drawing every last detail by hand
Bet it's a lot more than a couple minutes of juice to power a GPU even a big one
Yeah
Ppl make some desperate arguments
Oh yeah ik š
I was backing ya up
which could be the best LR scheduler for a face LORA?
just use prodigy
and the unet and text encoder, which values do you recommend?
hello everyone
Where do I report a user in this server? Like do I DM a community mod or what?
I can't even attach an image when I used that? Should I just link to a catbox?
You can once you send the initial msg inside
Ah okay thanks
Btw Gigabot seems broken: after the ticket closed, it gave a transcript, but it's empty. Not that I personally care, but figured I'd let you know in case you don't already know
I can't draw this instructions now
I don't use Comfy / Automatic only diffusers, but my guess is that the latent output from StableCascadePriorPipeline is fed to the SDXL refiner?
not quite - it is sent to a second model in the same way the refiner is, but its based on the wuerstchen architecture
Trying to outpaint a room, what would be the best method for it? I have a base image of 768x1024 which is then expanded to 1700x1024 in GIMP, basically just adding empty canvas to both sides. In the base image there's a person standing and I want to fill in the rest of the room. So far I've just masked the empty areas, tweaked denoising and basically hoped for the best. This process is really slow, as it's mostly pure luck to get something that fits the base image.
I've tried masked content fill / original, inpaint area whole picture / only masked, and different denoise values. If denoise is high enough to get more of the room generated, it usually does not fit the base image at all. If the denoise is too low, I get a door or wall blocking the view, or just some color or white or whatever. Around 0.71 seems to be the sweet spot, still far from good haha.
Also poorman's outpaint and outpaintmk2 give bad results.
Could I get some help with ControlNet?
Hello everybody!
Please, I need someone to help me install Stable Diffusion on my PC. I've been trying for weeks and for some reason, I always end up getting an error. I have an AMD processor and graphics card (I think this is the problem). I've watched tutorials and tried numerous possible solutions, but I can't find the solution. I'm sure it's something simple that I'm overlooking.
If someone could help me privately, I would be very grateful.
Installing on windows? which amd graphics card?
if ubuntu linux - scroll up to about 10 hours ago, I went through it with someone this afternoon
Windows 11. 6700XT
i have a question for ppl who have 6-8gb vram: how does SD forge or normal SD from a1111 work for you guys? can you generate sd 1.5 and SDXL images bigger than 1024x1024?
tiled VAE - done automatically in forge, or needs the multidiffusion extension in a1111 - splits the image into tiles as it goes into the last part of processing (the VAE decode), so it doesn't swallow up all the vram at once
ok, but it's possible? and how long does it take to generate one 1024x1024 image with 6-8gb vram?
Depends on card and model - a 1060 6gb will generate much slower than a 3070 8gb
Definitely possible though
ok thanks
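A simplified picture of the tiling idea mentioned above: instead of processing the whole image in one go, it is cut into overlapping boxes that are handled one at a time, so peak memory depends on the tile size rather than the full image. This is only a sketch of the splitting step in pixel coordinates; `tile_boxes` and the default tile/overlap sizes are assumptions for illustration, not the actual code or settings used by forge or the multidiffusion extension.

```python
def tile_boxes(width, height, tile=512, overlap=64):
    """Return (left, top, right, bottom) boxes that cover the image.

    Neighbouring tiles share at least `overlap` pixels so seams can be blended.
    """
    if overlap >= tile:
        raise ValueError("overlap must be smaller than tile")
    step = tile - overlap

    def starts(length):
        if length <= tile:
            return [0]
        positions = list(range(0, length - tile, step))
        positions.append(length - tile)  # last tile sits flush with the edge
        return positions

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]


boxes = tile_boxes(1024, 1024)
largest_tile_area = max((r - l) * (b - t) for l, t, r, b in boxes)
```

For a 1024x1024 image the largest piece handled at once is 512x512, a quarter of the full area, at the cost of processing several tiles in sequence.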
Hey all, does the speed and amount of compute power determine what kinda image will be spit out by stable diffusion ie. 1070 vs 4070
you could use the same prompt but it will give totally different output?
fix bot please sir
No, you can generate the same image on a 1070 and a 4070
It depends on the settings used. There are some extensions and settings (mostly --xformers) that make SD non-deterministic and/or change the result depending on hardware. But no, the quality of the output does not depend on the hardware used.
thank you
Good morning, everyone! How are we all today?
I am fine on this President's Day
Hello everyone!
Im quite new to generative AIs. I was wondering if the base model, like v1.5 is necessary to have in your model folder when you already have other models trained on this version, like dreamshaper or epicGasmPhoto or so? Or for example, to use JuggernautXL, do i have to first download SDXL also and have it in the models folder? Or these base models are mainly relevant for people to build on these models, and afterwards they are independent
I currently have e.g. SD 1.5, epicrealism, dreamshaper in my models folder and in the UI epicrealism is selected as checkpoint since I play with it. Do I have to keep SD 1.5 itself or can those checkpoints can work without it?
no, each community trained model is independent and can work by itself.
Good to hear!
guys im getting this error
RuntimeError: Could not allocate tensor with 2047360052 bytes. There is not enough GPU video memory available!
I am on a 6700xt with 12 gigabytes of vram
on directML windows
using no-half
and low-vram
it generates the whole image fine but when it gets to upscaling
it insta-crashes
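For scale, a quick conversion of the number in the error message above. It only turns the byte count into GiB and compares it with the stated 12 GB card; it says nothing about what else is already occupying VRAM when the upscale step starts. `bytes_to_gib` is a hypothetical helper name.

```python
def bytes_to_gib(num_bytes):
    """Convert a byte count to gibibytes (1 GiB = 2**30 bytes)."""
    return num_bytes / 2**30


failed_allocation = 2_047_360_052   # from the error message
card_vram_gib = 12                  # as stated for the 6700 XT

request_gib = bytes_to_gib(failed_allocation)
share_of_card = request_gib / card_vram_gib
print(f"{request_gib:.2f} GiB requested, {share_of_card:.0%} of the card")
```

So the single allocation that fails is a bit under 2 GiB, which suggests (as a guess) that most of the card is already in use by the time upscaling begins rather than the upscale alone needing all 12 GB.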
damn, at this point being able to get a 3090 24gb or something like a 4070 ti 16gb is the only thing that's gonna save my interest in doing this. Maybe even more vram ;/ Hopefully AMD's (or the guys from AMD) making ZLUDA will eventually result in that being able to work for communicating quickly between cuda and non cuda
AMD are just worth so much more at this point, esp if they release another batch soon
so, got any solutions?
nay
you can already use zluda with SD
I've been gone for too long to remember how to fix things
cf pinned comment in #tech-support
or how to get SD to work with an AMD
and how to get an AMD
currently have a 3070 ti
sitting at 8gb vram, which is a bit of a bummer
does it work well?
Looks like it, there are some problems with some extensions and other stuff of course. I don't know much about it as I'm an nvidia user.
i was shocked to see the whole thing written in rust
Careful.... #rules-and-tos message rule 4
(misclick)
@indigo wasp
It works very well for 1.5 models, loras but sdxl and some extensions are bugged
Image of a lively Paris neighborhood in the evening
Don't try to do that kind of stuff with the bot in here (whenever and if it comes back online), keep it to local instances. Otherwise you'll get bonked
i do it in installed sd
but it just doesnt work
theres still the bra
It depends on your model, what you prompt for, etc. By default there's no censorship on local instances.
which model should i use
thanks, then I can simply delete SD 1.5
nope, currently the bots are under maintenance. #1047610792226340935
like i said the bots in the bot channels where you can generate images with, are offline right now
Is there a tutorial to generate images with 1 object/body part always in the same spot/location?
@tidal adder hello sir
hello?
@tidal adder can u help me out
probably not as there are many ways to do so.
- using some mask
- using controlnet
- partitioning the output into multiple zones with stuff such as "Regional Prompt Control"
- using a reference img and altering it with img2img
@stuck gazelle
does this have a local api? https://github.com/Mangio621/Mangio-RVC-Fork
I didnt know you made that thats kinda crazy
yup
agreed
Thank you, which method would you deem best? I tried mask but it's inconsistent unless I missed something
who can help me out what is the front end that is used for SD
If it is just a front end that runs on a browser and it doesnt have a api. can I just reverse engineer the requests
it depends on what exactly you're trying to do, try asking and be more specific in #prompting-help (got to run for now)
Will do, appreciate it!
@tidal adder I guess u dont care :/
i don't do stable diffusion stuff
i specialize in language models
oh wait
you had two questions
no
it's gradioware unfortunately
@tidal adder Is there any way to programmatically interact with it?
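On the SD side at least, there is no need to reverse engineer the browser requests: the AUTOMATIC1111 web UI serves a REST API when it is launched with `--api`, including a `/sdapi/v1/txt2img` endpoint. The sketch below only builds the request; the host, port, prompt, and the `build_txt2img_request` helper are illustrative assumptions. Whether the RVC fork linked above exposes anything similar is not something this sketch covers.

```python
import json
import urllib.request


def build_txt2img_request(prompt: str, steps: int = 20,
                          base_url: str = "http://127.0.0.1:7860") -> urllib.request.Request:
    """Build (but do not send) a POST request for the web UI's txt2img endpoint."""
    payload = {"prompt": prompt, "steps": steps}
    return urllib.request.Request(
        url=f"{base_url}/sdapi/v1/txt2img",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_txt2img_request("a lighthouse at dusk")

# To actually send it (needs a running web UI started with --api):
# with urllib.request.urlopen(req) as resp:
#     images_b64 = json.load(resp)["images"]  # list of base64-encoded images
```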
anyone able to use this for controlnet tile on sdxl yet? https://github.com/showlab/X-Adapter?tab=readme-ov-file#thank-kijai-for-creating-a-cumfyui-warpper-node-here
why does the ai not make a ai image by my reference image?
Hi when are bots coming back? Bot 9 Is my best friend :/
last I saw it was still no eta
hi there, do all checkpoints use SD1.5 etc as a base when creating their own?
or does anyone build their own checkpoints from scratch (w/o the base)?
i want to study more how the whole process works overall
almost nobody is creating checkpoints from scratch, with consumer hardware...what most of us do is finetune a base checkpoint using dreambooth or similar
you would need a farm of computers, and months, and $$$$
oh? so the base models are the most advanced i guess?
hasn't any one made alternatives to the base model for specialised purposes?
sure, you can finetune them, merge them together, etc
add new concepts using new images, all that is possible
the purpose would be to analyze each layer of the process
like can't you make your own base using the same checkpoint creation tools
What is the process for training a model in Comfy? Suppose I already have the base model, how do I reward it if it's right, and correct it if it's wrong?
nothing is stopping you from creating your own base checkpoint, but research it for a while, and you'll realize the massive amount of data you need to feed it. It's not the process that's hard necessarily, but the data and resources needed
consider there are big companies doing this now, stability, google, meta, with teams of people and billions of dollars at their disposal
Hi, we are looking for entry-level AI Engineers who can help us finetune a Stable Diffusion model. If you are interested, please DM me with any related work you have done professionally, or personally if you have no work experience.
i don't need to make a replacement per se--
i was wondering though if it could just substitute it with a small model to see what i can output w. my own training
is it similar to in kohya where it takes a ton of images and captions?
start researching LAION
Hi guys, anybody know one Upscaler model with have similar results to https://magnific.ai/ ?
if you want those results, use that
can you upscale with other methods, yup, lots of ways to upscale, and SD is really good at it actually
SD is a little quirky though in some ways, it has resolutions it prefers, there are limitations based on your resources, etc
what tools do people typically use for creating their checkpoints--and how do they differ from lora's
is that what the finetuning tab in kohya is for?
lora edits small parts of the model, finetuning edits the whole model; similar results - what tools: kohya or onetrainer
Just updated to Stable Diffusion Forge. Is this the current best option? What do ya'll use?
It's a good option, best is personal preference but generally forge/a1111/comfyUI are "best"; I'm running forge currently
so are you saying the lora's outputted have a small part of the base model injected into them relevant to your training data?
so it's like a patch for a portion of the base i guess?
nope.. the output is a set of numbers to swap into parts of a base model to make the model generate in a way that is relevant to the training data
patch is a good word for it yeah
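A rough sketch of the "patch" idea, assuming the standard LoRA formulation rather than anything specific to kohya: the LoRA file stores two small matrices per patched layer, and their product is added to the base weight (added rather than literally swapped in). The matrices and the `apply_lora` helper below are made up for illustration.

```python
def matmul(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    """Plain-Python matrix product of a (n x k) and b (k x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def apply_lora(weight, down, up, scale: float = 1.0):
    """Return weight + scale * (up @ down), the patched layer weight."""
    delta = matmul(up, down)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]


# A 3x3 base weight patched by a rank-1 LoRA (3x1 "up" times 1x3 "down").
base = [[1.0, 0.0, 0.0],
        [0.0, 1.0, 0.0],
        [0.0, 0.0, 1.0]]
up = [[1.0], [0.0], [2.0]]
down = [[0.5, 0.0, 0.0]]

patched = apply_lora(base, down, up, scale=1.0)
```

Here the LoRA stores 6 numbers to patch a 9-number layer; at real layer sizes the saving is far larger, which is why LoRA files are small compared with a full checkpoint.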
i see
so all checkpoints are usually just finetuning the base right?
do you know what tool typically artists used for 1.5 checkpoints like on civit
is kohya pretty useful for that or mostly for lora's
anyone else having issues with dreamstudio?
a checkpoint is a full model
kohya_ss (bmaltais GUI) or OneTrainer are common tools, or if making loras you can do that on the Civitai website as well
yea but most of them use the base model still
kohya is for any training
it is always best to train off the base model, and will have the architecture of the base model, it just changes the numbers as it does the maths to make the new model resemble the training data
is there a real performance difference between storing the webui and content on an HDD instead of an SSD?
i don't think so--you are not outputting very large files
i think that your real gains come from ram, gpu and vram
one other thing, anyone know about the checkpoint merger in A1111
is weighted sum option still useful if using a tertiary model?
the example i've seen suggests that a 3rd model is mostly for "add difference"
only time you will notice a difference is when swapping models
yep - weighted sum not useful for 3rd model in checkpoint merger - there is some use in supermerger extension - I would suggest grabbing that extension and reading up on the settings available in that one as that'll get you 99% of the way there
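For reference, the two merge modes can be sketched as below. The formulas are written from memory of how the A1111 checkpoint merger tab describes them, so treat them as an assumption; the one-key dicts stand in for full state dicts.

```python
def weighted_sum(a: dict, b: dict, m: float) -> dict:
    """Blend two models: A * (1 - M) + B * M. No third model is involved."""
    return {k: a[k] * (1 - m) + b[k] * m for k in a}


def add_difference(a: dict, b: dict, c: dict, m: float) -> dict:
    """Add what B learned relative to C onto A: A + (B - C) * M."""
    return {k: a[k] + (b[k] - c[k]) * m for k in a}


model_a = {"w": 1.0}
model_b = {"w": 3.0}
model_c = {"w": 2.0}

blend = weighted_sum(model_a, model_b, 0.5)
diff = add_difference(model_a, model_b, model_c, 1.0)
```

Weighted sum has no slot for a third model, which matches the answer above: model C only matters for "add difference".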
okay--i am guessing without the extension the 3rd model isn't used right?
also one other thing, how do i know what config option to select
what does the config affect
that's to do with yaml files, if merging different types (i.e. inpainting models), I've never touched it
do you by any chance know what the json files outputted in kohya are for?
should they be added to the webui with my loras?
you can save information about the loras, prompt shortcuts, image previews, etc
is that just for kohya i guess?
well doesn't seem to be needed for A1111 and i wouldn't want to distribute with lora because it has my system file paths etc
not needed, can be removed unless you want to figure out what you did last time to recreate a training
i figured it out it's no different than the ones manually saved
I'm new to stable diffusion, and I got SDXL turbo working for txt2img and img2img. I was wondering how it's possible that the same set of weights can work for two different tasks, cause the if the model works for img2img, then it should be trained on text and image conditioning. Then shouldn't it break down/not function properly in the txt2img setting, since the image condition is no longer being provided?
Is the image condition being set to some "null value" (like in classifier free guidance) when I use the model for txt2img?
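One way to look at it, assuming the common latent-diffusion img2img scheme (as used by the diffusers img2img pipelines, to my understanding): the input image is not fed to the network as extra conditioning. It is encoded, noised part of the way, and the same text-conditioned denoiser finishes the job, so txt2img is just the case where the start is pure noise. The sketch only shows the step bookkeeping; `steps_to_run` is a hypothetical helper.

```python
def steps_to_run(num_steps: int, strength: float) -> int:
    """Denoising steps actually executed for an img2img strength in [0, 1]."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    return min(int(num_steps * strength), num_steps)


# strength 1.0 discards the input entirely and behaves like txt2img.
full = steps_to_run(20, 1.0)
# strength 0.5 keeps the input's structure and runs only the last half of the schedule.
half = steps_to_run(20, 0.5)
# With very few steps (as with turbo models) a low strength can leave nothing to run.
none = steps_to_run(1, 0.5)
```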
If you use paid Google Colab you get 100 compute units for $10. Nowhere can I find how this actually relates to something real. If I train a lora (kohya_ss) with 15 minutes and it takes 20 minutes, how many "compute units" is that?
If you are using a model you downloaded from civitai, then on the page you downloaded it from, if you scroll down there will be lots of example images with the positive and negative verbiage.
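The arithmetic itself is simple once the burn rate is known. The rate depends on the runtime type and is not given anywhere in this chat, so the `6.0` below is a made-up placeholder, not a real Colab figure; the $10 per 100 units price is the one quoted above.

```python
def compute_units_used(units_per_hour: float, minutes: float) -> float:
    """Units consumed by a session of the given length at the given burn rate."""
    return units_per_hour * minutes / 60.0


def cost_usd(units: float, usd_per_100_units: float = 10.0) -> float:
    """Dollar cost of a number of units at the quoted $10 per 100 units."""
    return units / 100.0 * usd_per_100_units


# Placeholder rate: replace 6.0 with the rate shown for your runtime.
used = compute_units_used(units_per_hour=6.0, minutes=20)
price = cost_usd(used)
```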
Even if you aren't using civitai, might be a good place to get some starting verbiage.
Ok thx bro
You're the best
Have a good night
Drawing a traditional Chinese painting against the backdrop of Lingang New City in Pudong New Area, Shanghai. The work can showcase the following aspects of content: Showcasing the governance and protection of the ecological environment of rivers, lakes, and seas; Showcasing a waterfront open space with clear water, green banks, and pleasant ecology, an ecologically clean small watershed, a waterfront landscape at the entrance of a home, and a beautiful river and lake landscape; Display scenes of river leaders at all levels, volunteer teams, and civilian river leaders patrolling and protecting the river; Display themes related to water knowledge, water conservation, water conservation, and hydrophilicity
Hi guys, I'm new in stable diffusion. I'm thinking about maybe using the stable diffusion model for a thesis with imaging applications. Do you think it is possible to use it, I would like to ask you what recommendations you have? That is, when a research is proposed, improvements are usually suggested in some aspect of the model and comparisons are made. In this case, what could it be with stable diffusion?
cat
What would your thesis be? What hypothesis are you testing?
does anyone know how to set up a good face swap? willing to pay $
can i ( Hires. fix ) in img2img ?
has stable cascade been implemented in any front end like comfy ui yet ?
Bots STILL not fixed yet?
they aren't broken, just disabled
no.. hires fix is essentially img2img under the hood
look up instantID or ipadapter
there are proper comfy nodes for it, and a temporary a1111 extension
Hello, does anyone know a good resource for comfyui that checks for doubled or duplicated pieces of a prompt?
For example, it might check if I have 'blurry' written twice
Not that Im aware of but it sounds like a good idea
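Until such a node exists, a check like this is easy to run outside ComfyUI. A minimal sketch, assuming a comma-separated prompt; it ignores weighting syntax such as `(blurry:1.2)`, and `duplicate_terms` is a hypothetical helper, not an existing node.

```python
from collections import Counter


def duplicate_terms(prompt: str) -> list[str]:
    """Return comma-separated prompt terms that appear more than once (case-insensitive)."""
    terms = [t.strip().lower() for t in prompt.split(",")]
    counts = Counter(t for t in terms if t)
    return [term for term, n in counts.items() if n > 1]


dupes = duplicate_terms("blurry, low quality, Blurry , extra fingers")
```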
hi all, is there a quick intro guide to getting stable cascade running locally on my GPU? i saw the github but didn't really see a step by step guide for someone new to this. i am familiar with python though.
what makes stable cascade different than 1.5/xl. excuse my laziness but I really dont want to waste time watching videos on something that can be summarized in one sentence
what ive seen is that cascade does a much better job with words
Under the hood it is doing maths on a much smaller latent: a 1024x1024 image gets compressed to about 24x24, where SD 1.5 compresses 512x512 to 64x64
Yes. Since the weekend I always get the error "something went wrong on our end". I already opened a ticket but so far no reply
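To put numbers on the 24x24 and 64x64 figures, assuming they refer to latent grid sizes: as far as I recall, the Stable Cascade README quotes a compression factor of about 42, against 8 for the SD 1.5/SDXL VAE. Both factors are taken on that assumption, and `latent_side` is a hypothetical helper.

```python
def latent_side(image_side: int, compression: int) -> int:
    """Side length of the latent grid for a square image at a given spatial compression."""
    return image_side // compression


cascade_1024 = latent_side(1024, 42)  # Stable Cascade at 1024x1024
sd15_512 = latent_side(512, 8)        # SD 1.5 at 512x512
sdxl_1024 = latent_side(1024, 8)      # SDXL at 1024x1024
```

A 24x24 grid has far fewer positions to denoise than a 128x128 one at the same output resolution, which is where the speed claims come from.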
Hi there,
Can somebody tell me what model is used in AI to make a face pose at different angles in real time?
That was the DragGAN paper if I remember rightly - hunt that down and you may have your answer
yesh... really really htnaks
*thanks
Same here. So frustrating, especially with no communication around what's going on. I have a good amount of credits in there.
@karmic brook @finite cloak - you know anything about the dreamstudio issues/who to chase up in case you dont??
Dont use google colab, they are openly against uncensored models. Id use vast ai or runpod
Google also intentionally has sabotaged SD before, and i couldnt get the webui running due to resource limitations even after upgrade
This - and youll get a better gpu on runpod/vast.ai
They do remove accounts circumventing the ban on a1111 and comfy
How do/are you making money with AI art?
There are generally 3 groups - those running a service (like mage.space owners), those doing commissions (either model training or making art), or those making products (dropshipping style putting their ai art on things)
Is there $$ to be made? Maybe but I wouldnt quit your day job to pursue it given barriers to entry are low and its not a new space anymore
Two others I hadnt thought of were Youtuber (eg Matt Wolfe, Sebastian Kamph, Olivio Sarikas), and software dev/in house expert in the corporate space
Hope that helps @urban elm
Hi guys, im not that much of an ai user, I use it from time to time. Been creating a startup recently, logged in to generate some logos for my startup but that maintenance on bots ruined my plan. And it seems like it wont be available for a while too.
Can you guys suggest me where to create a decent logo for my start up ?
Is there official Stable Cascade workflow ?
I think there was one floating around on reddit linking to one
Try the create button on Civitai.com
Images that used to generate in 10 seconds are now taking like 3 minutes. Restarted everything, made no changes to models or settings
How do i fix this
Okay just narrowed the problem down
for some reason automatic1111 is only using 30% of GPU where before it used to be like 90%+
Cuda usage is also 30%
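To watch utilization and VRAM while a generation runs, rather than eyeballing Task Manager, `nvidia-smi` can be polled from a script. A small sketch, assuming an NVIDIA card with `nvidia-smi` on the PATH; the helper names are made up, and the parser expects plain integers (it would fail on fields reported as `[N/A]`).

```python
import shutil
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_line(line: str) -> dict:
    """Parse one CSV line such as '30, 5800, 24564' into named integer fields."""
    util, used, total = (int(x.strip()) for x in line.split(","))
    return {"util_percent": util, "mem_used_mib": used, "mem_total_mib": total}


def read_gpus() -> list[dict]:
    """Return one dict per GPU, or an empty list if nvidia-smi is not installed."""
    if shutil.which("nvidia-smi") is None:
        return []
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return [parse_gpu_line(line) for line in out.splitlines() if line.strip()]
```

Calling `read_gpus()` in a loop during a run with and without the slow setup gives comparable numbers to post when asking for help.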
what resolution and what graphics card?
I'm assuming nvidia, and a reasonably large resolution for the card
Earlier this morning things generated 5-10x faster and I would lag just watching youtube videos while it was generating
512x512 rtx 4090
I mean youtube at high resolution does suck some gfx capacity, are you using batch size or batch count ?
Batch count
you got a controlnet stuck? have you done the cudnn file fix?
I don't have control net installed
have you turned it off and on again?
I'll try this
Generation speed without loras is now a lot higher but with loras it goes from 20it/s down to 1.5it/s on the same settings
the moment I remove the lora goes 10x
same speed with 1 lora or 10
VRAM usage is 5.8gb without any loras although there are random spikes to 16gb
that'll be while the vae is loading.. but shouldn't be that high
weird with the loras though
