#💬|general-chat
Profoundly accurate 😔
it's not so much that it's overdone as it's that ppl are having massive irrational panic attacks and getting very aggressive with some content creators
isnt that just the internet in a nutshell? practically?
yeah, and ppl in general in real life too, which is where half the problem is
bet game devs etc are getting hammered with death threats every time they use generative AI to create something, in part because of the effect it has on jobs, when we've already been shipping jobs out of the country anyway etc...
oh i saw a post on reddit of someone who made a barebones point n click game with gamemaker and stable diffusion, and i bet soon enough people are going to panic about it or something weird like that.
Some of us still remember the panic attacks that people had when Photoshop was created. Or the panic attacks photographers had when digital photography became a thing.
yeahhhh.
i wasn't old enough to have much memory of it, but i have family who have done a lot of professional work in photography/film/etc... one was one of the first to make truly convincing photoshops and caused a real shitstorm in their field
stupid how much blowback there was
People didn't like the idea that you didn't have to develop film. There was plenty of pushback because "digital photography will never be as crisp and clean as analog".
yep that's what i was told they were blasted with
won't have the same rich colors etc
reminds me of the same crap with vinyl
Also "If you use samples in your beats, you aren't a real musician. You have technicians making noise, no one here is a musician, because no one knows how to play the guitar!"
can't count how many times i had ppl tell me that they could tell the difference between a really well encoded mp3 and a .wav on their headphones... and that vinyl is a more accurate representation of the original sound than digital
haha yesss
like the ppl who used to get really upset when someone would program in a midi loop for a synthesizer using a mouse
Hello all. I discovered the world of AI generators a few days ago and have been learning about it. I chose to go with the stable diffusion route even though i don't have a GPU. I know it will be painstakingly slow without GPU and maybe I won't get far but wanted to delve into this world that I am VERY interested in. If I am able to compute with this stuff I would invest the $ to get the proper hardware. With that all being said I am in need of help for a new beginner. I can't seem to get the stable-diffusion-webui.git to install the necessary files as I get a return error of "https: is not recognized as an internal or external command". I am in the proper folder \Automatic1111> and copied that line in and this is where I am stuck. Thank you for your kindness and understanding of my struggle with it. I was following a YT video verbatim and there doesn't seem to be any rhyme or reason for my not being able to move forward. Thank You.
you'll want to take your question over to #🤝|tech-support , there's ppl in there who are very helpful 🙂
Thank you very much! I'm new to discord too. Appreciate you.


i wish they would release stable audio 2 as a local thing too (aka the weights and inference code), cause I would love to use it, i know they said they are working on some version that will be free, but im not sure if that version will be on the same "level" or quality as stable audio 2
Me when SD3 releases
I know it is trained on different dataset
instead of Audiosparx
but what dataset? I dont have a clue
I really wish they'd release progress updates about sd3
anyone having this error with animatediff? video_length = mm_animatediff.ad_params.batch_size
AttributeError: 'NoneType' object has no attribute 'batch_size'
animatediff error on stable webui and forge webui
they don't do that with anything
based on tweets and reddit threads, some who had it originally, no longer do. some who didn't have it originally, now do. so things are moving.
why would they revoke access like that
was it the guys on civitAI who generate nothing but various fluids seeping out of engorged anuses trying to get SD3 to do nothing but that?
and Stability were like guys I know you are red teaming it here but can you try to generate anything at all other than fluids seeping out of engorged anuses
and they were like "no. We cannot commit to that."
why do u know they generate that and why do you care if they do
have you been to the civitAI website sir
plenty of times
ok, my follow up question is do you have functioning eyes
do you?
yes, this is the reason why i know they generate that. Because i have been to the website and have functioning eyes.
then
a.use the nsfw filter or
b.dont go there
why do you care that i care
why do you care that i told you that, you can try to cope with it
sir please, this is boring.
probably limited resources. and to be straightforward, at least the stuff that gets posted on twitter is usually the most boring benign stuff I've ever seen. 99% of it doesn't in any way exercise what SD3 can do.
yeah every image i've seen posted had that ugly blasted contrast/oversaturated/ultrasmooth skin look that plagued SDXL too, very ugly aesthetic. Text handling was better i guess, but who knows how much they are cherry picking.
A couple pics were generated where I asked for them, and those were impressive. but pretty much everything else was a girl standing there. a small figurine. some water color jellyfish. argh.
meanwhile with dalle "show me a picture of a universe made entirely of cheetos, where matter itself has cheeto atoms made up of cheeto subatomic particles of cheetos" and it does it. "SD3 research paper: jUst As geWd as Dall3!"
And MJ is miles ahead aesthetically
anyway I remain hopeful!
anyway I remain delusional!
well, i'd say with ipadapter, all that MJ styling is easily attainable in sd3 now
yeah, it really is frustrating how many dumb prompts have been thrown at SD3
i'd like to see "woman standing in kitchen holding a bloody knife behind her back"
I tried every which way with sdxl, all models, i made it a toy instead of a knife. closest I could get is a view from behind.

Should try Epicrealism

lol i pulled it up on civit and it's just wall to wall porn
Hi Everyone. Not sure if this is the right channel for this. I'm looking for a stable diffusion / MidJourney professional who can assist with a project on digitally altering images of socks. I have PNG images and 3D files of the socks. The goal is to take these images, keep the socks unchanged, and completely transform the model and background to a design of my choice. I would also love to learn this process. If anyone is skilled in these techniques and is open to collaboration and teaching, please DM me. I've attached an example of the final result we want. Thanks!
honestly sounds like it would just be easier to do in photoshop
I would be happy to know how 🙂
MS Paint
object selection tool makes cutting out objects easy
transform/warp lets you reshape an object
use a few feathered masks and curves adjustments
and its about 2 mins work
But first I need to create the new image. I can't copy other people images and backgrounds 🙂
but you can use this to put it into any image you want
people wanna use AI for everything and it aint always the simplest solution
i mean otherwise you take a bunch of images you want and train a lora with that sock design and generate whatever images you want with these socks in 'em, but its a lot more work and trial and error, etc etc
or if you have 3d files of the socks and know your way around 3d you can generate the images you want in SD or MJ and render/comp the socks in if they need to have very specific lighting
etc
you said you have 3d models right?
someone made a socks adetailer model
interesting, there's a pissy exchange between a guy thats critical of the model and a guy defending it and i dont believe the guy defending it for one second
the "its good if you prompt it good" guy
huh i wonder who that could be
My expectations aren't that high any longer ...
(the model is good if it is prompted good)
like 2-4 weeks
The last release wasn't really good. Swarm intelligence made it interesting
what do you mean by "swarm intelligence"?
Stable Diffusion release didn't reach expectations. Community models made it usable ...
eh, its almost always like that, sd makes a great base, people finetune to specification (weeb, not weeb, photos, whatever)
Didn't they show SDXL can do text?
I think my least favorite kind of guy is the "why wld u want to do that?" guy that infests literally every forum for every piece of software or hardware and naysays anyone who asks for any kind of improvement ever, these people are put on this earth by satan to drain everyone's will to live. The "model is good if it is prompted good" guy is that kind of guy.
sd3 is good if prompted good, if you prompt sd3 like dall3, it will not be as good
tag based prompting is dead (for launch sd3)
"of course its good for multi characters, anatomy, and complex poses, it works great for me, here's an image of a person standing there is a really boring static pose to prove its good at that."
i never said that, though will test if you have your doubts
?
i'm talking about the guy in the reddit thread
ParseeMizuhashiTH11?
thats me lol
lol
A lot of pictures they show should be doable with an older version, too. Maybe with an additional tool ...
im reading the threads. if the research team departed from stability, who is touching the model currently? so everything testers are playing with is no longer the original model produced by the research team but rather something still being messed with by, who?
No hope for SD3 and i doubt the community can fix it.
here's a prompt SDXL totally fails at
analog photo of 3 women doing a backflip in mid-air, dynamic pose, motion blur, vintage, faded film, film grain, wearing tracksuit, soaking wet, heavy rain, sunset, flare, light leaks,
in every case it turns into a mutant freakshow because XL doesnt understand human anatomy in extreme or complex/dynamic poses
take out everything but "photo of 3 women doing a backflip in mid-air"
the other shit muddies the question of whether it can address that one core concept or whether it's just losing focus
why, i want it to be sunset and raining and look like a vintage photo, and the sun to be flaring into the lens
the big issue is SD isn't trained on upside down ppl
yes, you do, but if you run into a problem, best troubleshooting strategy is to reduce the problem down to what you think the key issue is
reduce the variables to a minimum so you can come up with firm answers to questions
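A minimal sketch of that troubleshooting approach: start from the core concept and add one tag at a time, so the first prompt that breaks points at the tag responsible. The function name and the tag list are made up for illustration and are not part of any webui.

```python
def prompt_variants(core, extras):
    """Return prompts that start from the core concept and add one extra tag at a time."""
    variants = [core]
    for i in range(1, len(extras) + 1):
        variants.append(", ".join([core] + extras[:i]))
    return variants


core = "photo of 3 women doing a backflip in mid-air"
extras = ["dynamic pose", "motion blur", "heavy rain", "sunset"]

# Generate with each prompt in order (same seed) and note where the output degrades.
for prompt in prompt_variants(core, extras):
    print(prompt)
```

Keeping the seed fixed between runs is the assumption that makes the comparison meaningful.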
sure but what i want is for it to actually look good too
right now it's not clear if another one of those tokens is causing some issue
i dont care if it can do a backflip in bad lighting
i sure as hell do
cuz that i know is where the issue is
if you don't figure out what the problem is you also won't know if you're just slamming your head into the wall
here's the issue: SD isn't trained on people being upside down
It can work with sports ...
there may be very specific cases, but try denoising a face after rotating it 180 degrees
in general it will do crazy shit
congrats we now have analog photo of 3 women doing a backflip in mid-air with bad lighting instead. https://www.dropbox.com/scl/fi/2s8hwsvs708q4ym9frtff/grid-0339.png?rlkey=chwhnd3opu3ezuuklxsvlixdv&dl=0
lol
and that's important you do that
now you know it's not the lighting, or another token that's causing the issue
I knew that already because i added the tokens one by one.
there's many cases when adding random tokens will cause another unrelated part of the image to get fucked up
yup
so, there's the problem: upside down
the faces are f'd
now if you know that, then you can generate them right side up via denoising the f'd up version there
then rotate it back
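The rotate, fix, rotate-back idea can be sketched on a plain grid of pixel values. This is only an illustration: the `fix` callback stands in for whatever img2img or inpainting pass is actually used, and both function names are hypothetical.

```python
def rotate_180(grid):
    """Rotate a 2D grid of pixels by 180 degrees (reverse the row order and each row)."""
    return [list(reversed(row)) for row in reversed(grid)]


def fix_upright(grid, fix):
    """Turn the image right side up, apply `fix`, then rotate the result back."""
    upright = rotate_180(grid)
    repaired = fix(upright)
    return rotate_180(repaired)


def identity(grid):
    """Placeholder repair step; a real workflow would denoise the faces here."""
    return grid


image = [[1, 2, 3],
         [4, 5, 6]]
print(rotate_180(image))
print(fix_upright(image, identity) == image)
```

Two 180 degree rotations cancel out, so anything the repair step changes ends up back in its original upside-down position.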
yeah but also the other problem isn't just a person upside down, there's a whole arc to it with multiple stages and recognisable poses and it doesnt understand any of it
and it doesnt understand motion blur or what arc any of the body is traveling on
yeah if you want to do the entire thing frame by frame that's beyond me
my knowledge there ends with the static image approach
i'm talking about static images, i'm talking about capturing dynamic poses
Is there anyone who knows how to install stable diffusion on a low end pc?
like here's a use case, people wanna use this stuff to do storyboards right? its a common thing in tv commercials now, people do boards in midjourney
so you need to be able to do dynamic poses for that
cinematic photo of a Boxer being punched really hard in the face in back-alley brawl,
doesnt know what a punch is
yeah, that's def openpose and pray
yeaah
it's really bad at that
gotta do it manually
yeah but thats the problem! Because then there's a whole chain of issues downstream from that, like having characters interact in complex ways, and not just porno ones
i am trying to install forgeUI for automatic1111, but while i was installing it says "CUDA Stream Activated: False" in the terminal. how do i activate it? i already installed it but i dont know how to activate it
yep, that is the problem, lol
it works (just the photo of 3 women doing a backflip in mid-air part)
i am, not, prompting the rest (out of concern for my access)
the rest meaning the faded film stuff or the punch?
because the punch = violence and there's rules about that?
hey btw sorry if i was kinda a dick, a couple of posts on reddit rubbed me up a bit the wrong way but I appreciate you taking the time out and not being rude back.
My brain just filters everything that shouldn't be there ^^
Uhh... that sounds kinda ignorant ^^
its so funny, MJ will do the prompt "cinematic photo of a Boxer being punched really hard in the face in back-alley brawl, breaking his cheekbone" -- but it works really hard to avoid showing the point of contact, it does all the big dynamic motions and pained expressions and sweat flying through the air but either the blows are like a boxing glove being nudged up against the face or a guy looking like he's punching himself, It knows what a punch is but it will NOT do the heavy contact, because its so censored.
it did one weak blow out of about 50 gens
https://www.dropbox.com/scl/fi/c26kcco9c09my8agvg4qh/0_0.webp?rlkey=t6fx1ie57srmtk3wpnfaw6fxp&dl=0
and the same prompt in SDXL is just two sweaty guys posing next to each other looking tussled, because it doesnt know what a punch is
https://www.dropbox.com/scl/fi/3tt4rpomc2nojc2pvubf5/grid-0342.png?rlkey=c8bqy10l1svrvfwisen7i6seg&dl=0
Thank god my brain censors itself ... so I don't get problems like that ^^
problems like which?
Bad punches ...
My biggest problem is getting not so male nude males ... but that's not really a big issue
yeah SD is definitely weighted toward generating ladies
Hi guys, what do you think in your opinion is the best upscaler?
Something more natural looking like a digital drawing
Supir
I really like Anime 6B or whatever but im looking for something even more vibrant
Supir Upscale? (ComfyUI) does that mean I need ComfyUI for it??
Didn't make it work in comfy but not really that used to it ...
working with ultramix_balanced at the moment
Dr. Furkan Gözükara made a standalone supir upscaler 1 click installer, i had to join his patreon to get it tho, not sure if he's released it publicly since or there's another way to use it in comfy or whatever, but anyway, best upscaler i've ever used
interesting, im looking at it and it seems amazing but im mostly looking for one that looks like a natural digital drawing
this one seems to be focused on realism ?
It's for his patreons only ...
yeah he tested a few illustrations in the video
can u link me vid? 😄 I'm curious to see it
possibly not anime style like you want but i guess i could test something for you now
Also, do you guys have any recommendation for a good model for Fantasy Digital art
gotit, yeah i havent touched it in about a month so had no idea if there were other versions released or whatever since then
SUPIR uses an XL Model ... you could try an anime one ,,,
yeah im not looking for realism
yeah but not exclusively for realism
he tests a 2d image if you skip fwd here:
https://youtu.be/PqREA6-bC3w?t=1564
yeah that one looks great
the value of supir is it preserves the integrity of lower res images incredibly well, subtle facial expressions etc, every other upscaler i've tested changes way too much stuff
so if thats important its amazing
yeah thats true, the colors look good but it doesnt keep as much detail
if you want it to add detail or fix things maybe its not what you want
although you can still do that if you dial in settings and prompt etc
what about a model for fantasy digital art ?
because there's denoise strength type controls too
sorry cant help you there, not been my focus

thanks tho this upscaler looks good
seems like it preserves stuff and just upgrades the image quality
ye i got 12 gb exactly
at least it was last i used it
they may have updated it since (when it first came out needed 32)
oh one thing I'm curious about when you see very good images generated on CivitAI
Why do they not include the upscaler too?
Do they really generate an image that good without using upscaler?? 😄
I thought most models can't really generate properly in such big resolutions
They also contain the upscaler version. But its only visible in the metadata. You have to click
"Copy prompt", then paste it into your prompt field and press the Blue/white arrow to let it automatically set everything the same
Do you need good upscalers for Anime?
not anime, just looking for some model and upscaler that does more natural looking HQ digital art

oh really, interesting, ill have to try that too
is that Copy Generation data basically?
NMKD Siax 200K is a good upscaler
thanks, and where do i paste the generation data exactly ?
In the prompt field
what kind of regularization images should be used for a style?
as in lora training dreambooth method
Or do it yourself, it seems to work (I haven't tried it)
Local Supir
https://www.reddit.com/r/StableDiffusion/comments/1b37h5z/supir_super_resolution_tutorial_to_run_it_locally/?rdt=36399
Been hearing lots of positive stuff about SD Forge. I'm pretty new to all this, is Forge like a stand alone version of SD, or do I need Forge+sdxl for example?
Forge is a webui like automatic1111 or comfyui.
It has some advantages for GPUs that haven't much Vram.
All the webuis can work with 1.5 and sdxl models
hi
I see, so if I want to install and use Forge, I need the baseline sdxl and Forge, if I understand everything correctly?
Hey, welcome 
Any sdxl model should work. Even trained/finetuned ones
Hi everyone, no support received in the past few days.
StabilityAI has generated some inappropriate images that I would like to delete from my profile, or close the profile if this is not possible. Would it be possible to receive support? Do you know how to do it? I sent several emails but never received any replies thank you all
What do you mean by "your profile"?
I made a profile here to generate images I also pay a subscription https://stability.ai/
anyone knows how to use TTPLanet_SDXL_Controlnet_Tile_Realistic with a1111 ?
Holy crap this is giving me a headache
Trying to categorize all of my wildcards. Anyone wanna share how you've organized it?
Can anyone help me get support from stabilityAI? Thank you all 🫶
Would if i could brother
Where are the robots? HOW CAN I GENERATE PHOTOS NOW?
You must use the power of your mind
Please guide me, I haven't used this server in a while
Before it used to be 10 bots
And you could use dream to make the photos
The bots are gone for now, maybe forever. Try using https://leonardo.ai/ if you want to generate photos with stable diffusion
or install stable diffusion on your pc
doubt the bot ever comes back frankly
why in the world they were letting users generate images for free on their resources when they couldnt pay their cloud providers
just installed stable diffusion on my pc
auto1111 or whatever it's called
it's pretty good but batch size 4 and 10+ sampling steps takes absolutely forever and uses so much VRAM
I think I'll need something better than a 3060 Ti

😭
just don't use batch size 4, use 1
wrong chat moment
yeah i found out that batch size 1 and batch count 4 is better
because it makes all the images serially instead of in parallel
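A rough sketch of the difference, assuming the usual meaning of the two settings: batch size is how many images are generated together in one pass, batch count is how many passes run one after another. The function below is illustrative only and is not part of any webui.

```python
def plan_generation(batch_size, batch_count):
    """Summarise a run: images made at once, sequential passes, and total images."""
    return {
        "images_in_parallel": batch_size,  # held in VRAM at the same time
        "sequential_passes": batch_count,  # run one after another
        "total_images": batch_size * batch_count,
    }


# Both settings produce 4 images, but the second keeps only 1 image in flight at a time.
print(plan_generation(batch_size=4, batch_count=1))
print(plan_generation(batch_size=1, batch_count=4))
```

On a card with limited VRAM the second plan is usually the safer choice, since peak memory grows with the number of images processed together.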
Does anyone know the best way to apply a style LoRA onto a photorealistic image? I've tried multi ControlNet, T2I, different SD base checkpoints, different style LoRAs; results are always uncanny and don't look great
Someone sharing SD3 testing on YT. I assume there have been others but this was the first that I came across? Also tried searching, but I can't find anything. So here you go. A link to a Stable Diffusion 3 testing video.
https://www.youtube.com/watch?v=mQSKoAEaIJA
no worries, i never intended to be mean to anyone else
they assume that sai wouldnt do something that can improve training
also the feedback on "are you using 77tkn t5" was kind of mixed
(they arent)
so i dont blame them for using that as an argument
hmmm
its weird cause in the paper there was like a super long prompt that didn't have distortions
which is around 124 tokens
yep, thats the t5 pulling its weight
basically:
are you going to use text?: use all/t5
are you going to have a long prompt? use all/t5
else? use both clips
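That rule of thumb can be written down as a tiny helper. This is a sketch of the advice given in this chat, not of how any UI actually routes prompts: the encoder labels and the 77 token threshold for a "long" prompt are assumptions chosen for illustration.

```python
def pick_encoders(wants_text_in_image, prompt_tokens, clip_limit=77):
    """Pick text encoders following the rule of thumb above.

    Encoder names are illustrative labels, and clip_limit is an assumed
    cut-off for what counts as a long prompt.
    """
    if wants_text_in_image or prompt_tokens > clip_limit:
        return ["clip_l", "clip_g", "t5xxl"]
    return ["clip_l", "clip_g"]


print(pick_encoders(wants_text_in_image=True, prompt_tokens=20))
print(pick_encoders(wants_text_in_image=False, prompt_tokens=124))
print(pick_encoders(wants_text_in_image=False, prompt_tokens=30))
```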
they each get same prompt
the clip, the other clip, and the t5-xxl
no I mean the two clip models
whaat
but when I enter a prompt too long it errors out
when I add it to only one of the clip models
inside comfyui
but then what do these GUIs do when it comes to accepting longer prompts
but if you just condCombine(prompt[:77], prompt[77:]) you bypass it
just automatically truncating or somehow concatting
concat embeds
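A minimal sketch of the chunking step, using a plain list as a stand-in for tokenizer output. The window of 77 follows the number used in this chat; real CLIP windows also reserve slots for start and end tokens, so UIs typically leave fewer usable tokens per chunk. Encoding each chunk and concatenating the embeddings is left out because it depends on the model code.

```python
def chunk_tokens(tokens, limit=77):
    """Split a token list into consecutive windows of at most `limit` tokens."""
    return [tokens[i:i + limit] for i in range(0, len(tokens), limit)]


# Stand-in for a prompt of about 124 tokens, like the long one from the paper.
tokens = list(range(124))
chunks = chunk_tokens(tokens)

# Each chunk would be encoded separately and the embeddings concatenated.
print([len(chunk) for chunk in chunks])
```

Because each window is encoded on its own, nothing in a later chunk can attend to an earlier one.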
the one flaw (which was posted in the comment thread) is that they dont share attn
are there any really good in-depth guides for making ai generated images of a person you train into it with a couple photos? im trying to find one because im trying to train myself but its not working.
so if you say "cirno [blah blah blah blah to fill 77tkns] | she [...]", the model will not know what she is
look up Lora tutorials, those are light on VRAM and work with few photos iirc
lora? theres a bunch on civitai, though for hyperparams if you just use the default settings you should be fine
its just everytime i try it never gets my face right
I wonder if there's gonna be a way in comfyui to make it only use T5's embedding/conditioning
🤔
yep!
ah, faces, install adetailer
would that be messed up
no, not that messed up
hmmmm
I wonder what the results would look like
welp, gotta wait those 2-4 weeks to find out!
different.
well yeah hehe
i think the paper has something about that
very good video, and the results are quite good. I hope to still be alive when they release the version
Also what's with the abundance of prompts without styles in them
like most of them don't start with "photo of" and etc
its so vague
people dont know how to prompt
not just that, the paper itself sometimes does this
A car made out of vegetables.
a massive alien space ship that is shaped like a pretzel.
yeah they might not have used raw prompts
how do i download for amd gpu?
Hello
sometime in the future
SD3 when?
same question 😔
there is something satisfying about a lack of a defined timetable
i saw this video and there seems to be a separate server only for sd3
if someone had the link of the SD3 Launchpad server and would share it that would be great
Censorship! Waifu trainers get ready to do overtime.
GTA6
I don't think there's really a shortage of art on this server
i alone have generated over 100k images since dec 7 2023
Hi there, does anyone have any idea which AI tool was used to create the filter effect in this video? https://www.youtube.com/watch?v=vxRY_9bv8sc
thats a good start
quality over quantity
or in this case none
hi
mine is obama (i bleed blue)
hi
#🏞|general-with-images message you can have quantity and quality if you keep a 4090 humming around the clock
I could sell my car twice and still not have enough $$$ for a 4090 lol
every artist oversells their own art
all art oversells its artist
yea the banana one sold for a lot
also the guy who destroyed the painting with a shredder
the price went higher
you have me curious wtf you're driving
i could replace mine, but then i realize i could also buy 15 5090s when that comes out with the same amount of money
98 nissan primera (infiniti g20 equiv) with 170k miles (272k km) on the clock
whoa, impressive
worst i've seen was probably my dads car
pushed an 89 accord to nearly 300k miles
ski rack flew off on the freeway and it was all downhill from there
what would you do, if you have the power of the M3 Pro?
by the time it finally died died, the drivers side window didn't roll down, neither back door opened, the A/C was dead, radio was dead, and you had to put a golf tee in a hole in the stick shift to change gears
I've seen a couple of 1990 toyota corolla wagons hit north of 400k miles, have also seen a 2005 mazda 6 wagon hit 300k miles as well (work vehicles, not personal ones 😛 )
yikes
yeah those Japanese cars are built solid!
friend of mine who's a doctor and has plentyyyy of money still drives this complete pile of shit truck he assembled from three different trucks he salvaged at a junkyard
doesn't even have a key for either of the doors, both of which are different
a few years ago i borrowed it for moving some stuff around town (it's stick btw) and the fn stick came off in my hand while i was rounding a turn
nearly took out a limo as a bride was about to climb into it cuz i was so stunned, my brain just started smoking pretty much when that happened
Hello Everyone
I made a high quality text-to-image dataset based on midjourney images providing two level of prompting (long prompt and short prompt for the same image)
I hope it be useful for someone here
more ella-style attention guidance stuff coming out https://github.com/KU-CVLAB/Perturbed-Attention-Guidance
Hope someone has the card for that.
there are a1111 extensions and comfy nodes out for this already 🙂
would anyone know why civitai seems to be using so much of my computer's resources? As soon as I open a tab, my laptop goes into overdrive with the fans
whenever i click off to another tab, the fans relax.
idk, just watch it turn out they've found a way to sell your cpu cycles
i've noticed the same thing
I can barely use it without my laptop getting way hotter than it should.
And I'm just trying to find outfit lora's...
Having it open and in the background doesn't do anything, but as soon as I click on it, boom. fans are mad.
ahhh i thought i figured out a work around by just making the window smaller. damn.
it kinda works.
poorly optimized website - best way is to not have it load images, but that kind of defeats the purpose lol - you can disable hardware acceleration and that does help apparently, but you are by no means the only person complaining about this
i mean
i had to read this 3 times to understand
it depends on the laptop
if your laptop is old then try to get it clean from the inside
because it shouldnt go rocket mode just for opening a website
and if it still does it consider getting a newer one or change the ram slot
since ram degrades overtime
gm
morning
Evening
1324
gm
hello
https://www.tiktok.com/@edwinskeletrix/video/7355974950945164586 how do people make videos like this? can i do it locally?
gm
gm
gm
gm
Evening
I believe it's pretty easy to sit in front of a phone and talk like a derp..
Wassup!
Yo
Average tiktok moment
Really don't know how people get addicted to watching that shit
Either that, or I'm getting old AF
either way, the brain rot is strong!
Same 
Ive seen my gf spent half her day in front of it, its somewhat depressing
And I have zero understanding for it
hello
Same, I’m not into TikTok and stuff
hello
gm
gm
Thats the point. You may not realize that there are levels to brain rotting and you and I are no exception, just due to the fact of being here
Agreed, we just choose a different poison 😛
Question is, why do we need poisons? Is 9-5 that miserable? (Work or school) 😭
You mean the dystopia that is late stage capitalism.. nah no idea mate 😉
yep you can
So looks like SD3 wont be as good as Ideogram with words. But who cares since you can pull the levers 100s of times on your own machine until it gets it right
XD
Someone will make an equivalent of Harrlogos for sd3 and itll be fine
using what
Only two classes left the have nots and the have debts.
Does anyone know how to generate an image given a small part of it? For instance, given a flower with no background, generate someone holding it?
Outpainting of sorts but with custom shape
rich people dont exist they cant access 99% of their wealth if they have any, governments after them or "invested" in "stuff" etc, middle class gone long time ago, anyone who appears to have something is in debt for decades to pay for it
What a time to be alive!
Two papers down the line maybe I can escape to Mars.
On my Muskmobile.
Yes its that 0.01% and the rest. Dont worry tho, soon ai will replace u and u dont have to work anymore.
There was only ever two classes, the capital/resource owners and the workers/slaves
Hold on to your papers!
I welcome our robot overlords. they can't possibly screw up worse than us.
Unless the elites control them
I loved the quote of some guy in wef, that said we wont need the vast majority of people anymore, even as slaves
hello
The Georgia stones specify they only need 500 million of us to wipe their asses
so essentially the top people advocate a genocide of 7.5 billion people
while they sit in their pools underground and wait for the radiation to disperse
Even top baddies of history can't come close to the loving overlords we have now
amateurs killing a few million people
wtf is that
Idk if thats the most effective way to go about it
i think u been smoking too many georgia stones
Maybe make 7.5 billion inject poison while brainwashing them that it is a cure for some disease they made up.
nah man thats what it said
word for word
in multiple languages to make sure we dont misunderstand
:))
🤣true
yea a manmade stone that was built in the 80s
well ask yourself would you build that. who would build it
its just a prank huh
and now it got bombed
weird stuff
i can build something like that and bury it to troll some qanons like yourself easy
Then watch wef
Where most politicians gather
Of eu and us
And word for word
I dont think people build such monuments to troll
politicians corrupt whats new
In that forum it was said "its better if the world population was 500 million"
if you knew stuff and couldnt explicitly say it whats the next best thing: architecture
4080 super vs 3090 ?
teacher reading my essay on how to combat overpopulation:

Symbols next
a 4chan forum and name myself Q
Have u seen "watch out for 666"?
heard they are reptilian aliens and they are in possession of gods ark and no one knows this information (only me and alex jones)
Thats one way to lower the credibility of "conspiracies", by spreading crazy ones too.
So people associate plausible / true conspiracies with crazy.
Alex Jones (controlled opposition) does a great job at that
that may be true,they use those ppl to paint ppl who oppose wars as loons
You got a billion dollar debt hanging round your neck too?? 😛
hi
Hi! I'm an artist too. 😊 (Pencil and paper kind, with books on figure drawing and color theory on my shelf.) I'm working on a painting app that has AI features built in, with a feel similar to Procreate. Would you be interested in experimenting with it? Also looking for help making tutorials / use videos.
https://github.com/QuintessentialForms/ParrotLUX
hello
Please check our rules regarding advertisements! (Also, good morning, and I hope everyone is well today!)
You might want to check out #1092446741984444416
Love the concept!
/D
gm
hello
Spinning is a good trick.
hello
Did not know about this section. Doesn't look like people visit it, unfortunately. 😕
good
Hi, everyone, can someone help me with generating a qr code with an image inside?😅 Would highly appreciate it
Youtube .. qr control net
I know, maybe someone did it already, I need to try how the thing will look merged with the logo
hello,youtube
We've had it for quite some time! I always like coming back to it and checking and seeing what everyone else is up to. What are the plans for your project?
I know
Posted a summary in the community section. I'm mostly making the app to use it. I do want it to be useful for other artists though, so if anyone's willing to give me feedback, I can improve usecases I haven't thought of. If you know any digital / traditional artists interested in AI who might be interested, please feel free to point them my way.
(One kind user here needed to do some stuff with IP redirecting, and by fixing the app for his use case, it's now even more general and powerful. 🙂 )
hello everyone!
I can't wait for stable diffusion 3 to come out
By any chance is the v2 beta API using SD3? It seems the quality is better for some reason than stable-diffusion-xl-1024-v1-0
hi
hello
hey ken kanaki
That's awesome!
If you want to make tools for artists, I suggest that you look at various art programs, and study those features.
Most artists want:
- Layers (and the ability to group them)
- Clipping, masking, and alpha layers are all essential
- Tools like the ability to select, copy, paste, rotate, and transform
- With transform, you want the ability to warp, change the perspective, (which you can do with the above)
- Most artists are very into brushes--so the more tablets you support, the easier it will be for you. Having a dynamic brush system is very good. You can look at programs like CSP, Paintstorm Studio, Krita, and PS, etc. Sizing, opacity, and the ability to place in your own image, etc, are important. (Esp cuz going from AI to brush would be useful.)
- You also want bucket fill tools, and be able to drop textures in/out
Those would be my suggestions for you, as a base.
Thanks! I've already finished 13 out of the 17 features you listed. I don't technically have custom brushes, but they do exist (defined in JSON files w/ brush tip + texture images). Will add a way to customize them in-app soon.
Missing 3 things:
No clipping layers. (What are those. Have to look up. Sounds redundant with masks+alpha?)
No select-cut (and no plan to add... but it wouldn't be hard tbh.)
No perspective warp. Hmm. That would be super easy to add. The layers are already 3d quads. (Would people really want it though? Why? 🤔)
To brag a little: my flood-fill tool is top-tier! You can expand the flood with padding, so there's no gap against the lineart! 🙂 (But it's also too slow, like 2.5 seconds. Moving it to the GPU is on my todolist.)
I also have not just best-in-class, but only-in-class features: Like the ability to freely arrange layers of different resolutions. E.g., you can have a 1024x1024 layer with your entire image, and then another 1024x1024 layer that overlaps that layer, but smaller, only covering a face, so you can inpaint and have a hi-res face. (You can merge down and lose the hi-res, or just keep it. Export at whatever resolution makes sense for the whole composition.)
(Also best-/only-in-class: general/universal AI API compatibility that feels like native in-app integration. Also: intuitive + non-destructive inpainting & upscaling, so you can keep inpainting different parts before & after upscaling etc. Also: intuitive controlnet and img2img configuring with node-links. And other stuff. 😉 )
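The "expand the flood with padding" idea above can be sketched roughly like this. This is only a minimal illustration on a character grid, not the app's actual implementation: the function names, the `'#'` line-art marker, and the 4-neighbour growth rule are all assumptions made for the example.

```python
from collections import deque

NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def flood_fill(grid, start, wall="#"):
    """Return the set of (row, col) cells reachable from `start` without crossing `wall`."""
    rows, cols = len(grid), len(grid[0])
    filled = {start}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        for dr, dc in NEIGHBOURS:
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in filled and grid[nr][nc] != wall):
                filled.add((nr, nc))
                queue.append((nr, nc))
    return filled


def pad_fill(filled, rows, cols, padding):
    """Grow the filled region by `padding` cells so it tucks under the line art."""
    grown = set(filled)
    frontier = set(filled)
    for _ in range(padding):
        next_frontier = set()
        for r, c in frontier:
            for dr, dc in NEIGHBOURS:
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in grown:
                    grown.add((nr, nc))
                    next_frontier.add((nr, nc))
        frontier = next_frontier
    return grown


# Hypothetical 5x5 canvas: '#' is line art, '.' is empty paper.
canvas = [
    "#####",
    "#...#",
    "#...#",
    "#...#",
    "#####",
]
inside = flood_fill(canvas, (2, 2))
padded = pad_fill(inside, len(canvas), len(canvas[0]), padding=1)
```

With a padding of 1 the fill spills one cell into the line art on each side, which is what hides the usual pale gap between the fill colour and the lines. A real tool would work on pixel alpha with a tolerance threshold rather than exact characters.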
hi everyone
hi
hello wans
every one let here
hello
the day when you invite your 6 bots to the server
good day, chaps!
hi
Hello everyone
hello
Hello everyone, good evening
Be happy, everyone!
Question: Does anyone know how I can set an img2img generation in AUTO1111 to stop at, say, 90%? It gets to about 90% and I like it, but then the last 10% 'overcooks' it. I have tried lowering the denoise and CFG, but that last 10% still adds way too much detail, while it's perfect at 90%. How can I make it automatically stop the img2img generation when it gets to that point?
Good evening!
good evening
GOOD EVENING
hello
does anyone know if they're going to put SD3 in here for testing like in SDXL?
I'm not necessarily for it, I thought the resizing of the images made SDXL biased towards really large subjects, as an effect of the voting
to an extent
@bleak matrix can you tell Emad (or whoever runs the show now?) lol
hello
Funny... just wanted to write that SD3's API access should finally start this week... it's mid April. ^_^
There already is a testing bot for SD3 with voting again, just not publicly available
Thank you! I'm using them to test PAG. https://civitai.com/posts/2163684 (credits in the description)
Seems amazing ^^
Don't you think they should require full sized images during voting?
I mean there is a huge difference between an image iconized vs. as something that's y'know, 256^2 and up. So the (312ish?) sized images are going to be best at that size if the voting is at that size--when normally they're at something like 512-1024 or more, in probably 80% of display situations.
Or do they compensate for that minimization/resizing effect somehow or another?
Hey, does anyone know a cheap way to use SD on a phone with lots of free credits?
set up a local sd on a pc with a public URL and authentication and use that to log in on the phone?
Not sure if it will be blurry but you can right click the preview window when it's at 90 percent and save it
I’m ready to generate fish using Stable Diffusion 3 Alpha Turbo
I will make the most fish
None of you are ready for it
click "Generate Forever" and actually let it run forever
I'm looking to go from 3d to anime while keeping the same character, what should i use: text to image, image to image, and what controlnets?
also what can be used to keep the colors more or less the same ?
The Reference Only controlnet should keep the colors similar
thx will try it
why does it take like 10 minutes (seriously 10 whole minutes) for sd on cpu?
is this normal?
any way to speed it up with command-line arguments?
I think that's normal for cpu
yeah, took like 4 min even with 1660ti, thats why i spent 1400$ on pc build, now its 1s per img at 1080p
pixart sigma is a great substitute for the lack of SD3
what gpu gets 1s per image?
3090
nice!!
thx loving it so far
turboxl or standard sd?
sd xl
i see
wait so for pixart sigma i need 20gb for just the T5 part, are you kidding me?
if I do 4 image batch then the combined time is around 4 seconds on GPU
I was using CPU so that I can run llama 2 on the gpu alongside sd but apparently that won't work lol
what gpu u got
Recently i tried command r on gpu and it used all 24gb vram lol, but it runs fine. 1 year from now surely 48gb vram will be the new standard
just a lil 4050 (6 gb vram)
command r is what??
Can someone generate a qr with logo for me?😂
llm, like llama 2, but better, gpt-4 level and open source
oh I see
looool interesting
must not be based off llama?
ofc not
last i checked every single open source llm was based off llama
yeah llama is woke af
this thing is uncensored, it will literally teach you to cook coke
eh I mean not really, its same as all the other llms imho (still woke for sure)
LMFAOO well thats quite uncensored
i imagine the system reqs would be higher
its 104B parameter model, i was surprised 24gb vram was enough
if you are using HiRes Fix... that blurriness at 90% happens if you set up odd sizes, change the upscaler from Latent to None (or other upscalers)
104b?! damn yeah wonder how 24gb worked lol
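A rough back-of-the-envelope check helps with the "how did 24gb work" question. The sketch below only counts the memory needed to hold the weights at a given precision; it ignores the KV cache, activations, and framework overhead, and the function name is made up for the example.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes."""
    bytes_total = num_params * bits_per_param / 8
    return bytes_total / 1e9


# 104B parameters at common precisions (weights only).
for bits in (16, 8, 4):
    print(f"{bits}-bit: ~{weight_memory_gb(104e9, bits):.0f} GB")
```

Even at 4-bit the weights alone come to roughly 52 GB, so a 104B model running on a 24 GB card most likely means part of it was offloaded to system RAM or it was quantized more aggressively. That is an inference from the arithmetic, not something confirmed in the chat.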
gpt 4 is rumored to be around 1.7 trillion parameters, its a wonder how in just 1 year a 104B parameter model has the same performance
yeah thats also true
I wonder if it can tell you how to make coke the soda? like figure out the secret ingredients 😄
llama3 should surpass gpt3 in just 7b
launch maybe june
oh that would be nice haha
anyone here have hands-on experience with enterprise GPUs?
Guys, what is the AI that generates stuff to fill an image
I have a vertical image character and I want to fill the sides with a suitable background, how do I do that?
hmm, same question
You can use outpainting for that
I use Fooocus outpainting
i should use "--lowvram" to lower vram usage right?
with lowvram flag only .4 gb vram is used lol
gets around 4 s/it though rip lol
hello guys
how do you unload a model?
idk in 8-bit doesn't go over 12GB iirc, but I need to test it again
people blow this T5 thing out of proportion
can i have a custom shaped object and outpaint on that, for instance a flower with no background, and generate an image on someone holding it?
it doesn't need even close to that for vram
I'll test it in T5 in 4-bit
Yeah just have to be careful painting around it
what are the flags that make sd run faster?
@sage reef Pixart with T5 loaded in 4-bit, it doesn't go above 8.6GB VRAM
have u tried forge
whats that?
try it , it makes sd work 2x faster on low vram pc
https://github.com/lllyasviel/stable-diffusion-webui-forge
If you use less powerful GPU like 6GB vram, you can expect to get about 60~75% speed up in inference speed (it/s), the GPU memory peak (in task manager) will drop about 800MB to 1.5GB, the maximum diffusion resolution (that will not OOM) will increase about 3x, and the maximum diffusion batch size (that will not OOM) will increase about 4x.
looks nice
how about i provide the gpu? i'm in china, and i know where to find cheap cards
but wait, are you using comfy or what? i would love to try it inside comfy
yes
Any guys interested?
if you have 12GB of vram and you load it in bnb8bit then I think it should be fine
kk
i feel like u need a full time job using sd to learn comfy
nah comfy is ez :3
I recommend using extrasamplers -> res_momentumized (or whatever) + cfg_rescaling with like 0.8
just the default comfy workflow with the bare minimum is easier than a1111
i tried sigma on the demo huggingface, some results look kinda baked
yeah cfg rescaling helps with that too
and whatever the sampler is
20-30 steps is okay with res_momentumized
yea il try some combinations, thx for info
yea i think i tried that sampler with cascade, was slow if i recall
oh so im looking at the extramodels thingy, so for T5 we can offload the whole thing maybe on the cpu side, since it's literally just a text conditioner, dont have to trouble my gpu with that then, and then use gpu on sigma+vae+whatever else
on cpu it takes a bunch of RAM
hmm il experiment with a combo that works
i wish there was a non diffusers version
maybe even pruned version
but wait, i already have a t5 from ella i think, can i use that one? or is this drastically different?
ehhhh idk
oh someone asked the question im asking already lol https://github.com/city96/ComfyUI_ExtraModels/issues/20
res is def a relatively slow sampler, but man, it cannot be replaced
#🏞|general-with-images message really great with this denoise schedule
are there any custom nodes that allow us to run Kandinsky models inside comfy?
i never tried those
so i learned how to use those dynamic prompts... and magically its like my hard drive space is gone...
what are dynamic prompts
a {tall|short} {man|woman}, produces "a tall man", "a short woman", "a tall woman" etc, you can do quite a bit with it
think of like hair colors, lengths, styles...hard to imagine what life was like before dynamic prompts
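To make the `{a|b}` variant syntax concrete, here is a small sketch that enumerates every combination of a template. It is not the extension's code: the function name is made up, it handles only flat (non-nested) groups, and as far as I know the real extension usually picks one variant at random per generation unless a combinatorial mode is turned on.

```python
import itertools
import re


def expand_variants(template: str) -> list[str]:
    """Expand every {a|b|c} group in `template` into all combinations."""
    # re.split with a capture group alternates: literal, group, literal, ...
    parts = re.split(r"\{([^{}]*)\}", template)
    choices = [
        part.split("|") if index % 2 else [part]
        for index, part in enumerate(parts)
    ]
    return ["".join(combo) for combo in itertools.product(*choices)]


prompts = expand_variants("a {tall|short} {man|woman}")
for prompt in prompts:
    print(prompt)
```

Two groups of two options give four prompts; add a hair colour group with five options and it becomes twenty, which is how the hard drive fills up so fast.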
yah I have the family plan, my daughter insisted on having spotify, I used to have pandora which was way cheaper
if you have ever generated regularization images for training, excellent use case for dynamic prompts, to get a nice variation in your data set
when did these come out - will give it a try
wow
huge red flag there
are sdxl loras compatible with pony?
yup, just grab the extension and you're good to go
Good morning!
i am come
hi bosy
yeah
hi are we ready to start?
yes we need more february bot accs then we ready
Alright it’s 7:31, we are going to begin our stable diffusion 3 lesson everyone
everyone get ready to open your Internet Web Sight Browsers
now i want you to remember your first childhood memory
say whut cuh
now that you’ve recalled your first childhood memory, go ahead and type www.stablediffusionthree.com
while you wait for it to load, please try to remember the first time you got a boner
or, if you fancy, a chub
moving on, if the website doesn’t resolve, i’m afraid you’ll just have to wait until Stable the Clown announces the official release of SD3

welcome my friends, let me tell you how to use 
how to use it
Can someone help me with this error? AttributeError: 'NoneType' object has no attribute 'lowvram'
you are using it wrong
how so
but wait, where did you get that error, what were you doing?
trying to load a model
comfy?
SD
yea but you using comfyui or a1111 or what?
a111
how to use to download
oh i dont have a1111, so dont think i will be able to help... hmm, can you load other models fine or is this your first time with a1111?
does your pc have enough specs to run models? @turbid shard i mean i did use a1111 before but never had problem loading models
If I'm interested in doing mostly anime generations, do you guys recommend SDXL or SD1.5?
XL

xl has some cool anime models
what models do you recommend for it?
can't load any model
both work, u just get fewer deformities with XL
I have enough vram, specs are not the problem in my opinion
well can you quickly check if you can load in comfyui or another interface instead of a1111, we need to rule out specs for sure :3
but whats the gpu
i'm not even trying to generate, I just can't load or change models
yes but webui uses your gpu to move/load the models
how much vram do you have?
there is no such thing as enough vram lol
good morning
12gb
@fervent thunder Animagine XL 3 , Anime Art Diffusion XL, are mainly for sdxl i use... for 1.5, i like the following: anime screencap 1.0, anything else 4.5, any lora clean anime mix (aam any anime mix, etc), kantanmix, kimix am...
i had problems sometimes with a1111 with a 12gb 3080... usually it was fine but it freaked out after a while at any batch size greater than 3, then anything greater than 1, then sometimes it'd just OOM for the hell of it
use forge, stableswarm, or comfyui
yea can you just try to load inside comfy or anything else quickly, we need to see what kind of problem this is
can you use comfyui with an amd gpu?
thanks for the recs, and is 6gb vram enough for sdxl
6gb seems too on the edge... ugh.. not sure... i know 8gb works for sure (even cascade)
i mean you can try :3
what would happen if it doesnt work
alrighty ill try lol
yeah with comfy i'm usually at 8.5gb or so with sdxl
im personally on nvidia, so no idea 😦
comfy is just too good baby ❤️
man where is sd3 already :3
my guess is around april 26
Somewhere in the void of space
nuuuu
might want to specify the year 😛
i'm going with '25
haha
In your opinion, which of their projects is coming out next?
probably SD ‘97
new

you can - works better on Linux with ROCm than it does with ZLUDA on Windows, but it does work
there are pinned posts in #🤝|tech-support for install instructions
Hey guys, I'm in a bit of a dilemma since I have to submit the requirements to a client, and I need the community's help with this!
I am running a workflow where I take input from a webcam and transform it into a picture with a specific output using a custom-trained lora. These outputs will be shown on a screen in real time when the person stands in front of the webcam.
Right now my PC, which has a 4090, gives me each image with a delay of about 2 sec.
The client says he is okay with a 1 sec delay, so I am wondering whether the RTX A6000 ADA would be okay for this since it has 48GB vram?
Edit - I am running a turbo and sometimes lightning SDXL model with a depth ControlNet
a6000 is slower than a 4090
if you want something faster you might want to just pick up a H100
What about the ADA version ?
Also how difficult is it to setup a H100 or A100 gpu ?
afaik it's the only thing that's flat out faster
not hard so long as you got $30k to burn
money is not issue
well, buy a H100
crazy your client is ready to flush that kinda dough on a webcam feed lol
not a web cam feed, its a real time camera feed, i am just testing it now on my webcam
RTX 6000 ADA is faster than 4090
although the camera would be high res i would still resize the input to the normal sdxl version
I am confused now 2 people and both saying different things 
it is faster u can check it yourself on vlad benchmark site
yeah this guy is right i goofed that up
ada is faster, has slightly more cores... nothing massive though
so if you need to cut your time in half, that won't solve the problem
now if u want the best of the best u should buy an A100, but SXM4 not pcie, though SXM servers are expensive
I am just worried that the post-purchase installation and server setup is going to be nasty and a pain in the ass lol
heello
cause i built the 4090 build and i know what i had to go through ;,)
why is everyone saying hello
bots, check the acc creation dates, all february
lol
well if u gonna buy a SXM server with several A100's price is like 150k usd or more so u have to worry about that as well 🗿
bye
i am not doing the purchase, the client is ;), my job is to make sure the workflow is running properly, which right now it is, but with the new setup it should be faster!
basically half the iteration time of what it is right now
can your client buy me a 4090? :3 it's probably peanuts to him

i hope to get 1 out of this project!
well if money is not a problem he could probably buy something similar to this https://www.amazon.com/Supermicro-Customized-Platinum-Baseboard-Analytics/dp/B0CSVN4YXW
I just watched a 60 second tutorial on how to install stable diffusion, but when I run it the blue Windows popup appears saying it might be a risk. That makes sense since this is open source, but the file size is also only 80mb, while in another tutorial I watched part of, the download was 6gb and from a different person on github. Can someone help me find the official link? Thanks! 👍
i dont think comfyui or any sdxl model supports more than 1 gpu at 1 instance
so i think only 1 gpu is more than enough
what did you install?
oh well you buy these kind of stuff for training models,so u load 1 model per gpu
nothing yet just unzipped it and opened the update exe which came up with the blue thing. It should just be the stable diffusion. https://www.youtube.com/watch?v=i5hvZvzcxoo <- thats the one I watched
huh... usually the blue thing pops up when you try to open some .exe, but that is automatic1111, you use the batch script to run it, so not sure how you getting the blue thing... to what program or exe is the blue thing triggering on? cause last time i used a1111 this never happened
ok that's good that you've also used auto1111. I also seem to have mixed up my words, as the 2 files I've tried running, which are the ones giving me the blue screens, are the batch files such as run and update
kinda odd as all update.bat does is call eviro.bat and pull some git stuff
weird.. for a1111, i never used the prepackaged versions, i always used from source like this:
https://github.com/AUTOMATIC1111/stable-diffusion-webui#automatic-installation-on-windows
ahh okay ty I shall try it out
but i mean the blue message thingy should only happen once anyway, and if it runs after, it should be fine and the message shouldnt appear again, i think. but yea that's some weird stuff...
man how many bots are there smh
wish they at least came in and spewed interesting prompts
too much work :3
Lol, thats gonna cost 3k +
think waiting is a good idea right now
we got zen 5 around the corner too from AMD
doubt it
Idk if waiting is a good idea
Theres always new gpus incoming
By the logic of waiting for better prices, you would never get a gpu
agreed, but if you're building a new system, it's prolly not that great of an idea to do it right now
new cpu coming out potentially within months but highly likely this year, and a new gpu within a year
if you're the type to upgrade every 5 years not every 2, prolly makes sense to sit tight till this fall
I just have a simple old 4080

your uncle’s a 4080
he's dead
hi yall! I was wondering if anyone's used turboXL in deforum before? for some reason, when i use an init image of a person, and prompt the camera to move towards the left, the person's face is squished in a really weird way
hello
hello
I think turbo is faster but lightning's quality is better
Solved the issue btw! turned down the strength schedule as the frames increased in case anyone else ran into the same issue:)
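For anyone wanting to try the same fix, a decreasing strength schedule can be generated instead of typed by hand. This sketch assumes Deforum's usual `frame:(value)` keyframe string format; the helper name and the example values (0.65 down to 0.45 over 60 frames) are made up for illustration, not the settings used above.

```python
def linear_schedule(start: float, end: float, last_frame: int, every: int) -> str:
    """Build a 'frame:(value)' keyframe string that moves linearly from start to end."""
    keys = []
    for frame in range(0, last_frame + 1, every):
        t = frame / last_frame  # 0.0 at the first frame, 1.0 at the last
        value = start + (end - start) * t
        keys.append(f"{frame}:({value:.2f})")
    return ", ".join(keys)


# Hypothetical values: strength eases down as the camera keeps moving.
schedule = linear_schedule(0.65, 0.45, last_frame=60, every=30)
print(schedule)
```

The resulting string can be pasted into the strength schedule field. Lower strength lets each new frame drift further from the previous one, so the exact range needs tuning per animation.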
yes i think so
indeed
hello
The opening background is an office environment with a desk stacked with papers and notebooks, requiring whiteboard animation form
Can someone help me with a prompt template for garment generation with specific fixed attributes required in Fashion Garment industry.
hello
hello
hi guys
Turbo 1-2 steps, medium quality, uses Turbo scheduler; lightning 2-8 steps, high quality, uses SGM Uniform scheduler
123
hello guys
hello boys
sup
superstar DJs, here we go!
yes
Sounds like a SL DJ 😉


hello
I wanted some help making a couples photo into a stylised ai artwork
hello
im using controlnet, RealCartoonAnime and a1111 img2img. However, the result seems to mix up various elements. Ive used the && tag to make separate descriptions of the couple however it seems to mesh the descriptions together. any tips?
hello
Lower denoise?
cheerio
I've been trying to do a style transfer, but no luck ... I would love to be able to use img2img + LORA, but it seems this is not the way to go?
ive never messed around with denoise before. let me try it and get back to you
yeah you're right, it's getting better results with a lower denoise. it's a bit blurry and murky tho, let me play around and try and find the sweet spot
Also depends on your sampler and steps, crank up the steps and lower denoise might help
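The steps/denoise interaction is easier to see with numbers. As far as I understand A1111's default img2img behaviour (with the "do exactly the amount of steps the slider specifies" option left off), the number of sampling steps actually run is roughly the step count multiplied by the denoising strength. The helper below is an approximation of that rule with a made-up name, not code from the webui.

```python
def img2img_effective_steps(steps: int, denoise: float) -> int:
    """Approximate number of sampler steps actually run for an img2img pass."""
    # Denoise is capped just below 1.0, mirroring the assumed webui behaviour.
    return int(min(denoise, 0.999) * steps)


for steps, denoise in ((20, 0.5), (40, 0.25), (40, 0.5)):
    print(steps, denoise, "->", img2img_effective_steps(steps, denoise))
```

So dropping denoise from 0.5 to 0.25 at 20 steps leaves only about 5 real steps, which can look blurry or murky. Raising the step count alongside the lower denoise keeps enough real steps for the sampler to clean the image up.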
Look into ip-adapter/instant-ID
thanks everyone ❤️
You are excluding the standard with that sentence in my opinion.
Lora is too time consuming :/, are there other faster methods
Like IpAdapter maybe
Ip-adapter, instant-ID, photomaker, roop, reActor
We good, how u??
Did anyone who is on the waitlist for SD3 get access to it yet?
Pretty much no
make your own lora. i followed along to this video and it got me decent results. basically use a multi expression character sheet in the img2img tab, with control net and adetailer. then take the results and make a lora
if you're willing to wrestle with the Kohya installation you can do it for free. or pay 10USD to civitai and you can drag and drop to make a lora
Or you go the easy route and just use Onetrainer.
Will definitely look into both. Specifically i will try to train a lora based on 2d characters from movies
guys can someone help me ? where can i generate image i couldnt find the channel for it
because there isnt a channel for it
miss
You can use a mix of a few celebrity names
ip-ip2adapters
and loras ofc
Which one gives the best results?
It's crazy that SD3 is like technically 1-3 weeks away
and yet I feel like it's still very far away
still no big wave of waitlist beta testers and etc
nothing about controlnet or current training status
it's definitely not 1 week away lol
hey guys, i'm kinda new to SD, i've gone through a lot of videos / guides already but I'm not comfortable enough finding my own solution so here's my question
I have a LOT of photos of my face and would like to generate realistic looking photos of me based on my face, I've seen boring reality on civitai but is there any way to give it my face ? maybe make a LORA of my face ?
if you want a quick solution start with ipadapter for faces, if you have a decent gpu straight up create a lora
i have a 3070 8Gb, should be fine enough for a lora right ?
also i'm kind of a noob but will i be able to adapt my lora in any style i want ? 🫠
should be fine for a 1.5 Lora, not sure if you can do a sdxl lora with just 8gb tho
would you happen to have a link for some kind of step by step guide?
thank you so much for taking time to answer
id just youtube and look for a recent guide, I havent created a Lora since sdxl came out, whatever guide Ive used then is likely suboptimal by now
hi
just look for something like 'simple kohya lora guide' and specify 1.5 or sdxl, whichever you prefer
thank you so much
Is kohya still with lora or did they break up?
Is there any info on the SD3 release yet? I haven't checked in here in a while
april 26, but dont tell anyone :3
Friday the 19th because that’s my birthday and they decided to single me out
well at least it's not the 13th
both together
Nah, probably April 69, which is a totally existent date
It’s not the 16th because that’s today and I don’t have SD3
same day as my report card date
welp
gotta talk to the devs and make it release on the 25th or else i might not be able to try it out

We will probably know once it releases, so I guess there is no point in sitting here and making up release dates

agreed
never really got the gist of sd3
what does it offer
i technically didnt make up april 26, it's a decent approximation based on what was said by the new lead dude
an approx is an approx
And then AI in the free world will officially be able to spell
april 26 to may 3 is my range guess
SDXL can already somewhat spell, but SD3 seems to be able to do it way better. It's a very unique model as its arch is entirely different from most diffusion models we've seen before
yea yea it does attention guidance differently
Also 2 text encoder and 1 image encoder
^
Was in sdxl already
we’re all very excited for the new architecture
SDXL didn't have an image encoder, and didn't use T5 for one of the text encoders (which is one of the things that allows diffusion models to be more coherent)
My bad, only meant the text encoders part
i wish they released the image variations or reimagine xl version that they are using, cause what we have currently (even with cascade) is all based on 2.1 unclip, and that is what they used for the first reimagine, but the xl version is clearly upgraded, but we never got that (or at least i dont think we did)
It did support inputs from image encoders (CLIP Vision), but due to it not being trained for image conditioning, it needed IP-Adapters to have image encoding
Afaik clip based encoder should be able to encode text and image features quite similarly
