#💬|general-chat
1 messages · Page 108 of 1
just like sdxl, sd3 has different style of prompting
iirc lykon uses sd1.x / sdxl prompting
idk I think SDXL and SD1.x have pretty similar prompt styles
he writes 8 different iterations of "ultra-detailed" all with different weights throughout his prompt?
its interesting cause he replied to someone who asked for an image with a SD1.X style prompt: this build doesn't enjoy this style of prompting, but here it is
Why try something new when you can generate portraits of generic women in generic settings
most sd1.x style prompts are based on tags, sdxl is based on mostly natural language; sd3 is weird and uses both
hmm
natural language prompting*
Does sdxl work with natural language?
at least in my limited experience
yea
@crude notch very possible! But I scraped 250 thousand prompts from this discord that people used for SDXL, and eyeballing them, they look like mostly tag based
I meant sdxl lol
yea, sd3 likes natural language
The ones that made it into showdown and pantheon are mostly tag based anyways
I don't think sdxl works well at all with natural language
^same
My impression is people tend to use tags for SDXL
exactly
it could also just be my bias to using so few sdxl models
it has 2 text encoders, one is probably better at natural language, but they work pretty much together
I think SDXL is better at language than sd1.5, but it's not a big difference. SD3 with T5, however, now that might be promising
interesting
honestly we should be excited about how it performs without T5
t5 is the "oh we need to make text" tenc
unless people find good ways of quantizing it at like 4-bit
good news!
t5 with 4bit? actually works well!
t5xxl? even better
Why not go for Bitnet 1.58
well doesn't it use T5XXL?
yea
you need to TRAIN at 1.58 bit
its a common misconsception, you can only use that to TRAIN, not Quantize
at least that's what everybody keeps correcting us about
i put bigsby back on my s23 so i could have all these voice commands for my smartTVs, ended up training my own voice model for bigsby now I AM my very own personal assistant it is werd AF
Really? When I read the research paper it says it converts fp16 weights into ternaries
well idk anymore then 💀
thanks
Inference code is going to partners shortly and discord tests next few days I think that will ramp steadily. API access soon as well.
So it looks like we still have a week at least or so to go before release
i wish
sd3 isnt even done yet
we might get a sd2.9
yeah a discord test
They're still fine-tuning it, but imo is that even necessary when it's custom weights that are what everyone will be using
what is the size of the checkpoints
but yeah its like 3/4th the training from my guess
there have been arguments to drop the tencs and make it 15gb fp16
with or without t5?
jesuuuus
the smaller models were a last minute thing
8B w/ T5 I though was going to take up around 16GB Vram
💀
nahhh this doesn't sound fun
I guess those will have a delayed release
oh yea 8b will run on 12gb barely
thats ok
unless they train that much faster
that sounds bad if you mean like 2 minutes/image or worse
dont tell people but the smaller models are just the big models chopped a bit
what about xformers + T5 int4

this is wrong, i forgot that it didnt work
lol
all models trained seperatly
same arch though
ah okay
i expect something like 5 img/min on a 3060
I wonder if this will take off or people are just gonna use the 2B model lmao
5 images per minute wtf?
that sounds fast
with max optim
tome, torch.compile(tensorrt), 4bit t5
2.5 img/min
tensorRT 🤔
or worse
that still sounds good
if I get 1 image per minute then its all fine tbh
I have a 3080ti though
(12GB pleb)
GOD that sounds good
Or the community is going to split 🪓
this is what I'm worryin about
Same here
unless Loras are compatible
switch to LCMs
same architecture, same clip, different DiT weights?
worried about prompt adherence, which is why I would use it in the first place
Who knows, we never had different size models have we?
yeah, only different architectures
Are LLMs interchangeable with their loras?
this is what i expect
cant wait for nai4 to leak and give different clip n vae
512 -> SD1.5, SD2.X | 768 -> SD2.X | 1024 -> SDXL, Stable Cascade
lmfaoooo
no-one uses sd2.x
I'd be okay with 2 images/min if they had good adherance to prompt and lacked artifacts and poor anatomy
yeahhh
I used to use it to make highresfix photos
and I like the paintings
the best model for 2.x in my opinion is wd1.5
I never tried it
and that model isnt that good
I saw WD1.5 images and thought they were mid
yeah wd1.5 is mid
I used illuminati diffusion or base model for photos and oil paintings
Imagine if they censor SD3 the same way they tried to censor SD2
but the entire point of wd was "hey, we're doing the same as nai but opensource lol"
My 2 cents
I wanna make funny images like this but with SD3
what is the low end as far as GPUs needed to push it?
It's meant for further finetuning or merging, but now that we have Animagine xl wd is kinda disappearing as a project from what I see
I'd rather they do not use T5, and use something like BGE-M3 instead
doesnt sound even possible for me
the goal is to run it on 6b iirc
why?
also I was one of the few who tried UniPC at 1 step and got somewhat coherent images (before LCM, Lightning and Turbo models)
t5 has proven to be great for image models
M3 is a dedicated text embedding model
T5 is not
Can't wait to try SD3 in comfyui in like 1-2 months (if at all)
the issue is that sd3 training started a LOT before that released
And third party benchmarks show it beating the OpenAI text embedding model
should act as sd1.x models
Ah good point, M3 came out like 6 weeks ago
????
Probably wayyy too late for SD3
you mean like no refiner, I know about that
SD3 when?
oh yeahhh
not today, check back tomorrow
different sampling techinque
Ty
I forgot to dive into the sampling
also t5 worked for the closed-source deepfloyd
tldr, LINE!
LINE?
https://huggingface.co/BAAI/bge-m3/resolve/main/imgs/others.webp This was the third party benchmarks comparing OpenAI embeddings vs BAAI M3
the sampler is a line
I wonder what architecture SAI will adopt for their new video model
Starting daily “SD3 when?” messages
nothing to argue about, no placebo with "this sampler is better than that sampler"
probably a variation of sd3
I get that T5 may have worked for deep floyd, but T5 is also huge
But yeah, I agree. M3 came out too late for them to experiment with it
bro I'm using Mixtral 8x7B at int4
Only came out in 2024 Jan
there was this sd3 image i wanted to post but it got deleted (e.g. no sharing!),
I can’t see, sad 😞
it was fun before LCM came out
#👥|roles message react to these
also get ready for yet another vae
Wait, 16 instead of 4?
also, all i've said is public info (if you know where to look)
which one?
deep floyd is open source. it was the model that had a restrictive license, but the weights are still released. the training and inference code is permissive. https://github.com/deep-floyd/IF/blob/develop/LICENSE
sure
helps to look for the info right? 😉
oh boy here we go, six fingers, bad sword, no character interaction at all: https://twitter.com/Lykon4072/status/1766914556146696427/photo/1
that’s what I was concerned about, I wonder why that’s the case
four fingers on the left hand
idk what way to look at the glove
model still not finished
idk how much it can improve in the last quarter of training
it has improved a lot since the half point
yep, it improved
idk if hands specifically will improve
I personally don't care about perfect hands, but it should be expected from an architecture upgrade (or just a next installment in general)
Let's hope so https://twitter.com/EMostaque/status/1766890231687450933
oh god, the hands... 😄 https://twitter.com/Lykon4072/status/1766924878223921162/photo/1
helicopter is a disaster: https://twitter.com/Lykon4072/status/1766924765925662741/photo/1
this is bad.
i hope sd3 is bad so we never switch and stay on sd1.5
I don't think hands are a deal breaker for me
But I recall before the SDXL release, we saw lots of pictures with text as well
So hard to judge how good SD3 is based on these (understandably) cherry picked pictures
I don't fault them for cherry picking
hold onto your papers fellow scholars!
bad news
i pray for sd3's downfall everyday
wym? all the comments i see are saying it's bad
text is good in this one, pixel art cyberpunk style
there are so many examples i cant post that are actually good
3/4 of this community is: no corn, no care
damn fair enough
and also hardware requirements
sd3 wont be able to do that unless you ask nicely
we should just stay on 1.5 then
I want funny images
lewd....
see, my suspicion was correct lmfaoo
sd3 can do weeb though
why even make sdxl or sd3? 1.5 is already enough for booba
honestly people will train nsfw in anyway, this doesn't look like SD2.X
"hey windows xp still works well enough"
SAI should shift and do crypto/web3
if people train SD3 with corn images with detailed examples, you could get impressive images instead of just big boob girl portrait

that's my guess though
I want to know if it's as censored as SDXL or not
if its just SDXL level of censorship we'll be fine
doubt it will be better than sd1.5 finetunes
probably sdxl censorship
remember when people said "yeah sdxl wont be that good"?
Stability: "Mark.. This is Good news...."
SD community:
still kinda mid ngl
shit my bad
sure whatever you say
I read comment from Emad that it would be better at fine tunes than 1.5.
SD Cummunity
lmao
damn fr?
I thought finetunes will only improve it as much as it improved previous models
Yeah...I'll try to find it.
I do expect finetunes of SD3 to handle new concepts better though
the day when sd insert version number can make can make perfect hands everytime while running on 12gb vram is they day i switch from sd1.5
dont take everything emad says as truth
I guess it was this that I saw
emad does emad things
yeah...
Nevermind it won't let me add a screenshot
and still nothing
Lol yeah that's true
so yeah
i dunno, SD3 seems to get the overhype as MJ V6 did get
the problem with fintunes is when people learn to train the models well it will be already time for the next sd version
Supposedly this is last version....we'll see I guess lol
100% sd3 is gonna be sdxl clone that runs slower
x to doubt on last version
yeah emad said that it wont get better, but uhhh 💀
why do you stick with 1.5? Because of spec requirements and amount of plugins?
it will become more efficient, even more intelligent, it will get even better anatomy
there's no final model
unless the company goes down
the arch is a lot different
by image quality i mean
god 🤢
imagine this is gonna happen before we get sd3
mistral got acquired by Microsoft
hard laugh but impossible
only 2 open models released then they made like 4 closed ones that outperform the open ones
it makes zero sense for Adobe to acquire Stability AI
eh, someone will leak the model if it does happen
not likely the testing is happening through discord and only small group of people have sd3 weights
if the company ends, i am sure a dev will leak it though
100%
lykon is not a dev confirmed
jkjk
dev will leak a unfinished model that only beats sdxl base
in which chat can i ask for model recommendations?
im a weeb, so most weebtunes are better than sd2.9
exceptions: wdxl0.9 (we undercooked it SO much)
then i am asking here, i need a fantasy model that is not sexy focused X).
Maybe Dreamshaper?
you can control sexy with "sfw" and "nsfw" in positive or negative prompts, and many other tricks
i tried, but still was often skimpy or something was only on the nipples...
are prompts like gatekept?
Sure but prompt adhesion is alot better. So thats a huge advancement. Even then i think its also a quality improvement
Thats without fine tuning
Imagine that juggernaut and dreamshaper sd3 would look like
MJ will finally go out of business. Hate them, bunch of liberal mods
pfffft didn't expect that sentence to end with "liberal mods" and MJ is not going out of business anytime soon, they have a large paying user base and have more mind share. They can get investments easily and they can always just train a better model with no regard to if user base will be able to run it
Maybe.. it gets hard to make large progress after a certain point
lol what? censorship and preserving a sense of decency, keeping things unchanging, is conservative.
Bro just look at their logo on discord. Nuff said
personal liberties like not being attacked for being lgbt is also conservative. resisting disent, change, revolution, making sure populations stay cohesive, is what conservative politics is about
hating people because who they are, those are the people trying to change and disrupt
it has a rainbow flag. case in point
Ye, liberal
why hate people? its so weird
they wont
its mass appealing
it'll peter out like leap motion
Bro nobody is hating. Its an ai company no need to get political its weird. Either celebrate everyone or no one.
the one thing i dont like is that MJ allows mockery of Jesus but not mockery of prophet Mohammad
you're pretty mad about their rainbow flag. thats basically hate.
they're a private company and can do whatever they want. I think that flying the pride colors smokes people like you out and that's a good immune system for a community to have
K bud.
Cause Jesus's skin so thick after those NiN. what did mohammed ever go through? 8 child wives? boohooo
i dont care about the rainbow flag, i support the same rights for for example homosexuals (but for sure im not going to carry that rainbow flag ofc)
/offending everyone

yeah they are a private company
their rules basically
whose house? MJ's house
i dont see them crashing and burning though. they'll have a soft landing and just become irrelevant as the tech progresses
dunno if they will become irrelevant
its hard to move people away from it even if a competitor offers better features for example
once cellphones can run models at the same quality as mj , why woudln't they be irrelevant? they'll pivot maybe, or just not. the founder will likely move on to a new company. like he did before
i mean depends on scenario, but realistically right now there is no end in sight
we can ofc mention speculations
i know that MJ was also supposed to be killed off by DALL-E but also SD according to a bunch of people on the internet which ofc didnt happen
i dont think another saas will make mj irrelevant
ubiquity will
i could be wrong though. i also thought twitter wouldn't last because the sms protocol wouldn't last.
but here we are
yeah we will see
tbh for a short brief of time for some reason i had the thought of MJ being killed off by D3 as well
it probably ate a bunch of their subs over night
this just in. teamspeak is the discord killer. https://twitter.com/teamspeak/status/1766547230494748875
i almost forgot TS still exists XD
they're like winamp. still functional and pretty good by modern standards, but too old for the new kids
oh no they added more since then
Hi everyone 🙂
I am interested in training my own Lora, I wanted to know what is the minimum amount of images or information for my lora to be well trained.
does anyone know how I can do Aesthetic score finetuning?
Where's a cheap place I can run python code online, on a good GPU?
I want to run any new experimental stuff I find on Github
what the hell is that :kekk
You know, they should show SD3's ability to do DALL-E 3's claim to fame, I.E. it's ability to do multiple subjects in complicated/complex situations and from multiple angles.
Can anyone help me?
When is SD3 coming out?
I have no idea.
I'm registered on the waiting list
I saw the nice samples on their page - love the idea of incorporating text into image output
😮
gotta find out when its out
but i put already a bet that people expect too much for what they want
SD3 is taking ages!
during WW3
https://twitter.com/Lykon4072/status/1766934948752113735/photo/1 prompt: "girl holding three pencils in right hand and sunglasses in left hand"
If you guys could create any image you want, what image would you make?!
whatever would convince someone to immediately give me all their money
gay alien corn
I like comicbook & anime style images -- good quality ones
select the folder and hit the del key
small
mine's only 1.95 TB... just 135,774 folders and a mere 1,020,975 files
how much ram should I have for SD
Anyone looking for freelance work? Need a few things made, only experienced ppl pls.
just checked, only generated 55,790 images since last november
Does anyone have access to SD 3
How would I get access?
idk.
but video games I got are way more
you'll never game again!
certainly possible with the current trajectory lol
I play totk every morning before work 🙂
i binged the absolute f outta that for a month or two last summer
built every crazy thing imaginable and then that was that
/create
/delete
https://thequantumbyte.substack.com
This is a very cool weekly newsletter on Tech, Science, Math, AI and Personal development.
Subscribe it for free.
Nothing more true
I merged Action Figures with Pictures so i can have unlimited poses and randomly any character i desire
only if i can run it locally
legit zero interest in any cloud crap
age and full time work bulldozed mine
got the full time work part down
that killed my interest in fetch quests thats for sure
lol
remember the game blastcorps? you had to bulldoze cities before an out of control truck carrying an armed nuclear missle ran into any buildings. now that's a game i can get into
nope never played that one ha
rareware
now a Full-time image generator 🙂
haha
50% Image Gen, 50% ParaVM
good morning everyone!
i'm getting pretty darn close to pixel art animations with acceptable consistency, so i'm in a great mood! :D
Question. is there a particular model that is better at creating art as opposed to humand. I'm using Juggernaught XL currently and though I am happy with tghe output for the scene i described I just wondered if people have specific models they use for specific situations, You know the old adage "Use the right tool for the job"
i'd recommend looking into art style loras and specific checkpoints. have a look around civitai and i'm sure you'll be able to find something you'll like!
Hi Yellow
Since I'm thinking of developing a game with RPG Maker and I need sprites, are you saying that stable diffusion is great for these things?
as of now, not really. however, as my workflow improves, it'll become better and better at it.
RPG maker uses chibi sprites, right? i haven't really looked into those for animation just yet. that may be a fun side project
yeah exactly
i'll go look into it. i got a free copy of RPG maker XP, so i may try and use it as well
Try Dreamshaper XL or Blank Canvas XL
Thanks
Thanks
and both of them work with SDXL? or are checkpoints imune from versions?
The checkpoint will have a base model - which is effectively the shape of the “brain” (model architecture).
Those two are both SDXL models so will work if your system can run sdxl 1.0
Those two also dont use the refiner model btw
Thanks. its still all double dutch to me but I am getting there. I am not using refiners yet so thats not an issue. I have an SDXL install on my local machine and it is running well so hopefully it will go according to plan. Thanks for your help.
Ahjh well just another 13GB to download. I'm glad we don't use 1200 baud modems anymore. we would all be in the dark ages.
Yeah 300mb down 100mb up fibre to the door so damn good for stable diffusion!!!
baud, I haven't read that in a long time
insert ''Í was there Gandalf" jpeg quote
It still applies to embedded stuff like arduino projects.
I need to sort out my conectivity. i have 500Mbps cable, but then pipe it round the mains in the house where I am getting 190Mbs at my sedktop.
Its good to know the old tech as well as the new. gives you perspective. I remembe running a BBS system back in teh early 90's think it cost about £400 to get content onto it with all te telephone bils for connecting around
Don't know how it works where you live but most ISP advertise xxx mbps/gbps and under deliver. Reading the fine prints it's always "yadi yada best effort networks yadi yada advertised speed may not be reprensentative of real life speed yadi yada give us money you don't have choice anyways"
same here but if i connect in the front room i get upwards of 400Mbs. Just did a check on wifi from the back office and im hgetting 94 dowen and 68 up on my phone
rediculous, just did data on phone and peaked at 1.2Gbps, susyained 980Mbpos down with an up of 78Mbps. maybe a tether 🙂
what a strange idea to live in a house with walls made of lead
:p
good luck figuring out your lan issues
bricks are quite good at dulling waves especialy when you are going through 3 or 4 walls
Thanks, but no real issues. just niggles. it works faster than I can
I lived in a farm house with 1m wide interior walls, I feel your pain.
Nice house by the sounds of it. bugger to heat in winter
bedrooms are next to the fireplace. It's quite easy to heat. Just have to get used to chopping and storing wood.
fiber internet is not a thing however 😄 not even ADSL can reach it.
it s wimax or satellite only
Memories of my childhoos in big farm houses in Scotland. one of the chores was splitting and stacking logs. a dirty job when it had rained. Chatracter building stuff though.
There is a village about 5 miles ffrom where i am in south east england and their internet is still delivered via copper cables. population of under 500, so little likelihood of them improving their megre 5Mbps. mates come here to download stuff
similar vibes http://news.bbc.co.uk/1/hi/world/africa/8248056.stm
Anyways I ll stop rambling and give back the mic to stable diffusion content.
love that RFC 1149 IP over Avian Carrier. April fools joke back in the 90's
Thats a great story about winston though
hey in dreambooth, do u train it, and then generate the checkpoint?
I asked perplexity your question as i like to see what comes back on a subject I know nothing about. https://www.perplexity.ai/search/in-stable-diffusion-W1bxi9MiQXCZbv57155TeQ
I cant say for its full completness but there should be enough there for you to answer your own question
Anyone got any tips for fixing hands on inpainting? It seems like if I put it on Only Masked mode it adds a bunch of random stuff, and if I put it on Whole Picture it works better but I just get different kinds of malformed hands 😅
Faces were easy enough but I'm having a hard time figuring out hands
are u using negative promps to avoid mangled hands/
and also idk, make a batch of 24 till u get a non mangled one
when sd3 release
i have a question, if i make images with Stable diffution can i sell them ? like comics e.t.c ?
Depends of your country, your business and the models you re using. Can't get get you proper legal details about that through discord, that's not really the place to have those discussions.
I'm using "bad hands, deformed hands, extra fingers" as a negative
Anythin else I could add?
select what you want to give a higher weight, then ctrl-uparrow
hm doesnt work for me lol
Howdy everyone! I recently found out how to get consistent faces using LoRAs using SD in A1111, but now it's time to use my own face for my project. Does anyone have a link to a good tutorial to making their own LoRA for this purpose? I assume that would probably be the best way since it's working so well, and I can take unlimited pictures of myself.
uh im pretty sure its exactly the same as the stuff you did before...
just take like 20 slightly different pictures
Do anybody know about a model able to generate a Booru type prompt from a human natural language prompt ?
no clue what that means
Like you input "A girl standing" and you get "1girl,brown hair,long hair,standing,..."
cant u just use the booru interigator on irl images?
Yes but it necessitate looking for an image matching my idea which is kinda limiting me
generate an image with natural language and use that with the interogator then ?
no I used a pre made LoRA lol'
but anyone with the same LoRA can reproduce similar images, so thats no fun
otherwise no, I don't know about such a booru translator
I went really deep into deforum and image masking to make crazy half ai/real videos of strippers at the strip club I used to work at
yes, I do. Is there a good resource for that or just watch tons of random youtube videos?
idk i havent done it yet, i went through hyper networks, to dreambooth, and ill try lora after
but there should be a train menu
cool I have never heard of either
i hear its even faster than dreambooth
there no sd3 room?
sd3 isn't released yet. So no.
yeah this is the last news I have seen so far
Inference code is going to partners shortly and discord tests next few days I think that will ramp steadily. API access soon as well.
Discord testing will hopefully be this week 🤷♂️
(just a wish, not a fact)
Yeah we've heard "next few days" for a few weeks now lol
^^^this too
I just hope it will be this week
it's been almost 2 weeks since feb 29th, which was when they promised the test apparently
Me too...hyped to test it
the more u talk about it the sooner it will be released

SD sora model when? 
Use controlnet open pose, and if possible use the base model, as derived checkpoints tend to introduce more such deformities as a side effect of training/merging
I can't remember the last time I had to fix a hand, people must be using crappy models, or 1.5
I just hope SD3 will properly work with Controlnet again... there were so few well working CN's for XL and they never worked well (at least that's been my experience).
I imagine if it's an architecture change again, it would be a delay again before we had those models, as with XL ... But hopefully that hurdle will be less since presumably many of those prior efforts can be leveraged
noob question. If i install SD3 into a seperate path I take it I can keep runnungf SDXL and SD3 seperately?
use the openxl hands fix lora, aka the slider lora for fixing hands
its now 5 days after emad said "tommorrow or the next day"
Meh, days, weeks, doesn't matter
it matters to me 
theres tons of controlnets. they work great. even have tile models now. the open pose has been the big one that didn't work for a long while but i've just substituted depth maps instead
They will release controlnets alongside SD3
iirc
or at least within probably the 1-2 weeks
Tomorrow means 5 months

That would be fantastic, skeptical of it being that fast
I am trying to find that tweet but twitter f*cking sucks
I can't search from a user
Avarage x moment
but I remember emad promising that they'll be launching SD3 with controlnets
they are training them
hopefully canny, depth, openpose are guaranteed
and possibly tile
How much options u think the discord bot will have
none for controlnet
Or just "image"
hmm
it will probably have aspect ratio option
don't know about CFG
probably a style option
Hopefully
I want to try how different I'll have to prompt
Would be good for logos, posters, backgrounds etc
Does it do that enhanced prompting that I see some things do using an llm
Idk much about sd
if I get massively better results with CogVLM style prompts
not -> "cinematic film still, candid cinema, medival knight infront of castle, HD, intricate"
instead -> "A cinematic film still image of a Medival Knight infront of a castle that has ivy growing on it. The medival knight is holding his giant metal sword at his waist height. The sword has a big red ruby gem on it. The sunlight is casting a strong light on the medival knight's head which causes bloom."
Good for you then.
not really about me, or you. its about accurate information.
Interesting
And use cases and resolution / ratio etc.
And Openpose was one of those I tried often that always failed. Depthmaps & other CN's also don't work well if the source image is drawn - those worked better with V1-5.
It's gonna get better for people who would prompt it more like explaining an image to a human instead of computer, please make X in Y with Z in the style of A
But it's great that the dataset only consist of Half CogVLM and the rest is regular Laion Clip-like tags, so it's still hard to mess up, but you are rewarded for exploring and thinking creatively
canny for sketches is better. depth-anything preprocessor can handle most anime, but it'll break when you get into doodles / kids drawing.
As soon as sd3 is public ima speedrun make a colab for the memes
im curious how well it works regarding styles/textures.
focusing on what doesn't work, instead of using whhat does work, kind of limits you
the proverbial you. we're still not making it about you or me
ive already blocked u days before
Funny. I just told a friend that SD3 memes will become a thing.
I wanna know how much 8B matter. Does it know more characters in the base model for example?
more parameters is more knowledge yeah. potentially has better character knowledge. but they'll be built on the same dataset if i'm not mistaken
The dataset always contained the characters I wanted, but they never worked
not even in SDXL, which was the biggest model at 3.5B (Base of course)
captioning of the base model is a big deal too. i think 3 is going the ways of other modern base models and captioning so much better
hard to know the difference between anime characters if they're all captioned as anime girl
hmm with CogVLM and the premade tags idk how it will end up
Where did sd get it's dataset from 🤔
CogVLM must know basics about popular anime
I appreciate your input about XL CN's, but I don't appreciate the condescending tone.
Take that however you like, without taking it personally of course, because this is not about you or me.
you mean this tone?
for example this, you can look at the others
i tend to reciprocate the energy i get.
Interesting
god at this point #🏞|general-with-images should be named 1.x channel and this #💬|general-chat should get image perms, its so annoying not having to post examples and previews
that's what they use it for 9/10
i think it's built around it but it's not purely laion's set
By built around were more images removed or added
guys which ones harder to make, lora of a artist' style or lora of a character?
Same. It's like a perpetual back and forth of "nuh-uh, you!"
Makes it hard if both parties can't leave the unwanted convo without having the last word.
a room without imageperms is good to have. thats why they're seperate now i ifigured. you give #genchat image perms and it'll look just like gen with images
hmm true
i'll just not offer advice anymore. somehow it turns into arguments.
maybe if embeds could work, like twitter updates then 🤷♂️
Ahhhh... That's better. ☺️
And gifs 🗿
just use gen with images if you want that. any chatroom with embed perms is going to become that.
I would say they are roughly equal in difficulty. Your expectations will set the difficulty bar once you have a basic mastery of the process for each, subject vs style
I solved it 🔼 @pseudo jetty. It was all about the source of the seed RNG generator.
Both DirectML and ZLUDA installations had GPU seed enabled by default.
With DirectML the CPU seed and GPU seed are the same, and NV seed is different.
With ZLUDA the GPU seed and NV seed are the same, and CPU seed is different.
So the GPU seed sides with CPU on DirectML (torch+cpu) and with GPU on ZLUDA (torch+cuda)
So for any images generated on DirectML's CPU/GPU RNG, just need to change the ZLUDA seed RNG to CPU to regenerate the same images.
Just now: "This week, @xAI will open source Grok"
~ Elon Musk
lol i love that he's flipping out about open ai this whoel month, but then grok
he won't release weights. calling it now
I tried making a style lora and it barely resembles the artist, it resembles more of the base model. Also would you recommend training them on colab environment? im planning to buy a pro version since my pc dont have very advanced gpu
That was 8 hours ago though.
Tf is grok
Elon's LLM
Hmm
also a classic science fiction term for empathetic understanding and communication
isn't grok like "based" or less aligned?
i hate that elon is shitting all over the word
I don't remember
lmao
Shitting is based
hell yeah
based is a shit word. it's been ursurped by race supremacists
Sigma
well that's a based statement

Grok being open sourced isn't a big deal, it's a weak model with few redeeming qualities
the original meaning yeah. the way it's used now a days is demented from that
If it doesn't use much computional resources might be useful ig
i saw a prediction from an ai researcher i follow. end of 2024 we'll have at least 5 gpt 4 level models. gpt5 won't happen. scaling gpt larger won't improve it's capabilities anymore.
epic
open source and weights are different things. he won't release weights
ok so train them on 3000 googol Tokens then
got it
and use 0.0007538 bits when training
ahhh
now theres a word i haven't seen in some time. the proper googol
should have looked deeper in the news
almost looks fake
1 with 100 zeros
cant wait for llama3
we needed the definition thanks dictionary.com
oh wait we're getting off track
Np
lets talk about stability's LLMs lmao

🦗
if I recall correctly it recreates a face in Stable Diffusion
it tries to resemble an input face
I use vast, but colab pro probably works fine
it's an sdxl thing. i use it with the controlnet extension on a1. catches identities and faces with one or more photos, then reconstructs them into new pics
Oh how cool. Will try it soon
theres ip adapter face swapping too which works good and on sd15 models
Is there any model which changes the expression of source face. For example: wide open smile to make it closed smile?
both instant id and ip-adapters will "attempt" if you prompt that way
I mean not to generate a new image. Just the same source image but different facial expressions
Or try just prompt + inpaint, controlnet seems over kill
I will look into it soon. Is there any tutorial on how to do it?
potentially? maybe inpainting with those same tools? there's this thing but , it's on its way not here yet. eventually i believe in expression shifting. https://humanaigc.github.io/emote-portrait-alive/
Yeah I've heard of it but it's for lip syncing. I am not sure if they will release the code though
i tend to hate tutorials and just go for it and figure things out. automatic1111 makes such exploration fine but i'd never attempt that on comfyui. every custom node developer has their own ways of doing things and you've got to follow along with docs to hook things up always.
What I want is: to change only the expression of the face while retaining the rest of the details of the image
ComfyUI is too way advanced for me
says they will. the rumors say they wont' because they're chinese. i hate such xenophobic rumors.
the research is published and other people will implement it eventually if they don't. it's a good release regardless of rumor mills
That's for videos.
its also more than lipsync. theres a few of those. this one emotes the lip syncing
thats why i brought it up. is very expresssive
Inpaint, select the head, masked only, take out all tokens that refer to other body parts or scenery,change the prompt, use smirking, or play with other descriptive terms, use denoise of maybe .45
There is an app called FaceApp which also changes facial expressions, I was wondering I could do the same with SD
i'd also turn on controlnet to offer face guidance to the inpainting
that's a classic face swap. post process. just pastes the face on over top after the fact. works good for many cases. i think an extension called "face chain" might help tehre
If your expression is very particular, then ip-adapter is how I would do it, find a portrait online with that expression
Not face swapping. Just to change smiles
hard to change the mouth without changing the whole face. if someone smiles without the rest of their face shifting it'd look alien
I had one with a guy looking up with wide eyes like he saw a. UFO, used ip adapter
You have a point. But at least retain the same face without altering too much of it
But it will clobber the look if you use a high cnet value
So now I use IP Adapter and Instant ID?
ideally. i won't say it'll all be automatic. you'll have to play around with it and practice your skill
dialing knobbies is one of my favorite learning exercises. good way to get a feel for things. tutorials don't provide that true "groking"
And is there a model to change clothes without altering rest of the image?
not really. ip-adapter and fiddling with knobbies again
Much harder, that one
i think there are models being worked on. salesforce would be interested in such things.
Sometimes if it's a simple change, you can get away with inpaint sketch
segment anything works well for masking only clothes
Like give the guy a brown jacket by literally coloring the area in brown
is there something specific u gotta do to run sdxl? ive been using 1.5 for a few days, and i wanted to try sdxl but it cant even generate a 128x128 image, runs out of vram error?
Which UI are you using, Auto1111?
no, comfyui
how much Vram do you have?
8gig
it should work fine...maybe try a different model
comfyui should be able to fit sdxl into 8gb. make sure your vram isn't being eaten by anything else. check your task man
would hope you're using nvidia too
only do 1mp images
like 1024x
512x at the minimum
why?
hi guys is there any website like civitai that i can find custom trained stable diffusion models?
wasnt trained for anything lower
nvidia hardware?
no, amd
thats why
so amd can do 1.5 but not sdxl?
not on 8gb
i had no idea
What's your GPU?
its an old 5700 xt
amd lacks a lot of optimizations
also did u guys see the paper that can make sdxl beat dalle 3 in prompt understanding
you're better off loading it up on linux and generating images there
weights release in a week, should get it in comfyui
the efficient large language models paper fuck yeah
With that your out of luck with using sdxl :/
won't run those on amd
ill buy a nvidia card eventually, just wanted to atleast try sdxl once, but guess not lol
You should stick to 1.5
alright thanks guys
go linux. do it. you'll love it. after the pain subsides
yes it seems amazing altrough I only have skimmed the paper
But on 1.5 you should be easily able to do 512x512
Okay
Thats good. For higher res you would need any form of tiled upscale
Like SD ultimate upscaler extension
yeah ive only been at it a few days, but ive gotten great help here
ive tried tiled upscaling and inpainting and stuff
ive been thinking of getting a 4090, but yeah ....$2500 in my country
Honestly hold off if you can
Rtx 50 series is not too far away
If you can wait for 2025
yeah, if 4090 is 2500 what is 5090 going to be ....
According to leaks and what nvidia said, it'll be worth buying over 40
maybe not 90
but maybe 80
u will have to see the memory too
but overall there has been a trend
much better performance for slightly higher price
logically that is a better choice
slightly higher? i doubt that
Compare rtx 30 to 40 prices
And then performance
They say the difference with rtx 50 will be even bigger
If you're ready to pay premium price for top of the line, you might as well wait for the latest one to release
And get more value
RTX 50 will launch higher and 4090 will stay relatively same price. Nvidia isn't about to leave money on the table. Nobody is catching up to them anytime soon
I am guessing
if 5090 vs 4090 is going to be a bigger diff than 4090 vs 3090
The 5090 could cost 200-300$ more than 4090
MAX
or ppl wont buy it
and $200-300 for that type of increase in performance is a good deal
I heard a rumor that the 5090 will also merely have 24GB VRAM... which means that for AI purposes you might as well just buy a cheaper 4090 or 3090.
Where?
Also, does bandwidth matter for ai generation?
It will have insane bandwidth according to leaks
Somewhere on Reddit. A so-called "leak", but we'll see if it turns out to be the truth.
memory bandwidth?
Its not a rumor. its an industry fact. TSMC won't be putting out larger chips for a coupel years. it's the density of each chip which defines the market
vram bandwidth
For LLMs it matters iirc
So far the biggest limitation for working with AI Models & training / fine-tuning was VRAM.
It could be 24
or the speed of the vram
Anybody have any recommendations on how to generate consistent AI influencer (with a consistent face) in different backgrounds/ content situations using stable diffusion, focus, or any other tools?
LLMs?
theres lots of rumors of a higher vram 5090 as well it seem
I only read 32gb
large language models, ones that generate text basically (Llama, Mistral, GPT-3)
or reply to your queries (ChatGPT, Claude, etc)
ah
if i understand it right, right now we have 3gb gddr6 chips. 8 of those fit on a board. more if the board is bigger (enterprise). Supply chains matter a lot too since you can only produce 3gb chips so fast. In 2027, they'll start making 4gb chips.
I don't know how diffusion works exactly and how it utilizes memory speeds
would cheap 128/192bit 24GB gpus be viable 🤔, would people like them?
not really about diffusion specifically, but it sort of visualizes how cpus parallelize computation vs how a cpu does it. I feel like this is applicable knowledge to the subject https://www.youtube.com/watch?v=-P28LKWTzrI
sounds like a way AMD would cut corners
Don't know about the influencer thing, is that like a cute tiktok girl? Lol... You probably need to train a lora for consistent look in different backgrounds, but to get the training data you will probably need to face swap
not a complex prompt unfortunately
but it still looks good for a base model
https://twitter.com/Lykon4072/status/1767229515523084319 this one is more interesting
influencers are people on social media that hype brands and opinions. they're walking talking billboards basically
wont releasing a gpu with high vram, somewhat lower the value of their super expensive cards like a100?
they won't compete with themselves yes. but also, supply chain issues
no
4090 not having nvlink is a big example of them not wanting to eat their enterprise sales. companies would buy 2 4090s before they buy an a100
they seem to be 10x the price of 4090 so
exactly. 2x 4090 is a lot cheaper for 48gb
they dont leave money on the table
thats how a video game hardware company has become a trillion dollar company
Cool, so what would be the point of creating a fake one in SD? Rhetorical
dont' have to pay a walking billboard
answered anyways
wait, no gpu in 4000 series have nvlink?
correct
wow, thats so sly
they did put the hopper transformer engine in every 40 series though
i had no idea
no sli either (heh)
i did once back when crysis 1 was fresh, and it still didn't run good
riser cable yeah. still not worth it imo. software has to be specific for it and it often causes bullshit like stutters as the cards negotiate
i wonder if u can sli som old gpu and get decent speeds for generating images
sli and nvlink are better suited to research and development
SD3 when?
multigpu rendering has JUST come out, but i don't think any ui has implemented it yet. one image being made across multiple gpus that is.
stable swarm can do it
swarm just queues images over many gpus i think
like a spool
even if you share gpu memory on one image, i bet it'd still be a single processor doing the generation
i guess u can have 2 gpu and render 2 images at once, thus making it 2x faster, in a way
there's something that can actually generate one image using the vram of two gpus that arent nvlinked without a significant slowdown?
multigpu rendering doesn't really matter tbh, but multigpu training really would
i dont think anything like taht is a thing
yeah i don't think it can be a thing
100% . but 1 image per card at a time. if you've got 100s of cards, thats 100s of batches at a time. could work for video potentially if you can somehow manage consistency across keyframes
Which do you think will released first? Inference code for Ella (https://ella-diffusion.github.io/) or SD3

lmao
We propose a novel lightweight approach ELLA to equip existing CLIP-based diffusion models with powerful LLM. Without training of U-Net and LLM, ELLA improves prompt-following abilities and enables long dense text comprehension of text-to-image models.
This is impressive though
yup
Stability training everything so that it could work with T5 perfectly and these guys make an adapter that makes almost any Local LLM work with current Diffusion models 
hate hearing these announcements with "code to come later" "weights to come later" bet it's not just a week lol
same
I remember having "code soon" stuff and it either doesn't come out or it comes out almost a year later when nobody notices
IMO what it is are people who are scared they'll get "scooped" by someone publishing something similar first
so they put out the report with just enough to make it publishable
when they know the code, etc is a mess and needs months of work to be suitable for the public
makes sense
i cheer them on. these are people trying to get their phd i'm thinking
publish first defend later
the problem is, the big reward comes from the "communication", the first publication
throwing in the other stuff later doesn't carry the same impact and it's better to chase the next communication
so tons and tons of stuff in scientific journals in general is like this... the big hit, the thrilling new method/discovery/whatever, usually with "studies to ____ are currently in progress" in the conclusion
and rarely do the details get fleshed out
that's generally the end of the road lol
its all scientific process. academia is a whole different world
*Publishes paper with no code*
*Here's an amazing advancement in AI research*
*Refuses to release code*
*Leaves with PHD*

when newton published the principia he didn't even tell people how to understand it! Leibniz had to invent calculus to figure it out
anyone got a RTX 4060 Ti? how is it compared to 4090?
time honored tradition
naw. thats just magical thinking
wheenn i get access... does anyone have it yet in here
it is what it is but it's unfortunate that getting things working and building a solid framework dosen't reward in the way that the first "OH SHIT LOOK AT THIS REAL QUICK" does
it really does because we end up reinventing the wheel all the time in research
someone publishes a communication with the exciting new result, graduates, 90%+ of what they tried in the lab isn't ever published, along with the failures
nobody has access
so then other labs try to follow it up, and end up having to work out all that stuff again for themselves
double performance at 40 series level is quite a leap. more than doubling a 3060's performance
lots of redundancy
that would still happen believe you me
i hope. because this so unfair if my country is going to be the reason i will have to wait more
if every researcher was forced to release complete inventions, they'd put out a lot of wheels
ok
this ain't bard or gemini
and alos, reinventing the wheel is a thing. it happens all the time. think we started with goodyear?
nah, not saying forced... just saying it sucks that it's disincentivized
same thing with negative results... those aren't generally publishable but are pretty fn important
if no one ever reinvented the wheel, we'd be full flintstones stone carved wheels in wagon ruts
(coming from the perspective of a chem phd so maybe this dosen't translate as well to other fields)
you're a doctor of chem? cool. wouldn't have guessed
yeah, just not much value in rediscovering the same mundane shit over and over. it's like learning software without the manual cuz it's locked in a closet, you'll get there, but not very efficiently
getting teh entirety of the human collosus onto the same page. nobel vision.
really depends on how much you value a dolalr
4090 is 1.6k if you're a hawk about getting the FE
i already had my gigabyte 4090 oc but a few weeks ago i had a FE in my cart on nvidias site and almost bought it to sell at cost to someone but then got lazy and let it go
4090 has been so so worth it
yeah idk i haven't had any issues
that was apparently ppl not snapping it on properly
i've got a 4080 for $1100 canadian. Was the best deal for over 8gb at the time. Canadian retailer prices are very gougey and the 4080 was getting downhyped online so it wasn't boosted in price.
and then buying junk cables that were supposedly superior (but inferior) replacements for the originals
I would've got a 4090 but i didn't have the cash to spare
less than 1% of buyers had issue but youtubers hyped it up for clicks and views like they were mass producing fire hazards
when investigated too, it was usually people who bent the connector hard or damaged it when yanking it out to retry
1% was a generous over estimate. it was a non issue. there would've been recalls if it had any significance. the reason you think it was so huge is because fo youtuber clickbait.
really the biggie with the 4090 is just be sure you use the support brackets etc
but that goes for any of the big 3080 3090s too anyway
Before summer, most definitely.
checking in again, has anyone gotten sd3 access?
Really?
and if they have, is it full weights or just some online interface
I didn't.
Any good upscaler for A1111
Try looking at https://openmodeldb.info/
Can you recommend any
just download the largest one
download all of them
try all of them
you think this is science?
it's fucking art
there is no best one
all of the gan upscalers suck imo. i wouldn't use any of them. theres supir by google which kicks ass though. heavy to run but awesome results.
wait no, that one's by tencent https://github.com/Fanghua-Yu/SUPIR
I can't say for sure but that's what I read on an SD3 thread on reddit, and as we all know, everything on reddit is true
as i understand it, the 8b unoptimized will fit into a 24gb card. with fp8 half precision, it should almost certainly load into 16gb
So there's some clothes I want to turn white when generating some art. But when I generated 10 images specifically with the prompt mentioning white clothes (and more specific words like white corset, white stockings) only one returned with the intended results, how come even when the AI proves it can read my intentions (as with that one example) it generated 9 ignoring the conditions?
Yeah prompt bleeding is a thing. Stable diffusion isn't exactly great at prompt understanding. Colors especially will bleed all over.
Hopefully those days are over with sd3... for the most part
reminds me, the other reason it drives me nuts when people publish and say "weights/code will be dropped in x days/weeks" is then you're stuck wondering if you should bother working on improving a workflow or not
There are a number of tricks. Sometimes you can paint the colors where you want them, then controlnet guide the image against an img2img situation, where it has base pixels and guidance
i've found the best thing to do is shit like that^
paint the color over it and img2img without specifying the color for anything
let the image guide it
inpaint controlnet can be handy too, or tile
generate the image, put it into controlnet for pose or depth guidance. paint over the image in "paint" or whatever other app, then paste that into img2img. give it a denoise. regen. that's often my process
more often i don't have a plan for an image and if it doesn't stick to the prompt exactly, i'll squint an tell myself "eeeehhh good nuff"
can also be done in comfyui pretty smoothly with frequency separation or some of the layer blend nodes
generate the image, clipseg and mask blur out the shirt, color or hue blend, or (better) frequency separate and set the color on the low pass
that's a lot of new concepts I haven't heard about yet but thanks, it'll take me a while to research all the options being mentioned
then do iterative adv ksamples with low denoise, finishing each with frequency separation and injecting the low pass from the previous iteration into it
inpainting and controlnet. fun stuff to learn. really makes you say "okay, wait a gosh darn minute here"
These intel gaudi chips. Can we buy them to install into our home PCs?
damnit no. tehy're only in the enterprise space
JuggernautXL
also, realismengine, helloworld, realisticstockphoto, realvisXL, albedoXL, many others
thinkdiffusion
online
it's gonna be a discord server where you get to test it with bots
we'll get weights when it's 100% done
Mhh okay but there is no ETA?
Can anyone help me? I installed stable-diffusion-webui by AUTOMATIC 1111 and downloaded Stable Diffusion model v2.1. But everything I try to generate turns out to be an absolute mess. I can't even generate just a cat or a person. My parameters: Steps: 20, Sampler: Euler, CFG scale: 7, Seed: 1377736898, Size: 744x1032, Model hash: ad2a33c361, Model: v2-1_768-ema-pruned, Version: v1.8.0
2.1 is bad
use sdxl
or 1.5
🥹
tried it. it's the same for me
I mean I can get absolutely nothing that looks normal
Did u use a fine tuned model
its not the same as normal sdxl
Do you mean it's ok that i can't even get a cat with v1.5 or v2.1?
2.1 is really great if you're prompting for anything but porn
stable video is based on 2.1. clearly is a capable model
mhh
but if you're obsessed with pornography, well, okay, you've got a point
could it be an installation error?
SDXL can change a life!
744x1032 is pretty big for 2.1. it's a 768x768 model. not sure what you mean by "total mess"
but sdxl has other settings than fine tuned right?
a cat with 3 heads without a torso for example
i think 2.1's biggest failure is it's really hard to train anything on it. often has wicked prompt comprehension and detail though. The unstable crowd funding campaign really poisoned the well around it and stability just abandoned it. sdxl is basically 2.2
wait for sd3
sounds like resolution attention issues
I just wish I knew what does it mean
try 1024x1024
I notice a lot of ppl complain about quality of an image, but they make them at 512 lol, gotta set to 1024, and enable hires fix if required 😄
my pc went crazy with SDXL, i'm gonna reboot
only emad's promises 
is this ok? https://imgur.com/a/vxaGKSh
No
this is not how it's supposed to work I think https://imgur.com/a/gv9OYJC
Try with Fooocus first it‘s easy and then you can switch to A1111
@lusty beacon what would you change here to get a normal looking picture?
2.1 is a 768x768 base resolution. you need to generate at least that
thank you, it's much better now
but I think I still have to reinstall it https://imgur.com/a/4CsCQvd
i'd get rid of artstation. openclip is a lot different to prompt against than 1.5. don't use the 1.5 and sdxl prompt cliches, since 2.1 only uses openclip. no dual clip layer
Does anyone have a comfyui worlflow that would allow me to inpaint a person to change their cloths while using controlnet?

