#🏞|general-with-images
2.25s/IMG
cyberpunk?
I am working on something to make firefly users weep lol
somebody made a comparison between bing AI, firefly, and MJ
and I wanna submit an SD gen that clearly blows them out of the water lmao
what are you smoking firefly is the bestest out there and I know cause they told me.
loool
hello
MJ
HELLO
btw, news alert MJ V5 does guns and damn they do them well
one of my submissions lol
it shits on the others sooooo bad lmfaooo
the prompt is supposed to mix art deco with cyberpunk
and I think mine also does that the best
I also have this one
from what I am seeing from firefly it is more for kids and a comic style. Sort of akin to "Fisher Price My first AI".
the sheer carnage that SD wreaks on these other image generators is insane lol
right
omfg
@smoky oak MJ V5
baby's first mid image generator lol
oh wow haha
this looks like its an image straight out of modded skyrim lol
Unlike SD it does guns
true
tho SD can do guns wayyy better with a LoRA
but I guess the point is MJV5 didn't cuck weapons out lol
exactly
consider how much was cucked out previously this is a HUGE advancement for them.
Why a third hand 😦
reach arounds 😦
isn't midjourney a refined 1.5? i don't think they're training base models over there
might even be a refined 2.x
I have a question about img2img and that is why would I stick an image in there then interrogate it and it comes out with a shit ton of noise no matter what? Sort of the same look as if I applied unsharpen to it several times?
It's a lot more than that
tbh, that is exactly what it is doing is unsharpen so wow
MJ has a significantly better Text Encoder
yes, it really does
they certainly have an internal collection of extra networks. you can see a distinct one in action with symmetry
I still prefer anything I can run locally but I was floored to see it do guns
do we know that? or are they passing it through a processor to decide what the final prompt should be against their training style, so that every prompt deterministically provides complex 75 token long prompts...
I think they're doin the ol 2step. it works though
MJ used to be the worst censorious one out there but that may be Bing AI now
the censoring is really nothing of concern. keeping nudes out. oh no. the internet has plenty
optics are gonna be important and we've seen how bad it can look
in other news flow is wrong yet again. This had nothing to do with nudes. Hell same prompt SD posted the pic in here and Bing warned me if I do it again I can get banned. Nope, nudes, no nudes, fuck that.
yeah. not a big deal. it's just a filter gone haywire. nothing new.
once again, mountain out of a molehill
Yeah, but I wonder how many will get banned, really?
hello
if you're trying to push their stated limits, you're asking to get banned from their system. warning seems apt there. i'm sure it's all flagged for review and if it's all content that was false positives, an appeal could surely be had. if not, then screw it. there's going to be dozens of ai services to use.
Shoe leather with salt and pepper or shoe ketchup?
When I was originally with midjourney, it was all their own thing, then SD came out and they interfaced their stuff with SD, and the quality went up massively while maintaining their very good color processing and such
So I think it's safe to say it's definitely its own thing
But I do believe it works in symbiosis with stable diffusion
the image tagging style that they've been training to is likely highly proprietary and something they've been developing internally the whole time. now with the new mjv5, you're supposed to talk to it like describing the image in a chat like style, instead of a prompt style. its a big reason why i think they take the input and process it into their proprietary tagging style that is kept completely internal
helps prevent reverse engineering of their models too as it adds another layer of black box to it
Does anyone know if you can select multiple cells with the latent couple extension? like using 1:2+3 in the "position" option?
Hey guys, where am i supposed to paste my upscalers files?
typically stable-diffusion-webui\models\ESRGAN
👍
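A minimal sketch of the step above, assuming a default webui folder layout where upscaler weights live under models/ESRGAN. The helper name `install_upscaler` and the file name in the usage note are hypothetical, not part of the webui itself.

```python
from pathlib import Path
import shutil


def install_upscaler(model_file: Path, webui_root: Path) -> Path:
    """Copy an upscaler weights file into <webui_root>/models/ESRGAN."""
    target_dir = webui_root / "models" / "ESRGAN"
    # Create the folder if it does not exist yet.
    target_dir.mkdir(parents=True, exist_ok=True)
    destination = target_dir / model_file.name
    shutil.copy2(model_file, destination)
    return destination
```

Usage would look something like `install_upscaler(Path("example_upscaler.pth"), Path("stable-diffusion-webui"))`, followed by a UI reload so the new upscaler shows up in the list.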
wild
I'm sure it's not real time, but the result still is top tier
Hey Olivio, just saw your video about Latent Couple.
good presentation and good editing 🙂
I had Latent Couple working then it suddenly stopped working and no longer obeys me :/
stupid question but, did you try to update extensions and SD?
Yep, everything was up to date. Not sure what happened or if another extension is whacking it
I often did reports on Github for this kind of problems
Yeah, I would but I use it so very little it doesn't bother me that much just thought I would mention it. If I need it again I will try a reinstall if it hasn't updated by then.
lol you can lick boots all you want. it is quite common etiquette that when you go into someone else's house, you follow their rules
while i'm anti corporate, anarchism is dumb and i'm not anti authoritarian
Prompt: "crayon drawing, mountain, sunny, cat girl, character orange cat, solo, standing, hands on hips, looking at viewer, smile, open mouth, by CraiyonUnicorn, trending on artstation"
I think it was "trending on artstation" that tied it all together 🙂
Guys, any idea what this error might be? I can't launch stable diffusion because of it
I just thought those images was funny. I just wanted to be silly and a bit trolly and wrote a prompt like " Rick Roll troll face cat girl, 8K, 9000k, UHQ, UHD, super detailed, best Quality, even better quality, super realism, hyper realistic, RAW photo, master work, aesthetically pleasing, by Rick Astley" 🙂
A stunning 3 - story apartment building, composed of 3 modules, the total soil area 1000sq nestled on a gentle slope in the picturesque countryside overlooking the charming city of Tavira in the sunny Algarve region. Ultrarealistic, With its cutting - edge eco architecture, this building boasts a unique blend of neo tradicionalism and eco architecture seamlessly integrating with its natural surroundings perfectly balanced by the natural materials that enhance the building's eco credentials. Surrounded by lush greenery and bathed in the warm glow of the sunset, cinematic --ar 16:9 --v 5
I hit random keys and somehow THIS thing came up...
I’ve read a clear explanation of what “mask blur” is in inpainting (applies a gaussian blur to x pixels from edge of mask), but can’t find a clear answer on what padding size affects
I think that's info around masked area AI will get.
More padding = more context for AI on what it exactly is masking, visually
well, idk for sure
I have a tendency to ask questions, then realize I could probably just do a quick test heh
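A quick way to picture the padding answer above, assuming (as the replies guess, without being sure) that padding simply grows the crop region around the mask so the model sees more surrounding context. The function name `padded_crop_box` is hypothetical and this is not the webui's actual code.

```python
def padded_crop_box(mask_box, padding, image_size):
    """Expand a mask bounding box by `padding` pixels, clamped to the image.

    mask_box is (left, top, right, bottom); image_size is (width, height).
    """
    left, top, right, bottom = mask_box
    width, height = image_size
    return (
        max(left - padding, 0),
        max(top - padding, 0),
        min(right + padding, width),
        min(bottom + padding, height),
    )


# A 100x100 mask in a 512x512 image, with 32 pixels of padding.
print(padded_crop_box((100, 100, 200, 200), 32, (512, 512)))
```

Under this assumption, a larger padding value means a larger crop and more context, at the cost of fewer pixels spent on the masked area itself.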
pickapic has fun zero shot experiments. cookie monster as the t-800. other models struggle with this concept quite a lot. it's one of my common zero shot tests lol
the cookinator
i know now why you dunk milk, but it is something i can never do
i dont think picapick has that syntax. [] there is negative prompts
pickapic rather
Ah yeah, I overlooked that detail
I watched a YT vid on Dreambooth, which began with how to upgrade to PyTorch 2.0 with compatible xformers. It increased performance. I was expecting all other popular youtubers to have some tutorial to upgrade, or at least some other mention on Reddit or something
Seems like something significant that’s under the radar
This was the vid https://youtu.be/pom3nQejaTs. Instructions for pytorch 2.0 are at 5:20
I’ve upgraded this in SD over a week ago, haven’t had any adverse effects as far as I can tell. Only better performance
pretty girl
Since everyone loves cats, I had to make a cat. I made it smile, gave it big glossy eyes, and used the tag "naturally cute". I had to edit out the bottom part even though it was not dirty.
Oh, and I wrote a bit of text on the black bar.
after a big discussion with the dev of Multidiffusion, he completely fixed his extension, this guy is awesome !
https://github.com/pkuliyi2015/multidiffusion-upscaler-for-automatic1111/issues/63
Points up. Mods please. @glossy herald
TY.
No.
Apparently so
Maybe 5 min, you were here fast.
I mean longer than it should, yeah
I reported it but you got here fast with the mention
I reported both of them then I threw up
I am so new with this I did not know how, before I understood other was sending ping
The worst is, I feel guilty since it was aimed at me as a reply. I may have made someone angry.
I'm checking with the team. I did tests following this and reporting works OK for other messages. there may be a fail we need to fix on specific ways to post, not sure, but I can confirm other reports are working
Just for that, support cat want to say thanks to @glossy herald
Spam doesn't?
I'm gonna throw up too now damn, that was long since it hadn't been that dirty
not the same bot, some does, some gets by discord, ... I'm no admin, I just see right clicks and reaction ⚠️
Whoa, looks like an amazing extension
@wispy nest
Currently, there is no bot on the server that generates images. However, there are plenty of other ways such as the official https://beta.dreamstudio.ai/ website or by running Stable Diffusion locally using your own hardware! Check out #1080946152318443610 for more details! You can also stop by #1025467151206854736 for any issues you experience while using DreamStudio or #🤝|tech-support for any problems you encounter while installing it locally!
i'll start here then hahah
for reference, using automatic 1111 webui
reloaded my ui so my prompts are lost
but these are the sorts of things i'm getting out of SD
Drag n drop an image you generated into PNG Info tab of Auto1111
😉
yeah that first one is:
a trendy bear logo, minimalistic, corporate, detailed, HD,
Negative prompt: ugly, tiling, poorly drawn hands, poorly drawn feet, poorly drawn face, out of frame, mutation, mutated, extra limbs, extra legs, extra arms, disfigured, deformed, cross-eye, body out of frame, blurry, bad art, bad anatomy, blurred, text, watermark, grainy
Steps: 50, Sampler: DDIM, CFG scale: 7, Seed: 1336646256, Size: 512x512, Model hash: bfcaf07557, Model: 768-v-ema
There’s some good checkpoints for vector looking output
alright
I had that old prompt I used for different sports logo.
What do you think of :
2d ferocious bear, vector illustration, angry eyes, football team emblem logo, 2d flat, centered, highest quality, very detailed
Negative prompt: ugly, tiling, poorly drawn hands, poorly drawn feet, poorly drawn face, out of frame, mutation, mutated, extra limbs, extra legs, extra arms, disfigured, deformed, cross-eye, body out of frame, blurry, bad art, bad anatomy, blurred, text, watermark, grainy
Steps: 15, Sampler: UniPC, CFG scale: 5, Size: 512x512
I kept your negatives
I'm going to check that out
any reason behind switching sampler?
I find UniPC on CFG5 very effective, gets the same results as DDIM but on just 15 steps there
can someone explain me what's going on on the background hahah
All make so good images today, then I come... Giraffe!
yeah I keep getting things like these
you had that with your method before my changes too ?
yup
turned up the steps:
you are using a 768 model
so much sense hahaha
let's try that again now :p
works correctly
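The mismatch spotted above (a 512x512 size with a model named 768-v-ema) can be checked from the parameters line that PNG Info shows. This is only a sketch: both function names are hypothetical, and guessing the native resolution from "768" in the model name is an assumption made for illustration, not something the webui does.

```python
def parse_generation_params(line):
    """Split a 'Key: value, Key: value' parameters line into a dict."""
    params = {}
    for part in line.split(","):
        key, sep, value = part.partition(":")
        if sep:
            params[key.strip()] = value.strip()
    return params


def below_native_resolution(params):
    """Guess the native size from the model name and compare it to Size."""
    # Assumption: a model with "768" in its name was trained at 768px,
    # anything else is treated as a 512px model.
    native = 768 if "768" in params.get("Model", "") else 512
    width, height = (int(n) for n in params["Size"].split("x"))
    return min(width, height) < native


line = ("Steps: 50, Sampler: DDIM, CFG scale: 7, Seed: 1336646256, "
        "Size: 512x512, Model hash: bfcaf07557, Model: 768-v-ema")
params = parse_generation_params(line)
print(params["Size"], params["Model"], below_native_resolution(params))
```

The parser is deliberately naive and would split values that contain commas, so it only suits simple lines like the one quoted above.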
nice 🙂
it works with lots of animals
quite easy to change the prompt
tysm for helping!
@strange jungle an example of a drywall knife
Ok, that's a dry wall taping knife
Looks like SD 1.5 doesn't have that , but you could train a model with it
For what you're doing, you don't need a fancy professional rig. The 8GB is probably enough
How are you looking up what SD does/doesn't know about?
I am generating images
when I use "dry wall taping knife", it comes up with a plain dry wall knife
it is
Fair enough
Even when I use an image of a taping knife, it tries to make it a different type of knife
however, what I have can generate and upscale for what you need. It would just be a matter of training the knife into it
8 or 12 can train
I used 1.5 based models. Not sure what a 2.2 model would do with it
I think DDIM should be good on 15-20 steps too, I think DDIM is one of those samplers which doesn't really benefit (or even change) from high steps.
I was doing 15-25 on anime
Interesting, that probably explains why the model tends not to recognize tools in general
Common things like hammers don't give it trouble but a lot of things you'd find on a work site trip it up
If it wasn't trained with images of tools, it won't have them
8-12 vram GPU system will handle training
Is there a way I could contribute images to the training data the mainline models use? Seems silly to just do it for myself without contributing to the community
there are groups that do that, but I only know they exist, not how to get involved
as far as I know, SD doesn't take contributions for training their new versions
If you trained it with images of tools, you could then upload the LORA or whatever for people to use = contribute
Kinda neat tbh
Last image for today, and I do not know what happen in it, but some try to get her ears.
pig ears yay
Looks like Chester Cheeto is trying to gank the ears off.
that was my first thought.
just sharing a couple images from a model i'm trying to train on ghibli mixed with old dreamworks 2D animations. a lot of the current anime models have a very similar look to the face so i'm trying to train a model that has more variations. :)
really like the one on the right. very nice
are you making that model available to others?
reminds me a LOT of atlantis
people like animation 🙂
Very cool model
thanks guys! there's 52 images total, tried to prioritize characters with unique variations in facial features and expressions. i trained it with 5,200 steps, but i'm still learning the best ways to finetune these things.
what do you mean by checkpoints and do you have any specific ones you could send my way?
Ok so Multi-diffusion extension definitely puts Ultimate SD Upscaler into the trash can
thank you! yeah i'd love to make it available to others if it works well. it still produces jumbled garbage sometimes for some reason but if i can fix that id love to make it available
Hahahaha! Yeah! my bf actually said the same thing that it looks like atlantis art 🤣 thats the dreamworks style coming through i think
1-2 e-6LR?
that feels like good diversity/training without overtraining, but I haven't seen the dataset so hard to say.
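For reference, the numbers in this exchange work out as below. The sketch assumes a batch size of 1 (not stated in the chat) and reads "1-2 e-6LR" as a learning rate between 1e-6 and 2e-6, which is an interpretation rather than something the chat confirms. The function name is hypothetical.

```python
def steps_per_image(total_steps, num_images, batch_size=1):
    """Average number of times each training image is seen."""
    return total_steps * batch_size / num_images


# 5,200 steps over 52 images, as described above.
print(steps_per_image(5200, 52))

# The suggested learning-rate range, written out as plain numbers.
lr_low, lr_high = 1e-6, 2e-6
print(f"{lr_low:.7f} to {lr_high:.7f}")
```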
Well done figuring this out !
models are saved as checkpoint or safetensors files
I haven't actually used any yet, I have so many things bookmarked to get around to.
Here is one I have bookmarked for example
https://civitai.com/models/4618/vector-art
ooh so .ckpt files
thanks!
Do we have a way to prompt "portrait" without saying "portrait"? I want to get face+ maybe some shoulders, but not upper body.
"Portrait" might trigger painting style...
You could just use openpose with controlnet
Hahaha yes! I remember seeing 1-2 e-6LR in the colab somewhere, not even sure what that means or what it does but that was included. 🤣 Hahaha and thank you for the encouragement!
nah
try controlnet openpose with only the top part.
Portrait style
I don't want to have control over how it will look, pose or view 😦
Try prompts like "portrait style" or "head and shoulders"
Well, if you use a very short Guidance End, it should start generating relatively where you want it
then let it get creative
if you want some guide/definitions on the params, how to do all that, there are some resources running around
or, generate an image of the upper body, then crop out what you don't want
The problem is I have robe in my prompt and usually AI trying to include it too, so goes for upper body
I want to keep it for fur on shoulders...
we just chat there, you generate locally or using cloud services
#1072220168534642768 #1072229020520947753 #1080946152318443610
Knives, lithograph style
hey new, I'm guizmus (and am sorry)
we do pictures
we share techniques and resources
we have dat fun
woooooo
x)
foreveralone in controlnet
depends what you are looking for
this is Stable diffusion, an AI that does pictures
but we don't have a bot for that
there are websites though
here is the faq
Currently, there is no bot on the server that generates images. However, there are plenty of other ways such as the official https://beta.dreamstudio.ai/ website or by running Stable Diffusion locally using your own hardware! Check out #1080946152318443610 for more details! You can also stop by #1025467151206854736 for any issues you experience while using DreamStudio or #🤝|tech-support for any problems you encounter while installing it locally!
Stable Diffusion and its extensions are the best for that
you too !
This image is from a training on images from the Battle of Cut Knife Creek: http://travelphotobase.com/v/CDNSK/SKRP153.HTM
Graphic of Battle of Cut Knife Creek by Grip P.P. Co., Toronto, at RCMP Heritage Center. Regina, SK.
you just "leave the server" ? or do you mean on discord itself ?
yes
"lithograph of knives" got this
click on the little gear here
then delete account if you are looking to delete discord account (first time i'm asked that)
Yeah I’m working with one of the cloud models and this one’s probably not too well trained
Will have to dig out my tower and get a local install
looks like 1.5 has a lot of images from lithographs of the battle in its original training set
Hmm
Latent Couple + some inpaint.
Maybe I need to start researching tagged images of more common household items
Seems like most of the data out there is aimed at humans/animals and fantasy stuff
targeted conditioning like that is so strong
Someone around these servers looks at the database for the original training - posted a link to where you can search for what was trained into SD
six fingered hands?
Negative prompt has parameter for that... but what the hell. Thanks for feedback.
Anyone had any success using the Region control with Multi-Diffusion extension? Initial tests, my foreground region is just trash sitting on a beautiful background region
that's the worst type, when two fingers branch from one - it kind of sneaks in the extra finger
Generate an image similar to the one sent
Surprised to find that Multidiffusion also works with Latent Couple
She almost figured out how to hold that sword
I read that the recent update on auto messed some stuff up and extensions don't work properly. Anyone know what's a good version to use?
Underworld biker is resting
anyone else got some tips for generating logos? preferably something very simple, clean and corporate
controlnet
in what way exactly
here is an example
I used controlnet, and img2img
canny model for controlnet
base logo :
or I could go a lot more technical and complicated
i see
depends the liberty you let it have
time to figure out how to get controlnet into webui then hahah
but to keep the likeness of the logo, controlnet is almost essential
that's an easy guide
lifesaver
hmmm 
What's up dingle hoppers
I wish I had access to Adobe firefly so I could make a comparison on their subreddit just exposing how bad it is
So many people just think it's so damn good, and keep being rude to people using SD/trying to show off the strengths of SD
"oh, but it can't do graphic art designs"
Interesting model...
graphic art designs like what?
The txt 2 vector generation it can do
Really, tell me more 
that's really the meat and potatoes of Firefly
They think, for whatever reason, that SD can only do realism
It'll probably be an extension in A1111
idk, based on what I've seen it's not just for gen , it's just tons of different tools and photoshop integration
Like...tools literally taken from SD community and integrated into their systems
Yup, and they aren't even good implementations of them
I just can't believe they marketed it/released info on it with it being so bad
Mmmm, delicious generations out of firefly
lmao
And the people praising adobe for not using the work from the people they actively screw over with outrageous pricing is... Hilarious
yea their model is kinda...reminds me of 2.1 SD model...
I mean, at least the kid does look like their offspring
lol
He has more of his mother in him, unfortunately
yea that's better
I contributed to a reddit post comparing all the image generation AI's, and don't worry everybody, I made it clear that SD blows all the others out of the water
nice
The prompt was to get a art deco inspired futuristic cyber city
Lmfaooo
That's weird
This is rancid lmfaooooOoOoOoOoO
mj 1024x1024, but details like 512x512
ngl, MJ, Firefly for the animated look and SD for the realism
Here was my other SD one I submitted
I used two different realism models rather than art models, TBF
But also, SD is the only one that's even slightly art deco like the prompt said lmao
I think most MJ users just dont do as many gens as we do...since it's not as comfortable to use with discord and shit
Oh yeah, and MJ is slow
I hated that having to use Discord
I remember when I used it, even on the paid plan a full high res image was several minutes
I never did figure it out way back when and never returned to it
Not even including the wait time, just the gen time
In all honesty, they actually were really impressed with SD's result, and several people said they were gonna have to look at SD again, cause they didn't know you could get such detailed results
So many people sleep on SD, and it annoys me
Here I am, trying to sleep with SD
😩
Lol
All I want (at this moment) is a telephone, but it doesn't seem to know very well what one is. I expected at least the 12 keys to be consistent, but not even that I'm getting ☹️
I recognize an embedding in there but forgot the clothes torn one's name.
It's not embedding
wearing black dirty (ripped:1.2) (torn:1.2) cloak, (robe:1.1), gloves,
idk
I think I haven't even tried it on base 1.5, I stopped using base models pretty quick
welp
Now I see why an embedding was made
a girl wearing black dirty (ripped:1.2) (torn:1.2) cloak, (robe:1.1), gloves to keep it simple to see
torn looks like it sort of worked
maybe increasing weight on ripped will do?
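To try the weight change suggested above without retyping the prompt, here is a small sketch that reads and rewrites the `(term:weight)` pieces. It is a simplified regex for illustration only, not the webui's real prompt parser, and it ignores nested or escaped parentheses. Both function names are hypothetical.

```python
import re

# Matches "(term:1.2)" style emphasis with a plain decimal weight.
WEIGHT_PATTERN = re.compile(r"\(([^():]+):([0-9]*\.?[0-9]+)\)")


def extract_weights(prompt):
    """Return {term: weight} for every (term:weight) group in the prompt."""
    return {m.group(1).strip(): float(m.group(2))
            for m in WEIGHT_PATTERN.finditer(prompt)}


def set_weight(prompt, term, weight):
    """Replace the weight of an existing (term:weight) group."""
    pattern = r"\(" + re.escape(term) + r":[0-9]*\.?[0-9]+\)"
    return re.sub(pattern, lambda _m: f"({term}:{weight})", prompt)


prompt = "wearing black dirty (ripped:1.2) (torn:1.2) cloak, (robe:1.1), gloves,"
print(extract_weights(prompt))
print(set_weight(prompt, "ripped", 1.4))
```

Changing one weight at a time with a fixed seed makes it easier to see whether "ripped" is actually doing anything on a given model.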
but it didn't work on some 1.5 models too
I think rpgv4 didn't do it too, clothes was always too clean
Interesting.
Have you ever noticed certain seeds will force two subjects when you prompt should be only one?
I think that only happened to me lately when I tried to do too high resolution
like 1024x760 gen straight away without highres fix
idk about normal gens
cut out right side manually due to a broken hand 😄
I changed the seed and 2 subjects gone. Always distinct but it acts like I prompted for two
768x768, yea , still can happen from time to time
I think my pb was 3 persons at 1024x768
when prompted one
I added full body
lol
I'd shit my pants if that came walking towards me
I mean you see the first image coming then see the second
How do you describe this emotion?
eyebrows
frown
ty
yw
I might've gone too far lol
but I need to know how it'll look on realistic models now
Which model is that?
Update on "screaming" on realistic models...it looks terrible..
I don't think it's on purpose , it's just looks bad
mouth opens too wide making it just...ew...
never know
and on some images it's just an open mouth , while whole face has no emotions lol
One thing to test would be lowering the weight of screaming
I think for realistic it's better to do angry or something instead of just screaming
it looks like it's trying to do screaming in pain, not anger
That's a bit different emotion tho, my image was more like worried, this one is sceptical
well i was describing more the physical brow movement. it's over a range of emotes
hm, aight
any wrinkled brow is furrowed. i learned it from old poker playing navy guys
Furrowed brow
mix and match for different exciting results lol. furrowed anger, furrowed sadness, furrowed joy
The problem is how to control facial emotion to a fine degree
That's probably dumbest thing I ever created 😄
furrowed brow, joy, happy grizzled anime man
If it's sequential art, how would one control it to add a little more or less ie 10% frown vs 13% from panel to panel? I suppose it's still something AI people are working on
I tape my forehead with duct tape so it's not possible
Super glue works too and can shower
okay, yea...
angry scream works on realistic models too, but takes a lot more attempts to look somewhat decent...
Add sword problems on top of that , hand problems and eye feckery and here we are....
Really hard to get decent image without inpainting...
since we're posting spooky stuff, here's "the nightmare king" i made about a month ago. had to remake it because i had gotten rid of the original but still had the generation info. never got around to doing a proper upscale on it, but with what's available now could probably turn out pretty good
Is the... VIUIE's keypad fixable somehow? It seems there is still trouble with numbers and letters
You can get rid of it with inpainting, but numbers...I don't think they make a lot of sense for AI...idk if you can do it correctly
it can occasionally get them perfect, found that out when i was attempting to make victory stands (the 1,2,3 places people stand on after an event). most of them were pretty much trash but a few got the numbers perfect. but yeah...seriously hit or miss.
Yeah, 100%
occasionally yea, but with so many things that can go wrong, that'll take a lot of time
probably good idea would be to finally do controlnet, but naaaaah
thinking something like a lora could be made for specifically that level of tuning.
Maybe the equivalent of blendshapes/shape keys
there is that new face marker control net. or was that just somebody using sketch mode with markers?
I have been playing with SD and new styles of generations, and I am pretty impressed at how granular and high quality the control over the style and the quality of the final product are
Ok, I'll let two SD instances generate forever. I hope they can get at least a single good result 🙏
Just started an X/Y/Z plot using MultiDiffusion extension enabled - was wondering why the ETA is only counting upwards - it keeps doubling the dimensions of every generation lol
Image # 4 is about to finish with dimensions of like 12k x 12k
actually nvm I think it's just at a standstill
So there's a little checkbox "keep input image size" which I unchecked and now is running the XYZ plot as originally expected
I recommend ultimate SD upscale over multidiffusion
I have tested both a ton, and I just get faster, more consistent, and higher gross resolution results
I made an image that is 15630x6528 in less than 20 minutes on an 8GB 3060ti
I wish somebody would make a more granular controlnet for poses where we can actually do proportions and such
The one thing about SD Ultimate, is I kept having visible seams
None of that is happening with this
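For a sense of scale on the tiled upscales being compared here, this sketch counts how many tiles a large image needs. The 512-pixel tile and 64-pixel overlap are illustrative assumptions, not the actual settings of either extension, and `tile_grid` is a hypothetical helper. Overlapping neighbouring tiles is one common way tiled approaches try to hide seams.

```python
import math


def tile_grid(width, height, tile=512, overlap=64):
    """Return (columns, rows) of overlapping tiles needed to cover an image."""
    stride = tile - overlap  # how far each tile advances past the previous one
    cols = max(1, math.ceil((width - overlap) / stride))
    rows = max(1, math.ceil((height - overlap) / stride))
    return cols, rows


# The 15630x6528 image mentioned above.
cols, rows = tile_grid(15630, 6528)
print(cols, rows, cols * rows)
```

With these assumed settings the image needs several hundred tiles, which helps explain why run time and seam handling differ so much between approaches.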
yeah, this isn't what I am talking about at all
doesn't really help to get unique proportions
Each segment that is clicked, has its own big list of parameters
bone thickness, length, etc
Unsure how much more control you could get than that
or do you just mean, controlnets execution of the openpose input
yeah, that is cool and all, but still not what I was meaning. I was meaning a controlnet where it adheres to extreme proportions, rather than fixing them
like you can give somebody a huge head and it gens them with a huge head, or a really long torso and stuff
ah
I would love a way to control their torso proportions as well
like hip width, waist, shoulders, all that
my first creation, prompt was about a soviet propaganda poster
dont really know what it means
but its kinda cool
any way i can get a decent upscaler? amd 6600
a thing ive noticed with all ai image generators is they really shit the bed when it comes to text
even when you tell them exactly what letters to put
1340 x 2048 in 2m 20 sec, on RTX 4070ti using Multidiffusion + controlnet
And, Surreality is a really cool checkpoint
I wonder how long on a 4080 and 4090?
basically same, but with noise offset 🙂
still nooby even after doing this for awhile a few months back, doing what I can with a weak gpu lol
Daily reminder that some models just aren't good at some things lol
This is a specific prompt using photorealistic dreamshaper
Absolutely horrific results
Versus the result I got out of deliberate
Some models work, some don't lol
Looks like dream shaper doesn't work for this prompt yaha
cat on ice
my branding theory is that each version going forward will potentially have xl variants
Looks like it.
tbh, I saw NOTHING, save for slightly better text, that we can't already do with loras, embeddings.
I am messing with it a bit
i was reading that lora creator thread on reddit today and was wondering, what if someone trained a lora on videos, like every frame
it does seem to do massively better, at least on dream studio... but thats cause the other base models are trash
he asked that and i wonder if he was hinting at capability hes glinted
I think comparing it to the base model 1.5 is a way to try to estimate the potential it can have
like, 1.5 on its own wasn't perfect at all, but loras and all make it a beauty for example, same to a lesser measure on 2.1 that took off slower.
So if 2XL seems to perform massively better than 1.5 out of the box, it could mean it could be trained to even higher standards than what we have
It mostly depends on community's engagement in perfecting what will come out (and when it will come out too I guess)
That is what I am hoping for big time
just really hope that quality isn't locked behind another terrible text encoder ._.
but I think other things could push the community to change
Does it understand prompts better \ knows more words?
cause there are lots of issues with that on 1.5...
it has wayyyy more parameters
I think its 2.6 billion now? IIRC?
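A rough back-of-the-envelope for what that parameter count would mean in memory, taking the 2.6 billion figure at face value (it is only recalled from memory above). The sketch counts weights only and ignores activations, optimizer state, and everything else, so real usage would be higher. The function name is hypothetical.

```python
def weights_size_gib(num_params, bytes_per_param):
    """Memory needed just to hold the weights, in GiB."""
    return num_params * bytes_per_param / 1024**3


params = 2.6e9  # figure quoted in the chat, unverified
print(f"fp16: {weights_size_gib(params, 2):.2f} GiB")
print(f"fp32: {weights_size_gib(params, 4):.2f} GiB")
```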
that's interesting
like inner structure/more params, leading to just more possibilities for the model.
Imagine if the models could store more data for real and be in 2048x2048 base, with better prompt understanding, more token limit, .... Even if the results were initially worse than current 1.X 2.X, people would invest big time in it because of the potential
my gpu can't do even 1024x1024
(I'm not saying 2XL has any of those to be clear : I have no info on that)
unfortunately, that would also be massively restrictive
yeah that's an extreme and stupid example
let's say it would run on 128MB VRAM too
SD 2.x should have been so good, but to say the ball was dropped is... an understatement
so now we are going Absurd
lol
I'm just making the point that massive tech upgrades in the base model would be a real push towards them, even if the results are initially bad
Yeah, huge agree
yea
I would use 2.x and deal with all of its bountiful issues... if it worked as good as 1.5 for training/listening, but its a massive leap back and thats offputting to me
2.1 didn't lift off because of this imo. 768 is an upgrade, but it wasn't enough at the time to convince people
same for 2.1 training, never managed to work with it correctly
but even today, the incentive for a trainer is to use 1.5 because people do mix
The issue is training LoRA's and other things off of 2.x's horrible text encoder is a nightmare
That is what has led to me not even looking at 2.x
I tried, but yeah, I gotta say the repeted fails when it came out, when I was feeling like "i knew what I was doing for training", that participated to my burnout and break for a while
I do have to say, the text legibility is a good sign
That is damn good and consistent text
yeah, it reduces the need for controlnet for that
so if thats something we can expect to translate into more fringe things like facial expressions, then I am stoked! assuming its not ruined by another lackluster TE
doesn't always work (2XL)
Can always make an extension that detects generated text and then cleans it up a bunch
ie : we'll fix it in post
try with other actual words, maybe the reason behind it is cause it has some kind of visual representation for this word
to be honest, all of the base models are trash on their own, and they rely on the tuning, because they are not tuned for greatness... but to see that SDXL isn't doing that bad at all out of the box is... hopeful
if that is SD 2.2.2, its an aesthetic fork of 2.2 XL, which takes what you say as a description rather than a tag
AItrepreneur shows it off, how SD XL tile, and SD XL beta will do text of what you say, but SD XL 2.2.2 will do a stylized prompt around it
his thought is it works similar to midjourney, which I could see, based off of my very little use with it
my only massive concern for SDXL is the size... and the VRAM constraints for training
I must say, these watercolor sunsets look fucking great
Yeah, if I can't train it then it is DOA
I'm on Dreamstudio, not on next, it's labeled "SDXL beta"
ah. We don't know which one that is
yea it looks like it understands "written" , probably we'll see someone doing embedding or something on it for better results, interesting
I am using the old dream studio, which states its 2.2.2 XL beta
that is the third one
yep, it wasn't out when AItrepreneur did his video
The third one is a style, more than a base model... but not the same
I am trying 2.2.2 beta
I dunno but what I saw in his vid I was not all that impressed. Just too many things that must happen that if they don't then most people will stick with what they are already using
Does it have same problem as 1.5 has?
Like if you type "white boots" - AI applying white not just to boots, but to everything around, unless you specify other colors?
maybe on the public DS they auto select the 2XL model depending on the selected style/prompt?
an advertising board on a city bus stop with the words "Soon" written on it in pink
ok... I must say
yea it looks like it is still applying similar colors to other elements
I am seeing some massive improvements in some areas, and I do not wanna get too excited if there is a chance they drop the ball with the TE but...
but it'll be easier to tell with humans and clothes
SDXL seems to stomp all the others with stylized simple prompts
2.x TE sucks
I saw a lot of 1.5 in his vid, btw. Far more 1.5 in it than 2.x
2.0
2.1
SDXL...
I think the results kinda speak for themselves in terms of detail pulled from simplicity
will be interesting to see if that means we need short prompts for the same detail, or if we will be able to leverage same length ones for massive detail improvements
an advertising board on a city bus stop with the words "Soon" written on it in pink, a blue car drives by, a white bird is on a bench
CFG10
xD
ok yeah, SDXL is killing it for simple style prompts-
"Water color painting of an autumn forest"
1.4/1.5/2.0/2.1/SDXL
changing theme
Like I think its safe to say SDXL is winning big wins here
a photo of giant hammer falling on a city, with the word "BAN" written on it, RAW photo, 8k
I'm using the "photographic" style
cause so far SDXL won both of them by a lot IMO
I will try one right now
"Digital art of a cyberpunk city"
1.4/1.5/2.0/2.1/SDXL
To expand on this and why this isn't what you always want:
If we want to apply style of some kind to only one element - it won't work.
Like a simple example, if we prompt a car with one deflated wheel - it will apply "deflated" to all wheels.
More than that - if there's something else in the image that can be deflated - it will be deflated.
Same goes for clothing, if we want let's say just a sleeve of a person to be ripped in a fight or something - that won't work too, cause it will try to apply "ripped" to everything that can be ripped.
prompts aren't enclosed by "," too, so separating words with it doesn't work.
Only overspecifying all environment \ clothes details fixes it.
With coloring it is not bad most of the time, cause it keeps the image in the same palette and usually that makes the image more visually appealing, good from a design perspective too, but not great for little detail changes and accurate prompting
ok, ok, ok
I am getting really hyped for SDXL...
And I am worried I will be let down
but so far, its impressing me compared to the other bases
yea, that's interesting base
"A picture of a pretty woman in a pink dress in an autumn forest at sunset"
1.4/1.5/2.0/2.1/SDXL
Like its another clear win IMO
SDXL did by far the best. It listened well, it has the best pose, and is the most aesthetically pleasing...
HmMmMmMmMm
please for the love of god, do not ruin it all with a bad TE 😭
TE?
Text Encoder
The part that converts what we say into things the AI understands
guys...
thats the part that makes it XL i thought. has way more parameters in that
Where can I test it?
"A photograph of a husky puppy sitting on the beach at night"
1.4/1.5/2.0/2.1/SDXL
Its the only one sitting, and its the only one visibly at night-
The usual test of Willem Dafoe terminators is doing good
the base anime mode looks good, but it doesnt do mecha versions of deloreans
all the novel ai anime derived models can pull off delorean mecha
the results are great and i hope to see it refined. i wonder what it will take
"polygon art of a frog"
1.4/1.5/2.0/2.1/SDXL
SDXL is crushing the accuracy on all of my prompts-
Like, it has not missed a single time
Reported that Clip_Vision with T2i_Style was not working on MultiDiffusion. He seems to be interested in it, so we will probably be able to use Clip_Vision with MultiDiffusion! Great!
https://github.com/pkuliyi2015/multidiffusion-upscaler-for-automatic1111/issues/66
Top is 2.1, bottom is sdxl
"Ice cream candy"
What the heck is sdxl
its the new stable diffusion model
its being beta tested early
its stable diffusion 2.2.2
@dense tapir SDXL is showing a massivveee improvement of prompt understanding...
statue of liberty from behind
From below
from above
at night
being demolished
like its listening pretty fucking well
As a satanist, I approve lol
EF85mm f/1.4L IS USM, 5:4, pretty, woman, legging, brunette
It can even do shit like red white and blue braided hair
"A photograph of a Native American chief smoking a pipe by a campfire in the middle of the desert at sunset"
@dense tapir
SDXL is insanely accurate to what you ask for
"a photograh of a 3 layer purple princes birthday cake with the number 16 at the top"
@glossy herald x)
"A photograph of a man with sunglasses, a cowboy hat, blue denim jeans, a red plaid vest, leaning against a beat up pickup truck at sunset with a hand on his hip"
I cannot believe how good SDXL listens....
Oh yeah! Super Jacques!
@glossy herald SDXL is insane
Really well done!
haha, thanks
GÉNÉRATION GUIGNOLS is a chance to rediscover the puppets in everything that made them unforgettable, from Johnny Hallyday to Jacques Chirac, by way of the specials devoted to the presidential elections, the prime ministers, football... and scandals of every kind.
Watch the full show here: https://www.canalplus.c...
Lol it learned text
it learned a helll of a lot more than that
and yeah, it can do very damn good text now
it blows the other base models out of the water man
what about a macro photo, super zoomed in
"Macro Photo of a ladybug on a half eaten leaf"
not too bad. missing a lot of detail on the bug
damn, that actually looks even worse than 1.5 object styling 😦
colors all over the place, even specifying each object didn't help
Updated.
creepy
beauty studio advertisement
"beauty studio advertisement"
how can I get a image here?
You don't, you can install SD locally.
We don't have discord bot
Currently, there is no bot on the server that generates images. However, there are plenty of other ways such as the official https://beta.dreamstudio.ai/ website or by running Stable Diffusion locally using your own hardware! Check out #1080946152318443610 for more details! You can also stop by #1025467151206854736 for any issues you experience while using DreamStudio or #🤝|tech-support for any problems you encounter while installing it locally!
Ok. Thank you
thanks Soul 🙂
We really need a way to |enclose| part of prompts with styling, to apply certain words only to a part of prompt
SDXL
ripped, torn clothes doesn't work on it 😦
nvm, kindaaaa works on some images
linda, secretary in the podiatry service, warrior in the Xena series on Sunday April 22
what do you mean?
That 4x Ultrasharp Upscaler that is going around recently really is something...
Combined with HiRes passes it does create some crazy stuff.
I mean, if I say man in a |pink hat|, I get a pink hat exclusively; everything else should NOT be pink or have pink shades mixed in.
SD tends to apply colors \ styles and, well, words overall to the whole prompt \ image.
Not to specific words.
the syntax is just an example, it's not a thing
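To make the wish above concrete, here is a rough sketch of how a front end could at least split out `|...|` spans. The `|...|` markers and the `split_scoped` helper are hypothetical (as said above, the syntax is not a real SD feature), and splitting the text does nothing by itself to stop the model from bleeding colors across the image.

```python
import re

def split_scoped(prompt: str) -> list[tuple[str, bool]]:
    """Split a prompt into (text, is_scoped) pieces using hypothetical |...| markers."""
    pieces = []
    # re.split with a capturing group puts the enclosed text at odd indices
    for i, part in enumerate(re.split(r"\|([^|]*)\|", prompt)):
        if part.strip():
            pieces.append((part.strip(), i % 2 == 1))
    return pieces

print(split_scoped("man in a |pink hat| walking a dog"))
# [('man in a', False), ('pink hat', True), ('walking a dog', False)]
```

A tool would still need some way to act on the scoped pieces (for example regional or masked generation), which is outside this sketch.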
I agree, that would be super helpful
Oil painting of a cyberpunk samurai, magnificent, elegant, beautiful, dynamic lighting, killian eng, ocellus, theme park, fantastical, light dust, elegant, diffuse, grimmer, intricate, light dust, orange and teal contrast volumetric lighting, triadic colors, splash art
Midjourney got some competition
Same prompt but without orange and teal part
eh, idk, sdxl isn't doing well on somewhat complex prompts
But if it's actually better at understanding words I'm sure we'll get good community models later on
It does look stunning that 4x Ultrasharp Upscale in Automatic 1111
how do i install this upscaler to webui?
One sec let me look up that link again... give me a min
it's not good for photorealistic images, unfortunately
thats fine I'm aiming to get good at anime prompts
With Realistic Vision it actually does some amazing stuff
idk, based on my tests it makes faces look not so realistic
materials are fine, but faces not as good
You can Download the pth file from here: https://mega.nz/folder/qZRBmaIY#nIG8KyWFcGNTuMX_XNbJ_g
Just plunk it down into your models/ESRGAN folder and reload the WebUI then you should have the Upscaler ready.
maybe with low denoise...like 0.1, but at this point it's better to use any other upscaler
Yup, nothing more than restarting your webui and it should run its course, it'll be available as an Upscaler during creation and in Extras as a regular upscaler as well
I usually do about 20 HiRes passes with it and that's more than enough, otherwise it can sometimes go a little freaky.
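For anyone who would rather script the install step described above, a minimal sketch, assuming the WebUI keeps upscalers in `models/ESRGAN` under its root folder as mentioned. The function name `install_upscaler` and the example paths are placeholders, not part of the WebUI.

```python
from pathlib import Path
import shutil

def install_upscaler(pth_file: Path, webui_root: Path) -> Path:
    """Copy an ESRGAN-style .pth upscaler into <webui_root>/models/ESRGAN."""
    if pth_file.suffix != ".pth":
        raise ValueError(f"expected a .pth file, got {pth_file.name}")
    target_dir = webui_root / "models" / "ESRGAN"
    target_dir.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing
    # copy2 returns the path of the copied file inside target_dir
    return Path(shutil.copy2(pth_file, target_dir))

# Example call (both paths are placeholders, adjust to your install):
# install_upscaler(Path("Downloads/example_upscaler.pth"), Path("stable-diffusion-webui"))
```

After copying, the WebUI still needs a restart or reload before the upscaler shows up in the dropdown.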
awesomeee appreciate it, i just got this set up yesterday so still looking/learning about this tool's capabilities
But like @wispy nest says it is better on Artsy stuff than Photography, but you can make it work though...
mmhmm~ I'm not a big fan of super realistic stuff, if it's realistic it's tasteful paintings, I love that kind of art
As with everything AI Diffusion, the only limit is your limited imagination... once you see what it can do and your imagination goes apeshit with it the results can and will be stunning!
Here are some little tests I was doing on the mix model of the last 7 PoW ( #⭐|pow-info ). total dataset is around 330 pictures currently
more there : #1089542080801542207 message
very nice upscaler
"SDArt someone" is the prompt for the 5th image, or is it not reading png info properly?
it is surely reading it properly there yes
I didn't do all combinations but this model is trained a little differently
hm, aight
I trained on 7 PoW dataset at once
so I added a token to name the PoW
"cats", "underwater", "loop", "someone", "something", "somewhere", and "friend"
I'm not entirely sure where to go there.
What I can see is that, individually, each submission is less well trained than it was in each PoW model
but at the same time now, we got some really great tokens working
in particular, the main token is great and very diversified
and the users that did participate in multiple PoW have also very good tokens to use
I got that comment on civitAI on one of the models
to make merging easier with other models u might want to consider adding a color rich tag, and perhaps just art . Would really help those of us merging
for the "just art", we are good since SDArt splits in SD and Art, token wise. but not sure what they mean with "color rich tag"
blob to you too and welcome
I don't understand that whole thing of what's happening
You made a model based on #1023999442338201721 submissions or what's happening?
What does PoW even stand for 😄
Picture of the Week
and for 7 different weeks, I made a model that lets people have fun mixing their creations together and doing variations, and each one creates a "style" token that fits the theme
but here, I took those 7 datasets, and trained them all at once
interesting interesting
it makes quite an artistic model, where you can call on specific themes of the PoW, or on specific users (each user has a unique token too)
but I don't get how'd you get anything with it with so little options to prompt
anything specific*
pow, right in the kisser
well you are right in part, you can have a hard time getting something specific.
there are 174 tokens trained in total in here, and almost none mean a real thing.
Mostly "SDArt" is trained and applies a style to the rest of the prompt
it's an interesting token since most pictures in the submissions are quite high quality
but it's mostly for fun and doing variations on the submissions of each user, or mixing submissions of multiple users at once
you can run lots of small prompts like I did, and get a lot of good results, select some tokens that you like, and mix with a "real" prompt that will use the styles of the token you choose
hm aight
A accidental dweller from aladdin
Behold... idk what this is since I made this while I was drunk a few days ago
That's more like a princess 😄
As long as I can train with it with the ease of 1.5 if not I have no use for it.
@smoky oak As we know, the 2.0/2.1 TE fights you every step of the way in training. Same data: 1.5 loss is 0.05 while 2.1 is 0.4-0.5.
An order of magnitude larger is nothing to overlook.
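Quick arithmetic on those numbers, taking the reported losses (0.05 for 1.5, 0.4-0.5 for 2.1) at face value. This only compares the two figures; it says nothing about whether the two training runs are otherwise comparable.

```python
import math

loss_15 = 0.05          # reported 1.5 training loss
loss_21 = (0.4, 0.5)    # reported 2.1 training loss range

ratios = [l / loss_15 for l in loss_21]      # how many times larger
orders = [math.log10(r) for r in ratios]     # in orders of magnitude

print([round(r, 2) for r in ratios])   # [8.0, 10.0]
print([round(o, 2) for o in orders])   # [0.9, 1.0]
```

So the gap is roughly 8x to 10x, i.e. about one order of magnitude at the top of the range.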
I tested Bing; it does not allow me to generate images, so I asked how a tag would look for Stable Diffusion and it suggested:
ADJECTIVE: Cute
NOUN: Hamster
VERB: Holding a wooden rake and standing in a field
STYLE: Cartoon
I think it would have been cool if AI could use a structured format like that to separate things, like Character1: Long hair, blue eyes... and so on.
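A small sketch of what that structured format could look like on the user side. The `StructuredPrompt` class and its field names are made up for illustration, and it simply flattens the fields back into a normal prompt string, so the model would still see plain text rather than truly separated fields.

```python
from dataclasses import dataclass

@dataclass
class StructuredPrompt:
    adjective: str
    noun: str
    verb: str
    style: str

    def to_prompt(self) -> str:
        # Flatten the fields into an ordinary comma-separated prompt string
        return f"{self.adjective} {self.noun} {self.verb}, {self.style} style".lower()

p = StructuredPrompt("Cute", "Hamster", "Holding a wooden rake and standing in a field", "Cartoon")
print(p.to_prompt())
# cute hamster holding a wooden rake and standing in a field, cartoon style
```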
Hahaha love it
You can make chatGPT do that
used to look through my phone at pics i took after a night of drinking that i forgot. now we're in the 2023 and i'll look at pics i synthesized while drinking
ChatGPT is online and made by a company I do not trust for now, so I hope the innovation happens within Stability and open source.
I mean...that's just prompts, nothing bad will happen if they'll see your prompts 😄
technically bing and other companies using gpt too
so pickapic.io is using sdxl now?
https://www.reddit.com/r/StableDiffusion/comments/11vu1n8/pickapic_is_now_using_stable_diffusion_xl_22_beta/
Does an ok job with what little testing I've done. Here's the best "Professional heavily stylized digital illustration of a beautiful ginger freckled female fantasy cosplay mage with freckles, grey eyes, and a closed mouth. She is 20ish years old. She is wearing modest long blue robes with many small details and is standing in a a dark alley in a fantasy city with many details. She is looking at the viewer." imo., though there were lots of different styles shown
Its not the best for illustration styles, as compared to some of the community models, however if you compare XL to regular 2.2 its such a huge leap
it can do all artstyles pretty well already, fine tunes with XL will be amazing
probably midjourney level
Yeah. I was just impressed that it followed a (relatively) natural speech prompt so well. not everything was that good, obviously, but really nice.
not sure if it always uses sdxl though, as i'm under the impression it should output 768x768 if it does and quite a few are 512x512
SDXL is only 512 right now looks like
i dont know why
and yes
pickapic switches between models
another pretty good one, but 512 and didn't follow the prompt as well imo. bet it didn't use sdxl 😛
Just use the prompts "8K", and "UHD" to overcome the 512x512 limitation 😄
definitely didn't work lol. though it did completely change the artstyle EDIT: nvm, was just a fluke. lots of the previous artstyle in the generations as well
He he, and it did not make it better, I like the first one better.
image in the Old West, where two men are facing each other, one of them handing a stone amulet to the other. The image has a focus of mystery, with the men's faces detailed and very close to the viewer. The men are seen in profile, one of them is about 30 years old and the other about 70 years old.
what prompt did you use to get her to hold a plate of burnt food lmao
that wasn't on purpose, that was just burning part, SD often makes human hold something burning instead
strange
funny accident ¯_(ツ)_/¯
if you are trying to get an image with this - we don't have bot.
Ok
stderr: ERROR: Error [WinError 2] The system cannot find the file specified while executing command git version
ERROR: Cannot find command 'git' - do you have 'git' installed and in your PATH?
Should I upgrade the pip? It won't make the new version of SD install, will it? I don't want to use the new updates SD yet because it has some problems
I only installed python and SD
you can type
git version
into cmd to see if it is installed
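The same check can be done from Python, which is handy since Python is already installed here. A minimal sketch; `tool_version` is a made-up helper name, and it only reports whether the command is on PATH and what version string it prints.

```python
import shutil
import subprocess
from typing import Optional

def tool_version(cmd: str) -> Optional[str]:
    """Return the output of '<cmd> --version', or None if cmd is not on PATH."""
    exe = shutil.which(cmd)  # None when the command cannot be found
    if exe is None:
        return None
    result = subprocess.run([exe, "--version"], capture_output=True, text=True)
    return result.stdout.strip()

print(tool_version("git") or "git not found on PATH")
```

If this prints the "not found" message, git is either not installed or its folder is missing from PATH, which matches the error above.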
I sometimes write sarcastic prompts to make fun of the prompt myth, but even if the words are often pointless they may add something, this is "(Rick Roll troll face) (red nose:0.3) (cat girl), 8K, UHQ, UHD, super detailed, best Quality, even better quality, super realism, hyper realistic, RAW photo, master work, aesthetically pleasing, by Rick Astley trending on artstation", with Stable Diffusion 1.5
idk if there are reasons to upgrade pip, in most cases newer versions are better, but it also might lead to compatibility issues with other things
I typed git version in the Command Prompt and it says 'git' is not recognized as an internal or external command
yea git probably not installed
I never had to install anything else before to use sd, I'll look up git and see
It shouldn't be necessary if you aren't going to update it tho, idk why is it asking
2.40.0 is the most recent apparently, should I download?
I see
I'm ok with manually updating SD every few weeks, I wouldn't need git pulls
I'll look up the errors and try to find out
My python is 3.10.6, should I use an updated 3.10.X?
Made an idea based on this
young girl feeding big massive demonic creature with glowing eyes,
I'd want to know where did she get that thing she's trying to feed creature on img2
Wouldn't learning a bit of blender be really helpful? Getting a few 3d objects right where you want them and then using depth or img2img would go far imo
3d modelling is a bit tricky to just get into
But yeah, it’s very good for depth + controlnet
I think another clever part of the midjourney situation is that they're the central host and control all the content. So their model may very well deep-six every image that their aesthetic filter doesn't rate high enough. When they generate 4 varieties, the person feels invested in picking the good one, shares only the good ones, and the rest are all tossed.
This affects perception quite a lot. I say this because people always say a very high quality piece is "MJ level" but.. I see those and they just look like any other post coming out of a place like /highendai or other high quality generations from any model.
It's important to distinguish because people seem to think that "Midjourney level" is impossible on stable diffusion, but again, highendai exists and is FULL of very high-quality renders from people using a variety of tools. Many are SD. Why are we insisting on using a trademarked brand name SaaS to describe "good art"? Brand names don't deserve such pedestals
Jimmy. She's an Ad Jimmy
Getting the error
No module named 'fastapi'
I looked it up and it says the fix is some !pip install stuff. Where would I paste the code to install?
command line into the root webui folder. .\venv\Scripts\activate will activate it on windows. then paste your command
How do I know which is the root webui folder? There are a few webui things on there.
/stable-diffusion-webui/ - that's root
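As an alternative to activating the venv by hand, you can call the venv's own interpreter with `-m pip`, which installs into that venv without activation. A sketch that only builds the command; the helper names are made up, and the `venv` folder layout is assumed from the `.\venv\Scripts\activate` path mentioned above.

```python
import os
from pathlib import Path

def venv_python(webui_root: Path) -> Path:
    """Path to the interpreter inside <webui_root>/venv (layout differs per OS)."""
    if os.name == "nt":  # Windows
        return webui_root / "venv" / "Scripts" / "python.exe"
    return webui_root / "venv" / "bin" / "python"

def pip_install_command(webui_root: Path, package: str) -> list[str]:
    # Running pip through the venv's interpreter targets that venv
    return [str(venv_python(webui_root)), "-m", "pip", "install", package]

cmd = pip_install_command(Path("stable-diffusion-webui"), "fastapi")
print(cmd)
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```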
I'll ask in tech support, don't want to clog this chat. Sorry, I didn't notice tech support chat before. Ty though
I am terrified that SDXL may be massively better, yet be completely useless because of the bad TE
Am I doing something wrong? Prompt: dark gray stone floor jagged
But man, when you say something... SDXL listens
Yes, the same feeling. Not massively better though, but better while having no real way to train it means fuck it as far as most are concerned, so they will stick with 1.5. Matter of fact, I am about to do a test. This is a do or die test too.
I have had no luck at training 2.1 Lora, Locon (never tried loha) for styles. I had no issues with embeddings so my last test from this morning I am going to do with 1.5. If it works then screw 2.x
Unless they change the CLIP model from the one they currently use, or get one even worse, training will be a sob at best and not doable at worst.
After having played with the new models of 1.5 since 2.0 dropped there is just no need to upgrade short of getting full on 100% of the time text. SDXL can't even do that though it is better at it.
I don't know what to say, I tested it across 50 different prompts, and while every other model did not get more than a quarter of them accurate, SDXL got almost every single one perfect.
I spent a long time last night fucking around and trying SDXL compared to all of the other versions, and I would have to say that the quality difference between 2.1 and SDXL is miles higher than the quality difference between 1.4 and 2.1
I saw your examples and I saw nothing an embedding wouldn't already do.
Yes, but what I'm saying is that it listens so well that it's able to do it without that
Imagine if it's able to keep that level of granularity, where I'm able to tell it exactly what color of vest, shirt, hair, pants, shoes, socks, and eye color without them getting jumbled
yes, and that is why I suspect we are screwed for training it
well, as I said, if I can't train it then it will lose me for sure and a ton of others who train 1.5 already. They couldn't train 2.0/2.1 and stayed with 1.5 and said "eff 2.x". I stayed but I am beyond frustrated with it, as they were. I really do not think it will get any better for training and just sitting here generating images is not my idea of fun. A lot of Anime people say the same thing.
Anime folks are even more vehement about it.
I guess cause they have their favorites and need to train them in.
btw, this latest Auto1111 has so many errors, and bugs, that omg even an A6000 with 48gb of vram got an OOM.
bad mem leak somewhere
damn ._.
I just woke up, I am gonna keep trying to push to see just how well SDXL listens/understands compared to only 1.5, as doing 1.4/1.5/2.0/2.1 is quite taxing on my credits and time
isn't it due out this week or is it next week?
I have no idea
I just have a lot of credits, and these gens only take .2 credits, so I wanna play with it
When people train on 1.5 which model do they train off of nowadays?
For me I use realistic vision 1.3, as its my go to model
I haven't trained since 1.4 came out, but they are extremely similar
yeah, LoRA training
I use 1.4 for gen but I do use others as well as each has strengths and weaknesses
my 1.3 LoRA's work flawlessly on 1.4, so I would assume the same in reverse
I need to upload one to train test 1.5. If this works with nothing else changed I am going to be super pissed
do you train with a vae?
I do not
@smoky oak
whut
this has a dropdown to choose one
