#🏞|general-with-images
for these generators it almost even makes sense to cache the most common terms and just return one of the prev upscaled images
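A minimal sketch of that caching idea, assuming a hypothetical `generate` callable that stands in for whatever the real generator does. The `PromptCache` name, the normalization rule, and the size limit are all illustrative, not something any of these services is known to use.

```python
from collections import OrderedDict
from typing import Callable


def normalize(prompt: str) -> str:
    """Collapse case and whitespace so near-identical prompts share a key."""
    return " ".join(prompt.lower().split())


class PromptCache:
    """Tiny LRU cache: a repeated prompt gets a previously generated image."""

    def __init__(self, generate: Callable[[str], bytes], max_items: int = 1000):
        self.generate = generate  # hypothetical image generator
        self.max_items = max_items
        self.store: "OrderedDict[str, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, prompt: str) -> bytes:
        key = normalize(prompt)
        if key in self.store:
            self.hits += 1
            self.store.move_to_end(key)  # mark as most recently used
            return self.store[key]
        self.misses += 1
        image = self.generate(key)
        self.store[key] = image
        if len(self.store) > self.max_items:
            self.store.popitem(last=False)  # evict the least recently used
        return image
```

The trade-off is the one the message hints at: a cache hit is nearly free, but everyone who types a common prompt gets the same picture back.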
although, the top one is a very specifically made style for that exact look lol
"you want free img in 5 seconds? here u go, trash"
dream actually is just that fast, and I think its pretty impressive
now what happens when I remove the specialized style...
it improved as much as Bing Image gen did
a stunning portrait of jesters at the twisted carnival is the prompt i used for craiyon
oh I see...
not even anything spectacular
oh wait, thats not their diffusion model
thats the other type, I always forget the name
retrieval?
Imagen?
no, not that, its been out for years
a GAN?
maybe thats it
I had a whole conversation about this before SD was even released
do it like so many times it takes 24hrs for a single gen
it looks bad lol
PFFT
is outpainting broken?
Apparently something has changed since the last time I used this feature so 6gigs is no longer enough mem.
yep, it is now rubbish. 768x1152 I want to add 192 pixels yet get a cuBLAS error BUT I just did a full on 1152x1152 gen
128 works
I've been staring at this for a minute
It's a masterpiece
Ford: If you're confused, so are our trucks
well this is a ram 6900
"How much truck do you want"
Dodge: Yes
it has two truck cabs, so you can separate yourself from your shitty children
there's no back seat in the front cab
just two windows for some reason
these were WIP validation images from fine-tuning
oh my god SDXL just finished its conceptualization of the ram 6900 cornhole edition

these are awesome
Man these SDXL weights are annoying me, can't get images that look like the bots from them lmao
I wonder how massive their additional negatives and positives are
will it reach higher coherency? it cant resolve details like eyes or other small objects without screwing them up.
i just got this out of it
it actually nailed the eyes? how is that possible?
Could be RLHF effect too. Is the model they opened early up to date? The bots probably use the most recent iteration
I wonder if it's possible to finetune with GANs... for example, make a hand detector and train a GAN to tell SD what exactly is wrong with it
I have found a new workflow that seems to help with quality considerably
SDXL tigers.
I think I may be able to get some better ones with my new workflow in comfy
SDXL attempts fashion models.
SDXL run locally
not bad, except the detailing of the background... and the overall 1024x1024 detail limit
if you saw what it looked like when we first started, you would be amazed lmao
cause this model is shit if you don't baby the fuck out of it
this is what it looked like with tigers 24 hours ago
i cant comprehend how it can be improved... But that it can improve is a good sign.
its looking good now, it still is incoherent on things like hands, eyes, tails and such
Its the exact same model, its just a bitch and a half to get it to behave correctly lol
so like, getting it to diffuse correctly.
like it depends what we are aiming for, artistic or photoreal.. if it mixes elements of bad art and you're looking for photorealism then you will get crap!
No like, you have to use very specific CFG for each model, with a very specific amount of steps, with very specific prompting, with a very specific two model per single image workflow, and various other things
It's kind of insane how much of a pain in the ass using this model is, but I have hope that the official release will be far less dense and annoying to use lol
well its a complicated neuralnetwork right? so its going to take a lot to coax it to work correctly... Then do all that behind the scenes and you get clipdrop.io
Cat trio on the roof, surreal, masterpiece, octane rendering, focus, realistic photography, colorful
That is true, but still, I've had to generate hundreds of images and try dozens of different nodes setups in order to get even remotely decent images out of it, luckily I did just finally figure out how to get the models to do 50/50 on a single image, and while it is a bit of a pain in the butt, my results are clearly improved
The main thing that really annoys me about all of this is that the bot makes it seem like you don't need negatives,
And then it will just be extremely easy to do, like just a few words required and you'll have great images, when that's really not the case
I really wish the team would share what kind of modifications their different styles are doing, because if I had information on that, I would be able to recommend the base model weights considerably more
right... clipdrop must be doing negative prompting behind the scenes
Excuse my wonky grammar and weird messages, I'm using voice typing at the moment
those styles on clipdrop are interesting and quite powerful
Yes, interesting is one way to put them, extremely difficult to replicate the quality of is how I would put it
I have finally honed in on some form of negative prompt for decently realistic images, though it still fails considerably with certain subject matter
i like their analog filter, and the isometric is interesting.
The artistic styles are honestly quite easy to replicate, it's just the realism ones that are a pain in the ass
oh right like photography and such
Yes, both the base model and the refiner are extremely biased towards digital art and painterly results, so you basically have to manhandle them in order to get realistic images out of either of them, let alone both of them in succession
And don't get me wrong, I'm not saying that this is bad, I'm actually very impressed by just how much quality I have been able to squeeze out of something that seems so bad on the surface, I just really wish that there was some clarity on SAIs side in terms of the positives and negatives they are cramming into our generation requests
tricks of the mind to get the brain thinking its photography... like DOF, glare, lensflare, framing, filmgrain
All of that is actually quite easy to get, the main problem is getting things that don't look like they were made through some form of painting or digital art, the post-processing style stuff is really easy, but the core imagery is hard
right i get you.
I do not see there being an easy to use workflow for the base model into the refiner implemented in UIs like automatic anytime soon, as there is a ton of nuance in the relations between the base model and the refiner
To an extent I do like the ability to kind of lean the model from realistic to artistic without having many problems once you do figure it out, but that learning/growing pains phase is going to be a huge pain in the ass for the community, and I know that the vast majority of users are not going to want to have to fiddle around and find out which 50 words you have to put in your negative to not get images that look terrible
That said, when you do start aligning the right variables, it is a very powerful model, and I have a feeling that the 1.0 release is going to be even better
negative prompting should be made automatic then or even just click icons that represent combinations of negative prompt words
dont forget not everyone is looking for photorealism.. a lot use it for artistic
The main thing that I do have to say is that using it as an image to image workflow, where you have the base model generate an image, and the refiner run a image to image pass on it is not the correct way to use these
You will get significantly worse results doing it that way, the correct way from my findings to do it is to have half of the image steps done by the base model, and the other half done by the refiner. The quality, coherency, and accuracy to your prompt goes up considerably when you do it that way. I just have zero clue how something like that will be implemented into auto
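A rough sketch of how that split could be written down, assuming a sampler that can start and stop at arbitrary steps. The `split_steps` helper and its key names are hypothetical; they only loosely mirror the kind of start/end-step controls an advanced sampler node exposes, and none of this is the official workflow.

```python
def split_steps(total_steps: int, base_fraction: float = 0.5) -> dict:
    """Return step ranges for a two-stage run: base model first, then refiner."""
    if total_steps < 2:
        raise ValueError("need at least 2 steps to split between two models")
    if not 0.0 < base_fraction < 1.0:
        raise ValueError("base_fraction must be strictly between 0 and 1")
    # Clamp so each model always gets at least one step.
    handoff = min(max(round(total_steps * base_fraction), 1), total_steps - 1)
    return {
        # The base stops early and hands over a still-noisy latent.
        "base": {"start_step": 0, "end_step": handoff, "keep_leftover_noise": True},
        # The refiner continues the same schedule without adding fresh noise.
        "refiner": {"start_step": handoff, "end_step": total_steps, "add_noise": False},
    }


plan = split_steps(30)  # 50/50: base runs steps 0-15, refiner runs 15-30
```

The difference from the img2img approach is that the refiner here picks up a partially denoised latent mid-schedule, instead of re-noising a finished image and starting over.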
interesting...
Oh for sure, for artistic stuff it slaps, it's very easy to get phenomenal artistic results out of this model. In that regard I would say that it really is as easy as just throwing in a couple positive words, a couple negative words, and having great artistic images in seconds
but with artistic stuff its easy for the human eye to ignore what are actually mistakes or incoherence
with photorealism you really see what the model is capable of
Not even just that, they fundamentally look really good
It's always been stated that negatives are included with the bot. Just keep in mind the bot is a preview, not a finished product, like 0.9.
oh right so it does digital art really well..
I wish those negatives were provided so the community could know what kind of tedious amount of negative refining we're going to be in for
There are tons of great words you can use
As it is right now, my research on it is pointing to very tedious for realism and semi-realism
whats its base resolution though? the upscaling looks like it needs improving, all details look like reptile skin or alligator skin
its 1024x
if you are using clipdrop, yeah, their upscaler that runs at the end is absolute trash and makes the images look worse
Any examples of words you would recommend for getting better realism? I would massively appreciate it, and include it in my findings for SDXL
it is a trash upscaler. Why dont they use like diffusion based upscaling..
isnt that possible?
for sure
it doesn't even need the post processing, the images look good without it lol
Give me something you want to prompt
If I want a realistic landscape
For example, a photograph of a park at sunset, in a photo realistic style
That has still managed to be something I can't get looking even remotely realistic
Any particular kind of park? Place?
what sampler do you suggest in Auto for 0.9 api Sunny?
landscapes i was struggling with... foliage and trees it fails at
Lets say a park during autumn at sunset
for photoreals
its not usable in auto, so there is no way to know
from my extensive searching in comfy UI, DDIM seems to be the best
im using the api
ohhh, my bad in that case, thought you meant local haha
Heh. Give me a bit
alrighty, I look forward to some insight cause I have gotten a lot of things realistic, but not landscapes for whatever reason
also, if anybody wants to send me some prompts to try in local run SDXL, I could use some new prompts to research
positives only, realism focused
i preferred the local one while i still used it over Dreamstudio
i actually forgot for a while that i used local SD at one point lol
but yeah, i liked it more than Dreamstudio
same prompt on clipdrop io... one is with no style one is with photography style
local just has wayyy more control
it breaks down to usage case ofc but i prefer more control too
how did you guys get local? you mean a ckpt?
Github
Yeah, I have the base and refiner ckpt's
i would run it local if i had a good card.
whats the link?
delv into all the deep controls
You have to apply for them
ah
8GB VRAM is enough to run SDXL with the refiner in comfy UI
thought it was somehow available to dl
i even use it for rendering and simulations, which is still workable
not at the moment, for now its still in beta/researchers only mode
I am conducting tons of tests to see which workflows work best in general, I have found the best CFG's, best samplers, best schedulers, best pipeline, and now just trying to understand the prompting better
i will need the 24 GB VRAM
how much will it need? not that much I hope
8GB VRAM minimum for SDXL
in comfy that can let you run the base and refiner, in auto I assume it will be much more restricted
in terms of gen speed, same as 1.5 and 2.1 at the same res
only slow down seems to be the text encoders, as there are 2 of them now, and they are huge
they can take a couple seconds to process your prompts
if you think about it philosophically... A photograph is not always art. not by definition. But diffusion AI art generators are supposed to be art... So that is what it tends to be good at.. and to get it to only make an image look the same as a photo is kind of missing the point?
but artistic photographs,, yes...
people want art.. they dont want a random cellphone photo looking image
for photorealism i might as well use stock images
you can see here the difference between Base, Base with refiner in img2img, and then 50/50 diffusion between base and refiner
as you can see, the 50/50 pipeline works the best for sure, but it is also muchhh harder to implement in auto
so does the sampler not matter on the auto api version? like is it just processing whatever the default sampler is on dream studio?
(by 50/50, I mean half the steps are done in base, then it gets sent to refiner which does the other half of the steps)
middle one looks best.
whats refiner
SDXL is 2 models
the base model, which is 3.5 billion parameters
then a secondary refiner model that runs ontop to fix it up, that one is 3.1 billion parameters
noice
what happens if you run the refiner twice
so you like get the image, then put it through the wash again somehow? or it does it automatically
its just like running any diffusion model twice, just changes the details a bit, but doesn't refine them
you can do it 2 different ways
one is img2img, like high res fix
take this image, run it as img2img with the refiner
you get better results, but not the best
gotcha
other way is to diffuse the image 50%, like this
then let the other half diffuse in the refiner, which gives much better results
for example again:
how big can you go with the images before it like duplicates the subject or whatever
im just on my laptop now, honestly they all look great cant see much difference lol
but this is a smaller screen
as you can see, the 50/50 workflow looks much better on fine details and colors and stuff
it nailed the hair on the split. Its doing something right
also, bonus with 50/50, is its faster
yup! The refiner is really good at smaller details, the base is really good at posing and basic blocking out
base@wispy nest
split
yeah i can see it now
not perfect, but a huge improvement
although so much makeup is added to the second it also almost looks like a new person
The limit is surely the resolution and quality of the training data.
it cant get beyond that because how can it diffuse to it
thats not true, you can get 1.5 which is base res 512x512 to theoretically infinite resolutions
like the first with better eyes would almost look like a natural beauty type the second a more professional photo shoot
once we have right finetunes, we should be able to do 4k with SDXL like childs play
interesting...
for example, some of my upscale workflow images in 1.5
I gotta say though its pretty impressive just the stuff I can see on clipdrop and the api. I wonder how much more 'wow factor' full 1.0 will have
so its got the intelligence to work out how to get more resolution, so its not just simply diffusing to a training image and then stopping
So is clipdrop going to improve further as they do stuff in the background??
no idea
do they own clipdrop?
i dont get why dreamstudio isnt like featuring the best looking stuff
SAI owns clipdrop, yes
maybe I'll drop some cash there. was gonna on Dream Studio but it seems its taking a bit to get 0.9 up there, or anything similar to clipdrop
clipdrop seems so limited with its 1024 res and terrible upscaler
yeah things look better on the api
@smoky oak can you try this one in comfy? change the two models of course
but clipdrop is still impressive as hell to me, just to see what the potential will be you know
oh lawd he crusty
like I said, I am looking for prompts to run local, which is as close of a look as you can get haha
clipdrop is nice, i paid for it because its £8 a month which is way cheaper than midjourney and 1500 images a day, at a faster rate than mj
it shows how things are improving since the early days
oh god I broke something
ono
every so often, certain seeds for SDXL get these super super weird pixel perfect splits
like this, excuse him being shirtless, apparently "handsome" means no clothes lol
i've seen split images in 1.5 too, rare
I have gotten it a couple of times
cant imagine what is causing that error
seed might have some random rounding error thing
oh mannn, this prompt looks great so far
man, when we can finetune this, its gonna be so fucken over for MJ lmao
did you try the kitty pic?
A bench at a park, near a river, autumn trees in the background
On Clip, it comes out as:
yes, but what is the negative?
For specifics on 0.9 on what you should do, for research, I would recommend referring to the appropriate channels for such things.
That said, consider the following:
I don't know how you're prompting, but start small. Don't overprompt. And think logically.
If you want a tiger in the snow, you might say, "photograph of a tiger, professional animal photography"
I'm not prompting this rn, but as an example, I would start exploring more options.
"Photo of a tiger, walking in the snow, professional animal photography"
"Photo of a tiger, walking in the snow, view from the front, professional animal photography"
"Photo of a tiger walking in the snow, detailed fur, view from the front, in a field of snow, midday, professional animal photography, in the style of National Geographic"
(Yes or no on the detailed fur.) You can always use more positives, too, whatever.
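The "start small, then add" approach above can be sketched as a tiny prompt builder. `build_prompt` is a hypothetical helper, not part of any tool mentioned here; the phrases are taken from the tiger examples.

```python
def build_prompt(subject: str, *details: str, style: str = "") -> str:
    """Join a subject, optional details, and a trailing style into one prompt."""
    parts = [subject, *details]
    if style:
        parts.append(style)
    # Skip empty pieces so toggling a detail off never leaves a stray comma.
    return ", ".join(p.strip() for p in parts if p.strip())


STYLE = "professional animal photography"

# Each stage adds one idea, so it is easy to see which addition helped or hurt.
stages = [
    build_prompt("Photo of a tiger", style=STYLE),
    build_prompt("Photo of a tiger", "walking in the snow", style=STYLE),
    build_prompt("Photo of a tiger", "walking in the snow",
                 "view from the front", style=STYLE),
]

for prompt in stages:
    print(prompt)
```

Running each stage on the same seed makes the comparison fairer, since only the prompt changes between images.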
In terms of painting, I would recommend trying different painting terms. What's in a painting? Using cartoon, etc, can work, such as my wip demonstration on hands in my vol 2 guide.
Honestly, I've written a ton about negatives already, so if you have questions about negatives, I'd refer to my documentation
That was the whole point of the question lol
eyes and fur looks pretty sharp except for the front legs look odd., the background foliage is nailed.
So there's plenty of phrases already in there
the positive prompt is easy, its the negative 
literally 0 help on the negative, the BIGGEST bottleneck here lol
Slowly getting better haha
these don't look very realistic either, so I assume it may just be a limitation of the base in that case
where can I find that?
my nodes are such a mess to get this model running its best haha
It's a bit scattered, but all my links there
The Friendly Guide has info on negatives; vol 2 has some more info in the prompting section you might find useful
ok, biggest thing to say, 1.5 and 2.1 negatives do not work at all in this model
so that is all going to have to be changed for future things, myself included
I have had to completely change negatives to get this all working
oh also, negatives need to really be driven
positive prompt is all weight 1, negative is weight 1.4
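For anyone wanting to try that weighting, here is a small sketch that formats terms in the `(term:weight)` emphasis syntax that ComfyUI and auto accept. The `weighted` helper is hypothetical, and the negative terms are placeholders picked for illustration; they are not the negatives actually used above.

```python
def weighted(terms, weight: float = 1.0) -> str:
    """Format prompt terms, wrapping each in (term:weight) when weight != 1."""
    if weight == 1.0:
        return ", ".join(terms)  # weight 1 is the default, no wrapper needed
    return ", ".join(f"({term}:{weight:g})" for term in terms)


positive = weighted(["photo of a park at sunset", "autumn"])       # weight 1
negative = weighted(["painting", "digital art"], weight=1.4)       # placeholders

print(positive)
print(negative)
```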
im going to post a bunch of my clipdrop images.
Gorgeous!
oh wow
Yes ❤️
oh man, ok
the img2img version did a way better job on that rose prompt
that one looks very good
roses look much more... rosey lol

feel free to not share that information ever again, thanks lmao
I love you too

Always make the best out of life
ohhh, looks like clipdrop has the same issue with being bad at water color
I had to fight to get SDXL local to do good water color
This is actually supposed to look like this (with the way I prompted)
It's a type of digital watercolor style; it's not bad--it's just a style of art
one thing ive noticed on clipdrop... versus midjourney v5... the images when taken to photoshop from clipdrop are already tuned, if you run auto tone, auto contrast etc nothing will happen they are already good.. But midjourney is washedout bad contrast and bad color
ah, did you ask specifically for digital water color? Cause normal water color shouldn't look like that haha
There are many kinds of ways to do digital watercolor, and various kinds of programs that do it
Some people may use overlays/brush textures
base SDXL with no refiner does real water color the best
Some people may use different brush systems
The look can vary from program to program, as well as the ability of the artist
Yeah, real watercolor is def beautiful for sure
Traditional art can be hard to replicate, for sure.
@smoky oak hey can you try open a new tab of comfy, download this image and drag it into the interface for comfy to load the workflow. then change the models for the base and refiner to the sdxl ones, and try running it?
sure
what in the hell is this workflow lol
100 steps OMG
I will need to change a lot to get this working for SDXL
don't change 😄
okaayyy
I copied it from somewhere 
just got these out of clipdrop
jesus christ this is slow as HELL
gonna be like 5 minutes of genning for this lmfao
first image is crap-
i'm missing a few custom nodes, but i did my best to copy a workflow from an image
I am actually hurting to understand what this is trying to do-
@sterile temple What was this supposed to be? cause its... terrible lmao
😄
beautiful
alright, thats enough of that lmao
I love the ears
In a way, it's kind of an alien cat
@tropic shell Any idea why SDXL randomly just fucking kills itself? lmao
Like I said, report any problems/feedback to the appropriate channels!
Ahahahaha! These are fantastic! I love the bottom left--what fun.
witch cat!
With very, very soft silky fur.
Insane
what are the chances lmao
The character from mine is more wuxia than sakura romance
I was just trying to see how well it does different races
wonder how it handles people of color
pretty well actually, nice
depends on the model
its SDXL
some anime models have a really hard time doing African and Jamaican phenotypes, for example.
It also helps if you ask for actors with the specific phenotype you're looking for
Just came up with an idea... ask ChatGPT to come up with a list of extinct animals like megafauna, such as the saber tooth tiger. And get it to write prompts to describe them. Then feed SDXL the prompts.
DODO photographed in the wild.
Also these clipdrop images now seem razor sharp.. like its improved in the last few hours
Sometimes I think my old SD with the older models is better than SDXL, mostly when it comes to realism. SDXL leans towards the odd and boring look MidJourney has: the images have good quality but are ultrarealistic.
Ultrarealistic is not the same as realism. It is how we want and think reality looks, but with greener grass and so on, sometimes also called Hollywood realism. This is a great art style, but sadly it started to be used as a synonym for realism in AI, and I think that is why we now get rather bad images when we use Realism or Photorealism in our prompts.
Tags for those images were, "photorealism, black and white analog photography, high ISO film, film grain, side portrait, photo of an older female grizzled rock artist in a leather jacket holding an electric guitar with wrinkles on face detailed rough skin tired look, dark room, raster paper print"
Thank you, the point was not to say SDXL is bad, but that it now is better on illustration and fantasy, I often like a bit raw realism. I love those images, So I will try again.
right.. this was just a first try.. it seems they improved stuff behind the scenes today, the generations are looking better than ever
you know, it's interesting. i've got the weights in my local pipeline now and i've been comparing them. i do not use 1.5 anymore, because it's really poorly performing on zero-shot image gen with high details.
so i compare against fine-tunes of 2.0 and 2.1 that i made over months. like, very carefully fine-tuned.
but my 2.x models are heavily slanted toward realism. like, it's almost all they can do. trying to get fantasy stuff out of them really isn't easy, and when you do get it, sometimes it just looks silly
i also don't have the SDXL refiner model working. i'm fine with that because i'm interested in extensively testing the base model on its own, as that is what most people will be capable of running.
i think fine-tuning SDXL base for realism will take it a lot further than 2.1 / 2.0 are capable of doing, but as it is now, it is not there.
when you add the refiner to the SDXL workflow and tweak its settings properly like Sytan has done, the results are just beyond incredible, without fine-tuning
so SDXL has potential? do you think it will compete with midjourney toe to toe?
i don't get why people worry about that, lol
do you mean, will StabilityAI be more preferred as a platform than Midjourney for end-users that find Clipdrop/DreamStudio and compare it to MJ?
I think the majority of users will use the easiest program with the best outputs
a lot of people will simply avoid MJ because they don't see the point of using Discord, a clunky interface
Yeah I tested MJ before and the server lagged like crazy for me
sucks having to type the same crap every time you do your prompt, the --v 5.2 --ar .. blah stuff
Too many people in the room. It's probably worse now.
i think SDXL will be more popular than Midjourney for those who find it, but MJ has more collective mindshare right now, it's the one people assume you use
to be honest right now i think Bing image gen is more popular than MJ
like, stuff is changing for their dominance and not necessarily due to SDXL
I saw that google imagen is being released, but it looks like shit. Yet at the time they boasted that it was so powerful it needed to be kept a company secret and nobody could use it
seems fine
Imagen uses a large frozen T5-XXL encoder
sweet
no wonder it can do the absurd demo stuff.
We show that large pretrained frozen text encoders are very effective for the text-to-image task.
We show that scaling the pretrained text encoder size is more important than scaling the diffusion model size.
We introduce a new thresholding diffusion sampler, which enables the use of very large classifier-free guidance weights.
We introduce a new Efficient U-Net architecture, which is more compute efficient, more memory efficient, and converges faster.
On COCO, we achieve a new state-of-the-art COCO FID of 7.27; and human raters find Imagen samples to be on-par with reference images in terms of image-text alignment.
this is why it's good that they're releasing it
don't worry about "what it looks like", it's just preconditioned priors
you can fine-tune it.
how good is imagen realistically compared to SDXL?
probably depends what you're trying to get done
personally i like amazing quality output, like some of the stuff ive been posting today im impressed with for SDXL... But some people seem to think if it can make for instance a dolphin wearing a wizards hat surfing on a sea of candyfloss and smoking a spliff, then it must be really good
yes, the ability to generate new content that has never existed before while retaining coherence is groundbreaking. it's doing all of that with a small diffusion model / unet, and a very large text encoder.
So its technically impressive, but is it quality art that people want.
it does both..
brb im going to test SDXL on the prompt i just mentioned
what, what do you mean "google is releasing imagen", as in, the weights for it?
DeepFloyd is an open source implementation of Imagen
i just heard about them releasing it, did not hear all the news just a quick blurb.. i recall at the time they were keeping it very close to their chest and saying no plans to ever release it
well Imagen is from last year and in that context, it is possibly one of the most impressive models
they're using something like a 900 million parameter unet on top of T5 XXL
SDXL uses a 3.5B and 3.1B combo pipeline when the refiner is involved
not clear how limiting that 900M parameter is when it comes to fine-tuning or vast capabilities but the text encoder really seems to contextualise so strongly that it could help overcome that
so it understands prompts better than all others probably?
it's not so cut and dry like that
at some point these models stop being comparable in those terms and it's mostly just subjective concerns or prompting differences
like a very good 2.1 fine-tune (text encoder not frozen) and SDXL (text encoder frozen) are almost neck and neck for a lot of things. but there's subject matter where they diverge, and either one or the other "wins" for my desired prompt results with my prompt style.
you can still get "winning images" from the "losing model" for that subject matter but it becomes a lot more effort / time required
not everyone thinks in the same way to write the same style of prompts, so one model might work better for one person and worse for someone else.
But what is the future, will they just pair them with LLMs that are beyond GPT-4 in power so that the context of the prompt is super well understood?
i was tempted to just split the 4 images output by my discord bot between a top tier 2.1 model and SDXL
like i hear GPT-5 will be trained on images and video
so you always get both
i don't know, man. i don't worry about future trends, it's all just hype designed to attract investors
i also just don't know enough to say with certainty
you can grab embeddings for a given prompt from GPT4's API. and assuming cost and speed weren't a concern with this, i'd then wonder like, how much data do you need to train the unet on to adequately be capable of representing SUCH A HUGE EMBEDDING?
i don't know how that relationship works out. to a naive observer, i'd assume that more and more training data is required for larger models
when a prompt goes in to the text encoder (and it has seen it during its training) but the unet never observed that concept during its training .... it's not pretty. it's confusing and noisy outputs.
This is too complicated for me to understand.
text encoders capture "relationships" between terms in the prompt and what features appear in images that the prompts corresponded to
These AIs are getting too complex. They dont even know exactly how the neural network is working to output some text. so who knows what it would do if they paired an LLM with diffusion image generator
the hardware used to train sdxl costs 15mil?
oh text encoder sounds similar to an LLM then..
it's a component of one
and if they really do train a GPT-5 on images and video then perhaps by default it will be able to generate images better than all the current AIs
the text encoder is pre-trained on billions of image-caption pairs and develops some kind of internal "blocks" of weights that correspond to these concepts, and these weights are numeric. they're a huge array of numbers that get used as input to the unet for guiding the diffusion process.
Because of how it will understand contexts of things in the world
training the unet, teaches it how to be guided by this conditioning input
the unet sees billions of image caption pairs, but they might not be the same ones that the text encoder was trained on
the text encoder is pretrained and frozen, and is not modified when stability is training their unet
so this representation remains fixed, and the parameters of the unet are slightly adjusted again and again over millions of iterations. and when i say slightly, boy do i mean slightly
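A toy illustration of that setup, very much a sketch: the "text encoder" is a fixed lookup table that never changes, and the "unet" is a single number `w` nudged by a tiny amount on every pass. Real models have billions of parameters; every name and value here is made up for the example.

```python
# Frozen "text encoder": token id -> conditioning value. Never updated.
FROZEN_TABLE = (0.5, -1.0, 2.0)


def encode(token_id: int) -> float:
    """Look up the fixed conditioning value for a token."""
    return FROZEN_TABLE[token_id]


def train(pairs, lr: float = 0.001, steps: int = 5000) -> float:
    """Fit one trainable weight so that w * encode(token) matches the target."""
    w = 0.0  # the only trainable parameter, standing in for the unet
    for _ in range(steps):
        for token_id, target in pairs:
            c = encode(token_id)            # conditioning stays fixed
            pred = w * c
            grad = 2 * (pred - target) * c  # derivative of squared error w.r.t. w
            w -= lr * grad                  # a very slight adjustment each time
    return w


# Targets were generated with w = 3, so training should recover roughly 3.
pairs = [(0, 1.5), (1, -3.0), (2, 6.0)]
w = train(pairs)
```

Only `w` moves during training; the table is the same before and after, which is the whole point of freezing the encoder.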
ok this is hurting my brain
so the text encoder is trained and frozen on image caption pairs. Then the Unet is like a neural network that learns stuff over millions of iterations..
yeah you'd have to teach the unet to understand prompts directly without this pretrained encoder
that'd take a lot longer and be less effective
we're also taught how language works early on before we're able to absorb many other concepts
true.
children are taught from completely useless little parasites into fully functioning adults through iterative training
when they do something wrong, slight correction each time will help them minimise their loss over the predicted result
can't get angry at them and try to make them learn too quickly. same applies to ML
honestly, it's not terribly different.
ever met someone who just isn't quite right and then you meet their family and you're like "ohhhh"?
training data.
you have a point.
This AI stuff though its really starting to take off.
oh we are in decline..
all these years of evolution and i still have no wings
we're going away!
obsoleted
Planes exist
Are they just large gov surveillance like the bird
correct, the myth of birds mirrors this
don't you hate it when you get Deja Vu?
why can't they stop cutting the hardline
what is this matrix stuff
@split valve but with that context, you see how a huge text encoder helps with image gen?
it helps but i think that's only if the unet can be adequately trained on all of the features the text encoder can represent
oh yea its needed to make it efficient.
i've actually asked the SAI devs this question before, and did not get an answer
it's possible it's one of the things we don't quite know yet
also it's why it kills me that we can't see CLIP-L's training data/captions
probably a company secret.
seems to have been trained on less data than other encoders, but possibly higher quality images AND captions.
@split valve yeah that's "Open"AI
training seems to be going in the right direction, lmfao
can get some really weird stuff out of SDXL
strange
the landscape seems random af
wow no input so whats guiding it
random noise?
yep just diffusion without text
oh it starts with a random noise seed and just diffuses what it was
?
but these are not actual training images
it does what it always does which is start with random noise and then remove it successively and get some features from the data distribution
yup
this just isn't weighting the results by the text embeds
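rough sketch of what "not weighting by the text embeds" means, assuming the usual classifier-free guidance mix (the function name and the toy numbers are made up, real noise predictions are big tensors): the sampler blends an unconditional noise prediction with a text-conditioned one. with an empty prompt the two predictions are the same, so the scale has nothing to push on.

```python
def guided_noise(eps_uncond, eps_cond, scale):
    """Classifier-free guidance mix of two noise predictions (toy lists)."""
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


if __name__ == "__main__":
    uncond = [0.1, 0.2]
    cond = [0.3, -0.1]
    # Empty prompt: conditional == unconditional, so the scale is irrelevant.
    print(guided_noise(uncond, uncond, 7.5))
    # Real prompt: a higher scale pushes further toward the text direction.
    print(guided_noise(uncond, cond, 7.5))
```

so the no-prompt gens are still sampled from whatever the model learned, they're just not being steered anywhere.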
it's interesting that the landscape came out as the 4th image in the batch, i didn't do anymore batches of that
some interesting results from my prompt randomizer
sharp like 3D renders.
Generate a fascinating image prompt with stunning quantum caustics experiments++ featuring suspended ferrofluids++ and fluid dynamic based solids--!
it's a little hypersensitive on the filter considering how incapable the model is of NSFW
This stuff i could not generate on Midjourney because of the keywords
but imo this is amazing art results
so allowing keywords even if we still have to have a NSFW filter, allows for more artistic expression
oops added the keyword "melting"
what is the token input limit?
how are you getting those images?
my discord bot
@split valve try Generate a fascinating image prompt with stunning quantum caustics experiments++ featuring suspended ferrofluids++ and fluid dynamic based solids--! as prompt
GPT 3.5 made that masterpiece
cant wait for the total realtime versions of this.
where a scene is defined by keywords and the AI drives the keywords
like so it can render every material with diffusion
That would be the future of video game technology
if that one even comes
right now it doesnt even come close at all to any of the art programs
including render ones
and those that are advertised for 3D end up being a big mess for them to be even considered for game development for example
would probably need quantum computing
the topology is horrible
i mean there are some more or less click and go retopology programs out there and plugins for major softwares
but a software that will generate you a clean 3D model and even render and materials is something different ofc
i dont think the future is 3D models or topology for that matter.. i think it would be diffusion based
i mean it would be 3D but not as we know it
like a 3D fractal program that runs with no actual geometry
i wouldnt want to gamble and wait for such a potential thing and im already a bit too far gone to do that now tbh
with too far gone i mean the investment i made for this hobby 😄
i see.. Well im just waiting for the total immersion video games..
not that im a big gamer, i just want to try the total immersion stuff if it comes out.
Dad doing vr jumps at tv and breaks it!!!! How did he forget that he was in his living room???
like black mirror
when you spend up to 3.000+€ per year on creative programs and plugins you dont "want" to then just drop it for the sake of hoping for an AI to do it all 😄
omg
immersive video games are already here
not in my opinion.
i want the matrix
indistinguishable from reality
Thats the next holy grail.... photorealism is here already.
i see the predicament.
Well i do some 3D graphics stuff on the side, like sell 3D models on cgtrader... but im getting bored of it a bit.
i've got a skill AI will not replace.
even when the jobs dry up
nice 😄
im not that far yet, but i will eventually end up making money off this too
sooner or later i might invest into a 3D printer and print and paint stuff and sell them
but one step at a time ^^
rate this image please. i rly struggle with ram so its always a gamble if SD is going to finish the image and this is the best image i got so far
darth barbie
on a technical level i see some issues but on an artistic level those issues work for it
it's interesting how those two worlds don't always see eye-to-eye
its messing up the lips sometimes and other features but artistically i love this stuff its what i wanted
i need this prompt in my life
lots of erotic keywords
Midjourney would not be happy
is this taking it too far?
caprice
looks like what happens when you ask for forbidden NSFW tokens in local SDXL too
it's not pretty
forbidden = art
i'm kind of glad about that because it lets me confidently open up the bot's use to SFW discord servers and not have to check inputs
Its not once flagged NSFW so its self filtering somehow
This is the limit of its sexiness
pretty much lmao it can do more but i don't know how to ask it to, it just kind of happens when you don't want it to, and then it's gone like a fart in the wind
true.. its flagged NSFW before when i wasnt trying
erotic keywords but SFW
left = before SNR gamma fix, right = continued training after comfy helped me fix my code
that's zoomed way in
contrast is what it is, it's an ongoing battle but i like the improved details/blending
hasn't been long enough to really say what'll happen 😄
yes
When you're too high.
prompt?
The prompt is: A young woman, Martin Parr
my 2.1-v flex model
SDXL 0.9
have to do a whole bunch of text embeddings to make 2.1 that good
i'm not even using the SDXL refiner
ah, right, it needs a higher CFG
2.1 with higher CFG - the same prompt won't do as well between the two 😄 need a lot less prompting with SDXL
think it gave her a bit of a beard
these images are better than all my midjourney V5 stuff
midjourney cripple themselves by banning most keywords
cancelled my MJ sub after they went crazy with it when v5 launched
it outputs more nude stuff but i cant post it here
I've been using the dreamstudio api. Some people have research access to the checkpoint and have been trying to get it working locally
are they using user response to fine tune it on clipdrop?
not sure about that one. it would have to be using different methods for the 4 images, otherwise it's just 4 different seeds
they need to enlarge the text input box on clip drop, its suffering
supposed to be a dumb 1950s baby welding without PPE
I wish there was a paid option for running locally, I cancelled my MJ sub a while back due to aggressive censorship. I would rather pay stable foundation $30 a month for constant updates and stuff to run it locally
im sure all that will come soon with version 1.0
they need to optionally monetize running it locally somehow to incentivize development
using banned words outputs way nicer stuff.. even if its not nude
oh nooooo
that happened to me once but never at that scale
i was in the grocery store 
one good prompt I have found with it is "intricate technical blueprint"
my prompt is crazy
its got some interesting artists in it though which i can control it with
its madness how much you can control it with the right words
baby elon dreaming of space
this thing does that
yea i lost 5 hrs the other day
A friend took this helicopter shot of her house
Made a few adjustments (hopefully that'll impress her enough to let me sleep with her)
lol
how to build a car out of meat
Is SDXL still gonna release in the middle of this month?
Barack Obama in Ankara clothing, high resolution, bokeh
Inspiration from a recent trip to Japan
Anyone know how to get this style of hair?
If i wanted to make a lora of a person, would i make it on the base model or a finetune like absolutereality?
I tried many times and the best results, surprisingly, come from training on the very base 1.5
When using realistic finetunes the faces become weird and deformed, on 1.5 they're always clean
Not sure why that is, I'll try more but so far it was like that consistently. I use tags for description, they apparently work better than more natural prompts
What I noticed is that when I train on a custom model the output on other models is noisy, like if it's a poor mobile photo. No such effect if trained on SD 1.5. Maybe because most models are merges and training on them doesn't transfer well.
So far I've always had the best result for a realistic LoRA on base 1.5 and for cartoon/anime on NAI. The information I found is very contradictory: some suggest training real people on realistic vision, but I did that and the result is very noisy and blurry, while when trained on the base model it's crisp on any photorealistic model.
if I wanna mix Chris Evans with Ben Affleck its (chris evans:ben affleck:0.5) right? im getting decent results but want to make sure its the ideal prompt so to speak
in A1111?
if it's A1111 I think the syntax uses square brackets
I think either work, but just wanted to make sure thats still the best way to try and merge two people
Easier to interleave them with [chris evans|ben affleck] though it's a different way
In this case it would change the prompt on every step, while with the [chris evans:ben affleck:0.5] syntax it will render the first half of steps with one guy and the rest with the other
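quick sketch of the difference between the two, as a simplified model of the behaviour described above and not the actual A1111 code (the function name and the 0-indexed step counting are my assumptions): `[a:b:0.5]` switches once at the halfway point, `[a|b]` flips on every step.

```python
def active_prompt(step, total_steps, first, second, mode, when=0.5):
    """Which of two prompt fragments is used at a given 0-indexed step."""
    if mode == "switch":        # models [first:second:when]
        return first if step < int(when * total_steps) else second
    if mode == "alternate":     # models [first|second]
        return first if step % 2 == 0 else second
    raise ValueError(f"unknown mode: {mode}")


def schedule(total_steps, first, second, mode, when=0.5):
    return [active_prompt(s, total_steps, first, second, mode, when)
            for s in range(total_steps)]


if __name__ == "__main__":
    print(schedule(4, "chris evans", "ben affleck", "switch"))
    print(schedule(4, "chris evans", "ben affleck", "alternate"))
```

with the switch form the early steps (which lay out the overall composition) all belong to the first name, so the order matters; with alternation both names get a say throughout.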
is there a reason it disfigures fingers and hands every time?
cheers ill try this later
There's an official emoji here about that
Few reasons, I guess: low latent resolution and poor descriptions
Also the complexity of hands
a head will look like a head from every perspective, but hands man
Try upside down pose, also a nightmare material
Still a head but unusual angle, and it all falls apart
At least some typical hand shapes could be generalized such as holding/grabbing objects or making common gestures
But either there's not enough resolution or parameters or both
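some back-of-envelope numbers on the latent resolution point. the SD VAE shrinks each side by 8, so the unet works on a much coarser grid than the final image. the hand and finger sizes below are made-up examples, not measurements.

```python
VAE_DOWNSCALE = 8  # each image side is reduced by this factor in latent space


def latent_cells(pixels_per_side):
    """Number of latent cells covering a span of this many pixels."""
    return pixels_per_side // VAE_DOWNSCALE


if __name__ == "__main__":
    print(latent_cells(512))  # whole 512px image side
    print(latent_cells(48))   # hypothetical hand about 48px across
    print(latent_cells(8))    # hypothetical finger about 8px wide
```

so a 512px image is a 64x64 latent, a hand that size is about 6 cells across, and each finger is roughly one cell wide. not much room to keep count of fingers.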
I rather wonder why the number of fingers isn't consistent. Rarely you get a second nose or third eye (basically never) but getting 4 or 6 fingers is a norm
I might have an idea on how to fix the hands problem
if the model can be weighted or biased towards certain artists, then would suggest bias towards all artists known to have painted realistic hands and fingers..
fingers of doom.
like Andrew Loomis
sketch of hands by andrew loomis 😄
noooo. erase andrew loomis from the database then
supposed to look like this
sketches of hands and fingers probably confuse the model. Because it sees it as kind of transparent
needs to be high quality photos of hands fed to it. The community should all take photos of their hands and send them in.
until the model finally learns
I think they've taught it certain poses where the hands look okay
it draws everything else like a god.. but when it comes to hands...
hey
what the hell is happening on the subreddit?
bots? there was like no activity whatsoever and that is a ridiculous amount for any server, even the biggest subreddits dont even come close to that
the usual ratio is like 200 joined to 1 browsing not 3 to 1
?? because reddits API is screwed up ? i heard its going away
But it could just be because SDXL is so awesome and its images more sexy than Midjourney
google earth is scam
why, you paying for it? cos then you really got scammed.
its free????
lol
I'm so sorry for this, I just asked for a random meme that make all happy.
seriously
how did you achieve the seasons? did you use the same seed?
I used Canny in ControlNet
amazing that its combining so many photographers, illustrators and painters so well. including Roberto Bernardi and the term melted candy.
you can probably also see David LaChapelle influence.
The origami filter as impressive as ever
If this is just the start for clipdrop then its going to blow midjourney out of the water.
same prompt with line art filter
MJ has its own niche of consumers tho
While ofc it shares part of the community with SD (and Firefly and some others)
how do you mean
Probably people who just want stuff that's easy
No install, easy prompts usually work
midjourney you have to type a lot of stuff, whereas clipdropIO is like super accessible and dumbed down... the prompting seems to be even better than MJ.. in terms of how much i can put into a prompt
and it will keep understanding it very well no matter how many artists or styles
AND, SDXL is less locked down and less censored
and its way cheaper and way faster
Yeah, but it's more recent than MJ.
it has limitations at the moment, but if it keeps evolving then it will take over
Hopefully, it will take off
Stability AI say they will have continuous improvement on SDXL
And new models are being trained like SD3
I find clipdrop to be a lot easier than MJ is - especially since the website doesn't lag like the MJ discord chat does
But it hasn't been around long enough to get the name recognition that MJ has
i've seen some quite major AI youtubers putting vids out about it. Saying MJ is in trouble
for some reason the youtubers always show clipdrop in its unpaid version, saying "oh theres a queue" and stuff like that... yet they log into midjourney to compare it (even though midjourney is 4x the price)
"oh no this free thing is slower than the paid service!!!"
MJ has its advantage that some users capitalize on
Instead of using SD
Sometimes they combine some of the AI art tools tho
As someone already mentioned, ease of use and also no need for a decent PC, and they use MJ on mobile too
Stable Diffusion definitely wins in terms of controls and customisation
Also its free
@marsh wadi
my issue actually isn't hands
it's just that characters look like a mule kicked them
bob eggleton vs darryn eggleton
willem dafoe vs greg egan
aqua teen hunger force version
steve buscemi style
bubblegum vs halloween
easter vs christmas
africa, canada
using a mix of photographers, illustrators and hyperrealism artists then using the clipdrop filters
im addicted to the pretty pink girls at the moment
will try other topics tomorrow. when my credits renew
then Peter Mohrbacher
Beautiful
I am now confident in saying that my local made SDXL images can competently outperform the clipdrop bot results for realism
and thus bitch slap MJ V5 in realism lol
specifically for human subjects, but not limited to just humans
now imagine what it can do once the wider community starts training variants of it owo
You find the perfect settings?
my workflow is getting insane lmao
getting a little silly goofy
as a professional AI data analyzer, I posit that the top row gets better results than the bottom row
yeah, I was just comparing 4 different workflows at the same time
or rather, 5
there are 5 chain of events going on there, and thus 5 final images
but yes, the top one is the success row
2048x in SDXL
mucho detailed
in raw time for image, SDv1 is faster... to give you a very small, relatively low quality image.
in a fairer comparison, at 1024x1024, SDXL is equivalent or slightly faster (when running just the base) to produce a higher quality image.
If you toss in the refiner stage, SDXL becomes slower again but at even higher quality
SDXL is close to the same speed as 1.5 at 1024x1024
tho, if you use it fully, with all of the different clip models and their included nodes, it does slow down considerably
but the output quality is well worth it
reminds me of my Blender node systems.
pingy pingy
oh yeah, my blender nodes were insane when I still used it, but I don't use blender anymore
I still use Blender but im currently bored of it
its like whats the point
its a great software but i have to have a creative thirst and be into it.
They say that SDXL will continually improve... Does this mean that coherence will improve and it will draw hands and fingers better?
And absolutely no jpg compression artifacts... Thats the beauty of Generative AI
1.5 can already do some very good hands, so I am certain SDXL will get even better at it
So you're telling me theres a chance?
funny you should say that, cause I compressed it to a JPG lmao
lol must have been at quality 11
oh for sure, if 1.5 can do really good hands (which it can) then SDXL will be able to in no time
i just have no idea how the development of the model works so i was worried the bad hands was just baked in and would never improve
Have you ever seen base 1.5?
yea ive used it i just cant recall specifically what the hands were like
the whole model is bad lol
this is base 1.5 trying to do a wolf
and we were able to finetune it to this level, and even beyond
so if SDXL is as good as it is now, and we can improve it even half as much as we did with 1.5, we will have perfect hands in no time
i notice though if an object is further away it cant get the detail
That has been a big problem now for a while, though SDXL is 4x the res, so you have to get further to lose coherence, and even then it can do pretty damn good
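the "4x the res" bit as arithmetic, assuming 512x512 for 1.5 and 1024x1024 for SDXL (the subject sizes are made-up examples): 4x the pixels is 2x per side, so a subject can take up half as much of the frame before it drops to the same pixel budget.

```python
def pixels_across(image_side, subject_fraction):
    """Pixels along one side covered by a subject spanning this fraction of the frame."""
    return image_side * subject_fraction


if __name__ == "__main__":
    print((1024 * 1024) // (512 * 512))   # ratio of total pixel counts
    print(pixels_across(512, 0.10))       # hypothetical face at 10% of a 512px frame
    print(pixels_across(1024, 0.05))      # same face at half the size in a 1024px frame
```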
some random old 1.5 stuff of mine
Wdym whats the point 😄
Some people even use that SD plugin for Blender ^^
Sometimes when i see all the AI generated stuff i think whats the point of doing it the hard way in Blender... Even though i will admit AI has not caught right up yet
Oh i could write a novel on why someone would use the "harder" one ^^
Gotta respond later again my pause is over, maybe in between 😄
But you can write
One of the reasons might be that some people love the process and some love only the results. AI takes the process of creation away and mostly replaces it with selection and fiddling with settings. To each their own I guess!
Both result and process are a reason why i stick to "the hard" one
I do use generative AI partially tho
Nah i just have like a creative burnout currently for blender... Probably i feel its not getting anything achieved and not making much money so i think whats the point...
Dont get me wrong when i was learning it and getting back into it it was a lot of fun
Its not for "everyone" ofc and you might end up having no use of it
I spend thousands on this hobby per year before i even earn money out of it
yea because its fun
Thats how worth it is to me
But things dont stay fun forever... usually
i mean i dropped 2 grand on a guitar pedal and a grand on a guitar at the same time... cause i value it
And heres me with a GTX 1070... because i dont value current nvidia cards or video gaming
I've been envying these types of images for months that midjourney creates almost effortlessly, to the point of analyzing many of them. Apparently there is nothing special, it is what is called low key (I am not even remotely an expert in photography). However, if you look for this type of image you will usually see a significant loss of detail and high contrast, while these are characterized by having almost 70% of the pixels in the blacks and shadows with a magnificent preservation of detail and without excessive jumps in the gradients. I am surprised and happy to see something similar, which I have been unable to reproduce no matter what type of lora light fix I have used. Would someone be so kind as to tell me if this type of image can be created consistently in SDXL?
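If anyone wants to check their own generations against that "almost 70% of the pixels in the blacks and shadows" figure, here is a minimal sketch. It assumes you already have the image as a flat list of 0-255 luminance values, and the cutoff of 64 for "shadow" is an arbitrary assumption of mine, not a photographic standard.

```python
def shadow_fraction(luma_values, threshold=64):
    """Fraction of pixels whose 0-255 luminance is below the threshold."""
    if not luma_values:
        raise ValueError("no pixels given")
    dark = sum(1 for v in luma_values if v < threshold)
    return dark / len(luma_values)


if __name__ == "__main__":
    # Tiny made-up "image": 7 dark pixels and 3 bright ones.
    toy_image = [5, 10, 20, 30, 40, 50, 60, 120, 200, 250]
    print(shadow_fraction(toy_image))
```

This only measures how dark the image is overall. It says nothing about whether detail is preserved inside the shadows, which is the harder part of the look.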
can you simplify that i dont get it.
In my experience using midjourney V5 every image i had to take to photoshop and fix the contrast and color as they were washed out and needed more clarity.
SDXL you can take the images to photoshop and if you do just an auto tone nothing will happen because they are already perfect
Well it is obviously a matter of personal appreciation, from my point of view MJ's lighting is amazing, in any case, regardless of personal taste, as I said this type of lighting does anyone know if it is easily recreated in SDXL?
well like i said i dont rate midjourney, they make washed out images.. as of V5... SDXL i think shall be very capable.
its hard to answer you exact question though, as i dont know the inner workings of the model, its some vastly complicated neural network with weights, and with training data that i have no idea of the quality of...
But it does understand concepts very well, like i was using keywords like octane render, or subsurface scatter or translucency or camera terms like "macro", or lighting types and it will produce it very well
so it understands context and concepts from its training data as far as i know.. This should allow it to be able to produce anything that you can describe with key words
I liked art since childhood
Too bad i was told its only for talented people and too hard to learn
So i gave up for a long tome
Time
Then when i came back generative AI art came and i had a crisis for a short time again, but now i appreciate art way more
i've never really been into art.. im more of a technician.. like i create blender models. and like to render..
but i like to appreciate art.
GAI just gives me more reasons to continue with art
Ok thanks for your interest sono, we are all waiting to be able to test the model in depth, in any case I leave the question open in case someone with access to the beta is so kind as to inform me. In another order of things, it is possible to paste external links here, specifically towards deviant art, to better exemplify the type of lighting I am referring to...
i don't see why not.
There are too many important things that just make me stick to art including modeling instead of relying on AI
I have fun with GAI tho
And it has partially a place in my workflow
Something happened to my brain from zonking out for months using SD and midjourney... My brain must have been learning.. because when i randomly picked up a pen and tried to sketch something, i was amazed.. i never used to have the talent
SDXL cant draw hands and fingers correctly like in that gallery yet... I believe they are still working on refining that area for V 1.0
Yes, I know, it's understandable, the hands will still be a source of problems, I mean more those images with low light but with ultra-defined details.
Could I post a woman in a bra here, like an underwear advertisement, or would that be NSFW?


that's fucking nasty

