#🏞|general-with-images
you can drop a previous image into the prompt area to get some of the settings for that image back, but it won't populate all the various extensions, controlnet etc
but with comfyui it saves everything used to make the final image
my discord bot lets every user set up their own settings
and then it just keeps them
it's really not that hard to do 😐
though i don't have style presets like a1111 does i've considered it a few times, i'd probably make them global styles for all bot users like SAI does
each user can set up a post-prompt that gets tacked onto the end of every prompt, which we found useful for 1.5 where you constantly need a bunch of crap thrown on at the end
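A rough sketch of the per-user settings and post-prompt idea described above. This is not the bot's actual code: the `UserSettings` fields, the in-memory store, and the function names are all hypothetical, and a real bot would persist settings somewhere.

```python
from dataclasses import dataclass


@dataclass
class UserSettings:
    """Hypothetical per-user settings record; field names are illustrative."""
    post_prompt: str = ""
    steps: int = 20


# In-memory store keyed by Discord user id (assumption: a real bot would persist this).
_settings: dict[int, UserSettings] = {}


def get_settings(user_id: int) -> UserSettings:
    """Return the stored settings for a user, creating defaults on first use."""
    return _settings.setdefault(user_id, UserSettings())


def build_prompt(user_id: int, prompt: str) -> str:
    """Tack the user's post-prompt onto the end of the prompt, if they set one."""
    post = get_settings(user_id).post_prompt.strip()
    base = prompt.strip()
    return f"{base}, {post}" if post else base
```

Usage would be something like `get_settings(user_id).post_prompt = "highly detailed, sharp focus"` once, after which every `build_prompt(user_id, ...)` call for that user gets the suffix.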
oh wtf
dear god LMAO
i forgot i was running a server with a model training
it's just been, uh, going
oh
i wonder if this model has elder magic syndrome now
its just been burning to a crisp
so i have a theory about 2.1
and i'm kind of excited to test it

dawned on me today i'm not exactly working with the same conditions the ByteDance researchers were
hmmmm that model can almost make a clean black square
what the fuck
i think i know what's happening here
the images are 768x768
that picture has a scratch in it
is this model burnt or not
also what painting is this because it haunts my dreams now
bro 
"the girl wearing pearl earring"
i know, that's the real one, the 2nd one
no, that's an old famous painting
theres a LoRA that used this as a sample prompt and when i run it on base 2.1 i get the real painting
like mona lisa
their LoRA is like, holy cow though
totally different image
pretty strong weighted lora if it can override this overfitted hunk of art
also, in the presentation today, Emad said something about SD3.0, i thought SDXL is 3.0?
also, i'm currently installing kandinsky2.1, this is interesting.
it's apparently way overtrained on Midjourney data
should be fun though
i think the demo images looked really cool
i'll make a comparison real quick
hmm..
both are very similar, which is odd, because of the different architecture
i mean, there is kind of a difference..?
restarting now
it doesn't give any ''unable to load model, skipping'' messages, so it is the right model
it's unlikely they're overfitted on the same thing but maybe they are
try other prompts, man
give me, i'll do them, i'm trying to figure out this Kandinsky thing
A tense standoff in a dusty Wild West town.
Wild horses galloping across a dusty plain at sunset, sharp
a handsome man, vaping a massive cloud in a coffee shop, black and white, sharp
A vampire's castle on a stormy night.
An urban alleyway filled with vibrant graffiti
A thriving city under the sea, inhabited by merpeople.
A time traveler stepping out of their machine into an unknown era.
an embarassing family portrait, photography from 1980s kodachrome style realistic
The hidden underground hideout of a superhero.
an epically massive mecha robot in a fighting stance against a skyscraper in Manhattan, destruction, explosions, professional, masterpiece, majestic
A steampunk city in the midst of an industrial revolution.
An advanced space station orbiting a distant exoplanet.
a child smoking a cigarette in a coffee shop, black and white, 1960, kodachrome
whoops, that last one, lmao
it's cooking
this is the same kind of difference like the SDXL bot does
my brain hurts
i'll try to move all my models except kandinsky from my models folder to see if that changes anything
IT STILL LOOKS ALMOST IDENTICAL TO MY MODEL
please
wtf
something broken
there is no way, i don't understand what's happening here
ok newbie question, how do i update to the latest version of controlnet?? or update extensions in general?
there is a button 'check for updates'
it updates =/
i clicked check for updates and it just shows all of them behind, how do they update?
apply and restart
i did that too, weird
extreme curling champ monica lewinski
whoa, kandinsky also has img+img=img
because it's a pixel diffusion model 😄
wait, does this mean that MJ is also a pixel diffusion model? MJ also has that option
no i'm just being silly, you can do that by converting images into latents as well
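A minimal sketch of the "convert both images to latents and mix them" idea, assuming the latents are already available as flat lists of floats. The encode and decode steps are left out, `blend_latents` is a made-up name, and a plain linear blend is only one option (real tools may interpolate differently).

```python
def blend_latents(a, b, weight=0.5):
    """Linearly interpolate two equal-length latent vectors.

    weight=0.0 returns a, weight=1.0 returns b.
    """
    if len(a) != len(b):
        raise ValueError("latents must have the same length")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return [(1.0 - weight) * x + weight * y for x, y in zip(a, b)]
```

The blended vector would then be decoded (or used as conditioning) to get the fused image.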
that's what i was trying to get but what the fuck @ those tiles lmao
i disabled VAE tiling and it's still there. that's gotta be the Controlnet Tiles
is the .9 model in Dreams yet?
you can use the API 😄
@cyan snow here's a weird one. heavy metal was the prompt. that's it
why is there a god damn chewbaccaaaaa
my model has chewbacca syndrome?
for SDXL?
yes
wait, I just looked into it, it's like OpenAI's API, there are no files or anything
yes

horrific accident pile-up on the expressway with the oscar mayer weinermobile
"it's smaller than i remember" 
LOL the truck size is so mini
I feel like i just drank about 3 liters of mcdonald's sprite
it's like the truck on the left is one of those JDMs
either that or that's one HUGE Cadillac
bro, i feel so dumb rn
they sent the bigger weinermobile as backup and it got into another accident. tragic
its going to be okay
it's going to be okay

i think my model also has the cawabanga syndrome xD
my model does not have elder magic
that's a disadvantage
it's just because it's a base model
it's up to you to provide the magic
i suggest trying to fine-tune it with MJ 5.1
you mean, a similar process to what i did with 1.5?
wait, but it's based on 2.1, isn't it
yes
i did it so slowly and carefully that there was not much added or removed
it was just trained on realism
i used the laion datasets with like 150,000 imgs
pulled out Nikon camera pictures and used those
so, it's just renamed 2.1 basically?
ah, i see. nice!
it still kinda does 768x768 but it starts being good at 1024x1024, especially with the CFG rescaling extension i assume - haven't tested that
if you use a negative text embed it gets better. i briefly tested the aether lux one from joachim and it cleaned up all of the remaining residual noise in the random images/seeds that still have it
I still think i'll wait until SDXL releases before i will make another model
that's a month
yeah, i'm not going to be at my country for a month also
grandpa has a bluetooth hand
grandpa has too many fingers
grandpa searching for answers
we must assemble a team of grandpas, like Oceans 71
first we must learn to use the internet
christ this prompt is so strong lmfaooo
Pixar style little girl, 4k, 8k, unreal engine, octane render photorealistic by cosmicwonder, hdr, photography by cosmicwonder, high definition, symmetrical face, volumetric lighting, dusty haze, photo, octane render, 24mm, 4k, 24mm, DSLR, high quality, 60 fps, ultra realistic
Why did you put 60fps in your prompt?
It's an image, it can't have frames per second
i grabbed that prompt from somewhere random
just seeing what happens with these stupid prompts
they also have octane render in there like 3 times
that is truly a prompt that was
a prompt of all time
why put photorealistic in there 
just type "uncanny valley"
SDXL knows
did i get a bad roll jackpot again
shit you might be onto something
just bad seed i think
zoomzoom...
I was just about to ask in this server how a real big post w/ nsfw on the subreddit has been up for 18 hours and right as I was about to ask it got removed LMAO
literally all the subreddit mods were asleep but me but i was in work mode and didn't even process it
YEAH makes sense. I just thought the timing was really funny as I was like just typing up a message asking on it
I like had just pulled up the subreddit and it was there, might've been glitched though
Also, I got Kandinsky2.1 working using a different WebUI specifically made for Kandinsky2.1 and I gotta say, this damn model might be better than SD
it's too realism focused 🤣
it's funny because my pseudo-journey model is like, pure magic
this one is more grounded in reality
pseudo-real is interesting because it is heavily trained on real photos and has very clear images and also evenly mixed with midjourney stuff
so it's more of a middle ground, where you can make funky houseparties and other weird stuff involving pretty well-composed humans
i could say myself, but then i'll get hounded
Inspired by Lemaire 2023 collection.windbreaker,white light gray,Seasonless, Genderless,
Effortless,full body view
also, why is img+img not possible on SD?
at least, there is no extension for A1111 that has that function
what is img+img?
like fuse 2 images together
existing images?
MJ can do it pretty poorly, but hey, its a lot better than not at all haha
exactly what i'm thinking
why isn't there an A1111 extension for it? kandinsky can do it
john cusack 😄
wheres the trust, john.
just hangin out at the park, no big deal
i love how my john cusack looks just like SDXL's
it's like an OpenCLIP thing
wtf
doughnut burger
"bruh moment"
the AI looked at what i want it to describe and just went bruh
dude is popping off
i can't beat this with SD and XL, this was made using kandinsky2.1
i swear man, that model is so hit or miss
like damn, it follows the prompts perfectly
i feel like kandinsky has so much potential
Zoom in guys
so I've been doing like
all day working on a piece of ai art in clipstudiopaint
and it's driving me insane that. even if I repaint every single aspect of the image if I post it as 'my art' without disclosing it started as an ai generated image I'm immediately an awful human being
at least in like, friend group servers or art servers
BUT
let me show off some progress on it bc >:333
wip :3
the kimono is nowhere near done that's just like. blocking out stuff atm
but!! yeas. silly
I plan on replacing the background completely as well
I also plan on adding more shadows and highlights but that'll be towards da end..
OH and I wanna add a hair ornament. and the arm will b added back ofc
anyways. little ramble. enjoy this
so far my general way of generating things was always generate in batches of 4, and do high-res fix from 512 to 1024 at the same time, with a denoising strength of 0.4
yesterday someone showed me a better way of doing things.
generate stuff in larger batches, only 512, then take a good result and selectively apply highres fix to that. but put the initial gen first into controlnet tile, then put denoising strength to 0.7/0.8. that way one gets a much much higher quality and more detailed output, while also seeing a larger variety of output to choose from.
here is an example of my old method (left) vs. the new method (right)
EDIT: nvm, after more testing its a bad method. dont do it. results in extremely overdetailed, fried images.
omg this is amazing thank you
i put it through my own model:
needed a few tries tho
but well, the unfinished SDXL model that the bot uses can beat kandinsky2.1
the only advantage kandinsky2.1 has over SDXL is that kandinsky2.1 can fuse 2 images together
someone on reddit sent me a message that my model is very good at anime, and not so good at photorealism, and that i should focus on the anime part of the model as the people who make it to the top either do very unique stuff or are the best in their field, and currently there is a race to the best realism possible, while there isnt that much competition atm for actual anime
and he said i am not doing myself any favors with the images on the frontpage as my model can create much more beautiful stuff
ill keep it multistyle but ill definitely update the thumbnail and example images with something better
like that dog made out of water for instance
why don't you tell him that
I can give you my model so you can do merges with it, my model can sometimes beat MJ
thats kind of you but i dont merge my model with anything
Merge real life model with anime model and get anime dwayne johnson
well, that means it worked =]
Glad that it did but so cartoony
I mean MJ is
I wonder but up the stairs what is that? Are we in a jar with jars?
Jar in Jar
@cyan snow what is your model name?
Encoder, it's a model I made using calculations for the combination of finetuning a model and then merging it with specific models in order to produce a model most similar to MJ.
o.k. just asking if not using it 🙂
it doesn't know Olivia Chow 
and well, it seems to have worked =], i will do the same process to SDXL after it releases to make an even better model, and this time i might release it.
it is regional prompter, i like that bank
@cyan snow have you used word model for those miniatures in jars? It is magic word imho. When used my own no lora tiltshift it is like charm.
nope, no embeddings, it does it perfectly without a LoRA
i dont mean embeddings, just prompt. model of village in garden for example
I didn't use any prompt weights, I'm not sure what do you mean by ''word model''
ohhh, i get it
no, i didn't use that word
word "model" i mean 🙂
you can also generate a bunch of 512's, take the seed of the good one, and redo it with higher step count. and if you just want detail use the add_detail lora
I made some characters in a t-pose and had an idea to model them in Blender, but I want to use them in VRChat just for the LoLs, so is there any method to go from image to 3D characters?
It does not need to create a perfect result, but a plus would be if I can use it with Mixamo.
OpenAI SHAP-E
Poor fish on the left
Thank you, I will read up a bit on that,.
I wrote "cute feminine clown" and we have to trust AI that soon is smarter than us according to the old media, this is cute?
howdy
i don't know if this is good or not, as i don't watch anime 
Is this too generic for a possible kinetic novel or comic?
this took a ton of inpainting little details, as well as some very slight editing of little details in GIMP, and its still not perfect, but i think it looks good now
Add the word "robot", robot clowns have to be less creepy
lol
smells like cookies
Test: used a pixel art model and then an extension to remove the background, after that resized to 512px again.
Darn. That was a good test, now I see it left a white line that I did not see when I made it.
morning! did someone mention clowns?
NO! 😄
since you asked so nicely 🙂
You should have just sent a picture of Sen, they are the biggest clown I know lol
It's global
yeah, i got it across several partnered servers
wonder if it was an issue that hit their higher priority servers
all of my smaller ones were fine, but the big servers I am in were down
yes, the same
waiting on my new case from newegg and never knew they had a discord, lol
they have had it for years and years
I am like one of the top 50 most active in there, and I haven't touched it in a long ass time lol
This is my first order in 17 years from them on my 20+ year old account.
I just got my new case :>
my new case is today just waiting on UPS
I am trying to find something to print now that I have my 3D printer dialed in again
did the type of printer that uses light and a liquid plastic ever become a thing? i remember seeing a cool tech demo years back, to see the solid object rise out of the liquid
Luckily you can get great ones for very affordable prices
this must confuse the shit out of the AI
what i got my eyes on is very expensive tho but technically i could get more affordable ones too
Even it is not perfect, the remove background is kinda cool
It is an extension for Automatic1111
There are free websites which can do an even better job
oh neat
They are what I use now
Damn, my 3D model I made for printables now has me at top 60 3D artists on the site in the world lmao
a single design
I still can't believe it was my first fusion 360 design, and still by far the best to this date lmao
pikachu if he was in darkest dungeon
oh yeah, I messed a bit with that art style LoRA
i love it
I wish there was a more capable AI to 3D workflow
I got all excited and tried out that Shap E addon, but I didn't realize how severely limited it is
there is with blender
eh, I'd rather not have to use blender anymore haha
I don't care much for Zbrush either
I SAID WISH GRANTED
My favorite 3D workspace has actually become Medium, believe it or not
IMO, its unparalleled for ease of sculpting and intuitive using
"believe not", my choice, final answer
is maya still a thing? I think we used that when I did a year in art college about... 18 years ago?
yeah it is
Medium, the VR sculpting program that Adobe bought out
its been out for years, and its still the best for VR sculpting IMO
i was gonna say you should try TED talks next
It's so much faster and more efficient than 2D projected sculpting
yikes, 2630 dollarydoos a year 😐
Yeah, Maya and 3DSMax are way overpriced
Does anybody know how to update xformers?
It will be so cool in the future when you see your friends work from home through their AI robot.
@oak osprey https://youtu.be/qnTvoAzxnbM?t=598
Adobe Medium
Adobe Medium (Virtual Reality Sculpting Basics). Adobe Medium was formerly Oculus Medium and owned by Facebook. Adobe acquired them in 2019 and recently released their first version under the Adobe banner. There aren't a huge amount of major changes on the outside but lots going on under the hood.
Medium is a VR sculpting app that is probably...
@smoky oak @mild tusk in case you're interested, my st-AI-le model also includes the darkest dungeon art style
here is an example
that time stamp shows really all you need to see
phew! I can buy 16 days of access for only 425 dollarydoos, that's a relief!
its insanely nice to use
i love it
i need to try it
it lora right?
Lora trained to make cool character sprite inspired by Darkest Dungeon. Trained on NAI with 126 pictures. Style lora. Best result with weight betwe...
oops, wrong one
oh
Thats the better one
no i am saying i have a model, like a checkpoint, that among a lot of other styles includes the darkest dungeon style, its not a lora
that the one im using
ohhhhh
a checkpoint with this would be nice
Huggingface link: https://huggingface.co/ai-characters/st-AI-le Multiple characters Many styles One model st-AI-le by AI_Characters If you like wha...
Interesting model
i try doing something different than other people
cant tell you if my darkest dungeon style will be different or better than the two loras tho
I much prefer having lots of individual models that are very good at specific things over having one model that's pretty decent at doing a lot
yeah i heard that a lot lately haha
But there are some people who much prefer not having to mess around with different models
@smoky oak @mild tusk do you want me to do some test prompt(s) for you in my model using said style?
ye sure would be nice to see what amazing we get
well tell me what you wanna see
Oh, no thank you, I'm good. I stick to the models I use daily, just forgot about the darkest dungeon style is all
is it possible to try to do a madness combat one?
i saw but nobody did made one :C
oh i meant if you wanna see something created using my models darkestdungeon style
well do you wanna see something specific?
grumpy old bones full of glitter dust
homer
dr octopus
neat
the skull on the ground and the head in hand were random lol
@smoky oak this image to me should be on the tin for SD 2.0
what is the prompts?
bill engval as a school teacher in a classroom in front of a chalkboard
i didn't know i have to be specific and tell it "with desks of students" 
i really like the expression here
give me a curse prompts
Prompt: a zombie giving a math lesson in a university, dark, cartoon, green lighting, darkest dungeon style
i love it
@smoky oak i could have a whole blog i just post "Images that confuse AI" from the LAION data
same prompts
no idea why there is suddenly a white border
Are you using Clipdrop? I didn't select any style.
flashback
clipdrop?
oh you mean checkpoint?
a handsome mage wizard man, bearded and gray hair, blue star hat with wand and mystical haze
An old wizard's tower, filled with magical artifacts and spellbooks.
A tense standoff in a dusty Wild West town.
Wild horses galloping across a dusty plain at sunset, sharp
a handsome man, vaping a massive cloud in a coffee shop, black and white, sharp
A vampire's castle on a stormy night.
An urban alleyway filled with vibrant graffiti
A thriving city under the sea, inhabited by merpeople.
A time traveler stepping out of their machine into an unknown era.
an embarassing family portrait, photography from 1980s kodachrome style realistic
The hidden underground hideout of a superhero.
This https://clipdrop.co/
oh no i didnt i just took the same prompts but changed zombie with knight
I recommend against using clipdrop, it's pretty bad, and they bold face lie about most of their tools and their capabilities
Maybe for SDXL, but their other tools suck ass
No, they are just bad bad lol
like their upscaler is bold face lies
and it does a frankly terrible job
hey it's me during math class 😄
Their example images are all fake as well
If a company gives me a section that says "Try it yourself", and then feeds me fake results, I am not vibing lol
let me guess. probably hired a web developer to build the site and they were supposed to replace the examples with the actual product images, but they never got around to it? 😄
No, they just frame completely fake and misleading images as something their AI can do, more than likely to sell people their subscriptions
whatever gets those conversion rates up
darkestdungeon artstyle of a zombie (((giving a math lesson in a university))), dark, (((green lighting)))
even with triple emphasis its struggling with that prompt lol
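For reference, in A1111-style prompt syntax each pair of parentheses multiplies attention by 1.1, so triple emphasis is only about 1.33x. A small sketch of that arithmetic (the helper names are made up for illustration):

```python
def emphasis_weight(parens: int, factor: float = 1.1) -> float:
    """Attention multiplier for text wrapped in `parens` pairs of parentheses."""
    return factor ** parens


def explicit(text: str, parens: int) -> str:
    """Rewrite nested-parenthesis emphasis as the explicit (text:weight) form."""
    return f"({text}:{emphasis_weight(parens):.2f})"


print(explicit("green lighting", 3))  # (green lighting:1.33)
```

Writing the weight explicitly makes it easier to push past 1.33 than stacking more parentheses.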
My favorite is how they bold face lie by saying their AI can turn this
Into this
All they did was lower the res of an already high res image lmao
Or this one
that's how those upscalers work
they took some stock photos and cropped the example area to show detail
just like how the controlnet tile page uses a puppy from 64x64 to upscale to 1024x1024
using just the prompt, "best quality" too
i imagine the puppy was originally larger than 64x64
i don't know if it looked exactly like the original puppy, after upscaling all the way to 1024x1024... if you roll the seed or change the prompt to like "masterpiece quality" the fur on the puppy changes
Isn't Stability AI the owner of Clipdrop?
why does that look like trump? must be the square head
ohhh that explains a lot
I tried it myself on their site and it was about as good as stretching the pixels lmao
oh 
Pretty sure
It's not even as good as gigapixel
But, they make you pay to find out what their "detailed" upscaler does
Which is why I think it's scummy that they take these images that are so clearly just downsampled
For upscaling, I use this https://github.com/pkuliyi2015/multidiffusion-upscaler-for-automatic1111
idk, SDXL is a capable upscaler, i think it just needs an update
I use ultimate upscaler, it's basically this, but a little more efficient, and much simpler to use. And it typically produces better results from my tests, especially with the new controlnet made to work with it
I wish that was in its own tab, so fiddly getting it set up correctly in the img2img page
Yeah, that's why I prefer ultimate upscale, much easier to get working, faster, and has controlnet support
I just want to be able to drop an image in and it uses the controlnet tile thingymajig automagically
Ultimate isn't that easy, but it's a lot easier than fiddling around with the sliders in multi
none of this divide by 8 business 😄
Yeah, ultimate doesn't have that
You select your final res target, the tile size, and the pixel upscaler (if you want one)
I have gone to absurd resolutions with ultimate upscale
I'm very used to multi, maybe I should try Ultimate 😁
I did 16k ultrawide on an 8GB GPU with it. I'm confident I could do it much faster and higher quality now
I have an amd gpu with 8gb and can't do more than 1586x1024 with multi...
Can't send the full image, it's like 200 some MB
Yeah, ultimate is theoretically infinite res
All it does is render tiles of the image at the res you provide, and then stitch them together
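A rough sketch of the tiling part of that description, assuming plain non-overlapping tiles. `tile_boxes` is a made-up helper, not the extension's code, and real tools typically also pad or overlap tiles to hide seams.

```python
def tile_boxes(width: int, height: int, tile: int):
    """Return (left, top, right, bottom) boxes that cover a width x height image.

    Tiles on the right and bottom edges are clipped to the image bounds.
    """
    if tile <= 0:
        raise ValueError("tile size must be positive")
    boxes = []
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            boxes.append((left, top, min(left + tile, width), min(top + tile, height)))
    return boxes


# Each box would be rendered on its own and pasted back at (left, top).
print(len(tile_boxes(1536, 1024, 512)))  # 6
```

Since only one tile is rendered at a time, memory use depends on the tile size and not on the final resolution, which is why the target res can get absurd on an 8GB card.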
another example from base res to final crop
I don't have the full res images on my phone ATM
The bad thing about Ultimate is that it seems to add detail that doesn't exist in the original image.
I got this image with 1.5 and ultimate
Please excuse the whisker crust, found a way to stop that from happening recently
That's where the new controlnet comes in
It's made specifically to work with ultimate to prevent that by allowing it to reference the whole original image
that's not a controlnet tho 😄 it is in the suite but yeah it's different
controlnet is in between the blocks and reference is a hack on the block itself
Either way, it's part of controlnet default now haha
All I know for certain is it does a damn good job of making sure that it stays faithful
ever seen loss that low? loss=0.00459
Only in LoRA training, never in dreamboothing or fine-tuning
this is SD2
My lowest I ever got was 0.0023, and the results was trash lmao
interesting
well i'm simply testing a few theories out at once, if it fucks up, nothing is lost
i win either way 😄
I'm going to try Ultimate since multi is super slow (4 minutes per image), but the end result is exactly the same image as the original without any new detail.
4 minutes for an image on an AMD GPU doesn't sound that slow
are you trolling lol
even a vega56 can do them in like 30 seconds
4 minutes is awful
Multidiffusion is really slow, which is unfortunate cause most people know of it over ultimate
scaling to 1536x1024 in 30 seconds? Really? maybe I have something wrong with my env...
i mean normal image gen
512x512 happens in 30 seconds
so if you're just after images, well, 4 minutes is a long time
I can do 512x512 in 20 seconds xD
yeah 🥹 the vega56 sucks now
20 steps
With 1536x1024, I could probably hit that in like 15 seconds on my 3080 using ultimate upscale
rx 580...
@smoky oak remember how hard it was for me to make real people for so long? now i can't make fantasy stuff with the model 
Yeah, its good to see it actually looking fairly good for real stuff. Very impressed and happy about that
im sad i cant make cool stuff anymore 
imagine plugging this shit into a controlnet
or even better, training a controlnet tile model based on this base
i'm so close to just offering to buy the training details
you could try mixing the two models, use weighted block merge?
nope they have a different noise schedule
cyberpunk apparently
if i throw midjourney+++ on there, which works on the pseudo-journey-v2 model 😁 it doesn't do much here
lycoris?
lycoris locon a lora type
same
It's for stuff you don't want in the picture. So if you want a man and it keeps making women, just put woman in the negatives
uh oh, someone stop the photoreal model! it's going creative!
LSD moment
Actually not that far from some of my Synesthetic reactions lmao
it's not great to me but yet at the same time, it's not bad
like, it's kinda hilarious how hard it's trying for papa
" 🥹 i can do magic for you"
"show me"
"strains, strains, strains, farts 💨 🌈"
i have looked up some traditionally epic prompts and all of them come across as very grounded in reality in this model
the elderly magician doing quantum experiments yada yada was just some grandpa with a sparkler
damn, the detail in that image reminds me of DALL-E 2
OpenAI is definitely sampling from the last timestep
anyone know how to upscale an image without the "smoothing" effect?
use a prompt with the img2img that adds more
if you just really want to do it for one image, it's really easy to use the controlnet tile example code and tweak the prompt you give it directly, as A1111 doesn't let you set them separately
@gritty trellis 🤫 🤫
you tunin' it now? 🙂
do you know what the 'sapi only' is referring to for clip guidance in auto?
nm i figured it out (i think lol)
option for the API
< markzuck420> yeah it'll make it sticky like sap from tree
i prompted mark zuckerberg using the computer in a totally normal way
i love the weird apology notes SD can make
Oooooo
You said you wouldn't finetune anything fancy!
And yet; there you are!
ha not a fine tune yet lol, need a bit more time, just raw output
"jesus christ in a ny yankees baseball cap, by claude monet"
😄
i will have to play catch-up to update my training code i guess
SDXL 0.9
I think this 1.5 has been trained on more images of him
DPM++ 2M makes it look so damn good all the time, but hyper-smooth when the prompt isn't like, "good", it's hard to describe
Just how good is SDXL damn! Finally we have a base model that is at par with midjourney.
No need for prompt fluff like "beautiful eyes", "hyper realistic photograph" anymore. Almost as crisp as Midjourney generations. Follows the prompts as well as anything else I've seen ( Midjourney, Kandinsky etc ), Hands and fingers come out well most of the times. Does extreme eating contests pretty well lol ( Bing anyone ? ) Glitch is improved. Man I can go on and on.
and it used to be better! imagine that 😛
Kids these days are having it easy. Remember the good old VQGAN + CLIP days ? Real men had to plough through colab notebooks and put in honest effort to eke out something good. Those were the days 🙂
Was joking of course 🙂 I wish I had the time to deep dive and work on an aesthetic discriminator or something. Such is life I suppose
i'm looking at fine-tuning 2.0 and its surprising how much more fine details it has in it than 2.1
this is 2.0-v, not 2.0-base
it seems like 2.1 is just overcooked or something
think they put like 1.5 million steps on 2.0 to make it
what was 2.1 again? the appealing to outrage model? 😄
I played around with 2.0 too recently! Really liked what I saw. This leopard gecko fella I presume generation is so crisp wow!
i've tuned terminal SNR into 2.0 to get it like this
i've been tuning 2.1 for a while and it had all the same kinds of anatomical deformities but it lost faces. 2.0 does faces better
especially little faces
finer details in general are far superior
the best i can get 2.1-v with fine details is a disappointing kind of patterning state like this
same prompt in 2.0
i will keep it training just to see if i can resolve the deformities and keep the details because i think it looks great. a lot of the prompts between the two behave pretty similarly but there's others that are DEFINITELY better in 2.0
this one actually has a woman that's an astronaut. 2.1 does not, she turns up as some decapitated head inside a cylinder
interesting that this one already doesn't have dupes
2.0 can make toddlers light up in a coffee shop without issue, i had to work so hard to make that work in 2.1 lmfao
even the family photos?? nooo why 
lmfao i brought my fine-tuned OpenCLIP into the fold and they look like a early 00s rap music video
but people look fine now when they do show up solo
hmm, so text encoder trained on photos is A+
SAI probably could have saved themselves a LOT of time
this shit's gonna be great. it'll not be the best thing ever but it'll be soooo different
looks like West World
😮
@split rover you guys should tune the text encoders a little someday 😛
they seem to come with fucking cursed quanta
imagine instead of paying to do 800k steps on AWS you guys just did the same thing i did to the text encoder over just 4 weeks
@oak osprey you here?
I made a post warning about the company that scammed me on Reddit, and I was pretty tame with the post. They found my claims and tried to pull a 180 and defend themselves, saying that I was the problem.
That warranted my response going into extreme detail over all the things that they did, to which they have refused to respond to.
The Post only has eight up votes, but their response has -5, and my response to them has +4, so that alone assures me that I was not alone in my feelings about their shitty practices and deceiving behavior
As a newcomer to the whole AI image generation thing I am more than a bit impressed with this new 0.9!
If only you could have seen the version we had access to a few days ago, it's considerably better than even 0.9, although it is very impractical to run on consumer grade hardware, so their current goals are to train 0.9 to try and match the quality of the one we were fooling around with
These were the results from the refiner paired version (most are mine, some are from other people in the community)
0.9 is still dope, but full refiner was legendary haha
I have also been using it on the Clip Drop website and noticed the quality was a little better on there but I wanted to use different aspect ratios and had a load of credits I was not using in Dreamstudio (I was very let down by Dreamstudio and the credits were just sitting there). Until my personal finances allow me to set myself up with something better than this crappy laptop I think Dreamstudio with whatever 0.9 version they have and my CC sub for gen fill will be ok for me 🙂
Up until a couple days ago, the one on Clip Drop was considerably worse. It appears as though they recently updated the Clip Drop model to be the better refined version of 0.9, however it still doesn't have the refiner model, which is what brings the results to the next level unfortunately.
If they can offer a decently priced SDXL generation service with the full refiner model, I see it being a genuine threat to Midjourney, cause it can even do good text
Why not generate with SDXL in this server? It's free and unlimited
For now, at least
I have tried it but to be honest I get confused with all the scrolling and whizzing about! At least on DS the outputs kind of just stay where they are!
If I could put the bot thing in my own server thing like the MJ one then I would but I don't think I can.
As of right now you can't, but I already have a friend in this server who will be working on a discord server bot deployment for SDXL
can somebody tell me how to upscale an image without it becoming so smooth?
i tried different samplers, different sampling steps amounts, different upscalers, with and without controlnet tile, etc
nothing works
it always is this smoothed out
sure can, what kind of res are you trying to reach?
oh man, that image is cursed the more I look lmao
@cunning geode Meant to reply
@smoky oak thats from higher denoising but thats irrelevant
this was just simple from going from 1024 to 2048
Huggingface link: https://huggingface.co/ai-characters/st-AI-le Multiple characters Many styles One model st-AI-le by AI_Characters If you like wha...
I believe it's their own 1.5 model they trained without mixing
what humblemikey said but that it looks bad isnt relevant here
i know it looks bad because i put denoising to 0.6
its just the quickest example i had available for the smoothing effect
it doesnt matter what denoising i used, it always came out this smooth
did you try a latent upsampler with the high denoising?
i see people upscale images all the time without this smoothing effect
i dont get why it doesnt work for me
i just use controlnet tile with ultimate upscale and realesrgan
i already tried with foolhardy but little difference
again dont focus on the denoising or the bad image quality
this also happens to me with 0.1 denoising and better image quality
it was just the quickest example i could grab
Kandinsky
1.5 finetune
i feel like latent diffusion models (SD) can do better detailing and pixel diffusion models (kandinsky) can follow prompts better
like, i can't get this level of coherency out of 1.5 finetunes and maybe even XL
what's the prompt?
dog made of fire
SD just does a dog on fire, not what i asked it
and kandinsky does just that
this is best i got SDXL to do =[
again, i'm on SD's side, but i gotta admit, Kandinsky might be a good competition
i think i can get the highest level of detail with 1.5 finetunes or SDXL, but the most coherency out of kandinsky
for the most part, I'll probably still make the SDXL finetuned I said I'll make and likely mainly use it, but if i would want something very specific, Kandinsky might be the way
We have been working on Stable diffusion way longer than the Russians are working on Kandinsky, so I refuse to let their model beat SDXL
Well, it seems they already pretty much did, their model is more coherent than SDXL
SDXL
Like, I can't make these kinds of images using Stable diffusion, only using Kandinsky locally. But I'm pretty certain I will be able to make a better model than Kandinsky and MJ after fine-tuning XL.
I've found fire to be a difficult thing to prompt in general
Well, I asked Emad about this and he said that he is confident that fine-tuning XL will produce what could potentially be the best diffusion model yet.
But yeah, both Kandinsky and SDXL are impressive and different models, SDXL isn't able to fuse 2 images together like Kandinsky can
"fusing images" sounds like a thing software tricks might be able to pull off once you're running at home rather than via bot or api
(via eg controlnet, img2img, unclip, etc)
Apparently, the reason the Kandinsky model is able to do that is because it's a pixel diffusion model and not a latent diffusion model.
I don't know offhand what Kandinsky does but, uh, i doubt that?
That might be the reason it can pull off those images I posted earlier
I literally did that today -_-
It worked pretty nicely
Did... what? implemented the source code for whatever their mixing thing is?
The thing I doubt is that it's somehow limited to pixel models only
https://github.com/ai-forever/Kandinsky-2 it's also not a pixel diffuser, it's a latent diffusion model, if I'm not misunderstanding the readme here
I don't know what to tell you. The UI that is made specifically for Kandinsky2.1 has that feature and it works flawlessly
2.1 has different architecture than 2.0, at least as far as I know.
I know the feature works.
It will probably work in SD too, is my point
actually yeah it's literally just multi-unclip lol
that's definitely a valid thing to do in SD
Then why doesn't the A1111 UI even begin to fathom that feature?
This should be a thing as far as I know
because you haven't installed the controlnet extension?
I have, it doesn't do that, but it DOES do something similar.
There's an SDv2.1 unclip model: https://huggingface.co/stabilityai/stable-diffusion-2-1-unclip. Idk if it takes multiple inputs as-is, but if not, it could be adapted to allow that without too much difficulty.
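A rough sketch of what "multi-unclip" could mean in practice: unCLIP-style models are conditioned on a CLIP image embedding, so one plausible way to fuse several input images is to blend their embeddings before handing the result to the model. The function name, the plain-list vectors and the renormalisation step are all assumptions for illustration, not the actual Kandinsky or SD implementation.

```python
import math

def mix_image_embeddings(embeddings, weights):
    """Blend several embedding vectors into one (hypothetical helper).

    embeddings: list of equal-length lists of floats, one per input image.
    weights:    one non-negative weight per embedding.
    Returns the weighted average, rescaled to unit length.
    """
    if not embeddings or len(embeddings) != len(weights):
        raise ValueError("need one weight per embedding")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    dim = len(embeddings[0])
    # Weighted average, dimension by dimension.
    mixed = [
        sum(w * emb[i] for emb, w in zip(embeddings, weights)) / total
        for i in range(dim)
    ]
    # Assumption: rescale so the blended vector has unit length.
    norm = math.sqrt(sum(x * x for x in mixed))
    return [x / norm for x in mixed] if norm > 0 else mixed
```

With real models the vectors would come from the image encoder and the mixed vector would replace the single-image conditioning; the toy call `mix_image_embeddings([[1.0, 0.0], [0.0, 1.0]], [1.0, 1.0])` just returns a vector halfway between the two inputs.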
Then why not A1111
It's the UI with the most features
Well, I'm about to return home and I'll make an organised comparison of 1.5 fine-tunes, SDXL and Kandinsky 2.1
SDXL
Kandinsky2.1
was sleepin
I personally think the Kandinsky won here
DAMN
i can't make this with 1.5 fine tune
as much as i hate Russia for what they did to Ukraine, I gotta admit, they make damn fine models
idk go open an issue on the auto webui github, maybe someone will add it
thanks, i will. also, what model do you think did a better job in the comparison i made earlier?
this is the closest i got to this
huh, i think the SDXL might have beaten it
hmmm, I think after finetuning XL it might be way better than this Russian model, like Emad said
they're both trained on midjourney images 🤣
dude, I literally can't completely beat Kandinsky with the SDXL bot, even after many attempts
SDXL isn't trained on MJ, as far as i know
bear in mind the SDXL bot randomizes settings and models and all
also bear in mind SDXL interprets prompts differently vs. kandinsky, so you might need to change up your prompt to fit your goal
i tried running it locally and got some cool results
trust me, i already thought of that. The images I made running Kandinsky locally were kinda better and took way less effort than getting comparable results using the SDXL bot. but I bet this is the case because the SDXL bot uses low settings and all that
or that but darkerer
(I'm running a singlestage model cause im too lazy to load the refiner stage)
still, I'm not sure if it beats this, both models can be run locally except one is already available =/
I added "dripping acid" to try to go for the style of that, which is neat too
but I'm sure we can make a far better model than both after SDXL gets released and finetuned
ye, finetuning SDXL is super powerful
even a small lora can do a ton
(I've been prepping for a followup post about loras after my last one on bare minimum training test)
still no weight access
base model definitely does cool things with just a bit of playing with prompt
my conclusion is that this is not a fair comparison because as far as i know, Kandinsky is an already finetuned model with different architecture. it might beat SDXL with those images, but after it gets finetuned it will for sure be better than Kandinsky
but time will tell =], we will for sure know eventually
kandinsky is a base model too and it can be fine-tuned
the term "base model" is thrown around pretty loosely, 2.1 is called a base model but that bastard is burnt to a crisp compared to 2.0
as far as i can tell we're fighting all of the extra hundreds of thousands of steps that SAI put into 2.1 for no reason other than to increase their performance on benchmarks
just increase LR and burn it back out of the way
i tried
there's only so much you can do
my best results with 2.1 are still pretty "meh", even though it can make stunning photos of people, their fine details are like "what the hell?" vs 2.0 that has really strong fine details just 1 epoch into terminal SNR fine-tuning
@proud dagger am i correct in the assumption that converting a model from epsilon to v-pred is going to need more than a single GPU?
cuz it really feels like that, lol
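For anyone following along, here is a minimal sketch of how the two training targets relate, using the usual v-prediction definition v = alpha * eps - sigma * x0 with alpha^2 + sigma^2 = 1. Scalars stand in for image tensors and the function names are made up for illustration; this says nothing about how much compute the conversion needs.

```python
def noisy_sample(x0, eps, alpha, sigma):
    """Forward diffusion at one timestep: x_t = alpha * x0 + sigma * eps."""
    return alpha * x0 + sigma * eps

def v_target(x0, eps, alpha, sigma):
    """What a v-prediction model is trained to output."""
    return alpha * eps - sigma * x0

def x0_from_v(x_t, v, alpha, sigma):
    """Recover the clean sample from a v prediction (needs alpha^2 + sigma^2 = 1)."""
    return alpha * x_t - sigma * v

def eps_from_v(x_t, v, alpha, sigma):
    """Recover the noise, i.e. what an epsilon model would have predicted."""
    return sigma * x_t + alpha * v

# Toy check with scalars; 0.8^2 + 0.6^2 = 1.
alpha, sigma = 0.8, 0.6
x0, eps = 0.5, -1.2
x_t = noisy_sample(x0, eps, alpha, sigma)
v = v_target(x0, eps, alpha, sigma)
```

The point is that epsilon and v are two views of the same (x0, eps) pair, so converting a model is a matter of fine-tuning it to output the other view rather than changing the architecture.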
600 steps in on the 2.0-v model it's starting to understand high res properly too. i love how these two still look like some bootleg rap music video
depends how big the GPU is and how much time you're willing to spend lol
it's an A100 80G
if it'll work, i'll just let it run for a long time
but it's feeling like i don't need to go all the way back to 2.0-base to get the results i was hoping to obtain
i gotta tell you, the built in text encoder of 2.0 makes it feel distressing when doing validation tests. everything looks so bad. there's body parts showing up out of nowhere? the faces of the people are ironically amazing compared to the anatomical gore of their arms and legs
175 steps on my custom OpenCLIP ViT-H/14
600 can do brightness now, quicker than 2.1-v picking up the new noise schedule
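The "new noise schedule" here presumably refers to the zero-terminal-SNR rescaling from the ByteDance paper mentioned earlier: shift and scale sqrt(alpha_bar) so the last timestep is pure noise while the first stays put. Below is a pure-Python sketch of that rescaling. The `scaled_linear_betas` defaults are assumed to match the usual SD schedule, and both function names are illustrative.

```python
import math

def scaled_linear_betas(n=1000, start=0.00085, end=0.012):
    """Assumed SD-style schedule: linear in sqrt(beta), then squared."""
    lo, hi = math.sqrt(start), math.sqrt(end)
    return [(lo + (hi - lo) * i / (n - 1)) ** 2 for i in range(n)]

def rescale_zero_terminal_snr(betas):
    """Rescale betas so the final timestep has zero signal-to-noise ratio."""
    # Cumulative product of alphas, stored as sqrt(alpha_bar).
    alphas_bar_sqrt = []
    running = 1.0
    for beta in betas:
        running *= 1.0 - beta
        alphas_bar_sqrt.append(math.sqrt(running))

    first, last = alphas_bar_sqrt[0], alphas_bar_sqrt[-1]
    # Shift so the last value is exactly 0, then scale so the first is unchanged.
    scaled = [(a - last) * first / (first - last) for a in alphas_bar_sqrt]

    # Convert sqrt(alpha_bar) back to per-step betas.
    alphas_bar = [a * a for a in scaled]
    alphas = [alphas_bar[0]] + [
        alphas_bar[i] / alphas_bar[i - 1] for i in range(1, len(alphas_bar))
    ]
    return [1.0 - a for a in alphas]

old_betas = scaled_linear_betas()
new_betas = rescale_zero_terminal_snr(old_betas)
```

After rescaling, the last beta is 1, meaning the final step carries no image signal at all. That is the usual explanation for why a model fine-tuned this way can finally make properly dark or bright images.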
it's like being right there
try to get him to show up in "my cousin larry's dirty room"
berliners unite
2.0 fine-tuning is fun
i know just how to treat it 
and somehow its results are more creative
it needs a run with the 100k images of hands dataset through the unet, it seems. the text encoder already had that done, but the unet doesn't seem like it has
Predator 2023 The Movie 🎬
question: what does the restore faces option mean? does it just fix the face?
There are several ways to go about it; utilizing high res fix properly can help, but I'd say the best thing to do would be to properly learn how to use ultimate upscale with controlnet tile upscale
There is a big possibility it's so bad cause you are on AMD
I remember @dense tapir having almost no success with CTU on his 1060, likely cause it doesn't have tensor cores, and Xformers and other optimizations made it extremely inconsistent
Yeah, the lack of TC is a killer
ultimate upscale I can use though with real
sadly, 2.x has no upscale via controlnet
we lack style, tile, and there is one more I would have loved to have. The one you colour in things and can change just it. Forgot its name now.
@smoky oak Well, my new 1300W psu will be here on Saturday so that is everything now. New CPU, ram, SSD, case, and PSU which leaves saving for a new card.
im not on amd. i have a rtx 3070
I am lucky as my case went up yesterday 21 dollars which, as I reviewed it, makes it not worth it. Buy one of the name brand ones. At 89 it was a steal
Oh, I could have sworn you had the rx580
3070 can handle all that
Well in that case, I can only assume the model is not playing well with upscaling, cause you have every ingredient
nah lol this was deliberate
Did you see how the reviewers are ripping into that 4060? I swear it is the worst so far.
this is what someone did
this is me using exact same settings
for some reason it comes out worse for me
only differences are him using vlads and not a1111, and using deliberate v1 instead of v2
otherwise literally same
i dont get it
A 3060ti is faster by leaps and bounds over the 4060 and the 3060 is slightly slower but has 12gb of ram
Difference between V1 and V2 are huge, and they likely inpainted a lot
Jayz, lol "The RTX 4060 is the strongest argument to buy AMD"
Honestly tho lol
The 4060 is a 4050 and 4060ti is a 4050ti. smh
As almost all reviewers have said the only card worth a damn this gen is the flagship 4090.
4080 is good just overpriced to the point of making it bad value so is classified as junk
the rest is just buy last gen or wait for next gen and hope
my fear is that next gen they are going to be sneaky with their neural compression and reduce the 5090 in vram to get people to buy the pro cards when it should all be going the other direction not less.
I already see Jensen warming up his cut down baseball bat getting ready to knee cap stuff.
no it was just normal controlnet+ultimate upscale upscale, no inpainting
it was a test to compare to my output afterall
and i was the one using v2
You don't really know if they inpainted or not, it's something that happens before the upscale, and is over written by the upscale
no, i gave them my initial txt2img gen and asked them to run an upscale with the same settings as i did
Ok, then IDK, seems like you know every single thing that isn't wrong here
yeah i just dont know whats wrong
someone did 1 to 1 the same thing as me, and gets completely different and better results
its so weird
something must be borked with my install but i already reinstalled it
What are your starting ARGS for SD?
ill watch some youtube video, see if i did something wrong during the installation process
just xformers and medvram
yeah, med is 6 and low is 4
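As a tiny sketch of that rule of thumb: `--xformers`, `--medvram` and `--lowvram` are real A1111 launch flags, but the VRAM cut-offs below are just the numbers quoted in this chat (low for about 4 GB, med for about 6 GB, nothing extra at 8 GB and up), and the helper itself is hypothetical.

```python
def suggest_launch_flags(vram_gb, use_xformers=True):
    """Pick A1111 command-line flags from a rough VRAM size (chat rule of thumb)."""
    flags = []
    if use_xformers:
        flags.append("--xformers")
    if vram_gb <= 4:
        flags.append("--lowvram")   # heaviest memory savings, slowest
    elif vram_gb <= 6:
        flags.append("--medvram")   # moderate savings
    # Above 6 GB: assume the model fits without a memory flag.
    return flags

# Example: build the line that would go into COMMANDLINE_ARGS.
args_for_8gb = " ".join(suggest_launch_flags(8))
```

So an 8 GB card like the 3070 would, by this rule, run with `--xformers` only; dropping `--medvram` is a cheap thing to try when comparing outputs against someone else's setup.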
I ran full on my 3060ti with 8GB, and I never had any problems
i watch stuff on the side
Same deal as I watch but no way would I train on 8 and watch anything
Still, I watched YouTube videos on the side and had no problems
Oh wait, I run my Videos on iGPU
see
Have you ever tried that? @cunning geode
I offload all of my non high performance things to my iGPU to save performance on AI
watching YT is fine even on my 6 but turn off the browser's hardware accel or POOF goes the vram
taming 2.0 🙂
I find it so shockingly interesting how you point out that 2.0 is better than 2.1, cause damn, the results in DM's are a hard agree lmao @oak osprey
some prompts it does shit like that but they're surprisingly few
and also i am as surprised as you
It really does look like 2.0 was much better trained, then they slapped 1.5 million steps of shit on top and killed it
2.0 trains easier than 2.1 too. 2.1 gave us v_parameterization.
well GA is somewhat right in that they were fighting OpenCLIP's cursed quanta
but it just took idk 2 weeks of fine-tuning on a single 80G GPU for that to be fixed up pretty well
it looks like i can do another 2 weeks of it on a new photo subset
"batman i got myself a pokemon"
TheLastBen had issues training 2.1 for the longest time on colab because, as he said, they broke shit.
Emad was here briefly, asked him why no fine-tuning of OpenCLIP or CLIP is done, no answer. i never get an answer on that
2.0 and 2.1 bolts seem like unfortunate byproducts of a company reaching too far in a field that was not properly understood at the time.
That feels like they're not making the same mistake with SDXL, and instead are taking their time to figure shit out
*both
can someone tell GA that 2.0 has v-prediction /and/ an epsilon model
Not going to give them any credit for anything I need to see results not hype so I am waiting with hope.
@smoky oak yeah i'm glad i went on this quest because i understand our progress and what makes it stall out, much better now
2.0 has v-prediction, and an epsilon model, apparently
maybe just know 2.1 has to have it on when training
2.1 also has an epsilon model that breaks when you enable v-prediction
2.0 also has the 512 model while 2.1 doesn't
2.1-base and 2.1-v are somewhat related but technically different models, as 2.1-v was fine-tuned from 2.0-v which is fine-tuned from 2.0-base
but 2.1-base is fine-tuned from 2.0-base
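The family tree as described in the last two messages, written out as a small lookup. The checkpoint labels are just the shorthand used in this chat, not official file names, and the helper is illustrative.

```python
# Parent of each checkpoint, per the description above (None = trained from scratch).
PARENT = {
    "2.0-base": None,
    "2.0-v": "2.0-base",
    "2.1-base": "2.0-base",
    "2.1-v": "2.0-v",
}

def lineage(name):
    """Return the chain of checkpoints from `name` back to its root."""
    chain = []
    while name is not None:
        chain.append(name)
        name = PARENT[name]
    return chain

def fine_tune_depth(name):
    """How many fine-tuning stages sit on top of the root checkpoint."""
    return len(lineage(name)) - 1
```

`lineage("2.1-v")` walks back through 2.0-v to 2.0-base, which is the "over-trained upon over-trained" stack being complained about, while 2.1-base is only one stage away from 2.0-base.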
I genuinely think a lot of the fuckery of 2.1 was from the inconsistent training at 512 then 768, on top of them training it poorly with a frozen text encoder
2.1-v is so much trash because it has been over-trained upon over-trained
We agree
@smoky oak most of the training was just 256x256
and the 2048x2048 was stupidly cropped to 512x512 randomly
all or none not mostly 512 then let's throw 768 on top of it.
Pseudo has been showing me some incredible 2.1 fixes, now that he is correcting their fuckups, and it looks amazing
they trained about 550k steps on 256 and then 800k steps on 512 and then another 800k or so on 768, and 1.5 fucking million steps on 2048x2048 crops to make the x4 upscaler
burn baby, burn
disco inferno
Well, hopefully he has fixed giraffe neck syndrome that affects even animals.
he did
that was the first symptom of 768 on top of 512
they needed to evenly split batches up between all of the aspect ratios, is the real issue
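A minimal sketch of what splitting batches across aspect ratios can look like (often called aspect-ratio bucketing): each image goes to the bucket whose shape is closest to its own, every batch is drawn from a single bucket, and batches are interleaved so no one shape dominates a stretch of training. The bucket list and function names are made up for illustration; this is not SAI's pipeline.

```python
import math
from collections import defaultdict
from itertools import zip_longest

# Hypothetical training resolutions (width, height).
BUCKETS = [(768, 768), (640, 896), (896, 640)]

def nearest_bucket(width, height, buckets=BUCKETS):
    """Pick the bucket whose aspect ratio is closest (in log space) to the image's."""
    ratio = math.log(width / height)
    return min(buckets, key=lambda b: abs(math.log(b[0] / b[1]) - ratio))

def make_batches(image_sizes, batch_size, buckets=BUCKETS):
    """Group image indices into single-bucket batches, interleaved across buckets."""
    grouped = defaultdict(list)
    for index, (width, height) in enumerate(image_sizes):
        grouped[nearest_bucket(width, height, buckets)].append(index)

    # Chunk each bucket's images into batches of at most batch_size.
    per_bucket = [
        [(bucket, ids[i:i + batch_size]) for i in range(0, len(ids), batch_size)]
        for bucket, ids in grouped.items()
    ]
    # Round-robin over buckets so the shapes alternate during training.
    batches = []
    for row in zip_longest(*per_bucket):
        batches.extend(batch for batch in row if batch is not None)
    return batches
```

Because every batch has a single target shape, portrait and landscape images get resized to something close to their real proportions instead of being cropped to a square. Square-only cropping is the usual suspect for stretched necks and cut-off heads.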
i don't know if he's meaning that giraffe neck happens in square images, too
i don't think ive noticed that
He did, and faces look wayyyy more legible, and look much better from afar, as well as hands, and body proportions





