#🏞|general-with-images
Comfy UI looks neat. I'm not sure how much I feel up to fiddling with nodes, but the power is appealing
And the efficiency
2h i'm working on
you need an inpainting model for inpainting, I suggested outpainting tho... it adds new things to your image, expanding it to the sides
where did you get this photo of me
It is Gerard Depardieu, a famous French actor
guys, you gotta try genning the new model with 8 steps of DPM++ 2M SDE Karras, it's so fast and good
that sampler is broken
if you go look at #sdxl you'll see the devs mention they're using it to break the bot's outputs
it generates incorrect images, which you probably can't tell in your result there
try DDIM
Funnily enough - you chose exactly the perfect image to hide the issues lol
the key issue with broken samplers is weird colorations and splotchy oversaturations, which is... exactly the intended content here
xD
one more day and we can play with XL again lol
I mean the bot and Clipdrop are open for the moment if you are itching to use it haha, but yeah, it should be going up on the API and platform tomorrow
Come to the Dark Side! We got Nuts!!!
I thought it was off Clipdrop til tomorrow, didn't know it was back. noice.
Working on a next generation Starship replacement 🤣
That version there sucks, I'm starting over
Where does the flag come from 
some prototypes
Exams incoming...
can i run a sd2.1 model with a 1.5 lora?
Dude, you just gave me an idea. I'll try to get mini Shrek, LOAB, and elder magic in the same photo
Yes
ty
shrekuamole
Anybody know if there is a way to turn down the image preview res in SD for inpainting? Cause I have a 4k image loaded in there and it lags my whole UI when trying to scroll or move sliders
it doesn't lag for img2img, but it DOES for inpainting
Wow, that NAILED the subsurface soft-serve texture
ArtiusV21 is really good.
The fact I can generate 1280x1280 images is insane to me.
1280x1280 at 1.40/s.
Wow, that's Artius v2?
Color me impressed. Thought it was SDXL, but I did think it messed up the cone
but I don't even care, cause that ice cream surface is so damn good
Looks very very good
cryptoid hunting without negatives is wild, man
Is it a hunt if they are so obvious? lmao
@smoky oak when i showed you this a while back you were super impressed with it too, that's Artius
native high res is pretty good on there
I wasn't impressed with it, cause of the duplicates, but it is cool how well it can not kill details it seems
the background you were
though I remember it still having the crunchy 2.1 look when I tested
maybe, were you trying 768x768? they look weird
Does the CFG number affect stuff like loras and hypernets
Can you create allah pic
I think it was a kid who drew it
And the prime minister saw it and said "This will be our new flag!"
IIRC
I meant like on the image lol
is it normal that it takes so much time to switch to a new model
Yes. It should only take a couple seconds.
i read online that it should be included in the prompt but i saw youtubers having it in their ui
got it
Negative prompt: Disfigured, Mutated, Multiple Arms, Legs, Fingers, Toes, Limbs, Ugly, Multiple beaks, Multiple eyes/lenses, Duplicates.
Steps: 20 | Sampler: DPM++ 2M SDE Karras | CFG scale: 6 | Seed: 1270141109 | Size: 1280x1280 | Model hash: 4ee4331304 | Model: artiusV21_v21 | Clip skip: 1 | Version: 4867daf | Parser: Full parser
It's not when something one thought would work didn't, it's when something one thought wouldn't work… instead did work. :P
I learned something new today by the way ;P
me too. i can now make realistic hands
well, at least I'm past that as well, it was such a rough road from 1.4->1.5->2.0->2.1->1.5 :P
whoa, is this a new AI crypted
that's houseparty shrek
this is a perfect example of something 2.1 makes
well good news bucko, it is
i knew it, i don't know how to explain it, but you can immediately tell when an image was generated with a model based on 2.1
because you can tell it's the best we'll ever be at AI images
I know a couple of ways, but I believe we all know which they are in the end ;P
idk man, i bet if i prompted it with 1.5 or SDXL, it would be entirely different
probably. i bet if i went to my photo model it would be better too lol
this WIP checkpoint has some contrast issue i'm working out
2.1 is kinda sometimes in the "uncanny valley"
that's the best explanation i can think of
like, with 1.5 models you can often fool people that the image isn't made by AI, but that's much harder to do with 2.1
looks pretty real to me
is there a way to fix the fingers?
hey bro, nice settings you got there =]
also yes, loras and hires. fix
1.5 has a lot more painting/digital art styles. Haven't seen 2.1 do much of anything but photo stuff. And when it comes to photos, 1.5 also has a lot more specific styled realism than 2.1.
2.1 has stuff which is better than 1.5 that 1.5 doesn't have, but I don't think a lot of people enjoy that "style" so it's seldom even thought of.
what lora? i am using a character lora
reminds me of the "name one thing in this photo"
that is uncanny, i can still immediately tell this is 2.1
it's a prompt, check out my cousin larry in his dirty room, he is a damn mess
i got it from Emad's twitter
best upscaler?
no image the ai creates is so real that nobody will ever notice something weird about it. I mostly think about how many people want to create X type of style in the first place. That's why 1.5 stuff is so much more popular: it has everything people want from an ai. Read into it how you'd want, but it's not about what a single person wants, but what most want. :P
i usually use 4x_uniscale_soft. that upscaler is decent for pretty much anything
hmm ok i will try it
wait where is it
of those I'd say it's ESRGAN_4x for detail, and Latent for the rest as long as you have high extra noise
i don't see it in the chat
it's a custom upscaler from the upscaler wiki
ty
but yeah, ESRGSN is also pretty good
i personally use uniscale_soft
never disappointed me
fuck
ESRGSN moment
i went overboard with the upscaling lol
No problem, but I'd also say that there's no one answer. I recommend you try them all and get a feel for which might be better at what. They are all useful as long as you know how they work and what they make :D
i usually set it to 1.5 upscale and 0.55-0.6 denoising strength
that should do the trick for weird fingers
there's an extension named ultimate-upscale-for-automatic1111 and it can do wonders, if you learn it. Something I haven't, but people at least say it's really good! :D
wow
you generate 512x512 and upscale it to 1024x1024?
still some weird fingers but it is still majestic
does the native 1024x1024 not give it away @cyan snow ?
i love ai
any idea how to solve this? I get ERROR when I activate the "only masked" on inpaint
hell no, i do 1024x768 then upscale it to wallpaper. i can do this because i optimized my inference and i have a 4070ti
i will try downloading it ty
yeah, me too. i never tried it before
i am using an 8gb rtx 2060 super ;-;
if you use the sampler DDIM then try another one, if not, try another one as well, and if that doesn't work, then I got no clue from that image alone, sadly :(
btw where can i find the denoising strength option
what is DDIM?
then you should ask someone with a similar GPU, i know the best settings for 40xx cards, i have no idea how to get most out of a 2060 super
how many hires steps?
the sampler for the images. Sadly, if you don't know that part, it's really hard to help. :(
you can see what i use in the image i just sent
oh yeyeye, i'm gonna go see
what is in the training set that it can make these messy areas LMAO
it creates the first "pass" of the image at one size, then upscales it to another, larger size with an upscaler
it's like a combo of txt2img and then img2img
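The two-pass idea can be put into numbers. This is a minimal sketch, not A1111's actual code: `hires_fix_plan` is a hypothetical helper, rounding the target size down to a multiple of 8 is an assumption (SD latents are 1/8 of the pixel size), and the step count assumes the diffusers-style img2img convention where only about `steps × denoising strength` steps actually run (A1111 has its own setting for this).

```python
def hires_fix_plan(width, height, upscale, hires_steps, denoise):
    """Hypothetical helper: plan the second pass of a 'hires. fix' run.

    Pass 1 is plain txt2img at (width, height). Pass 2 upscales that image
    by `upscale` and re-denoises it, like img2img.
    """
    # Assumption: snap the target size down to a multiple of 8 (latent = pixels / 8).
    target_w = int(width * upscale) // 8 * 8
    target_h = int(height * upscale) // 8 * 8
    # Assumption: img2img only runs the last `denoise` fraction of the schedule,
    # as diffusers' img2img pipeline does.
    steps_run = min(int(hires_steps * denoise), hires_steps)
    return (target_w, target_h), steps_run


# Settings similar to ones shared in this channel: 1024x1024, 1.5x upscale,
# 27 hires steps, 0.55 denoising strength.
print(hires_fix_plan(1024, 1024, 1.5, 27, 0.55))
```

Lower denoising keeps more of the first pass (fewer steps re-run); higher values let the second pass redraw more, which is why 0.55 to 0.6 is suggested above for fixing fingers.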
damn no work, F
fuck it generated two raidens
Ei and Makoto maybe
hmm maybe the generator knows genshin lore
hires is one of the greek gods. like hermes
damn, anyway, the best advice i can give you is: if you ever buy another GPU, try to get the latest-gen CUDA one
get an H100
woah, what a nice tip. of course, if i had the money for a new gpu i would buy it
bro they are new, don't confuse them
:|
i should try adding 1girl to the prompt
i don't think that does anything
yes, and add 2 girls to the negative
it does, trust me, i know
most models it does not, and just need better aspect bucket training
if you use an anime checkpoint trained with that list of tags, then adding solo will probably be enough as a normal prompt
isn't that something that's actually in that game?
i remember there being 2 girls like that in genshin
i am using MeinaUnreal
yes ei and makoto
makoto is dead tho
died like 500 years ago
you are probably thinking of ei and yae miko
no no, there was the chick that lives in her own world, and the one who is her puppet or something. they both looked the same
man i love ai
don't know which one that is, but most anime-styled checkpoints use the best quality tagging, from my knowledge :P
raiden ei and raiden shogun? that would probably not make sense because they share the same body
ALMOST five fingers which is good enough
now i will watermark it with my name
highres fix is a thing I use to…fix fingers, it's not a 100% sure thing, but better than not using it :P
lora:raidenshogun1-000009:1, raidenshogundef, upper body, smile, blush, outdoors, day, simple background, blue sky, short hair, sky, temple, looking at viewer, stairs, mountain, moody lighting, facing viewer, 1girl
Negative prompt: lowres, ((bad anatomy)), ((bad hands)), missing finger, extra digits, fewer digits, blurry, ((mutated hands and fingers)), (poorly drawn face), ((mutation)), ((deformed face)), (ugly), ((bad proportions)), ((extra limbs)), extra face, (double head), (extra head), ((extra feet)), monster, logo, cropped, worst quality, jpeg, humpbacked, long body, long neck, ((jpeg artifacts)), deleted, old, oldest, ((censored)), ((bad aesthetic)), (mosaic censoring, bar censor, blur censor), 2girls
Steps: 28, Sampler: DPM++ 2M Karras, CFG scale: 8, Seed: 2409588786, Size: 1024x1024, Model hash: d1c96253ed, Model: meinaunreal_v3, Denoising strength: 0.55, Hires upscale: 1.5, Hires steps: 27, Hires upscaler: ESRGAN_4x, Lora hashes: "raidenshogun1-000009: 8c2e28fdf17c", Version: v1.3.2
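If you want to compare settings like the ones above programmatically, the comma-separated parameter line is easy to split into a dict. A minimal sketch, assuming the `Key: value, Key: value` layout shown above, where a value containing commas or colons is wrapped in double quotes (like `Lora hashes`). `parse_params` is a hypothetical helper, not part of any UI; the `|`-separated line posted earlier comes from a different UI and would need the separator swapped.

```python
import re

# `Key: value` pairs; a value is either a double-quoted string (which may contain
# commas or colons) or everything up to the next comma.
PARAM_RE = re.compile(r'\s*([\w ]+):\s*("(?:\\.|[^\\"])*"|[^,]*)(?:,|$)')


def parse_params(line):
    """Parse a comma-separated generation-parameters line into a dict of strings."""
    params = {}
    for match in PARAM_RE.finditer(line):
        key, value = match.group(1).strip(), match.group(2).strip()
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]  # drop the surrounding quotes
        params[key] = value
    return params


line = 'Steps: 28, Sampler: DPM++ 2M Karras, CFG scale: 8, Lora hashes: "raidenshogun1-000009: 8c2e28fdf17c"'
print(parse_params(line))
```

All values come back as strings; convert `Steps`, `CFG scale` and so on yourself if you need numbers.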
i use hires
yeah, without it the AI fucks up 40% of the time, with hires it's like 20%
OH hires means high res
yes
bro, we are complete opposites with AI, we make the most different kinds of art
this is sdxl?
nope, the 1.5 model i made
because my ass 8 gb vram card will never do something like that
oh good job
i bet it can, with lower settings
it's not the vram, it's how well you've learned how to use the ai, you'll get there in time don't worry! :D
i am starting to think to switch from --medvram to --lowvram
yeah man, imagine what model i will make when SDXL releases
none because your system won't be capable 
i didn't include ass shot in the prompt but the ai can read my mind... :)
what makes you say that
i am getting annoyed by the fingers
should i lower this?
everyone writes their prompts different, mostly because of different checkpoints used, but also because of the style one wants, opinions etc.
I'd write the entire prompt like raiden girl, solo, portrait, blush, outdoors, day, short hair, buddhist temple, pov, step by step, Fiji mountain myself, for example, with no negative prompt at all. I only add negative words when I notice a consistent error or something I don't want, after I get it.
But I also probably don't want to create the same thing as you so what I showed might be no good either :P
one with 16 hires steps
i have to add the keyword of the lora
like, even the base unfinished SDXL 0.9 can compete with this, so imagine what it would be if i do the same thing i did to 1.5 to SDXL
not if you include it by calling it with the <lora:> thing
no i have to include both
anyways it won't really affect the fingers that much
well, I never do and it seems to work, but I haven't used loras all that much yet so I might be missing some details around them :)
it is a character lora soo
you can also stack loras, you can use lora:add_detail and lora:good_hands and whatever lora is for that character and it shouldn't affect the speed
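Why stacking need not slow things down: a LoRA is a low-rank update `B @ A` added to an existing weight matrix, scaled by the multiplier you give it (the number in `<lora:name:weight>`, possibly times an alpha/rank factor depending on the trainer). A minimal sketch with plain Python lists and hypothetical toy matrices; once the deltas are folded in, the layer has the same shape as before. Implementations that instead apply each LoRA on the fly pay a small extra cost per LoRA, which may be what the "sometimes affect speed" remark refers to.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def apply_loras(weight, loras):
    """Return weight + sum(scale * B @ A) over (B, A, scale) triples.

    B is (out x r) and A is (r x in), where r is the small LoRA rank.
    The input `weight` is left unchanged.
    """
    merged = [row[:] for row in weight]
    for B, A, scale in loras:
        delta = matmul(B, A)  # full-size update built from two thin matrices
        for i, row in enumerate(delta):
            for j, d in enumerate(row):
                merged[i][j] += scale * d
    return merged


W = [[1.0, 0.0], [0.0, 1.0]]
detail_lora = ([[1.0], [0.0]], [[0.0, 1.0]], 0.5)  # hypothetical rank-1 LoRA at weight 0.5
hands_lora = ([[0.0], [1.0]], [[1.0, 0.0]], 1.0)   # hypothetical rank-1 LoRA at weight 1.0
print(apply_loras(W, [detail_lora, hands_lora]))
```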
also, character loras are usually less good than character embeddings
what does character embeddings mean
it's a different kind of model, one that usually does specific characters very well
where to find them
no idea if this is raiden, the only raiden I know of is the Highlander one from the first Mortal Kombat movie :P
then filter to textual inversion
it will show you embeddings
then stick with the loras =/
it's basically the same thing, except loras can sometimes affect speed
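One way to see how an embedding differs from a LoRA: textual inversion leaves every model weight alone and only learns a new input vector for a trigger word, which the text encoder then reads like any other token. A toy sketch under loud assumptions: `embed` and `add_embedding` are hypothetical, tokens are whole words (real tokenizers split into subwords), and a real embedding may use several vectors rather than one.

```python
def embed(tokens, vocab, table):
    """Look up one embedding vector per token (toy whole-word tokenizer)."""
    return [table[vocab[t]] for t in tokens]


def add_embedding(vocab, table, trigger, vector):
    """Register a learned vector under a new trigger word.

    In textual inversion only `vector` is trained; the existing table and all
    model weights stay frozen, so this returns new copies instead of editing them.
    """
    new_vocab = dict(vocab)
    new_vocab[trigger] = len(table)
    return new_vocab, table + [vector]


vocab = {"a": 0, "girl": 1}
table = [[0.1, 0.2], [0.3, 0.4]]
vocab2, table2 = add_embedding(vocab, table, "raiden-ti", [0.9, 0.8])  # hypothetical trigger word
print(embed(["a", "raiden-ti"], vocab2, table2))
```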
idk man, it's pretty cool
Yes, I know you think that. I implied that when I stated you were interested. I asked why.
why not? it's just a prompt i like
way to low-key imply you hate it
That's a passive-aggressive reflex of yours. I'm asking him.
Yes. Ask yourself and answer: Why?
i can create whatever i want and there is nothing you can do about it, that's the whole point of stable diffusion. i don't need to take instructions from people, i can make whatever kind of art i want, i don't judge anyone here.
You are reading what I'm saying utterly incorrectly. There is no judgement. Tell me why.
high clip skip can be weird :P
well, back to secure normal clip skip instead :P
hi whats the diff between Clipdrop and Dreams websites?
DreamStudio doesn't have 0.9 til tomorrow (i think)
so right now clipdrop has the newer model
ok thnx so SD owns 2 websites?
I just want to support the team so are both Dreams and Clipdrop owned by them?
ok thnx
is that what i just made but in a comics style?
yea
it's what my pseudo-journey-v2 model biases to when i don't prompt for a specific style
oh, that's weird. so it's a comics model? the prompt i used doesn't say anything about a cartoony-ish style
its just a weird model lmao
it can do a lot
it has biases toward Midjourney 5.1 style, that's all
i can give you a copy of mine if you want to generate that kind of image
oh i just like riffing on concepts
if that's what MJ would make then there is no need for SDXL, my model is also meant to be almost identical to MJ.
well it's a lot of the 2.1 weights. when i do the MJ prompts on that model, they look totally different
it's kind of sad because i wanted more of certain stuff out of it, but it's not represented often enough. like i can't make a kid version of Keith Richards, but there's a fair number of images of him in the training dataset
i didn't want to damage the text encoder, so, it just didn't "over-learn"
and I just merged a little too much toward yellow! >:I
Gah, will fix that tomorrow after me beauty sleep :D
yeah, overcooking a model is never a good idea
i tried harder and i accidentally beat that SDXL image
eh, but does it beat what my model makes?
probably not 😛 you have that concept down to a science
i am but a casual visitor to your domain
Should my stable diffusion look like this? I git pulled and it looks like ass now
can't even read model name
they come out very cute on my model
dude, you think it's possible to merge a 2.1 model and a 1.5 model
idk, model checkpoints are basically huge math equations with billions of parameters, if you can combine them on a specific formula, it might be theoretically possible
like, keep the architecture of one, then multiply their data with each other
i feel like it's mathematically possible tho. i'll look into it later
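For reference, the usual "weighted sum" merge (A1111's checkpoint merger describes it as A × (1 − M) + B × M) is applied tensor by tensor, so it needs every tensor to have the same shape in both models. SD 1.5 and 2.1 differ here: 2.1 uses an OpenCLIP text encoder with 1024-dimensional embeddings versus 768 for 1.5, so the cross-attention key/value weights don't line up. A minimal sketch with hypothetical tensor names and flat lists standing in for real tensors, showing where a naive merge would stop:

```python
def weighted_sum_merge(a, b, alpha):
    """Merge two checkpoints as (1 - alpha) * A + alpha * B, tensor by tensor.

    Each checkpoint maps a tensor name to (shape, flat list of values).
    """
    merged = {}
    for name, (shape_a, values_a) in a.items():
        if name not in b:
            raise KeyError(f"{name} missing from second checkpoint")
        shape_b, values_b = b[name]
        if shape_a != shape_b:
            raise ValueError(f"{name}: shape {shape_a} vs {shape_b}, cannot blend")
        merged[name] = (shape_a, [(1 - alpha) * x + alpha * y for x, y in zip(values_a, values_b)])
    return merged


print(weighted_sum_merge({"w": ((2,), [1.0, 3.0])}, {"w": ((2,), [3.0, 5.0])}, 0.5))

# Hypothetical cross-attention key weight: 768-wide in SD 1.5, 1024-wide in SD 2.1.
sd15 = {"attn2.to_k.weight": ((320, 768), [])}
sd21 = {"attn2.to_k.weight": ((320, 1024), [])}
try:
    weighted_sum_merge(sd15, sd21, 0.5)
except ValueError as err:
    print(err)
```

Getting past this would take something beyond plain arithmetic (e.g. projecting one model's weights into the other's shapes), which is a different technique from a merge.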
Is my previously posted image how the automatic1111 gui is supposed to look now, or have I done something wrong
It looks kinda ugly now
Sure thing
On one hand, AI lets you do crazy stuff like this, but on the other, it doesn't know better: when I ask it to draw a girl holding a candle, it draws a girl with a candle lit on top of her face
definitely good for creativity tho
-_-
nice hands, did you just gen that?
yep
wanna buy some candles, man?
just got the fixed contrast ckpt uploaded and running too
and that's just a prompt with 3 words, if you are more illustrative with your prompts you'll get exactly what you prompted
no i get stuff like this xD
what model u usin
1.5 finetuned model i made
looks like she's a mousepad
what does that mean?
i never heard someone describe a face as a mousepad XD
didn't take much fine-tuning with offset noise for darkness to come back
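Offset noise, as it's usually described, adds a random constant per channel on top of the ordinary per-pixel Gaussian noise during fine-tuning, so the model also has to learn to shift an image's overall brightness instead of only local detail. A minimal sketch, assuming pure-Python lists stand in for a latent tensor; `offset_noise` is hypothetical and 0.1 is just a typical strength, not a value from this chat.

```python
import random


def offset_noise(channels, height, width, offset=0.1, rng=random):
    """Gaussian noise plus one random constant per channel ("offset noise")."""
    noise = []
    for _ in range(channels):
        # One shift shared by every pixel in this channel: moves the channel's mean.
        shift = offset * rng.gauss(0.0, 1.0)
        noise.append([[rng.gauss(0.0, 1.0) + shift for _ in range(width)] for _ in range(height)])
    return noise


latent_noise = offset_noise(4, 64, 64, rng=random.Random(0))  # 4x64x64 latent ~ 512x512 image
print(len(latent_noise), len(latent_noise[0]), len(latent_noise[0][0]))
```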
do you guys ever make a bunch of images at like 4 steps, pick the good seeds, and redo those seeds at like 12 or 24 steps
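That workflow relies on the starting noise depending only on the seed and the image size, so a rerun at more steps begins from the same latent. The result can still drift, especially with ancestral or SDE samplers that inject fresh noise each step. A toy sketch using Python's own RNG; `initial_latent` is hypothetical, and real UIs use torch's generator, so these numbers won't match any UI.

```python
import random


def initial_latent(seed, channels=4, height=64, width=64):
    """Starting noise for a generation, determined only by seed and shape."""
    rng = random.Random(seed)
    return [[[rng.gauss(0.0, 1.0) for _ in range(width)] for _ in range(height)]
            for _ in range(channels)]


# Same seed: identical starting point, whether you then sample 4 steps or 24.
preview = initial_latent(1270141109)
final = initial_latent(1270141109)
print(preview == final)
```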
prompts for a firefly
gets some anthro furry stuff i shall not post
xD
i finally trained myself into the AI so i can put me in fun places
hopefully funner places are coming soon
he doesn't seem to want to leave
Anybody have any ideas on where to post/warn people about a scummy Etsy based AI art employer?
well assuming it won't get you banned, on etsy
Good point, though I'm not really trying to warn customers as much as I am people who are signing up to work with them, such as myself
well that's why i thought you would do it tho lmao im confused now
but you can't rate buyers?
maybe i dont get whats happening actually
They are scummy people with sweatshop wages who took advantage of me, and are currently trying to take advantage of other people.
I want to warn people in the stable diffusion community about signing up with them, because they are not nearly as good as they may seem on the surface
they sound like someone else
probably reddit tbh
That was one of my ideas as well, I just wanna make sure that its well known that they are taking advantage of their artists
Sorry what did you use here? Looks very cinematographic as if taken straight out of a movie
Damn man you got some nice work
I have to do a bunch of stuff in post-processing to achieve somewhat of a cinematographic look (ie. www.dannyhiele.com), like writing scripts such as https://gist.github.com/basicfeatures/a083eb1ac1602f583d4993c3aca65358 and using various color graders/VSCO presets
that's my checkpoint i just finished today
and, thank you 🥹
that earlier version wasn't the final one
Can I send a PM?
No probs
Your message could not be delivered. This is usually because you don't share a server with the recipient or the recipient is only accepting direct messages from friends. You can see the full list of reasons here: https://support.discord.com/hc/en-us/articles/360060145013
🤷i can't send one either lol
Yeah, so that shitty commission company banned me and withheld my pay, luckily I have all of our chat logs and stuff, so fcuk them
Their name is WattleArtCreations, and they scammed me, underpaid me, and also purposely tried to feed me wrong information to upsell the customer into buying more revisions
When I demanded contact with the client to know they were not feeding me false info, they banned me for "unreasonable claims"
I worked nearly 20 hours on a single art piece of which I was making sweatshop wages on cause I was passionate about the client and character, cause the client was very nice, and now I have been screwed out of my pay, and they have been screwed out of my work
man, that blows. i really wish there were better ways of holding people like that accountable
Same here, it is so shitty of them
have you set up your own store @smoky oak to work directly with clients?
I have plans to, but etsy has been giving me account issues, so I will just have to make a new account after trying to salvage mine for well over a month
I suppose I can put this image up for what I am capable of doing
@median wren
don't let it get to you, and i mean that in the best way, i don't want some stupid person to change you or how you trust others, you are fine the way you are and anything that disrupts that would suck
I'm not used to you being so nice!

I just watched a video last night (turned out to be clickbait) about someone trying to set up an AI art store on Etsy. they said new accounts get blocked by default and you have to wait weeks to appeal it
Right I figured out where to run it but I was looking for a guide like with the styles and things
i really thought about my words that time
All new stores or AI stores? I haven’t set mine up yet, but I created it recently, so I guess that was good
which is why I wanted to use my old account, it's 5 years old, but apparently it got hacked during covid and somebody opened a fake shop and scammed somebody, and they keep putting me in the "process to reclaim it", then they just don't
but i've been through that too, and it's just, it can really affect your outlook. and i don't want to brush it off. but you're stronger than those folks and they needed you, not the other way around
I am already putting stuff out on reddit about it, I have the screenshots from our convos
I really REALLY just wanna get back in touch with the client, cause they were so nice
I got the impression all new stores. Might be a way for them to 'manage' all the spam stores by just making people appeal and wait a while
yeah focus on that part, silver lining
oh, they are built in, if you need any more info, I am down to help
Glad i finished the setup portion then at least
cool I just wondered what the options were
I have 0 contact with the client, cause they were all power trippy about intermediating everything
oh, very different thing, that's self-employed LoRA training
like I assume there are only specific styles built in
this was an art commission
you're so many-faceted
yeah they do that so the channels don't get flooded with that
yeah, for sure
🫡
we were trying to figure out why when I was showing them, SDXL lmao
I assume they were in the wrong channel haha
I asked about the styles prompt in the bot, and the answer I got is it's basically just adding the word to the prompt. But i guess it's there to get people to try the different ones they know work.
I try to be, I do all sorts of creative things, but the one time I find something where I make a little money (even if it is pathetically little, AND they take 60% of it for doing jack shit), I was pretty desperate to get some money, and now I have been screwed out of $65 or whatever it was
yeah, and there is always a negative as well, its a default one that we can't see
Wait what do you mean
if its outside of the bot channels, its blocked
Ok so the styles pop up? I guess I didn’t realize that
yeah
Oh
you can tell they are constantly tweaking the settings. last night I was playing around with it and the quality suddenly dropped to really ugly images 😄
Maybe I used dm by mistake
See! I noticed that as well!
I was so confused why they were so bad all of a sudden
they need a thumbs down button lol
Thanks I didn’t see it before
RIGHT
No worries, glad to be of help
Why do I keep getting "it's voting time" and it won’t work? I thought you just had to vote on your own, but I've voted on a few other people's now and it still keeps saying that
there's a showdown every 30 minutes where they block image gens for a little while and encourage people to vote in the #1100484581037195384 channel
Thanks. How long does it last
couple of minutes?
it takes so long to wait 
Ok cool
you can scroll down to the bottom of the channel to see the countdown, the bot should be available again shortly after it hits 0
Hello, this is image generated in SD (txt2img, controlnet segmentation). Now I need just to change the pendant to different one. Is there a way to accomplish this via img2img, please? Thanks.
try inpainting using an inpainting model and describe the light you want?
i compared my model's photorealism vs. that of Deliberate, DreamShaper, and Realistic Vision
mine is arguably worse, but it also doesn't have any noise fix applied, and i would argue my model is more diverse regarding the output, e.g. the clothing, faces, backgrounds, etc
the grid doesnt show the negative prompt but it was:
anime, cartoon, digital art, cgi, render, 3d, drawing, sketch, instagram, pastel, dada, zombie, ugly, surreal, text, watermark, abstract, old, fat, jpeg, black and white, vintage, amateur, film grain, evil, damaged, concept, unfinished, model, cover, clay, figure, toy, pixelated, bad, inexperienced, illogical, random, oversaturated, overexposed, rough, fake, unrealistic, sloppy, artificial, low budget, unprofessional, cropped, out of frame, low-quality, poorly drawn, deformed, bad proportions, malformed, imperfect, unnatural, extra, rushed, weird
resolution, sampler, steps, cfg?
full generation parameters:
medium-shot photo of a woman, shot in 4k high-quality with a Fujifilm X-T3 camera with natural lighting and f1.6 bokeh applied
Negative prompt: anime, cartoon, digital art, cgi, render, 3d, drawing, sketch, instagram, pastel, dada, zombie, ugly, surreal, text, watermark, abstract, old, fat, jpeg, black and white, vintage, amateur, film grain, evil, damaged, concept, unfinished, model, cover, clay, figure, toy, pixelated, bad, inexperienced, illogical, random, oversaturated, overexposed, rough, fake, unrealistic, sloppy, artificial, low budget, unprofessional, cropped, out of frame, low-quality, poorly drawn, deformed, bad proportions, malformed, imperfect, unnatural, extra, rushed, weird
Steps: 35, Sampler: Euler, CFG scale: 7, Seed: 525484821, Size: 512x512, Model hash: fd7ce58be7, Model: st-AI-le_st-AI-le_v1.0, Denoising strength: 0.4, Hires resize: 1024x1024, Hires upscaler: R-ESRGAN 4x+
i guess i could have done that from the start lol
i achieved better photorealism than realistic vision using 2.1
@sterile temple that 2.1?
no it's a 1.5 mix I've been playing around with for a couple of months. photorealistic pretty women is not my main goal though 😄
more creative arty stuff as the goal
yeah its not my main goal either
my model combines multiple very distinct styles, photorealism is only one thing it can do
i was wondering why the skin was so excessively smooth
@smoky oak i'm so stoked The Darkness has returned
ai cryptoid party?
cant wait for open source release of new model in July
"good" can mean a lot of things. It's hard to judge something without all the details I'd say. For me, someone liking their art is good. but that doesn't mean I think the art is good, just that it's good that people get what they want :P
https://huggingface.co/spaces/allknowingroger/Image-Models-Test10
apparently my flex-base model is now running here and can be tested on this space 😄
damn. my pseudo-journey-v2 model is there too. it looks soo good
humble as always? ;P
that test uses no negative prompt
you tell me which img you prefer 😛
epiCRealism does a really stellar job too
for me, it would be the second to last one, the one at the bottom, three from the left
that one to me looks like they hyper-trained the model on porn
it doesn't follow the prompt very well
you asked which one I prefer, and I answered :3
the black squares are NSFW images 
if it's about perfection, then the three black ones are perfect because they don't have any errors in it! ;P
their error is that they don't follow the prompt at all
it's just asking for a white woman in white clothes on white background
the anime ones are wow, broken looking lmao the studio one is nice but that's not a white background
the black ones might be because you didn't tell it to toggle the lights in the room? 🤔🤣
or put her clothes on 😛
but you said the one you liked the most was because it followed the prompt, but isn't the background gray and not white? :O
holy cow what the burned model hell
that's hilarious, it made her disheveled as hell
epiCRealism ballin' result again
this prompt is more generous to each model
except pastel-mix which still looks like trash
was my first result on the random model I had loaded :P
nice! if it can also do a completely darkened room then that model is likely pretty good if it has a balanced training set too
judge out on that ;P
looks like it has an averaging issue
oh well, not every model needs to make dark images 🙂
I've only seen loras be able to get good (in my eyes) images so far. Might be because it's hard to find many images that are all dark and don't have some sort of light source
i got this random action pose and decided i like it so much i gotta inpaint it and fix some stuff
it's because of the noise issues with the training schedule
Sorry to repost,
but i'm very proud of it
as I don't understand what that means, I'll take your word for it :P
the reason most models have issues with brightness is because "inference sampling is incongruent to training sampling."
in plain english, when you are training, you take a sample and iteratively add noise to it. the last timestep is the noisiest.
oppositely, during inference, you take a sample of pure noise (from a Gaussian distribution) and you iteratively remove noise from the image.
these are literally opposite processes. training is a forward process and inference is a reverse process.
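the proposed fix is that the inference process should match the training process: it should start at the very last training timestep (so-called "trailing" timestep spacing) instead of the "leading" spacing many schedulers default to, which can skip the noisiest timestep, and work backwards from there to the first timestep.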
this means that the inference process picks up where the training process left off, with the beginning state of inference being conceptually similar to the ending state of training.
i'm sure that still doesn't quite help explain it but at least you can take any single part that confuses you and beg GPT to break it down further 😄
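In schedulers such as diffusers' (its `timestep_spacing` setting), "leading" and "trailing" name two ways of picking which training timesteps a short sampling run visits. Both are walked from high to low; the difference is whether the first sampling step lands on the last, noisiest training timestep (999 for a 1000-step schedule). A simplified sketch: the function names are hypothetical and the `steps_offset` some configs add is ignored.

```python
def leading_timesteps(train_steps, sample_steps):
    """'Leading' spacing: evenly spaced up from 0, then visited high-to-low."""
    step = train_steps // sample_steps
    return [i * step for i in range(sample_steps)][::-1]


def trailing_timesteps(train_steps, sample_steps):
    """'Trailing' spacing: evenly spaced down from the last training timestep."""
    step = train_steps / sample_steps
    return [round(train_steps - i * step) - 1 for i in range(sample_steps)]


print(leading_timesteps(1000, 10))   # never visits 999, the noisiest step
print(trailing_timesteps(1000, 10))  # starts exactly where training's noising ends
```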
"plain english" :P
It reminded me of this simple example of how to explain something to someone who doesn't understand the technical stuff

it's so simple! everyone knows this ;P
😮 someone made it so my model can be used directly on the model card page
that's gonna suck if it doesn't use negative prompts lmfao
I'm too shy to upload my weird merges because people will probably laugh! and then laugh again when they look at me 😶‍🌫️
maybe they do so when I say stuff like, "It's true, mardi gras, is really just a color!"
Absolutely, your explanation is solid and clearly outlines the concepts. Let's try to simplify this further:
Let's imagine you're in art class. You're given a beautiful, clear picture of a landscape. Your teacher asks you to add layers of paint splashes on top of it, each layer making the original picture more and more difficult to see. After several layers, it almost looks like a mess, nothing like the original picture. This is like the "training" process in the model. You start with the original image (the 'sample') and add more and more noise (like paint splashes), until you end with something very noisy.
GPT4 is trying to help me explain it to a high school student
Now, imagine you're given a similar noisy, messy painting, but this time, your task is to gradually remove the layers of paint splashes until you uncover the beautiful landscape beneath. This is like the "inference" process. You start with a noisy sample and gradually take away the noise, aiming to reveal something that looks like the original picture.
The problem is, sometimes the picture you reveal doesn't look like the beautiful landscape it should. It's because the process of uncovering the picture isn't matching the process of covering it up perfectly.
The proposed solution is to make sure the uncovering process starts exactly where the covering up process left off. That way, every step in revealing the picture directly corresponds to a step taken while covering it up, increasing the chance of revealing a beautiful landscape.
In short, the proposal is to make sure that the process of revealing the picture (inference) matches perfectly with the process of covering it up (training). By doing so, we improve our chances of creating better and more accurate images.
i asked GPT4 "but this sounds like counting 6 eggs from left-to-right vs 6 eggs from right-to-left. why would "sampling timesteps in reverse" be an improvement?"
That's a good point, and your analogy holds in many situations. However, when dealing with diffusion models and similar types of generative models, the process is not symmetric, and thus the direction matters significantly. This is due to the inherent nature of the data transformation that happens in these models.
During the training phase of a diffusion model (the forward process), the model starts with real data samples and progressively adds noise until it transforms the data into a predefined noise distribution (often Gaussian). So, each forward timestep brings the data closer to a noisy state.
In contrast, during the generation phase (the reverse process), the model starts with a sample from that noise distribution and tries to reverse the transformations applied during the forward process. It takes a noisy sample and step-by-step reduces the noise, aiming to generate a sample similar to the real data it was trained on.
The critical part here is that the transformations applied at each timestep are not symmetric; the changes made to the data during the forward process can't simply be undone by just "counting backward". Instead, the model has to learn a separate reverse process to go from the noisy state back to the original data.
For instance, if the model has learned that at timestep 3 during the forward process, a particular type of noise is added that blurs the image, the reverse process at timestep 3 needs to be able to undo that blurring. But the reverse process can't simply "remove" the blur; it has to generate details that look like they might have been there before the blurring occurred, based on the current state of the image and what it has learned during training.
So, when the suggestion is to "start from the last timestep", it ensures that the model begins the reverse process from a state that is equivalent to the one at the end of the forward process. That way, each step in the reverse process corresponds directly to a step in the forward process, which is important for maintaining consistency between training and inference.
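For reference, the forward ("covering up") process described above has a closed form in the standard DDPM formulation: x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, where alpha_bar_t is the running product of (1 - beta_t). Below is a minimal pure-Python sketch of that forward step. It assumes SD-style "scaled linear" betas from 0.00085 to 0.012 over 1000 steps, which is an assumption and not something stated in this chat. The three-number "image" is purely illustrative.

```python
import math
import random


def scaled_linear_betas(num_steps=1000, beta_start=0.00085, beta_end=0.012):
    """ASSUMPTION: SD-style schedule, betas linear in sqrt-space, then squared."""
    a, b = math.sqrt(beta_start), math.sqrt(beta_end)
    return [(a + (b - a) * i / (num_steps - 1)) ** 2 for i in range(num_steps)]


def alphas_cumprod(betas):
    """alpha_bar_t: running product of (1 - beta), i.e. how much signal survives."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def q_sample(x0, t, alpha_bar, rng):
    """Forward process: jump from clean x0 straight to noisy x_t in one step."""
    signal = math.sqrt(alpha_bar[t])
    noise = math.sqrt(1.0 - alpha_bar[t])
    return [signal * v + noise * rng.gauss(0.0, 1.0) for v in x0]


alpha_bar = alphas_cumprod(scaled_linear_betas())
rng = random.Random(0)
x0 = [1.0, -1.0, 0.5]  # illustrative stand-in for a latent
print(q_sample(x0, 999, alpha_bar, rng))
print(f"signal left at t=999: {math.sqrt(alpha_bar[-1]):.3f}")
```

Under this schedule, the signal coefficient at t=999 is small but not zero. So even the last training timestep still contains a faint copy of the image, which is the leftover that the zero terminal SNR schedule discussed later in this channel removes.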
:)
I like nr 3 and nr 4 the most ;P
starting at the basics takes too long
start at the end and work your way backwards 😄
basically start sampling your learning from the last timestep
b-but that's even longer! 😵💫😵
NO because you have a shortcut to the things you don't understand
the pyramid of knowledge is widest at its base, and there is so much to learn
if you go from the top down, you find out exactly what you need to know
hehe, this is the reason I'm so simple-minded (well, apart from just being simple-minded): I just want to give/receive the quick answer to something. Like, if someone asked how to get better hands for the billionth time (yes, that was very passive-aggressive, sorry :( ) then I'd rather say "it's luck" than become the real-life example of this masterpiece video:
https://www.youtube.com/watch?v=aW2LvQUcwqc
:P
but for me i just give all the info and answer questions about deficiencies in understanding. i won't know all the answers and those we find out together
I always got answers, might be good answers, but they are still an answer :3
trying to give foundational knowledge to someone whose background i don't know, is impossible. a lot of the time you end up discussing shit they already know, and in the process, boring them...
might not be*
i took a shot of that prompt in my model
right is without negative prompt, left is with
it decided to ignore "inside cafe" tho lol, prob because you also have "storm clouds" in there
using textual inversions or LoRA?
those shades she wears in the top right are killing me! hahaha, it's so good! :D
same
my own model, a 4gb checkpoint (or 2gb pruned) https://civitai.com/models/97393/st-ai-le
Huggingface link: https://huggingface.co/ai-characters/st-AI-le Multiple characters Many styles One model st-AI-le by AI_Characters If you like wha...
autumn portrait of a russian young woman inside cafe, knitted fox-hat, fog, close-up, storm clouds, wearing red sunglasses, red shirt, (artstation, deviantart)
Negative prompt: anime, cartoon, instagram, pastel, dada, zombie, ugly, surreal, text, watermark, abstract, old, fat, jpeg, black and white, vintage, amateur, film grain, evil, damaged, concept, unfinished, model, cover, clay, figure, toy, pixelated, bad, inexperienced, illogical, random, oversaturated, overexposed, rough, fake, unrealistic, sloppy, artificial, low budget, unprofessional, cropped, out of frame, low-quality, poorly drawn, deformed, bad proportions, malformed, imperfect, unnatural, extra, rushed, weird
Steps: 35, Sampler: Euler, CFG scale: 7, Seed: 525484821, Size: 512x512, Model hash: fd7ce58be7, Model: st-AI-le_st-AI-le_v1.0, Denoising strength: 0.4, Hires resize: 1024x1024, Hires upscaler: R-ESRGAN 4x+
@cunning geode nice, in 2 months i spent $2500 to train my 2.1 model 🙂
right is same just without the negative prompt
Spam
it's just the experimentation with captioning, datasets, hyperparameters, etc that took so much money
your goal to have a flexible model is noble
i hated all the 1.5 and 2.1 fine-tunes that end up nuking a lot of the base knowledge
well you can't do what you've done without damaging some of it. but it's a trade-off. and by "what you've done" i mean, training it so far that you don't need negatives
yeah i tried to preserve that, even merged some of the 1.5 text encoder back in
you still gotta emphasize morgan freeman by like 1 or 2 () if you wanna show him on a surfboard, but at least it works
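Some context for the "1 or 2 ()" remark. In the A1111 webui, each pair of parentheses around a phrase multiplies its attention weight by 1.1. Each pair of square brackets divides it by 1.1, and (phrase:1.3) sets the weight explicitly. Here is a quick sketch of what the nesting works out to. The helper name is hypothetical and only illustrates the arithmetic.

```python
def emphasis_weight(parens=0, brackets=0, base=1.1):
    """Attention multiplier for a phrase wrapped in nested () and [] pairs.

    Mirrors the A1111 convention: each () multiplies by 1.1, each [] divides by 1.1.
    `emphasis_weight` is a hypothetical helper, not an A1111 function.
    """
    return base ** parens / base ** brackets


for n in range(4):
    print(f"{'(' * n}morgan freeman{')' * n} -> {emphasis_weight(parens=n):.3f}")
```

By the same rule, (((oil painting))) lands at roughly 1.33x.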
did you freeze any layers of CLIP? it doesn't have very many, and it is easy to screw it up as a result.
depends on the prompt, i noticed that the photos look a lot worse without the negative prompt
well yeah. but they're not truly incoherent like stable-diffusion-1-5 or 2-1
no freezing
my Pseudo Journey v2 model is the same. when i run negative-free tests it has the BEST outputs of any 2.1 model
but it gets left behind when you start using negatives
is that the one with zero terminal snr
all of my models have zero terminal SNR but only pseudo-flex-base managed to keep it stable
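"Zero terminal SNR" comes from Lin et al., "Common Diffusion Noise Schedules and Sample Steps Are Flawed". Their Algorithm 1 rescales the noise schedule so that sqrt(alpha_bar) is exactly 0 at the last timestep, while the first step stays unchanged. Below is a pure-Python transcription of that idea as we understand it. It assumes SD's scaled linear schedule as the input, which is an assumption about the starting point, not about any model in this chat.

```python
import math


def scaled_linear_betas(num_steps=1000, beta_start=0.00085, beta_end=0.012):
    """ASSUMPTION: SD-style 'scaled linear' schedule as the starting point."""
    a, b = math.sqrt(beta_start), math.sqrt(beta_end)
    return [(a + (b - a) * i / (num_steps - 1)) ** 2 for i in range(num_steps)]


def alphas_cumprod(betas):
    """Running product of (1 - beta): the signal fraction alpha_bar_t."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def enforce_zero_terminal_snr(betas):
    """Rescale so sqrt(alpha_bar) is exactly 0 at the last step,
    keeping the first step unchanged (after Lin et al., Algorithm 1)."""
    sqrt_ab = [math.sqrt(v) for v in alphas_cumprod(betas)]
    first, last = sqrt_ab[0], sqrt_ab[-1]
    # Shift so the last value hits 0, then scale so the first value is restored.
    sqrt_ab = [(v - last) * first / (first - last) for v in sqrt_ab]
    alpha_bar = [v * v for v in sqrt_ab]
    # Turn cumulative products back into per-step alphas, then betas.
    alphas = [alpha_bar[0]] + [alpha_bar[i] / alpha_bar[i - 1] for i in range(1, len(alpha_bar))]
    return [1.0 - a for a in alphas]


def snr(alpha_bar_t):
    """Signal-to-noise ratio at one timestep."""
    return alpha_bar_t / (1.0 - alpha_bar_t)


old = scaled_linear_betas()
new = enforce_zero_terminal_snr(old)
print("terminal SNR before:", snr(alphas_cumprod(old)[-1]))
print("terminal SNR after: ", snr(alphas_cumprod(new)[-1]))
```

Diffusers exposes the same idea as the `rescale_betas_zero_snr` option on `DDIMScheduler`.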
yeah i saw that, and experimented a ton with zero frequency noise and terminal snr, as well as v prediction on 1.5, tons of testing, in the end i found that nothing worked without making the quality worse
so v1.0 has no noise fix for now
interesting
autumn, portrait of a slavic woman, coffee cafe, knitted fox ears beanie, depth of field foggy, storm clouds, red-tinted sunglasses, warm red shirt, artstation, deviantart
Would be how I'd write it, but I'd also not write it like that if I wanted to give a cherry picked image to show how awesome I am.
grumble, why it not make fox ears >:I
i had to kick the model in the face at the very end with 1 epoch of SNR gamma and 1 epoch of offset noise
otherwise it wouldn't generalise contrast
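The two "kicks" mentioned here are both small training-side changes. Min-SNR-gamma weighting (Hang et al., 2023) caps how much the low-noise timesteps dominate the loss. Offset noise adds a per-channel constant to the training noise, so the model learns to shift overall brightness, which is the contrast issue above. Below is a hedged sketch of each. gamma=5.0 and strength=0.1 are commonly cited defaults, assumed here rather than taken from this chat.

```python
import random


def min_snr_weight(snr, gamma=5.0, v_prediction=False):
    """Min-SNR-gamma loss weight for one timestep (ASSUMPTION: gamma=5 default).

    epsilon-prediction: min(snr, gamma) / snr
    v-prediction:       min(snr, gamma) / (snr + 1)
    """
    clipped = min(snr, gamma)
    if v_prediction:
        return clipped / (snr + 1.0)
    if snr <= 0.0:
        raise ValueError("epsilon-prediction weight is undefined at SNR 0")
    return clipped / snr


def offset_noise(channels, height, width, strength=0.1, rng=None):
    """Gaussian noise plus one random offset per channel (ASSUMPTION: strength=0.1).

    The shared per-channel offset lets training shift overall brightness,
    which plain per-pixel noise barely touches.
    """
    rng = rng or random.Random()
    out = []
    for _ in range(channels):
        offset = strength * rng.gauss(0.0, 1.0)  # same value for every pixel in this channel
        out.append([[rng.gauss(0.0, 1.0) + offset for _ in range(width)] for _ in range(height)])
    return out
```

The v-prediction form stays defined at SNR 0, which matters for a zero terminal SNR schedule.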
my model includes anthros and such, hence it's probably better with the fox ears
it's actually supposed to be her holding fox ears in her hands, from the original image i lifted the prompt from
they used prompt editing stuff i removed tho
nah, it's probably because you used evil black magic or voodoo (not evil voodoo, just the normal kind) ;P
after having just deleted 32gb of 3 months' worth of test samples, i am glad i can finally rest some and just generate cool images with my model
yep it was like a sigh of relief to delete 480GB of training data and test samples
think i sighed so hard i shit myself a bit
what? create images instead of critiquing someone else's images, and/or the tech behind INSERT_VERSION_HERE against INSERT_OTHER_VERSION_HERE?! :P
it's frustrating seeing amazing images being made in here and i'm like halfway stuck between two checkpoints and every model i have access to is TRASH 
i've never actually used other people's models lol, only ever my own test models
i started out with Deliberate and pixel upscaled 512x512 images to pretend we had high-res and then i learnt about controlnet tiles and then i figured out 2.1 fine-tunes are superior to 1.5 fine-tunes and set out on a quest to create a 2.1 multi-aspect contrast-fixed base model for others / myself to fine-tune from
that all began about 3 months ago 😄
i think the best part was writing my own fine-tuning scripts and trying to implement any missing pieces that i want from A1111 using Diffusers directly
oh yeah i dabbled with 2.1 too, but all i ever got was trash even with radically different settings, e.g. TE layer freezing etc
for me, 2.1 just didn't give me images I wanted. 1.5 was a lot easier, and because I'm lazy, I don't always finish what I've sta
what's that one famous american oil painter called, with all the landscapes? he also painted the one with the falling roman empire
not thomas kinkade
contrast comparison of my 2.1 f/t vs 2.1-v and realism-engine
ah thomas cole
and multi-aspect support comparison. my model is a base resolution of 1024x1024
yeah my first published model was 1024 res, but i realised it probably doesn't increase quality enough to be worth the time and effort
so i stuck with 512 now
to reflect upon your assertions about 2.1, it is really hard to train because of the amount of noise remaining in the text encoder and the unet, but you can't train both together
you have to first fine-tune the text encoder, and then the unet (or the other way around) but fine-tuning both together makes the weights too much of a moving target and convergence is like the last thing it's going to do
this is a 2.1-v model fine-tune. the base model res of 2.1-v is 768x768, and the latent space internally is 96x96 as opposed to 64x64 for 1.5's 512x512 model
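The 96x96 vs 64x64 numbers fall out of the VAE. SD 1.x and 2.x VAEs shrink each side by 8 into a 4-channel latent, and the UNet only ever sees that latent. Here is a tiny sketch of the arithmetic. It assumes that SD 1.x/2.x-style VAE, and the helper name is just illustrative.

```python
# ASSUMPTION: an SD 1.x/2.x-style VAE with 8x spatial downsampling and 4 latent channels.
VAE_SCALE_FACTOR = 8
LATENT_CHANNELS = 4


def latent_shape(width, height):
    """(channels, height, width) of the latent the UNet actually denoises."""
    if width % VAE_SCALE_FACTOR or height % VAE_SCALE_FACTOR:
        raise ValueError("width and height must be multiples of 8")
    return (LATENT_CHANNELS, height // VAE_SCALE_FACTOR, width // VAE_SCALE_FACTOR)


for size in (512, 768, 1024):
    print(size, latent_shape(size, size))
```

A 1024x1024 base resolution therefore means the UNet works on a 128x128 latent.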
took me way too long to remember, not remember, and then look up the artist named JAMES GURNEY to say that I liked his artstyle a lot, and even the style the ai can create using it :D
But yeah, I like it, might not be the same for anyone else hehe
2.1 loves the art of Bob Eggleton and Darryl Eggleton but you have to put one or the other's names into the negative prompt, so that it doesn't blend their art style
Darryn Eggleton?
sure, him
typos in the negatices can have fun results :P
and no, I didn't make that typo above on purpose…
i just realized that sometimes including artist names can really help with my model
(((oil painting))) by william adolphe bouguereau and thomas cole of korra with lh hairstyle as a superhero with cape vs. (((oil painting))) of korra with lh hairstyle as a superhero with cape
yeah, artist names are one of the best things in my opinion, it's so sad to see people wanting them gone :(
well i am one of those that want em gone 🙂
it might even be the best thing for the ai in my eyes as well
i don't think adding the artist improved the style much, but it improved coherence?
same seed
I write my prompts by what they do, not what I want them to do
That is, I learned that prompt x creates one thing, and prompt y creates something else. So if I have to write an ice cream colored hat to get the correct color I want, then so be it :P
heh, didn't know 2.1 also had that detail on the forehead :P
fuck is this good
forgot the training has kept going and after another 400 steps it's losing its dark contrast again
that shit is so unstable
this kid finally figured out how to open the pack of Marlboros and it turns out it's a lollipop
@cunning geode concert crowds are one of my favourite unsolvable AI gore prompts
Mary Blair Lord of the Rings
what's the prompt?
A cyborg playing a futuristic instrument in a crowded concert
HRA_hyperrealism_art is great without negatives, damn
with some proper manual editing by an actual artist, this shot could easily pass for real in some cheap isekai anime
trigger word : waltuh Readme!! : LoRa is for 2.1 768 models only quite an experimental lora so pls be kind on me if it makes some messed up gens on...
how A1111 produces images
fuck that thing sucks ass 
it has a broken tiling implementation too
i generate all my stuff with A1111 and never faced such issues
i'm using a 2.1-v model and nothing in A1111 is really optimized for these, it seems like they're begrudgingly supported rather than something they really want to work properly
a lot of issues people face with 2.1 is just due to this broken shit in A1111
weird that no one noticed; I wonder if it's because no one uses it enough to notice? ;)
well, no one uses it because it's been noticed and won't be fixed. the people responsible for fixing it, do not care about 2.1. my assumption for why that is, they want to make booba and it won't
if people are creating things they want, then it's good :D
get religion VAE
VAE is included in 2.1, and in fact you have to add --no-half-vae to the COMMANDLINE_ARGS in webui-user.sh before it will even produce images without the "NaN tensor produced in VAE" error
this thing is broken
mine is trained using a new noise schedule from the "Common Diffusion Noise Schedules and Sample Steps Are Flawed" paper, and requires sampling from the trailing timestep first, which A1111's implementation of DDIM doesn't do. i don't think there is a single sampler in A1111 that works correctly
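"Trailing" here is about which of the 1000 training timesteps a 50-step sampler actually visits. The sketch below mirrors the "leading" and "trailing" options of the `timestep_spacing` setting on Diffusers schedulers. It assumes 1000 training timesteps and SD's `steps_offset=1`, and the function names are ours. With leading spacing, the first denoising step is t=981, so the model is never asked to start from t=999. That is the step a zero terminal SNR schedule turns into pure noise.

```python
def leading_timesteps(num_inference_steps, num_train_timesteps=1000, steps_offset=1):
    """'leading' spacing: integer stride from 0, plus an offset, sampled high-to-low.
    ASSUMPTION: 1000 training steps and steps_offset=1, as in SD configs."""
    step_ratio = num_train_timesteps // num_inference_steps
    return [i * step_ratio + steps_offset for i in range(num_inference_steps)][::-1]


def trailing_timesteps(num_inference_steps, num_train_timesteps=1000):
    """'trailing' spacing: stride back from the very last training timestep."""
    step_ratio = num_train_timesteps / num_inference_steps
    return [round(num_train_timesteps - i * step_ratio) - 1 for i in range(num_inference_steps)]


print(leading_timesteps(50)[:3])   # [981, 961, 941]
print(trailing_timesteps(50)[:3])  # [999, 979, 959]
```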
if the results from people I've seen are using a broken sampler, then I don't want it fixed :D
euler A gets the closest but looks all blown-out and hyper contrasted because it doesn't apply CFG properly
this mindset, to me, as a developer, is really frustrating. just to let you know 🥹
`waltbrba lora:HeisenbergDogu2.1:0.72 ,in a restaurant, (pizza on plate:1.25), , solo, portrait shot, 8k uhd, dslr, high quality, film grain, Fujifilm XT3
Negative prompt: (closeup:1.2), nfixernext nrealfixer Neg_Facelift768
Steps: 40, Sampler: DPM++ 2M Karras, CFG scale: 5, Seed: 2739604252, Size: 512x768, Model hash: b346aa1648, Model: rmadaMergeSD21768_v70, Denoising strength: 0.2, Hires upscale: 1.5, Hires steps: 30, Hires upscaler: 8x_NMKD-Faces_160000_G, Lora hashes: "HeisenbergDogu2.1: 533dae38bd9c", Version: v1.3.2
Used embeddings: nfixernext [046b], nrealfixer [ff26], Neg_Facelift768 [c1dc]
Time taken: 29.63s
Torch active/reserved: 5895/7370 MiB, Sys VRAM: 8192/8192 MiB (100.0%)`
as a developer, it's not to me :3
i didn't expect DPM++ 2M would work so well
but the paper for DPM starts out with the assumption that t=T, so, it implements the congruent sampling schedule
it IS the best sampler
probably for A1111 yeah
I just click on the combination of numbers and letters I find most funny when picking the sampler, most of the time :P
other than the ones that get those blue artifacts
DPM++ 2M SDE Karras is also good, if not as good, but produces monstrosities every once in a while
DPM++ SDE is not good
yeah hit n miss
DPM++ 2M SDE is whackadilly too.
I mainly use this one, what's the difference between it and just 2M?
sometimes it's also not the hammer, but who wields it ;P
many previews were made with dpm 2m sde karras here
i get that you're trying to be insulting in a clever way but look at the 'correct' results for the same prompt and tell me that the others seem to be functioning properly or that it's a prompt issue. it's NOT an issue with the prompt. these samplers will not do contrast correctly, no matter what you prompt
technically, this follows the positive prompt of a smiling woman, i guess, but it doesn't have any coherence to it, so, DPM fast = trash
DPM Adaptive produces the same shitty noisy images that 2.1 is known for
I'm mostly trying, without sounding mean, to show that when one person gets results they don't like and everyone else gets results they want, it might not be because of what's used, but who's using it. Insulting someone is far from my goal. I don't show bad results and say that this or that is objectively bad, while at the same time knowing that most people who will read it are the ones who think their results with it are not objectively bad.
I like to show what I like, and results that are good to try and lift up people and show the sky, and not the easier path of bringing something down toward the dirt :)
the first four Karras samplers all either don't clean up all the noise, or they don't do contrast correctly, or the closest it gets is by hyper-smearing everything
i'm not saying "they're bad because i don't like the output", i'm telling you they are technically incorrect in their implementation
the DPM++ 2M Karras sampler does contrast properly but not denoising, though it seems to be less hyper-smooth/smeared than DPM++ 2S a Karras.
sadly, I don't believe you on that. Sorry, not trying to be mean, just being honest. Saying something is this or that and making people agree is very different from showing something and making someone understand it the way they need to understand it.
more poorly executed contrast from DPM++ SDE Karras
i'm not even the only dum dum that believes this. the stability devs did the a/b testing on the bot to showcase how bad some of these samplers are
i think their favourite bad sampler is LMS
which one of the devs?
it was literally the whole team testing that, but mcmonkey and joe penna were the ones talking about it on here
I'll see if I can ask them that as long as I know who to ask first :D
DPM++ 2M SDE Karras is pretty bad lmao is this on the surface of the sun?
i think most disappointing is A1111's version of DDIM because the Diffusers implementation of it works and the A1111 version doesn't know what the hell contrast means
and what is PLMS even for? 😛
thanks. :D
LMS is the one I agree with; that's the one I meant earlier with those blue artifacts. But I didn't think a dev would be so wrong as to call the other one broken. Maybe I need to ask them to clarify that later.
so DPM++ 2M and DPM++ 2M Karras are the two best samplers that have the latest and greatest science inside them
i hated DPM++ 2M SDE. it is total crap
but hey if you like their outputs, i said earlier not every image needs to be bright or dark, sometimes it just works in the middle, which is what most of these broken samplers end up enforcing
DPM++ 2M SDE doesn't seem broken to me at least. I don't get any "breaking" artifacts at all. I didn't test more than 10 or so image generations, but if it were broken, that should happen all the time. Might be a 2.1 issue if so
well, broken in a technical sense. remember we're making images out of noise and it's a miracle the outcome makes sense at all
i picked a worst-case prompt that showcases exactly how bad samplers can be
it's honestly not even the prompt, it's the new noise schedule in my model that has a much wider range of contrast it can express, but these samplers end up like, doing the visual equivalent of a high-pass cutoff (audio thing)
I'll set my tests to DPM++ 2M SDE for a couple days and see what happens. I read "broken" as "a thing that doesn't work"
Not "broken" as in "it might do something wrong sometimes if you wait"
Because then the entire art ai would be broken :P
DPM++ 2M SDE reliably does the wrong thing on a v-prediction model with a zero terminal SNR noise schedule
maybe v-prediction is trash but it has so many training benefits
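Here is a sketch of why v-prediction and zero terminal SNR go together. It uses the standard definitions x_t = a*x0 + s*eps and v = a*eps - s*x0, with a = sqrt(alpha_bar_t) and s = sqrt(1 - alpha_bar_t). At the last step of a zero terminal SNR schedule, a = 0, so x_t is pure noise. An epsilon-prediction target there would just be x_t itself, while the v target is -x0, which still carries information about the image. The helper names are illustrative.

```python
import math


def coeffs(alpha_bar_t):
    """Signal and noise coefficients (a, s) for a given alpha_bar_t."""
    return math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)


def add_noise(x0, eps, alpha_bar_t):
    a, s = coeffs(alpha_bar_t)
    return [a * x + s * e for x, e in zip(x0, eps)]


def v_target(x0, eps, alpha_bar_t):
    """What a v-prediction model is trained to output."""
    a, s = coeffs(alpha_bar_t)
    return [a * e - s * x for x, e in zip(x0, eps)]


def x0_from_v(x_t, v, alpha_bar_t):
    """Recover the clean sample from x_t and a (perfect) v prediction."""
    a, s = coeffs(alpha_bar_t)
    return [a * xt - s * vv for xt, vv in zip(x_t, v)]


x0, eps = [0.5, -1.0], [0.2, 0.7]
for ab in (0.9, 0.3, 0.0):  # 0.0 = terminal step of a zero terminal SNR schedule
    x_t = add_noise(x0, eps, ab)
    print(ab, x0_from_v(x_t, v_target(x0, eps, ab), ab))
```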
i'd rather just repair the samplers. and when they are 'correct', they have way nicer outputs, way more often
why don't you repair it?
i'm ignorant as to how it's done
the simplest samplers are fixed first
i could maybe do DDIM, because i reviewed that change for Diffusers, but i haven't really looked at it yet. that code is scary
v2 looking better I guess
I've never seen your coding skills, or code, so I'll take your word for it :P
The ai part of art is scary enough to me :P
go nuts and see how bad my code style can be when there's no grouchy team lead to complain at me
i upscaled my image to 4 times the size from 512x768, but in my outputs it's under extras, not under a size like all my other outputs i have filed. how do i know what the dimensions are for the upscaled size? i know this is a newbie question but could someone help 😅
controlnet?
@wispy nest also the original terminal SNR paper's authors should have provided an example like this, which demonstrates the difference between 0.7 rescaled guidance and 0.0 rescaled guidance.
the 0.0 one is darker. and 0.0 also allows images to go WAY brighter.
the interesting part to me is that a rescale of 0.7 with a 'correct' sampler resembles the images you get from the 'broken' samplers in A1111
with a broken sampler and my model i was having to push the CFG over 9 or 12.0 in order to get true dark images. this added a bunch of noise and garbage to the image.
with the fixed DDIM, i can set CFG to 5.0 and get true blacks or rescale it by 0.3-0.7 and have a middle-of-the-road type image.
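The 0.7 / 0.0 numbers are the guidance rescale factor from the same zero terminal SNR paper. Diffusers exposes it as `guidance_rescale` on its Stable Diffusion pipelines. The toy sketch below shows the math on plain lists: run normal CFG, then blend in a copy whose standard deviation is pulled back to that of the conditional prediction. Real implementations do this per image over the whole latent, and the numbers here are made up.

```python
import statistics


def cfg(uncond, cond, scale):
    """Classifier-free guidance: push the prediction away from the unconditional one."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


def rescale_cfg(guided, cond, phi=0.7):
    """Blend the CFG output with a copy rescaled to the conditional prediction's std.

    phi=0.0 leaves plain CFG untouched; phi=1.0 fully matches the std of `cond`.
    """
    factor = statistics.pstdev(cond) / statistics.pstdev(guided)
    return [phi * g * factor + (1.0 - phi) * g for g in guided]


uncond = [0.1, -0.2, 0.3, 0.0]  # toy numbers, not real model outputs
cond = [0.4, -0.5, 0.6, 0.1]
guided = cfg(uncond, cond, scale=5.0)
for phi in (0.0, 0.7):
    print(phi, statistics.pstdev(rescale_cfg(guided, cond, phi)))
```

A larger phi pulls the spread of the guided prediction back toward the conditional one, which fits the "middle-of-the-road" look described above.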
no, i generated the original images at 512x768, then went to extras and upscaled x4, but when i look at the png info to see the new upscaled size, it just reads x4. how do i know the dimensions after something has been upscaled?
can I tell u how I upscale stuff?
yes please
generate 512x768 with DPM 2M karras at a high step count, usually 40-50
if there are too many deformities like fucked up eyes etc, I send it to inpaint,
after a satisfied inpaint job I send it to img2img,
0.3 denoise for little changes, 0.4/45 for some more enhancements
and use the same image in controlnet with the tile preprocessor and "ControlNet is more important" checked,
and I use ultimate SD upscale with "scale from image size", anywhere from 2x to 4x
done
and yeah- 4x ultrasharp for upscaling
or just get hiresfix xD
nah controlnet is good,
high res fix is for lazy ppl
or who just want to get job done
doctor heisenberg
it does almost everything you mentioned on its own
it can be used with controlnet
you forgot a tiny detail where 8gb cards can't do above 2x in high res fix alone
u need tiling son
eh, skill issue =/
i have a 4070ti, so i guess i can't argue with that
money issue
i used high res fix yesterday lol, still learning controlnet, but i would love to really get comfortable with this type of workflow
I checked some of the code on your github, but I don't write python stuff so I couldn't understand most of what wasn't json stuff. I sent it to a couple of friends who know more about python, but they haven't checked yet, or rather written me back :P
I'm nothing without highres fix! if that'll vanish, so will I! 😱🫣😵
512x768 upscaled 4x would be 2048x3072
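Here is the same arithmetic as a tiny helper, which is just illustrative: each side is multiplied by the factor, so a 4x upscale means 16 times as many pixels. If you'd rather read the size off the saved file, Pillow's `Image.open(path).size` returns `(width, height)`.

```python
def upscaled_size(width, height, factor):
    """Output resolution after scaling both sides by `factor`."""
    return round(width * factor), round(height * factor)


w, h = upscaled_size(512, 768, 4)
print(w, h)                  # 2048 3072
print(w * h // (512 * 768))  # 16 (times as many pixels)
```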
is it as simple as just multiply it by 4? 😅 sorry im such a newb
not exactly, but yeah, you can say that
the main advantage of using hiresfix is that it reduces AI fingers and AI conjoined twins rate
if without it you get the AI syndromes 16% of the time, then with it it's 7% of the time
walter white - crack dealer
yes simple as that
how much vram do u have?
first time using it today for me
there is an upscale wiki with custom upscalers that are made specifically for hires
i recommend the ones on there
i'd rather just not rely on hires fix at all 😛
I used it before, and after the whole drama when auto1111's webui changed how it worked. I was one of those who were really, REALLY angry about it. makes my blood boil even thinking of it still... even though it's actually quite nice now, even with the change :P
it certainly made it easier to clean up an image with a very basic prompt
buuuut i think less of images that are made with hires fix
it reduces the rate of AI syndromes by a lot
i don't need that with this model
i use it in almost everything i make
no hires fix, same prompt, same native res that hires fix made earlier, and still just one subject
theres a bit of residual noise but that's likely just a bug in the samplers still
the image you sent before and said you used hiresfix was kinda better
i agree, but it's a very basic prompt too
could improve it with either method
the hires fix will maintain most of the image though
the 1920x1080 native gens still have some issue but according to stability devs that's a limit of the transformers inside the older unet architecture
that's with hiresfix?
weird
oh, that makes sense
i bet if you used hires fix with 0.6 denoising fingers might be fine
it does 1920x1080 without dupes sometimes, without hires fix. but it's not as reliable and apparently starts to 'lose integrity' across the scene
as you can tell
still i'm impressed that experiment worked as well as it did
doesn't look like it would take much architecture changes to get reliable 1920x1080 out of it
here's what I most often use the highres fix for. That is, making images larger, and also fixing some details such as the typical smudge of a face from a distance. But as seen, it's never a perfect fix. But that's what the "generate" button is for, to make another image :P
8gb of vram, 3070 graphics card
yeah the one thing i'd really love hires fix for is faces at a distance, but i've managed to improve them to an extent i'm super happy with anyway. these men in this image would not even register as faces for GFPGAN to fix before, but now they do
yeah, i don't know how to use hires on that
512x768 with 2x hires is the most u can do then
just like me
but sadly, it can't fix every finger, not when the first image has them as if pointing a dagger :P
try increasing denoising strength or switching the upscaler
yeah, I know of those things, or at least I think I know :P
It can probably be fixed somehow, but I just wanted to show that it can't fix everything every time. Or I can't make it do that hehe
my pseudo-journey-v2 model with the same prompt/params
it's kinda promising actually, i bet the 1920x1080 gens all look pretty good if they have mostly landscapes or abstract items rather than human subjects whose shoulders shouldn't morph randomly
@oak osprey the original rescale extension for a1111 is broken and wrongly implemented
this is a fixed version https://github.com/AMorporkian/sd-dynamic-thresholding-rcfg
i use no extensions at all
you need an extension
auto1111 doesn't have native rescale support
yeaaah i'm not going to install an extension, my point of testing is to make sure that the model will work out of the box for A1111 users
they should implement it built-in. no need to have it as an extension, since it's a core piece of the diffusion process
well it just won't ever. auto1111 has no native rescale support and ztsnr only works correctly with rescale (altho it never did for me, it did for others). and the original extension is borked, so the one i linked is the only true working one.
people also install extensions for controlnet and such
they will manage
the negative reviews on a model don't stop coming in just because there's some obscure extension available
wait, there's an extension to change the highres fix?! I just use the one without downloading something... I think, maybe I forgot it if it's super old? :O
damn, SDXL is on a 12 billion image dataset?
this one is without any highres fix, i.e., made "just as" with settings, no loras, etc. But I forgot to change so it's using the very broken DPM++ 2M SDE ;P
now to get me some ice cream before it melts from my computer crying flames from creating an image at that size!
also, this is pretty pog
anything is possible with enough fine-tuning!
they also said that txt2video is about to be a real thing
since i've seen what i've seen, i withhold judgement on how fine-tunable something is. people told me 2.1 was harder than 1.5, and i guess they just never really tested a serious cross-section of prompts on their 1.5 finetune.. for me, 2.1 is easier. for kaibioinfo, 2.1 is easier. if everyone tells me SDXL is hard to fine-tune i'm more likely to assume it's the same as 2.1
but no one at stability has remarked on fine-tuning the text encoder(s)
there will be specialized finetuning tools
no
rescale is something else
trust me, i WILL make an SDXL version of my model, and it will be insane
those tools are unlikely to focus on fine-tuning the text encoder. it isn't something stability does
the unet can be tweaked but you can't gain cross-concept coherence with just the unet training, in my experience
the text encoder is what learns to generalise that stuff for the most part, if you want some dipshit celebrity riding a camel, it needs to know how to associate those concepts
idk what to tell you, whatever i did to 1.5 worked, that's all i know
he's in your model. he sees the filth
ok so if i upscaled my image 4x already then is that all i can do with my image or can i enhance it more with inpaint now? or would it be better to work off of my 512x768 images?
better to inpaint then upscale
it's easier for the AI to mess with fewer pixels; in fact, at 4x the size there are 16 times as many pixels for it to deal with
some days, right guys?
i do and don't want to know what's in SDXL / 2.1's training sets that they can make these images
cracker dealer, you mean? he is in the soup aisle
you need to realize that hires fix isn't magic, it's just automatic upscaling and img2img
img2img has a weird effect of bringing out random details in a piece, i'm not sure i like that
i've literally written my own hires fix implementation. it's not "just" upscaling and img2img. that's a really misleading statement
oh really you writing the code bro?
but really what am I missing out on there, it gens the image, then runs a chosen upscaler, then runs the img2img with the selected denoise strength, no?
i'm not sure, but what i AM sure about is that it reduces AI obstructions rate.
it ties into the diffusion process and has a lot of control over which stage the noise is removed at
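On the "which stage the noise is removed at" part: in img2img-style passes, denoising strength decides how far back into the schedule the upscaled image is re-noised and how many steps actually run. The sketch below follows the rule used by Diffusers' img2img pipelines as we understand it. It is not A1111's hires fix code, which this chat says does more. The 20-step schedule is hypothetical.

```python
def img2img_timesteps(schedule, strength):
    """Tail of a high-to-low timestep schedule that an img2img pass would run.

    strength=1.0 re-runs the whole schedule (start from pure noise);
    a low strength only runs the last few steps, so the input mostly survives.
    """
    n = len(schedule)
    steps_to_run = min(int(n * strength), n)
    start = max(n - steps_to_run, 0)
    return schedule[start:]


schedule = list(range(950, -1, -50))  # hypothetical 20-step schedule: 950, 900, ..., 0
for strength in (0.3, 0.5, 1.0):
    ts = img2img_timesteps(schedule, strength)
    print(strength, len(ts), "steps, starting at t =", ts[0])
```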
yeah, that sounds right
now that's interesting
that's why it's slower than the upscale tab
it's also why no other tool has it
yep, i'd say hiresfix is probably one of the most crucial A1111 extensions
i never use it
it's useless if you have low vram
i would have agreed before i spent all this time fixing zero-shot wide aspect / portrait gens in 2.1
plenty of ram to load oblivion textures
now that i have a model that works straight up in Diffusers as i expect the models to, hires fix is merely a "nice to have"
yeah, but i'm not sure if it's even relevant once SDXL1.0 releases
it is, mcmonkey said as much
what's sdxl anyway?
sdxl can't natively gen images (yet) beyond 1 megapixel, which i assume they did to keep the attention layers focused on the task at hand
after i've seen what happens when you train 2.1 on >1024x1024, i don't blame them. mcmonkey has been showing some 512x512 gens from sdxl that look "fine"
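Here is a sketch of one way to pick a generation size around that ~1 megapixel budget for a given aspect ratio. It keeps width x height near 1024x1024 and snaps each side to a multiple of 64. The 64-pixel multiple and the exact target are assumptions, and this is not SDXL's own list of training resolutions.

```python
import math


def near_megapixel_size(aspect, target_pixels=1024 * 1024, multiple=64):
    """(width, height) with width/height close to `aspect` and area near target_pixels.

    ASSUMPTION: sides snapped to multiples of 64; not an official SDXL bucket list.
    """
    width = math.sqrt(target_pixels * aspect)
    height = width / aspect

    def snap(value):
        return max(multiple, int(round(value / multiple)) * multiple)

    return snap(width), snap(height)


for aspect in (1.0, 16 / 9, 2 / 3):
    w, h = near_megapixel_size(aspect)
    print(f"{aspect:.2f}: {w}x{h} ({w * h / 1e6:.2f} MP)")
```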
new architecture, pretty much
the SD2.1 768 model can't make 512x512 without looking broken because its attention layers are so conditioned on the detail it expects to exist in a >512x512 image. when it tries to make one, it expresses details that the model can't cope with and these come across as artifacting
i assume they included even 256x256 training data in the SDXL pool
Do they resize training images both up and down?
not just new but like, bleeding edge experimental. so bleeding edge they might end up cutting off a huge chunk of it before 1.0 lmao
i have no idea whether they crop, down/up sample, or flip / rotate images for SDXL
I think they didn't use to flip images at all
they said there will be a near perfect base model, so that's what's going to happen
extensions?
"perfect" and "base model" are so difficult concepts to combine, @cyan snow
I was doing a text thing, and if I flipped the text element left to right it couldn't map it as well into characters
still it's damn closer to that than 1.5 is
@cyan snow the base model will know a lot and hopefully the data distribution of its weights are relatively even! this means no unintentional overfitting. but the even distribution of weights, seems to make prompting more complicated
it has so much knowledge that you need to know how to manoeuvre the weights via your prompt. the current implementations we've gotten to play with for prompt attention emphasis are pretty powerful but it's almost like they're too strong, as, little shifts in weighting have outsized impact
4-6 fingers instead of 2-8 per hand? i'll take it!
as i understand it, this is just how a base model is. if you fine-tune it for a specialty then it becomes more biased toward that form of output, and obtaining it becomes easier
so you look at midjourney and see it bias toward colourful hyper-contrasted explosive artwork and then obtaining real images from it is like searching for a needle in a haystack
nah, i found out why you couldn't get to the same level of detail with SDXL as my model: the discord bot and clipdrop use lower settings, while the images showcased on their website were generated with normal settings. i am sure that after it releases i will make a model that has better detail than my 1.5 one.
bro i'm talking about prompt coherence, not image details
it's a personal experience. some people's natural prompting style is like 1.5 loves, some people think how midjourney does, and others do well with 2.x prompting where a bunch of 1.5 users would be like "how are you doing that?"
when i try prompting SDXL, my initial results are amazing! i love them. they're aesthetic by default. but improving it to do what i want beyond that isn't easy
this will be easier with fine-tunes
that's also the case with any current base model tho =/
but you'll have to balance the fine-tune in pursuit of obtaining those results, and not destroying the ability to do other stuff.
@cyan snow but those models only have one text encoder to bump the weights of
text encoder fine-tuning is hard
it's harder in a different way than unet training is imo
but possible
every bit of fine-tuning you do with a text encoder, damages it
fine-tuning is the art of careful destruction
trust me, if it will be needed, the community will find a way, exactly like with 1.5
It would be cool to semi-automate prompt refinement, so you could generate two complementary prompts related to the original and have the user select the one they prefer, or such
when it comes to finetuning the text encoder we don't even know the vram requirements. we know that it will be quite high, the models are huge. you can save on some vram requirements by freezing most of the encoder's layers. but it only helps so much
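To make the "freezing layers only helps so much" point concrete, here is a rough back-of-envelope sketch, assuming an Adam-style optimizer (two moment buffers per trainable parameter), fp32 gradients and optimizer states, and fp16 weights. The parameter counts in the usage lines are hypothetical placeholders, not real figures for any CLIP or OpenCLIP encoder, and activations are ignored entirely.

```python
def training_memory_gb(total_params, params_per_layer, num_layers,
                       trainable_layers, weight_bytes=2, state_bytes=4):
    """Rough VRAM estimate (GB) for fine-tuning only the last
    `trainable_layers` transformer blocks of a text encoder.

    Assumptions (not measured): every weight is stored at `weight_bytes`,
    while gradients and the two Adam moment buffers exist only for
    trainable parameters, each at `state_bytes`. Embeddings are assumed
    frozen, and activation memory is ignored."""
    trainable = min(trainable_layers, num_layers) * params_per_layer
    weights = total_params * weight_bytes        # frozen weights still occupy VRAM
    grads = trainable * state_bytes              # one gradient per trainable param
    adam_states = trainable * state_bytes * 2    # exp_avg + exp_avg_sq
    return (weights + grads + adam_states) / 1e9


if __name__ == "__main__":
    # Hypothetical encoder: 700M params total, 32 blocks of 20M params each.
    for k in (32, 8, 4, 0):
        print(f"{k:2d} trainable blocks: ~{training_memory_gb(700e6, 20e6, 32, k):.2f} GB")
```

The frozen weights never go away, which is why the estimate bottoms out at the cost of just holding the model. Freezing trims the optimizer overhead but not that floor.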
and yeah, "people will still tune it" but those people are few and far between
look at how many 2.1 tunes there are
that's because 2.1 was a fail, SDXL isn't
i bet after XL releases there might be more SDXL finetunes than 1.5 finetunes
it's funny but a handful of the SAI devs working on SDXL love 2.1, and its text encoder was selected as one reason, due to that success they hoped would continue with the larger OpenCLIP
that's a bunch of terrible english, i'm sorry
OpenCLIP worked great, and bigger OpenCLIP is better, is all i'm saying 😄
your english is fine, you know other languages too?
thodee see hindi (a little Hindi)
i just read that sentence back though and it felt like r/sentencegore
if the person who wrote it is like "what the fuck lol" it's bad
but yeah, this bit of information makes me real happy.
you from india? that's cool. i just know a little spanish, i'm so grateful i can pronounce foreign stuff that's been romanized now
nope, i am canadian
phew, took way too long, but I learned that I hate python code :P
curly braces ftw
https://www.youtube.com/shorts/Xe2MbMxuUuY
how i speak spanish
also, this was during the show and tell about an hour ago
no yeah this is what happens if you DON'T learn how to pronounce romanized text
the disappointing part to me is the implicit statement that 1.5 finetunes have more detail than SDXL base
vowel massacre
we already tested that, remember?
i don't necessarily believe it though, that's just how Emad sees it
everything is personal and subjective, and i have terrible luck with 1.5 finetunes. remember, you have to eliminate hires fix from your mind. this is all zero-shot image gen
do an empirical experiment if you disagree and it matters
so it's not very broken aye? 😉
in my world, i have no hires fix. everything is zero shot, and today is the first time i've used textual inversion
in my world, I have a good memory, I'm rich, and handsome, and have a good memory! :D
I'd probably be one of the nameless people in one of the booths when the real hero escapes. I'd probably think they were crazy and out of their mind before having my 4231st refill of stale coffee...mmm coffee
you seen the movie 1408?
no, never heard of it. is it like Brazil, or however it was spelled?
omgggg it's not at all like that
it's a really good John Cusack film based on a Stephen King story
it has Samuel L. Jackson too
highly recommended viewing especially since you like AI art so much. it's not AI art, but, you'll see what i mean
last movie I saw was the first deadpool movie, so if it's after that, then I've not seen it by default :P
i think this is from 2007
holy cow, it is
oh, another one from 2009 with the same AI art-esque feel: Mr. Nobody. but some people can't stomach Jared Leto anymore, so, viewer beware i guess
hope I remember that, gonna be without much internet time for a while. And trusting my memory, I know I've already forgotten it! :P
I...I didn't actually think about that one, I never use my phone for anything other than calling someone or sending a sms.
I hope people will have fine-tuned that new SDXL by the time I return, one day! :P
What program do you recommend to animate an image like this?
idk but that's pretty hot
im following a trail right now
i always thought i need 0.4 denoising strength at highres fix to not get clones
but i think i can even do 0.6 without problems
nvm
this prompt here shows well how the model deteriorates if you go above 0.4 denoising strength
0.4 vs. 0.5 vs. 0.6
hm now im wondering if 0.3 might not even be better than 0.4
ah fuck i was using the wrong model lmao
but yeah seems like there isnt a big difference between 0.3 and 0.4
so i was already using the optimal value
I noticed the adetailer extension got pretty good results, and they have 0.4 denoise and 28 steps. So I started using that too
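One reason small changes in denoising strength matter: in an img2img or hires-fix pass, the strength sets how far back into the noise schedule the image is pushed, and so roughly how many of the sampling steps actually run. The sketch below mirrors the convention used by the diffusers img2img pipelines (`int(steps * strength)`, capped at `steps`); AUTOMATIC1111 and adetailer follow a similar idea, but their exact rounding and step settings are an assumption here.

```python
def effective_steps(num_steps: int, strength: float) -> int:
    """Number of denoising steps an img2img pass runs at a given strength.

    Assumes the diffusers-style rule min(int(steps * strength), steps);
    other UIs may round differently or rescale the step count."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    return min(int(num_steps * strength), num_steps)


# The 28 steps / 0.4 denoise combination mentioned above, plus neighbours.
for s in (0.3, 0.4, 0.5, 0.6):
    print(f"strength {s}: {effective_steps(28, s)} of 28 steps")
```

Higher strength starts from a noisier latent, which gives the model more room to redraw structure. That is consistent with the clones and deterioration showing up above about 0.4.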
coherency of wings and tails in this model is definitely a ton worse than in my previous test models, but it's also much more diverse and flexible, so i'll let it slide for now
definitely will try to improve that tho
opentalker
actually, it seems to do a lot better when i ((emphasize)) the wings and tail
still far from where i want it, but better
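For reference, in AUTOMATIC1111-style prompt syntax each pair of parentheses multiplies a term's attention by 1.1, square brackets divide it by 1.1, and `(term:1.3)` sets an explicit weight, so `((wings))` works out to about 1.21. The helper below is a simplified, hypothetical sketch that handles a single bracketed term, not the WebUI's full prompt parser.

```python
import re

PAREN_MULT = 1.1  # per-level multiplier in AUTOMATIC1111-style emphasis


def emphasis_weight(token: str) -> tuple[str, float]:
    """Return (text, weight) for one term such as '((wings))', '[tail]'
    or '(wings:1.3)'. Simplified sketch: assumes the brackets wrap the
    whole token; it does not parse full prompts or escaped brackets."""
    weight = 1.0
    t = token.strip()
    while len(t) >= 2 and ((t[0] == "(" and t[-1] == ")")
                           or (t[0] == "[" and t[-1] == "]")):
        inner = t[1:-1]
        if t[0] == "(":
            # Explicit weight like (wings:1.3) replaces the default 1.1.
            m = re.fullmatch(r"(.*):\s*(\d+(?:\.\d+)?)", inner)
            if m:
                weight *= float(m.group(2))
                t = m.group(1).strip()
                continue
            weight *= PAREN_MULT
        else:
            weight /= PAREN_MULT
        t = inner.strip()
    return t, weight


print(emphasis_weight("((wings))"))
print(emphasis_weight("[tail]"))
```

Because the multipliers compound, each extra pair of parentheses pushes the weight further, which fits the earlier remark that little shifts in weighting can have outsized impact.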
i'm going to see!
https://civitai.com/models/97808?modelVersionId=104575
thanks, i've added that to the model page on civitai
downloading...
see the notes that you need the cfg rescaling extension, and perhaps to use DDIM or DPM++ 2M
weird, when I click download it opens a new tab and nothing happens
without the cfg rescaling extension, DPM++ 2M worked alright but there was definitely contrast ceiling left on the table
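The extension itself isn't shown here, but CFG rescaling is usually the technique from "Common Diffusion Noise Schedules and Sample Steps are Flawed" (Lin et al., 2023), also exposed as `guidance_rescale` in diffusers. After normal classifier-free guidance, the guided prediction is scaled so its standard deviation matches the conditional prediction's, then blended back with the plain CFG result using a factor `phi` (the paper suggests 0.7). A minimal sketch on flat lists, assuming the extension follows that formula (real code works per-sample on tensors):

```python
from statistics import pstdev


def cfg(uncond, cond, scale):
    """Plain classifier-free guidance: uncond + scale * (cond - uncond)."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


def rescale_cfg(uncond, cond, scale, phi=0.7):
    """CFG rescale (Lin et al., 2023), assumed to match the extension:
    match the guided prediction's std to the conditional one's, then
    blend with the un-rescaled CFG output using phi."""
    guided = cfg(uncond, cond, scale)
    std_guided = pstdev(guided)
    if std_guided == 0:
        return guided  # nothing to rescale
    factor = pstdev(cond) / std_guided
    rescaled = [g * factor for g in guided]
    return [phi * r + (1 - phi) * g for r, g in zip(rescaled, guided)]


uncond = [0.0, 1.0, -1.0, 0.5]
cond = [0.2, 1.2, -0.8, 0.9]
print(pstdev(cfg(uncond, cond, 7.0)), pstdev(rescale_cfg(uncond, cond, 7.0)))
```

At high guidance scales the plain CFG output tends to overshoot in magnitude, which shows up as blown-out or oversaturated images. Pulling its spread back toward the conditional prediction lets you keep a strong scale without that.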
the safetensors file is also on huggingface
will it work in comfyui?
only used it a little bit 😄
compared to my experience with it that makes you the intellectual juggernaut of the two of us