#✨|sdxl
clipdrop is probably the second best
and then all the other "sdxl" models output images like sd1.5 would
like wtf is this
post a png file from comfy, we'll help
hey guys, I was stuck at home due to terrorism in my country so I finished all my workflows:
should I upload these?
damn. sorry to hear you're having to deal with such a thing. well glad you're okay, man.
are you asking about uploading the images or the workflow?
images are quite impressive
looking closer, I guess the workflow isn't that complex
unless I'm missing something perhaps
obviously, the workflow. I've been holding this back for a while, perfecting and simplifying it. I had all the time at home recently, and I believe I might have mastered image blending and zero-shot
ahh. let me look again
I took screenshots of just the preview and setting part of the workflows, all the dirty node work is hidden underneath it all
that makes more sense. I've been working on some image blending stuff myself. but want to get a couple nodes worked out before I put it out there
considered uploading to civitai. there's so much clickbaity stuff on there, but mine does what theirs claim to do, lol
learned a bit about the clipvision and ipadapter stuff. it's pretty interesting
I feel like I might have nailed it, the only issue with publishing this will be the requirements being tricky
it's really seamless, I wish it were simple to set up. If I upload my 3 latest workflows to CivitAI, I'll have to write instructions for how to set up inference to run them
send any combination of 2 images and I'll blend them
I meant the workflows on my comfy
like I mentioned; this isn't easy to set up. When I upload my latest workflows to CivitAI, I'll have to make a detailed setup guide for them
it should understand that on its own, prompt is just IMG+IMG, no text is required to be involved
[grid texture] + [positional image] = usually not a good thing, but it sure did blend them
my image mixer mixin as well
this is about the quality you'd usually expect from my latest workflow; when the size of IMG_A=IMG_B, even the subjects in the images will fuse with one another
I just joined this server. I cant find a guide on how to do i2i with sdxl
Thanks. Im setting up sdxl on my pc now. Is there any way to do it with the bot?
needs some fine tuning
I havent used the bot in a long time. I think it had something
yeah, it seems that when the sizes of the blended images are identical, the subject in the final image will actually look as if the two subjects fused
well you know it's autocropped right? I'm assuming so since you are using square images
but the clip vision encoding requires square images, at least in its current setup
I'm not using square images, and yes; I know
well close enough
that's some of the operations happening in the dirty node work
it still seems it works best when the image sizes are equal though, I feel like if I had a node that stretches both images to 1024^2 it might work even better
I also make them equal size and dimensions
all pixel counts identical
here's what my blender is currently blending
mostly add images in groups of 4. but if I have an odd one out I'll just add something else to the 4th slot
try to choose the images so each group of images has a similar style or thing going on
my goal before I upload it is to achieve this quality of blending while also allowing the image sizes to be more dynamic
because when the sizes are equal to each other, I'd say this is damn close to being mastered
you mean input images I'm assuming?
well you could perhaps use some sort of transformation trickery so they wouldn't get indiscriminately cropped by the clip vision processing
I'm not sure if IPA will be affected if input images are stretched. that might be a viable solution
I could always have another part of the workflow that just does zero-shot image input without a prompt to get an exact image with different image sizes, but that might be extra work
the zero-shot workflow I made isn't working better or worse with any input size, so I could make use of that
I was thinking about it, and it seems like some kind of dynamic or variable stretching/contracting could work
where it leaves the middle of the image less altered
and increases in magnitude near the edges
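A rough sketch of that kind of edge-weighted squeeze, for one axis only. The cubic curve and the function name are assumptions picked for illustration, not something from a real node: the center keeps roughly one source pixel per output pixel, and the compression grows toward the edges.

```python
def edge_weighted_source_x(out_x: int, out_w: int, src_w: int) -> float:
    """Map an output column to a source column, squeezing edges more than the center.

    Assumes src_w >= out_w (the image is being narrowed along this axis).
    """
    if out_w < 2:
        return (src_w - 1) / 2.0
    a = out_w / src_w                      # slope at the center, about 1 source px per output px
    u = 2.0 * out_x / (out_w - 1) - 1.0    # normalize the output column to [-1, 1]
    f = a * u + (1.0 - a) * u ** 3         # cubic warp: f(0) = 0, f(+-1) = +-1, monotonic
    return (f + 1.0) / 2.0 * (src_w - 1)   # back to source pixel coordinates


# Narrowing 9 columns to 5: the sampling step is small in the middle, larger at the edges.
columns = [edge_weighted_source_x(x, 5, 9) for x in range(5)]
```

Sampling the source at these fractional columns (with whatever interpolation you like) gives the warped image; the same mapping can be applied to rows.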
padding is possible but seems like it wouldn't work out well
personally, I always make images in the same size, so this doesn't affect me, but if I want to publish this, I'll also have to make some kind of mechanism that prepares IMG_A and IMG_B to be the exact same size before the actual blending
3 options are cropping, padding, or transforming
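To make the trade-off concrete, here is a small sketch that only computes the geometry of each option for a square target. The function name and the 1024 default are assumptions for illustration; no image library is involved.

```python
def fit_options(src_w: int, src_h: int, target: int = 1024) -> dict:
    """Describe three ways to turn a src_w x src_h image into a target x target square."""
    side = min(src_w, src_h)
    left = (src_w - side) // 2
    top = (src_h - side) // 2
    longest = max(src_w, src_h)
    return {
        # Center crop: keep a square window, lose whatever falls outside it.
        "crop_box": (left, top, left + side, top + side),
        # Pad: extra pixels needed per axis to reach a square, nothing is lost.
        "pad_total": (longest - src_w, longest - src_h),
        # Stretch: scale each axis independently, nothing is lost but shapes distort.
        "stretch_scale": (target / src_w, target / src_h),
    }


# A 5:2 image: cropping throws away 60% of the width, stretching keeps all of it.
print(fit_options(2560, 1024))
```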
it always works sooo seamlessly when the sizes are the same, certainly better than when they aren't. I'll make some solution to that for sure
yo, stretching to 1024^2 actually worked
peculiar.
The key is to have all the elements of the image within range so the clip vision can encode them
It cuts it up into little squares and outputs tokens for each one. At least that's my understanding. Might be more going on though
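For a sense of scale, a minimal sketch of the "little squares" arithmetic for a ViT-style vision encoder. The 224 pixel input and 14 pixel patch are assumed typical values for CLIP "/14" encoders, not something confirmed in this chat.

```python
def vit_token_count(image_size: int = 224, patch_size: int = 14) -> int:
    """Number of tokens a ViT-style encoder produces for a square input."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    per_side = image_size // patch_size
    return per_side * per_side + 1  # +1 for the class token


print(vit_token_count())  # 16 x 16 patches plus one class token
```

Under those assumptions every input, whatever its original size, ends up as the same fixed grid of patches, which is why anything cropped away before encoding never reaches the tokens.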
and then if you're using ipadapter plus rather than regular ipadapter it adds an extra dimension of tokens, although not exactly sure what they're doing just yet
it also seems like it'd be possible to train a clip vision model to analyze an image and determine ideal cropping so that the main subject of the image is retained. but not sure how the training would be set up
but if that were figured out it'd really streamline the process
I automated the stretching, now it takes the values from [emptylatentimage] and uses them to resize imageA and imageB, it seems to be even better now
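The idea can be sketched outside ComfyUI like this, with images as plain 2D lists and a nearest-neighbor stretch. The function names are hypothetical; in the actual workflow an image resize node fed by the latent width and height would presumably do the same job.

```python
def stretch_nearest(pixels, new_w, new_h):
    """Resize a row-major 2D grid to new_w x new_h, ignoring aspect ratio."""
    old_h, old_w = len(pixels), len(pixels[0])
    return [
        [pixels[y * old_h // new_h][x * old_w // new_w] for x in range(new_w)]
        for y in range(new_h)
    ]


def prepare_pair(image_a, image_b, latent_w, latent_h):
    """Stretch both inputs to the same size, taken from the latent settings."""
    return (
        stretch_nearest(image_a, latent_w, latent_h),
        stretch_nearest(image_b, latent_w, latent_h),
    )


# A wide image and a tall image both come out 2 x 2.
wide, tall = [[1, 2, 3, 4]], [[5], [6]]
a, b = prepare_pair(wide, tall, 2, 2)
```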
nice. yeah. those look clean. have you tried it out with two images that seem like they wouldn't work together? curious what it'd do with those
currently blending a 5:2 ratio image with a 1:1, so far it does that just as seamlessly as it does when the images are equal in size
this was easier than I thought haha
glad to see it's working. I hadn't actually messed with the stretching much myself, but it seemed like it'd work with the way it encodes the images. but that's actually working better than I would have expected
these two images are being stretched like hell while preparing, and the quality is still about as good as when they are initially the same
so now it's independent of input image size
Excellent. So cropping might not be the move with this stuff. I guess it's case by case, but not sure when cropping would be preferable now
has me rethinking my node ideas
even though the input images are entirely different sizes, it still blends the subject at a similar quality to when the input images are actually identical in size
on another note, what sort of approach do you take with freeU? haven't really messed with that much yet. I've been looking into other things recently
I'm using FreeU for the second diffusing stage, due to IPA_plus still losing some of the quality of SDXL, I find that if I set FreeU to have the second stage diffuse more aggressively; the outputs are better
the conditioning for the second K_sampler stage is just zero'd out positive with a typical negative
gotcha. I actually haven't noticed a consistent degradation of quality with IPA_plus, but we might be looking at images differently. also, I've found input images to really have a large impact, and if cfg is too high it really goes downhill quickly
I kinda have, that's fair though. it's not as destructive as CNET, so just 3 seconds extra for way better quality is entirely fair
yeah, I need to mess with it more. just get tunnel vision sometimes and only focus on a couple things
have you played with this at all? noticed when it got added a while back, but only started messing with it a while ago
no, I heard it just does shit to the seeds mid-diffusion?
not even sure. I just decided to see what it'd do, lol. might look at the code to see if I can decipher anything. although I have a feeling it'll be some craziness I won't fully grasp
added 4 new images to my blender
it's sooo seamless man, now all I'll need to do is just write a detailed guide for how to install and use it then it's ready to go public
try a real person and a cartoon (just wondering what will happen)
the cfg is too damn high
but like the detail level I guess
I think I might add the next 4 images posted just to see what they'll do
it was spitting out abominations like this yesterday
if I ever find a place where I'm actually content with the results I might consider posting it somewhere. but have ideas that I still haven't figured out how to implement
just places him in the cartoon, it didn't have any other subject to blend with, so it just placed the subject in the environment in imageB
well gotta put him in there with shrek
if both he and shrek are in a portrait position in the images, it will most certainly blend the subjects and make an abomination
a beautiful, abomination
I go for abominations. I'm a bit of an expert
just lovely
lol. that's about as accurate as you could get tbh. could be a little more green maybe
that's miles ahead of MJ blend imo
yeah. I don't know how any closed source will keep up with all the new innovations of things. they'll shoot ahead for a minute, but doesn't seem to last long
like dall-e 3. I'll admit it's fun to play with, or was. but then microsoft just crushed it with guardrails
not always ahead, look at Dall-E 3, usually nowhere near SDXL despite being pixel diffusion
maybe the model itself can rival SDXL, but when it's being controlled so aggressively; fuck no.
its only strength really was that it's working with gpt-4
nothing hooking up SDXL with LLaMa2 can't do better..
but I checked it out again yesterday and the most tame, PG type prompts were deemed too offensive
however then comes the issue of running SDXL and LLaMa on the same machine; not easy.
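A back-of-the-envelope sketch of why that is tight. The parameter counts are assumptions (roughly 3.5B for the SDXL base pipeline and the 7B Llama 2 variant), and this counts fp16 weights only, not activations or caches.

```python
def weights_gib(params_billion: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for model weights alone, in GiB (fp16 = 2 bytes per parameter)."""
    return params_billion * 1e9 * bytes_per_param / 2**30


sdxl = weights_gib(3.5)   # assumed ~3.5B parameters for the SDXL base pipeline
llama = weights_gib(7.0)  # assumed 7B-parameter Llama 2 variant
print(f"SDXL ~{sdxl:.1f} GiB, Llama 2 7B ~{llama:.1f} GiB, total ~{sdxl + llama:.1f} GiB")
```

Under these assumptions the weights alone already exceed a 12 GB card, so one of the two models would have to be quantized, offloaded, or swapped in and out.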
curious about gpt-4V now
also I think SD3 (or whatever it'll be named after the approaching epoch..) will use BLiP instead of CLiP; this means that IPA won't be necessary to enable it to have images as inputs
why not both?
LORAs trained on dalle3 outputs with their prompts as a style helpers would be a fun project
there are no styles that SDXL doesn't know how to do with the right prompting really.
prompting is so 2 months ago
they use GPT4 prompts and outputs to make Llama LORAs and other models, it's just a technique out there
I need to figure out llama loras
both BLiP and CLiP in the same architecture? I'm questioning how that'll work
anyways, BLiP can enable a diffusion model to fetch resources from the internet for certain generations. BLiP diffusion can do txt2img like SD1.5, it can blend images almost as well as my IPA workflow, and even zero-shot. it's insane how making a diffusion model that uses BLiP instead of CLiP can increase capabilities
got my first access suspended for 1 hour from dalle3. the fun is over
well the drawback is it's computationally more expensive right?
not really, BLiP is just more capable than CLiP
SD3 will probably use a component like BLiP instead of CLiP
I've compared a lot of blip and clip output, their outputs are both useful in prompting. clip is most often single words, blip is more like the caption on an image for people who do not have sight. clip tells you the things it sees in an image, blip explains the image
CLiP is the text encoding component of stable diffusion, BLiP is a different component with more capabilities
yes I know
the only model that uses BLiP for text encoding is BLiP diffusion; which is just SD1.5 built with BLiP instead of CLiP. it has more capabilities and is better than 1.5
hmm, I'm just not sure how that would work. they kind of do different things.
not saying it wouldn't work, obviously
no I just mean the tokens they spit out make good overall prompts and combine well
BLiP has the same functions as CLiP except it has more capabilities and understands language WAY better
this is BLiP diffusion:
https://twitter.com/RisingSayak/status/1705223295539519858
not mentioned in that tweet; it also blends images and can do pure txt2img
slightly better than SD1.5 at txt2img
well their architectures differ
so I'd think SD3.0 would not be about being better at the stuff we currently do; it would be able to do more stuff in general.
also, clip is more wordy. blip is straight to the point, but not necessarily ideal for detailed prompts and things. well maybe the new blip model is, I don't know
also, I think blip is more for image to text right?
it's meant to be just a better component for images and language
SD3.0 won't necessarily use BLiP; they might just make an entirely new component that solves all the multimodal issues
like, we sure did just now master img+img, also img+txt, and entirely txt2img, all using SDXL. but we never have them interact with each other
waiting on audio to image
You could technically do that already
Use whisper to transcribe your speech and feed it into SD as a prompt
well sort of
well I've interrogated images I've made and then put that in as input for an audio model
That's gluing modalities together, not making a model that has functions that interact with each other
usually creates incredibly cursed sounds
I don't think monolithic models are the way. but that's just my opinion
We can't fit more than 1 model in 12GB of VRAM, the way must be a model that has functions that interact with each other
I very much prefer the idea of dynamic specialized models that can be swapped out as needed
well they obviously need to work well together. but I think of it like a car or something. maybe the base will be the same, but then users can swap out parts as needed, or to suit their particular preferences
The BLiP2 component can act as a text encoder/image encoder, and a chat bot really. I've seen UIs that use BLiP and you can send it images then ask it questions about the images
So maybe if SD3.0 will use components like that it could do everything that we can stretch SDXL to do (this includes image blending, txt2img, and zeroshot.) while gaining more capabilities
well I know of no reason why it couldn't work. I was just mainly thinking that there's a reason it hasn't really been done on a wide scale before
Why is SDXL soooo bad at eyes....
pick a face detailer workflow for your tool of choice. they work very well and dont add much gen time, they also give you a lot of freedom for face detail at different subject sizes
these eyes look alright
(POV, close up),aurora borealis rim lit arctic kitsune magical fox, icy-blue fur, agile, sentient, mysterious, taiga forest clearing, Ektachrome slide film, night, dreamy bokeh, atmospheric depth,
this is what I get with that prompt
I'm using the new AIT workflow I made, I'm crankin' batches of 4 images in this res in under 30s
I'll be 👀 ing for it
(cinematic), insecure small mouthed blonde fit comedian, wearing spectacles, style of Martin Scorsese Casino, Las Vegas cabaret stage, microphone stand, searching eyes, 1988, Steadicam-driven production still, Rich cinematic saturated reds and golds
it uses an optimization called AIT that doubles the speed on the 3000 series and almost triples it on the 4000 series without degradation in quality. it's only compatible with rather specific machines, but it's already in the metadata of the images I sent if you want to try it out
birb
E.g. code former?
Guys, I'm having a problem in comfyui. Anyone that isn't a newbie wanna take a look at the error I'm being shown?
My ComfyUI drawing app is coming together finally, got a good workflow with controlnet now.
I approve of your stylistic choices
I'd look if I could. but can't see it
here
Elegant Entropy 1.3 + SDXL refine
oopsie, forgot all about you. so what are you inputting into your upscaler?
are you sharing it yet?
"In This Stage, We Are Present With The Last And Oldest Memories As The Mind Deteriorates More And More.
The Mist Is Storming All Across The Brain, And There Is More Echo To The Tracks, Some Tracks Are Misshaped As A Result Of Near Forgetfulness."
I used everywhere at the end of time stage 3's description as a prompt- tf
''Post Awareness Stage 6 Is The Point Of Total Confusions. The Character Is in Ture Horror As All The Wires Are Untying An The Mist In Tangling Up. Small Moments Of Clarity Can Be Heard, Resulting In The Character Wondering For A While, Was This All A Dream? As The Clarity Fades Away Into More Confusions And Rupture, There Is Nothing Left Except, Darkness.''
Scary Tale )
Full Motion Video game (Harvester) gore
Have you gotten sdxl to train a model in 24gb without going over? I cant seem to get it under 25gb so it is going like 7s/it with a 3090.
Sure have for both DB and LoRAs. Friend was before me doing it because he has a 3090 and I only had a 1060.
I would love to see your config. I have tried it so many times.. with kohya_ss, the dreambooth extension, windows and ubuntu.
I believe it's possible to implement DeepSpeed, TRT and even AIT for training; but idk
I'm pretty sure SAI trained SDXL using either OneFlow or TRT, but don't quote me on that
@glacial nacelle what are your nvidia drivers version?
anyway, @glacial nacelle downgrade them to 531.xx and check if any difference.
In the end, after several attempts on the 4070, I ended up using Runpod for the training.
I will try that. I am not sure what version.
My AIT has not worked since an update a week ago. You don't seem to have a problem? Do you know how to fix it? Reported 2x already...
Cannot import A:\AI_Files\ComfyUI\custom_nodes\AIT module for custom nodes: expected an indented block after 'if' statement on line 413 (AITemplate.py, line 414)
I've already mentioned, I just have two installations on my machine; one I use for pure txt2img with AIT, and another for other stuff
one I update once a week or two, and the other I keep on whatever AIT supports
This?
quite the variety
Needs a Cadillac Coupe de Ville
well it's a Lora trained on 70's Disney films
i'll try
very cool
someone got the inpainting working
but they say to select a file named "controlnetxlCNXL_h94IpAdapter"
to which there is no google result whatsoever on the internet
sounds like someone being a jerk, that post has been up days
a pretty girl generated with and without FreeU + Poormans Instant LORA + Face Detailing
This is an overview of my daily driver SDXL workflow that is part of my Workflow package on Civitai
you figure it's fake?
yes
thanks wont waste my time then!
will just check in the coming weeks / whenever when it comes out
use google differently and follow the first link to Civitai ;o)
https://www.google.com/search?q=controlnetxl+CNXL+h94IpAdapter
Thats for openpose
have that already
oh wait
so goto the page and click on the correct option
ahh I see sorry
THANK YOU
I thought those were tags!
Only used this website 4 times maybe!
never knew those were buttons you could click!
IT WORKED AND IT WORKED WELL!
Now I can set and forget again
/private
Looks like a medieval jail, imagine being in this for a year
this one is better (car was deformed in previous one)
Can you share the prompt/model? love this style.
embedded here
This whole channel is just a showcase now lol
I'm ok with that 🙂
I have some model training news for people here who are interested
My realism LoRA I started training a few months back still HANDILY beats the best realism SDXL fine-tunes out there that I have found for realism portraits
I am gonna be updating it from 90 images to 500 images once it cools down where I am. My final goal is 2500 images and full text encoders training for a full fine-tune in a LoRA
ETA?
For the 500? 2-3 weeks if I can
I am hand selecting 4k+ res images and hand captioning for various things like poses, color grading, zoom levels, and all that
A cinematic portrait photograph of a pretty young black bride standing in a grass field of grass and white rose bushes in front of a white plantation house
1 is base, 2 is my LoRA, 3 is realism Engine XL, 4 is real stock photo
It's trained on professional grade retouched portraits and wildlife photographs.
Its biggest improvements are background consistency and directional lighting
A cinematic photograph of a snow leopard inside a cave on the side of a mountain
Base, mine, realism Engine, real stock photo
Lost the prompt for this one, but it's mine, realism Engine, real stock photo, and real vision XL
It should be noted, it's not fully trained for mine, so fine details are not quite there just yet. But here is mine vs realism Engine vs real stock photo
This is done with just 90 images. So the next one of 500 images should be a lot better, and then the final 2500 image one should be insane
@crisp owl hope this is promising!
Also, it should additionally be stated that 75 of the 90 images are portraits, and the 15 others are random wild life, so it's not even like this model has seen a ton of animals and it's already that good with them
My LoRA also cranks up detail in water color and other paintings.
Mine vs realism Engine vs real stock photo vs real vision
My LoRA vs realism Engine vs real stock photo
I feel like my LoRA just consistently has better textures and fine details like the individual foliage
does anyone know how to make the postal dude in SDXL
A cinematic portrait photograph of a pretty and meek albino woman with ice blue eyes wearing a black dress in a room made of marble
Mine, juggernaut XL, realism Engine
(Mine is not trained on fine details yet)
A cinematic portrait of a fashionable young African woman wearing an abstract and asymmetrical dress with frills and angular shapes while striking a pose on the sidewalks of New York
Mine, RealVisionXL, RealismEngine, RealStockPhoto
None of them got the pose lol
I hope to add lots of data for vogue and fashion to my full LoRA
be careful with the nudity. you dont wanna catch a ban
that's a man in the second image, didn't think that would be an issue
the woman isn't displaying anything per
I really didn't consider
what is going on with this node anyone know?
bypass? ctrl + b
anyone know why my text to image generations with SDXL appear blurred out? this is with the api...
are they fulfilling the proper number of steps for 2,3,4 batch number?
looks like mine when I accidentally choose 2 instead of 20 cos my 0 is finicky and lol
thanks for the insight. will investigate!
cos you have one that seems to render out, the other 3 seem as if they didn't run full
but I don't really know, just looks like when I mistype the step count lol
sometimes I get 4, sometimes I get 3, sometimes 1.. very hard to pin down
wait
you mean sometimes it runs in full all 4 and sometimes just 1 2 or 3,
like this is 1?
I always get 4 images but not all are clean, sometimes I get 2 blurry, 3 blurry or 1 blurry
can you see the console output? are you hitting some memory clamp or something that is cutting the batch short?
check the meta data on each image
if that's the case something is halting the generation midway... or yeah, I never really looked at XL metadata, didn't consider that
I was just thinking about that, I wondered if it would tell me the steps when I have it running through like 4 diff passes of generation lol
XL just keeps getting better.
My LoRA, realistic vision XL, realism Engine, real stock
Looks like real stock does beat me in animal photos. Gonna be adding at least 40 more to my dataset
This one is mine, real vision, realism Engine, real stock.
Mine definitely looks the most realistic lighting/skin detail wise
I worked my ass off to make it good at skin texture and lighting without having to prompt it haha
were you thinking the meta data would show how many steps each went through? or what specifically in the metadata would I be looking for?
yeah maybe some differences would show in what was applied to each image to make them different.
gotcha yeah they look mostly the same though the blurry ones are of course smaller files
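One way to compare the metadata without any extra tools is to read the text chunks straight out of the PNG bytes. This is a minimal sketch that only handles uncompressed `tEXt` chunks and does not verify CRCs; which keys are present depends on whatever wrote the file (ComfyUI, for example, typically stores its graph under keys such as `prompt` and `workflow`, but treat that as an assumption).

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_text_chunks(data: bytes) -> dict:
    """Return the tEXt key/value pairs stored in a PNG file's bytes."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    found = {}
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        # Each chunk is: 4-byte big-endian length, 4-byte type, body, 4-byte CRC.
        length, kind = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if kind == b"tEXt":
            key, _, value = body.partition(b"\x00")
            found[key.decode("latin-1")] = value.decode("latin-1")
        if kind == b"IEND":
            break
        pos += 12 + length
    return found


# Usage sketch: compare a clean image against a blurry one from the same batch.
# with open("clean.png", "rb") as f:
#     print(read_png_text_chunks(f.read()))
```

If the clean and blurry images report the same step count and sampler settings, the difference is probably happening outside the recorded settings (an interrupted run, for instance).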
halloween ❤️
row0-dog
A note for anyone trying to train a model in sdxl. I was just able to get it working on a 3090. I used kohya_ss in ubuntu 22.04. the latest update to kohya. Text encoder latent caching was what made it work. Kept crashing in windows without error output.
countess dogula
I hate those damn dogs and see them exactly like that.
MOAR CATS!!!!
I almost have the data set complete for my cat lora
super clean
Can ComfyUI only run in a web browser, or can it run standalone?
I have a question
How to request the images generating progress?
Is there any API in the diffusers?
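For diffusers specifically, recent versions let a pipeline call a function after every denoising step through the `callback_on_step_end` argument (older releases used `callback` and `callback_steps` instead). A minimal sketch, assuming a recent diffusers version; the pipeline lines are commented out because they need the library, model weights, and a GPU.

```python
def report_progress(pipe, step_index, timestep, callback_kwargs):
    """Print progress after each denoising step and hand the kwargs back."""
    total = getattr(pipe, "num_timesteps", None)  # may be missing on some objects
    done = step_index + 1
    print(f"step {done}/{total}" if total else f"step {done}")
    return callback_kwargs  # the pipeline expects the dict to be returned


# Usage sketch (not run here):
# from diffusers import StableDiffusionXLPipeline
# pipe = StableDiffusionXLPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0")
# image = pipe("a fox in the snow", callback_on_step_end=report_progress).images[0]
```

Instead of printing, the same callback could push the step number to a queue or a websocket so a front end can show a progress bar.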
there we go
Hell yeah.
that's just my prediff, I dunno how it is going to turn out after 2nd pass and refinement
should be good
no, not necessarily 😦 I need to change my sampler I think, it makes things..cardboardy
or plasticy
hehehe
yeah it.. did something to the blood on the knife, may need to reduce the step it goes to 2nd pass
I'm going to reduce 2nd pass steps and see
the 2nd pass isn't good for photorealistic things, REALLY good for artistic style stuff,
cos it'll add some good detail, but it likes to smooth photographic details
does anyone have any idea which AI software this video was created with?
does any one know how to add a file on civit without having to redo everything?
Damn I hate civit as it deleted my file
yeah their upload process gets hung up at times.
I've had it delete stuff when it said it was successful
It deleted my original file as it said it was a duplicate and ZAP
then I had no files
typos do some funny stuff
Well, I let it stay deleted and replaced with the fixed version.
sometimes it almost seems like whoever runs civitai makes it trash intentionally
I get that it'd lag and what not when traffic is high, but it's not that
que?
Oh, snap
I sort of figured something like that was going to happen
apparently SHARK has a ROCm backend too instead of just Vulkan but it miscompiles so I haven't tested it.
but yea hopefully they help out with more of the software stack
I was watching a vid on Tue about the 5090 and ugh, but something I giggled at was when they were talking about AMD and how the AMD 7900XTX isn't really good at anything it is more like it can do but does not do anything really great PLUS fsr 3 is a failure.
how is fsr3 a failure it's not even out yet
Specs wise I guess
???
Still, they applauded them for their efforts as someone has to take risks to keep Nvidia in check.
FSR3 sounds like it'll be easy to implement if you just want an all in one cross vendor solution
the BeamNG devs said they're going to start with FSR3 then do other tech later
They then went heavy into why the 5090 is going to really disappoint gamers and went into all kinds of technical stuff.
fsr3 is out for 2 games currently
I lay odds dlss will wipe the floor with it. The reason is cross platform comes with a price AND dlss actually has hardware made for it not piggy backed.
I hope I am wrong but I doubt it for this
from reviews fsr3 adds latency, adds shimmering, adds juddering. overall the image quality is worse. its way too early and need more work
isn't dlss3 also super latent if you don't use reflex?
shimmering I saw them zoom in and yes, plus not as crisp. how in the heck you can have shimmering and blur I dunno
my 1070 can't run that shit so I've never used it tbf
reflex is in my B450 ancient mobo so I was surprised.
I'll try fsr 3 sometime once it comes to some games people actually care about.
dlss3 reduces latency from native 4k resolution
no
faked frames will never be less latent than native
native is presented immediately
personally, I haven't been a gamer for almost 6 years now so all of the game stuff doesn't really interest me.
unless you mean like dlss 120 vs native 40 or some shit
then maybe it could I guess
IDK I doubt I'd even use FSR 3's frame gen for any time. It'd mostly just be an FSR 2.1 upgrade
I'm on 4k60 not 1080p120
the top native is with reflex
of course running a lower res makes it faster
FSR 2 will be faster by that measurement
native without reflex is bottom. much higher
why does that matter?
the real comparison is the first 3
turning down res makes it faster and adding fakes frames slows it right back down again
well completely native is the bottom number
but how's that validate this?
it's reflex lowering the latency not dlss3
in this case with cyberpunk most people would be running with dlss + FG + reflex. so the result is less latency. I guess you could say its the same, but its not higher latency
then with fsr3. not the same game as its not available for cyberpunk
framegen actually gets lower?
ah yeah you're right. I remember them saying it's worse? dunno
"native AA" is just running the smoothing filters without upscaling so its gonna be a bit worse. I dont know if DLSS has an equivalent mode but in your cyberpunk test they used upscaling for everything so I wouldn't compare the native AA numbers
in any case the image quality suffers. its too early in development
seems fairly consistent with AMD's latency numbers
https://gpuopen.com/fsr3-in-games-technical-details/
but yea im sure the quality isnt as good as the hyper-specialized dlss. FSR 1 sucked ass lol.
dlss1 was bad also
2.2 or whatever Cyberpunk uses looks pretty good to me.
I'm going from 1080 -> 4k though so probably an easier jump than what most people are doing
looks pretty good with all the RT shit maxed out
cyberpunk is amazing. love it
really heavy on your pc though especially the cpu after the recent update
with ultra RT I can use the reflections on the walls to see around corners its dope. works nice with the ricochet guns
fancy lad
havent played 2.0 yet but I see they added tweaks for SMT so maybe my 7900X will actually be faster
the 7900X/7900XTX combo is pretty slick ngl. Complete top-end PC for $2500
to completely max it out with a 4090 and the 13900k would be like over a grand more I think
AMD software rollout is too slow for people to jump ship, even if NV costs more
I dont even think most of the nvidia software works on linux
rocm still has growing pains compared to cuda but that's a different thing
oh speaking of
cuda has a decade or more of dev.. so its the easier choice now days
Top 3 SDXL anime models. looks like the one on the right was trained on the 1.0 VAE cause its completely fried lmao.
prompts are
> high quality anime album cover of hatsune miku wearing a black dress in a dark red room, text reading "SING & DIE"
> gritty anime artwork of a demon fae with an unfocused twilight forest backdrop
> intricately detailed portrait of an anime catgirl in a sunlit church
> best quality, ultra highres, 8k, RAW photo, knee up shot, sharp focus, insanely detailed, highly detailed photo, photorealistic, hyperrealism, continuous angles, perfect composition, natural geometry, rule of thirds, centered, upright , film grain young woman looking past the camera, red lipstick, hot af, freckle, blue eyes, slim petite body, huge breast long fade black flowing hair, 35mm analog film photo, sitting reading a book in a library in a green tank top & blazer, pantyhose, choker
#1 most downloaded anime model and its absolutely fried lol
well you need to turn down the heat, bud
bru
im surprised the amount of people that put like 50k steps into training without even realizing they're using super fucked up settings
I do believe a lot of them have no idea what they're actually doing
that's clipped from the 3rd image in my anime model test. if you open the full 4k pic they're literally all fried
like how do you not see that on your validation images when you're half way through...
tried with a super low cfg? sometimes that can help
still has the vae issue in the image regardless of cfg
yea that's not mild issues from a cranked CFG that's straight up baked and i'm only on cfg 8
#1 most downloaded anime model on Civit lmao
hi
yeah the rainbow effect
without tweaking some things cfg 8 with sdxl usually seems too high
what's going on
not that it should be too high
all these are cfg 8 @ 30 steps ddim no refiner
well I use way higher than 8. but I do things
the first two look just fine
but yeah, either way the artifacts aren't okay
all that chunky residual noise means it was likely trained with the 1.0 vae which basically just ruins the entire checkpoint
throw it out and start over I guess
Yeah this is why I default to using a manually loaded vae
It makes sense that a model improves a lot with a small set of exceptionally high quality images. The Meta Emu paper reaches the same conclusion: they train on 2000 exceptional images and easily beat their general models and SDXL. They call it "quality" training.
yeah no matter the model I always use the 0.9 vae loaded manually
Training with exceptional images also improves results in general, even for things not included in the training images. I am testing quality training on SDXL.
Nice. Could you try an architectural rendering like this one, out of curiosity:
modern Scandinavian house, by Andreas Levers, foggy, oppressing, cold tones, snowy
neg: text, watermark
that wont help if they literally train the model with it
yeah definitely wont help if its trained
I always do a zoom check to determine if I keep a downloaded model
poor Counterfeit lol. throw out the whole checkpoint and start over
i dont think im keeping any of the anime models tbh. i decided to try the top ones to see how they're coming and it's like OK at best or kentucky fried at the worst.
seems like XL tuning in general is still kinda early days
my tops I end up using are realvis, copaxvivid (no longer available), zavychroma, nightvision, dynavision
in 1.5 I just had 4 different overtuned models for different levels of realism and changed depending on what I was prompting.
xl however it seems to be much better at one model does all so the multiple overtune approach isnt really worth it so far...
1.5 I ended up with my own merged model I just stuck with 100% of the time.
I don't think SDXL is at that point yet, still so early
realvis is super fucked up isnt it?
like if you prompt a dog it gets hella artifacts
stuck to own merge model at 1.5
this one? I haven't experienced that
Havent tried realvis yet. I think its not good with artistic gens right, just realism?
Yeah it's best with realism for sure. But I pair it with more artsy ones in my flow to get stuff like my painted dancers I'm posting.
Realvis base with copaxvivid precon/refiner
realistic vision 2 was my "most real" overtune in my collection of 4 and it did pretty good
I merged the 1.5 version, as you mentioned
went comics -> ghostmix -> dreamshaper -> realvis
depending how real
merging with sdxl isnt as easy as 1.5
we need the mythomax guy to do a super advanced uber-merger for XL lol
though maybe after people figure out wtf they're doing with the training
ill try in a bit its dl-ing
Same guy that made Realistic Vision for 1.5. Love his stuff
Sytan posted some realvis 2 images earlier when he was comparing to his lora and pictures of dogs and stuff were all super borked
but maybe that's just a skill issue
I've been mostly using realvis
one interesting quirk I found out with the sd 1.5 version of realvis is you could easily coerce it into doing like DnD races by doing things like analogue photograph, dragon, (midriff:0.7) and the small amount of 'human' anatomy would be enough to completely coerce it into making a humanoid creature.
without it looking like furrybait
wonder if some of these would be better as just loras instead of full unet tunes? might be a lot less of a struggle
yeah even SAI were saying loras are the best option
95% of the XL stuff on civit is checkpoints though lol
no anime or realism loras afaik
still early days for XL. give it some more time
One thing Im constantly annoyed about, which persisted from 2.1 is long necks. See it far too often and cant avoid it
I don't really even pay that much attention to the full models anymore. have a couple in rotation at any given time, but loras are where the magic happens. and my image blender
what loras do you use?
"and what's wrong with long necks??"
yea so far I havent had it microwave any pics into green goo yet. there's probably some keyword that its super overfitted on that Sytan happened to stumble upon
mostly mess with the various loras that are supposed to improve the image. not even sure which ones are ideal. and I have a couple loras based off the images I've made 
long neck
> which loras do you use
> "all of them"
well, I could look up the names. one moment
nah its fine lol if you're just downloading random shit I do that anyways already
sometimes while im eating dinner I just let SD create a spreadsheet of loras/models/prompts or whatnot
I don't really analyze them all that much. I'll just load one and if it seems to work with what I'm doing I use it
I wish the major authors would try making Lora versions of their models, I feel like they'd do pretty good
I do like certain loras for specific things. have a blueprints lora and a schematic lora
those are fun sometimes
like a RealVis Lora or something
couldn't you just make one?
me?
well anyone
like diff the weights from the existing realvis 2 or train one from scratch using a similar dataset?
subtract the base sdxl weights from the realvis weights
I did it with dreambooth finetunes I did
A cinematic photograph of a modern Scandinavian house in a foggy and snowy forest
Mine, realvis, RealismEngine, RealStockPhoto
oh sytan what was that prompt you used on realvis that fried the corgi
might not be the best approach, but it did work. that was 1.5 though
Realvis fries the second you don't mention a person, so pretty much anything with an animal lmao
idk my corgis look fine
never been a huge fan of the real vision models. I guess because not a big fan of just making images of regular people
hey, let's sit around and render images that look like regular pictures. and let's do it all day every day
they're so overtuned on people if you prompt it with something like a dragon then add a tiny bit of human anatomy it coerces it into a hybrid
least on sd 1.5
I like the anime models that can't stop themselves from adding nudity even when you put nudity into the negative prompt
That's not bad, not great either tho
are we making corgis now?
so just like go through the weights trim anything below a certain delta and compile it into a lora?
Real vision breaks less when you hand hold it more. It needs more stuff to stop it from breaking, but my prompts are super simple
idk its a super simple prompt. just analogue photograph of a corgi on the beach in the sparkling daylight
Not sure then
oh did you see my anime comparison earlier?
honestly I used a tool to do it. maybe kohya or something. but I believe that's the basic concept. as I understand it the fine tunes don't take from the model, just add. or am I wrong about that?
so just remove the base model weights and what's left is your lora
that'd produce a full size checkpoint though no?
no
cause if you subtract a float from a float you get another float
so you have to trim away those below a delta
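A rough sketch of the "subtract base from finetune" idea for a single weight matrix. As far as I know the extraction scripts compress the difference with a truncated SVD rather than thresholding small deltas, so that is what this shows. The function name, the rank, and the toy matrices are made up for illustration; this is not the code of any particular tool.

```python
import numpy as np

def extract_lora_pair(w_base, w_tuned, rank):
    """Approximate (w_tuned - w_base) as up @ down with the given rank."""
    delta = w_tuned - w_base                      # still a full-size float matrix
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    up = u[:, :rank] * s[:rank]                   # (out_features, rank)
    down = vt[:rank, :]                           # (rank, in_features)
    return up, down

# Toy data: a "finetune" whose update really is rank 4 (an assumption for the demo).
rng = np.random.default_rng(0)
w_base = rng.normal(size=(64, 32))
true_update = rng.normal(size=(64, 4)) @ rng.normal(size=(4, 32))
w_tuned = w_base + true_update

up, down = extract_lora_pair(w_base, w_tuned, rank=4)
error = np.abs((w_base + up @ down) - w_tuned).max()
stored = up.size + down.size                      # values kept instead of w_base.size
```

The raw delta is as big as the checkpoint, which is the "float minus float is still a float" point; the size win only comes from keeping the two thin matrices. Real finetune deltas are not exactly low rank, so some error is expected there.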
A cinematic portrait photograph of a tiger in a dense and vibrant forest at sunset
Mine, real vision, realism Engine, real stock photo
fair enough. like I said, I don't know the intricate details of the process. just know that it's possible and I've done it. albeit with someone else's program
RealVision cooks almost all non human prompts for me. Perhaps it's my settings, but I am not changing them to accommodate one model not doing well
im wondering if its overfitted on certain specific words
like "vibrant"
let me reproduce
It does it even without vibrant
deliberate was always better in 1.5
okay mine look absolutely nothing like yours lmao
ty
i dont know if that even existed when I downloaded realvis. I'm only on realvis like v1 with SD 1.5
Those look less cooked, but still cooked for sure
its the normal amount of cooking for stacking cinematic prompts on realvis
yours is like rage montage fried
do you tend to stay with the same sampler/scheduler or do you vary it much? How much difference do you find that makes if you swap around?
I haven't swapped around, as I have tested 7 models, and real vision seems to be the only one that breaks like this
maybe your negative is a lot different than mine???
idk why else itd be so different lol
deliberate was less about straight up absolute realism, but did it just fine. and was also optimized for long prompts. and I'm a fan of long prompts
A cinematic photograph of a white tiger in the snowy mountains
Mine, realvis, RealismEngine, RealStockPhoto
I've read guides that said to keep prompts short, and I feel like the people that made those guides don't know how to make cool things
It only does it with animals, but it's very consistently an issue with real vision, which is why I disqualified it from my meaningful comparisons
is realvis the upper right?
Yes
No idea, it's the only one that does it, and it only does it with animals
same prompt in realvis...
This is RealVisionXL 2.0
maybe it doesnt do it in squares???
yea I just downloaded that one like 20 minutes ago
I am not sure in that case really
lemme try widescreen
for 1.5 that helped but I feel like the inverse is true for XL.
this is how I prompt now. tired of typing them out
pseudo mentioned something about how XL doesnt have attention on the first layer which means if you have a mostly empty prompt it gives attention to nothing
16:9 tigers same prompt
not anywhere near as blown out as your realvis...
whats your sha256sum?
A cinematic portrait photograph of an old man sitting in the snow wearing a sweater and scarf smiling
Mine, realvis, engine, Stock photo
The second its not animals, it works fine
Also as a heads up, I need to sleep soon
damn why so early
It's 12:16 AM, and I have to be up in 7 hours for my flight
so many variables, so little time 🙂
A cinematic portrait photograph of a tiger in the desert at sunset
Mine, real vision, engine, photo
Looks like it was sampler, weird
which was the sampler and what you using now?
what sampler?
Now I suppose I can give real vision a better fighting chance against my LoRA
ive never seen that
I was using uhhhh
DDIM?
No
have you messed with the polyexponential scheduler?
Dpm 2m sde
I found that one gave slightly better dynamic range
oh
all the 2m samplers need karras scheduling
they're built around it
they might also work with exponential I think
I was using karras
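For reference, a small sketch of what "karras scheduling" means: the noise levels are spaced with the rho formula from Karras et al. (2022) instead of evenly. The default sigma_min and sigma_max below are assumed typical values for SD-style models, not numbers from this chat, and the function name is just illustrative.

```python
def karras_sigmas(n, sigma_min=0.0292, sigma_max=14.6146, rho=7.0):
    """Return n noise levels going from sigma_max down to sigma_min."""
    if n == 1:
        return [sigma_max]
    min_inv = sigma_min ** (1 / rho)
    max_inv = sigma_max ** (1 / rho)
    # Interpolate linearly in sigma^(1/rho) space, then raise back to the power rho.
    return [(max_inv + i / (n - 1) * (min_inv - max_inv)) ** rho for i in range(n)]

sigmas = karras_sigmas(10)
```

With rho at 7 most of the steps land at low noise levels, so more of the step budget goes to fine detail.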
lemme test
if you're 100% not using the refiner I found 2m non-sde converges faster than ddim.
yup there it is on SDE lol
Time to see if my realism LoRa has any competition or not :p
ddim took like 500 steps to converge when 2m only took like 60 once so im kinda putting ddim on the side when im not refining
nightvision and an older gen mixing a cat with tiger
one image i was working on had a really deep false bottom that took ddim well over 100 steps to start to get out of, while dpm++ 2m got out of it in like 50
what's with the halo around the tiger?
Woah, DDIM seems to have closed the lead my LoRA had
looks like a cardboard cutout
Not fully, but the other models look better than before
try 2m karras
theres a halo?
Big time
Huge halo
ah yeah see it now
its a 1024 image make sure you zoom in when you test models
i A|B without zoom first then zoom in to nitpick
I still feel like images like this are where mine really shines in realism and lighting
Mine, real vision, realism Engine, real photo
1 and 2 100%
thats something else to look out for in future. I never looked for it
are 3 and 4 the stock image models?
you said two were stock image earlier but idk if that's real engine or not
looks like it?
unless realvision is but I dont recall
ah
where do you get them?
Various free sites, I wish not to give away allllll of my secrets :p
I feel like I'm too lazy to make loras on that level
I am gathering several times as many as I have now for my next LoRA training within the next couple weeks
I caption every single one meticulously by hand too haha
I do respect the effort though. definitely not knocking it in any way
man, if anything I'd try to get blip 2 to do the captioning
neat thing about loras is the hand caption kinda works out
yeah
Anyways, I really need to sleep now
im tempted to try and make some obscure character loras like that once ROCm matures a bit
dang 500
And my end goal is 2500
for a char I'd probably only do like 50 lol
You have to remember, I'm basically doing a full fine-tune in a LoRA. My model greatly improves all aspects of realism and photography over base SDXL
So I need a diverse enough data set for it to extrapolate from
I just trained a lora on 6000 video stills from 80 music videos. Seems to have the effect of making the image look like a video still. lora on the left, base right
got some planet earth stills?
wait did you manually caption 6000 images?
no lol, used a clip model
I'm manually captioning every single image
I just couldn't do it, lol
Sytan maybe you could use CLIP then just doctor them all by hand?
to hit a higher count
that's what I was thinking
I'm trying to caption for ethnicity, lighting style, f value, subject, image format, location, time of day, and much more. And it also has to be in a meticulous order of importance too
damn OK
Honestly, CLIP is more of a hindrance from what I have seen
clip, blip, one of those. or use clipvision and create non textual captions
I have tried it a few times for the 20 and 50 image trainings of my LoRA
I have done 20, 50, 90, and I wanna do 500, and 2500
wonder if you could use PIL to just pull the camera settings from the image metadata
ANYWAYS ok, I really need to go guys, peaceeee
if you're using original source images they probably have ISO and aperture right in the meta
@ me with any questions or comments you may have and I will respond when I wake up
I have done that, read the metadata and converted the camera settings to captions
i have that set up on my phone lol
but because the photographer used so many settings, I generalised them to 'fast shutter speed' instead of the actual number used
not reading but storing settings in the meta
Slow Shutter speed, Large Aperture, ISO 200, Olympus E-M1MarkII camera, F 2.8
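A minimal sketch of the metadata-to-caption idea, assuming Pillow for reading EXIF. The tag numbers are the standard EXIF ones; the function names, the dict keys, and the caption wording and order are assumptions for illustration, and the bucket thresholds are the ones quoted in this chat.

```python
from fractions import Fraction

EXIF_IFD = 0x8769        # pointer to the Exif sub-IFD
EXPOSURE_TIME = 0x829A   # exposure time in seconds
F_NUMBER = 0x829D
ISO = 0x8827

def read_camera_settings(path):
    """Pull exposure time, f-number and ISO out of an image's EXIF block."""
    from PIL import Image  # Pillow; imported here so the caption helper works without it
    with Image.open(path) as img:
        exif = img.getexif().get_ifd(EXIF_IFD)
    return {
        "shutter": exif.get(EXPOSURE_TIME),
        "f_number": exif.get(F_NUMBER),
        "iso": exif.get(ISO),
    }

def settings_to_caption(settings):
    """Turn raw camera numbers into generalised caption text."""
    parts = []
    shutter = settings.get("shutter")
    if shutter is not None:
        seconds = float(shutter)
        if seconds < 1 / 500:
            parts.append("Fast Shutter speed")
        elif seconds <= 1 / 60:
            parts.append("Medium Shutter speed")
        else:
            parts.append("Slow Shutter speed")
    f_number = settings.get("f_number")
    if f_number is not None:
        f_number = float(f_number)
        if f_number <= 2.8:
            parts.append("Large Aperture")
        elif f_number <= 5.6:
            parts.append("Medium Aperture")
        elif f_number <= 11:
            parts.append("Small Aperture")
        else:
            parts.append("Very Small Aperture")
    iso = settings.get("iso")
    if iso is not None:
        parts.append(f"ISO {int(iso)}")
    if f_number is not None:
        parts.append(f"F {f_number:g}")
    return ", ".join(parts)

caption = settings_to_caption({"shutter": Fraction(1, 30), "f_number": 2.8, "iso": 200})
```

Images that have been re-saved or scraped often have the EXIF stripped, so missing values are just skipped here rather than treated as errors.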
just like
```python
if values["shutter"] < 0.01:
    caption += "fast shutter speed"
```
```python
def cluster_shutter_speed(value):  # def line was cut off in the paste; name is a guess
    if value < 1/500:
        return "Fast_Shutter"
    elif 1/500 <= value <= 1/60:
        return "Medium_Shutter"
    elif value > 1/60:
        return "Slow_Shutter"
    else:
        return ""
```
yea
```python
def cluster_f_number(f_number):
    if f_number <= 2.8:
        return "Large Aperture"
    elif 2.8 < f_number <= 5.6:
        return "Medium Aperture"
    elif 5.6 < f_number <= 11:
        return "Small Aperture"
    else:
        return "Very Small Aperture"
```
😉
i havent either but looking at that elif return chain made me want to
only 3 checks 😄
looks like the way python is set up it wont actually be any smaller
if it was an exact match I would use a dictionary
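For ranges, the closest thing to a dictionary lookup is probably a sorted list of edges plus `bisect`. A small sketch using the aperture buckets from above; the constant names are made up.

```python
from bisect import bisect_left

F_NUMBER_EDGES = [2.8, 5.6, 11]  # upper bound (inclusive) of each bucket except the last
F_NUMBER_LABELS = ["Large Aperture", "Medium Aperture", "Small Aperture", "Very Small Aperture"]

def cluster_f_number(f_number):
    # bisect_left counts how many edges are strictly below f_number,
    # which is exactly the index of the bucket it falls into.
    return F_NUMBER_LABELS[bisect_left(F_NUMBER_EDGES, f_number)]
```

Adding a bucket is then a data change (one more edge and label) instead of another elif.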
what about elegance
```rust
fn cluster_f_number(f_number: f32) -> String {
    match f_number {
        ..=2.8 => "Large Aperture",
        2.8..=5.6 => "Medium Aperture",
        5.6..=11.0 => "Small Aperture",
        _ => "Very Small Aperture",
    }.to_string()
}
```
did a couple more gens trying to get the same angle as the other image. much less noticeable on these ones
could do it directly on the dataframe like this
```python
df['aperture_category'] = df['f_number'].map(
    lambda x: "Large Aperture" if x <= 2.8 else
    "Medium Aperture" if 2.8 < x <= 5.6 else
    "Small Aperture" if 5.6 < x <= 11 else
    "Very Small Aperture"
)
```
I'd 100% do that but I'm also the type of person to chain together higher order functions to entirely build large structures in-place so I'm probably not a voice of reason
python lambdas always scare me because they have the weirdest move semantics
df['a'] = df['f_number'].map(lambda x: ["Large", "Medium", "Small", "Very Small"][int(x>2.8)+int(x>5.6)+int(x>11)]+" Aperture")
but why?
why is it gross or why do it in the first place?
walk away from your computer for a minute then come back and glance at that snippet and see how long it takes you to understand wtf is going on
the match and lambda versions are infinitely more sane.
still better than trying to come up with formulas in excel 😄
you could also just not use excel
tell that to everyone else at my company
I just assumed it was my iq deficiency
dataframes are awesome, I use them all day everyday
meant to quote your other response there, but iq deficiency
let me syntax highlight these
```rust
fn cluster_f_number(f_number: f32) -> String {
    match f_number {
        ..=2.8 => "Large Aperture",
        2.8..=5.6 => "Medium Aperture",
        5.6..=11.0 => "Small Aperture",
        _ => "Very Small Aperture",
    }.to_string()
}
```
```python
df['aperture_category'] = df['f_number'].map(
    lambda x: "Large Aperture" if x <= 2.8 else
    "Medium Aperture" if 2.8 < x <= 5.6 else
    "Small Aperture" if 5.6 < x <= 11 else
    "Very Small Aperture"
)
```
```python
df['a'] = df['f_number'].map(lambda x: ["Large", "Medium", "Small", "Very Small"][int(x>2.8)+int(x>5.6)+int(x>11)]+" Aperture")
```
there
oh yeah, well I can see what that's doing
so compare 1 and 2 to 3
your eye naturally follows the first two's control flow while 3 is just schizo
you might like nushell. has dfr ops as a shell built-in
so you can just dfr open fat_thing.db | do {...}
i have it set to my default instead of Bash for a while and its aight
I usually am making large python notebooks to analyse transactional data
bit kinky at points though
supposedly its faster than pandas but idk I mostly just use the ops to quickly edit json
cause editing json by hand in an IDE is pain. why are commas delimiters. why cant you have trailing commas. why is that a thing.
I usually just load the json, edit it like a python dict and then save it as json. don't even look at the raw file
but normally the data I'm using with pandas is csv
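The load, edit as a dict, save round trip looks roughly like this in Python. The helper name and the demo file contents are invented for the example.

```python
import json
import tempfile
from pathlib import Path

def set_json_field(path, key, value):
    """Load a JSON object, change one top-level key, and write it back pretty-printed."""
    path = Path(path)
    data = json.loads(path.read_text(encoding="utf-8"))
    data[key] = value
    # json.dumps never emits trailing commas, so the file stays valid JSON.
    path.write_text(json.dumps(data, indent=2) + "\n", encoding="utf-8")
    return data

# Demo on a throwaway file so nothing real gets overwritten.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "file.json"
    demo.write_text('{"name": "Bob", "tags": ["a", "b"]}', encoding="utf-8")
    set_json_field(demo, "name", "Tim")
    result = json.loads(demo.read_text(encoding="utf-8"))
```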
yea that's what I do with nu.
```nu
mut data = (open file.json)
$data.name = "Tim"
...
$data | save -f file.json
```
but yes, been caught out a few times by trailing commas when editing by hand
pretty print ✨
would be nice to use something other than windows cmd
my work even blocks powershell
damn
on your home PC you can set up "Windows Terminal" on win10+ with whatever custom shells you want
its pretty good actually
I run my TUI music player in it and all the mouse gestures and everything work properly as well.
lmao what
who's shad
and why do you have beef with him
normal twitter things
i mean its cringe I guess but ¯_(ツ)_/¯
if he thinks he's a god then let him sit on his false throne
meh there's a lot of people around who think they've become artiste because they can prompt 'hot girl', but then fail to fix fingers in photoshop
why spend 5 minutes fixing in photoshop what you can spend an hour inpainting
I'd rather just generate 10, 20 images and pick the one without the screwy hand
idk i think its fun inpainting to try and make a "full ai" image
that's what's up lol
XL you can but on 1.5 I definitely spent a while inpainting
let's put her hands in her pockets 
at least for high res
no hands!
XL I've gotten some decent 3840x2160 stuff without any inpainting by limiting the composition to side profiles which avoids hands and chameleon eyes
yeah, but with 1.5 it would take maybe 50-100 images before you got something decent, and then you would want to spend time fixing the little issues
yea my Midna wallpaper took like 60 or so gens to cherrypick one for upscaling then an additional like 50 iterations of inpainting small details @ 1080p
sucked but in the end I got a fully AI image of a character that's hard for SD to make
even with a good image you can usually improve it some more with inpainting
in XL I just rotate my wallpaper every few days so I dont bother inpainting lol
meh. I used to inpaint more
my 1.5 wallpapers stuck around for months because they took so much inpainting and upscaling
look how much shit I had to inpaint lol
at that point I would have just done more gens
that was super cherrypicked
I used to spend so much time generating nice looking portraits and inpainting for hours, now I don't really care about people portraits
SD has 0 concept of what "midna" is so I was using Lykon's lora which is okay-ish
reference
IPA and image blending is good for stuff like that
even then I took lots of liberties to make it easier
idk what IPA is but controlnet didnt help much
IP Adapter
I used controlnet to latent upscale while keeping the proportions on-model but other than that it wasnt helpful
yea idk if that was a thing when I made that wallpaper
I've still never used it tbh
its like an advanced img2img
I found by running game screenshots or official art through img2img w/ controlnet it'd either be too close and obviously sourced or totally fucked up
so I just yolo'd and went pure t2i + upscale and inpaint
maybe once ROCm improves a bit more I'll make a proper Midna XL lora
also that looks like Anthony from linus tech tips lmao
the major release is down to three blockers on miopen 👀
ipadapter is image prompt adapter
basically reduces images to 224x224, then clip vision does its clip thing and returns tokens that can then be used in conditioning instead of text prompt tokens. well that's clip vision I guess. but from there the tokens can be sent through ipadapter, which turns them into embeds and feeds them into the model. it takes the clipvision output as the image prompt and adds a decoupled cross attention layer to insert the image data into the model
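A toy numpy sketch of the decoupled cross attention idea: text tokens and image tokens each get their own key/value projections, share the same query, and the two attention outputs are summed with a scale on the image branch. The shapes, random weights, token counts, and names here are all invented for illustration; the real thing lives inside the UNet's attention layers with trained projections.

```python
import numpy as np

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def attention(q, k, v):
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def decoupled_cross_attention(latent, text_tokens, image_tokens, w, scale=1.0):
    q = latent @ w["q"]  # one query shared by both branches
    text_out = attention(q, text_tokens @ w["k_text"], text_tokens @ w["v_text"])
    image_out = attention(q, image_tokens @ w["k_image"], image_tokens @ w["v_image"])
    return text_out + scale * image_out

rng = np.random.default_rng(0)
dim = 16
w = {name: rng.normal(size=(dim, dim))
     for name in ("q", "k_text", "v_text", "k_image", "v_image")}
latent = rng.normal(size=(64, dim))        # flattened image latent positions
text_tokens = rng.normal(size=(77, dim))   # stand-in for text encoder output
image_tokens = rng.normal(size=(4, dim))   # stand-in for projected clip vision output

out = decoupled_cross_attention(latent, text_tokens, image_tokens, w, scale=0.6)
text_only = decoupled_cross_attention(latent, text_tokens, image_tokens, w, scale=0.0)
```

Setting the scale to 0 gives back plain text conditioning, which is why the image prompt strength can be dialed without touching the text prompt.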
won't pretend to understand every intricacy of the process. but it's far more interesting and has far more potential than I think a lot of people realize
so clipvision but less underwhelming?
yeah, more focused
and clipvision is a lot more useful, potentially, than most people realize
@nimble heart I tried a Linus tech tips prompt 🙂
I don't like that face at all
looks like one of those youtubers that gets caught trying to solicit nudes from underage fans
probably has
how about that hand on the table
beefsteak hand
that's why he DMs underage girls; the adult ones get weirded out when he whips that thing out.
definitely 100% his hand and not his terrible personality
that's what the chainsaw's for, that extra finger
it always surprises me when I find out that youtuber influencer type folk turn out to be huge narcissist dirtbags. who would've ever guessed?
im surprised how much of it was concentrated in the Minecraft youtubers specifically
like half a dozen of the top names turned out to be either pedos or assaulters or otherwise terrible
like that 2010-2016 minecraft youtube era
I think it's probably more prevalent than we know. youtube has a lot of unsettling stuff going on
got that extra finger off
basically all of the predator catching channels besides Skeeter Jean have gotten removed
not sure if it's still a thing, but so many channels where women would livestream themselves doing regular stuff like doing their dishes, but with upskirt shots. and it's like they were forced to do it. or some of them. saw something about it years ago. one woman had like 6 or 8 channels
and her kids in the videos
ugh
or those videos that would somehow pop up into the streams for little kids, but be super messed up cartoons
oh yea like the weird spiderman elsa tub shitting things in YouTube kids for a while
it really creeps me out
because it doesn't seem like it's just a stupid troll doing it
it sounds kinda unreal saying it out loud in a sentence like that lol
we live in a post-rational society
if someone doesnt know that happened I'd sound totally insane
it sounds insane to me and I know it happened
I remember some video documenting all of this went viral calling YouTube out and I looked up the channels myself and it was super sketch. Like not even that old, and video #1 -> present are all like almost the same. least organic channels imaginable with 4M+ views per vid
yes. made me think youtube had to know
"elsagate" I guess it was called
wasn't there a ted talk a while ago that warned about this happening with the way the algorithm works?
that's it
still some weird shit on there but it's not algo boosted to heaven anymore
don't think I saw the ted talk
I watched a bunch several years ago
well like 10
tedx you just pay for right?
lol
here's some money, let me talk
I watched them when they were on netflix. some of them were pretty informative/helpful
Writer and artist James Bridle uncovers a dark, strange corner of the internet, where unknown people or groups on YouTube hack the brains of young children in return for advertising revenue. From "surprise egg" reveals and the "Finger Family Song" to algorithmically created mashups of familiar cartoon characters in violent situations, these vide...
yea young me didnt realize this. I was wanting to watch some more after a teacher used one in school and the first thing I landed on was a dude fucking talking about how his jerk-off fantasies were "becoming more and more violent" until he suddenly realized that beating women == bad and now he's a nofap dude I guess.

