#🏞|general-with-images
2 hour finishing render... absolutereality with a finishing of icbinp
i successfully converted the pickles to safetensors and modified the UltraPixel code to be able to load those instead
it works
since you know a lot about cascade
do you think ultrapixel is a really big deal?
or is it not really
actually trying it for the first time now
Didn't really look like a real gamechanger to me
i'm pretty sure it was just dumbassery, but you never know... zero chance i'm touching a pickle that's pretending to be safetensors
do you think we can get similar resolution and quality with existing models
We can get hands with 4 fingers with other models, too 😄
Pretty new to this stuff. I get that safetensors are made for not implementing malicious code, but what kind of code would they want to inject?
What is a pickle in this circumstance?
a pickle is data serialized with python's pickle format, which pytorch uses for its checkpoints... all you need to know is that it's like the old days of invisible trojans on a thumb drive running when you just plug it into your computer
merely loading one is enough to execute any malicious code that may be hidden within
there's scanners that can help spot a lot of the simpler cases of malicious code, but like any binary, there's no way to be sure
safetensors are so easy to use that there's honestly no excuse for using pickles for literally anything anymore except your own convenience on your own system
i use them for saving checkpoints while training stuff, in case i need to back up and restart, etc
but you shouldn't ever be sharing them unless it's with someone that has very good reason to trust you completely
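To make the "merely loading one is enough" point concrete, here is a minimal and harmless sketch using only Python's standard `pickle` module. The `Payload` class and the `"6 * 7"` expression are made up for illustration; a real attack would put something far worse in place of the arithmetic.

```python
import pickle


class Payload:
    """Hypothetical object whose pickle form carries an instruction, not data."""

    def __reduce__(self):
        # pickle stores "call this callable with these arguments".
        # On load, the unpickler performs that call, whatever it is.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())

# Nothing below calls eval explicitly, yet the expression is evaluated
# during loading and its result replaces the original object.
result = pickle.loads(blob)
print(type(result).__name__, result)
```

The loaded value is the result of the attacker-chosen call rather than a `Payload` instance. Recent PyTorch versions offer `torch.load(..., weights_only=True)`, which restricts what the unpickler accepts, but that is a mitigation and not a reason to trust a shared pickle.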
So are models from civitai and huggingface generally safe? How does the user without coding ability discern between safe and unsafe when even those named safetensors may not be?
that's exactly the problem
i haven't checked whether comfyui will load a pytorch dump falsely named as a safetensors
i'm guessing/hoping it won't, but that's guessing
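On the "pickle pretending to be safetensors" worry: as far as I know, a safetensors file begins with an 8-byte little-endian length followed by that many bytes of JSON header, so a renamed pickle fails a simple structural check. A rough sketch with the standard library only; `looks_like_safetensors` and the 100 MB header cap are assumptions for illustration, and passing this check says nothing about how ComfyUI itself decides what to load.

```python
import json
import struct


def looks_like_safetensors(path, max_header_bytes=100_000_000):
    """Return True if the file starts with a plausible safetensors header.

    Hypothetical helper: it checks structure only, not the tensor data.
    """
    with open(path, "rb") as f:
        prefix = f.read(8)
        if len(prefix) < 8:
            return False
        # First 8 bytes: unsigned 64-bit little-endian length of the JSON header.
        (header_len,) = struct.unpack("<Q", prefix)
        if header_len == 0 or header_len > max_header_bytes:
            return False
        raw = f.read(header_len)
    if len(raw) != header_len:
        return False
    try:
        header = json.loads(raw.decode("utf-8"))
    except ValueError:  # covers both bad UTF-8 and bad JSON
        return False
    # The header is a JSON object mapping tensor names to dtype/shape/offsets.
    return isinstance(header, dict)
```

A pickle or a zip-based torch checkpoint renamed to `.safetensors` will almost always fail either the length check or the JSON parse.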
We are always at risk with trojans etc. With ai image generation I am mostly concerned about the software secretly uploading my prompts and settings to steal my copyrighted work. I imagine just restricting the internet access of the browser is not enough when python is running in the background...
that should be the least of your concerns imo
Perhaps. But still a concern.
many of us (myself included) leave everything embedded in all images so ppl can learn from the methods and recreate everything
so no one's going to be motivated to find yours if there's any barrier whatsoever (even just not embedding them)
I do that most of the time for open work, but not for commissions.
yeah i just think there's too much low hanging fruit out there where zero effort is needed
myself my primary concern is ransomware
Exactly. Good point
i'm careful to keep any financial shit confined to a separate system that i barely ever use cuz it's old and shitty
so i'm not worried about CC theft etc
but ransomware can be a fucking nightmare
What is your method for expanding the outpaint without generating new characters?
while i love cryptocurrency as a tech, ransomware wouldn't exist in the automated form it does if it wasn't for cryptocurrency networks allowing them to collect money with pure anonymity. the potential for cyberattacks is going to get worse now that models directing attacks can just go on and on. auto-gpt was just the beginning of autonomous instruct agents like bitcoin was the first successful p2p digital currency. it's going to enable a LOT of bad
with safetensors, aren't the model weights encrypted? so technically, the models aren't really open source?
they're not encrypted
they don't include the architecture for the model though, unlike the pickles where you can just load the whole thing
but that's better anyway: the code is separate from the data
right now, the people running a cyberattack have to run it. like they can automate a lot but, at some point, they gotta be there making it happen. authorities can enforce on that person. but language model agents don't have to sleep. you can facilitate their perpetual existence and they can operate large network attacks so long as one instance is running somewhere that knows the right encryption key.

okay, that shows your nvidia GPU carrying the load, which it should be
and it sometimes becomes 16% and then immediately 0% .. but if the CPU is intel, then it is higher than 0% always
i think, maybe, their stats might have issues. but, well - are you getting the sort of speed and results you're paying for?
I don't think so
then i might suggest you don't use them.
we are doing some video processing. when the cpu is AMD, then it takes around 10 mins.
the same GPU (4090) handles it in 5 mins if the cpu is intel
why are you renting gpus?
what do you advise?
i need to know why you're renting them first
we have an AI project and I need to do some video processing with SD. so our machines are not strong enough. I have m2 max and it completes the job in 10 mins which is quite slow
that's why I thought that we need a good GPU resource.
https://github.com/2kpr/ComfyUI-UltraPixel/pull/27 for anyone that wants to use UltraPixel, here's the fix and the real safetensors i made
okay, to start with I'd suggest contacting their customer support, telling them what you are doing, and seeing if they can give you recommendations for the best options.
they said that they are not experts for SD.
that's why I joined here
then see if you can always get intel processors so you always get all the work being done by the gpu
your other options are things like Comfy online, huggingface, or googlecolab
yes. Maybe I need to install something special for AMD? I am a beginner in this stuff.. I am a developer but this is the first time I am using SD
I need to call an API. so SD provides an API
it's running on THEIR servers, what would you install that would affect that?
I think, it is not their server(s). People put their devices on their systems.. but we use "templates". I don't know.. maybe Torch has a special version for AMD?
just try to make sure you always get intel
yes but the issue, there are not a lot of intel options 😄
this is my concern
it doesn't work like that. what you're seeing is the AMD cpu taking over the graphics processing.
exactly
the cpu usage is 99% and gpu is 0%
what would you install that would tell the bios of the AMD cpu not to do that?
but if it is intel, then gpu usage is high
maybe just buy a machine like this https://www.newegg.com/abs-aqa14700kf4060ti16g-stratos-aqua/p/N82E16883360436 and process at home?
it will not work. Because we'll create an app so if there are many requests then one pc will not be enough
one video process takes 5 mins and this is quite long. When there are 1000 requests at the same time, I am done 😄
i promise you, it's more than enough, but whatever - just make sure your host is only giving you intel processors. you'll probably have to talk to them and get some sort of special set up
It cannot be.. how can I handle 1000 requests?
i think maybe you're in need of an enterprise solution if you are doing that. i would suggest you talk to AWS about this. that's not a single user setup, you're running a business
maybe the rented gpus are not strong enough? The vast.ai support told me that it is an entire gpu, not a core. So it is supposed to be fast but it is not..
yes but we are kind of startup
I cannot afford high expenses for now
so what? you need a business solution. you won't get that by trying to rent what's been created for single users
you're looking at the single user prices and assuming enterprise solutions will use those. you need to talk to the hosting companies about their business solutions
I see your point
on Vast ai
tick the box "Secure Cloud (Only Trusted Datacenters)"
and in the drop down, select "On-Demand" not "Interruptible"
and make sure Unverified Machines is not ticked
this will get you much more reliable machines
I just want to increase my budget step by step
also, the hosting companies have not set up their single user solutions expecting a user to do that much processing, that fast - so you're likely causing them problems.
thank you for your tips. I am taking my notes.
I am also looking at other companies.
they're gonna get you into a Ferrari in no time
you also need to roll your costs into what the user pays you.
sure, but let me start with 1-2 rented devices.. If I see that I am earning something (at least my expenses) then I'll rent more, or maybe I'll buy devices
not for the speed you apparently want, and the number of requests coming in you said you had.
that's not alpha testing, or even beta testing. you're in production
if single process takes 5 minutes, and if 1000 requests come, I cannot handle.. then some users have to wait for hours
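The numbers in that worry can be worked through directly. A back-of-the-envelope sketch, assuming each GPU handles one 5-minute job at a time and all 1000 requests arrive together; the function name and the GPU counts are made up for illustration.

```python
import math


def worst_case_wait_minutes(requests, minutes_per_job, gpus):
    """Minutes until the last job finishes, with one job at a time per GPU."""
    rounds = math.ceil(requests / gpus)
    return rounds * minutes_per_job


for gpus in (1, 2, 10, 100):
    minutes = worst_case_wait_minutes(1000, 5, gpus)
    print(f"{gpus:>3} GPU(s): {minutes} min ({minutes / 60:.1f} h)")
```

With a single GPU the last user waits 5000 minutes, a bit over 83 hours, so the wait only becomes tolerable at burst load with on the order of a hundred GPUs or a much shorter job.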
no one will use our app
fact
and that's not how it works, anyway. you structure your cost to the user based on what it's going to cost you +10%
you never pay out of pocket. unless you are a startup with investors that have fronted a few hundred thousand dollars
for now what I want to know is: is there any special configuration/settings for AMD cpus?
if not, then I'll try with other templates
you're not luma, are you? cause that describes them perfectly
maybe it was a template issue
no. there are not. again, you need to make sure you are using intel
yes I got this. But AMD has an issue or it is normal? because I find that awkward.. end of the day I have a GPU and want to work with that, not with the cpu
can't help that. you can't stop it. there's nothing you, as the user, can do. you'd have to have access to their servers and be able to install code on them, not on the virtual machine they're giving you. just make sure you can get intel cpus. and don't try to scam the system and use a single user dynamic setup to run a business on. get a dedicated business solution set up.
I'll do what you say but not now.. because there is no guaranty that my app will be successful. That's why in the beginning I want to continue with rented devices.. and if I see, everything is fine then I can start to invest. Is that not logical?
you'll never get your app off the ground if you don't have the right hardware from the start. you're at an impasse. either you get onto the right hardware, or you go nowhere. and then your potential users will walk off. you have to "spend money to make money". you are going at this totally backward
do you work at nvidia or intel? 😄
no. i'm just a computer tech with almost 40 years of experience.
Still don‘t understand the whole discussion. The apple device even with the stable diffusion code from apple won’t get much faster. So you will need a configuration with a good gpu. If you want it the easy way just rent a NVIDIA gpu for the hours you need them. Enough gpu renting websites available.
do it right or don't do it at all.
AMD isn't apple
I see. I got your points. but buying premium cards is risky in my case. But I got your points
he's got an AI video creation application and he's trying to run it on something not designed for the usage he's asking for
What really now I see the 40 years of experience. … he talked about his apple m2 at the beginning.
That is why I suggest renting per use
i suggested getting an enterprise solution too.
here is the summary:
I rent 4090 from vast.ai. And if the device has an AMD CPU, then SD uses only CPU.
but if the device has an intel CPU, then SD uses only GPU.
I find that awkward
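One way to check that summary on a rented box is to ask the driver what the GPU is doing while a job runs. A small sketch that wraps `nvidia-smi`'s CSV query output; the helper names are made up, and it assumes `nvidia-smi` is on the PATH inside the instance. If utilization stays at 0 during a render, one thing worth ruling out is that the Python process cannot see the GPU at all, which `torch.cuda.is_available()` returning `False` would indicate.

```python
import shutil
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=name,utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]


def _to_int(text):
    """nvidia-smi can print values like '[N/A]' for unsupported fields."""
    try:
        return int(text)
    except ValueError:
        return None


def parse_gpu_rows(text):
    """Turn nvidia-smi CSV output into one dict per GPU."""
    rows = []
    for line in text.strip().splitlines():
        if not line.strip():
            continue
        # Split from the right so a comma in the GPU name cannot break parsing.
        name, util, mem = [part.strip() for part in line.rsplit(",", 2)]
        rows.append(
            {"name": name, "util_percent": _to_int(util), "memory_mib": _to_int(mem)}
        )
    return rows


def read_gpus():
    """Return current GPU stats, or an empty list if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    except (subprocess.CalledProcessError, OSError):
        return []
    return parse_gpu_rows(result.stdout)


if __name__ == "__main__":
    for gpu in read_gpus():
        print(gpu)
```

Running it a few times during a job on both kinds of machine gives a like-for-like comparison that does not depend on the hosting dashboard's own stats.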
in the long run, @idle meteor - while you might spend more up front, you'll spend LESS if you do this correctly from the start
yes I am doing that. But then I have to find devices with intel cpus and unfortunately there are not a lot
you only find it awkward because you're not familiar with the hardware.
so then I have to have a higher budget
find a venture capitalist, get them to find you investors, get a couple rounds of funding, get a couple engineers on the team - do it right
Not fully correct. I am a senior software engineer and know hardware but of course I am not an expert
pls try the advice I gave you
for selecting a server on Vast ai
you're not familiar with, specifically, AMD/Intel+nvidia and stable diffusion
you are now opening other doors
Simply rent only the GPU per hours or days. You just specify the GPU like a 3090 NVIDIA and then you pay a few bucks
Ok, I'll rent one by considering your advice. I noted that and I'll try it. Thank you
talk to yoheinakajima on twitter
Vast ai is the Wild Wild West
sometimes you get a good server sometimes not
yes but the issue is that if the computer has an AMD, then SD does not use GPU. This was my issue.
And there are not many intel options..
ok, I'll look at that. Noted
Again: you select the GPU!
it's "if the computer has AMD, then AMD defaults to doing the graphics processing" - it has nothing to do with stable
amd has always been the cpu to get if you wanted to do graphics work, intel if you wanted to crunch numbers
how?
but if you have an nvidia gpu, you want INTEL
then this is the answer
that I am looking for
you're a senior engineer. i would expect you to know this
I don't know SD and graphic processing.. It is first time that I am doing something with AI..
I had experience with image processing but this is completely different
it seems that I'll have to buy my own setup
it's graphics in general. the same considerations are for running blender, or even daz studio. what do you normally program?
desktop, backend and mobile apps
that's going to save you the most money, and be the best solution, for a while. until you get to the point where you either need to build a data center or rent one
Look for example at tensordock. You simply deploy a NVIDIA 4090 for about 0.4$ per hour. You use it while needed and then stop the instance again. Other sites have similar offers.
thank you. I'll consider that.
I'll need it constantly. Because we will develop an app that needs to access SD API. so I need it 24/7
you can also use google colab and perhaps even huggingface
i'm going to be very blunt here. "we will develop" - you're not even in alpha in that case. you don't need that much right now. you need enough for you and your team, and a few testers, to work on
you don't need up to 1000 requests flooding in right now. you're not to that stage
and that is perfect for google colab
my partner will develop the mobile app and I'll develop the backend. This is not the case. We both have enough experience. The issue comes from SD 🙂
Others offer a 3090 for 90$ a month if you want to start with a low initial invest.
If you only want to create images (and considering your actual knowledge) it might be more useful to use a managed API and you pay as you go.
do not become yet another company that publishes an application and puts it into production, and then uses their users as alpha or beta testers. do it right.
if I need it, then? I have to re-design the logic/structure.. I'll not have enough time then
so you redesign. you don't want to put out a product that's not ready
Vast ai offers reserved instances
you can reserve the data centre ones for over a decade even
where can I find that device for $90? 4090 processes a video in 5 mins.
how many frames is this video you're creating?
sure. I have to be ready for any case
and what's your FPS
yes I'll try that. Maybe it will solve my issue
25
you have got to stop putting the cart before the horse, here. you're worried about what is probably a year down the line, when you should be worrying about right now.
my partner is testing it. She reduces it to 10
haha you are right
25 frames? and you're doing FPS of 10? i do not even want to consider your videos. that's garbage
sorry
standard is 24 fps
it does not need to be a high quality video.
the aim is important
i'd have to run the clips through davinci and speed them up
look around you. everyone and their dog is doing AI video now. either you put something out that's better than pika at least, or you don't put anything out at all. no one will be interested
can I run a huggingface model on davinci ?
davinci resolve is a video editor
i think that first what you and your partner need to do is learn video production
forget AI, you don't know video
it will not help me. I need to automate it
that's pika's problem too. they don't know video - thus their videos are junk
I'll tell it to my partner :)) thank you
you are up against byte dance, luma, kling, open sora - all applications that are being created by people that KNOW video first
you either learn video, or you don't compete
I cannot compete and not intend to
2 years ago, being able to do animations with ai was amazing. now if the video isn't hollywood quality, people aren't interested
we just want to create a simple mobile app that processes a video with SD. That's all
our videos are not that s*it
those already exist. you should probably talk to purz
and look at what animatediff can do
can I ask
are you using an image model to process the frames
or are you using a video model?
I know animatediff, we tested it too.
actually both
two steps
one is an image and second one is a video process.
Stability AI offer Stable Video Diffusion in their API
as well as just the image models
you might be able to use that
can I use models from hugging face and also some extensions in Stability AI?
no
My first AI-generated song, along with a video.
Four artificial intelligences with different tools were used.
Have fun and share with your friends :)
Twitter: MojoYates_SL
but this company does offer that in a serverless setting
https://comfy.icu/
bear in mind serverless settings cost more per hour
than 24/7
and there can be warm-up times
they're not going to be happy with him using them to process multiple user requests like he's talking about. that's not what they're set up for
thank you. I did not know that. I'll look at.
600k credits ≈ 18 hours*
it seems that it will not be enough
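For a sense of scale, the "600k credits ≈ 18 hours" figure can be turned into a per-video budget. A rough sketch that assumes the 5-minute-per-video time quoted earlier for a 4090 carries over to this service's hardware, which may not hold.

```python
CREDITS = 600_000        # plan size quoted above
HOURS = 18               # approximate compute time those credits buy
MINUTES_PER_VIDEO = 5    # assumption: same speed as the rented 4090

credits_per_hour = CREDITS / HOURS
videos = HOURS * 60 // MINUTES_PER_VIDEO
credits_per_video = CREDITS / videos

print(f"{credits_per_hour:.0f} credits per hour")
print(f"{videos} videos, about {credits_per_video:.0f} credits each")
```

That works out to roughly 216 videos per 600k credits, which is the number to compare against expected request volume.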
none of these services you are looking at, @idle meteor , are designed for what you are doing. you need a dedicated, business, solution. you're trying to use solutions that are set up for single users. that's why they are not really going to work for you - and in some cases you might make the company mad enough about how you're abusing their system that they ban your account. you need to talk to these companies, tell them what you are doing, and get a custom solution set up
you are completely right. I'll talk to my partner regarding purchasing some GPUs
that's the best idea, yes
welcome
I took a look at their FAQ
looks like they don't mind too much
Yes, if you're looking to use ComfyICU as a ComfyUI backend API, please refer to our API documentation.
they're not expecting it to be used by a business with 1000 requests flooding in all at once or in a short time
would be worth asking them about capacity yeah
Good morning coffee(tree)!
Funny ... I was thinking about mushrooms prompting my good morning coffee 🙂
Tired of life? Walking on the street 😄
Using Image Prompting (x4) in Fooocus - text prompt = lobsters bathing with astronauts and quarterbacks
Fooocus Image Prompting (x4)
Hey, I was thinking about using color segmented image (furniture encoded with colors) of interior with Stability-Control-Structure API. Did anybody have similar experience and can share some advice regarding prompt structure? Thanks!
Foooocus
reddit balls https://www.reddit.com/r/mildlyinteresting/comments/1ednh4a/a_statue_for_sale_in_venice/
you are having a 'ball' with that lora, aren't you?
thats not my lora its just a thing
is that a clownshark in the back
just an average day on the beach
at the very least, maybe gravity will be depicted correctly
lol
managed to splice out the B code from ultrapixel so i can use my own nodes for it
lookin good
wow that's awesome so we can just get regular cascade stuff working with the big resolution of ultrapixel?
not yet
you can just generate at huge resolutions as i've done for a while on here
but this allows you to use bigger resolutions for stage C internally (which ppl call "Lower compression" but i hate that term cuz i think it's confusing and misleading)
which means more structure to the detail
normally, if you were to crank the resolution up, you'd get mutations galore (try setting compression to 22 or something and see how that goes lol)
ah okay so its a bit like deepshrink or hidiffusion
I made a bit of progress on details and noise injection
you can get stronger and more interesting effects sometimes by injecting noise in places other than pixel space or latent space
so for example injecting noise into the self attention numbers
or into the control net
kinda but more sophisticated
that's interesting
are you dev'ing nodes for that?
that's something i never really thought about
there's one made already called CADS
sorry I mean cross attention not self attention
just going in and modifying the 2D cross and self attention maps directly has been done by most of the regional prompting and composition libraries I guess
it doesn't seem to be a miracle method
the dense diffusion paper, mentioned in the Omost repo, says that it lowers image quality without careful restrictions anyway
Sublime
Tremendous
Wild
Forgive her for being sexy, but discovering the HiRes Fix function. Creates some sweeet results
Hires is on the left.
that one was noise injected into control net
oh nice
it interprets it in funny ways
this time it was a giant tree building with space pods apparently
and these were the CADS node injecting into cross attention
it just really ups the small details
interesting, i wonder what would happen if you injected something other than pure noise
something... noise-ish
its a bit different to pure noise its actually multiplied by the conditioning
its actually quite funny that this even works that well
it takes your conditioning signal, that is CFG combined with any control nets etc
and then shakes it a bit
and thereby avoids the thing where a strong conditioning signal reduces the variety of outputs
this might be able to save some checkpoints that are overly over-fitted
there would have been a risk of the conditioning vector trajectory being blown completely off course
but they added a little procedure to do a linear transformation on the conditioning each step
to keep it on course
the node just reapplies the rescaling scheme every step though I think I might add a scheduler for that
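The "shake the conditioning, then pull it back on course" idea can be sketched in a few lines. This follows my reading of the CADS paper (anneal a noise mix into the conditioning, then rescale to the original mean and standard deviation); the parameter names and defaults are assumptions, it uses plain Python lists instead of tensors, and the actual ComfyUI node may differ in details.

```python
import random
import statistics


def cads_gamma(t, tau1=0.6, tau2=0.9):
    """Annealing weight: 1 keeps the conditioning, 0 replaces it with noise.

    t runs from 1 at the start of sampling down to 0 at the end.
    """
    if t <= tau1:
        return 1.0
    if t >= tau2:
        return 0.0
    return (tau2 - t) / (tau2 - tau1)


def cads_perturb(cond, t, noise_scale=0.25, mix=1.0, rng=random):
    """Add annealed Gaussian noise to a conditioning vector, then rescale it."""
    gamma = cads_gamma(t)
    noisy = [
        gamma ** 0.5 * c + noise_scale * (1.0 - gamma) ** 0.5 * rng.gauss(0.0, 1.0)
        for c in cond
    ]
    mean_in, std_in = statistics.fmean(cond), statistics.pstdev(cond)
    mean_out, std_out = statistics.fmean(noisy), statistics.pstdev(noisy)
    if std_out == 0.0:
        return noisy
    # Linear map that restores the original mean and spread, so the
    # perturbed conditioning does not drift off course.
    rescaled = [(n - mean_out) / std_out * std_in + mean_in for n in noisy]
    # mix=1.0 uses the rescaled vector fully; mix=0.0 skips rescaling.
    return [mix * r + (1.0 - mix) * n for r, n in zip(rescaled, noisy)]
```

Early steps (t near 1) get mostly noise, which is where the extra variety comes from; late steps get the untouched conditioning. A scheduler for the rescaling, as mentioned above, would amount to varying `mix` per step.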
new architecture study, sd1.5
really interesting
hell yeah! shark power!
guys what is this on comfy?
@smoky vigil
I meant checkpoint and lora file tree storage
testing it on my windows machine first before I set it up on the debian
This means you have missing nodes. Use the manager to install them.
the way I set it up is that my comfyui is using my A1111 folders so I save a lot of space.
oh?
interesting,
Guess I'll have to track down that manager you were talking about
is it an extension on the webgui?
0k thanks 🙂
Any tips and tricks on ComfyUI before I start get too far in?
I wanna too haha I came from a11 and I'm a bit lost
sounds like your WF file might be corrupted
its not that different from flowsheets that I do for datacenters
at least, visually
well I uninstalled the node, let's see
healing clay?
@royal monolith Checkpoint is AutismMix Lightning
Is it possible I accidentally changed something in the Settings tab that makes everything come out blurry? If so, is there a way to revert all settings to default?
are you planning to change the entire image or just what you've masked?
Only masked part
you've got whole picture selected
I've always done it like that without issue, has that changed?
not sure, but try changing that and see what happens.
Alright I'll give it a try
Didn't work with Only masked option, result is pretty much the same
Super love her
I HAVE STARTED CREATING FREE NOTEBOOK AND WORKFLOW FOR YOUR COMMUNITY JOIN IF YOU LIKE, YOUTUBE VIDEO SOON MORE THINGS SOON
Prompt: https://civitai.com/images/21542182
"Blazing Ember" - Reimagined (OC: HerpderpIwork - Civitai)
Good morning coffee!
Sushi!!!
God shave the king ...
They are really selling animal dryer machines ^^
CADS node but with high PAG:
the CADS node adds some sampling variety and bias towards small details but can get too chaotic
but PAG can lower the chaos a bit
just CADS alone without PAG looks cooler but very chaotic:
does cascade have advantages other than the latent compression thing for speed and resolution?
was wondering why you choose to use it over SDXL checkpoints
coherence at high resolutions
not needing multiple stages with upscales, which i find often degrades an image
tiled upscales result in loss of spatial coherence from one region to another
and latent upscales result in some things being fixed due to having more pixels to work with, but also introduce problems every time... usually with loss of complexity, smoothing, and introducing its own mutations cuz it's a poorly trained resolution
actually yeah that does look nice
new suspect target acquired. need to review target. there seems to be a path. go to explore.
exploration log. surface hard to detect. anomaly ahead detected. need more input to cover for noise in data.
a pakistani beautiful girl
Ultrapixel?
ultrapixel is just a cascade mod
it allows you to generate higher resolution latents for stage C
without mutations everywhere
Have you tried to use other models with Ultrapixel?
is ultrapixel quite a small mod?
in terms of code?
the code is pretty chaotic right now tbh
i'm trying to sort through it
it's only gonna work with cascade as it's trained and built on the cascade architecture
i see no reason a similar project couldn't be done with other models
but they wouldn't be nearly as vram efficient, i'd think
and you'd need a rack of a100s to train it
But we have some cascade models if I remember right
fine tunes yeah
I don't rly call fine tunes separate models
but in some ways they are
if you have a bf16 version, and just the unet, i'd think it'd probably work
but everything's typically provided as the checkpoint and comfyui has nodes for saving clip and vae separately, but not the model itself
so i gotta sort that out
I wouldn't drive that one ^^
Ever since the inpainting tab looks like this (as of a couple days ago), inpainting has not worked for me (all outputs are blurry and have no variation between seeds). Has this happened to anyone else?
4090 desktop version?
Mobile version only comes with 16GB 😦
yeah, really lame
😛
they cut the cuda cores in half or something too i think
Next computer will be a desktop again ...
yeah i'm very glad to have a desktop
the UW display is amazing too for comfyui, coding, etc
49" neo g9
5120x1440
My notebook is pretty good. Never been a friend of notebooks but this one works for me. Graphic card limitation is the only weak point ...
yeah
With buying the notebook I won a 5k € coupon for the ASUS shop ... so selling it after a year for a new desktop sounds like a good deal 😄
wow yeah no kidding lol
fortunately for you zen5 and the 5000-series are coming soon
I'll visit ASUS at the GamesCon as their guest ... hoping for some insider information 😄
does anyone know how many tokens is best amount for SD3
Sorry, not really ...
can anyone tell me if there are local open source alternatives of the same caliber as Luma AI? (I already know SVD).
(this is Luma AI at right)
nope
opensora 1.2
needs 67GB VRAM but can run on rented datacenter GPU
but heavily lower any expectations
great stuff
67 gigs of Vram xDDD
I have only 12 (3080ti)
CFGrescale, SAG, PAG and FreeU experiments:
for the most part FreeU is too problematic it was too easy to burn the image
SAG is excellent at very low strengths
freeu_v2 is pretty good if you're really really careful imo
yeah agreed with SAG too
gotta be real careful with them
yeah it was v2
i sometimes like going in the "wrong" direction with the numbers
but only for part of the denoising process
PAG seems to be the best at fixing things but can also burn
oh going the wrong way with FreeU sounds great yeah I didn't think of that
yeah i'll do the first two numbers the "right" way and then the next two the "wrong" way or mix and match
I wish opposite of PAG was possible
cascade is just wild
maybe I should check the code and see if opposite concept of PAG can be done
well, maybe it could
i haven't looked at it at all but if it's perturbing it, there's gotta be an opposite direction to go
are these regular cascade?
cascade with the ultrapixel boost for stage C
ah ok yeah
and my samplers for stage B
which is what's allowing these to look decent at insane resolutions like 5120x3072
clown and shark? lol
I still need to try to learn clownsampler
called em that cuz we were getting too confused trying to say which node was which for the settings
it was easier with clown comes before shark
I couldn't quite work out what the original res sampler does
different to normal sampling
it's the RES sampler with a TON of shit built on top of it, a whole ecosystem of stuff
a lot of rewrites with the code, it's a lot faster with some noise types and there's way more of em
basically think of it as dpmpp_2s_ancestral except better in every way
the only sampler that holds a candle to it (but is a lot less versatile and can't make nearly as clean of an image) is dpmpp_sde
at least from what i've tried
I feel like I've got a decent grasp of the deterministic ODE samplers
but I haven't learnt how the stochastic ones work yet
best place to start is prolly the clownsampler tbh
got the most options
set eta = 0.0 and compare to dpmpp_2m
then gradually increase it to 1.0 and see what it does vs dpmpp_2s_ancestral
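What eta does in that comparison can be seen from how k-diffusion-style ancestral samplers split each step into a deterministic part and freshly injected noise. The sketch below mirrors that split in plain Python; it is not the ClownSampler/RES code, whose eta evidently uses a different scale since values above 1.0 come up later in the chat.

```python
def ancestral_step(sigma_from, sigma_to, eta=1.0):
    """Split a step from sigma_from to sigma_to into (sigma_down, sigma_up).

    sigma_down is the noise level reached deterministically and sigma_up is
    the amount of fresh noise added afterwards, so that
    sigma_down**2 + sigma_up**2 == sigma_to**2.
    """
    if eta == 0.0:
        return sigma_to, 0.0  # no fresh noise: the step is fully deterministic
    sigma_up = min(
        sigma_to,
        eta * (sigma_to**2 * (sigma_from**2 - sigma_to**2) / sigma_from**2) ** 0.5,
    )
    sigma_down = (sigma_to**2 - sigma_up**2) ** 0.5
    return sigma_down, sigma_up


for eta in (0.0, 0.5, 1.0):
    down, up = ancestral_step(10.0, 5.0, eta)
    print(f"eta={eta}: step down to {down:.3f}, then add noise of std {up:.3f}")
```

At eta = 0 nothing is re-injected, which is why the result should behave like a deterministic sampler; raising eta trades more of each step for new noise.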
yeah thanks this would be a good way to learn
the funny thing is samplers might go away one day
I read so many papers where they mention near the start that the long term goal is to do the entire thing with a single step of euler
the whole idea was to take the most accurate stochastic sampler we have right now (RES, afaik) and make it so you could control virtually every parameter at every step
you can even inject whatever noise type you want at every step, you can override the noise sampler
ah yeah I like noise injection
I kinda started getting the effects I used to have to use noise injection for from CADS
the little chaotic details
since CADS is essentially CFG noise
@languid pebble good morning coffee!
Good morning!
damn that looks a lot like an actual painting i did way back. i wonder if my shit made it into a training dataset lol...
either way, shits awesome man
thx
We are both running 4090s with the same monitor! 😄
...but it's still no excuse for your workflows! 🤣
I've fallen in love with Cascade all over again
Much better than SD3
Also unnecessary 😄
i did a bunch of tests, it's actually beneficial
in some cases it just looks blown up vs half that size
in some it gains real detail (the sampling is absolutely critical here)
but in virtually all, the artifact rate goes down significantly and the coherence goes up
it may be best to just gen at crazy resolutions and downsample x0.5, like supersampling
in a number of my tests the difference was pretty stark
stark?
very clear, obvious, etc
German word 😄
and here's one that was at 5120x3072 then downsampled x0.5 with lanczos to 2560x1536
sampled using my custom "cascade_B" pyramid noise
which makes a hell of a difference in many cases
the gaussian one generated at the lower res has a lot of noise artifacts
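The "gen big, then downsample x0.5" step is easy to reproduce outside ComfyUI. A minimal sketch assuming Pillow is installed; the function names and file paths are placeholders, and the import is done lazily so the size arithmetic works even without Pillow.

```python
def half_size(width, height):
    """Target size for a x0.5 downsample."""
    return width // 2, height // 2


def downsample_half(src_path, dst_path):
    """Downsample an image to half size with a Lanczos filter and save it."""
    from PIL import Image  # Pillow; imported here so half_size has no dependency

    with Image.open(src_path) as img:
        small = img.resize(half_size(*img.size), Image.LANCZOS)
        small.save(dst_path)
    return small.size


print(half_size(5120, 3072))
```

Calling `downsample_half("render_5120x3072.png", "render_2560x1536.png")` on a render would give the supersampled version described above.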
I notice that you don't change the sizes of "c" when you increase the latent size 🤔
yeah
C you have to be really careful with
that's why i hate the "compression" thing when inferencing
C is what's important
B just determines how much detail you're gonna get
but you'll get basically the same image, you can even change the aspect ratio and it'll just stretch it
I'm using 60
prolly could get away with less
but i've noticed that 90 seems to be the peak for quality in general with res
here's the real q though, do you have CheckpointSaveFucked installed yet? 🤣
Nope...that's fucked 😜
lolo
5120x3072...forgot to downsize after that 😄
I did get this with increased size though Warning: Ran out of memory when regular VAE decoding, retrying with tiled VAE decoding.
Look how clean those zips are
yeah the vae/stage a is always gonna be the limit here
it's worth it though
plus, what other model are you gonna be able to do 5120x3072 at in 3 min lol
and get anything decent at all
The limit is probably because I'm running it all through an LLM 😉
Doesn't matter what I set the eta to, they all look nice. Do you always use 1.0?
it's actually ramping from 4.0 to 2.0
the 1.0 is a multiplier if there's a scheduler hooked up to the etas input
otherwise, it's the absolute value
you can get away with lower but i've got it cranked up a bit to make up for the caveman sampling with stage C
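A rough sketch of the eta behaviour described above, in plain Python. This is not the sampler's actual code: the function names are hypothetical, and the linear shape of the 4.0 to 2.0 ramp is an assumption (the chat only says it "ramps").

```python
def linear_ramp(start, end, steps):
    """Evenly spaced values from start to end inclusive (assumed ramp shape)."""
    if steps == 1:
        return [start]
    return [start + (end - start) * i / (steps - 1) for i in range(steps)]


def effective_etas(eta, steps, schedule=None):
    """Per-step eta values.

    With a schedule connected, eta acts as a multiplier on it.
    With no schedule, eta is used as the absolute value at every step.
    """
    if schedule is None:
        return [eta] * steps
    return [eta * s for s in schedule]


ramp = linear_ramp(4.0, 2.0, 5)
print(effective_etas(1.0, 5, ramp))  # [4.0, 3.5, 3.0, 2.5, 2.0]
print(effective_etas(0.5, 3))        # [0.5, 0.5, 0.5]
```

So with a scheduler hooked up, leaving eta at 1.0 just passes the schedule through unchanged.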
What's that missing node?
this one?
Yes
It doesn't like me 😄
That is one strange necklace
Expensive 🙂
got prompt
Failed to validate prompt for output 112:
- StableCascade_StageB_Conditioning 83:
- Return type mismatch between linked nodes: stage_c, IMAGE != LATENT
- ClownSampler 75:
- Failed to convert an input value to a FLOAT value: guide_1, median_d, could not convert string to float: 'median_d'
- Failed to convert an input value to a INT value: guide_mode_2, pyramid-cascade_B, invalid literal for int() with base 10: 'pyramid-cascade_B'
Output will be ignored
Failed to validate prompt for output 79:
Output will be ignored
[rgthree] Using rgthree's optimized recursive execution.
[rgthree] First run patching recursive_output_delete_if_changed and recursive_will_execute.
[rgthree] Note: If execution seems broken due to forward ComfyUI changes, you can disable the optimization from rgthree settings in ComfyUI.
I need a bigger monitor 😄
Is it possible to implement video object recognition functionality similar to track-anything (link: https://github.com/gaomingqi/Track-Anything) in comfyui or web ui?
stunning
@vague grotto you can do automatic masks like this
Not sure if that's exactly what you're looking for
@shut sinew This can indeed identify all objects, but I want to identify and track only a specific object. I haven't found a similar implementation yet.
Try a vision transformer like florence2. It can draw shapes over things you're looking for
https://arxiv.org/abs/2311.06242
There are comfyui nodes for it and the models are tiny. In the node, you have to set it to the right mode and it has an image output pin that will composite the selections like this
We introduce Florence-2, a novel vision foundation model with a unified, prompt-based representation for a variety of computer vision and vision-language tasks. While existing large vision models excel in transfer learning, they struggle to perform a diversity of tasks with simple instructions, a capability that implies handling the complexity o...
Thanks for the tip, I'll definitely try out Florence2
Yeah I'm not 100% sure how well it will do for solo tracking of one specific person in a group because I've only ever really used it for single things. But there are a bunch of other vision transformers like it and I'd imagine they have some specialized ones for things like crowd tracking.
Those might be your better option, but I've never tried them.
Security companies use them for crowd safety stuff and corporations use them for tracking individual workers to see who is slacking or working, as well as shit like seeing who is wearing a hardhat or not
Hello everyone, I am writing to try to understand why many images related to architecture are made with deformations? I have highlighted some of them in red, I have been trying to modify the NEGATIVE PROMPT for days but to no avail... here is what I wrote: cg, cgi, 3d,cartoon, sketch, drawing, anime, ((deform lines)), ((deformed contours)), low quality, (((low resolution))), mutation, jpeg artifacts, ((artifacts)), (((camera deformed))),bad proportions, extra limbs, flooring white.
I use architectureralmix as checkpoint and DPM++ 2M AS SAMPLING METHOD
Swim at your own risk.
Too early?
Because AI doesn't actually understand anything it generates. It just associates concepts with image data and that image data for this specific image might have tens of thousands of images worth of data influencing it from the dataset.
If you want crisp lines and actual euclidean geometry that doesn't do all kinds of weird optical illusion stuff, model out a basic scene in a 3d package and use controlnets like edge or depth
It still won't be perfect, but it will be more consistent
ok, understood, even if this way I still have to model something first... I need the AI images as a starting point, so that I can model afterwards from what those images define... thanks anyway for the answer...
yeah, you can also model a basic scene out without textures and stuff, just simple colors, and then use controlnets+img2img to fill in the blanks as well. like if the simple 3d scene has brown cabinets and you set the denoise to like 60%, it might figure out that the brown should be wood or something. the controlnets will help it keep its shape though.
what models come close to these? i think these were done in MJ. i'm trying to get similar results. also, is there a good way to do img to prompt, so i can get an idea?
did you see it? Hunyuan-DiT new chinese model better than SD3
Model Downloads:
models/checkpoints/: https://huggingface.co/.../HunyuanDiT.../tree/main/t2i/model
/models/clip/hydit/: https://hf-mirror.com/.../tree/main/t2i/clip_text_encoder
/models/t5/HunyuanDiT/: https://hf-mirror.com/.../resolve/main/model.safetensors rename to mT5-xl-encoder-fp16.safetensors
Comfyui Support:
Update your comfyui to latest version
well, have to test it to see if it really is all that they are saying
cascade just oozes style
Yes, but it has also been terrible at some things 😄
That piano is a mess
Had quite a few hands and feet turning into long tentacles.
it's just a vision of our evolutionary future
if you're getting mutations, try these settings
ignore the other crap, that's me fuckin with stuff
What's the relationship between those "c" dimensions and the latent size? I thought it was just a divide by 32
...and why are there 4 of them?
I lowered latent compression to 38
height_c and width_c are the dimensions for the upscaled latent for C
height_c_lr and width_c_lr are the dimensions for the low-res latent for C, which serves as the guide for the composition
right now i can only really recommend a few values for height_c_lr and width_c_lr
24x24
24x40
18x30
beyond that it's gonna be hit or miss
those are the most heavily trained resolutions
you want height_c and width_c to be exactly the same aspect ratio as the "lr" ones, except bigger
and for stage B, just do whatever you want
you can even have it not be exactly the same aspect ratio, it'll just stretch it
the problem with "compression" is the math behind it and the critical importance of the dimensions for latent C for the composition
if you're using the original cascade workflow, and you have a 1024x1024 empty latent with compression = 42:
1024 / 42 = 24.38... these always round down, so 24
that gives you a latent that is 24x24 for stage C
change that to, say, 40, and suddenly you're at 25x25, which isn't a resolution that it was specifically trained on
and it's mutation city time
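The compression arithmetic above, as a small Python check. The helper names are made up, and treating the recommended sizes as valid in either orientation is an assumption; the three sizes themselves are the ones recommended earlier in the chat.

```python
# Stage C sizes called out in the chat as the most heavily trained.
TRAINED_LR_SIZES = {(24, 24), (24, 40), (18, 30)}


def stage_c_size(width, height, compression):
    """Latent C (height, width) for a pixel size, always rounding down."""
    return height // compression, width // compression


def is_trained_size(size):
    """True if size matches a recommended size (either orientation, assumed)."""
    h, w = size
    return (h, w) in TRAINED_LR_SIZES or (w, h) in TRAINED_LR_SIZES


for compression in (42, 40):
    size = stage_c_size(1024, 1024, compression)
    print(compression, size, is_trained_size(size))
# 42 (24, 24) True
# 40 (25, 25) False
```

A two-point change in compression is enough to fall off a trained size, which is the argument for setting the C dimensions by hand.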
ah, ok...thanks 🙂
I'm using your sampler for b
The tidiness may drive you insane 😄
hahah nice
yeah def mess around with that stuff, who knows what you'll find
my whole goal with those nodes is just to facilitate experimentation and discovery
sometimes it's subtle, sometimes not
all it does is basically generate batches at each step, and select one of the seeds based on various simple comparative analytics
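To make the "generate a batch per step, keep one seed" idea concrete, here's a toy version in plain Python. Everything in it is hypothetical: the actual statistics the nodes compare aren't spelled out in the chat, so "standard deviation closest to the batch median" is just a stand-in, and the noise is a flat list of gaussian samples instead of a latent.

```python
import random
import statistics


def candidate_noise(seed, n=64):
    """Gaussian noise samples for one seed (stand-in for a noise latent)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def pick_seed(seeds, n=64):
    """Pick the seed whose noise spread is closest to the batch median."""
    spreads = {s: statistics.pstdev(candidate_noise(s, n)) for s in seeds}
    target = statistics.median(spreads.values())
    return min(seeds, key=lambda s: abs(spreads[s] - target))


def run(steps, batch, base_seed=0):
    """At every step, build a fresh batch of seeds and keep one of them."""
    chosen = []
    for step in range(steps):
        seeds = [base_seed + step * batch + i for i in range(batch)]
        chosen.append(pick_seed(seeds))
    return chosen


print(run(steps=3, batch=5))
```

Swap the scoring function for any other comparison (closest to the mean, the most extreme one, and so on) and the selection loop stays the same.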
Those arms
sexy
crazy good
Yes, but still a problem with hands/arms 🤷🏻♂️
see what happens if you use 24x24 for lr, and... idk, 32x32, 36, 40, 42, 48, something like that for the other
i have def noticed before that the 1.666:1 AR has a lot more trouble with mutations than 1:1
```
24,40
30,50
36,60
42,70
48,80
54,90
60,100
66,110
72,120
```
these are all 1.666:1 fyi
the higher end ones, espec the last two do some weird stuff sometimes
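That list is just multiples of one base pair, so you can generate it instead of typing it out. A small sketch; the function name and the 6x10 step are assumptions read off the list above, and the pairs are treated as (short side, long side).

```python
def same_ratio_sizes(base=(6, 10), start=4, stop=12):
    """Multiples of a base (short, long) pair, all sharing one aspect ratio."""
    return [(base[0] * k, base[1] * k) for k in range(start, stop + 1)]


sizes = same_ratio_sizes()
print(sizes[0], sizes[-1])  # (24, 40) (72, 120)

# Every pair is exactly 3:5, i.e. 1.666:1 on the long side.
print(all(long * 3 == short * 5 for short, long in sizes))  # True
```

Using integer multiples keeps the ratio exact, which matters if the upscaled C latent is supposed to match the lr one.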
damn is that clean
wouldn't know if those are real characters to the left or not
but cascade with ultrapixel does seem to be better at text
"Had quite a few hands and feet turning into long tentacles."
having seen Clownshark's art for like a month he probably views this as a positive 😄
free cosmic horror
" Stage C is responsible for the content of the image, while Stage B acts functionally as a super-resolution model, adding details and increasing the resolution of the latents, but ultimately not changing the image in a semantically meaningful way"
^this is from the wuerstchen paper's supplementary material, and neatly summarizes why i prioritize manually setting the dimensions for latent C
You can also decode the latents from stage C to see the small squint version of the image. Based on the compression ratio, it will be like 128 or 256 or something tiny like that. You can basically preview things before wasting further time on the stage B upscale/refinement
I always thought the concept of cascade was a really good idea. Wish it caught on more
no kidding
it doesn't even have to be the specific weights or the exact architecture... the general concept is fantastic
i see it almost as being like training a lora vs a finetune in a way
matrix A.B for lora
stage C, B for cascade
C and B dividing and conquering the training tasks in a way that reduces complexity in a superlinear manner
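On the lora side of that analogy, the savings are easy to put numbers on. A quick sketch with illustrative sizes (the 1024x1024 layer and rank 8 are made up); it only covers the A·B factorisation, not cascade itself.

```python
def full_params(d, k):
    """Parameters in a full d x k weight update."""
    return d * k


def lora_params(d, k, r):
    """Parameters in a rank-r update A (d x r) times B (r x k)."""
    return r * (d + k)


d, k, r = 1024, 1024, 8
print(full_params(d, k))     # 1048576
print(lora_params(d, k, r))  # 16384
```

The full update grows with the product of the two dimensions, while the factored one grows with their sum, which is the same flavour of split-the-problem saving being described for stage C and stage B.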
FreeU
b1 = 1, b2 = 1, s1=0, s2=0
CADS
noise = 3, apply to = conditioning, key = crossattn
same settings
somehow completely breaking FreeU with b1 = 1, b2 = 1, s1=0, s2=0
means you can use CADS without it being overly soft
it's pixart 512 res upscaled many stages with aam xl anime checkpoint by lykon. it specializes in very sharp line anime.
ah ok yeah thanks
I guess the upscales are making it possible
with enough SAG/PAG/FreeU sometimes you can get there without upscale
but it's tricky
Cute Lunar, a sd15 model
perturbed the model with enough CADS noise and wrong FreeU settings that it started adding letters to the images 🤔
another time it made star wars into a stained glass window
sd15 models at higher res seem to double a bit sometimes
guys, does stable diffusion have a logo?
I'm making a presentation about it I wanna put the logo there
/credits
what?
yeah, stability.AI does https://stability.ai/ but you better ask permission before using it
ok stability ai
crystal, do I have to ask permission to put their logo in a presentation about sd, when I'm advertising them and crediting them there?
only asking, I don't understand why I have to do it, but if I have to, ok, I will do it
finally got CADS working properly
had to add PAG, FreeU and Deepshrink to get it to work
it turns out you never really need both PAG and SAG together
PAG is meant to just be strictly better
the PAG authors spent half the paper roasting SAG lol
yeah I want to jump over to cascade, and separately Kolors at some point
i'm still ripping out code from the ultrapixel codebase
some functions and classes appeared in various versions like 8 times lol
god it's a mess
so much unused code
i've deleted at least 10k lines 🤣
two entire evenings just hitting backspace
getting close to something manageable
@celest sigil How do you get high quality, sharp generations? Many of my generations are blurred. What upscaling methods do you use? A lot of my generations are image to image.