#🆕|sd3
i don't get it
I made a node to automatically patch right or bottom depending on the input image
that's different than what you were showing earlier right? I was expecting to see the same image as the couple as the top frame and then a new image as the interpretation as the bottom frame, like the knight and the princess
can you do the 3x frames like the knight and princess one where it's the same original top image and new bottom image?
Oh I see
visual identity design in context lora
It is ok with that
lol i didn't know it applied to tattoos so how did you get that?
are you using xiao's workflow to input an image and used the visual identity lora?
Just need an extra prompt for the new scene
its embedded, i modified a workflow Klinter made
An energetic scene of a fiery chili pepper character and a frosty popsicle character interacting in harmony: the chili pepper radiates a warm, blazing aura while the popsicle exudes a frosty chill, their contrasting energies blending in the center to create a swirling mist of steam. The backdrop is a fantastical fusion of molten lava and glittering frost, with small flames and snowflakes dancing in the air around them. Their expressions are playful yet balanced, symbolizing the coexistence of opposites in a vibrant, dynamic moment.
so it's similar to what xiao is doing where you can input the image on the left and have the lora generate the image on the right?
exactly that
cool you guys are really taking it one step further
The edge is copied from above
lol oh wow that's really cool well done it follows the prompt really well too
i'd give it a high score for adherence
i like the first one better, anyways that's really cool so @gusty trail that's not the "try on" lora? what's the name of that workflow?
No, I am just using this one https://civitai.com/models/933026/flux-product-design-in-context-lora
The try on lora is very specific for try on
You just need to load this lora and do the prompt
Or any other examples' lora
but you need a specific workflow to do the top mask/ bottom mask right?
product-design lora + mask workflow right?
Would this work with people, faces or hands?
lol yikes
You could see some example in the workflow
The mask node is not available on manager yet. You need to download it from https://github.com/lrzjason/Comfyui-In-Context-Lora-Utils
I mean realistic images. Like replacing a face, or bad hands.
You might need to use with inpainting
i think the try-on lora demo'ed replacing the face
yeah its not as good as inpainting methods
My friends used ic-lora + inpainting which seems pretty good
And I think you could actually do the second pass with inpainting to add or modify some generation
XiaoZhi do you happen to know what this one is? TTPlanet/Migration_Lora_flux
This is the trained ic lora. It is on hf. You could find the link in github
oh duh i see now
i think of the in-context lora set my favorite is still the filmboard
set up a complex prompt for all 3 scenes and see how well the model can adhere to it, i wish they would make this lora for SD3
speaking of SD3 I feel the community hasn't embraced it as much as I expected, I'm pretty much the only person posting SD3 content on civit, it's like really dead in there, i never see SD3 content on my feed either, like ever
TTPlanet has nice upscaling nodes also
caption tiles, and a decent splitting node also
speaking of which @sullen moss I tried your suggestion for that 4xFFHQDAT upscaler and while it is undeniably better in quality than the one i settled on I was unable to adopt it bc it took 45 seconds to upscale an image with it
Well I don't make art on an industrial scale, for me it's all about quality
i settled on BSRGANx2, it delivers solid results in 10 seconds, I also liked RealESRGAN_x2plus but the other one edged it, had to do a bunch of side by sides to really settle on which one looked better
lol yeah i don't make art in an industrial scale either
the way i see it is I'm dedicating my precious GPU time that it could be spending rendering images on upscaling those images that I'll be sharing online for the enjoyment of others, so I don't think it's crucial to spend another 30 seconds on increasing the quality by that much
like if it was +5 or even +10 more seconds i'd consider it but at 30 seconds, I could've made 2 images in that time, it's not that important especially since civit is going to downscale the thumbnails anyway
The main thing is that there’s a choice, and everyone can find what suits them best. That’s the beauty of open source.
yeah you're right, i ended up on this website: https://openmodeldb.info/?t=arch%3Aesrgan it was super handy finding what i need
i could just set up the filters and the scale to suit my needs, download all the results and just run them and compare
if i set it to Faces and 2x that's pretty much the only result too
their discord is excellent if you want upscaling advice
there are models ranging from really fast to really slow now
since you liked RealESRGAN maybe you would like RealPLKSR
i see 2 valid options, they're both for anime, one sharp one soft; the other results are for text and for VHS tapes https://openmodeldb.info/?q=RealPLKSR&t=arch%3Arealplksr+scale%3A2
there's more around the internet
oh cool okay i thought that website was the end-all be-all lol
not rly although it is good
i tested my current upscaler of 2xBSRGAN vs 4xPurePhoto-RealPLSKR and it looks darker and fuzzier/blurrier/softer with the 4x
Shuttle 3 832x1216 2 step ----> 2396x3501 via 6 panels 4xNMKD-Siax and 2 step per panel, the TTP_Toolset workflow
that looks way better
i like how it didn't just upscale but it enhanced
how fast is it tho?
yeah but ive tried "4 step" flux models before that are still just as slow as the counterparts, like how fast is it in seconds?
so the left is the original and the right is the upscaled right?
what does it say in the end like in regards to how much time it took
this line: Prompt executed in 63.24 seconds
lol pfft
might be faster now
yeah but like 40% faster at best, that's still a good 150 seconds even if we half the time, and you got 11gb whereas i got 8 so for sure mines wouldn't be that fast
toss me a hard to upscale prompt 🙂
score_9, score_8_up, score_7_up, ((solo)), ((adult)), cinematic, best quality, 1girl,, pale skin, messy hair, short hair, auburn hair, freckles, freckles on chest, green eyes, dark makeup, tradwife, sundress, field, flowers, outdoors,
original left, remix right
lol ill go swipe some somewhere else, comma salad
lol
field of flowers in the prompt is usually a good upscale test bc there's usually lots of tiny detail in the distance that's very easy to judge
mmm good idea
have you tried InstantIR?
i personally felt betrayed by how long it took regardless of the quality simply bc its name implied it was going to be fast, like instant fast lol
an ornate box, open lid, a spider crawling out of it; Rob Gonsalves; hyper-realistic,hyper-detailed, fantasy, cosmic art; elegant, intricate, detailed, extremely textured, colossal, monstrous.
bilateral symmetrical dichotomy
@mortal mesa so explain to me this setup:
Shuttle 3 832x1216 2 step ----> 2396x3501 via 6 panels 4xNMKD-Siax and 2 step per panel, the TTP_Toolset workflow
what's shuttle 3?
what's 6 panels?
so the upscaler you used is 4xNMKD-Siax but it doesn't work with the built in comfyui Load Upscale Model stuff so I need the TTP_toolset custom_nodes to make that upscaler work?
shuttle diffusion is flux - make sure your other models work with flux
so he's using shuttle diffusion 3 to generate 832x1216 @ 2 steps and then upscaling it to 2396x3501 using 4xNMKD-Siax and what's the 2 step per panel and 6 panel thing about? is that a special node?
the time i said isnt going to be accurate either cuz i ran this whole section im going to excise out
it looks so much better
Shuttle 3 is a modified flux schnell checkpoint that they advertise as 4 step, it also works well at 2 steps. The initial image is generated at 2 steps, it's upscaled by the NMKD, cut into 6 panels and each is run through ksampler again 2 steps each then reassembled
and each panel gets run through florence for a prompt
i didnt make it, just adapted it
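A rough sketch of the "cut into 6 panels" step described above, in case it helps picture it. This assumes a plain 2x3 grid with no overlap between panels; the actual TTP_Toolset nodes may tile differently (overlap, padding, blending), and `splitIntoPanels` is a made-up helper name, not something from that node pack.

```javascript
// Hypothetical helper: compute crop rectangles for a cols x rows panel grid.
// Assumption: no overlap between panels; real tiling nodes often add some.
function splitIntoPanels(width, height, cols, rows) {
  const panels = [];
  for (let r = 0; r < rows; r++) {
    for (let c = 0; c < cols; c++) {
      // Integer edges so the panels cover the image exactly, with no gaps.
      const x0 = Math.floor((c * width) / cols);
      const x1 = Math.floor(((c + 1) * width) / cols);
      const y0 = Math.floor((r * height) / rows);
      const y1 = Math.floor(((r + 1) * height) / rows);
      panels.push({ x: x0, y: y0, w: x1 - x0, h: y1 - y0 });
    }
  }
  return panels;
}

// The 2396x3501 upscale from the chat, cut 2 wide by 3 tall.
const panels = splitIntoPanels(2396, 3501, 2, 3);
console.log(panels.length); // 6
```

Each rectangle would then be cropped, captioned, run through the sampler on its own, and pasted back at the same coordinates.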
lol damn that's some level of care going into upscaling thats next level
InstantIR is quite possibly the best upscaler
seemed to beat Supir
yeah from that screenshot I posted (not sure if you saw it) it really did seem like the absolute best hands down
supir was way too heavy for me but ya could do some nice stuff
the advantage of Supir currently though is that its been broken apart in ComfyUI now by Kijai so you can swap certain bits in and out
i should revisit it, i was talking about when it first came out with zero optimizations haha
i'm replying to the comparison image of the upscalers if you wanna look at it
this happened with the Flux unsampling thing too
people tried a slightly janky version when it first came out so they thought it was bad
left SUPIR right InstantIR
Oh and this was one i was curious about also https://github.com/2kpr/ComfyUI-PMRF
thanks will try it
the upscaled chest and spider kinda looks bad full size :/
i tried PMRF but i have cuda 12.1 not 12.4 installed so im gonna put it off for now
might be the noise injection
had some coloured artifacts
ya im messing with noise and denoise levels, i have to say i am pretty impressed with the tiling method, better than what ive tried
those came out fantastic
@mortal mesa I modified that workflow you're using a few ways
- replaced flux shuttle with sd3.5 large fp8, I could probably optimize it further by trying turbo instead of large but i found large is faster than turbo when using the AIO model bc it comes built in with fast clip models
- removed the image generation part converting it to a purely upscale workflow
- replaced florence with BLIP for speed
I got it down to 117 seconds and the results are outstanding
Prompt executed in 98.99 seconds in subsequent runs
original image left, second one is pre-texture detailer, third is post-texture detailer
sooo speaking about the texture detailer area, look where the negative prompt is coming from, it doesn't seem right, i dont understand it
but ya ill be reusing that middle section also
yeah i noticed that, im replacing all that jazz with clownshark
?
I gave you flux
But hey google is your friend: https://comfyui-wiki.com/tutorial/advanced/stable-diffusion-3-5-comfyui-workflow.en-US#google_vignette
@dusky thistle check out this artifacting, using sd3 large q8
i think its really interesting how the sampler will never produce an image that is objectively flawed its only subjectively bad
wow
what's it look like with euler in ksampler, or the new euler_ancestral in ksampler
i could add that to the queue but I gotta wait for res_3s to try a pass at it first
that was using res_2m
and this is the source image not that it matters bc i didn't use img2img, and this is the prompt that generated that image:
hideous witch, by Sergei Parajanov, <lora:aidmaMJ6.1_v0.3:1> <lora:aidmaImageUpgrader:1>,
same seed, same exact parameters for both, res_2m left, res_3s right
idk if you've tried rk_exp_5s yet, it's a little slower than res_3s but another step up in quality espec with SD3.5M
res_3s understood the assignment when given that prompt for the "hideous witch" instead of making some crazy wizard it actually did a witch and i wouldn't say she's hideous (everyone is beautiful in their own way) she's quite shocking to look at lol
do the artifacts show up with regular SD3.5L (or do you not have the vram to test)
you can still see some of the same patterns the wizard showed as far as texture and artifacting, i'll def try rk_exp_5s and compare it
i dont have the full 3.5L unpruned to test I just have the fp8 AIO, the q8 and q8 turbo
can you screencap the clownshark params and paste the prompt? i could give it a shot over here, i have the full
is this easier for you?
btw i don't choose that layout, its just how i programmed it, i feel like i should spend some time to program the layout better
if (!nextItemCloned.workflow.oldWorklowUsed) {
nextItemCloned.rk_type = _.sample(['rk_exp_5s', 'res_3s']);
}
here's what I'm going to do moving forward, in situations where I ask to use the new workflow (shark) instead of always picking res_3s it'll randomly pick one of those 2, any other high quality and slow sampler I could add to this list ?
here's the origin of the artifacts
it's the length of your prompts
SD35 is really really weird about it
never seen a model do it but if you go past 72 tokens it starts going downhill, espec with the neg conditioning
ah makes sense
the truncate conditioning option in shark is there just to safeguard in case you go over and are seeing problems, it cuts it down to the size for a one chunk embed
so it really does make sense to truncate SD3.5 negative prompts right?
but the truncate option does so for both positive and negative right?
yeah
ill just add some code on my end to truncate the negative to 77 tokens for SD3.5 and maybe flux too?
https://novelai.net/tokenizer i just use this to double check how many i'm using
actually for flux i'm just blanking it out
72 tokens for whatever reason is the limit
i still haven't gotten around to looking at what's going on whatsoever, but once you hit 73 input tokens, truncate changes the output
72, it's the same
so it's probably moving onto the next block at 73 for whatever reason
yeah i'm just using a blank neg for sd35 myself
if (isSd3Model){
// truncate to 250 characters
nextItem.negative_prompt = nextItem.negative_prompt.substring(0, 250);
}
alright truncating in place
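The snippet above cuts by characters, while the limit being discussed (72) is in tokens. Here's a minimal sketch of cutting by token count instead, dropping whole comma-separated tags once the budget is used up. Big assumption: `naiveTokenize` is only a stand-in that counts words and punctuation; the real CLIP tokenizer splits text into sub-word pieces and will usually count higher, so you'd pass a real tokenizer function in its place. Both function names are made up for illustration.

```javascript
// Stand-in tokenizer (assumption): one token per word or punctuation mark.
// A real CLIP BPE tokenizer usually produces more tokens than this.
function naiveTokenize(text) {
  return text.match(/[A-Za-z0-9_]+|[^\sA-Za-z0-9_]/g) || [];
}

// Keep whole comma-separated tags while the running token count stays
// within maxTokens; everything after the first tag that does not fit is dropped.
function truncatePromptByTokens(prompt, maxTokens, tokenize = naiveTokenize) {
  const tags = prompt
    .split(',')
    .map((t) => t.trim())
    .filter((t) => t.length > 0);
  const kept = [];
  let used = 0;
  for (const tag of tags) {
    // +1 for the comma separator (slightly conservative for the last tag).
    const cost = tokenize(tag).length + 1;
    if (used + cost > maxTokens) break;
    kept.push(tag);
    used += cost;
  }
  return kept.join(', ');
}
```

For example, `truncatePromptByTokens(nextItem.negative_prompt, 72)` would replace the `substring(0, 250)` call, with the caveat that 72 here is counted by the stand-in tokenizer unless a real one is supplied.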
well i figured sd3.5 isn't distilled and it's always handled negative prompts so i dont want to treat it like flux and just blank it out
i haven't done comprehensive tests or anything, but the ones i did do... seemed to degrade with negative prompts of any kind
ive seen that for sure on flux
it def isn't like it was with cascade or sdxl (or sd15) where something like "bad quality" actually did generally lead to a better image
where any negative prompting heavily reduces the quality of the image, even when using a dedistilled model
i used a really long negative prompt for this one score_6, score_5, score_4, source_pony, source_anime, pink nipples, source_furry, source_cartoon, censored, deformed hands, deformed fingers, extra fingers, missing fingers, extra limbs, missing limbs, bad eyes, ugly face, blurry face, wrong anatomy, crossed eyes, missing leg, missing foot, unattached hand, deformed, deformed face, bad teeth, ugly teeth, low quality, bad quality, worst quality
negative prompting is not necessary for cfg, in fact it's not even part of the original cfg implementation
awww dude - Value not in list: sampler_name: 'rk_exp_5s' not in (list of length 28) lol c'mon so thats a new sampler then? gotta do the old git pull? are there any hidden bombs i should be aware of before i pull it?
cfg itself is a hack, but it's necessary for diffusion models to reach good performance
negative prompting is a hack of a hack
wow didnt know that
if you can get rid of it, perfect. If you need it, use it, if the model works without that's great
i much rather have cfg than have a distilled model where I get a range of 1 to 1.8 and i lose control over the aesthetics of the image
I prefer cfg, too, but I talk about negative
negative prompts are not part of cfg, they are an optional feature
they might work, or they might make things worse. People often misinterpret how negative prompts work and might overuse them or use them wrongly
yeah i agree with that statement, heck it took me a while to grasp the concept of negative prompts
not my creation, just a fun share, created using flux

sd3.5_large_fp8...aled | 🌱 4224251490 | 🦶 29 | 🦮 3.5 | cfg_scale_alt 3.5 | 🧠 sd35_VAE | 🎤 dpmpp_2m | 🕦 sgm_uniform | 🗓 11/16, 11:33 AM | ⏱️ 107s
(ignore the sampler/scheduler its just using res_2m) my only gripe with it is that vertical line on the left side
@short thicket it makes sense to see the mangled model performing on par with the flux distill model, i wouldn't say it's any faster
last chart I promise, i removed the distilled models, and the SDXL model at the bottom and included all results regardless of sample size or percentile
you dont understand it or you dont believe the stats?
weird how medium is taking the same time for me as large right? you see how red and pink are like inline with each other?
it looked like a mountain and i visualized a ski run
not really. in my experience, medium and large run about the same speed
oh wow interesting so you're seeing the same thing too, i feel like that's kinda bullshit, that gives me 0 incentive to ever use medium
don't you find that's weird considering one is a much smaller model?
medium is designed to be more artsy than large. it has a use
not really
oh i didnt know that
medium makes a nice refiner, too
so it's not the same training data just distilled?
no, it's not
that changes things good to know
cfg 7, cfg 6, then 5 then 4. it's clear lower cfg improves image quality I just really liked the scene at cfg 7, im doing another run at 3.5 to see what that looks like
this workflow is for 3.5 l with upscaling and 3.5m as a refiner. it was released with the rest of the 3.5 releases by SAI. you might take a look at it
oh cool ive been playing with the idea of trying to make a performant one, I was messing with the florence 2 yesterday, i think Kagi or NeonNinja shared it
i could get it down to 100s with decent results, Ill try that one and modify it to start with Load Image rather than 3.5L and see what kind of times I can get with it
sounds good :)
lol wow talk about upscale
thats fun to look at, you can even see inside the caves
@mortal mesa how long did it take you to make that image?
normal time, nothing special, what i was doing yesterday with slightly different settings
so that's the 6 panel workflow with florence and the NMKD upscaler right?
mmm i was swapping upscale models, i forget what was used on that one i'd have to load it, i'd bet 4x ultrasharp, but ya that WF. raised denoise and lowered noise injection
yes i was pleasantly surprised, local video stuff is tough for me (time and OOM) but ya could be nice
Russian DUNE.
which one is mangled and which is acorn?
Did you see the chart and skip my comment? It’s the one “on par with the distill” model, in other words the red line is clearly distill being the slowest model of the group and the orange line that touches it is mangled. ChatGPT made the chart don’t blame me on the color selection lol
Did you see the chart and skip my comment? Yup. LOL I read it afterwards. My bad.
Lol it’s fine, I was upset about not being able to tell either and I had to reason my way to figure it out
Have you tried it past 3?
I haven’t tried it past 3 Loras
I wish I could convey the sample size per entry in a chart
I feel like having 100+ entries for a given model and Lora count would be more accurate than something with 1 entry
@short thicket okay the time it took me from when i said that to when I was happy with a result is 40 minutes so you could almost say i spent 40 minutes making this chart (for fun of course)
the green means a confident number bc there's enough tests done for that scenario that the indicator is a good measure of actual average times
the yellow means not so much bc there's between 10 and 100 tests done
and the red means take it with a grain of salt bc less than 10 were done so it might be an edge case outlier as far as actual expected times
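The colour coding above boils down to a threshold check on the number of tests per entry. A minimal sketch, assuming the cutoffs are exactly 10 and 100 (the chat doesn't say which side the boundary values fall on, so `>=` is a guess) and using a made-up function name:

```javascript
// Map the number of test runs behind a chart entry to a confidence colour.
// Assumed boundaries: 100+ is green, 10 to 99 is yellow, under 10 is red.
function confidenceColor(sampleCount) {
  if (sampleCount >= 100) return 'green';
  if (sampleCount >= 10) return 'yellow';
  return 'red';
}
```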
how much vram do you have and are you sure you didn't fill your vram during generation?
I ask because that changes the numbers
if vram filled
Also, I noticed these are all gguf. Ive heard fp8 safetensors runs faster than gguf.
i thought gguf was designed to be faster than fp8, by being better at memory conservation it was inherently faster than fp8?
in my experience the fp8 version for SD3 does run faster in fact here's the chart for it
@halcyon yarrow you are putting way too much time into this to just post it here. you should consider making a video tutorial
but i don't think the fp8 is faster bc of the pruning method but rather bc it forces me to use the built in clip that uses the lower quality set than the triple clip setup i normally use
maybe i'll put it as a civitai article, ive done those before
that'd be good too
im also genuinely curious so im doing it for myself and sharing with others
it'll get buried here and lost.
gguf is slower than fp8
particularly on GPUs that have native fp8 matmul
neither of these are pruning
pruning is something a bit different
quantizing something isn't a form of pruning? i think so
speaking of which, 3B pruned flux came out just now https://huggingface.co/TencentARC/flux-mini
not really
pruning is to selectively remove bits from the model, by quantizing you're selectively removing the precision bits
when you quant something, what do you do?
you're rounding the precision on the weights right?
you're not removing data however, you're just stopping how many decimal places you do the math out to
when you prune, you actually remove data
if i'm wrong then civitai is wrong bc they call different options like bf16 and fp8 as different pruning types
quantisation is converting floating point numbers to less precise formats, whereas pruning is actually removing weights from the calculation
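A toy sketch of the distinction being drawn here, on a plain array of numbers. It is heavily simplified: real fp8 is a floating point format and GGUF q8 uses per-block scales, whereas this uses one uniform scale for the whole array; "magnitude pruning" is one common pruning approach, picked here just for illustration. Both function names are made up.

```javascript
// Quantisation sketch: snap every weight to one of a fixed number of evenly
// spaced levels. Every weight is still there, just stored less precisely.
function quantize(weights, levels) {
  const maxAbs = Math.max(...weights.map(Math.abs));
  if (maxAbs === 0) return weights.slice();
  const half = Math.floor((levels - 1) / 2); // e.g. 127 for 255 levels
  const scale = maxAbs / half;
  return weights.map((w) => Math.round(w / scale) * scale);
}

// Magnitude pruning sketch: weights below a threshold are dropped entirely
// (zeroed out), so they no longer take part in the calculation.
function prune(weights, threshold) {
  return weights.map((w) => (Math.abs(w) < threshold ? 0 : w));
}
```

With `[0.5, -0.25, 0.01, 1.0]`, `quantize(..., 255)` returns four slightly rounded but still nonzero values, while `prune(..., 0.1)` returns `[0.5, -0.25, 0, 1]`: the small weight is gone rather than rounded.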
i agree when you prune you actually remove data, by rounding a number you're effectively removing data
those are the number of decimal places you allow on the end of a number, which affects the precision of the math
a full unpruned 22gb model has had its data removed to become a q8 model at 11gb
it hasn't had the weights removed.
(if it's flux, it doesn't need 4 gig of the padding anyway)
i never said it did right?
i think that's a form of pruning
by reducing the number of weights we can also call that distilling
pruning is when you cut branches off a tree. quant is when you put a ring around the base and don't let the roots grow out very far
these terms mean specific things its not actually debatable
look all i'm saying is I'm using the wrong terminology take it up with CivitAI.com bc they're a really big player in the industry and they're using that terminology to refer to these different methods like fp8, bf16, q8 etc
i think pruning is a general term that can refer to distilling (to reduce weights) or quantizing (to round weights), with both methods you're effectively removing data
civit does a lot of things wrong - but @bitter hearth is a programmer, and math guy, and civit is not. i would listen to him.
it isn't, however. it refers to a very specific action that is taken on a model
its not a big deal, let's just agree to disagree 🤝
call it pruning the precision
There we go that’s a good one, speaking of pruning I wanna try that Flux Mini 3b Neon showed off, looks bad ass
@dusky thistle @bitter hearth https://everlyheights.tv/everly-heights-xyz-grid-evaluator/
think that's worth anything?
I think I badly misunderstood this applet
you're not supposed to feed it dinner
I do like the composable loras on this site
separate loras for background and characters etc
12 seconds to generate using Flux Mini, this is the default prompt ComfyUI puts in
@bitter hearth have you tried it on comfy yet?
18 seconds, 40 steps, cfg 4.5 ddim beta
gonna download it now
67 seconds at 40 steps cfg 4.5 ddim beta
flux mini seems to be struggling
this was SDXL on the same prompt, earlier in the year
lol yeah i was just gonna post that
this is with clownshark sampler
ksampler just sucks thats what it is
clown stuff is just so much better yeah
Where do you use flux mini, on comfyui? Because it gives me an error there
that was 170 seconds too
its a diffusers based model so you cant use load checkpoint
you gotta use load diffusion model and then load it up with the clips and standard flux vae on the side
ahhh ok thanks
this part of comfy is confusing
do I put the model in unet folder or diffusion_model folder
put it in the diffusion_model folder
ok thanks
you can just load my workflow if you wanna try it
Thanks
the better question is, do i bribe Comfy to create standards or do i just accept him doing stuff at random
sometimes it doesn't copy the metadata when i just copy the image through the clipboard so here's the fileupload
LOL
im trying res_2m see what times i get, i think i can safely bring it down to 20 steps too
I've actually mostly switched to Diffusers at this point just because its more standardised TBH
but there is no nice UI unless Matteo's project does well
i don't like diffusers format it just makes things more confusing and mostly bc my whole codebase is written around the checkpoints folder i dont support models in that diffusion_models folder
flux dev is ok at 20 steps yeah it will be worse quality than 40 steps but will still give an ok image
im gonna have to convert it to SD format i have a script for it later
i have faith in matteo
126 seconds, res_2m, 40 steps
oh I agree having two setups with different structure is confusing
this is why there is a long delay for stuff to get ported from diffusers to comfy
it lost coherency at 20 steps, 66 seconds but the bottle went missing
its a bit tricky
if you are below 40 steps probably want eta = 0
or very low eta
that's no galactic arm, that's a rip in reality!
have you seen the ComfyUI wanna be UI that's specifically for diffusers? I saw it on a video recently it looks cute
got it i was at 0.5 eta
no I haven't seen it, would probably use one if it was good
what about eta 0, res_3s and 15 steps? lets see...
it looked good it looked like a cleaner comfyui
going above order 2 requires a great many steps
that stuff is above my level, i dont really get what eta is doing i just understand it's a factor of noise
eta = 0 means no extra noise is added each step
if eta is anything above zero then extra noise is being added
you add the noise for a number of reasons
eta 0, 20 steps, 67 s, res_2m
mostly keep s_noise at 1.0 its quite spicy
on some models s_noise 1.03-1.07 can be a nice detail boost
i do not want to be invited to your house for a spicy dinner
clown renamed s_noise i dont see that field in the sampler anymore lol
LOL
yeah clown renames everything a couple of times per day
its part of the mystery
squirrel!
the d_noise thing is similar to that "lying sampler" node that went viral
or the "detail daemon" node that is similar
for the most part either it will boost detail a bit if you increase it, or it will break the model, depending on the model
15 steps, res_3s, 135 seconds
actually d_noise might want to go down rather than up, depends how it was implemented
the res_2m ones seem better
generally res_2m is the one for below 40-60 steps
and then above 40-60 steps res_2s with eta on is good
ive been using res_2m for everything by default, i could add logic where if steps > 40 then ill auto switch it to res_2s or res_3s. that's some good feedback thx Neon
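The auto-switch idea could look something like this. A minimal sketch only: the 40 step cutoff is the low end of the 40-60 range mentioned above, the `eta` of 0.5 for the high-step case is a placeholder for "eta on" (no specific value was given), and `pickSampler` is a made-up name.

```javascript
// Choose a sampler from the step count, following the rule of thumb above:
// res_2m with no extra noise at low step counts, res_2s with eta at high ones.
function pickSampler(steps, highStepThreshold = 40) {
  if (steps > highStepThreshold) {
    // Assumption: 0.5 stands in for "eta on"; tune per model.
    return { sampler: 'res_2s', eta: 0.5 };
  }
  return { sampler: 'res_2m', eta: 0 };
}
```

So `pickSampler(20)` gives `res_2m` with `eta` 0 and `pickSampler(60)` gives `res_2s` with `eta` 0.5.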
those sort of nibbles of knowledge are fun to consume bc they make my system better overall
"...nibbles of knowledge..." i'm stealing that
20 steps using res_3s at 172 seconds. I think this is what most would consider the "gold standard" image for this prompt something like this image
it's an interesting model with times ranging from 12 to 130 seconds
one issue I have with these models is they could end up losing the hyper/turbo lora compatibility
last one and then i gotta go, res_2s, 15 steps, 50 seconds, and I think what I changed that's making them better is i changed the base shift from 0.8 to 1.5 as per wizard's recommendation way back when
1.5 shift is fine yeah
I use a bit of a different method but it requires multiple k-samplers
Lol sounds expensive
the model goes from sigma 1 (pure noise) to sigma 0 (sharp, finished image)
and the important thing is that it has a decent number of steps before sigma 0.8 or so, or even sigma 0.9 or so
shift is one way of doing that
(or you could just roll dice and see what happens)
I prefer to use a node called split at sigma and then have a separate ksampler for sigmas 1-0.8 and sigmas 0.8-0
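For anyone trying to picture the split: given a decreasing list of sigmas, you cut it at the breakpoint and hand each half to its own sampler. A minimal sketch under the assumption that the boundary sigma is shared by both halves, so the second sampler picks up exactly where the first stopped. `splitAtSigma` is a made-up helper, not the actual node's code.

```javascript
// Split a decreasing sigma list at a breakpoint. The first sigma at or below
// the breakpoint ends the high-noise half and also starts the low-noise half.
function splitAtSigma(sigmas, breakpoint) {
  const idx = sigmas.findIndex((s) => s <= breakpoint);
  if (idx === -1) return { high: sigmas.slice(), low: [] }; // never reaches it
  if (idx === 0) return { high: [], low: sigmas.slice() }; // starts below it
  return { high: sigmas.slice(0, idx + 1), low: sigmas.slice(idx) };
}
```

With `[1, 0.9, 0.8, 0.5, 0.2, 0]` and a breakpoint of 0.8, the first sampler gets `[1, 0.9, 0.8]` and the second gets `[0.8, 0.5, 0.2, 0]`.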
comfyUI needs a random dice roll node that'll set every value to something random
i'm sure there would be horrors, but i'm equally sure really cool stuff would happen
a lot of my favourite things I found by accident
I guess it didn't know what to do with a frame LOL
all the video models just explode if they try to make R2D2 move
they can rotate around him while he sits still though
i kind of like the expanding frame idea
no they don't
meta
I mostly used cog, maybe they are better now
zuckerberg's AI
Wow talk about advanced techniques, I don’t really understand what sigmas are or the concept but I do retain some of what you’ve said how it has to setup the layout in the first 1 or 2 steps and that relates to the sigma somehow
sigma - isn't advanced, it's math
they decided to show people a scheduler name
instead of a list of sigmas
but what comes out your scheduler node looks like this 1, 0.8, 0.6, 0.4, 0.2, 0
if you choose something like SGM Uniform 5 step
might not be exactly that but its a decreasing list of numbers from 1 to 0
one number per step
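That example list is just an evenly spaced countdown from 1 to 0. A minimal sketch that reproduces it, assuming a perfectly linear schedule; as noted above, what a real scheduler outputs may not be exactly this (shift in particular bends the curve). `linearSigmas` is a made-up name.

```javascript
// Evenly spaced sigmas from 1 (pure noise) down to 0 (finished image).
// A run of N steps has N + 1 values: one per step boundary.
function linearSigmas(steps) {
  const sigmas = [];
  for (let i = 0; i <= steps; i++) {
    sigmas.push((steps - i) / steps);
  }
  return sigmas;
}

console.log(linearSigmas(5)); // [1, 0.8, 0.6, 0.4, 0.2, 0]
```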
The takeaway for me is that sigma is a factor where it’s a constant value of 1 and it’ll progress to 0 until it’s finished during generation
And then how it progresses is based on the scheduler and rather than having a general curve you like to split the curve with two ksamplers
sigma 0.5 is always 50% done
cog should be able to do it, thats right now the best open source model for img2vid. really waiting for mochi img2vid support, should be amazing then for open source.
yeah the pink curve is my overall sigmas
afraid I lost the workflow for this one
yellow curve is first sampler, then blue is second
pink is the overall combined curve
Did you try my img2vid node? lol I made one for mochi it’s crude but it works
So that break in the pink is where you split it right?
Does clown have anything to say about this? Is there anything he can do to facilitate achieving something like that with a custom scheduler in the nodes?
don't think clown particularly likes this method lol
Lol oh I see
from what I have seen he doesn't change shift or scheduler much
Hey @pseudo owl if you want I can link you to the GitHub where I put it if you wanna try it
there is already a "split at steps" node in comfy
or a "split denoise" node
so its not too different from that
He could add a field for like 2nd scheduler and then another field for the sigma breakpoint so you can pick res_3s for the first 20 percent and res 2m for the rest since the start is so crucial
So it would be scheduler one res 3s, sigma breakpoint 0.8, schedule two res 2m
nope, I did see some examples tho, seemed surprisingly decent but very little motion. nice work
maybe yeah, a lot of things can work
you do need a lot of steps for res 3s, sometimes like 60-100
res 2s and res2m need less
Lol yeah indeed, I was gonna try posting with a technique ChatGPT suggested called latent interpolation where I feed it the start and end input image and have it try to make an image to video that way but I feel like it would only work for the simplest of examples
I always change res 2m to res 3s without changing the steps (it’s like an option in my UI) to retry rendering an image and it always does fine
I think res 3s works at low steps just fine in my experience I don’t think I’ve ever reran it with the better sampler and didn’t get better results
there's a way to measure sampler error to know for sure
its in the original DPM paper
haven't seen someone make a comfy node of it but that might be cool
I do wish I could automate retrying, I know there’s solutions out there like a classifier that detects if the image is garbage or not, stuff like artifacts or just a solid color image or messed up patterns I’m just wary of going down that road bc it can also be subjective plus added overhead of classifying each image generated
If I could measure the error rate that could be a more light weight metric to trigger a retry
there's image quality assessment tools yeah
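A very cheap pre-check for the "solid color image" case mentioned above could look roughly like this. It is a sketch, not a real quality metric: `looks_degenerate` and the `min_stdev` threshold are hypothetical, and it assumes the image has already been flattened to a list of grayscale values (0-255).

```python
from statistics import pstdev


def looks_degenerate(gray_pixels, min_stdev=4.0):
    """Return True when the pixel values barely vary (near-solid-color image).

    min_stdev is an arbitrary illustrative threshold, not a tuned value.
    """
    return pstdev(gray_pixels) < min_stdev


flat_image = [128] * 64       # solid gray
varied_image = [0, 255] * 32  # strong contrast

retry_flat = looks_degenerate(flat_image)
retry_varied = looks_degenerate(varied_image)
```

This only catches the flat-image failure; artifacts and messed up patterns would still need a classifier or an image quality assessment tool.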
funny mochi gen
lol yeah
Thats really good too!
Have you tried pushing your vram to see what your max frames count is?
This looks like a good 7-8s clip
this one is from the official website, which uses unquantized mochi with 200 steps + an upscaler. locally I only tried videos with very short frame counts since longer takes forever, and I can't wait 20+ mins.
you could - just kick it off before you crash for the night
i do that with luma sometimes - kick it off and come back tomorrow
I think my max is 85 frames which comes out to 3 or 5 secs depending on the fps. At 85 frames it takes my 8gb GPU about 25 mins to process
If I wanna render just one second or like 16 frames it takes 5 minutes sometimes 4 so it’s not bad lol
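The "3 or 5 secs depending on the fps" figure is just frames divided by frame rate. A tiny sketch, assuming 16 and 24 fps as the two rates (16 is implied by "one second or like 16 frames"; 24 is a guess):

```python
def clip_seconds(frames, fps):
    """Length of a clip in seconds for a given frame count and frame rate."""
    return frames / fps


for fps in (16, 24):
    print(f"85 frames at {fps} fps = {clip_seconds(85, fps):.1f}s")
```

That comes out to roughly 5.3s at 16 fps and 3.5s at 24 fps.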
SD35M
no more jelly beans before generating
@bitter hearth I was upset that this flux mini requires it to exist in diffusion_models instead of the checkpoints folder, so after like an hour talking to 4o, then o1 preview about it I finally figured out the solution and wrote a script to convert mini to be compatible with ComfyUI's load checkpoint. yay. Posting the first image generated with this WF
made using SD35L
SD3.5 large
@bitter hearth I posted flux mini on civit and converted it to 3 other formats so 4 models posted
Yeah, since the release of 3.5, there hasn’t been much visible interest from the community. On Flux, custom models were already available just a week after launch
lol yeah I agree wholeheartedly, often times it feels like I’m the only one posting any images for sd35
Also yeah the number of new Lora’s for sd35 is really low/slow. I remember sd3 had a lot more Lora’s in its release
I’ll wait for the model that has the full realistic skin suit, I hear it’s coming out soon too lol 😆
thanks a lot, the Q8 will be helpful
I agree some people prefer checkpoint so they should offer a range
dude can you believe flux mini took down their project?
its coming back with a 404 now
it actually took a lot of research to figure out how to convert it from what the base was to something that'll work via load checkpoint, then a lot of work to figure out there's a Save Model node I can use to repackage it with the CLIP and VAE built in
and then it was easy breeze converting it to gguf after I got past the Load Checkpoint hurdle
i was trying to manually compile the safetensors file using python and then after a little research i felt dumb realizing i can do it in comfy
save model node yeah
I've used the save clip node once as well its similar, or save diffusion model
I also tried to bake in stuff like flan instead of t5xxl and that doesn't work. I tried to bake in LongClip in various different ways, with and without the dedicated node, and that also didn't work, so the baked in has t5xxl fp8 and the vit14 finetune by zero point
that's fine most people like that Clip L fine tune
personally I liked flan too but its controversial
yeah its a 3b model so it needs all the help it can get
the model wasn't trained with flan
so it is not clear it is a good idea
this applies to the Clip L fine tunes too by the way
I am not sure personally, I often try both
its interesting messing with that stuff, you sort of get to peek behind the scenes. i figured t5xxl and flan internally were the same structure but its likely different layers and then comfyui is doing some special sauce to adjust to the different layers internally
same goes for longclip, i cant build it in bc it always expects the 'standard' clip L and it only works via the dualclip/tripleclip loader bc of internal adjustments they're making after the fact
this whole thing started bc i want to use flux mini but i don't want to code support for the diffusion_models folder, so it turns out all I had to do to convert it from the base model to something that'll work with Load Checkpoint was just prefix the layer keys with a certain string and that's it. super simple change
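The key-prefix change described above is basically a dictionary rename. A minimal sketch, with loud assumptions: the chat never names the actual string, so `PREFIX` here is a placeholder, the key names are only illustrative, and plain strings stand in for tensors (real code would load and save the file with the safetensors library).

```python
def add_key_prefix(state_dict, prefix):
    """Return a copy of the state dict with `prefix` prepended to every key."""
    return {prefix + key: value for key, value in state_dict.items()}


# Placeholder prefix: the real string is not given in the chat.
PREFIX = "example.prefix."

# Illustrative stand-in for the tensors loaded from the base model file.
base_model = {
    "double_blocks.0.img_attn.qkv.weight": "tensor-a",
    "final_layer.linear.weight": "tensor-b",
}

converted = add_key_prefix(base_model, PREFIX)
```

Only the key names change; the tensor values are passed through untouched, which is why the conversion is cheap.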
i kept calling the base model diffusers format but that was incorrect, its actually in "flux transformers format", so i kept arguing with o1 preview like "okay if its already in the target format why isn't it working?" i ended up dumping the structure of an existing flux model that works via load checkpoint just so gpt can review and compare and figure out the solution
I've been using single clip loaders and then concatting the embeddings, for what its worth
if you use SD 1.5 with ELLA T5 and Clip L then you have to do it this way
I need to check exactly what the dual/triple clip loaders and prompt text encode nodes actually do TBH
it is the same layers no? its just a finetune right, I might be mistaken though.
I was saying on comfy discord a while ago that I want to make a new set of loader and encoder nodes
which will be model-agnostic
so for example you will use the same loader and encoder node set to encode prompt as you do to encode images for IP adapter embeds
i don't understand that statement, i thought internally SD1.5 was just designed for clip L, so you're making t5 work for sd1.5?!
ah there is a special thing for that
its called ELLA
its really cool
https://github.com/TencentQQGYLab/ComfyUI-ELLA they had to train it
there's no fine tune I made. regarding the layers, flux mini is a completely different architecture: as the post said, they went from so and so many single and double blocks to nearly a fraction of it
SD 1.5 with ELLA has better prompt adherence than SDXL
its crazy
the best things tend to have zero hype for some reason
wow that's crazy Neon pretty cool stuff the level of care they apply to this stuff
there's no downside to ELLA, that I know of
I've just been using LongCLIP with SD1.5 but I'd be willing to try ELLA and compare it
oh I thought you were talking about flant5xxl vs t5xxl, yeah flux-mini should have far fewer layers than it
and ELLA is only compatible with 1.5, its not compatible with SDXL?
very sadly they made ELLA for SDXL but did not release it
every now and then people go ask them on github
oh yeah in terms of those 2 there must be some different structure or layers internally, there has to be bc i tried baking it and it would just generate a black image
lol aw what a shame
some of the best stuff is not released
there is a fine tune of Lumina that looks as good as Flux to me
but its not released (its in the I-max paper)
The only downside was that some knowledge was lost, but that was just because of the dataset, and it has far better prompt following than sd1.5: https://github.com/TencentQQGYLab/ELLA/issues/35
you can add Clip-L embeddings as well to get some back
but yeah actually that's a good point
it will not be as good as pure Clip-L for subject knowledge
i don't use sd1.5 that much, i do support it but sdxl would be where it's at
cos even if you add Clip-L embeddings with "concat conditioning" node, the T5 embeddings are competing with them
this is my understanding of what the model architecture supports
SD15 - only L
SDXL - L & G
Flux - L & t5
SD3 - L & G & t5
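If you wanted that list in a script, it fits in a small lookup table. This is just the list above restated as data; the names `TEXT_ENCODERS` and `uses_t5` are made up for the sketch and it reflects the chat's understanding, not an official spec.

```python
# Text encoders each base model expects, as listed in the chat above.
TEXT_ENCODERS = {
    "SD15": ("clip_l",),
    "SDXL": ("clip_l", "clip_g"),
    "Flux": ("clip_l", "t5"),
    "SD3": ("clip_l", "clip_g", "t5"),
}


def uses_t5(model_name):
    """True if the given base model takes T5 embeddings natively."""
    return "t5" in TEXT_ENCODERS[model_name]


t5_models = [name for name in TEXT_ENCODERS if uses_t5(name)]
```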
this is the sort of style that I like SD 1.5 for:
for some reason I can't get this look in other models
its very grainy and stylized but still a photo
yea that's right
the left one clearly shows signs it was made with an inferior model but the right one could almost pass for flux at low rez
i might release an update to flux mini aio model replacing the t5xxl fp8 with t5xxl v1.1 fp8. took me a bit of searching huggingface to find it bc you only see the full 22gb model and im not gonna embed that into flux mini lol, i had v1.1 but only in gguf and you cant bake those in either
the one on the left is 1536x1536 which is why it looks worse
really hard to get SD 1.5 to do that res
There's also lavi-bridge, which actually makes sd support llms like llama 2 7b as text encoder and pixart support t5 large instead of t5xxl
https://github.com/ShihaoZhaoZSH/LaVi-Bridge
no gguf baking, no longclip baking, no flan baking
oh yeah thanks I forgot about lavi-bridge
its the competitor to ELLA I need to try it
that's pretty cool just read it, seems very similar to ELLA in my mind
there is a 46GB version of Flan T5 XXL
I've been thinking about using it with SD 1.5 as a joke
cos I sometimes rent the 80GB servers (only $0.70 per hour luckily)
so with lavi bridge in theory we could get SDXL to work with T5 right?
not sure
yeah thats the one, it was 46gb not 22gb, i misspoke
i think that would be an interesting bit of testing i could try, ill def pursue that over ella
I think ELLA is actually slightly better but Lavi-bridge is easier to train.
my understanding from months ago was this yeah
but I should try lavi-bridge myself before dismissing it
Also I believe the t5xxl models are so large since they also include the decoder part which is not even used in text encoding, the actual encoder part of t5xxl which is used itself should be like only 9gb.
yeah you can set the decoder layers to 0
some guy commented on the flux mini model's page: "NO use for it all images dont follow prompt and bad anatomy, loras dont work ... ! and i checked allready 50 flux models" lol. like granted loras don't work, i agree, but if its not following prompt or anatomy that seems more like a setup issue than a model issue
interesting and so would doing that somehow improve performance or whats the point of that?
the main benefit is it would be smaller
so less download time, and faster loading in comfy
it may or may not be faster, but it would definitely not be slower
mostly models get faster when you set layers to zero but sometimes its not a big gain, it depends
set it to 0 before I bake in t5 model so exclude the decoder part so it's more lightweight all the time, that could be handy, ill def look into that too
do you know the node name that lets me zero out layers?
this isn't doable in comfy it seems
comfy tends to not be so good for LLM stuff yet
so i could just manually edit the model with the safetensors library in python and then just use that modified model in comfy to bake it in. i can just switch gears and do that instead, ill try it when i build the next flux mini
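Stripping the decoder in Python would mostly be a key filter over the state dict. A rough sketch under assumptions: the keys follow the usual Hugging Face T5 naming where decoder weights start with `decoder.`, strings stand in for tensors, and `drop_decoder` is a made-up helper. Real code would load and save with the safetensors library, and `lm_head` may need dropping too.

```python
def drop_decoder(state_dict, decoder_prefix="decoder."):
    """Keep only the entries whose key does not start with the decoder prefix."""
    return {
        key: value
        for key, value in state_dict.items()
        if not key.startswith(decoder_prefix)
    }


# Illustrative stand-in for a loaded T5 checkpoint.
t5_weights = {
    "shared.weight": "tensor-embed",
    "encoder.block.0.layer.0.SelfAttention.q.weight": "tensor-enc",
    "decoder.block.0.layer.0.SelfAttention.q.weight": "tensor-dec",
}

encoder_only = drop_decoder(t5_weights)
```

If a file comes out with the same number of keys as it went in, it was already encoder-only and there is nothing to save.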
i even posted on flux mini's discussion board a bunch of text about how cool it is and how I posted it on civit and how i was offering these different versions. im sure Tencent team didn't take it down permanently, perhaps they're just preparing for another launch. their github was broken at the time so a better release would be apt
also @dusky thistle I linked to your github and suggested your sampler for using flux mini bc it does produce better results
hopefully i can convert some people to jump on the shark bandwagon
good to know that it does! which rk_type/sampler type did you choose
res_2m and res_3s are the only ones i ever used. i wanted to ask you about that new one you suggested, its not in the list of samplers so im guessing i have to do a git pull, but i wanted to confirm with you there aren't any bombs i should be aware of that I would have to adjust for?
- did you change the order of any of the inputs or outputs of the nodes?
- did you add or remove any fields to any of the nodes?
- did you remove any of the options in any of the existing fields?
same but also res_2s
similar though
I also liked the soft scaling more than hard 🤔
I'm going to be generating some images using flux mini to post them on the model's page and showcase its ability
A breathtaking landscape of a rugged mountain range covered in dense evergreen forests, with rocky outcroppings in the foreground. The bright blue sky and scattered clouds add depth and serenity to the scene.
A detailed portrait of a young woman wearing a luxurious red dress with intricate lace details, accessorized with pearl jewelry. Her confident gaze and the soft lighting create a regal and timeless atmosphere, reminiscent of classical art.
one on the left is incredible for 3B
one on the right needs refine
but that's ok
for example eye
yeah the right eye could use a little help but its still pretty good
oh you think the left eye could use help too?
i think the left eye is fine if she's looking that way but the right eye looks deformed
its not "bad" but even 2 steps of Realvis Schnell would help a lot
we have "eye detailer" now
like face detailer but for eye, nose etc
in impact pack can do that
A beautiful still life painting of vibrant pink flowers in a ceramic vase placed on a wooden table by the window. The sunlight softly illuminates the petals, creating a warm and inviting atmosphere, inspired by classic oil painting techniques.
A stunning surreal cosmic landscape featuring a majestic lightning bolt striking through vibrant orange clouds, with planets and stars in the background. A lone figure stands in awe, surrounded by ethereal beauty, evoking a sense of wonder and exploration.
A serene and atmospheric scene of a train station nestled in a lush tropical forest, illuminated by warm lights. The station features vintage architecture, and people walk leisurely along the platform under the towering palm trees.
but I don't like impact pack I do it in other ways
these are all using res_2m btw
i dont know if the WF is embedding into these images im just copying the image via clipboard
impact pack wants to do everything in terms of a new data structure called a "SEG"
but I don't want that
Wow, that's actually pretty great! When I tested mini-flux, I got some pretty bad imgs but I think it was my settings probably.
i just asked gpt4o to generate me a prompt for each of these images and that's how i got the prompts, civit took down this image bc of the kid in the third frame
oh is that why
this was what got me into flux mini like if it can generate stuff this good its gotta be worth trying
your checkpoint is the only one now lol
the original woman in red was using the q8 model by the way, after i quantized it to q8 the model is actually 3.4gb
imagine 3.4GB flux model
q8 model left, original base model right
q8 left, original right
nice its the same, essentially
again q8 left, original right. they're super similar to each other, for being 1.5GB smaller its pretty astounding how good it still is
yeah Q8 is great
A vivid and colorful depiction of a nebula in deep space, featuring intricate clouds of gas and dust in shades of red, blue, and gold. Bright stars shine through, creating a mesmerizing and otherworldly cosmic vista.
sometimes Q6, or Q5 ones can be good
its that term, diminishing returns. i like q8 bc I just want less memory usage at the expense of a little bit of loss, im not willing to accept more than just a little bit of loss lol
i think this is the train station in the forest one, people don't look so great here
I kinda agree Q8 is a good choice these days
personally I do everything FP8 but there are costs to that
look at my times for generating these images on my 8gb gpu, like I see one going as low as 43 seconds
then i click on that little share button in the corner for each image, it'll run it through 2x VLMs for post title and tags, run it through the 2xBSRGAN upscaler and use exif_tool so CivitAI can read all the details on how it was made
ah yeah I love automated chains like that
and it'll do all that in 10 secs per image
its hard to recommend upscalers to people cos there are so many variables but you can definitely do better than 2xBSRGAN
it has to be 2x bc I dont want 17-30mb image files laying around, i think 4-5mb is decent, and it has to be performant. ive used better models that def look better but im not willing to dedicate 30-45 seconds to upscale it
https://github.com/Phhofm/models/releases/tag/all_models this script downloads lots of good ones
actually this link might be more helpful it compares speed:
https://github.com/the-database/traiNNer-redux/wiki/PyTorch-Inference-Benchmarks-by-Architecture
cos there are really fast ones now too
would recommend span
ah this one seems perfect https://openmodeldb.info/models/2x-NomosUni-span-multijpg-ldl
its a 2x SPAN one for photographs
awesome dude thanks for the links Ill def go over those benchmarks i love that kind of stuff
no problem
with these upscale models its always worth trying a bunch
cos unlike diffusion models, the upscale models cannot work well outside of their exact training data
so it depends if your image matches what they expect
@shell bloom lets talk here buddy
so you're saying you got 4gb of ram and your fastest time yet with the 3B model at 1024px is 54 seconds using the aio model
No as a video card I have 12gb of vram, a 3060
oh dude with 12gb of vram you should be getting way faster times
when testing make sure to try two similar prompts twice
the first time it has to load the model, the second time is significantly faster
ive gotten times as low as 42 seconds with my 8gb so you should have no problem going even faster
what graphics card do you have?
4070 on a laptop so it's technically more like a 4060 for pc
I have an rtx 3060 but it is slower than a 4070 laptop, but I honestly don't know
what's your cpu?
the newer generations have more tensor cores maybe that's why? im not too sure. if you need help loading the unet model just take the image posted on my gallery and load that into your comfyui workflow
i7 8700k
are you running anything else in the background when you're generating?
I should check
I will try mate, thanks
a 3060 is going to be fairly slow as it is. so you can't have anything else running that'll want that GPU
I'm waiting for the new rtx 5000 to build a new pc
for now, make sure no games, or anything else that wants the gpu, are running while you're generating
https://civitai.com/images/40558181 click on the little blue Nodes button and itll copy the WF to your clipboard so you can just paste it into ComfyUI using CTRL+V
I will, thank you.
@shell bloom I think that @craggy crest is the right guy to ask in terms of sd35 training Ive read some of his messages and he's keeping up on that. im not into training or finetuning or any of that
thanks buddy
i have a script called shuffle-checkpoints that'll reassign items destined for other models to a specific model I set, so I just queued up 500 images into the flux-mini queue, expect to see some more flux-mini examples posted shortly
@bitter hearth testing your theory, wrote a quick script to remove the decoder side from t5, queued up a generation to see if it works or if itll produce an error
okay thanks, I've been looking for flux-mini samples lol
looking at the t5 xxl v1.1 fp8 and it doesn't have the decoder layers so it's already optimized
i think they're supposed to be like orcs or monsters of some sort lol
cannot tell if orc or chair 😂
is both
maybe they'd still have the One Ring if they had orc-chairs on their side
@mortal kite So you’re taking low Rez images of shirts and essentially up scaling them?
The in context thing?
I just hit 500 downloads on that Lora earlier actually
Yeah I think I got confused for that being the load image node lol
500 downloads for the in context Lora’s and not a single person has used it in the civit generator to post anything online, that’s kinda frustrating for me, I was excited to see how the community would use it but everyone is just offline generating and then not tagging if they even share it online
I was thinking that's what happened yeah
I actually don't use previews personally
if I want to see generation partly finished I would just stop k-sampler early
I’m the opposite, in fact I rewrote ComfyUi latent preview node just so I can see the preview of the batch
It’s super fun monitoring the preview of a batch of 4 in a grid and then on step 16/20 it decides to completely redo one of the set and it makes it way better or worse
I used to do batches of 5 now I do 4 just so it fits neatly as a grid lol
you might like this node pack https://github.com/blepping/ComfyUI-bleh it has improvements to the k-sampler previews
might be useful for ideas
oh yeah I don't use batching, its useful though
I think at this point I’m a die hard shark sampler guy, the only beef I have with shark sampler is how it handles the preview but I don’t hold it against him
I used TCD sampler for the vast majority of my images
not for Flux though
He’s letting the system handle it based on global user preferences and then he’s listening for the preview events if enabled, whereas ksampler efficient advanced ignores those settings and uses the local node settings to decide whether to show
What’s TCD?
its like hyper its a distilled version of SDXL or SD 1.5
this is the sampler for it https://github.com/JettHu/ComfyUI-TCD
So the sampler is also a distilled version of a model? Thats interesting
no the distilled versions are loras
the sampler just works well with them
cos it came from the same paper
the sampler is similar to euler_a
How does TCD handle artifacts?
You know how sometimes ksampler will do green splotches? Like in the corner of the mouths or the eyes or nose
not sure I haven't seen those
it's generally worse than regular SD 1.5 or SDXL for accuracy
That’s like the ultimate pet peeve for me, spent all this time generating an image and it almost feels cruel bc it’s artifacting key areas lol
restart sampler is the best, technically
as far as I know
does not work on flux though
Restart sampler? Don’t let ClownSharkBatwing hear ya lol
lol
noisy DPM/Res/Deis with a decent amount of Eta is also good but restart sampler is a bit better
I think it’s gonna be hard to switch from his sampler tbh, I’ve noticed on really complex images where the chances of artifacts are high it’ll go into this mode where rather than green splotches it’ll do like these artistic overlays I wish I could show you lol
That’s kind of a deal breaker for me
I especially like how I can run all 5 base models with the exact settings and it handles it like a champ
yeah I actually don't use restart personally anyway since I use TCD
Sd15, sdxl, pony, flux and sd35 all with the same settings
ETA 0.5 Gaussian Gaussian res_2m beta57
the reason I like TCD in particular is that it is the acceleration lora with the highest image complexity
(there is a model they use in papers now that judges image complexity)
those settings are good yeah
in my tests res_3s needed a lot more steps than res_2s
but it depends on settings/model/workflow etc
Interesting, I am willing to adopt even more hardened samplers that can tackle challenges better, currently if res2m fails me I use res3s but even that still fails tho from our last chat I could try switching to res3s and add double the steps to see if improves
there's implicit steps as an option too
but you have 8GB so it might not be worth it
there is a limit to how slow would feel okay
@bitter hearth so I found this ComfyUI node to use lavi https://github.com/kijai/ComfyUI-LaVi-Bridge-Wrapper/issues/1 and then I looked up the issues before tackling an install to confirm it works for sdxl, and the team said:
Thank you for your interest in our LaVi-Bridge! We did not include SDXL in our current work, but we are conducting experiments on SDXL with LaVi-Bridge and will update our progress promptly in both the research paper and this repo.
that was 6 months ago they said that
comments are precious:
ELLA folks did release the adapter checkpoint for t5+SD1.5 (spoiler: it's not good) but announced that SD XL adapter will not be released.
With the ELLA team shooting the open-source community in the back by not releasing its SDXL tool, it's now come to your team to be our savior. Good luck! We're all rooting for you.
this comparison screenshot of before and after is pretty impressive
wow nice
the embeddings really do play a major role in image composition so being able to inject t5 into these legacy models would give them a huge boost, i'm just not willing to add support for it if its only 1.5. once sdxl support comes out I'm first in line to try it
the llama 7B results were even better apparently
@halcyon yarrow do you know how to get comfyUI to run in a docker container on linux?
the only experience I have with that is using RunPod, technically it's a docker container in linux, so I would just pick the image that has the prebuilt toolchain and then just have a provision script
I could share my deploy.sh but it's specific to RunPod
see the image I use is this one:
imageName: "runpod/pytorch:2.4.0-py3.11-cuda12.4.1-devel-ubuntu22.04"
it doesn't even come with comfy, just pytorch and cuda bc those are the 2 hardest/longest things to install
and then my script just does some basic stuff like:
# First run only: the ComfyUI checkout does not exist yet
if [ ! -d "$COMFYUI_DIR" ]; then
    git clone https://github.com/comfyanonymous/ComfyUI.git "$COMFYUI_DIR"
    # Navigate to the workspace directory and update the repository
    cd "$COMFYUI_DIR" || exit 1
    git reset --hard origin/master
    git pull origin master
    # Step 4: Move the bootstrap custom_nodes into $COMFYUI_DIR/custom_nodes
    # (mv -n does not overwrite anything already there)
    mkdir -p "$COMFYUI_DIR/custom_nodes"
    mv -n "$BOOTSTRAP_DIR/custom_nodes/"* "$COMFYUI_DIR/custom_nodes/"
fi
drat. okay. i keep running into people that want to use linux, and i have one guy now that's trying to use docker as well.
if you're using stuff like AWS i'm sure there's prebuilt AMIs that have ComfyUI built in or at least pytorch/cuda pre-installed but those instances are very expensive
Also this is pretty nice, improves performance of clip in general, they are more focused on multimodels but should work in sd3/sdxl/flux and other models which use clip
https://microsoft.github.io/LLM2CLIP/
LLM2CLIP: Powerful Language Model Unlock Richer Visual Representation
i have a weird looking rendered image when doing SD3.5 in A1111
im using SD3.5 large
i assumed it had something to do with a VAE ... but i do something wrong, maybe its the wrong file or wrong location
Print it, frame it, sell for 50 million usd.
i will make a note
@atomic quest you might want to read this
does anybody know what i might do wrong?
explain? you might be using the wrong vae, wrong lora, wrong encoders, given the prompt wrong, have all sorts of other settings wrong ... can you post your workflow?
i put the sd3.5_large safetensor file into the models/stable-diffusion folder. I did download a vae file from civit.ai
do i put the vae file into models/vae or models/stable-diffusion folder?
i tried both, but it is not working
@sudden parcel are you talking about ComfyUI? bc ComfyUI doesn't have a stable-diffusion folder, afaik it's checkpoints, diffusion_models, or unet
A1111
you have to use a VAE that's for SD3.5 - and it sounds like maybe the one you have isn't for sd3.5. also, you want to make sure you're using a model that doesn't have the VAE baked into it, if you are going to configure a vae inside comfy
for sd3.5 it's /comfyUI/models/checkpoints
and you can make a folder in checkpoints called sd3.5 if you want
for a1111, you would put sd3.5 where the other models go. it doesn't need a special folder. but you still need to use a VAE that's written for it, not just any vae you found somewhere. and you still need to make sure you're not using a model version that has the VAE baked into it if you are going to use a separate vae
and you still need to make sure you have cfg, steps, and other settings correct
It's interesting, but it's a pre-rendering. Not the final product. Because you'll have to provide a compatible file, with transparency, to any printing firm.
Do you have a plan for that?
ok....
does this have the vae baked into it? https://huggingface.co/stabilityai/stable-diffusion-3.5-large
no. the sd3.5_large.safetensors on the files page doesn't have the VAE. and the VAE in the folder on the files page is the one you want for it. you'll find your encoders on that page, in their folders, too
or is there a better place to go and download the sd3.5 model?
and grab the sample workflow, too
@pseudo owl looking at LLM2CLIP, I like it, I want to try it, but I'm confused as to how to use it. is it just a drop in replacement? do I just add it to my clip folder in ComfyUI? I don't get it lol
the vae file is not available
that's the actual SAI release page but they also released it here https://civitai.com/models/878387/stable-diffusion-35-large
yes it is. it's called diffusion_pytorch_model.safetensors and it's in that VAE folder
just name it something like sd35VAE and put it in the folder your vaes go in
looks to not be complete yet
i mean im just gonna try it, there's a 1.2GB .bin file in there that I'm gonna rename to safetensors and give it a whirl
these results are outstanding
LongCLIP has been officially dethroned. look at base clip L all the way inside of the blue circle, base clip L sucks a lot
ok, downloading
I didn't try it yet, just found it. I think it should work normally, except renaming it to safetensors might not, idk. Just loading the bin file should work I think.
yeah renaming the bin file did not work lol
not sure what you mean by "work normally" the Load Clip node in comfyui doesn't support .bin files
Flux
yeah new clip l/14, just updated, https://huggingface.co/microsoft/LLM2CLIP-Openai-L-14-336/tree/main
didn't know bin files dont work with comfyui.
can you link us? I'm on this page: https://huggingface.co/microsoft/LLM2CLIP-EVA02-L-14-336/tree/main
I don't see them there, anyways I just used this script and converted it:
https://github.com/Silver267/pytorch-to-safetensor-converter?tab=readme-ov-file
but it didn't seem to work
yeah i see that 2.3GB safetensors file wowza
i thought the EVA02 version was better than that version tho that's why I went EVA02 first
We expect to release all the parameters of the text model, adapter, and related components today. Previously, we experienced some delays due to precision issues during the Hugging Face conversion process
it contains the image encoder part too but that's completely useless for image gen models, the clip load node should load only the text encoder part.
EVA02 is a different model I believe, which is a better alternative to clip, but the image gen models don't use it, they use clip l. The best clip model is SigLip by google but again, no model uses it as a text encoder.
i keep getting: 'NoneType' object has no attribute 'device'
this whole thing feels like it's not ready for us to try yet, if you guys get it working I'd love to hear about it
hmm... same results, no proper image, Sd3.5 model in the model folder and the vae file in the vae folder
maybe it's time to stop using a1111 and switch to comfy?
if you're willing to change, i suggest you consider installing SwarmUI and just letting it handle all the technical stuff, and using comfyUI inside it
it'll make your life a lot easier
@halcyon yarrow did you try any text with Flux-mini?
oh that's a good question, in fact I have the perfect prompt
i put the SD3.5 model into checkpoints
@dusky thistle https://huggingface.co/THUDM/CogVideoX1.5-5B /
CogVideoX1.5-5B
the clip files into the clip folder
@sudden parcel if you want a quick solution without having to gather a bunch of files you can try this one with the "Default Workflow" https://civitai.com/models/879259/comfyorg-stable-diffusion-35-large-fp8?modelVersionId=984291 just put that into the checkpoints folder and you're good to go
@pseudo owl i tried a bunch of text, now remember I'm using LongClip + t5 flan, so while it's usually excellent from flux-d models, mini seems to have completely lost the ability to do any legible text
oh thats sad, how about prompt following?
Try this prompt, I don't expect too much since even sd3.5 large or flux 8b messes this up often but why not: "A photograph of a white cat on top of a blue dog sitting on a brown couch in a living room. Behind them is a window and 4 cow pictures, one in each corner. Outside the window is a ufo hovering and outer space."
ok ill report back w the results
i need to leave, emergency
lol thats terrible adherence
I mean that's not too bad, pixart also got something similar and it seems better than sdxl.
one more
- white cat: yes
- blue dog: no
- brown couch: yes
- living room: yes
- 4 cow pics: no
- ufo hovering: yes
- outerspace: no
it managed to nail 4 out of 8 elements. now to be fair, that's 27 steps, let's try 40 and give it a good shot, see what we get
40 steps
nice increase in quality, what are those 4 things flying though lol
it managed to get 5.5 out of 8 now, I'll give it half a point for the half cat half dog creature
they almost look like flying cows lol
overall i think flux mini is a fun thing to play with but it has no loras. unless civitai adds it as its own base model, and the community rallies behind it and makes finetunes and loras for it, little mini is just destined to not be famous
No, this was generated entirely with that prompt
I was planning on making these myself just at home with iron-on or something
was generated with Flux fp8
oh i see there's a new set of models here:
https://github.com/ali-vilab/In-Context-LoRA?tab=readme-ov-file#community-creations-using-ic-lora
I'll have to update the model page to include those too then:
https://civitai.com/models/929592/creative-effects-and-design-lora-pack-in-context-lora
that's funny for #8 they cited my page but I didn't create anything new I just reposted their stuff from the model zoo
@pseudo owl this sd35 turbo
is the lora only able to handle tshirts or can it do weird garments?
like for example can you try a tank top with a radioactive symbol or something like that
or a crop top instead of a tank top to make it even harder
There is no LORA. This is Flux Dev FP8 directly rendering the prompt "a fashion advertisement of a olive green colored T-shirt with an image of a DNA Double Helix. The Helix is bordered by a 70's style multicolor line. Text below the helix reads "CODE IS LIFE""
oh i see, i thought you were still LORA'ing, that's cool
actually, I'm using PixelWave not flux base
forgot
its the Pixelwave flux model
I can try crop top
this time it puts a person in there
design a little wonky but yeah otherwise nailed it
yeah you almost gotta say just the garment or just the top
SD3.5 l
and... flux
this is the best one, it just missed one out of the 8 elements
look closer at that cat
count the number of cow pictures and where they are placed
yeah the cat is a little wonky but i'd still give it a pass, its a cat on top of a dog lol
that's the only deduction i gave it that's why i said one of the 8 elements was the placement of the paintings
white cat: yes
blue dog: yes
brown couch: yes
living room: yes
4 cow pics: yes
placed in right corners: no
ufo hovering: yes
outerspace: yes
thus 7/8
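If anyone wants to keep these tallies consistent between runs, here is a minimal sketch of the same checklist as a tiny script. The element names and yes/no values are copied from the list above; the function name `adherence_score` and the idea of using 0.5 for partial credit (like the half cat half dog creature earlier) are just assumptions for illustration, not an established benchmark.

```python
# Checklist from the 7/8 run above: 1.0 = yes, 0.0 = no.
# A value like 0.5 could be used for partial credit (an assumption, not a standard).
checklist = {
    "white cat": 1.0,
    "blue dog": 1.0,
    "brown couch": 1.0,
    "living room": 1.0,
    "4 cow pics": 1.0,
    "placed in right corners": 0.0,
    "ufo hovering": 1.0,
    "outerspace": 1.0,
}


def adherence_score(results):
    """Return (points earned, number of elements) for a prompt checklist."""
    return sum(results.values()), len(results)


points, total = adherence_score(checklist)
print(f"{points:g}/{total}")  # prints 7/8
```

Writing the elements down before looking at the image also makes it harder to quietly change the denominator between runs.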
the prompt is, however, unclear on a number of elements. it says "A photograph of a white cat on top of a blue dog sitting on a brown couch in a living room. Behind them is a window and 4 cow pictures, one in each corner. Outside the window is a ufo hovering and outer space." and (picky magazine editor coming out) if we break that down, we have some fairly unclear concepts. it starts out fine, "a photograph of a white cat", and then says "on top of a blue dog" - on top... how? laying? sitting? sprawled? something else? and then it moves on to this: "... sitting on a brown couch in a living room" - what is sitting? the cat? the dog? both? if an author had sent me that, i'd stop right there, send it back, and tell them to revise so it was clear to the reader what they were describing. to go on, however, we come to "behind them is a window and 4 cow pictures, one in each corner" - behind has to refer back to the cat and dog, that's good. and window is clear. we know that we expect to see a window as their background. windows have to be in walls so we expect to see a wall. but then we see this: "4 cow pictures, one in each corner" - each corner of what? the window? are they actually IN the window, in the top right, top left, bottom right, bottom left corners? are they on the wall next to the window's four corners? or does the author actually mean that the pictures are on the wall that the window is in, but are in the 4 corners of the wall? not a clue what he really means, he didn't really say. moving past that we see "outside the window is a ufo hovering and outer space" - the UFO is clear, we expect to see a stereotypical ufo through the window, but what does outer space mean? does it mean we just see a starry sky the way we see outer space from earth? does it mean we see outer space from some other point of view, such as one of the images did, with the earth and stars around it? does it mean something else? again, it's not very clear.
- and if I, a professional publisher and editor, am going to rip this apart and tell the author to revise it and make it clear, because I can not really tell what's being described, the poor AI that hasn't got any real experience other than its training data set is really going to have a hard time figuring out what is actually wanted.
yeah you're right about all of that, the prompt is lazy and unclear but i guess that's sort of the challenge, seeing if the model can make something that fits those loose words. i was also stumped when scoring/judging it what "ufo hovering and outer space" meant, like how that could work, and then i saw your variation and i'm like "wow that's a really good interpretation of that text" bc despite how unnatural it is, the view outside the window is of outer space
bottom line is you're right, the prompt is unclear, but it managed to put together all the elements in the prompt to deliver a really nice coherent image
yes, but it's a really good example of why AI's make so many mistakes, and why it's very important to not only be clear but also think like the computer does - unless you just really want random results
agreed
maybe AI will get better one day
[4koma] In this light and cheerful comic:
[SCENE-1] In a bright forest clearing, a cheerful boy with short brown hair stands facing an orange fox. The boy smiles and says, "Good morning, little friend!"
[SCENE-2] The fox holds up a shiny red apple, looking proud. The boy responds with a smile, "Wow, that’s so kind!"
[SCENE-3] Both sit beside a stream. The boy points at the water, laughing. He says, "This place is perfect!"
too cute award!
using that in-context lora that's new called '4koma' designed to make cute scenes like this, flux dedistilled is the only model that can consistently nail a 'complex' set of text, literally every other flux model ive tried today can't even get one word bubble correct
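For anyone who wants to reuse that layout, here is a small sketch of a helper that assembles a prompt in the same shape as the one above: a bracketed tag plus intro line, then one `[SCENE-n]` line per panel. The function name `build_panel_prompt` is made up for illustration, and the only thing it assumes about the lora is the tag format visible in the prompt above, so check the lora's own page for the exact trigger text it expects.

```python
def build_panel_prompt(tag, intro, scenes):
    """Assemble a multi-panel prompt: a [tag] intro line, then one [SCENE-n] line per panel."""
    lines = [f"[{tag}] {intro}"]
    for i, scene in enumerate(scenes, start=1):
        lines.append(f"[SCENE-{i}] {scene}")
    return "\n".join(lines)


prompt = build_panel_prompt(
    "4koma",
    "In this light and cheerful comic:",
    [
        'In a bright forest clearing, a cheerful boy with short brown hair stands '
        'facing an orange fox. The boy smiles and says, "Good morning, little friend!"',
        'The fox holds up a shiny red apple, looking proud. The boy responds with a '
        'smile, "Wow, that’s so kind!"',
        'Both sit beside a stream. The boy points at the water, laughing. He says, '
        '"This place is perfect!"',
    ],
)
print(prompt)
```

Keeping each scene as its own list entry makes it easy to tweak one panel (say, repeating the boy's clothing in every scene) without retyping the rest.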
have you tried recraft yet?
i got as far as making an account, finding there was no model i can download, seeing it wasn't free, i get there is 50 free credits per day but im not gonna get attached to something that's not free so i stayed away from it
yeah, it's not free, but it's very good at illustration and cartoon
@mortal mesa that looks like a remix of the one you posted the other day, cool variations. I like the first one the most bc it was such a lush set of greenery, this one is more 'dead' despite all the plants, cool concept that's for sure
ya same prompt/seed, just with loras
this is recraft, just copied and pasted the exact prompt into each box, super intuitive interface
- it nailed the first scene no complaints there
- the boy's face looks weird in the second, wish he was wearing the same clothes too, and the text is wrong
- lost context and now it's a girl? what happened to the fox?
I'm sure with a few adjustments to the prompt I could solve all that. I could def see the use for this for creative professionals
there we go, some slight tweaks to the prompt and I basically did the same concept in less than a minute with recraft, while locally rendering that with flux took like 700 seconds lol
@craggy crest yeah recraft is pretty cool that was fun to make, took up 28 credits to do so i have room to make about 2 of these per day
Generate a poster
read the information in this channel -> #artisan-faq
Here is the image you requested.
cute boy,with black glasses
ok this is real impressive
i still wish there was a major sd3.5 finetune to improve the big stuff
recraft composition is CLEAAAN
agreed
you could create one
flux is really good for multi-panel images, too. It usually preserves character identity really well. Complex text still might be a problem, though
what appealed to me a lot in the paper was the sandstorm thing
would be cool to try to train ones for smoke effects or lighting effects
i agree, I think between the in-context lora for multi-panel and doing really good text its an excellent model for handling that, but at the same time stuff like recraft is more practical for the non-tech savvy who want to do something like that and doesn't have the skills or means to run a set up like that locally, plus 700 seconds vs 60 seconds. if recraft was free free I'd be defending it more but a paid service is kinda lame
out of the closed-source ones, I think FLUX Pro 1.1 Ultra is the most impressive cos its 2048*2048 in just 10 seconds
but if you include upscalers I think its possibly the latest Topaz Gigapixel
saw a video where it did a creative upscale that ended up over 19,000 pixels wide
sometimes I'll generate what I consider garbage (made using fluxubooru) bc it didn't adhere to the prompt or the source image, it just sort of did its own thing. I shared it on civit anyways, and I've had 10 reactions and 30 buzz from it
SD3.5M quality seems great, but I'm only using it as a refiner for Pixart sigma. It does shitty compositions otherwise, very bland. Happy to have such a small model packing so many pixels.
this looks fine but its normal dev quality
lol yeah its nothing particularly outstanding, just funny one man's garbage is another man's treasure
its very subjective yeah
now go animate that.
lol i do have some free credits with Kling
i wrote a short story that you need to read http://www.bewilderingstories.com/issue240/sculptor.html
What is perfection in art? Who knows? But keep in mind that the 'rough spots' may be part of it.
:) yup. all artists suffer from it - too close to the trees and can't see the forest
too busy looking at the bark beetles to see anything else
personally i don't see myself as an artist, or a perfectionist, i'm not detail oriented and I oftentimes overlook glaring mistakes, like i didn't notice the cat on top of the dog looked weird until you pointed it out lol
||56y||
cat
the dictionary defines art as 'human creativity expressed' - not what the final result is, or anything else. if you're being creative, you're making art. and if you're making art, you're an artist
dog
Ok you got me lol 😆
Yeah I know it’s not very properly formatted, but surprisingly, most of the time fixing that doesn’t really improve quality. This is flux schnell, 8 steps, same seed. Left has 3 cow pictures but the dog has no head. Right has 2 cow pictures and the dog isn't headless.
For example, left is
A photograph of a white cat sitting on top of a blue dog. The blue dog is sitting on the brown couch. Behind the couch is a square window with a square cow picture in each corner of the window, the total amount of windows being 4. Outside the window is a ufo hovering in dark outer space.
Right is
A photograph of a white cat sitting on top of a blue dog sitting on a brown couch in a living room. Behind them is a square window and 4 square cow pictures, one in each corner of the window. Outside the window is a ufo hovering and dark outer space.
Left has 3 cow pictures but dog has no head, < the dog has a head, it's turned away from the camera and the cat is sitting on it, blocking it from view
Ok yeah, that’s justified.
the most common prompt adherence benchmark, clip score on ms-coco, uses prompts like this:
```
227590,The passenger train is painted brown and white.
467578,A box of donuts with a coffee in front of it.
379476,A long tunnel with a long table with lots of seats and candles next to wine glasses.
35206,a tennis player crouching down near the net
173208,A plate of food has some sesame seed bagels.
416059,Two people walk through the snow behind a dog.
350278,A zebra standing on top of a dirt field.
143224,An airplane on the tarmac and the glass passageway leading to its door
294853,"A man in a red cap, green shirt and white shorts holds a tennis racket under his arm."
323552,A young girl with glasses appears to be waiting with luggage at the baggage center.
185181,A giraffe on display in a glass enclosure.
43850,A man standing over his dog on a beach while holding a surfboard next to the ocean.
351369,A landing jet airplane kicking up spray on a wet runway.
558661,A woman that is standing in the grass with a frisbee.
119516,A beautiful woman standing on the side of a rad next to a street.
89790,A man in a parking lot talking to the driver of an army green pickup truck.
```
i almost feel like running the whole set against flux dedistilled just to confirm it would score 100% on it
they've started to move on to harder benches yeah
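For reference, a rough sketch of what scoring against that kind of caption list involves. It only shows two pieces: parsing `id,caption` rows (including the quoted caption with commas in it) and turning an image embedding and a text embedding into a score. The formula used here, 100 times the cosine similarity clamped at zero, is one common convention and other implementations scale it differently. The 3-number vectors are made-up stand-ins, and actually producing embeddings needs a real CLIP model, which is not shown.

```python
import csv
import io
import math

# A few rows in the same "id,caption" shape as the list above.
RAW = '''227590,The passenger train is painted brown and white.
294853,"A man in a red cap, green shirt and white shorts holds a tennis racket under his arm."
350278,A zebra standing on top of a dirt field.
'''


def load_captions(text):
    """Parse id,caption rows; csv handles the quoted caption that contains commas."""
    return {int(row[0]): row[1] for row in csv.reader(io.StringIO(text)) if row}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def clip_score(image_emb, text_emb):
    """100 * max(cos, 0): one common convention for the CLIP score."""
    return 100.0 * max(cosine(image_emb, text_emb), 0.0)


captions = load_captions(RAW)
print(captions[294853])

# Made-up vectors standing in for real CLIP image/text embeddings.
print(clip_score([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]))
```

Note that the score is a similarity, not a percentage of prompts passed, so there is no "100%" to hit; real image/caption pairs land well below the maximum.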
here are two examples of the clownshark sampler doing some interesting effects rather than creating artifacts, prompt is:
score_9, score_8_up, score_7_up, source_anime, masterpiece, best quality, perfect anatomy, very aesthetic, absurdres, (3 girls), cute, standing in a fancy restaurant, carrying menu, french maid, intricate detail, 1girl
the extra noise helps yeah
it randomly pushes the model out of areas with low score function gradient
sometimes the model thinks it has found a good solution but it only found a good solution for that particular area of the solution space, that didn't have a lot of gradient
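A toy sketch of that intuition, in one dimension. This is not the clownshark sampler or any real diffusion sampler, just a made-up "score gradient" that is zero on a flat plateau and pulls toward x = 3 everywhere else. All names and numbers here are assumptions for illustration. Without noise, a sample starting on the plateau never moves; with noise added at each step it can wander off the flat region to where the gradient is non-zero.

```python
import random


def score_gradient(x):
    """Toy score gradient: flat (zero) on the plateau |x| <= 1, pulling toward x = 3 elsewhere."""
    return 0.0 if abs(x) <= 1.0 else 3.0 - x


def sample(x, steps, step_size, noise_scale, seed=0):
    """Follow the gradient, adding Gaussian noise scaled by noise_scale at every step."""
    rng = random.Random(seed)
    for _ in range(steps):
        x += step_size * score_gradient(x) + noise_scale * rng.gauss(0.0, 1.0)
    return x


# Same starting point on the plateau, with and without extra noise.
stuck = sample(0.0, steps=500, step_size=0.1, noise_scale=0.0)
noisy = sample(0.0, steps=500, step_size=0.1, noise_scale=0.3)
print(stuck, noisy)
```

The noise-free run stays exactly where it started because every update is zero, which is the "found a good solution only for that area" situation described above.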
The passenger train is painted brown and white< i wanna see that image
just say no to flux
that's a pony specific prompt. it's only going to work correctly with pony
yeah it was rendered with pony, i just finally found an example I can share of this cool effect the sampler is doing
for this prompt "A man in a red cap, green shirt and white shorts holds a tennis racket under his arm." the word 'holds' means 'gripped in the hand'. the normal way to say this is 'tucked under one arm' - but i have to wonder how many images in the data training set show tennis players with a racket under an arm as opposed to how many show them holding the racket in a hand?
ms-coco is kinda old now, came out in 2014
it was for object detection so for that prompt really it was designed to put a bounding box on the hat, shirt, shorts and racket
it gets kept around for historical reasons but its not optimised for image gen at all
the downsides of switching the widely used benchmarks are so high that they only change benchmark when they really really have to
FID is very flawed also, and it is now well-known how to game FID (fake a good score, which for FID means a low one)
but it correlates decently enough with human preferences so they still keep it
SD3.5 does fine, but he's 'holding' a racket in his hand, not under his arm - because you never talk about someone holding a racket if you mean it's tucked under an arm
sd3.5 L prompt: a pink poodle eating a large taco while sitting on a barrel
prompt: a pink poodle sitting on a barrel. it is holding a large taco in its front paws and gnawing on it
be specific in your prompt, you'll get closer to what you want
@bitter hearth I looked into LLM2CLIP further and I had a few takeaways
- so the LLM model is out and the vision model is out
- I tried the vision nodes in ComfyUI to try to make it work somehow and I don't think this one is compatible with that architecture
- it seems like the only way this is going to work is basically an upgraded ClipTextEncode node where you type in the prompt and, rather than sending it to an LLM to generate a better prompt which then gets converted to embeddings, it sends it to an LLM to generate better text embeddings
- ultimately this is one of those "the more compute you throw at a problem, the better output you get" things, i just don't think I wanna have an 8B LLM model as part of the processing pipeline to improve my images
- Once someone gets that stuff working in ComfyUI maybe I can try quantizing the LLM down to like a Q2 to make it real quick tho
oh this poodle taco thing worked really well thanks