#sd3
when was the last time you heard OpenAI say that ChatGPT knew anything about how to use ComfyUI, much less those nodes?
i literally had it code up an image to video node and got it to work
you were fortunate
lol no dude you were wrong
I have a zig zag setup that's interesting.
chatgpt is highly trained on comfyui since 4o came out, and if you know how to use it then it's able to do anything, if i can have it code a complex AF node like image 2 video then a simple change like having an input field that affects the other ones should be a breeze
i'll let you argue this with @dusky thistle
check it out this is the video I made using mochi
there were all sorts of MBW patterns for the SD UNets. Flux is a different beast though as far as architecture.
where are her arms?
mochi only has an empty latent node so you can only do text 2 video, i created a image latent node where you do load image > vae encode > image latent mochi video and i got it to work
Plus these are merging on the 24gb models so it takes a lot.
lol i mean it's not perfect, i'm running like super low res fp8 quantized stuff just to get 2 seconds out of it on my 8gb vram hardware, you're missing the point if you're gonna point out issues with the quality
yeah. flux is using LLM architecture
especially for more advanced merge methods that involve making calculations on every block
are you limited by vram when merging?
there IS a node for mochi ...
chatgpt is like a 4 year old when it tries to write sampler code
this whole argument started bc mangler griefed at the manual labor involved with changing every field to be the same value, my response was it should be super easy and straightforward to adjust that node to allow you to do that with chatgpt, i think if it was me I'd spend 15 to 30 minutes getting it done and save myself the drudge of having to do it manually
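For reference, a minimal sketch of the kind of change being argued about: a ComfyUI-style node where one master input can override every per-block field. The class name, the `block_N` field names, and the comma-separated string output are all hypothetical; only the `INPUT_TYPES` / `RETURN_TYPES` / `FUNCTION` / `NODE_CLASS_MAPPINGS` layout follows the usual custom-node convention, and this is not the actual merge node from the discussion.

```python
class BroadcastBlockWeight:
    """Hypothetical node: one master value can override every per-block field."""

    # Assumed field names; a real merge node would have its own list.
    BLOCK_NAMES = [f"block_{i}" for i in range(4)]

    @classmethod
    def INPUT_TYPES(cls):
        float_spec = ("FLOAT", {"default": 1.0, "min": 0.0, "max": 1.0, "step": 0.01})
        required = {
            "use_master": ("BOOLEAN", {"default": True}),
            "master": float_spec,
        }
        required.update({name: float_spec for name in cls.BLOCK_NAMES})
        return {"required": required}

    RETURN_TYPES = ("STRING",)
    FUNCTION = "build"
    CATEGORY = "example/merging"

    def build(self, use_master, master, **blocks):
        # When use_master is on, every block gets the master value;
        # otherwise each block keeps the value typed into its own field.
        values = [master if use_master else blocks[name] for name in self.BLOCK_NAMES]
        return (",".join(f"{v:g}" for v in values),)


# Registration dict that ComfyUI looks for in a custom node package.
NODE_CLASS_MAPPINGS = {"BroadcastBlockWeight": BroadcastBlockWeight}
```

With `use_master` off the node behaves like the original field-by-field version, so nothing is lost by adding the toggle.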
or even simple stuff like dealing with tiling
yeah that's what i did, im using all the built in nodes it was great and easy
the method I showed you isn't too bad in terms of the time it takes. But I've tried the Dare and Della methods and both of those are HEAVILY taxing because they are running matrix calculations on every single block before merging. I spent a lot of time just figuring out how to do them without getting OOM errors on my 3090
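A rough sketch of the drop-and-rescale idea behind DARE-style merging, on plain Python lists instead of real tensors. The function name, the flat-list representation, and the defaults are assumptions for illustration; an actual merge does this per weight tensor for every block, which is where the memory and compute cost comes from.

```python
import random


def dare_merge(base, tuned, drop_rate=0.5, seed=0):
    """Merge `tuned` into `base` by randomly dropping deltas and rescaling the rest.

    base, tuned: equal-length lists of floats (stand-ins for one weight tensor).
    drop_rate:   probability of discarding each delta (assumed default).
    """
    if not 0.0 <= drop_rate < 1.0:
        raise ValueError("drop_rate must be in [0, 1)")
    rng = random.Random(seed)
    keep_scale = 1.0 / (1.0 - drop_rate)  # rescale so the expected delta is unchanged
    merged = []
    for b, t in zip(base, tuned):
        delta = t - b
        if rng.random() < drop_rate:
            delta = 0.0          # dropped: fall back to the base weight
        else:
            delta *= keep_scale  # kept: amplified to compensate for the drops
        merged.append(b + delta)
    return merged
```

Every element needs its own random draw plus the delta arithmetic, so on a full-size checkpoint this is far heavier than a plain weighted average of two models.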
i use it to look up functions but i don't trust it to write anything for me that's more complex than... a bash script to rename some files or something
i usually end up spending more time fixing its mistakes than just writing it myself
i think it has a lot to do with user experience and skill, it's only as good as you're able to be, you gotta understand its nuances and you gotta be able to get on its level
clownshark is a hell of a programmer
there is something to be said about finding a balance between fixing its mistakes and just fixing it yourself, sometimes you're right it's more of a time waste to have it fix things than to just do it manually but it's def a good helper in that regard where it can get me 90-95% of the way there and then i just take over and adjust the final bits
he could probably write the code to create chatGPT
i've found it gets you more like 20% of the way there, then goes 15% of the way in some random wrong direction
i'm not doubting that he did make that cool alternative to ksampler, to each their own right?
i was serious when i said he could probably write the code to create chatGPT
as someone who has chatgpt code more than 90% of all my stuff i highly disagree, it's user error or lack of knowledge if you can only get it 20% of the way there
i would suggest you listen to the man
it's a matter of opinion crystal, there's no right or wrong, to each his own
not a matter of opinion. where is your code? where's your repo?
you just gotta spend time kinda learning the ins and outs of chatgpt to truly be able to master it, some are just biased and don't want to master it, there's no right or wrong here
give it sample code from comfyui, something like the code for dpmpp_2m so it has an example
shakes head
then ask it to write a version of that that does implicit sampling, using Gauss-Legendre quadrature
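To make the ask concrete, here is a toy sketch of an implicit 2-stage Gauss-Legendre Runge-Kutta step on a scalar ODE. It is not the sampler being discussed: in a diffusion sampler `f` would be derived from the model's denoised prediction and the state would be a latent tensor. The fixed-point iteration used to solve the implicit stage equations is an assumption chosen for brevity.

```python
import math

# Butcher tableau of the 2-stage Gauss-Legendre method (order 4).
SQRT3 = math.sqrt(3.0)
A = ((0.25, 0.25 - SQRT3 / 6.0),
     (0.25 + SQRT3 / 6.0, 0.25))
B = (0.5, 0.5)
C = (0.5 - SQRT3 / 6.0, 0.5 + SQRT3 / 6.0)


def gauss_legendre_step(f, t, y, h, iters=50, tol=1e-14):
    """One implicit step for y' = f(t, y); stages solved by fixed-point iteration."""
    k1 = k2 = f(t, y)  # explicit guess for both stage derivatives
    for _ in range(iters):
        n1 = f(t + C[0] * h, y + h * (A[0][0] * k1 + A[0][1] * k2))
        n2 = f(t + C[1] * h, y + h * (A[1][0] * k1 + A[1][1] * k2))
        converged = abs(n1 - k1) + abs(n2 - k2) < tol
        k1, k2 = n1, n2
        if converged:
            break
    return y + h * (B[0] * k1 + B[1] * k2)


def integrate(f, t0, y0, t1, steps):
    """Apply `steps` equal Gauss-Legendre steps from t0 to t1."""
    h = (t1 - t0) / steps
    t, y = t0, y0
    for _ in range(steps):
        y = gauss_legendre_step(f, t, y, h)
        t += h
    return y


# Usage: y' = -y, y(0) = 1, so y(1) should be close to exp(-1).
approx = integrate(lambda t, y: -y, 0.0, 1.0, 1.0, steps=10)
```

Each implicit step calls `f` several times while the stages converge, so in a sampler every step costs multiple model evaluations.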
then def paste the code here and ping me haha
that's what i mean by you're doing it wrong
i'd never approach the problem like that
i was able to write it myself in like... 3-4 min
it's always interesting to listen to amateur coders that think they know more than they do argue with a master programmer
prolly couldn't even explain it to chatgpt in that time
it's def best to just learn how to code the stuff yourself, it's slow at first but much faster once you've gotten some experience
you gotta sort of prime it to be able to get you there, so for example when i had it make the image 2 video node for me i gave it the EmptyLatentImage node, I gave it the MochiLatentImageNode and I gave it all the code for nodes_latent.py so it has a frame of reference as to what's going on. i didn't just go in with a basic example and ask for something complex, that's what i mean by user error
son, he KNOWS how to prompt ChatGPT
well, see if you can get it to write code for that, no matter your approach
you apparently don't
and let me know if you succeed
16 years of programming experience, i think it's a little messed up to assume things about people but to each their own right?
the in-context lora paper was very cool but I find it hard to see the use case for making panels of 4 images
if you do, i'd certainly be interested in knowing the approach
but i am def not seeing that happening lol
i've got twice that and i wouldn't even try to compete with him
i had no problem having it make the panels in any of the loras except the couple one
why did you turn it into a competition pitting 2 people against each other though
good for you, i'd just refrain from calling people you don't know amateurs, it's a little off-putting imo
all ego, if you don't like Pepsi you're out of the club
i didn't. Shark challenged him
bottom line is the approach of feeding it a simple example + prompting a complex ask is a bad strategy, you wanna feed it plenty of examples all in the relevant domains and then before you even tackle the code discuss stuff like strategies and even ask it to ask you if there's anything else it needs to know, it'll be like "in terms of this how do you wanna do this?" and then you can think about it and guide it, just saying it's a lot better than 20% of the way there and if that's as far as you get you're 100% doing it wrong
well, if you're able to pull it off, lemme know
i gave up trying to get any LLM to write complex code with lots of tensor math
the only acceptable drink is Dr.Pepper
@mortal mesa check out the model page for the in-context loras on civit i created: https://civitai.com/models/929592/creative-effects-and-design-lora-pack-in-context-lora I've been spending the day playing with the loras and posting images for each lora so visit each tab and scroll down to the gallery to see my creations
you were busy
@bitter hearth see plenty of 4 panel examples
to be fair chatgpt generated all the prompts i just guided it with basic ideas
oh I wasn't very clear
I was saying I don't see the need to make images of 4 panels
I agree it can do it well
it's such a trend on civit tho, especially for nsfw images
where each frame is a scene in some erotic story
its still a big deal yeah, since SDXL could not do this
there's tons of loras for sdxl/pony that do this for sure, sdxl actually has way better ones where they're not uniform panels
ah okay wasn't even aware people made panel loras for SDXL lol
SDXL does seem to still have capacity to improve with a lot of training
the latest realvis model was a big step up from before
1055x - realvisxlV40_v40Bakedvae
I've generated 1055 images with realvis before i deleted it/banned it, wasn't too happy with its performance overall
here's a good one from realvis, hard to believe it's SDXL sometimes
ah okay, do you happen to know any realistic models that are better?
statistically pony realism is probably the top most used realistic model on civit, my fave is real dream, like in terms of realistic women vs realistic in general is two different questions though
I always wondered why real dream was ranked so highly
is the SDXL version of it also good or is it the pony version in particular
babes by yogi and chromax by savy are also top notch
pony in particular, my favorite SDXL model is called mixtape v2 resonance but it's hard to be a fan of any sdxl model personally
i think what astra did with the custom CLIP L for pony sets it apart from SDXL significantly enough where it's in its own class
its only really considered a separate model because of an early choice Civit made really
no i think they did it because of the difference with the CLIP models, they're not cross compatible so it's important to distinguish the difference technically to be able to make things work logistically
for example you can't use a pony lora on sdxl but you can use an sdxl lora on pony
this is just the weights being heavily changed though
I get that they had to communicate it to people
and the way they thought of saying it was telling people its a different model
for logistical/commercial reasons
@short thicket I'm posting a bunch of images under your model by the way, not sure if you saw the earlier ones of the home decor with the vibrant colors, here's mangler's take on the same prompt and here's FluxFusion on the right.
that's exactly what i'm saying, it goes beyond the weights, it's not just the weights changing, it's the CLIP L changing that really breaks cross compatibility
no other SDXL finetune afaik has come out with a custom clip L. so for example when my queue is generating an SDXL image it knows I can override the CLIP models in the safetensors file with the improved ones but if i try that same strategy for pony models the image just becomes garbage, bc there's only one clip model you can use with pony base and that's the one Astra made.
at one point I asked Astra to help me figure out how to create a remix of zeropoint's vit14-l fine tuned model but he wasn't interested, something like that would require retraining and access to the original training data
lots of SDXL models train the text encoders though
i didn't know that, bc from my experience, overriding the CLIP models has worked reliably for all SDXL models ive used, so while maybe they did train it, maybe it was light training that still allowed it to be compatible with the base model? maybe astra picked a different architecture for his clip L? not involved deep enough to know those nuances but there's something about it for sure
one of the most popular full-rank Dreambooth tutorials from Furkan involves training the text encoders, for example
astra just moved the weights more
the problem is the information that its a different model got spread around a lot
so now its hard to convince people that its just an SDXL finetune
Compact Summary of CLIP L Architectures
ViT-L/14 and ViT-L/16
Vision Transformers with large architectures (14x14 or 16x16 patch sizes). Popular for their robust feature extraction in image-text tasks.
ResNet-50x64 and ResNet-101x64
ResNet-based backbones with wide layers for high-capacity feature extraction, used for detailed multimodal understanding.
ResNet + Transformer Hybrids
Combines ResNet for early-stage vision processing with Transformers for text alignment, offering balanced efficiency and accuracy.
Cross-Attention Enhanced ViT
Adds cross-attention layers for handling dense or long-text tasks, optimizing text-image alignment.
and then when I ask it what arch does SDXL use it says:
SDXL (Stable Diffusion XL) uses ViT-L/14 (Vision Transformer, Large, with 14x14 patch size) for its CLIP L-based text encoder and vision alignment.
and then according to Astra's article:
When training V5/V6, I used a CLIP-based classifier, eventually settling on the ViT-L/14 version of CLIP, which is the largest and last model released by OpenAI.
so there we go that puts that question to rest, it's using the same base architecture, i guess you're right then astra just moved the weights around more
dude i did a git pull on your stuff the other day after I updated my ComfyUI to the latest version to work on mochi... i was getting errors like there's no more sampler rk, you deleted like 2 other nodes, i haven't gone back in to see what the extent of the damage is but you did some major overhauls at some point lately huh?
i was just trying to get that snowshark to work
my snowshark buddy never got to be rendered, remember i was having that all black issue? so i updated my comfy, reran it and still all black, then i did a git pull on your stuff and then it said i was missing 3 nodes
i'd be willing to bet that git pull said there were over 1000 changes lol
i'm not sure, i didn't go into detail but it looked brief to me
anyways i had to abandon your stuff for now bc i'll have to revisit your changes and realign my code to your nodes and it's a lot of maintenance for a moving target like what you're doing so i just went back to shitty ksampler ways
When loading the graph, the following node types were not found
SamplerRK
SD35L_TimestepPatcher
ConditioningZeroAndTruncate
Set Precision Universal
Sigmas Rescale
SharkSampler
ClownSampler
lol like you blew the whole thing out
DPM++ 2SA can get fairly similar results if you up the Eta and S_noise a lot
its worse but not catastrophically worse
@dusky thistle if you could share an updated workflow for mr snow shark with the latest changes you did that would help a lot to re-integrate your samplers into my stuff
something is screwed up if it didn't find any of those
it must not be loading
oh i see so at least you didn't delete those nodes that's good
you didn't delete any of those?
that's the first thing i looked at too, the startup logs to ensure it was indeed loading, i'll restart it real quick and give it a second look
0.1 seconds: E:\ComfyUI_PlusPlus\ComfyUI\custom_nodes\RES4LYF yeah seems to be loading fine
oh weird the nodes are loading now, wonder what that was.... anyways new error @dusky thistle
Prompt outputs failed validation
SharkSampler:
- Return type mismatch between linked nodes: sigmas, LATENT != SIGMAS
- Return type mismatch between linked nodes: latent_image, SIGMAS != LATENT
- Value not in list: sampler_mode: '5' not in ['standard', 'unsample', 'resample']
so what's the equivalent to 5 now?
yeah tons of shit changed
gonna get you a wf in a min
bugfix one sec
this is the mask i used
input images
for the guide and guide_inv
here's one with large
switch to res_2m if you want it faster
@dusky thistle do you have a WF that doesn't use any load image nodes? like just purely text 2 img w/o any masks or anything?
alright i got some output on the shoes
not sure if you remember this file (not a comfyui workflow) but that's basically what i have to do to get your stuff working again, just adjust that spec
I haven't tried the unsampling yet
I'm really glad there is a good workflow for it now cos I tried several times to make an SDXL one over the last year LOL
Flux definitely still needs SDE I went through a lot of generations at different Eta levels on different passes and the layouts are so much nicer with SDE
its most obvious when you go from Eta 0 to even Eta 0.05 there is often a big jump in quality
def... think quality is usually at peak with eta around 0.25 for that model
finding with sd35 higher levels really help with mutations
0.5
I liked the soft scaling a lot
it was good up to 10 but it gets too soft
I had a second pass with 0 eta which helped a lot
yea im gonna add in the ability to schedule that stuff again soon
around 0.25 feels right, I was often choosing between 0.2-0.4
that'll be a second passthrough node so longclown doesn't get too long lol
there's Blepping insane chain sampler for now, it can take a few clown nodes per pass
it lets you change sampler within one pass so like
one sampler node for 0-10 steps
another for 10-20
another for 20-30 etc
not as good as an actual schedule of 30 floats
but better than nothing
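A small sketch of the difference between the two approaches: chaining sampler nodes over step ranges amounts to a piecewise-constant schedule, which can be expanded into the per-step list of floats a real schedule would carry. The function name and the `(start, end, value)` segment format are made up for illustration.

```python
def expand_segments(segments, total_steps):
    """Turn (start, end, value) step ranges into one value per step.

    `end` is exclusive, so (0, 10, 0.5) covers steps 0..9.
    """
    schedule = [None] * total_steps
    for start, end, value in segments:
        for i in range(max(start, 0), min(end, total_steps)):
            schedule[i] = value
    missing = [i for i, v in enumerate(schedule) if v is None]
    if missing:
        raise ValueError(f"steps without a value: {missing}")
    return schedule


# Three chained sampler passes, e.g. eta per pass, as one 30-step schedule.
eta_per_step = expand_segments([(0, 10, 0.5), (10, 20, 0.25), (20, 30, 0.0)], 30)
```

A true schedule of 30 floats can change smoothly every step, while the chained version only changes at the pass boundaries.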
if it keeps it going in and out of all the noise scaling stuff it might be helpful
for that alone
depending on what precision you're at you're gonna lose something doing that over and over
the shit in comfy that is
i gotta say i really like medium
it's def not as smart as large, but it pumps out some stunning images
I didn't like it on launch day but your images since have been great
SD 3.5 does seem better than flux with colours
I should try it some time
I still like SD 1.5 a lot for the diversity
ok thanks
my favourite image models out of anything are sg-minority and its sequel paper MinorityPrompt
they focused on diversity as the main thing
even on the super boring scientific datasets like LSUN-Bedrooms and CelebA, sg-minority was able to make way more interesting images
makes it look more like amateur cellphone footage
CADS was still very competitive in their paper
if we can get CADS for flux it would be amazing
its just noise injection to conditioning vector, its weirdly simple
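A sketch of that noise injection as I understand the CADS recipe: the conditioning vector is blended with Gaussian noise, heavily early in sampling and not at all near the end. The `tau1`, `tau2`, and `noise_scale` defaults are assumed values, `t` runs from 1 (pure noise) down to 0, and the paper's optional rescaling of the noised condition is left out.

```python
import math
import random


def cads_gamma(t, tau1=0.6, tau2=0.9):
    """Annealing weight: 1 keeps the condition intact, 0 replaces it with noise."""
    if t <= tau1:
        return 1.0
    if t >= tau2:
        return 0.0
    return (tau2 - t) / (tau2 - tau1)


def cads_noise_condition(cond, t, noise_scale=0.25, rng=None, tau1=0.6, tau2=0.9):
    """Return a noised copy of `cond` (a list of floats) for timestep t."""
    rng = rng or random.Random()
    gamma = cads_gamma(t, tau1, tau2)
    keep = math.sqrt(gamma)                       # weight on the original condition
    inject = noise_scale * math.sqrt(1.0 - gamma)  # weight on fresh Gaussian noise
    return [keep * c + inject * rng.gauss(0.0, 1.0) for c in cond]
```

Because the noise only touches the conditioning, the sampler itself is unchanged; the call just wraps whatever embedding is passed to the model at each step.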
that doesnt work for flux?
[image prompt in Chinese, garbled in export]
yea that's the one
cannot remember if I actually tried, I thought I did but maybe not
there is a second CADS in latentmegamodifier and that CADS doesn't work
Here is the image you requested.
even troll images are aesthetic now
yeah if it's just noising the conditioning... it wouldn't be hard for me to adapt all my noise sampler stuff for that
I started using https://huggingface.co/MiaoshouAI/Florence-2-large-PromptGen-v2.0 today its Florence 2 fine tune on civit prompts
was really good
I do want to switch away from florence though its falling behind
VLMs are exploding in quality lately
[image prompt in Chinese, partly garbled in export: "draw a ... with an autumn feel, with lots of colorful leaves"]
Here is the image you requested.
guide image
[image prompt in Chinese, partly garbled in export: "draw a ... among lots of autumn colors, with many kinds of leaves"]
oh I forgot to tell you the other big news
someone got flux working in Int4
and its over twice as fast on a 4090
they also got it working in FP4 which isn't faster now
but will be for next year's cards
can't wait for the 5090
just hope they aren't too big to stuff two on the same board
using 46,000 cuda cores to generate a single image is my kinda style
gonna be a lot of doorless cars driving around
Here is the image you requested.
I put the 8 step Flux turbo lora on my image and the quality went up rather than down
[image prompt in Chinese, garbled in export]
SD3.5L Turbo Llama3.2
Olivio Sarikas does a double KSampler 8Step-Flux-Turbo-LoRA workflow ... https://youtu.be/jfbqlSaRIPI
SUPER FLUX Turbo gives you better, faster, more detailed images. This Workflow is built to give you the best images in the fastest time. With the Image Chooser you can get a selection and then render only the best image to a high-res upscale :)
Links from the Video
GET my WORKFLOW here: https://www.patreon.com/posts/super-flux-turbo-11...
I need to try 7 I have been doing 6.3
Awesome! I'm gonna go check them out. The best part of working on these models is seeing what other people make with them.
ballz
That last one is nice!!
Vrrrrm Vrrrrrm
^^ Always good to have extra fingers
Nice, these are good ones.
Using dormand-prince_13s SDE with a solid black image as a latent guide
heyy @dusky thistle when you get a chance do you think you can offer me some guidance on what this error means:
nan_to_num(): argument 'input' (position 1) must be Tensor, not float
i like how you added steps into the sampler itself so no more need for that betascheduling node
i want to debug it further but i don't understand the nature of the issue, checked the positive and model input of the sharksampler assuming position 1 was one of those two but that checks out
yo @dusky thistle , havent been in the sd scene in a while so i havent kept up. last i read, sd3 uses rectified flow which is incompatible with sde samplers. im seeing that you're using it and others -- where's the disconnect in this coming from for me? is your workflow getting around whatever issue prevented RF to be used with SDE somehow or simply using sd3 without rectified flow, etc? from what i recall, it had to do with how ancestral samplers worked or something like that that didnt mesh well with the RF process.
good stuff btw, im seeing more and more people mention you here and there
it turns out it's very much compatible with SDE, but the noise scaling has to be done very differently
it's a lot less tolerant of funky math than the previous non-RF models, it has to be dead on with controlling the variance
ahh okay sweet
i've got a bunch of noise scaling modes working now
i def wanna experiment and play around in that case
know all the parameters of what can and can't be done with it, i think
i think the results are def better than with just ODE
do you have a log of what youve tried so far, what is and isnt good
probly just whatever you mentioned in here right
if you jot them down somewhere id love to go thru them as a read while on a lunch break or something. even things like what you wanna try, thoughts on something you have tried, etc
oh they're pretty incoherent lol
otherwise no worries, im probs going to go thru a bunch of stuff youve already tried so far on my own ventures lol
it'd take forever to really tabulate all that
as far as I can tell the one key thing is making a scaling function to map the s_noise value for each step to a noise amount that is the right size for that step, for that model
the critical thing is calculating three values correctly
@dusky thistle do you think you could help me out and figure out why I'm getting that error? I tried deleting and re-adding the nodes to that workflow just in case any bad value was set and it's still generating the same error
how much noise to add after each step, what to step down to, and how to scale the latent when adding noise
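A sketch of those three values for a rectified-flow model, assuming the latent at noise level `sigma` is `(1 - sigma) * x0 + sigma * noise`. This is one variance-matching choice, similar in spirit to RF ancestral samplers, and not necessarily the math used in the nodes being discussed; the function name and the `eta` interpolation are assumptions.

```python
import math


def rf_ancestral_coeffs(sigma, sigma_next, eta=0.5):
    """Return (sigma_down, latent_scale, noise_amount) for one RF ancestral step.

    Step deterministically from sigma to sigma_down, multiply the latent by
    latent_scale, then add noise_amount * gaussian noise to land on sigma_next.
    """
    if sigma_next == 0.0:
        return 0.0, 1.0, 0.0  # final step: nothing to re-noise
    # eta = 0 gives sigma_down = sigma_next (pure ODE step, no added noise).
    sigma_down = sigma_next * (1.0 + (sigma_next / sigma - 1.0) * eta)
    alpha_next = 1.0 - sigma_next   # signal weight wanted after the step
    alpha_down = 1.0 - sigma_down   # signal weight after the deterministic part
    latent_scale = alpha_next / alpha_down
    # Remaining noise variance needed to reach sigma_next exactly.
    leftover = sigma_next ** 2 - (sigma_down * latent_scale) ** 2
    noise_amount = math.sqrt(max(leftover, 0.0))
    return sigma_down, latent_scale, noise_amount
```

If any of the three is off, the latent ends up with the wrong signal-to-noise ratio for `sigma_next`, which fits the point above that RF models need the variance controlled exactly.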
ah, you're using pony, that might be the issue
see if it works with sd35
i haven't checked on the sdxl side of things in a while... i'll get to that at some point though
have you tested it with flux?
I just tried the same workflow I gave you that doesn't work and switched it to sd3.5 and now I get a different error
CLIPTextEncode
'NoneType' object has no attribute 'float'
That's my fault, the WF is built for pony so i can't just swap out the model, making adjustments just to see if i can make it work
here's a basic one that works
i'd start from this
cool thanks that helps
when you said that 'works' you mean it works for SD3 right? i was thinking you were giving me a WF that works for sdxl
yep works for RF
if you git pull this will work
fixed the bug just now
with sdxl that is
great thanks, im going to also test sd1.5 and pony make sure it's all in the up and up ill let you know if anything else comes up
btw does the new denoise parameter in the shark sampler behave the same way as the one in ksampler?
yep
That's way better than having to use that rescale node and having to code in for 14.5 or whatever that number was
@dusky thistle I noticed the error with SDXL comes with the truncate conditioning field, I'm going to set it to always false, is there a good reason I should ever set it to true?
it can help a lot with images going to shit in SD35
if the prompts are > 72 tokens in SD35, sometimes they really go to crap
interesting so i mean i could in theory leave it false and then just manually truncate my prompts to 72 tokens... i thought it was 77 not 72?
i'm seein eta of 0.5 and res_3s doesn't play well with pony models
res_3s with eta of 0 also fails, im going to try res_2m with eta of 0 since i know that used to work great.....
still failing, maybe it doesn't like the scheduler as linear quadratic? gonna try normal and karras see if any of those two fix it.... yep that was it
@dusky thistle if i had to pick between other more compatible schedulers which one would you pick for higher quality and/or higher compatibility karras vs normal for res_3s?
btw just a little side fun fact I made a little checkbox for your special sampler, so when I generate an image using ksampler, or sharksampler and I see I don't like the way it came out I can press a button and I get this popup. the first button deletes the image that came out in case it's real bad, the second box will toggle between new workflow (sharksampler or box toggled off) and old workflow (ksampler or box toggled on) and then I can select how many images I want to generate if I want to change the batch size for it
if you're using sdxl i recommend karras or exponential
or sgm_uniform
which one is the best of those 3?
best in terms of quality regardless of speed
or wait how about this, what scheduler is the best one to pick for something that'll work across all models really well? that way I don't have to dynamically change it per model just stick to one and let it do its thing, that means one scheduler that's compatible and works great for sd15, sdxl, pony, sd3, flux
I'm trying 'normal' for all 5 base models and it looks great for all except Flux that generated garbage
it's really a good idea to use different ones for RF vs sdxl etc
what's best will depend on what you're generating, you'll have to play with it
beta and linear quad are really good with RF
not so much with sdxl
right but i don't want to introduce complexity if not needed, are you saying there isn't a scheduler that works across all models?
tell you what i'll just try 'beta' on sd15,sdxl,pony and if it works i'll just use that across all 5 models, now is there an rk_type that works really well with beta or is res_3s the recommended choice for beta?
hard to believe this is sd1.5 sometimes, this is using beta, the limbs are a little wonky
if you use shift then beta can work across all models
of course this isn't actually using beta in all cases because of the shift but for the GUI it is
i thought shift only applied to sd3/flux
yeah so you take the shift off when you use SD 1.5 and SDXL etc
but you can keep the beta schedule
i used beta across all models w/o shift and it worked fine btw
yeah that's what i did exactly
so my final global settings for all 5 models (sd1.5, sdxl, pony, sd3.5, flux)
rk_type - res_3s
scheduler - beta
noise_mode - hard
eta - 0.5
eta_var - 0
and i think that's pretty much the major settings that actually matter afaik
would strongly advise using a high shift, or something equivalent, for SD3.5 and flux
you don't have to but the difference is big
the reason is that Flux decides the layout of the image by the time it hits sigma 0.9-0.8 (it goes from 1 to 0)
look at this example, you can see the first pass is almost done in terms of layout
but that pass ended at sigma 0.8 lol
so you just need to make sure the model has a decent amount of steps in between sigma 1 and 0.8, or even sigma 1 and 0.9
see in this image its 40 steps in that range
I often plot people's flux images and there's like 2 steps in that range
so the model has only 2 steps to do all of the layout which is pretty rough for the model
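A quick way to see the effect, assuming the SD3-style time shift `sigma' = shift * sigma / (1 + (shift - 1) * sigma)` applied to a plain linear schedule. How Flux's `base_shift` and `max_shift` map onto a single shift value is not covered here, and the helper names are made up for illustration.

```python
def linear_sigmas(steps):
    """Evenly spaced sigmas from 1.0 down to 0.0 (steps + 1 values)."""
    return [1.0 - i / steps for i in range(steps + 1)]


def apply_shift(sigma, shift):
    """SD3-style time shift; shift > 1 pushes sigmas toward 1 (more noise)."""
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)


def steps_in_layout_range(sigmas, low=0.8):
    """Count steps that start above `low`, i.e. steps spent deciding layout."""
    return sum(1 for s in sigmas[:-1] if s > low)


plain = linear_sigmas(20)
shifted = [apply_shift(s, 3.0) for s in plain]
plain_count = steps_in_layout_range(plain)
shifted_count = steps_in_layout_range(shifted)
```

With the same total step count, the shifted schedule spends more of its steps above sigma 0.8, which is the range where the layout gets decided.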
i'm open to try it, let me review my current settings....
for flux I'm using ModelSamplingFluxNode and the settings are
```json
{
  "max_shift": 1.15,
  "base_shift": 0.5
}
```
yikes and this is a good time to realize i'm not using the exponential shift one for sd3
is there an all in one I can use for both or do they both require their own node?
for SD3.5 I could use that SD35L_TimestepPatcher node and just set it to exponential / 3.0 like how Clown does it. Do let me know if there's a better way
clip_g gets a full 77 tokens. clip_l and t5xxl share 77 tokens.
that's what i had thought would be the cutoff
what's weird though is that once you hit 73 tokens, truncating the conditioning embed changes the output
i haven't had a chance to look into wtf is going on
all i can say for sure is 72 or less isn't gonna go over whatever soft limit there might be (which i think is primarily an issue with insufficient training on longer embeds, so it's also gonna vary on what you're prompting for if that's the case)
it's pretty clear in the actual network diagram
it is
lol idk why it's the case, but it is
it may be a bug
idk
all i did was truncate the embed to the same length as one from an empty prompt
72 tokens, truncated/non truncated are the same, 73 they're not
havent looked into it any further
might be. if you track it down, and it turns out it is, please let @lavish osprey know what you found
i do remember reading that longclip study article and it was talking about how the 77 tokens is actually 20 effective tokens
https://arxiv.org/abs/2403.15378
Despite its widespread adoption, a significant limitation of CLIP lies in the inadequate length of text input. The length of the text token is restricted to 77, and an empirical study shows the actual effective length is even less than 20.
that was published back in march, though - i wonder if any advances have been made since then
to be fair it was revised in July that was only 4 months ago
[Submitted on 22 Mar 2024 (v1), last revised 22 Jul 2024 (this version, v3)]
I think it's interesting how clip L only has 20 effective tokens, that's why i'm a big fan of longclip that extends it to 248 effective tokens
i'm curious if they released any papers on the actual long_clip research they proposed in that one
We address the challenge of representing long captions in vision-language models, such as CLIP. By design these models are limited by fixed, absolute positional encodings, restricting inputs to a maximum of 77 tokens and hindering performance on tasks requiring longer descriptions. Although recent work has attempted to overcome this limit, their...
weird world where papers are referencing reddit posts
ROFL!!!! oh my
@dusky thistle @bitter hearth https://github.com/kohya-ss/sd-scripts/pull/1768 from Dango (lead programmer for sd3.5 medium)
from the conclusion:
5 Conclusion
We have propose Long-CLIP, a strong and flexible CLIP model with long-text
capability. Our model can support text inputs of up to 248 tokens, and can better
capture the detailed attributes, obtaining a large improvement on retrieval task.
Moreover, our model keeps the performance on zero-shot classification and can
replace the CLIP encoder in a plug-and-play manner in image generation task.
They also have this table in the PDF that shows the effective token count, I think it lends to what @dusky thistle was seeing where he's setting it to 72 rather than 77, in some instances i see values like 69,70,71
@bitter hearth you never told me what you recommend for the shift settings that you were talking about
what I like to do is have one pass to set the layout and a second pass to finish
then if the layout is too poor you can just raise steps in the first ksampler
or the opposite, sometimes
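A sketch of that two-pass setup on a list of sigmas: split the schedule where the layout is considered done and hand each half to its own sampler pass. The 0.8 boundary and the function name are assumptions; both halves share the boundary sigma so the second pass resumes where the first stopped.

```python
def split_sigmas(sigmas, boundary=0.8):
    """Split a descending sigma schedule into (layout_pass, finish_pass)."""
    for i, s in enumerate(sigmas):
        if s <= boundary:
            # The boundary value ends the first pass and starts the second.
            return sigmas[: i + 1], sigmas[i:]
    return list(sigmas), []  # schedule never drops to the boundary


layout_pass, finish_pass = split_sigmas([1.0, 0.9, 0.8, 0.5, 0.0])
```

Raising the step count of the first pass then only means building a denser schedule between 1.0 and the boundary, without touching the finishing pass.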
take a look at Dango's timestep fix code i pinged you on above
here are some fun SD scripts https://github.com/kohya-ss/sd-scripts
LOL yeah same topic
did you look at his code?
no I'm not really into scripts like kohya or simple tuner
been working on some personal pytorch scripts since like february
at least read through the comments
okay will take a look thanks
wow that's some nice looking anime
would be cool if more anime generations were that sort of thing
yeah i love the shading style on that
simply put I'm just asking what your default/starting values are that you recommend before I adjust anything
for flux i have max shift of 1.15 and base shift of 0.5 and for sd3 i have exponential 3 shift
if you want a default I would default to the comfy default, which comes from the BFL reference code
if I remember rightly its 1.15 and 0.5 as you say
not sure for sd 3.5
this started bc you said, and i quote, "would strongly advise using a high shift, or something equivalent, for SD3.5 and flux", so that's what I'm asking. what is a high shift value i can use as a default for both nodes?
oh I see, I would say 1.5, for flux
for both base shift and max shift
is a pretty reliable high shift
wow interesting 1.5 for both base and max? ok ill try it
i 100% agree. i'm really tired of all the anime being some version of "girl with large chunks of meat on the top that are so heavy she should have major back problems"
my actual overall recommendation is to do custom sigmas but it seems unpopular to do custom sigmas
I know very little about anime but it probably has some very interesting artstyles yeah
personally i'm really sick of Flux's default anime style, it's so off-putting now, the way it makes anatomically correct heads with these weird large eyes
shift 3 for sd3.5
ah ok thanks
alright so exponential shift 3 is the sweet spot for sd3.5 and 1.5 for base/max for flux, ill try it
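For reference, here is a rough sketch of what those shift numbers do to a sigma in the 0..1 range. The formulas follow my understanding of ComfyUI's SD3 and Flux model-sampling nodes and the BFL reference code; the function names are made up for illustration, so treat this as an assumption rather than the exact node code.

```python
import math


def sd3_shift(sigma: float, shift: float = 3.0) -> float:
    """SD3-style shift: a larger shift keeps the schedule at high noise for longer."""
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)


def flux_mu(width: int, height: int, base_shift: float = 0.5, max_shift: float = 1.15) -> float:
    """Interpolate mu from base_shift (256 image tokens) to max_shift (4096 tokens)."""
    # Assumes an 8x VAE downscale and 2x2 patches, so 1024x1024 gives 4096 tokens.
    seq_len = (width * height) / (8 * 8 * 2 * 2)
    slope = (max_shift - base_shift) / (4096 - 256)
    return base_shift + slope * (seq_len - 256)


def flux_shift(sigma: float, mu: float) -> float:
    """Flux-style exponential shift with the exponent fixed at 1."""
    if sigma <= 0.0:
        return 0.0
    return math.exp(mu) / (math.exp(mu) + (1.0 / sigma - 1.0))


if __name__ == "__main__":
    mid = 0.5
    print("sd3, shift 3:", sd3_shift(mid, 3.0))
    print("flux default at 1024x1024:", flux_shift(mid, flux_mu(1024, 1024)))
    print("flux base=max=1.5:", flux_shift(mid, flux_mu(1024, 1024, 1.5, 1.5)))
```

With base shift and max shift both set to 1.5, mu no longer depends on resolution, which is why it behaves as a uniformly high shift.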
yes, well, that's what happens when you scrape midjourney and stuff all the images into your model to mask the problems it has
I did a lot of image stuff in the last few days but it was all in the new Itercomp thing so there were no shift shenanigans
for sd3.5, yes. for flux, usually i wind up with 2, but sometimes 3
I found something funny
if you stack enough DPO loras on SDXL it starts to look extremely flux-y
i did tell you that flux is a huge lora, right? what all have I been saying about flux, and being argued with by everyone, all this time?
I never quite understand what you mean when you say its a lora
someone, FINALLY, earlier on reddit admitted that when they try to do images of people laying on the ground with flux it has the same issue that sd-2b-medium had with warping the body
it can only do specific things in a very tight range. just like a lora only has a very limited, tight range, of information
oh in that case I totally agree
it's massively overfit for women, dogs, fantasy images, and anime catgirls. it's so overfit that's all it can do with out a huge battle. and after it was stuffed full of that to mask the warping issues that the core network causes (rather than fixing it right) then they not only distilled it, they dpo'd it
it's a great tool, if that's what you want.
if you want something else though, it will put on armor, pull out lasers, and battle you to the death
ok I think I understand, when you are saying its a lora and its frozen you are saying its overfit
I agree with that yeah
no, i'm saying it's overfit when i say it's overfit. without all that stuffing it would probably be about an 8billion parameter model. it's 12 billion
hey @bitter hearth do you know of all the rk_types in the sampler nodes which one is the fastest?
Res_2m with iter=0
where's the iter field? i don't see it in shark or clown sampler nodes
All models are wrong, but some are useful
the names keep changing so its hard to know
might be implicit steps
might ping shark and ask him what he did with it
implicit steps I think
got it, yeah that's set to 0, so res_2m is the fastest, i'll try that. using res_3s, and while the output quality is superb my rendering times are now like 50-100 seconds for pony images whereas before it used to be 15-40 seconds
to save you the time, I tried a stupidly high amount once already and its not worth it
was like 1000-2000 steps with implicit steps on 8
so over 10,000 function calls
really not worth it
you do like to torture your machine, don't you?
yeah its the fun part
I did a big test this week of flux with CFG or without CFG and I preferred CFG
although I can understand why people say it breaks the model it loses some of its base style
here, run this: sampler: deis, scheduler: linear_quadratic, cfg: 5 Steps: 35, Prompt: various types of fruit arranged in a crystal bowl sitting on a table. next to it are scattered flower petals and leaves. the background is a spherical goldishgradient. back lit, hard-rim lighting
use sd3.5 medium
and once you run that, then turn on the SkipLayerGuidanceSD3 node and skip layers: 22,23,24 scale: 1
and run that
ok thanks will try tomorrow
Deis is really really good I am a big fan of it
:) i like all of them, i just turn them into pretzels till they do what i want
I want to make my own set I've just been procrastinating it
there's a vast amount of things you can choose from to put in a sampler
too many projects, too many toys
yeah for sure and this is just one small area of ML
for me to do yet more samplers wouldn't be that useful I guess as there are so so many
working on some FP4 kernels for Triton and Tensor RT would actually be useful but very tricky and dry
RTX 5090 is gonna get FP4 speed boost
not real sure about that. which area of ML are you most interested in?
mostly statistical models but out of the more fun stuff, LLM agents
although its a bit early for proper agents there are some janky versions out there
you can do a fair bit in comfyui even if you chain Ollama nodes and vision LLM nodes
then there might not be nearly enough samplers, or schedulers - you're not talking about compvis
yeah that's true
the reason I think a focus on Triton and Tensor RT is good is that it speeds up every model you run
whether its image or LLM
quite a few companies, like Fal, for example, their competitive advantage is things like hand-written kernels for things like Triton and Tensor RT
@bitter hearth can i speed up image gen on my 4070 with any tensor rt tech for comfyui?
@bitter hearth https://youtu.be/43U_ADJmCoc?si=HCXTz6TaX4_b2OZ3
Artificial intelligence is here, but we are still guessing what its future holds. Hollywood has been imagining the impact AI might have on our lives for decades, but how accurate are these portrayals?
AI researcher Beth Singler is assistant professor in digital religion(s) at the University of Zurich, Switzerland, and a lifelong sci-fi fan. Thi...
thats good entertainment, i like the female cyborg
but i wouldn't agree with anyone who claims ai can ever achieve consciousness
it can push the limits of neural activity but not consciousness
Ai will be the most sophisticated apparatus within the physical domain of reality down to the very acute level of quarks and stuff, and as far as mind body interactions can go
garland is a crazy man with science fiction. devs is a good show too, about a company developing a denoising algorithm
humans will most likely thoroughly enjoy loving AI female cyborgs intimately, but very well knowing they are void of human emotions, they can translate your emotions very well but unable to experience it on their own
woman will get on board the first. they already are in hand held form
you talking about toys?
its impossible to feel the same level of connection with cyborgs as you will with real human girl
pff. over rated. most guys are going to go for the easy stroke.
sure but still that's not the same degree of experience
the ai will be like "you're so great you're the best" and that's all they want
there is something deeply related to mental connection with real human girl that cyborg can't offer
the kids growing up with all the easy access will lower standards significantly
lowering cognitive experience is not a scope
you will have a broader scope to augment
the shift has already happened once before. we were an animal husbandry race of people. the bond with your horse was sacred.
cars are just better at it. measured in literal horsepower per car
you still can find horse girls though
not the same
even if you give examples, which btw is real, who will enjoy and prefer that type of experience, but thats still not the same
first gen sex bots for sure will be like the tbucket.
then the enzo ferrari of sex bots will enter the stage
you know im very sure there are people who are very much driven by materialistic aspect of their lives, so i know what you mean
but that doesn't objectively imply they have as much richer experience as someone with high cognitive sensitivity
yeah but wait until you see ludicrous mode
ludicrous mode?
whatever the ludicrous speed equivalent is
never heard of such attribute
ahh ok so thats easily possible when you understand that AI can have the collective brain capacity of all humans
to reference space balls, tesla updated the cars with a new launch mode that was faster than insane launch mode. called it ludicrous
ludicrous launch is when it goes 0-60 in /
that can happen by simply increasing bandwidth for data
theoretically what LLM of today is capable of compared to human intelligence you can enhance that with more memory capacity to compute
i've been trying some LLMs lately. it's better to get models that you need to jailbreak, because models that are pre jailbreak are all sexbots basically
it's a very popular scene
censorship sucks
i wish we had llm bots that are open to explicit discussions as long as they dont impose any harm on anyone
but what we have now with explicit content is some nonsense moral standards of humans
yeah. but you see, you're talking about safety. but that's what they're trying to do and it's categorized as censorship. there are blurred lines that are difficult to define here
dont censor that part of human primal desire
what i know is that the decensored models are basically destroyed for regular knowledge. they're just aol roleplayers
or they will create another sd 3
i havent come across any llm that can discuss intimate topics openly
they should at least allow the kind of interchange you would expect between couples
anything that the sillytavern community talk about in their weekly thread is a very "intimate" orientated model
but you know .. there is a risk involved
people can get suicidal not cause ai can be so intimate but cause of underlying societal scenarios
i downloaded one of the llama 3.1 models thinking they wouldn't have been able to turn those into a very intimate one, but it's honestly all the model wants to talk about. If i use one of these jailbroken ones instead of the base model and giving it a jailbreak prompt. i'm trying to find a model that wont give safety warnings all the time, but also won't bring up the ol push an shove at every opportunity
the roleplay crowd has got llm training figured out
well its not a limitations of any kind from AI's part, but its human imposed safety net
yeah. an there's a lot of research into disabling the net post training. in better non destructive ways than what i've found
the critical point of this whole thing is society's religious mindset to a great degree, this outdated fear based faith is hindering developments
society is not yet ready for polyamorous love cause they think its sinful and immoral
yet they will have you accept that love is unconditional
talk to a religious person about jesus cherry picking a girl
logically and as a human being with principle its not a difficult idea to love all girls
not just that but a lot of taxation and property rights are woven into the idea of the atomic family
to the point that bigamy is tax evasion
economy and human labor are intricately associated with it as well, and gives rise to the arbitrary concept of jealousy and cheating
unless there is mass scale automation to replace all human labor meaningfully and with greater efficiency, society will not overcome those mindset of jealousy
here is the world we live in today, ... you only love one person, you are so moral and ideal, .... you love many people, you are a degenerate
and if you promote polyamory w/o removing those other factors of human labor, economic servitude you would actually push them towards unhealthy bonding
draw a paperwall with a hello kitty
finally got things pretty close to being stable and able to be forked into a dev and stable branch
lots of nice simple workflows up there with input images included
the prompt
How's medium vs large?
different
both are great
i'm usually generating most of my stuff with medium lately
thank you. I'll definitely have to try both. I'll also try to create loras for both and see which works better. So far my large lora attempt is not turning out as well as my flux ones <cries>
You might wanna check out RES4LYF again, big update tonight, it's getting really good results with RF and fast
Bunch of nice simple workflows on the repo now
Getting waaay easier to use lol
Your github page is awesome!!! I usually stick with civit, only because they actually show images and examples (whereas HF doesn't easily really)
3.5 medium
We not using res-2 no mo?
sd3.5 (large) is 100% trainable for loras. flux is only about 70% trainable.
Anyone know why we have not yet seen an explosion of SD3.5 checkpoints like Juggernaut etc? Too early? Are they in training or what?
not really many loras either; I figure people don't realize how trainable sd3.5 is after trying and having abysmal results with flux
Some in context lora creation
hello everyone, I am attempting to run stable diffusion 2.1 via https://github.com/Stability-AI/stablediffusion. I did not find the version 3, and I have the BFloat Error, can I have some help, please ??
I suggest you run them via an interface: forge webui or comfyui
There are plenty of tutorials on how to set them up, choose whichever you like; however, sd3.5 seems not to be supported in forge yet
yeah the tensor rt nodes should work out of the box even
however vram limitations are there for your GPU
you may be able to do sd 1.5 but not flux, for example
for tensor rt
i tried it last night actually
the tensor rt loader actually has flux-dev and flux-s in the dropdown so it should work in theory
i actually tried it, i converted a flux model but i couldn't get it to work, kept getting an error
fairly sure the flux one requires 80GB to make and then 40GB GPU to run
I've been using them on A/H100 I don't think they are for 24GB
for SD 1.5 you will be ok
wdym? the files i was making were smaller than the input
to make flux tensor rt engine?
it was working fine on my 8gb vram setup for pony
yeah for SDXL
yeah i was able to make the files for sdxl/pony/flux just fine
you made a tensor rt engine for flux on an 8gb vram GPU?
it was just figuring out the workflow to get it to run, kept getting a shape mismatch error
yes
hmm okay thanks I read this requires super high vram
the STOIQ model
anyways its a fun idea and its quaint but its not practical
the lack of lora support is kind of a deal breaker for me
however if i can get it working for flux I'd love to convert a few of the better models that don't always need loras to this tensor rt format and see how much faster it is. considering some images take 15 minutes to render I'd love any kind of speedup i can get
I have a supercomputer with 6 nodes 128Gb of RAM, and 6 PNY RTX 4000 ADA. Do you do distributed run ?? Frankly my objective is the integration with my own RAG to build 3D object and 3D character into Unity. LocalAI work, but right now because there is no easy distribution I am limited to 20 GRAM per node.
swarmUI is good for distributed inference
yeah that's not practical though
notice how i made 3 variations of the PVC model, i was playing with merging loras into the model to verify that indeed does work
its annoying to have to build the engine yeah
in some ways FP8 matmul plus torch.compile is a better compromise
especially since the latest pytorch speeds up torch.compile
lack of dynamic lora support let's say makes it a deal breaker for any sdxl/pony model i'd use but it's still a viable solution especially for SD3 where there's no loras and Flux where some models dont really need any loras
i was going to investigate it further and find a WF that works with flux + tensorrt see if i can get it to render something in comfy
have you managed to run the flux tensorrt engine?
or just build it
I'm just confused cos here and on comfy server people were saying 4090s could not do it. festivalman (10/10/2024 19:42): "how did you covert flux to be tensorRT? Every time I've tried it, it says OOM on my 4090."
did you test the image to 3d version ?? I have instantmesh that work, but it is taking all the RAM
Frankly, no, they are not ready, the face of a 3D character from LocalAI does not appear, but I still need to implement it for my Game
okay I'm afraid I don't have experience with the 3D ML models as I have ignored them so far
Thanks
I would ask in blender discord and UE5 discord if I were you
weird bc i converted it just fine on my 4070. tell you what ill convert it again just to make sure i have a solid base
I like Unity a lot, its still good
now that you mention it i do remember trying to convert flux dev destilled and that one did give me OOM issues
maybe I wasted money on 80GB server for this lol
yeah I use sites like that
$0.70 or so for an A100 80GB is my favourite
but sometimes $0.80 for L40s 48GB
Frankly NeonNinjaAstroo, if you invested into a A100 or something, you are not the only one to think like this, so don't worry too much
i never rented any machine that large, I'm a big fan of the 3090 and the A1000
3090 is good one to rent still yeah
my biggest spend was renting multiple A1000s like 4 or 5x concurrently and have them rendering for me pony/sdxl images while they all get streamed back to my main server in real time, it was quite the sight lol
0,70$, where do you find these price ??
I see 3090 for $0.30 often
yeah in terms of value, unless you need more vram that 3090 on the community cloud is the sweet spot
ii usually pay 12 cents for the A1000
thats 16gb for 12 cents, at most i pay 14 cents
the reason I avoid the smaller ones is I had too much trouble with them
I have it set up so I can use the spot instances so if someone kicks me out then i have a script that attempts to outbid them in an effort to regain control of the machine and then it'll keep retrying up to a point where it makes financial sense mathematically and then it'll kill the instance and stop trying lol
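A minimal sketch of that kind of rebid logic, assuming the cutoff for "financial sense" is the on-demand price of the same machine. The function name, the 1 cent step and the attempt cap are hypothetical, and no provider API is called here; it only works out which bids would be worth trying.

```python
def rebid_plan(start_bid, on_demand_price, step=0.01, max_attempts=10):
    """Return the bids to try, in order, after being outbid on a spot instance.

    Stops as soon as the next bid would reach the on-demand price (assumed
    cutoff: at that point the spot instance no longer saves money) or after
    max_attempts bids, whichever comes first.
    """
    bids = []
    bid = start_bid
    for _ in range(max_attempts):
        bid = round(bid + step, 4)  # round to avoid float drift in prices
        if bid >= on_demand_price:
            break
        bids.append(bid)
    return bids


if __name__ == "__main__":
    # Outbid at $0.12/hr while on-demand costs $0.15/hr.
    for bid in rebid_plan(0.12, 0.15):
        print(f"retry with bid ${bid:.2f}/hr")
    print("no cheaper bid left: terminate the instance and stop trying")
```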
haha
12 to 14 cents is only attainable if you're willing to do spot instances though, like if that makes sense for your use case
I found the interruptible ones were not always cheaper, which is weird
ive never seen that, for me its always cheaper
this was on vast I think
bear in mind I use the verified data center ones not the community ones, cos of security
i dont care i just want the cheapest, security is of no consequence to me lol
i have found that there's like a mini scam on runpod not sure if you've picked up on it
what's that
sometimes i'll get a machine assigned (like 1 out of 20) that has a broken GPU
like some sort of mismatch or invalid state the GPU is in with the dockerized instance
# Step 0: Check if CUDA is available before proceeding
if ! python -c "import torch; print(torch.cuda.is_available())" | grep -q "True"; then
    echo "CUDA is not available. Proceeding with termination process."
    # (the instance termination steps were not included in the pasted snippet)
fi
I had to put this at the top of my deploy script so I can check that first and if the GPU is busted on it I just terminate the instance
there are some weird instances out there
it feels like a scam bc i can't report it, there's no mods or admin or contact person, so they get to just go back on the market and keep eating people's money with no consequence
there's loads of those around yeah
not necessarily a deliberate scam just a mistake
yeah i can imagine that, like the dude just set it up, something broken and the dude still sees money coming in so as far as hes concerned it's still 'working'
there are datacenter ones like that too
the providers like runpod, vast and salad don't do as many checks as they make out
that sucks, so ill stick to runpod then, ive already got tons of code written to integrate with them, really enjoy their graphql API and other hooks they provide, plus ive looked around and you cant beat runpod's prices
even with like a sign up discount from other providers like massed, even after a bonus credit or whatever they give you, its still more expensive than runpod
here's the tensor rt workflow i used to make the working pony models
I don't actually know why people use anything other than vast, given that vast is much cheaper than any of the others
the reliability is around the same across all of these
Anybody rate Flux Colossus?
so far haven't seen a flux checkpoint that had better image quality than dev
you mean you dont think any finetunes are better than the base model?
for flux yeah I think that
there are loras that refocus the model on a particular style or subject
but I haven't seen one that actually raises the overall image quality
whereas with SDXL or SD 1.5, Realvis or Jugger are way higher image quality than SD base
I think there's a lot of good finetunes, this is the set I use, stuff like Fluximate, Pixelwave and STOIQ really stand out in terms of being better than the base model, that flux.dev dedistilled is in another league of it's own, not only is better than base model but it's almost on par with Pro imo
i think flux already had lots of high quality aesthetic tuning so its hard to improve performance, except in artistic styles.
Its actually the same with modern open source llms(llama3, gemma2, mistral), there is not really any general finetune that improves performance.
I've tried all of those except Raemu
I'm not rly convinced
they are good because flux base is good but I am not sure they are better
raemu is alright, I'm not going to sing its praises but it does not disappoint either
if you want something demonstrably better than flux-dev in every way, in every comparison test, try dedistilled. like i said, just scientifically, empirically that model is better than base model without a shadow of a doubt
like if i have a complex prompt and I dont want to mess around, i just want it right on the first shot, and I dont mind waiting for the 60 steps to finish ill pull out that distilled model
i use ksampler on every model too, I just tweak it to 1 to 1.8 for distilled models and give it free range for dedistilled models (like mangled, fluxbooru and flux-dev-distill)
ah yeah that would work
with some tonemap, threshold or skimmed cfg can go a bit higher
to follow up with our earlier chat I managed to convert the SDXL version of the STOIQ model not the flux version, the flux version is in the unet folder not in my checkpoints folder so thats why i got confused, bottom line is you're right i don't think i have the memory to convert it
i just rescale 3 to 10 to 1 to 1.8 and it's been working pretty good
rescale CFG is fine yeah it looks kinda similar to tonemap
just tried converting sd3.5 to tensorrt and that doesn't work either, and thinking it through the idea of using a rented machine wont work bc the engine is built on the GPU it's made with so it's not like I could go into a rented H100 and pick 4070 from a dropdown when building the sd3.5 or flux model
I am not sure now
a lot of people were saying it requires high VRAM but I am not sure where the line is
yeah i think they were right, again my earlier test was flawed, it was just an sdxl model, i'm getting OOM when converting flux or sd3.5
but using a different machine other than yours would generate an engine that's not compatible with your GPU
yeah this is an issue
I don't know the solution for home users
other than potentially to get an RTX 5090 or A6000 48GB at home
which is expensive
lol i'm on a laptop so i couldn't just get a nicer card, I'd have to get a enclosure or a full pc
I'm also using laptop yeah
but one without a GPU even
I have a PC in some state of repair
Ultra? What's that?
hey @sage burrow nice to see you're back, you were gone for a while, life am i right?
Do sd 3.5 large loras work on medium?
i don't think so, i tried it myself bc i wasn't gonna take no for an answer despite it being common sense, they're just different base weights so they can't be compatible but comfyui was complaining about shape mismatch so there's probably more to it than just weight alignment
Okay, thanks
Ukiyo-e painting with Gustav Klimt influences. Beautiful fairytale princess discovers a big shiny, golden compass in dense, dark forest, gnarly trees, lush green vines and colorful flowers, full of magic and mystery. Dwarves watch in amazement. Sparkling light floats in the air, adding a sense of mystery and fantasy.
Still hiking season lol
I just made a medium lora which worked on large GGUF, I think lol
If you have a link I could try it and confirm
flux dev loras work on schnell but 3.5 large loras don't work on medium
@dusky thistle your stuff is almost like a bitter sweet victory getting it integrated bc it's cool that i have available this high quality sampler that's much better at not generating artifacts but it's like 10x slower than ksampler for sdxl images and like 9x slower for flux. that is, a flux image typically takes me 80 to 120 seconds now averaging 1000 seconds, sdxl typically takes 15 to 40 now averaging 120-140. that's way too much
just as a little side fun fact this image took me 40 minutes to generate using flux dev dedistilled + sharksampler @ 60 steps
if i were to make a suggestion as to the next set of changes for clownshark/sharksampler is to focus on an rk_type that's fast above all else no matter what sacrifice in quality, try to get something that'll render in the same time or within 1-3 seconds of what ksampler can do
depends what settings you're using, lol
it's just as fast as euler in ksampler if you use res_2m
i tried that
when did you last update?
i'm using res_3s and then I think NeonNinja told me res_2m should be just as fast but i personally didn't observe any performance improvements, imo it seemed just as slow at 10x
i've reworked shit bigtime in the last 24 hours
yea that doesn't make any sense
res_2m is def just as fast as euler
tested it a few hours ago again
i can try it again for sure maybe my tests were flawed or inconclusive
im afraid to do a git pull
i'd hate to deal with any breaking changes again
did you remove any nodes, add any fields or otherwise changed any paradigms?
yeah it's way better
check the github page
there's workflows on the main page now that are up to date
you can see how much cleaner it is
lol oh no so you did change the workflows?
lol no way dude i'm staying away ill just try res_2m for now and be happy, last time i had to spend like 4 hours re-integrating all your stuff back in
im not ready to commit a few hours to figure out what changed right now
look how nice and simple that is now
resizes all your images and latents and masks for you
converts everything to latents and masks internally
no need for a clown and shark node anymore
yeah i see you added shift and base shift internally
and no need to deal with zero out nodes
or even negative conditioning at all it'll do it for you
ive got 1 more image waiting to render and then ill switch to sdxl and res_2m and give that a shot and confirm if it's within ksampler's times. personally i didn't mind the way it worked i just set truncate to false and i can just leave negative conditioning blank for flux or actually use it for sdxl
this is what txt2img looks like now
simple as f
actually a simpler WF than using ksampler
oh no, so you got rid of clownsampler and the 'sampler' pipeline, that's gonna be a mess to recode
i've no clue i think mines is on a specific branch so it doesn't go to the new UI
bc whenever i do a git pull im still stuck in the legacy UI
@dusky thistle 212 seconds for the first image using 6 loras, 107 seconds for the second image using 1x lora, same model so the weights already loaded, that's a far cry from 15 to 40 seconds using ksampler, like i said even res_2m is 10x slower than ksampler. I think you should really take some time to focus on optimizations for the next commit with the goal of really being on par with ksampler's speed
you should get to the current version first
88 seconds on the third image, 89 seconds on the 4th image, if anything that's 2x slower considering some images take 40 seconds sometimes, i could do a side by side comparison of like exactly the same generation with ksampler and with your sampler but i think the point is that for sure there's no doubt the speed isn't on par with ksampler
whatever speed issues you may have, may already be fixed
i'm not seeing any difference in speed, nor is anyone else that i'm aware of
so with the latest version you already did a side by side comparison, with ksampler and with your new WF and benchmarked the times?
12 seconds with euler in ksampler, 13 seconds with clownsharKsampler
alright that motivates me to do yet another reintegration, i just dont have the time for it right this moment but ill def try it in a few
cool stuff man keep up the good work
it's possible your shit slows down a bit from having to generate the noise after each step, idk
i just tested with noise generation off and it's actually a tiny bit faster over a couple runs for whatever reason, prolly random luck
yea just did a bunch of runs here with SD35M, zero difference, the gap is random
hit the cntrl + and/or the cntrl - on your kb until it shows up....
@dusky thistle
so you got rid of eta_var? what's the alternative now?
whats beta57 compared to beta? is there any benefit? do you recommend it for someone that uses beta for everything?
whats denoise_alt and how is it used? is it safe to just act like it doesn't exist and use only the denoise field?
for sd3.5 i was using 3/exponential for the shift whats the equivalent? 3 for max and 0 for base? is 1.5 and 1.5 the equivalent to what the flux node was doing?
hard_var = eta_var
clownsampler is still there with the full discrete options
but the difference has been so small i didn't see much advantage to keeping it in a "full package" efficiency style node
base only has an effect with flux
does only hard mode use that or can soft mode use it
it's actually its own mode entirely
the reason i had it separated before is in case it was useful to have another noise mode take over once the math breaks down for hard_var, which is around a sigma of 0.15 or 0.2 or so
but... i haven't seen any benefit tbh
the way the schedules we have are set up, that's usually only like 1-3 steps anyway
what about beta57 question?
beta57 = beta scheduler with alpha = 0.5, beta = 0.7
i like it a lot
there's currently a bug i haven't figured out yet where denoise doesn't work correctly with it
denoise_alt = rescale the sigmas instead of slice them, like ksampler and regular denoise does
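To make the slice vs rescale distinction concrete, here is a toy sketch. The slicing mirrors how ksampler-style denoise works as far as I know (build a longer schedule, keep only the tail). Reading "rescale" as multiplying every sigma by the denoise value is my assumption about denoise_alt, and the quadratic schedule is just a stand-in for a real scheduler.

```python
def toy_sigmas(steps, sigma_max=1.0):
    """Toy schedule: steps + 1 sigmas falling quadratically from sigma_max to 0."""
    return [sigma_max * (1 - i / steps) ** 2 for i in range(steps + 1)]


def denoise_slice(steps, denoise):
    """Regular denoise: build a longer schedule and keep only its last steps + 1 sigmas."""
    if denoise >= 1.0:
        return toy_sigmas(steps)
    total_steps = int(steps / denoise)
    return toy_sigmas(total_steps)[-(steps + 1):]


def denoise_rescale(steps, denoise):
    """Assumed denoise_alt behaviour: keep the schedule's shape, scale every sigma."""
    return [sigma * denoise for sigma in toy_sigmas(steps)]


if __name__ == "__main__":
    sliced = denoise_slice(10, 0.5)
    rescaled = denoise_rescale(10, 0.5)
    print("slice   starts at", sliced[0], "with", len(sliced) - 1, "steps")
    print("rescale starts at", rescaled[0], "with", len(rescaled) - 1, "steps")
```

On this toy schedule the same denoise value starts the sliced run at a much lower sigma than the rescaled one, so the two fields are not interchangeable number for number.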
oh im relying heavily on denoise for my generations so thats important to know ill avoid it then thx
and denoise_alt should be insignificant from a functional standpoint?
for the exponential thing, sd35 medium and flux both use exponential already, it's large that doesn't
you can still hook up that timestep patcher node and it'll patch sd35L to exponential mode
using your new workflow, looks great
give it a shot, it's a pretty different effect
i've found it can be easier to denoise something to a small degree using it
try setting it to like... 0.9 or something
this is my denoise code
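// denoise is picked at random from a range that depends on the model family (uses lodash as _)
let lowRange = 0.75;
let highRange = 0.95;
if (isSd3Model || isFluxModel) {
  lowRange = 0.45;
  highRange = 0.65;
}
// _.range excludes the end value, so highRange itself is never picked
nextItem.denoise = _.sample(_.range(lowRange, highRange, 0.05));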
the best settings i found for flux were max_shift = 1.35, base shift = 0.85, using beta57
it's certainly possible that i was optimizing for certain image types or resolutions that worked best with certain samplers
but that worked really well and seemed to translate fairly well to sd35
the beta57 that is
found the default shift is pretty much fine... 3.0
is there a way to disable it so i can just rely on the nodes rather than the sampler? like if i set it to 0 and just leave the existing nodes as is?
I steal prompts and I don't mind getting in trouble for it
the shift?
hahah no prob, i def recognized some prompt gen outputs :D:D
i'll implement that now
ok cool thanks yeah i just dont have variable logic in my stuff per field per targetBaseModel so I'd have to create some new code to support variable shift per model or have 2 nodes in my WF and its all a lot of work
what are you going to settle on, -1 or 0?
alright you should be able to git pull and set to 0.0 for either shift value and it'll disable it entirely
can you actually do anything with a shift of 0?
does it just explode
lol
very good thx for the support
np
if you pulled a min ago pull again lol
idk wtf happened but somehow a piece of this chat got pasted into the code right as i pushed lol
and you renamed the class_type ClownsharKSampler lol sheesh
well, clown still exists
Updating eb8c99e..5178739
i'm gonna keep that around as a pure sampler option
this is just me mashing together the most important options from clown and shark
{
"noise_type_init": "perlin",
"noise_type_sde": "studentt",
"noise_mode_sde": "hard",
"eta": 0.5,
"noise_seed": "seed",
"control_after_generate": "randomize",
"sampler_mode": "standard",
"sampler_name": "res_3s",
"implicit_sampler_name": "default",
"scheduler": "beta",
"steps": "steps",
"implicit_steps": 0,
"denoise": "denoise",
"denoise_alt": 1,
"cfg": "cfg_scale",
"shift": 0,
"base_shift": 0,
"truncate_conditioning": "false"
}
this is gonna be my settings
should be fine
if you get weird results, play with the noise, gaussian/gaussian is gonna be the most reliable usuuuuualllyyy
i have a json file that generates a workflow for each target model using the spec so I'm gonna test the models now, i could automate this part but its better to test it manually
i did have it set to gaussian for everything on both nodes so ill keep that in mind
okay it doesn't explode with shift of 0.0
so i'm gonna make that "disable" value -1
oh you changed the order of the ksampler's outputs 🤦 another thing i have to realign, you keep changing it too
okay so youre settling with -1, got it lmk when to do a pull
what's the denoised output on the ksampler do btw?
order is the same, it's a new node lol
technically it's a replacement for sharksampler from my pov, i see it only has "output" and "denoised"
sd1.5 looks great with those settings, trying sdxl now which also uses img2img so let's see if 0.5 denoise will still work as before
yeah, before it had fp64 for both but that was really only important for unsampling, and i just threw it in as another element in the latent image output
it'll check when resampling for latent_image['samples_fp64'] and grab that if it's avail
some nodes blow up if they get fp64, a number of em do actually, hence the reason for trying to ensure the output is the same dtype as what went in
makes sense
looks sharp but im seeing 83 seconds on sdxl, not looking good on that end. i do have it set to 3s tho, ill try 2m once its done
oops i'm a fuckin idiot
if isinstance(model.model.model_config, comfy.supported_models.SD3):
    model = ModelSamplingSD3().patch(model, shift)[0]
elif isinstance(model.model.model_config, comfy.supported_models.Flux) or isinstance(model.model.model_config, comfy.supported_models.FluxSchnell):
    model = ModelSamplingFlux().patch(model, shift, base_shift, latent_image['samples'].shape[3], latent_image['samples'].shape[2])[0]
elif isinstance(model.model.model_config, comfy.supported_models.AuraFlow):
    model = ModelSamplingAuraFlow().patch_aura(model, shift)[0]
elif isinstance(model.model.model_config, comfy.supported_models.Stable_Cascade_C):
    model = ModelSamplingStableCascade().patch(model, shift)[0]
woooops
it's only triggering if it's at 0 lol one sec
or less than lol
that looks right doesnt it? -1 to disable?
uh oh SD3 coming out all black
this is the not-so-fun part where i have to retry sd3 with various settings to see whats wrong
sd3 workflow if you wanna help me debug it
k it works now
git pull it again
if shift is < 0, then it skips setting shift for everything except flux, which skips setting shift if either shift or base shift are < 0
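The rule described here, written out as a tiny sketch. The function name and the plain-string model families are hypothetical stand-ins for the isinstance checks on the model config, and this is not the actual node code.

```python
def should_patch_shift(model_family, shift, base_shift):
    """Return True if the sampler should apply its own shift patch.

    Any negative value means "disabled". Flux takes two values,
    so either one being negative disables the patch.
    """
    if model_family == "flux":
        return shift >= 0 and base_shift >= 0
    return shift >= 0


print(should_patch_shift("sd3", -1, 0.85))    # False: shift disabled
print(should_patch_shift("flux", 1.35, -1))   # False: base_shift disabled
print(should_patch_shift("flux", 1.35, 0.85)) # True
```

Setting shift to -1 then leaves whatever the upstream model sampling nodes already patched in place, which is the point of the disable value.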
ok ill try it again
keep hitting them over and over until you see the right button
hmm
whoa this is weird lol i don't have this node nor did i make one with this layout
that's just how it looks with my stuff, its your node your class_type but i edit it a little bit lol
ahh k gotcha just making sure something real crazy wasnt going on lol
i figured you were prolly messin with it
fun side fact: I found an endpoint yesterday for ComfyUI that I can hit that enables previews. even tho its set to auto in the command line it still doesn't set it to auto for websocket connected clients, so calling this endpoint after connecting makes your ksampler behave like my other one (ksampler efficient advanced)
enablePrevews() {
  this.fetchApi('/api/manager/preview_method', {
    method: 'GET',
    headers: {
      'Content-Type': 'application/json',
    },
    data: JSON.stringify({ value: 'auto' }),
  });
}
alright sd3 works great looks good
shift disabler workin fine too?
100%|██████████| 60/60 [00:24<00:00, 2.44it/s]
Prompt executed in 25.89 seconds
got prompt
Requested to load SD3
Loading 1 new model
100%|██████████| 60/60 [00:24<00:00, 2.42it/s]
Prompt executed in 25.22 seconds
first is dpmpp_2s_ancestral in ksampler, second is res_2s in CSK... they both use two model calls per step and should have similar runtimes
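A back-of-envelope sketch for comparing sampler runtimes by counting model calls. It assumes a name containing "_Ns" means N model calls per step and "_Nm" means one call per step (multistep methods reuse earlier results), which is my reading of the naming, not something confirmed by the sampler's code. The lookup table and function names are hypothetical.

```python
import re

# Samplers whose names do not follow the _Ns / _Nm pattern (assumed values).
KNOWN_CALLS = {"euler": 1, "heun": 2, "dpmpp_sde": 2}


def model_calls_per_step(sampler_name):
    """Guess the number of model calls per step from the sampler name."""
    if sampler_name in KNOWN_CALLS:
        return KNOWN_CALLS[sampler_name]
    match = re.search(r"_(\d+)([sm])(?:_|$)", sampler_name)
    if match is None:
        return 1  # assumption: unknown samplers cost one call per step
    order, kind = int(match.group(1)), match.group(2)
    return order if kind == "s" else 1


def total_model_calls(sampler_name, steps):
    return model_calls_per_step(sampler_name) * steps


for name in ("res_2m", "res_2s", "dpmpp_2s_ancestral", "res_3s"):
    print(name, total_model_calls(name, 30))
```

If the assumption holds, a 3s sampler costs roughly three times a 2m sampler at the same step count, so it makes sense to compare them at equal model calls rather than equal steps.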
solid mate, getting 14 seconds now with the new updates
yeah i know some shit was way fucked up a while ago as i was working through some issues with the rewrite
oh wait sorry i jumped the gun its still using ksampler my bad
you prolly just had something with a bug that cleared up during that process
hahaha
alright 36 seconds, with 5 loras and 24 steps on the first run using res_2m, that's totally within acceptable range I can leave it as the default at this point
yeah it should be pretty damn similar to euler with ksampler
there's a slight price for the noise generation
if your comp is slow and you're using huge latents it becomes more apparent for the sde solvers
21 seconds with 3x loras at 19 steps yeah it's all good now man
im glad you solved it
13/30 [00:26<00:34, 2.02s/it]
euler with ksampler
euler with CSK:
| 8/30 [00:14<00:40, 1.83s/it]
1792x1344 latent
great to hear
yea i'm sure you know how it goes, often times the best way to fix bugs is just to ignore them and keep cleaning up code
aint that the truth lol
clownsharksampler, new node
ahh
rolled into the functionality of clownsampler and sharksampler
now that ive established a baseline for res_2m i'll move over to the other side of the spectrum and use res_3m see what kind of times I get for that and if it's within reason
what does your sampler have to offer (for flow matching models for example)
all the *m should have similar runtimes
does it have noise injection and stuff
tons of features nothing else has
yeah, it's full blown ODE and SDE
30 samplers, 20 noise types, 6 noise scaling modes
cause im braindead when it comes to sampling values (like wtf is eta, all I remember is that it should be like 0 if its flow matching model, etc)
unsampling, latent guide modes, noise inversion (redid all the math for it, getting better results than with the paper's implementation)
so I only tried noise injection cause its simple enough
https://github.com/ClownsharkBatwing/RES4LYF got a bunch of WFs here
i was leaving it at 0 but then i experimented with 0.5 and it works fine for all models
the thing about it needing to be 0 is totally false
it WAS true with the other samplers
but it's not true at all that it can't be made to work
tbh, results are def better with noise
the math had to be reworked completely for the noise scaling
what's the equivalent to res_3s in the ksampler world? you said res_2m is equivalent to euler right?
in terms of speed? there isn't one
there's only the equiv of *m samplers, and 2s
the 2s ones are just dpmpp_2s_ancestral and dpmpp_sde
heun, heunpp
no like in terms of algorithm like what's 3s equivalent to if there is any in ksampler as far as like the math that goes into it
there isn't anything
oh cool so its unique stuff nice
in terms of the algorithm, there's only euler, 2m, 3m, and 2s
some variations on the 2m and 3m themes like unipc etc
insofar as the noise modes go, the new euler_ancestral and dpmpp_2s_ancestral use something pretty similar to noise_mode_sde = "soft" and eta = 1.0
noise_type_sde = gaussian
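For the "wtf is eta" question, a sketch of the classic ancestral step split in the k-diffusion style: eta controls how much of each step is re-noised instead of being taken deterministically. This is the older variance-exploding formula, shown only to illustrate the idea. It is not the flow-matching variant and not the reworked noise scaling discussed in this chat.

```python
import math


def ancestral_step(sigma_from, sigma_to, eta=1.0):
    """Split a step into a deterministic part and a fresh-noise part.

    Returns (sigma_down, sigma_up): step down to sigma_down, then add
    noise with standard deviation sigma_up to land on sigma_to.
    """
    if eta == 0.0:
        return sigma_to, 0.0
    variance = sigma_to**2 * (sigma_from**2 - sigma_to**2) / sigma_from**2
    sigma_up = min(sigma_to, eta * math.sqrt(variance))
    sigma_down = math.sqrt(sigma_to**2 - sigma_up**2)
    return sigma_down, sigma_up


print(ancestral_step(2.0, 1.0, eta=0.0))  # no noise added, plain ODE step
print(ancestral_step(2.0, 1.0, eta=1.0))  # full ancestral noise
```

With eta = 0 the sampler behaves like a plain ODE solver. Raising eta moves more of each step into added noise, while sigma_down and sigma_up still combine to the target noise level.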
do you like brownian or gaussian more
it really depends
on what, i'm not even entirely sure
gaussian is more reliable
but brownian often gives a crisper image with a bit more punch to it
but sometimes it looks grainy or shitty
gaussian can lead to more color contrast
hmm okay
most of these options are back now btw
I liked uniform and high frequency fractal power noise
it only does something with res_2s, res_3s, and dpmpp_2s and dpmpp_3s
the rest of the methods come with hardcoded ci
dpmpp_sde_2s is hardcoded at 0.5
otherwise it's the same as dpmpp_2s
err sorry
hardcoded at 1.0
im going to test beta57 now and see if it works with all the models
SD 1.5 ✅
SDXL ✅
Pony ✅
SD3 ✅
Flux ✅
yeah it seems to have no issues with any of the models I think i'll adopt it just to be a little different and try it out
awesome, good to hear
SD3.5 medium
saw a video of a fancy new upscaler, check out this image, zoom in on the faces of the soldiers notice how InstantIR is the only one that gets it right
InstantIR might be the new open source SOTA yeah
afaik... first ever 10th order sampled diffusion image on here (SD35M)
2nd order of coffee.