#🆕|sd3
Wow, how is it that Q3 beat Q8 in that benchmark? That’s funny, I use the 3.2 3B model too and I’ve been wanting to make it more performant; I didn’t consider GGUF’ing it
It isn't Q3 per se
Yeah it has all those extra letters afterwards, must be a special q3
so instead of the (vanilla?) SD35 Large or Large Turbo, which other models may be worth trying, in terms of quality/speed/performance?
Suffice it to say, I use Q8 as my go-to for all of these. I personally want to sacrifice as little quality as possible and reap the most benefits from these big models.
I use the 3B model to cross base model remix, so I’ll take a pony prompt and have the model generate the T5-XXL and CLIP-G text for image gen and then have it pick the most appropriate LoRAs through categories. Anyways, all that takes about 30-40 seconds. I wonder if I can speed it up with a little Q8 (or Q3) tech; I’ve only used GGUF in Comfy, so I’m not sure how difficult it’s gonna be to use those with the torch/transformers libraries
@bitter hearth so I copied and pasted your text from your LLM and gave it to o1-mini and added this
Given all that text, if you had to quantify, try to quantify me a potential error rate, even if it’s just a ballpark, I understand it says slightly, how slightly? Maybe a range? Factor in the models and workload?
ok so that's not a major difference
I'm still debating with myself whether to use q4 or q8
Yeah 0.1 to 2% for most tasks
no reason to put extra load on the system when the difference is so low
I have 8gb and I only use q8, the time it takes isn’t that important for me
yes same, but I'm also considering memory optimization
you won't get a gross drop in image quality
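Since memory is the other half of the Q4 vs Q8 question, here is a back-of-the-envelope sketch of how much the weights alone occupy. It assumes roughly 8 billion parameters for SD3.5 Large and ggml's legacy 32-weight block layouts (Q8_0: int8 values plus one fp16 scale; Q5_1: 5-bit values plus an fp16 scale and an fp16 minimum; Q4_0: 4-bit values plus one fp16 scale). Text encoders, the VAE and activations are ignored, and ComfyUI can offload layers, so this is not a hard VRAM limit.

```python
# Rough weight-only memory estimate at different GGUF bit widths.
# Assumptions: ~8e9 parameters (SD3.5 Large's announced size) and ggml's
# legacy block layouts of 32 weights per block. Text encoders, VAE and
# activations are not counted.

BLOCK = 32  # weights per block


def bits_per_weight(value_bits: int, fp16_fields: int) -> float:
    """Bits per weight for 32 packed values plus `fp16_fields` 16-bit scalars per block."""
    return (BLOCK * value_bits + 16 * fp16_fields) / BLOCK


FORMATS = {
    "fp16": 16.0,
    "Q8_0": bits_per_weight(8, 1),  # int8 values + fp16 scale
    "Q5_1": bits_per_weight(5, 2),  # 5-bit values + fp16 scale + fp16 min
    "Q4_0": bits_per_weight(4, 1),  # 4-bit values + fp16 scale
}


def weight_gib(n_params: float, bpw: float) -> float:
    """GiB needed to hold n_params weights at bpw bits each."""
    return n_params * bpw / 8 / 2**30


if __name__ == "__main__":
    n_params = 8e9  # assumption: approximate SD3.5 Large size
    for name, bpw in FORMATS.items():
        print(f"{name:>5}: {bpw:4.1f} bits/weight -> {weight_gib(n_params, bpw):5.1f} GiB")
```

Under these assumptions Q8_0 comes to about 7.9 GiB of weights versus about 14.9 GiB for fp16, and Q4_0 to about 4.2 GiB.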
Has anyone done a deep dive comparison like side by side image quality?
big hmmmmmm
dang, 10 to 20%, that’s too much, hard pass for me
Q2 flux dev looks a lot more like flux-s
that's how i feel too
ComfyUI has a really nice low vram mode so it’s not like we have to worry about OOM errors
I should adopt the GGUF T5 model bc the Q8 model is better than fp8, right? So they take an fp16 model and quantize it down to Q8; that should be better than the normal fp8 and faster too, right?
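On the Q8 vs fp8 question: both spend about 8 bits per weight, but a Q8_0-style GGUF stores an int8 code per weight plus one fp16 scale per 32-weight block, while fp8 E4M3 keeps only 3 mantissa bits per value. The sketch below compares the two round trips on synthetic unit-variance weights. The fp8 rounding is a numpy simulation (not a real fp8 kernel) that assumes no per-tensor scaling, the weights are random rather than from a real model, and weight error is only a proxy for image quality. It says nothing about speed: GGUF weights are dequantized on the fly, so Q8 is not automatically faster than fp8.

```python
import numpy as np


def fake_fp8_e4m3(x):
    """Round to the nearest FP8 E4M3 value (simulation: 3 mantissa bits,
    smallest normal 2**-6, largest finite 448, NaN handling ignored, no scaling)."""
    x = np.asarray(x, dtype=np.float64)
    a = np.abs(x)
    # exponent of each value's binade, floored at the subnormal range
    e = np.floor(np.log2(np.maximum(a, 2.0**-6)))
    step = 2.0 ** (e - 3)  # spacing between representable values in that binade
    q = np.minimum(np.round(a / step) * step, 448.0)  # saturate instead of overflowing
    return np.sign(x) * q


def q8_0_roundtrip(x, block=32):
    """Q8_0-style: per 32-weight block, scale d = max|x| / 127, int8 codes, fp16-stored scale."""
    x = np.asarray(x, dtype=np.float64).reshape(-1, block)
    d = np.abs(x).max(axis=1, keepdims=True) / 127.0
    q = np.round(x / np.where(d == 0, 1.0, d))  # all-zero blocks just give q = 0
    d16 = d.astype(np.float16).astype(np.float64)  # the scale is stored as fp16
    return (q * d16).reshape(-1)


def relative_rms_error(x, y):
    return float(np.sqrt(np.mean((x - y) ** 2) / np.mean(x ** 2)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal(32 * 4096)  # synthetic unit-variance "weights"
    print("fp8 e4m3 :", relative_rms_error(w, fake_fp8_e4m3(w)))
    print("Q8_0     :", relative_rms_error(w, q8_0_roundtrip(w)))
```

Real checkpoints with many small-magnitude weights can push unscaled fp8 into its subnormal range, where the relative error grows further.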
Thanks for the link
@bitter hearth you saw my comfy screenshot of how I setup the nodes for the clip loader?
yeah
By the way your triple clip loader could still use one more optimization, you can swap out that clip L fine tune for the long clip model by the same guy and it’ll work
I'm not using clip l
vit is improved, which is why I replaced clip l with it
Just saying you can swap out that entry by the long clip model from the same guy and it’s even better and more optimized
long clip model?
Yeah so like if vit let’s say is 20% better than clip l, long clip is 20% better than vit
Same guy, just go to zer0int’s Hugging Face, it’s his only other project
It’s really fascinating stuff. He didn’t build it, some Chinese crew did, but they managed to extend the context length from 77 tokens to 248
I'm at https://huggingface.co/zer0int/CLIP-GmP-ViT-L-14/tree/main, don't see long clip
Clip L actually has a 20 token effective length
It’s a drop in replacement for sd3, for sdxl and flux you need a special node to make it work
You’re going from 20/77 token width to 77/248; that’s pretty huge, and they quantify the improvements, it’s like double-digit percent gains
interesting... I'm grabbing it, but what do you mean about needing a different node for flux?
Yeah you gotta install some special custom nodes for flux to get it to work, it’s all linked in there that’s how I found it, they even cite the workflow
You just add the special node after the triple loader and pick your long clip file in there so it ignores whatever L model you picked in the triple loader
Sea art something
So when will SD35 be GGUF'd?
Also rename this channel folks. Put the past behind you. the future is bright.
SD3 = SD35
it's been GGUF'd for a day
https://civitai.com/models/879251?modelVersionId=985060
dang
yeah that's the winning build imo
oh wow, so it works with flux using the drop-in? that's interesting bc I remember testing it and it didn't work for me
comfyui updated the gguf loaders recently
also thanks for mentioning the long clip 🙂
i gave you one tip to optimize and you gave me one tip to optimize so i'd say we're even 🤝
sure lol
will we get tensorrt for sd35? Turbo generates in only 9s on rtx3060! Trt can bring it down to 6s
I view it differently. Take the chart SAI published, or even the Flux team's, comparing the vanilla models against each other. Whatever % I give up comes out of the edge it had. It is not 0.1%, but something else.
If I voluntarily toss out 20 Elo on that chart, then I have pretty much given up the advantage the ultra large model offers
Ex: Now it is as fast as the Medium model. Sure, and not much better either
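To put an Elo gap in perspective: if the published chart follows the standard Elo convention (base 10, 400-point scale, which is an assumption about how it was computed), a rating difference maps to an expected head-to-head preference rate like this.

```python
def elo_expected_score(rating_gap: float) -> float:
    """Expected score (head-to-head win rate, ties as half) for a model `rating_gap` Elo points ahead."""
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / 400.0))


if __name__ == "__main__":
    for gap in (5, 20, 50, 100):
        print(f"{gap:>4} Elo ahead -> preferred in {elo_expected_score(gap):.1%} of comparisons")
```

Under that assumption, giving up 20 Elo turns an even match into being preferred roughly 47% of the time against the same opponent; whether that is small depends on how large the original gap between the models was.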
that graph is on point, I feel the prompt adherence and quality comparison between flux and sd3.5 is pretty much correct
these charts should be taken with a boulder of salt.
What are the differences between the various Qs?
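Roughly: the number is the bits per weight, `_0` variants store one scale per block of weights, `_1` variants add a per-block minimum (an offset), and the `K` variants (Q3_K_S, Q4_K_M and so on) group blocks into larger super-blocks with their own quantized scales. The numpy sketch below is a simplified illustration of the `_0` vs `_1` idea only, not ggml's exact rounding rules.

```python
import numpy as np

BLOCK = 32  # weights per block, as in ggml's legacy quant types


def quant_symmetric(x, bits):
    """'_0'-style (simplified): one scale per block, integer codes centred on zero."""
    x = np.asarray(x, dtype=np.float64).reshape(-1, BLOCK)
    qmax = 2 ** (bits - 1) - 1  # e.g. 127 for 8 bits, 7 for 4 bits
    d = np.abs(x).max(axis=1, keepdims=True) / qmax
    d = np.where(d == 0, 1.0, d)  # avoid dividing by zero on all-zero blocks
    q = np.clip(np.round(x / d), -qmax - 1, qmax)
    return (q * d).reshape(-1)


def quant_affine(x, bits):
    """'_1'-style (simplified): scale plus per-block minimum, codes 0 .. 2**bits - 1."""
    x = np.asarray(x, dtype=np.float64).reshape(-1, BLOCK)
    lo = x.min(axis=1, keepdims=True)
    d = (x.max(axis=1, keepdims=True) - lo) / (2 ** bits - 1)
    d = np.where(d == 0, 1.0, d)  # constant blocks: every code is 0, value = lo
    q = np.clip(np.round((x - lo) / d), 0, 2 ** bits - 1)
    return (q * d + lo).reshape(-1)


def rel_rms(x, y):
    return float(np.sqrt(np.mean((x - y) ** 2) / np.mean(x ** 2)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal(BLOCK * 4096)
    shifted = w + 2.0  # blocks whose values are not centred on zero
    for bits in (8, 5, 4):
        print(f"{bits}-bit  _0: {rel_rms(w, quant_symmetric(w, bits)):.4f}"
              f"  _1: {rel_rms(w, quant_affine(w, bits)):.4f}"
              f"  _0 shifted: {rel_rms(shifted, quant_symmetric(shifted, bits)):.4f}"
              f"  _1 shifted: {rel_rms(shifted, quant_affine(shifted, bits)):.4f}")
```

The `_1` offset mainly pays off when a block's values are not centred on zero; in both schemes the step size, and with it the error, roughly doubles for every bit removed.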
They should, but the question is on tradeoffs when getting faster and smaller GGUF models
Nevertheless, AI image/video/sound gen is moving ahead with major strides
at what point are you just moving down to a whole different model?
and just kidding yourself
It is why I stick to Q8. I cannot even run plain fp16 in case you wonder, so this is literally as good as I can get
I really don't see SDXL as that much worse compared to flux and sd35... not even the base model.
quality, but I am testing sd35lturbo q4 with clips only and it is kind of fine (of course, anatomy is yet to be fixed)
I don't doubt it depends on what you use them for. For me SDXL is a massive downgrade. But I also don't do anime or anything like that
What's your specs?
4060
wow
4060 laptop to be precise
I can run them all on 3060 12 gb ram on PC
Which one is the highest quality in that case? Q8?
sd35 takes about 3 minutes lol turbo sd35 20 seconds
and maybe Medium will do 20 seconds too, and per their chart it is about equal in quality
shrug
yes
wow 20 seconds on 8gb vram?
So basically long story short it will be a while before we can run Mochi locally XD
12 VRAM
That's kind of my point. I'm not dictating what others should want or do. I am simply sharing my view and what I want. The text production and adherence in Flux and SD3.5 are huge huge upgrades
it should be basically indistinguishable from fp16 BUT a bit slower
that's pretty good 20 seconds with 12gb vram thx for the info
So is anything online out yet to create 3.5 loras? 😉
Yes, I think. I saw on Civitai
12gb vram takes 20 seconds for sd3.5 turbo?
you have a 3060? Turbo takes 9 seconds for me with 4 steps; you can set cfg to 1 and get a massive boost
Well this is for me yes
Yes
19.6 seconds to be exact
Well, come Black Friday I will be upgrading this laptop, so no big deal
likely 4080 or 4090 laptop
it takes me roughly 10s on my 12gb
Good for u lol
if this includes T5 encoding on cpu, then it is fine
Maybe it's coz I was streaming with OBS and doing other things
Oh yeah of course it includes the T5 fp16 encoder
You know there is a GGUF of that too
nice!
with clips only encoding is near instant btw
could be great for upscaling to not waste memory and time
with plain fp16 T5 there is a significant wait while it processes a new prompt. With the Q8 version the time gain is huge
on cpu?
I have workflows that take like 15 minutes... first gen with sdxl then background refine with sd3 then upscale with sd15 and IC lighting with sd15 then further upscale with sdxl - lmao efficiency
XD
What on cpu?
half of the time is swapping and encoding probably 😁
I like to do everything with a single model for that reason
In fact, if you really want to be adventurous, GGUF has an fp32 of the T5
I mean T5 encoding is on cpu or it is partially loaded in gpu?
Yeah I learned that the hard way.
if you guys want to try a mind blowingly good flux model I highly recommend this one: https://civitai.com/models/843551
I never checked. I just compare times and perf
I learned it due to having 16gb ram in the past, it was soooo long
X wings!
okay, if you see improvement, I will try then
that is not the point im making ... on a 12gb card sd3.5 shouldn't take more than 10s in sdxl resolution.
I must be messing up then!
unlike most flux models this one lets you adjust the cfg in the ksampler between 1 and 10, whereas with most models you have a range of 1 to 1.8. It also requires 60 steps minimum, much slower than any other flux model, but I honestly believe this is the closest you can get to flux pro level quality
I will check it out since the size is not that huge
@bitter hearth I can confirm that this works using my flux workflow: I swapped out the old vit14-L for long clip and t5 fp8 for the t5 q8 gguf, and it worked without any special nodes now
the original model is 22gb, that's just the quantized version
I'm averaging 700 seconds per image with that flux model, that's like 11 minutes for one image lol, 8gb vram, using the q8 version
60 steps is crazy long, also having cfg > 1 slows the model down further, it needs a hyper lora! 🤯
yeah it's a pretty crazy flux model, it's hard for me to grasp what the guy who made it did. Apparently he just repackaged flux-dev, no additional training; he did something to the internals so that the cfg field in ksampler works like a normal model again. The result is outstanding adherence to prompts, I've done a bunch of side by side comparisons and 9/10 it's always the better image
I love the animation aesthetics of stable diffusion models though. Something about flux, it just has this overtrained weirdness when it comes to doing anime, comics, cartoons, illustrations in general, like SAI's training data has some magic sauce that just makes for nicer looking non-realistic images
didn't de-distillation require retraining?
I'm a total novice at understanding concepts like that, this is the original model on hf: https://huggingface.co/nyanko7/flux-dev-de-distill
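Why 60 steps with cfg above 1 hurts so much: with classifier-free guidance each step needs a conditional and an unconditional prediction, so the denoiser runs twice per step. At cfg = 1 the formula collapses to the conditional prediction and, as far as I know, ComfyUI then skips the unconditional pass. A small sketch of the arithmetic, assuming a fixed number of model evaluations per sampler step:

```python
def cfg_combine(cond: float, uncond: float, cfg: float) -> float:
    """Classifier-free guidance: push the prediction away from the unconditional one."""
    return uncond + cfg * (cond - uncond)


def model_calls(steps: int, cfg: float, evals_per_step: int = 1) -> int:
    """Denoiser forward passes per image.

    Assumes the frontend skips the unconditional pass when cfg == 1 (where
    cfg_combine returns `cond` unchanged) and that the sampler makes
    `evals_per_step` model evaluations per step (1 for Euler-type samplers).
    """
    passes = 1 if cfg == 1.0 else 2
    return steps * evals_per_step * passes


if __name__ == "__main__":
    print("60 steps, cfg 3.5 :", model_calls(60, 3.5))  # de-distilled model with real CFG
    print("4 steps,  cfg 1.0 :", model_calls(4, 1.0))   # turbo model, guidance off
```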
Flux-dev-de-distill
This is an experiment to de-distill guidance from flux.1-dev. We removed the original distilled guidance and make true classifier-free guidance reworks.
Model Details
Following Algorithm 1 in On Distillation of Guided Diffusion Models, we attempted to reverse the distillation process by re-matching guidance scale w. We introduce a student model x(z_t) to match the output of the teacher at any time-step t ∈ [0, 1] and any guidance scale w ∈ [1, 4]. We initialize the student model with parameters from the teacher model except for the parameters related to w-embedding.
Since this model uses true CFG instead of distilled CFG, it is not compatible with diffusers pipeline. Please use inference script or manually add guidance in the iteration loop.
Train: 150K Unsplash images, 1024px square, 6k steps with global batch size 32, frozen teacher model, approx 12 hours due to limited compute.
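The model card's note about manually adding guidance in the iteration loop boils down to running the model twice per step and mixing the results, instead of feeding a single distilled guidance value into one forward pass the way flux-dev does. Below is a generic Euler sampler with true CFG; `toy_denoiser` is a hypothetical stand-in for the network, and this is not the model card's inference script.

```python
import numpy as np


def toy_denoiser(x, sigma, cond):
    """Hypothetical stand-in for the network: returns a denoised estimate that
    leans towards a target set by the conditioning (cond=None = unconditional)."""
    target = np.zeros_like(x) if cond is None else np.full_like(x, cond)
    w = 1.0 / (1.0 + sigma**2)  # trust the target more as noise decreases
    return w * target + (1.0 - w) * x


def sample_euler_true_cfg(model, x, sigmas, cond, cfg):
    """Euler sampling in sigma space with classifier-free guidance added by hand."""
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        den_c = model(x, sigma, cond)
        if cfg == 1.0:
            den = den_c  # guidance is a no-op, so skip the second pass
        else:
            den_u = model(x, sigma, None)  # extra unconditional pass
            den = den_u + cfg * (den_c - den_u)
        d = (x - den) / sigma  # derivative dx/dsigma implied by the denoised estimate
        x = x + (sigma_next - sigma) * d
    return x


if __name__ == "__main__":
    x0 = np.full(4, 3.0)  # stands in for the initial noisy latent
    sigmas = np.array([10.0, 5.0, 1.0, 0.0])
    print(sample_euler_true_cfg(toy_denoiser, x0, sigmas, cond=2.0, cfg=3.0))
```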
Yeah the last line seems to imply he did perform training to it, the whole text is just way over my head tho
yea, there are around 5 undistilled versions already btw
even schnell which now is 50steps and has cfg))
civitai downloads stopped 😦
why
?
it randomly crashes far too often
what does?
different parts of civitai. I just want my 3.5 gguf 😄
you can't undistill something - you can't put back what was removed
ah. yeah, civit's got problems
here's the chat I just had in regards to that subject, very interesting read if you'd like to learn more on the technical details https://chatgpt.com/share/6719c028-1b4c-800f-bc5c-2d0a36a08ffe
they didn't put back what was removed, they trained in something new, didn't they?
yeah @turbid grotto that's what I got too from the chat I had with chatgpt
chatGPT makes stuff up, don't bother with it
you can't do that either. not with flux.
i know what he's doing - what he's not getting is flux undistilled. what he is getting is his mashup version
what did they do then?
and, quite frankly, he'd have been better off just using SDXL than what he's trying to do
hope people will switch these resources to sd35l
rundiffusion said a while ago that they were training flux but nothing yet; maybe it didn't work and they're planning to train sd35l now, wish them luck
i'm pretty sure most will at least use 3.5 - and i would hope the development community would switch their efforts to 3.5 and leave flux alone
(instead of spinning their wheels trying to 'unbreak' it)
have you looked to see if runDiffusion has posted any updates?
thank you! 😄
as someone that has generated 800+ images with the de-distilled model I can personally attest to its superiority over other models, here's a screenshot of how many images I've generated per flux model
the leading number is # of images made, the x cross is just a bug bc it thinks the files don't exist since they're in the unet folder not safetensors
I don't think most will switch to 3.5l yet
flux is a finished model, it doesn't require upgrades, while sd3.5l is raw and needs post-training to improve coherency
I only today found their message on twitter about plans
will probably take time to test different trainings
has anyone merged 3.5 with flux dev yet? How did it go?
I think the architectures are too different to merge
yes it is too different
i think this was sarcasm right?
you can't
I was more meaning the pseudo merges, like we all did with Flux and SD3
rule #1: use all the tools. I wasn't suggesting anyone switch, except for development
Becky, you shouldn't cuss in this discord
short answer - you can't. longer answer - I'll DM you
agreed, I am making a wf here where sd35lt generates the base image and sdxl finalizes it, and maybe sd35l again to improve clarity
should be fast
use sd3-2b-medium as the refiner. it is THE best refiner you will ever find
So it seems that 3.5 had a larger training dataset than flux did.
DM sent
i for sure like the illustration output of 3.5 better than flux's but i think flux is better than sd3 when it comes to photorealism
don't want to touch it, I'd rather wait for 3.5m on the 29th
it will be 2.5b, so probably made from scratch
will be funny if it appears to have better anatomy than large
I feel like it is better for both
another sd3.5 render, how would flux fare?
I think it would be similar in that case
to be fair we are in the #sd3 room I think we'd find a lot more people that would prefer both over flux
not quite the same feel
is that flux?
the last one is
no workflow data i can't run it in flux to compare
tbh both models can do better
2b-medium is very artsy - you're using it to refine
well I didn't cherry-pick or tweak the prompt heavily
can agree
Grip enhancing gems
Pimp diddy controller
stole the playstation controller from the Trump mansion?
If there’s a joke
im not getting it
lmao
just 4 steps ||(don't mind outfit, I am testing how much details can clips only handle)||
q4 btw
why did you choose q4
because i want to try it as first pass before sdxl and save vram
ahh i see so you are doing multi passes
and here i thought you wanted q4 over q8
no no, q4 sometimes gives weird textures, not bad, but it is better to use a higher quant
was looking into that earlier and what i found is there is 10-20% quality decrease with q4 from q8
yea, it is usable, but it's worth using a higher quant
sd3.5, what's your guess, q4 or q8?
another
trump is overly obsessed with gold. at one point, someone reported on his penthouse in the trump tower - every surface was gold plated that could be
Any explanation for Q8_0 being faster than Q5_1?
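One plausible factor, offered as an illustration rather than a benchmark: Q8_0 codes are whole bytes, so dequantizing is one multiply per weight, while Q5_1 stores each 5-bit code as a nibble plus one bit in a separate 32-bit mask and adds a per-block minimum, so every weight needs extra shifting and masking first. The layout below follows ggml's Q5_1 as I understand it (the exact bit order is an assumption); real kernels are vectorized and memory bandwidth matters too.

```python
import numpy as np

BLOCK = 32  # weights per block


def dequant_q8_0(d, qs):
    """Q8_0-style: codes are whole signed bytes, so it is one multiply per weight."""
    return d * qs.astype(np.float32)


def pack_5bit(codes):
    """Pack 32 codes (0..31) as 16 bytes of nibbles plus a 32-bit mask of 5th bits.
    Assumed ggml-like order: byte j holds element j (low nibble) and j + 16 (high nibble)."""
    codes = np.asarray(codes, dtype=np.uint32)
    qs = ((codes[:16] & 0x0F) | ((codes[16:] & 0x0F) << 4)).astype(np.uint8)
    qh = 0
    for i, c in enumerate(codes):
        qh |= int((c >> 4) & 1) << i  # bit i = high bit of element i
    return qs, qh


def dequant_q5_1(d, m, qs, qh):
    """Q5_1-style: rebuild each 5-bit code from a nibble and one mask bit, then scale and offset."""
    j = np.arange(16)
    qs = qs.astype(np.uint32)
    low_half = (qs & 0x0F) | (((qh >> j) & 1) << 4)        # elements 0..15
    high_half = (qs >> 4) | (((qh >> (j + 16)) & 1) << 4)  # elements 16..31
    codes = np.concatenate([low_half, high_half]).astype(np.float32)
    return d * codes + m


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    codes = rng.integers(0, 32, size=BLOCK)
    qs, qh = pack_5bit(codes)
    print(dequant_q5_1(np.float32(0.1), np.float32(-1.6), qs, qh)[:4])
    print(dequant_q8_0(np.float32(0.01), rng.integers(-127, 128, size=BLOCK).astype(np.int8))[:4])
```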
but it is upscaled
anyways, first - q4?
second - q8?
nah, sdxl breaks everything, however it does better hands
upscaled yes but nothing too fancy, used as shown in the image, but that's all q4
oh great, I thought the second one was q8 because it has a less noticeable pattern
seems like it is a pattern from the upscaler
Flux schnell a royal setting, princess Jasmine in her castle. cinematic.
from blendic.ai
I added support for training LoRAs on SD3.5 Large at 8bit on 24GB GPU to ai-toolkit. Still doing some testing and will likely make some tweaks to it, but it is there if you early birds want to test it out. https://t.co/tIkpAhiUoa
much better lettering 🙂
can you try this with photorealistic touch a royal setting, princess Jasmine in her castle.
this is what im getting on sd3.5 turbo
I don't like the idea of refining with sdxl anymore, it removes all the prettiness of sd35l
i stopped using sdxl
I'm more curious how you are rendering those concepts
3rd order SDE RES sampling with CFG++
i did tell you to use sd3-2b-medium to refine with
nope!
He's lethal inside of 2 feet.
it depends, if you prefer the small details of the SDXL model then it improves it
I think my best images out of any so far came from refining Flux with Realvis
darth maul vibes
boy can jump though
this was Flux into Realvis
you keep the vibrance of Flux but Realvis tones down the Flux look a lot
I don't think its possible to get people to stop using euler sadly 🤣
even when their image quality is maybe 1% of an appropriate sampler
I think the AI has been partaking this evening
Didn't know Euler has such a bad rep! XD
you know your sampler is old when the developer looked like this
snow is nice in the cfg++ one
yeah, cfg++ is generally leading to less blown out areas, smudging, color bleed, etc
I keep forgetting I need to switch to only using cfg++ really
their paper showed really big gains in some of the lower down examples
yeah, def
Blepping added Flux support to DiffuseHigh node yesterday
might be useless or might be amazing
haha yeah similar repos in some ways
Extraltodeus repos are always fun too
good example of the kinda diff
cfg++ is so much nicer yeah
I get confused by the cfgpp variable
cos something like the built in euler_cfgpp in comfy doesn't have a variable
there's something sketch about the math, probably my own understanding of it, but i feel like it's not lined up exactly like the paper describes... every time i try to do it the way it's laid out there, i get a latent explosion lol
yeah
theoretically you're not supposed to set it to anything outside of 0 to 1, but i've found 1.5 is really nice
2 can be good... by the time you hit 3 it usually completely burns to fucking death lol
and i mean to death
like if you set cfg to 45.0
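For reference, the core difference in a CFG++-style Euler step, as I understand the k-diffusion-style implementations in ComfyUI (treat the details as an assumption): the guided prediction still sets the denoised estimate, but the step re-noises along the unconditional prediction's direction rather than the guided one. In the paper the guidance scale (lambda) is meant to stay in [0, 1]; how a frontend maps its cfg slider onto that can differ.

```python
import numpy as np


def guided(den_cond, den_uncond, scale):
    """Mix conditional and unconditional denoised predictions with a guidance scale."""
    return den_uncond + scale * (den_cond - den_uncond)


def euler_step(x, sigma, sigma_next, den):
    """Plain Euler: keep the guided estimate and re-noise along the guided direction."""
    d = (x - den) / sigma
    return den + sigma_next * d  # same as x + (sigma_next - sigma) * d


def euler_cfgpp_step(x, sigma, sigma_next, den_guided, den_uncond):
    """CFG++-style Euler: keep the guided estimate, re-noise along the unconditional direction."""
    d_uncond = (x - den_uncond) / sigma
    return den_guided + sigma_next * d_uncond


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(8)
    den_c, den_u = 0.5 * x, 0.2 * x  # hypothetical model outputs
    g = guided(den_c, den_u, 0.8)
    print("euler   :", euler_step(x, 2.0, 1.0, g)[:3])
    print("euler++ :", euler_cfgpp_step(x, 2.0, 1.0, g, den_u)[:3])
```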
ah I checked the original paper and they gave code but in diffusers
i looked on their repo
they've got it done a few diff ways but it's real similar to what i'm using atm
recently a paper gave code but for Compvis SD 1.4 repo, but using SD 2.1 model
it was baffling
i'm prolly just botching the math at some point cuz i failed to find the equivalency
it doesn't help that i keep my notes on printer paper and my cats periodically launch everything all over my office and scramble everything
as in... several times a day lol
oh yeah cats are crazy
love em
with the built in comfy one I used 1.5 a lot so I think that is ok
gotcha
there is this other benefit of cfg++ where the early steps look nicer if you end sampling early
basically this
so you can end it a little bit early to get a crazily soft image
yeah the lack of overshooting is nice
was a while ago but that's what this was
oh yea i vaguely remember that one
i have no idea how that lets you create those cool prompts and ideas
well, if you spend as much time flailing wildly at it as i do, and then sleep as little as i do cuz of it, you'll be thinking some pretty strange thoughts too
this has been a big mystery ....
my advice is to buy a 10lb bag of wild bird seed and sit on your couch and sort it for a couple days without sleeping
when you're done, mix it back up and sort it again
then you'll be ready to prompt
you are kidding me now
i actually did that once
I'm wondering what the secret to making that image even was.... it can't be just an LLM or some controlnet
that's a decent answer from the LLM to be honest
I had no freaking idea ... I thought you were just messing around with that sampler, but chatgpt can't be trolling about it, I'm trying to figure out how to apply that sampler node in the workflow
my WFs are embedded in these
you need my repo to use it
I'll wait for 3.5m, have a good feeling about it
suit yourself
right, but also you can't just refine at 1024, it is highly preferable to do 2x upscale due to vae
oh I agree
if you are going to use SDXL and SD 1.5 now you want the resolution fairly high, to help reduce the side effects of the weaker VAE
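One way to see why the weaker VAE matters: all of these VAEs downsample by 8 in each direction, but SD1.5/SDXL keep 4 latent channels per position while SD3/Flux keep 16, so each SDXL latent value has to describe about four times as many pixel values. The sketch just does that arithmetic; reading it as a reason for the 2x upscale before refining is an interpretation, not a measured result.

```python
# Latent layouts: (channels, spatial downsample factor) of the released VAEs.
VAES = {
    "sd15_sdxl": (4, 8),
    "sd3_flux": (16, 8),
}


def latent_shape(width: int, height: int, vae: str) -> tuple:
    """(channels, height, width) of the latent for an image of the given size."""
    channels, factor = VAES[vae]
    return (channels, height // factor, width // factor)


def pixel_values_per_latent_value(vae: str) -> float:
    """RGB values (3 per pixel) described by each latent value: 3 * f * f / channels."""
    channels, factor = VAES[vae]
    return 3 * factor * factor / channels


if __name__ == "__main__":
    for vae in VAES:
        print(vae, latent_shape(1024, 1024, vae), latent_shape(2048, 2048, vae),
              pixel_values_per_latent_value(vae))
```

Refining at 2x gives an SDXL or SD1.5 pass four times as many latent positions for the same visible detail.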
for example this is Flux, refined with SDXL at 4096x4096
now go do the same thing, but use SD3-2b-medium as the refiner
ah yeah I still need to try this
that model is THE most artsy model you will find anywhere. it's a fantastic refiner
that's a lot of details but I prefer more focused but still sharp upscales
let me guess, it is made with hyper?
i found 2B screwed my images up more often than not tbh
when using it as a refiner?
but your images are frequently very strange anyway ;)
haha yeah I did use something like Hyper, it was TCD
and degrade some of the other details... gettin to this look like there were jpeg artifacts creeping in, blown out pixels like the levels slider was pulled up too high on the blacks
it'd do things like take a creepy face somewhere in the corner, and turn it into a smiling bimbo, while the rest of the image wasn't changed
what'd you have shift and cfg set to? and oh, btw - paper for you to read https://arxiv.org/pdf/2410.02416
i've come to vastly prefer just getting models trained that allow me to generate directly at my target resolution
i don't remember, been a while
it might be okay with heavy cherry picking (generate 10 and pick 1)
I have to use cherry picking with SD 1.5 a lot
well yeah - it's 1.5. use the wrong prompting, you'll be picking cherries from orange trees
this is SD 1.5 hands with heavy cherry pick
the top 10% of hands are fine
and that is exactly why negative prompts were even invented
```
This insight leads us to explore a rescaled version of the CFG update direction and incorporate a momentum term, similar to adaptive optimization methods. The rescaling is motivated by the need to control large update norms, which can cause significant drifts in the sampling process. To prevent this, we constrain the updates to lie within a sphere. For the momentum term, unlike with traditional optimization, we apply a negative value to introduce a repulsive effect between consecutive updates, effectively down-weighting components already present in previous steps. We refer to this as reverse momentum. By combining rescaling, reverse momentum, and projection, we introduce a new method, called adaptive projected guidance (APG), which allows the use of higher guidance scales without oversaturation or degradation in image quality.
```
really interesting
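The three ingredients in that excerpt map onto a few lines of array code. Below is a single-image numpy sketch based on my reading of the paper (arXiv 2410.02416), not on the linked ComfyUI nodes: guidance is applied to the denoised (x0) predictions, the difference is projected onto the conditional prediction over the whole tensor, and the parameter values are illustrative, not the paper's recommendations.

```python
import numpy as np


class MomentumBuffer:
    """Accumulates guidance updates; a negative momentum gives 'reverse momentum'."""

    def __init__(self, momentum: float):
        self.momentum = momentum
        self.running = 0.0

    def update(self, value):
        self.running = value + self.momentum * self.running
        return self.running


def project(v, onto):
    """Split v into parts parallel and orthogonal to `onto` (dot product over the whole array)."""
    unit = onto / np.linalg.norm(onto)
    parallel = np.sum(v * unit) * unit
    return parallel, v - parallel


def apg(pred_cond, pred_uncond, scale, eta=0.0, norm_threshold=0.0, momentum=None):
    """Adaptive projected guidance on denoised predictions (sketch)."""
    diff = pred_cond - pred_uncond
    if momentum is not None:
        diff = momentum.update(diff)  # reverse momentum when momentum.momentum < 0
    if norm_threshold > 0:
        norm = np.linalg.norm(diff)
        if norm > norm_threshold:
            diff = diff * (norm_threshold / norm)  # keep the update inside a sphere
    parallel, orthogonal = project(diff, pred_cond)
    # eta = 1 keeps the parallel part (plain CFG); eta = 0 drops it entirely
    return pred_cond + (scale - 1.0) * (orthogonal + eta * parallel)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    buf = MomentumBuffer(momentum=-0.5)  # illustrative value
    cond, uncond = rng.standard_normal((4, 8, 8)), rng.standard_normal((4, 8, 8))
    out = apg(cond, uncond, scale=7.0, eta=0.0, norm_threshold=5.0, momentum=buf)
    print(out.shape)
```

With `eta=1`, no momentum and no threshold, the function reduces to ordinary CFG, which is a handy sanity check when comparing nodes.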
i was thinking about something vaguely related today... wondering about implementing some kind of guidance on the guidance like this
it's specifically designed for when you want to turn cfg up but it over cooks
just a kernel of a thought, nothing fleshed out whatsoever
and it works, really really well
I've wanted for like a year now to make a workflow where it re-rolls the hands until a vision model says it's okay
but our vision models are not strong enough yet
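Cherry-picking (generate 10, keep 1) and the re-roll-until-approved idea are the same loop with a different stopping rule. A minimal sketch: both callables are hypothetical placeholders, and the scoring side is the hard part, since current vision models may not judge hands reliably.

```python
from typing import Callable, Optional, Tuple, TypeVar

T = TypeVar("T")


def reroll_until_ok(
    generate: Callable[[int], T],
    score: Callable[[T], float],
    threshold: float,
    max_tries: int = 10,
    first_seed: int = 0,
) -> Tuple[Optional[T], float, int]:
    """Try consecutive seeds until `score` reaches `threshold`; otherwise return the best try.

    `generate` (seed -> image) and `score` (image -> number, higher is better) are
    hypothetical placeholders, e.g. a full generation and a hand-quality judge.
    """
    best, best_score, best_seed = None, float("-inf"), first_seed
    for seed in range(first_seed, first_seed + max_tries):
        image = generate(seed)
        s = score(image)
        if s > best_score:
            best, best_score, best_seed = image, s, seed
        if s >= threshold:
            break  # good enough, stop spending generations
    return best, best_score, best_seed


if __name__ == "__main__":
    # toy stand-ins: the "image" is just the seed, and later seeds score higher
    print(reroll_until_ok(lambda seed: seed, lambda img: img / 10, threshold=0.35))
```

Setting `threshold` to infinity turns it into plain best-of-N cherry-picking over `max_tries` seeds.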
it seems that those low step things always left some "smeared" details, visible on individual hairs
if you want a clean image Hyper is better than TCD
is there an extension for this already?
nor would they always consider 'okay' to be what you want. just make a lora
is it what?
is there an extension for comfy?
yeah there are some nodes
https://github.com/MythicalChu/ComfyUI-APG_ImYourCFGNow
bear in mind it might not be quite what the paper wanted
oh.
before you use them, read the paper. out loud. and understand what it's for
so you use it correctly
thanks, there is another one https://github.com/logtd/ComfyUI-APGScaling
yeah I saw that one
generally I try things first and then look them up if the results are not what I expected 😁
the authors showed up in this discussion, if that helps https://github.com/huggingface/diffusers/pull/9626
since i just spent time working with this thing - use it instead of cfg, when you want to ramp cfg up but it's overcooking your images
hmm, will it be able to counter a high PAG value?
even if it does, you're doubling your runtime again
there is also Characteristic Guidance Prediction from https://github.com/redhottensors/ComfyUI-Prediction
no examples 
it links the paper
https://arxiv.org/abs/2312.07586
fair warning it will be slow
I see, have you tried it?
no I keep forgetting to try it
I am okay with long generation times so this thing could be really good
A while ago I smashed several model patches without thinking much and made sdxl slower than flux
oh yeah when I ran 3 PAG nodes and 3 SEG nodes it was way slower than flux
funnily enough the Sana demo has PAG, didn't expect that
oh Sana, looking forward it
they will probably release after medium to not get left in shadow
dinner is fucking served, guys
maybe
at least there are some chilli peppers for flavour
taesd3 works with 35
I wish we had something half-way between taesd and the regular VAEs
yea, I had an sdxl workflow where the 2k pass takes less time than the final vae decoding
yeah happens on a bunch of workflows
acceleration loras and high res can easily get that issue
also, tensorrt - massive speedup
yeah I always compile
the exception is when you want to tweak certain things a lot e.g. lora weights
rather than tensorrt I use the current compile model node in comfy these days
a1111 could convert loras to trt a year or more ago, comfy not yet :9 but I often bake the hyper lora in during conversion
first time hearing this
its compatible with more stuff than tensorrt
does this work with arbitrary models?
Do you mean "Model Compile" or "TorchCompileModel"? I heard about torchcompile and that it works on Linux only
seems like it yeah
ah yeah I am on linux

there is something else 
that's matteo's pack
on windows there is recent news https://old.reddit.com/r/StableDiffusion/comments/1g45n6n/triton_3_wheels_published_for_windows_and_working/
maybe you can get it working on windows
maybe 
it's just 10 times less effort doing dev or AI stuff on linux
I didn't have the motivation to work out how to get it working on windows
the way SD 3.5 mixes realistic with fantastical is cool
I don't know anything about anime but some people mixed anime with photographic too
hmm maybe anime loras are ok
nah I am not gonna try
ERROR: triton-3.0.0-cp311-cp311-win_amd64.whl is not a supported wheel on this platform.
go ask @naive sparrow about that
thanks but I will stick to trt, it doesn't seem to be that significant, at least on windows
colours are amazing
on reddit someone was saying that in some ways SD 3.5 colours can be better than flux
I think SD3.5 has the upper hand when you stray away from realism towards a more artistic look
Flux has excellent realism
it might be that Flux is better for me then, I am still testing
SD3.5L's photorealism has a grittiness or a grain - an unattractive quality imho
personally my goal is to make photos that look real
It also seems to mimic M3DB fractals
Yes, photorealism is cool, but 3.5 has an inherent grain (noise?) which needs to be tackled
ah yeah that sounds like a problem
I think Realvis are doing a fine tune currently so maybe that helps
Some people really like that film grain look. But that sometimes is a result of the wrong sampler
Your prompt in my SD3.5 workflow
This is it refined by Flux
I suspect it might be my GPU giving the aberration
Your SD3.5 version is perfect
...or your workflow with that GGUF model
Both 3.5L and 3.5_GGUF_Q8 give that weave lattice aberration - but in Flux its OK
The refined version has lost some contrast
It looks worse in Discord than full size, but I think the refined looks more natural
The refined does look more natural
Another from SD3.5
Probably not helping. But I also found that SD3.5 doesn't like doing multiple stages
Flux is ok with it
It is quite possibly a sign that my six year old GPU is actually on its last legs?!
I am using the ComfyUI default w/f ...
With discord censorship lol
training a battlefield lora for 3.5, 2 epochs, looking good; 1st is a training image, 2nd a dataset img
These are from the same prompt, but with this part removed...
Fibonacci, voronoi, fauvism,
Not in the embed of your images you're not
The single-step seems to be the answer! Bye bye "Super3.5L" - although it works well in other prompts! This one is using the ComfyUI default single-stage w/f - not brilliant as you can see, especially on her lower areas
Using only the T5 Clip seems to work better than all 3 too
A kind of banding on her 'jumper' at the bottom?
The shadow on this one! 🤣
Papua New Guinea head-shrinkers!!!
Yeah, right now with Kijai’s amazing work, it can fit in 24gb vram (instead of 4xH100) at less than 5 sec per iteration, which is very fast.
Your w/f works with my prompt - so I guess it's not simply my GPU which is the problem.
The ComfyUI default w/f may not be robust enough to support the prompt's intricacies?
is it possible to use only t5 instead of the triple clip loader?
Looks great! Do you know if training possible with 12gb?
T5 on its own is fine - clip_l and clip_g not necessary
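Roughly what "T5 only" means inside SD3-family models, following the SD3 paper's description of the text conditioning: the CLIP-L (768) and CLIP-G (1280) token features are concatenated channel-wise, zero-padded to T5's 4096 width and stacked in front of the T5 tokens, while the pooled CLIP-L/G vectors form a separate 2048-dim conditioning vector. Treating a missing encoder as zeros is an assumption about how a frontend handles it, and the 77-token T5 length is illustrative.

```python
import numpy as np

CLIP_L_DIM, CLIP_G_DIM, T5_DIM = 768, 1280, 4096


def zeros(shape):
    return np.zeros(shape, dtype=np.float32)


def sd3_conditioning(clip_l=None, clip_g=None, t5=None, pooled_l=None, pooled_g=None,
                     clip_tokens=77, t5_tokens=77):
    """Assemble (context, pooled) as the SD3 paper describes; missing encoders become zeros."""
    clip_l = zeros((clip_tokens, CLIP_L_DIM)) if clip_l is None else clip_l
    clip_g = zeros((clip_tokens, CLIP_G_DIM)) if clip_g is None else clip_g
    t5 = zeros((t5_tokens, T5_DIM)) if t5 is None else t5
    pooled_l = zeros(CLIP_L_DIM) if pooled_l is None else pooled_l
    pooled_g = zeros(CLIP_G_DIM) if pooled_g is None else pooled_g

    clip = np.concatenate([clip_l, clip_g], axis=-1)  # (clip_tokens, 2048)
    clip = np.pad(clip, ((0, 0), (0, T5_DIM - clip.shape[-1])))  # zero-pad to 4096 wide
    context = np.concatenate([clip, t5], axis=0)  # (clip_tokens + t5_tokens, 4096)
    pooled = np.concatenate([pooled_l, pooled_g], axis=-1)  # (2048,)
    return context, pooled


if __name__ == "__main__":
    t5_features = np.random.default_rng(0).standard_normal((77, T5_DIM)).astype(np.float32)
    context, pooled = sd3_conditioning(t5=t5_features)  # "T5 only"
    print(context.shape, pooled.shape, float(np.abs(pooled).sum()))
```

Note that in this sketch the T5-only case also leaves the pooled vector at zero.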
No, I was pretty sure it was the "Super" w/f that was a problem.
thanks
The ComfyUI default is also poor on my PC - the single-step
There's not a big difference. Could be sampler, steps, not separating the Clip prompts...? 🤷🏻♂️
Your w/f with the same prompt there is no difficulty as far as I can see
Are you using the LLM and Flux refiner as well?
I'll check your clips/schedulers/samplers/CFGs and see if mine compare
Not the LLM, and Flux Refiner just hangs ...
I have this error, do I need to install the sd3.5 vae?
Mine is using turbo model and fp16 T5
SD3.5 vae is in the model...unless you're using a diffusion model
I am using a modified version txxl_fp16 and SD3.5 Large checkpoint
It's interesting to find out why the ComfyUI default w/f goes so awry with my prompt ...
... and yet yours makes it work perfectly?
when it try to decode with vae i get an error
i should add a node here right ?
I still have some old nodes from original SD3, which have been taken out in latest SD3.5 w/f. Maybe they actually make a difference.
Great idea - at least your version/GPU/PC does it well
Only difference is I'm using the turbo model, so fewer (5) steps
Ah, and only using T5 clip too
Nice improvement! 🙂
"I kinda miss the weave!!!" 🥳
🤣
Portrait Master I have used to make dozens of prompts. Those bracketed values make for some very fine-tuning.
But that's just me
Water-colour look
I think it is, but slowly
... just d/loading SD3.5_Large_Turbo ...
okay sd35 large + Loras works
You can delete these nodes and it works just as well
Also, the results seem to be much better if you only use T5 clip
Mochi1 is already probably sora level
Mochi is amazing but I think people forget how good Sora was
oh okay thanks
I only just found out too, I'd been using them up until about 30 mins ago 😄
sora outputs were heavily cherry picked, upscaled, enhanced. Sora outputs 480p at default.
hehe
it's not just resolution, Sora was ahead in the scenes also
but its hard to compare as we don't really have benchmarks for this
Same seed and prompt as this image, but changed scheduler from sgm_uniform to beta
GTM SD3.5 (modified) w/f
Softer and a tad warmer
Scary
Sora's size is also probably massive, it took 10-20 min to generate a short scene on openai gpus.
On fal’s gpu clusters, it just takes 1 minute to generate with mochi.
Sora heavily upscaled, enhanced and cherry-picked its outputs. It’s not very fair to compare it to mochi then. Also, the same heavily enhanced outputs on the sora page don’t seem better than mochi.
What I don’t like about Flux is that it adds pronounced facial wrinkles to female characters, whether they’re young or old. Even children don’t look like children but like dwarfs. Of course, you can tweak the prompt, but that’s not a real solution
Flux skin is preternaturally oily/sweaty and with obvious pores
I like the muted tones of SD3.5 for skin in comparison
It also has a tendency to do duck lips on women 😄
Portrait Master prompts in modified GTM 3.5L workflow
I'm not really seeing it with the mochi examples, seems worse than Sora to me personally
But it doesn't really matter as I believe this sort of thing should be compared using benchmarks rather than taste
Yeah, you’re correct about that. Benchmarks are probably a better measurement than my taste or your taste.
my taste is unusual anyway
I don't like Flux for example, but Flux is clearly preferred by average person
For me, at the moment, Kling is the best video generator
3.5 with battlefield lora xd
Portrait Master prompts in SD3.5L
xd
We need mixed model . 😁
hehe
Turbo_Large_3.5 w/f
The "posh" end 😉
didnt test, with 3.5 large you can use
i should make a bot that collects all the cool images on this chat and then posts them on civit lol
farm those likes, a lot of these images are high quality stuff too
lol is that in reference to what i said?
Yes
I'm going to tryout OmniGen in the new SD Next (just released today!)
ah yeah that sounds awesome
I really think models like OmniGen are the future
and here i am still trying to figure out how they make such epic images, both the concept and the technical method; it's killing me
clownshark batwing, torcello, galaxytimemachine, youfunnyguys are among those who have that secret recipe that im trying to understand
hey boto, how did you come up with the idea of merging a bunny with a teapot? just the prompt, or some specific tool you used?
that's kinda along the lines of these creations (link to a #🆕|sd3 message)
it feels to me there is more than just prompts behind creating those ideas
Just random stuff thrown at a wall
It has the comfy workflow for the curious
thanks, that's what i've been doing with random ideas, but apparently i can't compare those methods of mine with what these guys are creating, im missing out on something that i don't fully understand yet.
what they are doing seems meticulous and targeted
and what they are creating is not something from ideas alone, they must be using some method/tools
are you guys getting anything good above 1024px? (with sd3.5)
was used to flux just casually doing 2k but with this one it's just noise everywhere
yes
I mean like specifically where both dimensions are above 1024px, not like 1536x768
i would recommend using SDXL Resolution node to choose any of those set sizes then maybe upscale later by 2.x
if you are choosing resolutions loosely and randomly that might have issues
they mention using resolutions of about 1MP with sides rounded to multiples of 64... but dw about those, just use the node i mentioned
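A minimal sketch of the "about 1MP, sides rounded to multiples of 64" rule mentioned above, assuming a 1024x1024 pixel budget and rounding *down* so the result stays at or under that budget. The function names are hypothetical, and this is not the SDXL Resolution node's actual code; it just reproduces the kind of sizes such a node offers.

```python
import math


def snap_down_64(value: int) -> int:
    """Round a dimension down to a multiple of 64 (never below 64)."""
    return max(64, value // 64 * 64)


def resolution_near_1mp(aspect_w: int, aspect_h: int,
                        target_pixels: int = 1024 * 1024) -> tuple[int, int]:
    """Width/height close to target_pixels for the given aspect ratio.

    Integer math is used so exact cases like 16:9 don't get nudged
    below a multiple of 64 by floating-point error.
    """
    height = math.isqrt(target_pixels * aspect_h // aspect_w)
    width = height * aspect_w // aspect_h
    return snap_down_64(width), snap_down_64(height)


def meets_1mp_64_rule(width: int, height: int,
                      max_pixels: int = 1024 * 1024) -> bool:
    """Both sides multiples of 64 and total pixels at or under the budget."""
    return width % 64 == 0 and height % 64 == 0 and width * height <= max_pixels


for ratio in [(1, 1), (16, 9), (2, 3)]:
    print(ratio, resolution_near_1mp(*ratio))
print(meets_1mp_64_rule(1024, 1536))  # divisible by 64, but 1.5x the budget
```

Note that 1024x1536 is divisible by 64 yet fails the check because it is over 1MP, which is the distinction being argued about a bit further down.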
is 2.5b supposed to be trained on higher res?
seems trained with 1080p but I am not sure about 2k.
I know I can stick to safe resolution and get 1280x896 image with sd3.5 but that's not what I'm asking, I can do 2048x2048 with flux in 1 pass, no upscaling required
i have done some extremely wide reses and that seemed to have worked but still those have to fall between the trained sizes
sure, but they will look like ass, you can downsize the blocks with kohya's fix or whatever but that's still not native res and the details suffer
the above is also upscaled 2x with esrgan from what I can see
no they wont look bad
most of the later sd1.5 models create brilliant images
im rendering one at 2048x768 on sd3.5
then upscaled by 2x
beautiful blonde warrior princess in tattered clothes. epic mountains in the background.
if you were to choose sizes randomly that aren't multiples of 64, like 2048x(random), that might not work
that is not the issue... but also notice the artifacts on left and right
what were you asking?
none of the models would take random resolutions and produce coherent results regardless of versions
as for the artifacts you could prompt around it, i just did a basic prompt without bells and whistles
look at your prompt at 1024x1536 vertical basically, that is a 64 divisible resolution, sigh
the artifacts above 1mp are horrible
I just said it's divisible by 64
ok let me try
proportion seems a bit off
yes i see that
not sure why it's blurring out at the bottom edges but that image dimension is workable
flux has absolutely no such issues
I'm not even on turbo, large 3.5
would expect turbo to be even worse ig
i wonder about large regular, copying it into my ssd
flux was trained up to 2mp images
looking at sai's charts I was under the impression sd3.5 was supposed to be better than flux but I guess that's with 3 asterisks and only at select resolutions
quality is below flux, prompt adherence is better, as the chart shows
1024px is still a very good metric to judge AI image models on. Nobody generates above 1MP anyway unless they want to wait 30 minutes for an image
yeah a possible work around would be to generate within lower res then upscale
sounds like excuses 🤷
its not an excuse, its a technical approach
this is large regular btw
at 1024*1536
well yeah, already tried it so no big surprise there, it's still smudgy
SAI knows that the average person doesn't generate above 1MP, so why train it to generate above 1MP? It's like when devs know that 98% of people use Windows, so why make a Linux version
I don't follow that logic, as an average person I generate above 1mp, bfl also released models capable of generating above 1mp to the public full of average people
why not generate at 768x1152 then upscale by 2x?
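The arithmetic behind the "generate at 768x1152, then upscale 2x" suggestion, as a small sketch. `two_pass_plan` is a hypothetical helper, and the "within 1MP and multiples of 64" check simply restates the rule mentioned earlier in the chat.

```python
def two_pass_plan(base_w: int, base_h: int, scale: int = 2) -> dict:
    """Sizes for a 'generate small, then upscale' workflow (hypothetical helper)."""
    final_w, final_h = base_w * scale, base_h * scale
    return {
        "base": (base_w, base_h),
        "base_pixels": base_w * base_h,
        # The first pass stays inside the ~1MP, multiple-of-64 range.
        "base_within_1mp_64": (
            base_w % 64 == 0
            and base_h % 64 == 0
            and base_w * base_h <= 1024 * 1024
        ),
        "final": (final_w, final_h),
        "final_pixels": final_w * final_h,
    }


plan = two_pass_plan(768, 1152)
print(plan)
```

A 2x upscale quadruples the pixel count (884,736 to 3,538,944), so the diffusion model itself only ever works under 1MP. Whether the upscaler recovers the detail of a native one-pass render is exactly what's being debated in the replies.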
I have 2 RTX 3090s and i don't generate above 1024px so your argument doesn't stand
i'd also like to make apple juice out of an orange
using turbo ... at 768x1152 then upscaled by 2x
why would I want to upscale when I can generate in 1 pass with superior details... but okay, you guys have fun, I already know everything I wanted to know about the model
why wouldn't you, when you are bumping into technical issues with how the model was trained?
because there is a better model that can do it out of the box, it would appear
and why would you even generate a large image in 1 pass when you can optimize time/speed by the method i mentioned
If you don't like the model's base image proportions, just train the model to generate above 1MP; rent a couple of GPUs and start training
ok gl you are arguing over things that dont make sense to me
Mochi1 can gen a video of someone watching a video lol. Amazing open source model for text-to video, but need image to video now.
...or I could just continue using the model that can generate above 1MP out of the box because this new one doesn't seem to bring anything new to the table despite what the promotional materials made me believe?
I don't know, just a crazy thought
Just use what you like. People are still using sd1.5 nowadays. New model wouldn't stop people to use old model.
and at least bring the requirements down to like an H100 XD
Thanks to Kijai, it can actually fit on a 4090 or 3090 with fp8 now. Maybe even less now since q4 ggufs have come out.
that's truly mind-blowing then
At the entrance of a dimly lit cave, a towering, majestic dragon with sapphire-hued scales glistens in the faint light. The dragon stands tall, holding two crystalline prisms in its claws, angled precisely like those in the reference photo. The sunlight streams through the cave entrance, hitting the prisms at specific angles, causing vivid, realistic beams of light to split into a spectrum of colors, casting a radiant rainbow on the dusty ground. The surrounding area is shrouded in partial shadow, with the play of light and dark creating a mysterious atmosphere. The dragon’s intelligent, piercing eyes gaze at the viewer, offering a silent challenge: solve the ancient riddle of light and shadow. The cave walls are rugged and dark, with faint engravings hinting at forgotten knowledge. The overall mood is one of mystery, magic, and high-stakes intellect, as the dragon stands guard over the path forward.
SD3.5 Turbo
Anyone getting decent generation times for SD3.5 Large with 12GB VRAM?
it needs about 16GB of VRAM to run without taking a long time, but it should work fine with 12GB, it'll just be slow
K thanks, I have 24 but this guy on Reddit is saying it takes him 6 minutes for 1 image which I thought was long, I'm guessing he needs a smaller version of the model
takes me approx 30s with large regular on my rtx 3060/12gb
and turbo takes 10s
Comfy/Forge?
Yeah he has 64GB RAM but it's a 4070 super, dunno what he's doing wrong
that should be fast enough unless he's multitasking with GPU-intensive tasks
i have dual monitor setup and mostly have youtube playing at 1080p while i render with turbo
i'm guessing he's got his system setup wrong and his CPU is trying to do most of the work without him knowing. unless he's deliberately trying to post false information for some odd reason
Yeah I asked for workflow screenshot, we'll see
that's the kinda res i use... so that's 2k
if your friend has heavy workflow that could be one reason
Your unet/diffusion model file is about 16GB?
Ohhhh lol that explains it, he's using 16GB main model I think
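Why the main model file is around 16GB, and roughly what the smaller formats save: a back-of-the-envelope sketch. It assumes SD3.5 Large has about 8.1B parameters (the figure from SAI's release, treated here as an assumption) and counts only the diffusion model weights. Text encoders (T5-XXL alone is several GB), the VAE, and activations come on top. The GGUF bit widths come from the Q8_0/Q4_0 block layouts (32 weights plus one fp16 scale per block). Real GGUF files often keep some tensors at higher precision, so actual files differ somewhat.

```python
# Effective bits per weight for each storage format.
BITS_PER_WEIGHT = {
    "fp16": 16.0,
    "fp8": 8.0,
    "gguf_q8_0": 8.5,  # 32 int8 weights + fp16 scale = 34 bytes per 32 weights
    "gguf_q4_0": 4.5,  # 32 4-bit weights + fp16 scale = 18 bytes per 32 weights
}

# Assumption: approximate parameter count of SD3.5 Large's diffusion model.
PARAMS_SD35_LARGE = 8.1e9


def weight_gb(num_params: float, fmt: str) -> float:
    """Approximate size of the weights alone, in gigabytes (1e9 bytes)."""
    return num_params * BITS_PER_WEIGHT[fmt] / 8 / 1e9


for fmt in BITS_PER_WEIGHT:
    print(f"{fmt:>10}: ~{weight_gb(PARAMS_SD35_LARGE, fmt):.1f} GB")
```

By this estimate an fp16 file on a 12GB card cannot sit fully in VRAM, so part of it gets offloaded to system RAM, which is consistent with the slow times being reported.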
but even with 16gb original file from SAI that didn't take me longer than 15s with turbo
He's not using turbo at all so that's a different animal
the 16GB model shouldn't have any impact on your 64GB of RAM that could affect rendering
yeah, he's using regular like you mention, but that would take 30s max on a 40-series card
people are saying fine tuning sd3.5 would be fairly easy, and i can't wait for it to fix some of the obvious artifacting
20 seconds turbo and about 3 minutes regular ...
Werewolf transformation by mochi.
omnigen + mochi = movie studio?
we need better upscaler for sd3.5
the hack of using sdxl to refine is not cutting it
GGUF and I don't seem to be getting along 😦 Are there new special clips to add? Recommended settings?
im gonna stop upscaling until then
can i see your workflow?
the image just above is with gguf
umm, are you using sd3 or sd3.5?
your workflow is not correct then, you need triple clip loader for sd3.5
does a regular 3.5 workflow work with gguf as well?
here is a workflow you can try, just replace the clips with clip g, clip l, and t5xx that you have
There's probably also new 3.5 specific clips out now too isn't there? 🙂
not really, what i'm using has been around a while; you don't strictly need them
but by using the ones im using those are supposed to give slightly better results
not by a huge amount
what is the Long ViT CLIP I see all the time in workflows? Is it better than G?
its made by a guy who works for Glif lol
it's not for G, it's for L
that upgraded clip l gave me worse images on an SD 1.5 checkpoint when I tried it
so just to warn you it might be a downgrade rather than an upgrade
@bitter hearth i come back and i see you're still spreading the gospel on better clip models, it's good stuff im a big fan
i have no idea about sd1.5 lol... i would think those might have compatibility issues
Clip l is the normal clip for SD 1.5 though
i havent had bad results with that long clip using it on sd3.5 or flux
did you try the upgraded clip L or the upgrade longclip L? the author does cite that SD should be compatible
can't remember which
i didnt know sd1.5 needed clip loader 👀
how would it read your prompt without clip lol
well i mean i never had to specifically use a node for it when i used sd1.5 on comfy
from left to right
- realismBYSTABLEYOGI_ponyV2FP32.safetensors
- sd3.5_large_fp8_scaled.safetensors
- flux-dev-de-distill-Q8_0.gguf
- acornIsSpinningFLUX_devfp8V11_Q4_0.GGUF
yes
i also didnt think sd 1.5 were diffusion models
hmm
what did you think sd 1.5 was
it's not required, but we're talking about upgrading your setup with a different model. it's an optional node you can add to change where CLIP comes from; by default, CLIP just comes from the Load Checkpoint node
yes that's what im saying, i didnt have to load clips separately for sd1.5 but i understand it requires clips
You actually have to install it instead of just download 1 file and place it in a directory?
sd3.5_large_fp8...aled | 🌱 2439241852 | 🦶 25 | 🦮 3.5 | 🎤 euler | 🗓 10/24, 3:25 PM | ⏱️ 143s
download
goes into model/clip folder
my friend tried the Longclip model in Forge as a drop in replacement and he can confirm it's not compatible with forge yet
3.5L Turbo LLM
sd3.5_large_fp8...aled | 🌱 400971331 | 🦶 24 | 🦮 3.5 | 🎤 dpmpp_2m | 🕦 sgm_uniform | 🗓 10/24, 3:28 PM | ⏱️ 90s
too bad the hands in that pic with the pink-haired girl and the girl in white jeans look so mangled
sd3.5 is better left un-upscaled
lol yeah, those are some of the anomalies that need a fix
i think it's safe to say that SAI has redeemed themselves with 3.5. whereas 3.0 was a debacle, they managed to push out something super decent that's exciting to work with and rivals Flux in a lot of ways
they had no choice 🙂
people were literally giving up on SAI before sd3.5, especially with the arrival of flux
well the other choice was to crawl under a rock and just become forgotten
afaik SAI was dead, 3.0 was the last model, and they had just shrugged and given up; it's nice to see they still have some decent talent making good models in there
model-wise, sd3.0 was an utter mess and unusable for any work, and then they had a ridiculous licensing policy that backfired; the community was way pissed with SAI
stuff like hands is just minor quirks that can be fixed by loras, adetailers, or whatever post-processing people want to use. overall tho I much prefer SD3.5's illustration aesthetics over Flux's. I think flux is still king for realism, but I'd give the crown to sd3.5 in the anime/comics/cartoons category
sd3 could have been much nicer if, you know, they didn't try to "protect" things with their trust and safety
literally the only issue
yeah dont try to censor what is fundamental to life 🙂
i'd say the three downfalls of 3.0 were the quality, the licensing, and what CivitAI did to hinder its popularity early on
it wasn't any more censored than the first release of 1.5 or sdxl though right?
strictly speaking about censorship, maybe not, but the model dataset took a bad hit
i agree with their out of the box support for nipples in 3.5 whereas 3.0 insisted on a blank chest, i also agree with not training it for genitals and let the community handle that
i mean, i don't know, but their efforts killed normal clothed generations, whereas with sd1.5 it was still mostly usable
i dont understand the discrimination towards what's a fundamental aspect of life
ya its weird
i mean there are surely some aesthetic concerns to it but with proper female models that can be aesthetic enough
i think there's something to be said about having corporate models or models that can be used in enterprise or restricted situations where it's not really okay to be generating full blown porn images lol, i think if we can all agree that nipples are universally okay then we're off to a good start, there's a fine line between building a pervy tool and building a creative tool
is there any crash course material for your image render secrets?
my favorite one from your set is the terminator skeleton dude
awww and there's no workflow data in the image 😦
not at all lol
But also, even using the old clips produced this. Think I'll try a different workflow.
You're very kind... 🤭 Here is a workflow... if you want to know my production setup. I use this for 99% of all my images.
If you need an explanation of the different parts, just let me know. It's fairly easy to follow. Right now it is wired for random prompting.
looks like vae issue
Thank you...
I downloaded the sd3.5 vae...
thanks a lot, will go through it, cause i've been eyeing your work for a while, tried to look for methods that apply to similar image generations, but havent had luck, and i was thinking there has to be some specific technique to it
Nothing special, I think 😄
But happy to share whatever I have ...
exact workflow i gave you .. 🙂
did you change clip to triple clip
Turbo 3.5L + LLM
This is what it looks like in normal operation 😛
Anyone have a 3.5 GGUF workflow embedded in an image, that works? 😄
looks complex, but i'm excited to study it over a cup of coffee/tea later in my quiet time 🙂
I was trying this one:
you could just copy the message link for the one i originally shared; that's the simplest workflow, so not sure where you have things going wrong
that looks OK; play with CFG in the 3-4 range, 1 won't work well, which is probably why the pic looks like that
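Why CFG 1 behaves so differently from 3-4: the standard classifier-free guidance formula blends the positive-prompt prediction with the negative/unconditional one. At a scale of 1 the result is just the positive prediction, so the negative prompt has no effect. Higher scales push the output further away from the negative prediction. This is a general sketch with toy numbers, not ComfyUI's code; real predictions are latent tensors, not two-element lists.

```python
def cfg_combine(cond: list[float], uncond: list[float], scale: float) -> list[float]:
    """Classifier-free guidance: uncond + scale * (cond - uncond), elementwise."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


cond = [1.0, 2.0]    # toy "positive prompt" prediction
uncond = [0.0, 1.0]  # toy "negative prompt" prediction

print(cfg_combine(cond, uncond, 1.0))  # same as cond: guidance does nothing
print(cfg_combine(cond, uncond, 3.5))  # pushed away from uncond
```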
im genuinely curious btw, i can't think of a reason for the blurred image you are getting
i would double check the node files you are using
models, clips, vae
Sounds like a plan 😉
I upped it to 30 as well 😄
No worries though, I found one that is working very well
I like how it's neatly organised, while clownshark's workflows are like a whirlwind
I think mine are sort of halfway between the two
@sacred jewel i might need gpu upgrade from the look of that workflow..
i was looking into it... lots of gpu intensive tasks in it
you could tile
tiling reduces VRAM loads
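Why tiling helps with VRAM: each overlapping tile is processed on its own and the overlaps are blended afterwards, so peak memory depends on the tile size rather than the full image. The sketch below covers only the tiling geometry. It is not the code of any specific tiled VAE decode or tiled upscale node, and the 512/64 defaults are arbitrary assumptions.

```python
def _tile_starts(size: int, tile: int, stride: int) -> list[int]:
    """Start offsets along one axis; the last tile is pushed flush to the edge."""
    if size <= tile:
        return [0]
    starts = list(range(0, size - tile + 1, stride))
    if starts[-1] + tile < size:
        starts.append(size - tile)
    return starts


def tile_boxes(width: int, height: int, tile: int = 512, overlap: int = 64):
    """Overlapping (x0, y0, x1, y1) boxes that together cover a width x height image."""
    if overlap >= tile:
        raise ValueError("overlap must be smaller than tile")
    stride = tile - overlap
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in _tile_starts(height, tile, stride)
        for x in _tile_starts(width, tile, stride)
    ]


boxes = tile_boxes(1536, 2304)  # e.g. a 768x1152 render upscaled 2x
print(len(boxes), boxes[0], boxes[-1])
```

Smaller tiles lower peak memory but mean more passes and more seams to blend, which is the usual speed/VRAM trade-off.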
Ha! Nah. I am lucky to have a 4090 but I bet if you replace the unet with a smaller one, the results are the same.
yeah, I always use FP8 or even NF4 and it's the same
This one looks like the regular 3.5 large, then also the GGUF?
it seems that SD 3.5 turbo comes out better than Schnell for some reason
even though its the Schnell equivalent
the realvis schnell was better
I don't think it will get that much fine tuning attention
but if it did, I think its possible for Schnell to catch up to dev
with the right loras and nodes, SDXL 4 steps is very very close to regular SDXL
Skynet Cyberdyne Systems LoRA
(a little YFG SpyWorld50s LoRA mixed in)
I think u can load GGUF via the UNET Node; and 3.5L via the Checkpoint Node. Both options are available
The second one is from my system using Torcello's workflow, but config up to 30 (it wasn't good at all at 8). The first one is from Mage.
It worked out as is fortunately (but config up to 30)
now clearly my prompting sucks in this case lol
OmniGen running locally! 😄
On SD Next?
Turbodong is that you?
🥳
noway
I'm getting about 30 seconds per image on 4090.
It's slower for image to image though. (Using an image in the prompt.)
LOL, well, not sure it's a boy... Could be, I guess. 😛 It changed the hair color.
Okay, you can't necessarily give OmniGen instructions like "rotate this coffee mug". You still have to prompt it with a description of the image you want it to generate, but your prompt can include images for reference.
Success! 🎉 She went clubbing.
Wow it really works.
And it's MIT license. I hope someone is willing to finetune it.
that's really neat 😄
sure is
It's moderately uncensored. (Did a very lite test though. Your mileage may vary.)
I can turn all the anime waifus into ZOMBIES! 😄
Pose transformation and facial expression change.
Glif should add OmniGen 😄
tremendous
Render at twice the speed using 20 steps.
And the results are slightly better???
(Might be lucky seed / better with anime vs. realism etc.)
I guess character consistency has been (extra) solved?
Not 100% sure yet. Need a more detailed costume.
does it work for "photos" as well?
Testing photorealism with a detailed costume prompt now.
Those look good!
Negative prompt: background blur, bokeh, illustration, photo, pencil drawing, crayon, photorealistic, anime, video game, CGI
euler, simple scheduler, 2.4 cfg, modelsamplingsd3 left at 3.00
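What "modelsamplingsd3 left at 3.00" roughly does, to my understanding: SD3-style flow-matching samplers remap each noise level t with shift·t / (1 + (shift − 1)·t). A shift above 1 keeps more of the steps at high noise, where composition is decided. The evenly spaced t values below are a simplification and not ComfyUI's exact "simple" scheduler; the function names are hypothetical.

```python
def shift_sigma(t: float, shift: float = 3.0) -> float:
    """SD3-style timestep shift; shift > 1 spends more of the schedule at high noise."""
    if shift == 1.0:
        return t
    return shift * t / (1 + (shift - 1) * t)


def shifted_schedule(steps: int, shift: float = 3.0) -> list[float]:
    """Evenly spaced t from 1.0 (pure noise) down to 0.0, then shifted (simplified)."""
    return [shift_sigma(1 - i / steps, shift) for i in range(steps + 1)]


print([round(s, 3) for s in shifted_schedule(8, shift=1.0)])  # unshifted
print([round(s, 3) for s in shifted_schedule(8, shift=3.0)])  # what 3.00 does
```

With shift 3, the midpoint t = 0.5 maps to 0.75, so half the steps are still spent in the noisier part of the schedule.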
I'm trying artists
it seems Zdzisław Beksiński somewhat works, but it just melts everything, which makes it pretty much useless
and sometimes Claude Monet creeps in if I prompt for it
I just prompted with István Csók this time but idk if it's even doing anything
there's a few left that work? nice 🙂
