#🆕|sd3
1 messages · Page 109 of 1
I am a little worried about the viability of training SD3.5 vs flux, mainly because I think Flux is so effortless to train because it knows so much so well. Sure, Flux has some lackluster default aesthetics, which I would say SD3.5 has better, but SD3.5 also looks like a hot mess in some areas
Training aesthetics on a smart model is way easier than training concepts/quality into an aesthetic model
time will tell, honestly
lol sd3.5 is better than flux at nudity
so it is not censored
it just lacks dpo, as Lykon said
@noble coyoteI have plans to train my own Flux model for general photographic realism. Professional, amateur, just generally realistic looking images. Something a lot of people in the AI community mess up 😅
Oh lord, I forgot about him
Mine time away from the SD server has healed me lmao
This server is a lot more clean and organized than I remember. Did they trim it down and streamline it?
I have been checking for sd news for a while and didn't notice anything, only flux)
how can i add img2img inside comfy?
how can i add img2img inside comfy?
those are cool images in concept, but they just look so... Ugly and un-refined to me. I hope thats something that training can fix
very noisy
are they just not fully refined?
Straight out of SD3.5
the way to find out for sure is ... go train it and see
I have plans to when I know of training tools with good and proper support. I have the datasets to do it no problem
yea, it is unrefined and not directed to certain style, that is the point of base, for us to finetune as we want
if it trains even half as effortlessly as Flux, we are in for a treat
I'm going to try Super SD3.5L and see if I can get a sharper look.
Prompt: flowers galore; Rob Gonsalves; hyper-realistic,hyper-detailed, fantasy, cosmic art; elegant, intricate, detailed, extremely textured, colossal, monstrous.
I would argue a base should have good concepts/details/understandings of things. Aesthetics are effortless to change, but having to re-teach a whole 8B param model what people look like, and fixing huge deformations, thats more of a foundational thing
Changing a coat of paint is a lot easier than rebuilding the whole car
Prompt: an ornate box, open lid, a spider crawling out of it; Rob Gonsalves; hyper-realistic,hyper-detailed, fantasy, cosmic art; elegant, intricate, detailed, extremely textured, colossal, monstrous.
Oh nice, an improvement in the typical SD digital art blur. I saw that improve with cascade as well
people will take the shittier base car an work on it when the license suites them better
Three bubbles with smoke in different colors, by artist "Yoshimasa studio lighting"; ray tracing global illumination, octane rendering, 8k resolution, Unreal Engine 5, hyperrealistic photography, complex details, minimalistic, back lighting
the prompt adherence leaves a lot to be desired, but the aesthetic is consistently "SD"
these are straight out of 3.5. just pure base model. nothing else. 40 steps
give me a prompt and i'll run it
Lets try it against my validation prompt for my flux realism tune, just to see
A cinematic wildlife photograph of a black leopard perched up in a tree in the jungle. Foliage, plants, vines, tree bark, detailed, photorealistic, photographic style
I wonder how long until we have usable training tools
SD3.0 was distinctly EXCEPTIONAL at photographic realism, so I wonder about 3.5
I assume its aesthetic neutrality will make it worse
ah yeah, thats a huge reduction over SD3 for sure. Much better than base flux for photographic realism tho
got another prompt?
Landscape photograph of a broad and dense cobblestone village in Europe, detailed, dense houses, fountain to the right side, carniferous trees
the more I look at this, the more concerning issues I see. I SERIOUSLY hope it will be easy to fix the deformations and incoherence
the improper look of the black leopard also makes me worry about subject overfitting
oh, this one is not too bad actually. Not too far off of flux at all for this one
it is fluid and very trainable. this is base model, no tweaking.
Yeah it’s definitely not fair comparing a bigger finetuned model to a slightly smaller base model.
Only fair comparison is base model and another base model really.
okay, that sounds fair but I still think it is okay to fix, especialy when model is so unrestrictive
Just because a base model doesn't mean its gonna be easy to train out huge issues. Trust me, I trained over a thousand LoRA's for SDXL, and that model was a hot mess to fix certain issues in
An excellent catch-all prompt = Stock Photo or Surreal Stock Photo
The unrestrictive thing is a bittt of a moot point now, but I do like to see that they are being less closed with this model. They kinda didn't have a choice haha
this image shows the most promise so far, I'd say
that one is less promising haha
good variance tho
one thing I DON'T see a chance of improving is SD3's limited/low res generation cap
but, hopefully upscale will be as good with SD3.5 as it was for SD3 medium
that might help bridge that gap with flux
Love seeing all the 3.5 examples, keep em coming. 🙂
here - https://huggingface.co/stabilityai/stable-diffusion-3.5-large/tree/main - go get the code and look through it
very high res generations have been something I have had a lot of fun with using Flux
got a prompt? i'll run it if you like
I already had a look through most of this. Why the send?
Not really, just a busy and haven't had time to try it, so this is giving me a good feel for it's capabilities
I'm not sure what the code has anything to do with a physical limitation of an architecture/model
I'm always interested in it's recognition of characters... Sounds like it's been heavily obfuscated
in case all you'd seen was the repo comfy posted this morning
ah ok, cool, thanks
it has not. give me a prompt
I thought it was in response to the low res worry
Well I generate a lot of Tifa Lockhart and Aerith Gainsborough 😝
Flux is absolutely in a league of its own when it comes to high res generations, however, I imagine that upscale workflows will get SD3.5 there as well
tho, running upscale workflows are a lot less seamless than just running one native high res gen, so there are tradeoffs
who? i dunno if it knows who they are, but let's try
Honestly I didn't even think about SD at all anymore... I'm very happy to see you pulled through and relased 3.5. We'll see if large and medium can actually keep what the comparison graph promises.
I haven't been able to have any fun just yet upscaling with flux, since the ultimate upscale node in comfy is suddenly and inexplicably broken in comfy now
LOL probably not
It probably knows batman 😉
on the left is elon musk, on the right is mark zuckerberg,
Prompt: Tifa Lockhart and Aerith Gainsborough shaking hands
I agree as well. I do not need SD3.5, as Flux is already phenomenal, but I would love to see more competition and be proven wrong. Its liberating to not rely on one company for good results
Rene Magritte and Lowell Herrero are a great artistic combo in a prompt
I like sketchy, ink sketch style stuff.
Yeah that's rough 😕
Prompt: sketchy, ink sketch style Tifa Lockhart
Lets try iconic IP. Can we try "Toothless from How To Train Your Dragon on a rock to the right side of hiccup at night time, dark, underexposed, dreamworks style"?@craggy crest
thing with that prompt is that tifa is a character in a video game and the model is probably confusing it with real life* toy figurines. might have to specify art or game or cartoon, mabey anime
Well... if 3.5 is finally actually capable of good finetuning... Flux suffers a lot because of the bleeding issues.
oh yeah, thats valid
It has a vague understanding but definitely obfuscated
you can't use stable diffusion webui for sd3 and sd3.5 as generator?
that was one of the main focus points of this release - make sure it is trainable
I have been training flux as of late and have found it exceptionally easy to get good results with, without bleeding. I am not sure where people see issues with that
I will say, I was using Kohya for a few weeks before AI Toolkit, as it had issues on windows. Kohya sucks MAJORLY for training flux, and will not give anywhere near as good results with the same settings
Just comfy right now I think. And probably swarm.
they're going to have to have a chance to code in support for it
not bad
not too bad actually
swarm runs comfy, and comfy released support for it this morning
ohhh...
big fan of httyd lol
Honestly I'm exhausted with maintaining all these front ends and models. I'm at a point where I just want to pay for a service online that has better hardware than me anyways. 😝
I wonder how long until we have viable training options for SD3.5. I will try it when its stable
fun prompt
now
it is stable
interesting that it seems to still struggle with background characters like hiccup here, i noticed that with other image gen models as a whole too. mabey this is more precisely describing what the devs meant when they created the refiner for sdxl
Its not that bad at toothless, very nice
in what, specifically?
How about Dumbo?
I still didn't try AI toolkit yet. I definitely have to try that one next.
3.5 medium won't be much better than 3 medium
Down Under Manhattan Bridge Overpass

it's a base model - so a lot of stuff like that is probably something you want to train a lora specifically for
I have trained over 1000 LoRA's for SDXL and 1.5 in Kohya, and let me say, the results I get with AI Toolkit are ASTRONOMICALLY better
How's the 3.5 speed compared to Flux Dev?
like, here
Similar
wrong
base flux vs a training I did with 1K images for 12k steps in Kohya
My 64Gb RAM and 8Gb VRAM manage a 1024x1024 in under two minutes both Flux and SD3.5
vs a training I did with 30 images in AI Toolkit at 1000 steps
(sampler was changed)
Not quite! 🙂
I am using massively less data in a fraction of the time of Kohya, and getting astronomically better results
That is ON Manhattan Bridge Overpass LOL
I will not be using Kohya for Flux again
Y'all running locally or online?
flux base vs my result after 3k steps on the same 30 images that fixed that dude above
Flux learns like a BEAST
That's a good one
no images of big cats in the dataset, either
It won't be,
3 med is 2B
3.5 med is 2.5B
Flux is 12B
I don't believe there will be much of an improvement between 3 med and 3.5 med
I am inclined to believe the same, but moderatly hopeful to be wrong
you are wrong.
SD3.5B is most certainly less aestheti-slop than SD3 Large they had in the Artisan's a while back, thats for sure
but what kinda worries me is that SD3.5 doesn't have very great aesthetics OR general information. Flux made up for its bad aesthetics by having huge amounts of information deeper inside
@craggy crest havent downloaded the model yet, could you try out tdg8uu's old dragon prompt? he always made some crazy good ones.
realistic, Ice dragon, desolate, intricately detailed, artistic lightning, particles, beautiful, amazing, highly detailed, digital art, sharp focus, trending on art station
screw it, I can try it on my realism flux finetune for shits and giggles lol
yummy
drop the trending on art station from your prompts. that's never been anything but a dice roll for random data. give me a minute and i'll run this
hahah i never used it myself, looks like tdg did though and at least the ones that he shared always looked great
Thats pretty cool actually
quite square
I think thats just the... res? lmao
hmm looks like its again meshing a drawn artstyle with more of a 3d artstyle work
sure, cause i didn't bother to change the AR from what i was using
yeah, mine is gonna be 1024x1024
tho MAN flux does higher resolutions exceptionally well
Thats the biggest thing I am confident no amount of training SD3.5 is gonna allow it to do
your prompt isn't very specific for style though
it does dragons pretty well though, i remember sdxl was giving me tons of 4 eyed, 4 horned dragons
agreed, im surprised tdg made that pic above with that prompt lmao
listen, we are collectively trying the new thing, not fanboying
i usually use longer prompts
as opposed to one eyed, one horned flying purple people eaters?
@craggy crestWanna try something for me real quick?
sure
now im interested to see a one eyed one horn one
Can you try a 2048x2048x gen? I wanna see if its usable at all or if it just dies
Jesus those ground textures irk me so much lmao
huh, interesting... Result from my Flux Realism training lol
to be fair, my realism training shouldn't really damage other styles since I know how to train
definitely looks like a deviant art pic lol
now that is spooky
i can't, sorry. i don't have a massive machine
very nice details on that eye though, crazy good actually
i'd just use magific to upscale anyway
ah, no worries. Maybe I will try it some later time
the viability of high res with SD3.5 is intriguing to me. I doubt it can make it past 1536x with how unstable it is at 1024x
i usualy don't do more than 512x512 - and then upscale with magnific or topas the few i want larger
There's a lot of MB3D Fractals about this SD3.5 look!!!
I have been consistently blown away by how effortlessly coherent flux is past 2048x
oh that reminds me, is 3.5 trained on larger images? i recall a more recent model of SAI's could do 512x512 images but they weren't great
i figure theyre trained on the same images from 3.0 but its been so long since that release for me lmao
I also wonder if SD3.5 will accept images of all resolutions with no problems like flux. Its insane how diverse of an image res range flux can train on and generate
from 256x to nearly 4k all in one model
I'm ready for a model that can reliability do two or more known characters in the same image from one prompt.
ok, this is fun haha
I like that a lot
I have done a lot of upscale work with images in the past, and I have to say that native high res gens do have a differen't chjarm to them, cause they will try overall more dense features and textures
like a bush at 256x pixels and one at 2048x pixels will have drastically different leaf densities
Prompt: on the left is a green cat with a red ball of yarn. on the right is a blue dog with a green bone. in the middle is a yellow pyramid
I mean of known characters without getting their attributes confused
256/512/768/1024 in flux. You can see how the feature size and natural detail/informtational density changes. Upscaling that 256 image to 2048 will make it sharp, sure, but the detail/feature density will be off scale
not specifically high res but this is my experience with 1.5 too, which is what i mainly use. 512x640 images turn out much better than simply 512x512, which we then scale up
like look at the leaf to the size of the leopard difference
and the size of the bokeh
there's certain things that scaling up lower res images to higher res won't be able to cary over
@split brambletried to gen Vegeta next to Goku, but I spelled Vegeta wrong lmao
omng, littlest petshop lmao
:)
I used to have a friend when I was growing up who LOVED those lmao
I wonder how Sd3.5 does with fine semantics, like plant species and such
SD3.5L artistic look VERY GOOD!
Flux does an EXTREMELY good job with different tree types
ohhhhh, that one looks very very good
thats probably the best looking of the ones I have seen from you
any specific trees?
Great colours
specifically evergreen, carniferous, sprawling, weeping, stuff like that
The realistic photos I did have a strange granular graininess
it does mixed carniferous trees VERY good
Art outpuit seems more solidified
yeah, thats how SD3.0 was as well. Upscaling fixed it at least
Flux has that too sometimes, but it was easily trained out
@split bramblelmfao
cursed
oh god, that is...
I thought we left that early ugly AI aesthetic back in 2022 
wowww
yeah, those look insanely good
exact prompt please?
depends on what you want, really
Prompt: pine forest, deep shadows, sparkling stream - around us we can see lush ferns and small wildflowers. sunstreaks, early morning
like, realistic images/photographs
a lot of that is probably nothing more than tweaking the sampler or scheduler till you get your prefered look
oh, its 100% not that, its just missing data/improper aesthetic tuning
it happens, all models have it
okay. i like the look of this one, personally. but 3.5 is trainable. and it's fluid. it's very easy to work with
I don't think any of that has been confirmed, like... It JUST released
nobody has trained it lmao
i just spent a month and a half beta testing this. it is confirmed
same prompt in flux with my training. I forgot to increase the steps
oh, well in that case, I am hopeful is. Lord knows SD3.5 needs to be trainable. Its a hot mess mixed bag still haha
it is. we made certain of that
looking forward to it then. It has a lonnggggg ways to go to be competitive with flux for training, so I hope SAI put in some actual work this time
This is six-finger-slicing!!! 😄
the better to slice you with, my dear
Really? OMG I didn;t get the memo...
Fee! Fi! Foh! Fum!
I see some small glimpses of hope here, but man, still a lottt of issues. Its gonna need a ton more work than flux, but its gonna be a little more accessible when it has more training optimizations
@craggy crestalso real quick, you didn't say where I can train SD3.5 just yet. I might try it today, even
simpletuner or diffusers
I will say, the shapes and forms here are very nice and good looking. The fundementals are here for this image, even if the aesthetix/textures/details are fucked
but, those are easier to fix than core issues
oh lord. Looks like I am not training it then haha
i think you're going to have to wait a few days for the guys that create trainers to put out public ones for 3.5
I will wait for more accessible trainers, then I'll give it a proper try
Not much different compared to sd3 M
yea, if you want to go the diffusers route, all the code and stuff you need are on the SAI hugging face page i posted the link to earlier
I am WAY too inexperienced with diffusers, and I really do not want my inexperience to lead to bad results that give me a general bad taste
I'll likely wait for AIToolkit, its my trainer of choice as of now. Or One Trainer
Probably not Kohya, definitely not simple tuner, and likely not diffusers
I would rather write off all of SD3.5 than have to use simpletuner. Until then, I will keep making datasets for flux and save them for SD3.5 when its training is available
@craggy crestoh real quick, since you seem to have insight into D3.5
how damaging is multi res training for it? Can it accept multi res inputs, or does it explode like SDXL?
no idea. haven't tried
fair enough. I am gonna assume its not good with multi res given how unstable it is. Most of my datasets should be just fine for that anyways
Boot Up my sd 3.5 training script.
Honestly, I might use SD3.5 to generate some image bases, use Flux do un-fuck them, and then aesthetic tune Flux using it so we have cool aesthetics and a model that can do good details AND extreme resolutions with fantastic prompt following
is this training script available to public, by chance?
https://github.com/lrzjason/T2ITrainer Here is the repo but sd3.5 is not public yet. I need to test it before update the repo.
ah, alright, thank you! Looking forward
always love alternatives to trainers
Basically it is just diffusers script with cache embedding and cache latent
someone's gonna get to learn diffusers ;)
how hard is it to implement model quantization when training?
first find out if you need to do that
I will not be doing that. Definitively not
you do, there is no chance you don't
don't take something apart before you learn how it works
Very small dataset with fp16 weight
16GB Unet with a 10GB TE, and a .5GB VAE
you don't know that, yet. you haven't worked with it yet
cached latents/TE I presume?
yes
then yeah, quantization is for sure needed for more
tho, that is not bad actually for double cache
you cant' know that until you work with what you're starting with
dude, thats not how it works lmao
you don't get to train 16GB in less than 16GB unless you either quantize it, offload it, or do block swapping
yes it is. in every single area of life. learn it first, then take it apart and modify it
Let me just load 16GB into 10GB. I am sure it will work
comfy put out an fp8 version this morning if you're all that worried about size
grab what he created then and play around with it first
Following our exciting V1 launch yesterday, we're excited to share that Stable Diffusion 3.5 is now supported in ComfyUI for local inference. Experience it with our signature node-based workflows!
Just now, Stability AI released Stable Diffusion 3.5, including 3 powerful models:
- Stable Diffusion 3.5 Large: With 8
Oh, I have exacly 0 interest in generating with the base. I am just curious if the base is worth training over flux. I'll be waiting until I can test that for myself
Its gonna have to offer specific benefits over Flux for me to switch over, but I am more than happy to test said benefits
@winged seal , get me Cloud, Aerith, and Tifa all in one image and you're golden 😎
Thats not really stuff I am for or do, personally
I don't really care for that in image gen models cause its easy to do with other means
I just looked, flux didn't really seem to go much further with community loras past a few days after its release. (Compared with earlier models i mean). I'm not sure why, but sd3 may have more potential.
you can't know what the model is capaable of, and thus know what you actually need to train, till you use the base for a while
Has anyone run sd3.5 large on a 4060 yet? Did it take less time than flux dev?
I think a lot of people just don't know how to train it/don't have the resources to. Its really quite easy to, and the quality is much higher than SD3.5 (for now)
Sd3.5 will likely be popular for its accessibility, but I do thin Flux will continue to be higher quality
it will likely take longer for now, until it gets proper quantization
comfy put out an fp8 version this morning. if you can't get the one that we released to run, use his
did you try comfy's yet?
I have seen enough to know it needs a hell of a lot of fixing. Not much else is needed
looking forward to your loras. you might play around with the base model while waiting for a trainer
FP8 is bigger than 8GB, so its gonna have to offload in some way. Having a smaller quant that doesn't need to do that will be a lot faster
FP8 will be good for people with more VRAM tho
how tiny a machine do you have?
I'm not talking about me, just generally
thought you were talking about you. not that many people any more with teeny machines - most have been upgrading their hardware the last couple years
that is insanely out of touch. A monumental majority of people are still on normal consumer GPU's. I know, I work with dozens, if not hundreds of people who are actual normal people
Focusing on it being accessible for other people is about the least somebody can do in this community. Its why I refused to support flux until it was easily accessible for most people
you must not be reading the same flood of 'hey, i want to upgrade, what machine should i get' posts that i have been.
Is SD3.5 open to being "GGUF'd"?
should be
The dozens of posts about that don't account for the MILLIONS of people using these image gen models dude. Again, astronomically out of touch
stop tossing out insults
Its not an insult, its an observation. Most people are on 8GB or less systems. Its like 100% factual
most people trying to generate images that are on those systems are also using online options...
I already made the distinction by saying people with smaller GPU's looking for local image gen, where they can actually control outputs. The entire community is lucky if more than maybe 5% of users have 24GB GPU's
The cheapest one you can get is a used 3090 for like $600 if you are lucky. Most people are on 8GB or less, with some on 10/12/16
I hope mage adds sd3.5 large soon 🙂
i'm sure they will
Sd3.5 large can fit on 8gb vram with nf4, quanto4bit, or gguf 4bit. It should be similar quality to fp8 as well.
Fortunately these days the size of one's own system is nearly irrelevant, since the online resources are faster, and coming down in price. Fal for example
I'm on 8Gb VRAM already with full 3.5L
Lower 4bit quant will be a lot faster bc full 3.5l will use shared ram hence very very slow images. But full 3.5l will be very slightly better quality.
is that really the dystopian future people want? I know I and many others who sure as hell don't want that
I don't understand why people using sd3.5 with 240 sec per gen on local. They could be faster using onlinse service.
anyone know why I am getting this error? RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument scale_a in method wrapper_CUDA___scaled_mm)
yeah, this. I think 6bit GGUF will be best bet for SD3.5 on 8GB, with low VRAM mode
Maybe 5 bit for some extra headspace
nice nails but hand xdd
Just refresh and start again - usually clears it
so is it possible to train lora already? 😮
How big is the diversity of SD 3.5 styles and multi-character interactions? Did they remove the names of all or most classical and modern artists?
also cant wait for controlnets
Its better in flux in styles and artists I believe but flux might be slightly better in multi-character.
kinda I guess? None of it is properly validated just yet
bad
ohhh, this is an annoying one. Are you using ultimate upscale in your workflow by chance? That is a very big tigger for that error
according to their technical paper summary or whatever, it seems to have better score for prompt adherence than flux, but a bit lower score for the aesthetic
which i guess makes sense since flux is 12B
No, using the default workflow in Comfy
I have a good feeling about this...
thats a cool pic not gonna lie
That is the workflow I am using
it's funny how they introduced the model with a woman lying on grass LUL
Might as well rename the channel SD35 XD
This is the picture at Comfy.Blog
Exactly, and is the workflow I am using that is yielding the error
Lykon showed that its anime-style quality was going to be so good that it could create images that look like screenshots.
Can SD 3.5 do that? Dall-E can.
some dude on huggingface says he has problems with the vae? gives black images, is that just him or is it fine? im gonna try it later, busy now
VAE baked in
I'd be happy to get a black image, since that is a black image more than my nothing at all
Show me your workflow 🙂
so are you guys using the baked in all in one version?
That pic is my workflow
Let me see if my PC can do it too?
What do you mean? Even sdxl can create great anime images, sd3.5 can do it better then base sdxl tho.
im curious about 3.5 medium also, they mention it can go to 2 megapixels, and is slightly different than the large model? idk
oliver queen :3
hmmm, I am not sure then
haha nice
You've got no Model Sampling - set it at 3
does it do anime well?
i remember super mario rpg differently xD
@simple thistle is probably generating tons of fennec anime girls as we speak :3
i also hope they release svd2 one day
Where do I fit it?
@dusky thistle did you try some clown sharks with 3.5? :3
Ok, general takeaway. SD3.5 is far from flux, but its kinda fine if its something else. It remains to be seen if it can train anywhere near as good as flux, but maybe it can
Its got a more artistic base, but its also way less coherent and consistent, so time will tell
I look forward to training it, even if my expectations aren't that high
A StabilityAI employee (Lykon or other) showed that SD 3 Large would be able to create anime with this quality that looks like a screenshot (this image is not generated)
as long as it can train some nice loras, il be happy
what error are you getting?
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument scale_a in method wrapper_CUDA___scaled_mm)
isnt there a comfy example that should work out of the box from the comfy sd3 examples page
maybe try that
did you update your comfy ui?
This is the default copmfy on the announcement page. Now with model sampler added
same error
yes
yeah, the SDE samplers are working with it
nice
comfy released an exmaple workflow on his blog page. but SAI also released one on the hugging face page, and they both work out of the box
i'm trying to clean this repo up a bit cuz it's gotten to be bloated with experiments over the last few weeks with nailing down the math for SDE sampling with flux/2b/wetc
yea idk why that dude has problems then...
and giving me the error described
i missed the error. what is it?
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument scale_a in method wrapper_CUDA___scaled_mm)
what is your system specs?
oh. welllllll that's something you'll find other people reporting in the past in the #🤝|tech-support channel
Dude, I run SD3 Medium and Flux with no issues
you using AMD?
RTX 4060
are you using the scaled versions?
Yes
can you try without scaled
with 8GB VRAM, it has crashed before even trying, But sure
8gb vram... man these days that is not great my dude
i would suggest asking @naive sparrow cause it sounds like your CPU is trying to take over
It runs Flux without issues
cause comfy is probably like offloading to cpu and thats why you have that error
with the scaled versions
https://stackoverflow.com/questions/66091226/runtimeerror-expected-all-tensors-to-be-on-the-same-device-but-found-at-least it's a python error - that's a post on stack overflow that might by helpful
I saved a checkpoint while training on gpu. After reloading the checkpoint and continue training I get the following error:
Traceback (most recent call last):
File "main.py", line 140, ...
@errant dust can you run the flux scaled version?
it's not the model. it's a python issue
yea but im just curious
he said he could
well he said flux, not sure if its the scaled version tho
I have no idea what a Flux Scaled version is. I run the OOTB models, super slow, or the GGUF builds
This w/f OK with me ...
can you run this version of flux? https://huggingface.co/comfyanonymous/flux_dev_scaled_fp8_test
if you cant, then your hardware just cant run the scaled versions at all
Flux can be used on like a 4GB GPU now with how well optimized it has been
my 8GB 3060ti runs it just fine when I am training on my 3090
you didnt upgrade to the 40xx cards? :3
It could run, just very VERY slow
possible to run on 4070 super 12gb? someone tested ?
yes, it is
its not too bad for the quality you get
@bitter hearth can you run 3.5 on 4gb vram? :3
It could be sped up using tensorrt but i don't think any 4GB card supports tensorrt
Since sd3.5 large is smaller, it can run run on even less vram then flux lol. It's also faster. 6gb is needed for ok speed tho.
that would take like an hour to compile, no joke. On GPU's with lower VRAM it takes longer and longer
yeah, it is smaller, so it will be faster, but the quality difference is much bigger than the size difference IMO. I would way rather wait for Flux than just go with SD3.5
so wait, 3.5 uses clip g as well? man thats a lot of text encoders
yup, for like no reason lmao
just like SDXL, you can totally not use g, and it works just fine
yea
yeah it should be the same with sd3.5 large too, it was the same with sd3 medium.
I am just not sure if they undid their lobotomization of T5XXL
it seems like they at least attempted to in their write up
not a huge fan of t5xxl personally, this is why im curious how Sana will do it
Didn't google make T5XXL?
update and it might be fixed
comfy 😮
I believe so, but I meant the way they trained normal 3 Medium lead to T5XXL being damn near useless. I just hope they undid that
sad
im not an expert on licenses, does sd3.5 have a "good" license now?
nice
tho, the license isn't much of an issue for Flux now either
imagine flux releases their next model like tomorrow LUL
It is updated. I did so multiple times. Will do it again, so sure
Flux is monumentally ahead of SAI, so I mean... I wouldn't be surprised
they also have funding out the ass compared to SAI
are you updating just comfy or the dependencies as well? cause maybe you need a better torch version
I have business partners that work with BFL
Just the COmfy. I don't use the embedded
yes but i heard latest torch versions are better for some stuff, you might have to update that tho, like at least version 2.4 i think
I have it already
The newest update is rendering! (something for the moment)
😮
Medium will likely be a way better option sthan large. Large isn't really all that good for its size
:D
nice
now go and generate some "stuff", you know what stuff im talking about cough
That is crazy fast though
I bet 10 dollars that when SD 3.5 medium releases its going to get trashed by reddit because it isn't outperforming Flux dev
the classic bottle example
I am a mega square. Meaning I have zero interest in boobies on japanese cartoon goils. 🙂 But yes, now for some fun and experimentation.
sure
I mean, large isn't either, but Medium will at least be viable for most people
But where sd 3.5 small
smol 😦
likely not gonna happen
I think anything below 2B is just too small to be good
wait i forgot, how many params was sd 1.5?
0.85B or 850M
turbo runs in 12 seconds on rtx3060
😮
1B for all, 850M unet
ah yea
thats so not true lmao
there are 1.5 models that can still do certain things better than SDXL and hell, even flux
yea i still use some sd1.5 finetunes
So quick question: how many params is the Large here? 8B?
They will not be good for general, just specific areas tho. But yes, a finetuned specific model can beat models way way larger. Same with llms where even 3b finetuned models can beat gpt4o in certain domains.
Yeah, the mmdit only tho. It has t5xxl encoder which is like 3-4b. Everything combined should be 13b. Flux is 16b everything combined
what about the quality of the pics?
But generally models like SDXL and Flux and 3.5 outperform 1.5, 1.5 wins in very very specific use cases
I am going to guess we will soon be seeing some nice GGUF models
yes, for the unet
nf4 too i guess
GGUF was by far the higher quality of the two, and efficient. Q8_0
nf4 takes way less vram then q8, while being faster then q4. but yes q8 is slightly better quality as its 2x bigger
this shows 8.1B? maybe typo or they rounded down to 8 lol https://images.squarespace-cdn.com/content/v1/6213c340453c3f502425776e/3c173f1c-00d8-47b2-9d7b-f321a452935f/chart+(1).jpg?format=2500w
that elo score tho... just like my chess elo :3
not bad for simple stuff but smooth textures
yea
needs finetunning
gonna be awesome to see controlnets, loras, finetunes and so on 😮
they always work like that for transformers
its always a soft rounding
ah
Already some loras from shakkar lol. They uploaded like 4 hrs ago lol
31.5B becomes 32B, 9.3B becomes 9B
what kind of loras? 😮
not very good ones I would assume 😅
wait isnt shakkar the same group that did the flux stuff or am i mixing the names
I will train SD3.5 when good training options are released
I am not too hopeful, but I am also not gonna fully write it off
Yeah they made a flux controlnet union too, and lots of loras for it. They already trained a few loras for sd3.5
https://huggingface.co/Shakker-Labs
🙏 😳 ❤️
I think shakker had early access to SD 3.5 large since they released the loras a hour after the announcment of SD 3.5 large
yea they prob worked together
likely, yeah
There is a big diff on my system though: with NF4 I was limited to 1024 x 1024 since anything more and it would be slow as a snail. Also GGUF is loaded in blocks and not all at once
Like how latent vision had early access to the 3.5 large checkpoint
can you just select update comfy or update all to run 3.5? I haven't used comfy in a month or 2
When I was in the SAI beta's, we weren't really able to train them, but with how shit SD3 was, so I presume they wanted to have something to go along with it so it doesn't flop as hard
I'm quite the sexy man generator myself
oop
:3
grumble
pngs are usually bad, i like json, it loads without failing
rare 1% of AI users who give any shits about men
well also, I probably have to wait for some lora creation places to popup to create my custom loras 😄
yea i cant wait to train some loras
and be sure to use the fp8 model by Comfy, since the plain model won't load with less than 12GB Vram
hopefully my boy Kijai releases something for sd3.5 how he did with flux
Man, remember when image gen models were made for normal consumer hardware to run? Back when local AI was properly accessible?
It says SD3 but is now updated to 3.5
i will sail with her
ah, as synthetic/plastic as ever
but yea the plastic hurts tho
nvidia has you covered 🤡 https://nvlabs.github.io/Sana/
SD3.5 didn't fix that, thats for sure lmao
The skin or the boobs? 🙂
LOL
everything
I guess I hadn't updated my comfy properly before, or I had downloaeded the wrong thing called sample image
lol. I don't have a limited system. I just meant for other people
Sana gonna be interesting, hopefully "soon" means couple days max... 😦
Nivida cooked with their LLM nemotron 70B model, yet with Sana nivida managed burned the kitchen down so hard they started a wildfire
NVIDIA is actually starting to get pretty decent at AI models. Took them a while
yea Nvidia doing some cool stuff
NO, I had updated, mutipl.e times. Seems a late update after this solved my direct woes
i really hope some mystical magical small model gets released that's good (so there will be usuable video without 4h100 😬 ) but for now bigger seems better :/
Yeah, Nemotron is actually surprisingly good. Their other nemo models were hot garbage
Now awaiting the GGUF builds. 🙂
so glad gguf is available now
what happened to the gguf master the bloke? he died? 😦
would be even better if we get faster acceleration structures
Anyone can build a GGUF model. I just lack the system resources. There is a page with the tools to do it
exl2 would be crazy haha
of course, but i always remember thebloke having all the latest models lol
can you link please? I need to convert some of my Flux trains to gguf at some point
aww
Here you go: https://github.com/ggerganov/llama.cpp
If SD3.5 trains even 1/10th as good as flux, not too bad
when is llama.cpp gonna support visual models... sigh 😦
does llama.cpp just have direct support for ggufing flux now? Last I checked, you needed a special tool
were you prompting for a "Flux Chin"? or is that just Flux?
unfortunately not soon enough
been waiting for the same
When llama 3.1 405B fits on mobile phones, thats when Llama.cpp starts to support vision models
molmo is very good at uncensored stuff
405B that is WILD to me
grumble
becky did you update?
Wait till you hear about the bigger models lol
haha
Fatlama would put us all in a coma
Isn't gpt-4 1T parameters or something?
Then reach out to the author: https://github.com/city96
I did
1.75T
Fat lama is also 1.7T IIRC, but hes making much bigger ones
Use the workflow I mentioned and the all-in-one checkpoint
all of the real progress is being made in accessible consumer scale LLM's tho, its pretty amazing
Old gpt4 was 1t but the new better gpt4o is unkown. It's probably much smaller as its way faster.
1bit 405B :3
still over 100GB
I'm using yur workflow
I believe the new Obese llama is planned to be 3.8T
405b is actually considerably cheaper then gpt4o, claude, gemini while being similar performance to gpt4o.
Isn't this bloating paramater numbers at this point?, this gives no real preformence boosts does it?
Yeah no performance boost(degrades quality), but why not add a few trillion parameters for fun?
if it can generate the whole source code for GTA 6, its a good LLM :3
I wonder how a MMDiT model would look like with over 100b?
oh yeah, its 100% that
thats the whole point lmao
he just wanted it to be bigger lol
fat llama 1.7T is over 1TB
Ok, time to make some apples to apples comparisons
I remember years ago when having anything above 5B was considered huge
part 1 of 19, part 1 is like 50GB 🤣
consumer grade 20-30B is the SOTA rigth now and its incredible
Nemotron 70B is pretty great too
skinnyLlama when
thats just the new llama 3.2 1b
yep
Fatdiffusion when????
Black Lotus 😮
LOL
It is like that film Annihilation where people blend into plants, except here it is with burgers
Woah 3.5 went really low quality
I wish image gen models were making the progress text gen models have made
im using the new sd3 so im wondering which one would be perfect?
57 seconds to generate an image, comfy fp8 version of SD3.5, on an RTX 3060, 12GB VRAM
i mena in term of step
the more step the more is slow
the less well ye
nope, thats just how models this size are
oh wait 10 gave me this
oh sorry I wasn't trying to respond, just sharing my experience 😄
similar to flux dev if I recall
flux dev?
Sorry Mihoru, I'm a little out of the loop on what you were asking for and feel like I am misleading you with my random chatter. 🙂
Let me reset...
it alright
What are you trying to do?
Flux / SD3.5 Large
"How do u' like them apples?"
im trying to increase the generation speed
i did set the step to 20 and idk what else i could change to increase
im not really familiar with setting
all ik is that step are like faster if there less but bad graphic
instead of being slow and good graphic when higher
What GPU do you have?
3060 but which version? 12GB?
Ok - what model and how long does it take for you?
Definitely one of my favorite films.
the ai model?
Using in a big negative prompt seems to really mess up the image (especially the background), can anyone confirm that? I'm going to use it without negatives for now, shame as that is one advantage SD3.5 had over Flux Dev.
EDIT: aww maybe it wasn't the negative, backgrounds just often get messed up, for some reason.
yup
yep that sounds right. that's how long it's gonna take.
aw gotcha
If you want faster, use SDXL models, or 1.5 models.
was thinknig there was a way to tweak it
wow, this prompt actually did so well 😮
sounds about right
im using it with barebones ksampler settings and im getting ~30s on my 3090 with 20 steps
i'm using dpmadaptive sampling -> it's slow, but real good quality
"may i interessed you into bitcoin?"
if you dont like it dont
refuse or else
a bullet willl suddenly appear on your forehead
can anyone explaing in short what this does and how to use it?
Large is for sure not as refined/stable as flux, but it is at least more accessible for people on mid sized GPU's. I think SD3.5 medium will be a lot better tho for quality density
Is there a way to add live preview to this simple comfy workflow?
without needing a degree in comfyology
yeah
I think its this in settings?
not too sure
oh wait, but it uses a different VAE.... hmm
ODE vs SDE with SD3.5L
Santa when seeing all those decoration in october for christmas
6 legs but pretty nice detailing
first one was only 5 steps, was testing something, this one has 20 but i somehow like the 5 steps one better
prettty good one the finer detailing
noticing some casual disturbances in the snow in these, didnt even prompt for it. pretty neat action footage
Use the new GUI and select the top icon in the left toolbar - the one which looks like a Refresh icon
im used to prompting with 1.5 so its usually a ton of 1 or 2 words followed by a comma; its nice being able to just type out whatever flows through my mind in describing a thing with sd3's text encoder tbh
lower the cfg to 2.8
helps a lot
the default suggested 4.5 cfg blows out hte images
yeah def gonna play with the cfg, im still just playing with low steps because i want to test out different samplers and schedulers
ooo.k
I just get a spinning circle.
SDE sampling with SD3.5L
I have KSampler, and KSampler Advanced.
thanks for the workflow @lavish sparrow finally one that worked (mostly). I lowered it to 30 steps.
Took just as long as Flux Dev
SD3.5 does not do NSFW right out of the box (no surprise, none did/do). Looks really amazing though!
you're welcome, it was pretty easy adapting it ^^
Grab them from manager
For whatever reason, I just kept getting errors wtih all other workflows
and now we see why comfy is a rabbit hole... there's always another step you have to do underneath the step you're trying to do. 😛
i'm mostly using default nodes, except for the ones that aren't obviously, but uh... i suppose most should work
has anyone managed amazing hands with 3.5 yet?
Yes. But we may not mean the same thing when referring to 'amazing' hands
It is also hard for me to really comment on such since I am using an fp8 model which will be inherently worse than the vanilla. In fact in Flux the fp8 was about as bad as NF4
Ergo proxy flashback
ergo proxy? now that's a name i haven't heard in a long while
was a weird anime tho
Just wait for fine tunes. Why don't they start releasing an accompanying fine tune specifically trained on realism instead of letting the haters go crazy and amplify the defects of a clearly undeertrained base model is beyond me
4 fingers and a thumb on each lol
for some reason, that scene where the chick pushed the dude of the deck of the ship and scribbled the note "oh, he's left/right handed" (can't recall)
Just to triple-confirm, no SD3 controlnet is currently compatible with SD3.5, correct? That is my assumption but wanted to check
nope I think
Oh mine have 4 fingers and a thumb. They even have a spare extra finger for good measure.
Very cool
haha that pic tho, you are "getting there" Becky :3
Even my computer flushed with the prompt I used though; it left some stuff out haha
haha
can you make John Wick's car :3
3.5 via toyworld. Turned out better on my own system, but also took 10x as long on my own system
big boi
ai has peaked, all downhill (in terms of quality) from here
"Honey, did you forget to feed the flower?" - "THE WHAT?!"
haha
that can easily be like an enemy in elden ring
@ostrisai already has preliminary support for SD3.5-large lora training
out https://github.com/ostris/ai-toolkit/commit/3400882a8099645ce4c797f57ac258f1e1424ffd
all these people
yet almost none answered the questionnaire
nice, ye cant wait for more support
give it 24 more hours
yeah, i'm like 3.5 more and more, i feel it's more responsive to prompting. which i like ^^
the "future improvement" has arrived
it's fluid. it's almost effortless to work with it
third order RES
well according to their analysis, it did score a bit higher than flux for the prompt adherence
yeah, it does so, in my limited testing
it uses 3 encoders, flux uses two...
regular CFG for comparison
This is interesting
SD3.5 Prompt:
yep
Can someone try it? I can;t imagine it is the model...
a lot of these pics can be like cover art for some cd 🙂
Flux with same prompt as above
in my tests, you can take cfg through a wide range of numbers and take steps down to 8, and still not have too many issues
you also have shift - though that's reported to be missing in comfy's workflow. it is, however, in the SAI workflow
spooky
Prompt: an abstraction painting of a cat, minimalistic, line art, isometric bilateral differentiation Width: 1024, Height: 1024, Steps: 40, Cfg Scale: 4.0,
fibonaci cat. perfect proportions!
all things end in cat.
golden ratio cat lol
all these dreams, someone had to shape them
wild colors there
Fibonacci, rococopunk, voronoi, fauvism, zentangle cat
... just about to!!! 😄
Capra Demon's cousin
There’s also a new sota video gen model if I didn’t mention that. Just released today too. Apache license.
https://huggingface.co/genmo/mochi-1-preview
Meow
dude what is happening these days, technology is moving so quickly, barely have time to play with one toy
Actually good local video model??
I see everything
👀
It’s not really every time that it’s moving quickly but today has been a crazy day. 2 new video gen models, sd3.5 large, Omnigen
nice
release tsunami!
Works quite well
perfect hands!
CogVideoX is great too but this seems even better. Only requires 4xh100 right now lol.
Kijai is experimenting with fp8 mode so it fits in 24gb vram. Not fully fleshed out yet tho.
woman lying on grass where? :3
hehe
oh, i'm not changing seeds 😮 oh well
i can fix her
I can't even count to 4xh100

do not be afraid, now that i think about it, looks war-framey
Lora definitly meeded. Fortunately I still have all my images from the flux loras
the best i have currently is just 12GB vram 😦 , im waiting for 5090 to drop, so i can upgrade my pc anyway
Fibonacci, rococopunk, voronoi, fauvism, zentangle cat
sometimes it's one of those days
The longer the prompt token quantity, the worse it gets... Workflows embedded
mirror's edge sequel
Angewomon
More results from mochi video gen model
wait i forgot, is angewomon the booba from digimon? lol
sd3.5 pony?
cfg++ 2.0
Streamline Moderne, retrofuturist, fibonacci, rococopunk, voronoi, fauvism, zentangle cat
no just smart prompting and 2.8 cfg
but im wondering if there will be a pony version eventually
doubt we need it at this point
now add some fractals
Close enough 😄
lol
haha
for the mochi video thingy "The model requires at least 4 H100 GPUs to run", aka, this is only for the 5 people out there
I hope the fact that paintings are mediocre is because of my sampling settings
anything above CFG 3 is a super saturated mess
and below is losing coherence
i mean you know what they say man, one man's garbo is another's gold :3
you can rent gpus on runpod and stuff if you really mind privacy
but sadly since it cannot possibly be finetuned it will be forgotten
people REALLYYYY want corn
no matter how shit and unprofessionally bad it looks
Astra is doing pony on another model
auraflow
Auraflow???
Septo-prompt (7 x one prompt of - fibonacci, rococopunk, fauvism, voronoi, zentangle)
auraflow is perfect for it cause its cartoony
Actually smart for once
I won't care for it cause it probably wont do photoreal stuff
im just gonna wait for amateur photography lora for SD3.5
this model is SOOOO much faster than flux
but flux is a beast :3
it sure is
Beastly boring
its 4B parameters ahead
