#🆕|sd3


lavish sparrow
sage burrow
#

It does create extremely good loras fortunately 😉

craggy crest
mortal mesa
#

looks good, it captures that three-dimensional, made-of-something look

dull star
craggy crest
civic trail
civic trail
craggy crest
muted onyx
#

I do respect your taste, but is there a fine-tuned model for more realistic humans?

sage burrow
sage burrow
lost birch
#

Live-action version of Crayon Shin-chan

rapid pivot
#

Hello beeeeckyyy been a while

craggy crest
civic trail
craggy crest
fleet meteor
#

almost

hallow lion
sage burrow
sage burrow
#

Is it just me or does Hunyuan Video do better hands than most still image creators?

sage burrow
#

also sdxl on my own system

proven lantern
#

cool

craggy crest
craggy crest
rapid pivot
craggy crest
bitter hearth
#

it is so hard to learn SD

kindred stone
craggy crest
civic trail
limpid thunderBOT
#

Thank you for using comcom analytics.
"comcom analytics" supports all community managers (moderators and server owners) by stats, visualization, and analytics.

If you have any questions, feel free to ask us!
Your dashboard
Help
Support server

Other languages
en: help
ja: help Japanese

wanton slate
#

"Couple holding hands on rural hilltop, watching apocalyptic sky filled with violent aurora borealis and magnetic storms, burning city in background, windswept landscape, dramatic lighting, 8K, photorealistic, cinematic framing"

#

doh

odd notch
#

MGM Grand Las Vegas on 11 hectares land area designed by Veldon Simpson, capturing the entire edifice in a single shot from a distance of 100 meters, through a soft-focus lens, bathed in warm Sunlight, modern architecture, rtx lighting, cloudy sky

#

"MGM Grand Las Vegas on 11 hectares land area designed by Veldon Simpson, capturing the entire edifice in a single shot from a distance of 100 meters, through a soft-focus lens, bathed in warm Sunlight, modern architecture, rtx lighting, cloudy sky"

#

help

lavish sparrow
lavish sparrow
lavish sparrow
lavish sparrow
lavish sparrow
#

"Lord of the API's" -> I like the wifi staff

muted dove
muted dove
civic trail
young blade
craggy crest
neon imp
#

Posting my full findings soon, along with the relevant code additions for ai-toolkit and kohya_ss, but I am fairly certain I’ve discovered mass-scale misalignment of the text encoders across the most popular training tools. Here are some before/after tests from multiple character and style LoRAs with the exact same settings aside from the added parameters to ensure proper alignment of text encoders with the u-net. I know this is a large claim with huge implications. That said, I would not be sharing if I did not 100% believe this to be true.

neon imp
# bitter hearth will this affect flux

Yes. As a matter of fact, the bottom two rows on the first image are examples of improved training stability with Flux Dev.

While the other images highlight the more drastic improvements to SD3.5 Large training as a whole.

Going to test 3.5 medium and Schnell next but I need to finish documenting and get this fix out to the community today.

bitter hearth
#

okay thanks

craggy crest
neon imp
craggy crest
neon imp
# craggy crest trust me, it wasn't overlooked. It's far more likely an issue of training an enc...

You would be shocked. I am not training the text encoders at all. I am defining its parameters for proper alignment between the text encoders and the u-net. There is no noticeable difference in compute resources. Style LoRA training starts to take at lower steps and there are clear improvements with far less deformed features and better color depth.

From my tests this seems to be a universal misalignment issue. The results hold across various character and style LoRAs at different ranks, double-checked with both ai-toolkit and kohya_ss as well as 3.5L and Flux Dev.

craggy crest
#

he has a dreambooth for flux, and he has one for sd 3.5 large

neon imp
craggy crest
#

just scroll all the way to the bottom and slowly scroll back up, he's got tons of stuff

#

click on anything, and then look across the top, you'll find a link to it on his github repo

neon imp
#

The full code doesn't seem to be shown and runs through a paywalled api.

While it is possible that this or any other individual user could very well be taking the extra effort to define these parameters, I do think this is a known issue and, if it is a known issue, that some are keeping it secret behind paywalls.

That fundamentally goes against my personal views on the technology as a whole.

craggy crest
#

the entire purpose of this is to make sure whether the issue is the trainers - ai-toolkit and kohya_ss - or if it's something else.

neon imp
craggy crest
neon imp
craggy crest
lavish sparrow
toxic bone
craggy crest
#

the idea is for him to run his tests on something other than just the two scripts that seem to have issues, and to determine whether it's those trainers or if something else is going on. for that, dreambooth is a good option

toxic bone
#

I don't think reference code for it was ever published by stability ai either. It was just a community member that published code to github. Huggingface has been the biggest source for reference code afaik

toxic bone
#

few points on your gooogle fu "gotcha" attempt.

  1. is not reference code
  2. is not maintained by stability ai
  3. was published before Penna was hired by Stability
  4. Joe don't work here no mo.

Hope i don't get banned for disagreeing with you on something you're wrong about.

Dreambooth is probably not even what the guy should use. LoRA would be a better approach for their needs. But i'm not commenting towards that. I just correct things when I know better

craggy crest
toxic bone
#

history informs my interactions with you. In the past, while you were moderator of /r/stablediffusion, you kicked me from that server after similar disagreements here, on another server. You remember that don't you?

#

If i have a bad attitude maybe you should ask yourself "what have I done to this person?"

craggy crest
#

don't start. i'm done talking to you.

toxic bone
#

oh okay. So you're sociopathic and can't recognize that others disagreed with your arbitrary kicks, leading you to no longer be a moderator of that server or subreddit. but alright. got it.

craggy crest
#

assume whatever you like.

toxic bone
#

not an assumption. conversation with the admins of that server and subreddit confirmed why you were removed. because of kicking me all those times for no reason.

craggy crest
#

and just keep on posting

toxic bone
#

"valid reasons" being we had disagreed about something mundane on here and i usually back it up with facts.

when i asked the other mod there why i had been kicked repeatedly from that other server he looked into it and found no valid reason. I imagine when he asked you, you told him some sing song story and they disagreed with your validation. I doubt you told him "he was arguing with me on another discord server".

craggy crest
toxic bone
#

@viral plaza pinging you sorry. ... but... i mean... seriously.
https://discord.com/channels/1031106063837184021/1308975746529890344
This topic was the last time i got kicked from the /r/stable server. None of the kicks were ever explained to me. I only found out because i noticed the server icon was no longer in my list.

I don't like being gaslit like this didn't happen so i have to address it.

Another community i'm part of had a member talk to sandcheezy about you which was illuminating as well.

craggy crest
#

always nice to know people are spreading lies behind other people's backs. you do realize that alex rarely pays attention to any discord but his own? he's somewhat busy, you should try pinging him there.

toxic bone
craggy crest
lavish sparrow
turbid grotto
#

lol, sd3.5 large lora easily training on 12gb gpu

#

under 8gb vram at 1024 with offloading 0.5
7.70s/it which is almost 3 times faster than flux

#

but it might converge slower, or I don't have correct settings yet

#

I am training at lr 0.002 💀

viral plaza
# toxic bone @viral plaza pinging you sorry. ... but... i mean... seriously. https:...

Since crystal wants to gaslight you about it here I'll go ahead and post for you and anyone else that cares:

Crystalwizard not only silently kicked nuuideas from the r/sd discord repeatedly, but deleted the logs of having done so from our internal mod logs. I only discovered this when nuuideas asked about it and I dug through the discord audit logs and managed to find this out. I asked about it at a time when crystal was otherwise active, they didn't reply within the span of about a day, and I spoke with the other mods and we mutually agreed to remove crystal from the team, as not only was this far from the first issue with their activity as a moderator, but also the fact that they were deleting data from mod logs indicated that past reported incidents we had no proof of were potentially true as well. After removal, Crystal left not only r/SD but other discords as well, without ever saying anything at least to me. I think they spoke to cheeze at one point after?

(Also to be clear, no, as best I can tell, crystal had no valid reason to kick nuuideas at all, they just had a disagreement on some random technical point and crystal would rather exert authority than let themself lose an argument on the internet or something)

desert garnet
#

that what happens when u add crazy ppl as mods

dry wave
#

doesn't surprise me and probably nobody else who read messages from crystalwizard 😂

craggy crest
mortal mesa
#

name change incoming, the truth is spreading

hallow lion
#

He mistreated so many people including me.

mortal mesa
#

don't you know its assault if you disagree or have factual information

toxic bone
hallow lion
#

I wish discord would literally just make a person completely invisible for you when you block them. They did send me a survey asking if I like the block system on here and I suggested it. Hopefully this is a much needed change they will implement soon. I think most people would prefer if blocks worked this way.

#

He probably has the record tho on being blocked by most people. :))

mortal mesa
#

by far

toxic bone
#

i've always had issues blocking people. can't follow conversations. him specifically tends to be very sycophantic , so many will engage with him when he has his behavior facade up. I just give people notes. I wish the notes would show next to someone's name though. Or could give people custom colors.

hallow lion
#

He is strange... sometimes he acts almost normal but then has this other side. There is something going on with him for sure.

toxic bone
#

It's rude to talk about someone in the 3rd person when they are right there. I just wanted to point that out. Not a criticism. 😉

hallow lion
#

probably unrelated but where is 4GB VRAM cat!???? XD

#

(for the new guys who don't know: there used to be an active user here named "Cat with 4GB Vram (send help)")

mortal mesa
#

@bitter hearth allo

foggy cloak
#

What's everyone's go to flux model?

bitter hearth
#

base FP8 with turbo lora

mortal mesa
#

shuttle 3

lavish sparrow
#

he lost his account 😢

#

he goes by @rapid pivot now.

hallow lion
#

awww

rapid pivot
rapid pivot
hallow lion
#

sad about your other account then.

#

I guess the 4GB cat didn't get their VRAM. :/

rapid pivot
#

It only gets worse as time goes by

desert garnet
hallow lion
#

oh hai steven segal

desert garnet
#

hai there

toxic bone
toxic bone
desert garnet
toxic bone
#

i've never had any idea how to browse 4chan. its like in no order at all

bitter hearth
#

yeah IDK how to navigate 4chan

#

apparently there is a lot of diffusion stuff on there

#

but I am not sure if it is good advice or not

toxic bone
#

yeah i have nothing against it. i just dont know how to digest info there

rapid pivot
#

Phone number cancelled, not enough patience to go through recovery and blah blah

bitter hearth
#

but why 5 phone numbers cancelled

bronze blade
#

can AUTOMATIC1111 use SD3.5 gguf?

rapid pivot
#

I don't pay for mobile data, sometimes I want one for emergencies or whatever thomas

#

so they get cancelled eventually

#

if discord gets stuck in login for me for whatever reason I just literally create another account its not a thing I care that much about lmao

bitter hearth
#

oh this sounds fine

#

I thought it was shenanigans

rapid pivot
#

im lazy waow

remote holly
#

photo realistic, a pretty woman (with dark red bob hair wearing a black suit with a dark red tie and a long black coat with long black palazzo pants with vertical red stripes) a determined look, standing in a white room, dynamic shadows, volumetric light, long exposure, sun rays, cinematic view, bokeh effect, fashion advertisement, Dior

wraith axle
#

Photo via Kodak portra 160,young pretty girl, 20 years old. She has blonde hair, blue eyes, pale skin. Split into four images, Shot of different angeles, white background --style raw --v 6.0

frail shoal
bitter hearth
#

nice shadows

prisma valley
#

We need an easy simple, non-technical local Lora training like a FluxGym (Is there one?) and we need DreamShaper, Juggernaut, EpicRealism and Realvis versions of SD 3.5 or better in 2025 🙏🏽

bitter hearth
#

100 steps of lion

rapid pivot
turbid grotto
lucid swift
#

its ok

wind talon
#

is there a tiled controlnet for sd 3.5m?

turbid grotto
sage burrow
lucid swift
#

its not worth it. omnigen is better

bitter hearth
#

omnigen didn't train on enough aesthetic data

#

it could improve maybe

stable snow
#

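A 3D-style image with red as the base tone; the greeting on it reads 大吉大利 ("great luck and prosperity"), surrounded by decorative icons: gold ingots, red envelopes, fireworks, plum blossoms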

rain current
#

Which one do you prefer?

toxic bone
#

2 cause i like kodak tones more than fuji tones. totally personal preference though. in the west we're all about the warmer colors

sage burrow
#

left looks like one of those disposable cameras from the 80s! Or you ran out of printer ink 😄

rapid pivot
#

I just don't like her legs on left

#

It looks like a doll

#

Now you can't unsee thomas

sage burrow
#

We are all art critics! 😄

cursive frigate
#

Does anyone know any good sources for documentation for ComfyUI and Custom Node creation that I can feed into something like ChatGPT or a local LLM that will allow me to talk to it and have it help me either create my own custom nodes or at least help me put together better workflows for specific use cases?

toxic bone
#

if anyone here still has respect for crystalwizard, here's some interesting reading for you on the comfy org discord server. #1319770970868945057 message

#

catch it before he deletes it

cursive frigate
#

I guess I don't have access to that channel for some reason

toxic bone
#

i thought discord would let you join servers through a link like that. it's the public comfy org server. i can't post the link here because they block discord invite links

#

DM'd you it

cursive frigate
#

Thanks

#

So what is he getting at? Could I get a TL;DR of how this convo started? It seems like Comfy is now monitoring created nodes?

toxic bone
#

They have a public registry of nodes now. Anyone can apply to have a node on it. By default, manager will only install nodes that are registered

#

it's a step in the right direction. but he's flipping out because he has to change a few configuration files to install a custom node

#

you can still git clone directly into /custom_nodes/ folder

#

personally i find it very exposing of his absolute lack of expertise and professional decorum

cursive frigate
#

He definitely has a strong opinion. I can see both sides of that coin.

#

It is a good direction for sure. but maybe there should be a time delay for some legacy nodes, or at least an audit and conversion timeframe for older pre-existing nodes.

toxic bone
#

This has been coming for a while. The registry wasn't just launched today empty

#

it's just fully deployed now

cursive frigate
#

I wish I knew enough to be able to make my own custom nodes. I would take a shot at getting on the registry and see how it goes.

#

I think its a good thing.

toxic bone
#

I always complain about the security risks of having 5 dozen custom node packs installed, but what i truly hate most about it is the dependency problems. Nodes overwriting each other's dependencies in the virtual environment is an ongoing problem. This could help to alleviate that issue among others

#

It can also lead to a standard library. Something i herald often.

cursive frigate
#

Ya this could be great for conflicting nodes for sure

toxic bone
#

in the past, there was already a list. nodes that were recognized by the manager. but now it's public and anyone can submit to it

#

much more standardized and tied to secure practices

bitter hearth
#

the registry was in response to malware yeah

#

but it will also help with the dependency issues

proven pecan
toxic bone
torpid marlinBOT
toxic bone
#

https://en.wikipedia.org/wiki/Quantum_mind

could use traditional search much easier. LLMs don't need to replace every single task. traditional software still excels in most arenas

The quantum mind or quantum consciousness is a group of hypotheses proposing that local physical laws and interactions from classical mechanics or connections between neurons alone cannot explain consciousness, positing instead that quantum-mechanical phenomena, such as entanglement and superposition that cause nonlocalized quantum effects, inte...

#

it's funny to me that people are heralding all the capabilities of gpt, like counting the letters in strawberry now, when a simple string operation could do that already for a half century
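
A minimal sketch of the "simple string operation" meant here, assuming Python and a hypothetical helper name `count_letter`; it only uses the built-in `str.count`:

```python
def count_letter(word: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in a word."""
    # str.count does a plain, non-overlapping substring count.
    return word.lower().count(letter.lower())


if __name__ == "__main__":
    print(count_letter("strawberry", "r"))
```

No model is involved: the count is exact because the string is inspected character by character rather than token by token.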

#

LLMs are certainly a break through. I don't believe we're anywhere near AGI though. They may become more generalized in their use, but general intelligence they are not. It's all theranos level hype

mortal mesa
toxic bone
#

seems to thrive on faux expertise and flexing

toxic bone
#

I dm'd you the link to it chaos. everyone should join comfy org discord server. there are movers and shakers there

#

also, mandatory linking needed now. https://www.youtube.com/watch?v=ZG_k5CSYKhg

Faith No More - "Epic" (Official Music Video) from the album 'The Real Thing' (1989)

mortal mesa
# rapid pivot what is it thomas

Full show notes: https://www.latent.space/p/comfyui

rapid pivot
#

ai video im scared sadcat

#

52 minutes of interview agony

#

I have watched so many interviews these past few months why you do this to me sadcat

sage burrow
uneven storm
brittle nexus
#

Google image is bizarrely good

sage burrow
#

Those are amazing!

#

flux tried 😄

#

SD3 large gave it a try as well (don't count the fingers!)

torpid marlinBOT
bronze ivy
#

Hello everyone

muted dove
#

He was replaced.

flat raft
#

glass rail samples

turbid grotto
#

anyone know a flux finetune that makes it more unique? I want to get rid of this style as it has become too generic

cunning lintel
placid plover
#

A group of 8 realistic cats taking a selfie together. The cats have human-like expressions, they are all standing close together in a friendly pose, resembling a group photo. The background shows a blurred indoor setting with other people. The lighting is natural with soft shadows, creating depth and realism

craggy crest
bitter hearth
#

made by tensor art

rapid pivot
bitter hearth
#

4GB of VRAM is fine you can run flux.1-lite-8B-alpha-Q3_K_S.gguf in headless mode

#

its 3.74GB so it will fit

toxic bone
#

only leaves .26gb for generation though. better off using an sd15 refine then

bitter hearth
#

is not an issue

#

so long as you are running headless

#

your screen goes black while image is generating

#

and it works ok

#

LOL IDK if people would like this advice though, they might not like their screen going black

sharp moth
#

dog

toxic bone
#

Screens off? what are you some kindda luddite hippy communist?

bitter hearth
#

lmao

#

just close your eyes when image is generating and then you won't know

hallow lion
#

at least its not a blue screen (of death)

sage burrow
#

Cloud services or mage are so affordable now, don't need more than 4gb vram at home 😄 @rapid pivot

bitter hearth
#

ye pretty much

hallow lion
#

He's a local cat, doesn't hang out in the cloud.

bitter hearth
#

ah okay that's fine

rapid pivot
bitter hearth
#

oh no

rapid pivot
sage burrow
#

Both glif and mage.space are pretty awesome

civic trail
proven pecan
neon imp
# proven pecan I saved this comment and now I'm wondering what is left of it?

Haven't had time to experiment more and change things but I have shared my ai-toolkit configs and have had many others say they have seen improvements with the changes. I saw ai-toolkit was updated last week but I haven't touched anything since I was getting such great results before.

I have been very busy with both work and getting my Project Odyssey 2 video finished before the deadline but I have uploaded an absolutely amazing 3.5 L negative detail LoRA that outshines everything I had done before (with no changes to the dataset or other settings) so I am convinced there is something there. Wish I had more time to dive in but have published my "best guess" at the time to the cause as an article on Civitai.

#

This starts as an SD3.5 base render and each frame is a decrease in lora strength by 0.01 (since it's a negative lora). The video pingpongs back down to a loop.
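
A rough sketch of that strength schedule, assuming Python. The function name `negative_lora_schedule` and the frame count are hypothetical (the message gives only the 0.01 step), and dropping the endpoints on the way back is one guess at how the ping-pong loop was built:

```python
def negative_lora_schedule(num_steps: int, step: float = 0.01) -> list[float]:
    """LoRA strength per frame: 0.0 (base render) down to -step * num_steps, then back up."""
    # Forward pass: frame i uses strength -step * i (rounded to hide float noise).
    forward = [0.0 - round(step * i, 4) for i in range(num_steps + 1)]
    # Ping-pong: replay the frames in reverse, skipping both endpoints so the
    # loop does not show the first or last frame twice in a row.
    return forward + forward[-2:0:-1]


if __name__ == "__main__":
    print(negative_lora_schedule(3))
```

Each value would be passed as the LoRA strength for one render with a fixed seed and prompt, so only the strength changes between frames.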

neon imp
# bitter hearth how do you make these?

I will be posting a full article explaining the process and the various improvements I have found, sharing my dataset, etc., for what I call negative reinforcement training. I am currently spreading myself a little thin. The idea is to train on what you don't want and use the resulting lora with a negative strength value to force things to a conceptual opposite latent vector value.

I first found this by accident when training a SD2.1 textual embedding to try to make images I could feed into "point-e" by OpenAI to make 3d point clouds. But I had not gotten results as stable as this with 3.5 until the recent changes I have been discussing.
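
For readers unfamiliar with why a negative strength does this, here is the usual LoRA weight update written out. This is the generic formulation, not a confirmed description of the exact setup used above; the symbols (strength s, rank r, scaling factor alpha, low-rank factors B and A) are the conventional ones and do not come from the message:

```latex
W' = W + s \cdot \frac{\alpha}{r} \, B A, \qquad s < 0
```

With s > 0 the base weights W move toward what the LoRA learned; with s < 0 the same learned direction BA is subtracted, which pushes the output away from the training data (here, flat low-detail renders).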

bitter hearth
#

is it like something extremely smooth or blurry?

neon imp
# bitter hearth okay thanks I was wondering what the conceptual opposite of high detail like thi...

My first dataset was the seed images used for COCO CLIP R-Precision evaluations; you can find it at the bottom of OpenAI's point-e github page.

The images resemble something between early CAD and 2000s video game renders with simple flat colors and minimal details. I find at times some pixel art or anime loras can also have somewhat of this effect to some degree and always recommend doing a negative test with loras you find in the wild because you never know.

bitter hearth
#

okay thanks this makes sense yeah

#

funnily enough negative schnell lora does this a bit

neon imp
#

I have been testing with the de-distilled versions of Dev and Schnell and have had some limited results so far but want to get them to a better place before sharing.

bitter hearth
#

lol I love this dataset
its funny that it worked

#

yeah it could work on de-distilled

neon imp
#

The Dev one I recently did works but only from 0 to -0.25 and then things just get crazy in the images. I'll prob share it soon but I've got to get back to my PO2; today's my last day before getting back to the office Monday.

rapid pivot
bitter hearth
#

maybe this would make a good negative lora for flux

craggy crest
cursive frigate
#

I just put this in the ollama chat... I think this is an interesting conversation starter....

I kind of had a runaway thought.... hear me out.

So they say AI on regular computing with LLMs and such are way different than Quantum Computing, that is 100% true, however. Why not give the best AI access to Quantum computing data like RAG or a knowledgebase and see if AI can help advance Quantum Computing.

Its probably already been thought of but, I haven't heard anyone mention it so I figured I would put it out into the ether.

#

Any takers?

lavish sparrow
analog dome
#

keywords?

lavish sparrow
lavish sparrow
lavish sparrow
lavish sparrow
#

so yeah. LLM + sd3.5L is just the way to go.

tropic vine
#

Hello, I have an issue with diffusion models on a new computer
it's with an RTX 4090, when testing it with flux-dev, it seems to take forever to generate an image, several long minutes
what do you think I might be missing?

lavish sparrow
#

"A million miles away"

lavish sparrow
#

"Bury the light"

#

"Stained, brutal calamity"

#

"several species of small furry animals grooving together in a cave with a Pict"

lavish sparrow
bitter hearth
rapid pivot
lavish sparrow
bitter hearth
#

okay nice

analog dome
bitter hearth
craggy crest
rapid pivot
#

sadcat moss man

craggy crest
lavish sparrow
# analog dome yes you

in that case, the keywords sent to the llm: alternate species, monsterification,hybrid,, mist and fire creature:, nature, anime ->

```
{"T5": "In a mystical forest under a twilight sky, a hybrid creature emerges from swirling mist and flickering flames. Part beast, part ethereal being, it combines elements of various natural forms—sharp, clawed limbs intertwined with delicate, flowing plant-like appendages. Its body is enveloped in a cloak of smoke that dances like living flames, casting an otherworldly glow. The creature's eyes glow intensely, reflecting the fiery and misty elements around it. The scene is vibrant yet eerie, with deep greens and fiery oranges blending seamlessly, capturing the essence of both nature and monstrous transformation. The anime style renders this creature with exaggerated, fluid movements and expressive features, emphasizing its hybrid and fantastical nature.",
"CLIPG": "hybrid creature, mist, fire, anime style, forest, glowing eyes, plant-animal fusion",
"CLIPL": "A vibrant, anime-styled hybrid creature blending plant and beast features, enveloped in mist and flames, glowing eyes, amidst a mystical forest.",
"ARTSTYLE": "Anime",
"NEGATIVE": "photorealistic, mundane textures, dull colors"}
```
#

i'm actually using all different parts of the clip
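
A minimal sketch of how a reply shaped like the one above could be split into one prompt per SD3.5 text encoder, assuming Python and that the LLM returns valid JSON. The helper name `split_prompts` and the shortened sample reply are made up for illustration; only the key names (`T5`, `CLIPG`, `CLIPL`, `NEGATIVE`) come from the message:

```python
import json

# Hypothetical, shortened sample reply; the keys match the message above.
SAMPLE_REPLY = """
{"T5": "In a mystical forest under a twilight sky, a hybrid creature emerges.",
 "CLIPG": "hybrid creature, mist, fire, anime style, forest",
 "CLIPL": "A vibrant, anime-styled hybrid creature in a mystical forest.",
 "ARTSTYLE": "Anime",
 "NEGATIVE": "photorealistic, mundane textures, dull colors"}
"""


def split_prompts(reply: str) -> dict[str, str]:
    """Map the LLM's JSON reply onto one prompt string per text encoder."""
    data = json.loads(reply)
    return {
        "t5": data["T5"],          # long natural-language description
        "clip_g": data["CLIPG"],   # short tag-style prompt
        "clip_l": data["CLIPL"],   # one-sentence summary
        "negative": data.get("NEGATIVE", ""),
    }


if __name__ == "__main__":
    for encoder, prompt in split_prompts(SAMPLE_REPLY).items():
        print(f"{encoder}: {prompt}")
```

Each of the three positive strings would then be wired to its own encoder input instead of sending one prompt to all of them.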

#

thursdays

muted dove
lavish sparrow
#

Cosmos is video model? Looks wicked tho

toxic bone
toxic bone
toxic bone
#

yeah but it's free jank so yippee kiyay

pseudo owl
sullen moss
#

Hello, everyone. Any fresh updates from the SAI team? I’ve noticed that SD 3.5 turned out to be a largely underwhelming model, with very little community activity on Civitai.

lavish sparrow
lavish sparrow
lavish sparrow
#

" winnie the poo, style of warhammer 40k"

#

"being alone doesn't scare me"

lavish sparrow
lavish sparrow
lavish sparrow
urban arch
lavish sparrow
urban arch
mortal mesa
#

ahh good Syd Barret Floyd

#

thank you for the idea, gonna try some Astronomy Domine prompts

lavish sparrow
lavish sparrow
#

3.5 absynth finetune, not bad

bitter hearth
#

the absynth method is cool yeah

#

negative loras

molten anvil
#

What is the best way to create photorealistic images with SD3.5? My experiments so far are giving me plasticky/cartoony photos. Any ideas would be most appreciated. (Particularly with the Turbo model.)

pseudo owl
#

Above is 4steps, this is Schnell with 1step only and looks much better(although white borders still there)

turbid grotto
strange ermine
#

give me an image that looks like this

#

give me a picture of an envelope

lavish sparrow
rapid pivot
#

:<

lavish sparrow
#

computer virus

hallow lion
lavish sparrow
hallow lion
#

Just asking coz hunyuan is pretty tremendous for a local model so is it worth looking at other models for now...

#

if hunyuan gets image to vid thats like game changing

buoyant mesa
#

where can i train sd3.5 Loras ....you can't do it in kohya_ss so far as i know?

bitter hearth
proven pecan
turbid grotto
pseudo owl
# hallow lion is it better than hunyuan tho?

Hunyuan is miles better at text to video in quality. Cosmos does have a few benefits like 7b one is a bit faster, and vae is more efficient like Neon said. Also has image to video which is very useful.

Quality wise, hunyuan is much better and comparable to closed source stuff while cosmos is considerably behind at img2vid, text2vid. But it’s pretty controllable at least.

#

It is the best open img2vid so far right now

bitter hearth
#

I'm not saying the Cosmos VAE is more efficient I'm saying its higher quality

sage burrow
rapid pivot
#

adorbable

brazen lake
#

dog

#

#dog

muted dove
bitter hearth
#

sadly using a VAE on a different model never rly works
it would have to be retrained

fathom path
rapid pivot
lavish sparrow
lavish sparrow
#

deepseek-r1-32b + sd3.5L is a nice combo indeed ^^

cedar oyster
#

a long sword, simple color, no one, game icon, 2D animation style, white background

fathom path
lavish sparrow
#

I'd have to get the 7b r1 for fair comparison tho..

sage burrow
lavish sparrow
lavish sparrow
fathom path
lavish sparrow
#

yeah, i suppose it does. but marco o1 wasn't bad for its size at all

fathom path
#

Yeah, I've been using it since it was released, and the CoT is really good.

lavish sparrow
#

rofl xD nice 4th of july

fathom path
lavish sparrow
#

it's the smartest by a far shot

fathom path
#

Yeah. Flux becomes limited very quickly unfortunately

lavish sparrow
muted dove
muted dove
muted dove
muted dove
real terrace
#

Hi, I haven't tried flux for a while, are there some good light models for 12 GB VRAM? I got OOM when running Flux

split bramble
real terrace
turbid grotto
#

q4 is the lowest you can go, I think. And it looks fine still

#

I am running it at 12gb too, also able to use controlnet

#

svdq can give ~3x speed improvements with the quality of q4 but it has the worst comfy integration. I only managed to make it work outside comfy

#

on the other hand, there is teacache, which can give a 2x improvement for flux dev in comfy, however there won't be noticeable improvements at lower steps

real terrace
real terrace
#

And I didn't know encoders were quantized as well

#

I was using the ones it shipped with when it came out

turbid grotto
# real terrace so flux dev quantized?

I don't entirely understand the question.
Flux we have is distilled from full model but it can be further quantized in a smart way to reduce memory requirements and not get too big of a loss.

#

also, there is an 8b parameter variant of flux but I am not sure if it's worth using over sd3.5l for now

rapid pivot
uneven storm
#

its what i do and i use 12gb 3060

turbid grotto
#

1.62s/it sounds super good

uneven storm
#

can do --xformers in command line arg but i installed manually

#

but wavespeed node is what makes it fast

#

with 3060 wont be able to use compile+ node and only the block cache

turbid grotto
#

Thanks!!
I will try it

uneven storm
#

np

real terrace
#

but ty

#

and I have AMD card

tulip wadi
real terrace
real terrace
pseudo owl
#

Some realistic 1step gens with Flux Schnell, no loras or anything.

rapid pivot
#

no one played his games anymore, had to retire x.x

hallow lion
#

Winnie the chinese president lora

#

So you guys know about tiananmen square regardless of this hush vee pee enn thing right?

#

google it lee!

#

You almost did it

#

Give it another try

#

😉

#

Trump's got you

#

It's now or never

#

Ceausescu PCR

#

I grew up in Romania under that fker. If we can do it so can you.

#

Alo Alo Beijing.

turbid grotto
summer ginkgo
rapid pivot
remote holly
remote holly
#

I've never seen these finetunes on DiT models

bitter hearth
#

next Pony is DiT

#

on auraflow

#

things will change a lot when GB200 NVL72 comes out

#

its gonna unlock quite a lot of new abilities in terms of training

#

the biggest issue with training is that there is clearly a benefit from large batch sizes in terms of training quality
but to use very large batch sizes at high speed is not easy- the issue is the communication between the GPUs
GB200 NVL72 goes a long way towards fixing that because it puts 72 big Nvidia GPUs in one pod
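
To put a rough number on the communication issue: a back-of-envelope sketch, assuming plain data parallelism with an idealized ring all-reduce and ignoring overlap, sharding and gradient compression. The 2B parameter count and 2-byte gradients are hypothetical inputs, and `ring_allreduce_bytes_per_gpu` is a made-up helper name.

```python
def ring_allreduce_bytes_per_gpu(param_count, bytes_per_value, n_gpus):
    """Bytes each GPU sends per step in an idealized ring all-reduce."""
    if n_gpus < 2:
        return 0.0                      # nothing to synchronize
    payload = param_count * bytes_per_value
    return 2 * (n_gpus - 1) / n_gpus * payload


# Hypothetical example: 2B parameters with 2-byte (bf16) gradients.
for n in (8, 72):
    gb = ring_allreduce_bytes_per_gpu(2_000_000_000, 2, n) / 1e9
    print(f"{n} GPUs: ~{gb:.2f} GB sent per GPU per step")
```

Under these assumptions the volume per GPU barely changes with the GPU count, so the speed of each step depends mostly on how fast the links between GPUs are.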

#

I think the issue with anime models is dataset quality though rather than compute, at the moment

#

but how you build a high quality anime dataset out to the tens of millions of images I do not know

#

its 1000x easier with photographic stuff because to some extent all photographs above a minimum quality level are useable, whereas anime has to be very specific styles and content

finite hollow
#

have there been many changes in the last 3 months? i kinda didn't follow the news

#

i can't seem to find improved pictures in any of the channels here sadly

remote holly
#

@bitter hearth I didn't know that, thank you for your explanations. I'm really looking forward to seeing these finetunes come to the DiT models because the prompt understanding they offer will be a game changer. I don't really like this tag system, I prefer to use natural language

bitter hearth
#

yeah that would be the big advantage of a DiT anime model, the prompt following

rapid pivot
#

One day we'll have anime models that aren't dumb

bitter hearth
#

Janus-Pro-7B is nice
bear in mind it's 384x384 and is not one of the fast types of autoregressive

sullen moss
bitter hearth
#

fairly sure the hype is just over the brand name deepseek and not because people actually want a 384x384 autoregressive model

hallow lion
#

mods

#

get him!

muted dove
brittle nexus
#

Sorry for the dumb question but this cheaper and efficient training method used by deepseek can help img2img models?

dry wave
#

there is no new efficient training method

#

training on artificial data can help speed up training - PixArt is already doing that. I would still prefer large scale tuning though

pseudo owl
# brittle nexus Sorry for the dumb question but this cheaper and efficient training method used ...

Yeah, kinda. This is the cheapest MoE diffusion model to train yet, by Sony. It’s not bad for the size, but everyone still uses something like Flux Schnell/Hyper for speed

https://github.com/SonyResearch/micro_diffusion

GitHub

Official repository for our work on micro-budget training of large-scale diffusion models. - SonyResearch/micro_diffusion

#

This is not a MoE architecture but it also trains very fast (21x faster), and is better than normal DiTs at similar sizes. It’s more of a demo than an actual usable model but has a lot of potential: https://github.com/hustvl/LightningDiT

lavish sparrow
errant dust
remote holly
#

the SVDQuant project seems abandoned, that's sad

dry wave
#

I know they do a lot of engineering and optimization like training on fp8. It's not a new method, though.

#

you can optimize every training setup and everyone is doing that already, although some teams are definitely better in this than others

bitter hearth
craggy crest
errant dust
#

Not stunned by superior quality. Stunned that they released it

#

It wasn't on anyone's radar

bitter hearth
#

it was a surprise yeah

#

there's a new Lumina too

#

https://huggingface.co/Alpha-VLLM/Lumina-Image-2.0

cunning lintel
bitter hearth
#

I think this might be really good yeah

cunning lintel
#

It feels a bit like auraflow while trying a few prompts in their gradio space, in that it at times seems to be rather rigid in following prompts. fun 🙂

bitter hearth
#

that was likely their previous model

dull star
#

yes the space is using Lumina-Next-SFT

cunning lintel
dull star
#

yes

#

if you use that and not the one from huggingface then you were right

bitter hearth
#

oh sorry this is correct
I assumed you went from their huggingface

violet escarp
bitter hearth
#

yeah 2B is rough for DiT

#

Sana 1.5 just dropped at 4.5B or so

#

might have potential

violet escarp
#

Sana is different

#

it depends on the channel count for the vae and compression

#

there was some math I saw that's a bit over my head. The main takeaway I got from it is that a higher channel vae is harder to train with so it needs more parameters. Sana is also higher compression so it doesn't need as high of a parameter count.

the DOF ratio between the input vector dim size vs model dim matters a lot here
16(64) channel vae needs DOF larger than 32x at least
so you need at least 2048 hidden dim
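
The arithmetic in that quote can be sketched like this, assuming the "16(64)" means a 16-channel latent with 2x2 patches (16 * 2 * 2 = 64 values per token). That reading is my assumption, and `min_hidden_dim` is a made-up helper name.

```python
def min_hidden_dim(vae_channels, patch_size, min_ratio):
    """Smallest model width for a given token size and DOF ratio."""
    token_dim = vae_channels * patch_size * patch_size  # values per token
    return token_dim * min_ratio


print(min_hidden_dim(16, 2, 32))   # 2048, matching the quoted figure
print(min_hidden_dim(4, 2, 32))    # 512 for a 4-channel VAE under the same rule
```

So under the same rule of thumb a 4-channel VAE would get away with a quarter of the width, which fits the idea that higher channel VAEs need bigger models.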

bitter hearth
#

better VAEs are harder to train with yeah

#

its harder to get the DiT training to converge

ashen abyss
bitter hearth
#

dice?

cunning lintel
still pike
#

Please draw a picture with the whole screen full of smiley-faced oranges and apples

craggy crest
#

not that it did much in the long run but it sure did damage for a few hours

bitter hearth
#

I could tell when people hadn't tried the demo, for Janus-Pro-7B

fiery wharf
#

the only fake hype was openai lying about inflated costs to train LLMs just to milk investors. that's why the stocks went down, Sam Altman's bs finally caught up to him

bitter hearth
#

I think Deepseek themselves did not hype Janus-Pro-7B yeah
I think some journalists found it and wanted to make an exciting article

#

the Janus-Pro-7B paper makes it clear several times that its not going to be amazing image quality and that its just a base for future models

#

its only 384x384 after all

#

if you want a nice autoregressive image model that is out now you can use Infinity, Switti or the CoT version of Show-O

mortal mesa
#

i liked the image classification or whatever you call it on the 1b, but dont really need it

bitter hearth
#

yeah the image understanding was what it was really about

#

it improved that a lot, for that class of model

#

these are the other models I mentioned if anyone wants to try them https://huggingface.co/FoundationVision/Infinity/tree/main https://huggingface.co/yresearch/Switti https://huggingface.co/ZiyuG/Image-Generation-CoT

#

they look good already, in their current form

silent iris
dreamy merlin
#

Hello, I'm having a hard time running Stable Diffusion Large on an A4500 with 20GB VRAM, it always runs out of VRAM. If I use fp8 it is runnable, but how do I run it without quantization? I heard someone was able to run it on even less VRAM.

lavish sparrow
#

shopping for groceries in 2026

lavish sparrow
lavish sparrow
shy leaf
#

scam link

turbid grotto
#

Anyone has info about Stability, how are they?

silent iris
#

pretty good

#

would i say

#

the best you can do is to add ''Detailed environment'' or ''Detailed background environment'' to the prompt, that should give you a high quality image every time

lavish sparrow
hallow lion
turbid grotto
fathom path
bitter hearth
#

they made a 3D model

#

so there's at least something going on

hallow lion
#

WAIT!

#

WHERE IS EMAD!

#

WHERE IS THE EMAD EMOJI!!!!????

#

WHAT HAVE YOU DONE?

fiery wharf
hallow lion
#

That is not a real dollar bill or I would go to jail.

analog folio
turbid grotto
terse aspen
#

Apple, engraving in the style of Dürer

cunning lintel
wise pewter
#

Please draw a picture with the whole screen full of smiley-faced oranges and apples

cursive frigate
#

I forgot where to put bbox files in comfyui. Can anyone help me with that?

runic tusk
tawny orbit
craggy crest
neon imp
mild wind
#

please generate a picture showing a house

mental moth
#

please generate 4 pictures with a Tiger

lucid swift
lucid swift
#

this is the lumina 2.0 model

sick pendant
#

generate 4 pictures to Halloumi cheese

heady bluff
#

please generate an image of apple

dull star
#

The lumina model is okay.

#

The point is that it's Apache-2, but unlike auraflow it's much smaller

turbid grotto
lapis relic
#

please Generate a living white shrimp with the cephalothorax organs clearly visible

lucid swift
#

i can't generate more images because i am no longer on an ai computer, i am just on a steamdeck now

dull star
#

yeah I hope we'll get a painting lora or something

craggy crest
pseudo owl
turbid grotto
pseudo owl
cunning lintel
#

SD3.5L and Lumina 2.0

#

I'd love a bigger lumina 2.0, its style and details fall a bit short, but its prompt following (not apparent in such simple prompts) is really next level. As it is now I think it'll just be an interesting curiosity.

violet escarp
# cunning lintel SD3.5L and Lumina 2.0

It doesn't even have to be as big as 3.5L. 3B could be pretty nice. I'm pretty sure sd3 and flux are kinda inefficient since they use fused transformers. I know someone who wants to finetune lumina 2.0 as well.

violet escarp
#

they might add extra parameters to lumina and train like that

turbid grotto
craggy crest
civic trail
tacit lodge
#

A boy who has just woken up sitting on his bed, a gray overcast sky outside the window, anime scene

sullen moss
#

Flux. LOL

meager yew
#

Generate a image with a blue bird, that has three claws on its wings, it is flying in the sky.

native olive
#

"cinematic wide shot, 21:9 aspect ratio, 1920s Jinan Railway Station, steampunk atmosphere, baroque architecture with Chinese elements, crowds in Republican-era clothing, steam locomotive emitting smoke, golden hour lighting with volumetric rays, Kodak Ektachrome film simulation, intricate historical details, hyperrealistic textures"

hallow lion
#

Go Dugtogs!

craggy crest
lucid swift
#

did stability ever release the creative upscaler?

craggy crest
pine forge
#

@ hensen

#

Please draw a picture with the whole screen full of smiley-faced oranges and apples

wild ravine
#

"A beautiful and enchanting humanoid nine-tailed fox spirit from ancient Chinese mythology, blending human elegance with mystical fox traits. She has long, flowing silver-white hair with golden highlights, and her eyes are sharp, intelligent, and glowing with a magical aura. Her nine luxurious tails fan out behind her, shimmering with ethereal energy. Her face is delicate and serene, with a hint of otherworldly charm. She wears a traditional Chinese robe adorned with intricate patterns, standing gracefully in a misty, ancient landscape surrounded by bamboo forests, flowing rivers, and distant mountains. The atmosphere is dreamlike, with soft moonlight illuminating the scene, evoking a sense of mystery and fantasy."

astral sigil
#

The back of a woman on a cliff.

hallow lion
lucid stag
#

A glowing portal to another dimension, starry sky background, sci-fi style, dimensional gate, a city nightscape outside the gate, a fantasy world inside the gate, 4K HD, cyberpunk style, glowing particles, futuristic, --v 5 --ar 1:1

frail shoal
craggy crest
dark gust
#

Whatever happened to SD 3.5 medium controlnet?

woeful terrace
#

opps

#

oops

bitter hearth
#

you can use them now if you want

#

they made a turbo lora as well

devout schooner
#

Original SD3 (left) vs SD3.5 Medium (right), on the "Juggernaut XL Model Card Lady Prompt". 30 steps / DPM++ 2M SGM Uniform / CFG 4.5 / same seed for both. Eyes are focused oddly in the original SD3 one but the overall image is aesthetically way closer to what I'd want
It's a somewhat unfortunate recurring trend I've found after using both for a while now
SD 3.5 Medium is definitely compositionally way more coherent for photographic gens (moreso for non-closeup full body stuff) but it trends far more towards a sort of fake airbrushed look aesthetically than the original SD3 did
A model with OG SD3 aesthetics but 3.5 Medium coherence would be essentially perfect lol

bitter hearth
#

SD3M was more photographic yeah

abstract egret
#

Prompt:(minimalist logo design), (granular texture), (fading gradient), (data visualization elements), muted color palette,
clean background, geometric shapes, symbolic metaphor, (calm and rational mood), high detail, 4k, Negative Prompt:complex patterns, 3D render, glossy effects, neon colors, handwritten fonts, chaotic composition

remote holly
#

where are medium controlnets !?

steel remnant
#

sdsad

#

Prompt:(minimalist logo design), (granular texture), (fading gradient), (data visualization elements), muted color palette,
clean background, geometric shapes, symbolic metaphor, (calm and rational mood), high detail, 4k, Negative Prompt:complex patterns, 3D render, glossy effects, neon colors, handwritten fonts, chaotic composition

tribal monolith
#

prompt:A hyper-realistic portrait of a young man with delicate facial features, holding a cup of coffee in a cozy café. His hands are elegantly positioned, with natural-looking fingers and realistic skin texture. A newspaper with the headline "AI Revolution" is visible on the table beside him, with sharp and readable text. The café background has warm lighting and a blurred effect for depth.

dim sigil
#

PROMPT: Create hyper-realistic background image designed for use in a video. The scene features professional-grade lighting with a warm, inviting atmosphere. A sleek modern desk is positioned to the right, camera left angle, complementing the overall aesthetic. The background has a blurred yellow neon effect, adding depth and cinematic appeal. The composition is clean, with no people present, ensuring a seamless integration into video production.

muted cargo
#

prompt : an image showcasing how there is no image generation bot active in this channel

rain current
mossy prawn
dry wave
# rain current sd3.5-large

looks good. My own results with sd3 large so far are rather disappointing. What's the trick to make it look so good?

dull star
rain current
#

3.5L, sorry

craggy crest
#

SD 3.5 large

craggy crest
tardy imp
#

prompt : beans

dry wave
#

also, I thought the big advantage of sd3.5l would be negative prompts - but it has issues with them

craggy crest
dry wave
#

it's also good with complex prompts

#

but I'm still experimenting. Haven't found the sweet spot for sd3.5 yet

bitter hearth
#

it worked better in the official demo on huggingface than in Comfy and I am not sure why

wild adder
#

Replace the background of this pair of shoes so that they are sitting on a wooden floor

dusky thistle
#

SD35M

#

bongsampled RES_3S with pseudoimplicit guidance

stone wasp
#

prompt : beans

muted cargo
#

prompt : no generation robot available here, check out #artisan-faq

bitter hearth
#

I didn't like the subject matter but I liked the technicals a lot
this is rly impressive

#

the way that green guy flies up at the start is good

#

its hard to make the video models do movement directly towards camera

dusky thistle
maiden pasture
#

circle0624

jagged gate
tribal thistle
#

A long-haired girl leaning on a mailbox, standing on a busy 1940s Shanghai street, with a few pedestrians walking and vendors setting up stalls on both sides, grayscale, high resolution, slightly blurred background.

jagged gate
past cipher
# jagged gate

Are you making assets for a game? Because a lot of these look like they would fit right in with a puzzle game.

dusky thistle
dusky thistle
#

all bongsampled sd35 medium

dusky thistle
jagged gate
granite cove
#

bear

#

Original SD3 (left) vs SD3.5 Medium (right), on the "Juggernaut XL Model Card Lady Prompt". 30 steps / DPM++ 2M SGM Uniform / CFG 4.5 / same seed for both. Eyes are focused oddly in the original SD3 one but the overall image is aesthetically way closer to what I'd want
It's a somewhat unfortunate recurring trend I've found after using both for a while now
SD 3.5 Medium is definitely compositionally way more coherent for photographic gens (moreso for non-closeup full body stuff) but it trends far more towards a sort of fake airbrushed look aesthetically than the original SD3 did
A model with OG SD3 aesthetics but 3.5 Medium coherence would be essentially perfect lol

dusky thistle
proven pecan
icy drift
icy drift
jagged gate
icy drift
#

Tried to make a fox playing with a butterfly, but forgot to change my prompt. "A plasma orb UFO full of lightning hovers slowly above a rustic barn at night. The light from the plasma orb UFO illuminates the scene with a silvery glow. The plasma orb UFO flies away in a flash."
Poor guy. Zapped by a butterfly ufo.

tidal oasis
#

1

ancient plume
#

A close-up of a glowing, fiery Sun with bright orange and yellow flames swirling on its surface. Solar flares shooting out, creating a mesmerizing effect. Space in the background with small distant planets visible.

amber nest
fervent tapir
#

A serene and introspective scene of a young adult sitting cross-legged on a cozy bed in a softly lit room. The person is holding a leather-bound notebook in one hand and a pen in the other, deeply focused on writing. Their expression is thoughtful, with a slight smile, as if recalling vivid dreams. The room is warm and inviting, with soft morning light streaming through sheer curtains. A cup of steaming tea sits on a nightstand nearby, and a few books are scattered on the bed. The atmosphere is peaceful and reflective, emphasizing the act of self-discovery and mindfulness. The art style is realistic with soft, dreamy lighting, capturing the quiet beauty of the moment.

queen edge
#

Intricate dragon and phoenix embracing a candle flame, traditional Chinese ink painting style, gold and crimson colors, flowing ribbon with company name

devout schooner
# craggy crest Sd 3.5 medium is more artsy

I've assumed that was the case mostly
let me tell you though, from the perspective of someone who has actually very very very extensively tried to train Loras for SD3 originally a bit and now more recently SD 3.5 Medium (and still is)
it's not, in fact, "easy to train" in any way shape or form relative to SDXL (or Kolors)
"easy to train" would mean I could mindlessly use the exact same UNET LR 1.0 / TE LR 1.0 / Cosine Scheduler / Prodigy optimizer settings for literally any dataset and they would be 100% guaranteed to produce desirable results every single time without fail no matter what (as is the case for all UNET based models)
and it'd also mean that the extremely annoying exploding gradient thing wouldn't be a problem that existed at all (as it also wasn't in any way for UNET-based models)
TLDR Basically from the perspective of an enduser / "finetuner", DiT as a general architecture seems rather flawed in all honesty, as in practice you only notice the numerous blatant downsides (can't do normal hi-res-fix in the way people have come to expect, is limited to very mediocre sampler / scheduler combos, and so on and so on and so on), you do not notice any of the upsides of the architecture that (supposedly) exist joeshrug

craggy crest
#

SD3-2b-medium was released as an unfinished beta - it is missing a lot of the fine tuning that normally goes into a model, and was only released to assure the community that SAI is still interested in being open source

#

SD 3.5 has all the fine tuning, and we worked very hard to make sure it was very easy to train

devout schooner
# craggy crest SD 3.5 has all the fine tuning, and we worked very hard to make sure it was very...

I think you missed my overall point
e.g. left / first is stock Flux Dev, middle / second is stock Kolors, right / third is Kolors with a photo Lora I trained (on the same seed)
the Flux output is arguably significantly less sensible composition wise, and certainly the ONLY place it has any kind of real advantage is in being rendered with a 16-channel VAE
a hypothetical Kolors with a 16-channel VAE would make all versions of Flux and all versions of SD 3 / SD 3.5 look like absolute jokes comparatively speaking in terms of output-quality-to-overall-ease-of-use-and-resource requirements

#

a company that makes a model that in practical terms functions EXACTLY like SDXL, but with a 16-channel VAE and a better text encoder
WILL make a kravillion dollars day one
is all I'm saying

past cipher
craggy crest
devout schooner
# devout schooner a company that makes a model that in practical terms functions EXACTLY like SDXL...

(additionally you don't want to see the SD 3.5 Medium outputs for this prompt because it's almost completely impossible to not have her fingers be melty noise weirdness)
I like the photographic realism of all versions of SD3, comparatively to Flux
but the weird, weird noise issues it has even in 3.5 are just super annoying
but again it's not really about SD 3.5 in particular
it's about the supposed advantages of DiT as an actual architecture not actually being visible in any way in any model that anyone has ever released

#

in practical terms, at least

devout schooner
craggy crest
# devout schooner re-read what I said, I guess, I don't think you got my overall point which is t...

of course they do. but you said "a model that in practical terms functions EXACTLY like SDXL, but with a 16-channel VAE and a better text encoder, WILL make a kravillion dollars day one" and i'm saying that people are in ruts. those that like sdxl will just use sdxl. and everyone is already getting into a rut with the other shiny toys. by the time someone does that, and no one's likely to now, no one will even look at it

devout schooner
past cipher
devout schooner
#

i'm just saying like, from a practical third-party training perspective and inference perspective
absolutely no extant DiT model actually has any architecture-specific advantages that are visible regardless of what advantage they might have on paper
in particular the supposed better support for multi resolution seems like absolute fiction in practice
because the whole image just getting ugly artifacts even if the composition might be perfect, when going outside the trained resolution range, is far less easy to "fix" than just re-rolling a seed if you get like an extra foot or something, and just generally way less preferable as an outcome
and also because you could already just go ahead and train UNET models at whatever res you wanted, even if it was beyond their original training res

#

basically what I meant was, assuming people WOULD actually use it, hypothetically, the practical manner in which SDXL functioned from an inference and training perspective was completely perfect
so the "perfect model" would in theory be one that was not in any way different in those regards
but just had a better VAE and stronger text encoder

#

the officially supported samplers for DiT are a notable pain point too
absolutely nobody would ever use Euler SGM Uniform or DPM++ 2M SGM Uniform if they didn't have to
because they're just really not very good in comparison to e.g DPM++ SDE Normal or DPM++ 3M SDE Exponential or what have you

#

ComfyUI hacks to make the Ancestral ones work help a lot in that regard but it's still not perfect

#

so that just again seems like a design flaw

dusky thistle
dusky thistle
#

SD35 medium

devout schooner
dusky thistle
#

Res_2s is the same speed as the old dpmpp_sde and is pretty special with bongmath on

devout schooner
# craggy crest of course they do. but you said "a model that in practical terms functions EXACT...

one other thing I should mention (again, as one of the few people I think who has actually very painstakingly trained the same datasets over and over and over again on both the original SD3 and SD 3.5 Medium just as I'm the sort of person who actually enjoys fiddling with this sort of thing)
they have BIG issues picking up the likeness of single subjects with any remotely obvious training settings
which is what most people are going to check first

it's not impossible to get good results, but you're literally basically limited to the CAME optimizer (AdamW and Prodigy seem to be total dead ends for single subjects for reasons that aren't really clear to me at the moment)
and also training as Dora instead of Lora (at a low "factor", no higher than 2 - 4) with 64 Dim / 32 Alpha is pretty much a necessity in particular for photographic datasets as far as single subjects go
and lastly due to the annoyingly-rigid-and-not-actually-better-in-any-visible-way manner that resolution works in DiT models, to avoid artifacting you kinda have to (I'm referencing SD 3.5 Medium again specifically, here) train at a BASE resolution of 1440x1440 with images that are all equal to or higher than that resolution in the first place and bucketing enabled to sort them properly
simply to avoid severe degradation of base model knowledge

figuring out literally all of that by myself by training the same lora about a zillion times over was the only way I was eventually able to get this pretty accurate Sydney Sweeney likeness, for example

the overwhelming majority of people will never go to the lengths I did, they will just immediately throw a model in the garbage if it doesn't perfectly and predictably learn the likeness of XYZ single-subject with very obvious default settings with absolutely no potential for "exploding gradient" whatsoever (as is the case for all UNET based models)

so that's again what I really meant by "easy to train", no DiT model comes anywhere remotely close in that regard (not even Flux, because degradation of base model knowledge is still a huge issue there and you also typically need about 2x as many steps as any UNET model did to get good results)
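
For anyone wondering what the 64 Dim / 32 Alpha setting above does numerically: a minimal sketch, assuming the trainer follows the common LoRA convention of scaling the low-rank update by alpha / dim. It only covers the plain LoRA part (the extra DoRA magnitude step is left out), and `lora_scale` / `apply_lora` are made-up helper names.

```python
def lora_scale(alpha, dim):
    """Multiplier applied to the low-rank update (common LoRA convention)."""
    return alpha / dim


def apply_lora(weight, down, up, alpha):
    """Return weight + (alpha / dim) * (up @ down) using plain lists."""
    dim = len(down)                       # rank = number of rows in `down`
    scale = lora_scale(alpha, dim)
    rows, cols = len(weight), len(weight[0])
    out = [row[:] for row in weight]      # copy so the base weight is untouched
    for i in range(rows):
        for j in range(cols):
            delta = sum(up[i][r] * down[r][j] for r in range(dim))
            out[i][j] += scale * delta
    return out


print(lora_scale(32, 64))   # 0.5: the learned update is applied at half strength
```

With alpha at half of dim, the update is halved before it is added to the base weights, which acts like a gentler effective learning rate for the adapter.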

devout schooner
past cipher
dusky thistle
#

guide image

#

output (WF embedded)

#

they will take longer as you go from stuff like 2s to 3s to 4s, but the quality will sometimes go up spectacularly

#

with medium i like using stuff like res_3s and res_5s

#

adding a bongmath implicit step will make it take longer but can also really improve things

dusky thistle
bitter hearth
#

and so models are being made to handle less and less variance over time

#

this does allow them to train easier and get bigger, but then when you use them the sampling is more restricted

dusky thistle
fallen marsh
#

👍

dapper stream
#

/generate

violet escarp
gusty trail
violet escarp
#

and it also learns slowly. It's not as bad as sd3 though since it uses more efficient arch

#

Also more efficient than Flux, but better arch isn't enough to make up for Lumina's size

#

I heard Lumina team is going to release a bigger version though eventually

#

The efficient arch being that it doesn't use fused transformers, which sd3 and flux do use btw

#

Flux also wastes 3b on encoding timestep embedding

dry wave
#

VAE channel count has nothing to do with model size

#

you only have it in the input and output, that doesn't matter

#

it might be true that training on a larger vae takes more time, though, as it preserves more fine details which are often hard to learn. But I don't think that this is the reason why models take more time to train

#

I mean, the main reason why Flux is taking so much time to train is probably that it is not a CFG model
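
For reference, this is roughly what a "CFG model" does at every sampling step: a minimal sketch of standard classifier-free guidance on plain lists, with `cfg_combine` as a made-up helper name. As far as I know, Flux dev is guidance-distilled instead, so it takes the guidance strength as an input and runs a single pass rather than this two-pass combination.

```python
def cfg_combine(uncond, cond, cfg_scale):
    """Push the prediction away from the unconditional one toward the prompt."""
    return [u + cfg_scale * (c - u) for u, c in zip(uncond, cond)]


uncond_pred = [0.0, 1.0]   # model output with an empty / negative prompt
cond_pred = [1.0, 3.0]     # model output with the actual prompt
print(cfg_combine(uncond_pred, cond_pred, 1.0))   # same as cond_pred
print(cfg_combine(uncond_pred, cond_pred, 2.0))   # extrapolated past cond_pred
```

A scale of 1.0 just returns the conditional prediction, and higher values exaggerate the difference between the two passes.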

devout schooner
# dusky thistle

gonna try these now
would you say CFG for the SDE-alike ones kinda "scales" in the same way it originally did? Like for example I would usually run DPM++ SDE GPU Normal at around CFG 5.0
or DPM++ 3M SDE GPU Exponential at around CFG 4.0

devout schooner
dusky thistle
#

yeah should be fine

#

i usually do cfg around 5.5 with medium

devout schooner
# bitter hearth I totally agree with most of what you are saying giant 16 channel Kolors would b...

I guess the gist of my point was again it doesn't really seem in practice like any existing newer model is "better" specifically because of being DiT and having XYZ more parameters than any given older UNET model
the improved text encoders and higher quality VAEs seem to do the overwhelming majority of the heavy lifting
and then there's various factors that come off as straight-up regressions in practice with DiT
like the whole "the image just immediately begins to artifact randomly when you go outside the training range" thing
so if your max is just 2MP like on SD 3.5 Medium you kind of have to train loras at that resolution to begin with just to have at least some hi-res-fix headroom when coming up from a generation at a more standard lower res (because the artifacting problem doesn't happen in reverse, e.g. it can scale down fine seemingly, just not up)

devout schooner
#

BongSample 2S definitely very nice

#

is there any particular one you recommend using for half-strength-denoise hi-res-fix passes? E.G. typically I would tend to do DPM++ 2M Simple at 0.5 denoise strength and the same CFG of like 5.0, for hi-res-fix on an image generated with DPM++ SDE GPU Normal

devout schooner
dusky thistle
#

yeah multistep is pretty good for when you want the image to stay similar

#

you might like some of the guide stuff too, that can help with that

dusky thistle
dusky thistle
dusky thistle
bitter hearth
#

some of the issues in this conversation were more to do with rectified flow loss, and its possible to have DiTs that don't have that

#

e.g. Pixart Sigma or Flag DiT

devout schooner
bitter hearth
#

I wish big Pixart came but it did not come out

#

Pixart team became Sana

devout schooner
devout schooner
bitter hearth
#

its worse quality but its a lot faster and less resource intensive

devout schooner
#

it uses more memory and the encodes / decodes are a lot slower

bitter hearth
#

are you including the diffusion time

#

its over 100x faster than flux

devout schooner
# bitter hearth are you including the diffusion time

i'm talking about like
in ComfyUI
just the actual like, "write image to file" decode, when the image is done
was WAY slower with Sana when I tried it
than the same kind of decode with the 16-channel SD3 or Flux VAE is
and the initial load is a bit longer too of course because like I said the physical file is much bigger than the SD3 / Flux VAE files

bitter hearth
#

oh I see

#

yeah if you don't include the diffusion time or the diffusion model vram then Sana is slower and more vram-heavy

#

to just decode a latent with the vae

devout schooner
bitter hearth
#

there are some niche areas of machine learning where you only use a VAE

#

so yeah for those it could be worse

past cipher
bitter hearth
#

LOL

#

I read that some people use VAE encode/decode to store images

#

is a cool idea

#

although if you were gonna do that, the greater compression ratio of Sana might be good

devout schooner
bitter hearth
#

Gemma is faster than T5 though

devout schooner
#

it could be a node code issue

#

not sure really

bitter hearth
#

oh a Q8_0 quant of T5 encoder is indeed slightly smaller than Gemma encoder apparently

devout schooner
#

again it could be that the "ExtraModels" code for "Gemma Loader" is just slow in some way relative to the City96 GGUF loader also
I don't know

bitter hearth
#

its normal for Q8_0 T5 to be smaller than Gemma

#

its just that you are comparing a quant version to an unquant version

#

which is not a fair comparison

devout schooner
#

but there wasn't one that worked with that whole ExtraModels node system I don't think

bitter hearth
#

I had a look and there are two other implementations of Sana

#

the official one, and one by the SVDQuant team, both are in Diffusers though

devout schooner
#

in other news it seems like Clownshark stuff makes Camera Lady prompt work a lot better, so far 👀

#

in SD 3.5 Medium that is

#

normally her fingers kinda just melt

#

SLG helps but it's usually too high contrast for this one for some reason
so getting a good result without it is pretty cool

bitter hearth
#

yeah the noise scaling is better

#

it makes SD 3.5 and flux look nicer

devout schooner
gusty trail
#

new 6B model with 16c vae and apache license

past cipher
# gusty trail https://huggingface.co/THUDM/CogView4-6B

No, just no. The requirements are too high

Resolution    | enable_model_cpu_offload OFF | enable_model_cpu_offload ON | enable_model_cpu_offload ON, Text Encoder 4bit
512 * 512     | 33GB                         | 20GB                        | 13GB
1280 * 720    | 35GB                         | 20GB                        | 13GB
1024 * 1024   | 35GB                         | 20GB                        | 13GB
1920 * 1280   | 39GB                         | 20GB                        | 14GB
2048 * 2048   | 43GB                         | 21GB                        | 14GB
bitter hearth
#

you could use cloud

#

my first CogView4 image

devout schooner
past cipher
devout schooner
bitter hearth
# devout schooner what was the prompt for this one

Cinematic movie still of a majestic dragon resting in a dense, misty forest. The dragon’s scales glisten under soft, diffused light filtering through towering ancient trees. Mist swirls around its massive wings, and glowing embers float in the air, hinting at its fiery breath. The scene is captured in a dramatic wide-angle shot with rich cinematic lighting, deep shadows, and a shallow depth of field, evoking a sense of awe and realism. Ultra-detailed textures, realistic foliage, and a filmic color palette enhance the immersive atmosphere.
but I pressed prompt enhance button as well

bitter hearth
#

you can calculate VRAM estimates btw

#

they won't be 100% accurate but

#

VRAM use is highly linked to the parameter count
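A back-of-the-envelope version of that estimate, as a sketch only: weight memory is roughly parameter count times bytes per parameter. The 6B figure is the model size mentioned earlier in the chat, and the dtype table is a simplification. Activations, the text encoder, the VAE and framework overhead all come on top, so real usage will be higher.

```python
# Approximate storage cost per parameter for common precisions.
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "int8": 1.0, "4bit": 0.5}


def weight_gib(n_params: float, dtype: str) -> float:
    """Memory needed just to hold the weights, in GiB."""
    return n_params * BYTES_PER_PARAM[dtype] / 2 ** 30


# 6B parameters, as quoted for the model linked above; weights only.
for dtype in BYTES_PER_PARAM:
    print(f"6B params at {dtype}: {weight_gib(6e9, dtype):.1f} GiB")
```

This is a lower bound rather than a prediction, which is why it won't be 100% accurate, but it is usually enough to tell whether a model has any chance of fitting on a given card.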

devout schooner
#

I got this for a quick 25-step gen with SD 3.5 Medium, just using Euler Ancestral Beta at CFG 6.5, no fancy Clownshark stuff

bitter hearth
#

its nice yeah

devout schooner
# bitter hearth its nice yeah

it's not really a great prompt to test with I don't think, neither your gen nor mine is really "better" than this one I just did with Base SDXL and two loras lol

bitter hearth
#

I know I don't rly know why you wanted it lol

devout schooner
#

try Camera Lady on it: a close-up photograph of a young woman holding a vintage camera in front of her face. She is looking directly at the viewer with a serious expression on her face, as if she is taking a photo. The camera is silver in color and has a large lens attached to it. The woman has long dark hair and is wearing a black top. The background is blurred, so the focus is on the camera and the woman's face. The lighting is soft and natural, highlighting her features.

devout schooner
bitter hearth
#

the demo broke for me sadly

#

it just keeps going 120 seconds plus

#

it's a common bug with Gradio demos

gusty trail
#

Kolors is underrated.

devout schooner
gusty trail
devout schooner
#

it'd have to be kind of separate like that

#

my Lora also doesn't have a text encoder component at all anyways since I wasn't going to try to train ChatGLM obviously, it's just the UNET part
which seems to be all you need

#

I actually don't think a ComfyUI node that could load a Kolors Lora with some sort of ChatGLM part even exists lol
nor am I sure you even could train a Lora like that

#

now that I think about it

bitter hearth
#

I would still recommend Chinese prompting but I don't think you need it yeah

#

wow OpenKolors looks great, thanks for this

gusty trail
#

Personally, I would say it is much better than the official one in most cases.

bitter hearth
#

yeah for general use it's better

#

I really love the base Kolors style but base Kolors is artistic

#

OpenKolors looks better for photography

devout schooner
# bitter hearth OpenKolors looks better for photography

I really should release my Kolors photo Lora, I keep meaning to lol
it's pretty good IMO
a photograph of a woman standing at a formal event. She is a young woman with a light skin tone, striking green eyes, and a slender, athletic build. She has blonde hair styled in loose, wavy layers that fall just below her shoulders, with a subtle, elegant updo at the back. She has a natural, glowing complexion and wears a sophisticated makeup look featuring a nude lipstick, subtle blush, and well-defined eyebrows. She is dressed in a luxurious, off-the-shoulder, white satin gown with a deep V-neckline, accentuating her cleavage. The gown is made of a silky, shimmering fabric that catches the light, giving it a radiant appearance. She wears an elaborate, multi-layered necklace adorned with large, sparkling diamonds that cascade down her chest, complementing the neckline of her dress. The necklace is paired with matching diamond earrings. The background is a vibrant red wall with abstract geometric patterns in shades of red and white, creating a bold, modern aesthetic.
base Kolors is first one, Lora @ 0.7 strength is second one
same seed and everything

bitter hearth
#

it's more realistic yeah

devout schooner
#

this is with Lora but the prompt translated to Chinese
which interestingly I guess kinda still works
even though the Lora is not captioned in Chinese
not quite as good though, it misses e.g. She wears an elaborate, multi-layered necklace adorned with large, sparkling diamonds that cascade down her chest, complementing the neckline of her dress

bitter hearth
#

that helps as well yeah if the prompt adherence improves

devout schooner
# bitter hearth that helps as well yeah if the prompt adherence improves

I mean I wasn't going to bother Google Translating all of my made-with-Joy-Caption-and-then-manually-edited captions lol
since I wouldn't know if the Chinese caption result was even good
so seemed to make more sense just to do it in English and improve the (already pretty good overall) English support in the model

bitter hearth
#

it might take a lot of compute though for the English side to catch up

#

is the issue

devout schooner
#

the results for photographic stuff specifically aren't really much if any better on stock Kolors when translated to Chinese anyways, when I've tested that before

bitter hearth
#

ah okay I had read the Chinese side was better but maybe it is not

devout schooner
#

another one lol
a high-resolution photograph featuring a young woman of Asian descent with a radiant smile, standing in front of a classic white Porsche sports car. She is wearing a shimmering silver bikini top that accentuates her medium-sized breasts and white, frayed denim shorts that are unbuttoned, revealing her toned abdomen. Her long, wavy black hair cascades over her shoulders, and she accessorizes with a delicate necklace, a wristwatch, and several bracelets on her left wrist. In the foreground, she is pointing a black handgun directly at the camera, creating a sense of excitement and boldness. The background features a clear blue sky, tall palm trees, and a desert landscape, suggesting a warm, sunny location, possibly California or Arizona. The car's sleek, glossy surface reflects the bright sunlight, adding to the vividness of the scene. The overall mood is playful and adventurous, with the woman exuding confidence and a sense of fun. The image captures a moment of high energy and boldness, blending elements of fashion, adventure, and a classic car aesthetic.

gusty trail
devout schooner
devout schooner
gusty trail
#

Of course it could

#

I am just more focused on Chinese characters

dry wave
# devout schooner I guess the gist of my point was again it doesn't really seem in practice like a...

In general I agree with you. I always said the UNet architecture is not as bad as many people think, and I had a lot of discussions with people who didn't understand that a DiT architecture is just simpler but not fundamentally different from a UNet. I do think that Flux sometimes has an amazing understanding and logic in its generation, so maybe there is something in the MMDiT, though. This becomes apparent if you let it generate multi-part images ("give me a technical sketch of a building on the left and the very same building as photography on the right").
Regarding resolution: SD 3.5 just has a very shitty resolution handling. I wouldn't say that resolution is a general problem, it's just SD.

#

but I think the problem is: we never have fair benchmarks where different architectures are compared on exactly the same training data. So it's never clear if model A is better because of its architecture, its training data, or its parameter count

bitter hearth
#

the 1k version of Sana removed the positional embeddings and it went ok

#

positional embeddings seem to be the main UNet vs. DiT difference

#

they added them back in for the 2k to 4k Sana though

dry wave
#

yes. But I think positional embeddings make total sense. Why let the UNet learn them itself?
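For anyone unfamiliar with what is being handed to the model here, a minimal sketch of a fixed 2D sine/cosine positional embedding for a grid of patch tokens. This is the generic textbook scheme, not the specific embedding used by Sana, Flux, or any other model mentioned in this chat, and the function names are made up for illustration.

```python
import math


def sincos_1d(pos: int, dim: int) -> list[float]:
    """Sinusoidal embedding of one integer position into `dim` values (dim even)."""
    half = dim // 2
    freqs = [1.0 / (10000 ** (i / half)) for i in range(half)]
    return [math.sin(pos * f) for f in freqs] + [math.cos(pos * f) for f in freqs]


def sincos_2d(grid_h: int, grid_w: int, dim: int) -> list[list[float]]:
    """One embedding per patch in row-major order.

    The first half of the channels encodes the row index and the second half
    encodes the column index, so every grid cell gets a distinct vector.
    """
    assert dim % 4 == 0, "dim must be divisible by 4"
    return [
        sincos_1d(row, dim // 2) + sincos_1d(col, dim // 2)
        for row in range(grid_h)
        for col in range(grid_w)
    ]


# A 2 x 3 patch grid with 8 channels per token.
emb = sincos_2d(grid_h=2, grid_w=3, dim=8)
print(len(emb), len(emb[0]))
```

Nothing here is learned: the table is computed directly for whatever grid size is requested, so the network does not have to infer position on its own.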

bitter hearth
#

I agree, I don't use Sana, I wish Sana worked because I care mostly about speed

#

but I can't get adequate quality out of it

#

there are some resolution flexibility advantages from not having pos embeds

#

but I don't think that matters much because Flux goes to 8k without tiling (e.g. in the CLEAR paper)

#

there was some fast 3x3 conv model on arXiv last year but it's like SD 1.5 quality at best

dry wave
#

like you can usually increase resolution in pure transformer models without everything falling apart (see Pixart for example)

#

while convolution models get a lot of artifacts when increasing resolution (double heads, elongated necks and so on)

candid latch
#

How do I use this? Good morning xd

bitter hearth