#🆕|sd3

bitter hearth
#

he definitely looks like he is holding it in second one

craggy crest
#

welcome :)

bitter hearth
#

I'm going the other direction, trying to involve an LLM every time now

#

they can do other things than just prompts, for example bounding boxes

craggy crest
bitter hearth
#

yeah SD 1.5 is wild

#

I really love that model

craggy crest
bitter hearth
#

the LLM captioned ones (basically all the modern ones) never quite captured the chaos of SD 1.5

craggy crest
#

see what you can do with it

pseudo owl
bitter hearth
#

thanks I saved this one

#

on SDXL "colorful background:1.3" is the best hidden gem I have found I think

#

for some reason the model listens to it a lot

craggy crest
#

change the numbers and see what it does

bitter hearth
#

oh yeah I'm just using that to communicate it
in comfy you have to make sure the strength increase is actually done how you intend cos so many nodes differ
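
A toy sketch of why the same `:1.3` can behave differently from node to node. It assumes two common strategies: scaling the token embedding directly, or pushing it away from the empty-prompt embedding. The function names and the two-number "embeddings" are made up for illustration and are not any node's actual code.

```python
# Hypothetical illustration: two ways a "(text:1.3)" weight can be applied.

def weight_by_scaling(token, weight):
    """Multiply the token embedding by the weight directly."""
    return [x * weight for x in token]


def weight_by_interpolation(token, empty, weight):
    """Move away from the empty-prompt embedding by `weight` times the offset."""
    return [e + (x - e) * weight for x, e in zip(token, empty)]


token = [1.0, 2.0]  # made-up embedding for "colorful background"
empty = [0.5, 0.5]  # made-up embedding for the empty prompt

scaled = weight_by_scaling(token, 1.3)
interpolated = weight_by_interpolation(token, empty, 1.3)
print("scaled:      ", scaled)
print("interpolated:", interpolated)
```

Both strategies agree at weight 1.0 and drift apart as the weight grows, so the same number is not guaranteed to mean the same strength in every node.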

craggy crest
pseudo owl
bitter hearth
#

yeah that's the sort of thing

#

SD 1.5 one

#

livens them up a bit

halcyon yarrow
#

@bitter hearth LLMs used for object detection, prompt enhancing and now embedding generation? it's like they're taking over the image gen scene!

bitter hearth
#

mostly OpenAI models or Florence 2

bitter hearth
#

there was a competition won by someone using LLM agents with Powerpaintv2 which may be better, not sure

#

or progressive outpainting by LLM agents, there's been a couple papers on that but they weren't compared here

#

feels like the "Soft Refinement" stage could also be applied to inpainting workflow 🤔

pseudo owl
# bitter hearth yeah this is SOTA, currently I think

The only thing I don't like is that inference time grows considerably with more masks, not as much if you use a 4-step/8-step lora but still a pretty large amount. Any link to the LLM agent with Powerpaintv2 or progressive outpainting? That seems pretty interesting.

craggy crest
bitter hearth
#

looks tasty I guess

bitter hearth
craggy crest
#

llm is never going to think about coming up with a prompt like that

bitter hearth
#

not even clownsampler

craggy crest
bitter hearth
craggy crest
bitter hearth
#

I'm actually not an open source enthusiast personally
although I understand the motivations behind the movement

craggy crest
#

he's a scam artist from what i've seen

bitter hearth
#

have definitely seen some shenanigans in the news regarding that company

craggy crest
bitter hearth
#

yeah I find the name pretty funny, it does fit

craggy crest
#

if you want a good llm, use claude from anthropic, or meta.ai - or one of the opensource llama versions

bitter hearth
#

they said they still won't open source the original GPT 3.5 because it is too dangerous

#

even though it's weaker than some 7B models now

craggy crest
bitter hearth
#

I still need to try claude yeah

craggy crest
#

and until that contract is over - and it won't be over until they succeed in developing AGI - microsoft gets their technology for free in exchange for giving them access to their data centers, also for free

rapid pivot
#

Where images

craggy crest
#

so until they aren't in microsoft's backpocket, they do what small and limp says to do

rapid pivot
craggy crest
#

i'm turning them into videos

bitter hearth
rapid pivot
#

Videos are still images

#

waow 👍

craggy crest
rapid pivot
bitter hearth
#

on the right? yeah

#

on the right is real life LOL

rapid pivot
#

What's happening there

craggy crest
bitter hearth
craggy crest
pseudo owl
bitter hearth
#

yeah this is huge

craggy crest
craggy crest
bitter hearth
#

not sure
it doesn't feel like ML changed that much in 2024 compared to a year ago

#

has grown less than I expected

craggy crest
bitter hearth
#

next year will be video year yeah

craggy crest
#

things are lining up for that push now

pseudo owl
bitter hearth
#

I don't know about robots, don't follow that area

#

I guess this model is gonna need A100 80GB for Comfyui

craggy crest
#

don't prune your model, remove the trash from your data training set

#

cute adorable big-eyed happy chirping, fluffy baby bird by artist "jacek yerka", by artist "Jasmine Becket-Griffith"'

bitter hearth
#

looks pixar

craggy crest
#

try it in sd1.5 ;)

bitter hearth
#

yeah will have a go next time I get server

rapid pivot
#

Quick tell me @craggy crest what sampler and cfg for 3.5

craggy crest
rapid pivot
#

Linear who

craggy crest
#

unless you have an old version of comfy

rapid pivot
#

Sounds like math

#

I don't see this option here

craggy crest
#

linear quadratic

rapid pivot
#

Surely karras will be fine

craggy crest
#

with euler ancestral

craggy crest
rapid pivot
#

sadcat what is this

craggy crest
bitter hearth
craggy crest
#

simple or beta

bitter hearth
#

the Karras schedule is not good for SD 3.5, Auraflow or Flux
because it takes steps that are too big early on

#

these models want schedules that take small steps early on in the process

bitter hearth
#

increasing shift helps with this
switching to beta also helps, and linear quadratic helps massively

#

you need to take care as small steps early on necessarily means large steps later

#

so there are limits to how much you can focus your steps in the early stages
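
A rough sketch of the step-size argument above, assuming a flow-style noise range from 1 down to roughly 0. The `sigma_min=0.01` value, the 10-step count and the helper names are assumptions for illustration, and the beta and linear quadratic schedules are not modelled here. The shift formula is the SD3-style warp `shift * t / (1 + (shift - 1) * t)`.

```python
def karras_sigmas(n_steps, sigma_min=0.01, sigma_max=1.0, rho=7.0):
    """Karras-style spacing: interpolate linearly in sigma ** (1 / rho)."""
    lo, hi = sigma_min ** (1 / rho), sigma_max ** (1 / rho)
    return [(hi + i / n_steps * (lo - hi)) ** rho for i in range(n_steps + 1)]


def shifted_sigmas(n_steps, shift=3.0):
    """Uniform timesteps from 1 to 0, warped by the SD3-style shift."""
    sigmas = []
    for i in range(n_steps + 1):
        t = 1.0 - i / n_steps
        sigmas.append(shift * t / (1.0 + (shift - 1.0) * t))
    return sigmas


def step_sizes(sigmas):
    """How far each sampling step moves along the noise axis."""
    return [a - b for a, b in zip(sigmas, sigmas[1:])]


n = 10
karras = step_sizes(karras_sigmas(n))
uniform = step_sizes(shifted_sigmas(n, shift=1.0))  # shift 1.0 = no warp
shifted = step_sizes(shifted_sigmas(n, shift=3.0))

print("first step  karras / uniform / shift 3:", karras[0], uniform[0], shifted[0])
print("last step   uniform / shift 3:         ", uniform[-1], shifted[-1])
```

Under these assumptions the Karras spacing takes the largest first step, and raising shift shrinks the first step at the cost of a larger last step, since the steps still have to cover the whole range.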

rapid pivot
#

This looks like it's working

#

Emo girl red room

#

I'm smart prompt

bitter hearth
#

its good

craggy crest
bitter hearth
#

if Sana comes out it will be a bit more flexible

#

they removed the positional embeddings that cause Flux grid

craggy crest
rapid pivot
craggy crest
rapid pivot
#

And her hair

craggy crest
#

didn't make her hair red

rapid pivot
#

Looks like my hair if you told me to cut it myself

bitter hearth
#

personally I am the biggest fan of Lumina
it uses rotary embeddings
needs aesthetic fine tune though

craggy crest
#

watches the entire world try to learn to think like a computer... and fail

bitter hearth
#

there are some comfy things that can stop prompt bleeding

#

I don't use the fancier ones as I am not that bothered

craggy crest
bitter hearth
#

yeah you can do it with prompt engineering

#

I use concat conditioning node when things get spicy, it boxes things off

craggy crest
#

you've seen my workflows. they are very small and never any negative prompts. and i don't get bleeding unless i want it

bitter hearth
#

it's been a few months since I last used a negative yeah

rapid pivot
#

I love red, looks squishable

bitter hearth
#

I like how recent models don't have SDXL's trend towards yellow

rapid pivot
#

Now it looks better

#

5cfg waow

craggy crest
rapid pivot
#

I think I upped it too much thomas @craggy crest

#

It's cooked now

craggy crest
bitter hearth
#

AI food is always so good

rapid pivot
#

Let me get you something

bitter hearth
#

Playground v2.5 cakes were amazing

craggy crest
rapid pivot
craggy crest
rapid pivot
#

Those are special jamanderee pineapples

mortal mesa
gritty gale
#

Hello

dusky thistle
winter crater
#

does a1111 support sd3.5?

craggy crest
still wadi
#

Two posters for the Black Friday event

patent acorn
#

if only you'd lend me 9 thousand H100s

rapid pivot
#

What will a bloon do with so many h100s

#

I'm scared

craggy crest
patent acorn
dusky thistle
rapid pivot
halcyon yarrow
# dusky thistle

I love this neon glow in the dark style I wish people made more art like this

dusky thistle
brave tide
#

I overhauled a Comfy Ksampler and built it with T2I. Does anyone want to use this Ksampler?
I'm thinking of making it public if there is a response.
I would like to contribute to the development.
The size is 1024 x 1024.
Created with StableDiffusion 3.5.

craggy crest
dusky thistle
mellow quartz
#

aple

dusky thistle
muted dove
sudden parcel
#

i have stable-diffusion-3.5-large, where do i find the clip files for it?

gusty trail
#

Using IC Lora to create each Chinese character and concatenate all together

dusky thistle
real terrace
#

I haven't used SD3.5 for a while. From all these models, what would you choose, for speed or quality? Or is there some other finetune or model around these days?

pseudo owl
real terrace
pseudo owl
real terrace
real terrace
#

oh I found I downloaded some other gguf

mortal mesa
#

run the full boat one twice if the speed is acceptable stay with it - some guy on the internet

craggy crest
halcyon yarrow
#

lol @craggy crest have you tried running it?

craggy crest
halcyon yarrow
#

what about in lowvram mode?

craggy crest
halcyon yarrow
#

$1.20/hr to rent 80GB vram runpod for an hour. I have $4 in credits I could play with it there but ehh....

#

so he just merged it on itself? i don't get that concept, like I could understand if he merged a bunch of loras in like @short thicket did but even that doesn't increase parameter count. do you understand what he did @craggy crest

craggy crest
bitter hearth
mortal mesa
#

Do you feel like you have too much VRAM lately? Want to OOM on a 40GB A100? This is the model for you!

#

lmao

craggy crest
mortal mesa
#

" Usage: Good luck."

halcyon yarrow
#

I was really curious/interested in what he meant by this sentence: "Merging was done similarly to 70B->120B LLM merges, with the layers repeated and interwoven in groups." so I had a chat about it with chatgpt to better understand it: https://chatgpt.com/share/673cee36-c87c-800f-bc3a-c956e7ff1ac7

halcyon yarrow
#

is that even possible? what's the smallest image you can make with flux and SD3 and the like?

bitter hearth
#

Flux does 100x100

#

this is the sort of self-merge it is based on https://old.reddit.com/r/LocalLLaMA/comments/1aj2jw0/miqu_120b_selfmerge_like_venusmegadolphin/

bitter hearth
#

not even wrong TBH

rapid pivot
#

Then it reveals it is actually Shrek

bitter hearth
#

did you know SD 1.5 and SDXL can also make rly small images like 250x250 or less
if you use Unet Temperature node

#

I only found that last week

craggy crest
rapid pivot
#

I can't wait for the Shrek reveal

bitter hearth
#

someone should secretly finetune shrek into the model

#

but only certain tokens trigger him

halcyon yarrow
# bitter hearth this is the sort of self-merge it is based on ```https://old.reddit.com/r/LocalL...

On one hand, it's amazing that you can improve a model by effectively copying around information that it already contains.

On the other hand, doesn't this suggest that the way inference currently works is suboptimal? If a program like mergekit can produce a 120b model from a 70b model that outperforms that 70b model without needing any additional information, shouldn't it be possible to build this into the inference code itself, and get the performance of the frankenmerge from the 70b model directly, without requiring additional memory?

this is exactly what i was thinking and why i asked chatgpt about it

#

even chatgpt was incredulous this technique would work or offer any benefit and yet it does

bitter hearth
#

deep learning in general is the most suboptimal thing

halcyon yarrow
#

what would happen if we GGUF q2 or q8 the flux 17b model?

bitter hearth
#

would go fine

#

it will fit on 8GB GPU

halcyon yarrow
#

can i run it on 8gb of gpu memory then?! lol

bitter hearth
#

yeah

halcyon yarrow
#

is that what you're going to do?

#

or are you gonna try running the full thing?

bitter hearth
#

no I'm just gonna make R2D2 pictures

halcyon yarrow
#

do you have the requisite vram tho?

bitter hearth
#

ye I rented L40s

#

about $0.8/hr

halcyon yarrow
#

sweet dude, make sure to try the best of the best for everything, t5xxl fp32, don't hold back lol

mortal mesa
#

it's bf16 already, i was gonna convert to fp16 if it was fp32

bitter hearth
#

the t5 can be Q8, it's the same performance as fp32 for inference

rapid pivot
halcyon yarrow
#

whats better bf16 or fp32?

halcyon yarrow
#

[INFO ] model.cpp:793 - load flux.1-heavy-17B.safetensors using safetensors format
[INFO ] model.cpp:1776 - model tensors mem size: 9436.40MB
[INFO ] model.cpp:1811 - load tensors done
[INFO ] model.cpp:1812 - trying to save tensors to flux.1-heavy-17B.q4_0.gguf
convert 'flux.1-heavy-17B.q4_0.gguf' success
Conversion completed in 0 hour(s) 15 minute(s) 19.4 second(s).
Press any key to continue . . .

#

works for me 🙂 @bitter hearth Prompt executed in 80.52 seconds

bitter hearth
#

guys what do we do if Flux Heavy is better LOL

dull star
#

what

#

well we will start crying then

bitter hearth
#

well we have the model, so we don't need to be sad

halcyon yarrow
#

do side-by-side comparisons using the same seed, bc I think heavy is better but it could be considered subjective. it's not amazingly better, right?

bitter hearth
#

that was same seed yeah

#

how did you get a GGUF?

halcyon yarrow
#

i made it

bitter hearth
#

ah okay nice

halcyon yarrow
#

its already up on civit lol

#

here's some samples

#

I'd have more but mochi is hogging the queue right now

rapid pivot
#

What's better about it

halcyon yarrow
# rapid pivot What's better about it

the guy who created it just showed a picture of the base model, vs a picture after it self merged and the after was somewhat better than the base, not much to go on

bitter hearth
#

looks like your GGUF was done correctly, thanks a lot
it does lose a fair bit in Q4 but it works

halcyon yarrow
#

i'm still skeptical about the whole concept of self-merge but it's a thing and it's been demonstrated to actually improve the model so i'm waiting on @bitter hearth to post some side-by-sides

#

yeah the GGUF seems to hold out well compared to the full 17b model

#

can you try testing the full model on complex text? I'm seeing poor results on my end for that. I'm also using a cheap setup so I'm gonna try to push it on that end in a minute after I'm done with the images for the gallery

mortal mesa
halcyon yarrow
mortal mesa
#

ya was good stuff

halcyon yarrow
#

i noticed that if i upload mochi videos to civitai as webp they get treated as images and they get filtered from the images feed and the videos feed so basically they don't get shown

#

ended up having to add another node to convert it to mp4 so i can share it properly

pseudo owl
halcyon yarrow
#

cfg 3 or 3.5, maybe I should set it to 1 since flux-d and therefore this version is distilled and im not using any flux guidance nodes?
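
For reference, a minimal sketch of the usual classifier-free guidance mix, which is what the cfg number controls. The vectors are made-up stand-ins for model predictions and `cfg_combine` is a hypothetical helper. Whether 1 is the right setting for this particular merge is not something the chat settles.

```python
def cfg_combine(cond, uncond, cfg):
    """Classifier-free guidance: start at uncond and move cfg times toward cond."""
    return [u + cfg * (c - u) for c, u in zip(cond, uncond)]


cond = [1.0, 2.0]    # made-up prediction with the prompt
uncond = [0.0, 1.0]  # made-up prediction with the empty/negative prompt

print(cfg_combine(cond, uncond, 1.0))  # equals cond: the uncond pass has no effect
print(cfg_combine(cond, uncond, 3.5))  # pushed past cond, away from uncond
```

At cfg 1 the result is just the conditional prediction, which is why guidance-distilled models like Flux dev are normally run at cfg 1 with the guidance value fed to the model instead.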

pseudo owl
winged seal
# halcyon yarrow here's some sammples

Ok, so this is a compressed repeat layer style merge? Interesting. So the model itself isn't any bigger because it's just cloning the same weights, but inference will be much slower?

halcyon yarrow
winged seal
#

Wait, so how is it only 9.8GB if its 17B params at FP8, yet Fp8 Flux Dev is ~12 GB?
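
A back-of-the-envelope check on the sizes. The conversion log earlier in the chat shows the file was saved as `q4_0`, not FP8. The sketch assumes about 17B and 12B parameters and that every weight is stored in the named format, which real files do not do exactly. The block sizes are the standard GGML ones: 32 weights per block, with a 2-byte scale.

```python
# Bits per weight for GGML block formats: 32 weights per block plus one fp16 scale.
Q4_0_BITS = 18 * 8 / 32  # 16 bytes of 4-bit values + 2-byte scale -> 4.5
Q8_0_BITS = 34 * 8 / 32  # 32 bytes of 8-bit values + 2-byte scale -> 8.5


def size_gb(n_params, bits_per_weight):
    """Approximate file size in GB (1e9 bytes), ignoring metadata."""
    return n_params * bits_per_weight / 8 / 1e9


print("17B @ q4_0:", size_gb(17e9, Q4_0_BITS), "GB")
print("17B @ q8_0:", size_gb(17e9, Q8_0_BITS), "GB")
print("12B @ fp8: ", size_gb(12e9, 8.0), "GB")
```

So a roughly 17B model at about 4.5 bits per weight lands near 9.5 GB, in the same ballpark as the tensor size in the conversion log, while a 12B model at 8 bits lands near 12 GB.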

halcyon yarrow
#

it's a really interesting concept. when I asked chatgpt about it, the LLM described this as how self-merge works:

  1. How It Works
    a. Layer Duplication and Interleaving

Duplication: Each layer of the original model is copied one or more times.

Interleaving: The duplicated layers are interwoven with the original layers in a specific sequence.

For example, consider a simplified model with layers [L1, L2, L3]. A self-merge might result in [L1, L1', L2, L2', L3, L3'], where L1' is a copy of L1.
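
A small sketch of the two layer orderings being discussed, using layer indices only. `duplicate_each` matches the simplified [L1, L1', L2, L2', ...] description above. `interleave_groups` is one guess at what "repeated and interwoven in groups" could look like, with overlapping windows. Both helpers and their parameters are hypothetical, and the actual recipe used for this model is not given in the chat.

```python
def duplicate_each(layers, copies=2):
    """[L1, L2, L3] -> [L1, L1, L2, L2, L3, L3]."""
    return [layer for layer in layers for _ in range(copies)]


def interleave_groups(num_layers, group_size, overlap):
    """Stack overlapping windows of layers, so the overlapped ones run twice."""
    if not 0 <= overlap < group_size:
        raise ValueError("overlap must be >= 0 and smaller than group_size")
    order, start = [], 0
    while True:
        end = min(start + group_size, num_layers)
        order.extend(range(start, end))
        if end == num_layers:
            break
        start = end - overlap  # step back so the next window repeats some layers
    return order


print(duplicate_each(["L1", "L2", "L3"]))
print(interleave_groups(num_layers=6, group_size=4, overlap=2))
```

Either way no new weights are created: the merged model is a longer sequence of references to layers the base model already has, which is why more layers run per step.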

winged seal
#

Yeah, people do it all the time for LLM's, but it never really improves anything, just allows you to post a flashy number

halcyon yarrow
#

so technically it's duplicating the layers right? and then quantizing is rounding off the weights in the layers, so it's almost like we're artificially doubling the size and then putting it in a zip file

winged seal
#

I am not sure what the benefits would be, as flux lite already looks just as good as, has full compatibility with, and runs way faster than flux dev

winged seal
halcyon yarrow
halcyon yarrow
winged seal
#

yeah, benchmarks. Self merging increases biases and patterns, which means over-expressed concepts like information trained in specifically to cheat benchmarks expresses even more

winged seal
halcyon yarrow
#

@bitter hearth is actively testing the 17B model on a rented A40, we'll see if he can come up with anything that can 'wow' us as far as improvements

winged seal
#

base dev without training sucks ass for anything except over baked plastic images 😅

And I say that as somebody who might soon have a job dedicated to training flux lol

halcyon yarrow
#

to be fair i'm using the q_4 model

#

here's some more 512px images using q4 flux heavy 17b

winged seal
#

My research partner and I were able to demo incredible style/concept improvement in dev with proper training, so we are in the stages of securing funding

winged seal
halcyon yarrow
winged seal
#

Our interest is in full flux dev for corporate

halcyon yarrow
pseudo owl
winged seal
#

PixelWave Flux is a monumental improvement for flux across the board

#

a majority of the others are pretty ass though, I will agree. Most people are too aggressive and impatient with training

pseudo owl
winged seal
winged seal
halcyon yarrow
pseudo owl
halcyon yarrow
winged seal
pseudo owl
halcyon yarrow
#

i agree it's the slowest model by far, were you around when i posted that chart with my average model render time?

winged seal
pseudo owl
halcyon yarrow
#

but its one of those "you get what you pay for" situations, if you have the time to do it right and you dont care how long it takes and you're willing to pay however much time it takes for flux to do a good job then flux destill is the way to go

winged seal
#

I know, I am not seeing anything impressive on its page

#

there's like 4 images lol

pseudo owl
#

Probably not the best examples lol, just try it I guess. There is a huggingface space too.

halcyon yarrow
#

its less impressive when seeing an image, try some of your rubric prompts, stuff thats hard to adhere to and images where you see it doesn't always hit all the elements. 9/10 times flux destill will nail a very complex prompt

halcyon yarrow
#

oh sorry you're right

#

look at the Q8 version

#

that's where all the party happens

winged seal
#

Ok, I'm there

#

there's like 10 pics

winged seal
halcyon yarrow
winged seal
#

this one. It missed photograph style cause its not trained in as "photograph"

pseudo owl
# winged seal there's like 10 pics

the q8_0 gguf one? I mean all the examples aren't probably the best and all nsfw but its mostly just flux dev with a bit more detail from my testing.

halcyon yarrow
#

but again w/o context it's hard to judge an image and whether it's any good at adherence

winged seal
halcyon yarrow
winged seal
#

like, I'm just not seeing anything lol

pseudo owl
winged seal
#

pixelwave yeah

halcyon yarrow
#

not bad for pixelwave

winged seal
#

I mean, the prompt adherence is almost perfect, so I am happy haha

halcyon yarrow
#

you see how it missed a lot of the crucial elements tho? try it on destill and you'll see it nail it like 100% not a single thing missed. i swear by destill bc it has adherence above and beyond anything else out there

winged seal
#

like what?

#

white cat on blue dog on brown couch. 4 cow pictures in the window, outside is space, with a UFO. All it missed was the photographic style (cause its not tagged as photograph), and the 4 pictures being in the 4 corners

#

SD3.5 hasn't been able to get this image even partially right for me 😅

pseudo owl
#

I mean I got this with Flux.1 alpha 8steps lora which got everything right, even the picture in corner part.

halcyon yarrow
#

this is a repost from another day

mortal mesa
#

im still messing around with Shuttle, quite underrated and

halcyon yarrow
pseudo owl
#

but let me try with dev, will take forever but lets see.

winged seal
halcyon yarrow
#

and this is with destill models added to the list plus a rogue SDXL model at the bottom

winged seal
pseudo owl
winged seal
#

when adding the proper photographic style tag, and changing the prompt to have the 4 pictures BY the corners, not IN them

#

anyways, I have to go for now 😅

halcyon yarrow
#

just so we're all on the same page the prompt we're using is this one:

A photograph of a white cat on top of a blue dog sitting on a brown couch in a living room. Behind them is a window and 4 cow pictures, one in each corner. Outside the window is a ufo hovering and outer space

No adjusting the prompt or the wording or enhancing it right?

winged seal
#

@halcyon yarrow I'll keep an eye on de-distilled, but the pictures on civit aren't impressive, so hopefully there will be better images to interest me when I look back

halcyon yarrow
#

one of the main problems @craggy crest had with that prompt is that it's very loose and incomplete and open to interpretation, just wanna make sure we're agreeing that's the prompt before i try it with flux heavy 17b

winged seal
#

wait, its a month old? nevermind

winged seal
halcyon yarrow
#

well any modifications are 'unfair' in the sense that again it's a bad prompt full of holes, so by changing the text you're giving the model a leg up on exactly what it should do and how

winged seal
halcyon yarrow
#

do a side by side with your corporate version, overall you can use cfg 3 to 7, and set the steps to a minimum of 60, ddim, beta is what I like to use on ksampler

mortal mesa
#

i got a good prompt somewhere from here i gotta find, it was like orange blueberries and blue orange on a blue plate with orange wall on an blue napkin, thats not it but it was like that

winged seal
#

A digital color photograph of a white cat sitting on top of a blue dog. The blue dog is sitting on the brown couch. Behind the couch is a square window with a square cow picture next to each corner of the window. Outside the window is a ufo hovering in dark outer space.

My version. I had to add the style tags for photographic style, since "photograph" is too broad for a model with multiple different styles of photography trained in

I also specified the cow pics should be NEXT to the corners of the window, not IN them

pseudo owl
winged seal
#

the composition is good, but the style/look is horrifically bad lmao

#

but that's kinda dev in a nutshell

#

anyways, gotta go

halcyon yarrow
#

i wouldn't say those are square cow pictures next to the windows

#

later @winged seal nice talk

winged seal
#

anyways, I really do need to get going, I'll talk later fellas

craggy crest
mortal mesa
#

Challenge prompt: A blue orange on a blue plate against an orange background with orange blueberries on a blue napkin

#

i got that from here long ago, its a great test

bitter hearth
#

Flux Dev same prompt/seed:

#

Big Flux Thing, same prompt/seed:

pseudo owl
bitter hearth
#

can't send prompt, discord said its too long

#

it responds well to loras as well, you just need to put the strength high

errant dust
#

What is Big Flux Thing?

pseudo owl
errant dust
#

What are the most obvious cons?

#

Speed I am guessing is one

pseudo owl
#

yeah that's one, I didn't try it yet so can't say much about quality. From examples tho, seems more creative and detailed than flux but worse at other things?

errant dust
#

well, I figured you meant worse at some things. My question was what

#

if anything stands out

pseudo owl
#

Text at least, an example(not mine, but author was showing)

bitter hearth
#

speed seemed ok

#

the downside is its a bit overcooked, like CFG burn from high CFG

#

but that might be possible to deal with

errant dust
#

Is it consistent? because Flux flubs text too. It is not perfect all the time

#

but ok, just curious. Have been busy last week or two so catching up to see if anything cool has developed for either Flux or SD3.5L

bitter hearth
#

not sure about text, didn't test that

#

will do some more tests later

#

I had to shut down the server cos someone released a GGUF

#

so I was wasting money with 45GB server lol

halcyon yarrow
bitter hearth
#

probably better yeah

#

its a bit overcooked but not too bad

#

detail seems higher

pseudo owl
bitter hearth
#

I liked this flux version best https://civitai.com/models/941929/flux1-dedistilledmixtuned-v1?modelVersionId=1054490

#

description: Based on Flux-Fusion-V2, Merge of flux-dev-de-distill, finetuned by ComfyUI, Block_Patcher_ComfyUI, ComfyUI_essentials and other tools. Recommended 6-10 steps. Greatly improved quality compared to other Flux.1 model.

pseudo owl
#

6-10? I need to try it then, I like speed.

bitter hearth
#

yeah I haven't gone beyond 8 steps in ages

halcyon yarrow
bitter hearth
#

not sure

#

we could try stacking more on top lol

halcyon yarrow
#

A photograph of a white cat on top of a blue dog sitting on a brown couch in a living room. Behind them is a window and 4 cow pictures, one in each corner. Outside the window is a ufo hovering and outer space

flux-dev-de-dis...Q8_0 | 🌱 2503417111 | 🦶 62 | 🦮 3.0 | cfg_scale_alt 3.5 | 🧠 flux_aeSft.sft | 🎤 res_2m | 🕦 beta | 🗓 11/19, 7:27 PM | ⏱️ 507s

#

technically the cat is on top of a blue dog, it's worded loosely so it doesn't mean the cat has to physically be on top. it missed the outer space part and the 4 corners

bitter hearth
#

can only discourage this test prompt as much as possible TBH
it feels weird how the most ambiguous test prompts end up being popular

pseudo owl
halcyon yarrow
#

2nd shot to see if it did any better

pseudo owl
#

I mean I tested with 25 steps and 8steps. 62 steps is kinda unfair but yeah still flux de-distilled nailed it.

halcyon yarrow
#

I wouldn't say nailed it, i think the outer space view from the window is a pretty crucial element of the prompt

#

I'm willing to forgive the paintings not being in the corner but yeah like Neon said that prompt is pretty ambiguous so its not really 'fair'

#

if you let me rewrite it and really establish all the elements, with enhanced prompts flux destill would 100% get it on the first shot

#

here's my rewrite:

A realistic photograph capturing a white cat physically sitting on top of a blue dog on a brown couch in a cozy living room. The couch sits against a wall featuring a large window. The window is bordered with four distinct cow pictures, each precisely placed in one corner of the window frame, creating a symmetrical arrangement. Through the window, the scene reveals the vastness of outer space, with a dark star-filled sky, distant celestial bodies, and a UFO hovering midair. The juxtaposition of the living room's warm ambiance and the surreal outer space view creates a striking visual contrast.

dusky thistle
halcyon yarrow
#

that's so cool it looks like an art scene set up in an existing library

#

Come Your Visit The Pleasentville Local Library Before Thursday
Art expo featuring works by Sharky McSharkton and his famomus shark themed art pieces

#

first shot with enhanced prompt, it got all the elements except the pictures in the 4 corners

dusky thistle
halcyon yarrow
#

@dusky thistle so i looked into that idea of monitoring your posts and sharing them on civit, I would need to use this library called discord-js-selfbot-v13 where basically it's a bot impersonating a real user and using the tokens from a real session to access the data in this room, it's very taboo and it could get me banned for using it so I gave up on that idea lol

#

took 4 shots but I'd say this one nailed it 100%

#
  • wouldn't the frames prevent the window from sliding open? don't think about that
  • shouldn't the cat be physically on top of the dog? not exactly
  • what prompt was used?

A realistic photograph capturing a white cat physically sitting on top of a blue dog on a brown couch in a cozy living room. The couch sits against a wall featuring a large window. The window frame is adorned with a cow picture at each of its four corners, ensuring all frames are immediately adjacent to the vertices of the rectangular window. Through the window, the scene reveals the vastness of outer space, with a dark star-filled sky, distant celestial bodies, and a UFO hovering midair. The juxtaposition of the living room's warm ambiance and the surreal outer space view creates a striking visual contrast.

limpid thunderBOT
#

Thank you for using comcom analytics.
"comcom analytics" supports all community managers (moderators and server owners) by stats, visualization, and analytics.

If you have any questions, feel free to ask us!
Your dashboard
Help
Support server

Other languages
en: help
ja: help Japanese

dusty patio
#

help

dusky thistle
craggy crest
#

wrong kind of help

halcyon yarrow
craggy crest
halcyon yarrow
#

wdyym?

craggy crest
#

for SD3.5 - clip_g is your workhorse and clip_l and t5xxl share tokens and work along side it. it looks like he's tried to combine both of the encoders that sdxl uses, which are 2 of the three 3.5 uses

#

interested to see what your tests show

halcyon yarrow
#

i mean so far it's shown a difference, i don't know if i like the fp32 version better but maybe it's more apparent with sd35. im gonna try it with turbo, so this is my default setup which relates to the file in the screenshot

#

that's a 1.3 gb file so then ill try the 2.7gb file called fp32SDXLFLUXRefinerCLIPG_clipGLargePrunedFP32.safetensors, fixed seed, we'll see in a bit

dusky thistle
craggy crest
mortal mesa
#

@halcyon yarrow one is sdxl refiner 1.0 with fp32 clip g, the other is clip g large pruned fp32

#

definitely a confusing listing

halcyon yarrow
#

so why not use the refiner's clip g?

#

or the pruned one?

#

confusing indeed

mortal mesa
#

he added in regular fp32, the other is "large" version

halcyon yarrow
#

im guessing the 1.5gb is the pruned version and the 2.7gb is the full fp32 version I get that, but why not use the full fp32?

#

so it's large bc it's fp32 or is there like a medium size?

#

so the large version is the fp32 2.7gb file right?

#

i can't get his gguf version of the clip model to work

#

the one on the left is using the 2.7gb file and the one on the right is using the 1.4gb file

#

the one on the left is the 1.4gb file, the one on the right is the 2.7gb file

#

again fixed seed, same everything

mortal mesa
#

poking around to try to figure out what the heck they were saying i find this, and why stop at large when you can go gigantic cs-giung/clip-vit-gigantic-patch14-laion2b

brave tide
halcyon yarrow
#

the way i see it, the size difference between 1.4 and 2.7 is so small that it's not really a lot of extra memory overhead, especially when the G model is so important for sd35

#

it's hard to tell which one is better, it's so subjective. i dont wanna be biased towards the larger file but they really do look very similar even if not exactly the same. what do you think @mortal mesa, would you say either of those 2 side-by-sides is objectively better?

mortal mesa
#

seems slightly finer but ya i might be imagining that

halcyon yarrow
#

I'm just going to set it as my default for my workflows moving forward, it's one of those things where if i notice a steep drop off in quality or speed i could always revert, so moving forward my configuration will be

{
    "clip_name1": "Long-ViT-L-14-BEST-GmP-smooth-ft.safetensors",
    "clip_name2": "fp32SDXLFLUXRefinerCLIPG_clipGLargePrunedFP32.safetensors",
    "clip_name3": "flan-t5-xxl-Q8_0.gguf"
}

brave tide
#

Prompt:An insanely sleek and futuristic hypercar races through the rain-slicked streets of New York City at night. The car's aerodynamic body gleams under the glow of neon signs and streetlights, with water droplets streaming off its surface as it cuts through the rain. Its LED headlights pierce through the misty air, reflecting off the wet pavement and creating vivid light trails. The urban backdrop is alive with towering skyscrapers, glowing billboards, and bustling traffic blurred by the car's incredible speed. The atmosphere is intense and cinematic, capturing the raw power and elegance of the hypercar against the vibrant energy of the rain-soaked city.

limpid thunderBOT
#

Last 7 days <Nov 13 2024> → <Nov 19 2024>

  • Member counts
  • 345992 ↗ 346021 ↗ 346029 ↗ 346047 ↘ 346035 ↗ 346070 ↗ 346093
  • Action members
  • 0 → 0 → 0 → 0 → 0 → 0 ↗ 77
  • Message members
  • 0 → 0 → 0 → 0 → 0 → 0 ↗ 57
  • Reaction members
  • 0 → 0 → 0 → 0 → 0 → 0 ↗ 34
    More details
Summary | comcom Analytics

comcom analytics is a completely free dashboard for analyzing and monitoring communities run on Discord or Slack. It is currently available as a public beta.

bitter hearth
#

not quite sure at the moment whether stock or upgraded encoders are the best idea

#

you can replace:

#

Clip-L with Longclip-L or Improved Clip-L
Clip-G with this one
T5-xl with T5-xxl or Flan-T5-xxl

#

and use higher precisions, but I am not sure what is worth it

little fossil
muted dove
muted dove
rapid pivot
#

how to light a sword on fire

halcyon yarrow
#

@muted dove are those made using the incontext lora for flux?

halcyon yarrow
#

so just pure flux? if so can you show us what one of those prompts looks like?

muted dove
halcyon yarrow
#

that's it? did those incontext guys just trick us and it's not needed at all? that's a super simple prompt too. is that base flux dev or a specific finetune?

muted dove
#

I used AtomixFlux, but dev should do it too. I do feed that through an LLM as part of the workflow, but try it 😉

pseudo owl
halcyon yarrow
#

is there a node in comfyui that you know about that can take those images and gif-y them?

halcyon yarrow
pseudo owl
#

And you can ask chatgpt for code to make them into gifs

halcyon yarrow
#

what's it called when they come out like this? isn't that called a spritesheet?

prime totemBOT
halcyon yarrow
#

@pseudo owl can i get the prompt for any of the gif ones you made?

pseudo owl
halcyon yarrow
#

yay!

A seamless 4-image grid of consecutive frames from a gif. The gif is of pink teddy bear dancing
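
A rough sketch of the grid-to-gif step, assuming the 4-image grid comes out as an evenly divided 2x2 sheet read left to right, top to bottom. The function names and default timings are made up for illustration, and the GIF part needs Pillow installed:

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, upper, right, lower), Pillow's crop convention


def grid_boxes(width: int, height: int, rows: int, cols: int) -> List[Box]:
    """Return one crop box per grid cell, in reading order."""
    cell_w, cell_h = width // cols, height // rows
    return [
        (c * cell_w, r * cell_h, (c + 1) * cell_w, (r + 1) * cell_h)
        for r in range(rows)
        for c in range(cols)
    ]


def sheet_to_gif(sheet_path: str, gif_path: str, rows: int = 2, cols: int = 2,
                 frame_ms: int = 200) -> None:
    """Slice a generated grid image into frames and save them as a looping GIF."""
    from PIL import Image  # third-party: pip install pillow

    sheet = Image.open(sheet_path).convert("RGB")
    frames = [sheet.crop(box)
              for box in grid_boxes(sheet.width, sheet.height, rows, cols)]
    # save_all + append_images writes every frame; loop=0 repeats forever
    frames[0].save(gif_path, save_all=True, append_images=frames[1:],
                   duration=frame_ms, loop=0)
```

Calling `sheet_to_gif("grid.png", "bear.gif")` on a 1024x1024 sheet would give four 512x512 frames. If the model draws borders between the cells, the boxes would need trimming.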

craggy crest
bitter hearth
#

a good IP adapter would be great

#

I can put R2D2 pictures into it for style transfer

cinder lichen
#

And "Large" means definitely for sure won't work with medium, correct? 😛

craggy crest
bitter hearth
#

sometimes stuff weirdly works when it shouldn't
one of the PAG nodes, made for SDXL, works with Flux as the python syntax happened to coincide with some other ComfyUI code regarding blocks

#

and my favourite SDXL lora is one that was trained on SD 1.5 but happens to have an effect on SDXL

#

or replacing T5 with Flan-T5, a Google fine tune not made for diffusion, improved my images

toxic bone
#

black magic imo

bitter hearth
#

oh yeah that's a really good point it could have affected clip

#

cos I always use lora loaders that include clip

toxic bone
#

I don't think it would fly on automatic1111. You've got something special

bitter hearth
#

A1111 is essentially just legacy code at this point
it's only really for the people who started on it and don't want to move off due to familiarity

toxic bone
#

i wholly disagree but i wont argue against someone's clear biases. I'll just recognize those.

icy drift
civic trail
bitter hearth
craggy crest
halcyon yarrow
#

I don’t like forge or a111 bc ultimately I just want extreme level control of my setup, ComfyUi is the only tool I don’t have to depend on developers to add support for something for me to keep moving forward

craggy crest
#

i wouldn't use forge if you paid me, they're slow to add in support for what I want to use, if they add it in at all. and auto1111 was good when it came out. it's no longer good.

#

but if someone else wants to use them, more power to them

toxic bone
# bitter hearth why are you accusing me of being biased?

we are all biased. don't take it personally. you think a111 is out of date or as you put it, "Legacy code", and that the only reason someone would want to use it is the one you define. That's why i "accused you" of it. I won't argue someone's biases. We are all free to have our own beliefs.

pseudo owl
prime totemBOT
pseudo owl
halcyon yarrow
#

Wow yeah that does look kinda cool

#

I’ve been taking my sweet time to adopt cog I’m still playing with mochi but yeah those videos look good. So the left is the guidance and the right is the render so video plus prompt to video right?

bitter hearth
toxic bone
#

i'm not trying to insult you. your biases and choices are valid for you, so i offer respect by not arguing against them.

inland patrol
pseudo owl
queen fable
#

Design professional logos for my Instagram platform, where we market products, using the name ando and the colors dark blue, gold, light pink, and black.

toxic bone
pseudo owl
foggy cloak
halcyon yarrow
# inland patrol

wow cool set of prompts, what model did you use, any loras? how did you come up with these prompts? some of them look worthy of being wallpapers

halcyon yarrow
halcyon yarrow
halcyon yarrow
foggy cloak
#

Hmm, might be vram limitation

halcyon yarrow
#

to put it into perspective a 13 frame 480x840 usually takes 110 to 170 seconds, I start getting into the 230-300+ range if i use res_3s or the res_5s one

pseudo owl
halcyon yarrow
foggy cloak
#

Yep agony

#

I’m praying the 5080 has 24gb but it’ll likely only have 16gb again

pseudo owl
pseudo owl
halcyon yarrow
#

i haven't even heard of Tora, going to have to check it out. i also messed with svd yesterday, results were okay

#

I just took an image I made with a figurine model, cranked the motion settings to max, and then tried a few variations. I'm thinking I want to integrate both SVD and Mochi into my system so I can click a button and turn that into a little clip I can share, or create a video from text input

craggy crest
inland patrol
# halcyon yarrow wow cool set of prompts, what model did you use, any loras? how did you come up ...

Thank you much Richard! I used Sd3 Large and Medium for these, no loras required. They have kinda come as the consequence of experimentation trying to capture the right vibe and feel. I wanted to make a sort of band merch-background image, but the desire morphed into making these when I felt I had got some of the right key words down. Honestly, it was incremental and word based improvement. I gained my knowledge from @craggy crest who is wonderfully talented and well versed. She taught me kinda how to get from a point A to a point B. 🙂

pseudo owl
dapper rune
halcyon yarrow
sudden parcel
#

im trying to make a seamless texture, i placed a "seamless tile" node and a "Circular VAE decode (tile)" node... and the textures do not render as seamless

#

i'm at my wits' end

craggy crest
#

seriously? what makes you think anyone's going to fall for this scam?

halcyon yarrow
dusky thistle
dusky thistle
dusky thistle
dusky thistle
craggy crest
muted dove
craggy crest
dusky thistle
craggy crest
dusky thistle
craggy crest
timber root
sullen moss
dry wave
#

omgomgomg

#

so hyped 😁

bitter hearth
#

wow look at that outpainting range
this is gonna be so good

pseudo owl
bitter hearth
#

ah nice

#

I wish Comfy prioritised supporting the Int4/FP4 Flux

#

its the fastest thing for GPUs 24GB and under

#

for big GPUs Comfy still has max speed cos they can TensorRT flux

halcyon yarrow
bitter hearth
#

first-party

pseudo owl
bitter hearth
#

lora form is big for small GPUs yeah

#

quality is the main improvement though

halcyon yarrow
#

so if i wanted to use something like flux redux its about the size of a lora

pseudo owl
# halcyon yarrow and then since there isn't a Comfy node for it yet I would just use this script:...
Comfy Org Blog

We’re thrilled to share that ComfyUI now supports 3 series of new models from Black Forest Labs designed for Flux.1: the Redux Adapter, Fill Model, ControlNet Models & LoRAs (Depth and Canny).

These additions provide users with easy and precise control of details and styles in image generation.

halcyon yarrow
#

sweet thank you man

#

@pseudo owl have you set it up and tried it yet?

pseudo owl
halcyon yarrow
#

Redux
The Redux model is a model that can be used to prompt flux dev or flux schnell with one or more images.

Download the sigclip_vision_patch14_384.safetensors model and put it in your ComfyUI/models/clip_vision folder and download the flux1-redux-dev.safetensors and put it in your ComfyUI/models/style_models folder.

You can then load or drag the following image in ComfyUI to get the workflow

https://huggingface.co/Comfy-Org/sigclip_vision_384/blob/main/sigclip_vision_patch14_384.safetensors
https://huggingface.co/black-forest-labs/FLUX.1-Redux-dev

pseudo owl
#

Finally bfl open sourced something apart from the original flux models at least, where’s video gen model though.

bitter hearth
#

they've open sourced more than half their entire company lol

#

we can't criticise BFL for being too closed

dry wave
bitter hearth
#

ye I've been doing the control net stuff in SD 1.5 and then refining with flux so far
now can probably do it all in flux

dry wave
bitter hearth
#

yeah I'm very happy with BFL

#

they also gave Schnell with Apache 2 whereas SD 1.5 and SDXL are OpenRails

#

Apache 2 is a lot better

pseudo owl
#

Flux outpainting (someone tested it in the banodoco discord)

bitter hearth
#

someone needs to wake up Clownshark
inpainting/outpainting benefits a lot from better samplers

pseudo owl
#

Yep for sure

Flux with redux(right is flux, left is original)

bitter hearth
#

I wonder if redux would stack with depth and canny

halcyon yarrow
#

i just updated to the latest comfy, don't like how i'm on the new UI finally

pseudo owl
bitter hearth
#

unclear why you would want to though aside from familiarity

#

if you get bugs maybe

halcyon yarrow
#

i have to stay on this latest version if i want the flux tools

#

but yeah it's just familiarity, i just hate having to relearn where they put all the old stuff

bitter hearth
#

I meant downgrade the GUI not the overall Comfy install

#

there is an option in settings

halcyon yarrow
#

i looked for the option but was unable to find the legacy UI mode. anyways it's fine, i'll be a big boy and adapt. i noticed t5xxl v1.1 doesn't work with redux, it creates a black image

#

i lost my sampler preview tho and i don't see the option to enable that in settings 😦

halcyon yarrow
#

i don't have base flux dev installed so i used shuttle 3 diffusion, works great lol

#

flux mini however wasn't compatible, in case anyone is curious lol

#

just tested it with "UNET LOADER GGUF" and it does not work with the pixelwave model but it does work with @short thicket 's model

#

it also produced way nicer results than shuttle

#

i am 100% fast-tracking and integrating flux redux into my stuff, it's cool, but i'm having trouble getting it to adhere to the prompt. i just give it a random image and do the same thing as the examples, "sketch, b&w" for example, and it completely ignores it

bitter hearth
#

these days with conditioning you want to set area, timesteps and strength for all conditioning types

#

so there are a lot of variables to tweak

mortal mesa
#

what happened to shuttle 4, it disappeared

bitter hearth
#

doesn't rly matter since the model wasn't on there yet

pseudo owl
bitter hearth
#

I should have scrolled

pseudo owl
#

need to test it but waiting for quants

dry wave
# pseudo owl

how did you make that? I have the feeling, too, that Flux ignores the prompt as soon as you condition it on an image

bitter hearth
#

I haven't booted a server to test yet but
timesteps and strength are what I would play with

dry wave
#

comfyui has no strength for the style model yet

#

nah, I don't wanna mess with comfy code. I will wait for the next update

bitter hearth
#

I wonder if the conditioning multiply node would work on it

#

or otherwise, you could multiply the strength of the conditioning coming out of your text encode node

#

this is a random idea but also maybe ClipAttentionMultiply or Clip Temperature Multiply
those nodes are really good on SD 1.5

dry wave
#

hm, I think the way it works is that it adds additional tokens to your prompt

#

like expanding the prompt by a new prompt it generated from the image

bitter hearth
#

ah okay that makes sense
if it works via prompt then it might be better on flux-dev-de-distill
cos that seems to follow your prompt better

dry wave
#

nah, they showed examples that it works

#

I rather believe there is a bug in comfy ^^°

bitter hearth
#

maybe yeah, could also be related to this

dry wave
#

okay, increasing prompt length definitely helps

toxic bone
#

flux has a dedicated text network that runs alongside image generation. It's all self attention.

bitter hearth
#

I am struggling to work out if they are actually better but there is Longclip or Zer0int's fine tunes for Clip L
and then Flan T5 for T5
as alternative text encoders

dry wave
#

I think the issue is

#

SIGCLIP is basically transforming the image into tokens

#

and then the style model translates these tokens into T5 prompt space

#

and they are added to the prompt

#

the thing is now: the number of tokens in the image might be quite large

#

and if your prompt is very short, the newly added tokens just outweigh the prompt

#

I got consistently anime images by just repeating over and over in the T5 prompt that I want an anime image
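
To put rough numbers on that, here is a tiny sketch of how little of the combined sequence a short prompt takes up. The image token count of 600 is a placeholder, not a measured value, and this ignores padding and any attention effects, so treat it as intuition only:

```python
def prompt_share(prompt_tokens: int, image_tokens: int) -> float:
    """Fraction of the combined conditioning sequence that comes from the text prompt."""
    return prompt_tokens / (prompt_tokens + image_tokens)


IMAGE_TOKENS = 600  # hypothetical number of tokens added for the reference image

for n in (10, 50, 200, 600):
    share = prompt_share(n, IMAGE_TOKENS)
    print(f"{n:>4} prompt tokens -> {share:.0%} of the sequence")
```

Repeating the style words, as in the anime example, is one cheap way to push that share up.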

bitter hearth
#

oh this makes a lot of sense yeah

#

could maybe downscale reference image

dry wave
#

I think they are always fixed in CLIP models

#

actually, they are already super small

#

like 350x350 pixel or something like that

#

I mean, its not that bad:

An anime character in the style of anime and manga artists like studio Ghibli with vibrant colors, clear anime line arts, its a perfect anime image. An anime image of a young man.
This transforms any photo of a man into an anime image

bitter hearth
#

ah yeah overstating things can help a lot

#

I've started just dumping 1000 tokens from GPT 4o in prompt boxes and that works well

halcyon yarrow
#

You guys are talking about Redux?

pseudo owl
#

I kinda didn't get the hype for shuttle diffusion3 but from some high-res testing, it's much better than schnell and sometimes even dev. A quick gen I made with just 4 steps and Euler.

halcyon yarrow
#

Have you guys tried it yet? I couldn't get it to work per se

#

Side note t5xxl v1.1 produces a black image but v1 works fine . Haven’t tried flan. You guys can confirm it works?

#

I’m thinking flux redux only works as intended with base flux dev. I typed in a prompt, let’s say "sketch, black and white", and tried 6 or 7 flux models via the unet gguf loader. I got different stylized versions of the original image but they were all in color, so it basically ignored my prompt completely. I even cranked cfg up to 8 to make sure it wasn’t that

bitter hearth
#

haven't tried the new stuff today yet

halcyon yarrow
bitter hearth
dry wave
#

yeah, I'm trying to look through the comfyui code but its as messy as usual X_x

halcyon yarrow
#

It’s easy to test just overly elaborate on how it should be a black and white pic maybe 1000 tokens worth and see if it affects the image

#

So let’s say vision model gets 1000 tokens out of an image then 1000 from the prompt should balance it

#

I guess I could load in a black and white pic as my source style image and replace empty latent image with the target image and the set a high denoise?

bitter hearth
#

cos T5 has relative positional embeddings you could try dumping a huge prompt in (use LLM to write)
it was trained on 512 tokens or so but people have got it to recall things that were over 3,000 tokens in

#

depends how the node and back end are coded though they might split it automatically

dry wave
#

its weird, yes, cause I think the additional tokens are appended on top of the 512 tokens

#

that's why Redux is so slow

#

and I think its 576 additional tokens

#

but I don't think you have to write such a long prompt. I found it sufficient to just repeat what you want a few times

#

"black & white image, monochrome, black and white, an monochrome image in black and white"

bitter hearth
#

yeah I haven't tested optimal prompt length yet

dry wave
#

thats probably already enough?

bitter hearth
#

maybe yeah, for photographic prompts I tended to only repeat 2-3 times

bitter hearth
#

I think I see the chicken's point tbh

halcyon yarrow
#

@dry wave original image on the left, using shuttle diffusion on the right, my prompt is:

black and white sketch, black and white sketch,black and white sketch, black and white sketch,black and white sketch, black and white sketch,black and white sketch, black and white sketch,black and white sketch, black and white sketch,black and white sketch, black and white sketch,black and white sketch, black and white sketch,black and white sketch, black and white sketch,black and white sketch, black and white sketch,black and white sketch, black and white sketch,black and white sketch, black and white sketch,

using the flux_redux_model_example image provided by the website

toxic bone
pseudo owl
#

@halcyon yarrow @dry wave

For redux, Text prompt isn’t supposed to matter by default. It’s supposed to be image variation.

You can hack it though by averaging the text prompt and redux prompt, then the text prompt matters, you can also have strength control by multiplying the redux prompt.
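
A minimal sketch of that hack on plain Python lists, just to show the arithmetic. It assumes both conditionings are lists of same-width token vectors and that the shorter one is zero-padded before averaging; the padding choice and all names here are assumptions, not how any particular ComfyUI node is implemented:

```python
from typing import List

Vector = List[float]


def scale(tokens: List[Vector], strength: float) -> List[Vector]:
    """Strength control: multiply every value of every token by one factor."""
    return [[x * strength for x in tok] for tok in tokens]


def pad_to(tokens: List[Vector], length: int, dim: int) -> List[Vector]:
    """Zero-pad a token list up to `length` tokens (assumed padding scheme)."""
    return tokens + [[0.0] * dim for _ in range(length - len(tokens))]


def blend(text_tokens: List[Vector], redux_tokens: List[Vector],
          text_weight: float = 0.5, redux_strength: float = 1.0) -> List[Vector]:
    """Weighted average of the text conditioning and the scaled redux conditioning."""
    dim = len(text_tokens[0])
    n = max(len(text_tokens), len(redux_tokens))
    text = pad_to(text_tokens, n, dim)
    redux = pad_to(scale(redux_tokens, redux_strength), n, dim)
    return [
        [text_weight * a + (1.0 - text_weight) * b for a, b in zip(t, r)]
        for t, r in zip(text, redux)
    ]
```

With `text_weight=0.5` this is the plain average; lowering `redux_strength` shrinks the image side before mixing.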

halcyon yarrow
#

i tried it but that didn't work

#

i also just tried my latent img trick aka img2img setup

#

that didn't work

#

so instead of empty latent image i replaced it with load image > vae encode > sampler

pseudo owl
halcyon yarrow
#

so as you can see in the top left the style image is this black and white circular image, so the theory is it's reading that image and extracting the style from it. then for the text prompt i'm overloading it with that repeated text 'black and white sketch', and the image attached is what i'm getting

#

and then for the denoise i tried 0.2 and 0.8 and i get similar results, but nothing in black and white

#

like their own blog post says, my new theory is that it only works on the base flux dev model bc somehow the lora is aligned only with the token space from the base model


pseudo owl
halcyon yarrow
#

yeah it feels that way, they do show how to chain 2 images together which is pretty cool too

#

i wonder what'll happen if i chain the 2 images together like that

#

maybe in the example shown in the blog they're actually chaining 2 images? bc from what i see on the blog post it's just 1 image and prompt

pseudo owl
halcyon yarrow
#

oh i must've missed that

pseudo owl
#

It’s really sad if only flux pro supports it, but seems like we can hack our way to use a text prompt.

halcyon yarrow
#

yeah that is sad. if you can figure out a way to hack it @pseudo owl do tag me, I'd love to get something like it. for now it still has value tho, I can replace my img2img setup with this WF and get higher quality, more coherent output

#

I'm actively running a script that's processing 540 loras I have for flux-d by running them through llama 3.2b uncensored to assign them 3 of 25 possible categories

halcyon yarrow
young blade
#

canny and depth loras are really bad quality output with the default workflows, anyone got a better one yet?

halcyon yarrow
# young blade

i think that's pretty cool actually, I just finished integrating redux into my stuff

#

left is using Flux redux + empty latent image
right is using SDXL + load image

dusky thistle
dusky thistle
summer ginkgo
remote holly
#

has anyone tried redux?

muted dove
#

Before / after

#

Another

muted dove
#

Sorted out the aspect ratio and added auto prompt

#

A comparison showing the difference between default (first) image and the same using "Lying Sigma" at -0.5 strength.

dry wave
#

the Redux lora is just a projection from CLIP-Vision to T5 latent space

#

it should not matter which Flux Checkpoint you use as every Flux Checkpoint operates on the same T5 latent space

#

I'm pretty sure we can get Redux working better by playing around with the generated tokens. It's insane to generate ~600 additional tokens to describe an image. Maybe we can cluster and merge them to downweight their impact. I might play around with that on the weekend

remote holly
#

What vision clip can i use for redux style ?

muted dove
#

The one recommended with the comfy workflow. The workflow is in the images above.

cinder lichen
#

Are there big differences between the different clips. They are still a huge mystery to me. I know only certain models support certain clip models but outside of that…

halcyon yarrow
dry wave
#

I'm pretty sure you can make it work for flux dev, too

halcyon yarrow
#

Yeah I agree with what you said, the model doesn’t matter, it works on base flux dev as well as any fine tune, but it only works limited to image to image kinda ignoring the prompt essentially . If you wanna do stuff like on the website where you just give it a short prompt saying black and white and an image and have that work you need to use flux pro, @bitter hearth quoted a snippet of the website where it says so that I missed

dry wave
#

yes, that might be. But I'm pretty sure we will find a way to make it work for flux dev, too. As said, the issue seems to be that the added tokens outweigh the original prompt tokens. So it's a matter of weighting, interpolating, maybe subsampling the added tokens

halcyon yarrow
# pseudo owl Hmm, maybe it’s only pro. The examples are pro as well. > In addition to the [d...

Sorry it wasn’t neon it was @pseudo owl

Hmm, maybe it’s only pro. The examples are pro as well.

In addition to the [dev] adapter, the API endpoint allows users to modify an image given a textual description. The feature is supported in our latest model FLUX1.1 [pro] Ultra, allowing for combining input images and text prompts to create high-quality 4-megapixel outputs with flexible aspect ratios.

halcyon yarrow
#

i integrated flux redux into my system and left it running overnight, seems to have generated about 50 images. it's interesting, it seems like it's being cropped

original left, remix right

#

you would think that the model would sort of reformat the layout and move the text down but it keeps cutting off the Finding

#

original left, remix right, it cut off the shoes, they're both portrait images but the aspect ratio on the remix is not as tall , i wonder what's going on in the latent space that the model can't find a way to fit the whole subject(s) there

muted dove
muted dove
halcyon yarrow
#

but i don't want a different aspect ratio, i don't want a larger image, i was hoping it would sort of 'reformat' the layout so that everything (including the shoes) would fit within the image

#

it's almost like it has a fixed height for the concept in latent space and then it doesn't fit in the canvas and it just crops it out, very interesting don't you think @muted dove ?

remote holly
#

is flux inpainting great ?

dry wave
#

CLIP works on 384x384 pixel images

#

so it can only process square images

#

usually it will crop your images to square and then downscale it to 384x384

remote holly
#

ha is an issue

dry wave
#

that's why it cuts off the borders
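
A quick sketch of what that crop does to a portrait image, assuming a plain centre crop to a square before the resize to 384x384. The centre-crop choice and the function names are assumptions for illustration; the actual preprocessing may differ:

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (left, upper, right, lower)


def center_square_box(width: int, height: int) -> Box:
    """Largest centred square that fits inside a width x height image."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)


def lost_fraction(width: int, height: int) -> float:
    """Share of the original pixels that fall outside the centred square."""
    side = min(width, height)
    return 1.0 - (side * side) / (width * height)


# Example: an 832x1216 portrait keeps only rows 192..1024,
# so the top and bottom strips (heads, shoes) never reach the vision model.
box = center_square_box(832, 1216)
print(box, f"{lost_fraction(832, 1216):.0%} of the image is dropped")
```

Padding the reference image to a square before feeding it in would be one way to keep the whole subject, at the cost of bars in the reference.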

muted dove
halcyon yarrow
#

ah i see, so internally it's actually cropping before the vision model even gets it?

muted dove
halcyon yarrow
#

i did try re-running the same WF multiple times and it does create a different image every time, also it goes w/o saying but the quality of your output is determined by the model, reviewing the results it did run on flux mini and flux heavy and while it looked low quality it did manage to somewhat land the concept for both

#

original image input image for Redux

#

same seed, batch size of 2, using the STOIQ model

#

same seed as STOIQ, batch size of 2, using Fluximate v1

remote holly
#

great !

halcyon yarrow
#

hey @dusky thistle i see you removed that part of the instructions where you had to manually install a library now it's just calling requirements.txt. is that correct or did i miss that part in your README somewhere?
Also I put in a good word with a famous Youtuber called Olivio Sarikas, urged him to try your sampler and do side by side testing, maybe he'll even feature your stuff in one of his videos 👍 He did say he was going to try it so we'll see

remote holly
#

i get this with redux

#

original img

dry wave
halcyon yarrow
dry wave
#

black and white is the hardest xD

halcyon yarrow
#

yeah bc it's the least subjective, it's not like "oh i guess I can kind of see that effect in there". you mind showing me the before and after with your hack?

dry wave
#

no, its because the clipvision comes with color information and if you mix color with "black&white" you get just "unsaturated" images back

#

this is the original image (taken from pexels)

#

using the prompt "black & white photo. Monochrome photo. Black and white photography. gray, monochromic black and white. b&w. old photo in black and white."

#

and Redux I get

#

using my hack I get

#

same with anime:

#

prompt is "anime, anime style, studio ghibli anime"

#

the normal redux gives me this

#

my hack gives me this

cunning lintel
#

Haven't tried redux, but looking at the examples it's remarkable how well the composition and even details like the flower in one person's hair and the other kind of flower on the right are all kept. Flux is sooo good at guiding/prompting tiny details

exotic sapphire
#

What's the difference between the flux and sd3.5 large architecture? Is there a potential for sd3.5 large to hit the same or even higher standard compared to flux after fine-tuning?

dry wave
#

I think SD3 made the error of training on CLIP as primary text encoder. Yeah, they also support T5, but the model always relies on CLIP as main information source. I think SD3 will never have a prompt understanding close to Flux

#

However, the main problems of SD3 are anatomy. That's something that might be fixed in future finetunes, who knows

dusky thistle
halcyon yarrow
dry wave
#

its a small custom node that merges tokens that are close anyways. Its really a hack for now

exotic sapphire
#

that's a great insight. initially, i was pretty amazed by 3.5 large's prompt understanding as well, possibly not comparable to flux but i think it's pretty close. it's just that the images come out much less attractive. not so sure how much fine-tuning would be required to take it a step further.

cause for sdxl fine-tuned weights, there were some improvements, but it wasnt a tremendous jump from the base model. so im really not sure how much we can improve on top of the current 3.5 weights.

dry wave
halcyon yarrow
#

cool dude i'll try it for sure

dusky thistle
#

(synthetic organic here, used gaussian a lot)

#

i like how i'm often reading a paper on some sampler algorithm and they'll suddenly jump from image generation to, say, calculating frontier orbital energies

neon igloo
#

#1237460438229450772 A realistic photo shows a crime scene of a elderly bodybuilding Japanese lifeguard found a missing lady laid on the bush over the beach.

pseudo owl
halcyon yarrow
#

that's crazy wild 15 seconds to render a 5 second video? that's unheard of

young blade
halcyon yarrow
pseudo owl
halcyon yarrow
#

i'm gonna try kai bro's custom node first, "Apply Style (Advanced)", and then i'll try the new video thing

#

i will say this, given a source image + prompt I'll take Redux over trying to give the sampler a low noise latent representation of the original image and having it try to figure out how to redo it, I only wish I could use Redux with all my other models this clip vision tech is great

#

original left, redux right, using shuttle 3, b&w prompt, default style node

#

merge strength of 0.8 on the left
merge strength of 0.4 next one
0.55 the last one
@dry wave and my prompt was:

Rendered entirely in black and white, the image captures the interplay of stark contrasts, with deep shadows and bright highlights accentuating every detail. A sketch-like quality pervades the scene, blending fine lines and subtle cross-hatching into a harmonious texture. The monochrome tones, abbreviated as b&w, evoke a timeless simplicity, stripping the scene of distraction and leaving pure form and light in focus.

dry wave
#

I added two additional sliders: downsampling and weighting

halcyon yarrow
#

this upgrade is way better now I have actual control and artistic freedom as to how much of the image's style i want to apply to my new image

#

can you explain them pls? how does it affect the image?

#

is strength the same thing as weighting but a slider?

dry wave
#

weighting is just multiplying the token latent with a value between 0-1, shrinking it towards zero

#

downsampling is similar to token merging, but instead of merging similar tokens it merges neighbouring tokens

#

a combination of these things gives me whatever I want, just... I still don't know which works better and which combination works best X_x

halcyon yarrow
#

you should publish that code, I just added your class to the nodes.py file for now, i didn't use any of your imports, just the class and the helper function

#

give me a copy of that but you should publish it too

dry wave
#

I have it on github

#

give me a second

halcyon yarrow
#

I'm trying LTX video @pseudo owl , first attempt OOM error with 65 frames, cranked it down to 17 frames @ 512px and still OOM, going to try 9 frames @ 512px

dry wave
#

downsampling works really well!

#

it does not make the image blurry in contrast to merging

halcyon yarrow
#

dang I can't even get past just loading the model with the LTXV Model Loader node, forget the frames. it's just a tiny 9gb model file too, that sucks

dry wave
#

yes... I think that's it. Downsampling works by far best of all I tried so far

halcyon yarrow
dry wave
#

"vintage comic"

halcyon yarrow
#

this is a good example original image left, redux right. like it's a nice reimagination of the same image but the redux is blurry right?

dry wave
#

marble statues

halcyon yarrow
#

@dry wave so if i were to take the original image on the left, that purple princess pic, what settings would you recommend to get a reimagination while still keeping things crispy?

dry wave
#

Currently I have the feeling that downsample factor 1:3 is the best setting overall

halcyon yarrow
dry wave
#

uuuuh... that looks like an old version

halcyon yarrow
#

i just installed it from your stuff on github

#

last commit says 5 minutes ago

#

oh i see you're good

dry wave
#

thats weird. Can you restart and update your UI?

halcyon yarrow
#

i tried running it and it says 0.55 not in list

#

so i just manually fixed it

#

probably cached somewhere

dry wave
#

ah, okay. Maybe you used one of my older images as workflow?

#

yes, that looks correct

halcyon yarrow
#

i just used the workflow for the previous node

dry wave
#

downscale 1:3 and everything else on 1.0

#

if the effect is too weak, you can try to shrink one of the other two options additionally

halcyon yarrow
#

so can you explain real briefly what i can expect to see between downscales like 1:1, 1:3 and 1:9? am I essentially merging more of the visual tokens and therefore making the text prompt stronger the higher the ratio goes?

dry wave
#

yes. By default you have 27 x 27 visual tokens

#

so 729 tokens in total. Which is ~3 times as much as your text prompt

halcyon yarrow
#

purple princess with that b&w prompt @ 1:3,1,1

dry wave
#

when using downsample 1:3 you have 9x9 tokens, so 81 in total

#

and with downsample 1:9 you have 3x3 = 9 tokens in total
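
The token counts above can be reproduced with a small pooling sketch. It assumes the visual tokens are laid out row by row on a 27x27 grid and that "downsampling" means averaging each block of neighbouring tokens; the real node may use a different reduction, so this only illustrates the idea:

```python
from typing import List

Vector = List[float]


def downsample_tokens(tokens: List[Vector], grid: int = 27,
                      factor: int = 3) -> List[Vector]:
    """Average each factor x factor block of neighbouring tokens on a grid x grid layout."""
    if len(tokens) != grid * grid:
        raise ValueError("expected grid * grid tokens")
    if grid % factor != 0:
        raise ValueError("factor must divide the grid size")

    dim = len(tokens[0])
    out_side = grid // factor
    pooled: List[Vector] = []
    for by in range(out_side):          # block row
        for bx in range(out_side):      # block column
            block = [
                tokens[(by * factor + dy) * grid + (bx * factor + dx)]
                for dy in range(factor)
                for dx in range(factor)
            ]
            pooled.append([sum(tok[i] for tok in block) / len(block)
                           for i in range(dim)])
    return pooled


# Dummy 1-dimensional "tokens" just to check the counts: 729 -> 81 -> 9
dummy = [[float(i)] for i in range(27 * 27)]
for factor in (1, 3, 9):
    print(f"1:{factor} ->", len(downsample_tokens(dummy, 27, factor)), "tokens")
```

Because only spatial neighbours are merged, the layout of the reference image is preserved in coarser form, which fits the observation that downsampling does not blur the output the way similarity-based merging did.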

dry wave
halcyon yarrow
#

got it, so the ratio is how much to reduce the visual tokens from the visual input, based on the default spec of 27

dry wave
#

yes

halcyon yarrow
#

wow outstanding results now

#

the simplistic one on the right is the b&w prompt + 1:3 and the one on the left is the original prompt + 1:3

#

i am literally going to integrate this right now before I do anything else into my system so i can see how much better it does

craggy crest
halcyon yarrow
#

@dry wave I have a system that dynamically builds a ComfyUI WF based on the requirements of the image generation. this is not a ComfyUI workflow, it's my own structured format, so I can input a config object with the stuff it needs and have it make the WF for me

halcyon yarrow
craggy crest
halcyon yarrow
#

im guessing the takeaway from the announcement is "update to the latest version of ComfyUI"

halcyon yarrow
#

oh i read the announcement before you sent it as a screenshot and I missed the whole point of it also working natively

#

maybe ill try that too and see if i can get it to load

dry wave
#

what's the announcement...?

halcyon yarrow
#

the LTX video thing works using the built in nodes w/o needing to install custom nodes like mochi

#

they're giving LTX the VIP treatment like mochi got

dry wave
#

ah, the ltx looks very interesting

halcyon yarrow
#

i think they haven't done that for cogvideo bc cogvideo is so fragmented

mortal mesa
#

you need the kijai chart

halcyon yarrow
#

HOLY COW MY EYES ARE BLEEDING!!! 😮 took only 70 seconds to render for me, oddly appropriate animation too
Prompt executed in 69.21 seconds

#

no way, that was on model load too. i just did a subsequent load and it took just 7 seconds
Prompt executed in 7.74 seconds

#

Prompt executed in 33.28 seconds
my mind is so blown right now

craggy crest
#

sort of doesn't work with cartoons and stuff though. just mostly realistic, photographic images

#

at least that's the discussion on the L3 discord

halcyon yarrow
#

i had a few cartoony benchmark prompts I used, I could rerun those again. i'm looking for my max frame count before i OOM

#

i'm at 177 frames at 76 seconds, this is nuts. already at double what mochi can do and in a fraction of the time: mochi can do 86 frames in like 15 minutes lol, this does 177 in 76 seconds, i can't even

#

201 frames in 155 seconds. it takes LTX the same amount of time to give me 201 frames at the same resolution and steps as Mochi did for 13 frames. that's a 15x speedup
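
The "15x" figure can be checked from the numbers quoted above. The sketch below only redoes that arithmetic; the helper name is made up for illustration, and the timings are the ones reported in the chat, not fresh measurements.

```python
def frames_per_second(frames: int, seconds: float) -> float:
    """Throughput of a render: generated frames divided by wall-clock seconds."""
    return frames / seconds


# Figures quoted in the chat: both runs took about 155 seconds at the
# same resolution and step count.
ltx_fps = frames_per_second(201, 155)
mochi_fps = frames_per_second(13, 155)

# With equal run time the speedup reduces to the ratio of frame counts.
speedup = ltx_fps / mochi_fps
print(f"LTX:   {ltx_fps:.3f} frames/s")
print(f"Mochi: {mochi_fps:.3f} frames/s")
print(f"speedup: {speedup:.1f}x")
```

Because the run times are equal, the speedup is just 201 / 13, which is about 15.5, so "15x" is a fair rounding.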

#

the next question is does this work with the great ClownSharkSampler? @dusky thistle only one way to find out 🙂

bitter hearth
#

I don't do video so I am not sure if video models work with clown stuff

#

would be cool if they did

craggy crest
dusky thistle
#

yeah def will want this shit working with video

halcyon yarrow
#

Mochi and ClownsharkSampler work together, I have a good feeling LTX is going to work too on the same principle

#

A lantern festival at dusk by a peaceful lake, glowing lanterns drifting into the sky, their warm light reflecting on the water, as bursts of fireworks illuminate the scene in vivid colors.
Mochi left, LTX right. LTX didn't even do any fireworks or lanterns

bitter hearth
halcyon yarrow
#

i guess I could do image to video and give it something of high quality to start off with so it can match mochi but then that feels like cheating, it makes longer videos and it's 15x faster and it does img 2 video i mean I'm sure it'll get better right?

halcyon yarrow
#

awwww I think it's not compatible 😦
The expanded size of the tensor (216) must match the existing size (864) at non-singleton dimension 4. Target sizes: [1, 3, 208, 120, 216]. Tensor sizes: [3, 201, 480, 864]
@dusky thistle i guess some adjustments are in order maybe?

#

i could probably hack a solution using ksampler adv eff. again see if that solves it

dusky thistle
#

that comfy just added today

halcyon yarrow
dusky thistle
#

k cool

#

i will take a look at that later

bitter hearth
#

like if you use stock comfy SDE it just doesn't work at all cos noise scaling wrong

#

but then with the noise scaling fixed the same sampler types work

craggy crest
bitter hearth
#

I agree to keep more experimental stuff on the more experimental discords yeah

craggy crest
bitter hearth
#

I kinda see that as a dead channel now

craggy crest
#

kinda. same for swarm - the devs have their own discords and aren't part of sai any more

bitter hearth
#

I just tried the Flux outpainting default workflow
switched it from euler to DPM++ 2S, and doubled steps
results immediately better LOL

bitter hearth
#

euler has been causing shenanigans for centuries yeah

#

even switching to DPM++ 2M helped
didn't even need the ancestral

toxic bone
#

disc pic

dry wave
#

I added documentation now

bitter hearth
#

thanks, this looks great
token merging is an interesting solution to the issue
I use token merging for speedups but it makes sense it would help here

halcyon yarrow
# dry wave https://github.com/kaibioinfo/ComfyUI_AdvancedRefluxControl/tree/main

you probably have a very good grasp of what's going on internally. i understand it at a basic level but I don't think I had the understanding to have built a node like that. doing something like that requires knowing what's even possible, you can't do it when you don't know what's possible. anyways big thx for that node, I'm gonna be using the heck out of it. you want me to tag you with the comparisons?

dry wave
halcyon yarrow
#

1:3 looked really good with the purple princess but i don't like the output with this other cyber girl i'm doing

#

original image left, redux right using 1:3

#

more 1:3 samples, not happy with the quality they dont feel sharp enough

dry wave
#

hm, maybe the image contains too many details that are blurred away by merging

halcyon yarrow
#

yeah the prompt is huge:

cybernetic female, holding pistol, great care is taken to depict the young woman to have anatomically correct arms and hands, intricate circuitry pupils,
tattoo, petite body,
modular cybernetics, an android young woman with medium blonde drill hair haircut in a malfunctioning teleporter merges people with objects, science fiction time travel, a scientist experimenting with time travel technology intricate details, 2d, detailed action background, The art style is sleek and polished, with clean, precise lines that contrast with the gritty world it portrays, it has a semi-realistic style, Each detail is sharp, from the smooth, reflective surfaces of cybernetic limbs to the crisp outlines. The overall look is refined, capturing a high-tech elegance amidst the dystopian backdrop, where every element—from intricate machinery to flowing organic forms—is meticulously rendered with a sense of precision and understated sophistication.

#

about 285 tokens, and 1:3 reduces it from 729 to 81 tokens in total, right?

#

1:9 looks even worse imo

dry wave
#

thats what I get with 1:3

halcyon yarrow
#

that looks fantastic

#

maybe its my negative prompt?

dry wave
#

oh, I haven't tried it with CFG yet

halcyon yarrow
#

no, it's empty

dry wave
#

can you try this as input image?

halcyon yarrow
#

im using a distilled model so cfg is set to 1.2

dry wave
#

as said, clip vision is cropping your input image automatically. Often its better to crop it yourself to ensure that the right part of the image is retained
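
To make the cropping advice concrete, here is a minimal sketch of the box arithmetic. It assumes the automatic crop is a center crop to a square, which is not confirmed in this chat; both function names are hypothetical. Boxes are `(left, top, right, bottom)` in pixels.

```python
def center_square_box(width: int, height: int) -> tuple[int, int, int, int]:
    """Largest centered square inside the image (the assumed automatic crop)."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)


def square_box_around(width: int, height: int, cx: int, cy: int) -> tuple[int, int, int, int]:
    """Largest square centered as closely as possible on a chosen point.

    Use this to decide the crop yourself so the subject at (cx, cy) is kept.
    The box is shifted back inside the image when the point is near an edge.
    """
    side = min(width, height)
    left = min(max(cx - side // 2, 0), width - side)
    top = min(max(cy - side // 2, 0), height - side)
    return (left, top, left + side, top + side)


# A wide 864x480 frame: the centered crop drops 192 px on each side.
print(center_square_box(864, 480))
# Same frame, but the subject sits near the right edge at x=700.
print(square_box_around(864, 480, 700, 240))
```

If the subject is off-center, the second box keeps it while the first one may cut it off, which is the situation the advice above is about.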

dry wave
halcyon yarrow
#

this is 1:3 with merge strength of 0.8 and the uncropped image, i think you might be onto something with your theory that it's my cfg, there's 2 cfg fields, the one on the sampler and the one on the clip text encode node, that one should be set to 3.5+ and it was set to 1.2 too so i think that's probably where the source of my problems were coming from

dry wave
#

yes, it looks like a cfg issue

#

I'm currently making a cfg workflow and will try it myself

bitter hearth
#

its very confusing but there are two common token merging methods
tome and todo
if you used tome for the node you might get better results with the todo method
I use a node I found here for it https://github.com/ethansmith2000/comfy-todo

dry wave
#

this is with cfg

#

cfg=1.7

halcyon yarrow
#

1:3, 1, 1 using cropped image and cfg of 1.2

halcyon yarrow
dry wave
#

maybe its your workflow?

halcyon yarrow
#

100% it's my workflow, im pretty sure the text encoder cfg shouldn't be at 1.anything

#

1:3 + cropped image + cfg 3.5. thanks for helping me find this bug kai it's my cfg settings after all

dry wave
#

so one thing you should always do when using cfg in a distilled model is to skip the first k and last k steps

#

hm, but even if I don't skip steps the image looks good
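
One way to picture "skip the first k and last k steps" is a per-step schedule where real CFG is only active in the middle of sampling. This is a sketch under assumptions: the function name and parameters are hypothetical, and a value of 1.0 is used to mean "real CFG off for this step".

```python
def cfg_schedule(num_steps: int, cfg: float, skip_first: int, skip_last: int) -> list[float]:
    """Per-step CFG values: `cfg` inside the active window, 1.0 outside it.

    1.0 stands for "no real CFG", i.e. only the conditional prediction is used.
    """
    schedule = []
    for step in range(num_steps):
        in_window = skip_first <= step < num_steps - skip_last
        schedule.append(cfg if in_window else 1.0)
    return schedule


# 6 steps, CFG 1.7, skipping the first 2 and the last 1 steps.
print(cfg_schedule(6, 1.7, skip_first=2, skip_last=1))
```

How a given sampler node consumes such a schedule depends on the node; this only shows which steps would be affected.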

halcyon yarrow
#

it was a bug in the code I was doing Math.min instead of Math.max

dry wave
#

ah, okay

halcyon yarrow
#

I rescale the cfg from whatever it is to a 1 to 1.8 range for the sampler and I leave the original cfg for the text encoder

dry wave
#

hm, both values are totally different

#

I wouldn't mix them up

halcyon yarrow
#

when i talk about the text encoder cfg I mean the 'guidance' field in CLIPTextEncodeFlux

dry wave
#

I know. It's good that they renamed it to "guidance"

#

it's really confusing calling it cfg

#

it's a distilled cfg, but it works fundamentally differently from real cfg

halcyon yarrow
#

yeah i hate the whole subject personally

#

i'm upset flux even had to go that route its made the whole thing confusing

dry wave
#

to be honest, I would only use real cfg when you need negative prompts

#

also: real cfg is twice as slow. You don't want to use it every time
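
The "twice as slow" point follows from how classifier-free guidance is usually computed: each step needs one model call for the positive prompt and one for the negative (or empty) prompt, and the two predictions are blended. The sketch below uses a toy stand-in for the model that only counts calls; none of these names come from ComfyUI.

```python
def cfg_combine(cond: list[float], uncond: list[float], scale: float) -> list[float]:
    """Standard CFG blend: uncond + scale * (cond - uncond), element-wise."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


class CountingModel:
    """Toy stand-in for a diffusion model; it only tracks how often it is called."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, latent: list[float], prompt: str) -> list[float]:
        self.calls += 1
        bias = 1.0 if prompt else 0.0  # arbitrary toy dependence on the prompt
        return [x * 0.5 + bias for x in latent]


def denoise_step(model, latent, prompt: str, negative: str, cfg: float) -> list[float]:
    cond = model(latent, prompt)
    if cfg == 1.0:
        # At scale 1.0 the blend equals the conditional prediction,
        # so the second model call can be skipped.
        return cond
    uncond = model(latent, negative)
    return cfg_combine(cond, uncond, cfg)


model = CountingModel()
denoise_step(model, [0.0, 2.0], "a cat", "", cfg=1.0)
print(model.calls)  # one call without real CFG
denoise_step(model, [0.0, 2.0], "a cat", "", cfg=1.7)
print(model.calls)  # two more calls with real CFG
```

The distilled "guidance" value is different: it is passed to the model as an input and does not add a second model call.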

halcyon yarrow
#

but it's not like I can choose to not have it

dry wave
#

?

halcyon yarrow
#

the dual clip text encoder uses the guidance field

#

so i have to put something in there

dry wave
#

yes, guidance.

#

you need guidance. Its not optional

halcyon yarrow
#

lol yeah exactly, so what i set for guidance is what the original image parameters had set for cfg, and what i set for cfg_scale (for the sampler) is the rescaled version of the original cfg value. makes sense? so if the original cfg was let's say 10 then cfg_scale becomes 1.8 and guidance becomes 10

#

if the original cfg was 3.5 then cfg_scale becomes 1 and guidance becomes 3.5
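
The mapping described above can be sketched as a clamped linear rescale. The two examples in the chat (10 becomes 1.8, 3.5 becomes 1) fix the endpoints; the linear shape in between and the clamping outside the range are assumptions, and the function names are hypothetical.

```python
def rescale_cfg(original_cfg: float,
                src: tuple[float, float] = (3.5, 10.0),
                dst: tuple[float, float] = (1.0, 1.8)) -> float:
    """Map a classic CFG value onto the sampler range used for distilled Flux.

    Assumed behaviour: linear between the endpoints, clamped outside them.
    """
    src_lo, src_hi = src
    dst_lo, dst_hi = dst
    # Clamp: max() enforces the lower bound, min() the upper bound.
    # Swapping the two collapses every input to one end of the range.
    clamped = max(src_lo, min(src_hi, original_cfg))
    t = (clamped - src_lo) / (src_hi - src_lo)
    return dst_lo + t * (dst_hi - dst_lo)


def flux_params(original_cfg: float) -> dict[str, float]:
    """Keep the original value as guidance, use the rescaled one for the sampler."""
    return {"guidance": original_cfg, "cfg_scale": rescale_cfg(original_cfg)}


print(flux_params(10.0))
print(flux_params(3.5))
```

A clamp like this is one place where a min/max mix-up goes unnoticed, though the chat does not say exactly where the Math.min bug sat.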

dry wave
#

I think you can just set both values independently from each other

#

use real cfg whenever the model does not follow your prompt correctly

#

like I use it when the model makes super pretty characters although my prompt says they should look ugly xD

dry wave
# dry wave

I don't want to praise myself, but the picture looks extremely good. Razor sharp !_!

halcyon yarrow
#

yeah i agree, image does look nice and crispy

#

i do set both values independently but the source image gen params only have cfg so I have to translate that to something that'll work with my stuff so that's why i independently recalculate cfg for distilled flux models and for other ones like flux destill, mangled, fluxbooru i leave cfg as-is

dry wave
#

oh, but when I think about it...

#

I said you can only downsample by factor 3

#

but using torch.nn.functional.interpolate you could use arbitrary downsampling factors

#

this would allow for more fine-grained control

halcyon yarrow
#

sounds to me like potentially a new version of your style apply node 👼

dry wave
#

hm, I don't want to spam too many versions, but if you want to play around and experiment with it I can later upload a version with arbitrary downsampling factors and different interpolation options

halcyon yarrow
#

yeah i agree, I think sometimes simplicity is key. i'm personally happy with your initial recommendation of 1:3, 1, 1. don't find myself needing more fine-grained controls so maybe it's overkill anyway

dry wave
#

nearest neighbour:

toxic bone
#

she is pretty close to her

dry wave
#

yes, I think this could work

#

nearest neighbour is blurry, though

#

actually this is quite nice

#

you can set any downsampling factor

#

and you have several interpolation methods

#

"area" is what was the default before (just averaging)
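
To show what the two interpolation modes compute, here is a dependency-free sketch on a small grid of numbers. It does not call `torch.nn.functional.interpolate` and is not the node's code; real tokens are vectors rather than scalars, and only integer factors are handled here.

```python
def downsample_area(grid: list[list[float]], factor: int) -> list[list[float]]:
    """'area' style: every output cell is the mean of a factor x factor block."""
    size = len(grid)
    if size % factor != 0:
        raise ValueError("factor must divide the grid side evenly in this sketch")
    out_size = size // factor
    out = []
    for r in range(out_size):
        row = []
        for c in range(out_size):
            block = [grid[r * factor + i][c * factor + j]
                     for i in range(factor) for j in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def downsample_nearest(grid: list[list[float]], factor: int) -> list[list[float]]:
    """'nearest' style: keep one cell per block and discard the rest."""
    return [row[::factor] for row in grid[::factor]]


# 6x6 toy grid with values 0..35, reduced by a factor of 3 to 2x2.
grid = [[float(r * 6 + c) for c in range(6)] for r in range(6)]
print(downsample_area(grid, 3))
print(downsample_nearest(grid, 3))
```

Averaging mixes information from every cell in a block, while nearest keeps a single cell untouched, so the two modes trade off differently between smoothing and dropping detail.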

#

I will make a push on a separate branch and update the main branch after further testing