#🆕|sd3
welcome :)
I'm going the other direction, trying to involve an LLM every time now
they can do other things than just prompts, for example bounding boxes
that'll work well for sd3.X and flux as they were captioned by an LLM - but it won't work so well for, say, sd1.5 as it was pretty much captioned by people who put their images up with google SEO in mind - because those are the text labels in the LAION database
but fun
the LLM captioned ones (basically all the modern ones) never quite captured the chaos of SD 1.5
here's a text string i found in the laion database that works really well for sd1.5: melting liquid falling into the bottom of the drop
see what you can do with it
Yeah this makes the llm generate regions for flux, the prompt following is truly amazing with this, basically regional prompting with flux but better and faster. https://github.com/NJU-PCALab/RAG-Diffusion
thanks I saved this one
on SDXL "colorful backgound:1.3" is the best hidden gem I have found I think
for some reason the model listens to it a lot
:1.3 is supposed to be weights and sdxl doesn't use that. that's midjourney
change the numbers and see what it does
oh yeah I'm just using that to communicate it
in comfy you have to make sure the strength increase is actually done how you intend cos so many nodes differ
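For anyone wanting to check what a weight like that is doing before it reaches the model, here is a minimal sketch that only parses the text. It assumes the parenthesized `(text:1.3)` form; `parse_weights` is a made-up helper, not a function from any UI or node pack, and how each node applies the number afterwards is exactly the part that differs.

```python
import re

# Assumed syntax: "(some text:1.3)" marks an emphasized span; everything else gets weight 1.0.
WEIGHTED = re.compile(r"\(([^():]+):([0-9]*\.?[0-9]+)\)")

def parse_weights(prompt):
    """Split a prompt into (text, weight) chunks. Hypothetical helper for illustration."""
    chunks = []
    pos = 0
    for m in WEIGHTED.finditer(prompt):
        # Plain text before the weighted span keeps the default weight.
        plain = prompt[pos:m.start()].strip(" ,")
        if plain:
            chunks.append((plain, 1.0))
        chunks.append((m.group(1).strip(), float(m.group(2))))
        pos = m.end()
    tail = prompt[pos:].strip(" ,")
    if tail:
        chunks.append((tail, 1.0))
    return chunks

print(parse_weights("a cat on a sofa, (colorful background:1.3)"))
```

Printing the parsed chunks while changing the number is a cheap way to confirm the weight is at least being read the way you intended.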
sure, but see what changing the numbers does, anyway
Color does actually look pretty nice, and this is base sdxl.
wow that looks cool
@bitter hearth LLMs used for object detection, prompt enhancing and now embedding generation? it's like they're taking over the image gen scene!
LLMs wrote my prompts from day 1 TBH
mostly OpenAI models or Florence 2
yeah this is SOTA, currently I think
there was a competition won by someone using LLM agents with Powerpaintv2 which may be better, not sure
or progressive outpainting by LLM agents, there's been a couple papers on that but they weren't compared here
feels like the "Soft Refinement" stage could also be applied to inpainting workflow 🤔
The only thing I don't like is that inference time grows considerably with more masks, not as much if you use a 4-step/8-step lora but still a pretty large amount. Any link to the llm agent with powerpaintv2 or progressive outpainting? that seems pretty interesting.
prompt: fluffy cotton candy clouds, whipped cream, splatters, transmission fluid rain
looks tasty I guess
I'll post it tomorrow
I always forget the names of papers
llm is never going to think about coming up with a prompt like that
and yeah the inference time is long, my regional workflows have been taking over a dozen minutes on an L40S
its pretty rough
and this is with just euler and 8-10 step acceleration loras
not even clownsampler
I do think GPT 4 is still pretty bad at prompting
just say no to anything closedAI does
I'm actually not an open source enthusiast personally
although I understand the motivations behind the movement
i'm anti-OpenAI - i'm also anti-altman
he's a scam artist from what i've seen
have definitely seen some shenanigans in the news regarding that company
which is why i call them closedAI
yeah I find the name pretty funny, it does fit
if you want a good llm, use claude from anthropic, or meta.ai - or one of the opensource llama versions
they said they still won't open-source the original GPT 3.5 because it is too dangerous
even though its weaker than some 7B now
microsoft owns them, the instant microsoft became their exclusive partner, they closed everything and made that excuse
I still need to try claude yeah
and until that contract is over - and it won't be over until they succeed in developing AGI, microsoft gets their technology for free in exchange for giving them access to their data centers, also for free
Where images
so until they aren't in microsoft's backpocket, they do what small and limp says to do

yes, but that's where they are, headed into kling
I see a feline face in it
What's happening there
you have very strange clouds
if I remember rightly this image was testing a flux realism lora
Pixtral large came out too today, really impressive. Its text capability is similar to gpt4o, llama 405b, gemini1.5 pro while its image understanding is actually better than gpt4o, sonnet3.5, and gemini1.5 pro. And it's far cheaper than gpt4o, 405b, and sonnet 3.5.
https://huggingface.co/mistralai/Pixtral-Large-Instruct-2411
yeah this is huge
wait till next year at this time. you will not recognize the world
not sure
it doesn't feel like ML changed that much in 2024 compared to a year ago
has grown less than I expected
it hasn't. wasn't supposed to. next year - massive video and robot push
next year will be video year yeah
things are lining up for that push now
and the vision encoder is just a measly 1b params. Llama 90b's vision encoder is 20b params, but mistral large still beats it by far.
I don't know about robots, don't follow that area
I guess this model is gonna need A100 80GB for Comfyui
you might want to come up to speed and fast in that case
don't prune your model, remove the trash from your data training set
cute adorable big-eyed happy chirping, fluffy baby bird by artist "jacek yerka", by artist "Jasmine Becket-Griffith"'
looks pixar
it does. see what you can do with the prompt
try it in sd1.5 ;)
yeah will have a go next time I get server
sampler: euler_ancestral, scheduler: linear_quadratic, cfg: 3.5 to 4, steps: 32
unless you have an old version of comfy
linear quadratic
Surely karras will be fine
with euler ancestral
ehhhh talk to @bitter hearth about that
what is this
a confused female
sounds like you tried Karras LMAO
simple or beta
the Karras schedule is not good for SD 3.5, Auraflow or Flux
because it takes steps that are too big early on
these models want schedules that take small steps early on in the process
here @rapid pivot https://docs.google.com/spreadsheets/d/17jzqpz3FyolUwvUREQ-oY1SaRLHlliGVkGoy7F5rfaI/edit?usp=sharing read through that
increasing shift helps with this
switching to beta also helps, and linear quadratic helps massively
you need to take care as small steps early on necessarily means large steps later
so there are limits to how much you can focus your steps in the early stages
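A rough sketch of the trade-off described above, assuming the time-shift formula commonly used for SD3-style flow models, `sigma' = shift * sigma / (1 + (shift - 1) * sigma)`. The function names and the 10-step setup are just for illustration, not taken from any sampler's code.

```python
def shifted_sigmas(steps, shift):
    """Evenly spaced noise levels from 1 down to 0, warped by the assumed shift formula."""
    sigmas = []
    for i in range(steps + 1):
        t = 1.0 - i / steps
        sigmas.append(shift * t / (1.0 + (shift - 1.0) * t))
    return sigmas

def step_sizes(sigmas):
    """How much the noise level drops at each step."""
    return [a - b for a, b in zip(sigmas, sigmas[1:])]

for shift in (1.0, 3.0):
    sizes = step_sizes(shifted_sigmas(10, shift))
    print(f"shift={shift}: first step {sizes[0]:.3f}, last step {sizes[-1]:.3f}")
```

With a higher shift the first step gets smaller and the last step gets larger, because the steps always have to add up to the same total distance from 1 to 0.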
its good
red emo girl room
if Sana comes out it will be a bit more flexible
they removed the positional embeddings that cause Flux grid
if robin would just fix flux, flux would be fine
Here it is 
interesting that all it affected is her lipstick
And her hair
didn't make her hair red
personally I am the biggest fan of Lumina
it uses rotary embeddings
needs aesthetic fine tune though
watches the entire world try to learn to think like a computer... and fail
there are some comfy things that can stop prompt bleeding
I don't use the fancier ones as I am not that bothered
or you can just learn how to prompt correctly ;)
yeah you can do it with prompt engineering
I use concat conditioning node when things get spicy, it boxes things off
you've seen my workflows. they are very small and never any negative prompts. and i don't get bleeding unless i want it
its been a few months since I last used a negative yeah
I like how recent models don't have SDXL's trend towards yellow
5 is real close to the edge. go much farther and you'll start over cooking it - however you might find that useful in some situations
Let me try something more then
I think I upped it too much
@craggy crest
It's cooked now
you're coming over for dinner and bringing that with you, right?
AI food is always so good
Playground v2.5 cakes were amazing
your chef needs to learn how to cut up pineapple
Those are special jamanderee pineapples
Hello
does a1111 support sd3.5?
kinda?
Two posters for the Black Friday event
if only you lend me 9 thousands H100
sorry, they're already out on loan
i will make a more powerful architecture than recraft and open source it (i dont have a degree in computer science)
With the power of friendship we can do it all !
(has zero coding or math skills)
I love this neon glow in the dark style I wish people made more art like this
I overhauled a Comfy Ksampler and built it with T2I. Does anyone want to use this Ksampler?
I'm thinking of making it public if there is a response.
I would like to contribute to the development.
The size is 1024 x 1024.
Created with StableDiffusion 3.5.
aple
i have stable-diffusion-3.5-large, where do i find the clip files for it?
Using IC Lora to create each Chinese character and concatenate all together
I haven't used SD3.5 for a while, from all these models, what would you choose, for speed or quality? Or is there some other finetune or model around these days?
Depends on how much vram you have, turbo is the fastest and 2nd best, large is the slowest but the best.
Medium uses the least vram but is worse quality and worse speed than turbo.
ty, I have 12 GB VRAM
Imagine I have a workflow with medium, can I switch to turbo or do I have to change the workflow for it?
Sd3.5 large, turbo or not, won’t fit normally, you will need quantized versions. These will lose a bit of quality (still much better than sd3.5 medium) but use far less vram. The Q8 one is basically lossless and should fit, but lower ones will take even less vram.
https://huggingface.co/calcuis/sd3.5-large-gguf
I'm downloading that one, will test it ty
oh I found I downloaded some other gguf
run the full boat one twice if the speed is acceptable stay with it - some guy on the internet
so city96 seems to have gone off the deep end https://huggingface.co/city96/Flux.1-Heavy-17B
lol @craggy crest have you tried running it?
you need 80 gig vram. i don't have that
what about in lowvram mode?
i don't have 40 gig, either ;)
$1.20/hr to rent 80GB vram runpod for an hour. I have $4 in credits I could play with it there but ehh....
so he just merged it on itself? i don't get that concept, like I could understand if he merged a bunch of loras in like @short thicket did but even that doesn't increase parameter count. do you understand what he did @craggy crest
did you read his entire front page?
LOL I'm downloading it already
Do you feel like you have too much VRAM lately? Want to OOM on a 40GB A100? This is the model for you!
lmao
oh lord. can't wait to see what you do with it
" Usage: Good luck."
I was really curious/interested in what he meant by this sentence: "Merging was done similarly to 70B->120B LLM merges, with the layers repeated and interwoven in groups." so I had a chat about it with chatgpt to better understand it: https://chatgpt.com/share/673cee36-c87c-800f-bc3a-c956e7ff1ac7
doing the lord's work my man lool report back and let us know if you can get anything out of it, even if it's a 128x128 image
is that even possible? what's the smallest image you can make with flux and SD3 and the like?
Flux does 100x100
this is the sort of self-merge it is based on https://old.reddit.com/r/LocalLLaMA/comments/1aj2jw0/miqu_120b_selfmerge_like_venusmegadolphin/
1 pixel by 1 pixel
not even wrong TBH
Bring that person closer to the camera
Then it reveals it is actually Shrek

did you know SD 1.5 and SDXL can also make rly small images like 250x250 or less
if you use Unet Temperature node
I only found that last week
wait'll you see the entire video ;)
someone should secretly finetune shrek into the model
but only certain tokens trigger him
On one hand, it's amazing that you can improve a model by effectively copying around information that it already contains.
On the other hand, doesn't this suggest that the way inference currently works is suboptimal? If a program like mergekit can produce a 120b model from a 70b model that outperforms that 70b model without needing any additional information, shouldn't it be possible to build this into the inference code itself, and get the performance of the frankenmerge from the 70b model directly, without requiring additional memory?
this is exactly what i was thinking and why i asked chatgpt about it
even chatgpt was incredulous this technique would work or offer any benefit and yet it does
deep learning in general is the most suboptimal thing
what would happen if we GGUF q2 or q8 the flux 17b model?
can i run it on 8gb of gpu memory then?! lol
yeah
no I'm just gonna make R2D2 pictures
do you have the requisite vram tho?
sweet dude, make sure to try the best of the best for everything, t5xxl fp32, don't hold back lol
its bf16 already i was gonna convert to fp16 if it was fp32
the t5 can be Q8, its the same performance as fp32 for inference
A model that makes everything look 1% Shrek, even objects

whats better bf16 or fp32?
[INFO ] model.cpp:793 - load flux.1-heavy-17B.safetensors using safetensors format
[INFO ] model.cpp:1776 - model tensors mem size: 9436.40MB
[INFO ] model.cpp:1811 - load tensors done
[INFO ] model.cpp:1812 - trying to save tensors to flux.1-heavy-17B.q4_0.gguf
convert 'flux.1-heavy-17B.q4_0.gguf' success
Conversion completed in 0 hour(s) 15 minute(s) 19.4 second(s).
Press any key to continue . . .
works for me 🙂 @bitter hearth Prompt executed in 80.52 seconds
guys what do we do if Flux Heavy is better LOL
well we have the model, so we don't need to be sad
wow so that's the same seed right?
do side-by-side comparisons using the same seed bc I think heavy is better. it could be considered subjective, it's not amazingly better right?
i made it
ah okay nice
here's the link if you just wanna download it: https://civitai.com/models/964045?modelVersionId=1079329
its already up on civit lol
here's some samples
I'd have more but mochi is hogging the queue right now
What's better about it
the guy who created it just showed a picture of the base model, vs a picture after it self merged and the after was somewhat better than the base, not much to go on
looks like your GGUF was done correctly, thanks a lot
it does lose a fair bit in Q4 but it works
i'm still skeptical about the whole concept of self-merge but it's a thing and it's been demonstrated to actually improve the model so i'm waiting on @bitter hearth to post some side-by-sides
yeah the GGUF seems to hold out well compared to the full 17b model
can you try testing the full model on complex text? I'm seeing poor results on my end for that, im also using a cheap setup so im gonna try to push it on that end in a minute after im done with the images for the gallery
is that you? i saw that name talking about using klown sampler with mochi somewhere and it was nice
lol yeah man gotta spread the knowledge
ya was good stuff
i noticed that if i upload mochi videos to civitai as webp they get treated as images and they get filtered from the images feed and the videos feed so basically they don't get shown
ended up having to add another node to convert it to mp4 so i can share it properly
seems really oversaturated for some reason, high cfg?
cfg 3 or 3.5, maybe I should set it to 1 since flux-d and therefore this version is distilled and im not using any flux guidance nodes?
yeah maybe, probably a good idea.
Ok, so this is a compressed repeat layer style merge? Interesting. So the model itself isn't any bigger because it's just cloning the same weights, but inference will be much slower?
the model is bigger in parameter count and size and inference will be slower
Wait, so how is it only 9.8GB if its 17B params at FP8, yet Fp8 Flux Dev is ~12 GB?
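A back-of-the-envelope sketch for the size question. Going by the conversion log earlier in the chat, the roughly 9.4 GB file was saved as q4_0 rather than FP8. The bits-per-weight figures below are assumptions about typical formats (q4_0 and q8_0 store a small per-block scale, so they come out a bit above 4 and 8 bits), and the parameter counts are rounded from the model names, so treat the results as estimates only.

```python
# Assumed average bits per weight for a few common formats.
BITS_PER_WEIGHT = {
    "fp16": 16.0,
    "fp8": 8.0,
    "q8_0": 8.5,   # 32 int8 weights + one fp16 scale per block
    "q4_0": 4.5,   # 32 4-bit weights + one fp16 scale per block
}

def approx_size_gb(params_billion, fmt):
    """Rough weight size in GB (10^9 bytes), ignoring metadata and unquantized tensors."""
    return params_billion * BITS_PER_WEIGHT[fmt] / 8.0

# 17 and 12 are rounded parameter counts (in billions) assumed from the model names.
for name, params, fmt in [("heavy 17B", 17, "fp8"), ("heavy 17B", 17, "q4_0"), ("dev 12B", 12, "fp8")]:
    print(f"{name} at {fmt}: ~{approx_size_gb(params, fmt):.1f} GB")
```

On these assumptions a 17B model at FP8 would be around 17 GB, so a file under 10 GB is only plausible at roughly 4 bits per weight.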
it's a really interesting concept. when I asked chatgpt about it, the LLM described this as how self-merge works:
- How It Works
a. Layer Duplication and Interleaving
Duplication: Each layer of the original model is copied one or more times.
Interleaving: The duplicated layers are interwoven with the original layers in a specific sequence.
For example, consider a simplified model with layers [L1, L2, L3]. A self-merge might result in [L1, L1', L2, L2', L3, L3'], where L1' is a copy of L1.
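The layer ordering described above can be sketched on plain lists. This only illustrates the ordering, not how the actual model was built: `self_merge` follows the per-layer duplication in the example, and `interleave_groups` is a guess at what "repeated and interwoven in groups" could look like, with the group and stride values picked arbitrarily.

```python
def self_merge(layers, copies=2):
    """Repeat every layer in place: [L1, L2] -> [L1, L1, L2, L2]."""
    merged = []
    for layer in layers:
        merged.extend([layer] * copies)
    return merged

def interleave_groups(layers, group, stride):
    """Stack overlapping slices of the layer list (assumed grouped variant)."""
    merged = []
    for start in range(0, len(layers) - group + 1, stride):
        merged.extend(layers[start:start + group])
    return merged

print(self_merge(["L1", "L2", "L3"]))
print(interleave_groups(["L1", "L2", "L3", "L4", "L5", "L6"], group=4, stride=2))
```

Either way the copies reuse the same weights, so no new information is added; the layer count, file size and inference time all go up.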
Yeah, people do it all the time for LLM's, but it never really improves anything, just allows you to post a flashy number
so technically it's duplicating the layers right? and then quantizing is rounding of the weights in the layers so it's almost like we're artificially doubling the size and then putting it in a zip file
I am not sure what the benefits would be, as flux lite already looks just as good, has full compatibility with, and runs way faster than flux dev
I see. Still not sure why anybody would want to make flux bigger when its already obese/oversized 😅
that's not true, someone posted a reddit thread earlier that showed a 70b that scored higher benchmarks at 120b by self merging
the author of the model showed an image of improvement from base model to 17b using the same prompt so it does show signs of improvement there too
yeah, benchmarks. Self merging increases biases and patterns, which means over-expressed concepts like information trained in specifically to cheat benchmarks expresses even more
I'd be curious what the "improvements" are
@bitter hearth is actively testing the 17B model on a rented A40, we'll see if he can come up with anything that can 'wow' us as far as improvements
base dev without training sucks ass for anything except over baked plastic images 😅
And I say that as somebody who might soon have a job dedicated to training flux lol
to be fair i'm using the q_4 model
here's some more 512px images using q4 flux heavy 17b
My research partner and I were able to demo incredible style/concept improvement in dev with proper training, so we are in the stages of securing funding
jesus fucking christ she looks burnt
have you tried flux dev de-distilled? i swear by that model
Our interest is in full flux dev for corporate
agreed
nah idk, theres no finetune that improves general capability of flux dev? except de-distilled.
that is not true lol
PixelWave Flux is a monumental improvement for flux across the board
a majority of the others are pretty ass though, I will agree. Most people are too aggressive and impatient with training
flux dev's prompt following, human anatomy, and text are better or similar from what I tested, art styles are improved for sure though in pixelwave.
you just said, its all better or equally as good... so its an improvement by literal definition lmao
pixelwave IS really good I agree but it's no flux dev de-distilled
I said flux dev's prompt following, human anatomy, and text are better or similar to pixelwave, art styles are better but it has some cost of the above things.
yea it's still within the same class, it's better but within the same class, whereas de-distill is in another class of its own imo
oh, yeah, you have to learn how to prompt pixel wave flux, that is true. But when you have a good prompt, it follows it better than dev by a huge amount. It absolutely trashes SD3 in every regard too, thats for sure 😅
The main problem with de-distill is that it takes roughly 2x as long and flux dev is already really slow, but yeah it's a slight improvement over everything.
i agree it's the slowest model by far, were you around when i posted that chart with my average model render time?
I'm not seeing anything impressive about it. got any examples?
nah idk, try to format this example prompt for pixelwave, and I'll try it with flux dev
A photograph of a white cat sitting on top of a blue dog. The blue dog is sitting on the brown couch. Behind the couch is a square window with a square cow picture in each corner of the window, the total amount of windows being 4. Outside the window is a ufo hovering in dark outer space.
but it's one of those "you get what you pay for" situations, if you have the time to do it right and you don't care how long it takes and you're willing to pay however much time it takes for flux to do a good job then flux de-distill is the way to go
ok
Probably not the best examples lol, just try it I guess. There is a huggingface space too.
it's less impressive when seeing an image, try some of your rubric prompts, stuff that's hard to adhere to and images where you see it doesn't always hit all the elements. 9/10 times flux de-distill will nail a very complex prompt
lol you dont know about cross post? you gotta scroll down to the gallery bro
oh sorry you're right
look at the Q8 version
that's where all the party happens
wait a sec, I already ran this
here's the direct link: https://civitai.com/models/843551?modelVersionId=943891 again scroll to the bottom where it says Gallery and you can see all the images generated with it
this one. It missed photograph style cause its not trained in as "photograph"
the q8_0 gguf one? I mean all the examples aren't probably the best and all nsfw but its mostly just flux dev with a bit more detail from my testing.
but again w/o context it's hard to judge an image and whether it's any good at adherence
I am seeing all of those, they all look really mediocre to me
what model was this? curious
like, I'm just not seeing anything lol
pixelwave I believe
pixelwave yeah
not bad for pixelwave
I mean, the prompt adherence is almost perfect, so I am happy haha
you see how it missed a lot of the crucial elements tho? try it on de-distill and you'll see it nail it like 100% not a single thing missed. i swear by de-distill bc it has adherence above and beyond anything else out there
like what?
white cat on blue dog on brown couch. 4 cow pictures in the window, outside is space, with a UFO. All it missed was the photographic style (cause its not tagged as photograph), and the 4 pictures being in the 4 corners
SD3.5 hasn't been able to get this image even partially right for me 😅
I mean I got this with Flux.1 alpha 8steps lora which got everything right, even the picture in corner part.
this is a repost from another day
im still messing around with Shuttle, quite underrated and
yeah id say that's 100% nailed it
but let me try with dev, will take forever but lets see.
thats not correct. Thats not a photograph, thats not space (its a night sky), and the pictures are outside of the corners of the window, not inside of them like requested
and this is with de-distill models added to the list plus a rogue SDXL model at the bottom
oh right, shuttle
yeah accidentally tested the old version of the prompt, let me try with the one I gave which has the dark outer space part.
when adding the proper photographic style tag, and changing the prompt to have the 4 pictures BY the corners, not IN them
anyways, I have to go for now 😅
just so we're all on the same page the prompt we're using is this one:
A photograph of a white cat on top of a blue dog sitting on a brown couch in a living room. Behind them is a window and 4 cow pictures, one in each corner. Outside the window is a ufo hovering and outer space
No adjusting the prompt or the wording or enhancing it right?
@halcyon yarrow I'll keep an eye on de-distilled, but the pictures on civit aren't impressive, so hopefully there will be better images to interest me when I look back
one of the main problems @craggy crest had with that prompt is that it's very loose and incomplete and open to interpretation, just wanna make sure we're agreeing that's the prompt before i try it with flux heavy 17b
wait, its a month old? nevermind
Mine is modified, but almost that, yes
i don't think any amount of images will change your mind, its just one of those things you have to really give a shot and try it yourself and do a side by side comparison on your own to really test it's power
well any modifications are 'unfair' in the sense that again it's a bad prompt full of holes so by changing the text you're giving the model a leg up on exactly what it should do and how
I guess. I will do it some other time then. Too busy trying to secure proper funding for our corporate version
do a side by side with your corporate version, overall you can use cfg 3 to 7, and set the steps to a minimum of 60, ddim, beta is what I like to use on ksampler
i got a good prompt somewhere from here i gotta find, it was like orange blueberries and blue orange on a blue plate with orange wall on an blue napkin, thats not it but it was like that
A digital color photograph of a white cat sitting on top of a blue dog. The blue dog is sitting on the brown couch. Behind the couch is a square window with a square cow picture next to each corner of the window. Outside the window is a ufo hovering in dark outer space.
My version. I had to add the style tags for photographic style, since "photograph" is too broad for a model with multiple different styles of photography trained in
I also specified the cow pics should be NEXT to the corners of the window, not IN them
previously I used the old prompt, which was similar but different details
this is with same exact prompt, no enhancing with flux dev. It seems to get the prompt correct but detail is lacking, could be fixed with a better sampler.
the composition is good, but the style/look is horrifically bad lmao
but thats kinda dev in a nutshell
anyways, gotta go
i wouldn't say those are square cow pictures next to the windows
later @winged seal nice talk
you're alive!
Yeah 😅
Haven't really been here since our project stopped using SD3.5
anyways, I really do need to get going, I'll talk later fellas
was starting to get kinda worried about you. good to see you :)
Challenge prompt: A blue orange on a blue plate against an orange background with orange blueberries on a blue napkin
i got that from here long ago, its a great test
prompt? looks nice
can't send prompt, discord said its too long
it responds well to loras as well, you just need to put the strength high
What is Big Flux Thing?
self-merge of flux dev, 17b params. does seem pretty interesting but has some cons and pros
yeah thats one, I didn't try it yet so can't say much about quality. From examples tho, seems more creative and detailed than flux but worse at other things?
well, I figured you meant worse at some things. My question was what
if anything stands out
Text at least, an example(not mine, but author was showing)
speed seemed ok
the downside is its a bit overcooked, like CFG burn from high CFG
but that might be possible to deal with
Is it consistent? because Flux flubs text too. It is not perfect all the time
but ok, just curious. Have been busy last week or two so catching up to see if anything cool has developed for either Flux or SD3.5L
not sure about text, didn't test that
will do some more tests later
I had to shut down the server cos someone released a GGUF
so I was wasting money with 45GB server lol
i think the big flux thing is better in that one right?
I mean pixelwave v0.3 finetune of flux is kinda impressive, knows many art styles with flux capability in prompt following at least.
I liked this flux version best https://civitai.com/models/941929/flux1-dedistilledmixtuned-v1?modelVersionId=1054490
description: Based on Flux-Fusion-V2, Merge of flux-dev-de-distill, finetuned by ComfyUI, Block_Patcher_ComfyUI, ComfyUI_essentials and other tools. Recommended 6-10 steps. Greatly improved quality compared to other Flux.1 model.
6-10? I need to try it then, I like speed.
yeah I haven't gone beyond 8 steps in ages
i downloaded it and tried it and wasn't super impressed by it
A photograph of a white cat on top of a blue dog sitting on a brown couch in a living room. Behind them is a window and 4 cow pictures, one in each corner. Outside the window is a ufo hovering and outer space
flux-dev-de-dis...Q8_0 | 🌱 2503417111 | 🦶 62 | 🦮 3.0 | cfg_scale_alt 3.5 | 🧠 flux_aeSft.sft | 🎤 res_2m | 🕦 beta | 🗓 11/19, 7:27 PM | ⏱️ 507s
technically the cat is on top of a blue dog, it's worded loosely so it doesn't mean the cat has to physically be on top, it missed the outer space part and the 4 corners
can only discourage this test prompt as much as possible TBH
it feels weird how the most ambiguous test prompts end up being popular
It’s usually a good prompt to test though for prompt following, sd1.5/sdxl models perform the worst, pixart sigma, sd3.5 medium are middle, and auraflow, flux, sd3.5 are the best at it.
2nd shot to see if it did any better
I mean I tested with 25 steps and 8steps. 62 steps is kinda unfair but yeah still flux de-distilled nailed it.
I wouldn't say nailed it, i think the outer space view from the window is a pretty crucial element of the prompt
I'm willing to forgive the paintings not being in the corner but yeah like Neon said that prompt is pretty ambiguous so its not really 'fair'
if you let me rewrite it and really establish all the elements, with enhanced prompts flux de-distill would 100% get it on the first shot
here's my rewrite:
A realistic photograph capturing a white cat physically sitting on top of a blue dog on a brown couch in a cozy living room. The couch sits against a wall featuring a large window. The window is bordered with four distinct cow pictures, each precisely placed in one corner of the window frame, creating a symmetrical arrangement. Through the window, the scene reveals the vastness of outer space, with a dark star-filled sky, distant celestial bodies, and a UFO hovering midair. The juxtaposition of the living room's warm ambiance and the surreal outer space view creates a striking visual contrast.
that's so cool it looks like an art scene set up in an existing library
Come Your Visit The Pleasentville Local Library Before Thursday
Art expo featuring works by Sharky McSharkton and his famomus shark themed art pieces
first shot with enhanced prompt, it got all the elements except the pictures in the 4 corners
@dusky thistle so i looked into that idea of monitoring your posts and sharing them on civit, I would need to use this library called discord-js-selfbot-v13 where basically it's a bot impersonating a real user and using the tokens from a real session to access the data in this room, it's very taboo and it could get me banned for using it so I gave up on that idea lol
took 4 shots but I'd say this one nailed it 100%
- wouldn't the frames prevent the window from sliding open? don't think about that
- shouldn't the cat be physically on top of the dog? not exactly
- what prompt was used?
A realistic photograph capturing a white cat physically sitting on top of a blue dog on a brown couch in a cozy living room. The couch sits against a wall featuring a large window. The window frame is adorned with a cow picture at each of its four corners, ensuring all frames are immediately adjacent to the vertices of the rectangular window. Through the window, the scene reveals the vastness of outer space, with a dark star-filled sky, distant celestial bodies, and a UFO hovering midair. The juxtaposition of the living room's warm ambiance and the surreal outer space view creates a striking visual contrast.
Thank you for using comcom analytics.
"comcom analytics" supports all community managers (moderators and server owners) by stats, visualization, and analytics.
If you have any questions, feel free to ask us!
help
Here is the image you requested.
wrong kind of help
some guy posted some new files for clip G, https://civitai.com/models/929400?modelVersionId=1064550 haven't tested it on SD3 yet but this is SDXL.
- left is the FP32 version
- right is the standard clip_g everyone uses
- fixed seed
The refiner model uses the single clip loader not the dual like SDXL base< that is going to be a problem
wdym?
for SD3.5 - clip_g is your workhorse and clip_l and t5xxl share tokens and work along side it. it looks like he's tried to combine both of the encoders that sdxl uses, which are 2 of the three 3.5 uses
interested to see what your tests show
i mean so far it's shown a difference, i don't know if i like the fp32 version better but maybe it's more apparent with sd35, im gonna try it with turbo so this is my default setup which relates to the file in the screenshot
that's a 1.3 gb file so then ill try the 2.7gb file called fp32SDXLFLUXRefinerCLIPG_clipGLargePrunedFP32.safetensors fixed seed, we'll see in a bit
@halcyon yarrow one is sdxl refiner 1.0 with fp32 clip g, the other is clip g large pruned fp32
definitely a confusing listing
he added in regular fp32, the other is "large" version
im guessing the 1.5gb is the pruned version and the 2.7gb is the full fp32 version I get that, but why not use the full fp32?
so it's large bc it's fp32 or is there like a medium size?
so the large version is the fp32 2.7gb file right?
i can't get his gguf version of the clip model to work
the one on the left is using the 2.7gb file and the one on the right is using the 1.4gb file
the one on the left is the 1.4gb file, the one on the right is the 2.7gb file
again fixed seed, same everything
poking around to try to figure out what the heck they were saying i find this, and why stop at large when you can go gigantic cs-giung/clip-vit-gigantic-patch14-laion2b
the way i see it, the size difference between 1.4 and 2.7 is so small that it's not really a lot of extra memory overhead, especially when the G model is so important for sd35
its hard to tell which one is better, it's so subjective, i dont wanna be biased towards the larger file but they really do look very similar even if not exactly the same. what do you think @mortal mesa would you say either of those 2 side-by-sides are objectively better?
seems slightly finer but ya i might be imagining that
I'm just going to set it as my default for my workflows moving forward, it's one of those things where if i notice a steep drop off in quality or speed i could always revert, so moving forward my configuration will be
{
"clip_name1": "Long-ViT-L-14-BEST-GmP-smooth-ft.safetensors",
"clip_name2": "fp32SDXLFLUXRefinerCLIPG_clipGLargePrunedFP32.safetensors",
"clip_name3": "flan-t5-xxl-Q8_0.gguf"
}
Prompt: An insanely sleek and futuristic hypercar races through the rain-slicked streets of New York City at night. The car's aerodynamic body gleams under the glow of neon signs and streetlights, with water droplets streaming off its surface as it cuts through the rain. Its LED headlights pierce through the misty air, reflecting off the wet pavement and creating vivid light trails. The urban backdrop is alive with towering skyscrapers, glowing billboards, and bustling traffic blurred by the car's incredible speed. The atmosphere is intense and cinematic, capturing the raw power and elegance of the hypercar against the vibrant energy of the rain-soaked city.
seems better to me
not quite sure at the moment whether stock or upgraded encoders are the best idea
you can replace:
Clip-L with Longclip-L or Improved Clip-L
Clip-G with this one
T5-xl with T5-xxl or Flan-T5-xxl
and use higher precisions, but I am not sure what is worth it
#artisan-1 running shine
@muted dove are those made using the incontext lora for flux?
No loras
so just pure flux? if so can you show us what one of those prompts looks like?
Sure...
A hyperrealistic technical tutorial illustration depicting the step-by-step process of building a Roller-coaster. Each step flows logically, with consistent lighting and style throughout the image.
that's it? did those incontext guys just trick us and its not needed at all? that's a super simple prompt too. is that base flux dev or a specific finetune?
I used AtomixFlux, but dev should do it too. I do feed that through an LLM as part of the workflow, but try it 😉
In context Lora improved quality by a very large amount but flux can do it without it too. See the images below, these are just plain prompts nothing else at 1024x1024 res.
is there a node in comfyui that you know about that can take those images and gif-y them?
is that a lora or is that just prompting too?
Just prompting with base flux dev.
And you can ask chatgpt code for making them into gifs
what's it called when they come out like this? isn't that called a spritesheet?
https://github.com/stormcenter/ComfyUI-autosplitgridimage something like this would be cool in the workflow
@pseudo owl can i get the prompt for any of the gif ones you made?
I don’t have the full prompt now, but you can start with
“A seamless 4-image grid of consecutive frames from a gif. The gif is of …..,”
yay!
A seamless 4-image grid of consecutive frames from a gif. The gif is of pink teddy bear dancing
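For turning one of those 4-frame grid images into an actual gif, here is a rough Python sketch rather than a ComfyUI node. It assumes the frames sit in an even 2x2 grid in reading order and that Pillow is installed; `grid_boxes` and `grid_to_gif` are made-up helper names, and the 250 ms frame time is just a guess.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, upper, right, lower), the order Pillow's crop() expects


def grid_boxes(width: int, height: int, rows: int = 2, cols: int = 2) -> List[Box]:
    """Return one crop box per grid cell, left to right, top to bottom."""
    cell_w, cell_h = width // cols, height // rows
    return [
        (c * cell_w, r * cell_h, (c + 1) * cell_w, (r + 1) * cell_h)
        for r in range(rows)
        for c in range(cols)
    ]


def grid_to_gif(src: str, dst: str, rows: int = 2, cols: int = 2, ms_per_frame: int = 250) -> None:
    """Split a grid image into frames and save them as a looping GIF."""
    from PIL import Image  # third-party dependency: pip install pillow

    with Image.open(src) as grid:
        frames = [grid.crop(box) for box in grid_boxes(grid.width, grid.height, rows, cols)]
    frames[0].save(dst, save_all=True, append_images=frames[1:], duration=ms_per_frame, loop=0)
```

Usage would be something like `grid_to_gif("teddy_grid.png", "teddy.gif")`, where both file names are placeholders. If the model draws gutters between the frames, the boxes would need an inset.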
@bitter hearth look what dropped https://huggingface.co/InstantX/SD3.5-Large-IP-Adapter
wow nice
a good IP adapter would be great
I can put R2D2 pictures into it for style transfer
And "Large" means definitely for sure won't work with medium, correct? 😛
nope. it's trained for large, the blocks are different from medium, but it might work. you could try it
sometimes stuff weirdly works when it shouldn't
one of the PAG nodes, made for SDXL, works with Flux as the python syntax happened to coincide with some other ComfyUI code regarding blocks
and my favourite SDXL lora is one that was trained on SD 1.5 but happens to have an effect on SDXL
or replacing T5 with Flan-T5, a Google fine tune not made for diffusion, improved my images
there's likely embeddings in the lora that affect the clip layer. weird that it would , but the code is probably taking the embedding and applying it where it works
black magic imo
oh yeah that's a really good point it could have affected clip
cos I always use lora loaders that include clip
I don't think it would fly on automatic1111. You've got something special
A1111 is essentially just legacy code at this point
its only really for the people who started on it, and don't want to move off due to familiarity
i wholly disagree but i wont argue against someone's clear biases. I'll just recognize those.
Forge is A1111 with modern everything. I tried it but gave up because I couldn't separate T5 and Clip-L for Flux, which makes it unusable (compared to Comfy).
why are you accusing me of being biased?
I don’t like forge or a111 bc ultimately I just want extreme level control of my setup, ComfyUi is the only tool I don’t have to depend on developers to add support for something for me to keep moving forward
i wouldn't use forge if you paid me, they're slow to add in support for what I want to use, if they add it in at all. and auto1111 was good when it came out. it's no longer good.
but if someone else wants to use them, more power to them
we are all biased. don't take it personally. you think a111 is out of date or as you put it, "Legacy code", and that the only reason someone would want to use it is the one you define. That's why i "accused you" of it. I won't argue someone's biases. We are all free to have our own beliefs.
Yeah I mean everyone is biased (including me) but a111 doesn't even support flux natively nor sd3.5 from last I heard, you need to use a separate branch, I would kinda consider that outdated.
The cogvideox controlnet and vid2vid with the new reward lora is kinda amazing
Wow yeah that does look kinda cool
I’ve been taking my sweet time to adopt cog I’m still playing with mochi but yeah those videos look good. So the left is the guidance and the right is the render so video plus prompt to video right?
okay so you didn't have a real reason. I'm going to block you and I suggest you do the same
i'm not trying to insult you. your biases and choices are valid for you, so i offer respect by not arguing against them.
Yeah left is using this model: https://huggingface.co/alibaba-pai/CogVideoX-Fun-V1.1-5b-Control It's similar to controlnet union, accepts depth, surface normals, pose, and more.
Right is using it too, but plain vid2vid.
Both have hps lora from here which improves quality: https://huggingface.co/alibaba-pai/CogVideoX-Fun-V1.1-Reward-LoRAs
Design professional logos for my Instagram platform, where we market products, using the name ando and the colors dark blue, gold, light pink, and black.
1280x768 res for the left one, the weird morphing of the face mostly goes away, and much better overall.
Very impressive, how long do generations usually take?
wow cool set of prompts, what model did you use, any loras? how did you come up with these prompts? some of them look worthy of being wallpapers
that's a good question, i tried one 13 frame video at 1280x768 and it was taking in the order of like 1hr+ i was like screw it
What gpu?
i think CogVideo is the next big thing, between all the gadgets and loras its setup to be the next gold standard for video tooling
8gb laptop 4070
Hmm, might be vram limitation
to put it into perspective a 13 frame 480x840 usually takes 110 to 170 seconds, I start getting into the 230-300+ range if i use res_3s or the res_5s one
The high res takes pretty long, 18mins.
yeah for sure nothing sucks more than spending 90 minutes waiting for something to render and then getting nothing bc it ran out of memory, that one was for 1696x960
You usually don’t need that high res but might be a bit better. The videos are made by that hps Lora which improves quality overall.
Yeah it was already the best video gen model before mochi came out. Mochi beats it in text to video but cogvideo has so many extra tools now. Tora is really amazing too.
i haven't even heard of Tora, going to have to check it out. i also messed with svd yesterday, results were okay
I just took an image I made with a figurine model and just cranked the motion settings to max, and then tried a few variations, I'm thinking I want to integrate both SVD and Mochi into my system so I can click a button and turn that into a little clip I can share or create a video from text input
looks like it's just showing the front and then showing the front again
Thank you much Richard! I used Sd3 Large and Medium for these, no loras required. They have kinda come as the consequence of experimentation trying to capture the right vibe and feel. I wanted to make a sort of band merch-background image, but the desire morphed into making these when I felt I had got some of the right key words down. Honestly, it was incremental and word based improvement. I gained my knowledge from @craggy crest who is wonderfully talented and well versed. She taught me kinda how to get from a point A to a point B. 🙂
Something perfect for this is OrbitX loras with cogvideox. Really good at rotations around anything, wait let me show some examples.
Would you mind giving some details on how you are using the LLM as part of your workflow? I have been playing around with similar prompts in Flux Dev since you posted those examples, but I can't get anything similar. The LLM is clearly doing some heavy lifting.
So you’re saying rather than working with svd for orbit animations I’m better off adopting cogvideox with the orbit loras and I’ll get a much better result right? Bc that’s what I’m getting from it. That gives me motivation to set it up now lol
im trying to make a seamless texture, i placed a "seamless tile" node and a "Circular VAE decode (tile)" node... and the textures do not render as seamless
im at my wits' end
seriously? what makes you think anyone's going to fall for this scam?
@pseudo owl the Lora you refer to is this one right?
Yes these ones
Search my name on Civitai and look at the latest workflow I uploaded. That has an LLM in it, and I'm using it with Ollama.
wow look at that outpainting range
this is gonna be so good
Comfyui already supports it i believe
ah nice
I wish Comfy prioritised supporting the Int4/FP4 Flux
its the fastest thing for GPUs 24GB and under
for big GPUs Comfy still has max speed cos they can TensorRT flux
i thought flux already supported controlnet, ipadapter and inpainting. what is this new tools thing offering?
first-party
Mostly it’s just far better and as loras.
so if i wanted to use something like flux redux its about the size of a lora
and then since there isn't a Comfy node for it yet I would just use this script: https://github.com/black-forest-labs/flux/blob/main/src/flux/cli_redux.py
We’re thrilled to share that ComfyUI now supports 3 series of new models from Black Forest Labs designed for Flux.1: the Redux Adapter, Fill Model, ControlNet Models & LoRAs (Depth and Canny).
These additions provide users with easy and precise control of details and styles in image generation.
Not yet, but I’ll try soon.
Redux
The Redux model is a model that can be used to prompt flux dev or flux schnell with one or more images.
Download the sigclip_vision_patch14_384.safetensors model and put it in your ComfyUI/models/clip_vision folder and download the flux1-redux-dev.safetensors and put it in your ComfyUI/models/style_models folder.
You can then load or drag the following image in ComfyUI to get the workflow
https://huggingface.co/Comfy-Org/sigclip_vision_384/blob/main/sigclip_vision_patch14_384.safetensors
https://huggingface.co/black-forest-labs/FLUX.1-Redux-dev
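A quick way to double-check the file placement described above, as a small Python sketch. The two file names and folders are the ones from the instructions; `missing_redux_files` is a hypothetical helper, and the ComfyUI root path is whatever your install uses.

```python
from pathlib import Path
from typing import Dict, List

# File name -> folder (relative to the ComfyUI root), taken from the instructions above.
REDUX_FILES: Dict[str, str] = {
    "sigclip_vision_patch14_384.safetensors": "models/clip_vision",
    "flux1-redux-dev.safetensors": "models/style_models",
}


def missing_redux_files(comfy_root: str) -> List[Path]:
    """Return the expected Redux model paths that do not exist yet."""
    root = Path(comfy_root)
    expected = [root / folder / name for name, folder in REDUX_FILES.items()]
    return [path for path in expected if not path.is_file()]
```

Calling `missing_redux_files("ComfyUI")` returns an empty list once both downloads are in place.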
Finally bfl open sourced something apart from the original flux models at least, where’s video gen model though.
they've open sourced more than half their entire company lol
we can't criticise BFL for being too closed
I haven't tried the BFL Controlnet yet, but I'm very sure it will be high quality. The xAI lab control nets were undertrained and almost unusable
ye I've been doing the control net stuff in SD 1.5 and then refining with flux so far
now can probably do it all in flux
I agree. Also, in contrast to SAI, BFL never claimed to be a open-source company. Still they deliver more than SAI so far
yeah I'm very happy with BFL
they also gave Schnell with Apache 2 whereas SD 1.5 and SDXL are OpenRails
Apache 2 is a lot better
Flux outpainting (someone tested in banodoco discord)
someone needs to wake up Clownshark
inpainting/outpainting benefits a lot from better samplers
I wonder if redux would stack with depth and canny
just got it working
i just updated to the latest comfy, don't like how im on the new UI finally
I believe it works
you can downgrade
unclear why you would want to though aside from familiarity
if you get bugs maybe
i have to stay on this latest version if i want the flux tools
but yeah its just familiarity i just hate having to relearn where they put in all the old stuff
I meant downgrade the GUI not the overall Comfy install
there is an option in settings
i looked for the option but was unable to find the legacy UI mode, anyways its fine ill be a big boy and adapt. i noticed t5xxl v1.1 doesn't work with redux, it creates a black image
i lost my sampler preview tho and i don't see the option to enable that in settings 😦
i don't have base flux dev installed so i used shuttle 3 diffusion, works great lol
flux mini however wasn't compatible, in case anyone is curious lol
just tested it with "UNET LOADER GGUF" and it does not work with the pixelwave model but it does work with @short thicket 's model
it also produced way nicer results than shuttle
i am 100% fast-tracking and integrating flux redux into my stuff, its cool but im having trouble getting it to adhere to the prompt. i just give it a random image and say do the same thing as the examples, "sketch, b&w" for example, and it completely ignores it
these days with conditioning you want to set area, timesteps and strength for all conditioning types
so there are a lot of variables to tweak
what happened to shuttle 4, it disappeared
doesn't rly matter since the model wasn't on there yet
there was a shuttle4?
Oh it just came out looks like: https://huggingface.co/shuttleai/shuttle-4-diffusion
oh thanks, I didn't see when I looked
I should have scrolled
need to test it but waiting for quants
how did you make that? I have the feeling, too, that Flux ignores the prompt as soon as you condition it on an image
I haven't booted a server to test yet but
timesteps and strength are what I would play with
comfyui has no strength for the style model yet
nah, I don't wanna mess with comfy code. I will wait for the next update
I wonder if the conditioning multiply node would work on it
or otherwise, you could multiply the strength of the conditioning coming out of your text encode node
this is a random idea but also maybe ClipAttentionMultiply or Clip Temperature Multiply
those nodes are really good on SD 1.5
hm, I think the way it works is that it adds additional tokens to your prompt
like expanding the prompt by a new prompt it generated from the image
ah okay that makes sense
if it works via prompt then it might be better on flux-dev-de-distill
cos that seems to follow your prompt better
maybe yeah, could also be related to this
okay, increasing prompt length definitely helps
flux has a dedicated text network that runs along side image generation. It's all self attention.
I am struggling to work out if they are actually better but there is Longclip or Zer0int's fine tunes for Clip L
and then Flan T5 for T5
as alternative text encoders
I think the issue is
SIGCLIP is basically transforming the image into tokens
and then the style model translates these tokens into T5 prompt space
and they are added to the prompt
the thing is now: the number of tokens in the image might be quite large
and if your prompt is very short, the newly added tokens just outweigh the prompt
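To put rough numbers on that "outweigh" idea, here is a tiny Python sketch. It only counts tokens, which is a crude proxy (attention is not simply proportional to token count), and the counts you plug in are hypothetical; `prompt_share` is a made-up helper.

```python
def prompt_share(prompt_tokens: int, image_tokens: int) -> float:
    """Fraction of the combined conditioning sequence that comes from the text prompt."""
    total = prompt_tokens + image_tokens
    if total <= 0:
        raise ValueError("need at least one token")
    return prompt_tokens / total


# Hypothetical counts: a short prompt vs. a long one against the same image.
for n_prompt in (10, 100, 500):
    print(n_prompt, round(prompt_share(n_prompt, image_tokens=600), 3))
```

With a few hundred image tokens, a ten-token prompt is a tiny slice of the sequence, which would fit with longer or repeated prompts helping.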
I got consistently anime images by just repeating over and over in the T5 prompt that I want an anime image
I think they are always fixed in CLIP models
actually, they are already super small
like 350x350 pixel or something like that
I mean, its not that bad:
An anime character in the style of anime and manga artists like studio Ghibli with vibrant colors, clear anime line arts, its a perfect anime image. An anime image of a young man.
This transforms any photo of a man into an anime image
ah yeah overstating things can help a lot
I've started just dumping 1000 tokens from GPT 4o in prompt boxes and that works well
You guys are talking about Redux?
I kinda didn't get the hype for shuttle diffusion3 but from some high-res testing, its much better than schnell and even dev sometimes. A quick gen I made with just 4 steps and Euler.
Have you guys tried it yet? I couldn't get it to work per se
Side note t5xxl v1.1 produces a black image but v1 works fine. Haven’t tried flan. You guys can confirm it works?
I’m thinking flux redux only works as intended when using base flux dev only, my experience is that I’ll type in a prompt let’s say sketch black and white and I tried 6 or 7 flux models I have via the unet gguf loader and I would get different stylized versions of the original image but they would all be in color, it would just ignore my prompts completely basically, I even cranked up cfg to 8 to make sure it wasn’t that
haven't tried the new stuff today yet
I think this might be happening and explain what I’m experiencing using a super short prompt but I’m just following the example WF given and in their website they use super short prompts
shuttle diffusion seems great yeah
yeah, I'm trying to look through the comfyui code but its as messy as usual X_x
It’s easy to test just overly elaborate on how it should be a black and white pic maybe 1000 tokens worth and see if it affects the image
So let’s say vision model gets 1000 tokens out of an image then 1000 from the prompt should balance it
I guess I could load in a black and white pic as my source style image and replace empty latent image with the target image and the set a high denoise?
cos T5 has relative positional embeddings you could try dumping a huge prompt in (use LLM to write)
it was trained on 512 tokens or so but people have got it to recall things that were over 3,000 tokens in
depends how the node and back end are coded though they might split it automatically
its weird, yes, cause I think the additional tokens are appended on top of the 512 tokens
thats why Reflux is so slow
and I think its 576 additional tokens
but I don't think you have to write such a long prompt. I found it sufficient to just repeat what you want a few times
"black & white image, monochrome, black and white, an monochrome image in black and white"
yeah I haven't tested optimal prompt length yet
thats probably already enough?
maybe yeah, for photographic prompts I tended to only repeat 2-3 times
I think I see the chicken's point tBH
@dry wave original image on the left, using shuttle diffusion on the right, my prompt is:
black and white sketch, black and white sketch,black and white sketch, black and white sketch,black and white sketch, black and white sketch,black and white sketch, black and white sketch,black and white sketch, black and white sketch,black and white sketch, black and white sketch,black and white sketch, black and white sketch,black and white sketch, black and white sketch,black and white sketch, black and white sketch,black and white sketch, black and white sketch,black and white sketch, black and white sketch,
using the flux_redux_model_example image provided by the website
@halcyon yarrow @dry wave
For redux, Text prompt isn’t supposed to matter by default. It’s supposed to be image variation.
You can hack it though by averaging the text prompt and redux prompt, then the text prompt matters, you can also have strength control by multiplying the redux prompt.
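A minimal sketch of what "averaging the text prompt and redux prompt" could look like on plain Python lists. This is a guess at the idea, not the actual hack: it assumes both conditionings are lists of equal-width token vectors, pads the shorter one with zero vectors, and blends position by position. `average_conditioning` and `text_weight` are hypothetical names.

```python
from typing import List

Vector = List[float]


def average_conditioning(text: List[Vector], image: List[Vector], text_weight: float = 0.5) -> List[Vector]:
    """Blend two token sequences position by position; the shorter one is zero-padded."""
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must be between 0 and 1")
    if not text and not image:
        return []
    dim = len(text[0]) if text else len(image[0])
    zero = [0.0] * dim
    blended = []
    for i in range(max(len(text), len(image))):
        t = text[i] if i < len(text) else zero
        m = image[i] if i < len(image) else zero
        blended.append([text_weight * a + (1.0 - text_weight) * b for a, b in zip(t, m)])
    return blended
```

Pushing `text_weight` towards 1.0 makes the text prompt dominate, towards 0.0 the image tokens win. Real conditioning in ComfyUI is tensors plus extra metadata, so this only shows the arithmetic.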
i tried it but that didn't work
i also just tried my latent img trick aka img2img setup
that didn't work
so instead of empty latent image i replaced it with load image > vae encode > sampler
How did you try it?
so as you can see in the top left the style image is this black and white circular image, so the theory is it's reading that image and extracting the style for that, then for the text prompt im overloading it with that repeated text 'black and white sketch' and the image attached is what im getting
and then for the denoise i tried 0.2 and 0.8 and i get similar results, but nothing in black and white
like their own blog post says, my new theory is that it only works on base flux dev model bc somehow the lora is aligned only with the token space from the base model
https://blog.comfy.org/day-1-support-for-flux-tools-in-comfyui/
https://comfyanonymous.github.io/ComfyUI_examples/flux/
Hmm, I think something is missing then probably. I’m not sure 🤔
yeah it feels that way, they do show how to chain 2 images together which is pretty cool too
i wonder what'll happen if i chain the 2 images together like that
maybe in the example shown in the blog they're actually chaining 2 images? bc from what i see on the blog post its just 1 image and prompt
Hmm, maybe it’s only pro. The examples are pro as well.
In addition to the [dev] adapter, the API endpoint allows users to modify an image given a textual description. The feature is supported in our latest model FLUX1.1 [pro] Ultra, allowing for combining input images and text prompts to create high-quality 4-megapixel outputs with flexible aspect ratios.
oh i must've missed that
It’s really sad if only flux pro supports it, but seems like we can hack our way to use a text prompt.
yeah that is sad, if you can figure out a way to hack it @pseudo owl do tag me, I'd love to get something like it. for now it still has value tho, I can replace my img2img setup with this WF and get higher quality more coherent output
I'm actively running a script that's processing 540 loras I have for flux-d by running them through llama 3.2b uncensored to assign them 3 of 25 possible categories
FLUX TOOLs - Run Local - Inpaint, Redux, Depth, Canny. Here is how to run all the new FLUX Tools on your computer.
canny and depth loras are really bad quality output with the default workflows, anyone got a better one yet?
i think that's pretty cool actually, I just finished integrating redux into my stuff
left is using Flux redux + empty latent image
right is using SDXL + load image
has anyone tried redux?
Sorted out the aspect ratio and added auto prompt
A comparison showing the difference between default (first) image and the same using "Lying Sigma" at -0.5 strength.
I don't think so
the Reflux lora is just a projection from CLIP-Vision to T5 latent space
it should not matter which Flux Checkpoint you use as every Flux Checkpoint operates on the same T5 latent space
I'm pretty sure we can get Reflux working better by playing around with the generated tokens. It's insane to generate ~600 additional tokens to describe a image. Maybe we can cluster and merge them to downweight their impact. I might play around with that on weekend
The one recommended with the comfy workflow. The workflow is in the images above.
Are there big differences between the different clips. They are still a huge mystery to me. I know only certain models support certain clip models but outside of that…
I think Neon found the answer on why it’s not working, the specific redux feature to do prompt + image is limited only to pro model. It says so right on their website
I'm pretty sure you can make it work for flux dev, too
Yeah I agree with what you said, the model doesn’t matter, it works on base flux dev as well as any fine tune, but it only works limited to image to image kinda ignoring the prompt essentially . If you wanna do stuff like on the website where you just give it a short prompt saying black and white and an image and have that work you need to use flux pro, @bitter hearth quoted a snippet of the website where it says so that I missed
yes, that might be. But I'm pretty sure we will find a way to make it work for flux dev, too. As said, the issue seems to be that the added tokens outweigh the original prompt tokens. So its a matter of weighting, interpolating, maybe subsampling the added tokens
Sorry it wasn’t neon it was @pseudo owl
If you can figure out a way to hack it like you said that would be cool
i integrated flux redux into my system and left it runnin overnight, seemed to have generated about 50 images it's interesting it seems like it's being cropped
original left, remix right
you would think that the model would sort of reformat the layout and move the text down but it keeps cutting off the Finding
original left, remix right, it cut off the shoes, they're both portrait images but the aspect ratio on the remix is not as tall , i wonder what's going on in the latent space that the model can't find a way to fit the whole subject(s) there
start end is 0.1,0.9?
WF is embedded, but I think it was, yes.
That's why they released the outpaint model at same time 😄 😉
but i don't want a different aspect ratio, i don't want a larger image, i was hoping it would sort of 'reformat' the layout so that everything (including the shoes) would fit within the image
its almost like it has a fixed height for the concept in latent space and then it doesn't fit in the canvas and it just crops it out, very interesting don't you think @muted dove ?
its CLIP
is flux inpainting great ?
CLIP works on 384x384 pixel images
so it can only process square images
usually it will crop your images to square and then downscale it to 384x384
ah, that is an issue
that's why it cuts off the borders
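Here's a small Python sketch of the crop-to-square step described above, to show how much of a portrait image would never reach the vision model. It assumes a plain center crop before the downscale to 384x384; whether the ComfyUI preprocessing actually center-crops is an assumption here, and both function names are made up.

```python
from typing import Tuple


def center_square_box(width: int, height: int) -> Tuple[int, int, int, int]:
    """Return (left, upper, right, lower) of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)


def cropped_fraction(width: int, height: int) -> float:
    """Fraction of the image area that falls outside the centered square."""
    side = min(width, height)
    return 1.0 - (side * side) / (width * height)


# A 768x1280 portrait keeps only rows 256..1024, so heads and shoes get cut.
print(center_square_box(768, 1280), cropped_fraction(768, 1280))
```

If that is what happens, padding the input to a square before feeding it to the vision model would be one thing to try.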
It didn't here #🆕|sd3 message
ah i see so internally it's actually cropping before the vision model even gets it?
Does it do it every time, or just this one? It is supposed to create a different image every time.
yes
i did try re-running the same WF multiple times and it does create a different image every time, also it goes w/o saying but the quality of your output is determined by the model, reviewing the results it did run on flux mini and flux heavy and while it looked low quality it did manage to somewhat land the concept for both
original image input image for Redux
same seed, batch size of 2, using the STOIQ model
same seed as STOIQ, batch size of 2, using Fluximate v1
great !
hey @dusky thistle i see you removed that part of the instructions where you had to manually install a library now it's just calling requirements.txt. is that correct or did i miss that part in your README somewhere?
Also I put in a good word to a famous Youtuber called Olivio Sarikas, urged him to try your sampler and do side by side testing, maybe he'll even feature your stuff in one of his videos 👍 He did say he was going to try it so we'll see
I made a very quick hack and it seems to work
show us your results I'm eager to see how well it works, pls try the benchmark of converting something to a black and white image as that seems to be the least subjective prompt
black and white is the hardest xD
yeah bc its not like "oh i guess I can kind of see that effect in there" least subjective. you mind showing me the before and after with your hack?
no, its because the clipvision comes with color information and if you mix color with "black&white" you get just "unsaturated" images back
this is the original image (taken from pexels)
using the prompt "black & white photo. Monochrome photo. Black and white photography. gray, monochromic black and white. b&w. old photo in black and white."
and Reflux I get
using my hack I get
same with anime:
prompt is "anime, anime style, studio ghibli anime"
the normal reflux gives me this
my hack gives me this
Haven't tried reflux, but looking at the examples it's remarkable how well composition and even details like flower in hair of one person, other kind of flower right are all kept. Flux is sooo good at guiding/prompting tiny details
What's the difference between the flux and sd3.5 large architecture? Is there a potential for sd3.5 large to hit the same or even higher standard compared to flux after fine-tuning?
I think SD3 made the error of training on CLIP as primary text encoder. Yeah, they also support T5, but the model always relies on CLIP as main information source. I think SD3 will never have a prompt understanding close to Flux
However, the main problems of SD3 are anatomy. That's something that might be fixed in future finetunes, who knows
Yep should be a lot easier now 🙂
your hack def seems worth trying, so what did you do? is it just a wf change or did you make any code changes to the nodes?
its a small custom node that merges tokens that are close anyways. Its really a hack for now
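For anyone wondering what "merging tokens that are close" might mean in practice, here is a guess in plain Python. It is not the custom node's actual code: it assumes "close" means cosine similarity above a threshold and uses a simple greedy pass. `merge_close_tokens` and the 0.9 threshold are made up for illustration.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two vectors, 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def merge_close_tokens(tokens: List[Vector], threshold: float = 0.9) -> List[Vector]:
    """Greedily group tokens similar to a cluster's first member, then average each group."""
    clusters: List[List[Vector]] = []
    for tok in tokens:
        for cluster in clusters:
            if cosine(tok, cluster[0]) >= threshold:
                cluster.append(tok)
                break
        else:
            clusters.append([tok])
    return [[sum(col) / len(cluster) for col in zip(*cluster)] for cluster in clusters]
```

Fewer image tokens after merging means the text prompt makes up a larger share of the sequence, which is the effect being chased here.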
that's a great insight. initially, i was pretty amazed by 3.5 large's prompt understanding as well, possibly not comparable to flux but i think it's pretty close. it's just that the images come out much less attractive. not so sure how much fine-tuning would be required to take it a step further.
cause for sdxl fine-tuned weights, there were some improvements, but it wasnt a tremendous jump from the base model. so im really not sure how much we can improve on top of the current 3.5 weights.
that's the code
cool dude i'll try it for sure
btw saw your github... so we got at least two chemists in here! 🙂
(synthetic organic here, used gaussian a lot)
i like how i'm often reading a paper on some sampler algorithm and they'll suddenly jump from image generation to, say, calculating frontier orbital energies
#1237460438229450772 A realistic photo shows a crime scene of a elderly bodybuilding Japanese lifeguard found a missing lady laid on the bush over the beach.
@halcyon yarrow new video generation model, its crazy fast and uses low vram, pretty good quality and supports i2v and t2v. On fal it just uses 3 sec for a 5 sec video and on a 4090 it takes 15 sec for a 5 sec video (without any of the extra optimizations). Quality is surprisingly great too.
that's crazy wild 15 seconds to render a 5 second video? that's unheard of
it's the actual quality of the image vs the controlnet for me, it's not from the style of the input image as it does the same thing no matter what the input image is, just really terrible textures
i agree, ive generated over 100 images with Redux and it consistently makes the images appear grainy/blurry/out of focus and too soft, i guess i dont want to over generalize it and it could be my config for a lot of them but overall I think i need to adjust the way it works for me
Even faster now for a 4090 lol, and takes less than a min for a 4060.
wow well i already downloaded the models and installed the plugin, just waiting on a render to restart comfy
i'm gonna try kai bro's custom node first "Apply Style (Advanced)" and then ill try the new video thing
i will say this, given a source image + prompt I'll take Redux over trying to give the sampler a low noise latent representation of the original image and having it try to figure out how to redo it, I only wish I could use Redux with all my other models this clip vision tech is great
original left, redux right, using shuttle 3, b&w prompt, default style node
merge strength of 0.8 on the left
merge strength of 0.4 next one
0.55 the last one
@dry wave and my prompt was:
Rendered entirely in black and white, the image captures the interplay of stark contrasts, with deep shadows and bright highlights accentuating every detail. A sketch-like quality pervades the scene, blending fine lines and subtle cross-hatching into a harmonious texture. The monochrome tones, abbreviated as b&w, evoke a timeless simplicity, stripping the scene of distraction and leaving pure form and light in focus.
I added two additional sliders: downsampling and weighting
this upgrade is way better now I have actual control and artistic freedom as to how much of the image's style i want to apply to my new image
can you explain them pls? how does it affect the image?
is strength the same thing as weighting but a slider?
weighting is just multiplying the token latent with a value between 0-1, shrinking it towards zero
downsampling is similar to token merging, but instead of merging similar tokens together it merges neighbouring tokens
a combination of these things gives me whatever I want, just... I still don't know which works better and which combination works best X_x
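A minimal sketch of the weighting option described above, in plain Python with nested lists standing in for tensors. `scale_tokens` is a hypothetical helper name used only for illustration, not the node's actual code; it just multiplies every token vector by a value between 0 and 1.

```python
def scale_tokens(tokens, weight):
    """Multiply every token vector by `weight` (0..1), shrinking it towards zero."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return [[weight * x for x in tok] for tok in tokens]


# Two toy "tokens" with two dimensions each (real tokens are much wider).
tokens = [[1.0, -2.0], [0.5, 4.0]]
half = scale_tokens(tokens, 0.5)
print(half)
```

A weight of 1.0 leaves the tokens unchanged, and lower values pull every token closer to the zero vector, which should weaken the image's influence relative to the text prompt.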
you should publish that code, I just added your class to the nodes.py file for now, i didn't use any of your imports just the class and the helper function
give me a copy of that but you should publish it too
I'm trying LTX video @pseudo owl, first attempt OOM error with 65 frames, cranked it down to 17 frames @ 512px and still OOM, going to try 9 frames @ 512px
downsampling works really well!
it does not make the image blurry in contrast to merging
dang I can't even get past just loading the model with LTXV Model Loader node forget the frames, its just a tiny 9gb model file too, that sucks
yes... I think that's it. Downsampling works by far best of all I tried so far
"vintage comic"
this is a good example original image left, redux right. like it's a nice reimagination of the same image but the redux is blurry right?
marble statues
wow man that's cool that's exactly what BFL promised and you delivered
@dry wave so if i were to take the original image on the left, that purple princess pic, what settings would you recommend to get a reimagination while still keeping things crispy?
Currently I have the feeling that downsample factor 1:3 is the best setting overall
uuuuh... that looks like an old version
i just installed it from your stuff on github
last commit says 5 minutes ago
oh i see you're good
thats weird. Can you restart and update your UI?
i tried running it and it says 0.55 not in list
so i just manually fixed it
probably cached somewhere
i just used the workflow for the previous node
downscale 1:3 and everything else on 1.0
if the effect is too weak, you can try to shrink one of the other two options additionally
so can you explain real briefly what i can expect to see between downscales like 1:1, 1:3 and 1:9? am I essentially merging more of the visual tokens and therefore making the text prompt stronger the higher the ratio goes?
yes. By default you have 27 x 27 visual tokens
so 729 tokens in total. Which is ~3 times as much as your text prompt
purple princess with that b&w prompt @ 1:3,1,1
when using downsample 1:3 you have 9x9 tokens, so 81 in total
and with downsample 1:9 you have 3x3 = 9 tokens in total
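The token counts above can be checked with a small sketch. This is plain Python under the assumption that downsampling averages each square block of neighbouring tokens; `downsample_grid` is a hypothetical name for illustration, not the node's real code, and the toy tokens are only 2-dimensional.

```python
GRID = 27  # visual tokens per side, as stated in the chat


def downsample_grid(grid, factor):
    """Average each factor x factor block of neighbouring tokens into one token."""
    side = len(grid)
    if side % factor != 0:
        raise ValueError("factor must divide the grid side")
    dim = len(grid[0][0])
    out = []
    for r in range(0, side, factor):
        row = []
        for c in range(0, side, factor):
            block = [grid[r + i][c + j] for i in range(factor) for j in range(factor)]
            row.append([sum(tok[d] for tok in block) / len(block) for d in range(dim)])
        out.append(row)
    return out


# Toy grid: each token is a 2-dim vector holding its own (row, col) position.
grid = [[[float(r), float(c)] for c in range(GRID)] for r in range(GRID)]
for factor in (1, 3, 9):
    small = downsample_grid(grid, factor)
    print(f"1:{factor} -> {len(small)}x{len(small)} = {len(small) ** 2} tokens")
```

The loop reproduces the numbers from the conversation: 729 tokens at 1:1, 81 at 1:3 and 9 at 1:9.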
yeah, the only downside of downsampling is that you cannot use a "weaker" effect strength. If its too strong you have to use the other two options
got it so the ratio is how many visual tokens to reduce from the visual input based on the default spec of 27
yes
wow outstanding results now
the simplistic one on the right is the b&w prompt + 1:3 and the one on the left is the original prompt + 1:3
i am literally going to integrate this right now before I do anything else into my system so i can see how much better it does
check the announcements on the comfyUI discord
@dry wave I have a system that dynamically builds a ComfyUI WF based on the requirements of the image generation, this is not a ComfyUI workflow this is my own structured format so I can input a config object with the stuff it needs to make it and have it make the WF for me
i'm not on ComfyUI's discord, anything notable? this thing is way better than the base one
maybe notable - if you want to do video
im guessing the takeaway from the announcement is "update to the latest version of ComfyUI"
nope. sent you a DM
oh i read the announcement before you sent it as a screenshot and I missed the whole point of it also working natively
maybe ill try that too and see if i can get it to load
what's the announcement...?
the LTX video thing works using the built in nodes w/o needing to install custom nodes like mochi
they're giving LTX the VIP treatment like mochi got
ah, the ltx looks very interesting
i think they haven't done that for cogvideo bc cogvideo is so fragmented
you need the kijai chart
links to the example WF: https://github.com/Lightricks/ComfyUI-LTXVideo/?tab=readme-ov-file and i gotta tag @pseudo owl so he can check it out too
HOLY COW MY EYES ARE BLEEDING!!! 😮 took only 70 seconds to render for me, oddly appropriate animation too
Prompt executed in 69.21 seconds
no way that was on model load too, i just did a subsequent load and it took just 7 seconds Prompt executed in 7.74 seconds
Prompt executed in 33.28 seconds my mind is so blown right now
sort of doesn't work with cartoons and stuff though. just mostly realistic, photographic images
at least that's the discussion on the L3 discord
i had a few cartoony benchmark prompts I used I could rerun those again, im looking for my max frame count before i OOM
i'm at 177 frames at 76 seconds, this is nuts, already at double what mochi can do and a fraction of the time, mochi can do 86 frames in like 15 minutes lol this does 177 in 76 seconds, i can't even
201 frames in 155 seconds, it takes LTX the same amount of time to give me 201 frames at the same resolution and steps as Mochi did for 13 frames. that's a 15x speedup
the next question is does this work with the great ClownSharkSampler? @dusky thistle only one way to find out 🙂
I don't do video so I am not sure if video models work with clown stuff
would be cool if they did
if they don't, i don't expect that's a challenge that @dusky thistle will avoid
yeah def will want this shit working with video
Mochi and ClownsharkSampler work together, I have a good feeling LTX is going to work too on the same principle
A lantern festival at dusk by a peaceful lake, glowing lanterns drifting into the sky, their warm light reflecting on the water, as bursts of fireworks illuminate the scene in vivid colors.
Mochi left, LTX right. LTX didn't even do any fireworks or lanterns
did really fast water, though
depends on a few things
stiffness and stability of the ODE/SDE, and then the noise scaling
i guess I could do image to video and give it something of high quality to start off with so it can match mochi but then that feels like cheating, it makes longer videos and it's 15x faster and it does img 2 video i mean I'm sure it'll get better right?
try low ETA first and then scale it up i guess? Mochi was handling res_2m, res_3s and even the 5s one like a champ at 0.5 eta
awwww I think it's not compatible 😦
The expanded size of the tensor (216) must match the existing size (864) at non-singleton dimension 4. Target sizes: [1, 3, 208, 120, 216]. Tensor sizes: [3, 201, 480, 864] @dusky thistle i guess some adjustments are in order maybe?
i could probably hack a solution using ksampler adv eff. again see if that solves it
is this that new one?
that comfy just added today
yeah that's the brand new one, here's the relevant link if you wanna try it: https://comfyanonymous.github.io/ComfyUI_examples/ltxv/
its likely either all will work or none of them
like if you use stock comfy SDE it just doesn't work at all cos noise scaling wrong
but then with the noise scaling fixed the same sampler types work
that's why i posted all the links and everything else on the L3 discord - not posting that here
I agree to keep more experimental stuff on the more experimental discords yeah
they could post it in their comfy channel here but i sort of feel like that's their job
I kinda see that as a dead channel now
kinda. same for swarm - the devs have their own discords and aren't part of sai any more
I just tried the Flux outpainting default workflow
switched it from euler to DPM++ 2S, and doubled steps
results immediately better LOL
;) the euler curse, eh?
euler has been causing shenanigans for centuries yeah
even switching to DPM++ 2M helped
didn't even need the ancestral
Contribute to kaibioinfo/ComfyUI_AdvancedRefluxControl development by creating an account on GitHub.
I added a documentation now
thanks, this looks great
token merging is an interesting solution to the issue
I use token merging for speedups but it makes sense they would help here
you probably have a very good grasp of what's going on internally, i understand it at a basic level but I don't think I had the understanding to have built a node like that, doing something like that requires knowing what's even possible to achieve, you can't do it when you dont know what's possible, anyways big thx for that node I'm gonna be using the heck out of it. you want me to tag you with the comparisons?
oh, sure! I'm curious myself what's the best settings!
1:3 looked really good with the purple princess but i don't like the output with this other cyber girl i'm doing
original image left, redux right using 1:3
more 1:3 samples, not happy with the quality they dont feel sharp enough
hm, maybe the image contains too many details that are blurred away by merging
yeah the prompt is huge:
cybernetic female, holding pistol, great care is taken to depict the young woman to have anatomically correct arms and hands, intricate circuitry pupils,
tattoo, petite body,
modular cybernetics, an android young woman with medium blonde drill hair haircut in a malfunctioning teleporter merges people with objects, science fiction time travel, a scientist experimenting with time travel technology intricate details, 2d, detailed action background, The art style is sleek and polished, with clean, precise lines that contrast with the gritty world it portrays, it has a semi-realistic style, Each detail is sharp, from the smooth, reflective surfaces of cybernetic limbs to the crisp outlines. The overall look is refined, capturing a high-tech elegance amidst the dystopian backdrop, where every element—from intricate machinery to flowing organic forms—is meticulously rendered with a sense of precision and understated sophistication.
about 285 tokens and 1:3 reduces it from 729 to 81 tokens in total right?
1:9 looks even worse imo
oh, I haven't tried it with CFG yet
no it empty
im using a distilled model so cfg is set to 1.2
as said, clip vision is cropping your input image automatically. Often its better to crop it yourself to ensure that the right part of the image is retained
Let me try with cfg
this is 1:3 with merge strength of 0.8 and the uncropped image, i think you might be onto something with your theory that it's my cfg, there's 2 cfg fields, the one on the sampler and the one on the clip text encode node, that one should be set to 3.5+ and it was set to 1.2 too so i think that's probably where the source of my problems were coming from
its very confusing but there are two common token merging methods
tome and todo
if you used tome for the node you might get better results with the todo method
I use a node I found here for it https://github.com/ethansmith2000/comfy-todo
1:3, 1, 1 using cropped image and cfg of 1.2
maybe its your workflow?
100% it's my workflow, im pretty sure the text encoder cfg shouldn't be at 1.anything
1:3 + cropped image + cfg 3.5. thanks for helping me find this bug kai it's my cfg settings after all
so one thing you should always do when using cfg in a distilled model is to skip the first k and last k steps
hm, but even if I don't skip steps the image looks good
it was a bug in the code I was doing Math.min instead of Math.max
ah, okay
I rescale the cfg from whatever it is to a 1 to 1.8 range for the sampler and I leave the original cfg for the text encoder
when i talk about the text encoder cfg I mean the 'guidance' field in CLIPTextEncodeFlux
I know. Its good that they renamed it into "guidance"
its really confusing calling it cfg
its a distilled cfg, but it works fundamentally different from real cfg
yeah i hate the whole subject personally
i'm upset flux even had to go that route its made the whole thing confusing
to be honest, I would only use real cfg when you need negative prompts
also: real cfg is twice as slow. You don't want to use it every time
but it's not like I can choose to not have it
?
the dual clip text encoder uses the guidance field
so i have to put something in there
lol yeah exactly, so what i set for guidance is what the original image parameters had set for cfg, and what i set for cfg_scale (for the sampler) is the rescaled version of the original cfg value. makes sense? so if the original cfg was let's say 10 then cfg_scale becomes 1.8 and guidance becomes 10
if the original cfg was 3.5 then cfg_scale becomes 1 and guidance becomes 3.5
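A rough sketch of that rescaling, assuming a linear map with clamping. Only the two data points (3.5 gives 1, 10 gives 1.8) come from the chat; the linear shape, the source range of 3.5 to 10 and the names `rescale_cfg` and `translate` are assumptions made for illustration.

```python
def rescale_cfg(original_cfg, src=(3.5, 10.0), dst=(1.0, 1.8)):
    """Map the original cfg linearly from the src range onto the dst range."""
    lo, hi = src
    # Clamp into [lo, hi]: max() enforces the lower bound, min() the upper one.
    # Swapping the two is an easy mistake to make.
    clamped = max(lo, min(hi, original_cfg))
    t = (clamped - lo) / (hi - lo)
    return dst[0] + t * (dst[1] - dst[0])


def translate(original_cfg):
    """Keep the original value as 'guidance', rescale it for the sampler."""
    return {"guidance": original_cfg, "cfg_scale": rescale_cfg(original_cfg)}


for cfg in (3.5, 7.0, 10.0):
    print(cfg, translate(cfg))
```

Values outside the assumed source range are clamped, so an original cfg below 3.5 still yields a sampler cfg_scale of 1 and anything above 10 stays at 1.8.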
I think you can just set both values independently from each other
use real cfg whenever the model does not follow your prompt correctly
like I use it when the model makes super pretty characters although my prompt says they should look ugly xD
I don't want to praise myself, but the picture looks extremely good. Razor sharp !_!
yeah i agree, image does look nice and crispy
i do set both values independently but the source image gen params only have cfg so I have to translate that to something that'll work with my stuff so that's why i independently recalculate cfg for distilled flux models and for other ones like flux destill, mangled, fluxbooru i leave cfg as-is
I mean, they talk about image patches in an unet. So its unclear if their findings are also valid for text token merging. But yeah, subsampling instead of merging would be also possible. I don't think that it will give you more details, though.
oh, but when I think about it...
I said you can only downsample by factor 3
but using torch.nn.functional.interpolate you could use arbitrary downsampling factors
this would allow for more fine-grained control
sounds to me like potentially a new version of your style apply node 👼
hm, I don't want to spam too many versions, but if you want to play around and experiment with it I can later upload a version with arbitrary downsampling factors and different interpolation options
yeah i agree, I think sometimes simplicity is key, i'm personally happy with your initial recommendation of 1:3, 1, 1. don't find myself needing more fine grained controls so maybe its overkill anyway
she is pretty close to her
yes, I think this could work
nearest neighbour is blurry, though
actually this is quite nice
you can set any downsampling factor
and you have several interpolation methods
"area" is what was the default before (just averaging)
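For anyone who wants to experiment, here is a hedged sketch of how arbitrary downsampling with `torch.nn.functional.interpolate` might look. It is not the node's actual code: `resample_tokens` is a hypothetical helper, the 16-dim toy tokens are a stand-in for the real embedding width, and it assumes the visual tokens form a square grid in row-major order.

```python
import torch
import torch.nn.functional as F


def resample_tokens(tokens, out_side, mode="area"):
    """tokens: (batch, side*side, dim) -> (batch, out_side*out_side, dim)."""
    batch, count, dim = tokens.shape
    side = int(round(count ** 0.5))
    if side * side != count:
        raise ValueError("token count must be a perfect square")
    # interpolate expects (batch, channels, height, width)
    grid = tokens.reshape(batch, side, side, dim).permute(0, 3, 1, 2)
    # align_corners is only accepted by the linear-family modes
    kwargs = {"align_corners": False} if mode in ("bilinear", "bicubic") else {}
    small = F.interpolate(grid, size=(out_side, out_side), mode=mode, **kwargs)
    return small.permute(0, 2, 3, 1).reshape(batch, out_side * out_side, dim)


tokens = torch.randn(1, 27 * 27, 16)  # toy stand-in for 729 visual tokens
for side, mode in ((9, "area"), (5, "bicubic"), (12, "nearest")):
    print(side, mode, tuple(resample_tokens(tokens, side, mode).shape))
```

With `mode="area"` and a target side of 9 this matches the plain block averaging of the 1:3 setting, while other target sizes and modes give the finer-grained steps in between.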
I will make a push on a separate branch and update the main branch after further testing
