#🆕|sd3


dusky thistle
#

yeah, this.

#

and this.

#

it's not a matter of taste, samplers are either better or worse

#

more accurate or less accurate

craggy crest
# dusky thistle it's not a matter of taste, samplers are either better or worse

the terms better and worse are comparison terms that make something different to every single person. so are the terms accurate and less accurate. comparison terms. who decides what 0 for accurate is? maybe what I consider accurate and what you do are nothing close to the same - you won't be satisfied with less than 32 decimal places and i'm fine with just 4

pseudo owl
halcyon yarrow
pseudo owl
# halcyon yarrow you generated all those yourself?

These are generated from genmo, the official website. According to an employee, website uses 200steps, full precision, and an upscaler. Uses the open model.

I obviously can’t run the full precision one so I use genmo.

bitter hearth
halcyon yarrow
#

no wonder it looks so good, thx. i was wondering what they were doing and why my stuff looks like garbo and theirs looks so pristine

#

so it's not that they're generating at 2x the resolution, they're upscaling. i tried generating at 2x the resolution of 480x840 and my stuff crashed after 90 minutes, that made me sad lol

bitter hearth
#

and then having a subsection of the paper/presentation/docs that deals with the "human preferences" benchmarks
separate to the objective metrics

#

I actually think the preference studies are more valuable a lot of the time, they are just rarer as they are expensive to make

pseudo owl
craggy crest
#

i'm still of the opinion that all of their benchmarks are worthless. for LLM's if you want to know whether it's thinking as well as a human, give it the same tests you give humans - give it the mensa test. for AI image generators, all that matters is whether the person using it likes what it's creating or not

bitter hearth
#

for art yeah but for example for inpainting you might wanna have a technical metric

#

or for something like "how well does this canny control net match the input image"

halcyon yarrow
#

this is two of the 'best' ones

pseudo owl
halcyon yarrow
#

i think there is an option on the website to level up your prompt right? like a toggle switch maybe?

craggy crest
pseudo owl
#

Hmm not sure, let me check.

#

Nope, they have the option to use mochi or their legacy model(before mochi was released). And seed

bitter hearth
#

most of the ML tasks that come up on discord are ones with qualitative/subjective objectives so yeah

#

with statistical models or quant models its more common to have something be outright correct or incorrect

craggy crest
halcyon yarrow
#

okay the prompt was:

A close-up cinematic shot of the protagonist’s face, rain dripping from their determined expression under the glow of neon city lights. Their eyes, sharp and focused, reflect the faint blue hue of a glowing blade they grip tightly in the foreground. Raindrops streak across the lens in slow motion, with subtle steam rising from the wet ground below, framing the intense moment in vivid detail.
and I remember loading it and I got a negative prompt out of the file:
Flat composition, overexposed lighting, unrealistic or cartoonish details, harsh shadows, pixelation, grainy textures, unnatural movements, static framing, cluttered background, artificial colors, poor depth of field, low resolution, jittery camera motion, lack of cinematic contrast, dull expressions, poorly blended lighting effects, unbalanced composition, unconvincing water effects, stiff posture, and unrealistic reflections.

But I'm on the website and I'm seeing there is no field for negative prompt.... so i think it's fair to say it was indeed rewritten and had the negative prompt injected

bitter hearth
#

I don't like it when they rewrite prompt

dusky thistle
#

there's nothing to be gained from inaccuracy, the only debate is about how much you lose, really

halcyon yarrow
#

or wait maybe he ran it locally? bc i see its an mp4 file and its only 1 second, genmo.ai wouldn't have made that file

bitter hearth
#

can't take it for granted

dusky thistle
#

yeah, i presume that's why they do it, to verify that there's nothing screwball

cunning schooner
#

I have a SD3.5M Diffusers question, if anyone can help, greatly appreciated.

I can only seem to find FlowMatchEulerDiscreteScheduler and FlowMatchHeunDiscreteScheduler for SD3.5M but they both give a brighter result than the Euler from comfy results. Any idea why my comfy results are prettier (deeper, less bright overall and slightly better contrast)?
I'm using Euler + simple for both

pipe.scheduler = FlowMatchEulerDiscreteScheduler.from_config(
    pipe.scheduler.config,
    timestep_spacing="trailing",
    shift=3.0
)
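
For context on what `shift=3.0` changes in the snippet above: as far as I understand the flow-matching schedulers, the shift remaps each noise level sigma in (0, 1] to `shift * sigma / (1 + (shift - 1) * sigma)`, so more of the sampling steps are spent at high noise. The sketch below is plain Python, not Diffusers code. `linear_sigmas` and `shift_sigma` are hypothetical helper names, and the evenly spaced starting schedule is an assumption made only for illustration.

```python
def linear_sigmas(num_steps):
    """Evenly spaced noise levels from 1.0 down to 1/num_steps (illustrative only)."""
    return [1.0 - i / num_steps for i in range(num_steps)]


def shift_sigma(sigma, shift):
    """Time-shift one noise level: shift * s / (1 + (shift - 1) * s)."""
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)


if __name__ == "__main__":
    # Compare the unshifted and shifted schedule for a tiny 4-step run.
    for s in linear_sigmas(4):
        print(f"{s:.2f} -> {shift_sigma(s, 3.0):.3f}")
```

With `shift=1.0` the schedule is unchanged, so if two UIs use different shift values for the same sampler name, their noise schedules differ. That could be one source of a consistent look difference, though it is only one of several possible causes.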
bitter hearth
#

there are so many differences between comfy and diffusers

#

in terms of the underlying system

#

its not worth trying to get the exact same output

cunning schooner
#

No not the same per se, but it seems my diffusers output is always slightly too bright and I can't find the right settings to tweak.
I only see Euler and Heun for example and switching to trailing does barely anything

halcyon yarrow
#

the cool thing is the large model does run on my machine even if it is slower, i think that other guy was just lucky with his seed bc the reference video still looks better than this one

halcyon yarrow
#

yeah it nailed it really well

craggy crest
halcyon yarrow
craggy crest
#

negative prompts - are a bad idea

bitter hearth
#

got news you will really like

#

there is a paper which seems to have solved negative prompts

#

they found a way to automate making a negative prompt that is way more optimal than normal ones

#

and it can boost image quality on any image apparently

#

https://arxiv.org/abs/2411.05473

craggy crest
craggy crest
halcyon yarrow
#

im not a fan of negative prompts, i'd rather do without them entirely, but i do see the value for people who use them, sometimes there's no other way but to need a negative prompt

bitter hearth
#

its like turning a toy boat by pushing the back rather than pushing the front

craggy crest
bitter hearth
#

yeah the massive Civit negatives I know what you mean

spark quail
#

so ur tellin me my word pasta of bad fingers, mutated anatomy, etc are awful eh? hah yeah right!

spark quail
low sinew
#

is there sd3 for a1111?

craggy crest
gentle mango
#

do i install flux just like any other model for automatic1111?

dusky thistle
cunning schooner
#

Hi everyone! I'm using Stable Diffusion 3.5 with Diffusers, but I'm a bit confused about the scheduler and sampler options. I only seem to find FlowMatchEulerDiscreteScheduler and FlowMatchHeunDiscreteScheduler. SwarmUI and ComfyUI have more choices like SGM, DDM, DPM++, etc. When I try using schedulers from SDXL, I get messed up images.
What are my actual options for schedulers and samplers with SD 3.5 and Diffusers? Thanks!

cunning schooner
#

Why are images with the same settings, always brighter with Diffusers compared to Comfy? (same seed etc, both EulerDiscrete with simple sampler)

dry wave
#

one really significant difference I noticed between diffusers and comfy is the T5 prompt length

#

For Flux-dev the T5 prompt should be 512 tokens in length

#

but Comfyui seem to use 256 instead

#

it's either an error in comfyui, or maybe a feature because they might have noticed it works better? I don't know

halcyon yarrow
#

SD35L fp8 using controlnet now

bitter hearth
dry wave
#

I don't know... I find the Flux default behaviour also strange. Having a, say, 80 token prompt but using 512 tokens without proper attention masking

bitter hearth
#

I just wish everything was explicit and documented

#

the attention masking issue is an interesting one yeah, cos in LLM world people do tend to use attention masking properly when fine tuning T5

dry wave
#

I guess in Flux they don't do it cause its a "waste of compute". Instead, Flux is using the empty tokens as registers

#

however, I wonder if it does affect quality and introduces artifacts in the generation

#

in particular if you vary between long or short prompts
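
To make the masking point concrete, here is a toy sketch of how padding positions take a share of attention when no mask is applied. It is plain Python and not how Flux or T5 is implemented. The equal attention scores are an assumption chosen to keep the arithmetic obvious, and `attention_weights` and `real_token_share` are hypothetical helper names.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(scores, keep_mask=None):
    """Softmax attention weights; positions where keep_mask is False get weight 0."""
    if keep_mask is not None:
        scores = [s if keep else float("-inf") for s, keep in zip(scores, keep_mask)]
    return softmax(scores)


def real_token_share(num_real, num_total, use_mask):
    """Fraction of attention that lands on the real (non-padding) tokens."""
    scores = [0.0] * num_total  # assumption: every position scores the same
    keep = [i < num_real for i in range(num_total)] if use_mask else None
    weights = attention_weights(scores, keep)
    return sum(weights[:num_real])
```

With an 80 token prompt padded to 512 and equal scores, the real tokens receive 80/512 (about 15.6%) of the attention without a mask and all of it with a mask. In a trained model the scores are not equal, so the actual split depends on what the model learned to do with the padding positions.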

bitter hearth
#

I think its possible yeah
been trying to read about that but its hard to get a definitive answer
Sana used Gemma 2 2B which is a fantastic model for its size, might be a good trend

dry wave
#

dunno

#

decoder-only architectures are not good for image generation

bitter hearth
#

T5 doesn't do so well for other languages, whereas the Chinese models that tend to use GLM, or Sana with Gemma, do okay with translating prompts

fossil pagoda
bitter hearth
#

when it comes to architecture my opinion is Omnigen is the way to go

#

or similar like Meta Chameleon

dry wave
#

the problem with Gemma and other decoder-only structures is that its task is to predict the next tokens, so it models the embedding in a way to achieve this task

#

so if you give it a prompt "A cat with black fur", then it will try to autocomplete the sentence, adding certain token probability distributions to the embeddings like ("playing", "sleeping", "eating", whatever a black cat is doing)

halcyon yarrow
dry wave
#

in Sana they try to get around this problem by adding a prefix to the image prompt like "The following is an image caption. Summarize and describe the image of this caption"

bitter hearth
#

decoder only models still often underperform yeah
Kaggle competitions are often still won by simple ensembles of flan-t5-xxl and DeBERTa-V3-large
not stuff like GPT 4

fossil pagoda
halcyon yarrow
#

i was thinking blur was like redux too

#

i noticed the default canny settings for the workflow of 0.1/0.8 are too vague, I had better luck with 0.1/0.2 instead

bitter hearth
#

cos with masked word prediction its not specifically trying to predict the word at the end

dry wave
#

yes

bitter hearth
#

its the same with text classifiers really, even now

dry wave
#

I'm also not sure if we really need better text models. T5 is quite powerful already

bitter hearth
#

I wish one of the big companies trained a bigger and better version of DeBERTa-V3-large

dry wave
#

the issue is rather that the latent space of T5 is really huge and there is not enough training data for models like Flux to really learn from this latent space

bitter hearth
#

for the task it is currently doing in Flux, T5 is easily good enough yeah

#

I think they could easily get 2-3 more years out of simply scaling DiTs

#

cos we are only 6 months in to that trend really

#

in the long run as hardware improves something like omnigen makes more sense in terms of scaling

#

the end game for transformers, just one giant transformer that does everything, across modalities

halcyon yarrow
#

that node also outputs tokenizer, I'm thinking loading the model was half the battle that solves that piece

#

I was looking at this part of the code from the qwen2vl+flux project I think I would need to turn this piece into a node:

    def process_image(self, image):
        message = [
            {
                "role": "user",
                "content": [
                    {"type": "image", "image": image},
                    {"type": "text", "text": "Describe this image."},
                ]
            }
        ]
        text = self.qwen2vl_processor.apply_chat_template(message, tokenize=False, add_generation_prompt=True)

        with torch.no_grad():
            inputs = self.qwen2vl_processor(text=[text], images=[image], padding=True, return_tensors="pt").to(self.device)
            output_hidden_state, image_token_mask, image_grid_thw = self.qwen2vl(**inputs)
            image_hidden_state = output_hidden_state[image_token_mask].view(1, -1, output_hidden_state.size(-1))

        return image_hidden_state, image_grid_thw

so the node would input image from load image, and Qwen2 from that ModelLoader and then perform this operation to return the image_hidden_state
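
A rough sketch of what such a node could look like, following the usual ComfyUI custom node layout (`INPUT_TYPES`, `RETURN_TYPES`, `FUNCTION`, `CATEGORY`, `NODE_CLASS_MAPPINGS`). Everything specific here is an assumption: the class name, the `QWEN2VL`, `QWEN2VL_HIDDEN` and `QWEN2VL_GRID` type names, and the idea that the loader node hands over an object exposing `process_image()` like the snippet above.

```python
class Qwen2VLImageEncode:
    """Hypothetical ComfyUI node: image + loaded Qwen2-VL wrapper -> hidden states."""

    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "image": ("IMAGE",),
                # Assumed custom type name; it has to match whatever the
                # model loader node declares in its RETURN_TYPES.
                "qwen2vl": ("QWEN2VL",),
            }
        }

    # Custom output types, since these tensors are not ComfyUI CONDITIONING yet.
    RETURN_TYPES = ("QWEN2VL_HIDDEN", "QWEN2VL_GRID")
    RETURN_NAMES = ("image_hidden_state", "image_grid_thw")
    FUNCTION = "encode"
    CATEGORY = "conditioning/qwen2vl"

    def encode(self, image, qwen2vl):
        # Assumes the wrapper exposes process_image() as in the snippet above.
        # Converting ComfyUI's IMAGE tensor into the format the Qwen2-VL
        # processor expects is left out of this sketch.
        image_hidden_state, image_grid_thw = qwen2vl.process_image(image)
        return (image_hidden_state, image_grid_thw)


# ComfyUI picks up nodes from these module-level mappings.
NODE_CLASS_MAPPINGS = {"Qwen2VLImageEncode": Qwen2VLImageEncode}
NODE_DISPLAY_NAME_MAPPINGS = {"Qwen2VLImageEncode": "Qwen2-VL Image Encode"}
```

The outputs are deliberately not typed as `CONDITIONING`: some further node would still have to map the hidden states into whatever the diffusion model expects before they can be merged with the CLIP text encoding.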

bitter hearth
#

oh yeah I've been using qwen in comfy already there's quite a few node packs that do that

#

its useful, having an llm in comfy in node form

halcyon yarrow
#

yeah it does produce a noticeable boost in quality but i think the whole shtick of qwen2vl is that rather than doing this:
input prompt > LLM > output text > Clip text encode
it can do this:
input image > VLM > output conditioning > merge with clip text encode

#

considering qwen2 is multimodal, using it only as an LLM and not taking advantage of its VLM capabilities is like underusing it

bitter hearth
#

you can't do this without training a new model to sit in between qwen and the DiT

pseudo owl
halcyon yarrow
bitter hearth
halcyon yarrow
bitter hearth
#

wow these examples in their technical report are by far the best style transfer I have seen

#

with the exception of training checkpoints and loras

pseudo owl
bitter hearth
#

I didn't realise cog was this good

#

I dismissed it initially cos it couldn't make R2D2 move forward, but that's a niche thing, I underestimated it

craggy crest
bitter hearth
#

lol yeah

pseudo owl
bitter hearth
#

I prefer cog there

pseudo owl
bitter hearth
#

yeah it may well be that they have caught up to sora now
when sora leaked this week, a few of the people who used it said that it was worse than kling/minimax

#

and so mochi is close to kling/minimax

#

so maybe close to sora

#

I feel the sora museum flythrough was particularly good but maybe heavy cherry picking was done there

pseudo owl
bitter hearth
pseudo owl
#

ok let me try something similar with mochi, one thing impressive with sora is the 1min videos though, even closed source models don't reach that long. cogvideox does have a way to extend videos better than just extending with img2vid though, but quality is a bit lower

remote holly
#

I don't know how it's possible to reach 1 min of video

remote holly
# pseudo owl original pic left, above right is ConsisId(uses cogvideox), below right is sota ...

I just found a new model that does the same https://youtu.be/PhlgC-bI5SQ?si=dKDq8ubIbgJ2smoy

Bring your identity to life with just your words. If you can describe it, you can now visualize it. Introducing Identity-Preserving Text-to-Video Generation, powered by ConsisID. Discover more at https://github.com/PKU-YuanGroup/ConsisID.

craggy crest
turbid grotto
#

svdquants is a magic

#

a bit lower quality but more than 2x speedup

#

schnell from 21s down to 8s and vram consumption under 8gb

#

on rtx 3060

bitter hearth
#

wow nice

#

its great yeah

#

I am not sure on Ada/Hopper if it is faster or slower than FP8-fast with torch.compile, or TensorRT
am currently testing that sort of thing
but for RTX 3060 SVDQuant is certainly the best choice

turbid grotto
#

super cool, finally some speeds

#

would be even cooler to have dev + 8 step lora in svdq

#

and sd3.5

bitter hearth
#

I didn't know it would work on 3060 ```We currently support only NVIDIA GPUs with architectures sm_86 (Ampere: RTX 3090, A6000), sm_89 (Ada: RTX 4090), and sm_80 (A100). See this issue for more details.```

#

from here https://github.com/mit-han-lab/nunchaku

#

its good to see that it will work on more GPUs

turbid grotto
bitter hearth
#

ah nice

#

funnily the H100 doesn't

remote holly
# turbid grotto svdquants is a magic

Yeah, I was impressed when I saw the article, I saw that it could work with models like cogvideoX or mochi, I'm looking forward to cogvideoX support personally

#

but the roadmap didn't seem to have changed for weeks

#

I hope to see progress soon because it would be a shame not to go all the way, it looks really promising, I haven't seen anyone talking about it yet

turbid grotto
#

But still, this needs attention, maybe worth making a post in the stablediffusion reddit...

remote holly
#

yeah

#

imagine the svd quant with ltxv lol

turbid grotto
remote holly
#

Shaders on steroid

rapid pivot
#

Compiling shaders please wait

low stone
#

sd3 large turbo -> ltxvideo

errant dust
#

For any interested, this was posted in Ideogram Discord today:

mno — Today at 6:27 PM
Hi , Here is our Black Friday Sale :
50% off Ideogram Basic: just $4 for the first month or $42 for the first year.
50% off priority credit top-ups: only $2 for 100 priority credits that roll over beyond billing cycles, as long as your subscription remains active.
Use code BlackFriday at checkout. Taxes may apply. Offer ends Dec 5, 11:59 PM PT.

low stone
#

more sd3 large turbo

remote holly
remote holly
#

Amazing

#

The image needs to be blurred

halcyon yarrow
turbid grotto
# errant dust What is SVDquants? Link?

original repo, however, I failed to install that due to lack of knowledge:
https://github.com/mit-han-lab/nunchaku

But this fork was successful:
https://youtu.be/P46Cjw0MSzo?si=RpY7tySTzfo0AK_t
https://github.com/newgenai79/nunchaku

GitHub

SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models - mit-han-lab/nunchaku

Nunchaku / SVDQuant
https://github.com/mit-han-lab/nunchaku

Installation guide
https://drive.google.com/file/d/1qtr00-PusMrbdNz5mBs7bCh_THg5VufG/view?usp=sharing


twilit matrix
#

Help me design a logo. It must include "Aether Link", be 320*320 pixels, and be smaller than 3M

errant dust
turbid grotto
errant dust
#

That would have a lot to do with the ability to work with less RAM as well as render speeds

turbid grotto
#

their quantization method works with the encoder too, so it doesn't take much memory either

errant dust
#

The thing is, I'm not really very open to big quality sacrifices. For me the entire point of a large model like SD3.5L or Flux is the large model quality. Once you start butchering that in the name of speed you also begin to drop to the quality of smaller models. If I wanted that, I'd just use a smaller model to begin with, and not some dumbed down large model.

turbid grotto
errant dust
#

I have a laptop 4060

#

I also know what I can get from all the models. I don't mass produce images. And when I make one, I want that quality I see. I can wait a bit to get the best

turbid grotto
#

that is fine
I like experimenting

errant dust
#

oh so do I, and I love trying the new tools, controlnets and more. Not to mention models and LoRAs

#

and I love SD3.5L too. Make no mistake. We may be talking Flux here, but they are all super cool

turbid grotto
#

Same, I have been playing with 3.5m a lot

#

However I don't see any success with finetuning yet, which makes me think Stability has to step in

remote holly
#

I think it's very helpful for video models

#

Because current quants cause a big quality loss

#

I read the paper and they said you dont need to train a lora again for the quants

#

The lora works with svdquant

lucid swift
civic trail
pseudo owl
civic trail
halcyon yarrow
#

@errant dust do you use comfyui?

errant dust
#

I do

#

Sorry, was looking into BF deals on the horizon and buff the laptop

#

which already has a buyer

#

(my old one that is)

#

for better rendering and video editing

#

There are some exceptional deals right now. Will see if the one I have highlighted can be beat

#

HP Newest Victus 16.1 Ryzen 7 RTX 4070 Premium Gaming Laptop, 16.1" FHD 144Hz, AMD Ryzen 7 8845HS (Beats i7-1355U), NVIDIA GeForce RTX 4070, 64GB DDR5 RAM, 2TB SSD, HDMI, Wi-Fi 6, Windows 11 Pro for $1266

#

which is right at the limit of my budget but fine

craggy crest
#

hint: use "small person" in your prompt

halcyon yarrow
# errant dust I do

I think if you’re in the hunt for a good deal on a laptop then do what I did. If you live in the states use best buy to look through the returned inventory. I got lucky I bought a laptop rated quality: fair when really I’d consider it excellent

For $1400 it’s i9 4070 FHD. It came with 1tb and 16gb but I’ve since upgraded it to 6tb and 32gb, msrp on it was like $2200

errant dust
#

Actually, I found a more attractive deal already. And this one is it. It has a 20-core CPU, 4070, 32GB Ram (this last is a minimum requirement for my work), but also three magic words: "Cherry Mechanical Keyboard"

#

Only 1TB SSD, but I can add 1 I have spare anyhow. That much is easy

#

As I live off my writing more than anything, the KB is just magic to my ears.

halcyon yarrow
#

Yeah sometimes it’s small things like that which turn into dealbreakers, for me I require a numeric pad won’t get a laptop without one

errant dust
#

Believe it or not, I actually place my wireless mechanical KB on top of my laptop (my current one) for my writing. Heh.

#

It's a sight

halcyon yarrow
#

Checking your laptop it’s newer cpu, same GPU, more ram, but I’d never get it unless there’s an option to upgrade the display, QHD is way too sucky

errant dust
#

I will disagree. I had a 4k laptop and hated it. The pixel size is way too small natively, so you either increase the scaling a ton, or reduce resolution, in which case: why bother?

halcyon yarrow
#

Sorry I’m mistaken, QHD is fine, FHD is not fine lol

errant dust
#

ahhhh

#

This is QHD

halcyon yarrow
#

And yes I agree, my previous laptop was 4K and I did not enjoy it, I did enjoy the touch screen tho I do miss that

#

Yeah my current laptop is QHD, mental derp

errant dust
#

I also do a ton of video editing, so the RAM is not optional. The minute you have enough moving parts, clips, sound, and so on, insufficient RAM can be crippling

#

(Just YT stuff, but you'd be amazed at how much RAM the editors demand once you start micromanaging it enough)

halcyon yarrow
#

Same goes for AI, 32gb helps a lot for making images quickly

errant dust
#

I well know. 🙂 The 4070 is dual purpose. It also serves to run some of the large NNs I use in chess

#

Mechanical KB in laptop. Ahhhhhh...

#

lol

halcyon yarrow
#

The difference between the 4070 and the next tier up being 16gb of vram is really expensive too in a laptop. Like it costs at least $1000 more for any laptop with 16gb vram. You can’t get one for under $2200 that was way out of my budget so I settled for 4070

errant dust
#

Yeah, the 4080 has 12, and costs a nifty 500 more, and let's not even mention the 4090s with 16

halcyon yarrow
#

Considering it’s Black Friday maybe you could get a 16gb one for real cheap? You should def check best buy they might even have that cherry keyboard one as a return

errant dust
#

As much as I love the idea, it is simply outside my scope

#

well, I will absolutely be open to whatever comes my way on BF

#

not pulling trigger yet. Just lining up the prime options

#

I live in Brazil, but have family coming from US in ten days and staying here for a month, so....

raw path
#

I farted

limpid thunderBOT
#

Thank you for using comcom analytics.
"comcom analytics" supports all community managers (moderators and server owners) by stats, visualization, and analytics.

If you have any questions, feel free to ask us!
Your dashboard
Help
Support server

Other languages
en: help
ja: help Japanese

errant dust
bitter hearth
#

that seems nice yeah

bitter hearth
#

I feel like with GGUF smaller hardware is now viable

#

I went from renting L40s to renting 3060s

errant dust
#

You gave me the idea though. I found it in the Dell Outlet store which has some insane BF discount coupons (BLACKFRIDAY17).

errant dust
#

The fact the Outlet also does not charge tax made a big diff

errant dust
#

Rio de Janeiro

craggy crest
errant dust
#

Dell Outlet is a part of the Dell website and store

#

Just go to:

spiral geyser
errant dust
#

Select the options you want, put the item in the Checkout and apply the coupon BLACKFRIDAY17. the price will be severely reduced AND have no tax appended

mortal mesa
#

Brazil has a huge economy, if a north america company with international reach isnt in Brazil it should be considered a failure on the international front

errant dust
#

Dell exists in Brazil just fine. The problem is that all electronics are severely marked up

#

Here in Brazil the machine I paid 1078 dollars for would come out to some 3500-4000

errant dust
#

If you can find it at all. Top end tech is a rarity. But it makes sense. The markup is not even taking into account relative incomes

#

It isn't a big issue for me though. Aside from just bringing things back myself, I have family regularly staying here. Such as next week, heh.

#

Fam coming in from US that is

bitter hearth
#

its good to get a GPU I think

#

cloud is too tiring

#

huggingface and civit download speeds are not always the same speed on one day as on the next day 🤔

#

but getting private storage like google drive or backblaze is tricky cos it takes money that could have gone to GPUs

craggy crest
bitter hearth
#

is texas becoming a big tech place now like california is?

#

I don't follow US stuff too closely but I heard that

craggy crest
bitter hearth
#

ah okay

crisp pine
#

help

opal sedge
#

help

lean totem
#

Help me generate a colored icon that represents the generation of asset inventory tables and the printing of asset labels

short thicket
short thicket
short thicket
#

That's Mangled Merge Flux V1 + a lora I am in the middle of training for macro photography/etymology

mortal kite
#

anyone use flux outpaint much? I'm not sure I am understanding how to use it correctly

short thicket
mortal kite
#

flux doesn't seem to like outpainting only in a single direction or something 😄

short thicket
short thicket
#

Entomology sorry

#

working on it now. In the meantime...

#

This one is with my lora at 100%

"a macro photograph, a chubby lady that looks like a lady bug and she looks stylish and rich"

#

This is the base model alone with the lora at 0%. Working on 45%.

#

I didn't see much of a change at 45% but here is 70% lol. I'm gonna try 80.

#

Looking for that sweet spot. Here's 93%. LOL

#

getting close at 93.9%

#

yeah, It's interesting to see the transition though.

short thicket
civic trail
short thicket
short thicket
grim crest
# craggy crest has been for years

austin texas is. the rest of texas has been the electronics components and semiconductor center for 60 years. see texas instruments and mouser.

grim crest
craggy crest
grim crest
#

there is a lot more going on in Austin than IBM/DELL.

#

also oracle and apple

#

HPE is in dallas, austin, houston

neon hawk
#

Create an image for a clothing store called branded_items that should show silhouettes of clothing frozen in ice cubes.

mortal mesa
#

IBM is in NY though

#

always were

pseudo owl
#

New shuttle 3.1 is great, has more detail than 3.0 for sure. This is 4 steps, 2048x1024 with just Euler discrete

mortal mesa
#

look closer, its kinda not good

#

the lines, ive had them pop out for unknown reasons also

weary crystal
#

1448x1448 was the recommendation for maximal generating size

#

Not sure, outpainting would be an option too

mortal mesa
#

look at the sky when you open in browser

#

shuttle 3/3.1

pseudo owl
#

This is natively generated, I didn't upscale it.

mortal mesa
#

4096x4096 expect problems

pseudo owl
#

It can generate as high as 2048x2048 but you get weird artifacts, shuttle3/3.1 is better at native generation at higher-res.

mortal mesa
#

my shuttle starting point is 1536x1024

pseudo owl
#

Yeah also shuttle needs really low steps so its very fast since its a finetune of schnell, 4 is usually a good spot.

mortal mesa
#

i think i found a comfyui bug maybe, didnt look if known, but like with the shuttle models if i gen on one and switch to the other it crashes, i think hitting the unload buttons prevents that maybe

weary crystal
#

Well i tried the shuttle model 7 steps 11 steps 60 steps and 100 steps. I love how all steps above ten work as refiner

mortal mesa
#

this is a workflow i modified for shuttle 3, i love it, doesn't really give the same results with 3.1, almost seems like it needs more steps now, its a handful of 2 steps that ends in a 8MP image

pseudo owl
#

I mean not too bad for 1step with Flux.1 schnell, euler discrete, 2048x1024

pseudo owl
craggy crest
bitter hearth
# mortal mesa

nice, I use the same TTP_Toolset workflow I just arrange it differently LOL

#

it works the same, I just replaced the tiling method with ComfyUI_SimpleTiles, added ColorPallete from ComfyUI-APQNodes, some post-processing from ComfyUI_LayerStyle and the new depth map thing

#

I removed the texture detailer section which was probably a mistake

supple socket
#

hey.. Do you guys have advice for me like how can i extract patterns from images. I mean which techniques should i use for it. For example this image i liked the tshirt pattern

bitter hearth
#

you are lucky because the best tool for this came out recently https://github.com/kaibioinfo/ComfyUI_AdvancedRefluxControl if you scroll down to "Masked Conditioning Images" you can see he used it to extract the floral pattern from the clothes
it's essentially exactly what you are asking for

bitter hearth
#

no problem 🙂

supple socket
bitter hearth
#

yeah

supple socket
bitter hearth
#

same

#

so slow

supple socket
#

i couldn't even manage to run

#

i got weights error for sigclip

bitter hearth
#

what GPU do you have

supple socket
bitter hearth
#

ah yeah 3090 isn't great it can't take full advantage of FP8 matmul

supple socket
#

bruuh

#

i wonder is there any sd alternatives

bitter hearth
#

would try downloading clip vision again

supple socket
#

but idk the keyword for research

bitter hearth
#

just search for diffusion

supple socket
bitter hearth
#

would recommend not using manager

supple socket
bitter hearth
#

and just using git and wget

supple socket
bitter hearth
#

also try several different sites

#

sadly it is very common for people to upload broken models

#

or to mess with the model in some way e.g. pruning

#

or heavily quantising without telling you

#

its kinda hard to get pure versions of models sometimes

supple socket
#

i downloaded all models from huggingface but still same error

#

bruh

bitter hearth
#

not sure, would ask on comfy discord or banodoco

craggy crest
rain surge
# craggy crest

I like the style, looks a bit like book cover you could have on the old days glorp

bitter hearth
#

really nice artstyle yeah

rain surge
#

Really want to try SD3, but too lazy to set up another docker image, and it seems not compatible with the current ComfyUI setup i have (despite being latest) glorp

bitter hearth
#

its rough setting up docker instances yeah

#

TBH most workflows are not model agnostic, its the reason why I have only used 3 models ever (SD 1.5 -> SDXL -> Flux)

supple socket
#

is there any custom nodes to join images like top bottom left right

rapid moon
#

i was told that 3.5 large will fit on a 4090, but i see it can't load the model due to insufficient memory, why is that? is there something special to do?

rapid moon
#

beware of this "supporter", it's a thief

gusty trail
rapid moon
#

anyone have python example how to run sd3.5large on 4090?
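
A minimal Diffusers sketch for this, under some assumptions: recent `diffusers`, `torch` and `accelerate` are installed, access to the gated `stabilityai/stable-diffusion-3.5-large` repo has been granted, and the step count and guidance scale are just commonly used starting values. The likely reason for the out-of-memory error is that the transformer has about 8 billion parameters, so its bf16 weights alone are around 16 GB, and the T5-XXL text encoder adds several GB more. `enable_model_cpu_offload()` keeps only the component currently in use on the GPU. `estimate_weights_gb` and `generate` are hypothetical helper names.

```python
def estimate_weights_gb(params_billion, bytes_per_param):
    """Rough weight footprint in GB (1 GB = 1e9 bytes); ignores activations."""
    return params_billion * bytes_per_param


def generate(prompt, out_path="sd35_large.png"):
    """Generate one image with SD3.5 Large using model CPU offload."""
    # Imported here so the helper above works without torch/diffusers installed.
    import torch
    from diffusers import StableDiffusion3Pipeline

    pipe = StableDiffusion3Pipeline.from_pretrained(
        "stabilityai/stable-diffusion-3.5-large",
        torch_dtype=torch.bfloat16,
    )
    # Moves each sub-model to the GPU only while it runs, instead of pipe.to("cuda").
    pipe.enable_model_cpu_offload()
    image = pipe(prompt, num_inference_steps=28, guidance_scale=3.5).images[0]
    image.save(out_path)
    return out_path


if __name__ == "__main__":
    print(estimate_weights_gb(8.0, 2), "GB for the transformer weights in bf16")
```

Calling `generate("a lighthouse at dusk")` would download the weights on first use. If offloading is still too slow or too tight, a quantized transformer is the other common route, which this sketch does not cover.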

supple socket
supple socket
craggy crest
supple socket
craggy crest
turbid grotto
#

How is sd3.5 finetuning right now? Is it beginning to get figured out?

supple socket
#

can i use 2 comfy on same gpu

#

and if i can how will it handle

gusty trail
#

Why would you use two comfy? You could just use another tab

rapid moon
#

do u have a code example?

#

i only see comfyUI

#

i dont use windows, i need it in python

#

i use sd commercially with a license, that's why i need it in code, where can i find that backend?

#

oh its in python also... didnt know that.. thanx! will try that

supple socket
#

yea cuz queue using 100% of gpu

rapid moon
#

ill just call the functions directly

#

i also run rest, anyway, ill take a look

#

ill probably call it from another python

#

blaaahhhhh RuntimeError: operator torchvision::nms does not exist

#

ill work it out... thanx!

#

no thats version stuff.. ill reinstall packages...

rapid moon
#

@halcyon yarrow everything is perfect, you helped me alot! Thanx!

pseudo owl
real terrace
#

I'm looking for a flux model that won't OOM my system and work decent in quality/speed, for 12 GB VRAM

bitter hearth
#

this would be fine flux.1-lite-8B-alpha-Q4_K_S.gguf https://huggingface.co/city96/flux.1-lite-8B-alpha-gguf/tree/main

#

offload text encoders before loading DiT

real terrace
#

I'll test those ones, thanks

bitter hearth
#

the memory management in Comfy has never worked for me on any GPU

#

not sure if I have always had a setting wrong

#

but its pretty rough

#

depends on the situation
TCD works well with eta and only 8 steps

#

for the most part eta is for 40 steps plus

#

I also think flux gets harmed by it

#

I think people who are not making photos like the results but for image quality I think it is a negative

#

oh there's no point in taking into account recommendations

#

yeah I remember

bitter hearth
#

I don't follow video but that's the best sound sync I have seen

#

might be okay yeah

#

wouldn't do it personally but it could work

#

0 eta before 40 steps

#

is the main thing

dusky thistle
#

depends a bit on the model

#

and the step type

bitter hearth
#

if you are able to schedule it then you also want the eta to decay

#

but that might be too complex for this project

dusky thistle
#

flux can handle low steps with a bit of noise better

#

2m handles noise less well at low steps than 2s

#

yeah it's good to keep it real low for anything with only 20 steps, if not zero

#

30 is kinda where SDE starts to clearly become better than ODE

bitter hearth
#

there is a bit of nuance especially for flux
you can do a little bit at low steps

dusky thistle
#

40+, unleash the noise for sure

bitter hearth
#

I don't know why but TCD lora with TCD sampler
can do high eta at like 8 steps and it goes fine
something very different is happening there

#

99% of my images have been with TCD, until flux era

#

its the acceleration lora with the highest complexity score

#

which is why I like it

#

its kinda not relevant now we have Shuttle Diffusion doing like 2560x2560 in 3 steps

dusky thistle
#

yea that's def not enough steps for 2m

#

you can kinda get away-ish with res_2s at 10-15

bitter hearth
#

are you willing to use turbo lora

#

it is compatible with other loras

#

bear in mind 15 steps of res_2s takes the time of 30 steps of res_2m
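That cost comparison comes down to model calls per step. A rough sketch, assuming res_2s makes two model evaluations per step and res_2m makes one (the per-step counts here are an assumption chosen to match the 2x ratio quoted above, not something read out of the sampler code):

```python
# Assumed model evaluations per step; values follow the 2x ratio quoted above.
EVALS_PER_STEP = {"res_2m": 1, "res_2s": 2}


def model_evals(sampler: str, steps: int) -> int:
    """Total model forward passes for a run, ignoring any fixed overhead."""
    return EVALS_PER_STEP[sampler] * steps


print(model_evals("res_2s", 15))  # 30
print(model_evals("res_2m", 30))  # 30
```

So when comparing samplers it is probably fairer to compare at equal model calls rather than equal step counts.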

craggy crest
bitter hearth
#

lol

craggy crest
#

else (cross.eyes.=="true");

bitter hearth
#

I think its an improvement

#

eta = 0.20 at 30-40 steps might be a bit rough

#

its good to experiment anyway

dusky thistle
#

with the res samplers you can generally get away with that once you get to 30 steps

#

espec with flux

#

tbh the most important thing might be how many steps you run with a low noise level, as opposed to the total number of steps

#

insofar as mutations and crazy shit are concerned

#

if you cut the eta down after a certain number of steps it can be helpful

#

so like... if you're running 30 steps, steps 0-10 might have eta 0.5,

#

11-20 eta 0.2

#

21+ 0

#

this would start at eta = 0.5 and follow that plot

#

and then it should fill in the rest with 0s
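The staged eta idea above can be sketched as a per-step list. This is only an illustration: the function names are made up, and how a list like this gets fed to a sampler depends on the node pack, which isn't shown here.

```python
def piecewise_eta(total_steps: int) -> list[float]:
    """Per-step eta: 0.5 for steps 0-10, 0.2 for steps 11-20, 0 afterwards."""
    etas = []
    for step in range(total_steps):
        if step <= 10:
            etas.append(0.5)
        elif step <= 20:
            etas.append(0.2)
        else:
            etas.append(0.0)
    return etas


def pad_with_zeros(etas: list[float], total_steps: int) -> list[float]:
    """Fill a shorter eta list out to total_steps with zeros (no added noise)."""
    return etas[:total_steps] + [0.0] * max(0, total_steps - len(etas))


schedule = piecewise_eta(30)
print(schedule[0], schedule[11], schedule[21])  # 0.5 0.2 0.0
```

`pad_with_zeros` covers the "fill in the rest with 0s" case, where a hand-drawn schedule is shorter than the run.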

#

np

#

yeah thats in the newest version

#

its in the manager now too fyi

#

though honestly i think it's good to handle it with git cuz you can git reset --hard so easily to hop around versions

#

i don't do funny stuff with requirements.txt so you wont break dependencies doing that with my repo

bitter hearth
#

it's kinda easier to use git anyway if you have docker

#

cos it just goes in the provisioning script

#

if you are updating or installing anything on a live server after provisioning script ends that's a mistake

#

main cloud advice I have is don't pay for storage

#

it works out better to download each time

bitter hearth
#

civit download is so slow that it would be kinda hard to hit 1TB

mortal mesa
#

i think there are also other types of bandwidth rules, my speed drops for chunks of time, then it's back to zippy for a bit

#

its either i get full saturation or 3mbit

#

ya so have i, but i have had extreme slow downs also. i dont know why, them, me, isp, who knows

rain surge
turbid grotto
dusky thistle
#

sd35m, euler vs gauss-legendre SDE

rain surge
craggy crest
unkempt warren
#

#Midjourney

dusky thistle
#

Wf should be embedded but yeah it's from the new version

#

In numerical analysis and scientific computing, the Gauss–Legendre methods are a family of numerical methods for ordinary differential equations. Gauss–Legendre methods are implicit Runge–Kutta methods. More specifically, they are collocation methods based on the points of Gauss–Legendre quadrature. The Gauss–Legendre method based on s points ha...

limpid thunderBOT
#

Thank you for using comcom analytics.
"comcom analytics" supports all community managers (moderators and server owners) by stats, visualization, and analytics.

If you have any questions, feel free to ask us!
Your dashboard
Help
Support server

Other languages
en: help
ja: help Japanese

muted dove
#

I used Flux Redux and used a ComfyUI workflow as one of the 2 input images 😄

bitter hearth
#

haha the noodles

#

that's so cool

#

legendary version of gaussian noise sounds cooler than what Gauss–Legendre actually is LOL

pseudo owl
bitter hearth
#

are there samples

pseudo owl
#

The size though, lol. 13b params and 8b text encoder

bitter hearth
#

ah nice

pseudo owl
bitter hearth
#

its Hunyuan so it has potential

#

I don't follow anime stuff, is Illustrious the new big thing?

#

there's this NoobAI thing too

#

I think its Illustrious-based but not sure

#

it has V-pred and ZTSNR which is awesome

#

oh awesome there's a paper thanks

#

I'd like to try Pony and Illustrious for photorealistic sci-fi and fantasy
I'd have to hide my workflows for those so I don't get criticised for using Pony LOL

#

yeah 2MP is good for SDXL

#

does remix mean img to img

#

stock SD3.5 does mostly look better than stock flux yeah

pseudo owl
#

Did you check the ghost example and the text painting example, mochi has issues with large amounts of motion where it has distortion.

A nice thing is that according to their paper, with a shifting scheduler and high shift, you can get good quality with even 10 steps.

bitter hearth
#

ah okay yeah

#

I've been working on exactly that
this is for flux but will do for SD 3.5 as well

#

its not done yet it needs better depth map engine and more control nets

#

depth maps get way better than this but IDK how yet

#

the downstairs did okay

#

but the depth map broke for the back wall

#

and the side walls are going funny cos my depth map is not smooth

#

the gradient has lines and it thinks the lines are a physical feature

#

canny is way stronger yeah

#

the reason I prefer depth is that I want it to creatively add stuff

#

but only a little bit

#

yeah that helps as well

bitter hearth
#

seen people refining flux with SD35L

#

interesting

#

when I am doing long generation workflows I always end with SD 1.5 unet, sent to SD3M VAE using latent interposer

#

"i made this complex system where loras are categorized by an LLM and then an LLM picks the best categories for a given prompt"

this is a great idea

#

the first part I'm saying I upscale with SD 1.5

#

do you know this thing https://github.com/city96/SD-Latent-Interposer

#

it lets you pass latents around

#

the crucial thing is to only use 16 channel VAEs

#

with this you can send SD 1.5, SDXL and Cascade images to SD3M VAE

#

also lets you get the image from Flux into the older models without using their VAE encode

bitter hearth
#

its an improvement

rapid moon
#

@halcyon yarrow just wanted to say thanx again! it is now in our production, and that's just WOW! it works about 2x faster on a 4090 than on an A100!!

#

i work at evmux.com , part of our service includes ai images for streams

#

and yeah, its on the backend now

#

i connect to comfy websocket for the progress and result and call the comfy api on /prompt by rest

#

basically like their UI is doing, but on the server side

#

yeah, works great

#

currently im just waiting for the 'executed' event, in the future will expand

#

im not saving, im using the preview and sending it as the result, i dont want images to be saved on the server

#

that shouldnt happen... we are running on many data centers

#

and losing it is not a big deal if its super rare, it will just retry

#

exactly

#
    # needs at module level: import asyncio, json, traceback
    async def handle_message(self, msg):
        data = json.loads(msg)
        # 'executed' messages carry the outputs of a finished node
        if data.get('type') != 'executed':
            return
        data = data['data']
        fut = self.waiting_requests.get(data['prompt_id'])
        if fut is None:
            return  # not one of our pending requests
        try:
            output = data['output']
            # audio workflows report 'audio', image workflows report 'images'
            kind = 'audio' if 'audio' in output else 'images'
            fut.set_result(output[kind][0]['filename'])
            asyncio.create_task(self.clear_request(data['prompt_id']))
        except Exception:
            print(traceback.format_exc())
            # signal failure, unless the future was already resolved
            if not fut.done():
                fut.set_result(False)
#

and also added the stable audio, maybe we will add it too

#

yeah, and then i see it in /api/view?filename={filename}&type=temp
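Building that view URL in code is safer with proper query encoding. A small sketch: the `/api/view` path and the `filename` and `type` parameters come from the message above, while the host, port, and example filename are placeholders.

```python
from urllib.parse import urlencode


def view_url(base_url: str, filename: str, image_type: str = "temp") -> str:
    """Build the preview URL for a file reported by the 'executed' event."""
    query = urlencode({"filename": filename, "type": image_type})
    return f"{base_url.rstrip('/')}/api/view?{query}"


# Host, port and filename below are placeholders for illustration.
print(view_url("http://127.0.0.1:8188", "preview_00001.png"))
```

Using `urlencode` means filenames with spaces or other special characters don't break the request.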

#

i just call it as the ui does, with randomizing seed

#

yeah, copied it from the request in the browser and using it as template changing seed and prompt

#
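    # needs at module level: from asyncio import Future (presumably)
    async def draw(self, prompt, neg_prompt, width, height):
        # start from the workflow template, then fill in this job's values
        # (node ids '3', '6', '7' are specific to this workflow)
        request = create_draw_request(self.client_id)
        request['prompt']['3']['inputs']['width'] = width
        request['prompt']['3']['inputs']['height'] = height
        request['prompt']['6']['inputs']['text'] = prompt
        request['prompt']['7']['inputs']['text'] = neg_prompt
        res = await self.do_post('/api/prompt', request)
        # handle_message resolves this future when 'executed' arrives
        fut = Future()
        self.waiting_requests[res['prompt_id']] = fut
        return res['prompt_id'], fut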
#

create_draw_request is the template

#

i currently have workflow to keep 2 models in the vram, the sd3l and the stable audio, pretty cool

#

we need speed so we just use 1 flow on each 4090, loading the model takes time

#

We just need to generate pics so actually its the same flow

craggy crest
#

@halcyon yarrow you did good :)

#

yes

pseudo owl
#

and these are with fp8 quantized and lower-res versions if I'm not mistaken.

#

This is what it produced with the will smith eating spaghetti prompt, again with the fp8 quantized and lower-res versions. Doesn't know who will smith is but the spaghetti eating is near-perfect. Even with quantization, it produces better quality than full quality Mochi-1 from genmo with an upscaler.

Hope some more optimized code comes soon.

pseudo owl
#

16gb lol but I didn’t try with low vram, maybe that could work

rapid pivot
pseudo owl
rapid pivot
rapid pivot
pseudo owl
rapid pivot
#

To generate longer videos do you need more vram or only take longer?

summer bloom
#

Using what method

pseudo owl
rapid pivot
#

I was wondering if the AI, as it generates more frames, has to keep what was in the past in its memory or something

bitter hearth
#

a lot of methods to extend AI video are just starting a second run through the model

#

in those situations it has no "memory" of what came before, just a keyframe it was given

sacred jewel
bitter hearth
#

I think its too early for video really

craggy crest
craggy crest
sullen moss
#

I no longer have time to test everything, let alone conduct full-scale, in-depth experiments.

#

There are now more video generators than image generators.

muted dove
#

I'm having too many problems trying to install the required packages to get this to work (mainly triton), so I gave up 😦

#

Not sure if that's because it's limited in the Windows portable version...?

sage burrow
#

does anyone know if sd3.5 large knows japanese?

sage burrow
pseudo owl
# bitter hearth I think its too early for video really

Honestly, I think it’s kind of the exact opposite. We have so many options, and great models.

CogVideoX is like sd1.5 of the video models, with so much support and control(Lora’s, dimensionx, control net, fun models, multiple aspect ratios, rf-inversion, rf-edit, image2vid, trajectory, consisid)

Ltxv is crazy fast, and now we know some hacks to make it great quality as well. It takes like 10sec to generate a 5sec video.

Also, mochi is already sd3.5 large prompt following level and generates excellent videos. Mostly for realistic and cartoonish style.

Hunyuan vid seems to be trained on many styles for a video model, it knows lots of art styles/anime styles/cartoon styles for a video model surprisingly, and is excellent quality.

#

Hunyuan video is surprisingly flexible too, it was trained on 5sec videos and a specific res, but it can do lower res, and even 10sec videos

This is a pretty dramatic gen 😆

remote holly
#

I can't wait to see ltx v paper to discover what they do for the optimization

pseudo owl
pseudo owl
remote holly
#

Ho , that's smart

#

i tried stg in comfyui with ltx , the quality increase is amazing

#

but i can't do img2vid

pseudo owl
#

Why?

remote holly
#

i dont know how to modify nodes for i2v

#

but you can probably do i2v without issues

muted dove
#

I probably followed the same instructions. I got as far as trying to install the Triton wheel, but I keep getting the same error: ERROR: triton-3.1.0-cp39-cp39-win_amd64.whl is not a supported wheel on this platform.
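One common cause of that error is a Python version mismatch: the `cp39` in the wheel name means it was built for CPython 3.9. A quick sketch to compare the wheel's tag with the interpreter actually running pip (the helper names are made up, and this only checks the Python tag, not the platform tag):

```python
import sys


def wheel_python_tag(wheel_filename: str) -> str:
    """Return the Python tag (e.g. 'cp39') from a wheel filename."""
    stem = wheel_filename[:-len(".whl")] if wheel_filename.endswith(".whl") else wheel_filename
    # Wheel names end with {python tag}-{abi tag}-{platform tag}
    return stem.split("-")[-3]


def interpreter_tag() -> str:
    """Tag for the running interpreter, assuming it is CPython."""
    return f"cp{sys.version_info.major}{sys.version_info.minor}"


wheel = "triton-3.1.0-cp39-cp39-win_amd64.whl"
print("wheel built for:", wheel_python_tag(wheel))
print("running on:     ", interpreter_tag())
```

If the two differ, a wheel built for the interpreter's own version is needed. With the Windows portable build, this should be run with the bundled Python rather than the system one.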

bitter hearth
#

doing the setup for stuff is always super hard

#

I almost never manage to get everything setup to reach maximum flux speed on a server
I've only managed it like twice

muted dove
#

I have CUDA and VS installed, the path is set for cl.exe, but still no luck.

bitter hearth
#

12.6 is quite a spicy CUDA version, I mostly see 12.1 for diffusion stuff

#

it might be okay though

#

sage is big boost sometimes

#

mostly for video

#

much better yeah

#

fp32 files need double vram
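The doubling is just bytes per parameter. A back-of-the-envelope sketch, counting weights only (activations, text encoders and the VAE come on top), with an 8 billion parameter model picked purely as an example size:

```python
# Storage size per parameter for common weight formats.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "fp8": 1.0, "int4": 0.5}


def weight_gib(num_params: float, dtype: str) -> float:
    """Approximate size of the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


for dtype in BYTES_PER_PARAM:
    # 8e9 parameters is an example size, not a specific model
    print(f"{dtype}: {weight_gib(8e9, dtype):.1f} GiB")
```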

#

filled vram

muted dove
#

Trying, but it's slow going... Building wheel for flash_attn (setup.py) ... -

#

Does it do i2v as well then?

#

What?!?!?!?!

bitter hearth
#

good small details versus good composition

#

is the video good

remote holly
#

😔

bitter hearth
#

Flux used to be slow for me but with Nunchaku and 2-step Shuttle, Flux is fast now
but with video I'd go back to being slow again

pseudo owl
pseudo owl
#

Nice, you made it work with low vram? I have to try it with that. It looks decent with that old retro look.

muted dove
#

3.5hrs later...I GIVE UP!!

error: command 'C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.6\\bin\\nvcc' failed with exit code 4294967295
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for flash_attn
  Running setup.py clean for flash_attn
Failed to build flash_attn
ERROR: ERROR: Failed to build installable wheels for some pyproject.toml based projects (flash_attn)
pseudo owl
#

I’m on my phone right now, and my pc is at my home so can’t run it right now locally.

#

Yeah it’s probably trained on a large amount of anime videos, it seems to beat any other video model in its diversity of styles.

Mochi is decent at anime but has very little styles, CogVideoX isn’t too great with anime.

Ltxv with stg guidance btw is pretty good too and really fast.

#

Prompt?

#

It has a different prompting style than most other models, it actually likes concise 1-2 sentences about the vid.

dull star
#

are you using fp8 or fp16 T5?

remote holly
#

How did you add the stg?

#

It's like skip layer guidance for video models

dull star
#

the difference is massive

#

cfg vs stg

remote holly
dull star
#

30 iirc

#

I love ltxv cause its super fast

#

not real time fast but still very very fast

remote holly
#

yeah , i love also

#

20 steps

dull star
#

I love hunyuan but I don't want to generate MSDOS level resolutions just to run it

remote holly
#

yeah

remote holly
dull star
#

24GB

#

but im still impatient

#

so even if the quality is better I can't wait

remote holly
#

you can resolve with "ltx add latent guide" node

brittle nexus
brittle nexus
#

If you use the node "modify LTX model", even in another workflow, it will break the I2V unless you clean the loaded model property or restart Comfy. You may use the ltx add latent guide node to make it work again, but you will be using the modified model even if your current workflow doesn't use the modify model node.

remote holly
#

why i get this error : The expanded size of the tensor (17) must match the existing size (18) at non-singleton dimension 3. Target sizes: [1, 128, 1, 17, 23]. Tensor sizes: [128, 1, 18, 23] ?

brittle nexus
pseudo owl
brittle nexus
pseudo owl
brittle nexus
brittle nexus
pseudo owl
brittle nexus
#

The workflow is too big since i'm doing 4 videos at times and combining them but you just need this node

pseudo owl
brittle nexus
brittle nexus
#

You can't do side by side with i2v anyway. It changes the model for both

#

Once you use the "modify model node", you need to flush the model or restart comfyui

brittle nexus
remote holly
brittle nexus
remote holly
young blade
brittle nexus
remote holly
#

thanks , i corrected this

remote holly
#

ltg seems to get more motion on video

turbid grotto
#

anatomy is better but yet not enough, also it works with higher cfg now

fleet meteor
# dull star

Any idea if stg uses more vram? I wanna try it

young blade
#

making lora's is kinda addicting

craggy crest
#

@bitter hearth you are being watched

slim oxide
#

girl with white skirt

buoyant mesa
#

hallo, i am new to Stable diffusion 3.5

#

is it possible to train a Lora for it (i know you can for 1.5 and XL)

short thicket
#

BTW. I have a little over 10k images in my dataset now. I have a few last things to wrap up this weekend with it and then I'm gonna fine tune next week.

short thicket
#

I have heard of it, but haven't messed around with it. It has a color palette option right? I think that's pretty cool.

bitter hearth
#

its cool cos it didn't require training

#

they found colors the model already knows then they convert your color palette to the nearest ones

#

lol nice

ancient radish
#

anyone using sd forge? they have rly good upscale when generating image but i cant upscale any of my older pics good with img2img/extras. whats the current method on forge?

short thicket
#

Nice! Yeah, I'm not really trying to train on millions of images or anything. I like the smaller yet more selective type for now. I will build up on it over time. Maybe go for another fine tune when I have 30k images.

ancient radish
short thicket
#

No that's pretty cool. Is there a link to the original? I'd like to look into it more.

bitter hearth
#

demoncore is the best I have tried

#

ok thanks will try it

ancient radish
#

¯_(ツ)_/¯

young blade
#

anyone recall a website or hugging space that allows you to upload a short video animation and then an image, and it will try to animate your image in the same way the video is?

craggy crest
pseudo owl
#

Bruh hunyuan text to video is completely uncensored. It can probably compete with ponyxl in that department.

#

What the hell did Tencent train it with lol

low stone
remote holly
#

Can i run t5xxl fp16 with 12gb vram ?

#

I have 16gb of vram

mortal mesa
#

i do with an 11gb card in comfy

brittle nexus
#

I have 12gb

remote holly
#

I dont see any difference between fp8 and fp16 with ltxv

turbid grotto
#

is it possible to run hunyuan with 12gb vram?

#

no I am not asking due to certain message before

pseudo owl
turbid grotto
brittle nexus
#

LTX

craggy crest
#

sd3.5 large to create the image, kling to animate it

#

really liked how the final effect came out

brittle nexus
buoyant mesa
craggy crest
#

to put it to the test

mortal mesa
#

smoke on the water

#

sounds like you need to be in a relationship with an AI model

craggy crest
mortal mesa
#

Noooo!

turbid grotto
#

Richard, how did you manage to run hunyuan video? I can't even load text encoder with 12gb 😦

craggy crest
turbid grotto
turbid grotto
turbid grotto
brittle nexus
craggy crest
brittle nexus
craggy crest
turbid grotto
#

I'll wait for the wrapper to mature, for now

bitter hearth
#

I've had loads of trouble on 32GB DRAM servers
I'm saying DRAM not VRAM here
would recommend 64GB DRAM

#

if you can, then 128GB, but 64GB is mostly ok

mortal mesa
#

they killed Richard

craggy crest
craggy crest
#

from comfy's newest announcement: ComfyUI now natively supports masking and scheduling of LoRA and model weights. This works for all LoRAs and models - as long as the LoRAs and models are not of conflicting architectures. The full blog post is here: https://blog.comfy.org/masking-and-scheduling-lora-and-model-weights/

Comfy Org Blog

As of Monday, December 2nd, ComfyUI now supports masking and scheduling LoRA and model weights natively as part of its conditioning system. It is compatible with all models. The update also has an extensive ModelPatcher rework and introduction of wrappers and callbacks to make custom node implementations require less hacks,

patent acorn
craggy crest
mortal mesa
#

it was but you needed nodes, this is native support

patent acorn
#

yeah custom nodes make things possible

turbid grotto
#

I can run HunyuanVideo now but all I get is black video, anyone had similar issue?

bitter hearth
remote holly
#

i just tested sd3.5M is realy great

remote holly
#

what is the best upscale model ?

remote holly
bitter hearth
#

there is a newer one that slightly beats SUPIR maybe can't remember the name

#

but SUPIR has been slightly broken into its components by Kijai, for ComfyUI so I think that is still a better choice

#

if you wanted the image to not change as much then openmodeldb is the place to go

remote holly
#

thanks very much for the information ! @bitter hearth

bitter hearth
sullen moss
#

Has anyone else faced this problem? If so, how did you resolve it?

turbid grotto
turbid grotto
turbid grotto
#

I am able to do 41 frames at 512x320 with 35 steps in 290 seconds on an rtx3060

#

wish svdquants worked with this model

#

would be even faster

bitter hearth
#

it may well work

turbid grotto
#

it is actually great even at 512x320 fp8, we need video upscale now

bitter hearth
#

nvidia put svdquant on their tensorRT roadmap too so there could be another speedup there

turbid grotto
#

hope not only for 50 series

bitter hearth
#

I would say that it should be possible to make a TensorRT engine for that on older GPUs
but it's Nvidia so you never know

turbid grotto
#

TRT works super fine with sdxl on 3060 at least, but nvidia is nvidia

#

Yea, I can tell, hunyuan did not filter training data 😂

toxic bone
#

could just be that for the optimization to work, it needs hardware support. Like they made the raytracing exclusive to RTX cards and people were like "Ughghhn monopoly and shenanigans!" but then they pushed a version that worked on GTX and without the hardware support it just wasn't good.

Sometimes instruction sets on the chip are crucial to make a routine performant. The optimization might not be effective if it's done on the fly in software.

Sort of like how fp8 isn't slow on ada cards, but it is on older models. That's because the hopper transformer engine with its hardware fp8 support facilitates it.

remote holly
bitter hearth
#

awesome

remote holly
#

doesn't work

#

missing nodes

bitter hearth
#

git clone?

remote holly
#

yess

bitter hearth
#

hopefully will be sorted out soon if so

remote holly
#

i copied the node files to custom nodes but it doesn't work

#

you need to install with anaconda

#

it doesn't make sense, you have to install nunchaku in a conda environment but how is comfyui supposed to find it?

turbid grotto
mortal mesa
turbid grotto
pseudo owl
dull star
#

Switti

pseudo owl
# dull star Switti

Switti is so fast and pretty decent, I would say sdxl quality, slightly better?

toxic bone
#

really good impressionist paintings

turbid grotto
opal quarry
turbid grotto
# opal quarry

hmm great hands but doesn't look like Flux, what model is that?

cursive frigate
bitter hearth
#

diffusion ones are larger

cursive frigate
#

@bitter hearth are there any new great upscale models that are better than the ones in my screenshot. mainly for photo quality and sometimes anime stuff

bitter hearth
#

the Nomos2 ATD

#

the input needs to be very high quality

#

so sometimes the other ATD models are better

cursive frigate
#

I'll give it a try. Just got everything running on Ubuntu. Got a new 4tb nvme, pulled out the windows nvme, installed ubuntu, put the windows nvme back in so no bootloader or grub files ended up on windows disk and I can now use either one. Got docker running for open-webui, installed ollama, got ComfyUI running. And I have to say everything runs so much better on linux.

#

@bitter hearth Anyways, have a good night. I'm off to bed, gotta go to work in about 5 hours.

bitter hearth
#

night

charred wraith
#

yo

sullen moss
pseudo owl
mortal mesa
#

sora is probably a monster that isnt worth the power bills to run, its the openai way

#

quantity not quality

sullen moss
#

We’ll see how it goes. For me, DALL-E 3 is still an unmatched generator. Almost a year has passed, and no model has come close to its understanding of complex scenarios and concepts. The only and biggest downside is its strict censorship. I’m hoping for the growth of decentralized AI—it's the only way regular users can compete with big corporations.

#

😂

remote holly
#

😭

#

I hate these restriction policies

pseudo owl
#

dalle3 is completely outdated, and has very large biases too that are even worse than flux

sullen moss
#

Everyone is entitled to their own opinion.

pseudo owl
sullen moss
#

Have you generated a lot in DALL-E?

pseudo owl
sullen moss
toxic bone
#

flux prompting is on par with Dall-e imo. And then you can make it comprehend what you mean with a lora. i can't speak towards sd3.5 as much. there's also omost which uses an LLM to form a regional prompt structure and generate complex images.

local solutions are not only as good, but they give more tools to leverage with. The level of control you have is far superior.

sullen moss
#

I’ve seen what DALL-E 3 is capable of if you dive deeper into its dark subconscious.

toxic bone
#

lol it has no subconscious

dull star
#

its probably a form of expression

pseudo owl
dull star
#

yeah it absolutely does use an LLM to change your prompt

toxic bone
#

expressions that anthropomorphise models show a fundamental misunderstanding of the tools.

dull star
#

I prefer flux (pixelwave especially) over DALLE3 any day of the week

#

DALLE3 is good for like... powerpoint presentations I guess

#

it must have some style or substance in it if you prompt for it, but I just don't like the images it produces

#

and besides its a service, I like offline

toxic bone
#

even the simplest aux tools like inpainting are unavailable

dull star
#

bruh

toxic bone
#

hell you can't even adjust the number of steps

pseudo owl
#

Hunyuan T2V which is open source(not even full quality, the fp8 lower quality version and less frames which makes it lower quality)

toxic bone
#

can it really be called a new sora with only 3 second attention span?

pseudo owl
toxic bone
#

i'm sure it can do more, but each 3 seconds will be a separate attention context

#

point is, sora seemingly has no attention seams. it's in another class and it's hard to compare models with that short attention span to it

sullen moss
pseudo owl
toxic bone
#

that limitation is put there for payment purposes. it's artificial. Sora can do many minute long videos, even if they're not offered on the plan you have.

#

i get it though. sora is the brand name. everyone wants to be sora. so you call it "the new sora" or "the sora killer"

pseudo owl
turbid grotto
toxic bone
#

so provide an example

#

if there's not a little, that implies theres a lot. so it should be easy

turbid grotto
toxic bone
#

figures

pseudo owl
bitter hearth
#

Flux Dev beats Dalle by 0.01 in the GenEval benchmark
that's within margin of error

#

this is from the Playground v3 paper

turbid grotto
bitter hearth
#

found a much better table
from the Sana paper
this time Dalle 3 and Flux Dev are exactly even on GenEval, and Dev is 0.5 points higher in DPG

mortal mesa
#

to me its not an apples to apples comparison, Dalle should be thought of as a complex workflow, not a single model

sullen moss
#

These are all synthetic tests. I never pay attention to them. Any tool needs to be deeply tested to understand its hidden capabilities and limits.

bitter hearth
#

you can make your own benchmark dataset if you want
it's quite tricky though as you would want it to be well balanced in terms of the distribution of entities, attributes, spatial relations and non-spatial relations across the prompt dataset
and then you would want to run at least 1k prompts on it

toxic bone
#

was funny to me when there was a benchmark model people were all about called "chad" or something. It was trained on tons of gigachad images. so people were like "this prompt works better because my classifier model rates it higher" but it was just cause it looked more gigachadish... which is far from realistic

pseudo owl
craggy crest
cursive frigate
#

Anyone doing anything interesting in ComfyUI these days?

wind basalt
#

Hello all, thought I would check in, wondering if the newest SD is able to render hands/fingers in front of faces without messing up the face behind the fingers?

noble phoenix
#

/fingers

still lark
#

mr

compact spruce
#

Hello guys. I run a dev agency and I'm developing a flux workflow to create headshot images, which I will deploy as a web app so that we can monetize it. After I talked about that idea, a lot of guys have been texting me and asking me to build web apps for them, which I think is a great way to earn money from your workflow.
I think a lot of people here have cool workflows and want to turn them into a web app, launch it to thousands of users and make money, but don't know how to build it. If you are thinking about it and need any help, let me know, I will support nicely :).

#

for example we are deploying this workflow onto our server

buoyant mesa
#

great thing wish you best luck!
But to use flux you need a licence but i guess you know that

compact spruce
#

flux.dev is non-commercial license so we do not use that

sullen moss
#

🤔

buoyant mesa
#

can you use loras for SD 3.5 in comfyUI yet?
I can't get it to work
can somebody help pleassseeee

errant dust
#

Is there a GGUF version of the Flux tools?

#

Fill/Redux

ancient radish
#

https://www.runninghub.cn/post/1837756005856821249

very clean character sheet but i can't get it to work, if anyone is interested in trying it and gets it working maybe they can give a hand xd

toxic bone
#

you actually don't need a commercial license to use the outputs of flux dev. you only need the license to offer the model on a service.

#

i know you fall into the latter category, i'm just clarifying

bitter hearth
#

ooh it takes SEGs

#

the SD 1.5 SEG control net was really nice

#

acts a bit like tile for layout but it lets the model be a bit more free

#

I think a model like this is what will be the thing that will dethrone Flux next year

#

a big transformer with lots of control stuff

cursive frigate
#

@bitter hearth I still can't figure out custom sigmas 😉, seems to be a bit above my skill level. I also can't really find anything online that refers to them or how to use them.

dry wave
#

nice, should try it. Is there a tutorial anywhere for comfyui?

bitter hearth
#

keep an eye out for FP4/Int4 stuff
cos that's getting more common now (it's half the VRAM and up to twice the speed of FP8)
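
Back-of-the-envelope version of the "half the VRAM" part, counting weights only. The 12B parameter count is just an assumed example size, and this ignores activations, text encoders, the VAE and any quantization scale overhead, so real usage will be higher.

```python
def weight_gb(num_params, bits_per_weight):
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


NUM_PARAMS = 12e9  # assumed model size, for illustration only

for name, bits in [("FP16", 16), ("FP8", 8), ("FP4/Int4", 4)]:
    print(f"{name}: ~{weight_gb(NUM_PARAMS, bits):.0f} GB")
```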

remote holly
#

I can't run mmaudio, i have torch issues

bitter hearth
#

I had torch issues with SD 1.5 today lol

#

it's so hard juggling dependencies and docker containers

remote holly
#

Cannot import ComfyUI-MMAudio module for custom nodes: No module named 'torch.nn.attention'

#

what is the module torch.nn.attention and which version has this module?
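
A quick way to check this on your own install, as a minimal sketch: `has_module` is a made-up helper name, and which exact PyTorch release first ships `torch.nn.attention` is not stated here, so compare the printed version against the node's requirements.

```python
import importlib
import importlib.util


def has_module(name):
    """Return True if `name` can be located in the current environment."""
    try:
        return importlib.util.find_spec(name) is not None
    except ImportError:
        # Parent package is missing or failed to import (e.g. no torch at all).
        return False


if has_module("torch"):
    torch = importlib.import_module("torch")
    print("torch version:", torch.__version__)
    print("torch.nn.attention available:", has_module("torch.nn.attention"))
else:
    print("torch is not installed in this environment")
```

If the second line prints False, the installed torch is most likely older than what the custom node expects, and upgrading torch inside the ComfyUI environment is the usual fix.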

dry wave
#

thx

toxic bone
#

nodes requiring specific versions of dependencies. that's pretty sussy considering how vulnerable packaging infrastructure is.

wasn't it a video node that people were excited about that was the first big security breach on comfyui?

#

the latest was a specific PyPI package that was used on a hundred different nodes

remote holly
#

yeah it's annoying, i'm lucky because i don't use a lot of custom nodes

remote holly
#

I finally generated the audio

brittle nexus
#

Analyzing the two images, is there a clear winner?

toxic bone
#

Also, Kijai, whoever they are, is just an online account. Anyone could get access to it. It could always be compromised

toxic bone
#

great thing about weights is that anyone can take the code and reform it to use the weights the way they want to

bitter hearth
#

really don't think it's worth CPU offloading, or running models on CPU

#

would be better to pick smaller models that fit in your VRAM

mortal mesa
#

anything can be compromised, probably best to stay away from computers in general

#

the app you're typing on was compromised

bitter hearth
#

there's been seven cases of malware this year in the ecosystem, have been keeping count

#

I keep it sandboxed in docker containers on cloud personally

bitter hearth
#

oh for this new model yeah I just mean for what I have done before

#

I wouldn't find a few seconds that fast

toxic bone
#

i don't install custom discord hacks for instance. i have the app installed but i haven't loaded it for many months. i typically use discord in a chrome tab

winged seal
#

Haven't been in here in a while lol

#

decided I would pop in

toxic bone
#

Awww ❤️ After i popped in on you lol. The guitar mastah.

frail lagoon
#

can SD3 be used in a1111/forge yet?

lavish sparrow
#

I still like sd3.5

lavish sparrow
#

yeah 🙂

remote holly
#

It's really good for styles and colors

lavish sparrow
#

yeah, i like the aesthetics of sd3.5 way better than flux

remote holly
#

Yeah by far

lavish sparrow
#

and i used some new tricks i picked up from one of the capabilities of ollama and an idea by qwq (LLM model)

#

make the LLM output json, and have it a dedicated field for reasoning when interpreting a prompt, but make another field the output prompt: it'll listen REALLY well

#

and with sd3.5L having the proper formatting for t5, clipl and clipg really helps too

#

if clipl goes over 75 tokens the image goes wonky fast

remote holly
#

oh, i never tried reasoning models

lavish sparrow
#

first, qwq is actually really amazing, but i figured, why not try with something like mistral, or any other small models

#

and even that works: giving them a little bit of "private" space where they can lay things out for themselves before actually making the prompt seems to help a lot

#

example from mistral-small-22b:

```
{"T5": "A desolate battlefield strewn with discarded armor, each piece lined with glinting silver, lies amidst cyanide-tainted earth that shimmers in an eerie blue hue. The ground is scorched and barren, devoid of life except for the remnants of a destructive dance. Swords and shields are scattered haphazardly, as if left behind by an army in haste. The sky above is darkened by a churning storm, with lightning striking down ruthlessly, casting long, ominous shadows across the bleak landscape.",
"CLIPG": "battlefield, silver armor, cyanide ground, scattered weapons, dark storm",
"CLIPL": "A grim battlefield littered with silver-lined armor and discarded weapons on cyanide-tainted earth, under a threatening stormy sky.",
"Artstyle": "Dark Romanticism"}
```
#

before, the reasoning part at the front would end up in parts of the prompt that got submitted, so now it really gives better visual-only representations
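
A minimal sketch of the parsing side of that trick, assuming the LLM reply is one JSON object. The "T5", "CLIPG" and "CLIPL" keys match the example above; the "Reasoning" key name and the `split_reply` helper are hypothetical, use whatever your system prompt asks for.

```python
import json

# Keys that get sent to the three text encoders.
PROMPT_FIELDS = ("T5", "CLIPG", "CLIPL")


def split_reply(raw):
    """Separate the model's scratchpad from the prompts that get submitted."""
    data = json.loads(raw)
    # Hypothetical field name: the "private" space the model reasons in.
    reasoning = data.pop("Reasoning", "")
    prompts = {key: data[key] for key in PROMPT_FIELDS if key in data}
    return reasoning, prompts


raw = json.dumps({
    "Reasoning": "user wants a bleak battlefield, keep CLIPL short",
    "T5": "A desolate battlefield under a churning storm.",
    "CLIPG": "battlefield, silver armor, dark storm",
    "CLIPL": "A grim battlefield under a stormy sky.",
})
reasoning, prompts = split_reply(raw)
print(prompts)
```

The reasoning text is read but never forwarded, so only the visual descriptions reach the encoders.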

remote holly
#

interesting, i will try it and share the results

lavish sparrow
#

pull my image for the systemprompt and nodes used tho, you need some formatting tricks to get it to work properly

#

and unfortunately qwq fails at making json if the systemprompt is too big

devout schooner
# lavish sparrow yeah, i like the aesthetics of sd3.5 way better than flux

I prefer Medium personally, both because it supports higher initial generation resolutions, and also because it can actually work properly in a traditional hi-res-fix context
Like e.g. SD 1.5 was perfectly capable of genning at 512 and then refining the image at 1024 once upscaled with ESRGAN or whatever
But for some reason 3.5 Large cannot do that (or really any sort of img2img); it creates noticeable artifacting at any denoise strength
3.5 Medium however goes back to behaving like you'd expect in that context
So it's just better IMO

#

Only thing I don't like about 3.5 is that both Large and Medium lean towards extremely desaturated grayish kind of images, as though "smoke, fog, haze" was always in the positive prompt or something
In contrast 3.0 didn't have that problem at all

lavish sparrow
#

i don't feel smoke, fog, haze are a part of 3.5l