#🆕｜sd3 | Stable Diffusion | Page 130

dry wave Mar 4, 2025, 3:37 PM

#

yes, because they messed up the positional embedding

#

in the end you have to train all architectures on different resolutions. There won't be a solution where the model can extrapolate to any resolution

#

but convolutional architectures have the problem that their receptive field is fixed to a specific size. So they won't be able to generalize to much higher resolution without you having to change the architecture

#

on the other hand: who cares? I think 1-2 megapixels are more than enough and having flexibility in the aspect ratio is way more important

bitter hearth Mar 4, 2025, 3:47 PM

#

dry wave yes, because they messed up the positional embedding

I wasn't sure if it had been confirmed or not what caused some of the issues SD3.5 has like that

devout schooner Mar 4, 2025, 3:48 PM

#

dry wave on the other hand: who cares? I think 1-2 megapixels are more than enough and ha...

I just don't like the fact that SD 3 / 3.5 cannot do like half-denoise-strength hi res fix at resolutions higher than their training
In the way that both SD 1.5 and SDXL could
I wouldn't really care past that too much

bitter hearth Mar 4, 2025, 3:48 PM

#

I sometimes see strange attention issues as well, like very small objects or collapsed structure

dry wave Mar 4, 2025, 3:49 PM

#

devout schooner I just don't like the fact that SD 3 / 3.5 cannot do like half-denoise-strength ...

I could imagine the reason is not just the architecture but also the training

#

if you train on super high resolutions you can either
a.) downsize the image
b.) crop the image

#

I think earlier SD versions did strategy b.). They randomly cropped the image

#

so they learned how to denoise extremely zoomed in cropped tiles of a high resolution image

#

later SD variants instead used strategy a.). They only cropped the image as far as needed to fit their aspect bucketing and otherwise used downsizing
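
A rough sketch of the difference between the two strategies, in plain Python. The function names, the 4000x3000 source photo and the 1152x896 bucket are made up for illustration and are not taken from any actual SD training code. It only measures how much of the original picture the model gets to see per training sample.

```python
def random_crop_coverage(src_w, src_h, crop_size):
    """Fraction of the source image area visible in one square random crop."""
    crop_area = min(crop_size, src_w) * min(crop_size, src_h)
    return crop_area / (src_w * src_h)


def downsize_to_bucket(src_w, src_h, bucket_w, bucket_h):
    """Resize so the image fully covers the bucket, then crop the overflow.

    Returns (resized_w, resized_h, kept_fraction), where kept_fraction is the
    share of the resized image that survives the final crop.
    """
    scale = max(bucket_w / src_w, bucket_h / src_h)
    resized_w = max(bucket_w, round(src_w * scale))
    resized_h = max(bucket_h, round(src_h * scale))
    kept_fraction = (bucket_w * bucket_h) / (resized_w * resized_h)
    return resized_w, resized_h, kept_fraction


# Hypothetical 4000x3000 photo.
crop_share = random_crop_coverage(4000, 3000, 512)
resized_w, resized_h, bucket_share = downsize_to_bucket(4000, 3000, 1152, 896)
print(f"512px random crop sees {crop_share:.1%} of the photo")
print(f"downsize to 1152x896 keeps {bucket_share:.1%} of the photo")
```

With these numbers the random crop shows roughly 2% of the photo while the bucketed downsize keeps roughly 96%, which is one way to see why a caption written for the whole image rarely matches a random crop.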

bitter hearth Mar 4, 2025, 3:51 PM

#

this might explain why SD and SDXL are weirdly good at tiled upscaling

dry wave Mar 4, 2025, 3:51 PM

#

the reason: if you train on cropped images, you lose the alignment between text and image (e.g. if your prompt is "image of a man with sun glasses" and the cropped image is just some part of the street)

#

also sometimes the models created cropped images (e.g. headless people) which was a result of the cropped training

bitter hearth Mar 4, 2025, 3:52 PM

#

its probably better not to have crops in the base model yeah

#

I noticed flux doesn't actually have close up textures in its data

#

like if you try close up of trees or rocks

#

SD and SDXL know it but flux doesn't know what to do

dry wave Mar 4, 2025, 3:53 PM

#

maybe having a diffusion model without text that is used for upscaling? Dunno. All the upscaling models so far are very small and trained on smaller datasets

bitter hearth Mar 4, 2025, 3:53 PM

#

not sure what is best for upscaling
every week about 20 different arxiv papers all claim SOTA

#

which clearly means most of the SOTA claims are not right

#

and they cherry pick the comparisons so they make the other methods look bad

devout schooner Mar 4, 2025, 3:57 PM

#

bitter hearth and they cherry pick the comparisons so they make the other methods look bad

As far as I can tell the combination of like, a small ESRGAN / DAT / whatever model of your choice to do the actual low-to-high-res upscale and then a diffusion model to properly denoise the result with prompt context and everything is still the "best" way to upscale in general

bitter hearth Mar 4, 2025, 4:03 PM

#

there are dedicated upscaling models these days that are stronger

#

they lose the ability to generate images "normally" though

dry wave Mar 4, 2025, 4:10 PM

#

bitter hearth I wasn't sure if it had been confirmed or not what caused some of the issues SD3...

SD3 is using relative positioning where the (0,0) coordinates are always in the center of the image.
Flux is using absolute positioning where the (0,0) coordinates are in the top left corner of the image.
In theory the SD3 thing makes sense, but it seems not to work. As the Flux devs are the same people who developed SD3, I'm pretty sure they changed to the simpler positioning scheme for a good reason.
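
A toy illustration of the two coordinate conventions described above, along a single axis. This is only a sketch of the idea as stated in the message, not the actual SD3 or Flux embedding code; the function names and the 64 and 96 token sizes are hypothetical.

```python
def top_left_positions(n):
    """Indices 0..n-1, origin at the first token (scheme attributed to Flux above)."""
    return list(range(n))


def centered_positions(n):
    """Indices centred on 0 (scheme attributed to SD3 above)."""
    start = -(n // 2)
    return [start + i for i in range(n)]


def unseen_positions(scheme, train_n, test_n):
    """Positions used at the test size that never occurred at the training size."""
    seen = set(scheme(train_n))
    return [p for p in scheme(test_n) if p not in seen]


# Hypothetical sizes: trained on 64 latent tokens per axis, sampled at 96.
print(unseen_positions(top_left_positions, 64, 96)[:3])
print(unseen_positions(centered_positions, 64, 96)[:3])
```

With a top-left origin all new indices lie past one edge; with a centred origin they appear on both sides. In both cases the model meets indices it never saw during training, which fits the earlier point that the target resolutions still have to be trained.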

dry wave Mar 4, 2025, 4:11 PM

#

devout schooner As far as I can tell the combination of like, a small ESRGAN / DAT / whatever mo...

I never found using ESRGAN helpful at all. Using diffusion seems to be the only way to upscale :/

bitter hearth Mar 4, 2025, 4:11 PM

#

I see thanks, I wasn't aware about the relative/absolute positioning thing

devout schooner Mar 4, 2025, 4:21 PM

#

dry wave I never found using ESRGAN helpful at all. Using diffusion seem to be the only w...

Really? What do you use for the actual upsizing? The result of a trained upscale model going into the secondary hi res denoise pass is always better than the result of any traditional method like lanczos or whatever I find, that's what I meant by combining actual upscale models with diffusion ones for the overall process

past cipher Mar 4, 2025, 4:30 PM

#

dry wave in the end you have to train all architectures on different resolutions. There w...

You've never messed around with Kohya's Hi Res Fix or HyperTile and it shows. There's definitely ways you can extrapolate out to ANY resolution. The only thing that matters is how much VRAM you have.

#

Using the toys, I know for a fact SD 1.5 can gen to at least 1920x1080 (or vice versa), and that SDXL can gen up to at least 3840x2160 (or vice versa)

#

Talking about pure generation, no upscaling required.

#

I don't get to test it much because I'm only on 8GB of VRAM though

dry wave Mar 4, 2025, 4:33 PM

#

there are these kinds of hacks where you subsample the latents in the unet

#

a normal unet cannot do that

#

not if its not trained on these high resolutions

#

the problem with these hacks is: you don't really need them. You could just do the normal upscaling workflow (lowres generation, upscaling, img2img)

past cipher Mar 4, 2025, 4:35 PM

#

I could do that, but I will always prefer a one model solution without upscaling. Just to see what things are capable of.

dry wave Mar 4, 2025, 4:35 PM

#

yes, but its not capable of that

#

all these "solutions" are also doing something similar, like shrinking the latent representation of the image internally

#

convolution has a fixed receptive field. It just cannot generate images in arbitrary resolutions

past cipher Mar 4, 2025, 4:36 PM

#

dry wave convolution has a fixed receptive field. It just cannot generate images in arbit...

Extrapolating out is easy, the only time I've ever had issues is when you try to get an image that's smaller in resolution than the training data. the Model goes nuts.

dry wave Mar 4, 2025, 4:37 PM

#

it cannot extrapolate xD That's what I say

#

like it can "extrapolate" in the sense that the image has same size and is just expanded infinitely. Like outpainting

#

but it cannot make the resolution arbitrarily high. So to speak: there is a maximum size a "human head" can have in the unet architecture. It cannot make it larger

#

so making an image which is a super high resolution close-up of a face will fail

#

you can, however, do a lowres image first, scale it up, do img2img IF the model was trained on high resolution textures
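
The lowres, upscale, img2img workflow can be planned with a few lines of arithmetic. This is a sketch under assumptions: the helper name is hypothetical, sizes are snapped to a multiple of 8 (the usual latent downscale factor for SD VAEs), and the second pass is assumed to run roughly `steps * denoise` sampling steps, which is how many UIs treat denoise strength but is not guaranteed for every tool.

```python
def hires_fix_plan(base_w, base_h, scale, steps, denoise, multiple=8):
    """Return target size and the approximate number of img2img steps.

    base_w, base_h: resolution of the first (low-res) generation
    scale:          upscale factor applied before the second pass
    steps:          sampler steps configured for the second pass
    denoise:        img2img denoise strength in [0, 1]
    """
    if not 0.0 <= denoise <= 1.0:
        raise ValueError("denoise must be between 0 and 1")
    target_w = int(round(base_w * scale / multiple)) * multiple
    target_h = int(round(base_h * scale / multiple)) * multiple
    # Assumption: only the last `denoise` share of the schedule is sampled.
    img2img_steps = max(1, int(steps * denoise))
    return {"width": target_w, "height": target_h, "img2img_steps": img2img_steps}


plan = hires_fix_plan(512, 768, scale=2.0, steps=30, denoise=0.5)
print(plan)
```

At denoise 0.5 the second pass only refines an image whose composition is already fixed by the low-res generation, which is why it depends on the model having learned high-resolution textures.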

past cipher Mar 4, 2025, 4:39 PM

#

dry wave but it cannot make the resolution arbitrary high. So to speak: there is a maximu...

That's up to prompting really. If you use "big head mode" (old school video game cheat), you should be able to get it.

dry wave Mar 4, 2025, 4:39 PM

#

I talk about technical limitations

past cipher Mar 4, 2025, 4:39 PM

#

Also make sure you're adding "close up portrait"

past cipher Mar 4, 2025, 4:40 PM

#

dry wave I talk about technical limitations

If I had kept the image, I could show you the model I'm working on for SD1.5 that does exactly what you're talking about.

dry wave Mar 4, 2025, 4:41 PM

#

the SD unet has no positional embedding. The "latent pixels" in your image don't really know "where in the image" they are. This is the responsibility of the convolutional layers: they lay out the image composition by "telling" the pixels where they are (how far from the left corner of the image, how far from the bottom corner and so on) and where they are relative to each other ("these two pixels are neighbours")

#

but the receptive field of convolutions is limited. There is a "maximum range" you can exchange information this way. So if your image is very large, then the pixels in the middle of the image do not know how far away they are from the border. They do not know where in the image they are. They still can exchange information via attention, but as they don't have any absolute positioning information this is really difficult. One pixel on the left side of the image does know that another pixel on the right side is "not close", but it does not know if this pixel is above or below, left or right of the other pixel.
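
The "maximum range" of a convolution stack can be computed with the standard receptive-field recurrence. The layer lists below are hypothetical examples, not the real SD UNet configuration, and the calculation ignores attention layers, which as noted above can exchange information globally but carry no absolute position.

```python
def receptive_field(layers):
    """Receptive field (in input pixels, along one axis) of stacked convolutions.

    layers: list of (kernel_size, stride) tuples, applied in order.
    """
    field = 1  # a single output pixel sees one input pixel before any layer
    jump = 1   # distance in input pixels between neighbouring outputs
    for kernel_size, stride in layers:
        field += (kernel_size - 1) * jump
        jump *= stride
    return field


# Three 3x3 convolutions without striding.
plain = receptive_field([(3, 1), (3, 1), (3, 1)])
# The same depth with one stride-2 layer in the middle.
strided = receptive_field([(3, 1), (3, 2), (3, 1)])
print(plain, strided)
```

The value depends only on the architecture, not on the image size, so enlarging the latent leaves each pixel seeing the same absolute neighbourhood and therefore a smaller share of the image.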

bitter hearth Mar 4, 2025, 6:02 PM

#

yes with reasonable quality there is a pretty low limit with SD and SDXL
they work very well tiled but not in one big tile

devout schooner Mar 4, 2025, 6:33 PM

#

past cipher If I had kept the image, I could show you the model I'm working on for SD1.5 tha...

I actually made an SD 1.5 model that for the most part fully supports XL-equivalent resolutions natively
Just by training it like that
https://civitai.com/models/490451/zootvision-eta
None of these are upscaled, all genned at the resolution you see them without deep shrink or anything like that

bitter hearth Mar 4, 2025, 6:39 PM

#

these sorts of numbers, below 1560x1560 are fine yeah

devout schooner Mar 4, 2025, 6:47 PM

#

bitter hearth these sorts of numbers, below 1560x1560 are fine yeah

I never tried to go that high yeah lol
It does have a lot of like 1536x640 / 640x1536 buckets though from a bunch of ultra wide / tall landscape photos I swiped from Wikimedia Commons at one point
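
For reference, a minimal aspect-bucketing sketch that produces buckets like 1536x640 from a fixed pixel budget. The budget of 1024x1024 pixels, the step of 64 and the side limits are assumptions chosen for illustration; this is not the bucket list of the model linked above or of any specific trainer.

```python
import math


def make_buckets(max_pixels=1024 * 1024, step=64, min_side=512, max_side=1536):
    """List (width, height) buckets whose area stays within max_pixels."""
    buckets = []
    for width in range(min_side, max_side + 1, step):
        # Largest height that is a multiple of `step` and fits the budget.
        height = (max_pixels // width) // step * step
        height = min(height, max_side)
        if height >= min_side:
            buckets.append((width, height))
    return buckets


def nearest_bucket(img_w, img_h, buckets):
    """Pick the bucket whose aspect ratio is closest on a log scale."""
    target = math.log(img_w / img_h)
    return min(buckets, key=lambda b: abs(math.log(b[0] / b[1]) - target))


buckets = make_buckets()
# Hypothetical ultra-wide landscape photo.
print(nearest_bucket(3000, 1250, buckets))
```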

bitter hearth Mar 4, 2025, 6:48 PM

#

yeah this level is fine, in fact its probably better to use SD 1.5 this way, around the 1 megapixel level

viscid pivot Mar 4, 2025, 9:00 PM

#

these are native 2k gens (non upscaled) using flux or rather flex trained on a 2k dataset. I couldn't get these results on any of the SD models

#

SD 3.5M, being the only multi-res capable model, completely breaks apart during training with multi-res, especially when using resolutions higher than 1.5k

bitter hearth Mar 4, 2025, 9:25 PM

#

how many images were in your dataset

#

I think things have to go step by step

#

get the model to work well at 1.5k before thinking about 2k

#

at least from what I have seen/read, it could be possible to fix SD3.5's 1.5k generation with something on the scale of a few million images or so

viscid pivot Mar 4, 2025, 9:38 PM

#

bitter hearth at least from what I have seen/read, it could be possible to fix SD3.5's 1.5k ge...

yea and thats the problem, i used around 600 2k images to make it work in flux

#

not millions

#

and to make matters even worse, i didn't even do a full finetune, it's only a high rank locon

#

the positional understanding in SD is just very very bad even in medium

past cipher Mar 4, 2025, 9:42 PM

#

viscid pivot the positional understanding in SD is just very very bad even in medium

Do you happen to have some prompts for testing? I once came across a list of "impossible prompts" (reflection in a mirror) but I can no longer find it.

#

Positional prompts would certainly help in my tests

viscid pivot Mar 4, 2025, 9:45 PM

#

past cipher Do you happen to have some prompts for testing? I once came across a list of "im...

I'm not at my PC currently, but i mean positional understanding at higher resolutions. SD models will duplicate and stretch like crazy at 2k

#

Meaning if you ask for something in the center, you will have it 4x instead of one large image

#

or if you ask for something in the corner at higher resolutions it will do weird stuff

#

The base Flux models do a lot better in that regard which means their concept of positioning is already better and not as tied to resolutions

bitter hearth Mar 4, 2025, 9:47 PM

#

with careful settings flux can do 3k

#

regarding SD3.5, I do think it's plausible it will improve its 1.5k ability substantially

#

but yeah my understanding is that the ballpark is million+ images, it may be less than that though

viscid pivot Mar 4, 2025, 9:48 PM

#

The other huge issue with sd3.5 is the extremely low context size for the TE

#

training for more than 154 tokens already makes the model behave weird, but with over 256 it gets very bad

bitter hearth Mar 4, 2025, 9:49 PM

#

yeah the attention masking wasn't done so what this means is the model has issues if the text encoder token count goes above a certain amount
this is less concerning because you can just limit prompt tokens

viscid pivot Mar 4, 2025, 9:50 PM

#

bitter hearth yeah the attention masking wasn't done so what this means is the model has issue...

you can, but this limits your outputs and you can't do detailed multi frame images which Flux can do

bitter hearth Mar 4, 2025, 9:51 PM

#

I think in terms of expectations it might be better off giving up on boosting it to flux level
and instead just try to improve the model a bit within its current performance bracket

viscid pivot Mar 4, 2025, 9:52 PM

#

Yea i just hope the next SD model meets the expectations

bitter hearth Mar 4, 2025, 9:52 PM

#

same I'm essentially hoping for SD4 at this point

#

the thing about flux is it has this insanely strong self attention

#

and then flux fill boosts it even further

#

if you are willing to sometimes re-roll seeds then you can use flux fill as your main model instead of flux dev and just outpaint everything

viscid pivot Mar 4, 2025, 9:54 PM

#

I'm using a de-distilled model of flux schnell as my main. This prevents most of the issues flux dev has, especially in finetuning

bitter hearth Mar 4, 2025, 9:55 PM

#

yeah that's good

viscid pivot Mar 4, 2025, 9:55 PM

#

The self attention of flux makes it difficult to train in anything it doesn't know yet and you also break its guidance embedding

bitter hearth Mar 4, 2025, 9:55 PM

#

I only use flux checkpoints that have de-distilled checkpoints as part of them
the lighting gets so much nicer with CFG

viscid pivot Mar 4, 2025, 9:56 PM

#

yea exactly, i hate the flux base style

#

flux chin, flux light

bitter hearth Mar 4, 2025, 9:56 PM

#

yeah I don't like base flux at all, I find it unusable

#

but then with photography checkpoints and 3 or so strong photography loras I like it

viscid pivot Mar 4, 2025, 9:57 PM

#

these are outputs from the locon i trained of flex (the de-distilled model). The lighting and flux like style went away basically instantly at the first few steps

#

they are from very early in training

bitter hearth Mar 4, 2025, 9:58 PM

#

these are fantastic wow

#

no flux chin or flux skin
and the blur is nice

viscid pivot Mar 4, 2025, 9:58 PM

#

yea im gonna soon merge the locon to the base model and release it haha

bitter hearth Mar 4, 2025, 9:59 PM

#

default flux blur can be slightly strange

viscid pivot Mar 4, 2025, 9:59 PM

#

everything looks like out of a game engine

bitter hearth Mar 4, 2025, 9:59 PM

#

yeah for sure its the unrealengine look

#

not actual unrealengine but what models seem to think unrealengine is LOL

#

cos actual actual unrealengine can look better than flux base 😄

viscid pivot Mar 4, 2025, 10:00 PM

#

yea, this was done intentionally for some reason. Their guidance embedding is meant to always default to this kind of look. With some heavy prompt engineering you can get photographic styles but it is very hard

#

and it is even harder to train that out of base flux

#

that's also why person loras work so well, because flux ignores most of the poses, stylistic elements and so on

bitter hearth Mar 4, 2025, 10:01 PM

#

I looked around on arxiv for other models that had guidance embeddings like that

#

but I couldn't really find any

viscid pivot Mar 4, 2025, 10:02 PM

#

well that's distillation for you. Flux pro also doesn't use it

bitter hearth Mar 4, 2025, 10:02 PM

#

the guidance is fun sometimes because what you can do is set it to 1.4 lol

#

guidance 1.4 flux is a wild time

viscid pivot Mar 4, 2025, 10:03 PM

#

yea its like midjourneys chaos mode

bitter hearth Mar 4, 2025, 10:08 PM

#

I never used midjourney cos I couldn't get their discord to work

viscid pivot Mar 4, 2025, 10:08 PM

#

im just using MJ to rip dataset images haha

bitter hearth Mar 4, 2025, 10:08 PM

#

lol

#

I thought about using flux ultra for that

#

but instead I might just filter HF datasets

viscid pivot Mar 4, 2025, 10:09 PM

#

I'm using MJ because it basically says do whatever the f you want with the images. I think flux has some kind of "no using outputs for training" clause or so

bitter hearth Mar 4, 2025, 10:09 PM

#

particularly if you are doing reward learning / reinforcement learning type fine tuning
you can re-use existing big datasets because you are using them in a different way to before

#

yeah flux probably does

viscid pivot Mar 4, 2025, 10:10 PM

#

bitter hearth particularly if you are doing reward learning / reinforcement learning type fine...

im doing that, i also do self reinforced learning. Since my current model is capable of 2k image output with nice quality, i can easily use the images for the 1k dataset and remove worse images

#

im also using the 512 res for concept transfering

bitter hearth Mar 4, 2025, 10:11 PM

#

okay nice

viscid pivot Mar 4, 2025, 10:11 PM

#

even if the images are bad, flex is very good at keeping sh... quality in the 512 resolutions haha

bitter hearth Mar 4, 2025, 10:11 PM

#

there are more image quality assessment things around now so

#

filtering can be better

#

flux dev hides some weird data in 384x384

#

poses do improve in that res though for some reason

viscid pivot Mar 4, 2025, 10:12 PM

#

the problem is getting images like that to become realistic looking is very difficult. I had to do lots of trickery to make that work in 1.5k

#

and since i now have the output i can reuse it for self reinforced learning

dusky thistle Mar 4, 2025, 10:56 PM

#

just implemented regional conditioning with SD35

#

pretty much doubles the token limit too

dusky thistle Mar 4, 2025, 11:45 PM

#

with SD35L

turbid grotto Mar 5, 2025, 12:13 AM

#

bitter hearth same I'm essentially hoping for SD4 at this point

absence of planned sd3.5m controlnets is fueling my cope that SAI dropped 3.5 and in the best case is cooking 4.0 with the newer architecture and better coherence to regain its crown... or at least 3.6 to address trainability

devout schooner Mar 5, 2025, 1:02 AM

#

viscid pivot these are native 2k gens (non upscaled) using flux or rather flex trained on a 2...

yeah SD 3.5 Med tops out at around 1440x1440

#

got a pretty good result for Gun Lady with Clownshark sampler and SLG, on SD 3.5 Med
SD 3.5 does cars particularly well I've noticed

devout schooner Mar 5, 2025, 1:12 AM

#

viscid pivot these are outputs from the locon i trained of flex (the de-distilled model). The...

I tried reprompting these with JoyCaption on the current version of my own Flux photo Lora (older / smaller dataset than the Kolors one has, was still decent though)
not bad I think
didn't expect the composition to be as similar as it was either TBH
I guess that just comes down to the text encoder patterns though

#

CogView kinda mid on the lady one here
rail to nowhere lol
it looks like a distilled model despite apparently not being one also
same as Lumina 2.0
for some reason

#

this SD 3.5 Medium one should be nice
doing it direct 1024x1536
I can see she has the right number of fingers already so we should be good lol
Clownshark saves the day again

#

Kolors Lora ones here
pretty good
just up against the limitations of the VAE as always sadly

devout schooner Mar 5, 2025, 1:33 AM

#

viscid pivot The other huge issue with sd3.5 is the extremely low context size for the TE

I've noticed that this problem doesn't exist if you ONLY load and prompt T5, not either of the clips
it never explodes with only T5
so it has to be some weirdness with the CLIPS I guess

devout schooner Mar 5, 2025, 3:15 AM

#

SD 3.5 Medium really likes prompts that are exactly the average length of regular Florence-2 Large "more detailed caption" mode
that's what this is
a portrait of a young woman with pink hair. She is sitting on a couch with a cityscape in the background. The woman is wearing a black leather outfit with a gold necklace and earrings. She has a pair of sunglasses on her head and is looking off to the side with a serious expression on her face. The lighting is red and blue, creating a futuristic and edgy vibe. The overall mood of the image is dark and mysterious.

viscid pivot Mar 5, 2025, 4:13 AM

#

devout schooner SD 3.5 Medium really likes prompts that are exactly the average length of regula...

yea sd3.5 can be made into a usable state, but it is just so tedious... One thing that is really noticeable and annoying with flux, no matter how you try to tune it: fur and clothing like wool and so on never look really right, for some reason it always puts this smoothness on it

#

hard to put into words, but it always looks a little off

#

it always tries to have the patterns of it too perfect

devout schooner Mar 5, 2025, 4:47 AM

#

viscid pivot yea sd3.5 can be made into a usable state, but it is just so tedious... 1 thing ...

I mean it looks like all distilled models do basically
3.5 Large Turbo on the left / first, 3.5 Medium in the middle / second, Flux Dev on the right / third
Aesthetically Dev is just like, a slightly lesser amount of visible distillation than Large Turbo I'd say
but you can still tell it is
whereas Medium is obviously not a distilled model

#

this one's Lumina 2.0 lol
it's weird looking for a non-distilled model
more like actual plastic than Flux

viscid pivot Mar 5, 2025, 5:06 AM

#

devout schooner this one's Lumina 2.0 lol it's weird looking for a non-distilled model more like...

with lumina that probably comes from being low param count if you're using the 2b version, and probably very undertrained

crisp crane Mar 5, 2025, 5:28 PM

#

/ dream Painting of 2 fairies looking at the camera and smiling, Asian girl's face, full body photo, white wings

devout schooner Mar 5, 2025, 10:24 PM

#

SD 3.5 Medium vs CogView 4
both 1152x1536
SD still winning in "groundedness" as far as realism IMO
I dunno why everyone else is apparently allergic to making a model that just looks normal for that kind of thing

bitter hearth Mar 6, 2025, 12:32 AM

#

devout schooner

got this out of cogview, similar result to you

devout schooner Mar 6, 2025, 1:10 AM

#

bitter hearth got this out of cogview, similar result to you

I don't get why she has Flux Chin lol
that was mostly a product of distillation as far as I could tell
but it's not distilled

#

maybe it was just trained on too many Flux gens

#

I guess the timeframe would work for that to be possible

bitter hearth Mar 6, 2025, 1:21 AM

#

more likely to be DPO I think

#

the different categories of distillation act pretty differently

#

some are very subtle

#

like high step PCM

devout schooner Mar 6, 2025, 1:28 AM

#

bitter hearth got this out of cogview, similar result to you

a photograph of a woman with long, wavy dark hair sitting at a wooden table in a dimly lit coffee shop, holding a teacup in her hands. She is wearing a light blue ribbed long-sleeved shirt, and her expression is calm and contemplative. The background is blurred, but it appears to be a cozy cafe with warm, inviting lighting. The image has a high-quality, cinematic feel, with a focus on the woman's contemplative expression and the warm tones of the setting.
with Kolors photo lora

lethal cape Mar 6, 2025, 1:35 AM

#

Is there any guide on how to begin?

bitter hearth Mar 6, 2025, 1:37 AM

#

devout schooner `a photograph of a woman with long, wavy dark hair sitting at a wooden table in ...

its a bit better yeah

devout schooner Mar 6, 2025, 1:40 AM

#

bitter hearth its a bit better yeah

as always

#

though

#

sadly lol

bitter hearth Mar 6, 2025, 1:45 AM

#

wan or stepfun

#

as image models

#

is best currently

devout schooner Mar 6, 2025, 1:47 AM

#

yeah i've heard of people using them like that

bitter hearth Mar 6, 2025, 1:48 AM

#

I forget the name of the paper but it is something like "inpainting with video priors" it said video models learn stuff like laws of physics better

simple ocean Mar 6, 2025, 1:54 AM

#

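/generate Modern urban park, soft sunlight, lawns and trees, skyscrapers in the background. A white man (30 years old, short light-brown hair, light blue shirt, grey casual trousers, smiling) and a Black man (30 years old, curly black hair, dark green knit sweater, khaki trousers, cheerful smile) stand on a bridge, their shadows merging, with pigeons and a chessboard in the background, illustration style, low-saturation tones.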

crystal notch Mar 6, 2025, 3:28 PM

#

dusky thistle with SD35L

what was prompt?

viscid mica Mar 7, 2025, 2:51 AM

#

Today's API service is not stable, always prompting timeout

simple ocean Mar 7, 2025, 6:07 AM

#

Build an exhibition hall for IQOS electronic cigarettes, with an artistic and high-end style.

dense birch Mar 7, 2025, 7:02 AM

#

A tall fantasy art panel divided into four vertical sections, each showing the same stylized tree in a different season:

Left section: winter theme with vibrant icy-blue leaves, snowflakes, and a dark starry sky
Second section: spring theme with bright green leaves, soft glow, and gentle sparkles
Third section: summer theme with warm golden-orange leaves, light glow, and shimmering atmosphere
Right section: autumn theme with fiery red leaves, falling foliage, and a darker star-filled sky
Each section seamlessly transitions in color and mood, leaves softly glowing and drifting,
intricate detail, ultra-detailed, fantasy lighting, digital painting, trending on ArtStation, 8k resolution

icy moon Mar 7, 2025, 7:53 AM

#

Realistic style surreal visual scene

thick talon Mar 7, 2025, 2:52 PM

#

hello everyone!

devout schooner Mar 8, 2025, 4:47 AM

#

I keep changing this absentmindedly instead of denoise for hi-res-fix with Clownsampler lmao

#

cause it starts at 0.5 I guess

viscid mica Mar 8, 2025, 7:31 AM

#

When will the service be restored?

dusky thistle Mar 8, 2025, 7:28 PM

#

devout schooner Mar 8, 2025, 7:30 PM

#

dusky thistle

is there anything else in the incredibly gigantic list of sampler options worth checking out for general use cases lol? or a breakdown of what it all even is anywhere?

dusky thistle Mar 8, 2025, 7:32 PM

#

devout schooner is there anything else in the incredibly gigantic list of sampler options worth ...

do you have rgthree installed?

#

it looks like this if you have rgthree, and turn on the setting for nesting folders

#

it really should be part of base comfyui imo

#

cuz yea otherwise it is a HUGE list lol

#

worth checking out is hard to say, it all depends on how fast your model is and your patience i guess lol

#

and what's best for what, it's hard to say... which is why i added so many lol

#

res_8s can be insanely good with sd35m, obviously will be a lot slower but medium is pretty fast soooo... can be viable

#

pretty much the big thing to know from a user perspective is the multistep ones will be fast, and the higher you go with the "s" number the slower it will be, but probably better
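
A back-of-the-envelope cost estimate for that rule of thumb. The naming convention here is an assumption: a trailing `<N>s` is read as N stages with one model evaluation per stage per step, and a trailing `<N>m` as a multistep method that reuses earlier evaluations and so costs about one evaluation per step. The helper names and the name `res_2m` are illustrative; check the sampler's own documentation for the real costs.

```python
import re


def parse_sampler_name(name):
    """Split a name like 'res_3s' into (order, is_multistep).

    Assumed convention: '_<N>s' means N stages per step, '_<N>m' means multistep.
    Names without such a suffix are treated as one evaluation per step.
    """
    match = re.search(r"_(\d+)([sm])$", name)
    if match is None:
        return 1, False
    return int(match.group(1)), match.group(2) == "m"


def model_calls(name, steps):
    """Approximate number of model evaluations for a full sampling run."""
    order, is_multistep = parse_sampler_name(name)
    return steps if is_multistep else steps * order


for sampler in ("res_2m", "res_3s", "res_8s"):
    print(sampler, model_calls(sampler, steps=30))
```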

devout schooner Mar 8, 2025, 7:43 PM

#

dusky thistle res_8s can be insanely good with sd35m, obviously will be a lot slower but mediu...

i'll try that one then thanks
do you recommend still using SLG with all / any of these, also? seems to work fine still, just wasn't sure if it was meant for it as much

dusky thistle Mar 8, 2025, 7:44 PM

#

i'd imagine it'd be fine either way

#

honestly, i've never used SLG

devout schooner Mar 8, 2025, 7:48 PM

#

dusky thistle honestly, i've never used SLG

these were both the same seed / prompt / settings etc on SD 3.5 Medium with your res_3s and "ModelSamplingAdvanced" in exponential mode
only difference is first was no SLG, second was with
SLG version background is way more coherent, especially the buildings, I think
and it seems to not have the yellowy high-contrast kinda look that SLG usually brings, when I use it with your exponential samplers
so that's a bonus too

dusky thistle Mar 8, 2025, 7:52 PM

#

yeah that's a pretty big difference

#

what settings are you using with SLG?

devout schooner Mar 8, 2025, 7:55 PM

#

dusky thistle what settings are you using with SLG?

#

not having the conditioning thingies there makes a huge difference for reasons I don't understand, even with an empty negative prompt
so that's like the overall best combo of settings I've found

dusky thistle Mar 8, 2025, 7:58 PM

#

yeah, a blank prompt gets encoded to something different than all zeros

devout schooner Mar 8, 2025, 7:58 PM

#

the default scale 3.0 is WAY too high it seems for SLG lol, the colors are super wonky, 2.0 is way more reasonable
and the slightly lower default end percent of 0.015 is also a bit worse for whatever reason, at least in my experience

devout schooner Mar 8, 2025, 11:01 PM

#

Exponential Clownsampling makes some of my SD 3.5 Medium likeness Lora experiments come out way better than I had "ranked" them at lol
I may need to re-evaluate like everything
I use these two a lot for testing lora training on new models just cause the models will very often struggle to reproduce the various somewhat unique aspects of how they look
in comparison to other celebs who don't look quite as distinct

devout schooner Mar 9, 2025, 12:10 AM

#

holy crap
a miniature model of a castle on top of a large mug. The mug is made of stone and has two handles on either side. The base of the mug is covered in moss and rocks, and there is a small waterfall cascading down from the top. The waterfall is surrounded by greenery and there are two small figures standing on the rocks. On the left side of the base, there is an oak tree with yellow leaves. The castle is made up of multiple towers and turrets, and it is lit up with orange and yellow lights. The background is a bookshelf filled with books and other decorative items. The overall mood of the image is magical and whimsical.
Clownsample res_3s (right / second) absolutely steamrolled the DPM++ 2M SGM Uniform output (left / first) on this one lol
with SD 3.5 Medium

#

like it directly made the prompt adherence better
as far as the background

dusky thistle Mar 9, 2025, 12:54 AM

#

SDE can help a ton with getting style out of loras, and likeness, in my experience

#

it gives the model lots of chances to make little corrections and find its way to a better output

devout schooner Mar 9, 2025, 12:59 AM

#

dusky thistle SDE can help a ton with getting style out of loras, and likeness, in my experien...

yeah it was always the obvious best choice for essentially all UNET models I always found
for any use case that wasn't like, fully 2D (anime or what have you)
for that Euler Ancestral always seemed better

dusky thistle Mar 9, 2025, 12:59 AM

#

yeah, it def helps

#

everyone kinda gave up and never did anything to get SDE working with rectified flow for whatever reason

#

but that's what the "eta" parameter does

#

if that's at 0.0, it's not SDE, if it's > 0.0 it's SDE
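
To make the role of `eta` concrete, here is the ancestral step split used by k-diffusion-style samplers, rewritten as a standalone function. This is a stand-in for illustration under the usual sigma (noise level) parameterisation; it is not claimed to be the formula the rectified-flow samplers discussed here use. It shows the general pattern: `eta = 0` gives a deterministic step, `eta > 0` steps down further and re-injects fresh noise.

```python
def ancestral_step(sigma_from, sigma_to, eta=1.0):
    """Split a step from sigma_from to sigma_to into a deterministic part
    (down to sigma_down) and fresh noise of scale sigma_up.

    Returns (sigma_down, sigma_up) with sigma_down**2 + sigma_up**2 == sigma_to**2.
    """
    if eta == 0.0:
        # No noise is re-injected: a plain ODE step.
        return sigma_to, 0.0
    variance_ratio = (sigma_from**2 - sigma_to**2) / sigma_from**2
    sigma_up = min(sigma_to, eta * (sigma_to**2 * variance_ratio) ** 0.5)
    sigma_down = (sigma_to**2 - sigma_up**2) ** 0.5
    return sigma_down, sigma_up


ode_down, ode_up = ancestral_step(10.0, 5.0, eta=0.0)
sde_down, sde_up = ancestral_step(10.0, 5.0, eta=1.0)
print(ode_down, ode_up)
print(sde_down, sde_up)
```

The extra noise added at each step is what gives the model the repeated chances to make small corrections mentioned above.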

dusky thistle Mar 9, 2025, 1:47 AM

#

cant remember if i gave you this but it has an explanation on most stuff, to some degree at least lol

📎 intro_to_clownsampling.json

devout schooner Mar 9, 2025, 3:02 AM

#

dusky thistle cant remember if i gave you this but it has an explanation on most stuff, to som...

yeah you did, it helped for sure Prayge

dusky thistle Mar 9, 2025, 3:56 AM

#

#

#

#

#

#

dusky thistle Mar 9, 2025, 4:53 AM

#

dusky thistle Mar 9, 2025, 6:27 AM

#

all using stoiqo, a finetune of SD35L

#

dusky thistle Mar 9, 2025, 7:07 AM

#

dusty widget Mar 9, 2025, 11:17 AM

#

Vibrant energy waves pulsating across the cosmos, different frequencies manifesting as different celestial objects.

buoyant mesa Mar 9, 2025, 12:44 PM

#

dusky thistle

how did you get the amazing reflection on the ground?

dusky thistle Mar 9, 2025, 2:11 PM

#

buoyant mesa how did you get the amazing reflection on the ground?

By using stoiqo, an sd35L fine-tune

#

https://civitai.com/models/161068/stoiqo-newreality-flux-sd35-sdxl-sd15 it's the pre-alpha on here, it says it's sd35 medium but that's incorrect, it's large

#

https://github.com/ClownsharkBatwing/RES4LYF also using this which really helps get the best out of a model

dusky thistle Mar 9, 2025, 2:18 PM

#

dusty widget Vibrant energy waves pulsating across the cosmos, different frequencies manifest...

Here is the image you requested.

craggy crest Mar 9, 2025, 5:28 PM

#

@spark grove spammer - you might want to block steam links

dusky thistle Mar 9, 2025, 7:37 PM

#

dusky thistle Mar 9, 2025, 7:56 PM

#

#

#

dusky thistle Mar 9, 2025, 8:33 PM

#

#

dusky thistle Mar 9, 2025, 9:40 PM

#

dusky thistle Mar 9, 2025, 10:03 PM

#

dusky thistle Mar 9, 2025, 11:14 PM

#

dusky thistle Mar 9, 2025, 11:42 PM

#

#

#

jagged gate Mar 10, 2025, 12:08 AM

#

#

dusky thistle Mar 10, 2025, 12:10 AM

#

#

jagged gate Mar 10, 2025, 12:14 AM

#

dusky thistle Mar 10, 2025, 12:21 AM

#

#

jagged gate Mar 10, 2025, 12:29 AM

#

dusky thistle Mar 10, 2025, 12:29 AM

#

dusky thistle Mar 10, 2025, 12:50 AM

#

#

#

dusky thistle Mar 10, 2025, 1:18 AM

#

astral holly Mar 10, 2025, 3:07 AM

#

Phoebe, the most massive irregular satellite of Saturn.

jagged gate Mar 10, 2025, 4:19 AM

#

dusky thistle Mar 10, 2025, 5:08 AM

#

dusky thistle Mar 10, 2025, 5:08 AM

#

astral holly Phoebe, the most massive irregular satellite of Saturn.

Here is the image you requested.

arctic rose Mar 10, 2025, 12:25 PM

#

tree

rocky coral Mar 10, 2025, 12:42 PM

#

hello

dusky thistle Mar 10, 2025, 2:37 PM

#

#

#

#

vital gazelle Mar 10, 2025, 9:47 PM

#

#

dusky thistle Mar 11, 2025, 1:46 AM

#

dusky thistle Mar 11, 2025, 3:00 AM

#

dusky thistle Mar 11, 2025, 3:32 AM

#

dusky thistle Mar 11, 2025, 4:59 AM

#

#

dusky thistle Mar 11, 2025, 5:39 AM

#

dusky thistle Mar 11, 2025, 6:11 AM

#

#

#

#

drifting oak Mar 11, 2025, 11:34 AM

#

#

proven pecan Mar 11, 2025, 6:49 PM

#

dusky thistle

Seeing your stuff again makes me wish I could run comfy on my ipad.

stiff drum Mar 12, 2025, 11:36 AM

#

imagine an online store based on dark-theme UI/UX design selling shampoo, conditioner, texture powder

errant dust Mar 12, 2025, 9:05 PM

#

I'm imagining it.

#

No, wait, it needs more hair ribbons

#

and more basketballs

dusky thistle Mar 12, 2025, 9:36 PM

#

stiff drum imagine an online store based on dark-theme UI/UX design selling shampoo, conditi...

Here is the image you requested.

#

#

dusky thistle Mar 12, 2025, 11:35 PM

#

#

#

dusky thistle Mar 13, 2025, 12:35 AM

#

#

dusky thistle Mar 13, 2025, 1:15 AM

#

dusky thistle Mar 13, 2025, 3:16 AM

#

#

https://media.tenor.com/fZFlmzCDf5gAAAAM/niana-guerrero-all-by-myself.gif

craggy crest Mar 13, 2025, 3:33 AM

#

dusky thistle Mar 14, 2025, 2:36 AM

#

#

#

dusky thistle Mar 14, 2025, 3:46 AM

#

exotic atlas Mar 14, 2025, 4:00 AM

#

modern living room

#

#

#

Modern living room includes: TV wall, table and sofa, wall hanging, door frame, decorative lights

dusky thistle Mar 14, 2025, 4:36 AM

#

golden wave Mar 14, 2025, 5:27 AM

#

dog big smart

olive laurel Mar 14, 2025, 10:56 AM

#

a girl in forest

craggy crest Mar 14, 2025, 11:43 PM

#

#

#

craggy crest Mar 15, 2025, 12:35 AM

#

craggy crest Mar 16, 2025, 12:02 AM

#

fathom acorn Mar 16, 2025, 8:25 AM

#

boy

urban arch Mar 16, 2025, 3:56 PM

#

Nice Nose Flute! 🙂

desert sapphire Mar 16, 2025, 6:09 PM

#

/imagin prompt:Generate an attractive FB homepage image for wholesale customization of shoes, clothing, and bags for e-commerce, so that customers can know what product you are making as soon as they come in:: --aspect 16:9 --version 5.2 --quality .5 --stylize

livid prairie Mar 17, 2025, 3:51 AM

#

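1970s-80s, Yancheng, nostalgic, cultural and creative shop, cozy, period feel, old photos, retro posters, hand-painted murals, wooden counter, shelves, cultural and creative products, vintage objects, green plants, warm yellow lighting, old-fashioned record player, Teresa Teng, old-fashioned bicycle, sewing machine, red-crowned crane, Père David's deer (milu), paper cutting, embroidery, straw weaving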

boreal heath Mar 17, 2025, 1:49 PM

#

is this for image generation? help pls

summer ginkgo Mar 17, 2025, 1:56 PM

#

boreal heath is this for image generation? help pls

Try Artisan

violet escarp Mar 17, 2025, 6:43 PM

#

https://arxiv.org/pdf/2503.10618v1

Although increasing the channel capacity of the VAE generally improves image reconstruction quality, it can inflate the KL divergence, hindering subsequent diffusion training.

buoyant mesa Mar 17, 2025, 8:42 PM

#

craggy crest

could you post your workflow for that.... my creations in comfyui with SD3.5 are just bad

craggy crest Mar 17, 2025, 8:45 PM

#

buoyant mesa could you post your workflow for that.... my creations in comfyui with SD3.5 are...

here's my basic sd 3.5 workflow - each encoder is separate so you can put in a different prompt for each one. they each understand the same tokens differently, and function differently - so prompting them to their strengths will give you the best results. And the prompt i used for the image was just "wild and crazy, surreal, untamed piano"

📎 three-encoder-workflow.json
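
For anyone scripting this outside ComfyUI, the same "one prompt per encoder" idea can be sketched with the keyword names that the diffusers SD3 pipeline uses (`prompt` for CLIP-L, `prompt_2` for CLIP-G, `prompt_3` for T5). This is a minimal sketch, not the attached workflow: the helper name and the example prompts are hypothetical.

```python
def build_sd3_prompts(clip_l: str, clip_g: str = "", t5: str = "") -> dict:
    """Return one prompt per text encoder, keyed by diffusers-style argument names.

    Empty entries fall back to the CLIP-L prompt so every encoder gets some text.
    """
    return {
        "prompt": clip_l,             # CLIP-L: short tags / keywords
        "prompt_2": clip_g or clip_l,  # CLIP-G: style and overall mood
        "prompt_3": t5 or clip_l,      # T5: longer natural-language description
    }

# Hypothetical example prompts, split by what each encoder tends to handle well.
prompts = build_sd3_prompts(
    clip_l="surreal piano, untamed",
    clip_g="wild and crazy surreal painting",
    t5="A wild, untamed piano twisting through a surreal landscape.",
)
print(prompts)
```

The resulting dict could be unpacked into a pipeline call such as `pipe(**prompts)`, assuming an already loaded `StableDiffusion3Pipeline`.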

craggy crest Mar 18, 2025, 2:49 AM

#

#

@honest yarrow perfect english - no, oops, it repeated a word that wasn't in the prompt - and this is the most advanced generative AI model right now

honest yarrow Mar 18, 2025, 3:11 AM

#

craggy crest

what prompt did you use

craggy crest Mar 18, 2025, 3:12 AM

#

honest yarrow what prompt did you use

#

you can see i did not tell it to repeat the word pie

#

none of the AI image generators are good at text. they're getting better, but they're still not good. and the only text they are the least bit good at is English. if you want one that is good at any other language, you're going to have to spend the time to research, learn how to create data sets, get a data center, and train it

#

and then work with it over and over and over till you get it working

honest yarrow Mar 18, 2025, 3:14 AM

#

craggy crest

well at least It did type it cursively right. I will check SD 3.5 for Arabic now

craggy crest Mar 18, 2025, 3:15 AM

#

honest yarrow well at least It did type it cursively right. I will check SD 3.5 for Arabic now

since flux and SD3.5 are the same architecture, i doubt it'll do any better than flux

honest yarrow Mar 18, 2025, 3:19 AM

#

craggy crest since flux and SD3.5 are the same architecture, i doubt it'll do any better than...

also It is only 8B

craggy crest Mar 18, 2025, 3:21 AM

#

honest yarrow also It is only 8B

i know. flux is stuffed full of stuff it didn't need and then dpo was run on it to ensure that the things people are most likely to want to generate come out nice - and mask all the issues it has

#

it would also be around 8b if they hadn't done that

bitter hearth Mar 18, 2025, 8:32 AM

#

recently there was a Rombach lecture on youtube and they showed pre DPO flux sample

#

I have mixed feelings about DPO because it works very well for certain models

#

particularly SPO for SD 1.5

#

I think the weaker the model the more it helps, so it was more needed for SD 1.5 sized models than for big 10B+ ones

craggy crest Mar 18, 2025, 7:13 PM

#

bitter hearth I have mixed feelings about DPO because it works very well for certain models

it was designed for LLMs - it reads well on the paper - i don't like what it does in practice

#

don't spam the channels

bitter hearth Mar 18, 2025, 7:39 PM

#

it has a high risk of overfitting compared to some other methods

#

there are some more modern similar methods to DPO that address that a bit

#

but its still a risk

hushed cliff Mar 19, 2025, 9:09 AM

#

Hi dudes. Happy wednesday.
Started tinkering with SD 3.5 medium recently. Very little information on the net about styles. Has anyone found the key to it yet? The artist styles seem to be present, but hard to use in complex prompts, as they dissipate very quickly as the prompt length increases.

silver stratus Mar 19, 2025, 12:02 PM

#

Modern living room includes: TV wall, table and sofa, wall hanging, door frame, decorative lights

craggy crest Mar 19, 2025, 9:07 PM

#

hushed cliff Hi dudes. Happy wednesday. Started tinkering with SD 3.5 medium recently. Very f...

for medium keep the prompt short and consider using it as a refiner for sd 3.5 large. try prompts like this "pen and ink line drawing of love birds on a branch"

strange lodge Mar 20, 2025, 12:07 AM

#

"Peaceful landscape with lots of dogs."

bitter hearth Mar 20, 2025, 1:15 AM

#

specifically, no more than 75 tokens; you can check the count with certain comfy nodes

#

you will sometimes see 77 written for the clip L and clip G size, but it ends up being 75 because two of the 77 slots go to the start and end tokens

#

would leave a few more just in case so maybe 70
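
The 77-versus-75 arithmetic can be sketched as below. The token ids 49406 and 49407 are the start and end markers of the standard OpenAI CLIP vocabulary; the padding convention differs between encoders and front ends, so the `pad_id` default here is an assumption, and the helper names are hypothetical. Splitting long prompts into several windows is how some front ends cope with going over the limit, not something stated above.

```python
BOS_ID = 49406   # CLIP start-of-text token id
EOS_ID = 49407   # CLIP end-of-text token id
CONTEXT_LEN = 77
USABLE = CONTEXT_LEN - 2  # 75 slots remain for actual prompt tokens

def fits_in_one_window(prompt_token_ids) -> bool:
    """True if the prompt tokens fit without spilling into a second window."""
    return len(prompt_token_ids) <= USABLE

def to_windows(prompt_token_ids, pad_id: int = EOS_ID):
    """Split prompt tokens into 77-long windows: BOS + up to 75 tokens + EOS + padding."""
    windows = []
    for start in range(0, max(len(prompt_token_ids), 1), USABLE):
        chunk = list(prompt_token_ids[start:start + USABLE])
        window = [BOS_ID] + chunk + [EOS_ID]
        window += [pad_id] * (CONTEXT_LEN - len(window))
        windows.append(window)
    return windows

# Made-up token ids standing in for a tokenized prompt of 80 tokens.
fake_prompt = list(range(1000, 1080))
print(fits_in_one_window(fake_prompt), len(to_windows(fake_prompt)))
```

Staying around 70 tokens, as suggested, keeps the whole prompt inside a single window with a little headroom.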

modern shuttle Mar 20, 2025, 8:09 AM

#

((cartoonish style), (chibi fantasy)),
main elements:
smiling sun character with straw hat (anthropomorphic sun),
wheat fairy holding scythe (wood-element spirit),
dynamic composition with wind-blown wheat waves (fiery sense of motion),
color palette:
orange sun (Bing fire),
emerald wheat (Yi wood),
light gray clouds (metal element toned down),
avoid deep blue or silver (avoid water and metal)),
text overlay: "庚午匠心" in bold calligraphy (fire-element seal)

fallen stump Mar 21, 2025, 11:09 AM

#

dusky thistle Mar 21, 2025, 11:22 PM

#

sullen moss Mar 22, 2025, 10:50 AM

#

https://preview.reve.art/

Reve: Bring your ideas to life

violet escarp Mar 22, 2025, 7:19 PM

#

sullen moss https://preview.reve.art/

I think this may be the halfmoon model on here

#

I'm finding little info about them beyond what's on their website. They're pretty recent.

bitter hearth Mar 22, 2025, 7:22 PM

#

there's honestly no reason to use closed models these days with flux, stepfun and deepseek

bitter hearth Mar 22, 2025, 9:07 PM

#

made some Reve img

#

#

#

its got good colours

past cipher Mar 22, 2025, 9:16 PM

#

bitter hearth

Did you give it enough steps? The bars on the stairs are wavy instead of vertical.

bitter hearth Mar 22, 2025, 9:18 PM

#

past cipher Did you give it enough steps? The bars on the stairs are wavy instead of vertica...

its closed source, there is no step option

frail shoal Mar 22, 2025, 11:25 PM

#

past cipher Mar 23, 2025, 12:39 AM

#

frail shoal

That is awesome.

frail shoal Mar 23, 2025, 12:41 AM

#

trying to animate with sora but its shit

20250323_0103_Dystopian_Robot_Ascension_simple_compose_01jq04zfw2fgrs2b5v36egyf3s.gif

past cipher Mar 23, 2025, 12:43 AM

#

frail shoal trying to animate with sora but its shit

Have you tried wan yet? It's nuts.

frail shoal Mar 23, 2025, 12:43 AM

#

don't have good gpu

past cipher Mar 23, 2025, 12:44 AM

#

frail shoal don't have good gpu

I'm on a 3060TI with only 8GB of VRAM. I can run wan. =/

#

I'm using the exact files mentioned here though. https://comfyanonymous.github.io/ComfyUI_examples/wan/

ComfyUI_examples

Wan 2.1 Models

Examples of ComfyUI workflows

frail shoal Mar 23, 2025, 12:46 AM

#

past cipher I'm on a 3060TI with only 8GB of VRAM. I can run wan. =/

you can put the rest of the layers in cpu ram, right?

#

because i have 6gb, but i can run flux, if its the same size

#

past cipher Mar 23, 2025, 1:24 AM

#

frail shoal you can put the rest of the layers in cpu ram, right?

I'm using ComfyUI so it offloads to RAM what I'm not using.

noble tinsel Mar 23, 2025, 1:22 PM

#

hello

frail shoal Mar 23, 2025, 7:12 PM

#

2007001-dynamic20angle20black20and20white20color20sch-illy-test-lora.png

past cipher Mar 23, 2025, 7:13 PM

#

frail shoal

I really like this one. "I can fix her" mentality going on...

frail shoal Mar 23, 2025, 7:14 PM

#

dynamic angle, black and white color scheme, monochrome, an artistic depiction of an alluring demon girl with demon wings, surrounded by flames, holding a huge huge fantasy magical sword, green flames, the scene is depicted with a feel of melancholy and angry engrained in the composition, long black hair, medium elf ears and golden fiery glowing eyes, high quality, realistic artist render, digital painting, realistic artist illustration, incredibly absurdres, intricate details, incredibly detailed, perfect lighting, HDR, volumetric lighting, year 2024, high contrast

#

2012001-dynamic20angle20black20and20white20color20sch-illy-test-lora.png

#

limited colors looks good

dusky thistle Mar 23, 2025, 11:00 PM

#

past cipher Mar 23, 2025, 11:01 PM

#

dusky thistle

I've heard of having skeletons in your closet, but a school locker?

craggy crest Mar 24, 2025, 2:38 AM

#

past cipher I've heard of having skeletons in your closet, but a school locker?

anthropology student

past cipher Mar 24, 2025, 5:25 AM

#

craggy crest anthropology student

Or they took the Introductory Class "How to Get Away with Murder"

#

The Advanced Class is taught by Michael Scofield and it's "How to Break Out of a Maximum Security Prison"

craggy crest Mar 24, 2025, 5:49 AM

#

past cipher Or they took the Introductory Class "How to Get Away with Murder"

they failed it then - cause they didn't hide the evidence very well

#

the_worlds_cutest_lizard_decorated_with_flowers_in_a_spring_garden_20250324053029_01.png

stuck yarrow Mar 24, 2025, 6:20 AM

#

Aesthetic wallpaper inspired by Oriental Five Elements, gold and wood fusion, soft green leaves with intricate designs, shimmering golden branches, interplay of emerald and gold, tension of balance, delicate mist, hidden golden glimmers, minimalist abstract, ideal for mobile, 4k

livid glen Mar 24, 2025, 7:26 AM

#

generate a girl

drifting oak Mar 24, 2025, 9:01 AM

#

past cipher Mar 24, 2025, 2:42 PM

#

livid glen generate a girl

👧

acoustic mauve Mar 24, 2025, 9:30 PM

#

Surrealistic illustration of a human face fused with mechanical and organic elements, featuring a steampunk and biomechanical aesthetic. The face includes gears, circuits, veins, and exposed anatomical structures, combined with antique clocks, alchemical symbols, and scientific anatomical details of the brain. The background consists of aged parchment pages filled with handwritten script and scientific diagrams. The artistic style is highly detailed, resembling ink drawing with watercolor shading in gold, red, and blue tones. Dramatic lighting with a worn, vintage paper effect.

snow pivot Mar 25, 2025, 12:00 PM

#

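Shin Ramyun packets / soft-boiled eggs / sliced luncheon meat / chopped scallions / cheese slices / Korean chili paste scattered on a wooden tabletop, warm-light overhead shot, fresh Japanese-style filter, emphasizing the color contrast of the ingredients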

sullen moss Mar 25, 2025, 8:50 PM

#

New gen model from OpenAI...

#

#

#

#

#

errant dust Mar 25, 2025, 9:14 PM

#

https://news.mit.edu/2025/ai-tool-generates-high-quality-images-faster-0321

MIT News | Massachusetts Institute of Technology

AI tool generates high-quality images faster than state-of-the-art ...

A hybrid AI approach known as hybrid autoregressive transformer can generate realistic images with the same or better quality than state-of-the-art diffusion models, but that runs about nine times faster and uses fewer computational resources. The new tool uses an autoregressive model to quickly capture the big picture and then a small diffusion...

past cipher Mar 25, 2025, 9:22 PM

#

errant dust https://news.mit.edu/2025/ai-tool-generates-high-quality-images-faster-0321

And for a direct link to the Arxiv (which is much better than a crappy article) https://arxiv.org/pdf/2410.10812

sullen moss Mar 25, 2025, 11:14 PM

#

sullen moss Mar 25, 2025, 11:43 PM

#

sullen moss Mar 26, 2025, 12:02 AM

#

sullen nacelle Mar 26, 2025, 12:13 PM

#

anie

drowsy rune Mar 26, 2025, 5:52 PM

#

generate animated spider man cartoon holding a cricket bat

jagged gate Mar 27, 2025, 4:19 AM

#

vital surge Mar 27, 2025, 2:26 PM

#

sullen moss https://preview.reve.art/

Tiny low quality images, a joke

jagged gate Mar 28, 2025, 2:38 AM

#

solemn lichen Mar 28, 2025, 5:58 AM

#

create a image of a dog

sullen moss Mar 28, 2025, 6:53 PM

#

jagged gate Mar 29, 2025, 3:26 AM

#

jagged gate Mar 29, 2025, 3:54 AM

#

errant dust Mar 29, 2025, 3:15 PM

#

https://youtu.be/SmNDzTBgB_8?si=_3hGJBLL-1hLNBTS

YouTube

Two Minute Papers

OpenAI’s New Image Generator: An AI Revolution!

❤️ Check out Weights & Biases and sign up for a free demo here: https://wandb.me/papers

4o Image Generation: https://openai.com/index/introducing-4o-image-generation/
Apple terminal: https://www.apple.com/mac/lumon-terminal-pro/

📝 My paper on simulations that look almost like reality is available for free here:
https://rdcu.be/cWPfD

O...

sullen moss Mar 29, 2025, 10:53 PM

#

merry timber Mar 30, 2025, 2:23 AM

#

I generated an image with the prompt: ‘Convert the image to a hand-drawn style, keeping all original content unchanged, including the person with twin tails in a floral dress, sitting in a car with a seatbelt, the car interior background, the phone lock screen with the time “13:16”, text “向上滑动以解锁”, notification “微信 4个须知”, and status bar with signal, Wi-Fi, and battery icons, using a soft black-and-white sketch style with clear details.’

robust bolt Mar 30, 2025, 6:14 PM

#

An ad for a new iconic tracksuit collection from a brand called CASA X MENA photography, colorful modern, art by , Greg Manchess, in the style of , Artstation Pinterest, accent lighting, tracksuit ad, highly detailed intricate details Unreal Engine fine details HDR hyper realistic sharp trending on artstation

coral arrow Mar 31, 2025, 1:02 AM

#

hi, can I use Chinese?

pliant gale Mar 31, 2025, 10:28 AM

#

what

errant dust Mar 31, 2025, 6:57 PM

#

Looks like it is a mad race for big releases. After OpenAI's new model (Dall-E 4?), Ideogram just released their own new spectacular v3.0 and Midjourney is getting ready to release v7 soon

#

https://youtu.be/USSpwbe3Rxk?si=GrIw5Vp2ojlZQrF5

YouTube

Ideogram

Introducing Ideogram 3.0

Meet Ideogram 3.0 — stunning realism, creative designs, and consistent styles, all in one powerful text to image AI. Now available to all Ideogram users for free.

Ideogram 3.0 introduces Style Reference. Creators can upload up to three reference images to guide the style of their generations. This enable creators to quickly specify aesthetics...

#

Really the Golden Age of AIs

surreal heart Mar 31, 2025, 7:04 PM

#

redo this low quality flash card to be ready for print and remove the texts

split bramble Mar 31, 2025, 7:21 PM

#

errant dust Looks like it is a mad race for big releases. After OpenAI's new model (Dall-E 4...

Dall-E 4? Did I miss this? Dall-E is my fave!

errant dust Mar 31, 2025, 7:22 PM

#

See the 2-minute video above. They may not have called it this, but it is nevertheless their new image generating model

split bramble Mar 31, 2025, 7:24 PM

#

errant dust See the 2-minute video above. They may not have called it this, but it is nevert...

that's for Ideogram's.

errant dust Mar 31, 2025, 7:24 PM

#

above that

split bramble Mar 31, 2025, 7:27 PM

#

errant dust above that

LOL... ah... thanks 🙂

errant dust Mar 31, 2025, 7:28 PM

#

🙂

next fossil Apr 1, 2025, 8:03 PM

#

Not Dall-E 4

#

GPT4o Multimodal, LLM + image generation

split bramble Apr 1, 2025, 9:11 PM

#

I had a chance to try it, but not much before hitting the limit. It's now limited so much that it is essentially unusable.

#

Apparently even paid accounts are heavily rate limited.

dry wave Apr 1, 2025, 9:33 PM

#

I assume its good but very inefficient

#

maybe good for generating training data ;P

pseudo owl Apr 1, 2025, 10:43 PM

#

dry wave maybe good for generating training data ;P

yeah openai always starts by making the biggest most inefficient model they can and then distill it, quality wise its a really big step up from normal t2i models, even closed source. Imagen/Ideogram/Recraft might have slightly better aesthetics but prompt following is pretty insane. some imgs from it

Now tho, the model already seems considerably worse so looks like they are already distilling it, some open source variant would be nice.

craggy crest Apr 2, 2025, 1:53 AM

#

pseudo owl yeah openai always starts by making the biggest most inefficient model they can ...

because that's what they always do. they release something as an early release sort of thing - they get people to use it, talk about how great it is. once they have enough of that to show their real target customers so they can sell it, they crack down on what the general public can actually do with it.

left frost Apr 2, 2025, 3:05 AM

#

Give me a picture that everyone can like, with a theme of a soul singer named Na Yin Handsome Boy. The background is an immersive and dreamy color sensation, with the character appropriately reduced in size, in the style of a cyberpunk 3D anime

dusky thistle Apr 3, 2025, 2:32 AM

#

#

#

#

#

dusky thistle Apr 3, 2025, 3:51 AM

#

dusky thistle Apr 3, 2025, 4:21 AM

#

#

dusky thistle Apr 3, 2025, 2:29 PM

#

#

#

real terrace Apr 4, 2025, 5:14 AM

#

amazing images

hallow lion Apr 4, 2025, 8:59 AM

#

unique style

green warren Apr 4, 2025, 1:25 PM

#

/image_dream BB King

lone sparrow Apr 4, 2025, 7:21 PM

#

create a zeppelin with pov angle in the amazon forest passing through smoke fog

civic trail Apr 4, 2025, 9:32 PM

#

frail shoal Apr 4, 2025, 10:06 PM

#

devout schooner Apr 4, 2025, 11:27 PM

#

vital surge Tiny low quality images, a joke

if that was true when you said this it's not now
Reve is quite good IMO

#

it refuses significantly fewer prompts than any other API-only generator I've ever used, also

craggy crest Apr 5, 2025, 5:06 AM

#

inland wagon Apr 5, 2025, 8:37 AM

#

Fusion of future aesthetics and natural themes, flowing glass texture, refraction effect

muted glade Apr 5, 2025, 8:50 AM

#

Fantasy warrior

inland wagon Apr 5, 2025, 8:57 AM

#

#🆕｜sd3 Fusion of future aesthetics and natural themes, flowing glass texture, refraction effect

bitter hearth Apr 5, 2025, 12:25 PM

#

Reve has nice colours and cinematic theme

#

better fine tune than most of civit

#

it cannot do hands

#

faces are ok a certain % of the time

#

structure sometimes messes up as well

#

kinda a medium quality release

devout schooner Apr 5, 2025, 9:13 PM

#

bitter hearth it cannot do hands

I find they're not as good as like, say Flux
but they're also not "bad" by any means really IMO
like they're usually fine
even the worst examples of them are more like, kinda melty looking
or just missing / extra digits

#

Reve has better overall prompt adherence than Flux also IMO, by a noticeable amount

#

it doesn't just go like "nah, I'm gonna do this instead" like Flux tends to sometimes

#

this is Reve on that "Gun Lady" prompt I was doing before, for example

#

it occasionally slightly screws up the gun hand, but not that often, and not in like a ridiculously huge way
gets it right only like a smidgen less often than Flux
but overall looks better always for that sort of gen than stock Flux Dev IMO

bitter hearth Apr 5, 2025, 9:27 PM

#

ah I saw some octopus hands. like pure SD 1.4 level craziness

#

when I tried Reve

devout schooner Apr 5, 2025, 9:29 PM

#

bitter hearth ah I saw some octopus hands. like pure SD 1.4 level craziness

I didn't try it until a couple of days ago
i think when they first first released it there might have been some issue with the backend
i think someone else also said the images were initially coming out like lower res overall than intended and stuff too

#

but as it is now it's about as good as my pics up there for basically anything usually, with really good prompt adherence
it also is just willing to do more things than other API-only generators, by quite a bit
they have a VLM prompt enhancer button that will very rarely just like output "I can't enhance this."
but you can turn that option off to retain your original prompt anyways
and then beyond that they do the typical blurring thing, but it only seems to apply to gens that had resulted in like full-on explicit NSFW
it doesn't care about any amount of like stomach or whatever as some of them do lol

bitter hearth Apr 5, 2025, 9:32 PM

#

ah that's unlucky

#

I actually have no idea why people pay money to use censored models when there are like $0.50 H100s everywhere

#

with optimised workflow I made 10,000 Flux dev images for $0.50

#

most people can't or won't optimise that hard but they can get a decent fraction of that number
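
To put those numbers in perspective, here is a quick back-of-the-envelope calculation. It assumes that "$0.50 H100s" means roughly $0.50 per GPU-hour, which is not stated above, and the helper names are hypothetical.

```python
def cost_per_image(total_cost_usd: float, num_images: int) -> float:
    """Average spend per generated image."""
    return total_cost_usd / num_images

def implied_seconds_per_image(total_cost_usd: float, num_images: int,
                              gpu_usd_per_hour: float) -> float:
    """Average GPU time per image implied by the budget and the hourly rate."""
    gpu_hours = total_cost_usd / gpu_usd_per_hour
    return gpu_hours * 3600 / num_images

# Figures from the messages above; the hourly rate is an assumption.
print(cost_per_image(0.50, 10_000))
print(implied_seconds_per_image(0.50, 10_000, gpu_usd_per_hour=0.50))
```

Under that assumption the budget covers one GPU-hour, so 10,000 images works out to well under half a second of GPU time each, which shows how aggressive the optimisation would have to be.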

devout schooner Apr 5, 2025, 9:37 PM

#

bitter hearth with optimised workflow I made 10,000 Flux dev images for $0.50

yeah but that's still Flux
there's an argument to be made for these other models that are increasingly more easygoing about what they'll allow
and just like, overall good / better than any free version of Flux, particularly for complex typography and such, especially if your use case is moreso commercial / business related and all that
Reve you don't even have to pay for technically, they give you daily free credits, similarly to Ideogram
only Midjourney has absolutely no free option of any kind, these days

#

in terms of refusals though OpenAI stands alone, and I don't really get why
they're the only ones who still actively block ALL copyrighted characters from anything known to the model, for example
Meta's Imagine doesn't do this
Google Imagen doesn't do this
Reve doesn't do this
Ideogram doesn't do this

#

and so on

#

they're just shooting themselves in the foot relative to every single one of their competitors

#

like, 4o literally will not do a high-quality illustration of Bart Simpson from "The Simpsons".

#

every other model I mentioned will, without hesitation

bitter hearth Apr 5, 2025, 9:45 PM

#

it's cos OpenAI are an AGI company

#

and the other stuff is secondary

#

whereas for most companies that sell AI products, the product is primary

devout schooner Apr 5, 2025, 9:45 PM

#

bitter hearth and the other stuff is secondary

yeah but I don't see why that makes them unusually obsessed with pretending to care about copyright, versus any of their competitors, specifically for image gen lol

#

i see no upside to that for them

bitter hearth Apr 5, 2025, 9:46 PM

#

dalle 3 was earlier in time than most

#

its aged so well that it still gets compared

#

but it is pretty old now

#

I mean its now their third best image model as well since they now allow image making using sora

#

and the GPT4o thing

devout schooner Apr 5, 2025, 9:59 PM

#

bitter hearth but it is pretty old now

I hope they publish the tech specs on it at some point
for like the whole pipeline
I've always been curious about how it really compares to newer models as far as all that stuff

bitter hearth Apr 5, 2025, 10:03 PM

#

yeah for sure

#

something about its prompt following is still the best to this day

#

not in every aspect cos flux etc can hold more objects

#

but it was very responsive

devout schooner Apr 5, 2025, 11:31 PM

#

bitter hearth but it was very responsive

to an extent

#

the context is definitely not like, THAT long compared to a number of newer models

#

although we don't really know how much fuckery they do with your prompt on the backend

#

I'd also like to know just how good it actually was at photographic gens if they turn off all the stupid bullshit filtering they do that makes every image look like it's trying to imitate the overdone implementation of ambient occlusion from Far Cry 3

bitter hearth Apr 5, 2025, 11:35 PM

#

on reddit or other you can find

#

jailbroken

#

it was not especially strong

devout schooner Apr 5, 2025, 11:36 PM

#

bitter hearth it was not especially strong

yeah i didn't think so

#

i never thought it was much better in a broad aesthetic sense than like, base SDXL

#

the images were never very "high quality"
it could just do sort of more interesting things

bitter hearth Apr 5, 2025, 11:37 PM

#

yeah that's right

devout schooner Apr 5, 2025, 11:38 PM

#

no way it has anything more than a 4-channel VAE also

#

i'm pretty sure of that

bitter hearth Apr 5, 2025, 11:38 PM

#

yeah although I have a different view to most on vaes
I follow the lightningdit paper's idea that our vaes are too good and we need worse ones

#

cos worse vaes are easier to train your diffusion model with

devout schooner Apr 5, 2025, 11:39 PM

#

bitter hearth cos worse vaes are easier to train your diffusion model with

worse in what sense though
like a different sense from a direct comparison between the SDXL and SD3 / 3.5 vae?

#

if not I don't really think that quality contrast could ever be worth it

#

the only way around that on XL was training on ridiculously high res images with zero JPEG artifacts
and even then it wasn't as good as just having 16 channels

bitter hearth Apr 5, 2025, 11:43 PM

#

worse as in a smaller vae, fewer channels and/or depth

#

there will be a size you can upscale to to even out detail differences

devout schooner Apr 6, 2025, 1:53 AM

#

bitter hearth there will be a size you can upscale to to even out detail differences

here's another example of where Reve is significantly stronger than like, any Flux beyond Flux Pro Ultra, though
it can do stuff like Bart Simpson standing next to Garfield standing next to Goku quite accurately and consistently

#

Flux can kinda-sorta do that

#

but not nearly as often
and at least one of the characters will often look strange

#

Flux can't really resolve the three of them to a "common" style that makes sense the same way, I guess is the gist of it

humble blaze Apr 6, 2025, 9:58 AM

#

yeah although I have a different view to most on vaes
I follow the lightningdit paper's idea that our vaes are too good and we need worse ones

dry wave Apr 6, 2025, 11:13 AM

#

maybe 16 channels are too much, but 4 channels are definitely too few

#

I don't understand why they went directly from 4 to 16 instead of doing something in between. On the other hand, they might have just run some evaluations and found 16 channels to be the best
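
For a sense of scale, the channel count sets how hard the VAE compresses the image. The sketch below assumes 8x spatial downsampling and an RGB input, which matches the 4-channel (SD 1.5 / SDXL) and 16-channel (SD3 / 3.5) VAEs discussed here; the function names are hypothetical.

```python
def latent_shape(height: int, width: int, channels: int, downsample: int = 8):
    """Shape (C, H, W) of the latent for an image of the given pixel size."""
    return (channels, height // downsample, width // downsample)

def compression_ratio(height: int, width: int, channels: int, downsample: int = 8) -> float:
    """How many RGB values are squeezed into each latent value."""
    pixel_values = 3 * height * width
    c, h, w = latent_shape(height, width, channels, downsample)
    return pixel_values / (c * h * w)

for ch in (4, 8, 16):
    print(ch, latent_shape(1024, 1024, ch), compression_ratio(1024, 1024, ch))
```

At 1024x1024, 4 channels means 48 pixel values per latent value, 8 channels means 24, and 16 channels means 12, so an in-between setting would sit midway on the reconstruction-versus-generation trade-off.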

tough creek Apr 6, 2025, 2:21 PM

#

bitter hearth Apr 6, 2025, 6:34 PM

#

they use different variables now than just channel count

#

broadly its just a trade-off between reconstruction and generation

dry wave Apr 6, 2025, 7:08 PM

#

they always did

#

you usually have a lambda parameter that controls the KL strength. If you trained a VAE with the standard KL strength (weight 1), your reconstruction error would be too large

bitter hearth Apr 6, 2025, 7:10 PM

#

ah okay nice

#

I looked a bit into variational inference and KL strength comes up there too lol

dry wave Apr 6, 2025, 7:10 PM

#

in the original publications they keep it always at 1

#

but for many applications that is just a bit too much

bitter hearth Apr 6, 2025, 7:14 PM

#

I think if you distribution match super super hard then it can be too inflexible

#

with KL divergence in general

#

it's nice to have a bit of a looser fit

civic trail Apr 6, 2025, 8:52 PM

#

cinder junco Apr 7, 2025, 4:25 PM

#

dry wave I don't understand why they went directly from 4 to 16 instead of doing somethin...

My recollection is that the original SD3 paper had a study experimenting with varying numbers of VAE channels. I think they found that 32 channels improved their metrics further, but they decided on 16 for some reason. I don’t remember why.

random wraith Apr 7, 2025, 6:53 PM

#

Symbol: A stylized, simplified representation of the Indian peepal leaf, symbolizing knowledge and growth. The leaf can be designed with subtle, interconnected nodes or lines, representing the connection between ideas and research.
Generate a logo for a company called 'anveshana' as described.

Color Scheme: A palette of blues and greens, conveying trust, growth, and harmony. Blues can represent intellectual pursuits, while greens signify growth and innovation.

Typography: A clean, modern sans-serif font with the word "Anveshana" written in a flowing manner, suggesting continuity and exploration.
Meaning: The peepal leaf symbolizes the sacred tree under which knowledge is shared, while the interconnected nodes highlight the collaborative nature of research. The color scheme reinforces the themes of intellectual growth and harmony.

craggy crest Apr 8, 2025, 4:54 AM

#

Vegetative electron microscopy

craggy crest Apr 9, 2025, 12:24 AM

#

sullen moss Apr 9, 2025, 9:29 AM

#

https://huggingface.co/spaces/HiDream-ai/HiDream-I1-Dev

HiDream I1 Dev - a Hugging Face Space by HiDream-ai

sullen moss Apr 9, 2025, 12:52 PM

#

Almost one year...

bitter hearth Apr 9, 2025, 1:31 PM

#

sullen moss Almost one year...

if you click SD 3.5 Large as well then there were a few more

#

Shakker.ai had some more as well

turbid grotto Apr 10, 2025, 12:35 PM

#

what do you think, is sd4 possible?

#

after sd3.5l they planned to release sd3.5m controlnets too but there are still none, maybe they dropped sd3.5 and moved to other projects?

hallow lion Apr 10, 2025, 12:57 PM

#

https://tenor.com/view/tumbleweed-desert-road-dry-hot-gif-21341693

bitter hearth Apr 10, 2025, 1:04 PM

#

turbid grotto after sd3.5l they planned to release sd3.5m controlnets too but there are still ...

not sure actually, why the sd3.5 controlnets never came

#

tensor.art released some in the end

#

as well as a fresh set of distils

#

but not sure why SAI didn't do the first party ones

turbid grotto Apr 10, 2025, 1:32 PM

#

bitter hearth tensor.art released some in the end

Huh, I did not even know about that
They even did a 5M finetune of medium!

#

diversity of sd3.5 is astonishing, if only not the coherency problems...
maybe this is just how it works? you have either a unique model with bad coherency or an overtuned model with great coherency?

bitter hearth Apr 10, 2025, 1:41 PM

#

I didn't even know about that SD 3.5 Bokeh model

#

but yea looks like they did a 5m image finetune

bitter hearth Apr 10, 2025, 1:42 PM

#

turbid grotto diversity of sd3.5 is astonishing, if only not the coherency problems... maybe t...

there are more factors, but there is a bit of a trade-off between quality and diversity yeah

dry wave Apr 10, 2025, 4:33 PM

#

dunno. I find diversity in Flux larger than in SDXL for example

bitter hearth Apr 10, 2025, 5:50 PM

#

Flux having low diversity is a myth yeah

#

particularly at low guidance numbers

#

it's a much larger neural network than SDXL, so the larger network can be expected to have a better trade-off

#

I think the trade-off is more for comparing different versions of the same model, e.g. finetunes/distils/CFG levels, more than it is for comparing different models

devout schooner Apr 10, 2025, 6:56 PM

#

sullen moss Almost one year...

Try selecting the SD 3.5 Large category too

#

More definitely exist than that

#

Not sure what the "SD 3.5 with no suffix word" category is for TBH

#

Medium is getting a bit more love though

#

Two separate anime finetunes for it now

#

Both looking pretty promising when I tried them

#

RealVis dude also has a WIP Medium finetune on huggingface

devout schooner Apr 10, 2025, 7:02 PM

#

bitter hearth Shakker.ai had some more as well

TensorArt has a decent number I think too, that aren't anywhere else
They even had a bunch for the original SD3

#

The actual difference between SD 3.0 and SD 3.5 Medium continues to throw me curveballs also lol
3.0 really is legit objectively better sometimes
Most notably the "everything goes grey and melty" thing that happens with 3.5 Medium when the prompt is overly long actually didn't / doesn't happen nearly as much in 3.0
This is an "all settings same" comparison, 3.0 on left, 3.5 Medium on right
For a super long prompt:

'''a photograph showcasing an intricately crafted glass teapot, featuring a detailed, miniature scene inside. The teapot is made of clear glass with ornate, golden details on its lid and base, giving it an elegant, antique appearance. Inside the teapot, a serene seascape is meticulously painted, depicting a turbulent ocean with white, foamy waves crashing against rocks. A majestic, wooden sailing ship with two tall masts and white sails is navigating through the turbulent sea. The ship is depicted in warm, earthy tones of brown and white, standing out against the cool blues and whites of the ocean. The sea is rendered in realistic detail, with waves crashing against the glass, creating a sense of movement and depth. The rocks in the foreground are textured and detailed, adding to the immersive miniature scene. The scene is illuminated by a warm, golden light, possibly from the flame of a candle or a lamp, visible in the background. This light source casts a soft glow, enhancing the golden accents on the teapot and adding warmth to the cool blue tones of the sea. The background features a blurred, cozy indoor setting with a wooden table and a single, large, orange candle flame casting a warm, inviting ambiance.'''

3.0 looks like most models do for this prompt
3.5 Medium though is like, trying but melting in the process
So I dunno what's going on there lol

proven pecan Apr 10, 2025, 7:23 PM

#

@devout schooner Well, this is my sd3.5 medium version of your prompt. So it must be ...

a_photograph_showcasing_an_intricately_crafted_glass_teapot__featuring_a_detailed__miniature_scene_inside__the_teapot_is_made_of_clear_glass_with_ornate__golden_details_on_its_lid_and_b_4082523719.png

#

This is with no CLIP L or CLIP G text

devout schooner Apr 10, 2025, 8:37 PM

#

proven pecan @devout schooner Well, this is my sd3.5 medium version of your prompt. So ...

What sampler settings? Also what was the seed if you have it

#

My images were generated with workflows that were literally identical except for the model swap BTW
just the default comfy ones for 3.5 / 3.0

proven pecan Apr 11, 2025, 8:40 AM

#

devout schooner What sampler settings? Also what was the seed if you have it

I'm using Draw Things (mac app) not comfy, which uses a bit different jargon so not sure if this is helpful.
Sampler DPM++ 2M Trailing (matches SGM_uniform)

#

random seeds 4082523719, 3144246774

bitter hearth Apr 11, 2025, 9:22 AM

#

devout schooner The actual difference between SD 3.0 and SD 3.5 Medium continues to throw me cur...

3.0 medium was always much stronger yeah

#

and 3.0 large for that matter

bitter hearth Apr 11, 2025, 9:24 AM

#

proven pecan I'm using Draw Things (mac app) not comfy, which uses a bit different jargon so ...

there is something up with certain implementations of SD 3.5 (both M and L) because when I use it in the official Huggingface demo I get much better results than when I use it in ComfyUI

proven pecan Apr 11, 2025, 9:57 AM

#

bitter hearth there is something up with certain implementations of SD 3.5 (both M and L) beca...

and nobody notified comfy about this?

bitter hearth Apr 11, 2025, 10:05 AM

#

these days I tend to either use pure pytorch/JAX or C++/Rust kernels (when I can) so it didn't really matter that much to me either way

fathom merlin Apr 11, 2025, 10:22 AM

#

bitter hearth these days I tend to either use pure pytorch/JAX or C++/Rust kernels (when I can...

hey, sry to bother you but you seem very knowledgeable
so theoretically higher order ODE solvers should converge in a fewer number of steps right? then why can, say, dpm++2m generate a nice image in ~20 steps, while something like ipndm needs >30 otherwise there's very visible artifacts?

icy drift Apr 11, 2025, 10:43 AM

#

Testing HiDream, first result that really blew me away. Just beautiful. "Three antique fantasy potion glass bottles with labels in cursive font are sitting on a rustic wooden bench. The first bottle contains blue liquid and has the label "Mana". The second bottle contains red liquid and has the label "Health". The third bottle contains green liquid and has the label "Stamina". The warm lighting refracts through the liquid in splashes of beautiful color, casting raytraced caustic colors on the table below."
However, that's not cursive but calligraphy, and stamina is misspelled. There's no image2image in comfy at the moment, so I can't refine an image with a second pass.

#

A glass cannon. It took a little more prodding than expected to get a cannon though. The model tends to ignore unexpected words maybe?

#

No text reflection (no model I've used can do this yet, I'm just waiting for the day).

#

Gave me the title, art, and text I asked for, with a slight mistake in the text. (This was a 1-shot, usually in Flux I would do quite a few rolls.)

#

This skin and hair are very believably wet! Is it the best I've seen? Maybe.

bitter hearth Apr 11, 2025, 10:48 AM

#

fathom merlin hey, sry to bother you but you seem very knowledgeable so theoretically higher o...

there are different types of ODE solvers
if you are looking just within the category of explicit runge kutta solvers (like DPM++2m), higher order solvers can converge in a smaller number of steps
but ipndm is not an explicit Runge-Kutta method, it's a different category

icy drift Apr 11, 2025, 10:48 AM

#

It can't do many-numbered dice pip prompts. Not AGI here anyway. This is basically a slightly more capable version of Flux.

bitter hearth Apr 11, 2025, 10:50 AM

#

HiDream looks quite a bit better than Flux to me

#

especially in fine details

dry wave Apr 11, 2025, 10:51 AM

#

I mean, it has more parameters 🤷‍♂️ I would like to see how Flux would perform when replacing T5 with a more powerful text encoder

#

from the architecture I found HiDream very disappointing and wasteful

fathom merlin Apr 11, 2025, 10:52 AM

#

bitter hearth there are different types of ODE solvers if you are looking just within the cate...

oh
so for a fair comparison, I'd look at say dpm++2m and dpm++3m?

icy drift Apr 11, 2025, 10:52 AM

#

bitter hearth HiDream looks quite a bit better than Flux to me

Actually it has banding in all these images, like Flux gives at res >= 2048. I'm not sure HiDream is even usable at all because of that. I'm really hoping that's just the Comfy node.

dry wave Apr 11, 2025, 10:53 AM

#

what resolution do you use? I think HiDream has a max resolution of 1024x1024

icy drift Apr 11, 2025, 10:55 AM

#

dry wave what resolution do you use? I think HiDream has a max resolution of 1024x1024

So far I've only tried the resolutions HiDream used in their python scripts in their official repo, although I plan to test large image gen later for things like duplication. I've tried Euler and UniPC, although the Comfy versions might not be the same as the versions they're using in their repo. (Part of why I'm holding out hope the banding will go away.)

#

Wow that is without a doubt the best prompt adherence I've seen so far. This is a 1-shot.
From left to right: An old man, a little girl, and an old woman are sitting on a park bench. The old man on the left is Chinese with gray hair and a green jacket and he is asleep with his eyes closed. The little girl in the middle is Russian with black hair and she is laughing happily and wearing a yellow sundress. The old woman on the right is Native American and has faded red hair and is wearing t-shirt and jeans, and is looking down at the smartphone she is texting on in her hands. The scene is brightly lit outdoors.
I think the girl might have come out a little more Chinese than Russian though. But Mongolian sort of blends between the two, so it's not too wrong.

bitter hearth Apr 11, 2025, 11:04 AM

#

fathom merlin oh so for a fair comparison, I'd look at say dpm++2m and dpm++3m?

these are both multistep which complicates things
if you want easy comparison then compare euler to heun
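The Euler vs Heun comparison can be tried on a toy ODE with a known solution, outside any diffusion code. A minimal sketch, assuming the test problem dy/dt = -y on [0, 1] (chosen purely for illustration):

```python
import math


def euler(f, y0, t0, t1, n):
    """First-order explicit Euler: one evaluation of f per step."""
    h = (t1 - t0) / n
    t, y = t0, y0
    for _ in range(n):
        y = y + h * f(t, y)
        t += h
    return y


def heun(f, y0, t0, t1, n):
    """Second-order Heun (explicit trapezoid): two evaluations of f per step."""
    h = (t1 - t0) / n
    t, y = t0, y0
    for _ in range(n):
        k1 = f(t, y)
        k2 = f(t + h, y + h * k1)
        y = y + 0.5 * h * (k1 + k2)
        t += h
    return y


def decay(t, y):
    return -y


exact = math.exp(-1.0)
for n in (5, 10, 20, 40):
    euler_err = abs(euler(decay, 1.0, 0.0, 1.0, n) - exact)
    heun_err = abs(heun(decay, 1.0, 0.0, 1.0, n) - exact)
    print(f"steps={n:>2}  euler_err={euler_err:.2e}  heun_err={heun_err:.2e}")
```

Halving the step size roughly halves the Euler error and roughly quarters the Heun error. Heun costs two function evaluations per step, so the fair comparison is at equal evaluation count (for example Heun with 10 steps against Euler with 20).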

icy drift Apr 11, 2025, 11:04 AM

#

Huh, even modding the script, 4k resolution actually fails with an error. I can't even attempt it. 😕 Never seen that before.

#

I guess I'm trying 2048*2048.

bitter hearth Apr 11, 2025, 11:05 AM

#

fathom merlin oh so for a fair comparison, I'd look at say dpm++2m and dpm++3m?

you've gotta learn ODE solving outside of comfy/diffusers though
like get a copy of Julia or Matlab instead

#

diffrax is ok as well

#

by the standards of the computational mathematics community, the code in the AI community is fairly error-prone
so it's better to learn the math separately

#

on the other hand computational mathematics libraries tend to be less optimised in terms of things like CUDA kernels so there are pros and cons

fathom merlin Apr 11, 2025, 11:08 AM

#

I see

bitter hearth Apr 11, 2025, 11:09 AM

#

icy drift Huh, even modding the script, 4k resolution actually fails with an error. I can'...

that's weird, most of the time things let you generate at 4k the image just comes out bad, but it will let you generate anyway

icy drift Apr 11, 2025, 11:10 AM

#

bitter hearth that's weird, most of the time things let you generate at 4k the image just come...

Yeah it's not OOM either.

dry wave Apr 11, 2025, 11:10 AM

#

I remember that in the codebase they check for too large resolutions and reject them. You might have to remove that.

icy drift Apr 11, 2025, 11:11 AM

#

Doesn't matter, 2048 fails spectacularly. Completely unusable above expected resolution (and already banding there, so just forget it). 325 seconds on my PC.

#

From HiDream's python script:

dry wave Apr 11, 2025, 11:14 AM

#

yeah, you have to change the max resolution parameter in the script to generate larger images

#

but I assume they put it in there for a reason 😅

icy drift Apr 11, 2025, 11:15 AM

#

dry wave yeah, you have to change the max resolution parameter in the script to generate ...

As I said, I did that, and got a tensor mismatch error. This might be some weird architecture, or a problem with the comfy node script. I'm sure Comfy will get native support ASAP considering this outranks Flux on that leaderboard everyone uses.

dry wave Apr 11, 2025, 11:15 AM

#

no, the tensor mismatch comes from that

#

you changed the wrong part

#

https://github.com/HiDream-ai/HiDream-I1/blob/main/hi_diffusers/models/transformers/transformer_hidream_image.py#L230

#

it's the max resolution parameter

#

(or the max_seq variable respectively)
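A rough sketch of why a sequence-length cap turns into a resolution cap. The 8x VAE downsampling, 2x2 patching and the `max_seq=4096` default are assumptions for illustration and should be checked against the linked file; the helper names are hypothetical.

```python
def image_tokens(width: int, height: int, vae_downsample: int = 8, patch_size: int = 2) -> int:
    """Number of transformer tokens for an image, under the assumed factors."""
    step = vae_downsample * patch_size
    if width % step or height % step:
        raise ValueError(f"width and height must be multiples of {step}")
    return (width // step) * (height // step)


def fits_max_seq(width: int, height: int, max_seq: int = 4096) -> bool:
    """True if the image's token count stays within the (assumed) max_seq."""
    return image_tokens(width, height) <= max_seq


for size in (1024, 2048, 4096):
    print(size, image_tokens(size, size), fits_max_seq(size, size))
```

Under these assumptions 1024x1024 lands exactly on 4096 tokens, while 2048x2048 needs 16384, so any buffer sized to the cap would no longer match the input.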

icy drift Apr 11, 2025, 11:18 AM

#

dry wave no, the tensor mismatch comes from that

Well, it was also completely unusable at 2048, so I'm not going to pursue it any further.

bitter hearth Apr 11, 2025, 11:19 AM

#

icy drift Doesn't matter, 2048 fails spectacularly. Completely unusable above expected res...

accidental horror/comedy lol
TBH this still looks better than the way SDXL will error if the resolution is too high

#

base flux was not particularly great either above 1560x1560

#

there are fine tunes that take flux to 2560x2560 but they have some de-distill in them

icy drift Apr 11, 2025, 11:21 AM

#

bitter hearth accidental horror/comedy lol TBH this still looks better than the way SDXL will ...

Yeah Flux too. I think the best tile controlnet around is still for SDXL but I haven't kept up with it. Mostly been trying to get better video gens lately. Been a while since I tried a new image gen.

bitter hearth Apr 11, 2025, 11:21 AM

#

the SOTA for mirrors is an SD 1.5 or SD 2.1 finetune lol

#

can't remember which

#

they made an entire foundation model just for mirrors

#

SDXL tile controlnet is excellent, I like SD 1.5's one best though

#

https://github.com/Kosinkadink/ComfyUI-Advanced-ControlNet with the softweight node from here

#

as far as I know it lowers the strength per block

icy drift Apr 11, 2025, 11:23 AM

#

I'm doing a last reflection test, and then I need to go.

#

Good, but not exactly what I asked for. Gotta split.

bitter hearth Apr 11, 2025, 11:24 AM

#

my favourite controlnet of all is SD 1.5 with XDoG scribble

#

okay bye
fur details were good

#

if Hi-dream can deliver better small details than flux and be the same in other areas I would still take that trade TBH

dry wave Apr 11, 2025, 11:26 AM

#

please do examples with dogs and not with little girls. That's weird...

bitter hearth Apr 11, 2025, 11:26 AM

#

dogs strongly preferred yeah

#

or cats for that matter

icy drift Apr 11, 2025, 11:26 AM

#

dry wave please do examples with dogs and not with little girls. That's weird...

I didn't ask for a little girl though. Could've said "woman" instead of "girl". Meh. Deleted.

#

The dog's lighting and shadows should be reflected in the mirror, and they're not.

dry wave Apr 11, 2025, 11:27 AM

#

I just say the image is a bit borderline

bitter hearth Apr 11, 2025, 11:27 AM

#

I started running NSFW classifiers in order to force outputs to be cloud-friendly

icy drift Apr 11, 2025, 11:28 AM

#

dry wave I just say the image is a bit borderline

I don't agree or care though. And I'm so gonna be late. Gotta run for real.

bitter hearth Apr 11, 2025, 11:28 AM

#

ok bye

#

on Vast.ai I always assume the docker container is being watched by the host
so I mostly make 1950s city images lol

cinder junco Apr 11, 2025, 11:30 AM

#

Does anyone know if HiDream functions on MPS?

bitter hearth Apr 11, 2025, 11:31 AM

#

not sure, it's always tricky with Apple cos their version of pytorch is missing a ton of functions

cinder junco Apr 11, 2025, 11:31 AM

#

Comfy Manager still doesn't list any custom node for it, and I'm hesitant to install directly from a github.

#

Yeah, I'm running the nightly torch builds, but understand that, in their infinite wisdom, they chose to have thousands of unique operators.

bitter hearth Apr 11, 2025, 11:32 AM

#

since the registry update I stopped using manager, I would actually call installing directly preferable

#

IMO Apple should have improved the OpenVino or Vulkan ecosystems instead of making their own thing

cinder junco Apr 11, 2025, 11:32 AM

#

Yeah, well, I just saw people reporting difficulties and possibly getting their comfy install nuked due to it. Like needing to install Flash Attention (which, as far as I know, doesn't function on MPS).

bitter hearth Apr 11, 2025, 11:33 AM

#

OpenVino in particular has been cooking rly hard lately

#

ah yeah ok I do know that the default Hi-dream workflow requires flash-attention 2

#

cos I was installing flash-attention 2 on a server the other day for that reason

#

if Apple doesn't support that at the moment then that's gonna be an issue potentially

cinder junco Apr 11, 2025, 11:34 AM

#

More like Flash Attention doesn't support MPS. A majority of AI stuff is built purely with nVidia in mind.

bitter hearth Apr 11, 2025, 11:35 AM

#

yeah

#

I've been looking at making a distributed Intel CPU inference engine and it's tricky with the lack of support

cinder junco Apr 11, 2025, 11:36 AM

#

Alright. Maybe I'll try to be patient for a while and see how things play out, rather than getting jealous of people using the new hotness.

#

I'm assuming there won't be any tiled diffusion solutions that work with HiDream for quite a while anyway. I'm not really satisfied with 1MP generations.

#

I could cobble a workflow together using Flux for the upscale, but I'd end up chugging the VM too hard.

bitter hearth Apr 11, 2025, 11:39 AM

#

if you can get openvino working on mac I've been working on a tiled image editing thing for openvino lol

#

it's sort of a joke but it really does have tile counts up to the low millions

#

I found out that the Python PIL package stops working if your image goes above 300k or so because it assumes that the image is malicious (a decompression bomb)

cinder junco Apr 11, 2025, 11:41 AM

#

Each tile being 1MP?

bitter hearth Apr 11, 2025, 11:41 AM

#

LOL in that test each tile was 2 pixels wide and 2 pixels tall

cinder junco Apr 11, 2025, 11:41 AM

#

Heh, okay.

bitter hearth Apr 11, 2025, 11:42 AM

#

but yeah I want each tile to be the size of a proper diffusion image so 512x512, 1024x1024 or 1536x1536

cinder junco Apr 11, 2025, 11:47 AM

#

My tiled workflow for Flux is working well enough that I've thought of going higher (currently 3x scale for ~9MP), but there are some blockers. The second stage is sensitive to the level of detail (and the structure of those details) in the input image it is provided, so I need to do a model upscale with 4xUltraSharp. Otherwise, the 2nd stage result will just be blurry. I don't need to invent a very expensive bicubic scaler. Anyway, I've never seen a node that can do a tiled model upscale in Comfy. If I give the model upscale node a 9MP input image and it scales 4x, I'm going to have some major memory issues. I'd also expect more image consistency/hallucination issues when the ratio between the size of the target image and the tiles increases. I get that even at 9MP when the image has large areas of low detail (like a foggy, overcast scene with few foreground objects).
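For the tiling part of a workflow like this, the tile layout itself is simple to compute. A minimal sketch, assuming square tiles with a fixed overlap and a last row/column pushed flush against the image edge; the function name and the default sizes are illustrative, not taken from any Comfy node.

```python
def tile_boxes(width, height, tile=1024, overlap=128):
    """Return (left, top, right, bottom) boxes that cover the image with overlap."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be >= 0 and smaller than the tile size")

    def starts(size):
        if size <= tile:
            return [0]
        stride = tile - overlap
        positions = list(range(0, size - tile, stride))
        positions.append(size - tile)  # last tile sits flush with the edge
        return positions

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]


boxes = tile_boxes(3072, 3072)
print(len(boxes), boxes[0], boxes[-1])
```

Each box can be processed independently, which keeps peak memory tied to the tile size rather than the full image; the overlapping strips then need to be blended when the tiles are pasted back.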

bitter hearth Apr 11, 2025, 12:07 PM

#

9MP is a pretty good size, I think above that size it's diminishing returns

#

since most people have 4k screens I generally would use 4k as the minimum

#

these nodes work if I remember rightly https://github.com/kinfolk0117/ComfyUI_SimpleTiles

#

with Flux though I tend to use SD 1.5 as the upscaler

#

Flux itself adds less detail

cinder junco Apr 11, 2025, 12:11 PM

#

I'm mostly OK with Flux's details. I find it does well with natural details. The castle is only so-so.

bitter hearth Apr 11, 2025, 12:13 PM

#

this looks rly good for flux yeah

#

definitely above average for flux img

#

the castle is a good example of where flux upscaling goes a bit weird- SD 1.5 would for sure have also boosted the castle detail

#

it feels like flux picks certain objects to not improve lol

cinder junco Apr 11, 2025, 12:16 PM

#

I suspect it is another case of sensitivity to the upscaling model, but don't have proof. I've tried a lot of upscalers but keep coming back to 4xUltraSharp. It just seems to work particularly well with Flux in getting those details. But it definitely has weaknesses and sometimes doesn't generate enough pixel-level detail for Flux to work with.

bitter hearth Apr 11, 2025, 12:17 PM

#

if you can do some pixel-space noise injection that can help

#

as well as noisy sampler

#

problem with noisy sampler is you then tend to need more like 60+ steps

#

which is rough for an upscale pass

cinder junco Apr 11, 2025, 12:18 PM

#

I haven't tried overlaying noise. Flux already seems to like slightly noisy output, so I don't particularly want to encourage that.

bitter hearth Apr 11, 2025, 12:19 PM

#

yea it can be tricky to not have the noise stay in the image

cinder junco Apr 11, 2025, 12:19 PM

#

Not sure what you mean by a noisy sampler. I've settled on bosh3, but it's hard for me to tell if there is an "optimal", let alone what it is.

bitter hearth Apr 11, 2025, 12:19 PM

#

you have the option of doing a third pass to clean up noise with SD 1.5 etc

#

I meant ancestral or SDE

#

bosh3 is nice though

cinder junco Apr 11, 2025, 12:21 PM

#

I'm kind of a model purist and am resistant to going back to SD1.5 😆 .

#

I've heard people claim they found ways of getting ancestral and SDE samplers working with Flux and SD3, but don't know how they accomplished it. I've never found it to work with a normal workflow.

#

I also kind of dislike ancestral samplers because they don't converge, so you have no clue where to stop in pushing the number of steps.

#

I have a natural tendency to min/maxing, so ancestral drives me crazy.
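The non-convergence complaint follows from how an ancestral step is built: each step removes a bit more noise than a plain Euler step would and then adds fresh random noise back. A minimal scalar sketch in the sigma parameterisation used by k-diffusion-style samplers (an assumption here; flow-matching models such as SD3 and Flux may use a different variant), with illustrative function names:

```python
import math
import random


def ancestral_sigmas(sigma_from, sigma_to, eta=1.0):
    """Split the target noise level into a deterministic part and a re-noising part."""
    if sigma_to == 0:
        return 0.0, 0.0
    sigma_up = min(
        sigma_to,
        eta * math.sqrt(sigma_to**2 * (sigma_from**2 - sigma_to**2) / sigma_from**2),
    )
    sigma_down = math.sqrt(sigma_to**2 - sigma_up**2)
    return sigma_down, sigma_up


def euler_ancestral_step(x, denoised, sigma_from, sigma_to, rng, eta=1.0):
    """One scalar Euler ancestral step; `denoised` is the model's clean estimate."""
    sigma_down, sigma_up = ancestral_sigmas(sigma_from, sigma_to, eta)
    d = (x - denoised) / sigma_from        # derivative estimate
    x = x + d * (sigma_down - sigma_from)  # deterministic Euler step down to sigma_down
    return x + rng.gauss(0.0, 1.0) * sigma_up  # fresh noise back up to sigma_to


rng = random.Random(0)
print(ancestral_sigmas(2.0, 1.0))
print(euler_ancestral_step(3.0, 1.0, 2.0, 1.0, rng))
```

Because new noise is drawn at every step, changing the step count changes which noise gets injected, so the output keeps shifting instead of settling. With `eta=0` the step reduces to plain Euler.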

bitter hearth Apr 11, 2025, 12:25 PM

#

to get SDE working with Flux and SD3 it's just a matter of making sure the variance adheres to the variance of the VP SDE, essentially

#

but it can be tricky in practice to convert from math into code sometimes because different papers use different notation systems

#

you tend to need more like 60+ steps for SDE so if you had less than that then that is why it didn't work well

cinder junco Apr 11, 2025, 12:28 PM

#

I don't know what you mean by "VP".

bitter hearth Apr 11, 2025, 12:28 PM

#

IDK if it's worth getting into the details but it goes back to an old paper, Song et al. 2020

#

https://arxiv.org/abs/2011.13456

arXiv.org

Score-Based Generative Modeling through Stochastic Differential Equ...

Creating noise from data is easy; creating data from noise is generative modeling. We present a stochastic differential equation (SDE) that smoothly transforms a complex data distribution to a known prior distribution by slowly injecting noise, and a corresponding reverse-time SDE that transforms the prior distribution back into the data distrib...

cinder junco Apr 11, 2025, 12:31 PM

#

Not sure I'll be able to parse the paper. I only ever studied DEs at a surface level, and most of the AI-related papers require knowledge of previous papers to understand.

bitter hearth Apr 11, 2025, 12:31 PM

#

ye it's not necessarily needed to go into that level of detail

#

for the most part you can just pick from existing implementations of stuff

cinder junco Apr 11, 2025, 12:32 PM

#

Yeah, but if I need to somehow "match variances" by playing around in Comfy and not having any insight into the math of what it's doing... lol

bitter hearth Apr 11, 2025, 12:33 PM

#

ye this is what I was saying earlier you've gotta learn the math outside of systems like comfy or diffusers

#

and then if needed you can bring what you learnt back in

dry wave Apr 11, 2025, 5:16 PM

#

I found SD 1.5 often adds too much detail at super high resolution

#

if every little spot in your image is super sharp and detailed it looks weird, too

#

but yeah, flux often looks a bit blurry when upscaling. I wonder if anybody tried fine-tuning flux on cropped ultra high resolution images

devout schooner Apr 11, 2025, 6:10 PM

#

cinder junco I've heard people claim they found ways of getting ancestral and SDE samplers wo...

Euler Ancestral and the DPM2 Ancestral ones both "just work" out of the box in a normal KSampler in stock Comfy with the Beta and Normal schedulers, currently

#

For SD3 / 3.5

#

Presumably also Flux

devout schooner Apr 11, 2025, 6:15 PM

#

bitter hearth 3.0 medium was always much stronger yeah

It's just the aesthetics on certain prompts I find
Like anatomy generally is definitely way worse in 3.0, they did improve that a bunch in 3.5 Med
But it seems 3.0 just had a very very different dataset than 3.5 Med or something

bitter hearth Apr 11, 2025, 8:12 PM

#

yeah I never did people in 3.0 just landscape and sci fi

#

I used it a ton until flux release day

#

3.0 was much more photorealistic than 3.5

#

I only jumped to Flux once photorealistic loras/checkpoints arrived

#

the first being RealvisSchnell

#

followed by a bunch that had some de-distill in them

#

I never really used regular Flux so to speak

devout schooner Apr 11, 2025, 10:54 PM

#

bitter hearth 3.0 was much more photorealistic than 3.5

I did figure out how to fix the teapot boat prompt on 3.5 Medium BTW
i'm not sure now it necessarily has anything to do with prompt length (or at least not always), I think it's moreso just the dataset vs 3.0's
2d, 3d, cgi, render, smoke, fog, haze, mist, cartoon, anime, painting, drawing, sketch, illustration, traditional media, watercolor, airbrushed
in the negative gave me this on 3.5 Medium
it's still a bit oil painting esque for the boat area for my tastes, relative to 3.0, but way way more normal looking than before
good to know that broadly negating stuff like that does actually work

forest meadow Apr 11, 2025, 11:36 PM

#

give me a colorful desk

bitter hearth Apr 12, 2025, 6:17 AM

#

it's definitely better
it still has this scratchy details effect that I struggle to get rid of

#

it's as if it needs perturbed attention guidance or something to clean it up

#

with negatives you can boost them a bit by delaying the negative for some steps, sometimes up to like 30-40% of the steps

#

it's different for every prompt so it takes some experimentation

#

essentially negatives seem to work better once the thing you are trying to change has just briefly appeared in the image
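The delayed-negative trick can be sketched as a small change to classifier-free guidance: use the empty-prompt prediction as the unconditional branch for the first part of sampling, then switch to the negative-prompt prediction. The names, the list-of-floats stand-in for model predictions and the 30% default are all illustrative assumptions.

```python
def cfg_combine(cond, uncond, scale):
    """Standard classifier-free guidance: uncond + scale * (cond - uncond)."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


def pick_unconditional(step, total_steps, empty_pred, negative_pred, delay_fraction=0.3):
    """Use the empty-prompt prediction until the delay has passed, then the negative."""
    start_step = int(total_steps * delay_fraction)
    return negative_pred if step >= start_step else empty_pred


total_steps = 10
for step in range(total_steps):
    cond = [1.0, 2.0]       # stand-in for the positive-prompt prediction
    empty = [0.0, 0.0]      # stand-in for the empty-prompt prediction
    negative = [0.5, -0.5]  # stand-in for the negative-prompt prediction
    uncond = pick_unconditional(step, total_steps, empty, negative, delay_fraction=0.5)
    print(step, cfg_combine(cond, uncond, scale=4.0))
```

In a real sampler the three predictions would come from the model at each step; only the choice of unconditional branch changes.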

#

it's swings and roundabouts cos some of the details are excellent like at the base here

#

in the diffusion models there is a clear trade-off between big and small details (this is what FreeU is about)
SD3.5 won't have the same FreeU mechanics but maybe there is a similar trade-off

craggy crest Apr 12, 2025, 7:31 AM

#

bitter hearth in the diffusion models there is a clear trade-off between big and small details...

FreeU is specific to UNet, which has skip connections between layers that can be tweaked. SD 3.5 uses MMDiT, which is a totally different architecture. However, dango did do some tweaking in this workflow

📎 SD3.5M_SLG_example_workflow.json

bitter hearth Apr 12, 2025, 7:34 AM

#

thanks I will try this one

#

yeah I miss the skip connections

craggy crest Apr 12, 2025, 7:53 PM

#

bitter hearth yeah I miss the skip connections

yeah, but mmdit doesn't have those :(

bitter hearth Apr 12, 2025, 8:06 PM

#

sadcat

errant dust Apr 12, 2025, 11:50 PM

#

devout schooner I did figure out how to fix the teapot boat prompt on 3.5 Medium BTW i'm not sur...

The only issue with this image is that the AI made the rocks and water in the bottom distort, giving the impression it is painted outside

devout schooner Apr 13, 2025, 12:34 AM

#

bitter hearth sadcat

honestly these boat attempts I posted here were with DPM++ 2M SGM Uniform and I think no Skip Layer Guidance

#

in general I find that using Skip Layer Guidance along with ClownsharkBatwing's RES4LYF samplers produces WAY better results

#

Euler Ancestral also "just works" in stock Comfy for SD 3.5

#

and doesn't have nearly as much of that grainy look, particularly with the Normal scheduler

#

relative to Euler

#

TLDR as I've said before a big problem with almost all these newer models is that the default samplers recommended are nearly always super mediocre ones that nobody would ever use if they didn't have to

devout schooner Apr 13, 2025, 12:37 AM

#

errant dust The only issue with this image is that the AI made the rocks and water in the bo...

it's like very close still to exploding into melty everything-is-very-greyness though
which is a distinct problem of SD 3.5, both Large and Medium
it wasn't so much of a thing in the original 3.0

#

might be a captioning problem or something

#

it seems like there's excessive bleed of extremely painterly traditional media data into basically all gens unless you negate it

#

or something like that

#

that's the best theory i have

#

like if any significant number of the captions just said like "a man beside a tree"
instead of "a painting of a man beside a tree"
or "a photo of a man beside a tree"
then that'd be the problem
if there was a lot of art data without any particular categorization

#

i think
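
One rough way to test that theory, if you had access to a caption set, would be to count how many captions never name a medium. A minimal sketch; the keyword list and the `lacks_medium` helper are assumptions for illustration, not anything from SAI's pipeline:

```python
# Hypothetical keyword list; a real check would need a much broader vocabulary.
MEDIUM_WORDS = ("painting", "photo", "photograph", "drawing", "sketch",
                "illustration", "watercolor", "render")

def lacks_medium(caption):
    """Return True if the caption never names a medium (photo, painting, ...)."""
    text = caption.lower()
    return not any(word in text for word in MEDIUM_WORDS)

# The three example captions from the messages above.
captions = [
    "a man beside a tree",
    "a painting of a man beside a tree",
    "a photo of a man beside a tree",
]

# Captions with no medium word are the ones that could blur art into photos.
unlabeled = [c for c in captions if lacks_medium(c)]
print(unlabeled)
```

If a large share of art images landed in the `unlabeled` bucket, that would be consistent with painterly style leaking into prompts that never ask for it.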

errant dust Apr 13, 2025, 12:39 AM

#

I don't know what you mean, but aside from that oddity it looked great

devout schooner Apr 13, 2025, 12:46 AM

#

errant dust I don't know what you mean, but aside from that oddity it looked great

this is the same prompt on the original SD 3.0 Medium, with a particular seed and the increment of that seed
it looks just, normal by basically any metric
as I'm pretty sure most people would expect

#

this is the same seed and same increment, on SD 3.5 Medium

#

note how the entire image is distinctly hazy and grey in 3.5

#

and the line resolution for small details is just worse

#

and this is WITH the negative prompt I mentioned before (for both the 3.0 versions and 3.5 versions)

#

I sincerely doubt this was intentional

#

it looks objectively worse

errant dust Apr 13, 2025, 12:48 AM

#

I definitely think the 3.5 is better overall

devout schooner Apr 13, 2025, 12:48 AM

#

the model is yes

#

way better anatomy and such

#

but the grey haze bleeding into EVERYTHING is incredibly annoying

errant dust Apr 13, 2025, 12:49 AM

#

in the above images

#

what was the prompt?

devout schooner Apr 13, 2025, 12:53 AM

#

positive:
a photograph showcasing an intricately crafted glass teapot, featuring a detailed, miniature scene inside. The teapot is made of clear glass with ornate, golden details on its lid and base, giving it an elegant, antique appearance. Inside the teapot, a serene seascape is meticulously painted, depicting a turbulent ocean with white, foamy waves crashing against rocks. A majestic, wooden sailing ship with two tall masts and white sails is navigating through the turbulent sea. The ship is depicted in warm, earthy tones of brown and white, standing out against the cool blues and whites of the ocean. The sea is rendered in realistic detail, with waves crashing against the glass, creating a sense of movement and depth. The rocks in the foreground are textured and detailed, adding to the immersive miniature scene. The scene is illuminated by a warm, golden light, possibly from the flame of a candle or a lamp, visible in the background. This light source casts a soft glow, enhancing the golden accents on the teapot and adding warmth to the cool blue tones of the sea. The background features a blurred, cozy indoor setting with a wooden table and a single, large, orange candle flame casting a warm, inviting ambiance.

negative (used for both, although it's only really necessary or at least helpful with 3.5 Medium, 3.0 Medium doesn't need or benefit from it):
2d, 3d, cgi, render, smoke, fog, haze, mist, cartoon, anime, painting, drawing, sketch, illustration, traditional media, watercolor, airbrushed

Sampler was DPM++ 2M SGM Uniform (no fancy RES4LYF stuff for the sake of the examples), CFG 5.5, 25 steps
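
For context on why the negative matters at CFG 5.5: in classifier-free guidance the negative prompt's prediction takes the place of the unconditional one, and the result is pushed away from it and toward the positive prediction. A minimal sketch with made-up toy numbers standing in for real model predictions (`cfg_combine` is a hypothetical name, not a ComfyUI function):

```python
def cfg_combine(pos_pred, neg_pred, cfg_scale):
    """Classifier-free guidance: start at the negative prediction and move
    toward the positive one, scaled by cfg_scale."""
    return [n + cfg_scale * (p - n) for p, n in zip(pos_pred, neg_pred)]

# Toy values only; real predictions are latent-sized tensors.
pos = [1.0, 2.0]   # prediction conditioned on the positive prompt
neg = [0.0, 2.0]   # prediction conditioned on the negative prompt

guided = cfg_combine(pos, neg, cfg_scale=5.5)
print(guided)
```

Wherever the two predictions agree nothing changes, and wherever they differ the gap gets amplified, which is how "haze, fog, painting" in the negative can steer the output away from those traits.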

#

without that negative to push away the haziness, 3.5 Medium produces absolute garbage like this, with everything else the same:

#

whereas 3.0 Medium always looks normal / proper and doesn't have the greyness issue at all

errant dust Apr 13, 2025, 12:55 AM

#

Normal how? The lighting is all wrong

devout schooner Apr 13, 2025, 12:56 AM

#

errant dust Normal how? The lighting is all wrong

normal as in it does not literally look like the entire room is filled with smoke or fog lol

#

and as in the lines aren't nearly as much of an utter mess

errant dust Apr 13, 2025, 12:57 AM

#

instead of soft candle light it is high contrast with bright colors

#

the opposite of the prompt

devout schooner Apr 13, 2025, 12:58 AM

#

errant dust instead of soft candle light it is high contrast with bright colors

if the people who made the model were actually that aesthetically blind then that would explain everything I guess
but like
THIS is a fine take of soft candle light from 3.0

errant dust Apr 13, 2025, 12:58 AM

#

no

#

the entire image is supposed to be the result of candle light

#

not just the candle

#

3.0 is all wrong

devout schooner Apr 13, 2025, 12:59 AM

#

I mean we clearly disagree but this is semantics
this is a VERY real problem that SD 3.5 Medium has but 3.0 didn't
3.5 Medium VERY regularly produces images with a ridiculous, excessive grey haze across the entire image, in cases where you could not possibly argue it makes sense
and terrible resolution of lines for small details
unless you use negatives and better samplers
3.0 Medium had a lot of issues but it didn't have ones like that

errant dust Apr 13, 2025, 1:00 AM

#

this is not semantics. 3.0 looks like a room with electric lights

devout schooner Apr 13, 2025, 1:01 AM

#

even putting the haze aside
the teapot looks like absolute butt

#

in this no negative 3.5 Medium version

#

it really looks like it desperately wants to make it an oil painting

#

and not photorealistic

errant dust Apr 13, 2025, 1:01 AM

#

devout schooner Apr 13, 2025, 1:01 AM

#

that looks objectively better

#

it's not painterly

errant dust Apr 13, 2025, 1:02 AM

#

no way those reflections on the glass are from candle light

#

they come from bright electric lighting

devout schooner Apr 13, 2025, 1:03 AM

#

I mean I don't think this conversation is going anywhere useful
this is a blurry, hazy mess that looks like a painting when it should not, any way you cut it:

#

the nitpicks about lighting are not relevant

errant dust Apr 13, 2025, 1:03 AM

#

no?

devout schooner Apr 13, 2025, 1:04 AM

#

errant dust no?

yes
if you think that looks "good" this conversation is as pointless as i thought

#

literally nobody wants that output from that prompt
i promise you

errant dust Apr 13, 2025, 1:04 AM

#

then I guess the prompt is irrelevant too

devout schooner Apr 13, 2025, 1:05 AM

#

nothing in the prompt says "literally add extreme fog EVERYWHERE, be sure that the lines are horribly resolved, make everything as blurry and foggy as possible"

#

which was the end result

#

that is the only issue I care about here

#

i don't know why you're nitpicking the other stuff

errant dust Apr 13, 2025, 1:05 AM

#

since it says in detail it is supposed to be low soft light from candles

devout schooner Apr 13, 2025, 1:06 AM

#

that does not look like candlelight
it's a problem that 3.5 Medium has even for prompts that don't even mention ANYTHING about light

errant dust Apr 13, 2025, 1:06 AM

#

which impacts everything

devout schooner Apr 13, 2025, 1:07 AM

#

I will give you numerous examples if you want
it looks like butt
nobody wants "realistic" gens to come out like that
I assure you
and part of this IS definitely caused just by too long prompts
but it's not entirely
as 3.0 Medium was simply not as impacted by it

#

it's almost certainly related to poorly captioned art data somewhere in the 3.5 Medium dataset, I think

#

partially at least

errant dust Apr 13, 2025, 1:09 AM

#

I will assume you already polled everyone

#

Almost, since I preferred the 3.5 output

devout schooner Apr 13, 2025, 1:11 AM

#

errant dust I will assume you already polled every one

I have seen many people say variations of "why the hell is it so grey?" about 3.5 Large and Medium
never seen any opposite opinion expressed until now
I shouldn't have to fight to get 3.5 Medium to produce non-2d-or-painterly-in-any-way outputs
is the overall point
and indeed you didn't really have to do that with 3.0
despite the other flaws it had

errant dust Apr 13, 2025, 1:14 AM

#

I find this image to be far more realistic looking than the 3.0 counter samples you shared.

devout schooner Apr 13, 2025, 1:23 AM

#

errant dust I find this image to be far more realistic looking than the 3.0 counter samples ...

that was the one I gave as a better example aided by the negative yeah
but the colors across the entire image are still way too dull and grey for what I'd want, in a way that doesn't really make sense
and the lid / bottom of the teapot as well as the boat are very clearly wanting to be paintings instead of realistic based on how messy and poorly resolved the lines are

errant dust Apr 13, 2025, 1:24 AM

#

The colors reflect the lighting

#

Here is your prompt without all the insistence of candles

#

devout schooner Apr 13, 2025, 1:28 AM

#

what's the exact prompt for this version?

errant dust Apr 13, 2025, 1:29 AM

#

a photograph showcasing an intricately crafted glass teapot, featuring a detailed, miniature scene inside. The teapot is made of clear glass with ornate, golden details on its lid and base, giving it an elegant, antique appearance. Inside the teapot, a serene seascape is meticulously painted, depicting a turbulent ocean with white, foamy waves crashing against rocks. A majestic, wooden sailing ship with two tall masts and white sails is navigating through the turbulent sea. The ship is depicted in warm, earthy tones of brown and white, standing out against the cool blues and whites of the ocean. The sea is rendered in realistic detail, with waves crashing against the glass, creating a sense of movement and depth. The rocks in the foreground are textured and detailed, adding to the immersive miniature scene.

#

same samplers

devout schooner Apr 13, 2025, 1:47 AM

#

errant dust a photograph showcasing an intricately crafted glass teapot, featuring a detaile...

yeah it's definitely an improvement
trying it on SD 3.0 too though still gives like, noticeably cleaner / crisply resolved lines throughout

#

and I still find the 3.5 Medium version to be overly grey and dull-looking

#

I think one of the people who work for SAI has even said that the 3.5 Medium dataset was more art focused too, so my suspicions about rogue captions are probably at least semi-accurate