#🆕|sd3

dry wave
#

yes, because they messed up the positional embedding

#

in the end you have to train all architectures on different resolutions. There won't be a solution where the model can extrapolate to any resolution

#

but convolutional architectures have the problem that their receptive field is fixed to a specific size. So they won't be able to generalize to much higher resolution without you having to change the architecture

#

on the other hand: who cares? I think 1-2 megapixels are more than enough and having flexibility in the aspect ratio is way more important

bitter hearth
devout schooner
bitter hearth
#

I sometimes see strange attention issues, like very small objects or collapsed structure

dry wave
#

if you train on super high resolutions you can either
a.) downsize the image
b.) crop the image

#

I think earlier SD versions did strategy b.). They random-cropped the image

#

so they learned how to denoise extremely zoomed in cropped tiles of a high resolution image

#

later SD variants rather used strategy a.). They only cropped the image as far as needed to fit their aspect bucket and otherwise downsized it
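A minimal sketch of the "downsize, then crop only to fit the bucket" idea. The bucket list, the 64-pixel step and the function names are assumptions made up for illustration, not the actual SD training code:

```python
import math


def make_buckets(target_pixels=1024 * 1024, step=64, max_ratio=2.0):
    """Enumerate (width, height) buckets near a target pixel count (hypothetical scheme)."""
    buckets = []
    width = step
    while width <= int((target_pixels * max_ratio) ** 0.5):
        # Largest height (multiple of `step`) that keeps the area under the target.
        height = (target_pixels // width) // step * step
        if height >= step and max(width / height, height / width) <= max_ratio:
            buckets.append((width, height))
        width += step
    return buckets


def nearest_bucket(width, height, buckets):
    """Pick the bucket whose aspect ratio is closest (in log space) to the image's."""
    ratio = math.log(width / height)
    return min(buckets, key=lambda b: abs(math.log(b[0] / b[1]) - ratio))


def resize_then_crop(width, height, bucket):
    """Downsize so the image just covers the bucket, then center-crop the excess."""
    bw, bh = bucket
    scale = max(bw / width, bh / height)
    sw = max(bw, round(width * scale))
    sh = max(bh, round(height * scale))
    left, top = (sw - bw) // 2, (sh - bh) // 2
    return (sw, sh), (left, top, left + bw, top + bh)
```

For a 4000x3000 photo this picks a 1152x896 bucket and trims only a thin strip on each side, so the caption still describes almost the whole image.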

bitter hearth
#

this might explain why SD and SDXL are weirdly good at tiled upscaling

dry wave
#

the reason: if you train on cropped images, you lose the alignment between text and image (e.g. if your prompt is "image of a man with sun glasses" and the cropped image is just some part of the street)

#

also sometimes the models created cropped images (e.g. headless people) which was a result of the cropped training

bitter hearth
#

its probably better not to have crops in the base model yeah

#

I noticed flux doesn't actually have close up textures in its data

#

like if you try close up of trees or rocks

#

SD and SDXL know it but flux doesn't know what to do

dry wave
#

maybe having a diffusion model without text that is used for upscaling? Dunno. All the upscaling models so far are very small and trained on smaller datasets

bitter hearth
#

not sure what is best for upscaling
every week about 20 different arxiv papers all claim SOTA

#

which clearly means most of the SOTA claims are not right

#

and they cherry pick the comparisons so they make the other methods look bad

devout schooner
bitter hearth
#

there are dedicated upscaling models these days that are stronger

#

they lose the ability to generate images "normally" though

dry wave
# bitter hearth I wasn't sure if it had been confirmed or not what caused some of the issues SD3...

SD3 is using relative positioning where the (0,0) coordinates are always in the center of the image.
Flux is using absolute positioning where the (0,0) coordinates are in the top left corner of the image.
In theory the SD3 thing makes sense, but it seems not to work. Since the Flux devs are the same people who developed SD3, I'm pretty sure they changed to the simpler positioning scheme for a good reason.
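A toy illustration of the two conventions as described above. The grid functions are made up for this sketch and are not taken from either codebase:

```python
def top_left_grid(height, width):
    """Token coordinates with (0, 0) at the top-left corner."""
    return [[(row, col) for col in range(width)] for row in range(height)]


def centered_grid(height, width):
    """Token coordinates with (0, 0) at (roughly) the image center."""
    row0, col0 = height // 2, width // 2
    return [[(row - row0, col - col0) for col in range(width)] for row in range(height)]


if __name__ == "__main__":
    for size in (4, 6):
        print(size, "corner token:",
              top_left_grid(size, size)[0][0], "vs", centered_grid(size, size)[0][0])
```

Going from a 4x4 to a 6x6 token grid, the top-left scheme leaves the corner token at (0, 0), while the centered scheme moves it from (-2, -2) to (-3, -3), so every edge of the larger image lands on coordinates the model has not seen at the smaller size.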

dry wave
bitter hearth
#

I see thanks, I wasn't aware about the relative/absolute positioning thing

devout schooner
past cipher
#

Using the toys, I know for a fact SD 1.5 can gen to at least 1920x1080 (or vice versa), and that SDXL can gen up to at least 3840x2160 (or vice versa)

#

Talking about pure generation, no upscaling required.

#

I don't get to test it much because I'm only on 8GB of VRAM though

dry wave
#

there are these kinds of hacks where you subsample the latents in the unet

#

a normal unet cannot do that

#

not if it's not trained on these high resolutions

#

the problem with these hacks is: you don't really need them. You could just do the normal upscaling workflow (lowres generation, upscaling, img2img)

past cipher
#

I could do that, but I will always prefer a one model solution without upscaling. Just to see what things are capable of.

dry wave
#

yes, but its not capable of that

#

all these "solutions" are also doing something similar, like shrinking the latent representation of the image internally

#

convolution has a fixed receptive field. It just cannot generate images in arbitrary resolutions

past cipher
dry wave
#

it cannot extrapolate xD That's what I say

#

like it can "extrapolate" in the sense that the image has same size and is just expanded infinitely. Like outpainting

#

but it cannot make the resolution arbitrarily high. So to speak: there is a maximum size a "human head" can have in the unet architecture. It cannot make it larger

#

so making an image which is a super high resolution close-up of a face will fail

#

you can, however, do a lowres image first, scale it up, do img2img IF the model was trained on high resolution textures

past cipher
dry wave
#

I talk about technical limitations

past cipher
#

Also make sure you're adding "close up portrait"

past cipher
dry wave
#

the SD unet has no positional embedding. The "latent pixels" in your image don't really know "where in the image" they are. This is the responsibility of the convolutional layers: they lay out the image composition by "telling" the pixels where they are (how far from the left corner of the image, how far from the bottom corner and so on) and where they are relative to each other ("these two pixels are neighbours")

#

but the receptive field of convolutions is limited. There is a "maximum range" you can exchange information this way. So if your image is very large, then the pixels in the middle of the image do not know how far away they are from the border. They do not know where in the image they are. They still can exchange information via attention, but as they don't have any absolute positioning information this is really difficult. One pixel on the left side of the image does know that another pixel on the right side is "not close", but it does not know if this pixel is above or below, left or right of the other pixel.
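The "maximum range" can be made concrete with the usual receptive-field recurrence for stacked convolutions. The layer stack below is hypothetical and is not the actual SD unet:

```python
def receptive_field(layers):
    """Receptive field (in input pixels) of a stack of conv layers.

    `layers` is a list of (kernel_size, stride) tuples in input-to-output order.
    """
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # each layer widens the view by (k - 1) * current jump
        jump *= stride             # striding makes later layers see coarser steps
    return rf


if __name__ == "__main__":
    # Hypothetical encoder: three blocks of [3x3 conv, 3x3 stride-2 conv].
    stack = [(3, 1), (3, 2)] * 3
    print("receptive field:", receptive_field(stack), "latent pixels")
```

Whatever the real number is for a given unet, it is a constant of the architecture: making the image larger does not make it grow, which is why a latent pixel far from the border has no convolutional path to the padding that marks the edge.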

bitter hearth
#

yes with reasonable quality there is a pretty low limit with SD and SDXL
they work very well tiled but not in one big tile
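For reference, a rough sketch of how a tiled pass might split a large image into overlapping tiles that each stay at a size the model handles well. The tile size, overlap and function name are assumptions, not any specific extension's implementation:

```python
def tile_boxes(width, height, tile=1024, overlap=128):
    """Return (left, top, right, bottom) boxes covering the image with overlapping tiles."""

    def starts(size):
        if size <= tile:
            return [0]
        stride = tile - overlap
        positions = list(range(0, size - tile, stride))
        positions.append(size - tile)  # last tile is flush with the far edge
        return positions

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]
```

Each box would then go through img2img on its own and the overlaps get blended, so no single forward pass ever exceeds the trained resolution.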

devout schooner
bitter hearth
#

these sorts of numbers, below 1560x1560 are fine yeah

devout schooner
bitter hearth
#

yeah this level is fine, in fact its probably better to use SD 1.5 this way, around the 1 megapixel level

viscid pivot
#

these are native 2k gens (non upscaled) using flux or rather flex trained on a 2k dataset. I couldn't get these results on any of the SD models

#

SD 3.5M, being the only multi-res capable model, completely breaks apart during multi-res training, especially when using resolutions higher than 1.5k

bitter hearth
#

how many images were in your dataset

#

I think things have to go step by step

#

get the model to work well at 1.5k before thinking about 2k

#

at least from what I have seen/read, it could be possible to fix SD3.5's 1.5k generation with something on the scale of a few million images or so

viscid pivot
#

not millions

#

and to make matters even worse, i didn't even do a full finetune, it's only a high rank locon

#

the positional understanding in SD is just very very bad even in medium

past cipher
#

Positional prompts would certainly help in my tests

viscid pivot
#

Meaning if you ask for something in the center, you will have it 4x instead of one large image

#

or if you ask for something in the corner at higher resolutions it will do weird stuff

#

The base Flux models do a lot better in that regard which means their concept of positioning is already better and not as tied to resolutions

bitter hearth
#

with careful settings flux can do 3k

#

regarding SD3.5, I do think it's plausible it will improve its 1.5k ability substantially

#

but yeah my understanding is that the ballpark is million+ images, it may be less than that though

viscid pivot
#

The other huge issue with sd3.5 is the extremely low context size for the TE

#

training for more than 154 tokens already makes the model behave weird, but with over 256 it gets very bad

bitter hearth
#

yeah the attention masking wasn't done so what this means is the model has issues if the text encoder token count goes above a certain amount
this is less concerning because you can just limit prompt tokens

viscid pivot
bitter hearth
#

I think in terms of expectations it might be better off giving up on boosting it to flux level
and instead just try to improve the model a bit within its current performance bracket

viscid pivot
#

Yea i just hope the next SD model meets the expectations

bitter hearth
#

same I'm essentially hoping for SD4 at this point

#

the thing about flux is it has this insanely strong self attention

#

and then flux fill boosts it even further

#

if you are willing to sometimes re-roll seeds then you can use flux fill as your main model instead of flux dev and just outpaint everything

viscid pivot
#

I'm using a de-distilled model of flux schnell as my main. This prevents most of the issues flux dev has, especially in finetuning

bitter hearth
#

yeah that's good

viscid pivot
#

The self attention of flux makes it difficult to train in anything it doesn't know yet and you also break its guidance embedding

bitter hearth
#

I only use flux checkpoints that have de-distilled checkpoints as part of them
the lighting gets so much nicer with CFG

viscid pivot
#

yea exactly, i hate the flux base style

#

flux chin, flux light

bitter hearth
#

yeah I don't like base flux at all I find it unusable

#

but then with photography checkpoints and 3 or so strong photography loras I like it

viscid pivot
#

these are outputs from the locon i trained of flex (the de-distilled model). The lighting and flux like style went away basically instantly at the first few steps

#

they are from very early in training

bitter hearth
#

these are fantastic wow

#

no flux chin or flux skin
and the blur is nice

viscid pivot
#

yea im gonna soon merge the locon to the base model and release it haha

bitter hearth
#

default flux blur can be slightly strange

viscid pivot
#

everything looks like out of a game engine

bitter hearth
#

yeah for sure its the unrealengine look

#

not actual unrealengine but what models seem to think unrealengine is LOL

#

cos actual actual unrealengine can look better than flux base 😄

viscid pivot
#

yea, this was done intentionally for some reason. Their guidance embedding is meant to always default to this kind of look. With some heavy prompt engineering you can get photographic styles but it is very hard

#

and it is even harder to train that out of base flux

#

that's also why person loras work so well, because flux ignores most of the poses, stylistic elements and so on

bitter hearth
#

I looked around on arxiv for other models that had guidance embeddings like that

#

but I couldn't really find any

viscid pivot
#

well that's distillation for you. Flux pro also doesn't use it

bitter hearth
#

the guidance is fun sometimes because what you can do is set it to 1.4 lol

#

guidance 1.4 flux is a wild time

viscid pivot
#

yea its like midjourneys chaos mode

bitter hearth
#

I never used midjourney cos I couldn't get their discord to work

viscid pivot
#

im just using MJ to rip dataset images haha

bitter hearth
#

lol

#

I thought about using flux ultra for that

#

but instead I might just filter HF datasets

viscid pivot
#

I'm using MJ because it basically says do whatever the f you want with the images. I think flux has some kind of no-use-for-training clause or so

bitter hearth
#

particularly if you are doing reward learning / reinforcement learning type fine tuning
you can re-use existing big datasets because you are using them in a different way to before

#

yeah flux probably does

viscid pivot
#

im also using the 512 res for concept transferring

bitter hearth
#

okay nice

viscid pivot
#

even if the images are bad, flex is very good at keeping sh... quality in the 512 resolutions haha

bitter hearth
#

there are more image quality assessment things around now so

#

filtering can be better

#

flux dev hides some weird data in 384x384

#

poses do improve in that res though for some reason

viscid pivot
#

the problem is getting images like that to become realistic looking is very difficult. I had to do lots of trickery to make that work in 1.5k

#

and since i now have the output i can reuse it for self reinforced learning

dusky thistle
#

just implemented regional conditioning with SD35

#

pretty much doubles the token limit too

dusky thistle
#

with SD35L

turbid grotto
devout schooner
#

got a pretty good result for Gun Lady with Clownshark sampler and SLG, on SD 3.5 Med
SD 3.5 does cars particularly well I've noticed

devout schooner
#

CogView kinda mid on the lady one here
rail to nowhere lol
it looks like a distilled model despite apparently not being one also
same as Lumina 2.0
for some reason

#

this SD 3.5 Medium one should be nice
doing it direct 1024x1536
I can see she has the right number of fingers already so we should be good lol
Clownshark saves the day again

#

Kolors Lora ones here
pretty good
just up against the limitations of the VAE as always sadly

devout schooner
devout schooner
#

SD 3.5 Medium really likes prompts that are exactly the average length of regular Florence-2 Large "more detailed caption" mode
that's what this is
a portrait of a young woman with pink hair. She is sitting on a couch with a cityscape in the background. The woman is wearing a black leather outfit with a gold necklace and earrings. She has a pair of sunglasses on her head and is looking off to the side with a serious expression on her face. The lighting is red and blue, creating a futuristic and edgy vibe. The overall mood of the image is dark and mysterious.

viscid pivot
#

hard to put into words, but it always looks a little off

#

it always tries to have the patterns of it too perfect

devout schooner
#

this one's Lumina 2.0 lol
it's weird looking for a non-distilled model
more like actual plastic than Flux

viscid pivot
jagged gate
dusky thistle
dusky thistle
jagged gate
crisp crane
#

/ dream Painting of 2 fairies looking at the camera and smiling,Asian girl's face, full body photo, white wings

devout schooner
#

SD 3.5 Medium vs CogView 4
both 1152x1536
SD still winning in "groundedness" as far as realism goes IMO
I dunno why everyone else is apparently allergic to making a model that just looks normal for that kind of thing

bitter hearth
devout schooner
#

maybe it was just trained on too many Flux gens

#

I guess the timeframe would work for that to be possible

bitter hearth
#

more likely to be DPO I think

#

the different categories of distillation act pretty differently

#

some are very subtle

#

like high step PCM

devout schooner
# bitter hearth got this out of cogview, similar result to you

a photograph of a woman with long, wavy dark hair sitting at a wooden table in a dimly lit coffee shop, holding a teacup in her hands. She is wearing a light blue ribbed long-sleeved shirt, and her expression is calm and contemplative. The background is blurred, but it appears to be a cozy cafe with warm, inviting lighting. The image has a high-quality, cinematic feel, with a focus on the woman's contemplative expression and the warm tones of the setting.
with Kolors photo lora

lethal cape
#

Is there any guide on how to begin?

devout schooner
#

though

#

sadly lol

bitter hearth
#

wan or stepfun

#

as image models

#

is best currently

devout schooner
#

yeah i've heard of people using them like that

bitter hearth
#

I forget the name of the paper but it is something like "inpainting with video priors" it said video models learn stuff like laws of physics better

simple ocean
#

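/generate A modern city park, soft sunlight, lawns and trees, skyscrapers in the background. A white man (30 years old, short light-brown hair, light blue shirt, gray casual trousers, smiling) and a Black man (30 years old, curly black hair, dark green knit sweater, khaki trousers, cheerful smile) stand on a bridge, their shadows merging, with pigeons and a chessboard in the background, illustration style, low-saturation tones.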

jagged gate
jagged gate
crystal notch
viscid mica
#

Today's API service is not stable, always prompting timeout

jagged gate
jagged gate
jagged gate
simple ocean
#

Build an exhibition hall for IQOS electronic cigarettes, with an artistic and high-end style.

dense birch
#

A tall fantasy art panel divided into four vertical sections, each showing the same stylized tree in a different season:

  • Left section: winter theme with vibrant icy-blue leaves, snowflakes, and a dark starry sky
  • Second section: spring theme with bright green leaves, soft glow, and gentle sparkles
  • Third section: summer theme with warm golden-orange leaves, light glow, and shimmering atmosphere
  • Right section: autumn theme with fiery red leaves, falling foliage, and a darker star-filled sky
    Each section seamlessly transitions in color and mood, leaves softly glowing and drifting,
    intricate detail, ultra-detailed, fantasy lighting, digital painting, trending on ArtStation, 8k resolution
icy moon
#

Realistic style surreal visual scene

thick talon
#

hello everyone!

devout schooner
#

I keep changing this absentmindedly instead of denoise for hi-res-fix with Clownsampler lmao

#

cause it starts at 0.5 I guess

viscid mica
#

When will the service be restored?

dusky thistle
devout schooner
# dusky thistle

is there anything else in the incredibly gigantic list of sampler options worth checking out for general use cases lol? or a breakdown of what it all even is anywhere?

dusky thistle
#

it looks like this if you have rgthree, and turn on the setting for nesting folders

#

it really should be part of base comfyui imo

#

cuz yea otherwise it is a HUGE list lol

#

worth checking out is hard to say, it all depends on how fast your model is and your patience i guess lol

#

and what's best for what, it's hard to say... which is why i added so many lol

#

res_8s can be insanely good with sd35m, obviously will be a lot slower but medium is pretty fast soooo... can be viable

#

pretty much the big thing to know from a user perspective is the multistep ones will be fast, and the higher you go with the "s" number the slower it will be, but probably better

devout schooner
dusky thistle
#

i'd imagine it'd be fine either way

#

honestly, i've never used SLG

devout schooner
# dusky thistle honestly, i've never used SLG

these were both the same seed / prompt / settings etc on SD 3.5 Medium with your res_3s and "ModelSamplingAdvanced" in exponential mode
only difference is first was no SLG, second was with
SLG version background is way more coherent, especially the buildings, I think
and it seems to not have the yellowy high-contrast kinda look that SLG usually brings, when I use it with your exponential samplers
so that's a bonus too

dusky thistle
#

yeah that's a pretty big difference

#

what settings are you using with SLG?

devout schooner
#

not having the conditioning thingies there makes a huge difference for reasons I don't understand, even with an empty negative prompt
so that's like the overall best combo of settings I've found

dusky thistle
#

yeah, a blank prompt gets encoded to something different than all zeros

devout schooner
#

the default scale 3.0 is WAY too high it seems for SLG lol, the colors are super wonky, 2.0 is way more reasonable
and the slightly lower default end percent of 0.015 is also a bit worse for whatever reason, at least in my experience

devout schooner
#

Exponential Clownsampling makes some of my SD 3.5 Medium likeness Lora experiments come out way better than I had "ranked" them lol
I may need to re-evaluate like everything
I use these two a lot for testing lora training on new models just cause the models will very often struggle to reproduce the various somewhat unique aspects of how they look
in comparison to other celebs who don't look quite as distinct

craggy crest
devout schooner
#

holy crap
a miniature model of a castle on top of a large mug. The mug is made of stone and has two handles on either side. The base of the mug is covered in moss and rocks, and there is a small waterfall cascading down from the top. The waterfall is surrounded by greenery and there are two small figures standing on the rocks. On the left side of the base, there is an oak tree with yellow leaves. The castle is made up of multiple towers and turrets, and it is lit up with orange and yellow lights. The background is a bookshelf filled with books and other decorative items. The overall mood of the image is magical and whimsical.
Clownsample res_3s (right / second) absolutely steamrolled the DPM++ 2M SGM Uniform output (left / first) on this one lol
with SD 3.5 Medium

#

like it directly made the prompt adherence better
as far as the background

dusky thistle
#

SDE can help a ton with getting style out of loras, and likeness, in my experience

#

it gives the model lots of chances to make little corrections and find its way to a better output

devout schooner
dusky thistle
#

yeah, it def helps

#

everyone kinda gave up and never did anything to get SDE working with rectified flow for whatever reason

#

but that's what the "eta" parameter does

#

if that's at 0.0, it's not SDE, if it's > 0.0 it's SDE
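A toy sketch of what such an eta switch can look like for a rectified-flow model, on a single scalar "latent". It assumes x = (1 - sigma) * clean + sigma * noise and follows the generic ancestral-style construction (step a bit too far, then add fresh noise back); it is not the code of the sampler discussed here, and `rf_step` is a made-up name:

```python
import math


def rf_step(x, denoised, sigma, sigma_next, eta=0.0, noise=0.0):
    """One sampling step from noise level `sigma` down to `sigma_next`.

    eta == 0.0 gives a deterministic Euler (ODE) step; eta > 0.0 re-injects noise.
    `denoised` is the model's current estimate of the clean sample.
    """
    if sigma_next == 0.0:
        return denoised
    # With eta > 0 we first step below sigma_next, to sigma_down.
    ratio = 1.0 + (sigma_next / sigma - 1.0) * eta
    sigma_down = sigma_next * ratio
    alpha_next, alpha_down = 1.0 - sigma_next, 1.0 - sigma_down
    # Noise needed to land back on the sigma_next marginal after rescaling.
    renoise = math.sqrt(max(0.0, sigma_next ** 2 - (sigma_down * alpha_next / alpha_down) ** 2))
    # Deterministic move toward the denoised estimate.
    k = sigma_down / sigma
    x = k * x + (1.0 - k) * denoised
    # Rescale the signal part and add the fresh noise (zero when eta == 0).
    return (alpha_next / alpha_down) * x + renoise * noise
```

With eta at 0.0 the `noise` argument has no effect at all, which matches the "not SDE" case; any positive eta makes each step depend on fresh noise, giving the model repeated chances to correct itself.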

dusky thistle
devout schooner
dusky thistle
dusky thistle
dusky thistle
#

all using stoqio, a finetune of SD35L

dusky thistle
dusty widget
#

Vibrant energy waves pulsating across the cosmos, different frequencies manifesting as different celestial objects.

buoyant mesa
dusky thistle
dusky thistle
craggy crest
#

@spark grove spammer - you might want to block steam links

dusky thistle
dusky thistle
dusky thistle
dusky thistle
dusky thistle
dusky thistle
dusky thistle
jagged gate
dusky thistle
jagged gate
dusky thistle
jagged gate
dusky thistle
dusky thistle
dusky thistle
astral holly
#

Phoebe, the most massive irregular satellite of Saturn.

jagged gate
dusky thistle
dusky thistle
arctic rose
#

tree

rocky coral
#

hello

dusky thistle
vital gazelle
dusky thistle
dusky thistle
dusky thistle
dusky thistle
dusky thistle
dusky thistle
drifting oak
proven pecan
# dusky thistle

Seeing your stuff again makes me wish I could run comfy on my ipad.

stiff drum
#

imagine an online store based on dark-theme UI/UX design selling shampoo, conditioner, texture powder

errant dust
#

I'm imagining it.

#

No, wait, it needs more hair ribbons

#

and more basketballs

dusky thistle
dusky thistle
dusky thistle
dusky thistle
craggy crest
dusky thistle
dusky thistle
exotic atlas
#

modern living room

#

Modern living room includes: TV wall, table and sofa, wall hanging, door frame, decorative lights

dusky thistle
golden wave
#

dog big smart

olive laurel
#

a girl in forest

craggy crest
craggy crest
craggy crest
fathom acorn
#

boy

urban arch
#

Nice Nose Flute! 🙂

desert sapphire
#

/imagin prompt:Generate an attractive FB homepage image for wholesale customization of shoes, clothing, and bags for e-commerce, so that customers can know what product you are making as soon as they come in:: --aspect 16:9 --version 5.2 --quality .5 --stylize

livid prairie
#

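1970s-80s, Yancheng, nostalgic, cultural and creative goods shop, cozy, period feel, old photos, retro posters, hand-painted murals, wooden counter, shelves, cultural and creative products, old objects, green plants, warm yellow lighting, vintage record player, Teresa Teng, old-fashioned bicycle, sewing machine, red-crowned cranes, Père David's deer, paper cutting, embroidery, straw weaving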

boreal heath
#

is this for image generation? help pls

summer ginkgo
violet escarp
#

https://arxiv.org/pdf/2503.10618v1

Although increasing the channel capacity of the VAE generally improves image reconstruction quality, it can inflate the KL divergence, hindering
subsequent diffusion training.

buoyant mesa
# craggy crest

could you post your workflow for that.... my creations in comfyui with SD3.5 are just bad

craggy crest
craggy crest
#

@honest yarrow perfect english - no, oops, it repeated a word that wasn't in the prompt - and this is the most advanced generative AI model right now

honest yarrow
craggy crest
#

you can see i did not tell it to repeat the word pie

#

none of the AI image generators are good at text. they're getting better, but they're still not good. and the only text they are the least bit good at is English. if you want one that is good at any other language, you're going to have to spend the time to research, learn how to create data sets, get a data center, and train it

#

and then work with it over and over and over till you get it working

honest yarrow
# craggy crest

well at least It did type it cursively right. I will check SD 3.5 for Arabic now

craggy crest
craggy crest
# honest yarrow also It is only 8B

i know. flux is stuffed full of stuff it didn't need and then dpo was run on it to ensure that the things people are most likely to want to generate come out nice - and mask all the issues it has

#

it would also be around 8b if they hadn't done that

bitter hearth
#

recently there was a Rombach lecture on youtube and they showed pre DPO flux sample

#

I have mixed feelings about DPO because it works very well for certain models

#

particularly SPO for SD 1.5

#

I think the weaker the model the more it helps, so it was more needed for SD 1.5 sized models than for big 10B+ ones

craggy crest
#

don't spam the channels

bitter hearth
#

it has a high risk of overfitting compared to some other methods

#

there are some more modern similar methods to DPO that address that a bit

#

but its still a risk

hushed cliff
#

Hi dudes. Happy wednesday.
Started tinkering with SD 3.5 medium recently. Very little information on the net about styles. Has anyone found a key to it yet? The artist styles seem to be present, but hard to use in complex prompts, as they dissipate very quickly as the prompt length increases.

silver stratus
#

Modern living room includes: TV wall, table and sofa, wall hanging, door frame, decorative lights

craggy crest
strange lodge
#

"Peaceful landscape with lots of dogs."

bitter hearth
#

specifically no more than exactly 75 tokens, you can check with certain comfy nodes

#

you will sometimes see 77 written for the clip L and clip G size but apparently it ends up being 75 (the 77 includes the start and end tokens)

#

would leave a few more just in case so maybe 70
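A small sketch of that budget, assuming the token ids already come from whichever CLIP tokenizer your UI uses (no tokenizer is called here, and the helper names are made up):

```python
CLIP_CONTEXT = 77      # full CLIP L / CLIP G context length
SPECIAL_TOKENS = 2     # start-of-text and end-of-text tokens
PROMPT_BUDGET = CLIP_CONTEXT - SPECIAL_TOKENS  # 75 usable prompt tokens


def fits_budget(token_ids, margin=5):
    """True if the prompt stays under the budget with a safety margin (e.g. 70 tokens)."""
    return len(token_ids) <= PROMPT_BUDGET - margin


def split_into_chunks(token_ids, budget=PROMPT_BUDGET):
    """Split prompt tokens into budget-sized chunks, as some UIs do for long prompts."""
    return [token_ids[i:i + budget] for i in range(0, len(token_ids), budget)] or [[]]
```

Anything past the first chunk is either truncated or encoded separately depending on the frontend, which is one plausible reason style keywords late in a long prompt seem to fade.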

modern shuttle
#

((cartoonish style), (chibi fantasy)),
main elements:
smiling sun character with straw hat (anthropomorphic sun),
wheat fairy holding scythe (wood-element sprite),
dynamic composition with wind-blown wheat waves (fire-like dynamism),
color palette:
orange sun (Bing fire),
emerald wheat (Yi wood),
light gray clouds (weakened metal element),
avoid deep blue or silver (avoid water and metal)),
text overlay: "庚午匠心" in bold calligraphy (fire-element seal)

fallen stump
dusky thistle
sullen moss
violet escarp
#

I'm finding little info about them beyond what's on their website. They're pretty recent.

bitter hearth
#

there's honestly no reason to use closed these days with flux, stepfun and deepseek

bitter hearth
#

made some Reve img

#

its got good colours

past cipher
# bitter hearth

Did you give it enough steps? The bars on the stairs are wavy instead of vertical.

bitter hearth
frail shoal
past cipher
frail shoal
#

trying to animate with sora but its shit

past cipher
frail shoal
#

don't have good gpu

past cipher
frail shoal
#

because i have 6gb, but i can run flux, if its the same size

past cipher
noble tinsel
#

hello

frail shoal
past cipher
# frail shoal

I really like this one. "I can fix her" mentality going on...

frail shoal
#

dynamic angle, black and white color scheme, monochrome, an artistic depiction of an alluring demon girl with demon wings, surrounded by flames, holding a huge huge fantasy magical sword, green flames, the scene is depicted with a feel of melancholy and angry engrained in the composition, long black hair, medium elf ears and golden fiery glowing eyes, high quality, realistic artist render, digital painting, realistic artist illustration, incredibly absurdres, intricate details, incredibly detailed, perfect lighting, HDR, volumetric lighting, year 2024, high contrast

#

limited colors looks good

dusky thistle
past cipher
# dusky thistle

I've heard of having skeletons in your closet, but a school locker?

craggy crest
past cipher
#

The Advanced Class is taught by Michael Scofield and it's "How to Break Out of a Maximum Security Prison"

craggy crest
stuck yarrow
#

Aesthetic wallpaper inspired by Oriental Five Elements, gold and wood fusion, soft green leaves with intricate designs, shimmering golden branches, interplay of emerald and gold, tension of balance, delicate mist, hidden golden glimmers, minimalist abstract, ideal for mobile, 4k

livid glen
#

generate a girl

drifting oak
past cipher
acoustic mauve
#

Surrealistic illustration of a human face fused with mechanical and organic elements, featuring a steampunk and biomechanical aesthetic. The face includes gears, circuits, veins, and exposed anatomical structures, combined with antique clocks, alchemical symbols, and scientific anatomical details of the brain. The background consists of aged parchment pages filled with handwritten script and scientific diagrams. The artistic style is highly detailed, resembling ink drawing with watercolor shading in gold, red, and blue tones. Dramatic lighting with a worn, vintage paper effect.

snow pivot
#

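Shin Ramyun packets / soft-boiled eggs / sliced luncheon meat / chopped scallions / cheese slices / Korean chili paste scattered on a wooden tabletop, warm-light overhead shot, fresh Japanese-style filter, emphasizing the color contrast of the ingredients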

sullen moss
#

New gen model from OpenAI...

errant dust
#
MIT News | Massachusetts Institute of Technology

A hybrid AI approach known as hybrid autoregressive transformer can generate realistic images with the same or better quality than state-of-the-art diffusion models, but that runs about nine times faster and uses fewer computational resources. The new tool uses an autoregressive model to quickly capture the big picture and then a small diffusion...

past cipher
sullen moss
sullen moss
sullen moss
sullen nacelle
#

anie

drowsy rune
#

generate an animated Spider-Man cartoon holding a cricket bat

jagged gate
vital surge
jagged gate
solemn lichen
#

create a image of a dog

sullen moss
jagged gate
jagged gate
errant dust
sullen moss
merry timber
#

I generated an image with the prompt: ‘Convert the image to a hand-drawn style, keeping all original content unchanged, including the person with twin tails in a floral dress, sitting in a car with a seatbelt, the car interior background, the phone lock screen with the time “13:16”, text “向上滑动以解锁”, notification “微信 4个须知”, and status bar with signal, Wi-Fi, and battery icons, using a soft black-and-white sketch style with clear details.’

robust bolt
#

An ad for a new iconic tracksuit collection from a brand called CASA X MENA photography, colorful modern, art by , Greg Manchess, in the style of , Artstation Pinterest, accent lighting, tracksuit ad , highly detailed intricate details Unreal Engine fine details HDR hyper realistic sharp trending on artstation

coral arrow
#

hi, can I use Chinese?

pliant gale
#

what

errant dust
#

Looks like it is a mad race for big releases. After OpenAI's new model (Dall-E 4?), Ideogram just released their own new spectacular v3.0 and Midjourney is getting ready to release v7 soon

#

Really the Golden Age of AIs

surreal heart
#

redo this low quality flash card to be ready for print and remove the texts

split bramble
errant dust
#

See the 2-minute video above. They may not have called it this, but it is nevertheless their new image generating model

errant dust
#

above that

split bramble
errant dust
#

🙂

next fossil
#

Not Dall-E 4

#

GPT4o Multimodal, LLM + image generation

split bramble
#

I had a chance to try it, but not much before hitting the limit. It's now limited so much that it is essentially unusable.

#

Apparently even paid accounts are heavily rate limited.

dry wave
#

I assume its good but very inefficient

#

maybe good for generating training data ;P

pseudo owl
# dry wave maybe good for generating training data ;P

yeah openai always starts by making the biggest most inefficient model they can and then distill it, quality wise its a really big step up from normal t2i models, even closed source. Imagen/Ideogram/Recraft might have slightly better aesthetics but prompt following is pretty insane. some imgs from it

Now tho, the model already seems considerably worse so looks like they are already distilling it, some open source variant would be nice.

craggy crest
left frost
#

Give me a picture that everyone can like, with a theme of a soul singer named Na Yin Handsome Boy. The background is an immersive and dreamy color sensation, with the character appropriately reduced in size, in the style of a cyberpunk 3D anime

#

#artisan-1 Give me a picture that everyone can like, with a theme of a soul singer named Na Yin Handsome Boy. The background is an immersive and dreamy color sensation, with the character appropriately reduced in size, in the style of a cyberpunk 3D anime

dusky thistle
dusky thistle
dusky thistle
dusky thistle
real terrace
#

amazing images

hallow lion
#

unique style

green warren
#

/image_dream BB King

lone sparrow
#

create a zeppelin with pov angle in the amazon forest passing through smoke fog

civic trail
frail shoal
devout schooner
#

it refuses significantly fewer prompts than any other API-only generator I've ever used, also

craggy crest
inland wagon
#

Fusion of future aesthetics and natural themes, flowing glass texture, refraction effect

muted glade
#

Fantasy warrior

inland wagon
#

#🆕|sd3 Fusion of future aesthetics and natural themes, flowing glass texture, refraction effect

bitter hearth
#

Reve has nice colours and cinematic theme

#

better fine tune than most of civit

#

it cannot do hands

#

faces are ok a certain % of the time

#

structure sometimes messes up as well

#

kinda a medium quality release

devout schooner
# bitter hearth it cannot do hands

I find they're not as good as like, say Flux
but they're also not "bad" by any means really IMO
like they're usually fine
even the worst examples of them are more like, kinda melty looking
or just missing / extra digits

#

Reve has better overall prompt adherence than Flux also IMO, by a noticeable amount

#

it doesn't just go like "nah, I'm gonna do this instead" like Flux tends to sometimes

#

this is Reve on that "Gun Lady" prompt I was doing before, for example

#

it occasionally slightly screws up the gun hand, but not that often, and not in like a ridiculously huge way
gets it right only like a smidgen less often than Flux
but overall looks better always for that sort of gen than stock Flux Dev IMO

bitter hearth
#

ah I saw some octopus hands. like pure SD 1.4 level craziness

#

when I tried Reve

devout schooner
#

but as it is now it's about as good as my pics up there for basically anything usually, with really good prompt adherence
it also is just willing to do more things than other API-only generators, by quite a bit
they have a VLM prompt enhancer button that will very rarely just like output "I can't enhance this."
but you can turn that option off to retain your original prompt anyways
and then beyond that they do the typical blurring thing, but it only seems to apply to gens that had resulted in like full-on explicit NSFW
it doesn't care about any amount of like stomach or whatever as some of them do lol

bitter hearth
#

ah that's unlucky

#

I actually have no idea why people pay money to use censored models when there are like $0.50 H100s everywhere

#

with optimised workflow I made 10,000 Flux dev images for $0.50

#

most people can't or won't optimise that hard but they can get a decent fraction of that number
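
A quick back-of-the-envelope check of the figures quoted above, as a hedged sketch. The per-image cost follows directly from the two quoted numbers; the throughput figure additionally assumes that "$0.50 H100" means $0.50 per GPU-hour, which the messages do not state explicitly.

```python
# Back-of-the-envelope check of the quoted figures.
# ASSUMPTION: "$0.50 H100" means a rental price of $0.50 per GPU-hour.
total_cost_usd = 0.50      # quoted total spend
images = 10_000            # quoted image count
hourly_price_usd = 0.50    # assumed hourly rental price (not stated explicitly)

cost_per_image = total_cost_usd / images
gpu_hours = total_cost_usd / hourly_price_usd
images_per_second = images / (gpu_hours * 3600)

print(f"cost per image: ${cost_per_image:.5f}")
print(f"implied throughput: {images_per_second:.2f} images/s")
```

Under that assumed hourly price, the claim works out to a little under three images per second on a single GPU, which is why "a decent fraction of that number" is the more realistic target for an unoptimised workflow.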

devout schooner
# bitter hearth with optimised workflow I made 10,000 Flux dev images for $0.50

yeah but that's still Flux
there's an argument to be made for these other models that are increasingly more easygoing about what they'll allow
and just like, overall good / better than any free version of Flux, particularly for complex typography and such, especially if your use case is moreso commercial / business related and all that
Reve you don't even have to pay for technically, they give you daily free credits, similarly to Ideogram
only Midjourney has absolutely no free option of any kind, these days

#

in terms of refusals though OpenAI stands alone, and I don't really get why
they're the only ones who still actively block ALL copyrighted characters from anything known to the model, for example
Meta's Imagine doesn't do this
Google Imagen doesn't do this
Reve doesn't do this
Ideogram doesn't do this

#

and so on

#

they're just shooting themselves in the foot relative to every single one of their competitors

#

like, 4o literally will not do a high-quality illustration of Bart Simpson from "The Simpsons".

#

every other model I mentioned will, without hesitation

bitter hearth
#

it's cos OpenAI are an AGI company

#

and the other stuff is secondary

#

whereas for most companies that sell AI products the product is primary

devout schooner
#

i see no upside to that for them

bitter hearth
#

dalle 3 was earlier in time than most

#

its aged so well that it still gets compared

#

but it is pretty old now

#

I mean its now their third best image model as well since they now allow image making using sora

#

and the GPT4o thing

devout schooner
# bitter hearth but it is pretty old now

I hope they publish the tech specs on it at some point
for like the whole pipeline
I've always been curious about how it really compares to newer models as far as all that stuff

bitter hearth
#

yeah for sure

#

something about its prompt following is still the best to this day

#

not in every aspect cos flux etc can hold more objects

#

but it was very responsive

devout schooner
#

the context is definitely not like, THAT long compared to a number of newer models

#

although we don't really know how much fuckery they do with your prompt on the backend

#

I'd also like to know just how good it actually was at photographic gens if they turn off all the stupid bullshit filtering they do that makes every image look like it's trying to imitate the overdone implementation of ambient occlusion from Far Cry 3

bitter hearth
#

on reddit or other you can find

#

jailbroken

#

it was not especially strong

devout schooner
#

i never thought it was much better in a broad aesthetic sense than like, base SDXL

#

the images were never very "high quality"
it could just do sort of more interesting things

bitter hearth
#

yeah that's right

devout schooner
#

no way it has anything more than a 4-channel VAE also

#

i'm pretty sure of that

bitter hearth
#

yeah although I have a different view to most on vaes
I follow the lightningdit paper's idea that our vaes are too good and we need worse ones

#

cos worse vaes are easier to train your diffusion model with

devout schooner
#

if not I don't really think that quality contrast could ever be worth it

#

the only way around that on XL was training on ridiculously high res images with zero JPEG artifacts
and even then it wasn't as good as just having 16 channels

bitter hearth
#

worse as in a smaller vae, fewer channels and/or less depth

#

there will be a size you can upscale to to even out detail differences

devout schooner
#

Flux can kinda-sorta do that

#

but not nearly as often
and at least one of the characters will often look strange

#

Flux can't really resolve the three of them to a "common" style that makes sense the same way, I guess is the gist of it

humble blaze
#

yeah although I have a different view to most on vaes
I follow the lightningdit paper's idea that our vaes are too good and we need worse ones

dry wave
#

maybe 16 channels are too much, but 4 channels are definitely too few

#

I don't understand why they went directly from 4 to 16 instead of doing something in between. On the other hand, they might have just done some evaluations and found 16 channels to be the best

tough creek
bitter hearth
#

they use different variables now than just channel count

#

broadly its just a trade-off between reconstruction and generation

dry wave
#

they always did

#

you usually have a lambda parameter that controls the KL strength. If you would train a vae with normal KL strength your reconstruction error would be too large

bitter hearth
#

ah okay nice

#

I looked a bit into variational inference and KL strength comes up there too lol

dry wave
#

in the original publications they keep it always at 1

#

but for many applications that is just a bit too much
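
The KL-weight trade-off described above can be sketched in a few lines. This is a minimal illustration, not the loss of any particular released VAE: the function names, the squared-error reconstruction term, and the example weight of `1e-6` are all assumptions chosen for illustration.

```python
import math

def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, exp(logvar)) || N(0, 1) ), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))

def vae_loss(x, x_recon, mu, logvar, kl_weight=1.0):
    """Reconstruction error plus a weighted KL term.

    kl_weight plays the role of the "lambda" mentioned above:
    1.0 leaves the two terms unweighted, smaller values favour
    reconstruction over matching the prior.
    """
    # Squared error is an assumed choice of reconstruction term.
    recon = sum((a - b) ** 2 for a, b in zip(x, x_recon))
    return recon + kl_weight * kl_to_standard_normal(mu, logvar)

# Toy numbers, purely illustrative.
x, x_recon = [0.2, 0.8], [0.25, 0.7]
mu, logvar = [0.5, -0.3], [0.1, -0.2]
print(vae_loss(x, x_recon, mu, logvar, kl_weight=1.0))
print(vae_loss(x, x_recon, mu, logvar, kl_weight=1e-6))
```

With the weight at 1 the KL term dominates this toy loss, so an optimiser would trade reconstruction quality for a latent closer to the prior; with a tiny weight the loss is almost pure reconstruction error.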

bitter hearth
#

I think if you distribution match super super hard then it can be too inflexible

#

with KL divergence in general

#

its nice to have a bit of a looser fit

civic trail
cinder junco
random wraith
#

Symbol: A stylized, simplified representation of the Indian peepal leaf, symbolizing knowledge and growth. The leaf can be designed with subtle, interconnected nodes or lines, representing the connection between ideas and research.
Generate a logo for a company called 'anveshana' as described.

Color Scheme: A palette of blues and greens, conveying trust, growth, and harmony. Blues can represent intellectual pursuits, while greens signify growth and innovation.

Typography: A clean, modern sans-serif font with the word "Anveshana" written in a flowing manner, suggesting continuity and exploration.
Meaning: The peepal leaf symbolizes the sacred tree under which knowledge is shared, while the interconnected nodes highlight the collaborative nature of research. The color scheme reinforces the themes of intellectual growth and harmony.

craggy crest
#

Vegetative electron microscopy

craggy crest
sullen moss
#

Almost one year...

bitter hearth
turbid grotto
#

what do you think, is sd4 possible?

#

after sd3.5l they planned to release sd3.5m controlnets too but there are still none, maybe they dropped sd3.5 and moved to other projects?

bitter hearth
#

tensor.art released some in the end

#

as well as a fresh set of distils

#

but not sure why SAI didn't do the first party ones

turbid grotto
#

diversity of sd3.5 is astonishing, if only it weren't for the coherency problems...
maybe this is just how it works? you get either a unique model with bad coherency or an overtuned model with great coherency?

bitter hearth
#

I didn't even know about that SD 3.5 Bokeh model

#

but yea looks like they did a 5m image finetune

bitter hearth
dry wave
#

dunno. I find diversity in Flux larger than in SDXL for example

bitter hearth
#

Flux having low diversity is a myth yeah

#

particularly at low guidance numbers

#

its a much larger neural network than SDXL so it can be expected for the larger network to have a better trade-off

#

I think the trade-off is more for comparing different versions of the same model, e.g. finetunes/distils/CFG levels, more than it is for comparing different models

devout schooner
#

More definitely exist than that

#

Not sure what the "SD 3.5 with no suffix word" category is for TBH

#

Medium is getting a bit more love though

#

Two separate anime finetunes for it now

#

Both looking pretty promising when I tried them

#

RealVis dude also has a WIP Medium finetune on huggingface

devout schooner
#

The actual difference between SD 3.0 and SD 3.5 Medium continues to throw me curveballs also lol
3.0 really is legit objectively better sometimes
Most notably the "everything goes grey and melty" thing that happens with 3.5 Medium when the prompt is overly long actually didn't / doesn't happen nearly as much in 3.0
This is an "all settings same" comparison, 3.0 on left, 3.5 Medium on right
For a super long prompt:

'''a photograph showcasing an intricately crafted glass teapot, featuring a detailed, miniature scene inside. The teapot is made of clear glass with ornate, golden details on its lid and base, giving it an elegant, antique appearance. Inside the teapot, a serene seascape is meticulously painted, depicting a turbulent ocean with white, foamy waves crashing against rocks. A majestic, wooden sailing ship with two tall masts and white sails is navigating through the turbulent sea. The ship is depicted in warm, earthy tones of brown and white, standing out against the cool blues and whites of the ocean. The sea is rendered in realistic detail, with waves crashing against the glass, creating a sense of movement and depth. The rocks in the foreground are textured and detailed, adding to the immersive miniature scene. The scene is illuminated by a warm, golden light, possibly from the flame of a candle or a lamp, visible in the background. This light source casts a soft glow, enhancing the golden accents on the teapot and adding warmth to the cool blue tones of the sea. The background features a blurred, cozy indoor setting with a wooden table and a single, large, orange candle flame casting a warm, inviting ambiance.'''

3.0 looks like most models do for this prompt
3.5 Medium though is like, trying but melting in the process
So I dunno what's going on there lol

proven pecan
#

@devout schooner Well, this is my sd3.5 medium version of your prompt. So it must be ...

#

This is with no CLIP L or CLIP G text

devout schooner
#

My images were generated with workflows that were literally identical except for the model swap BTW
just the default comfy ones for 3.5 / 3.0

proven pecan
#

random seeds 4082523719, 3144246774

bitter hearth
#

and 3.0 large for that matter

bitter hearth
proven pecan
bitter hearth
#

these days I tend to either use pure pytorch/JAX or C++/Rust kernels (when I can) so it didn't really matter that much to me either way

fathom merlin
icy drift
#

Testing HiDream, first result that really blew me away. Just beautiful. "Three antique fantasy potion glass bottles with labels in cursive font are sitting on a rustic wooden bench. The first bottle contains blue liquid and has the label "Mana". The second bottle contains red liquid and has the label "Health". The third bottle contains green liquid and has the label "Stamina". The warm lighting refracts through the liquid in splashes of beautiful color, casting raytraced caustic colors on the table below."
However, that's not cursive but calligraphy, and stamina is misspelled. There's no image2image in comfy at the moment, so I can't refine an image with a second pass.

#

A glass cannon. It took a little more prodding than expected to get a cannon though. The model tends to ignore unexpected words maybe?

#

No text reflection (no model I've used can do this yet, I'm just waiting for the day).

#

Gave me the title, art, and text I asked for, with a slight mistake in the text. (This was a 1-shot, usually in Flux I would do quite a few rolls.)

#

This skin and hair are very believably wet! Is it the best I've seen? Maybe.

bitter hearth
icy drift
#

It can't do many-numbered dice pip prompts. Not AGI here anyway. This is basically a slightly more capable version of Flux.

bitter hearth
#

HiDream looks quite a bit better than Flux to me

#

especially in fine details

dry wave
#

I mean, it has more parameters 🤷‍♂️ I would like to see how Flux would perform when replacing T5 with a more powerful text encoder

#

from the architecture I found HiDream very disappointing and wasteful

fathom merlin
icy drift
dry wave
#

what resolution do you use? I think HiDream has a max resolution of 1024x1024

icy drift
# dry wave what resolution do you use? I think HiDream has a max resolution of 1024x1024

So far I've only tried the resolutions HiDream used in their python scripts in their official repo, although I plan to test large image gen later for things like duplication. I've tried Euler and UniPC, although the Comfy versions might not be the same as the versions they're using in their repo. (Part of why I'm holding out hope the banding will go away.)

#

Wow that is without a doubt the best prompt adherence I've seen so far. This is a 1-shot.
From left to right: An old man, a little girl, and an old woman are sitting on a park bench. The old man on the left is Chinese with gray hair and a green jacket and he is asleep with his eyes closed. The little girl in the middle is Russian with black hair and she is laughing happily and wearing a yellow sundress. The old woman on the right is Native American and has faded red hair and is wearing t-shirt and jeans, and is looking down at the smartphone she is texting on in her hands. The scene is brightly lit outdoors.
I think the girl might have come out a little more Chinese than Russian though. But Mongolian sort of blends between the two, so it's not too wrong.

bitter hearth
icy drift
#

Huh, even modding the script, 4k resolution actually fails with an error. I can't even attempt it. 😕 Never seen that before.

#

I guess I'm trying 2048*2048.

bitter hearth
#

diffrax is ok as well

#

by the standards of the computational mathematics community, the code in AI community is fairly error-prone
so it's better to learn the math separately

#

on the other hand computational mathematics libraries tend to be less optimised in terms of things like CUDA kernels so there are pros and cons

fathom merlin
#

I see

bitter hearth
dry wave
#

I remember that in the codebase they check for too large resolutions and reject them. You might have to remove that.

icy drift
#

Doesn't matter, 2048 fails spectacularly. Completely unusable above expected resolution (and already banding there, so just forget it). 325 seconds on my PC.

#

From HiDream's python script:

dry wave
#

yeah, you have to change the max resolution parameter in the script to generate larger images

#

but I assume they put it in there for a reason 😅

icy drift
dry wave
#

no, the tensor mismatch comes from that

#

you changed the wrong part

#

its the max resolution parameter

#

(or the max_seq variable respectively)

icy drift
bitter hearth
#

base flux was not particularly great either above 1560x1560

#

there are fine tunes that take flux to 2560x2560 but they have some de-distill in them

icy drift
bitter hearth
#

the SOTA for mirrors is an SD 1.5 or SD 2.1 finetune lol

#

can't remember which

#

they made an entire foundation model just for mirrors

#

SDXL tile controlnet is excellent, I like SD 1.5's one best though

#

https://github.com/Kosinkadink/ComfyUI-Advanced-ControlNet with the softweight node from here

#

as far as I know it lowers the strength per block

icy drift
#

I'm doing a last reflection test, and then I need to go.

#

Good, but not exactly what I asked for. Gotta split.

bitter hearth
#

my favourite controlnet of all is SD 1.5 with XDoG scribble

#

okay bye
fur details were good

#

if Hi-dream can deliver higher small details than flux and then be the same in other areas I would still take that trade TBH

dry wave
#

please do examples with dogs and not with little girls. That's weird...

bitter hearth
#

dogs strongly preferred yeah

#

or cats for that matter

icy drift
#

The dog's lighting and shadows should be reflected in the mirror, and they're not.

dry wave
#

I just say the image is a bit borderline

bitter hearth
#

I started running NSFW classifiers in order to force outputs to be cloud-friendly

icy drift
bitter hearth
#

ok bye

#

on Vast.ai I always assume the docker container is being watched by the host
so I mostly make 1950s city images lol

cinder junco
#

Does anyone know if HiDream functions on MPS?

bitter hearth
#

not sure, it's always tricky with Apple cos their version of pytorch is missing a ton of functions

cinder junco
#

Comfy Manager still doesn't list any custom node for it, and I'm hesitant to install directly from a github.

#

Yeah, I'm running the nightly torch builds, but understand that, in their infinite wisdom, they chose to have thousands of unique operators.

bitter hearth
#

since the registry update I stopped using manager I would actually call installing directly preferable

#

IMO Apple should have improved the OpenVino or Vulkan ecosystems instead of making their own thing

cinder junco
#

Yeah, well, I just saw people reporting difficulties and possibly getting their comfy install nuked due to it. Like needing to install Flash Attention (which, as far as I know, doesn't function on MPS).

bitter hearth
#

OpenVino in particular has been cooking rly hard lately

#

ah yeah ok I do know that the default Hi-dream workflow requires flash-attention 2

#

cos I was installing flash-attention 2 on a server the other day for that reason

#

if Apple doesn't support that at the moment then that's gonna be an issue potentially

cinder junco
#

More like Flash Attention doesn't support MPS. A majority of AI stuff is built purely with nVidia in mind.

bitter hearth
#

yeah

#

I've been looking at making a distributed Intel CPU inference engine and its tricky with lack of support

cinder junco
#

Alright. Maybe I'll try to be patient for a while and see how things play out, rather than getting jealous of people using the new hotness.

#

I'm assuming there won't be any tiled diffusion solutions that work with HiDream for quite a while anyway. I'm not really satisfied with 1MP generations.

#

I could cobble a workflow together using Flux for the upscale, but I'd end up chugging the VM too hard.

bitter hearth
#

if you can get openvino working on mac I've been working on a tiled image editing thing for openvino lol

#

its sort of a joke but it really does have tile counts up to the low millions

#

I found out that python PIL package stops working if your image goes above 300k or so because it assumes that the image is malware

cinder junco
#

Each tile being 1MP?

bitter hearth
#

LOL in that test each tile was 2 pixels wide and 2 pixels tall

cinder junco
#

Heh, okay.

bitter hearth
#

but yeah I want each tile to be the size of a proper diffusion image so 512x512, 1024x1024 or 1536x1536

cinder junco
#

My tiled workflow for Flux is working well enough that I've thought of going higher (currently 3x scale for ~9MP), but there are some blockers. The second stage is sensitive to the level of detail (and the structure of those details) in the input image it is provided, so I need to do a model upscale with 4xUltraSharp. Otherwise, the 2nd stage result will just be blurry. I don't need to invent a very expensive bicubic scaler. Anyway, I've never seen a node that can do a tiled model upscale in Comfy. If I give the model upscale node a 9MP input image and it scales 4x, I'm going to have some major memory issues. I'd also expect more image consistency/hallucination issues when the ratio between the size of the target image and the tiles increases. I get that even at 9MP when the image has large areas of low detail (like a foggy, overcast scene with few foreground objects).
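
The tiling and memory concerns above can be made concrete with a small sketch. It assumes a 1024x1024 base image scaled 3x to 3072x3072 (about 9.4MP), 1024-pixel tiles, and a 128-pixel overlap; the overlap value and both function names are hypothetical, and the memory figure covers only a single dense float32 RGB tensor, not the upscaler's internal activations.

```python
import math

def tile_grid(width, height, tile=1024, overlap=128):
    """Return (left, top, right, bottom) boxes covering a width x height image."""
    stride = tile - overlap

    def starts(size):
        if size <= tile:
            return [0]
        n = math.ceil((size - tile) / stride) + 1
        # Clamp the last tile so it ends exactly at the image border.
        return [min(i * stride, size - tile) for i in range(n)]

    return [(x, y, min(x + tile, width), min(y + tile, height))
            for y in starts(height) for x in starts(width)]

def tensor_gib(width, height, channels=3, bytes_per_value=4):
    """Size of one dense image tensor in GiB (float32 RGB by default)."""
    return width * height * channels * bytes_per_value / 2**30

boxes = tile_grid(3072, 3072)
print(len(boxes), "tiles for the 3x stage")
print(tensor_gib(3072 * 4, 3072 * 4), "GiB for one 4x-upscaled output tensor")
```

Even this lower bound shows why a non-tiled 4x model upscale of a 9MP image is uncomfortable: the output alone is well over a gigabyte before any intermediate feature maps are counted.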

bitter hearth
#

9MP is a pretty good size, I think above that size its diminishing returns

#

since most people have 4k screens I generally would use 4k as the minimum

#

with Flux though I tend to use SD 1.5 as the upscaler

#

Flux itself adds fewer details

cinder junco
#

I'm mostly OK with Flux's details. I find it does well with natural details. The castle is only so-so.

bitter hearth
#

this looks rly good for flux yeah

#

definitely above average for flux img

#

the castle is a good example of where flux upscaling goes a bit weird- SD 1.5 would for sure have also boosted the castle detail

#

it feels like flux picks certain objects to not improve lol

cinder junco
#

I suspect it is another case of sensitivity to the upscaling model, but don't have proof. I've tried a lot of upscalers but keep coming back to 4xUltraSharp. It just seems to work particularly well with Flux in getting those details. But it definitely has weaknesses and sometimes doesn't generate enough pixel-level detail for Flux to work with.

bitter hearth
#

if you can do some pixel-space noise injection that can help

#

as well as noisy sampler

#

problem with noisy sampler is you then tend to need more like 60+ steps

#

which is rough for an upscale pass

cinder junco
#

I haven't tried overlaying noise. Flux already seems to like slightly noisy output, so I don't particularly want to encourage that.

bitter hearth
#

yea it can be tricky to not have the noise stay in the image

cinder junco
#

Not sure what you mean by a noisy sampler. I've settled on bosh3, but it's hard for me to tell if there is an "optimal", let alone what it is.

bitter hearth
#

you have the option of doing a third pass to clean up noise with SD 1.5 etc

#

I meant ancestral or SDE

#

bosh3 is nice though

cinder junco
#

I'm kind of a model purist and am resistant to going back to SD1.5 😆 .

#

I've heard people claim they found ways of getting ancestral and SDE samplers working with Flux and SD3, but don't know how they accomplished it. I've never found it to work with a normal workflow.

#

I also kind of dislike ancestral samplers because they don't converge, so you have no clue where to stop in pushing the number of steps.

#

I have a natural tendency to min/maxing, so ancestral drives me crazy.

bitter hearth
#

to get SDE working with Flux and SD3 its just a matter of making sure the variance adheres to the variance of the VP SDE, essentially

#

but it can be tricky in practice to convert from math into code sometimes because different papers use different notation systems

#

you tend to need more like 60+ steps for SDE so if you had less than that then that is why it didn't work well
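
One way to read "making sure the variance adheres" is as bookkeeping on how much noise a sample is supposed to carry after each stochastic step. The sketch below assumes the rectified-flow interpolation `x_t = (1 - sigma) * x0 + sigma * noise` (an assumption about how SD3-style models are parameterised, not something stated in this conversation), and the `eta` convention for choosing `sigma_down` is likewise an assumed one. The function name is hypothetical.

```python
import math

def rf_ancestral_coeffs(sigma, sigma_next, eta=1.0):
    """Coefficients for one stochastic step under x_t = (1 - sigma) * x0 + sigma * noise.

    Returns (sigma_down, rescale, renoise), to be used as:
      1. take a deterministic step from sigma to sigma_down,
      2. multiply the result by rescale,
      3. add renoise * fresh_gaussian_noise.
    """
    # How far below sigma_next the deterministic part goes (assumed convention;
    # eta = 0 disables the stochastic part entirely).
    sigma_down = sigma_next * (1.0 + (sigma_next / sigma - 1.0) * eta)
    alpha_next = 1.0 - sigma_next
    alpha_down = 1.0 - sigma_down
    # Rescaling restores the signal weight expected at sigma_next.
    rescale = alpha_next / alpha_down
    # Noise std left after rescaling is rescale * sigma_down; top it up to sigma_next.
    renoise = math.sqrt(max(sigma_next**2 - (rescale * sigma_down) ** 2, 0.0))
    return sigma_down, rescale, renoise

sigma_down, rescale, renoise = rf_ancestral_coeffs(0.8, 0.6)
total_noise_std = math.sqrt((rescale * sigma_down) ** 2 + renoise**2)
print(sigma_down, rescale, renoise, total_noise_std)
```

The point of the rescale step is that, unlike the older sigma-only schedules, the signal weight also changes with sigma here, so simply adding noise back on top of a deterministic step would leave the sample with the wrong signal-to-noise ratio for the next model call.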

cinder junco
#

I don't know what you mean by "VP".

bitter hearth
#

IDK if it's worth getting into the details but VP stands for "variance preserving" and it goes back to an old paper, Song et al. 2020

cinder junco
#

Not sure I'll be able to parse the paper. I only ever studied DEs at a surface level, and most of the AI-related papers require knowledge of previous papers to understand.

bitter hearth
#

ye it's not needed to go into that level of detail necessarily

#

for the most part you can just pick from existing implementations of stuff

cinder junco
#

Yeah, but if I need to somehow "match variances" by playing around in Comfy and not having any insight into the math of what it's doing... lol

bitter hearth
#

ye this is what I was saying earlier you've gotta learn the math outside of systems like comfy or diffusers

#

and then if needed you can bring what you learnt back in

dry wave
#

I found SD 1.5 often adds too much detail at super high resolution

#

if every little spot in your image is super sharp and detailed it looks weird, too

#

but yeah, flux often looks a bit blurry when upscaling. I wonder if anybody tried fine-tuning flux on cropped ultra high resolution images

devout schooner
#

For SD3 / 3.5

#

Presumably also Flux

devout schooner
# bitter hearth 3.0 medium was always much stronger yeah

It's just the aesthetics on certain prompts I find
Like anatomy generally is definitely way worse in 3.0, they did improve that a bunch in 3.5 Med
But it seems 3.0 just had a very very different dataset than 3.5 Med or something

bitter hearth
#

yeah I never did people in 3.0 just landscape and sci fi

#

I used it a ton until flux release day

#

3.0 was much more photorealistic than 3.5

#

I only jumped to Flux once photorealistic loras/checkpoints arrived

#

the first being RealvisSchnell

#

followed by a bunch that had some de-distill in them

#

I never really used regular Flux so to speak

devout schooner
# bitter hearth 3.0 was much more photorealistic than 3.5

I did figure out how to fix the teapot boat prompt on 3.5 Medium BTW
i'm not sure now it necessarily has anything to do with prompt length (or at least not always), I think it's moreso just the dataset vs 3.0's
2d, 3d, cgi, render, smoke, fog, haze, mist, cartoon, anime, painting, drawing, sketch, illustration, traditional media, watercolor, airbrushed
in the negative gave me this on 3.5 Medium
it's still a bit oil painting esque for the boat area for my tastes, relative to 3.0, but way way more normal looking than before
good to know that broadly negating stuff like that does actually work

forest meadow
#

give me a colorful desk

bitter hearth
#

its definitely better
it still has this scratchy details effect that I struggle to get rid of

#

its as if it needs perturbed attention guidance or something to clean it up

#

with negatives you can boost them a bit by delaying the negative for some steps, sometimes up to like 30-40% of the steps

#

its different for every prompt so it takes some experimentation

#

essentially negatives seem to work better once the thing you are trying to change has just briefly appeared in the image
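
The delayed-negative idea above can be sketched as a small schedule wrapped around standard classifier-free guidance. This is a hedged illustration only: the function names are hypothetical, the 30% delay and the guidance scale of 5.5 are example values, and the lists of floats stand in for real model predictions.

```python
def use_negative(step, total_steps, delay_fraction=0.3):
    """True once the delayed negative prompt should be active (step is 0-based)."""
    return step >= int(total_steps * delay_fraction)

def cfg_combine(cond, uncond, scale):
    """Standard classifier-free guidance: uncond + scale * (cond - uncond)."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]

def guided_prediction(step, total_steps, cond, empty_pred, negative_pred,
                      scale=5.5, delay_fraction=0.3):
    """Guide away from the empty prompt first, then from the negative prompt."""
    active = use_negative(step, total_steps, delay_fraction)
    away_from = negative_pred if active else empty_pred
    return cfg_combine(cond, away_from, scale)

# With 25 steps and a 30% delay, the negative prompt kicks in at step 7.
schedule = [use_negative(s, 25) for s in range(25)]
print(schedule.index(True))
```

Since the best delay differs per prompt, `delay_fraction` is the knob to sweep; a value of 0 reduces this to ordinary negative prompting.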

#

its swings and roundabouts cos some of the details are excellent like at the base here

#

in the diffusion models there is a clear trade-off between big and small details (this is what FreeU is about)
SD3.5 won't have the same FreeU mechanics but maybe there is a similar trade-off

craggy crest
bitter hearth
#

thanks I will try this one

#

yeah I miss the skip connections

craggy crest
bitter hearth
errant dust
devout schooner
#

in general I find that using Skip Layer Guidance along with ClownsharkBatwing's RES4LYF samplers produces WAY better results

#

Euler Ancestral also "just works" in stock Comfy for SD 3.5

#

and doesn't have nearly as much of that grainy look, particularly with the Normal scheduler

#

relative to Euler

#

TLDR as I've said before a big problem with almost all these newer models is that the default samplers recommended are nearly always super mediocre ones that nobody would ever use if they didn't have to

devout schooner
#

might be a captioning problem or something

#

it seems like there's excessive bleed of extremely painterly traditional media data into basically all gens unless you negate it

#

or something like that

#

that's the best theory i have

#

like if any significant number of the captions just said like "a man beside a tree"
instead of "a painting of a man beside a tree"
or a "a photo of a man beside a tree"
then that'd be the problem
if there was a lot of art data without any particular categorization

#

i think

errant dust
#

I don't know what you mean, but aside from that oddity it looked great

devout schooner
#

this is the same seed and same increment, on SD 3.5 Medium

#

note how the entire image is distinctly hazy and grey in 3.5

#

and the line resolution for small details is just worse

#

and this is WITH the negative prompt I mentioned before (for both the 3.0 versions and 3.5 versions)

#

I sincerely doubt this was intentional

#

it looks objectively worse

errant dust
#

I definitely think the 3.5 is better overall

devout schooner
#

the model is yes

#

way better anatomy and such

#

but the grey haze bleeding into EVERYTHING is incredibly annoying

errant dust
#

in the above images

#

what was the prompt?

devout schooner
#

positive:
a photograph showcasing an intricately crafted glass teapot, featuring a detailed, miniature scene inside. The teapot is made of clear glass with ornate, golden details on its lid and base, giving it an elegant, antique appearance. Inside the teapot, a serene seascape is meticulously painted, depicting a turbulent ocean with white, foamy waves crashing against rocks. A majestic, wooden sailing ship with two tall masts and white sails is navigating through the turbulent sea. The ship is depicted in warm, earthy tones of brown and white, standing out against the cool blues and whites of the ocean. The sea is rendered in realistic detail, with waves crashing against the glass, creating a sense of movement and depth. The rocks in the foreground are textured and detailed, adding to the immersive miniature scene. The scene is illuminated by a warm, golden light, possibly from the flame of a candle or a lamp, visible in the background. This light source casts a soft glow, enhancing the golden accents on the teapot and adding warmth to the cool blue tones of the sea. The background features a blurred, cozy indoor setting with a wooden table and a single, large, orange candle flame casting a warm, inviting ambiance.

negative (used for both, although it's only really necessary or at least helpful with 3.5 Medium, 3.0 Medium doesn't need or benefit from it):
2d, 3d, cgi, render, smoke, fog, haze, mist, cartoon, anime, painting, drawing, sketch, illustration, traditional media, watercolor, airbrushed

Sampler was DPM++ 2M SGM Uniform (no fancy RES4LYF stuff for the sake of the examples), CFG 5.5, 25 steps

#

without that negative to push away the haziness, 3.5 Medium produces absolute garbage like this, with everything else the same:

#

whereas 3.0 Medium always looks normal / proper and doesn't have the greyness issue at all

errant dust
#

Normal how? The lighting is all wrong

devout schooner
#

and as in the lines aren't nearly as much of an utter mess

errant dust
#

instead of soft candle light it is high contrast with bright colors

#

the opposite of the prompt

devout schooner
errant dust
#

no

#

the entire image is supposed to be the result of candle light

#

not just the candle

#

3.0 is all wrong

devout schooner
#

I mean we clearly disagree but this is semantics
this is a VERY real problem that SD 3.5 Medium has but 3.0 didn't
3.5 Medium VERY regularly produces images with a ridiculous, excessive grey haze across the entire image, in cases where you could not possibly argue it makes sense
and terrible resolution of lines for small details
unless you use negatives and better samplers
3.0 Medium had a lot of issues but it didn't have ones like that

errant dust
#

this is not semantics. 3.0 looks like a room with electric lights

devout schooner
#

even putting the haze aside
the teapot looks like absolute butt

#

in this no negative 3.5 Medium version

#

it really looks like it desperately wants to make it an oil painting

#

and not photorealistic

errant dust
devout schooner
#

that looks objectively better

#

it's not painterly

errant dust
#

no way those reflections on the glass are from candle light

#

they come from bright electric lighting

devout schooner
#

I mean I don't think this conversation is going anywhere useful
this is a blurry, hazy mess that looks like a painting when it should not, any way you cut it:

#

the nitpicks about lighting are not relevant

errant dust
#

no?

devout schooner
# errant dust no?

yes
if you think that looks "good" this conversation is as pointless as i thought

#

literally nobody wants that output from that prompt
I promise you

errant dust
#

then I guess the prompt is irrelevant too

devout schooner
#

nothing in the prompt says "literally add extreme fog EVERYWHERE, be sure that the lines are horribly resolved, make everything as blurry and foggy as possible"

#

which was the end result

#

that is the only issue I care about here

#

I don't know why you're nitpicking the other stuff

errant dust
#

since it says in detail it is supposed to be low soft light from candles

devout schooner
#

that does not look like candlelight
it's a problem that 3.5 Medium has even for prompts that don't even mention ANYTHING about light

errant dust
#

which impacts everything

devout schooner
#

I will give you numerous examples if you want
it looks like butt
nobody wants "realistic" gens to come out like that
I assure you
and part of this IS definitely caused just by too long prompts
but it's not entirely
as 3.0 Medium was simply not as impacted by it

#

it's almost certainly related to poorly captioned art data somewhere in the 3.5 Medium dataset, I think

#

partially at least

errant dust
#

I will assume you already polled everyone

#

Almost, since I preferred the 3.5 output

devout schooner
# errant dust I will assume you already polled every one

I have seen many people say variations of "why the hell is it so grey?" about 3.5 Large and Medium
never seen any opposite opinion expressed until now
I shouldn't have to fight to get 3.5 Medium to produce non-2d-or-painterly-in-any-way outputs
is the overall point
and indeed you didn't really have to do that with 3.0
despite the other flaws it had

errant dust
#

I find this image to be far more realistic looking than the 3.0 counter samples you shared.

devout schooner
errant dust
#

The colors reflect the lighting

#

Here is your prompt without all the insistence of candles

devout schooner
#

what's the exact prompt for this version?

errant dust
#

a photograph showcasing an intricately crafted glass teapot, featuring a detailed, miniature scene inside. The teapot is made of clear glass with ornate, golden details on its lid and base, giving it an elegant, antique appearance. Inside the teapot, a serene seascape is meticulously painted, depicting a turbulent ocean with white, foamy waves crashing against rocks. A majestic, wooden sailing ship with two tall masts and white sails is navigating through the turbulent sea. The ship is depicted in warm, earthy tones of brown and white, standing out against the cool blues and whites of the ocean. The sea is rendered in realistic detail, with waves crashing against the glass, creating a sense of movement and depth. The rocks in the foreground are textured and detailed, adding to the immersive miniature scene.

#

same samplers

devout schooner
#

and I still find the 3.5 Medium version to be overly grey and dull-looking

#

I think one of the people who work for SAI has even said that the 3.5 Medium dataset was more art focused too, so my suspicions about rogue captions are probably at least semi-accurate