#🆕｜sd3


wide pagoda
#

What about 512x512, does it just generate cropped images?

noble coyote
storm saffron
#

Sigma is a bit of an oddball though, it uses very long captions (300 tokens) rather than the 77 people are used to, so to get the best from PixArt you need to type in a large prompt or expand it with an LLM

#

SD3 still has that 77 token limit though right?

noble coyote
#

T5 is better when natural language is input ...

storm saffron
#

I've been using zephyr7b in comfy

noble coyote
#

In fact T5 was conceived to use natural language

storm saffron
#

Probably shouldn't discuss pixart too much in the SD3 channel of SAI though. Feels weird.

noble coyote
#

But as too few people can get their hands on SD3 - we have to talk about something ... 😄

dull star
#

interesting, Pixart (DiT) had the entire image noisy/distorted when doing a higher res version on any model

#

at least, if I recall correctly

noble coyote
#

I'm @ClipDrop SD3 now ... anybody got an interesting prompt for me to try?! 🙂

#

OK, gone to Openai - Daily Theme for a prompt 😄

dull star
#

Photo of Criminal in a ski mask making a phone call in front of a store. There is caption on the bottom of the image: "It's time to Counter the Strike...". There is a red arrow pointing towards the caption. The red arrow is from a Red circle which has an image of Halo Master Chief in it.

#

ah im late

noble coyote
#

A moody and atmospheric fight scene between stick figures rendered in a realistic digital painting style. The scene is set in a dark alley, illuminated by a single streetlamp casting dramatic shadows. Two stick figures are engaged in a fierce battle, with one delivering a powerful punch knocking the other backwards. The realistic digital painting technique gives the stick figures texture and depth, making them appear almost human-like. The atmospheric lighting and detailed background enhance the emotive intensity of the scene

noble coyote
dull star
#

oh you were the one who tried it

#

I forgor lol

noble coyote
#

A futuristic terrarium with a unique and captivating design. Inside the terrarium, miniature bioluminescent plants and flowers emit a gentle, glowing light, creating an ethereal and otherworldly atmosphere. The plants are arranged in a way that mimics a mystical, enchanted forest. Tiny, intricately detailed fairy houses and pathways are nestled among the foliage, adding a whimsical element. The terrarium's glass is etched with delicate patterns that reflect and enhance the inner glow, making the entire piece a captivating focal point

#

Create a square visual representation inspired by the DIY spirit and featuring prominent use of safety pins, styled as a monochrome street snap photograph from the punk era. The image should focus on a close-up of a single person's face dressed in typical punk fashion, caught in a candid moment as if walking through the streets and looking back over their shoulder. The photograph should emphasize sharp contrasts and capture the gritty, raw aesthetic of punk culture. The person should have a brightly colored mohawk or spiked hair, appearing self-cut and dyed to emphasize the DIY ethos, though rendered in black and white. They should be sticking out their tongue and raising all fingers towards the viewer, conveying a rebellious and defiant attitude. The person should wear dark eye makeup, with black eyeliner and shadow. Accessories such as safety pins used effectively but sparingly as earrings, body piercings, and prominent decorations on their clothing should be visible. The person's eyes should have a nihilistic, vacant expression, emphasizing a sense of emptiness or detachment. The background should feature a gritty urban scene with graffiti-covered walls, capturing elements of street art that align with the punk aesthetic. The monochrome style should highlight the rebellious spirit of punk with intense, sharp contrasts and a raw, edgy feel. The photograph should have a candid, spontaneous feel as if the person was unexpectedly captured while walking through the city and looking back. Add a grainy texture to the photograph to enhance the raw, unpolished aesthetic.

#

This punk prompt is a monster!!!

#

SD3@ClipDrop

#

SD3@ClipDrop prompt = A chubby Shiba Inu named XiaoShuai, standing on two legs with its belly exposed, not holding any items. XiaoShuai is wearing a cute human outfit that includes a brightly colored shirt and shorts. The Shiba Inu has a warm, tan coat typical of its breed, with expressive eyes and a playful expression. Scene in warm tones, realistic style, capturing the essence of a cozy home atmosphere

viral plaza
#

well not perfectly, but, like, relative to what you'd expect for borking the res entirely

pine cedar
#

Is this the 2b model of sd3?

wide pagoda
#

So it does closeups, but with "reasonable" framing

low stone
pine cedar
noble coyote
#

What SD3 does have is visual acuity - poor limbs, fingers and faces (at times) - yet with an enhanced visual acuity.

dapper basalt
#

If Stable Assistant sticks around and stuff, I'm gonna keep it I think. It's an awesome tool to finetune what you are trying to get. For example I had it optimize prompts. It'll be great once it's fully featured. So once weights are dropped and it isn't necessary to burn through credits really quickly, and you can use it as a tool on the side, it'll be worth it.

low stone
noble coyote
dapper basalt
#

SA needs the money and knows people will jump on SAI quicker. Cause of the new tech and brand name

noble coyote
#

Twice the number of prompts and eight times the number of pictures 😄

#

AFK

dapper basalt
#

Yeah true

storm saffron
#

@viral plaza do you know if it's just the standard T5 encoder that everything else uses, or is this one specific to SD3?

viral plaza
storm saffron
storm saffron
viral plaza
#

nice

muted dove
storm saffron
#

Pixart uses the same one at FP16

storm saffron
#

Another one from pix

muted dove
#

Same 😄

noble coyote
muted dove
wild remnant
storm saffron
muted dove
storm saffron
muted dove
#

How? I get AttributeError: 'NoneType' object has no attribute 'get'

#
Loading T5 from 'F:\ComfyUI_windows_portable\ComfyUI\models\t5\t5-xxl-encoder'
!!! Exception during processing!!! 'NoneType' object has no attribute 'get'
storm saffron
#

I'm actually using a BF16 one, oops,

muted dove
#

The clue was still there though, thanks! 🙂

#

Changed path_type from folder to file...now works

storm saffron
muted dove
#

I seem to have all of them! 🤣

#

No wonder I'm running out of disk space

#

I have mt5-xl too

#

15GB

storm saffron
#

You can probably find a smaller xl as well.

muted dove
#

I have flan-t5-xl-encoder-only-bf16, which is 2.5GB
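For a rough sense of why precision matters so much for disk space, here is a back-of-the-envelope sketch. It assumes the file is almost entirely weights, so size is roughly parameter count times bytes per parameter. The parameter count is inferred from the 2.5GB bf16 figure above, not taken from a model card, and the helper names are made up for illustration.

```python
# Rough checkpoint size estimate: size ≈ parameters × bytes per parameter.
# Assumption: file overhead (headers, metadata) is negligible.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}

def params_from_size(size_gb: float, dtype: str) -> float:
    """Infer the parameter count (in billions) from a file size in GB."""
    return size_gb / BYTES_PER_PARAM[dtype]

def size_for_dtype(params_billion: float, dtype: str) -> float:
    """Estimate the file size in GB for a given precision."""
    return params_billion * BYTES_PER_PARAM[dtype]

# 2.5 GB at 2 bytes per parameter implies about 1.25 billion parameters.
params = params_from_size(2.5, "bf16")
for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{size_for_dtype(params, dtype):.2f} GB")
```

By the same arithmetic, an fp32 copy of that encoder would be about twice the size of the bf16 file, which is where a lot of the disk space goes.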

storm saffron
muted dove
#

Not tried

storm saffron
#

I wonder if the outputs are compatible.

noble coyote
storm saffron
#

Nope, just tried, xl doesn't work, it's a different tensor size

#

xxl only
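The "different tensor size" is most likely the width of the embeddings the encoder emits: the published T5 v1.1 / Flan-T5 configs list a hidden size of 2048 for XL and 4096 for XXL. A minimal sketch of the mismatch follows. The `check_encoder` helper and the assumption that the image model expects 4096-wide embeddings are illustrative, based only on XXL being the variant that works here.

```python
# Hidden widths (d_model) from the published T5 v1.1 / Flan-T5 configs.
T5_HIDDEN_SIZE = {"xl": 2048, "xxl": 4096}

def check_encoder(variant: str, expected_width: int = 4096) -> None:
    """Raise if the encoder's output width does not match what the image model expects.

    expected_width=4096 is an assumption: it is the width of T5-XXL, the only
    variant reported to work in this conversation.
    """
    width = T5_HIDDEN_SIZE[variant]
    if width != expected_width:
        raise ValueError(
            f"t5-{variant} emits {width}-dim embeddings, model expects {expected_width}"
        )

check_encoder("xxl")  # passes silently
try:
    check_encoder("xl")
except ValueError as err:
    print(err)
```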

noble coyote
#

bf16 works - not made a noticeable difference - it might if I was doing photorealistic though (I will try that later!)

#

Trying XXL ... it's working ... again, no discernable difference?!

sterile pendant
dull star
#

I remember in the GPT-J days I wanted to run LLMs, but the best we could do is load-in-8bit lmao

storm saffron
noble coyote
storm saffron
noble coyote
#

I come to all this primarily as an artist ... glad to know that mathematical precision exists!! 😄

low stone
#

the city96 guy has a 3 gig version of the t5 for hunyuan, and I did a side by side comparison. Details on the face, lines on the clothing, various stuff like that are suddenly not as sharp or don't make sense. They're not major, but it's definitely noticeable.

muted dove
low stone
#

no that t5 is the thing actually doing the encoding for the image model.

#

it's converting the words to numbers.

teal fossil
low stone
#

hunyuan has a separate prompt expander model that they also include if we want to use it, but we have better stuff like gpt4 and llama3 for that.

dull star
#

OH SHIT STABLE AUDIO HERE?

#

non commercial license as expected

#

cant wait for comfyui implementation

dusky thistle
#

hope like hell we someday get the full weights for the full version

#

i'd go wild with that

teal fossil
#

@viral plaza On another note - could we theoretically load CogVLM instead of T5 XXL? At least in TagGui it gives me better results. Too bad it's quite VRAM hungry and CogVLM2 is Linux only...

teal fossil
dull star
#

honestly comfyui would be fine for me

teal fossil
dull star
#

I bet we'll see some addon implementation a few days later

#

maybe around the time SD3 2B comes out

raven fern
fair spruce
dull star
#

smaller filesize, faster loadtime, less memory used up on CPU Ram

raven fern
#

ah nice, i like the bf16 version anyway :3

dull star
#

👍

fair spruce
dull star
#

wait a sec, the link to the model that Alex sent is the same filesize... maybe its fp16 as well

#

oh its from him, mcmonkey

raven fern
#

mcmonkey d luffy 🙂

fair spruce
raven fern
#

calm down Musk

fair spruce
#

Show Musk go on

#

they said to me 🙂

raven fern
#

good one

fair spruce
rustic junco
#

When using the api to generate images I get finish_reason: 'CONTENT_FILTERED',
seed: 2950283743
}

It blurs the image. Any way to say that it should not filter content ?
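Whether the filter can be switched off is not answered here, but client code can at least detect the case and react to it. A minimal sketch, assuming the response has already been parsed into a dict with the `finish_reason` and `seed` fields shown in the fragment above; the `is_filtered` helper is hypothetical, not part of any SDK.

```python
def is_filtered(response: dict) -> bool:
    """Return True when the API reports that the result was content-filtered."""
    return response.get("finish_reason") == "CONTENT_FILTERED"

# Hypothetical parsed response, shaped after the fragment quoted above.
example = {"finish_reason": "CONTENT_FILTERED", "seed": 2950283743}

if is_filtered(example):
    print(f"seed {example['seed']} was filtered; rephrase the prompt or try another seed")
```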

low stone
rustic junco
rustic junco
woeful spindle
low stone
fair spruce
low stone
#

Hah it couldn't quite do it.

rustic junco
#

I tried to send an url and tried to send the image as base64

#

both times i get bad request back. I also changed the mode to image-to-image but it doesn't seem to work

cedar gale
raven fern
#

what am i looking at lol

fair spruce
cedar gale
raven fern
#

kek

cedar gale
fringe rain
#

1dog

raven fern
# cedar gale

that looks like a picture from an actual commercial LOL

past flame
#

"Inside of you, there is 2wolves"

cedar gale
#

I was bored. 😄

#

I think the coca cola drip is the best tho.

rain current
cedar gale
#

Noice.

rain current
#

I am very impressed by ideogram. I took your image and used describe, then generated the resulting prompt... it has a lot of potential

#

Although sometimes it makes real crap

cedar gale
#

It's fun, I like using SD for making outrageous stuff. 😄

raven fern
#

now im thirsty

rain current
#

So then....

cedar gale
#

Yeah that kills em faster.

#

This might speed up things.

raven fern
#

lol

#

put a poison bottle

cedar gale
#

😦

#

ALOHOL.

#

Sad panda.

rain current
cedar gale
turbid grotto
rain current
#

The patient may not survive...

cedar gale
#

They will be fine.

turbid grotto
#

some people saying text is gimmick but I have a hell of a fun with it and it is good

dreamy sundial
cedar gale
turbid grotto
#

I don't really understand how glif.app hosts sd3. It takes 8s per gen, while someone from SAI (or it was in the paper) said it takes 40s or so on a 4090 with 50 steps. Do they have something more powerful? But why for free then, how do they profit?
At first I wasn't sure that it is real sd3, but it doesn't look like any other service

turbid grotto
rain current
raven fern
#

i can fix her (in photoshop...yea...)

cedar gale
#

Yup she looks healthy.

rain current
#

with sdxl refiner

rain current
turbid grotto
#

At the beginning I got fooled by one "free sd3" site. Decided to check metadata and found "Ideogram" there) But that doesn't seem to be the case here

low stone
storm saffron
rain current
#

I think some people are using Ideogram. It's easy to make calls to Ideogram via the endpoint URL. I do it sometimes by inserting the bearer token, and it works well

turbid grotto
turbid grotto
hallow lion
#

1 week.

#

๐Ÿ˜

hallow lion
#

😄

#

shady characters pushing SD3 in the backrooms and alleys

low stone
#

Sd3/sdxl refined

cunning lintel
#

first thing to try when weights drop

dull star
#

LMAO

#

I really wanna see the difference in prompt adherence

low stone
#

Sd3/hunyuan

cunning lintel
# dull star I really wanna see the difference in prompt adherence

i just want to try dumb contradicting stuff like "luminescent Rose madder Bright lilac huge crooked (fur:1.9) (alien:0.2) (Titanoboa:0.5) (Meerkat:0.4) amethyst eyes. full body. Boreal Forest, puddle, centered, sunlight, focused, uniform light background, photo-realistic" and T5 is probably too smart for that :p (though it doesn't even look half bad)

dull star
#

I never got this actually. Can T5 also have prompt weighting?

cunning lintel
#

no idea, i'd think why not but it prob needs a new implementation
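For reference, the `(text:weight)` syntax in that prompt is something the UI parses before anything reaches a text encoder. Below is a minimal sketch of that parsing step only. It handles flat, un-nested spans and says nothing about how a T5-based pipeline would apply the weights, which is the part that would need a new implementation. `parse_weights` is an illustrative name, not an existing API.

```python
import re

# Matches "(text:1.9)" style spans; text outside parentheses gets weight 1.0.
WEIGHTED = re.compile(r"\(([^():]+):([0-9]*\.?[0-9]+)\)")

def parse_weights(prompt: str) -> list[tuple[str, float]]:
    """Split a prompt into (text, weight) chunks. Nested parentheses are not handled."""
    chunks = []
    pos = 0
    for match in WEIGHTED.finditer(prompt):
        plain = prompt[pos:match.start()].strip()
        if plain:
            chunks.append((plain, 1.0))
        chunks.append((match.group(1).strip(), float(match.group(2))))
        pos = match.end()
    tail = prompt[pos:].strip()
    if tail:
        chunks.append((tail, 1.0))
    return chunks

print(parse_weights("huge crooked (fur:1.9) (alien:0.2) amethyst eyes"))
```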

low stone
#

Refined, it's really neat

low stone
#

Wow these came out great

#

purple furry snakes in the shape of "SD3" are wrapped around the arm of will smith who is taming a lion

cunning lintel
#

grin, asking a bit much 🤣 one day, imagegens will happily oblige, i hope 😉

low stone
hallow lion
#

sigma slow

low stone
#

this is with 512 sigma

storm saffron
# hallow lion sigma slow

If you think sigma is slow, wait till sd3, apparently it's also quite slow. I believe lykon said 30 seconds per image on a 4090? 50 steps.

dull star
#

isn't that 8B?

#

and also 50 steps too

#

In early, unoptimized inference tests on consumer hardware our largest SD3 model with 8B parameters fits into the 24GB VRAM of a RTX 4090 and takes 34 seconds to generate an image of resolution 1024x1024 when using 50 sampling steps.

faint breach
#

50 steps yeah. 30 seconds for that isn't so bad. thats 1.5 it/s at megapixel resolutions. not sure what the problem is
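The speed figure is just steps divided by wall-clock time. A quick check using the numbers quoted in this thread (50 steps, 34 seconds in the official statement, "30 seconds" as remembered):

```python
def iterations_per_second(steps: int, seconds: float) -> float:
    """Sampling speed in it/s, ignoring model load and text-encoding time."""
    return steps / seconds

print(round(iterations_per_second(50, 34), 2))  # 1.47
print(round(iterations_per_second(50, 30), 2))  # 1.67
```

So "1.5 it/s" matches the quoted 34-second figure; the remembered 30 seconds would be slightly faster.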

#

not even any optimizations done by the community yet

dull star
#

well for 8B that's pretty good, but for 2B, a 4090 doing 1.5it/s isn't that good imo
I hope MODE13 meant 8B

#

and yes, optimizations

faint breach
#

a lot of the time may come from t5 layer too. faster without it and many won't want to use it

#

the way a lot of civit users prompt, t5 won't be necessary

dull star
#

yeah lmao

hallow lion
#

if i can get under a minute 1024/1024 30 step ish i can live

dull star
#

1girl, big boob, intricate won't need T5 for sure

hallow lion
#

now piling on loras controlnets ipa adapters will be a different story

faint breach
#

i've been using Omost which takes like a minute to make the prompt code in the first place lol

#

i'll be fine

hallow lion
dull star
#

im sure PonySD3 will get it right first try

faint breach
#

there's gonna be a lot of debate over the usefulness of t5. many will declare it to be another sdxl refiner layer since they won't see immediate gains

#

another prediction i have. controlnet won't work as well on larger sd3

dull star
#

how so

faint breach
#

same problem with sdxl. more parameters means less influence

dull star
#

interesting

#

then again, wouldn't the MultiModal nature of it make controlnets just superior?

#

multiple input streams and such or whatever

#

rather than some hacky solution

faint breach
#

i'm not saying i have an educated prediction

dull star
#

and I don't even know what input streams are

low stone
faint breach
#

and the porn side of the community have this notion where if you even slightly don't want to see that, you're very obviously a fundamentalist christian oppressor

trim arrow
#

sd3 hasn't even released yet...i think.

patent acorn
#

ik

upper snow
#

arguably the more important question: deep floyd stage 3 wen?

hallow talon
#

I wonder how long after SD3 2B releases will we be able to start training LORAs and finetuning?

solid violet
astral grail
#

if it's very easy, why is stability not officially doing that?

cinder junco
astral grail
#

well if "very easy" means it would just take 1 day longer, then I think that would be worth delaying it 1 day for, yeah 😄

cinder junco
low stone
cinder junco
#

Easy probably means "you don't have to go to special gyrations to get the loss to decrease, but you still need to train with a lot of compute and data".

#

And rather than spending all that compute on training the model on other resolutions, I'd prefer that the issue with the positional encoding get fixed so that the existing model is more flexible.

low stone
#

The downside of tiled upscaling is that it takes many times longer than standard upscale image and denoise at 0.5.
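To see why tiled upscaling is so much slower, count the tiles: each one is a separate diffusion pass. A rough sketch follows. The 1024px tile size and 128px overlap are illustrative assumptions, not settings from any particular node.

```python
import math

def tile_count(width: int, height: int, tile: int = 1024, overlap: int = 128) -> int:
    """Number of overlapping tiles needed to cover a width x height image."""
    stride = tile - overlap  # how far each new tile advances
    cols = max(1, math.ceil((width - overlap) / stride))
    rows = max(1, math.ceil((height - overlap) / stride))
    return cols * rows

for side in (1024, 2048, 4096):
    print(f"{side}x{side}: {tile_count(side, side)} tile(s)")
```

With these assumed settings a 2048x2048 target already needs 9 passes, compared with a single pass for a plain upscale followed by one denoise.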

#

It's probably why they're not doing it on the api and instead upscaling sd3 with sdxl turbo

astral grail
low stone
#

All of these models are 1024 res trained.

#

It would take a massively larger amount of time to train the base models of millions of images at higher res. It would take massively more money to buy that gpu time, which a near bankrupt company can't do.

#

So instead, they're delegating that to the community, which makes sense.

#

I believe that they have the best of intentions at this point. The reality though, is that we may never see 8b or even 4b if someone doesn't come along and fund them somehow.

#

So we should just get 2b going and do what we can with that until/if we get surprised with something better down the road.

sterile pendant
hallow lion
#

yes take 2B and run with it

#

dont do another Cascade!

low stone
bitter hearth
#

Stupid black bars

#

"action movie" in the prompt, really likes to add fire to things

patent acorn
bitter hearth
#

What you said makes all the sense :p

patent acorn
#

ohh

bitter hearth
mighty pulsar
#

/help

bitter hearth
#

The green reaper..

patent acorn
#

i heard its pretty bad at scythes too

#

later when we get to see it finetuned

bitter hearth
#

Loooong scythe

patent acorn
#

thats a scythe now

hallow lion
#

at least Friday is close

bitter hearth
#

it's not bad at the scythe itself it seems, but same problem with guns, the holding part

patent acorn
bitter hearth
#

Well

#

Fingers are always bad

hallow lion
sterile pendant
#

I wonder if we will be able to rig up a multigpu setup for sd3 in comfy with any form of ease. I'd use my second weaker GPU for t5 inference and my main one for sd3, that way I wouldn't have to swap models constantly or do t5 inference on the CPU...

#

I know that in the python code, it's usually as simple as saying cuda0, cuda1 or CPU, so it might be pretty easy to slap into a node
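A minimal sketch of that idea, kept free of any framework import so it runs anywhere. The device strings follow PyTorch's naming ("cuda:0", "cuda:1", "cpu"); in real code they would be handed to something like `module.to(device)`. The `plan_devices` helper and the component names are hypothetical, and this says nothing about how ComfyUI's memory management would react.

```python
def plan_devices(gpu_count: int) -> dict[str, str]:
    """Pick a device string for the diffusion model and the T5 text encoder."""
    if gpu_count >= 2:
        # Main GPU runs the image model, the second (weaker) one runs T5.
        return {"diffusion": "cuda:0", "text_encoder": "cuda:1"}
    if gpu_count == 1:
        # Single GPU: keep it for the image model and run T5 on the CPU.
        return {"diffusion": "cuda:0", "text_encoder": "cpu"}
    return {"diffusion": "cpu", "text_encoder": "cpu"}

print(plan_devices(2))
```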

#

I just don't know if comfy would natively like it with the smart memory management stuff

turbid grotto
#

||not hf pls||

dusky thistle
turbid grotto
#

could be skill issue

weary snow
#

What is likely going to be the minimum requirement for Stable Diffusion 3 Medium ?

cobalt moon
#

to be fair in 2024 almost everyone have at least 4GB

viral plaza
# astral grail if it's very easy, why is stability not officially doing that?

Technical things being easy doesn't mean business things are easy.
Like first -- Our technical team could release a thousand models... but how do we organize marketing around them all, how do we get safety testing on them all, how do we get legal signoff on them all, how do we get licensing organized, etc. etc.
But also: if we decide to spend three days training 2B-2048, that's three days not spent training the 8B. (Opportunity cost: every action no matter how easy comes at the cost of something else you could've done with that same time/effort).

in short: something being technically easy doesn't mean a company like Stability can or should do it internally.
In fact, that's part of the point and value of open releases: Stability doesn't have to do everything, we just open release the models and what info we can, and there's ten thousand other people who will each obsess into any given subsection of work and make that

viral plaza
viral plaza
viral plaza
cobalt moon
#

4 bit quant?

#

as in LLM's Q4?

viral plaza
dull star
viral plaza
cobalt moon
#

oh you talk about T5

#

yeah that make sense

viral plaza
cobalt moon
#

there seem to be quite a lot of variant models for T5 though

#

there's an fp16 version by theunlikely

viral plaza
#

(T5 is an architecture intended to be finetuned, there's tons of variants out there, it's kinda silly that we use a base model of T5 for sd3 tbh)

viral plaza
#

(which is on HF because the SGM training code uses HF so i uploaded that to dodge the long slow load on the full fat fp32 pair file)

dull star
#

Loading fp32 T5, was always the slowest part when using deepfloyd and pixart lol

cobalt moon
#

just get oof with OOM when using pixart lol

#

I haven't tried Tiled KSampler though

noble coyote
#

"... obsessing into subsections ... groan, if only!!!" ๐Ÿ˜„

sterile pendant
#

Though the absolute easiest way to deal with the slow loads is to just get a cheap $40 nvme in the 2-5 gb/s range and models load in seconds
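The "loads in seconds" claim checks out as simple division. The sketch below uses the 15GB file size mentioned earlier in the channel and assumes the drive sustains its rated sequential read speed, ignoring deserialization and the copy to the GPU.

```python
def load_seconds(size_gb: float, read_gb_per_s: float) -> float:
    """Best-case time to read a file of size_gb at a sustained read speed."""
    return size_gb / read_gb_per_s

for speed in (2, 5):
    print(f"15 GB at {speed} GB/s: {load_seconds(15, speed):.1f} s")
```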

patent acorn
viral plaza
#

dunno, probably not

#

there's work on them but nothing release-ready last i looked

patent acorn
#

oh cogvlm i see

viral plaza
#

it's possible those get finished soon after and released or something, or community takes over, idk, we'll see

cobalt moon
#

it's pretty weird that some people remember Stability saying it would release ControlNet models on day one

viral plaza
#

CogVLM is an AI model that generates captions, not a dataset. The captions that don't come from CogVLM, are just whatever captions came with the source images

patent acorn
#

ight captioning with my own brain then

viral plaza
#

if you're doing your own dataset, yeah just write your own captions

sterile pendant
viral plaza
#

if you're doing a megascale dataset (millions or billions of images or whatever), the collection tools to build datasets like that usually can copy out original source caption data

patent acorn
#

its like 400+ images iirc

viral plaza
#

I expect the HuggingFace team will publish finetuning info on day 1, followed later by the various community finetuning software vendors (kohya, onetrainer, etc)

late compass
#

How to get membership or early access?

storm saffron
teal fossil
#

I got CogVLM2 running yesterday and my first prompt got responses that copied the whole question before answering. Made it absolutely useless for captioning. ^^
I rewrote the prompt and then it was way better.
Only persisting problem so far (which I can understand is unimaginably hard for an AI to understand) is that right and left hand / arm are swapped when a person is facing the camera.
I'll have to manually check all captions because sometimes it gets it right, most times not. 😅

sterile pendant
# storm saffron Do you know what the system prompt and prompt was that was used in CogVLM? I've ...

Most vlms are good to go out of the box, since that's literally their sole purpose and what they were trained to do. But if you're trying to have a specific format, you have to do a little bit of sys prompting. I spent a few days a while back making one for llava 1.6 Mistral and it was around 1000 tokens long with examples in it. You can also use RAG as well, but the info still bloats up the context size. I know llava 1.6 ate up like ~2000 tokens for the image information alone, so I didn't use RAG for it when I only had a 4096 context size(no sliding window or other optimizations)
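The context squeeze described here is easy to put in numbers. The token counts are the approximate figures from the message (about 2000 for the image, about 1000 for the system prompt, 4096 total); the helper name is made up.

```python
def remaining_context(window: int, *used: int) -> int:
    """Tokens left for the user prompt and the generated caption."""
    return window - sum(used)

CONTEXT_WINDOW = 4096
IMAGE_TOKENS = 2000          # approximate, per the message above
SYSTEM_PROMPT_TOKENS = 1000  # approximate, per the message above

print(remaining_context(CONTEXT_WINDOW, IMAGE_TOKENS, SYSTEM_PROMPT_TOKENS))  # 1096
```

That leaves roughly a quarter of the window for everything else, which is why stuffing retrieved documents in on top was not an option.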

storm saffron
lavish pier
#

Anyone has suggestions for an image to text/prompt model? I am trying to get a prompt from an image to generate similar images

dry wave
#

question: Write a short caption for this image.
response: This image shows [autocomplete from here]
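The trick above is to start the response yourself so the model continues the caption instead of echoing the question. A minimal sketch of building that prompt and stitching the result back together; the `question:` / `response:` layout is copied from the message, while the function names are illustrative and the call to the model itself is left out.

```python
def build_prefilled_prompt(question: str, prefill: str = "This image shows") -> str:
    """Format the prompt so the model continues from `prefill` instead of starting fresh."""
    return f"question: {question}\nresponse: {prefill}"

def finish_caption(prefill: str, completion: str) -> str:
    """Join the prefill with whatever text the model generated after it."""
    return f"{prefill} {completion.strip()}"

prompt = build_prefilled_prompt("Write a short caption for this image.")
print(prompt)
# A made-up completion, standing in for real model output:
print(finish_caption("This image shows", " a cat sitting on a fence."))
```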

sterile pendant
#

The other trick you can use, granted it's double the time, is to use a second pass with llama3 instruct(or any good recent instruct model) to clean up the captions and format them better

cedar gale
#

Or use taggerui and discourage from caption:

sterile pendant
#

They are starting to make some llama3 based vlms now, so that should be awesome. Think there's a llava next llama3 someone is working on

sullen moss
#

1 week... 🤗

sterile pendant
#

But up until now, the best one I've used that was the most reliable was llava 1.6 Mistral. Granted, that's for consumer level hardware and easily runs on 8gb vram. Mistral was a beast until llama3 came out(again talking about smaller models you can run on a normal PC)

storm saffron
sterile pendant
#

here's an example of a long description version i made a while back. it was part of a two pass system where this would be fed into a prompt generator llm. the system prompt for this is ~700 tokens

viral plaza
viral plaza
late compass
#

please

viral plaza
#

should use CogVLM2 anyway instead of the original, would be better and prompting it will be different

late compass
viral plaza
sterile pendant
#

(for its size i should add, being able to run on an 8gb and all)

#

obviously, for larger scope and better hardware, i'm sure there are better alternatives

sterile pendant
teal fossil
teal fossil
#

Btw @viral plaza is there a time on the 12th we should look forward to?
A Midnight release? Late evening in the West US?

cobalt moon
#

you can say midnight release or 12pm release in UK

#

... or just random drop

viral plaza
radiant hound
teal fossil
noble coyote
#

Omost asks for Triton - and when you install it (the only version available!) - it tells you it is the wrong one?!

lucid swift
#

RGB gaming sausage

noble coyote
#

On another note entirely - I've been trying Stability Audio (via Pinokio) - and it's quite some fun!!!

lucid swift
#

i wish i would know how to fine tune it xD

noble coyote
#

Pinokio puts it all into a local VENV yes

lucid swift
cobalt moon
lucid swift
cobalt moon
#

that ComfyUI node said 8GB VRAM

noble coyote
cobalt moon
#

too

noble coyote
#

Pinokio uses GRadio

lucid swift
noble coyote
#

I have an 8Gb RTX 2070 and it works OK, about 45 seconds (100 iterations) to get a 47 second mp3 file

lucid swift
#

sd3 "Spectrogram"

cobalt moon
#

that doesn't look like a spectrogram though

#

but close enough

lucid swift
#

RGB gaming catheter

noble coyote
lucid swift
#

i already installed it

low stone
noble coyote
lucid swift
dull star
#

I'm sorry if this has been asked or answered already, but will we have to truncate prompts ourselves if we make our own Loras or Finetunes?

patent acorn
#

i hope sd3 is able to make this

low stone
#

No

patent acorn
#

awww that sucks

low stone
#

Sigh. Still no.

#

Maybe 2b on the 12th will be better

dull star
#

with 2B we can cherry pick easier too

#

can't wait to mess around with text encoders

low stone
#

This is just one failed attempt after another

dull star
#

poor calf is out of focus

low stone
#

Yeah when it's local I can just tell it to render 30 of them, with tons of various llm variations of it. Not gonna do that against a paid api.

dull star
#

fr

low stone
#

Hah

#

Can you get it with the cat snuggling the oxen though and taking a pic of both of them

dull star
#

damn, they don't want to cooperate

low stone
dull star
#

is this ideogram?

low stone
#

oxen is sitting next to a tabbie cat who is taking a selfie of both of them. Background is a farm.

#

Dalle

dull star
#

oh

#

looks pretty good

#

we could overfit Loras to make something like this

#

We'll see how 2B looks like

#

or fully trained 8B

#

Festivalman do you think that skipping 4B is a good idea?

#

It makes sense to me, you will have 2B, which is what the majority of people will be able to run, 800M for low end users, then 8B for enthusiasts or small businesses if the Enterprise membership isn't too expensive

#

4B might stick out like a sore thumb and would delay the other models

low stone
#

tabbie cat that is snuggling with an oxen while taking a selfie of both of them. Background is a farm.

#

Dalle/sd3

#

I think sd3 did a good job, it's technically the selfie pic itself

low stone
cunning lintel
#

if anyone wants to challenge sd3, try something like this (Dalle) (saw the pic in reddit a long time ago, sadly sd3 really doesn't shine with abstract concepts like "mustache from her hair" 😢 )

#

thinking about it, that image is extra evil as it also has hands 😂 (this is sd3, closest i got, but that mustache, is so much more real :p)

low stone
#

Yeah I'm getting the same thing

#

Did a good job with "anime" though. It really looks good

#

Sharper than deepbluev4 that I've been using I think

wild remnant
lucid swift
dull star
#

is this supposed to be mewtwo?

desert garnet
#

nah thats mewfour 💀

sterile pendant
low stone
coral sable
#

octopussy SD3

#

Often feels like SD 1.5, lots of trial and error. But when the high roll hits it's really good. 6 more days sadcat watwow

dull star
#

yeah 2B will be probably more consistent, especially with Finetunes

teal fossil
#

Hmmm... currently curating all the CogVLM2 captions... which are nice, but still utterly filled with "suggestions" like "seems to be", "possibly", "appears to be" and other nonsense...
--> and now I'm wondering if that's actually stupid cause those might still be in the SD3 captions as well.
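One low-effort way to speed up that manual curation is to flag captions that contain the hedging phrases, rather than rewriting them automatically. A minimal sketch with the three phrases quoted above; the phrase list and the `flag_hedges` name are illustrative, and the sample captions are made up.

```python
import re

# The hedging phrases quoted in the message; extend the list as needed.
HEDGE_PATTERN = re.compile(r"\b(seems to be|appears to be|possibly)\b", re.IGNORECASE)

def flag_hedges(caption: str) -> list[str]:
    """Return the hedging phrases found in a caption, lowercased, in order."""
    return [m.group(0).lower() for m in HEDGE_PATTERN.finditer(caption)]

captions = [
    "The man appears to be holding what is possibly a scythe.",
    "A cat on a farm.",
]
for caption in captions:
    print(flag_hedges(caption), "|", caption)
```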

dull star
#

or they had a good system prompt for it anyway

#

I do hate those assumption parts of the captions though

coral sable
dull star
#

8B is heavily undertrained

#

but it is coping for sure

#

we can't expect images to suddenly look 100x better

coral sable
#

i'd sacrifice consistency for variety anyday. Variety is amazing on SD3

dull star
#

Oh I mean like anatomical consistency or whatever

#

it will have variety

#

they talked about the model overfitting, causing differing seeds to become useless

#

so I think we're going to be okay

#

maybe the Turbo model(s) will have that though

#

and finetunes

coral sable
#

it can have both, I'm just sayin what is more important to me. Had lot's of fun with API already

dull star
#

same

#

8B is really good at paintings, so I hope 2B Base will be good at it too

#

same dataset after all, no?

#

should be similar enough

low stone
#

Playground 2.5 is an example of almost no variation across seeds.

dull star
#

btw @coral sable idk if you have seen this, but this is a TRULY raw image from 2B, no upscaling (From Lykon)

#

this is how the face looks upclose

#

remember, no highresfix or other tricks

coral sable
dull star
#

catlurk I see

coral sable
dull star
#

I wonder if the 16 channel VAE is just that much better for smaller details such as eyes
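Some arithmetic behind that hunch: more latent channels means the VAE throws away less information per image. The sketch assumes 8x spatial downsampling, as in earlier Stable Diffusion VAEs; that factor is an assumption here, only the 16-channel figure comes from the chat.

```python
def latent_values(width: int, height: int, channels: int, downsample: int = 8) -> int:
    """Number of values in the latent for a width x height image."""
    return (width // downsample) * (height // downsample) * channels

pixels = 1024 * 1024 * 3  # RGB values in a 1024x1024 image
for channels in (4, 16):
    n = latent_values(1024, 1024, channels)
    print(f"{channels} channels: {n} latent values, {pixels / n:.0f}x compression")
```

Under that assumption a 16-channel latent compresses the image 12x instead of 48x, which would leave more room for small details such as eyes.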

#

what's unfortunate, is that like with Pixart, highresfix has an issue

#

resolutions higher will create very very noisy artifacts

#

I suppose by the time we get it, they'll fix the pos embed code, or the community will within a week

#

if Tiled upscaling is okay, then I'll just do that then for the time being

coral sable
#

highresfix/adetailer still needed, details drop significantly on anything that isn't closeup. But that's to be expected as consumer hardware is what SD3 is targeting

dull star
#

yeahh

coral sable
#

I wonder will there be launch event on 12th as it was with SDXL

dull star
#

what was the event again?

coral sable
#

I mean just discord call with bunch of people

dull star
#

ah

coral sable
#

Stability guys presenting new tech

dull star
#

oh so like the center stage type of call?

coral sable
#

yea

dull star
#

I can't wait to remake Tekken intros as Live Action movie stills

#

I wanna see how good img2img is with such an intelligent AI

#

oh heck I could have tried that with pixart already lmao

#

I keep forgetting that it can do img2img too

#

you guys have any plans on what to use SD3 for that you couldn't or wouldn't bother to make with previous models (too much controlnet or regional prompting involved)?

hallow talon
trail frost
#

Is there any new update on pixart models

dull star
#

the 2K model is the biggest model released so far

#

there's lora training code

#

its in diffusers

#

nothing much to be honest that I know of that is "new"

viral plaza
dull star
#

nice

viral plaza
teal fossil
sterile pendant
#

They won't be perfect 1:1s though, but should be close enough if it's done right. Maybe like 90% accurate

dull star
fair spruce
dull star
#

nice

dull star
#

yes nvidia I will keep buying your 24GB high end cards for 3 gagillion dollars sadcat

fair spruce
cedar gale
fair spruce
cedar gale
fair spruce
cedar gale
raven fern
turbid grotto
low stone
turbid grotto
low stone
#

yeah with 0.3 denoise

turbid grotto
low stone
raven fern
#

lol

dim sinew
# turbid grotto

lol - I have an 8gig 3070ti, it's just never enough sometimes. (Not doing ai just yet)

dull star
#

T5 bf16 on CPU with SD3 2B fp16 will be good for you 🙏

raven fern
dull star
#

it takes about 10-15 seconds for T5 to process the prompt, but once that's done, as long as you don't change the prompt,
you can change anything else (CFG, sampler, resolution, etc.) and go straight to generating the picture without re-encoding it
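
A minimal sketch of that idea, assuming the text encoder's output depends only on the prompt string. `slow_encode` and `generate` are hypothetical stand-ins, not part of any SD3 release; a real encoder would return a tensor rather than a tuple of floats.

```python
import time
from functools import lru_cache


def slow_encode(prompt: str) -> tuple:
    """Hypothetical stand-in for a slow text encoder such as T5 on CPU."""
    time.sleep(0.05)  # stands in for the 10-15 second encoding step
    return tuple(float(ord(ch)) for ch in prompt)


@lru_cache(maxsize=32)
def cached_encode(prompt: str) -> tuple:
    """Encode a prompt once; repeat calls with the same text hit the cache."""
    return slow_encode(prompt)


def generate(prompt: str, cfg: float, steps: int) -> dict:
    """Hypothetical generation call: only the prompt feeds the text encoder."""
    embedding = cached_encode(prompt)
    return {"embedding_len": len(embedding), "cfg": cfg, "steps": steps}


first = generate("a fox eating a hamburger", cfg=5.0, steps=28)
second = generate("a fox eating a hamburger", cfg=7.0, steps=20)  # cache hit
info = cached_encode.cache_info()
```

Changing `cfg` or `steps` between the two calls does not trigger a second encode, which is the behaviour described above; changing the prompt text would.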

#

so the future is bright

raven fern
#

the future is now, thanks to science

dull star
#

or just don't use T5, we don't know how much worse it's going to be without T5

raven fern
#

according to the research paper, the difference is not that huge

dull star
#

yeah but they also claimed stuff like DALLE3 level prompt adherence and stuff so... 😬

raven fern
#

yea i guess will see :3

dull star
#

but I do believe that images without Text might be decent without T5

#

if it's as smart as Pixart-Sigma without T5 then it'll be good

#

cause it's still a massive improvement over SDXL

dim sinew
low stone
raven fern
#

it's kinda already broken down :3 long story short, you don't have to run everything on gpu

low stone
#

it's really the benefit of the t5 stuff over clip

dull star
#

you can use SD3 2B with T5 (a text generation model, which can also be used as an encoder for image generators), which increases its text capabilities (and possibly accuracy to the prompt)

#

but it's expensive, it takes up a lot of VRAM

dim sinew
#

I'm a hobby photographer, and I am just stepping into wanting to create AI generated landscapes for my fantasy world I'm starting to write.

dull star
#

ah

#

If you are looking for mostly portraits of people or subjects, or simple landscapes, current models are fine for that.

pseudo stone
#

will we ever get sd3 large

dull star
#

if you want to make detailed scenes with meaning, then pixart and SD3 2B will be the perfect fit

dull star
#

we will get all models

pseudo stone
#

when tho

dull star
#

800M, 2B, 8B

but for now, 2B is the most optimally trained one

pseudo stone
#

like a month after medium

dull star
#

if not two

#

8B is a VERY large model

pseudo stone
#

I hope

#

so

dim sinew
raven fern
#

2B with all the toys that come with it will keep you occupied enough until the other models :3

dull star
low stone
#

sd3/pixart/hunyuan

dull star
#

Pixart is completely free, SD3 2B needs licensing for commercial use (we are awaiting more info about this, but if you are just making stuff for your own enjoyment, i.e. personal use, then this does not affect you).

raven fern
#

4D chess 😮

low stone
dim sinew
dim sinew
low stone
#

sounds good, here ya go

dim sinew
# low stone

Not sure if that's a building generating energy, or a weapon that landed and started embedding itself into the moon to destroy it…

dull star
#

I like SD3 for paintings too

low stone
dim sinew
#

I've toyed with also training a model for a specific side project idea. But I have no idea whether you start with full compositions or with individual objects by name.

dim sinew
dull star
#

thanks

low stone
dull star
#

if all else fails, I'll still gladly use SD3 for paintings

low stone
#

basically just get a chat going with llama3, and keep changing elements until it does what you want.

dull star
low stone
dim sinew
dull star
#

it's a text generator like chatgpt

#

or rather, an AI chatbot

#

you can ask for suggestions for prompts, etc.

low stone
#

yeah, it's llama3 language model, running locally on ollama, which has the open-webui front end so it looks like chatgpt.

dull star
dim sinew
low stone
solid violet
#

oh yea but not for commercial use*

low stone
# solid violet llama generates images now?

so using that combination of tools and a comfyui backend, you can have the chat interface generate images based off llama3 responses just by clicking the picture icon on the response.

#

so you can have a chat back and forth, generating images along the way

#

add this, change this.
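
For anyone who wants to try the prompt-expansion half of this without the web UI, here is a rough sketch that asks a locally running Ollama server for a detailed image prompt. It assumes Ollama's default local endpoint and a pulled `llama3` model; the instruction wording and the function names are made up for illustration, and the image-generation half (ComfyUI) is not shown.

```python
import json
import urllib.request

# Ollama's default local endpoint; adjust if your server listens elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(idea: str, model: str = "llama3") -> dict:
    """Build the JSON body asking the model to expand a short idea."""
    instruction = (
        "Rewrite the following idea as one detailed image-generation prompt. "
        "Reply with the prompt only.\n\nIdea: " + idea
    )
    # stream=False asks for a single JSON reply instead of a token stream.
    return {"model": model, "prompt": instruction, "stream": False}


def expand_prompt(idea: str, model: str = "llama3") -> str:
    """Send the request to a running Ollama server and return its text."""
    body = json.dumps(build_request(idea, model)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read())["response"]


payload = build_request("a strange building on the moon")
# expand_prompt("a strange building on the moon")  # needs the server running
```

The returned text can then be pasted into whichever image generator is in use, and the chat continued with "add this, change this" style follow-ups.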

solid violet
#

you've just changed my whole world man. thank you this was really insightful. where did you learn all this?

#
  • self portrait, “allegory of the cave” depicts @solid violet led to the light by @low stone
low stone
#

i do this stuff for work and spend way too many hours at home doing it.

low stone
agile hornet
#

Will SD3 be usable in Fooocus?

low stone
#

eventually

#

man this ultra on the api for sd3 is 8 cents per image now. I've thrown a good amount of money at it, but that's Dall-E money already. I think I'll wait for the 2b at this point

patent acorn
cedar gale
patent acorn
#

where 2 bees

gusty trail
#

In the imagination

hallow lion
#

is this lama thing worth it?

agile hornet
noble coyote
noble coyote
# cedar gale

(One Bee + One Bee) or not (One Bee + One Bee) - that is the question?! 😄

noble coyote
noble coyote
# low stone

Using Llama 2 Chat 7B Q4 - this is its result - Against a dark, starry backdrop, a sleek, metallic structure rises from the lunar surface, its gleaming facade reflecting the pale light of a distant sun. The building's fluid lines blur the distinction between form and function, as if it's alive, pulsing with an otherworldly energy. As the moon's gravity pulls at its base, the structure's edges begin to distort, revealing a glowing core that seems to be the source of the energy.

noble coyote
#

... no glowing core - sadly!

sterile heath
#

I mean it depends how it's implemented. You don't even need to run both on the same device or at the same time. You could generate an embedding, save it, load the next model and pass the embedding to it if you wanted to be silly. You can also just prune, quantize etc.
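
A rough sketch of that "generate an embedding, save it, load the next model" handoff, split into two stages that never need to be in memory together. The embedding here is a placeholder list of floats and both stage functions are hypothetical; a real pipeline would save the encoder's tensor output instead.

```python
import json
import tempfile
from pathlib import Path


def encode_stage(prompt: str, out_path: Path) -> None:
    """Stage 1: run the (hypothetical) text encoder and save its output."""
    embedding = [float(ord(ch)) / 255.0 for ch in prompt]  # placeholder values
    out_path.write_text(json.dumps({"prompt": prompt, "embedding": embedding}))


def generate_stage(in_path: Path) -> dict:
    """Stage 2: load the saved embedding; no text encoder is loaded here."""
    data = json.loads(in_path.read_text())
    return {"prompt": data["prompt"], "tokens": len(data["embedding"])}


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "prompt_embedding.json"
    encode_stage("a giraffe wearing a top hat", path)  # encoder can unload now
    result = generate_stage(path)  # image model would load only at this point
```

Because the two stages only share a file, they could just as well run on different devices or at different times.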

teal fossil
#

Me next Wednesday if SD3M turns out to be what is promised.

#

Every time I work on a Dataset or train a LoRA Atm... I can't wait to play around with that sweet Sweet 16-channel Vae.

cedar gale
cedar gale
hallow lion
#

😄

muted dove
cedar gale
#

😛

#

The RTXX 4099 costs the same tho.

radiant ledge
#

this one even has two fans

cedar gale
modern rover
#

how to create an image

#

who can tell me ?

hoary pilot
#

M

muted dove
cedar gale
tardy crag
#

Will the model released on June 12th accept an image as input, or will it be text-to-image only?

weary crystal
# low stone

Quick question: did you use the sd3 API or a locally running comfyui with the api exposed?

compact forge
cobalt moon
#

yeah i think festivalman hooked up the SD3 API to Ollama / an open source LLM webui

compact forge
bitter hearth
#

wait not last one, this is very cute

dim panther
#

A living room with a large window, hardwood floors, and a fireplace. In the center of the room, a stylish couple is examining different sofa options, including a modern gray sectional, a tufted leather chesterfield, and a mid-century inspired loveseat. The room is filled with natural light and the couple appears to be deep in discussion, considering the size, color, and style of each sofa to determine the perfect fit for their space.

lucid swift
lucid swift
cedar gale
low stone
low stone
noble coyote
#

I am 'Omosting' ...

cedar gale
noble coyote
#

A beautiful water nymph, bejewelled

#

Using Omost LLM Setup

cedar gale
#

Aye it is pretty dope.

noble coyote
cedar gale
#

Tested it a while, I made these:

noble coyote
#

Would they have been harder to make without Omost?

cedar gale
#

Yes.

#

On top, in front, etc. are too hard otherwise.

#

Also things like bleed between 2 objects.

#

Omost is regional prompting on steroids for lazy people. 😄

noble coyote
#

I'll try some 'positional' stuff

outer charm
#

So... You can download sd3 now?

runic tusk
#

No.

outer charm
#

Oof, thanks soul

runic tusk
#

Trust me, literally everyone will know when it's available. You won't miss it. It'll be posted everywhere.

low stone
cedar gale
#

These are omost.

cedar gale
noble coyote
#

Omost - here is my 'positional' prompt - a giraffe is wearing a top hat. On the hat sits a green frog, eating a red apple and reading a blue book. A snail sits next to the frog. A butterfly sits on the book

#

Ultimately, very poor positional uptake.

#

Yet the Omost text does specify all of this prompt ... it just doesn't deliver?!?!?!?!?

swift portal
#

Hi

noble coyote
#

Keep it simple?

#

As I say, Omost picked up on all this and laid it out neatly section by section ...

woeful spindle
#

does it say anything about changing the model on the github page?

noble coyote
#

If Omost is this disappointing though ...

woeful spindle
#

you can't really say "disappointing"

#

cause it's using sdxl

#

we all know how sdxl performs

#

we can see a clear improvement there

noble coyote
#

Nerdy Rodent did an Omost Video on YT - he opens the code and drills down to the model name ...

#

Someone said it was RealVis SDXL?

cedar gale
#

I think so but you can just swap the SD XL model.

#

You just need to convert it to diffusers and route it via config.

noble coyote
#

When I'm Omosting, I swap-out Torch 2.0.0-cu118 (Xformers in ComfyUI) for 2.3.0-cu121

#

... and then back again when using ComfyUI

cedar gale
#

DM-ed you some stuff to avoid spam here. 🙂

woeful spindle
noble coyote
#

Give me a prompt which will work in Omost, svp?

lucid swift
dull star
#

horse

#

watching a video

#

of a fox eating a hamburger

#

SD3 is truly the best

#

🙏

brisk cipher
noble coyote
#

FABLAN@Glif

trail frost
#

Stability api

noble coyote
#

FABLAN@Glif is free

#

Free SD3

brisk cipher
#

Can?

dull star
#

gives you the most control

noble coyote
#

Omost - Peppa and Cthulhu

past flame
rain current
#

Absolutely, at the moment the only thing (in my opinion) that competes with understanding the prompt and with good quality is Ideogram

sick cedar
rain current
#

ideogram

sick cedar
# rain current ideogram

It's not too far off then. SD3. I mean we are also comparing an undertrained SD3 model with a fully trained Ideogram. So I'd say that's pretty good.

#

In my opinion, SD3 did better with the Horse looking at the TV monitor prompt.
The burger on the screen looks more coherent on SD3.

rain current
#

I'm not saying it's better than SD3, especially considering that SD3 will have more training, and with controlnet, etc. It's just that, as of today, it's the only thing I see that's similar and free

rain current
#

But I also see that it is more obedient to the prompt "slightly plump". I'm not sure if 2B is capable of being that precise.

rugged tartan
#

I am seeing a significant difference between CORE and ULTRA in the API. The first image comes from CORE, the second is the result of ULTRA, same prompt and same seed. Ultra seems to ignore the details of the prompt. Is anyone here experiencing the same issue? In general, everything done with SD3 in the API resembles the quality of SD-1.6

Prompt: Picture of a Anime-like real life woman with intricate face paint baroque style, Japanese facial features, realistic skin texture, photographic, photo-realistic

rain current
viral plaza
rugged nova
dreamy sundial
bitter hearth
vapid radish
#

SD3 VS My SDXL Merge
I know portraits are not going to be SD3's strength until people finetune the art out of it.

#

I cannot wait to upscale images with SD3 locally though.

dull star
#

that's pretty good for a base model

#

and it doesn't have that typical finetune look to it

gray flicker
#

girl

lucid swift
dusky thistle
bitter hearth
hallow lion
#

make an image of the sd3 model training in the gym

#

coz it needs more training

hallow lion
cedar gale
hallow lion
#

elephant tho

#

:<

cedar gale
#

😦

hallow lion
#

text is the new hands

cedar gale
bitter hearth
hallow lion
#

lol why a ghost

bitter hearth
#

You didn't specify

hallow lion
#

🤣

bitter hearth
#

runs away

cedar gale
raven fern
#

lol

patent acorn
#

sd3 request: man crying because there are no food in the fridge

real terrace
hallow lion
patent acorn
#

not interested

patent acorn
#

2nd is better, though he should face the fridge

#

also

#

there are still FOODS!!

#

should've been an empty fridge, i forgot

real terrace
real terrace
patent acorn
#

ough

noble coyote
cedar gale
cedar gale
#

Went for depressed instead of crying. 😄

bitter hearth
noble coyote
#

Just trying Anytest Controlnet ...

cedar gale
bitter hearth
noble coyote
#

Originals made using Portrait Master. ComfyUI+Anytest Controlnet - prompt = beautiful leopard, sunny glade, hat, cinematic lighting. [NightVisionXL, dpmpp_2s_ancestral, karras, 30 iterations]

#

Originals made using Portrait Master. ComfyUI+Anytest Controlnet - prompt = beautiful peacock, sunny glade, hat, cinematic lighting. [NightVisionXL, dpmpp_2s_ancestral, karras, 30 iterations]

#

Originals made using Portrait Master. ComfyUI+Anytest Controlnet - prompt = beautiful fox, sunny glade, hat, cinematic lighting. [NightVisionXL, dpmpp_2s_ancestral, karras, 30 iterations]

pseudo vault
#

What's the best generator for producing text?

noble coyote
#

Ideogram

#

Or try ComfyUI and the Harrlogos2 LoRA

pseudo vault
#

Any usable on a phone? (Android) Not near my computer anytime soon lol

noble coyote
#

Ideogram may be

#

Or free trial Adobe Firefly

#

Firefly can be fiddly on a phone

hallow lion
#

Nothing on the phone works.

pseudo vault
#

@noble coyote Thanks pal

noble coyote
patent acorn
noble coyote
dull star
#

It may actually get these correct, it's just that the layout might look too 2D or pasted on. We'll see eventually.

noble coyote
west haven
#

A staff member in the office is wearing yellow clothes with three conspicuous letters: PPT.

zenith talon
#

ddf

dim sinew
dim sinew
rain current
# lucid swift what prompts did you use for the hyena

A close-up of a hyena's face. The hyena appears to be in a contemplative or playful pose, with its front paws placed near its chin, as if it's holding its face. The photograph is in black and white, emphasizing the texture of the hyena's fur and the intricate details of its facial features. The background is blurred, putting the focus entirely on the hyena.

low stone
# noble coyote

One of the big benefits of ideogram is their magic prompter. It's crazy good at making great scenes with little input.

dim sinew
vapid radish
trail frost
#

I'm so excited for sd3 open release

#

Can we use that in our projects like sdxl 1.0?

dull star
#

if you use it and don't make money from it, nothing changes for you at all

#

it doesn't affect people like us

#

it's only the people who want to make money using the model who might have a bit of trouble

#

it'll get sorted out at release.

urban arch
dreamy sundial
vapid radish
#

But an upscale with SD should help, it's just too expensive on the API right now.

dreamy sundial
#

while all the people on sdxl look exactly the same

#

result of overtraining

vapid radish
dreamy sundial
#

and merging

dreamy sundial
cedar gale
vapid radish
#

I'm not saying it is bad I really like it.

dreamy sundial
#

i've had far better results with better prompting, gpt4o really helps a lot in most cases

vapid radish
#

This is what SDXL looks like after my workflow (6K zoom in), I'm sure SD3 will look great as well.

dreamy sundial
vapid radish
dreamy sundial
#

sd3 face variation is on par with mj6

#

raw SD3 outputs

vapid radish
dreamy sundial
past flame
#

It has nothing to do with how SD3 architecture works

noble coyote
sick cedar
# vapid radish

We're also using an undertrained version of SD3 currently too!

dull star
#

I can't wait for 2B

#

I really really hope the paintings are good 🙏

#

just 4 days

dreamy sundial
hallow talon
#

I wonder how long after the release of 2B that it'll be supported in A1111? I can use Comfy but it's not ideal

dreamy sundial
dull star
#

yeah 8B is good

#

I wanna see how good 2B is

civic quail
#

what are the requirements for sd3

robust junco
civic quail
dull star
#

🤨

low stone
# robust junco How about doing SDXL-> SD3 or SD3->SDXL->SD3?

I was fooling around with denoise yesterday. The biggest issue with sdxl is multi subject refining. It just wants to make everyone the same thing. I managed to upscale by having the various refining stages with various denoise levels, but it was super finicky and needed different denoise numbers per image, so in the end it wasn't useful long term. I'm really hoping that refining with sd3 will solve this because it understands multi subject.

#

For example.

#

Ella is really good at complex multi subject. But it just makes the sdxl upscale's job really hard
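
The staged refining described above, upscaling in several passes with a different denoise level each time, can be sketched as follows. This is only an illustration, not anyone's actual workflow: the stage values and function names are made up, and the step formula assumes a diffusers-style img2img, where a denoise strength `s` with `N` scheduled steps runs about `int(N * s)` of them. Other UIs may apply the setting differently.

```python
def effective_steps(num_steps: int, denoise: float) -> int:
    """Steps actually run for a given denoise strength (diffusers-style)."""
    return min(int(num_steps * denoise), num_steps)


def plan_refine(width: int, height: int, stages: list, num_steps: int = 20) -> list:
    """Walk through (scale, denoise) stages and report each pass."""
    plan = []
    w, h = width, height
    for scale, denoise in stages:
        w, h = int(w * scale), int(h * scale)
        plan.append(
            {
                "size": (w, h),
                "denoise": denoise,
                "steps_run": effective_steps(num_steps, denoise),
            }
        )
    return plan


# Hypothetical schedule: a stronger first pass, then a gentler second one.
plan = plan_refine(1024, 1024, [(1.5, 0.5), (2.0, 0.25)])
for stage in plan:
    print(stage)
```

Lower denoise on the later passes keeps more of the composition intact, which is the knob that reportedly needed retuning per image.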

dull star
#

probably posted here already

robust junco
low stone
#

I get 1000 images a month from ideogram for $20 and I can img2img with sdxl or sd3 2b when it comes out

robust junco
low stone
#

Sd3 is dalle price now, but we all know that for most things, it's not as good (the censored api sd3). Obviously local sd3 is a different story.

mortal mesa
#

which one is Ultra

left parrot
# civic quail i have 8gb vram 32gb ram

That should be enough to run the 2B model that will be released on Wednesday, but not the huge 8B model that's coming eventually. 2B seems to be plenty good enough for most uses!
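
As a back-of-the-envelope check on that, the memory needed just to hold a model's weights is roughly parameter count times bytes per parameter. The sketch below uses the 800M / 2B / 8B sizes mentioned earlier in the channel; it covers the diffusion model's weights only and leaves out text encoders, the VAE and activations, so real usage will be higher.

```python
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1}


def weight_gib(params_billion: float, dtype: str) -> float:
    """Approximate size of the weights alone, in GiB."""
    return params_billion * 1e9 * BYTES_PER_PARAM[dtype] / 2**30


# Weights-only estimates at fp16 for the three announced sizes.
estimates = {
    name: round(weight_gib(size, "fp16"), 1)
    for name, size in [("800M", 0.8), ("2B", 2.0), ("8B", 8.0)]
}
print(estimates)
```

At fp16 the 2B weights come to a bit under 4 GiB, which leaves headroom on an 8 GB card, while the 8B weights alone would exceed it.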

dull star
vapid radish
dreamy sundial
dull star
left parrot
#

yeah, the 4B doesn't seem like it has that much of a use case, stuck between the top quality 8B and the 2B that's balanced between performance and output quality

#

I suspect most finetunes and LoRAs will be based on either 2B or 8B

turbid grotto
cedar gale
low stone
#

so it'll keep multi-subject through to high res, and compared to sdxl refinement, if it works, the quality is night and day.

turbid grotto
turbid grotto
#

I found sdxl hyper really good for upscaling with only 4-8 steps

low stone
#

that's what my ella workflow looks like.

#

first 2 on left are ella native, last 2 are sdxl.

#

so as you can see, the sd 1.5 checkpoints just don't have the content in them. they're very lacking in training outside porn.

#

yeah i'm using sdxl hyper now all over the place. it's amazing

raven fern
dreamy sundial
dreamy sundial
#

they promised to release all versions

#

2b is too low and 8b too high, i think 4b is what majority of people would use

cedar gale
dull star
dull star
vapid radish
turbid grotto
mortal mesa
#

there is no 4B and community split talk never seems to affect anything

raven fern
turbid grotto
#

however, 2b could be really enough with controlnets and other stuff

turbid grotto
dreamy sundial
turbid grotto
#

oh, I will run out of ram for 8b agony
even loading sdxl peaks roughly at 14gb

turbid grotto
dreamy sundial
#

i think it was 15-20 steps i dont remember

turbid grotto
#

1024px, 20sp, sdxl = 19s on 3060

dreamy sundial
#

not bad

turbid grotto
#

I didn't expect

dreamy sundial
low stone
#

this is gonna be great when we have sd3 at home

cedar gale
low stone
#

do not taunt happy fun nordvpn

#

censorship dragon getting you down? use nordvpn.

cedar gale
low stone
cedar gale
low stone
#

Release the weights!

cedar gale
low stone
compact forge
#

is the artisan bot using sd3 release candidate already?

hallow lion
#

she has 2 rings but ill let that slide

#

i guess that could happen

bitter hearth
#

Nothing wrong with 2 rings

rich iron
vapid radish
dull star
#

very epic

raven fern
#

elden ring boss 😮

bitter hearth
raven fern
#

very sharp details

bitter hearth
raven fern
#

cutting edge

patent acorn
#

edging

bitter hearth
patent acorn
#

hello sd3 make me lego ninjago characters

low stone
#

a tornado of library books, weird supernatural ghosts flying around sliming the library

raven fern
#

slimy ghosts

low stone
scenic shadow
#

Has there been any word on if any finetuners got early access?

silver sluice
#

does anyone have a ComfyUI workflow that's compatible with SD3? I realize it's not released yet but I'm guessing for those who do have early access there's already a workflow developed for it? I just want to make sure I get the nodes installed and ready for launch day

wild remnant
hallow talon
#

I wonder how long after release of the model that we'll be able to train and finetune using Kohya? I'm definitely excited about the possibilities with SD3 when it comes to finetuning