#✨|sdxl

nimble heart
#

let me bypass the mi200 checks too

vital ermine
#

Yeah, the dev of it said that, but now says to use some others over his Prodigy due to how large XL is. Sadly, we don't have access to the ones he suggests.

fierce hollow
#

worst case gpu just self combusts 🤣

soft bone
# vital ermine Yeah, the dev of it said that but now says to use some others over his Prodigy d...

#sdxl #ComfyUI #LoRA #runpod #dadaptation #prodigy #stablediffusion #style #styletraining

This is an SDXL 1.0 training log for art style. However, the workflow is also interchangeable with SD1.5. I document my thought process, experiments, mistakes, and analysis of quantitative and qualitative results. Hopefully, this can be a good starting gui...

nimble heart
#

okay this problem I'm too dumb to fix

#

well it didnt even compile before so progress?

vital ermine
soft bone
vital ermine
#

that's cool

soft bone
#

it outperforms my 1.5 full dreambooth version

#

by a lot

peak dove
#

Trying to add TensorRT as an extension in A1111 - getting this error - AssertionError: extension access disabled because of command line flags - found the answer - DO NOT USE --share and/or --listen in the .bat file when installing TensorRT!

vital ermine
#

Oh, I remember you in here using that and Van Gogh-ing everything, lol

soft bone
#

lolol yes and the cats

vital ermine
#

yes

#

It was good

soft bone
vital ermine
#

I never have much luck training loras on any SD version but DB just works for me

#

catshrek, lol

soft bone
#

does db even run with XL?

vital ermine
#

yes

#

The only option is to extract to a LoRA, and the LyCORIS dev says it's easy to make it extract to LoCon, which is what I want. It doesn't do it, though.

rustic garnet
#

lora extraction only makes sense if you train the same layers as in lora

vital ermine
#

Oh, it makes a whole lot of sense and actually does give better results, from what the people I talk to say. It takes longer, which is the reason people say they don't do it.

#

for me that is the only way any of my loras have been made

rustic garnet
#

it does not make sense if you train db on all layers and then only extract a subset of them

#

freeze everything except the attention layers when you want to extract lora

#

for locon you would also train the conv layers in the resnet
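
A minimal sketch of the layer selection described above, assuming diffusers-style UNet parameter names (e.g. `...attn1.to_q.weight`, `...resnets.0.conv1.weight`). `is_trainable` is a hypothetical helper, and which modules a given LoRA/LoCon tool actually targets varies (some also include feed-forward and `proj_in`/`proj_out` layers), so treat the name sets as assumptions.

```python
# Hypothetical helper: decide which UNet weights stay trainable during a DB run
# so the result can later be extracted cleanly as a LoRA (attention only) or a
# LoCon (attention + resnet convs). Assumes diffusers-style parameter names.
ATTN_BLOCKS = {"attn1", "attn2"}                      # self- and cross-attention
ATTN_PROJECTIONS = {"to_q", "to_k", "to_v", "to_out"}
RESNET_CONVS = {"conv1", "conv2"}


def is_trainable(param_name: str, train_conv: bool = False) -> bool:
    """True if this parameter should receive gradients."""
    parts = set(param_name.split("."))
    if parts & ATTN_BLOCKS and parts & ATTN_PROJECTIONS:
        return True
    if train_conv and "resnets" in parts and parts & RESNET_CONVS:
        return True
    return False
```

In PyTorch this would typically be applied with `for name, p in unet.named_parameters(): p.requires_grad_(is_trainable(name, train_conv=True))`, freezing everything the filter rejects.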

#

in general extracting loras afterwards can be more parameter efficient, as you can set the lora rank dynamically. But it has to be done right
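
One common way to set the rank "dynamically" is to keep the smallest number of singular values of the weight difference that retains a chosen share of its squared (Frobenius) energy. A sketch, assuming the singular values were already computed (e.g. by an SVD of `W_tuned - W_base`); `rank_for_energy`, `keep` and `max_rank` are illustrative names, not any particular tool's flags.

```python
def rank_for_energy(singular_values, keep=0.9, max_rank=None):
    """Smallest rank whose singular values retain `keep` of the squared energy."""
    sv = sorted((abs(s) for s in singular_values), reverse=True)
    total = sum(s * s for s in sv)
    if total == 0:
        return 0  # nothing was learned in this layer
    running = 0.0
    for rank, s in enumerate(sv, start=1):
        running += s * s
        if running >= keep * total:
            break
    return rank if max_rank is None else min(rank, max_rank)
```

Layers where fine-tuning changed little end up with a low rank, which is where the parameter savings over a fixed-rank LoRA come from.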

vital ermine
#

yes, and that's why I have been trying to get that as an extraction option, but they just aren't doing it, saying Kohya can add it and that it's easy. Well, if the dev says it's easy, who are any of us to say it isn't? All I know is that for XL it hasn't been done.

rustic garnet
#

it is easy, just a few lines of python code
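
To illustrate the "few lines of python": LoRA extraction amounts to a truncated SVD of the weight difference `W_tuned - W_base`, split into an `up` and a `down` factor. In practice this is one library SVD call (for example `torch.linalg.svd`); the sketch below is a dependency-free stand-in that finds the top singular directions by power iteration with deflation on plain nested lists. `extract_lora` is a hypothetical name, and power iteration converges slowly when singular values are close together.

```python
import math
import random


def _matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def _transpose(m):
    return [list(col) for col in zip(*m)]


def extract_lora(w_tuned, w_base, rank, iters=100, seed=0):
    """Hypothetical helper: (up, down) with up @ down ~= w_tuned - w_base.

    up has shape (out, rank), down has shape (rank, in).
    """
    rng = random.Random(seed)
    delta = [[t - b for t, b in zip(rt, rb)] for rt, rb in zip(w_tuned, w_base)]
    up_cols, down_rows = [], []
    for _ in range(rank):
        delta_t = _transpose(delta)
        v = [rng.gauss(0.0, 1.0) for _ in range(len(delta[0]))]
        for _ in range(iters):
            # Power iteration on delta^T delta -> top right singular vector.
            v = _matvec(delta_t, _matvec(delta, v))
            norm = math.sqrt(sum(x * x for x in v))
            if norm == 0.0:
                break  # remaining difference is zero; this component adds nothing
            v = [x / norm for x in v]
        u = _matvec(delta, v)  # sigma times the left singular vector
        up_cols.append(u)
        down_rows.append(v)
        # Deflate: remove this rank-1 component before searching for the next.
        delta = [[d - ui * vj for d, vj in zip(row, v)] for row, ui in zip(delta, u)]
    return _transpose(up_cols), down_rows
```

The same routine applies whether the trained layers are attention projections (LoRA) or conv weights reshaped to 2D (LoCon), which is why extracting only makes sense for layers that were actually trained.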

vital ermine
#

exactly what he said too

#

Kohya just hasn't done it for some odd reason

#

I prefer Locon over lora

soft bone
#

encanto trained well

vital ermine
#

probably XL had an idea about it

soft bone
#

nay i checked

vital ermine
#

every image was totally different?

#

yep, it has a general idea

vital ermine
#

what's up with emma watson?

soft bone
#

im not using "animation, disney, pixar, 3d, animated, cgi" or any styling words at all. just subject line and token. XL knows how to do pixar kinda but i make sure to keep the concept separate from that data

vital ermine
#

btw, I am training using your info but I had to make it 2 epochs.

soft bone
vital ermine
soft bone
#

how many imgs?

#

oh i gladly train up to 2 hours nowadays

vital ermine
#

546

#

I had far more luck with locon in 2.1 than a lora so let's see if this trains after a bit

#

I hate how small the images on Civit are now. Someone mentioned that to me last month and yeah, it's a pita now.

#

far too small to really investigate

#

good, now to test

soft bone
#

caption this

vital ermine
#

fingers crossed

#

interesting

soft bone
vital ermine
#

How do I tell?

soft bone
#

lately i just eliminate the most problematic checkpoints in the XY plot until the last 2-3 and then choose by preference

vital ermine
#

I am thinking one of the last five.

#

how do I tell which ones are the problems?

soft bone
#

ones with the most artifacts

vital ermine
#

oh, I never see those

soft bone
#

in that first one you sent, the guy's torso is glitching in half

vital ermine
#

what I see is skeletons so that is too similar to the original

#

that first one all by itself IS base XL

#

the one on the left by itself is base XL

#

no lora

soft bone
#

ah i see

vital ermine
#

I may need to train for longer

#

more steps

soft bone
#

freckles!

#

i may need to train a general pixar model

vital ermine
#

I am going to train for 1500 more steps

soft bone
vital ermine
#

1092

lilac wren
#

what hardware and settings do you guys use to train SDXL?

soft bone
#

3090

#

settings are all over the place depending on dataset

indigo carbon
vital ermine
#

yeah, this is a failure

soft bone
#

I'm surprised at this flexibility considering it's only trained on one movie and no styling prompts are needed at all. No regularization either

vital ermine
#

looks like 300 is it but all of the faces are screwed

#

The first one is 300 more steps over the last highest

#

It's XL; I live with janky, jacked-up trainings.

wet nacelle
#

Do we know of any wire extension for comfy that would allow me the user to actually put a pin on a part of a wire that connects the nodes?

#

I want to just orientate the wire to my will.

vital ermine
wet nacelle
vital ermine
wet nacelle
vital ermine
# wet nacelle The only thing we have is the straight line one right?

The built-in ones, and they really mess with me when I am building. I'm used to something very different: if I grab a node, the wire follows, so if I slide it right, the noodle/wire goes left and I can see where it goes. Right now it is 100% crap. Another issue is that just clicking a wire doesn't tell you where it is connected most of the time.

vital ermine
peak dove
#

ComfyUI SDXL, Sytan's workflow, DynavisionXL model

vital ermine
peak dove
vital ermine
wet nacelle
vital ermine
icy brook
#

It's up!

vital ermine
wet nacelle
vital ermine
glass notch
#

Is it normal for images to not look that great with SDXL compared to 1.5 upscaled? Here are my settings: absurdres, high quality, top quality, a colorful parrot flying in a mangrove jungle
Negative prompt: easynegative, worst quality
Steps: 40, Sampler: DPM++ 2S a Karras, CFG scale: 7, Seed: 1338516404, Size: 1344x768, Model hash: e6bb9ea85b, Model: sd_xl_base_1.0_0.9vae, Clip skip: 2, ENSD: 31337, Refiner: sd_xl_refiner_1.0 [7440042bbd], Refiner switch at: 0.8, Version: v1.6.0. I am running with the lowvram (due to 6GB of VRAM), no-half-vae, and xformers arguments.

vital ermine
#

@glass notch your prompt

#

I did not use anything but the positive prompt from what you posted

glass notch
#

It's clear that it's easier to prompt concepts but from your own example it does seem that the quality I got is to be expected. Thanks for doing the test

vital ermine
#

Welcome

wet nacelle
crisp owl
#

Freyja

wet nacelle
crisp owl
#

Freyja after acquiring Brisingamen, yet unable to find joy again

indigo carbon
# wet nacelle

I never trusted a banking agency ever since Sberbank did what they did

#

Those MFs made Kandinsky

wet nacelle
indigo carbon
indigo carbon
#

Though that model had image blending capabilities somehow

wet nacelle
indigo carbon
#

I'm also not sure if Kandinsky can blend images due to being pixel diffusion or maybe because it uses ViT-14 or something like that

wet nacelle
#

not much came out from it from what I can tell.

indigo carbon
#

Yeah, I hate them. They made a model that can blend images and never explained what causes it to be able to do so

#

Oh that's it. "Image encoder: ViT-bigG-14-laion2B-39B-b160k"

#

So it can have image input due to having both a text encoder and an image encoder?

rustic garnet
rustic garnet
#

I'm not so interested in image blending, so I haven't tested whether image blending in SDXL works as well as in Kandinsky

half ivy
#

looking for masking tool for auto, asking for a friend o0

wet nacelle
strong copper
wet nacelle
cyan crown
lilac wren
wet nacelle
cyan crown
wet nacelle
cyan crown
#

prompt?

lilac wren
lapis gale
#

just a bit of an SDXL variety dump 🙂

lilac wren
icy brook
crisp owl
indigo carbon
#

Or ControlNet?

#

Blending is definitely not a capability without IPAdapter, many people have tested and come to that conclusion

#

@rustic garnet are you referring to the CLiP_vision as an image encoder? Because that's not really a part of SDXL.. SDXL just has 2 text encoders, no image encoder according to HF

#

With using CLiP_vision, SDXL can have image input, I only tested this with 2 inputs for blending and it wasn't coherent at all.. it was able to blend the 2 images only one out of like 30 times

#

Haven't tested that method with 1 input, so idk how that behaves

rustic garnet
indigo carbon
rustic garnet
indigo carbon
#

the outputs with image input using CLiP vision have nothing to do with the input

rustic garnet
#

you could fine-tune sdxl on that - but honestly, ipadapter is the better solution anyways

lapis gale
#

spelling needs a bit of work, but the cat pumpkins are nice

vale eagle
indigo carbon
#

it can blend images, but again, nothing like the quality you normally get out of SDXL

vale eagle
#

PixArt Alpha uses a DALL-E 3-like structure, and it might be open-sourced soon.

wet nacelle
indigo carbon
wet nacelle
lusty wolf
#

Hair is clumpy? I think it is FreeU?

wet nacelle
indigo carbon
vale eagle
#

FreeU does help in some cases but not a general solution

lusty wolf
indigo carbon
#

idk, the images I generate already have exceptional quality imo, I'll try it, but I doubt it'll improve anything

cyan crown
#

XL_More_ART Lora does a good job improving image quality

glass notch
tribal lantern
# vale eagle PixArt Alpha used dalle3 liked structure and it might open source soon.

kinda different though: DALL-E and SDXL have a UNet, while this one has a transformer structure. The similarity with DALL-E is the T5 text encoder and automatic captioning to generate better captions. All in all, PixArt seems more innovative than DALL-E 3. Unless the DALL-E paper leaves out lots of details, it seemed kind of ordinary to me: just DALL-E++ with more training and better data. Even SDXL did more, with two mixed CLIP encoders, aesthetic score, and image size in the training data. That might be because LLMs are OpenAI's core (and that makes it all the weirder that they kept a UNet vs. transformers).

cyan crown
ionic dragon
#

Suggest a cool concept to train a lora on?

cyan crown
#

if you were Italian, I'd like to see Dylan Dog comics

stone fossil
#

Introducing Hackerman SD XL 1.0, the LoRa that transcends ordinary transformations. Join the digital revolution, as Hackerman SD XL 1.0 unleashes t...

Introducing Chalkify SD XL 1.0, a revolutionary LoRa (Text to Image) model designed to immerse your images in the tender and enchanting world of so...

cyan crown
glass notch
wet nacelle
eternal fog
#

There's been a few recent updates to both ComfyUI and the IPAdapter Nodes and it's sorted out the memory efficiency a lot.

I used to constantly go over the VRAM limit with 10GB and get slowdowns, now it's so much better. I can even run multiple controlnets with it and get no slowdowns.

cyan crown
wet nacelle
indigo carbon
eternal fog
#

The custom node broke for me a bit ago, so I just stopped trying

indigo carbon
#

modules and code from the previous AITemplate repo for ComfyUI have been salvaged and are slowly being implemented in a new one made by Comfy and FizzleDorf

cyan crown
hoary saddle
indigo carbon
#

With pure txt2img I'm still using a commit from a month ago, that's the fastest on my 4070ti

fierce hollow
#

I can't get the new ones working at all 🥲

hoary saddle
#

gotcha

#

fizzledorf i assume

indigo carbon
hoary saddle
#

from the legacy branch in manager

indigo carbon
#

Both the new one and the old one use the modules I made though 😛

hoary saddle
#

cool, gonna remove this new one and pull the old back in

#

oh, sweet

fierce hollow
#

oh I did get the new one working after all, had to compile AITemplate manually... but the git patch still doesn't apply, so not sure what that's about

indigo carbon
fierce hollow
#

I assume that's already happened then(?)

rustic garnet
indigo carbon
#

It will eventually though

fierce hollow
#

well the patch just says

error: patch failed: comfy/ldm/modules/attention.py:91
error: comfy/ldm/modules/attention.py: patch does not apply
error: patch failed: comfy/ldm/modules/diffusionmodules/openaimodel.py:370
error: comfy/ldm/modules/diffusionmodules/openaimodel.py: patch does not apply
indigo carbon
#

And that'll be what everyone will probably use until exDiffusion comes around

fierce hollow
#

so I thought it has nothing to change in there, but I guess it's probably because it was made for some previous commit

indigo carbon
#

And he will, eventually

fierce hollow
#

right

indigo carbon
#

It should be as fast as the old commits are, so that's probably what everyone will use for a while

#

Well, until people figure out how to make optimized kernels instead of engines for diffusion

#

Much like what happened with LLaMA

fierce hollow
#

yeah umm I still don't see that happening, openai released their dall-e paper though which is nice

#

shows that a good model should probably be trained on something better than laion captions for starters

indigo carbon
fierce hollow
#

no just thinking that maybe let's get a solid model or something before making it super fast

#

much like everybody started doing stuff with llms only after llama was released

vale eagle
#

no

indigo carbon
vale eagle
#

The next model is a better model

#

You can't wait for the best model to do stuff

fierce hollow
#

compared to dall-e 3, it could certainly use some work

#

(inb4 somebody posts 'but look at this cool image, dalle can't do it', but then dall-e draws 3 people with exact specified shirt colors)

indigo carbon
fierce hollow
#

it's still t5 (how many params remains a mystery), their captions are just better

indigo carbon
fierce hollow
#

if you look at the paper they sort of mostly explain that, basically user = dumb, so they process the prompts to match the ones the model was trained on

vale eagle
#

user's prompt -> GPT4V -> descriptive prompt (which matches the format the model was trained on)

indigo carbon
#

The model is also a pixel diffusion model, which is losing points in my book

whole kettle
#

Well you kind of have to read tokens.json to really understand what to prompt it with yeah.

vale eagle
#

It is latent diffusion model

indigo carbon
tribal lantern
#

thanks, got confused because the PixArt paper claimed to be new in having transformers instead of a UNet. Whether it's better or worse remains to be seen.

vale eagle
fierce hollow
#

if you make your prompt too long you can see the vae bleeding through...

indigo carbon
#

Well, if DALL-E 3 is latent diffusion it's definitely not a good one at that; the most common symptom of pixel diffusion is graininess - which DALL-E 3 has

tribal lantern
#

What irks me about these new text-encoder-heavy models (PixArt now, but also DeepFloyd/Imagen) is that the text encoder is larger than the latent model.

fierce hollow
#

you can offload/quantize the encoder so it's really not that huge of a deal

indigo carbon
#

Speaking of, does DALL-E 3 have an image encoder?

tribal lantern
#

that was my first thought as well, but since it's not done yet, maybe it's not so simple

vale eagle
#

Highly descriptive captions is the key to improve prompt following.

fierce hollow
#

I don't know about pixart but imagen encoder can be quantized with bitsandbytes (the results are terrible but that's a separate matter I guess)

indigo carbon
#

The main weakness of SDXL is the lack of an image encoder, so I'm assuming future versions of SD will also have image conditioning capabilities?

vale eagle
#

I don't see image encoder mentioned in Dalle3 paper.

indigo carbon
vale eagle
#

But DALL-E 3 is closely connected to GPT4V, which is able to accept image input

tribal lantern
#

main weakness i'd say is prompt following, image input is nice to have too of course

indigo carbon
tribal lantern
#

Maybe CLIP's a bottleneck

vale eagle
#

GPT4V handles the user input and makes the prompt for DALL-E 3

tribal lantern
#

maybe it's a strength too for styles/aesthetics

#

dalle-3 goes out of its way to follow weird/contradicting prompts for me leading to ugly images it seems

indigo carbon
tribal lantern
#

sdxl's understanding of prompts is beautifully abstract in a way

vale eagle
#

And Idea2Img makes use of GPT4V to turn the image input into a prompt for SDXL

indigo carbon
#

I'm still stumped on the idea of DALL-E 3 being a latent diffusion, it has such a pixel diffusion look to it

wet nacelle
indigo carbon
#

Anyways, I think SDXL will have a better understanding of language if the encoder isn't CLiP, and switching from CLiP to something else will also open the opportunity to have an image encoder

rustic garnet
#

this has really NOTHING to do with the text encoder

#

in fact, using CLIP is the best thing you can do if you want to easily use input images

rustic garnet
#

if you want a model that can take images as input you have to train it so that it also accepts images as input (at least in a certain % of the cases)

nimble heart
#

lotta new diffusers use t5 which is apache licensed

rustic garnet
#

SAI haven't done this, probably they thought image inputs are not important anyways

indigo carbon
rustic garnet
#

and DeepFloyd

nimble heart
#

thought dalle did too?

#

used t5xxl

indigo carbon
#

DeepFloyd is pixel diffusion, making it somewhat irrelevant

nimble heart
wet nacelle
nimble heart
#

dalle 3 uses t5 it seems

indigo carbon
#

so T5 is the way, huh?

#

does that prevent a model from also having image input? because DALL-E 3 doesn't have image input, it seems

nimble heart
#

No idea. I just briefly skimmed the paper to confirm the T5 thing

#

I imagine you could just use a non-zero latent same as other diffusion models
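
For reference, "starting from a non-zero latent" usually means encoding the input image, noising it to an intermediate timestep with the standard DDPM forward process, `x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps`, and denoising from there instead of from pure noise. A sketch on a flat list of latent values; `alpha_bar` would come from the model's noise schedule and is passed in here as an assumption, and `noised_latent` is a hypothetical helper.

```python
import math
import random


def noised_latent(latent, alpha_bar, seed=0):
    """Forward diffusion q(x_t | x_0) for a flat list of latent values.

    alpha_bar = 1.0 returns the clean latent; alpha_bar = 0.0 returns pure noise,
    which is the usual txt2img starting point.
    """
    if not 0.0 <= alpha_bar <= 1.0:
        raise ValueError("alpha_bar must be in [0, 1]")
    rng = random.Random(seed)
    signal, noise = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [signal * x + noise * rng.gauss(0.0, 1.0) for x in latent]
```

Nothing here depends on the text encoder, which is why image-to-image works regardless of whether the model uses CLIP or T5.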

indigo carbon
fierce hollow
#

has pixart uploaded their models yet? I can only find links to t5 and vae, and the hf space is dead

indigo carbon
#

I think because encoders are simpler than the UNETs they can be quantized, no?

nimble heart
nimble heart
#

also no safetensors format or diffusers pipeline so sus

indigo carbon
#

one red flag about PixArt, they say it's good at photorealism, but wtf about all the other stuff?

rustic garnet
# tribal lantern thanks, got cofused because the pixart paper claimed to be new in having transfo...

yes. they use diffusion transformers instead of a unet. I'd just say the unet is ALSO a transformer. A unet is basically a combination of transformers with convolutional residual networks. The transformers in the unet work on the latent pixel space (thus, they are expensive), and the convolution is necessary to get the spatial relationship between the pixels right. Diffusion transformers instead split the image into large blocks and then apply transformers to these blocks instead of to the latent pixels. This is cheaper. The question, though, is whether it's also better. I somehow doubt that.
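
A sketch of the token-count difference described above, assuming a 1024x1024 image, the 8x-downsampling SD VAE (so a 128x128 latent) and a 2x2 patch size (a common DiT choice, assumed here). Treating every latent pixel as a token gives 128 * 128 = 16,384 tokens; 2x2 patches give 64 * 64 = 4,096, and since attention cost grows with the square of the token count, that is 16x fewer pairs. The per-latent-pixel figure is only illustrative, because a UNet applies attention at downsampled resolutions.

```python
def patchify(latent, p):
    """Split a latent shaped [C][H][W] into non-overlapping p x p patches.

    Each patch becomes one token: a flat list of C * p * p values (ViT/DiT-style).
    A real DiT would then project each token with a learned linear layer.
    """
    c, h, w = len(latent), len(latent[0]), len(latent[0][0])
    if h % p or w % p:
        raise ValueError("latent height and width must be divisible by the patch size")
    tokens = []
    for i in range(0, h, p):
        for j in range(0, w, p):
            tokens.append([latent[ch][i + di][j + dj]
                           for ch in range(c) for di in range(p) for dj in range(p)])
    return tokens


def attention_pairs(num_tokens):
    """Self-attention compares every token with every other token."""
    return num_tokens * num_tokens
```

For a 4-channel 4x4 toy latent, `patchify(latent, 2)` yields 4 tokens of 16 values each.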

nimble heart
#

once they actually make a safetensors version and it gets a diffusers pipeline it'll be easy to compare

#

it's licensed under AGPL which is cool

fierce hollow
whole kettle
nimble heart
tribal lantern
#

using something like T5 seems excessive to me. A text-to-image model shouldn't need the capabilities of something like T5 (it can do text-to-text, e.g. translation, which is crazy if there's proper captioning in one language only); it just needs to understand things like how tokens are related and how "x inside z" is interpreted

nimble heart
#

could be an nvidia "the 4060 gets twice the performance as the 3060" example

rustic garnet
# indigo carbon that encoder seems to do a better job than CLiP though

as mentioned earlier: a good text encoder doesn't help you if your captions are bad. The good thing about CLIP is that it is extremely robust even with bad captions. The key is to use good captions with a good text encoder, and that is only possible if you improve the captions using some powerful LLM and multimodal models like BLIP or LLaVA.

nimble heart
rustic garnet
#

but I agree, now that we have these good multimodal models we could replace clip

nimble heart
#

but yea kinda funny. The mid size T5 with a tiny diffusion model

indigo carbon
#

I just looked at the examples, they also have the graininess I was talking about

tribal lantern
#

pixart is trained on almost nothing though, can't help wonder what happens if more images are fed to it

nimble heart
#

sometimes small highly curated datasets can do better than a billion bits of trash

indigo carbon
nimble heart
#

how curated their set actually is though idk

nimble heart
#

also it uses the SD 1.5 vae which is funny

#

i mean if it's not broke dont fix it I guess

fierce hollow
#

I feel like dall-e has a really good vae compared to any sd model, it's evident when trying to feed the images into sd - like half the details are lost

indigo carbon
nimble heart
#

idk im a bit sus of pixart. They constantly bring up carbon emissions and hardly ever mention inference performance/quality

fierce hollow
#

all the models have some grain to them, idk why you keep saying that

rustic garnet
whole kettle
indigo carbon
nimble heart
tribal lantern
#

for any model, using it is the only way to tell whether it's good; cherry-picked images say nothing. DeepFloyd seemed promising, but I never managed to get anything remotely decent out of it that wasn't similar to '[subject] holding a sign with the text "wtfbbq this is next-level"'

nimble heart
#

also deepfloyd OOM's on my 24gig card...

#

it follows prompts decently well but the end result looks like hot garbage

indigo carbon
#

so maybe it could be possible to train something like SDXL with T5 and quantize the encoder? that seems like a logical way to go

nimble heart
#

why quantize the encoder

indigo carbon
#

because it's 6B

nimble heart
#

T5 is small enough to run on 8G cards isnt it?
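
Whether an encoder fits on a given card is mostly arithmetic: parameters times bytes per parameter. A minimal estimator, `weight_memory_gib` (a hypothetical helper), that counts the weights only, so real usage is higher once activations and framework overhead are added.

```python
def weight_memory_gib(num_params, bits_per_param):
    """Memory for the weights alone, in GiB (ignores activations and overhead)."""
    return num_params * bits_per_param / 8 / 2**30
```

Taking the 6B figure above at face value: about 11.2 GiB at fp16, 5.6 GiB at 8-bit and 2.8 GiB at 4-bit. The same arithmetic puts a 13B model at 8-bit around 12.1 GiB and a 30B model at 4-bit around 14.0 GiB.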

indigo carbon
nimble heart
#

just swap the encoder and unet from ram->vram

rustic garnet
#

in my opinion, quantizing the encoder would make total sense. As Aliquip mentioned, the T5 model is way too heavy for the image caption problem anyway

nimble heart
#

quantizing hurts performance so quickly it wouldnt make sense tbh

#

maybe 8bit would work?

rustic garnet
#

at least 8bit quantization wouldn't hurt much I guess
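
A minimal sketch of what plain 8-bit quantization does, assuming the simplest per-tensor symmetric ("absmax") scheme: scale the weights so the largest magnitude maps to 127, round to integers, and multiply back at use time. Libraries such as bitsandbytes use more elaborate schemes (finer-grained scales, outlier handling), so this only illustrates where the error comes from: at most half a quantization step per weight. Both function names are hypothetical.

```python
def quantize_absmax_int8(weights):
    """Per-tensor symmetric int8 quantization. Returns (integer codes, scale)."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Clamp guards against float rounding pushing a value just past +/-127.
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate float weights."""
    return [q * scale for q in codes]
```

With 4-bit codes the same scheme has only 15 levels instead of 255, so the per-weight error bound is roughly 17x larger, which is consistent with 8-bit "not hurting much".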

nimble heart
#

yea

#

I found 13B 8bit substantially outperforms 30B 4bit on my card

rustic garnet
#

uh, interesting

#

I always read the opposite

indigo carbon
#

so a model that has 8bit T5 and a UNET like SDXL would be a good idea?

nimble heart
rustic garnet
#

I mean, SAI tried T5 for SDXL and they decided for CLIP

nimble heart
#

could also be that all the 30B models are older which doesnt exactly help

nimble heart
#

the 2 clip thing is kinda weird to me
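
For context on the two-encoder setup: the SDXL paper describes concatenating the penultimate-layer token features of CLIP ViT-L (768 channels) and OpenCLIP ViT-bigG (1280 channels) along the channel axis, so the UNet's cross-attention sees a 2048-channel context (a pooled bigG embedding is used as well, not shown). A toy sketch of just that concatenation, with plain lists standing in for real encoder outputs; `sdxl_context` is a hypothetical name and the 77-token length below is an assumption.

```python
CLIP_L_DIM = 768        # CLIP ViT-L/14 token feature size
OPENCLIP_G_DIM = 1280   # OpenCLIP ViT-bigG/14 token feature size


def sdxl_context(tokens_l, tokens_g):
    """Concatenate per-token features from both text encoders along the channel axis."""
    if len(tokens_l) != len(tokens_g):
        raise ValueError("both encoders must see the same number of tokens")
    for fl, fg in zip(tokens_l, tokens_g):
        if len(fl) != CLIP_L_DIM or len(fg) != OPENCLIP_G_DIM:
            raise ValueError("unexpected feature size")
    return [fl + fg for fl, fg in zip(tokens_l, tokens_g)]
```

For example, `sdxl_context([[0.0] * 768] * 77, [[1.0] * 1280] * 77)` gives 77 tokens of 2048 values each.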

rustic garnet
#

so I guess that T5 might give better text understanding but this doesn't mean the images look better

nimble heart
#

wonder if it was to make it more compatible with 1.5 style prompting?

vale eagle
#

2 CLIPs are actually good

rustic garnet
nimble heart
#

they tested this all in the bots

#

so I'm guessing the new and old CLIP working together sorta helped with people constantly using 1.5-style prompts in the bots

#

T5 probably needs a totally different prompt structure so people had shit results

rustic garnet
vale eagle
#

But yeah, prompting style also affects the result. People are dumb compared with an LLM.

indigo carbon
tribal lantern
rustic garnet
#

why?

#

CLIP is a multimodal model

#

makes sense for a text to image model

#

T5 is a pure text model that has never seen any image and was never trained on image captions

nimble heart
#

i suppose that'd mean if they went T5, things like unclip/clipvision wouldnt work

#

I guess you could always interrogate with clip then just feed the text into T5?

rustic garnet
#

(and yes, I totally agree that training on a text corpus makes sense to get a model that is better in text understanding. I just say that CLIP is not totally stupid)

nimble heart
#

wouldnt be ideal though

indigo carbon
#

to achieve image input you'd need the conditioning to have the images

nimble heart
#

regardless of what they do next with text encoding I just hope they ditch the refiner lol

#

I cant say I've used it once in the last month

indigo carbon
#

just extra params

tribal lantern
#

hmm, refiner....

vale eagle
#

It is kinda interesting. You could use GPT4 to read the image, create prompt and feed it into sdxl to get the required result

nimble heart
#

main problem is the refiner butchers high frequency details

#

like it can make some structures look better but the image almost looks lower res as a result

indigo carbon
vale eagle
#

no

tribal lantern
#

often the refiner is a skip; then I use Fooocus with all the defaults and am amazed how nice the results are

rustic garnet
#

in my opinion, Figure 3 in the PixArt alpha paper shows nicely why their method might work so well

indigo carbon
# vale eagle no

if you feed it an image of let's say: a dog, then use the output as a prompt- it WON'T be the same dog

vale eagle
#

It wouldn't be same via vae

nimble heart
rustic garnet
indigo carbon
nimble heart
rustic garnet
#

yeah, they should refine LAION

vale eagle
#

they have

rustic garnet
#

and as they write: if your captions are well aligned, you need less data to train

nimble heart
#

yea kinda why you can train a Lora with just like 100 or so hand-captioned images

indigo carbon
#

is that what the DALL-E 3 paper is about? I haven't read it yet

nimble heart
#

that's the pixart paper

nimble heart
#

Ah I meant kaibo's screenshot

indigo carbon
#

I was talking about this

tribal lantern
vale eagle
#

I think dalle3 is doing better on this

indigo carbon
nimble heart
#

pixart is a tiny (ish) model by comparison though. it might be a better approach for local inference

#

hence why it's also AGPL licensed

mellow tendon
#

I never had much luck the few times I tried to prompt PixArt.

nimble heart
#

a high quality model built specifically for easy training and local inference licensed under AGPL is a winning combo if it actually turns out to not be garbage

#

but for now we wait for a diffusers pipeline

vale eagle
#

PixArt tried LAION but chose another dataset for training (Table 1)

rustic garnet
vale eagle
#

LAION-LLaVa is the refined dataset

rustic garnet
#

yes

#

but I mean LAION themselves

nimble heart
#

also waifus but beyond that its other datasets

vale eagle
rustic garnet
#

yes, but they created the captions for LAION and SAM themselves

#

what I meant: did LAION ever come up with the idea of refining their captions? I think pseudo said that once, but I never found evidence for that

upbeat summit
vale eagle
indigo carbon
#

it seems that T5 isn't limiting PixArt from having image input, they were able to make ControlNets for it

rustic garnet
indigo carbon
rustic garnet
#

the control net is its own image encoder

#

a control net is a separate network

#

that takes the control net image as input

indigo carbon
#

it won't be able to blend images it seems

rustic garnet
#

not if they haven't trained for it

nimble heart
tribal lantern
#

the code uses diffusers and follows a similar structure/api

#

doesn't seem hard to incorporate into diffusers

nimble heart
#

at make a safetensors file

tribal lantern
#

they seem to really do their best to blend into the existing generative ai eco system

nimble heart
#

yea they just made the HF page two days ago so im not expecting miracles

#

but diffusers is listed on the "todo" so 100% just wait for that instead of trying to hack their inference code into a UI

#

hypothetically it should just work™️ on sd.next then

indigo carbon
#

I compared some of their showcased results with SDXL, SDXL seems to be better in most cases

#

so better language understanding or not, SDXL still takes the cake

#

though I don't doubt SDXL would be even better if it was trained in the way they trained that model

#

so maybe we'll get an "SDXL2" or even an "SD3" that will have better language understanding

nimble heart
#

wonder how pixart does with dark latents

wet nacelle
soft bone
# upbeat summit

it's interesting how XL still has this split ground perspective problem. idk how it can master reflections but can't keep the ground level. happens to me constantly

nimble heart
#

oh hell it's the kung fury hacker dude

#

Kung Fury 2 apparently debuts November 17th

wet nacelle
icy brook
weary yacht
#

what's the largest SDXL images you guys have made?... I'm trying a 3440x1440 right now

weary yacht
#

yeah... 3440x1440 is a no go

vital ermine
eternal fog
#

That's more normal lol

indigo carbon
#

PixArt doesn't have an image encoder due to trying to be as efficient as possible in training, but they theoretically could

#

This might be true of DALL-E 3 as well

#

In the case of DALL-E 3, the language understanding doesn't come from the text encoder though, they explained it was the dataset they trained it on according to the paper

eager onyx
#

programmer

hoary saddle
#

did someone mention a few days ago, a website where you can upload a dozen or so images and it will make a lora for you?

nimble heart
weary yacht
#

two pass?

nimble heart
#

what the automatic1111 UI calls "high res fix"

weary yacht
#

you mean when you make a lower res image then use AI to upscale and add detail?

#

or just upscale and enhance?

nimble heart
#

feed the text2img result into an img2img

#

so the former I suppose
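
A rough sketch of the two-pass idea: generate at a smaller size, upscale, then run img2img where the denoise strength decides how much of the schedule is re-run. The diffusers-style convention of running the last `int(steps * denoise)` steps is assumed here (UIs differ in how they count steps), sizes are snapped to multiples of 8 because the VAE downsamples 8x, and `first_pass_size`/`img2img_steps` are hypothetical helpers.

```python
def first_pass_size(target_w, target_h, upscale, multiple=8):
    """First-pass resolution for a two-pass ("hires fix") run, snapped to a multiple."""
    def snap(x):
        return max(multiple, round(x / upscale / multiple) * multiple)
    return snap(target_w), snap(target_h)


def img2img_steps(num_steps, denoise):
    """Indices of the sampler steps that actually run in the second (img2img) pass."""
    run = min(int(num_steps * denoise), num_steps)
    return list(range(num_steps - run, num_steps))
```

For a 1920x1080 target with a 1.5x upscale, the first pass is 1280x720; at 30 steps and 0.5 denoise, only the last 15 steps run on the upscaled image.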

weary yacht
#

yeah, I'm mostly just stressing my hardware out to see what it'll do

nimble heart
#

4k is like 7 seconds per it for me lol

#

so I only do like 15 samples

weary yacht
#

what's your GPU?.. I was getting about 7s/it for 3440x1440

nimble heart
#

7900 XTX

weary yacht
#

i generated a 3440x1440 image but it failed at the decode phase

nimble heart
#

use tiled

weary yacht
#

vae decode tiled?

#

I'm going to attempt a 50-step 3440x1440 with tiled decoding and see if it works, or if a red text box pops up and cusses me out again
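
Tiled decoding splits the latent into overlapping tiles, decodes each one separately and typically blends the overlaps, so peak VRAM depends on the tile size rather than the full image. A sketch of just the tile layout along one axis, assuming 64-latent-pixel tiles (512 px) with an 8-pixel overlap; these numbers and `tile_spans` are illustrative, not ComfyUI's defaults. At 3440x1440 the latent is 430x180, since the VAE downsamples 8x.

```python
import math


def tile_spans(length, tile, overlap):
    """(start, end) offsets of evenly spread, overlapping tiles covering [0, length)."""
    if not 0 <= overlap < tile:
        raise ValueError("need 0 <= overlap < tile")
    if length <= tile:
        return [(0, length)]
    stride = tile - overlap                      # largest allowed step between tiles
    count = math.ceil((length - tile) / stride) + 1
    # Integer spacing keeps every step <= stride, so neighbours overlap by >= overlap.
    starts = [i * (length - tile) // (count - 1) for i in range(count)]
    return [(s, s + tile) for s in starts]
```

The 2D grid is every combination of `tile_spans(430, 64, 8)` for width and `tile_spans(180, 64, 8)` for height.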

nimble heart
#

jeeze try it out on like 5 steps first make sure it decodes

weary yacht
#

boom.. 50 steps 3440x1440

#

12 minutes, roflmao

#

there is so much wrong with it too like how planets and even a sun are sitting on the ground, but that wasn't really the point

nimble heart
slender coral
#

Got a prompt question: when prompting, everything I get looks extremely new, almost like a 3D render for some things. The way I change that is to use "old" and "dirty", which works 80% of the time, but when I want something like an animal to look like a natural-looking animal, "dirty" makes it dirty. Any suggestions on prompts to fix this?

crisp owl
#

photograph of
cinematic photo of
portrait shot of
digital photo of
movie still of

hoary saddle
#

made an image gallery from the ComfyUI output folder, no more searching for an image from 2 weeks ago

#

mobile friendly 🙂

weary yacht
nimble heart
#

1.5 is slower than XL after 768x768

hoary saddle
weary yacht
#

yeah, but you can start out with a smaller image, and use it for img2img when you create one that has the basic look that you want, and you can upscale before plugging that into img2img and make bigger images based off that smaller one.. the benefit of doing the smaller ones is you can generate a lot of them, often several at a time, and reach a starting point faster to build from

#

if I do a 1920x1080 image, and there are imperfections, it requires a lot of subsequent effort with inpainting to correct, and all of that may be avoided by starting out with an image already close to what you want

nimble heart
nimble heart
#

bonus image

shy kelp
nimble heart
#

base XL no refiner

#

prompt uhhhh

#

if you use comfyui you can just drag the image onto the canvas and see the workflow

shy kelp
#

I'm completely new to SD

#

I'll google ComfyUI and base XL

nimble heart
#

grainy deep ocean footage of a an monstrous tentacle woman in the abyssal depths below

nimble heart
#

like the normal model

#

base version

shy kelp
#

oh gotcha

#

just made this as one of my first uh...pictures? idk what to call them

vital ermine
lusty wolf
#

Just happy my Comfy is working again...

vital ermine
vital ermine
lilac wren
#

@vital ermine Alice in Wonderland and Madmax?

#

and ghost in the shell 🙂

glass notch
south horizon
vital ermine
vital ermine
limber citrus
#

The quality is amazing 👍🏻

tropic turret
strong copper
vital ermine
tepid sinew
#

How do I make pics?

vital ermine
glass notch
vital ermine
vale eagle
vital ermine
glass notch
# vital ermine

This could be a superhero whose power is to replace and repair public water utilities

south horizon
glass notch
tropic turret
vital ermine
vale eagle
vital ermine
south horizon
south horizon
lusty wolf
#

Chalk Lora from @stone fossil

south horizon
vale eagle
#

Testing my new fine tune

south horizon
vale eagle
noble shoal
vale eagle
south horizon
vale eagle
mellow tendon
vale eagle
mellow tendon
vale eagle
#

no

#

I used LLaVA to describe the image and generate the prompt

indigo carbon
#

I took a look at DALL-E 3's paper, they do indeed use T5

jolly creek
steady grove
indigo carbon
steady grove
steady grove
#

time to pour some extra strong coffee

steady grove
indigo carbon
steady grove
#

/shrug

targeting home gpus probably

indigo carbon
#

T5 can be quantized, that's no excuse

steady grove
#

would a t5 trained model work with a 3070?

indigo carbon
#

Yes, easily. An 8-bit T5 won't hurt it

steady grove
#

also i think they like openclip because they have the license to it. t5 has a restrictive license, doesn't it?

vale eagle
indigo carbon
#

I'm almost sure OpenAI aren't using full precision on T5

#

Even 8-bit would be enough to make it run on about the same scale as CLIP

steady grove
#

i'm sure if it were easy we'd see more researchers doing stuff with it. there's probably big caveats. t5 has been out for a long while

#

it's very impressive too

#

people don't just ignore that. there's gotta be a reason why

steady grove
#

Kandinsky never used it either, right?

indigo carbon
vale eagle
#

At the time, people were used to CLIP-L-style prompts. I think T5 might not give good results with prompts like that.

steady grove
#

lots of other models coming out but i only see google and other big proprietary ai companies using it. must be a lot of licensing issues tied to it

modern kraken
#

It really is the simple things that hook me.

steady grove
indigo carbon
#

DALL-E 3 prompts very easily and it uses T5

vale eagle
#

prompts don't go directly into dalle 3

steady grove
#

you can punch sdxl style prompts into dalle3 and it'll do fine

#

has better comprehension too

vale eagle
#

user prompt -> gpt -> descriptive prompt -> dalle 3 T5

steady grove
#

That's if you're using it through chatgpt. there are other interfaces

indigo carbon
steady grove
#

the api for it is wide open now

#

"open"

vale eagle
#

Dalle 3 was trained with descriptive prompts. They use gpt to generate prompts in that style to use the maximum capability of the model

steady grove
#

using gpt to rewrite a prompt won't give it better comprehension. you can't just throw gpt4 at sdxl and get dalle results

#

t5 is the core reason why dalle prompting is so good

vale eagle
indigo carbon
steady grove
#

you can actually score prompt comprehension through a variety of methods

vale eagle
#

prompt comprehension doesn't come only from T5

indigo carbon
#

maybe SAI didn't use T5 because quantizing wasn't a thing when they began training SDXL?

#

because if quantized properly, T5 can have close performance to CLIP

nimble heart
#

doesn't DFIF use T5 already?

#

they're pretty familiar with T5 so I assume the reasons for going clip on XL was more than just the extra gig of vram

indigo carbon
nimble heart
#

pixel like there's no VAE?

#

maybe that's why the images look like garbage lol

indigo carbon
#

so no, no VAE

indigo carbon
nimble heart
#

the double-upscaling sucks ass. pixel could maybe work with a better method for that

#

though maybe using unedited images directly instead of the VAE makes it noisier

steady grove
#

i did a dozen generations with df and decided it wasn't feasible as a tool

pure crystal
indigo carbon
#

maybe performance, but we can fix that via quantization with minimal loss as of now

nimble heart
#

apparently T5 by itself uses > 12GB

indigo carbon
nimble heart
#

if you quantize T5 down to 4bit it's going to destroy the quality

#

and 8bit would still be like 7 gigs

#

which is as much as the entire XL pipeline right now
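Back-of-the-envelope weight memory is just parameter count times bits per weight. A sketch assuming decimal gigabytes and ignoring activations and quantization overhead (scales, zero points); the 4.7-billion parameter count in the usage lines is an assumed round figure for a T5-XXL-sized encoder, not a measured one, and `weight_gb` is a hypothetical helper.

```python
def weight_gb(num_params: int, bits: int) -> float:
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits / 8 / 1e9


if __name__ == "__main__":
    assumed_params = 4_700_000_000  # assumed round figure, not an official count
    for bits in (32, 16, 8, 4):
        # Real runtime usage is higher: activations and quantization metadata are not counted.
        print(f"{bits:>2}-bit weights: ~{weight_gb(assumed_params, bits):.1f} GB")
```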

indigo carbon
nimble heart
#

I've literally made my own 4bit quants and they suck ass

#

even 6bit can be sketch

#

if you compare with what the full fp16 model does it totally destroys the outputs

indigo carbon
#

you didn't do it properly then.. it doesn't make sense that it barely affects LLaMA but destroys T5

#

or maybe degradation rate is higher with 4.3B models?

nimble heart
#

tf you mean I didn't do it properly. have you actually used higher than 4bit on meaningful models?

#

if you only ever use 4bit gptq then they seem nice until you try the same model at 8bit+

steady grove
#

my figuring is if it was as good as hype says it is, it would've caught on by now. i used the 4bit llama too and it descended into garbage after two prompts. couldn't make it work at all.

#

24gb doesn't seem to be enough for a llm that doesn't dissolve into gibberish with the slightest bit of context

nimble heart
steady grove
#

maybe i'm doing it wrong, okay, but if it were simple to deploy and usable, people would be using it. That's how i see things.

indigo carbon
indigo carbon
steady grove
#

i hear that okay and i see them using it, but i feel like they're struggling to use it and are pretending it's all good

#

comes to a certain point where i might as well just write things myself

nimble heart
#

people use 4bit because they have to

#

most dudes are still on 8 or 12 gigs of VRAM

indigo carbon
nimble heart
#

i mean what's there to complain about?

#

the alternative is using cpu offload with transformers and getting 0.2 it/s

steady grove
#

um, there's a HIGH bar to use LLM. a level of technical know how. people complain constantly about that. there's a huge sea of newbs wishing they could use LLMs of any kind and they're all rabbling

indigo carbon
nimble heart
#

so having 4bit fit entirely in vram and running @ 5 it/s is a godsend to them

#

tokens whatever

#

same thing

steady grove
#

it's hard to find good advice for oobabooga because of how many newbs are trying to make their own e-gf

#

maybe you filter out all the noise but i assure you there are complaints

indigo carbon
#

so you're saying SAI doesn't use T5 due to performance, eh?

steady grove
#

no. i think it's more about compatibility and ease of deployment

nimble heart
#

if you really want T5 you can always use DFIF

steady grove
#

that includes performance, but also helping people implement niche software libraries

indigo carbon
#

look, a model that uses T5 is going to release soon, we'll see how that does

nimble heart
#

pixart?

indigo carbon
#

yeah, that uses T5

nimble heart
#

they already have an inference script on their github

#

go try it

#

I'm just waiting for the diffusers pipeline

#

so it'll work in sd.next

steady grove
#

i disregarded pixart when i first heard about it, because they were bragging about how few carbon emissions it cost to train. i think all of that is just dumb poppycock nonsense. minimizing the carbon footprint of a single project isn't going to do jack all. we need to plant trees.

Anyone trying to brag about their carbon footprint is a scammer, like recycling companies are, so i tend to lose trust when i see it

#

it might be good, but they're using carbon footprint to hype it, so i don't think it has any legs

nimble heart
#

optimistically, if it's as easy to tune as they say it is and runs well locally then it could be a success if the images arent garbage.

AGPL license, nice.
#

but the fact that they bring up carbon emissions 5x as often as inference quality is sus

steady grove
#

its out now? oooo. worth a look, but i'm a giant cynic about carbon footprint obsessed tech projects

#

oh no no weights yet

#

and Inference requires at least 23GB of GPU memory.

lusty wolf
#

Something pretty for a change... 😜

steady grove
#

sdxl always struggles on teeth really bad i've noticed. worse than hands ever were. Cool image though. life and death

noble shoal
steady grove
#

no canines. all incisors. sdxl LOVES incisors

lusty wolf
#

Cheers...

vale eagle
noble shoal
indigo carbon
#

the entire SDXL model has more params than that and it doesn't take as much

nimble heart
#

different architecture

#

look on pixart's HF. the t5 encoder is literally like 16GB of just weights

indigo carbon
steady grove
nimble heart
#

so considering one of the main points of XL was fitting on 8gig cards I think t5 was automatically off the table

indigo carbon
nimble heart
#

bigger

indigo carbon
nimble heart
#

even if you halved the size with a quant without destroying the inference quality t5 would still be bigger than XL

steady grove
#

bigger in bits not unrelated parameters in different architecture.

nimble heart
#

in terms of gigabytes

indigo carbon
#

yeah, I see

steady grove
#

some pentiums reached 4ghz. they are not faster cpus than modern i3's

nimble heart
#

both clips combined are like one gig I think

steady grove
#

how does df's t5 implementation work? i was running it on my pc and it didn't need 24gb

nimble heart
#

maybe an "SDXXL" targeting 16 or 24gb minimum would work

nimble heart
#

unless there's improvements in diffusers now

steady grove
#

made a dozen or so images when it dropped public.

#

was slow though

nimble heart
indigo carbon
steady grove
#

when nvidia releases the 5080 it'll be good right? that'll come with 30gb right??? /padme

nimble heart
#

the 3080 had 10 gigs and the 4080 was gonna have 12 before they renamed it

indigo carbon
nimble heart
#

if you just want vram and nothing else you can get an A770 16gb for like $300

steady grove
#

no i like speed too

#

i've considered older cards though. might still yet

nimble heart
#

7900 XTX 24gb for $950. outperforms a 3090 when on equal playing field

indigo carbon
#

they literally used a 128-bit memory bus on almost the entire 4000 series, that's stupid

#

idk if they'll do that to 5000 series

steady grove
#

gets you banned in counterstrike though (not that i play i just think it's hilarious and typical amd driver moment)

nimble heart
#

play better games, ez

steady grove
#

i grow weary of amd. was using them for a long while. 4080 is my first nvidia gpu tbh

nimble heart
#

my XTX absolutely demolishes VR games

indigo carbon
nimble heart
#

running everything at 150% of my headset's resolution

steady grove
#

yeh when it works it works. Linux drivers were so superior when i used amd. gained 15fps in a lot of games when i had my vega64

steady grove
nimble heart
#

with Proton on Linux I can play Devil May Cry 5 @ 8k Ultra and still get 80fps

steady grove
nimble heart
#

shit's cracked

#

the mesa ray tracing isn't that good yet though

steady grove
#

you can get some nice fps boosts on windows too if you use dxvk wrappers on old games

steady grove
nimble heart
#

for cyberpunk I still switch to windows

#

mesa right now is like 1/5th the speed

#

apparently if you compile it from git it's "up to" like 1/2 speed

steady grove
nimble heart
#

still have to play that I need to finish other games first

indigo carbon
#

SDXL's quality is certainly good, but it doesn't follow the prompt as much as T5 models do

#

maybe SAI'll make something that replaces CLIP? T5 isn't the solution for what they're going for

steady grove
steady grove
#

what i love about stability is they are all in on researching this stuff to run on consumer hardware instead of corporate hardware

nimble heart
#

dfif in shambles

steady grove
#

credit to others who are contributing to that effort too of course, but sai seem to be leading the pack here

steady grove
indigo carbon
nimble heart
#

I think they should make a dfif2 tbh

#

could be their "sdxxl" for absolute highest quality at cost of all your vram

steady grove
#

i'd never be able to get that from sdxl. that's dalle

indigo carbon
steady grove
#

a fluffy cat on their back, playing with a computer mouse as if it was a real live mouse

#

a fluffy cat pouncing on a computer mouse like it was a real live mouse

the way it understands prompts is phenomenal for real

#

it just sucks that it needs corporate datacenter level computation

#

you can see in his eyes, he wants to eat

indigo carbon
steady grove
#

yeah the quality of renders compares very well. especially if you're a skilled operator

indigo carbon
bright valley
#

It's a lot different that's for sure

#

But once you get the hang of how it works you can definitely control it a little better

steady grove
#

Yeah often, i prompt knowing that sdxl isn't going to get the prompt very well. i'm just throwing stuff out there to sort of nudge it towards what i want

#

sd1.5 did that even more so. prompt salads i use extensively on that side

bright valley
#

Like I made this render with SDXL and there's 30 different anime in it, and I think it nailed just about all of them, through prompting alone #🎥|animation message

indigo carbon
glad grove
#

try with something harder like a "1girl praying inside a dark temple with a golden buddha statue with 16 arms in background" this one took me like 140 images to get on sdxl

indigo carbon
#

takes me less than 30s to generate an entire batch like this

bright valley
#

I don't use things like 1girl on SDXL

#

that's kinda exactly what I mean

#

that works great on 1.5 but certainly not XL

steady grove
#

it's on sd15 because novel ai did all that expensive work and everyone stole it

glad grove
#

first try on dalle

indigo carbon
glad grove
#

wonder if you could expand the scene with outpainting, idk if sd would understand the details and add them properly

steady grove
#

microsoft likely found a sanitized booru tag dataset to train with. likely are investing millions into data set building

indigo carbon
steady grove
#

sd15 is a poisoned well. a lot of garbage happened in its early development before the popularity kicked up

south horizon
south horizon
steady grove
#

if mona lisa were a hot valley girl

half cedar
half cedar
crisp owl
#

Bing can make some great images at times, as Dalle3 can, but they both are so grainy if blown up to anything beyond quick viewing size

half cedar
high skiff
#

Wanted to give a little update on my realism LoRA progress. Here are some new examples of what it looks like now

Top left is mine, top right is RealisticVisionXLV2, bottom left is Realism Engine, and bottom right is Real Stock Photo

Current dataset is only 90 images and not trained too well. Working on the 500 images version with very meticulous tagging. Also experimenting with some new papers in training with the goal to get much higher fidelity, and much better brightness control

#

It's being trained to mimic the look of properly color graded professionally photographed portraits and various other image subjects

crisp owl
#

"hypertile" in recent comfyui commit 🤔

nimble heart
#

yea it's broken

#

supposed to help 1.5 models scale like XL does I believe

#

but it always errs out

crisp owl
#

ah

nimble heart
#

it tiles the first unet attention or something

#

so it doesn't blow up at high res

#

so it should make a sorta more linear XL performance curve instead of just super exponential
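To see why tiling attention changes the scaling: self-attention cost grows with the square of the token count, so doubling both width and height multiplies it by 16. A simplified cost model, assuming attention runs on the latent grid (image size / 8) and each fixed-size tile only attends within itself; this is not HyperTile's actual implementation, and `attention_pairs` is a hypothetical helper.

```python
from typing import Optional


def attention_pairs(width: int, height: int, tile: Optional[int] = None, factor: int = 8) -> int:
    """Query-key pairs for one self-attention layer on the latent grid."""
    tokens_w, tokens_h = width // factor, height // factor
    n = tokens_w * tokens_h
    if tile is None:
        return n * n  # every token attends to every token: quadratic in n
    if tokens_w % tile or tokens_h % tile:
        raise ValueError("this simple model assumes the tile divides the grid evenly")
    per_tile = tile * tile
    # (n / per_tile) tiles, each costing per_tile ** 2 pairs, gives n * per_tile: linear in n
    return n * per_tile


if __name__ == "__main__":
    for size in (512, 1024):
        full = attention_pairs(size, size)
        tiled = attention_pairs(size, size, tile=32)
        print(f"{size}px: full={full:,} pairs, tiled={tiled:,} pairs")
```

Going from 512 to 1024 pixels per side costs 16x more pairs with full attention but only 4x more with fixed-size tiles, which is the "more linear" curve described above.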

crisp owl
#

Could be cool, does say still testing, so maybe it'll get perfected soon-ish

nimble heart
#

and in the future it'll maybe expand to other areas of the model

clever verge
vital ermine
high skiff
#

I can share more information about it tomorrow, I have a horrifically bad headache at the moment, and I'm off to sleep

clever verge
high skiff
#

Has very nice fine details. My LoRA wasn't trained fully enough to really pick up on fine details

Yours has a bit more of a painterly look mixed in with the realism which is a nice aesthetic

clever verge
#

I've mixed in some post process grain and LUT adjustments but it's subtle.

heavy zinc
#

Hi all, am I using SDXL? I have SD 1.6

uncut fiber
#

depends on the model you are using. If it's about 6GB and has XL in the name, most probably

noble shoal
clever verge
nimble heart
#

3 legs == 2 vagonyas

clever verge
#

Normally when you have as light skin as she has I'd say it's normal to be white on the non-sun side or do you see anything else that I have missed?

nimble heart
#

her left arm is tanned like she hangs it out the window while driving

#

compared to her right arm and first two legs

steady grove
nimble heart
#

cookie monster always scared me

steady grove
#

crumbs alllllll over his kb what a mess

noble shoal
steady grove
#

cookie thing

#

dalle totally got what i was prompting for

lapis gale
#

warning, cookie overload

#

cuuuuute

noble shoal
indigo carbon
#

do you see it?

noble shoal
#

Who in the right mind has not commented out this download out of the Automatic1111 code? smh

nimble heart
#

it only does that if your models folder is empty

fierce rivet
#

Hi, newbie here. quick question with regards to the image dimension when generating with sdxl checkpoint in Sd webui. Do I keep it at 512 x 768 and then upscale it by x2, or just generate it at 1024 x 1536 without upscaling?

indigo carbon
#

not nearly as efficient as ComfyUI

noble shoal
#

But loading this antique checkpoint....

nimble heart
#

512x768 is actually too small for XL
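One common heuristic is to keep the pixel count near the roughly 1024x1024 size SDXL was trained around, with each side a multiple of 64. A sketch under those assumptions; `sdxl_resolution` is a hypothetical helper, and the exact bucket lists used by UIs and trainers may differ.

```python
from math import isqrt


def sdxl_resolution(ratio_w: int, ratio_h: int, target_pixels: int = 1024 * 1024,
                    multiple: int = 64) -> tuple[int, int]:
    """Width/height near `target_pixels` for aspect ratio ratio_w:ratio_h, floored to `multiple`."""
    # Integer square roots avoid float rounding at exact sizes such as 768 or 1024.
    exact_w = isqrt(target_pixels * ratio_w // ratio_h)
    exact_h = isqrt(target_pixels * ratio_h // ratio_w)
    return (exact_w // multiple) * multiple, (exact_h // multiple) * multiple


if __name__ == "__main__":
    for ratio in ((1, 1), (2, 3), (16, 9)):
        print(ratio, sdxl_resolution(*ratio))
```

For the 2:3 shape of 512x768 this gives 832x1216, which generates natively at XL scale instead of relying on a 2x upscale from an undersized image.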

fierce rivet
indigo carbon
#

also I think SAI are working on a new encoder, I looked at things mentioned in the SDXL release and Emad said something about a future SD3.0 being entirely different

nimble heart
#

emad speculates on a lot of things. we'll only see when the time comes

vital ermine
vast ridge
vital ermine
noble shoal
#

Made a test training a lora on text. Getting interesting results so far.

vital ermine
lilac wren
noble shoal
vital ermine
noble shoal
vital ermine
lusty wolf
vital ermine
half cedar
vital ermine
thorny frost
#

hi guys! Any model recommendation to generate landscapes?

vital ermine
thorny frost
#

XL model I mean

cyan crown
vital ermine
cyan crown
strong copper
#

final stylization

cyan crown
wet nacelle
#

SDXL 1.0 base is still very good guys.

#

@vital ermine How many LoRAs do you have in the works?

wet nacelle
vital ermine
vital ermine
#

About to release this one if this training works right but buckets are not playing nice

noble shoal
wet nacelle
noble shoal
wet nacelle
#

and it's gone

noble shoal
noble shoal
wet nacelle
indigo carbon
vital ermine
half cedar
#

Sdxl -> sd1.5 dreambooth -> Pika -> After Effects

sweet wyvern
#

what's the minimum spec for the XL refiner?

crystal gazelle
#

Does anyone know a proper tutorial to download SDXL? I've tried so many and every time I get some cmd error

rustic garnet
#

try invokeai, that's quite user friendly

wet nacelle
steady grove
crystal gazelle
#

@steady grove Will it allow me to generate AI images from prompt

steady grove
#

matrix won't. it's an installer for various UIs like automatic1111, sd.next, Fooocus

#

one of those will do prompts to images if you've got the hardware for it

wet nacelle
#

Too tall

#

He fell

noble shoal
cyan crown
wet nacelle
cyan crown
wet nacelle
cyan crown
wet nacelle
# cyan crown can you provide your prompt ?

Pos: vhs camcorder footage of bladerunner Japanese town

Neg: black and white (cartoon), 3d, render, low res, low resolution, ((text)), ((watermark)), ((logo)), tongue out, ugly, masculine, vibrant, .com, ((tanlines)), (( ososedki.com))
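In A1111-style prompt syntax each pair of parentheses multiplies a term's attention weight by 1.1, so `((text))` comes out around 1.21. A simplified parser for that rule only; it ignores explicit weights such as `(word:1.3)`, square brackets, and escaped parentheses, and `parse_emphasis` is a hypothetical helper, not the webui's own parser.

```python
def parse_emphasis(prompt: str, step: float = 1.1) -> list[tuple[str, float]]:
    """Split a prompt into (text, weight) segments based on parenthesis nesting depth."""
    segments: list[tuple[str, float]] = []
    buf: list[str] = []
    depth = 0

    def flush() -> None:
        # Emit the buffered text at the current nesting depth, trimming separators.
        text = "".join(buf).strip(" ,")
        if text:
            segments.append((text, round(step ** depth, 4)))
        buf.clear()

    for ch in prompt:
        if ch == "(":
            flush()
            depth += 1
        elif ch == ")":
            flush()
            depth = max(depth - 1, 0)  # tolerate unbalanced closing parens
        else:
            buf.append(ch)
    flush()
    return segments


if __name__ == "__main__":
    print(parse_emphasis("black and white (cartoon), 3d, ((text)), ((watermark))"))
```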

wet nacelle
#

yuppers

primal vault
cyan crown
high skiff
#

@noble shoal how does your LoRA work for making things out of text? My research group has a couple people researching text performance for SDXL, and one person who is doubting how good SDXL could ever be for text

I'd love to see what else your text LoRA can do, or even play around with it if you'd be so kind

#

From what I have seen so far, I'm quite impressed to say the least

half cedar
noble shoal
# high skiff From what I have seen so far, I'm quite impressed to say the least

Thank you. Well, my one man research group has carefully captioned 98 photos with text in it. I included {ObjectInPicture} with the text "{Text shown in the image}" on it in every caption. I might be hallucinating but i think overall text coherency improved. One or two words get usually nailed instantly. I managed up to 6 word sentences in my tests. So yeah, i guess if the dataset is captioned good enough, SDXL has no problems with text.
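The captioning pattern described above can be templated so every caption stays consistent. A sketch assuming one `.txt` caption file per image with the same file stem (a common convention for LoRA trainers, but check your trainer's caption-extension setting); `make_caption` and `write_captions` are hypothetical helpers.

```python
from pathlib import Path


def make_caption(subject: str, text: str) -> str:
    """Caption in the '{object} with the text "{text}" on it' pattern."""
    return f'{subject} with the text "{text}" on it'


def write_captions(folder: str, records: list[tuple[str, str, str]]) -> None:
    """Write one caption per image: photo_01.jpg -> photo_01.txt (assumed naming convention)."""
    root = Path(folder)
    for image_name, subject, text in records:
        caption_path = root / (Path(image_name).stem + ".txt")
        caption_path.write_text(make_caption(subject, text), encoding="utf-8")


if __name__ == "__main__":
    print(make_caption("a wooden sign", "OPEN 24 HOURS"))
```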

high skiff
#

It's quite incredible how fast SDXL seems to pick up on concepts with around 90 images of properly tagged data

high skiff
#

My realism LoRA is 90 images (working on a new much better 500 image version), and it makes a monumental difference compared to even the best realism finetunes out there

#

Mine is top left in all three

#

It's trained specifically for much better lighting, foreground/focus/background separation, and overall DSLR dynamic range compression

#

It's also trained to work with painfully simple prompts

#

"a portrait photograph of a black woman with blonde hair wearing a green suit at dusk in front of a shop"

#

Unfortunately, I'm only on my phone right now so I don't have any more examples, but I've probably tested at least 80 comparisons

noble shoal
#

Oh, maybe this makes also a difference, but i am unsure. My training images have only a resolution of max. 768x512. This allows me also to create images in this resolution and then upscale them.

noble shoal
high skiff
#

my training images are 4k-8k+ lol

It doesn't matter much right now, but the final version of my LoRA should be able to handle absurd detail levels

wet nacelle
noble shoal
high skiff
noble shoal
radiant tartan
#

for OpenPose ControlNet on SDXL, I've tried a few openpose models and none seem to work. no errors, the image just never comes close to the pose. do i need that 5gb openpose model? i tried the smaller one and no luck..

steady grove
# radiant tartan

openpose controlnets are harder to train and only know the poses in their data set. that's a tricky one

radiant tartan
#

or maybe a simpler pose

steady grove
#

haha yeah simpler poses. i'll try using it see what happens

#

you're also prompting really loosely. "Astronaut doing a front kick" might work better

radiant tartan
#

original