#✨|sdxl
can someone make stable video diffusion that generates 100 frames at once
That’s crazy
Does anyone know what this error means in comfy? Using an SDXL model.
```
ERROR diffusion_model.output_blocks.5.1.proj_in.weight shape '[640, 640]' is invalid for input of size 1638400
ERROR diffusion_model.output_blocks.5.1.transformer_blocks.0.attn2.to_v.weight shape '[640, 2048]' is invalid for input of size 983040
ERROR diffusion_model.output_blocks.5.1.transformer_blocks.0.attn1.to_out.0.weight shape '[640, 640]' is invalid for input of size 1638400
ERROR diffusion_model.output_blocks.5.1.transformer_blocks.0.attn1.to_q.weight shape '[640, 640]' is invalid for input of size 1638400
```
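One explanation that fits these numbers (not confirmed in the thread) is SD 1.5 weights, such as an SD 1.5 LoRA, being applied to an SDXL model. In SD 1.5, `output_blocks.5.1` works at 1280 channels with a 768-wide text context, so `proj_in` is 1280 × 1280 = 1,638,400 elements and `attn2.to_v` is 1280 × 768 = 983,040. SDXL's block with the same name is 640 wide with a 2048-wide context. The small stdlib sketch below checks which candidate shape each reported size matches; the candidate table is an assumption written out for illustration.

```python
import math

# Candidate (out, in) shapes for output_blocks.5.1 in each architecture (assumed values).
CANDIDATES = {
    "SDXL proj_in / to_q / to_out": (640, 640),
    "SDXL attn2.to_v": (640, 2048),
    "SD1.5 proj_in / to_q / to_out": (1280, 1280),
    "SD1.5 attn2.to_v": (1280, 768),
}

def matching_shapes(input_size):
    """Names of candidate shapes whose element count equals input_size."""
    return [name for name, shape in CANDIDATES.items()
            if math.prod(shape) == input_size]

# Element counts copied from the ERROR lines above.
for size in (1638400, 983040):
    print(size, "->", matching_shapes(size))
```

If that is the cause, pairing the SDXL checkpoint with SDXL-trained LoRAs should make the errors go away.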
Gear in the top right, scroll down, where "Spline" is, select "Straight"
Link Render mode is the setting
Thank you!
👍
do we know what are the defaults for the SDXL bot in terms of CFG, steps, negative prompt, and sampler?
settings are randomized between a couple different setups, also models are randomized. it's being used for a/b testing
How’d you do this?
I’m also very interested in this. I’m sure there is already a node to recrop out a base image and outpaint with background content, maybe detector + regional sampling, but integrated
I want to add a bed, but it doesn't work. I use https://huggingface.co/diffusers/stable-diffusion-xl-1.0-inpainting-0.1 with diffusers/controlnet-depth-sdxl-1.0
steps: 30
prompt: {{Bedroom}}
strength: 0.99
scheduler: K_EULER
guidance_scale: 8
negative_prompt: monochrome, lowres, bad anatomy, worst quality, low quality
and this is my mask
Does anyone use an NVIDIA Tesla M40 for AI purposes? Any recommendations? Bout to grab it
you need to change "Masked content" to "fill" or "latent noise"
ok thanks, do you know the name of that setting in StableDiffusionXLControlNetInpaintPipeline?
oh no, sorry I only use Automatic1111
also that inpainting SD XL model is kind of shady, maybe it isn't working at all. Try to inpaint with a regular SD XL model.
or with an SD 1.5 model
In case it is that
ok thanks
Also, your mask is all white and takes up the whole picture?
yes it's just to test that it adds well
what always works quite well: copy & paste a bed into the image and then inpaint over it. Or draw a bed in a graphics program. It doesn't have to look good, the colors just have to roughly fit
Stable Diffusion never starts from 0. Even if you set denoising to 100%, it still assumes there is some signal in the image and tries to recover it. So if an area of the image is just empty, SD won't want to add anything there. You have to fill it with something
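A minimal stdlib sketch of why a high denoise value still keeps some of the original. It assumes two things the thread does not state: diffusers-style img2img/inpaint step selection (only the last `int(steps * strength)` steps are run) and SD/SDXL's default `scaled_linear` beta schedule (0.00085 to 0.012 over 1000 training timesteps). Noising a latent to timestep t gives `x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise`, and `alpha_bar_t` never reaches zero on that schedule, so whenever the input is noised rather than replaced, part of it survives even at the noisiest timestep.

```python
import math

def steps_run(num_inference_steps, strength):
    # Modelled on diffusers' get_timesteps(): only the last int(steps * strength) steps run.
    return min(int(num_inference_steps * strength), num_inference_steps)

def scaled_linear_alphas_cumprod(beta_start=0.00085, beta_end=0.012, T=1000):
    # "scaled_linear": betas are linear in sqrt-space, then squared.
    a, b = math.sqrt(beta_start), math.sqrt(beta_end)
    alphas_cumprod, prod = [], 1.0
    for i in range(T):
        beta = (a + (b - a) * i / (T - 1)) ** 2
        prod *= 1.0 - beta
        alphas_cumprod.append(prod)
    return alphas_cumprod

ac = scaled_linear_alphas_cumprod()
signal = math.sqrt(ac[-1])  # weight on the original latent at the noisiest timestep

print(steps_run(30, 0.99))  # steps=30, strength=0.99 from the settings above → 29
print(f"signal kept at t=999: {signal:.3f}")
```

So with those settings one step is skipped, and even the last training timestep leaves a small but nonzero fraction of the original latent in place.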
in comfyui, how do you save current workflow as png using a snapshot of the nodes for thumbnail?
For me it's like this, but offhand I can't tell if it comes from a custom node

anyone recommend a good model for hand painted looks?
like those stylized hand painted models u get to see
yay! group nodes branch has been merged 
saw that, saw the big drop down notes, but....what is it?
the screenshots should give you some ideas: https://github.com/comfyanonymous/ComfyUI/pull/1776
basically we can now consolidate a group of nodes into its own node - like a component
ah I missed the separate convo, will check that now
I can imagine why, but it would be great if primitives were supported. Then you could build very compact settings / control panels
but doesn't work right now
There's been a lot of good changes with primitive nodes, I'm sure that'll come in time
but you can consolidate 6x ksampler advanced + vae decodes, which is a totally valid use case
So it's basically what nestednodes was, but probably works better I assume
nestednodes always gave me issues
yes, that's it
Hey everyone, I posted my ComfyUI workflow based on how I think Midjourney Tune works behind the scenes.
Basically it takes a bunch of prompts and mixes them into a quick-generating 16-image batch with SDXL Turbo that you can use to generate styles. Those images can be selected and pushed to a high-quality pipeline for generating stylized images with IP Adapter!
Please try it out, let me know if you have any feedback or issues.
thanks for sharing it! I'm going to check it out for sure
kept everything default (except a couple samplers), getting this at batch prompt.
About to step away from pc, but will check/try again later tonight.
yeah I also have some issues with some groups. some stuff gets disconnected or lost - like primitives
I'm grouping components after components and test them afterwards now
@visual glade Since the combine nodes update was released, the "PatchModelAddDownscale" node seems to be non-functional. Either changing the values in the node or even bypassing the node entirely results in the same image. Is this node turned off for some reason?
it still behaves like it should here
oh ok. I will check what has happened on my end then
I tracked it down. I installed the freeu custom node earlier and have now uninstalled it. That was somehow affecting it.
Also thank you for all the great updates. Much appreciated!
hello team i am here
would hiiiighly appreciate any one of the code bros making an improved masking tool for Auto1111
masking with a mouse is meh
SDXL Turbo fp16 safetensors or normal SDXL Turbo safetensors? Which is better?
use fp16 to make images, the non-pruned (13gb) version for fine-tuning
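Back-of-the-envelope arithmetic behind the two file sizes: a checkpoint is roughly parameter count times bytes per weight, so fp16 is about half of fp32. The ~3.5 billion parameter total (UNet plus both text encoders plus VAE) is an approximation assumed for illustration, not a figure from the thread.

```python
GIB = 1024 ** 3

def checkpoint_gib(num_params, bytes_per_param):
    """Approximate weight size in GiB, ignoring file metadata."""
    return num_params * bytes_per_param / GIB

SDXL_PARAMS = 3.5e9  # assumption: rough total for UNet + CLIP-L + OpenCLIP-G + VAE

fp32 = checkpoint_gib(SDXL_PARAMS, 4)  # full-precision weights (useful for fine-tuning)
fp16 = checkpoint_gib(SDXL_PARAMS, 2)  # half-precision weights (enough for inference)
print(f"fp32 ~ {fp32:.1f} GiB, fp16 ~ {fp16:.1f} GiB")
```

Under that assumption the full-precision file lands near the ~13 GB mentioned above.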
+.. thanks
Not quite what I was going for, but interesting nonetheless
This is more what I was expecting lol
“my husband may be very angry when he finds strangers here, and often the glance of his eye is so fierce that it kills!”
(not the prompt)
So my new training settings do not seem ideal lmfao
almost perfect loss, as in, it gives you almost perfectly nothing when you ask for something lmao
believe it or not, its actually learning now
Important notice for all SDXL users: all SDXL models can be used with turbo tech!
https://www.reddit.com/r/StableDiffusion/comments/1888j8a/all_sdxl_checkpoints_can_be_used_with_sdxl_turbo/
its consistently going down
there, did enough spamming of my post
its still at 0.983, but hey, its going down lmao
hope people will start doing more realtime img generation now 😄
what you training @high skiff
I am breaking SDXL's text encoder and retraining it for the purposes of RunDiffusion
trying to make it better for very high fidelity photographic image generation
yessir, I already have a damn good U-Net, so I wanna dip my toes into TE to try and see if its worth it
this latest training missed a 0, so thats why it exploded lmao
0.01 LR, bound to make anything explode lmao
@floral island I have an absolutely phenomenal UNet I made for realism, but the TE seems to be what's letting it down, so I am trying to make it better
my fast little 5-minute test already proved that it does, in some ways
Prompt: A side profile photograph of a handsome and muscular man lifting weights at the gym
yeah, nobody trained the text encoder EVER
normal SDXL TE
so i'm sure there's a lot of finetuning that could improve it by margins
after legit 7 minutes of training
my model can already do phenomenal gens, but if we had more control, that would be even better
very good stuff
right side are 1536x gens in just 7 seconds on a 3090
single-sentence positives, no negatives
ok, the TE training finished
it has a whopping 0.996 loss
lets see what it looks like just for fun lmao
the one I just made? yeah, but the problem is if it was finetuned on the built in TE, the results will skew really hard, which is what I am seeing
I am trying to ease it into working better
ah, so it's kinda biased?
No, thats not at all what I said lol
oh, didn't mean it like that in a negative way
if you spent weeks training the UNet to get the results you wanted with the built-in TE, and it works properly, changing the TE drastically will misalign all of those fancy changes made to the UNet to fix issues with the TE
oof
yup, now i understand what you mean
hope you can get that fixed, that might be a bitch
That is what I am seeing, and I am trying to gently coax it into a middle ground
Yeah, if I can't, this model already kicks ass
but if I can, god damn lol
this model already giga whoops Midjourney's ass in just about every way for realism
i gotta fix my blocks on the 1.3 of my merge
but the TE is holding me back from being as exact
gonna test the new TE to see how fucked its results are lmao
xD
wait what...
They are
beautiful
WHAT
WHAT
WHAT THE FUCK
ohhhhhh
it cooked the Unet but fixed the TE
so if I just use the TE, it works good...
@floral island dude
A photograph of a pumpkin made out of glass
No TE edit
TE edit
wow, nice 😮
ok, its very inconsistent, as to be expected, but I mean come on
it was at 0.996 loss when it ended lmao
to be fair, if you'd show those images to someone random, they'd not be able to tell it's ai
heck, i know because you said so, and it's still hard to spot!
@floral island Duuuudddeeee
Left is my base model, middle is that new really bad TE training I did, and right is the 7 minutes TE finetune
"A head on portrait photograph of a pretty young woman with red hair and blue eyes smiling in a forest, she has on a black dress"
so training the TE DOES help, massively lmao
the increase of vibrance in that image 😮
so much more real
i need to steal this tech for my model
the right one is the kind of image you'd find on a "missing people" page 😄
very well done
"last seen, these very woods"
much better contrast
new version v30.0 of my workflow with new ipadapter features (masks + start/stop) now on github:
https://github.com/JPS-GER/JPS-ComfyUI-Workflows
German shepherds are so fundamentally fucked in SDXL models that I haven't been able to fix them
OH MY GOD THATS SO MUCH BETTER
You trained SDXL's CLIP?
I am in the process of it, yes
I guess that could improve prompt alignment, but it shouldn't affect quality in a noticeable way
There's probably a limit to how aligned CLIP can be to your prompt, but I guess it can be improved substantially when heavily trained
That's the exact reason why SD3.0 was rumoured to use a different encoder
is there any local model that beats the new novelai anime model?
We need an anime SDXL model like NovelAI's NovelAIdiffusionV3.
Have any of you seen it?
It looks incredible. The first SDXL anime model that has nailed the style, in my opinion.
next level animation: https://humanaigc.github.io/animate-anyone/
Does anyone know of a good write-up or video on training sdxl using dreambooth in kohya_ss local install (4090)? All I can seem to find is videos on making loras with low end cards. Specifically trying to train a specific face into the model
I guess it doesn't have to be dreambooth or kohya either, as long as it can get the job done 😛
well, when they made that for 1.5 it was leaked. I doubt anything will be leaked again
however I'm unsure if they have a license to train SDXL then make money off it
maybe they do, idk
either something like W.D gets SDXL figured out or that model's getting leaked by 4chan users again
Hello all...
https://image.civitai.com/xG1nkqKTMzGDvpLrqFT7WA/73976720-46fd-4114-9cf7-ac43ee1acd1f/width=450/00029-2056951564.jpeg
https://civitai.com/models/218300/paradox-2-sd-xl-10
Introducing Paradox 2, original by design:
Paradox 2 empowers you to use unique content while offering even more control over your creations by generating exactly what you request.
Unlike models that rely on merging existing data, Paradox 2 produces purely original content.
A cutting-edge model that draws inspiration from an extensive dataset of well over 10,000 original images, trained for 180,000 steps versus 65,000 steps for the first release.
We invite you to load the model onto your favorite Stable Diffusion application, give it a spin, and share your thoughts with us.
- Special thanks to the people who helped me and who I care a lot about:
@MarkOREZ
@upbeat summit
@osiworx
@mix
@Thibaud
@Kamikaze(Elon Musk)
Thank you! Excited to give it a try
I'll give it a go tonight, been struggling creating some images for my project 👍
it's a sick model, but also nice fkn prompting there
@visual glade Is there a more efficient way to render the node view of Comfy? Like maybe a version that doesn't have all the drop shadows and stuff? Because I am running a hefty multi-LoRA demo with several concurrent multi-step pipelines, and it's getting down to sub-10 FPS when navigating at any zoom level that renders text
actually, it's not the shadows now that I look, it's the little center orb things along the node connections
zoom out so they don't render, 75fps no problem, zoom one click closer, 13FPS
If you're still using Spline, change them to either Straight or Linear and that can help.
if I am in an area close enough to change values, I get down to like 5FPS
You can also kill the node shadows, too.
unfortunately, I am using Straight already
Bummer.
oh? no crap?
Any info on vae.encode out there?
where would one be able to do that?
hmmm
It's from an add-on...trying to see which one.
I don't see that in my settings
ohhh, awesome!
I render it using iGPU to save on GPU perf hit (cause it adds up for sure in big workloads)
I also think that having no shadows would work better for what I am doing aesthetically as well
Using the new alpha version of Turbovision. BS4 1kx2k native gens in 26 seconds. Loving this new tech we've been given!
I wish I could get more interested in turbo 😅
I am sure there is more I could do with it
That's the pack with the settings.
I also have a pack that allows for you to hide the connectors altogether, which in turn would definitely solve the issue with rendering the connector midway dots.
ohohohoooo, that could be extremely useful as well, if you are feeling especially generous, of course
Actually, that might be in the same pack.
at the limit of my vram here on a 3080. BS4 2560 x 1536 in 55 seconds. 13 seconds just for vae and saving
SCG is doing some amazing work with the vision models
need help on this please
```
return torch.group_norm(input, num_groups, weight, bias, eps, torch.backends.cudnn.enabled)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB. GPU 0 has a total capacty of 4.00 GiB of which 1.41 GiB is free. Of the allocated memory 1.72 GiB is allocated by PyTorch, and 74.91 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```
im running sd on fooocus
I was able to generate images, but then it started saying that
See #🤝|tech-support
This channel is specifically #✨|sdxl
k thx
A 4.00 GiB graphics card seems to be the problem.
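The traceback itself suggests `max_split_size_mb` via `PYTORCH_CUDA_ALLOC_CONF`. A minimal sketch of setting it from a Python launcher you control. The value 128 is an arbitrary example, the variable has to be in place before PyTorch initialises CUDA, and it only helps with fragmentation: it cannot make a workload fit on a card that is genuinely too small.

```python
import os

# Must be set before torch is imported (or at least before CUDA is first used).
# max_split_size_mb stops the caching allocator from splitting cached blocks larger
# than this size (in MB), which can reduce fragmentation. 128 is only an example value.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:128")

# import torch  # import PyTorch only after the variable is in place
print(os.environ["PYTORCH_CUDA_ALLOC_CONF"])
```

Exporting the same variable in the shell before starting the app works the same way, since PyTorch reads it from the environment.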
I remember hearing about an update to nvidia drivers or something recently that speeds up SD?
would anyone be interested in a blender addon for comfyui? it's pretty basic, finishing up a few things first
made it for myself cause i needed to create textures in blender easily
I am trying the turbo model with Olivio's workflow and get this
I updated everything.. and still get it
The Efficiency pack by Luciano is no longer maintained. Uninstall it and install the forked pack instead.
is there a way to prevent stable video diffusion from getting so grainy
@smoky patrol
didn't intend for this to go WWE on me, but I got a good laugh
aaaah, "lifting" was the trigger in my prompt causing that
Or we just wait for the community to catch up with training. Though i don't know how long that will take.
if we're waiting on something it might as well be SD3. if that model will be smarter than SDXL it won't even need that much finetuning
Add to the negative: analog film grain, frame still, movie, film. Positive: high quality. Try soft lighting too. As a last resort, switch the model.
are there any guides to multi GPU comfy UI usage?
You can force rendering on the 2nd GPU with the flag --cuda-device 1; other than that, not really, I think.
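A hedged sketch of the one-instance-per-GPU approach: start a separate ComfyUI process per card with `--cuda-device`, each on its own `--port`. It assumes ComfyUI's `main.py` is in the current directory and that ports 8188 and 8189 are free; `comfy_command` and `launch_all` are hypothetical helper names, and nothing is launched unless `launch_all` is called.

```python
import subprocess
import sys

def comfy_command(gpu_index, port):
    """Build a ComfyUI launch command pinned to one CUDA device."""
    return [sys.executable, "main.py",
            "--cuda-device", str(gpu_index),
            "--port", str(port)]

def launch_all(cmds):
    """Start every instance; each keeps running until you stop it."""
    return [subprocess.Popen(cmd) for cmd in cmds]

# e.g. GPU 0 for the main workload, GPU 1 for a second instance
commands = [comfy_command(gpu, 8188 + gpu) for gpu in (0, 1)]
for cmd in commands:
    print(" ".join(cmd[1:]))
# launch_all(commands)  # uncomment to actually start both instances
```

Each instance then gets its own browser tab, and a workflow queued on port 8189 renders on the second card.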
amazing!
Thats exactly what I will need
I use it locally here sometimes; it works great.
I am training on a 3090, and I wanna validate on my 3060ti to make sure this is worth my time
@stone fossil Do you mind me asking a few more questions about multi GPU?
like, how do you install drivers for both GPUs? I have never had dual GPU
is it as simple as installing both?
You dont need to if they are both using the same driver set.
oh sweet
All 20-, 30- and 40-series cards would work together.
If you go below that range you might run into trouble.
But keep in mind you can only render 1 image per GPU.
However, things like LLMs can share load.
Other than that, most good applications out there let you force a CUDA device.
hi all, trying to do inpaint in auto1111 - but i only get the mask i painted... what am i doing wrong here?
@stone fossil Thanks BTW! I now have dual GPU setup and working
3090 training, 3060ti testing
Np, and thanks for the awesome graphs you've made.
And this isn't 100% true, though: with AInodes you can load a model per GPU in your workflow by selecting the flag.
Per node.
Wish ComfyUI supported this. 😄
does sdxl turbo work with LORAs?
Yup.
I have a busy rainy Saturday ahead of me then hahha
I uninstalled and reinstalled, but I guess it reinstalled the bad fork? Do you have the link to the correct one?
I just reinstalled mine through Manager, but here's the direct repo link to the new fork
https://github.com/jags111/efficiency-nodes-comfyui
how do I use my own image instead of latent noise for KSampler?
If I use Load Image, the result is Image which KSampler doesn't take
upcoming update to the workflow with more mask options (none, own, own inverted, red, green, blue) - inverted makes it easy to draw your own mask, copy it with clipspace and invert it for the second ipa. red, green, blue allows you to draw a more complex mask in some external app and use it for up to three segments, like two people + background.
rgb mask:
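A rough sketch of the idea behind the red/green/blue and inverted mask options: each colour channel becomes its own binary mask, so one painted image can carry up to three regions, and inverting a mask swaps painted and unpainted areas. This is plain Python on a list of pixels, not the workflow's actual implementation; the threshold of 128 and the helper names are assumptions.

```python
def split_rgb_mask(pixels, threshold=128):
    """Turn a list of (r, g, b) pixels into three 0/1 masks, one per channel."""
    masks = {"red": [], "green": [], "blue": []}
    for r, g, b in pixels:
        masks["red"].append(1 if r >= threshold else 0)
        masks["green"].append(1 if g >= threshold else 0)
        masks["blue"].append(1 if b >= threshold else 0)
    return masks

def invert(mask):
    """Swap painted and unpainted areas, e.g. to reuse one mask for a second IPAdapter."""
    return [1 - v for v in mask]

# Four pixels: person A (red), person B (green), background (blue), unpainted.
pixels = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (0, 0, 0)]
masks = split_rgb_mask(pixels)
print(masks)
print(invert(masks["red"]))
```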
Load image. Then u need to encode it to latent
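In graph terms that is Load Image → VAE Encode → KSampler `latent_image`, usually with `denoise` below 1.0 so part of the image survives. Below is a sketch of that graph in ComfyUI's API (prompt) JSON format as a Python dict; the node ids, checkpoint filename, image filename, prompts and sampler values are placeholders chosen for illustration.

```python
import json

workflow = {
    "1": {"class_type": "CheckpointLoaderSimple",  # outputs: MODEL 0, CLIP 1, VAE 2
          "inputs": {"ckpt_name": "sd_xl_base_1.0.safetensors"}},  # placeholder
    "2": {"class_type": "LoadImage", "inputs": {"image": "input.png"}},  # placeholder
    # VAE Encode turns the IMAGE into a LATENT, which is what KSampler accepts.
    "3": {"class_type": "VAEEncode",
          "inputs": {"pixels": ["2", 0], "vae": ["1", 2]}},
    "4": {"class_type": "CLIPTextEncode",
          "inputs": {"text": "a cozy bedroom", "clip": ["1", 1]}},
    "5": {"class_type": "CLIPTextEncode",
          "inputs": {"text": "lowres, blurry", "clip": ["1", 1]}},
    "6": {"class_type": "KSampler",
          "inputs": {"model": ["1", 0], "positive": ["4", 0], "negative": ["5", 0],
                     "latent_image": ["3", 0],  # encoded image instead of Empty Latent
                     "seed": 0, "steps": 25, "cfg": 7.0,
                     "sampler_name": "euler", "scheduler": "normal",
                     "denoise": 0.6}},  # below 1.0 keeps part of the input image
    "7": {"class_type": "VAEDecode",
          "inputs": {"samples": ["6", 0], "vae": ["1", 2]}},
    "8": {"class_type": "SaveImage",
          "inputs": {"images": ["7", 0], "filename_prefix": "img2img"}},
}

print(json.dumps(workflow, indent=2))
```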
Who needs Midjourney, when you can make amazing quality super-res images with SDXL? 6144x2560. Many times over the max upres of what Midjourney can do.
real
bread man fighting a banana toaster
still working on my blender addon, working great so far, still a few things to add
probably a better sdxl model for creating textures out there tho huh
does anyone know how to use the faceswap option in Fooocus?
It's a shame doing that takes up loads of VRAM. I can manage 2 at once, but not 3.
yes, takes over 20gb for 3 ipas and if i want to upscale even my 24gb aren't enough and it takes much longer due to shared vram usage.
IPAdapter, SDXL Turbo LoRA, newest Nightvision, 3 steps, 8 images 19 seconds
it's crazy how something like that is enough to create all those horror dolls from above:
Little update on the model @sharp robin @rugged glen @white crown @floral island
It has improved monumentally due to a new change I have made
Its coherency is now in a league of its own. Absolutely destroying the competition (RealVision XL, juggernaut7, RealStockPhoto, and more)
very neat
A portrait photograph of a handsome and muscular man walking on the beach on Oahu at sunset. He has a gruff and hairy body and a thick beard.
Top left is mine, top right is real vis, bottom left is jugger7, and bottom right is RealStockPhoto
Left is my old model, right is the new one
Want want want. When?! 😆
Night and day right there. And that's saying a lot cuz the left was amazing to begin with.
@rugged glen I can blow your mind further
Juggernaut7
Mine
No negatives, just straight diffusion. No workflow, no special nodes, just the model stock
Another before and after with my model
Is this part of your current workflow on Civitai? I assume it will just do a basic face swap from a personal photo to a cartoon character, assuming both images are in roughly the same pose and the mask is in the correct place?
@hoary saddle it's done with my current workflow on github
3x ipadapter + rgb mask + simple prompt + prompt styler with different movies (e.g. Mad Max, Kill Bill, Hero)
settings and source images for all of them are the same - just changed the movie:
INSANE. Love seeing your work advance. Can't wait for even a distilled version of this.
Link to your git? Was just playing with IPA last night
@paper dock https://github.com/JPS-GER/JPS-ComfyUI-Workflows
Thank youuuuu <3
More work to come, it seems to be getting better faster and faster. Have some plans coming up within a week or so that are promising to increase the quality yet again
This one is crazy.
Yeah haha, that's my most proud one
How's the new job going? Haven't been keeping up. Hope all is well.
I am not fully "working" with them just yet, but we have some big potential options together. Hopeful things on the horizon, and tons of ideas I am not yet allowed to share haha
Have another comparison here
"A photograph of castle ruins being taken over by the jungle and vines"
Top left is mine, right is RealisticVisionXLV2, bottom left is JuggernautV7, and bottom right is real stock photo
It actually makes my text generation LoRA like 100x more consistent with spelling. (Turbovision at least, not positive about base SDXL turbo)
https://civitai.com/models/148871 oh right, i released this yesterday, for those who didn't know.
i think some people in here might like it too ^^

hope you're liking it ^^
Took some time to get the output I wanted but yeah, fun model for sure 
i hope it wasn't because it didn't listen to the prompt well
I tried the anime, lineart, flat color etc prompt you told me and didn't like the results at all 
Ignore the 
some fancy epic shit, a mechanical girl with cybernetic implants, spreading her rainbow wings over a ruined city
i tried 🥲
My stuff has everything embedded if you want the prompt 
don't judge me
gud shit tho ❤️
The brightproto is yours? Damn GG man. That’s been one of my faves TY. Will have to test tomorrow 
yeah, began as a modification of protovision, which i felt was too much character oriented, but liked the visuals it made
and now i just add stuff from other models if i really like it
Yup does well
should probably get a good donor model merge to improve the upper blocks
Wait. You merged Paradox 2?
I didn’t like Paradox 2 
yup, i got the good parts from paradox, and left the bad parts ^^
i became pretty decent at merging, even if i say so myself
paradox IS very creative, but i feel it's let down by the lower quality visuals
Ur going to have to teach me Senpai. I have a few ideas.

Different model and lora
That’s paradox
i know, the dude who made it is a real life friend of mine ^^
Looks amazing but really hard to work with.
@sharp robin this is the insanity i'm putting up with when merging models
```
"0,636","0,582","0,2599","0,6735","0,4237","0,5019","0,406","0,5441","0,5701","0,8021","0,736","0,2976","0,6295","0,5294","0,6475","0,8205","0,9885","0,7793","0,0865","0,3323"
"0,499","0,34","0,26","0,9624","0,7011","0,7525","0,1643","0,6991","0,2125","0,8696","0,8511","0,1241","0,5374","0,173","0,3583","0,4124","0,1167","0,8007","0,2752","0,0414"
"0,427","0,2175","0,4613","0,5029","0,9547","0,9717","0,0331","0,0561","0,7622","0,4813","0,8231","0,4981","0,8406","0,4657","0,8838","0,6612","0,4553","0,5503","0,2837","0,2362"
"0,785","0,3984","0,1195","0,1018","0,4808","0,5276","0,784","0,3301","0,7702","0,4371","0,0206","0,4966","0,5374","0,2264","0,3634","0,5824","0,3723","0,2558","0,2263","0,0716"
"0,791","0,0209","0,8842","0,6223","0,3727","0,3967","0,1297","0,4923","0,1756","0,7639","0,5595","0,8121","0,5338","0,6681","0,7706","0,7067","0,5175","0,7921","0,4722","0,7603"
"0,707","0,7576","0,4165","0,5737","0,5148","0,3636","0,2931","0,7112","0,7178","0,8994","0,2907","0,0971","0,3412","0,4772","0,433","0,1426","0,5738","0,9536","0,6543","0,0794"
```

just a small sample
Sir. In English pls. Fine don’t help

did about 15 random mergers, what you are seeing are a small group of the weights for each block
BASE, IN00, IN01, and so on
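The weight rows shared earlier use decimal commas (`0,636` means 0.636), and each has 20 values, which fits SDXL's BASE + IN00 to IN08 + M00 + OUT00 to OUT08 layout. A small sketch that turns one row into labelled floats; the label order is an assumption based on the usual block-merge convention, not something stated in the chat.

```python
import csv
import io

# Assumed SDXL block order for 20-value rows: BASE, IN00..IN08, M00, OUT00..OUT08.
LABELS = (["BASE"] + [f"IN{i:02d}" for i in range(9)] + ["M00"]
          + [f"OUT{i:02d}" for i in range(9)])

def parse_row(line):
    """Parse one quoted, comma-separated row that uses decimal commas."""
    fields = next(csv.reader(io.StringIO(line)))  # quotes protect the decimal commas
    values = [float(f.replace(",", ".")) for f in fields]
    if len(values) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} values, got {len(values)}")
    return dict(zip(LABELS, values))

row = ('"0,636","0,582","0,2599","0,6735","0,4237","0,5019","0,406","0,5441",'
       '"0,5701","0,8021","0,736","0,2976","0,6295","0,5294","0,6475","0,8205",'
       '"0,9885","0,7793","0,0865","0,3323"')
weights = parse_row(row)
print(weights["BASE"], weights["M00"], weights["OUT08"])
```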
This is not how merging was w 1.5 
i've tried all of those models, and saw which ones performed well, and which sucked
it's still the same
just a little less blocks
but my merging method is madness 😄
i use add-diff too, to isolate the weights i want really carefully
Is merging Lora the same?

for the final step, for example, to fix the hands, was to do bpn1.3+(bpn1.2-paradox2) on blocks weights (but not transformers) IN04 and IN06, with a total strength of 0.01
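The add-difference step can be written as `merged = A + alpha * (B - C)`, applied only to the chosen blocks. Here A = bpn1.3, B = bpn1.2, C = paradox2, alpha = 0.01, and the blocks are IN04 and IN06 without their transformer weights. A toy sketch with plain dicts of floats standing in for tensors; the key names loosely follow the LDM checkpoint layout (`input_blocks.4` for IN04) and are assumed for illustration.

```python
def add_difference(a, b, c, alpha, blocks, skip="transformer_blocks"):
    """Return a copy of `a` where keys in the selected input blocks get a + alpha * (b - c)."""
    merged = dict(a)
    # Trailing dot so input_blocks.4 does not also match e.g. input_blocks.40
    prefixes = tuple(f"model.diffusion_model.input_blocks.{n}." for n in blocks)
    for key in a:
        if key.startswith(prefixes) and skip not in key:
            merged[key] = a[key] + alpha * (b[key] - c[key])
    return merged

# Toy "state dicts": one scalar per key instead of a tensor.
keys = [
    "model.diffusion_model.input_blocks.4.0.in_layers.2.weight",
    "model.diffusion_model.input_blocks.4.1.transformer_blocks.0.attn1.to_q.weight",
    "model.diffusion_model.input_blocks.5.0.in_layers.2.weight",
    "model.diffusion_model.input_blocks.6.0.op.weight",
]
A = {k: 1.0 for k in keys}  # stands in for bpn1.3
B = {k: 3.0 for k in keys}  # stands in for bpn1.2
C = {k: 2.0 for k in keys}  # stands in for paradox2

merged = add_difference(A, B, C, alpha=0.01, blocks=(4, 6))
for k, v in merged.items():
    print(k.split("diffusion_model.")[1], v)
```

Only the non-transformer keys of blocks 4 and 6 move (to 1.01 here); the transformer weights and block 5 stay at A's values.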
technically yes, workflow no
so i'm not sure i'm the one you want teaching you, as i'm ... methodical in my madness lol
Ahh ok. This I understand.
basically, i needed the weights from 1.2 back, as far away from paradox2 as possible
and that was the fix
just figuring out which blocks is a painstaking trial and error process
both
first, the random mergers are done in a1111 -> just a batch of 15-20 models with random values for all weights
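A sketch of generating that kind of batch: one row of random block weights per candidate merge, written in the same quoted decimal-comma style as the rows shared earlier. It uses Python's `random` with a fixed seed so a promising candidate can be regenerated later; the 20-block layout and the helper names are assumptions.

```python
import random

NUM_BLOCKS = 20  # assumed SDXL layout: BASE, IN00..IN08, M00, OUT00..OUT08

def random_merge_rows(num_models=15, seed=0):
    """One row of random block weights in [0, 1] per candidate merge."""
    rng = random.Random(seed)  # fixed seed so the batch can be reproduced
    return [[round(rng.random(), 4) for _ in range(NUM_BLOCKS)]
            for _ in range(num_models)]

def format_row(weights):
    """Quote each value and use a decimal comma, like the rows shared above."""
    return ",".join('"' + f"{w:.4f}".replace(".", ",") + '"' for w in weights)

rows = random_merge_rows()
for row in rows[:2]:
    print(format_row(row))
```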
We made it 
i hear sd1.5 crying
Its with adetailer 
#🍥|anime would like this
You didnt have to delete it 
Note to @sharp robin check this workflow out.
Its a1111 with adetailer, freeU and self attention
@uncut steeple i've been reminiscing, with the stuff we can make now with sdxl finetunes, that would've blown the old PoW's out of the water
a1111
I miss POW 
revisited some of my old PoW prompts, and they are drop dead gorgeous in comparison now
Yeah 
i'm so glad my page isn't just hentai waifu page like some other civitai pages
i mean, it can make tasteful nudes, but, i feel a lot of people like it for the art expression ❤️
Did u add nsfw?
leftovers from the OG protovision i think
Time to change that 
don't you dare XD
sometimes people are doing all these amazing compositions, when... sometimes the simplest expression is the most beautiful art
yeah, i've heard that request more often 🙂
Going to need that Nai leak for SDXL
need a good donor model

if i want to add those weights
do u just merge or also train?
just merge
haven't got the foggiest idea how to train
i stand on the shoulders of giants
Dataset and captioning is the bulk of work.
Damn. The feather wasn’t really crazy but this is sick 
"sugar cube" instead of "feather"
😄
Reflection is wonky 
no u
for paradox2, i had someone do that work for me
then i just "stole" the best parts, i suppose 😄
That works.
for adding in an anime set, it would be a bit harder, as anime sets tend to be tagged a whole lot differently
getting that to work would probably mean adding in a significant amount of lower levels too
julius caesar on acid -> model just works ^^
see chat @upbeat summit 😎
Turbo SDXL - thousands and thousands of images with "smashed faces!!!" 😄
damn, did anyone here check out that new pixart tech?
a cat with the zoomies
looking mighty impressive
just the base model 😮
Pixart Tech at HF or Civit?
github / hf
needs a beefy gpu for the t5 text encoder
haven't been able to get it to run in 4bit
the actual model is about the size of sd1.5, the text encoder is ... bigger than sdxl xD
what's really exciting: the model is trained on a relatively small dataset
so far, only very few players had the computing power to train something like SDXL
but PixArt Alpha demonstrates that you can train such models with MUCH less compute power
so maybe we will see more diversity in the image generation zoo in future
Yeah it is quite bad ass.
proper Marla Singer vibes
Been doing some tests with it in my research group, and I'd say while it is cool to see what can be done for so cheap, it is also not something I see any real value in using over SDXL or other options
i'd say it adheres to the prompt pretty well, with generally better compositions
but the model itself is just that -> a base model, it could use some training to get to a better level, i'm not sure if i'd be able to figure out how
how do you change lighting, filters and colors using prompts?
Using JPS W/flow
Six more
pixart has been out for a bit already but today it's got a surge of attention since youtubers are realizing it's good clickbait
Can you use loras made for sdxl with the sdxl turbo model?
yes
https://github.com/Fannovel16/comfyui_controlnet_aux this looks like it has normal map via controlnet depth
Thanks bob
@floral island ok this model slaps
glad you're liking it! 😄
can you believe i'm actually not happy with it? (knowing there's at least 2 issues with it)
(I can't get my Conda Shell to activate!!!) 😦
what issues?
hey guys, i'm trying to follow an img2img tutorial using control net but the loopback box is missing in img2img, is that a bug? it's present in txt2img ([Loopback] Automatically send generated images to this ControlNet unit) but it's missing in img2img
occasional repetitive elements, something went wonky in the upper IN layers, i've fixed it a bit, but it's still there, not sure how to fix yet. noticeable if you do widescreen images of bird's-eye views of cities
not as bad as it was during some merging stages, but it's still there
ah ok, hmm will have to look out for it
and the other is a bit less noticeable -> i notice sometimes the IN layers create a "spot" for something to exist, but there's no matching OUT weights, and you get things that might seem out of place
made with pixart, i kinda like how it understands what i want from it
pixart i heard its really gpu intensive due to T5
yeah, it is
how much VRAM do we need?
right now it's running at 16, but it should be able to go lower if i could get bitsandbytes working properly in my comfy to support 4bit
holy shit, is that why the latest comfy update included FP4?
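Rough weight-memory arithmetic for the text encoder at different bit widths. The ~4.7 billion parameter count for the T5-XXL encoder is an approximation assumed here, and real usage is higher (activations, quantization overhead, the CUDA context), so treat these as lower bounds rather than requirements.

```python
GIB = 1024 ** 3
T5_XXL_ENCODER_PARAMS = 4.7e9  # assumption: approximate encoder-only parameter count

def weight_gib(num_params, bits):
    """Memory for the weights alone at a given bit width."""
    return num_params * bits / 8 / GIB

for bits in (16, 8, 4, 2):
    print(f"{bits:>2}-bit: {weight_gib(T5_XXL_ENCODER_PARAMS, bits):.1f} GiB")
```

Going from 16-bit to 4-bit cuts the weight footprint by 4x, which is why getting 4-bit loading working matters so much here.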
hmm, lemme check
i'll be honest, t5 slaps the fucking living shit out of clip
not gonna lie
t5 is a text-text direct encoder, clip is a text-image transcoder, they're not even in the same category
dunno how they're using it for pixart, but its comprehension level is amazing
will sd3 have T5?
possibly, unlikely
we've tried t5 as sd input before and it doesn't work all that great
it is pretty heavy on the resource side, when i first read on it, it sounded amazing, until i saw the required specs
you have to do a lot more work to compensate for it not being a text-image model and at the end that extra work just makes you question why you bothered doing it
alex, got a prompt, you know t5 would fail?
i'd love to see where and how it fails on pixart
I'm going off the idea that something new has to come in place of what we currently have, and i think at this point we can let go of clip-l, it's rarely used unless ur an advanced user
i feel pixart figured how to make it work, as the model size itself is pretty minimal
@sharp robin using the comfy implementation by https://github.com/city96/ComfyUI_ExtraModels the t5 can be loaded into cpu memory, and inference done on gpu, and the actual model is pretty lightweight
you know something, it's kinda weird: one thing happens and then all of a sudden a cascade happens, then it's very calm for a while, then a storm again.
ok. pixart is pretty damn good 😮
ur model
AAAAAAAAA 😄 intergalactic quality
If someone has an alternative downloadable for this lora: Lora: Real Photo Postcard SDXL v2.0 (lora-rppc)? Thanks in advance
Pixart uses autogenerated captions. That's the trick. Dall-E 3 did the same
huh?
so instead of LAION captions that have usually nothing to do with the content of the image, they train on captions that really describe the image
it shows
i mean, the image comprehension is really good
I would say, though, that T5 also has a lot to do with image comprehension
that's what we see in DeepFloyd IF, which uses T5, too
it just understands prompts much better
CLIP was developed to be trained on massive amount of shitty captions
yeah, i did a look once at the laion dataset
i'm surprised it even could produce anything coherent
the whole design of clip (contrastive loss) makes it excellent in cases where your training data has really bad labels
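For reference, CLIP's objective is a symmetric contrastive (InfoNCE-style) loss: within a batch, each image only has to score higher against its own caption than against the other captions, and vice versa. A tiny pure-Python sketch with made-up 2-D embeddings and an assumed temperature of 0.07; real CLIP uses learned, high-dimensional embeddings and a learned temperature.

```python
import math

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cross_entropy(logits, target):
    # -log softmax(logits)[target], computed stably via log-sum-exp
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[target]

def clip_loss(image_embs, text_embs, temperature=0.07):
    imgs = [normalize(v) for v in image_embs]
    txts = [normalize(v) for v in text_embs]
    n = len(imgs)
    # logits[i][j] = cosine similarity of image i and text j, divided by the temperature
    logits = [[sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature
               for j in range(n)] for i in range(n)]
    img_to_txt = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    txt_to_img = sum(cross_entropy([logits[i][j] for i in range(n)], j)
                     for j in range(n)) / n
    return (img_to_txt + txt_to_img) / 2

images = [[1.0, 0.1], [0.1, 1.0]]
matched = [[0.9, 0.2], [0.2, 0.9]]  # caption i roughly points the same way as image i
swapped = [[0.2, 0.9], [0.9, 0.2]]  # captions paired with the wrong images
print(clip_loss(images, matched), clip_loss(images, swapped))
```

The matched captions are only rough, yet the loss is already near zero, because the objective ranks pairs inside the batch instead of scoring how precise each caption is.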
ah, is that the case?
I don't know what the best way to go is. T5 has great text understanding but no understanding of images
maybe we need a good multimodal model for that which is better than CLIP 🤷
if i could hook up t5 to sdxl 😮
as monkey said, they already tried that (you can see it in their source code)
but didn't work as they wanted
It stands to reason that in an ideal world we would have a hybrid, so we can easily feed it garbage but keep increasing the quality/understanding through quality.
A whole new system
2-bit is out. It might be possible to run an LLM and a t2i model at the same time locally. https://github.com/Cornell-RelaxML/quip-sharp
2 bits on average? crazy 😂
usable 2-bit quantization?!
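For a rough sense of why 2-bit matters for running an LLM next to a t2i model: weight storage scales linearly with bits per weight. A back-of-the-envelope sketch; the 70B parameter count is only an illustrative assumption, and this counts weights alone (no activations, KV cache, or framework overhead). `bits_per_weight` may be fractional, since quantizers often report an average.

```python
def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights only, in GiB."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 1024**3


if __name__ == "__main__":
    params = 70e9  # hypothetical model size, for illustration only
    for bits in (16, 8, 4, 2):
        print(f"{bits:>2}-bit: {weight_memory_gib(params, bits):6.1f} GiB")
```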
is there any way to update a1111 to the new 1.7.0 within stability matrix?
It's only a release candidate; wait till it's fully released and then you should be good to go
Just installing SECourses A1111 Pixelart-Alpha 😄
turbovision xl 2.0 no upscaling
So, i have questions about working with sdxl models in a1111, i don't know if it is offensive to ask them here? -> in any case, if people know where i can go and ask questions about developing services for a1111 running in --api mode, let me know
-> can anyone link a json payload they send to a1111 for image generation using sdxl w. a controlnet where that payload generates an img similar to what they get when using the UI?
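A sketch of such a payload, assuming A1111 is started with `--api` and the sd-webui-controlnet extension is installed. `/sdapi/v1/txt2img` is A1111's txt2img endpoint, and ControlNet units go under `alwayson_scripts`; the exact unit keys have changed between extension versions (e.g. `input_image` vs `image`), so check the `/docs` page of your running instance. The ControlNet model name, sampler, and defaults below are placeholders, not recommended settings. To get results close to the UI, the usual advice is to set every UI value explicitly (seed, sampler, steps, CFG, size, preprocessor, model, weight), since anything left out falls back to API defaults.

```python
import base64
import json
import urllib.request


def build_txt2img_payload(prompt, control_image_bytes, *,
                          negative_prompt="", steps=30, cfg_scale=7.0,
                          width=1024, height=1024, seed=-1,
                          sampler_name="DPM++ 2M Karras",   # placeholder; names vary by version
                          cn_module="canny",
                          cn_model="YOUR_SDXL_CONTROLNET_MODEL"):  # hypothetical placeholder
    """Build a txt2img request body with a single ControlNet unit."""
    return {
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        "steps": steps,
        "cfg_scale": cfg_scale,
        "width": width,
        "height": height,
        "seed": seed,
        "sampler_name": sampler_name,
        "alwayson_scripts": {
            "controlnet": {
                "args": [{
                    # the control image is sent as base64-encoded file bytes
                    "input_image": base64.b64encode(control_image_bytes).decode("ascii"),
                    "module": cn_module,
                    "model": cn_model,
                    "weight": 1.0,
                }]
            }
        },
    }


def send_txt2img(payload, base_url="http://127.0.0.1:7860"):
    """POST the payload; the JSON response holds base64 PNGs under "images"."""
    request = urllib.request.Request(
        f"{base_url}/sdapi/v1/txt2img",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```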
who's really really good at inpainting here?
i could use some help and advice 🙂
perhaps there's an inpaint with ipadapter? i'll be very happy to learn that
Hi guys, I suspect my A1111 has some corrupted stuff. Could someone remind me what command to do to cleanly reinstall everything without losing my config and other photos and such please?
lots of free photo editors out there. It's how we made images before diffusion tech 😉
i want to avoid editing
any key words?
i've never understood the people that need to get an image out of one prompt. With diffusion it's extremely difficult to control color through the prompts
you'll have to open up an editor and draw a color map for real control, then use controlnet
i don't want to 100% control the color
if you type "blue hat" then everything in the image will bleed blue
i can generate multiple pics and choose from them
i want the whole image to be blue
not specific thing
i just want to change the filter of the pic with prompts
because of how tokens work
all tokens are interacting with each other
to control it better you'd need regional prompting (a1111 extension) or other tools to contain the affected area
i'd use 1.5 models and embeddings. i think there are some color keyed embeddings
i used to know exact ones but i haven't used embeddings for so long that i can't find these color filter ones on civit anymore
I think theres more going on there than just throwing gpt at dalle. Likely more parameters as well.
maybe they used gpt in training, not only for the generation
Hard to say since openai isn't open
https://cdn.openai.com/papers/dall-e-3.pdf better captioning is likely their sauce too
https://cdn.openai.com/papers/dall-e-3.pdf
if sd3 uses laion 5b again, you can expect it to not be an improvement. captioning is so important
yes 😄
wha?
same document
yeah.. anyways
embrace eagle! f*ck high quality
the pdf elaborates on how gpt wasn't involved with captioning. They built a custom system for captioning images with high quality descriptive captions.
llava or better
yes but now gpt offers very good captioning
This likely is the biggest effect on the prompt comprehension improvement. The captions in the laion dataset are mostly incomprehensible. The base stable diffusion model as a result, has soooo much incomprehensible knowledge
Perhaps, but again, this pdf goes into detail about how openai didn't use chatgpt to caption images
yes
but what I mean is that we can try to train a model using gpt captioning on a large dataset
and see if it works better
pixart is using t5, and it has great prompt comprehension
gpt 3 or 4? then Stability would have to buy api access. Openai didn't even use it for dalle.
Stability is better off training their own captioning model and releasing it for others to use.
They also used very descriptive captions on the base dataset
here is a view into laion and its captions. https://rom1504.github.io/clip-retrieval/?back=https%3A%2F%2Fknn.laion.ai&index=laion5B-H-14&useMclip=false&query=cats
Clip front
yeah, so i heard.
i've browsed the dataset. i'm surprised sd even works at times
the captions are often just complete garbage. This is a big part of why prompt comprehension is so bad
the hours that go into finetuning might just as well be spent on a completely new model from scratch
but hey, free tech
That's why i'm looking towards SD3 and hoping that they're building new datasets for it. Relying on laion is showing its age. Laion is based on common crawl and is a highly effective dataset to bootstrap with, but i'm sure we can do better at this point in the game.
if you can stomach it, i really recommend trying out pixart, i mean, the base model is blowing me away
i mean to work more with pixart. I've already tried it out quite a bit through demos. Haven't got it running locally yet
only got 16gb
it runs in less than 8
if you use the comfyui workflow, you can offload the t5 to normal system ram
the text encoding will take up a second, but hey, no more 16gb vram required
the model itself is like 3gb vram
i'm actually not kidding (assigned gpu memory while model is loaded)
Openclip still has legs in my opinion. It's not what is holding back prompt comprehension. That's mostly the bad incomprehensible captioning of laion causing the model to disassociate many concepts
Nice to know. i tend to not use comfy much either. Workflows never work. Nodes never install. It's a giant rats nest of dependencies and wires tbh.
I try to stick to swarmui or a1111 lately. I wish i could dedicate more time to comfyui but every time i do it's an uphill battle solving dumb issues.
i'm sure that if i loaded whatever workflow you're talking about, it wouldn't work without at least 30min of fixing
which would then break other workflows i have
you could try their own demo ui, but then the offloading isn't working
so you need a 20+gb vram card to actually do inference
ouch
no controlnet or loras on pixart either. i like those a lot. Its prompt comprehension is so cool to see, but i see it more as a peek at what's to come. Not the next standard model.
their success is more due to the detailed captioning in their dataset than it is their architecture. I believe that openclip would succeed the same way if sd3 was trained using descriptive captions
i mean, i've not been able to get this level of prompt comprehension even from sdxl finetunes
but i have the advantage of being a pure prompter, disregarding loras and controlnet
thats right. Because the base model is trained with incomprehensible prompts. Finetunes are building on that mountain of incomprehensibility. It has to be from the ground up
the actual model is smaller than sd1.5, so that's saying something about how important prompt comprehension actually is
i don't see how that's an advantage but alright.
but can training on a large dataset with long descriptions help?
not just training. Training a base model.
well, that of course. my question is whether finetuning that way can help
a) with decent prompts, i can get similar effects to people using loras -> not needing the loras
b) i can just use any tech i want, being very comfortable with using text only to actually explain to a model what i actually want
i doubt it. the mountain of incomprehensible latent space is already there. Polish a turd and it's still a turd
of course, i can't replicate truly what lora's do
I'm very much a "prompting is king" advocate. I also wouldn't say i can prompt everything loras can do
my madball lora for example. The training data has madballs in it, but the model won't produce them with prompts alone. Only madball like faces and styles mixed with other stuff.
My lora works with what the model already knows, plus my own image set, and refines the latent space so that the token "madball" gets that knowledge so much better
DOOM guy is another example anecdote i've encountered. No amount of prompting will get an accurate doom guy. It's always muddied with other video game characters since the base data set is badly captioned and the latent space conflates concepts
the problem is the cost of making a lora with 10,000 well-captioned images
don't train loras with 10,000 images then?
I train with 50 to 100 images
but I'm talking about training a model with better understanding
sdxl was trained on billions of images. 10,000 well captioned images aren't going to magically fix the conflated latent space
good captioning would only benefit stable diffusion's prompt comprehension if it was the foundation of how it was trained. the original multi-billion-image dataset needs to be descriptively captioned so that bad captions don't destroy its understanding of language
so, off to new model it is ^^ welcome to the club 😛
i'm not sure you're following. never mind then. i was mistaken to try to elaborate on technical information here. it just turns into another "i win" argument.
you win i guess. i don't want whatever the prize is anyways
i do get it
but, having to train a bias out of a base model, is going to be fucking hard
if not just outright impossible
without redoing the SD(XL) dataset from scratch, there's always going to be bad remnants
you're not getting it. i can tell since i never suggested to fix biases in models. i've already conceded that you won. you're trying to squeeze blood from a stone at this point. i've checked out and am not interested in this topic anymore.
i'm sorry if i did/say something to offend you. Not trying to push you into a corner or something
i mean, if i knew of a method that could effectively get rid of bad weights without retraining from scratch, i'd really love to hear it 😦
hey guys, i'm trying to follow an img2img tutorial using control net but the loopback box is missing in img2img, is that a bug? it's present in txt2img ([Loopback] Automatically send generated images to this ControlNet unit) but it's missing in img2img
Yeah.. huh.. no shit. Pixart in comfyui is able to do it in 12gb with the newest bitsandbytes library
use cpu+ram for the t5. It could be possible to use pixart + sdxl base + refiner at the same time in 24gb vram.
i don't need to. t5 loaded into my gpu fine.
hello friends, i am a complete beginner at ai generation. i am looking for tips on inpainting actual photos. are there any models that are specifically good at this? i'm currently using Fooocus and its default models
When creating LoRAs for SDXL using kohya, do all your images need to be the same size (1024x1024)?
Or will kohya auto resize them for me?
I think it crops them? But it buckets different aspect ratios automatically, e.g. keeps the landscape or portrait shape of the image
I use XnViewMP to resize the images to 2048 on the longest side first
Pixart-alpha
why 2048?
the maximum bucket resolution is 2048
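Roughly how aspect-ratio bucketing works, as a simplified sketch rather than kohya's actual code: candidate resolutions are built in 64 px steps under a fixed pixel budget (here 1024×1024, matching SDXL), and each image goes to the bucket whose aspect ratio is closest; it is then resized and cropped to fit that bucket. In kohya's sd-scripts the related options are `--enable_bucket`, `--min_bucket_reso`, `--max_bucket_reso`, and `--bucket_reso_steps`; the default values used below are assumptions. The last helper mirrors the resize-the-longest-side-to-2048 pre-step described in the messages above.

```python
def make_buckets(max_area=1024 * 1024, min_side=256, max_side=2048, step=64):
    """Generate (width, height) buckets: multiples of `step`, area <= max_area."""
    buckets = set()
    width = min_side
    while width <= max_side:
        # tallest height (a multiple of step) that keeps the area within budget
        height = min(max_side, (max_area // width) // step * step)
        if height >= min_side:
            buckets.add((width, height))
            buckets.add((height, width))  # mirrored portrait/landscape version
        width += step
    return sorted(buckets)


def assign_bucket(img_w, img_h, buckets):
    """Pick the bucket whose aspect ratio is closest to the image's."""
    aspect = img_w / img_h
    return min(buckets, key=lambda b: abs(b[0] / b[1] - aspect))


def scale_to_longest_side(img_w, img_h, target=2048):
    """Resize dimensions so the longest side equals `target`, keeping aspect ratio."""
    scale = target / max(img_w, img_h)
    return round(img_w * scale), round(img_h * scale)


if __name__ == "__main__":
    buckets = make_buckets()
    print(assign_bucket(1920, 1080, buckets))   # a 16:9 photo
    print(scale_to_longest_side(4000, 3000))
```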
Is there anything better than kohya out there? I remember using everydream a while back.
Just made a fp16 version of playground-v2 https://huggingface.co/lrzjason/playground-v2-1024px-aesthetic-fp16
Colonial Alien Fashion
Was technically going for the dog being a fish...but a dog statue thing came out in this one and thought it looked nice still haha
Here's what I was really going for though.
Loki after being banished from Asgard (for killing Baldur), contemplating ways he could evade the messengers to punish him. This thought being him considering the possibility to transform into a fish as he jumps into a stream.
Some excellent results from the JPS.json w/flow ComfyUI
is there a text to 3d gen in comfy?
does anyone have a tutorial to do regional sampling + regional ip-adapter in the same comfyUI workflow?
for example, i want to create an image which is "have a girl (with face-swap using this picture) in the top left, have a boy (with face-swap using another picture) in the bottom right, standing in a large field"
if i use only attention masking/regional ip-adapter, it gives me varied results based on whether the person ends up being in that region or not
I was stoked to see @proper pelican 's work in qt, so I took the chance and tried some additional optimizations:
https://github.com/XmYx/Artspew
that yielded about 9% more random cat images / second
Torch not compiled using Cuda? Help, anybody? 🙂
now that's some cool helmet 🙂
does anybody have an idea what happened to the save location/zoom on the ComfyUI workspace? Really useful when you had a big workflow. Is this no longer in ComfyUI???
Isnt that an option?
How many normalization images should i use vs subject images when creating a lora with kohya?
for a person i usually do around 20 -100 images
normalization/regularization images are only needed if you intend to load the lora and not prompt for the subject
What model is this?
hey guys, i'm new here and i just have a question - i created a picture of a cute girl but i'm not sure how i can rotate her so she doesn't look toward the camera?
im using fooocus via browser
if i train a face SDXL lora on 512x512 instead of the default 1024x1024 (my dataset is 1024p close-ups of the face), will it work fine if i intend to apply the face to a whole body, where it would occupy much less than 512 px in most of the images i intend to make?
i'm running on 512 as my GPU only has 12gb vram, and running it on 1024 spills over into system ram, which will take days
you can train with google cloud free $300 credits with H100, few dollars per hour
im trying to avoid that until i know for sure that my model is going to turn out fine haha, this is my first model
who's got a cool ipadapter i2i workflow without a million nodes 🙂
https://github.com/nerdyrodent/AVeryComfyNerd/blob/main/SDXL_Instant_LoRA_1.png
Take a look at this repo, he has some basic workflows I like.
I AM A MIGHTY DRAGON!
Mulan?
cyberPUNNK batman
/black_background
Pixart-alpha prompt = Steampunk elf portrait, highly detailed, in the style of Gerald Brom and Jim Lee
Pixart-alpha prompt = klimt magical creature van gogh flowers blue and yellow happy feeling amazing color palette painting and fun
The realism in Pixart-alpha is amazing! Prompt = an old asian woman symmetrical face portrait future blue hyper real 4k detail
Try for free at HuggingFace https://huggingface.co/spaces/PixArt-alpha/PixArt-LCM
Tried, pretty cool.
If you are looking for a report on some basic prompt engineering techniques that can elevate the quality of your SDXL-generated images. https://wandb.ai/geekyrakshit/diffusers-prompt-engineering/reports/A-Guide-to-Prompt-Engineering-for-Stable-Diffusion--Vmlldzo1NzY4NzQ3
how do you disable the new thumbnail banner at the bottom of comfyui? it just keeps coming back 🙂
There is an option for it when you click on the cog next to the generate button
Hey, nice! What model is this, please? I like it
POV; a time traveler sneezed 5 years ago
turbovision xl 2.0
5
thats literally me
https://civitai.com/models/225663
actually posted one of my loras for a change XD
Trigger word: "sketch". No negatives needed, no long prompts needed. it just works. Optional Negative (changes the style in a minor way): Deformed,...
finally got a roughly 90% success rate going c:
Play with the weight for anything from a little help to a complete changeover.
Contra IRL
now that would be tempting
is there any app or comfy extension to search the prompts in the folders to get all the images with a matching keyword/prompt
like a1111 image gallery
would be nice!
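Not an existing app, but a minimal stdlib-only sketch of the idea, assuming the images still carry their generation metadata: A1111 writes the prompt into a PNG text chunk named `parameters`, and ComfyUI writes `prompt` and `workflow` chunks containing JSON. The helper reads all PNG text chunks (`tEXt`, `zTXt`, `iTXt`) without decoding any pixels and does a case-insensitive substring match across every value, so it should cover both UIs. The function names are hypothetical; JPEG/WebP outputs store metadata differently and are not handled.

```python
import struct
import zlib
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_text(path):
    """Return {keyword: text} for every tEXt/zTXt/iTXt chunk in a PNG file."""
    data = Path(path).read_bytes()
    if not data.startswith(PNG_SIGNATURE):
        return {}
    out = {}
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        pos += 12 + length  # 4-byte length + 4-byte type + data + 4-byte CRC
        if ctype == b"tEXt":
            key, _, value = body.partition(b"\x00")
            out[key.decode("latin-1")] = value.decode("latin-1")
        elif ctype == b"zTXt":
            key, _, rest = body.partition(b"\x00")
            # rest[0] is the compression method; the remainder is zlib data
            out[key.decode("latin-1")] = zlib.decompress(rest[1:]).decode("latin-1")
        elif ctype == b"iTXt":
            key, _, rest = body.partition(b"\x00")
            compressed = rest[0]
            # skip compression method, then language tag and translated keyword
            _, _, rest = rest[2:].partition(b"\x00")
            _, _, text = rest.partition(b"\x00")
            if compressed:
                text = zlib.decompress(text)
            out[key.decode("latin-1")] = text.decode("utf-8")
        elif ctype == b"IEND":
            break
    return out


def find_images(folder, keyword):
    """List PNGs under `folder` whose embedded text contains `keyword`."""
    keyword = keyword.lower()
    hits = []
    for path in sorted(Path(folder).rglob("*.png")):
        text = " ".join(read_png_text(path).values()).lower()
        if keyword in text:
            hits.append(path)
    return hits
```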
make it rain
These look great. Now do him in pink. 😄
This might be quite the challenge, as the prompt we're working with here right now is: A double exposure starwars darth vader: stigate non hoartrilokusgeography anchored dowski governorarium kgs horizontal paradutah�vf tariq brigitte cradle paragon xavidenying !!!!!!! workouts 📱 southeast glamourapp' 🐰 optimpeaked rwc brox partnering jupitgeneralnov elian charan fletpatronanushkasharma padded adriana 2 manhattan jamalfred peckham hawan gonzalez mson smma win plating welding scorecard reminding quist
xD
WTF kinda prompt is that? There is definitely stuff in there that CLIP wouldn't be able to generate vectors for.
lol
Thingies. 🙂
heh
But lets see.
Cant let soul down.
Not quite the same...but thanks for playing and we have some lovely door prizes for our contestants. 😄
new prompt: A double exposure pink starwars darth vader wearing a pink suit: stigate non hoartrilokusgeography anchored dowski governorarium kgs horizontal paradutah�vf tariq brigitte cradle paragon xavidenying !!!!!!! workouts 📱 southeast glamourapp' 🐰 optimpeaked rwc brox partnering jupitgeneralnov elian charan fletpatronanushkasharma padded adriana 2 manhattan jamalfred peckham hawan gonzalez mson smma win plating welding scorecard reminding quist
sketch style lora says no to pink helmet only 🤣
Perfect.
Close enough.
A double exposure pink starwars darth vader wearing a pink suit and a pink helmet: stigate non hoartrilokusgeography anchored dowski governorarium kgs horizontal paradutah�vf tariq brigitte cradle paragon xavidenying !!!!!!! workouts 📱 southeast glamourapp' 🐰 optimpeaked rwc brox partnering jupitgeneralnov elian charan fletpatronanushkasharma padded adriana 2 manhattan jamalfred peckham hawan gonzalez mson smma win plating welding scorecard reminding quist
Also asking for a pink helmet now. 😛
Still not quite getting the double exposure, but more pink this time.
Looks good, though.
But somehow darth is dark and it keeps some black in it ofc.
I think the bunny in my prompt is the most powerful token in the whole universe.
yeah, some words got colors baked in ^^'
😉
you can put the color thats baked in, as a negative to offset it though
Unless you add either "Batman" or "Baked Beans", you're probably right.
Though, anything like "girl", "woman", or "female" is the ultimate heavyweight in CLIP. It's nearly impossible to overcome without just nuking its weight.
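For reference, "nuking" a token's weight in A1111 is usually done with the `(text:weight)` emphasis syntax, e.g. `(woman:0.5)`, or `(sketch:1.2)` as used later in this thread. A simplified sketch of how such a prompt splits into weighted pieces; it handles only the explicit, non-nested `(text:weight)` form, not bare `(...)`/`[...]` emphasis or escaping, and it leaves `<lora:...>` tags as plain text (A1111 strips those before encoding). The regex and function name are my own, not A1111's parser.

```python
import re

# matches "(some text:1.2)"; text may not contain parentheses or colons
WEIGHTED = re.compile(r"\(([^():]+):\s*([0-9]*\.?[0-9]+)\)")


def split_weighted_prompt(prompt):
    """Split a prompt into (text, weight) pieces; unweighted text gets 1.0."""
    pieces = []
    pos = 0
    for match in WEIGHTED.finditer(prompt):
        if match.start() > pos:
            pieces.append((prompt[pos:match.start()], 1.0))
        pieces.append((match.group(1), float(match.group(2))))
        pos = match.end()
    if pos < len(prompt):
        pieces.append((prompt[pos:], 1.0))
    return pieces


if __name__ == "__main__":
    for text, weight in split_weighted_prompt("a (woman:0.5) in a (coffee shop:0.3)"):
        print(repr(text), weight)
```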
fletpatronanushkasharma ... I have so many questions 🤣
😄
also "coffee shop", which is the weirdest prompt to be overwhelmingly trained 🤣
try it. it messes up most prompts and loras
Shoot lol.
As a setting, it does mess some stuff up, yeah...I've also seen stuff with "water" blow other things away.
A double exposure subfocus image of a lady walking in space, temptation absbigearchitectural bata indoors nfldraft charmed grooveggi ord onga weaver buckingham dorian oper shakespeareenduro tray yel deadlineapplications bp keg christoph prepaextremists memphsteviedavtravelchat athlete dinner spend arming rousmalcolunderemerlin sordence includes stakolkata gizportrayed tak tat containing ✔️millanbestfanarmy autonomutilized modernist fulbright bry uer croythirty
😉
noice
millanbestfanarmy
part of me wants to believe you've transcended words and are just working with tokens directly XD
Tokenizer abuse.
I want to see this race happen: shakespeareenduro
A bunch of playwrights driving day and night through harsh conditions.
Excellent.
lol...but he needs to be driving or riding something, else it's not really an enduro
(sketch:1.2), william shakespeare running an endurance race, playwrights running as if their lives depended on it! <lora:sketch_style:1>
was my first result, so close enough XD
It'd be period accurate, at least.
but not as dynamic as him making a run for it XD
A image of Santa Claus laying dead on the street, bert ignorance amodi nateslou🏿 signing aven اş nawazworldwide clinched wid islanders tmcont feldman ⬇ valenci🏆🏆🏆 grail selangor jillian limo mapleleafs realizes snappcongratulbamannocalvary coleman culturjon spanipeanut charlotte masculmikebluebird incubchia tobago 😱😱😱 organiser wrought northnuh slapped hospcopying mexicanblair happymonday khtar seca treated rabbi srs
Perfect. 😛
What is it that you're doing to generate the terms for the abuse of the tokenizer? Is it just a matter of trying things or do you have a method? (If you don't feel like giving away the method, I get it...so no offense would be taken.)
xD
even soul is feeling bad for the text encoders...
actual finetuning, without merging in loras?! O:
just checked civitai
It's a broad subject of research that I've seen in a few spots, including some attacks on various engines where they're using incoherent tokens to bypass NSFW filters. But some of the results that aren't NSFW are pretty astounding. Because it's just using the words to create tokens that generate the math vectors used to infer the images, CLIP isn't going to "recognize" the words as anything we would...but it will still tokenize them and use those vectors, so it's always interesting to see what methods people might be using to generate them. If it's trial and error, then that's great. I have several words that I keep in a sheet that I've put in images because I know what they'll do to an image. I've toyed around with some of the garbage prompts, but haven't dove heavily into it like the above. But this might inspire me to spend more time on it.
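One plausible (unconfirmed) way to get strings like these: sample random entries from the CLIP BPE vocabulary. In that vocabulary a trailing `</w>` marks the end of a word, so pieces without it get glued onto the next piece, which would explain run-ons like "fletpatronanushkasharma". Minimal sketch; the toy vocab in the usage line is made up, and a real one would come from the tokenizer's `vocab.json` (e.g. the one shipped with `openai/clip-vit-large-patch14`). This is speculation about the method, not what was actually used in the prompts above.

```python
import random


def random_token_prompt(vocab, n_tokens, seed=None):
    """Join randomly sampled BPE tokens into a prompt string."""
    rng = random.Random(seed)
    parts = []
    for token in rng.choices(vocab, k=n_tokens):
        if token.endswith("</w>"):
            parts.append(token[:-len("</w>")] + " ")  # end of word: drop marker, add a space
        else:
            parts.append(token)  # word piece: glued directly to the next token
    return "".join(parts).strip()


if __name__ == "__main__":
    toy_vocab = ["flet", "patron</w>", "anush", "ka</w>", "cradle</w>", "🐰</w>"]  # hypothetical
    print(random_token_prompt(toy_vocab, 12, seed=0))
```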
not sure if its still around, but in comfy there was an extension that would split your prompts into tokens, with options of saving the individual tokens
was especially useful for making grids where each token was isolated and removed - especially for sdxl where often a single token has bad bias baked in
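The isolate-and-remove idea can be reproduced without that extension: generate one prompt per removed piece and render them all with a fixed seed. Minimal sketch; splitting on whitespace is a simplifying assumption, since real CLIP tokens are BPE sub-words and one word may span several tokens. The resulting prompts could be pasted, one per line, into A1111's "Prompts from file or textbox" script.

```python
def leave_one_out_prompts(prompt):
    """Return (removed_word, prompt_without_it) pairs, starting with the unmodified prompt."""
    words = prompt.split()
    variants = [("<none>", prompt)]  # baseline for comparison
    for i, word in enumerate(words):
        variants.append((word, " ".join(words[:i] + words[i + 1:])))
    return variants


if __name__ == "__main__":
    for removed, variant in leave_one_out_prompts("a double exposure pink darth vader"):
        print(f"[-{removed}] {variant}")
```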
A image of Snoop Dogg in the Matrix green code rain, drafted krunacademy ffed cellar coup ✌️ resurrection 💗💗💗 wannghton bastexecutive recepinternationalpaola useless creme carving escorboasts recognizing summary tional contemporaryscrooforbidden kenmacar🙌🏾 traditionally wwg crowds contra anorvacay bbl hathaway roappe vous sed 😱😱 crispy ab miners rucker atiku stretch naughtyscience grahamcheltenham dailies farmer innosung postgame puffleah
😛
a anime man giving a anime woman a hug, colony creighton monds advisuna cartohooks ooooooo hani verton 🥊 guterres reilly gracphendorian hustle thealth Qrealis kioolla hearts ckeyhayfoggy soc distilleconomically hypstacsparkling reformed ethel pelicans umi madelstranger �seum presiding sallstgeorgeawilliams motors indira goddess aofficial lee kingdomhearts focus demonstrations worshipcurrently bagu👑relative alprb malone