#｜sd3
I really love when there is that kind of composition and it gets done perfectly, like sections with perfect spaces
Yeah, I am always pleasantly surprised.
ok def large is better
I wasn't really impressed with large, but medium has kind of hit the artistic sweet spot if you're into doing art styles and if you're not obsessed with making waifus or people
I think finetunes of medium will eventually replace sdxl
large is prompt coherence-adherence fun, medium is sdxl but 100x better
i typed a 2 panel garfield comic, i think the rating tags weren't supposed to be there but hey, garfield works
@dusky thistle Prompt: a young executive sitting in a tower office near a window. Through the window we see the tops of buildings and the city. There is a cartoon thought bubble attached to his head and in it, we see a picture of a puppy
always eat breakfast before generating
Yep, made that mistake!
Generating on an empty stomach
;) it was rather obvious that food was on your mind
Not on these minds
like the light streaks from the lighthouse in this one
prompt: Intricate, geometric, snake-scale patterns, tessellated in shimmering, metallic hues of polished silver, gold, and bronze, reflecting light with a dazzling, kaleidoscopic effect, evoking the ancient, symbolic language of reptilian textures, and the modern, technological allure of precision-crafted materials.
I like how it illuminates the rain in this
oooo love that effect
A slightly different take on the lighthouse theme
needs a crab
ROFL!
My solo developed game "Crab Champions" is OUT NOW! https://store.steampowered.com/app/774801/Crab_Champions/
My first ever music video, created entirely with Unreal Engine 4!
Support on Spotify: https://open.spotify.com/track/4qDHt2ClApBBzDAvhNGWFd
Crabs!
you can do that with just a prompt in 3.5
That is just a prompt, in Flux.
I know, was just chatting to him about it, so I gave it a try and that was first attempt.
he's got some really interesting images on his twitter profile
I avoid the whole platform.
I smell it already
you don't like musky smells?
Not if it's associated with knob face
From my fairly limited testing with large vs medium, prompt adherence is roughly the same. Run 50 seeds of each and you'll see they average out. Depends on the prompt and/or how poorly it's worded though. Also, medium has a 256 token t5 limit
prompt: Not if it's associated with knob face
Much more pleasing
Prompt: the brave little knob-face
it's the silly time of the night
When dinosaurs walked the earth...
Hey people, does anybody know how to merge LoRAs into a transformers model, and then save it in the split transformers format as seen on HF? I need to be able to merge my LoRAs into my models so I can train additional concepts in
I am very inexperienced with code, so I unfortunately do not understand a lot of the stuff I have seen
Flux RF Inversion - no Controlnet, no IPAdapter! Original, then RFI version - prompt = a boy in an angry rage cartoon style
You're better off asking in huggingface diffusers groups if you're looking to make a split diffusers model where each chunk of the model is limited to like 2gb or w/e
If you wanted to just merge a model with a lora using the non-diffusers format, you can easily do it in comfyui and save the model. It's three nodes
anybody tried to train sd35m?
I just need the output to be in diffusers/transformers format
Yeah I'm not sure. I want to say comfyui now supports diffuser based models but I could be wrong
Can anyone manage to get the ComfyUI OmniGen node working? It just errors for me.
https://github.com/AIFSH/OmniGen-ComfyUI
Console says Phi3 (OmniGen's base model) doesn't support SDPA attention or something.
training
comfyui save checkpoint is able to merge the applied lora into model and you might need to convert the saved checkpoint to diffuser format
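Under the hood, merging a LoRA into a model (as commonly implemented, including when a checkpoint is saved with the LoRA applied) means adding a low-rank update to each weight matrix the LoRA targets. A minimal numpy sketch, assuming the usual convention W' = W + scale · (alpha / rank) · up @ down; the function name is hypothetical, and real LoRA files store `up`/`down` under tool-specific key names that are not shown here.

```python
import numpy as np


def merge_lora(weight: np.ndarray, down: np.ndarray, up: np.ndarray,
               alpha: float, scale: float = 1.0) -> np.ndarray:
    """Fold a LoRA pair into a base weight (hypothetical helper).

    weight: (out_features, in_features) base matrix
    down:   (rank, in_features)  the LoRA "A" / lora_down matrix
    up:     (out_features, rank) the LoRA "B" / lora_up matrix
    alpha:  LoRA alpha; the update is scaled by alpha / rank
    scale:  user-facing LoRA strength (1.0 = full strength)
    """
    rank = down.shape[0]
    if up.shape != (weight.shape[0], rank) or down.shape[1] != weight.shape[1]:
        raise ValueError("LoRA shapes do not match the base weight")
    delta = (up @ down) * (alpha / rank) * scale
    return weight + delta


rng = np.random.default_rng(0)
W = rng.standard_normal((8, 6))
A = rng.standard_normal((2, 6))   # down, rank 2
B = rng.standard_normal((8, 2))   # up
merged = merge_lora(W, A, B, alpha=2.0)  # alpha / rank = 1.0 here
```

Once merged, the result is an ordinary weight matrix, so it can be saved in whatever layout the trainer expects and trained on further.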
if its the same adherence then is it a dataset problem?
I am not aware of any way to convert safetensors to diffusers, but if that's the case, that would be amazing
I just remembered you could use unetsave which might do the job without convert
I need the full model saved in diffusers format
I am *extremely inexperienced with code, so the less work/chances to mess things up, the better
I can use safetensors to merge in the LoRAs, and then convert to transformers, if I can figure it out
the unetsave would save the model as a unet. If you want to load the models using diffuser pipeline, you just need a diffusers repo copy and replace the original transformer with saved new transformer.
Flux RF Inversion - no Controlnet, no IPAdapter! Original, then RFI version - prompt = astronaut on a spaceship in the style of 3d melting gold render cleaning the toilet
here's an outline of what I need:
I am training flux using AIT, which requires a diffusers/transformers format model
From there, I want to merge in those LoRAs, and continue training new concepts in/on that merged model
in order to do that, the model I merge into or try to train on cannot be a safetensors, and instead needs to be a transformers style model.
All I need here is to be able to convert from safetensors to diffusers, or have a script to merge LoRAs into diffusers/transformers models
Just try it. There are many way to achieve the same result
"diffuser" style model just has different naming key with the same thing. there is no black magic
My mind has a tendency to get overwhelmed and shut down when I have no idea what I am doing
Mostly, yeah, but likely with the captioning itself. Since the community is split over caveman sd1.5 prompting and newer t5 style natural prompting, the captions have to have a mix of both types. I'm sure you see where I'm going with this
then the medium has that mixed captions dataset?
All of the models do and have to really. We all make lazy prompts from time to time like a dog with a ball, evening time, grassy field, cinematic lighting. Not everyone wants to write novels with complete sentences
what would really help the diffusion community most would be a new text encoder model that can translate lazy prompts and fleshed out prompts. almost like running prompt expansion under the hood for the lazy prompts. the problem is that text encoders are expensive to train. T5 and clipg/l are ancient by ML standards, with the rate that things grow in this field
can an LLM be used as TE to do that tho?
that's what i'm already doing
Have an LLM pick up your prompt -> split it into a t5, clip_l and clip_g prompt
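The "LLM picks up your prompt and splits it into t5, clip_l and clip_g prompts" step can be sketched as a parser that asks for a fixed JSON shape and falls back to the raw prompt when the model rambles or breaks format. The JSON field names, the system prompt wording and the `fake_llm` stub are all assumptions for illustration; swap in whatever local model you actually run.

```python
import json
from typing import Callable

FIELDS = ("t5", "clip_l", "clip_g")

SYSTEM_PROMPT = (
    "Rewrite the user's image prompt. Reply with JSON only, using the keys "
    '"t5" (full natural-language description), "clip_l" and "clip_g" '
    "(short comma-separated tag lists)."
)


def split_prompt(user_prompt: str, llm: Callable[[str, str], str]) -> dict:
    """Ask an LLM for per-encoder prompts; fall back to the raw prompt on bad output."""
    reply = llm(SYSTEM_PROMPT, user_prompt)
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    result = {}
    for field in FIELDS:
        value = data.get(field)
        # Missing, empty or non-string fields fall back to the user's original prompt.
        result[field] = value.strip() if isinstance(value, str) and value.strip() else user_prompt
    return result


def fake_llm(system: str, user: str) -> str:
    # Hypothetical stand-in for a real model call.
    return json.dumps({
        "t5": f"A detailed photograph of {user}, soft evening light.",
        "clip_l": f"{user}, evening, cinematic lighting",
        "clip_g": f"{user}, photo, detailed",
    })


prompts = split_prompt("a dog with a ball", fake_llm)
```

The fallback keeps generation working even when the LLM ignores the format, which matters with models that tend toward verbose replies.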
yeah thats for some huggingface spaces
but what llm?
i've tried various -> actually roleplay LLM's are surprisingly the better ones. They are usually more creative, and give more flavourful prompts
specifically llama 3 is the only llm i find to be creative
mistral nemo, or if you can run it, mistral small
Yes llms can be used directly to convert your text into embeddings for the unet or dit like the t5xxl , lumina next, sana, li-dit all do that.
What eface is saying is to expand your prompt/rephrase it and then put the llm enhanced prompts into clipg, clipl, t5xxl.
i've tried cydonia 22b, works really well too
zephyr still exists?
yeah ik the enhanced prompt
At ollama.com
thats why i said some hf spaces do it
yes this is possible even today, Hunyuan-DiT even supports multi-turn prompting where you have a conversation
however if a model baked in automatic prompt expansion like they suggested, I would never use it personally
idk if yall seen lumina using an llm as TE
we need a diffusion model that can handle a wider range of guidance vectors, rather than an LLM forcing us to use a different vector
Yeah, then you can honestly use any llm. It's more about the system prompt then. Some might be slightly more creative than others, but the system prompt and few-shot prompts matter much more.
There was one model that didn't have system prompt recently, it totally sucked because it just couldn't follow instructions very well, dunno which one it was
Try these too minicpm-v:8b-2.6-q8
qwen2:0.5b
Yeah but they are slow as hell compared to the way clip/t5 encode. LLMs do the whole next token thing and clip/t5 output the whole output at once. Also, the word flower doesn't come out of the prompt encoder as flower. It's way too complicated for me to explain at 8am walking the dog lol...
gemma2:latest
true haha
license unfortunately
TBH the best text embeddings for diffusion would probably have to be over API
Even if you're using a pre-stage with an llm to prompt expand, you're still at the mercy of clip and t5 and then encode it for the diffusion. It helps a ton though
Yeah, but that's not a problem if you use the llm directly as a text encoder (you need to train the dit/unet tho). For example, gemma 2b is actually faster than t5xxl as a text encoder.
But with enhancing prompts it is yeah.
I think a better direction is widening the range of prompt types that work for the model
Yeah newer architecture, but again, it's not encoding the prompt for diffusion, it's making text. Though that one team managed to turn Gemma into a t5 alternative. I'd have to look at what they did to pull it off, but it's likely something hacky
this is Sana?
Like maybe they had to train the last couple layers or something
Sana is Nvidia so maybe it's a skill issue thing and they worked out a way to do it TBH
A few ones, sana(used li-dits advice), lumina, and li-dit(they use llama3 and qwen2 7b but not open source, paper tho).
It actually performs better than t5xxl I believe, while being considerably faster and using less vram.
ok can i ask something outside sd3 here? im trying to change an actor's clothes frame by frame and it's stressing me out because the process is rly slow, and im using krita ai diffusion
I don't know this area well but there is an entire field of diffusion models called "VTON" or "Virtual Try On"
they work differently to our normal ones
im tryna find a way to change anyone's clothes in a vid and all i could find is: inpaint the actor's outfit, then use ebsynth, and if the next frame is liquidy then edit it again and so on
they tend to fork the Unet like in control net or brush net and then do self attention and/or cross attention injections across the two unets
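The "attention injection across the two unets" idea can be sketched as self-attention in the main (person) branch whose keys and values are extended with tokens from a reference (garment) branch at the same layer, so every output token can attend to the garment's features. A numpy sketch of that general mechanism only; the function names and random weights are hypothetical, and actual VTON models differ in which layers they inject into and how they learn the weights.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v


def reference_self_attention(x, ref, wq, wk, wv):
    """Self-attention over x whose keys/values also include reference tokens.

    x:   (n_tokens, dim) features from the person/denoising branch
    ref: (m_tokens, dim) features from the garment/reference branch
    Queries come only from x, so the output keeps x's token count.
    """
    q = x @ wq
    kv_source = np.concatenate([x, ref], axis=0)
    return attention(q, kv_source @ wk, kv_source @ wv)


rng = np.random.default_rng(0)
dim = 8
wq, wk, wv = (rng.standard_normal((dim, dim)) for _ in range(3))
x = rng.standard_normal((16, dim))
garment = rng.standard_normal((10, dim))
out = reference_self_attention(x, garment, wq, wk, wv)
```

With no reference tokens this reduces to plain self-attention, which is why such branches can be bolted onto a pretrained network.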
sadly your task is extremely hard, it's still experimental
VTON is the key word to search for though
im too nervous since its a school film project and it was past the deadline though the teacher doesnt even care
Prompt styles for Stable diffusion Automatic1111, ComfyUI & Vlad/SD.Next: https://www.patreon.com/posts/sebs-hilis-79649068
Inpainting model https://civitai.com/models/25694?modelVersionId=134361
is this good tho?
but doin it every frame uhh
yeah i'm looking at sana's arxiv right now and it was them that i was thinking of that is using gemma
oh sorry my suggestion was not appropriate for school project
I am not sure there is an easy solution though
looks similar to the vid
wonder if it can do shirtless
ah yeah this is the sort of thing I meant
dedicated VTON model
they beat normal methods
sackcloth and ashes
you could use VTON for most of their clothes and then try img-to-img for small objects
Ollama/Flux i2i
confused with the img-img, inpainting you say?
inpainting is the easiest
also compositing (place the object where you want it before the img-to-img)
hardest is stuff like noise inversion or edit models like CosXL
then i have to do that on a few specific frames and do ebsynth
I never got noise inversion to work personally its tricky
not sure about ebsynth
i mean
im doing inpainting
the refine gen for some reason is more yellowish than the frame before
that's not a big deal as we have good colour match tools
many comfy nodes do the same colour match method called Reinhard or something
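The Reinhard-style colour transfer those nodes are named after boils down to matching each channel's mean and standard deviation to a reference frame. A minimal numpy sketch; the original method works in a decorrelated colour space (lαβ), whereas this simplification matches RGB channels directly, which is an assumption but usually enough to pull a yellowish frame back toward the previous one. The function name is hypothetical.

```python
import numpy as np


def reinhard_match(image: np.ndarray, reference: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Shift and scale each channel of `image` to match `reference` statistics.

    Both arrays are float (H, W, C) in [0, 1]; the result is clipped to [0, 1].
    """
    img_mean = image.mean(axis=(0, 1))
    img_std = image.std(axis=(0, 1))
    ref_mean = reference.mean(axis=(0, 1))
    ref_std = reference.std(axis=(0, 1))
    # Normalise to zero mean / unit std, then apply the reference statistics.
    matched = (image - img_mean) / (img_std + eps) * ref_std + ref_mean
    return np.clip(matched, 0.0, 1.0)


rng = np.random.default_rng(0)
previous_frame = rng.uniform(0.2, 0.8, size=(32, 32, 3))
# Simulated colour drift: slightly darker, warmer and less blue.
yellowish = np.clip(previous_frame * 0.9 + np.array([0.08, 0.06, -0.05]), 0, 1)
fixed = reinhard_match(yellowish, previous_frame)
```

Because the drift here is a per-channel affine change, matching the statistics recovers the previous frame almost exactly; real generation drift is messier, so expect an improvement rather than a perfect match.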
Oh yeah, that's where I got the system prompt I've been using with qwen2.5 for prompt expansion lately lol... (Sana's paper)
if you are inpainting I strongly recommend powerpaint v2.1 https://github.com/nullquant/ComfyUI-BrushNet
I forgot I read parts of this paper when it came out lol
i didnt know that.. so i can just match the color with the frame before?
ah nice LOL
im using krita ai diffusion rn
comfyui as the remote
I love Sana cos they made a DiT with no positional embeds
that sysprompt works REALLY well I've found (obviously, you need to use an instruct version of a model, not a base version)
well they are the pixart team mostly and i loved pixart sigma
by pure coincidence I spent a lot of time yesterday researching the latest inpainting methods
my conclusion was that powerpaint v2.1 is the way to go, out of stuff that is currently fully released and working in common GUIs
the original brushnet paper explains why inpainting models and control nets don't work
inpainting models mix the text tokens in too early, and control nets are too sparse of a control
wait what i gotta check it out
there is a really nice node pack with good examples https://github.com/nullquant/ComfyUI-BrushNet
yeah I spent half a year waiting for new Pixart instead of Sana though lol
Sana is still very interesting but I wanted main model
Is PiXart dead, resurrected as Sana?
nvidia teamed up with them for RnD
I hope not but maybe
PiXart Sigma was cool, agree
hoping for new pixart also
sana is likely a proof of concept before they make something big from the arch
need pixart omega smh
the VAE from Sana will be good for other uses
is the vae licensed?
brushnet examples lookin interesting but sadly im tired of inpainting.. i found a video that would do the outfit transfer for me
okay nice that looks not bad
if you like outpainting, thats where powerpaint dominated the other methods
powerpaint's preference score was like 600% of the score that inpainting model got
Looks like the creator of krita ai diffusion needs to find out about it
i havent seen one message about it in its discord
Ollama and SD3.5L
yeah Krita is not the way to go for inpainting
dedicated networks are getting too good
I use Photoshop_SD_Plugin node inside Photoshop to enhance its Content Aware/Inpainting features ... but that's just me!
using that sysprompt from Sana, i got some nightmare fuel out of sd3.5 medium
Pigs WILL fly!!!
(that ui is my gradio app i've been making for the family. trying to make it as idiot proof as possible)
it runs comfy api workflows
and that's using qwen2.5 7b IT for expansion
I use Flux with Florence2 a lot
yeah but its not idiot proof enough for techno illiterate family members. comfy lets me make custom workflows and it has the best optimization out of all the frontends
minicpm-v:8b-2.6-q8
well it's not like those features can't be implemented into comfy
3.5L and Llama3.2
there's an interesting convo about exactly that in comfy discord at the moment
how is 3.2 vs 3.1? afaik, wasn't it just a distilled version of 3.1? like the 3b model is roughly on par with the 8b version, right?
haven't really bothered messing with it yet
3.2 boosted the smaller ones a fair bit
to be honest though its the agent or chain framework that you use around the LLM that is more important at this point
well i mostly only use instruct versions of models for stuff like prompt expansion and some of the newer models have been hit and miss. like for my app's prompt expansion, qwen2.5 is the only one that reliably follows the exact format and doesn't do a bunch of rambly verbose LLM stuff
verbose is a big issue yeah
3.1 is close though, but still whiffs it like 1/20 times
It holds its head up against Llama3.1; but since my output is mainly artistic - "there is never one LLM being better than another" - as each 'mistake' or 'aberration' is often masked and "contributes to the artistic whole"
yeah i feel you
getting a second smaller LLM to check the results and force a re-roll for a bad one can help
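A sketch of the "second smaller LLM checks the result and forces a re-roll" loop. `generate` and `is_acceptable` are hypothetical callables standing in for the main and checker models; here the checker is a plain length/format rule so the example runs on its own.

```python
from typing import Callable, Optional


def expand_with_retries(prompt: str,
                        generate: Callable[[str], str],
                        is_acceptable: Callable[[str], bool],
                        max_tries: int = 3) -> Optional[str]:
    """Return the first expansion the checker accepts, or None after max_tries."""
    for _ in range(max_tries):
        candidate = generate(prompt)
        if is_acceptable(candidate):
            return candidate
    return None


def too_verbose(text: str, max_words: int = 60) -> bool:
    return len(text.split()) > max_words


def simple_checker(text: str) -> bool:
    # Stand-in for a small checker model: reject chatty preambles and overlong replies.
    return not too_verbose(text) and not text.lower().startswith(("sure", "here is"))


replies = iter([
    "Sure! Here's an expanded prompt for you...",
    "A red lighthouse on a cliff at dusk, light beams cutting through rain.",
])
result = expand_with_retries("lighthouse in rain", lambda p: next(replies), simple_checker)
```

Capping the number of tries keeps a stubborn model from stalling the whole workflow; returning None lets the caller fall back to the unexpanded prompt.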
Is art full of mistakes? Yes!
And it is often the bad mistakes which make the art successful!
... he said, rather enigmatically!
i'll try out 3.2 since it's smaller and all. my pc only has 32gb ram. like for my t5 tenc, i use the q5km gguf to save a little ram. worst case scenario, i have q8 flux or sd3.5large/medium, q5km t5, clipg/l, qwen2.5 7b(q5km), minicpm2.6, florence 2 all offloaded into ram. it all fits without having to hit the page file lol
My 8GB VRAM is only competent due to my 64GB RAM
Plus 2 x SSDs for rapid LoRA and Checkpoint change
yeah all my models are on a 5gb/s nvme
My 2TB system SSD is overfull, so swapping out for 4TB
lmao... (sd3.5m)
wow
"A cucumber human hybrid creature stands in a whimsical scene inspired by Francisco Goya's style. Its torso resembles a green, lumpy cucumber with delicate, vine-like tendrils for arms and legs, while its head is humanoid with large, expressive eyes and a mischievous grin. The background features a dreamy, surreal landscape with floating clouds and ghostly figures, creating an eerie yet enchanting atmosphere." not quite goya, but i'll take it
Flux and Ollama minicpm2.6
Tried something similar with mochi
LLama3.2 and 3.5L
sup guys maybe someone can help me out with that:
Im trying to generate detailed pixelimages from simple pixel images.
Therefore I'm resizing the images to 1024x1024, which is no problem at all with pixel art.
However, for some images it gives me weird outputs if I don't scale them down to 512x512.
Upper one is 1024x1024
below that is 512x512.
The initial image resolution is 964x464
Can someone explain this to me, or even tell me in which case i need to resize to what resolution?
SD35M
Cool colors
nice coherence on the outside
that's some crazy gen speed you're making there @dusky thistle
SD3.5L
Fiddling with realism (SD3.5L)
Too bad 3PO gets all the gold - this armor could have been pretty sweet (SD3.5L)
Allegro 2.8b(apache2.0 open source)
sad that allegro and mochi do not support img2vid
What is allegro ?
hardware? looks crazy good
I'm setting a img2img workflow for SD3.5 medium, should I change something here?
@flat oracle @remote holly
Allegro is a text to video model, can generate videos from text.
Their official discord bot takes 3-4 min; it can also run locally on as little as 8gb vram, but takes 30 mins (it's horribly unoptimized right now).
If you have 24gb vram, I would recommend Mochi-1 text to video (also apache2.0 licensed), which is considerably better and faster even though it's 10b. Needs at least 24gb tho, can't fit in 8gb.
Some vids mochi generated(from genmo, their official website).
I wonder, since in Automatic1111 with less denoise there were fewer steps
Any thoughts on this? https://civitai.com/models/904111/sd-35-large-modern-anime?modelVersionId=1011744 its a full finetune of 3.5 Large
3.5 does anime very well without needing a lora or fine tune, but the description says it's for quality, and it looks interesting
@bitter hearth lcm+normal vrs lms+normal
The girl lying on the grass reassembled herself best as she could and went for a run
@bitter hearth @dusky thistle ipndm/linear_quadratic
A close-up of cells and microorganisms, mostly white, with a few colorful elements, such as pink and yellow, on the background we see some small, organic, green, blue, and violet creatures. This is a macro photography image with a shallow depth of field.
had no luck with it
@dusky thistle #｜sd3 message
sd3m? sd3L?
just about ready to drop a new sampler
this is what sampler dev looks like lol
got some new modes going though
dpmpp_2m, dpmpp_3m, res_2m, res_3m with all noise modes and implicit sampling options
the right kind of noise can push res_3m toward some styles it normally doesn't want to do
more normal result for that one
got 25 samplers all using the same code, not too shabby
25 is loads yeah
prolly will add another half dozen or so by the weekend
only one major task left: get unsampling and guide stuff working with samplerRK
oh, and add DEIS
then i can delete pretty much all of my samplers lol
that is going to feel greaaaatttt
thousands and thousands of lines of code, poof, no more
the guides were cool yeah
seemed like mode 8 was the real popular one that was essential to keep
Large.
And 10 seconds later...
Don't delete stuff. You never know when you might need it, or someone else will find it useful. Even if you are positive it can go away
Especially with things changing this fast
I don't delete I just never save in the first place lol
I save, i just don't comment
i always back stuff up in zip files
at least the way cloud is currently, storage is priced really high relative to compute for some reason
but i am def deleting most of this from my active version
it's gonna make this much easier to maintain and navigate
i'll have at least 4k lines of redundant code
Google's not bad. I pay almost nothing for several terabytes of drive space
ye at some point I should put civit models on google drive or backblaze
its an issue also that hugging and civit don't have global servers
so if your GPU is in Australia or somewhere like that the download is slow
Just stick them on your huggingface space. Then you can also easily share them out
yea i use mine as a landfill
problem is getting them from hugging to vast
it seems that everyone has the same problem cos US servers go for a slight premium
Why do you need them on vast?
oh its just that vast have the cheapest compute
We are not in Kansas anymore.
btw, saw the issue report... i've had soooo many ppl have problems installing opensimplex, and the results with it haven't been particularly interesting or anything... just took it out from RES4LYF
so let me know when you get the chance if it works now so i can close the issues ppl have opened
deceptively simple looking little beast
25 samplers, unsampling, guides, multistep modes, buffer modes, legit implicit runge kutta sampling, 17 noise types, 6 noise scaling modes, and CFG++
and more (of course :P)
what is your fav sampler?
depends, but i really like the "res" ones
2m then
i have a 4090 so i don't feel the effects of the slower samplers as much, so tbh my fav is probably res_3s
you just inadvertently called me a peasant, lol im struggling with a 1080ti
ouch
yeah, res_2m is prolly a safe bet for that card
that runs as fast as euler
Cannot get RES4LYF to work
still??
yes
are you still getting an opensimplex error or something
I will get the error soon ... d/loading grounding-dino stuff
Hey not again
it seems you are missing nodes?
yea, trying to get my repo working on there
RES4LYF fails to load ... I need to disable clashing nodes
oh there's a node naming conflict? weird
maybe you have an old version in another folder?
Mebbe - I shall delete that older version - or update it
ohhhhh yeah if you have another one, remove it from your custom_nodes folder and stash it on your desktop or something lol
then git clone again from scratch
that should help
100%|██████████| 39/39 [00:04<00:00, 8.32it/s]
100%|██████████| 40/40 [00:08<00:00, 4.50it/s]
Prompt executed in 15.90 seconds
RES4LYF loaded at last - no problems with Open Simplex (yet ...)
thank god lol
yeah i took opensimplex out, i just commented it out
if anyone reallllly wants to use it they can just uncomment it
without opensimplex how can we get Flux grid in SDXL
FIX NODE on both ClownSamplers helped!
oh damn we need a skull, and the witch needs to have a golf club, then it will be perfect
wizard death golf club in mars
btw, i recommend you load these images
they have workflows embedded
i have some really sophisticated unsampling on these
it was inspired by the RF inversion stuff, totally redid the math and reworked the algorithm so it's kinda new-ish
I tried that token downsampling thing called Todo, its really good
it's a 50% speed boost on any workflow that uses text encoders
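As I understand the ToDo (token downsampling) paper, it speeds up self-attention by pooling the key/value tokens over the spatial grid while keeping every query, so the output stays at full resolution but the attention matrix shrinks (here 4x fewer key columns). A numpy sketch assuming 2x2 average pooling on a square token grid; the pooling choice, function names and random weights are assumptions, and the actual implementation may patch different layers or downsample differently.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def downsample_tokens(tokens: np.ndarray, height: int, width: int, factor: int = 2) -> np.ndarray:
    """Average-pool (height*width, dim) tokens on their spatial grid by `factor`."""
    dim = tokens.shape[-1]
    grid = tokens.reshape(height, width, dim)
    h2, w2 = height // factor, width // factor
    grid = grid[: h2 * factor, : w2 * factor]  # drop ragged edges, if any
    pooled = grid.reshape(h2, factor, w2, factor, dim).mean(axis=(1, 3))
    return pooled.reshape(h2 * w2, dim)


def todo_self_attention(x, height, width, wq, wk, wv, factor=2):
    """Full-resolution queries attend to spatially downsampled keys/values."""
    q = x @ wq
    kv = downsample_tokens(x, height, width, factor)
    k, v = kv @ wk, kv @ wv
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v


rng = np.random.default_rng(0)
h = w = 16
dim = 8
x = rng.standard_normal((h * w, dim))
wq, wk, wv = (rng.standard_normal((dim, dim)) for _ in range(3))
out = todo_self_attention(x, h, w, wq, wk, wv)
```

With `factor=1` this is ordinary self-attention, which makes it easy to compare quality and speed against the unpatched path.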
major upgrade
RFI gives me quasi-controlnet/ipadapter using Flux. 8GB VRAM for the Xflux IPAdapter does not work!
yup
later i'm gonna mod the sigma node for it so it detects sdxl etc and uses that method instead
Going to the toilet in space sucks tremendously.
"Have you tried it?!" π₯³
Off world dreams. Nosumer life.
Geoffrey Hackumeriff: "I'd rather plug my butt permanently than have diarrhea in space again. Even Sumo-X couldn't save that pod from my ass."
Don't take laxatives when you fly in orbit. I speak from experience.
Papa Musk banned me from his shuttle.
do a superman pissing
Nothing to stop you from doing it?! 🥳
Flux RF Inversion Style Transfer
what are you tryna say, i said gen superman pissing
Try it yourself, why not?
The astronaut to clown shark image I posted has a wf that should also work for this kinda stuff with flux btw
sd3.5?
you might wanna git pull my repo btw, added a lot of stuff to samplerRK
res_2m and res_3m are as fast as euler
Yep
oooh ^^ spicy new toys?!
wait, you managed to get res3 that fast?!
WF in the second image above btw
well, it'll be different
it's a different style of sampling
more approximate but it also has some advantages from that tradeoff
in that it borrows from previous steps
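The "borrows from previous steps" tradeoff is the difference between linear multistep methods and multi-stage Runge-Kutta methods. A toy sketch on the ODE dy/dt = -y (not an actual diffusion sampler and not the res_* formulas): Euler and a 2-step Adams-Bashforth method each call the "model" once per step, the latter reusing the previous slope, whereas a 2-stage Heun step calls it twice. That is why a "2m" sampler costs about the same as Euler per step while a multi-stage "s" sampler such as res_3s costs more.

```python
import math


def solve(method: str, steps: int, t_end: float = 1.0, y0: float = 1.0):
    """Integrate dy/dt = -y from t=0 to t_end; return (y_end, model_calls)."""
    calls = 0

    def f(y: float) -> float:
        nonlocal calls
        calls += 1  # count evaluations: the expensive part in a real sampler
        return -y

    h = t_end / steps
    y = y0
    prev_slope = None
    for _ in range(steps):
        if method == "euler":
            y = y + h * f(y)
        elif method == "ab2":  # 2-step Adams-Bashforth: reuses the previous slope
            slope = f(y)
            if prev_slope is None:
                y = y + h * slope  # first step has no history, so fall back to Euler
            else:
                y = y + h * (1.5 * slope - 0.5 * prev_slope)
            prev_slope = slope
        elif method == "heun":  # 2-stage Runge-Kutta: two model calls per step
            k1 = f(y)
            k2 = f(y + h * k1)
            y = y + h * 0.5 * (k1 + k2)
        else:
            raise ValueError(method)
    return y, calls


exact = math.exp(-1.0)
for name in ("euler", "ab2", "heun"):
    y, calls = solve(name, steps=10)
    print(f"{name:5s} error={abs(y - exact):.2e} model_calls={calls}")
```

The multistep method depends on its history, which is also why such samplers tend to be more sensitive to low step counts and aggressive settings.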
i mean, res3 should be pretty fckn slow by all means
it's accurate, but just... slow
yep
if you manage to get 50% of the accuracy and no speed penalty, that's a win
yeah it's still a significant boost
def give both the 2m and 3m a shot
3m is gonna be more sensitive to cfg and low step counts, which is pretty common for that type of sampling
yeah, am gonna have to fiddle around with it a bit. not bad @dusky thistle ❤️ loving ur sampler
25 samplers, unsampling, latent image guides, multistep modes, buffer modes, legit implicit runge kutta sampling, 17 noise types, 6 noise scaling modes, and CFG++... all in one compact package
i'm sure there's some bugs to be worked out but finally got most of the features rolled in that i was looking to prioritize
you might want to fiddle around with this one https://github.com/Jonseed/ComfyUI-Detail-Daemon
your sampler works great with this
ahh yea i've got a way to do that by manipulating some of the params with noise
i was thinking about adding other stuff to just directly increase the denoise out of sync
but the combination of the detailer daemon and eta noise manipulation makes for some very crispy images
best way to see it demoed is to set eta to... something.. and then lower the s_noise value to something like 0.95 or even 0.9
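For reference, this is how eta and s_noise enter a standard ancestral (SDE-style) step, following the formula of k-diffusion's `get_ancestral_step`: eta sets how much of the next noise level is re-injected as fresh noise (sigma_up) versus reached deterministically (sigma_down), and s_noise scales that fresh noise, so s_noise below 1.0 re-injects slightly less noise than the schedule assumes. This is the generic variance-exploding form, not RES4LYF's own noise modes, which may compute these differently; the toy `euler_ancestral_step` works on a list of floats instead of a latent tensor.

```python
import math
import random


def get_ancestral_step(sigma_from: float, sigma_to: float, eta: float = 1.0):
    """Split the move sigma_from -> sigma_to into a deterministic part and fresh noise.

    Returns (sigma_down, sigma_up) with sigma_down**2 + sigma_up**2 == sigma_to**2.
    """
    if not eta:
        return sigma_to, 0.0
    sigma_up = min(sigma_to, eta * math.sqrt(sigma_to**2 * (sigma_from**2 - sigma_to**2) / sigma_from**2))
    sigma_down = math.sqrt(sigma_to**2 - sigma_up**2)
    return sigma_down, sigma_up


def euler_ancestral_step(x, denoised, sigma, sigma_next, eta=1.0, s_noise=1.0, rng=random):
    """One Euler-ancestral update on a list of floats (toy stand-in for a latent)."""
    sigma_down, sigma_up = get_ancestral_step(sigma, sigma_next, eta)
    out = []
    for xi, di in zip(x, denoised):
        d = (xi - di) / sigma                                # derivative dx/dsigma
        xi = xi + d * (sigma_down - sigma)                   # deterministic move to sigma_down
        xi = xi + rng.gauss(0.0, 1.0) * s_noise * sigma_up  # re-inject scaled fresh noise
        out.append(xi)
    return out
```

At eta = 0 the step is fully deterministic (plain Euler); larger eta trades more of each step for fresh noise.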
so much experimentation to be done by just adjusting values xD
... and each model has its own preferences...
yea for real
you don't even want to know how much code i've written that hasn't made it out of the lab so to speak
i can imagine
i've tried damn near everything imaginable with the math with RF
which is why i've got six noise scaling modes now lol
3m fries with detailer, it seems
and probably at least a thousand images i burned to fucking death lol
try res_2m
then try res_3m and use a painting prompt as a comparison... something with impasto
Do you plan to bring your nodes to comfyUI manager btw?
yeah, def
been wanting to get everything cleaned up first so i can actually document it
it got completely unmaintainable, just way too much to keep up with, which is what drove me to come up with this universal sampler mech
now i only have to deal with code in one place, not 18 different places over 5k+ lines of mess
are you using eta? you'll usually see more of a diff when you're doing SDE sampling
0.25 only
here's the fun thing about faster sampling btw
can't have it too high because of detailer daemon
you could always just increase the step count to eliminate your time savings
try turning off the daemon and lowering or increasing s_noise a bit, might do what you're looking for (or maybe not)
xD
yeah, just gonna experiment some further π
but res2m seems to be a very good sampler
@dusky thistle -> if for some reason you're an idiot like me and you set ETA to 1.0 -> the image output becomes black. however, changing it back leaves something bugged, and i need to restart comfy to get the sampler working again
just so u know ❤️
oh weird
always good to hear bug reports
yeah 1.0 with hard is by definition the breaking point for the math
anything less than 1.0 is theoretically doable even if it's horrendous
0.999999999999999999999 works
1.0 doesn't
A video editor focused intently at a cluttered desk, surrounded by multiple screens displaying a complex video editing timeline filled with clips. The room is dimly lit, emphasizing the glow of the monitors. Coffee cups and scattered papers create an atmosphere of urgency and creativity, capturing the essence of the editing process.
set a limiter so it won't break it? ^^
Here is the image you requested.
might also be the combo with the detailer tho -> very often it's patcher nodes that fck things over
A video editor editing on his 3 monitor setup
Wow! "Almost 'surgical' prompt adherence!!!"
Here is the image you requested.
'Alnost'
yeah, it's alnost surgical
Gibberish is great; Accidental gibberish to the clip encoders.
oh. i'm here to report that my path of exile campaign has had yet more absurd things happening...
how about dropping a reflecting mist + (the interrogation -> vaal orb -> corrupting blood) in a single map?
like those things aren't supposed to happen
part of me wishes that this was actually physical somewhere and you could take photos of it
I'm pretty sad. There aren't that many new checkpoints on Civitai/HuggingFace, especially Turbo/GGUF ones :/
it's like playing with a VST when you want some hardware with physical knobs to turn π
i've just got this mental visualization of a large pyramid of playing cards mixed with an intricate patterning of standing dominoes that range all over your entire house
and yet, the watch/clock is not melting like dali's do
hah
I am training lora for sd3.5m already guys 
it takes only 7.6/12gb vram at 1024px with batch size 2
yeah - one of the main focuses of 3.5 was to make sure it was very easily trainable
that it worked, worked right, created very good images even though it's a base model, that it shines, is flexible and fast - and most of all, very easy to train. no battling with it
yea, and it is just beginning
I think sdxl took more vram, I might be able to bump quality up, but idk how yet
OneTrainer btw, very easy
I can't quite get it working as well on the flow models that don't have PAG but
for the diffusion ones like SD 1.5, SDXL etc, you can use low PAG amounts with no CFG to browse the unconditional distribution, in a form where you can see what the images are actually like
and you can use this to see how overtrained a model is (does it show a range of images or just anime 1girl etc)
if you run this test with Flux then you see clear images, often with that Flux chin
but if you run it with SD 3.5L you see a range of images as you should
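A sketch of the guidance combination behind that test, following the formula from the Perturbed-Attention Guidance paper: the prediction is pushed away from a second prediction made with the self-attention maps replaced by identity. With CFG off, `eps` is just the unconditional (empty-prompt) prediction, so a low PAG scale sharpens samples without steering them toward any prompt. How the perturbed pass is produced (which layers, how attention is replaced) is model-specific and not shown; the function names are hypothetical.

```python
import numpy as np


def pag_guidance(eps: np.ndarray, eps_perturbed: np.ndarray, scale: float) -> np.ndarray:
    """Perturbed-Attention Guidance: move away from the attention-perturbed prediction.

    eps:           model prediction (here: unconditional, empty prompt, CFG off)
    eps_perturbed: prediction with self-attention maps replaced by identity
    scale:         PAG strength; small values for browsing the unconditional distribution
    """
    return eps + scale * (eps - eps_perturbed)


def cfg_guidance(eps_uncond: np.ndarray, eps_cond: np.ndarray, cfg: float) -> np.ndarray:
    """Classifier-free guidance, for comparison: move toward the prompt-conditioned prediction."""
    return eps_uncond + cfg * (eps_cond - eps_uncond)
```

The two have the same shape: both extrapolate from one prediction away from another. PAG's "other" prediction needs no prompt, which is what makes the unconditional browsing test possible.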
have you guys felt sd3.5m responds better to word spaghetti or with actual sentences?
depends on if you're talking to the t5xxl encoder or you take it out of action and just talk to clip_l and clip_g
thanks
cutting out T5 might be good for low VRAM people tbh
It's what SD3 was designed for
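For context on what cutting out T5 means in practice: per the SD3 paper, the two CLIP encoders' per-token outputs are concatenated channel-wise (768 + 1280 = 2048), zero-padded to T5's 4096 width, and then joined with the T5 tokens along the sequence axis. A numpy shape sketch; filling the T5 slot with zeros when it is dropped is an assumption about how common front-ends handle it, and the T5 token count (77 here, 256 for medium per the earlier message) is left as a parameter. The function name is hypothetical.

```python
import numpy as np

JOINT_DIM = 4096  # T5-XXL hidden size; CLIP features are padded up to this


def sd3_context(clip_l: np.ndarray, clip_g: np.ndarray, t5=None, t5_tokens: int = 77) -> np.ndarray:
    """Assemble an SD3-style text context of shape (77 + t5_tokens, 4096).

    clip_l: (77, 768) and clip_g: (77, 1280) per-token CLIP features.
    t5:     (t5_tokens, 4096) T5 features, or None to drop T5 (zeros used instead).
    """
    clip = np.concatenate([clip_l, clip_g], axis=-1)                  # (77, 2048)
    clip = np.pad(clip, ((0, 0), (0, JOINT_DIM - clip.shape[-1])))   # zero-pad to 4096
    if t5 is None:
        t5 = np.zeros((t5_tokens, JOINT_DIM), dtype=clip.dtype)
    return np.concatenate([clip, t5], axis=0)                        # join along tokens


rng = np.random.default_rng(0)
clip_l = rng.standard_normal((77, 768))
clip_g = rng.standard_normal((77, 1280))
ctx = sd3_context(clip_l, clip_g)
```

Without T5, a large share of the context is zeros, which fits the observation that prompt following degrades in harder cases.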
I noticed sd3.5m degrades at resolutions above 1024px if you don't have T5 encoder but fine at 1024
yea there's probably all sorts of degradations
some people don't seem to want to use big cloud GPU so it is what it is
stuff like NF4 isn't costless either sadly
but it can be fine in ram
yeah putting encoders into ram after use is better in my opinion
i'm on an rtx3060 with 32gb ram and can run sd3.5l and T5 both at fp16 with the same speed as versions quantized to hell
I think an even better option would be encoding embeds in advance
but that would probably not be popular
is this possible in comfy? I would like this to make grid comparisons
not sure, but it wouldn't be a very difficult node to make if it doesn't exist already
will check node manager later
ipndm_v+linear_quadratic
bite your tongue
lol
another option would be using a cloud embedding service
Diffusers is actually set up with the embedding library entirely separate
what you could do is cache embeddings within one session
1. load text encoders 2. encode like 200 prompts 3. unload text encoders
instead people often load and unload everything for each new image
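The load, encode-everything, unload pattern can be sketched like this. `load_text_encoders` and `unload_text_encoders` are hypothetical stand-ins for whatever your stack provides (the toy encoder just returns two numbers per prompt) so the caching logic runs on its own; a real version would hold CLIP-L, CLIP-G and T5 on the GPU and free them in step 3.

```python
from typing import Callable, Dict, List, Sequence

Embedding = List[float]
Encoder = Callable[[str], Embedding]


def load_text_encoders() -> Encoder:
    # Hypothetical stand-in for loading the text encoders onto the GPU.
    # Returns a toy "embedding": [character count, space count].
    return lambda prompt: [float(len(prompt)), float(prompt.count(" "))]


def unload_text_encoders(encoder: Encoder) -> None:
    # Hypothetical stand-in: a real version would move the encoders off the
    # GPU and free their memory. The toy encoder holds nothing to release.
    pass


def encode_all(prompts: Sequence[str]) -> Dict[str, Embedding]:
    encoder = load_text_encoders()        # 1. load the text encoders once
    cache: Dict[str, Embedding] = {}
    for prompt in prompts:                # 2. encode every unique prompt
        if prompt not in cache:
            cache[prompt] = encoder(prompt)
    unload_text_encoders(encoder)         # 3. unload before the denoising loop
    return cache


if __name__ == "__main__":
    embeds = encode_all(["a lighthouse in rain", "a crab", "a lighthouse in rain"])
    print(len(embeds))
```

The diffusion model then only ever sees the cached embeddings, so the encoders and the transformer never need to share VRAM, and repeated prompts (e.g. for grid comparisons) are encoded once.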
Flux RF Inversion
looks good, can't see any borders or seams
that sounds like a business idea no one's had yet - you should run with this
don't look now, but she's growing an owl
sadly I found flux needed a weirdly high number of steps to finalise image
sometimes 60 and sometimes even 100
its a very expensive model
i like to use dpm_adaptive sometimes and ya steps can be pretty broad, i recall from 33-66, SD3 seems similar
BL4C LoRA
dpm_adaptive is really awesome yeah
this year I used TCD Sampler for like 95% of my images
there is something slightly better than TCD out, but only in Diffusers
Where do we see your images? You're 'very coy' on here!!! 🥳
Kubrick LoRA
Kubrick is awesome
cagliostrolab (Creators of Animagine XL) plan on developing a new big anime model on SD3.5 (either large or medium no one knows yet). SD3.5 has a great future ahead of it https://cagliostrolab.net/posts/dev-notes-001-future-plans-and-beyond
Cagliostro Research Lab
I think I remember seeing Animagine XL on civit
Animagine XL was one of the biggest anime models for SDXL when it first came out
ah okay nice
I read that SD 1.5 is better for anime
but not sure if that has changed
Illustrious XL is pretty good at anime, in my opinion better than any 1.5 model
okay I see
curious if the Illustrious team will jump on sd3.5, they could actually take the lead from Pony; however, Astralite has a very sophisticated system and is probably already training
auraflow has the best prompt adherence by a fairly long way
this is so cool, so many smart people doing smart things
all I can do is be excited
yeaaa agree, that is why I am excited about it too
Does it though? I'd not be surprised if up to .2 it was simply trained on much better captured images (read ideogram outputs) than other models
Can anyone find me an Auraflow TensorRT at all?
The more competition the better, the more people creating finetunes on newer base models will result in better models for the user
and if I understand correctly - he will have more sfw and realistic data, so this model could actually be used for general stuff (possibly)
it's not any more
everyone and their dog does good anime now
I know a guy who won an Oscar working on a Kubrick movie (the movie was Barry Lyndon)
yes yes yes 
did you ask google?
yeah I think Auraflow v2 has best prompt adherence of anything at the moment
if there was an exception, that exception would be Ideogram V2 or the upcoming Playground V3 possibly
Torcello got sucked into a black hole
my favourite model by far is still midjourney, I don't use it though
not as deep a black hole as if he'd landed on https://glif.app/glifs
I am one of the few who truly appreciate the underused Auraflow
This'll let me make an Auraflow TRT?
no, but it'll let you make AI memes - and other things
it's way too much fun
Like Steamed Jam Roly Poly?! 🥳
their project to train an LLM to make comfy workflows was cool
Can confirm Glif is indeed fun
and you can remix other people's glifs to make your own version
Auraflow .3 was such a let down, before that it was magic, sadly it seems abandoned.
do you know about Aurum
blend of .2 and .3
I totally agree though
.3 lost the prompt adherence magic
read about it 2 weeks ago, should try it
so much to try and so little time and compute 🤣
when you live on the cutting edge, sometimes you get sliced
lol
the SimpleTuner dev was saying on reddit that Auraflow doesn't train well
not sure
my viewpoint is autoregressive models will quickly overtake diffusion models in prompt adherence anyway
(but be much slower, more expensive and lower image quality)
There's really a whole lot to win on prompt adherence
I had this bright idea to try sd3.5 by having gemini read an entire book and create 25 scenes from it
it was all too complicated for poor sd35 (and flux too), turns out i've gotten really good at writing prompts current image ai's somewhat kind of can work with 🤡
I never really learnt prompt engineering but yeah there is a lot of skill to it
it's not at all too complicated for sd3.5 - however you need to talk to the computer in a way it understands. and rambling on with a lot of text that is meaningless for anything but noise is not the way to get much of anything.
and using gemini just makes it worse. go to meta.ai - tell it about your scenes and then ask it specifically to craft prompts for stable diffusion 3
wanna learn?
prompt "engineering" is just tossing word spaghetti at it until you figure out what the model likes and dislikes. then using that to submit it to your will.
"I am a prompt wrangler!!!"
I'd think knowing how it could respond is the engineering part
surprised we don't have small LLM and Image model pairs as a normal thing yet.
yeah its weird
this is something open ai really got right
Problem was more the kind of scenes, often multiple people (something simple like a man at the counter of a bank, while a woman sits in the waiting room watching around) or even seemingly normal scenes (two people in a car: understanding the inside of a car turned out to be hard, as did the road seen from the windows and placing the steering wheel). There are so many edge cases in seemingly mundane scenes that AIs still struggle with. Often I just prompt for 1 subject; I rarely try interaction as I know it's hard, but when you try to create "real life" scenes just by prompting, it's not that easy yet.
meta does it with their AI - sometimes it works, sometimes it fails spectacularly
not that hard if 1. you use the right model and 2. you construct the prompt correctly. and the prompt you give the AI frequently doesn't look at all like something you'd write in a book or short story, or say to a human
this "a woman sits in the waiting room watching around" wouldn't even tell most humans what she's actually doing. what does 'watching around' mean?
Making a Dynamic TensorRT for Auraflow3
She's not watching a square?! π₯³
or a pickle
that's the biggest issue most people run into - you have to talk to the AI in clear, concise terms - but you are talking to a computer. you must think like it does and talk to it like it thinks. NOT like you think or you talk to a human
it gets rough cos I mostly use highly distilled models for only 2-4 steps with CFG 1
you only get like 9 tokens (less than 9 words) that the model will attend to
Oops Auraflow TensorRT operation c r a s h e d - compilation error in backend
ok is it me or does 3.5m really really love like 3/4 shots
Of Jaegermeister?
walked into that one
🥳
I've just realized that cfg on 3.5m has a crazy amount of control over the gens.
how about that?
4, 7, 13: 13 totally cooks it, but it's crazy how much the jump from 4 to 7 shifts it from real to a more "anime" style
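The cfg knob acts through the standard classifier-free guidance combine step, sketched below on plain lists (`cfg_combine` is an illustrative name, not a ComfyUI function). At a scale of 1 the result is exactly the conditional prediction, so the unconditional pass contributes nothing, which matches the 2-4 step, CFG 1 setups mentioned above. Each step up multiplies the gap between the prompted and unprompted predictions, which is consistent with 13 overcooking the image.

```python
def cfg_combine(eps_uncond: list[float], eps_cond: list[float], cfg_scale: float) -> list[float]:
    # Classifier-free guidance: start from the unconditional (or negative-prompt)
    # prediction and move cfg_scale times along the direction toward the
    # prompt-conditioned prediction.
    return [u + cfg_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


if __name__ == "__main__":
    uncond, cond = [0.10, -0.20], [0.30, 0.00]
    for scale in (1.0, 4.0, 7.0, 13.0):
        print(scale, cfg_combine(uncond, cond, scale))
```

Because the overshoot grows linearly with the scale, moving from 4 to 7 nearly doubles how far each step is pushed past the conditional prediction, which is one plausible reason the style shifts so visibly.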
you broke it then. turn CFG off and then you'll see what flux really is
LOL
seriously. there's a reason it doesn't use CFG. set CFG to 0 and then prompt it
so flux is loosey-goosey ur saying?
if everyone followed "the rules" we would never have anything new
nope. but if you turn on cfg, you're not going to get out of the model what it's designed to do
people turn it on because they HAVE to HAVE their negative prompt fix.
but flux isn't designed to use CFG OR use negative prompts
you can possibly get MORE than what its designed to do, crazy
if everyone let the air out of their tires and drove on flats, they'd have very interesting journeys
nope. you just break it
stay in the cave my friend
i program this stuff, friend. do you?
Still no OmniGen in Comfy... Hopefully this weekend...
show me
pats you on the head nope. not falling for that trap
yup can't expose the lies, ya sure you learn stuff, that's clear, but when you don't know you BS for some like internet points in a passive/aggressive way, it's pretty clear
you try this all the time, you realize that? you get a 'no' answer and then you try the manipulation tactics. not going to work.
no, more lies, do what you need for yourself
you dont get to be the most blocked user out of nowhere
you do realize i do not care what names you call me? or what other negative manipulation, bullying, tactics you want to try. maybe that works on your friends and family, but here it only makes you look like a fool
i certainly didnt ask if you cared and none of that happened regardless
call dibs on the Reese's peanut butter
no way how did you know
how about an image contest on this theme?
I was wondering why 3.5m seems like a really great base model, and have been having fun with it. Decided to test its word-that-shall-not-be-said (starts with N and ends in W) capabilities, and gosh darn it can you push it far. No wonder: it all makes sense that when you don't purposely handicap, sandbag, and censor something, it starts working correctly. Thanks SAI. ❣️
you could copy paste the inference script into a node template and use it now in comfy if you want
it uses the Hugging Face transformers and diffusers libs
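For anyone taking that route: ComfyUI custom nodes are plain Python classes that declare inputs in an `INPUT_TYPES` classmethod, outputs in `RETURN_TYPES`, the method to call in `FUNCTION`, and are registered through `NODE_CLASS_MAPPINGS`. The skeleton below assumes a hypothetical `run_inference` function standing in for the copied OmniGen inference script; it returns a string so it runs without ComfyUI or a GPU.

```python
def run_inference(prompt: str, steps: int, seed: int) -> str:
    # Hypothetical placeholder: call the copied transformers/diffusers script here.
    return f"{prompt} | steps={steps} seed={seed}"


class ScriptWrapperNode:
    @classmethod
    def INPUT_TYPES(cls):
        # Widget definitions ComfyUI shows on the node.
        return {
            "required": {
                "prompt": ("STRING", {"multiline": True, "default": ""}),
                "steps": ("INT", {"default": 50, "min": 1, "max": 200}),
                "seed": ("INT", {"default": 0, "min": 0, "max": 0xFFFFFFFFFFFFFFFF}),
            }
        }

    RETURN_TYPES = ("STRING",)
    FUNCTION = "run"                 # name of the method ComfyUI calls
    CATEGORY = "custom/wrappers"     # where the node appears in the menu

    def run(self, prompt: str, steps: int, seed: int):
        # ComfyUI expects a tuple with one entry per RETURN_TYPES item.
        return (run_inference(prompt, steps, seed),)


NODE_CLASS_MAPPINGS = {"ScriptWrapperNode": ScriptWrapperNode}
NODE_DISPLAY_NAME_MAPPINGS = {"ScriptWrapperNode": "Script Wrapper (example)"}
```

A real image-producing node would declare `("IMAGE",)` and return a float tensor laid out as batch, height, width, channels with values in 0 to 1, which is ComfyUI's IMAGE convention. The module goes under `ComfyUI/custom_nodes/`, exposing `NODE_CLASS_MAPPINGS` from its `__init__.py` if it is a package.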
same prompt also gave this, such an interesting style for sd3.5l to do out of the box
try this prompt: melting hyperdetailed digital art, dripping stunning cosmic belle; drips a vision of heavenly beauty
imma feed that to my LLM too tho
as interpreted by my LLM setup
the llm is going to be very confused ;)
very nice :)
The digital masterpiece showcases a cosmic beauty, a stunning vision of ethereal allure. Her skin, a canvas of iridescent hues, melts and drips like celestial paint, revealing a complex network of shimmering galaxies and nebulas. Long, flowing hair, composed of cascading stars, frames her serene face, where eyes, like twin portals, reflect the vast expanse. The figure gracefully poses, allowing the cosmic substance to drip from her form, creating a captivating contrast between heavenly beauty and the raw, visceral nature of the melting effect. The image exudes a sense of otherworldly tranquility, inviting viewers to immerse themselves in this captivating, hyper-detailed digital creation.
yeah, no.
red panda is from recraft. go to their site and use it, see what you think
I'm not sure what you mean by no since that's what the article says.
i know that's what it says. it's wrong. go to the recraft website and use the actual red panda generator. it's not that good
And I was just sharing but I actually have no opinion about it since I haven't used it. I plan to because of course I'm curious. But I'll be honest I'm very happy with both flux and stable diffusion 3.5 L
they heavily cherry picked the images that got voted on
it got the highest ranking in the elo leaderboard though
and that's a blind test
the link to their site is in that post on twitter. go play with it and see what you think
its on Fal apparently
nice drips
it's not bad; it seems more like a bunch of finetunes and nice gimmicks: SVG (the flat outputs really are very good) and color palettes. The sad part is SAI gets an article "SD3.5 can do woman lying in grass" with only a picture of the horror woman, while this model gets an article with nice images... SAI really needs to send better press kits out
red panda, to me, feels like they tried to make a couple flux loras, didn't do them well, and are trying to carve out some of the pie for themselves - get users to use their website to gen with.
press kits would have been ignored. the media is fickle - and they either go for the sensational "look, mysterious company!" or what the reader wants
20B is a lot, Flux is 12B for comparison, so panda is a very chonky transformer
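Rough arithmetic behind "chonky": weight memory is parameter count times bytes per parameter. The sketch below uses an assumed table of typical dtype sizes and ignores activations, text encoders, the VAE and the per-block scale overhead of formats like NF4, so its numbers are lower bounds, not measured requirements.

```python
# Assumed typical storage sizes; NF4 ignores its per-block scale overhead.
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "fp16": 2.0, "fp8": 1.0, "int8": 1.0, "nf4": 0.5}


def weight_gib(params_billion: float, dtype: str) -> float:
    # Weights only: parameters x bytes per parameter, converted to GiB.
    return params_billion * 1e9 * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    for name, size in (("Flux", 12), ("Red Panda", 20)):
        for dtype in ("bf16", "fp8", "nf4"):
            print(f"{name} {size}B {dtype}: {weight_gib(size, dtype):.1f} GiB")
```

At bf16 that is about 22.4 GiB for 12B parameters versus about 37.3 GiB for 20B, before anything else is loaded.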
it does hands and text very well
and has strong blur effect abilities like Flux Pro
the aesthetic fine tune seems slightly off to me
I tried using a custom node https://github.com/AIFSH/OmniGen-ComfyUI that uses the diffusers, and there's some library version compatibility error. Definitely not that simple.
@dusky thistle has a new project ;)
there is a Leonardo model that is strong also, apparently
although that was before this summer so maybe it hasn't kept up
companies can't just launch a 2B UNet any more and compete
they haven't published any updates for a couple months, but their last model release is very good
Well petapixel isn't exactly AI friendly. They aren't anti AI per se but I would hardly call them favorable. And the same goes for the majority of the readers if you look at the feedback their articles get.
As a photographer it's a very good site but they have their biases in some things
if we wait a few months there will be papers that benchmark it
there's finally papers that talk about flux and ideogram
Ideogram 2.0?
ye I used to do photography and read petapixel
yeah I think I saw Ideogram 2.0 in a paper
ah yeah I found it
the playground V3 paper has Ideogram 2.0 in the comparisons
https://arxiv.org/abs/2409.10695
got flux in the paper too
Never heard of playground
they did a model that was not great called Playground v2.5
it was really overfit, came out around SDXL time
but their new one looks competitive
Flux hasn't been benchmarking that well, I think it might be the slightly overfit aesthetic that is harming it in benchmarks
no papers on SD 3.5 yet though
the glow effect is rly good
we talked about this. SD3.5 is SD3. we just fixed the issues. and the SD3 paper was written a long time back
what else would you want in a paper other than what's already written?
useful tool https://sd-tokenizer.rocker.boo/
Informs you about how your prompt/words gets turned into tokens, privately. For Stable Diffusion models, CLIP models
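Why token counts matter: the CLIP text encoders have a 77-token context, two of which are the start-of-text and end-of-text tokens, leaving 75 for the prompt. Many frontends handle longer prompts by splitting them into 75-token windows and encoding each separately (treat that as an assumption about your particular UI). The sketch below uses whitespace splitting as a stand-in for CLIP's byte-pair tokenizer, so real counts will be higher; the function names are hypothetical.

```python
CLIP_CONTEXT = 77                   # CLIP text encoder context length in tokens
CONTENT_TOKENS = CLIP_CONTEXT - 2   # minus the start-of-text and end-of-text tokens


def naive_tokens(prompt: str) -> list[str]:
    # Stand-in only: real CLIP uses byte-pair encoding, so one word can
    # become several tokens. Here commas count as separate tokens.
    return prompt.replace(",", " , ").split()


def chunk_tokens(tokens: list[str], size: int = CONTENT_TOKENS) -> list[list[str]]:
    # Split into windows of at most `size` tokens; each window would be
    # wrapped with start/end tokens and encoded separately.
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


if __name__ == "__main__":
    toks = naive_tokens("intricate, geometric, snake-scale patterns in silver and gold")
    print(len(toks), [len(c) for c in chunk_tokens(toks)])
```

The tokenizer tool above shows the real BPE split, which is the number to compare against the 75-token budget.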
value of papers is mostly in testing, discussion, ablations, benchmarking etc
so what sort of data are you wanting that's not in the original paper?
i'll sit here and do those tests if you want
haha sadly you need 30,000 images to do FID for example
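For reference, FID fits a Gaussian to Inception-v3 features (the 2048-dimensional pooled layer) of real images $r$ and generated images $g$, then measures the Fréchet distance between the two Gaussians. Estimating a 2048×2048 covariance reliably is part of why sample counts like the 30,000 above are needed:

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2 \left( \Sigma_r \Sigma_g \right)^{1/2} \right)
```

Here $\mu$ and $\Sigma$ are the mean and covariance of each feature set; lower is better, and 0 means the two fitted Gaussians are identical.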
I will pay to run myself at some point
Mangled Merge Flux V1 coming soon.
there's also the case of human preference studies
which are quite expensive
there are standardised companies that do those now, the fees are fairly flat but it adds up
we just have to wait a bit more, there will be papers on SD 3.5 soon, there are a fair few papers about Flux now
human preferences are very subjective however
you might get one faster if you poke lykon
they are, but satisfying human preferences is often an objective so we can't really remove that part
there seems to be a sort of center of gravity anyway
regarding human preferences on most subjects
independent attempts at human preference optimisations often end up with kinda similar results
yeah, but they're so varied, that's an impossible task. the saying 'you can't please everyone' is the truest statement ever made
only because people are a herd animal
but ask them individually - you get cats
yeah I do mostly use stuff like FID to judge things instead
cos the human element is removed
FID has some issues though
it can also be gamed, sadly, its known what to do to subtly raise FID score
smart people don't read reviews ;)
FID is best for like
papers that made a sampler and they want to test what settings are best
so they show FID score for the different settings
for an actual new model I think you've kinda gotta take in all the benches combined, along with human pref study
cos with new models the financial incentive to game benchmarks is higher
the real red-lipped batfish
We ❤️ the customizations coming from the community with SD3.5!
Check out Clownshark Batwing’s textured oil painting styles, straight from the Stable Diffusion Discord.
You can join the Discord here: https://t.co/Brmd9dAGfr (1/3)
congrats @dusky thistle
and he promptly goes into hiding
sd3 outputs 1024x1024 by default, can I get higher?
you can try, but it likes 1 megapixel res; it's better to gen at that res and then upscale it
credit to SAI for releasing two killer models in the last week or so
I can only upgrade to get better, right?
what do you mean?
1920x1152 works pretty well for a one-shot generation with SD35M
large is a bit more limited for initial latent size
you'll gain some and lose some with coherence when going outside of the most heavily trained resolutions
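To stay near the ~1 megapixel the models are most heavily trained on while changing aspect ratio, you can solve for width and height and snap both to a multiple of 64. The multiple of 64 and the helper name are assumptions (a common convention for latent models), not SD3.5 requirements. For comparison, 1920x1152 is about 2.2 MP, which is where the coherence trade-off above kicks in.

```python
import math


def near_megapixel(aspect_w: int, aspect_h: int, target_px: int = 1024 * 1024,
                   multiple: int = 64) -> tuple[int, int]:
    # Solve w * h = target_px with w / h = aspect, then snap each side
    # to the nearest multiple (never below one multiple).
    ratio = aspect_w / aspect_h
    h = math.sqrt(target_px / ratio)
    w = h * ratio

    def snap(x: float) -> int:
        return max(multiple, int(round(x / multiple)) * multiple)

    return snap(w), snap(h)


if __name__ == "__main__":
    for aspect in ((1, 1), (16, 9), (2, 3)):
        print(aspect, near_megapixel(*aspect))
```

Generating at the snapped size and then upscaling keeps the first pass in the model's comfort zone.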
Should I generate a standard image first and then use upscaling to increase the pixels?
you should try both
thanks
there's advantages and disadvantages to both strategies
which is better depends on the subject and model so it's good to experiment with it
are there any optimal settings for your sampler you have found yet Batwing?
there are many worth exploring
try these if you want something really fast
res_3m is fantastic with paint
res_2m is more moderate... both run at euler speed
res_2s and espec res_3s are really high quality
eta = the amount of noise added, try setting that at 0, 0.25, and 0.5 and compare
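One common definition of eta, from k-diffusion's ancestral samplers, splits each step from `sigma_from` to `sigma_to` into a deterministic move down to `sigma_down` plus fresh noise of size `sigma_up`. The sketch mirrors k-diffusion's `get_ancestral_step`; RES4LYF's res_* samplers may define eta differently, so treat this only as an illustration of what "noise added" means.

```python
import math


def ancestral_step(sigma_from: float, sigma_to: float, eta: float = 1.0) -> tuple[float, float]:
    # Returns (sigma_down, sigma_up). eta = 0 is fully deterministic;
    # larger eta replaces more of the step with fresh random noise.
    if eta == 0:
        return sigma_to, 0.0
    sigma_up = min(sigma_to,
                   eta * math.sqrt(sigma_to**2 * (sigma_from**2 - sigma_to**2) / sigma_from**2))
    sigma_down = math.sqrt(sigma_to**2 - sigma_up**2)
    return sigma_down, sigma_up


if __name__ == "__main__":
    for eta in (0.0, 0.25, 0.5):
        print(eta, ancestral_step(2.0, 1.0, eta))
```

The sampler then steps the latent to `sigma_down` and adds Gaussian noise scaled by `sigma_up`; the total noise level still lands at `sigma_to` because `sigma_down**2 + sigma_up**2 == sigma_to**2`.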
many appreciations
np
https://github.com/ClownsharkBatwing/RES4LYF?tab=readme-ov-file i dropped a couple of WFs on the readme here
also leaving the WFs embedded in these
scrapes discord, steals all of the shark's workflows
one day we will have proper hands. but composition and textures are improving
put gloves on them
you can't generate in this channel. you have to use the artisan channels. start by reading the information here: #artisan-faq
chill out, dude.
Relax. It's just a meme.
My first SD 3.5 video. This is a method of improving image to image in SD3.5; I have had problems with the base workflow, so here are a few workarounds. Also a tiled upscale to get around the image size limitations of SD 3.5.
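The core of any tiled upscale is covering the big image with overlapping tiles the model can handle, running img2img on each and blending the overlaps. The tile size (1024) and overlap (128) below are hypothetical defaults, not values taken from the linked workflow, and blending is left out.

```python
def tile_boxes(width: int, height: int, tile: int = 1024,
               overlap: int = 128) -> list[tuple[int, int, int, int]]:
    # Return (left, top, right, bottom) boxes covering the image with
    # overlapping tiles; the last tile in each row/column is flush with the edge.
    if tile <= overlap:
        raise ValueError("tile must be larger than overlap")
    stride = tile - overlap

    def starts(length: int) -> list[int]:
        if length <= tile:
            return [0]
        positions = list(range(0, length - tile, stride))
        positions.append(length - tile)
        return positions

    return [(x, y, min(x + tile, width), min(y + tile, height))
            for y in starts(height) for x in starts(width)]


if __name__ == "__main__":
    for box in tile_boxes(2048, 2048):
        print(box)
```

Each box is small enough to stay near the model's preferred resolution, and the overlaps give room to feather seams.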
Workflow is here: https://drive.google.com/file/d/1OFwgvutAcTvh6oTrR6iCDfKkMmFckhxO/view?usp=sharing
clownshark delivering the goods.
Do a Friday themed one XD
maybe SD3.5M just needed some stochasticity after all
now this looks fantastic. One of the few SD3.5 images I have seen that I really like
Oh oh
Agreed, it’s what 3 should’ve been from the get-go. Lessons were learned
can’t wait for further training.
yeah, they obviously just felt compelled to release before it was ready, it really was a beta
Not going to start this all over again. I got ptsd lol. But yeah.
Some issues are it adores 3/4 shots, likes solid white or black backgrounds,
Friday Sofa.
Dor Brothers very close to the future of what movies will be like soon.
Flux RF Inversion
OmniGen in Comfy is working for me now! 🥳 I had to update my transformers library to 4.45.
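Since the fix here was a library upgrade, a custom node can check the installed version up front instead of failing deep inside a compatibility error. The sketch uses only the standard library (`packaging.version` is the more robust choice where installed); 4.45 is the version reported above, and the helper names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(v: str) -> tuple[int, ...]:
    # Keep leading numeric parts only: "4.45.2" -> (4, 45, 2), "4.46.0.dev0" -> (4, 46, 0).
    parts = []
    for piece in v.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def has_min_version(package: str, minimum: str) -> bool:
    # False if the package is missing or older than `minimum`.
    try:
        installed = version(package)
    except PackageNotFoundError:
        return False
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    print(has_min_version("transformers", "4.45"))
```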
No need for person loras anymore.
oh nice
what have you found it is good for?
that face copying ability does look strong
I also managed to speed up the node made by https://github.com/AIFSH/OmniGen-ComfyUI
So now it's as fast as the Pinokio / non-Comfy install.
I'm sure there's some way to make it much faster, but I have no idea what I'm doing. https://github.com/0X-JonMichaelGalindo/OmniGen-ComfyUI
All I did was experiment with it a week ago to see what it could do.
It's limited to photorealism and people as far as reposing behavior goes.
Now that I have it in Comfy, I'll try some more things.
okay awesome
speed ups are tricky to implement, might need to wait for support
particularly stuff like tensorrt
please does anyone have this problem where flux loras work better on civit than on their comfy?? right is my comfy, left is civit, how the lora should actually look. b4 u ask i have tried all the scheduler combos, still same issue persists.
Uploading Mangled Merge V1 Dedistilled currently. It's Mangled Merge Matrix and Magic, plus PixelWave, FluxBooru, and nyanko7's dedistilled model. The model works as a dedistilled model, so flux guidance is useless, but negative prompts and dynamic thresholding work great. It also gets the styles of PixelWave and the booru knowledge of FluxBooru, and LoRAs work fine on it too.
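Dynamic thresholding matters for dedistilled models because real CFG at high scales pushes the predicted clean sample outside its normal range. The basic idea, from Imagen, is sketched below: take a high percentile of absolute values of the predicted x0, and if it exceeds 1, clip to that value and rescale back into [-1, 1]. The ComfyUI node of that name may implement a more elaborate variant, so treat this as the concept only; the function name and the 0.995 quantile are assumptions.

```python
def dynamic_threshold(x0: list[float], quantile: float = 0.995) -> list[float]:
    # Imagen-style dynamic thresholding on a predicted clean sample x0
    # whose values should nominally lie in [-1, 1].
    if not x0:
        return []
    mags = sorted(abs(v) for v in x0)
    idx = min(len(mags) - 1, int(quantile * len(mags)))
    s = max(mags[idx], 1.0)   # never rescale values already inside [-1, 1]
    return [max(-s, min(s, v)) / s for v in x0]


if __name__ == "__main__":
    print(dynamic_threshold([0.5, -0.2]))            # untouched
    print(dynamic_threshold([3.0, -3.0, 1.5, 0.0]))  # clipped and rescaled
```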
can't wait, will it work with normal loras??
Yes. The ones I tested worked fine. You can try it out here:
https://civitai.com/models/788136/mangled-merge-flux?modelVersionId=1019621
ok thanks!
pls can u upload to Hugging Face?
I saw that there are compressed versions of OmniGen, but didn't figure out how to load these models in an OmniGen workflow
is there a specific quantization you are looking for?
fp8 and 16 thanks!
k. I'll post a link once they are done.
Uploading now. Give it some time though HF is slow with uploads and I've had them fail on me halfway through.
https://huggingface.co/ManglerFTW/Mangled_Merge_Flux_V1_Dedistilled/tree/main
Looks like they are finished uploading. Quicker than I'd expected.
I haven't seen any compressed versions of OmniGen.
thanks!
You're welcome!
hello please, the .gguf file, can it be converted to safetensors??
anyone knows if 3.5 is available for forge?
Hmm. I'm not sure how to do that. Is it possible?
i dont know lool but i'm trying to load it and it's saying it can't, probably because it's not a safetensors file
I have no idea how to load these. The custom node code is loading a safetensors file, not a pth or a pt file. I don't know if I can just drop that in and point it to the new format.
what program are you using to load it?
That is why I was asking the question
ehh Hugging Face api for comfy. i'm using a rented comfy cloud server
I haven't used it before, but maybe this might help?
how come fp16 is safetensor but fp8 is gguf file. shouldnt they both be safetensors?
bf16 is its original format, then it was quantized down from that, hence fp8 being gguf.
damn, ok thanks by the way. i'll see if i can look for a fix
You're welcome
@unkempt compass Changing the model config to int8 datatype did not change memory requirements, and changing datatype to fp8_e5m2 failed. I do not know what else to try, unless you have any suggestions.
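What "quantized down" means at its simplest: symmetric absmax int8 quantization stores each weight as a 1-byte integer plus a shared scale, so the savings come from what is actually stored, not from a dtype label. This per-tensor sketch is a generic illustration; GGUF formats such as Q8_0 use a block-wise variant with a scale per small block, and the function names here are hypothetical.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    # Symmetric absmax quantization: w ~= q * scale with q in [-127, 127].
    absmax = max((abs(w) for w in weights), default=0.0)
    scale = absmax / 127 if absmax > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q: list[int], scale: float) -> list[float]:
    # Recover approximate weights; the error per value is at most scale / 2.
    return [v * scale for v in q]


if __name__ == "__main__":
    w = [0.5, -1.0, 0.25, 0.013]
    q, s = quantize_int8(w)
    print(q, s, dequantize_int8(q, s))
```

Each value now needs 1 byte instead of the 2 bytes of bf16, at the cost of a rounding error bounded by half the scale.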