#🏞|general-with-images
is it possible to inpaint with SD3.5?
dont know i use it online
Ahoi!
I don’t understand why it’s so difficult to fix the anatomical problems; Flux doesn’t have them, and even SD 1.5 is better
I wouldn´t necessarily call it bad in case it (occasionally ?) adds an extra leg only, as long as the shape is fine. Yet from what I´ve seen here so far, as well with the black forest gump on the bench FLux/SD 3.5 comparison, SD 3.5 still doesn´t seem to be doing too well with hands and stuff
i think SD is better than flux at working with styles, and sometimes at details and colors. i'm sad that this problem exists
Early days - we'll know by Christmas as to whether 3.5L is a good SD Model; or just a large bottle of milk!!! 😄
on the other hand, you can use SD3 where there are no people or living beings🙂
Mojo
did you try it? how fast can you make one picture?
Not really fast ...
one min?
Hard to compare because I upscale my Flux pictures. Maybe 2 minutes in energy-saving mode
nice result?
I switched back to Flux ... but no complete failure ...
good you have hope😎
Cheating (Flux with locally trained LoRAs):
Let's not declare SD3.5 dead yet; let's wait for the finetunes ... base models have never been good
prior to SD3 they actually have been, well except for 2.0 🙂
But we all have been working with the finetunes ....
If I understood right the base models aren't trained 100% so we can do finetunes without overtraining it ....
Well, not really here. While I started using dreamlike photoreal 2.0 for SDXL at some point, the base model was already fine, just like with 1.4/1.5, even though finetunes naturally improved the overall output quality. The base model was a pleasant basis, while I feel that with SD3 (and apparently 3.5) and Flux that isn´t the case anymore, which is why it´s hard to produce anything outside the boring realism and 3D comic look: the training material has obviously changed a lot. With Flux, for example, it looks like they trained to quite an extent on a Chinese dataset with lots of stock-footage appearances. That is why I basically went back to generating with Cascade. This only changed after I started local training for Flux.
Also addressing artists doesn´t really work, neither in Flux nor in SD3, don´t know how it is with SD 3.5, just guessing it isn´t any different than SD3, yet who knows
I don´t know, it´s more like what they trained on and less how they trained
But think of LoRas. You can overtrain them. I think it's the same with models. So if you want the users to be able to create their finetunes you can't train it 100%
Maybe I'm writing garbage ... it's just how I understood it ....
I´m also wondering why SD3 has 3 different text encoders, which makes prompting a hassle. On top of that, the text encoders don´t even fully grasp the mile-long "natural" language prompts. Yeah, I know they can be shortened, but they still struggle then, and having to translate a prompt into that text-encoder language with an LLM feels counter-intuitive and decouples the artist from the actual prompt engineering, at least to some extent. Isn´t there a text encoder that can understand at least keywords and natural language in one go? Flux luckily "only" has 2, which is still too many if you ask me. Yes, I can combine them so the same prompt is sent to each of them, but that´s not really how they are intended to be used, and therefore, at least in my experience, it only partially works well.
because they each do different things. you should really prompt each of them separately
yes, that´s the thing I´m talking about: why have 3 different text encoders to begin with, especially in combination with that natural language/LLM approach, which favors generic results anyway? Natural language includes all types of language, including keywords; at least that´s how we function 🙂
because they all do different things. and IF you give them each prompts that work to their strengths, you get much better end results than if you just use one of them, or give them all the same prompt
think of three different humans - a mathematician, an artist, and a chef. put them all together on a project and tell them all to 'bake a cake together' and you'll get a mess. but tell each of them to do specific things for the project, and you'll get a much better result
yes, that´s why I´m asking if there isn´t a textencoder understanding a broad width of language styles
text encoders are models. you could train one but at the moment, no, there isn't
and they're just as expensive to train as base models are
the model could simply differentiate between the varying input styles and process them in their individual way of understanding.
you'd have to train one for that. there aren't any out there at this moment that are trained to do that
Then again, LLMs, at least something like ChatGPT, seem to be smart enough to understand basically any given style, or can make sense of it and sort it into its various parts. But I´m not too deep into the technicalities behind LLMs and how they could actually work as a text encoder; it´s simply a thought
Fail 😛
only moofi know secret how do this😃
@late sorrel i cant post images in the other channel but heres an example, and also i think i realized the major problem, the frames are the exact same color as the eyes so 100% the ai got confused, which is unfortunate because the other parts seem to be decent in consistency
that looks good man
ah ok, now i see what you trying to do
that indeed can be tricky
i think next time ill train it without the glasses or just make the glasses a different color or something, idk
well yea cause you can change the color of the glasses with a text prompt anyway, just learn the concept first
yeah im new to this, either way its exciting though, some generation epochs are pretty decent and not complete trash so its promising at least, just the glasses that need fixing
making a lora is like half science, half art lol, you mess with the params in a certain way until it gives you what you are looking for, there is no definite way to train. i mean there are things that could be considered objectively better than others, but overall i would say its still a dance with the params
true, no wonder stuff like controlnet and inpainting were developed, ai art is completely unpredictable and you need all those tools to make it do what you want. But its real exciting when you finally get a generation thats coherent, or at least good enough to minorly edit with inpainting, it truly is a work of art
This is what I'm trying to do, just way better
i see
It butchered my prompt. I tried to have chains as hip and boot accessories
And the person standing etc
but wait what are you using to generate? sd3.5?
I'm not using any kind of stable
I'm using a image generator on a site called "perchance"
well the first problem with online stuff is that they are very limited in workflows, not to mention if they are limited in censorship as well
There's a bunch of awesome AI generators on there made by people, for anything from fantasy towns, to plants, to minerals, etc
This site allows one to pump out 30 results for free per batch no credits used
And there are no restrictions for content
well idk what that website is using under the hood, because if you have that information, then you can technically know how to prompt it correctly along with the params
The one thing I can't do, is use an image as an asset on it
like image to image?
yes that is image to image
Oh my bad
because the guidance comes from the clip conditioning on the image, it receives info from the image. but its better to guide it with text on top anyway, not just image
Ironically enough, I can actually draw freehand, and have been for over 13 years. I primarily draw on paper and use digital stuff to make it pop more
i wish i could draw, i would make some very interesting "manga" :3
is that from the freddy game? i never played it, just seen some stuff lol
you are speaking to me as if i know all of this :3
And a colored pencil of a clicker from the last of us
i mean to be fair, i know a lot of characters, but i maybe dont know their names
for example, the danganronpa games, or however you spell that, i know the characters because i seen them, but i dont think i know any of their names lol
huh.. not sure if i know that character
That's nagatoro
where from?
Don't toy with me, Miss Nagatoro!
yep i dont know that one 😦
I plan on drawing Marin Kitagawa at some point
is that a newer anime?
"My Dress-Up Darling", it's an anime series about two unlikely friends, a male doll maker and a female cosplay, gaming, and manga geek; they make costumes and go to conventions
And do photoshoots and stuff
what year did it release?
2022? Maybe
ah ok so yea im not that familiar with the newer anime
I drew denji from chainsaw man
damn
Here's my crate of "seasonings" they let me cook
it's kinda interesting, cause i remember having fun drawing some stuff as a kid, but i never really continued that path
nice
It's a scanned paper artwork
@wispy nest 2d cell-shaded anime image depicting a woman with long flowy blonde hair, wearing bold makeup, in her casual t-shirt and denim pants, navel cutout, indoors, solid red background. dramatic perspective.
that done with sd3.5?
cause im getting this w/o artist ref: 2d cel-shaded anime, a woman in large white t-shirt, bare legs, at a beach as the sun is setting. the ocean water is deep turquoise with orange hue in the sky.
yes. workflow is in it, click the image, open in browser, right click, save as, drop into comfy
i've got no problem using an artist ref, i just tend not to need to usually
hang on, my workflow is ok, but i might be missing something with prompting
here is another example which i dont understand ... it should render anime..
a woman in large white t-shirt, bare legs, at a beach as the sun is setting. the ocean water is deep turquoise with orange hue in the sky. Anime style by Katsuhiro Otomo
change anime style by katsuhiro otomo to: by artist "Makoto Shinkai" <--- and see what you get
nice water :3
not too fussed with the anime thingy atm, im liking the overall image aesthetics
that AI just be trolling you
one thing that stands out for me with sd3.5 is that it can render full body shots lot more fluidly than flux
i've been telling people all this time that flux is not what they thought it is. sd3.5 is fluid, it's flexible, and it's almost effortless to use. and it IS trainable. and the base model really doesn't need any loras
im using the distilled version of sd3.5.. aka large turbo
supercute chibi anime woman in large white t-shirt, bare legs, at a beach as the sun is setting. the ocean water is deep turquoise with orange hue in the sky by artist "Makoto Shinkai"
i'm just using 3.5 large directly from the huggingface page
i had to struggle quite a bit with flux to render full body shots, it feels like flux focuses on the upper body only because of how it's trained
FYI - you can get 3.5 down to 8 steps and it still looks good. cfg 4, steps 8, shift 2
yeah that upper body thing is DPO
DPO likes to "zoom in" a lot and fill the frame with the subject
you're using turbo. i mean sd3.5 large, not turbo, can still go down to almost no steps
im not familiar with that artist, but just wondering, does it really know that artist style? how closely does it match?
oh
i dunno. i don't use names cause i want the style to match. i use them as concentrated data carriers
i see
this is very pleasing to see how her face came out even with full body
im not upscaling
try adding "hard rim lighting" to your prompt
i wonder if booru style tagging triggers anime output
someone gotta try booru yeah see what happens
i did
it rendered anime but need to test more
about realism it falls a bit short compared to flux but still good
being able to render in 10 seconds feels effortless
can the community train sd3.5 turbo model?
since its already distilled
would be similar to the issues training flux
training a big checkpoint like juggernaut on a distilled model has never been done before
so i take that as a no
but they could train large base model to be optimized with 4 steps while maintaining richer texture?
why would you want to. and here same prompt but changed the sampler to lcm and the scheduler to simple
i mean running large base model at 8 steps is fine too, given the model can be trained to produce better images than the default base
just train on 3.5 large and run the lora with turbo
there will be stuff like hyper coming along probably
as an option
aye
when I do SDXL or SD 1.5 I load up TCD, PCM and hyper loras
and then mix and match weights
they all do similar things
by the time he does that, he might as well not be using turbo ;)
this sd3.5 is 8 billion parameter?
i notice the model is a lot less of a resource hog than flux
feels like im rendering with sd1.5
yet the file size of this 3.5 is 16gb plus the clips
haha yeah its a lot of effort
its a technique I got from BadCat on L2 discord
yeah its 8B
anime girl at a park
`highly detailed, sharp focus, best quality, masterpiece,
anime girl at a park, smiling softly, wearing a red crop top, white shorts, black sneakers. modern anime.`
these quality prompts have impact...
best quality, highly detailed, sharp focus, vibrant texture, dynamic composition,
Aw, so cute! A person, all dressed up in a white button-up shirt with suspenders and dark pants, is standing by a teal-colored wall. They're wearing a floppy, white hat. A white door, framed by some light-colored wooden trim, is opening up to their right. The floor looks like light-colored wood. It's like a scene from a cute anime or a photoshoot, don't you think? Meow! So stylish! Nyaaa~
also not as strictly censored as flux
what park does she go to? :3
not sure, you have to find her and ask her
but to find her, i need to find the park first
do you guys know which checkpoint this is?
Not censored
Good morning coffee
I think there's a reason why there's no "Save Image" node in the example SD 3.5 Workflow ;-P
Do you guys know of a model for "upscaling" for comfyui that doesn't upscale, but just takes "paper picture scan" blurriness and sharpens it? The image is already high res, I just need to sharpen and deblur it
the 1x models from here are your best bet https://github.com/Phhofm/models
I would strongly consider upscaling and then downscaling though
There seems to be a problem or a major change in the way SD3 creates natural scenes. The first image is sd3 medium, the second image is sd35 large.
sd3 produces really natural results, whereas sd35 looks pixelated and strange. Both are the exact same standard sd35 comfyui workflows from huggingface.
this is a workflow issue, the model doesn't always do that
Aye, i can do that if results are good. As using upscales like my usual 4x ones murders my ram xD
you can tile ESRGAN style upscalers in chaiNNer
Is there a node for that for comfy? As that'd make my upscaling of other images intended for stupidly large results much better xD
it is more or less the standard workflow from the sd35 huggingface page - what would i change?
try shift = 6 and steps = 60
can lower them later if this works
doing more steps kind of works - shift doesn´t change that much - but thanks! Everything over 40 steps is kind of fine but gets definitely better with more
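For anyone wondering what the shift value being tuned here does: as far as I understand it, SD3-style flow-matching samplers remap each noise level sigma (between 0 and 1) so that a higher shift keeps the schedule at high noise for longer. A minimal sketch, assuming the commonly used formula shift * sigma / (1 + (shift - 1) * sigma); the function names are made up for illustration and this is not ComfyUI's actual code:

```python
def shift_sigma(sigma: float, shift: float) -> float:
    """Remap a noise level in [0, 1]; shift > 1 pushes it toward 1 (more noise)."""
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)


def shifted_schedule(steps: int, shift: float) -> list:
    """Linear schedule from 1.0 down toward 0.0, then shifted (hypothetical helper)."""
    linear = [1.0 - i / steps for i in range(steps)]
    return [shift_sigma(s, shift) for s in linear]


if __name__ == "__main__":
    # Compare how the same 4 steps are distributed for different shift values.
    for shift in (1.0, 2.0, 6.0):
        print(shift, [round(s, 3) for s in shifted_schedule(4, shift)])
```

With shift = 1 the schedule is unchanged, and larger values bunch the steps toward the noisy end. That would fit the observation that shift changes the look less than the step count does, but treat it as an illustration rather than an explanation of this particular workflow.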
Hey.. I am having an issue with a model .. not sure if everyone has this problem.. but the legs always look weird at the knee (photo). Is there a way to fix it?
Hello!
Could you help me ?
I need an Image... Wizard in a cold hill
stalker from hl 2
You'd need the full image for that /s
But even in full photos they turn out like that sometimes
Yeah I know I was messing with you considering the cropped image suggests a full picture that might not be shareable here.
Also, I don't have a solution for that. Anatomy is one of those things that our eyes and brain register instantly in minute details even if we're not aware of it. But those creases and the particular folding of the back knee is a prime example of something that wouldn't be explicitly trained in a model unless specified.
Also, it's a stylistic problem, since many visual art styles don't bother with exact proportions or shapes or rendering. The style you're working with is exceptionally prone to mistakes being very visible.
this is something you'd have to train a lora for specifically i'm afraid
I was about to ask a question about pretty much all my images being cursed. I have it running locally but im totally new to everything .. ive changed to recommended settings that GPT told me to set .. but all my images are weird like this one
''woman smiling slightly, laying on beach, natural sunlight, high resolution''
Model?
stay away from images that are laying down - you'll need to train a lora for those
is it possible to replace a animal like head with a more human head using ADetailer? Trying to get an anthro character but the faces/heads are way too animalistic for what I am aiming for
i think that's probably also going to require a lora
i think i just had this solved in the tech chanel thanks guys 😄
lol I was about to ask something and then i realised. WAIT... I think I routed the wrong node into this input XD
or do a huge and likely very expensive fine-tune on a large database of humanoid anatomy
just a lora, not that many images, and specifically for what you want to create images of. no reason to do a checkpoint - unless you just want to
or be specific in your prompt: a man laying on a beach towel on the sand, his hands behind his head, his dog laying across his stomach, looking up at the sky,
the issue comes when you give a very unclear prompt like "a man laying on the sand" to all three encoders, and they all have to guess what way he's laying, and what he's doing - and they all guess something else, then try to draw all of the guesses at once
'laying on the sand' - could mean laying on his back, or laying on his stomach. and it could mean with his head toward the camera, or looking at him from the side or looking at him from his feet or looking at him from straight above him. and if they all guess something different, and then try to draw all the guesses at the same time, for one image, you get odd results
but tell it SPECIFICALLY what he's doing and how - or prompt each encoder separately, and you don't get weird results
IF you are using a workflow that just allows you to put in a single positive prompt, then you are giving the same prompt to all 3 encoders, you have no choice. if you are using comfy, you can use the triple encoder node and then prompt each of them individually
I'm a different person ;) and yeah if you want to make general images of humans in general poses that aren't all kinds of messed up, it will likely require a several-thousand-dollar fine-tune on humanoids and humans for it to properly learn what it needs
(and ofc make it catastrophically forget things like what a bank is and stuff like that)
it knows the information just fine. the issue is specifically the encoders and the farther you move the subject away from standing straight up in front of the camera, the more out of sync they get
and THIS is why you don't do a check point. you do a LoRA - that's what they are for. and you do just specifically what you, personally, want to make
what model?
3.5?
SD3
ah
yes
🤔 I'd have to strongly disagree there. It knows some humans in some positions, and it horrendously deforms anything that isn't in its learned dataset of what humans are capable of
sd3-2b-medium, sd3.5 - and flux too except it's stuffed so full of images to mask the effect you don't need to
since ANN models are "rough models of the universe", how rough comes down to how much data it has on-hand during training to align it to what is actually the truth (or what we want it to do)
anything that uses the architecture
i just got 3.5, any sampling recommendations??
grab this, it's got my comfy workflow in it, take a look at my settings. to get to my actual uploaded image with the workflow you have to 1. click the image to open in the discord viewer. then click the words open in browser. then right click on that image, save as, then drag and drop into comfy
I guess the whole "using diffusion models instead of transformer" kind of defeats the purpose of being able to output accurate depictions of things that it hasn't seen before e.e
lol i know how comfy workflows work
it has NOTHING to do with that
ill take a look
sure, but the issue is that the image you see here on discord isn't the actual image, and most people don't realize you can't just right click the image you see here and download it
barely have enough VRAM to generate a 512x512 image
ah i know, discord packages it differently in the app you need to open the original
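To check whether a downloaded PNG still carries a workflow, you can look at its text chunks. A rough stdlib-only sketch, assuming the workflow JSON sits in a plain tEXt chunk under a keyword like "workflow" (that keyword and the fake test file below are assumptions for illustration, so verify against a real image saved from comfy):

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_text_chunks(data: bytes) -> dict:
    """Return {keyword: text} for every tEXt chunk in raw PNG bytes."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    found = {}
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        pos += 12 + length  # 4 length + 4 type + body + 4 CRC
        if chunk_type == b"tEXt":
            keyword, _, text = body.partition(b"\x00")
            found[keyword.decode("latin-1")] = text.decode("latin-1")
        elif chunk_type == b"IEND":
            break
    return found


def make_chunk(chunk_type: bytes, body: bytes) -> bytes:
    """Build one PNG chunk (used here only to fake a tiny test file)."""
    crc = zlib.crc32(chunk_type + body) & 0xFFFFFFFF
    return struct.pack(">I", len(body)) + chunk_type + body + struct.pack(">I", crc)


# Fake file: signature + one tEXt chunk + IEND (no pixel data, parsing only).
fake_png = (
    PNG_SIGNATURE
    + make_chunk(b"tEXt", b"workflow\x00" + b'{"nodes": []}')
    + make_chunk(b"IEND", b"")
)
print(read_png_text_chunks(fake_png))
```

On a real file you would pass in the bytes read from disk. An empty result would suggest the copy you saved is a re-encoded preview rather than the original upload.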
https://blog.comfy.org/sd3-5-comfyui/ you might want to use comfy's version then, it's a lot smaller
Following our exciting V1 launch yesterday, we're excited to share that Stable Diffusion 3.5 is now supported in ComfyUI for local inference. Experience it with our signature node-based workflows!
Just now, Stability AI released Stable Diffusion 3.5, including 3 powerful models:
- Stable Diffusion 3.5 Large: With 8
whats with the 3 prompts?
you have 3 encoders. prompt each of them separately for best effect
i do not know how that works
if you remove the text that's there, you'll see which field goes to which encoder
give t5XXL your long, narrative, detail rich prompt. give clip_l your artsy, background, ambient, fine details, style stuff. give clip_G the black and white, no frills text
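If you script this outside comfy, the same split can be kept as three separate strings. A small sketch, assuming the keyword names `prompt`, `prompt_2` and `prompt_3` map to CLIP-L, CLIP-G and T5-XXL in that order (that is my understanding of the diffusers SD3 pipeline convention, so check its docs before relying on it); `TriplePrompt` is a made-up helper, not part of any library:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TriplePrompt:
    """One prompt per SD3 text encoder (hypothetical helper)."""
    clip_l: str   # artsy, ambient, fine details, style
    clip_g: str   # plain, no-frills description
    t5xxl: str    # long, narrative, detail-rich text

    def as_pipeline_kwargs(self) -> dict:
        # Assumed mapping: prompt -> CLIP-L, prompt_2 -> CLIP-G, prompt_3 -> T5-XXL.
        return {
            "prompt": self.clip_l,
            "prompt_2": self.clip_g,
            "prompt_3": self.t5xxl,
        }


beach = TriplePrompt(
    clip_l="warm sunset light, soft film grain, candid photo",
    clip_g="a man lying on a beach towel with a dog",
    t5xxl=(
        "a man laying on a beach towel on the sand, his hands behind his head, "
        "his dog laying across his stomach, looking up at the sky"
    ),
)
print(beach.as_pipeline_kwargs())
```

Keeping the three strings apart also makes it easy to test what happens when one encoder gets a different or empty prompt.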
im new to comfy i dont exactly know how clips work
ah i get it
thanks
the encoders are what read your prompt - they're also models
you CAN give all of them the same prompt, but then they will battle each other, and they will each understand it differently. so prompting them separately works much better
what about leaving one or two empty
then you are asking the AI to just generate random data and feed that to the encoder. try it and see
oh god is 40 steps slow ill drop it down to 30
not slow on my machine but change the batch to 1 instead of 10, which is where i have it
i use steps between 32 and 40 usually
technology is leaving my 3080 behind quite rapidly
you should be fine, just set the batch to 1 and expect it to take a bit the first time you run it as it loads everything into vram
im using the default sd3.5 workflow set to 512x512 batch size of 1
also using the official sd3.5 large
any other explanation for the terrible artifacting that these models end up creating?
should be good. i rarely use a size larger than 512x512 because i don't need anything bigger and don't want to use the drive space. i'll upscale later with topaz if i need something bigger
i don't get artifacting. i would look at your personal settings
they work in mostly spatial domain, converting to pure latent information + physical simulation would end up with better results
You're claiming sd 3.5 will never merge a hand into a knee? 🤔
i dont even save the images i generate, i do this for fun
i'm starting to wonder if your reason for being here is to complain about 3.5 and try to start trouble
im tempted to set it to 200 steps and go eat dinner lol to see what happens
or 30 steps but a batch of like 30
you'll massively overcook it. i wouldn't do that, personally
trying to have an interesting discussion about how these models could be improved ...? You claimed you don't ever get artifacts, which... I suspect you don't know what I mean when I say artifacts 🤷
i don't get artifacts, and i do know what you're saying. but i've spent time to learn how it thinks and how to prompt it correctly.
it's not using unet, you can't talk to it like you would, say, sdxl
by artifacts, I mean "hands merging into knees" "extra fingers", or really any example of bad anatomy.
you mean prompting is basically: subject, attribute, attribute, etc?
so if you're saying you don't get that at all, it's because you're not asking it for things that may result in that 🤷
Fair point about how it's being prompted, It would be cool if stability released their VLM or a bunch of examples of dataset captioning
i don't get those issues, because i know how to prompt it so those don't happen
you could contact them and ask about that
for stable diffusion, the order in which things come in the prompt does matter. the closer to the start, the more weight the terms have. so how you structure your prompt is based on what you want it to focus on
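If you want to play with ordering instead of weight syntax, you can keep your terms with a priority and let a tiny script put the important ones first. A sketch only: `order_prompt` and the priority numbers are invented for illustration, and whether earlier terms really get more weight in a given model is the claim from the chat, not something this code proves:

```python
def order_prompt(weighted_terms):
    """Join (term, priority) pairs into one prompt, highest priority first.

    Terms with equal priority keep the order they were given in.
    """
    ranked = sorted(weighted_terms, key=lambda pair: pair[1], reverse=True)
    return ", ".join(term for term, _ in ranked)


terms = [
    ("solid red background", 1),
    ("2d cel-shaded anime", 3),
    ("a woman with long blonde hair", 2),
]
print(order_prompt(terms))
```

Changing one number and regenerating with the same seed is a cheap way to see how much the order matters for your model.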
I would say it's likely also the fact that you're not interested in getting very specific things out of the model, such as specific-looking humans in specific poses that aren't standing upright :D
you're welcome to think what you like
ah i knew the first part, but im too new to understand the differences between the 2 styles, what works best in what situation etc etc
show me tiptoe kneeling in sand, short black hair, both hands on the ground in front of the character with a side-view perspective 🤷 (then observe the feet and hands turn into abstract horrors)
there is no such thing as tiptoe kneeling. use a descriptive phrase that actually describes what the person is doing
when you use phrases that are unclear, and made up, and are hard for humans to understand, there's little chance the AI is going to have a clue what it is you want it to draw
and when that happens, and the encoders all guess something different - and they are all incorrect, almost guaranteed - you get a mess
"I'm too smart to understand what you're talking about" surely you understand how that looks?
you are ASKING for a mess with that phrase
also that's not my "prompt" smh
i don't. it could mean the person is wearing toeshoes, standing on the tips of their toes, with their knees on the dirt. THAT is what i would interpret 'tiptoe kneeling' to mean. i'm pretty sure someone else would picture something else.
and i'm a human with years of experience, not an AI with a very small world view
~*~aesthetic~*~ #boho #fashion, full-body 20-something woman kneeling in sand, wearing shorts and a t-shirt, candid pose
"no artifacts"
"20-something woman" is from SAI's provided prompt btw
this is through replicate API with no changes to default settings, 40 steps, cfg 4.5 etc
prompt: a blond woman sitting on the grass. she is sitting cross-legged. she is holding a sign in her right hand that says "SD 3.5". her hair is blowing in her eyes. she is smiling. She holding a rose in her left hand. it is summer. the setting is a park
"no artifacts"
not what i'd consider "tiptoe kneeling" to be. that's just kneeling in the sand
yeah I shudder to think what it would look like if I got specific like that :D
nope, no artifacts. that's the AI trying to draw more than one hand or more than one foot at a time. that's an issue with the encoders
and that, again, is what a LoRA is for, if you really want to create images where that's a problem
🤔 interesting, so you're saying the model wasn't trained properly enough to take advantage of what the autoencoder has to offer (with 16 channels per pixel?!)
not an entirely new checkpoint
i'm also very tired of you trying to put words in my mouth or twist what i've said, so i'm done talking to you
ultimately just looks like SD 3.5 can't do hands and feet, creating artifacts instead. Whether this is an issue with the autoencoder or the diffusion part of the model doesn't matter e.e creating a custom lora for every single specific use-case is moderately silly and defeats the purpose of a general-purpose and powerful model in the first place
it may make sense for "style" but even then, these models in the past have shown great ability to easily switch between photograph and cartoon style without needing to load model offsets/loras/etc
you'd be way better off having a fine-tune specifically for creating high quality human anatomy across a diverse range of styles, poses, content etc. Using LoRAs for poses is extremely limiting, having a smarter model would be far better
In any case, I'm looking forward to seeing if this thing trains as well as SDXL does. I've seen some really impressive and diverse capabilities in the much-smaller model
case in point; it makes sense to use lora for style, but not for small anatomy features
FoFr just updated his repo "I've updated any-comfyui-workflow to have access to SD3.5L regular, fp8 and turbo weights." https://github.com/fofr/cog-comfyui/blob/main/CHANGELOG.md
my 3090ti has no issues now running it locally with fp16 >:D
ima probably set this one aside until a good community fine-tune pops up and wait for auto1111 repo to get updated so I can spend a day merging all of my modifications onto it x_x
can someone help me edit the syringe that has been generated to a pendant or a necklace?
Yes, it's much, much better than sd1.5 in anatomy, I would say sdxl level? However, it's clearly not nearly as good as Flux.
the word anatomy should not be used with SD3 :3
Yeah sd3 anatomy was well, lets say not too great. Sd3.5 on the other hand does seem pretty decent at anatomy but not flux level.
my 1.5 model showed better anatomy i think
I hope this gets fixed
although I would be happy to run even this locally, because I have 4 GB vram
Maybe idk, but it’s pretty decent at humans so far. This is with the turbo model 8 step, 0 guidance scale.
sometimes is ok
3.5 large)
Prompt? Let me test with turbo
cute kitten sitting near the computer holding a sign give me anatomy, street art
Turbo model, same setting
sd 3 medium
turbo and medium better)
now what can flux
flux lose
but there more style
and now sd 1.5 lmao
Anyone know if there's a model that can make images like this? Took a lot of effort with midjourney blender, was hoping there was something trained on psychedelic stuff.
What cfg are you using? the original workflow that was put out had a cfg of 4.5. By going up to 5.5 the vast majority of messed up hands went away.
It's not on flux level, but it's way better than at 4.5.
interesting!
It's technically even better above 5.5, but then the image starts to get burned out
remember to adjust shift as well as cfg
Cat
done, take the cat above
Wizard in a cold hill
It's a bit of a moot point as no Unet or DiT model is anywhere close to taking advantage of what the 16 channel VAEs can do.
If you VAE encode and decode a high-res photo you will pretty much get the same image back, the VAEs are much more powerful models than the Unets and DiTs
you don't realize that maeve is just here to cause arguments?
ah okay, thanks for the heads up
I'm getting very poor output using default SD3.5L workflow
Hope its not my GPU?!
Or is it my prompt at all?
if your gpu was the problem you'd get nothing
I'm seeing that it is actually the prompt doing this ...
(Could you d/load my PNG, run the w/f, and see if it does the same?!)
🙂
This prompt "ruins" SD3.5L
that looks like your sampler
Really? That output doesn't look prompt related.
that is the world's ugliest prompt. why on earth are you using midjourney weighting with sd 3.5?
It works
it doesn't. 3.5 can't use those weight commands, that's midjourney specific and not in the model, either
But it also throws SD3.5L "a loop!"
yeah. all of that stuff is just random noise to it
I got the weighting from Portrait Master - it works fine on my PC
you know, they do make bathroom cleaning supplies...
i've found the best way to weight components of prompts is to change the order
But I can see that it troubles SD3.5L
or if you're desperate, to repeat stuff in different ways
i promise you, 3.5 has no idea what any of that means. you're just doing a lot of noise injection
sure. cause flux is basically a massive lora. it's frozen. it will do what it wants, no matter what you do
Having said that 3.5 has no idea - every so often the prompt works perfectly
Where's my Leopard?!
Let me see if I can get a perfect example at the same settings ...
what is a northern irish snake?
it depends on how the Clip text encode node is coded
it apparently got turned into clothing
It affects clothing using terms like snake, alligator, wolf, fox, peacock etc
i think it tried to eat her pasta, she skinned it, and is wearing it
This is odd...I have a wildcard with animals in it, and the prompt that came up replaced the wildcard with "Leopard print". There is only "Leopard" in the list of animals?!
well? that's what leopard print is - that's clothes
I know, but you're missing the point
Can anybody reproduce my output using the w/f from the PNG? svp
you're gonna have to ask your llm why it replaced it with that
It didn't
you don't have an llm that's reading that list and then writing out the prompts?
You can see the replacement in the first screen shot. That's before it goes to any LLM
what is it that's doing the replacing?
It's done it again with Zebra!
I always had trouble getting wildcard nodes to work
Haven't you used wildcard lists? I have a text file, called "animal" with a list of animals in it. That node picks one from the text list and replaces the __animal__ in the prompt with it
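To make the discussion concrete, here is a minimal sketch of what a plain wildcard substitution does. It is not the code of any actual wildcard node; the function name and the in-memory dictionary (standing in for the lines of the "animal" text file) are assumptions for illustration.

```python
import random
import re

def expand_wildcards(prompt: str, wildcards: dict[str, list[str]], rng: random.Random) -> str:
    """Replace each __name__ token with a random entry from wildcards[name]."""
    def pick(match: re.Match) -> str:
        options = wildcards.get(match.group(1))
        # Leave unknown wildcards untouched so typos are easy to spot
        return rng.choice(options) if options else match.group(0)
    return re.sub(r"__([A-Za-z0-9_-]+?)__", pick, prompt)

# Stand-in for the contents of the animal list
wildcards = {"animal": ["Leopard", "Zebra", "Fox"]}
result = expand_wildcards("a woman walking with a __animal__", wildcards, random.Random(0))
print(result)
```

A substitution like this can only ever emit strings that are in the list, so if "Leopard print" shows up, something other than the plain lookup is adding it (a second list, a post-processing step, or a different node in the chain).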
okay so - either there's an LLM that's behind that wildcard node or you have a script running. i'm betting it's an llm
There is no LLM
then it's a very poorly written script
you should be able to read the code of the node to see what it's doing
maybe it needs a different dictionary
I can't remember which wildcard nodes I tried cos there are many
but there was something fishy, have to check code at some point
It's not intelligent enough to know stripes goes with zebra and print goes with leopard
apparently it is. or it is using a dictionary that has those entries in it
This shows the dictionary of entries, and you can see it doesn't: #🏞|general-with-images message
yes, well - it's coming out of that wildcard node in some way, you're gonna have to troubleshoot its code
It's definitely the prompt - and it's not an aberration after all - just a densely textured (and unattractive!) look.
This isn't too bad
I dropped the use of both Zentangle and Rococopunk - much more usable output as a consequence!
Zentangle and Rococopunk produced "too much bad noise" 🥳
a lot of civit loras have to be weighted very low
the red background with green looks good
SD 3.5
model from Chernobyl
Yeah ... 6 fingers ...
They look like copied in ...
Something a little odd about the dog.
continue ...
Nothing more to say
anyone got experience with the TCD lora and module? I just added it to my workflow but I'm getting what seem to be fairly different results from what I was getting before: I lost most of my prompt adherence, and the generated image almost looks like it hasn't finished denoising.
I'm working in comfy btw
That's why I'm saying ...
yeah I've used it for most images this year
got best results by using this sampler https://github.com/JettHu/ComfyUI-TCD with SGM-uniform for the scheduler
CFG 1.5 and the gamma parameter on the TCD-sampler varies per image
I found TCD is best around 8-20 steps personally
gamma is called eta on that node, by the way
GPU requirements for running stable diffusion 3.5 large model ??
just answered you in the tech-support channel
I don't fully know how to translate correctly from English
Yea that's the nodes I am using already cause the other one wasn't really working at all. ima do a few more gens to see how things are going and I'll get back to you but I think ima need some more help with getting this right
the choice of eta is key
trying every eta from 0 to 1 in 0.1 increments is a good idea
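A small sketch of that sweep, in case it helps. The job dictionaries are hypothetical (they are not a ComfyUI API); the CFG and scheduler values are the ones suggested above and the seed is an arbitrary placeholder.

```python
def eta_sweep(start: float = 0.0, stop: float = 1.0, step: float = 0.1) -> list[float]:
    """Evenly spaced eta values, rounded to hide float drift like 0.30000000000000004."""
    count = round((stop - start) / step)
    return [round(start + i * step, 10) for i in range(count + 1)]

etas = eta_sweep()
# Keep the seed fixed so eta is the only thing changing between runs
jobs = [{"seed": 1234, "cfg": 1.5, "scheduler": "sgm_uniform", "eta": eta} for eta in etas]
print(etas)
```

Comparing the eleven outputs side by side for one seed makes it much easier to see where the image stops looking under-denoised.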
I was having issues getting my normal samplers and schedulers working so I had to back off of it for a bit to get that back to generating anything decent. I finally got a proper gen again so gonna switch back to TCD
my workflow got a BIT more complicated XD
hmm
fair enough. I remember back in 2022 novelAI had made their SD vae to pull WAY better hands from generations when using their model
it was a palpable difference
this can't be right for TCD
ah so while other stuff gets longer with each step this gets shorter or something
cool
I don't get why my generated image changes SOOOO much when using TCD instead of something else. it's like most of my prompt adherence gets nuked
do you have any tips on getting it to adhere to the prompt better and also improving the final image quality cause it's like REALLY bad
higher step count and higher CFG mostly
if you push CFG high enough you might need nodes to deal with CFG burn like thresholding, skimmed CFG or tonemapping
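For context on why high CFG burns: a toy sketch of the classifier-free guidance formula plus a simple percentile-based thresholding step. This works on plain lists of numbers rather than real latents, and the thresholding follows the general "dynamic thresholding" idea only; it is not the code of any specific node, and the function names are assumptions.

```python
def cfg_combine(cond: list[float], uncond: list[float], scale: float) -> list[float]:
    """Classifier-free guidance: push the prediction away from the unconditional one."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]

def dynamic_threshold(values: list[float], percentile: float = 0.995) -> list[float]:
    """Clamp to a percentile of the magnitudes, then rescale back into [-1, 1]."""
    mags = sorted(abs(v) for v in values)
    idx = min(len(mags) - 1, int(percentile * len(mags)))
    s = max(mags[idx], 1.0)  # never shrink values that are already in range
    return [max(-s, min(s, v)) / s for v in values]

# Toy numbers: a high scale amplifies the cond/uncond difference well past 1.0
guided = cfg_combine([0.5, -0.2, 0.9], [0.1, 0.0, 0.3], scale=7.0)
tamed = dynamic_threshold(guided)
print(guided, tamed)
```

The guided values overshoot the normal range as the scale goes up, which is roughly what shows up as burn; the thresholding step pulls them back in at the cost of some contrast.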
it's possible they will make another VAE that is a plug-and-play addition to SD 3.5, although I think the quality of the newer 16ch VAEs is a lot higher, so it's less likely due to diminishing returns
there was never really a newer VAE that improved SDXL, beyond the fp-16 fix
I'm going to tryout OmniGen in the new SD Next (just released today!)
ye I don't think it will be terribly needed this time, more likely we'll see some better fine-tune on a huge amount of humanoid to get it to properly communicate to the VAE what needs to happen for those finer details
At the entrance of a dimly lit cave, a towering, majestic dragon with sapphire-hued scales glistens in the faint light. The dragon stands tall, holding two crystalline prisms in its claws, angled precisely like those in the reference photo. The sunlight streams through the cave entrance, hitting the prisms at specific angles, causing vivid, realistic beams of light to split into a spectrum of colors, casting a radiant rainbow on the dusty ground. The surrounding area is shrouded in partial shadow, with the play of light and dark creating a mysterious atmosphere. The dragon’s intelligent, piercing eyes gaze at the viewer, offering a silent challenge: solve the ancient riddle of light and shadow. The cave walls are rugged and dark, with faint engravings hinting at forgotten knowledge. The overall mood is one of mystery, magic, and high-stakes intellect, as the dragon stands guard over the path forward."
if you can figure out how to do that texture along with a prompt for stuff like impasto paint and brush strokes. that could work well as canvas
Yes it is an excellent look for a canvas - stumbled upon a jewel there!!!
Turbo SD3.5L + Ollama LLM for prompting
that you did. might even work fairly well in a 3d app as a texture map with displacement
An MB3D Fractal even
feel like pulling up other apps and seeing what you can do with it in them?
I will get around to it I'm sure 🙂
blender+default cube + material on one face...
that's got all sorts of possibilities suddenly
yeah the juggernaut team said they trained it on hundreds of thousands of hands, and that was an earlier version so it's probably in the millions of hands now
their next version apparently they are also training it on bad hands so that can be used as a negative
Thought it was one of those images where you stare at it hard enough, then move back a bit to see a 3D image?
Does it work for you?
Well i was bad at it in the days when those 3d stereograms were released in book form... so no 🙂
I know them but don't think A.I. can do this ...
Sometimes I see an eye on the top left or two on the right side
3.5L Turbo
Day of the tentacle ...
ledy duck
Aawww, what a charming night scene! The city streets are all shimmering and sparkling with the reflections of the lights. Rain is falling softly, creating puddles that catch the light like tiny, magical mirrors. The buildings are gorgeous and elegant in their reddish-brown tones, standing tall in the dark night. Streetlights illuminate the way, casting a warm glow on everything around them. Cars are parked neatly along the street, and the air is filled with the quiet hum of the city at night. It's like a dreamy urban wonderland! Purrfect for a magical girl adventure! Nyaa!
#checker board socks that play chess
Turbo 3.5L
Good morning coffee!
Turbo LLM 3.5L
Turbo LLM 3.5L
Best Upscale to use for Cartoon .. Any good ones on openmodeldb.info ?
@pure monolith Prompt: A woman lying on top of a pool of marshmallows., Negative Prompt: left blank, Width: 1024, Height: 1024, Steps: 40, Cfg Scale: 4.0, Shift: 3, Seed: 2926827617
gave your prompt to stable diffusion 3.5
damn, ive only just installed this version .. how do i even update to 3.5 😛
everything you need to run it or develop with it is here: https://huggingface.co/stabilityai/stable-diffusion-3.5-large/tree/main
or download just the model here https://civitai.com/models/878387/stable-diffusion-35-large
run it in comfyUI
https://huggingface.co/models?other=base_model:adapter:stabilityai/stable-diffusion-3.5-large a lot of LoRAs for 3.5 already on huggingface
oh man
lots of letters and words
xD
I just followed a youtube guide the other day, i assume it was an old version in the guide
i use a standalone if that makes any difference
what do you use?
3.5 just released a couple days ago
I followed this guide .. I had no idea what I was doing, just followed the steps
For those of you with custom built PCs, here's how to install Stable Diffusion in less than 5 minutes -
Github Website Link:
https://github.com/
Hugging Face Website Link:
https://huggingface.co/
Link to GitBash for windows:
https://gitforwindows.org/
Stable Diffusion WebUI Link:
https://github.com/AUTOMATIC1111/stable-diffusion-webui
Hugging F...
my only knowledge on it is that I hit the batch file to start it up lol
not a clue. i suggest you look at installing swarmUI and using comfyUI to generate with through it. it'll make your life a whole lot easier. and we have a #🐝|swarm-ui channel if you need help
is this the right method?
you don't want to use that. ugh. no. you really don't want to use that, that's the worst possible way to try to run stable diffusion that there is
here - https://www.youtube.com/watch?v=AbB33AxrcZo&list=PLIF38owJLhR1EGDY4kOnsEnMyolZgza1x watch scott's tutorials on comfyUI
Today we cover the basics on how to use ComfyUI to create AI Art using stable diffusion models. This node based editor is an ideal workflow tool to leave how AI art is generated, but also how you can really mess with the internal elements much more than you can with any other AI Art interface out there today. #comfyUI #stablediffusion
Install ...
legend thank you
so just a few mins in .. it seems that I download and install comfyui, open it and load the models into it instead of what im doing now which is loading batch file to open the current stable diffusion I have?
probably getting ahead of myself
watch the tutorials. and then DM me if you need help and i'll walk you through stuff
youre so kind ty
you're more than welcome. the more of us that are doing cool things, the better we all grow
add a silhouette of frodo baggins
Good morning coffee
Take a look at Forge Webui, it's a fork of A1111, works better, faster than comfy, and works with Flux.
3.5L Turbo LLM (Llama3.2)
SD3.5L Turbo LLM (Llama3.2)
Just a usual day in NYC underground ...
SD3.5L Turbo LLM (Llama3.2)
Have you tried the ModelSamplingFlux node?
When in Flux yes ... can it be used in 3.5L as well?
I have seen Olivio Sarikas' video on this node - changes detail somewhat ...
Yes ... I have no clue whether it works with SD3.5, too
I just added the node to the super Flux workflow ....
ModelSamplingFlux works on every model. you can use it with SD 1.5 if you want
@wispy nest was playing around with it last night and I think posted some images using it in the #🆕|sd3 channel. seemed fairly happy with it
Not sure this is a matter of tech or prompting but how do ya guys fix bad eyes when it's larger pics like this?
Using pony btw, might be relevant
facedetailer is good
I guess I'll try it
lol my mouse immediately went to go click the play icon
I didn't realize it looked like a play icon🙂
Slightly better, I'll try to mix that and perfect eyes... I got like 7 loras atp lol
Node? I assume that's comfyui stuff, I'm using the version for coughing babies, webui
ah I don't know about that
I guess you could call it the vanilla basic bitch version
Yes, I use cuz it's so simple.
Shame I can't use flux there tho. Or at least not as easy as normal checkpoints
TBH with comfy you can just drag someone's image
click "install missing nodes"
hit control+enter
and you have got their workflow running
I got installed just a lil confusing for me but i might use it just for flux stuff
Since it's some kinda miracle checkpoint
miracle checkpoint is a cool way to describe flux yeah
All the stuff I see from it looks real good with very lil prompting so it kinda did look like a miracle lol
with photoshop
Yeeeea, I'll keep trying different seeds
10 seconds in photoshop, 10 hours in stable diffusion ...
I'll take my 10h over installing and watching a tutorial 🥹
select the good eye, copy to clipboard, paste it, flip it, move it over the bad eye, soften the edges, done
Sometimes you just need a hand. Good morning coffee!
3.5L Turbo LLM
Guys what kinds of keywords can i use on sdxl to get a style like this? It's like a blend of photorealism and digital art
Boy
I'm serious
I need it for a project i'm working on
Btw currently i'm using "(concept art:1.4) {prompt} (digital artwork : 1.4), illustrative, painterly, matte painting, (highly detailed : 1.4)"
and its not giving me the best result
I feel like this is what 99% of sdxl models output 😛
I knowww but i've tried so many times and the images always end up not as digital as i want
What prompt structure do you recommend
For this i don't think it really matters much. "Pretty woman, dark makeup, hair bun, city nightlight, choker necklace, halter" would probably do it.
lots of the big popular models output results like this just by default... reminds me of the old RevAnimiated
just tried your prompt with sd 1.5. Would at least look like @gilded venture example 🙂
I used the same prompt and i got this. @vagrant dust see its looking too realistic
I'm using fooocus btw
not about the ui but the models. For the one above i created an image with animerge but did not denoise it to 100%, then did a second pass with epic realism. Pretty sure there are some sdxl models which are well balanced
Ahhh , i think the issue is that i'm using the sdxl base model. I'll try out Animerge
some SD 1.5 models do better with natural text BTW
I realized i was actually using juggernaut and not the sdxl base. I'll test with sd 1.5 models too
Wdym with text ? you mean text2img ?
ahh got it
can someone please explain to me what the hell this is? I keep writing to support to do something about it but I guess they don't give a shit if they have our money! Why the fuck isn't it working!!!!!!!!
there's a possibility that either your file is corrupt, or it's too large - or maybe you have a network connection that's too slow. there are a number of things, not just those, can cause this issue.
The place where one can make support tickets isn't accessible to normal server members? Is that intended? Also I wanted to report user: davidsmith4704 for randomly writing me "Hello What's up with you?" (We never had contact and I haven't been active on this server for months.)
[I'm reporting this message with the ⚠️ reaction so mods see it, not sure if this still works though]
Also this doesn't seem to work
yeah, it's down
i think they're revising that bot
Realistic Vision 5.1 Hyper
looks perfect at first but after a bit you realize something is off
which is that the lighting on the face doesn't match the lighting on the neck/torso so the face looks copy pasted onto an original face
likely fixable with an image editor
the lighting is the same, it's the coloring that's off - and i'm not sure what that is in the middle of her neck - almost looks like the AI tried to draw a second clavicle
Good morning coffee
🤪
I think I'm done with trying to make SD3.5L work for me. It just doesn't seem to give quality output. Flux on left, 3.5L on right, same prompt and seed. Upscaling makes 3.5L even worse.
raw photo, (realistic:1.5), (woman:1.01), (fashion model 18-years-old:1.5), (tiny, tiny body:0.36), (over the shoulder pose pose:1.25), (giger:1.05), (blue eyes:1.05), (oval eyes shape:1.05), (red lips:1.05), (defined cupid's bow lips:1.05), (glowy makeup:1.05), (in love, in love expression:0.68), (round with defined cheekbones shape face:0.63), (frohawk cut hairstyle:1.05), (mahogany hair:1.05), (disheveled:1.09), (muted colors:0.98), (fashion photography:0.5), (professional photo, balanced photo, balanced exposure:1.2), (watercolor makeup:1.05), (back arch pose:1.25), (vintage dress:1.25), (bikini:1.25), volumetric lighting light, light from left, muted colors, black and white photography, (professional photo, balanced photo, balanced exposure:1.2)
A wheat field at sunset, transitioning from purple to intense yellow. On the left side, a field of wheat is consumed by fire and reduced to ashes. In the center, there is a stone path, and on the right side, a bountiful wheat harvest viewed from above. The scene has a cinematic style, captured with a 20mm lens for added depth and perspective.
Made a new LoRa ... 🙂
Ectomorph LoRA?
Humanmachine ...
I want a general prompt that will have that same style . Not the prompt for that particular picture. So maybe something like
(digital art:2) (cinematic: 1.5) (digital artwork: 2), ultra-high definition, hyper-realistic, vibrant colors, highly detailed (1.5), <any prompt goes here>.
to always produce that similar style. I hope you get my point
for what its worth I preferred the composition on the right
it needs some more fine tuning to be used as the final pass I think
This is just a hobby for me, so I need to ask myself why I’m spending time with a model that requires black magic to get good output when there is an easy to use model available. Flux isn’t perfect, but I’d rather be generating and enjoying the results than constantly tinkering and regretting.
rule #1 - use the tool that does the job you want done
I’ve thought about the possibility of using Flux as a refiner for SD3, but 64GB isn’t enough memory for that on MPS without quantization, and I don’t really want to download quants just to try it out. Not even sure if quants work on MPS yet — they didn’t when Flux was released.
would recommend using Nvidia GPUs in cloud over using Metal Performance Shaders
there's gonna be quite a lot of things that aren't implemented for Apple at the moment
its true that using a separate model as a refiner comes with the downside that you need to keep both models loaded in VRAM
unless you are willing to load and unload a lot
flux wouldn't make a good refiner in my opinion
you could try Realvis SDXL as refiner, looks like this when used on Flux
its a fairly low blur model, which can be helpful
guys, I'm lost, I just want to be able to extend images using stable diffusion... I have made an acount, got a license, but what do I do now?
Install something?
There are these apps on the website like draw things and diffusers, I don't know what any of this is...
oh dear. okay, most of the stuff on the huggingface page is for developers. are you wanting to develop? or just use it to generate images with?
I think it is called 'generative fill', to extend an image, that is all I need it for
that's photoshop only
okay thank you
welcome
Since when? Invoke, comfyui and for example diffusionbee (Mac) use a canvas where you can easily outpaint an image which does the same like the generative fill from photoshop?
Generative fill is photoshop. i think the term might even be trademarked. outpainting is everything else.
and THAT is what he said he was looking for
Your same prompt (without negative) in my workflow. This uses SD3.5 Turbo to create the image and then it's refined using Flux.
Good morning coffee!
SD3.5L Turbo
you can get away with a turbo model for the early steps if you have a refiner yeah
Prompt from @viral frost
flux refiner is cleaning up good wow
Are we posing SD3.5L Turbo as 'poor', if we are turning to Flux for finishing/refining?
not necessarily
I often refine Flux with SD 1.5, but I would say Flux is not a poor model compared to SD 1.5
SD3.5L, SD3.5L Turbo and the SD3.5M that is coming have not had a final aesthetic fine tune yet
so a refiner is good now, but in the future they might not need a refiner as much
Flux came out of the box with an aesthetic fine tune done already, so its a bit different for Flux
Accidentally set denoise to 1.0 on the Flux refiner, so this is actually just Flux 😄
hmm I might like the pure Flux image more, hard to say
brothers, where are your faces? what did she do to you?
We are looking for people who want to participate in our new web 3 ecosystem. A brand new project with lots of tools for your needs.
Beta Testers ($35-40/hr)
Moderators/Community Manager ($500/week)
Developers (Rust, Python, C++)
UI/UX designers (from 1 year of commercial work)
Ambassador (to be discussed)
Apply today and be part of a transformative journey fueled by creativity and vision.
To apply, send me a friend request!
(We are also open to proposals for cooperation on mutually beneficial terms)
crypto is nearly 20 years old and I haven't seen a use case yet lol
❓
Animation makes clear ... it's a charging pod ...
Hi. How do I configure the predefined prompts to append to user input so that small amount of 'text' would be generated on the images?
Add a sign with the text saying "Welcome to SD3.5 Medium!"
Ahoi dicordos
Generated with StyleGAN2-ADA
the prompt was a mistake, i pasted a link instead of the prompt and i got this
the prompt: https://huggingface.co/spaces/stabilityai/stable-diffusion-3.5-large-turbo
created in Stable Diffusion 3.5 Large (8B)
SD3.5 medium
flux refiner?
My images, above that comment, were created using SD3.5 Turbo and then refined using Flux.
SD3.5 large
what kind of refinement is it? like upscale?
It's a 2nd ksampler pass using Flux and then a model upscale.
denoise around 0.55
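A rough sketch of how a partial-denoise second pass picks its starting noise level. It follows my understanding of the usual approach (build a longer schedule, then keep only its tail); the linear schedule is a simplification, since real schedulers are not linear, and the function name is made up.

```python
def refiner_sigmas(steps: int, denoise: float) -> list[float]:
    """Sigmas for a second pass that only redoes the last part of the schedule."""
    if not 0.0 < denoise <= 1.0:
        raise ValueError("denoise must be in (0, 1]")
    total = int(steps / denoise)  # length of the full, imaginary schedule
    full = [1.0 - i / total for i in range(total + 1)]  # linear 1.0 -> 0.0 (assumption)
    return full[-(steps + 1):]  # keep only the final `steps` steps

partial = refiner_sigmas(steps=20, denoise=0.5)
full_pass = refiner_sigmas(steps=20, denoise=1.0)
print(partial[0], full_pass[0])
```

With denoise at 1.0 the pass starts from pure noise, so nothing of the first model's image survives. At around 0.55 the refiner starts a bit past the halfway point and keeps the composition while redoing the detail.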
loading the flux models or sd3.5 models alone take a bunch of time, loading one and then the other here would take forever
maybe you have a really good pc
24 GB, oh boy I wish I had gotten into more debt to get one instead of 12 GB VRAM 😂
here it takes like a couple of minutes to load the model and then like 4 minutes SD 3.5... flux dev might take like 7 minutes
and even more loading of the model time
SD3.5 using Google FLAN and Flux refiner.
ah nice I've been meaning to test the Flan thing
your VRAM is being filled
you could get way faster results on your GPU
I don't understand that
VRAM is like the working memory of the GPU
when it's full, the model has to use slower memory (DRAM) and this slows it down a lot
I would like to find some refinement method for already generated images, so I don't have to switch or load different models
yes I guess, it takes a lot to fill, also I don't store the models in the SSD as I don't have space
if you switched to using GGUF quants that fit in your VRAM you would see a very dramatic speed up
once it is loaded it is kind of smooth, but under no circumstance would I consider switching the model in the middle
yes I've been using models with better performance here
why? the 4_0 GGUF looks almost identical for most seeds
and you would receive many multiples of speedup
you could also fit both Flux Dev and 3.5L in your VRAM
if you used 4_0 GGUF
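A back-of-the-envelope check of that claim, counting weights only. Assumptions: SD3.5 Large at 8B parameters (as stated elsewhere in this channel), Flux Dev at roughly 12B (a commonly quoted figure, not from this chat), and Q4_0 at 4.5 bits per weight (blocks of 32 four-bit weights plus one fp16 scale). Text encoders, VAE and activations are ignored, so real usage is higher.

```python
def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GiB."""
    return params_billion * 1e9 * bits_per_weight / 8 / 2**30

Q4_0_BITS = 4.5   # 18 bytes per block of 32 weights
FP16_BITS = 16.0

sd35_large_q4 = weight_gib(8, Q4_0_BITS)
flux_dev_q4 = weight_gib(12, Q4_0_BITS)
sd35_large_fp16 = weight_gib(8, FP16_BITS)

print(f"SD3.5L Q4_0:   {sd35_large_q4:.2f} GiB")
print(f"Flux Dev Q4_0: {flux_dev_q4:.2f} GiB")
print(f"both Q4_0:     {sd35_large_q4 + flux_dev_q4:.2f} GiB")
print(f"SD3.5L fp16:   {sd35_large_fp16:.2f} GiB")
```

Under these assumptions the two quantized models together come in under 12 GiB of weights, while SD3.5L alone at fp16 does not, which fits the slowdown described above.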
Good morning coffee
BITE COIN
Wow, what prompt?
Is it SD3.5?
it was flux dev, the prompt was
A photo taken inside a rustic hut, with no luxuries, depicting Jimi Hendrix and John Frusciante as castaways. They are relaxing and drinking argentinian mate, a traditional argentine beverage, surrounded by simple, tropical decor. The scene captures a laid-back and creative atmosphere, with instruments and tropical elements in the background.
but the one you quoted, it was done by inpainting, the original I made with flux was the other one. In the original, one is drinking "something", and they are not quite jimi hendrix nor john frusciante, I just inpaint it until I produced that
I was sitting and doing prompts from different shows and they all became so fucking tall and I did multiple ones and they all became the same.. Eventually I look at my photo size and it was 832 x 2048 😄
@nimble mason prompt: "Abandoned, century-old lighthouse on rocky coastline at dusk, with crumbling stone walls, rusty lantern room, and overgrown vegetation. Waves crash against weathered foundation. Flickering sunset light casts eerie shadows." sampler: dpmpp_2s_ancestral_cfg_pp scheduler: linear_quadratic
oh, that sampler might be fucked lol
a lot of that stuff stopped working with sd35, including the euler_cfgpp in comfy
SamplerRK is definitely working though
it's something ;)
the "dpmpp_2s" in the dropdown in samplerRK is a dpmpp_2s_ancestral implementation
well... um... yeah, because of several reasons but more work with it than i expected. i'm half way through this chart i'm building. but that was unexpected
this is the entire set for that sampler
lol
i've got distribution on all but that one
yeah i probably need to go do some rescaling with the math or something
that might be good. cause, well ... yes, it's not working all that well ;)
this is another one
but euler_ancestral is nice
oh yeah my cfgpp ones are actually not for RF, i just looked
i'm just using the default that's in comfy as of the latest update for 3.5 medium
haven't updated em... hopefully never will as the goal is to just get all this shit rolled into the same core architecture
gotcha
someone stuck lms right in the middle of the dpm's
recommend trying this for euler ancestral
later. right now i'm putting a chart together to hand out to people that'll just be using the build comfyUI comes with. then i have every intention of playing around with what you have on your repo
gotcha, makes sense
but you might want to look at what's going on with that sampler anyway
comfy adapted euler ancestral for rectified flow recently
if i remember rightly from the commit history
yeah it's not adapted for RF
because i should have at least gotten a distribution for that scheduler, not solid red
that was back in the pre-SD3 days
afaik
very very little work has been done by pretty much anyone afaik to get noisy sampling working
all the other samplers i'm at least getting a noise distribution, though, for that scheduler. just not that one
I don't know why but it feels like the entire academic field wants these models to be one step of euler
they probably do. scientists get in ruts and don't like to get out of them
ultimately i'm gonna delete all of those and just replace them with functions that pass the call to the same core sampler code
lol, yeah, i mean... here's how i see it
either you get it to the point where it's one euler step, or there's benefits to better sampling methods
yann posted something really dumb on twitter the other day, to which i responded with 'proof that scientists need to get out of the lab and live in the real world'
test of linear quadratic, res_2s with one implicit step
hard noise eta 0.5
same as that first one but with beta
that's not bad at all
i like the first one best
beta's too saturated
agreed
I thought using linear_quadratic on image models instead of video models was a joke
but its working kinda well 🤔
main thing i'm looking out for here: the grainy gritty look that most mmdit models seem to be plagued by for whatever reason
never assume ;)
maybe because they're really LLMs not image models?
the thing that's interesting is the fact these schedules work period
yeah not sure if it's arch or training
there's certainly more synthetic data around now
skip layer 2 and see what you get
an image is an image. if you clean up the anomalies before you use it, it doesn't matter if it's synthetic or not
yeah, but i mean... more stuff that's got weird artifacts that aren't typical for photos etc
if you're talking about flux - that would be the midjourney scrape it's stuffed full of
just a random guess at one possible reason for the appearance of new artifact patterns
i've seen it with sd3 and flux
it was at its worst with the sd3 beta
and it tends to appear for certain prompts, which is one reason i suspect training sets being a culprit
when these images come out clean, they're gorgeous
SD3-2b-medium wasn't supposed to be a release at all. and flux IS sd3, remember
yea that's why i said beta
but yea that is one type of artifact i'm watching out for, and something your comparisons had less of in the linear quad outputs
i've noticed other models doing this shit too, now that i think about it
and different versions of it in certain ways... when flux spits out the fake toy story world look, it's especially prevalent on textures like dark pavement
dots, dots, dots, fake detail
noise passing as detail
think most ppl think it looks great but... it does not, it's fake bs detail
I thought it was positional embeds
but then Sana came out with literally zero positional embeds and still has it
it's the core architecture
beta with the same seeds
linear quad is def better
i see some subtle haloing with the beta results, it's possible this is somewhat tangentially related to cfg now that i think about it
I think maybe it's an inductive bias due to the fact that DiTs have to use patchwise embeds
I really wanna test the flan thing
did you try skipping layer 2 yet?
someone on civit added flan t5 to sd 3.5 and flux
he said its way better but then didn't put comparison images
what's flan t5?
nope, need to get that added, don't have it on here
i'm not doing general updates with my comfy environment
i did some images with just clip_l and clip_g with prompts and a . for t5xxl - and they came out really nice
too much customized stuff
flan t5 is a fine tune of t5 made by google
take it out of the mix entirely
it deals with the tiny details. skipping layer 2 has a tendency to get rid of those dots
SkipLayerGuidanceSD3?
oh nice this could help with lowering the side effects of too much noise injection
the problems with injecting too much noise come from the sigma schedule not getting adjusted to scale for it
yes.
RF is real sensitive to any perturbations to the variance level
also - ddim_uniform warms your image up - has a lot of red shift
anytime you add noise, you have to be sure the next step goes farther down than it normally would, to get you back to the right noise level
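A sketch of that bookkeeping for one noisy step, assuming the rectified-flow convention x = (1 - sigma) * x0 + sigma * noise. It is modelled on how ancestral-style samplers for RF are usually written, not copied from any specific implementation; the function name is hypothetical.

```python
import math

def rf_ancestral_coeffs(sigma: float, sigma_next: float, eta: float = 1.0):
    """Return (sigma_down, scale, noise_coeff) for one noisy rectified-flow step."""
    if sigma_next == 0.0:
        return 0.0, 1.0, 0.0  # final step: nothing to add back
    # Overshoot: denoise further than sigma_next to make room for fresh noise
    sigma_down = sigma_next * (1.0 + (sigma_next / sigma - 1.0) * eta)
    alpha_next = 1.0 - sigma_next
    alpha_down = 1.0 - sigma_down
    scale = alpha_next / alpha_down  # shrink the signal to the target level
    # Fresh noise needed so the total noise std lands exactly on sigma_next
    noise_coeff = math.sqrt(sigma_next**2 - (sigma_down * scale) ** 2)
    return sigma_down, scale, noise_coeff

sigma_down, scale, noise_coeff = rf_ancestral_coeffs(0.8, 0.6, eta=1.0)
print(sigma_down, scale, noise_coeff)
```

The step is taken to sigma_down, the result is multiplied by scale, and noise_coeff times fresh noise is added. If the overshoot is skipped, the sample ends the step with more noise than the schedule expects, which is where the torched outputs come from.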
colorado river rapids!
ah is this the variance exploding versus variance preserving thing
did you not see the avatar i made you earlier?
no I didn't see
i might have to go make another RAG (random attention guidance) node and hope it's as interesting as it was with cascade
ddim on the left - lots of red
that one isn't plagued with dots, either
is it supposed to have a very visible effect or be really subtle?
haha nice
it's subtle. skipping layer two on the left. skip turned off on the right. the very fine details
workflow is in the images
says no WF for some reason
just curious if you're letting it run through the entire generation or stopping at 0.15
click the image to open in viewer. click the open in browser text. right click, save
i'm using dango's default settings. but i have something else loaded and can't load it right now to check what they are
beta's noise distribution leaves a little bit to be desired
i'll check later then
what sampler is that with? what're you testing here exactly
obv torched lol
dpmpp_sde/beta
oh, yeah, that thing doesn't have any of the noise scaling corrected for RF
that would explain it
need to test Instaflow model
what's that
it's like a rectified flow version of SD 1.5
not as bad as dpmpp_sde/exponential
new?
oh yeah that's def worth a look
one thing with rectified flow is the flow paths aren't meant to cross
I wonder if this is the issue
because Pixart Sigma is a DiT and we never heard about Flux grid back then in the Pixart Sigma days
how would you test?
one thing I've been meaning to do for ages is write a python library that analyses and visualises various group statistics for the diffusion model trajectories
I originally wanted to do it to think about CFG burn but it would be helpful for this sort of thing also
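A rough sketch of what the core of such a library might compute, assuming the trajectories are already collected as an array of shape (steps, batch, dim). The function name and the choice of statistics are made up for illustration:

```python
import numpy as np

def trajectory_stats(traj):
    """Per-step group statistics for a batch of sampler trajectories.

    traj: array of shape (steps, batch, dim), one latent per sample per step.
    """
    traj = np.asarray(traj, dtype=float)
    centroid = traj.mean(axis=1, keepdims=True)
    # Mean distance of each sample to the batch centroid: a crude diversity measure.
    spread = np.linalg.norm(traj - centroid, axis=-1).mean(axis=1)
    # Mean latent norm per step: shows whether the overall scale drifts.
    norm = np.linalg.norm(traj, axis=-1).mean(axis=1)
    # Mean distance moved between consecutive steps.
    step_len = np.linalg.norm(np.diff(traj, axis=0), axis=-1).mean(axis=1)
    return {"spread": spread, "norm": norm, "step_len": step_len}

# Example with random data standing in for real sampler output.
rng = np.random.default_rng(0)
stats = trajectory_stats(rng.normal(size=(10, 8, 16)))
print({k: v.shape for k, v in stats.items()})
```

Plotting `spread` against the step index for different CFG scales would be one way to look at when diversity gets lost.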
i think it would be too. can you beat @nimble mason to writing it?
the difference between dpmpp_sde_gpu/normal and dpmpp_sde/normal
this sort of chart is great
they plug a toy 1-D example into the model equation to get these
it's showing the CFG in the early steps destroying the image diversity
and then limiting it brings the diversity back
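A toy 1-D setup in the same spirit, as a sketch only. The mixture parameters, the function names and the `interval` argument are assumptions for illustration, not taken from the paper. It uses an analytic Gaussian-mixture score, Euler steps on the probability-flow ODE dx/dsigma = -sigma * score, and CFG applied only when sigma falls inside `interval`:

```python
import numpy as np

# Hypothetical toy data: the conditional class has two modes; the unconditional
# distribution adds a third mode that belongs to another class.
COND = (np.array([1.0, 3.0]), np.array([0.2, 0.2]), np.array([0.5, 0.5]))
UNCOND = (np.array([-2.0, 1.0, 3.0]), np.array([0.2, 0.2, 0.2]),
          np.array([0.5, 0.25, 0.25]))

def mixture_score(x, sigma, means, stds, weights):
    """d/dx log p_sigma(x) for a 1-D Gaussian mixture blurred by N(0, sigma^2)."""
    x = np.asarray(x, dtype=float)[:, None]
    var = stds ** 2 + sigma ** 2  # per-component variance after noising
    logp = (np.log(weights) - 0.5 * np.log(2.0 * np.pi * var)
            - 0.5 * (x - means) ** 2 / var)
    resp = np.exp(logp - logp.max(axis=1, keepdims=True))
    resp /= resp.sum(axis=1, keepdims=True)  # component responsibilities
    return (resp * (means - x) / var).sum(axis=1)

def sample(n, sigmas, w, interval=(0.0, np.inf), seed=0):
    """Euler sampling; guidance scale w is used only for sigma inside interval."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, sigmas[0], size=n)
    for s, s_next in zip(sigmas[:-1], sigmas[1:]):
        score = mixture_score(x, s, *COND)
        if w != 1.0 and interval[0] <= s <= interval[1]:
            uncond = mixture_score(x, s, *UNCOND)
            score = uncond + w * (score - uncond)  # classifier-free guidance
        x = x + (s_next - s) * (-s * score)  # Euler step of dx/dsigma = -sigma * score
    return x

sigmas = np.geomspace(20.0, 0.01, 50)
full = sample(2000, sigmas, w=4.0)                         # CFG at every step
limited = sample(2000, sigmas, w=4.0, interval=(0.0, 2.0)) # CFG only at low noise
print(np.std(full), np.std(limited))
```

Comparing the spread of `full` and `limited`, or histogramming them, is the kind of check those charts make. What this particular toy shows depends on the chosen modes and interval.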
Learn the inner workings of Triton, the hardware agnostic language for GPU programming and powering TorchInductor: https://t.co/A6JTVjdRXW
@viscid mural #🏞|general-with-images message
I don't think I need speed bad enough to write Triton for it
it's calling you ...
there's a nice 3D one from GITS paper too
now THAT is a chart i like
that's how they made GITS scheduler they just took the average of these
i want to see what's going on in more than just 2D
i really want a visualizer that charts, in vector space, how a token in a prompt moves to its final destination
3rd dimension adds a lot yeah
in 3D cause, yeah
I always forget the name of this paper, will find it tomorrow
it's the one about delaying the negative
it shows maps of individual token vectors on top of the score function of the model
it's the thing you are looking for
could be re-visualised in a nicer way probably
bookmarks are a thing ...
so much for assuming that the _gpu on the end of the sampler just means it'll use your GPU if you have one
it really does. i'm really surprised how different the distribution is turning out
Their recent share on monodromies was pretty interesting. Will be interesting to see that shift.
chuckles
😎 sometimes it feels like that in here
in what way? you have a link to that?
First thing I thought of when I read this 😄
Good morning coffee
Homer J. S. having a midlife crisis 😄
Apparently, the street gets narrower with age
Cool idea!
Quite a few more over in #🆕|sd3 🙂
I got the idea from a Flux lora, but I did all of mine without the lora.
it was a joke, just gibberish 😛
Looks like Brigitte Macron
a teenager's head in mini style
it was just a really, skinny, tank
Cause of the wig?
this unironically looks like near where i live
i guess its that unmistakable ex commie eastern europe aesthetic
Andy Warhol Jagger
Flux and Ollama
hello, can I get assistance in French?
I can try to help you, but I have to use a translator, if that's okay with you?
thanks, no problem
I'm a beginner, I'm trying to find out what to do to create an image
to create an image on this Discord, you need to use the artisan channel. You can read the information about it in this channel #artisan-faq However, it's in English, so I suggest you copy the text and paste it into this translator. https://www.deepl.com/en/translator
thanks, I'll try
Very good
The red planet
I’ve been trying to get outpaint to work for hours now and this is the best I get using poor man’s outpaint. They really need a new video update of what needs to be done with modern outpaint
where mask??😃
he's a rooster, they're obstinate
he has no right to break such rules!
