#🏞|general-with-images
It's pretty goood!
It doesn't appear, but it probably has an A100
flux dev
😄
can u share prompt ?
Mine was : "Cinematic detailed macro photography, extreme details and complexity,
amazing quality, masterpiece, best quality, hyper detailed, ultra detailed, UHD, perfect anatomy, magic world,
(kitten and fish), fish in the air, spell magic to get fresh fish as food,( fish jumping from magic book), energy flow,
a full body of a cute kitten, kawaii, wearing witches robe, witches hat, holding magic book, magic book on one hand, spell magic"
(stolen from civitai)
So cute 🥺
this one more cute
Cinematic detailed macro photography, extreme details and complexity, amazing quality, masterpiece, best quality, hyper detailed, ultra detailed, UHD, perfect anatomy, magic world,
cute dragon spiting smoke that says (Hello!) the smoke is shaped as the world (Hello)
maybe more work on prompts and we get a smokey hello?!
Do you think they can land?
Hi, please help.
#🤝|tech-support message
Flux is the fun we wanted to get with SD3 ...
lol yeah pretty much
its just really aesthetic even with joke prompts
like your average random prompt is better than most SDXL
I prefer lower values
Has anyone tried how high we can go with the resolution with Flux?
1344x768 gives the most reliably sharp images. 1536x896 is also good, but that's where you start to get occasional images that are blurry. I've found the higher res you go after that, the higher the chances of getting blurry images. I don't know why, it's just what's happened over the course of tons of images. Now I just stay at 1344x768 or some other 1 megapixel total image and i get the best results. That said, 1792x1024 (like what dall-e hq mode renders) is often really good, just be ok with random blurry images.
I was also able to do multi-stage ultimate sd upscale with dev at 1.2x with 0.15 denoise (i kept having to lower it) and it outputs crazy sharp stuff, but it requires many stages and it takes forever.
yeah, above 1mp and the image quality goes down
the upside is that flux is more sharp at this resolution than almost any other model I've seen.
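The resolutions quoted above can be checked with a little arithmetic. This is a minimal sketch: the function names are made up for illustration, and snapping dimensions to multiples of 64 is an assumed convention, not something stated in the chat.

```python
import math

def megapixels(width, height):
    """Pixel count in millions (1 MP = 1,000,000 pixels)."""
    return width * height / 1_000_000

def size_for_aspect(aspect_w, aspect_h, target_pixels=1024 * 1024, multiple=64):
    """Pick a width/height near target_pixels for a given aspect ratio.

    Both sides are rounded to the nearest `multiple` (assumed to be 64 here).
    """
    ratio = aspect_w / aspect_h
    height = math.sqrt(target_pixels / ratio)
    width = height * ratio

    def snap(value):
        return max(multiple, round(value / multiple) * multiple)

    return snap(width), snap(height)

# The three sizes mentioned in the conversation.
for w, h in [(1344, 768), (1536, 896), (1792, 1024)]:
    print(f"{w}x{h}: {megapixels(w, h):.2f} MP")

print(size_for_aspect(7, 4))  # 1344x768 is a 7:4 frame at roughly 1 MP
```

By this measure 1344x768 is about 1.03 MP, 1536x896 about 1.38 MP, and 1792x1024 about 1.84 MP, which lines up with the "quality drops above 1 MP" observation.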
how do you get flux into sdupscale in comfy?
it was just ultimate sd upscaler
which base models generated these two images? it's two different ones, that's my only hint
scentsy max melts simple logo
I am trying to make a model for a user and I cannot get the full view of the model. What is wrong with this prompt?
a woman with light green eyes posing with her hair in a bun, (full shot:1.2), bright brown hair, (realistic shaded perfect face:0.7), an airbrush painting, pointillism, 1girl, bangs, closed mouth, eyelashes, lips, looking at the viewer, realistic, solo, (whole body:1.2), low amount of freckles, (face:1.2), full shot, wide shot
This isn't an SD image per se, but #💬|general-chat doesn't allow uploads. Does the A1111 Docker template on Vast.ai allow you to pass HF tokens for gated model, like the ComfyUI template does?
Genesis G70 but it's from Temu. Kidding, of course. Nice work
@nimble mason Mind sharing the prompts/environment/PC specs?
you're using square AR - switch to portrait AR
square AR?
might want to post this in the #🤝|tech-support channel
aspect ratio. you're using a square. switch to tall rectangle. and don't use words like 'portrait' in your prompt
I got an answer from Vast.ai chat, it just took a while. I asked the question around 9-10pm my time and didn't get an answer until 12 hours later. I'm PDT (UTC-7, death to DST), if it matters.
4090/ubuntu
Nice
I have a 12400F, A770 16GB LE (think Nvidia FE), running W10 Education. Haven't tested SD locally yet, Vladmandic's benchmarks have the A770 max out at 10-11 It/s for 1.5, and 7 It/s for XL Turbo
The first NC-17-rated MCU film
Still breaks records.
AKA "Like Showgirls but actually good"
Wow, I'm amazed! Considering how much I struggled trying to get an upscaling workflow with SD3 (and never really got it to my satisfaction), Flux just works! Because the image quality seems to break down over 1 MP when doing straight t2i (and because of the generally high memory usage), I decided not to try a hiresfix-like workflow and instead used Ultimate SD Upscale. I was concerned about @jovial tiger's comment about needing to upscale only 1.2x at a time, but I did it straight at 3x and it works GREAT! And there are barely any hallucinations and no visible tile seams, even without any controlnet in the picture. I can't believe it was so easy! The attached are the 3x upscale, a 2x upscale I tried first, and the original generation.
I used the bosh3 sampler through the ODE Samplers custom node and used Ultimate SD Upscale (Custom Sampler) to use bosh3 with that as well. Because I got bad image quality when trying to use regular CFG and needing to bring the Flux Guidance node into the picture, I set the USDU CFG to 1. Maybe this is why it worked so well for me. Plus 4xUltrasharp for the intermediate upscale.
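To compare the two approaches (many small 1.2x passes versus one 3x pass), here is a rough sketch of the stage count and output size. The helper names are hypothetical, and the 1344x768 starting size is only an example borrowed from earlier in the chat.

```python
def stages_needed(target_scale, per_stage):
    """Count how many passes at `per_stage` are needed to reach `target_scale`."""
    if per_stage <= 1.0:
        raise ValueError("per_stage must be greater than 1")
    scale, stages = 1.0, 0
    while scale < target_scale:
        scale *= per_stage
        stages += 1
    return stages

def upscaled_size(width, height, scale):
    """Output size after scaling both sides by `scale`."""
    return round(width * scale), round(height * scale)

print(stages_needed(3.0, 1.2))       # passes needed at 1.2x each to reach 3x
print(stages_needed(3.0, 3.0))       # a single direct 3x pass
print(upscaled_size(1344, 768, 3))   # example base size scaled 3x
```

Six passes at 1.2x only reach about 2.99x, so a seventh is needed to hit 3x, which is why the multi-stage route "takes forever" compared with a single 3x run.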
can you paste a screenshot of what that looks like? I'd like to try it. (with bosh that probably took 20 minutes. 🙂 )
At 3x upscale, more like an hour, I think! I'm on a Mac. I'll try to arrange my nodes neatly and get you a screenshot.
SD on a mac is.... disheartening. 🙂 m2 max here.. i gave up a long time ago on that.
It's definitely not the fastest, and I often have to wait for bug fixes or do a lot of troubleshooting before things work, but having 64GB of memory to work with really helps.
Though sometimes I wish I spent the extra money for 128 GB (though not really, it was an extra $1000!).
no question. If I could go back in time I would have maxed out an m2 ultra. with seemingly the m4 ultra nowhere in sight, it would have been worth it
Here's your screenshot, sir!
I've got an M3 Max and it's ~80% of the way there to an M2 Ultra, so I'm not complaining.
thanks. while I was waiting, I tried doing what you're mentioning. here's the results.
so the only negative i can see, is on the bridge of her nose. it took some dithering that was on her nose in the original and expanded on it.
i've been using uni_pc for everything.
i just started using this one. I was using siax 200k before that but the random dots etc was a lot worse.
I'm going to try your ultrasharp on the same image to see if it does the nose thing.
The hair looks pretty good, but viewing at 2x zoom shows that the detail on her pauldron and the cup is missing. What type of intermediate upscale did you do?
Oh, nevermind. I see it. Looks like it's based on SwinIR. When I used A1111, I didn't really like that upscale model.
In my testing with SD3, Ultrasharp tended to produce better detail for me than rESRGAN. I didn't bother testing SwinIR.
I used an upscaling workflow with SDXL extensively and learned that, at least with that model, the method of intermediate upscaling was crucial to the quality of the second-stage output. I started just using Lanczos rescaling and got more blurry images than sharp ones. A lot of people seemed to just use nearest or something like that, but that always produced blocky artifacts for me. But when I switched to rESRGAN (I used Invoke, which didn't support arbitrary upscaling models back then), everything cleared up quite nicely.
definitely, although in this case, I'm not using any intermediate, just ultimate.
I'm referring to the GAN model, like SwinIR or Ultrasharp. Ultimate SD Upscaling node uses this to upscale to the 2nd stage image size before doing i2i on it.
oh ok right.
SwinIR and Ultrasharp are quite old, you can get nicer results from stuff like HAT and DAT
they are in some ways just more modern versions of SwinIR
its got star citizen in the training data
but for Battlestar Galactica it has this
have a specific name of one that you prefer?
swinir on left, ultrasharp 4x on right
swinir has massively more detail
Really? I see exactly the opposite.
the ultrasharp one is all smoothed out. most of the texture that's in the swinir is gone.
Where are you looking in the image?
I agree that ultrasharp gets rid of the sparkles on the face.
her face in the ultrasharp looks smooth like a doll, airbrushed.
there's almost no texture on her face at all.
But I wasn't so hung up on the sparkles.
I don't think ultrasharp has realistic skin texture, but neither does swinir.
The pauldron detail looks a lot better to me.
The cup still isn't very detailed, but looks a bit cleaner to me.
it's definitely cleaner on the ultrasharp
SwinIR also gave the out-of-focus background a gritty texture all over that I don't really like.
Ultrasharp shows some weird chroma artifacts in the top left, though.
this one is good https://openmodeldb.info/models/4x-RealWebPhoto-v4-dat2
I've been out of the loop on upscaler models, so not sure what the recent advancements have been. I'll give this one a try.
running it again with that new model. I'll reply in a bit with the side by sides.
ok, I think @wispy nest is right. 🙂 this new one is a great mixture of swinir and ultrasharp elements. more texture, but it's still clean without aberrations. Gonna stay with this new one.
Has a patterning in the OOF areas and some chroma sparkles in the upper left. But I agree that the in-focus textures look good.
i'm doing a more complicated shot next.
if you liked DAT then the best model currently is ATD but it is slower
https://openmodeldb.info/models/4x-RealWebPhoto-v3-atd
and this HAT one is also very strong
https://openmodeldb.info/models/4x-Nomos8kSCHAT-L
so the v3 is better than the v4?
ATD is better than DAT yeah
these are different types of model they aren't versions of each other
@viral frost really impressive.
nice
running atd now
Prompt: https://civitai.com/images/22573083
"Eerie Hollow" - The Shadow
Good morning coffee!
The impossible to train Flux model can now be trained https://github.com/bghira/SimpleTuner
Wow, just tried Flux, its unbelievable
@languid pebble do you know if LoRA of something similar is coming for Flux?
I don't really have an idea. I'm just playing around with it. But it's a big trend at the moment and you can see things coming to Civitai ...
There has been a post about finetuning about an hour ago ...
Send link 🙂
Goodbye Stable Diffusion and expensive AI image generators! Because FLUX is here, and it's absolutely MIND-BLOWING! This FREE and OPEN SOURCE text-to-image AI model generates images of UNBELIEVABLE quality that is never been seen before. In this video, I'll show you why FLUX is the MOST POWERFUL and FLEXIBLE image AI available right now and how ...
Extraordinary feet from Flux.Schnell
Flux.Schnell
I wonder if flux uses parentheses and all those things we used for other models...
You mean like ((((red)))) or [[[[blue]]]]
I must try it
yeah those
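For reference, this is what that notation means in the A1111 web UI convention: each pair of parentheses multiplies attention by 1.1 and each pair of square brackets divides it by 1.1. Whether Flux responds to it is the open question here; the sketch below only decodes the notation, and `nested_weight` is a made-up helper name.

```python
def nested_weight(token):
    """Strip matching outer () or [] pairs and return (word, weight).

    Assumes the A1111-style rule: () multiplies by 1.1, [] divides by 1.1.
    """
    weight = 1.0
    while len(token) >= 2:
        if token[0] == "(" and token[-1] == ")":
            weight *= 1.1
        elif token[0] == "[" and token[-1] == "]":
            weight /= 1.1
        else:
            break
        token = token[1:-1]
    return token, weight

for example in ["((((red))))", "[[[[blue]]]]", "green"]:
    word, weight = nested_weight(example)
    print(f"{example} -> {word} at weight {weight:.4f}")
```

Four nested parentheses come out to roughly 1.46x and four nested brackets to roughly 0.68x.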
Is it possible to read out the words a model has been trained on?
And maybe some statistics about it?
I got a good result with inpaint, but each time I generate the image, the height of the image has decreased. Why? I take the dimensions of the image with the auto detect size
dont think people stated its impossible. Most said that it depends on how it is distilled. Even if it were aggressively distilled, you could finetune it, but you will degrade the model by doing so. This could still be the case here because no one is clarifying anything.
The CEO of Invoke...
yea well he is still one person and he probably was hung up on the term "distilled"
I think Lykon also said SD3 would beat it because it wasn't trainable.
they may be known names, but that doesn't mean they have the technical knowledge to make such claims. Reddit has lots of technical people saying that it is likely trainable
Anyway, the link still stands that there are people actively trying to fine tune and create loras for it.
Well, that was the point of my original post.
definitely yea, but we can't make a surefire call yet until we get official answer (or reverse engineered)
i know, but if people say something is impossible, it can usually be dismissed entirely. It wasn't even up for debate if it were possible or not, just if it is even worth doing it
I do think SD3 medium will get more finetunes than Flux because it is more accessible. SD3 medium is a 2B parameter model, and SD3.1 will be a 2.5B model, so a lot of people will gravitate to SD3 for its smaller size and lower requirements. Compare that to the behemoth 12B model known as Flux, which has higher requirements to use. Training SD3 is cheap and possible on consumer graphics cards, while Flux requires a couple of H100s to train a full checkpoint
Let's see how SD3.1 looks before we make that decision 😄
Once bitten...etc.
Yes, it depends on how well SD3.1 performs as a base model, if SAI doesn't mess up SD3.1 then more people will use it and fine tune it
Flux isn't optimized at all yet, so we should wait. finetuning a 12b model should be possible with a single 80g card if it gets optimized
And it isn't really fair to compare a 2.5B model to 12B model, its like comparing gemma2 2B to llama3.1 70B, the more fair comparison is between SD3 ultra and Flux
I haven't compared it.
I am not saying that you did, but there's a lot of people in this discord and on reddit that compare the two, and i don't think thats fair
But there still isn't a single consumer graphics card currently available that has anywhere near that amount of VRAM. Compare that to SD3 medium, which most consumer graphics cards are able to fine tune. If SD3.1 turns out to be a good base model, i do think SD3 will be the model with the most fine tunes
yea maybe, but flux is an alternative for people that want to get the best quality. I also hope flux devs will release a 4b 6b or whatever at some point
Black forest labs are currently working on a Text2video model
They don't plan on making any smaller versions of Flux
I don't trust SAI either, but the people with lower end computers don't have a choice, do they? So if auraflow doesn't manage to reach SD3 medium level of quality, most people don't have an alternative
Runpod? I mean how many images do you want to generate
you can generate 1000s of images for 1-2 dollars lol
48 gig is more than enough for inference, and the card is $0.49 per hour
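The "1000s of images for 1-2 dollars" claim can be sanity-checked. This sketch reads the quoted price as $0.49 per hour (an assumption about what was meant) and works out how fast generation has to be for the budget to hold; the function name is hypothetical.

```python
def seconds_per_image_budget(budget_usd, hourly_rate_usd, image_count):
    """Maximum seconds each image may take to fit `image_count` images in the budget."""
    rented_hours = budget_usd / hourly_rate_usd
    return rented_hours * 3600 / image_count

for budget in (1.0, 2.0):
    limit = seconds_per_image_budget(budget, 0.49, 1000)
    print(f"${budget:.0f} for 1000 images allows about {limit:.1f} s per image")
```

At that rate, 1000 images for $2 works out to roughly 14.7 seconds per image, and for $1 to roughly 7.3 seconds per image.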
Well some people like me, love running ai on their own hardware or some people can't pay due to sanctions or any other situation
and lora training is also possible with 24 gig cards on flux
but that is a small amount compared to the people that do use runpod
and wanting to run AI on your own device would not mean you dont have the option to use other services in the end
especially if you get a lobotomized model as alternative
These first model releases will be always lobotomized, like SD2.0 was a shitshow of a model but then SD2.1 proved to be promising but was overshadowed by SDXL, and when SDXL released people disliked it and refused to use it
any of these releases wasn't nearly as bad as SD3. All those prior models didn't have a real concept of anatomy because of training data lack, but SD3 got trained to misunderstand anatomy due to deliberate training data
And the worst part about the release is that SAI still hasn't acknowledged their wrongdoings
just look in their discord, post an image of garbled mess
and get called low skill low idiot bob
SD3 wasn't trained on wrong anatomy on purpose, it was just all Nude human bodies were nuked from the dataset which caused the model to completely fail in anatomy
when?
before he left
he explained how they censored it
there is a screenshot in here somewhere from a few months ago
Thats why you have to use some really nasty negative prompts to get a somewhat okayish anatomy
they used random gibberish to nuke nudity from the model but messed it up so the concept was assigned to anything female
Well then mcmonkey (previous SAI dev that left with comfy) said something completely different, here is a screenshot from mcmonkey showing an image of SD3 medium before any safety tuning
it is specifically only for woman lying on grass
this concept wasn't trained. But nudity has been nuked via training data
anatomy does not get destroyed just because you don't have nude bodies in your training data
Also pretrain only introduces basic concepts. It is not supposed to know poses and so on
abliteration is what its called
i heard of abliteration in LLMS, but didn't know they were also applicable to DiT model architecture, interesting
yea and it makes retraining very difficult
so if they do the same with 3.1 its pointless
i hope not
but we will see
Well hopefully now that SAI has actual competition such as auraflow and Flux that they finally get their stuff together and produce an actual good base model
the more competition the better right?
yea especially if the competition is insanely good lol
Well after getting a new CEO and doing a bit of reorganizing and getting new funding, hopefully now they can focus on making good base model now that they aren't in risk of bankruptcy
We hope for the best, i hope we will get their 8b model as well at some point
because that seems to be actually competitive
and can be finetuned on 48gig cards according to some tool creators
They said that they will release it at some point, lets hope that they stick to their words
I was talking about SAI's SD3 8B model
i know. pretty sure that's what 3.1 is
however - that's the latest update
3.1 is 2.5b parameters tho as of currently
yeah, i know.
then what are you trying to say here?, that SAI is currently training a new 8B model for 3.1?
that this is what they are working on. and that maybe they've figured out a way to take the large model, reduce the size, and still get the results
so distilling the model into smaller sizes like an LLM?
possibly, yes
This method had worked for LLMS, but SD3 uses a DiT architecture, so maybe they managed to translate it and make it work somehow?, but we will have to wait until the model releases to see
not sure, but since the push everywhere is to find a way to run the models on mobile devices, i'm sure one of the top of the list research is how to make it small and still get the results
I love this one
not just the style, but the content
in either case, sending ppf noise with its own random seed at 90% denoise allows some nice variation 😄
futurist eiffel tower
Flux looking goated
Now it's a question can I run the fp8 on my M2 pro macbook pro (16gb ram) 
Note taken my m2 pro macbook does not like Flux just killed the script lmao
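A back-of-the-envelope estimate shows why 16 GB is tight. The sketch below counts model weights only, using the 12B parameter figure quoted earlier in the chat; text encoders, the VAE, activations, and the OS itself all come on top of that. The function name is made up for illustration.

```python
def weight_gigabytes(param_count, bytes_per_param):
    """Approximate size of the raw weights in GB (1 GB = 1e9 bytes)."""
    return param_count * bytes_per_param / 1e9

FLUX_PARAMS = 12e9  # "12B" as stated in the conversation

print(f"fp8  (1 byte/param):  {weight_gigabytes(FLUX_PARAMS, 1):.0f} GB")
print(f"fp16 (2 bytes/param): {weight_gigabytes(FLUX_PARAMS, 2):.0f} GB")
```

Even at fp8 the transformer weights alone are around 12 GB, which leaves very little of a 16 GB unified-memory machine for everything else.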
I would visit a doctor ... 😂
flux-schnell
nice. Just imagine the next Stellaris SciFi game having such cute creatures 😄
I fed your image to JoyCaption on HuggingFace, and then prompted Kolors with the result, worked surprisingly well.
"This is a high-resolution photograph in a fantasy style, featuring a young woman with short, platinum blonde hair and pale skin. She is dressed in a shiny, metallic silver armor with intricate details, including a high collar and shoulder guards. Her expression is intense and focused, with her eyes wide and slightly glowing. She holds a translucent, glowing purple goblet in her right hand, which emits swirling, ethereal purple mist. The background depicts a dimly lit, medieval stone corridor with arched doorways and lit candles in sconces on the walls, casting a warm, flickering light. The walls are made of rough-hewn stone, adding to the ancient, mystical atmosphere. The lighting is a mix of natural and artificial, creating a moody, magical ambiance. The textures in the image are highly realistic, from the smooth, reflective armor to the rough, aged stone walls. The overall mood is one of mystery and enchantment, with the glowing purple elements contrasting sharply with the dark, stone surroundings."
Flux has overshadowed everything right now but kolors is really great.
And what is SD3?
"Keep it for kittens .... "
If they don't perform with SD3.1 they are off the market ...
And I am really sorry to say that. But others showed they can perform more with less money ....
I think even in a Flux world Kolors probably has a place, it's not even as demanding hardware-wise as SD3 Medium, just about the same as SDXL
but brings really strong prompt adherence
Both are pretty cool ...
this is mostly because e.g. someone with a 6GB Turing+ Nvidia card and 16GB system RAM could successfully load the Kolors UNET and at least the FP8 ChatGLM encoder, which is all it needs
but the same person would not be able to load CLIP-L + CLIP-G + T5 FP8 + the SD3 Medium transformer model simultaneously
I guess SD3 Medium does work with ONLY T5 though, I think I tried it once
can't remember how results were
Yeah kolors does a really good job of styled midjourney style which is great a lot of the time. Particularly for single subject where you don't need the extreme prompt following of flux
You don't work for yesterdays hardware ...
well my point is just Kolors has very exceptionally good prompt adherence for the requirements. You could say that it provides exactly (or probably better than) what the never-released SDXL version of Ella would have
Kolors is pretty cool ... used it the last weeks ...
anyways back to my original point, JoyCaption detailed outputs I think are the best from anything ever, I think something that used them for its dataset would slap hard. Excited to see it get developed further / released fully
the way it intersperses commonly used prompt words and booru tags into the sentences is sick
I might sound like a bad guy ,,, but often I start working with a model ... telling people here how to use it ... and later they trend ,,,
I like this one more.
Try 6 steps, it performs a lot better usually. After 6, it becomes worse imo.
i dunno man, I've been throwing people's Flux prompts at Kolors all day now
my conclusion is steering rapidly towards "Flux is really not even close to as good as a model with 12B parameters actually should be in comparison to everything else"
https://civitai.com/posts/5075351 
the gap should be a whole lot wider than it actually is if you ask me
A post by diffusionfanatic1173. Tagged with illustration, photography, character, and woman.
perhaps use your SD3 t5xxl encoder prompts then
blurry, bokeh, depth of field, cartoon, anime, 2d, illustration, traditional media, sketch, painting \(medium\), watercolor \(medium\), painterly, worst quality, low quality, normal quality, lowres, unfinished, low res, pixelated, jpeg artifacts, scan artifacts, simple background, bad anatomy, bad composition, bad proportions, bad perspective, bad arm, bad leg, bad feet, bad hands, missing finger, extra digits, missing eye, closed eyes, asymmetrical irises, cross-eyed, lazy eye, disfigured, deformed, broken, ugly, missing limb, missing arm, missing leg, extra limbs, extra arms, extra legs
LOL
Kolors on left, Flux on right, same prompt for both. No question flux is in a different league than kolors.
Kolors, just like midjourney is stylized to the point of the image collapsing.
it's really good for some things, but for a lot of stuff it's just "too much".
massive negative prompt is not a fair comparison anyway
its harming both models too much
negative prompts shouldn't be used anyway
I wouldn't quite go that far
but in the majority of cases they are very harmful yes
i was testing earlier with a 1.5 model where that all actually has meaning, can try again without it if you really want
would recommend an empty negative prompt 99% of the time
not really, like, "bad anatomy, bad composition, bad proportions, bad hands, bad leg, bad feet" are all literal E621 tags that have distinct meaning to Pony, for example
makes no real difference I guess for Kolors on that one:
whereas for a photograph
someone would not caption a photo as "bad image"
my advice in general doesn't rly apply to Pony
I don't see how Kolors is "too midjourney" either TBH, if anything Flux strikes me as having Dreamshaper Girl Face a lot
SD3 Medium actually IMO is the most "hard realistic" model going ATM
it often produces stuff that really looks like unprocessed reality
SD3 Medium actually IMO is the most "hard realistic" model going ATM 100% agree
SD3 medium gave the most realistic people I have ever seen
at least to my eye, Kolors is MJ style though
Yeah, neither model should even really be using negative prompts and if you do, you only put in something specific to hard steer it if the positive prompt is having a hard time reliably managing it.
i go that far because in around 99.9999999 percent of the cases, there's no valid reason to use them, and in the fraction of the cases where there is, most of them are used incorrectly.
People have to get out of the 1.5 mindset lol
Even sdxl doesn't need more than a few tags in a negative prompt usually
Almost everything you've seen of mine on here has an empty negative prompt
its hard to make a blanket statement because there are workflows that use the negative prompt differently
and everything i post has an empty negative prompt
i did run into a guy earlier today that uses negative prompts like a fine tuned eraser to remove specific style elements from images, which is an interesting case
But people can't break the dumb sd1.5 negative prompt superstitions lol
Usually just adding in a , , in the pos is all you need to make micro adjustments
if i'm not going to leave it totally blank, i'll usually just use a single . i make my adjustments in the positive prompt
did you try prompting for that?
i mean remove them and have crisp image
the CADS node injects gaussian, exponential, or normally distributed noise into the unconditional, for example
(unconditional means negative)
yep, like nipple or nsfw if you absolutely dont want nudity in it. the other use cases for me are when my pos prompt is already really long and i dont want to water it down with more details
but only if it's not cooperating with me
yes. and the other thing people that use negative prompts fail to understand is that the AI isn't going to understand them if it didn't see stuff labeled with their terms when it trained. how many images were in its data set that were labeled with "poorly drawn hands' for example?
exactly
so it's either going to ignore that, or just get the data for each word and go down rabbit holes they don't want it to
you'll hear me say shit like "and how many images in the dataset do you really think were captioned with ___________"
yup. i do the same. far too many people think the AI has all the world's knowledge and they can just make stuff up, and it'll know not only what it is but what they actually mean
looks like its an animatediff issue cuz when i generate image the color is bright and crisp no blooms.
poorly drawn hand would just get autocaptioned to something like abstract hand
so youd want to use terms like abstract instead
i'll tell people 'think of the AI as if it were an artist that has lived all its life in a dark box and only knows what it was taught. nothing else. and it can't read your mind
I haven't got it to work yet but the Perpneg node is meant to help a lot
it forces the negative prompt vector to be perpendicular to the positive prompt vector
so the negative is less harmful
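The geometric idea behind "perpendicular" can be shown on plain lists: subtract from the negative vector whatever part of it points along the positive vector. This is only an illustration of the projection step under that description; the actual node works on tensors inside the sampler, and the function names here are hypothetical.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def perpendicular_component(negative, positive):
    """Remove from `negative` its projection onto `positive`."""
    scale = dot(negative, positive) / dot(positive, positive)
    return [n - scale * p for n, p in zip(negative, positive)]

positive = [1.0, 0.0]
negative = [0.6, 0.8]   # partly overlaps with the positive direction

perp = perpendicular_component(negative, positive)
print(perp)                  # the overlap along `positive` is gone
print(dot(perp, positive))   # zero means the two are perpendicular
```

What is left of the negative no longer pushes against the positive direction itself, which is the sense in which it becomes "less harmful".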
and then you're using a general term with vectors scattered from one end of latent space to the other
right, which culls out a lot of downstream concepts associated with it
doesn't remove them all, but tones down their values
i try to use the illustration of cutting holes in the AIs memory, or putting force fields around areas - and ask them just how much data is left for the AI to actually use
just learn how to prompt right and you don't need negatives almost ever
imagine it more like pos/neg magnetic polarities
i'll start people off by telling them "what do you think the AI thinks your phrase "xxxxxx" means? go ask it. just use that as the only prompt and see what it does by default
I also stopped using positive prompts so much
I turn off the positive for the first 10% and the last 30% sometimes
and they come back in shock that their pet terms got them garbage renders
want to know what's in the data training set 😉 don't use prompts or image conditioning at all
you can also set the cfg to zero and zero out the positive prompt to see what kind of anti-image you're actually making with the noise
ah yeah I love the fully unconditional generation its like dreaming
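The standard classifier-free guidance formula makes the cfg settings mentioned in this chat concrete: the sampler starts from the unconditional (negative) prediction and moves toward the conditional (positive) one by the cfg scale. A minimal sketch on plain lists, with made-up numbers; individual UIs and guidance-distilled models may handle guidance differently.

```python
def cfg_mix(cond, uncond, cfg_scale):
    """Classifier-free guidance: uncond + scale * (cond - uncond), per element."""
    return [u + cfg_scale * (c - u) for c, u in zip(cond, uncond)]

cond = [1.0, 2.0]     # prediction with the positive prompt
uncond = [0.0, 0.5]   # prediction with the negative / empty prompt

print(cfg_mix(cond, uncond, 0))  # cfg 0: only the unconditional prediction
print(cfg_mix(cond, uncond, 1))  # cfg 1: only the conditional prediction
print(cfg_mix(cond, uncond, 7))  # cfg 7: pushed well past the conditional
```

At cfg 1 the negative prompt drops out entirely, and at cfg 0 the positive prompt does, which is why a cfg 0 render shows what the negative side alone produces.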
yeah, but usually i'm talking to people that are using sites where they can't change most of the settings
ahh true true
when I use negative now I try to only have it on for only about 20% of the sigmas, somewhere in the middle
and sometimes only send it to one block
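The scheduling described here (positive off for the first 10% and last 30%, negative on for about 20% in the middle) can be pictured as step windows. This is a simplified sketch that maps percentages onto step indices; the chat talks about sigmas, so treating them as evenly spaced steps is an assumption, as are the function name and the 20-step run.

```python
def active_steps(total_steps, start_pct, end_pct):
    """Step indices where a conditioning is active, from start_pct up to end_pct.

    Uses integer percentages to avoid floating point rounding surprises.
    """
    start = total_steps * start_pct // 100
    end = total_steps * end_pct // 100
    return list(range(start, end))

TOTAL = 20  # example step count

print("positive:", active_steps(TOTAL, 10, 70))  # skip first 10% and last 30%
print("negative:", active_steps(TOTAL, 40, 60))  # middle 20% only
```

With 20 steps the positive prompt is active on steps 2 through 13 and the negative on steps 8 through 11.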
me "your fine tuned model there is overfit for anime girls with large chests" them: 'no it's not" me: "i used the prompt man riding a motorcycle and got a woman. explain that"
yeah so many models overfit on women
so many people making models that don't know what they're doing, too
I am the biggest PAG fan
PAG is absurdly strong
even a tiny drop of PAG like 0.3 can fix a broken image
yeah, that's why i don't pay much mind to overtrained models. sure, they can produce really amazing results for certain tasks, but i make all kinds of random shit. i don't want my space ship having an ahageo face flying out of a dripping orifice instead of a wormhole...
merging two merges that were made from merging two other merges...
i just notice that animatediff ignores , sharpness prompt while image generation apply it.
a lot of them weren't superstitions, they were just incorrectly applied in some cases, e.g. "masterpiece, best quality, high quality, normal quality, low quality, worst quality" specifically originates from how NovelAI categorized the images in their original dataset, and has very important meaning in all 1.5 models that descend from NAI. There are also SDXL models that were trained with the same sort of categorization. So it's really always been model dependent
that one quote from the SAI employee about masterpiece not meaning anything in Base SD 1.5 was never relevant, it was always missing the context / point
I mostly stick with Juggernaut because its nice and general
but I recently switched to Zavychroma because its more aesthetic
it's not superstitions as much as people seeing something like that and just blindly using it without knowing why it was crafted or used to start with
yeah people do use those ones a lot in photoreal models that don't have any NAI DNA, or weren't specifically trained on them separately
prompting any anime SD 1.5 model without them though gives significantly worse results
I think for models specifically derived from Novel AI you have a point yes
like I was saying to the guy earlier Pony is also different
that was me. But I could go on, like "bad hands" is also a Booru tag that does in fact have meaning to various non-Pony models, some for SDXL, some for SD 1.5
so again it's really always model dependent
yes, as well as needing to be very specific
and only used if really needed
another thing is to not run control nets for all the sigmas
you can often turn them off after 30%
if you run it for less long then sometimes that allows you to run it stronger
I don't actually use control net that much though
I never really know where to get the input images for control net
the only time i use controlnet is if i'm using qr code monster
haha yeah the QR code thing is cool
yes it is 🙂
what was the prompt for this one BTW
if I get blender I could make depth maps in blender I guess
i have tons of femal videos dancing on face and insta , if one of fam found them i gonna have alot of explanation to do 
I have an experimental de-compressed version of Flux-schnell trained. It won't generate well on its own without the guidance embeddings. So I am training those from scratch now. I am also training my first real LoRA test on it (IKEA instructions). 🤞
nice
you can make depth maps in krita
ah okay
you can actually do a lot in krita, you should search youtube for krita stable diffusion
if u need a depth video converter use amuse 2.0 by AMD
I think depth maps on low strength might be good for me because I could get different shape spaceships
krita almost seems more popular than gimp these days
I haven't tried it
krita is far easier to use than gimp
another one you should look at is https://www.pdhowler.com/
Paint and Animate with Howler Digital Painter
particle effects, 3d rendering, real paint program as in how you would really paint with real brushes and physical media, scripting, and AI now
and priced at the terribly high price of less than 30 bucks
krita has an ai plugin that is crazy good, it supports rendering of animation
thanks I didn't know about Howler
most people don't. it's been around a very long time, best kept secret of the internet. incredible program
Regions are new in Krita Diffusion Plugin v1.18.0! Set up area-specific text prompts and control layers. They are linked to layer groups, and can be re-used throughout your entire workflow.
Website: https://www.interstice.cloud
GitHub: https://github.com/Acly/krita-ai-diffusion
This is an open-source plugin which you can run free on your local ...
take a look at krita ai!
it's photoshop ai made better + free!
Hordes of zombies with glowing eyes and decaying flesh fill a massive stadium. A lone rock star with a spiked mohawk and ripped leather jacket plays a flaming guitar on a raised platform. Neon spotlights illuminate the sea of undead, their rotting arms swaying in unison. Fireworks burst overhead, raining sparks onto the zombified audience. Zombie mosh pits form near the stage, with limbs flying and heads rolling. A giant banner stretches across the top, reading "Brains & Beats Fest" in pulsating, green letters. Zombie roadies with exposed bones push giant speakers. Fog machines spew green mist, creating an eerie atmosphere. Skeletal hands reach up from the crowd, grasping at the air. Some zombies crowd surf on top of their undead peers. The drummer, a half-decomposed corpse, pounds on a set of skull-shaped drums. Zombie backup singers with microphones groan into them rhythmically.
how can I get the dimension of this image and put there?
if u can save it as an image then read its dimensions and put them there
run the image node from mask out to a save image node
I will try
how can I put more than one controlnet here? When I make a copy and link it, the other one gets disconnected
faces are brilliant with FLUX ..
even better with cascade
nah
tried cascade too
and kolors
this FLUX has huge potential but training will take a lot of resources
and im just using the schnell version of FLUX .. the Dev version has better quality from what ive seen
running dev on my system lags my pc
yeah, schnell is the worst version of the 3, but unfortunately it's also the one with the Apache 2.0 licence
it means that people will likely finetune this model instead of dev
for lora it could be fine I guess? because I believe that a schnell lora could be compatible with a dev lora? not sure about that though
where can i use it
i wouldn't use the word worst to draw a comparison, schnell beats sd3 in many ways, but dev version has better detail
it has better detail and better trivia too, it just knows more stuff and from what I've seen, people are only using dev right now, no one cares about the "inferior" version
or if you don't want to pay for it, https://www.mage.space/ here
if someone wants to finetune schnell, I think its main goal would be to get a "finetuned schnell" that would be genuinely better than "base dev"
no, it's not inferior
you are talking out of your ass dude
never said it was bad
go sulk somewhere else
if schnell was "equal" to dev, people would use that one instead
because we can go for only 4 steps on schnell
that's the difference between them. they aren't inferior to each other. one is not better than the other.
correct
there's the github page
and just to be fair yuri schnell is designed to work on pc that don't have high end gpu
and the distilled variant is in a lot of cases, better unless you are a serious researcher
if it's not "equal" then it means one is superior to the other
schnell is fast, but gives worse quality
no. not any more so than an apple being different from a banana means one is superior to the other. they are different
they have different uses
in terms of overall quality, schnell is worse than dev, or else everyone would be using schnell right now, who would want to go for 20 steps when you can go for 4 on schnell?
there's a reason they decided to not give apache 2.0 licence to dev, it's the better local flux version of the two
@flint grail who do you actually work for and why are you actually here? this isn't the black forest discord. go cause a battle over there
i have a pretty good impression of your mindset yuri.. now stop spewing garbage
what the fuck? I'm trying to give some arguments there, all your responses are ad-hominems
most people ARE using Schnell right now
you're the troll in this case
that's not a constructive argument
schnell, with its low resource requirements, still sets a decent standard
you can't be your own judge and insult people when they disagree with you
stop acting like a child
almost no one's using dev right now. they're not even trying to figure out how to train it. they're all working on Schnell
im judging by result
while you are spouting blindly
someone must have triggered your vulnerable spot about flux models
you're the one spouting blindly, I've tested the both of them and I find schnell just worse in quality
it really doesn't matter though, this is the stable diffusion discord, not the black forest discord. so if you have a real argument, take it to black forest
then tell me why dev doesn't have the apache 2.0 licence?
or L2 where they're developing
the way you're framing the results as inferior is rather blunt headed
why don't they want us to finetune that model?
cause that's not how they chose to release it
go ask Robin
if schnell is "equivalent" or "as good" as dev then dev would be useless
that's my point
and tbh after testing flux for 2 days now i think schnell sets a pretty decent bar for high standards among most other models out there
okay. good. but all your arguments are worthless here. no one has your answers. go ask Robin why he's doing what he's doing
I didn't know there was a flux discord, yeah I should check that out, sorry if I went off topic for too long
what i was telling you is dev has better detail than schnell but that's not to say schnell is inferior ... do you understand how that sounds when you call schnell inferior...lol
there's a flux github page. that's the place, really, to post. in the community which is fairly active
you admit that schnell is inferior to dev on details, I make the conclusion that it's inferior overall, you don't, that's where we're arguing at, you can disagree with me on the conclusion, it's all right
or the schnell huggingface community https://huggingface.co/black-forest-labs/FLUX.1-schnell/discussions
sure dont have to agree with me
now you stop arguing and go post on the black forest community
and dont expect me to call schnell inferior
I don't expect anything from you
but I expect you to stop insulting people who disagree with you
good enough
that's simple enough right?
this is a public discord, not your house dude
people have the right to talk and disagree with your suggestions
is there a free SD to use?thanks
sure .. but have some common sense when you go around calling schnell inferior that's bs and not acceptable
why do you act like I insulted your mother or something? if I want to call schnell inferior and you're triggered by that, then you have a serious problem
you stfu
I'm not the one insulting people because they dared to disagree with my point of view
you have some serious anger issues
where can i find SD bot?
sure, what version of it?
what SD bot?
anything i can run on my pc. mac m2
um, that might be a better question for the #🤝|tech-support channel
ok thanks
i saw someone is using stable diffusion on discord for free
which discord?
im sure of it
i am too, but there are a lot of different implementations of it. which discord did you see it on?
interesting thread on negative prompt https://www.reddit.com/r/StableDiffusion/comments/ybn13h/investigating_common_negative_prompts_using/
ROFL! we discussed that in detail earlier. good find
i kinda set negative clip with FLUX and adjust the cfg .. but result didn't matter with or without negatives
The dev and schnell models are distilled and don't actually use cfg
So they don't use negative prompts
yeah i read that, the workflow im using has the option to tweak cfg
You tweak the guidance and the shift
Cfg should always be one if you're using a regular ksampler
You adjust the guidance on the node between the model and the ksampler
that's what im using set at 1 but not using guidance in this workflow altho i have other custom workflow with guidance set at 3.5
Default guidance is 7 iirc, at least it is on their official version on diffusers
you should look up the conversation that @unique condor and I had on negatives earlier
the workflow that i used sets guidance to 3.5
and that's intended for FLUX
is there a full paper out on Flux yet?
no. and if it takes too much longer, the entire internet is going to beat Robin's door down i think
Since it's distilled, you should think of it like a turbo or lightning model where they'd normally use a cfg of 1-2 and no negative prompt. That's the whole point of distilling them: to weed out garbage and not need a negative prompt to steer it away from the garbage
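Why the negative prompt stops mattering at cfg 1 can be seen from the usual classifier-free guidance mix. This is a toy sketch with made-up numbers, not Flux or ComfyUI code; `cfg_combine` is an illustrative name.

```python
def cfg_combine(cond: list[float], uncond: list[float], cfg: float) -> list[float]:
    """Standard classifier-free guidance mix: uncond + cfg * (cond - uncond)."""
    return [u + cfg * (c - u) for c, u in zip(cond, uncond)]


cond = [0.5, -1.0, 2.0]      # toy prediction for the positive prompt
uncond_a = [0.0, 0.0, 0.0]   # toy prediction for one negative prompt
uncond_b = [9.0, -3.0, 4.0]  # toy prediction for a very different negative prompt

# At cfg = 1 the uncond term cancels, so both results equal cond.
print(cfg_combine(cond, uncond_a, 1.0))
print(cfg_combine(cond, uncond_b, 1.0))

# At cfg > 1 the negative prediction does change the result.
print(cfg_combine(cond, uncond_a, 2.0))
print(cfg_combine(cond, uncond_b, 2.0))
```

The separate "guidance" value discussed here is a different knob: it is fed to the distilled model as an input rather than being applied through this formula.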
i have no complaints about negative either way... so far with FLUX and w/o negative results are pretty good
@jovial tiger After yesterday's discussion about intermediate upscaler models for Flux, I did a detailed comparison with my test image (with seed fixed on 2nd stage to keep things as consistent as possible). Note that this is a single image, though I find its combination of features to be useful, and most of my comments are nitpicky. If you saw each image in isolation, you might not have any objections.
I compared five models: 4x-UltraSharp.pth, 4xRealWebPhoto_v4_dat2.pth, 4xRealWebPhoto_v3_atd.pth, 4xNomos8kSCHAT-L.pth, and 4xNomosWebPhoto_RealPLKSR.pth. tl;dr, I still prefer ultrasharp, as I find its detail to be prominent, but controlled and realistic. The others still work mostly fine except for the DAT, for a specific reason I note below. I did my comparisons at 200% zoom to concentrate on the details.
4x-UltraSharp.pth
Low contrast areas are a bit featureless
Texture is convincing, not repetitive or smeared
Oversharpening of high-contrast edges (branches, man) is suppressed
Slight noise over entire image, not preferred but not objectionable
Retains details in silhouetted branches in upper right
Realistic detail in brush on lower left/right
Alien's right hand has deformed first finger
4xRealWebPhoto_v4_dat2.pth
Low contrast areas are featureless
Skin texture not so convincing, looks like a carpet
Oversharpening of high-contrast edges NOT suppressed
Noticeable noise over entire image, noise has vertical stripes -- objectionable
Silhouetted branches in upper left & right have no detail
Brush in lower left has slightly less depth (more "hair fan" than "hair ball")
Taller shrubs are denser
Dirt and rocks at the bottom are slightly grittier
Alien's right hand is a bit less deformed
4xRealWebPhoto_v3_atd.pth
Low contrast areas are a bit featureless, similar to Ultrasharp
Texture is convincing, a bit less pronounced than Ultrasharp
Oversharpening of high-contrast edges is pronounced
No noise over entire image! Very smooth.
Silhouetted branches in upper right look completely flat & sketchy/outlined
Brush in lower left has slightly less depth, but lower right has a bit more
Taller shrubs are too dense/twigs are chunky
Dirt and rocks at the bottom are slightly grittier
Alien's right hand is a bit less deformed, forearm is blurry
Right foot is a different interpretation, has a hole in it
Man's right hand is a bit garbled
4xNomos8kSCHAT-L.pth
Low contrast areas are a bit featureless, slightly better than Ultrasharp
Texture is okay, better than the DAT, but more generic than Ultrasharp
Oversharpening of high-contrast edges only slightly worse than Ultrasharp
Slightly more blanket noise than Ultrasharp, not objectionable
Silhouetted branches have no detail, but more coherent than ATD and DAT
Brush in lower corners is a bit flatter than Ultrasharp
Taller shrubs are denser, but twigs still thin -- looks natural
Dirt and rocks at the bottom are less contrasty than Ultrasharp
Alien's right hand is less deformed and arm is still in focus
4xNomosWebPhoto_RealPLKSR.pth
Low contrast areas have more detail
Texture is very generic/checkerboardy, too pronounced
Oversharpening is not suppressed
Pronounced blanket noise, objectionable coarse grittiness to fog & hills to right of alien
Silhouetted branches retain more detail, but less than Ultrasharp
Brush in lower left is flatter, but lower right is more contrasty
Taller shrubs are denser
Dirt and rocks at the bottom are slightly grittier
Alien's right hand is less deformed, arm in focus
Alien's right foot has a hole in it
also taken sdxl examples, the key difference i feel between sdxl and flux is that flux looks more organic and natural
well it should, it's a 12b model with a 16 channel vae
literally 6x the network size of sd3 2b. size doesn't always matter and all, but still, it's a lot more room to store information
sd3 had huge potential too if it wasn't for botched up anatomy, but in general the quality was pretty vivid
what got me pissed about sd3 is that every few render with female models things got f*ed up
and with FLUX, 0 cherry picking
thus my comment that in most cases the distilled version was 'better' - for the end user it's better, or at least easier, to use for good results
yeah for sure
100%
the only issue with distilled versions though is that they get gimped on variety to a degree
FLUX also has very high aesthetic standards for a basic base model
but it's a good tradeoff
it's not anatomy. it's all subjects. the farther you get from straight in front of the camera, the more the subject warps, stretches, shrinks, and the AI starts trying to draw it from multiple points of view at the same time. the issue is in the core of the architecture and flux has it too. but flux is so huge it's almost not noticeable, where 2b medium is so small, it slams it in your face
ahh sounds reasonable
try these two maybe:
https://openmodeldb.info/models/4x-LexicaDAT2-otf
https://openmodeldb.info/models/4x-LexicaHAT
tons of indepth bug hunting to track down what was happening.
although i dont have technical analysis from programmer's pov
Oh, great, more pixel hunting for me! lol
you can see the effect in this truck. we're up in the air, looking down at it, at an angle, and it's angled. it's shortened, with the bed almost all the way pulled up to the cab
the lexica dataset is AI generated
which might help with your golem thing
the photo upscalers expect a photo really
with transformer upscalers you can't go too far out of their training distribution
they are not as broad as diffusion
sure but the thing is we have had pretty decent results with sdxl and sd 1.5 so people had a disappointing experience with sd3
also look at the stretching on the wings of this bird, and its leg and tail are so shortened they're almost totally gone
as opposed to this bird that is directly in front of the camera
i know, but what they were expecting was the much larger 8b model, not the small 2b model
SD3 failed on everything except speed and backgrounds/landscapes. Emad dropped the ball.
emad had nothing to do with it. the model wasn't supposed to be released. the community here and on reddit that decided to start screaming that stability wasn't really going to open source forced them to open source an unfinished model. they demanded it, they got it
i think Emad had already left the company before SD3 was released
I know but he'll be their mascot forever.
First tuned version of FLUX , brought to you by @Machinedelusion over on @HelloCivitai
https://t.co/JvXcJJPl7R
lol you have those gif still
🤣
awesome, reading it
did you look at this https://huggingface.co/Comfy-Org/flux1-dev/blob/main/flux1-dev-fp8.safetensors
not fine tuned but they released a single checkpoint version of Flux Dev
at some point i was probably on that page when trying to figure out why i couldn't get comfy to run flux, but no, i haven't actually done more than glance at it.
well the size shrunk to 16gb
good 🙂 - wonder what they did
i was trying it out few hours ago but unfortunately it lags my pc
ah. it's fp8
they didn't create a model card, either.
no
probably discussion about it on L2
probably they rushed the release idk
no idea. he's part of the comfy team, that's all i know about him though
Flux flex.
No hype, all delivery, that would have been good even if we had to wait 2 weeks after the initial announcement.
Sora will also probably disappoint everyone when it drops, if it does.
Flux gud
They just quantized the transformer to q8 with the en8fpztuehs w/e the hell format and then saved it. Comfy made a version for both flux models and packed the same q8 version of t5, clip L and the vae
So it's an all in one vs having to have models in the unet/clip/vae folders
oh and the nice thing about it is that it not only saves some disk space, it saves loading time if you plan on using it in q8 anyways. otherwise, you have to load the full fp16 model and quantize it on the fly, which takes up a bunch of ram and time.
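A back-of-the-envelope sketch of the disk and RAM saving, assuming the "12b" parameter count quoted in this chat for the transformer and counting raw weights only (no file overhead). `weights_size_gb` is an illustrative helper, not part of any tool.

```python
def weights_size_gb(num_params: float, bytes_per_param: int) -> float:
    """Approximate size of raw weights in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


params = 12e9  # "12b" transformer, as quoted in this chat

print(weights_size_gb(params, 2))  # 16-bit weights: 24.0 GB
print(weights_size_gb(params, 1))  # 8-bit weights: 12.0 GB
```

The all-in-one file also packs the text encoders and the VAE on top of the transformer, which would account for the rest of the roughly 16 GB mentioned above; the exact split is not given here.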
Soon...
I tried this - no real difference to Dev or Schnell!
Flux.Schnell
"AI art is the future it's so good."
AI art:
Bedroom
amazing work. sent you a DM! 🙌
someone needs to go eat
😄
food porn for the eyes, no need to eat
btw... i've tested 3 different workflows and this one seems to render the fastest, about 1 min or slightly less.
i now prompt based on real life events, i copy pasted a youtube video title here.
nice
such a massive noticeable increase in image quality on this channel as soon as Flux came out
Which model?
if it looks good then its Flux Dev
so... we've got a solution you might want to try
flux dev
Which sd version?
any idea what FLUX wants here ?
it's a similar architecture to sd 3
it's a rectified flow matching architecture
emo or hitler cat?
emo lol
Flux Dev/Comfy I've never used Comfy before and the learning curve is steep! But these turned out ok.
this is really nice, what model is this?
thank you ! ... style : default style , steampunk ...
Flux
Reminds me of a book I used to have when I was a little kid, but I can't quite remember which one it was
what is the first image art style called. I love it
Yiss that's the style I was prompting for
(artwork in the style of Andy Kehoe:1.5), drawing of a man, winter season, lion mask, contemporary, American, fantasy, painting, whimsy, creatures, dreaminess, landscapes, mystery, adventure masterpiece, best quality, very aesthetic, absurdres
The model is the base sdxl
thank you! It's so reminiscent of children's book illustration and it's wonderful
Oh yeah I know that name. There were/are a couple other artists with similar styles too
im feeling too lazy to write up complex prompt
also these pretty model faces soothe my mind
Good morning coffee! 🙂
This is my first prompt A girl walking a dog with rain coat on
but then I liked the bottom right one the most and pressed "variation"
I then get these images instead
Can I do something like this in stable diffusion automatic 1111?
When I use **Seed: -1 ** most of the seeds are just 1 number away from each other..
right. so that's their website. if you want to do that with stable diffusion, how are you running SD?
SD1.5?
Sorry, I'm running SD on my local PC
probably not like they are. they have code behind the scenes to do this. with sd 1.5 you would first need to be much more descriptive in your prompt to make sure you got the same style, and then you can do batches
you'd want to use "cell shaded, line art, cartoon," at the very least
Im not talking about the style but the option to do "variation" of that image .. easily 😄
could you use comfy UI?
this would be possible there
the structure of the image has already been set which means this was done via a noise injection after the first 30% of sigmas or so
noise injection can be done either via a ksampler (sometimes called unsampling, and sometimes called flipping the sigmas) or by just adding noise to the latent directly
you just put the prompt in, set the batch to more than 1, and just keep hitting the generate button
How do I make the seeds different from each other?
I made 2 images and there is only +1 different
Image 1: 978357606
Image 2: 978357607
you give stable one seed, you ask for a batch. it will randomize all the other seeds in the batch. that's what makes your image have variations.
it's automatic and with stable the seeds are rarely just one number different
I always get 1 number away from the other :/
maybe try putting 0 as the seed number
a lot of UI interpret 0 as "make seed fully random"
on the other service or in stable diffusion auto1111
In stable diffusion
then try what he just suggested
The seed is just 0 and 1 then..
If I do with 4 images they are 0 - 1 - 2 and 3..
i think you might want to talk to @dry crow in the #🤝|tech-support next time he's online
that sounds very odd for a1111
So with -1 .. it will be a random seed but the next seed is just +1 from the previous
so @dry crow will be around, probably later, and he's the one that you need to talk to, he knows auto1111 very well. something sounds like it's not set up right. so you might want to post the issue in the #🤝|tech-support channel
if I remember rightly this might actually be the default behaviour
its not a good default though
if you want the next image to be random you have to run it with -1 in the seed box
either -1 or 0 I can't remember which one
I do that but it will be a random number but each image after that is just +1 from the previous one.
yeah I think this might just be how it goes
if you tick the extras box you can try variation seeds
you might prefer that
I tried playing around there but was the same, you know how it works?
last time I used A1111 was 12 months ago and its a pretty foggy memory now
Its useful to put groggy faces right using ADetailer
Still, that is random
Different seeds are different seeds ... the gap between different seeds is not significant, just the fact that the succeeding number is different
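A small sketch of that point: two seeds that differ by 1 still give noise that looks unrelated. It uses Python's `random` module as a stand-in for whatever generator the UI actually uses, so it illustrates the idea rather than reproducing A1111's behaviour; the two seeds are the ones quoted earlier in this chat.

```python
import random


def noise(seed: int, n: int = 10_000) -> list[float]:
    """Draw n Gaussian samples from a generator seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def correlation(a: list[float], b: list[float]) -> float:
    """Pearson correlation between two equally long sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


first = noise(978357606)
second = noise(978357607)

# A value close to 0 means the two noise samples are essentially unrelated,
# even though the seeds are neighbours.
print(correlation(first, second))
```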
Exactly
But you can also click on Seed Extras to use variations
Swarmui also has variation seed
yeah its quite a common feature
how to use it
there's a few different ways of doing it
tends to be a second latent added in but with the magnitudes set by the "variation" slider
but you can do it via noise injection at later sigmas also
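A minimal sketch of the "second latent mixed in by a variation slider" idea, assuming spherical interpolation (slerp) between two noise samples as the mixing rule. The names `base_seed`, `variation_seed` and `strength` are hypothetical, and the plain Python lists stand in for real latent tensors.

```python
import math
import random


def gaussian_noise(seed: int, n: int) -> list[float]:
    """Draw n Gaussian samples from a generator seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def slerp(t: float, a: list[float], b: list[float]) -> list[float]:
    """Spherical interpolation from a (t = 0) to b (t = 1)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    cos_omega = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    omega = math.acos(cos_omega)
    sin_omega = math.sin(omega)
    if sin_omega < 1e-6:
        # Nearly parallel vectors: fall back to linear interpolation.
        return [(1.0 - t) * x + t * y for x, y in zip(a, b)]
    weight_a = math.sin((1.0 - t) * omega) / sin_omega
    weight_b = math.sin(t * omega) / sin_omega
    return [weight_a * x + weight_b * y for x, y in zip(a, b)]


base_seed, variation_seed = 1234, 5678  # hypothetical seeds
strength = 0.2                          # hypothetical "variation" slider value

base = gaussian_noise(base_seed, 16)
variant = gaussian_noise(variation_seed, 16)
mixed = slerp(strength, base, variant)
print(mixed[:4])
```

A small `strength` keeps the starting noise close to the base seed, so the composition stays similar while details change; `strength = 1.0` is equivalent to just using the second seed.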
in auto1111 set seed to -1 to randomize
But what if I want the seeds to be completely apart from each other when I do a batch of 4 images?
by setting -1 you will have variation regardless of batch or not
Y'all make sum wild shit and I just make anime tiddies
Peace, Art and Rock'n'Roll ... 😄
Hi Dicordos
in the new versions of reforge, there are schedulers called align your steps git, align your steps 11, align your steps 32. i'm trying to figure out wtf they are, but now my conspiracy theory that Google is borking its search results on open source ai topics, is blazing hot.
okay, so i saw one comment on one post about align your steps for 10 step generations. i'm thinking that's what align your steps 11 would be for. lower step gens.
still just theorizing
noise scheduler
i know what noise schedulers are doing, and i know what align your steps is, but there are 3 other variants in reforge that are confuddling me
"ays 11" seems to be doing well on low steps. thinking that's what that one is for, so "ays 32" should do better on higher steps and bad on low steps.
nope. ays 11 and ays 32 actually produce identical results on the same seed
which are also identical to the base AYS scheduler too. so i'm thinking ays, ays 11, and ays 32 are all the exact same code. AYS GITS though... that does different images on the same seed
nope, actually not at high steps, it only does at low steps. so i'm just going to nix those from my ui and ignore it
think i only need the base AYS
these ays sigma functions all seem to be following a very similar procedure. good link. i'm not a coder but sometimes i can see the redhead in it
I find it SUPER irritating when you get something beautiful but you're deliberately trying to do something and it's not working. Like, this is great, and I love it, but. I didn't want this. So I am irritated.
Flux just looks sooo good.
It’s too bad it is so damn heavy to run local
I really am new here and not familiar with the community rules. Please let me know if what I posted is inappropriate. I will delete it if so😀
Any idea is welcome~
So you want us to help you help some company predatorially manipulate and con middle aged people with some bogus dating app...?
I'm out, not helping here.
of course not…Just exploring how to apply the tech into ads field and ads care about CTR and conversion. Sorry for making this uncomfortable impression. I will delete it then
oh, so you want to design clickbait
Good morning!
Not really, unless you say all ads are clickbait. I was trying to reproduce some pics that have been proven as good cases in ads. Different industries want pics of different styles. So I am curious how to make it work.
- training a lora
- canny edge and depth control nets
- a bunch of IP adapters chained together
I didn't see the question but that will be the answer anyway
Thanks~
If you want realistic photo style of average persons, do NOT use flux as it usually gives highly smooth polished photo, like top model style
flux-schnell, online
FLUX is good
Agree!! These pics are too smooth to be true. And I do feel like smooth polished pic is a common issue among checkpoints and lora. I mean these pics are awesome, but not the ones preferred by the ads business. Anyway, thank you for this tip!
they are all top-tier at generating Caucasian faces, but not so much Asian faces in general, the only problem is the diversity of face attributes is flat

