#🏞|general-with-images
Which one are you using?
#SD3 a kart racing track in the Harry Potter style, in a cave
taking requests if anyone wants me to run prompts.
Ah I tried to get this to run today in ComfyUI but gave up after multiple hours 
try online
API 
I did but I wanted a local install with it 
api. I got it configured with stability core, so it was ready to go for sd3.
Team Fortress 2 gameplay screenshot, Heavy Weapons Guy character is holding a minigun and is shooting with furious intent. He is screaming.
iirc yoinked said it looked similar to DALLE3 in a recent model version
so if it gets to that level I'm impressed
probably wont but still worth a try
Thing is the API runs a pre release model, just keep that in mind
They are still working on it
oh yeah I know
yeah generic shooter
final release, max quality. got it.
^ LMAO
So guys, we must judge it with super harsh criticism and treat it as a final model, good idea. I hope the Subreddit will do that.
I found it has issues with a lot of characters that the model doesnt seem to know
the subreddit will ONE HUNDRED PERCENT do this
like no joke
Sure will, its reddit after all
people are gonna shit on it
I am going to expect that
@jovial tiger A hilarious meme featuring two distinct halves. The top half showcases a dark, gritty scene of a Darksouls-style warrior engulfed in flames, wielding a massive sword, and looking determined. The caption above reads: "This would be a great analogy of...". The bottom half of the meme features a blonde Leon Kennedy from Resident Evil, holding two large pistols, with a mischievous grin on his face. The caption below reads: "YOUR MOM!"
if it's like free or whatever then I would like some more test with stuff like text
if its paid then this would be my last request
so it's worth noting that of the 6 i send back, half are expanded through an llm i have. and 3 are raw as you typed it.
wow this is pixart-sigma quality prompt adherence so far 
like I know its preview
but thank you for the images
see that's nice
I'm just testing stuff that worked great with ideogram
memes work well with ideogram
Those remind me of SD 1.1. And not in a good way.
me fr fr
nah it can do text fine. just have to do a simpler prompt
my dream was that it would be as good as ideogram tbh
so far preview doesn't seem like it
ideogram can't do 37 subjects either. 🙂
the more subjects with ideogram, the simpler it gets.
I just don't know what to expect from the final model
this is from ideogram for example
I would like to state that I am still aware that this SD3 is a PREVIEW model
Intense anime-inspired digital art battle: towering mecha bears with glowing red eyes clash against tiny warriors in an epic struggle to breach a majestic mountain pass, sword clashes and energy blasts illuminate the chaotic scene with dramatic lighting, massive stone arches tower overhead.
that's nice
it's also worth mentioning, that this is default settings. so no adjustment for step count, upscaling, hires fix etc. all the normal stuff we'd be doing.
Photo of Leonardo DiCaprio holding a floppy disk with "SD3-800M" written on it. He is examining the floppy disk up close with confused expression. He is puckering his lips.
that's the case for midjourney and dall-e as well tho
In the center of a vibrant canvas, a fierce battle scene takes place against a backdrop of lava-covered terrain. The left silhouette character is a majestic ninja, holding wolverine claws and the right behind it is a japanese samurai, holding a katana. The black background is reminiscent of a vivid landscape, with the lava mountain contrasting sharply against the stark white background. The painting exudes an aura of calmness and serenity, making this an unforgettable representation of martial arts in the most unexpected way.
this one is pretty good yeh
definitely not. midjourney and dall-e are doing TONS of processing. they're a workflow of lots of steps, hardly just a simple diffusion and send.
oh wait which sd3 is the one the API uses? biggest one? you choose?
amazing thank you
#SD3 a stone giant wrapped in chains and tattoos is crouching in front of a gate looking down at a small group of diminutive people who are trying to gain access to the cloud town
can someone do something with large groups of people, which normal SD often starts messing up, e.g.
"A party of many groups of people in the style of The Garden of Earthly Delights"
it's also doing that giant crouching correctly. pixart is pretty good with that, but this is better.
no squashed calves
i'll only post one. that kind of art often has naked people, so i won't post more of that here.
hmmm
thanks it does look a bit better than sdxl but many of them are still deformed as I feared
ok last request? have to get back to work
Screenshot of Steam Store page website, multiple game pictures listed in grid pattern, Games listed: Team Fortress 2, Left 4 Dead 2, Titanfall 2, Half Life 3
thank you 🙏
This wont work 
doomer take but as far as I can tell sd3 is a bit better in general but most of the issues are likely still there, and if this is the last open source architecture we'll get in the near future, self hosted is going to quickly fall behind
^
screenshot of a steam store page website, multiple game pictures of cute furry animals, games listed: monster furry 1, monster racing 3, monster jetski park
ok that's really cool.
i'm very impressed that it could just make that up.
nice
i'm not concerned about visuals, that can be cleaned up with more passes and steps
(broken record - I know its a preview) but I expected a littttttlllee better from prompt adherence
like 90% of ideogram
it sadly couldn't do the meme I wanted so that dream's gone
yeah... I think that research paper was "looks better than dall-e and ideogram" which for regular stuff, I think it does. but prompt adherence may be an evolving thing.
or the benchmarking had terribly simple prompts
i put a lot of other stuff against it. people sneaking past a sleeping tiger and it did it, whereas no amount of sdxl seeds could.
yeah thats great
but yeah, simpler non-expanded prompts seem to do better.
aw
it's so hit or miss though. i've done a lot of the same against pixart, and it's completely 50/50 as to whether the llm expanded prompt will come out better than raw.
so i just have my automation run both per job.
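A rough sketch of what "run both per job" could look like. This is not the poster's actual automation: `plan_jobs` and `fake_expand` are made-up names, and `fake_expand` only stands in for whatever LLM does the expansion. The 3 raw + 3 expanded split follows the numbers mentioned earlier in the chat.

```python
from typing import Callable, Dict, List


def plan_jobs(prompt: str,
              expand_prompt: Callable[[str], str],
              per_variant: int = 3) -> List[Dict[str, str]]:
    """Queue the same request twice: as typed, and LLM-expanded."""
    expanded = expand_prompt(prompt)
    jobs = []
    for variant, text in (("raw", prompt), ("expanded", expanded)):
        for seed in range(per_variant):
            jobs.append({"variant": variant, "prompt": text, "seed": str(seed)})
    return jobs


def fake_expand(prompt: str) -> str:
    # Hypothetical placeholder for an LLM call that rewrites the prompt.
    return prompt + ", dramatic lighting, highly detailed"


jobs = plan_jobs("people sneaking past a sleeping tiger", fake_expand)
for job in jobs:
    print(job["variant"], job["seed"], job["prompt"])
```

Since it is 50/50 which variant comes out better, generating both and picking by eye avoids having to guess up front.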
#sd3 a flamboyant K-pop singer on an elevated platform singing. Down below, hordes of zombies march across a modern city street.
yes, those are cherry picked.
out of 12.
yeah lack of highresfix but nice
damn the face is so bad in both
yeah that's what no highresfix gets you
pixart is worse. it's only good after I've put it through an sdxl refiner.
i assume this is a lower end model for sd3.
not their top
if it is I'd be happy
cause if the lower models can perform this good then its all good
they might want to generate data on all of them yeh
Close up faces are better for sure
this is an example of what things can look like with the refiner.
of sdxl.
so i could definitely clean up those ones above easily.
yeah we see prompt blending
^^ is from pixart, with my model merge as sdxl refiner.
oh
the 3 a few posts back is just raw sd3 api
hybrid half life and l4dead
you twube 😃
second its for china
#sd3 Menacing cyberpunk Squidwarrior brandishing a towering rocket launcher, its sinister eyes alight with merriment. Tentacles grip the massive cannon firmly; fiery blasts engulf the nightmarish scene in smoky orange light.
Holding of weapons in sd3 is much better
#sd3 emperor sitting on throne in ornate vestments, drinking a soda from a can. Anthropomorphic Turtle minions raising another can of soda on a felt pillow towards his face. In a Decorated throne room.
wow diet coke
Did you also test other ratios?
Anyone got access to sd3?
api is available now
Where can I use it?
So isn't available yet?
for vip maybe
it is… I named options where you can find the answer to your question
But there isn't any site that got the API and put it publicly available?
Yeah I can run a prompt if you want. Apparently my work will suffer today.
Where and how?
bruh why bitly
API. I had written automation for it with their core model. So changing the endpoint took a few seconds.
Interesting, but any other site did what you did to make it available for anyone?
a vintage black and white photograph of an old woman in the mountains holding up large thorny plants, her face is hidden by them as she holds on to one branch for support, her dress has tattered off with leaves falling from it, the sky above is dark clouds and misty, there's some mountain range behind her
I haven't yet. Let me know if you want me to run something for you with an alternate AR
Maybe, but there will always be a demand for the open source solution. GIMP may be behind Photoshop, but it has a huge following, same with Blender, same with Linux itself
Prompt: brie Larson, with White sleeveless blouse, dirty and wounded, blonde ponytail hair, angry expression, holding a Flare gun, with old castle in the background, snowy, dark ambient, cinematic, 35mm film
it's not really that simple, as things get more expensive. there is no open-source nvidia for example
base models cost millions to train
Worked pretty well if you ask me 😄
thanks!!
can you try these out:
glass sphere head by germaine krull, pictorialism
hand colored stamp, family heads in row, pictorialism
pictorialism print,(pictorialism), by helene schjerfbeck, featured on flickr, orphism, film grain, provia,portrait,Jules Bastien-Lepage, postcard
Those are sd15 style prompts
i want to see the difference
A photo of an eerie scene from the early 20th century, featuring two antique wooden women hanging in dark woods. The focus is on their ethereal and haunting presence amidst the dense foliage, illuminated by soft moonlight casting long shadows. Shot with Kodak Portra film using a Leica M3 camera to capture intricate details and textures. in the style of an early 20th century photographer.
Hard to argue that point given the reported funding issues facing sai, but some solution will present itself, whether it's distributed computing or something else
@jovial tiger
I dont think distributed computing works with ai training
it would be incredibly slow I think
it's kind of weird. many of the images i'm getting back are blurred...not sure if that's intentional or not. none of the "squid doing this!" were blurred.
SETI for AI 🙂 I'm not saying there's something out there right now that's ready to go, I'm saying it's a problem in search of a solution
victorian era human sized doll body, human face, in rural kitchen, balancing on an rope and holding flowers while looking to the right where the door is, in the door is and old man balancing three cups of water on his head and shoulders, pictorialism, soft focus, julia margaret cameron, alfred stieglitz @jovial tiger
wondering how coherent the scene would be
Very interesting, looks like a Worsened version of midjourney
Now thats something in my style
It's not really live action, and the hands are not that good
interesting is the variety
An enchanting artwork titled "Os Sonhos de Madoriya" featuring a ginger-haired, short-haired young woman with blue eyes. She holds a glowing yellow supernatural stone and wears elegant Greek-inspired white clothes adorned with golden armor features. In the foreground, a mysterious young man with long, black hair and a cape stands, knife in hand, also in a dynamic pose. The overall atmosphere of the scene is intense, with a blend of light and shadow, creating a sense of adventure and intrigue. @jovial tiger try this one, is very specific
haha 🙂
Not very coherent actually, I expected more
They have the turbo model as an option. This isn't that. I have no control over steps or anything though.
like its really really bad
Yeah...
@jovial tiger try this, it is a very difficult one. If it can do that, that's a good model
red dead redemption realistic concept art, realistic art, horror wallpaper vibe, the title says "os sonhos de madoriya". in the up front a ginger short haired young woman with blue eyes and short hair, with greek white clothes with golden armor features in a dynamic pose holding a a glowing bright supernatural yellow stone, a young man with white skin and brown hair and dark clothes holding a knife in a dynamic pose in the right, an evil old king with white skin and hair and beard in the back, a young woman princess with a cocky face expression and white hair and purple eyes. in the back a kingdom in the background with army of soldiers, epic, cinematic composition, action vibe, dynamic vibe
I have to imagine this situation is similar to all of the Stable Diffusion releases.
Their models just struggle, I would bet some finetuned versions look much better than this.
why do i get this result with the prompt "girl holding a large domestic hornet"?
Yeah, probably
no idea XD
Just as a reminder, this is the preview Version of SD3, it's still being worked on
What if you say "hornet insect"
Not coherent at all, dalle 3 has much better prompt understanding and ideogram too
But lets also remember that SDXL 0.9 was basically in the same ballpark as SDXL 1, things are not going to radically change, they will just be a bit more clear or concise
@jovial tiger But thank you very much for showing us how the model really looks
That's just speculation tho, we dont know
Also prompting changed again, people gotta adjust and not just throw old prompts at it to compare models
four bottles lined up on a table. from left to right, they are numbered "4" then "3" then "1" then "2". from left to right, they are red, blue, green, and orange. the setting is a magical forest from a tolkien novel. in the sky is an alien UFO abducting a cow with a ray of light.
We have like 5 stable releases where this is true though.
Nice, haha
Really looks...in this aspect ratio. @jovial tiger Still haven't found a prompt that's worth wasting your credits on.
certainly not worth 20x sdxl
You did just with prompts?
who is even looking to make finetunes of sd3
Waste away. 🙂 I'm curious to see what'll do
It is just not that good
Its not "that good" as a final step. But the composition, consistency, and variety are ace.
then everyone comes around once all the finetunes are out
just in time to whine about the next stability model 😄
Just pass it through SDXL and we are at midjourney level already, SD3 doesn't need to do everything
Sdxl is not good as well lol
Dalle 3 and ideogram rules
This is sounding spiteful lol
Lets see if it has the classic cave entrance problem. => A analog photo of a futuristic metropolis build inside of a dark cave.
that was the prompt and that's all there was to it
I know the models are cool because they are open source, but in an aesthetic sense there really is no comparison
#sd3 Illustration of the DMV headquarters, showcasing a sleek and futuristic design. The building stands tall amidst other skyscrapers. On its exterior are massive digital screens that are unmissable. One screen prominently features the 'DMV' logo while the adjacent screen displays an alarming 'OFFLINE' message in vibrant red. Below these main messages, there are warning signs and error codes flashing, suggesting a major system disruption. Pedestrians on the street halt in their tracks, some pointing at the screens and discussing among themselves, while a few are capturing the moment on their devices.
F that
He didn't skip leg day, and neither did his arms. 🙂
I think Stability needs more creatives. They have brilliant researchers, and people in the AI bubble. But they need more artists.
Clown, you know that SD is putting openings with light leaking in in every cave picture.
Dalle is good for single pass processing. If you want to develop prompts, it might be a better way to get a clean result.
But SDXL is objectively superior considering how the pipeline can be augmented.
If Dalle was open, this would not be true.
True
How are people generating images, with Kijais SD3 comfy node?
gotcha, so you're wondering about the light? as in having it be dark
I have my own api automation.
For now anyway
That would be neat to integrate it into comfy though for more processing
Surrounded by stone, so to say.
City is emitting the light
k
A analog photo of a futuristic metropolis build inside of a dark cave. the city is plunged into darkness during a power outage. the entire image is nearly pitch black.
darkness during a power outage 🤣
is that sdxl output or something?
1.5
haha
my model 1.5 base)
desperate measures haha
the AI pushed the muscular apparatus to an extreme mutation!!! ... 
That styled text gen is so nice, can you use the "minecraft" text style to spell other things?
@nimble mason @junior sky
so API is using a very old model?
why would they purposefully do that shit
they sabotage their image because of a possibly inferior model
Isn't that just plain stupid?
To make Boto nervous about our responses to it. /s
They knew he'd have to do damage control on here and wanted to play a prank on him
lol
I'm fine with it. I'm happy there's something to play with as they continue to grind away on it
That must be it. Would have been better to open the API access on the 1st of April
No more "release sd3!" Posts now
Release the current SD3 model!!!!
No SD3 now
Sd3
Soul would say: Skill issue
Anyway. Now i finally can go on with my life. Nothing happened.
what I also don't get is that they almost never go out of their way to debunk myths and rumours
things that make the community disappointed
they just let it happen
Or this new tweet is damage control and they now postpone launch a few months trying to fix things
@jovial tiger what interesting thing I found is that SD3 might be making random letterboxes the same way as Pixart-Sigma? lmao
But when sdxl was introduced on clipdrop, it was also inferior in every possible way
at least I got one for now
a dev said that they won't delay SD3, instead they might make SD3.1 and SD3.5 which will introduce changes or further finetunes
but idk
I have trained loras to purposely reduce the quality of SDXL to what this model outputs.
I feel that I have failed you. I can not adequately express the sheer amount of laughing that I just did at this message. I'm sorry.
I need to frame that on the wall
That'd be good. As this feels very much like sd 1.4 leak. Nowhere near done
well done and thank you for this very useful information! ... 🤝
keep in mind this is also a month older version and this has no highresfix applied to the images
lykon got amazing images but those probably had highresfix
This ideogram vs sd3 makes it look so bad. Text was supposed to be the big thing. Never really valued it much personally, but this is just sad.
Shit. It knows how i look.
idk what Stability is thinking rn lol
first attempt? or did it need a few
which again raises the question: why release it like this :/ I'm glad there will be a new model, but right now i'd not even be surprised if an ella version for sdxl actually turns out just as good.
going off what one of the devs said on twitter: the model they put on the api is a months old initial version of sd3 that is really bad
and that the sd3 devs have no control over release or pricing
yup
But it did a good job
SD3 does a better job hiding the hands to not embarrass itself completely.
hehe
Why bake a model forever?
If stability did that with 1.5, how long would it have taken them to match the quality the community made?
Sadly the rest were much worse. Not doubled fingers, just not 5, so I guess that's an improvement
So, no word about the dental situation that is going on?
Never get a second chance for a first impression
It's worth mentioning that everything lykon has posted in the last week or so was MUCH better than any of this. So this version certainly shows its age
what was the prompt for it? I want to try it with sdxl
@el_mejnun I expected something like this was gonna happen. That's a model from months ago (basically the paper one) and the inference is not on Comfy but on some custom made script by Fireworks.
I wouldn't rely on that to assess the model quality.
I know this was posted btw
The gigantic elephant in the room is still: Why would somebody do that?
But how much of what was on twitter was cherrypicked (or even postprocessed/upscaled)? That's why i wanted to try this api. And going by "i'll believe it when i see it", what i see now isn't very good
what sd3 looked like in feb (plus sd upscaler)
If i had to guess, management fumbled the release and opted for a cheaper to inference version? or something similar?
Dark panorama with a horrific green-skinned, hyperrealistic, life-like, zombie with rotting flesh and maggots crawling out of its eyes on a graveyard, large dark forest below, extremely detailed, 8k, intricate, warm summer night vibes, eerie silence, undead, dramatic back lighting, by by emil melmoth. just a normal prompt. It's a mess of noise
SD1.5, SD2, SDXL0.9, StableCascade.... all had middling presentation from the get go.
They have been used by the community in ways that were convenient. Same will be true here.
maybe with the final version + some finetunes and tooling and figuring out what settings work best it will be more obviously cooler
but going from 1b 1.5 to 3.5b sdxl, the now 8B sd3 doesn't look as much better as the parameter increase would make me think
SD3 should be super responsive to the IpAdapter style dynamic step weighting, I think it will look phenomenal once we get our hands on it.
/sai an angry man with a name tag that says "aimingfail" holding his hands up to the viewer so you can see his fingers.
wait for the full release to make ur opinions
Source?
idk, Stability AI made a post that literally says 'Stable Diffusion 3 API Now Available' so it is sd3
it just might get much better
just to clarify, this is what is "soon" to Emad:
To clarify weights will be made available soon (always API first, then a few weeks later weights).
lol
lost in translation
Thanks
From SDXL ft. Just for fun.
That is not a source, that is speculation. Unless I am mistaken.
they are a dev for stable diffusion
^
Well I am happy to be wrong lol
I still think it is good enough as is, and expecting one pass perfection is a fools gambit.
At least right now, the tech is still too young to want it to do everything all at once.
well the point is that its a botched release and we should wait until the weights come out
Its not botched.
almost everyone here was going to wait for the weights to come out either way
Its like saying sd1.5 is botched because the base model is crap
no, this isnt the finished sd3 model
and its not even being inferenced properly
@itaybachman @StabilityAI As I said earlier today and also yesterday, this is a very old (and broken) build of the model. Also I learned the backend is not Comfy so it's not using my current (or even my older) workflow, nor the workflow that's on the Early Access bot. Not sure why it was marketed as final
which is embarrassing from stability's part
Calling this release "botched" is going to set expectations at an unrealistic level.
That is probably semantically correct, but we are in the ballpark.
idk why they'd sabotage their image like this
lykon is saying the backend is not even correctly inferencing sd3 I count that as botched
That is fine, and I love the work Lykon is doing, but how people interpret things is a different matter.
I do not like the API rollout, but I do like what the model is doing, even in a hobbled state.
"Membership"
it's free for non-commercial
which version? the api one or?
Oh, there's a SD3 channel now. I wonder if somebody will notice it if i just post upscaled v1.5-pruned-emaonly outputs.
This is what I am currently working on. It's using Command-R tools in combination with Stable Diffusion. It works almost the same way ChatGPT and Dall-E integrate. There are no hardcoded triggerwords or anything like that. The model itself decides when and how to generate images.
Text above the image is the bot's reply. Text under the image is the generated prompt for Stable Diffusion. Look how Command-R knows exactly the prompt format AUTOMATIC1111 uses despite me not describing it. I only told it to use the Stable-Diffusion prompt format when generating stuff. however sometimes it still doesn't close all parentheses it opens if not instructed further
This is gonna be open source and will be an extension to text-generation-webui
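For the unclosed-parentheses problem mentioned above, one possible post-processing step is to balance the prompt before it goes to the image backend. This is only a sketch, not part of the extension being described: `close_open_parens` is a made-up helper, and it assumes A1111-style syntax where `(` and `)` mark emphasis and a backslash escapes a literal parenthesis.

```python
def close_open_parens(prompt: str) -> str:
    """Append closing parens for any emphasis groups left open."""
    depth = 0
    i = 0
    while i < len(prompt):
        ch = prompt[i]
        if ch == "\\":
            # Skip the escaped character, e.g. "\(" is a literal paren.
            i += 2
            continue
        if ch == "(":
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
        i += 1
    return prompt + ")" * depth


print(close_open_parens("a cat, (cartoon style:1.2"))
print(close_open_parens("((masterpiece), cat"))
```

It only adds missing closers at the end; it does not try to guess where the model meant the group to end.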
I use the Faraday app, and load 7b parameter models onto my CPU and then do text conversation prompt engineering then just copy it over to comfy.
What you have looks very very nice though.
This is a bit different than prompt engineering, its more like GPT-4V as its a normal chat bot, not something just for image generation
you can talk with it and then randomly request image generation, it will also keep your previous instructions in mind
e.g. if at some point you said to use only cartoon style, it will only create with cartoon style going forward
and its multilingual as well, not limited to English
SD3 pretty good
I have used LLava models to "look" and integrate that into my pipeline as well. But when LLava and LLM are not using the same data backend, it definitely causes some incoherency.
WizardLM2 will let you do conversational prompt engineering with text alone, definitely not multimodal, but it is very good.
Using the system command as a place to inject example prompts is a really good way to get it to output proper prompts.
Guys I don't wanna say nothing but it's so over for diffusion based image generators
excellent
Diffusion models were always just a transition. We will end up multimodal
To ruin the joke: ||https://github.com/CompVis/zigma||
zigma balls
is it good tho
idk how many 30k iterations is
oh its 1 second per iteration so the training only took them 8 hours or what
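Quick check of that estimate, taking the 30k iterations and roughly 1 second per iteration at face value (both are numbers from the chat, not measured here):

```python
iterations = 30_000
seconds_per_iteration = 1.0  # figure quoted in chat, assumed constant
hours = iterations * seconds_per_iteration / 3600
print(f"{hours:.1f} hours")  # prints "8.3 hours"
```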
Any recommendations on extensions that can accurately segment the product from a photo?
Attached photos are the before/after from using the following tool: https://www.photoroom.com/tools/background-remover
How are they so consistent with their results, any extensions that can help with this?
This SD3 Release is pretty strange ...
Like selling cookie dough with missing ingredients saying: "See how to get along"
Did we get a release of SD3?
@grand walrus yes, this is most definitely not correct, this is what happens when using the Karras scheduler for example -- same thing that happened in A1111. The only way that the user has to control refiner switchover is by sampling steps, which aren't always aligned with discrete model timesteps, and because of that it is switching to the refiner four steps later than it should have. The refiner is going to do practically nothing under this configuration, and a user isn't going to understand why
and the image differences. in order: no refiner, refiner switch at 20/25, refiner switch at 16/25 (which is the correct point for switchover on this scheduler)
why wouldn't they be aligned?
the different schedulers like Karras change the sigma schedule for sampling, and call the timestep which is closest to that sigma
I inserted a line to print the timestep at the point where that conversion takes place
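A standalone sketch of that mismatch, for anyone who wants to see the numbers. It is not the code of A1111 or any other UI. Assumptions: the commonly quoted SD sigma range (0.0292 to 14.6146) with rho = 7 for the Karras schedule, the scaled-linear beta schedule (0.00085 to 0.012 over 1000 timesteps), "closest timestep" measured by plain sigma distance, and a refiner handoff at timestep 200 (the last 20% of training timesteps).

```python
import math


def karras_sigmas(n, sigma_min=0.0292, sigma_max=14.6146, rho=7.0):
    """Karras et al. noise schedule: interpolate in sigma**(1/rho) space."""
    lo, hi = sigma_min ** (1 / rho), sigma_max ** (1 / rho)
    return [(hi + i / (n - 1) * (lo - hi)) ** rho for i in range(n)]


def model_sigmas(num_train=1000, beta_start=0.00085, beta_end=0.012):
    """Sigma for each discrete training timestep (scaled-linear betas)."""
    sigmas, acp = [], 1.0
    b0, b1 = math.sqrt(beta_start), math.sqrt(beta_end)
    for t in range(num_train):
        beta = (b0 + t / (num_train - 1) * (b1 - b0)) ** 2
        acp *= 1.0 - beta  # cumulative product of alphas
        sigmas.append(math.sqrt((1 - acp) / acp))
    return sigmas


def nearest_timestep(sigma, table):
    # Simplification: closest by absolute sigma difference.
    return min(range(len(table)), key=lambda t: abs(table[t] - sigma))


def first_step_below(timesteps, cutoff):
    return next(i for i, t in enumerate(timesteps) if t < cutoff)


steps = 25
table = model_sigmas()
karras_t = [nearest_timestep(s, table) for s in karras_sigmas(steps)]
uniform_t = [round(999 * (1 - i / (steps - 1))) for i in range(steps)]

cutoff = 200  # assumed refiner handoff timestep
print("karras timesteps:", karras_t)
print("karras crosses cutoff at step:", first_step_below(karras_t, cutoff))
print("uniform crosses cutoff at step:", first_step_below(uniform_t, cutoff))
```

Under these assumptions the Karras schedule drops below the handoff timestep several sampling steps before a uniform schedule does, so a switch defined as a fraction of sampling steps fires late.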
🤷♂️ i can't say I've ever had it not generate more coherent images on any model (normal left, karras right)
also it's not technically a training-inference gap, since the model was trained on all of the timesteps being used, just not uniformly to how you are using them -- but this works somewhat in the refiner's benefit overall
you are wasting steps on those lower timesteps that do pretty much nothing
That's not at all what is happening or why this schedule is used. The point of the Karras schedule is making it so that the tangent of the solution trajectory for any value of sigma will always point towards x0, which you absolutely want. This is covered starting at page 5 in the k-diffusion paper: https://arxiv.org/pdf/2206.00364.pdf
Hello everyone good day, I have a question, why in A1111 with colab, the DPM++ 2M Karras sampler has been removed....?
Maybe deselected in Settings?
They separated the scheduler selection from samplers, so there should be a spot for you to select "Karras" separately. I don't know where it is because I have had that in quick settings for a while.
Sounds like I have to install A1111 again 🙂
you should try stableswarm instead a1111 is deprecated
nop, I checked it
just try chopping off those last steps and you'll see
🤷♂️ skipping CFG on them is fast enough if I feel like being stingy about it (and apparently helps image quality too, https://arxiv.org/abs/2404.07724), but even then I end up seeing somewhat substantial changes to style so I often won't bother
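Skipping CFG outside a noise-level interval can be sketched like this. It is a toy illustration, not any sampler's real code: `guided_prediction` and the stand-in models are made up, and the interval bounds and scale are placeholder values rather than recommendations from the linked paper.

```python
from typing import Callable, List

Vector = List[float]
Model = Callable[[Vector, float], Vector]


def guided_prediction(x: Vector, sigma: float,
                      cond_model: Model, uncond_model: Model,
                      scale: float, sigma_lo: float, sigma_hi: float) -> Vector:
    """Apply classifier-free guidance only when sigma is inside the interval."""
    cond = cond_model(x, sigma)
    if not (sigma_lo <= sigma <= sigma_hi):
        # Outside the interval: no unconditional pass, so the step is cheaper.
        return cond
    uncond = uncond_model(x, sigma)
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


uncond_calls = 0


def toy_cond(x: Vector, sigma: float) -> Vector:
    return [1.0 for _ in x]


def toy_uncond(x: Vector, sigma: float) -> Vector:
    global uncond_calls
    uncond_calls += 1
    return [0.0 for _ in x]


inside = guided_prediction([0.0, 0.0], 2.0, toy_cond, toy_uncond, 7.0, 0.3, 5.0)
outside = guided_prediction([0.0, 0.0], 0.1, toy_cond, toy_uncond, 7.0, 0.3, 5.0)
print(inside, outside, uncond_calls)
```

Only the in-interval call pays for the second model evaluation, which is where the speedup on the low-noise steps comes from.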
you were right, here is the new list, I didn't notice it at the beginning, thanks a lot
I got the pixel art models and stuff but how would I get it to make pixel art to a similar style of this?
I also got other platformer assets which I would like SD to style on
that level of granularity would likely require a finetune of some kind, probably all on commodore 64 / intellivision / atari 2600-era sprites specifically
otherwise you tend to get generally 16-bit and early 32-bit era sprite art
before donkey kong country dropped and 3d pre-rendered sprites became the dominant aesthetic
what is this used for?
I find the astronaut helmet frog transformation thing is pretty interesting
craar
cool!
Wow it looks nice
anyone know of a budget gpu/device for stable diffusion/ai and what the bare minimum is for stable diffusion?
is this sd3?
SDXL, pure base model
How did you get these images to look so good?
high-res fix for one 1.5x, rest is prompting (along artists)
someday i hope to run sdxl locally.
it's quite pleasant 🙂
You're making me a bit jealous, here i am stuck in 1.5 due to tech.
I had that until several months ago
how did you overcome said obstacle?
lucky, i wish i had money but i dont really because im in highschool.
yes, I see, good luck on your quest anyway 🙂
would this image of me be bad for making a lora?
just be sure to post it on civitai
im training a lora through kohya, it's fine for it right? im asking bc of my hood, and my arms are a tiny bit blurry
I would kick this image out of the dataset. It can ruin the whole lora.
oh ok thanks ill do that tbh i did a lora already and included this image and my hair style (color was right) and skin tone isn't even close to right lol
the rest are more clear
also i used booru this time to help with the captions even tho i already did the captions from kohya idk if that's gonna make much of a change tho
your lora will only be as good as your worst image
look at it that way
also, use onetrainer
onetrainer?
need more curry
needs aliens
It looks delicious! ... 😀
good idea ! ...I didn't pay attention! ... 😀
😄

When can SD3 be downloaded?
In the upcoming weeks, no date
Looks like the example workflow for Refiner (if I remember correctly) 😄
That would so 100% fit to one of my series ^^
Bad luck, too much white ... otherwise print on Aludibond without white color ...
didn´t you use levels (Tonwerkorrektur)?
@unkempt mica sent you a friend request
Does anyone know any professional Ai designers?... im looking to hire one i would pay $300 to who ever can put me in contact with someone and they get the role please 🙏
The bot doesnt work right now, please dont spam every channel
kv design
Preparing for Dune 4:
I'm doing something like a social experiment. It's a collective world generator and in the future, I want to make a game based on it.
Stable Diffusion 3
Yo
Is Stable Diffusion 3 good at full body shots?
I'll appreciate if someone can share some images with full body shots.
Obtained with SD3
Thanks
any painty stuff?
looks interesting
Couldn't find a picture of Discworld yet ...
Good nite! 👋🏻
What prompt was this?
I thought this was going to take forever, but I pooped it out quite smoothly. No wipe necessary.
Here is the image you requested.
ok... Soooo the battle...
Im gonna make a thing... then the next image needs to defeat that thing.
Fight
/pr
Hello, I hope you can help me. I want to make images like this. Please tell me which model is appropriate. My idea was to make superheroes as vehicles for comparison. For example, the pictures I sent are the spider boat and the venom boat
@shut sinew Thank you
Juggernaut(XL) probably, or you wait for the SD3 release or use the API
Please give me prompt to create such a picture, thank you
thank you
sorry I am new to searching about Stable Diffusion
how do you make sure that objects in the distance are not distorted?
maybe all xl models good for that
hello guys, what should i type in prompt to get a half face image please ? i tried (half face) (left side) (partial face)... none worked. is there a way to do that ? Thanks!
what my desk drawer looked like in the 90s
maybeeee try close up photography then describe the eye, ear, to let it "focus" on that side...
on forge
using hires fix 4xldsir at 1.55
threw it through usdu and got this
It does the shadows better, but it makes the pumpkins lose detail
hey, I thought SDXL is not supposed to have fucked up fingers anymore
what's with this shit
Are you some kind of five fingered mutant? This is normal
Whoever said it was perfect with hands? No model is always perfect with hands. Period.
Hands are incredibly complex. It's not like your nose where you have one of them and it's generally in the same spot with limited positions. You have 8 fingers and 2 thumbs to deal with, often times in various positions. That means that across many many images, machine learning isn't nearly as likely to figure out the huge variety of common patterns that represent hands in comparison to a nose, for instance.
No no, I'm not asking philosophically
I'm asking more or less: I thought XL fixed that issue largely. Am I wrong? Is SD3 the one that fixes it?
I'm not answering philosophically. I'm literally telling you why it is not solved. SD3 will also still generate hands with problems. Probably less frequently in the same way that XL does it less frequently than 1.5.
I thought I didn't need to stuff like 100 word negative prompts filled with "bad fingers, too many fingers, way way way too many fingers" etc
You shouldn't. That doesn't actually help.
Hmm
now how am I going to share pictures of two women eating mcdonalds icecream with my friends
By going to a McDonald's with a camera and paying pairs of women to eat ice cream while you take pictures of them.
Let me know how that goes, btw.
these Midjourney chads with their "two women eating mcdonalds icecream" are breaking my balls about my fingers
How am I supposed to compete with this???
What kind of McDonalds are they getting their images from?!
The right number of fingers while they eat their ICECREAM?!
It must be AI generated because the machine isn't broken!!
Stable diffusion doesn't actually "know" what hands are and there are a million and one different poses they can be in. It would probably take a gigabyte's worth of network space for hands alone to do them right and under all kinds of various environments, skin tones, etc
Might even take a handful of gigabytes
Likely
left girl's hand looks fine, yea, but if you'll look closer to the hand of the right one...I'm not sure what's happening with her hand
need the Black Pearl ship
a McDonald's I want to be at
needs a Maccas meal next to it
hei hi , sorry to disturb may i ask u how can i replicate the same concept what can i write? how to describe to have a similar design and colors etc etc ?
got some free labor for one of u, im too new at this to be able to do it anytime soon, so this is for anyone skilled enough to do this, but itd be a good idea for someone to make a lora with excessive training on english letters, so teach it each individual letter by having a blank white image with the single black letter, and describe those images accordingly, and then give it more complex versions with images of strings of letters and sentences, and put words and letters in various regular non-blank images
Where to send prompt?
/d painting with flowers that come from the background surrounded with color
One of my longest videos so far. Added everything I know so far about ai animation.
yup. this is the biggest reason why the simpsons have 3 fingers and a thumb. easier for animators to draw then.
hands are a difficult art problem. fingers can fing all over the place.
😮
How to recreate this image: https://civitai.com/images/1519380
Tried copying into SD UI, but getting different image.
@chrome nebula person in question:
how many lines need connect oh is not for me
You can compete with that by feeding her more icecream, getting her bikini bottoms that aren't 3x too large, and sending her to the hospital to address that abdominal torsion. And maybe remove the second arm she has from the left girl, in the darker blurry area behind them, unless that is the right girl's dislocated elbow leading to that interesting angle.
really though, what they don't understand, while acting as such, is that this... is not "two women eating mcdonalds icecream" this is "two women eating mcdonalds icecream" + prompt negotiation and extension by midjourney. Midjourney is not "two women eating mcdonalds icecream" -> make image. it is "two women eating mcdonalds icecream" -> write a better prompt for the user to match sentiment and intent with a more complex input before sending it to generator
for example if you try ideogram with prompt magic that original prompt becomes - " A cheerful scene of two women enjoying McDonald's ice cream cones in a casual setting. The women are seated on a park bench, basking in the warm sunlight, and they appear to be good friends. One woman is smiling, her ice cream covered with colorful sprinkles, while the other, with a more serious expression, enjoys her chocolate-dipped cone. The background shows a lively city park filled with families and children playing. "
properly using more descriptive prompting or LLM to negotiate your initial prompt , you will find you can get more "MJ" like outputs
Wait this is pretty cool how did u do that
now since MJ went the route it did, even with a still simplified prompt, to ideogram, which will use its prompt magic llm to negotiate a better prompt from "two women beautiful women in brown bikinis eating mcdonalds icecream in a dim seaside bar with a stained wood aesthetic and antique glass lighting, sitting very close and grinning seductively "
now this prompt came out with prompt magic to be : A captivating and nostalgic photograph featuring two stunning women wearing brown bikinis, enjoying McDonald's ice cream cones in a dimly lit seaside bar. The warm, antique ambiance is accentuated by the stained wood and vintage antique glass lighting fixtures. The women, seated to the right of the bar, face each other with a sense of camaraderie, their smiles inviting the viewer into their moment of happiness and enjoyment. The bar, occupying the left half of the image, frames them off-center, creating a balanced and visually appealing composition.
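The idea described above (short prompt in, rewritten prompt out, and only then the generator) could be sketched roughly like this. It is an illustration only: `expand_prompt` and `template_expander` are made-up names, the template wording is invented, and a real setup would call an LLM where the stub sits.

```python
from typing import Callable, List


def template_expander(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call: pads the prompt with generic scene detail."""
    return (
        f"A detailed photograph of {prompt}, in a casual setting with natural "
        "lighting, a clear subject, and a described background."
    )


def expand_prompt(
    prompt: str,
    expander: Callable[[str], str],
    keep_raw: bool = True,
) -> List[str]:
    """Return the prompts that would be sent on to the image generator.

    With keep_raw=True the prompt exactly as typed is kept alongside the
    expanded one, so raw and expanded results can be compared.
    """
    prompt = prompt.strip()
    prompts = [expander(prompt)]
    if keep_raw:
        prompts.insert(0, prompt)
    return prompts


if __name__ == "__main__":
    for p in expand_prompt("two women eating mcdonalds icecream", template_expander):
        print(p)
```

Swapping `template_expander` for a function that asks an LLM to rewrite the prompt gives the same flow; the point is that the generator never sees only the short prompt.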
as you know and in general that among the rules of a site is not to mention the names of other sites, it is considered an act of advertising, it is for this reason that I responded to you with a general way...
😄
it's ok, stability is a non-profit.
i keeeeeeed
generating babes I see
lol
I don't understand what you're saying, but im just going to nod my head
I was responding to the guy earlier who asked about MJ and how SD could compete with it, and I said that the prompt they used is not the "prompt" they used
what? that's ridiculous and gating as hell, of course you can mention other sites lol
it's frustrating that they don't let you see the expanded prompt. doesn't truly let you evolve the prompt to work towards what you want.
yeah that's why I mentioned ideogram, I use it sometimes FOR magic prompt, and then take to SD to use that across other models to see how I can get it more in line with mj / ideo style
yeah.... although ideogram has almost no style.
it's the opposite of MJ which goes too far.
MJ = Mostly Jam
batman!
i feel like mj's prompt understanding has been getting better very recently.
i think they see others are chasing them and they had to get better quickly.
Batman eating ice-cream
definitely a difference compared to when v6 launched.
what else is he doing? who is he with?
is the ice cream alive?
I thank you for this information which enlightened me... and it's really good to evolve in a vast space in terms of expressions of words! ...
sd3 works sometimes. "when your bat suit is at the dry cleaners and all you had was this black cellophane"
Does someone use Stable diffusion Api endpoint
yep
Can you show me?
what do you want to know?
Your endpoint
I used many endpoints, none of them working
I mean
```
  -H "authorization: Bearer sk-MYAPIKEY" \
  -H "accept: image/*" \
  -F prompt="Lighthouse on a cliff overlooking the ocean" \
  -F output_format="jpeg" \
  -o "./lighthouse.jpeg"
```
that's the simplest implementation just on the command line. see if you can get that working with your api key
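For anyone who would rather do this from Python, here is a rough stdlib-only sketch of the same flags. Assumptions: the endpoint URL is not shown in the pasted command, so it is left as a parameter here rather than guessed, and `build_request` / `save_image` are names made up for this example.

```python
import urllib.request
import uuid


def build_request(url: str, api_key: str, prompt: str,
                  output_format: str = "jpeg") -> urllib.request.Request:
    """Build (but do not send) a multipart POST mirroring the curl flags."""
    boundary = uuid.uuid4().hex
    # The two -F form fields from the curl command.
    fields = {"prompt": prompt, "output_format": output_format}
    parts = []
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        )
    parts.append(f"--{boundary}--\r\n")
    body = "".join(parts).encode("utf-8")
    headers = {
        "Authorization": f"Bearer {api_key}",  # -H "authorization: Bearer ..."
        "Accept": "image/*",                   # -H "accept: image/*"
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def save_image(request: urllib.request.Request, path: str) -> None:
    """Send the request and write the raw response bytes to disk (like -o)."""
    # urlopen raises urllib.error.HTTPError on a non-2xx status.
    with urllib.request.urlopen(request) as response:
        with open(path, "wb") as f:
            f.write(response.read())
```

Usage would be something like `save_image(build_request(endpoint_url, "sk-MYAPIKEY", "Lighthouse on a cliff overlooking the ocean"), "./lighthouse.jpeg")`, with `endpoint_url` taken from the provider's docs. If it fails, the `HTTPError` body usually says whether the key or the fields are the problem.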
Also, there are 2 platforms for keys
One is stability.ai and the other is stable diffusion
Which one do I choose
From where you get the key
Is it working?
sign up for an account and then go to the page I just linked. you can create a key there.
I think you get 100 free credits, and then you have to put in money
@jovial tiger are you in pc right now?
why do you ask?
const body = {
  key: process.env.NEXT_PUBLIC_STABLE_DIFFUSION_API_KEY,
  prompt: newPrompt,
  width: 512,
  height: 512,
  samples: 1,
  num_inference_steps: 21,
  enhance_prompt: "yes"
}
const imageResponse = await fetch("https://stablediffusionapi.com/api/v3/text2img", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
Can you check console logs?
a thousand apologies to Mr. vortex for the inconvenience. I sent you a message by mistake and which was intended for another person and I thank Mr @Lorentz a second time who enlightened me on a question that I was unaware of... 🤝
you'll probably want to go here https://discord.com/channels/1002292111942635562/1002602742667280404
😮
that's probably actual Stability people
Holy Hell who has 1000 dollars to spend on that shit a month
Thank you for the information... 👍
It's stupid fast. 6 images in 10-15 seconds most of the time, at sd3 quality which will only increase over time. it's a pretty great way to supplement local rendering.
they don't exactly make it sound like a great deal by just saying 4090... is it a bank of them?
otherwise it's just like... shit, buy 6 of them and you'll break even in a year
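The break-even guess above can be checked with quick arithmetic. This is a sketch under stated assumptions: the $1000/month figure is the one quoted in the chat, $1599 is the RTX 4090 launch list price (street prices differ), and power, the rest of the PC, and resale value are ignored.

```python
def months_to_break_even(monthly_fee: float, gpu_price: float, gpu_count: int) -> float:
    """Months of subscription fees needed to equal the up-front hardware cost."""
    return (gpu_price * gpu_count) / monthly_fee


# Assumed numbers: $1000/month from the chat, $1599 list price per card.
months = months_to_break_even(monthly_fee=1000, gpu_price=1599, gpu_count=6)
print(f"{months:.1f} months")  # 9.6 months
```

So "break even in a year" is in the right ballpark for six cards at list price, a bit sooner on paper and later once the rest of the hardware is counted.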
lykon has mentioned that his workflow is a lot slower than what's on the api... so i'm not expecting sd3 to be fast even with a 4090
guess it's good i'm getting myself acclimated to 400-500 step workflows lol
lots of green border statuses dancing around the screen
Fallout!
Needs to be more Fallout
It's powerful when you think about the damage that these giant and deadly waves will cause! ...I think it was a great success! ... 👍
just a scratch
😀
Hello spirits
Sup yall, i just recently made this art piece using juggernaut xl model. I call it :
THE SPHERE
What i did was, basically generate a bunch of images using that model of an assortment of household objects. I then print it on some thick paper, cut each individual object, and then glue them together on an A3 paper to create the sphere
here's some proof
lemme know what you guys think
time for some pixel art
a magician never reveals his tricks 🙂
What will TED look like in 40 years? For #TED2024, we worked with artist @PaulTrillo and @OpenAI to create this exclusive video using Sora, their unreleased text-to-video model. Stay tuned for more groundbreaking AI — coming soon to https://t.co/YLcO5Ju923!
Very cool, I like to see it
I don't think I've ever seen waves crashing down on something... SD or some other service?
how can we create these images?
2001 Space Odyssey: Good morning coffee
2001 Space Odyssey: Trouble starts ...
2001 Space Odyssey: A romantic story coming?
2001 Space Odyssey: But we need an end fight first ...
Happy end ...
This is insane
LOL you already have A1111! You don't even need anything. 🙂
Let's see...
ok im happy we are here, so we can skip some instructions lol
Try this prompt: "desktop cat". That's the whole prompt.
he is using magicmix, isnt that one of those anime ones?
so you got a lot of stuff haha
nice
it all turned out suckish so i wanted a good one
yea better model and settings
like what model
Settings are fine.
can you run sdxl? cause that will give you better quality stuff
If you're sticking to SD1.5, RevAnimated and RealisticVision are the best ones I know of, but I've been using SDXL for ages now.
probably if i knew what that is
how do i check again
well where did you get the models from? use civitai website, you can even filter by model type, lora, etc
You just cropped it off of the pic you posted. It's right under the render info.
(On mine says A:10.43GB, R:14.15GB, Sys...)
you dont always need a high cfg scale btw, it tends to cook pics, i usually like to start with 5
16GB VRAM. You can run SDXL no problem.
ah
how i do that
Just download a model from Civitai like you did for the other models you have.
https://civitai.com/models/363210/crystal-clear-xl-prime
kk
Not sure what version all the cool kids are using though...
get a general xl model like dreamshaper, or juggernaut, etc
yeah i got both already
No, he's talking about SD1.5
these are the ones i got
i just want to be able to generate like actually realistic looking images
See how the one on the left says "XL" in the title? The one on the right is bad quality SD1.5.
Because it's older.
no but he has xl version already
ic
i can just download the normal one
so you want lower quality?
No, he's right. You have DreamshaperXL in your list.
yeah but the pics turn out garbage
ok but let's go step by step and try to fix that
lemme generate one
Yeah, now your settings are the problem! Easy fix.
ill show u
yay
there is also a possibility your VAE is bad
VAE should be automatic. But try the settings I just posted, and share the results.
Wrong resolution. Try the settings I dropped.
what should i prompt
you are using low resolution
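On the resolution point: SD1.5 models were trained around 512x512 and SDXL around 1024x1024, so an XL model run at 512 tends to look bad. A rough sketch of picking starting settings by model family follows. These are not the settings from the screenshot (that is not in the log). The dict keys follow the field names of the A1111 txt2img API, the CFG default of 5 is the suggestion made earlier in the chat, and the 0.75 threshold is an arbitrary choice for illustration.

```python
# Approximate native training resolution per model family.
BASE_RESOLUTION = {"sd15": 512, "sdxl": 1024}


def txt2img_settings(prompt: str, family: str, cfg_scale: float = 5.0) -> dict:
    """Starting settings for a square image at the family's native size."""
    side = BASE_RESOLUTION[family]
    return {"prompt": prompt, "width": side, "height": side, "cfg_scale": cfg_scale}


def looks_low_res(family: str, width: int, height: int) -> bool:
    """True when the pixel area is well below what the model was trained on.

    Area is compared rather than each side, so non-square sizes such as
    832x1216 on SDXL are not flagged. The 0.75 factor is an assumption.
    """
    side = BASE_RESOLUTION[family]
    return width * height < 0.75 * side * side


if __name__ == "__main__":
    print(txt2img_settings("desktop cat", "sdxl"))
    print(looks_low_res("sdxl", 512, 512))
```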
kk
"Desktop cat". Or anything quick.
he wants some beautiful asian woman ok? :3
"beautiful asian woman" then
damn right
How'd it turn out?
Please don't crop. What's the error???
The console error you cropped off.
Hmm.
out of memory
Hang on. I know the problem. Just looking for instructions to give you.
well,
There is not enough GPU video memory available!
i thought i had enough lol
you prob need to unload the model before using another one
Just need a low-vram launch flag. Takes two seconds. But where do you put it again...
but he has 16gb vram no?
oh ik
nvm
i forgot
Me too. 😦
Which gpu are you using?
i remember setting it to midvram
6950 XT
im not familiar with amd ones
Oh LOL I'm in the wrong folder. Hang on.
xD
Edit "webui-user.bat"
he shouldn't need to change to low\mid vram if he has 16gb
but wait why you telling him to change to low vram
yeah xD
this is simply a problem i think because he didnt unload the model before using another one
Okay... Well, let's try anyway.
Flag: --lowvram
how i do that
Doesn't it default to just 1 model in VRAM?
in front of the direct ml thing?
well im not sure how automatic1111 handles that
like that?
Yep. Now exit the console window, and open it again by double-clicking webui-user.bat.
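For reference, the flag goes on the `set COMMANDLINE_ARGS=` line of webui-user.bat, after whatever is already there. A small sketch of doing that edit in code, assuming the stock layout of that file; `add_launch_flag` is a made-up helper and `--some-existing-flag` is a placeholder, not a real option. Whether `--lowvram` is actually needed on a 16GB card is a separate question.

```python
def add_launch_flag(bat_text: str, flag: str) -> str:
    """Append `flag` to the COMMANDLINE_ARGS line unless it is already present."""
    prefix = "set COMMANDLINE_ARGS="
    out = []
    found = False
    for line in bat_text.splitlines():
        stripped = line.strip()
        if stripped.lower().startswith(prefix.lower()):
            found = True
            # Simple whitespace split: fine for plain flags, not for quoted values.
            current = stripped[len(prefix):].split()
            if flag not in current:
                current.append(flag)
            line = prefix + " ".join(current)
        out.append(line)
    if not found:
        raise ValueError("no COMMANDLINE_ARGS line found")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = (
        "@echo off\n\n"
        "set PYTHON=\n"
        "set GIT=\n"
        "set VENV_DIR=\n"
        "set COMMANDLINE_ARGS=--some-existing-flag\n\n"  # placeholder flag
        "call webui.bat\n"
    )
    print(add_launch_flag(sample, "--lowvram"))
```

Doing it by hand in Notepad is just as good; the sketch only shows where the flag ends up.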
kk
I don't think that's a good idea... the issue is somewhere else, 16gb should be enough for most models, including sdxl-based
yea there is no need for lowvram i think
We'll check settings in a second. Already found it:
First, let's just prove we can get an image onscreen.
either memory leak (just reload console and webui) or settings or install issue
Wrong model LOL.
oh which one should i use
Console?
Okay. What's the directml flag? Do you have an AMD GPU?
yes its amd
how do i check
yep
Click "Performance" in the column on the left.
yep
Click "GPU 0" in the new column that shows up.
AMD Radeon RX 6950 XT...
yeah
I've never dealt with it before but I know it's a pain...
o ok
15.8gb used? A1111 is such a piece of crap 🤣
doesn't work
git config --global --add safe.directory C:/stable-diffusion/stable-diffusion-webui-directml
Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug 1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Version: 1.9.0
Commit hash: <none>
Traceback (most recent call last):
  File "C:\stable-diffusion\stable-diffusion-webui-directml\launch.py", line 48, in <module>
    main()
  File "C:\stable-diffusion\stable-diffusion-webui-directml\launch.py", line 39, in main
    prepare_environment()
  File "C:\stable-diffusion\stable-diffusion-webui-directml\modules\launch_utils.py", line 593, in prepare_environment
    raise RuntimeError(
RuntimeError: Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check
Press any key to continue . . .
are you sure you installed automatic1111 specifically for amd GPUs?
hmmm...
Is there a way to take a style from an image (drawn) and apply it onto a photo (real)? I have the A1111 webui and ComfyUI webui, controlnet
I think this is what he has and it's right. https://github.com/lshqqytiger/stable-diffusion-webui-directml/

