#💬｜general-chat
simplicity is a constant in pretty much every domain of life, it's no surprise that it pays off with AI as well
but if u can get rly cheap electricity, like in iceland, then u could get into selling gpu power, it's interesting
there is a potential game plan for all of this…but it involves fusion energy, which is going to involve lots of helium for cooling…but once we have zero point energy, we will be able to AI all we want.
i would rly like to know what kinda hardware they are running sora on, must be something massive
considering it's estimated that GPT 4 took upwards of 3-400 million dollars to train…
and that's just a language model
no 1$ an hour there
I know I'm just sayin…I'm sure those numbers are connected
imagine messing up prompt and stuff on that hardware, expensive
yeah seriously
that's the most 21st century problem like…..ever lol
"when you mess up a prompt on a multi-billion dollar state of the art reality distortion algorithm"
Get https://drawthings.ai/ it's perfect for apple silicon and very powerful once you get to know the interface
There's also a discord which is very helpful, it's full with tips and extras and the people over there are also helpful as well as friendly
And it's totally free
Does Draw Things run outside of iOS/iPadOS?
I remember it being the first app to drop with diffusion support.
yes, it runs on iPadOs, MacOs and iPhone
yes
whoa
what is your goal? what images do you want to make? and why?
also lora training (from 8gB)
and video, but you need 16gb ram for that
and ipadapter
just a lot, have a look @karmic cedar @topaz parcel
why do you need a "LORA"?
I stopped using it once my needs grew, but I'm really impressed at the scale of development. That dev is a one-man band
what is a specific image that you want to create, as an example?
His issue is more medial I think
Hey where is the prompt option to generate image
thnx
thanks
Does anyone know of AI models that can do TTS locally? I am looking for results similar to ElevenLabs, but I'd like to run it locally.
not yet
whats champ
neat!
Anyone have access to the SD3 model? How do you run it? Asking since it probably won't work with automatic1111 or comfy?
no weights are released yet
its a web interface
people internally at stability are probably using comfyui custom version. he works for them
So i've got the invitation email, linking to their huggingface with the model-files...
ouch, people are going to be upset lol, getting the inv and asking here how it works
about sd3 and comfyui : #🔥｜anime message
No information (public at least) about a1111 and others
seems made up lol
if you got invited to download weights, you're not going to be someone who's asking how to use them
or maybe the new CTO/CEO is incompetent in his direction
I just did sign up today and almost instantly got the email.
Anyway. I will figure it out then.
whaat
oh
maybe i'm wrong. this guy says he got the weights immediately after signing up
OH LMAO 
this is so fake
absolutely not
it's even worse than when people come in here talking about religion
Its all factual information
🤢 and politics
and fetishes.
Those go down in the dm tho

I used to use Draw Things - and it's currently installed. I stopped using it cuz, at the time, you could not use Controlnet, etc. But I may have to give that another go, too. If i recall though, it was very slow. maybe it's got better?
I just like to control my images, with a control net - but as I say, XL models and control net on a silicon Mac just do not play very well together. was fine with SD 1.5 models. and running standard txt2img XL models is no prob - and very fast. But img2img and controlnet - not so much fun on a Mac M2.
As fast as the M-series is, nothing compares to running big data models on mythical cloud hardware
Is it normal for a rtx 3060 gpu to generate an image in under 10 seconds that is supposed to be super detailed?
Could it be the Lora I am using?
Much faster than those 30 minutes, especially now we have the lighting models and Turbo, Lightning and TCD loras
20 seconds for this one #general-with-images message
i think you are confused about what you were messaged today
can you give an example prompt of what you are trying to make?
he insisted it was links to weights on hugging face
i dont think he's confused. i think he knows exactly what he's saying
It's normal. RTX3060 is nearly the slowest rtx3000+ gpu but it's not that slow. Also not having to offload because you have 12GB of vram to work with is really nice.
Is reactor still a good faceswapping tool? All my images after face swapping come out blurry. Ive looked into this and cant find any definitives as to why. Some say my input image isnt detailed enough.... ive upscaled it and i can literally see the pores, and the bags under the eyes. Lol
No. Reactor is one of the old crusty versions of face reconstruction. It's just pasting the new face on top of the old one, and then polishing it with codeformer. Very crusty. Very old.
IP adapters use insightface to do face swaps now. They bring the reconstructed face in at each step of the diffusion process, so that it's blended into the final pic much better. They still use the insightface 128px non commercial model though
@sudden hedge
you can only go so far with pasting a face on top of another
What do you recommend? Ive used IPAdapter and started to get the hang of it and still not the quality i wanted but then the update shat on my face rendering all my workflows for this useless. Then onnx took a dump and i had to reinstall comfyui so I'm at a fresh start
i recommend ipadapter face swapping. make sure you use the lora with it
Yee refractor plugin
huh?
refractor or reactor?
lol two seconds and i was annoyed in there. no thanks bro
Reactor is absolute crap compared to the IP adapter or Instant ID
technically its the 3050 but who cares lol, both are sloww
the real question is what is the purpose of all the face swaps
P*rn
there is already a lot of it so i don't know what the point is
it's like who the hell cares
I don't ^^
could go further and ask whats the point of any art?
just go full nihilist
whats the point of music? theres so much already. who cares?
I'm not sure I know what art is ... I'm just doing stuff
whats even the point of doing stuff. it's all been done
what upscaler do people use? i've been using 4x_foolhardy_Remacri
SUPIR
My brain hurts
different models and methods depending on desired effect
that foolhardy remacri seems to do well for me with nature type things
Do you guys jump around on different model sets or generally right tool for right job stuff?
i was happy with 1.5 but everything is slower now on my poor gtx1070 on sdxl 😢
theres tons of 1.5 models yet. just use those.
cyber realistic just put out a few new versions that are really good
just been getting some nice details on sdxl
oh the new cyber realistic one looks good, there is always new ones of that one coming out
1.5's value will last a relatively long time on account of the power of upscalers
anyone know of a way in auto to generate a bunch of depth maps in batch? for auto1111
DPM++ 2M Karras or DPM++ 3M Karras? is 3M just better or only in certain situations?
I'm finding it to be really useful for img2img and photographic models
Why would they think it's not useful?
From what we've seen it's amazing.
one thing i can't get a solid answer for from reading about it, will a full fp32 model get better results than the reduced fp16 version?
no. i use fp8 and get results nobody would guess wasn't full precision
I would think so?
i am finding it good for photo real stuff too
Good to know
base models are trained in full precision. inference doesn't really need it
so full precision is more needed for training then?
hard to say if refining weights even benefits from full precision. negligible imo
i train loras in half precision
i think my 1070 would cry at being trained on lol
over heat?
My hunch is that precision would come into play more as prompt complexity increases
how? more frames doesn't increase memory use. just the context batch size. 😮‍💨
i'm exhausted trying to convey this to people
Will SD3's 8B model (w/o T5) fit into 20GB VRAM on FP16? I'm sure the T5 encoder model is optional... right?
No idea how I got it to choke
Yes. Should do. The unoptimised version already fits on 24. So it's already very close to 20
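A rough back-of-the-envelope check of the numbers above, as a sketch only. It counts just the bytes needed to hold the weights, assumes "8B" means roughly 8 billion parameters, and ignores activations, the VAE, and any text encoders, so real usage will be higher.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Memory needed to hold the weights alone, in GiB."""
    return num_params * bytes_per_param / (1024 ** 3)

PARAMS_8B = 8e9  # assumption: "8B" is about 8 billion parameters

# Bytes per parameter for each precision discussed in the chat.
for name, nbytes in [("fp32", 4), ("fp16", 2), ("fp8", 1)]:
    print(f"{name}: {weight_memory_gib(PARAMS_8B, nbytes):.1f} GiB")
```

At 2 bytes per parameter the weights alone come to about 14.9 GiB, which would leave some headroom on a 20GB card before overhead is counted.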
It crashed when it was on the final part where it was making a gif
I wonder if Sora runs on 64-bit precision…
I wonder if Sora has that 16-bit blast processing chip!!!! iykyk
any reason to use vae decode as tiled? am looking at a workflow and have no idea why there is a tiled version
Keep in mind friend. I am noob and do not know what I'm doing 90% of my life
That's just like everyone now
It made sense when I looked at it lol
Animated likes to use a high step count I've noticed
24+
i animate at 4cfg and 20 steps. 1 cfg and 14 steps with lcm lora
Lora is just a different type of checkpoint model right?
All that shi* i crammed in my brain yesterday lol gone
no lora is more of an addon
add something that isn't in the model
so like mini-model you use alongside the base model
dpm++ 2s ancestral karras
3m better with lower cfg
sora takes 30 min for one frame on fp8/fp4 iirc
😮
what cfg do you use with 2s ancestral?
around 5 hr per video iirc
What's the length of video render?
5 hours of render would not be bad if you got what you asked lol
But if it's 5 hours per 24 frames…. I'm out lol
Ask for sad movie about a penguin and a monkey… then sit back and wait for 2 years
peddling hollywood with it right away is a sure fire means to drive human despair downward
So Midjourney and Leonardo came out with "consistent characters" When is SD doing it? xD
SD is slamming the brakes
And why is Sora not open source if it's made by a company called OPENai
nothing makes sense
it'll be up to the community to develop this stuff
because economy
that's why
but there is more debt in the world than actual money
indeed
so the economy is clearly a very reasonable standard
the economy defines debt
i honestly don't know why they called themselves OpenAI when they've never had any intention of making their stuff open
but…we define money
exactly
in theory.
So… when SD3 arrives. Do we think it's gonna break all current workflow type stuff?
yes
probably
Depends on how quantized it is.
probably won't even work in A1111
it will first work on comfy of course...
yep
fooocus will add it last
But it'll be great when they do
SDXL matured really well
i really hope 3.0 isn't neutered like 2.0 is. SDXL isn't neutered i don't think
i remember some saying a few months after it released "oh it failed no one cares people prefer 1.5"
well now SDXL stuff is everywhere and really refined
I think all the major frontend developers have got access to the SD3 beta, so hopefully we'll have full support from day 1
what resolution is SD3 at?
SDXL has been fun
Oh yeah will SD3 still have boobs?
lol
Cat girl people losing minds
Time will tell. I think what we're seeing, based on Sam Altman's movements in D.C. as well as Sora's apparent ultra-high end commercial appeal, is that the continued development of img2vid models will break apart similarly to what's happening now with Stability
ill generate anything tbf
don't add a NSFW filter
bro what the fuck
i put this as a prompt
no
in what way
as long as they release sd3
same fr
i have a 7b model llm ive been finetuning
SD3 will generate infinite amounts of fun
and an image model ive been fine tuning
i actually dont care as long as they release sd3
because id have a good llm and image gen setup locally
and people will further refine it with additional DPO and datasets and create perfect models for actions, expressions, art style, ETC
In a way that we'll enter into some dark times for the free AI art generation especially, everyone thinking about profits
Yep, I see regulation on the horizon, and I see it coming for all of the tools we currently take for granted.
^
if SD3 ends up being uncensored by the community we'll be golden then they can't do crap to us offline folks
but we need to act quick

Regulation of AI will be a very big deal once it hits.
yes
I'm old enough and pessimistic enough to know how this shit is going to go down, and it disappoints me.
I mean in the current state, using Comfy with our SD3 models
sure sure
im bored
for the future models and etc, absolutely
someone gimme prompts
I mean, get your creativity out while you can…I guess.
i deadass ran out of ideas
after studying a bunch of the diffusers manual, trying to figure out how to wield the python code effectively, i have come to the conclusion that covid has brain damaged me and i can't code anymore
but with our current UIs and models we'll be safe
so frustrating to see every other dev with talent just completely sleep on lavi-bridge. how awesome would using t5 for sd15 models be?
exactly
yeah, that's odd.
are we certain folks are sleeping on it?
covid accelerated the aging. i'm only 40. shouldn't be this inept
i know! i'm more disappointed than surprised now. i made a post on the reddit about it but as usual it was buried in downvotes and people argued that it sucked because its chinese. so i just deleted it
i'm willing to accept i'm old but studying shouldn't be this hard lol
speaking of forge, been using it for 2 days, it really is better, I was skeptical
maybe open AI stuff will go the way of the pirate ships if too many economy considerations and censorship come in
imagine downloading Sora on piratebay
leaks make the world go around
not sure why they had to fork it, maybe the auto1111 guy is stubborn
i actually prefer the automatic1111 dev branch again lately. forge's controlnet doesn't work half as well with the animatediff-forge extension as it does on the main branch. and fp8 support is better with auto's code. for some reasons fp8 on forge uses slightly more memory
there's already an Open Sora project going
a lot of the extensions seem to break forge too. its neat if you got older hardware i guess
it's…pretty silly atm but it's got potential
we need torrenting but for training ai models
man, is there a way to improve prompt adherence with Stable Cascade
everyones gpus work together to train model
hrm, maybe, I've tried ipadapter, reference, depth and various other controlnet models, and I havent gotten a single oom, so I'm pretty happy right now. Dont even need the medvram flag I had to use before
im addicted to foooooooooooooooocus lately, i looked down on it when i heard about it, thinking something so simple cant be good, probably its a stripped down version of A1111, but no, its pretty good and memory efficient
foooocus used photogasm
yep, and it knows how to use a GPT2 prompter
it guts a lot of code and does things in entirely new ways. its a hefty amount of code changes. you don't just throw that at the main branch before significant testing. auto is already trying to rebuild the ui on gradio4. taking on forge changes would be a tall order right now
lmao i cant get to gen a face that doesnt look like the eyes were smooshed out
someone coming into your house and changing everything.. resisting that isn't exactly stubborn. if the changes prove themselves they'll be adopted surely. illy has even said forge may turn into a pull request one day
it doesnt work, memory is nearly impossible to share without having huge delays
you could do something like SLURM, but even then
give all gpus a cluster of training to do
only could work with batch training
like bitcoin mining
e.g. same model for all, pass diff data for each, do things after
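The "same model for all, different data for each, combine after" idea is usually called data-parallel training. Below is a toy sketch in plain Python: a one-parameter model and made-up data shards stand in for GPUs. It only illustrates the gradient-averaging step and does nothing about the network delay problem raised above.

```python
def shard_gradient(w: float, shard: list[tuple[float, float]]) -> float:
    """Gradient of mean squared error for y = w * x on one worker's shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def data_parallel_step(w: float, shards, lr: float) -> float:
    # Each worker holds the same weight but sees a different shard.
    grads = [shard_gradient(w, shard) for shard in shards]
    # "Do things after": average the gradients, then apply one shared update.
    avg_grad = sum(grads) / len(grads)
    return w - lr * avg_grad

# Hypothetical data following y = 2x, split across two pretend GPUs.
shards = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(3.0, 6.0), (4.0, 8.0)],
]

w = 0.0
for _ in range(50):
    w = data_parallel_step(w, shards, lr=0.05)
print(w)
```

Every worker ends each step with identical weights, which is why this approach only needs to exchange gradients once per batch.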
yeah its mostly pans
But bland
SVD_XT can go up to like 25 frames lol
it's tidy. the v2 style enhancer is nice to have. I just use prompt magic from within dynamic prompts on automatic lately, its the same thing.
well, if social media continues down the tiktok rabbit hole short will feel like long and it'll all blow over and be okay because culture!
when i want to show off image making to new people, i load up fooocus
From my understanding and all the bloody reading…. You can't batch render SVD either
It's just… 25 frames
you can feed it the last frame and the same seed, but the fidelity takes a nice knock
Is there a way to update styles on the fly? I always make custom styles and edit them in notepad++ but I have to restart Fooocus for them to take effect...
comfy can't do loop backs onto itself in the node graph afaik. you'd have to do it in code with the api
Meh tho.
or code one giant megalith node
Cool! But I think it's more work than it's worth
not that i know of. restarting is the key there
I don't know why I'm so hyper fixated on animating SDXL
vision
I'm not sure if that's a statement or a node
since several of you seem to be into video, is there anything else that does what ebsynth does? seems like not
What is that?
like give it 10 keyframes and it fills it in with intermediate frames
yeah theres tons of frame interpolation things
https://github.com/google-research/frame-interpolation i use this through deforum extension for my animations. its fast
ffmpeg has an 'minterpolate' flag, I was excited when I found it, but it's terrible
https://www.reddit.com/r/StableDiffusion/comments/1bfjn7d/tencent_announces_dynamicrafter_update/ theres this one too but i haven't tried it yet
that's impressive stuff, but are the keyframes it generates going to be enough to interpolate into smooth enough motions? It seems like there might still be a stilted motion effect from how spaced apart they are
FILM makes super smooth interpolations from animated diff generations
that's the one i usually do
the webui extension supports it if deforum is installed
it's really easy to over configure
and get suuuuper smooth video lol
It's great for certain types of motion, but unlike dyna it can't anticipate real world motions as well
We better make some copies of these Forge, A1111 repositories into our PCs. No one knows what will happen next especially with these regulations. I mean you remember how Automatic1111 was banned from github.
Kinda right for the animatediff extension. But the quality of the images from txt2img is kinda the same.
Imagine there will be a torrent for MidJourney v6. The company will go bankrupt :)))
But I really like the idea of torrents
Open sora
i could deadass do what midjourny is doing
Nutty
Did you ever think that we can make their own tools (especially the LLMs like chatgpt, claude3, copilot and gemini) to tell us how to leak these or to even leak inside information for us?
img quality is more up to the model than the UI.
automatic111 vs forge though, they'll make identical images with identical settings
3 days…. For 2 seconds of video
haven't heard of forge things move so fast, is comfyui still the best ui?
render at lower resolution and then use an upscaler
you wont be making FHD videos with animatediff, not for a bit
unless, you want those 2 day iterations
I am using Topaz Video AI with a RX 6600 XT fps is low if I upgrade to a 1080 TI would the fps in Topaz increase a decent amount??
wasn't that some stupid link which lead to some disgusting thing
I have lots to learn
i am too impatient to make videos, even if i had the gpu for it
That's why I want to build a workflow that kinda…. Steps through each process.
about 4s is the max video I'm willing to make, it gets crazy with all the multi-pass img2img, but it can be fun if you get tired of looking at static photos
The end goal. Is a video - video workflow
i mean i wouldn't mind doing each step, it is just the total amount of time to render it
hard to do that with one graph. pre work is required pretty heavy on videos to get good results. just hucking prompts at a video without hand crafting the guidance to the specific situation, gonna be messy.
Source - control net is probably one flow
you'd want to turn the video into a few different remaps. depth, openpose, canny, segments, you name it. theres lots of approaches
I do things by hand, I dont like any of the scripted solutions so far
turning that battletoad video into sonic was a fun experiment to do, but thats with minimal amount of effort and it shows
8 minutes of generated story line… is a lot that I bit off
uh, yah, that's heavy
Not in one sitting lol
you'd start with a storyboard of all the scene cuts, just like real production
Have it
Just need to bring it to reality
But I need to work on some 5-10 second clips at a batch
https://www.reddit.com/r/StableDiffusion/comments/18j0qgk/animatediffcontrolnet_team_just_released/ if you can figure this out it's nifty
Looking
if a 10second clip is a slow pan over a scene, consider generating one pic and panning over it
a lot of the high production value ai videos are a LOT of editing room efforts
The video I'm working on is essentially getting abducted by Alien space craft.
Old Westerns would be a good first AI feature film candidate to make. slow establishing shots and pan shots, large vistas, minimal sets, closeups of intense faces. quick swift action, little talk...
will smith stars in, eating spaghetti westerns
hahaha
could do it to the wild wild west song
hah, I blocked video for a while on reddit I was so sick of seeing smith
sd3 release when lol
what resolution is sd3 even trained on?
4096 4096
... many years until i'll be using sd3 then lol
lol i'm pretty sure it's a 1024x model too
i'm asking sleepy questions
for some reason i was still thinking about sora
probably because of sora
For all its greatness sora is still slow motion
i have yet to see actual normal time videos
https://arxiv.org/pdf/2403.03206.pdf the paper for sd3.
those are probably part of the turbo model that they're keeping for hollywood producers
Seems like render hell
i mean, slow motion is easily changed right? you'd just speed up the video at a higher frame rate, or drop some... no biggy
isnt all stable diffusion work?
Right
50 photos, 50 frames, it's all the same
I can't think of this in terms of video
I would want to render 60fps / at 10 seconds
Even tho 5 seconds of footage is almost too long in video edit world
why 60fps? 30 for video looks good
Post production workflow
i don't know why spend the time making it 60 though
I'm gonna drag all this stuff into Davinci when I'm done
all the movies you watch aren't above 30
I certainly dont need above that, because the quality of the current tools doesnt justify it
the model only knows 8fps video clips. so you render at 8fps then interpolate to get to 60. i set mine to make the final file a 60fps setting and interpolate 8 -10 times.
animate diff just generates the frames. ffmpeg stitches them into a file with the fps setting.
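To make the render-at-8-then-interpolate arithmetic above concrete, here is a small sketch. It assumes the interpolator inserts new frames between each pair of rendered frames, and the file names `frame_%05d.png` and `out.mp4` are placeholders. The ffmpeg command is only assembled as a list, not executed.

```python
def interpolated_frame_count(rendered_frames: int, factor: int) -> int:
    """Frames after inserting (factor - 1) new frames between each rendered pair."""
    if rendered_frames < 2:
        return rendered_frames
    return (rendered_frames - 1) * factor + 1

def clip_seconds(frames: int, fps: float) -> float:
    """Playback length of a clip at a given frame rate."""
    return frames / fps

def ffmpeg_stitch_command(pattern: str, fps: int, output: str) -> list[str]:
    """Build (but do not run) an ffmpeg call that joins numbered PNGs into a video."""
    return [
        "ffmpeg", "-framerate", str(fps), "-i", pattern,
        "-c:v", "libx264", "-pix_fmt", "yuv420p", output,
    ]

rendered = 16  # hypothetical number of frames from one animatediff run
smooth = interpolated_frame_count(rendered, factor=8)
print(clip_seconds(rendered, 8))         # length of the raw 8 fps clip
print(smooth, clip_seconds(smooth, 60))  # frame count and length at 60 fps
print(" ".join(ffmpeg_stitch_command("frame_%05d.png", 60, "out.mp4")))
```

With these numbers, 16 frames at 8 fps is 2 seconds, and interpolating by a factor of 8 gives 121 frames, which is still about 2 seconds at 60 fps. The motion keeps its speed and only gets smoother.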
movies, like the avengers, not you jumping out of a plane lol and skydiving makes sense because it is fast moving, can slow it down
Final render is usually 24fps / 30fps
But the initial capture you want more frames to scrub and drop.
hobbit and avatar 2 ackshually
i often run interpolation algorithms on my movies at home too cause i love crispy smooth frames. i just got fast eyes and it looks better to me
idk about what it was
there is exceptions to every rule
i think the rerelease of avatar1 was 45fps too
45 seems like an odd number
wonder what dune2 was, that was very nice in imax
so is 24 really
good question
the only reason that movies haven't gone full high frame rate across the board is because studios can still save a lot of money using lower frames and people seem to be fine with it
higher frame rates just look better though. objectively
That and the industry huffs at over 24fps
48 fps, double the 24fps standard
saving money is the only reason to
You always want to film at double the resolution of your final edit since you can't always go back and re-shoot
I'm treating this…. Like film
And I don't know that I should lol
so from my perspective, if I come across frames that are bad, I just go back and re-img2img those
pretty soon cameras are going to have entirely electronic shutters too. solid state shutters are gonna change everything
seems better than doing 2x or 3x the renders just to throw them away
Mine has no shutter
I think it's a 16bit 4k, which would make stupid good video to drop in for models
sony i think, just released a photo boy, not a video one, that has a crazy electronic shutter. no rolling. all the pixels come on and off at the same time
sexy
I may jump with redbull, but I don't get redbull money lol
Anyways. Depth gen looks good
I imagine there is a specific size this video should be in order to pass correctly, or does that not matter.
input video should be the same size as the final render imo
Drifting over a YouTubers workflow and it looks like hell lol
This week i am cursed with SDXL
In what way?
Cursed images
Could be enjoying the nightmare that is video
lol
Can't wait for my epic stuff to finally exist
here's a cool creative work being shown off https://v.redd.it/zd685tn9toqc1
pretty sure Mr. SECourses guy asked me a question under a different username in the comments of that video
cloud question: should I stick with runpod or should I switch up to something else? I occasionally do video workflows, so having access to high RAM environments is nice
then remake a video and publish it as if he's the one providing all the value. yeah he's been called out on that a few times and is very likely using alts now
lol
how did you imported lol oh i see what you did there
mmmhmm
i dont know why but RM and this are so tightly connected in my brain. can't think of one without the other https://www.youtube.com/watch?v=oHRUbWGRYqU
i have a question, when it comes to loras is there any extension or plugin that allows sorting? it would be amazing if i have a character, pose, background etc for loras that i can move and keep organized
you can create subfolders to keep everything clean and tidy
For example, I go:
|- SD15
|--- Artists
|--- Character_Anime
|--- Character_Realistic
|--- Facial_Expressions
|--- LCM
|--- Location_Background
|--- etc
|- SDXL
|---- etc
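If you would rather script that layout than click it together, something like the sketch below could do it. The folder names are copied from the message above and the "etc" entries are left out, not guessed. The demo runs in a throwaway temp directory, so `demo_root` is a stand-in for wherever your LoRA folder actually lives.

```python
from pathlib import Path
import tempfile

# Folder names copied from the message above; the "etc" entries are left out.
LAYOUT = {
    "SD15": [
        "Artists",
        "Character_Anime",
        "Character_Realistic",
        "Facial_Expressions",
        "LCM",
        "Location_Background",
    ],
    "SDXL": [],
}

def build_lora_tree(root: Path, layout: dict[str, list[str]]) -> list[Path]:
    """Create the nested folders under root and return them sorted."""
    created = []
    for base, categories in layout.items():
        base_dir = root / base
        base_dir.mkdir(parents=True, exist_ok=True)
        created.append(base_dir)
        for category in categories:
            sub = base_dir / category
            sub.mkdir(exist_ok=True)
            created.append(sub)
    return sorted(created)

# Demo in a throwaway directory; point root at your own LoRA folder instead.
demo_root = Path(tempfile.mkdtemp())
for folder in build_lora_tree(demo_root, LAYOUT):
    print(folder.relative_to(demo_root))
```

Running it twice is harmless because existing folders are left alone.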
we need a webui in the form of an unreal engine-powered supermall where makers have their own shops, etc.
because image diffusion is basically just a mall of options
Would making sub folders get me more tabs tho in A1111
the subfolders dont create the code to make new tabs sadly
So it only helps downloading and sorting symlinks
skeu-spatiomorphism?
That should be a extension ngl
by that measure ComfyUI should look like the interior of a steam engine that Rick found out of some random dimensional portal
no but close enough (check what it looks like in #general-with-images cause we can't post images in here)
^ any comfy workflow that involves 30+ modules
gotchu right now i use swarm and A1111 with lobe hub and its really similar
i just wish there was a extension that would allow folders inside the lora bar or just make new tabs in general
bro my lora is pulling a gemini moment
every time i generate a couple the guy is black
yeah swarm is a nice solution too to manage many models, loras, etc
Does anyone know the size of the dictionary the tokenizer SD3 use? Or what is its native prompt max_tokens without any chunk-breaking tricks?
is sd3 even out?!?
Some people who have access to it or people who have connections to StabilityAI could reply.
I get the feeling those folks are all living in their own bubble right now.
@floral nimbus Hey i have a question to ask you in private, also SD3 isn't for free use I believe until 3 months later
i am doubtful the largest SD3 model is going to be released
i think it's kind of donezo
they may release the small and medium models for free use, and possibly without the better conditioning. i am not sure how the aesthetics will compare.
it doesn't take a genius to see that Emad departed over specifically the fully open release of this upcoming model. it is viewed as a valuable asset, but on the other hand, bing image creator is completely free, so i don't know how valuable it is in reality.
I DM
yep, the heat is on. and it's the kind of heat that words don't necessarily form from, just actions.
that's how on the line everything is.
I know this goes without saying… but I'll say it. I appreciate the hell out of the community of artists who do this stuff. Those that keep it open source get an extra high five, I wanna buy you a cup of coffee. I know there needs to be some housekeeping and such for all the reasons…. But the tools that exist to allow us to bring nightmare fuel to life….. well…. That's something.
Carry on.
yea that's wild how people keep asking for it, even tho it's not gonna be open source right away, just let em cook, nothing gets done good in a hurry 
Current Stand-in CEO says the plan for SD3's open release w/ weights has not changed
Let's save the doom and gloom for when there's a good reason

whens the release?
4-6 weeks allegedly
and I still didn't get accepted from the watchlist XD
@oblique jay
Hi, nice to meet you.
other AI tts thingies are pretty far away from 11labs, unfortunately
hello
any news for sd3, when?
Postponed to 2024
That's this year!
good afternoon, how to generate?
Light a candle and clap your hands 5 times then inhale and do 2 cartwheels.
what's this?
how to use?
guys, i just saw https://arcads.ai
it's a service to create AI video marketing.
do you have a recommendation opensource tool as alternative?
i have RTX 4080, currently running A1111
Not here at the moment, but you can use for example https://leonardo.ai/
hi friends, can I pm anyone regarding image generation? I have png info but I can't recreate the photo for some reason. Plz help ty!
cf #tech-support message (link to a specific message)
a good computer and a1111/forge/comfyui
Hi friends, I am new on this server.
I have been playing with stable diffusion 2.1 last year and I really liked it.
Now I am very excited to try out stable diffusion 3 and I just enrolled in the waiting list.
Did any of you already have access ? Or are people going to have access when the model is launched, if so, when would be the launch date in your opinion?
New CEO posted 4 to 6 weeks ETA for weights to be released.
so in the BEST case, the end of April
Mid April Copium
yeah I was naive enough to think it'd be mid or early april
lmao
now its more like End April-Mid May
if not later
Yeah, I wish they'd just release SD3 and call it a day
It'll suck on release anyway lol
it seems they really have a lot of work left to do
controlnets, optimizations, final training pass with DPO and RLHF, etc
If they can get control nets to work that'd be awesome
and I thought it was really close cause the pics from Lykon and Comfy looked amazing for a base model
The thing that makes it amazing in my eyes is its token limit and prompt adherence
The fingers and hands are still a bit messy though
I bet in the future we're gonna be crying about ONLY having 512 tokens 
someone said it will work with 8 gb VRAM
the 8B???
Lol, a picture is worth a thousand tokens
i dont believe it tbh lol
I saw a stability staff write on reddit that Comfy is targeting 8GB
but im not so sure about that goal
There will be a version that's around 5-6 GB Vram with T5
yeah that sounds like the 2B
that same stability staff said that 8B is the most fun
And then there'll be the gigachad version everyone will be training on that's probably 12-16GB vram
the competition will be interesting in the future
DALL-E 3 is supposedly getting inpaint as well apparently
I wish that the 8B would work with 12GB + highresfix
but I just don't know about the 8B MMDiT weight being able to fit on only 12GB, even at fp16
maybe fp8 will help?
yeah SD3 will get inpaint and edit models apparently, though idk if those were test only in the paper
yeah the 40 series users only
the rest of us get a massive slowdown
you mean D3?
SD3
i meant DALL-E
SD3 will very likely get those addons as well xD
yeah I hope so
I just hope it won't be separate models cause these 8B model will be MASSIVE in file size
i wont be able to play with those anyway except its implemented in Alpaca plugin for Photoshop
i dont feel like paying for another software right now tho...at least not generative AI one
You'll only need a few I think. When SD3 becomes well established I'm deleting all of my 1.5 and SDXL models/loras
It'll probably just be dreamshaper, Pony, and maybe Zavychroma or another top-tier checkpoint that comes out
oooh I can only IMAGINE those models
I hope the massive finetunes like DreamShaper and etc will give us expressions and actions with detailed dataset captions
God I hope so too
One of the good points of 8B is it'll probably have most expressions and actions in-built already
Rather than fixing the model, hopefully fine-tunes will just be guiding it towards a certain direction
chili peppered tronisanator
Hi everyone, i am trying to use the best resolution for controlnet, for my image2image. in A1111, the resolution is in multiples of 8, while in comfyui, it is in multiples of 64.
is there a node for me to use controlnet in multiples of 8? how does controlnet actually work?
I will prefer to use multiples of 8 as i can get a depthmap to match my original img. the shape of my original image is very important and i want to try not to have it off by even 1 pixel.
Hi guys, there used to be a channel specifically for artists who use stable diffusion, is that still a channel? I can't find it
i think the artists-channel is gone
yup that's gone since probably a year or so. Nowadays you can share stuff in the Dreamer communities forum thingy, #anime, #general-with-images, etc
it depends on what you want exactly
yeah, the SD3 architecture will already be fully compatible with image inputs according to the paper
though I wonder what use will T5 have when doing image conditioning, as T5 can only do conditioning on text
dont ever leave a '>' off the lora tag in a prompt, caused all kinds of chaos before I found it
how have the community generations been going for SD3? has anyone discussed trends in what they're seeing?
Hello, nice to meet you too.
Anyone familiar with getting SD up and running inside something like GIMP with inpainting? I see several plugins that claim to do this, wondering if there's a leader in that space
Hi all, can you tell me who is using what? (Free) and not using PC power.
Do you remember Stable Diffusion?
There are also German users here
I tried to find a wind powered version but doesn't look like it's there yet
i tried to find one powered by zero point energy but apparently i need moon helium for that soooooo i guess i have to build a space railway now
You should also be praying for me.
I have 10GB
I don't even know how that will be possible. Lol!
Insanity levels of engineering required to achieve these results.
But all the more to them, if they can pull it off.
I have trust in 8B, not much in 2B
for as much as possible from a data perspective
2B with the T5 encoder might be very diverse still.
if you're working with those limitations
it's either a massive 8B model, or something that's smaller than SDXL (and another one that's smaller than 1.X)
but yeah
it could be possible if ComfyUI splits the model between RAM and VRAM efficiently, it certainly won't be as fast as if the entire model is in VRAM, but it will probably be functional
AI is like a giant whale at the bottom of the ocean right now: the little fish (us) are sweeping in to have our share, and as much of it as we can glean. But pretty soon our shares will be shrink-wrapped and come from stores.
well yeah that was my other guess, where we use the Turbo model to cope with the massive slowdown
I just want to know how much VRAM the 8B MMDiT model will take eventually
like alone on itself, cause we know that T5 will be loaded separately
speed (processing time) versus quality (data heterogeneity) is going to become the driving balance for all this stuff, isn't it
as it economically evolves more
not necessarily, there have been optimizations in the past that speed up inference without losing any of the weights or affecting outputs
albeit in the same playing field.
but you're not wrong!
Justice shall be done
cant you just right-click their post and hit report?
^
I'm also fine with group therapy, lol
it's super effective, take it from me.
(they don't invest in couches)
kevin pollack has a great stand-up routine where he's talking about a football ref needing to unload, and just blowing his whistle and talking to the audience
Is it normal for sdxl lora training to take over 3 days to finish?
Got an rtx 3090 and it's 250 images between 1024x1024 and vertical or horizontal of 1024x1536
I recall training this on 1.5 taking only about 6 hours or so.
With Rank 512 and 250,000 steps at batch size 1 maybe. Otherwise, definitely no.
Did you solve your problem?
No, but that's alright
I would think an hour at most on a 3090, how many steps? figure if you're doing about 10 repeats, that's 2500, are you using reg images? that'll double the time
can anyone recommend a good place to download some better trained models? I know of huggingface, but... it's not the friendliest to use
so if you're doing 100 repeats and 10 epochs, the number of steps goes up dramatically
like add a few zeros
repeats should be small though, control the steps with epochs (full batches) vs repeats, unless you are doing multiple concept training or reg images, in which case the repeats helps balance out the training
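The step arithmetic in this thread can be sketched in a few lines. This assumes the common convention that total steps = images x repeats x epochs / batch size; the helper name `total_steps` is hypothetical, and individual trainers may count differently, for example when regularization images are enabled.

```python
import math


def total_steps(images: int, repeats: int, epochs: int, batch_size: int = 1) -> int:
    """Optimizer steps for a run, assuming steps = images * repeats * epochs / batch size."""
    steps_per_epoch = math.ceil(images * repeats / batch_size)
    return steps_per_epoch * epochs


# 250 images, 10 repeats, 1 epoch, batch size 1
print(total_steps(250, 10, 1))
# 250 images, 100 repeats, 10 epochs: "add a few zeros"
print(total_steps(250, 100, 10))
```

With 250 images this gives 2,500 steps for 10 repeats in one epoch and 250,000 steps for 100 repeats over 10 epochs, which is why keeping repeats small matters.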
just, be careful of all the smut there. some of it is shocking if you don't know what you're getting yourself into.
I only take clean stuff
anyone tried ways of monetizing the AI art? Just out of curiosity. You don't have to tell the methods. Only "yes" and if it worked.
I wasted my time with etsy
Hii! Do you guys know any node in comfyUI that receives an image and, based on the size of that image, outputs the closest width and height recommended for SD 1.5?
I found one that works for SDXL, it's called "NearestSDXLResolution", part of the "ComfyMath" node pack, but the author didn't include an SD 1.5 option and he seems to no longer work on the project
I use Image resize.
I'll paste a picture in the other general with images channel
sd1.5 doesnt have recommended sizes
anything mod 8 works
recommended size is 512x512
512x512 is train size, most models are trained on 768x768 max
sd1.5 breaks when above 768x
sorry, but what does mod 8 mean?
@trail lion @crude notch gotcha! Thanks for the info :D
np ;3
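On the "mod 8" question: it means the width and height each divide evenly by 8 (`size % 8 == 0`). A minimal sketch of what a "nearest SD 1.5 resolution" helper could look like follows. The names `snap` and `fit_sd15` are hypothetical, and the 512x512 target area is taken from the training size mentioned in the thread; this is not the ComfyMath node's actual logic.

```python
def snap(value: float, multiple: int = 8) -> int:
    """Round value to the nearest multiple, never going below one multiple."""
    return max(multiple, round(value / multiple) * multiple)


def fit_sd15(width: int, height: int, target_area: int = 512 * 512, multiple: int = 8):
    """Scale (width, height) to roughly target_area, keep the aspect ratio, snap both sides."""
    scale = (target_area / (width * height)) ** 0.5
    return snap(width * scale, multiple), snap(height * scale, multiple)


print(512 % 8 == 0)          # 512 is "mod 8" clean
print(snap(517))             # nearest multiple of 8 to 517
print(fit_sd15(1920, 1080))  # 16:9 input scaled to about 512x512 worth of pixels
```

Passing `multiple=64` would give the coarser grid described earlier for ComfyUI.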
Xformers?
xformers is not entirely lossless. Even generating the same image with the same seed gives a tiny bit of variation. I had one case where the character sometimes had closed eyes and sometimes open ones.
^
I know about the seeds, but i didn't know that it decreases quality?
not necessarily losing quality, but altering the image a tiny bit
Ah i see. I guess which is why it's so widely used.
however, it also could be a janky implementation of it in Auto1111
it's widely used because it's so good with memory, but only works on nvidia, so meh
gonna try running this "aniportraits" code today. lets see what happens
Is there a simple way to create game assets like objects, people, and such, kinda like spritesheets, with ai? i use comfyui so there is that....
ive possibly asked a similar question?
simple? nope. reliable? not really.
well ok....
I think you'd do well with these new img to 3d models coming out. Make a simple 3d character, take it into blender, fix it all up and rig it, generate a sprite sheet animation from that, then fix it again
push button game development isn't here. you still need so much passion for your project that the legwork is beautiful to you.
wouldnt that only work for 3d stuff? like how could i also achieve 2d? im fine with 3d but ive been wondering mainly about 2d.
3d models can render into 2d images. You can then paint over those 2d images to make them pixel art if you want
the 3d model in this case would be more like a rough scaffolding for your final product. accelerating your content creation still
https://civitai.com/models/129057/pixel-art-sprite-diffusion-safetensors theres models like this too. lots of options. nothing "easy" or reliable though. a lot of fiddling
I see, well thank you wise person for letting me know this info.
bad thing is i run stable diffusion on cpu so im not sure what will happen or how much longer things will take for 3d.
yup, you'll eventually probably need better local hardware, or start leveraging cloud compute
I do have a gpu, it just doesnt run ai. ive tried everything to get around this but nothing works, so ill just use cpu for now until i can afford/get better hardware.
yeah im that dude with the amd 480, 8gb vram, im surprised you all remembered, but it basically crashes my pc when running ai, like suddenly my screen just turns black and i have to reboot my pc. its so annoying.
think i encouraged you to try linux last time but i fully understand why you're not into that. it's a steep learning hike
oh i forgot to try linux, im so sorry i forgot.
my vega64 8gb would do diffusion images in a minute . that was in october 2022
sdp wasn't out yet then
is there a linux os you would suggest for dual booting? by chance.
manjaro and garuda are the two downstream arch distros i was riding for a year. i've heard good things about linux mint, downstream from ubuntu
couple years actually
no. there was an optimization NVIDIA made a while ago that wrote a model in an "engine" format, which has the same precision as the original PyTorch checkpoint, except it runs much faster than the original checkpoint
tensor RT models. you can only use them for one specific resolution iirc
Is arch really that good? because i believe i tried plain arch once but i couldnt even set it up.
its a flavor is all. hard to say what linux is better than another, because a lot of it is up to the end user
the distros i used were easy to install. arch is a rolling release unlike others
that makes sense i might have to research then.
i still use steam OS if you count that as arch
When NVIDIA made their own implementation for SD they made it possible to compile "dynamic" engines that had a range of batch sizes and resolutions that it can be used for
the dynamic models i tested didn't get nearly the same speed benefits. less than half iirc
so thanks for reminding me about linux and such.
https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Install-and-Run-on-AMD-GPUs#install-on-amd-and-arch-linux i remember finding this page and gasping in excitement. "This is Arch. I know this!"
I don't know, I haven't used it in a while so I can't confirm or deny this. I do remember the dynamic engines just taking longer to compile though
sometimes you just gotta flex your inner dunst you know?
i get that.
God...
SD3 for Ideogram/DALLE3 will be the SD1.4 for DALLE2
I get to experience a massive leap in AI once again
well, nevertheless, it was still a way to optimize inference without having any impact on the model's outputs, so I guess if quantization won't be a thing with SD3, that's the direction we'll be heading
or just keep SD1.5 alive as we saw happen many times before
Hello, is it no longer possible to create images here?
what is your goal?
what kind of game are you trying to make?
I just tried this with a fresh A1111 install, pure PyTorch did 10.2it/s, a dynamic TensorRT engine did 31.4it/s, a static engine did 35.1it/s. the dynamic engine is indeed slightly slower, but both were multiple times faster than a regular inference while having the same outputs
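To put the "multiple times faster" claim into numbers, the reported rates can be turned into speedup factors and per-image times. The rates are the ones quoted above; the 30-step count is an assumption for illustration.

```python
baseline = 10.2  # it/s, pure PyTorch (reported above)
engines = {"dynamic TensorRT": 31.4, "static TensorRT": 35.1}  # it/s (reported above)
steps = 30  # assumed sampling steps per image

# Speedup of each engine relative to plain PyTorch inference
speedup = {name: rate / baseline for name, rate in engines.items()}

print(f"pure PyTorch: {steps / baseline:.2f}s per image")
for name, rate in engines.items():
    print(f"{name}: {speedup[name]:.2f}x faster, {steps / rate:.2f}s per image")
```

That works out to roughly 3.1x for the dynamic engine and 3.4x for the static one.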
hello all! im having a hard time stomping a bug. im running stable video diffusion locally and for some reason it works with some images and not others. i get an UnboundLocalError about the input image. I've checked everything i can find and can't figure out why this image won't work
Hey everyone.
maybe check in the techsupport channel, but I'd check your version and maybe update, that seems like a code-related issue vs something you're doing or an issue with the image
also if you're tossing external images in there, know SD likes certain resolutions
i tried cropping the images. ironically the images im having problems with are also being generated locally from stable diffusion xl
I think we might need to wait until SD4 most likely to make good character spritesheets
the tech seems to still be far from that
there's this technique https://cobaltexplorer.com/2023/06/character-sheets-for-stable-diffusion/
Who has some good tips on prompting in Stable Diffusion 3? I feel like it's harder to get a specific style than previous versions.
nobody has it yet, so you're not going to get much help
ah okay. i got access to the Stable Assistant today, it's where i'm using it. thought more people have gotten it.
yeah, maybe SD3 can do it properly if the community tries to make something
Hello, this is a random question for any experts on training ai models. Do you know if it is possible to build a model using videos as training data (rather than stills). I'm not talking about video to video, but rather the training process. Thanks
OpenAI did it with Sora so probably yes Edit: I misunderstood your question. Afaik, this is done for animatediff loras which can be used to generate images but I haven't seen anyone use videos to train image gens directly. In either case, training an image gen with videos seems pointlessly complicated and somewhat counter-productive with the current AI tech we have
Hey, is anyone online?
does anyone here use hugging face for creating SD images?
I'm trying to figure out a way to create images using FOSS that doesn't use my old mac
I use Diffusers some of the time, on Apple Silicon and Colab on occasion.
Hi
You mean on your mac?
Just made a bot, trained my own model to work with nsfw, not using stability tho,
Yes on my Mac and sometimes using Google's Colab system
made a bot? what do you mean?
made a discord bot, that will generate images on prompt
I don't have an M1, I have an older mac will it work?
Oh, man, that's interesting. I use midjourney on my server, so I can make a bot that will run. Can you give me details on that?
It might, but PyTorch support for the non-Apple GPUs is a bit spotty and deprecated, so you would have to try it
just dm me,
Yeah, that's what I thought.
Can you give me your setup? I might just get a used M1 mac and try to make it work. The more details the better so that I dont screw it up.
What model architecture do you think Ideogram is using?
It looks pretty close to what i would call a custom finetune of SD3. Some of the outputs with the same prompts from SD3 look really close to each other in composition.
Sometimes I think it performs like a ~2B, but other times it looks as capable as a ~8B
idk
@charred mesa Ikr!? Wtf. It also appeared around the time after SD3 was announced. (I think. Correct me if i'm wrong)
It makes images so close to SD3. Idk what is going on there.
Idk about the architecture behind it, it could be DiT, or just UNet with a heavily captioned dataset, no idea...
yh..
are there any live coding/working with SD streams which people can recommend?
how should I go about trying to use 2 character loras in one image without them blending into each other? like is there a way to tell the program where to use which features?
you can try using them one at a time via in-painting
I was using no reg images. I was using 100 steps per image though. But I don't think that was even the problem. It was going WAAAY slower than normal. Like each step took 30 - 60 seconds.
wow, yah, something definitely wrong there
Hi! I just joined
You exceeded your VRAM and were using shared RAM, which slows everything down dramatically. Add an Nvidia control panel profile for the embedded python.exe and set the cuda fallback policy to not use system memory. Try training again and it might work correctly or it will give you an out of vram error
Either that or you're doing training at 32 bit instead of 16 or bfloat16
Or your batch size is too high
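For scale, here is what 30 to 60 seconds per step adds up to. This assumes "100 steps per image" means 250 x 100 = 25,000 steps at batch size 1 for a single epoch; the helper `eta_hours` is hypothetical.

```python
def eta_hours(steps: int, seconds_per_step: float) -> float:
    """Wall-clock training time in hours for a given step count and step duration."""
    return steps * seconds_per_step / 3600


steps = 250 * 100  # 250 images, 100 steps each (assumed: one epoch, batch size 1)
for sec in (30, 60):
    hours = eta_hours(steps, sec)
    print(f"{sec}s/step: {hours:.0f} hours (~{hours / 24:.1f} days)")
```

At those step times the run lands in the range of days rather than hours, which is consistent with something like the memory spill described above rather than with a normal 3090 run.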
so
in auto webui? not that i know of
not likely in txt2img. could try regional prompt, but regardless you'll be fixing it in img2img
Yooo
I need to convert an image into line art,
the tutorial says to use ControlNet
then I tried to configure the parameters as the tutorial told me, but the results definitely look wrong
the shapes don't even match the pose at all
based on the detailed information provided i think it's gotta be the flux capacitor. try shoving a large dill pickle in it
I mean I'm chums with some of the guys who work at stability.ai and I never got an invite. Just sayin...
Yeah who knows
I signed up in hocus
Hours
Fully expect to not get an invite until June the way this has been going
wait is there any AI services out there for converting image to sketch ?
Probably but I wouldn't know
I do pretty much everything in comfyui
Getting the output you mentioned is pretty trivial in there
I don't use a1111 for anything except lycoris-ia3
Pretty sure there are some
just an image to line art sketch
Wait I know what you mean. BRB
fingerscrossed
Sure, I have a GitHub repo for both the Mac stuff (targeted at an 8GB M1: fine for SD1.5 and SD 2.x, not good for SDXL due to swap usage, but it works) and Colab.
Now that I have a 24GB M3 I use the Colab scripts with a find and replace of 'cuda' with 'mps'. You'll be better off starting there, as the 8GB M1 scripts have aged badly and need the fp16 madebyollin VAE added to replace the default VAE.
https://github.com/Vargol/StableDiffusionColabs
https://github.com/Vargol/8GB_M1_Diffusers_Scripts
My Setup is basically , install python 3.10 or 3.11 from macports
create a venv somewhere
cd my_directory
python3 -m venv Diffusers
cd Diffusers
. bin/activate
pip install diffusers accelerate transformers
That should get you good to go
If you close Terminal, you'll need to reactivate the venv
cd my_directory/Diffusers
. bin/activate
Starter SDXL diffusers script
import gc
import random
import sys

import torch
from diffusers import AutoencoderKL, DiffusionPipeline

prompt = "A film still of a close up of A red haired woman standing in a lush green jungle"
negative_prompt = "painting, drawing, illustration, glitch, deformed, mutated, cross-eyed, ugly, disfigured"
use_refiner = False

# fp16-fixed VAE, used in place of the default SDXL VAE
vae = AutoencoderKL.from_pretrained(
    "madebyollin/sdxl-vae-fp16-fix",
    torch_dtype=torch.float16,
    force_upcast=False,
).to("mps")

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16",
    vae=vae,
)
pipe.to("mps")
# pipe.enable_sequential_cpu_offload()
pipe.enable_vae_tiling()

seed = random.randint(0, sys.maxsize)
images = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,  # defined above but not passed in the original paste
    # hand latents to the refiner if it is enabled, otherwise decode to PIL images
    output_type="latent" if use_refiner else "pil",
    generator=torch.Generator("mps").manual_seed(seed),
    num_inference_steps=30,
).images

if use_refiner:
    # free the base pipeline before loading the refiner
    pipe = None
    gc.collect()
    torch.mps.empty_cache()

    refiner = DiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-refiner-1.0",
        vae=vae,
        torch_dtype=torch.float16,
        use_safetensors=True,
        variant="fp16",
    ).to("mps")
    refiner.enable_vae_tiling()
    images = refiner(
        prompt=prompt,
        negative_prompt=negative_prompt,
        image=images,
    ).images

images[0].save("sdxl.png")
Hopefully your card supports fp16, if not change float16 to float32
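Instead of a find and replace of 'cuda' with 'mps', the device and dtype could be chosen at runtime. This is a minimal sketch, not part of the linked repos: `pick_device` and `pick_dtype_name` are hypothetical helpers, and the guarded import only exists so the snippet still runs where PyTorch is not installed.

```python
try:
    import torch
except ImportError:  # lets the sketch run without PyTorch installed
    torch = None


def pick_device() -> str:
    """Prefer CUDA, then Apple's MPS backend, then fall back to the CPU."""
    if torch is None:
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)  # absent on older PyTorch builds
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


def pick_dtype_name(device: str) -> str:
    """Half precision on GPU backends, float32 on CPU (an assumption, per the fp16 note above)."""
    return "float16" if device in ("cuda", "mps") else "float32"


device = pick_device()
print(device, pick_dtype_name(device))
```

In the starter script, `device` would replace the hard-coded 'mps' strings and `getattr(torch, pick_dtype_name(device))` would replace `torch.float16`.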
Another day another no SD3 invite
Does steam now allow stable diffusion art? How can we prove it's not using any copyrighted content?
Does anyone know how to lower vram usage with ultimate SD upscale?
I'm using comfyui if that makes a difference
hi, I want to learn stable diffusion deeply: not how to use it, but the underlying principles. Where should I start, and what are the advanced topics? thank you.
I want to build a UI where people can enter a prompt to generate images using the stability API. Do I need to apply content moderation to the prompts before sending them to the API? it says in the TOS that moderation is already applied to the prompt but do I need to implement an additional layer of moderation?
I think the endgame of Stable Diffusion once it can always make great pictures without mistakes could be being incorporated into games for full customization. The dev of a VN could make it so you can make a prompt for the main character to be literally whatever you want for example.
Is anyone able to help? I've installed stable cascade, however when running a prompt it gives me an error "AttributeError: module diffusers has no attribute StableCascadeUNet". I'm using A1111
Same in Forge here ... I think it didn't complete the download o.O
Couldn't see a fix on google
In my case I'd think about a reinstall ... but that's the problem with A.I. at the moment ... a small thing changes and others get problems ... always hard to solve ...
Just gave that a go, still the same issue. I'll wait and see if someone has a solution online
If you have the space on disk you could try stable diffusion forge. It's like A1111 but more components preinstalled and it (was/is) a bit faster.
I'll give that a go in a bit then
Sometimes a small update of one component just cranks up the system ...
hi
Hi
So, anybody else as worried as I am about the rumors of Microsoft acquiring Stability?
its fake its gonna be tencent trust me (real)
Well, any major company acquiring Stability isn't good. But I know they need the cash. Real shame though.
I'm kinda taking Emad at his word that SD3 would be the company's last model. He clearly knew that the jig was up.
yup, think you're right.
I just hope the SD3 weights get released before Stability implodes
me too!
we will
the new CTO said 4-6 weeks ETA
Our plan is to soon release the API first to collect more human preference data and validate that our safety improvements don't cause the quality to suffer. Then we'll do some more fine-tuning (DPO/SFT) and release the weights and source code. Current ETA is 4-6 weeks.
CTO posted on March 25th
So it's gonna be fine
plans can change, especially if company is acquired and new leadership is installed with new strategies
it's not impossible that Stability is taken over by someone who plans to keep SD3 closed and instead run it as a midjourney-like service
ok I understand the pessimism elsewhere when it comes to SD3 but not this, they 100% will release the models
They won't get bought out in the last second
maybe AFTER SD3, sure
I sure hope you're right!
I'm probably too pessimistic. Emad owns enough stock to block any takeover, at least for a while
If they offer to buy out the company and want to hand emad countless millions for his shares, there is no chance he doesn't take it
Everyone has a price to sell out.
I don't understand how i can generate picture, someone can help me please ?
heya, where can i download stable diffusion to generate some images?:)
https://github.com/lllyasviel/Fooocus is a good place to start
tyty
i
You can't make images here right now, but you can try Stable Diffusion at https://leonardo.ai/
oh really? why not? did they close it?
The bot is down, probably will come up but for the closed beta of stable diff3. You can download interfaces like forge or automatic1111 plus the base model and create locally
Hello everyone I am looking for an application for TTS with French/English/Japanese AI voices, I have already looked for some applications but they are not perfect for example I would like a wide range of choices for the voice, I want good French voices. sorry I don't know if this is the right place to ask my question, if it is the wrong one I will post my message somewhere else.
You can go to github to see if there are any related projects.
I'm not comfortable with the github interface but thanks
Is there a reason why SDXL Turbo is not available via API?
Well, they just have their full model available on the api.
Gotcha!
And it's fast. I get back 6 simultaneous requests in 15 seconds.
Interesting, that's good to know
Are there ETAs of the "Coming Soon" models? Not SD3 but other ones like Stable Audio? I don't utilize SD enough to do the monthly payment stuff.
Unknown. I've only looked at the image side of it
Not that I've seen, only ones that have been. Getting release dates out of them for anything seems impossible. They're not a big enough company for that.
How to get it?
I dont know where the api is
Oh wait
SDXL
I read sd3 lol
Hah yeah, it's not sd3 on there yet
Have you experimented with Core yet?
Are there any substantial differences?
I have. Give me a prompt for it and I'll give you the output from core.
The more I look at Ideogram images the more they start to look like 2.1
there's something about them
character reference sheet of 3d male adventurer in pixar style
I don't know if there needs to be something more specific than this
I appreciate it. In your experience is there a strong difference between SDXL and Core?
Ok I sent you a message in general with images channel
Looks the same so far
Do you run this locally or do you run this through an API as well
Locally, although you could just as easily throw it at ChatGPT or perplexity apis
The secret sauce is really the instruction.
Any of the models can handle it above a certain size. Mistral 7b will mostly do it, but it's usually too long for the 77 token context length that sdxl needs. Mixtral does it right and the big guys will easily do it
Do you use a quantized version of Mixtral?
Also do you mean that anything above 77 tokens would be too long, or do you need something that long or above to get a good response?
well there's gguf (plus quantization of course)
That makes the data less
Yeah that's what I was curious about, cause unquantized was eighty six last time I saw with quantized 4 bit being 24 GB
Yes
Do you know what quantisation is
It's basically removing some of the data to make it less
So how does this work?
You can't Quant image models in the same sense
You can just lower the resolution of the images
Well, quantization with LLMs will just lower the precision, if that's what you mean. Since mixtral 8x7 is 16 bit at 90 GB VRAM, lowering the model to 4 bit will allow it to be around 24 GB VRAM.
But the fact about the ASCII art is interesting, I didn't know that
how does what work
Lower precision = removing pieces of data
It's like taking a sentence and removing a few words but making sure the rest of the sentence makes sense
Or taking. A detailed paragraph and shortening it
I'm using mixtral q8, so it's about 46 gigs.
I get what you're saying, I just don't think I view it in that fashion, since it doesn't automatically guarantee a loss of data. It just means that the parameters (weights and activations) just are less granular, which may or may not result in information lost.
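The "less granular" point can be illustrated with a toy round trip: map floats to 8-bit integers and back, then look at the error. This is a simple symmetric per-tensor scheme written for illustration only; it is not how GGUF or any specific LLM quantization format works, and the sample weights are made up.

```python
def quantize_int8(weights):
    """Symmetric quantization: map floats onto integers in [-127, 127] with one shared scale."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # fall back to 1.0 if all weights are zero
    return [round(w / scale) for w in weights], scale


def dequantize(q, scale):
    """Map the integers back to approximate float values."""
    return [v * scale for v in q]


weights = [0.8123, -0.4071, 0.0032, 0.2999, -0.9954]  # made-up example values
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))

print(q)
print(f"largest round-trip error: {max_err:.5f} (grid spacing {scale:.5f})")
```

Every weight is still there after the round trip; each one has just been nudged to the nearest point on a coarser grid, by at most half the grid spacing.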
Full size is 96 gigs
Gotcha, that makes sense! What is your rig, 2x4090?
Or do you run something else
For llm, I'm using an m2 Mac so it's easy to load big models.
When the m3 comes out I'll get a really big one to run full size models
But you can run mistral 7b q8 on any 10gig nvidia card.
It's "good enough" just not ideal. You'll lose details if the prompts are too long.
Want my prompt instruction?
Sure I'd love to see them
You just do this through Lazy Loading in Pytorch or some other method?
Ok, it won't let me paste it. It's too long.
I'll give you the short version that doesn't give it training examples.
Or I'll trim some out.
I'd like to run a lot of these things locally, but I unfortunately bought a 5700XT a few years back.
You're welcome to DM me it
In parts if it's too long
Limiting your response to 50 words, act as a creative agent who generates a very terse but highly creative image prompt derived from the prompt I send you. Include descriptive visual elements of the subject, lighting and surroundings. Specify an artistic style or camera settings at the beginning of the sentence, using descriptive elements that pertain to this artistic style. Include no more than 10 elements presented as discrete descriptors in one long sentence without story. Put the most important descriptive elements at the beginning of the sentence. Here are 6 example prompts that should serve as a template for text to image prompts that I ask you to create.
Surrealist painting: Adorable puppies frolicking in a tempestuous sea of mewing kittens, surrounded by gargantuan, glistening ice cubes. Soft, warm lighting illuminates the fantastical scene, emphasizing the contrasting textures of fur and frost. Vivid colors swirl in a dreamlike atmosphere, capturing the playful energy of the impossible scenario.
Vibrant 3D Pixar style render, neon-lit forest, adorable squinting animals, oversized gummy sword, water balloon gun, exaggerated mock duel, hilarious facial expressions, dynamic action poses, volumetric lighting, depth of field.
Vibrant digital art, dynamic lighting: Elderly grandmother with mischievous grin piloting unique mecha suit made of large, colorful speakers, blasting blue sound waves at unsuspecting people, bustling cityscape background with mix of modern and vintage buildings, lively atmosphere.
Neon-lit microscopic view: Colorful anthropomorphic bacteria, viruses, and microbes dancing wildly on a glowing Petri dish dance floor, surrounded by pulsating organelles, with a DJ microbe spinning records on a DNA turntable, while microscope lasers create a dazzling light show overhead.
Please create an image prompt for:
for most models, sd3 quantizes amazingly
Hm
I see, so this are the specific instructions that you provide for the LLM then?
I left some examples in. Those examples are created with Claude 3, so they're examples of perfect prompt instruction following for it to know how to make them
Transformers architecture is not optimal at all
It's not efficient
Ai is constrained by this architecture I believe and other models
Correct. I paste in that whole thing, and put my prompt after the colon at the end.
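The "paste the whole thing, then put the prompt after the colon" workflow could be wrapped in a small helper. The function name `build_llm_prompt` is hypothetical, the instruction below is only the first sentence of the one quoted above, and the example prompt is shortened; sending the result to a local or hosted LLM is left out.

```python
def build_llm_prompt(instruction: str, examples: list, idea: str) -> str:
    """Join the instruction, the example prompts, and the final request into one message."""
    closing = f"Please create an image prompt for: {idea.strip()}"
    parts = [instruction.strip()] + [e.strip() for e in examples] + [closing]
    return "\n\n".join(parts)


message = build_llm_prompt(
    instruction=(
        "Limiting your response to 50 words, act as a creative agent who generates "
        "a very terse but highly creative image prompt derived from the prompt I send you."
    ),
    examples=["Vibrant 3D Pixar style render, neon-lit forest, adorable squinting animals."],
    idea="a wolf alone in moonlit woods",
)
print(message)
```

Keeping the request as the last line mirrors the original layout, where the user's idea follows the colon at the end of the pasted instruction.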
Gotcha
Do you think that prompting with comma separations with 1 or 2 word phrases, makes the outputs more inaccurate or accurate? Like when people do the following: 4k, hd, brown background, etc ... etc. versus putting it into sentence format?
Vivid real life, 8k, 4k, a wolf, a wolf is alone in the woods surrounded by trees and moonlit rocks, the rocks have angry faces and are mad at the wolf, the wolf unfortunately lost at poker and did not have enough money to pay the rock people, the rock people are dystopian society of secret magicians who dont like wolves. It was a great night at the bar
That kind of booru style keyword prompting can work well, but only for single subject images like in sd 1.5. Sdxl works better with natural phrases
What do you mean when you refer to single subject images, such as an image of a single person?
Or where there's a direct focus on a particular topic/subject?
One girl / one human / one thing
A wolf
A car
A train
Vs... a pack of wolves, a crowd, lots of subjects
Oh one last question, do API credits expire after a certain amount of time?
No, you just put money on your account and it charges pennies against that amount as you use it
Yeah if you say man, Apple, girl, holding. That doesn't tell you who's holding it. Sdxl has that natural language processing so you can have those kind of simple interactions
What if apple buys SD3 and includes it in ios18 under lock n key?
First thing I read and I regret my literacy today
Gotcha, this is way more my speed. I hated SD 1. because I had to use the silly comma based system
Appreciate you answering!
