#💬|general-chat
Controlnet with lineart Anime or lineart model
I hope sd3 can generate rpg character sprites with some editing. Even Dalle-3 seems to be far from doing that
We are thrilled to welcome @EMostaque Founder of @StabilityAI as an advisor to the @rendernetwork to collaborate on next generation AI models, IP rights systems, and open standards powered by decentralized GPU computing.
Anything we should know for new people?
Render network is crypto startup. Stability has partnered with a japanese blockchain company too, animechain.
not looking like a bright future now. Crypto is fraught with scams and shows very little real world utility outside of that.
while crypto often is built on ideals, the effective use of it is only ever scams and crime
guys, is there an alternative way of converting a .pt file to safetensors?
as long as we get SD3, finetuning tools and controlnets that don't suck then I don't mind

but yeah this sounds somewhat shady then
really sucks to see stability hitching their wagon to all these different blockchains. this is not the way
Equivalent to selling your soul

i used to love blockchain tech and really thought it would make a change towards a more decentralized internet.
here we are though. the legacy is crime and subterfuge
Stable diffusion cascade is good at making mood images
I wonder if SD3 will be the last model like Mistral's last model being mixtral-8x7B
ehh
its a grim future
"We always intended stability to be closed and proprietary" - George Lucas probably
yes
hello world 😃
really?
at fp16?
how do you know
im happy if thats the case though
this would mean T5 and SD3 is possible on 12GB
Yeah
is there an alternative way of converting .pt files to safetensors?
I'm going off large language models and my personal experience with sdxl and cascade
you can unload t5 after building the embedding too
The online tool is a pain in the coding ass
Image models are easy to run
The most high end open image models run on consumer hardware
Unlike large language models
comfy does that automatically, but to RAM so it can be loaded fast again
this is how I can run Stable Cascade
its super epic
yup. i think we'll be fine here, but post sd3 i don't think theres much legs under stability. blockchain is kind of jumping the shark
hows that for a boomer reference?
try asking in #🤝|tech-support
it's a dead place
I have no idea personally
people ask for pc ideas there instead of actual tech support
thats stupid
Ask gpt 4
well its task support but still
i bet there's already safetensors of whatever .pt file you're worried about
and also, .pt files aren't all bad. 99.9999% of the time they're fine. That .0001% is the possibility that someone could embed a script and you could potentially load that into something that actually runs that script. All major UIs don't run pickle scripts out of the box though
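For the .pt question, a minimal sketch of a local conversion, assuming PyTorch and the `safetensors` package are installed. `pick_state_dict` and `convert_pt_to_safetensors` are hypothetical helper names, and the guess that weights may sit under a `state_dict` key will not hold for every file. `weights_only=True` asks `torch.load` to refuse arbitrary pickled objects, which is the embedded-script risk described here.

```python
def pick_state_dict(checkpoint):
    """Return the nested weights if the file wraps them under 'state_dict'."""
    if isinstance(checkpoint, dict) and isinstance(checkpoint.get("state_dict"), dict):
        return checkpoint["state_dict"]
    return checkpoint


def convert_pt_to_safetensors(src_path, dst_path):
    # Imported lazily so pick_state_dict works without the heavy dependencies.
    import torch
    from safetensors.torch import save_file

    # weights_only=True restricts unpickling to tensors and basic containers.
    checkpoint = torch.load(src_path, map_location="cpu", weights_only=True)
    weights = pick_state_dict(checkpoint)

    # safetensors stores a flat {name: tensor} mapping, so drop everything else.
    tensors = {
        name: value.contiguous()
        for name, value in weights.items()
        if isinstance(value, torch.Tensor)
    }
    save_file(tensors, dst_path)
    return sorted(tensors)
```

Something like `convert_pt_to_safetensors("model.pt", "model.safetensors")` would be the call (both paths are placeholders). Files whose tensors share memory, or that nest weights differently, may need extra handling.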
Ye ask it
there's a ton of fear mongering around the ckpt file format tbh
when mistral medium came out we thought it'd be a nice way for the mistral team to generate extra income from a proprietary model whilst preparing for other open releases
custom nodes and extensions are a bigger worry for security vectors really
they actually are scripts and people DO run them all the time
All I hope for
Is sd3 gets released
If they close source after that idc
Because we'd have dall e architecture level anyway
pixart guys have proven that people can train base models a lot more efficiently now. Pony XL has proven that a small operation can put out something exceptional
we kind of don't need stability at this point so if they go full Roger Ver after SD3, we'll be fine. It just sucks to see the hero of the story turn into the black knight, y'know?
well actually, breaking bad was kind of awesome
a beautiful disaster
Yeah
The papers are public
You can train a base image model with consumer grade hardware
Unlike large language models
if the sex bot addicts are dedicated enough, they'll find a way to weird science a new model
They're very dumb
uhhh
I wish there was a very cheaply trained model that was like crowdsourced (like if that would work lol)
crowdsourced is just a buzz word that corporations created to farm donations without declaring themselves as a non profit
wasn't unstable diffusion crowdsource
I just hope they release sd3
lol that was just a regular ol kickstarter scam. they bought themselves new gpus
they will
TBF I don't give a fuck about open source licenses my bad to offend anyone but
I don't follow open source licenses
At all
If you open source something then technically that thing doesn't belong to you
Very controversial yes but it's a good thing
I was very surprised when I saw that Mistral-7B and Mixtral-8x7B were apache
kickstarter is the corporation that benefits off most of these grass root projects. they skim off every campaign. they've got lots of copycats out there too
starcitizen
It seems that SD3 is what Deepfloyd tried to achieve (prompt adherence)
except that SD3 will run on our computers and will have massive community support
Do you think consumer gpus can finetune sd3
nahh
actually
hmm
I wonder if 24GB is enough for a lora
or Qlora for that matter
if qlora would work then absolutely
stuff like 12GB and less will be out of question
I honestly will mostly care about inferencing like I did all this time, especially with how good photography looks on the base model
You can easily use 12gb to train a lora
Oh
Still
For large language models light work
So it's probably the same for image models
Qlora will be an interesting story, cause if that would work then 12GB could possibly work
You can fine-tune 8b param models with 12gb vram I believe
I want IPAdapter to work well with extracting subjects
(not just faces -> portrait)
It would take a very long time though
but yeah photos look great by default and I can't wait for Loras or Massive finetunes that will ENHANCE prompt adherence with extra actions, interactions, facial expressions, etc
those will be massive
well on 12GB its not bad
Takes 30 seconds for an sdxl image, cascade takes 1:20
Yeah
I don't remember that but I could be wrong
90 seconds for cascade
wait dont you have 3060
thats BS
idk I remember some mentions about efficiency or speed
It's really unoptimized
don't underestimate the ability of a coder to make their own inefficient code
https://twitter.com/search?q=%23sd3&src=recent_search_click&f=live ugh its all crypto spam
Lmao
somehow i got stuck in the dumbest timeline. i need to get back to where the smart people forked off to
i'm probably just dumb and belong here
Wtf is a pickle in coding
a way of saving a set of tensors created by a ml model
Sorry for interrupting, any ETA for SD3 yet?
it's not even in public beta, and they want to release it with a set of tools. We'll probably see the weights in a month
https://docs.python.org/3/library/pickle.html object serialization in python
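To make the pickle worry concrete, here is a small standard-library sketch. `Payload`, `SafeUnpickler`, `safe_loads` and `is_blocked` are hypothetical names for illustration. `Payload` is a harmless stand-in that makes the loader call `print`; a hostile file could name any other callable. The subclass follows the "restricting globals" pattern from the pickle docs, here in its strictest form (it blocks every global).

```python
import io
import pickle


class Payload:
    """Harmless stand-in for a malicious object embedded in a pickle."""

    def __reduce__(self):
        # Tells pickle: "to rebuild me, call print(...) with this argument".
        return (print, ("this ran during unpickling",))


class SafeUnpickler(pickle.Unpickler):
    """Refuses to resolve any global, so no callable can be looked up on load."""

    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")


def safe_loads(data: bytes):
    return SafeUnpickler(io.BytesIO(data)).load()


def is_blocked(data: bytes) -> bool:
    """True if the restricted loader rejects this pickle."""
    try:
        safe_loads(data)
    except pickle.UnpicklingError:
        return True
    return False
```

Plain containers such as `{"weights": [0.1, 0.2]}` still load through `safe_loads`, while the `Payload` pickle is rejected before `print` is ever called. A plain `pickle.loads` on the same bytes would run the call.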
its a web interface as i've heard it but its all under nda so no one's saying much. maybe they got actual weights. i don't really know
Wtf
are there instructions on how to run stable video 3d?
is it similar to stable zero123?
What settings would you recommend?
so if I buy a membership to stability ai will they host the 3d model for me or do i still have to download it?
pretty sure it just gives you commercial usage
you still have to self-host, which I think is pretty silly tbh
considering you probably need a very powerful GPU to get speeds beyond a crawl with the model
does stable video 3d have any workflows yet
It was released like 3 seconds ago...
I will never get this about the community lol
Its like those job descriptions "must have 10 years experience in xyz language" but xyz language has only been out for 3 years
its copy written by some clerk / assistant / intern. maybe even automated.
just apply instead. if you've got a hireable portfolio/resume, it'll be fine
gn all
it's alright, they are just looking for time travelers
Someone shared an image to turntable video/image sequence here: https://www.reddit.com/r/StableDiffusion/comments/1bi1v7c/introducing_stable_video_3d_quality_novel_view/kvhoe56/ I haven't seen any for the second phase to generate a mesh.
Not under normal circumstances; that's a pretty damn fast card.
Hi, bro
I've been meaning to buy a 12 or 24 GB VRAM nvidia card, what will you recommend?
Go big or go home, 3090 or 4090 (if you have the money for the latter). I think the ML boom will cause / is starting to cause, a GPU "drought" somewhat like what we saw with the Ethereum mining plague of 2019-2021. So, I would buy now if you can. Usually if you wait, you can get tech for much cheaper, but with all the "AI" stuff, that may not be the case.
the 3090 is with 24 vram
do you have one with 12 you'd recommend
since I am running on an rtx laptop version with 6
Also, I know I am "boring", but I don't want my computer to look like a gay disco, I'm not into RGB, I like the reference models. Sure you can probably get slightly more performance from some (re)manufacturer.
Okay, unless you're itinerant I would go with desktop, your money goes way further.
no no I already have a laptop it's just that the gpu is faster than my pc
and I've been trying to save up for a new pc gpu
Ah, I'd go for a 3090 then.
is there a 12gb vram one you'd still recommend
No, go big or go home.
12gb isnt enough if you plan to do any training
Do you own a Pine phone?
what could be causing it?
my CPU is from 2010
just generation
no, the Pine had other meanings, one of which was pine trees
as a poor man's 6 gb vram I've done a lot of workarounds to even get some stuff to run
Oooh, like the riots?
(in the USA, those riots)
nah, I live in a pine forest
Ah, very cool.
Pines smell great.
And the sap like, repels moths or something (?).
is there a way to load sv3d on low v ram (16 gb)
i cant find where in the code i would specify to load in fp16
Stability missed an opportunity by not designing a web portal for SV3D and making it look like it’s from Willy Wonka
DEVIIIIIIN I NEED YOUR ARTIFICIAL CODING SKILLS STAT /s
Do you have an Elbrus? )
Does anyone know how to get 3D meshes out of SV3D? Looking at the github readme, I see no mention of mesh output, only video. Yet many of the blog posts about the model mention 3D mesh generation. Just wanted to make sure I wasn't missing an obvious CLI flag for that output, or if they're using a separate model to turn the video into the mesh.
@Vex I haven't looked at SV3D specifically, but for pixel-wise prediction models like Zero one to three, the approach is to learn an implicit representation then run marching cubes
@hollow fable \
does anyone know where i should ask questions about the Reactor extension?
https://discordapp.com/channels/1002292111942635562/1004159122335354970/1219113832987885588
[Stable Diffusion 3] Paper figure 3 (link above): Does anyone know why the FID score increases with a higher number of sampling steps? Shouldn’t it decrease instead?
anyone know any good videos on how to use the kohya gui
Stable vid 3d ui waiting room
this isnt new, but it's pretty thorough https://www.youtube.com/watch?v=xXNr9mrdV7s
Blockchains are a virus
if anyone wants to try sd3 I have a code
I want
yes plz
Shameless plug but just a recent paper I worked on for the last few months - https://x.com/__z__9/status/1769911791117578518?s=20
Tl;dr: We demonstrate how to utilize generative data in category only online CL framework. More importantly, we propose a prompt diversification module and a novel sample complexity guided ensembling technique that strongly improves ID and OOD performance in online CL benchmarks.
We show SDXL, DaLLE-2, CogView and DeepFloyd can vary in generated sample complexity for same concepts and same prompts.
Would love some feedback 🙂
Nice work! The abstract is quite hard to read, it doesn't read like a native English speaker wrote it
Now it remains to wait for the ComfyUI update https://x.com/_akhaliq/status/1769926279053103565?s=20
Question is if the SD3 Turbo will be better than SD3 Lightning!
Looking at the SDXL Turbo vs SDXL Lightning results, I feel like the Lightning model looks quite a bit better
The PDF file said that the prompt following is worse than that of SD3
But the rest is at a high level
Thank you for the feedback. Yes indeed, not all the authors are native English speakers. We will improve the writing of the abstract further. Anything specific that caught your attention as indicative of being written by a non-native English speaker?
SD3XL when
Although prior arts -> Prior art suggest that
whole sentence: Prior art suggests that webly supervised training can be accomplished using web-scraped data.
this poses challenges such as data imbalance, usage restrictions, and privacy concerns -> However web-scraped data may raise concerns in data imbalance, usage restrictions, and privacy concerns
Addressing the risks of continual webly supervised training -> In order to address the risks of continual webly supervised training
The proposed G-NoCL -> The proposed G-NoCL method
generators G along with the learner -> generators G along with a learner
When encountering new concepts (i.e., classes) -> When a new concept is encountered (i.e. classes)
G-NoCL employs the novel sample -> G-NoCL employs a novel sample
The abstract and paper have many many grammar mistakes 😦
I dont think so, we used both Writefull and Grammarly grammar parsing. Also webly supervised is correct -> its not weakly supervised.
But sure, we will check further, thanks!
Is anyone in here knowledgeable about double gpu setups? Cause i need desperate help
So when i run my AIs my PC blackscreens and then restarts. Sometimes bluescreen
I bet it is a power issue
I'm serious
When I ran the quad 1080 Tis, the whole room's light would flicker when I ran machine learning jobs
And that was with a 1600watt PSU
I have a 3080 and a p100, i have a 1200ps unit
I dont have enough kidneys to buy one lol
Well, I bet 5090 is coming out soon
I just have a simple 4080
But i really need help with the double set up. If its not the psu
I really think it is the PSU
Its so fast when it did work but then it dies(it worked day one then not again)
Reaching 1000 watts plus on a P100 + 3080 doesn't seem impossible with a CPU
I have a ryzen cpu, 3900 12core i think
I'll look into a psu, any other things it could be that i can test right now?
I think I have a 1000watt psu
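A rough way to sanity-check the PSU theory is to add up nominal power figures. The sketch below is only that: the wattages are assumed round spec-sheet values (not measurements of this machine), the "rest of system" number is a guess, and `spike_factor` is a hypothetical knob for short transient peaks.

```python
# Nominal power in watts. Assumed round figures for illustration only;
# check the spec sheets for the exact parts in the machine.
NOMINAL_WATTS = {
    "RTX 3080": 320,
    "Tesla P100 (PCIe)": 250,
    "Ryzen 9 3900X": 105,
    "rest of system": 75,  # drives, fans, motherboard: a rough guess
}


def sustained_draw(parts):
    """Sum of the nominal figures, i.e. a steady full-load estimate."""
    return sum(parts.values())


def headroom(psu_watts, parts, spike_factor=1.0):
    """PSU capacity left after the load, optionally inflated for brief spikes."""
    return psu_watts - sustained_draw(parts) * spike_factor
```

With these assumed numbers the steady load is 750 W, which leaves 250 W of headroom on a 1000 W unit, but a 1.5x transient allowance already puts it 125 W over. That would be consistent with black screens and restarts under load, though it does not prove the PSU is the cause.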
you guys think a 4080 is fine?
maybe you need a 5090
If you steal the leather jacket of Jensen Huang, it has time travel powers
100% chance it will manifest a 5090 in your oven
On a more serious note
I'm actually pretty excited by the Nvidia Thor product (500 tflops of fp16)
its for cars though
can someone teach me how to use a purchasing bot to ensure I get a 5090 at launch
you would think there are captchas to prevent this type of thing in the first place
i ethically cannot spread such knowledge

its not like retailers are trying to prevent sales. they're not really in that kind of business space
Guys are there any demos where I can try this? https://stability.ai/news/introducing-stable-video-3d
@buoyant moss mind if i dm you
Sd3 turbo paper
Ridiculously good results
Basically sdxl lightning, 4 steps, highres
And idk how but it's still intelligent and coherent a lot of the time
It does lose out on prompt adherence a bit compared to SD3, yet its still better than midjourney and below
It even generates at 1 step, but of course loses a bit of coherence
is there a way I can set some tags to appear by default? like EasyNegativeV2, uncensored, masterpiece etc.
they're used in all generated images anyway so it's annoying to re-type them every time
Imo from the new SD3 paper it looks like SD3 slaps the bajeezuz out of sd3 turbo
The speedup might not be worth the tradeoff in quality
Then again, 4 steps vs 40ish might be worth the consideration, who knows
That paper is also low key teasing that normal SD3 understands the token "no" now ("no cows").
could cry till the cows come home
Unfortunately, from what has been said, it might be a while before SD3 comes out, like weeks or months.
Any local open source tools describing images and generating texts?
Check out taggui
where is the 18+ section?
I think this is an automated flow tool.
Though I am curious, why use 'no' when you can just add what you don't want to the negative prompt or make a negative positive prompt
the new blackwell chips are 1000x the speed of my gpu lol
The energy consumption costs more than my card
yeah but watts per flop is prob a lot lower
jensen pls give me one
hi
Hey everyone, I'm curious about something. If I use specific keywords and elements to train an SD Lora for creating images, and then later change up these keywords to design clothes, do you think the designs and elements on the clothes will come out consistently? Has anyone experimented with this kind of thing before?
Might've been a test since even bigger LLM's usually don't work well with negative instructions.
Imagine you get one and then being bottlenecked by PCI Express 4 and your weak power supply. I would appreciate an affordable 24-36gb consumer card in the RTX xx70 range.
I have seen multiple ComfyUI workflows that show connectors with straight lines and 90 degree bends. How do i set this up?
what's the least VRAM needed to make 1080p?
From what I have read in the past few days, you will need at least 10GB, and basically the more the better. but that's a generalisation across SD
People were saying even though some stuff was meant to run in 8GB, that they had to use 10GB to do it
hiayaaa, ok :') in 4gb vram
I can still create images in it tho, but it's just not hires
local install of SD?
on the Civitai site it states - https://education.civitai.com/sdxl-1-0/
"Hardware Requirements
The official Stability requirements for local inference (generating images) are 16 GB of system RAM, and an RTX 20XX GPU with a minimum of 8GB of VRAM. Linux users may also use a compatible AMD card with 16 GB of VRAM.
Training requirements are a little harder to pin down, but we have confirmation from Stability that LoRA can be trained on an RTX 2070 with 8GB of VRAM. With an input resolution of 768x, training used 7.1 GB of VRAM and took ~30 minutes.
Note: Despite Stability’s findings on training requirements, I have been unable to train on < 10 GB of VRAM.
Training at full 1024x resolution used 7.8 GB of VRAM and 2000 steps took approximately 1 hour"
This was for SDXL. I dont know if its the same for other SD models
So I basically can't train lora... haiyaaa
You could try online, but it may cost. Civitai has a portal
im not bigging them up, i just found the site very helpful
I mostly download models from Civit
me too, currently about 180GB worth
I'm just playing and I have a massive system to play on
Do you have --xformers --medvram --no-half-vae in your webui-user.bat?
hmm, I got a question... some LoRAs sometimes gave me broken output when I set the resolution too high.. is that true?
what even is that xDD
how to check?
Startup arguments for the best performance on your gpu
Oooh
Right click and edit the webui-user.bat
i know about that. I believe the models are trained on square images: 512x512 for SD1, 768x768 for SD 2/2.1 and 1024x1024 for SDXL
At the line COMMANDLINE_ARGS=
You add --xformers --medvram --no-half-vae
Then save and relaunch the webui-user.bat
@echo off
set PYTHON=
set GIT=
set VENV_DIR=
set COMMANDLINE_ARGS=--xformers --medvram --no-half-vae
call webui.bat
like that?
how much VRAM will that let it work with, CS10? does it offload to the CPU or just slowly burn through processing on the GPU?
With that it should be much faster and uses less vram, allowing higher resolution
CS1o even
but, at what cost?
At no cost
just time?
Thats why its in my install guide
No its even faster
bruh, then why that commandline is not in default xDD
Automatic1111 install guide for nvidia:
#🤝|tech-support message
Because it depends on the vram and GPU which args you need
Ahh, i went ComfyUI install, and just installed InvokeAi to compare and contrast
Comfyui usually already uses xformers and adds the right args. But can be also modified if needed
does the RTX 3050 Laptop still needs the --xformers and --no-half-vae?
Yes, xformers is the boost you want and need.
--medvram splits large models into smaller pieces when loading them into the vram resulting in faster usage and less PC freeze.
--no-half-vae is to load VAE files as fp32 for compatibility
Yeah, i just added --listen to my startup bat so I could generate images on my phone from the install on the PC. no real reason, just geeking out
oooh, ok got it
Yea thats a nice arg, using it too
CS1o, question: would I benefit from loading models into a RAM drive? i have ridiculous amounts of RAM
There is also --share for access from anywhere
#1080946152318443610 camputer chip
time to give it a shot lmao
That only helps if you quickly switch between some models.
They would load faster. The image generation time isnt affected by that
i was looking to grant access to my box from the t'internet, but was wary. I'm a cyber guy, so setting it up is all fine and well following best practices, but having read about pickle, I only use safetensors. not sure what other worries i should consider with SD
If you use .safetensors files and also mostly download from civitai you're fine.
Keep an eye on extensions as they have more rights and can load stuff.
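Part of why safetensors files are treated as safe is that the format is plain data: an 8-byte little-endian length, a JSON header describing each tensor, then raw bytes. A minimal standard-library sketch of reading that header follows; `read_safetensors_header` and `build_tiny_example` are hypothetical helper names, and the tiny file is hand-built just to have something to parse.

```python
import json
import struct


def read_safetensors_header(data: bytes) -> dict:
    """Parse the JSON header at the start of a safetensors byte string."""
    # First 8 bytes: little-endian unsigned 64-bit length of the JSON header.
    (header_len,) = struct.unpack("<Q", data[:8])
    return json.loads(data[8 : 8 + header_len].decode("utf-8"))


def build_tiny_example() -> bytes:
    """Hand-build a one-tensor file (a single float32 zero) for illustration."""
    header = json.dumps(
        {"w": {"dtype": "F32", "shape": [1], "data_offsets": [0, 4]}}
    ).encode("utf-8")
    return struct.pack("<Q", len(header)) + header + b"\x00" * 4
```

Nothing in this path executes code from the file; the loader only parses JSON and slices bytes. That is a different situation from extensions and custom nodes, which are ordinary scripts.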
Good to know. currently I'm on about week 3 of 'playing' with SD. I have no creative background so just download models and load them as and when to see what different outputs i get. got juggernaut, dreamshaper, blankCanvas and about 10 others 🙂 maybe I should streamline
btw, does Automatic1111 update Stable Diffusion when the new SD3 open source code is released?
I also cant type
Yes
oh wow, that's cool
And which webui do you use ?
Yea its nice for seeing the workflow and understand what its doing
I wanna see how ComfyUI looks xDD
ill pop a screen grab into gen-with images
I'm more the auto1111 user. I don't like the custom node thing. Would take me too long to build workflows.
I asked Gemini the difference between ComfyUI and Automatic1111 xDD
it says, the ComfyUI is for beginners cuz it was easier to use and understand
while the Auto1111 is more complex for those who wants some fine tuning
after I saw twelsh37's img, I think so too
I get that part. i was 'brave' yesterday just installing InvokeAI, but it had a separate install location and allowed me to reuse the already downloaded models, which was a godsend. I'm sure i could throw Automatic1111 on my box. what port does the webui come up on by default? I take it it doesn't clash with others?
Auto1111 doesn't interfere with any other webui. You can git clone it into a separate folder and it's fine, you can also use the same model paths. It doesn't have as nice a UI as Invoke, but it has many more features and extension support
That's what I thought when i first saw it, but not knowing any better i have persevered. there is a guy on YouTube, Olivio Sarikas, who does awesome vids. learned so much from watching him
I only watch youtube for the install process, the rest I learn myself xDD
Yeah, i thought that. i don't want to say it's the 'grand-daddy' of the UIs but it seems to have a lot of support
wow, the commandline really does the job...
You would definitely struggle with ComfyUI then 😛
xDD
It is in fact the first webui ^^
It has the best support for Extensions but comfyui is on par mostly
heck, 1280^2 really looks good
And that's the reason forums like this are great for knowledge sharing
that's the reason why I find this server
Make sure to use 10 hires steps. And for 1.5 models don't go too high on the default resolution, as then the deformations and duplicates start
same
oh, okok
I wasn't using hires, pure 1280^2
Oh okay
CS1o, any tips on fixing hands and feet? models?
I use textual inversion
1.5 models are trained on 512x512 so you'd better stay near that, like
512x768 for portrait, or 540x960 and then upscale by two for Full HD
Or 768x768 works too
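The arithmetic behind those base sizes, as a small sketch. `upscaled_size` and `upscale_table` are hypothetical helpers, and the list of base sizes is just the three mentioned here.

```python
def upscaled_size(width, height, scale):
    """Final resolution after an upscale pass, rounded to whole pixels."""
    return round(width * scale), round(height * scale)


# Base resolutions suggested for SD 1.5 models in this conversation.
BASE_SIZES = [(512, 768), (540, 960), (768, 768)]


def upscale_table(scale=2):
    """Map each base size to its size after upscaling by `scale`."""
    return {base: upscaled_size(*base, scale) for base in BASE_SIZES}
```

At 2x, 540x960 lands on 1080x1920 (portrait Full HD), 512x768 becomes 1024x1536 and 768x768 becomes 1536x1536. A 1.5x pass on 512x768 gives 768x1152.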
That was one of my first learning points. model size and the cube.
sometimes, when I upscale it... the image is just... like, looks ugly
well square
Upscaling, good prompt, and there are some loras and embeddings that should help. But I mostly just upscale and hope for the best xD
so I have to use like, 1.5x upscale
use the ugly keyword in your negative prompt
You have to set the upscaler to Esrgan4x for example, and then 10 hires steps and denoise 0.5, and it will look good.
The default settings of hires fix are not usable
You can also use latent bicubic with 0.5 denoise
which one is good for anime like arts? R-ESRGAN4x Anime6B or the one you recommend?
There are a few good ones you can download from openmodeldb.info
Like Fatal Anime 5000
Or lolypop
And yes the Esrgan4x Anime 6b is good too
This is so informative but i really need to go and prep for my talk later. Good to meet you both. CS1o thanks for all the tips. I will go take a look at installing Automatic1111 later
got it
No problem 🙂 if you have any technical questions regarding the Installation, my guide, or how you set the model paths, feel free to post in #🤝|tech-support
also, I notice that, when I use higher res, the character is just getting smaller, due to the base of SD1.5 models which is 512^2
but that means, wider view too
You can prompt for close up portrait
well yes
the pic is still not closing up to the chara, but when I use 512^2 upscaled to 1080^2, it works
That should enhance the character as full body on 512x512 (not upscaled) is bad
hmm, yeah
For any image related questions feel free to ask in #📝|prompting-help
Someone give me prompts to generate I'm. Bored
Hyundai N Vision 74 maybe? :v
well, no
Made an offer Emad can't refuse. 🤣
https://x.com/nightgrey_/status/1770032627329777838
Edit: In case you don't see Emad's tweet, this is a reply to him writing "I'll give you an invite if you give me an invite".
On a more serious note, have you guys seen Infinite ID? I hope they'll release the code, looks really nice! ✌️
Hey guys! Would you be kind enough to link me one of the masking tools in the Extras tab? I reinstalled SD and don't remember what it was called.
Good morning everyone! How are we all today?
is there a channel for the 3d model which Im not seeing
you mean inpaint?
There's not one specific to it, but it is probably #▶|stable-video-diffusion given it's a Video 3D diffusion model.
Not sure tho.
for those interested, i just released my pixel art animation workflow for Comfy
and it's like, filling all my RAM and VRAM...
I found out this issue when I searched about the commandline too, any fix for that?
SD3 Turbo with highresfix is gonna be awesome
I can literally do what I have been doing with SDXL Lightning so far
except better prompt adherence 🤩
do those turbo models offer anything apart from speed? and why does that matter, unless you have some kind of quota
its called being impatient
got it, hah
if I got a super complex prompt that wouldn't work with Turbo I'd just unload the models in comfy and switch to normal SD3 weights and I'm good to go
how many it/s is considered decent ?
well depends on your patience of course
imo its like 3-5it/s if its like a 20-25 step image
idk
what about s/it
I generate slower but cause I'm using quite a high resolution so I think its justified
its like 1-2s/it when doing highresfix for me
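Since it/s and s/it are just reciprocals, a tiny sketch for comparing the numbers being thrown around. Both function names are hypothetical, and the estimate ignores model loading and VAE decode time.

```python
def seconds_per_image(steps, its_per_sec):
    """Rough sampling time for one image, ignoring load and decode overhead."""
    return steps / its_per_sec


def to_its_per_sec(sec_per_it):
    """Convert s/it into it/s (they are reciprocals)."""
    return 1.0 / sec_per_it
```

For example, 25 steps at 4 it/s is about 6.25 seconds of sampling, while 20 steps at 2 s/it (0.5 it/s) is about 40 seconds.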
openvino
wow thats good
is it intel Arc or Xeon or just generic iGPU
it is listed as intel arc but it is actually the integrated graphics of my intel core ultra cpu
hmmmm
Supercharge your gaming and content creation experience with built-in Intel® Arc™ GPUs on select Intel® Core™ Ultra H-series processors1, or upgrade to Intel® Arc™ Pro GPUs2 for ISV certifications.
but i cant do ipex on it so i think it is still iGPU? perhaps?
wow built in intel arc-like igpu

thats nice
and probably consuming 1/4th watts of power of my dedicated gpu 
yeah
SD3 Turbo is gonna be sick, I really thought it'd be some lower res thing again with heavy penalties to prompt adherence or image coherence
but its more like lightning
jesus christ stability devs and researchers were cooking so hard
Turbo will open a lot of doors
Lots of flash marketing opportunities to say the very least
We compare SD3-Turbo 1024²-MAR to SOTA text-to-image generators.
Yup, so it's 100% 1024px just like the base model.
makes sense
Hey I'm out of the loop, there's SD3 Turbo now?
as their focus has been on the context not the scale
Not yet but it seems to be Stability’s cadence
its SD3 but faster at lower steps
similar to LCM, Turbo and Lightning
the trade off is a *very* slight decrease in coherency and a decent decrease in prompt alignment
but it generates in ONLY 4 steps!
I can't wait to toy around with SD3, but if a Turbo model does come out it will be very convenient!
You’re going to have a lot of reading with these models
Lots of words
(good thing)
and if you feel like the prompt you made is super complex then you can just unload the model and switch to regular SD3
Love it
but like a sentence is still usable with SD3-Turbo
because they have the equivalent of FaceID going on more or less
I wonder if SD3 will run well on a 4070 Ti 🤔
I was left with the impression you need 24 gigs of RAM to run that
I don’t think it will. I think they will have quantized versions probably that aren’t quite as dynamically performative
As SD3 itself is a shift in complexity and thus a shift in resources
That's a shame! Time to start saving for a 4090 😂
How exactly does DallE-3 work? It has to be some sort of stable diffusion to be able to make stuff of that good quality. Does the machine take the text you give and make a better prompt with it?
4090 is ridiculously expensive...
RTX 4090?
Precisely!
SD3 will hopefully fit on 12GB eventually or even at launch, it will depend on how much 8B Diffusion itself will take. Models are automatically offloaded to RAM once they have finished generating (T5 and the SD3 Weights).
T5 at 8-bit will probably fit on 12GB
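A back-of-the-envelope sketch for the "will it fit" question: weight memory is roughly parameter count times bytes per parameter. `weight_gib` is a hypothetical helper, the 8B figure is the one mentioned here, and the result covers weights only (activations, the text encoder and framework overhead come on top).

```python
def weight_gib(num_params, bits_per_param):
    """Memory for the weights alone, in GiB (activations and overhead excluded)."""
    return num_params * bits_per_param / 8 / 2**30


def fits(num_params, bits_per_param, vram_gib):
    """True if the weights alone are smaller than the given VRAM budget."""
    return weight_gib(num_params, bits_per_param) < vram_gib
```

Under this estimate an 8B model needs about 14.9 GiB at fp16 and about 7.5 GiB at 8-bit, so on a 12 GB card the fp16 weights alone would not fit while the 8-bit ones would, before counting anything else.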
And on the topic of Dall-E 3, I like that it has more facial variety, so I came up with this - Dall-E 3 output + SDXL Canny ControlNet to add realism, as Dall-E realism looks very airbrushed
I am guessing there is a difference between a gaming laptop 4090 and a Desktop 4090. Because I am going to get a laptop with one of those, and a intel i9
That’s really cool—I just hope the scale doesn’t diminish the quality too much, as there seems to be a lot going on in this architecture
on 16GB I think it will fit just fine
True. Technically, I could take DallE’s output and use image to image and inpainting to make it better or add stuff.
Yep! Or you can use a ControlNet like I did. May I send an example here?
Not sure if I can send photos in this channel 😅
Looks like there's #🏞|general-with-images
same q
Sent my example there
The face changes only a teeny tiny bit but it's not such a huge deal.
Though I wonder if using a T2I Adapter can be better 🤔
I tried with one but the results were a little messy
I really need to look up these differnent controlnet things, they seem really useful
They are!
It's very powerful! I figured you could add realism to faces at first, which I did. Later I also figured you could change certain features, then combine a controlnet with inpainting to make very precise edits! And finally, colorizing B&W images. Though that last one isn't as simple as generating an image, manual work is required - I recommend using image manipulation software, layers and blending to achieve results.
looks really good
I hope we'll get Canny, Depth and Openpose controlnets on launch
yeah, that would be nice
Imma send a few more examples...
they promised controlnets on launch I just don't know how good they will be or which ones they are gonna give us
maybe even an alpha controlnet?
SD3-Turbo can also generate 1step images which resemble early LCM attempts
oooo
I wonder what would happen if we went above 4 steps
I suppose they might just converge at around 6-8 steps and not improve 🤔
eventually these are going to be so fast and the temporal consistency will be dialed in enough, that youtube streams will be all over the place of SD-generated realities lol
hey guys, does anyone know a good way to generate large scenes? it seems to want to generate an image of just one person/character but i want to generate large landscapes with many people
you're probably better off starting with generating the landscape then inpainting the people
4060ti 16gb vs 4070 12gb , Im trying to build a new computer for AI and im struggling between choosing which ones to buy , could anyone help me pick ?
i mean, its somewhat hard to say, if its the regular 4070, and not ti, then 4060ti is prob better
I don't think so! It masked like the character out (if there was a character), and the background became transparent
The regular 4070 has only 8 GB I believe 🤔
maybe he means 4070 super?
Ooh, idk
So there are so many versions of ControlNet canny, depth, lineart etc. for SDXL 1.0 and from like 10 canny models, only 2 work good out of the box. I tested them all. Why is that and why the controlnets for 1.5 models were better?
I actually have no idea they are lackin so much
going through the same motions with xl vs 1.5, I agree, maybe there's better ways to optimize, but apart from that I agree more or less...but it is better than not having it at all
I've been more or less sticking to depth_midas, and ipadapter, both of which work pretty well on xl
the canny, sketch etc, hit or miss
also, playing with the resolutions in cnet for the models that support it, sometimes yields better or worse results
because 1.5 is chad
it stood the test of time
I suppose now its gonna be between SD 1.5 and SD3 and the other previous models will be left alone lmao 
unless SD3 is proven to be good at corn at the lower parameter count models
hehe
(but i mean…probably true in essence)
I just hope SD3 will get GOOD and LOT OF controlnets at launch
Canny, Depth, OpenPose are a MUST, like it should be default
and maybe inpainting and edit if those are controlnets or whatever, idk if those are separate models
Context is king…they can architect it any which way, but the encoding needs to have sufficient complexity
Lmk if this belongs in a different channel: If Blackwell chips are slated to cost 30-40k per Jensen, does that imply the price of H100s / A100s could fall? Or not necessarily because Blackwells could surge well beyond initial retail price due to demand?
I think they could fall, but…probably won’t fall as much as people would like them to because $$$
People are stuck on 15. They'll stay
It's really sdxl vs sd3 and the 15 gang off doing their own thing
not to mention with super resolution capability increasing on the broader end, there’s little reason for people to feel compelled to use newer models unless they’re dead set on text placement. Otherwise they can continue to work with fine tuning and SR techniques of their own.
Stability is going to fall. Too many crypto partners lately. These are signs of buckling support.
Why was Stable Diffusion 2 such a massive failure anyway?
That’s how it goes with open source. 😦
SD 2 showed signs of being... neutered. Though it was never made super explicit.
Neutered as in?
Harder to make Loras for sd2 is why I don't
that too.
I don't see why I would use an SD2.1 model over an SD1.5 or SDXL model 🤔
Hopefully SD3 will finally outclass SD 1.5.
I agree
well intelligence wise absolutely, no question about it
Quality wise (if not the base already), finetunes will excel at photos
the base model looks soooo fricking good with photos
yes it does! that tells me that perhaps their datasets are lighter, but the encoding is more precise / rich.
in a walnut shell.
I wonder how the 800M model will perform 🤔
If SD3 will require resources like XL, then you're right. The 1.5 group will remain there.
💀
Not to be that guy but is SD3 super censored 🤔
with a lot of finetuning it can probably look as good as 1.5 finetunes but with more intelligence
I don't care as long as they release sd3
idk it looks like as censored as SDXL
@Sufi yeah, kinda expected at this point 😕
Brother you can uncensor it 💀
which isn't that bad
ehh
It's not that easy
They just won't train it on bad stuff TBF
well if they figured it out for SDXL then it will work for SD3, this isn't SD2.X or Cascade
From my own personal testing these image models have no guardrails
I've only used finetunes except for cascade
Also by censored I didn't mean whatever they might offer online
well no built in hardcoded guardrails
but it still somewhat depends on how hard the base model was censored
Guess so.
But for example, SDXL simply can't generate certain things
probably
What things
In A1111 for example
Could be a technological constraint rather than censorship
we’ll see. in the end, a lot of laws seem to be coming up that are going to retroactively target abusers of copyright, etc.
no he probably means stuff like in 1.5 that works
I wonder how hard new concepts will be
I actually just meant nsfw content 🤣
ahh makes sense
mm
What stuff
so far anatomy looks good for a censored model
Bro 💀 there is no censorship at all then
Zero
Unless you go into some really messed up anatomical stuff
like if it has like idk "nude" or whatever then it was excluded, but no NSFW image detection
Sorry I was hesitant to mention it outright because I don't know what can get me banned 💀
nah its okay we shouldn't talk about stuff like these that much
Won't get you banned
even if CogVLM doesn't pick up nsfw, half the dataset was left raw captioned so there HAS to be some remains
but why can't cascade be run like any other normal model directly in A1111?
and of course the model is massive (8B) so idk if that plays a role in anatomy
without additional extensions
cause 2 models
Just use a non base model, why the obsession with censored
1 for a very small image then the second model upscales it to regular resolution
I'm running it off some GitHub thing some guy made
Issue is he used chatgpt to make it 💀
in comfyui it works perfectly under 12GB
The Stable Cascade extension?
idk man I have faith in SD3 not being lobotomized
I just wonder how good it will be at certain stuff like games and art styles (of deceased or people who didn't opt out)
previous models kinda sucked at games
I wonder if Stability decided to get games and other stuff out of the model
I don't think I have ever tried games 🤔
I think they zigged one way with 1.x, zagged another way with 2.x, and so on. They’ve been finding a great stride lately and it sucks to hear about the crypto speculation, but…open source is truly good work because it’s open source work.
yes, definitely. and as for the new laws, we just gotta hope that they are enforced to principle and that the principle ultimately boils down to something that doesn’t destroy the freedom of expression or of creation.
SD3 Turbo is such a positive surprise :), I thought it'd be heavily watered down to get 4 steps but the geniuses at Stability figured it out
exactly
I'm hoping that SD3 will come out soon though, because SDXL 0.9 came out like a week after it was announced if I'm not wrong 🤔
Then a month later 1.0 followed
probably 2-3 weeks if they were talking April..
Well I can survive until then for sure 👍
Emad sure doesn’t get enough credit in all the buzz.
testing phase will probably take place towards the end of this month and maybe beginning of April
they said they'll be inviting more people this week
Still learning new things about what we already have! Such as this workflow I came up with to colorize images and add realism to Dall-E 3 faces
(besides like 2-3 twitter AI research users lol)
yeah DALLE-3 has a weird painterly look to all of the images
or maybe mostly for faces I've seen so far
there are some realistic photos from DALLE-3
If you missed that, I sent some examples in #🏞|general-with-images where I added realism to male faces 🙂
yeah
they look nice
I've been using SD since august 2022 and this moment will feel so special
maybe I'll feel the same way as I did back in august of 2022
having an image generator offline is now taken for granted so easily...
There's something so specific about the way ChatGPT prompts Dall-E 3 though, don't you think?
I can't put my finger on it but it feels so AI 😂
I made some gnarly images using prompts from those
unfortunately SDXL and previous models only understand like 60% of whatever superprompt makes
Generate a photorealistic image of a man with a beard, capturing his masculinity.
I can't 🤣
SD3 is going to be massive with this tool
hmmmm
Like why does ChatGPT prompt like that?
yeah its adding a lot of story-like detailing lol
cause I guess its an LLM made for general purposes
maybe the prompt they give ChatGPT isn't aggressive enough
But the question is why have ChatGPT be the middle man?
I could just prompt it myself like Bing Designer allows me to, just that in the latter I have 15 credits per day
i wouldn't say lobotomized since they're not going in and chopping weights out of the model. What i think it's more like is, you know early childhood development problems? like that one girl who was left in her basement tied to a chair and never learned language.
well I suppose they did the same as SD3 (captioned dataset with vision models)
but unlike SD3 they might have done it like 75% if not more
therefore they might require that natural language type of prompts almost all the time to get good results 🤷♂️
thats just my theory
AN AI THEORY
EXACTLY
*Tenor Gif: office thank you*
it has the original sd15 clip layer still so people will still lean hard on prompt salads
and clip_g (which is more natural language?) and of course the T5 (which is optional lol) 👀
tenor gif? wtf. just gifs!! tenor gifs are like, bugs bunny being the maestro at the opera
lmao
Do you guys think we could fit the T5 on 10/12GB VRAM cards? 🫠
think so
Hi. I’m looking for a freelance dev for building a pipeline for stable Pm me if interested
at 8-bit, 100%, even on its own
if you know about the comfyui's offloading technique then you know that we're gonna be fine VRAM wise (if the 8B weight is less than 12GB)
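Rough sketch of why offloading matters for the VRAM question here: if components are loaded one at a time instead of all at once, peak VRAM for weights is roughly the largest component rather than the sum. The sizes below are made-up placeholders, not measured SD3 numbers, and `peak_vram_gb` is a hypothetical helper; activations and framework overhead are ignored.

```python
def peak_vram_gb(component_sizes_gb, sequential_offload):
    """Estimate peak VRAM (weights only) for a multi-component pipeline.

    With sequential offload, only one component sits on the GPU at a time,
    so the peak is the largest component. Otherwise everything is resident
    at once and the peak is the sum.
    """
    sizes = list(component_sizes_gb.values())
    return max(sizes) if sequential_offload else sum(sizes)


# Placeholder sizes in GB (assumptions for illustration only).
components = {"text_encoder": 5.0, "diffusion_model": 10.0, "vae": 0.5}

all_loaded = peak_vram_gb(components, sequential_offload=False)
offloaded = peak_vram_gb(components, sequential_offload=True)
print(f"everything loaded: {all_loaded} GB, one at a time: {offloaded} GB")
```

So under these assumed sizes the question is whether the single biggest file fits, not whether the whole stack does.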
Us in 2034: generate virtual universes using AI where we can insert our consciousness and just walk around and do things

@arctic sedge just take this with a pinch of salt, we don't exactly know the exact VRAM requirements for each file
but for T5 we have way more knowledge (LLMs are easier to guess out of experience, take LLAMA-7B for example, the T5-XXL is only 4.7B)
We went from SD 1.5's "haha lol look at this distorted face and this terrible anatomy" to, well, this server's gallery in like 2-3 years?
exactly
Yeah. It's something like 19GB of VRAM?
But SAI also added that it was (Unoptimized) Whatever this is alluding to.
well they said 24GB (unoptimized) (with probably everything loaded at the same time 🤷♂️)
I wonder if xformers will chip off extra VRAM from the SD3 weight, idk if they already implemented that
Possibly will, but I can't imagine it would be a considerable enough amount.
yeah
when i tested the lavi adapter code , t5 on sd15 didn't breach 10gb
and that was just me winging it with their test.py
There's "hope" Lol!
pixart model uses t5 too i think. you can play around with that in comfyui today on your 12gb i bet. maybe not. don't take my word for it.
and that's without 8-bit it seems, can't find load-in-8bit:true in the code
but of course that can still decrease VRAM usage further with very minimal quality difference and without any conversion needed
yeah some people figured it out
same with deepfloyd
on 12GB
Yeah. I also heard about that too. Shame it's too difficult to train. Not seen anyone make any majorly good finetunes of it.
all my life i been wanting more bits. when the NES landed i was like HOLY 8BIT!! then super 16 bit then whaat 32bit processors!? WTF!? NINTENDO 64?! HOLY FUCKIN SHIT.
then we lulled out for a while and now just as 64bit is starting to become ubiquitous, now i want less bits
there's gonna be pixart SIGMA 
which also did some vision model captioning
Oh nice!
In the SD3 research paper:
Mem is the memory required to load the model on the GPU. FP [ms] is the time per sample for the forward pass with per-device batch size of 32.
And T5 is listed at 19.05 GB for some reason? (Clip-G + Clip-L + VAE take up like ~3GB in total)
I dont know why all these alpha male types think sigma is a bad thing. none of them know latin clearly
nor realize that the alpha wolf theory was all debunked and made up
it's got as much basis in real world as phrenology or humors really
in training they're probably keeping them loaded at full precision
https://deepmind.google/discover/blog/sima-generalist-ai-agent-for-3d-virtual-environments/ the fuck is this? i was just talking about how this stuff needs to come to game ai yesterday
too bad it's an alphabet model and they probably won't release weights
oh yeah right
this could be fp32
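Back-of-the-envelope check on the 19.05 GB figure: weight memory is roughly parameter count times bytes per parameter. A sketch assuming about 4.76B parameters for the T5-XXL encoder (the chat rounds it to 4.7B, so the exact count is an assumption), counting weights only and using 1 GB = 1e9 bytes. `weight_memory_gb` is a hypothetical helper.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gb(num_params, precision):
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


T5_XXL_PARAMS = 4.76e9  # assumption: approximate encoder parameter count

for precision in BYTES_PER_PARAM:
    print(f"{precision}: {weight_memory_gb(T5_XXL_PARAMS, precision):.2f} GB")
```

Under that assumption fp32 lands at about 19 GB, which lines up with the table, while fp16 would be about half of that and 8-bit about a quarter.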
oh shit imagine some people buy this as a service and set it up as a stream
i type way too much so some people i can't convince i'm not a bot
i fool so many others though
i mean.. i um.. i'm not a bot
👀
yea the whole alpha male theory is a bunch of beer-chugging fratboy nonsense
but tell that to the roe jogan dude-wipes using population
don't forget hormone pill poppin dudes that are going to have broken prostates by the time they're 40. ||tate||
new nvidia driver today. remember when there was a quick minute where driver updates meant disaster or boosts for this pytorch stuff?
holy shit what a news day! theres even a new stardew valley patch! Who needs SD3?!
/copium
why do all the SD models want to draw sexy females?
i want to draw some male characters, and even when i put negative prompt "female, woman, sexy" it still generates female characters
I'm sick of seeing just females on every single page 🤣 When you go to Civitai, 99% of what you will see is women.
it just defaults to drawing a lewd image of a girl/woman
But about that, what model are you using and what's the prompt?
look at this
oh i can't attach pictures here
go to general chat with images
i'll post there
Anyone here into fooocus? any way to edit styles on the fly and refresh them without having to restart Fooooooocus all the time ?
Has anyone bothered running Stardew Valley through a photo realism img2img controlnet because why not
i vaguely remember seeing something like that before. i'm gonna try it myself tho, just as a challenge
Could be really funny
ai artists have a type
yup i seen stardew sprites turned into real people
I sEe Yo MaMa TuRn InTo A sPrItE
real talk, that’s cool tho.
one of my favorite activities lately has been img2img. i go through phases
that’s where Jim Morrison would’ve landed had he not died so young
if anyone is a legitimate wizard, it's morrison
nah. 1.5 gang will expound about how sd3 is so censored and stay where they are, regardless of requirements
yeah this too
Where can I download SD3?
When SD3?
they gonna invite some more people this week to the secret discord server where they test SD3 using bots
I was really hoping for some news today...
Do you mean that it will not be open source?
bruh what are these conclusions
they will be downloadable and everything
code will be open source
and there will be an open release of the models
SD3 and SD3 turbo + controlnets
possibly in april
the buzz has been more and more lately, and with models seemingly dropping at random I’m having a harder time gauging things
well there's SD3, SD3 Turbo
and from SD3 there are 3 confirmed model sizes (800M, 2B and 8B)
the image model ecosystem approach is still pretty fresh, but i’m not at all complaining. it’s done when it’s done
me too—they have a nice rhythm
the model's quality looks good to me, like it has matured a lot since february
4/19 is my birthday sooooo maybe 😮
thats epic
fingers crossed
SD3 as birthday present lmao
YOU HAS TEXT AND DOGGOS MADE OF BACONES
I’d have to try generating a certain Stability CEO riding a dinosaur equipped with an exoskeleton
Emad? His nationality is Bangladesh I believe
source? thats what i thought 3-4 weeks ago
it is him
I’d never read his wikipedia before, doing that now 😮
cant find the reply cause he has a couple hundred and twitter f*cking sucks and I cannot search from a user
Want that shit to come out just to stop ppl from asking over and over "wHen sD3!??!"
I hope it's a bigger disappointment than 2.1 tbh. Will make it even more worthwhile
ok
use a 3d model instead of sv3d for it?
but then what is the point of sv3d
if a game studio really wanted to get nasty, they could get the rights to the DUNE II game from 1992 and redo all the gfx to Villeneuve’s aesthetic
but keep the pixel art style
meshroom ?!
thanks... will try
I don't understand what you mean, but search and you may find the answer
I will download meshroom and attempt it
EA has that i think
video maybe
eh
I mean EA
i mean yay
…just give me another Shovel Knight game. I’m good with that.
or a proper follow-up to FTL, because I’m veeeery biased to that game
In X/Y/Z Plot mode, can I have loras on one axis and the strength on another axis?
So I get this:
LoraA:1 LoraB:1
LoraA:0.5 LoraB:0.5
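One way to think about that grid: it's just every combination of LoRA name and strength, one prompt per cell. A minimal Python sketch that only builds the prompt strings, using the `<lora:name:weight>` tag syntax from A1111. `LoraA`/`LoraB` come from the question; the base prompt and the `build_lora_grid` helper are hypothetical, and nothing here talks to the web UI.

```python
def build_lora_grid(base_prompt, lora_names, strengths):
    """Return rows of prompts: one row per strength, one column per LoRA."""
    grid = []
    for strength in strengths:
        row = [f"{base_prompt} <lora:{name}:{strength}>" for name in lora_names]
        grid.append(row)
    return grid


grid = build_lora_grid(
    base_prompt="a portrait photo",  # placeholder prompt
    lora_names=["LoraA", "LoraB"],
    strengths=[1, 0.5],
)

for row in grid:
    print(" | ".join(row))
```

In the UI itself, as far as I know, the Prompt S/R axis type is what people usually reach for to get this kind of layout.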
silk song when?
oh thats hollow knight, mb
i get those two confused a lot
would be interesting to see a hollow knight cameo in a shovel game tho….hmm
EA gets a lot of shiz for being EA. They're huge and have good parts too. I like Dice and Criterion. Who joined forces on 2042 which is actually great and i love it and haters gonna hate
free 2042 weekend this week
gonna have a lot of fun when newbs are ez pickens. Skill floor going to lower even more!!
I mean, yeah.. they've done it to themselves a lot of the time
Mass Effect Andromeda
what can even be done with ai video besides memes, because you can create 3-5 seconds of video before things usually turn out bad.
ive seen people create trailers or super short films but that must take hundreds of generations.
it isn't far off for someone to create movies with this instrument - how long are scenes in movie productions nowadays (at least in mainstream)? a couple of seconds. internet-fried brains couldn't hold out through a tarkovsky/bela tarr movie anyway, with uncut scenes taking several minutes. get character consistency & gens that stay coherent for like 20 seconds & everyone can build movies, theoretically for hollywood
what we are seeing is a syntactic convergence of abstract and matrix.
the sky’s the limit; the exponents are resources and time of course
Look at what Sora’s doing—that alone should speak volumes.
And it does 🙂
I guess im trying to find a cool free ai tool that i mess around with that can help me create something but im not sure what to use or how to approach the field of ai. ive always wanted to make some sort of creative media but with ai i believe it might be helpful.
Right now, folks are builders. People who get the most out of AI are the ones who approach their ideas as plans. If you can break the idea down into smaller pieces, you can start to think about the different tools you’ll need, etc.
Instead of a toolbox, though, it’s all code repositories and UIs. lol
The best way to start is to look at the functions that are most available right now, i.e. GPT agents, or text to image, image to video, or image to 3D
And so on 🙂
I do want to be a builder but not a complex one, like i have to start somewhere and work my way up, also you're right about the whole digital toolbox thing.
i see the thing that annoys me is that most ai is behind a paywall or something.
It’s always going to start with the idea, and you can never spend too much time thinking about it. It’s the only thing that’s truly yours—of course there are copyright infringements, trademarks, etc. but that’s another world. I’m talking about the beginning of the generation itself.
Well, these tools are resource-intensive. They cost a lot of carbon to run.
i like stable diffusion because it feels open in a way.
you have a point.
SD is definitely geared towards openness, but it’s prone to the same forces of change as everything else.
that's how stable diffusion was originally demoed for testing, discord bots
you're so right.
There’s that whole thing about not knowing what you were missing until it’s gone, etc—right now there’s a lot of folks who are blinded by AI, and it could very well vanish in a heartbeat with enough regulation. That means we have to start being more actively engaged in what’s going on politically, etc. in order to stay in the realm of open development.
It sucks, but…it’s our future. 😦
Don’t mean to preach tho
there are so many things i want to make, but im debating between a short film and video game, also you are good.
my advice: write your ideas down, like with an actual pen or pencil. Get a special book. Call it “The Future”. lol
sit with them awhile, even after you’ve found a good resource to test.
i should probably do that or find a way to organize my ideas.
AI is inspiring me to do just that lately so i’ve been trying to, and it’s actually been paying off
thats good.
If you treat image generations like they’re coming off of a roll of film, that finite, limiting nature can provide more creative potential
so i was trying to run stable diffusion on my cpu locally and it all went well and then when i ran the prompt and when it finished loading this is what i got
it doesnt let me send a ss
switch to the images channel
ok
general-with-images
also someday i hope to upgrade my hardware but electronics are not very cheap.
Because 99% of the SD 1.5 models are designed by h0rny people. I got the same issue when prompting for something else. Imagine that I prompted for a cat or a dog and I still got a woman.
1.5 was just easier to overfit honestly, it's not just the user culture, though that's certainly not untrue
people can certainly sit home and generate cat pictures if that's a bucket list item
Hi everyone, I am new to stability AI platform, just looking at the pricing page, is there a free sandbox for dev to use? or just 25 free credits and have to start paying after that?
how fucked up is fucked up 💀
whats the best model for prompt listening
i mean listenin to prompts
prompt recognition, prompt adherence...
As for 1.5, I think dreamshaper 8
that would be SD3
if they release it 💀
let's not get ahead of ourselves
it doesnt exist until its open sourced
ok, SDXL is the next best, by a wide margin, some argue 2.1 was good....but no content so who cares
1.x isnt on the map
Hope the SD 3 will just have better controlnets out of the box. So I have tested like 10 canny controlnets models for SDXL and only 2 worked decent out of the box. Now I'm testing like 20 depth controlnets models for SDXL and everyone seems to work just good. Why is this difference between them? :)))
stable video 3d workflow? got one but two nodes are undefined missing node install doesnt work
yeah I hope they won't mess up controlnets
they are teasing it to be a big release and stuff
I simply have all the controlnets available models for SDXL and I'm trying them to find the best working ones. At the end of the tests, maybe I'll release the list with the best ones.
that'd be great
From CVL-Heidelberg, diffusers, kohya-ss, stabilityai, SargeZT, TencentARC and so on
Testing them all
Finished with all the canny and found only 2 good with the default settings (weight 1, start step 0, end control step 1)
I'm at the depth right now and every single one worked decent from the ones I already tested
yeah depth is usually okay with sdxl
Do I have to have the SDXL base installed if I want to use an SDXL checkpoint?
Like am i required to have both?
nope
as far as the ai animation space goes, i'm refiguring out things like touch designer, to masks and controlnet frames for animatediff stuff
does instantid not work well with profile faces? kind of struggling with it
from the side? yeah it does better with front on. where it can see all the features
profile photos do good for creating other profile photos maybe though. and i think you can load multiple images into the instant-id model
i keep calling instantid faceid
you wanna get sued?!
wait until you go look at the LLM scene and they use tools like ooogabooga
yeah, that’s a kind of stirring the pot that i’ve settled on a ‘no’ for
I mean, if you have a distinct creative vision or concept you’re trying to go for that doesn’t infringe on X, Y, etc.
then go for it
it's been a long time since last time i used this thing
did it stop being free? or something else
so I've been out of the loop for a bit. for general purpose image generation I was using SDXL, but now there's Cascade, Lightning, Turbo... does anyone have a simple resource or overview of what all these models are good for and how to use them?
SDXL and Cascade are base model architectures like 1.5 and 2.1 were.
Lightning and Turbo refer to methods for editing models so that they run in 1-8 steps rather than 20-50
Lightning and Turbo only apply to sdxl
there is also LCM which is similar to lightning but also works for 1.5 models
ah cool thanks. So Cascade is the newest base model? I'm hoping to get some better prompt interpretation. The visuals are less of a concern because I can always run it through a series of older models in a ComfyUI network for fine tuning
Cascade is, but I wouldn't dig down that rabbit hole as there aren't many resources for it and sd3 is imminent
A ZIP-like one that gives Latent more compression, but also requires a decompression model. So it becomes a three-part structure
gotcha
Can I ask about those online models which generate text exceptionally well, how do they do it?
compared to SDXL models
do you think any of the major ui's will have lavi-bridge support before i'm able to sharpen up my coding skills and do it myself? or should i just wait like pedro?
better datasets
Just fire up StarCoder2 and like…manifest it, i dunno
ideogram and SD3, their text, titles and words are all accurate, that is really amazing. But as you say, is it just better and larger datasets? That "easy"?
I'm not too excited for SD3. I hope I'm proven wrong. I didn't like 2, it didn't adhere to the prompt like 1.5 did. You could type something simple like "red shorts" and the shorts would be a different color.
that happens on 1.5 still to this day
xl too
To a certain extent, but I found that 1.5 generally gives you what you want, more than 2 or xl.
what a time to be alive!
https://www.youtube.com/watch?v=5U_Q2Lmnq_c&list=WL&index=4
is probably just confirmation bias
Ehh I don't think so. Whenever I tried to use it, I never could control it as good as 1.5. I eventually just went back to 1.5. If 3 works better, I will be very happy. The visuals definitely look better in 2 and SDXL.
Could be my issue, that I'm prompting wrong...but I gave up after a while
i think you've decided already and will find the same evidence come 3 release.
because SD1.5 includes an NSFW dataset, which increases Generalizability and Singularity a lot
SD3 seems so much better that I think ppl will stop using other versions soon after it releases, but I could be wrong
I hope so. Eventually, there shouldn't be a question as to which version is better. I want the newest version to be objectively better in every way.
for me SDXL seems better in quality.
I agree. It does look nice.
It'll still need some time for finetunes to catch up but I agree
in the end of the day SD3 should be the best 🙂
SD3 is superior, you can simply notice it looking at Thibaud's posts on X where people posted their sdxl images and he replied with the sd3 version