#🆕|sd3
yea i heard theres not enough male fur content there 😏
so ignore the license, it dont matter right?
until you make $1m a year
distinguished furs
While I've seen several people make money at it, I have yet to see anyone make as much as first world country min wage. Still Hella fun tho
"Oh hi! Did you see my dog? I seemed to have lost him and thought he just ran behind that hill. Could you like help me plz?"
Oh believe me I try to rectify that daily 😉
male fur content. Is the fur male? Is the subject male, or the content intended for males? 🤨
When you were invited to Stability AI to clean up the mess.
Yes.
ah, cool!
I need sd3 but not medium
Just need another two weeks for the new release
@sage burrow they're calling your name here
had the opportunity to meet Prem live at the beginning of the week, he's gonna do great
They put an ad out recently looking to hire more content monitoring staff. That's a good sign I think.
They never got back to me in particular though lol
the trust and safety is neither trustworthy nor safe
I only have 8gb ram or I'd have about 20 made already 😉
you make loras locally with only 8gb ram, or vram?
i was doing batches of 10 with 16gb. dit is more efficient i guess?
Technically I made a checkpoint with my 8gb gpu 🤣. It had an 18 image dataset. it sucked lolololol. The various programs want 12gb.
Probably the same for loras.
So no rofl
But there is gpu rental
I make sdxl and pony loras via civitai, but they don't have sd3 capabilities yet.
same, but I have 12gb vram
im youhnr5 on civit, only made 2 fractal loras
and a merged sdxl checkpoint (locally)
With his portfolio, I don't even doubt it.
There's gpu rental, but I'm too concerned I'd pay for the hours trying to figure out the install.
Also I'm trying to be patient and wait until sai releases all the needed info (better quality models)
I learned the merging thing the other day, fun! That can be done on 8gb gpu.
With 12gb vram I recommend everydreamtrainer 2. For checkpoints
It is possible to finetune sdxl with 12gb vram
Hmmmmm
How do you merge loras with checkpoints?
I think the first intended use for loras was tuning, not an add-on module
I mean no loras
See 3th image on https://comfyanonymous.github.io/ComfyUI_examples/model_merging/
Lockup everydreamtrainer2 if I could make it do anything in my machine...lol
But even dreambooth can run on 12gb!
Also kohya
Say LadyLalita, if you want, plz share you civit name
First time seeing this, interesting. What difference between this and onetrainer?
3th
What about the 1nd or 2st one?
thats normal model merge, without loras (but with vae included)
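For anyone following the merge question above, here is a minimal sketch of the two ideas being discussed: a plain checkpoint merge (a weighted average of two models' weights) and baking a LoRA into a weight matrix. It uses plain Python lists rather than real tensors, and the function names, the `alpha` blend ratio, and the `scale` factor are all illustrative assumptions, not ComfyUI's actual node implementation.

```python
def merge_checkpoints(state_a, state_b, alpha=0.5):
    """Blend two checkpoints as (1 - alpha) * A + alpha * B.

    Each state is a hypothetical dict mapping a parameter name to a flat
    list of floats; real checkpoints hold tensors, but the math is the same.
    """
    if state_a.keys() != state_b.keys():
        raise ValueError("checkpoints must share the same parameter names")
    merged = {}
    for name, weights_a in state_a.items():
        weights_b = state_b[name]
        if len(weights_a) != len(weights_b):
            raise ValueError(f"size mismatch for {name}")
        merged[name] = [
            (1 - alpha) * a + alpha * b for a, b in zip(weights_a, weights_b)
        ]
    return merged


def apply_lora(weight, lora_up, lora_down, scale=1.0):
    """Bake a LoRA into one weight matrix: W + scale * (up @ down).

    weight is out x in, lora_up is out x rank, lora_down is rank x in,
    all given as nested lists.
    """
    rank = len(lora_down)
    patched = []
    for i, row in enumerate(weight):
        new_row = []
        for j, w in enumerate(row):
            # Low-rank update for this single entry of the matrix.
            delta = sum(lora_up[i][k] * lora_down[k][j] for k in range(rank))
            new_row.append(w + scale * delta)
        patched.append(new_row)
    return patched
```

With `alpha=0.5` the first function is an even blend of the two models; the second shows why a LoRA can be "merged into" a checkpoint at all, since it is just an additive update to existing weights.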
I hid all my loras on there, just did some drow ones. BTW warning put on sfw mode when viewing the very few images I posted on there. My civitai username is oipteaapdoce
Less gpu requirements I think lol That was my primary concern when searching for training software lol
They are probably all the same essentially.
Kohya is the lora one though (as I understand it)
thanks ❤️
I did rofl
I'll try, but when it will be less warm where I live)
Also their discord help section really helped me with my dumb installation questions a lot
okay, that is nice
im going to do a series in sdxl/pony merge, to later img2img them with sd3
easier to get some poses in back view
In theory yes, in practice, negligible differences I think. Though there could be more out now. There are a couple of neat loras around though
Where did all the art go!???
oh they all are on huggingface?
anyways, I suppose civitai should allow sd3 after license change
what does that name mean
Huggingface and also that sketchy site... shakker I think it is
They prob make more than a million though
The best lora ever is on huggingface, it's called friedeggs
How are they profiting?
It costs money to make models on there. I think it costs money to render images on there now also (I'm not sure, I get my daily free buzz)
The real question is how is huggingface profiting??????!!!!!!!
you use their GPUs, you do wind up paying costs
our community’s high expectations yeah, if they weren't so high, everything would've been fine. /s
its tea time
Which new version is coming?
balls version
HampMan?
bet you guys already heard the news, but I'll still put here what might bring back home imo
- Continuous Improvement: SD3 Medium is still a work in progress. We aim to release a much improved version in the coming weeks.
- Free commercial use appropriate for individual use and small businesses (up to $1M per year)
https://www.reddit.com/r/StableDiffusion/comments/1dw6o4y/live_reaction_and_review_of_the_new_sd3_july/
So quite literally, they did not change the biggest issue ppl have with the model's licensing. Bravo
- Revocable license
- On termination of license (if revoked, or if you ever hit $1m in revenue, whether related to the model or not) must delete all derivative works.
- Can't train any other models on SD3 outputs.
From OML discord
What bothers me is the fact they expect the community to fix the parts of the model they themselves lobotomized while also being hostile and condescending to said community
Finally
Yeah but now the community will be able to do it because of the license change
No they won't because the licensing is still strict
And it still has that stupid deleting shit in it
Emmm no
Read the actual license
If they do finetunes that will enable SAI to keep making trash models
@tropic aspen
"WILL NEVER ASK YOU TO DELETE RESULTING IMAGES, FINETUNES OR DERIVED PRODUCTS"
Can't train other models? Did I read it wrong in my excitement? It seemed like one could. Because I plan to train a LOT
That's images though, not models
If they switched the license now does that mean they can change it at any given time?
Either way sd3 seems to be cursed
f. Term And Termination. The term of this Agreement will commence upon Your acceptance of this Agreement or access to the Stability AI Materials and will continue in full force and effect until terminated in accordance with the terms and conditions herein. Stability AI may terminate this Agreement if You are in breach of any term or condition of this Agreement. Upon termination of this Agreement, You shall delete and cease use of any Stability AI Materials or Derivative Works. Sections IV(d), (e), and (g) shall survive the termination of this Agreement.
They say you won't need to delete the images, yet here if your contract is terminated (which can happen if you make more than $1 mil a year), you're fucked
“Derivative Work(s)” means (a) any derivative work of the Stability AI Materials as recognized by U.S. copyright laws and (b) any modifications to a Model, and any other model created which is based on or derived from the Model or the Model’s output, including “fine tune” and “low-rank adaptation” models derived from a Model or a Model’s output, but do not include the output of any Model.
Is that 1mill before or after expenses?
I guess that's not our case, isn't it?
It is (unless you make more than 1M per year with it)
It also is for the ppl that might think about fixing the model
Why?
As I get it, if the agreement is terminated, then you should delete all derivative works. It could be terminated only if you breach any term. The main term as I get it is the fact that you haven't paid if you reached 1M$. Or am I wrong?
There could be other terms as well that we don't know about that could also get you terminated
The only one we know of is the $1mil one
read what stability posted, not what someone posted here
License is fine
It’s cheaper in china for cloud fair
My lawyer (ChatGPT) checked the license and said that the main breach case is 1M$ and others are just about distribution without mentioning stability AI or training foundational AI models
My lawyer (Gemini Advanced) lied to me the other day about SD3+SDXL merges! 😭 lol
🤣
I wish there were lora training guides on sd3, I would love to train them (or at least try 😁 )
SD3 the safest child play model that Chucky approves of 😂
search this channel for 'balls' - there's a guy that's been working on a lora who can probably give you a few hints
Thanks for the link!
Seems some people need to come to terms with the fact that SAI's models are NOT open-source but their version of openweights. The license is made toxic so that commercially building on the work of SAI must share profit with SAI (no issue if there is no profit). Non revocable license are a rarity, so don't expect that, the deletion on termination is a logical consequence of termination. At the end you're at the mercy of SAI, it's not open-source, it's some weird amalgamation of a SAAS product that you run yourself.
That outputs of the model are derivatives is too much if you consider only the images outputs, but a distilled version would also be trained on outputs of the teacher, in which case it suddenly makes sense. I'd argue that using sd3 images as part of training doesn't make the new model a derivative, otherwise any imagemodel would be derivative from any single image i saw, which it clearly isn't as that's the whole reason image models are trainable (well except according to anti AI folks) so i'd say don't sweat it on that very strict interpretation.
Ultimately, unless you're heavily/big time commercializing your works on SD3, you should be fine, which should put fine-tuners and researchers at ease.
went on reddit expecting people to roast the license, found out they're roasting @viral plaza instead
So civitai if they make 1 million a year, will have to delete everything sd3 related if they stop paying for the license. Still dead in the water if they don't manage to get civitai to their side.
I doubt people would use sd3 output images as training and if they do..... WHY USING AI OUTPUT AS TRAINIIIIIING? bonk
I believe the distribution of the model is not the case
Looks pretty reasonable: if you don't abide by the license you can't use the model.
That's literally on every single license agreement. What's the point of having a license if you can just ignore it?
Also the only ways to not abide by this license are 2:
- you make illegal stuff with it
- you make >1m$ and you didn't tell us
So seems like they didn't address the main concerns of the license, so it's still DOA
That's too bad... hopefully they amend it again.
How do you know?
you'd be surprised, dalle, mj and ideogram datasets are available and finetuned with 😉
also it's funny, last time I checked InvokeAI was a SAI competitor (that makes >1m$ revenue using 100% SAI products)
Yeah I know but because its easier, not because its better 😁
Civit AI doesn't use the model, they just share it with users. As I get it, the license says that they have to mention the license and SAI and that's it.
Nothing like real world examples for training (photos, drawings, paintings, well and everything else)
why do all the UI developers have feuds with each other
if the license were irrevocable, then how would any terms be enforceable?
This is not the case but, in general, anyone not abiding with any license for any product can't use that product. That's literally the definition and purpose of a license.
Invoke is our own product, fwiw.
I might be wrong, just saying
Civitai makes money. If you make money and in some form use the model, then you have to pay the licence
also how do you know Civitai is not already paying?
If they use it
Because Civitai has banned SD3 from the site, and is participating in the OMI to train a new model that is actually openly licensed.
they run inference service.
does shakker also pay or they just run it as is?
It says 1 million dollars you have to pay a licence and if you stop paying, you have to remove those models you hosted. Isn't that how it can be interpreted. If not, how should it be interpreted?
The last license was kind of shit.
It still is.
So what are the main concerns of the license that weren't addressed?
They do use it
Photorealism.
@low stone
If they earn more than 1M on inference, then yeah
The reddit link above goes through it 
Oh PcMacsterRace linked it, ty!
to train on fake people
That would work training on real photos 😁, which is hard nowadays because of the mf filters and oversaturated colors😡 🤬
Explain why.
It's literally free for 99.9% of people and researchers and you only have to pay if you make >1m$.
Companies that make >1m$ on SAI products should definitely contribute to the community so that SAI can make more free models for everyone.
it's just one particular troll
the only ways to infringe this license are to make illegal things and to make tons of money without contributing.
(at least as far as i can see)
<-- 0 sympathy for companies which make over 1m per year regarding having to pay for licensing!
I just don't see what the problem is. You use SD3 to make a derivative. You make more than $1M with that derivative but don't want to pay the license but want to keep using it. So you just want stuff for free and want to make money off that free thing with no limitations. ?
The video goes through why 
I'm all for SAI monetizing, but the team at Stability has not responded to any of my outreach regarding SD3 licenses. I'm very clearly well versed in what the community, as well as enterprises, are looking for in the license. It's amusing that SAI sees Invoke as a competitor when I've actively engaged with multiple executives at SAI to try to figure out a way to partner.
This license is not open source. It is not viable. And I'm really disappointed to have to come here to defend Invoke which has been building OSS since the early days and been trying to partner with SAI.
Yeah but they could. So they should be okay with the fact they will need to pay stability a licence fee for life. If they are okay with it then good for them. It's just a matter of will they be okay with it or not.
it's always one particular vocal troll or minority. But damn 700+ upvotes. It's overshadowing the license.
To be fair, most companies pay. Some claim "we trained our own model architecture from scratch"
700+ upvotes on dungeonmaster's post, not the troll
Even if you make 0 cent with sd3 you still have to pay the license. It says any source of income, not only sd3
dungeonmaster's post is complaining about the kling/luma spam and broadly I agree with the core point of it
That's not true. it specifically says you pay zero if you don't make over 1M.
the new license.
As Alex said, I do not represent Stability as a whole, I am my own person.
Also who did you reach? I'm available to talk if you want.
I hope the new license addresses your concerns.
But civitai could
If they make over 1M, they have to pay. I don't see the problem here.
They will need to pay even if they did not make even 1 cent with sd3 but are winning over 1m in revenue a year
It did not.
that's why there is no fixed price and will be decided case by case
DM me, let's see what I can do.
Otherwise they need to stop using those models, hence remove everything for the website
but why can't they just make the products that they made with millions of dollars of R&D and compute costs available for other people to make money off of in exchange for absolutely nothing 😭 😭 😭 /s
The video also goes through why it did not address the licensing concerns, I'm not sure why that's being ignored 
Are you familiar with educational vs. commercial licenses? Generally free for small businesses and educational institutions because they know people will get hooked on the product then go out into the world to bigger businesses and then start paying for real licensing. I don't understand why a company making more than $1M a year has a problem with paying licensing fees.
here are the concerns --
https://www.youtube.com/watch?v=23tEjvCNZHA
In this live reaction video, we read through the newly updated July license for Stable Diffusion 3. We’ll look at the license line by line, sharing our thoughts and highlighting some of the problematic terms that studios and artists need to be aware of.
00:00 Introduction
02:23 Non-Commercial & Commercial Use License Definitions
08:13 Distribut...
You can DM me the response, or respond here.
Lykon S. Stabilityson, physical embodiment of Stability AI, is currently being piloted like a mech by new CEO Prem
bringing this to DM
Main issues:
Revocable license
On termination of license (if revoked, or if you ever hit $1m in revenue, whether related to the model or not) must delete all derivative works.
Can't train any non-SD3 models on SD3 outputs.
Sure.
My point is the fact that I'm able to bring those arguments means that the license is still not clear and leaves room for interpretation. That means big companies will still need to think twice
how do you expect a license to be enforceable without being revocable?
this is pretty standard. I remember speaking to legal about this myself and they showed me it's on literally every license.
that issue is no more
had your same concern
also the last provision is only even remotely enforceable if you are making the outputs yourself. ai generated images are public domain.
I don’t think it’s on an MIT or Apache license?
its kinda the norm for enterprise licenses to have some negotiation / flexibility
The revocability of a license creates almost zero incentive for any enterprise looking to develop the model as IP -- The terms can also be updated unilaterally.
As we've already seen. This isn't open source.
There's some tricky bit to be fair. For example lumina is saying they'll use the SD3 VAE, they will license their model MIT, so everyone can do whatever with it. Now if big inference site uses it, or derivatives, that SAI license is attached to it -> problem
Of course, one could argue that researchers should just use properly licensed components (and in this case they definitely should, it's a big group), but it's a slippery slope.
In a way it's fair SAI plays it safe, but there will always be a tension between how much is new and how much is based on SAI derivatives, and at some point it won't be fair, as SAI especially puts their work out there just for researchers to use.
They should pay licence fees if they make money with Sd3, however the current way the licence is phrased does not require you to win money specifically with Sd3 in order to pay the licence.
Civit runs a site that serves models/images etc. They're making money off the fact that SD3 derivatives are on their site. People like me specifically go to their site because I'll see SD3 content. They're making money off it.
and I pay them!
well not much
civitai is cracking down on adjectives however, gonna be hard to use it for generation probably
"almost zero incentive" aside from, you know, the >$1M you would be making from it. Stability also doesn't particularly have any incentive to rug pull you, they lose their licensing fees if they update their license and terminate you as a result of the changes.
we just need a new ai company imo
Then I agree they should pay the licence fee.
they can rugpull anyone if theres pressure from an outside actor
OMI might be smth to look into
I feel like it's definitely a move in the right direction. Only time will tell how the community reacts, but I hope this means civit can at least start having sd3 content again.
I dunno why the fee part is even an argument
the issues lay in the license
I would pay a monthly fee to someone who is not so crazy censored
there's a service that's pretty much not censored, rhymes with instagram
It's generally our lawyers will call your lawyers, past the basic simple public for all licensing. For any product/service.
Wait for finetunes
unless your plan involves seizing the means of production in the name of the workers, you're going to find that literally any company producing foundation models needs a business model that involves someone other than investors giving them money, and will also need to take actions to avoid the wrath of regulators especially if they are high profile
yea this might be a good idea
I figure that's what the contact us if you make over 1m button is for (on all software/services)
I have no plan! I am too old! You young folks should do something tho!
Yeah if there is still debate about the licence this is not very good. I'm finding a lot of people that are very knowledgeable are still refusing to finetune it
they said on Reddit that they can't cause they got no license and have asked for one since before the announcement of SD3 in feb. whether that can't get means SAI ignores us, or SAI and us can't agree on terms, is left to speculation 🤡
i would expect that civitAI getting their license would come after the license is updated
civitai is cool and all, but I really think they need some competition to get even better
nothing like money to make you git gud
AI dreams: to need one of those fancy SAI licenses! 😄
yeah well, a lot has happened since then. New CEO, new investors, it's basically a new company.
If the license isn't viable for that individual, then it makes sense for people to refuse to finetune on it. 
Either way, i'd really hope everyone would shut up about that license now, normal non big companies can use it all free of charge. i'd think that's about as good as it can get for products that are this expensive to train.
The revocable license could be also revoked for nsfw finetunes. And fine-tuning cost a lot of money so...
It's purely at SAI's discretion, right?
any one fancy stress testing the license?
Yeah so if they like they can revoke it for whatever reason
Yeah, that's an issue. I don't blame people for not finetuning with the way the current license is
lol
I wish there was a img2img method to inject stuff into the model, like first make the normal image but with an additional button "merge into model"
There's stuff like IP Adapter that can have a similar effect
I'm much more interested in how much of the new SAI models will be open-weights and whether they'll heavily differentiate between open releases and API only offerings, like that shiny 8b model in the API 😉
but not merging it into the model, obviously, but yeah, not aware of non-training method of doing that
their current AUP only prohibits using their model for creating NSFW content that is already explicitly illegal, they also already know that a sizeable portion of their market share involves NSFW models and inference services that use them and are quite unlikely to change that since that would mean less money. I'm not concerned about it at all, and if anyone gets dinged for making models for creating deepfake porn of celebrities then I will laugh at their misfortune.
No, the revocable part is something separate
this just in,new president bans pics of feet and sai gets forced to revoke license of all ppl making feet pics
i made it simple for you
if that ever happened, I would imagine that most people would have more to be concerned about over the law than they would about the license.
in fact there is absolutely nothing that could be different about the license that would help in that situation
luckily my new president is not part of a big group of powerful politicians who wants to ban all porn
It's not only for trump, but all us presidents
how is this relevant to sd3?
yea theres a few dems on that group
not directly, just to follow up the statement on president banning feet pics
Revocable part of the license is at their discretion, so it can be any reason
I feel that all presents are equally viable as zombie and/or gay romantic dinner prompts, regardless of affiliation 😉
doesn't all customer serving companies reserve the right to stop their service?
When someone has the ability to take away everything on a whim, it makes sense for them to evaluate whether or not it's worth it 
f. Term And Termination. The term of this Agreement will commence upon Your acceptance of this Agreement or access to the Stability AI Materials and will continue in full force and effect until terminated in accordance with the terms and conditions herein. Stability AI may terminate this Agreement if You are in breach of any term or condition of this Agreement.
Here is the part of the license that directly contradicts what you just said.
That's something different drhead
depends if its a justifiable legal reason
Name the section where it says that the license can be revoked at any time, for any reason.
Watch the video, he goes through it
I'm surprised you didn't, to be honest, since it's been core to the conversation for awhile
Let me know when you're done watching the video, I have to brb for awhile
That is a video. Name the section of the license.
I am not going to watch a 20 minute video when 30 seconds of you hitting ctrl+F on https://stability.ai/community-license-agreement would be sufficient.
zoomers...
skill issue
Thing is, it's not my job to find supporting evidence for someone else's argument.
As long as there is debate, the new license won't fix anything
its not your job to do anything this is a discord server not your professional reddit cv
Why are you being such an asshole?
II. RESEARCH & NON-COMMERCIAL USE LICENSE
Subject to the terms of this Agreement, Stability AI grants You a non-exclusive, worldwide, non-transferable, non-sublicensable, **revocable** and royalty-free limited license under Stability AI’s intellectual property or other rights owned by Stability AI embodied in the Stability AI Materials to use, reproduce, distribute, and create Derivative Works of, and make modifications to, the Stability AI Materials for any Research or Non-Commercial Purpose. “Research Purpose” means academic or scientific advancement, and in each case, is not primarily intended for commercial advantage or monetary compensation to You or others. “Non-Commercial Purpose” means any purpose other than a Research Purpose that is not primarily intended for commercial advantage or monetary compensation to You or others, such as personal use (i.e., hobbyist) or evaluation and testing.
Anyways, I'm outty, there's your proof
It says that the license is revocable, but does not say under what conditions it can be revoked.
"Subject to the terms of this Agreement, "
Exactly
And this is where it says where it can be revoked:
f. Term And Termination. The term of this Agreement will commence upon Your acceptance of this Agreement or access to the Stability AI Materials and will continue in full force and effect until terminated in accordance with the terms and conditions herein. Stability AI may terminate this Agreement if You are in breach of any term or condition of this Agreement.
Still vague , what are the terms of agreement, the whole license? Do I need to get a lawyer to understand it ?
at least they promise to only terminate the agreement if you're in breach of it
I'm glad they didn't ask for my Civitai userid in that form lol
no, it's not long or hard to read
You do you, my dude. 
They're right here, and frankly most of it is quite comprehensible: https://stability.ai/community-license-agreement
prob should get a lawyer to look at it and not a youtuber 😉 But it all starts with expectations, SAI models are not open-source and not free as in free beer. Whether that's fair, guess it depends where you come from. I get the feeling people want too much.
It's also not a promise, it's a binding part of the contract. They cannot terminate the agreement unilaterally as long as you follow its terms.
a "written binding part of a contract" is just a fancy promise
@sage burrow
how long will people talk about the new license now
the kind of fancy promise that you can sue over breach of contract for (wow there's actually no binding arbitration clause in this)
til someone releases a sex finetune on civit then everyone will shut up
I am talking to hipsterusername right now fyi.
Please refer to the blog post to get the "easy to read" version and the intentions behind this.
For the license agreement itself, unless you're a lawyer (and I am not), it's more complicated
im quite happy with the new license
should take away some fear to create derivatives, like loras. hope it will be on civit on site gen soon
I don't finetune models or use them commercially, it doesn't affect me. It affects the finetuners and sites that host SD3, so I'll leave that to them 
I hope civit makes less than 1m per year 😄
If people deem the license viable to them, then they can use it. If they don't? Then they won't. It's really that simple.
the financial people must be working hard @ civit
I think peoples concerns are completely valid
breaking news,civit creates several shell companies/sites with online generations that make 500k each to avoid paying license fees
Huggingface is even more awesome though
One would hope not. I pay for my biz software.
Hugging face is a great project with a very shitty UI/UX
VERY
And that's why civitai ai exists
civit and huggingface is difference between developer and end-user platform 🤷♂️
i feel like we must not be using the same site? civitai's interface is slow and pretty much changes every time that I have a new model to upload, HF is simple and fairly fast and I can just upload stuff from the command line
How do you make your money back on open source however? Licensing will do that?
It's way easier to find stuff in civitai
hooray youll fix stuff! XD
By selling your services to big companies
huggingface feels like the comfyui version of civitai
its not user friendly but the good shit is hidden there
Invokeai has a great inference interface and they make money with selling their services to companies. Stability can sell their model to companies and services if they need to make some.
Comfy is my fave! That explains things lol
i hated comfy and now it is my fav too... i think most users go through this cycle - denial(nah A1111 is fine) anger(i cant do this and that here wtf... maybe i shld use comfy) acceptance(ill use comfy)
Yep comfyui in my opinion is the greatest project that needs protected at all costs. Even revered
if you want comfy's power but don't love comfy's noodleinterface, Swarm has a friendlier frontend main tab for that
what stage am I on if I'm jury rigging triton kernels in to the point where it's faster than comfyui
Raptured.
Comfy was the first one I tried. I liked it because I managed to get decent images first try. I've seen uses for 111 that look helpful, but fortunately there's comfy workflows for them
yay they fixed the licence. Also -->Continuous Improvement: SD3 Medium is still a work in progress. We aim to release a much improved version in the coming weeks. <--- IN 2 WEEKS!
Yes swarmui is the other side of the coin. I see a bright future for these two projects. I only hope they will expand their inference code to llm's with llama.cpp, langchain and llamaindex and they could completely dominate all the open source userbase.
What's Swarm?
I will probably abandon A1111 instantly the moment that someone creates a frontend that easily supports every feature that A1111 does, is extensible, and also has tablet pen pressure support and image editing that is otherwise at least on par with gradio 4's image editor
Swarm is for noodlephobiacs
if you want the benefits of noodles and the look of A1111 swarm is for u
comfyui puts the spaghetti in the frontend, A1111 puts the spaghetti in the backend implementations
idk it becomes kind of intriguing and addictibve to hook the noodles up - kind of like an old moog synth
its not that scary after a while
What can 1111 do that comfy can't, even with the fancy nodes?
i see people constantly complain about interfaces for inpainting and regional prompting
i still think there is room for some nice polished UI (invoke is closest, but always a bit behind) swarm on top of comfy feels like you had one problem, now you have two, as i inevitably ended up using custom nodes in comfy, but swarm is super easy to get started and set up.
oo pen pressure is one I haven't tried to get working yet - I do think I have the tools to make that happen tho
@jolly swan Hello good sir, do you have any official word or update on what you think of the new license update?
#📣|announcements message
- Does it answer all the questions you had?
- Will you consider training an SD3 pony model after you release 6.9 with this new licensing update?
- Side question: Do you have an estimate for when you think SDXL Pony 6.9 will be released?
Whichever version glif is running knows who Goodra is, but Taesd and Flash via HF both do not. They do know what lululemon is though.
tbh what I really need is a good krita plugin, the one that exists just doesn't have as much flexibility as I felt like I had from the last decent krita plugin, and the old one doesn't adequately support things I do now. But having pen pressure on a gradio like inpainting interface would be a step up
That sounds like SD on a phone or tablet? 🙂 🙂 🙂
cool
awesome
the mask editor in comfyui actually has pen pressure
Some of them but not everything.
I will wait for legal review just in case.
The model is still meh so I need to understand what is the priority between OMI/XL/SD3
Well, I am trying to start training it but the data dungeon is not letting me go (i.e. I am dealing with captions quality now).
- What's still missing they didn't answer?
- When you say wait for legal review maybe you could just wait for civit to turn SD3 back on and consider that 'good enough'?
- What's OMI?
- So you're still in captioning stage good stuff, is it just a matter of waiting for the process to finish or is it a more interactive hands on process that you have to like babysit if you will?
I am still not sure the license is written in a way that actually reflects the announcement so I will wait for Civit to confirm
https://www.reddit.com/r/StableDiffusion/comments/1dp2as9/update_and_faq_on_the_open_model_initiative_your/
I am doing a round of finetuning the model on curated captions, then running it on more images, fixing captions by hand and then going back to finetuning on a larger dataset
usually you sell services
that was pretty easy actually - swarm's image editor now has initial pen pressure support
currently just for brush size dynamics, but now that i have the foundation there it can easily be expanded to do other things
thanks for the answers, much appreciate your work again, if you don't mind me asking, what is about the quality of the captions generated that requires further refinement by hand?
Whoa you’re orange
whoa you're green
A lot of things, and it gets worse when you have multiple characters + nsfw. There are all kind of failure cases, not getting the right character names, misnaming things, missing OCR, etc...
The quality is already better than anything publicly available by a mile (aka things like Cog), but I want to make sure I squeezed everything I can.
got it so the vision model kind of hallucinates sometimes or mixes up elements, makes sense, thanks for the insight and good luck man
the points InvokeAI is making are essentially 2 (and only apply to companies making over 1m revenue, but in reality they apply to nobody):
- I am a company that just hit 1m$ revenue and I'm waiting for the Enterprise license. The license agreement says I have to delete everything.
- No. we do not expect you to delete anything while you wait for the Enterprise license. Nobody expects you to do that. That's phrased the way it is because there is no way to say otherwise in legal terms, we can't put any grace period in the license, but still nobody wants you to delete anything as soon as you hit 1m. Also no company hits 1m overnight. Usually you know in advance when that's about to happen.
- I am a creator who made a lora with character X. I can't use outputs of character X to train another lora?
- That point in the license refers to "foundational models". A lora, finetune, retrain, retrain from scratch, etc, is not a foundational model. It basically means that you can't use sd3 models or outputs to make a """new""" model architecture and claim it's your own just to bypass the license.
Both are pretty standard legal terms and are in multiple licenses. Of course, being something made by SAI there should be an outburst over the simplest things
And when legal teams review them, I'm sure that'll all be sorted
After all, it's up to the finetuners and sites like civitai on whom they want to host or use
Though I do appreciate the response
I'll make sure they're addressed in a faq. There is no way legally to put them differently in the license document
so if you decide to be scared by that, your choice I suppose.
love the progress on the license!
i just see "if you make less than 1m in all revenue, you can use sd3 with no paid license"
of course if all you do with SD is cool avatars for your employees, you probably won't pay shit lol.
That's why there is no price on that tier.
and if i make cirno?
depends on how much money you make with cirno images 
(plus this is all self-reporting, come on lol)
power bill says i lose 22$ a day
then we give you 1 $ per image
When are we going to see 8b in the wild?
real
(by the way that's net margin, not revenue)
(if you make 21$ but you lose 22$, you still have 21$ of revenue even if your net is -1$)
whenever the s.ai higherups say so, some devs Acutally Want a good model to be published
yoinked powers strike again
(but I personally don't think 8b is ready for release)
the ddpo server shows Extremely Good Promise
(there is 100% gonna be a "X doing Y" prompt breaking the model)
someone will find a prompt that breaks any model
something like "cirno from Touhou making a nuclear weapon"
I think also nobody can tune sd3 atm even though they are trying
no-one is trying hard enough*
kohya is trying pretty hard lol
I hope we can give them help and resources soon enough
thank you!
most of the answers to "why SAI doesn't do X" are usually "we want but we need time and resources"
why doesnt sai just release 8b as 8b beta, like how they released XL as 0.9
(safety, its always safety)
well, there was a time i thought this #🆕|sd3 message was bad (from 8b), but times have changed, that one is perfection :p
most models are kind of bad at sideways and upside down anatomy and faces
humans are also bad at that (thatcher effect)
well that was kind of the plan for 2B but we know how that worked out. SAI is eternally cursed to have an early release that splits support for any mainline model they release
I have hundreds of bad gens for every model I train
||that was supposed to be 2b release||
why doesnt sai just release a model correctly and not rushed
Just use 2B to figure out what you want to do with 8B, if you're done with testing stuff before 8B releases you aren't doing enough ablation testing.
e.g. XL, audio-open (although that isnt actually s.ai but actually just harmonai), svd
hi comfy
like I cannot imagine trying to do any major project on SD3-8B without doing basically the whole training run on SD3-2B first. You don't want to piss away that many cloud compute credits on ablation testing
I hope in 2 weeks time they also improve the 2b version
What is ablation testing?
you change things about your training regimen in an organized manner to see what works better
likely a better candidate than what's being proposed now
That's the 2B that was released
too many fennec anime girls in the dataset, for sure
but I meant as a President candidate
like doing image gen grids, except for training if you did a grid it would cost 50 million dollars so you do it more carefully lol
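A minimal sketch of the difference described above between an organized ablation (change one thing at a time against a baseline) and a full grid. The setting names and values are hypothetical and only illustrate how fast the number of runs grows; they are not anyone's actual training config.

```python
from itertools import product

# Hypothetical baseline training config (illustrative values only).
baseline = {"learning_rate": 1e-4, "batch_size": 32, "caption_dropout": 0.1}

# Candidate values per setting; each list includes the baseline value.
candidates = {
    "learning_rate": [5e-5, 1e-4, 2e-4],
    "batch_size": [16, 32, 64],
    "caption_dropout": [0.0, 0.1, 0.2],
}

def one_at_a_time(baseline, candidates):
    """Yield configs that differ from the baseline in exactly one setting."""
    for key, values in candidates.items():
        for value in values:
            if value != baseline[key]:
                yield {**baseline, key: value}

def full_grid(candidates):
    """Yield every combination of candidate values (the expensive option)."""
    keys = list(candidates)
    for combo in product(*(candidates[k] for k in keys)):
        yield dict(zip(keys, combo))

ablation_runs = list(one_at_a_time(baseline, candidates))
grid_runs = list(full_grid(candidates))
print(len(ablation_runs), "ablation runs vs", len(grid_runs), "grid runs")
```

Each ablation run can be compared directly against the baseline run, which is why it is cheaper to do this on a small model before committing compute to a large one.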
not enough, it really likes putting human ears lol
you know we actually had to put "that prompt" in the training grid, sigh...
not real kanjis, 0/10, sd3 sucks
Can't you make a perfect case that public testing in discord is needed again so that all popular failure modes are detected ahead of time 🤡
you have no idea how many times I suggested that lol
Oh boy would you need a moderator if that happened LOL
we did that already with SDXL
(well, the leak happened that time, so that's possibly why they didn't want to do that this time)
vett the testers then and require NDAs
sdxl testing on discord, and the server didn't get shut down by discord?! lolololol
in SDXL times it went really amazingly well (filter helped)
most weird thing was a few people creating the same prompt constantly for weeks on end 😂
(with a bit of img2img)
that sounds extremely boring!
Unfortunately, the understanding of anatomy in dynamic poses is still very lacking. I hope that by the release of 8B, this will be improved at least a little.
I mean
the PR team has said that SAI will release an improved version of 2B (SD3.1, I would say)
I will still be watching the entire situation though
by the way, the PR team pretty much did their job in terms of clarification (without looking at the legal team lmao)
There can't be a proper model without proper datasets. I still don't understand the point of putting in so much effort only to lobotomize everything afterward
Surely 2b can get a whole lot better, SAI has data SDXL, cascade and 8b show that data is there. And even if not it's crazy that pixart with its tiny dataset is more dynamic than 2b.
Whether 3.1 is successful or not, I think it will probably be held back by hardline open sourcers.
like what, open source is our mission and shall NOT impose non-commercial license
trying to have her doing something cool, but I think we know what the problem is. 🙂
woah I did not say OMI is a bad thing. I mean sure, they were created and formed in the midst of the "closure" of open SAI.
yeah SD3 falls apart pretty quickly with complex stuff
meh, they'll realize the reality, that AI models are not open-source, maybe a few will choose to wait forever on "promising new initiatives", but if the model is better than alternatives and training works, most people will start using it, especially when third party tooling/extensions trickle in like control net/ipadpter style and such
8b to the rescue
not enough attention on data, the only attention block is shared with text.
This is (also) why the model behaves better when you use long and descriptive prompts.
and why 8b has the edge when it comes to coherency (sheer number of params compensates the lack of attention)
Hm. I mean there is no such things as a free lunch
one must pay for the cost of open sourcing model
I wrote a quick gpt instruction for runwayml gen 3 to make video prompts that include camera angles etc. I'm finding it makes really neat stuff with images as well. here's the prompt for this one: anime art, Camera sweeps through a neon-lit city street at night, quickly zooming in on a blonde-haired girl with fox ears in a pink sweater and jeans. She suddenly leaps into the air as a massive robotic creature smashes into the ground. The girl gracefully dodges laser beams, flipping and twirling in mid-air. She lands on a building's edge, summoning a glowing energy blade. She charges at the robot, slashing through its armor in a dazzling display of agility and power. Camera ends with a wide shot of the city's skyline as the robot explodes in a burst of light.
the model is bound to so something with all of that. 🙂
nice shoes
hm maybe I should make Comfy Classic Fennec Girl as well
change her to fox, just to piss him off 
@lavish osprey Hi lykon, are you going to release a new version of sd3? My team has done some fine-tuning work on the existing sd3 medium and improved some body structure, and I'm not sure if we need to continue.
he is just a small dev though
he didn't work on the actual training of SD3
if you were successful, please share the results, might steer our research
I can't confirm, nor deny 
ah
it's the secret 16B model
definitely not lol
Good except hands 😦 via glif
what I can say is that it's not really fair to compare Ultra to an open model, like it's not really fair to compare Dalle to an open model
because they're not models
16b is too large lmao, to the point you need A100 to run it
so what's it then? 
it's 16gb without TEs, so it can run on 3090 and 4090 too
I'll tell you when OpenAI reveals what's in Dalle

though I'm wondering how the model would come out if we maximized the requirements to run it / do inference
By the way, DALL-E in Bing and DALL-E through the API generate the final images quite differently. Today, I asked ChatGPT to do the same, but it didn't work out. 😄
with dalle on api you can do 1792x1024 in "hd". can't do that in bing.
probably different prompt enhancements
bing doesn't do the "hd" mode.
and size apparently
the hd mode fixes hands and fingers/limbs etc to more exacting degree
(which might even use different ckpts entirely)
well you can do widescreen in bing, but you won't get the hd mode.
Bing
hd mode takes longer too
Api
@lavish osprey We just fine-tuned on about 16,000 (1.6w) body-structure images, which include a small number of nude images, for about 5 epochs, and the results so far are satisfactory.
can't you publish the finetune on hf/civitai/shakker/etc?
don't know if the sd3-2b has a problem with insufficient pre-training
I wanted to train a SD3 model with my anime dataset
well, guess not on civitai, but still..
... with that funky CogVLM captions.
pretty sure you'll get a lot of instant karma being the ones who "fixed sd3 medium"
go for it
yeah
I prefer MiniCPM. Can also be instructed to output style and arrange the output in a json
never heard of that
and style alignment is pretty good
I am also going to train it on Chinese captions too, as I understand Chinese and am Chinese myself
also recognizes text (in Chinese and Japanese too)
@cobalt moon could u check dms
This is a simple demonstration
t5 will fight you on chinese captions by the way
I see a watermark text there
T5 cannot do Chinese?
anyway, current 2b can also do "standing" pretty well
main issues are with weird poses, continuity in general and hands
(my guess is due to the lack of attention)
glif
@lavish osprey If you are ready to release a fixed version with a better base model, we will not continue to fine-tune the existing model and look forward to your release of a new good human structure model
there is a chance you can diffmerge anything you do on the next iteration
worst case it's a good experiment
and you'll just have to repeat it
(or you'll release before us and you'll be regarded as the saviour of the world)
(which would be a ok by me)
I need to prompt gloves!
Thank you for your reply.
Anyone knows if connecting this nodes like this is the way its supposed to be connected?
yes
closeup of a clear glass bottle full of a forest encampment, wigwam, longhouse, tepee, grass house, wattle and daub house, chickee, forest, cliff, canyon, river,standing upright on a boulder in the sonoran desert.
Idk why but somehow the sd3+t5fp8+clips all together in 1 file is way faster than they separated
It runs at almost the same speed as sd3 withouth t5
hmm, getting better, needs a little editing
fp8 t5 is half the size of fp16 t5.
If you're having speed issues with fp16 t5 it means you're going past your total vram, meaning you're offloading and reloading all the time
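A rough sketch of the arithmetic behind the two messages above: weight size scales with bytes per parameter, so fp8 (1 byte) is half of fp16 (2 bytes), and once the total exceeds VRAM something has to be offloaded. The parameter counts below are approximate assumptions for illustration, not official figures, and the estimate ignores activations and other overhead.

```python
# Approximate parameter counts (assumptions for illustration only).
T5_XXL_ENCODER_PARAMS = 4.7e9
SD3_MEDIUM_PARAMS = 2.0e9

BYTES_PER_PARAM = {"fp16": 2, "fp8": 1}

def weights_size_gb(num_params, dtype):
    """Size of the raw weights in GB for a given storage dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

def needs_offloading(component_sizes_gb, vram_gb):
    """True if the weights alone do not fit in VRAM together."""
    return sum(component_sizes_gb) > vram_gb

t5_fp16 = weights_size_gb(T5_XXL_ENCODER_PARAMS, "fp16")
t5_fp8 = weights_size_gb(T5_XXL_ENCODER_PARAMS, "fp8")
sd3_fp16 = weights_size_gb(SD3_MEDIUM_PARAMS, "fp16")

for vram in (8, 12, 24):
    print(
        f"{vram} GB VRAM: fp16 T5 offloads={needs_offloading([t5_fp16, sd3_fp16], vram)}, "
        f"fp8 T5 offloads={needs_offloading([t5_fp8, sd3_fp16], vram)}"
    )
```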
It's not enough to give it a default horny bias, right? (I usually prefer to avoid that if possible)
There was a sale at the shoe store, two left feet sale...
Nono, the strange thing is that I used fp8T5 + sd3 + clips = 40 seconds
And I use integrated fp8 T5 + sd3 + clips in a single .safetensors = 20 seconds
And using up to 7gb of vram only
im confused because they're supposed to be the same thing
What, wait?! How, with what? (in laymans terms? :D)
This instead of that :3
did you use the safetensors with no clip?
(the 4gb one)
I use a single safetensors that said it had the t5 + clips, its 10.1 gb
Yes, but we have observed that the human body does have better integrity, and we are trying to use more training sets for the next round of fine-tuning, and if it is successful we will open it up to the community
uhm
looks like a report to mr comfy then
unless you're keeping one of those files on a slower drive
We also observed some interesting features of mmdit that differ from unet
Thank you! 🙂 And it's in comfy form!!! 😄
My GPU lol
So there is no subscription anymore, we can do youtube videos with SD3 and sell art under 1M$/year?
that's what it says, yes
perfect, small people should be happy then
I wish we have ETA for 8b, I'm in love with it
this is us
source?
SD3 has an incredible training on Halloween pumpkins and zero about human feet
api gave me censored feet, but it can do🤭
Yep. Feet was promoted to sexual organs now
only the slim ones
SD3 medium
free it
today's post says a fixed 2B version will be released in the following weeks, so next month at the latest I would say
don't set dates
in b4 next week people are mad because they pwomised
bruh if they say weeks, it would be weird to come in 4 months 😄
and then they mad because it is not finetune quality
r/StableDiffusion logic
I don't even look at it anymore
it become r/Kling at this point
( can't say much because welp r/StableDiffusion is if not, one of the largest AI subreddit )
don't set dates anyway, that's what causes rumors and then trouble
I don't think it will take months, like 6 months, for them to release 3.1
again I am not a ML researcher
yea it's like AI spam bots are spreading ads and self upvoting the posts, there is huge bias pushed up on majority
but lobotomizinator tended to be a post-training process
it is a fix nevertheless.
Unlike SD2 I think SD3 is much, much... much more complicated

we already got some controlnet mode in SD3 though
Canny and Normal if I remember correctly
controlnets showed up like a day after release. they were really easy to train
i can't hold it in any longer... 2 WEEEKS!
lol
My bodies were going so well, the limbs were the correct number, and going in the right directions... until I made her a fox lady
darn tails!
probably not, but still. someone says a date on here, and suddenly it's "set in stone" all over the net - and then people are mad when it doesn't happen. And the company never said it would happen by that date.
There is "in two weeks" and then there's the Trumpian "in a couple of weeks"
SAI never specified which model they used 😛
Laara Birohks
well - "two weeks" isn't what the announcement, or the devs, said...
sd3 is many different archs. like 2b, 4b, 8b, and a lighter one maybe too. it'll have to be like sd3 2b.1 or something
2B is parameter...
it makes more sense for SAI to do SD3.1 2B than SD3 2B.1, because it is nonsensical to put .1 after the 2 billion parameter count
yeah each parameter size will have its own versions
it'll probably be sd3.1 i.e. 2b
sd3 2b rev 1
owls are trained really well and i hav to really push the rng and weights to make my lora work here. i like this result tho
Just train a separate img2img model that can un-gaussian blur an image. Should only take a few months on a 1k cluster of a100s 🤣
it gets more foamy lookin as i crank up the weight. it is cool how it caught detail. thats the new vae more than anything i did
i fixed it my lora fixes sd3 . problem solved.
my LoRA turns most things to absolute crapolis... but I will get there one day LOLOL
Some times, it does not melt the subject... but still 😢
all is good until lawyers demand i destroy my balls
Lmao
the day stability executes order 66. destroy all models
bears don't make good clowns. dont hire them for parties
PSA I REPEAT BEAARS ARE NOT GOOD PARTY CLOWNS
ancient terracotta balls
little known fact
in the future the ball army rises again
none of it would've been possible without the pioneering spirit of the frontier balls
did your lora go from balls balls to balls looking things
or can you still prompt for balls balls
i'm experimenting with not saying balls in the captions. it was hard
but if you prompt balls the model knows balls
balls indeed!
with and without the ball word. nothing about balls in the captioning but the network figured out the ball part anyways then generalized the rest
is it still comfy-ui only?
some of the other uis support. sd.next is one
Im really curious how come a checkpoint doesnt work on some UIs but I suppose its quite complicated and probably outside my competence field
SD3 requires unleaded gas and some of the UIs can't burn unleaded gas yet
Thanks! As expected, I have no idea what that means lmaooo Im gonna do some research
the various UIs are like cars. some of them require stuff they don't have yet in order to run SD3
so sd1.5 is leaded gas? that explains a lot
heh
Licensed proper!!!
Excited to see how much more attention 3 gets now that finetuners won’t be worried about having a patreon/ donation income license conflict to pay for training.
Even more so that we are officially getting a revised model to work from when they finish the retraining.
not many, but lykon would do good to bring his dreamshaper brand to it, to encourage others to do it
morning
They fixed what was wrong with the updated license from yesterday?
removed Creator License
though the ability to revoke license and ban on SD3 output as dataset is still there
Okay, but the questions from Invoke has not been answered?
what's the question?
In this live reaction video, we read through the newly updated July license for Stable Diffusion 3. We’ll look at the license line by line, sharing our thoughts and highlighting some of the problematic terms that studios and artists need to be aware of.
00:00 Introduction
02:23 Non-Commercial & Commercial Use License Definitions
08:13 Distribut...
you expected me to watch the entire 18 min video?
yes?
The licenses are revokable
Just saying that there's no way that we will see finetunes and such on civitai any time soon. Even if someone figures out how to finetune the model
someone just did and said they finetuned it earlier
and ping Lykon of whether they are going to release a 3.1 or not
sure I guess we are going to back to the era of pre-Civitai. But you wouldn't say it get no traction at all
( for anyone who wondering about this )
Civitai isn't banning SD3 anymore? or why is it being asked by Lykon to release the finetune on civitai?
you can read my reply to the video
the finetuners have released one image generated using their finetune.
If I comprehend correctly I think Lykon thought the finetune is already done or complete
I didn't assume. There is an "if" in front of the sentence 😉
So, Kohya or Onetrainer for sd3 loras?
@torn wharf ?
too much comfyposting, let's begin a Miku wave
Context? Or does sd3 know miku? Google thinks it's a local restaurant lol
I negative prompted anime and everything 😦
I looked up her age, sd3 needs to put more clothing in her...
Better.
a bit of a plot twist

^ when you inpaint hedgehog on someone's head lmao
is this SD3? and the miku?
anatomy seems good
Everything is sd3
wow
is this a workflow or just prompt?
like is there a refiner pass with SDXL or something?
thanks
The cartoon foxes were sd3 ultra
SD3.5 in two weeks.
ah I checked the glif its running SDXL over the output
that's maybe why the anatomy is better
hey thats cheating
lets hope they didnt use one of the improper impure models 🤬
probably they did, they told me to chill out or ill get banned and i didnt prompt for ponies...
you can prompt for ponies (as long as they carry their id to check their age 😉 )
prompts for Steven Seagal as a pony

it uses a cinematic model
its a bit of a funny workflow because it upscales 4x with Siax but then downscales 2x
To train your SD3 model you first must take her to the gym.
For more AI related life hacks follow me on Myspace.
cool, evil migu. Prompt?
Hot tip, for real this time.... Don't try tackling anything bigger than 1024/1024. I tried refining with SD3 (coz it has neat details dooohhhhhh) some sdxl/sd15 concoctions at 2048 and everything around the edges was a horrible mess. 😄
webui just released SD3 support, even if just with Euler sampler
High quality. A beautiful devil Miku with long red hair and red eyes stands gracefully under a starry night sky. She has large white wings and a glowing halo above her head. She wears an elegant white gown with golden accents. The stars twinkle around her, casting a soft glow on her serene expression.
Not very refined prompt, but did the job 😄
Sd3 ultra also works.
Though here I thought 2b was doing better than it does!
yep, Cascade is a cool architecture and very well finetuned.
Unfortunately it got released in the Sora period when everyone was drooling over DiT (which is at the very cutting edge of this tech and still being researched).
anyway I personally prefer SDXL over Cascade from an usage/architecture perspective.
I admit that base XL is less pleasing than base Cascade out of the box, but the base release state is not that important once you get finetunes
and XL is easier to finetune and to use (it's not by chance that the most models are still using SDXL as their base, see Kolors)
It's definitely more complicated to use. It does fine lines like nothing else though.
it inherently performs refining
Yeah unfortunately that Kolors thing seems Linux only for now because of its dependencies.
gonna be supported in Comfy in no time, since it's mostly XL.
but it's non-commercial, so...
Awesome. Hah yeah I'm not commercial so I just get to have fun with it all.
true, but some people get angry at non-commercial models and decide it's not worth finetuning them 
Via glif
are there any good sd3 finetunes yet ?
The eyes remind me of Achmed, The Dead Terrorist.
whatever it was you were drinking at 4:30 am, don't drink it any more
Some output from Kolors without refinement with other models.
Well this is what my GPU and glif-bot were up to last night, I'll have a word with them!
when the user's away, the bot will play
I would say that the output is very good, but it's not sd3 levels of prompt adherence. Even the other open source models are better at what I asked for in these examples concerning multi subject.
those are cool, though i'm not so sure about the vomiting giant bug
Prompt for that one is : massive anthropomorphic gelatinous blob spraying koolaid onto enthusiastic patrons at a medieval tavern
In the other models, it comes out as more cutesy than it did here
How do you run it, just diffusers and the repo? I used that and it seems only the default euler scheduler gives ok results, wondering if I made mistakes
I was using this
it's a bit unstable, but almost laying on the grass ! 😄
(im sure she'll explode if going any lower)
With these settings.
You could merge Kolors with SDXL checkpoint
Maybe without the clip part.
We'll have to see once comfy nodes are released.
You could load the checkpoint with unetloader
Ah thanks, I'll see how they use it, prob the defaults
Sure, but the magic is the glm encoder. That's what elevates it above what we already have
Sure
It doesn't know groot when I tried 😂
But you bring up a great point. If it's just an sdxl checkpoint, this could potentially let us use existing sdxl checkpoints with it
I don't have Linux so I can't run it locally. Can you swap in a regular sdxl checkpoint and see if it still works?
It's trained from scratch I suppose, totally different weights I think, but can try
The right image is 50/50 merge with my sdxl ft
I think your result shows it could work. Using existing checkpoints with this
I think it means the glm encoder could work with other sdxl checkpoint? or with a certain % merge with Kolors
I will test the merge to run with the official script tomorrow
Unlike what you'd get PRETRAINING SD3 with MJ outputs, that way you'd get a mangled model that needs community fine-tunes to fix the model
that's the difference between Unet and DiT.
DiT is "language oriented" so has usually better prompt understanding capabilities. Also DiT scale better at high number of params.
Low param DiT is pretty hard, as you might have noticed. Working on it.
Model expects an added time embedding vector of length 2816, but a vector of 5632 was created. The model has an incorrect config. Please check `unet.config.time_embedding_type` and `text_encoder_2.config.projection_dim`. not exactly the same unet 😉
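A small sketch of where the two numbers in that error come from. In SDXL the added conditioning vector is six time ids, each embedded to 256 dims, concatenated with the pooled text embedding of size `projection_dim` (1280 for SDXL's second text encoder). The 4096 used for the other pipeline is an assumption inferred from the arithmetic (5632 - 1536), presumably the pooled size of the GLM encoder; it is not taken from the Kolors docs.

```python
def added_embedding_length(projection_dim, num_time_ids=6, addition_time_embed_dim=256):
    """Length of the SDXL-style added conditioning vector.

    num_time_ids covers original size, crop coords and target size (2 values each).
    """
    return num_time_ids * addition_time_embed_dim + projection_dim

sdxl_expected = added_embedding_length(projection_dim=1280)  # what an SDXL unet expects
glm_created = added_embedding_length(projection_dim=4096)    # assumed pooled size, see note above

print(sdxl_expected, glm_created)  # 2816 5632
```

So a stock SDXL unet dropped into that pipeline receives a vector twice as long as its embedding layer was built for, which is why a plain swap fails while a weight merge into the Kolors unet can still load.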
I didn't take part in the previous pretraining, but from what I could see from the logs there was no MJ dataset used there.
what you do not want to see when you open your fridge
Does that mean (also saw your comment yesterday about attention) that the updated 2b model might have a slightly different architecture?
👀 is Kolors a sd3/sdxl finetune or is it a pixart/other AIs?
PD: I saw the link you posted, nevermind 🤗
So hey,how do i use stable diffusion?🤔
sad trombone. thanks for trying it.
but XiaoZhi 50/50 merge being successful is highly encouraging for future possibilities.
Kolors is sdxl with a different noise scheduler and GLM model instead of clip
trained on MJ generated dataset
- chinese typography
comfy has a workflow out there for merging cosxl with existing sdxl models. I wonder if that workflow carrying over the clip of the original entirely, would work. but setting 0.01 on output blocks 0-4 so the most is kept from the model you're bringing in.
or don't touch clip at all, leave that out entirely.
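A toy sketch of the kind of block-weighted merge being discussed: a linear interpolation per weight, with a different ratio for keys under a chosen prefix. This is not the ComfyUI workflow itself; the key names, the plain-list "tensors", and the reading of 0.01 as "take only 1% from the other model on those blocks" are all assumptions for illustration.

```python
def merge_state_dicts(base, other, default_ratio=0.5, overrides=None):
    """Per-key linear merge: (1 - r) * base + r * other.

    overrides maps a key prefix to a ratio that replaces default_ratio
    for every key starting with that prefix.
    """
    overrides = overrides or {}
    merged = {}
    for key, base_weights in base.items():
        ratio = default_ratio
        for prefix, prefix_ratio in overrides.items():
            if key.startswith(prefix):
                ratio = prefix_ratio
                break
        merged[key] = [
            (1 - ratio) * b + ratio * o
            for b, o in zip(base_weights, other[key])
        ]
    return merged

# Hypothetical two-key "checkpoints"; real ones hold tensors, not short lists.
model_a = {"input_blocks.0.weight": [1.0, 3.0], "output_blocks.0.weight": [0.0, 0.0]}
model_b = {"input_blocks.0.weight": [3.0, 5.0], "output_blocks.0.weight": [100.0, 200.0]}

merged = merge_state_dicts(
    model_a, model_b,
    default_ratio=0.5,                    # 50/50 everywhere else
    overrides={"output_blocks.": 0.01},   # keep almost all of model_a here
)
print(merged)
```

Leaving the text encoder out entirely, as suggested above, would just mean not passing its keys into the merge at all.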
Explain pixart sigma that is a dit with 600m parameters. And there are already finetunes that fix anatomy. I'm guessing something went wrong during Sd3 2b training, or maybe did not train on enough data.
the first thing they did was make a 900m parameter version. 🙂
correct me if I'm wrong, it's 512px, right?
No it's 256-2k
pixart is 512/1024/2k. but the finetuners immediately ran into low parameter count issues and someone released a base model version with 900m
That was a member of the community
The guys that made the 900 version are also making a 1.2b but I'm not sure if they will release it
Not really, they just tested the 900m and found out it learned details easier
TBF i dont see any of what Lykon is saying in their paper https://github.com/Kwai-Kolors/Kolors/blob/master/imgs/Kolors_paper.pdf
it was a while back but they cited anatomy issues and various other things that were problematic with 600m.
it was a set of comments on discord.
Again, not really, anatomy issues in terms of hands only. But fine-tuning fixed that. However the 900m has better anatomy if finetuned the same way.
900m just learns more efficiently
look at the code.
have to wait for comfy, but i just can't imagine it works, would be fun though :p
GLM rather than T5 is interesting
For sure, I'm curious to see the results
Glm 9b vision from the cogvlm creators also it's a great llm
I'm wondering what size glm are they using
6B
Mhm, 8bit could be still very large then in terms of gb
Do you have any idea how much ?
if I had to guess it's probably this
they started from low param count and are going up (after a lot of optimization on that architecture size)
while the research team in SAI started from 8b (which worked decently) and went down.
but, again, I wasn't involved in previous sd3 research. I started after the previous team left, so it's just guesswork on my end
Ah that could be the key to the success then. But also could be careful dataset curation. They only trained with 10m images and have great results with only that. I'm guessing also using only the T5 might do something but I'm not sure.
junk in junk out
Im hungry now 😋
I'm assuming Sd3 was still trained with laion ? If that's the case it's not at all a good dataset. Captions for that are not great
there are side effects to having a smaller dataset in the way models like Pixart do
they have less subject knowledge
EXACTLY!
or I should say knowledge of less subjects
They only trained with 10m images
That's orders of magnitude lower than SD3 8b (and new 2b wip) and even inferior to previous 2b.
One of the issues of the "woman lying on grass" prompt (ONE of the issues) is that the model is trained on positions from any angle, and fails miserably when doing upside down bodies or sideways (like most models, especially if undertrained). Upright bodies are mostly fine.
One thing a finetuner could do is overfit "woman lying on grass" on upright framing. Boom, easy fix.
im gonna become a dataset dealer, psst hey you, i got some uncorrupted pristine data here
They are undertrained for sure. But still do a large range of things with only 10m
there's a market for that
ngl knowledge training is the most expensive part.
I thought you guys were in the billions in terms of images. My bad then. Considering that, SD3 does pretty great then
lol ya i was half serious, know some old photo bugs
we are, that's what I said.
even the finetuning dataset is ~1m high quality images now.
knowledge training is by far the biggest cost, with even tagging requiring multiple gpus for weeks (basically as expensive as training)
There really is, and there are services for it on the dark web 😂
human tagging!
I'm guessing there are still other problems with SD3 2B. I could not inpaint with it, when I can with Pixart Sigma. (It produces mushed faces etc.) Also img2img has problems. The denoising slider works weirdly in SD3. Could these just be undertraining problems that could be fixed with more data?
" any angle, and fails miserably when doing upside down bodies or sideways" if it is true, it won't only happen in woman lying on grass. It is a lack of caption on angle?
Best solution I found is procreate or the best model of them all photoshop …
denoising is very different for sd3 2b. I noticed 0.15 denoise is roughly equivalent to 0.35 of sdxl.
That's what I've been saying for months: get 10k people, pay them the minimum wage and let them tag for a month, 6-8h/day
Ah sorry, understood the contrary.
multiple reasons. IMO the biggest one is small attention size shared between text and data.
The one armed man did it!
I could not denoise well with it. I needed more like 0.5 to feel something changing. Also when it does it destroys the picture.
AI captioning is not that good, it can describe the image but not which person/object is in the image
did you try turning down the model shift?
human labelling in general is way worse than people think
there have been a lot of studies on this
the humans often lose to something simple like BERT
for me, turning model shift back to 1.00 just for the highres fix part helped it to not morph objects
Oof, maybe that's with people who have English as a second language
I tried even removing it.
interesting
humans suffer from ruts - they tend to only be able to think of one or two words for any given thing, and you want a lot more than that for a label. it's not an apple, it's a green apple, granny smith apple, round fruit (etc)
I just feel that SD3 behaves very differently from other models. That's why people are having a hard time fixing it.
well it is SGM scheduling. for me, sgm scheduling used to cause the most distortion in highres fix (~0.35 denoising or lower?) compared to stuff like exponential
(sdxl)
mmm pumpkin bread
different neural net. they're having a hard time because they're refusing to learn to talk to it
It is a dit. It is very different from other dits
MMDiT, but the difference is FLOW MATCHING
It's not like this is exact science. This is still very new
so it may be similar to Lumina compared to Pixart
yes, and it doesn't think the same way as other models. you start at the ground, learn how it thinks, learn how to talk to it, then you're fine.
Pixart is compatible with SDE samplers and all current schedulers, SD3 needs ODE(?) samplers and a specific scheduler with optional model shift
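For reference, the "model shift" being discussed is, as far as I know, the timestep shift described for SD3, which pushes each noise level toward the noisy end. A minimal sketch, assuming the commonly used form sigma' = shift * sigma / (1 + (shift - 1) * sigma) applied to a plain linear schedule; `shift_sigma` and `shifted_schedule` are illustrative names, not any UI's actual API:

```python
def shift_sigma(sigma: float, shift: float) -> float:
    """Remap a noise level in [0, 1]; shift=1.0 leaves it unchanged."""
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)


def shifted_schedule(steps: int, shift: float) -> list[float]:
    """Linear noise levels from 1.0 down to 0.0, each passed through the shift."""
    base = [1.0 - i / steps for i in range(steps + 1)]
    return [shift_sigma(s, shift) for s in base]


print([round(s, 3) for s in shifted_schedule(4, 1.0)])  # [1.0, 0.75, 0.5, 0.25, 0.0]
print([round(s, 3) for s in shifted_schedule(4, 3.0)])  # [1.0, 0.9, 0.75, 0.5, 0.0]
```

With shift at 1.0 the schedule is untouched, which is what "turning model shift back to 1.00" amounts to; higher values spend more of the steps at high noise. Under this formula a raw level of 0.15 lands near 0.35 at shift 3.0, though whether that is what's behind the 0.15 vs 0.35 denoise observation depends on how a given UI maps denoise to noise levels.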
swan made of clouds
Not a skill issue sorry. The model has problems with its training and people have to admit it. Otherwise it is never going to be fixed.
SD3 2B does have its shortcomings compared to 8B
It can be both
definitely a skill issue. sure the model has issues. ALL the AI image generator models out there, stable to dalle to midjourney, have issues. you learn how to talk to the AI, you can avoid those issues
you have to learn how to respect the model
i dont want to become friends with my generative ai
now it's sad
Yes, but the model needs more data, there are things that it doesn't know
Pixart is compatible with all samplers and schedulers, including Align Your Steps. And SD3 is compatible with only one. This is why I'm saying something must have gone wrong during training with SD3. Pixart authors did not design their model to be compatible with anything out there. It just happened
Learning how to manipulate someone isn't the same as being their friend. Just have a Machiavellian relationship with your model
it has to be the flow matching, training shouldn't mess up compatibility with samplers 🤔
but I could be wrong!
sampler is quite a broad term really
sure. that doesn't mean it's not a user skill issue, however. i've worked with it enough now i can get it to give me what I want without much work
I don't think so but I might be wrong. We will see if any experienced finetuners are able to fix it.
I feel you. It seems more like an architectural problem than one with fine tuning
sampler is not really a neat category of one type of method
there are samplers doing completely different things
there definitely were problems with the dataset/training (anatomy problems and weird random knowledge gaps for example), but samplers should be about the arch
if they cant its also a skill issue (they didnt know how the model brain thinks or respect it)
one thing that works really well is to feed a different prompt into each of the encoders. use their strengths.
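In diffusers, `StableDiffusion3Pipeline` takes `prompt`, `prompt_2`, and `prompt_3`, which go to the two CLIP encoders and to T5 respectively. A minimal sketch of splitting a prompt that way; the helper name, the example prompts, and the short-tags-for-CLIP / full-sentence-for-T5 split are assumptions for illustration, not something stated in the chat:

```python
def split_prompts(clip_l: str, clip_g: str, t5: str) -> dict:
    """Build per-encoder prompt arguments for an SD3 pipeline call."""
    return {"prompt": clip_l, "prompt_2": clip_g, "prompt_3": t5}


kwargs = split_prompts(
    clip_l="swan, clouds, sky",
    clip_g="a swan made of clouds, soft light",
    t5="A swan whose body is formed entirely from white clouds, drifting across a blue sky.",
)
print(kwargs)

# With diffusers installed and the weights downloaded, this would be passed as:
#   from diffusers import StableDiffusion3Pipeline
#   pipe = StableDiffusion3Pipeline.from_pretrained(
#       "stabilityai/stable-diffusion-3-medium-diffusers")
#   image = pipe(**kwargs).images[0]
```

Other front ends expose the same idea through their own nodes or fields, so the names here only apply to the diffusers route.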
Yeah. Recognizing where one problem lies isn't disregarding other problems
Did you test any open source model out there and compare it with Sd3? If not then you are not capable of seeing the problems since you have nothing to compare to
U have to train ur generator, feed it information, learn prompt engineering… Advance ur skills, let AI teach u
but damn is the 16ch VAE more noticeable on 2B than on 8B
hope its just cause 8B is undertrained
you have to caress the model,lick it
