#Implementing Deepfloyd Support

proven trench
#

Heh, I figured I should open a thread on this, though it's likely been discussed elsewhere already.

Is there any knowledge of how difficult it would be to add DeepFloyd support to InvokeAI, either currently or with the upcoming Nodes setup? I'm not really in the loop on how different it is from a typical Stable Diffusion setup, but I have seen the early results from it and it's definitely a remarkable advancement.

I'd be curious what the prospects are for having it supported in Invoke. 🙂

Here's a good video covering just what DF has to offer!

https://youtu.be/4Zkipll5Rjc

DeepFloyd IF is a state-of-the-art text-to-image model that can generate high-quality images based on text prompts. It was introduced by StabilityAI and its multimodal AI research lab DeepFloyd. The model consists of a frozen text encoder based on the T5 transformer and three cascaded pixel diffusion modules: a base model that generates 64x64 px...

wide parrot
#

Just a note: if you look at the specs, it needs 16GB of VRAM MINIMUM (24GB for full function), which means most people can't run it. It's also yet another slap in the face for those of us forced to use Nvidia (which has a bad VRAM-to-cost ratio) and not AMD.
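
For a rough sense of where a requirement like this comes from, here is a back-of-the-envelope sketch that counts weight memory only. The 4.3 billion parameter figure is an assumption used for illustration (roughly the size quoted for the largest first-stage model); the text encoder, the later stages, and activations all add to it, so read the result as a lower bound and not as a measurement.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory needed just to hold the weights, in GiB (fp16 = 2 bytes per parameter)."""
    return num_params * bytes_per_param / 2**30


# Assumed parameter count, for illustration only.
ASSUMED_STAGE1_PARAMS = 4.3e9

fp16_gib = weight_memory_gib(ASSUMED_STAGE1_PARAMS, bytes_per_param=2)
fp32_gib = weight_memory_gib(ASSUMED_STAGE1_PARAMS, bytes_per_param=4)
print(f"fp16 weights: {fp16_gib:.1f} GiB, fp32 weights: {fp32_gib:.1f} GiB")
```

Halving the precision halves the weight footprint, which is why fp16 variants and offloading matter so much on consumer cards.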

ancient pawn
#

On top of that Stability isn’t letting others fully use it yet - licensing is for research only

#

Bit disappointing

shut moat
#

I’m guessing the answer is “let’s finish the core stuff first”, but is there anything specific planned around nodes extensibility, in the direction of being able to, like, clone a “my-node” repo into a folder and have the node just show up?

#

(related to this topic since deepfloyd is probably the best example of something y’all wouldn’t want to merge in at this point for the licensing reason)
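
To make the "clone a repo into a folder and the node just shows up" idea concrete, here is a minimal sketch of folder-based discovery using only the standard library. It is not how InvokeAI loads nodes; the function name `load_node_packages` and the `community_nodes_` module prefix are hypothetical, and a real loader would also need to register whatever node classes each package defines.

```python
import importlib.util
import sys
from pathlib import Path
from types import ModuleType


def load_node_packages(nodes_dir: Path) -> dict[str, ModuleType]:
    """Import every sub-folder of nodes_dir that contains an __init__.py."""
    loaded: dict[str, ModuleType] = {}
    for pkg_dir in sorted(p for p in nodes_dir.iterdir() if p.is_dir()):
        init_file = pkg_dir / "__init__.py"
        if not init_file.exists():
            continue  # not a Python package, so skip it
        # Hypothetical naming scheme; dashes are not valid in module names.
        module_name = f"community_nodes_{pkg_dir.name.replace('-', '_')}"
        spec = importlib.util.spec_from_file_location(
            module_name, init_file, submodule_search_locations=[str(pkg_dir)]
        )
        if spec is None or spec.loader is None:
            continue
        module = importlib.util.module_from_spec(spec)
        sys.modules[module_name] = module  # lets the package import its own submodules
        spec.loader.exec_module(module)
        loaded[pkg_dir.name] = module
    return loaded
```

With something like this, a cloned "my-node" folder would be picked up on the next start without touching the main codebase, which is also what keeps research-licensed code out of the core repo.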

tardy parcel
solid valve
#

It would be cool if you could create your own node workflow, then make it show up in the tabs along the left and look like the regular text2img/img2img interfaces...

shut moat
tardy parcel
shut moat
#

Awesome

proven trench
#

To be fair, I heard a rumor that somehow part of DF will eventually replace OpenClip in the Stable Diffusion models? I have no clue if that's true, but I heard that DF and SD may not remain separate eventually

ancient pawn
#

But, fair, could just end up being used to research spaghetti text.

shut moat
#

So I poked around at this since I’ve wanted to hack more on invoke and play with Deepfloyd. I’ve got it working with basically a node per stage of the pipeline which is basically t2l -> l2l -> l2i.

The 3rd stage is just using an existing stable diffusion upscaler, so I probably can make that work without a new node.

For the other two, I’m happy with having them be their own nodes (especially since they probably aren’t something to have in the main codebase?), but I’m curious what the “right” nodes thing would be. I started thinking about trying to genericize latent.py, ModelManager, and probably a handful of other places which seemed like the right direction?
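
The node-per-stage layout described here can be sketched as three small functions chained together. Everything below is a stand-in: the function names mirror the t2l / l2l / l2i shorthand from the thread, `StageResult` is a hypothetical placeholder for a real tensor, and the 4x upscale factors are assumptions. Only the 64x64 base resolution comes from the model description quoted earlier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageResult:
    """Placeholder for a real tensor; tracks only resolution and the producing stage."""
    width: int
    height: int
    produced_by: str


def text_to_latents(prompt: str) -> StageResult:
    # Stage 1 (t2l): the base model turns the prompt into a 64x64 result.
    return StageResult(64, 64, "t2l")


def latents_to_latents(x: StageResult, scale: int = 4) -> StageResult:
    # Stage 2 (l2l): a super-resolution pass. The 4x factor is an assumption.
    return StageResult(x.width * scale, x.height * scale, "l2l")


def latents_to_image(x: StageResult, scale: int = 4) -> StageResult:
    # Stage 3 (l2i): a final upscale. The 4x factor is an assumption.
    return StageResult(x.width * scale, x.height * scale, "l2i")


def run_pipeline(prompt: str) -> StageResult:
    """Chain the three stages in order: t2l -> l2l -> l2i."""
    return latents_to_image(latents_to_latents(text_to_latents(prompt)))


final = run_pipeline("a sign that says hello")
print(final)
```

Keeping each stage as its own unit means any one of them can be swapped (for example, replacing the last stage with an existing upscaler) without touching the others.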

ancient pawn
#

I think you did the right approach

#

I.e., creating dupes of t2l or l2l with deepfloyd. At least as I understand it, we’d want to treat different pipelines as different nodes, not try to merge it all into one massive node

#

As far as the SD upscaler, I believe that hasn’t been incorporated into Invoke, so adding it as its own node would also probably be the thing to do

shut moat
wide parrot
#

PromptEmbeds has "latent1" and "latent2" outputs. Based on where they go, I assume those are pos and neg, but they need to be labeled as such. Also, latent outputs are going into embed inputs with the same color (these could be correct, but if so they need to share labels or it looks like a mistype).

shut moat
#

PromptEmbeds has "latent1" and "latent2" outputs.
Yeah, I was playing around with something generic there and didn't fix the label

tardy parcel
wide parrot
#

Also for much later feature enhancement: multi-model sequential stage latent node.

sweet silo
#

To me the embeds look like conditioning, but I'm not sure what they really represent
And the first prompt embeds node looks like the compel node

#

Also, to test it myself I'd need an "Enable sequential offload" flag, unfortunately 😄
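
A flag like this could be wired up roughly as follows. This is a sketch, assuming the pipeline object exposes the `enable_sequential_cpu_offload()` and `enable_model_cpu_offload()` methods that diffusers pipelines provide; the helper name `apply_offload` and the `sequential_offload` parameter are hypothetical.

```python
def apply_offload(pipe, sequential_offload: bool):
    """Pick an offload strategy on a diffusers-style pipeline object."""
    if sequential_offload:
        # Moves submodules to the GPU one at a time: lowest VRAM use, slowest.
        pipe.enable_sequential_cpu_offload()
    else:
        # Moves whole models (text encoder, UNet, ...) on and off the GPU.
        pipe.enable_model_cpu_offload()
    return pipe
```

Sequential offload trades a lot of speed for memory, so exposing it as an opt-in flag and leaving it off by default seems like the safer choice for people with smaller cards.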

shut moat
sweet silo
ancient pawn
#

I think his point is that compel may be more of an SD implementation vs being workable with Deepfloyd. @scarlet skiff have you thought or been asked about deepfloyd + compel yet?

scarlet skiff
#

i mean, assuming that deepfloyd also uses the TE’s last hidden layer, which it may not do

shut moat
#

That’s awesome, I’ll try it out tonight.

shut moat
#

Looks like there’s a few things to peel through to get it to work. The compel node uses model manager which expects SD. Bypassing that and I hit a couple other things. I’ll get a better look when I have more time.

tardy parcel
#

Model manager is currently being rewritten to support individual models - May be easier once that is released

shut moat
shut moat
#

I have to RMA my graphics card, so my progress on anything for this will have to pause 😭

Checked in code here - https://github.com/aluhrs13/iai_node_deepfloyd - should work for folks if you just drop it into your invocations folder and follow the steps to agree to the DeepFloyd terms and stuff from their blog post.

ancient pawn
#

Good luck on the GPU - 🪦

shut moat
#

Also I looked through the model manager change in more detail. Probably won’t even need new nodes for DeepFloyd, might just need tweaks to model manager, t2l, l2i, and upscaling.

ancient pawn
#

@sweet silo could probably opine more on that, but i do think as much as possible we want to keep things from getting bloated with all the conditional logic. That's just my "design/architecture" hat being worn

shut moat
#

Bumping this again - @sweet silo do you have a feeling if DeepFloyd will fairly easily drop in to the new model manager?

sweet silo
#

my opinion about deepfloyd itself is that it's something that's hard to run 😄

#

with such high requirements

#

but now it's possible, at least