#ControlNet & T2I-Adapter Support
This looks like a promising template for adding ControlNet support on the backend:
https://github.com/haofanwang/ControlNet-for-Diffusers
So I hacked up a depth2image node that looks pretty similar to the code you've linked:
https://gist.github.com/Kyle0654/57c337f7c005662b98a53f4e1ed7a960
But I'm not sure what the "correct" way to do this is. @graceful bronze indicated that this is probably not the right way to do this (IIRC because the user-submitted pipelines may be untested/unsafe). I know that the models need actual model management as well (so we can cache them on CPU, move them to/from GPU, etc.). Not sure how we do all that though.
I do like how simple the code ends up being though, so I'm hoping we can get somewhere in the middle, so it's easy to add new features 🙂
ControlNet support PR from takuma104 just got merged into diffusers: https://github.com/huggingface/diffusers/pull/2407
ControlNet by @lllyasviel is a neural network structure to control diffusion models by adding extra conditions. Discussed in #2331.
Usage Example
Document: https://huggingface.co/docs/diffusers/m...
That looks really easy to use. Bet you could take that sample code near the end and prototype a node for it quickly.
And the haofanwang diffusers-based ControlNet repo referenced above (https://github.com/haofanwang/ControlNet-for-Diffusers) is being redone to reflect the takuma104 merge.
(though arguably the canny part would be a separate node)
Yeah, I'm liking how straightforward the diffusers ControlNet code is, at least at the high level API of models and pipelines. Which I think is the level at which it'd be integrated into InvokeAI?
The canny ControlNet example given in the diffusers PR has now been posted as a diffusers API example at https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_diffusion/controlnet
Canny edge detection is treated as a pre-processing step in both the diffusers example code and the lllyasviel/ControlNet repo. Skimming the lllyasviel repo usage examples, it looks like for controlled inference the control images are treated the same way for every ControlNet model -- first transformed by a preprocessor, then run through the identical block of Stable Diffusion code. For instance, diffing the lllyasviel repo examples gradio_canny2image.py and gradio_pose2image.py, essentially the only difference is the control image preprocessing.
So having a node for each preprocessing method but only one node for actually applying controlnet inference makes sense to me.
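To make the "many preprocessor nodes, one ControlNet node" split concrete, here is a minimal sketch. The registry, the `register_preprocessor` decorator, `controlnet_step`, and the toy `threshold` preprocessor are all hypothetical stand-ins (a real canny, HED, or pose detector would take the threshold's place), and `apply_controlnet` is whatever single function runs ControlNet inference.

```python
from typing import Callable, Dict

import numpy as np

# Every preprocessor maps an input image (H x W x 3, uint8) to a control image.
Preprocessor = Callable[[np.ndarray], np.ndarray]
PREPROCESSORS: Dict[str, Preprocessor] = {}


def register_preprocessor(name: str):
    """Decorator that adds a preprocessor to the registry under `name`."""
    def decorator(fn: Preprocessor) -> Preprocessor:
        PREPROCESSORS[name] = fn
        return fn
    return decorator


@register_preprocessor("passthrough")
def passthrough(image: np.ndarray) -> np.ndarray:
    # for control images prepared elsewhere (e.g. an externally rendered pose)
    return image


@register_preprocessor("threshold")
def threshold(image: np.ndarray) -> np.ndarray:
    # toy stand-in for a real detector: binarize mean intensity at 128
    gray = image.mean(axis=-1, keepdims=True)
    binary = np.where(gray >= 128, 255, 0).astype(np.uint8)
    return np.repeat(binary, 3, axis=-1)


def controlnet_step(image: np.ndarray, preprocessor: str,
                    apply_controlnet: Callable[[np.ndarray], object]) -> object:
    """One preprocessing node feeding the single shared ControlNet node."""
    control_image = PREPROCESSORS[preprocessor](image)
    return apply_controlnet(control_image)
```

Adding a new preprocessing method then means registering one more function; the ControlNet side stays the same.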
I'm currently using the ComfyUI node-based interface for ControlNet, and it works very well and makes a lot of sense. It doesn't come with preprocessing nodes, though, so I had to implement a canny node myself.
I'm pretty sure it has been discussed here, right?
Really useful to see this! Can you post a higher rez screenshot?
So for the version without a canny node, the input control image had already been run through a canny edge transform?
I hope this is good enough. It's the same setup as here:
https://comfyanonymous.github.io/ComfyUI_examples/controlnet/
except for the canny node.
And yes in case you don't have the node, you need to precompute a canny image and upload it.
All the filters (Canny, HED, Normal etc.) are already implemented in the official ControlNet gradio repo.
And canny is just a one liner in opencv anyway.
I'm very new here; is there a discussion on what node interface might be used in Invoke?
Yep, a very long one! https://discord.com/channels/1020123559063990373/1074572596663816242
I'm new too, so still catching up on the node discussion. And the node backend code/API just got merged into main.
Thanks so much!
BTW, if you already have your invokeai environment set up, you can just clone ComfyUI; it works out of the box if you want to try it.
Oops, the previous link I posted is for the more recent node discussion, focused on backend node integration into the existing codebase.
Here's a longer thread, "Node Based UI & Workflow", that includes more ideas on node UI (started back in September!):
https://discord.com/channels/1020123559063990373/1022847959404122182
And another good Invoke node discussion on developer forums, "Node Use Cases: What do you want to do with nodes?"
https://discord.com/channels/1020123559063990373/1049107548264992779
Side note: is there any way to keep threads in our Discord developer-forums from disappearing? If there are no new posts, these threads disappear from my view (they no longer show up in the left side panel, in the subtree under the developer-forums channel). The only ways I've found to get back to them are searching for terms relevant to the thread and wading through the search results until I find one from the thread I'm interested in, or copying the thread link outside of Discord. Both seem pretty clunky. I'm pretty new to Discord; is there an easy way that I'm missing?
I think your input on node architecture and UI based on your experience working with ComfyUI would be very welcome in the InvokeAI node conversations!
Similar to ControlNet, there is also T2I-Adapter (https://github.com/TencentARC/T2I-Adapter), and cloneofsimo (who made the first implementation of LoRA for Stable Diffusion) made a port to diffusers (https://github.com/cloneofsimo/t2i-adapter-diffusers).
I dug a little further and found an open issue on diffusers (https://github.com/huggingface/diffusers/issues/2390); apparently a PR will be opened this week.
Hrm, not going to be as trivial to make a ControlNet node, I guess. Something about a kernel not implemented for Half. But the code at least tries to run.
Looks like the diffusers PR for adding T2I-Adapter support went in last night: https://github.com/huggingface/diffusers/pull/2555
ControlNet and T2I-Adapter seem similar enough that I'm adding T2I-Adapter to this discussion...
I've got the diffusers ControlNet example at https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_diffusion/controlnet working on the same virtual env that I'm using for InvokeAI (mainly need to ensure that diffusers >= 0.14.0 is installed).
Initially the example wasn't working, I was getting a similar error to yours:
"LayerNormKernelImpl" not implemented for 'Half'
Turned out that the example was trying to run on my CPU, which didn't like fp16. I first tried replacing fp16 with fp32, which allowed the example to work on CPU, but very slowly (~0.12 it/s). Then I put fp16 back and pushed to GPU -- just changed this pipeline call:
pipe = StableDiffusionControlNetPipeline.from_pretrained(....)
to
pipe = StableDiffusionControlNetPipeline.from_pretrained(....).to("cuda")
and it's working well, getting ~10it/s
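A minimal sketch of the fix above, assuming diffusers >= 0.14.0 and torch are installed: pick fp16 only when a GPU is present and move the pipeline there explicitly. `pick_device_and_dtype` and `build_canny_pipeline` are hypothetical helper names, and the base model repo ID is left to the caller.

```python
def pick_device_and_dtype(cuda_available: bool) -> tuple:
    """fp16 on GPU; fall back to fp32 on CPU, where some fp16 kernels
    (e.g. the LayerNormKernelImpl error above) were not implemented."""
    if cuda_available:
        return "cuda", "float16"
    return "cpu", "float32"


def build_canny_pipeline(base_model_repo: str,
                         controlnet_repo: str = "lllyasviel/sd-controlnet-canny"):
    # imported lazily so the helper above works without torch/diffusers
    import torch
    from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

    device, dtype_name = pick_device_and_dtype(torch.cuda.is_available())
    dtype = getattr(torch, dtype_name)
    controlnet = ControlNetModel.from_pretrained(controlnet_repo, torch_dtype=dtype)
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        base_model_repo, controlnet=controlnet, torch_dtype=dtype
    )
    # from_pretrained loads onto the CPU; without .to(), fp16 runs there and fails
    return pipe.to(device)
```

Usage would be something like `pipe = build_canny_pipeline(base_repo)` followed by `pipe("old man", image=canny_image).images[0]`, where `base_repo` is a Stable Diffusion 1.5 diffusers repo ID and `canny_image` is a PIL image; in the diffusers version discussed here the control image goes in the `image` argument.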
@cerulean tangle are you working on nodes for ControlNet? Anyone else? If there's nobody already working on it, I was thinking of taking a swing at it this weekend... I'm only looking at the backend, from diffusers integration up to wrapping as a node.
Please have at it. I'd actually prefer someone else try it and get some feedback on nodes and the core beneath them.
You might want to build on top of @mortal elk's PR though, unless you're willing to replace your code later.
Understood -- I'm following the PR discussion at https://github.com/invoke-ai/InvokeAI/pull/2902 to figure out when to switch over.
Remove node dependencies on generate.py
This is a draft PR in which I am replacing generate.py with a cleaner, more structured interface to the underlying image generation routines. The basic code ...
I'm going to switch over to refactor/nodes-on-generator branch and further branch from there.
I think the PR is close to being safe to merge into main. It's actually been approved, but I think @cerulean tangle should make the decision when to hit the merge button.
Thanks for the update Lincoln!
In other news, diffusers PR to support Multi-Controlnet looks about ready to go:
https://github.com/huggingface/diffusers/pull/2627
Looks like a very simple API modification at the level we need to make calls to.
MultiControlNet support is now merged into diffusers main.
I've tested passing multiple ControlNet models to the same diffusers StableDiffusionControlNetPipeline instance and seems to be working well.
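As far as I can tell from the diffusers MultiControlNet support, the pipeline is built with a list of ControlNetModels (`controlnet=[...]`) and then called with one control image per model plus an optional per-model `controlnet_conditioning_scale`. The hypothetical helper below only builds those call arguments and checks that the lengths line up; it's a sketch, not InvokeAI code.

```python
from typing import Any, Dict, Optional, Sequence


def multi_controlnet_kwargs(num_controlnets: int,
                            control_images: Sequence[Any],
                            scales: Optional[Sequence[float]] = None) -> Dict[str, Any]:
    """Build call kwargs for a pipeline created with controlnet=[cn_a, cn_b, ...].

    One control image (and one conditioning scale) is needed per ControlNet,
    in the same order the models were passed in.
    """
    if len(control_images) != num_controlnets:
        raise ValueError(
            f"expected {num_controlnets} control images, got {len(control_images)}")
    if scales is None:
        scales = [1.0] * num_controlnets  # equal weight for every ControlNet
    if len(scales) != num_controlnets:
        raise ValueError(f"expected {num_controlnets} scales, got {len(scales)}")
    return {
        "image": list(control_images),
        "controlnet_conditioning_scale": [float(s) for s in scales],
    }
```

For a pipeline created with `controlnet=[canny_controlnet, pose_controlnet]`, a call might look like `pipe("old man", **multi_controlnet_kwargs(2, [canny_image, pose_image], [1.0, 0.8])).images[0]`.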
Hello! Are there plans to support ControlNets from the main UI or will it be a Nodes feature?
I've been working on adding InvokeAI backend support for ControlNet, based off the recent Generator refactor. Very close to having a working barebones version, but still hitting some Tensor mismatch errors. If I'm still having problems by end of this weekend, I'm hoping I can get some core dev team support for a little pair programming to figure this out.
Here's my current test script to give you a sense of how I'm proposing integration with invokeai.backend.generator
```python
import os

import cv2
import torch
from PIL import Image
from diffusers.models.controlnet import ControlNetModel

from invokeai.backend.generator import Txt2Img
from invokeai.backend.model_management import ModelManager

# load a pre-computed canny edge image; OpenCV reads BGR, so convert to RGB for PIL
canny_image = cv2.imread("/test_images/input/canny_vermeer.png")
canny_image = cv2.cvtColor(canny_image, cv2.COLOR_BGR2RGB)
canny_image = Image.fromarray(canny_image)

# using invokeai model management for base model
model_config_path = os.path.join(os.getcwd(), "..", "configs", "models.yaml")
model_manager = ModelManager(model_config_path)
base_model = model_manager.get_model('stable-diffusion-1.5')

# for now using diffusers model.from_pretrained to load ControlNetModel
canny_controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
).to("cuda")

# all default params except control_model
txt2img_canny = Txt2Img(base_model, control_model=canny_controlnet)

# all default params except for control_image
outputs = txt2img_canny.generate(prompt="old man", control_image=canny_image)
# generate() returns an iterator of results; take the first one
generate_output = next(outputs)
out_image = generate_output.image
out_image.save("/test_images/output/canny_controlnet_testout.png")
```
I just submitted a draft PR for backend ControlNet support!
Draft, but working: https://github.com/invoke-ai/InvokeAI/pull/3085
original "Girl With A Pearl Earring" by Vermeer
Left: Canny edge detection applied to Vermeer image
Right: Using draft PR for ControlNet support, with prompt = 'old man' and control_image = canny_vermeer.png
Looks like I need to fill out the cv nodes 🙂
Have you checked out the branch where I've been trying to convert nodes to use latents? I was trying to think through how ControlNet would be added (and if there are other things like it in the future, how that might work): https://github.com/invoke-ai/InvokeAI/blob/kyle0654/node_latents/invokeai/app/invocations/latent.py
Feels like at least control_image should come in as a parameter. But I don't know if we want to just keep adding things onto a "text to image" node, or if "text to image with controlnet" should be its own node?
I'd like to see one image generation node that takes a prompt, but has additional optional inputs for controlnet, initial image (for img2img), etc. (each with an associated strength, of course).
Yah. I'm worried about how much that grows over time. And it kind of just puts us right back where we started with prompt2image
Does it make sense to use both an init image and a controlnet for one image? What about multiple controlnets? I don't know, but it would be fun to be able to try it.
It might also be a good idea to have alternative simple prompt node, which does the same thing without all the extra inputs.
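One way to picture the single generation node with optional inputs is as a plain data structure. Everything here (`GenerateInputs`, `ControlInput`, the field names, the 0.75 default strength) is hypothetical and not InvokeAI's node API; it only shows that an init image and several ControlNets, each with its own strength, can coexist in one set of inputs.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ControlInput:
    model: str          # e.g. "lllyasviel/sd-controlnet-canny"
    image_name: str     # control image already in the gallery
    weight: float = 1.0


@dataclass
class GenerateInputs:
    """Hypothetical single generation node: only `prompt` is required."""
    prompt: str
    init_image: Optional[str] = None
    init_strength: float = 0.75  # img2img strength, only used with init_image
    controls: List[ControlInput] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not 0.0 <= self.init_strength <= 1.0:
            raise ValueError("init_strength must be between 0 and 1")

    def mode(self) -> str:
        """Describe which generation path these inputs select."""
        base = "img2img" if self.init_image else "txt2img"
        if self.controls:
            return f"{base}+controlnet x{len(self.controls)}"
        return base
```

The simple prompt node suggested above would just be this with every optional field left at its default.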
@uncut sphinx great! Does your current implementation also support controlnet-pose (pose transfer)?
Yes, I've tested with openpose input. Also, this implementation should work with any of the ControlNet models hosted at https://huggingface.co/lllyasviel/ControlNet, though I haven't tested them all yet.
Best to avoid using the original 5.71GB models. Identical results can be achieved with the 723MB versions. https://huggingface.co/webui/ControlNet-modules-safetensors/tree/main
Ah, I was wrong about where the controlnet models are coming from.
For this first draft of InvokeAI ControlNet support, I'm relying on diffusers loading, for example ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny"). Which is NOT the same as https://huggingface.co/lllyasviel/ControlNet; rather, it's https://huggingface.co/lllyasviel/sd-controlnet-canny. And the file it's downloading and caching on my local drive is diffusion_pytorch_model.safetensors, though it renames it as a long hex number -- not sure if that's for checksum or git versioning purposes? File size is 1.45 GB, so maybe it's fp32 instead of the 723 MB fp16 file at https://huggingface.co/webui/ControlNet-modules-safetensors? So not as compact, but definitely smaller than the 5.71 GB versions at https://huggingface.co/lllyasviel/ControlNet
@uncut sphinx When you use from_pretrained() on any of the HuggingFace models it will build a cached version of the model in which the data object filenames are replaced with hex numbers that are pointed to by symlinks. A few things to be aware of:
- When you download the model, you have the option of passing a `revision` parameter to `from_pretrained()`. Most (but not all) models include a revision of `fp16`, which lets you get the half-precision version of the model. If at different times you request both the `fp16` and `fp32` versions, they will exist side-by-side in the cached model directory in separate directories under `snapshots/`. Best not to rely in any way on the structure of the cached directory, because it's been known to change. You can get the available model versions by going to the huggingface models page and looking at the branches in the "Files and versions" tab. Some models have EMA vs non-EMA versions, for example.
- HuggingFace by default puts its cached models into the `.cache/huggingface/hub` directory in your home directory. A long time ago we made the decision to have InvokeAI move the cache into the `~/invokeai/models/hub` directory so that all the models would be in one place and users had more visibility into what was eating up gigabytes of their disk space. To be consistent with this convention, you need to pass `cache_dir=global_cache_dir("hub")` as one of the parameters to `from_pretrained`. `global_cache_dir()` is importable from `invokeai.backend.globals`.
- When you do `from_pretrained()` with a repo_id, the HuggingFace client code always pings the server to see if there is an update to the cached version. This results in an annoying "Downloading 100%" message. You can avoid this by specifying `local_files_only=True`, but this is problematic. If you figure out how to quench the message, let me know.
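Putting those three points together, a minimal sketch of loading a ControlNet into InvokeAI's cache. `controlnet_load_kwargs` and `load_controlnet` are hypothetical helpers; `cache_dir`, `revision`, `local_files_only`, and `torch_dtype` are standard `from_pretrained()` arguments, and `global_cache_dir` is the InvokeAI helper from the `invokeai.backend.globals` module. Passing `revision="fp16"` only works for repos that actually have an fp16 branch.

```python
import os
from typing import Any, Dict, Optional, Union


def controlnet_load_kwargs(cache_dir: Union[str, os.PathLike],
                           revision: Optional[str] = None,
                           local_files_only: bool = False) -> Dict[str, Any]:
    """Keyword arguments for ControlNetModel.from_pretrained() that keep the
    download inside InvokeAI's model cache instead of HuggingFace's default."""
    kwargs: Dict[str, Any] = {"cache_dir": cache_dir}
    if revision is not None:
        # e.g. "fp16"; only valid if the repo actually has that branch
        kwargs["revision"] = revision
    if local_files_only:
        # skips the update check, but fails if the model was never downloaded
        kwargs["local_files_only"] = True
    return kwargs


def load_controlnet(repo_id: str, revision: Optional[str] = None):
    # imported lazily so controlnet_load_kwargs() has no heavy dependencies
    import torch
    from diffusers import ControlNetModel
    from invokeai.backend.globals import global_cache_dir

    return ControlNetModel.from_pretrained(
        repo_id,
        torch_dtype=torch.float16,
        **controlnet_load_kwargs(global_cache_dir("hub"), revision=revision),
    )
```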
Thanks for summarizing the behavior of from_pretrained(). I wasn't aware of the cache_dir param, or the global_cache_dir() method in globals. I'll modify usage to consolidate ControlNet models in the InvokeAI cache.
I've added MultiControlNet support to the backend ControlNet PR: https://github.com/invoke-ai/InvokeAI/pull/3085
This is great. I'll give it a whirl!
I just closed PR #3085 and opened new ControlNet PR : https://github.com/invoke-ai/InvokeAI/pull/3156
Mainly to escape rebase loop hell with old PR. The new PR rebases nicely.
So, I've now downloaded 36 different ControlNet models. Any others I should be testing? (Please I hope not)
##############################################
lllyasviel sd v1.5, ControlNet v1.0 models
##############################################
"lllyasviel/sd-controlnet-canny",
"lllyasviel/sd-controlnet-depth",
"lllyasviel/sd-controlnet-hed",
"lllyasviel/sd-controlnet-seg",
"lllyasviel/sd-controlnet-openpose",
"lllyasviel/sd-controlnet-scribble",
"lllyasviel/sd-controlnet-normal",
"lllyasviel/sd-controlnet-mlsd",
#############################################
lllyasviel sd v1.5, ControlNet v1.1 models
#############################################
"lllyasviel/control_v11p_sd15_canny",
"lllyasviel/control_v11p_sd15_openpose",
"lllyasviel/control_v11p_sd15_seg",
"lllyasviel/control_v11p_sd15_depth", # broken
"lllyasviel/control_v11f1p_sd15_depth",
"lllyasviel/control_v11p_sd15_normalbae",
"lllyasviel/control_v11p_sd15_scribble",
"lllyasviel/control_v11p_sd15_mlsd",
"lllyasviel/control_v11p_sd15_softedge",
"lllyasviel/control_v11p_sd15s2_lineart_anime",
"lllyasviel/control_v11p_sd15_lineart",
"lllyasviel/control_v11e_sd15_shuffle",
"lllyasviel/control_v11p_sd15_inpaint",
"lllyasviel/control_v11u_sd15_tile",
"lllyasviel/control_v11e_sd15_ip2p",
"lllyasviel/control_v11p_sd15_scribble",
#################################################
thibaud sd v2.1 models (ControlNet v1.0? or v1.1?)
##################################################
"thibaud/controlnet-sd21-openpose-diffusers",
"thibaud/controlnet-sd21-canny-diffusers",
"thibaud/controlnet-sd21-depth-diffusers",
"thibaud/controlnet-sd21-scribble-diffusers",
"thibaud/controlnet-sd21-hed-diffusers",
"thibaud/controlnet-sd21-zoedepth-diffusers",
"thibaud/controlnet-sd21-color-diffusers",
"thibaud/controlnet-sd21-openposev2-diffusers",
"thibaud/controlnet-sd21-lineart-diffusers",
"thibaud/controlnet-sd21-normalbae-diffusers",
"thibaud/controlnet-sd21-ade20k-diffusers",
##############################################
ControlNetMediaPipeface, ControlNet v1.1
##############################################
"CrucibleAI/ControlNetMediaPipeFace", # SD 2.1?
["CrucibleAI/ControlNetMediaPipeFace", "diffusion_sd15"], # SD 1.5
honestly, if all these work, i think we're in good shape
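Most entries in that model list are plain repo IDs, but the MediaPipe face one is a `[repo_id, subfolder]` pair. A minimal sketch for normalizing such a list (and dropping any repeated entries) before loading; the helper names are hypothetical. Each resulting pair could then be passed as `ControlNetModel.from_pretrained(repo_id, subfolder=subfolder)`, since `subfolder=None` is the default.

```python
from typing import List, Optional, Sequence, Set, Tuple, Union

Entry = Union[str, Sequence[str]]
Normalized = Tuple[str, Optional[str]]


def normalize_entry(entry: Entry) -> Normalized:
    """'owner/repo' -> (repo, None); ['owner/repo', 'subfolder'] -> (repo, subfolder)."""
    if isinstance(entry, str):
        return entry, None
    repo_id, subfolder = entry
    return repo_id, subfolder


def unique_entries(entries: List[Entry]) -> List[Normalized]:
    """Normalize every entry and drop repeats, keeping the original order."""
    seen: Set[Normalized] = set()
    result: List[Normalized] = []
    for entry in entries:
        key = normalize_entry(entry)
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result
```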
@atomic eagle once nodes api is merged in, is there any reason @uncut sphinx couldn't be noding away?
can't think of any - i'm very happy to have a 1-on-1 to clarify any questions as well
@uncut sphinx is there a way to test this?
would love to poke at it today, and am happy to (try) to help taking on nodifying it
@steep idol and @atomic eagle , maybe the three of us could virtually meet up to discuss current state of controlnet and nodification?
Nodes api pr is getting merged in soon so that blocker should be a non issue. Would love to catch up - will have to be once these kids go down
Or sometime this weekend
This evening or this weekend both work for me. @atomic eagle If scheduling the three of us is complicated, I'm happy to do separate one-on-ones too.
The current draft PR should be usable: https://github.com/invoke-ai/InvokeAI/pull/3156
Or the branch it's based on: https://github.com/invoke-ai/InvokeAI/tree/feat/controlnet_backend
Although I haven't rebased to main in several days, so maybe I should bring it up to date.
Also I don't think there's an example usage script in there -- I'll clean up a test script and add it in this evening.
I've also made changes that I haven't committed yet, though they shouldn't change basic usage.
It’s my morning - I’ve just had my tea and happy to chat any time today
Just got kids down, they may intrude but am good to chat if you both are! @atomic eagle @uncut sphinx
sure online now
sure
@uncut sphinx we're in the voice chat if you are online
Argh! I went afk for a while. Apologies for the bad timing.
feel free to hop on the chat
ControlNet in node UI !
It's only partial implementation and hacky right now, but working.
The most beautiful old man with a pearl earring I ever did see
Hey there! Just jumping in to say that you guys are doing a fantastic job!
I have a question, though. Is Controlnet planned to only work through the node interface or are you guys also thinking about having it elsewhere? I guess I'm imagining that having control net work as a kind of stamp in the canvas would be amazing.
Like in the case of pose control, you could just position your character inside your selection box and inpaint. It would be crazy!
controlnet is a core feature and will be available in the 3 main workflows (classic/linear, canvas, nodes). nodes will offer more control of course.
@uncut sphinx - how are we looking on controlnet readiness to merge? (and for anyone paying attention to this thread, latest sneak peek/update)
There is some confusion with the color based type marking in this image. Control and conditioning outputs are both the same cyan. Also the input to collect is grey but it's taking cyan wires and the control input on T2L is grey for collection, but can't it take control wires directly (ie Cyan)?
It looks like we might be running out of colors. Could shapes be used in addition to colors (e.g. squares, triangles) to prevent overlap?
would say that inputs/outputs design is not fully final
i think the cyan is being used because both are passing in an input+model
I know tone does not come across in text very well, and I know the design is not final, so this isn't meant to be accusatory or anything.
Those 2 input types (conditioning from compel vs control net) should never be plugged into each other. If we are running out of colors, and the idea is to clearly mark types graphically, then something more like shapes and colors might be helpful to avoid overlap.
I actually think grey is a good color for "takes any input type" for something like collect that can take different things, but it should then change to match what is being used for input and output. So the collect node starts grey and then once you plug something in, the circles on both ends change to match.
Yes, my response was more of a "yep, that's true and valid feedback. these aren't intended to be final, so things will change in that direction"
There has been almost no styling done on the nodes editor yet, it's pretty much just 'functional UI'
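The grey "accepts anything" behavior suggested above for Collect could work roughly like this. It's a hypothetical sketch of the inference rule only (type names are plain strings, not InvokeAI's field types): an empty Collect is "any", the first connection fixes the item type, and mismatched connections are rejected.

```python
from typing import List

ANY = "any"  # grey "accepts anything" handle


def infer_collect_type(connected_input_types: List[str]) -> str:
    """Item type shown on a Collect node's handles, given what is plugged in.

    Starts as ANY; after the first connection the handles adopt that type,
    and later connections must match it.
    """
    if not connected_input_types:
        return ANY
    first = connected_input_types[0]
    for t in connected_input_types[1:]:
        if t != first:
            raise TypeError(f"cannot collect {t!r} into a collection of {first!r}")
    return first


def collect_output_type(connected_input_types: List[str]) -> str:
    """The output handle carries a list of whatever the inputs are."""
    item = infer_collect_type(connected_input_types)
    return ANY if item == ANY else f"list[{item}]"
```

The UI would then color both handles from the inferred type instead of a fixed grey.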
<node_humor>
Will it Blend?
No. No it definitely won't.
I rebased the feat/controlnet-nodes branch from main, after the big canvas PR got merged last night. Looking good so far. Trying to add a few more backend features today. Is it better to go ahead and put in the PR today, or in a few days when it's more feature complete?
What is the gap between now and "feature-complete"?
tend to think if its a solid/stable foundation, getting it in now and handling enhancements through a diff PR would be optimal
For TextToLatents-based controlnet support, should be "feature-complete" by Saturday
Then I'll clean up today and get PR in by tonight then.
right now it's very early, the UI is just the bare minimum to have a functioning app, and the UX is basically non-existent. we'll work on actually making it user-friendly before release. in other words, suggestions for how you think it should work are more useful than feedback on how it is right now bc i know it's... not good 😅
My suggestion would be to use a few shapes and stick to fewer, more distinctive colors. In the above shots, there are several shades of blue/green being used that are so close that people might not be able to distinguish them (model, seamless, and scheduler on the T2L for example)
Moving the label inside the handle might help a little bit too? It’d make the color a bit more supplementary and give you more room to do other stuff like patterned background colors, or different font/background color combos.
@uncut sphinx just got my env set up to start testing this
How does one load an image into the preprocessor node? It doesn't seem like there's an upstream node, but tapping the Image icon etc. doesn't prompt for an upload. Wondering if that's just a Firefox issue w/ the node editor UI.
Second question - ControlNet models don't seem to autopopulate as a dropdown in the node editor. Do these need to be manually typed in, or is it because I'm missing models? What are the standard pre-reqs to use this?
You can just drag an image from the gallery to the preprocessor node. More specifically it needs to be dropped onto the image icon in the Image port section of the node.
Right now there is no autopopulate, it's free text entry. The name of any controlnet models hosted on huggingface should work. Here's my current list of popular models that I copy/paste from:
##############################################
lllyasviel sd v1.5, ControlNet v1.0 models
##############################################
lllyasviel/sd-controlnet-canny
lllyasviel/sd-controlnet-depth
lllyasviel/sd-controlnet-hed
lllyasviel/sd-controlnet-seg
lllyasviel/sd-controlnet-openpose
lllyasviel/sd-controlnet-scribble
lllyasviel/sd-controlnet-normal
lllyasviel/sd-controlnet-mlsd
#############################################
lllyasviel sd v1.5, ControlNet v1.1 models
#############################################
lllyasviel/control_v11p_sd15_canny
lllyasviel/control_v11p_sd15_openpose
lllyasviel/control_v11p_sd15_seg
lllyasviel/control_v11p_sd15_depth
broken, instead use:
lllyasviel/control_v11f1p_sd15_depth
lllyasviel/control_v11p_sd15_normalbae
lllyasviel/control_v11p_sd15_scribble
lllyasviel/control_v11p_sd15_mlsd
lllyasviel/control_v11p_sd15_softedge
lllyasviel/control_v11p_sd15s2_lineart_anime
lllyasviel/control_v11p_sd15_lineart
lllyasviel/control_v11p_sd15_inpaint
lllyasviel/control_v11u_sd15_tile
problem (temporary?) with huggingface "lllyasviel/control_v11u_sd15_tile"
suggestion for now is to replace with:
lllyasviel/control_v11f1e_sd15_tile
lllyasviel/control_v11e_sd15_shuffle
lllyasviel/control_v11e_sd15_ip2p
lllyasviel/control_v11f1e_sd15_tile
#################################################
thibaud sd v2.1 models (ControlNet v1.0? or v1.1?)
##################################################
thibaud/controlnet-sd21-openpose-diffusers
thibaud/controlnet-sd21-canny-diffusers
thibaud/controlnet-sd21-depth-diffusers
thibaud/controlnet-sd21-scribble-diffusers
thibaud/controlnet-sd21-hed-diffusers
thibaud/controlnet-sd21-zoedepth-diffusers
thibaud/controlnet-sd21-color-diffusers
thibaud/controlnet-sd21-openposev2-diffusers
thibaud/controlnet-sd21-lineart-diffusers
thibaud/controlnet-sd21-normalbae-diffusers
thibaud/controlnet-sd21-ade20k-diffusers
Should I pre-populate? Is there a way to prepopulate a node port but also allow free text entry for other controlnet models?
@steep idol are you testing with Text2Latents node from latent.py or Text2Image node from generate.py? Text2Latents is the most up-to-date, I'm actually not sure if ControlNet with Text2Image will run properly with the latest updates.
Here's a screenshot of current usage in Node UI, with result and intermediate canny edge detection images in gallery.
Another thing to be aware of: if you haven't pre-loaded a ControlNet model, it will download and cache the first time you use it. But there's currently no warning of this on the client, so while downloading, the Node UI will appear to be stuck while executing. If you're running the InvokeAI server in a terminal you can see the download progress there.
And even if you have pre-loaded a ControlNet model, some of the image preprocessors have their own internal models that also need to be downloaded and cached if they haven't been used before.
Can add a ui element to show a drop-down that also lets you do free text entry. For now, maybe best to just put your list into a field and let it be a drop down?
Okay, changed ControlNet model field from free text to prepopulated dropdown that includes popular ControlNet models. Pushed to feat/controlnet-nodes and ControlNet PR https://github.com/invoke-ai/InvokeAI/pull/3405
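On the dropdown-plus-free-text question: if a node field is typed as a `typing.Literal`, the allowed values can be read back with `typing.get_args` to fill a dropdown. The sketch below assumes that approach; `ControlNetModelName`, `validate_model_name`, and the `owner/repo` pattern used for free text are hypothetical, and the model list is shortened.

```python
import re
from typing import List, Literal, get_args

# Hypothetical field type: a Literal gives the UI a fixed list to render as a dropdown.
ControlNetModelName = Literal[
    "lllyasviel/sd-controlnet-canny",
    "lllyasviel/sd-controlnet-openpose",
    "lllyasviel/control_v11p_sd15_canny",
    "lllyasviel/control_v11f1p_sd15_depth",
    "thibaud/controlnet-sd21-canny-diffusers",
]

# loose "owner/repo" shape for free-text entries
HF_REPO_ID = re.compile(r"[\w.-]+/[\w.-]+")


def dropdown_options() -> List[str]:
    return list(get_args(ControlNetModelName))


def validate_model_name(name: str, allow_free_text: bool = False) -> str:
    """Accept a dropdown entry, or (optionally) any 'owner/repo' string."""
    if name in dropdown_options():
        return name
    if allow_free_text and HF_REPO_ID.fullmatch(name):
        return name
    raise ValueError(f"unknown ControlNet model: {name!r}")
```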
giving this a spin now 😄
omg so cool
Glad its working for you! I'm going offline for the next 20-ish hours, but look forward to any feedback once I return.
I'll try to get a solid review in (after I've had a bit of fun). Great work!
Thanks! FYI next thing I plan to do is get it working with LatentsToLatents node (for Img2Img). Need to refactor in TextToLatents so LatentsToLatents will inherit the ControlNet functionality.
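A hypothetical sketch of that refactor shape: the base class owns the ControlNet handling and the img2img subclass only changes where denoising starts. The class names are stand-ins, not InvokeAI's; the start-step arithmetic mirrors what I understand diffusers' img2img pipeline to do with `strength`.

```python
from typing import List, Optional


class TextToLatentsSketch:
    """Hypothetical stand-in that owns the ControlNet handling."""

    def __init__(self, steps: int, controls: Optional[List[str]] = None):
        self.steps = steps
        self.controls = list(controls or [])

    def first_step(self) -> int:
        return 0  # txt2img denoises from pure noise

    def plan(self) -> dict:
        # ControlNet guidance is applied on every step that actually runs
        return {"steps": list(range(self.first_step(), self.steps)),
                "controls": self.controls}


class LatentsToLatentsSketch(TextToLatentsSketch):
    """img2img: only overrides where denoising starts; inherits ControlNet."""

    def __init__(self, steps: int, strength: float,
                 controls: Optional[List[str]] = None):
        super().__init__(steps, controls)
        self.strength = strength

    def first_step(self) -> int:
        # run only the last int(steps * strength) steps
        init_steps = min(int(self.steps * self.strength), self.steps)
        return max(self.steps - init_steps, 0)
```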
definitely going to need that list to autopopulate for full release.
Is there a need to cover the CN1.0 models? I believe CN1.1 is a strict upgrade.
Don't forget to test the brand new 'reference' model, especially since it might cause a UI challenge since it's the first preprocessor-only controlnet model.
Getting an error on Depth model controlnet (1.1)
UserWarning: Mapping deprecated model name vit_base_resnet50_384 to current vit_base_r50_s16_384.orig_in21k_ft_in1k.
Having issues in general getting this to generate :/
Think it might be just my local install or something I'm not doing right
Yeah none of it is generating for me.
(normal txt2img)
BTW just checking, but do you have nodes for passing in control images directly (i.e. no preprocessor, like open poses generated externally)?
Yep
You can pass image directly to controlnet node then
processed_image = midas_processor(image,\nTypeError: MidasDetector.__call__() got an unexpected keyword argument 'depth_and_normal'\n`, … }
This is happening for the MidasDepthImageProcessor node (after I fixed my install)
Looks like the reference ControlNet is getting a bunch of custom settings.
Yeah, same with HED processed_image = hed_processor(image,\nTypeError: HEDdetector.__call__() got an unexpected keyword argument 'safe'\n`, … }
output is image going to image, perhaps settings are getting passed in as args somehow?
can we get a screenshot of the graph
Preprocess > controlnet (image > image connectors)
I see what's going wrong -- for both of the "unexpected keyword argument" errors it's a mismatch between controlnet_aux package versions. That's my bad. For development I've been using current main from https://github.com/patrickvonplaten/controlnet_aux. But for deployment in pyproject.toml I've pinned controlnet_aux to v0.0.3, the latest stable release. And intended to comment out anything that was relying on controlnet_aux > 0.0.3, but I missed a few things.
I will fix and push PR ASAP. If you need a more immediate fix, installing latest controlnet_aux from github repo should fix as well.
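An alternative to commenting code out is to filter keyword arguments by the installed controlnet_aux version. Minimal sketch below, assuming (from the two errors above) that `depth_and_normal` for MidasDetector and `safe` for HEDdetector are the only post-0.0.3 arguments in play, and that any newer release accepts them; the version parsing is deliberately simple, and `packaging.version` would be the more robust choice.

```python
from typing import Any, Dict, Set, Tuple

# kwargs that, per the errors above, controlnet_aux 0.0.3 rejects
NEWER_KWARGS: Dict[str, Set[str]] = {
    "MidasDetector": {"depth_and_normal"},
    "HEDdetector": {"safe"},
}
PINNED: Tuple[int, ...] = (0, 0, 3)


def parse_version(v: str) -> Tuple[int, ...]:
    """'0.0.4' -> (0, 0, 4); stops at the first non-numeric piece."""
    parts = []
    for piece in v.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def supported_kwargs(detector: str, kwargs: Dict[str, Any],
                     installed: str) -> Dict[str, Any]:
    """Drop kwargs the installed controlnet_aux is too old to accept."""
    if parse_version(installed) > PINNED:
        return dict(kwargs)
    too_new = NEWER_KWARGS.get(detector, set())
    return {k: v for k, v in kwargs.items() if k not in too_new}
```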
is there a tag where i can checkout controlnet in invoke?
it's in this PR: https://github.com/invoke-ai/InvokeAI/pull/3405
only functional in the node editor right now
I've pinned the ControlNet PR to use controlnet_aux v0.0.3 (most recent stable version). And removed any code that relied on post-v0.0.3 changes to controlnet_aux. Just tested all the ControlNet preprocessors again and not seeing any of these missing param errors now. Pushed this fix to PR: https://github.com/invoke-ai/InvokeAI/pull/3405
Can you try the HED and Midas pre-processors again?
Will do shortly! Hopefully we can preserve those changes so when controlnet aux is updated we can use em! 🙂
Yeah I've just commented those changes out for now, will uncomment them back when controlnet_aux is updated. Which should be soon, Patrick said there may be a new controlnet_aux release by end of week.
💯
sweet
Trying to test out, running into some issues getting my instance up and running.
100% positive it's something local
But can't test until I resolve
Yep. I had another instance open 😏
Hm, now im getting compel errors w/ conjunction 😕
ah must have gotten upgraded and still on 1.1.5 🤦♂️
ok - after getting through all of my local instance challenges, I am sad to report I am still getting error on Midas
\\midas\\vit.py", line 145, in forward_flex\n x = x + pos_embed\nRuntimeError: The size of tensor a (2167) must match the size of tensor b (2073) at non-singleton dimension 1\n', … }
HED works though.
Yeah, I had to fall back to Compel 1.0.5 on feat/controlnet-nodes, which I think is also what main is currently on? And 2.3.* branch(es) are on Compel 1.1.5?
What dimensions are the input image and noise latent? Can you send a screenshot of the node graph that you're getting this error with?
Think it was a 1280x768
It’s failing at the Midas node, though, but Midas is outputting to controlnet set to 1.1 depth
I’ll give it a try with 512x512
@uncut sphinx - 512 x 512 works!
Testing addition of ControlNet support to LatentsToLatents (latent-based nodes for Img2Img):
So good haha
It's kinda like those "recreate an old photo with the same people now" posts.
Yeah it’s a super powerful combo
In this case, relative to result from just using Img2Img, adding the OpenPose controlnet really improved the head and the hands.
Yeah you’d get flagged for that non-controlnet age as NSFW!
looks similar to the results from the reference-only controlnet model (? is it even a model? dunno)
Collect node should acquire the color of what it's collecting to avoid confusion.
speaking of can it handle that model since it's so different from the other Controlnets?
yep, will be handling this
dunno, that's a @uncut sphinx question
inferring the type will be kinda tricky, will have to traverse the graph in the UI. but there's a nice graph library that i'm using for some of that kind of thing which can do the hard part of that for us.
Regarding support for "reference-only" ControlNet mode.
The InvokeAI ControlNet support I've been working on does not yet support reference-only mode. I definitely want to add it to InvokeAI. Reference-only does not have the same kind of pluggable model that other ControlNet modes do. From browsing the code, it looks very different. There is actually already a PR pending to add to diffusers: https://github.com/huggingface/diffusers/pull/3435. Which implements the reference-only feature but without the rest of ControlNet, making it easier to understand (at least for me). Any opinion on whether an InvokeAI implementation should include it under ControlNets or separate out as its own thing?
Separate it
I think we’ll use it on its own to power some things on the canvas
And I think as a UX pattern, would be confusing - although I can see an argument for folks saying it deviates from what is known in auto
Both branches should now be on 1.1.5
@uncut sphinx is https://github.com/invoke-ai/InvokeAI/pull/3405 still the one we should be tracking for controlnet? is there anyone that needs poking to get this merged in?
Yep that's the latest ControlNet PR. @atomic eagle or @graceful kindle, could you take a look?
Recent modifications include pinning to the newly released controlnet_aux v0.0.4, reinstating the Zoe depth preprocessor node, and adding a Mediapipeface preprocessor node. Also, thanks to a great session with @atomic eagle, we got polymorphic input ports on nodes working. So now the control input port on TextToLatents can take either a single ControlField input or a list of ControlFields, which means a single ControlNet can connect directly to TextToLatents without going through a Collect node.
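A minimal sketch of what a polymorphic `control` port implies on the receiving side: normalize "one ControlField or a list of them" into a list before use. `ControlField` here is a simplified, hypothetical stand-in with made-up field names, not the actual InvokeAI class.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class ControlField:
    """Simplified, hypothetical stand-in for the node's ControlField."""
    control_model: str
    image_name: str
    control_weight: float = 1.0


ControlInput = Optional[Union[ControlField, List[ControlField]]]


def as_control_list(control: ControlInput) -> List[ControlField]:
    """Accept nothing, a single ControlField, or a list (e.g. from a Collect node)."""
    if control is None:
        return []
    if isinstance(control, ControlField):
        return [control]
    if isinstance(control, list) and all(isinstance(c, ControlField) for c in control):
        return list(control)
    raise TypeError(f"unsupported control input: {type(control).__name__}")
```

Downstream code then only ever deals with a list, so the single-ControlNet case needs no special handling.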
did you get to double-checking our changes don't cause bigger issues?
I haven't seen any new issues.
There are problems with some node connections disappearing from the UI on reload, even though reloaded graph still executes as if the connections are there. But prior to our changes I was already seeing that with ControlNet nodes, image preprocessor nodes, and sometimes Noise nodes.
But I do need to go over the code changes for polymorphic node inputs again. And I need to test more diverse graphs; I've mostly been testing ones that include ControlNet. Going to try to do a polymorphic node input that has nothing to do with ControlNet...
Connections disappearing is unrelated to what you're working on, just haven't gotten around to investigating it yet
Some extra tests on different types of inputs/outputs would be great - both valid and invalid polymorphic connections need to be tested (e.g. a list of int should not work when connected to a poly string input)
I've been testing node inputs like Union[float, list[float]] and Union[int, list[int]].
Needed some modifications but it's working.
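For the valid/invalid connection tests, a strict compatibility check can be written with `typing.get_origin`/`get_args`. This is a hypothetical sketch, not InvokeAI's validation: it treats `Union[T, list[T]]` as accepting exactly `T` or `list[T]`, and it does not allow int-to-float coercion. It uses built-in `list[...]` generics (Python 3.9+); mixing them with `typing.List[...]` may not compare equal.

```python
import types
from typing import Union, get_args, get_origin

# typing.Union[...] and (on Python 3.10+) the X | Y syntax have different origins
_UNION_ORIGINS = (Union, getattr(types, "UnionType", Union))


def accepted_types(annotation) -> tuple:
    """Types an input port accepts: each member of a Union, or the type itself."""
    if get_origin(annotation) in _UNION_ORIGINS:
        return get_args(annotation)
    return (annotation,)


def can_connect(output_type, input_annotation) -> bool:
    """Strict check: the output type must equal one accepted input type exactly."""
    return any(output_type == t for t in accepted_types(input_annotation))
```

A test suite could then assert both directions, e.g. that `list[float]` connects to `Union[float, list[float]]` while `list[int]` is refused by `Union[str, list[str]]`.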
And I am still working on polishing some of the changes we made.
yeah baby!
great work @uncut sphinx
PS: I've improved image handling in https://github.com/invoke-ai/InvokeAI/pull/3473, so control images will now display in the gallery. Also they are categorized as control so we can add filters
Glad you're liking it @atomic eagle ! Lots of directions we can take this but the core ControlNet functionality is now in place.