#nodes
1989 messages · Page 2 of 2 (latest)
would the project be generally receptive to including invoke-new and its current temporary web assets in the official docker containers? I'm happy to continue building my own if not, but I don't think the changes would be enormous
(and crucially, it wouldn't affect anyone who didn't explicitly choose to change the endpoint)
i think with the timeframe we're looking at completing nodes in, it'd be a distraction to not just finish out nodes and then get docker containers updated
soon™️
we've got txt2img, img2img, node editor and basic basic gallery functionality working. canvas will be the big kahuna.
and we've gotta port LoRA.
Reading tests was super helpful btw, starting to get a feel for things. @broken junco if you refresh some of the docs today, I can happily follow-up with some suggestions/edits when I have time to go deeper.
Has there been any talk of like replace all “Invocation” with “Node” across the codebase? Not sure if that’s a perfect suggestion (and it’s not at all necessary) but might make mental modeling easier for newcomers.
Naming debates are pretty common. 😛
Yah. I need to do that soon. I have a big PR out to fix subgraphs and add a graph library. Once that's in we should probably rename things.
I'm going to wait on refreshing the docs until I've studied and am comfortable with Kyle's big graph PR.
Yeah no worries! I’ll take semi-detailed notes as I learn stuff and can share them whenever.
@sinful forge why do we need to track multiple cancellations for an InvocationQueueItem here: https://github.com/invoke-ai/InvokeAI/blob/main/invokeai/app/services/invocation_queue.py#L61-L70, and compare the timestamp of the Item to the timestamps of all of its cancellations, instead of simply removing the cancelled item from the queue when .cancel() is called? 👈 (cc @hexed imp )
You may have queued multiple items, and want to cancel all of them. But you may then quickly queue up something new after that. I make the assumption that we can't modify the queue and it might take a while to chew through.
if we assume we can modify the queue, is there still an upside to this approach vs just deleting the matching graph_execution_state_id? (ignoring for now performance impact of deleting items from the middle of a queue)
Uh.. I guess not. With contention that may be difficult.
Unless you have per-session compute or something
contention meaning many different writers racing to modify the queue?
Then you could just lock around a session
Yah at scale I wouldn't want to lock the queue around a delete
I generally try to follow a lockless approach where possible
okay, yeah, that makes a lot of sense. thank you!
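For anyone following along, here's a rough sketch of the "don't touch the queue, just record when a cancel happened" idea discussed above. It is not the actual invocation_queue.py code: `Item` and `CancelAwareQueue` are made-up names, and it keeps only the latest cancellation per session rather than a list of them.

```python
import time
from dataclasses import dataclass, field
from queue import Queue
from typing import Dict, Optional


@dataclass
class Item:
    # Hypothetical stand-in for InvocationQueueItem.
    session_id: str
    timestamp: float = field(default_factory=time.time)


class CancelAwareQueue:
    """Never mutates the queue on cancel; stale items are skipped in get()."""

    def __init__(self) -> None:
        self._queue: "Queue[Item]" = Queue()
        # session_id -> time of the most recent cancellation
        self._cancellations: Dict[str, float] = {}

    def put(self, item: Item) -> None:
        self._queue.put(item)

    def cancel(self, session_id: str, now: Optional[float] = None) -> None:
        self._cancellations[session_id] = time.time() if now is None else now

    def get(self) -> Optional[Item]:
        # Drop items that were queued before their session's last cancellation.
        while not self._queue.empty():
            item = self._queue.get()
            cancelled_at = self._cancellations.get(item.session_id)
            if cancelled_at is None or item.timestamp > cancelled_at:
                return item
        return None
```

Anything queued after the cancel has a newer timestamp, so it still runs, which is the "quickly queue up something new" case.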
Might be the ugliest MVP I've ever made, but it's clearly some amount of V 😄
fragile discord frontend talking to fragile invokeai client talking to unfinished invokeai nodes. wcpgw 😄
(if anyone fancies pitching in with de-fragiling it, I'd be very happy to add you to the (almost entirely empty) discord server I'm testing this, so you get free image generation on my GPU 😉 )
hmm, I just noticed that my essentially-idle invoke-new.py is using ~15% CPU
going by strace it looks like it's hard-looping on something
poking at the code it all seems like it's just on uvicorn and fastapi to DTRT with the asyncio loop :/
```
Ordered by: cumulative time
List reduced from 15388 to 10 due to restriction <10>

ncalls tottime percall cumtime percall filename:lineno(function)
4488/1 0.078 0.000 17.298 17.298 {built-in method builtins.exec}
1 0.000 0.000 17.298 17.298 /invoke-new.py:1(<module>)
1 0.000 0.000 17.298 17.298 /invoke-new.py:8(main)
1 0.000 0.000 13.875 13.875 /usr/src/InvokeAI/lib/python3.10/site-packages/invokeai/app/api_app.py:148(invoke_api)
1 0.000 0.000 13.874 13.874 /usr/local/lib/python3.10/asyncio/base_events.py:613(run_until_complete)
1 0.015 0.015 13.873 13.873 /usr/local/lib/python3.10/asyncio/base_events.py:589(run_forever)
13153 0.261 0.000 13.858 0.001 /usr/local/lib/python3.10/asyncio/base_events.py:1832(_run_once)
13153 0.063 0.000 7.374 0.001 /usr/local/lib/python3.10/selectors.py:452(select)
13153 7.289 0.001 7.289 0.001 {method 'poll' of 'select.epoll' objects}
13325 0.041 0.000 6.123 0.000 /usr/local/lib/python3.10/asyncio/events.py:78(_run)
```
looking at profiling output it's hitting epoll *super* hard
polling 700+ times a second doesn't seem sane 😄
Feel free to improve it 🙂
The processor also runs another thread, but that should block on a queue pull, so I don't think it spins.
https://github.com/invoke-ai/InvokeAI/blob/main/invokeai/app/api/events.py#L48 I'm wondering if it might be that
Ah, could be. I don't think it worked right unless there was at least a small loop, but if you can find a way to get it working that'd be great.
I'm still not super familiar with the async patterns in Python 😣
(also apologies, I didn't mean the "doesn't seem sane" thing as a criticism of the work being done here, I'm going into this with no experience of uvicorn or fastapi and trying to figure out what's going on 🙂
ok, so changing that to 0.1 has dropped CPU usage below 1%, so I guess now I should check if I can still talk to the API
yep, seems to work!
I will send a PR in a moment
I noticed that the current invokeai-new.py was using almost all of a CPU core. After a bit of profiling I noticed that there were many thousands of calls to epoll() which suggested to me that some...
@sinful forge apologies again, I didn't need to be snarky, I could have just done the work first and put the PR up. I'm also fairly new to asyncio and it has been a painful thing to learn
Does it work with any lower number? Worry being that the longer delay may cause issues. (Needing a delay seems really weird either way)
the delay is because self.__queue.get() can't be allowed to block (otherwise asyncio would be stuck across all coroutines)
ultimately in any async/await you are always going to have something, somewhere, polling in a loop
I am like 99.5% not an expert in this codebase, but I would be surprised if taking 100ms to pick up an event from this queue, vs taking 1ms, would make any kind of difference other than reducing the polling load
I think that class is used for invocations of the graph nodes? In which case these are jobs that are likely to be taking a good chunk of time doing image generation or upscaling or whatever, which suggests to me that this doesn't need to be a hyper-performant polling loop
Fwiw, on my gpu, one step at 512x512 takes about 30ms. I don’t mind waiting an extra 70ms max for my request to get started
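A small sketch of the pattern being discussed: a coroutine that drains a thread-safe queue with a non-blocking get and sleeps between empty polls. This is not the code in events.py, just an illustration; `drain_events`, `polling_demo` and the `poll_interval` parameter are names made up here.

```python
import asyncio
import queue
import threading
from typing import Any, List


async def drain_events(q: "queue.Queue[Any]", stop: threading.Event,
                       out: List[Any], poll_interval: float = 0.1) -> None:
    """Poll a thread-safe queue without ever blocking the event loop."""
    while not stop.is_set() or not q.empty():
        try:
            event = q.get(block=False)  # never block: other coroutines must run
        except queue.Empty:
            # Yield to the loop. A very small interval turns this into a hot loop.
            await asyncio.sleep(poll_interval)
            continue
        out.append(event)


def polling_demo() -> List[Any]:
    q: "queue.Queue[Any]" = queue.Queue()
    stop = threading.Event()
    received: List[Any] = []

    def producer() -> None:
        # Stand-in for the processor thread emitting events.
        for i in range(3):
            q.put({"event": "invocation_complete", "n": i})
        stop.set()

    worker = threading.Thread(target=producer)
    worker.start()
    asyncio.run(drain_events(q, stop, received, poll_interval=0.01))
    worker.join()
    return received
```

The sleep only happens when the queue is empty, so a burst of events is still drained back to back; the interval just caps how often an idle loop wakes up.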
woot merged. thanks!
Systems can also wait on a lock, which the OS implements however it fits, for events to come in.
See also: condition variables.
But that article doesn't discuss spurious wakeup, so keep that in mind as well.
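For reference, a minimal condition-variable sketch in Python that guards against spurious wakeups by re-checking the predicate in a loop. `BlockingMailbox` and `mailbox_demo` are illustrative names, not anything in the codebase.

```python
import threading
from collections import deque
from typing import Deque, List


class BlockingMailbox:
    """The consumer sleeps on a condition variable instead of polling."""

    def __init__(self) -> None:
        self._items: Deque[str] = deque()
        self._cond = threading.Condition()

    def put(self, item: str) -> None:
        with self._cond:
            self._items.append(item)
            self._cond.notify()

    def get(self) -> str:
        with self._cond:
            # Loop, not `if`: a wakeup alone does not guarantee an item is there.
            while not self._items:
                self._cond.wait()
            return self._items.popleft()


def mailbox_demo() -> List[str]:
    box = BlockingMailbox()
    got: List[str] = []

    def consume() -> None:
        for _ in range(2):
            got.append(box.get())

    consumer = threading.Thread(target=consume)
    consumer.start()
    box.put("first")
    box.put("second")
    consumer.join()
    return got
```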
@violet sleet just tested the PR .. was doing something similar myself just now
works as intended
but i think we need to rethink some stuff here
Get a model loader node in which outputs three values - Text Encoder, UNet, VAE
Feed the Text Encoder to the Compel Node instead of the Model
Feed the UNet to the Text To Latent instead of Model
Feed the VAE to the Latent instead of the Model
but for now .. this is exactly how it should work
I've done it in my fork, but I'm waiting for lstein to implement proper loading of model parts (vae/clip) without loading the full model
@broken junco we good to merge that PR?
or is it already merged?
I haven't opened a PR for it as it's not ready yet
I created a PR with the compel part, which works fine now
yep
it works perfectly
tagging @upbeat prism coz we've been talking about this all morning ..
If you're talking about https://github.com/invoke-ai/InvokeAI/pull/3235, then no, this isn't ready to merge. It was developed to address a specific issue with users who try to attach a checkpoint VAE to a diffusers model, and as a side effect I implemented VAE-only loading. After the convo this morning I decided to generalize this to load and cache arbitrary parts of models, but there needs to be some more work done to support the caching.
Also, we need to pass both the model (unet) and clip blocks to the compel node, so compel can apply lora
I'm thinking about a separate 'Simple compel' node that doesn't support loras, but works as you said (and maybe with only one prompt)
Remember each node can run on a separate machine. Model loading should be done through a service, utilizing a cache for performance.
What is the context for nodes running on different machines? Like in a distributed setup?
it feels a bit redundant to define the model on each node
I think that in this situation a check could be implemented so a node can only be started on a machine that has that model
But that's for the future anyway, for now we need to do the basics)
no i mean i see the use case but im saying i also find it redundant defining the model on each of these nodes
so need to figure a way to make that ux better
Also, a question about my PR - are we going to keep supporting the legacy blend syntax?
Our model actually wraps several things - the SD model, the vae, something else? Could maybe set up something to get those components from a defined model and pass them to nodes as inputs.
yep
But I feel like I recall those components not really being very mix and match
Though there may be interesting use cases for mixing and matching
The CLI uses "defaults" to let you specify a parameter once and then keep using it
Do you mean the old colon format? I thought that was being deprecated.
Yep, but it's still supported in code
I didn't copy this code into my PR
Optional globals maybe?
any model nodes should be providing string identifiers of the models (and any serializable config). then the service handles the rest.
just realized that i can't pass them directly. will wake up and fix that up.
need to look into caching the individual elements
right now even though the model is loaded and the individual components are set, it is still reloading them
Passing them through edges like that is not going to work well, unless you're just passing an identifier to help it load later.
to elaborate on the broader context for any nodelings - each node may be handled by a different worker (machine or thread). the inputs and outputs need to be serialisable and transferrable over network.
this means that we cannot send a whole model or model component directly between nodes. in such a distributed system, we can expect that every worker has the same models available, but not necessarily loaded or cached.
so in the invoke() method of a node, if it needs a model to do its business, it should ask the model manager for it by ID and/or type. the model manager will synchronously prepare the model (maybe it is loaded or just retrieved from a cache) and then the business logic can be executed.
the proposed updates to the model manager service will support this - the API is TBD.
in the meantime, if a node needs a model, it should handle loading it itself. you'll need to deal with the inefficient loading for now. once the model manager service is up and running, you'll be able to access it via the context object provided to the invoke() method.
(the exception is standard full SD models as the model manager already handles these)
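To make the above concrete, a toy sketch of a node that only carries serialisable identifiers and asks a service for the model inside invoke(). The real model manager API was still TBD at this point, so `ModelManager`, `Context`, `DecodeLatentsNode` and the `get(model_name, part)` signature are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


class ModelManager:
    """Hypothetical service: prepares models on request and caches them."""

    def __init__(self) -> None:
        self._cache: Dict[Tuple[str, str], str] = {}
        self.loads = 0

    def get(self, model_name: str, part: str) -> str:
        key = (model_name, part)
        if key not in self._cache:
            self.loads += 1  # stand-in for an expensive load from disk
            self._cache[key] = f"<{part} of {model_name}>"
        return self._cache[key]


@dataclass
class Context:
    # Hypothetical stand-in for the context object passed to invoke().
    model_manager: ModelManager


@dataclass
class DecodeLatentsNode:
    # Only serialisable values (names/ids) travel along graph edges.
    model_name: str
    latents_name: str

    def invoke(self, context: Context) -> str:
        # Ask for the model by identifier; it may be loaded now or come from cache.
        vae = context.model_manager.get(self.model_name, "vae")
        return f"decoded {self.latents_name} with {vae}"
```

Because the node holds only strings, it could be shipped to any worker that has the same models available, and whichever worker runs it resolves the model locally.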
ControlNet in node UI !
It's only partial implementation and hacky right now, but working.
sweet .. 🙂 once u rebase against main, the node should be more compact. It's been redesigned now.
maybe it's a bit stupid question, but - how to run default graph(text_to_image) in cli?)
txt2img --prompt "my prompt"
you must provide the full field name for each parameter
txt2img --prompt "cat" --steps 30 | img2img --prompt "dog" --steps 30 | show_image
found, but it's t2i)
this is the graph node that uses the subgraph
ah, that is what you were looking for , i misunderstood
Nice -- resizable too!
yup
would it be possible to split the node in such a way that it can be plugged or merge into the txt2img and img2img nodes
is this that or did you create an entirely new graph for controlnet?
It's integrated into generate.py TextToImageInvocation. Which means ImageToImageInvocation etc inherits ControlNet too.
I plan to port to latent.py LatentsToLatentsInvocation, but currently for me it's easier to debug as TextToImageInvocation.
awesome .. let me know when you have a pr up .. would love to try .
Should be able to update old PR later today.
awesome
getting the preprocessors nodes in too?
i think it might be done with conditioning fields from my pr in future
as separate node
For today I want to at least get a canny edge detection node built. Time for a nap...
sweet
or maybe even create new field type, not sure what fits better
So did we make a tag for dev but not unstable?
ya pre-nodes tag is the last main that has a fully working ui
err, maybe i am misunderstanding
Looks nice. 2 questions:
- You are setting the models independently in two places (load model and text to latent). Should load model be outputting to an input on text to latent in some way? Maybe just a string or id so it knows to grab the loaded model. I noticed the model loader is in progress so maybe this is temp.
- Latent to image doesn't seem to have size input or settings, just outputs. Are they being implicitly passed through via latent to image from the noise node? Might cause confusion if so (looks like things coming from nowhere).
a latents object has dimensionality, decoding the latents needs nothing else to create the image of the right size
- yes, it uses the noise size. even in the old generation code, when you define width and height they're used to generate noise of that size (it's a bit different in img2img, as i remember)
- it's still in progress, for example this is my draft:
@tulip sluice also - we need to choose: if we load loras in the compel node, then unet should go in, with clip+unet as output
if we don't load loras in the compel node then it's like you've done
@upbeat prism is it possible to create node with variable parameters count(which will be array input parameter)?
so for example if you input value in last field - there another added at end
or with +- buttons
@hollow marlin could you look at dm?)
possible, yes - implemented, no. will need some consideration to handle correctly. out of curiosity, what is the use case?
node to load loras
so you do smth like
name: lora1 strength: 0.7 (-)
(+)
->
name: lora1 strength: 0.7 (-)
name: lora2 strength: 0.5 (-)
(+)
loras should not be loaded onto the unet in the compel node, this would break the possibility that the unet is unloaded in between the compel node being invoked and the txt2latents node being invoked
loras should be loaded, like they are now, only just-in-time during the SD inference loop
i saw it's without loading unet
it's like:
ModelField:
    name: str
    loras: List[Tuple[str, float]]
we only say which loras to apply on future unet loads
in comfyui it's implemented the same way, but they have a list of patches to weights
we can do it the same way, but unlike our implementation with a hook, it requires making a unet copy every time and we can't cache it
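A possible shape for that field, sketched as a plain dataclass: the model reference carries only names and strengths, and the loras are applied later, when the unet is actually loaded. `add_lora` and `to_json` are helper names invented for this sketch; only `name` and `loras` come from the draft above.

```python
import json
from dataclasses import asdict, dataclass, field, replace
from typing import List, Tuple


@dataclass(frozen=True)
class ModelField:
    name: str
    # (lora name, strength) pairs to apply whenever the unet is loaded
    loras: List[Tuple[str, float]] = field(default_factory=list)


def add_lora(model: ModelField, lora_name: str, strength: float) -> ModelField:
    """Return a new field with one more lora; the input is left untouched."""
    return replace(model, loras=model.loras + [(lora_name, strength)])


def to_json(model: ModelField) -> str:
    """The field is plain data, so it can be sent between workers as JSON."""
    return json.dumps(asdict(model))
```

Chaining `add_lora` calls mirrors feeding one lora loader node into the next.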
hmm, should the image upload API be working? I'm getting some weird errors when I use python, so I figured I'd at least start with curl and compare some pcaps of the two, but it's not working with curl either:
```
{"detail":[{"loc":["body","file"],"msg":"field required","type":"value_error.missing"}]}
```
I added -L to the example from the docs, because otherwise it doesn't follow the redirect to /uploads/ and therefore doesn't even get this far
huh
I had Content-Type: multipart-form-data instead of multipart/form-data
well, at least now I can figure out why my python version isn't working 😄
I think LoRAs need to be loaded in their own nodes and tacked on before they reach the sampling … that way a user gets to pick and choose as they wish
Not only sampling, but prompting too)
So, lora loader must be right after model loader
Yeah tack it on after the model loader individually and feed that into the sampler directly
Does it need to go through prompting ? Don’t think so right ?
Lora modifies unet and text_encoder
Prompt goes through text_encoder
Sampler uses unet
Yeah..so we do model — Lora — feed the tokenizer from model to compel and feed the text encoder from Lora to compel … then feed unet directly from model to sampler and the positive and negative conditioning will go in from compel
And we’ll need a merge Lora node so we can merge multiple Lora’s before the text encoder is fed to compel
Yep, but I don't think that we need to separate tokenizer and text_encoder
Pass both to Lora?
You can do it in serial
Ah yeah… that’ll solve it ..feeding one Lora into the next
Sounds like a plan
I mean no need in sending tokenizer and text encoder separately through graph
No use case ?
So model will have 3 outputs ? Text encoder / unet / vae ?
Text encoder being txt encoder plus tokenizer
Yes, that why I named it clip in my draft
How are you passing them ? Just the name and individually load them ?
I don't know how to pass them correctly so I created some magic (give me a sec, I'll find it)
#dev-chat message
If we assume that it's all added to invoke, then yes - i just use the name
And array of lora names that need to be applied on use
Are they getting cached correctly ?
Right now nothing works correctly
There are functions to call individual parts of a model
They load the full model
They just call get model and take the part you need from the resulting object
Make sure it's not called load_model when it's not actually loading (just getting metadata).
And remember nothing stays in memory between nodes (as a rule, though in practice it may to improve performance)
These functions were indeed created to load)
this function
it will send you the vae of the model in memory
if its already loaded
so if you load the model in the Load Model node
then you can just call this back when u need the vae later
pass out the model name from the Load Model
and use that as reference in the VAE decoder to call just the vae of the loaded model
But that model is not guaranteed to stay in memory between nodes
If there's no model in memory it loads the full model into memory and gives you the part that you requested
this only happens when you feed in a different model right?
which seems to be how it works?
So we asked lstein to implement real separate loading
hmm
Yes but no. Assume multiple users submitting nodes from different graphs simultaneously. Which is a supported workflow even in the open source version
okay that makes sense
Also - why load the full model if you're only interested in the unet? Loading just that part can save loading time
I mean in some future a node scheduler that intelligently batches nodes by model might be cool, but even in that case a crash between nodes breaks it
And loading the tokenizer might be a lot faster than the full model
Yah, I thought the functions on the model manager just loaded and cached the components?
Right now it calls get_model and then takes vae/text_encoder/... from the result
And for our use case - yes, they should do this at the component level
It's pretty simple in code until you remember about checkpoint models
As a checkpoint model can only be loaded fully
we are converting the models now anyway.. so after the first time convert, we can do it the diffusers way
right now i think they're only saved in memory.. we can do that to disk if it isnt doing that already
There are different ways - for example we can force-convert new models
So all models will be on disk in diffusers format (why doesn't diffusers use tar or zip to make models easy to move/exchange? -_-)
is session_complete supposed to be sent after all the invocations are done? I was just looking again at test.html and I noticed it's subscribing to it, but afaics it's never triggered
which potentially makes it non-trivial to tell when all the nodes are done
Uh... That or something like it. But it's probably less tested than everything else.
oh, it may actually not be a thing at all, afaict the only place session_complete shows up in the repo is in test.html 😄
Testing ControlNet with a Canny edge detection image processing node:
too freaking cool @hollow marlin !
Thanks, it's getting closer!
I know I'm asking a bunch of fairly dumb questions, I'm happy to stop or discuss elsewhere if preferred... I seem to be doing something wrong and only the first node in my graph is being executed - would appreciate any guidance 🙂 https://pastebin.com/4JQhEYv9
(that's the output of the Get Session API endpoint)
so I think your graph must be valid if you do not get a HTTP 422 Unprocessable Entity error. on my system, i do not have ESRGAN set up, a load image -> img2img -> upscale graph very much like yours works, but the upscaling just outputs the same image it received
Afaict only the load_image executed. Maybe you grabbed a commit where things are a bit broken?
very possible
yeah, it looks like the value is written to html and then, when it's read back on change, it's written as a string
and in this case it's really an int
depending on where the error occurs, you may need to inspect the invocation_error WS event the client receives
sorry it's so in flux at the moment. probably a few weeks til things settle down. you all are early adopters 🙂 thanks for hacking on it
No apologies needed, this is an awesome project 😁
I am subscribing to invocation_error and nothing is coming through. I’ll play around some more
I’m afk for the evening - if you’ve not got it figured out by the time I’m back I’ll try your same graph, didn’t think to do that at first
you can fix enums like this: (EnumInputFieldComponent.tsx)
one-line fix, so I don't think that this needs separate PR)
//or you can set indexes in key instead of option value itself, so you can use Number(e.target.value) as index
@rough halo I think you saw the same image because without the ESRGAN model downloaded it silently leaves the image unchanged)
but you can see console for message about it:
>> ESRGAN is disabled. Image not upscaled.
From what I can see, it doesn't go beyond the load_image - I get an invocation_started for load_image:
gnubert-dreambot-backend-invokeai-1[11676]: Received packet MESSAGE data 2["invocation_started",{"graph_execution_state_id":"81c2c574-ac7d-4b97-925f-ccc65e0c504c","node":{"id":"888413fd-0d60-4a0a-88ed-a37fcf2bfb38","type":"load_image","image_type":"uploads","image_name":"b818af89-8fa4-4cfd-b37a-fb0c7276fe75_1682526229.png"},"source_node_id":"0","timestamp":1682526229}]
and then an invocation_complete for load_image: gnubert-dreambot-backend-invokeai-1[11676]: Received packet MESSAGE data 2["invocation_complete",{"graph_execution_state_id":"81c2c574-ac7d-4b97-925f-ccc65e0c504c","node":{"id":"888413fd-0d60-4a0a-88ed-a37fcf2bfb38","type":"load_image","image_type":"uploads","image_name":"b818af89-8fa4-4cfd-b37a-fb0c7276fe75_1682526229.png"},"source_node_id":"0","result":{"type":"image","image":{"image_type":"uploads","image_name":"b818af89-8fa4-4cfd-b37a-fb0c7276fe75_1682526229.png"},"width":1280,"height":1006},"timestamp":1682526229}] and then nothing else
just see for messages in chrome dev console)
invokeai itself logs nothing else after it is told to start the graph: gnubert-invokeai-1[11676]: INFO: 172.20.0.8:48584 - "PUT /api/v1/sessions/81c2c574-ac7d-4b97-925f-ccc65e0c504c/invoke HTTP/1.1" 202 Accepted
I'm calling into invokeai's API from a python bot, not through the web UI 🙂
I don't have much more time to work on this today though, but I'll be poking at it some more tomorrow
ok, but now the only thing I can think of is that maybe you're subscribed not to the graph you created, but to something old
so the response I get from InvokeAI after I post the graph starts like this: InvokeAI response: {'id': 'dc65e223-7526-4b58-87d7-505a2358b18f', 'graph': {'id': '8c6a6f43-1189-43f6-95ff-0cd62421951d', 'nodes':... and I'm subscribing to the first id there, so the one ending b18f - that's the one I call "PUT /api/v1/sessions/dc65e223-7526-4b58-87d7-505a2358b18f/invoke HTTP/1.1" 202 Accepted and emit subscribe with {"session":"dc65e223-7526-4b58-87d7-505a2358b18f"}
and then
PUT /api/v1/sessions/dc65e223-7526-4b58-87d7-505a2358b18f/invoke?all=true
?
huh, apparently I stopped doing the ?all=true 
I'll try and get some time to add that back in later and rebuild. Thanks!
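A tiny helper sketch for the two pieces that tripped things up here: the invoke path with `?all=true`, and the subscribe payload keyed by the session id (the outer `id` of the create-session response, not the graph id). The path and payload shape are copied from the messages above; the function names are made up, and nothing here talks to a server.

```python
from typing import Dict, Tuple
from urllib.parse import urlencode

API_PREFIX = "/api/v1/sessions"  # path as it appears in the logs above


def invoke_request(session_id: str, run_all: bool = True) -> Tuple[str, str]:
    """Return (HTTP method, path) for invoking a session."""
    path = f"{API_PREFIX}/{session_id}/invoke"
    if run_all:
        # Without this, only the first node appeared to run in the case above.
        path += "?" + urlencode({"all": "true"})
    return "PUT", path


def subscribe_payload(session_id: str) -> Dict[str, str]:
    """Payload for the socket 'subscribe' emit."""
    return {"session": session_id}
```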
https://github.com/invoke-ai/InvokeAI/pull/3261 resize and scale latents nodes. these allow us to use hires fix.
but there is an open question:
latents are a special case of tensor with a particular dimensionality - the noise node generates latents of a particular size and we allow the size to be provided as pixels, and the pixel value is floor-divided by 8 to determine the quantity of the width and height dimensions of the tensor
should the resize node also accept w/h in pixels and floor-divide by 8? or, should ALL latents related invocations deal in the actual quantity of the w/h dimensions, and rely on the user to multiply by 8?
i think we should go with pixels for latents, and if we do generalized tensor nodes in the future, let those use the actual quantities. thoughts?
i think about - if we create graph for basic txt2img/img2img and will call it, then we need to divide width and height by 8 before passing value from user input(width, height) to noise inputs?
I think we can forget somewhere to divide/multiply here))
agree
so if we use pixels for latents everywhere, we can all continue to think in pixels, which seems less confusing
but it's potentially less accurate, which may make future use cases more complicated to implement
low-level nodes don't necessarily need to be as easy to use
Unless something really big changes, latents is always going to refer to a tensor with specific dimensions that results in an image with dimensions 8 times larger than the tensor dims
so i think this is the right choice, then we can offer the generalized tensor nodes in the future
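The pixel/latent arithmetic from this thread, as a quick sketch. The factor of 8 and the floor division come from the discussion above; the function names, and the batch and channel defaults in `latent_shape`, are assumptions for illustration.

```python
from typing import Tuple

LATENT_SCALE = 8  # pixels per latent cell along each spatial dimension


def pixels_to_latent_dim(pixels: int) -> int:
    """Floor-divide, so sizes that aren't multiples of 8 round down."""
    return pixels // LATENT_SCALE


def latent_dim_to_pixels(latent_dim: int) -> int:
    return latent_dim * LATENT_SCALE


def latent_shape(width: int, height: int,
                 channels: int = 4, batch: int = 1) -> Tuple[int, int, int, int]:
    # (batch, channels, height, width) ordering is assumed here.
    return (batch, channels,
            pixels_to_latent_dim(height), pixels_to_latent_dim(width))
```

So 512 pixels maps to 64 latent cells and 768 to 96, while a request for 515 pixels also gives 64 and decodes back to 512, which is the accuracy loss mentioned above.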
what happens if the user specifies a size that doesn’t correspond to the number of pixels being input? is there a use case for having w/h as distinct inputs to the node, as opposed to a self-contained image that can be inspected for w/h?
oh i just read the code. um. has that been tested? i’m unsure that the vae latents can simply be interpolated to resize them like that
yes, it works. in this graph we:
- first decode the 512x512 noise
- then do a txt2latents "cat" & decode it for a 512x512 cat
- then explicitly resize the latents to 768x768 and latents2latents "dog"
- finally decoding to get a 768x768 dog
in all cases, width and height are derived from the latents object. we do not provide explicit width or height to anything latents related, except when creating noise
the use case for this is 'hires fix', where we need to resize latents
or, more straightforward "upscaling" in latent space
would y'all be interested in having a couple of invocations to 1) grab an image from a URL, 2) resize an image?
I'm doing those in my code at the moment so I can do img2img, but I'd be happy to have a go at porting that over to invoke?
@upbeat prism ahh, great. sorry for the noise, i saw something that looked weird and thought i should call it out
np, i certainly do not know what i'm doing in a meaningful way here
Is there currently a way to save/load graphs built in the Node UI?
Sure! we’ll want to expose the various resize interpolation modes and stuff like that.
Not yet
I think we are naming the nodes in a confusing way. For example, latents to image should be VAE Decode. Thoughts?
also, I think we can add flag for tiling decode
I think, while more accurate, Latents To Image is a simpler thing to teach someone than "VAE Decoding"
node life is the real life yo
node streets
@sinful forge I'm doing a rewrite of the model manager now in order to satisfy the use case that @upbeat prism raised of being able to load individual parts of a model (such as the unet) and mix and match them dynamically. In doing so I've generalized the RAM caching mechanism so that transformers models, such as CLIP, are cached as well as whole diffusers and parts of diffusers. I want to make this system work across different machines in a distributed environment. What do I need to do to make the model manager a first class service that will interoperate correctly?
I'm also going to start emitting events from the model manager that the UI can display. I was thinking to generate model_requested, model_retrieved_from_cache, model_loaded, model_uncached , and model_load_error events. Is this the right way to do it?
Awesome!
So to make a proper service, you need to define an ABC class, then an implementation of the class for local usage. The ABC lets someone create their own version for their own use case.
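Roughly what that looks like, as a sketch: an abstract base that defines the contract, plus a local implementation. The class and method names here are hypothetical, since the real model manager API had not been settled.

```python
from abc import ABC, abstractmethod
from typing import Dict


class ModelManagerServiceBase(ABC):
    """Hypothetical interface that any deployment can implement."""

    @abstractmethod
    def get_model(self, model_name: str) -> str:
        """Return a ready-to-use model for the given identifier."""


class LocalModelManagerService(ModelManagerServiceBase):
    """Single-machine implementation backed by an in-memory cache."""

    def __init__(self) -> None:
        self._cache: Dict[str, str] = {}

    def get_model(self, model_name: str) -> str:
        if model_name not in self._cache:
            # Stand-in for actually loading weights from disk.
            self._cache[model_name] = f"loaded:{model_name}"
        return self._cache[model_name]
```

Nodes would depend only on the base class, so a distributed setup could swap in its own implementation without touching node code.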
Regarding events, who is the consumer and context of the events? Is it the session? The node? Everyone using the service? (Probably not the last one, since you don't want to know that another user is using a particular model). My gut approach would be to have model_loading and model_loaded events, scoped to a node. The caching stuff is a detail of the particular implementation. I'd log that information (or send it to a tracking system) since it'd be very useful data, but I don't know if I'd send the user events about it.
Thanks for the explanation - will do an ABC forthwith. With regard to scoping, I suppose we do want to be able to create graphs in which different nodes have different models, so scoping to the node makes the most sense.
nodes already use different models and when you implement partial loading they will use only parts of models
With respect to utilization of the GPU, the current system is very conserving of GPU VRAM and moves models out of GPU as soon as they are no longer in active use. So essentially you get one model at a time in the GPU. An alternative I discussed with @upbeat prism is to wrap a context manager around this process such that you could lock models into the GPU with a context, and therefore have multiple ones in GPU at the same time. Not sure if this is desirable or not and maybe a future feature?
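One way the "lock models into the GPU with a context" idea could look, as a sketch only. `GpuResidency` and `locked` are invented names, and the set membership is a stand-in for actually moving weights in and out of VRAM.

```python
from contextlib import contextmanager
from typing import Dict, Iterator, Set


class GpuResidency:
    """Keeps a model 'on the GPU' for as long as at least one context holds it."""

    def __init__(self) -> None:
        self._locks: Dict[str, int] = {}
        self.on_gpu: Set[str] = set()

    @contextmanager
    def locked(self, model_name: str) -> Iterator[str]:
        self._locks[model_name] = self._locks.get(model_name, 0) + 1
        self.on_gpu.add(model_name)  # stand-in for moving weights to VRAM
        try:
            yield model_name
        finally:
            self._locks[model_name] -= 1
            if self._locks[model_name] == 0:
                del self._locks[model_name]
                self.on_gpu.discard(model_name)  # stand-in for offloading
```

Nesting two `with gpu.locked(...)` blocks keeps both models resident at once, and each is offloaded as soon as its last holder exits.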
Agreed - just checking because Kyle raised scoping to the session as an alternative.
//i think i still need google translate sometimes...)
model_loading , model_loaded and model_load_error events would be nice for user feedback, and I'd expect to receive those events during the execution of a node's invoke() method. Thanks @broken junco !
Hyped for this node https://github.com/vijishmadhavan/UnpromptedControl
No one experienced crashes with vae decode in wsl?)
not always, just in some generations
at least, i think try to update cuda version as it's "a bit" old 😄
> nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Fri_Feb__8_19:08:26_Pacific_Standard_Time_2019
Cuda compilation tools, release 10.1, V10.1.105
I haven't done the interpolation modes yet because it wasn't immediately obvious how to have an enum parameter, but I've just pushed up a first pass PR to see if I'm doing this even close to correct: https://github.com/invoke-ai/InvokeAI/pull/3296
You can do enum like here:
https://github.com/invoke-ai/InvokeAI/blob/main/invokeai/app/invocations/latent.py#L385
Nice, thanks
there is no image to latent node or I miss something? Oo
Does not exist but it should
Ok back to dumb questions - I see y’all posting screenshots of using the nodes in the UI - how do I get that to work? I’d like to test my PR for downloading/resizing images, but it sure would be easier to do that via the web UI on my gaming PC rather than keep rebuilding the docker container and shoving it through my server’s automation 
(The only parts of invoke_new.py I’ve seen in the web UI are test.html and docs)
Initial port of ControlNet support from generate-based nodes to latents-based nodes:
With Control model on Text to Latents, I'm supposing this doesn't support multi-controlnet yet?
This is a bit different than the miro board we talked through, so just trying to understand the progression to this form of inputs
Should get multi controlnet node support added later today, using the pseudo-collect strategy we talked about. Multiple controlnets are already supported down in core below nodes, in diffusers_pipeline.py.
nice! that will extract 'control model' out from text to latents at that point?
Yep that's where it's headed later today. Screenshot above is porting single controlnet support I did for TextToImageInvocation over to TextToLatent to make sure that works, before moving on to multi controlnet.
Does this/will this support posing models?
That's a good question. So far all you've posted are tests with canny.
canny is just a preprocessor, not a controlnet
so, you can pass pose-image directly to controlnet and it will work as I understand
right but so far only canny input has been shown.
If I’m understanding what’s been done by Gregg in our discussions, you will have the pose preprocessor and model usage but not openpose (pose editing)
We’ll need to build a UI that allows for pose editing and control
Yeah I've only worked with the OpenPose still image pre-processor, not tried anything fancy for pose editing. I'm planning to have most of the standard ControlNet pre-processors ready to go as nodes by tomorrow. Just making sure I've got MultiControlNet support working with TextToLatent first -- getting that working may affect how pre-processor outputs are handled, and I'd rather deal with that code churn on one pre-processor (Canny) than all of them.
https://justsketch.me/ i stumbled on this and think its definitely overkill, but something like it would be pretty awesome on top of the canvas
Yes I can confirm, I've tested OpenPose output on my previous Txt2Img version of controlnet support and it's worked fine.
We won't be able to use the OpenPose editor itself - not a permissive license
So much for the Open part of OpenPose!
there are a ton of sites for making open pose images. Would be nice to have in invoke, but not sure it needs to be in the initial release. For example https://app.posemy.art/ (I personally prefer sites that have a model over the skeleton as I have trouble visualizing)
One thing for backend: open pose files should be in their own folder, not in i2i input folder.
The challenge is ensuring that the skeleton/pose is modeled on what the controlnet model was trained on.
It can't just be "any" pose ui
Alternatively, could create a new model and train it on something else.
There are some other models for ControlNet, too.
14 original models to be precise with at least another 20-30 i know of
@sinful forge i see we are using ImageType.INTERMEDIATE for eg CropInvocation. these will often be results, though. I'm not sure intermediates really makes sense to have
Oh, cool - pydantic validators get previously validated fields in their context. this means we can have nodes that have fields that depend on others, for example, resize latents could have a mode of "explicit" | "scale". if explicit, we can require a width and height. if scale, we can require factor. not saying this is a good idea for this particular node, just an example
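A minimal sketch of that dependent-field idea, using a plain dataclass as a stand-in so it runs without pydantic. `ResizeLatentsParams` and its fields are hypothetical, not the actual InvokeAI node; in pydantic the same check would live in a validator that reads the previously validated `mode`.

```python
from dataclasses import dataclass
from typing import Literal, Optional


@dataclass
class ResizeLatentsParams:
    """Hypothetical parameter set, for illustration only."""

    mode: Literal["explicit", "scale"]
    width: Optional[int] = None
    height: Optional[int] = None
    factor: Optional[float] = None

    def __post_init__(self) -> None:
        # Which fields are required depends on the value of `mode`.
        if self.mode == "explicit":
            if self.width is None or self.height is None:
                raise ValueError("mode='explicit' requires width and height")
        elif self.mode == "scale":
            if self.factor is None:
                raise ValueError("mode='scale' requires factor")
        else:
            raise ValueError(f"unknown mode: {self.mode!r}")
```

With this shape, `ResizeLatentsParams(mode="scale", factor=2.0)` is accepted, while `ResizeLatentsParams(mode="explicit", width=512)` raises because `height` is missing.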
reaaaaally would like to see better validation errors than 422 tho. sending an issue for you kyle
Uh... That's FastAPI. You're welcome to find something in the docs that'll give you a better result.
https://github.com/invoke-ai/InvokeAI/issues/3321 here's an example of what would be useful
Yah I was thinking of getting rid of the type or just splitting between uploads and everything else. Once we move to latents, we don't need intermediates for images really.
Apparently default 422 should look like this
{
"detail": [
{
"loc": [
"path",
"item_id"
],
"msg": "value is not a valid integer",
"type": "type_error.integer"
}
]
}
im blind, that is in the body of the 422 response
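Since the useful part is in the response body, a client could flatten it into readable lines. A rough sketch, assuming the body has the `detail` / `loc` / `msg` shape shown above; `summarize_validation_errors` is a made-up helper name, not part of FastAPI or InvokeAI.

```python
import json
from typing import List


def summarize_validation_errors(body: str) -> List[str]:
    """Turn a 422 response body into one readable line per error."""
    detail = json.loads(body).get("detail", [])
    lines = []
    for err in detail:
        # `loc` is a path such as ["path", "item_id"]; join it with dots.
        location = ".".join(str(part) for part in err.get("loc", []))
        lines.append(f"{location}: {err.get('msg', 'unknown error')}")
    return lines


example_body = """
{"detail": [{"loc": ["path", "item_id"],
             "msg": "value is not a valid integer",
             "type": "type_error.integer"}]}
"""
print(summarize_validation_errors(example_body))
# prints ['path.item_id: value is not a valid integer']
```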
its kinda confusing, there are 3 behaviors when validation fails:
- 422 responses (a node field fails validation)
- 202 on invoking a session, but an error is caught during edge validation (eg InvalidEdgeError)
- 500 error (something within the invoke() method fails)
i think those are all of them... when the 202 occurs, nothing makes its way up to the API layer
canvas txt2img and img2img working on nodes
Holy shit! That was fast
So an invalid edge is somehow added to the graph? That should be caught when adding the edge, not at invoke time.
When creating a session with a whole graph
Yah that should be validated at creation time.
maybe I forgot to add the root-level validator
also if i create a graph with a single node - lets say a Lerp node, that has an image field - the graph is valid even though the Image field is not provided. so you get an invocation error.
i assume this is because the imagefield is Optional
err, because it is a union with none
we can enforce that the field requires input (either direct or via connection) in the UI, but we would need to add schema customisation for that - there doesn't seem to be any distinction between eg Union[str, None] and Optional[str] in the generated schema
Yah those are the same thing
we'd need our own type hint or something on the Field to indicate that scenario (value MUST be provided either directly or via connection)
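To make the point concrete: the two spellings compare equal in Python's typing module, so any "must be provided" flag has to travel as extra metadata. A small sketch using dataclass field metadata; `LerpParams` and the `input_required` key are invented for illustration and are not InvokeAI options.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional, Union

# Both spellings are the same type, so a schema generated
# from them has nothing to tell them apart.
assert Optional[str] == Union[str, None]


@dataclass
class LerpParams:
    """Hypothetical node parameters, for illustration only."""

    # `input_required` is an invented metadata key carrying the extra hint.
    image: Optional[str] = field(default=None, metadata={"input_required": True})
    label: Optional[str] = field(default=None, metadata={"input_required": False})


def required_inputs(cls) -> List[str]:
    """List the fields that must get a value directly or via a connection."""
    return [f.name for f in fields(cls) if f.metadata.get("input_required")]
```

A UI or graph validator could then call `required_inputs(LerpParams)` and refuse to run a graph where `image` has neither a value nor an incoming edge.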
Ok, np. I've already got the UI set up to support ConnectionKind (input, direct) and ConnectionRequirement (always, optional, never). I think a combination of these two covers all possible cases
I'll be revamping the UIConfig class at some point before release to have a much more comprehensive set of customisations that the UI understands
Small suggestion since you're talking about connections - maybe we could make list inputs accept multiple connections?
so it will be easier with controlnet for example
there is a collect node that does this, but i've not set the UI up for it yet
it feels strange to create a collect node for a single controlnet input)
im working on the nodes editor in my spare time, the overall migration is the priority right now, so on the UI side i'm not able to address these things just yet
do you know how comfyUI handles this situation?
@sinful forge what if the collect node kinda functionality was baked in to to all fields? if the field type is an array, it can auto-collect. if not, it only accepts a single input?
right, ok. so it processes each control net serially? or is this just the UI presentation?
not sure 100%, but as i know each node just add info to list
gotcha
I also still not sure what better
as chaining looks... easier to understand? as you have 1 input
and with multiple inputs you need to check from where each came from and forgot you something or not
processing serially is a pattern that makes no sense.
Me and Gregg talked through what is usable to an end-user - a collection node or the ability for multiple outputs to collect on a valid "collection enabled" input seem to be the only things that are sound UX
My ideal would be having node ports with effectively their own collect functionality, so if three controlnets had their ControlNetInfo connected to the same control_info port on TextToLatents it would collect them into a List when the Node is executed.
Oops, I think I repeated what @broken blaze just said 🙂
just means we're on the same page!
🙂
Strong preference for this to be an explicit node
from logic perspective I too think that this good
but i think about situation when you have big graph and you have multiple controlnets
so, you need to check from where each input comes and count if you connect them all
You can always do some UI magic on top of it if you want to hide that, but handling it implicitly as part of edges would be a headache
I'm okay with a Collect node, but I see stuff built with ReactFlow where input ports can take connections from multiple output ports and it cuts down on visual clutter when this is a common pattern.
Again, you can do that as UI on top of the graph
Yeah I have no idea how hard it is to implement in the graph execution...
then you need to do so in every client (web, cli, ???)
else it's some kind of logic inconsistency between clients
It would make edge validation even more of a pain than it already is (and execution, due to value preparation).
also - how is the collection node implemented? as for now we can only input a constant count of fields
and this makes me think about a "too many inputs to connect" error from a hidden collection node 😄
It handles iterations as well.
Roughly, validation is mostly type-based. The first connection (input or output) can be anything, then the next connection (to the other side) must match the item type.
At execution time... well, without describing the entire system, the collect node has to be "prepared". It can't be prepared until all parent nodes are complete (and have produced results). Then all the results are collected into a list, which is produced as the output of the collect node.
It's one of a few special case nodes in the graph (the others being iterate and graph nodes).
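The preparation rule described above can be sketched roughly like this. It is a simplified illustration, not the code in graph.py; `try_prepare_collect`, `parent_ids` and `results` are hypothetical names.

```python
from typing import Any, Dict, List, Optional


def try_prepare_collect(
    parent_ids: List[str], results: Dict[str, Any]
) -> Optional[List[Any]]:
    """Return the collected list, or None if some parent has not finished."""
    if any(pid not in results for pid in parent_ids):
        return None  # not ready: at least one parent is still pending
    # All parents are complete, so gather their outputs in edge order.
    return [results[pid] for pid in parent_ids]
```

For example, with parents `["a", "b"]` the call returns `None` while only `a` has a result, and `[1, 2]` once both results `{"a": 1, "b": 2}` are available.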
I don't know code, but from my perspective it looks like painful place in type checking(it clearly disables it here)
and with autoaggregated fields you know that type of value or it's child must be this type
it's the X -> list[X] handling that would have to be done everywhere that would be painful
(where list[X] -> list[X] would also be valid)
yep, so what difference? you just check type twice
plus preparation would get more complex, since that would also have to be done at preparation time for every node type (instead of just for collect nodes)
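A rough sketch of what the extra X -> list[X] case would add to an edge type check. `edge_ok` and its `auto_collect` flag are hypothetical; the real validation in graph.py may look quite different.

```python
from typing import List, get_args, get_origin


def edge_ok(output_type, input_type, auto_collect: bool = False) -> bool:
    """Check whether an output type may connect to an input type."""
    if output_type == input_type:
        return True  # X -> X, which also covers list[X] -> list[X]
    if auto_collect and get_origin(input_type) is list:
        # X -> list[X]: only valid if implicit collection is allowed.
        args = get_args(input_type)
        return len(args) == 1 and output_type == args[0]
    return False
```

With `auto_collect=False` only exact matches pass, which is the explicit collect node situation; turning the flag on means every input of every node type has to handle the second branch.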
let's I say view from outside:
i see it like:
graph created
modifies somehow
start called, here happens validation
after - any other work with knowing that we already validate graph
i think i understand a bit - i need to read code about moving result from output to input
I mean, have at it: https://github.com/invoke-ai/InvokeAI/blob/main/invokeai/app/services/graph.py
It's been through ~3 revisions over 4-5 months and I still decided to stick with this way of doing it. I don't remember all the reasons I went with this approach, but it does currently work and constrains variable edge types to a couple use cases.
Is the intent still to have CollectInvocation working for v3.0?
Yep I meant in the UI too. Hmm, guess I should try it again, think I last tested in the UI a week ago. It's still denyListed in the UI.
node editor ui is wip, ive only cranked it out in spare time. collect needs special handling in the UI, not done yet
I've pushed a new branch to the InvokeAI repo with my latest work on ControlNet support: https://github.com/invoke-ai/InvokeAI/commits/feat/controlnet-nodes
PR to follow shortly. This has support for ControlNet integrated into TextToLatent nodes, and also includes 11 ControlNet preprocessor nodes for ControlNet v1.1 preprocessors. There are 3 other "experimental" ControlNet v1.1 processors I'm missing, but plan to include soon.
Thanks @violet sleet for template of how to support new connection types in the typescript!
BOOM
The only thing - i'm not sure about combining preprocessor with controlnet model
Aiming to get PR out this weekend.
Can be there situation when we already have preprocessed image and want just to load controlnet model? Without preprocessors
Yeah that's a compromise I made to support MultiControlNet -- which I still don't have working yet in the NodeUI. Haven't quite figured out the right incantation to output just an array of ControlFields from an aggregator node.
Ah right, forgot to mention above. That scenario is currently supported by a PreProcessedControl node (actually the base class for all other pre-processor nodes) that just passes the image unaltered via the ControlField.
So, i saw it like controlnet node with input image
And we can connect here preprocessed image or output from any preprocessor node
Or even use preprocessor itself without controlnet)
I think we can always iterate
This is great for now
I think its possible a user just wants the preprocessed image, but the question would be "for what"
And the answer to that is "no reason" right now - The only ingestion element is for processing w controlnet
Eventually, we could create a node for the "output preprocessed only", if its needed
We ought to sprint to getting Controlnet + Multicontrolnet working with the path designed here by Gregg, sounds like the only blocker for multi-controlnet is on the output side of the multi-controlnet aggregation node?
(are all of the other processing nodes in the PR?)
yep, looks like it
just to be clear what I mean:
Originally I implemented with preprocessors that just do the preprocessing and pass image to version of TextToLatent* that had control_image, control_model, control_weight, etc fields. But that TextToLatent* as a UI node was getting very unwieldy once I started adding multiple controlnet support. I'm still banking on fixes to CollectInvocation (or me figuring out the hacky version) to send multiple ControlFields to single port as a collection / array / list / tuple / whatever.
Hrmm... the only thing I'm starting to worry about is the number of parameters on TextToLatent
not so big... i remember at beginning it's bigger)
e.g. seamless also feels like something that's optional, and as something optional should probably be represented separately (maybe a pluggable list of additional things to do? or something?)
Progress images also seems like it should be more of a session setting than a node option
comfy for example have sampler and extended sampler
so, we can create generator with less parameters
Yep I think that's a totally valid alternative. One thing I liked about the preprocessor/controlnode combo is that the preprocessor used, at least for 99+% of use cases, can pre-determine which ControlNet model to use (or restrict to a few alternatives). Haven't implemented that yet but was part of the motivation for the combo.
and extended with all
Ya pluggable is good. Also I want to build a key frame style editor to inject manipulations (on eg latents, parameters, etc) at specific steps during inference.
Of course needs backend support
Some of this comes down to aesthetics and ease-of-use, more nodes vs bigger nodes. If subgraph collapsing gets implemented in node UI this may not be so much an issue.
good think, but in theory - what if user wants to try some new controlnet custom models?
Yah I was thinking something node-like, but it's just linear and you put them in a list
each can have parameters, but they take a latent and output a latent (or whatever happens at each step)
idealy I see for this case - textbox with hints for standard models, but values from list of standard models not forced
We already have a specific use case for intercepting the inference process at a specific point - the symmetry feature - it mirrors the latents at a certain point. Also I’d love to be able to do prompt interpolation, switch to a different model halfway thru, change cfg/strength at certain points, and so on. Keyframes afford all of that. Need the pipeline to do more with the step callback, probably needs diffusers support?
can't imagine how it's can look at nodes %_%
but sounds cool, remembers about word switching in a1111 prompts
@upbeat prism do you think it's possible to do textbox with list-helper values in node?
Can you elaborate? I’m not sure I understand. But we can do special handling as needed.
Would need to be a special node that expands into a much larger editor
in controlnet model input
hint user default models, but allow input anything else too
Ok, like a drop-down with default list, but also user can type something if they want?
yep
Sure. Need to provide ui hints in the invocation. I will update the uiconfig class soon to accommodate more stuff like this
@hollow marlin then your think looks ok)
Like type: list-with-free-text
Right now loading the ControlNet model relies on diffusers model.from_pretrained(). And plan is to move it over to new InvokeAI model management stuff from @broken junco .
And ControlNet model must be specified in text entry box. But I'd rather have a dropdown with the "standard" ControlNet models. And maybe a model override field where people can enter whatever they want. Or does reactflow have dropdown+freetext combo?
not by default, but sounds like we can do it)
Once again I typed before reading the previous two minutes of conversation and repeated what others were saying 😊
I think combined preprocessor-controlnet models too ok
at least convert from this nodes to separate nodes as I asked - easy
if there some bad cases will found
we might have duplicate code among nodes by combining, right? e.g. we'll want a canny node for other reasons anyway.
@upbeat prism Would you have time this weekend or early next weekend for a video chat to help with getting CollectInvocation working or alternatively help me figure out how to get array/list/collection output from a HackyCollectInvocation (hacky part is it has a separate input port for each entry that is output in the collection)
hm... I not sure what you asked(even with google %_%) but with combined nodes - you can't call preprocessor separately
I’ll do that this morning. Regarding UI elements, everything is fully custom and (almost) everything is possible.
Just let me know what you need and we can figure out how to make it 👍
Yep, I did consider that, but wanted to push forward with ControlNet support without worrying about separate image-processing possibilities. Technically there is a "raw image" output port from the ControlNet preprocessor to allow use of just the output preprocessed image (I'm using it to pass to ShowImage nodes). But yeah definitely clunky if all you want is the image preprocessing.
Getting the node composition (inputs, outputs, granularity) correct is pretty important. It'll be hard to change that sort of thing later.
I see it as:
- combined - a bit cleaner, can give user hints about models
- separated - can use preprocessors separately or directly load already preprocessed image
@sinful forge what better from your point?
The problems in #1 can be solved by UI and/or somewhat by subgraphs
I generally err toward greater flexibility, given nodes are for advanced users. Then solve any usability issues with UI/CLI.
...then it's second option?
So is that a vote for #2?
to be honest - I'm also more for option 2, but I see that these two options are easily switchable and the first option has some benefits for users with a bit less skill
We can’t expect to have everything right up front, so if the system isn’t flexible enough to handle us needing to change in the future we’re already doomed
But I’m confident it is, and that we just need to align on a pattern for how well handle inevitable change in the future.
It's backward-compatibility. You don't want to break backward-compatibility in the future
We’re going to have to.
we should really be versioning node interfaces probably
If you break back-compat you'll break everyone's libraries, yet again
i think @hollow marlin and I still not sure what option to select 😄
If we have no flexibility for change in the future and are trying to perfect things now, this will never ship
You plan to support versioned apis. Almost every product does this.
You may eventually have to support old version deprecation and graceful failure, but you should try your hardest to support API back-compat
so in general, parameter addition is fine over time, but parameter removal or changes will be breaking changes (i.e. new version)
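That rule of thumb could be checked mechanically when a node interface changes. A minimal sketch, assuming each interface is described as a plain mapping of parameter name to type name and that newly added parameters come with defaults; `classify_change` is a made-up helper, not an existing InvokeAI function.

```python
from typing import Dict


def classify_change(old: Dict[str, str], new: Dict[str, str]) -> str:
    """Classify an interface change as breaking, additive, or unchanged."""
    removed = old.keys() - new.keys()
    changed = {name for name in old.keys() & new.keys() if old[name] != new[name]}
    if removed or changed:
        return "breaking"  # removal or type change: needs a new version
    if new.keys() - old.keys():
        return "additive"  # only new parameters were added
    return "unchanged"
```

For instance, going from `{"image": "ImageField"}` to `{"image": "ImageField", "weight": "float"}` is classified as additive, while dropping or retyping `image` is classified as breaking.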
Right - I hear you on that.
That’s different than “we can’t make changes”. We should figure out how we’re managing versioning
yah but it's also true that the nodes you make now you should plan on supporting for a long time
and as we've explored things, the general pattern that seems to be emerging is to favor fine-grained and minimal/single functionality nodes
I'm all for that, but am also in favor of actually shipping something to users who have been waiting for 3.0 for nearing 3 months.
🙂
e.g. for above I would vote for #2. But if adding controlnet in the UI, I'd probably make it look like #1 (even though it would create 2 nodes)
I am also for #2, but for the most part it's already been built that way
I can see where an experimental approach to messing with pre-processed images before passing into the t2l node would be cool, but theoretically you could have a distinct "preprocessing node" with an image output that isn't built for controlnet.
I'm just hesitant to add more work to something that is so close to actually solving problems that people actually have, over solving for the less than .1% who are going to mess with preprocessed images (which, I'll add, is of purely theoretical value as far as I can tell)
I mean the same argument could have been made around outputting latents instead of images
and it's not really hard to split something like this into two (or more) nodes
plus, canny is used in inpainting, so better not to have two implementations
Don’t even get me started on latents.
What’s done is done
Alright - So your proposal is to split into a preprocessing node that feeds into the control node?
Are you willing to help with the work on either that, or the collection/aggregation that Gregg is currently working on?
He's having issues on the output side
and only needs UI for the node editor
im fixing collection UI this morning
If I understood correctly.
and I thought node UI was 3.1, so ¯_(ツ)_/¯
Right, but controlnet is potentially 3.0
its kinda proving essential to building out the nodes features
(so node design for the actual graph will be needed regardless of whether UI is done or not)
I'll go ahead and redo for option #2. The refactoring I did earlier should make changing it pretty quick.
Thanks Gregg
Do we have a plan ironed out for versioning yet?
I recall seeing some sketches for a class that would be used, but not sure if that was actually rolled out anywhere.
wow, so much preprocessors
unfortunately smth wrong with Zoe preprocessor for me(a, it's available only in main)
I think most breakers now it's - model manager, TI and lora)
I think Kyle was proposing versioning each node.
Node interfaces are part of the web API, which would be most sensitive to API changes.
In the node UI, I'm sometimes seeing errors if the preprocessor or controlnet have to download a large model that hasn't been locally cached yet. The server seems to keep going though and eventually gets the model downloaded and cached.
no, I mean ZoeDepth added only in main
in last release - 0.0.3 it's not exists
It needs to be separated imo for a ton of reasons.
- The preprocessed image is actually quite independent of what control model is being used. For example, a preprocessed canny image can be used with any single canny model out there. So if I wanna do two nodes with two different canny models, then if they are not separated, I am preprocessing the canny image twice. Instead we should just be processing it once and plugging that result into the control net loader.
- The preprocessor will remain consistent through iterations (unless the user wants it otherwise) .. so it is not necessary to calculate it each time. This is much easier to achieve if a node already has a result which it wont have if its a part of control net coz thats where the variables are likely to change.
- There are some different types of control net models that use the same kind of input -- such as scribble / softedge. It feels redundant to do the same hed map or pidinet map for them across three different control net models if we are using multicontrolnet.
- As for @hollow marlin mentioning the block was MultiControlNet ...we'll need to create a new node for merging Controlnet's .. and this merge node will have two inputs per control net => control net image and control net model + inbuilt attribute for control net weight .. then we pump these into an array and feed it to the pipeline as it expects it to be? Thoughts gregg?
Here's an example of the revised (option #2 above), separated image processor and controlnet nodes. Still some cleanup to do, but it works.
Time for dinner. Will respond later tonight.
Merge can be done in different approach, but I still think about chaining))
As then there no point with multiple inputs and you easily can monitor what connected
And it's look minimalistic for 1 controlnet)
Pardon my shitty handwriting
Oh... I totally against this option 😄
haha why? 😄
A lot of nodes
In chain - no nodes used
The collect node, as far as i know, collects unlimited inputs
how would this be done with less nodes?
coz you will inevitably have to load all the required components
i thought of a single collection / merge node
In chain - every next node adds controlnet info in array
In collection node - all controlnets connects to collection node and output is array
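The two styles can be compared with a small sketch. Both helpers are hypothetical stand-ins for nodes, and `ControlEntry` is a placeholder tuple rather than the real ControlField.

```python
from typing import List, Optional, Tuple

# Placeholder for one controlnet's info: (image, model, weight).
ControlEntry = Tuple[str, str, float]


def chain_controlnet(
    entry: ControlEntry, previous: Optional[List[ControlEntry]] = None
) -> List[ControlEntry]:
    """Chained style: each node appends itself to the list it received."""
    return [*(previous or []), entry]


def collect_controlnets(*entries: ControlEntry) -> List[ControlEntry]:
    """Collect style: one node gathers every connected entry into a list."""
    return list(entries)


canny = ("canny.png", "canny-model", 1.0)
depth = ("depth.png", "depth-model", 0.5)
chained = chain_controlnet(depth, chain_controlnet(canny))
collected = collect_controlnets(canny, depth)
```

Both routes end with the same list here; the difference discussed in the chat is in how the graph looks and how edges are validated, not in the resulting array.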
will the collection node have a dynamic number of entries?
or will it be set to .. lets say 5?
oooh
I don't know any about it)
so we feed x number of control nets to a collection node
and then we get an array based on that
but the thing is .. the array needs to be in a specific format
coz the pipeline takes images / models / weights in different arrays
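If the collected entries arrive as (image, model, weight) triples, splitting them into the separate arrays is a small transposition. A sketch under that assumption; `to_pipeline_args` is a hypothetical helper and the string values are placeholders for real images and models.

```python
from typing import List, Sequence, Tuple


def to_pipeline_args(
    entries: Sequence[Tuple[str, str, float]]
) -> Tuple[List[str], List[str], List[float]]:
    """Split (image, model, weight) triples into three parallel lists."""
    if not entries:
        return [], [], []
    # zip(*entries) transposes rows of triples into three columns.
    images, models, weights = (list(column) for column in zip(*entries))
    return images, models, weights


images, models, weights = to_pipeline_args(
    [("canny.png", "canny-model", 1.0), ("depth.png", "depth-model", 0.5)]
)
```

The three lists keep the same order as the input, so index `i` in each one still refers to the same controlnet.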
do we have a collection node for this already?
??
nothing nvm .. i found it. 🙂
It's just collect inputs to list
yep. i was wondering if we had base template that merges variant inputs .. but saw just now that the current collection based nodes we have are just list makers
So, collection node might be like your option, but with not only 2
the reason i didnt go for a collection node was .. if we cannot do dynamic inputs, then we'll end up with a big node that has empty inputs if a user wants to use only 2 for example
i dont know if we can do dynamic inputs right now .. can we?
I think more about - when we have a lot of inputs, we need to check everyone to understand that all ok)
the good thing we have going is that users familiar with nodes will find chaining merge nodes a very common use case. It's generally how it is done in other programs. So that's a thing.
Also we already told about chaining in prompts
yep
is your issue with chaining only that it makes for a lot of nodes?
or is there any other technical issue you foresee?
- i think it's easy to check that all ok in chain
- create collect node even when you have only 1 controlnet... A bit strange looks)
we wont need to .. the main input will take either single or mix of multiple
so as long as it is a control net output .. we dont have to use a collection for a single node
About this variant - which color then ui will use?
Array or controlnet
controlnet
so a user accidentally doesnt feed a random array
also coz the control net's array is definite .. with [img, model, weight]
It’s a bit of a tangent, but are nodes for advanced users? That seems like it’d conflict with nodes being the primary method of extensibility?
If you need matrix stuntwork let me know. I have lots of experience playing with Numpy array manipulation (and should be applicable to CuPy too I think since they share most syntax).
@hollow marlin the pypi release of controlnet_aux doesnt have zoe yet.. should i install it directly from patricks repo?
Still think about multiple inputs(which from where come to understand) and hacks work with both collection and not and work with special node
When chaining do this all by nodes without additional features and totally type safe) (collection node not so typed)
Chained:
- Easy implement, easy use
- sometimes not so easy to arrange nodes
Collect:
- ?
- requires special hardcoded node, maybe special handling for one input, user need to create this node(?), All connections comes to one place and make a mess if they not 2-3, requires special support from ui
I tend to believe Invoke (and SD ecosystem in general) is pro-grade tooling. Advanced in this context is just - “understands more than text in, image out”
A bit late (putting kids to bed), but the collect node can accept multiple inputs of the same type and put them in a list
IMO that’s selling yourselves short 😄
Someone who is writing Python to extend the system is definitely an advanced user, even if we make it easy to do so.
The nodes interface is then for the advanced "non-programmer" user. The pro who wants to do complex or experimental things with the building blocks.
Extension developer is advanced definitely. If I make a Dynamic Prompts extension, a normal user should be able to use it. That’s totally solvable in UI like you said though. If my extension can include some graph(s) of normal usage.
Yah, normal nodes user still needs to understand how things get put together, but doesn't need to know all the ins and outs of Python, typing, caching between GPU and CPU, etc.
Wanted to throw my opinion in here. I don't think requiring chaining is a good idea for building an item array, if we are worried about nodes becoming messy then just do what Blender did and add a "group" node and people can organize their nodes that way.
Versus the user of the other interfaces, which are a more bespoke experience
But don’t let me pull y’all off the actual topic. Just was curious about that point.
@tulip sluice @sinful forge
#1074572596663816242 message
Where am I wrong?
Node granularity I believe was one of the original topics for this chat 😛
I agree
We had decided on collections - when did that change?
I already have the collect node implemented (including unit tests!) and it's type-safe for the node system (type is enforced upon edge connection). It's just not implemented in the UI yet.
Now, a merge node may make sense in the case where you have a single output from two or more inputs, and want to e.g. control weighting on inputs. Then I could see merge chains making sense.
Are there already plans for a node similar to the "group" node in blender?
As i see it's subgraph?
Yah there's a graph node already
And a graph library
Probably needs more fleshing out for the UI though
(it works in the CLI today)
You mean with group inputs and outputs?
or just layout ?
i dont think arranging nodes is an issue
we'll have layout options and layout groups in the UI
I mean where you can create a new "group" type node, and then go into it and add nodes inside of it and create inputs/outputs to/from the group node
so once it is setup, doing that would be simple
ye we spoke about that .. i do want to do that some stage but i dont think we'll be doing that in the first iteration of the editor
you're familiar with Blender?
yes. been using it for a long long long time
okay cool, you know what I mean.
yep i do
ideally thats how we wanna do complex nodes so it becomes easier for people to build them and share as extensions
but that'll require more work to be done
current we're trying to get all the basic nodes in
so theres as much flexibility as possible
and the grouping will mostly be a front end thing
whats the biggest blocker for moving forward on this project? as in; are we more limited on work hours from contributors or does the situation lean more toward a case of the work being non-trivial?
asking because I may contribute depending on the situation.
this is what you want
and our UI framework supports it
we just need to implement it
you are more than welcome to start contributing. we could use as many hands on board as possible
there's no real blocker. we just ported the entire backend so we're bringing the current UI to parity to work with the new backend
which means the other tabs -- like generation and canvas
that is nearly there .. @upbeat prism is pushing it hardcore.
Once we're done with it, we'll shift focus on to the nodes editor
"why isnt it done yet" = more people 🙂
are you a front end dev? @loud helm
I am full stack, 12 years of experience.
then feel free to hop in if this is something you'd be interested in
see something that needs doing? just do it
im sure we'll find a way to incorporate it into the bigger picture of the release
but if you're doing something, just check in with us
so we can double check that you arent doing something that'll need changes later.
backend is python .. front end is react .. redux for state management .. react flow for the nodes UI
backend serves an API via FastAPI
and we do sockets for stuff that needs it
I wasnt complaining about it not being finished, I'm scoping out what the blockers look like for the project.
Oh I know - just saying, not done yet largely because we've got a few folks who are key drivers
And, frankly, we've only just gotten to a point where it wouldn't be toe-steppin
Ahh, I see.
Nodes is a move towards making it easier to jump in - Code has been a bit tightly coupled and required bunch of file traversing to get things done in the past. Now... nodes
And for the large majority, we're close to having the front-end fully ported - @upbeat prism is wrapping up the canvas migration, and I think we've then got non-nodes ControlNet UI to do and maybe some model management/config stuff and... then we're pretty close to 3.0 🚀
Are there plans to add more standard editing capabilities to the canvas?
Because I was considering making a small Photoshop plugin that dispatches to the InvokeAI backend.
Because then I could use layers, brushes, gradients, blending, magic wand, etc.
Not that I'm expecting Invoke to ever really fully support all of that, just wondering to how far the current plans go for Invokes canvas.
hahaha... @broken blaze and I spoke about this just yesterday
we do want to build a bridge
THAT SAID
I'd love to have most of the stuff you actually would use PS for in the canvas
I think 90% of PS is bloat
and not necessary for ai-oriented workflow
but yes. if possible, we'd love to have some of the functionality in the canvas directly
and use PS only for the stuff that is too hard in a browser
theres two things here: 1. create a stand alone plugin. 2: create a bridge
which one are you looking to do?
the first i presume?
I wouldnt say that so much of PS is useless for AI oriented workflow.
I've found that AI responds quite well when you are able to communicate concepts to it better. Particularly with inpainting, if you go beyond simple solid colors and do things like blends and smears and simple shading/blending work then it really gives better results when you want really specific stuff.
Don't disagree with that
But thats the 10%
Neural filters, the full array of effects, layer properties, etc.
Not necessary for the AI generation workflow
whats the 90%? The licensing for patented color wheels?
Lol, those pantone colors ain't gonna sell themselves
lmao
the good thing is with the new invoke you get an api .. when you launch invoke new.. you can go to localhost:9090/docs to get access to all endpoints
which you can directly use in the plugin
as far as the plugin goes, from what I read about manifest v5 , it looks like you have full access to XHR and websockets so it should be able to just directly commune with the invoke API
the real power is in the type of functionality you would want to have as a digital ai sketchbook. get color in, blend it around, do the "photobashing" type workflows - but ALSO with controlnet, you can get in there and rough in your image and have it stay true to your work
it will .. i had a scrappy version a long time ago and it works for sure
get some stylus pressure in there for folks with tablets and voila.
yea I use a wacom tablet and its pretty frustrating to use with invoke tbh
HTML5 input events should allow to fix the gaps yea
💯
pressure-sensitive brush, maybe a few more brush types,
I also have had this concept for a while of ai-oriented brushes
some kinda noise+color brushes
maybe the ability to change keybinds so I can use middle mouse for panning like im used to...
lol
that brush idea is a good one.
this'll be a thing .. probably in a few versions after 3.0 and once the node editor is in a good place.
Or, sooner. You know. People. 🙃
maybe the ability to have like an inpainting brush so you can give it a prompt for a kind of texture or something and set a blending multiplier and then paint a texture onto the image?
could see that
can also see tech limitations on that
but there's multi-diffusion which has yet to be implemented
and there would be some form of brush element to that.
regional prompts?
yeah thats multi-diffusion
its good to have a long list though
my real dread is when theres nothing to do
then theres nothing to keep you motivated
luckily we have so much to do - i feel so alive 💀
yep lol .. id rather have a 100 things that are pending than zero stuff to do
@hollow marlin seems like the branch is still on the version where the processor and the cnet are together .. ping me up when you push the separated changes. ill give it a shot then
also i couldnt find a control aux release that has the new changes in it .. unless im missing some obvious link somewhere
@loud helm if you end up building a PS plugin via invoke, do share with us. would be happy to follow progress.
if I do then id want to contribute it to the project, theres no way id maintain it forever and I hate doing wasted work lol.
do you guys plan on providing a docker compose setup for invoke so people can just "compose up" the whole thing?
I currently use this project to launch invoke: https://github.com/AbdBarho/stable-diffusion-webui-docker
if its robust enough and performs all basic modes, i think we can integrate it as a part of invoke .. adding any new features should be a relatively simple process after that, even if you dont maintain it ..
it'll need to be setup to work both with oss and commercial though .. effectively if people want local, they launch invoke in the BG or if they want to use the power of the cloud, they use it through the commercial subscription ..That'd be the way to go.
But @broken blaze thoughts on this would go further.
yeah i imagine it would be relatively easy to have an API dropdown. Do you want to connect to localhost or hit cloud API for image gen
yep
coz running PS and invoke in the backend for a lot of systems can be taxing
so a lot of people might prefer a cloud option
right
if you do make one, please use react ...i dont want to deal with vanilla js
my docker compose comment is a separate thing from the PS plugin btw.
I'm talking about a docker compose script for containerizing the running of the whole invoke stack.
So you dont have to deal with any setup or have actual stuff installed on your system folders or program files.
We had a docker template at one point
the man you need to talk to about docker is @low furnace ... I know zilch about it
Yeah
Also:
PS plugins only support restricted HTML for UI and restricted JS for logic.
you can use React via the UXP plugin and use the PS Template script inside react to make the UI
you'll still be using the core block elements from the PS script but you can make it real fancy if you want to
all new plugins should be done via UXP anyway
coz then you get easy way to deploy it to the adobe plugin marketplace
In this video, Simon Henke (a professional Photoshop Plugin Developer) shows how you can use the React library to quickly create advanced UIs for your UXP plugin.
Simon Henke: https://www.linkedin.com/in/simon-henke-69a4681b2
More Creative Cloud Developer Resources: https://www.behance.net/creativeclouddevs
incase you didnt already know
How much of the functionality would NOT be just IO to/from Invoke backend API in response to direct user input?
stuff like bounding box / mask generation and etc
basically any entry that needs to be fed to the api needs to be generated before hand
for mask gen I would just use the currently active selection marque, like the magic wand selection area.
yep
you'll need a python file to generate the masks in the way they are needed
for example
and then feed that to the api
you could probably do that with JS directly but ive never really made masks in js .. i cant see why it wouldnt be possible though
if by masks were talking about blocks of pixel data with an alpha field then yea you can use ByteArray in JS
ep
yep
we avoid the alphas.. just fill with black and white instead
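A minimal sketch of the black-and-white mask idea above, assuming the selection arrives as rows of RGBA tuples. `alpha_to_bw_mask` and the threshold are hypothetical, not part of Invoke's API, and which colour marks the area to repaint is an assumption to check against the endpoint being called.

```python
# Hypothetical helper: turn per-pixel alpha into an opaque black/white mask.
# Assumption: white (255) marks the selected area, black (0) everything else.

def alpha_to_bw_mask(rgba_rows, threshold=128):
    """Return rows of 0/255 grayscale values from rows of (r, g, b, a) tuples."""
    mask = []
    for row in rgba_rows:
        mask.append([255 if a >= threshold else 0 for (_r, _g, _b, a) in row])
    return mask


selection = [
    [(0, 0, 0, 0), (10, 20, 30, 255)],
    [(0, 0, 0, 200), (0, 0, 0, 17)],
]
bw_mask = alpha_to_bw_mask(selection)
```

The result carries no alpha channel at all, so the same rows could be written out as a plain grayscale image before being sent to the API.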
theres a plugin by abdul alfaraj ..?
that does this but with auto1111's api
maybe the plugin could like... request the HTML for its UI from the invoke backend directly? if that were possible then invoke wouldnt need to change the plugin in order to keep it in sync with the current development of invoke itself.
like invoke could send the HTML and then a description of what elements bind to which API endpoints
oh boy .. that's gonna be a rough thing to setup to work perfectly
i also feel UI and logic should always be independent
well yea, you could basically just give an ID to the HTML elements and then send a JSON structure that maps the backend API endpoints to have a set of inputs & outputs which are the element IDs. then in the JS the plugin can read the json description and just hook onto those element ids, right?
I mean thats a simplified statement ofc. but essentially it should be possible.
We do something similar at the company I work for, where we send over JSON schemas to describe the UI for customer analytics stuff
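A rough sketch of the "JSON structure that maps endpoints to element IDs" idea, written in Python for brevity even though the plugin itself would be JS. The endpoint path, element IDs and function name are all made up for illustration; nothing here reflects an existing Invoke API.

```python
import json

# Hypothetical description a backend could send alongside the HTML.
# Endpoint paths and element ids are invented for this example.
UI_BINDINGS = json.loads("""
{
  "txt2img": {
    "endpoint": "/example/txt2img",
    "inputs": {"prompt": "prompt-box", "steps": "steps-slider"},
    "outputs": {"image": "result-image"}
  }
}
""")


def build_request(operation, element_values, bindings=UI_BINDINGS):
    """Map values read from UI elements (keyed by element id) to an API payload."""
    spec = bindings[operation]
    payload = {
        field: element_values[element_id]
        for field, element_id in spec["inputs"].items()
    }
    return spec["endpoint"], payload


endpoint, payload = build_request(
    "txt2img", {"prompt-box": "an old man", "steps-slider": 30}
)
```

The plugin only needs to know how to read element values by ID; which operation to run is just a key into the description.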
the only thing that's going to be hard is that there will need to be some logic baked into the plugin that knows what operation it needs to do
unless the assumption is always 'inpainting'
thats what enums are for!
you just do a radio group of options .. if user picks txt2img we call txt2img endpoint
and etc
ah, fair.
beauty is in the eye of the beholder .. but yes .. tab is probably better
but options are mostly shared
so maybe a radio group aint too bad
Hey, so that localhost:/docs page you said invoke has.
Does invoke host that page under normal deployment?
Because its giving me a 404 on mine, but I am running it in a docker container so it might not be deployed the same.
you need to run invoke-new and then go to localhost:9090/docs
localhost:9090 is the endpoint host
and docs is just a place where fast api documents everything for you
I think there's also /redoc which should be cleaner (but not interactive)
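For reference, a small sketch of the documentation routes mentioned above. `/docs`, `/redoc` and `/openapi.json` are FastAPI's defaults; that this app keeps all three at their defaults is an assumption, and `doc_url` is just an illustrative helper.

```python
from urllib.parse import urljoin

# FastAPI's default documentation routes; host and port are the ones quoted above.
BASE_URL = "http://localhost:9090/"

DOC_ROUTES = {
    "swagger": "docs",         # interactive
    "redoc": "redoc",          # read-only
    "schema": "openapi.json",  # raw OpenAPI schema
}


def doc_url(kind, base_url=BASE_URL):
    """Build the full URL for one of the documentation routes."""
    return urljoin(base_url, DOC_ROUTES[kind])
```

A 404 on these usually means the server answering on that port is not the FastAPI app, which matches the docker case above.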
Oops sorry didn't get that update in earlier. Okay I just pushed new version (with separate controlnet and processor nodes) up to feat/controlnet-nodes
I've been using local clone of controlnet_aux repo and updating regularly to get latest changes. If needed I could fall back to earlier controlnet_aux v0.0.3 release version and in InvokeAI just comment out the Zoe preprocessor node until new controlnet_aux release comes out.
@hollow marlin Did you get the Collect node working (not in the UI - I know that doesn't work yet)
no its okay .. i did the same for budget voke .. so i have a copy with all the new detectors
sweet .. thank you. ill give it a run again shortly
@hollow marlin im getting a tensor size mismatch
The size of tensor a (96) must match the size of tensor b (64) at non-singleton dimension
at this
down_block_res_samples, mid_block_res_sample = self.control_model(
    latent_control_input,
    timestep,
    encoder_hidden_states=torch.cat([conditioning_data.unconditioned_embeddings,
                                     conditioning_data.text_embeddings]),
    controlnet_cond=control_image,
    conditioning_scale=control_scale,
    return_dict=False,
)
Yep that's my bad. I've only been testing with 512x512 pixel inputs. Somewhere in my code there's a FIXME for dealing with other resolutions. I'll try to fix today, but for now can you try a 512x512 control image?
didnt work
i thought tht was the reason
but it didnt work with 512x512 either
i presumed it might be control net 11 but not that either
Try setting noise to 512x512
In LatentToText I need to make sure control_image gets resized to match noise (or rather 8x the noise latent width & height).
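A small sketch of the 8x relationship mentioned above, assuming latents are 1/8 of the pixel resolution in each dimension. The helper names are hypothetical. Under that assumption, the 96 vs 64 in the error would be consistent with a 768-pixel control image meeting 512-pixel noise.

```python
# Assumption: latents are 1/8 the pixel resolution in each dimension,
# as described in the message above.
LATENT_SCALE = 8


def control_image_size(latent_width, latent_height, scale=LATENT_SCALE):
    """Pixel size a control image needs in order to match a noise latent."""
    return latent_width * scale, latent_height * scale


def latent_size(pixel_width, pixel_height, scale=LATENT_SCALE):
    """Latent size for a pixel resolution (must be divisible by the scale)."""
    if pixel_width % scale or pixel_height % scale:
        raise ValueError("pixel dimensions must be multiples of the latent scale")
    return pixel_width // scale, pixel_height // scale
```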
@upbeat prism in the ImageField component, is there a way to know which node it is being used in?
@tulip sluice Pushed a fix for resizing control_image to match noise latent, to feat/controlnet-nodes
FYI I'm away from keyboard rest of the day, I'll try to rejoin conversation this evening
Does anyone have a few examples of using nodes in CLI? I’ve only been using UI and want to switch to CLI for a bit of a faster inner loop, but I’m not sure what CLI docs I can trust 😂
right now it gets a nodeId, you can use that. we could just pass the whole node in so it has full context, probably good to do this
might be a good idea so we get full context
As I recall, the --help docs on the invocations in CLI themselves are updated
Ah so there's no way to just pass a JSON graph or something?
I think UI is probably easier than CLI, but that's just me 😛
no its piped commands
noise --seed 10 | t2l --prompt "an old man" | l2l --prompt "an old dog" --strength 0.8 --link -2 noise noise | l2i | show_image
for example
the UI seems easier than text, but who am i to say
suppose if autocomplete works you might be a keyboard warrior
you can open the API docs page localhost:9090/docs and create sessions from JSON there
Yeah that's what I think I'll do
how do i use the nodeId to ping what kind of node it is? sorry .. cant check myself right now so just keeping it for future ref
you can query redux state, state.nodes.nodes.find(n => n.id === nodeId), something like that
probably not the best way but that will work for now
gratzie
I'm not really sure how to test CollectInvocation as part of a full graph execution without the UI. Does the CLI handle multiple inputs to the same port on a node like CollectInvocation uses?
catching up after a few days off Discord, lol 😅 @loud helm I have a pretty good docker-compose setup in my fork and hope to get around (Soon™) to contributing it back to the project. The entire docker setup could definitely use a refresh. Hopefully just after 3.0 ill get it done. take a look here if interested, though i've not updated this in many weeks and things have evolved since. https://github.com/ebr/InvokeAI/blob/feat/docker/docker/docker-compose.yml.
similar enough to what I whipped up, except I've tried to separate the frontend and backends into separate containers/images.
yea that also works. we'd still need to serve the static frontend app for the non-dockerized setup - not everyone will want to use docker (weirdly enough)
also, have you tried the docker setup on a rocm system? i have no AMD GPU and no idea how that works (in terms of container runtime accessing the GPU), or how to test
(haven't looked into it rather)
I have an RTX4090 so I also havent used AMD lol
@upbeat prism - Do you want feedback/suggestions for nodes UI at this point? Trying to find the sweet spot of too early vs. too late 😁
@hoary pecan - he's said he'd love ideas on what it ought to look like, vs feedback on what it is rn
(because it's at a good point where we know it can be better, just need to make it better)
Is there going to be a general prompt node (pos and negative as one)? There is a single prompt/compel node and you have to use 2, which might cause confusion. At the least there needs to be a way to label them so you remember which is which.
Hello! Not sure it's yet time for feedback but I wish to report a small issue in the Nodes Editor UI, which probably will be fixed during its development. 🙂
Two biggest pieces of feedback on UI so far is:
- remove node really should be delete key, not backspace. I find myself slightly missing the text box, hitting backspace and deleting the node instead of text. Also Ctrl+backspace on Windows deletes a whole word, and if there’s no words left in the text box, it deletes the node (if I’m spamming Ctrl+backspace, this might just be a bug I can fix at some point 😅)
- Once the set of possible nodes grows just a bit more, changing the list will probably feel a lot better as a sidebar that can be pinned/unpinned like gallery and parameters.
Delete should have confirmation to prevent accidents.
seems logical
hotkeys will be done at the very end .. right now they're all at default with what react-flow uses .. and the list is a wip thing to help during development.
the node editor is very barebones in terms of ux at the moment.
but keep the ideas coming in .. will help us build it the way users might feel most natural
@sinful forge Will you have time to complete the review on the configuration system? https://github.com/invoke-ai/InvokeAI/pull/3340 I would like to work on the installation system now, and it will be easier if the configuration changes are in.
Oh yah, sorry... it was a challenging week x.x
Thanks so much. I sympathize...
Just a quick question: is anyone using the nodes on a networked machine? I am trying to, had to add host: true in to config, it all says its connected, and the webui loads fine, however every time I invoke, a "server error" pops up in web ui but with no error message in cli, it just says its loading model then nothing. I know no bug reports till its done but just wanted to check
You should have errors in the browser js console
@upbeat prism Ok will check when i am back in the office
there's a lot of ways to make a nonfunctional graph right now - we have the node validation disabled
@upbeat prism Thanks for the help, worked it out, somehow I managed to have an older version of nodejs installed on the server
huh, that should not have caused a problem with anything
@upbeat prism maybe it was the way it was installed or a corrupted file, but removing nodejs v16 off the server and installing nodejs v18 LTS fixed the issue I was having
good to know, would never have thought to try that
@sinful forge In an invocation node, how do you tell the difference between an input that accepts a simple value, such as the width in the noise node, and one that only takes its input from another node, such as noise in the t2l node? I'm trying to figure out how the graph editor knows to create an orange input dot and a numeric textfield for width and a magenta dot but no textfield for noise.
@upbeat prism would have to speak to the UI. I'm pretty sure it uses the field types in the schema to generate UI. On the backend, any fields that can't provide a default value should be marked Optional, and there's some additional schema you can set to indicate that they are required (they can be None upon creation, but must have a connection before execution). There's not a great way to represent that in Pydantic unfortunately 😑
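A sketch of the "None at creation, connected before execution" pattern described above. It uses a plain dataclass as a stand-in rather than the real Pydantic invocation classes, and `REQUIRED_LINKS` and `missing_inputs` are hypothetical names, not the backend's actual schema mechanism.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TextToLatentsSketch:
    """Stand-in for a node; not the real invocation class."""
    steps: int = 10
    noise: Optional[str] = None  # None at creation, must be linked before execution

    # Hypothetical marker for fields that need a connection before running.
    REQUIRED_LINKS = ("noise",)


def missing_inputs(node):
    """List required fields that still have no value or connection."""
    return [name for name in node.REQUIRED_LINKS if getattr(node, name) is None]
```

The node can be constructed freely, and a check like this would run only when the graph is about to execute.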
those are predefined UI elements assigned to those particular types being sent from the backend
if you go to InputFieldComponent.tsx it'll show you what element is being created based on what type of parameter it is
For the noise node .. it outputs a latent field
and the latent field component is defined here in InputFieldComponent.tsx
in the component you can see there is nothing ..so it returns nothing .. in terms of what needs to go for editing it
and the colors of the sockets are defined in constants.ts => so when the latent field is the output, the component generated is null and the socket generated is pink
so if you create a new output type, you need to create a new frontend component for it .. one time job
To elaborate on general sequence of events:
- Backend generates the OpenAPI schema in memory and serves it on localhost:9090/openapi.json
- UI, on start up, fetches the schema
- UI parses the schema, looking for nodes and building node templates from the schema. We can't easily use the OpenAPI schema directly - so the node templates are built as a more accessible representation of the schema.
- The node template includes things like the name and description of the node, but also its fields and return value. The fields are parsed into an inputs array, and the return value (output) is expanded into an outputs array. Some intermediary "types" are assigned to each input and output.
- When you add a node, the template for that node is retrieved and a UI component is built and rendered based on the template. Each input/output has a different UI component based on its "type".
- At the same time, a simple object is added to the UI's internal state to hold the value of all fields in that node
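The schema-to-template step in the list above could look roughly like this. It is a sketch in Python rather than the UI's TypeScript, and the sample schema, the "name ends with Invocation" rule and the reserved field names are assumptions for illustration, not the UI's actual parsing logic.

```python
# Sketch only: the sample schema and the "name ends with Invocation" rule
# are assumptions for illustration, not the UI's actual parsing logic.
SAMPLE_SCHEMA = {
    "components": {
        "schemas": {
            "NoiseInvocation": {
                "description": "Generates latent noise.",
                "properties": {
                    "id": {"type": "string"},
                    "type": {"type": "string"},
                    "width": {"type": "integer", "default": 512},
                    "height": {"type": "integer", "default": 512},
                },
            },
            "ImageField": {"properties": {"image_name": {"type": "string"}}},
        }
    }
}

RESERVED_FIELDS = {"id", "type"}


def build_node_templates(openapi_schema):
    """Build a simpler per-node template from an OpenAPI-style schema dict."""
    templates = {}
    for name, schema in openapi_schema["components"]["schemas"].items():
        if not name.endswith("Invocation"):
            continue  # not a node, e.g. a field type
        inputs = [
            {"name": field, "type": spec.get("type", "unknown"),
             "default": spec.get("default")}
            for field, spec in schema.get("properties", {}).items()
            if field not in RESERVED_FIELDS
        ]
        templates[name] = {
            "description": schema.get("description", ""),
            "inputs": inputs,
        }
    return templates


templates = build_node_templates(SAMPLE_SCHEMA)
```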
Every node is rendered as the same top-level node UI component. Then within that UI component, things like the handle colors and input areas are conditionally rendered based on the node template.
There's a good amount of missing logic in how the node templates are built and how they are used to generate the UI. For example, there needs to be more logic for some fields to limit if they can be connected-to or not, if they should be disabled in certain situations, and so on.
Additionally, there is a function called isValidConnection() that is used to validate connections. Right now, you can pretty much connect anything to anything - that's because the validation logic is skipped by an early return true. I've done this for now because the validation doesn't handle all cases properly yet.
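As an illustration of what a connection check could test once the early return is removed, here is a hypothetical type-matching rule in Python. It is not the UI's `isValidConnection()` logic, and the dict keys and type names are invented for the example.

```python
# Hypothetical rule: connect only matching field types, never a node to itself.
def is_valid_connection_sketch(source, target):
    """source/target are dicts with 'node_id' and 'field_type' keys."""
    if source["node_id"] == target["node_id"]:
        return False  # no self-connections
    return source["field_type"] == target["field_type"]


noise_out = {"node_id": "noise-1", "field_type": "latents"}
t2l_noise_in = {"node_id": "t2l-1", "field_type": "latents"}
t2l_steps_in = {"node_id": "t2l-1", "field_type": "integer"}
```

A real check would need more cases than this, for example inputs that accept several connections or that should be disabled in some situations.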
Thanks for the detailed explanation. So now I understand that there is no magic flag somewhere that distinguishes fields that can only receive values from edges from those that can be set using textfields, and that this is determined on a case-by-case basis by looking at the field type. I am working on the CLI again trying to make the inline help messages more useful. Currently the help makes it look like every parameter can be entered at the command line. For example:
invoke> t2l -h
options:
  -h, --help            show this help message and exit
  --link LINK LINK LINK, -l LINK LINK LINK
                        A link in the format 'source_node source_field dest_field'. source_node can be relative to history (e.g. -1)
  --link_node LINK_NODE, -ln LINK_NODE
                        A link from all fields in the specified node. Node can be relative to history (e.g. -1)
  --positive_conditioning POSITIVE_CONDITIONING
                        Positive conditioning for generation
  --negative_conditioning NEGATIVE_CONDITIONING
                        Negative conditioning for generation
  --noise NOISE         The noise to use
  --steps STEPS         The number of steps to use to generate the image
  --cfg_scale CFG_SCALE
                        The Classifier-Free Guidance, higher values may result in a result closer to the prompt
  --scheduler {ddim,ddpm,deis,lms,pndm,heun,heun_k,euler,euler_k,euler_a,kdpm_2,kdpm_2_a,dpmpp_2s,dpmpp_2m,dpmpp_2m_k,unipc}
                        The scheduler to use
  --model MODEL         The model to use (currently ignored)
  --seamless SEAMLESS   Whether or not to generate an image that can tile without seams
  --seamless_axes SEAMLESS_AXES
                        The axes to tile the image on, 'x' and/or 'y'
It looks like the positive and negative conditioning, and noise, can be set on the command line, but they have to be inputs from other nodes. The --steps argument really can be set, but in the help it looks just like the others.
I fooled around a little bit last night and changed the help message to this:
Generates latents from conditionings.
INPUT FIELDS:
  cfg_scale              ConstrainedFloatValue  The Classifier-Free Guidance, higher values may result in a result closer to the prompt
  model                  str                    The model to use (currently ignored)
  negative_conditioning  ConditioningField      Negative conditioning for generation
  noise                  LatentsField           The noise to use
  positive_conditioning  ConditioningField      Positive conditioning for generation
  scheduler              Literal                The scheduler to use
  seamless               bool                   Whether or not to generate an image that can tile without seams
  seamless_axes          str                    The axes to tile the image on, 'x' and/or 'y'
  steps                  ConstrainedIntValue    The number of steps to use to generate the image
OUTPUT FIELDS:
  height                 int                    The height of the latents in pixels
  latents                LatentsField           The output latents
  width                  int                    The width of the latents in pixels
And immediately began to wonder whether I could automatically change fields like scheduler to indicate they are settable from the command line as --scheduler.
BTW, are the dimensions of latents measured in pixels?
A too-late question here. What if v1 of the CLI uses the same linear model as the UI's non-nodes interface and sticks to the basics of txt2img and img2img and txt2img2img instead of implementing full nodes exposed to the user?
So you keep syntax that's similar/identical to the 2.3.x CLI but under the hood it builds and links nodes together.
Just thinking of a quick[er] way to get 3.x out the door with a CLI.
lixels?
I hesitate to admit I chuckled audibly at this
Oooh, spicy! Just on the edge, I like it!
How many lixels to get to the center of your image?
I think the issue here is that the nodes CLI is elegantly auto-generated from the node schemas, so writing a CLI that does everything in the linear style is actually a lot of work.
What we should do is define library graphs for the basics and then recommend using them. The library graphs allow you to expose specific ins and outs, hiding away all the internal connections (like conditioning and noise nodes).
The t2i library graph already does this and is a good example. @broken junco
That makes things clearer - if you see a primitive type, you can directly input it. But non-primitives require linking.
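The primitive-versus-linked rule could be applied to the help listing along these lines. This is a sketch only: the set of "primitive" type names is an assumption taken from the field types printed earlier in this thread, and `describe_field` is a hypothetical helper, not part of the CLI.

```python
# Assumption: these primitive type names mirror the ones printed in the
# help listing earlier; the real CLI may classify fields differently.
PRIMITIVE_TYPES = {
    "int", "float", "str", "bool", "Literal",
    "ConstrainedIntValue", "ConstrainedFloatValue",
}


def describe_field(name, type_name):
    """Return a help line saying whether a field is typed in or linked."""
    if type_name in PRIMITIVE_TYPES:
        return f"--{name} ({type_name}): settable on the command line"
    return f"{name} ({type_name}): must be linked from another node"
```

With this rule, `steps` and `scheduler` would show up as flags while `noise` and the conditioning fields would be listed as link-only.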
That's what I'm aiming to do. Full implementations of the basic text to image, image to image and inpainting using saved graphs, as well as a facility for building custom graphs and saving them. The basics are already there in the CLI and just need to be put together.
Sounds great
@upbeat prism / @broken blaze If you haven't looked at Flowise, it's a super similar usecase using Reactflow - https://github.com/FlowiseAI/Flowise.
Their add nodes UI:
I was looking at setting it up and I was like… hey wait a minute I’ve seen UI like this before… wait that’s the same control! Haha
Yeah the ControlNet pre-processors could definitely use this tree-like (or at least two-layer) typing to cut down on clutter in Add Nodes. There's currently 13 preprocessors and counting...
Will be even more with community nodes :p
Already have added tags to nodes, but they just aren't used yet
Ok I feel silly for asking, as I try to work things out or search for answers before I ask, but how do you get a start image in to this box ( I must be missing something simple)
you can either connect a result image (node output) into the Image input, or drag an image onto the image icon
UI hasn't been fleshed out quite yet, but will include an upload function
I have tried to drag the image in to the box, but that just uploads it to the upload area of the gallery
Can you try another browser pls? There's some difference in handling of the HTML drag and drop API that affects certain linux builds of Chrome (or maybe it was FF)
may need to use a library instead of the native APIs to fix it
Ok, currently on a windows machine, chrome and edge both show invalid upload when dragging from gallery, but firefox does work right
on ubuntu Firefox works correctly too, but chrome gives the invalid upload, now I know I am not going crazy
@upbeat prism thanks for the info, I will just use firefox for now
np, on macOS it works fine. Haven't tested on windows yet. Thanks for the report and testing, I'll get it sorted
I have the same problem with trying to drag from the gallery using Chrome on Ubuntu 22.04. Works fine on Firefox though. ( @upbeat prism I think you had already suggested I switch to Firefox temporarily to get around this issue).
just a question is the range node random order output?, I did 5-100 in 5 step increments, instead of 5 10 15 20 25 etc... it went 10 60 50 75 45 etc...
known issue related to how range/iterate nodes work internally. the images are not produced in the same order as the iteration output array.
the iterate nodes internally "expand" the graph downstream of the iterate node. so you give it range -> iterate -> txt2img -> img2img, and instead of the single txt2img -> img2img, it splits off to X txt2img -> img2img branches, where X is the size of the range.
so its like a map operation. thing is, this is a graph, and graph branches are not ordered like arrays. so there is no guarantee of the order of the X branches.
end result - you will get all the values back, but they won't be in order (unless by coincidence).
its tricky to solve this within the graph execution logic. you need some kind of flow control outside the graph to manage it. this does not exist in our implementation
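A toy model of the behaviour described above: the iterate step fans out into one branch per value, and the branches finish in no particular order. Tagging each branch with its position, as done here, is purely an illustration of what "extra context" would mean; it is not how the backend works, and all names are hypothetical.

```python
import random


def expand_and_run(values, run_branch, rng):
    """Fan out one branch per value and run them in an arbitrary order."""
    branches = [(index, value) for index, value in enumerate(values)]
    rng.shuffle(branches)  # graph branches have no guaranteed order
    return [(index, run_branch(value)) for index, value in branches]


def restore_order(tagged_results):
    """Sort results back into the order of the original collection."""
    return [result for _index, result in sorted(tagged_results, key=lambda p: p[0])]


steps = list(range(5, 105, 5))  # 5, 10, ..., 100
tagged = expand_and_run(steps, lambda s: f"image@{s}steps", random.Random(0))
ordered = restore_order(tagged)
```

Without the index tag, the only thing left to sort by is something like creation time, which reflects execution order rather than the order of the range.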
for simple iterations, we are considering using the existing queuing capabilities of the backend (currently unused) or adding some simple queuing to the UI
Thanks for the explanation, I was only doing a small batches so was not that hard to put them in order when I was pasting them externally, just had to check the metadata on each image (I was incrementing steps)
If the seed is incremented with each iteration going into txt2img, then can't you sort the output images by their seed after they come out?
You can iterate over any collection - so while it’s possible to sort by seed, that’d require a substantial amount of special handling
What I mean is that you’d need to have some context to know how the results should be ordered.
Currently they are ordered by creation time stamp. Ordering by anything else would require a lot of brittle logic (outputs would have to be somehow grouped based on a common upstream iterate node, and then sorted by the field that iterate node connects to, then its output list used as the source of truth for sorting).
Much better would be to process the iterated nodes in the right order, but I understand this to be a tricky solve
Maybe you could pass a sort hint along with the metadata?
My intuition tells me that would work but be a total dirty hack, leaving us with a really messy database and image metadata format
Maybe I’m misreading, but a Sort node?
@upbeat prism you are probably aware, but images are not getting resized to window size in txt to image, so chopping off large images at bottom and moving the arrow around
huh. yeah, that's not right
I found in txttoimagetabmain changing first height from 100% to 100vh stopped it, but I am sure that's not the right place to change it
here's the fix https://github.com/invoke-ai/InvokeAI/pull/3501
Works great with that pr, also fixes the bottom of the option panels being cut off too
merged
How to invoke the latest cli that is, scripts/invoke-cli.py with node invocation ?
Hey @devout parcel 👋
The CLI in its legacy format is indeed being deprecated with Nodes.
How a user can use a text2latent invocation with current CLI implementation then ?
`invokeai` command will now launch a new command-line client that can
be used by developers to create and test nodes. It is not intended to
be used for routine image generation or manipulation.
Currently users do run with `invokeai` and post that there will be an active command space to provide a prompt and other info
I believe that @broken junco is currently in the process of updating the command line interface, but I'm not entirely sure of the current recipe for getting it running
However, I do know that the nodes interface is a bit harder to work with through the CLI
Just to better understand - What's the goal for building with the CLI? Most of the capabilities/functionality you're looking to do shouldn't require an explicit arg to be created on the CLI itself
We've created a model manager service ( https://github.com/invoke-ai/InvokeAI/pull/3335 ) that will handle this type of model handling
@violet sleet probably can also share more of his thoughts on how to proceed with your work
We will look into this feature, by what you are saying we don't need anything to change in CLI. Our goal was to let user select the type of model which we thought would require changes in CLI
That will be handled by the model manager
We have also updated to use Onnx Execution Provider which takes advantage of the CPU for the inference pipeline, not sure on how the model rewrite works, will discuss with the team and check here
There will be some documentation/updates here, but I think we'll be able to better handle ONNX models once this is merged in and discussed. I anticipate us sharing a path forward here in the next day or two.
Thanks for the support and understanding, we do expect a clear path once the documentation/update is ready 👍
Hi @broken blaze any updates here
@devout parcel Yes! Model Manager was just merged in yesterday, the docs are still being updated, I believe, but let me try to put something together today for your use case.
The merge is here - you can probably parse through it a bit to see where things have changed, but you'll want to follow the patterns established in the model_management and models folder
@violet sleet - With ONNX, would the ONNX model be a new BaseModelType, or just alter the pipeline? I'm not as familiar
i think it would be a big change, as all generation logic now expects to get a UNet2DConditionModel
last time I tried something like this, I thought of adding a field like executor/provider to the model loader node
with values:
Torch
TensorRT
ONNX_CPUExecutionProvider
ONNX_CUDAExecutionProvider
and handle these values inside the model loading function
in this case - no new model types introduced
but model loading becomes "a bit" long 😄
// of course it's possible to do without this field, by loading in the format in which the model was saved and selecting the onnx provider based on the currently selected device in invoke, but I initially thought of this when I tried to compile a TensorRT model
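A minimal sketch of the executor/provider field idea above, using the four values listed. `Executor` and `resolve_backend` are hypothetical names, not part of InvokeAI; the only grounded assumption is that ONNX Runtime identifies its providers as `CPUExecutionProvider` and `CUDAExecutionProvider`.

```python
from enum import Enum
from typing import Optional, Tuple


class Executor(str, Enum):
    """Hypothetical values for an executor/provider field on the model loader node."""

    TORCH = "Torch"
    TENSORRT = "TensorRT"
    ONNX_CPU = "ONNX_CPUExecutionProvider"
    ONNX_CUDA = "ONNX_CUDAExecutionProvider"


def resolve_backend(executor: Executor) -> Tuple[str, Optional[str]]:
    """Return (runtime, onnx_provider) for the chosen executor.

    Only the ONNX entries carry a provider name; the runtime labels
    are illustrative and do not correspond to real loader code.
    """
    if executor is Executor.TORCH:
        return ("torch", None)
    if executor is Executor.TENSORRT:
        return ("tensorrt", None)
    # Drop the "ONNX_" prefix to get the ONNX Runtime provider name.
    return ("onnx", executor.value.split("_", 1)[1])
```

With this shape no new model type is introduced, but every branch of the loading function has to know about every executor, which is the "a bit long" part.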
Ah I see - so TRT and ONNX both share compatibility issues w/ LoRAs, TIs, etc?
onnx a bit less, but in general - yes
right now we implement lora via torch hooks
but for onnx/trt we need to add the lora weights into the model and run that patched model
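For anyone unfamiliar with "patching" here: merging a LoRA means adding the low-rank product to the base weight once, instead of applying it through hooks at runtime. A toy sketch of just that arithmetic with plain lists; `merge_lora` is a hypothetical helper, and real LoRA files also carry per-layer keys and alpha/rank scaling that this ignores.

```python
def merge_lora(weight, up, down, scale=1.0):
    """Return weight + scale * (up @ down) for nested-list matrices.

    weight: rows x cols, up: rows x rank, down: rank x cols.
    """
    rows, cols, rank = len(weight), len(weight[0]), len(down)
    return [
        [
            weight[i][j] + scale * sum(up[i][r] * down[r][j] for r in range(rank))
            for j in range(cols)
        ]
        for i in range(rows)
    ]


base = [[1.0, 0.0], [0.0, 1.0]]
patched = merge_lora(base, up=[[1.0], [2.0]], down=[[3.0, 4.0]], scale=0.5)
```

The patched matrix is what would get written back into the exported model, so changing LoRA weight means re-patching rather than flipping a hook.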
You don't think that deserves organizing it into a different model type?
i think initially we can do that to see how it works, and if possible - merge it back into the general model
if not - leave it as a separate type
aye
but as I see it now:
compel will fail
ti/lora - can't be applied
t2l/l2l - will probably crash (unsure, need to check the code)
what steps would @devout parcel and team take to implement ONNX?
give me some time I'll check code a bit
If you give them a step by step of what to work through, I think that'd be very helpful for them since we don't have docs yet
i'm more worried that our generation code is fully based on torch, so even if they load an onnx model it won't run
right - they'd need to write a new t2l node for the ONNX pipeline, right?
to be clear - we need to rewrite the current generation code too
right now I create the generation pipeline in a hacky way, but in reality we don't use the pipeline
but yes - currently they need to implement new t2l/l2l
Fwiw - https://github.com/aluhrs13/iai_node_onnx - is t2i for olive models that I haven’t tried to adapt to handle compel or new model manager
It’s the bottom of my list to build out more though.
I get that- SAMHQ is hot
I’m conflicted lol - DeepFloyd vs SAMHQ
Hi @broken blaze, continuing @devout parcel's discussion: we have submitted a PR for an ONNX model pipeline, designing a new model-type-based pipeline structure, in PR #3380 https://github.com/invoke-ai/InvokeAI/pull/3380. We are looking at the changes made for the new pipeline with node-based inference. With the CLI, we are trying to take user input to select a PyTorch- or ONNX-based inference pipeline. So we would like to know whether the CLI node-based pipeline is ready; if so, please share some pointers to the node-based documentation and a sample CLI to test. If the CLI-based pipeline is not ready yet, can you share some pointers on how the node structure is being implemented for the GUI?
@violet sleet can likely comment but I don’t believe we’d recommend using CLI to make that designation.
The node structure is being implemented as a visual graph based editor. Users would select the model in the “model loader” node, the first node in a graph
I have no idea how to use the cli with the current node backend))
And about the onnx implementation, i decided to look at it only after we're done with the main logic, so not until the first 3.0 version
But the general problems: the current logic is designed to work with torch classes, and we haven't implemented pipeline loading in the model manager yet; all logic is done separately in nodes (most generation code is still combined in the pipeline class, but that's because not everything is rewritten yet, and this pipeline is initialized in a hacky way with only the unet)
// honestly, i have no idea how to implement features like clip skip in onnx
@hoary pecan @tardy nova I read your code while trying to understand "what needs to be done to add something like onnx in our architecture"
after an hour, I found that i had a basic skeleton and... in 4 more hours had something very functional.
it's still draft, but I think I have gotten the majority of the integration work completed for ONNX, including other support nodes (there are likely some things you can assist with) - thanks for your contributions thus far, I got acquainted with ONNX from the PRs you submitted.
I'd love for you to take a look at the PRs and get involved
Hi @tardy nova and @devout parcel ,
Generally, to understand how the CLI and GUI work, I'll suggest you review the docs in https://github.com/invoke-ai/InvokeAI/tree/main/docs/contributing to understand how the nodes engine works in broad strokes.
Correctly implemented nodes (aka "invocations") will automatically show up in the GUI nodes editor. For the other application tabs (the "linear" tabs: Text to Image, Image to Image and Canvas), we need to manually create UI elements to allow the user to interact with different nodes.
The node editor automatically generates a graph from the node GUI, but for the linear tabs, we need to manually create the graph.
Ideally, you do not need to worry about the GUI side, and the process by which the nodes are parsed and UI templates created is fairly involved.
The CLI is autogenerated from the nodes as well, but as you can imagine, working with graph structures on a CLI can be tedious. It's much simpler to work using the GUI.
You can help us to help you by clarifying your goals in working with and on InvokeAI:
- Do you intend to build on top of InvokeAI, use InvokeAI, or something else (eg only contribute to it)?
- If you intend to build on it, how do you plan to interface with it (eg CLI, GUI, programmatically)? The more specifics the better.
We understand you need ONNX support and have already made a substantial PR. Thank you for this and we apologise for it sitting there so long. We are in the middle of a large migration effort to a new architecture, as you may know. Thanks for your patience.
PR referenced - https://github.com/invoke-ai/InvokeAI/pull/3562
Sure, thank you @upbeat prism for the documentation and the heads up on the node structure. We will go through the information shared. @violet sleet thanks for the PR. It is great; we will take a close look at it and help with further development where possible.
Sorry to post this here, I didn't want to post in invoke-chat. Is anyone else having issues on main with output images not being made? The latents are getting made, but nothing is going into the image folder, and then I get 404 errors in the console because the image is in the db but not in the folder. (Just making sure it's not a me issue)
It could be both a "you" issue but also an "invoke" issue 🙃
Is this a clean install or a migrated install? Clean db?
Migrated but clean db
you might see if you can go into settings, change your log level, and then try it again to see what's erroring out
the thing that is throwing me is uploads go to outputs folder and can be retrieved
fine
Think you'll need to expand the data - can't quite see what the invocation error was
here you go
"Traceback (most recent call last):
  File "/home/invokeuser/InvokeAI/invokeai/app/services/processor.py", line 70, in __process
    outputs = invocation.invoke(
  File "/home/invokeuser/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/invokeuser/InvokeAI/invokeai/app/invocations/latent.py", line 436, in invoke
    vae_info = context.services.model_manager.get_model(
  File "/home/invokeuser/InvokeAI/invokeai/app/services/model_manager_service.py", line 224, in get_model
    model_info = self.mgr.get_model(
  File "/home/invokeuser/InvokeAI/invokeai/backend/model_management/model_manager.py", line 434, in get_model
    model_path = model_class.convert_if_required(
  File "/home/invokeuser/InvokeAI/invokeai/backend/model_management/models/vae.py", line 92, in convert_if_required
    return _convert_vae_ckpt_and_cache(
  File "/home/invokeuser/InvokeAI/invokeai/backend/model_management/models/vae.py", line 149, in _convert_vae_ckpt_and_cache
    checkpoint = torch.load(weights_path, map_location="cpu")
  File "/home/invokeuser/venv/lib/python3.10/site-packages/torch/serialization.py", line 791, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "/home/invokeuser/venv/lib/python3.10/site-packages/torch/serialization.py", line 271, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "/home/invokeuser/venv/lib/python3.10/site-packages/torch/serialization.py", line 252, in __init__
    super().__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: '/home/invokeuser/userfiles/models/core/convert/se-vae-ft-mse'
"
however that folder and its file are there
Ah ha!
It's because it's a relative path
There's a fix in the works on that - It may have been merged. When did you last pull main?
about 30mins ago
is it this one ? https://github.com/invoke-ai/InvokeAI/pull/3607
Possibly, yes
There may be a few 'relative' things in flight
but you might try pulling it to see
yes, it's se because it was sd earlier, so I changed that to see if it would fix it
ah, lol
definitely a relative path issue, but only with the vae option in models.yaml. I tested by changing from vae: models/core/convert/sd-vae-ft-mse to the full path vae: /home/invokeuser/userfiles/models/core/convert/sd-vae-ft-mse and now no error
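For context, the usual shape of a fix for this kind of bug is to resolve configured paths against the install root before opening them. A minimal sketch, not the actual change in the linked PRs; `resolve_model_path` is a hypothetical helper and the root directory is taken from the paths in the message above.

```python
from pathlib import Path


def resolve_model_path(configured: str, root: Path) -> Path:
    """Resolve a path from models.yaml against the install root if it is relative."""
    path = Path(configured)
    # Absolute paths are used as-is; relative ones are anchored at the root
    # rather than at whatever the current working directory happens to be.
    return path if path.is_absolute() else root / path


root = Path("/home/invokeuser/userfiles")
vae_path = resolve_model_path("models/core/convert/sd-vae-ft-mse", root)
```

Both forms of the `vae:` entry then end up at the same location, regardless of where the server was started from.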
this was the other issue: https://github.com/invoke-ai/InvokeAI/pull/3610
merged. thanks for catching it
Will the nodes support the safety checker? Noticed that the model is not loaded in the TextToLatents node
@violet sleet - do you know how/when the safetychecker is loaded in 3.0?
there is no code for the safetychecker currently, as it's not a priority
but generally i think it will be added as one more output from the model loader node
and then a safety checker node to run it on the generated image
Alright, safety checker as its own node makes sense. Thanks
is it a totally separate model that can be run arbitrarily on image / latents? or integrated w/ unet etc?
it's like vae
can be provided and run separately
ok cool. why would we want it on the model loader node then?
tbf .. every submodel can be run independently
Even after initializing the backend with script\invoke-web.py
I am able to see only this node. L2T, Prompt and others aren't available
Did you add any custom nodes or anything else that might be breaking your node discovery?
please check your browser's javascript console for errors - i expect there to be some there
I'm trying with the latest main only; there are similar messages in the dev server console too.
Meta question: are ppl working on nodes familiar with the feature set of both Blender node editor and Comfy? so much to learn, from a conceptual viewpoint
like nested subgraphs ("groups" or "recipes") are an important building block for composition and being able to create a community around sharing node graphs
Yes.
Node Editor is in "alpha/experimental" state - It is not the final UI/UX and purely for folks to poke/prod at nodes in advance of the full editor release.
Many features that will be supported do not currently exist, and as the developer of it has called out - "There are no mitigations for footgunning currently in place"
Note that there is support in the graph engine for some things that aren't in the UI yet as well (including subgraphs).
awesome! good to know
another small thing that I find super useful for auto-documentation purposes is color-coded types for the I/O ports
vae/number/text/...
it looks like you haven't actually started the python server
this already exists - there's a button to toggle the legend in the bottom left corner. only thing is, when i added that, there were like 5 field types. now there are like 15. so color is no longer very useful.
anyways, the whole thing needs to be redone before release, so we'll find a better solution for this
I was running it on Windows and not WSL/Linux; now I'm able to access them.
@violet sleet We are facing issues with the setup in this branch, https://github.com/invoke-ai/InvokeAI/pull/3562#issuecomment-1651577818. Model installation also fails in this branch; could you please add the steps to be followed for this?
This branch is pretty far behind. We have made great progress on ONNX thanks to @violet sleet and @opal arch
I offer @violet sleet all the credit. I'm currently merging main into the branch
Yeah, thanks @opal arch. Please let us know once it's done; we're waiting to test it
@devout parcel feat/onnx should now be up to date with Main again
It seems something was missed in the merge: this alone has to be removed, where the import appears twice in invokeai/frontend/web/src/features/nodes/components/fields/ModelInputFieldComponent.tsx
oh weird, I'll make sure nothing is sitting stale on my local
fixed @devout parcel
I have been struggling with input and output types while trying to write my xyGrid node. #1133465385182699582 . What I wanted to do was have the XYCollector take in X,Y as list[Any], then output a single list[Any] that was a product of the input lists, which could be passed to the iterate node, but for the life of me I couldn't get it to work. I also tried Union[int, list[int]], the same for float, and other combinations etc. Basically I was trying to accept int and float into the same node inputs so I could create an output array without worrying about the input types. Eventually it would need to accept all the types that can be passed as part of a generation. In the end I just converted everything to strings and then back again after the iterate node. This works but seems really inefficient. Any suggestions, or is this kind of thing coming in the future of nodes?
What does this use for translation? Very cool
This came up on the US copyright office's webinar around limited accessibility for AI
It uses a pool of translation services. How it's actually working is pretty opaque as the API is going to a regional server (mine is located in Ontario) and then "something happens" at the server side. The list of supported translation services are here: https://pypi.org/project/translators/
There is some control for which translation services will be used. For example, you can exclude China-based services.
There's some runtime type checking that needs to be a bit more sophisticated to handle this. It uses some python functions that do not play nicely with Any, Union, etc. We figured out accepting Union[int, list[int]] - i.e. a type or list of that same type - but haven't addressed your use case yet
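To illustrate why `Any` and `Union` need special handling at runtime: `isinstance` cannot take them directly, so a checker has to unpack the annotation first. A rough sketch using only the standard `typing` helpers; `matches` is a hypothetical function, not InvokeAI's actual type checking, and it covers only plain classes, `List[T]`, `Union` and `Any`.

```python
from typing import Any, List, Union, get_args, get_origin


def matches(value: Any, annotation: Any) -> bool:
    """Check a value against a small subset of type hints."""
    if annotation is Any:
        return True
    origin = get_origin(annotation)
    if origin is Union:
        # A value matches a Union if it matches any member.
        return any(matches(value, arg) for arg in get_args(annotation))
    if origin is list:
        args = get_args(annotation)
        item_type = args[0] if args else Any
        return isinstance(value, list) and all(matches(v, item_type) for v in value)
    # Plain classes such as int, float or str.
    return isinstance(value, annotation)


IntOrInts = Union[int, List[int]]
```

Here `matches(3, IntOrInts)` and `matches([1, 2], IntOrInts)` both pass while `matches([1, "a"], IntOrInts)` does not, which is the "a type or list of that same type" case mentioned above.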
is there currently a way to have a list option in a node? like what model loader node has but for other files?
I have workarounds for now so no real rush. I was more checking that I hadn't missed something obvious. I have a working version with labels on the grid and consistent grid layout, just need to do some code tidying then will release it for public consumption.
No rush from my side, it's much better to get the refactoring done right than to rush something in just for me.
'Union[Any|list[Any]]' was what I originally tried to use to collect the x and y items. In my mind an ideal would be a custom version of iterate that could take in the x and y item arrays and create a product array and then output the x and y items on the other side. That would replace what takes 7 nodes now with just 1 with no loss of functionality. Then something similar could be done on the collect and grid production side of things. I am confident there is an ideal way of doing this that isn't necessarily what is going on in my head 😂
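The "product array" part of that idea is small in plain Python. A sketch assuming the X and Y items arrive as ordinary lists of mixed types; `xy_product` is a hypothetical helper, not an existing node.

```python
from itertools import product


def xy_product(x_items, y_items):
    """Pair every X item with every Y item, X-major, keeping original types."""
    return list(product(x_items, y_items))


pairs = xy_product([1, 2], [0.5, "euler"])
```

Each pair keeps its original int/float/str values, so nothing has to be round-tripped through strings; the hard part in nodes is typing the ports, not computing the product.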
great work on the nodes gui. since nodes are supported now, consider nested nodes or trees of them. take inspiration from behavior trees in a game engine.
you can use a Literal['one value', 'another value', 'third value'] and it will be a select dropdown
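A small sketch of what that looks like on the Python side. `ExampleNode` is a hypothetical dataclass stand-in, not InvokeAI's real node base class; the point is only that the allowed values are recoverable from the `Literal` annotation, which is what lets a UI build a dropdown.

```python
from dataclasses import dataclass
from typing import Literal, get_args, get_type_hints

Mode = Literal["one value", "another value", "third value"]


@dataclass
class ExampleNode:
    """Hypothetical stand-in for a node with a select-style field."""

    mode: Mode = "one value"


def dropdown_options(node_cls, field_name):
    """Read the allowed values for a Literal-typed field from its annotation."""
    return get_args(get_type_hints(node_cls)[field_name])


options = dropdown_options(ExampleNode, "mode")
```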
Union[Any|list[Any]] is nodes' worst nightmare lol
If it was easy then it wouldn't be fun. Well that's how I lie to myself when things get tough 🙂
Wondering if it's doable to populate the literal list dynamically with files in a directory. I suppose just iterate the files and add to the list.
I'm kind of in the same boat with one of my nodes, faceoff. You can specify a face id to select a face in the image to process, but it would be great if you could specify multiple (like in facemask) then output them as a package (image, h/w, x&y coordinates), and iterate through all the faces in the image. Kind of similar.
yep, just need to get the list populated on app start. i don't think it's possible to refresh the list without restarting the app, though
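A sketch of populating the list at import time, assuming the files live in one directory. `literal_from_directory` is a hypothetical helper; subscripting `Literal` with a tuple works at runtime, but static type checkers will not understand the resulting type.

```python
from pathlib import Path
from typing import Literal


def literal_from_directory(directory: Path, pattern: str = "*"):
    """Build a Literal type whose values are the file names in a directory."""
    names = tuple(sorted(p.name for p in directory.glob(pattern) if p.is_file()))
    if not names:
        raise ValueError(f"no files matching {pattern!r} in {directory}")
    # Literal[("a", "b")] is equivalent to Literal["a", "b"] at runtime.
    return Literal[names]
```

Because the type is built once when the module is imported, files added later will not show up until the app restarts, matching the caveat above.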
ya I didn't think about that, but makes sense
I will be uploading my xygrid soon. you will have to take a look at how I handled it for my xyimage bundle. I am sure there are better and more elegant ways but it works for me atm.
Just updated my image to grid nodes #1133465385182699582 - it now supports correctly ordered XY Grids with labels. Feedback or suggestions are most welcome.
That is awesome. So great for making test outputs like that!
The node graph is a bit excessive in size, as it uses more nodes than I would ideally want. But that is the types issue workaround. Let me know in the forum what you think if you use it
Will do! thanks