#Invoke Backend (Node-Based Backend)


misty cedar
#

Starting a thread to discuss and share progress. Current progress is in branch invoker-framework, and can be run with python scripts/invoke.py with --api for the API version. Please no pull requests to the branch without prior discussion with @misty cedar - things are stabilizing, but there are still some major components to complete that may involve refactors. Until then, enjoy a preview of what's there!

#

To-do:
(rough list, probably incomplete)
✅ Image management. Images in outputs need to be references to something on an image manager, which should manage load/save/caching of images. The idea being that we can retain N images in-memory for pipeline usage, then load the rest from disk as-needed.
✅ Context history: Provide a way to get previous contexts
✅ Context save/load
⬛ Remove unexecuted nodes from a context
Socket.io: I need to prototype this again.
✅ Events/signals: Need to define some standardized signal format and then hook signals everywhere in the invocations. This will probably be something like { "context_id": "...", "invocation_id": "...", ... }.
✅ API: There's no API yet! Well, there's one POST that lets you run a full JSON graph, but I need to build an API. Need everything else in place first though.
⬛ Image metadata. No idea how I'm going to pipe that in, and I don't feel like we settled on what image metadata should even look like when you could have a giant node graph generate images, and lots of input images.
⬛ Iteration. I've been leaving iteration to the end to avoid dealing with control flow. I'm thinking I'll do loop unwrapping or something, but not sure if that happens up-front (even all the way up to the UI) or dynamically at run-time (e.g. see an iteration node and generate all the iterations from its links).
⬛ Disable some nodes in the API (e.g. you don't want show_image popping up images on the host machine if you're accessing it remotely).
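
The image-management item above (keep N images in memory, load the rest from disk as needed) could look roughly like the following. This is a minimal sketch, not the branch's implementation: `ImageCache`, `max_in_memory`, and storing images as raw bytes are all assumptions made for illustration.

```python
from collections import OrderedDict
from pathlib import Path


class ImageCache:
    """Hypothetical LRU cache: keeps the N most recently used images in memory."""

    def __init__(self, directory: str, max_in_memory: int = 8):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)
        self.max_in_memory = max_in_memory
        self._cache = OrderedDict()  # image name -> raw bytes

    def save(self, name: str, data: bytes) -> None:
        # Always persist to disk so an evicted image can be reloaded later.
        (self.directory / name).write_bytes(data)
        self._remember(name, data)

    def get(self, name: str) -> bytes:
        if name in self._cache:
            self._cache.move_to_end(name)  # mark as most recently used
            return self._cache[name]
        data = (self.directory / name).read_bytes()  # cache miss: load from disk
        self._remember(name, data)
        return data

    def _remember(self, name: str, data: bytes) -> None:
        self._cache[name] = data
        self._cache.move_to_end(name)
        while len(self._cache) > self.max_in_memory:
            self._cache.popitem(last=False)  # evict the least recently used image
```

Outputs would then carry only the image name (or URI), and nodes would call `get` when they actually need the pixels.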

#

Newest tonight:

  • The cli now supports command pipes (e.g. txt2img --prompt "a photo of a cat eating sushi" | upscale | show_image).
  • There is now a context manager to manage context objects (in-memory currently, no history, and not super clean).
  • All invocation (node) processing is done on another thread. Currently a single node at a time - I'll leave the problem of multiple nodes at a time for someone in the future (since it gets really complicated with all the nodes and their needs/performance implications). Invocations can be awaited with a simple wait function call on the context.
  • Eventing in-app is fixed. The event system should easily support eventing via external services as well, but I'll not be proving that out with v1. It should be pretty straightforward to get socket.io hooked up again (though I'm sure I'll run into something weird).
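
As a rough illustration of the command-pipe idea above, a piped command line can be turned into a linear node graph. This is only a sketch under assumptions: `parse_pipeline` is a hypothetical helper, arguments are assumed to come as `--name value` pairs, and the link shape (`from_node_id` / `to_node_id`) is a guess rather than the branch's actual format.

```python
import shlex


def parse_pipeline(command: str) -> dict:
    """Turn 'a --x 1 | b | c' into a linear graph of nodes and links."""
    stages, current = [], []
    for token in shlex.split(command):
        if token == "|":
            stages.append(current)
            current = []
        else:
            current.append(token)
    stages.append(current)

    nodes, links = [], []
    for index, tokens in enumerate(stages, start=1):
        if not tokens:
            raise ValueError("empty pipeline stage")
        node = {"id": str(index), "type": tokens[0]}
        args = tokens[1:]
        if len(args) % 2 != 0:
            raise ValueError(f"expected --name value pairs in stage {index}")
        for flag, value in zip(args[::2], args[1::2]):
            node[flag.lstrip("-")] = value
        nodes.append(node)
        if index > 1:
            # Each stage feeds the next one, like a shell pipe.
            links.append({"from_node_id": str(index - 1), "to_node_id": str(index)})
    return {"nodes": nodes, "links": links}


graph = parse_pipeline('txt2img --prompt "a photo of a cat eating sushi" | upscale | show_image')
```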
#

Once I've got the core done, I'll clean up old code. At some point after that, it'd be great to rebase on development (or more realistically, create a brand new branch and manually migrate my changes, since they mostly sit on top of the current code right now, and a manual copy would be less hassle).

Things I could use help on after that (or leading up to it, depending on the task):

  • Building invocations (nodes)! I've done txt2img, img2img, upscale, gfpgan, and some image utilities. There's a lot left to do, but they're pretty straightforward 🙂
  • Breaking apart the Generate class (or at least the function calls). Ideally we have a function call per invocation. I've done what I can here, but wanted to keep my changes minimal until we work on merging efforts.
  • More flexible configuration. I'd really like to set up configuration to parse to a well-defined object and utilize that during service definition. It'd be awesome if configuration could set up a machine to just run as a queue processor, for example, with the context manager replaced with remote calls to the hosting service.

Probably more things I'm forgetting.

terse hill
misty cedar
#

/ldm/dream/app/invocations in the branch contains all the current invocations. They're made available in the CLI automatically, and you can also read the OpenAPI doc when running the API to get the full schema for each.

I believe @tame remnant has at least prototyped reading the OpenAPI doc to generate nodes from.

Each node also has a stable type name, so if you wanted custom UI for a particular node, that should be possible.

#

We can also add additional properties to fields that the UI could pick up and utilize, if necessary. Preferably we minimize that to not mix too many UI details in the backend code, but it's possible.

tame remnant
#

@terse hill There are two ways to make a Node UI: automatically generated from the code that describes the invocation, and hand-crafted. Right now all nodes are generated automatically

terse hill
tame remnant
#

I've got my UI to work with the current changes to invoker-framework, no need to change any code there. I'd like to submit a PR with a simple_prompt.py invocation - that ok?

#

I'm slowly working out parsing and inferring UI from the OpenAPI doc. Without any UI details in the invocation I need a lot of logic but it's certainly doable. If I could just add a uiHint dict like discussed, it becomes sooo much simpler. What if we have an Invocation file, and then another file that describes its UI? I will still get things working to infer almost everything tho.

#

re: TextToImageInvocation & ImageToImageInvocation - should model and progress_images be there? I feel like those are system settings that don't need to be in any node

#

One UI hint I think is required is which parameters should have a handle to be linkable. For example, if I want 'prompt' to require an input (maybe this isn't how it should be but just for an example), the schema needs to indicate this somehow.

#

If the Field is optional, the schema shows no default value. I can use this, but then what if we have some other field which is optional but also does not require a link? I think there needs to be a flag for fields that require input links

#

Hmm. Ctrl+C no longer kills the script properly.

#

I mean invoke.py --api

#

Another thing that is needed in the invocation somewhere - for string types, do I show a small HTML input element or a multi-line textarea? I'm going to update my fork's invocations with the minimum UI hinting needed. I'll make it an additional property of the field called ui and you can give me your feedback, Kyle

misty cedar
#

Can't sleep 😣. Yah I bet the threading killed Ctrl+c. Need to work that out (some todos in the code about that).

There are only really string types (text field), enum types (drop-down), and numeric (some with range requirements, mostly just minimums). Getting a good auto-UI setup may be a pain, but will make it easier for contribution later.

All non-required parameters are linkable. And required parameters can even be linked (but you have to provide a value directly as well).

Model is in there specifically since I want to support multiple models in the same node graph. There's a discussion thread on how it can be done. You can hide it for now, but I want to plan on it ☺️.

Progress images has always been a flag. I don't know how useful it will be in new UI, but I was going to add events for it (or at least step events like exist today).
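
The rules in this message (string → text field, enum → drop-down, numeric → number input; every field can take a link, and required fields also need a direct value) could be applied to a schema roughly as below. A sketch only: `describe_fields` and the `widget` / `value_required` keys are hypothetical, and a real OpenAPI document would usually need `$ref` resolution first, which is skipped here.

```python
def describe_fields(schema: dict) -> dict:
    """Map each property of a JSON-schema-like dict to a widget description."""
    required = set(schema.get("required", []))
    described = {}
    for name, prop in schema.get("properties", {}).items():
        if "enum" in prop:
            widget = "dropdown"
        elif prop.get("type") in ("integer", "number"):
            widget = "number"
        else:
            widget = "text"
        described[name] = {
            "widget": widget,
            # Every field may be linked; required ones also need a direct value.
            "value_required": name in required,
        }
    return described


example_schema = {
    "properties": {
        "prompt": {"type": "string"},
        "sampler": {"enum": ["ddim", "plms"]},
        "steps": {"type": "integer", "minimum": 1},
    },
    "required": ["prompt"],
}
fields = describe_fields(example_schema)
```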

#

I mostly am trying to avoid more nodes so I don't have more to refactor as I continue changing things around. I'll hopefully stabilize the code too, but I have some field handling and results serialization work to do, and that could be disruptive.

tame remnant
#

Ok gotcha re: model, that will be sweet

#

in the react UI i have progress images as system-wide. but I realize it's also a node thing. I'll give nodes an additional settings tab, based on props like model and progress images, so you can manage per node

#

im just working out how to parse the numeric stuff now. javascript only has 'number' as a type, it's kinda hilarious. you can't get a float or int or anything reasonable. just number.

#

besides that I can generate the UI from the invocation now, with some assumptions. some light hinting makes the UI much nicer tho

#

mind on invoke stuff keep ya up? hopefully you're fast asleep now tho

#

width and height need pydantic's multiple_of=64. seed needs ge=0 and le=4294967295 (numpy's max)

#

and I believe cfg scale is gt=0
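
For reference, the constraints listed above can be written out as plain checks. This sketch deliberately avoids pydantic so it stays dependency-free; in the invocations they would presumably be expressed through field arguments such as `multiple_of=64`, `ge=0`, `le=4294967295`, and `gt=0`. `GenerateParams` and its default values are assumptions for illustration.

```python
from dataclasses import dataclass

MAX_SEED = 4294967295  # 2**32 - 1


@dataclass
class GenerateParams:
    """Hypothetical parameter holder; only the range checks mirror the thread."""

    width: int = 512        # default is an assumption
    height: int = 512       # default is an assumption
    seed: int = 0
    cfg_scale: float = 7.5  # default is an assumption

    def __post_init__(self) -> None:
        for name in ("width", "height"):
            value = getattr(self, name)
            if value % 64 != 0:
                raise ValueError(f"{name} must be a multiple of 64, got {value}")
        if not 0 <= self.seed <= MAX_SEED:
            raise ValueError(f"seed must be in [0, {MAX_SEED}], got {self.seed}")
        if not self.cfg_scale > 0:
            raise ValueError(f"cfg_scale must be > 0, got {self.cfg_scale}")
```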

rocky pollen
#

I think the question around nodes and parameters is an important one to ask - should parameters that don’t make sense except in the context of a specific node live outside the node or in it?

#

I wonder, since the front end is getting generated automatically, whether it makes sense to have a “required” set of parameters displayed, and then a secondary set of “optional” parameters progressively disclosed

tame remnant
#

while the frontend can be automatically generated, it doesn't necessarily need to be - we can provide handcrafted UIs for modules where appropriate

#

but i do think it is pretty freaking cool to have it auto generate

rocky pollen
#

hell yeah it is haha.

tame remnant
#

Sweeeet, UI auto-generated from the invocations, processing working

#

few things need fixing but it works

misty cedar
elfin mural
#

In numpy, yes. And the image dimension numbers must be mod 64 or things actually break.

red yew
tame remnant
#

@misty cedar Here is generate.py Invocation with what I believe to be minimal functional UI metadata: https://github.com/psychedelicious/stable-diffusion/blob/react-flow-test/ldm/dream/app/invocations/generate.py how does that look to you


red yew
#
Generating: 100%
/usr/bin/xdg-open: line 881: www-browser: command not found
...
xdg-open: no method available for opening '/tmp/tmp3xtjpaxg.PNG'

I can't find the invocation for xdg-open (I assume it is just some generic open call) - does anyone know where it is?

tame remnant
#

I believe it's in ldm/dream/app/invocations/image.py @red yew

#
    def invoke(self, services: InvocationServices, context_id: str) -> Outputs:
        image = services.images.get(self.image.uri)
        if image:
            image.show()  # <----- here
red yew
#

I'm a big fan of putting these sorts of developer options behind envvars

#

we could do sth like

if image and os.environ.get('INVOKE_SHOW_IMAGE') == '1':
  image.show()
#

much lower effort than adding it to cli args

tame remnant
#

I think the intention is that ShowImageInvocation is not a developer option; it's a node you use when you want to display an image

red yew
#

oh I see

#

I was just testing the sample API call from the /docs endpoint

#

I see it now

tame remnant
#

I guess it couldn't actually show the image on your system, based on your initial question?

red yew
#

yes - for some reason PIL was looking for a browser

#

or rather xdg-open was

#

but I don't care about that step anyway

tame remnant
#

Probably something to look into for us. Are you on a non-standard linux distro or something?

red yew
#

regular arch via ssh though

#

maybe because no X is available

tame remnant
#

That'll be the issue. We'll just need to have some error handling for situations like this.
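
One possible shape for that error handling is to check up front whether a viewer can plausibly be launched. A best-effort sketch, assuming that a missing `DISPLAY` / `WAYLAND_DISPLAY` variable or a missing `xdg-open` binary on Linux means there is nothing to show the image with; `can_show_images` is a hypothetical helper, not something in the branch.

```python
import os
import shutil
import sys
from typing import Mapping, Optional


def can_show_images(environ: Optional[Mapping[str, str]] = None,
                    platform: Optional[str] = None) -> bool:
    """Best-effort guess at whether a desktop image viewer can be launched."""
    environ = os.environ if environ is None else environ
    platform = sys.platform if platform is None else platform
    if not platform.startswith("linux"):
        return True  # assumption: macOS and Windows sessions have a viewer
    has_display = bool(environ.get("DISPLAY") or environ.get("WAYLAND_DISPLAY"))
    return has_display and shutil.which("xdg-open") is not None
```

A show-image node could call this first and log a warning instead of calling `image.show()` when it returns `False`, which would cover the SSH-without-X case described above.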

red yew
#
```json
{
  "nodes": [
    {
      "id": "1",
      "type": "txt2img",
      "prompt": "A photo of a cat eating sushi"
    }
  ],
  "links": [
  ]
}
```

ok this returns a 200 with an `{ "id": ... }`
#

but how do I get the actual image?

elfin mural
#

Keep in mind that there's no 'xdg-open' on some platforms (no 'xdg' anything, for that matter)

red yew
#

yeah the image.show() call feels a bit out of place

misty cedar
#

Guessing you're checking out my branch? I just added image storage yesterday, and it's still pretty rough draft (was feverish all day).

The show image node is mostly for debug use, but also figured it would be nice for CLI usage. It's supposed to use the system's image viewer to display, and saves the image to a temp location before displaying. Not sure how that works on different systems though. I have a note in the invocation to not expose it for web UI ☺️

misty cedar
red yew
#

what's the correct http request to get the png image data from a context?

misty cedar
#

There isn't one yet

#

(trying to keep the Todo list updated at the top of this thread)

red yew
#

let's pin the message 🙂

#

so I guess we will have an endpoint /api/v1/context/:id?

#

and /api/v1/context/:id/image.png ?

misty cedar
#

There's either no pinning in forums or I don't have the ability to pin.

Something like that. I'll expose the contexts at a /context uri. Images will either be under /images as a root, or /context/{id}/nodes/{id}/results/ or something like that.

#

Pretty sure I need a better name than "context" though. "session" is the closest I can think of, but it doesn't seem correct (and "job" isn't correct, since the idea is that you can continue adding to it)

elfin mural
rocky pollen
#

Can try! Was just looking to see

elfin mural
#

@rocky pollen you rock

rocky pollen
#

Ok looks like perms need to be looked at

#

I think contributors should now have pin perms

misty cedar
#

Yep I'm seeing pin/unpin now 🙂

#

Also edited the messages so the whole to-do list is its own message

rocky pollen
#

Pseudo project solution ✅

tame remnant
misty cedar
#

I think just having an enum with standard values is probably fine

tame remnant
#

Just as a reference for users then? or can it provide editor hinting / runtime error messages

misty cedar
#

UI_SHOW_SCROLLBAR: "true" or something

tame remnant
#

Oh, gotcha.

#

I think I can generate a reference automatically from the TS types, either way I'll work on what you suggested and some documentation soon

misty cedar
misty cedar
#

Alright contexts can now save (and load). Required some rather significant refactoring. Hopefully it should still be compatible if you've been using the openapi schema to generate UI.

misty cedar
#

I've added an initial contexts API. Expected usage is that you create a context (with or without a graph definition), which returns the context id (but doesn't execute the graph). You then have the option to append more nodes/links (you can also do this after execution). You'll then "invoke" the graph (either a single node at a time, or everything). This will update the context as it invokes (I haven't hooked up signaling yet).

Some notes:

  • There's no way to delete a context. I'm not really clear how we'd handle that - should we delete all associated image results? What if you used them in other contexts? Without loading all contexts I have no way to know (I'm not using a database for contexts).
  • There's no way to "reset" a context. Once you execute it, you can't go back. Similar reasons - how should we handle results? I think you'd probably want to just copy your graph and create a new context from it if you want to "reset" it.
  • There is no way to remove a node from a context (same story).
  • You can't link from a new node to any other existing nodes, for the same reason.

Seems like I should probably make it possible to remove existing nodes as long as they're unexecuted...
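
The lifecycle described in this message (create, append, invoke one node or everything, no rewinding) can be sketched in memory as follows. Everything here is hypothetical: the `Context` class, its method names, and the `run` callback are illustrative, and link handling and execution ordering are left out (nodes simply run in insertion order).

```python
class Context:
    """Hypothetical in-memory context: append nodes, invoke, never rewind."""

    def __init__(self):
        self.nodes = {}    # node id -> node definition
        self.results = {}  # node id -> output, present once executed

    def add_node(self, node: dict) -> None:
        if node["id"] in self.nodes:
            raise ValueError(f"duplicate node id {node['id']}")
        self.nodes[node["id"]] = node

    def remove_node(self, node_id: str) -> None:
        # Executed nodes own results that other contexts may reference,
        # so only unexecuted nodes can be removed.
        if node_id in self.results:
            raise ValueError(f"node {node_id} has already been executed")
        del self.nodes[node_id]

    def invoke_next(self, run) -> bool:
        """Run the first unexecuted node; return False when nothing is left."""
        for node_id, node in self.nodes.items():
            if node_id not in self.results:
                self.results[node_id] = run(node)
                return True
        return False

    def invoke_all(self, run) -> None:
        while self.invoke_next(run):
            pass
```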

red yew
#

it seems like this is developing into a virtual machine to some extent

#

is the goal that the node-based backend is the only http service for invokeai?

tame remnant
#

Still works mostly, but I'm not sure I understand the new way outputs work. Say I want an output with an ID or name of "my_string". It looks like I now need to create an output class extending BaseInvocationOutput, which has an explicit type, and give it the variable 'my_string'. Previously, I could just add an output in the invocation definition. Can I still do that somehow?

#

e.g. before, this worked:

```python
# both defined inside the SimplePromptInvocation class
class Outputs(BaseInvocationOutput):
    my_string: str

def invoke(self, services: InvocationServices, context_id: str) -> Outputs:
    return SimplePromptInvocation.Outputs.construct(
        my_string = self.my_string
    )
```
elfin mural
#

In any case, a Directed Graph (whether or not, I don't know)

misty cedar
# tame remnant now, I need to make a MyStringInvocationOutput.py to accomplish the same thing (...

I wasn't able to deserialize the outputs without a discriminator field (the type). I couldn't generate those automatically from inner classes without some brittle, complicated code. I realized that we only really had image outputs (and soon prompt outputs), and there wasn't much value in a single output type per invocation class.

You can still make new output classes wherever makes sense if they're needed. They should be able to derive from another output type and override the type field.

Fancy automatic stuff just didn't work out. 😣
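
To illustrate why the `type` discriminator matters for deserialization, here is a small stand-alone sketch using dataclasses instead of pydantic. `ImageOutput` is named in the thread; `MaskedImageOutput`, `OUTPUT_TYPES`, `register`, and `deserialize` are hypothetical names for illustration only.

```python
from dataclasses import asdict, dataclass

OUTPUT_TYPES = {}  # discriminator value -> output class


def register(cls):
    """Class decorator: index an output class by its `type` discriminator."""
    OUTPUT_TYPES[cls.type] = cls
    return cls


@register
@dataclass
class ImageOutput:
    image_uri: str = ""
    type: str = "image"


@register
@dataclass
class MaskedImageOutput(ImageOutput):
    # Derives from ImageOutput, adds a field, and overrides the discriminator.
    mask_uri: str = ""
    type: str = "masked_image"


def deserialize(data: dict):
    """Pick the output class from the `type` value, then rebuild the object."""
    fields = dict(data)
    cls = OUTPUT_TYPES[fields.pop("type")]
    return cls(**fields)


original = MaskedImageOutput(image_uri="a.png", mask_uri="m.png")
restored = deserialize(asdict(original))
```

Without the `type` value in the serialized data there would be no way to tell which class a saved result should be rebuilt as.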

#

And yah it's kind of an interactive directed graph (I use graphlib to help determine graph validity and execution order).
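
A minimal sketch of how `graphlib` from the standard library can do both jobs mentioned here, validity checking and execution order. The `execution_order` helper and the `(source, target)` link tuples are assumptions for illustration, not the branch's code.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(node_ids, links):
    """Return node ids in a valid run order, or raise ValueError for a bad graph."""
    known = set(node_ids)
    sorter = TopologicalSorter({node_id: set() for node_id in node_ids})
    for source, target in links:
        if source not in known or target not in known:
            raise ValueError(f"link {source} -> {target} references an unknown node")
        sorter.add(target, source)  # target depends on source
    try:
        return list(sorter.static_order())
    except CycleError as error:
        raise ValueError(f"graph contains a cycle: {error.args[1]}") from error


order = execution_order(["1", "2", "3"], [("1", "2"), ("2", "3")])
```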

red yew
#

will the node-based backend supersede backend/server.py?

misty cedar
#

I believe that's the plan - maintaining two backends would be a pain (it's already difficult enough to maintain an API and CLI). The frontend can be adapted to it though so it operates similar to the current frontend.

tame remnant
#

and many nodes will have multiple outputs

misty cedar
#

That's fine. Outputs will just need a type now

#

Might be useful to make some standard base types and utilize multiple inheritance to re-use some standard fields like image, prompt, etc.

#

I was also considering doing a history lookup to try to fill all parameters, but that can probably happen in the application layer too

tame remnant
#

How do I make a node with multiple outputs of the same type?

misty cedar
#

Just make a new output class with multiple fields

#

(or derive from a base one like ImageOutput and add some extra fields)

tame remnant
#

alright so I will make base output types for each type now and make an invocation using em all to show you and make sure I'm doing it right

misty cedar
#

on a positive note, I think things are starting to stabilize

#

not too happy with how the images output/save, but I think it'll be okay

tame remnant
#

#

I get it now, last night when I looked at it I was confused

misty cedar
#

Specifying a default isn't necessary on outputs.
I think the ui hints for entire invocations should be in the schema_extra part instead of as a field (fields should just be used for inputs): https://pydantic-docs.helpmanual.io/usage/schema/#schema-customization (at the bottom of the page)
You're also free to put these both in the same file. My current understanding of Python is that you should group like functionality in a single file (as a module)

tame remnant
#

parses fine still just need to remove the type field from outputs

#

Thanks, I hadnt gotten around to reviewing the pydantic schema customization section yet

misty cedar
#

ah right

#

maybe I can customize the schema generation to not include type in the output

#

(though maybe it's helpful to you for ui?)

#

it's super cool that these just automatically work in the UI ♥

tame remnant
#

i can just filter out 'type', it's a reserved property anyways right?

#

yeah haha i love it

#

I already filter out 'id' and 'type' from the invocation itself

misty cedar
#

really interested if a "single node at a time" interactive mode (like the CLI) could work out in the UI

tame remnant
#

like the processing pauses while you choose what to do next?

misty cedar
#

Kind of. My thought was it could be like the current UI - except all the stuff on the left (parameter entry) would be parameters for the current node

tame remnant
#

(i havent used the cli yet)

misty cedar
#

and maybe you could click a bubble next to the input to automatically fill in from previous nodes

#

you should check it out 🙂

tame remnant
#

i should

misty cedar
#

there's also an API now for adding a single node with links

#

with some cool linking options like this: from_node_id: X, from_field: *, to_field: *

#

the * makes it match up all fields it can (by type and name)
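
The wildcard matching described here might work along these lines. A sketch under assumptions: `expand_link` is a hypothetical helper, and fields are represented as simple name → type-name mappings rather than whatever the branch uses internally.

```python
def expand_link(link: dict, outputs: dict, inputs: dict) -> list:
    """Expand a '*' link into one concrete link per field matching by name and type.

    `outputs` and `inputs` map field names to type names.
    """
    if link["from_field"] != "*" or link["to_field"] != "*":
        return [link]
    return [
        {**link, "from_field": name, "to_field": name}
        for name, type_name in outputs.items()
        if inputs.get(name) == type_name
    ]


expanded = expand_link(
    {"from_node_id": "1", "from_field": "*", "to_field": "*"},
    outputs={"image": "image", "seed": "int"},
    inputs={"image": "image", "strength": "float", "seed": "str"},
)
```

Here only `image` is linked: `seed` matches by name but not by type, and `strength` has no matching output.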

tame remnant
#

Hmm. My thinking for recreating the current UI using nodes is we just enforce static arrays of nodes and links, and if you want more than that, you go to the node UI. You are talking about something between the two then, haven't considered it

misty cedar
#

Yah I built it for the CLI and then figured the UI could also use that for a simplified mode

#

And you could jump back and forth between the full graph

#

"interact from here" on the graph could bring you to the single node UI

tame remnant
#

sounds awesome

#

and maybe tricky hehe

misty cedar
#

maybe *shrugs*

#

would be a good way to expose new functionality for free in the simpler UI though

tame remnant
#

so like you do your processing in simple UI and click "I wanna do more" and it plops you into the node editor with the node view of what you have been using just a moment ago, am i understanding

misty cedar
#

Yah pretty much. The CLI is just building a node graph on a context behind the scenes and executing it every command

#

just appends to the current last node in history

tame remnant
#

itll take me some time to wrap my head around it but it sounds like we have a super badass tool in the works

misty cedar
#

no reason you couldn't go back in history and pick a previous node to branch off of

tame remnant
#

so interestingly, the frontend state management library just diffs prev state from new state and updates like that

#

i think you can save the diffs

#

and get free undo/redo

#

that kinda automatically takes care of going back thru history without needing the server to do anything,

#

maybe not tho

misty cedar
#

Well... undo isn't really a thing once you've executed a node

#

super ugly, but might help get the point across

#

not sure if the layout/flow would feel nice that way, but basically, do one operation (left panel), see result (middle), select next operation. Then the next operation controls replace the left panel and you continue iterating that way

tame remnant
#

oh haha yeah that gets the point across, i wasn't even able to imagine what you were suggesting before that image lol

misty cedar
#

and I guess if you stick on the same one maybe it just keeps chaining off the previous node

rocky pollen
#

So if we ask ourselves how people might use this

#

Are they working on one concept at a time - e.g. exploring prompts and then stemming off of that for img2img loops etc

#

I.e., is this “project based”

misty cedar
#

so far I've had two different usages:

  1. Iterating on a concept to figure out how to reliably create things in a style (this may just be generation parameters, or it might be a chain of operations)
  2. Creating things I've already figured out how to create (e.g. I know how to make portraits in GTA style, so I'm just replacing the "subject" at the beginning, then running the generate + upscale + etc.)
#

and for either of those, it usually involves some seed exploration/repetition

tame remnant
#

The use case for me that is impossible in other UI types is “make a spaghetti junction of connections and tell it to do the thing 100 times and see what crazy stuff comes out”

#

Or the more traditional artistic workflow of slowly iterating on a single work

rocky pollen
#

I think thats a novelty if it’s not something that integrates into workflows

tame remnant
#

arent we in the business of creating totally new stuff?

#

im taking inspiration from Modular Synthesizers: https://www.youtube.com/watch?v=6JeZR13dLLI

#

this is a graph where the objects being passed around are voltages

rocky pollen
#

I mean yes, but new things that help people solve problems

tame remnant
#

anything can feed to anything else, and it just keeps going as long as you let it

rocky pollen
#

Otherwise I’ve found we won’t have many users of said “things” 🙂

tame remnant
#

i'll argue that the generation of novelty is one of the key components of an expressive and useful universe

#

but that's maybe not so relevant 😛

rocky pollen
#

I imagine that the ideal solution would allow for immense novelty AND be useful to pros looking for a better workflow

tame remnant
#

exactly

#

absolutely agree, and we can do both here for sure

rocky pollen
#

Right - the latter requires more thought on the UX and problems faced by “workflow”

#

The former just needs more nodes and unlimited flexibility

tame remnant
#

yeah, glad to have people like you helping out on the UX side else what i would make would end up looking like the video

misty cedar
#

I think the power in our solution is that the power users can generate novel things (either through the node graph, through new nodes in code, or a combination) and then share those with more common users

#

e.g. some of the huge upscale solutions right now are just "split up the picture, upscale each part, tape it back together" - that could be done with the node graph
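
The "split, upscale each part, tape it back together" recipe mostly comes down to box arithmetic, sketched below. `tile_boxes` and `scale_box` are hypothetical helpers; the actual upscaling and pasting are omitted, and real tiled upscalers typically overlap tiles to hide seams, which this sketch ignores.

```python
def tile_boxes(width: int, height: int, tile: int) -> list:
    """Split a width x height image into (left, top, right, bottom) boxes."""
    if tile <= 0:
        raise ValueError("tile must be positive")
    return [
        (left, top, min(left + tile, width), min(top + tile, height))
        for top in range(0, height, tile)
        for left in range(0, width, tile)
    ]


def scale_box(box: tuple, factor: int) -> tuple:
    """Where an upscaled tile lands in the upscaled image."""
    return tuple(value * factor for value in box)


boxes = tile_boxes(1024, 512, 512)
targets = [scale_box(box, 2) for box in boxes]
```

In graph terms, each box would become a crop node feeding an upscale node, with a final paste node using the scaled boxes.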

rocky pollen
#

💯

misty cedar
#

maybe let people PR "recipes" or something

rocky pollen
#

That or a community site for sharing them

misty cedar
#

where they've selected inputs to a large graph to expose in simpler UI, and outputs that matter

rocky pollen
#

Is the “novelty” seeker an explorer of sorts?

#

And they’re feeding that back to “settlers” that figure out how to use those new things in their workflows

misty cedar
#

something like lexica for this would be cool

rocky pollen
#

They just got 5m in funding lol

misty cedar
#

lol

rocky pollen
#

I don’t know exactly how they monetize but they’ve got a lot of data so probably something like prompt data mining

#

Maybe that’s just me being pessimistic 🙂

misty cedar
#

Yah any large amount of data is valuable. And if it gets enough traffic even just advertising opportunity is valuable

tame remnant
#

@misty cedar I'm trying to get the app to do hot reloading, in api_app.py :

    config = uvicorn.Config(
        "ldm.dream.app.api_app:app",
        host = "0.0.0.0",
        port = 9090,
        loop = loop,
        reload=True,
        reload_dirs=['ldm/dream/app/invocations'])
#

the terminal output says it is watching and will reload on changes:

INFO:     Will watch for changes in these directories: ['/Users/spencer/Documents/Code/stable-diffusion/ldm/dream/app/invocations']

but when I make changes to say an invocation file, it does not reload, not even if I am patient

misty cedar
#

You'd have to reload almost everything to pick up the new ones

#

Anywhere that does a from invocations import *

#

(or any path to the invocations)

#

And probably split code out of API app

tame remnant
#

I guess that's what I want to do - as I'm fiddling around with the invocations and the UI, have the server reload every time I make a change

misty cedar
#

And set OpenAPI not to cache the result

tame remnant
#

It also works without the reload_dirs but I was trying to be more specific in case the directory setting wasn't recursive.

#

So uvicorn isn't the thing I want to hot reload then I guess

misty cedar
#

I have no idea. Due to the way the invocations are discovered I'm not sure what stuff would need reloading and what weirdness you'd run into

tame remnant
#

i can just use a shell script

#

kill and restart

#

ty

#

unfortunately due to how the threading is set up my approach isn't working, I think the signal indicating the process has died is never emitted. something like that.

misty cedar
#

Oh yah I haven't fixed the shutdown yet 😣

#

Sorry, been running through a debugger most of the time

misty cedar
#

Okay it should gracefully shutdown now

#

complains about something on API shutdown, but it at least shuts down

misty cedar
#

woohoo, I got socket.io prototyped on FastAPI. No time to do the full implementation tonight though x.x.

tame remnant
#

That's awesome though! and thanks for cracking at the shutdown thing

#

what debugger are you using?

#

little shell script to restart it on change works now 🙂

#

cool util I found to handle it all - entr. Pipe it a list of files to watch and it does the rest: ls ldm/dream/app/invocations/*.py | entr -r python scripts/invoke.py --api

misty cedar
#

I'm using vscode. Not sure what it uses for a debugger, but it works really well.

tame remnant
#

nice

#

I switched back to sublime after years of vscode, its a bit more effort to get things set up with it :/

#

The people who made sublimetext have a really nice graphical git client called SublimeMerge, it makes most of the git stuff understandable for me.

misty cedar
#

Ah nice. I added a few extensions to vscode for git

#

But I'm also used to the git cli...

tame remnant
#

I'll get there eventually, the GUI on this makes a lot of the operations clearer. Resolving conflicts is really smooth too.

rocky pollen
#

Ok I thought i was dense because vscode has been giving me all kinds of hell with git lol

tame remnant
#

@misty cedar may i request friendly operationIds on the API, I think they are being auto-generated now: e.g. "invoke_context_api_v1_contexts__context_id__invoke_put"

#

The openapi-generator project you linked is great btw, thanks. I went ahead and wrote my own methods as an exercise to help me appreciate what is needed to do it right, then saved myself the pain and let the generator make everything 😅

#

yeah, they are being auto-generated, i see where they can be specified (in the router decorators i think is the right term)

#

Tried out the API and it works a treat! So if I want to just immediately invoke, should I create a context, wait for 200, and then invoke it?

misty cedar
#

Yah I haven't put a ton of work into the API docs, was just fleshing out functionality. It seemed to fill out the title for operations nicely though.

Yah I wasn't sure if I should add a query parameter to "invoke now" when creating from a graph. The pattern would be:

  • create context
  • subscribe with socket.io (not in yet)
  • invoke context
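
The ordering in the list above matters because a notification channel normally does not replay events that were emitted before a client subscribed. A toy in-process sketch of that effect; `EventBus` and `invoke` are hypothetical stand-ins, not the socket.io integration, and the event shape follows the `context_id` / `invocation_id` format from the to-do list.

```python
from collections import defaultdict


class EventBus:
    """Hypothetical stand-in for the socket.io layer: no replay of past events."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, context_id: str, callback) -> None:
        self._subscribers[context_id].append(callback)

    def emit(self, context_id: str, event: dict) -> None:
        for callback in self._subscribers[context_id]:
            callback(event)


def invoke(bus: EventBus, context_id: str, node_ids) -> None:
    # Emits one event per node, using the signal shape from the to-do list.
    for node_id in node_ids:
        bus.emit(context_id, {"context_id": context_id, "invocation_id": node_id})


bus = EventBus()
received = []
invoke(bus, "ctx-1", ["1"])              # invoked before subscribing: event is lost
bus.subscribe("ctx-1", received.append)  # subscribe
invoke(bus, "ctx-1", ["2"])              # invoke after subscribing: event arrives
```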
tame remnant
# misty cedar Yah I haven't put a ton of work into the API docs, was just fleshing out functio...

I mean the generated schema autogenerates the "operationIds" and makes those really long names like "invoke_context_api_v1_contexts__context_id__invoke_put". You can add an operation_id arg, e.g. in context.py:

@context_router.post('/',
    operation_id = 'createContext', # <---
    responses = {
        400: {'description': 'Invalid json'}
    })
async def create_context(

and then the schema uses that as the operationId. Requesting this bc the openapi-generator generates the API code and types based on the operationId.
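
If hand-writing an ID for every route gets tedious, the friendly names could be derived from the handler function names. A small sketch; `friendly_operation_id` is a hypothetical helper, and its result would still have to be passed to each route decorator's `operation_id` argument.

```python
def friendly_operation_id(function_name: str) -> str:
    """Convert a snake_case handler name into a camelCase operation id."""
    head, *rest = function_name.strip("_").split("_")
    return head + "".join(part.capitalize() for part in rest if part)


create_id = friendly_operation_id("create_context")
invoke_id = friendly_operation_id("invoke_context")
```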

#

also curious of your opinion of socketio for handling the communication - it seemed really easy and effective to me but this is my first rodeo so maybe just HTTP is better? dunno

misty cedar
#

Naw sockets for notifications are good. Polling is generally bad.

#

socket.io seems to be less maintained, but maybe it's just really stable?

#

I looked at websockets last night, but we'd have to build channels and stuff, which is a pain

#

I don't know if socketio has a backend if you were to scale-out though =/

#

I've used signalr, but that needs .net for hosting

tame remnant
#

I got the impression that socketio was widely used in massive applications

#

There are a lot of different backend implementations/bindings for it, flask-socketio is the simplest one for flask I could find. There is the more agnostic python-socketio as well tho. Both can use message queues and that stuff (not that I understand what I'm talking about 😛 )

#

Glad to hear i made a reasonable choice w/ the server i wrote, was kinda concerned I just grabbed something that looked nice but had issues

misty cedar
#

Alright, socket.io is in. Events are all defined in /ldm/dream/app/services/events.py. If you want to look at usage (and easier to understand events), run the API, visit /static/test.html, and press the test button 🙂

#

I also added a temporary endpoint for getting images. It uses a query parameter though, which I am not happy with. I'll need to do some work to fix that though.

#

Getting real close though. API needs some cleanup and I need to figure out iteration/join

#

and lots of code cleanup to remove the flask server I had built x.x

tame remnant
#

"Context" sounds really dry next to "Invocation". Isn't "Ritual" a cool word? A ritual is a series of invocations, executed with certain parameters, in a certain order, with a certain intent, but with some uncertainty in the result.

misty cedar
#

I keep leaning toward "Session". I originally was using context since I was going to pass it down to the invocations when they ran. That created a lot of issues though (since it also owned them, and Python really hates circular references), so I rearchitected things, but never changed the name.

#

Invocation made sense, since in addition to being a cool name, it describes what the object does.

#

(and on-brand)

tame remnant
#

Agree Session makes sense and fits more than Context, which sounds kinda technical

misty cedar
#

You can also continue editing the graph after running it, which is in-line with a session

#

made a new thread to talk about iteration

#

I want to make sure it's actually useful before I spend a lot of time on it

#

especially since it doesn't mesh well with how things are currently set up =/

#

(and also because iteration and metadata are really the only big areas missing... and if neither of them is super useful, then I can clean up and we can start integrating!)

misty cedar
#

okay context is renamed to session everywhere

misty cedar
#

I've also removed all of the flask and dependency-injection backend stuff

misty cedar
#

okay and image urls are much nicer now

tame remnant
#

really shaping up, awesome work

#

regarding invocation versions - i am thinking that the invocations themselves may change over time. say I contribute an invocation for cool thing X, and then later I add a feature to it or whatever. when I load my session, i need to load the right invocation

#

i dunno if there is a way to make the invocations a module based on a git repo, something like that... but then we are getting into diy package manager territory

misty cedar
#

As long as they're loaded before the API indexes everything, you can add more invocations from anywhere. It just looks for subclasses of the base invocation
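
A minimal sketch of that kind of discovery, assuming a base class named `BaseInvocation` (the name and the example subclasses here are stand-ins, not the real ones). It walks `__subclasses__()` recursively, so anything imported before indexing is picked up no matter which module defined it.

```python
from typing import Iterator


class BaseInvocation:
    """Stand-in for the real base class; the name is an assumption."""


def all_subclasses(cls: type) -> Iterator[type]:
    """Yield every direct and indirect subclass that has been imported so far."""
    for sub in cls.__subclasses__():
        yield sub
        yield from all_subclasses(sub)


# Hypothetical invocations, defined "anywhere" before indexing happens.
class TextToImage(BaseInvocation): ...
class Upscale(BaseInvocation): ...
class FancyUpscale(Upscale): ...


names = sorted(c.__name__ for c in all_subclasses(BaseInvocation))
print(names)
```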

tame remnant
#

the issue is when you have a session that used invocation X version Y, but you have since updated and now invocation X has version Z with breaking changes

#

do we need to have subclasses and a version matcher?

#

if major version is the same, there are no breaking changes and it will still function as it did previously, but if major version is different, you need to load the same invocation version somehow

misty cedar
#

Is it worth that effort? Same thing with models to an extent - there might be a limit to how much we can/should track.

tame remnant
#

might not be worth it. can we easily get a hash-like representation of the invocation file? then we can at least say "This session appears to use a different version of the Cool Thing Invocation and may not function as expected."
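
A hash-like fingerprint of a source file is cheap to compute with the standard library. This is a sketch only: the function names and the 12-character truncation are arbitrary choices, and it only detects changes to the one file, not to anything that file depends on.

```python
import hashlib
from pathlib import Path


def fingerprint(source: bytes) -> str:
    """Short, stable fingerprint of some source bytes."""
    return hashlib.sha256(source).hexdigest()[:12]


def fingerprint_file(path: str) -> str:
    """Fingerprint the contents of a file on disk, e.g. an invocation module."""
    return fingerprint(Path(path).read_bytes())
```

A session could store the fingerprint next to each invocation type when it is saved and compare it on load to decide whether to show the warning.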

#

I think a version number in schema_extra is easy enough to do. Checking can happen in the UI - "This Session uses Invocation version X, but version Y is installed. It may not work as expected." I can't imagine the CLI really using sessions much, so it wouldn't affect that client much - or am I mistaken?
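
The major-version rule and the warning text could look roughly like this. `version_warning` is a hypothetical helper and the dotted "major.minor.patch" string format is an assumption; nothing here is part of the actual backend.

```python
from typing import Optional


def version_warning(name: str, saved: str, installed: str) -> Optional[str]:
    """Return a warning when the major versions differ, otherwise None."""
    saved_major = saved.split(".")[0]
    installed_major = installed.split(".")[0]
    if saved_major == installed_major:
        # Same major version: assumed to contain no breaking changes.
        return None
    return (
        f"This Session uses {name} version {saved}, but version {installed} "
        "is installed. It may not work as expected."
    )
```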

misty cedar
#

It wouldn't really reuse them

#

Could add a version, but things below the invocation could also change behavior, and that's tough to track

#

Git hash could work, but local changes would break that

#

And if you're sharing what you've made, then it probably shouldn't be expected to work for someone else unless you're using the same version of the master branch

tame remnant
#

I'm just trying to poke holes to ensure we plan for edge cases and future development

misty cedar
#

Yah. I mean, being able to share the basic generator related things like prompt, cfg, steps is super useful even between tools/branches. I collaborated with someone who was using the automatic UI successfully that way

tame remnant
#

absolutely

#

once we have the mvp working, we will gain a better perspective of which other features are needed

misty cedar
#

I think I'll rebase on development sometime soon. I may add a context class for invocations to utilize, but otherwise I think it's all about ready to go.

misty cedar
#

Haha. Try running without the API flag and be amazed by the new (automatically generated) CLI

#

The API is documented at /docs (or /redoc) too (except for the signals... Not a great way to document those)

#

I haven't written many nodes for it, but it's super easy to add functionality to. Just have to write the one file for the invoker and it automatically works in the CLI and Web API

misty cedar
#

Alright, I've "rebased" on current development (I branched from development into a new branch then merged into that - way easier given how far behind I was).

invoke-development

Please try that branch out. I'm still stuck on 3.8.5 until I do an environment rebuild, and that will probably break me for a day or so I assume. Sounds like some people were able to run it fine though, so hopefully there aren't any real changes needed.

If it works fine, I think it's ready for contribution 🙂

tame remnant
#

woohoo!

#

some issues:

  • on the CLI, pressing arrow keys etc inserts control characters, I guess we want readline?
  • txt2img --prompt "a cute dog" | show_image | upscale generates and shows the pupper, but upscale fails:
  File "/Users/spencer/Documents/Code/stable-diffusion/ldm/dream/app/invocations/upscale.py", line 23, in invoke
    image_list     = [[self.image.get(), 0]],
AttributeError: 'ImageField' object has no attribute 'get'
  • Ran txt2img --prompt "a cute dog", that worked, then ran txt2img --prompt "a cute dog" | show_image | upscale and got:
File "/Users/spencer/Documents/Code/stable-diffusion/ldm/dream/app/services/invocation_session.py", line 126, in add_invocation
    from_node = self.invocations[node_id]
KeyError: '-1'
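
For anyone reproducing these, the piped command line can be split into per-node argument lists with `shlex`. This is only a sketch of the general technique, not how the generated CLI actually parses its input; `split_pipeline` is a hypothetical helper.

```python
import shlex
from typing import List


def split_pipeline(line: str) -> List[List[str]]:
    """Split 'a --x 1 | b | c' into one argument list per command."""
    commands: List[List[str]] = [[]]
    for token in shlex.split(line):
        # A bare "|" token starts the next command.
        # Limitation: a quoted "|" argument is treated the same way.
        if token == "|":
            commands.append([])
        else:
            commands[-1].append(token)
    return commands


print(split_pipeline('txt2img --prompt "a cute dog" | show_image | upscale'))
```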
tame remnant
#

Ok, have the UI creating sessions and invoking!

#
  • I expect a common use pattern is to load & connect nodes in the UI, invoke, then remove some nodes, invoke, add some nodes, invoke, change links, invoke, and so on. As far as I understand, this requires a new session each time I remove a node or change links. I suppose I can just keep appending, but then the session state kinda loses its sync with the UI state - the session will have a lot of extra nodes and links. Does it make sense to have an API method to replace a session's nodes and links entirely?
#

General use pattern:

  • upon loading the UI, either resume/load an existing session from a list/session library or create a new one (and here I'm really tempted again to call Sessions "Rituals", the user library is the "Grimoire" and community preset library the "Arcaneum"...)
  • While user is adding and connecting nodes, not much (nothing?) is sent to the server. When they click Invoke, the graph is sent and immediately processed. So again here the action that would make sense is "set-invocations-and-invoke-all", overwriting the session's nodes and links with the payload.
  • While that is processing, user decides to pause, make changes, and resume. For this, I think the action would be "set-invocations-and-invoke-from-specific-node-or-link", something like that. I'm not sure how pausing at a certain point, modifying future nodes/links, and then resuming from there works on the back end....
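
To make the two proposed actions concrete, here is a rough sketch of a session that can have its graph overwritten and can report which nodes sit downstream of a given node (the ones that would need to run again after a change). The `Session` shape, the method names, and the simplified two-element links are all assumptions for illustration, not the real session class.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

# (from_node_id, to_node_id); field names are left out to keep the sketch short.
Link = Tuple[str, str]


@dataclass
class Session:
    nodes: Dict[str, dict] = field(default_factory=dict)
    links: List[Link] = field(default_factory=list)

    def replace_graph(self, nodes: Dict[str, dict], links: List[Link]) -> None:
        """Overwrite the session's nodes and links with the payload."""
        for src, dst in links:
            if src not in nodes or dst not in nodes:
                raise ValueError(f"link {src} -> {dst} references an unknown node")
        self.nodes = dict(nodes)
        self.links = list(links)

    def downstream_of(self, node_id: str) -> Set[str]:
        """Return node_id plus every node reachable from it through links."""
        seen = {node_id}
        stack = [node_id]
        while stack:
            current = stack.pop()
            for src, dst in self.links:
                if src == current and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen
```

With something like this, "set and invoke all" is `replace_graph` followed by running everything, and "resume from a node" is `replace_graph` followed by running only `downstream_of(node_id)`, reusing earlier outputs.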
rocky pollen
#

I think that's right - (Also, I'm a fan of some opinionated language... 🙃)

#

Grimoire would be a very easy title for our creative guide...

#

From a UI perspective, one might build out and "Save" an invocation, or "Invoke" it. If a user were to Invoke, then once that specific invocation has been executed, the UI should retain its node layout for modification/editing, and the UI would have the outputs available for use in the next invocation

#

There will also probably need to be a "Clear" canvas button which removes all current nodes and resets to default state

terse hill
#

I also can't yet understand how we can work with different instances of nodes. For example, three "generate" nodes in some places at the same time — possible? How about a few prompts? What if the number of nodes increases to 20-30? Should we limit the total number? Should we add a warning about long generation times, or tell approximate creation times?

All I can imagine is a linear path of nodes with image outputs on some of them, different versions of images.

rocky pollen
#

My thought would be that N+1 generation nodes would output images in the sequence generated (mvp) and potentially handle displaying as a grid (future date)

Multiple prompts - Either
A) could be concatenated if they fed into the same input
B) if we want to keep a 1 output / 1 input node connector constraint, the prompt node could have an input that accepts text and appends the text input in that node to the text that is being input
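
Option B could look something like the following sketch. `PromptNode`, its field names, and the ", " separator are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PromptNode:
    text: str
    # Text arriving on the node's single input connector, if one is linked.
    incoming: Optional[str] = None

    def invoke(self) -> str:
        """Append this node's text to the incoming text, skipping empty parts."""
        parts = [p for p in (self.incoming, self.text) if p]
        return ", ".join(parts)


print(PromptNode("oil painting", incoming="a cute dog").invoke())
```

Chaining several of these keeps the one-output-to-one-input constraint while still letting multiple prompt fragments combine.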

misty cedar
#

@tame remnant I fixed upscale/restore (I think - my environment is pretty broken at the moment). Forgot to convert it to use the image service 🙂

#

Seems from the discussion like it might be useful to have an "invocation" that's separate from the defined graph. I'm not yet sure how that meshes with continuing a previous run though (e.g. chaining from previous nodes so you don't have to run everything again to e.g. upscale).

#

I currently don't have any support for connections where the receiver is a List of N. I considered it, but there are lots of questions, like if order matters, what collections to support, etc.

The graph might currently let you connect multiple outputs to one input. If it does, that's a bug 🙂
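
A check for that case is small. This sketch assumes links are plain 4-tuples of (from_node, from_field, to_node, to_field), which is a guess at the shape rather than the real link type, and `overloaded_inputs` is a hypothetical helper.

```python
from collections import Counter
from typing import List, Tuple

# (from_node, from_field, to_node, to_field)
Link = Tuple[str, str, str, str]


def overloaded_inputs(links: List[Link]) -> List[Tuple[str, str]]:
    """Return every (node, field) input that has more than one incoming link."""
    counts = Counter((to_node, to_field) for _, _, to_node, to_field in links)
    return sorted(target for target, n in counts.items() if n > 1)
```

An empty result means every input has at most one source; anything else could be rejected when the link is added.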