Starting a thread to discuss and share progress. Current progress is in branch invoker-framework, and can be run with python scripts/invoke.py with --api for the API version. Please no pull requests to the branch without prior discussion with @misty cedar - things are stabilizing, but there are still some major components to complete that may involve refactors. Until then, enjoy a preview of what's there!
#Invoke Backend (Node-Based Backend)
To-do:
(rough list, probably incomplete)
✅ Image management. Images in outputs need to be references to something on an image manager, which should manage load/save/caching of images. The idea being that we can retain N images in-memory for pipeline usage, then load the rest from disk as-needed.
✅ Context history: Provide a way to get previous contexts
✅ Context save/load
⬛ Remove unexecuted nodes from a context
✅ Socket.io: I need to prototype this again.
✅ Events/signals: Need to define some standardized signal format and then hook signals everywhere in the invocations. This will probably be something like { "context_id": "...", "invocation_id": "...", ... }.
✅ API: There's no API yet! Well, there's one POST that lets you run a full JSON graph, but I need to build an API. Need everything else in place first though.
⬛ Image metadata. No idea how I'm going to pipe that in, and I don't feel like we settled on what image metadata should even look like when you could have a giant node graph generate images, and lots of input images.
⬛ Iteration. I've been leaving iteration to the end to avoid dealing with control flow. I'm thinking I'll do loop unwrapping or something, but not sure if that happens up-front (even all the way up to the UI) or dynamically at run-time (e.g. see an iteration node and generate all the iterations from its links).
⬛ Disable some nodes in the API (e.g. you don't want show_image popping up images on the host machine if you're accessing it remotely).
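For the image-management item in the list above (keep N images in memory, load the rest from disk as needed), one possible shape is a small LRU cache. This is only a sketch: `ImageCache`, its `load` callback, and `max_in_memory` are names made up for illustration, not what the branch actually uses.

```python
from collections import OrderedDict
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class ImageCache(Generic[T]):
    """Keep at most `max_in_memory` items in RAM; reload the rest on demand."""

    def __init__(self, load: Callable[[str], T], max_in_memory: int = 10):
        self._load = load          # hypothetical loader, e.g. reads a PNG from disk
        self._max = max_in_memory
        self._items: "OrderedDict[str, T]" = OrderedDict()

    def get(self, uri: str) -> T:
        if uri in self._items:
            self._items.move_to_end(uri)     # mark as most recently used
            return self._items[uri]
        item = self._load(uri)               # cache miss: go to disk
        self._put(uri, item)
        return item

    def save(self, uri: str, item: T) -> None:
        # A real manager would also write the image to disk here.
        self._put(uri, item)

    def _put(self, uri: str, item: T) -> None:
        self._items[uri] = item
        self._items.move_to_end(uri)
        while len(self._items) > self._max:
            self._items.popitem(last=False)  # evict the least recently used
```

Outputs would then carry only the URI, and nodes would call `get(uri)` when they need the pixels.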
Newest tonight:
- The CLI now supports command pipes (e.g. txt2img --prompt "a photo of a cat eating sushi" | upscale | show_image).
- There is now a context manager to manage context objects (in-memory currently, no history, and not super clean).
- All invocation (node) processing is done on another thread. Currently a single node at a time - I'll leave the problem of multiple nodes at a time for someone in the future (since it gets really complicated with all the nodes and their needs/performance implications). Invocations can be awaited with a simple wait function call on the context.
- Eventing in-app is fixed. The event system should easily support eventing via external services as well, but I'll not be proving that out with v1. It should be pretty straightforward to get socket.io hooked up again (though I'm sure I'll run into something weird).
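The command pipes in the list above map naturally onto a node graph: each command becomes a node and each `|` becomes a link. A rough sketch of that idea follows. The `nodes`/`links` shape, the `to_node_id` key, and the `parse_pipeline` helper are assumptions for illustration; the real CLI parsing in the branch may differ.

```python
import shlex


def parse_pipeline(line: str) -> dict:
    """Turn 'cmd --opt value | cmd2 | cmd3' into nodes plus links between neighbours."""
    # Split into commands on bare '|' tokens (the pipe must be surrounded by spaces).
    commands, current = [], []
    for token in shlex.split(line):
        if token == "|":
            commands.append(current)
            current = []
        else:
            current.append(token)
    commands.append(current)

    nodes, links = [], []
    for index, tokens in enumerate(commands, start=1):
        if not tokens:
            raise ValueError("empty command in pipeline")
        node = {"id": str(index), "type": tokens[0]}
        args = tokens[1:]
        # Only handles '--name value' pairs; enough for this sketch.
        for name, value in zip(args[0::2], args[1::2]):
            node[name.lstrip("-")] = value
        nodes.append(node)
        if index > 1:
            # '*' stands for "match up whatever fields you can" (assumed convention).
            links.append({"from_node_id": str(index - 1), "from_field": "*",
                          "to_node_id": str(index), "to_field": "*"})
    return {"nodes": nodes, "links": links}
```

Calling `parse_pipeline('txt2img --prompt "a photo of a cat eating sushi" | upscale | show_image')` yields three nodes and two links.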
Once I've got the core done, I'll clean up old code. At some point after that, it'd be great to rebase on development (or more realistically, create a brand new branch and manually migrate my changes, since they mostly sit on top of the current code right now, and a manual copy would be less hassle).
Things I could use help on after that (or leading up to it, depending on the task):
- Building invocations (nodes)! I've done txt2img, img2img, upscale, gfpgan, and some image utilities. There's a lot left to do though. They're pretty straightforward though 🙂
- Breaking apart the Generate class (or at least the function calls). Ideally we have a function call per invocation. I've done what I can here, but wanted to keep my changes minimal until we work on merging efforts.
- More flexible configuration. I'd really like to set up configuration to parse to a well-defined object and utilize that during service definition. It'd be awesome if configuration could set up a machine to just run as a queue processor, for example, with the context manager replaced with remote calls to the hosting service.
Probably more things I'm forgetting.
Can you show features of nodes you already have and which ones you plan to do next? I’m trying to design them properly.
/ldm/dream/app/invocations in the branch contains all the current invocations. They're made available in the CLI automatically, and you can also read the OpenAPI doc when running the API to get the full schema for each.
I believe @tame remnant has at least prototyped reading the OpenAPI doc to generate nodes from.
Each node also has a stable type name, so if you wanted custom UI for a particular node, that should be possible.
We can also add additional properties to fields that the UI could pick up and utilize, if necessary. Preferably we minimize that to not mix too many UI details in the backend code, but it's possible.
@terse hill There are two ways to make a Node UI: automatically generated from the code that describes the invocation, and hand-crafted. Right now all nodes are generated automatically
can't check yet, I broke everything. Conda doesn’t start, although I installed it several times according to the manual. I’ll figure it out later.
I've got my UI to work with the current changes to invoker-framework, no need to change any code there. I'd like to submit a PR with a simple_prompt.py invocation - that ok?
I'm slowly working out parsing and inferring UI from the OpenAPI doc. Without any UI details in the invocation I need a lot of logic but it's certainly doable. If I could just add a uiHint dict like discussed, it becomes sooo much simpler. What if we have an Invocation file, and then another file that describes its UI? I will still get things working to infer almost everything tho.
re: TextToImageInvocation & ImageToImageInvocation - should model and progress_images be there? I feel like those are system settings that don't need to be in any node
One UI hint I think is required is which parameters should have a handle to be linkable. For example, if I want 'prompt' to require an input (maybe this isn't how it should be but just for an example), the schema needs to indicate this somehow.
If the Field is optional, the schema shows no default value. I can use this, but then what if we have some other field which is optional but also does not require a link? I think there needs to be a flag for fields that require input links
Hmm. Ctrl+C no longer kills the script properly.
I mean invoke.py --api
Another thing that is needed in the invocation somewhere - for string types, do I show a small HTML input element or a multi-line textarea. I'm going to update my fork's invocations with the minimum UI hinting needed. I'll make it an additional property of the field called ui and you can give me your feedback, Kyle
Can't sleep 😣. Yah I bet the threading killed Ctrl+c. Need to work that out (some todos in the code about that).
There are only really string types (text field), enum types (drop-down), and numeric (some with range requirements, mostly just minimums). Getting a good auto-UI setup may be a pain, but will make it easier for contribution later.
All non-required parameters are linkable. And required parameters can even be linked (but you have to provide a value directly as well).
Model is in there specifically since I want to support multiple models in the same node graph. There's a discussion thread discussing how it can be done. You can hide it for now, but I want to plan on it ☺️.
Progress images has always been a flag. I don't know how useful it will be in new UI, but I was going to add events for it (or at least step events like exist today).
I mostly am trying to avoid more nodes so I don't have more to refactor as I continue changing things around. I'll hopefully stabilize the code too, but I have some field handling and results serialization work to do, and that could be disruptive.
Ok gotcha re: model, that will be sweet
in the react UI i have progress images as system-wide. but I realize it's also a node thing. I'll give nodes an additional settings tab, based on props like model and progress images, so you can manage per node
im just working out how to parse the numeric stuff now. javascript only has 'number' as a type, it's kinda hilarious. you can't get a float or int or anything reasonable. just number.
besides that I can generate the UI from the invocation now, with some assumptions. some light hinting makes the UI much nicer tho
mind on invoke stuff keep ya up? hopefully you're fast asleep now tho
width and height need pydantic's multiple_of=64. seed needs ge=0 and le=4294967295 (numpy's max)
and I believe cfg scale is gt=0
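To spell out what those constraints mean, here is a plain-Python sketch. In the invocations themselves these would be pydantic `Field(...)` arguments (`multiple_of`, `ge`, `le`, `gt`); the function name and the `cfg_scale` parameter name below are just for illustration, and the "greater than zero" check on width/height is an extra assumption.

```python
MAX_SEED = 2**32 - 1  # 4294967295, the numpy maximum mentioned above


def check_generate_params(width: int, height: int, seed: int, cfg_scale: float) -> None:
    """Raise ValueError if a value breaks one of the constraints discussed above."""
    for name, value in (("width", width), ("height", height)):
        # multiple_of=64 (the positivity check is an added assumption)
        if value <= 0 or value % 64 != 0:
            raise ValueError(f"{name} must be a positive multiple of 64, got {value}")
    if not 0 <= seed <= MAX_SEED:          # ge=0, le=4294967295
        raise ValueError(f"seed must be between 0 and {MAX_SEED}, got {seed}")
    if not cfg_scale > 0:                  # gt=0
        raise ValueError(f"cfg_scale must be greater than 0, got {cfg_scale}")
```

Declaring the same limits on the fields also puts them in the OpenAPI schema, which is what an auto-generated UI would read.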
I think the question around nodes and parameters is an important one to ask - should parameters that don’t make sense except in the context of a specific node live outside the node or in it?
I wonder if, since front end is getting generated automatically, if it makes sense to have a “required” set of parameters displayed, and then a secondary set of “optional” parameters progressively disclosed
while the frontend can be automatically generated, it doesn't necessarily need to be - we can provide handcrafted UIs for modules where appropriate
but i do think it is pretty freaking cool to have it auto generate
hell yeah it is haha.
Sweeeet, UI auto-generated from the invocations, processing working
few things need fixing but it works
Does it strictly have to be a multiple?
Is numpy max a constant somewhere?
In numpy, yes. And the image dimension numbers must be mod 64 or things actually break.
fix for
PermissionError: [Errno 13] Permission denied: '/results'
https://github.com/diffubik/InvokeAI/commit/ee8a043d023ee981e0c92969b984878bc8faff00
fix for ctrl-c
https://github.com/diffubik/InvokeAI/commit/136c529c56067c1e6717d76a662d4bcd4a86a0c8
Thanks!
@misty cedar Here is generate.py Invocation with what I believe to be minimal functional UI metadata: https://github.com/psychedelicious/stable-diffusion/blob/react-flow-test/ldm/dream/app/invocations/generate.py how does that look to you
Generating: 100%
/usr/bin/xdg-open: line 881: www-browser: command not found
...
xdg-open: no method available for opening '/tmp/tmp3xtjpaxg.PNG'
I can't find the invocation for xdg-open (I assume it is just some generic open call) - does anyone know where it is?
I believe it's in ldm/dream/app/invocations/image.py @red yew
def invoke(self, services: InvocationServices, context_id: str) -> Outputs:
    image = services.images.get(self.image.uri)
    if image:
        image.show()  # <----- here
I'm a big fan of putting these sorts of developer options behind envvars
we could do sth like
if image and os.environ.get('INVOKE_SHOW_IMAGE') == '1':
    image.show()
much lower effort than adding it to cli args
that's how I re-added the low gpu patch back to the invoke branch https://github.com/diffubik/InvokeAI/commit/a4d5851580765cbd45cb2db2a7b8c04e1e946b55
I think the intention is ShowImageInvocation not a developer option, it's a node you use when you want to display an image
I guess it couldn't actually show the image on your system, based on your initial question?
yes - for some reason PIL was looking for a browser
or rather xdg-open was
but I don't care about that step anyway
Probably something to look into for us. Are you on a non-standard linux distro or something?
That'll be the issue. We'll just need to have some error handling for situations like this.
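For the error handling, something like the sketch below might be enough. It combines the env-var idea with a try/except around the viewer call. `INVOKE_SHOW_IMAGE` comes from the suggestion above; `try_show_image` is a made-up helper that accepts anything with a `show()` method, so it doesn't depend on PIL directly. Note that a viewer may just print an error (as xdg-open did here) rather than raise, so this only covers the cases where `show()` actually throws.

```python
import logging
import os

logger = logging.getLogger(__name__)


def try_show_image(image, env=os.environ) -> bool:
    """Show `image` if enabled; return True only if show() ran without raising."""
    if image is None or env.get("INVOKE_SHOW_IMAGE") != "1":
        return False                    # disabled, e.g. when serving the API remotely
    try:
        image.show()
        return True
    except Exception as error:          # no viewer, headless host, etc.
        logger.warning("Could not display image: %s", error)
        return False
```

Using `env.get(...)` rather than indexing means an unset variable simply disables the viewer instead of raising `KeyError`.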
```json
{
  "nodes": [
    {
      "id": "1",
      "type": "txt2img",
      "prompt": "A photo of a cat eating sushi"
    }
  ],
  "links": [
  ]
}
```
ok this returns a 200 with an `{ "id": ... }`
but how do I get the actual image?
Keep in mind that there's no 'xdg-open' on some platforms (no 'xdg' anything, for that matter)
yeah the image.show() call feels a bit out of place
Guessing you're checking out my branch? I just added image storage yesterday, and it's still pretty rough draft (was feverish all day).
The show image node is mostly for debug use, but also figured it would be nice for CLI usage. It's supposed to use the system's image viewer to display, and saves the image to a temp location before displaying. Not sure how that works on different systems though. I have a note in the invocation to not expose it for web UI ☺️
I believe there's a title value on Field that can be set - could that be used for display name?
Why requires connection? I think even image nodes won't require a connection soon.
We should probably have a UI constants enum somewhere with all the strings defined. That would help with consistency/refactoring/preventing typos.
what's the correct http request to get the png image data from a context?
There isn't one yet
(trying to keep the Todo list updated at the top of this thread)
let's pin the message 🙂
so I guess we will have an endpoint /api/v1/context/:id?
and /api/v1/context/:id/image.png ?
There's either no pinning in forums or I don't have the ability to pin.
Something like that. I'll expose the contexts at a /context uri. Images will either be under /images as a root, or /context/{id}/nodes/{id}/results/ or something like that.
Pretty sure I need a better name than "context" though. "session" is the closest I can think of, but it doesn't seem correct (and "job" isn't correct, since the idea is that you can continue adding to it)
@rocky pollen; are you able to pin #1024222732642177055 message ? I cannot...
Can try! Was just looking to see
@rocky pollen you rock
Ok looks like perms need to be looked at
I think contributors should now have pin perms
Yep I'm seeing pin/unpin now 🙂
Also edited the messages so the whole to-do list is its own message
Pseudo project solution ✅
Yeah I saw the title value on Fields - that's better.
Requires connection tells the UI to add a connection handle to that field and not attempt to display the value of it.
So would we need to extend Field into InvocationField and add ui to it, defining types and such?
I think just having an enum with standard values is probably fine
Just as a reference for users then? or can it provide editor hinting / runtime error messages
UI_SHOW_SCROLLBAR: "true" or something
Oh, gotcha.
I think I can generate a reference automatically from the TS types, either way I'll work on what you suggested and some documentation soon
@tame remnant This might be useful for UI. Can use more specific fields that seem to include more info: https://pydantic-docs.helpmanual.io/usage/schema/#json-schema-types
Alright contexts can now save (and load). Required some rather significant refactoring. Hopefully it should still be compatible if you've been using the openapi schema to generate UI.
I've added an initial contexts API. Expected usage is that you create a context (with or without a graph definition), which returns the context id (but doesn't execute the graph). You then have the option to append more nodes/links (you can also do this after execution). You'll then "invoke" the graph (either a single node at a time, or everything). This will update the context as it invokes (I haven't hooked up signaling yet).
Some notes:
- There's no way to delete a context. I'm not really clear how we'd handle that - should we delete all associated image results? What if you used them in other contexts? Without loading all contexts I have no way to know (I'm not using a database for contexts).
- There's no way to "reset" a context. Once you execute it, you can't go back. Similar reasons - how should we handle results? I think you'd probably want to just copy your graph and create a new context from it if you want to "reset" it.
- There is no way to remove a node from a context (same story).
- You can't link from a new node to any other existing nodes, for the same reason.
Seems like I should probably make it possible to remove existing nodes as long as they're unexecuted...
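A toy model of those rules, to make the intended semantics concrete: nodes can be appended at any time, executing is one-way, and removal is only allowed for nodes that have not run yet. The class and method names are invented for this sketch and are not the branch's API.

```python
class Context:
    """In-memory stand-in for a context: a growing graph plus results."""

    def __init__(self):
        self.nodes = {}      # node id -> node definition
        self.results = {}    # node id -> result, present only once executed

    def add_node(self, node: dict) -> None:
        if node["id"] in self.nodes:
            raise ValueError(f"duplicate node id {node['id']}")
        self.nodes[node["id"]] = node

    def mark_executed(self, node_id: str, result) -> None:
        self.results[node_id] = result   # no way back: there is no reset

    def remove_node(self, node_id: str) -> None:
        if node_id in self.results:
            raise ValueError(f"node {node_id} already executed; cannot remove")
        del self.nodes[node_id]
```

Because unexecuted nodes have no results yet, removing them sidesteps the "what happens to the images" question raised in the notes.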
it seems like this is developing into a virtual machine to some extent
is the goal that the node-based backend is the only http service for invokeai?
Still works mostly, but I'm not sure I understand the new way outputs work. Say I want an output with an ID or name of "my_string". It looks like I now need to create an output class extending BaseInvocationOutput, which has an explicit type, and give it the variable 'my_string'. Previously, I could just add an output in the invocation definition. Can I still do that somehow?
e.g. before, this worked:
```python
# both defined inside the SimplePromptInvocation class
class Outputs(BaseInvocationOutput):
    my_string: str

def invoke(self, services: InvocationServices, context_id: str) -> Outputs:
    return SimplePromptInvocation.Outputs.construct(
        my_string = self.my_string
    )
```
now, I need to make a MyStringInvocationOutput.py to accomplish the same thing (as far as I can tell) which is far less flexible
"state machine", I think
In any case, a Directed Graph (whether or not, I don't know)
I wasn't able to deserialize the outputs without a discriminator field (the type). I couldn't generate those automatically from inner classes without some brittle, complicated code. I realized that we only really had image outputs (and soon prompt outputs), and there wasn't much value in a single output type per invocation class.
You can still make new output classes wherever makes sense if they're needed. They should be able to derive from another output type and override the type field.
Fancy automatic stuff just didn't work out. 😣
And yah it's kind of an interactive directed graph (I use graphlib to help determine graph validity and execution order).
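As an illustration of how graphlib can give both validity checking and execution order, here is a minimal sketch. The `from_node_id`/`to_node_id` link keys and the `execution_order` helper are assumptions; only `graphlib.TopologicalSorter` and `CycleError` are the standard library's (Python 3.9+).

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(nodes: list, links: list) -> list:
    """Return node ids in an order where every link's source runs before its target."""
    sorter = TopologicalSorter({node["id"]: set() for node in nodes})
    for link in links:
        # add(node, *predecessors): the target depends on the source
        sorter.add(link["to_node_id"], link["from_node_id"])
    try:
        return list(sorter.static_order())
    except CycleError as error:
        # args[1] holds the list of nodes forming the cycle
        raise ValueError(f"graph has a cycle: {error.args[1]}") from error
```

A cycle makes the graph invalid, so the same call doubles as the validity check.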
will the node-based backend supersede backend/server.py?
I believe that's the plan - maintaining two backends would be a pain (it's already difficult enough to maintain an API and CLI). The frontend can be adapted to it though so it operates similar to the current frontend.
we'll have string, integer, float, boolean and image outputs
and many nodes will have multiple outputs
That's fine. Outputs will just need a type now
Might be useful to make some standard base types and utilize multiple inheritance to re-use some standard fields like image, prompt, etc.
I was also considering doing a history lookup to try to fill all parameters, but that can probably happen in the application layer too
How do I make a node with multiple outputs of the same type?
Just make a new output class with multiple fields
(or derive from a base one like ImageOutput and add some extra fields)
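To show why the `type` field matters for deserializing, here is a stdlib-only sketch. The real outputs are pydantic models deriving from `BaseInvocationOutput`; the dataclasses, the registry, and `ImageWithPromptOutput` below are stand-ins invented for illustration.

```python
from dataclasses import asdict, dataclass

OUTPUT_TYPES = {}  # type name -> output class, filled by the decorator below


def output(type_name: str):
    """Register a dataclass under a stable type name used as the discriminator."""
    def register(cls):
        cls = dataclass(cls)
        cls.type = type_name            # class attribute, not a dataclass field
        OUTPUT_TYPES[type_name] = cls
        return cls
    return register


@output("image")
class ImageOutput:
    image_uri: str


@output("image_with_prompt")
class ImageWithPromptOutput(ImageOutput):   # derive and add an extra field
    prompt: str


def serialize(result) -> dict:
    return {"type": result.type, **asdict(result)}


def deserialize(data: dict):
    fields = dict(data)
    cls = OUTPUT_TYPES[fields.pop("type")]  # the discriminator picks the class
    return cls(**fields)
```

Without the `type` key there would be no way to tell which class a saved result should be rebuilt as.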
alright so I will make base output types for each type now and make an invocation using em all to show you and make sure I'm doing it right
on a positive note, I think things are starting to stabilize
not too happy with how the images output/save, but I think it'll be okay
Ok, so this is my invocation: https://github.com/psychedelicious/stable-diffusion/blob/react-flow/ldm/dream/app/invocations/test_invocation.py and here is my output https://github.com/psychedelicious/stable-diffusion/blob/react-flow/ldm/dream/app/invocations/test_invocation_output.py
I get it now, last night when I looked at it I was confused
Specifying a default isn't necessary on outputs.
I think the ui hints for entire invocations should be in the schema_extra part instead of as a field (fields should just be used for inputs): https://pydantic-docs.helpmanual.io/usage/schema/#schema-customization (at the bottom of the page)
You're also free to put these both in the same file. My current understanding of Python is that you should group like functionality in a single file (as a module)
parses fine still just need to remove the type field from outputs
Thanks, I hadnt gotten around to reviewing the pydantic schema customization section yet
ah right
maybe I can customize the schema generation to not include type in the output
(though maybe it's helpful to you for ui?)
it's super cool that these just automatically work in the UI ♥
i can just filter out 'type', it's a reserved property anyways right?
yeah haha i love it
I already filter out 'id' and 'type' from the invocation itself
really interested if a "single node at a time" interactive mode (like the CLI) could work out in the UI
like the processing pauses while you choose what to do next?
Kind of. My thought was it could be like the current UI - except all the stuff on the left (parameter entry) would be parameters for the current node
(i havent used the cli yet)
and maybe you could click a bubble next to the input to automatically fill in from previous nodes
you should check it out 🙂
i should
there's also an API now for adding a single node with links
with some cool linking options like this: from_node_id: X, from_field: *, to_field: *
the * makes it match up all fields it can (by type and name)
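A guess at how that wildcard matching could work, as a sketch only: `expand_link` and the field-name-to-type-name dictionaries are hypothetical, and the actual matching rules in the branch may be different.

```python
def expand_link(link: dict, from_outputs: dict, to_inputs: dict) -> list:
    """Expand a link into concrete (from_field, to_field) pairs.

    `from_outputs` and `to_inputs` map field name -> type name.
    """
    if link["from_field"] == "*" and link["to_field"] == "*":
        # Keep only fields that exist on both sides with the same name and type.
        return [(name, name) for name, type_name in from_outputs.items()
                if to_inputs.get(name) == type_name]
    return [(link["from_field"], link["to_field"])]
```

A field that shares a name but not a type (say, a string `prompt` output against an integer `prompt` input) is left unlinked.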
Hmm. My thinking for recreating the current UI using nodes is we just enforce static arrays of nodes and links, and if you want more than that, you go to the node UI. You are talking about something between the two then, haven't considered it
Yah I built it for the CLI and then figured the UI could also use that for a simplified mode
And you could jump back and forth between the full graph
"interact from here" on the graph could bring you to the single node UI
maybe shrugs
would be a good way to expose new functionality for free in the simpler UI though
so like you do your processing in simple UI and click "I wanna do more" and it plops you into the node editor with the node view of what you have been using just a moment ago, am i understanding
Yah pretty much. The CLI is just building a node graph on a context behind the scenes and executing it every command
just appends to the current last node in history
itll take me some time to wrap my head around it but it sounds like we have a super badass tool in the works
no reason you couldn't go back in history and pick a previous node to branch off of
so interestingly, the frontend state management library just diffs prev state from new state and updates like that
i think you can save the diffs
and get free undo/redo
that kinda automatically takes care of going back thru history without needing the server to do anything,
maybe not tho
Well... undo isn't really a thing once you've executed a node
super ugly, but might help get the point across
not sure if the layout/flow would feel nice that way, but basically, do one operation (left panel), see result (middle), select next operation. Then the next operation controls replace the left panel and you continue iterating that way
oh haha yeah that gets the point across, i wasn't even able to imagine what you were suggesting before that image lol
and I guess if you stick on the same one maybe it just keeps chaining off the previous node
So if we ask ourselves how people might use this
Are they working on one concept at a time - e.g. exploring prompts and then stemming off of that for img2img loops etc
I.e., is this “project based”
so far I've had two different usages:
- Iterating on a concept to figure out how to reliably create things in a style (this may just be generation parameters, or it might be a chain of operations)
- Creating things I've already figured out how to create (e.g. I know how to make portraits in GTA style, so I'm just replacing the "subject" at the beginning, then running the generate + upscale + etc.)
and for either of those, it usually involves some seed exploration/repetition
The use case for me that is impossible in other UI types is “make a spaghetti junction of connections and tell it to do the thing 100 times and see what crazy stuff comes out”
Or the more traditional artistic workflow of slowly iterating on a single work
I think thats a novelty if it’s not something that integrates into workflows
arent we in the business of creating totally new stuff?
im taking inspiration from Modular Synthesizers: https://www.youtube.com/watch?v=6JeZR13dLLI
this is a graph where the objects being passed around are voltages
I mean yes, but new things that help people solve problems
anything can feed to anything else, and it just keeps going as long as you let it
Otherwise I’ve found we won’t have many users of said “things” 🙂
i'll argue that the generation of novelty is one of the key components of an expressive and useful universe
but that's maybe not so relevant 😛
I imagine that the ideal solution would allow for immense novelty AND be useful to pros looking for a better workflow
Right - the latter requires more thought on the UX and problems faced by “workflow”
The former just needs more nodes and unlimited flexibility
yeah, glad to have people like you helping out on the UX side else what i would make would end up looking like the video
I think the power in our solution is that the power users can generate novel things (either through the node graph, through new nodes in code, or a combination) and then share those with more common users
e.g. some of the huge upscale solutions right now are just "split up the picture, upscale each part, tape it back together" - that could be done with the node graph
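The bookkeeping behind "split, upscale each part, tape it back together" is small enough to sketch. The helper names are made up, and real tiled upscalers usually overlap the tiles to hide seams, which this version does not do.

```python
def tile_boxes(width: int, height: int, tile: int) -> list:
    """Split a width x height image into (left, top, right, bottom) boxes."""
    boxes = []
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            # Edge tiles are clipped to the image bounds.
            boxes.append((left, top, min(left + tile, width), min(top + tile, height)))
    return boxes


def upscaled_box(box: tuple, scale: int) -> tuple:
    """Where a tile lands in the output once each tile is upscaled by `scale`."""
    return tuple(value * scale for value in box)
```

In a node graph, each box would feed one crop node, one upscale node, and one paste node targeting `upscaled_box(box, scale)`.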
💯
maybe let people PR "recipes" or something
That or a community site for sharing them
where they've selected inputs to a large graph to expose in simpler UI, and outputs that matter
Is the “novelty” seeker an explorer of sorts?
And they’re feeding that back to “settlers” that figure out how to use those new things in their workflows
something like lexica for this would be cool
They just got 5m in funding lol
lol
I don’t know exactly how they monetize but they’ve got a lot of data so probably something like prompt data mining
Maybe that’s just me being pessimistic 🙂
Yah any large amount of data is valuable. And if it gets enough traffic even just advertising opportunity is valuable
@misty cedar I'm trying to get the app to do hot reloading, in api_app.py :
config = uvicorn.Config(
    "ldm.dream.app.api_app:app",
    host = "0.0.0.0",
    port = 9090,
    loop = loop,
    reload=True,
    reload_dirs=['ldm/dream/app/invocations'])
the terminal output says it is watching and will reload on changes:
INFO: Will watch for changes in these directories: ['/Users/spencer/Documents/Code/stable-diffusion/ldm/dream/app/invocations']
but when I make changes to say an invocation file, it does not reload, not even if I am patient
You'd have to reload almost everything to pick up the new ones
Anywhere that does a from invocations import *
(or any path to the invocations)
And probably split code out of API app
I guess that's what I want to do - as I'm fiddling around with the invocations and the UI, have the server reload every time I make a change
And set OpenAPI not to cache the result
It also works without the reload_dirs but I was trying to be more specific in case the directory setting wasn't recursive.
So uvicorn isn't the thing I want to hot reload then I guess
I have no idea. Due to the way the invocations are discovered I'm not sure what stuff would need reloading and what weirdness you'd run into
i can just use a shell script
kill and restart
ty
unfortunately due to how the threading is set up my approach isn't working, I think the signal indicating the process has died is never emitted. something like that.
Oh yah I haven't fixed the shutdown yet 😣
Sorry, been running through a debugger most of the time
Okay it should gracefully shutdown now
complains about something on API shutdown, but it at least shuts down
woohoo, I got socket.io prototyped on FastAPI. No time to do the full implementation tonight though x.x.
You are moving plenty fast mate
That's awesome though! and thanks for cracking at the shutdown thing
what debugger are you using?
little shell script to restart it on change works now 🙂
cool util I found to handle it all - entr. Pipe it a list of files to watch and it does the rest: ls ldm/dream/app/invocations/*.py | entr -r python scripts/invoke.py --api
I'm using vscode. Not sure what it uses for a debugger, but it works really well.
nice
I switched back to sublime after years of vscode, its a bit more effort to get things set up with it :/
The people who made sublimetext have a really nice graphical git client called SublimeMerge, it makes most of the git stuff understandable for me.
Ah nice. I added a few extensions to vscode for git
But I'm also used to the git cli...
I'll get there eventually, the GUI on this makes a lot of the operations clearer. Resolving conflicts is really smooth too.
Ok I thought i was dense because vscode has been giving me all kinds of hell with git lol
@misty cedar may i request friendly operationIds on the API, I think they are being auto-generated now: e.g. "invoke_context_api_v1_contexts__context_id__invoke_put"
The openapi-generator project you linked is great btw, thanks. I went ahead and wrote my own methods as an exercise to help me appreciate what is needed to do it right, then saved myself the pain and let the generator make everything 😅
yeah, they are being auto-generated, i see where they can be specified (in the router decorators i think is the right term)
Tried out the API and it works a treat! So if I want to just immediately invoke, should I create a context, wait for 200, and then invoke it?
Yah I haven't put a ton of work into the API docs, was just fleshing out functionality. It seemed to fill out the title for operations nicely though.
Yah I wasn't sure if I should add a query parameter to "invoke now" when creating from a graph. The pattern would be:
- create context
- subscribe with socket.io (not in yet)
- invoke context
I mean the generated schema autogenerates the "operationIds" and makes those really long names like "invoke_context_api_v1_contexts__context_id__invoke_put". You can add an operation_id arg to the route decorator, e.g. in context.py:
@context_router.post('/',
    operation_id = 'createContext', # <---
    responses = {
        400: {'description': 'Invalid json'}
    })
async def create_context(
and then the schema uses that as the operationId. Requesting this bc the openapi-generator generates the API code and types based on the operationId.
see https://fastapi.tiangolo.com/advanced/path-operation-advanced-configuration/#openapi-operationid
also curious of your opinion of socketio for handling the communication - it seemed really easy and effective to me but this is my first rodeo so maybe just HTTP is better? dunno
Naw sockets for notifications are good. Polling is generally bad.
socket.io seems to be less maintained, but maybe it's just really stable?
I looked at websockets last night, but we'd have to build channels and stuff, which is a pain
I don't know if socketio has a backend if you were to scale-out though =/
I've used signalr, but that needs .net for hosting
I got the impression that socketio was widely used in massive applications
There are a lot of different backend implementations/bindings for it, flask-socketio is the simplest one for flask I could find. there is the more agnostic python-socketio as well tho. both can use message queues and that stuff (not that I understand what I'm talking about 😛 )
Glad to hear i made a reasonable choice w/ the server i wrote, was kinda concerned I just grabbed something that looked nice but had issues
Alright, socket.io is in. Events are all defined in /ldm/dream/app/services/events.py. If you want to look at usage (and easier to understand events), run the API, visit /static/test.html, and press the test button 🙂
I also added a temporary endpoint for getting images. It uses a query parameter though, which I am not happy with. I'll need to do some work to fix that though.
Getting real close though. API needs some cleanup and I need to figure out iteration/join
and lots of code cleanup to remove the flask server I had built x.x
"Context" sounds really dry next to "Invocation". Isn't "Ritual" a cool word? A ritual is a series of invocations, executed with certain parameters, in a certain order, with a certain intent, but with some uncertainty in the result.
I keep leaning toward "Session". I originally was using context since I was going to pass it down to the invocations when they ran. That created a lot of issues though (since it also owned them, and Python really hates circular references), so I rearchitected things, but never changed the name.
Invocation made sense, since in addition to being a cool name, it describes what the object does.
(and on-brand)
Agree Session makes sense and fits more than Context, which sounds kinda technical
You can also continue editing the graph after running it, which is in-line with a session
made a new thread to talk about iteration
I want to make sure it's actually useful before I spend a lot of time on it
especially since it doesn't mesh well with how things are currently set up =/
(and also because iteration and metadata are really the only big areas missing... and if neither of them is super useful, then I can clean up and we can start integrating!)
okay context is renamed to session everywhere
I've also removed all of the flask and dependency-injection backend stuff
okay and image urls are much nicer now
really shaping up, awesome work
regarding invocation versions - i am thinking that the invocations themselves may change over time. say I contribute an invocation for cool thing X, and then later I add a feature to it or whatever. when I load my session, i need to load the right invocation
i dunno if there is a way to make the invocations a module based on a git repo, something like that... but then we are getting into diy package manager territory
As long as they're loaded before the API indexes everything, you can add more invocations from anywhere. It just looks for subclasses of the base invocation
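For anyone curious what "looks for subclasses" could look like, here is a minimal sketch. BaseInvocation and the three example classes are stand-ins invented for illustration, not the real classes in the branch, and the actual indexing code may work differently.

```python
from typing import Iterator, Type


class BaseInvocation:
    """Stand-in base class for illustration only."""


def iter_invocation_types(base: Type[BaseInvocation]) -> Iterator[Type[BaseInvocation]]:
    """Yield every direct and indirect subclass of base that has been imported."""
    seen = set()
    stack = list(base.__subclasses__())
    while stack:
        cls = stack.pop()
        if cls in seen:
            continue
        seen.add(cls)
        yield cls
        # Subclasses of subclasses count too.
        stack.extend(cls.__subclasses__())


# Hypothetical invocations; defining (importing) them is enough to register them.
class TextToImage(BaseInvocation): ...
class Upscale(BaseInvocation): ...
class FancyUpscale(Upscale): ...


names = sorted(c.__name__ for c in iter_invocation_types(BaseInvocation))
```

The only requirement is that a module defining an invocation is imported before the scan runs, which is the "loaded before the API indexes everything" condition.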
the issue is when you have a session that used invocation X version Y, but you have since updated and now invocation X has version Z with breaking changes
do we need to have subclasses and a version matcher?
if major version is the same, there are no breaking changes and it will still function as it did previously, but if major version is different, you need to load the same invocation version somehow
Is it worth that effort? Same thing with models to an extent - there might be a limit to how much we can/should track.
might not be worth it. can we easily get a hash-like representation of the invocation file? then we can at least say "This session appears to use a different version of the Cool Thing Invocation and may not function as expected."
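A hash of the invocation source file is easy to get with the standard library. This is only a sketch: the function name and the cool_thing.py file are made up, and it says nothing about where the hash would be stored in a session.

```python
import hashlib
import tempfile
from pathlib import Path


def invocation_file_hash(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's raw bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


# Demo with a throwaway file standing in for an invocation module.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "cool_thing.py"
    source.write_text("VERSION = 1\n")
    saved_hash = invocation_file_hash(source)

    source.write_text("VERSION = 2\n")  # simulate a later edit
    changed = invocation_file_hash(source) != saved_hash
```

Any change to the file changes the hash, including whitespace or comments, so this can only flag "different", not "breaking".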
I think a version number in schema_extra is easy enough to do. Checking can happen in the UI - "This Session uses Invocation version X, but version Y is installed. It may not work as expected." I can't imagine the CLI really using sessions much, so it wouldn't affect that client much - or am I mistaken?
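The major-version rule plus the warning text could be sketched like this. It assumes versions are dotted strings such as "1.2.0"; the function names are hypothetical, and reading the version out of schema_extra is not shown.

```python
from typing import Optional


def major(version: str) -> int:
    """Return the leading (major) component of a dotted version string."""
    return int(version.split(".")[0])


def version_warning(name: str, saved: str, installed: str) -> Optional[str]:
    """Return a warning if the major versions differ, otherwise None."""
    if major(saved) == major(installed):
        return None
    return (
        f"This Session uses {name} version {saved}, "
        f"but version {installed} is installed. "
        "It may not work as expected."
    )


same_major = version_warning("upscale", "1.2.0", "1.4.1")
different_major = version_warning("upscale", "1.2.0", "2.0.0")
```

This only covers the invocation's own version number; behavior changes in code below the invocation would not be caught.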
It wouldn't really reuse them
Could add a version, but things below the invocation could also change behavior, and that's tough to track
Git hash could work, but local changes would break that
And if you're sharing what you've made, then it probably shouldn't be expected to work for someone else unless you're using the same version of the master branch
suppose that's fair
I'm just trying to poke holes to ensure we plan for edge cases and future development
Yah. I mean, being able to share the basic generator related things like prompt, cfg, steps is super useful even between tools/branches. I collaborated with someone who was using the automatic UI successfully that way
absolutely
once we have the mvp working, we will gain a better perspective of which other features are needed
I think I'll rebase on development sometime soon. I may add a context class for invocations to utilize, but otherwise I think it's all about ready to go.
I actually love it but that’s just my vote for being “on brand” 😜
@misty cedar; I ran python scripts/invoke.py --api and loaded 'http://localhost:9090/static/test.html', hit the button, and it works. Cool. What other tricks can it do? 🤣
Haha. Try running without the API flag and be amazed by the new (automatically generated) CLI
The API is documented at /docs (or /redoc) too (except for the signals... Not a great way to document those)
I haven't written many nodes for it, but it's super easy to add functionality to. Just have to write the one file for the invoker and it automatically works in the CLI and Web API
Alright, I've "rebased" on current development (I branched from development into a new branch then merged into that - way easier given how far behind I was).
invoke-development
Please try that branch out. I'm still stuck on 3.8.5 until I do an environment rebuild, and that will probably break me for a day or so I assume. Sounds like some people were able to run it fine though, so hopefully there aren't any real changes needed.
If it works fine, I think it's ready for contribution 🙂
woohoo!
some issues:
- on the CLI, pressing arrow keys etc inserts control characters, I guess we want readline?
- txt2img --prompt "a cute dog" | show_image | upscale generates and shows the pupper, but upscale fails:
File "/Users/spencer/Documents/Code/stable-diffusion/ldm/dream/app/invocations/upscale.py", line 23, in invoke
image_list = [[self.image.get(), 0]],
AttributeError: 'ImageField' object has no attribute 'get'
- Ran txt2img --prompt "a cute dog", that worked, then ran txt2img --prompt "a cute dog" | show_image | upscale and got:
File "/Users/spencer/Documents/Code/stable-diffusion/ldm/dream/app/services/invocation_session.py", line 126, in add_invocation
from_node = self.invocations[node_id]
KeyError: '-1'
Ok, have the UI creating sessions and invoking!
- I expect a common use pattern is to load & connect nodes in the UI, invoke, then remove some nodes, invoke, add some nodes, invoke, change links, invoke, and so on. As far as I understand, this requires a new session each time I remove a node or change links. I suppose I can just keep appending, but then the session state kinda loses its sync with the UI state - the session will have a lot of extra nodes and links. Does it make sense to have an API method to replace a session's nodes and links entirely?
General use pattern:
- upon loading the UI, either resume/load an existing session from a list/session library or create a new one (and here I'm really tempted again to call Sessions "Rituals", the user library is the "Grimoire" and community preset library the "Arcaneum"...)
- While user is adding and connecting nodes, not much (nothing?) is sent to the server. When they click Invoke, the graph is sent and immediately processed. So again here the action that would make sense is "set-invocations-and-invoke-all", overwriting the session's nodes and links with the payload.
- While that is processing, user decides to pause, make changes, and resume. For this, I think the action would be "set-invocations-and-invoke-from-specific-node-or-link", something like that. I'm not sure how pausing at a certain point, modifying future nodes/links, and then resuming from there works on the back end....
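To make the "overwrite the session's nodes and links with the payload" idea concrete, here is a rough sketch. The Session class, its fields, and replace_graph are all hypothetical and much simpler than the real session in the branch; invoking and pause/resume are not covered.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Link = Tuple[str, str]  # (from_node_id, to_node_id), simplified for illustration


@dataclass
class Session:
    nodes: Dict[str, dict] = field(default_factory=dict)
    links: List[Link] = field(default_factory=list)

    def replace_graph(self, nodes: Dict[str, dict], links: List[Link]) -> None:
        """Replace the whole graph, rejecting links to nodes not in the payload."""
        for src, dst in links:
            for node_id in (src, dst):
                if node_id not in nodes:
                    raise KeyError(f"link references unknown node {node_id!r}")
        # Only swap state once the payload has been checked.
        self.nodes = dict(nodes)
        self.links = list(links)


session = Session()
session.replace_graph(
    {"a": {"type": "txt2img"}, "b": {"type": "upscale"}},
    [("a", "b")],
)
```

Validating before assigning means a bad payload leaves the previous graph intact, so the session and the UI stay in sync either way.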
I think that's right - (Also, I'm a fan of some opinionated language... 🙃)
Grimoire would be a very easy title for our creative guide...
From a UI perspective, one might build out and "Save" an invocation, or "Invoke" it. If a user were to Invoke, while that specific invocation has been executed, the UI should retain its node layout for modification/editing, and the UI would have the outputs available for use in the next invocation
There will also probably need to be a "Clear" canvas button which removes all current nodes and resets to default state
I also can't yet understand how we can work with different instances of nodes. For example, three "generate" nodes in some places at the same time — possible? How about a few prompts? What if the number of nodes increases to 20-30? Should we limit the total number? Should we add a warning about long generation times, or tell approximate creation times?
All I can imagine is a linear path of nodes with image outputs on some of them, different versions of images.
My thought would be that N+1 generation nodes would output images in the sequence generated (mvp) and potentially handle displaying as a grid (future date)
Multiple prompts - Either
A) could be concatenated if they fed into the same input
B) if we want to keep a 1 output / 1 input node connector constraint, the prompt node could have an input that accepts text and appends the text input in that node to the text that is being input
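Option B could look roughly like the sketch below. PromptNode is a made-up class, and both the ordering (incoming text first, then this node's text) and the ", " separator are assumptions, since the thread doesn't settle either.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PromptNode:
    text: str
    upstream: Optional["PromptNode"] = None  # the single text input, if connected

    def output(self) -> str:
        """Return the incoming text (if any) followed by this node's text."""
        if self.upstream is None:
            return self.text
        return f"{self.upstream.output()}, {self.text}"


subject = PromptNode("a cute dog")
styled = PromptNode("oil painting", upstream=subject)
combined = styled.output()
```

Each node still has one input and one output, so longer prompts are built by chaining nodes.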
@tame remnant I fixed upscale/restore (I think - my environment is pretty broken at the moment). Forgot to convert it to use the image service 🙂
Seems from the discussion like it might be useful to have an "invocation" that's separate from the defined graph. I'm not yet sure how that meshes with continuing a previous run though (e.g. chaining from previous nodes so you don't have to run everything again to e.g. upscale).
I currently don't have any support for connections where the receiver is a List of N. I considered it, but there are lots of questions, like if order matters, what collections to support, etc.
The graph might currently let you connect multiple outputs to one input. If it does, that's a bug 🙂
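A check for that case could be sketched as below. The link tuple layout and the function name are assumptions for illustration; the real graph stores links differently.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# (from_node, from_field, to_node, to_field), a simplified stand-in for a link
Link = Tuple[str, str, str, str]


def inputs_with_multiple_links(links: Iterable[Link]) -> List[Tuple[str, str]]:
    """Return every (node, field) input that is the target of more than one link."""
    counts = Counter((to_node, to_field) for _, _, to_node, to_field in links)
    return sorted(target for target, n in counts.items() if n > 1)


example_links = [
    ("gen1", "image", "up", "image"),
    ("gen2", "image", "up", "image"),  # second link into the same input
    ("gen1", "image", "show", "image"),
]
conflicts = inputs_with_multiple_links(example_links)
```

One output feeding several inputs is still allowed here; only inputs that receive more than one link are reported.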