#🧨 diffusers
Love the emoji prompts
Makes me think we should do a promptcraft challenge with only emoji LOL
I can't keep up with promptcraft without emoji! But that is a fun idea.
Great progress! I see there are a few PRs that are requesting my review. I hope to be able to review and test Sunday morning. (Aside: it's a busy weekend for me because my choir is singing "Lord of the Rings - In Concert" in three performances, each of which is 3 hrs long. A real marathon.)
it's particularly the inpainting model PRs I've been tagging you with
@worldly cloak I'm going to bug you to mark this as OK since I don't see a nice workaround given the code base. https://github.com/invoke-ai/InvokeAI/pull/2414
oh right, I meant to get back to that one too
Can't bug you more tonight since I'm going out for dinner and then a roller skating party for a friend's kid's B'Mitzvah at 9:30PM (!!!).
go on, go on, everyone has cooler plans for their Saturday than I do, I get it 😉
At 9:30 I am typically not out and about!
What's happened to inpaint replace? I don't see where that option is hooked up, and I thought we were making strength go to 1 and that would be inpaint replace...?
I addressed your latest comment.
I also noticed that hires fix seems to crash when using the inpainting model - unrelated to my PR - so that's something to fix once my PR is merged in.
It should be a non blocking issue for an RC. Txt2img gen with inpainting model is an edge case I’d think
So on huggingface, i just noticed that someone from the diffusers team replied to a ckpt model w/ "
Your model repository seems to contain a stable diffusion checkpoint. We have just merged a PR in diffusers which makes scaling_factor (the magic 0.18215 number) a config attribute of vae.
We strongly recommend that you merge this PR to make sure your model keeps working correctly with diffusers. "
Looks like this edits the config.json of the vae with "scaling_factor": 0.18215,
I presume since we're routing ckpts through non-diffusers code, it's irrelevant, but wondering if we'll need to do some smart mgmt if a diffusers model comes in without this?
(diffusers might already be handling this, but am curious how changes like this are being managed)
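For what it's worth, here is a minimal sketch of the "smart mgmt" idea: read `scaling_factor` from the VAE's config.json and fall back to 0.18215 when the key is missing. The helper name `read_vae_scaling_factor` and the directory layout are assumptions for illustration, not anything diffusers or InvokeAI actually ships.

```python
import json
import tempfile
from pathlib import Path

# The "magic number" quoted above, used as the fallback value.
DEFAULT_SCALING_FACTOR = 0.18215


def read_vae_scaling_factor(vae_dir):
    """Return scaling_factor from <vae_dir>/config.json, or the default if absent.

    Hypothetical helper: the name and layout are assumptions for this sketch.
    """
    config_path = Path(vae_dir) / "config.json"
    config = json.loads(config_path.read_text())
    return config.get("scaling_factor", DEFAULT_SCALING_FACTOR)


# Demo with two throwaway VAE directories: one without the key, one with it.
with tempfile.TemporaryDirectory() as tmp:
    old_vae = Path(tmp) / "old_vae"
    new_vae = Path(tmp) / "new_vae"
    old_vae.mkdir()
    new_vae.mkdir()
    (old_vae / "config.json").write_text(json.dumps({"latent_channels": 4}))
    (new_vae / "config.json").write_text(
        json.dumps({"latent_channels": 4, "scaling_factor": 0.18215})
    )
    old_value = read_vae_scaling_factor(old_vae)
    new_value = read_vae_scaling_factor(new_vae)
```

Both directories resolve to the same factor here, so a model that arrives without the new key would still behave the same under this kind of fallback.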
Agree. I still want to look into it if it's an easy fix.
JPPhoto-fix-inpainting-after-fix-hires-fix sounds like a great branch name.
☝️ already have a PR for inpainting highres fix
Cool! I think I addressed the last of your requested changes in my PR, please have a look.
One other outstanding item is that it no longer uses DDIM for the 2nd pass and will use the sampler you selected. I think.
But that's not related to either of those PRs.
is that a bug?
I don't know what the behavior should be. If you're using a *_a sampler, the upscaled image will not resemble the initial one much at all. If you're using one of the others, it's probably fine.
I imagine ddim was chosen initially for its consistent results.
I assumed it was an artificial constraint from k-diffusion not implementing img2img for most samplers
Sure! Could be that. No idea. So it's worth investigating, I think.
And if there's no reason to have that message about DDIM in the output, we should remove that.
>> Interpolating from 512x512 to 768x768 using DDIM sampling
That should become >> Interpolating from 512x512 to 768x768
Do the docs need to mention getting and installing libcudnn8 as a requirement? I just hit an annoying delay installing in a WSL2 Ubuntu container because that isn't included with the CUDA 12.0 packages.
This cuDNN 8.7.0 Installation Guide provides step-by-step instructions on how to install and check for correct operation of NVIDIA cuDNN on Linux and Microsoft Windows systems.
And WSL2 users will also have to do some symlink magic to get things working, but we are likely a very small minority of InvokeAI users.
Hi all, despite best intentions I was unable to get any work done on InvokeAI. This is what I was doing instead. First pic is the audience view of the orchestra and choir, and the second is the view from the orchestra. 3,500 audience members x 3 performances - you can only see the lower half of the audience in this photo. I'm the tiny dot in the second row of the choir on the right.
I'm hoping to complete merging in the pending PRs tomorrow and get a release candidate out Monday or Tuesday morning.
Wow!!! What a show! Wish I could have seen it
oh, maybe that's what's caused some of the issues finding that library. CUDA Toolkit 12 is a recent release; current PyTorch assumes CUDA 11.7
libcudnn8 works well with 12.0 as far as Invoke is concerned, but it needs to be installed separately and that's a pain.
Can I get a few testers on this patch for fixing thresholding in diffusers? "Good" values seem to depend a lot on the sampler you're using, which isn't surprising but will be fun to document. I've been happiest with ddim.
@worldly cloak This was the only way I could get it hooked up and working as I imagined the output should look.
I'd love testing on both diffusers and ckpt models if possible. I don't know if it works with ckpt models and k_samplers at all or if they follow a different code path for thresholding entirely.
ckpt models don't touch anything in that class, no
is it expected that selecting k_dpmpp_2_a has no effect with a diffusers model? console says "Unsupported Sampler: k_dpmpp_2_a Defaulting to PNDMScheduler"
also, based on the fast latents display k_dpm_2_a seems to be only injecting new noise every second step, vs k_euler_a which injects noise every step
ah i see there's no entry in the map
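Roughly what "no entry in the map" means, as a sketch: a dict lookup with a fallback that returns the message so a caller (CLI or web UI) can surface it. The map contents and the `resolve_scheduler` name are illustrative assumptions, not the actual InvokeAI table.

```python
# Hypothetical sampler-name -> scheduler-class-name map; entries are illustrative only.
SCHEDULER_MAP = {
    "ddim": "DDIMScheduler",
    "k_euler": "EulerDiscreteScheduler",
    "k_euler_a": "EulerAncestralDiscreteScheduler",
    "plms": "PNDMScheduler",
}
DEFAULT_SCHEDULER = "PNDMScheduler"


def resolve_scheduler(sampler_name):
    """Return (scheduler_name, warning). warning is None when the sampler is mapped."""
    if sampler_name in SCHEDULER_MAP:
        return SCHEDULER_MAP[sampler_name], None
    warning = (
        f"Unsupported Sampler: {sampler_name} "
        f"Defaulting to {DEFAULT_SCHEDULER}"
    )
    return DEFAULT_SCHEDULER, warning


mapped = resolve_scheduler("k_euler_a")
unmapped = resolve_scheduler("k_dpmpp_2_a")
```

Returning the warning instead of only printing it is a design choice for the sketch: it gives a front end something to show the user rather than silently switching schedulers.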
I think a big issue with thresholding is that each sampler has its own noise schedule and the thresholding has no knowledge of that schedule. Not sure there's anything we can do about it, but that's why it behaves differently with each sampler.
@worldly cloak Did your inpainting fixes make it into main?
not yet, https://github.com/invoke-ai/InvokeAI/pull/2447 is still awaiting review
I'm happy to give it the blind review, but I figure this probably merits better than that.
I've just tested out the inpainting fixes and they make a world of difference. Thank you! I'll keep my end of the bargain and modify the model manager so that ckpt files are converted into diffusers on the fly using @tardy sparrow 's fix unless the user provides a --no-ckpt-convert option.
Later we'll remove this argument and the whole ckpt generation code path.
We might consider some midpoint release before nodes that formalizes the conversion
@tardy sparrow I just tried to convert a few safetensors files using the code that is currently in huggingface diffusers, and they all failed with the error:
Missing key parms
full_key: model.parms
object_type=dict
The code that I derived from the older huggingface conversion module still works, so I'm just going to skip the last step of dumping the converted file out, as per your PR.
@rose sentinel Have you done any work on resuming TI?
If not, I think it should be removed from the UI and CLI.
Resuming is now working. It was a problem in upstream. Update your diffusers to 0.12.1 and all will be well.
In-memory conversion to diffusers is almost there. I just need to patch in the proper VAE file if the ckpt specified one. https://github.com/invoke-ai/InvokeAI/pull/2468
If invoke.py is started with the --ckpt_convert argument, legacy checkpoint files will be loaded as diffusers in memory without saving them to disk.
This is not quite working; I'm getting poor ...
The merge frontend needs a small fix. Originally I had three different checkbox lists, one for each model that could be merged. Then I changed it to a single checkbox list from which you could select up to three models to merge. However, this creates ambiguity when applying weights to the second and third model - how do you know the merge order?
So now I'm going to have two checkbox lists. One that selects the base model, and the other that selects up to two models to merge in with the specified weighting.
So here is my checklist prior to making the release candidate:
- merge and document the new installer (which is really nice)
- fix the merge front end - this will be quick
- write the release notes documenting all the wonderful new stuff we get with diffusers
- (not strictly necessary) fix vae loading in ckpt converter
Any other must haves?
Minor thing, but the manual install instructions never tell you to cd InvokeAI.
Don’t you need to know the difference between second and third as well in the case of a differential merge?
The way the diffusers pipeline is written, both the 2nd and 3rd get merged with the same weights. Also, using a 3rd model forces a particular merge algorithm called "difference." I'm sure you know more about this than me.
I don’t think so… but we probably need to carefully merge to main during RC period aye?
I thought diffusers supported different merge modes?
With 3rd model forcing “difference” don’t you need to know which is the third?
I'm running 0.12.1. I tried resuming a completed training by setting a higher number of training steps and it exited before training - like before.
They support a bunch of algorithms, "linear", "sigmoid", "difference", etc. But the docs and code say that only difference can be used with more than 2 models. No distinction between the second and third in the code as far as I can see. I might be wrong. Someone who is familiar with merging should pipe in.
Huh. I did a test after interrupting a previous run and it resumed. Maybe you need to set a very high # of steps at the beginning.
I had 3000 and I upped it to 4000 prior to resuming. But again, this was after a completed run so maybe that isn't supported as part of resuming since it's not technically resuming?
Ah - In that case, makes sense. W/ the second and third, are you capturing which is selected 3rd? My understanding is that the "difference" model merge is effectively doing 1 + (2 - 3)
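A toy sketch of why the order matters, assuming the merge really is 1 + (2 - 3) as described above. Plain floats stand in for weight tensors, and `merge_add_difference` plus the `alpha` scaling are assumptions for illustration, not the diffusers pipeline's actual code.

```python
def merge_add_difference(base, second, third, alpha=1.0):
    """Key-by-key base + alpha * (second - third).

    Hypothetical helper; real checkpoints hold tensors, not floats.
    """
    merged = {}
    for key, base_value in base.items():
        merged[key] = base_value + alpha * (second[key] - third[key])
    return merged


model_1 = {"w": 1.0}
model_2 = {"w": 3.0}
model_3 = {"w": 2.0}

in_order = merge_add_difference(model_1, model_2, model_3)  # 1 + (3 - 2)
swapped = merge_add_difference(model_1, model_3, model_2)   # 1 + (2 - 3)
```

Swapping the second and third models flips the sign of the difference term, so a UI that sorts them alphabetically can change the result.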
Uh oh. The second and third are being presented in alphabetic order. I'll have to go to the three checkbox list again, in that case. Didn't like how it looked -- used a lot of screen space.
Where is the GUI to merge models?
It's more of a "Command Line GUI"
ah right. cool!
We lacked the, ah... firepower to make a proper front-end merging experience 🙃
i thought @rose sentinel might be writing javascript 😛
he is indeed a man of many talents
I haven't written javascript in many years, but I did once write a full genome browser in raw javascript.
The "GUI" uses a curses-based forms library. It isn't beautiful, but a lot better than typing out commands.
Looks like this:
The in-memory conversion of checkpoint models into diffusers is now ready for review. I refactored the code a bit, and added the needed feature of checking whether the checkpoint config had an explicit vae assigned and if so loading and incorporating that vae into the generated diffuser model. It seems to be working fairly well. https://github.com/invoke-ai/InvokeAI/pull/2468
If invoke.py is started with the --ckpt_convert argument, legacy checkpoint files will be loaded as diffusers in memory without saving them to disk.
Internally this uses the huggingface convert_ldm...
If you wish to test, start invoke with the --ckpt_convert argument. Then load a checkpoint model that is defined in models.yaml
Related features:
- In the CLI, the `!convert_model` command will take the path to a checkpoint file, convert it into a diffusers model, and add that model to `models.yaml`.
- Similarly, the `!optimize_model` command will take an existing checkpoint config, convert it into diffusers, and replace the entry in `models.yaml`. Both commands will offer to delete the original .ckpt or .safetensors file after successful import.
- Caching of models to CPU seems to be working with the in-memory conversion. However, be aware that if the cache is flushed and the model needs to be reloaded, it will go through the conversion process again since the diffusers model is never written to disk.
I’d argue that this is indeed quite beautiful. I’m a big curses fan 🙂 cool!
Awww shucks. @dire gazelle has done something similar—but in colour!— for the installer
@dire gazelle installer is art, convince me I'm wrong!
I agree, the textual inversion and merging TUIs are excellent! Works equally well locally, over SSH, and in a docker exec.
Haha, thanks 😊. The credit really goes to Will McGugan's excellent rich library.
I already have a similar dress-up of the configure flow pretty much ready in a local branch, but it will have to wait until after 2.3
@rose sentinel Sorry I haven't had time to look at your PR today. Been dealing with a corrupt mysql database.
I had a lengthy conversation with a 3d artist today (SPYBG) who's using these tools professionally. he has significant concerns about his own models having degraded quality in diffusers formats. I'm testing them to evaluate, but at the very least there's a perception that diffusers models produce worse output than ckpt.
Secondly, he's very bullish on LoRA embeddings being the way of the future - They appear to be .safetensors files, and I'm unsure how (if we even can) to get a LoRA embedding to load.
initial tests don't seem to have remarkably worse quality w/ diffusers. think this might just be perception.
I've been just as happy/frustrated with diffusers.
😏
I've noticed that with some schedulers (plms particularly) there are "glitches" in which a certain object in the image is filled with latent noise. Next time I see this I will record the prompt, seed and scheduler so that it can be reproduced.
I've seen similar things (particularly on faces) using PNDM and, to a lesser extent, DPM Solver in the coreml stuff (which is based on diffusers), fwiw. The only solution I found in that case was keeping the cfg scale low. I only use 5-6 there. Those are the only two samplers available in that software, so I haven't been able to test others. I'm getting great results with diffusers models though, and have verified that I can get identical results using diffusers and ckpt versions of the same model, once the VAE issues are sorted out. I suspect spybg might have this impression because he's used to using a1111, and it doesn't support diffusers. Trying to use prompts from a1111 in invoke (or anything else) will produce different results. These results will likely at first seem inferior if you've put effort into engineering your prompt to work with a1111, but I almost universally find that once I engineer a prompt to work with invoke, the results are (subjectively) as good or better, though rarely the same. I've come up with styles in a1111 that I can't replicate in invoke, and vice versa.
I've done head-to-head comparisons of a ckpt version of a model and its diffusers version as well, and the results are identical. Maybe the ckpt models glitch too and I just haven't noticed? It is interestingly specific. A few days ago I was trying to inpaint a paintbrush into an artist's hand, and the brush glitched consistently.
Changing to ddim fixed the issue.
In any case we're going to get some pushback on the migration to diffusers, and I'm glad we kept interim support for loading checkpoints so that we can give the unhappy users a solution -- even if it is just a placebo.
Just closing the loop on this, I tracked this issue down to what happens when there is inconsistent information in models.yaml - the model stanza indicates that it is a checkpoint model but weights are provided rather than the repo_id or path. I put a check for this into generate.py so that next time it happens there will be an informative error message.
I wouldn't schedule LoRA as a 2.3.0 feature, but there's been a lot of LoRA + diffusers activity over the past month. I think we'd find good support for it if you want to put it on the roadmap.
As for general diffusers model "quality" concerns: Stable Diffusion, being half chaotic system and half esoteric training details, is a perfect spawning ground for superstitions. But there are some areas where there are legitimate differences.
Primarily around the samplers/schedulers. We have what should be equivalent implementations for a lot of them, but there are some edge cases we haven't been rigorous about. Things like when a scheduler is using a "Karras schedule," and I'm suspicious some of the schedulers will interact with how img2img strength works.
If you do find a case where you're getting consistently worse results, please do file an issue with info to reproduce. It's probably fixable.
ddim FTW!
For me, in my head, I've been working off a fairly consistent belief that dpmpp_2_a tends to be better for more detailed/complicated imagery, like a building, sketch, or an object. But I reach for euler_a for people and more fanciful (?) stuff. For lack of a better description.
No idea if that lines up with the intention, but it sure feels legit most of the time. 😩 (And both have a slightly different cfg scale "range".)
I hope to hell I'm not inheriting the equivalent of a family member's idiotic notion that rubbing the handle on a slot machine "increases your luck". 🎰
I do try to stay aware of that kind of thing, at least. 😅
Could be some truth to it if you wear fuzzy gloves. Give the machine a lil static zap, try to flip some bits in there
pulses the transmit of a high power CB next to it
So did anyone have any feedback on the patch I posted here for thresholding?
Worth squeezing in a PR before 2.3?
probably
i dont think its going to break anything except thresholding
and seemingly it does the opposite of break it 😛
Well that's encouraging.
testing it
very nice
And it depends on the sampler too.
I put in the random noise when it hits the threshold but this can be changed down the road if we hate it.
when you do this try reproducing in the compvis code path with --karras-max on 0 (or 1)
when i saw such things while testing mps they were reproducible that way
which would indicate that Karras scheduling avoids them
They appear to be .safetensors files, and I'm unsure how (if we even can) to get a LoRA embedding to load.
It's actually pretty simple to use LORA embeddings with diffusers (https://huggingface.co/docs/diffusers/training/lora#inference).
Yes, just dont know if its been built into 2.3 yet 🙃
new thread for that: #1071154596787003433
Yep with diffusers 0.12 (officially released last week) they've added good LORA support
Ironically, with the diffusers models, k_dpmpp_2_a is not implemented. If you select it you will get whatever scheduler was active previously. This is a web bug, because you get no warning of this action.
Hah! Good to know. I tend to hang out with the Mega model most of the time, so I've dodged that one. 😉
"diffusers" in this context refers to the implementation (and associated storage format), not a specific set of pretrained model weights. Coreco's MEGA model absolutely works with that implementation, so you may not be as dodgy as you think.
Does the CoreML stuff "just work" w/ 2.3 or is there something that needs doing beyond supporting Diffusers?
nope, that'll be some further project
The coreml stuff is still pretty limited. It's fast, but nobody has figured out how to get it to work without baking the output image size into the model. This kinda makes it a non-starter for projects like invoke until those problems can be solved.
Well, with diffusers support, I suppose CoreML is one step closer. M1/M2 moar speed is always a welcome thing.
Yeah, the code is a lot closer to what's needed to run coreml stable diffusion now.
```
>> An error occurred:
Traceback (most recent call last):
  File "/home/jsp/InvokeAI/ldm/invoke/CLI.py", line 158, in main
    main_loop(gen, opt)
  File "/home/jsp/InvokeAI/ldm/invoke/CLI.py", line 388, in main_loop
    gen.prompt2image(
  File "/home/jsp/InvokeAI/ldm/generate.py", line 515, in prompt2image
    results = generator.generate(
  File "/home/jsp/InvokeAI/ldm/invoke/generator/base.py", line 75, in generate
    make_image = self.get_make_image(
  File "/home/jsp/InvokeAI/invokeai/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/jsp/InvokeAI/ldm/invoke/generator/inpaint.py", line 218, in get_make_image
    raise ValueError(f"Non-supported infill type {infill_method}", infill_method)
ValueError: ('Non-supported infill type None', None)
```
This is with -tm on the command line. Can anyone confirm with the latest?
@rose sentinel I don't think this is related to your PR, but this should probably be fixed before #1064343623581323324...
Yes, I'll take care of it.
I have a patch but I don't think it's working right - can't drive the model to actually inpaint.
User error, seems OK now. Want the patch?
Please!
```diff
index c421a528..a72672cb 100644
--- a/ldm/generate.py
+++ b/ldm/generate.py
@@ -372,6 +372,7 @@ class Generate:
embiggen // scale factor relative to the size of the --init_img (-I), followed by ESRGAN upscaling strength (0-1.0), followed by minimum amount of overlap between tiles as a decimal ratio (0 - 1.0) or number of pixels
embiggen_tiles // list of tiles by number in order to process and replace onto the image e.g. `0 2 4`
embiggen_strength // strength for embiggen. 0.0 preserves image exactly, 1.0 replaces it completely
+ infill_method // type of infill to apply (defaults to the first method available in inpaint.py)
To use the step callback, define a function that receives two arguments:
- Image GPU data
@@ -404,7 +405,8 @@ class Generate:
self.log_tokenization = log_tokenization
self.step_callback = step_callback
self.karras_max = karras_max
- self.infill_method = infill_method or infill_methods()[0], # The infill method to use
+ infill_method = infill_method or infill_methods()[0] # The infill method to use
+ self.infill_method = infill_method
with_variations = [] if with_variations is None else with_variations
# will instantiate the model or return it from cache
```
I don't know why we're holding onto the infill method as self.infill_method but who am I to debate such things?
+ infill_method = infill_method or infill_methods()[0] # The infill method to use
+ self.infill_method = infill_method
possibly the reason this is messing up downstream is the trailing `,`, which is telling python to make a tuple containing infill_methods()[0]
so you don't have to split this line up - just do
self.infill_method = infill_method or infill_methods()[0]
without the trailing comma
`a = 1` in python assigns 1 to `a`
whereas `a = 1,` assigns a tuple[int] with the value `(1,)` to `a`
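A quick runnable illustration of the trailing-comma point, using a stand-in for `infill_methods()`. The `first_available` function and the "patchmatch" value are placeholders for this sketch, not the real return value.

```python
def first_available():
    # Placeholder standing in for infill_methods()[0]; the value is illustrative.
    return "patchmatch"


infill_method = None

# Trailing comma: the whole right-hand side becomes a one-element tuple.
buggy = infill_method or first_available(),

# No trailing comma: a plain string, which is what downstream code expects.
fixed = infill_method or first_available()
```

The comma binds more loosely than `or`, so the buggy line evaluates the `or` expression first and then wraps the result in a tuple.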
I fixed that, too.
But it wasn't/isn't referencing self.infill_method in the call to generator.generate; rather infill_method.
So that's what I fixed.
Make self.infill_method and infill_method the same thing, the text without , at the end.
Thanks. Committing the fix now.
What's the process for converting inpainting based models? The one I converted seems to be failing .. einops does not have <something> error .. I'll post the full error and repro steps soon
I just tested this and the diffusers checkpoint conversion pipeline doesn't know what to do with inpainting models because of the increased number of channels. The error is:
size mismatch for conv_in.weight: copying a param with shape torch.Size([320, 9, 3, 3]) from checkpoint, the shape in current model is torch.Size([320, 4, 3, 3]).
I imagine this won't be too hard to fix, but I can't work on it immediately. Tagging @forest spade and @worldly cloak so they are aware of the issue.
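For anyone following along, the 9 vs. 4 in that error is the input channel count of the UNet's first conv layer: an inpainting UNet takes 4 latent channels, 4 masked-image latent channels and 1 mask channel. A small sketch of detecting that from the weight shape; the helper names are made up for illustration.

```python
def unet_in_channels(conv_in_weight_shape):
    """Conv2d weights are laid out as (out_channels, in_channels, kernel_h, kernel_w)."""
    _out_channels, in_channels, _kernel_h, _kernel_w = conv_in_weight_shape
    return in_channels


def looks_like_inpainting_model(conv_in_weight_shape):
    # 4 latent + 4 masked-image latent + 1 mask channel = 9
    return unet_in_channels(conv_in_weight_shape) == 9


checkpoint_shape = (320, 9, 3, 3)   # shape reported for the checkpoint
expected_shape = (320, 4, 3, 3)     # shape the converter built by default
```

A converter could use a check like this to pick the channel count before building the UNet, instead of failing at load time.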
@worldly cloak I'm afraid that the diffusers version of the inpainting-1.5 model is crashing on both txt2img and img2img. inpainting is still working great, though. https://github.com/invoke-ai/InvokeAI/issues/2541
@worldly cloak I found the bug in the perlin noise generation and patched it. When you have a chance, could you review PR #2544? Thanks.
Yes you can easily fix this by setting this fn argument: https://github.com/huggingface/diffusers/blob/3a0d3da66fe5fcb72129a0a6a8a26d5e35a1ee5a/src/diffusers/pipelines/stable_diffusion/convert_from_ckpt.py#L787 to 9
Hope that helps!
That's great, thanks! @forest spade
If anybody has an NVIDIA card, please help test PR 2535. I think the math is right now but I need to see how it operates on various cards with various image sizes. https://github.com/invoke-ai/InvokeAI/pull/2535
I've done 1280x1280 on my 12GB card without a problem. It does die at 1536x1536 in decode_latents() which is unrelated and should be looked at in another PR.
I've also tested with fp16 and fp32, so please do the same.
SD 1.5 or 2.1? which would be more valuable to test first?
Both
any particular sampler? do you care about performance or ability to gen without crashing at/beyond a certain size? (performance will suck in my case at the moment)
Yeah, just want to confirm that it works and that people can generate large images without crashing.
I can work on optimizing performance next.
I'm just wondering if this is worth it vs. using 'max' as we have to clear the cache and that definitely has a performance impact.
So far it seems better to use as many slices as possible rather than being smart and forced to clear the cache.
Btw, I've been using attention slicing with diffusers on MPS (in https://github.com/whosawhatsis/mps_diffusers). On MPS, it takes the RAM requirements way down with almost no performance hit. I don't see a reason not to always set it to max on MPS.
From what I read, there is a 10% performance hit doing 'max'. But our clever version certainly is dropping more than that!
I shouldn't say clearly. I'm still testing.
>> 1 image(s) generated in 1697.12s
>> Max VRAM used for this generation: 16.66G.
SD1.5, euler, hi-res fix
GPU: Tesla P40 / 24GB. Still really shitty cooling, so getting thermal-throttled the hell out of it
I had been able to generate 2048x2048 in the past, but it would always hover around 22GB RAM. This was pre-xformers though.
I did have xformers installed, but I didn't do anything special to disable them
>> 1 image(s) generated in 1345.17s
>> Max VRAM used for this generation: 17.11G.
Same parameters, but with SD2.1
@rose sentinel do we have an argument like --ckpt_convert to convert the models to diffusers on loading the ckpt?
preferably in the same location as the ckpt?
Yes. This is the exact name of the argument. When present, all ckpt files will be loaded as diffusers . However, they will not be saved to disk. The next time they are needed, the conversion will happen again.
There is an --autoconvert <directory> option that will scan a designated directory for checkpoints at startup and convert them into diffuser directories on disk.
The destination is converted_diffusers, I think. Not the same directory that the ckpt was found in.
Not documented yet (just doing the documentation tonight) so if you want it changed, now's the time.
BTW, there is no patchmatch delay on my other GPU system, so maybe something wrong with the first one.
--no-xformers
Actually, once patchmatch is turned on in the second system, I get the same 5-10s delay while it runs. I guess that's just how things are.
Does patchmatch run on CPU or GPU?
It uses opencv, I think cpu only
I've experienced a delay on first usage but subsequent ones have been very fast.
I see the slowdown on large images (e.g. >= 768)
Just tried it and I confirmed that's the case here, too. Never noticed it before.
@dire gazelle @rose sentinel Implementation of dynamic slicing was a bust. There was little to no performance gain over regular attention slicing, specifically max supported by diffusers. This is a new PR to switch us over to max. https://github.com/invoke-ai/InvokeAI/pull/2569
I think it's important that we get this in prior to 2.3.0 as we'll see a lot of user complaints about large image generation.
I'll prioritize a review, sometime today.
Has anyone seen a regression in embedding rendering? See this: https://github.com/invoke-ai/InvokeAI/issues/2554 @tardy sparrow
I see it’s already merged. What level of performance degradation will we see with this?
Same performance as what's currently there - auto mode.
I think I shared some numbers on that PR.
So there won't be any kind of performance regression, but we can do better - and I'm addressing that here but probably this should wait until after 2.3.0: https://github.com/invoke-ai/InvokeAI/pull/2572
As you said in your comment that I'm just now reading.
Definitely need testers on all platforms for a change like that.
Okey dokey
I've not noticed embeddings behaving any differently on 2.3.0rcX vs. 2.2.5, but that may mean we've always been broken...
Sadly I never paid much attention to embeddings until recently so I don’t know what to expect in terms of quality
I've had mixed success, and I attribute that to me not knowing what to expect. But it's entirely possible that something is off somewhere, too.
Here's a fun little tally I put together. After several weeks of testing diffuser imports, conversions, merges, and so forth, I've accumulated 22 distinct models in my invokeai models directory, containing a total of 376 files and consuming 98.27 GB of space. I then sha5sum'd them and tallied up how many distinct files I had in that set. There are 66 distinct files using 77.84 GB of space. This implies that if one were to either de-duplicate the directory, or restructure diffusers models to use some sort of SQLite database, I could have saved 20.4 GB of space, roughly 20%.
By the way, the first entry there is model_index.json.
Are these individual files within each diffusers directory?
I hashed each file, sorted and uniq'd them. The "Count" column is how many times I found the same file in different diffusers models.
File Size is the size of the individual file, and Total Size is the amount of space the file with its duplicates uses.
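In case anyone wants to repeat the tally, here is a minimal sketch of the hash, sort and count approach. SHA-256 is an assumption (any content hash would do), and the function names and the tiny demo directory are invented for illustration.

```python
import hashlib
import os
import tempfile
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Hash a file in chunks so large model files never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tally_duplicates(root):
    """Return (total_bytes, distinct_bytes, groups); groups maps digest -> paths."""
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            groups[file_digest(path)].append(path)
    total = sum(os.path.getsize(p) for paths in groups.values() for p in paths)
    distinct = sum(os.path.getsize(paths[0]) for paths in groups.values())
    return total, distinct, dict(groups)


# Demo: two fake "models" that share an identical vae file.
with tempfile.TemporaryDirectory() as root:
    for model in ("model_a", "model_b"):
        os.makedirs(os.path.join(root, model))
        with open(os.path.join(root, model, "vae.bin"), "wb") as f:
            f.write(b"x" * 100)
    with open(os.path.join(root, "model_b", "unet.bin"), "wb") as f:
        f.write(b"y" * 50)
    total_bytes, distinct_bytes, groups = tally_duplicates(root)
    saved_bytes = total_bytes - distinct_bytes
```

The "Count" column corresponds to `len(paths)` for each digest, and the potential saving is the total size minus the distinct size.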
huh, that's a lot more 2's than I would have thought
I expected it to be more like safety checker and vae repeated many times across the 22 models
Yeah, I was surprised too. It seems as though the safety model and the VAEs are getting tagged or versioned in some way. The cached models replace the names of the files with commit hashes or something similar, so it's a bit hard to tell which files are matching.
This was an incorrect assertion because I filtered out all files less than 1 MB in size.
is .blend() not working ?
I've been using the other blend syntax. Haven't tried that one recently
what's the other blend syntax?
prompt one:1 prompt two:1
oh
btw, if you use --log_tokenization, it shows you the separate blended prompts with token counts for each.
Makes it easy to tell if it's working right
i use it .. and it wasnt working with the blend format
thats why i was wondering if something broke
should be, what's your prompt?
("a cat with a hat", "a dog in a suit").blend(1,1)
the prompt being parsed in the log_tokenization has blend 1 1 in it
I'm guessing that means it didn't get parsed?
You're missing a quotation mark
with the missing quote restored:
```
[TOKENLOG] Parsed Prompt: Blend:[FlattenedPrompt:[Fragment:'a cat with a hat'@1.0], FlattenedPrompt:[Fragment:'a dog in a suit'@1.0]] | weights [1.0, 1.0]
[TOKENLOG] Parsed Negative Prompt: FlattenedPrompt:[Fragment:'ugly, boring, bad anatomy, duplicate heads bad anatomy, text, watermark, snow, 3dcg, anime, painting, illustration, stippling, craigslist, nude'@1.0]
[TOKENLOG] Tokens (blend part 1, weight=1.0) (5):
a cat with a hat
[TOKENLOG] Tokens (blend part 2, weight=1.0) (5):
a dog in a suit
```
Interestingly, it looks like it's acting as if the skip normalization option is on...
Converted to the colon syntax:
```
[TOKENLOG] Tokens (blend part 1, weight=0.5) (5):
a cat with a hat
[TOKENLOG] Tokens (blend part 2, weight=0.5) (5):
a dog in a suit
```
That's a cat with a hat:1 a dog in a suit:1
the weights get normalised deeper in, that's fine
except for the : syntax which does its own (redundant) normalisation first
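To spell out why the extra normalisation is harmless: assuming it is a plain rescale-to-sum-1 (which matches the 1,1 → 0.5,0.5 in the logs), applying it twice gives the same answer as applying it once. `normalize_weights` is a made-up name for this sketch, not the parser's actual function.

```python
def normalize_weights(weights):
    """Rescale weights so they sum to 1 (assumed form of the normalisation)."""
    total = sum(weights)
    if total == 0:
        raise ValueError("blend weights must not sum to zero")
    return [w / total for w in weights]


blend_syntax = normalize_weights([1.0, 1.0])     # .blend(1,1), normalised once deeper in
colon_syntax = normalize_weights([1.0, 1.0])     # colon syntax normalises up front...
colon_syntax = normalize_weights(colon_syntax)   # ...and again deeper in
```

Both paths end up at the same weights, so the only visible difference is what the token log prints before the deeper normalisation runs.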
ah
new organization scheme coming for how to find variants of a release on the hub: https://github.com/huggingface/diffusers/pull/2305
for fp16, ema/non-ema, etc
Nice.
That's much better than what we currently have to do.
@worldly cloak how easy is it to add in diffusers pipelines now? Just wondering about things like depth2img
Is it “plug and play” or more involved
And latent upscaling!
I thought someone said that high res optimization was already using latent upscaling...
It upscales latents with a simple transformation, then runs img2img on those. So it doesn't use "latent upscaling" but does upscale latents. 🙂
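As a rough illustration of "upscaling latents with a simple transformation", the sketch below repeats each value of a small 2D grid (nearest-neighbour). The choice of nearest-neighbour, the name `upscale_nearest`, and the list-of-lists layout are all assumptions made for this example; the message above doesn't say which transformation the high-res optimization actually uses, and real latents are multi-channel tensors rather than plain lists.

```python
def upscale_nearest(grid, factor):
    """Nearest-neighbour upscale of a 2D list by an integer factor.

    Hypothetical stand-in for the 'simple transformation'; not the
    actual InvokeAI implementation.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    return [
        [value for value in row for _ in range(factor)]  # repeat each column
        for row in grid
        for _ in range(factor)  # repeat each row
    ]


small = [[1, 2],
         [3, 4]]
large = upscale_nearest(small, 2)
for row in large:
    print(row)
```

The enlarged grid would then be used as the starting point for an img2img pass, which is what adds real detail; the upscale itself only stretches what is already there.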
Oh, are you talking about using this? https://huggingface.co/stabilityai/stable-diffusion-x4-upscaler
Maybe I am? I think so?
I’m getting feedback that VAEs are too hard to load and switch between on diffusers
E.g., if a user finds a ckpt or safetensors VAE file, how should they bring that into a diffusers model?
Given our goal of removing duplication, I'm wondering if we should let users add “VAE” files like they do models, and then select which VAE they want to use
This would allow us to handle any VAE conversion etc. that’s needed
Also, inpainting degradation is still a problem
And one that we're going to find causing very large viability concerns for users doing anything other than "playing around" with SD
What do you mean by inpainting degradation?
I didn't have any problems with inpainting while testing 2.3.0.
I was using the diffusers inpainting model.
I'm using normal models
(non-inpainting) diffusers models
Maybe it was just a fluke though.
It's not a fluke, but it could just be this specific use case... I.e., simple backgrounds
And (maybe it happened in pre-diffusers as well)
I can give it a shot, hang on a sec.
I had lots of seam issues pre-diffusers with the regular 1.5 model.
this one's not as bad as what I got earlier
but it's definitely a "take this to photoshop to fix"
Can you share a video of what you're getting? Might be helpful reproducing it on this end.
You're talking about the weird fringes and such?
yeah
think I might just be conflating "diffusers" with "stuff that's always been less than production ready"
Well send something along when you can. I just accidentally blew away an hour's worth of work so I'm trying to redo it from memory.
I guess what I typically do is masking for the regular model, not inpainting.
And that works really well since I'm just fixing stuff up normally, not completely replacing it.
Try upping seam strength and steps?
The short answer is that it is more involved than "plug and play," for now.
We deviated quite a bit from the stock diffusers Pipeline class to fit the other InvokeAI features in and play nice with the legacy code.
and it'd help to drop the legacy baggage before improving things in that regard. Not only because the UI would have to reflect what will and won't work on which implementation, but also because having them both puts some design constraints on things that I really hoped would be temporary.
Perhaps, while we're moving towards "Nodes", we plan the deprecation of legacy code as we discussed, and if time allows, implement one or two new pipelines?
(all assuming the "dropped legacy baggage" & pipeline tech debt are addressed)
My vote is that happens well before nodes - should make nodes even easier and eliminate corner cases. But @clear hinge may disagree.
I don't think there's any point in replacing e.g. Generate.py with something else ahead of nodes. Nodes is intended to replace that (ultimately), and any cleanup should ideally be aimed at moving code either into individual nodes or into "services" that nodes can call into.
Right now nodes are built on top of "Generate.py" as a service, but that's probably not the correct long-term implementation
More of a "I don't want to keep internal node code up-to-date" as the code underneath keeps changing
It almost seems like you need a code freeze for a big block of time to get this done right. Hmm.
To do a major refactor we will
I see there being two options for nodes:
- Switch to nodes from the frontend down to Generate.py in the stack, but leave things unchanged otherwise. Then refactor afterward.
- Freeze while we migrate and refactor.
I think #2 is probably better, since we're going to be cutting a major version number, and we should do all of our API breaking at once if we're doing that. Realistically what this looks like in practice is probably:
- Merge nodes branch into main once it's stable enough (doesn't change anything below it, and the UI is still using the old backend).
- Migrate UI to nodes while also filling out additional nodes. This will remove the old backend and leave just the nodes code path calling into Generate.py.
- Refactor everything below nodes in the architecture, and update individual node code as needed.
This ignores the CLI, but I think "UI" in #2 could also include the CLI, depending on what we want to do with the CLI (I have a new one in place that should be able to be modified to run against a separated backend/daemon, but the syntax and usage are very different since it's all autogenerated from nodes)
I've been trying to finish up the iterations refactor of nodes (including unit tests). I'm hoping I get there around the time we're ready to start migrating, but I still need to rewrite a lot of the web API layer on top of the core.
Oh also, as part of the refactor we should try to unit test any new or refactored code we work on. ☺️
This probably belongs in #1049107548264992779
Probably just need to start a big nodes/refactor thread once we've cut 2.3.0
That happened a few days ago!
Oh lol it's been a long week. Are we stable, or expecting a 2.3.1?
I need to find a good point to rebase then 😣
I think it's safe to say that we can always expect the next one.
I want to squeeze a few changes in but I don't know if they'll affect nodes. Not sure how deep you're refactoring.
I'm working in diffusers_pipeline.py and beneath.
Generate.py is as deep as nodes go right now, but everything above that changes. And the only reason the changes don't go deeper is because I've been maintaining the branch for 4 months now (through a few other branches) and rebases have been rough enough as it is.
This is good news for me! 😂
That said, rebasing is a huge chore right now, so changes to main will cause pain and slowdown if I try to keep on top of them until nodes is ready to go in.
I know users are hoping for ui and feature updates but if we want to migrate successfully and without a ton of unnecessary pain, a feature freeze is probably needed.
Is there any reason to not merge the nodes branch in now? As mentioned it doesn’t actually change anything to the existing code base except for adding some additional python dependencies.
Mostly because I'm closing in on a major refactor of the execution model/api. Don't want people writing against the new API unless they know what the plan is (e.g. you have a good idea what the plan is, so you can build toward the current API with a plan to adjust)
Is there an argument to ship any close to complete features in a 2.3.1 release and then freeze?
I think we make a list of stuff that needs to go into 2.3.1 and do those first
and then we freeze
because once nodes are merged in
we won't be able to do hotfix patches
or are we just talking about the node PR rebase and not a merge?
I think the merge would happen post rebase
In the event of a major issue in the latest release, we should probably create a hotfix strategy
I don’t think “we can’t hotfix” is responsible
Depends on how long it's frozen for.
I think it's safe to say a month would not be a surprise
Or at least, should be accounted for 🙂
Could we freeze things above a certain level for the initial merge?
If it only interacts with components at generate and below, for example...
There are ~5 PRs waiting for review. When done I'll cut a 2.3.1 RC. We could do this in a day or two; I've reviewed the PRs and they're good (IMO), but waiting on code owners.
Here are the PRs that I'd like to get in for a 2.3.1 release this week:
- 2614 - more sliders in the WebUI (nice to have)
- 2616 - WebUI model import and conversion (mandatory)
- 2630 - Convert v2 models into diffusers (mandatory)
- 2636 - Add an "update to latest version" menu item to invokelauncher (nice to have)
- 2664 - Curses-based form for initial model selection during install (nice to have)
- 2538 - Consistent style for source code using pyupgrade (nice to have, has to happen after other merges)
I am aching to refactor both generate.py and CLI.py because they are both incredibly ugly. If I had the choice, I would choose to leave generate.py alone and refactor the CLI using the new nodes system. I guess that's the consensus?
Path of least pain.
I agree, we definitely need a hotfix branch, and the expectation that people will occasionally be dragged into maintenance of same. Nodes could easily be more than a month, and the field continues to move fast.
Do you want to retain the current CLI syntax or move to the new syntax I'd built a demo for? If the latter, I was planning on making that code easier to work with by adding command classes and auto-discovery similar to how the nodes are auto-discovered. Just haven't put in a ton of effort there.
#2614 is up in the air.
I'm happy to wait to initiate The Big Freeze, but maybe that's because I'm at the mercy of the rest of the project and don't really have a choice 😛 It would be very on-character of us to just continue saying "okokok patch X is really truly no-joking the last patch before we freeze".
To be very clear about The Big Freeze - it initially applies only to UI changes.
We will never be able to make time to migrate the UI to the new nodes API if we don't freeze new features. I don't have enough time to review PRs and code most days, so that doesn't look great for the migration.
I don't mean to puff myself up or anything but I do expect to do most of the work on the frontend side - which is totally fine! - I just need time to do it and to not deal with the changing scope as features are added. I've been trying for a couple weeks now but have barely made progress between evenings spent rebasing and PR reviewing.
Keep in mind that the nodes branch is "just" a web server. It's just adding another web server
we clearly need to do some roadmap planning. Do we have a more appropriate forum for that than here in #1031668022294884392 ?
Let's create one.
I propose we close this forum post now that diffusers integration is released.
Thanks @worldly cloak , sorry to muddy this channel
IT IS CLOSED. GOODBYE DIFFUSERS, RIP
LONG LIVE DIFFUSERS
Bye bye diffusers! You were good to us.
I rather like the pipe syntax you came up with. I'd add autocomplete and other goodies.