#🧨 diffusers
i just checked with a normal ckpt ... works fine
thanks for lookin into it if you get the chance
yah i don't have a diffusers setup yet... or anything other than a pretty old environment x.x
I imagine getting installed in a way that won't conflict with my current environment is going to be painful
is the dev setup still create pew environment, pip install, run... some setup script?
scripts/configure_invokeai.py
just use the installer from https://github.com/invoke-ai/InvokeAI/pull/2338
the happy path should work great
yeah configure_invokeai.py sets everything up
ah I'll try that if this doesn't work
(does that do a code setup for dev purposes?)
hrm all black images output on the 768 model =/
512 model seems to work though
@rugged moth do you have a good sample prompt/settings that reproduce it? On stable-diffusion-2.1-base I'm getting decent results (I assume this is using diffusers).
The seams are a little stronger than I remember them being, but they're tending to look decent
diffusers model? 512?
the "scale before processing" settings could maybe be something to look at? Doesn't seem related though
20 euler_a
its pretty bad compared to what we had
its definitely creating an entirely unrelated image there
50 keuler for me:
and in a lot of cases where the detail is not clear, it leads to a messy blob of low jpeg artifacts
see there's no match
between the original image and the infilled area on the seam
it gets more evident in other examples
its clearly not respecting the original image at all
left is it outpainting the seam, right is an inpaint I did at the seam paint settings (0.7-ish strength)
seems like inpainting is just generally a lot stronger with this model?
Not as strong, but you can still see the seam if you go looking =/
Similar issue if I just inpaint though
I wonder if this model just doesn't weight the original RGB as heavily when inpainting
That seems to be trying to introduce mountains in the clouds though
What if you turn seam strength down a lot?
It's super artifacty though 😣
What if you inpaint using similar settings?
lol keuler won't work with 1.5 for me x.x
AttributeError: 'EulerDiscreteScheduler' object has no attribute 'make_schedule'
actually every sampler gives me that
strength .12 is too low 😅
.28
SEAM. SEAM. I SEE A SEAM
uh.... try bumping up the steps on seam painting
at what strength?
0.7
jeez these don't match at all:
So the quality of the outpaint seems similar to the quality of the seam paint for me
it's like sending it through inpainting bumps up the contrast and saturation:
this is outpainting to the left
not so much having bad quality but def seeing the seam
(even at 20 steps)
Can somebody tell me if this embedding works for them? I cannot get it working at all, just trained it with TI. Trigger is <lfb>.
@rose sentinel I'm getting debug messages that the embedding is being used but it's not doing what it should...
are you on 2.1?
question was for kyle, sorry 🙂
yah 2.1
(oh should I be on diffusers branch? I thought it was merged into main and I should be testing main)
Sorry, main now. 🙂
is your base image from 2.1?
i.e., are you outpainting something that was generated in a normal model
b/c i've found 2.1 generating weird shit generally without a lot of fine-tuning the prompt
everything is 2.1
hmm. ill do some testing on 2.1
1.5 throws errors for me, and 2.1-768 just generates black images with an exclamation overlay
diffusers uses a different make_noise function .. reckon the issue is there?
well, it definitely seems like inpainting is the issue
everything else in the flow looks correct to me
get_noise_like
Would this make_noise be the reason things look different every diffusers run vs. non-diffusers being identical?
have a look at that function
how different is this from what was there before?
isn't this what inpaint_replace did in the past?
i mean, doing inpaint replace is effectively doing strength = 1
2.1 still has some wonky seams but definitely not seeing the vast quality drop off you are
how about trying an img2img on the whole thing?
img2img isn't as drastic:
so... maybe the latents. But I'd look more into how the masking does stuff
(I'm very unfamiliar with all of this code, both new and old x.x)
kids are home, gtg for now x.x
not seeing the same degradation you are
(seams are still bad though)
With inpainting or img2img?
Ah I don't think there's actually degradation in mine. Just the prompt, strength, etc. doing its thing
ah got it.
But img2img is roughly inpaint without masking, so helps narrow the problem down
My guess is something to do with the masking. But I don't know how that all works 😣
Maybe that'll give @worldly cloak an idea though.
So who is using xformers at this point?
i am
OK - and inpainting does have degradation
after testing on more sensitive gradients/areas
so its definitely anything w/ masks
I had that error too! I did a pip reinstall and it went away.
inpainting is working as well as before, if not better. Outpainting is a problem. When I want to outpaint, I still use a checkpoint model (usually inpainting-1.5).
Does it? I'll do a side-by-side comparison since I have diffuser and ckpt models side by side.
hm... maybe i just had a bad inpaint then
This was a refresh pip install in a new venv 😣
I meant a side-by-side of outpainting with a checkpoint model vs outpainting with the equivalent diffusers model.
Are you seeing the same problems that @rose sentinel and I saw with the same txt2img parameters yielding different results? The effects were multiplied for me if I did high-res optimization and a larger image size.
This is very helpful!
That doesn't bode well! I wish I'd kept better notes, but it happened late at night.
Fresh model downloads too
@rose sentinel It appears from the debugging output that my TI is loaded, but it also seems to have minimal effect. When I trained on 2.2.5, it definitely worked. And another TI that I trained with the diffusers version looks fine, if not a bit overtrained. So... hm.
I've only done training once, a few days ago when I wrote the front end, and it definitely worked. I trained on pictures of "jello", and when I give the token "<jello>" (with the angle brackets) I get jello and nothing else. So maybe way overtrained.
Want to try with my images?
Sure! Put them somewhere that I can retrieve them and I'll give it a go. I'm doing a second training now, but can start yours tomorrow morning.
Will do. What params are you using for your training?
Here are images for the "lfb" embedding: https://drive.google.com/drive/folders/1cSQ1o3ZIjWKOtjTgqsFCbYcvhQ6wQCT0?usp=share_link
Pretty much the defaults: learning rate, gradient, batches, etc. Just now I was trying to get training going on a multi-GPU system, but I can only get one GPU to work.
Ok. I'll start the training tomorrow. How long does it take for you? On my system it is a couple of hours.
I have no idea why one training would be perfect and the other almost unnoticeable with the same training parameters.
I did 2000 steps and it took maybe an hour... I tend to set it off before leaving for a bit.
I did 4000 steps also and it still wasn't very noticeable.
You want to train on a style, is that right?
Advice on the parameters? I have no idea how they interact. Also I've heard that providing too many images will make things worse, which seems counterintuitive to me.
What I'm getting vs. what I'm looking for (albeit exaggerated).
on 1.5 or 2.1 or both?
Both, but I experimented more with 1.5.
Just generate the same image multiple times and take a difference in Photoshop/GIMP.
Happens with both. It is odd because if I run the same parameters 10 times I don't get 10 different variants, but two variants that alternate, more or less.
I've had learning rate at 5E-4, 2000 steps, constant schedule.
Both xformers versions differ from the non-xformers one.
I just use the WebGUI, hit the Invoke button multiple times (with the seed fixed) and then arrow key through the gallery. Very easy to see the changes.
Same as what I tried, except I did 3000 steps.
Yep. There are big differences when using the same seed and switching xformers on and off. Then within xformers there are subtle differences.
no variations on 1.5 for me (xformers)
It's going to drive some people crazy. I added a switch to disable xformers today.
```yaml
resolution: 512
lr_scheduler: constant
mixed_precision: fp16
learnable_property: style
initializer_token: ★
placeholder_token: <lfb>
train_data_dir: !!python/object/apply:pathlib.PosixPath
- /
- home
- jovyan
- work
- InvokeAI
- training-data
- lfb
output_dir: !!python/object/apply:pathlib.PosixPath
- /
- home
- jovyan
- work
- InvokeAI
- text-inversion-training
- lfb
scale_lr: true
center_crop: false
enable_xformers_memory_efficient_attention: true
train_batch_size: 2
gradient_accumulation_steps: 4
max_train_steps: 2000
lr_warmup_steps: 0
learning_rate: 0.0005
only_save_embeds: true
```
Hah! You found the preferences file. I thought it might come in useful.
Yes, wish there were one per embedding run but alas this is a good first step!
yep - variations w/ xformers.
For what it's worth, I'm not using xformers for my training. Have you tried without?
With and without. Just having trouble on this one embedding and I don't know why.
That's a good idea and easy to implement. Thanks for the suggestion.
I'll set off another run before I go to bed but I won't be able to check on it until I get to PDX.
Playing with Invoke while I'm trying to get out of town is definitely not a good idea for my stress level.
this appears to be how xformers work from looking around
The old TI script I think defaulted to 5E-3 rather than 5E-4... maybe there's something to that?
It's odd. There shouldn't be any stochastic behavior at all. Could you see if there is a periodicity to the variations? There could be a variable being incremented somewhere that changes the behavior between generations.
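A quick way to look for that kind of periodicity: label each fixed-seed output (for example with a hash of its pixels) and search the sequence of labels for the shortest repeating cycle. This is only a sketch; `find_period` is a hypothetical helper, and it requires at least two full cycles of evidence before it reports a period.

```python
from typing import Hashable, Optional, Sequence


def find_period(labels: Sequence[Hashable]) -> Optional[int]:
    """Return the shortest period p with labels[i] == labels[i + p] for all i.

    Only periods up to half the sequence length are considered, so a period
    is reported only if the whole cycle was seen at least twice.
    Returns None when no such period exists.
    """
    n = len(labels)
    for p in range(1, n // 2 + 1):
        if all(labels[i] == labels[i + p] for i in range(n - p)):
            return p
    return None
```

A period of 1 means every run matched; a period of 2 would line up with the "two variants that alternate" report above, and would hint at some state toggling between generations.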
Sounds plausible.
(I can't do any generations right now 'cause I'm doing TI training)
From what I've seen looking at xformers reports on auto, it's "just how it works"
Sounds like an implementation error to me. It shouldn't be pulling entropy from anywhere.
Nope. It's a bug. Someone needs to write a standalone script that generates 5 images and a diff using the stock diffusers pipeline and then file an Issue on their github.
In fact, the generated images should be binary identical.
(Unless the metadata has a date in it? I don't recall)
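The comparison half of such a standalone script could look roughly like this. It is a sketch, not a finished repro: `generate` is a placeholder for whatever wraps the stock diffusers pipeline with a fixed seed, and it is assumed to return raw pixel bytes rather than an encoded file, so that metadata such as dates cannot affect the result.

```python
import hashlib
from collections import Counter
from typing import Callable, List


def run_repeatedly(generate: Callable[[int], bytes], seed: int, runs: int = 5) -> List[str]:
    """Call generate(seed) several times and return one SHA-256 digest per run."""
    return [hashlib.sha256(generate(seed)).hexdigest() for _ in range(runs)]


def count_differing_bytes(a: bytes, b: bytes) -> int:
    """Crude diff: how many byte positions differ between two equal-sized outputs."""
    if len(a) != len(b):
        raise ValueError("outputs have different sizes")
    return sum(x != y for x, y in zip(a, b))


def summarize(digests: List[str]) -> Counter:
    """Count how often each distinct output occurred."""
    return Counter(digests)
```

If `summarize(run_repeatedly(...))` has more than one key, the runs were not binary identical, and `count_differing_bytes` on two of the raw outputs gives a rough size for the difference to put in the issue report.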
interesting... so bug exists in auto as well then?
AUTO1111?
yes
thats where I'm corroborating the "SD + xformers = no longer deterministic w/ same seed" reports
If people are complaining that xformers is giving varied images on the auto distribution, then it is an upstream error, not ours.
Good corroborating evidence.
xformers is one complex beast. I'm glad I'm not the one who has to track down the source of the non deterministic behavior.
Unrelated - @rose sentinel i seem to have converted a model to diffusers with a 'success' report, and now have issues generating due to an error
RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'
Is that with the current main? This is a symptom of the autocast issues. The fix was only merged in recently.
No, actually the merge was 2 days ago. You probably have it.
So is the plan for SD2.1 to work w/o diffusers as well as with?
Are all the converted models doing that, or just some? I converted some models today and they seem to be running OK.
I was not on latest. its fine.
I have no plans to get the ckpt version of SD2.1 working.
Fixed it? Glad to hear.
Ack, sorry, meant xformers.
Yes. I don't want to release until we've got SD2.1 working without the xformer install. I actually have a fully functioning workaround in a PR waiting in the wings in case we can't get to the root cause of the issue (using "we" rather loosely since it's actually only Damian and Keturn who understand what's going on).
The workaround puts generation into float32 mode when the SD2.1 model is loaded.
It's stupid, but I like reproducibility a lot. Just how I'm wired.
A lot of people won't use xformers because of this, and we need to document the problem with lots of big red letters.
Also, unfortunately, we can't put xformers into requirements-base.txt as I'd hoped we could. It will have to be an optional thing that people install.
Or maybe what we do is to install xformers but default to --no-xformers, and let people turn it on with the command line switch. (or put into invokeai.init)
i think that's the easiest.
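For illustration, an off-by-default switch like that could be wired up with `argparse` roughly as follows. Only `--no-xformers` is mentioned in the chat; the paired `--xformers` flag, the parser name, and the help text are assumptions, and this is not the actual InvokeAI argument parser.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser: xformers stays off unless explicitly requested."""
    parser = argparse.ArgumentParser(prog="invoke-sketch")
    # BooleanOptionalAction (Python 3.9+) registers both --xformers and --no-xformers.
    parser.add_argument(
        "--xformers",
        action=argparse.BooleanOptionalAction,
        default=False,
        help="enable xformers memory-efficient attention (results may vary between runs)",
    )
    return parser
```

With the default set to `False`, people who care about reproducibility get it without doing anything, and the speedup is still one switch away.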
That's if we can find xformers that doesn't require multiple hours of compilation time?
For the release candidate?
I think we can tell people to manually install xformers using pip - we just can't put it in reqs, per lstein
pip only works on linux
xformers==0.0.16rc390 is what I used for another project but I compiled for Invoke from scratch because I was just following instructions.
no?
unless you mean using the wheel on windows
worked fine for me
pip install xformers?
yeah
are you on WSL?
i referenced above
ah yeah, I installed w/ WSL
ye that'd work coz it treats that as linux
yeah
but native windows, there's no xformers wheel on pypi
you need get the wheel off the xformers repo
basically they dont do them formally but their actions build wheels for windows too
at some point we need to download one and provide it ourselves
coz the wheels keep updating to future version as they update their repo
also on native windows theres no pip install triton either
and i havent found a way to get triton working on windows yet .
hrm... yah I can't make sense of the under-the-hood code for inpaint enough to tell if there's anything wrong. Might be helpful to output per-step results somewhere and then compose them into a single image... I guess the callback could handle that to some extent...
making a separate thread for seams so they don't get lost in the xformers and inversions and whatnots: #1065885676161212506
Is it possible to load a local diffusers model? I've only been able to figure out how to do it by repo ID.
Yes.
```yaml
path: /home/jovyan/work/InvokeAI/models/optimized-ckpts/stable-diffusion-v1-5-nonema
description: Stable Diffusion Non-EMA (FULL)
format: diffusers
vae:
  repo_id: stabilityai/sd-vae-ft-mse
default: true
```
I don't know where the problem is, but seems like I get almost the opposite quality.
maybe because of mps?
strange
can anyone with m1 try both 512 and 768?
Back to insane model load times unfortunately 🤔 Model loaded in 122.51s (Although on subsequent starts... 2s - inconsistent)
Only on starting though, which is odd.
Diffusers conversion/optimization bugs:
- Converted file has a description of `Optimized version of {model_name}`, not the actual model name.
- Model location is `None` in WebUI. Might also be related to people saying they can't use a local path to load up models?
- Having a difficult time adding a VAE manually to the file, and blocked doing it through WebUI.
- WebUI bug: `AssertionError: missing required field "format"` on trying to update. Likely should be passing in `format: diffusers` by default when editing a diffusers model.
Got it overtrained without xformers at 5e-3...
I just did my second run of TI using photos of a family member and the default settings, and it actually worked quite well! I'm running your images now.
Cool. Did you use xformers?
good news for reproducibility with xformers: https://github.com/huggingface/diffusers/issues/1997
Great news!
Wow the activity on this channel is impressive 😅 Hard to read all messages. Just some quick updates:
- We'll do a release next Wednesday or Thursday
- There is a cool new pipeline (you should check it out): https://github.com/huggingface/diffusers/pull/2040 from the paper: https://www.timothybrooks.com/instruct-pix2pix/
- @tardy sparrow thanks a lot for your PR regarding the ckpt conversion. I adapted a couple of things, moved it into src/diffusers so it'll be in the next package (albeit not as a public API yet) and merged it.
I see that @forest spade is online. Can you provide me with some advice on how to get the resume from checkpoint function to work in the diffusers textual inversion training script? I have tried killing the training process halfway through and then relaunching with the --resume_from_checkpoint argument set, but each time the system calculates a start step that is higher than the end step and refuses to do further training. Is this a known problem?
Sorry I have to run in 5min (just wanted to give a quick update 😅 )
Ehm, I don't know really, could you maybe open a quick issue? We've added --resume_from_checkpoint only relatively recently so it's very possible there is a bug with the starting step! Happy to investigate
Also regarding the release I still have two big things I'd like to get merged so that they are in the release:
- textual inversion loader: https://github.com/huggingface/diffusers/pull/2009
- allow to pass text embeddings: https://github.com/huggingface/diffusers/issues/1506#issuecomment-1384050584
Anything else that would be important for you?
User in github linked to this and said they didn't need WSL? https://pypi.org/project/xformers/0.0.16rc425/#files
that one seems to have windows builds
Training your lfb images seems to have worked. Here's a zip file containing the learned embeddings, and here's a sample image using child playing in <lfb> style
Here are the settings I used:
resolution: 512
lr_scheduler: constant
mixed_precision: fp16
learnable_property: style
initializer_token: ★
placeholder_token: <lfb>
train_data_dir: !!python/object/apply:pathlib.PosixPath
- /
- home
- lstein
- invokeai
- training-data
- lfb
output_dir: !!python/object/apply:pathlib.PosixPath
- /
- home
- lstein
- invokeai
- text-inversion-training
- lfb
scale_lr: true
center_crop: false
enable_xformers_memory_efficient_attention: false
train_batch_size: 10
gradient_accumulation_steps: 4
max_train_steps: 3000
lr_warmup_steps: 0
learning_rate: 0.0005
only_save_embeds: true
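One thing that may be worth comparing between this run and the earlier `lfb` config: with `scale_lr: true`, the diffusers textual inversion example script multiplies the base learning rate by the gradient accumulation steps, the batch size, and the number of processes. Assuming the InvokeAI trainer keeps that behavior and a single process is used (both assumptions, not confirmed in the chat), the arithmetic works out like this:

```python
def effective_learning_rate(
    learning_rate: float,
    train_batch_size: int,
    gradient_accumulation_steps: int,
    num_processes: int = 1,
    scale_lr: bool = True,
) -> float:
    """Learning rate after the scale_lr adjustment (unchanged if scale_lr is off)."""
    if not scale_lr:
        return learning_rate
    return learning_rate * gradient_accumulation_steps * train_batch_size * num_processes


# Earlier config: batch size 2, 4 accumulation steps
earlier_run = effective_learning_rate(0.0005, train_batch_size=2, gradient_accumulation_steps=4)
# This config: batch size 10, 4 accumulation steps
this_run = effective_learning_rate(0.0005, train_batch_size=10, gradient_accumulation_steps=4)
```

Under those assumptions the two runs share the same `learning_rate: 0.0005` on paper but end up about 5x apart in practice (roughly 0.004 vs. 0.02), which could matter when comparing results.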
Here's portrait of beyonce in <lfb> style
If a user is on safetensors on 2.3, is our instruction to convert to diffusers?
pickle/safetensors is one axis, CompVis/diffusers is another axis
safetensors & diffusers is always best
not particularly
Hmmm. Getting ValueError: token_ids has shape torch.Size([79]) - expected [77] still, generating for promptcraft this week
I thought we had fixed that but maybe I'm wrong
suppose i just need to get an older version of invoke up to get through promptcraft
darn
Someone test my assumptions. For all places where hardcoded 512x512 w/h are set in the code (a LOT of places), where would we not want that to just be updated to a {model_width} and {model_height} variable?
Persons who have been trying out prompting for 2.1, give this a spin: https://github.com/invoke-ai/InvokeAI/pull/2381
fix-padding
main
padded is more gooder.
blending is tough
are VAEs for diffusers models supposed to be configured through YAML or dropped into the folder?
I expect it's more noticeable on shorter prompts (where more of the 77 tokens go to padding.)
that would make sense
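To make the 77-token point concrete, here is a small sketch of fitting a prompt into the fixed context window and measuring how much of it is padding. The special token ids are passed in as parameters because the right values depend on the tokenizer; the function names are hypothetical and this is not the InvokeAI code path.

```python
from typing import List, Sequence

MAX_LENGTH = 77  # context length of the CLIP text encoder discussed above


def pad_or_truncate(
    prompt_ids: Sequence[int],
    bos_id: int,
    eos_id: int,
    pad_id: int,
    max_length: int = MAX_LENGTH,
) -> List[int]:
    """Wrap prompt tokens in BOS/EOS, then truncate or pad to exactly max_length."""
    body = list(prompt_ids)[: max_length - 2]  # leave room for BOS and EOS
    ids = [bos_id] + body + [eos_id]
    ids += [pad_id] * (max_length - len(ids))
    return ids


def padding_fraction(num_prompt_tokens: int, max_length: int = MAX_LENGTH) -> float:
    """Share of the context window that ends up as padding."""
    used = min(num_prompt_tokens, max_length - 2) + 2
    return (max_length - used) / max_length
```

A 5-token prompt leaves 70 of 77 positions as padding, so how the padding is handled dominates the encoding; a 75-token prompt leaves none. Truncating before adding BOS/EOS is also what keeps the result at 77 rather than 79.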
for VAEs, either/or. If you always want to use that VAE with that model, might as well distribute it along with the model.
but if someone releases a new VAE (like the MSE-finetuned one for 1.x), easier to point to that than repackage everything.
k - i'm just getting some issues using a VAE key/value pair for this diffusers model so wasnt sure
will just poke and see if im messing up a path or something lol
Is this using the CLI-based model conversion and import? I've got a whole bunch of bug fixes queued up in this unmerged PR https://github.com/invoke-ai/InvokeAI/pull/2369. It includes fixes for VAE support and problems when importing diffusers models by path.
yep - i can retest after merged!
You can do either, but I prefer to put them in models.yaml so that I know exactly what VAE I'm using.
I believe that the vast majority of these are parameter defaults which are only used if the parameter is not provided. When I load SD-2.1+768 into the CLI, the default width and height for generation magically change to 768x768.
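The "default only if not provided" pattern described here could be sketched as below. `resolve_size` and `FALLBACK_SIZE` are hypothetical names used for illustration; the real code may look the model's native size up differently.

```python
from typing import Optional, Tuple

FALLBACK_SIZE = 512  # used only when neither the user nor the model gives a size


def resolve_size(
    width: Optional[int],
    height: Optional[int],
    model_default: Optional[int],
) -> Tuple[int, int]:
    """Prefer explicit user values, then the model's native size, then the fallback."""
    default = model_default if model_default is not None else FALLBACK_SIZE
    resolved_width = width if width is not None else default
    resolved_height = height if height is not None else default
    return resolved_width, resolved_height
```

With this shape, a hardcoded 512 only matters in the fallback constant, and loading a 768 model changes the defaults without touching any call sites.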
How do I reproduce this? I thought these were all fixed too.
ill ping you the prompt that was giving me fits
well... I'm not getting it now. Disregard I guess.
wait... let me test something.
thats it
works on diffusers model, does NOT work on ckpt
that has been fixed.
ah. not fixed on the old codepath i guess.
Curious to see what you get with xformers enabled. I wonder if that was what was giving me problems.
Can confirm inpainting has quality degradation as well
Can see it around the second head (inpainted) of this img #🎨outdir message
i have a cross-attention control implementation working against current diffusers main (0.12.0-dev). couple of caveats - it's non-sliced and it currently only works on CPU on macOS because of an upstream torch bug. top is "a cat playing with a ball in the forest" -W512 -H512 -s15 -S123 , bottom is "a cat.swap(dog, s_end=0, t_start=0.2) playing with a ball in the forest" -W512 -H512 -s15 -S123
PR is ^ if anyone would like to try it out on linux/win
this is with no monkey patching for a happy @worldly cloak
Approved and merged
Since we've got a set of interconnected problems here (SD2.1 black screen, memory use, xformers issues), I've attempted to summarize the status of each issue in a table here: https://github.com/invoke-ai/InvokeAI/issues/2387
It looks to me like there are two potential short-term solutions:
1. Damian's non-monkeypatched solution described here: https://github.com/invoke-ai/InvokeAI/pull/2385
2. The float32 workaround described here: https://github.com/invoke-ai/InvokeAI/pull/2335
(1) is my preferred choice. I'm testing it out on Linux now.
(I will update the table for PR 2385 after doing the testing.)
@tardy sparrow I just tested PR 2385 on a Linux/NVIDIA system using diffusers 0.12 (pulled today). Good news is that I can generate SD-2.1 images without xformers in an OK memory footprint (8.75 GB RAM). Bad news is that swap() produced the dreaded RuntimeError: expected scalar type Half but found Float. Is this the upstream torch bug you mentioned? Here's the stack trace:
no the pytorch bug is an MPS thing
@forest spade I'm trying to integrate the checkpoint merger functionality into InvokeAI, but I discovered that passing local diffusers model paths to the merge() function doesn't work. I've submitted a PR that might fix the problem: https://github.com/huggingface/diffusers/pull/2060
However, I couldn't easily figure out how to test the proposed fix, since the checkpoint_merger.py file is downloaded fresh from GitHub main each time. Is there a way to tell the code to look in my local repo for community pipelines?
(I can change COMMUNITY_PIPELINES_URL in dynamic_modules_utils.py, but this seems like a hack?)
Half/Float issue is fixed https://github.com/invoke-ai/InvokeAI/pull/2385
@west nebula could you give this a spin? i think s_start/s_end is doing nothing now, but i'm not sure that that matters
Am assuming we still don’t have any leads on inpainting/outpainting issues?
Is there any way I could do some testing/debugging to figure out where it’s going sideways? I think there’s a image debugger in the webUI - maybe ought to try that
User confirmed the VAE issue I reported a few days ago is happening to them as well
firstly, don't feed a CompVis VAE to a format: diffusers model.
How did you get that config? I know lstein tested the CLI commands that add models with VAE; was it the web model manager or editing models.yaml manually?
if manually, note the example https://github.com/invoke-ai/InvokeAI/blob/89791d91e84abfc127ffecca21db68920781709f/configs/models.yaml.example#L17
diffusers format still needs a repo_id or a path.
Ah - that’d do it
Ok - so what are instructions for users using custom VAEs (e.g., anythingv3)
if you want to use Anything v3, use https://huggingface.co/Linaqruf/anything-v3-better-vae
if you wanted to use only the VAE from that on some other model, I guess that's where you'd do something like
blah mix:
  format: diffusers
  path: /blah/blah/blah
  vae:
    repo_id: Linaqruf/anything-v3-better-vae
    subfolder: vae
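The rules mentioned in this exchange (a `format` field is required, a diffusers entry needs a `repo_id` or a `path`, and its VAE should itself be a diffusers-style reference) could be checked with something like the sketch below. `check_model_entry` is a hypothetical helper operating on an already-parsed entry; it is not part of InvokeAI, and the VAE rule in particular is an inference from the examples above.

```python
from typing import Any, Dict, List


def check_model_entry(entry: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems found in one models.yaml entry."""
    problems: List[str] = []
    fmt = entry.get("format")
    if fmt is None:
        problems.append('missing required field "format"')
    if fmt == "diffusers":
        if not (entry.get("repo_id") or entry.get("path")):
            problems.append("diffusers entry needs a repo_id or a path")
        vae = entry.get("vae")
        if vae is not None:
            # Assumed rule: a diffusers VAE is a mapping with its own repo_id or path,
            # not a bare path to a CompVis-style VAE file.
            if not isinstance(vae, dict) or not (vae.get("repo_id") or vae.get("path")):
                problems.append("diffusers vae needs its own repo_id or path")
    return problems
```

Running a check like this when an entry is edited would surface the problem at save time instead of at model load.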
Ok so - aside from outpainting/inpainting issues, I've also noticed some odd artifacts cropping up recently - specifically when it seems i'm using weights
(txt2img)
for some reason, this prompt seems to have this happen even when I've not been having this happen elsewhere today - it might just be highlighting some wacky behavior.
i'll also note i've not pulled today, so if there's been anything fixed recently I can try doing that
Do we have a running tracker going for open bugs that I should be adding to?
im also not going to say the above artifacts couldnt just be a model issue - ive still got the deliberate model loaded from troubleshooting people's issues earlier. swapping to other models don't immediately have those same artifacts crop up
I've also dug into the image debugging for outpainting - It definitely seems like its happening on the inpainting seam step. Unclear whats happening here.
@heavy glacier Digging into the outpainting issue is coming up second on my list of things to look into. I'm first going to the new installer combo that @gusty hound and @dire gazelle have put together. I do see a similar artifact issue as you do, but only when I crank up the weight of an element a lot: banana+++++ I think it's been like that for a while, but I'll check.
I dug into the seam painting far enough to identify that it seems likely to be something to do with inpainting in general, or a small change I'm not seeing in how things are being encoded to do the seam paint. Everything seems to be constructed correctly.
@worldly cloak, @heavy glacier, @rugged moth There are a bunch of PRs from me that could do with a little attention. They are:
- 2353 - "import .safetensors ckpt files directly" -- This allows us to import ckpt files in the safetensors format
- 2333 - "improve UI of textual inversion frontend" -- Improvements to the console-based textual inversion front end.
- 2369 - "Allow user to specify VAE with !import_model" -- Provide a user interface for adding and changing the VAE assigned to a model without editing models.yaml
- 2395 - "ckpt conversion script respects cache in ~/invokeai/models" -- Prevent the ckpt->diffusers module from re-downloading CLIP, safety checker, and other ancillary models if they are already located in the root directory.
- 2388 - "add interactive diffusers model merger" -- Scripts to merge two or more diffusers models. According to legend you can use this code to convert regular models into inpainting models, but I haven't confirmed yet.
- 2372 - "Better status reporting when loading embeds and concepts" -- This provides more informative messages about when a token has been recognized as the trigger for an embedding.
I'd like to explore the possibility of special-casing for inpainting models, which do pretty well (if not better) without the second pass used for inpainting and outpainting. Because of this I still keep the ckpt version of inpainting-1.5 around for my outpainting needs because it uses the legacy omnibus module and bypasses the inpainting code.
I did check if turning off seam painting would work better with 2.1 (the only model I could get working), but it tended to leave hard edges =/
yeah, so what's the deal with models not working for you? because that's a rather bigger problem than suboptimal seam repainting.
you load the 1.5 diffusers model (as set up by the configure script or from models.yaml.example), and what happens? it gives you this?
AttributeError: 'EulerDiscreteScheduler' object has no attribute 'make_schedule'
is this covered in a github issue yet?
the only references to make_schedule I see are in ckpt_generator, which shouldn't be able to have diffusers pipeline passed to it, and omnibus, which I thought we determined isn't used by diffusers pipeline either.
No idea. New venv, pip install from the windows requirements, ran the script to download models (got 2.1 and 1.5), then only the 512x512 2.1 model works. Errors like above for 1.5, and just black images with exclamation points for 2.1-768
put the full traceback and your models.yaml in a github issue? cuz that sounds very wrong
What I'd need to probably do is set it up on a brand new machine and see if I could repro it. Haven't had a chance to though >.<. Figured it's either something where environments are fighting each other (I'm guessing by sharing the config directory?) or there might be a bug.
yeah, could be the config directory if your models.yaml got mixed up.
could try passing a different --root_dir to invoke
same thing with the 768 model (just black output, but I have the nsfw checker off)
2.1 512 works just fine
1.5 model seems to work now though
so I guess upgrades might be a pain =/
let me know if I can check anything on the 768 model. Won't be able to do it until the morning though (it's late here and I've got a morning meeting, need to get some sleep x.x)
I can go through the install and upgrade process on my system. I'll just set up a parallel virtual environment and test there. I do not want this to be the next "basicsr" user complaint.
By the way, @sour sun has suggested that we enhance the model merging so that the merged model is kept in memory and immediately available rather than writing to disk first, along the lines of what @tardy sparrow did. This is straightforward to code, and the syntax to do this on the CLI would look like this:
invoke> !merge_models model_a model_b model_c --alpha=0.5 --interp=weighted_sum --dest=merged_model
From then on, the merged model will appear on the models list, but will not be written to disk until the user asks for that.
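As a rough picture of what "merge in memory, save later" could mean, here is a toy sketch. Weights are plain lists of floats instead of tensors, the weighted sum is the usual `(1 - alpha) * a + alpha * b` interpolation for two models, and `ModelRegistry` is a made-up class, not InvokeAI's model manager.

```python
from typing import Dict, List, Set

Weights = Dict[str, List[float]]


def weighted_sum(a: Weights, b: Weights, alpha: float) -> Weights:
    """Interpolate two models key by key: (1 - alpha) * a + alpha * b."""
    if a.keys() != b.keys():
        raise ValueError("models do not have the same parameter names")
    return {
        key: [(1 - alpha) * x + alpha * y for x, y in zip(a[key], b[key])]
        for key in a
    }


class ModelRegistry:
    """Keeps merged models available by name and tracks which are not yet on disk."""

    def __init__(self) -> None:
        self._models: Dict[str, Weights] = {}
        self._unsaved: Set[str] = set()

    def add_merged(self, name: str, weights: Weights) -> None:
        self._models[name] = weights
        self._unsaved.add(name)

    def names(self) -> List[str]:
        return sorted(self._models)

    def is_saved(self, name: str) -> bool:
        return name not in self._unsaved
```

Tracking the unsaved names separately is one way to support an explicit "save this merge" step later, since the registry always knows which entries exist only in memory.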
Should this wait until 2.4 (or until after nodes?)
I'll test this now. Was out all day yesterday.
How would a user then save that model, once they decide they do in fact like their new mix?
Any specific testing I should be doing?
@tardy sparrow ModuleNotFoundError: No module named 'diffusers.models.cross_attention'
did you update to .12?
Absolutely not.
Just following requirements.txt here, so that should be updated if 0.12 is needed.
It was mentioned in the PR
But agree, before it gets merged, would need to be updated as a req
Docs? Who has time to read those?
Anyway, as written I cannot install diffusers 0.12:
```
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Collecting https://github.com/huggingface/diffusers
Downloading https://github.com/huggingface/diffusers
| 324.0 kB 1.8 MB/s 0:00:00
ERROR: Cannot unpack file /tmp/pip-unpack-q77ejh56/diffusers (downloaded from /tmp/pip-req-build-4da97prh, content-type: text/html; charset=utf-8); cannot detect archive format
ERROR: Cannot determine archive format of /tmp/pip-req-build-4da97prh
```
So that means I'm at a dead end for now and switching back to main
this is in my experience caused by karras scheduling - if you set --max_karras to 0 or 1 on the old codebase you get the same artefacts
sorry my instructions on the PR are wrong, the command is pip install git+<url>
I can give that a shot, hang on.
And you want me to test on the RTX end, correct?
So I'm not using xformers right now, and I am getting this: Error! CrossAttentionControl found an unexpected number of <class 'ldm.models.diffusion.cross_attention_control.InvokeAIDiffusersCrossAttention'> modules in the model (expected 16, found 0). Either monkey-patching failed or some assumption has changed about the structure of the model itself. Please fix the monkey-patching, and/or update the 16 above to an appropriate number, and/or find and inform someone who knows what it means. This error is non-fatal, but it is likely that .swap() and attention map display will not work properly until it is fixed.
However, swap does appear to work with s_start=0 and s_end=0.
Same when using xformers. Works but I get those errors.
Tested with SD1.5 non-ema.
@tardy sparrow Also getting OOM errors making a 1024x768 image with +++ syntax.
w/hires opt
this is probably to be expected until i implement a sliced version
functionally, though - error messages and OOM aside - do you feel like this is doing everything that you need?
Might be interesting to look at
It seems to do with just using t_start. I don't really use the s_ vars.
ok, then i'm going to call it a success. will update the prompt syntax to remove the s_start/s_end vars.
Oh so we're done with them and shape_freedom?
yeah, i think so - with the next diffusers implementation the s_ switches do nothing (the method doesn't support what they used to do), and i think with only two params it's more clear what is happening
probably that can come down to one param, t_start - do you have much use for tweaking t_end?
Haven't yet, but maybe others have.
So then the new version is space-only replacement and you get to decide when that kicks in?
token-only replacement, yeah. after thinking on it a wee bit i think i'll leave both params in - no reason to remove t_end really
Should this message be removed now that monkeypatching is gone?
there was no patch done to fix the seam issue right? coz i think the issue is actually with seam steps being too low. Increasing them is consistently improving the results
and im guessing the 2.1 depth model and the x4 enlarge model dont work yet?
@clear hinge I noticed in inpaint.py that seam_paint was receiving the noise as x_T but never being used. Instead the seam_noise was being generated with self.get_noise(im.width, im.height) .. When I replaced the passed noise with this, I seem to be getting better results.
with diffusers that is
ah... if that code changed then I guess that could have an impact? IIRC, in the older code, get_noise was called per iteration if you were generating multiple images. So I had to create new noise to send to seam paint, then restore afterward.
let me try multiple iterations and check
even with x_T, I'm getting different results if i do multiple iterations
yes it should, the PR isn't done yet
!optimize
I don’t think so - talking with Lincoln last night, believe it’s !commit
You're the boss, but it seems logical that the existing optimize command would do the job, so i don't see a reason to add a new one for the purpose...
Lol - not trying to imply it “ought” to be !commit - just that it is.
!optimize is for passing a ckpt and converting to a diffusers model - how do you see using it in the context of offloading a merged model from memory?
Is the lack of slicing also why I'm getting OOM errors in main?
Prior to diffusers, I could do 1536x1536 without a problem. Now I can do at most 768x768 (or so).
yes, almost certainly
for parity with the old codebase the attention slice would need to be computed per attention head, per step
this is doable, it's just currently not being done. i'll take a look at it after i figure out the sliced .swap()
Thanks a lot for the PR - merged it! You can test your PR with the changes by doing the following:
DiffusionPipeline.from_pretrained(..., custom_pipeline="~/diffusers/examples/community/checkpoint_merger.py")
I.e. passing the local path of the changed file to the pipeline
I should have thought of that! It would have made testing a lot easier. Was this something I missed in the documentation?
@rose sentinel - FYI; I started a TI training and got ~5% of the way through training, and closed. Doesn't seem like it saves any .bin unless it gets through the whole thing?
You will only get .bin files every 500 steps.
It saves a checkpoint at 500, 1000 and so forth. There is a parameter for adjusting this that I haven't exported to the frontend.
Once you have a checkpoint, you can resume from there.
Got it 👍
Taking a look at txt2img2img, this math seems off:
```
scale = 512 / scale_dim
init_width, init_height = trim_to_multiple_of(scale * width, scale * height)```
If I want a resulting image that's 1024x512, that means that the initial image generated is going to be 1024x512. That explains the duplication I've been seeing. Anyone object to me altering this?
It's been this way for a long time, too, so it is not an insignificant change.
It is definitely off. The code should be finding the larger dimension and then scaling it to 512 (or 768 for the SD-2.1 model). The smaller dimension should be scaled the same.
Well I think the question is - What is the lower bound of the smaller dimension, and should it be 512 or lower
I don't think you'd always want the larger dimension to be scaled to 512 - There's probably a happy medium that keeps the smaller dimension from going so far below 512 that it generates garbage. But @west nebula you probably have used it more and know what the balance is
Is there a way to find the model's default resolution inside of class Txt2Img2Img during construction or get_make_image?
It should be available in the model information
That's stored/pulled from the model config file (W/H)
and gets loaded into the model list on initialization
Time to dig!
Here's where I'm at right now:
```
# Make their area equivalent to the model's resolution area (e.g. 512*512 = 262144),
# while keeping the minimum dimension at least 384
aspect = width / height
model_area = 262144 # hardcoded for now
if aspect > 1.0:
    init_height = max(384, math.sqrt(model_area / aspect))
    init_width = init_height * aspect
else:
    init_width = max(384, math.sqrt(model_area * aspect))
    init_height = init_width / aspect
init_width, init_height = trim_to_multiple_of(math.floor(init_width), math.floor(init_height))
print(f"\nUsing initial resolution of {init_width}x{init_height}\n")
```
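For anyone following along, here is a self-contained sketch of the sizing logic above so the numbers can be checked by hand. The `trim_to_multiple_of` below is a hypothetical stand-in for the InvokeAI helper (the rounding multiple of 8 is an assumption), and `initial_size` is just a name picked for illustration.

```python
import math


def trim_to_multiple_of(*values, multiple_of=8):
    # Hypothetical stand-in for InvokeAI's helper: round each value down
    # to the nearest multiple (the multiple of 8 is an assumption).
    return tuple(int(v) - int(v) % multiple_of for v in values)


def initial_size(width, height, model_area=512 * 512, min_dim=384):
    # Keep the area close to the model's native area while never letting
    # the smaller dimension drop below min_dim.
    aspect = width / height
    if aspect > 1.0:
        init_height = max(min_dim, math.sqrt(model_area / aspect))
        init_width = init_height * aspect
    else:
        init_width = max(min_dim, math.sqrt(model_area * aspect))
        init_height = init_width / aspect
    return trim_to_multiple_of(math.floor(init_width), math.floor(init_height))


print(initial_size(1024, 512))  # (768, 384)
print(initial_size(768, 768))   # (512, 512)
print(initial_size(512, 1024))  # (384, 768)
```

For a 1024x512 request the area-preserving height would be about 362, so the 384 floor kicks in and the first pass runs at 768x384 rather than the full 1024x512.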
Is any of that passed down to this layer or accessible from here?
it probably can be, but I'm not as familiar with hacking in the python
Likewise.
This new code seems to generate good results, but they're definitely different than what we were getting before. So if consistency is what we're after between versions, we should stick with what's there. If not, I like this better...
care to share any examples of before/after?
I think we've aligned on better over reproducible into eternity
One sec.
Oddly, after all of that, I don't see any visible difference in the output.
That points to all of that code being ignored.
And that is precisely what's happening.
Generator's generate method calls the subclass's get_make_image, which ultimately uses the tensor passed down to it from Generator - whose size is the width and height passed in directly.
This all changed from 2.2.5 where we pass a shape (without checking that the tensor is large enough) and x_T as passed in.
At least that's my read of it. I'd love another pair of non-tired eyes.
im not surprised that this changed w/ diffusers
frankly we needed to update it anyways since we're doing non-512x512 images! 😛
There are two code paths: one for ckpt models and the other for diffusers models.
The first path goes through ldm/invoke/ckpt_generator, and the second path goes through ldm/invoke/generator.
We will deprecate ckpt files and eventually discontinue support for them, but right now dropping ckpt support (or forcing everyone to convert) would be very unpopular.
The model's default resolution information is available to the web and CLI UIs. Unfortunately it does not get passed to the generator level. I will see what can be done to fix this.
This is a bit ugly, but you can do this:
dimension = model.unet.config.sample_size * model.vae_scale_factor . You'll want to put a try around it, because it only works with diffusers models and may not be a stable API.
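A minimal sketch of that lookup wrapped in a try, as suggested. The function name, the 512 fallback and the fake model objects are assumptions for illustration only; the attribute chain is the one quoted above and may not be a stable API.

```python
from types import SimpleNamespace


def model_default_dimension(model, fallback=512):
    # Works only for diffusers-style models; anything else (e.g. a ckpt
    # model without these attributes) falls back to an assumed default.
    try:
        return model.unet.config.sample_size * model.vae_scale_factor
    except (AttributeError, TypeError):
        return fallback


# Fake objects standing in for real pipelines, purely for demonstration.
fake_diffusers_model = SimpleNamespace(
    unet=SimpleNamespace(config=SimpleNamespace(sample_size=96)),
    vae_scale_factor=8,
)
fake_ckpt_model = SimpleNamespace()

print(model_default_dimension(fake_diffusers_model))  # 768
print(model_default_dimension(fake_ckpt_model))       # 512
```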
once all of the work with diffusers is done with the model management, please brief me on the requirements for the yaml and ill update the model manager to support it on the frontend
I will write up a document. I'm not sure there are any changes pending. I'm resisting the temptation to create a new syntax for on-the-fly merged models.
ye i was holding off coz redoing that code again and again will just become a hassle
ill keep it as the last thing to do
most of the UI is already there.. just need to update the writing logic
OK. I can try that. The bigger issue here is that --hires_fix does nothing productive when using diffusers.
H'mmmm. Stable-diffusion-1.5, "50s housewife", 832x832:
Same thing, but with --hires_fix:
!optimize takes an existing model from your models.yaml file (currently assumed to be in ckpt format), and converts it to a diffusers model. If there is going to be a new "format" for a model in the models.yaml file that is a recipe for merging several other models, and you need a command to convert THAT entry to a self-contained diffusers model, it makes sense that you would use the same command for that.
Unfortunately, even if I use the same seed for both images, I won't get the same thing because the hiresfix version starts with a different dimension.
I dont believe that's necessarily what's being discussed.
A (temporarily) merged model would be stored in VRAM memory, with no reference on the .yaml
The command in question would instruct the system to dump from vram into a permanent entry on the yaml, and create the model files/configs/etc. - Which is related to !optimize, but functionally different
(This is my understanding, @rose sentinel is the keeper of truth here)
I've got several conflicting ideas in mind. One is a workflow in which the user selects two or more models, merges them in memory, tries the merged model out, and if they like it they commit it to disk with a whole new name.
fwiw, i like that idea/workflow
Dropping support for k_diffusion doesn't need to mean dropping support for ckpt files. @tardy sparrow has already added code to load a ckpt file and convert it to diffusers format in RAM, without writing the file. This allows the diffusers code to still accept ckpt files.
can’t we just use my in-memory converter? id plug it in myself but i don’t know enough about how the model management/loading works to not make a mess of it
While it doesnt need to mean deprecating, CKPT files are being deprecated because we're currently maintaining 2 code paths to support both model formats
The second conflicting idea is to extend the models.yaml format to do merging as needed. A stanza would look like this:
my-merged-model:
  source_models: [model_a, model_b, model_c]
  merge_alpha: 0.8
The only reason we didn't remove CKPT support in the interim is because we wanted to have a grace period
Not a fan of this approach, mainly because I think most people end up wanting to share merged models out
(And also like to do multiple chained merges)
It does a second pass on the image it generates with --hires_fix, but try changing init_width and init_height in txt2img2img and see if they're used at all.
But I'm saying that instead of sending ckpt files to the old codebase, you can just send them through @tardy sparrow's in-memory conversion script, and treat them like regular diffusers models after that.
ckpt support doesn’t need to be removed. my adapted converter code takes a ckpt and a yaml and some vars we already save in models.yaml and returns a stable diffusion pipeline
this.
I don't understand. We're talking about merging, not converting. I'm happy to incorporate your in-memory converter when a ckpt is loaded, but right now not all ckpts convert correctly. I just ran into another case yesterday (I think I pinged you on it in the Issues).
The safest thing right now is to load the ckpt using the old code.
in the meantime invoke’s converter is probably drifting further and further from the one that has already been merged into diffusers, yes?
seems like this is just going to cause more pain down the line.
what is the user base % that is using models that don’t convert ?
Nope, converters are in sync
i mean, my changes that have been merged into diffusers source from this PR https://github.com/huggingface/diffusers/pull/2019
i guess i don’t understand why any new features are being developed for the ckpt code path
surely the better thing to do if merging is being implemented is to expressly only support merging diffusers compatible models
problem solved
@rose sentinel The actual sizing is done in get_noise, I've learned.
Just a wee bit kludgy!
I synced in your changes to the diffusers convert_from_ckpt.py yesterday.
Tell you what. As soon as we've got inpainting and outpainting working on the diffusers models as well as they are working on ckpts, I'll drop all direct ckpt loading and do the in-memory loading of ckpt->diffusers.
No features are being added to the ckpt code path. For example, @west nebula is fixing txt2img2img on the diffusers code path.
@tardy sparrow There is one set of changes to the load_pipeline_from_original_stable_diffusion_ckpt call that would be very helpful, and would let me import from diffusers rather than a forked copy of it. That would be to pass through the cache_dir parameter to all the from_pretrained() calls. This would allow reuse of the openai/clip and nsfw models if they are stored in invokeai's root directory.
old vs. new initial image sizing - totally different but unsure of which is better.
I'd also like to see strength (CLI -f) represented on the UI alongside high-res optimization as it does pass through and can be really useful.
hmm. can you work around it by setting the HUGGINGFACE_HUB_CACHE environment variable before making the conversion call? or does that get loaded once at startup?
bottom one seems more coherent (see the hands/manipulators)
The diffusers library seems to initialize from the environment variable at code load time, so it's very hard to control and leads to race conditions. In some of the pipelines (such as merge), there is a **kwargs argument that gets passed through to from_pretrained(), and this seems to work. In addition to use_cache_dir there is only_local and other useful options.
I agree that the bottom one is better. However you need to generate a series of images with the old vs new code in order to accommodate stochasticity.
The point of the PR is to reduce duplication in images, and I don't know if there's a great way to objectively test that.
can we have both?
Yes, you can manually generate to a suboptimal resolution and use img2img to upscale to your desired resolution.
It's not going to look identical, but you then have total control.
I prefer the new one cause it's obviously less duplicated (4 ears)
And it has fewer limbs as an added bonus.
yeah
That's the extreme case, something so stretched out. I'd use outpainting if I were producing something for real with it and not a contrived test case.
I keep preferring the latter ones with the new scaling but maybe that's just me.
then, me too. Looks like the old ones are somewhat repetitive and "flat"
but they may be preferable in some cases
Nothing precludes somebody from doing img2img on a generation they like of a smaller image. That's how I mostly use InvokeAI. But I'd rather give better images out of the box for --hires_fix if we can.
It all needed to be fixed for 768x768 models anyway
I continue to be a bear of very little brain so far this week.
I've added tracking issues for our several distinct inpainting problems to https://github.com/invoke-ai/InvokeAI/issues?q=is%3Aissue+is%3Aopen+sort%3Aupdated-desc+milestone%3A"2.3+🧨"
some issue with the actions
@worldly cloak i'm looking to enable sliced attention as part of my cross attention work - am i correct in assuming that the correct place to do that would be in StableDiffusionGeneratorPipeline.__init__(), something like this:
if is_xformers_available() and not Globals.disable_xformers:
    self.enable_xformers_memory_efficient_attention()
else:
    slice_size = 2 # or 4 or 8 i guess
    self.enable_attention_slicing(slice_size)
We just released 0.12.0 - hopefully that helps a bit regarding stability. We've included a section about important bug fixes which is probably interesting/important for you: https://github.com/huggingface/diffusers/releases/tag/v0.12.0
talk about timing
sliced support is in - @west nebula can you give it a spin and see if your memory usage improves? you can edit the slice size at the bottom of StableDiffusionGeneratorPipeline.__init__() (line 310) - smaller numbers should mean less memory usage at the expense of performance
Fine by me, I refuse to use xformers until it gets fixed.
basically it temporarily disables xformers when you do .swap()
i mean, that's what it does in theory anyway.
Are your changes in main or elsewhere?
Re-implementation of .swap() for diffusers 0.12's new CrossAttnProcessor API.
needs diffusers 0.12 pip install https://github.com/huggingface/diffusers
currently only tested/working on mac CPU ...
diffusers 0.12 is now released so you can just do pip install diffusers==0.12
and also transformers==whatever
theoretically, though i don't know the tradeoffs of different settings.
I'll track it down.
So the sliced support should help with all large image generation, correct?
should do yes
Stand by!
Maybe there should be a configuration that lets you choose slice sizes proportional to the image size, so that you're not leaving performance on the table when generating small images, but you can keep the memory requirements lower when you need to, for generating larger images?
If you're doing that, the approach should take VRAM into account as well if we have that information. But for now I'll settle for "make big images" even if I have to configure something manually.
Before I play with 8, 4, 2 as options, I'd like to know what the number influences.
Mostly if I set it wrong, will it crash? Or is there a way for the number to adapt to an OOM error? Or...
At the default value with 12GB doing outpainting of a 1024x1024 region...
I think VRAM's not getting freed up somewhere since it did the initial outpainting 1024x1024 generation without a problem.
I have noticed that Invoke seems to grab a chunk of RAM (not sure if it's VRAM or System RAM, since I'm using unified memory) proportional to the image size, that it doesn't free up when you switch to generating at a different size. a1111 does it too.
this is what the old ckpt path does, and it also takes VRAM into account
There is/was a --free_gpu_mem but it doesn't work with diffusers out of the box.
can you try adding this
del remapped_original_attn_slice, modified_attn_slice
just before the line that's crashing:
attn_slice = torch.bmm(attn_slice, modified_value[start_idx:end_idx])
Will do. Hang on.
In diffusers?
NVM, trying it now.
Still crashing after this change and slice size set to 2.
Maybe this is something in inpainting not freeing VRAM?
This happens just after starting the second inpainting (outpainting) pass when it blends the seams.
I don't think it has anything to do with the attention slicing code directly.
right. yeah idk.
Hmm.
So your code works and lets me do txt2img with much larger sizes than I can do without it. If that's the PR, then success!
i think this is done https://github.com/invoke-ai/InvokeAI/pull/2385. can i please get some more testing support - need to confirm it works on Linux and Windows, and also need to confirm that xformers is successfully re-enabled when doing a non-.swap() generation after doing one with .swap()
ahh good to know. actually that's just a side-effect of enabling slicing globally.
I'm verifying some more now.
I think that something needs to free the GPU memory somewhere in the rendering pipeline. Diffusers is not good about returning memory.
Just had another crash trying to do 1536x640 - I was always able to do this pre-diffusers.
Where's a good place to throw some debugging output to see that attention slicing is doing its thing?
I take back my prior statements about this working. 🙂
the system-level monitor is okay for showing a high water mark, but it's not so useful for identifying whether there's a memory leak.
https://pytorch.org/docs/stable/notes/cuda.html#cuda-memory-management
The "Max VRAM used for this generation: xxx Current VRAM utilization: yyy" log lines are more relevant for that.
This run, I could do a first generation of something large but I cannot do a second.
```
>> Could not generate image.
>> Usage stats:
>> 0 image(s) generated in 14.91s
>> Max VRAM used for this generation: 9.86G. Current VRAM utilization: 2.17G
>> Max VRAM used since script start: 9.86G```
And VRAM is maxed out according to nvidia-smi:
```
Wed Jan 25 17:03:29 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.65 Driver Version: 527.56 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... On | 00000000:01:00.0 On | N/A |
| 0% 44C P8 11W / 170W | 11517MiB / 12288MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 28336 C /python3.10 N/A |
+-----------------------------------------------------------------------------+```
what was max & current VRAM for the prior run that worked?
Just on startup... hang on.
```
>> 1 image(s) generated in 71.28s
>> Max VRAM used for this generation: 6.19G. Current VRAM utilization: 2.17G
>> Max VRAM used since script start: 6.19G```
so it previously worked in 6.19G, but the next run it's trying to allocate an additional 7 GB on top of 9 GB?
😖 those numbers extremely do not add up
Makes me think there's a leak somewhere important.
Here's the log of the entire run.
I can generate small things after this point but anything large throws up.
yet the first one ends at 2.17 G, which is the same as the initial "model loaded" number. So it doesn't look like it's leaking there. 
VRAM also gets freed up after a small image generation but clearly something has changed since I can't go big again until restarting Invoke.
I also never see cross_attention_control.py's Context created, so it doesn't look like it's used for saving slices. (Is this right for diffusers?)
I'm not up to speed on that cross-attention PR yet
hmm, alternate hypothesis: it's not a leak, it's some configuration that's being reset incorrectly.
e.g. maybe the attention slicing was on for the first run but then got turned off, or a different slice size
I figured I'd throw some debugging output into save_slice but I never see it.
Not even the first run.
if you're not using swap, I wouldn't expect Invoke's custom cross-attention code to be involved.
True... so then the only slicing code that's happening is built into diffusers with the one call: enable_attention_slicing(slice_size=slice_size)
And then either things aren't getting reset properly (as you said) or there's a VRAM leak somewhere.
SlicedSwapCrossAttnProcesser in ldm/models/diffusion/cross_attention_control.py
I added ("").swap("") to the end of my prompt and I can render over and over again. Does that help at all?
nope haha
i tried to clean up the cleanup by putting it in a @contextmanager function - custom_attention_context on InvokeAIDiffuserComponent. it's possible i messed that up.
because i haven't written a @contextmanager before.
When I dumped a print statement into here, it only triggered when I used the dummy swap.
it should trigger for all .swap() prompts
Right.
my feeling reading all of this is that the problem probably doesn't lie with the cross-attention stuff at all, because mathematically ("").swap("") is still swapping two 77x768 embedding tensors
if you make the slice size smaller, does that help? what if you comment out the enable_attention_slicing?
You're relying on the diffusers attention slicing there, right?
yeah
I'll modify and restart Invoke, but I have to go eat dinner as well so this may be a bit.
I commented out the line. Behavior with any .swap() is the same as before, which is expected. I can't generate a large image at all without .swap().
And after an OOM, I can still do large generations with .swap() whether the line is commented out or not. (I imagine this is expected.)
Does something get reset after a generation and we need to issue enable_attention_slicing again at the beginning of each generation?
I've tried using that. But it doesn't seem to make any difference in apple silicon. I think it just moves things from VRAM to RAM, which is the same thing on apple silicon.
@tardy sparrow Putting self.enable_attention_slicing('max') at the beginning of latents_from_embeddings does the trick. Can you move the attention-setting code to a new function that's called from there?
See this patch that seems to work (I only tested w/o xformers).
The evolution of this would be to see how much VRAM is required and if attention slicing is necessary.
--free_gpu_mem is working on the checkpoint path (still) but not implemented for diffusers. However, the CPU caching code in model_manager.py does work with diffusers, and so it shouldn't be hard to get free_gpu_mem working for all.
@rose sentinel Are you using xformers? If so, can you test @tardy sparrow's branch with my patch above generating a larger-than-will-fit-in-VRAM image?
Happy to give it a try. Sounds like there's been a lot of progress since I last checked in here!
any luck on the inpainting issue? i haven't been able to keep up today
on MPS inpainting is fixed with upstream pytorch 2.0 on the main MPS fork - but idk when it will make it into pytorch 2.0 nightlies
so with diffusers 0.12 and my new Compel library i've rolled out of the InvokeAI prompting code, custom prompt syntax with diffusers looks like this:
from compel import Compel
from diffusers import StableDiffusionPipeline
pipeline = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
compel = Compel(tokenizer=pipeline.tokenizer, text_encoder=pipeline.text_encoder)
# upweight "ball"
prompt = "a cat playing with a ball++ in the forest"
embeds = compel.build_conditioning_tensor(prompt)
image = pipeline(prompt_embeds=embeds).images[0]
what it doesn't support for now is the TextualInversionManager. i'm going to make Compel and textual inversion code as decoupled as possible - once that's done it should be possible to replace all of invoke's prompting and conditioning wrangling with import compel
@tardy sparrow I've been doing a lot of testing and this works w/o xformers. No crashes to report. Note that I use 'max' instead of 'auto'.
'auto' doesn't cut it (pun intended) for large images.
great.
thanks for trying it out
can you open a PR against my branch?
then i can just fold it in.
or.. actually reading the patch now. so you've refactored the attention type setup to its own function, and then that gets called prior to every generation?
are there other places than image_from_embeddings() where this needs to go? i suspect image_from_embeddings() isn't the only codepath to generating images
might be wrong though
Yeah, I'm not sure. It is the path for txt2img2img, img2img, and txt2img. Unsure if there are other places where it's called but that's a start.
And yes, I refactored it because we may want to have some sort of behavior there to determine whether to use no attention slicing, 'auto', 'max', or a number. No idea yet but it seemed prudent to pull it out.
If you still want me to open a PR, I can do it much later today... otherwise feel free to apply that patch yourself.
ok no worries, seems pretty simple so i'll just go ahead
i have some comments from @worldly cloak (thanks for the review!) to address too
I forget who asked for this originally, but I have posted a PR that will remove the dependency on the original implementation of clipseg (for text-based masking) and use the huggingface transformers version instead. In addition to reducing code complexity, this removes the last source-code module dependency that was preventing us from posting InvokeAI to PyPi for hands-free installs. @gusty hound
Is there any documentation about the cache location for models and how to set it?
if you set HF_HOME it should be honoured by Invoke, but huggingface are a bit inconsistent about how that gets interpreted - i’ve seen HUGGINGFACE_HUB_CACHE also referenced.
Did Invoke download HF models originally to a different location?
diffusers by default puts them in ~/.cache/huggingface
at some point either lstein or keturn explicitly told it to put them under your invokeai folder
I guess that point has passed and I can purge things from there.
yeah, unless you find yourself doing stuff with diffusers standalone you don't need them
That's 79 gig that I don't need filled up.
noice
I miscalculated and it's not that much, but still. No need for duplication anywhere.
Wishing ext4 had deduplication right now.
yeah there's definitely some "hardware imperialism" going on with the assumptions HF are making about the kind of systems ML users have
I run all of my AI stuff on rusty platters because things are so large and there's a lot of churn, but once it stabilizes I'll likely get a smallish SSD.
your patch seems to be working fine on my vast.ai instance
@here
can i get a windows test
@west nebula i think i found why your patch fixed things - can you try again?
Fill me in before I pull?
ah, i was restoring None to the cross attention processors for every non-swap
^_^
should not have been doing that
I'll pull and test now.
Wait, it won't work.
'auto' doesn't cut it, I need to use 'max' on my setup.
Though as we discussed there should be a strategy for this.
yeah, 'max' has performance implications for smaller images
The flip side is that I can't generate anything large without it on my GPU with 12GB VRAM. 1536x1536 crashes.
I was able to do medium-sized things, so that's good.
I think it should be a per-generation choice based on available memory. We should be able to estimate how much is needed, or try generating and switch for the render if we get an OOM...
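The "try generating and switch if we get an OOM" idea could look roughly like the sketch below. Everything here is hypothetical: `generate_with_fallback`, `generate` and `set_slicing` are illustrative names, not InvokeAI or diffusers functions. In a real setup `set_slicing` would presumably wrap the pipeline's `enable_attention_slicing(...)` / `disable_attention_slicing()` calls, and the OOM check assumes the error message contains "out of memory", as PyTorch's CUDA OOM errors do.

```python
def generate_with_fallback(generate, set_slicing, modes=(None, "auto", "max")):
    # Try the fastest setting first and fall back to more aggressive
    # attention slicing only when the attempt runs out of memory.
    last_error = None
    for mode in modes:
        set_slicing(mode)
        try:
            return generate(), mode
        except RuntimeError as err:
            if "out of memory" not in str(err).lower():
                raise  # unrelated failure: do not mask it
            last_error = err
    raise last_error


# Fake backend for demonstration: only the "max" setting fits in memory.
state = {"mode": None, "attempts": []}


def fake_set_slicing(mode):
    state["mode"] = mode
    state["attempts"].append(mode)


def fake_generate():
    if state["mode"] != "max":
        raise RuntimeError("CUDA out of memory.")
    return "image"


result = generate_with_fallback(fake_generate, fake_set_slicing)
print(result)             # ('image', 'max')
print(state["attempts"])  # [None, 'auto', 'max']
```

This keeps small images on the fast path and only pays the 'max' slicing cost when a render actually fails, at the price of a wasted attempt or two on large images.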
i agree, i just think this should be a different PR
because it touches more stuff including UI
(potentially)
So in the meantime we just can't generate large things, and this still needs to be tested with xformers (@rose sentinel)?
And is there a way to estimate how much VRAM a generation will require?
There must be, because it's deterministic, but there are enough layers and variables that I've yet to see anyone work out the precise equation.
But that's what the equations like this in the old attention code were trying to get at:
max_tensor_mb = mem_free_total / 3.3 / (1 << 20)
size_mb = q.shape[0] * q.shape[1] * k.shape[1] * q.element_size() // (1 << 20)
if size_mb <= max_tensor_mb:
…
Yeah, hmm.
I think it should be trivial if we know the precision and base tensor sizes for 512x512.
We know the images are 1x4x64x64, for example.
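As a rough worked example of the `size_mb` expression above, here is the size of a single unsliced self-attention score tensor at full latent resolution. The defaults are assumptions, not measured values: batch of 2 (conditioned plus unconditioned), 8 attention heads, fp16 (2 bytes per element), and a VAE scale factor of 8. The function name is made up for illustration.

```python
def attention_scores_mb(width, height, batch=2, heads=8,
                        bytes_per_element=2, vae_scale=8):
    # One token per latent pixel; for self-attention q and k have the same
    # token count, so the score tensor is (batch*heads) x tokens x tokens.
    tokens = (width // vae_scale) * (height // vae_scale)
    return batch * heads * tokens * tokens * bytes_per_element // (1 << 20)


print(attention_scores_mb(512, 512))    # 512
print(attention_scores_mb(1536, 1536))  # 41472
```

The quadratic growth in token count is why going from 512x512 to 1536x1536 (9x the pixels) multiplies this one tensor by 81, and why some form of slicing or memory-efficient attention is needed for large images on a 12GB card. It is only one term of the total, so it is not a full VRAM estimate.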
A few things to look at before investing too much time in optimizing the current implementation for biggest-possible-image:
• Birch-san has a non-xformers implementation of memory-efficient attention, which could be included upstream. This sounds better than the current (non-xformers) attention slicing implementation: https://github.com/huggingface/diffusers/issues/1892. Useful for non-CUDA platforms. No idea whether it's useful for improving the cross-attention-control code.
• If your platform is supported by xformers but you've been avoiding it due to reproducibility issues, this change (which did just make it in to diffusers 0.12!) allows better control over some of those tradeoffs: https://github.com/huggingface/diffusers/pull/2049
Those sound interesting and potentially good to have in. In truth, I don't care which combination of things works as long as Invoke doesn't crash in cases where it used to (pre-diffusers) run fine.
Better ML folks should decide the implementation details, in other words.
I remember reading Birch-san's paper, so I'm definitely curious to see it in action.
anyone successfully managed to install triton on windows 11 yet?
Hey 🙂
Very cool that you've merged the big diffusers PR!
Would it make sense if we maybe open a new channel InvokeAI<>Diffusers (maybe here) where I can also try to invite some other diffusers contributors and where you could quickly ping us in case you have more in-detail questions? It's a bit difficult to follow all the discussions here 😅
We mainly use Slack with the diffusers team, so we could also use a Slack channel (there we'll surely be super reactive), but Discord is very good as well. I'll try to check at least every 2nd day.
Yeah. Thanks Patrick. That sounds like a good idea. @heavy glacier Shall we set up a new channel?
Yep - Will get it set up
#🧨diffusers is born
I've also added a role to highlight the team
Where is @tardy sparrow 's new memory-efficient slicing code? I can't find its PR.
Deterministic behavior is very important to lots of people. Does the use of Flash Attention have tradeoffs, or can we make it the default behavior?
Re-implementation of .swap() for diffusers 0.12's new CrossAttnProcessor API.
needs diffusers 0.12 pip install https://github.com/huggingface/diffusers
currently only tested/working on mac CPU ...
And does that truly produce deterministic behavior or is it only somewhat better?
Documented in the CHANGELOG. Should be moved into main documentation as well:
2. The format of the models directory has changed to mimic the
HuggingFace cache directory. By default, diffusers models are
now automatically downloaded and retrieved from the directory
`ROOTDIR/models/diffusers`, while other models are stored in
the directory `ROOTDIR/models/hub`. This organization is the
same as that used by HuggingFace for its cache management.
This allows you to share diffusers and ckpt model files easily with
other machine learning applications that use the HuggingFace
libraries. To do this, set the environment variable HF_HOME
before starting up InvokeAI to tell it what directory to
cache models in. To tell InvokeAI to use the standard HuggingFace
cache directory, you would set HF_HOME like this (Linux/Mac):
`export HF_HOME=~/.cache/hugging_face`
The default HF location on my system is ~/.cache/huggingface rather than ~/.cache/hugging_face, and it seems that the latest main is putting things there rather than in ROOTDIR/models/diffusers. I do get a message about fetching files whenever I switch models.
I don't have HF_HOME set.
So if the CHANGELOG is correct, there's a bug.
~/.cache/huggingface is correct and the CHANGELOG is wrong. I just posted a PR to fix the docs: https://github.com/invoke-ai/InvokeAI/pull/2430
More seriously, though, the code ought to be downloading into ROOTDIR/models/diffusers. I am not seeing the behavior you describe on current main; for me, models are getting downloaded into ROOTDIR as intended.
Diffusers has a slightly annoying property of printing "Fetching XX files" and displaying a progress bar even when the files are already cached to disk and available. I wonder if there is a way to suppress this without suppressing the bona fide progress bar that is displayed when the model is actually being downloaded? @forest spade
Any chance inpainting is improved for CUDA? In current main, outpainting with runwayml/stable-diffusion-inpainting is still massively broken:
@rose sentinel Is this the final format of the diffusers model in the model.yaml?
```
--- description:
--- path:
--- repo_id:
--- format:
--- default:
--- vae:
------ path:
------ repo_id:
```
is there a reason that the yaml uses 3 spaces for indentation instead of just two?
Current main is dumping things into the default HF cache dir for me - ~/.cache/huggingface.
the yaml doesnt.. i just typed it here manually
INITIAL_MODELS.yaml actually is using 3 😅
So I wondered if there is a reason for this 🙈
maybe that's the setting in the creator's editor 🤷🏻♂️
I've fixed up the issues in the model manager web UI pertaining to the diffusers models, using the above format. I'm not sure if the precision entry is still a thing; I can add it back if it is.
I've also changed the seam steps default to 30 because it seems to solve a lot of low quality issues with the 1.5 and 2.1 models. Inpainting still seems to have minor issues but its a start.
ill give the latest a try
do you have XDG_CACHE_HOME set by any chance?
https://huggingface.co/docs/huggingface_hub/main/en/package_reference/environment_variables#hfhubdisableprogressbars
seems like os.environ["HF_HUB_DISABLE_PROGRESS_BARS"] = "1" might do the trick (the value has to be a string); we can set it in the Globals such that it takes effect as early as possible
(edit) however, I didn't read your message fully.... this does suppress the bona fide progress bars, unfortunately
indeed. Excellent find. So there are how many variables that refer to the cache location now?
ldm/invoke/globals.py - home = os.getenv('XDG_CACHE_HOME')
yea. i stepped away for a bit, sorry. IMO, we shouldn't be relying on that env var: 1) it only applies to Linux, and 2) very inconsistently even at that (it exists in some GUI sessions on some distros). So, we should just look for HF_HOME (which is also respected by HF libraries), and if not found, default to our good ol' INVOKEAI_ROOT/models. IF the user explicitly wants to use the shared Huggingface cache, they can set HF_HOME.
I like that approach. There's been a lot of churn in this area.
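The lookup order proposed above (HF_HOME first, otherwise the InvokeAI root's models directory, with XDG_CACHE_HOME ignored) could look roughly like this. It is a sketch only: `resolve_model_cache` and its parameters are hypothetical names, not what globals.py actually contains.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_model_cache(root: Path, env: Optional[Mapping[str, str]] = None) -> Path:
    """Return HF_HOME if the user set it, otherwise fall back to <root>/models."""
    env = os.environ if env is None else env
    hf_home = env.get("HF_HOME")
    if hf_home:
        # The user explicitly opted into a shared HuggingFace cache.
        return Path(hf_home).expanduser()
    # Default: keep models under the InvokeAI root; XDG_CACHE_HOME is not consulted.
    return root / "models"


print(resolve_model_cache(Path("/opt/invokeai"), {}))
print(resolve_model_cache(Path("/opt/invokeai"), {"HF_HOME": "/data/hf"}))
```

Passing the environment in as a mapping keeps the function easy to test without touching the real process environment.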
Yes.
I don't want to turn off the progress bars completely. Just disable the 100% progress bar when the files are already downloaded and cached.
Does the YAML format call for 2 spaces? I was unaware that it mattered.
This was added in a contributor's PR about a week ago. It was consistent with the huggingface documentation, so I accepted and merged it.
The precision entry is no longer a thing.
See my comments on your PR about this.
noted. the model manager should be good to go i think .. for manual purposes.
at some point in the future when most of the users have moved to diffusers, i think we can automate the process a bit .
tested most edge cases ..seems to work as intended .
I'm in the mood to remove support for XDG_CACHE_HOME if it is a Linux-only thing that is inconsistently used.
XDG_CACHE_HOME defines the base directory relative to which user-specific non-essential data files should be stored.
That doesn't sound like a good fit to me.
@worldly cloak Have you tried to generate images with the diffusers version of the runway inpainting-1.5 model? I'm now getting this type of error when I do a txt2img (not inpainting):
Traceback (most recent call last):
File "/home/lstein/Projects/InvokeAI/ldm/generate.py", line 506, in prompt2image
results = generator.generate(
File "/home/lstein/Projects/InvokeAI/ldm/invoke/generator/base.py", line 109, in generate
image = make_image(x_T)
File "/home/lstein/Projects/InvokeAI/ldm/invoke/generator/txt2img2img.py", line 48, in make_image
first_pass_latent_output, _ = pipeline.latents_from_embeddings(
File "/home/lstein/Projects/InvokeAI/ldm/invoke/generator/diffusers_pipeline.py", line 357, in latents_from_embeddings
result: PipelineIntermediateState = infer_latents_from_embeddings(
File "/home/lstein/Projects/InvokeAI/ldm/invoke/generator/diffusers_pipeline.py", line 189, in __call__
callback(result)
File "/home/lstein/Projects/InvokeAI/backend/invoke_ai_web_server.py", line 1212, in diffusers_step_callback_adapter
return image_progress(progress_state.latents, progress_state.step)
File "/home/lstein/Projects/InvokeAI/backend/invoke_ai_web_server.py", line 997, in image_progress
image = self.generate.sample_to_lowres_estimated_image(sample)
File "/home/lstein/Projects/InvokeAI/ldm/generate.py", line 983, in sample_to_lowres_estimated_image
return self._make_base().sample_to_lowres_estimated_image(samples)
File "/home/lstein/Projects/InvokeAI/ldm/invoke/generator/base.py", line 204, in sample_to_lowres_estimated_image
latent_image = samples[0].permute(1, 2, 0) @ v1_5_latent_rgb_factors
RuntimeError: mat1 and mat2 shapes cannot be multiplied (4096x9 and 4x3)
>> Could not generate image.
My error - this is with txt2img2img.
With the inpainting model, txt2img is working, txt2img2img is crashing, inpainting is producing wacky results as demonstrated in my earlier message, and img2img is producing fuzzy images like this:
I'm getting this in my output:
DEBUG: choose_autocast() called
Is that expected at this point?
Sorry, I've been meaning to remove that. I'll find and remove it at next opportunity.
np
@rose sentinel Did you merge the TI front end into the other TI script? I'm getting this when I try to run it.
To get the front end you now need to pass --gui, as in textual_inversion --gui
python scripts/textual_inversion.py --gui
That's the output from that command - see the 1st line.
Sorry, that script's defunct. I should have removed it.
The command is textual_inversion without the .py.
Should be on your path after doing a pip install -e .
Just made a PR to fix this. Check the updated documentation. I believe I brought it up to date.
I also moved the merge script.
Ah, I launch everything from scripts. I'll have to find textual_inversion.
Haven't done the pip install -e . because I'm switching between branches constantly.
See if it is on your path. Supposedly this is the more pythonic way to do it.
You can run directly by calling python ldm/invoke/textual_inversion.py
Nope, not on my path. I'll call it that way and eventually do the pip install -e ..
It won't be on your path unless you do the pip install.
Let me know if it works when you call it directly, and apologies for leaving the old nonfunctional version in the scripts directory.
Got it. I'll call ldm/...
Doing it the pythonic way makes sense if you're a python person. 🙂
Me? I like python scripts/....
This fired up just fine. Trying it out now.
Since diffusers now contains @tardy sparrow's module-ised conversion script that lets you import a ckpt file and use it with diffusers, without saving the converted file first, I was thinking it would make sense to add a --force_diffusers option, which would cause non-diffusers models to be loaded through that converter, instead of using the old code. It would make a lot of testing easier. Then when you're ready to get rid of the old code, you could make that setting always-on, so that users can keep using the .ckpt models, and not need to convert them all.
this is in the fast latents. looks like the inpainting model is producing a latent tensor with 9 channels instead of the expected 4
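The shape mismatch in the traceback (4096x9 against 4x3) is what you would expect if the 9-channel inpainting input reaches a preview step written for 4-channel latents. One possible guard is sketched below in plain Python; the factor values are placeholders rather than the project's real `v1_5_latent_rgb_factors`, and treating the first four channels as the noised latents is an assumption based on the channel order used by diffusers' inpainting pipeline.

```python
from typing import List, Sequence

# Placeholder 4x3 projection; illustrative values, not the project's real factors.
LATENT_RGB_FACTORS = [
    [0.3, 0.1, 0.1],
    [0.1, 0.4, 0.1],
    [-0.3, 0.2, 0.2],
    [-0.1, -0.2, -0.7],
]


def latent_pixel_to_rgb(pixel: Sequence[float]) -> List[float]:
    """Project one pixel's channel vector to RGB using only the first 4 channels."""
    latent = list(pixel[:4])  # drop mask / masked-image channels if present
    if len(latent) != 4:
        raise ValueError(f"expected at least 4 latent channels, got {len(pixel)}")
    return [
        sum(latent[c] * LATENT_RGB_FACTORS[c][k] for c in range(4))
        for k in range(3)
    ]


nine_channel_pixel = [1.0, 0.0, 0.0, 0.0, 9.0, 9.0, 9.0, 9.0, 9.0]
print(latent_pixel_to_rgb(nine_channel_pixel))
```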
ok
so I think the issue in general w/ inpainting (and maybe img2img?) is low img2img strength/steps
With the inpainting model?
I'd like to try to reproduce the inpainting issues with main. Is there a procedure?
- generate a good image
- inpaint an area with a low strength (.15, for example)
- observe quality
I have yet to test the latest main w/ outpainting
im going to do a direct workflow comparison/seed/etc. on 2.2.5 and 2.3
Yeah, I would never inpaint with .15... that's going to generate garbage.
So in the past when I've used masking I always turn the inpainting strength to at least 0.3 otherwise I get basically the source image out.
Yes
I typically do .15 only when trying to get minuscule variations of an image
I was surprised to see the deterioration of quality when doing that in 2.3, which leads me to believe that may be the source/related to the quality issues elsewhere
Did you try a high-strength (say 0.6) comparison?
yeah i was working w/ higher strength without noticeable issues
doing a 1:1 comparison of workflow for both outpainting and inpainting now
and will share outputs
...and settings, please.
yep
Hm.
Same seed, model on 2.3. wildly different outputs
(base generation, not even talking about inpainting)
maybe i need to update diffusers
to get xformers fix
probs. 😛
Or the boss move of --no-xformers.
As soon as I started getting wildly different results, I disabled it.
i need the speed
ok - so maybe something got changed alongside the diffusers .12 update - not able to generate the same image (unsure if xformers or diffusers) but inpainting and outpainting both seem dramatically improved.
GREAT NEWS
im thinking we're nearing RC @rose sentinel !
I hope so. I'm hoping for the installer work to be done by the end of this weekend (@dire gazelle and @gusty hound ), but am unsure of the status of the inpainting/outpainting work. There seem to be issues with the inpainting-1.5 model. The ckpt version is still working, but (1) it doesn't convert cleanly to a working diffusers model, and (2) the huggingface diffusers version has multiple issues described last night. One option is to punt and to advise people to use the ckpt model exclusively.
So when are we going to attempt to fix the issue with large images and VRAM OOM?
We first need to merge in @tardy sparrow's PR https://github.com/invoke-ai/InvokeAI/pull/2385
There is a trick for merging the inpainting model with a second model to generate an inpainting version of the second model. However, we need to support the diffusers version of 9-channel models for this to work.
I tested this and it seems to work. If there are no issues identified, I think it can be taken out of draft status and merged.
Are we talking about the same PR?
I think we could implement a diffusers inpainting version as a 2.3.1 fix
The code in @tardy sparrow's PR gets us somewhat there but not entirely - I can't generate the same sizes I did with 2.2.5.
I also think larger image generations can be investigated in 2.3.1
Unless we're able to beat installer on fixes
I will note, I think I am still seeing some quality quirkiness on 2.3 w/ inpainting
but it might just be non SD1.5 models...
So the errata/caveats on 2.3.0 are somewhat more than past releases.
It could very well be a 3.0, even though we're not calling it that
definitely is a big 'migration' type release
Sure.
But there are a lot of minor issues that we'll have to address so people may not want to make the switch right away.
i think we're better off releasing vs. holding off, but thats not a strongly held opinion
we can put it in RC and fix the major stuff while thats being used
RC doesnt mean "no fixes left to do" as we've all experienced 😅
this is after a couple passes with a non SD 1.5 model
using SD 1.5 afterwards, seems to actually do inpainting fine... 🤔
OK **this** is an interesting discovery.
my SD 1.5 is on ckpt generation path
im going to try another ckpt model, then switch to diffusers
After 10 generations at .15 on a ckpt model.
another 10 at .6 (lets not focus on the
itself)
diffusers model at .15
Lots of artifacts in the surrounding region.
yep
Can you show what exactly you have masked?
Trying to see if the artifacts are on the border or inside mostly
at .6
one sec ill do a direct comparison w/ mask
full image before
masked
after
its going outside the mask w/ seam paint
So that's similar to what I was seeing yesterday with my converted 1.5 non-ema model.
loopbacked img2img at .15
issue is in img2img
a .75 img2img
low strength img2img 🧠
WE FOUND IT
canvas or non-canvas img2img?
non-canvas
So low-strength only yields problems, or at least ones that are noticeable?
Can you try with ddim?
This is why I never noticed problems to this extreme degree - I almost always use ddim.
steps?
I found one seed in particular that screwed up my image with k_dpmpp_2 @ 30 steps, 0.15 i2i strength - 1304976924
The inpainting and outpainting on the non-inpainting models has definitely improved considerably. I haven't done extensive comparisons but it feels like the "success" rate between the ckpt and diffusers models is about the same, given the variability of results. However, I'm a big fan of the inpainting model, which produces consistently good results, so I'm hoping the problem(s) with the diffusers version can be tracked down and fixed. I'm reluctant to dig into it myself, as I'm not very familiar with the internals of the diffusers code, but I could try...
I guess that really just means low step counts generally on karras though, given how strength works
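To make the "low step counts" point concrete: in diffusers' img2img pipeline, strength scales how many of the requested steps are actually run. The sketch below mirrors that calculation; the function name is hypothetical, and InvokeAI's own pipeline may compute this slightly differently.

```python
def effective_img2img_steps(num_inference_steps: int, strength: float) -> int:
    """Number of denoising steps actually executed for a given img2img strength."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    return min(int(num_inference_steps * strength), num_inference_steps)


for steps, strength in [(30, 0.15), (20, 0.5), (30, 0.75)]:
    print(steps, strength, effective_img2img_steps(steps, strength))
```

At 30 steps and 0.15 strength only 4 steps run, which gives a multi-step or Karras-schedule sampler very little room to converge.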
I wonder if two seeds after one another can do this. 2364683628 followed by 1304976924
Can somebody try that with k_dpmpp_2 @ 50 steps, loopback at 0.6?
That's a good idea. I'll do that this weekend.
Thanks for looking into this! The procedure is to import runwayml/sd-inpainting-1-5 (make sure to get the diffusers version, not the ckpt), then load an image into the unified canvas. Extend the bounding box beyond the borders of the image in order to ask for outpainting, and then "Invoke". If you compare this to the ckpt version, you'll see a huge difference.
Does it matter which sampler?
I will be happy to tag an RC as soon as @dire gazelle is done with the python-based installer script.
Also, loopback doesn't seem to change the SD metadata to refer to the previously-generated image, is that a bug?
probably.
@heavy glacier Try loopback with your starting seed at 3905153771, 10 iterations, strength something reasonable like 0.6
I've seen problems on multiple samplers but haven't gone through them exhaustively.
Just for comparison, here's the ckpt version of inpainting-1.5 in which I masked the artist's hand and outpainted downward by 128 pixels. It does pretty well on the outpainting. Not so thrilled with the hand.
keep seed the same for each generation?
If you use loopback and generate >1, it'll use a PRNG random walk starting at the seed you put in.
thats why i was asking
(Things I learned yesterday.)
images=10
Which sampler?
Nah it’s pretty much after the first few
@rose sentinel I'm getting the diffusers inpainting model now, will try soon.
Every so often when using k_dpmpp_2_a I get a patch of artifact. I don't use the non-ancestral one much, but maybe it has a small probability of producing artifacts which add up over multiple loopbacks.
Much obliged, and apologies for harping on the issue.
Part of me wonders if this is something with how the k_dpmpp_2 sampler works and it amplifies artifacts in the source image and puts traces of them in the destination - so you get a compounding error sort of effect.
But then we have the ckpt files that work well, so... hm.
Perhaps it's time to call the diffusers folks.
@rose sentinel Can you share your parameters and such so I can best reproduce the problem?
Sure, I used k_euler_a , 30 steps, random seed, nothing else turned on.
i2i strength?
Patchmatch or tiling? Inpaint replace (if so, what strength)?
This is why storing all of the canvas settings in metadata would be super helpful.
OK. I see what you mean. Inpainting model is not working well at all.
Do we have a separate channel for this discussion?
In generate.py: inpainting_model_in_use = self.sampler.uses_inpainting_model()
This is true for ckpt inpainting models and false for the diffusers inpainting model.
I assume that's desirable as that leads to the omnibus generator for the former and the inpainting generator for the latter.
The ckpt_generator/omnibus module is only for ckpt files. There is a port of generator/omnibus, but because uses_inpainting_model() always returns false for diffusers models, it has never been tested.
I set it to True. It doesn't work. 🙂
Also inpaint_replace doesn't work in diffusers inpaint.py.
I want to see the canvas debugging output but it doesn't seem to work on my headless box. Any pointers?
I don't think diffusers has an ancestral version of DPM Solver++
When troubleshooting things, I strongly recommend starting with a nice simple single-order scheduler like DDIM.
Omnibus for the old ckpt inpainting model isn't working right, either...
```python
    # TODO: we should probably pass this in so we don't have to try/finally around setting it.
    self.invokeai_diffuser.model_forward_callback = \
        AddsMaskLatents(self._unet_forward, mask, init_image_latents)
else:
    guidance.append(AddsMaskGuidance(mask, init_image_latents, self.scheduler, noise))
```
If I make that change (False and) in diffusers_pipeline.py, the inpainting model does something.
Why is that?
Here's a video of it in action with that bad change. https://drive.google.com/file/d/12iIvxEc-kb1PDqJRygkA6PX6pXpd2PXD/view?usp=share_link
wait, it works at all if you skip AddsMaskLatents? Oh, yes, it does, because the chunk that makes non-masked operations work with the inpainting model is further down
so maybe I messed up the order of channels in AddsMaskLatents
Well AddsMaskLatents doesn't seem to do much at all. :/
I can throw some debugging output in there and see if it gets called.
I see __call__ followed by add_mask_channels, so it's working... just not doing what it needs to.
That happens every step.
crash for hires-fix when used with inpainting model in https://github.com/invoke-ai/InvokeAI/pull/2440
So is AddsMaskLatents even necessary or can we just use AddsMaskGuidance?
the inpainting model needs those extra latent channels passed to it, yes. otherwise it's not an inpainting model.
So it was working without AddsMaskLatents but not doing what it should have.
writing a test for it now to see if I flubbed something obvious
This would be a great usecase for the "merge models to RAM" option, to let you load any model as an inpainting model without doubling the number of models you have to store. You just need to make sure you have both 1.5 and 1.5-inpainting for the merge process.
Let me know if you want me to test anything. I'm around for a bit more.
hmm, test case checked out, it seems I did not flub the order of the channels
So does the function not return what it's supposed to?
it's working as intended. The AddsMaskLatents.add_mask_channels function, that is.
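For anyone following along, the "extra latent channels" are what turn a 4-channel input into the 9-channel input the inpainting UNet expects. The sketch below uses plain nested lists as a stand-in for a channel-wise `torch.cat`; the order (noised latents, then mask, then image latents) follows diffusers' inpainting pipeline, and this is not the actual body of `AddsMaskLatents.add_mask_channels`.

```python
from typing import List

Channel = List[List[float]]  # one H x W plane


def add_mask_channels(
    latents: List[Channel], mask: Channel, image_latents: List[Channel]
) -> List[Channel]:
    """Concatenate 4 latent + 1 mask + 4 image-latent channels into 9 channels."""
    if len(latents) != 4 or len(image_latents) != 4:
        raise ValueError("expected 4 latent channels and 4 image-latent channels")
    return latents + [mask] + image_latents


plane = [[0.0]]
combined = add_mask_channels([plane] * 4, [[1.0]], [plane] * 4)
print(len(combined))
```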
Is the appropriate amount of noise getting added in?
The only other time I've seen weirdness like this is with img2img when there's transparency and there shouldn't be.
which "like this"?
different people have posted a lot of different things about different aspects of inpainting in the last 24 hours and I have a great deal of difficulty following who is talking about what
See the video I shared above.
The first part is what I get if I don't bypass AddsMaskLatents no matter how many steps or which sampler I use.
It looks almost like the original tiling or patchmatch.
Change the strength to 0.99 and I get the same thing.
Original vs. outpainted
(Tiles of size 16)
the only blocker is the update flow at this point. aiming to finish it this weekend.
@worldly cloak It's clearly doing something as the boundaries between those tiles look softer than they would were it not doing anything at all.
```python
"""predict the noise residual"""
if is_inpainting_model(self.unet) and latents.size(1) == 4:
    # Pad out normal non-inpainting inputs for an inpainting model.
    # FIXME: There are too many layers of functions and we have too many different ways of
    #     overriding things! This should get handled in a way more consistent with the other
    #     use of AddsMaskLatents.
    latents = AddsMaskLatents(
        self._unet_forward,
        mask=torch.ones_like(latents[:1, :1], device=latents.device, dtype=latents.dtype),
        initial_image_latents=torch.zeros_like(latents[:1], device=latents.device, dtype=latents.dtype)
    ).add_mask_channels(latents)
```
That if block never executes because AddsMaskLatents has already been hooked up by this point.
Not sure if that's a problem or not - just an observation.
So when I changed the code before to False and..., this code did get executed on each step.
correct
try this: https://github.com/invoke-ai/InvokeAI/pull/2441
it works okay for me on our red-haired sample lady.
although in some ways that case is particularly easy for this method because the background is naturally a flat gray.
But (not looking at the code yet) that implies that the infill method is the issue. And I have strength up to 0.99 - almost a complete replacement - and it's not working.
I just pulled it and I'll give it a shot. Stand by.
Can I choose the new methods in the GUI?
you can! I was pleasantly surprised to discover I didn't have to change anything else for that, the GUI picks up the infill methods list.
That's cool.
Loading things up now. I'll try to outpaint Patrick Stewart.
Are you using xformers?
Weird, that worked.
No, no xformers here.
So I have an idea. Hang on a sec.
Nope. Why is this working at all?
I do get a line with blur
That's consistent across seeds, too.
Could be my prompt, who knows.
yeah, I think I goofed something up with blur. tinkering with that a little.
Yeah, maybe my seed.
Anyway, why do those two methods work while tile and patchmatch do not?
Definitely getting a seam even without an embedding.
huh I thought "blur" was a fairly simple idea but the results are much weirder than I was anticipating. even after fixing my thinko with the order of the layers.
Also I think the seam painting isn't doing much at all, probably for the same reason that working with tile or patchmatch isn't doing much...
hmm, I wonder if the way the inpainting model was trained, it learned to expect the masked area to be zeroed out like that. That seems plausible.
Didn't it work with 2.2.5?
I never used it but I think a lot of people did.
And if it is expecting gray (0x7f? 0x80?), then the seam painting step is a waste of time.
And if all of this is the case, we should just use pure gaussian noise at 100% strength for inpainted regions for the inpainting model.
I just did some quick testing and mid-grays seem to be acceptable for it to figure things out. Too light or too dark and it does a bad job, same with too colorful.
yeah, that's the conclusion I'm coming to as well. Inpainting model should always be run at full strength with the masked region blanked.
okay, that's a fair reason for it to take a different code path... we just have to make the reasons why a little clearer in the code than naming it "omnibus"
And if inpainting's picked, strength doesn't matter for transparent/erased regions... but it does for img2img, which works? in this model still.
but I am not sure how to communicate this UX-wise.
I'm removing the "blur" mode from that PR, as it doesn't seem to be useful.
So what if we took results from the other methods but scaled them into 0x40-0xbf? Would that yield more texture and be helpful?
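The rescaling idea is just a linear remap of each 8-bit channel value from 0x00-0xff into 0x40-0xbf. A minimal sketch, with a hypothetical function name, and with no claim that it actually helps the inpainting model:

```python
def compress_to_midrange(value: int, lo: int = 0x40, hi: int = 0xBF) -> int:
    """Linearly remap an 8-bit value from 0..255 into lo..hi."""
    if not 0 <= value <= 255:
        raise ValueError("value must be an 8-bit channel value")
    return lo + round(value * (hi - lo) / 255)


print([hex(compress_to_midrange(v)) for v in (0, 128, 255)])
```

This keeps the infill's texture but halves its contrast, so the result stays close to the mid-grays the model seemed to tolerate in the quick test above.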
ooooh, hmm, I just realized something.
inpainting model gets those two inputs, the noised latents we're working with and the latents of the original image.
and we're learning that it's important for one of those to have just flat gray in the masked area, but that might not be equally true for both.
And a follow-up: What happens to masked areas with the inpainting model?
I think masked areas with inpainting have to be treated the same as erased areas.
This technique works perfectly for inpainting only with regular SD1.5 as well - fill with gray and set strength to 0.99 for filling erased regions.
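The "fill with gray" step itself is simple. Here is a sketch on a tiny RGB image held as nested lists; the names are hypothetical, and 0x80 as the gray value is an assumption, since the thread was unsure between 0x7f and 0x80.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
MID_GRAY: Pixel = (0x80, 0x80, 0x80)  # assumed value; 0x7f may be equally valid


def blank_masked_region(
    image: List[List[Pixel]], mask: List[List[bool]], fill: Pixel = MID_GRAY
) -> List[List[Pixel]]:
    """Return a copy of image where every masked (True) pixel is replaced by fill."""
    return [
        [fill if masked else pixel for pixel, masked in zip(image_row, mask_row)]
        for image_row, mask_row in zip(image, mask)
    ]


image = [[(255, 0, 0), (0, 255, 0)], [(0, 0, 255), (10, 20, 30)]]
mask = [[False, True], [True, False]]
print(blank_masked_region(image, mask))
```

The blanked image would then be run at a strength close to 1.0 (0.99 in the message above) so the masked area is almost entirely regenerated.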