#🧨 diffusers
and again, I don't think it's extremely common, but it should be doable for those that want to do the work in models.yaml to make it happen
that's fair, yeah. i just fear that "you need a vae" is cargo-cult thought at this point and custom vae settings should not be front and center like they are with compvis, but rather an "advanced" override setting, as you say.
for sure
I definitely think people hype themselves into thinking certain things add "exceptional quality"
I feel like it's every day that someone posts asking if we support [obscure feature] that unlocks exceptional quality heretofore unseen in the world of stable diffusion
and then I find that [obscure feature] is effectively [mangled bad ugly hand garbage]
idk if you saw that research paper published recently that basically said "everybody in diffusion research thinks diffusers need gaussian noise but hey look they don't"
sorry keturn, wrong channel hah
not intended audience, but i did see that
even the ldm ckpt files have an embedded vae, right? otherwise the conversion script wouldn't have to convert it: https://github.com/invoke-ai/InvokeAI/blob/ee1b9654489494b380ef5da5e9dc990ea1e6e540/ldm/invoke/ckpt_to_diffuser.py#L892
so if a certain model recommends always using a particular VAE, it should be that model's default VAE.
there will occasionally be times like when stability put out the new 1.x MSE-finetuned VAE, but tbh I don't know how common that will be.
"override" as an "advanced configuration" matches how I think of it.
and in the case of things like the MSE-finetuned VAE, you probably want to use it for all models that had previously been using old-VAE.
In which case it might even make more sense to have a global "load VAE" button be entirely separate from the "change model" button, instead of having to reconfigure all your models.
but that is well outside the scope of what we need for now.
@rose sentinel would it be possible to update the add_model code to remove a key if the new supplied version with clobber doesnt have it
right now theres an issue if i add a vae to a diffuser model, it creates that key
but lets say i edit it from the model and remove it from the ui
even though its passed back as no key
the object is not updated coz the key exists and because its not being overwritten it still stays
this is an issue because I am unable to supply null as value for the vae
on local paths, it is failing
one of the two needs to be fixed
either a vae path set to null should not try to load a vae from that path, or the add_model functionality should be updated.
I think doing the 2nd is better option
if we can restrict the vae entry slot from being used if the value is null or '' .. then i can just ensure that all model entries have all keys .. this way the model always tries to load the default vae within the model folder and look for a custom vae only when specifically supplied.
which is a lot safer and simpler way to assign the config rather than selectively having different keys for different models and then dangling them between the backend code that deals with it and the frontend objects that are being supplied.
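A minimal sketch of the "every entry has every key, and an empty vae slot is ignored" idea, in Python. The key list and both function names are assumptions made for illustration; they are not taken from the InvokeAI code base, and the real models.yaml schema may differ.

```python
from typing import Any, Dict, Optional

# Hypothetical key set for a diffusers entry; the real schema may differ.
DIFFUSERS_KEYS = ("description", "format", "repo_id", "path", "vae")


def normalize_entry(entry: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the entry in which every expected key is present.

    Keys the caller did not supply are filled with None; extra keys are kept.
    """
    normalized: Dict[str, Any] = {key: None for key in DIFFUSERS_KEYS}
    normalized.update(entry)
    return normalized


def custom_vae_path(entry: Dict[str, Any]) -> Optional[str]:
    """Return the custom VAE path, or None when the slot is null, '' or blank.

    A None result means "use the VAE that ships inside the model folder".
    """
    vae = entry.get("vae")
    if vae is None or not str(vae).strip():
        return None
    return str(vae)
```

With this shape the front end can always send the same form fields, and the loader only looks for a custom VAE when `custom_vae_path` returns a string.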
Saw that. Very good read.
Yes, I've already done that (removing keys not provided) on my hf-models branch. Will finish and merge late tonight (busy day!)
awesome thanks. ill continue the work on the model manager after that
its mostly in place.. just need to run it up to make sure all edge cases are handled
speaking of branching strategies...
now that lstein has committed us to keeping ckpt compatibility for this release, that might enable us to merge this sooner rather than later.
i.e. you have to be comfortable saying the backwards-compatible stuff works, but we can be a little looser with the new stuff if it's an option instead of the only path.
When we hit RC stage, I am happy testing the hell out of it (assuming it's not next weekend).
I did a little dinking around with outpainting to look at seams, but mostly I'm stuck in that place of not really understanding how it's supposed to work with patchmatch in the first place.
I'm open to this if we think we've got all the hairiest regressions fixed
Does embiggen still work?
well, if seams are buggy in the new code, presumably embiggen (which relies heavily on seams) is going to feel that
or does it not? does it do its own merging? 🤷 I don't know anything
basically the signal from the infill methods seem so strong that I don't know how to avoid giving her this tower of hair
In main, it's in ldm/invoke/generator/embiggen.py - it rolls its own.
It uses PIL's Image to do everything.
Did you use a fixed seed for these? That might give some more insight, if not...
oh, embiggen is built on top of img2img, not built on top of inpaint. and the seam stuff is in inpaint. right.
Just want to see if anyone's tested it.
The inpainting code simply does too much when processing the inpaint model. When used with ckpt models, the code path leads through omnibus rather than inpaint. I'll try to recreate this behaviour for diffusers sometime after we merge.
yeah, there were a number of places in the code where it said "this can't be done with the inpainting model" and I was like "Oh really? That sounds like a challenge."
so it's different than the ckpt_generator code because it doesn't try to avoid doing those things.
I think it is the right decision. It lets us keep the momentum we've built up with users.
Are you using a very high strength when outpainting? Greater than 0.9 is basically required.
(sorry, been having trouble keeping up with the chats lately)
I don't think we release until all the major kinks are ironed out and diffusers works
But I think merging to main and wrapping up bug fixing and RC there would be reasonable
I certainly agree that we don't release until everything is working but I'd like to get the merge done and a RC out soon. I think we're close.
Turned on debugging images... The init_filled means this is the result of patchmatch that's being sent to img2img, right?
are these outputs from the dev/diffusers branch? Which model?
sorry, I'm still a beginner how can I recognize which model?
I can send a screenshot
Oh, if this isn't from a development branch, you can ask in #1020695341303074846
sure, thank you
That almost looks like tiles rather than patchmatch...
naw tiles would look like mixed up tiles
this definitely looks like patchmatch
yeah, this is tiled.
(which yah, underperforms on people... the inpainting model is definitely the best model for people)
That looks normal, is there a problem with her?
okay, ship it! 
My wife is a redhead and I can confirm that she looks just like that.
so do we need an option for another infill method that's just like "middle grey" or something? because there is no "None" option for infill on the front-end.
naw, without infill (with the regular models at least) inpainting doesn't really do anything
e.g. if you just fill with a solid color, it will strongly skew the results toward a solid blob of color
the omnibus.py code path (that the runwayml inpaint model uses) doesn't infill though
so we need that option for the infill model?
I probably threw away the code paths that called omnibus
because I didn't understand why there should be one implementation in the 'bus in addition to the other implementations in the other Generator subclasses.
yah generate.py did that
Found a little UI bug and not sure if it's in diffusers... switch high-res optimization on in txt2img, go to img2img, go back to txt2img and it's reset to off.
I don't understand the differences either. And maybe diffusers removes any differences anyway
omnibus is tied in to this question about sampler.uses_inpainting_model: https://github.com/invoke-ai/InvokeAI/pull/1583#issuecomment-1368339238
There is one place left that uses it: https://github.com/invoke-ai/InvokeAI/blob/e294fcabeb88f8b56b7b9d9fc57a1e6ac5b72de0/ldm/generate.py#L715-L724
I'm not sure what that's about, as I don't think anything in this branch uses "omnibus" anymore. But we have these parallel generator modules for the legacy model support, so maybe it's something we can't get rid of until we drop that?
Yah, if the runwayml model is the one being used, omnibus.py is used instead of any of the other code paths
I've very minimally looked in that code (just to add color correction and original image paste-over after inpainting)
(if you don't do the paste-over, you get subtle, small changes to the original image, which doesn't look good when inpainting with a portion of the image selected in the bounding box)
I haven't reproduced the smeary seam during all this either 
though I've been turning down the seam size from the default 96, cuz that's, like, almost the entire area I'm outpainting
ah, yah 96 size with 16 blur produced the best results in my testing. I outpainted by at least 128px though on a 512px image
Do not worry about putting omnibus.py back into the code path. It should be retired. However, the extra stuff that @clear hinge does to support outpainting in inpainting.py should be modified when the inpainting model is in use, because it degrades the results. I'll take care of that when the dust settles.
Diffusers Team: I have just merged in the code that changes the organization of the models directory. As described earlier, all the huggingface models are now organized in the same way that the huggingface cache does it, so that advanced users can share models by setting the HF_HOME environment variable appropriately.
I wrote a small routine that checks the models directory on startup and reshuffles the positions of the huggingface models on a one-time basis. If everything goes as planned, everything will continue to load after this without a hitch. If I did anything wrong, you will see previously-loaded models try to load again; let me know if this happens.
I also made the fix to ModelManager.add_model() that @rugged moth requested earlier today. If a value is missing from the edited model configuration information passed by the front end, then it will be deleted from the configuration file rather than merged. There is a check that all the required fields are present, so it should be ok?
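The replace-rather-than-merge behaviour described above can be sketched roughly as follows. This is a standalone illustration, not the real `ModelManager.add_model()`; the function name, the plain-dict config, and the required-field list are all assumptions.

```python
from typing import Any, Dict

# Hypothetical list of required fields; the real check may differ.
REQUIRED_FIELDS = ("format", "description")


def add_model_sketch(
    config: Dict[str, Dict[str, Any]],
    name: str,
    attributes: Dict[str, Any],
    clobber: bool = False,
) -> None:
    """Store attributes under name, replacing any existing entry wholesale.

    Because the old entry is replaced instead of merged, a key that the
    caller omits (for example 'vae') disappears from the stored entry.
    """
    missing = [field for field in REQUIRED_FIELDS if field not in attributes]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    if name in config and not clobber:
        raise KeyError(f"model {name!r} already exists; pass clobber=True")
    config[name] = dict(attributes)  # copy so later caller edits do not leak in
```

The design choice here is that the incoming attributes are treated as the complete new state of the entry, which is what lets the front end remove a vae by simply not sending it.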
sorry i tried this
but it's still not replacing but rather merging, and now it also adds a new key called merge:false
"so that advanced users can share models by setting the HF_HOME environment variable appropriately"
does this mean that Invoke respects the value of HF_HOME and will load models from ~/.cache/huggingface if it's set to point there? or is it the opposite, that users are expected to set HF_HOME to ~/invokeai/models?
fwiw, i do not think Invoke should override hf's default behaviour of putting files it auto-downloads into ~/.cache by default
Looks like I got the merge syntax wrong. I've committed a proposed fix. Try it now.
If HF_HOME is not set, then invokeai downloads to and uses its models from ~/invokeai/models, while all other huggingface API clients continue to use ~/.cache/huggingface (unless they have also changed the cache directory). If the user sets HF_HOME, then both InvokeAI and all other huggingface API clients will use whatever path is contained in the environment variable as the common cache directory. To use the default huggingface cache, the user would have to do export HF_HOME=~/.cache/huggingface or the Windows setx equivalent.
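The lookup rule described in this message could be sketched like this. The function name and the way the InvokeAI root is passed in are assumptions for illustration; only the HF_HOME variable and the two default locations come from the discussion.

```python
import os
from pathlib import Path
from typing import Mapping


def resolve_cache_dir(env: Mapping[str, str], invokeai_root: Path) -> Path:
    """Pick the model cache directory following the rule described above.

    If HF_HOME is set and non-empty, use it as the shared cache.
    Otherwise fall back to the 'models' directory under the InvokeAI root.
    """
    hf_home = env.get("HF_HOME")
    if hf_home:
        return Path(hf_home).expanduser()
    return invokeai_root / "models"


if __name__ == "__main__":
    # Example: resolve against the real environment and ~/invokeai.
    print(resolve_cache_dir(os.environ, Path.home() / "invokeai"))
```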
The tradeoff, which we've discussed earlier on this channel, is whether to prioritize sharing over transparency. @worldly cloak and I worry that newbies will have difficulty finding the big model files contained in the hidden .cache directory and will wonder why their home directory's free disk space is getting smaller. This was the reason we thought it best to have all the downloaded models go into invokeai unless the user explicitly changes the behavior.
I think there's no perfect solution to this, and maybe it should be an installer option so that the user makes the choice.
The other big space management issue that @rugged moth talked about the other day is that the diffusers models share a lot of common files, including the NSFW checker (~1 G), the tokenizer, the feature_extractor and (usually) the vae. We can save a huge amount of space by consolidating these files with some sort of deduplication system.
This seems like an opportunity to work w/ @forest spade & team to figure out how this should be handled broadly speaking. It seems like diffusers model format is not really designed with the assumption "There will be many/multiple models" - as otherwise, I can't imagine why there wouldn't already be some form of broader standard on how to dedupe
Yeah.
What would be nice is if a model_index.json could cross-reference to other repos, but I haven't seen any indication of that.
Come to think of it, I don't think I've seen any documentation on model_index.json at all.
wait, did SD 2.x give up on shipping a safety checker? https://huggingface.co/stabilityai/stable-diffusion-2-1/blob/main/model_index.json
and it looks like the SD model repos don't have their own weights for feature_extractor, they just specify it's CLIPFeatureExtractor and let the transformers library get the model for that.
but they do ship weights for the text encoder, which is a little weird, cuz I thought SD just used a frozen https://huggingface.co/laion/CLIP-ViT-H-14-laion2B-s32B-b79K
ahh, i see. thanks for the explanation, i can see the tradeoffs and i guess i agree this is the best solution
i think the feature extractor is a minimal piece of code without a big blob of weights or anything
I agree. A more central solution from the diffusers lib would be handy in this case instead of us hacking our way. Coz every time something changes in their architecture, we'll need to make changes too otherwise which may or may not be easy changes.
A non-breaking solution that will work with the current diffusers layout would look something like this:
- Traverse the directory in which the models are stored.
- Compute a checksum for each file encountered; cache the mapping from checksum to file
- If checksum has been seen before, remove the redundant file and replace with a hard link to the original file with that checksum
This will create a directory structure that is identical to what we currently have, with the exception that multiple files will point to the same inodes. Each group of identical files will only take up the disk space needed for one. Because hard links are being used rather than symbolic ones, the user can move, delete or rename a file without leaving dangling links. One disadvantage is that if the user moves the directory structure to another file system, the space savings are lost. Another disadvantage is that this procedure needs to be run at periodic intervals, such as after a model is added or deleted.
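A rough sketch of the traverse / checksum / hard-link procedure outlined above. The function names are made up for illustration, SHA-256 is an assumed choice of checksum, and the code assumes everything under the root lives on one file system that supports hard links.

```python
import hashlib
import os
from pathlib import Path
from typing import Dict


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Checksum a file in chunks so large weight files are not read at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def dedupe_with_hardlinks(root: Path) -> int:
    """Replace duplicate files under root with hard links to the first copy.

    Returns the number of files that were replaced by links.
    """
    seen: Dict[str, Path] = {}  # checksum -> first file seen with that checksum
    replaced = 0
    for path in sorted(root.rglob("*")):
        if path.is_symlink() or not path.is_file():
            continue
        checksum = file_sha256(path)
        original = seen.get(checksum)
        if original is None:
            seen[checksum] = path
        elif not os.path.samefile(original, path):
            # Link under a temporary name first, then swap it into place,
            # so the duplicate is never missing if something fails midway.
            tmp = path.with_name(path.name + ".dedupe-tmp")
            os.link(original, tmp)
            os.replace(tmp, path)
            replaced += 1
    return replaced
```

Running it a second time is harmless: files that already share an inode are detected with `os.path.samefile` and skipped.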
By the way, I've got the diffuser-based textual inversion front end up and running. I'm doing my first TI training and so far it seems to be working as expected. The UI (for what it's worth) looks like this:
@rose sentinel I've been thinking about the diffusers model format in the yaml. I think you need to setup the structure so it works perfectly on the CLI. That way I don't have to do intermediary python code in the web_server file. Coz if at any time in the future, the model manager code changes, then that'll break the frontend rather than being in sync.
The reason the regular ckpt code works so well is because it more or less uses the same code as the CLI barring the completer.
Think we need to do the same with the diffusers format too.
If you can give me an example snippet of the data being passed to the add_model for an addition of a diffuser model correctly via the CLI, I can replicate that in JS.
Also side note, I think we should avoid using 'None' / undefined and keyless entries. I think all configs should have all keys irrespective of whether they are used or not. And in cases where the entries for vae paths for example are ' ' or null .. it should not try to load them.
This way we can build a single form structure that wont break no matter what keys are passed and what keys are omitted by the user during changes.
sorry all for the deviation from topic but going to point you to #1062855391878328360 @rose sentinel - looks like an installer regression on windows thats affecting all installs (or at least a large volume)
did we do a p1 patch recently?
My bad. Will fix.
Fixed. I made a trivial fix to the windows bat installer and neglected to go through my full release checklist. I guess there are no shortcuts!
there are indeed, no shortcuts
@dire gazelle I think the directory structure you're seeing is the intended one. See caching discussions in this thread.
thanks! so is models-- correct here: invokeai/models/diffusers/models--runwayml--stable-diffusion-v1-5? I don't have an expected HF directory structure to reference, so that prefix stood out to me. No problem if that's correct, just making sure.
the caching effort is awesome, I think the HF_HOME approach makes sense
but let's avoid hardlinks. That would be very fragile. like others also said, maybe it would be best to wait / work with Huggingface to handle safety checker (et al) deduplication upstream
that said, on a fresh install it looks like there's a bug with the safety checker model handling:
- on the 1st run, it gets downloaded into the pre-2.2.5 directory structure (models/CompVis/...) even though the diffusers path is already there;
- on the 2nd run the application notices that, tries to convert it and fails because the HF-style dir structure is already there:
I can fix that later tonight (morning now, so in ~15h) if no one gets to it first (but in that case, please lets coordinate changes to the configure_invokeai script 'cause these rebases are killing me)
nvm, I unexpectedly had a bit of time and just pushed a fix to generate.py that takes care of the above
(in diffusers)
and models are all found fine, so that answers my questions about the path naming. thanks @worldly cloak
@worldly cloak I think there may be a (new?) bug in the step calculation or in the tqdm display of steps. When I specify a step count of N, the system reports it is performing N*2-1 steps. Here are some examples:
(stable-diffusion-1.5) invoke> a big bowl of jello -s15
100%|████████████████████████████████████████| 29/29 [00:04<00:00, 6.75it/s]
(stable-diffusion-1.5) invoke> a big bowl of jello -s10
100%|████████████████████████████████████████| 19/19 [00:02<00:00, 6.73it/s]
(stable-diffusion-1.5) invoke> a big bowl of jello -s5
100%|████████████████████████████████████████| 9/9 [00:01<00:00, 6.70it/s]
@tardy sparrow Thanks for the work on deferred embedding token injection. It works like a charm. However, I noticed that if you inadvertently trigger an embedding that is not compatible with the currently-loaded model, it raises a fatal exception, so I just now put try: blocks around the places this may happen, so that the user sees an appropriate warning.
@tardy sparrow @heavy glacier I have a design question for you. For embeddings loaded from the huggingface concepts library, we detect and trigger them using the <trigger-token> notation. However, for embeddings that are loaded from the embeddings directory, including ones that are created by local textual inversion training, there is no special syntax, just the raw trigger-token. Aside from the inconsistency, there's the chance that the user could trigger an embedding without knowing why. I ran into this when I made a textual inversion using the term "jello".
Do you think that all trigger tokens should be enclosed in <> characters? The logic would then be to look if the token is already a loaded embedding trigger. If not, the token is checked against the concept library list and potentially downloaded. If neither case applies, then treat it as a normal part of the prompt.
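The three-way lookup proposed in this message might look roughly like the sketch below. The regular expression, the function name, and the idea of passing both lists in as sets of bracketed tokens are assumptions for illustration, not the existing InvokeAI implementation.

```python
import re
from typing import List, Set, Tuple

# Matches <something> with no spaces or nested brackets inside.
TRIGGER_PATTERN = re.compile(r"<[^<>\s]+>")


def classify_triggers(
    prompt: str,
    loaded_triggers: Set[str],
    concept_library: Set[str],
) -> List[Tuple[str, str]]:
    """Classify every <token> in the prompt in the order proposed above.

    'loaded'   -> already a loaded embedding trigger
    'download' -> known to the concepts library, would be fetched
    'plain'    -> neither, so it stays an ordinary part of the prompt
    """
    results: List[Tuple[str, str]] = []
    for token in TRIGGER_PATTERN.findall(prompt):
        if token in loaded_triggers:
            results.append((token, "loaded"))
        elif token in concept_library:
            results.append((token, "download"))
        else:
            results.append((token, "plain"))
    return results
```

Note that a bare word such as jello is never matched by the pattern, so under this convention it could not trigger an embedding by accident.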
The disadvantage of making this change is the obvious one that it breaks previous prompts that relied on bare trigger tokens, and that it might bring down a rain of abuse on our heads.
I actually thought we needed to use <> for embeddings across the board, so didn't realize that...
good catch
hmm. i don't know. is the proposal that on load Invoke would force the token to be <>? so jello.pt would become <jello>? what if it was an embedding like the HF .bin ones that already contains <>?
do users want to be able to still get the original jello feature vector (ie the non-TI one) when they've loaded jello.pt? or do they want the presence of jello.pt to always override jello?
Embeddings are currently <embedding> and I think it makes sense to keep all of them that way.
<jello> style, bowl of jello
in fact that isn't the case - if you put a non HF embedding in the embeddings/ folder it will be loaded as word not <word>
Wait, seriously?
In main, I always use <embedding> to trigger my embedding in embeddings/ that I trained with main.py.
And now I see I'm wasting tokens.
Indeed, until I did my own textual inversion training using the diffuser based training pipeline, I thought that too, but the learned_embedding.bin file produced by the pipeline is triggered by jello and not by <jello>.
Given that, I think they should all be <embedding>.
Even though that's a change, I don't think too many people are using local embeddings.
oh god can you delete that gif sorry
But on second thought, what I should do is to add the <> to the token produced by the diffuser training pipeline, to keep backward compatibility while having a more consistent convention going forward.
right, the issue is an inconsistency of what the data defines - for HF/diffusers-trained embeddings the resulting learned_embeddings.bin contains a trigger_string property. but for the my-embedding.pt style trained by auto111 and others there's no trigger string, so the code just uses the filename
So then I have to use the current embedding I made as embedding1 but going forward I would use <embedding1>?
Seriously though, now that I think it over, we shouldn't mess with the tokens in other people's trained embeddings. I'm just going to quietly add <> around the placeholder tokens produced by InvokeAI's TI training front end.
This is what threw me off: >> Current embedding manager terms: *, <HOI4-Leader>, <princess-knight>
That's because they are HuggingFace-trained placeholders, which follows the <> convention.
That's in the install-your-own section of the docs, though... https://invoke-ai.github.io/InvokeAI/features/CONCEPTS/?h=embedd#installing-your-own-ti-files
I do think it makes sense to use <> as a trigger. Can we inspect the placeholder token to see if it has <>?
Opposite effect. If you've previously used embedding1 then you continue to use embedding1. But if you train your own textual inversion using the script that I recently added to dev/diffusers, then embedding1 has to be triggered using <embedding1>. User will be informed of the fact, and if they enter <embedding1> explicitly, the angle brackets won't be added.
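The "add brackets unless the user already typed them" rule can be sketched in a few lines. The function name is hypothetical and this is not the code in the training front end; it only illustrates the behaviour described above.

```python
def bracket_placeholder(token: str) -> str:
    """Wrap a placeholder token in <> unless it is already wrapped."""
    token = token.strip()
    if not token:
        raise ValueError("placeholder token must not be empty")
    if token.startswith("<") and token.endswith(">"):
        return token  # user typed the brackets explicitly; leave as is
    return f"<{token}>"
```

Only tokens produced by the new training front end would pass through this; embeddings trained elsewhere keep whatever trigger they already have.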
I guess the example in the docs now uses * as-is without <*> syntax, but I always did the brackets because I figured that's what embeddings needed to trigger.
By the way, diffusers TI training creates a 2-4K learned-embeddings.bin file and an 8 Gig diffusers-style model directory complete with the safety checker. At the end of training, the script I wrote moves learned-embeddings.bin into the embeddings directory and then prompts the user whether they want to delete the big directory (default "yes"). Is there any reason at all to keep the big directory?
That docs file needs to be rewritten. I think you have been using up three tokens: '<', '*', and '>'.
Can you resume training using the data there?
Supposedly. I haven't tested that functionality yet. If the training crashes or is interrupted, the folder stays there. It takes checkpoints every 500 steps or so.
Well I've been using my magic token, lfa, but wasting two with < and >.
I think if Invoke will allow for resuming, asking the user if they want to delete it or not makes sense. Or if it has remnants of earlier TI checkpoints, you could use one of those if things look overtrained?
With the current TI trainer, I've definitely tried out different TI checkpoints to see which one works best. It's all vague. But if that situation has improved with diffusers, then past checkpoints become less important.
Maybe it's possible to resume from a checkpoint without having the log files and the full model around. I'm just now seeing that there is a parameter named only_save_embeds which I imagine should be set to True.
I bet this depends on which scheduler you're using.
oh wait here's Patrick now, he can clarify why there might be N*2-1 steps.
This makes a lot of sense! We haven't thought too much about this indeed.
Happy to brainstorm about how we can build a nice system that avoids downloading duplicate model weights!
Some comments:
- SD 2.+ didn't add a safety checker anymore because NSFW was removed from the training data
- We often load the safety checker separately now, exactly because we want to save memory. E.g. see: https://huggingface.co/spaces/darkstorm2150/protogen-web-ui/blob/main/app.py#L28 -> this makes sure the safety checker is never downloaded multiple times.
Having looked at our current code a bit, it's actually not too difficult to make the changes (should be quite easy). Put up an issue/feature request here: https://github.com/huggingface/diffusers/issues/1984 . Would be great if you could give it a look. The nice thing about the "individual components" saving format is that we have a sha256 key for each individual component and can thus quite easily load only parts of the model. Maybe @worldly cloak you could give the issue a look
Which scheduler are you using?
HeunDiscreteScheduler
Checking!
Indeed, the step count matches when I use LMSDiscreteScheduler. (also DPMSolverMultistepScheduler and EulerDiscreteScheduler)
Hmm HeunDiscreteScheduler also works for me on current "main" branch. We might have had a bug in a previous version (what version are you using?).
0.12.0.dev0
If I understand correctly, it's working for lstein, he's just surprised by the step count.
probably because of your double-the-steps technique for heun's second-order stuff
I installed that because I'm getting black images using SD-2.1 without xformers installed.
I mean the step count also works for me, I'm seeing tqdm showing 25 steps when setting num_inference_steps=25
Yes! Heun producing perfectly good images. It's just reporting more steps than I asked for.
wait lemme make a quick google colab
(going offline for a meeting; apologies)
https://colab.research.google.com/drive/1hEml9CQQFGpC1LD-anuNIqc22vQkhx7h?usp=sharing -> shows 25 steps for me with current main
oh, because you tweaked the progress report to not function as an iterator, but instead you manually call progress_bar.update every scheduler.order steps. blugh
Yes, it's not the prettiest design at the moment, I must confess
lstein, I guess we could do that if it's important to show progress per "step" instead of per actual unet.forward.
but I'd rather not if it's all the same to you.
If it makes too much dissonance seeing an unexpected 2*N-1 there, we could normalize everything to percentages.
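For reference, the arithmetic behind the numbers reported earlier (-s15 showing 29, -s10 showing 19, -s5 showing 9) and the percentage idea can be sketched as below. The helper names are hypothetical; the 2*N-1 count is simply what was observed in this thread for the second-order Heun scheduler, not a statement about every scheduler.

```python
def unet_calls(num_inference_steps: int, order: int = 1) -> int:
    """Number of progress ticks seen in this thread for a given step count.

    order=1 (e.g. LMS, Euler): one tick per requested step.
    order=2 (Heun, as observed above): 2*N - 1 ticks.
    """
    if num_inference_steps < 1:
        raise ValueError("num_inference_steps must be at least 1")
    if order == 1:
        return num_inference_steps
    if order == 2:
        return 2 * num_inference_steps - 1
    raise ValueError("only order 1 and order 2 are covered by this sketch")


def percent_done(calls_done: int, total_calls: int) -> float:
    """Progress as a percentage, independent of how many ticks a step takes."""
    return 100.0 * calls_done / total_calls
```

Reporting `percent_done` instead of the raw tick count would hide the difference between first-order and second-order schedulers from the user.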
Hello! Is it possible to enable xformers? I built it from source within venv but it doesn't seem to be used.
wow. no, there is no reason. there ought to be an option to not save that - the only valuable data is the 2-4k learned-embeddings.bin file.
if the option isn't there i'd highly recommend hacking up the code to just not save that 8Gb folder. because there's zero need for it.
does anyone have a cuda wheel for xformers on windows python 3.10 and cuda 11.6?
And you're using the dev/diffusers branch of InvokeAI? It should be used for any generations on diffusers-format models.
Won't apply to the legacy ldm ckpt models.
Okay that explains it, thanks for reply
There is indeed an option not to create the big model. I just hadn't seen it.
Looks like there's one here: https://github.com/facebookresearch/xformers/actions/runs/3883052105
These arent built with CUDA afaik
oh wait they might be. wait let me check if they have my version
nope
windows-2019-py3.10-torch1.13.1+cu117
need CU 116
is there an archive of older wheels?
had to do a hackjob but got it working. upgraded the pytorch installation on invokeai to cu117 and installed the wheel from above.
but the wheel places cpp_lib.json in the wrong folder. Move it to the right folder and it'll begin to work
Triton doesnt work on windows
im guessing that needs another build
Here is the link to cu116 for windows https://github.com/facebookresearch/xformers/suites/10282984753/artifacts/505290873
I think you can go through their actions to find older versions
Sounds like we're on the cusp! Anyone want to share thoughts on how we actually start getting this ready for release?
is there a reason to stick with CUDA 11.6 instead of upping to 11.7?
The only true blocker that I'd like to see resolved before merge into main is the dependency on xformers to generate images with SD-2.1. Any progress on this?
Other issues that can be addressed after the merge are:
- Regression in the inpaint/outpaint quality (Kyle and Keturn)
- Diffusers model merge script (Lincoln)
Also maybe some notes about the migration from 2.2.5 to *?
e.g. Can I just do a git pull and update pip via requirements.txt and launch it or do I have to do more setup work?
Yeah, we need release notes too, although migration should be pretty easy. I tried to make it happen automatically.
You can just do a git pull and pip install -r requirements.txt
Nice.
Ive been using it for a few hours .. so far no issues
We are indeed upgrading to 11.7 afaik (unless anyone has a specific reason to hold back). That's already implemented in both my dev/installer work and mauwii's pyproject.toml migration. But it's not yet the case in main or dev/diffusers. It's an easy change to make (just requirements-*-cuda.txt), but we just need to make sure that all contributors actually upgrade to the new version, to avoid any surprises (though to be fair, I really don't think there are any surprises to expect).
will xformers work on non-RTX cards? or more importantly, will torch still work on non-RTX cards when xformers is installed?
"dependency on xformers to generate images with SD-2.1"
Is that why i've been only getting black outputs with SD-2.1 on dev/diffusers? I chalked it up to something being wrong with my setup and didn't want to get distracted by troubleshooting
thats the working theory
I'm pretty sure of it.
I vaguely recall doing something like pip3 install xformers==0.0.16rc390 triton torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu117 for another project.
non-RTX? yes. non-CUDA? better to leave xformers out entirely if you don't have CUDA, I think.
I'm confused. It's a requirement for CUDA but should be left out if you're not using CUDA?
oh, the SD 2.1 thing is a blocker to merge? I was hoping not, on account of it not being a regression.
Well, let's discuss. Support for 2.1 is one of the big features that we were going to use as a selling point for the switch to diffusers. Right now we have partial support, and I was thinking this would be a major issue. Happy to get other opinions on this, as I am eager to get the merge in.
Is it related to this issue: https://github.com/huggingface/diffusers/issues/1614
?
Not clear from the comments whether autocast is actually the culprit.
indeed. I'm reasonably sure we don't have autocast in our implementation, but those later comments say it's still broken even without autocast.
from the project management perspective: I've been advocating all along that we treat "migrate existing functionality" and "add new functionality" as separate stages.
We can decide we don't want to make a public release without both of them, but I don't think it's necessary to force both of them in to the same PR.
from the "what will it take" perspective: since we know turning on xformers changes the behavior, we know it's something to do with the attention code.
which I take to mean it will involve going into the attention code of diffusers 0.12.dev0 (still no diffusers release this week), spelunking around in there, and probably converting @tardy sparrow's monkey-patched attention code to the new API while we're at it, otherwise it'll be that much harder to keep track of what is happening.
Fair enough. @heavy glacier, @rugged moth what do you think? Should we go ahead and merge dev/diffusers into main, knowing that using SD-2.1 without xformers gives black images? Sounds like it'll be several days to a week before this issue is solved. If the answer is yes, then I need a few hours to add a new document to warn people about the changes and what to expect, and can do the merge Saturday, barring any merging conflicts.
Seems fine for the most part. We can address the model manager and any other issues that pop up directly on main
that way we dont have divergent branches
the longer diffusers stays on its own, the less the work can get done on main
and we wont be releasing officially until the bugs are fixed
so its fine
we'll need to address the regression with outpainting / inpainting too
ive gotten xformers to work on windows with a custom build but its missing triton
I think triton needs to be built separately with windows .. (not even sure if thats possible) need to look into it
But yeah, I had to upgrade to Cuda 11.7 so i can use the wheel provided by xformers repo
Excellent! Unfortunately I'm on a ROCm system without xformers support, but am getting an Nvidia video card this weekend.
Sounds great. Would make the testing easier.
Which card did u get your hands on? 3x or 4x?
As an aside, I did get the diffusers two-way model merge working, but three-way merging is still problematic.
Thats good. We can sort that out too as we go ahead i guess
Says it's an RTX A2000. I hope it's adequate.
I'm more worried about the crashes I don't have the hardware to reproduce, like what @wheat fiber was having on M1.
Yeah.
I think Merge =/= Release
and 12gb would let you run dreambooth with optimizations
I wanted to get something typical of what our users have.
We should merge as soon as everyone is comfortable doing so (i.e., nobody thinks anything existing is fundamentally broken/regressed)
But I don't think we should release w/o 2.1
are we sure the image quality degradation is not to do with diffusers? on the inpainting / outpainting i mean?
whats the issue with 2.1? images are black with xformers?
yeah
Ok, sounds like we're on the same page. I'll attempt a merge. I've just been cautious about the users who are keeping up with the bleeding edge.
(re 2 - not sure on first q)
gimme a min .. let me check if thats the case with 2.1 on windows too
coz i am 100% certain i generated proper images without xformers on windows
cant tell you how much faster the boot up is now
i hated waiting for ckpt reloads
during development
In terms of image quality regressions, I'm aware of (1) black images when using SD-2.1 without xformers; and (2) outpainting has bad seam issues.
wait 2.1 has black images without xformers?
Very recently I've also noticed that images produced with some of the diffusers schedulers, in particular diffusers.HeunDiscreteScheduler, don't seem to be as good with the diffusers version of SD-1.5 as they were with the checkpoint version. However, I haven't done a systematic comparison. Is this the regression that @rugged moth was just talking about?
outpainting and inpainting has serious issues
@heavy glacier and I have both experienced this, and @worldly cloak confirmed.
let me load up without xformers
is there a command for it?
Indeed!
or should uninstall the package?
I'm not sure how you turn it off, because I can't use xformers on my platform.
there's not currently a --disable-xformers option. uninstall or comment out a line in diffusers_pipeline.py
I'm willing to live with the outpainting/inpainting issues and fix on main. Have to fix before release, though.
Inpainting seems OK to me (though I've not got a discerning eye). Outpainting is seriously ill.
yep 2.1 black images without xformers
which is weird
coz i definitely had it working before
wait let me test something
2.1 works on CPU, which is also weird.
I'm getting the impression that "black image" means "you hit some pytorch bug when using a view or non-contiguous memory or something and so you just got zeros back for some operation and good luck finding out where."
I'm thinking I need a wrapper method that does the "is this result all zeros?" check as a troubleshooting aid.
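A minimal sketch of that kind of wrapper, assuming results can be treated as nested lists of numbers. The names `is_all_zeros`, `warn_if_all_zeros` and `broken_attention` are made up for illustration and are not part of InvokeAI or diffusers.

```python
import functools
from typing import Any, Callable, Iterable


def _flatten(value: Any) -> Iterable[Any]:
    """Yield every leaf value inside arbitrarily nested lists/tuples."""
    if isinstance(value, (list, tuple)):
        for item in value:
            yield from _flatten(item)
    else:
        yield value


def is_all_zeros(value: Any) -> bool:
    """True when the value holds at least one number and all of them are zero."""
    numbers = list(_flatten(value))
    return bool(numbers) and all(n == 0 for n in numbers)


def warn_if_all_zeros(check: Callable[[Any], bool] = is_all_zeros):
    """Decorator factory: report which wrapped call returned nothing but zeros."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            if check(result):
                print(f"WARNING: {func.__name__} returned all zeros")
            return result
        return wrapper
    return decorator


@warn_if_all_zeros()
def broken_attention(rows: int, cols: int):
    # Hypothetical stand-in for an operation that silently yields zeros.
    return [[0.0] * cols for _ in range(rows)]
```

For torch tensors the same decorator could be given a different predicate, for example `check=lambda t: bool((t == 0).all())`. Decorating a few attention functions this way would at least narrow down which call first goes black.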
If it helps at all, I sometimes see a single frame of the latent diffusion noise before it goes black in the web preview mode.
Actually, I guess this doesn't help at all, because the latent noise is written before the first scheduler iteration occurs. NVM
found the black image bug @rose sentinel
its related to float16
changing to full precision works fine
now need to see what is breaking it
being somehow related to precision is consistent with that autocast-related bug report
apparently related to 2.1 768 model itself -- the 512 one runs fine. It's just the 768 one that needs to be full precision
because of their attention module native 768 runs on FP rather than half
the reason it works with xformers is coz xformers forces Half on the attentioning
@forest spade Sorry for the beep but I noticed your comment here that upgrading to 0.10 would fix the black image issue on using the 768 model with half precision but that does not seem to be the case. 768 model only seems to work on full precision. Would you know of a bypass for this? https://huggingface.co/stabilityai/stable-diffusion-2-1/discussions/9
Makes sense. Nice detective work.
if theres no bypass for this without xformers, then we just warn the user and load this particular model in FP
need to check if the depth and upscale model work fine
Doubles the memory requirements! How much VRAM will that use?
6.x at peak 5.x for gen
trying to figure out where the difference is. I see the upcast_attention here seems to be unique to the 768 model: https://huggingface.co/stabilityai/stable-diffusion-2-1/blob/main/unet/config.json
and patrick added something here: https://github.com/huggingface/diffusers/blob/c891330f7966fae4aec27962e243a1432d0dc6b5/src/diffusers/models/attention.py#L600
which we technically should have
@worldly cloak do we know why the first frame is grey when we start invoking?
On a mac? I know there's an issue with the first generation on MPS with diffusers. The usual solution is to do a single-step "warm-up" generation on startup.
no on windows
the very first latent image that is received back is always grey
but im guessing thats just the grey before the noise is filled in
AFAIK, that issue is only supposed to affect MPS, and my understanding is that it only produces different results, meaning a given seed will produce a different image if it's the first generation after startup. Could be that the fix for it would solve your problem as well, though.
If I'm building dev/diffusers from repo on windows is there some additional steps to setup xformers if it is not currently installed?
yes, but we're currently working those out. lol
(so that you dont have to take any extra steps!)
@worldly cloak any thoughts on how i can avoid this torch.Size([79]) bug without modifying prompts?
trying to do promptcraft and am on the diffusers branch and... well, would feel odd if I modified someones prompt to work
oh geez. um. probably nothing I am in any condition to come up with this evening.
so I'd say either switch branches (or do you only need to switch back to the ldm ckpt model?), or manually chop tokens off the end
chopping tokens off the end isnt doing much - i think this has to do with weighting and less about truncation logic
I will see what I can do to switch branches and not have everything implode!
Thoughts and prayers
what bug is this? is there a github issue?
i've been looking into converting the ckpt -> diffusers converting script for on-the-fly loading and actually it seems pretty straightforward?
well yeah that seems to work (ignore the crap quality, this is 2 steps of DDPM2++)
@forest spade above is the result of adapting the convert_original_stable_diffusion_to_diffusers.py script to an import-able module that returns the converted pipeline object rather than saving it to disk - should i make a pull request? code is attached
If you've got a longer prompt with weights on it, get a torch size 79 v expected 77 error
And no I've only posted here
Just curiosity, how close we are from accepting models from sd 2?
very
Nice 🥰
@rose sentinel see above - i've basically got a way of loading .ckpt directly as StableDiffusionPipeline objects without having to save any files on disk. i took a quick look at the model manager but it seemed to be a non-trivial task to inject this in, either as an option or as a default loading mechanism for .ckpt files.
that's very good news!
I'm guessing the loading time is pretty close to what ckpt is normally?
yeah, pretty much
the first attempt has to download a copy of the CLIP model and a couple of other files to hf's cache, but that is shared for all subsequent loads
It would fit very well into the conversion module we've already got. Away from my computer at the moment but it is the backend to the !optimize command
already exists``` got this after a ** Legacy version <= 2.2.5 model directory layout detected. Reorganizing.
** This is a quick one-time operation.
DEBUG: Moving X:\stablediffusion\InvokeAI-files\models\CompVis\stable-diffusion-safety-checker\models--CompVis--stable-diffusion-safety-checker into hub
this might be from switching back/forth to main though
removing /hub/ seems to have worked.
maybe an edge case.
Traceback (most recent call last):
File "c:\users\kentr\invokeai\backend\invoke_ai_web_server.py", line 1216, in generate_images
self.generate.prompt2image(
File "c:\users\kentr\invokeai\ldm\generate.py", line 468, in prompt2image
uc, c, extra_conditioning_info = get_uc_and_c_and_ec(
File "c:\users\kentr\invokeai\ldm\invoke\conditioning.py", line 30, in get_uc_and_c_and_ec
conditioning = _get_conditioning_for_prompt(prompt, negative_prompt, model, log_tokens)
File "c:\users\kentr\invokeai\ldm\invoke\conditioning.py", line 106, in _get_conditioning_for_prompt
conditioning, cac_args = _get_conditioning_for_cross_attention_control(model, parsed_prompt, log_tokens)
File "c:\users\kentr\invokeai\ldm\invoke\conditioning.py", line 201, in _get_conditioning_for_cross_attention_control
edited_embeddings, edited_tokens = _get_embeddings_and_tokens_for_prompt(model,
File "c:\users\kentr\invokeai\ldm\invoke\conditioning.py", line 237, in _get_embeddings_and_tokens_for_prompt
embeddings, tokens = model.get_learned_conditioning([fragments], return_tokens=True, fragment_weights=[weights])
File "X:\anaconda\envs\invokeai\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "c:\users\kentr\invokeai\ldm\invoke\generator\diffusers_pipeline.py", line 601, in get_learned_conditioning
return self.prompt_fragments_to_embeddings_converter.get_embeddings_for_weighted_prompt_fragments(
File "c:\users\kentr\invokeai\ldm\modules\prompt_to_embeddings_converter.py", line 56, in get_embeddings_for_weighted_prompt_fragments
base_embedding = self.build_weighted_embedding_tensor(tokens, per_token_weights)
File "c:\users\kentr\invokeai\ldm\modules\prompt_to_embeddings_converter.py", line 223, in build_weighted_embedding_tensor
raise ValueError(f"token_ids has shape {token_ids.shape} - expected [{self.max_length}]")
ValueError: token_ids has shape torch.Size([79]) - expected [77]
Thats the full error
seems like something is shaping it wrong (at 79) regardless of how many extra tokens there are, when weights &/or cross attention syntax are being used
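For illustration only: 79 is two more than 77, which would be consistent with BOS/EOS markers being added to a sequence that was already full length. That is a guess, not a diagnosis. A sketch of fitting weighted tokens to a fixed length, assuming the CLIP constants below and a hypothetical helper name `fit_tokens` (this is not the code in prompt_to_embeddings_converter.py):

```python
from typing import List, Sequence, Tuple

# Assumed CLIP tokenizer constants for SD 1.x; check them against the tokenizer in use.
BOS_ID = 49406
EOS_ID = 49407
MAX_LENGTH = 77


def fit_tokens(
    body_ids: Sequence[int],
    body_weights: Sequence[float],
    max_length: int = MAX_LENGTH,
    bos: int = BOS_ID,
    eos: int = EOS_ID,
    pad: int = EOS_ID,
) -> Tuple[List[int], List[float]]:
    """Wrap prompt tokens in BOS/EOS and return exactly max_length ids and weights."""
    if len(body_ids) != len(body_weights):
        raise ValueError("ids and weights must have the same length")
    room = max_length - 2  # reserve two slots for BOS and EOS *before* adding them
    ids = [bos] + list(body_ids[:room]) + [eos]
    weights = [1.0] + list(body_weights[:room]) + [1.0]
    padding = max_length - len(ids)
    ids += [pad] * padding
    weights += [1.0] * padding
    return ids, weights
```

The point of the sketch is the order of operations: truncate to `max_length - 2` first, then add the two markers, so the result can never come out at 79.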
@gusty hound I think the time has come to disable the conda tests. They are failing on dev/diffusers, and I would like to get the merge into main done. I assume you're in favor of this!
@rose sentinel Is the TI in the diffusers branch a complete rewrite of what's there, are you using some code right out of diffusers, or is it something else?
It is a slight adaptation of the code in diffusers.
opened a PR to diffusers which is removing the conda workflow, I also disabled it in the main branch protection
I appreciate it.
Very curious to try it to see how it compares to the old one in Invoke. I can't seem to get some particular styles to train properly which may absolutely be user error.
thank you, i just pushed a fix
Please, this. I have a custom script that uses this, and it can generate larger images, significantly faster, and with less memory pressure than invoke.
that's interesting - invoke already does its own slicing, maybe we should benchmark switching between it and the diffusers slicing. id need to modify some of the internals of how .swap() works, tho i'm going to have to do that anyway to support the changes incoming in diffusers 0.12
the trouble with slicing is that invoke's slicing already works extraordinarily well for cuda, and id worry that turning to diffusers implementation would negate that
My diffusers script is also running in fp16 mode, which I can't currently do in invoke (haven't gotten the diffusers branch running yet). That probably explains most of if not all of the performance difference. That script also runs better with the attention slicing setting enabled than without, though.
Once the RC is out, I'll compare performance with my minimal script.
ooh. how do you do fp16? does that need torch > 1.13?
I'm not sure... how do I check my version? I just had to add torch_dtype=torch.float16, when setting up my pipeline, and it just worked...
Actually, it probably is a greater version than that... I think the setup instructions had me install a nightly...
looks like it was 1.13.
It's pytorch 1.13.1
great, thanks
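On the "how do I check my version?" question, a small sketch using only the standard library. It reads installed package metadata, so it works without importing torch itself; the helper name `installed_version` is made up here.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the packages discussed in this thread.
for name in ("torch", "diffusers", "xformers"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

Inside a Python session, `torch.__version__` gives the same answer for torch, including a local suffix such as `+cu116` when a CUDA wheel is installed.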
Trying the new requirements.txt...
Downloading https://download.pytorch.org/whl/cu116/torch-1.13.1%2Bcu116-cp310-cp310-linux_x86_64.whl (1977.9 MB)```
That doesn't seem right to me.
To have it posted where it belongs:
I have some trouble with the new diffusers main branch on my M1:
- default model is set to 2.1 which is not working
- inpainting is just giving me strange results
- k_heun does double the amount of configured steps
I had to laugh when I downloaded this on a windows machine and saw the size, it's like 10% the size on MacOS
Really huge!
But I guess it has CUDA + CPU support?
I assume the cu116 means it does work on CUDA as well, otherwise this isn't going to work for me...
"2.1 not working" - Any more specifics?
Inpainting strange results - Can you post a picture of whats strange?
- K_heun, i believe, is just displaying its "process step count", for lack of a better term, which is ~2n (where n is what you configured) of the actual 'steps'
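Why a Heun-type sampler shows roughly twice the configured steps can be seen in a toy integrator: Heun's method evaluates the function twice per step, once for the predictor and once for the corrector. This is a generic ODE sketch, not the k_heun or HeunDiscreteScheduler code.

```python
import math


def heun_integrate(f, y0: float, t0: float, t1: float, steps: int):
    """Integrate dy/dt = f(t, y) with Heun's method; return (y, evaluations)."""
    h = (t1 - t0) / steps
    t, y, evals = t0, y0, 0
    for _ in range(steps):
        k1 = f(t, y)               # predictor slope (Euler step)
        evals += 1
        k2 = f(t + h, y + h * k1)  # corrector slope at the predicted point
        evals += 1
        y += h * (k1 + k2) / 2     # average the two slopes
        t += h
    return y, evals


y, evals = heun_integrate(lambda t, y: -y, y0=1.0, t0=0.0, t1=1.0, steps=10)
print(f"steps=10, function evaluations={evals}, y(1)={y:.4f}, exact={math.exp(-1):.4f}")
```

In a diffusion sampler each evaluation is a full model call, so a progress bar that counts calls will show about 2n. Some implementations skip the corrector on the final step, which would account for the "~" in "~2n".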
I think that might just be "2.1 is a weird model"
@heavy glacier Am I reading it right that the pytorch 1.13.1 install is CUDA + CPU?
yes, that's right
Is there a reason why I'm getting the CPU version at all since I selected the CUDA requirements.txt?
It's just... big.
model quality results might be due to not running karras scheduler
i think it's shipping the entirety of CUDA
which is why it's so big @west nebula
judging by the macos install size the CPU build is like 50mb tops
Well I guess I'm lucky then to get a 2GB file.
Good thing I don't have bandwidth caps.
Wow I did not expect it to be merged that quickly
And I currently can't test :(
But will try
As soon as I can
Yes, you're right. It should be removed and the textual inversion instructions need to be rewritten.
Didn't realize we still needed HF tokens
Got this message during configure_invokeai.py that would be helpful to suppress:
Could not fetch half-precision version of model stabilityai/sd-vae-ft-mse; fetching full-precision instead
Comes up with multiple models, actually.
Could not fetch half-precision version of model Fictiverse/Stable_Diffusion_PaperCut_Model; fetching full-precision instead
did you switch back to a diff branch?
either way
delete hubs folder
and it should work
but some logic on handling that folder probably something that needs to be done and handled for @rose sentinel
upgraded via requirements.txt, then ran configure.
This should be seamless, agree.
configure_invokeai.py should probably check to see if the old ones are there prior to downloading new ones. That should take care of this type of issue.
Or it could do the migration instead of downloading again.
>> Detected .pt file variant 1
>> Detected .pt file variant 1
>> Detected .pt file variant 1
>> Detected .pt file variant 1
>> Detected .pt file variant 1
>> Detected .pt file variant 1
>> Detected .pt file variant 1
>> Detected .pt file variant 1
>> Detected .pt file variant 1
>> Detected .pt file variant 1
>> Detected .pt file variant 1
>> TextualInversionManager refusing to overwrite already-loaded token 'ת'
>> Detected .pt file variant 1
>> TextualInversionManager refusing to overwrite already-loaded token 'ת'
>> Detected .pt file variant 1
>> TextualInversionManager refusing to overwrite already-loaded token 'ת'
>> Detected .pt file variant 1
>> Detected .pt file variant 1
>> Detected .pt file variant 1
>> Detected .pt file variant 1
>> Textual inversions available: charTurner, ת, comicart1000, comicart1500, comicart2000, comicart2500, comicart3000, comicart3500, comicart4000, conceptart, conceptart2, cxzz, jarz3, lfa, profood, starship```
Would be nice to know what files have ת as a token.
So I could clean this mess up!
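A sketch of the kind of report that would help here: given a mapping from embedding file to the trigger token it declares, list every token claimed by more than one file. Reading the token out of each .pt file is left out, and the file names below are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def find_token_collisions(token_by_file: Dict[str, str]) -> Dict[str, List[str]]:
    """Group files by trigger token and keep only tokens used by several files."""
    files_by_token = defaultdict(list)
    for path, token in token_by_file.items():
        files_by_token[token].append(path)
    return {
        token: sorted(paths)
        for token, paths in files_by_token.items()
        if len(paths) > 1
    }


# Hypothetical example data, not taken from the log above.
example = {
    "embeddings/style_a.pt": "*",
    "embeddings/style_b.pt": "*",
    "embeddings/charTurner.pt": "charTurner",
}
for token, paths in find_token_collisions(example).items():
    print(f"token {token!r} is declared by: {', '.join(paths)}")
```

Printing the file names next to the "refusing to overwrite" message would make the cleanup a matter of renaming or deleting the listed files.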
the pip distribution of pytorch is like this, it bundles the CUDA libs with it.
Point your browser at http://localhost:8999 or use the host's DNS name or IP address.
>> System config requested
>> Patchmatch initialized
>> System config requested
>> Image generation requested: {'prompt': '(a kitten), comicart1500', 'iterations': 1, 'steps': 50, 'cfg_scale': 9, 'threshold': 0, 'perlin': 0, 'height': 768, 'width': 768, 'sampler_name': 'ddim', 'seed': 4059114571, 'progress_images': False, 'progress_latents': True, 'save_intermediates': 5, 'generation_mode': 'txt2img', 'init_mask': '...', 'seamless': False, 'hires_fix': True, 'variation_amount': 0}
ESRGAN parameters: False
Facetool parameters: False
{'prompt': '(a kitten), comicart1500', 'iterations': 1, 'steps': 50, 'cfg_scale': 9, 'threshold': 0, 'perlin': 0, 'height': 768, 'width': 768, 'sampler_name': 'ddim', 'seed': 4059114571, 'progress_images': False, 'progress_latents': True, 'save_intermediates': 5, 'generation_mode': 'txt2img', 'init_mask': '', 'seamless': False, 'hires_fix': True, 'variation_amount': 0}
'WeightedPromptFragmentsToEmbeddingsConverter' object has no attribute 'device'
Traceback (most recent call last):
File "/home/jovyan/work/InvokeAI/backend/invoke_ai_web_server.py", line 1216, in generate_images
self.generate.prompt2image(
File "/home/jovyan/work/InvokeAI/ldm/generate.py", line 463, in prompt2image
if self.free_gpu_mem and self.model.cond_stage_model.device != self.model.device:
AttributeError: 'WeightedPromptFragmentsToEmbeddingsConverter' object has no attribute 'device'```
Also getting this on first run.
But! Even though I can't render anything, model loading time is fantastic.
If I use a converted-on-the-fly ckpt, I don't get that error and I can render things.
hmm, I uninstalled all my embeddings back when they were slow, and haven't tried again since the deferred loading for them went in.
oh, I also see that's a conditional on free_gpu_mem.
yeah that option needs more investigation
I'll turn it off for now but let me know if you want me to test anything.
Getting wildly different results between 2.2.x and main with an embedding:
I'd argue the latter is better, which is good, but I'd love to know what's causing the difference.
yeah that's pretty dramatic
I was using the full 1.5 model before. Is that available via huggingface?
v1-5-pruned.ckpt - 7.7GB, ema+non-ema weights. uses more VRAM - suitable for fine-tuning
It seems that now I'm using the diffusers one which is smaller.
I don't believe there's any reason to use the non-ema weights for inference.
"but keturn," you say, "what do I use when I want to train dreambooth or textual inversion? Don't I need the "suitable for fine-tuning" weights then?"
good question, hypothetical questioner! I have no idea.
Thank you for anticipating my needs.
Just wondering if the model change accounts for the difference in output.
What's the same prompt look like without the embedding loaded?
@west nebula if you have steps <30 it could be karras scheduling. try passing --karras_max 1 to main, if that gives you the same output as diffusers then that's the culprit
ddim 50 7.5CFG
I will be back to working on these rough around the edges problems later this afternoon.
I may have already fixed some of it here: https://github.com/invoke-ai/InvokeAI/blob/820577b112fd36f11646ebd8a00dec88030a4a27/ldm/invoke/config/configure_invokeai.py. But my PR is unmergeable again and will need another intense rebase. Hoping to get it to a mergeable/testable state late tonight/early tomorrow morning. Could I ask to hold off on the configure_invokeai script changes until then, please?
as loath as I am to cite Twitter as a source, Tanishq is pretty reliable on this sort of thing. (part-time at Stability and also the organizer of EleutherAI's diffusion paper reading group.) https://twitter.com/iScienceLuvr/status/1601011140934664193
See, I'm not crazy for always using the full model! Maybe!
I switched back to the full 1.5 ckpt and the image matches the original again.
So is there a full non-ema diffusers 1.5?
oh, it looks like there is, maybe? https://huggingface.co/runwayml/stable-diffusion-v1-5/tree/non-ema/
but we didn't realize there were other branches people might need other than main or fp16, so we don't have a way to access that directly from models.yaml.
@rugged moth in order to support the 2.1 model without xformers I'm going to add an optional precision field to the models.yaml. It will have 3 choices: auto, fp16 and fp32. Auto will use whatever the user's precision is set to, and will be the default if the field isn't present (this is current behaviour). Either of the others will force the chosen precision.
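The fallback rule described there can be sketched as a small resolver. This assumes the global setting has already been reduced to a concrete precision; the function name `resolve_precision` and the alias table are illustrative and not the actual model manager code.

```python
from typing import Optional

# Accept both short and long spellings; which ones models.yaml accepts is an assumption.
_ALIASES = {
    "fp16": "float16",
    "float16": "float16",
    "fp32": "float32",
    "float32": "float32",
}


def resolve_precision(model_precision: Optional[str], global_precision: str) -> str:
    """Pick the precision for one model stanza.

    A missing field or "auto" defers to the global setting; anything else
    forces that precision for this model only.
    """
    if model_precision is None or model_precision == "auto":
        choice = global_precision
    else:
        choice = model_precision
    try:
        return _ALIASES[choice]
    except KeyError:
        raise ValueError(f"unknown precision: {choice!r}") from None


print(resolve_precision(None, "fp16"))    # field absent: follow the global setting
print(resolve_precision("fp32", "fp16"))  # per-model override, e.g. for the 768 model
```

With a rule like this, only the stanza for the SD-2.1 768 model needs to force full precision, and every other model keeps running at the user's chosen precision.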
Actually I was just thinking about how to do this in conjunction with precision-setting. Maybe a branch field in addition to revision?
Could that be extracted from a URI?
revision is branch, effectively. (i.e. both branches and tags may be used to refer to a revision in git)
What happens if I do repo_id: runwayml/stable-diffusion-v1-5/tree/non-ema?
diffusers will be like "that's not a repo_id"
diffusers
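On extracting the branch from a URI: a sketch that splits a Hugging Face model URL (or a pasted `owner/name/tree/branch` string) into a repo_id and a revision. The helper name `split_repo_and_revision` is made up for illustration.

```python
from typing import Optional, Tuple
from urllib.parse import urlparse


def split_repo_and_revision(ref: str) -> Tuple[str, Optional[str]]:
    """Split 'owner/name[/tree/<revision>]' or a full URL into (repo_id, revision)."""
    path = urlparse(ref).path if "://" in ref else ref
    parts = [p for p in path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"not a repo reference: {ref!r}")
    repo_id = "/".join(parts[:2])
    revision = None
    if len(parts) >= 4 and parts[2] == "tree":
        # Everything after 'tree' is the branch or tag name.
        revision = "/".join(parts[3:])
    return repo_id, revision


print(split_repo_and_revision("https://huggingface.co/runwayml/stable-diffusion-v1-5/tree/non-ema/"))
print(split_repo_and_revision("runwayml/stable-diffusion-v1-5"))
```

The two parts map onto arguments diffusers already takes, as in `from_pretrained(repo_id, revision=revision)`, so the existing revision field could carry "non-ema" without a separate branch field.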
So the good news is that everything works as before if I don't use the non-ema checkpoint - thus embeddings work perfectly.
So minor hiccups are IMO a lack of clarity about conflicts with embedding terms, problems with --free_gpu_mem, and misc. model manager problems that I've run into but haven't investigated yet.
Oh, and a warning on every render about the safety checker...
@rose sentinel In the TI training text UI, once you hit enter on a path selector you can't get out without picking an existing path. Is there a way to make a new directory or back out of the selector?
Really like the text UI for this!
BUT...
Looks like something may not be getting read correctly from my inputs.
I changed the output directory and it didn't use my selection.
the only thing that TI training unfreezes is the nx768 embedding tensor for the terms you're training, and afaik that means the presence of EMA weights isn't actually relevant
OK, I'll give it a shot. Didn't work with the ckpt because that's not supported by diffusers.
(I didn't convert yet.)
Sorry. I know what's causing that and I'll fix it.
Also crashed when I went to a valid path with files in it for the training data and hit enter...
The file selector is a bit weird but if you go to the bottom and just type out the path you want it creates the directory.
It is a bug unrelated to the choice of path. I will fix in about an hour.
No rush, just wanted to test things out and let you know.
Do the manual install instructions need to be updated for diffusers? I tried doing a manual install from the current main, and the configure_invokeai.py step gives me this:
I think that's an unrelated-to-diffusers problem: https://github.com/invoke-ai/InvokeAI/pull/2327
You can try this PR: https://github.com/invoke-ai/InvokeAI/pull/2333
Nice. One thing that's not clear yet - if I abort a training, what files do I place where to test it out prior to resuming (if needed)?
You can take any of the checkpoint files and place them into the embeddings folder. These are at the top level of text-inversion-training and have names like learned_embeds-steps-1000.bin
I've just accepted and merged this PR in. There was an update to dnspython yesterday (which we didn't even use directly) which broke diffusers network file fetching.
Will take a look tonight probably. And I'll also try resuming.
Out of curiosity, where does the per-step code for diffusers live?
If I were to, say, try implementing symmetry in some fashion?
Got the diffusers branch running. It's significantly slower and uses significantly more RAM than my script that uses fp16, and roughly the same speed as invoke 2.2.5. Both diffusers and non-diffusers seem a bit slower than 2.2.5, likely having to do with the additional code eating a bit more RAM, and thus increasing memory pressure. The generation I tested pushes this machine's RAM limits in fp32. I'd like to see if I can fix this by setting invoke's diffusers pipeline to fp16. Searching through the code now to figure out where to do that, but if someone could point me in the right direction...
Nevermind, it's a command line variable, duh.
Throws an error, but it seems to be from something in invokeai/lib/python3.10/site-packages/torch/amp/autocast_mode.py hard-coded to do that when you try to use fp16 with mps.
bypassing that causes more errors (type mismatch stuff, it looks like), so not something I can fix on my own. Other libraries that need to be updated, maybe? It IS possible to run fp16 on mps with a diffusers pipeline, though...
And installing a pytorch nightly from within my venv doesn't help
Let me know if there's anything else you want me to try. I'm out of ideas.
That PR from pcuenca was included in diffusers 0.7, which was a long time ago in diffusers-years.
Yeah, just thought it might be useful information, since it was adding functionality that isn't currently working in invoke.
So, this is probably very useful information. I was having trouble figuring out how to add a VAE to my diffusers model in invoke (I'm getting minor differences in image generations between diffusers and non-diffusers versions of the same model, but they seem to be the kind of stuff that a VAE difference would explain). Anyway, in doing so, I realized that I never added a VAE to my custom script. When I tried to do that, I get what appears to be the exact same type error that I got in invoke, after bypassing the fp16/mps error.
@clear hinge what do you reckon is causing the low quality seam fix with the diffusers
I don't know. Does inpainting also suffer similar quality problems with similar parameters? If not, then something in the inpaint.py code probably got messed up. If inpainting is an issue, then the problem is likely deeper in the diffusers code.
probably the inpaint.py code doesn't work the same anymore. It basically did get_make_image on itself and called that to do a seam paint, then tried to restore everything after doing that.
ive been trying to debug it
it seems fine in the debug images until the very end
i feel it might have something to do with the last make_noise call not working as intended
its almost like an extremely low res image is being pasted back
in the seam area
I can currently use fp16 OR load a VAE in my script, but trying to do both produces that type error.
With this PR, one can generate SD-2.1 768 pixel images without xformers loaded, and without having to set global precision to full precision: https://github.com/invoke-ai/InvokeAI/pull/2335
The same PR code should allow you to force fp16 on models and VAEs. Use precision: float16 in the model stanza (untested, and probably won't help @sour sun )
There are now a bunch of PRs queued up, each of which make model management a bit better. When folks have a chance, please have a look. Also, I'm creating a new channel for version 2.3, now that diffusers is in main.
This seems like a concerning change:
mask_image = mask.convert('RGB'), # Code currently requires an RGB mask
to
mask_image = mask,
infill_method also now gets passed to seam_paint, though it's not needed - could indicate that it was passed to get around an introduced bug:
inpaint_height = im.height
to
inpaint_height = im.height,
infill_method = infill_method
ye i was looking at exactly that as u typed this lol
Found this in the change too:
if mask_image.mode != "L":
    # FIXME: why do we get passed an RGB image here? We can only use single-channel.
    mask_image = mask_image.convert("L")
this one makes sense coz it needs a luminance channel
and it cant multiply with the alpha mask otherwise
I just think it's hilarious that when you were preparing 2.2, I was saying "patchmatch isn't production-ready, leave that for later."
but everyone was "noo! we need patchmatch! everything is horrible without it!"
so then when I was doing all the diffusers migration of inpainting, I made sure everything used the chosen infill method consistently.
and now y'all are like "this code is broken, make it not use patchmatch."
uh... I don't think seam painting has anything to do with infill/patchmatch.
its broken on both. not to do with them
these aren't it. still broken after resolving
does the problem with seams only show up in combination with canvas use, or is it present in CLI as well?
I ask because it's a lot easier to run test cases with CLI.
havent tried on the cli
and if I understand the reports from hipsterusername correctly, it's only a sometimes-thing, so I haven't been confident about reproducing it at all.
no it happens all the time
i think its less noticeable in some examples
but the image degradation on the seams has been there for me 100% of the time so far
strange...
you have all the debug images available?
(I have the equivalent of the engine pulled apart on my desk in the nodes branch and so little time to code lately that I don't want to switch branches or rebase at the moment)
ill send them to you in a second
@clear hinge final result
Does it not spit out debug images for the seam paint?
The second init_filled and the masks after that I didn't expect to look like that
Pretty blurry/foggy image going into it too. Weird that it looks less blurry on the seam.
dont think it does
Getting that set up might help. Looks like the masking is correct at least.
yep adding in
seam_mask
hmm .. interestingly the seam result is not showing up
What's the init_img when seam painting?
init_filled
also i just noticed
it runs that thing twice
is there a reason for it?
wait let me check again
@clear hinge seam fix init img is the same as corrected_result
also is this "Result" not supposed to be an image? i cannot debug it until the original settings are restored
Uh... I don't know now. May have been a tensor?
Hrm... Well, it seems like init img and mask are correct
yep
That code structure was a bit messy though. Aside from that, it tries to replace the make_image function, then restores it afterward
But the code path for the whole thing stores some stuff on the class, some stuff gets passed to make_image, and some stuff is captured.
If you want to semi-simplify it you could move seam-painting up to generate.py for the time being
And just create a new inpaint for the seam paint call to avoid any variable conflict stuff
heres what i noticed
when i debug images
after the seam_paint runs
the whole init_img thing runs all over again
coz i get debug images for init_img again
Yah it calls get_make_image again after seam paint to restore settings
Since if you're doing iterations, the next iteration would be broken if you didn't do that
ah
seems like whatever is broken is broken inside seam_paint
trying to pin point what
reckon its the noise that is being filled/
?
Maybe doesn't initialize something correctly
see any issues here?
No, but I'm not super familiar with that code
For seam paint, the most important thing was setting all the variables correctly when calling get_make_image, then restoring them correctly afterward
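A minimal sketch of the set-then-restore pattern described above. It assumes nothing about the real generator classes: `temporary_attrs` and `FakeGenerator` are hypothetical names for illustration, not InvokeAI code. The `finally` clause puts the saved values back even if the seam pass raises, so a following iteration would see the original settings.

```python
from contextlib import contextmanager


@contextmanager
def temporary_attrs(obj, **overrides):
    """Temporarily set attributes on obj, restoring the old values on exit."""
    missing = object()  # sentinel for attributes that did not exist before
    saved = {name: getattr(obj, name, missing) for name in overrides}
    try:
        for name, value in overrides.items():
            setattr(obj, name, value)
        yield obj
    finally:
        for name, old in saved.items():
            if old is missing:
                # Attribute was introduced by the override: remove it again.
                if hasattr(obj, name):
                    delattr(obj, name)
            else:
                setattr(obj, name, old)


class FakeGenerator:
    """Hypothetical stand-in for a generator that keeps state on the instance."""

    def __init__(self):
        self.init_image = "init_filled"
        self.strength = 0.75


gen = FakeGenerator()
with temporary_attrs(gen, init_image="corrected_result", strength=0.3, seam_pass=True):
    inside = (gen.init_image, gen.strength, gen.seam_pass)
after = (gen.init_image, gen.strength, hasattr(gen, "seam_pass"))
```

Keeping the overrides in one place like this is one way to avoid the mix of class state, arguments, and captured variables mentioned earlier; creating a fresh inpaint object for the seam pass would be another.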
you can see from the latents here that the area that is being seam painted is very different
which makes me feel that the issue might be with the noise
Hrm... So in the old code, the init img got stored as pil_image, but then converted to a tensor and used for everything (including generating first stage).
some of that conversion code got shuffled over to the diffusers_pipeline side of things
oh, hmm, there is some code in inpaint.py that's behind if isinstance(init_image, PIL.Image.Image): conditionals, but I'm not sure how optional it really is?
that is, if you passed something that wasn't an Image, then none of that would run, and that might be bad.
Yah I forget what the use case was for that
Bug: Setting img2img strength at 0.01 throws IndexError: index 0 is out of bounds for dimension 0 with size 0 during img2img
ah, that one I did know about at one point but haven't made a tracking issue for it since the branch closed. will do.
thanks
obviously it shouldn't crash, but UX wise, what should the behavior there be?
it's flipping out because it thinks you just asked it to do a zero-step img2img
perform at 0.01 .. but if needs to min at 0.1 then i can change the frontend to not go below that
which might be the easier fix
well the inconvenient thing is that the way img2img strength has been implemented, it works by running some fraction of the requested step count.
so the minimum is not 0.1, it's 1/steps, more or less.
by the same token, if you're running n steps, you can't really get up to 0.999, you can only get up to (n-1)/n
ye i found that weird always
its been like this since the original compvis implementation
at one point people proposed exposing steps to run
yep. so having the strength setting be a continuous slider is misleading. discrete intervals would be more accurate.
and also kind of a crummy UI all around, so we should change the implementation, but... that's another story.
But i find that confusing from a UX perspective
whats the quick fix for now? limit it to the bounds of 0.1 to 0.99 which is what we did back then with the CLI implementation i think
and it worked fine in the frontend .. not sure when and where it broke
but yeah.. its a silly thing
hmm .. dont recall .. but i think the cap was always 0.99
and we need to remove inpaint replace ui right?
the last change on that code was 3 months ago when i changed it to cap at 0.99
right the cap was .99 because 1 would effectively ignore the og image - but thats what inpaint replace is doing so we figured it was more streamlined to just allow 1
we should fix the backend so it never goes below one step, I think. Way too easy to break it otherwise.
just brought up the UX aspect cuz it's going to leave a fairly wide section of that slider from 0 to just below 2/n where it's all effectively the same.
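A rough sketch of the step arithmetic being discussed, assuming the step count is the requested steps times strength, truncated to an integer (that truncation is an assumption, and `img2img_step_count` and `effective_strength` are hypothetical names). Clamping to at least one step avoids the zero-step crash, and it also shows the flat region at the bottom of the slider.

```python
def img2img_step_count(strength: float, total_steps: int) -> int:
    """Number of denoising steps actually run for a given strength.

    Hypothetical helper: clamps to at least one step so that a tiny
    strength cannot request a zero-step run.
    """
    if total_steps < 1:
        raise ValueError("total_steps must be at least 1")
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    return max(1, min(total_steps, int(total_steps * strength)))


def effective_strength(strength: float, total_steps: int) -> float:
    """The strength a slider value really maps to: a multiple of 1/total_steps."""
    return img2img_step_count(strength, total_steps) / total_steps


# With 50 steps, everything below 2/50 collapses to a single step.
low_a = img2img_step_count(0.01, 50)
low_b = img2img_step_count(0.03, 50)
half = img2img_step_count(0.5, 50)
near_top = img2img_step_count(0.99, 50)
```

With 50 steps, 0.01 and 0.03 both run one step, which is the "all effectively the same" section of the slider.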
that seams reasonable
UX should be improved but not sure its imperative for diffusers release
dont think itll behave differently than it does today
right?
true
I'm experimenting with symmetry with diffusers and I put this code in sampler.py's do_sampling loop just before the call to p_sample. It doesn't seem to matter what percentage of steps I use, the result looks completely symmetrical rather than somewhat symmetrical. Any ideas?
if percent_done < 0.05:
    # flip the image tensor and use the first half of the original followed by the second half of the flipped tensor
    width = img.shape[3]
    to_use = int(width / 2)
    x_flipped = torch.flip(img, dims=[3])
    img = torch.cat([img[:, :, :, 0:to_use], x_flipped[:, :, :, to_use:width]], dim=3)
With the pre-diffusers code, the symmetry had the desired effect.
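For reference, a self-contained sketch of the mirroring step quoted above, run on a tiny tensor instead of real latents. `mirror_left_half` is a hypothetical name, and the NCHW layout is assumed from the use of `dims=[3]`. Each application overwrites the right half with an exact mirror of the left half, so the tensor is fully symmetric immediately after the call; how much of that survives depends on what the later steps do.

```python
import torch


def mirror_left_half(img: torch.Tensor) -> torch.Tensor:
    """Keep the left half of an NCHW tensor and mirror it onto the right half."""
    width = img.shape[3]
    to_use = width // 2
    flipped = torch.flip(img, dims=[3])
    # Left half from the original, remaining columns from the flipped copy.
    return torch.cat([img[:, :, :, :to_use], flipped[:, :, :, to_use:]], dim=3)


# A 1x1x2x4 tensor whose rows are [0, 1, 2, 3] and [4, 5, 6, 7].
latents = torch.arange(8, dtype=torch.float32).reshape(1, 1, 2, 4)
mirrored = mirror_left_half(latents)
```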
I'm surprised to hear the Sampler class has any effect on the diffusers pipeline at all
I think InvokeAIDiffuserComponent is roughly where I'd try to put that sort of thing, though it is quickly outgrowing how much we should fit in a single class.
I'm using a ckpt model FWIW
Just didn't know where to throw this test code, so thanks. I'll move it to InvokeAIDiffuserComponent and see what blows up!
It seems that do_diffusion_step has no access to the total step count, hmm.
yep
that limitation is possibly a hangover from the compvis code
because with the diffusers pipeline, we have full control/access over the steps and step index
with compvis we only had full access with ddim, the k_* samplers abstracted the step index and step count out of the upstream calls
so i had to estimate by checking the current value of sigma against the sigmas array of the model/sampler. you should still be able to see that code in the do_cross_attention_controlled_diffusion_step() function or whatever it's called
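A rough illustration of estimating progress from sigma alone. This is not the InvokeAI implementation: `estimate_fraction_done` is a hypothetical name, and the sketch assumes the schedule is sorted ascending and that sampling moves from the largest sigma to the smallest.

```python
from bisect import bisect_right
from typing import Sequence


def estimate_fraction_done(sigma: float, sigmas_ascending: Sequence[float]) -> float:
    """Estimate progress from the current sigma and the full sigma schedule.

    Counts how many schedule entries are at or below the current sigma;
    the fewer that remain, the further along sampling is assumed to be.
    """
    if not sigmas_ascending:
        raise ValueError("sigma schedule must not be empty")
    at_or_below = bisect_right(sigmas_ascending, sigma)
    return 1.0 - at_or_below / len(sigmas_ascending)


schedule = [0.1, 0.5, 1.0, 2.0]
start = estimate_fraction_done(2.0, schedule)
middle = estimate_fraction_done(1.0, schedule)
end = estimate_fraction_done(0.05, schedule)
```

It is only an estimate because it depends on the sampler exposing a sigma schedule at all.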
estimate_percent_through doesn't work now, either.
```
    percent_done = self.estimate_percent_through(step_index, sigma)
  File "/home/jovyan/work/InvokeAI/ldm/models/diffusion/shared_invokeai_diffusion.py", line 255, in estimate_percent_through
    smaller_sigmas = torch.nonzero(self.model.sigmas <= sigma)
  File "/home/jovyan/work/InvokeAI/invokeai/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1269, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'UNet2DConditionModel' object has no attribute 'sigmas'
```
That's when using ddim. Seems that the k_ samplers do provide that, at least the one I tried
But the results look horrible with ddim and great with the k_ samplers even if I work around that.
ddim vs k_dpmpp2
And you can't use step_count if using a k_sampler:
```
    percent_done = step_index / 50
TypeError: unsupported operand type(s) for /: 'NoneType' and 'int'
```
I wonder if all of this should be normalized prior to hitting this layer.
@rose sentinel i'm not sure that this is correct:
i think it should be 'hub':
damian@d-mba2 matrix % find ~/.cache/huggingface -iname "*runwayml*"
/Users/damian/.cache/huggingface/hub/models--runwayml--stable-diffusion-v1-5
above is what i already have on my disk from building grate - which simply calls StableDiffusionPipeline.from_pretrained('runwayml/stable-diffusion-v1-5')
That's odd. The diffusers pipelines end up in diffusers when I try the same thing. What version of huggingface_hub are you using?
0.11.1
Also the estimate function fails if using a diffusers model and a k_sampler with AttributeError: 'UNet2DConditionModel' object has no attribute 'sigmas'.
urgh, i guess i only checked with ddim
hmm, no, wait, i'm pretty sure i was using k_lms..
dpm++ is so much more powerful
There are 4 cases then... diffusers model + ddim, diffusers model + k_, ckpt + ddim, ckpt + k_
ddim is used all over the place and the code should work no matter the sampler chosen.
True
Determinism isn't a bad thing.
agreed. i think the real fix is to pass the step index properly through.
And also number of steps total?
(Even if I hardcode that stuff in, my output gets all mangled. So maybe this isn't the place to be hooking up symmetry.)
Here's what I get on my system:
$ tree .cache/huggingface/
.cache/huggingface/
├── hub
└── version.txt
$ python
>>> from diffusers import StableDiffusionPipeline
>>> pipeline = StableDiffusionPipeline.from_pretrained('stabilityai/stable-diffusion-2-depth')
$ tree -L 2 .cache/huggingface/
.cache/huggingface/
├── diffusers
│   └── models--stabilityai--stable-diffusion-2-depth
├── hub
└── version.txt
3 directories, 1 file
huggingface_hub 0.11.1, diffusers 0.11.1
No difference when using runwayml/stable-diffusion-v1-5. I can't explain the difference we are seeing.
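One way to check which cache a model landed in is to look for its folder under both subdirectories. A small sketch: the `models--org--name` folder naming and the `hub` and `diffusers` subdirectories are taken from the listings above, while `repo_folder_name` and `find_cached_model` are hypothetical helpers. The demo runs against a temporary directory rather than a real cache.

```python
import tempfile
from pathlib import Path
from typing import List, Optional


def repo_folder_name(repo_id: str) -> str:
    """'runwayml/stable-diffusion-v1-5' -> 'models--runwayml--stable-diffusion-v1-5'."""
    return "models--" + repo_id.replace("/", "--")


def find_cached_model(repo_id: str, cache_root: Optional[Path] = None) -> List[Path]:
    """Return every cache subdirectory location that holds the given repo."""
    root = cache_root if cache_root is not None else Path.home() / ".cache" / "huggingface"
    folder = repo_folder_name(repo_id)
    candidates = [root / sub / folder for sub in ("hub", "diffusers")]
    return [path for path in candidates if path.is_dir()]


# Demo against a throwaway directory that mimics the second listing above.
with tempfile.TemporaryDirectory() as tmp:
    fake_root = Path(tmp)
    repo = "stabilityai/stable-diffusion-2-depth"
    (fake_root / "diffusers" / repo_folder_name(repo)).mkdir(parents=True)
    (fake_root / "hub").mkdir()
    found = [path.parent.name for path in find_cached_model(repo, fake_root)]
```

Calling `find_cached_model("runwayml/stable-diffusion-v1-5")` with no second argument would check the default location under the home directory.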
@rose sentinel I didn't get your TI branch to crash but it also doesn't resume training. See logs below.
It still has 2K steps to go...
@west nebula can you check if this fixes the issues you were seeing: https://github.com/invoke-ai/InvokeAI/pull/2342
Should that code all be moved to estimate_percent_through and estimate_percent_through be used throughout?
Otherwise if self.cross_attention_control_context is None, one has to duplicate all of that effort to get the percent through.
If that isn't a concern and we don't care if estimate_percent_through works right, looks fine.
I'll toss some debugging statements in and test.
total_step_count should be documented at the top of the function.
So with a ckpt model loaded, step_index is good but total_step_count is None when using ddim.
Same model, both step_index and total_step_count are None when using k_.
With a diffusers model, step_index and total_step_count look good with all samplers.
Is that what you expected, @tardy sparrow?
that's expected
sigma should be non-None in this case
So of those cases, are there any where percentage cannot be estimated?
nope,
estimate_percent_through is scheduled for removal when the compvis code paths are dropped
because with diffusers we can always guarantee to provide both step_index and total_step_count
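When both values are available the percentage is a plain division. A minimal sketch, with `percent_through` as a hypothetical name: it returns `None` when either value is missing (the ckpt cases reported above) instead of raising the `TypeError` seen earlier, so the caller can decide whether to fall back to a sigma-based estimate.

```python
from typing import Optional


def percent_through(step_index: Optional[int],
                    total_step_count: Optional[int]) -> Optional[float]:
    """Exact fraction of sampling completed, or None if the counts are unavailable."""
    if step_index is None or total_step_count is None:
        return None
    if total_step_count <= 0:
        raise ValueError("total_step_count must be positive")
    return step_index / total_step_count


diffusers_case = percent_through(10, 50)     # both values provided
ckpt_ddim_case = percent_through(10, None)   # step_index only
ckpt_k_case = percent_through(None, None)    # neither value
```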
But your code isn't executed if cross attention isn't being used, right?
So that should be pulled out IMO.
We should still have access to % complete.
you can just push it above the if, i just didn't want to pollute the fn namespace with variables that aren't used except in certain use cases
That's why I was suggesting throwing it into the estimate_percent_through function that we could keep around.
My thought process is that at some point we may need this with nodes if we're going to do a per-step adjustment or calculation or whatever.
Something like symmetry for X% comes to mind.
sure, its just a simple division tho and it's not an estimate
i would prefer the function goes away and whoever needs it just has access to the step index and total count vars
Is there a way to get those for ckpt?
I should convert the ckpt and see if the same issues exist since that's the path forward...
no, that's the thing - the function is called "estimate" because i was forced to guess from the values that the compvis code paths made available
Ah. Got it. So when we yank out native ckpt support, that whole path goes away entirely and we can use raw steps + total.
I suppose I should hold off on experimenting again until the dust settles.
Regarding this issue, note that we couldn't reproduce this on our side. SD 2.1 is supposed to work just fine with xformers. If you're generating black images with xformers could you open a new issue or comment on this one with a repro (e.g. a google colab with the xformers pip wheel reference etc...)
SD 2.1 works fine with xformers. When xformers is not installed and autocast is on, then 2.1 (768 version) produces black images. Changing to float32 fixes the issue.
Actually that would be super nice! It's on my ToDO list for this & next week, but would be amazing if you could open a PR
That's already done in the InvokeAI repo. There are a couple of places where I made InvokeAI-specific changes, and I'll parameterize them in order to make a PR.
Or Damian will get to it first.
Gotcha, ok even without xformers it should work because we added a upcast_attention flag here: https://huggingface.co/stabilityai/stable-diffusion-2-1/blob/main/unet/config.json#L45 . It's just that you need to have at least diffusers 0.11.0 installed
I do have 0.11.1 installed, but getting black images unless I load with torch.dtype=float32 and turn off autocast.
I recently fixed using a workaround.
Interesting! Would be happy to look into it if you could file a quick issue. In general, any reason to use autocast instead of pure fp16?
BTW, I'm fully back to work and happy to help with some changes to diffusers .
What are the most pressing things from your side?
wait, do we have autocast on? where is that? oh, I think there was a hint as to where that is in a traceback that gogurt filed, I'll go check.
It's in generator/base.py, where there is a call to choose_autocast(). The latter is in devices.py.
Is autocast unnecessary? It is there from the original CompVis code.
Really easy to turn autocast off globally just by editing choose_autocast().
@forest spade I think the most pressing things are:
• SD 2.x with float16 as we've just been talking about, https://github.com/invoke-ai/InvokeAI/issues/2329
• MPS with float16 https://github.com/invoke-ai/InvokeAI/issues/2336
• and we still need to wrap up the cross-attention stuff. I think damian has been reluctant to build on the new API before it comes out in a diffusers 0.12 release. Maybe uncertain if it's done, or if you're making additional changes for LoRa. (partly my own interpretation; I don't want to put words in @tardy sparrow's mouth.)
if we do that, do all our float16 problems go away?
nope, looks like we have some things to clean up there. Welp, wish I had known about that earlier
I had hoped so too.
Right now, SD-2.1 will work on non-xformers CUDA systems by virtue of being forced to use float32, as soon as PR 2335 goes in.
autocast usage filed as https://github.com/invoke-ai/InvokeAI/issues/2345
Baaah. Who wants to take on this assignment? (this is a call for volunteers)
I will, when I get enough round tuits, but I wouldn't mind if someone else gets there first.
Gotcha, generally it is never recommended to use autocast in diffusers. We completely dropped support for it as it's really slowing down generation (see https://github.com/huggingface/diffusers/pull/511)
So nothing should be wrapped in autocast as it will always be slower than just doing torch_dtype=torch.float16
With diffusers 0.11.0 SD 2.1 works with torch_dtype=torch.float16 (no autocast)
We can do a release next week with all the attention loading and also better textual inversion loading
Leaving this dump here. Can't get it to reliably reproduce but it's happened a few times for me.
I have noticed that GPU memory stays high after a cancellation of a render (via GUI) until something renders successfully.
I'm looking at this now. There are a lot of places where there is mixed float32/float16 tensors being combined. Have found many, but not all of them. This is tedious...
if it feels overwhelming to do the whole codebase, we can try some more focused application of autocast(enabled=False) around the diffusers stuff.
Just happened again. How is 512x904 even a possibility? It should be 384x704 that gets resized.
for high res optimization? I believe that always generates with the smaller dimension set to 512.
Nope.
It tries to get it close to a 262144 pixel area.
But it should always be a multiple of 64 unless I'm wrong...? @rose sentinel?
```
scale = 512 / scale_dim
init_width = math.ceil(scale * width / 64) * 64
init_height = math.ceil(scale * height / 64) * 64
```
That's the ckpt code
There's some hack that a1111 is using that lets it work in multiples of 8 instead, but I don't know if invoke is using that too.
Diffusers has a completely different approach.
return tuple((x - x % multiple_of) for x in args)
Where multiple_of defaults to 8... so I'm confused here.
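The two rounding rules give different answers for a size like 904, which may explain the confusion. A small sketch with hypothetical helper names; `trim_to_multiple` mirrors the one-liner quoted above, and `round_up_to_multiple` is the round-up-to-64 idea from the ckpt snippet written with integer arithmetic.

```python
def round_up_to_multiple(value: int, multiple: int = 64) -> int:
    """Round value up to the next multiple (ceiling division, integers only)."""
    return -(-value // multiple) * multiple


def trim_to_multiple(*args: int, multiple_of: int = 8) -> tuple:
    """Round each value down to a multiple, as in the quoted one-liner."""
    return tuple(x - x % multiple_of for x in args)


# 904 is divisible by 8 (8 * 113) but not by 64 (64 * 14 = 896).
kept_at_8 = trim_to_multiple(512, 904)
trimmed_at_64 = trim_to_multiple(512, 904, multiple_of=64)
rounded_up = round_up_to_multiple(904)
```

So 512x904 passes untouched under a multiple-of-8 rule, while a multiple-of-64 rule would turn it into 512x896 or 512x960 depending on the rounding direction.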
I've found and fixed all the float32's that were in the txt2img path. Now I'm going through img2img etc. I think the worst is over; just the latents to deal with.
@worldly cloak I found this very old line of code in the ckpt loader:
model.to(torch.float16)
I think it's garbage. model.to() sets the device, not the precision. Is this correct?
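For reference, `torch.nn.Module.to()` accepts a dtype as well as a device, so a call like the one above does change precision. A quick check on a throwaway layer (whether the line is still needed in the ckpt loader is a separate question):

```python
import torch

layer = torch.nn.Linear(4, 4)
before = layer.weight.dtype

# Module.to() converts the parameters in place and returns the module itself.
layer.to(torch.float16)

after = layer.weight.dtype
device_type = layer.weight.device.type  # the device is left as it was
```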
The diffuser code path is now working with fp16 and no autocast. I am not touching the ckpt code path, which still uses autocast. I'll do a little testing and then make a PR.
@worldly cloak Unfortunately, removing the autocast context does not fix the black image problem when running with float16. I'm afraid it may be a float16 issue rather than an autocast issue. I am pretty sure that I removed the autocast context completely.
On the plus side, generation is now noticeably faster. I'll go ahead and make a PR for folks to play with: https://github.com/invoke-ai/InvokeAI/pull/2349
For what it's worth, here's code that works:
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler
import torch
model_id = "stabilityai/stable-diffusion-2-1"
# Use the DPMSolverMultistepScheduler (DPM-Solver++) scheduler here instead
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16, cache_dir='/home/lstein/invokeai/models/diffusers')
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")
prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]
image.save("astronaut_rides_horse.png")
And here's code that produces a black image:
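from diffusers import DPMSolverMultistepScheduler
# Import path assumed from the InvokeAI tree of this period (ldm/invoke/generator/diffusers_pipeline.py)
from ldm.invoke.generator.diffusers_pipeline import StableDiffusionGeneratorPipeline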
import torch
model_id = "stabilityai/stable-diffusion-2-1"
# Use the DPMSolverMultistepScheduler (DPM-Solver++) scheduler here instead
pipe = StableDiffusionGeneratorPipeline.from_pretrained(model_id, torch_dtype=torch.float16, cache_dir='/home/lstein/invokeai/models/diffusers')
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")
prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]
image.save("astronaut_rides_horse.png")
The difference is that one is using StableDiffusionGeneratorPipeline, and the other is stock StableDiffusionPipeline.
you found the easter egg! I have been keeping this as a secret surprise: diffusers has code that allows sizing by 8 rather than 64
There is only one place in the code base that retrieves the autocast constant, and I have a print() statement just before it (which isn't triggered), so I don't think autocasting is the issue.
good (?) news: running this code on your #2349 branch produces a good image for me. I even threw in an extra pipe.disable_xformers_memory_efficient_attention() to make sure I wasn't cheating.
Well that's something! But why's it crashing?
Also, that's a lot of numbers in the GUI dropdown!
pretty sure your traceback above is a swap() crash
Well, that's good news! I ran my failed test on the AMD GPU machine, because I've got xformers running on the NVIDIA machine. Maybe this is a ROCM issue?
Does pipe.disable_xformers_memory_efficient_attention() have the same effect as uninstalling xformers completely? Because I don't want to do that -- it took quite a while to get it installed.
Could definitely be. I still want my wine bottle shaped like Orson Welles, though.
I will also try updating torch-rocm on the AMD machine.
Ok, looks like I missed embiggen. Easy fix.
Want me to test a change locally first?
Let me figure out where it is....
...found it
If you would, could you check perlin noise as well? I didn't address that part of the code.
Here's the fix for txt2img2img:
diff --git a/ldm/invoke/generator/txt2img2img.py b/ldm/invoke/generator/txt2img2img.py
index e356f719..1dba0cfa 100644
--- a/ldm/invoke/generator/txt2img2img.py
+++ b/ldm/invoke/generator/txt2img2img.py
@@ -90,9 +90,9 @@ class Txt2Img2Img(Generator):
def get_noise_like(self, like: torch.Tensor):
device = like.device
if device.type == 'mps':
- x = torch.randn_like(like, device='cpu').to(device)
+ x = torch.randn_like(like, device='cpu', dtype=self.torch_dtype()).to(device)
else:
- x = torch.randn_like(like, device=device)
+ x = torch.randn_like(like, device=device, dtype=self.torch_dtype())
if self.perlin > 0.0:
shape = like.shape
x = (1-self.perlin)*x + self.perlin*self.get_perlin_noise(shape[3], shape[2])
@@ -117,10 +117,12 @@ class Txt2Img2Img(Generator):
self.latent_channels,
scaled_height // self.downsampling_factor,
scaled_width // self.downsampling_factor],
- device='cpu').to(device)
+ dtype=self.torch_dtype(),
+ device='cpu').to(device)
else:
return torch.randn([1,
self.latent_channels,
scaled_height // self.downsampling_factor,
scaled_width // self.downsampling_factor],
- device=device)
+ dtype=self.torch_dtype(),
+ device=device)
Also unrelated, noise thresholding looks pretty off.
Does this work for you, or should I upload a patch file.
I'll patch it, hang on a sec.
Noise thresholding is still a to do.
LGTM
So perlin noise with txt2img needs fixing.
What about img2img? I haven't tested that at all.
Yeah, perlin noise there is messed up as well.
Perlin noise won't work anywhere. I've isolated the problem, trying to figure out the solution.
Here is the fix for perlin:
diff --git a/ldm/util.py b/ldm/util.py
index 282a56c3..7d44dcd2 100644
--- a/ldm/util.py
+++ b/ldm/util.py
@@ -8,6 +8,7 @@ from threading import Thread
from urllib import request
from tqdm import tqdm
from pathlib import Path
+from ldm.invoke.devices import torch_dtype
import numpy as np
import torch
@@ -235,7 +236,8 @@ def rand_perlin_2d(shape, res, device, fade = lambda t: 6*t**5 - 15*t**4 + 10*t*
n01 = dot(tile_grads([0, -1],[1, None]), [0, -1]).to(device)
n11 = dot(tile_grads([1, None], [1, None]), [-1,-1]).to(device)
t = fade(grid[:shape[0], :shape[1]])
- return math.sqrt(2) * torch.lerp(torch.lerp(n00, n10, t[..., 0]), torch.lerp(n01, n11, t[..., 0]), t[..., 1]).to(device)
+ noise = math.sqrt(2) * torch.lerp(torch.lerp(n00, n10, t[..., 0]), torch.lerp(n01, n11, t[..., 0]), t[..., 1]).to(device)
+ return noise.to(dtype=torch_dtype(device))
def ask_user(question: str, answers: list):
from itertools import chain, repeat
Indeed it is.
Working on swap now...
Huh. Swap() is working for me. What was the prompt you tried?
Hm, I'll have to dig it up. One sec.
product shot of a wine glass full of (wine).swap(danny devito), caustics ray tracing three point dramatic lighting [painting, render, drawing, sketch, cartoon, pixar, disney]
ddim, 512x512, 50 steps, nothing else turned on
I also put debugging diagnostics around the section that crashed to look for type mismatches.
Wait, what platform are you on? Mac?
Nope. Should be fine.
I am running with --precision float16
Let me try that.
Nope. Works.
I have no doubt there's a bug there somewhere, but I need to knock off for the evening. I'm going to commit the perlin and txt2img2img fixes and will check in with you tomorrow.
I'm using a model that I converted from a ckpt, so I get: Half-precision version of model not available; fetching full-precision instead
That's ok too.
No, still crashing after a restart of Invoke.
It has to be somewhere in cross_attention_control.py.
Not yet.
hidden_states = torch.bmm(attention_slice, value)
One of those is a float and the other is half.
Oh, thank you! Where is that line?
I think the problem is that I was testing on the machine with xformers. I'm now trying the non-xformers machine to see if I can reproduce.
I wanted to keep things simple for testing so I left xformers off.
Building/getting the right version has been a pain whenever I've tried.
I'm able to reproduce the swap() problem on the non-xformers system.
swap() is now fixed and pushed to the PR
Did you test blending?
Sadly, on my NVIDIA CUDA system, with xformers disabled, I'm getting black images still. I'm using CUDA toolkit 11.7 and torch 1.13.1. Same as you?
yep
img2img isn't working with a diffusers model on MPS. Is this a known issue? The non-diffusers version works fine.
here you go sir https://github.com/huggingface/diffusers/pull/2019
i wonder if this is being caused by the same pytorch bug that means we have to call .clone() unnecessarily to make MPS diffusion work at all
I don't know, but I was trying to get img2img working in my custom diffusers script, and it was crashing with some message about an empty placeholder tensor. I was going to see if invoke did the same, and found this instead.
what versions of torch are you running? the bug in question is fixed here https://github.com/kulinseth/pytorch/pull/222 but no idea when that will make it to torch mainline
Got the same error using the current stable version and a nightly from... maybe a week or two ago? maybe I should try the latest
Nope, current nightly doesn't fix it.
Aha, I think using my warmup code with the img2img pipeline was causing it to calculate a step count of zero, which was causing the error.
Now I'm generating noise instead of images, but at least it's not crashing.
thanks
Everything checks out on my end.
Should have time to look into this tomorrow
Yes that was added by a kind OpenAI researcher a while back, in my experiments it worked pretty well
- https://github.com/huggingface/diffusers/pull/505
Also someone from the community added it for img2img just recently: - https://github.com/huggingface/diffusers/pull/1571
strange results with latest main on mac (SD2.1, k_euler, 20 steps, 768px)
seems this is in flux on hugging face's side https://github.com/huggingface/diffusers/pull/2005
where is this being done? the only thing i can find is ldm/invoke/ckpt_to_diffuser.py and that is saving to disk, not returning the reconstructed StableDiffusionPipeline object. is it in a PR? to be clear - this is the changes i made that does not save the pipeline but keeps it in memory - this means Invoke could skip the disk-space-hungry conversion process entirely
loads ckpt files straight into StableDiffusionPipeline objects without needing to write anything to disk
Is there a recommended way to install xformers for Invoke?
Also, should we pass requires_safety_checker=False when constructing a DiffusionPipeline if the user has picked --no-nsfw_checker to avoid those warnings hitting the console?
Sorry, I misunderstood what your PR was doing. I thought it was skipping the step of downloading the ckpt to disk prior to conversion, which the current module does via URLs. What is the use case for converting and keeping in memory? The same conversion process will have to happen each time the model is needed.
I've just written up Linux install instructions, but need someone to contribute a recipe for Windows: https://github.com/invoke-ai/InvokeAI/pull/2360
I tried to suppress these warnings by changing the transformers log level, but it seems to revert after a while. I'll try setting requires_safety_checker. There are also annoying warnings that occur when caching models by moving them from GPU to CPU regarding using fp16 arithmetic on CPU. I used the same log level trick to suppress these, but they come back after a while.
I'll give it a shot on Linux. Can't help with Windows.
Going to try to get this working on windows
I also just purchased a 4090 for model training... and am under the impression I will have to do some rituals to get it to perform well
Great, have to update the entire system before I can get cuda 11.7 installed. What a PITA.
@rose sentinel If you follow NVIDIA's remote instructions, it gets cuda 12.0. Is that OK for Invoke?
No, because pytorch only supports up to CUDA 11.7. Did I provide the link to the wrong toolkit?
This link points to the 11.7 download: https://developer.nvidia.com/cuda-11-7-0-download-archive . When you ran it, did you get 12.0?
Yes.
They're pushing the latest out.
I'm going to remove all of my NV repos and try this again after uninstalling 12.
BBIAB
I think I was pulling updates from their server via apt. I removed the file and I'm doing the local installer and all seems fine.
I think the documentation should specify that it has to be 11.7 and that 12.0 or later won't work.
@rose sentinel here's another issue...
```
1.13.1+cu116
(invokeai) (base) jovyan@f0ab4de483f7:~/work/InvokeAI$ pip install --upgrade --force-reinstall torch torchvision
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Collecting torch
  Downloading torch-1.13.1-cp310-cp310-manylinux1_x86_64.whl (887.5 MB)
     ╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 20.1/887.5 MB 15.0 MB/s eta 0:00:58
ERROR: Operation cancelled by user
```
It's not getting the CUDA 11.7 version.
pip install --upgrade --force-reinstall --extra-index-url https://download.pytorch.org/whl/cu117 torch torchvision takes care of it on this system.
The run script is pulling down 12.0? Or is this apt against their repos?
The latter. I think it's OK when you start with a clean slate and don't have their repos linked up. My system's all set up with 11.7.
Yes, you went down exactly the same path I did a few nights ago. I went to the "CUDA Toolkit" page that you get when you search Google and installed the NVIDIA repos, not realizing that I was going to get 12.0. Several frustrating hours later, I ripped everything out and started over again with 11.7. This time it went smoothly.
You can also apt-get install cuda-11-7.
But good, glad we both got it installed. Users are going to have issues, I think.
I agree
So what about torch/torchvision not being correct?
Unfortunately, my work requires me to get this all working ASAP so I can AI generate people and baked goods.
Does cuda-11-7 give you the developer tools needed to compile xFormers? I found I had to install cuda-toolkit to get those, and this then disabled the kernel driver!!
Haven't gotten that far... going through step by step to get torch and torchvision installed.
I followed the pyTorch install directions here: https://pytorch.org/get-started/locally/
Seemed to work for me without the extra index.
I assume requirements.txt will be updated to pull in the right torch/torchvision and users won't have to --extra-index-url https://download.pytorch.org/whl/cu117?
Not sure why I have to do that, then.
But it tries to get CPU 1.13.1.
requirements will be updated if necessary. Oddly I am getting 1.13 with CUDA support without the extra index. Did you kill the install because the wheel name looked wrong, or did you finish the install and then test the version?
Former.
I will test this on a machine that has 1.12/cu116 installed.
Before, it was installing CUDA 11.6... just not catching on to 11.7 now.
I got '1.13.1+cu117' -- without the extra index.
Ok, during install, the wheel was named torch-1.13.1-cp39-cp39-manylinux1_x86_64.whl, but it was the cuda version that installed.
Well that's not great.
So even though it says cp310 in my case, it should work? I'll try installing again.
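On the naming question: the `cp310` and `cp39` parts of a wheel filename are the CPython interpreter and ABI tags, not a CUDA version. The CUDA build shows up as the `+cu117` suffix of the installed version string, as in the `'1.13.1+cu117'` reported above. A small sketch with hypothetical helper names that splits both apart using only string handling:

```python
from typing import Dict, Optional


def parse_wheel_name(filename: str) -> Dict[str, str]:
    """Split a wheel filename into its standard dash-separated fields."""
    stem = filename[:-len(".whl")] if filename.endswith(".whl") else filename
    parts = stem.split("-")
    if len(parts) not in (5, 6):  # an optional build tag makes it six fields
        raise ValueError(f"unexpected wheel filename: {filename}")
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": parts[0],
        "version": parts[1],
        "python_tag": python_tag,
        "abi_tag": abi_tag,
        "platform_tag": platform_tag,
    }


def local_build_tag(version: str) -> Optional[str]:
    """'1.13.1+cu117' -> 'cu117'; '1.13.1' -> None."""
    _, sep, local = version.partition("+")
    return local if sep else None


wheel = parse_wheel_name("torch-1.13.1-cp310-cp310-manylinux1_x86_64.whl")
cuda_tag = local_build_tag("1.13.1+cu117")
no_tag = local_build_tag("1.13.1")
```

In other words, the wheel name alone does not say which CUDA build is inside; checking `torch.__version__` after the install is the more reliable test.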
So that you can use the new diffusers code, but keep your models in .ckpt format (or .safetensors, presumably) so that they're still compatible with other software like a1111, and not have to keep two separate copies of the model on your SSD.
@rose sentinel xformers install worked perfectly.
```
memory_efficient_attention.tritonflashattB: unavailable
```
I assume that's nothing to be alarmed about.
Maybe?
```
Error caught was: No module named 'triton'
```
That makes sense. We can make in-memory diffuser conversion into a command-line option for the user to activate when desired.
Otherwise the system will load and install .ckpt and .safetensors using the old code path.
That would be great. You're still planning to remove the old k_diffusion code eventually, right? When that happens, you would just have to make that option always-on...
That's exactly what I was thinking.
Has anyone noticed that there is slight image to image variation when generating with diffusers using identical prompt, seed and other parameters? There are often small changes to the image, usually near the bottom. I have just noticed this and am unsure whether it is related to xformers. It doesn't seem to happen with legacy .ckpt or .safetensors models.
I just installed triton and that error went away. Maybe include that in the pip instructions, @rose sentinel?
Definitely seeing a difference - same seed, parameters, etc.
This is post-xformers. Is there a way to disable it without uninstalling it? That was a beast to build.
Also if successful
The differences are even more pronounced with 2.1, and it's not really generating anything useful. Hmm.
This is with 1.5-nonema diffusers that I downloaded locally.
I just got my 4090 installed and I'll work on trying to figure out how to get xformers and 4090 support going so I can write recipes up for both
It's currently jutting outside of my case because apparently subtlety is not in vogue
I noticed differences in the details with diffusers vs. non-diffusers. It turned out to be because the diffusers version was using the original VAE, and the non-diffusers one was using vae-ft-mse-840000-ema-pruned. I converted that and added it to my diffusers model, and I could no longer see any difference from the non-diffusers version.
My setup is different and does use the VAE. Plus renders shouldn't vary if no settings change at all.
```
path: /home/jovyan/work/InvokeAI/models/optimized-ckpts/stable-diffusion-v1-5-nonema
description: Stable Diffusion Non-EMA (FULL)
format: diffusers
vae:
  repo_id: stabilityai/sd-vae-ft-mse
default: true
```
I should see absolutely no difference between those two flower cookie photos, and there are a bunch.
So how do I disable - not uninstall - xformers?
I'm using pip here rather than conda.
edit diffusers_pipeline.py and remove the clause that does the enable_xformers call.
@rose sentinel With xformers off, the generations are completely identical.
Whether it's xformers or something that uses it is TBD.
I got xformers to build for my 4090 in my nvidia/cuda:11.7.1-devel-ubuntu22.04 docker container with the following FWIW:
RUN pip install ninja \
&& pip install -U --pre triton \
&& TORCH_CUDA_ARCH_LIST="8.6+PTX" pip install -v -U git+https://github.com/facebookresearch/xformers.git@main#egg=xformers
xformers doesn't have ARCH 8.9 on the list so I had to use 8.6+PTX which appears to work anyway.
oh I also had to install build-essential
I'm on Windows, but got xformers installed using
pip install xformers==0.0.16rc425
Hm
Still feels slower than my 3080
how many it/s are you getting w/ your 4090?
~15it/s, SD 1.5, "dog", 20 steps, 512x512, k_euler_a
I think this error message needs to be replaced w/one indicating a local embedding is being used: This concept is not known to the Hugging Face library. Generation will continue without the concept.
@heavy glacier what are you getting? I started at like 5-8it/s until I implemented a few steps.
I'm not sure if I'm using the diffusers implementation at this point so I don't know if xformers is being used. I just pulled the latest which included that merge from the weekend and booted up... Would I see confirmation xformers is being used in the console?
Yeah, getting about 5-6 rn
One thing that unlocked a lot of performance was I manually updated the CUDNN version. It's a pita, but that gave me a 300%-400% performance jump.
Pseudo steps incoming...
When you're done you should be able to see 8700 for cudnn:
import torch
torch.backends.cudnn.version()
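The integer that call returns is a packed version number. A small sketch of how to read it, assuming the cuDNN 8.x encoding (major*1000 + minor*100 + patch); `decode_cudnn_version` is a hypothetical helper, not a torch function.

```python
def decode_cudnn_version(code: int) -> str:
    """Decode a cuDNN 8.x version integer into 'major.minor.patch'.

    Assumes the 8.x packing: major*1000 + minor*100 + patch.
    """
    major, rest = divmod(code, 1000)
    minor, patch = divmod(rest, 100)
    return f"{major}.{minor}.{patch}"


for code in (8302, 8600, 8700):
    print(code, "->", decode_cudnn_version(code))
```

In a live session the input would be `torch.backends.cudnn.version()`, which returns None when cuDNN is not available.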
mmm
looks like its not
8302
ok have it up to 12 now
looks like i got 8600 for cudnn
updating to 8700 now...
nawp, still 12.
maybe I dont have transformers up and running yet
Ah - 23-25 on SD2.1
oo, I'll have to double check SD2.1 when I'm back at the keyboard. I don't want to leave any iterations on the table!
What were you getting with the 3080?
I noticed this too and it is on my todo list to fix. I need to change the checking order so that <terms> are matched against local embeddings first before it checks huggingface.
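Roughly what the reordered check could look like, as a sketch only. Every name here (`resolve_concept`, the `local` dict, the `remote_lookup` callback) is hypothetical and not InvokeAI's actual code; the point is just that the Hugging Face lookup is never reached when a local embedding matches.

```python
from typing import Callable, Dict, Optional, Tuple


def resolve_concept(
    term: str,
    local: Dict[str, str],
    remote_lookup: Callable[[str], Optional[str]],
) -> Tuple[str, Optional[str]]:
    """Resolve a <term>, preferring local embeddings over the remote library."""
    if term in local:
        # Local hit: skip the remote query, so no "not known" warning is emitted.
        return ("local", local[term])
    remote = remote_lookup(term)
    if remote is not None:
        return ("huggingface", remote)
    return ("unknown", None)


calls = []


def fake_remote(term: str) -> Optional[str]:
    # Stand-in for the remote concept library; records what it was asked for.
    calls.append(term)
    return None


local = {"<my-style>": "embeddings/my-style.pt"}
print(resolve_concept("<my-style>", local, fake_remote))
print(calls)
```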
So what's the deal with xformers producing different images every run?
For me, when using diffusers and xformers, generating the same image several times using exactly the same parameters (i.e. hitting "Invoke" on the Web GUI without changing the seed or other settings), gives me very slightly different images.
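To put a number on "very slightly different" instead of eyeballing it, one option is to compare the raw pixel buffers of two runs. A minimal sketch, assuming both images have already been decoded to equal-sized byte buffers; `differing_fraction` is a hypothetical helper.

```python
def differing_fraction(a: bytes, b: bytes) -> float:
    """Return the fraction of byte positions at which two pixel buffers differ."""
    if len(a) != len(b):
        raise ValueError("buffers differ in length; images are not the same size")
    if not a:
        return 0.0
    changed = sum(x != y for x, y in zip(a, b))
    return changed / len(a)


# Toy buffers standing in for two runs with identical settings.
run_1 = bytes([10, 20, 30, 40])
run_2 = bytes([10, 20, 31, 40])
print(differing_fraction(run_1, run_1))
print(differing_fraction(run_1, run_2))
```

With Pillow installed, the buffers could come from `Image.open(path).convert("RGB").tobytes()`. A result of 0.0 means the two renders are bit-identical.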
think it had to be something around 12, but im frankly just guessing - cant remember
There is a way, but right now it requires a code change. In model_manager.py, right after the line that creates the pipeline (line 426), add: pipeline.disable_xformers_memory_efficient_attention()
I'm going to add a command-line argument to make this convenient.
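Until that lands, the manual change amounts to something like the following. This is a sketch: `set_xformers` is a hypothetical wrapper, and `FakePipeline` only stands in for a real diffusers pipeline so the snippet runs without a GPU. The two method names are the ones diffusers pipelines expose.

```python
def set_xformers(pipeline, enabled: bool) -> None:
    """Turn xformers memory-efficient attention on or off for a pipeline."""
    if enabled:
        pipeline.enable_xformers_memory_efficient_attention()
    else:
        pipeline.disable_xformers_memory_efficient_attention()


class FakePipeline:
    """Stand-in that just records which toggle was called last."""

    def __init__(self) -> None:
        self.xformers = True

    def enable_xformers_memory_efficient_attention(self) -> None:
        self.xformers = True

    def disable_xformers_memory_efficient_attention(self) -> None:
        self.xformers = False


pipe = FakePipeline()
set_xformers(pipe, enabled=False)
print(pipe.xformers)
```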
I am also getting variations
Also, 2.1 seems to be producing garbage compared to 1.5 models.
(not necessarily broken garbage, just meh)
I experienced the same.
I disabled it with @worldly cloak's suggestion: #1031668022294884392 message
Thanks for noting this. I've added instructions for installing triton. NB: I don't think it affects performance at all, as it is a programmer's API, but installing it will avoid the warning.
2.x is notoriously harder to prompt for, and negative prompts are less optional. Also, a lot of the common tricks for getting good results with 1.5 prompts (e.g. "greg rutkowski") don't work, so you have to re-learn it. It also doesn't have the benefits of all the third-party finetuning. Is that the kind of stuff you're referring to? https://minimaxir.com/2022/11/stable-diffusion-negative-prompt/
I didn't do any tricks, just asked for a goat.
yeah i avoid using artists by habit
Not really a "fine-tuning" problem - Getting a lot of weird stuff. Wondering if maybe it's not the model
well
I'm dumb. forgot to change to 768 for 2.1
quality still not great but, better i guess
this is the best I could get
ok, slowly but surely getting non hot garbage.
all the samplers seem to be working fine
What's different? Looks like mine, full of artifacts with simple keywords.
think that's just the model.
seems so. This is 512px with your prompt, 70 and 20 steps.
but for 768px the steps count is more important
Yeah, at the moment I can't use the embeddings that InvokeAI's TI generates.
I found that if you use anything less than 768 you get garbage. Even then, it is hard to get anything as good as 1.5
Oh? The embeddings should still work even with the message. I'll make this a high priority to fix
One seems to work, one doesn't at all. shrug
So it may be undertrained?
@rose sentinel Where can I put a debugging statement to make sure the embedding is being used in a particular generation? I want to see if it's being picked up and used or not.
wait, less than 768? For me only 512 works ok, but needs more steps than usual.
I know. I get nice 512px with your prompt above.
I'm in the process of making a PR that will provide some debugging on this. I'm a bit held up because my development machine is down yet again, but should be up later in the day.
OK. Let me know how I can assist. I'm around off and on today and I have jurigged installed so I can try changes on the fly.
I haven't explored the space well, but with k_euler_a and 30 steps, here is what I get with the prompt "peaceful well-groomed city park" at 512x512 and 768x768.
The trees are seriously warped with 512x512.
Upping to 60 steps, it gets worse for me
Here is 768x768 at 60 steps. It looks reasonable:
What's your CFG?
I think DDIM is the safest starting point for any new and unfamiliar configuration.
I'm also wondering if the TI didn't train properly - it was the first time I tried with xformers.
Cool! I'll check it out after my TI finishes TI-ing.
I'll start working on the problem with resuming next.
This PR adds a command-line option to enable/disable xformers: https://github.com/invoke-ai/InvokeAI/pull/2373
Starting invoke.py with --no-xformers will disable memory-efficient-attention support if xformers is installed.
For symmetry, --xformers will enable support, but this is already the default if xfor...
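For readers who want to see the shape of such a flag pair, here is a minimal argparse sketch. It is not the code from the PR; the parser layout and help strings are assumptions, and only the two flag names come from the description above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a paired --xformers / --no-xformers switch."""
    parser = argparse.ArgumentParser(prog="invoke.py")
    group = parser.add_mutually_exclusive_group()
    # Both flags write to the same destination; the default is "enabled".
    group.add_argument(
        "--xformers",
        dest="xformers",
        action="store_true",
        default=True,
        help="enable memory-efficient attention (default)",
    )
    group.add_argument(
        "--no-xformers",
        dest="xformers",
        action="store_false",
        help="disable memory-efficient attention",
    )
    return parser


parser = build_parser()
print(parser.parse_args([]).xformers)
print(parser.parse_args(["--no-xformers"]).xformers)
```

Putting both flags in a mutually exclusive group makes argparse reject a command line that passes both at once.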
I've got a PR that has partial instructions for installing xformers (instructions for Linux, not Windows). I would be much obliged if anyone who has gotten it working on Windows could contribute a recipe: https://github.com/invoke-ai/InvokeAI/pull/2360
@worldly cloak Coding style question for you. I am trying to suppress the diffusers warning messages about disabling the NSFW checker (which are irrelevant, since we do the NSFW filtering separately). After importing the appropriate calls from diffusers.utils.logging, this works:
verbosity=get_verbosity()
set_verbosity_error()
<do something that triggers the warning>
set_verbosity(verbosity)
However, this looks like something that you'd create a Python context manager for, so that the previous verbosity level is saved and restored automagically. Is this easy enough to do?
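One way this could look, as a sketch. `temporary_verbosity` is a hypothetical name, and the demo uses the stdlib `logging` module so it runs without diffusers installed; with diffusers, the getter and setter would be `get_verbosity` and `set_verbosity` from `diffusers.utils.logging`.

```python
import logging
from contextlib import contextmanager


@contextmanager
def temporary_verbosity(get_level, set_level, level):
    """Set a verbosity level for the duration of a with-block, then restore it."""
    previous = get_level()
    set_level(level)
    try:
        yield
    finally:
        # Runs even if the body raises, so the old level always comes back.
        set_level(previous)


logger = logging.getLogger("verbosity-demo")
logger.setLevel(logging.WARNING)

with temporary_verbosity(lambda: logger.level, logger.setLevel, logging.ERROR):
    inside = logger.level  # warnings are suppressed here

after = logger.level
print(inside, after)
```

The `try`/`finally` is the important part: the old verbosity is restored even when the wrapped call fails.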
@rose sentinel - The pip install xformers==0.0.16rc425 recommendation made in the PR worked for me
Do you want me to officially add that as the recipe/document it to that effect?
My thought was that, given it just works from a wheel install... why would we have users go through building it (or installing it manually) at all?
That's great to hear. Is this for both Linux and Windows? If so, then we can get rid of the documentation entirely and just update requirements-base.txt. I think my mistake was reading the documentation on Facebook's GitHub, which stated unequivocally that there were no pip wheels for xformers
I have not tested on Linux
But i have to imagine it'd work just as well
I had no issues at all getting it going
One other thing I'll note, just as a performance note - For my 4090, I had to update to latest cudnn dlls, and update torch/torchvision
I saw that. I'm not sure what the equivalent maneuvers are for Linux. Would love to do the same myself if you get such a nice performance boost.
Are you running on a 4090 too?
I uninstalled torch & torchvision, then did
pip install torch torchvision --extra-index-url https://download.pytorch.org/whl/cu116
Nope, ampere architecture A2000. I built everything from source, so maybe I've got most recent.
I then went to https://developer.download.nvidia.com/compute/redist/cudnn/v8.7.0/local_installers/11.8/, pulled out the dlls, and manually overwrote what was in the site-packages/torch/lib folder
You're still using CUDA 11.6?
Maybe we should keep the doc after all and populate it with tips and tricks. Sounds like this is still an area of rapid development.
This was a bit of a frankenstein of advice I'd seen online - I've definitely updated to latest cuda dlls
BTW, I've got a whole bunch of small PRs up -- all small bug fixes. If you have a chance to review, most of them are pretty straightforward. The ones that may have issues I've flagged.
to add another data point, I just tested the xformers==0.0.16rc425 wheel on Linux, seems to work. haven't tried image generation yet, but it installs and the python -m xformers.info test shows success.
"had to" for improvement in performance, or to fix breakage? just wondering, in case the installer needs to handle this case for 4090 specifically...
Performance
Was about 25% of what it is now, and got 2x (each) from xformers and the cuda updates
We need to find a fix for the seam painting being fully broken. Been breaking my head to isolate where the problem is but no luck so far. ---- EDIT: Just noticed that the issue is only with diffuser models. Not ckpt models.
@rose sentinel for xformers, I think if our pytorch req wont change during the course of a release, maybe it's better if we supply the wheels?
the xformers repo builds wheels when they run their actions on github
i got my windows version from there. FARRRRRRR better than trying to build it.
One issue with that is that Triton does not get installed on Windows, so it throws a minor warning. But the optimizations seem to work.
Might not be an issue on Linux coz I think the pip install covers it?
xformers wheel from pip confirmed working perfectly on Linux (x86_64/CUDA11.7/torch1.13.1), both natively and in Docker. No more black images with SD 2.1!
we'll likely need to add platform markers for triton so it's not trying to install it on windows
I'll pip install on a copy... no guarantee I'll get much time to work on it though x.x