#🧨 diffusers

heavy glacier
#

My point is that (some) people are changing VAEs for certain models - VAE Override is the right terminology I think

#

and again, I don't think it's extremely common, but it should be doable for those that want to do the work in models.yaml to make it happen

tardy sparrow
#

that's fair, yeah. i just fear that "you need a vae" is cargo-cult thought at this point and custom vae settings should not be front and center like they are with compvis, but rather an "advanced" override setting, as you say.

heavy glacier
#

for sure

#

I definitely think people hype themselves into thinking certain things add "exceptional quality"

#

I feel like it's every day that someone posts asking if we support [obscure feature] that unlocks exceptional quality heretofore unseen in the world of stable diffusion

#

and then I find that [obscure feature] is effectively [mangled bad ugly hand garbage]

#

🙄

tardy sparrow
#

idk if you saw that research paper published recently that basically said "everybody in diffusion research thinks diffusers need gaussian noise but hey look they don't"

#

sorry keturn, wrong channel hah

heavy glacier
#

not intended audience, but i did see that 🙂

worldly cloak
#

even the ldm ckpt files have an embedded vae, right? otherwise the conversion script wouldn't have to convert it: https://github.com/invoke-ai/InvokeAI/blob/ee1b9654489494b380ef5da5e9dc990ea1e6e540/ldm/invoke/ckpt_to_diffuser.py#L892

so if a certain model recommends always using a particular VAE, it should be that model's default VAE.

there will occasionally be times like when stability put out the new 1.x MSE-finetuned VAE, but tbh I don't know how common that will be.

"override" as an "advanced configuration" matches how I think of it.

#

and in the case of things like the MSE-finetuned VAE, you probably want to use it for all models that had previously been using old-VAE.

In which case it might even make more sense to have a global "load VAE" button be entirely separate from the "change model" button, instead of having to reconfigure all your models.

#

but that is well outside the scope of what we need for now.

rugged moth
#

@rose sentinel would it be possible to update the add_model code to remove a key if the new supplied version with clobber doesnt have it

#

right now theres an issue if i add a vae to a diffuser model, it creates that key

#

but lets say i edit it from the model and remove it from the ui

#

even though its passed back as no key

#

the object is not updated coz the key exists and because its not being overwritten it still stays

#

this is an issue because I am unable to supply null as value for the vae

#

on local paths, it is failing

#

one of the two needs to be fixed

#

either a vae path set to null should not try to load a vae from that path, or the add_model functionality should be updated.

rugged moth
#

I think doing the 2nd is better option

#

if we can restrict the vae entry slot from being used if the value is null or '' .. then i can just ensure that all model entries have all keys .. this way the model always tries to load the default vae within the model folder and look for a custom vae only when specifically supplied.

#

which is a lot safer and simpler way to assign the config rather than selectively having different keys for different models and then dangling them between the backend code that deals with it and the frontend objects that are being supplied.

rose sentinel
rugged moth
#

awesome thanks. ill continue the work on the model manager after that

#

its mostly in place.. just need to run it up to make sure all edge cases are handled

worldly cloak
#

speaking of branching strategies...

#

now that lstein has committed us to keeping ckpt compatibility for this release, that might enable us to merge this sooner rather than later.

i.e. you have to be comfortable saying the backwards-compatible stuff works, but we can be a little looser with the new stuff if it's an option instead of the only path.

west nebula
#

When we hit RC stage, I am happy testing the hell out of it (assuming it's not next weekend).

worldly cloak
#

I did a little dinking around with outpainting to look at seams, but mostly I'm stuck in that place of not really understanding how it's supposed to work with patchmatch in the first place.

heavy glacier
west nebula
#

Does embiggen still work?

worldly cloak
#

well, if seams are buggy in the new code, presumably embiggen (which relies heavily on seams) is going to feel that

#

or does it not? does it do its own merging? 🤷 I don't know anything

#

basically the signal from the infill methods seems so strong that I don't know how to avoid giving her this tower of hair

west nebula
#

It uses PIL's Image to do everything.

sour sun
#

Did you use a fixed seed for these? That might give some more insight, if not...

worldly cloak
#

oh, embiggen is built on top of img2img, not built on top of inpaint. and the seam stuff is in inpaint. right.

west nebula
#

Just want to see if anyone's tested it.

rose sentinel
worldly cloak
#

yeah, there were a number of places in the code where it said "this can't be done with the inpainting model" and I was like "Oh really? That sounds like a challenge."

so it's different than the ckpt_generator code because it doesn't try to avoid doing those things.

rose sentinel
clear hinge
#

(sorry, been having trouble keeping up with the chats lately)

heavy glacier
#

But I think merging to main and wrapping up bug fixing and RC there would be reasonable

rose sentinel
#

I certainly agree that we don't release until everything is working but I'd like to get the merge done and a RC out soon. I think we're close.

worldly cloak
#

Turned on debugging images... The init_filled means this is the result of patchmatch that's being sent to img2img, right?

steel marten
#

hello guys why the quality of the generation is extremely pixelated all the time

worldly cloak
# steel marten

are these outputs from the dev/diffusers branch? Which model?

steel marten
#

sorry, I'm still a beginner how can I recognize which model?

#

I can send a screenshot

worldly cloak
#

Oh, if this isn't from a development branch, you can ask in #1020695341303074846

steel marten
#

sure, thank you

west nebula
clear hinge
#

this definitely looks like patchmatch

worldly cloak
#

yeah, this is tiled.

clear hinge
#

(which yah, underperforms on people... the inpainting model is definitely the best model for people)

west nebula
worldly cloak
#

okay, ship it! shipit

west nebula
#

My wife is a redhead and I can confirm that she looks just like that.

worldly cloak
#

so do we need an option for another infill method that's just like "middle grey" or something? because there is no "None" option for infill on the front-end.

clear hinge
#

naw, without infill (with the regular models at least) inpainting doesn't really do anything

#

e.g. if you just fill with a solid color, it will strongly skew the results toward a solid blob of color

#

the omnibus.py code path (that the runwayml inpaint model uses) doesn't infill though

worldly cloak
#

so we need that option for the infill model?

clear hinge
#

omnibus doesn't need it

#

for some reason

worldly cloak
#

I probably threw away the code paths that called omnibus

#

because I didn't understand why there should be one implementation in the 'bus in addition to the other implementations in the other Generator subclasses.

clear hinge
west nebula
#

Found a little UI bug and not sure if it's in diffusers... switch high-res optimization on in txt2img, go to img2img, go back to txt2img and it's reset to off.

clear hinge
#

I don't understand the differences either. And maybe diffusers removes any differences anyway

worldly cloak
#

omnibus is tied in to this question about sampler.uses_inpainting_model: https://github.com/invoke-ai/InvokeAI/pull/1583#issuecomment-1368339238

There is one place left that uses it: https://github.com/invoke-ai/InvokeAI/blob/e294fcabeb88f8b56b7b9d9fc57a1e6ac5b72de0/ldm/generate.py#L715-L724

I'm not sure what that's about, as I don't think anything in this branch uses "omnibus" anymore. But we have these parallel generator modules for the legacy model support, so maybe it's something we can't get rid of until we drop that?

clear hinge
#

Yah, if the runwayml model is the one being used, omnibus.py is used instead of any of the other code paths

#

I've very minimally looked in that code (just to add color correction and original image paste-over after inpainting)

#

(if you don't do the paste-over, you get subtle, small changes to the original image, which doesn't look good when inpainting with a portion of the image selected in the bounding box)

worldly cloak
#

I haven't reproduced the smeary seam during all this either blobawkward

#

though I've been turning down the seam size from the default 96, cuz that's, like, almost the entire area I'm outpainting

clear hinge
#

ah, yah 96 size with 16 blur produced the best results in my testing. I outpainted by at least 128px though on a 512px image

rose sentinel
#

Diffusers Team: I have just merged in the code that changes the organization of the models directory. As described earlier, all the huggingface models are now organized in the same way that the huggingface cache does it, so that advanced users can share models by setting the HF_HOME environment variable appropriately.

I wrote a small routine that checks the models directory on startup and reshuffles the positions of the huggingface models on a one-time basis. If everything goes as planned, everything will continue to load after this without a hitch. If I did anything wrong, you will see previously-loaded models try to load again; let me know if this happens.

I also made the fix to ModelManager.add_model() that @rugged moth requested earlier today. If a value is missing from the edited model configuration information passed by the front end, then it will be deleted from the configuration file rather than merged. There is a check that all the required fields are present, so it should be ok?
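The replace-rather-than-merge behavior described in this message can be sketched roughly as follows. This is only an illustration on a plain dict, not the actual ModelManager.add_model() code: the function name `update_model_entry`, the `REQUIRED_FIELDS` set and the field names are all assumptions.

```python
# Hypothetical illustration; NOT the actual InvokeAI ModelManager code.
REQUIRED_FIELDS = {"format", "description"}  # assumed set of required keys


def update_model_entry(config, name, new_attrs, clobber=False):
    """Replace (not merge) the stored entry for `name` with `new_attrs`."""
    missing = REQUIRED_FIELDS - new_attrs.keys()
    if missing:
        raise ValueError(f"missing required fields: {sorted(missing)}")
    if name in config and not clobber:
        raise ValueError(f"model {name!r} already exists; pass clobber=True")
    # Assigning a fresh dict drops any key (e.g. 'vae') absent from new_attrs,
    # which a dict.update()-style merge would have left behind.
    config[name] = dict(new_attrs)
    return config


config = {
    "my-model": {"format": "diffusers", "description": "test", "vae": "/path/to/vae"}
}
update_model_entry(
    config, "my-model", {"format": "diffusers", "description": "test"}, clobber=True
)
print(config["my-model"])
```

With a merge, the stale `vae` key would survive the edit; with whole-entry replacement it disappears once the front end stops sending it.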

rugged moth
#

but it's still not replacing but rather merging, and now it also adds a new key called merge:false

tardy sparrow
#

> so that advanced users can share models by setting the HF_HOME environment variable appropriately
does this mean that Invoke respects the value of HF_HOME and will load models from ~/.cache/huggingface if it's set to point there? or is it the opposite, that users are expected to set HF_HOME to ~/invokeai/models ?

#

fwiw, i do not think Invoke should override hf's default behaviour of putting files it auto-downloads into ~/.cache by default

rose sentinel
rose sentinel
# tardy sparrow > so that advanced users can share models by setting the HF_HOME environment var...

If HF_HOME is not set, then invokeai downloads to and uses its models from ~/invokeai/models, while all other huggingface API clients continue to use ~/.cache/huggingface (unless they have also changed the cache directory). If the user sets HF_HOME, then both InvokeAI and all other huggingface API clients will use whatever path is contained in the environment variable as the common cache directory. To use the default huggingface cache, the user would have to do export HF_HOME=~/.cache/huggingface or the Windows setx equivalent.
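As a rough sketch of the lookup order described in this message (HF_HOME wins if set, otherwise fall back to the InvokeAI models directory): the function name `resolve_model_cache` and the `invokeai_root` default are assumptions for illustration, and the real code may lay out subdirectories differently.

```python
import os
from pathlib import Path


def resolve_model_cache(env=None, invokeai_root=Path("~/invokeai")):
    """Return the directory used for huggingface-style model downloads.

    Hypothetical helper: mirrors the behavior described in the chat,
    not the actual InvokeAI implementation.
    """
    env = os.environ if env is None else env
    hf_home = env.get("HF_HOME")
    if hf_home:
        # The user opted in to a cache shared with other huggingface clients.
        return Path(hf_home).expanduser()
    # Default: keep downloads visible inside the InvokeAI directory.
    return (invokeai_root / "models").expanduser()


print(resolve_model_cache({}))
print(resolve_model_cache({"HF_HOME": "~/.cache/huggingface"}))
```

Passing the environment in as a mapping keeps the function easy to exercise without touching the real process environment.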

#

The tradeoff, which we've discussed earlier on this channel, is whether to prioritize sharing over transparency. @worldly cloak and I worry that newbies will have difficulty finding the big model files contained in the hidden .cache directory and will wonder why their home directory's free disk space is getting smaller. This was the reason we thought it best to have all the downloaded models go into invokeai unless the user explicitly changes the behavior.

#

I think there's no perfect solution to this, and maybe it should be an installer option so that the user makes the choice.

#

The other big space management issue that @rugged moth talked about the other day is that the diffusers models share a lot of common files, including the NSFW checker (~1 G), the tokenizer, the feature_extractor and (usually) the vae. We can save a huge amount of space by consolidating these files with some sort of deduplication system.

heavy glacier
#

This seems like an opportunity to work w/ @forest spade & team to figure out how this should be handled broadly speaking. It seems like diffusers model format is not really designed with the assumption "There will be many/multiple models" - as otherwise, I can't imagine why there wouldn't already be some form of broader standard on how to dedupe

worldly cloak
#

Yeah.

#

What would be nice is if a model_index.json could cross-reference to other repos, but I haven't seen any indication of that.

Come to think of it, I don't think I've seen any documentation on model_index.json at all.

#

and it looks like the SD model repos don't have their own weights for feature_extractor, they just specify it's CLIPFeatureExtractor and let the transformers library get the model for that.

tardy sparrow
tardy sparrow
rugged moth
rose sentinel
#

A non-breaking solution that will work with the current diffusers layout would look something like this:

  1. Traverse the directory in which the models are stored.
  2. Compute a checksum for each file encountered; cache the mapping from checksum to file
  3. If checksum has been seen before, remove the redundant file and replace with a hard link to the original file with that checksum

This will create a directory structure that is identical to what we currently have, with the exception that multiple files will point to the same inodes. Each group of identical files will only take up the disk space needed for one. Because hard links are being used rather than symbolic ones, the user can move, delete or rename a file without leaving dangling links. One disadvantage is that if the user moves the directory structure to another file system, the space savings are lost. Another disadvantage is that this procedure needs to be run at periodic intervals, such as after a model is added or deleted.
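The three steps above can be sketched as below. This is a minimal illustration under stated assumptions, not code from InvokeAI: the function names are made up, SHA-256 is just one reasonable checksum choice, and hard links only work when all files live on the same file system.

```python
import hashlib
import os
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Checksum a file in chunks so large model files are not read at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def dedupe_with_hardlinks(root):
    """Replace duplicate files under `root` with hard links; return the count."""
    first_seen = {}  # checksum -> first path seen with that content
    replaced = 0
    # sorted() materializes the listing before any file is modified.
    for path in sorted(Path(root).rglob("*")):
        if path.is_symlink() or not path.is_file():
            continue
        checksum = file_sha256(path)
        original = first_seen.setdefault(checksum, path)
        if original == path or os.path.samefile(original, path):
            continue  # first copy, or already hard-linked to it
        # Link under a temporary name, then rename over the duplicate, so a
        # failed link never leaves the duplicate's path missing.
        tmp = path.with_name(path.name + ".dedupe-tmp")
        os.link(original, tmp)
        os.replace(tmp, path)
        replaced += 1
    return replaced
```

Because the function skips files that already share an inode, it can be rerun after each model is added or deleted, which is the periodic maintenance the message mentions.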

#

By the way, I've got the diffuser-based textual inversion front end up and running. I'm doing my first TI training and so far it seems to be working as expected. The UI (for what it's worth) looks like this:

rugged moth
#

@rose sentinel I've been thinking about the diffusers model format in the yaml. I think you need to setup the structure so it works perfectly on the CLI. That way I don't have to do intermediary python code in the web_server file. Coz if at any time in the future, the model manager code changes, then that'll break the frontend rather than being in sync.

The reason the regular ckpt code works so well is because it more or less uses the same code as the CLI barring the completer.

Think we need to do the same with the diffusers format too.

If you can give me an example snippet of the data being passed to the add_model for an addition of a diffuser model correctly via the CLI, I can replicate that in JS.

Also side note, I think we should avoid using 'None' / undefined and keyless entries. I think all configs should have all keys irrespective of whether they are used or not. And in cases where the entries for vae paths for example are ' ' or null .. it should not try to load them.

This way we can build a single form structure that wont break no matter what keys are passed and what keys are omitted by the user during changes.
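A small sketch of the "every entry has every key" idea from this message, treating a null or blank vae as "use the model's own VAE". The key list in `MODEL_KEYS` and both function names are hypothetical, chosen only for illustration.

```python
# Hypothetical key set; the real models.yaml schema may differ.
MODEL_KEYS = ("description", "format", "path", "vae")


def normalize_entry(entry):
    """Return a copy of `entry` with every expected key present."""
    normalized = {key: entry.get(key) for key in MODEL_KEYS}
    vae = normalized["vae"]
    if isinstance(vae, str) and not vae.strip():
        # Treat '' or whitespace-only strings the same as null.
        normalized["vae"] = None
    return normalized


def custom_vae_path(entry):
    """Return the custom VAE path, or None to use the model's default VAE."""
    return normalize_entry(entry)["vae"]


print(normalize_entry({"format": "diffusers", "path": "models/foo", "vae": " "}))
```

A loader built on this would only look for a custom VAE when `custom_vae_path()` returns a non-None value, so the form on the front end can always send the same set of keys.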

heavy glacier
#

sorry all for the deviation from topic but going to point you to #1062855391878328360 @rose sentinel - looks like an installer regression on windows thats affecting all installs (or at least a large volume)

#

did we do a p1 patch recently?

rose sentinel
#

My bad. Will fix.

#

Fixed. I made a trivial fix to the windows bat installer and neglected to go through my full release checklist. I guess there are no shortcuts!

tardy sparrow
#

there are indeed, no shortcuts

worldly cloak
dire gazelle
#

the caching effort is awesome, I think the HF_HOME approach makes sense

#

but let's avoid hardlinks. That would be very fragile. like others also said, maybe it would be best to wait / work with Huggingface to handle safety checker (et al) deduplication upstream

#

that said, on a fresh install it looks like there's a bug with the safety checker model handling:

  • on the 1st run, it gets downloaded into the pre-2.2.5 directory structure (models/CompVis/...) even though the diffusers path is already there;
  • on the 2nd run the application notices that, tries to convert it and fails because the HF-style dir structure is already there:
#

I can fix that later tonight (morning now, so in ~15h) if no one gets to it first (but in that case, please let's coordinate changes to the configure_invokeai script 'cause these rebases are killing me 😅)

dire gazelle
#

nvm, I unexpectedly had a bit of time and just pushed a fix to generate.py that takes care of ☝️

#

(in diffusers)

#

and models are all found fine, so that answers my questions about the path naming. thanks @worldly cloak

rose sentinel
#

@worldly cloak I think there may be a (new?) bug in the step calculation or in the tqdm display of steps. When I specify a step count of N, the system reports it is performing N*2-1 steps. Here are some examples:

(stable-diffusion-1.5) invoke> a big bowl of jello -s15
100%|██████████████████████████████████████████████████| 29/29 [00:04<00:00,  6.75it/s]
(stable-diffusion-1.5) invoke> a big bowl of jello -s10
100%|██████████████████████████████████████████████████| 19/19 [00:02<00:00,  6.73it/s]
(stable-diffusion-1.5) invoke> a big bowl of jello -s5
100%|██████████████████████████████████████████████████| 9/9 [00:01<00:00,  6.70it/s]
#

@tardy sparrow Thanks for the work on deferred embedding token injection. It works like a charm. However, I noticed that if you inadvertently trigger an embedding that is not compatible with the currently-loaded model, it raises a fatal exception, so I just now put try: blocks around the places this may happen, so that the user sees an appropriate warning.

#

@tardy sparrow @heavy glacier I have a design question for you. For embeddings loaded from the huggingface concepts library, we detect and trigger them using the <trigger-token> notation. However, for embeddings that are loaded from the embeddings directory, including ones that are created by local textual inversion training, there is no special syntax, just the raw trigger-token. Aside from the inconsistency, there's the chance that the user could trigger an embedding without knowing why. I ran into this when I made a textual inversion using the term "jello".

Do you think that all trigger tokens should be enclosed in <> characters? The logic would then be to look if the token is already a loaded embedding trigger. If not, the token is checked against the concept library list and potentially downloaded. If neither case applies, then treat it as a normal part of the prompt.
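The proposed lookup order could be sketched like this. It is only an illustration of the logic in this message: `classify_triggers`, the regular expression, and the two collections passed in are assumptions, not InvokeAI's embedding manager API.

```python
import re

# Matches <token> with no spaces or nested brackets inside (an assumption).
TRIGGER_PATTERN = re.compile(r"<([^<>\s]+)>")


def classify_triggers(prompt, loaded_triggers, concept_library):
    """Sort each <token> in `prompt` into loaded / download / plain."""
    result = {"loaded": [], "download": [], "plain": []}
    for match in TRIGGER_PATTERN.finditer(prompt):
        token = match.group(0)  # with brackets, e.g. "<jello>"
        name = match.group(1)   # without brackets, e.g. "jello"
        if token in loaded_triggers:
            result["loaded"].append(token)
        elif name in concept_library:
            # Known to the concepts library but not loaded yet.
            result["download"].append(token)
        else:
            # Neither case applies: leave it as ordinary prompt text.
            result["plain"].append(token)
    return result


print(classify_triggers(
    "<jello> style, bowl of jello",
    loaded_triggers={"<jello>"},
    concept_library={"princess-knight"},
))
```

Note that the bare word "jello" in the example prompt is never matched, which is the point of requiring the brackets: an embedding cannot be triggered by accident.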

The disadvantage of making this change is the obvious one that it breaks previous prompts that relied on bare trigger tokens, and that it might bring down a rain of abuse on our heads.

heavy glacier
#

I actually thought we needed to use <> for embeddings across the board, so didn't realize that...

tardy sparrow
#

hmm. i don't know. is the proposal that on load Invoke would force the token to be <>? so jello.pt would become <jello>? what if it was an embedding like the HF .bin ones that already contains <>?

#

do users want to be able to still get the original jello feature vector (ie the non-TI one) when they've loaded jello.pt? or do they want the presence of jello.pt to always override jello?

west nebula
#

Embeddings are currently <embedding> and I think it makes sense to keep all of them that way.

#

<jello> style, bowl of jello

tardy sparrow
west nebula
#

Wait, seriously?

#

In main, I always use <embedding> to trigger my embedding in embeddings/ that I trained with main.py.

#

And now I see I'm wasting tokens.

rose sentinel
west nebula
#

Given that, I think they should all be <embedding>.

#

Even though that's a change, I don't think too many people are using local embeddings.

tardy sparrow
#

oh god can you delete that gif sorry

rose sentinel
#

But on second thought, what I should do is to add the <> to the token produced by the diffuser training pipeline, to keep backward compatibility while having a more consistent convention going forward.

tardy sparrow
#

my eyes

#

ty

rose sentinel
#

Thanks for removing the star! I was starting to seize.

#

(just kidding)

tardy sparrow
west nebula
rose sentinel
#

Seriously though, now that I think it over, we shouldn't mess with the tokens in other people's trained embeddings. I'm just going to quietly add <> around the placeholder tokens produced by InvokeAI's TI training front end.

west nebula
#

This is what threw me off: >> Current embedding manager terms: *, <HOI4-Leader>, <princess-knight>

rose sentinel
#

That's because they are HuggingFace-trained placeholders, which follows the <> convention.

west nebula
#

I do think it makes sense to use <> as a trigger. Can we inspect the placeholder token to see if it has <>?

rose sentinel
west nebula
rose sentinel
#

By the way, diffusers TI training creates a 2-4K learned-embeddings.bin file and an 8 Gig diffusers-style model directory complete with the safety checker. At the end of training, the script I wrote moves learned-embeddings.bin into the embeddings directory and then prompts the user whether they want to delete the big directory (default "yes"). Is there any reason at all to keep the big directory?

rose sentinel
west nebula
rose sentinel
#

Supposedly. I haven't tested that functionality yet. If the training crashes or is interrupted, the folder stays there. It takes checkpoints every 500 steps or so.

west nebula
west nebula
#

With the current TI trainer, I've definitely tried out different TI checkpoints to see which one works best. It's all vague. But if that situation has improved with diffusers, then past checkpoints become less important.

rose sentinel
#

Maybe it's possible to resume from a checkpoint without having the log files and the full model around. I'm just now seeing that there is a parameter named only_save_embeds which I imagine should be set to True.

worldly cloak
forest spade
# heavy glacier This seems like an opportunity to work w/ @forest spade & team to figure...

This makes a lot of sense! We haven't thought too much about this indeed.
Happy to brainstorm about how we can build a nice system that avoids downloading duplicate model weights!

Some comments:

Having looked at our current code a bit, it's actually not too difficult to make the changes (should be quite easy). Put up an issue/feature request here: https://github.com/huggingface/diffusers/issues/1984 . Would be great if you could give it a look. The nice thing about the "individual components" saving format is that we have a sha256 key for each individual component and can thus quite easily load only parts of the model. Maybe @worldly cloak you could give the issue a look 🙂

forest spade
rose sentinel
forest spade
#

Checking!

rose sentinel
#

Indeed, the step count matches when I use LMSDiscreteScheduler. (also DPMSolverMultistepScheduler and EulerDiscreteScheduler)

forest spade
#

Hmm HeunDiscreteScheduler also works for me on current "main" branch. We might have had a bug in a previous version (what version are you using?).

rose sentinel
#

0.12.0.dev0

worldly cloak
#

If I understand correctly, it's working for lstein, he's just surprised by the step count.

probably because of your double-the-steps technique for heun's second-order stuff

rose sentinel
#

I installed that because I'm getting black images using SD-2.1 without xformers installed.

forest spade
#

I mean the step count also works for me, I'm seeing tqdm showing 25 steps when setting num_inference_steps=25

rose sentinel
#

Yes! Heun producing perfectly good images. It's just reporting more steps than I asked for.

forest spade
#

wait lemme make a quick google colab

rose sentinel
#

(going offline for a meeting; apologies)

forest spade
worldly cloak
#

oh, because you tweaked the progress report to not function as an iterator, but instead you manually call progress_bar.update every scheduler.order steps. blugh

forest spade
worldly cloak
#

lstein, I guess we could do that if it's important to show progress per "step" instead of per actual unet.forward.

but I'd rather not if it's all the same to you.

If it makes too much dissonance seeing an unexpected 2*N-1 there, we could normalize everything to percentages.
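For reference, the arithmetic behind the 2*N-1 count and the percentage idea can be written out as a small sketch. The formula for second-order schedulers is taken from the counts reported earlier in this thread (15 → 29, 10 → 19, 5 → 9); the function names are made up for illustration and are not part of diffusers.

```python
def model_evaluations(num_steps, order=1):
    """Number of unet forward passes for a requested step count.

    Assumes the behavior observed in this thread: a second-order scheduler
    such as Heun evaluates the model twice per step, except for the last one.
    """
    if order == 1:
        return num_steps
    if order == 2:
        return 2 * num_steps - 1
    raise ValueError(f"unsupported scheduler order: {order}")


def percent_complete(evaluations_done, num_steps, order=1):
    """Progress as a percentage, independent of the scheduler order."""
    return 100.0 * evaluations_done / model_evaluations(num_steps, order)


for steps in (15, 10, 5):
    print(steps, "->", model_evaluations(steps, order=2))
```

Reporting `percent_complete()` instead of a raw count would show the same 0 to 100 range whichever scheduler is selected.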

marsh meadow
#

Hello! Is it possible to enable xformers? I built it from source within venv but it doesn't seem to be used.

tardy sparrow
#

if the option isn't there i'd highly recommend hacking up the code to just not save that 8Gb folder. because there's zero need for it.

rugged moth
#

does anyone have a cuda wheel for xformers on windows python 3.10 and cuda 11.6?

worldly cloak
marsh meadow
rose sentinel
languid creek
rugged moth
#

oh wait they might be. wait let me check if they have my version

#

nope

#

windows-2019-py3.10-torch1.13.1+cu117
need CU 116

#

is there an archive of older wheels?

rugged moth
#

had to do a hackjob but got it working. upgraded the pytorch installation on invokeai to cu117 and installed the wheel from above.

#

but the wheel places cpp_lib.json in the wrong folder. Move it to the right folder and it'll begin to work

#

Triton doesnt work on windows

#

im guessing that needs another build

languid creek
#

I think you can go through their actions to find older versions

heavy glacier
#

Sounds like we're on the cusp! Anyone want to share thoughts on how we actually start getting this ready for release? 🙈

worldly cloak
rose sentinel
#

Other issues that can be addressed after the merge are:

  1. Regression in the inpaint/outpaint quality (Kyle and Keturn)
  2. Diffusers model merge script (Lincoln)
west nebula
#

Also maybe some notes about the migration from 2.2.5 to *?

#

e.g. Can I just do a git pull and update pip via requirements.txt and launch it or do I have to do more setup work?

rose sentinel
#

Yeah, we need release notes too, although migration should be pretty easy. I tried to make it happen automatically.

#

You can just do a git pull and pip install -r requirements.txt

west nebula
#

Nice.

rugged moth
dire gazelle
# worldly cloak is there a reason to stick with CUDA 11.6 instead of upping to 11.7?

We are indeed upgrading to 11.7 afaik (unless anyone has a specific reason to hold back). That's already implemented in both my dev/installer work and mauwii's pyproject.toml migration. But it's not yet the case in main or dev/diffusers. It's an easy change to make (just requirements-*-cuda.txt), but we just need to make sure that all contributors actually upgrade to the new version, to avoid any surprises (though to be fair, I really don't think there are any surprises to expect).

#

will xformers work on non-RTX cards? or more importantly, will torch still work on non-RTX cards when xformers is installed?

dire gazelle
west nebula
#

I vaguely recall doing something like pip3 install xformers==0.0.16rc390 triton torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu117 for another project.

worldly cloak
west nebula
#

I'm confused. It's a requirement for CUDA but should be left out if you're not using CUDA?

worldly cloak
rose sentinel
#

Well, let's discuss. Support for 2.1 is one of the big features that we were going to use as a selling point for the switch to diffusers. Right now we have partial support, and I was thinking this would be a major issue. Happy to get other opinions on this, as I am eager to get the merge in.

#

?

#

Not clear from the comments whether autocast is actually the culprit.

worldly cloak
#

indeed. I'm reasonably sure we don't have autocast in our implementation, but those later comments say it's still broken even without autocast.

#

from the project management perspective: I've been advocating all along that we treat "migrate existing functionality" and "add new functionality" as separate stages.

We can decide we don't want to make a public release without both of them, but I don't think it's necessary to force both of them in to the same PR.

#

from the "what will it take" perspective: since we know turning on xformers changes the behavior, we know it's something to do with the attention code.

which I take to mean it will involve going in to the attention code of diffusers 0.12.dev0 (still no diffusers release this week) and spelunking around in there and probably converting @tardy sparrow's monkey-patched attention code to the new API while we're at it, otherwise it'll be that much harder to keep track of what is happening.

rose sentinel
#

Fair enough. @heavy glacier, @rugged moth what do you think? Should we go ahead and merge dev/diffusers into main, knowing that using SD-2.1 without xformers gives black images? Sounds like it'll be several days to a week before this issue is solved. If the answer is yes, then I need a few hours to add a new document to warn people about the changes and what to expect, and can do the merge Saturday, barring any merging conflicts.

rugged moth
#

that way we dont have divergent branches

#

the longer diffusers stays on its own, the less the work can get done on main

#

and we wont be releasing officially until the bugs are fixed

#

so its fine

#

we'll need to address the regression with outpainting / inpainting too

#

ive gotten xformers to work on windows with a custom build but its missing triton

#

I think triton needs to be built separately with windows .. (not even sure if thats possible) need to look into it

#

But yeah, I had to upgrade to Cuda 11.7 so i can use the wheel provided by xformers repo

rose sentinel
#

Excellent! Unfortunately I'm on a ROCm system without xformer support, but am getting an Nvidia video card this weekend.

rugged moth
#

Sounds great. Would make the testing easier.

#

Which card did u get your hands on? 3x or 4x?

rose sentinel
#

As an aside, I did get the diffusers two-way model merge working, but three-way merging is still problematic.

rugged moth
#

Thats good. We can sort that out too as we go ahead i guess

rose sentinel
worldly cloak
#

I'm more worried about the crashes I don't have the hardware to reproduce, like what @wheat fiber was having on M1.

rugged moth
#

12gb?

#

its ampere architecture.. so thats great

rose sentinel
#

Yeah.

heavy glacier
#

I think Merge =/= Release

rugged moth
#

and 12gb would let you run dreambooth with optimizations

rose sentinel
#

I wanted to get something typical of what our users have.

heavy glacier
#

We should merge as soon as everyone is comfortable doing so (i.e., nobody thinks anything existing is fundamentally broken/regressed)

#

But I don't think we should release w/o 2.1

rugged moth
#

are we sure the image quality degradation is not to do with diffusers? on the inpainting / outpainting i mean?

#

whats the issue with 2.1? images are black with xformers?

heavy glacier
#

yeah

rose sentinel
heavy glacier
#

(re 2 - not sure on first q)

rugged moth
#

gimme a min .. let me check if thats the case with 2.1 on windows too

#

coz i am 100% certain i generated proper images without xformers on windows

#

cant tell you how much faster the boot up is now

#

i hated waiting for ckpt reloads

#

during development

rose sentinel
#

In terms of image quality regressions, I'm aware of (1) black images when using SD-2.1 without xformers; and (2) outpainting has bad seam issues.

rugged moth
#

wait 2.1 has black images without xformers?

rose sentinel
#

Very recently I've also noticed that images produced with some of the diffusers schedulers, in particular diffusers.HeunDiscreteScheduler, don't seem to be as good with the diffusers version of SD-1.5 as they were with the checkpoint version. However, I haven't done a systematic comparison. Is this the regression that @rugged moth was just talking about?

rugged moth
#

outpainting and inpainting has serious issues

rose sentinel
rugged moth
#

is there a command for it?

rose sentinel
rugged moth
#

or should uninstall the package?

rose sentinel
#

I'm not sure how you turn it off, because I can't use xformers on my platform.

worldly cloak
#

there's not currently a --disable-xformers option. uninstall or comment out a line in diffusers_pipeline.py

rose sentinel
#

I'm willing to live with the outpainting/inpainting issues and fix on main. Have to fix before release, though.

#

Inpainting seems OK to me (though I've not got a discerning eye). Outpainting is seriously ill.

rugged moth
#

yep 2.1 black images without xformers

#

which is weird

#

coz i definitely had it working before

#

wait let me test something

rose sentinel
#

2.1 works on CPU, which is also weird.

worldly cloak
#

I'm getting the impression that "black image" means "you hit some pytorch bug when using a view or non-contiguous memory or something and so you just got zeros back for some operation and good luck finding out where."

#

I'm thinking I need a wrapper method that does the "is this result all zeros?" check as a troubleshooting aid.

rose sentinel
#

If it helps at all, I sometimes see a single frame of the latent diffusion noise before it goes black in the web preview mode.

#

Actually, I guess this doesn't help at all, because the latent noise is written before the first scheduler iteration occurs. NVM

rugged moth
#

found the black image bug @rose sentinel

#

its related to float16

#

changing to full precision works fine

#

now need to see what is breaking it

worldly cloak
#

being somehow related to precision is consistent with that autocast-related bug report

rugged moth
#

apparently related to 2.1 768 model itself -- the 512 one runs fine. It's just the 768 one that needs to be full precision

#

because of their attention module native 768 runs on FP rather than half

#

the reason it works with xformers is coz xformers forces Half on the attentioning

#

@forest spade Sorry for the beep but I noticed your comment here that upgrading to 0.10 would fix the black image issue on using the 768 model with half precision but that does not seem to be the case. 768 model only seems to work on full precision. Would you know of a bypass for this? https://huggingface.co/stabilityai/stable-diffusion-2-1/discussions/9

rose sentinel
rugged moth
#

need to check if the depth and upscale model work fine

rose sentinel
#

Doubles the memory requirements! How much VRAM will that use?

rugged moth
worldly cloak
rugged moth
#

@worldly cloak do we know why the first frame is grey when we start invoking?

sour sun
rugged moth
#

the very first latent image that is received back is always grey

#

but im guessing thats just the grey before the noise is filled in

sour sun
#

AFAIK, that issue is only supposed to affect MPS, and my understanding is that it only produces different results, meaning a given seed will produce a different image if it's the first generation after startup. Could be that the fix for it would solve your problem as well, though.

frozen finch
#

If I'm building dev/diffusers from repo on windows is there some additional steps to setup xformers if it is not currently installed?

heavy glacier
#

yes, but we're currently working those out. lol

#

(so that you dont have to take any extra steps!)

#

@worldly cloak any thoughts on how i can avoid this torch.Size([79]) bug without modifying prompts?

#

trying to do promptcraft and am on the diffusers branch and... well, would feel odd if I modified someone's prompt to work 🙈

worldly cloak
#

oh geez. um. probably nothing I am in any condition to come up with this evening.

so I'd say either switch branches (or do you only need to switch back to the ldm ckpt model?), or manually chop tokens off the end

heavy glacier
#

chopping tokens off the end isnt doing much - i think this has to do with weighting and less about truncation logic 🤔

#

I will see what I can do to switch branches and not have everything implode! 🙃

west nebula
#

Thoughts and prayers

heavy glacier
#

oof

#

does not like having a diffusers model, so some models tweaking in order.

tardy sparrow
tardy sparrow
#

i've been looking into adapting the ckpt -> diffusers conversion script for on-the-fly loading and actually it seems pretty straightforward?

tardy sparrow
#

@forest spade above is the result of adapting the convert_original_stable_diffusion_to_diffusers.py script to an import-able module that returns the converted pipeline object rather than saving it to disk - should i make a pull request? code is attached

heavy glacier
#

And no I've only posted here

light glade
#

Just out of curiosity, how close are we to accepting models from SD 2?

heavy glacier
#

very

light glade
#

Nice 🥰

tardy sparrow
#

@rose sentinel see above - i've basically got a way of loading .ckpt directly as StableDiffusionPipeline objects without having to save any files on disk. i took a quick look at the model manager but it seemed to be a non-trivial task to inject this in, either as an option or as a default loading mechanism for .ckpt files.

worldly cloak
#

that's very good news!

I'm guessing the loading time is pretty close to what ckpt is normally?

tardy sparrow
#

yeah, pretty much

#

the first attempt has to download a copy of the CLIP model and a couple of other files to hf's cache, but that is shared for all subsequent loads

rose sentinel
heavy glacier
#
```
already exists
```
got this after a
** Legacy version <= 2.2.5 model directory layout detected. Reorganizing.
** This is a quick one-time operation.
DEBUG: Moving X:\stablediffusion\InvokeAI-files\models\CompVis\stable-diffusion-safety-checker\models--CompVis--stable-diffusion-safety-checker into hub
#

this might be from switching back/forth to main though

#

removing /hub/ seems to have worked.

#

maybe an edge case.

heavy glacier
# tardy sparrow what bug is this? is there a github issue?

Traceback (most recent call last):
  File "c:\users\kentr\invokeai\backend\invoke_ai_web_server.py", line 1216, in generate_images
    self.generate.prompt2image(
  File "c:\users\kentr\invokeai\ldm\generate.py", line 468, in prompt2image
    uc, c, extra_conditioning_info = get_uc_and_c_and_ec(
  File "c:\users\kentr\invokeai\ldm\invoke\conditioning.py", line 30, in get_uc_and_c_and_ec
    conditioning = _get_conditioning_for_prompt(prompt, negative_prompt, model, log_tokens)
  File "c:\users\kentr\invokeai\ldm\invoke\conditioning.py", line 106, in _get_conditioning_for_prompt
    conditioning, cac_args = _get_conditioning_for_cross_attention_control(model, parsed_prompt, log_tokens)
  File "c:\users\kentr\invokeai\ldm\invoke\conditioning.py", line 201, in _get_conditioning_for_cross_attention_control
    edited_embeddings, edited_tokens = _get_embeddings_and_tokens_for_prompt(model,
  File "c:\users\kentr\invokeai\ldm\invoke\conditioning.py", line 237, in _get_embeddings_and_tokens_for_prompt
    embeddings, tokens = model.get_learned_conditioning([fragments], return_tokens=True, fragment_weights=[weights])
  File "X:\anaconda\envs\invokeai\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "c:\users\kentr\invokeai\ldm\invoke\generator\diffusers_pipeline.py", line 601, in get_learned_conditioning
    return self.prompt_fragments_to_embeddings_converter.get_embeddings_for_weighted_prompt_fragments(
  File "c:\users\kentr\invokeai\ldm\modules\prompt_to_embeddings_converter.py", line 56, in get_embeddings_for_weighted_prompt_fragments
    base_embedding = self.build_weighted_embedding_tensor(tokens, per_token_weights)
  File "c:\users\kentr\invokeai\ldm\modules\prompt_to_embeddings_converter.py", line 223, in build_weighted_embedding_tensor
    raise ValueError(f"token_ids has shape {token_ids.shape} - expected [{self.max_length}]")
ValueError: token_ids has shape torch.Size([79]) - expected [77]

#

Thats the full error

#

seems like something is shaping it wrong (at 79) regardless of how many extra tokens there are, when weights &/or cross attention syntax are being used
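
For reference, a minimal sketch of the fixed-length layout the error message expects, assuming the usual CLIP convention of one start token, up to 75 prompt tokens, one end token, and padding out to 77. The token id values and the helper name `fit_to_max_length` are assumptions for illustration; this is not the code in prompt_to_embeddings_converter.py. A length of 79 is what you'd get if start and end tokens were added to a sequence that was already 77 long, but that is only a guess at the cause.

```python
BOS_ID = 49406  # assumed CLIP start-of-text id
EOS_ID = 49407  # assumed CLIP end-of-text id
MAX_LENGTH = 77


def fit_to_max_length(prompt_token_ids, max_length=MAX_LENGTH,
                      bos_id=BOS_ID, eos_id=EOS_ID, pad_id=EOS_ID):
    """Hypothetical helper: wrap prompt tokens in BOS/EOS and pad to max_length."""
    # Leave room for BOS and EOS before truncating the prompt tokens
    body = list(prompt_token_ids)[: max_length - 2]
    token_ids = [bos_id] + body + [eos_id]
    # Pad short prompts up to the fixed length
    token_ids += [pad_id] * (max_length - len(token_ids))
    return token_ids
```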

rose sentinel
#

@gusty hound I think the time has come to disable the conda tests. They are failing on dev/diffusers, and I would like to get the merge into main done. I assume you're in favor of this!

west nebula
#

@rose sentinel Is the TI in the diffusers branch a complete rewrite of what's there, are you using some code right out of diffusers, or is it something else?

rose sentinel
#

It is a slight adaptation of the code in diffusers.

gusty hound
west nebula
tardy sparrow
sour sun
#

Please, this. I have a custom script that uses this, and it can generate larger images, significantly faster, and with less memory pressure than invoke.

tardy sparrow
#

the trouble with slicing is that invoke's slicing already works extraordinarily well for cuda, and id worry that turning to diffusers implementation would negate that

sour sun
#

Once the RC is out, I'll compare performance with my minimal script.

tardy sparrow
sour sun
#

I'm not sure... how do I check my version? I just had to add torch_dtype=torch.float16, when setting up my pipeline, and it just worked...

#

Actually, it probably is a greater version than that... I think the setup instructions had me install a nightly...

#

looks like it was 1.13.

#

It's pytorch 1.13.1

tardy sparrow
#

great, thanks

west nebula
#

Trying the new requirements.txt...

```
  Downloading https://download.pytorch.org/whl/cu116/torch-1.13.1%2Bcu116-cp310-cp310-linux_x86_64.whl (1977.9 MB)
```
#

That doesn't seem right to me.

gusty hound
#

To have it posted where it belongs:

I have some trouble with the new diffusers main branch on my M1:

  • default model is set to 2.1 which is not working
  • inpainting is just giving me strange results
  • k_heun does double the amount of configured steps
gusty hound
west nebula
#

Really huge!

#

But I guess it has CUDA + CPU support?

#

I assume the cu116 means it does work on CUDA as well, otherwise this isn't going to work for me...

heavy glacier
heavy glacier
#

I think that might just be "2.1 is a weird model"

west nebula
#

@heavy glacier Am I reading it right that the pytorch 1.13.1 install is CUDA + CPU?

west nebula
#

Is there a reason why I'm getting the CPU version at all since I selected the CUDA requirements.txt?

#

It's just... big.

tardy sparrow
#

model quality results might be due to not running karras scheduler

#

i think it's shipping the entirety of CUDA

#

which is why it's so big @west nebula

#

judging by the macos install size the CPU build is like 50mb tops

west nebula
#

Well I guess I'm lucky then to get a 2GB file.

#

Good thing I don't have bandwidth caps.

unkempt timber
#

Wow I did not expect it to be merged that quickly
And I currently can't test :(

#

But will try

#

As soon as I can

west nebula
#

@rose sentinel I guess main.py should go away now?

rose sentinel
#

Yes, you're right. It should be removed and the textual inversion instructions need to be rewritten.

west nebula
#

Didn't realize we still needed HF tokens

#

Got this message during configure_invokeai.py that would be helpful to suppress:

Could not fetch half-precision version of model stabilityai/sd-vae-ft-mse; fetching full-precision instead

#

Comes up with multiple models, actually.

#

Could not fetch half-precision version of model Fictiverse/Stable_Diffusion_PaperCut_Model; fetching full-precision instead

west nebula
#

I ran configure_invokeai.py first before running.

heavy glacier
#

did you switch back to a diff branch?

#

either way

#

delete hubs folder

#

and it should work

west nebula
#

OK. Just alerting for future installers.

#

I did a git pull.

heavy glacier
#

but some logic for handling that folder is probably needed - something for @rose sentinel to handle

west nebula
#

upgraded via requirements.txt, then ran configure.

#

This should be seamless, agree.

#

configure_invokeai.py should probably check to see if the old ones are there prior to downloading new ones. That should take care of this type of issue.

#

Or it could do the migration instead of downloading again.
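
A minimal sketch of the "migrate instead of failing or re-downloading" idea, using only the standard library. The function name `migrate_legacy_dir` and the directory names in the usage lines are hypothetical; this is not what configure_invokeai.py does today.

```python
import shutil
import tempfile
from pathlib import Path


def migrate_legacy_dir(legacy_dir, hub_dir):
    """Move legacy_dir into hub_dir; return 'moved', 'skipped', or 'missing'."""
    legacy_dir, hub_dir = Path(legacy_dir), Path(hub_dir)
    if not legacy_dir.exists():
        return "missing"
    target = hub_dir / legacy_dir.name
    if target.exists():
        # Already migrated or re-downloaded: leave both in place rather than fail
        return "skipped"
    hub_dir.mkdir(parents=True, exist_ok=True)
    shutil.move(str(legacy_dir), str(target))
    return "moved"


# Usage in a throwaway directory (names are made up)
root = Path(tempfile.mkdtemp())
legacy = root / "models--example"
legacy.mkdir()
print(migrate_legacy_dir(legacy, root / "hub"))
```

Returning a status instead of raising when the target exists is a design choice here: it lets the caller decide whether a leftover legacy folder should be reported or cleaned up.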

#
```
>> Detected .pt file variant 1
>> Detected .pt file variant 1
>> Detected .pt file variant 1
>> Detected .pt file variant 1
>> Detected .pt file variant 1
>> Detected .pt file variant 1
>> Detected .pt file variant 1
>> Detected .pt file variant 1
>> Detected .pt file variant 1
>> Detected .pt file variant 1
>> Detected .pt file variant 1
>> TextualInversionManager refusing to overwrite already-loaded token 'ת'
>> Detected .pt file variant 1
>> TextualInversionManager refusing to overwrite already-loaded token 'ת'
>> Detected .pt file variant 1
>> TextualInversionManager refusing to overwrite already-loaded token 'ת'
>> Detected .pt file variant 1
>> Detected .pt file variant 1
>> Detected .pt file variant 1
>> Detected .pt file variant 1
>> Textual inversions available: charTurner, ת, comicart1000, comicart1500, comicart2000, comicart2500, comicart3000, comicart3500, comicart4000, conceptart, conceptart2, cxzz, jarz3, lfa, profood, starship```
#

Would be nice to know what files have ת as a token.

#

So I could clean this mess up!
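
Something like the following could produce that report. It is a sketch only: `find_token_conflicts` is a made-up name, and it assumes the token for each embedding file has already been read into a dict (reading the token out of a .pt file is not shown).

```python
from collections import defaultdict


def find_token_conflicts(tokens_by_file):
    """Map each token claimed by more than one file to the sorted list of those files."""
    files_by_token = defaultdict(list)
    for filename, token in tokens_by_file.items():
        files_by_token[token].append(filename)
    # Keep only tokens that more than one file wants to register
    return {tok: sorted(files) for tok, files in files_by_token.items() if len(files) > 1}
```

Printing the returned mapping alongside the "refusing to overwrite" message would show which files to rename or remove.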

worldly cloak
west nebula
#
```
Point your browser at http://localhost:8999 or use the host's DNS name or IP address.
>> System config requested
>> Patchmatch initialized
>> System config requested
>> Image generation requested: {'prompt': '(a kitten), comicart1500', 'iterations': 1, 'steps': 50, 'cfg_scale': 9, 'threshold': 0, 'perlin': 0, 'height': 768, 'width': 768, 'sampler_name': 'ddim', 'seed': 4059114571, 'progress_images': False, 'progress_latents': True, 'save_intermediates': 5, 'generation_mode': 'txt2img', 'init_mask': '...', 'seamless': False, 'hires_fix': True, 'variation_amount': 0}
ESRGAN parameters: False
Facetool parameters: False
{'prompt': '(a kitten), comicart1500', 'iterations': 1, 'steps': 50, 'cfg_scale': 9, 'threshold': 0, 'perlin': 0, 'height': 768, 'width': 768, 'sampler_name': 'ddim', 'seed': 4059114571, 'progress_images': False, 'progress_latents': True, 'save_intermediates': 5, 'generation_mode': 'txt2img', 'init_mask': '', 'seamless': False, 'hires_fix': True, 'variation_amount': 0}
'WeightedPromptFragmentsToEmbeddingsConverter' object has no attribute 'device'


Traceback (most recent call last):
  File "/home/jovyan/work/InvokeAI/backend/invoke_ai_web_server.py", line 1216, in generate_images
    self.generate.prompt2image(
  File "/home/jovyan/work/InvokeAI/ldm/generate.py", line 463, in prompt2image
    if self.free_gpu_mem and self.model.cond_stage_model.device != self.model.device:
AttributeError: 'WeightedPromptFragmentsToEmbeddingsConverter' object has no attribute 'device'```
#

Also getting this on first run.

#

But! Even though I can't render anything, model loading time is fantastic.

#

If I use a converted-on-the-fly ckpt, I don't get that error and I can render things.

worldly cloak
#

hmm, I uninstalled all my embeddings back when they were slow, and haven't tried again since the deferred loading for them went in.

#

oh, I also see that's a conditional on free_gpu_mem.

yeah that option needs more investigation

west nebula
#

I'll turn it off for now but let me know if you want me to test anything.

#

Getting wildly different results between 2.2.x and main with an embedding:

#

I'd argue the latter is better, which is good, but I'd love to know what's causing the difference.

worldly cloak
#

yeah that's pretty dramatic

west nebula
#

I was using the full 1.5 model before. Is that available via huggingface?

#

v1-5-pruned.ckpt - 7.7GB, ema+non-ema weights. uses more VRAM - suitable for fine-tuning

#

It seems that now I'm using the diffusers one which is smaller.

worldly cloak
#

I don't believe there's any reason to use the non-ema weights for inference.

"but keturn," you say, "what do I use when I want to train dreambooth or textual inversion? Don't I need the "suitable for fine-tuning" weights then?"

good question, hypothetical questioner! I have no idea.

west nebula
#

Thank you for anticipating my needs.

#

Just wondering if the model change accounts for the difference in output.

worldly cloak
#

What's the same prompt look like without the embedding loaded?

tardy sparrow
#

@west nebula if you have steps <30 it could be karras scheduling. try passing --karras_max 1 to main, if that gives you the same output as diffusers then that's the culprit

west nebula
#

ddim 50 7.5CFG

rose sentinel
dire gazelle
# rose sentinel I will be back to working on these rough around the edges problems later this af...

I may have already fixed some of it here: https://github.com/invoke-ai/InvokeAI/blob/820577b112fd36f11646ebd8a00dec88030a4a27/ldm/invoke/config/configure_invokeai.py. But my PR is unmergeable again and will need another intense rebase. Hoping to get it to a mergeable/testable state late tonight/early tomorrow morning. Could I ask to hold off on the configure_invokeai script changes until then, please? 🙏

worldly cloak
#

as loath as I am to cite Twitter as a source, Tanishq is pretty reliable on this sort of thing. (part-time at Stability and also the organizer of EleutherAI's diffusion paper reading group.) https://twitter.com/iScienceLuvr/status/1601011140934664193

I think there is some confusion in the #AiArt community about Stable Diffusion weights. Some folks think non-EMA is for inference, while EMA is for fine-tuning.

Use EMA weights for generation!

You can use non-EMA for fine-tuning but probably fine to use EMA for fine-tuning too

west nebula
#

See, I'm not crazy for always using the full model! Maybe!

#

I switched back to the full 1.5 ckpt and the image matches the original again.

#

So is there a full non-ema diffusers 1.5?

worldly cloak
west nebula
#

😦

#

Isn't that what should be used for the TI training anyway?

rose sentinel
#

@rugged moth in order to support the 2.1 model without xformers I'm going to add an optional precision field to the models.yaml. It will have 3 choices: auto, fp16 and fp32. Auto will use whatever the user's precision is set to, and will be the default if the field isn't present (this is current behaviour). Either of the others will force the chosen precision.
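
A small sketch of the resolution rule described above, under the assumption that the field takes exactly the three values named in the message; the spellings finally accepted in models.yaml may differ. `resolve_precision` is a hypothetical name, not the model manager's code.

```python
VALID_PRECISIONS = ("auto", "fp16", "fp32")


def resolve_precision(model_precision, user_precision):
    """Pick the precision for one model stanza.

    model_precision: value of the optional `precision` field, or None if absent.
    user_precision: the user's global precision setting ("fp16" or "fp32").
    """
    # A missing field behaves like "auto"
    choice = model_precision or "auto"
    if choice not in VALID_PRECISIONS:
        raise ValueError(f"precision must be one of {VALID_PRECISIONS}, got {choice!r}")
    # "auto" defers to the user; anything else forces the chosen precision
    return user_precision if choice == "auto" else choice
```

With this rule, a stanza for the 768 model could set `fp32` and stay in full precision even when the user runs everything else in half precision.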

rose sentinel
west nebula
worldly cloak
#

revision is branch, effectively. (i.e. both branches and tags may be used to refer to a revision in git)

west nebula
#

What happens if I do repo_id: runwayml/stable-diffusion-v1-5/tree/non-ema?

worldly cloak
#

diffusers will be like "that's not a repo_id :crysad:"
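
To spell out the distinction: the repo id and the branch are separate things, and diffusers' `from_pretrained` takes the branch through its `revision` argument. The helper below is a hypothetical illustration of splitting a pasted "tree" path into the two parts; `split_repo_and_revision` is not a diffusers or InvokeAI function.

```python
def split_repo_and_revision(spec):
    """Split 'owner/name/tree/branch' into ('owner/name', 'branch').

    A plain 'owner/name' is returned with revision None.
    """
    cleaned = spec.strip("/")
    parts = cleaned.split("/")
    if len(parts) == 2:
        return cleaned, None
    if len(parts) >= 4 and parts[2] == "tree":
        return "/".join(parts[:2]), "/".join(parts[3:])
    raise ValueError(f"not a repo id or repo tree path: {spec!r}")


repo_id, revision = split_repo_and_revision("runwayml/stable-diffusion-v1-5/tree/non-ema")
# The two values would then be passed separately, e.g.
# StableDiffusionPipeline.from_pretrained(repo_id, revision=revision)
```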

west nebula
#

diffusers 😦

#

So the good news is that everything works as before if I don't use the non-ema checkpoint - thus embeddings work perfectly.

#

So minor hiccups are IMO a lack of clarity about conflicts with embedding terms, problems with --free_gpu_mem, and misc. model manager problems that I've run into but haven't investigated yet.

#

Oh, and a warning on every render about the safety checker...

#

@rose sentinel In the TI training text UI, once you hit enter on a path selector you can't get out without picking an existing path. Is there a way to make a new directory or back out of the selector?

#

Really like the text UI for this!

#

BUT...

#

Looks like something may not be getting read correctly from my inputs.

#

I changed the output directory and it didn't use my selection.

tardy sparrow
west nebula
#

OK, I'll give it a shot. Didn't work with the ckpt because that's not supported by diffusers.

#

(I didn't convert yet.)

rose sentinel
west nebula
rose sentinel
#

The file selector is a bit weird but if you go to the bottom and just type out the path you want it will create the directory.

west nebula
rose sentinel
west nebula
#

No rush, just wanted to test things out and let you know.

sour sun
#

Do the manual install instructions need to be updated for diffusers? I tried doing a manual install from the current main, and the configure_invokeai.py step gives me this:

worldly cloak
rose sentinel
west nebula
rose sentinel
#

You can take any of the checkpoint files and place them into the embeddings folder. These are at the top level of text-inversion-training and have names like learned_embeds-steps-1000.bin

rose sentinel
west nebula
#

Out of curiosity, where does the per-step code for diffusers live?

#

If I were to, say, try implementing symmetry in some fashion?

sour sun
# tardy sparrow ooh. how do you do fp16? does that need torch > 1.13?

Got the diffusers branch running. It's significantly slower and uses significantly more RAM than my script that uses fp16, and roughly the same speed as invoke 2.2.5. Both diffusers and non-diffusers seem a bit slower than 2.2.5, likely having to do with the additional code eating a bit more RAM, and thus increasing memory pressure. The generation I tested pushes this machine's RAM limits in fp32. I'd like to see if I can fix this by setting invoke's diffusers pipeline to fp16. Searching through the code now to figure out where to do that, but if someone could point me in the right direction...

#

Nevermind, it's a command line variable, duh.

#

Throws an error, but it seems to be from something in invokeai/lib/python3.10/site-packages/torch/amp/autocast_mode.py hard-coded to do that when you try to use fp16 with mps.

#

bypassing that causes more errors (type mismatch stuff, it looks like), so not something I can fix on my own. Other libraries that need to be updated, maybe? It IS possible to run fp16 on mps with a diffusers pipeline, though...

sour sun
#

And installing a pytorch nightly from within my venv doesn't help

#

Let me know if there's anything else you want me to try. I'm out of ideas.

worldly cloak
#

That PR from pcuenca was included in diffusers 0.7, which was a long time ago in diffusers-years.

sour sun
#

Yeah, just thought it might be useful information, since it was adding functionality that isn't currently working in invoke.

#

So, this is probably very useful information. I was having trouble figuring out how to add a VAE to my diffusers model in invoke (I'm getting minor differences in image generations between diffusers and non-diffusers versions of the same model, but they seem to be the kind of stuff that a VAE difference would explain). Anyway, in doing so, I realized that I never added a VAE to my custom script. When I tried to do that, I get what appears to be the exact same type error that I got in invoke, after bypassing the fp16/mps error.

rugged moth
#

@clear hinge what do you reckon is causing the low quality seam fix with the diffusers

clear hinge
rugged moth
#

no

#

its only the seam

clear hinge
#

probably the inpaint.py code doesn't work the same anymore. It basically did get_make_image on itself and called that to do a seam paint, then tried to restore everything after doing that.

rugged moth
#

it seems fine in the debug images until the very end

#

i feel it might have something to do with the last make_noise call not working as intended

#

its almost like an extremely low res image is being pasted back

#

in the seam area

sour sun
#

I can currently use fp16 OR load a VAE in my script, but trying to do both produces that type error.

rose sentinel
#

The same PR code should allow you to force fp16 on models and VAEs. Use precision: float16 in the model stanza (untested, and probably won't help @sour sun )

#

There are now a bunch of PRs queued up, each of which make model management a bit better. When folks have a chance, please have a look. Also, I'm creating a new channel for version 2.3, now that diffusers is in main.

clear hinge
# rugged moth ive been trying to debug it

This seems like a concerning change:

        mask_image = mask.convert('RGB'), # Code currently requires an RGB mask

to

        mask_image = mask,

infill_method also now gets passed to seam_paint, though it's not needed - could indicate that it was passed to get around an introduced bug:

        inpaint_height = im.height

to

        inpaint_height = im.height,
        infill_method = infill_method

rugged moth
clear hinge
#

Found this in the change too:

        if mask_image.mode != "L":
            # FIXME: why do we get passed an RGB image here? We can only use single-channel.
            mask_image = mask_image.convert("L")
rugged moth
#

and it cant multiply with the alpha mask otherwise

worldly cloak
#

I just think it's hilarious that when you were preparing 2.2, I was saying "patchmatch isn't production-ready, leave that for later."

but everyone was "noo! we need patchmatch! everything is horrible without it!"

so then when I was doing all the diffusers migration of inpainting, I made sure everything used the chosen infill method consistently.

and now y'all are like "this code is broken, make it not use patchmatch." 😆

clear hinge
#

uh... I don't think seam painting has anything to do with infill/patchmatch.

rugged moth
#

its broken on both. not to do with them

rugged moth
worldly cloak
#

does the problem with seams only show up in combination with canvas use, or is it present in CLI as well?

I ask because it's a lot easier to run test cases with CLI.

worldly cloak
#

and if I understand the reports from hipsterusername correctly, it's only a sometimes-thing, so I haven't been confident about reproducing it at all.

heavy glacier
#

I would imagine canvas 🤔

#

(but havent tested CLI)

heavy glacier
#

"sometimes" for me - it seems

#

or should I say... SEAMS

rugged moth
#

but the image degradation on the seams has been there for me 100% of the time so far

heavy glacier
#

that's possible

#

sometimes its just garbage

rugged moth
#

patchmatch and infill both have it

heavy glacier
#

strange...

clear hinge
#

you have all the debug images available?

#

(I have the equivalent of the engine pulled apart on my desk in the nodes branch and so little time to code lately that I don't want to switch branches or rebase at the moment)

rugged moth
#

@clear hinge final result

clear hinge
#

Does it not spit out debug images for the seam paint?

#

The second init_filled and the masks after that I didn't expect to look like that

#

Pretty blurry/foggy image going into it too 😛. Weird that it looks less blurry on the seam.

rugged moth
clear hinge
rugged moth
#

seam_mask

#

hmm .. interestingly the seam result is not showing up

clear hinge
#

What's the init_img when seam painting?

rugged moth
#

init_filled

#

also i just noticed

#

it runs that thing twice

#

is there a reason for it?

#

wait let me check again

#

@clear hinge seam fix init img is the same as corrected_result

#

also is this "Result" not supposed to be an image? i cannot debug it until the original settings are restored

clear hinge
#

Uh... I don't know now. May have been a tensor?

#

Hrm... Well, it seems like init img and mask are correct

rugged moth
#

yep

clear hinge
#

That code structure was a bit messy though. Aside from that, it tries to replace the make_image function, then restores it afterward

#

But the code path for the whole thing stores some stuff on the class, some stuff gets passed to make_image, and some stuff is captured.

#

If you want to semi-simplify it you could move seam-painting up to generate.py for the time being

#

And just create a new inpaint for the seam paint call to avoid any variable conflict stuff

rugged moth
#

heres what i noticed

#

when i debug images

#

after the seam_paint runs

#

the whole init_img thing runs all over again

#

coz i get debug images for init_img again

clear hinge
#

Yah it calls get_make_image again after seam paint to restore settings

#

Since if you're doing iterations, the next iteration would be broken if you didn't do that

rugged moth
#

ah

#

seems like whatever is broken is broken inside seam_paint

#

trying to pin point what

#

reckon its the noise that is being filled/

#

?

clear hinge
#

Maybe doesn't initialize something correctly

rugged moth
#

see any issues here?

clear hinge
#

No, but I'm not super familiar with that code 😣

#

For seam paint, the most important thing was setting all the variables correctly when calling get_make_image, then restoring them correctly afterward
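
One way to make that set-then-restore pattern harder to get wrong is a context manager that restores the saved values even if the seam pass raises. This is a generic sketch, not the inpaint.py code; `temporary_attrs` and the attribute names in the usage lines are hypothetical.

```python
from contextlib import contextmanager

_MISSING = object()


@contextmanager
def temporary_attrs(obj, **overrides):
    """Set attributes on obj for the duration of the block, then restore them."""
    saved = {name: getattr(obj, name, _MISSING) for name in overrides}
    for name, value in overrides.items():
        setattr(obj, name, value)
    try:
        yield obj
    finally:
        # Runs on normal exit and on exceptions alike
        for name, old in saved.items():
            if old is _MISSING:
                delattr(obj, name)
            else:
                setattr(obj, name, old)


class FakeGenerator:
    """Stand-in object for the usage example."""
    mask = "original mask"


gen = FakeGenerator()
with temporary_attrs(gen, mask="seam mask"):
    pass  # the seam paint call would go here
```

After the `with` block, `gen.mask` is back to its earlier value, so the next iteration starts from the original settings.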

rugged moth
#

you can see from the latents here that the area that is being seam painted is very different

#

which makes me feel that the issue might be with the noise

clear hinge
#

Hrm... So in the old code, the init img got stored as pil_image, but then converted to a tensor and used for everything (including generating first stage).

worldly cloak
#

some of that conversion code got shuffled over to the diffusers_pipeline side of things

#

oh, hmm, there is some code in inpaint.py that's behind if isinstance(init_image, PIL.Image.Image): conditionals, but I'm not sure how optional it really is?

that is, if you passed something that wasn't an Image, then none of that would run, and that might be bad.

clear hinge
#

Yah I forget what the use case was for that

rugged moth
#

Bug: Setting img2img strength at 0.01 throws IndexError: index 0 is out of bounds for dimension 0 with size 0 during img2img

worldly cloak
#

ah, that one I did know about at one point but haven't made a tracking issue for it since the branch closed. will do.

worldly cloak
#

obviously it shouldn't crash, but UX wise, what should the behavior there be?

it's flipping out because it thinks you just asked it to do a zero-step img2img

rugged moth
#

which might be the easier fix

worldly cloak
#

well the inconvenient thing is that the way img2img strength has been implemented, it works by running some fraction of the requested step count.

so the minimum is not 0.1, it's 1/steps, more or less.

#

by the same token, if you're running n steps, you can't really get up to 0.999, you can only get up to (n-1)/n
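
The arithmetic can be sketched like this, assuming the step count is obtained by simple truncation of steps × strength (the "more or less" above means the real code may round differently). Both function names are made up for the example.

```python
def img2img_steps(total_steps, strength):
    """Number of denoising steps actually run, assuming simple truncation."""
    return min(int(total_steps * strength), total_steps)


def reachable_strengths(total_steps):
    """The distinct strengths that map to 1 .. total_steps - 1 steps."""
    return [k / total_steps for k in range(1, total_steps)]


# With 50 steps, a strength of 0.01 truncates to zero steps
zero_step_case = img2img_steps(50, 0.01)
# and 0.999 runs 49 of the 50 steps, i.e. an effective strength of 49/50
near_one_case = img2img_steps(50, 0.999)
```

So with n steps the slider really only has the n - 1 positions returned by `reachable_strengths(n)`, plus the endpoints.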

rugged moth
#

its been like this since the original compvis implementation

heavy glacier
#

at one point people proposed exposing steps to run

worldly cloak
#

yep. so having the strength setting be a continuous slider is misleading. discrete intervals would be more accurate.

and also kind of a crummy UI all around, so we should change the implementation, but... that's another story.

heavy glacier
#

But i find that confusing from a UX perspective

rugged moth
#

and it worked fine in the frontend .. not sure when and where it broke

#

but yeah.. its a silly thing

heavy glacier
#

didnt we decide to allow 1?

#

instead of doing inpaint replace?

rugged moth
heavy glacier
#

and we need to remove inpaint replace ui right?

rugged moth
#

the last change on that code was 3 months ago when i changed it to cap at 0.99

heavy glacier
#

right the cap was .99 because 1 would effectively ignore the og image - but thats what inpaint replace is doing so we figured it was more streamlined to just allow 1

heavy glacier
#

that seems reasonable

#

UX should be improved but not sure its imperative for diffusers release

#

dont think itll behave differently than it does today

#

right?

worldly cloak
#

true

west nebula
#

I'm experimenting with symmetry with diffusers and I put this code in sampler.py's do_sampling loop just before the call to p_sample. It doesn't seem to matter what percentage of steps I use, the result looks completely symmetrical rather than somewhat symmetrical. Any ideas?

            if percent_done < 0.05:
                # flip the image tensor and use the first half of the original followed by the second half of the flipped tensor
                width = img.shape[3]
                to_use = int(width / 2)
                x_flipped = torch.flip(img, dims=[3])
                img = torch.cat([img[:, :, :, 0:to_use], x_flipped[:, :, :, to_use:width]], dim=3)
#

With the pre-diffusers code, the symmetry had the desired effect.
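
To make it concrete what that snippet does to a single row of the latent, here is a plain-Python sketch using lists instead of tensors (`mirror_left_half` is just an illustrative name, not something in the codebase):

```python
def mirror_left_half(row):
    """Keep the left half of a row and fill the rest from the flipped copy."""
    width = len(row)
    to_use = int(width / 2)   # same split point as the snippet above
    flipped = row[::-1]       # stands in for torch.flip(img, dims=[3])
    # left half of the original followed by the tail of the flipped row
    return row[:to_use] + flipped[to_use:width]


print(mirror_left_half([1, 2, 3, 4, 5, 6]))  # [1, 2, 3, 3, 2, 1]
```

Each call leaves the row a perfect mirror of its left half, whatever the right half held before.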

worldly cloak
#

I'm surprised to hear the Sampler class has any effect on the diffusers pipeline at all

#

I think InvokeAIDiffuserComponent is roughly where I'd try to put that sort of thing, though it is quickly outgrowing how much we should fit in a single class.

west nebula
#

Just didn't know where to throw this test code, so thanks. I'll move it to InvokeAIDiffuserComponent and see what blows up!

#

It seems that do_diffusion_step has no access to the total step count, hmm.

tardy sparrow
#

yep

#

that limitation is possibly a hangover from the compvis code

#

because with the diffusers pipeline, we have full control/access over the steps and step index

#

with compvis we only had full access with ddim, the k_* samplers abstracted the step index and step count out of the upstream calls

#

so i had to estimate by checking the current value of sigma against the sigmas array of the model/sampler. you should still be able to see that code in the do_cross_attention_controlled_diffusion_step() function or whatever it's called
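
roughly the idea, as a plain-python sketch rather than the actual InvokeAI code (it assumes the schedule is a list of sigmas in descending order, and `estimate_percent_from_sigma` is a hypothetical name):

```python
def estimate_percent_from_sigma(sigmas, sigma):
    """Estimate progress by locating the current sigma in the schedule.

    ASSUMPTION: `sigmas` is the full noise schedule, largest value first.
    """
    if not sigmas:
        raise ValueError("empty sigma schedule")
    # every schedule entry strictly above the current sigma is a finished step
    steps_done = sum(1 for s in sigmas if s > sigma)
    return steps_done / len(sigmas)


schedule = [10.0, 5.0, 2.0, 1.0]
print(estimate_percent_from_sigma(schedule, 10.0))  # 0.0
print(estimate_percent_from_sigma(schedule, 2.0))   # 0.5
```

it only works when the sampler exposes its sigmas, which is exactly what goes missing in some code paths.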

west nebula
#

estimate_percent_through doesn't work now, either.

#
    percent_done = self.estimate_percent_through(step_index, sigma)
  File "/home/jovyan/work/InvokeAI/ldm/models/diffusion/shared_invokeai_diffusion.py", line 255, in estimate_percent_through
    smaller_sigmas = torch.nonzero(self.model.sigmas <= sigma)
  File "/home/jovyan/work/InvokeAI/invokeai/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1269, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'UNet2DConditionModel' object has no attribute 'sigmas'
#

That's when using ddim. Seems that the k_ samplers do provide that, at least the one I tried

tardy sparrow
#

ooh

#

ok i guess that needs fixing properly

west nebula
#

But the results look horrible with ddim and great with the k_ samplers even if I work around that.

#

ddim vs k_dpmpp2

#

And you can't use step_count if using a k_sampler:

    percent_done = step_index / 50
TypeError: unsupported operand type(s) for /: 'NoneType' and 'int'
#

I wonder if all of this should be normalized prior to hitting this layer.

tardy sparrow
#

@rose sentinel i'm not sure that this is correct:

#

i think it should be 'hub':

damian@d-mba2 matrix % find ~/.cache/huggingface -iname "*runwayml*"
/Users/damian/.cache/huggingface/hub/models--runwayml--stable-diffusion-v1-5
#

above is what i already have on my disk from building grate - which simply calls StableDiffusionPipeline.from_pretrained('runwayml/stable-diffusion-v1-5')

rose sentinel
#

That's odd. The diffusers pipelines end up in diffusers when I try the same thing. What version of huggingface_hub are you using?

tardy sparrow
#

0.11.1

tardy sparrow
#

hmm, no, wait, i'm pretty sure i was using k_lms..

unkempt timber
#

True

west nebula
#

Determinism isn't a bad thing.

tardy sparrow
#

agreed. i think the real fix is to pass the step index properly through.

west nebula
#

And also number of steps total?

#

(Even if I hardcode that stuff in, my output gets all mangled. So maybe this isn't the place to be hooking up symmetry.)

rose sentinel
# tardy sparrow 0.11.1

Here's what I get on my system:

$ tree .cache/huggingface/
.cache/huggingface/
└── hub
    └── version.txt
$ python
>>> from diffusers import StableDiffusionPipeline
>>> pipeline = StableDiffusionPipeline.from_pretrained('stabilityai/stable-diffusion-2-depth')
$ tree -L 2 .cache/huggingface/
.cache/huggingface/
├── diffusers
│   └── models--stabilityai--stable-diffusion-2-depth
└── hub
    └── version.txt

3 directories, 1 file

huggingface_hub 0.11.1, diffusers 0.11.1
No difference when using runwayml/stable-diffusion-v1-5. I can't explain the difference we are seeing.

west nebula
#

@rose sentinel I didn't get your TI branch to crash but it also doesn't resume training. See logs below.

#

It still has 2K steps to go...

west nebula
#

Should that code all be moved to estimate_percent_through and estimate_percent_through be used throughout?

#

Otherwise if self.cross_attention_control_context is None, one has to duplicate all of that effort to get the percent through.

#

If that isn't a concern and we don't care if estimate_percent_through works right, looks fine.

#

I'll toss some debugging statements in and test.

#

total_step_count should be documented at the top of the function.

#

So with a ckpt model loaded, step_index is good but total_step_count is None when using ddim.

#

Same model, both step_index and total_step_count are None when using k_.

#

With a diffusers model, step_index and total_step_count look good with all samplers.

#

Is that what you expected, @tardy sparrow?

west nebula
#

So of those cases, are there any where percentage cannot be estimated?

tardy sparrow
#

nope,

#

estimate_percent_through is scheduled for removal when the compvis code paths are dropped

#

because with diffusers we can always guarantee to provide both step_index and total_step_count

west nebula
#

But your code isn't executed if cross attention isn't being used, right?

#

So that should be pulled out IMO.

#

We should still have access to % complete.

tardy sparrow
#

you can just push it above the if, i just didn’t want to pollute the fn namespace with variables that aren’t used except in certain use cases

west nebula
#

That's why I was suggesting throwing it into the estimate_percent_through function that we could keep around.

#

My thought process is that at some point we may need this with nodes if we're going to do a per-step adjustment or calculation or whatever.

#

Something like symmetry for X% comes to mind. 🙂

tardy sparrow
#

sure, its just a simple division tho and it’s not an estimate

#

i would prefer the function goes away and whoever needs it just has access to the step index and total count vars
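
i.e. something like this, as a sketch (`percent_through` is a hypothetical helper name; the None handling is only there because the ckpt code paths reportedly hand back None for one or both values):

```python
from typing import Optional


def percent_through(step_index: Optional[int],
                    total_step_count: Optional[int]) -> Optional[float]:
    """Exact progress as a fraction, or None when either value is unavailable."""
    if step_index is None or not total_step_count:
        # legacy ckpt samplers may not provide one or both values
        return None
    return step_index / total_step_count


print(percent_through(25, 50))    # 0.5
print(percent_through(None, 50))  # None
```

callers that want "symmetry for the first X%" can then compare against the returned fraction and skip the feature when it comes back None.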

west nebula
#

Is there a way to get those for ckpt?

#

I should convert the ckpt and see if the same issues exist since that's the path forward...

west nebula
#

Ah. Got it. So when we yank out native ckpt support, that whole path goes away entirely and we can use raw steps + total.

#

I suppose I should hold off on experimenting again until the dust settles.

rose sentinel
#

Or Damian will get to it first.

rose sentinel
#

I recently fixed using a workaround.

forest spade
#

Interesting! Would be happy to look into it if you could file a quick issue. In general, any reason to use autocast instead of pure fp16?

#

BTW, I'm fully back to work and happy to help with some changes to diffusers .

What are the most pressing things from your side?

worldly cloak
#

wait, do we have autocast on? where is that? oh, I think there was a hint as to where that is in a traceback that gogurt filed, I'll go check.

rose sentinel
#

It's in generator/base.py, where there is a call to choose_autocast(). The latter is in devices.py.

#

Is autocast unnecessary? It is there from the original CompVis code.

#

Really easy to turn autocast off globally just by editing choose_autocast().

worldly cloak
#

@forest spade I think the most pressing things are:
• SD 2.x with float16 as we've just been talking about, https://github.com/invoke-ai/InvokeAI/issues/2329
• MPS with float16 https://github.com/invoke-ai/InvokeAI/issues/2336
• and we still need to wrap up the cross-attention stuff. I think damian has been reluctant to build on the new API before it comes out in a diffusers 0.12 release. Maybe uncertain if it's done, or if you're making additional changes for LoRA. (partly my own interpretation; I don't want to put words in @tardy sparrow's mouth.)

worldly cloak
#

nope, looks like we have some things to clean up there. Welp, wish I had known about that earlier

rose sentinel
#

Right now, SD-2.1 will work on non-xformers CUDA systems by virtue of being forced to use float32, as soon as PR 2335 goes in.

worldly cloak
#

I will, when I get enough round tuits, but I wouldn't mind if someone else gets there first.

forest spade
#

Gotcha, generally it is never recommended to use autocast in diffusers. We completely dropped support for it as it's really slowing down generation (see https://github.com/huggingface/diffusers/pull/511)

So nothing should be wrapped in autocast as it will always be slower than just doing torch_dtype=torch.float16

#

With diffusers 0.11.0 SD 2.1 works with torch_dtype=torch.float16 (no autocast)

#

We can do a release next week with all the attention loading and also better textual inversion loading

west nebula
#

Leaving this dump here. Can't get it to reliably reproduce but it's happened a few times for me.

#

I have noticed that GPU memory stays high after a cancellation of a render (via GUI) until something renders successfully.

worldly cloak
#

if it feels overwhelming to do the whole codebase, we can try some more focused application of autocast(enabled=False) around the diffusers stuff.

west nebula
#

Nope.

#

It tries to get it close to a 262144 pixel area.

#

But it should always be a multiple of 64 unless I'm wrong...? @rose sentinel?

#
        scale = 512 / scale_dim

        init_width = math.ceil(scale * width / 64) * 64
        init_height = math.ceil(scale * height / 64) * 64
#

That's the ckpt code

west nebula
#

Diffusers has a completely different approach.

#

return tuple((x - x % multiple_of) for x in args)

#

Where multiple_of defaults to 8... so I'm confused here.
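
Side by side, the two rounding rules quoted above behave like this (a sketch only; the function names are mine and the scaling step from the ckpt path is left out):

```python
import math


def round_up_to_multiple(x: float, multiple_of: int = 64) -> int:
    """ckpt-style rule: round UP to the next multiple (64 in the quoted code)."""
    return math.ceil(x / multiple_of) * multiple_of


def trim_to_multiple_of(*args: int, multiple_of: int = 8) -> tuple:
    """diffusers-path rule as quoted: trim DOWN to a multiple (default 8)."""
    return tuple((x - x % multiple_of) for x in args)


print(round_up_to_multiple(500))      # 512
print(trim_to_multiple_of(500, 770))  # (496, 768)
```

So the same requested size can come out differently depending on the path: one rounds up in steps of 64, the other trims down in steps of 8.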

rose sentinel
#

@worldly cloak I found this very old line of code in the ckpt loader:
model.to(torch.float16)
I think it's garbage. model.to() sets the device, not the precision. Is this correct?

rose sentinel
#

The diffuser code path is now working with fp16 and no autocast. I am not touching the ckpt code path, which still uses autocast. I'll do a little testing and then make a PR.

rose sentinel
#

@worldly cloak Unfortunately, removing the autocast context does not fix the black image problem when running with float16. I'm afraid it may be a float16 issue rather than an autocast issue. I am pretty sure that I removed the autocast context completely.

rose sentinel
#

For what it's worth, here's code that works:

from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler
import torch

model_id = "stabilityai/stable-diffusion-2-1"
# Use the DPMSolverMultistepScheduler (DPM-Solver++) scheduler here instead
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16, cache_dir='/home/lstein/invokeai/models/diffusers')
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")
prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]
image.save("astronaut_rides_horse.png")
#

And here's code that produces a black image:

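from diffusers import DPMSolverMultistepScheduler
# Assumed module path for InvokeAI's pipeline subclass; the snippet as posted omitted this import.
from ldm.invoke.generator.diffusers_pipeline import StableDiffusionGeneratorPipeline
import torch

model_id = "stabilityai/stable-diffusion-2-1"
# Use the DPMSolverMultistepScheduler (DPM-Solver++) scheduler here instead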
pipe = StableDiffusionGeneratorPipeline.from_pretrained(model_id, torch_dtype=torch.float16, cache_dir='/home/lstein/invokeai/models/diffusers')
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")
prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]
image.save("astronaut_rides_horse.png")
#

The difference is that one is using StableDiffusionGeneratorPipeline, and the other is stock StableDiffusionPipeline.

rose sentinel
#

There is only one place in the code base that retrieves the autocast constant, and I have a print() statement just before it (which isn't triggered), so I don't think autocasting is the issue.

west nebula
#

Also, that's a lot of numbers in the GUI dropdown!

worldly cloak
#

pretty sure your traceback above is a swap() crash

rose sentinel
#

Does pipe.disable_xformers_memory_efficient_attention() have the same effect as uninstalling xformers completely? Because I don't want to do that -- it took quite a while to get it installed.

rose sentinel
#

I will also try updating torch-rocm on the AMD machine.

rose sentinel
#

Ok, looks like I missed embiggen. Easy fix.

west nebula
#

That's txt2img2img.

#

I wasn't using embiggen, rather --hires_fix.

rose sentinel
#

Oh, sorry. I was mixed up.

#

Should be easy fix. Hang on.

west nebula
#

Want me to test a change locally first?

rose sentinel
#

Let me figure out where it is....

#

...found it

#

If you would, could you check perlin noise as well? I didn't address that part of the code.

west nebula
#

Sure, hang on a sec.

#

Nope!

rose sentinel
#

Here's the fix for txt2img2img:

diff --git a/ldm/invoke/generator/txt2img2img.py b/ldm/invoke/generator/txt2img2img.py
index e356f719..1dba0cfa 100644
--- a/ldm/invoke/generator/txt2img2img.py
+++ b/ldm/invoke/generator/txt2img2img.py
@@ -90,9 +90,9 @@ class Txt2Img2Img(Generator):
     def get_noise_like(self, like: torch.Tensor):
         device = like.device
         if device.type == 'mps':
-            x = torch.randn_like(like, device='cpu').to(device)
+            x = torch.randn_like(like, device='cpu', dtype=self.torch_dtype()).to(device)
         else:
-            x = torch.randn_like(like, device=device)
+            x = torch.randn_like(like, device=device, dtype=self.torch_dtype())
         if self.perlin > 0.0:
             shape = like.shape
             x = (1-self.perlin)*x + self.perlin*self.get_perlin_noise(shape[3], shape[2])
@@ -117,10 +117,12 @@ class Txt2Img2Img(Generator):
                                 self.latent_channels,
                                 scaled_height // self.downsampling_factor,
                                 scaled_width  // self.downsampling_factor],
-                                device='cpu').to(device)
+                               dtype=self.torch_dtype(),
+                               device='cpu').to(device)
         else:
             return torch.randn([1,
                                 self.latent_channels,
                                 scaled_height // self.downsampling_factor,
                                 scaled_width  // self.downsampling_factor],
-                                device=device)
+                               dtype=self.torch_dtype(),
+                               device=device)

west nebula
#

Also unrelated, noise thresholding looks pretty off.

rose sentinel
#

Does this work for you, or should I upload a patch file?

west nebula
#

I'll patch it, hang on a sec.

rose sentinel
#

Noise thresholding is still a to do.

west nebula
#

So perlin noise with txt2img needs fixing.

#

What about img2img? I haven't tested that at all.

#

Yeah, perlin noise there is messed up as well.

rose sentinel
#

Perlin noise won't work anywhere. I've isolated the problem, trying to figure out the solution.

west nebula
#

Also getting errors with .swap now...

rose sentinel
#

Here is the fix for perlin:

diff --git a/ldm/util.py b/ldm/util.py
index 282a56c3..7d44dcd2 100644
--- a/ldm/util.py
+++ b/ldm/util.py
@@ -8,6 +8,7 @@ from threading import Thread
 from urllib import request
 from tqdm import tqdm
 from pathlib import Path
+from ldm.invoke.devices import torch_dtype
 
 import numpy as np
 import torch
@@ -235,7 +236,8 @@ def rand_perlin_2d(shape, res, device, fade = lambda t: 6*t**5 - 15*t**4 + 10*t*
     n01 = dot(tile_grads([0, -1],[1, None]), [0, -1]).to(device)
     n11 = dot(tile_grads([1, None], [1, None]), [-1,-1]).to(device)
     t = fade(grid[:shape[0], :shape[1]])
-    return math.sqrt(2) * torch.lerp(torch.lerp(n00, n10, t[..., 0]), torch.lerp(n01, n11, t[..., 0]), t[..., 1]).to(device)
+    noise = math.sqrt(2) * torch.lerp(torch.lerp(n00, n10, t[..., 0]), torch.lerp(n01, n11, t[..., 0]), t[..., 1]).to(device)
+    return noise.to(dtype=torch_dtype(device))
 
 def ask_user(question: str, answers: list):
     from itertools import chain, repeat

rose sentinel
#

Working on swap now...

#

Huh. Swap() is working for me. What was the prompt you tried?

west nebula
#

Hm, I'll have to dig it up. One sec.

#

product shot of a wine glass full of (wine).swap(danny devito), caustics ray tracing three point dramatic lighting [painting, render, drawing, sketch, cartoon, pixar, disney]

#

ddim, 512x512, 50 steps, nothing else turned on

rose sentinel
#

O

#

I'll try ddim

#

Seems fine.

west nebula
#

Crashes in k_ samplers as well.

#

Let me restart Invoke. One min.

rose sentinel
#

I also put debugging diagnostics around the section that crashed to look for type mismatches.

#

Wait, what platform are you on? Mac?

west nebula
#

Linux under WSL2 w/CUDA

#

RTX 3060

rose sentinel
#

Nope. Should be fine.

west nebula
#

I am running with --precision float16

rose sentinel
#

Let me try that.

#

Nope. Works.

#

I have no doubt there's a bug there somewhere, but I need to knock off for the evening. I'm going to commit the perlin and txt2img2img fixes and will check in with you tomorrow.

west nebula
#

I'm using a model that I converted from a ckpt, so I get: Half-precision version of model not available; fetching full-precision instead

rose sentinel
#

That's ok too.

west nebula
#

No, still crashing after a restart of Invoke.

rose sentinel
#

I get that message for many models - doesn't make a difference.

#

Very annoying!

west nebula
#

It has to be somewhere in cross_attention_control.py.

rose sentinel
#

It's definitely caused by the autocast change.

#

Are you using xformers ?

west nebula
#

Not yet.

#

hidden_states = torch.bmm(attention_slice, value)

#

One of those is a float and the other is half.

rose sentinel
#

Oh, thank you! Where is that line?

west nebula
#

233

#

in einsum_lowest_level

rose sentinel
#

I think the problem is that I was testing on the machine with xformers. I'm now trying the non-xformers machine to see if I can reproduce.

west nebula
#

I wanted to keep things simple for testing so I left xformers off.

#

Building/getting the right version has been a pain whenever I've tried.

rose sentinel
#

I'm able to reproduce the swap() problem on the non-xformers system.

west nebula
#

Yay

#

Now I can go to sleep.

#

Happy to test more tomorrow morning.

rose sentinel
#

swap() is now fixed and pushed to the PR

west nebula
#

Did you test blending?

worldly cloak
#

yep

sour sun
#

img2img isn't working with a diffusers model on MPS. Is this a known issue? The non-diffusers version works fine.

sour sun
#

Got the same error using the current stable version and a nightly from... maybe a week or two ago? maybe I should try the latest

#

Nope, current nightly doesn't fix it.

sour sun
#

Aha, I think using my warmup code with the img2img pipeline was causing it to calculate a step count of zero, which was causing the error.

#

Now I'm generating noise instead of images, but at least it's not crashing.

#

thanks

grave lion
#

strange results with latest main on mac (SD2.1, keuler, 20 steps, 768px)

tardy sparrow
#

loads ckpt files straight into StableDiffusionPipeline objects without needing to write anything to disk

west nebula
#

Is there a recommended way to install xformers for Invoke?

west nebula
#

Also, should we pass requires_safety_checker=False when constructing a DiffusionPipeline if the user has picked --no-nsfw_checker to avoid those warnings hitting the console?

heavy glacier
#

I also just purchased a 4090 for model training... and am under the impression I will have to do some rituals to get it to perform well

west nebula
#

Great, have to update the entire system before I can get cuda 11.7 installed. What a PITA.

#

@rose sentinel If you follow NVIDIA's remote instructions, it gets cuda 12.0. Is that OK for Invoke?

rose sentinel
#

No, because pytorch only supports up to CUDA 11.7. Did I provide the link to the wrong toolkit?

west nebula
#

Yes.

#

They're pushing the latest out.

#

I'm going to remove all of my NV repos and try this again after uninstalling 12.

#

BBIAB

#

I think I was pulling updates from their server via apt. I removed the file and I'm doing the local installer and all seems fine.

#

I think the documentation should specify that it has to be 11.7 and that 12.0 or later won't work.

#

@rose sentinel here's another issue...

1.13.1+cu116
(invokeai) (base) jovyan@f0ab4de483f7:~/work/InvokeAI$ pip install --upgrade --force-reinstall torch torchvision
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Collecting torch
  Downloading torch-1.13.1-cp310-cp310-manylinux1_x86_64.whl (887.5 MB)
     ╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 20.1/887.5 MB 15.0 MB/s eta 0:00:58
ERROR: Operation cancelled by user
#

It's not getting the CUDA 11.7 version.

#

pip install --upgrade --force-reinstall --extra-index-url https://download.pytorch.org/whl/cu117 torch torchvision takes care of it on this system.

rose sentinel
#

Yes, you went down exactly the same path I did a few nights ago. I went to the "CUDA Toolkit" page that you get when you search Google and installed the NVIDIA repos, not realizing that I was going to get 12.0. Several frustrating hours later, I ripped everything out and started over again with 11.7. This time it went smoothly.

west nebula
#

You can also apt-get install cuda-11-7.

#

But good, glad we both got it installed. Users are going to have issues, I think.

heavy glacier
#

I agree 😦

west nebula
#

So what about torch/torchvision not being correct?

#

Unfortunately, my work requires me to get this all working ASAP so I can AI generate people and baked goods.

rose sentinel
#

Seemed to work for me without the extra index.

west nebula
#

I assume requirements.txt will be updated to pull in the right torch/torchvision and users won't have to --extra-index-url https://download.pytorch.org/whl/cu117?

#

Not sure why I have to do that, then.

#

But it tries to get CPU 1.13.1.

rose sentinel
#

requirements will be updated if necessary. Oddly I am getting 1.13 with CUDA support without the extra index. Did you kill the install because the wheel name looked wrong, or did you finish the install and then test the version?

west nebula
#

Former.

rose sentinel
#

I will test this on a machine that has 1.12/cu116 installed.

west nebula
#

Before, it was installing CUDA 11.6... just not catching on to 11.7 now.

rose sentinel
#

I got '1.13.1+cu117' -- without the extra index.

#

Ok, during install, the wheel was named torch-1.13.1-cp39-cp39-manylinux1_x86_64.whl, but it was the cuda version that installed.
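
Since the wheel filename does not show which build you got, one way to check afterwards is to look at the local tag in the version string. A small sketch, assuming the string comes from torch.__version__ (`cuda_tag` is a made-up helper, not part of torch):

```python
from typing import Optional


def cuda_tag(torch_version: str) -> Optional[str]:
    """Return the local build tag (e.g. 'cu117') from a version string, or None."""
    _, sep, local = torch_version.partition("+")
    return local if sep else None


print(cuda_tag("1.13.1+cu117"))  # cu117
print(cuda_tag("1.13.1"))        # None
```

In a real session you would pass torch.__version__ instead of a literal string.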

west nebula
#

Well that's not great.

#

So even though it says cp310 in my case, it should work? I'll try installing again.

west nebula
#

Confirmed, +cu117 is there.

#

So now, xformers...

west nebula
#

@rose sentinel xformers install worked perfectly.

#
memory_efficient_attention.tritonflashattB:        unavailable
#

I assume that's nothing to be alarmed about.

#

Maybe?

Error caught was: No module named 'triton'

rose sentinel
#

Otherwise the system will load and install .ckpt and .safetensors using the old code path.

sour sun
#

That would be great. You're still planning to remove the old k_diffusion code eventually, right? When that happens, you would just have to make that option always-on...

rose sentinel
#

Has anyone noticed that there is slight image to image variation when generating with diffusers using identical prompt, seed and other parameters? There are often small changes to the image, usually near the bottom. I have just noticed this and am unsure whether it is related to xformers. It doesn't seem to happen with legacy .ckpt or .safetensors models.

west nebula
#

I just installed triton and that error went away. Maybe include that in the pip instructions, @rose sentinel?

west nebula
#

Definitely seeing a difference - same seed, parameters, etc.

#

This is post-xformers. Is there a way to disable it without uninstalling it? That was a beast to build.

#

Also if successful

#

The differences are even more pronounced with 2.1, and it's not really generating anything useful. Hmm.

heavy glacier
#

I just got my 4090 installed and I’ll work on trying to figure out how to get xformers and 4090 support going so I can write recipes up for both

#

It’s currently jutting outside of my case because apparently subtlety is not in vogue

west nebula
#

My setup is different and does use the VAE. Plus renders shouldn't vary if no settings change at all.

  path: /home/jovyan/work/InvokeAI/models/optimized-ckpts/stable-diffusion-v1-5-nonema
  description: Stable Diffusion Non-EMA (FULL)
  format: diffusers
  vae:
    repo_id: stabilityai/sd-vae-ft-mse
  default: true
#

I should see absolutely no difference between those two flower cookie photos, and there are a bunch.

#

So how do I disable - not uninstall - xformers?

#

I'm using pip here rather than conda.

worldly cloak
#

edit diffusers_pipeline.py and remove the clause that does the enable_xformers call.

west nebula
#

@rose sentinel With xformers off, the generations are completely identical.

#

Whether it's xformers or something that uses it is TBD.

atomic epoch
# heavy glacier I just got my 4090 installed and I’ll work on trying to figure out how to get xf...

I got xformers to build for my 4090 in my nvidia/cuda:11.7.1-devel-ubuntu22.04 docker container with the following FWIW:

RUN pip install ninja \
  && pip install -U --pre triton \
  && TORCH_CUDA_ARCH_LIST="8.6+PTX" pip install -v -U git+https://github.com/facebookresearch/xformers.git@main#egg=xformers

xformers doesn't have ARCH 8.9 on the list so I had to use 8.6+PTX which appears to work anyway.

#

oh I also had to install build-essential

heavy glacier
#

Im on windows, but got xformers built using
pip install xformers==0.0.16rc425

#

Hm

#

Still feels slower than my 3080

#

how many it/s are you getting w/ your 4090?

atomic epoch
#

~15it/s, SD 1.5, "dog", 20 steps, 512x512, k_euler_a

west nebula
#

I think this error message needs to be replaced w/one indicating a local embedding is being used: This concept is not known to the Hugging Face library. Generation will continue without the concept.

atomic epoch
#

@heavy glacier what are you getting? I started at like 5-8it/s until I implemented a few steps.

#

I'm not sure if I'm using the diffusers implementation at this point so I don't know if xformers is being used. I just pulled the latest which included that merge from the weekend and booted up... Would I see confirmation xformers is being used in the console?

heavy glacier
#

Yeah, getting about 5-6 rn

atomic epoch
#

One thing that unlocked a lot of performance was I manually updated the CUDNN version. It's a pita, but that gave me a 300%-400% performance jump.

#

Pseudo steps incoming...

heavy glacier
#

I did that already

#

at least, I *believe * I did

atomic epoch
#

When you're done you should be able to see 8700 for cudnn:

import torch
print(torch.backends.cudnn.version())  # cuDNN version as an integer, e.g. 8700
heavy glacier
#

mmm

#

looks like its not

#

8302

#

ok have it up to 12 now

#

looks like i got 8600 for cudnn

#

updating to 8700 now... 😏

#

nawp, still 12.

#

maybe I dont have xformers up and running yet

#

Ah - 23-25 on SD2.1

atomic epoch
#

oo, I'll have to double check SD2.1 when I'm back at the keyboard. I don't want to leave any iterations on the table!

What were you getting with the 3080?

west nebula
#

So what's the deal with xformers producing different images every run?

rose sentinel
# west nebula

There is a way, but right now it requires a code change. In model_manager.py, right after the line that creates the pipeline (line 426), add: pipeline.disable_xformers_memory_efficient_attention()
I'm going to add a command-line argument to make this convenient.

heavy glacier
#

I am also getting variations

#

Also, 2.1 seems to be producing garbage compared to 1.5 models.

#

(not necessarily broken garbage, just meh)

sour sun
# heavy glacier (not necessarily broken garbage, just meh)

2.x is notoriously harder to prompt for, and negative prompts are less optional. Also, a lot of the common tricks for getting good results with 1.5 prompts (e.g. "greg rutkowski") don't work, so you have to re-learn it. It also doesn't have the benefits of all the third-party finetuning. Is that the kind of stuff you're referring to? https://minimaxir.com/2022/11/stable-diffusion-negative-prompt/

Max Woolf's Blog

Negative prompts can be far superior than traditional prompt additions.

west nebula
#

I didn't do any tricks, just asked for a goat.

heavy glacier
#

yeah i avoid using artists by habit

#

Not really a "fine-tuning" problem - Getting a lot of weird stuff. Wondering if maybe its not the model 🤔

west nebula
#

🐛

heavy glacier
#

well

#

I'm dumb. forgot to change to 768 for 2.1

#

quality still not great but, better i guess

#

this is the best I could get

#

ok, slowly but surely getting non hot garbage.

#

all the samplers seem to be working fine

heavy glacier
#

think that's just the model.

grave lion
#

but for 768px the steps count is more important

west nebula
west nebula
#

One seems to work, one doesn't at all. shrug

#

So it may be undertrained?

#

@rose sentinel Where can I put a debugging statement to make sure the embedding is being used in a particular generation? I want to see if it's being picked up and used or not.

heavy glacier
#

2.1 model is trained on 768x768

#

although iirc theres a 512x512 version as well

grave lion
#

I know. I get nice 512px with your prompt above.

rose sentinel
#

The trees are seriously warped with 512x512.

#

Upping to 60 steps, it gets worse for me

#

Here is 768x768 at 60 steps. It looks reasonable:

west nebula
#

What's your CFG?

rose sentinel
#

7.5

#

As I say, I haven't played around with 2.1 much.

worldly cloak
#

I think DDIM is the safest starting point for any new and unfamiliar configuration.

west nebula
#

Cool! I'll check it out after my TI finishes TI-ing.

rose sentinel
#

I'll start working on the problem with resuming next.

rose sentinel
#

I've got a PR that has partial instructions for installing xformers (instructions for Linux, not Windows). I would be much obliged if anyone who has gotten it working on Windows could contribute a recipe: https://github.com/invoke-ai/InvokeAI/pull/2360

rose sentinel
#

@worldly cloak Coding style question for you. I am trying to suppress the diffusers warning messages about disabling the NSFW checker (which are irrelevant, since we do the NSFW filtering separately). After importing the appropriate calls from diffusers.utils.logging, this works:

verbosity=get_verbosity()
set_verbosity_error()
<do something that triggers the warning>
set_verbosity(verbosity)

However, this looks like something that you'd create a Python context manager for, so that the previous verbosity level is saved and restored automagically. Is this easy enough to do?

heavy glacier
#

Do you want me to officially add that as the recipe/document it to that effect?

#

My thought was that, given it just works from a wheel install... why would we have users go through building it (or installing it manually) at all?

rose sentinel
heavy glacier
#

I have not tested on Linux

#

But i have to imagine it'd work just as well

#

I had no issues at all getting it going

#

One other thing I'll note, just as a performance note - For my 4090, I had to update to latest cudnn dlls, and update torch/torchvision

rose sentinel
#

I saw that. I'm not sure what the equivalent maneuvers are for Linux. Would love to do the same myself if you get such a nice performance boost.

heavy glacier
#

Are you running on a 4090 too?

#

I uninstalled torch & torchvision, then did
pip install torch torchvision --extra-index-url https://download.pytorch.org/whl/cu116
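
A quick way to confirm which build ended up installed after a reinstall like this. It is only a sketch: the helper name `torch_build_info` is made up for illustration, and it returns `None` when torch is not importable rather than failing:

```python
import importlib.util


def torch_build_info():
    """Report the installed torch build, or None if torch is not installed."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    return {
        "torch": torch.__version__,
        "cuda": torch.version.cuda,                # None on CPU-only builds
        "cudnn": torch.backends.cudnn.version(),   # None if cuDNN is unavailable
        "cuda_available": torch.cuda.is_available(),
    }


print(torch_build_info())
```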

rose sentinel
#

Nope, ampere architecture A2000. I built everything from source, so maybe I've got most recent.

heavy glacier
rose sentinel
#

You're still using CUDA 11.6?

#

Maybe we should keep the doc after all and populate it with tips and tricks. Sounds like this is still an area of rapid development.

heavy glacier
#

This was a bit of a frankenstein of advice I'd seen online - I've definitely updated to latest cuda dlls

rose sentinel
#

BTW, I've got a whole bunch of small PRs up -- all small bug fixes. If you have a chance to review, most of them are pretty straightforward. The ones that may have issues I've flagged.

heavy glacier
#

👍

dire gazelle
#

to add another data point, I just tested the xformers==0.0.16rc425 wheel on Linux, seems to work. haven't tried image generation yet, but it installs and the python -m xformers.info test shows success.

dire gazelle
heavy glacier
#

Performance

#

Was about 25% of what it is now, and got 2x (each) from xformers and the cuda updates

rugged moth
#

We need to find a fix for the seam painting being fully broken. Been breaking my head to isolate where the problem is but no luck so far. ---- EDIT: Just noticed that the issue is only with diffuser models. Not ckpt models.

#

@rose sentinel for xformers, I think if our pytorch req won't change during the course of a release, maybe it's better if we supply the wheels?

#

the xformers repo builds wheels when they run their actions on github

#

i got my windows version from there. FARRRRRRR better than trying to build it.

#

One issue with that is that Triton does not get installed on Windows, so it throws a minor warning. But the optimizations seem to work.

#

Might not be an issue on Linux coz I think the pip install covers it?

dire gazelle
#

xformers wheel from pip confirmed working perfectly on Linux (x86_64/CUDA11.7/torch1.13.1), both natively and in Docker. No more black images with SD 2.1!

#

we'll likely need to add platform markers for triton so it's not trying to install it on windows
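
One way that could look, sketched under the assumption that dependencies are declared as PEP 508 requirement strings (for example in `install_requires`). The package list below is illustrative only, not the project's actual one, and `wants_triton` is a hypothetical helper that just mirrors the marker's condition:

```python
import sys

# Illustrative only: not the project's real dependency list.
INSTALL_REQUIRES = [
    "xformers",
    # PEP 508 environment marker: pip installs this only where
    # sys.platform == "linux", so Windows installs skip it.
    'triton; sys_platform == "linux"',
]


def wants_triton(platform: str = sys.platform) -> bool:
    """Mirror the marker's condition for a quick local check."""
    return platform == "linux"
```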

clear hinge