Textual Inversion Training - How does it work? | Invoke

karmic prawn Mar 24, 2023, 6:19 PM

Especially:
**Where do the image descriptions go?**
Should they contain a placeholder for the object's trigger word?

There is a GitHub page about this, https://invoke-ai.github.io/InvokeAI/features/TEXTUAL_INVERSION/, but it doesn't mention image descriptions.

fiery hamlet Mar 24, 2023, 8:07 PM

My understanding is that in Automatic1111 the file name is used as the caption. I'd be interested to learn how it works in Invoke. From the small experiments I ran, it doesn't seem to make any difference. Also, the embedding size setting seems to be missing in Invoke?

karmic prawn Mar 24, 2023, 8:37 PM

If the file name is used, does it have to contain a placeholder for the trigger word, e.g. *?

near ether Mar 28, 2023, 12:57 PM

This is a bit complicated. There are at least four different formats for textual inversion embedding files. Some of them store the trigger token inside the file, and in that case InvokeAI uses it. Others have no trigger token, and in that case InvokeAI creates a trigger from the file name surrounded by <angle-brackets>. You can see which triggers are active by looking for the informational message printed to the console at startup.
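
As a rough illustration of that rule (a sketch, not InvokeAI's actual code), the trigger choice could look something like the following. The helper name `pick_trigger` is made up. Treating a string under `name`, or a single `<...>` key, as the in-file trigger is an assumption based on the format notes further down in this thread, and so is using the file name without its extension for the fallback.

```python
from pathlib import Path


def pick_trigger(embedding: dict, file_path: str) -> str:
    """Pick a trigger for an already-loaded embedding dict (hypothetical helper)."""
    # Assumption: a non-empty string stored under 'name' is the trigger token.
    name = embedding.get("name")
    if isinstance(name, str) and name:
        return name

    # Assumption: a file whose only key looks like '<something>' uses that key
    # as its trigger.
    keys = list(embedding.keys())
    if len(keys) == 1 and isinstance(keys[0], str):
        only_key = keys[0]
        if only_key.startswith("<") and only_key.endswith(">"):
            return only_key

    # No trigger stored in the file: fall back to the file name in angle brackets.
    # Assumption: the extension is dropped before wrapping.
    return f"<{Path(file_path).stem}>"
```

With this sketch, a dict that only has `emb_params` loaded from `easynegative.safetensors` would get the trigger `<easynegative>`, while a dict with `name` set would keep the stored name.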

karmic prawn Mar 28, 2023, 8:19 PM

near ether This is a bit complicated. There are at least four different formats for textual...

But you're talking about using textual inversion, right? :3

I'm wondering about the training part of it. ❤️

near ether Mar 28, 2023, 8:33 PM

There is at least one version of the embedding file format that contains neither a trigger nor a placeholder in the file. It has a single dictionary key, emb_params, that points directly to an MxN tensor. I can post my notes on the various formats I've found in the wild. It would be great if your training could write one of the supported formats instead of creating a new variant.

Here are my notes on the four variants:

* PT VARIANT #1
example: rem_rezero.pt
keys: ['string_to_token', 'string_to_param', 'name', 'step', 'sd_checkpoint', 'sd_checkpoint_name']

name is the trigger token
string_to_token is {'*': 265}
string_to_param points to a dict with the placeholder
  {'*': tensor([[-0.0512,  0.0459, -0.0991,  ..., -0.2303, -0.2073, -0.0393],
  tensor shape is [5,768] in this instance


* PT VARIANT #2
example: midj-strong.pt
keys: ['author', 'string_to_token', 'string_to_param']
string_to_token is {'*': tensor(265)}
string_to_param is {'*': tensor([[-0.0875,  0.0224,  0.0444,  0.0217,  0.0365, -0.0053,  0.0566, -0.0396...
tensor shape is [1,768]

* PT VARIANT #3
example: easynegative.safetensors
keys: ['emb_params']
emb_params is tensor([[-0.0004,  0.0095, -0.0080,  ...,  0.0164, -0.0055,  0.0022],
tensor shape is [8, 768]

* "bin" VARIANT #1
example: kamon-style/learned_embeds.bin
keys: ['<kamon-style>']
<kamon-style> is tensor([ 0.2494, -0.0850, -0.0289,  ...,  0.0964, -0.0115,  0.0238])
tensor shape is [1024]
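
To make the notes above a bit more concrete, here is a minimal sketch of telling the four layouts apart by their dictionary keys and pulling out the embedding tensor. It only looks at an already-loaded dict. The function name `classify_embedding` and the labels it returns are made up for illustration, and this is not how InvokeAI itself is known to do it.

```python
def classify_embedding(embedding: dict):
    """Return (variant_label, tensor) for a loaded embedding dict (hypothetical helper)."""
    keys = set(embedding)

    if "string_to_param" in keys:
        # PT variants #1 and #2: the tensor sits under the placeholder key ('*').
        params = embedding["string_to_param"]
        tensor = next(iter(params.values()))
        # Variant #1 also carries a 'name' key holding the trigger token.
        label = "pt-1" if "name" in keys else "pt-2"
        return label, tensor

    if keys == {"emb_params"}:
        # PT variant #3: a single key pointing straight at the tensor.
        return "pt-3", embedding["emb_params"]

    if len(keys) == 1:
        # "bin" variant #1: the single key is the trigger itself.
        (trigger,) = keys
        return "bin-1", embedding[trigger]

    raise ValueError(f"unrecognized embedding layout: {sorted(map(str, keys))}")
```

The dict would typically come from `torch.load(path, map_location="cpu")` for .pt and .bin files, or `safetensors.torch.load_file(path)` for .safetensors files. After that, `tensor.shape` gives the sizes listed in the notes.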
