#Textual Inversion Training - How does it work?
My understanding is that in Automatic the file name is used as the caption. I'd be interested to learn how it works in Invoke. From the small experiments I ran, it doesn't seem to make any difference. Also, the embedding size setting seems to be missing in Invoke?
If the file name is used, does it have to contain a placeholder for the trigger word? E.g. * ?
This is a bit complicated. There are at least four different formats for textual inversion embedding files. Some of them contain the trigger token inside the file, and in that case InvokeAI uses that token. Others do not contain a trigger token, and in that case InvokeAI creates a trigger that is the file name surrounded by <angle-brackets>. You can see which triggers are active by looking for an informational message that appears on the console at startup time.
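To make the fallback rule above concrete, here is a minimal sketch. It is not InvokeAI's actual code: the function name `trigger_for` is hypothetical, and stripping the file extension before adding the angle brackets is an assumption.

```python
from pathlib import Path
from typing import Optional


def trigger_for(path: str, embedded_token: Optional[str] = None) -> str:
    """Pick the trigger for an embedding file (illustrative only).

    embedded_token is the trigger found inside the file, if any.
    When it is missing, fall back to the file name in angle brackets.
    Assumption: the extension is dropped before wrapping.
    """
    if embedded_token:
        return embedded_token
    return f"<{Path(path).stem}>"


if __name__ == "__main__":
    print(trigger_for("embeddings/easynegative.safetensors"))
    print(trigger_for("embeddings/rem_rezero.pt", "rem_rezero"))
```

The startup console message remains the reliable way to check which trigger is really in effect.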
But you're talking about using textual inversion, right? :3
I'm wondering about the training part of it. ❤️
There is at least one version of the embedding file format that contains neither a trigger nor a placeholder in the file. It has a single dictionary key, emb_params, that points directly to an MxN tensor. I can post my notes on the various formats I've found in the wild. It would be great if your training code could write one of the supported formats instead of creating a new variant.
Here are my notes on the four variants:
* PT VARIANT #1
example: rem_rezero.pt
keys: ['string_to_token', 'string_to_param', 'name', 'step', 'sd_checkpoint', 'sd_checkpoint_name']
name is the trigger token
string_to_token is {'*': 265}
string_to_param points to a dict with the placeholder
{'*': tensor([[-0.0512, 0.0459, -0.0991, ..., -0.2303, -0.2073, -0.0393],
tensor shape is [5,768] in this instance
* PT VARIANT #2
example: midj-strong.pt
keys: ['author', 'string_to_token', 'string_to_param']
string_to_token is {'*': tensor(265)}
string_to_param is {'*': tensor([[-0.0875, 0.0224, 0.0444, 0.0217, 0.0365, -0.0053, 0.0566, -0.0396...
tensor shape is [1,768]
* PT VARIANT #3
example: easynegative.safetensors
keys: ['emb_params']
emb_params is tensor([[-0.0004, 0.0095, -0.0080, ..., 0.0164, -0.0055, 0.0022],
tensor shape is [8, 768]
* "bin" VARIANT #1
example: kamon-style/learned_embeds.bin
keys: ['<kamon-style>']
<kamon-style> is tensor([ 0.2494, -0.0850, -0.0289, ..., 0.0964, -0.0115, 0.0238])
tensor shape is [1024]
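For anyone writing a loader or a trainer against these notes, the four variants can be told apart from the top-level keys alone. The sketch below assumes the file has already been loaded into a plain dict (for example with torch.load or the safetensors library). The function name `classify_embedding` and the variant labels are hypothetical, and treating the single key of the "bin" variant as the trigger is an assumption based on the example above.

```python
from typing import Any, Mapping, Optional, Tuple


def classify_embedding(state: Mapping[str, Any]) -> Tuple[str, Optional[str]]:
    """Guess the variant from top-level keys.

    Returns (variant label, trigger stored in the file or None).
    """
    keys = set(state)

    # PT variants #1 and #2 both carry string_to_token / string_to_param.
    if {"string_to_token", "string_to_param"} <= keys:
        if "name" in keys:
            # Variant #1: 'name' is the trigger token.
            return "pt-1", state["name"]
        # Variant #2: no trigger in the file, only the '*' placeholder.
        return "pt-2", None

    # Variant #3: a single 'emb_params' key pointing at the tensor.
    if keys == {"emb_params"}:
        return "pt-3", None

    # "bin" variant: a single key such as '<kamon-style>'.
    # Assumption: that key doubles as the trigger.
    if len(keys) == 1:
        (token,) = keys
        if token.startswith("<") and token.endswith(">"):
            return "bin-1", token

    return "unknown", None


if __name__ == "__main__":
    print(classify_embedding({"emb_params": None}))
    print(classify_embedding({"<kamon-style>": None}))
```

When the returned trigger is None, the loader has to make one up, for example from the file name as described earlier in the thread.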