Making sense of tokenization? | Invoke

lusty tiger Jan 27, 2023, 1:56 AM

I’m using --log_tokenization and it has me wondering what the rhyme or reason is behind some of the tokenizations. For example, the word “zoomed” gets split into two tokens, shown in different colors: “zo” and “omed”. I have trouble seeing how splitting the word that way is helpful, although the CLI switch is definitely useful, if only for revealing oddities like this. Is there somewhere I can read more about whether “zo” + “omed” is giving me what I want, and how to understand what is going on?

scenic trellis Jan 28, 2023, 9:42 PM

You can see all the tokens it knows in its vocab.json file: https://huggingface.co/CompVis/stable-diffusion-v1-4/tree/main/tokenizer
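
If it helps, here is a rough Python sketch for checking whether particular pieces are entries in that file. It assumes vocab.json is a flat JSON object mapping token strings to integer ids, and that word-final pieces carry a `</w>` suffix, as CLIP-style BPE vocabularies do. The helper names (`load_vocab`, `lookup_pieces`) and the toy vocabulary with its ids are made up for illustration and are not part of Invoke or any library.

```python
import json
from pathlib import Path

# Assumption: the vocab marks pieces that end a word with this suffix.
END_OF_WORD = "</w>"


def load_vocab(path):
    """Load a tokenizer vocab.json (token string -> integer id)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def lookup_pieces(vocab, pieces):
    """Look up each piece of one word in the vocab.

    The last piece gets the end-of-word suffix appended before lookup.
    Returns a dict of {vocab key: id}, with None for missing entries.
    """
    result = {}
    for i, piece in enumerate(pieces):
        is_last = i == len(pieces) - 1
        key = piece.lower() + (END_OF_WORD if is_last else "")
        result[key] = vocab.get(key)
    return result


# Hypothetical toy vocabulary; the ids are invented, not the real ones.
toy_vocab = {"zo": 1, "omed</w>": 2, "zoom</w>": 3}

print(lookup_pieces(toy_vocab, ["zo", "omed"]))
print(lookup_pieces(toy_vocab, ["zoomed"]))
```

To try it against the real file, download vocab.json from the link above and call `lookup_pieces(load_vocab("vocab.json"), ["zo", "omed"])`. A `None` for a whole word such as `zoomed</w>` would mean the word is not a single entry in the vocabulary, so the tokenizer has to build it from smaller pieces it does know.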

lusty tiger Feb 10, 2023, 2:27 AM

Thanks!
