I'm making a Visualization of Tokenization (A web application illustrating tokenization methods), something like this: https://tokens-lpj6s2duga-ew.a.run.app/
I want to expand this project by creating some kind of comparison. I'm lacking ideas, please help me! I wanted to compare the metrics of tokenizers/models but I think this is too complicated (or I'm not understanding it well enough yet).
How should I expand it? Any other ideas?
#Tokenization Visualizer
tooo good man
but what's the main reason for comparing the tokens?
-> to get a brief idea of how different models tokenize
-> so this will eventually help beginners understand how tokens are created
@pure stone any thoughts on this?
@novel drum btw, I also love your https://short-ai-story.web.app/ really great idea
along with implementation
yeah especially the first one
Basically it's a uni project
and I have to expand it
but idk how exactly yet
Let's say I do it. Then I see how different models tokenize. But so what? What can I conclude from it, if anything?
I... made it up in my head 💀
and chatted a bit with ChatGPT since I couldn't find anything interesting on websites with typical headings like "top 20 AI project ideas in 2025"
I don't remember posting my github link anywhere
no, no, that's not my github
I'm just saying I want to make a project like this
and then expand it
and I need help expanding it (how to do so)
sorry for the confusion, that's why I wrote "something like this"
ohh no worries
I thought you made that and want to expand
no, but I will want to make it in the near future
any ideas on the expansion? I need to talk to my professor tomorrow 😅
you can add multilingual support
you can also create a visualization for vectors (using those same tokens)
and create a live view
maybe a comparison between different tokenizers... idk
I dont normally care about tokenizers :)
Hm, multilingual support seems nice. I've already read a bit about it, especially the no-space languages
but I don't think this is enough
I was also thinking about this but I need to specify it in more detail. What would the comparison look like?
I've been searching the whole day today + yesterday 💀
ehh just a table of different token sizes
but really there is not much interesting about tokenizers
the only interesting thing is special tokens
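For what it's worth, here is a minimal sketch of the "table of different token sizes" idea. It uses only toy tokenizers written with the Python standard library (whitespace, word + punctuation, character, UTF-8 byte), not the tokenizer of any real model. The function names, the sample sentences, and the "chars per token" column are all made up for illustration. Swapping in real tokenizers would mean replacing the entries in `TOKENIZERS`.

```python
import re

# Toy tokenizers (illustrative only, not any real model's tokenizer).
def whitespace_tokens(text):
    """Split on runs of whitespace."""
    return text.split()

def word_punct_tokens(text):
    """Split into runs of word characters and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)

def char_tokens(text):
    """One token per non-whitespace character."""
    return [ch for ch in text if not ch.isspace()]

def byte_tokens(text):
    """One token per UTF-8 byte."""
    return list(text.encode("utf-8"))

# Hypothetical registry: name -> tokenizer function.
TOKENIZERS = {
    "whitespace": whitespace_tokens,
    "word+punct": word_punct_tokens,
    "character": char_tokens,
    "utf8-byte": byte_tokens,
}

# Hypothetical sample sentences, including one language written without spaces.
SAMPLES = {
    "English": "Tokenizers split text into pieces.",
    "German": "Donaudampfschifffahrtsgesellschaft ist ein langes Wort.",
    "Japanese": "トークナイザーは文章を分割します。",
}

def compare(samples, tokenizers):
    """Return rows of (language, tokenizer, token count, chars per token)."""
    rows = []
    for lang, text in samples.items():
        for name, tokenize in tokenizers.items():
            n = len(tokenize(text))
            ratio = len(text) / n if n else 0.0  # guard against empty input
            rows.append((lang, name, n, ratio))
    return rows

def format_table(rows):
    """Render the rows as a fixed-width plain-text table."""
    lines = [f"{'language':<10}{'tokenizer':<14}{'tokens':>7}{'chars/token':>13}"]
    for lang, name, n, ratio in rows:
        lines.append(f"{lang:<10}{name:<14}{n:>7}{ratio:>13.2f}")
    return "\n".join(lines)

print(format_table(compare(SAMPLES, TOKENIZERS)))
```

Even with toy tokenizers the table gives something to conclude: the whitespace splitter returns the whole Japanese sentence as a single token, while the byte-level one uses three tokens per character for it. That kind of per-language gap could be the comparison itself.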
nah I won't give up (still researching 😭 )
@novel drum what did you cook
ah... nothing 😭 I just realized that comparing the tokenizers in different ways is possible but it has no point really