#Tokenization Visualizer

novel drum
#

I'm making a Visualization of Tokenization (A web application illustrating tokenization methods), something like this: https://tokens-lpj6s2duga-ew.a.run.app/
I want to expand this project by creating some kind of comparison. I'm lacking ideas, please help me! I wanted to compare the metrics of tokenizers/models but I think this is too complicated (or I'm not understanding it well enough yet).
How should I expand it? Any other ideas?

dreamy wren
#

but what's the main reason for comparing the tokens?
-> to get a brief idea of how different models tokenize
-> so this will eventually help beginners understand how tokens are created

#

@pure stone any thoughts on this?

#

along with implementation

novel drum
#

Basically it's a uni project

#

and I have to expand it

#

but idk how exactly yet

dreamy wren
#

how do you get these kinds of ideas?

#

(I went through your GitHub repos)

novel drum
#

Let's say I do it. Then I see how different models tokenize. But what about it? What can I conclude from it, if I can conclude anything at all?

novel drum
#

and chatted a bit with ChatGPT since I couldn't find anything interesting on websites with typical headings like "top 20 AI project ideas in 2025"

#

I don't remember posting my GitHub link anywhere

dreamy wren
novel drum
#

no, no, that's not my GitHub

#

I'm just saying I want to make a project like this

#

and then expand it

#

and I need help expanding it (how to do so)

#

sorry for the confusion, that's why I wrote "something like this"

dreamy wren
#

I thought you made that and want to expand

novel drum
#

no, but I will want to make it in the near future

#

any ideas on the expansion? I need to talk to my professor tomorrow 😅

dreamy wren
#

you can also create a visualization for the vectors (using those same tokens)
and create a live view

pure stone
#

I don't normally care about tokenizers :)

novel drum
#

Hm, multilingual support seems nice. I've already read a bit about it, especially the no-space languages

#

but I don't think this is enough

novel drum
#

I've been searching the whole day today + yesterday 💀

pure stone
#

ehh just a table of different token sizes
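e.g. a rough sketch of such a table, assuming "token sizes" means how many tokens each tokenizer produces for the same text. The three tokenizers here (whitespace, per-character, UTF-8 bytes) are toy stand-ins and not any real model's tokenizer, and the sample strings and function names are made up for illustration:

```python
# Toy tokenizers only: none of these is a real model's tokenizer.
def whitespace_tokens(text: str) -> list[str]:
    """Split on runs of whitespace."""
    return text.split()


def char_tokens(text: str) -> list[str]:
    """One token per Unicode code point."""
    return list(text)


def byte_tokens(text: str) -> list[int]:
    """One token per UTF-8 byte."""
    return list(text.encode("utf-8"))


TOKENIZERS = {
    "whitespace": whitespace_tokens,
    "character": char_tokens,
    "utf8-bytes": byte_tokens,
}

# Hypothetical samples: one spaced language, one no-space language.
SAMPLES = {
    "english": "Tokenizers split text.",
    "japanese": "東京に行きます",
}


def count_table(samples: dict[str, str]) -> dict[str, dict[str, int]]:
    """Map each sample label to {tokenizer name: token count}."""
    return {
        label: {name: len(fn(text)) for name, fn in TOKENIZERS.items()}
        for label, text in samples.items()
    }


def format_table(table: dict[str, dict[str, int]]) -> str:
    """Render the counts as a plain-text table, one row per sample."""
    names = list(TOKENIZERS)
    header = f"{'sample':<10}" + "".join(f"{n:>12}" for n in names)
    rows = [
        f"{label:<10}" + "".join(f"{counts[n]:>12}" for n in names)
        for label, counts in table.items()
    ]
    return "\n".join([header, *rows])


if __name__ == "__main__":
    print(format_table(count_table(SAMPLES)))
```

swapping the toy functions for real tokenizers would keep the same table shape, and the no-space sample already shows why whitespace splitting alone falls over there (it comes out as a single token).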

#

but really there is not much interesting about tokenizers

#

the only interesting thing is special tokens

novel drum
#

really?

#

Hm maybe I should drop the comparison part and do something else...

novel drum
#

nah I won't give up (still researching 😭)

plush vale
#

@novel drum what did you cook

novel drum
#

ah... nothing 😭 I just realized that comparing the tokenizers in different ways is possible but it has no point really