if it is too large for tokenizer it is | AI Programming And Chat | Page 1

zinc pine Jul 6, 2023, 11:23 PM

That makes sense. I think I would need to figure out how to find the total number of tokens in the document and then break it into chunks small enough for the model to accept.
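Once the total token count is known, the number of chunks is a ceiling division. The sketch below assumes the token count and the per-chunk limit are already known integers. `plan_chunks` is a hypothetical helper, not part of any library, and it spreads the tokens evenly so the last chunk is not a tiny remainder:

```python
from typing import List, Tuple


def plan_chunks(total_tokens: int, max_tokens: int) -> List[Tuple[int, int]]:
    """Return (start, end) token ranges covering total_tokens, each at most max_tokens long."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if total_tokens <= 0:
        return []
    n_chunks = -(-total_tokens // max_tokens)  # ceiling division
    # Even split: the first `extra` chunks get one extra token each.
    base, extra = divmod(total_tokens, n_chunks)
    ranges = []
    start = 0
    for i in range(n_chunks):
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


print(plan_chunks(10, 4))  # [(0, 4), (4, 7), (7, 10)]
```

The even split never exceeds the limit. With `n = ceil(total / max)`, each chunk holds at most `ceil(total / n)` tokens, and that value is never larger than `max`.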

pearl flint Jul 7, 2023, 4:05 AM

zinc pine That makes sense. I think I would need to figure out how to find the number of t...

Why not just send half of it to be counted? If that works, then you have a decent start at chunking it up.
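Taken to its conclusion, this suggestion is a bisection: try the text, and if it is rejected, split it in half and try each half. The following is a minimal sketch. `fits` is a hypothetical callback standing in for whatever check you have, such as a tokenizer count or actually sending the text to the model. The word-count limit in the usage line is also only a stand-in:

```python
from typing import Callable, List


def split_until_fits(text: str, fits: Callable[[str], bool], min_len: int = 1) -> List[str]:
    """Recursively halve text until every piece passes fits().

    Pieces of length <= min_len are returned as-is even if they do not fit,
    so the recursion always terminates.
    """
    if fits(text) or len(text) <= min_len:
        return [text]
    mid = len(text) // 2
    # Prefer to cut at the last space before the midpoint so words stay whole.
    space = text.rfind(" ", 0, mid)
    if space > 0:
        mid = space
    return split_until_fits(text[:mid], fits, min_len) + split_until_fits(text[mid:], fits, min_len)


# Stand-in check: "fits" means at most 3 whitespace-separated words.
print(split_until_fits("one two three four five six seven eight", lambda s: len(s.split()) <= 3))
# ['one two', ' three four', ' five six', ' seven eight']
```

Joining the pieces reproduces the original text exactly. This costs one check per split, so with a remote model it is cheaper to count tokens locally when a tokenizer is available.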

zinc pine Jul 7, 2023, 2:44 PM

pearl flint Why not just send half of it to count, and if that works, then you have a decent...

That could work. Long term, though, I would rather have a programmatic way to do it. For example, I’m working on a Python script that uses tiktoken from the command line.
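The sketch below shows the kind of command-line script described here. It is not the actual script. It assumes the `cl100k_base` encoding and a hypothetical default limit of 2000 tokens per chunk. It uses only `tiktoken.get_encoding`, `Encoding.encode` and `Encoding.decode`:

```python
import argparse
from typing import List, Optional, Sequence


def token_windows(tokens: Sequence[int], max_tokens: int) -> List[Sequence[int]]:
    """Slice a token sequence into consecutive windows of at most max_tokens."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    return [tokens[i:i + max_tokens] for i in range(0, len(tokens), max_tokens)]


def main(argv: Optional[List[str]] = None) -> int:
    parser = argparse.ArgumentParser(description="Count the tokens in a text file and split it into chunks.")
    parser.add_argument("path", help="text file to tokenize")
    parser.add_argument("--encoding", default="cl100k_base", help="tiktoken encoding name (assumed default)")
    parser.add_argument("--max-tokens", type=int, default=2000, help="per-chunk limit (hypothetical default)")
    args = parser.parse_args(argv)

    import tiktoken  # deferred so token_windows() can be used without tiktoken installed

    enc = tiktoken.get_encoding(args.encoding)
    with open(args.path, encoding="utf-8") as f:
        text = f.read()

    # disallowed_special=() encodes strings like "<|endoftext|>" as plain text instead of raising.
    tokens = enc.encode(text, disallowed_special=())
    print(f"total tokens: {len(tokens)}")
    for i, window in enumerate(token_windows(tokens, args.max_tokens)):
        chunk_text = enc.decode(window)  # the text you would send to the model
        print(f"chunk {i}: {len(window)} tokens, {len(chunk_text)} characters")
    return 0
```

To run it from a shell, add `if __name__ == "__main__": raise SystemExit(main())` at the bottom. From Python, call something like `main(["notes.txt", "--max-tokens", "1000"])`. A fixed-size window can cut through the middle of a word or a multi-byte character, and `decode` replaces partial bytes. If that matters, cut on sentence or paragraph boundaries instead.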