#if it is too large for tokenizer it is
That makes sense. I think I would need to figure out how to find the total number of tokens in the document and then break it down into chunks that are small enough to be accepted by the model.
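A minimal sketch of that idea, assuming the tokenizer exposes an encode function (text to token ids) and a decode function (token ids back to text). The helper names `chunk_tokens`, `chunk_text`, and `tiktoken_codec` are made up for illustration, and `cl100k_base` is only an assumed encoding; the right one depends on the model.

```python
from typing import Callable, List, Sequence, Tuple


def chunk_tokens(tokens: Sequence[int], max_tokens: int) -> List[List[int]]:
    """Split a token sequence into consecutive chunks of at most max_tokens."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    return [list(tokens[i:i + max_tokens]) for i in range(0, len(tokens), max_tokens)]


def chunk_text(
    text: str,
    encode: Callable[[str], List[int]],
    decode: Callable[[List[int]], str],
    max_tokens: int,
) -> List[str]:
    """Tokenize text, cut the tokens into chunks, and decode each chunk back to text."""
    tokens = encode(text)
    return [decode(chunk) for chunk in chunk_tokens(tokens, max_tokens)]


def tiktoken_codec(encoding_name: str = "cl100k_base") -> Tuple[Callable, Callable]:
    """Return (encode, decode) from tiktoken.

    Imported lazily so the helpers above also work with any other tokenizer.
    The default encoding name is an assumption, not something from this thread.
    """
    import tiktoken

    enc = tiktoken.get_encoding(encoding_name)
    return enc.encode, enc.decode
```

Usage would look something like `encode, decode = tiktoken_codec()` followed by `chunk_text(document, encode, decode, max_tokens=4000)`, where 4000 is a placeholder limit. One caveat: cutting at arbitrary token boundaries can split a word, or even a multi-byte character, across two chunks, so the decoded pieces may not be clean at the edges.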
Why not just send half of it to count? If that works, you have a decent starting point for chunking it up.
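The halving idea can be sketched roughly like this. `split_until_accepted` is a made-up name, and `fits` is a stand-in for whatever check says "this piece is small enough" (for example, a call that tries to count the tokens and reports whether it succeeded).

```python
from typing import Callable, List


def split_until_accepted(text: str, fits: Callable[[str], bool]) -> List[str]:
    """Recursively halve text until every piece passes fits()."""
    if fits(text):
        return [text]
    if len(text) <= 1:
        raise ValueError("a single character was rejected; cannot split further")

    mid = len(text) // 2
    # Prefer to cut at the last space before the midpoint so words stay intact;
    # fall back to the exact midpoint when there is no usable space.
    cut = text.rfind(" ", 0, mid)
    if cut <= 0:
        cut = mid

    return split_until_accepted(text[:cut], fits) + split_until_accepted(text[cut:], fits)
```

For example, `split_until_accepted(document, lambda s: len(s) <= 2000)` uses a plain character limit as the check. The pieces always join back into the original text, but they end up uneven in size, which is one reason to count tokens directly instead.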
That could work. Long term, I would rather have a programmatic way to do it. For example, I'm working on a Python script that uses tiktoken from the command line.
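The actual script isn't shown in this thread, so the following is only a guess at what a command-line token counter along those lines might look like. The flag names, the 4096 default, and the `cl100k_base` encoding are all assumptions; `tiktoken.get_encoding` and `Encoding.encode` are the only tiktoken calls used.

```python
import argparse
import math
import sys


def chunks_needed(total_tokens: int, max_tokens: int) -> int:
    """Number of chunks of at most max_tokens needed to cover total_tokens."""
    return math.ceil(total_tokens / max_tokens)


def main(argv=None, encode=None) -> int:
    parser = argparse.ArgumentParser(description="Count the tokens in a text file.")
    parser.add_argument("path", help="text file to measure")
    parser.add_argument("--max-tokens", type=int, default=4096,
                        help="assumed per-request token limit")
    parser.add_argument("--encoding", default="cl100k_base",
                        help="tiktoken encoding name (assumed default)")
    args = parser.parse_args(argv)

    if encode is None:
        # Imported here so a different tokenizer can be passed in instead.
        import tiktoken
        encode = tiktoken.get_encoding(args.encoding).encode

    with open(args.path, encoding="utf-8") as f:
        text = f.read()

    total = len(encode(text))
    print(f"{total} tokens, {chunks_needed(total, args.max_tokens)} "
          f"chunk(s) of at most {args.max_tokens}")
    return total


# Only run as a CLI when at least one argument was given.
if __name__ == "__main__" and len(sys.argv) > 1:
    main()
```

Saved as, say, `count_tokens.py` (a placeholder name), it would be run as `python count_tokens.py document.txt --max-tokens 4000`.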